Library Day in the Life - 5 - Day 1 - ORIGINAL TITLE!

So here we are again. Library Day in the Life number five! Monday is a work from home day. No audacious commute from Toronto to Hamilton today!

Morning:

Morning soundtrack: BBC World Service podcast, TWiT - It's The New Sex Talk, CBC Spark Daniel Pink on Motivation 3.0, The Protomen - The Protomen

Caught up on a bunch of email from last week, and finally got around to setting up Drush. Don't know why I never got around to it before, but it is definitely worth checking out if you manage a few Drupal sites. Watched a couple of screencasts on Drush by CivicActions to quickly immerse myself, then got around to updating modules for our dev site. Once everything was up to snuff, I started working on a couple of our final functional requirements for the new version of our digital collections site before we start theming it: allowing each record to have its own Dublin Core XML output and adding some Dublin Core meta information to each record's header HTML output. Mind you, I am a horrible programmer.

The header output code was pulled mostly from this Computed Field PHP snippet example. I managed to get DC.title, DC.date.created and DC.Date.X-MetadataLastModified working correctly, but the rest of the elements (description, source, format, etc.) were another beast entirely. I put off the Dublin Core XML until later in the day when I could rely on one of our programmers for assistance, because mind you, I am a horrible programmer.
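For the curious, here is a rough sketch of the kind of thing I mean - not the actual snippet (which was adapted from the Computed Field example), and the module and field names are made up - just to show how the DC meta tags can get pushed into a record's head in Drupal 6:

<?php
// Sketch only (Drupal 6, hypothetical module & CCK field names): push Dublin
// Core meta tags into the <head> of each record's page as the node is viewed.
function mycollection_dc_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // For the 'view' op, $a3 is the teaser flag and $a4 is the full-page flag.
  if ($op == 'view' && !empty($a4) && $node->type == 'digital_record') {
    // DC.title comes straight from the node title.
    drupal_set_html_head('<meta name="DC.title" content="' . check_plain($node->title) . '" />');

    // DC.date.created from a hypothetical CCK date field.
    if (!empty($node->field_dc_date_created[0]['value'])) {
      drupal_set_html_head('<meta name="DC.date.created" content="' . check_plain($node->field_dc_date_created[0]['value']) . '" />');
    }
  }
}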

Afternoon:

Afternoon soundtrack: Squarepusher - Hello Everything, Squarepusher - Just a Souvenir, Daft Punk - Discovery, Film Junk - Inception (spoilers portion of the podcast), BBC World Service podcast

Thought out the spec a lot more for the Dublin Core XML. Decided not to use CCK Computed Fields to make it happen. Don't know why I was thinking it would work, but it was one of those square peg in a round hole things. Contrary self - I could just make the peg round. Brainstormed a lot more with Matt (one half of our dynamic duo of programmers) on the Dublin Core XML idea. We agreed we would just create a quick module to handle creating the XML. This will be our first custom work with the new version of the site. Due to many problems with the last iteration (the current production version), I wanted to move as far away from custom code as possible, and we have been doing very well. But, this makes sense... maybe. There are always a million ways to solve something like this. Maybe tomorrow it will just be a View with a PHP snippet.
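To give a sense of the shape of the thing (this is not the module we will actually write - the names and fields here are hypothetical), a quick module really only needs a menu callback that spits out XML built from the record's CCK fields:

<?php
// Sketch only (Drupal 6, hypothetical names): expose node/123/dc.xml and hand
// back Dublin Core XML built from the record's fields.
function dcxml_menu() {
  $items['node/%node/dc.xml'] = array(
    'title' => 'Dublin Core XML',
    'page callback' => 'dcxml_output',
    'page arguments' => array(1),
    'access arguments' => array('access content'),
    'type' => MENU_CALLBACK,
  );
  return $items;
}

function dcxml_output($node) {
  drupal_set_header('Content-Type: text/xml; charset=utf-8');

  $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
  $xml .= '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">' . "\n";
  $xml .= '  <dc:title>' . check_plain($node->title) . '</dc:title>' . "\n";
  // Hypothetical CCK field holding the record's description.
  if (!empty($node->field_dc_description[0]['value'])) {
    $xml .= '  <dc:description>' . check_plain($node->field_dc_description[0]['value']) . '</dc:description>' . "\n";
  }
  $xml .= '</metadata>';

  print $xml;
}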

In the background of all the wretched coding on my part, I was again working with my favourite module - Views Bulk Operations (VBO)!!! With the first iteration of the site, we made a couple of decisions that I have come to regret. They are not earth shattering or anything, we just didn't set up some of the metadata fields how I would have liked them to be set up. For quite some time I've been trying to think of an easy way to merge some of them together. Epic MySQL query dreams! JOIN, JOIN, INSERT, UPDATE, WHERE, BLERG! Anyway, some wonderful soul wrote a merge fields action for VBO! So, in the background of all of today's work, I updated 14559 rows, a couple of times. It only took an average of 12153468ms each time!

Oh yeah, email was answered. Spheroidally.

library clouds in the sky with [diamonds]?

Bacon...

Sorry, had to get that out of the way. Those who know, know. Those who do not, oh well. I will address it later... subtly???

A while back we got hit with the perfect downtime storm. A RAID controller battery randomly failed, and I was down for quite a few hours. Then a day or two later ... a brown-out occurred. Somehow, some way, this killed the brand new RAID controller on the DB server, and disemboweled the RAID controller on the web server. I was down for almost a week awaiting repairs from vendors and IT. During this period of utter embarrassment and fury, I finally took somebody up on a long-standing offer to put all of my digital collections stuff on a BEEFCAKE server. I ordered my [twin node] BEEFCAKE and decided that high availability and redundancy was the way to go.

So, I began building a proof of concept: Tomax & Xamot [LAMP with a hint of wonderful Tomcat, Java, Solr, and Djatoka for blooming ideas] are my sinister production machines, with Heckle & Jeckle [HAProxy & Keepalived] providing the load balancing. After many hours, the proof of concept succeeded. Kill Apache and/or MySQL on Tomax, and Xamot will be right there, still fighting for Cobra Commander.

I've been sitting on BEEFCAKE for a week or so, almost ready to go to production. But for the last week, I have been diligent with my 99-part hearty diet of bacn, Batman, Green Lantern Corps, and Promethea. Combined with the nicotine patch, my head has been in the clouds - in a good way. I was pretty undecided about the Cloud for a long time, and Stallman's talk at U. of T. threw me even farther to one side of the fence [GPL loophole], but Fink's idea-machine-brain rambling on about creating a Cloud at LibMac (another possible proof of concept) started turning gears. (Side note: Fink is more rabid about Open Source than I am.) The collections within the Digital Collections (namely PW20C, Concentration Camp Correspondences, Bertrand Russell, Canadian Publishing, et al.) are sitting on a fair chunk of metadata begging for something to be done with it. Add that to the Mass Digitization Project (DC, METS/ALTO, and fingers crossed TEI), and EVERGREEN!!! Oh what to do, what to do???

Drupal & Digital Collection Sites - 1

I have written about Drupal & the Digital Collections site (http://digitalcollections.mcmaster.ca) a few times now, but haven't really explained how to make a digital collections site out of Drupal. So, without further ado...

What are the necessities of a digital collections site?

  • Descriptive metadata (Dublin Core plus collection-specific fields)
  • OAI-PMH support
  • Searching
  • Browsing
  • Image display (ideally with JPEG2000 support)

What are some additional features that have become necessary?

  • Tagging
  • Social Bookmarking
  • Faceted Searching
  • Visually rich environment
  • Profiles, internal site bookmarking
  • Contact forms, Image requests, Questions
  • Commenting
  • Content Recommendation

So how do you do all of this with Drupal - sans JPEG2000 support (working on that now)? Well, if you are familiar with Drupal, you should know that it is an open source, modular content management system with an amazing support & development community. A standard out-of-the-box Drupal installation will not yield a digital collections site - additional modules are absolutely necessary. Time, effort, and some coding will have to be done, but it is well worth it. The key to all of it is the Content Construction Kit (CCK). Briefly, CCK allows you to create your own fields for a node. So, here is where we get the ability to have all the standard Dublin Core fields, and any other unique metadata a collection will need to present. What I have done with my site is set up a Content Type for each collection. Each content type shares the standard Dublin Core fields (very helpful for massaging an OAI module for digital collections out of an available OAI module), and then each has its own unique additional metadata. For example, the World War II German Concentration Camp and Prison Camp Collection has metadata fields for Prison, Sub-Prison, Prison Block, etc.
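To make that concrete (the field names here are hypothetical, not the exact ones on my site), each record node simply carries its Dublin Core fields as ordinary CCK properties, so a node template for a record content type can print them directly:

<?php
// Illustrative only - in a node-digital_record.tpl.php, the shared Dublin Core
// CCK fields (hypothetical names) are right there on the $node object.
print check_plain($node->title);                             // the record's DC title
print check_plain($node->field_dc_creator[0]['value']);      // DC creator
print check_plain($node->field_dc_date_created[0]['value']); // DC date created
// Collection-specific extras, e.g. for the Concentration Camp collection:
print check_plain($node->field_prison[0]['value']);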

I have written about the OAI module a couple of times, but essentially what I did is take the OAI-PMH Module, which is an interface for the Bibliography Module, and rework it so it interfaces with the CCK fields I created for the standard Dublin Core fields. I have not had the time to generalize it (I hope to in the future, time willing!), so it is hard-coded to my collections right now.
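As a rough illustration of the rework (this is not the actual module code, and the field names are hypothetical), the heart of it is just a mapping from the shared CCK Dublin Core fields onto oai_dc elements when each OAI record is built:

<?php
// Sketch only: map shared CCK Dublin Core fields (hypothetical names) onto
// oai_dc elements for a single node.
function mycollection_oai_dc_map($node) {
  $map = array(
    'dc:title'       => 'field_dc_title',
    'dc:creator'     => 'field_dc_creator',
    'dc:description' => 'field_dc_description',
    'dc:date'        => 'field_dc_date_created',
  );

  $dc = array();
  foreach ($map as $element => $field) {
    if (!empty($node->{$field}[0]['value'])) {
      $dc[$element][] = check_plain($node->{$field}[0]['value']);
    }
  }
  // Fall back to the node title if there is no explicit dc:title field.
  if (empty($dc['dc:title'])) {
    $dc['dc:title'][] = check_plain($node->title);
  }
  return $dc;
}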

Searching is a built-in feature of Drupal. Drupal does a pretty good job of creating a search index for itself, as well as providing advanced searching features. With content types for each collection, users can limit their search to a specific collection or run a site-wide search.

Browsing a collection can be done by setting up categories and containers for a collection, then placing each record under a specific collection when creating the records, or doing a massive MySQL update query if you have imported a number of records to start with. Also, for custom browsable options I have used the Views module to create views for specific metadata fields, and limited them to a collection. In addition, the Faceted Search module allows you to list all of the fields you would like exposed to faceted search, thereby allowing a user to browse by a variety of field types.

Not too much to say about JPEG2000 support right now. There are two possible scenarios that I am brainstorming with. The first one is LizardTech. Before I started here, the Library had purchased a LizardTech Express Server license in order to display the MrSID images for the World War I trench maps. The new version of the LizardTech server supports JPEG2000, and has an API that I should be able to get Drupal to work with - fingers crossed! The other option is the aDORe djatoka open source JPEG2000 server. I planned on working on this at the Access 2008 Hackfest, but got distracted with SOPAC and Evergreen.

So, now for the rest - additional features...

Tagging is done with the Community Tags module, and tag clouds are created with Tagadelic.

Social Bookmarking is done with the

Faceted Searching is done with the Faceted Search module.

Visually rich environment is done with a variety of modules and custom template coding. Modules that assist in making this possible include: Views (and many Views sub-modules), Zen Theme, jQuery, Highslide, and Tabs & Quicktabs.

Profiles, internal site bookmarking... user accounts are a standard feature of any content management system. With Drupal we used a custom view and a user hook to allow registered users to bookmark any record to their account.
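Purely as a sketch of the idea (not our actual code - the table and function names are made up), the bookmarking side boils down to saving a (uid, nid) pair somewhere a View can list per user:

<?php
// Sketch only (Drupal 6): a menu callback that records a bookmark in a
// hypothetical {mybookmarks} table with uid and nid columns; a custom View
// can then list the current user's bookmarked records on their account page.
function mybookmarks_add($node) {
  global $user;
  db_query("INSERT INTO {mybookmarks} (uid, nid) VALUES (%d, %d)", $user->uid, $node->nid);
  drupal_set_message(t('%title added to your bookmarks.', array('%title' => $node->title)));
  drupal_goto('node/' . $node->nid);
}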

Contact forms, Image requests, Questions are done with the Contact Form module. Here users can ask questions about records, request images, and report any problems with the site or records.

Commenting is another built-in feature of Drupal. Comments are allowed on every record on the site. Unregistered/anonymous users have to deal with a CAPTCHA, whereas registered users do not.

Content Recommendation is done with the Content Recommendation Engine (CRE). This module interfaces with a number of other modules. The main one that I utilized is the Voting API. The Voting API combined with the CRE allows for a Digg-like feature on each record. Each record has a Curate It! link; items that have been "curated" are then featured on the Items Curated page. Drupal also has a popular content feature that I utilize.

So, that is pretty much it for the bullet points listed above. I will have another post or two about Drupal in digital collections: one featuring all of the modules that I take advantage of, and another covering any questions anybody has.