Library day in the life - 5 - Day 4

Wow, day 4 already. This week seems to be going by fast. Worked from home for a bit this morning and then took the train in again. Email this week has been miraculously low, probably from all the moves. The 5th floor is eerily empty; absolutely bizarre up there now.


Morning soundtrack: BBC World Service podcast, Aphex Twin - Drukqs, Metallica - Master of Puppets. (BTW, did you know Metallica did not produce any records after ...And Justice for All. All the other ones are an urban legend.)

More merges. Down to one standard subject field. Nearly down to one relation field and one coverage field. Should be done with that by the end of the day. Thank you again Views Bulk Operations!

Finished pulling quotes together for the potential Crombie family grant.

Lunch with a colleague. Finally get to collect on my World Cup bet. ¡¡¡ESPAÑA!!!


Afternoon soundtrack: nothing. absolutely nothing.

Thursday afternoons bring me down to the William Ready Division of Archives and Research Collections, where I work a reference desk shift once a week. I'm one of the few librarians who still work a help desk shift here. Not saying that it's a bad thing; I'm just still getting used to the idea of librarians not being on the help desk.

Same old brief afternoon routine: checked servers for updates, checked Drupal module updates, ran updates. New to the routine: committed yesterday's changes and updated the MUALA Bargaining Updates blog.

Read the press release put out by SkyRiver for their antitrust lawsuit against OCLC. This one may be interesting. Meanwhile, merges were constantly running in the background. Almost done.

Found an autographed first edition of Charles Bukowski's "It catches my heart in its hands : new and selected poems 1955-1963" down in research collections during my shift.


Library day in the life - 5 - Day 3

Day three started off with a GO Train that decided to arrive 20 minutes late. Three cheers for mass transit. The delay was a good thing; it gave me 20 extra minutes and I was able to finish Calvino's "Six Memos for the Next Millennium."


Morning soundtrack: BBC World Service podcast, Search Engine - Trolling 101, Funkstörung - Appendix

In the trenches of morning emails. ILL requests for theses to be made open access; therefore, said theses are made open access. Hooray open access! In the background, I queued up a few merges. Wait. Wait. Wait. Called Bepress support to work through some workflow issues with electronic submission of theses and dissertations with graduate studies. Very, very close to moving toward complete electronic submission of theses and dissertations!!! Lunch in my office, at my desk, as per usual.


Afternoon soundtrack: Funkstörung - Appetite For Discstruction, Plaid - Spokes, Quinoline Yellow - Cyriack Parasol, Telefon Tel Aviv - Map of What is Effortless

More hacking at the Dublin Core HTML headers. Error. No output. OMG, output! Not the right output. *FACEPALM* $creators != $creator. Pay attention to your variable names, and sometimes you have to explicitly iterate through your arrays, kids! (Thanks Matt!) Sloppy code below. Checked server logs, ran server updates, and downloaded and installed Drupal module updates on the dev server to round out the afternoon. Chaos Tools had quite a few new svn adds. No commits, since the svn repository disappeared for a bit with an office move :( ADVERTISEMENT: Check out my significant other's blog if you are interested in what Library Day in the Life is for a public children's librarian.

global $base_url;

// The path of the node
if ($node->path) {
  $node_path = $node->path;
}

$dc[] = '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />';
$dc[] = '<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/" />';

//DC TERM - Creator
foreach (element_children($node->field_creator) as $key) {
  $creators[] = $node->field_creator[$key]['value'];
}
foreach ($creators as $creator) {
  $dc[] = '<meta name="DC.creator" content="' . check_plain($creator) . '" />';
}

//DC TERM - subject
foreach (element_children($node->field_subject) as $key) {
  $subjects[] = $node->field_subject[$key]['value'];
}
foreach ($subjects as $subject) {
  $dc[] = '<meta name="DC.subject" content="' . check_plain($subject) . '" />';
}

//DC TERM - description
foreach (element_children($node->field_description) as $key) {
  $descriptions[] = $node->field_description[$key]['value'];
}
foreach ($descriptions as $description) {
  $dc[] = '<meta name="DC.description" content="' . check_plain($description) . '" />';
}

//DC TERM - publisher
foreach (element_children($node->field_publisher) as $key) {
  $publishers[] = $node->field_publisher[$key]['value'];
}
foreach ($publishers as $publisher) {
  $dc[] = '<meta name="DC.publisher" content="' . check_plain($publisher) . '" />';
}

//DC TERM - contributor
foreach (element_children($node->field_contributor) as $key) {
  $contributors[] = $node->field_contributor[$key]['value'];
}
foreach ($contributors as $contributor) {
  $dc[] = '<meta name="DC.contributor" content="' . check_plain($contributor) . '" />';
}

//DC TERM - Date
foreach (element_children($node->field_date) as $key) {
  $dates[] = $node->field_date[$key]['value'];
}
foreach ($dates as $date) {
  $dc[] = '<meta name="DC.date" content="' . check_plain($date) . '" />';
}

//DC TERM - Date
foreach (element_children($node->field_date2) as $key) {
  $date2s[] = $node->field_date2[$key]['value'];
}
foreach ($date2s as $date2) {
  $dc[] = '<meta name="DC.date" content="' . check_plain($date2) . '" />';
}

//DC TERM - type
foreach (element_children($node->field_type) as $key) {
  $types[] = $node->field_type[$key]['value'];
}
foreach ($types as $type) {
  $dc[] = '<meta name="DC.type" content="' . check_plain($type) . '" />';
}

//DC TERM - format
foreach (element_children($node->field_format) as $key) {
  $formats[] = $node->field_format[$key]['value'];
}
foreach ($formats as $format) {
  $dc[] = '<meta name="DC.format" content="' . check_plain($format) . '" />';
}

//DC TERM - Identifier
foreach (element_children($node->field_identifier) as $key) {
  $identifiers[] = $node->field_identifier[$key]['value'];
}
foreach ($identifiers as $identifier) {
  $dc[] = '<meta name="DC.identifier" content="' . check_plain($identifier) . '" />';
}

//DC TERM - Language
foreach (element_children($node->field_language) as $key) {
  $languages[] = $node->field_language[$key]['value'];
}
foreach ($languages as $language) {
  $dc[] = '<meta name="DC.language" content="' . check_plain($language) . '" />';
}

//DC TERM - Relation
foreach (element_children($node->field_relation) as $key) {
  $relations[] = $node->field_relation[$key]['value'];
}
foreach ($relations as $relation) {
  $dc[] = '<meta name="DC.relation" content="' . check_plain($relation) . '" />';
}

//DC TERM - Source
foreach (element_children($node->field_source) as $key) {
  $sources[] = $node->field_source[$key]['value'];
}
foreach ($sources as $source) {
  $dc[] = '<meta name="DC.source" content="' . check_plain($source) . '" />';
}

//DC TERM - Coverage
foreach (element_children($node->field_coverage) as $key) {
  $coverages[] = $node->field_coverage[$key]['value'];
}
foreach ($coverages as $coverage) {
  $dc[] = '<meta name="DC.coverage" content="' . check_plain($coverage) . '" />';
}

//DC TERM - Rights
foreach (element_children($node->field_right) as $key) {
  $rights[] = $node->field_right[$key]['value'];
}
foreach ($rights as $right) {
  $dc[] = '<meta name="DC.rights" content="' . check_plain($right) . '" />';
}

$created = strftime("%Y-%m-%d %H:%M:%S +01:00", $node->created);
$changed = strftime("%Y-%m-%d %H:%M:%S +01:00", $node->changed);
$dc_created = strftime("%Y-%m-%d", $node->created);
$dc_changed = strftime("%Y-%m-%d", $node->changed);

if ($created) {
  $dc[] = '<meta name="DC.date.created" content="' . $dc_created . '" />';
  $meta[] = '<meta name="dcterms.created" content="' . $created . '" />';
}
if ($changed) {
  $dc[] = '<meta name="DC.date.modified" content="' . $dc_changed . '" />';
  $dc[] = '<meta name="DC.Date.X-MetadataLastModified" content="' . $dc_changed . '" />';
  $meta[] = '<meta name="dcterms.modified" content="' . $changed . '" />';
}

$node_field[0]['value'] = implode("\n", $meta) . "\n" . implode("\n", $dc) . "\n";

Library day in the life - 5 - Day 2

Here goes day 2! Tuesday is generally my first day of the week physically at work, which usually means that I have lots of meetings. Thankfully I did not have an immediate morning meeting.


Morning soundtrack: Software Freedom Law Center - Episode 0x2C: Eben on Software Liability, Adult. - Resuscitation

Spent the entire morning working on more merges and trying to hunt down an expected deliverable from a vendor. Getting close to completing all of the merges. Once they are complete, Apache SOLR will be all the happier, as will I. We'll have some very nice facets set up for our new SOLR-powered search on digital collections.

Our dynamic duo of programmers both came through with interesting developments this morning as well. Debbie finished our retroactive date conversions. Lots of regular expressions!!! We were able to convert 10,600 records to machine-readable date ranges. Absolutely fantastic for SOLR faceting, sorting, and all the other fun stuff you can do with actual machine-readable date data!!! Matt continued hacking away at the Dublin Core XML module. He even managed to create a singularity this morning: something that not even user 1 can access. ACCESS DENIED!
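I didn't write Debbie's conversion, so purely as an illustration of the approach — the patterns and padding rules below are my own invention, not hers — a regex-driven date normalization might look something like:

```python
import re

def to_date_range(raw):
    """Convert a free-text date string into a (start, end) ISO range.

    Illustrative patterns only -- the real conversion handled far more
    variants than these three.
    """
    raw = raw.strip()
    # "1914-1918" -> span of full years
    m = re.match(r'^(\d{4})\s*-\s*(\d{4})$', raw)
    if m:
        return (m.group(1) + '-01-01', m.group(2) + '-12-31')
    # "ca. 1917", "c1917", "circa 1917" -> pad a year on either side
    m = re.match(r'^c(?:a\.?|irca)?\s*(\d{4})$', raw)
    if m:
        year = int(m.group(1))
        return ('%d-01-01' % (year - 1), '%d-12-31' % (year + 1))
    # plain "1917"
    m = re.match(r'^(\d{4})$', raw)
    if m:
        return (m.group(1) + '-01-01', m.group(1) + '-12-31')
    return None

print(to_date_range('1914-1918'))  # ('1914-01-01', '1918-12-31')
```

Once every record carries a real start and end date like this, SOLR range facets and date sorting come along for free.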

Lunch time brought about a union meeting. Hooray for MUALA!!!


Afternoon soundtrack: Venetian Snares - Rossz Csillag Allat Szuletett, GZA/Genius - Liquid Swords

ARL-ACRL webinar entitled, "Transitioning from Subscriptions to Open Access: Article Processing Fees and Combined Licensing/Author’s Rights Approaches." Pretty good, but at the same time preaching to the choir.

Pulled some quotes together for equipment for a potential grant. Fingers crossed!!!

Published another 100 or so open access theses and dissertations, and ran some batch conversions on some images. I <3 mogrify.

mogrify -format jpg *.tif

Hopefully tomorrow I can get back to some wretched coding and finish up a couple of outstanding items. Aphoristical.

Library Day in the Life - 5 - Day 1 - ORIGINAL TITLE!

So here we are again. Library Day in the Life number five! Monday is a work from home day. No audacious commute from Toronto to Hamilton today!


Morning soundtrack: BBC World Service podcast, TWiT - It's The New Sex Talk, CBC Spark Daniel Pink on Motivation 3.0, The Protomen - The Protomen

Caught up on a bunch of email from last week, and finally got around to setting up Drush. Don't know why I never got around to it before, but it is definitely worth the time if you manage a few Drupal sites. Watched a couple of screencasts on Drush by CivicActions to quickly immerse myself, then got around to updating modules for our dev site. Once everything was up to snuff, I started working on a couple of our final functional requirements for the new version of our digital collections site before we start theming it: allowing each record to have its own Dublin Core XML output, and adding some Dublin Core meta information to each record's HTML header output. Mind you, I am a horrible programmer.

The header output code was pulled mostly from this Computed Field php snippet example. I managed to get DC.title and DC.Date.X-MetadataLastModified working correctly, but the rest of the elements (description, source, format, etc.) were another beast entirely. I put off the Dublin Core XML until later in the day when I could rely on one of our programmers for assistance, because, mind you, I am a horrible programmer.


Afternoon soundtrack: Squarepusher - Hello Everything, Squarepusher - Just a Souvenir, Daft Punk - Discovery, Film Junk - Inception (spoilers portion of the podcast), BBC World Service podcast

Thought out the spec a lot more for the Dublin Core XML. Decided not to use CCK Computed Fields to make it happen. Don't know why I was thinking it would work; one of those square-peg-in-a-round-hole things. Contrary self - I could just make the peg round. Brainstormed a lot more with Matt (one of our dynamic duo of programmers) on the Dublin Core XML idea. We agreed we'd just create a quick module to handle creating the XML. This will be our first custom work with the new version of the site. Due to many problems with the last iteration (still the current production version), I wanted to move as far away from custom code as possible, and we have been doing very well. But this makes sense... maybe. There are always a million ways to solve something like this. Maybe tomorrow it will just be a View with a php snippet.

In the background of all the wretched coding on my part, I was again working with my favourite module - Views Bulk Operations (VBO)!!! With the first iteration of the site, we made a couple of decisions that I have come to regret. They are not earth shattering or anything; we just didn't set up some of the metadata fields how I would have liked. For quite some time I've been trying to think of an easy way to merge some of them together. Epic mysql query dreams! JOIN, JOIN, INSERT, UPDATE, WHERE, BLERG! Anyway, some wonderful soul wrote a merge fields action for VBO! So, in the background of all of today's work, I updated 14,559 rows, a couple of times. It only took an average of 12153468ms each time!
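The VBO merge fields action did the real work against MySQL; just to illustrate the shape of the operation, here is a toy sqlite3 sketch. The table and column names follow Drupal 5 CCK naming conventions, but the specific fields are assumptions, not our actual schema:

```python
import sqlite3

# In-memory stand-in for two CCK multi-value field tables; the real
# merge ran against MySQL via the VBO merge fields action, not this.
db = sqlite3.connect(':memory:')
db.executescript("""
CREATE TABLE content_field_subject  (nid INTEGER, delta INTEGER, field_subject_value TEXT);
CREATE TABLE content_field_subject2 (nid INTEGER, delta INTEGER, field_subject2_value TEXT);
""")
db.executemany("INSERT INTO content_field_subject VALUES (?, ?, ?)",
               [(1, 0, 'World War, 1914-1918')])
db.executemany("INSERT INTO content_field_subject2 VALUES (?, ?, ?)",
               [(1, 0, 'Trench warfare'), (2, 0, 'Aerial photography')])

# For each value in the field being merged away, append it to the target
# field, continuing that node's delta sequence so multi-value order holds.
for nid, value in db.execute(
        "SELECT nid, field_subject2_value FROM content_field_subject2 "
        "ORDER BY nid, delta").fetchall():
    (max_delta,) = db.execute(
        "SELECT COALESCE(MAX(delta), -1) FROM content_field_subject "
        "WHERE nid = ?", (nid,)).fetchone()
    db.execute("INSERT INTO content_field_subject VALUES (?, ?, ?)",
               (nid, max_delta + 1, value))

rows = db.execute("SELECT nid, delta, field_subject_value "
                  "FROM content_field_subject ORDER BY nid, delta").fetchall()
```

The delta bookkeeping is the fiddly part; get it wrong and Drupal shows multi-value fields out of order or drops values on save.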

Oh yeah, email was answered. Spheroidally.

Digitized books into the IR - workflow

This past week, we started depositing digitized books into our institutional repository instance for The McMaster Collection. As of this posting we have 216 books in the collection. However, currently these materials are only available to the McMaster community. This is completely out of my control, and I agree with what some of you may be thinking: "wait, out-of-copyright books are not available to the general public!?"

The workflow is a little complicated right now, but it is a beginning and will definitely be improved. Each digitized book has a specific set of outputs associated with it: one folder with a TIFF of each page, one folder with an OCR'd text file for each page, one folder for book metadata, and a searchable PDF. The metadata folder has a MARC record (.mrc & MARC21) pulled from WorldCat via Z39.50. Once we have a bulk of digitized books, we copy the MARC records to separate directories for processing. Our goal here is to parse the MARC records for certain fields (title, publication date, author, etc.) and dump them to a CSV file. We were able to do this by creating a Python script (code below) utilizing a library called pymarc. When the processing of the MARC records is finished, we take the output from the CSV and join (mostly copypasta) it with an XLS file produced by the batch import process for Digital Commons. Once the Digital Commons XLS is finalized, it is uploaded, and the Digital Commons system parses the XLS, grabs the PDFs from an accessible directory, and deposits the books.

Future plans...

Automate the copying of PDFs and MARC records via a shell script set to run on a cron. Similarly, once the files are moved, the Python script should begin processing the records.
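A sketch of that staging step, done here in Python rather than the shell script we will actually write, with hypothetical paths:

```python
import os
import shutil

# Hypothetical locations -- the real directories differ.
SCAN_OUTPUT = '/path/to/scanner/output'
STAGING = '/path/to/staging'

def stage_new_records(src=SCAN_OUTPUT, dest=STAGING):
    """Copy any new .mrc and .pdf files into the staging directory and
    return the list of files staged.  Meant to run from cron; the pymarc
    processing can then be kicked off on whatever was returned."""
    staged = []
    for name in sorted(os.listdir(src)):
        if not name.endswith(('.mrc', '.pdf')):
            continue
        target = os.path.join(dest, name)
        if os.path.exists(target):  # already staged on an earlier run
            continue
        shutil.copy2(os.path.join(src, name), target)
        staged.append(name)
    return staged
```

Skipping files that already exist in staging is what makes the cron job safe to run repeatedly.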

The bottleneck in the entire process is copying the output from the Python script to the Digital Commons XLS. The MARC records are *old* and not very pretty, especially the date field. Also, the output for the author from the Python script and the input required for the author in the XLS are quite different. The values entered by cataloguers in the author fields of the MARC records are not consistent (last name, first name or first name, last name), and the XLS requires the first name, middle name, and last name in separate fields. I foresee a lot of regex or editing by hand. :( - Matt McCollow -

#!/usr/bin/env python

import csv
from pymarc import MARCReader
from os import listdir
from re import search

SRC_DIR = '/path/to/mrc/records'

# get a list of all .mrc files in source directory
file_list = filter(lambda x: search('.mrc', x), listdir(SRC_DIR))

csv_out = csv.writer(open('marc_records.csv', 'w'), delimiter = ',', quotechar = '"', quoting = csv.QUOTE_MINIMAL)

for item in file_list:
    fd = file(SRC_DIR + '/' + item, 'r')
    reader = MARCReader(fd)
    for record in reader:
        title = author = date = subject = oclc = publisher = ''

        # title
        if record['245'] is not None:
            title = record['245']['a']
            if record['245']['b'] is not None:
                title = title + " " + record['245']['b']

        # determine author
        if record['100'] is not None:
            author = record['100']['a']
        elif record['110'] is not None:
            author = record['110']['a']
        elif record['700'] is not None:
            author = record['700']['a']
        elif record['710'] is not None:
            author = record['710']['a']

        # date
        if record['260'] is not None:
            date = record['260']['c']

        # subject
        if record['650'] is not None:
            subject = record['650']['a']

        # oclc number
        if record['035'] is not None:
            if len(record.get_fields('035')[0].get_subfields('a')) > 0:
                oclc = record['035']['a'].replace('(OCoLC)', '')

        # publisher
        if record['260'] is not None:
            publisher = record['260']['b']

        csv_out.writerow([title, author, date, subject, oclc, publisher])
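As for the author-name bottleneck above, a first-pass heuristic splitter might start like this. The rules here are illustrative only; real records will need many more (dates in the subfield, initials, corporate names, and so on):

```python
import re  # real rules will need regex; these starter heuristics do not

def split_author(raw):
    """Split a MARC 100$a-style name into (first, middle, last).

    Heuristic only: an embedded comma is read as 'Last, First [Middle]';
    otherwise the string is read as 'First [Middle] Last'.
    """
    raw = raw.strip().rstrip('.,')  # drop trailing MARC punctuation
    if ',' in raw:
        last, _, rest = raw.partition(',')
        last = last.strip()
        parts = rest.split()
    else:
        parts = raw.split()
        last = parts.pop() if parts else ''
    first = parts[0] if parts else ''
    middle = ' '.join(parts[1:])
    return (first, middle, last)
```

Both `split_author('Bukowski, Charles')` and `split_author('Charles Bukowski')` come out as `('Charles', '', 'Bukowski')`, which is exactly the inconsistency the XLS columns need smoothed over.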

McMaster University Librarians unionize

The McMaster University academic librarians have formed a new association – the McMaster University Academic Librarians Association (MUALA) – and decided to unionize under the Ontario Labour Relations Act.

On February 10, the association filed an application for certification with the Ontario Labour Relations Board. The Board held a secret ballot vote the following week, and a substantial majority of McMaster librarians voted in favour of certification.

Rick Stapleton was elected president of the new union. The other officers are Nick Ruest (vice-president), Nora Gaskin (treasurer) and Catherine Baird (secretary).

For more information, please contact:

Rick Stapleton
President, MUALA
McMaster University
905.525.9140 x.27885

Nick Ruest
Vice-President, MUALA
McMaster University
905.525.9140 x.21276

Additionally, please see:

insert title here - note to self, think of a title for this post!


email - lots of email. oh joy!

General project maintenance/managerial stuff. Investigate/fix workflow issues with the Mass Digitization project (the robotic book scanner). Hunt for a workstation for the contract worker who is working on the WWI digitization project. Do my due diligence on that massive black cloud looming.

Today is Wednesday. Wednesday is my weekly research help desk shift. Each week I have mixed feelings about my time on the desk. I am only on the desk two hours each week, and have never really experienced that whole "being a real librarian" thing. Generally my time is spent directing students to the IT help desk, the bathrooms, or the floor where a particular call number is located. On the rare occasion, I get an actual research help question. Those are fun, but too few and far between. In the next year or so I will probably not have a research help desk shift anymore, since we are moving to a blended service model. I'm still not sure how I really feel about it.


Investigation into questions raised in IR Working Group meeting from yesterday.

Begin preliminary planning of questions pertaining to providing retrospective open access to previously published works by faculty members. Commence opening a giant can of worms. I also had the backend structure set up for the research centre/institute mentioned yesterday in our IR. This is our first major collaboration with a large section of the faculty. Needless to say I am pumped!!!

Run around frantically trying to find people to finish a couple last minute things. In the process, the fire alarm in the library goes off as I am walking out of the book store to the library. I have no coat on. I am cold.

The rest of the afternoon was spent battling the atemporal destruction of my inner-being by the just right hand of the Flying Spaghetti Monster, holy bacon. Otherwise known as fighting the good fight. Hopefully Slartibartfast will be there to help along the way.

Ponder, Parse, Ponder, Parse...



Meetings all day. Will everything go better than expected, or will I rage?


email - nope, I'm in meetings all day.

Got into work and discovered the contract worker for the giant 25,000 object digitization project started yesterday and nobody told me.


Checked in the worker and made sure that she was provided with proper documentation regarding file-naming conventions, scanning requirements, and storage.

Liaison meeting - teaching with iClickers.

Preliminary meeting with the Science and Technology Center for Archaeological Research project to plan out their research centre in the Institutional Repository. Lots of exciting things were discussed. They are very interested in Open Access, so I gave them some SPARC brochures, and made sure they were aware of the Open Access Addendum for submitting articles to journals. Should see some progress with this project very soon!

Digital Collections - Functional Requirements Meeting (site redesign). Finally! Remember all those posts from the last Library Day in the Life where I was talking about moving to Drupal 6 and instituting a bunch of new features??? Well, some things have changed, but we are going to do all of them and more, including a complete site redesign from the ground up.

IR Steering Committee - Iteration 2, hereafter referred to as the IR Working Group. That, my friends, is a mouthful. Communication, workflow, advocacy, education. I'm the chair of this committee and gathered a new group of people together to move forward with the institutional repository. The meeting went very well; we have a good game plan for moving forward and a lot of positive plans of action that should be taking place shortly. PRO-GRESS!

Rocked out to Bad Brains for the commute home.

Everything went better than expected.


The ultimate question when working from home - When do I put pants on?

I should just write a script that pulls from all of these librarydayinthelife & #libday4 tags and make it write a post for me.


Email. Surprisingly not that much for the morning. Hopefully the trend stays that way throughout the day.

Podcast Monday! TWiT, Spark, Quirks & Quarks. Anybody else find Calacanis really annoying when he is on TWiT?

Fink put me on to a Python course from MIT. I really want to be a better programmer, but there are too many hats that I have to wear at work. :(

No new bugs in redmine for digital collections.

Digital collections had a couple of modules that needed to be updated. Updates complete.


Surprisingly very little email. I like this trend for the day.

Back to hacking away at getting jPlayer working on some PW20C Case Studies. Last week I switched the embedded videos on the case studies (1, 2, 3) from a really crappy unsupported module to just embedding them with the Embedded Media module and Vimeo. That worked out quite well. Once I get jPlayer working, I can put PW20C and HPCANPUB out to pasture.

Further work with jPlayer (still haven't got it to work on dev-pw20c) on digital collections. Setting the player up to work with HTML5 and Ogg Vorbis. Gah, I hate Flash!!!

Late afternoon.

Code refuses to work properly. I would like to bash my head on the desk, but that is not a good idea. Maybe I should just make a rage comic out of this.

Code not doing as it is told or I am an idiot!

Well, I guess Matt has to stare at the code like this o_0 and it magically works now. Chalk it up to another drupalnomoly.

Pants go on as soon as you have your first smoke of the day.

Karate Chop of Love


New Stuffs on the Horizon...

Now that Historical Perspectives of Canadian Publishing is all finished up we have time, albeit a small amount of time, to concentrate on other portions of the Digital Collections site, and other collections.

World War, 1939-1945, German Concentration Camps and Prisons Collection is nearly complete. Only a few boxes remain to be scanned. The next portion of the project is World War, 1939-1945, Jewish Underground Resistance Collection. This collection is predominantly from 1941-1944 and will contain 325 items. The finding aid for this collection is located here. These collections are two parts of a larger overall project, The Virtual Museum of the Holocaust and Resistance, which is to come much later. That site will be a separate site which pulls from the digital collections site.

Another project that will take a bit more time, but will be an excellent resource once complete, is the migration of the World War I Maps & Aerial Photography collection over to the digital collections site. This will also include approximately 900 more trench maps. The collection will retain the use of the MrSID format and the LizardTech MrSID delivery server. But we will also be including JPEG2000 versions of each map & aerial photo, and those will be served up with a new Djatoka image server that our team is working on implementing. Open source > Proprietary :D

The major background project that we will be working on is an upgrade from Drupal 5.x to Drupal 6.x, and cleaning up our code base. Moving to Drupal 6 will provide us with some major improvements, namely RDFa support, which I am the most excited about! We will also be working on a solution that will allow our catalogue to pull from our collections, thereby allowing users to search all of our collections at once from the library catalogue.

Keep an eye on the site. I will announce things once they are implemented. Maybe there will be a site redesign in there too!
