DPLA Appfest Drupal integration

Below is the output of the little project I worked on today at the DPLA Appfest. It definitely isn't a perfect solution to the problem, and it is not a drop-in module that just grabs a collection from the DPLA API and "curates" it in your library's Drupal site. I hate reinventing the wheel, especially if there are existing modules that can solve the problem for you. Moreover, as one of the few people that still respects what OAI-PMH does, I think it would be worth considering using DPLA as an OAI-PMH provider. But I'm not sure if that is technically legal in OAI-PMH terms, given that they are most likely harvesting via OAI-PMH themselves. Don't want to get into an infinite regress of metadata providers. S'up dawg? All jokes aside, I think OAI-PMH would be a better solution than what I tossed together, because it would make harvesting a "set" a hell of a lot easier. My 2¢.

I also have a live demo of it living on my EC2 instance. I've ingested 2000 items from the API, and decided to throw them into a Solr index just to demonstrate the possibilities of what you can do with the ingested content.

Finally, a big giant thank you to DPLA and the Chattanooga Public Library for putting this on, and for the wonderful hospitality. This was absolutely fantastic!


Drupal module or distribution

Your Name: Nate Hill

Type of app: Drupal CMS

Description of App: Many, many libraries choose to use Drupal as their content management system or as their application development framework. A contrib Drupal module that creates a simple interface for admin users to curate collections of DPLA content for display on a library website would be useful.



I don't like recreating the wheel. So, let's see what contrib modules already exist, and see if we can just create a workflow with those to start with. It would be really nice if DPLA had an OAI-PMH provider; then you could just use CCK + Feeds + Feeds OAI-PMH.

Example: bitly.com/VXMvMr


  • CCK

    drush pm-download cck

  • Feeds

    drush pm-download feeds

  • Feeds - JSON Parser

    drush pm-download feeds_jsonpath_parser
    cd sites/all/modules/feeds_jsonpath_parser && wget http://jsonpath.googlecode.com/files/jsonpath-0.8.1.php


  • Create a Content Type for the DPLA content you would like to pull in (admin/content/types/add)
  • Create DPLA metadata fields for the Content Type (admin/content/node-type/YOURCONTENTYPE/fields)
  • Create a new feed importer (admin/build/feeds/create)
  • Configure the settings for your new feed importer
    • Basic settings:
      • Select the Content Type you would like to import into
      • Select a frequency at which you would like Feeds to ingest
    • Fetcher: HTTP Fetcher
    • Processor: Node processor
      • Select the Content Type you created
      • Mappings (create a mapping for each metadata field you created)
        • Source: jsonpath_parser:0
        • Target: Title
    • Parser: JSONPath Parser
      • Settings for JSONPath parser:
        • Context: $.docs.*
  • Construct a search you would like to ingest using the DPLA API
    • ex: http://api.dp.la/v1/items?dplaContributor=%22Minnesota%20Digital%20Library%22
  • Start the import! (node/add/YOURCONTENTTYPE)
  • Give the import a title... whatever your heart desires.
  • Add a feed url
  • Click on JSONPath Parser settings, and start adding all of the JSONPaths
  • Click save, and watch the import go.
  • Check out your results
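If you'd rather see what that workflow amounts to in code, here is a rough Python sketch of what the importer ends up doing: fetch a page of DPLA search results and walk `$.docs.*` for a field. The field layout I use here (`docs`, `sourceResource.title`) is my assumption about the API response shape; adjust it to match whatever mappings you actually created.

```python
import json
import urllib.request

def extract_titles(response):
    """Walk $.docs.* and collect each doc's title, if present."""
    titles = []
    for doc in response.get("docs", []):
        title = doc.get("sourceResource", {}).get("title")
        if title:
            titles.append(title)
    return titles

def fetch_items(url):
    """Fetch a DPLA API search result as a parsed json dict."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (hits the network):
# url = "http://api.dp.la/v1/items?dplaContributor=%22Minnesota%20Digital%20Library%22"
# print(extract_titles(fetch_items(url)))
```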

York University Libraries Open Access Week 2012 - #blogvsbook

Yesterday, York University Libraries held a debate in the Scott Library entitled, "Be it resolved the blog replace the book?" The debate turned out pretty awesome, and somehow the team arguing for the book won!? (Some might say it was because of @adr's compelling closing statements.) 

Along with livestreaming the debate on ustream, I pulled together (a special thanks to Ed Summers, and his very permissive licensing) a little node.js application to display a "twitterfall" of the hashtag for the event. As is always the case, technology is bound to fail, somehow, someway, at a live event. Turns out that we owe a very special thank you to the giant Amazon outage, which in turn took out Heroku's infrastructure. Good thing my paranoia urged me to use a backup application to snag the archive for the stream, with all of the variations on the hashtag.

Enough about the debate, and Amazon's large internet burp! What I really want to talk about is some fun ways to play with the data we collected from the Twitter API. The backup application I mentioned earlier has some nice visualizations incorporated in it, and it is a pretty slick and simple application to use. But, most importantly, I have a csv (deposited in the OCUL Dataverse site) of all the tweets, for all the hashtags I could figure out. Which means we (yes you! Download the csv and have fun with this too!) can start doing some cool visualizations.

Inspired by @Sarah0s' "Dead easy data visualization for libraries" talk at AccessYUL, I decided to play with infogr.am to see how easy it would be to toss together a visualization of the number of tweets per user.

This is a fairly basic and easy one to make. You only need two columns: twitter usernames, and corresponding number of tweets. Once you have those entered, just hit publish, and you're good to go. 

So, that is something quick and easy. I have "Designing Data Visualizations" on the way. Hopefully that inspires me a bit more, and maybe I'll start playing with d3js again. Should be fairly straightforward to drop the csv into Google Refine and get some json back. In the interim, I'll just leave it up to Bill Denton to show us some really cool stuff with the data in R.
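For the curious, getting those two columns out of the csv is a small Python job. This is just a sketch, not the backup application's code, and the `from_user` column name is an assumption; check the csv header before running it.

```python
import csv
from collections import Counter

def tweets_per_user(rows, user_field="from_user"):
    """Count tweets per username from an iterable of dict rows."""
    counts = Counter(row[user_field] for row in rows if row.get(user_field))
    return counts.most_common()  # sorted by tweet count, descending

# Usage, assuming the archived csv sits next to this script:
# with open("blogvsbook.csv", newline="") as f:
#     for user, n in tweets_per_user(csv.DictReader(f)):
#         print(user, n)
```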

iaTorrent update OR Learning by reading code

Last week, inspired by a meeting, I started tossing together a little Python program to solve a problem. It wasn't perfect. It was warty. But I think I have something worthwhile now. Or, at least, useful for me -- it gives you what you want, and writes to a log when something goes wrong.

What I really want to do here is just take a moment to sing the praises of learning by reading code. Heading into this little project, I had a basic idea of what I wanted to do, and I knew something like this could be done given Tim's project. I knew that I wanted to make this a module and set it up on PyPI, but I had really no idea how to do so. But! I knew of somebody who did, and who is quite prolific in my mind. Ed making his code available on GitHub (and using very open licenses) made it possible for me to learn how to build the structure for a Python module, the structure for writing tests, and how to use argparse/optparse correctly.

So, here is to learning by reading code!

IA Torrent

Yesterday, in a meeting of our Digital Initiatives Advisory Group, we were discussing which collections we should consider sending over to the UofT Internet Archive shop, and I asked an innocent newbie question - So, do we have copies of everything we have had the Internet Archive digitize?


No big deal. We're in the infant stages of creating a digital preservation program here, and everything that comes with it. INFRASTRUCTURE!

I knew Tim Ribaric over at Brock University wrote an Internet Archive scraper a while back, so I knew it would be possible to get our content if need be. That, combined with the Internet Archive announcement a little over a month ago about making torrents available for items in the Internet Archive, inspired me to whip together a Python script to grab all the torrents for a given collection.

Last night I threw together a little proof-of-concept grabbing the RSS feed on the York University Libraries Internet Archive page using BeautifulSoup and some ugly regex.

This morning, still inspired and brainstorming with Dan Richert, I started poking around for different ways to get at our collection. The Internet Archive's advanced search is super helpful for this, and I can get the results as json! So, no regex; as Dan told me, "if you solve a problem with regex, you now have two problems."

On the advanced search page, you will need your query parameters. You can grab those from the 'All items (most recently added first)' link on a collection page. For example, the York University Libraries collection query parameters:

(collection:yorkuniversity AND format:pdf) AND -mediatype:collection

Then select your desired output format and number of results -- 2608 for me, given the number of items in the collection. Then you end up with some json like this:

         "qin":"(collection:yorkuniversity AND format:pdf) AND -mediatype:collection",
         "q":"( collection:yorkuniversity AND format:pdf ) AND -mediatype:collection;",
            "title":"Revised statutes of Ontario, 1990 = Lois refondues de l'Ontario de 1990",
            "title":"Essai philosophique concernant l'entendement humain : ou l'on montre quelle est l'etendue de nos connoissances certaines, et la maniere dont nous y parvenons",
            "title":"Essai philosophique concernant l'entendement humain : où l'on montre quelle est l'étendue de nos connoissances certaines, et la manière dont nous y parvenons",
            "title":"Essai philosophique concernant l'entendement humain, : ou l'on montre quelle est l'etendue de nos connoissances certaines, et la maniere dont nous y parvenons.",

(make sure you lop off '&callback=callback&save=yes' at the end of the url). Once you have the url for the json, it is pretty straightforward from there. You just call the script like so:

ia-torrent.py 'http://archive.org/advancedsearch.php?q=%28collection%3Ayorkuniversity+AND+format%3Apdf%29+AND+-mediatype%3Acollection&fl%5B%5D=identifier&fl%5B%5D=title&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=2608&page=1&output=json' '/tmp/ia-torrent'
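For the curious, the guts of the script boil down to something like this sketch: parse the advanced search json, then fetch one torrent per identifier. The `identifier_archive.torrent` URL pattern is my reading of the Internet Archive's torrent convention; verify it against a known item before trusting it.

```python
import json
import os
import time
import urllib.request

def torrent_url(identifier):
    """Build the torrent URL for an Internet Archive identifier."""
    return ("http://archive.org/download/%s/%s_archive.torrent"
            % (identifier, identifier))

def identifiers(search_json):
    """Pull the identifier of every doc in an advanced search response."""
    return [doc["identifier"] for doc in search_json["response"]["docs"]]

def grab_torrents(search_url, dest, delay=15):
    """Fetch the search json, then download one torrent per item."""
    with urllib.request.urlopen(search_url) as resp:
        results = json.loads(resp.read().decode("utf-8"))
    os.makedirs(dest, exist_ok=True)
    for ident in identifiers(results):
        out = os.path.join(dest, ident + ".torrent")
        urllib.request.urlretrieve(torrent_url(ident), out)
        time.sleep(delay)  # be polite to the Internet Archive's servers
```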

Caveats! I haven't been able to download all the torrents for an entire collection yet. It looks like the Internet Archive's servers don't like the number of requests, and the script dies out with:

'IOError: [Errno socket error] [Errno 111] Connection refused'

I've tried throttling myself in the script at 15 seconds per request, and I still get cut off. If anybody knows whether the Internet Archive has any published request rates, or has a better idea for implementing this, please let me know! Add a comment, or fork + clone + pull request. Patches are most welcome!
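One idea I may try: swap the fixed sleep for an exponential backoff, retrying each failed request with a longer wait. A sketch of that approach, with made-up defaults since the Internet Archive has no published request rates that I know of:

```python
import time
import urllib.request
import urllib.error

def backoff_delays(base=15, factor=2, retries=5, cap=300):
    """Delay schedule: base, base*factor, ... capped at `cap` seconds."""
    return [min(cap, base * factor ** i) for i in range(retries)]

def fetch_with_backoff(url, **kwargs):
    """Try a request repeatedly, backing off after each failure."""
    last_error = None
    for delay in backoff_delays(**kwargs):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError) as e:
            last_error = e
            time.sleep(delay)  # wait longer before the next attempt
    raise last_error
```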

Big thank you to Dan Richert for the impromptu crash course on parsing json this morning!!!

Islandora development visualization

Hit a bit of a wall yesterday getting checksums working when ingesting content into Islandora, so I made a Gource video of the Islandora commits in my fork of the git repo.

Music by RipCD (@drichert) and myself.

How'd I do it?

  1. I wanted to use the Gravatars, so I used this handy little perl script.
  2. Hopped into the Islandora git repo, and ran:

    gource --user-image-dir .git/avatar/ -s 3 --auto-skip-seconds 0.1 \
        --file-idle-time 50 --max-files 500 --disable-bloom --stop-at-end \
        --highlight-users --hide mouse --background-colour 111111 \
        --font-size 20 --title "Islandora Development" \
        --output-ppm-stream - --output-framerate 60 \
        | avconv -y -r 60 -f image2pipe -vcodec ppm -i - -b 8192K ~/Videos/islandora-gource.mp4

  3. Then I used OpenShot to add the music and uploaded to YouTube.

FITS and Islandora integration

Digital preservationistas rejoice?
I managed to get FITS integration working in Islandora via a plugin. The plugin will automatically create a FITS XML datastream for an object upon ingest in the Islandora interface for a given solution pack. Right now I have it working with the Basic Image Solution Pack, Large Image Solution Pack, and PDF Solution Pack. You just have to make sure fits.sh is in your Apache user's path (thanks @adr). [UPDATE: Works with the Audio Solution Pack now.]
What I had feared was going to be a pretty insane process turned out to be fairly simple and straightforward, which I'll outline here.

  1. I looked at existing plugins for something similar that I could crib from, and found something suitable in the exiftool plugin, which is used in the audio and video solution packs.
  2. Using the existing plugin, I ran some grep queries to figure out how it is used in the overall codebase (Islandora, and solution packs). 
  3. Created a feature branch
  4. Hammered away until I had something working. (Thanks @mmccollow)
  5. Created an ingest rule for a solution pack. This tells the solution pack to call the plugin.
  6. Test, test, and test.
  7. Merged the feature branch into the 6.x branch, pushed, and opened up a pull request.

That is basically it. Let me know if you have any questions. Or, if you know of a way to make it even better, patches welcome ;)
[Update #2]
I've added a configuration option to the Islandora admin page to enable FITS datastream creation, and the ability to define a path to fits.sh. I put it in the advanced section of the admin page which is not expanded by default. This will probably be problematic, and folks won't notice it. It might be a better idea to collect all the various command line tools Islandora uses, and give them all a section in the admin page to define their paths.
I also have FITS creation working with the Video Solution Pack now. Up next, Islandora Scholar... just have to get that up and running ;)

Just an easy way to help out!

Do you have a computer or server sitting around running Linux, and want to help out a good cause? Then you should check out one of the active Archive Team projects. "Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage." Basically, this wonderful group of people monitors the internet for sites/services that are about to disappear, and does its absolute best to make sure these things are preserved.
I wish I had gotten involved with this earlier, but as I was walking out the door from McMaster I stumbled across the MobileMe project and figured I could throw some bandwidth at it. If you are comfortable with installing software on the command line, a little bit of git, and compiling, then you should throw in a helpful hand! (Or, if you are a bad ass programmer, you can help write some scripts for projects that need a hand.) The instructions on each project page are pretty straightforward. If you have a question or need a hand to get going, pop into the IRC channel or just ask me.

Why you should attend Digital Odyssey 2012!

I am really excited about this year's Digital Odyssey, and I want you to be too. Our theme this year is Liberation Technology. I know it might not be quite obvious why we picked this theme, but hopefully it will make some sense in a few sentences.
Why Liberation Technology? I, along with the OLITA Council, am passionate about social justice issues, and find the intersection of technology and social justice intriguing - even more so when you throw library land into that intersection. We felt that, with the many social justice issues intersecting with technology and making headlines (the Arab Spring, Anonymous, and the Occupy movement), focusing specifically on liberation technology would make a timely topic for this year’s Digital Odyssey. We have defined liberation technology as "... a field of study [that] seeks to understand how information technology can be used to pursue a variety of social goods. This includes any technology that enables citizens to express opinions, deepen participation in society, and expand their freedoms."
We have a wonderful and passionate keynote speaker, Kate Milberry, who will be opening up and setting the tone for the day giving a talk entitled, "The Knowledge Factory Hack: From Open Access to Anonymous ...or why information wants to be free."
Following the keynote, we have an amazing day of great speakers lined up, and you should totally come!
Regular format talks:

  • Fiacre O’Duinn - Inside the Black Box: Hacker culture, librarians and hardware
  • MJ Suhonos - Open Data is dead!  Long live Open Data!
  • Jutta Treviranus - Outside-In
  • Sarah Wiebe and Shelley Archibald - A TORid Affair: Librarians, Ethics & Liberation Technology.
  • Carys Craig - Encouraging Creativity or Diminishing Dialogue? A Critical Account of Digital Copyright in Canada

Thunder talks:

  • Hannah Turner - Indigenous Knowledge
  • Michelle Thomson - The People’s Library
  • Rebecka Sheffield - Citizen Archivists: Transcribing History for Future Generations
  • Maggie Reid - A balanced look at Copyright Reform?



Yesterday I had the privilege to speak on a panel with Kristin Hoffman of the University of Western Ontario and Marc Richard of McGill University entitled, "Academic Librarians on the Front Lines", at the "Academic Librarianship - A Crisis or an Opportunity?" symposium at the University of Toronto. We were each given 15 minutes to address the following questions: What are some of the first-hand experiences of academic librarians working at institutions where academic librarianship is under threat? What lessons can be learned?
Given the tumultuous nature of labour relations at McMaster, I prepared a statement that was read. The majority of that statement is information that is already available publicly on our union's website. The following is the statement that I read.

What I plan on briefly talking about is, who and what we are, what we have done and what we have dealt with thus far.
Who and what we are
We are McMaster University Academic Librarians Association, a union that currently represents 18 academic librarians. The unit is split between two library systems: McMaster University Library with 11, the Faculty of Health Sciences Library with 7. These 18 librarians serve a University population of 21,173 full-time undergraduate students and 3,025 full-time graduate students (2009-2010), as well as 894 full-time instructional faculty members—1,434 including clinical faculty.
The bargaining unit does not include Associate University Librarians, the University Librarian, the Health Sciences Library Director, or any librarians not funded by McMaster University.
Four associate university librarians and the university librarian manage the 11 librarians in the University Library, and the Faculty of Health Science Library Director manages the 7 Health Sciences librarians.
At the time of certification, March 16, 2010, MUALA represented 27 librarians.
Collective Agreement
We ratified our first collective agreement in March of this year. The agreement is for a 5 year term ending July 31, 2015. As this is a first agreement, it is an attempt to enshrine past practices from when the librarians were part of MUFA, provide transparency, and codify other practices not previously in place.
Representation from both campus library systems and from CAUT was present on the bargaining team.
Throughout the preparations for, and during the process of negotiation, all members were quick to attend meetings, prepare documentation and provide input as required.
During negotiations in February, cuts to the librarians’ salary budget were announced, which I will touch on shortly.
At present, MUALA is still awaiting the opportunity to officially sign and distribute the collective agreement.
Most recent layoffs
In February of this year, in the midst of bargaining for our first collective agreement, MUALA was informed by the University that it had “experienced a significant change in its financial circumstances, which now necessitates certain cost reductions within the bargaining unit.” MUALA reluctantly signed a “Voluntary Departure Program” agreement in which members who have attained the ‘Rule of 80’ (age + years of service = 80 or more) were offered an early retirement package. As of May 1, five librarians accepted the package and have retired.
The agreement also stated that if “the Program results in insufficient cost reductions the Parties agree that they will meet to negotiate the terms of further reduction initiatives.” We were advised by the University administration on May 4 that the “cost reductions” had fallen short of the target by over $80,000. Shortly thereafter, one of our members announced her resignation in order to accept a position at another institution. Her departure means that the University’s target has been sufficiently met, so no further reductions are necessary.
These developments mark the second time in his short tenure at McMaster University that University Librarian Jeffrey Trzeciak has overseen the reduction of librarian positions as a means of dealing with budget problems. Just two years ago, the University Librarian announced a voluntary separation package that resulted in the departure of two librarians, as well as other library staff. Shortly thereafter, he announced that two other librarian positions were declared ‘redundant’. To our knowledge, these 2009 separations marked the first time in recent years that a University Library in Ontario implemented librarian dismissals as a means of dealing with budget problems.
Meanwhile, the Faculty of Health Sciences Library here at McMaster—which does not report to the University Librarian—has had balanced budgets without layoffs during the same period.
Interestingly, during this same time period, the Faculty of Health Sciences library was able to add another full-time continuing appointment position.
Low morale is a major issue at McMaster. More so in the University Library. We could arguably have the lowest morale of all university libraries in Canada.
The morale issue runs deep, and cannot be pinpointed to a specific event. But, over two years ago, prior to unionization, the McMaster Librarians (MUFA) held a vote of no confidence for University Librarian Jeffrey Trzeciak. That vote was unanimous. To this day, that unanimous vote of no confidence has yet to be addressed.
This vote was brought up at our most recent Labour Management Committee meeting, with an overall agenda item of morale. In that meeting we stressed that the low morale has been caused by, but not limited to, the events leading up to unionization, the departure of talent from the University Library, poor communication in general from University Library administration, the physical and emotional demeanor of University Library MUALA members, the lack of recognition of the role of its members, and the damage to McMaster’s reputation.
The best the library administration could suggest/offer was for the University Librarian, Jeffrey Trzeciak, to meet with us individually. This was brought to the MUALA members at the next members' meeting. Given the complete lack of trust, and the culture of fear and intimidation, it was no surprise that the members put forward and passed the following motion:

“MUALA members state that given that we have a long standing motion of non-confidence in the University Librarian, and given that we've undertaken a review of the University Librarian that we have submitted to the President and Provost, and given that the President and Provost have established a review of the University Library, we would prefer to defer any meetings with the subject of morale until the University Library Review is completed, after which we anticipate meeting with the University Librarian in a structured and mediated format.”

I will be very interested to see the results of the CAUT Academic Librarians Stress Survey. Specifically, the numbers from McMaster.
There isn’t much I can really say here, other than just stating the facts to my understanding.
McMaster University Library currently has 6 post docs employed -- by far the most of any Council on Library and Information Resources (CLIR) affiliated institution -- and one outstanding post doc job posting. Two of these are affiliated with the Centre for Leadership in Learning, and it is not clear if those post docs are directly involved with the library or not. But here are some brief summaries for a few of the positions, and a segment from the potential 7th post doc job posting.
1. Will be researching and designing a professionalization and teaching program for graduate students involving the collaborative efforts of the library, the School of Graduate Studies, and the Centre for Leadership in Learning. 
2. Will be investigating best practices in teaching and learning in the field of psychology that relate specifically to models of online-instruction and assessment, evaluation of library resources, and instructional resources.
3. The candidate will be expected to work with the library to:

All of this is very intriguing given that McMaster University Library effectively disbanded our liaison program this past summer via another round of separations. At face value, this appears to be an attempt at systematically replacing librarians with postdocs.
I also want to be very clear here.
Systematically replacing librarians with post-docs is *not* cool.
University Librarian Review
Jeffrey Trzeciak’s appointment as McMaster University Librarian reached the 5 year mark this year. MUALA believes that directors of libraries should be subject to comprehensive 5-year reviews. Accordingly, we undertook a representative opinion survey of our membership in October, 2010, using a survey instrument published by the Association of Research Libraries. The survey consisted of 52 questions addressing five key aspects: vision, leadership, administration, communication and effectiveness, and was completed by 22 of the 25 MUALA members. The University Librarian received an overall performance rating of ‘poor’ from 16 of 22 respondents, and a rating of ‘fair’ from the remaining 6 respondents; he did not receive any overall ratings of ‘excellent’ or ‘good’.
The Meeting with the President and Provost regarding the review
The MUALA executive requested a meeting with McMaster University President Patrick Deane and Provost Ilene Busch-Visniac in order to present them with copies of the resulting report. Our request was granted, and the meeting was held on March 22. The executive summary of the report we presented to them is available on our website.
At the meeting, we requested that: (1) the University implement a formal review of the University Librarian’s first 5 years at McMaster; (2) our report be included as part of this review; and (3) regardless of 1 and 2, we receive a formal written response from the President and Provost on the concerns raised in our report. We were informed by the Provost that the University Librarian had already been re-appointed, and that a review had been conducted by the Provost last autumn. We expressed disappointment that MUALA had not been invited to participate in the review.
The President and Provost thanked us for our report and promised to reply by the end of April. I would like to emphasize that the tone of the meeting was open and cordial.
The Written Response of the President and Provost regarding the review
The MUALA executive received a written reply from the President and Provost on April 11 (the memorandum was dated April 6). They stated their agreement with us that the renewal of a University Librarian appointment “ought to follow a process similar to that for Deans” and “incorporate a review by a duly appointed committee.” The implementation of such a process “should certainly be in place prior to the next opportunity for renewal of the contract for the University Librarian.”
Regarding our overall concerns, they went on to say: “we have made recommendations to Jeff that we hope will address some of the issues raised in the report. We are optimistic that actions Jeff has agreed to take will enable the Librarians to work more effectively with him as a team.”
University Library Review
In June, then-president Rick Stapleton received a letter, stemming from the MUALA University Librarian review communication thread, announcing a review of the University Library.
The letter states that the President and Provost are "convening a review of the University Library." While they "have not typically conducted ... reviews of the Library ... this is an oversight that needs to be corrected." They will establish a "review committee" that will "be consulting widely". They will "meet with librarians (and other library staff) when they are on campus." Their report "will be made public and shared through our governance system in a manner identical to that used for academic program reviews." In addition, the President and Provost will solicit from MUALA "suggestions for external reviewers". They go on to say: "While we do not normally seek approval by interested parties of the members of a review committee, it is our intention to strike a committee that is acceptable to MUALA, to the management team of the Library, and to the broader campus community. It is our sincere hope that the review team will then be able to advise us unencumbered by any concerns of bias."
At this time I have no more public information available regarding the University Library review, other than that it is still in the very beginning stages and slowly moving forward with some difficulties.
In closing, I want to stress that MUALA wants normalized labour relations. We have remained fair and reasonable this entire time, and we will continue to do so.

Cozumel - November 5-12, 2011