DPLA Appfest Drupal integration

Below is the output of the little project I worked on today at the DPLA Appfest. It definitely isn't a perfect solution to the problem. It is not a drop-in module to just grab a collection from the DPLA API and "curate" it in your library's Drupal site. I hate reinventing the wheel, especially if there are existing modules that can solve the problem for you. Moreover, as one of the few people that still respects what OAI-PMH does, it would be worth considering using DPLA as an OAI-PMH provider. But I'm not sure if that is technically legal in OAI-PMH terms, given that they are most likely harvesting it via OAI-PMH themselves. Don't want to get into an infinite regress of metadata providers. S'up dawg? All jokes aside, I think OAI-PMH would be a better solution than what I tossed together because it would make harvesting a "set" a hell of a lot easier. My 2¢.

I also have a live demo of it living on my EC2 instance. I've ingested 2000 items from the API and decided to throw them into a Solr index just to demonstrate the possibilities of what you can do with the ingested content.

Finally, a big giant thank you to the DPLA and the Chattanooga Public Library for putting this on and for the wonderful hospitality. This was absolutely fantastic!

Idea

Drupal module or distribution

Your Name: Nate Hill

Type of app: Drupal CMS

Description of App: Many, many libraries choose to use Drupal as their content management system or as their application development framework. A contrib Drupal module that creates a simple interface for admin users to curate collections of DPLA content for display on a library website would be useful.

Workflow

Preamble:

I don't like recreating the wheel. So, let's see what contrib modules already exist, and see if we can just create a workflow to do this to start with. It would be really nice if DPLA had an OAI-PMH provider; then you could just use CCK + Feeds + Feeds OAI-PMH.

Example: bitly.com/VXMvMr

Requirements:

  • CCK

    drush pm-download cck

  • Feeds

    drush pm-download feeds

  • Feeds JSONPath Parser

    drush pm-download feeds_jsonpath_parser
    cd sites/all/modules/feeds_jsonpath_parser && wget http://jsonpath.googlecode.com/files/jsonpath-0.8.1.php

Setup:

  • Create a Content Type for the DPLA content you would like to pull in (admin/content/types/add)
  • Create DPLA metadata fields for the Content Type (admin/content/node-type/YOURCONTENTTYPE/fields)
  • Create a new feed importer (admin/build/feeds/create)
  • Configure the settings for your new feed importer
    • Basic settings:
      • Select the Content Type you would like to import into
      • Select a frequency at which you would like Feeds to ingest
    • Fetcher: HTTP Fetcher
    • Processor: Node processor
      • Select the Content Type you created
      • Mappings (create a mapping for each metadata field you created)
        • Source: jsonpath_parser:0
        • Target: Title
    • Parser: JSONPath Parser
      • Settings for JSONPath parser
        • Context: $.docs.*
  • Construct a search you would like to ingest using the DPLA API
    • ex: http://api.dp.la/v1/items?dplaContributor=%22Minnesota%20Digital%20Library%22
  • Start the import! (node/add/YOURCONTENTTYPE)
  • Give the import a title... whatever your heart desires.
  • Add a feed url
  • Click on JSONPath Parser settings, and start adding all of the JSONPaths (see the sketch just after this list for how those paths resolve against a DPLA response)
  • Click save, and watch the import go.
  • Check out your results
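
In case the Context and Source settings feel abstract, here is a minimal sketch of what the JSONPath Parser is doing under the hood, using the jsonpath-0.8.1.php library the wget step above pulls down. The sample response is trimmed and the exact DPLA field layout may differ; the point is just how $.docs.* sets the row context and how per-row paths feed each mapping.

    <?php
    // Minimal sketch (not the Feeds module itself) of how the JSONPath
    // settings above resolve against a DPLA API response. Assumes the
    // jsonpath-0.8.1.php library fetched earlier sits next to this script,
    // and the response structure shown here is trimmed for illustration.
    require 'jsonpath-0.8.1.php';

    $response = json_decode('{
      "count": 2,
      "docs": [
        {"title": "Trench map, Ypres", "creator": "Unknown"},
        {"title": "Camp register, 1944", "creator": "Unknown"}
      ]
    }', TRUE);

    // Context setting: $.docs.* yields one row per item.
    $rows = jsonPath($response, '$.docs.*');

    // Each mapping source (jsonpath_parser:0, jsonpath_parser:1, ...) is a
    // path evaluated against a single row, e.g. $.title for the node Title.
    foreach ($rows as $row) {
      $title   = jsonPath($row, '$.title');
      $creator = jsonPath($row, '$.creator');
      print $title[0] . ' (' . $creator[0] . ")\n";
    }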

Right! That hackfest report I should have given...

When I was at Islandora Camp trying to wrap my head around all things Islandora and Fedora, I was thinking ahead about a possible project in archives and research collections - migrating our collection/fonds descriptions and finding aids over to ICA AtoM.
 
ICA AtoM does some pretty cool stuff in terms of access to collection/fonds descriptions, integrates very nicely with Archivematica for accessioning born-digital objects, and associates digital representations of item-level objects with their respective collection/fonds. My greedy little brain wanted more! I wanted ICA AtoM to be able to pull in Fedora objects automatically and associate them with their respective collection/fonds. So, this is the hackfest proposal I submitted.
 
So what happened? What'd we end up doing?
 
The amazing Peter Van Garderen made absolutely sure Artefactual Systems staff was highly represented at hackfest, and I had two amazing people from Artefactual trying to parse my sleep-deprived-scatter-brained-state reasoning/logic behind what I wanted to do. David Juhasz and Jesús García Crespo, you rock!
 
We spent the first hour or so working through the Fedora REST API documentation looking for the best way to approach the "problem." After about an hour or so of working through a few conditional queries that would need to be strung together, Jesús jumped in and said, "Why aren't we using SWORD for this!?" Good question!
 
ICA AtoM can speak SWORD, and Fedora can speak SWORD so long as you can get the module working. As things at hackfest generally go for me, it failed. I could not for the life of me get the module to build. I spent some time going through build.xml, but Ant and I just weren't going to be friends that day.
 
Strike one - don't code conditional Fedora REST API queries - neither sharable nor scalable
Strike two - I couldn't get the SWORD module to build!
Strike three - ???
 
While brainstorming other solutions to our "problem", David was looking for examples in which I could share records from our repository. Duh! OAI-PMH! ICA AtoM can harvest OAI. If we can map OAI sets to ICA AtoM collections/fonds, and set records to individual items in a collection/fonds, we're set. Oh my, another use case of OAI-PMH! Yay!
 
Did we succeed? Not actually. Turns out the OAI-PMH harvesting code wasn't quite up to snuff at the time, and David, bless his heart, worked on trying to get it up to par before the end of the day. We were not able to pull together a working version, but the framework is there. It was there all along! (Ed, yes we could have and totally should have used atom :P )

Drupal & Digital Collection Sites - 1

I have written about Drupal & the Digital Collections site (http://digitalcollections.mcmaster.ca) a few times now, but haven't really explained how to make a digital collections site out of Drupal. So, without further ado...

What are the necessities of a digital collections site?

What are some additional features that have become necessary?

  • Tagging
  • Social Bookmarking
  • Faceted Searching
  • Visually rich environment
  • Profiles, internal site bookmarking
  • Contact forms, Image requests, Questions
  • Commenting
  • Content Recommendation

So how do you do all of this with Drupal - sans JPEG2000 support (working on that now)? Well, if you are familiar with Drupal, you should know that it is an open source, modular content management system with an amazing support & development community. A standard out-of-the-box Drupal installation will not yield a digital collections site - additional modules are absolutely necessary. Time, effort, and some coding will have to be put in, but it is well worth it.

The key to all of it is the Content Construction Kit (CCK). Briefly, CCK allows you to create your own fields for a node. This is where we get the ability to have all your standard Dublin Core fields, plus any other unique metadata a collection needs to present.

What I have done with my site is set up a Content Type for each collection. Each content type shares the standard Dublin Core fields (very helpful for massaging an OAI module for digital collections out of an available OAI module), and each then has its own unique additional metadata. For example, the World War II German Concentration Camp and Prison Camp Collection has metadata fields for Prison, Sub-Prison, Prison Block, etc.
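
To make that concrete, here is a minimal sketch of what those shared CCK fields look like from code, using Drupal 6-era conventions. The field machine names (field_dc_title, field_dc_creator, field_prison) are assumptions for illustration, not the site's actual names.

    <?php
    // Minimal sketch: reading shared Dublin Core CCK fields plus one
    // collection-specific field from a loaded node (Drupal 6-era CCK).
    // Field machine names are assumed for illustration.
    $node = node_load(123); // 123 is a placeholder nid

    // Shared Dublin Core fields every collection content type carries.
    $dc_title   = $node->field_dc_title[0]['value'];
    $dc_creator = $node->field_dc_creator[0]['value'];

    // Collection-specific metadata, e.g. the WWII camp collection's Prison field.
    $prison = isset($node->field_prison[0]['value']) ? $node->field_prison[0]['value'] : NULL;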

I have written about the OAI module a couple of times, but essentially what I did is take the OAI-PMH Module, which is an interface for the Bibliography Module, and rework it so it interfaces with the CCK fields I created for the standard Dublin Core fields. I have not had the time to generalize it (I hope to in the future, time willing!), so it is hard coded to my collections right now.
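
The reworked module itself isn't published, but the mapping at the heart of it is easy to sketch: walk the Dublin Core CCK fields on a node and emit them as oai_dc elements. Field names are the same assumed ones as above, and all of the real OAI-PMH plumbing (verbs, sets, resumption tokens) is left out.

    <?php
    // Sketch of the core mapping: CCK Dublin Core fields to oai_dc elements
    // for a single record. Field machine names are assumptions, and the
    // surrounding OAI-PMH protocol handling is omitted.
    function dc_record_xml($node) {
      $map = array(
        'dc:title'   => 'field_dc_title',
        'dc:creator' => 'field_dc_creator',
        'dc:subject' => 'field_dc_subject',
        'dc:date'    => 'field_dc_date',
      );
      $xml = "<oai_dc:dc>\n";
      foreach ($map as $element => $field) {
        if (!empty($node->{$field}[0]['value'])) {
          $value = check_plain($node->{$field}[0]['value']);
          $xml .= '  <' . $element . '>' . $value . '</' . $element . ">\n";
        }
      }
      return $xml . "</oai_dc:dc>\n";
    }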

Searching is a built-in feature of Drupal. Drupal does a pretty good job of creating a search index for itself, and it offers advanced searching features as well. With content types for each collection, users can limit their search to a specific collection or run a site-wide search.

Browsing a collection can be done by setting up categories and containers for a collection, then placing each record under a specific collection when creating the records, or by running a bulk MySQL update query if you have imported a number of records to start with. For custom browsable options I have used the Views module to create views for specific metadata fields, limited to a collection. The Faceted Search module also allows you to list all of the fields you would like exposed to faceted search, thereby allowing a user to browse by a variety of field types.
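
For the bulk case, rather than raw SQL against the taxonomy tables, a small script run through the node API keeps the revision and taxonomy bookkeeping intact. This is a rough sketch using Drupal 6-era functions; the content type name and term ID are placeholders.

    <?php
    // Rough sketch: file every imported record of one content type under an
    // existing collection term. Type name and tid are placeholders.
    $tid = 42; // taxonomy term ID of the collection (placeholder)
    $result = db_query("SELECT nid FROM {node} WHERE type = '%s'", 'ww2_camp_record');
    while ($row = db_fetch_object($result)) {
      $node = node_load($row->nid);
      $node->taxonomy[$tid] = taxonomy_get_term($tid); // attach the collection term
      node_save($node);
    }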

Not too much to say about JPEG2000 support right now. There are two possible scenarios that I am brainstorming. The first one is LizardTech. Before I started here, the Library had purchased a LizardTech Express Server license in order to display the MrSID images for the World War I trench maps. The new version of the LizardTech server supports JPEG2000, and has an API that I should be able to get Drupal to work with - fingers crossed! The other option is the aDORe djatoka open source JPEG2000 server. I planned on working on this at the Access 2008 Hackfest, but got distracted with SOPAC and Evergreen.

So, now for the rest - additional features...

Tagging is done with the Community Tags module, and tag clouds are created with Tagadelic.

Social Bookmarking is done with the

Faceted Searching is done with the Faceted Search module.

Visually rich environment is done with a variety of modules and custom template coding. Modules that assist in making this possible include: Views (and many Views sub-modules), Zen Theme, jQuery, Highslide, and Tabs & Quicktabs.

Profiles, internal site bookmarking... user accounts are a standard feature of any content management system. With Drupal we used a custom view and a user hook to allow registered users to bookmark any record to their account.

Contact forms, Image requests, Questions are done with the Contact Form module. Here users can ask questions about records, request images, and report any problems with the site or records.

Commenting is another built-in feature of Drupal. Comments are allowed on every record on the site. Unregistered/Anonymous users have to deal with a CAPTCHA, whereas registered users do not.

Content Recommendation is done with the Content Recommendation Engine (CRE). This module interfaces with a number of other modules; the main one that I utilized is the Voting API. The Voting API combined with the CRE allows for a Digg-like feature on each record. Each record has a Curate It! link, and items that have been "curated" are then featured on the Items Curated page. Drupal also has a popular content feature that I utilize as well.

So, that is pretty much it for the bullet points listed above. I will have another post or two about Drupal in digital collections: one featuring all of the modules that I take advantage of, and another covering any questions anybody has.