2009 OLITA Digital Odyssey

I must say that the Digital Odyssey was the best one-day event I have been to. Just a fantastic day with fantastic people talking about awesome projects. It cheered me up and gave me hope in these crap times. Best part of the day had to be Mike Ridley's keynote speech - The Age of Information is over. It is time for the Age of Imagination. It will be the library's job to nurture and foster creativity.

Workshops attended:

Walter Lewis - The Perfectibility of Data. I must say that Walter may be a bigger metadata fascist than I am. He showed some cool stuff that I didn't know about - Media RSS feeds! Then using Cooliris to visualize said feed. Also, I finally realized how simple it is to provide proper data to interact with Google Earth & Google Maps. Just latitude and longitude coordinates!!!
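
To make the latitude and longitude point concrete, here is a minimal sketch (not something shown in the session) of turning one coordinate pair into a KML placemark that Google Earth can open. The function name, the record name, and the coordinates are illustrative assumptions; the one detail worth remembering is that KML wants longitude first, then latitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def placemark_kml(name, latitude, longitude):
    """Build a tiny KML document holding a single point (hypothetical helper)."""
    # Register KML as the default namespace so the output has no prefixes.
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML coordinate order is longitude,latitude (not latitude,longitude).
    coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
    coords.text = f"{longitude},{latitude}"
    return ET.tostring(kml, encoding="unicode")


# Illustrative values only; any record with a latitude and longitude would do.
example_kml = placemark_kml("Example record", 43.2609, -79.9192)
print(example_kml)
```

Save the output with a .kml extension and it should open as a single pin on the map.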

Loren Fantin - Planning and Managing a Digitization Project. Lots of great stuff in Loren's talk. Don't see a blog entry on the Digital Odyssey site yet, so no link. Biggest lessons learned - scope creep!!! & digitization should be a part of collection management.

Art Rhyno - OCR Options for Scanned Content. Great session on the basics & overview of OCR, and OCR software options. Provided many examples from a variety of OCR software packages. Mostly ABBYY & Ocropus.

The text of my presentation, PDF of slides, Keynote file, and PowerPoint file.

OMG! You Don't Need CONTENTdm!!!

So, I bet a lot of you are wondering what is up with my title? Well, I don’t plan on standing up here taking potshots at OCLC for 15 minutes, but I am sure some people in the crowd wouldn’t mind. Basically, the title should have had a very long sub-title, along the lines of Dr. Strangelove or: How I Learned to Stop Worrying and Embrace Open Source Software.

How many people here know what CONTENTdm is? Well, straight from the site - it is “a single software solution that handles the storage, management and delivery of your library’s digital collections to the Web.”

So, I am an Open Source Software evangelist. Yeah, Yeah, Yeah... I’m a hypocrite. I used proprietary software to make this presentation. I’m not a fascist about open source software, I’m only a fascist when it comes to metadata. But, on a serious note, I strongly believe that libraries should be at the forefront of open source software use. “Being an Open Source Software evangelist is like being a library evangelist.” - Karen Schneider. I also believe that academic libraries have a responsibility to play a major role in the development of open source software for libraries. As a side note, I believe this ties in very well with the publish or perish notion of academia. What is open source software, if not a constant state of peer revision?

Which brings me to why libraries should stop buying proprietary/closed out-of-the-box software solutions from vendors. I think all of you know what I am talking about. Horizon, Millennium, LunaInsight, DLXS, and CONTENTdm. Just to name a few. What do we generally get? Something that works... kinda... for the time being. Support is there... maybe. Oh wait, you want to do that, you’ll need to buy this $20,000 add-on. I think Jessamyn West sums this up quite well in her Evergreen Conference closing keynote, “Closed vendor development = Proof of concept. Go! Ship it!!!” And yes, you can argue the same for open source software, but at least the community can get at the source and improve on it!

Now, I told you I was not going to take potshots. So, CONTENTdm is not a pile of garbage. What it does, it does very well. It does things that the digital collections setup that we have built cannot do yet - JPEG2000 (which hopefully we can launch this summer) and Z39.50. It has an API for custom development. But I want more! I want to be able to move with the times. I want to be able to move at my own pace or the freedom to move with my users. I want the freedom to do what I want to do with the software. What is that? Users want to be able to tag records and bookmark records internally to their account. They want to comment on content, and want a mobile version of the site. Oh wait, I can't do that with CONTENTdm. If I had an open source solution, I would have the freedom to do so.

I would have the freedom to run the program for any purpose. I would have the freedom to study how the program works, and adapt it to my needs. I would have the freedom to redistribute copies so I can help my neighbor. I would have the freedom to improve the program, and release my improvements to the public, so that the whole community benefits. By the way, these four freedoms are from Richard Stallman’s definition of Free Software.

How many people know who Richard Stallman is? Well, for those of you who do not, Richard is the creator of GNU, and founder of the Free Software Foundation.

Just to highlight some of these closed vendor solutions, i.e. CONTENTdm - why don't we take a look at the attack of the clones.

Well, I don’t want to be a clone. I don’t want my site to look like that. To be honest, it looks like something already 5 years old. How am I different? Besides the obviousness of my appearance...

So, what do we use? Drupal.

What is Drupal?

Drupal is a free and open source modular framework and content management system. It is written in PHP, and uses MySQL or PostgreSQL as a database. Drupal is OS agnostic. It can run on a variety of web servers: Apache, IIS (GOD FORBID), Lighttpd and others - so long as you meet the requirements.

Now, when you download Drupal, you do not get a pre-built digital collection platform. You get the Drupal Core, which is about 10-15 core modules such as user administration, permissions, search, content types (blogging, pages, etc.), commenting, RSS, etc.

When Drupal says Modular, they mean MODULAR. What this image is, is a cropped section of a 2700 x 3800 pixel image representing the contributed modules to Drupal up to November 2007. Seriously, look at this, there are thousands of contributed modules.

Now, this presents us with an analogy - this is our foundation, or our little brick house to build off of. Maybe we can start building up this little brick house, into something like this! Now, I’m not saying we’ve built a skyscraper... but the sky is the limit!

So, a little bit of back story now. When I started at McMaster in September of 2007, the library had just received a grant for the Peace & War in the 20th Century digital collection. They had no digital collections infrastructure, and coming straight out of school, I was very scared to say the least. Not scared that we had nothing, but scared of failing. I started thinking that I had bitten off a little more than I could chew.

Over the summer before I began, they had started working on selecting and scanning images, and creating corresponding metadata. They chose FileMaker Pro to store the metadata, and planned on creating a dynamic website using FileMaker Pro and an ODBC connector. Scary huh!? Written into the grant were things stating that this would be a state of the art, web 2.0 site - i.e., tagging and commenting. Mind you, all of this had to be accomplished in one year. So, after I started, I said to continue scanning, and creating metadata records with FileMaker Pro for the time being. Give me a month to come up with something and then we will go from there. So, after some testing with Joomla, Plone and Drupal, and some pressure to use CONTENTdm, I decided to hedge my bets with Drupal.

So how do you do this? How do you build a Digital Collections site with Drupal?

The best way to tackle this is not to look at the huge bulk that you have to finish with, but to take it apart piece by piece and build with bricks, or modules.

What are the key pieces we *really* need? Well, obviously the ability to display our digital object - image, sound, or video - with corresponding metadata. We need a metadata format to start with - Dublin Core. We should be friendly, and let others harvest our records (OAI-PMH), thereby adding to the commons. We need a way to get the records in, in a user-friendly manner. Finally, users should be able to search and browse records in a variety of ways.
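
For anyone who has not seen what a harvestable Dublin Core record looks like, here is a rough sketch of one serialized in the oai_dc format that OAI-PMH harvesters expect. This is not code from our site; the function name and the sample record are made up for illustration, and only the two namespace URIs come from the actual specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH and Dublin Core specifications.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def to_oai_dc(record):
    """Serialize a dict of Dublin Core element -> list of values (hypothetical helper)."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in record.items():
        # Dublin Core elements are repeatable, so each value gets its own tag.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample record, for illustration only.
sample_record = {
    "title": ["Letter from a prisoner of war"],
    "type": ["Text"],
    "subject": ["World War, 1939-1945", "Correspondence"],
}
oai_dc_xml = to_oai_dc(sample_record)
print(oai_dc_xml)
```

A real OAI-PMH response wraps this in a record envelope with a header and datestamp, which is left out here to keep the sketch short.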

What are the key Drupal Modules to start with?

CCK - The Content Construction Kit allows you to create your own content types, and add custom fields to any content type. So, this is where Dublin Core comes in. For each of our collections, we set up its own content type. Then each content type uses the same Dublin Core fields + any additional metadata fields that are unique to the collection. So for example, the World War II Concentration Camp correspondences have a lot of additional metadata - so we created fields such as prison camp, sub-camp, block number, censor notes, etc.
Views - The Views module provides a flexible method for Drupal site designers to control how lists and tables of content are presented.
Faceted Search Module - It is what it says it is, a faceted search module. It lets users narrow down granularly to the content they want via all the CCK fields that are set up.
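
The idea behind these modules can be sketched outside of Drupal. What follows is plain Python, not CCK or Faceted Search code: a content type that shares the fifteen Dublin Core fields and adds collection-specific ones, plus a toy version of faceted narrowing over those fields. Every name and every record in it is a hypothetical stand-in.

```python
from collections import Counter

# The fifteen Dublin Core elements shared by every collection.
DUBLIN_CORE_FIELDS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def make_content_type(name, extra_fields=()):
    """One content type per collection: shared fields plus its own extras."""
    return {"name": name, "fields": DUBLIN_CORE_FIELDS + list(extra_fields)}


camp_letters = make_content_type(
    "concentration_camp_correspondence",
    ["prison_camp", "sub_camp", "block_number", "censor_notes"],
)


def facet_counts(records, field):
    """Count how many records carry each value of a field (the facet list)."""
    return Counter(r[field] for r in records if field in r)


def narrow(records, **selected):
    """Keep only the records matching every selected facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in selected.items())]


# Placeholder records, not real collection data.
records = [
    {"title": "Letter A", "prison_camp": "Camp X", "block_number": "12"},
    {"title": "Letter B", "prison_camp": "Camp X", "block_number": "7"},
    {"title": "Letter C", "prison_camp": "Camp Y", "block_number": "12"},
]

print(facet_counts(records, "prison_camp"))
print(narrow(records, prison_camp="Camp X", block_number="12"))
```

Each click on a facet is just another call to narrow, and the counts shown beside the remaining facet values are recomputed from whatever records are left.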

[Site Demonstration]

Last but not least - theming! Now, I said I did not want to be a clone. Drupal uses a number of theming engines you can take advantage of. In addition, there are a lot of user-contributed themes out there. The absolute best one I recommend is the Zen theme, which is just a framework - a blank white on black setup, with a skeleton CSS structure that you can add your own muscle to. You can pretty much do whatever you want with it.

Ok, to wrap things up - CONTENTdm is not a free product. By free, I don’t just mean price-wise. Free to do what you will with it. Those four tenets of Free Software that Richard Stallman will *NEVER* waver from. But, CONTENTdm is not a bad product, nor is OCLC an evil company. But, times are changing, and business models are changing. Developers and users want more control; they want to do what they will with a product. Mash it up how they please. I haven’t even scratched the surface on what you can do with Drupal and a digital collections site. You saw that little grid of contributed modules. And, modules are not that hard to write. I am not a programmer, but I can manipulate and hack PHP & MySQL to do my bidding and have written modules. So what I am getting to, is why can’t OCLC and other library software vendors open up their products? We are witnessing a revolution right now. How many libraries are moving to Evergreen, Koha, and other open source ILSs? This is not destroying the business models. Companies have to adapt. You can make money with open source software. Look at Red Hat Linux, look at Equinox. One last example with CONTENTdm as my whipping boy - why not open up CONTENTdm? Let your own users contribute and develop and make the product even better. Do something like Red Hat Linux - give it away for free, and sell support contracts. That is a reliable and proven business model.