Islandora Web ARChive Solution Pack

What is it?

The Islandora Web ARChive Solution Pack is yet another Islandora Solution Pack. This particular solution pack provides the necessary Fedora objects for persisting and disseminating web archive objects, i.e. warc files.

What does it do?

Currently, the solution pack allows a user to upload a warc file along with an associated MODS metadata form. Once the object is deposited, the metadata is displayed along with a download link to the warc file.

You can check out an example here

Can I get the code?

Of course!

Todo?

If I am doing something obviously wrong, please let me know!

Immediate term:

  1. Incorporate Wayback integration for the DIP. I think Wayback is the best disseminator for the warc files. However, I haven't wrapped my head around how to programmatically provide access to the warc files in Wayback. I know that I will have two warc objects, an AIP warc and a DIP warc (a big thank you to @sbmarks for being a sounding board today!). Fedora will manage the AIP, and Wayback will manage the DIP. Should I embed the Wayback URI for the object in an iframe, or link out to it?

  2. Drupal 7 module. Drupal 7 versions of the Islandora Solution Packs should be on their way shortly (in the next release, I believe). The caveat to using the Drupal 6 version of this module is mimetype support. It looks like the Drupal 6 API function (file_get_mimetype) doesn't return the correct mimetype for warc files: I should get 'application/warc', but I am getting 'application/octet-stream', the API's fallback default.
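For the Wayback question in item 1, it may help to notice that both options hang off the same replay URL. The sketch below assumes the usual Wayback replay URL pattern (a base path, a 14-digit capture timestamp, then the original URL); the base URL, helper names, and iframe dimensions are hypothetical.

```python
from datetime import datetime
from html import escape

# Hypothetical base URL of a local Wayback install; adjust to the real deployment.
WAYBACK_BASE = "http://localhost:8080/wayback"


def wayback_replay_url(original_url: str, captured: datetime,
                       base: str = WAYBACK_BASE) -> str:
    """Build a replay URL of the form <base>/<YYYYMMDDhhmmss>/<original URL>."""
    timestamp = captured.strftime("%Y%m%d%H%M%S")
    return f"{base.rstrip('/')}/{timestamp}/{original_url}"


def as_link(replay_url: str, label: str = "View in Wayback") -> str:
    """Option 1: link out to the Wayback replay from the object page."""
    return f'<a href="{escape(replay_url)}">{escape(label)}</a>'


def as_iframe(replay_url: str, width: int = 800, height: int = 600) -> str:
    """Option 2: embed the Wayback replay directly in the object page."""
    return (f'<iframe src="{escape(replay_url)}" '
            f'width="{width}" height="{height}"></iframe>')


if __name__ == "__main__":
    url = wayback_replay_url("http://example.com/", datetime(2012, 1, 15, 9, 30))
    print(url)
    print(as_link(url))
    print(as_iframe(url))
```

Since both options use the same URL, the choice is mostly about presentation: an iframe keeps users on the Islandora object page, while linking out gives the archived site the whole browser window, which may make navigating within the archived site easier.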

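The mimetype problem in item 2 follows from how extension-based lookups typically behave: if the extension isn't in the table, the function returns a generic default. The sketch below only mimics that behavior in Python; the table contents and helper names are hypothetical and are not Drupal's actual code.

```python
import os
from typing import Dict

# Hypothetical extension-to-mimetype table standing in for Drupal 6's built-in one.
EXTENSION_MIMETYPES: Dict[str, str] = {
    "pdf": "application/pdf",
    "jpg": "image/jpeg",
    "tif": "image/tiff",
}

# Returned whenever the extension is not in the table.
FALLBACK_MIMETYPE = "application/octet-stream"


def get_mimetype(filename: str,
                 mapping: Dict[str, str] = EXTENSION_MIMETYPES) -> str:
    """Look up a mimetype by file extension, falling back to octet-stream."""
    extension = os.path.splitext(filename)[1].lower().lstrip(".")
    return mapping.get(extension, FALLBACK_MIMETYPE)


def with_warc(mapping: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the table that also knows about warc files."""
    extended = dict(mapping)
    extended["warc"] = "application/warc"
    return extended


if __name__ == "__main__":
    print(get_mimetype("site.warc"))                                  # application/octet-stream
    print(get_mimetype("site.warc", with_warc(EXTENSION_MIMETYPES)))  # application/warc
```

Note that a compressed file named site.warc.gz would still hit the fallback in this sketch, since only the last extension is checked.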
Long term:

  1. Incorporate Islandora microservices. What I would really like to do is allow users to automate this entire process. Basically, a user would say: this is a site I would like to archive, this is the frequency at which I would like it archived (with the necessary wget options), and this is the default metadata profile for it. The system would then grab the site, ingest it into Fedora, drop the DIP warc into Wayback, and make it all available.

  2. If you have any ideas on how to do the above, or how to do it in a better manner, please let me know!
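One possible shape for the automation in item 1 is a small job description (site, frequency, wget options, metadata profile) plus an ordered list of steps that a chain of microservices would carry out. Everything below is a hypothetical sketch: the class, field, and function names are invented for illustration, and the wget call is only built, not run. The one real detail is wget's --warc-file option, which requires wget 1.14 or later.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List


@dataclass
class ArchiveJob:
    """Hypothetical description of one site a user wants archived."""
    url: str
    frequency: timedelta        # how often the site should be re-crawled
    metadata_profile: str       # name of a default MODS profile (hypothetical)
    wget_options: List[str] = field(default_factory=list)


def is_due(job: ArchiveJob, last_run: datetime, now: datetime) -> bool:
    """A crawl is due once at least one full interval has passed."""
    return now - last_run >= job.frequency


def wget_command(job: ArchiveJob, warc_name: str) -> List[str]:
    """Build (but do not run) a wget call that also writes a warc file."""
    return ["wget", *job.wget_options, f"--warc-file={warc_name}", job.url]


def plan(job: ArchiveJob, warc_name: str) -> List[str]:
    """Ordered steps a microservice chain would carry out for one crawl."""
    return [
        " ".join(wget_command(job, warc_name)),
        f"ingest AIP warc into Fedora using profile '{job.metadata_profile}'",
        "copy DIP warc into Wayback",
        "publish the object",
    ]


if __name__ == "__main__":
    job = ArchiveJob("http://example.com/", timedelta(days=7), "default-mods",
                     ["--mirror", "--page-requisites"])
    if is_due(job, datetime(2012, 1, 1), datetime(2012, 1, 8)):
        for step in plan(job, "example-20120108"):
            print(step)
```

Keeping the crawl, the Fedora ingest, and the Wayback hand-off as separate steps mirrors the AIP/DIP split described earlier, so each step could in principle become its own microservice.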

FITS and Islandora integration

Digital preservationistas rejoice?
 
I managed to get FITS integration working in Islandora via a plugin. The plugin automatically creates a FITS XML datastream for an object when it is ingested through the Islandora interface for a supported solution pack. Right now I have it working with the Basic Image Solution Pack, Large Image Solution Pack, and PDF Solution Pack. You just have to make sure fits.sh is in your Apache user's path (thanks @adr). [UPDATE: Works with the Audio Solution Pack now.]
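The path requirement above boils down to looking up fits.sh on PATH before shelling out to it. This Python sketch assumes the FITS command-line form fits.sh -i <input> -o <output>; the helper names are hypothetical, and attaching the resulting XML to the Fedora object as a datastream is left out.

```python
import shutil
import subprocess
from typing import List, Optional


def find_fits(name: str = "fits.sh") -> Optional[str]:
    """Return the full path to fits.sh if it is on this process's PATH.

    When the ingest runs inside the web server, that is the Apache user's PATH.
    """
    return shutil.which(name)


def fits_command(fits_path: str, input_file: str, output_xml: str) -> List[str]:
    """FITS reads the file given with -i and writes its XML report to -o."""
    return [fits_path, "-i", input_file, "-o", output_xml]


def run_fits(input_file: str, output_xml: str) -> bool:
    """Run FITS on one file; return True if it exited successfully."""
    fits = find_fits()
    if fits is None:
        return False  # fits.sh is not on the PATH, so skip FITS creation
    result = subprocess.run(fits_command(fits, input_file, output_xml))
    return result.returncode == 0
```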
 
What I had feared was going to be a pretty insane process turned out to be fairly simple and straightforward, which I'll outline here.

  1. I looked at existing plugins for something similar that I could crib from, and found it in the exiftool plugin, which is used in the Audio and Video Solution Packs.
  2. Using that plugin as a model, I ran some grep queries to figure out how it is used across the overall codebase (Islandora and the solution packs).
  3. Created a feature branch.
  4. Hammered away until I had something working. (Thanks @mmccollow)
  5. Created an ingest rule for a solution pack. This tells the solution pack to call the plugin.
  6. Tested, tested, and tested.
  7. Merged the feature branch into the 6.x branch, pushed, and opened a pull request.
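Step 5 is where the pieces connect: the ingest rule is what makes a solution pack call the plugin. Islandora itself is not written this way; the Python sketch below only models the idea as a registry keyed by content model, and the content model IDs, plugin signature, and function names are all hypothetical.

```python
from typing import Callable, Dict, List

# A plugin receives the path of the ingested file and returns the ID of the
# datastream it created.
Plugin = Callable[[str], str]

# Hypothetical registry: content model ID -> plugins to run on ingest.
INGEST_RULES: Dict[str, List[Plugin]] = {}


def ingest_rule(content_model: str) -> Callable[[Plugin], Plugin]:
    """Register the decorated plugin for objects of the given content model."""
    def register(plugin: Plugin) -> Plugin:
        INGEST_RULES.setdefault(content_model, []).append(plugin)
        return plugin
    return register


@ingest_rule("example:basic_image_cmodel")
@ingest_rule("example:pdf_cmodel")
def fits_plugin(path: str) -> str:
    # Stub: a real plugin would run fits.sh on `path` and store the XML.
    return "FITS"


def on_ingest(content_model: str, path: str) -> List[str]:
    """Run every plugin registered for the content model, in order."""
    return [plugin(path) for plugin in INGEST_RULES.get(content_model, [])]


if __name__ == "__main__":
    print(on_ingest("example:pdf_cmodel", "article.pdf"))  # ['FITS']
    print(on_ingest("example:audio_cmodel", "song.mp3"))   # []
```

In this model, supporting another solution pack is just one more registration, while the plugin code itself stays the same.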

That is basically it. Let me know if you have any questions. Or, if you know of a way to make it even better, patches welcome ;)
 
[Update #2]
 
I've added a configuration option to the Islandora admin page to enable FITS datastream creation, along with the ability to define a path to fits.sh. I put it in the advanced section of the admin page, which is collapsed by default, so folks probably won't notice it. It might be a better idea to collect all the various command line tools Islandora uses and give them their own section in the admin page for defining their paths.
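The idea of giving all the command line tools one admin section could look something like a single table of configured paths, where an empty entry falls back to searching PATH. The tool names come from the text above; the table, the function names, and the executable check are assumptions made for illustration.

```python
import os
import shutil
from typing import Dict, Optional

# Hypothetical admin settings: tool name -> configured path ("" = not set).
CONFIGURED_PATHS: Dict[str, str] = {
    "fits.sh": "",
    "exiftool": "",
}


def resolve_tool(name: str,
                 configured: Dict[str, str] = CONFIGURED_PATHS) -> Optional[str]:
    """Prefer the admin-configured path; otherwise search PATH."""
    path = configured.get(name, "")
    if path:
        # Reject a configured path that is missing or not executable.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
        return None
    return shutil.which(name)


def tool_status(configured: Dict[str, str] = CONFIGURED_PATHS) -> Dict[str, bool]:
    """Report which tools can currently be found, e.g. for an admin page."""
    return {name: resolve_tool(name, configured) is not None
            for name in configured}


if __name__ == "__main__":
    for tool, found in tool_status().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

Rejecting a bad configured path, rather than silently falling back to PATH, is a deliberate choice in this sketch: it surfaces a typo in the admin setting instead of hiding it.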
 
I also have FITS creation working with the Video Solution Pack now. Up next, Islandora Scholar... just have to get that up and running ;)

Just an easy way to help out!

Do you have a computer or server running Linux sitting around, and want to help out a good cause? Then you should check out one of the active Archive Team projects. "Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage." Basically, this wonderful group of people monitors the internet for sites and services that are about to disappear, and does its absolute best to make sure they are preserved.
 
I wish I had gotten involved with this earlier, but as I was walking out the door from McMaster I stumbled across the MobileMe project and figured I could throw some bandwidth at it. If you are comfortable with installing software on the command line, a little bit of git, and compiling, then you should lend a hand! (Or, if you are a badass programmer, you can help write scripts for projects that need them.) The instructions on each project page are pretty straightforward. If you have a question or need a hand getting going, pop into the IRC channel or just ask me.