See a Little Warclight

What if you had a few terabytes of web archive data sitting around, and wanted to shine a little light into them?

Well, the good news is that now you can! The British Library’s UK Web Archive initiative has created some great software over the last couple of years that lets you index your web archive content into Solr and provide access to it in a discovery interface called Shine. You can check out Shine in action here (for the British Library’s collections) or here (for our Canadian politics one).

Starting in 2016, our research team was interested in trying to provide access to Canadian web archival collections. Our Web Archives for Longitudinal Knowledge project, or WALK, brings together the web archival holdings of a half-dozen Canadian libraries and aims to provide federated search and access to research derivatives.

In doing so, we realized that while Shine was powerful, it lacked an active development community: we wanted to provide access in the same spirit, but on a different platform. We are also in the same boat as the Shine developers: grant funded, with no real sense that three, four, or five years down the road we’d have the time to devote full-time energy to a platform. To keep things going, we would need to leverage a bigger open-source community.

Enter Warclight.

What is Warclight?

Warclight is a Project Blacklight-based Rails engine that supports the discovery of web archives held in the WARC and ARC formats. It allows faceted full-text search, record view, and other advanced discovery options. Future work on the project will include integrating the Blacklight Advanced Search plugin, and creating a new plugin to recreate the existing trend search functionality in Shine.

One of the biggest strengths of Warclight is that it is based on Blacklight. This opens up a mature open source community, which could allow us to go further, in the spirit of the old proverb: “If you want to go fast, go alone. If you want to go further, go together.”

Warclight is designed to work with web archive data that is indexed via the UK Web Archive’s webarchive-discovery project. Warclight currently uses a fork of webarchive-discovery that allows for three additional facets: institution, collection_name, and collection_number. We’ll be working on getting these facets into core webarchive-discovery so that you will not need our fork to take advantage of them.
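To make the three extra facets a little more concrete, here is a minimal sketch of what a faceted full-text query against a Solr index might look like. It only builds the request URL, using standard Solr query parameters (q, rows, wt, facet, facet.field). The Solr address and the core name "warclight" are assumptions for illustration, as is the helper name build_facet_query; this is not how Warclight itself issues queries, since Blacklight handles that for you.

```python
from urllib.parse import urlencode

# Facet fields named in this post (added by our webarchive-discovery fork).
FACET_FIELDS = ["institution", "collection_name", "collection_number"]


def build_facet_query(base_url, text, rows=10):
    """Return a Solr select URL for a full-text search with facet counts.

    base_url is the address of a Solr core (hypothetical in this sketch).
    """
    params = [
        ("q", text),        # full-text query string
        ("rows", rows),     # number of documents to return
        ("wt", "json"),     # ask Solr for a JSON response
        ("facet", "true"),  # turn faceting on
    ]
    # Solr accepts facet.field repeated once per field.
    params.extend(("facet.field", field) for field in FACET_FIELDS)
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr core; adjust the host and core name to your setup.
    print(build_facet_query("http://localhost:8983/solr/warclight", "canadian politics"))
```

If your index was built with our fork, the facet counts for each of the three fields should come back alongside the matching documents.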

You can try out Warclight at our demo site, or you can build your own Rails app and take it for a spin locally.

If you’d like to contribute to the project, whether through feedback, use cases, or code contributions, please do not hesitate! We are especially interested in your thoughts on which fields should be displayed on an item’s view, in search results, or as facets. We also have a channel in the Archives Unleashed Slack devoted to Warclight: #warclight.

We can’t wait to work with you all, and bring the rapidly developing Warclight platform to the broader web archiving community!

Stay tuned for next week, when we talk a bit more about what we’ve been able to do with some of the WALK collections, through interfaces like Warclight and the Archives Unleashed Toolkit.


Tagged in Archive, Web Archiving, Apache Solr, Ruby on Rails

By Nick Ruest.

Exported from Medium on September 23, 2017.

