Warclight


Nick Ruest
York University

Archives Unleashed Vancouver Datathon
Vancouver, B.C.
November 1-2, 2018

So, what's Warclight?

Quick History

UKWA's Shine

IIPC 2016

Quick overview

Warclight is a Project Blacklight-based Rails engine that supports the discovery of web archives held in the WARC and ARC formats.

To better understand this, let's take a quick look at the four main components:

  • Solr
  • Blacklight
  • ARCs/WARCs
  • webarchive-discovery

Solr

...is an open source enterprise search platform, written in Java, from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering...
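
To make full-text search, faceting, and hit highlighting a bit more concrete, here is a minimal sketch of how a request to Solr's standard `/select` handler could be assembled. The query parameters (`q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, `hl.fl`) are standard Solr parameters. The core name `warclight` and the field names `content`, `domain`, and `crawl_year` are assumptions for illustration and may not match any particular index.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, facet_fields, rows=10):
    """Build a Solr /select URL with faceting and highlighting enabled."""
    params = [
        ("q", text),            # full-text query
        ("rows", rows),         # number of hits to return
        ("wt", "json"),         # response format
        ("facet", "true"),      # turn on faceted search
        ("hl", "true"),         # turn on hit highlighting
        ("hl.fl", "content"),   # assumed name of the full-text field
    ]
    # One facet.field parameter per field we want counts for.
    params += [("facet.field", name) for name in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core name and field names, for illustration only.
url = build_solr_query(
    "http://localhost:8983/solr/warclight",
    "geocities",
    ["domain", "crawl_year"],
)
print(url)
```

Blacklight issues requests of roughly this shape on the user's behalf and renders the facet counts as clickable filters.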

Blacklight

ARCs/WARCs

🤔

Well, that could get really big, really quick.

🙃

Yes! Hold that thought. We'll come back to it.

So, how do we get these ARCs and WARCs into Solr?

webarchive-discovery

How about a demo?

Scaling

💓 gifcities

Parallelize!
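
One way to read "parallelize" is to split the list of ARC/WARC files into batches and hand each batch to a separate worker. The sketch below only shows that batching pattern. `index_batch` is a hypothetical placeholder that counts files; in a real setup it would be replaced by whatever actually runs the indexer over a batch. The file names are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def index_batch(batch):
    """Hypothetical stand-in for indexing one batch of WARC files.

    Returns the number of files it was given, so the caller can
    confirm that every file was handed to a worker.
    """
    return len(batch)


def index_in_parallel(warc_paths, batch_size=2, workers=4):
    """Run index_batch over all batches using a pool of worker threads."""
    batches = chunk(warc_paths, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(index_batch, batches))


# Made-up file names, for illustration only.
paths = [f"crawl-{n:03d}.warc.gz" for n in range(5)]
print(index_in_parallel(paths))  # prints 5
```

Threads are used here only to keep the sketch short. If each batch launches an external indexing process, the same batching applies, and the number of workers would be tuned to what the Solr server can absorb.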


We'd love to hear from you!

Thank you!