Web Archives

A Quick Benchmark of Webarchive-Discovery

This past week Compute Canada provided us with resources to setup our Solr Cloud instance for WALK and Archives Unleashed. We were able to get things setup relatively quickly thanks to a bit of preparation and practice on our local machines in the previous weeks. Once everything was setup (5 virtual machines total; 4 Solr Cloud nodes and one indexer – details below), we started benchmarking webarchive-discovery and our Solr Cloud setup with GNU Parallel.

See a Little Warclight

What if you have a few terabytes of web archive data setting around, and wanted to shine a little light into them? Well, the good news is that now you can! The British Library’s UK Web Archive initiative has created some great software over the last couple years to allow you to index your web archive content into Solr, and provide access to it in a discovery interface called Shine. You can check Shine out in action here (for the British Library’s collections) or here (for our Canadian politics one).

The Archives Unleashed Project: Warcbase is dead, long live the Toolkit

by Ian Milligan, Jimmy Lin, and Nick Ruest We were delighted to be able to announce a few months ago that our project team at the University of Waterloo and York University were awarded a grant from the Andrew W. Mellon Foundation to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. Since that announcement, we’ve been busy at work at a few different things: modernizing and updating our Warcbase web archiving analytics platform, working on a discovery interface and underlying infrastructure, and laying the administrative groundwork for the project itself.

Archives Unleashed Project

Archives Unleashed aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. Supported by a grant from the Andrew W. Mellon Foundation, we will be developing web archive search and data analysis tools to enable scholars and librarians to access, share, and investigate recent history since the early days of the World Wide Web.

Web Archives for Historical Research

Our research focuses on both web histories - writing about the recent past as reflected in web archives - as well as methodological approaches to understanding these repositories.

Islandora Web ARChive SP updates

Community Some pretty exciting stuff has been happening lately in the Islandora community. Earlier this year, Islandora began the transformation to a federally incorporated, community-driven soliciting non-profit. Making it, in my opinion, and much more sustainable project. Thanks to my organization joining on as a member, I’ve been provided the opporutinity to take part in the Roadmap Committe. Since I’ve joined, we have been hard at work creating transparent policies and processes software contributions, licenses, and resources.