Make it WALK!

Nick Ruest
York University

May 10, 2018
Waterloo, Canada
Archives Association of Ontario, 2018



We have a problem facing our collective cultural heritage.

This is a scale that boggles the mind: compare it to the Old Bailey (197,745 trials between 1674 and 1913).

38 million user-built pages

4.1TB of WARCs

💓 gifcities

Researcher's perspective




Could one even study the 1990s and beyond without web archives?

…and the 1990s are history (as painful as it is to say).

And we have fears

The decisions we make today will lay the foundations for how we work with born-digital cultural heritage.

Won't be enough - we'll need search and discovery tools.

But what will our search engines look like?

We can't let the black box write our histories.

Our Nightmare

Historians rely uncritically on date-ordered or algorithmically ranked keyword search results, putting them at the mercy of search algorithms they do not understand.

Some disturbing trends in this area…

The historians -- librarians and archivists -- who came to the meeting were intelligent, kind, and encouraging. But they didn't seem to have a good sense of how to wield quantitative data to answer questions, didn't have relevant computational skills, and didn't seem to have the time to dedicate to a big multiauthor collaboration. It’s not their fault: these things don't appear to be taught or encouraged in history departments right now.

-Erez Lieberman Aiden and Jean-Baptiste Michel

Right now, to use web archives you have to really want to use them.

i.e., you need to be an expert

We want web archives to be used on page 153 of a random book!

So, how you gonna do that?

Interdisciplinary Collaboration

If you're honest and frank, it works.

Web Archives for Historical Research Group

Canadian Political Parties & Political Interest Group Collection (ARCHIVE-IT/Toronto)

  • 50 websites
  • All major political parties
  • Many minor political parties
  • Political interest groups
  • Collected quarterly between 2005 and present

The Current Interface

  • Very limited; simple search engine, some advanced options, and no facets.
  • Great collection, but nobody uses it.


How could we improve?

Apache Solr

UK Web Archive


Metadata extraction at scale!


What if we built on an existing project?

Archives Unleashed

Our Goals

  • Create relatively easy-to-use tools;
  • Create tools that are UNDERSTANDABLE - no black boxes;
  • Create tools that can push forward research in history, library/archives, and computer science;
  • Help people use these tools, and inspire research & creativity with datathons.

We want to help people unleash their web archives!


Tools & Community!

Lowering the barriers to entry so that humanists, librarians, and archivists can interact with large-scale web archive data, in a transparent way.




What if we had periodic gatherings where colleagues could get background information, learn how to use these tools, and work on a small project with other colleagues?

So web archives aren't boutique.

...and they can speak to a broader audience, and you can imagine how to use them!

So, hopefully, researchers can cite web archives on page 153 of a book without needing to be an expert!

We're always looking for ways to engage archivists, librarians, researchers, developers, or any others interested in born-digital heritage!



Thank you!