Project Sustainability and Research Platforms: The Archives Unleashed Cloud Project

Jun 7, 2019

Abstract

The Archives Unleashed Project, founded in 2017 with funding from the Andrew W. Mellon Foundation, aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. We respond to one of the major issues facing web archiving research: that while tools exist to work with WARC files and to enable computational analysis, they require a considerable level of technical knowledge to deploy, use, and maintain.

Our project uses the Archives Unleashed Toolkit, an open-source platform for analyzing web archives (https://github.com/archivesunleashed/aut). Due to space constraints, we do not discuss the Toolkit at length in this abstract. While the Toolkit can analyze ARC and WARC files at scale, it requires familiarity with the command line and a developer environment. We recognize that this expertise is beyond that of the average humanities or social sciences researcher, and the approaches discussed in this presentation aim to make this underlying technical infrastructure accessible.

This presentation expands on the Archives Unleashed Cloud, building on our previous presentations of earlier work at the IIPC meeting in Wellington. While it introduces the Cloud to researchers, it focuses on stimulating a conversation about where the work of the researcher begins and the work of the research platform ends. It also addresses the problem of long-term project sustainability. Researchers want services such as the Cloud, but how can we provide them in a cost-effective manner? This targeted discussion speaks not only to our project but also to broader issues across the web archiving ecosystem.

As we develop the working version of the Archives Unleashed Cloud, one of the project team's main concerns is the future of the Cloud after Mellon funding ends in 2020. While we are currently exploring whether the Cloud makes sense as a stand-alone non-profit corporation, we remain unsure about its future direction. How can services like this, which meet demonstrated needs, survive in the long run? Our presentation discusses our current strategies and aims to engage the audience in a discussion of the state of the field and how best to reach web archiving practitioners.

Projects and services like WebRecorder.io and Archive-It have made great strides in web archive crawling and capture. The Archives Unleashed Cloud seeks to make web archive analysis similarly easy. Yet the sheer scale of web archival data makes this goal harder to achieve.

Event

International Internet Preservation Consortium Web Archiving Conference 2019

Location

Zagreb, Croatia

Nick Ruest

Associate Librarian