Capturing the Web Today for Tomorrow: Innovations in capturing and analyzing social media and websites for the new scholarly record


The growth of digital sources since the advent of the World Wide Web in 1991, and the commencement of widespread web archiving in 1996, presents profound new opportunities for social and cultural analysis. In simple terms, the 1990s cannot be studied without web archives: they are both primary sources that reflect how people consume and understand media and repositories that document the thoughts, opinions, and activities of millions of everyday people. These are a dream for social historians. However, this opportunity brings challenges. The size and complexity of the data require interdisciplinary collaboration. Historians might have the research questions but not the technical resources or knowledge to work with these sources, so they need to reach out to other disciplines. Libraries and archives are perfectly positioned to work in this emerging field, which brings together historians, computer scientists, and information specialists. In this talk, our speakers will discuss the fruits of one collaboration that has emerged at York University, the University of Alberta, and the University of Waterloo. Bringing together librarians, archivists, historians, and computer scientists, as well as an interdisciplinary team of undergraduate and graduate students, this distributed group is developing several web archival analytics projects. The group uses a combination of centralized and decentralized infrastructure to run data analytics, store web archives, provide a public-facing portal, and collaborate. Ian and Nick will discuss the challenges of working in an interdisciplinary environment and, through detailed case studies of the team's work with Twitter archiving and analysis, Compute Canada, and Warcbase (a web analytics platform), give insight into how the team has been working. Collaboration between computer scientists, librarians, archivists, and humanists is not always simple, but it is a collaboration worth pursuing.

