archives unleashed

AUT & Last Date Modified

Participants in Archives Unleashed datathons and the cohort program have long asked the same question: how do we find out the date of a resource or page? Dates can be really hard to decipher in web archives. As a result, we tend to rely on the crawl date, which is pretty easy to grab out of a WARC since it is a mandatory field. While we’ve always had this in aut, it is the date on which the crawl occurred, not the date of a response or of a website’s creation.

Enhancing Archives Unleashed Toolkit Usability with Spark-Submit

Originally posted here. Over the last month, we have put out several Toolkit releases. The primary focus of the releases has been firming up and improving spark-submit support. What does this mean? The short answer is that it makes the Toolkit easier to use. Think of the “Let’s move tools towards our users” graphic from my “Cloud-hosted web archive data: The winding path to web archive collections as data” post from a few weeks back.

Cloud-hosted web archive data: The winding path to web archive collections as data

Originally posted here. Web archives are hard to use, and while the past activities of Archives Unleashed have helped to lower these barriers, they’re still there. But what if we could process our collections, place the derivatives in a data repository, and then allow our users to work with them directly in a cloud-hosted notebook? 🤔 Last year around this time, the Archives Unleashed team was working on what can now be referred to as our first iteration of notebooks for web archive analysis.

twut. Wait, wut? twut?

Originally posted here. A few of the Archives Unleashed team members have a pretty in-depth background in working with Twitter data. Jimmy Lin spent some time at Twitter during an extended sabbatical, Sam Fritz spent some time working with members of the Social Media Lab team before joining the Archives Unleashed Project, and Ian Milligan and I have done a fair bit of analysis and writing on our process of collecting and analyzing Canadian Federal Election tweets.