Twitter

Tweets to @realdonaldtrump; How many fucks are there to give?

I’ve been collecting tweets to @realDonaldTrump since June 2017. The last time I pulled the dataset together and deduped it, I asked myself, “I wonder how many occurrences of ‘fuck’ are in the dataset.” Or, how many fucks are there to give? Well… The data is updated by running a query on the Standard Search API every five days. $ twarc search 'to:realdonaldtrump' --log donaldsearch$DATE.log > donaldsearch$DATE.jsonl Which yields something like this every five days.
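A minimal sketch of how such a count could be done over the line-delimited JSON that twarc writes. This is not the method used in the post; the function name `count_word` and the sample tweets are made up for illustration, and it assumes each tweet's text sits in `full_text` (extended mode) or `text`.

```python
import json
import re


def count_word(lines, word):
    """Count case-insensitive occurrences of `word` across tweet texts."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumption: text is in "full_text" (extended mode) or "text".
        text = tweet.get("full_text") or tweet.get("text") or ""
        total += len(pattern.findall(text))
    return total


# Made-up sample lines standing in for a donaldsearch$DATE.jsonl file.
sample = [
    '{"id_str": "1", "full_text": "Fuck this, and fuck that"}',
    '{"id_str": "2", "text": "nothing to see here"}',
]
print(count_word(sample, "fuck"))
```

An open file object can be passed in place of `sample`, since the function only iterates over lines. Note that this is a substring match, so words such as "fucking" are counted too.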

Twitter Wordcloud Pipeline

At this past week’s Archives Unleashed datathon, I jokingly created some wordclouds of my Co-PIs’ timelines. Finished my most likely bigly winning #hackarchives project: A Word Cloud of @lintool's timeline! https://t.co/eK2KPGjaGo — nick ruest (@ruebot) April 27, 2018 Or, @ianmilligan1 #HackArchives https://t.co/qMxiet0osl — nick ruest (@ruebot) April 27, 2018 Mat Kelly asked about the process this morning, so here is a little how-to of the pipeline: Requirements: twarc, jq, wordcloud_cli.

The world is a beautiful and terrible place

This is the text for my presentation at the “National Forum on Ethics and Archiving the Web”. I had the honour of being on an Archiving Trauma panel with some great people. Michael Connor, Chido Muchemwa, Coral Salomón, Tonia Sutherland, and Lauren Work, thank you for sharing your stories! The world is a beautiful and terrible place. Twitter can be beautiful. Twitter is fucking awful. So, capturing traumatic events on Twitter.

A month of tweets at @realDonaldTrump

Twitter Bots

Twitter Datasets and Derivative data

Tweets to Donald Trump (@realDonaldTrump) 59,261,490 tweet ids for tweets directed at Donald Trump (@realDonaldTrump), collected with Documenting the Now’s twarc. Tweets can be “rehydrated” with Documenting the Now’s twarc, or Hydrator. twarc hydrate to_realdonaldtrump_ids.txt > to_donaltrump.jsonl. Tweets in the dataset from May 7, 2017 to June 21, 2017 were collected with a combination of the Filter (Streaming) API and the Search API. The Filter API failed on June 21, 2017. From June 23, 2017 forward, only the Search API was used to collect.

14,478,518 WomensMarch tweets January 12-28, 2017

Overview A couple Saturday mornings ago, I was on the couch listening to records and reading a book when Christina Harlow and MJ Suhonos asked me about collecting #WomensMarch tweets. Little did I know at the time #WomensMarch would be the largest volume collection I have ever seen. By the time I stopped collecting a week later, we’d amassed 14,478,518 unique tweet ids from 3,582,495 unique users, and at one point hit around 1 million tweets in a single hour.
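A rough sketch of how the unique tweet and user counts could be derived from line-delimited tweet JSON. It assumes the Twitter v1.1 tweet fields `id_str` and `user.id_str`; the function name `unique_counts` and the sample tweets are hypothetical, and this is not necessarily how the figures above were produced.

```python
import json


def unique_counts(lines):
    """Return (unique tweet ids, unique user ids) for line-delimited tweets."""
    tweet_ids = set()
    user_ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        # Assumption: v1.1-style tweets with "id_str" and "user.id_str".
        tweet_ids.add(tweet["id_str"])
        user_ids.add(tweet["user"]["id_str"])
    return len(tweet_ids), len(user_ids)


# Made-up sample: tweet "2" appears twice, as it might after merging
# overlapping collection files.
sample = [
    '{"id_str": "1", "user": {"id_str": "10"}}',
    '{"id_str": "2", "user": {"id_str": "10"}}',
    '{"id_str": "2", "user": {"id_str": "10"}}',
    '{"id_str": "3", "user": {"id_str": "11"}}',
]
print(unique_counts(sample))
```

Holding every id in a set is simple, but memory use grows with the collection, so a dataset of this size may call for a streaming or sort-based approach instead.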

1,203,867 elxn42 images

Background Last August, I began capturing the #elxn42 hashtag as an experiment and potential research project with Ian Milligan. Once Justin Trudeau was sworn in as the 23rd Prime Minister of Canada, we stopped collection and began analysing the dataset. We wrote that analysis up for the Code4Lib Journal, which will be published in the next couple weeks. In the interim, you can check out our pre-print here. Included in that dataset is a line-delimited list of URLs, one for every embedded image tweeted in the dataset: 1,203,867 images.

A look at 14,939,154 paris Bataclan parisattacks porteouverte tweets

On November 13, 2015 I was at the “Web Archives 2015: Capture, Curate, Analyze” listening to Ian Milligan give the closing keynote when Thomas Padilla tweeted the following to me: @ruebot terrible news, possible charlie hebdo connection - https://t.co/SkEusgqgz5 — Thomas Padilla (@thomasgpadilla) November 13, 2015 I immediately started collecting. When tragedies like this happen, I feel pretty powerless. But, I figure if I can collect something like this, similar to what I did for the Charlie Hebdo attacks, it’s something.

An Exploratory look at 13,968,293 JeSuisCharlie, JeSuisAhmed, JeSuisJuif, and CharlieHebdo tweets

#JeSuisCharlie #JeSuisAhmed #JeSuisJuif #CharlieHebdo I’ve spent the better part of a month collecting tweets from the #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo hashtags. Last week, I pulled together all of the collection files, did some clean up, and some more analysis on the data set (76G of json!). This time I was able to take advantage of Peter Binkley’s twarc-report project. According to the report, the earliest tweet in the data set is from 2015-01-07 11:59:12 UTC, and the last tweet in the data set is from 2015-01-28 18:15:35 UTC.
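For illustration only, here is a small sketch of how the earliest and latest timestamps could be pulled from line-delimited tweet JSON. It is not twarc-report's implementation; `time_span` is a made-up name, and it assumes the v1.1 `created_at` format (for example `Wed Jan 07 11:59:12 +0000 2015`). The first and last sample lines reuse the two timestamps quoted above; the middle one is invented.

```python
import json
from datetime import datetime

# Assumption: Twitter v1.1 "created_at" strings look like
# "Wed Jan 07 11:59:12 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def time_span(lines):
    """Return (earliest, latest) tweet datetimes, or (None, None) if empty."""
    earliest = latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        if earliest is None or created < earliest:
            earliest = created
        if latest is None or created > latest:
            latest = created
    return earliest, latest


# Sample lines; only "created_at" is needed for this sketch.
sample = [
    '{"created_at": "Thu Jan 15 08:00:00 +0000 2015"}',
    '{"created_at": "Wed Jan 07 11:59:12 +0000 2015"}',
    '{"created_at": "Wed Jan 28 18:15:35 +0000 2015"}',
]
earliest, latest = time_span(sample)
print(earliest.isoformat(), latest.isoformat())
```

A single pass with running minimum and maximum avoids sorting, which matters for a collection on the order of tens of gigabytes.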