twitter

Four Fucking Years of Donald Trump

Nearly four years ago I decided to start collecting tweets to Donald Trump out of morbid curiosity. If I were a real archivist, I would have planned this out a little bit better, and started collecting on election night in 2016, or inauguration day 2017. I didn’t. Using twarc, I started collecting with the Filter (Streaming) API on May 7, 2017. That process failed, and I pivoted to using the Search API.

twut. Wait, wut? twut?

Originally posted here. Introduction: A few of the Archives Unleashed team members have a pretty in-depth background of working with Twitter data. Jimmy Lin spent some time at Twitter during an extended sabbatical, Sam Fritz spent some time working with members of the Social Media Lab team before joining the Archives Unleashed Project, and Ian Milligan and I have done a fair bit of analysis and writing on our process of collecting and analyzing Canadian Federal Election tweets.

17,525,913 images tweeted at Donald Trump

Juxta: A couple of years ago I wrote about a method for creating a collage out of 1.2M images collected from the 2015 Canadian Federal Election Twitter dataset. That method was very resource intensive in terms of the amount of temporary disk storage required to create the collage. As the number of images in a given collage increased, the amount of temporary disk space scaled exponentially; 3.5T for 1.2M #elxn42 images, and ~90T for 6.

Tweets to @realdonaldtrump; How many fucks are there to give?

I’ve been collecting tweets to @realDonaldTrump since June 2017. The last time I pulled together and deduped the dataset, I asked myself, “I wonder how many occurrences of ‘fuck’ are in the dataset.” Or, how many fucks are there to give? Well… The data is updated by running a query on the Standard Search API every five days:
$ twarc search 'to:realdonaldtrump' --log donald_search_$DATE.log > donald_search_$DATE.jsonl
Which yields something like this every five days.
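As for the counting question itself, here is a minimal sketch in Python. It assumes each line of the JSONL file is one tweet object from the Standard Search API, with an `id_str` field and the text in either `full_text` or `text`. The function name `count_word` and the sample tweets are made up for illustration. Because the five-day search runs overlap, the sketch skips any tweet id it has already seen before counting.

```python
import json
import re
from typing import Iterable


def count_word(lines: Iterable[str], word: str) -> int:
    """Count case-insensitive occurrences of `word` across unique tweets."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    seen = set()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        tweet_id = tweet.get("id_str")
        if tweet_id in seen:
            continue  # overlapping search runs can return the same tweet twice
        seen.add(tweet_id)
        # Assumption: extended tweets carry "full_text", older ones "text".
        text = tweet.get("full_text") or tweet.get("text") or ""
        total += len(pattern.findall(text))
    return total


# Hypothetical sample data standing in for a donald_search_*.jsonl file.
sample = [
    '{"id_str": "1", "full_text": "Fuck this. fucking hell"}',
    '{"id_str": "1", "full_text": "Fuck this. fucking hell"}',
    '{"id_str": "2", "text": "no swearing here"}',
]
print(count_word(sample, "fuck"))
```

An open file handle can be passed in place of the sample list, since the function only needs an iterable of lines. Note that this is a substring match, so "fucking" counts as an occurrence of "fuck".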

Twitter Wordcloud Pipeline

At this past week’s Archives Unleashed datathon, I jokingly created some wordclouds of my Co-PIs’ timelines.
“Finished my most likely bigly winning #hackarchives project: A Word Cloud of @lintool's timeline! https://t.co/eK2KPGjaGo” (nick ruest, @ruebot, April 27, 2018)
“Or, @ianmilligan1 #HackArchives https://t.co/qMxiet0osl” (nick ruest, @ruebot, April 27, 2018)
Mat Kelly asked about the process this morning, so here is a little how-to of the pipeline: Requirements: twarc, jq, wordcloud_cli.

The world is a beautiful and terrible place

This is the text for my presentation at the “National Forum on Ethics and Archiving the Web”. I had the honour of being on an Archiving Trauma panel with some great people. Michael Connor, Chido Muchemwa, Coral Salomón, Tonia Sutherland, and Lauren Work, thank you for sharing your stories! The world is a beautiful and terrible place. Twitter can be beautiful. Twitter is fucking awful. So, capturing traumatic events on Twitter.

Twitter Bots

Introduction: List of bots I run, divided up by type.
anon: @gccaedits (IP address ranges; Periodic Twitter archive requests)
diffengine (all Account Suspended): @canadaland_diff, @cbc_diff, @cpc_diff, @fairpressdiff, @globemail_diff, @greenparty_diff, @lapress_diff, @liberalca_diff, @millennial_diff, @natpost_diff, @ndpca_diff, @onn_diff, @ontario_diff, @pmgcca_diff, @therebel_diff, @torontosun_diff, @torstar_diff, @yyc_herald_diff
YUDLbots (all Deactivated): @YUDLbot, @YUDLdog, @YUDLcat
DPLA bots: @dplafy (Account Suspended), @dpl_eh (Deactivated)
Other: @notflix_n_chill (code) (Deactivated)

Twitter Datasets and Derivative data

#healthcanada #NACI #fordnation #medicalfreedom #covid19 #covid19vaccines #protectourfamilies #protectyourchildren #holdtheline tweets
2,661,117 tweet ids for #healthcanada #NACI #fordnation #medicalfreedom #covid19 #covid19vaccines #protectourfamilies #protectyourchildren #holdtheline tweets, collected with Documenting the Now's twarc. Tweets can be “rehydrated” with Documenting the Now’s twarc, or Hydrator:
twarc hydrate tweet-ids.txt tweets.jsonl
ID files are available for all hashtags or some individual hashtags: covid19-ids.txt, covid19vaccines-ids.txt, fordnation-ids.txt, healthcanada-ids.txt, healthcanada-NACI-fordnation-medicalfreedom-covid19-covid19vaccines-protectourfamilies-protectyourchildren-holdtheline-ids.txt, holdtheline-ids.txt, medicalfreedom-ids.txt, NACI-ids.txt, protectyourchildren-ids.txt.
Tweets were collected via the Standard Search API on: November 18, 2021; November 21, 2021; November 26, 2021; December 1, 2021.
Dataset
#elxn44 tweets (44th Canadian Federal Election)
2,075,645 tweet ids for #elxn44 tweets, collected with Documenting the Now's twarc.
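The ID files above are the "dehydrated" form of a collection: one tweet id per line, with nothing else kept. Here is a rough sketch in Python of how such a file could be derived from collected JSONL. It assumes each line is a tweet object with an `id_str` field. The function name `tweet_ids` and the sample data are illustrative and not part of twarc.

```python
import json
from typing import Iterable, List


def tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return unique tweet ids in first-seen order."""
    seen = set()
    ids = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet_id = json.loads(line).get("id_str")
        if tweet_id and tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


# Hypothetical sample: two search runs that both returned tweet "101".
sample = [
    '{"id_str": "101", "full_text": "#covid19 first"}',
    '{"id_str": "102", "full_text": "#NACI second"}',
    '{"id_str": "101", "full_text": "#covid19 first"}',
]
ids = tweet_ids(sample)
print("\n".join(ids))  # one id per line, the shape of a *-ids.txt file
```

Writing `"\n".join(ids)` to a file gives something in the same shape as the `-ids.txt` files listed above, which a hydration tool can then turn back into full tweets.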

14,478,518 WomensMarch tweets January 12-28, 2017

Overview A couple Saturday mornings ago, I was on the couch listening to records and reading a book when Christina Harlow and MJ Suhonos asked me about collecting #WomensMarch tweets. Little did I know at the time #WomensMarch would be the largest volume collection I have ever seen. By the time I stopped collecting a week later, we’d amassed 14,478,518 unique tweet ids from 3,582,495 unique users, and at one point hit around 1 million tweets in a single hour.
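Numbers like these (unique tweet ids, unique users, the busiest hour) can be computed from the collected JSONL in one pass. Below is a minimal sketch in Python, assuming each line is a v1.1-style tweet object with `id_str`, `user.id_str`, and a `created_at` string such as `Sat Jan 21 17:05:00 +0000 2017`. The function name `summarize` and the sample tweets are made up for illustration; this is not necessarily how the figures in the post were produced.

```python
import json
from collections import Counter
from datetime import datetime
from typing import Iterable

# Assumed timestamp layout of the "created_at" field in v1.1 tweet JSON.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def summarize(lines: Iterable[str]) -> dict:
    """Count unique tweets and users, and find the hour with the most tweets."""
    tweet_ids = set()
    user_ids = set()
    per_hour = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if tweet["id_str"] in tweet_ids:
            continue  # skip duplicates so each tweet is counted once
        tweet_ids.add(tweet["id_str"])
        user_ids.add(tweet["user"]["id_str"])
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        per_hour[created.strftime("%Y-%m-%d %H:00")] += 1
    busiest = per_hour.most_common(1)[0] if per_hour else (None, 0)
    return {"tweets": len(tweet_ids), "users": len(user_ids), "busiest_hour": busiest}


# Hypothetical sample data; the last line duplicates tweet "3".
sample = [
    '{"id_str": "1", "user": {"id_str": "10"}, "created_at": "Sat Jan 21 17:05:00 +0000 2017"}',
    '{"id_str": "2", "user": {"id_str": "10"}, "created_at": "Sat Jan 21 17:45:00 +0000 2017"}',
    '{"id_str": "3", "user": {"id_str": "11"}, "created_at": "Sat Jan 21 18:01:00 +0000 2017"}',
    '{"id_str": "3", "user": {"id_str": "11"}, "created_at": "Sat Jan 21 18:01:00 +0000 2017"}',
]
summary = summarize(sample)
print(summary)
```

For a collection in the millions, the two sets hold every id in memory, which is usually fine for ids alone but worth keeping in mind.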

1,203,867 elxn42 images

Background: Last August, I began capturing the #elxn42 hashtag as an experiment, and potential research project with Ian Milligan. Once Justin Trudeau was sworn in as the 23rd Prime Minister of Canada, we stopped collection, and began analysing the dataset. We wrote that analysis up for the Code4Lib Journal, which will be published in the next couple of weeks. In the interim, you can check out our pre-print here. Included in that dataset is a line-delimited list of URLs, one for every embedded image tweeted in the dataset: 1,203,867 images.
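A line-delimited list of image URLs like that can be pulled out of tweet JSON. Here is a rough sketch in Python, assuming v1.1-style tweets where photos appear under `extended_entities.media` (or `entities.media` as a fallback), each with a `type` and a `media_url_https`. The function name `image_urls` and the example URLs are hypothetical, and this is not necessarily how the original list was built.

```python
import json
from typing import Iterable, Iterator


def image_urls(lines: Iterable[str]) -> Iterator[str]:
    """Yield the URL of every embedded photo, one per image."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: extended_entities lists all photos; entities only the first.
        entities = tweet.get("extended_entities") or tweet.get("entities") or {}
        for media in entities.get("media", []):
            if media.get("type") != "photo":
                continue  # skip videos and animated GIFs
            url = media.get("media_url_https") or media.get("media_url")
            if url:
                yield url


# Hypothetical sample tweets; example.org URLs are placeholders.
sample = [
    json.dumps({"id_str": "1", "extended_entities": {"media": [
        {"type": "photo", "media_url_https": "https://example.org/a.jpg"},
        {"type": "photo", "media_url_https": "https://example.org/b.jpg"},
    ]}}),
    json.dumps({"id_str": "2", "entities": {"hashtags": []}}),
    json.dumps({"id_str": "3", "entities": {"media": [
        {"type": "photo", "media_url_https": "https://example.org/c.jpg"},
    ]}}),
]
urls = list(image_urls(sample))
print("\n".join(urls))
```

The same image retweeted many times will show up many times in this output; whether to dedupe depends on whether the count should reflect distinct images or images as tweeted.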