An Exploratory look at 13,968,293 JeSuisCharlie, JeSuisAhmed, JeSuisJuif, and CharlieHebdo tweets

#JeSuisCharlie #JeSuisAhmed #JeSuisJuif #CharlieHebdo I’ve spent the better part of a month collecting tweets from the #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo hashtags. Last week, I pulled together all of the collection files, did some cleanup, and ran some more analysis on the data set (76G of JSON!). This time I was able to take advantage of Peter Binkley’s twarc-report project. According to the report, the earliest tweet in the data set is from 2015-01-07 11:59:12 UTC, and the last tweet in the data set is from 2015-01-28 18:15:35 UTC.
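For readers who want to check the earliest and latest timestamps without twarc-report, here is a minimal sketch. It is not how twarc-report computes them; it assumes the collection is line-oriented JSON (one tweet per line) and that each tweet carries the `created_at` string in the Twitter API v1.1 format. The function name `tweet_time_range` is made up for this example.

```python
import json
from datetime import datetime

# Assumed format of the "created_at" field in Twitter API v1.1 tweet JSON,
# e.g. "Wed Jan 07 11:59:12 +0000 2015".
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_time_range(lines):
    """Return (earliest, latest) created_at datetimes from line-oriented tweet JSON.

    `lines` is any iterable of strings, for example an open file handle.
    Blank lines are skipped. Returns (None, None) if there are no tweets.
    """
    earliest = latest = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        if earliest is None or created < earliest:
            earliest = created
        if latest is None or created > latest:
            latest = created
    return earliest, latest


# Example usage (the file name is a placeholder):
# with open("tweets.json", encoding="utf-8") as fh:
#     first, last = tweet_time_range(fh)
#     print(first.isoformat(), last.isoformat())
```

Reading the file line by line keeps memory use flat, which matters for a collection of this size.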

An exploratory look at 257,093 JeSuisAhmed tweets

#JeSuisAhmed I had some time last night to do some exploratory analysis on part of the #JeSuisAhmed collection. This analysis covers the period from the first #JeSuisAhmed tweet I was able to harvest to some time on January 14, 2015, when I copied over the JSON to experiment with a few of the twarc utilities. First tweet in data set: #JeSuisAhmed Reveals the Hero of the Paris Shooting Everyone Needs to Know by @sophie_kleeman http://t.

JeSuisCharlie images

Using the #JeSuisCharlie data set from January 11, 2015 (Warning! Will turn your browser into a potato for a few seconds), these are the image urls that have more than 1000 occurrences in the data set.

How to create (requires unshrtn):

% twarc.py --query "#JeSuisCharlie"
% ~/git/twarc/utils/deduplicate.py JeSuisCharlie-tweets.json > JeSuisCharlie-tweets-deduped.json
% cat JeSuisCharlie-tweets-deduped.json | utils/unshorten.py > JeSuisCharlie-tweets-deduped-ushortened.json
% ~/git/twarc/utils/image_urls.py JeSuisCharlie-tweets-deduped-ushortened.json >| JeSuisCharlie-20150115-image-urls.txt
% cat JeSuisCharlie-20150115-image-urls.txt | sort | uniq -c | sort -rn > JeSuisCharlie-20150115-image-urls-ranked.
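The final ranking step (`sort | uniq -c | sort -rn`, then keeping urls with more than 1000 occurrences) can also be sketched in Python. This is an illustrative equivalent, not part of twarc; it assumes the input has one image url per line, as the image-urls file above appears to, and the name `rank_urls` is made up for this example.

```python
from collections import Counter


def rank_urls(lines, more_than=0):
    """Count identical lines and return (url, count) pairs, most frequent first.

    `lines` is any iterable of strings, for example an open file handle.
    Only urls that occur strictly more than `more_than` times are kept.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(url, n) for url, n in counts.most_common() if n > more_than]


# Example usage, keeping only urls seen more than 1000 times:
# with open("JeSuisCharlie-20150115-image-urls.txt", encoding="utf-8") as fh:
#     for url, n in rank_urls(fh, more_than=1000):
#         print(n, url)
```

Unlike the shell pipeline, this holds one counter entry per distinct url in memory, which should be fine for a url list but is worth keeping in mind for very large inputs.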

Preliminary stats of JeSuisCharlie, JeSuisAhmed, JeSuisJuif, CharlieHebdo

#JeSuisAhmed

$ wc -l *json
  148479 %23JeSuisAhmed-20150109103430.json
   94874 %23JeSuisAhmed-20150109141746.json
    5885 %23JeSuisAhmed-20150112092647.json
  249238 total
$ du -h
2.7G .

#JeSuisCharlie

$ wc -l *json
  3894191 %23JeSuisCharlie-20150109094220.json
  1758849 %23JeSuisCharlie-20150109141730.json
   226784 %23JeSuisCharlie-20150112092710.json
       15 %23JeSuisCharlie-20150112092734.json
  5879839 total
$ du -h
32G .

#JeSuisJuif

$ wc -l *json
   23694 %23JeSuisJuif-20150109172957.json
   50603 %23JeSuisJuif-20150109173104.json
    5941 %23JeSuisJuif-20150110003450.json
   42237 %23JeSuisJuif-20150112094500.json
    5064 %23JeSuisJuif-20150112094648.json
  127539 total
$ du -h
671M .

#CharlieHebdo

$ wc -l *json
  4444585 %23CharlieHebdo-20150109172713.
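The `wc -l` counts above work as tweet counts only if each file holds one tweet per line. A rough Python equivalent, under that same assumption, is sketched below; `count_tweets` and `tally` are made-up names, and unlike `wc -l` this version skips blank lines.

```python
def count_tweets(lines):
    """Count non-blank lines, assuming one JSON tweet per line."""
    return sum(1 for line in lines if line.strip())


def tally(named_sources):
    """Return per-source counts plus a 'total' entry, like the wc -l summary.

    `named_sources` maps a label (such as a file name) to an iterable of lines.
    """
    counts = {name: count_tweets(lines) for name, lines in named_sources.items()}
    counts["total"] = sum(counts.values())
    return counts


# Example usage over every JSON file in the current directory:
# import glob
# handles = {p: open(p, encoding="utf-8") for p in glob.glob("*json")}
# try:
#     print(tally(handles))
# finally:
#     for fh in handles.values():
#         fh.close()
```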

Preliminary look at 3,893,553 JeSuisCharlie tweets

Background Last Friday (January 9, 2015) I started capturing #JeSuisAhmed, #JeSuisCharlie, #JeSuisJuif, and #CharlieHebdo with Ed Summers’ twarc. I have about 12 million tweets at the time of writing, and I plan to write up something more in-depth in the coming weeks. For now, here is some preliminary analysis of #JeSuisCharlie. If you haven’t seen these two posts by Ed Summers (“A Ferguson Twitter Archive” and “On Forgetting and hydration”), please do check them out.