An Exploratory look at 13,968,293 #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo tweets

#JeSuisCharlie #JeSuisAhmed #JeSuisJuif #CharlieHebdo

I've spent the better part of a month collecting tweets from the #JeSuisCharlie, #JeSuisAhmed, #JeSuisJuif, and #CharlieHebdo tweets. Last week, I pulled together all of the collection files, did some clean up, and some more analysis on the data set (76G of json!). This time I was able to take advantage of Peter Binkley's twarc-report project. According to the report, the earliest tweet in the data set is from 2015-01-07 11:59:12 UTC, and the last tweet in the data set is from 2015-01-28 18:15:35 UTC. This data set includes 13,968,293 tweets (10,589,910 retweets - 75.81%) from 3,343,319 different users over 21 days. You can check out a word cloud of all the tweets here.

First tweet in data set (numberic sort of tweet ids):


Hydration

If you want to experiment/follow along with what I've done here, you can "rehydrate" the data set with twarc. You can grab the Tweet ids for the data set from here (Data & Analysis tab).

% twarc.py --hydrate JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweet-ids-20150129.txt > JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

The hydration process will take some time. I'd highly suggest using GNU Screen or tmux, and grabbing approximately 15 pots of coffee.

Map

In this data set, we have 133,970 tweets with geo coordinates availble. This represents about 0.96% of the entire data set.

The map is available here in a separate page since the geojson file is 83M and will potato your browser while everything loads. If anybody knows how to stream that geojson file to Leaflet.js so the browser doesn't potato, please comment! :-)

Users

These are the top 10 users in the data set.

  1. 35,420 tweets Promo_Culturel
  2. 33,075 tweets BotCharlie
  3. 24,251 tweets YaMeCanse21
  4. 23,126 tweets yakacliquer
  5. 17,576 tweets YaMeCanse20
  6. 15,315 tweets iS_Angry_Bird
  7. 9,615 tweets AbraIsacJac
  8. 9,318 tweets AnAnnoyingTweep
  9. 3,967 tweets rightnowio_feed
  10. 3,514 tweets russfeed

This comes from twarc-report's report-profiler.py.

$ ~/git/twarc-report/reportprofiler.py -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

Hashtags

There are teh top 10 hashtags in the data set.

  1. 8,597,175 tweets #charliehebdo
  2. 7,911,343 tweets #jesuischarlie
  3. 377,041 tweets #jesuisahmed
  4. 264,869 tweets #paris
  5. 186,976 tweets #france
  6. 177,448 tweets #parisshooting
  7. 141,993 tweets #jesuisjuif
  8. 140,539 tweets #marcherepublicaine
  9. 129,484 tweets #noussommescharlie
  10. 128,529 tweets #afp

URLs

These are the top 10 URLs in the data set. 3,771,042 tweets (27.00%) had an URL associated with them.

These are all shortened urls. I'm working through and issue with unshorten.py.

  1. http://bbc.in/1xPaVhN (43,708)
  2. http://bit.ly/1AEpWnE (19,328)
  3. http://bit.ly/1DEm0TK (17,033)
  4. http://nyr.kr/14AeVIi (14,118)
  5. http://youtu.be/4KBdnOrTdMI (13,252)
  6. http://bbc.in/14ulyLt (12,407)
  7. http://europe1.fr/direct-video (9,228)
  8. http://bbc.in/1DxNLQD (9,044)
  9. http://ind.pn/1s5EV8w (8,721)
  10. http://srogers.cartodb.com/viz/123be814-96bb-11e4-aec1-0e9d821ea90d/embed_map (8,581)

This comes from twarc-report's report-profiler.py.

$ ~/git/twarc-report/reportprofiler.py -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json

Media

These are the top 10 media urls in the data set. 8,141,552 tweets (58.29%) had a media URL associated with them.

36,753 Occurrences

img

35,942 Occurrences

img

33,501 Occurrences

img

31,712 Occurrences

img

29,359 Occurrences

img

26,334 Occurrences

img

25,989 Occurrences

img

23,974 Occurrences

img

22,659 Occurrences

img

22,421 Occurrences

img

This comes from twarc-report's report-profiler.py.

$ ~/git/twarc-report/reportprofiler.py -o text JeSuisCharlie-JeSuisAhmed-JeSuisJuif-CharlieHebdo-tweets-20150129.json