twarc

Preliminary look at 3,893,553 #JeSuisCharlie tweets

Background

Last Friday (January 9, 2015) I started capturing #JeSuisAhmed, #JeSuisCharlie, #JeSuisJuif, and #CharlieHebdo with Ed Summers' twarc. I have about 12 million tweets at the time of writing this, and plan on writing up something a little bit more in-depth in the coming weeks. But for now, some preliminary analysis of #JeSuisCharlie, and if you haven't seen these two posts ("A Ferguson Twitter Archive", "On Forgetting and hydration") by Ed Summers, please do check them out.

How fast were the tweets coming in? Just to try and get a sense of this, I did a quick recording of tailing the twarc log for the #JeSuisCharlie capture.

Hydration

If you checked out both of Ed's posts, you'll have noticed that the Twitter ToS forbids the distribution of tweets, but we can distribute the tweet ids, and based on that we can "rehydrate" the data set locally. The tweet ids for each hashtag will be/are available here. I'll update and release the tweet id files as I can.
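For anyone curious how an ids file like these comes about, here is a minimal Python sketch, not the script I actually used. It assumes the capture is line-oriented JSON (one tweet per line) and that each tweet carries an `id_str` field; `extract_ids` and the file names in the comment are hypothetical.

```python
import json

def extract_ids(lines):
    """Yield the tweet id from each non-empty line of line-oriented tweet JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: every tweet object has an "id_str" field.
        yield tweet["id_str"]

# Hypothetical usage, one id per output line:
# with open("tweets.json") as src, open("ids.txt", "w") as dst:
#     for tweet_id in extract_ids(src):
#         dst.write(tweet_id + "\n")
```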

We're looking at just around 12 million tweets (un-deduped) at the time of writing, so the hydration process will take some time. I'd highly suggest using GNU Screen or tmux.

Hydrate

  • #JeSuisCharlie: % twarc.py --hydrate %23JeSuisCharlie-ids-20150112.txt > %23JeSuisCharlie-tweets-20150112.json
  • #JeSuisAhmed: % twarc.py --hydrate %23JeSuisAhmed-ids-20150112.txt > %23JeSuisAhmed-tweets-20150112.json
  • #JeSuisJuif: % twarc.py --hydrate %23JeSuisJuif-ids-20150112.txt > %23JeSuisJuif-tweets-20150112.json
  • #CharlieHebdo: % twarc.py --hydrate %23CharlieHebdo-ids-20150112.txt > %23CharlieHebdo-tweets-20150112.json

Map

#JeSuisCharlie tweets with geo coordinates.

In this data set, we have 51,942 tweets with geo coordinates available. This represents about 1.33% of the entire data set (3,893,553 tweets).
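If you want to check that share on your own hydrated copy, something like the following should do it. This is a rough sketch under assumptions: one tweet per line, and a geotagged tweet is one whose `coordinates` field is not null. `geo_share` is a hypothetical helper name.

```python
import json

def geo_share(lines):
    """Return (geotagged, total, percent) for an iterable of JSON tweet lines."""
    total = 0
    geotagged = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        total += 1
        # Assumption: a geotagged tweet has a non-null "coordinates" object.
        if tweet.get("coordinates"):
            geotagged += 1
    percent = 100.0 * geotagged / total if total else 0.0
    return geotagged, total, percent

# The figures quoted above: 51,942 of 3,893,553 tweets.
print(round(100.0 * 51942 / 3893553, 2))  # prints 1.33
```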

How do you make this?

  • Create the geojson: % ~/git/twarc/utils/geojson.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped.geojson

  • Give the geojson a variable name.

  • Use Leaflet.js to put all the tweets with geo coordinates on a map like this.
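The "variable name" step can be done by hand in an editor, but here is one way it could be scripted. This is a minimal sketch, assuming the geojson file holds a single JSON document; the function name `geojson_to_js` and the variable name `tweets` are my own placeholders.

```python
import json

def geojson_to_js(geojson_text, var_name="tweets"):
    """Wrap a GeoJSON document in a JavaScript variable assignment."""
    # Parse first so that invalid JSON fails here rather than in the browser.
    data = json.loads(geojson_text)
    return "var {} = {};\n".format(var_name, json.dumps(data))

# Hypothetical usage:
# with open("tweets.geojson") as src, open("tweets.js", "w") as dst:
#     dst.write(geojson_to_js(src.read()))
```

The resulting file can be loaded with a script tag ahead of the map code, so the Leaflet page can refer to the data by that variable name.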

Top URLs

Top 10 URLs tweeted from #JeSuisCharlie.

  1. (11220) http://www.newyorker.com/culture/culture-desk/cover-story-2015-01-19?mbid=social_twitter
  2. (2278) http://www.europe1.fr/direct-video
  3. (1615) https://www.youtube.com/watch?v=4KBdnOrTdMI&feature=youtu.be
  4. (1347) https://www.youtube.com/watch?v=-bjbUg9d64g&feature=youtu.be
  5. (1333) http://www.amazon.com/Charlie-Hebdo/dp/B00007LMFU/
  6. (977) http://www.clubic.com/internet/actualite-748637-opcharliehebdo-anonymous-vengeance.html
  7. (934) http://www.maryam-rajavi.com/en/index.php?option=com_content&view=article&id=1735&catid=159&Itemid=506
  8. (810) http://www.lequipe.fr/eStore/Offres/Achat/271918
  9. (771) http://srogers.cartodb.com/viz/123be814-96bb-11e4-aec1-0e9d821ea90d/embed_map
  10. (605) https://www.youtube.com/watch?v=et4fYWKjP_o

Full list of urls can be found here.

How do you get the list?

  • % cat %23JeSuisCharlie-cat-20150115-tweets-deduped.json | ~/git/twarc/utils/unshorten.py > %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json
  • % cat %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json | ~/git/twarc/utils/urls.py | sort | uniq -c | sort -n > %23JeSuisCharlie-cat-20150115-urls.txt
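The sort | uniq -c | sort -n part of that pipeline is just a frequency count, which can also be sketched in Python. This is an illustration rather than what the twarc utilities do internally. It assumes one tweet per line with URLs under `entities.urls`, and the `unshortened_url` key is my assumption about what the unshortening step adds; the sketch falls back to `expanded_url` and then `url`. `top_urls` is a hypothetical helper name.

```python
import json
from collections import Counter

def top_urls(lines, n=10):
    """Return the n most frequently tweeted URLs as (url, count) pairs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        for entry in tweet.get("entities", {}).get("urls", []):
            # Assumption: "unshortened_url" is present after unshortening;
            # otherwise fall back to the fields Twitter itself provides.
            url = (entry.get("unshortened_url")
                   or entry.get("expanded_url")
                   or entry.get("url"))
            if url:
                counts[url] += 1
    # most_common sorts by descending count, so the top URL comes first.
    return counts.most_common(n)
```

Unlike sort -n, which puts the most tweeted URLs at the bottom of the file, this returns them in descending order.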

Twitter Clients

Top 10 Twitter clients used from #JeSuisCharlie.

  1. (1283521) Twitter for iPhone
  2. (951925) Twitter Web Client
  3. (847308) Twitter for Android
  4. (231713) Twitter for iPad
  5. (86209) TweetDeck
  6. (82616) Twitter for Windows Phone
  7. (70286) Twitter for Android Tablets
  8. (44189) Twitter for Websites
  9. (39174) Instagram
  10. (21424) Mobile Web (M5)

Full list of clients can be found here.

How do you get this?

  • % ~/git/twarc/utils/source.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped-source.html
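To give a rough idea of what a client tally involves, here is a small Python sketch. It is not the twarc utility itself. It assumes one tweet per line, and that the `source` field holds the client name wrapped in an HTML link, which is how Twitter delivers it. `top_clients` is a hypothetical helper name.

```python
import json
import re
from collections import Counter

# Matches any HTML tag, so the link wrapper around the client name can be removed.
TAG = re.compile(r"<[^>]+>")

def top_clients(lines, n=10):
    """Return the n most common Twitter clients as (name, count) pairs."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: "source" looks like '<a href="...">Twitter for iPhone</a>'.
        source = tweet.get("source") or ""
        name = TAG.sub("", source).strip()
        if name:
            counts[name] += 1
    return counts.most_common(n)
```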

Word cloud

Word cloud from #JeSuisCharlie tweets.

I couldn't get the word cloud to embed nicely, so you'll have to check it out here.

How do you create the word cloud?

  • % ~/git/twarc/utils/wordcloud.py %23JeSuisCharlie-cat-20150115-tweets.json > %23JeSuisCharlie-wordcloud.html

More Rob Ford tweets on a map

Another example of how global the Rob Ford scandal has become via harvested tweets with geographic coordinates. This example is a harvest of #rofo, #robford, #topoli, and #ShirtlessHorde.

The harvest took place on July 6, 2014, and should cover the discussion around the time of Rob Ford's return, from June 30, 2014 to July 6, 2014. The tweets with available geo-information represent less than 10% of all tweets harvested. If you would like the raw tweet data (not the geoJSON - you can grab that if you view the source), you can get it from here. If you would like to see all the tweets harvested, you can view them here. (Warning! This might blow up your browser. There is a fair bit of data here.)

Tweets were harvested with Ed Summers' twarc.

#rofo OR #robford OR #topoli OR #ShirtlessHorde