# Preliminary look at 3,893,553 JeSuisCharlie tweets

## Background

Last Friday (January 9, 2015) I started capturing #JeSuisAhmed, #JeSuisCharlie, #JeSuisJuif, and #CharlieHebdo with Ed Summers’ twarc. I have about 12 million tweets at the time of writing, and I plan to write up something more in-depth in the coming weeks. For now, here is some preliminary analysis of #JeSuisCharlie. If you haven’t seen these two posts by Ed Summers (“A Ferguson Twitter Archive” and “On Forgetting and hydration”), please do check them out.

How fast were the tweets coming in? To get a sense of this, I made a quick recording of tailing the twarc log for the #JeSuisCharlie capture.

## Hydration

If you checked out both of Ed’s posts, you’ll have noticed that the Twitter ToS forbid the distribution of tweets, but we can distribute the tweet ids, and based on those we can “rehydrate” the data set locally. The tweet ids for each hashtag will be/are available here. I’ll update and release the tweet id files as I can.

We’re looking at about 12 million tweets (un-deduped) at the time of writing, so the hydration process will take some time. I’d highly suggest running it inside GNU Screen or tmux.

Hydrate

• #JeSuisCharlie: % twarc.py --hydrate %23JeSuisCharlie-ids-20150112.txt > %23JeSuisCharlie-tweets-20150112.json
• #JeSuisAhmed: % twarc.py --hydrate %23JeSuisAhmed-ids-20150112.txt > %23JeSuisAhmed-tweets-20150112.json
• #JeSuisJuif: % twarc.py --hydrate %23JeSuisJuif-ids-20150112.txt > %23JeSuisJuif-tweets-20150112.json
• #CharlieHebdo: % twarc.py --hydrate %23CharlieHebdo-ids-20150112.txt > %23CharlieHebdo-tweets-20150112.json
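For the reverse direction, here is a minimal sketch of how a tweet id file could be produced from a capture, so that only ids get distributed. It assumes the capture is line-oriented JSON (one tweet per line) and that each tweet carries the `id_str` field from the Twitter API; the function name `tweet_ids` is made up for illustration and is not part of twarc.

```python
import json
from typing import Iterable, Iterator


def tweet_ids(lines: Iterable[str]) -> Iterator[str]:
    """Yield each tweet id once from a line-oriented JSON stream."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the capture
        tweet = json.loads(line)
        tweet_id = tweet["id_str"]  # assumption: Twitter API tweet object
        if tweet_id not in seen:
            seen.add(tweet_id)
            yield tweet_id


# Tiny in-memory sample standing in for a capture file.
sample = [
    '{"id_str": "1", "text": "a"}',
    '{"id_str": "2", "text": "b"}',
    '{"id_str": "1", "text": "a"}',
]
ids = list(tweet_ids(sample))
print("\n".join(ids))
```

With a real capture you would pass an open file object instead of `sample` and write the ids to a text file, one per line.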

## Map

#JeSuisCharlie tweets with geo coordinates.

In this data set, we have 51,942 tweets with geo coordinates available. This represents about 1.33% of the entire data set (3,893,553 tweets).

How do you make this?

• Create the geojson: % ~/git/twarc/utils/geojson.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped.geojson

• Give the geojson a variable name.

• Use Leaflet.js to put all the tweets with geo coordinates on a map like this.
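The first two steps can be sketched roughly as follows. This is not the code in twarc’s geojson.py, just an illustration of the idea: it assumes each tweet has the Twitter API `coordinates` field, which is either null or already a GeoJSON Point, and the function names and the variable name `tweets` are hypothetical.

```python
import json


def to_feature_collection(tweets):
    """Build a GeoJSON FeatureCollection from tweets that have coordinates."""
    features = []
    for tweet in tweets:
        point = tweet.get("coordinates")  # GeoJSON Point or None
        if not point:
            continue  # most tweets carry no geo coordinates
        features.append({
            "type": "Feature",
            "geometry": point,
            "properties": {
                "id": tweet.get("id_str"),
                "text": tweet.get("text"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


def as_js_variable(name, collection):
    """Wrap the GeoJSON in a JavaScript assignment so a page can load it."""
    return "var %s = %s;" % (name, json.dumps(collection))


sample_tweets = [
    {"id_str": "1", "text": "a",
     "coordinates": {"type": "Point", "coordinates": [2.35, 48.85]}},
    {"id_str": "2", "text": "b", "coordinates": None},
]
collection = to_feature_collection(sample_tweets)
js = as_js_variable("tweets", collection)
print(len(collection["features"]), "of", len(sample_tweets), "tweets have coordinates")
```

The resulting JavaScript file can then be included in the page and handed to Leaflet’s GeoJSON layer. The same feature count divided by the total number of tweets gives the share of geotagged tweets, which is where a figure like 51,942 / 3,893,553 ≈ 1.33% comes from.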

## Top URLs

Top 10 URLs tweeted from #JeSuisCharlie.

The full list of URLs can be found here.

How do you get the list?

• % cat %23JeSuisCharlie-cat-20150115-tweets-deduped.json | ~/git/twarc/utils/unshorten.py > %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json
• % cat %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json | ~/git/twarc/utils/urls.py | sort | uniq -c | sort -n > %23JeSuisCharlie-cat-20150115-urls.txt
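The counting half of that pipeline (`sort | uniq -c | sort -n`) could also be done in Python. The sketch below assumes line-oriented JSON and reads the `expanded_url` of each entry in `entities.urls`, as found in Twitter API tweet objects; it does not unshorten anything, and `url_counts` is a hypothetical name rather than a twarc utility.

```python
import json
from collections import Counter


def url_counts(lines):
    """Count how often each URL appears across a stream of JSON tweets."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        for url in tweet.get("entities", {}).get("urls", []):
            expanded = url.get("expanded_url")
            if expanded:
                counts[expanded] += 1
    return counts


sample = [
    '{"entities": {"urls": [{"expanded_url": "http://example.com/a"}]}}',
    '{"entities": {"urls": [{"expanded_url": "http://example.com/a"}, '
    '{"expanded_url": "http://example.com/b"}]}}',
    '{"entities": {"urls": []}}',
]
counts = url_counts(sample)
for url, n in counts.most_common(10):
    print(n, url)
```

Note that `most_common(10)` lists the most frequent URLs first, whereas `sort -n` in the shell pipeline puts them at the end of the file.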

Top 10 Twitter clients used from #JeSuisCharlie.

5. (86209) TweetDeck
6. (82616) Twitter for Windows Phone
7. (70286) Twitter for Android Tablets
9. (39174) Instagram
10. (21424) Mobile Web (M5)

Full list of clients can be found here.

How do you get this?

• % ~/git/twarc/utils/source.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped-source.html
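A rough sketch of the underlying tally, for anyone who wants counts rather than an HTML page: in Twitter API tweet objects the `source` field holds the client as an HTML link, so the example strips the tags and counts the remaining names. It assumes line-oriented JSON; `client_name` and `client_counts` are illustrative names, and this is not how source.py itself is written.

```python
import json
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")  # matches any HTML tag


def client_name(source):
    """Reduce '<a href=...>Twitter for iPhone</a>' to 'Twitter for iPhone'."""
    return TAG.sub("", source).strip()


def client_counts(lines):
    """Count tweets per client across a stream of JSON tweets."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        counts[client_name(tweet.get("source", ""))] += 1
    return counts


sample = [
    json.dumps({"source": '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>'}),
    json.dumps({"source": '<a href="http://instagram.com" rel="nofollow">Instagram</a>'}),
    json.dumps({"source": '<a href="https://about.twitter.com/products/tweetdeck" rel="nofollow">TweetDeck</a>'}),
]
clients = client_counts(sample)
for name, n in clients.most_common(10):
    print("(%d) %s" % (n, name))
```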

## Word cloud

Word cloud from #JeSuisCharlie tweets.

I couldn’t get the word cloud to embed nicely, so you’ll have to check it out here.

How do you create the word cloud?

• % git/twarc/utils/wordcloud.py %23JeSuisCharlie-cat-20150115-tweets.json > %23JeSuisCharlie-wordcloud.html
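A word cloud is ultimately driven by word frequencies, so here is a minimal sketch of how those could be computed from tweet text. It is only an illustration under my own assumptions (drop URLs, lowercase, keep letter-only tokens longer than two characters, optional stop words) and may differ from what wordcloud.py actually does; `word_counts` is a hypothetical name.

```python
import re
from collections import Counter

URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[^\W\d_]+")  # runs of letters, including accented ones


def word_counts(texts, stop_words=frozenset()):
    """Count words across tweet texts, ignoring URLs and very short tokens."""
    counts = Counter()
    for text in texts:
        text = URL.sub(" ", text)  # links are not useful in a word cloud
        for word in WORD.findall(text.lower()):
            if len(word) > 2 and word not in stop_words:
                counts[word] += 1
    return counts


texts = [
    "Je suis Charlie http://t.co/abc",
    "Nous sommes Charlie",
]
words = word_counts(texts, stop_words={"nous"})
for word, n in words.most_common(5):
    print(n, word)
```

The `text` field of each hydrated tweet would be the natural input here; the sizes in the cloud then scale with the counts.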