Background
Last Friday (January 9, 2015) I started capturing #JeSuisAhmed, #JeSuisCharlie, #JeSuisJuif, and #CharlieHebdo with Ed Summers’ twarc. I have about 12 million tweets at the time of writing, and plan on writing up something more in-depth in the coming weeks. For now, here is some preliminary analysis of #JeSuisCharlie. If you haven’t seen Ed Summers’ two posts, “A Ferguson Twitter Archive” and “On Forgetting and hydration”, please do check them out.
How fast were the tweets coming in? To get a sense of this, I made a quick recording of tailing the twarc log for the #JeSuisCharlie capture.
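The recording gives an impression of the rate. To put numbers on it, here is a minimal sketch that buckets captured tweets by minute using each tweet's created_at field. It assumes one JSON tweet per line (the layout twarc writes), and the function name is hypothetical rather than part of twarc. Minutes are in UTC, since created_at carries a +0000 offset.

```python
import json
from collections import Counter
from datetime import datetime

# Twitter's created_at format, e.g. "Fri Jan 09 14:02:11 +0000 2015"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def tweets_per_minute(lines):
    """Count tweets per minute from line-oriented JSON (one tweet per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        # Truncate to the minute so tweets from the same minute share a bucket
        counts[created.strftime("%Y-%m-%d %H:%M")] += 1
    return counts
```

For example, `sorted(tweets_per_minute(open("%23JeSuisCharlie-tweets-20150112.json")).items())` lists each minute with its tweet count.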
Hydration
If you checked out both of Ed’s posts, you’ll have noticed that Twitter’s Terms of Service forbid distributing tweets, but we can distribute the tweet ids, and from those anyone can “rehydrate” the data set locally. The tweet ids for each hashtag are (or will be) available here. I’ll update and release the tweet id files as I can.
We’re looking at roughly 12 million tweets (not deduplicated) at the time of writing, so the hydration process will take some time. I’d highly suggest running it inside GNU Screen or tmux.
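Going the other way, “dehydrating” a capture just means keeping the ids. A minimal sketch, assuming one JSON tweet per line in and one id per line out (the function names are hypothetical, not twarc utilities):

```python
import json

def tweet_ids(lines):
    """Yield the id_str of every tweet in line-oriented JSON."""
    for line in lines:
        line = line.strip()
        if line:
            # id_str is the string form of the id; it avoids precision loss
            # in tools that parse large integers as doubles (e.g. JavaScript)
            yield json.loads(line)["id_str"]

def write_ids(tweets_path, ids_path):
    """Write one tweet id per line."""
    with open(tweets_path) as src, open(ids_path, "w") as dst:
        for tweet_id in tweet_ids(src):
            dst.write(tweet_id + "\n")
```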
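A back-of-the-envelope sketch of “some time”. It assumes hydration goes through Twitter's statuses/lookup endpoint, which takes up to 100 ids per request; the rate limit of 180 requests per 15-minute window is an assumption, so check the current limits for your own credentials. It also ignores network time and rounds up to whole windows, so treat the result as a rough estimate.

```python
import math

def hydration_hours(n_ids, ids_per_request=100, requests_per_window=180,
                    window_minutes=15):
    """Rough wall-clock hours to hydrate n_ids tweet ids.

    requests_per_window and window_minutes are ASSUMED rate-limit values.
    """
    requests = math.ceil(n_ids / ids_per_request)
    windows = math.ceil(requests / requests_per_window)
    return windows * window_minutes / 60
```

Under these assumptions, `hydration_hours(12_000_000)` returns 166.75, i.e. roughly a week of running in the background, which is why a detachable session is worth it.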
Hydrate
- #JeSuisCharlie:
% twarc.py --hydrate %23JeSuisCharlie-ids-20150112.txt > %23JeSuisCharlie-tweets-20150112.json
- #JeSuisAhmed:
% twarc.py --hydrate %23JeSuisAhmed-ids-20150112.txt > %23JeSuisAhmed-tweets-20150112.json
- #JeSuisJuif:
% twarc.py --hydrate %23JeSuisJuif-ids-20150112.txt > %23JeSuisJuif-tweets-20150112.json
- #CharlieHebdo:
% twarc.py --hydrate %23CharlieHebdo-ids-20150112.txt > %23CharlieHebdo-tweets-20150112.json
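Hydration will not return every id: tweets that have since been deleted, or whose accounts were made protected, don't come back, which is what Ed’s “On Forgetting” post is about. A minimal sketch for listing the ids that went missing, assuming an ids file with one id per line and hydrated output with one JSON tweet per line (the function name is hypothetical):

```python
import json

def missing_ids(id_lines, tweet_lines):
    """Return ids that were requested but did not come back from hydration."""
    requested = {line.strip() for line in id_lines if line.strip()}
    hydrated = set()
    for line in tweet_lines:
        line = line.strip()
        if line:
            hydrated.add(json.loads(line)["id_str"])
    return requested - hydrated
```

Usage: open the ids file and the matching tweets file, pass both to `missing_ids`, and compare `len()` of the result against the number of ids requested.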
Map
#JeSuisCharlie tweets with geo coordinates.
In this data set, 51,942 tweets have geo coordinates available. That is about 1.33% of the entire data set (3,893,553 tweets).
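A minimal sketch of how a count like this can be reproduced. It assumes a tweet counts as “with geo coordinates” when its coordinates field is set (the older geo field is not checked), and that the input is one JSON tweet per line; the function name is hypothetical. As a check on the arithmetic, 51,942 / 3,893,553 ≈ 0.0133, i.e. about 1.33%.

```python
import json

def geo_share(lines):
    """Return (tweets_with_coordinates, total_tweets, percent)."""
    total = with_geo = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        total += 1
        tweet = json.loads(line)
        # "coordinates" is null unless the tweet carries an exact point
        if tweet.get("coordinates"):
            with_geo += 1
    percent = 100 * with_geo / total if total else 0.0
    return with_geo, total, percent
```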
How do you make this?
Create the geojson
% ~/git/twarc/utils/geojson.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped.geojson
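This is not geojson.py’s actual code, just a minimal sketch of the idea: keep the tweets whose coordinates field is set and wrap each one as a GeoJSON Feature. Twitter already stores coordinates as a GeoJSON Point in [longitude, latitude] order, so it can be used as the geometry directly. The choice of properties here is hypothetical.

```python
import json

def tweets_to_geojson(lines):
    """Build a GeoJSON FeatureCollection from tweets that have coordinates."""
    features = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        point = tweet.get("coordinates")
        if not point:
            continue  # no exact location on this tweet
        features.append({
            "type": "Feature",
            "geometry": point,  # already {"type": "Point", "coordinates": [lon, lat]}
            "properties": {"id": tweet["id_str"], "text": tweet["text"]},
        })
    return {"type": "FeatureCollection", "features": features}
```

`json.dumps(tweets_to_geojson(open(path)))` gives text that can be written to a .geojson file.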
Give the geojson a variable name, so the file can be loaded in a web page as a script and referenced from JavaScript.
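One way to do this step, sketched in Python: prepend a JavaScript variable assignment to the GeoJSON text. The variable name tweets and the output file name are hypothetical; use whatever name your map page expects.

```python
def as_script(geojson_text, var_name="tweets"):
    """Return GeoJSON text wrapped in a JavaScript variable assignment."""
    return f"var {var_name} = {geojson_text.strip()};\n"

def geojson_to_script(geojson_path, js_path, var_name="tweets"):
    """Write a .js file that can be loaded with a <script> tag."""
    with open(geojson_path) as src, open(js_path, "w") as dst:
        dst.write(as_script(src.read(), var_name))
```

For example, `geojson_to_script("%23JeSuisCharlie-cat-20150115-tweets-deduped.geojson", "tweets.js")` would write a file that defines `tweets` once the page loads it.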
Use Leaflet.js to put all the tweets with geo coordinates on a map like this.
Top URLs
Top 10 URLs tweeted from #JeSuisCharlie.
- (11220) http://www.newyorker.com/culture/culture-desk/cover-story-2015-01-19?mbid=social_twitter
- (2278) http://www.europe1.fr/direct-video
- (1615) https://www.youtube.com/watch?v=4KBdnOrTdMI&feature=youtu.be
- (1347) https://www.youtube.com/watch?v=-bjbUg9d64g&feature=youtu.be
- (1333) http://www.amazon.com/Charlie-Hebdo/dp/B00007LMFU/
- (977) http://www.clubic.com/internet/actualite-748637-opcharliehebdo-anonymous-vengeance.html
- (934) http://www.maryam-rajavi.com/en/index.php?option=com_content&view=article&id=1735&catid=159&Itemid=506
- (810) http://www.lequipe.fr/eStore/Offres/Achat/271918
- (771) http://srogers.cartodb.com/viz/123be814-96bb-11e4-aec1-0e9d821ea90d/embed_map
- (605) https://www.youtube.com/watch?v=et4fYWKjP_o
Full list of URLs can be found here.
How do you get the list?
% cat %23JeSuisCharlie-cat-20150115-tweets-deduped.json | ~/git/twarc/utils/unshorten.py > %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json
% cat %23JeSuisCharlie-cat-20150115-tweets-deduped-unshortened.json | ~/git/twarc/utils/urls.py | sort | uniq -c | sort -n > %23JeSuisCharlie-cat-20150115-urls.txt
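The `sort | uniq -c | sort -n` pipeline is a frequency count. A minimal Python sketch of the same counting step, assuming it reads the expanded_url values from each tweet’s entities. It does not reproduce the unshortening step, so shortened links stay shortened and the counts may differ from the list above. Unlike `sort -n`, it puts the most common URLs first.

```python
import json
from collections import Counter

def top_urls(lines, n=10):
    """Count expanded URLs across tweets and return the n most common."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        for url in tweet.get("entities", {}).get("urls", []):
            if url.get("expanded_url"):
                counts[url["expanded_url"]] += 1
    return counts.most_common(n)
```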
Twitter Clients
Top 10 Twitter clients used from #JeSuisCharlie.
- (1283521) Twitter for iPhone
- (951925) Twitter Web Client
- (847308) Twitter for Android
- (231713) Twitter for iPad
- (86209) TweetDeck
- (82616) Twitter for Windows Phone
- (70286) Twitter for Android Tablets
- (44189) Twitter for Websites
- (39174) Instagram
- (21424) Mobile Web (M5)
Full list of clients can be found here.
How do you get this?
% ~/git/twarc/utils/source.py %23JeSuisCharlie-cat-20150115-tweets-deduped.json > %23JeSuisCharlie-cat-20150115-tweets-deduped-source.html
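source.py produces an HTML report. If you just want counts like the top 10 above, here is a minimal sketch that tallies the source field of each tweet. That field is usually an HTML link whose text is the client name, so the sketch strips the tags with a simple regular expression. The function name is hypothetical.

```python
import json
import re
from collections import Counter

TAG = re.compile(r"<[^>]+>")

def top_clients(lines, n=10):
    """Count tweets per client, using the text of the "source" field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        source = json.loads(line).get("source", "")
        # e.g. <a href="..." rel="nofollow">Twitter for iPhone</a>
        counts[TAG.sub("", source)] += 1
    return counts.most_common(n)
```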
Word cloud
Word cloud from #JeSuisCharlie tweets.
I couldn’t get the word cloud to embed nicely, so you’ll have to check it out here.
How do you create the word cloud?
% ~/git/twarc/utils/wordcloud.py %23JeSuisCharlie-cat-20150115-tweets.json > %23JeSuisCharlie-wordcloud.html
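A word cloud is driven by word frequencies. This minimal sketch computes those counts only, without rendering anything, and is not how wordcloud.py itself works. It drops URLs, one-character tokens, and words from a deliberately tiny, hypothetical stopword list; a real run over this data set would need fuller French and English lists.

```python
import json
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list
STOPWORDS = {"the", "a", "and", "de", "la", "le", "les", "et", "rt"}
URL = re.compile(r"https?://\S+")
WORD = re.compile(r"[#@]?\w+")  # keep hashtags and mentions as single tokens

def word_frequencies(lines, n=50):
    """Return the n most common words across tweet texts."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        text = URL.sub("", json.loads(line)["text"].lower())
        for word in WORD.findall(text):
            if word not in STOPWORDS and len(word) > 1:
                counts[word] += 1
    return counts.most_common(n)
```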