Twitter APIs

Sample, Users, Followers, Friends, Trends, Timeline, Retweets, Replies

API Changes

Compat, Expanded


$ twarc filter "Vancouver" > vancouver.jsonl

$ twarc filter --follow 255681367 > ian.jsonl

$ twarc filter --locations "49.267132, -122.968941" > sfu.jsonl

What's that 1% thing people talk about?


18,000 tweets every 15 minutes

7-day window
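The 18,000 figure is simple arithmetic if we assume the standard Search API limits of 180 requests per 15-minute window and up to 100 tweets per request. Both numbers are assumptions here, not something the slide states. A rough sketch of what that means for a large collection, with function names made up for illustration:

```python
import math

# Assumed standard Search API limits for user authentication.
REQUESTS_PER_WINDOW = 180   # requests allowed per 15-minute window
TWEETS_PER_REQUEST = 100    # maximum tweets returned per request


def tweets_per_window():
    """Upper bound on tweets collected in one 15-minute window."""
    return REQUESTS_PER_WINDOW * TWEETS_PER_REQUEST


def windows_needed(n_tweets):
    """Number of 15-minute windows needed to page through n_tweets."""
    return math.ceil(n_tweets / tweets_per_window())


print(tweets_per_window())
print(windows_needed(100_000))
```

So a search that matches 100,000 tweets would span several rate-limit windows, which is one reason a long-running session helps.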

$ twarc search "Vancouver" > vancouver.jsonl

$ twarc search "to:ianmilligan1" > ian.jsonl

$ twarc search --geocode "49.267132, -122.968941" > sfu.jsonl

Twitter Data

It's just so wonderful to work with!

  "contributors": null,
  "truncated": true,
  "text": "@realDonaldTrump's new #coverphoto. \n\nI mean... my God... the faces?!!? \n😠😠😠😠😠😠😠😠😠😠😠😠\n\n@CNN @JoshMalina…",
  "is_quote_status": false,
  "in_reply_to_status_id": null,
  "id": 898725869950074900,
  "favorite_count": 0,
  "entities": {
    "symbols": [],
    "user_mentions": [
        "id": 25073877,
        "indices": [
        "id_str": "25073877",
        "screen_name": "realDonaldTrump",
        "name": "Donald J. Trump"
        "id": 759251,
        "indices": [
        "id_str": "759251",
        "screen_name": "CNN",
        "name": "CNN"
        "id": 24931027,
        "indices": [
        "id_str": "24931027",
        "screen_name": "JoshMalina",
        "name": "🌎Joshua Malina🌎"
    "hashtags": [
        "indices": [
        "text": "coverphoto"
    "urls": [
        "url": "",
        "indices": [
        "expanded_url": "",
        "display_url": "…"


So, that's the gist of a tweet, what can we do with a bunch of those?
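One simple thing to do with a bunch of tweets is to tally what appears in their `entities`. The sketch below assumes line-oriented JSON (one tweet per line, as twarc writes it) and uses only the `hashtags` and `user_mentions` fields shown in the excerpt above; `count_entities` is a made-up name, not part of twarc.

```python
import json
from collections import Counter


def count_entities(lines):
    """Tally hashtags and mentioned screen names across tweet JSON lines."""
    hashtags, mentions = Counter(), Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        entities = json.loads(line).get("entities", {})
        for tag in entities.get("hashtags", []):
            hashtags[tag["text"].lower()] += 1
        for mention in entities.get("user_mentions", []):
            mentions[mention["screen_name"]] += 1
    return hashtags, mentions


# Usage with a file on disk (the file name is hypothetical):
# with open("tweets.jsonl") as handle:
#     tags, people = count_entities(handle)
#     print(tags.most_common(10))
```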

There are a lot of potential fields!

...if you're curious...

Derivative datasets

Tweet IDs

On Forgetting and hydration

Twitter ToS ✅

$ twarc dehydrate tweets.jsonl > tweet-ids.txt
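Conceptually, dehydrating just keeps each tweet's ID and throws the rest away. The following is a minimal sketch of that idea, not twarc's actual implementation; it reads `id_str` rather than `id` because some tools round very large integers.

```python
import json


def dehydrate(lines):
    """Yield the string ID of every tweet in an iterable of JSON lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)["id_str"]


# Usage (file names follow the command above):
# with open("tweets.jsonl") as src, open("tweet-ids.txt", "w") as dst:
#     for tweet_id in dehydrate(src):
#         dst.write(tweet_id + "\n")
```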


(We'll get more into these later!)

Extracting URLs

cat tweets.json | | sort | uniq > urls.txt
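The extraction step can also be sketched in Python. This is an illustration under assumptions, not the command used in the talk: it reads `entities.urls` from each tweet, prefers `expanded_url` over the shortened `url`, and returns a sorted, de-duplicated list (the equivalent of `sort | uniq`). The name `unique_urls` is made up.

```python
import json


def unique_urls(lines):
    """Return the sorted set of URLs found in tweet JSON lines."""
    urls = set()
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        for entry in tweet.get("entities", {}).get("urls", []):
            # Fall back to the shortened URL when no expanded form is present.
            url = entry.get("expanded_url") or entry.get("url")
            if url:
                urls.add(url)
    return sorted(urls)
```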

But, what about shortened URLs?

If I have all the shortened unique URLs...

That's a seed list!!

                cat "$URLS" | while read line; do
                  curl -s -S "$line" > /dev/null
                  sleep 1
                done

People Tweet a lot of images...

$ cat tweets.json | > image_urls.txt

$ cat image_urls.txt | while read line; do wget "$line"; done

What does 1+ million images look like?


Twitter Account


You already have Python on your system.

You hopefully have pip installed as well.

...if you don't

Installing pip

Nice to have


              $ sudo -H pip install twarc
Let's configure twarc

              $ twarc configure

Please enter Twitter authentication credentials.

consumer key: some key 
consumer secret: some secret
access token: some access token
access token secret: some token secret

Let's create some datasets!

Option #1

Using the Sample API

Caveat: This can run a long time.

GNU Screen is your friend here.

              $ twarc sample > sample_stream_tweets.jsonl

Check and see how things are going

In another terminal or tab

              $ tail -f sample_stream_tweets.jsonl
Example of sample stream output showing tweet JSON data
{"favorite_count": 0, "in_reply_to_user_id": null, "in_reply_to_status_id_str": null, "quoted_status_id": 955947308822204416, "coordinates": null, "in_reply_to_screen_name": null, "possibly_sensitive": false, "id": 956541347833331713, "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [{"indices": [15, 38], "expanded_url": "", "display_url": "\u2026", "url": ""}]}, "quoted_status_id_str": "955947308822204416", "truncated": false, "text": "Fuck Australia", "in_reply_to_user_id_str": null, "is_quote_status": true, "timestamp_ms": "1516892203659", "reply_count": 0, "display_text_range": [0, 14], "source": "Tweetlogix", "lang": "en", "filter_level": "low", "created_at": "Thu Jan 25 14:56:43 +0000 2018", "user": {"listed_count": 25, "profile_link_color": "990000", "follow_request_sent": null, "default_profile_image": false, "verified": false, "profile_background_color": "EBEBEB", "protected": false, "profile_background_image_url": "", "notifications": null, "favourites_count": 1933, "id": 165566903, "profile_background_image_url_https": "", "contributors_enabled": false, "created_at": "Sun Jul 11 23:40:26 +0000 2010", "profile_background_tile": true, "geo_enabled": true, "followers_count": 524, "statuses_count": 93462, "following": null, "profile_image_url_https": "", "url": null, "profile_image_url": "", "utc_offset": -28800, "name": "354", "lang": "en", "screen_name": "Ezell_Jenkins", "is_translator": false, "friends_count": 479, "profile_banner_url": "", "location": "LOST", "profile_text_color": "333333", "translator_type": "none", "description": "Staying black and minding my business...", "time_zone": "Pacific Time (US & Canada)", "profile_use_background_image": true, "profile_sidebar_fill_color": "F3F3F3", "default_profile": false, "id_str": "165566903", "profile_sidebar_border_color": "FFFFFF"}, "retweeted": false, "geo": null, "retweet_count": 0, "id_str": "956541347833331713", "place": null, "quoted_status": {"favorite_count": 1818, 
"in_reply_to_user_id": null, "in_reply_to_status_id_str": null, "geo": null, "coordinates": null, "in_reply_to_screen_name": null, "possibly_sensitive": false, "id": 955947308822204416, "entities": {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [{"indices": [117, 140], "expanded_url": "", "display_url": "\u2026", "url": ""}]}, "truncated": true, "text": "YIKES: Horrifying video captures a spider wasp and a huntsman spider doing battle in the bathroom of a home in Aust\u2026", "in_reply_to_user_id_str": null, "is_quote_status": false, "reply_count": 716, "display_text_range": [0, 140], "source": "SocialFlow", "lang": "en", "filter_level": "low", "created_at": "Tue Jan 23 23:36:13 +0000 2018", "user": {"listed_count": 52185, "profile_link_color": "336699", "follow_request_sent": null, "default_profile_image": false, "verified": true, "profile_background_color": "6E8EB5", "protected": false, "profile_background_image_url": "", "notifications": null, "favourites_count": 462, "id": 28785486, "profile_background_image_url_https": "", "contributors_enabled": false, "created_at": "Sat Apr 04 12:40:32 +0000 2009", "profile_background_tile": false, "geo_enabled": true, "followers_count": 13362675, "statuses_count": 194187, "following": null, "profile_image_url_https": "", "url": "", "profile_image_url": "", "utc_offset": -18000, "name": "ABC News", "lang": "en", "screen_name": "ABC", "is_translator": false, "friends_count": 734, "profile_banner_url": "", "location": "New York City / Worldwide", "profile_text_color": "333333", "translator_type": "regular", "description": "See the whole picture with @ABC News. 
Facebook: Instagram:", "time_zone": "Eastern Time (US & Canada)", "profile_use_background_image": true, "profile_sidebar_fill_color": "DDEEF6", "default_profile": false, "id_str": "28785486", "profile_sidebar_border_color": "FFFFFF"}, "retweeted": false, "extended_tweet": {"full_text": "YIKES: Horrifying video captures a spider wasp and a huntsman spider doing battle in the bathroom of a home in Australia.", "display_text_range": [0, 145], "extended_entities": {"media": [{"sizes": {"medium": {"w": 720, "resize": "fit", "h": 720}, "thumb": {"w": 150, "resize": "crop", "h": 150}, "large": {"w": 720, "resize": "fit", "h": 720}, "small": {"w": 680, "resize": "fit", "h": 680}}, "type": "video", "media_url_https": "", "video_info": {"variants": [{"bitrate": 256000, "content_type": "video/mp4", "url": ""}, {"bitrate": 832000, "content_type": "video/mp4", "url": ""}, {"content_type": "application/x-mpegURL", "url": ""}, {"bitrate": 1280000, "content_type": "video/mp4", "url": ""}], "aspect_ratio": [1, 1], "duration_millis": 46880}, "media_url": "", "display_url": "", "indices": [146, 169], "expanded_url": "", "id": 955947163317612546, "id_str": "955947163317612546", "url": ""}]}, "entities": {"media": [{"sizes": {"medium": {"w": 720, "resize": "fit", "h": 720}, "thumb": {"w": 150, "resize": "crop", "h": 150}, "large": {"w": 720, "resize": "fit", "h": 720}, "small": {"w": 680, "resize": "fit", "h": 680}}, "type": "video", "media_url_https": "", "video_info": {"variants": [{"bitrate": 256000, "content_type": "video/mp4", "url": ""}, {"bitrate": 832000, "content_type": "video/mp4", "url": ""}, {"content_type": "application/x-mpegURL", "url": ""}, {"bitrate": 1280000, "content_type": "video/mp4", "url": ""}], "aspect_ratio": [1, 1], "duration_millis": 46880}, "media_url": "", "display_url": "", "indices": [146, 169], "expanded_url": "", "id": 955947163317612546, "id_str": "955947163317612546", "url": ""}], "hashtags": [], "symbols": [], "user_mentions": [], "urls": 
[{"indices": [122, 145], "expanded_url": "", "display_url": "", "url": ""}]}}, "retweet_count": 1141, "id_str": "955947308822204416", "place": null, "in_reply_to_status_id": null, "favorited": false, "quote_count": 2083, "contributors": null}, "in_reply_to_status_id": null, "favorited": false, "quote_count": 0, "contributors": null}

Option #2

Using the Filter (streaming) API

Collect using terms or hashtags:

              $ twarc filter vancouver,toronto > vancouver_tweets.jsonl

Collect using a bounding box:

              $ twarc filter --locations "\-123.27,49.195,-123.020,49.315" > vancouver_location_tweets.jsonl
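The bounding box is easy to get backwards: the streaming API expects longitude before latitude, with the south-west corner first and the north-east corner second. A small helper (hypothetical, not part of twarc) can build the string and catch swapped values. It assumes the box does not cross the antimeridian.

```python
def bbox_to_locations(west, south, east, north):
    """Format a bounding box as 'west,south,east,north' (longitude first)."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be between -180 and 180")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be between -90 and 90")
    if west >= east or south >= north:
        raise ValueError("expected the south-west corner first, then the north-east corner")
    return f"{west},{south},{east},{north}"


# Roughly the Vancouver box used in the command above.
print(bbox_to_locations(-123.27, 49.195, -123.02, 49.315))
```

On the command line the leading minus sign still needs the backslash escape shown in the slide, so that it is not read as an option.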

Collect new tweets from a given user:

              $ twarc filter --follow 255681367 > ian.jsonl

Be creative, and combine them!

              $ twarc filter yolo --locations "\-123.27,49.195,-123.020,49.315" > vancouver_yolo_tweets.jsonl

Option #3

Using the Search API

Search and collect using terms or hashtags:

              $ twarc search vancouver > vancouver_search_tweets.jsonl
              $ twarc search "#vancouver OR #toronto" > van_tor_search_tweets.jsonl

Search and collect tweets to a user:

              $ twarc search 'to:realdonaldtrump' > donald_search_tweets.jsonl

Search and collect tweets in a geographic area:

              $ twarc search --geocode 49.246292,-123.116226,20km > vancouver_geo_search_tweets.jsonl

Be creative, and combine them!

              $ twarc search 'trump to:JustinTrudeau' --geocode 49.246292,-123.116226,100km > justin_search_tweets.jsonl

Option #4

Download some!

DocNow Catalog

Internet Archive

Other considerations

You have a log! Logs are great!

$ tail -f twarc.log
Check out the help

Want metadata for a given user?

              $ twarc users ruebot
Example user metadata JSON output

Who follows the user?

              $ twarc followers ruebot

Who does the user follow?

              $ twarc friends ruebot

Retweets and Replies

              $ twarc retweets 896523232098078720 > 896523232098078720_retweets.jsonl
              $ twarc replies 896523232098078720 > 896523232098078720_replies.jsonl
              $ twarc replies 896523232098078720 --recursive > 896523232098078720_recursive_replies.jsonl

Hydrate and Dehydrate

              $ twarc dehydrate justin_search_tweets.jsonl > justin_search_tweet_ids.txt
              $ twarc hydrate justin_search_tweet_ids.txt > justin_search_tweets.jsonl

Let's look at the utilities, and maybe do something with this data!

If you haven't already, you'll need to clone or download the twarc repo to use the utilities:

              $ git clone

We did some geolocation collection, let's see what it looks like on a map!

              $ ~/git/twarc/utils/ tweets.jsonl > tweets.geojson
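To give a sense of what such a conversion involves, here is a minimal sketch, not the twarc utility itself. It assumes each geotagged tweet carries a `coordinates` field that is already a GeoJSON Point (longitude, latitude) and skips tweets where that field is null. The property names chosen are illustrative.

```python
import json


def tweets_to_geojson(lines):
    """Build a GeoJSON FeatureCollection from geotagged tweet JSON lines."""
    features = []
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        point = tweet.get("coordinates")
        if not point:
            continue  # most tweets carry no exact coordinates
        features.append({
            "type": "Feature",
            "geometry": point,
            "properties": {
                "id": tweet.get("id_str"),
                "screen_name": tweet.get("user", {}).get("screen_name"),
                "created_at": tweet.get("created_at"),
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Usage (file names are hypothetical):
# with open("tweets.jsonl") as src, open("tweets.geojson", "w") as dst:
#     json.dump(tweets_to_geojson(src), dst)
```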

If you don't want to mess around with Leaflet, use GitHub Gist!

What if you just wanted the text?

              $ cat vancouver_geo_search_tweets.jsonl | jq .full_text
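The `full_text` key is not always at the top level. In the sample stream output shown earlier, long tweets keep a truncated `text` and put the complete version under `extended_tweet.full_text`. A small sketch that tries each location in turn (the function name is made up):

```python
import json


def tweet_text(tweet):
    """Return the fullest text available for a tweet dictionary."""
    if "full_text" in tweet:
        return tweet["full_text"]
    extended = tweet.get("extended_tweet") or {}
    if "full_text" in extended:
        return extended["full_text"]
    return tweet.get("text", "")


# Usage (file name follows the command above):
# with open("vancouver_geo_search_tweets.jsonl") as handle:
#     for line in handle:
#         print(tweet_text(json.loads(line)))
```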

Voyant Tools

What if I wanted all the image urls?

              $ ~/git/twarc/ tweets.jsonl > image_urls.txt
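As a rough illustration of where those URLs live, the sketch below collects `media_url_https` from `extended_entities.media`, looking both at the top level and inside `extended_tweet`, as in the sample stream output earlier. It is an assumption-laden stand-in, not the twarc utility, and for videos the field points at a preview image rather than the video itself.

```python
import json


def image_urls(lines):
    """Return media URLs from tweet JSON lines, de-duplicated in first-seen order."""
    found = {}
    for line in lines:
        if not line.strip():
            continue
        tweet = json.loads(line)
        # Media may sit at the top level or inside the extended_tweet object.
        containers = [tweet, tweet.get("extended_tweet") or {}]
        for container in containers:
            media = (container.get("extended_entities") or {}).get("media", [])
            for item in media:
                url = item.get("media_url_https")
                if url:
                    found[url] = True
    return list(found)
```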

Oh, can I download them?

              cat image_urls.txt | while read line; do wget "$line"; done

Oh, can I get the urls tweeted?

              $ ~/git/twarc/ tweets.jsonl > urls.txt

What if I want to filter out tweets by date?

              $ ~/git/twarc/utils/ --mindate 25-jan-2018 tweets.jsonl > filtered.jsonl
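The date filter boils down to parsing each tweet's `created_at` string and comparing it with a cutoff. A minimal sketch under assumptions: the timestamp format is the one visible in the sample output ("Thu Jan 25 14:56:43 +0000 2018"), dates are compared in UTC, and `filter_by_mindate` is a made-up name rather than the utility's own code.

```python
import json
from datetime import date, datetime

# Format of the created_at field as seen in the sample tweets.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def filter_by_mindate(lines, mindate):
    """Yield only the tweet lines created on or after mindate (a datetime.date)."""
    for line in lines:
        if not line.strip():
            continue
        created = datetime.strptime(json.loads(line)["created_at"], CREATED_AT_FORMAT)
        if created.date() >= mindate:
            yield line


# Usage (file names follow the command above):
# with open("tweets.jsonl") as src, open("filtered.jsonl", "w") as dst:
#     dst.writelines(filter_by_mindate(src, date(2018, 1, 25)))
```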

Now comes the fun part, be creative and ask the data questions!

Command-line utilities, jq, Python, pandas, Ruby, aut, and more: these are your friends!

Thank you!