twarc to grab some data.
$ twarc timeline lintool > lintool.jsonl
Extract the tweet text.
$ cat lintool.jsonl | jq -r .full_text > lintool_tweet.txt
Remove all the URLs from the tweets.
$ sed -e 's!http[s]\?://\S*!!g' lintool_tweet.txt > lintool.txt
Create a wordcloud.
$ wordcloud_cli.py --text lintool.txt --imagefile lintool.png
- Each of these commands has a whole lot of options. Check them out and experiment.
- Yes, there is probably a better way to do this, and you could even make it into a one-liner. I pulled this together as a favour to Mat.
- We were initially going to include wordclouds of collections in AUK, but wordcloud_cli.py doesn’t perform well at scale. Scale here means feeding it txt files containing anywhere from 5G to 500G of raw text. Maybe one day we’ll revisit it.
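One caveat on the URL-removal step: the `\?` and `\S` in that sed expression are GNU extensions, so it may not behave the same with the BSD sed that ships with macOS. Below is a minimal sketch of a more portable variant, assuming your sed accepts `-E` for extended regular expressions (both GNU and BSD sed do). The function name `strip_urls` is made up for illustration and is not part of any of the tools above.

```shell
# Hypothetical helper (not from the original steps): strip http/https URLs
# from whatever text arrives on stdin.
# -E switches sed to extended regular expressions, so `?` needs no backslash,
# and [^[:space:]] stands in for the GNU-only \S.
strip_urls() {
  sed -E 's!https?://[^[:space:]]*!!g'
}

# Try it on a sample line standing in for a row of lintool_tweet.txt.
printf '%s\n' 'New paper out https://example.com/paper check it' | strip_urls
```

To use it in place of the sed step, run `strip_urls < lintool_tweet.txt > lintool.txt`. Like the original expression, it removes only the URL itself, so the whitespace on either side of it stays put.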