The Pushshift Slack Bot
This bot creates data visualizations quickly and easily, using the Pushshift API as its back end. It is invoked through the /pushshift slash command. Below is some basic documentation on its use and on the parameters that each type of command accepts.
/pushshift wordcloud (wc can be used as an alias)
The wordcloud generator can create wordclouds from a number of different sources, including Reddit authors, subreddits and search terms. It uses 1,000 comments when building the visualization. The generator has a special parameter called “cutoff”, which defaults to 5 when not specified. Images are built from a frequency comparison between the words in the returned comments or submissions and the global frequency of those words on Reddit. When the cutoff is set to X (an integer), only terms that appear at least X times more often in your search than on Reddit as a whole are included in the visualization, so a higher cutoff limits the wordcloud to words that are much more distinctive of your search.
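The cutoff rule can be illustrated with a minimal Python sketch. This is not the bot's actual implementation: the function name, the sample words and the global frequency table are all hypothetical, and the exact formula the bot uses is assumed here to be a simple ratio of relative frequencies.

```python
from collections import Counter

def words_above_cutoff(sample_words, global_freq, cutoff=5):
    """Keep words whose relative frequency in the sample is at least
    `cutoff` times their relative frequency in the global table."""
    counts = Counter(sample_words)
    total = sum(counts.values())
    kept = {}
    for word, count in counts.items():
        baseline = global_freq.get(word)
        if not baseline:
            continue  # no global baseline: skip rather than divide by zero
        ratio = (count / total) / baseline
        if ratio >= cutoff:
            kept[word] = ratio
    return kept

# Hypothetical sample (words from search results) and global frequencies.
sample = ["quantum", "the", "the", "the", "quantum",
          "orbit", "the", "of", "of", "the"]
global_freq = {"the": 0.4, "of": 0.2, "quantum": 0.001, "orbit": 0.002}

print(sorted(words_above_cutoff(sample, global_freq, cutoff=10)))
```

Common words such as "the" have a ratio close to 1 and drop out, while words that are unusually frequent in the sample survive. Raising the cutoff shrinks the surviving set further.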
Here is an example. Let’s say that you want to see which words in the subreddit “askscience” occur at least 10 times more often than on Reddit as a whole. You could use the following command to view this wordcloud: “/pushshift wc subreddit=askscience cutoff=10”. This command will use the most recent 1,000 comments from the subreddit and create a data visualization that looks like this:
For this particular example, we can see that most of the recent comments in that subreddit appear to be referencing tools. The command also accepts many of the same parameters that the Pushshift API accepts. For example, instead of viewing a subreddit, you can use the author parameter to view a wordcloud of a Reddit user’s previous 1,000 comments. You can also use the “q” parameter to search for a specific word or phrase. Here is an example that uses the previous 1,000 comments containing the words Overwatch and DVa (I play Overwatch occasionally):
/pushshift wc q=overwatch+Dva (The cutoff will default to 5)
You can also use the link_id parameter to generate a wordcloud from a specific submission by using the submission’s base36 id (found in the url of the submission). For example, this submission (https://www.reddit.com/r/esist/comments/93oum8/supreme_court_says_kids_can_sue_trump_over/) has a link_id of 93oum8. Here is the command to generate a wordcloud from this submission:
/pushshift wc link_id=93oum8
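If you would rather not pick the id out of the URL by hand, a small Python helper can do it. This is only a sketch: the function name is hypothetical, and it assumes the id is the path segment that follows /comments/, as in the example URL above.

```python
import re

def submission_id_from_url(url):
    """Return the base36 submission id that follows /comments/ in a
    Reddit submission URL, or None if the URL has no such segment."""
    match = re.search(r"/comments/([0-9a-z]+)", url)
    return match.group(1) if match else None

url = ("https://www.reddit.com/r/esist/comments/93oum8/"
       "supreme_court_says_kids_can_sue_trump_over/")
link_id = submission_id_from_url(url)
print(f"/pushshift wc link_id={link_id}")
```

For the URL above, this produces the same command as the one shown: /pushshift wc link_id=93oum8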
To recap, /pushshift wc is the base command to generate a wordcloud and can accept parameters such as author, subreddit, q (for word or term), link_id and other Pushshift API parameters (you could use the before and after parameters to generate wordclouds based on specific moments in time for example).
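If you generate these commands from a script (for example, to paste into Slack), the key=value pattern is easy to assemble. The helper below is a hypothetical convenience and is not part of the bot. It only joins the parameters in the order they are given and does no validation.

```python
def build_command(subcommand, **params):
    """Assemble a /pushshift command line from keyword parameters."""
    parts = ["/pushshift", subcommand]
    parts += [f"{name}={value}" for name, value in params.items()]
    return " ".join(parts)

print(build_command("wc", subreddit="askscience", cutoff=10))
print(build_command("wc", link_id="93oum8"))
```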
/pushshift activity (act can be used as an alias)
This command generates aggregation visualizations that show which subreddits (or authors) are the most active under a given set of conditions. For example, to see a user’s top subreddits by commenting activity over the entire history of that account, you could use this command:
/pushshift act author=stuck_in_the_matrix after=0 agg_size=25
This command will show the top 25 subreddits that I am most active in (for my entire account history).
To view activity for submissions instead of comments, use the parameter type=submission. Here is an example of the top 25 subreddits I am most active in based on submissions for my entire account history:
/pushshift act author=stuck_in_the_matrix after=0 agg_size=25 type=submission
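Conceptually, this kind of aggregation is a count of items per subreddit, trimmed to the top agg_size entries. The Python sketch below shows the idea on hypothetical records. The record layout and function name are assumptions, and the real aggregation is done by the Pushshift back end rather than by code like this.

```python
from collections import Counter

def top_subreddits(items, agg_size=25):
    """Count items per subreddit and return the agg_size most common
    as (subreddit, count) pairs, highest count first."""
    counts = Counter(item["subreddit"] for item in items)
    return counts.most_common(agg_size)

# Hypothetical comment records for a single author.
comments = [
    {"subreddit": "askscience"},
    {"subreddit": "python"},
    {"subreddit": "askscience"},
    {"subreddit": "datasets"},
    {"subreddit": "python"},
    {"subreddit": "askscience"},
]

for subreddit, count in top_subreddits(comments, agg_size=2):
    print(subreddit, count)
```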
As you can see, this is a very powerful tool that gives a lot of insight into the types of communities a Reddit user is involved in. If you want to restrict the data visualization to a user’s activity for the previous X days, you can use the after parameter (e.g. after=90d to show activity for the previous 90 days).
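To make the two forms of the after parameter used in this document concrete, here is a Python sketch that turns a value such as 90d or 0 into a Unix timestamp. The function name is hypothetical, only the day suffix shown in this document is handled, and treating a bare number as epoch seconds is an assumption based on after=0 meaning "entire history".

```python
import re
from datetime import datetime, timedelta, timezone

def resolve_after(value, now):
    """Convert an `after` value to epoch seconds.
    '90d' means 90 days before `now`; a bare integer is taken
    as an epoch timestamp (so '0' covers the entire history)."""
    match = re.fullmatch(r"(\d+)d", value)
    if match:
        days = int(match.group(1))
        return int((now - timedelta(days=days)).timestamp())
    return int(value)

now = datetime(2018, 8, 1, tzinfo=timezone.utc)  # fixed "now" for the example
print(resolve_after("90d", now))
print(resolve_after("0", now))
```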
Another very powerful feature for this command is the ability to view which authors are most active based on a subreddit or a word or phrase. For example, let’s see which accounts are most active in /r/the_donald for the previous 30 days (automoderator is automatically excluded):
/pushshift act subreddit=the_donald aggs=author after=30d
In the previous example, if you want to exclude “[deleted]”, you can use the author parameter with negation (i.e. author=![deleted]).
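The effect of negation can be pictured as a filter in which a leading ! inverts the match. The following Python sketch is only an illustration with hypothetical names and data. It assumes an exact, case-sensitive comparison, which may not be how the back end matches authors.

```python
def author_matches(author, pattern):
    """Return True if `author` satisfies `pattern`.
    A leading '!' negates the pattern: '![deleted]' matches
    every author except '[deleted]'."""
    if pattern.startswith("!"):
        return author != pattern[1:]
    return author == pattern

# Hypothetical author names from a result set.
authors = ["alice", "[deleted]", "bob", "[deleted]"]
kept = [name for name in authors if author_matches(name, "![deleted]")]
print(kept)
```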
Here is an example showing which users had the highest use of the term “thanos did nothing wrong” over the past year:
/pushshift act q="thanos did nothing wrong" aggs=author after=365d
There are many different ways of visualizing data using this powerful command. If you need more assistance, feel free to contact me on Twitter or Reddit!
/pushshift timeofday
This command will show activity by time of day for an author, term, subreddit, etc. For example, you can compare two location subreddits (Sweden and San Francisco) by using the command:
/pushshift timeofday subreddit=sanfrancisco after=0
and then:
/pushshift timeofday subreddit=sweden after=0
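A time-of-day view boils down to bucketing each item's timestamp by hour. Here is a minimal Python sketch of that idea. The function name and sample timestamps are hypothetical, and using UTC hours is an assumption, since the time zone the bot uses is not stated here.

```python
from collections import Counter
from datetime import datetime, timezone

def activity_by_hour(timestamps):
    """Return a 24-element list with the number of items
    created in each UTC hour of the day (index 0 = 00:00-00:59)."""
    counts = Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]

# Hypothetical created_utc values (epoch seconds).
timestamps = [0, 3600, 3700, 86400]
histogram = activity_by_hour(timestamps)
print(histogram)
```

Comparing two subreddits then amounts to plotting two such histograms side by side. For communities in different time zones, the peaks would be expected to fall at different UTC hours.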
/pushshift timeofweek
This command is similar to the previous one, except that it shows activity by hour of the week. For example, here is the weekly activity for the subreddit /r/istodayfridaythe13th:
/pushshift timeofweek subreddit=istodayfridaythe13th after=0
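Bucketing by hour of the week works the same way as bucketing by hour of the day, but with 7 x 24 = 168 buckets. In the Python sketch below, the function name is hypothetical, and both the Monday start of the week and the use of UTC are assumptions rather than documented behavior of the bot.

```python
from collections import Counter
from datetime import datetime, timezone

def activity_by_hour_of_week(timestamps):
    """Return a 168-element list of item counts per hour of the week,
    where index = weekday * 24 + hour and Monday is weekday 0 (UTC)."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        counts[moment.weekday() * 24 + moment.hour] += 1
    return [counts.get(slot, 0) for slot in range(168)]

# Epoch 0 is Thursday 1970-01-01 00:00 UTC; 86400 is the Friday after it.
weekly = activity_by_hour_of_week([0, 86400, 86400 + 3600])
print(weekly.index(1), sum(weekly))
```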
/pushshift timeline
This command will show activity across a wide range of time. For example, this is the comment history for the subreddit /r/thanosdidnothingwrong:
/pushshift timeline subreddit=thanosdidnothingwrong after=0
As you can see, this tool is still in beta (the x-axis labels are a bit messed up in the previous example). I’ll be working on improving this as time allows!