Import Reddit social media networks with NodeXL


Reddit is a large and popular social media platform organized into “subreddits” that are topically focused.

NodeXL Pro can now import social media networks from Reddit.

To import Reddit data, select NodeXL Pro > Data > Import > Import from Reddit search network.

The dialog below is used to specify the query term.  The search can be limited to a specific subreddit.

Because Reddit's own search returns only the 250 most recent results, NodeXL integrates the PushShift.io service to collect more messages over a longer historical period.  NodeXL collects Reddit message IDs from PushShift.io and then collects the details of each message directly from Reddit.  This method can collect many more posts than Reddit search alone.
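
The second stage of that two-step collection (IDs first, details second) can be sketched in Python.  This is an illustration only, not NodeXL's actual implementation: it assumes the archive lookup has already returned bare post IDs, that posts use the "t3" fullname prefix, and that Reddit's /api/info endpoint accepts a comma-separated "id" list of up to 100 fullnames per request.  The function names are hypothetical, and the sketch only builds the request URLs; it does not perform any network calls or authentication.

```python
from typing import Iterable, Iterator, List
from urllib.parse import urlencode

# Assumption: /api/info accepts at most 100 fullnames per request.
BATCH_SIZE = 100


def to_fullnames(post_ids: Iterable[str], prefix: str = "t3") -> List[str]:
    """Turn bare IDs (e.g. from an archive lookup) into Reddit fullnames."""
    return [f"{prefix}_{post_id}" for post_id in post_ids]


def batched(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def info_urls(post_ids: Iterable[str]) -> List[str]:
    """Build one detail-lookup URL per batch of fullnames."""
    fullnames = to_fullnames(post_ids)
    return [
        "https://oauth.reddit.com/api/info?" + urlencode({"id": ",".join(batch)})
        for batch in batched(fullnames)
    ]


if __name__ == "__main__":
    for url in info_urls(["abc", "def"]):
        print(url)
```

Batching keeps the number of detail requests small: 250 archived IDs would need three requests under the assumed limit.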

The data collected includes the details of each post and the comments that respond to it.  Optionally, users can add the task of collecting details about the usernames mentioned in the collected posts (recommended).

Users can limit the total number of posts to be collected and the time frame to which the collection is limited.

Fullnames are the sequential message IDs used in Reddit.  Users can specify an initial and a final “fullname” to collect a specific range of message IDs.
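
To make the idea of a fullname range concrete, here is a minimal Python sketch.  It relies on Reddit's fullname format, a type prefix (such as "t3" for a post or "t1" for a comment) joined by an underscore to a base-36 ID.  The helper names parse_fullname and in_range are hypothetical, and how NodeXL itself interprets the initial and final fullname is not shown in this article.

```python
from typing import NamedTuple

# Reddit type prefixes and the kind of item each one identifies.
KIND_BY_PREFIX = {
    "t1": "comment",
    "t2": "account",
    "t3": "post",
    "t4": "message",
    "t5": "subreddit",
}


class Fullname(NamedTuple):
    kind: str
    number: int  # the base-36 ID converted to an integer


def parse_fullname(fullname: str) -> Fullname:
    """Split a fullname such as 't3_10' into its kind and numeric ID."""
    prefix, _, base36_id = fullname.partition("_")
    if prefix not in KIND_BY_PREFIX or not base36_id:
        raise ValueError(f"not a Reddit fullname: {fullname!r}")
    return Fullname(KIND_BY_PREFIX[prefix], int(base36_id, 36))


def in_range(fullname: str, first: str, last: str) -> bool:
    """Check whether a fullname lies between two fullnames of the same kind."""
    item, low, high = (parse_fullname(f) for f in (fullname, first, last))
    if not (item.kind == low.kind == high.kind):
        raise ValueError("all three fullnames must be of the same kind")
    return low.number <= item.number <= high.number


if __name__ == "__main__":
    print(parse_fullname("t3_10"))
    print(in_range("t3_15", "t3_10", "t3_1z"))
```

Converting the base-36 part to an integer is what allows an ordered comparison; comparing the raw strings would misorder IDs of different lengths.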

You must sign in to your Reddit account to access this data.  NodeXL will lead you through the process of authorizing it to use your account.

When configured as desired, select the OK button.

When the data collection is completed, NodeXL Pro can then run an automated analysis and reporting process to generate a network visualization and rankings along with content analysis over time.  Like Twitter networks, these are networks built on connections created when users reply to posts and comments from other users.
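
The way replies become network connections can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not NodeXL's own processing: each collected item is represented by a hypothetical (item ID, author, parent ID) tuple, and an edge runs from the replying user to the author of the post or comment being replied to.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical record layout: (item_id, author, parent_id or None for a top-level post).
Item = Tuple[str, str, Optional[str]]


def reply_edges(items: Iterable[Item]) -> Counter:
    """Count directed (replier, replied_to) edges in a set of posts and comments."""
    items = list(items)
    author_of = {item_id: author for item_id, author, _ in items}
    edges: Counter = Counter()
    for _, author, parent_id in items:
        parent_author = author_of.get(parent_id)
        # Skip top-level posts and replies whose parent was not collected.
        if parent_author is not None:
            edges[(author, parent_author)] += 1
    return edges


if __name__ == "__main__":
    thread = [
        ("p1", "alice", None),   # original post
        ("c1", "bob", "p1"),     # bob replies to alice
        ("c2", "carol", "p1"),   # carol replies to alice
        ("c3", "alice", "c1"),   # alice replies to bob
        ("c4", "bob", "p1"),     # bob replies to alice again
    ]
    for (source, target), weight in reply_edges(thread).items():
        print(source, "->", target, weight)
```

Repeated replies between the same pair are kept as an edge weight rather than as duplicate edges, and a user replying to their own post would appear as a self-loop.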

These networks often have multiple “hub-and-spoke” structures centered on the most influential and central people in the network.  These people are often a small fraction of the total population.
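
One simple way to spot such hubs is to rank users by the number of replies they receive (their weighted in-degree).  The Python sketch below assumes edges are stored as a mapping from (replier, replied-to) pairs to reply counts; the function names and the sample data are hypothetical, and in-degree is only one of several centrality measures that could be used here.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (replier, replied_to)


def in_degrees(edges: Dict[Edge, int]) -> Counter:
    """Total replies received by each user, ignoring self-replies."""
    received: Counter = Counter()
    for (source, target), weight in edges.items():
        if source != target:
            received[target] += weight
    return received


def top_hubs(edges: Dict[Edge, int], top_n: int = 3) -> List[Tuple[str, int]]:
    """The top_n users ranked by replies received."""
    return in_degrees(edges).most_common(top_n)


def hub_share(edges: Dict[Edge, int], top_n: int = 3) -> float:
    """Fraction of all replies that go to the top_n users."""
    received = in_degrees(edges)
    total = sum(received.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in received.most_common(top_n)) / total


if __name__ == "__main__":
    sample = {
        ("bob", "alice"): 2,
        ("carol", "alice"): 1,
        ("dave", "alice"): 1,
        ("alice", "bob"): 1,
    }
    print(top_hubs(sample, top_n=1))
    print(hub_share(sample, top_n=1))
```

A high share for a small top_n is the numerical signature of a hub-and-spoke pattern: most replies flow toward a few users.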

To automate the analysis of Reddit data in NodeXL, we recommend using a “recipe” or “settings options file” that controls the various steps and processes needed to take raw message-level data and process it as a network dataset.  A sample NodeXL Pro Reddit data recipe can be found on the NodeXL Graph Gallery or linked here.  Download the file and then use NodeXL Pro > Settings > Import to open a file browser and select that file.

Once the “recipe” file is imported, use NodeXL Pro > Graph > Automate > Run.

In a few minutes your data set should result in a network that may resemble this visualization of users discussing “ChatGPT”.
