
Extract networks from message archives – New Import from CSV and XLSX – NodeXL Pro (v.529+) finds “Vertex 2” in collections of messages

There are many text files containing collections of social media messages. People archive Tweets and Reddit posts for a wide variety of reasons. Researchers often have files whose format is roughly:

date/time, author name, message text

This format for social media data archives is easy to collect and store. But it is not very useful for network analysis. To convert it to a network format we need to use the author name and join it with the names of the users mentioned in the text. A network data file minimally needs two columns:

Sender, Receiver
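The conversion described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how NodeXL implements it: the column names `author` and `text`, the sample rows, and the `@handle` pattern are all assumptions made for the example.

```python
import csv
import io
import re

# Assumed handle pattern: "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@(\w+)")

def messages_to_edges(rows):
    """Yield one (sender, receiver) pair per @mention found in each message."""
    for row in rows:
        sender = row["author"]
        for receiver in MENTION.findall(row["text"]):
            yield sender, receiver

# Hypothetical archive in the "date/time, author name, message text" layout.
sample = io.StringIO(
    "date,author,text\n"
    "2015-02-24 11:35,alice,@bob thanks for the help\n"
    "2015-02-24 11:40,carol,flying today with @alice and @bob\n"
)
edges = list(messages_to_edges(csv.DictReader(sample)))
for sender, receiver in edges:
    print(sender, receiver, sep=", ")
```

Each message with several mentions produces several rows in the edge list, which is why a network file usually has more rows than the archive it came from.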

We often add many additional columns to these data sets, such as date and time, and information about the users’ accounts like their creation dates, self-descriptions, locations, and counts of messages and followers.

As an example, this is a collection of airline-related messages from Twitter, found on GitHub. The data is stored as a CSV file.

This is a common type of data set with a date and an author and message text along with a number of related attributes. In the NodeXL Import from file feature, you can now see these columns and select the ones that are relevant and map them to the attributes of the connection (“edge”) or to the first person or entity or the second person or entity (Vertex 1 and Vertex 2).

Once these data column attributes are selected and mapped, we can move on to the next area of the importer – the “Build an edge between…” feature.

This is where you author all the types of connections that can be encoded in a message on various social media platforms. For example, in X/Twitter, a user can interact with a message in seven distinct ways: tweet (post), reply, mention, retweet, quote, mention in retweet, mention in quote.

We can create rules for extracting each of these edge types in X from this data using this interface. For example, these are the settings for a “Mentions” edge in which a user’s @username is the very first part of a message.

This rule says: the “source vertex” (Vertex 1) is the username found in the column “name.” The second vertex (Vertex 2) is extracted from the column called “text”. In that column we instruct NodeXL to extract “Each” part of the text that “starts with” the character “@”, beginning with the second item found in the string.
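To make the logic of this rule concrete, here is a rough Python sketch of the same idea. It is not NodeXL code. The column names “name” and “text” come from the rule above; the function names, the sample row, the punctuation trimming, and the reading of “second item” as “second @-prefixed token” are assumptions.

```python
def extract_vertex2(text, prefix="@", start_index=1):
    """Return each whitespace-separated token that starts with `prefix`,
    skipping the first `start_index` matches (1 = begin with the second)."""
    tokens = [t for t in text.split()
              if t.startswith(prefix) and len(t) > len(prefix)]
    # Drop the prefix and any trailing punctuation (an assumed cleanup step).
    names = [t[len(prefix):].rstrip(".,:;!?") for t in tokens]
    return names[start_index:]

def mention_edges(row):
    """Build (Vertex 1, Vertex 2, edge type) tuples for one message row."""
    return [(row["name"], v2, "Mentions") for v2 in extract_vertex2(row["text"])]

# Hypothetical message: the first @-token is skipped, the rest become edges.
row = {"name": "alice", "text": "@airline @bob and @carol, my flight was late"}
edges = mention_edges(row)
print(edges)
```

Setting `start_index=0` would keep the first @-token as well, which is the kind of difference that separates one edge type from another.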

We can add a rule for each of the edge types we want to be able to recognize and generate:
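One way to picture a set of rules is as a list of small records that are each applied to every message. The sketch below is a hypothetical model, not NodeXL's internal format: the `EdgeRule` fields, the two example rules, and the simplification that a “Replies to” edge is just the first @-token are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EdgeRule:
    edge_type: str        # label written to the edge, e.g. "Mentions"
    source_column: str    # column holding Vertex 1
    text_column: str      # column searched for Vertex 2
    prefix: str           # marker that starts a Vertex 2 token
    first: int            # index of the first matching token to use
    last: Optional[int]   # one past the last token to use (None = all)

def apply_rules(row, rules):
    """Apply every rule to one message row and collect the resulting edges."""
    edges = []
    for rule in rules:
        tokens = [t[len(rule.prefix):].rstrip(".,:;!?")
                  for t in row[rule.text_column].split()
                  if t.startswith(rule.prefix) and len(t) > len(rule.prefix)]
        for vertex2 in tokens[rule.first:rule.last]:
            edges.append((row[rule.source_column], vertex2, rule.edge_type))
    return edges

# Two illustrative rules; a full set would have one entry per edge type.
RULES = [
    EdgeRule("Replies to", "name", "text", "@", 0, 1),
    EdgeRule("Mentions", "name", "text", "@", 1, None),
]
row = {"name": "alice", "text": "@airline @bob my bag is lost"}
edges = apply_rules(row, RULES)
print(edges)
```

Keeping the rules as plain data makes it easy to add an edge type without touching the extraction logic.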

Once these rules have been authored, we can save the set of rules using “Save Config”. In later sessions, users can use “Load config” rather than rewriting every rule each time.

A sample configuration for converting CSV files containing Tweets into the seven edge types (tweet (post), reply, mention, retweet, quote, mention in retweet, mention in quote) is available here.
