
How to summarize the URLs, Hashtags and @Users mentioned in clusters of users discussing a Twitter Topic with NodeXL

Social media networks tend to be “clumpy”. Here is the map of connections among people who tweeted the term “global warming”:

NodeXL v.210 and newer versions support text analysis of content collected from social media data sources. NodeXL applies social network clustering and then analyzes the text grouped by social cluster.

Connections among people who tweet about a topic, keyword or hashtag form patterns that can lead to the formation of sub-groups and clusters.  Multiple clusters are formed within a network when a sub-population of people link to one another far more than to people in other groups. These regions of dense connections define the boundaries between sub-populations. Clusters often reflect the variation in interest in certain people and topics in the population. Some people and topics are more interesting to one group than others. Within these groups certain people and words get repeated more often than others.

Networks can be partitioned by many methods. NodeXL implements several. A collection of vertices can be grouped by the user by applying labels to the vertex worksheet (“Group by vertex attribute”). Or a group of vertices can be determined by an algorithm that looks for differences in the density of connections and divides by the points of least association (“Group by cluster algorithm”). Networks can also be grouped into separate isolated collections of nodes, called “connected components”.
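As a rough illustration of the "connected components" idea, here is a minimal Python sketch that groups vertices by breadth-first search over an edge list. It is not NodeXL's implementation; the function name, the edge list, and the usernames are all made up for the example.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Group vertices into connected components with breadth-first search.

    Only vertices that appear in at least one edge are included.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            vertex = queue.popleft()
            members.append(vertex)
            for other in sorted(neighbors[vertex]):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(sorted(members))
    return components


# Hypothetical reply/mention edges between made-up usernames.
edges = [("ann", "bob"), ("bob", "cat"), ("dan", "eve")]
components = connected_components(edges)
```

Each returned list is one isolated collection of nodes: no edge connects a member of one list to a member of another.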

In NodeXL, groups can be visualized in multiple ways. Groups can be collapsed into meta-vertices that stand in for the members of that group (right-click the graph pane and select “Groups>Collapse all groups”). Group members can also be displayed within a “box” using the “group-in-a-box” feature (found in the layout selection menu in the Graph Pane: select “Layout Options”).

Within each group is a population of people along with the tweets they authored in the time period captured by the data set. Each group therefore has a collection of tweets that can be analyzed. The contents of all the tweets in a network can be scanned, and certain types of strings can be counted to measure how often each is mentioned. These counts can be repeated for each group, allowing groups to be contrasted based on the relative rates of strings like URLs, hashtags, and @usernames. Here is a sample of the worksheet NodeXL creates to display all the data about people, URLs, and hashtags frequently mentioned in each group:

The worksheet lists the top URLs, hashtags, and users across the entire network and within each sub-group. The details offer insights into the people and topics of greatest interest.
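The per-group counting described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how NodeXL computes its worksheet: the regular expressions are deliberately crude, and the group labels, tweets, and example.com URLs are invented.

```python
import re
from collections import Counter

# Simplified patterns (assumptions): real tweet parsing has more edge cases.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")
URL = re.compile(r"https?://\S+")


def count_matches(tweets, pattern, lowercase=True):
    """Count every string matched by `pattern` across a list of tweets."""
    counts = Counter()
    for text in tweets:
        found = pattern.findall(text)
        if lowercase:
            found = [item.lower() for item in found]
        counts.update(found)
    return counts


def summarize_groups(tweets_by_group, n=10):
    """Return the top-n hashtags, mentions, and URLs for each group."""
    summary = {}
    for group, tweets in tweets_by_group.items():
        summary[group] = {
            "hashtags": count_matches(tweets, HASHTAG).most_common(n),
            "mentions": count_matches(tweets, MENTION).most_common(n),
            # URL paths can be case-sensitive, so URLs are not lowercased.
            "urls": count_matches(tweets, URL, lowercase=False).most_common(n),
        }
    return summary


# Hypothetical tweets, already assigned to hypothetical groups.
tweets_by_group = {
    "G1": [
        "Reading about #GlobalWarming and #climate http://example.com/a",
        "@ann more on #globalwarming http://example.com/a",
    ],
    "G2": ["@bob #tcot says otherwise", "#tcot #oil @bob"],
}
summary = summarize_groups(tweets_by_group)
```

Running the same counter over every tweet in the network, rather than one group at a time, would give the "Entire Graph" columns.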

Top Hashtags in Tweets in G7    G7 Count
globalwarming                   24
climate                         14
climatechange                   10
environment                     9
agw                             6
books                           6
glennbeck                       6
rushlimbaugh                    6
wildlife                        5
science                         5


Top Hashtags in Tweets in G5    G5 Count
tcot                            13
teaparty                        4
oil                             4
globalwarming                   4
p2                              2
wrp                             2
yyc                             2
blameman                        1
libtards                        1
climatechange                   1


Top Hashtags in Tweets in G4    G4 Count
ff                              2
globalwarming                   2
jokeswritethemselves            1
silverlining                    1
ulooklikechazbonoonroids        1
jclogic                         1
climatechange                   1


Top URLs in Tweet, in Entire Graph (Entire Graph Count): 16, 14, 10, 10, 9, 9, 8, 8, 7, 7

This feature allows the content in sub-groups to be contrasted, thus answering the question: how is this sub-group the same or different from another sub-group?
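One simple way to put a number on "same or different" is to compare the top-hashtag lists of two groups directly. The sketch below uses the G7 and G5 counts from the tables above; the `contrast` function and the choice of Jaccard overlap are illustrative assumptions, not something NodeXL reports.

```python
# Top hashtag counts copied from the G7 and G5 tables above.
g7 = {
    "globalwarming": 24, "climate": 14, "climatechange": 10,
    "environment": 9, "agw": 6, "books": 6, "glennbeck": 6,
    "rushlimbaugh": 6, "wildlife": 5, "science": 5,
}
g5 = {
    "tcot": 13, "teaparty": 4, "oil": 4, "globalwarming": 4,
    "p2": 2, "wrp": 2, "yyc": 2, "blameman": 1, "libtards": 1,
    "climatechange": 1,
}


def contrast(a, b):
    """Return shared terms, terms unique to each side, and Jaccard overlap."""
    terms_a, terms_b = set(a), set(b)
    shared = sorted(terms_a & terms_b)
    only_a = sorted(terms_a - terms_b)
    only_b = sorted(terms_b - terms_a)
    # Jaccard overlap: shared terms divided by all distinct terms.
    jaccard = len(shared) / len(terms_a | terms_b)
    return shared, only_a, only_b, jaccard


shared, only_g7, only_g5, jaccard = contrast(g7, g5)
```

Because only the top ten hashtags per group are compared, a hashtag missing from one list may still occur in that group at a lower rank, so the overlap is a rough indicator rather than an exact measure.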
