
How to summarize the URLs, Hashtags and @Users mentioned in clusters of users discussing a Twitter Topic with NodeXL

Social media networks tend to be “clumpy”. Here is the map of connections among people who tweeted the term “global warming”:

NodeXL v.210 and newer supports text analysis of content collected from social media data sources. NodeXL first applies social network clustering and then analyzes the text grouped by social cluster.

Connections among people who tweet about a topic, keyword or hashtag form patterns that can lead to the formation of sub-groups and clusters.  Multiple clusters are formed within a network when a sub-population of people link to one another far more than to people in other groups. These regions of dense connections define the boundaries between sub-populations. Clusters often reflect the variation in interest in certain people and topics in the population. Some people and topics are more interesting to one group than others. Within these groups certain people and words get repeated more often than others.

Networks can be partitioned by many methods. NodeXL implements several. A collection of vertices can be grouped by the user by applying labels to the vertex worksheet (“Group by vertex attribute”). Or a group of vertices can be determined by an algorithm that looks for differences in the density of connections and divides by the points of least association (“Group by cluster algorithm”). Networks can also be grouped into separate isolated collections of nodes, called “connected components”.
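As an illustration of the last method, connected components can be found with a breadth-first search over the edge list. The following is a minimal Python sketch, not NodeXL's own implementation; the function name `connected_components` and the sample usernames are made up for the example, and edges are treated as undirected.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Group vertices into connected components (isolated sets of linked nodes)."""
    # Build an undirected adjacency list from (source, target) pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search collects every vertex reachable from `start`.
        queue = deque([start])
        seen.add(start)
        members = []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in sorted(neighbors[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(members))
    return components

# Hypothetical reply/mention edges between made-up usernames.
edges = [("alice", "bob"), ("bob", "carol"), ("dave", "erin")]
groups = connected_components(edges)
```

Here `groups` holds two components, one for alice, bob and carol and one for dave and erin. Grouping by a cluster algorithm goes further and splits a single connected component wherever the connections are sparse.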

In NodeXL, groups can be visualized in multiple ways. Groups can be collapsed into meta-vertices that stand in for the members of that group (right-click the graph pane and select “Groups>Collapse all groups”). Group members can also be displayed within a “box” with the “group-in-a-box” feature (found in the layout selection menu in the Graph Pane; select “Layout Options”).

Within each group is a population of people along with the tweets they authored in the time period captured by the data set, so each group has its own collection of tweets that can be analyzed. The contents of all the tweets in a network can be scanned, and certain types of strings can be counted to measure how often each is mentioned. These counts can be repeated for each group, allowing groups to be contrasted based on the relative rates at which strings like URLs, hashtags, and @usernames appear. Here is a sample of the worksheet NodeXL creates to display all the data about people, URLs, and hashtags frequently mentioned in each group:

The worksheet lists the top URLs, hashtags, and users across the entire network and within each sub-group. The details offer insights into the people and topics of greatest interest.
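The counting itself can be sketched in a few lines of Python. This is only an illustration of the idea, not how NodeXL computes its worksheet: the function name `top_strings_by_group`, the regular expressions, the sample tweets, and the group labels G1 and G2 are all assumptions made for the example, and the patterns are deliberately simple (they do not strip trailing punctuation from URLs, for instance).

```python
import re
from collections import Counter, defaultdict

# Simplified patterns; real tweet parsing has more edge cases.
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def top_strings_by_group(tweets, author_group, n=10):
    """Return {group: {kind: [(string, count), ...]}} with the n most frequent strings."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for author, text in tweets:
        urls = URL.findall(text)
        # Remove URLs first so a "#fragment" inside a link is not counted as a hashtag.
        rest = URL.sub(" ", text)
        found = {
            "urls": urls,
            "hashtags": [tag.lower() for tag in HASHTAG.findall(rest)],
            "mentions": [name.lower() for name in MENTION.findall(rest)],
        }
        group = author_group.get(author, "ungrouped")
        # Every tweet counts toward its author's group and toward the whole network.
        for key in (group, "Entire Graph"):
            for kind, strings in found.items():
                counts[key][kind].update(strings)
    return {
        key: {kind: counter.most_common(n) for kind, counter in kinds.items()}
        for key, kinds in counts.items()
    }

# Made-up tweets as (author, text) pairs and a made-up group assignment.
tweets = [
    ("alice", "New data on #GlobalWarming http://example.com/a via @bob"),
    ("bob", "#globalwarming and #climate http://example.com/a"),
    ("dave", "#tcot on #globalwarming @erin"),
]
author_group = {"alice": "G1", "bob": "G1", "dave": "G2"}
result = top_strings_by_group(tweets, author_group)
```

Each entry of `result` mirrors one block of the worksheet: `result["G1"]["hashtags"]` is the ranked hashtag list for group G1, and `result["Entire Graph"]` holds the network-wide totals.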

Top Hashtags in Tweets in G7    G7 Count
globalwarming                   24
climate                         14
climatechange                   10
environment                     9
agw                             6
books                           6
glennbeck                       6
rushlimbaugh                    6
wildlife                        5
science                         5

Top Hashtags in Tweets in G5    G5 Count
tcot                            13
teaparty                        4
oil                             4
globalwarming                   4
p2                              2
wrp                             2
yyc                             2
blameman                        1
libtards                        1
climatechange                   1

Top Hashtags in Tweets in G4    G4 Count
ff                              2
globalwarming                   2
jokeswritethemselves            1
silverlining                    1
ulooklikechazbonoonroids        1
jclogic                         1
climatechange                   1

 

Top URLs in Tweet, in Entire Graph  Entire Graph Count
http://LiveScience.com  16
http://bit.ly/IdTUlC  14
http://ow.ly/apxEv  10
http://is.gd/ZSXuVT  10
http://stevengoddard.wordpress.com/2012/04/21/arctic-ice-area-approaching-abnormally-high-range/  9
http://bit.ly/IbMs8o  9
http://www.financialpost.com/m/wp/fp-comment/blog.html?b=opinion.financialpost.com/2012/04/20/aristotles-climate  8
http://bit.ly/JwlWYw  8
http://yhoo.it/JdLq0Q  7
http://usat.ly/JdNKFh  7

This feature allows the content in sub-groups to be contrasted, thus answering the question: how is this sub-group the same or different from another sub-group?
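One simple way to put a number on that comparison is to look at which top hashtags two groups share. The sketch below uses the G7 and G5 counts from the worksheet sample above; the `contrast` function and the choice of Jaccard overlap are assumptions for illustration, not something NodeXL reports, and the comparison only covers the top-ten lists rather than every hashtag each group used.

```python
# Top hashtag counts for two groups, copied from the worksheet sample.
g7 = {"globalwarming": 24, "climate": 14, "climatechange": 10, "environment": 9,
      "agw": 6, "books": 6, "glennbeck": 6, "rushlimbaugh": 6,
      "wildlife": 5, "science": 5}
g5 = {"tcot": 13, "teaparty": 4, "oil": 4, "globalwarming": 4, "p2": 2,
      "wrp": 2, "yyc": 2, "blameman": 1, "libtards": 1, "climatechange": 1}

def contrast(a, b):
    """Return shared strings, strings unique to each side, and the Jaccard overlap."""
    shared = sorted(set(a) & set(b))
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    # Jaccard overlap: size of the intersection divided by size of the union.
    jaccard = len(shared) / len(set(a) | set(b))
    return shared, only_a, only_b, jaccard

shared, only_g7, only_g5, jaccard = contrast(g7, g5)
```

The two groups share only `climatechange` and `globalwarming` out of 18 distinct hashtags, which suggests they discuss the same topic with largely different vocabularies.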
