This tutorial shows you how to create a semantic network with the text analysis feature of NodeXL Pro, which can be applied to any column that contains text in the Edges or Vertices spreadsheets of a NodeXL workbook.
A semantic network is composed of words that are linked to other words. These links are created when words appear frequently next to one another in a collection of documents or messages. Semantic networks reveal the relationships between ideas embedded in the text collection. Maps of semantic networks can reveal the ways certain ideas are most central or the ways some ideas cluster together with others.
Running the NodeXL Pro Words and Word Pairs text analysis feature will create two new spreadsheets in the workbook: Words and Word Pairs. These spreadsheets can then be transformed into a network dataset in which Word Pairs are used as Edges and Words as Vertices.
Beyond semantic text network analysis, the same approach supports other kinds of analysis. For example, applying the text analysis feature to the Hashtags in Tweet column of a Twitter dataset creates a hashtag network, and pointing it at the video tag column of a YouTube dataset creates a video tag network.
Step 1: Create a NodeXL Pro workbook with Twitter data of your choice
In this example we will work with a dataset of 10k tweets collected with the Twitter Search Network Importer using the following search query: “climate change” lang:en. Adding the search operator lang:en restricts the results to English-language tweets, which avoids the semantic conflicts that arise when the same word has different meanings in different languages.
After the Twitter network data download, we run the NodeXL Pro Graph analysis automation feature to create a full social network and content analysis. You can learn how to automate your analysis in the tutorial: How to Automate NodeXL Pro. Running a full analysis is not necessary to create a word pair network, but it is a complementary dataset that will provide the context for this analysis. Thus, we recommend that you run task automation first.
You can find the full report and map for the search query above in NodeXL Graph Gallery.
Step 2: Take a look at the Words and Word Pairs spreadsheets
When task automation is finished, browse to the Words and Word Pairs spreadsheets that have been created (see image below). Both spreadsheets contain a Count column, which shows how often each word or word pair occurs in the whole dataset. The Words and Word Pair Metrics dialog box seen on the left is opened via Analysis > Graph Metrics > Words and word pairs > Options. The words and word pairs calculation itself runs as part of the automation process.
If a tweet contains the text “Social network analysis is a powerful research method”, there will be five word pairs created: Social network, network analysis, analysis powerful, powerful research, research method.
The words “is” and “a” are skipped because they appear in the “Skip these words” box. If you see many stop words in the Words spreadsheet, add them to the “Skip these words” box in the dialog box and re-run the text analysis.
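The pairing logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not NodeXL Pro's actual implementation: the function names, the two-word skip list, the lowercasing, and the tokenizing rule are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative skip list; the list in NodeXL Pro is longer.
SKIP_WORDS = {"is", "a"}

def word_pairs(text, skip_words=SKIP_WORDS):
    """Return pairs of adjacent words after dropping skip words."""
    # Assumption: lowercase the text and treat runs of letters, digits
    # and apostrophes as words.
    words = [w for w in re.findall(r"[a-z0-9']+", text.lower())
             if w not in skip_words]
    # Pair each remaining word with the word that follows it.
    return list(zip(words, words[1:]))

def count_pairs(messages, skip_words=SKIP_WORDS):
    """Count how often each word pair occurs across all messages."""
    counts = Counter()
    for message in messages:
        counts.update(word_pairs(message, skip_words))
    return counts

pairs = word_pairs("Social network analysis is a powerful research method")
print(len(pairs))  # 5
print(pairs)
```

Calling count_pairs on a list of tweets yields a tally comparable in spirit to the Count column of the Word Pairs spreadsheet.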
Step 3: Create a new network dataset with Word Pairs as Edges and Words as Vertices
Open a new NodeXL Pro workbook and set the network type to Directed via Graph > Network Type > Directed.
Then copy and paste the word pairs from the Word Pairs spreadsheet into the Edges spreadsheet of the new workbook so that Word 1 becomes Vertex 1 and Word 2 becomes Vertex 2 (as seen on the right). Also copy and paste the Count column, which will be used as Edge Weight in the new network.
Repeat this step for the Words spreadsheet: copy the words, together with their Count column, into the Vertices spreadsheet.
Note: The original Words and Word Pairs spreadsheets also contain counts by group, as specified in the dialog box above. Make sure to copy only Word Pairs with the category “Entire Graph” in the Group column (column F). Alternatively, you can focus on a single group and create a network from only that group, for example G1.
Save the file. Congratulations, you have now successfully created a new network dataset!
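If you prefer to prepare the copy outside of Excel, the transformation in this step can be sketched in Python. The sample rows below are made up for illustration, and the column names (Word, Word 1, Word 2, Count, Group) are assumptions based on the description above; check them against your own workbook. The sketch also assumes that the Words rows carry the same Group column as the Word Pairs rows.

```python
# Hypothetical rows standing in for the Word Pairs and Words spreadsheets.
word_pair_rows = [
    {"Word 1": "climate", "Word 2": "change", "Count": 120, "Group": "Entire Graph"},
    {"Word 1": "climate", "Word 2": "change", "Count": 45, "Group": "G1"},
    {"Word 1": "global", "Word 2": "warming", "Count": 30, "Group": "Entire Graph"},
]
word_rows = [
    {"Word": "climate", "Count": 200, "Group": "Entire Graph"},
    {"Word": "climate", "Count": 60, "Group": "G1"},
    {"Word": "change", "Count": 150, "Group": "Entire Graph"},
]

def to_edges(rows, group="Entire Graph"):
    """Keep one group's word pairs and rename the columns for the Edges sheet."""
    return [
        {"Vertex 1": r["Word 1"], "Vertex 2": r["Word 2"], "Edge Weight": r["Count"]}
        for r in rows
        if r["Group"] == group
    ]

def to_vertices(rows, group="Entire Graph"):
    """Keep one group's words and rename the columns for the Vertices sheet."""
    return [
        {"Vertex": r["Word"], "Count": r["Count"]}
        for r in rows
        if r["Group"] == group
    ]

edges = to_edges(word_pair_rows)
vertices = to_vertices(word_rows)
print(edges)
print(vertices)
```

Passing group="G1" instead builds the network for that single group, which mirrors the alternative mentioned in the note.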
Step 4: Run a Social Network Analysis on the new dataset
Download and unzip the current NodeXL Pro Data recipe bundle, then import and run the “Semantic Network – count” recipe, which will analyze, cluster, and visualize the word pair network. The result will look like the map on the left.
The data recipe uses the Count column of the Vertices spreadsheet to size the labels. Use the Scale slider above the graph pane to customize the sizing of the vertices and edges within the map.
Also take a look at the high-resolution image that has been attached to the file during the automation process.
For a better visual analysis of any group, go to the Groups spreadsheet and set the Visibility (column D) to Skip for all groups which you would like to remove from the map. Then click Refresh above the Graph Pane.
Step 5: Create a hashtag network
To create a hashtag network from the initial dataset, you only need to change the column in the Words and Word Pair Metrics dialog box shown in Step 2 from Tweet to Hashtags in Tweet, and then calculate new Words and Word Pairs spreadsheets. If you do not want to overwrite the existing Words and Word Pairs spreadsheets, save the initial dataset to a new file before running the text analysis.
Then repeat Steps 3 and 4 and you will end up with a hashtag network like the one on the right.
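The same pairing idea applies to hashtags. The sketch below assumes that each cell of the Hashtags in Tweet column holds the tweet's hashtags separated by spaces; that layout, the function name, and the sample values are assumptions for illustration, not a description of how NodeXL Pro processes the column.

```python
from collections import Counter

def hashtag_pairs(cell):
    """Return pairs of adjacent hashtags from one cell."""
    # Assumption: hashtags are separated by whitespace within a cell.
    tags = cell.lower().split()
    return list(zip(tags, tags[1:]))

# Hypothetical cell values; the empty string stands for a tweet without hashtags.
hashtag_cells = [
    "climatechange cop26 climateaction",
    "ClimateChange COP26",
    "",
]

hashtag_counts = Counter(
    pair for cell in hashtag_cells for pair in hashtag_pairs(cell)
)
print(hashtag_counts.most_common())
```

Tweets with zero or one hashtag contribute no pairs, so they add nothing to the hashtag network.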
Semantic networks composed of linked words reveal the relationships between ideas embedded in a collection of text. Semantic network maps can reveal the most central ideas in a corpus of text and highlight the ways some ideas cluster together. Contrasting these semantic network maps can reveal important differences in the ways distinct groups of people talk about a common issue or idea.
Use the NodeXL Pro Users Network Importer to compare the semantic networks of two (or more) Twitter users. For example, below you can compare the word networks of Donald Trump and Nancy Pelosi, each based on the user's past 3,200 tweets.
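Alongside the visual comparison, one simple way to contrast two word pair networks is to compare their edge sets directly. The sketch below is an illustrative approach rather than a NodeXL Pro feature; the function name and the sample pairs are hypothetical, and the Jaccard index is used here only as one possible overlap measure.

```python
def compare_networks(pairs_a, pairs_b):
    """Split two collections of word pairs into shared and unique edges."""
    a, b = set(pairs_a), set(pairs_b)
    union = a | b
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
        # Jaccard index: shared edges divided by all distinct edges.
        "jaccard": len(a & b) / len(union) if union else 0.0,
    }

# Hypothetical word pairs for two users, made up for illustration.
user_a_pairs = [("climate", "change"), ("green", "energy")]
user_b_pairs = [("climate", "change"), ("carbon", "tax")]

result = compare_networks(user_a_pairs, user_b_pairs)
print(result["shared"])
print(result["only_a"])
print(result["only_b"])
print(round(result["jaccard"], 2))
```

Word pairs that appear for only one user point to topics or phrasings that user emphasizes, while shared pairs mark common ground.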