Tutorial: Analyzing YouTube User Networks
NodeXL Pro offers several ways to access the official YouTube Data API (v3). With the NodeXL Pro YouTube data importers you can:
- analyze discussions around videos and search terms with User-to-User Video Comment Networks,
- explore Video-to-Video Networks based on video co-commenting, and
- explore User-to-User Channel subscription networks (very limited).
This tutorial shows you how to analyze YouTube User Networks. These networks are built from the comments and replies that users post in the comments section below a video. With the NodeXL Pro YouTube User Network importer you can analyze user networks around a single video or multiple videos.
Basic knowledge of Task Automation is required. The NodeXL Pro data recipes used in this tutorial can be found in the official recipe bundle on the page Automate NodeXL Pro. You can easily learn how to automate NodeXL Pro by reading that page, working through this tutorial and/or watching the accompanying video.
Step 1: Create YouTube API keys
The NodeXL Pro YouTube data importers come with an integrated API key of 100k units per day. Because this key can be used by any NodeXL user, its units are used up very quickly. That is why quota management is very important.
So before getting started, we strongly recommend that you create your own YouTube API keys for your project. Here is a guide on how to quickly obtain up to 10 API keys with a daily limit of 10,000 units per key. You can apply for an upgrade to 100,000 units per day, but this requires going through a review process.
The quota cost of the API calls depends on several factors: the overall number of comments, the number of replies to each comment and the overall number of unique users commenting on a video.
As a rule of thumb, calculate with lists of 50 videos, comments or replies, each list costing 50 units. Requesting a list of, e.g., 10 or 20 comments or replies costs the same as requesting 50. Still, the cost of a query is hard to predict when you do not know how many comments exist.
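To make the rule of thumb concrete, here is a minimal sketch of a quota estimate. It assumes the figures given above (pages of up to 50 items, about 50 units per page); these are the tutorial's working numbers, not official YouTube values, so check the official Quota Calculator for current costs. The function names are hypothetical helpers, not part of NodeXL.

```python
import math

# Rule of thumb from this tutorial (assumption, not official YouTube figures):
# results come in pages of up to 50 items and each page costs about 50 units.
PAGE_SIZE = 50
UNITS_PER_PAGE = 50


def estimate_units(item_count, page_size=PAGE_SIZE, units_per_page=UNITS_PER_PAGE):
    """Estimate the quota units needed to fetch item_count videos/comments/replies."""
    if item_count <= 0:
        return 0
    # A partially filled page (e.g. 10 or 20 items) costs as much as a full one.
    pages = math.ceil(item_count / page_size)
    return pages * units_per_page


def pages_within_budget(daily_units, units_per_page=UNITS_PER_PAGE):
    """Number of full-cost pages that fit into a daily quota."""
    return daily_units // units_per_page


print(estimate_units(10))            # 50
print(estimate_units(120))           # 150
print(pages_within_budget(10_000))   # 200
```

Under this rule of thumb, every partially filled page is billed like a full one, so a discussion spread over many small reply threads can cost more than the raw number of comments suggests.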
If you are interested in analyzing the comments and replies of a single video, you can get at least 9,500 comments and replies with one 10k API key. We have successfully tested downloads of up to 60,000 comments and replies from 40,000 users around a single video with one 10k API key.
For details, see the official YouTube Data API (v3) Quota Calculator. The quota costs shown there are subject to change; getting data has become more costly with every API update.
If your API key runs out of quota during import of data, you will receive the following error message: “The network couldn’t be obtained. The request cannot be completed because you have exceeded your quota”. As a result you will see only a partial dataset or no data at all in your workbook.
Step 2: Import a YouTube data recipe
The first step is to import the data recipe with the file name “YouTube User Network 01 – standard.NodeXLOptions”. This recipe performs all steps needed for a full-scale social network analysis and runs text and content analysis on the Comment column of the Edges spreadsheet. Alternatively, you can choose the recipes “YouTube User Network 02 – alternative layout.NodeXLOptions” or “YouTube User Network 03 – large”, which use different layout options but include the same analytical steps.
Step 3: Save the workbook
Before getting started, it is helpful to save the workbook under a filename that records the basic import settings, e.g. “NodeXL User Network 100rel-100-100 2022-07-15.xlsx”. This name means that on July 15, 2022 the search term NodeXL was used to import 100 videos sorted by relevance, with a maximum of 100 comments per video and a maximum of 100 replies per comment.
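If you run many imports, a small helper can build such names consistently. This is a minimal sketch of the naming convention shown above; it assumes “rel” stands for the Relevance sort order, and the other sort abbreviations are hypothetical choices of our own.

```python
from datetime import date

# "rel" follows the example filename; the other abbreviations are hypothetical.
SORT_CODES = {
    "relevance": "rel",
    "date": "date",
    "rating": "rat",
    "view count": "view",
    "title": "title",
}


def workbook_name(search_term, max_videos, sort_by, max_comments, max_replies, import_date):
    """Compose a workbook filename that records the import settings."""
    sort_code = SORT_CODES[sort_by.lower()]
    return (f"{search_term} User Network "
            f"{max_videos}{sort_code}-{max_comments}-{max_replies} "
            f"{import_date.isoformat()}.xlsx")


print(workbook_name("NodeXL", 100, "Relevance", 100, 100, date(2022, 7, 15)))
# NodeXL User Network 100rel-100-100 2022-07-15.xlsx
```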
Step 4: Open the YouTube User Network importer
Open the YouTube User Network importer via Data > Import > From YouTube User Network…
Step 5a: Data importer setup – search term or video ID(s)
On the left you see two options to import data – either by entering a search term or by entering a (list of) video ID(s).
When entering a list of videos, you first need to identify the video ID within each video URL. For example, the video ID of the URL https://www.youtube.com/watch?v=mjAq8eA7uOM is mjAq8eA7uOM. Some video URLs contain tracking parameters or time codes right after the ID; these need to be removed.
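If you collect many URLs, extracting the IDs by hand is error-prone. The sketch below uses Python's standard urllib.parse to take the ID from standard watch URLs (the v= parameter) and from youtu.be short links, ignoring extra parameters such as time codes. It is our own helper, not part of NodeXL, and it does not cover other URL forms.

```python
from urllib.parse import urlparse, parse_qs


def video_id_from_url(url):
    """Return the video ID from a YouTube URL, dropping tracking or time parameters."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "youtu.be":
        # Short links such as https://youtu.be/<id>?t=42: the ID is the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        if video_id:
            return video_id
    elif parsed.path == "/watch":
        # Standard links: the ID is the value of v=; other parameters (t=, si=, ...) are ignored.
        ids = parse_qs(parsed.query).get("v")
        if ids:
            return ids[0]
    raise ValueError(f"No video ID found in {url!r}")


print(video_id_from_url("https://www.youtube.com/watch?v=mjAq8eA7uOM&t=42s"))  # mjAq8eA7uOM
```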
Step 5b: Data importer setup – edge creation
In the section “Add an edge for each”, check the boxes “User that commented on a video” and “User that replied to a comment on a video”. Each comment or reply creates one row (edge) on the Edges spreadsheet, so the number of edges should match the number of comments shown below the video on the corresponding YouTube page. In some cases the number of edges is lower than the official count. The reason is currently unclear; we can rule out a bug in the NodeXL code and assume it has to do with privacy settings or with deleted comments or users.
In addition, NodeXL adds one row for each video with the edge relationship “Posted Video”. This is a simple self-loop (Vertex 1 = Vertex 2) created for the user/channel that posted the video. The number of “Posted Video” edges is equal to the number of videos selected in the next step.
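Putting the two rules together gives a quick sanity check for a finished import. This minimal sketch assumes that no comment or reply limits were set and that the count shown on each video page already includes replies; the function names are hypothetical.

```python
def expected_edge_rows(shown_comment_counts):
    """Expected number of rows on the Edges spreadsheet.

    shown_comment_counts: the comment count displayed on each video's page
    (assumed to include replies), one entry per imported video.
    """
    # One edge per comment or reply, plus one "Posted Video" self-loop per video.
    return sum(shown_comment_counts) + len(shown_comment_counts)


def missing_edges(actual_rows, shown_comment_counts):
    """Rows missing compared with YouTube's counts (e.g. deleted or private comments)."""
    return max(0, expected_edge_rows(shown_comment_counts) - actual_rows)


print(expected_edge_rows([120, 45]))  # 167
print(missing_edges(160, [120, 45]))  # 7
```

A small shortfall is expected for the reasons described above; a large one more likely points to import limits or an exhausted quota.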
Step 5c: Data importer setup – search options
In the Options section you can limit the amount of collected videos, comments and replies. These settings play a big role in your quota calculations.
Depending on the popularity of the search term, the API may not return the full number of requested videos – usually no more than about 200 videos per search.
For the video search, you can also sort the results by Relevance, Date, Rating, View Count or Title, and select a time frame for the video publication date.
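For readers who want to see what these options correspond to at the API level, here is a sketch of a YouTube Data API search.list request URL. The parameters (q, type, order, publishedAfter, publishedBefore, maxResults, pageToken) belong to the public search.list endpoint; that NodeXL maps its options onto them in exactly this way is our assumption, and the API key value is a placeholder.

```python
from typing import Optional
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"

# Importer sort options and the matching search.list "order" values (assumed mapping).
ORDER_VALUES = {
    "Relevance": "relevance",
    "Date": "date",
    "Rating": "rating",
    "View Count": "viewCount",
    "Title": "title",
}


def search_url(api_key: str, term: str, sort_by: str = "Relevance",
               published_after: Optional[str] = None,
               published_before: Optional[str] = None,
               page_size: int = 50, page_token: Optional[str] = None) -> str:
    params = {
        "part": "snippet",
        "type": "video",
        "q": term,
        "order": ORDER_VALUES[sort_by],
        "maxResults": min(page_size, 50),  # search.list returns at most 50 results per page
        "key": api_key,
    }
    # Publication time frame as RFC 3339 timestamps, e.g. "2022-01-01T00:00:00Z".
    if published_after:
        params["publishedAfter"] = published_after
    if published_before:
        params["publishedBefore"] = published_before
    # Further pages are requested with the nextPageToken of the previous response.
    if page_token:
        params["pageToken"] = page_token
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(search_url("YOUR_API_KEY", "climate change", "View Count", "2022-01-01T00:00:00Z"))
```

Because each page holds at most 50 results, a cap of about 200 videos per search corresponds to only a handful of pages.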
Step 6: Automate
When you have finished the importer setup, click OK to download the data. After the download, click Graph > Automate > Run to analyze the data set. If you select the option “Automate the graph after the data is imported” under Data > Import > Import Options before opening the importer, the analysis starts automatically.
Below you see an example setup with the search term “climate change” and the resulting network map.
Step 7: Review the data
When Task Automation is done, the workbook is ready to explore.
On the Edges spreadsheet, the column “Comment” contains the text of the comments and replies, and further columns hold metadata such as the commenting user, the publication date of the comment and its like count. NodeXL also extracts URLs and hashtags embedded in the comments.
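To illustrate the kind of extraction involved, here is a simple regex-based sketch that pulls URLs and hashtags out of a comment. It is not NodeXL's implementation, and its patterns are simplifying assumptions (for example, trailing punctuation is stripped from URLs and hashtags are lowercased so that #ClimateChange and #climatechange count as one).

```python
import re

# Simplified patterns for illustration; NodeXL's own extraction rules may differ.
URL_PATTERN = re.compile(r"https?://\S+")
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_urls_and_hashtags(comment):
    """Return (urls, hashtags) found in a comment text."""
    # Strip trailing punctuation that usually belongs to the sentence, not the URL.
    urls = [u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(comment)]
    # Remove URLs first so that fragments like page#intro are not read as hashtags.
    text_without_urls = URL_PATTERN.sub(" ", comment)
    hashtags = [h.lower() for h in HASHTAG_PATTERN.findall(text_without_urls)]
    return urls, hashtags


print(extract_urls_and_hashtags("Great talk #ClimateChange, see https://example.org/page#intro."))
# (['https://example.org/page#intro'], ['climatechange'])
```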
The Vertices spreadsheet contains vertex centrality metrics as well as metadata on the collected users/channels, such as the channel's name, view count, subscriber count and video count.
Example: Single video network
By entering a single video ID under the “Import from video list” option, you can analyze the whole discussion around one video. The example below shows the user network around the video “Carl Sagan testifying before Congress in 1985 on climate change” (video ID Wp-WiNXH6hI). The current network consists of 9,131 comments from 5,813 users. In this case, limiting the download is not needed. The resulting NodeXL Pro network report is available in the NodeXL Graph Gallery.
Useful practical advice:
In many cases it is useful to start a project with simple lists of videos to review the available videos around a specific search term. The resulting list contains video metadata such as links to the videos, the video creators, and the video titles and descriptions, and it can be sorted by view count, comment count or publication date. The quota consumption for these lists is very low, so you can run many list imports with just one 10k API key.
This data review may lead to a list of video IDs that can then be analyzed with both of the NodeXL Pro YouTube data importers.
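Once such a list has been reviewed, the selection of video IDs can be scripted. This minimal sketch assumes you have exported the list into simple records; the Video fields and the selection rule (keep videos with comments, rank by comment or view count) are hypothetical choices, and printing one ID per line is just one convenient way to copy them into the importer.

```python
from dataclasses import dataclass


@dataclass
class Video:
    # Hypothetical record for one row of a reviewed video list.
    video_id: str
    title: str
    view_count: int
    comment_count: int


def pick_video_ids(videos, sort_by="comment_count", min_comments=1, top_n=10):
    """Keep videos with at least min_comments comments and return the top_n IDs."""
    candidates = [v for v in videos if v.comment_count >= min_comments]
    candidates.sort(key=lambda v: getattr(v, sort_by), reverse=True)
    return [v.video_id for v in candidates[:top_n]]


videos = [
    Video("id_a", "Video A", 1000, 5),
    Video("id_b", "Video B", 50, 300),
    Video("id_c", "Video C", 9000, 0),  # no comments, so no user network to analyze
]
print("\n".join(pick_video_ids(videos)))  # id_b, then id_a
```

Filtering out videos without comments first avoids spending quota on imports that cannot produce a user network.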
Here is how you set up the NodeXL Pro YouTube Video Network importer to download a simple list of videos: