
How to schedule the creation of a network with NodeXL and Windows Task Scheduler

NodeXL has a number of data importers that can create a network of connections from social media data sources such as Twitter, YouTube, Flickr, email, and the WWW. It can also import a number of other data formats, including GraphML, UCINet, CSV, and other Excel workbooks with data.

To create a network you just select the search terms and configurations you want from the NodeXL>Data>Import menu.

If you want to create the same network every day (or on any schedule), a recent feature of NodeXL (available since version .125) can help. NodeXLNetworkServer.exe is an application that ships with NodeXL along with a sample configuration file called SampleNetworkConfiguration.xml. By editing the configuration file you can tell NodeXL which network to collect without going through the Excel menus. So far we have exposed the two Twitter data collectors (more are on the way), so the configuration file asks you to select a search term or a user’s name, the size of the network, and the details you want reported, along with the location and name of the destination file that NodeXL will create. Answer these questions by editing the config file, and save it with a useful name that includes the search term.

Step-by-step details follow:

Editing the Configuration file

NodeXL ships with a SampleNetworkConfiguration.xml file that you should copy, rename, and edit.

You may want to create a directory to hold these configuration files if you expect to have many of them.
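If you end up with many configurations, the copy-and-rename step can be scripted. The following is a minimal Python sketch, not part of NodeXL: the helper names `config_name_for` and `copy_sample_config` and the naming scheme are hypothetical, and the location of the sample file depends on where NodeXL is installed on your machine.

```python
import re
import shutil
from pathlib import Path


def config_name_for(search_term: str) -> str:
    """Build a file name that includes the search term (hypothetical naming scheme)."""
    # Replace each run of characters that are not letters, digits, or
    # underscores with a single underscore, then trim underscores at the ends.
    slug = re.sub(r"\W+", "_", search_term).strip("_")
    return f"{slug}_NetworkConfiguration.xml"


def copy_sample_config(sample_path: str, config_folder: str, search_term: str) -> Path:
    """Copy the sample configuration into its own folder under a descriptive name."""
    folder = Path(config_folder)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = folder / config_name_for(search_term)
    shutil.copyfile(sample_path, target)
    return target
```

The copy still has to be edited by hand (or by another script) before it is useful; this only takes care of the naming.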

The configuration file is written in XML, but it is very human readable and editable. Here are the key elements of the file that you need to choose and configure.

NetworkType – Specifies the type of network to get.  Must be one of the following values: TwitterSearch or TwitterUser.

After setting NetworkType, you must also edit one of the following sections:

TwitterSearchNetworkConfiguration

TwitterUserNetworkConfiguration

TwitterSearchNetworkConfiguration is used only if NetworkType is TwitterSearch.

SearchTerm – What to search for.

WhatToInclude – What to include in the network.  This must be a combination of the following values, separated by commas:

Statuses – Include each person’s status (tweet).

Statistics – Include each person’s statistics.

FollowedEdges – Include an edge for each followed relationship.

RepliesToEdges – Include an edge from person A to person B if person A’s tweet is a reply to person B.

MentionsEdges – Include an edge from person A to person B if person A’s tweet mentions person B.

<WhatToInclude>Statuses,Statistics,FollowedEdges,RepliesToEdges,MentionsEdges</WhatToInclude>
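A typo in this value is easy to make, so it can help to check it before scheduling a collection. Here is a small Python sketch that does so using only the names listed above. The function name `parse_what_to_include` is hypothetical, and the sketch tolerates spaces around the commas, which NodeXL itself may or may not accept.

```python
# Allowed values for TwitterSearch networks, taken from the list above.
SEARCH_WHAT_TO_INCLUDE = {
    "Statuses",
    "Statistics",
    "FollowedEdges",
    "RepliesToEdges",
    "MentionsEdges",
}


def parse_what_to_include(value: str, allowed=SEARCH_WHAT_TO_INCLUDE) -> list:
    """Split a comma-separated WhatToInclude value and reject unknown names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in names if name not in allowed]
    if unknown:
        raise ValueError(f"Unknown WhatToInclude value(s): {', '.join(unknown)}")
    return names
```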

MaximumPeoplePerRequest – The maximum number of people to request for each query, or leave empty for no limit.

NetworkFileFolder – Full path to the folder where the network files should be stored.

NetworkFileFormats – Specifies the file formats to save the network to.  This must be a combination of the following values, separated by commas:

GraphML – Save the network to a GraphML file, which can be imported into a NodeXL workbook.

NodeXLWorkbook – Save the network directly to a NodeXL workbook.  To use this option, the NodeXL Excel Template must be installed on this computer.

<NetworkFileFormats>GraphML,NodeXLWorkbook</NetworkFileFormats>

AutomateNodeXLWorkbook – Specifies whether the NodeXL Excel Template’s automate feature should be run on the workbook.  Must be true or false.  This is used only if NetworkFileFormats (above) includes NodeXLWorkbook.

If true, the automate options you most recently set in the NodeXL Excel Template are used to automate the workbook.  To set the automate options, do the following:

1. Open the NodeXL Excel Template.

2. In the Excel ribbon, go to NodeXL, Graph, Automate.

Note that the “On this workbook” and “On every NodeXL workbook in this folder” selection in the Automate dialog box is ignored when automating the workbook from the NodeXL Network Server.

<AutomateNodeXLWorkbook>true</AutomateNodeXLWorkbook>
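If you would rather script these edits than make them by hand, something like the following Python sketch could work. It relies on assumptions that this post does not confirm: that the elements appear in the file under exactly the names given above, that the file does not use an XML namespace, and that the settings live inside the TwitterSearchNetworkConfiguration element. The function name `set_search_settings` is hypothetical. Note that Python's ElementTree drops XML comments when it rewrites a file, so any explanatory comments in your copy would be lost.

```python
import xml.etree.ElementTree as ET


def set_search_settings(xml_text: str, settings: dict) -> str:
    """Return xml_text with NetworkType set to TwitterSearch and the given
    elements inside TwitterSearchNetworkConfiguration updated."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so the exact nesting does not matter here.
    for node in root.iter("NetworkType"):
        node.text = "TwitterSearch"
    section = next(root.iter("TwitterSearchNetworkConfiguration"), None)
    if section is None:
        raise ValueError("No TwitterSearchNetworkConfiguration section found")
    for name, value in settings.items():
        element = next(section.iter(name), None)
        if element is None:
            raise ValueError(f"No {name} element in the search section")
        element.text = value
    return ET.tostring(root, encoding="unicode")
```

A call such as `set_search_settings(text, {"SearchTerm": "nodexl", "AutomateNodeXLWorkbook": "true"})` changes only the named elements and leaves everything else in the section as it was.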

TwitterUserNetworkConfiguration

This section is used only if NetworkType is TwitterUser.

ScreenNameToAnalyze – The screen name of the Twitter user whose network should be analyzed.

WhatToInclude – What to include in the network.  This must be a combination of the following values, separated by commas:

FollowedVertices – Include a vertex for each person followed by the user.

FollowerVertices – Include a vertex for each person following the user.

LatestStatuses – Include each person’s latest status (tweet).

FollowedFollowerEdges – Include an edge for each followed relationship if FollowedVertices is specified, and include an edge for each follower relationship if FollowerVertices is specified.

RepliesToEdges – Include an edge from person A to person B if person A’s latest tweet is a reply to person B.

MentionsEdges – Include an edge from person A to person B if person A’s latest tweet mentions person B.

<WhatToInclude>FollowedVertices,FollowerVertices,LatestStatuses,FollowedFollowerEdges,RepliesToEdges,MentionsEdges</WhatToInclude>

NetworkLevel – Network level to include.  Must be One, OnePointFive, or Two.
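The TwitterUser section has its own set of allowed values, which differ from the TwitterSearch ones (for example, LatestStatuses rather than Statuses). A quick check along these lines can catch a mix-up before a scheduled run. This is only a sketch: the function name `check_user_settings` is hypothetical, and the allowed values are copied from the descriptions above.

```python
# Allowed values for TwitterUser networks, taken from the descriptions above.
USER_WHAT_TO_INCLUDE = {
    "FollowedVertices",
    "FollowerVertices",
    "LatestStatuses",
    "FollowedFollowerEdges",
    "RepliesToEdges",
    "MentionsEdges",
}
NETWORK_LEVELS = {"One", "OnePointFive", "Two"}


def check_user_settings(what_to_include: str, network_level: str) -> list:
    """Return a list of problems found; an empty list means the values look valid."""
    problems = []
    for name in what_to_include.split(","):
        if name.strip() not in USER_WHAT_TO_INCLUDE:
            problems.append(f"unknown WhatToInclude value: {name.strip()!r}")
    if network_level not in NETWORK_LEVELS:
        problems.append(
            f"NetworkLevel must be One, OnePointFive, or Two, not {network_level!r}"
        )
    return problems
```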

MaximumPeoplePerRequest – The maximum number of people to request for each query, or leave empty for no limit.

NetworkFileFolder – Full path to the folder where the network files should be stored.

NetworkFileFormats – Specifies the file formats to save the network to.  This must be a combination of the following values, separated by commas:

GraphML – Save the network to a GraphML file, which can be imported into a NodeXL workbook.

NodeXLWorkbook – Save the network directly to a NodeXL workbook.  To use this option, the NodeXL Excel Template must be installed on this computer.

AutomateNodeXLWorkbook – Specifies whether the NodeXL Excel Template’s automate feature should be run on the workbook.  Must be true or false.  This is used only if NetworkFileFormats (above) includes NodeXLWorkbook.

If true, the automate options you most recently set in the NodeXL Excel Template are used to automate the workbook.  To set the automate options, do the following:

1. Open the NodeXL Excel Template.

2. In the Excel ribbon, go to NodeXL, Graph, Automate.

Note that the “On this workbook” and “On every NodeXL workbook in this folder” selection in the Automate dialog box is ignored when automating the workbook from the NodeXL Network Server.

One key configuration choice is this last one, which turns on automated processing of the resulting data set. If you set it to true, NodeXL constructs a Twitter social network and then performs all of the automation steps that you have defined. For example, NodeXL can calculate graph metrics, find clusters, create subgraphs, apply autofill column mappings of data to display attributes, set graph layouts and settings, and render a graph without any human intervention.
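Because AutomateNodeXLWorkbook is used only when NetworkFileFormats includes NodeXLWorkbook, it is easy to set it to true and then wonder why nothing was automated. The dependency can be expressed as a short Python check. This is a sketch of the rule as described above, with a hypothetical function name; it is not NodeXL code.

```python
# File formats named in the NetworkFileFormats description.
FILE_FORMATS = {"GraphML", "NodeXLWorkbook"}


def automation_will_run(network_file_formats: str, automate: str) -> bool:
    """Return True only if automation is switched on AND a workbook is being saved."""
    formats = [f.strip() for f in network_file_formats.split(",") if f.strip()]
    unknown = [f for f in formats if f not in FILE_FORMATS]
    if unknown:
        raise ValueError(f"Unknown NetworkFileFormats value(s): {', '.join(unknown)}")
    if automate not in ("true", "false"):
        raise ValueError("AutomateNodeXLWorkbook must be true or false")
    return automate == "true" and "NodeXLWorkbook" in formats
```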

Once you have a properly edited configuration file you can simply open a command line session (go to the Start menu, type in “CMD”) and type the following:

> NodeXLNetworkServer.exe SampleNetworkConfiguration.xml
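The same command can be launched from a script, which is handy if you want to wrap a collection in your own logging or error handling. Below is a minimal Python sketch. The function names are hypothetical, and this post does not say what exit codes NodeXLNetworkServer.exe returns, so the sketch simply passes the code back to the caller.

```python
import subprocess


def server_command(server_exe: str, config_file: str) -> list:
    """Return the argument list for one run: the program, then its configuration file."""
    return [server_exe, config_file]


def run_collection(server_exe: str, config_file: str) -> int:
    """Run one collection and return the process exit code."""
    # Passing a list lets Python take care of quoting paths that contain spaces.
    completed = subprocess.run(server_command(server_exe, config_file))
    return completed.returncode
```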

You will then see a stream of messages reporting which parts of the network are being collected:

This is a simple example and not too useful: you may as well just do this through the NodeXL Excel interface.  But things get more interesting when one more piece is added: Windows Task Scheduler.  You may not see it that often but Task Scheduler is on almost every Windows desktop and can be used to automate the collection of NodeXL data sets from Twitter and other sources of social media networks.

To get to Task Scheduler just type its name into the Start Menu search box.  It should look like this:

Using this tool you can create new Tasks that will execute at a specific time and frequency.

Create and select a new folder (call it something like “NodeXL Data Collections”) under Task Scheduler Library to hold all your NodeXL data collections separate from other scheduled tasks.

Use the Create Task menu item on the far right.

You will get a Create Task dialog box:

Enter a Name and a Description that captures the search terms of your query and then select the “Actions” tab.

Actions are where the command that you want to execute is defined.  Select “New…” to create a new Action.

This will open a dialog box in which you can define the program to be run along with any settings.

Enter the complete path to the NodeXLNetworkServer.exe application in the “Program/script” field.

Add the complete path to the configuration file in the “Add arguments” field.

To easily capture the complete path (and the quotation marks needed if there are spaces in the path) hold the SHIFT key down while right clicking the name of the configuration file you want to schedule for collection.  You should see an option “Copy as Path” which will place the needed information into the clipboard.  After selecting “Copy as Path”, return to the “New Action” dialog box and paste the path into the “Add arguments” field.
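The quoting rule that “Copy as Path” takes care of is simple enough to state in code. The following Python sketch (with a hypothetical function name) wraps a path in double quotes only when it contains a space and is not quoted already, which is useful if you build the argument string yourself instead of pasting it.

```python
def quote_path(path: str) -> str:
    """Wrap a path in double quotes if it contains a space and is not already quoted."""
    already_quoted = path.startswith('"') and path.endswith('"')
    if " " in path and not already_quoted:
        return f'"{path}"'
    return path
```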

Once the path to the NodeXLNetworkServer.exe file and the path to the configuration file have been entered into the New Action dialog, switch to the Triggers tab of the Create Task dialog.  Select “New…” to create a new trigger.

Triggers are defined by many things, but we will focus on time.  When the New Trigger dialog is set to “On a schedule” you can choose the time and frequency to run the collection.  If you want to run a collection every day at 7:01:30 AM, the following settings should work:

Once the time and recurrence are set, select OK and you have a new task!  You may want to create a task that starts in a minute or two to test that the event fires properly.  You can then schedule these collections to run at an hour when you will not be bothered by the interruption.  Multiple collections can be scheduled, but several limits suggest that only a few should run simultaneously.  You may want to stagger collections so that only a few start each day or hour, which allows one collection to complete before others begin.  When these tasks execute you should see a collection session appear in a console window and report updates as it steps through the many stages of constructing a network dataset.
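If you have many collections to schedule, clicking through the dialogs for each one gets tedious. Windows also ships a command-line tool, schtasks, that can create the same kind of daily task. The Python sketch below spaces out start times and builds one `schtasks /Create` argument list per collection. Treat it as an illustration under assumptions rather than a tested recipe: the function names, the task names, and the two-hour gap in the usage note are hypothetical, and the paths passed in should not already be wrapped in quotes. Note that the `/ST` option takes a 24-hour HH:mm time without seconds, so the 7:01:30 AM example above becomes 07:01.

```python
from datetime import datetime, timedelta


def staggered_start_times(first_start: str, count: int, gap_minutes: int) -> list:
    """Return `count` start times in HH:MM form, spaced gap_minutes apart."""
    start = datetime.strptime(first_start, "%H:%M")
    return [
        (start + timedelta(minutes=gap_minutes * i)).strftime("%H:%M")
        for i in range(count)
    ]


def schtasks_daily_args(task_name: str, server_exe: str,
                        config_file: str, start_time: str) -> list:
    """Build a `schtasks /Create` argument list for a task that runs every day."""
    # /TR takes the whole command line as one string, so both paths are
    # quoted inside it in case they contain spaces.
    command = f'"{server_exe}" "{config_file}"'
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",
        "/TN", task_name,
        "/TR", command,
        "/ST", start_time,
    ]
```

For example, `staggered_start_times("07:01", 3, 120)` gives three start times two hours apart, and each resulting argument list can be handed to `subprocess.run` on the Windows machine that should run the collections.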

Collection is much faster if you have a rate-limit-lifted account, which you must request from Twitter.  With a credentialed rate-limit-lifted account you can perform several queries per hour.  With a regular account with credentials (your Twitter login) you can get one or two queries per day, depending on the size of the data collected.  In either case it is possible to reach the limit that Twitter imposes.  When that happens NodeXL will pause the collection and wait until the API query budget refreshes and Twitter is willing to serve more query results.  As a result, even accounts without lifted rate limits can create large, complex social media network maps, although at a much slower rate.
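The pause-and-resume behavior described here follows a common pattern for working within an API query budget. The Python sketch below illustrates that general pattern only. It is not NodeXL's code, and every name in it (`collect_with_pauses`, `remaining`, `reset_at`) is hypothetical.

```python
import time


def collect_with_pauses(requests, remaining, reset_at,
                        clock=time.time, sleep=time.sleep):
    """Run each request, pausing whenever the query budget is used up.

    requests  -- callables that each perform one query (hypothetical)
    remaining -- callable returning how many queries are left in the budget
    reset_at  -- callable returning the epoch time at which the budget refreshes
    """
    results = []
    for request in requests:
        if remaining() <= 0:
            # Wait until the budget refreshes instead of giving up.
            sleep(max(0.0, reset_at() - clock()))
        results.append(request())
    return results
```

The collection always finishes eventually; a smaller budget just means more time spent waiting, which matches the slower but still complete collections described above.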