NodeXL Pro Tutorial: Wikipedia Article-Article Networks

By Dr. Verónica Espinoza-González (Twitter @Verukita1) February 2023

This tutorial shows the steps required to generate a Wikipedia article-to-article network using NodeXL Pro’s “From MediaWiki Page Network” importer.

Wikipedia is a free, multilingual, collaboratively edited encyclopedia available via the Internet. Anyone can contribute knowledge on any topic, with the aim of building a database of all human knowledge. It is managed by the Wikimedia Foundation, a non-profit organization funded by donations [1].

Its content is provided entirely by volunteers, who are free to create, edit and review any entry. Wikipedia is structured as an interconnected network of articles: each article can contain multiple hyperlinks to other Wikipedia entries [2].

Follow the steps below to generate a network of Wikipedia pages and their connections to other Wikipedia pages.

Tutorial requirements

Basic knowledge of task automation in NodeXL Pro is required. The NodeXL Pro data recipes used in this tutorial can be found in the official recipe bundle on the Automate NodeXL Pro page. You can learn how to automate NodeXL Pro by reading that page, following the automation tutorial, or watching the accompanying video.

Step 1: Data recipe import

Import the NodeXL data recipe via NodeXL Pro > Options > Import, select Wikipedia – Page Network.NodeXLOptions and click Open.

Step 2: Open the data importer

Open the MediaWiki Page Network data importer via NodeXL Pro > Data > Import > From MediaWiki Page Network.

Step 3: Importer setup

Open a web browser and find the Wikipedia page that you want to analyze. In the NodeXL Pro importer, enter the page title and the wiki's domain in the “Seed Article” and “Wiki Domain” fields, as shown in the image (Figure 3). Select the “Article-Article Network” option with a level of 1.5 (you can select another level if required). Finally, click “Download” and wait for the download to finish.
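The tutorial does not describe how the importer queries the wiki, so the following is only a rough sketch of what a request for a page's outgoing links can look like when made directly against the public MediaWiki Action API. The function name links_query_url, the example domain "en.wikipedia.org" and the example seed article "Social network analysis" are illustrative assumptions, not values taken from NodeXL Pro.

```python
from urllib.parse import urlencode


def links_query_url(wiki_domain: str, seed_article: str) -> str:
    """Build a MediaWiki Action API URL that lists links on one page.

    Hypothetical helper for illustration; NodeXL Pro's own requests
    may differ.
    """
    params = {
        "action": "query",      # standard query module
        "prop": "links",        # outgoing links of the given page
        "titles": seed_article,
        "plnamespace": 0,       # article namespace only
        "pllimit": "max",       # as many links per request as allowed
        "format": "json",
    }
    # Assumes the wiki serves its API at /w/api.php, as Wikipedia does.
    return f"https://{wiki_domain}/w/api.php?{urlencode(params)}"


# Example values are assumptions chosen for illustration.
url = links_query_url("en.wikipedia.org", "Social network analysis")
print(url)
```

The two arguments correspond to the “Wiki Domain” and “Seed Article” fields of the importer. Opening the printed URL in a browser shows the raw list of linked pages for the seed article.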

Network level options

The example above uses a 1.5 network. The importer first collects all pages that are linked from the seed page; the seed page and these pages form the 1.0 network. It then collects the list of links from each of these articles and adds network edges between articles that are connected to each other, which yields the 1.5 network.

The 2.0 network also contains all articles that are linked from the articles of the 1.0 network, which results in a much larger network. Downloading a 2.0 page network may take a long time because the Wikipedia API is rather slow. Compare the different network sizes and shapes below.
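The difference between the three levels can be illustrated with a small sketch. It works on a hand-made dictionary of links rather than live Wikipedia data, and it follows the description above, not NodeXL Pro's actual implementation. The function build_network and the page names in TOY_LINKS are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # directed edge: (linking page, linked page)


def build_network(links: Dict[str, List[str]], seed: str, level: float) -> Set[Edge]:
    """Return the edges of a 1.0, 1.5 or 2.0 network around the seed page."""
    if level not in (1.0, 1.5, 2.0):
        raise ValueError("level must be 1.0, 1.5 or 2.0")

    neighbors = set(links.get(seed, []))
    # 1.0: the seed page and every page it links to.
    edges = {(seed, target) for target in neighbors}
    if level == 1.0:
        return edges

    members = neighbors | {seed}
    for page in neighbors:
        for target in links.get(page, []):
            # 1.5 keeps only links that stay inside the 1.0 network;
            # 2.0 also keeps links that lead to new pages.
            if level == 2.0 or target in members:
                edges.add((page, target))
    return edges


# Hypothetical toy data: page title -> titles of the pages it links to.
TOY_LINKS = {
    "Seed": ["A", "B"],
    "A": ["B", "C"],
    "B": ["Seed", "D"],
}

level_10 = build_network(TOY_LINKS, "Seed", 1.0)
level_15 = build_network(TOY_LINKS, "Seed", 1.5)
level_20 = build_network(TOY_LINKS, "Seed", 2.0)
print(len(level_10), len(level_15), len(level_20))  # prints: 2 4 6
```

Even in this tiny example the edge count grows at each level. On real Wikipedia pages, where a single article can link to hundreds of others, the jump from 1.5 to 2.0 is far larger.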

1.0 network
2.0 network

Step 4: Review the data and graph

After the data download completes, select Graph > Automate > Run to run the data recipe imported in Step 1. When automation finishes, you will see a fully populated NodeXL workbook and the corresponding network graph.


