Social Media Research Foundation

7:00 am in Foundation, SMRF - Social Media Research Foundation by Marc Smith

Hello!

We are a group of researchers who want to create open tools, generate and host open data, and support open scholarship related to social media.

Social media is the term for all the ways people connect to people through computation.  Mobile devices, social networks, micro-blogging and location sharing are just a few of the ways people engage in computer-mediated collective action.

Mapping, measuring and understanding the landscape of social media is our mission.  We support tool projects that enable the collection, analysis and visualization of social media data.  We host data sets that are relevant to social media research.  And we will support graduate students studying and building research related to social media.

Today, our primary project is NodeXL, the free and open network overview, discovery and exploration add-in for Excel 2007 (and 2010) that extends the familiar spreadsheet so it can collect, analyze and visualize complex social networks.

We plan to take on additional projects that improve the variety and quality of data available to the NodeXL social network analysis platform (among others that consume the open GraphML format).

Bernie Hogan's Facebook Network Map featured in Journal of Social Structure (JoSS) (Made with NodeXL)

7:00 pm in Facebook, Industry, JCMC - Journal of computer-mediated communication, JoSS, Journal, Network clusters and communities, NodeXL, Oxford, Papers, Research, Social Media, Social Network Analysis, Social network, Sociology, University, Visualization by Marc Smith

The Journal of Social Structure has released its First Annual JoSS Visualization Symposium results and two of the images were generated with NodeXL.  One of the two is Bernie Hogan’s radial layout applied to representing Facebook Friend networks.

http://jossviz.wordpress.com/2010/06/23/friendwheel-layout-of-a-facebook-network/

The Journal of Social Structure (JoSS) is an electronic journal of the International Network for Social Network Analysis (INSNA).  Here is Bernie’s description of the graph.

This is a “pinwheel” diagram using the author’s Facebook personal network (captured July 15, 2009).

Nodes represent the author’s friends and links represent friendships among them. The author is not shown. Each ‘wing’ radiating outwards is a partition found using a greedy community detection algorithm (Wakita and Tsurumi, 2007). Wings are manually labelled. Node ordering within each wing is based on degree. Node color and size are also based on degree. Node position is based on a polar coordinate system: consecutive nodes are separated by an equal angle of 360º/n (for n nodes), and each node’s radius is a log-scaled measure of its betweenness. Higher values are closer to the center, indicating a sort of cross-partition ‘gravity’.

This layout has several notable features:

- The angle of each wing is proportionate to its share of the network. Thus a wing holding 25 percent of the nodes spans 90º, for example from 0 to 90º.

- Partitions are distinguished by their position rather than a node’s color or shape.

- The tail indicates the periphery of each partition. A wing with many tail nodes indicates many people who are only tied to other group members.

- Edges crossing the center show between-partition connections. Since nodes are sorted by degree it is easy to see if edges originate from the most highly connected nodes or the entire partition.
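The placement rules in this description can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the published figure: it assumes partition labels, degree and betweenness have already been computed elsewhere (for example in NodeXL), and the function name `pinwheel_layout`, the wing ordering (largest first) and the exact radius formula are hypothetical choices.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple


def pinwheel_layout(
    nodes: Dict[str, Tuple[str, int, float]],
    inner_radius: float = 0.1,
    outer_radius: float = 1.0,
) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: map node -> (angle in degrees, radius).

    nodes maps a node id to (partition label, degree, betweenness).
    """
    if not nodes:
        return {}
    # Group nodes into wings by partition label.
    wings = defaultdict(list)
    for node, (part, degree, btw) in nodes.items():
        wings[part].append((node, degree, btw))
    # Assumption: larger wings first; within a wing, higher degree first.
    ordered = []
    for part in sorted(wings, key=lambda p: (-len(wings[p]), str(p))):
        ordered.extend(sorted(wings[part], key=lambda t: (-t[1], str(t[0]))))

    # Equal angular step, so a wing's arc is proportional to its node count.
    step = 360.0 / len(ordered)
    # Log-scale betweenness; the highest value gets the smallest radius.
    logs = [math.log1p(btw) for _, _, btw in ordered]
    top = max(logs) or 1.0  # avoid dividing by zero when all betweenness is 0
    layout = {}
    for i, ((node, _, _), lb) in enumerate(zip(ordered, logs)):
        radius = inner_radius + (outer_radius - inner_radius) * (1.0 - lb / top)
        layout[node] = (i * step, radius)
    return layout
```

Because every node gets the same angular step, a wing holding a quarter of the nodes automatically covers a quarter of the circle; converting to screen coordinates is then just `r * cos(angle)` and `r * sin(angle)` after converting the angle to radians.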



Bernie’s chapter on analyzing Facebook networks with NodeXL appears in the book: Analyzing Social Media Networks with NodeXL: Insights from a connected world.

July 22nd, 2010 SNA event at Stanford: Network Analysis Made Easy: Using NodeXL To Map Social Media Networks

11:13 am in Blog by Marc Smith

There is a Stanford Media X event on July 22nd, 2010 on new tools for SNA: Network Analysis Made Easy: Using NodeXL To Map Social Media Networks (http://mediax.stanford.edu/WSI/marc.html).

Bring a laptop (running Windows and Office 2007 or 2010) to this workshop and you can be analyzing a social media network from systems like Twitter, flickr, YouTube and your own email by the end of the day. If you can make a pie chart in Excel, you can now use the free and open NodeXL (http://nodexl.codeplex.com) to make a rich network graph from data extracted from social media systems and other common formats. If you have a network, bring it; if not, bring a suggested topic that we can map during the course of the day.

Even if you leave your laptop behind or have a Mac (sorry, no version is yet available for Mac OS, unless you have a virtual machine with Windows and Office), this workshop will introduce the core concepts of network science as applied to social networks in general and social media networks in particular.

Applied to a range of topics and services, social media network maps can illuminate a variety of “publics”: populations who share a common interest and may share connections. Maps of topics like “oil spill”, “global warming” and other issue- and event-related keywords can reveal the groups and factions that cluster around different concepts and terms. Key contributors in these maps can be identified through network measurements that capture various aspects of a person’s location in a network graph.

July 12-13, 2010: Microsoft Research Faculty Summit, Redmond, WA

10:42 am in Blog by Marc Smith

Faculty Summit

The 2010 Microsoft Research Faculty Summit was held July 12 and 13 in Redmond, Washington. Among the many panels and discussions on the state of computer science, the NodeXL team had several representatives talking about ways network science education can be expanded using an easy-to-use network analysis application built on Excel.

Jimmy Lin from the University of Maryland also attended to speak about programming in the cloud.

Here is the abstract for the NodeXL talk:

NodeXL – Social Network Analysis in Excel (Natasa Milic-Frayling, Microsoft Research; Ben Shneiderman, University of Maryland; Marc Smith, Connected Action)

Businesses, entrepreneurs, individuals, and government agencies alike are looking to social network analysis (SNA) tools for insight into trends, connections, and fluctuations in social media. Microsoft’s NodeXL is a free, open-source SNA plug-in for use with Excel. It provides instant graphical representation of relationships of complex networked data. But it goes further than other SNA tools: NodeXL was developed by a multidisciplinary team of experts who bring together information studies, computer science, sociology, human-computer interaction, and over 20 years of visual analytic theory and information visualization into a simple tool anyone can use. This makes NodeXL of interest not only to end-users but also to researchers and students studying visual and network analytics and their application in the real world. NodeXL has the unique feature that it imports networks from Outlook email, Twitter, flickr, YouTube, WWW, and other sources, plus it offers a rich set of metrics, layouts, and clustering algorithms. This talk will describe NodeXL and our efforts to start the Social Media Research Foundation.

Some photos from the event:

Saul Greenberg at the 2010 MSR Faculty Summit

Saul Greenberg

Ben Shneiderman and Andy van Dam 2010 MSR Faculty Summit

Ben Shneiderman and Andy van Dam

Ben Shneiderman, Natasa Milic-Frayling, and Marc Smith at the 2010 MSR Faculty Summit

Ben Shneiderman, Natasa Milic-Frayling and Marc Smith

Tom McMail and Marc Smith at 2010 MSR Faculty Summit

Tom McMail and Marc Smith


Automatic for the people (who use the latest NodeXL!). Release v.1.0.1.128

8:57 am in Blog by Marc Smith

The NodeXL team has just released a new version (v.1.0.1.128) that contains a new “Automation” feature that allows users to define a collection of operations to perform on their network graphs and invoke the complete set in a single button click AND reuse that configuration on other workbook graphs.  In fact, the feature will apply the configuration you define to all the files you specify, allowing easy processing of large collections of network data sets.

This week the feature is partially complete. Users can automate the following operations: merge duplicate edges, calculate graph metrics, auto-fill columns, create sub-graph images, find clusters, and show the graph. Performed manually, these operations can require dozens of clicks. If you have dozens or hundreds of network data sets, the result is a daunting case of repetitive strain injury and carpal tunnel syndrome. With automation, these operations can instead be carried out orders of magnitude more often without much pain!
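The idea behind Automation, one saved list of operations replayed over every data set, can be sketched as a tiny pipeline. This is not NodeXL's implementation (which runs inside Excel); it is a hypothetical Python illustration in which each graph is a plain edge list, and the two operations shown, `merge_duplicate_edges` and `calculate_degree`, stand in for NodeXL's much larger set.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical graph representation: {"edges": [(source, target), ...]}
Step = Callable[[dict], dict]


def merge_duplicate_edges(graph: dict) -> dict:
    # Collapse repeated (source, target) pairs into one edge with a count.
    graph["edge_weights"] = dict(Counter(graph["edges"]))
    return graph


def calculate_degree(graph: dict) -> dict:
    # Assumes merge_duplicate_edges ran first; counts each merged edge
    # once at each of its two endpoints.
    degree = Counter()
    for source, target in graph["edge_weights"]:
        degree[source] += 1
        degree[target] += 1
    graph["degree"] = dict(degree)
    return graph


def run_automation(graphs: Dict[str, dict], steps: List[Step]) -> Dict[str, dict]:
    # Replay the same ordered list of steps on every data set.
    results = {}
    for name, graph in graphs.items():
        graph = dict(graph)  # shallow copy so the caller's dict is not modified
        for step in steps:
            graph = step(graph)
        results[name] = graph
    return results


# One saved "configuration", applied to two time slices.
config = [merge_duplicate_edges, calculate_degree]
slices = {
    "day1": {"edges": [("ann", "bob"), ("ann", "bob"), ("bob", "cy")]},
    "day2": {"edges": [("cy", "ann")]},
}
out = run_automation(slices, config)
```

Keeping the configuration as an ordered list of steps is what makes it reusable: the same `config` can be handed to any number of files without re-clicking anything.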

The next release will feature the complete package, which will also include control over the layout and graph options. As a result, network visualizations can be produced automatically in a pipeline: users will be able to specify a query using the NodeXL desktop network data collector and then automate the processing of large collections of data sets.

The result should be better analysis of time series data sets that have many “slices”.  The feature points the way to additional development work for supporting the comparison between networks to evaluate their evolution.


The R.E.M. album “Automatic for the People” takes its title from the motto of Weaver D’s Delicious Fine Foods, an eatery in Athens, Georgia.


Paper: Tech Report at University of Maryland on EventGraphs

7:23 am in Blog by Marc Smith

A new paper on visualizing social media has been released in the University of Maryland Human-Computer Interaction Laboratory (HCIL) tech report archive. Co-authored by Derek Hansen, myself, and Ben Shneiderman, the paper describes and visualizes the patterns of connections formed when people tweet about events like conferences and news stories.

EventGraphs_2010_HCIL_Tech_Report

http://www.cs.umd.edu/localphp/hcil/tech-reports-search.php?number=2010-13

Hansen, D., Smith, M., Shneiderman, B.
EventGraphs: Charting Collections of Conference Connections
HCIL-2010-13

EventGraphs are social media network diagrams constructed from content selected by its association with time-bounded events, such as conferences. Many conferences now communicate a common “hashtag” or keyword to identify messages related to the event. EventGraphs help make sense of the collections of connections that form when people follow, reply or mention one another and a keyword. This paper defines EventGraphs, characterizes different types, and shows how the social media network analysis add-in NodeXL supports their creation and analysis. The paper also identifies the structural and conversational patterns to look for and highlight in EventGraphs and provides design ideas for their improvement.
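A minimal sketch of how an EventGraph's vertices and edges might be derived from tweets, assuming each tweet is available as an (author, text) pair. Only "replies to" and "mentions" edges are built here; "follows" edges need follower lists that tweet text does not contain. The function name, the regular expressions and the rule that a tweet beginning with @name is a reply are illustrative assumptions, not NodeXL's importer.

```python
import re
from typing import Iterable, List, Set, Tuple

HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@(\w+)")


def build_event_graph(
    tweets: Iterable[Tuple[str, str]], hashtag: str
) -> Tuple[Set[str], List[Tuple[str, str, str]]]:
    """Hypothetical helper: (vertices, edges) for tweets carrying the event hashtag.

    Each edge is (source, target, relationship).
    """
    tag = hashtag.lower()
    vertices: Set[str] = set()
    edges: List[Tuple[str, str, str]] = []
    for author, text in tweets:
        if tag not in {h.lower() for h in HASHTAG.findall(text)}:
            continue  # not associated with the event
        source = author.lower()
        vertices.add(source)  # authors who mention nobody remain as isolates
        for i, name in enumerate(MENTION.findall(text)):
            target = name.lower()
            # Convention: a tweet that starts with @name is treated as a reply.
            relation = "replies to" if i == 0 and text.startswith("@") else "mentions"
            vertices.add(target)
            edges.append((source, target, relation))
    return vertices, edges
```

Matching whole hashtags (rather than substrings) keeps a tag such as `#sunbelt2010` from being counted as `#sunbelt`, and keeping authors as vertices even when they mention nobody preserves the isolates that make EventGraphs differ from one another.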


Mapping the connections among people who tweet #sunbelt

9:53 am in Blog by Marc Smith

The International Sunbelt Social Network Conference is the official conference of the International Network for Social Network Analysis (INSNA).

This year’s INSNA “Sunbelt” conference is at the Riva del Garda Fierecongressi, Trento, Italy! Here is the 2010 INSNA Sunbelt Program.

This is the NodeXL map of connections among people who tweeted the hashtag used for the conference “#sunbelt”.

2010 - July - NodeXL - sunbelt - 2010-07-01

Having now seen several of these maps for other topics and events (see: http://www.flickr.com/photos/marc_smith/sets/72157622437066929/), I can place this map in context. It is a small group, but it has a high density of connections. It lacks isolates, the people who use the term but do not connect to anyone else who uses it. This means that this is a very “in-group” population: if you know to use the #sunbelt hashtag, you probably connect to someone else who uses the term. It is a single major cluster of connected people; no obvious sub-graphs or clusters are visible. Not everyone is central in the graph, and those who are have a prominent role in the network science community. Here is the top-ten list of Twitter users who mentioned #sunbelt, ranked by betweenness centrality.

miriamnotten
barrywellman
memeticbrand
isidromj
drewconway
gephi
kristtina
danevans87
valdiskrebs
ciro
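The three observations above (density, the absence of isolates, and a betweenness ranking) can each be computed directly. This sketch treats the map as an undirected, unweighted graph and uses Brandes' algorithm for betweenness centrality; it is a hypothetical illustration, and NodeXL's own implementation, edge handling and normalization may differ.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(vertices: Iterable[str], edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    # Undirected and unweighted: duplicate edges collapse, self-loops are dropped.
    adj: Dict[str, Set[str]] = {v: set() for v in vertices}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def density(adj: Dict[str, Set[str]]) -> float:
    # Share of all possible undirected ties that are present.
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def isolates(adj: Dict[str, Set[str]]) -> List[str]:
    return sorted(v for v, nbrs in adj.items() if not nbrs)


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Brandes' algorithm: one BFS per source, then back-propagate dependencies."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest s-v paths
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # farthest vertices first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Undirected graph: every pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


def top_by_betweenness(adj: Dict[str, Set[str]], k: int = 10) -> List[str]:
    scores = betweenness(adj)
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]
```

A graph like the #sunbelt map would show a high `density` value and an empty `isolates` list, and `top_by_betweenness(adj, 10)` produces a ranking in the same spirit as the list above (ties broken alphabetically here, which is an arbitrary choice).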


Pierre de Vries Telco Industry Network Map featured in Journal of Social Structure (JoSS) (Made with NodeXL)

3:19 pm in Blog by Marc Smith

The Journal of Social Structure has released its First Annual JoSS Visualization Symposium results and two of the images were generated with NodeXL. One of the two is The Evolution of FCC Lobbying Coalitions by Pierre de Vries, Research Fellow at the Economic Policy Research Center, University of Washington, Seattle.

Pierre has closely studied telecommunications policy and regulation in the United States for many years. He has generated a remarkable network map built from the details of filings made to the FCC over more than a decade. Companies make these filings when they agree or disagree with a proposed policy. When two companies file in support of (or in opposition to) the same policy, they create a tie between them. The collection of these connections forms a complex network of coalitions and factions.
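The tie rule just described amounts to projecting a two-mode network (organizations and policies) onto the organizations. A minimal sketch, assuming the filings have already been reduced to hypothetical (organization, proceeding, position) records and that a tie requires the same position on the same proceeding; the function name and record layout are illustrative, and Pierre's own pipeline, described in the quotes that follow, used VBA macros and NodeXL.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Tuple


def coalition_ties(filings: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], int]:
    """Hypothetical helper. filings holds (organization, proceeding, position) records.

    Two organizations are tied once for every (proceeding, position) pair
    they share; the tie weight is that count.
    """
    sides = defaultdict(set)
    for org, proceeding, position in filings:
        sides[(proceeding, position)].add(org)  # a set ignores repeat filings
    ties: Dict[Tuple[str, str], int] = defaultdict(int)
    for orgs in sides.values():
        # Sorting makes each pair's key order-independent: (a, b) with a < b.
        for a, b in combinations(sorted(orgs), 2):
            ties[(a, b)] += 1
    return dict(ties)
```

The resulting weighted pairs form an ordinary edge list, which is the kind of input NodeXL works from.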

http://www.cmu.edu/joss/content/issues/2010jossviz/5_deVries.htm

“The graph is derived from meta-data associated with documents that are filed electronically whenever an organization interacts with the FCC, in accordance with the Administrative Procedures Act. Whenever a letter, comment or other document is filed, the filer provides information on the parties involved, number of pages, relevant proceedings, date, etc.”

“Once the data is cleaned up, an edge list is created in Excel by running another VBA macro. A graph is created from this list with NodeXL, a social network analysis and visualization add-in for Excel 2007. NodeXL’s Fruchterman-Reingold algorithm is used to prepare a preliminary layout; nodes are then moved by hand into visually intelligible positions, respecting the clusters suggested by NodeXL’s implementation of the Wakita-Tsurumi algorithm. Nodes are colored on the basis of eigenvector centrality. The degree of investment that organizations make in lobbying is measured by the total number of filings it made in this proceeding over the period of study, and reflected in the size of the node. This information is obtained by running another VBA macro against the underlying ECFS metadata, and then matching that to the vertices in the graph.”
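Eigenvector centrality, which drives the node colors in this map, gives a node a high score when its neighbors also score highly. A minimal power-iteration sketch for an undirected graph stored as a Python adjacency dict; this is not NodeXL's code, and the tolerance, the iteration cap and the scaling (largest score equals 1) are assumptions.

```python
from typing import Dict, Set


def eigenvector_centrality(
    adj: Dict[str, Set[str]], tol: float = 1e-9, max_iter: int = 1000
) -> Dict[str, float]:
    """Hypothetical helper: power iteration, with the top score scaled to 1."""
    if not adj:
        return {}
    x = dict.fromkeys(adj, 1.0)
    for _ in range(max_iter):
        # Multiply by (A + I): the identity shift leaves the leading eigenvector
        # unchanged but stops the iteration oscillating on bipartite graphs.
        nxt = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(nxt.values())
        nxt = {v: s / top for v, s in nxt.items()}
        if sum(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    raise RuntimeError("power iteration did not converge")
```

For a star with one hub and three leaves, the hub scores 1 and each leaf converges to 1/√3, reflecting that the leaves borrow all of their importance from the hub.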

Read more about this industry network at JoSS.


Book: Flier and Cover Art – Analyzing social media networks with NodeXL: Insights from a connected world

11:13 am in Book, Collective Action, Common Goods, Community, Connected Action, Maryland, Measuring social media, Metrics, Network clusters and communities, Network data providers (spigots), Network metrics and measures, Network visualization layouts, NodeXL, Performance scale parallel and cloud computing, Research, Social Media, Social Network Analysis, Social Roles, Social network, Sociology, University, User interface, Visualization by Marc Smith

The production team at Morgan Kaufmann has created a cover and a flier for the forthcoming book:

2010 – June – NodeXL Book Flyer.

Written and edited by Derek Hansen, Ben Shneiderman and Marc Smith, the book contains contributed chapters on sample social media systems:

[Chapter 10]: Twitter: Conversation, Entertainment and Information, All in One Network!

By Vladimir Barash and Scott Golder

[Chapter 11]: Visualizing and Interpreting Facebook Networks

By Bernie Hogan

[Chapter 12]: WWW Hyperlink Networks

By Robert Ackland

[Chapter 13]: Flickr: Linking People, Photos, and Tags

By Eduarda Mendes Rodrigues and Natasa Milic-Frayling

[Chapter 14]: YouTube: Contrasting Patterns of Interaction and Prominence

By Dana Rotman and Jennifer Golbeck

[Chapter 15]: Wiki Networks: Networks of Creativity and Collaboration

By Howard T. Welser, Patrick Underwood, Dan Cosley, Derek Hansen, and Laura Black

This handy poster contains many details about the book contributors, chapters, and the book cover (which you can also see below):

2010 - Book - Analyzing Social Media Networks with NodeXL Cover

Analyzing Social Media Networks with NodeXL: Insights from a Connected World

New NodeXL Network Server (v1.0.1.126) – Frequently Asked Questions

11:01 pm in Metrics, Network data providers (spigots), NodeXL, Social Media, Social network, Twitter, User interface by Marc Smith

NodeXL Network Server Frequently Asked Questions

The NodeXL team has released a new version (v.1.0.1.126) with better support for collecting data from social media network sources, starting with Twitter.  The NodeXL Network Server program now ships in every NodeXL installation.  Tony, the lead developer on the team, created the following FAQ to explain how to use the collector application.

This document describes how the NodeXL Network Server works.

  • What is the NodeXL Network Server?

It’s a Windows command-line program that downloads a network from Twitter and stores the network on disk in several file formats.  It can be run directly from a command line, but is typically scheduled to run on a periodic basis via the Task Scheduler that is built into Windows.

  • Where can the files be found?

The files are in NodeXL’s program folder.  To find out where the folder is, right-click the Microsoft NodeXL, Excel 2007 Template menu item in the Windows Start menu, then select Properties.  On 32-bit English computers, the folder is “C:\Program Files\Microsoft Research\Microsoft NodeXL Excel Template.”

  • Who are its intended users?

The Server is meant for use by people with moderate system administration skills.  It is not difficult to use, but it is not intended for the same audience as the NodeXL Excel Template, where ease of use is of high priority.

  • How do you run the Server from the Windows command line?

Like this:

NodeXLNetworkServer.exe NetworkConfiguration.xml

The program takes a single argument, which is the path to a configuration file that specifies which network should be downloaded and how the network should be saved to disk.  A particular configuration file might specify “Get the Twitter search network for people whose tweets contain ‘Sociology,’ add an edge for each ‘mentions’ relationship, limit to 100 people, include tweets, include statistics, and store the network as a GraphML file in the C:\NodeXLNetworks folder.”

The program immediately gets the requested network, saves it to disk, and exits.  On its own, it does not run on a periodic basis.

  • How do I create a configuration file?

You create a configuration file by copying a provided template file and editing the copy in Notepad.  The template file is named SampleNetworkConfiguration.xml and is stored in the same folder as the program.  The file is in XML format and the XML tags are clearly named and documented.

  • In what file formats can the network be saved to disk?

You can save the network as a GraphML file, which can be imported into a NodeXL workbook; directly as a NodeXL workbook; or both.

  • Do you typically run the program from the command line?

No. Instead, you typically run it as a scheduled task via a built-in Windows program called Task Scheduler.

Task Scheduler is a powerful utility that lets you run any program, including NodeXL Network Server, on a periodic basis. You can, for example, tell Task Scheduler to run NodeXL Network Server with a particular network configuration file every twelve hours from June 1, 2010 through June 30, 2010, or once a week starting now and continuing forever. The scheduling options are endless.
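For readers who prefer scripts to the Task Scheduler window, Windows also includes a command-line front end, `schtasks`, that can create the same kind of task. This Python sketch only assembles the `schtasks /Create` arguments for the twelve-hourly June 2010 example; the helper name, the task name and the configuration file's folder are hypothetical, the program path assumes the default 32-bit install folder mentioned earlier, and the `/SD` and `/ED` dates assume a US (mm/dd/yyyy) date format, which `schtasks` actually takes from the computer's regional settings.

```python
from datetime import date, time
from typing import List


def schtasks_create_args(task_name: str, exe_path: str, config_path: str,
                         every_hours: int, start: date, end: date, at: time) -> List[str]:
    """Hypothetical helper: arguments for 'schtasks /Create' with an hourly schedule."""
    if not 1 <= every_hours <= 23:
        raise ValueError("an HOURLY schedule takes a modifier of 1 to 23 hours")
    # /TR holds the full command line; quote both paths in case they contain spaces.
    task_run = f'"{exe_path}" "{config_path}"'
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        "/TR", task_run,
        "/SC", "HOURLY", "/MO", str(every_hours),
        "/ST", at.strftime("%H:%M"),
        # Assumption: US date format; schtasks follows the computer's locale.
        "/SD", start.strftime("%m/%d/%Y"),
        "/ED", end.strftime("%m/%d/%Y"),
    ]


args = schtasks_create_args(
    task_name="NodeXL Sociology search",  # hypothetical task name
    exe_path=r"C:\Program Files\Microsoft Research\Microsoft NodeXL Excel Template\NodeXLNetworkServer.exe",
    config_path=r"C:\NodeXLNetworks\NetworkConfiguration.xml",  # hypothetical location
    every_hours=12,
    start=date(2010, 6, 1),
    end=date(2010, 6, 30),
    at=time(2, 0),
)
# On the Windows machine itself: subprocess.run(args, check=True)
```

Starting at 02:00 with a 12-hour repeat produces runs at 02:00 and 14:00 each day, which matches the time stamps in the example file names.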

  • Why not just include scheduling features in the NodeXL Network Server?

For two reasons.  First, Task Scheduler’s extensive scheduling options would be difficult to duplicate.  Second, if NodeXL Network Server had to download a network on a periodic basis, it would have to run as a Windows service, and Windows services are more complex to implement and to use than a simple command-line program.

  • How are the network files named?

Scheduling the NodeXL Network Server to run periodically can create any number of network files in the specified directory, so a file-naming scheme is needed.  The file name format is

{NetworkConfigFileName}_{Date}_{Time}.{Extension}

So the above example, in which NetworkConfiguration.xml specifies that networks are to be saved as GraphML, might create a set of network files that look like this:

NetworkConfiguration_2010-06-01_02-00-00.graphml
NetworkConfiguration_2010-06-01_14-00-00.graphml
NetworkConfiguration_2010-06-02_02-00-00.graphml
…
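Because the names follow a fixed pattern, scripts can generate or parse them, for example to line up a series of time slices in order. A minimal sketch of both directions, assuming the date and time fields always use the yyyy-mm-dd and HH-MM-SS (24-hour) forms seen in the examples; the helper names are hypothetical and not part of NodeXL.

```python
import re
from datetime import datetime
from pathlib import PureWindowsPath
from typing import Optional, Tuple


def network_file_name(config_path: str, when: datetime, extension: str) -> str:
    """Hypothetical helper: build {NetworkConfigFileName}_{Date}_{Time}.{Extension}."""
    stem = PureWindowsPath(config_path).stem  # drops the folder and ".xml"
    return f"{stem}_{when:%Y-%m-%d}_{when:%H-%M-%S}.{extension}"


_NAME = re.compile(
    r"^(?P<config>.+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.(?P<ext>\w+)$"
)


def parse_network_file_name(name: str) -> Optional[Tuple[str, datetime, str]]:
    """Hypothetical helper: split a name into (config name, timestamp, extension), or None."""
    match = _NAME.match(name)
    if match is None:
        return None
    when = datetime.strptime(f"{match['date']} {match['time']}", "%Y-%m-%d %H-%M-%S")
    return match["config"], when, match["ext"]
```

Sorting a folder's network files by the parsed timestamp then gives the slices in collection order, regardless of how the operating system lists them.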
  • What happens if the computer is not turned on at the scheduled time?

By default, the task won’t be performed until the next scheduled time when the computer is turned on.  However, if the computer is sleeping, you can tell Task Scheduler to wake it at the scheduled time to run the task.

  • What happens if the NodeXL Network Server encounters an error?

If the error prevents the network from being downloaded, the NodeXL Network Server creates an error file instead of a network file.  The file name starts with “Error” to make it easy to spot:

Error_NetworkConfiguration_2010-06-02_14-00-00.txt

The error file contains the details of what went wrong.

If one or more errors block part of the network but other parts of the network are successfully downloaded, then the NodeXL Network Server creates the network file containing the partial network, along with a text file that explains how many errors occurred.  The text file name starts with “PartialNetworkInfo” to make it easy to spot:

NetworkConfiguration_2010-06-02_14-00-00.graphml
PartialNetworkInfo_NetworkConfiguration_Date.txt
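Since error and partial-network files are marked by fixed prefixes, a folder of scheduled output can be checked mechanically. A minimal sketch that sorts file names into groups by those prefixes; the function is hypothetical, and treating `.graphml` and `.xlsx` files as network files is an assumption (the workbook extension is not stated in this FAQ).

```python
from typing import Dict, Iterable, List, Tuple


def classify_outputs(
    file_names: Iterable[str],
    network_extensions: Tuple[str, ...] = (".graphml", ".xlsx"),  # assumption
) -> Dict[str, List[str]]:
    """Hypothetical helper: group NodeXL Network Server output files by kind."""
    groups: Dict[str, List[str]] = {"network": [], "error": [], "partial_info": [], "other": []}
    for name in file_names:
        if name.startswith("Error_"):
            groups["error"].append(name)
        elif name.startswith("PartialNetworkInfo_"):
            groups["partial_info"].append(name)
        elif name.lower().endswith(network_extensions):
            # Case-insensitive, so ".Graphml" and ".graphml" both match.
            groups["network"].append(name)
        else:
            groups["other"].append(name)
    return groups
```

On the computer running the Server, the input could simply be `os.listdir` of the output folder; a non-empty `error` or `partial_info` group is the signal to open the text files and read the details.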
  • What if I want to periodically download more than one network?

Simply schedule more than one task, each using a different network configuration file.  The tasks are independent of one another and can be scheduled to run at different times.