ThreadMill 0.1: Social Accounting for Message Thread Collections

The Social Media Research Foundation is pleased to announce the immediate availability of ThreadMill 0.1.  ThreadMill is a free and open application that consumes message thread data and produces reports about each author, thread, forum, and board along with visualizations of the patterns of connection and activity.  ThreadMill is written in Ruby, and depends on MongoDB, SinatraRB, HAML, and Flash to collect, analyze, and report data about collections of conversation threads.

Threaded conversations are a major form of social media.  Message boards, email and email lists, Twitter, blog comments, text messages, and discussion forums are all social media systems built around the message thread data structure.  As messages are exchanged through these systems, some are sent as replies to a particular earlier message, and chains of messages form.  Message chains come in two major forms: branching and non-branching.  Branching threads allow more than one message to reply to the same prior message.  Non-branching threads are single chains, like a string of pearls, in which each message can receive only one reply.  Many web-based message boards are non-branching; many email systems and discussion forums are branching.
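To make the branching/non-branching distinction concrete, here is a minimal Ruby sketch (not ThreadMill's own code). It assumes each message carries the identifier of the message it replies to, with nil for the message that starts the thread; the `Message` struct and the `branching?` helper are hypothetical names chosen for illustration.

```ruby
# Hypothetical minimal record: a message id plus the id of the message it
# replies to (nil for the message that starts the thread).
Message = Struct.new(:id, :reply_to)

# A thread branches if any single message has received more than one direct reply.
def branching?(messages)
  reply_counts = Hash.new(0)
  messages.each do |m|
    reply_counts[m.reply_to] += 1 unless m.reply_to.nil?
  end
  reply_counts.values.any? { |count| count > 1 }
end

pearls = [Message.new(1, nil), Message.new(2, 1), Message.new(3, 2)]
tree   = [Message.new(1, nil), Message.new(2, 1), Message.new(3, 1)]

puts branching?(pearls) # prints "false"
puts branching?(tree)   # prints "true"
```

The string-of-pearls thread never gives one message two replies, so it counts as non-branching; the second thread does, so it branches.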

ThreadMill requires a minimal set of data elements to generate its reports.  At minimum, the data table must contain one row per message, with columns for the name of the message board, the forum, the thread, and the author, along with a unique identifier for the message and the date and time it was posted.  Optional data elements include the unique identifier of the message being replied to, the URL of the message, and the URL of a profile photo.
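The input requirements above can be sketched as a row check in Ruby. This is an illustration under assumptions, not ThreadMill's loader: the column names (`message_id`, `posted_at`, `reply_to_id`, and so on) and the `normalize_row` helper are hypothetical, and ThreadMill's real input format may differ.

```ruby
require "time"

# Hypothetical column names; ThreadMill's actual input format may differ.
REQUIRED = %i[board forum thread author message_id posted_at].freeze
OPTIONAL = %i[reply_to_id message_url photo_url].freeze

# Checks one row (a Hash keyed by column name) for the required fields,
# drops unrecognized columns, and parses the posting time into a Time.
def normalize_row(row)
  missing = REQUIRED.select { |key| row[key].to_s.strip.empty? }
  raise ArgumentError, "missing required fields: #{missing.join(', ')}" unless missing.empty?

  record = row.slice(*REQUIRED, *OPTIONAL)
  record[:posted_at] = Time.parse(record[:posted_at].to_s)
  record
end

row = { board: "B1", forum: "F1", thread: "T1", author: "alice",
        message_id: "m2", posted_at: "2010-06-01 10:15", reply_to_id: "m1",
        note: "ignored" }
record = normalize_row(row)
puts record[:posted_at].hour # prints "10"
```

Rows lacking any of the six required fields are rejected outright, while the optional reply, URL, and photo fields pass through only when present.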

All forms of threaded message exchange can be measured.  Simple measures, such as the number of messages or the number of authors, are obvious and useful.  More sophisticated analysis yields other measures.  For example, the network of connections that forms as authors reply to one another can be extracted and analyzed using network analysis methods.  Metrics calculated from these reply networks describe the location of each person in the graph.
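A minimal Ruby sketch of both kinds of measure follows: simple counts, and extraction of a reply network in which an edge from author a to author b counts how often a replied to b. The `Message` struct and `reply_network` helper are hypothetical, and skipping self-replies is a design choice of this sketch rather than documented ThreadMill behavior.

```ruby
# Hypothetical message record; the field names are illustrative, not ThreadMill's.
Message = Struct.new(:id, :author, :reply_to)

# Builds a weighted, directed reply network: edges[[a, b]] is the number of
# times author a replied to a message written by author b. Thread starters,
# replies to unknown messages, and self-replies add no edge.
def reply_network(messages)
  author_of = messages.to_h { |m| [m.id, m.author] }
  edges = Hash.new(0)
  messages.each do |m|
    target = author_of[m.reply_to]
    next if target.nil? || target == m.author
    edges[[m.author, target]] += 1
  end
  edges
end

msgs = [
  Message.new("m1", "alice", nil),
  Message.new("m2", "bob",   "m1"),
  Message.new("m3", "carol", "m1"),
  Message.new("m4", "alice", "m2"),
  Message.new("m5", "bob",   "m1")
]

puts msgs.size                          # prints "5" (messages)
puts msgs.map(&:author).uniq.size       # prints "3" (authors)
puts reply_network(msgs)[%w[bob alice]] # prints "2"
```

The resulting edge list is the input that network metrics, such as the centrality measures used in network diagrams, would be computed from.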

ThreadMill generates several data sets that can be used to create visualizations of the activity and structure of a message board collection.

A Treemap data set can illustrate the nested hierarchy of a collection: authors within threads, threads within fora, fora within boards, and boards within collections.  Treemap visualizations of collections of threaded conversations can quickly highlight the most active or populous discussions.
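The treemap hierarchy can be sketched in Ruby as nested hashes of message counts, where the size of any node is the total number of messages beneath it. This is an illustration only; the `Post` struct, `treemap_tree`, and `node_size` are hypothetical names, and sizing nodes by message count is an assumption (a treemap could also be sized by authors).

```ruby
# Hypothetical record carrying only the hierarchy fields from the data table.
Post = Struct.new(:board, :forum, :thread, :author)

# Nests message counts as board => forum => thread => author => count,
# mirroring the rectangles-within-rectangles layout of a treemap.
# Note: the default blocks mean that looking up a missing key creates an empty node.
def treemap_tree(posts)
  new_thread_level = -> { Hash.new { |h, thread| h[thread] = Hash.new(0) } }
  new_forum_level  = -> { Hash.new { |h, forum| h[forum] = new_thread_level.call } }
  tree = Hash.new { |h, board| h[board] = new_forum_level.call }
  posts.each { |post| tree[post.board][post.forum][post.thread][post.author] += 1 }
  tree
end

# The area a treemap gives a node is the total message count beneath it.
def node_size(node)
  node.is_a?(Hash) ? node.values.sum { |child| node_size(child) } : node
end

posts = [
  Post.new("B1", "F1", "T1", "alice"),
  Post.new("B1", "F1", "T1", "bob"),
  Post.new("B1", "F1", "T2", "alice"),
  Post.new("B1", "F2", "T3", "carol")
]
tree = treemap_tree(posts)
puts node_size(tree)             # prints "4" (the whole collection)
puts node_size(tree["B1"]["F1"]) # prints "3"
```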

An AuthorLine visualization takes the form of a double histogram.  For each time period, every thread the author was active in is drawn as a bubble sized by the number of messages the author contributed to it, and the bubbles are sorted by size.  Threads that the author initiated are represented as bubbles above the center line.  Messages that the author contributed to threads started by other authors are represented as bubbles stacked below the center line.  AuthorLines quickly reveal an author's pattern of activity and identify which of several types of contributor the author is.
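The data behind an AuthorLine can be sketched in Ruby as follows. Several choices here are assumptions rather than documented ThreadMill behavior: the time period is one calendar day, a thread's initiator is taken to be the author of its earliest message, and the `Msg` struct and `author_line` helper are hypothetical.

```ruby
require "date"

# Hypothetical record; using one calendar day as the time period is an assumption.
Msg = Struct.new(:author, :thread, :posted_at)

# For one author, returns { day => { above: [[thread, count], ...], below: [...] } }.
# "above" holds threads the author started (i.e., wrote the earliest message in);
# "below" holds other authors' threads the author posted in. Each list is sorted
# by the author's message count, largest first.
def author_line(messages, author)
  starter = messages.group_by(&:thread)
                    .transform_values { |ms| ms.min_by(&:posted_at).author }
  by_day = messages.select { |m| m.author == author }
                   .group_by { |m| m.posted_at.to_date }
                   .sort_by { |day, _| day }
  by_day.to_h do |day, day_msgs|
    counts = day_msgs.group_by(&:thread).transform_values(&:size)
    above, below = counts.partition { |thread, _| starter[thread] == author }
    [day, { above: above.sort_by { |_, n| -n }, below: below.sort_by { |_, n| -n } }]
  end
end

at = ->(day, hour) { Time.utc(2010, 6, day, hour) }
msgs = [
  Msg.new("alice", "T1", at.(1, 9)),
  Msg.new("bob",   "T1", at.(1, 10)),
  Msg.new("alice", "T1", at.(1, 11)),
  Msg.new("bob",   "T2", at.(1, 12)),
  Msg.new("alice", "T2", at.(1, 13)),
  Msg.new("alice", "T2", at.(2, 8))
]
line = author_line(msgs, "alice")
p line[Date.new(2010, 6, 1)][:above] # prints [["T1", 2]]
```

Each day's `above` and `below` lists map directly onto the two stacks of bubbles on either side of the center line.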

A scatter plot visualization represents each author as a bubble in an X-Y space defined by the number of different days the author was active against the average number of messages the author contributes to the threads in which they participate.
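The two coordinates of this scatter plot can be computed per author with a short Ruby sketch. The `Msg` struct and `author_scatter` helper are hypothetical, and counting active days by calendar date in the timestamps' own time zone is an assumption.

```ruby
require "date"

# Hypothetical record; field names are illustrative.
Msg = Struct.new(:author, :thread, :posted_at)

# Returns { author => [x, y] } where x is the number of distinct days the author
# posted and y is their average number of messages per thread they took part in.
def author_scatter(messages)
  messages.group_by(&:author).transform_values do |ms|
    active_days = ms.map { |m| m.posted_at.to_date }.uniq.size
    threads     = ms.map(&:thread).uniq.size
    [active_days, ms.size.fdiv(threads)]
  end
end

at = ->(day) { Time.utc(2010, 6, day, 12) }
msgs = [
  Msg.new("alice", "T1", at.(1)),
  Msg.new("alice", "T1", at.(2)),
  Msg.new("alice", "T2", at.(2)),
  Msg.new("bob",   "T1", at.(1)),
  Msg.new("bob",   "T1", at.(1))
]
points = author_scatter(msgs)
p points["alice"] # prints [2, 1.5]
p points["bob"]   # prints [1, 2.0]
```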

A time series line chart reveals the days of maximum and minimum activity along with trends.
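A daily time series, its busiest and quietest days, and a simple trend line can be sketched in Ruby as below. Taking bare timestamps as input, filling silent days with zero, and using a trailing moving average for the trend are all assumptions of this sketch; `daily_series` and `moving_average` are hypothetical names.

```ruby
require "date"

# Counts messages per calendar day between the first and last active days,
# filling silent days with zero, and picks out the busiest and quietest days.
def daily_series(timestamps)
  counts = timestamps.map(&:to_date).tally
  return { series: {}, busiest: nil, quietest: nil } if counts.empty?

  first_day, last_day = counts.keys.minmax
  series = (first_day..last_day).to_h { |day| [day, counts.fetch(day, 0)] }
  { series: series,
    busiest: series.max_by { |_, n| n },
    quietest: series.min_by { |_, n| n } }
end

# Trailing moving average over daily counts, one simple way to expose a trend.
def moving_average(values, window = 7)
  values.each_cons(window).map { |slice| slice.sum.fdiv(window) }
end

stamps = [Time.utc(2010, 6, 1, 9), Time.utc(2010, 6, 1, 15), Time.utc(2010, 6, 3, 12)]
result = daily_series(stamps)
p result[:series].values                      # prints [2, 0, 1]
puts result[:quietest][0]                     # prints "2010-06-02"
p moving_average(result[:series].values, 2)   # prints [1.0, 0.5]
```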

A network diagram reveals the overall structure of the discussion space and the people who occupy strategic locations within the network graph.
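The announcement does not say which network metrics ThreadMill computes. One common way to find people in strategic, brokering positions is betweenness centrality, sketched here in Ruby with Brandes' algorithm. Treating the reply network as unweighted and undirected is a simplifying assumption, and `undirected_graph` and `betweenness` are hypothetical helpers, not ThreadMill functions.

```ruby
# Builds an undirected adjacency list from [author, author] reply pairs.
def undirected_graph(pairs)
  graph = Hash.new { |h, node| h[node] = [] }
  pairs.each do |a, b|
    next if a == b
    graph[a] << b unless graph[a].include?(b)
    graph[b] << a unless graph[b].include?(a)
  end
  graph
end

# Brandes' algorithm for betweenness centrality on an unweighted, undirected graph:
# for every source, a breadth-first search counts shortest paths, then dependencies
# are accumulated in reverse order of discovery.
def betweenness(graph)
  scores = graph.keys.to_h { |node| [node, 0.0] }
  graph.keys.each do |source|
    order = []
    preds = Hash.new { |h, node| h[node] = [] }
    paths = Hash.new(0)
    paths[source] = 1
    dist = { source => 0 }
    queue = [source]
    until queue.empty?
      v = queue.shift
      order << v
      graph[v].each do |w|
        unless dist.key?(w)
          dist[w] = dist[v] + 1
          queue << w
        end
        next unless dist[w] == dist[v] + 1
        paths[w] += paths[v]
        preds[w] << v
      end
    end
    dependency = Hash.new(0.0)
    order.reverse_each do |w|
      preds[w].each { |v| dependency[v] += paths[v].fdiv(paths[w]) * (1 + dependency[w]) }
      scores[w] += dependency[w] unless w == source
    end
  end
  # Each undirected shortest path was counted once from each endpoint.
  scores.transform_values { |s| s / 2.0 }
end

graph = undirected_graph([%w[alice bob], %w[carol bob], %w[dave bob]])
p betweenness(graph)["bob"] # prints 3.0
```

In this small example every shortest path between the other three authors passes through bob, so he is the only author with nonzero betweenness: the kind of bridging position a network diagram makes visible.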

ThreadMill has received generous assistance from Morningside Analytics.  Bruce Woodson implemented ThreadMill.

About Marc Smith

Chief Social Scientist
Marc@connectedaction.net

Connected Action Group

Marc Smith is a sociologist specializing in the social organization of online communities and computer mediated interaction. He founded and managed the Community Technologies Group at Microsoft Research in Redmond, Washington and led the development of social media reporting and analysis tools for Telligent Systems. Smith leads the Connected Action consulting group and lives and works in Silicon Valley, California. Smith co-founded the Social Media Research Foundation (http://www.smrfoundation.org/), a non-profit devoted to open tools, data, and scholarship related to social media research.

Smith is the co-editor, with Peter Kollock, of Communities in Cyberspace (Routledge), a collection of essays exploring the ways identity, interaction, and social order develop in online groups. Along with Derek Hansen and Ben Shneiderman, he is the co-author and editor of Analyzing Social Media Networks with NodeXL: Insights from a Connected World, a guide to mapping connections created through computer-mediated interactions, forthcoming from Morgan Kaufmann in July 2010.

Smith's research focuses on computer-mediated collective action: the ways group dynamics change when they take place in and through social cyberspaces. Many "groups" in cyberspace produce public goods and organize themselves in the form of a commons (for related papers see: http://www.connectedaction.net/marc-smith/). Smith's goal is to visualize these social cyberspaces, mapping and measuring their structure, dynamics, and life cycles. At Microsoft, he developed the "Netscan" web application and data mining engine, which allows researchers studying Usenet newsgroups and related repositories of threaded conversations to get reports on the rates of posting, posters, crossposting, thread length, and frequency distributions of activity. Smith applied this work to the development of a generalized community analysis platform for Telligent, providing a web-based system for groups of all sizes to discuss and publish their material to the web and analyze the emergent trends that result. He contributes to the open and free NodeXL project (http://www.codeplex.com/nodexl), which adds social network analysis features to the familiar Excel spreadsheet. A tutorial on social network analysis is evolving into a book and is freely available (http://casci.umd.edu/NodeXL_Teaching). NodeXL enables social network analysis of email, Twitter, Flickr, the World Wide Web, Facebook, and other network data sets.

The Connected Action consulting group (http://www.connectedaction.net) applies social science methods in general, and social network analysis techniques in particular, to enterprise and internet social media usage. Social network analysis of data from message boards, blogs, wikis, friend networks, and shared file systems can reveal insights into organizations and processes. Community managers can gain actionable insights into the volumes of community content created in their social media repositories. Mobile social software applications can visualize patterns of association that are otherwise invisible.

Smith received a B.S. in International Area Studies from Drexel University in Philadelphia in 1988, an M.Phil. in social theory from Cambridge University in 1990, and a Ph.D. in Sociology from UCLA in 2001. He is an affiliate faculty member in the Department of Sociology at the University of Washington and the College of Information Studies at the University of Maryland. Smith is also a Distinguished Visiting Scholar at the Media-X Program at Stanford University.