ICWSM 2010 Liveblog, Day 3

Fourth International AAAI Conference on Weblogs and Social Media (ICWSM-10)
Michael Kearns Keynote
Experiments: Graph Coloring / Consensus / Voting
Topology of the network vs. what the network is used for?
Voting experiments – similar to consensus, with a crucial strategic difference.
Introduce a tension between:
-Individual preferences
-Collective unity
-Color choices; the challenge comes from competing incentives
Two colors (red, blue). People are unaware of the global network structure
Payoffs: if everyone picks the same color within 2 minutes, the experiment ends and everyone gets some payoff. But different players have different incentives (e.g., I may get paid p if everyone converges to blue, but 2p if everyone converges to red). If there is no consensus, nobody gets a payoff
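A minimal sketch of this payoff rule in Python, with hypothetical players and incentive values (the amounts and names are illustrative assumptions, not the experiment's actual parameters):

```python
# Sketch of the biased-voting payoff rule: everyone is paid only on consensus,
# and each player's payoff depends on which color the group converged to.

def payoffs(choices, incentives):
    """choices: player -> 'red' or 'blue'
       incentives: player -> {color: payoff if the group converges to that color}
       Returns player -> payoff; everyone gets 0 if there is no consensus."""
    colors = set(choices.values())
    if len(colors) != 1:                      # no consensus within the time limit
        return {p: 0.0 for p in choices}
    winner = colors.pop()
    return {p: incentives[p][winner] for p in choices}

# Hypothetical example: A prefers red (paid 2p), B prefers blue (paid 2p).
p = 1.0
incentives = {"A": {"red": 2 * p, "blue": p},
              "B": {"red": p, "blue": 2 * p}}
print(payoffs({"A": "red", "B": "red"}, incentives))   # {'A': 2.0, 'B': 1.0}
print(payoffs({"A": "red", "B": "blue"}, incentives))  # {'A': 0.0, 'B': 0.0}
```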
Systems point: confuse player perceptions of system so that players don’t lock into a consensus early on
Results: varied homophily, random vs. preferential-attachment (PA) ties. 27 total experiments
Result 1: ~70% of experiments solved
Result 2: a minority of hubs in a PA network will dictate the preferences of the network (24/27 experiments converged; 100% of those that converged picked the minority's preference)
In general, it seems to help (in terms of decreasing convergence time) to have one part of the population “care more” about their preference
Effects of “Personality” – people will be stubborn and hold out for their color even when it’s clearly in the minority.
Lessons Learned, 2005-2009
1. People are remarkably good across a large set of collective tasks and network topologies (over all experiments, efficiency close to 90%)
2. Network structure matters, often on a task-specific basis
3. Problem – the network is exogenously imposed on subjects
-are “hard” network structures just unlikely to arise in the real world?
–Network formation games
New experiments: biased voting game + network formation
-everybody starts off as a single vertex and can’t see anyone else’s color
-at any point, players can spend money to purchase edges (money deducted from final winnings in game)
-you are shown all your neighbors, plus all other nodes in a grid. For nodes that are not your neighbors you are shown their degree and current distance away from you
Strategic tensions:
1. Should you buy edges or not? Ideally, you want neighbors to buy edges for you, but a minimum spanning tree (MST) is needed to coordinate on the task
2. Buy edges for information or for influence?
3. Buy early or late?
4. Buy from high degree or low degree people?
Experimental design: 63 experiments with no network to begin with. Additionally, ran 36 experiments where a network structure existed at the beginning of the experiment but edges could still be bought
Early results: subjects do quite poorly at network formation games relative to any previous experiments! (47% in the first set of tasks, 38% in the second)
–preliminary evidence shows that people are building networks that make it difficult to solve the biased voting problem
***Sentiment and Language Analysis***
ICWSM – A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews (Tsur et al.)
NLP
Sarcasm Detection
Motivations:
–Model the use of sarcasm – how/why (cognitive)
–Improve review summarization systems
–personalize review ranking systems
Challenge: Many different definitions beyond the basic one
–Context
–World knowledge
How do people cope?
-Temherte slaq (some Ethiopic languages): inverted exclamation mark
-Reverse question mark
-#sarcasm
Data:
-Amazon product reviews (~66K)
–Books, Electronics
-Additional study based on ~6 million tweets
Star Sentiment Baseline (Amazon)
-"Saying or writing the opposite of what you mean"
–Find unhappy reviewers, look for overwhelmingly positive sentiment (sketched below)
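A tiny sketch of this baseline heuristic, assuming a hand-picked positive-word list and a 1-2 star cutoff (both are illustrative assumptions, not the authors' implementation):

```python
# Star-sentiment baseline sketch: flag low-star ("unhappy") reviews whose text
# nevertheless reads as overwhelmingly positive.
POSITIVE = {"great", "amazing", "wonderful", "best", "love", "perfect", "excellent"}

def baseline_sarcasm_candidate(stars, text, min_positive=3):
    """True for a 1-2 star review containing several strongly positive words."""
    if stars > 2:
        return False
    tokens = text.lower().split()
    return sum(t in POSITIVE for t in tokens) >= min_positive
```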
SASI: Semi-supervised Algorithm for Sarcasm Identification
-Label sarcasm-tagged sentences with tags 1-5 (for different levels of sarcasm)
-Extract features from all training sentences
-represent training sentences in feature space, do KNN
Preprocessing: replace specific names with tags such as [author], [title], [company]
Pattern-based features:
–High frequency words
–Content words (CW)
–Patterns, e.g. {[Frequent] [CW]}*
Weights of pattern-based features:
-1: exact match
-alpha: extra elements found between pattern components
-gamma: incomplete match (only part of the pattern appears)
Punctuation based features: Number of !, CAPITALIZED words/letters
Classification: weighted-kNN
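A simplified sketch of the pattern-feature scoring and weighted-kNN step described above; the matching rules and the alpha/gamma values are assumptions based on the talk's description, not the authors' code:

```python
# Pattern features + weighted kNN, simplified.
import numpy as np

ALPHA, GAMMA = 0.5, 0.1   # partial-match weights (assumed values)

def pattern_score(pattern, tokens):
    """1.0 for an exact in-order match of the pattern words,
       ALPHA if extra tokens appear between pattern components,
       GAMMA if only some components appear, 0.0 otherwise."""
    positions, i = [], 0
    for word in pattern:
        try:
            i = tokens.index(word, i)
            positions.append(i)
            i += 1
        except ValueError:
            positions.append(None)
    hits = [p for p in positions if p is not None]
    if len(hits) == len(pattern):
        span = hits[-1] - hits[0] + 1
        return 1.0 if span == len(pattern) else ALPHA
    return GAMMA if hits else 0.0

def featurize(sentence, patterns):
    """Map a sentence to its vector of pattern-match scores."""
    tokens = sentence.lower().split()
    return np.array([pattern_score(p, tokens) for p in patterns])

def weighted_knn(x, X_train, y_train, k=5):
    """Predict a 1-5 sarcasm score as a distance-weighted average
       of the k nearest training sentences."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-6)
    return float(np.dot(w, y_train[idx]) / w.sum())
```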
Experiment 1: 5-fold cross-validation on the training set: F-score up to .827
Experiment 2: gold-standard evaluation
–Human annotation of the classifier's output on new sentences: F-score up to .788
–F-score improves if you run the algorithm on tweets (to .827)
Widespread Worry and the Stock Market (Gilbert and Karahalios)
Lab experiments in psych & behavioral econ
–Emotions affect our choices at decision time
–Fear affects our choices, makes us risk-averse
If we estimate worry and fear, can that tell us anything about the stock market?
–Stock market is probably not efficient (e.g. more likely to go up on a sunny day than down)
–Online media have predictive information
Data: 2008 LiveJournal posts: Feb-Jun, Aug-Sep, Nov-Dec
Why LJ? Place where people talk about their daily lives
Training data: the anxiety index
620K mood-annotated LJ posts. Picked those tagged "anxious", "worried", "nervous", or "fearful" = ~13K
C1 = Boosted decision tree with top 100 stems
C2 = Complement Naive Bayes
Both classifiers have low true positive rates
Re-mapped to low-frequency (daily) data: max of both classifiers used to label trading day t
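A rough sketch of the two-classifier daily anxiety index described above (the feature choices and the per-day aggregation are assumptions, not the paper's code):

```python
# Anxiety index sketch: two text classifiers over LJ posts, aggregated per day.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import ComplementNB
from sklearn.ensemble import GradientBoostingClassifier

def build_anxiety_index(train_texts, train_labels, daily_posts):
    """train_labels: 1 for posts tagged anxious/worried/nervous/fearful, else 0.
       daily_posts: list (one entry per trading day) of lists of post texts.
       Returns one score per day: the max of the two classifiers' estimated
       fraction of anxious posts that day."""
    vec = CountVectorizer(max_features=100, stop_words="english")  # ~top 100 terms
    X = vec.fit_transform(train_texts)
    c1 = GradientBoostingClassifier().fit(X.toarray(), train_labels)  # boosted trees
    c2 = ComplementNB().fit(X, train_labels)                          # Complement NB
    scores = []
    for posts in daily_posts:
        Xd = vec.transform(posts)
        frac1 = c1.predict(Xd.toarray()).mean()
        frac2 = c2.predict(Xd).mean()
        scores.append(max(frac1, frac2))       # "max of both classifiers"
    return scores
```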
Market data: SP_t = S&P 500 closing price on trading day t
Controlled for volume and volatility of stock market
Method: Granger Causality (Autoregressive Approach, F test)
Result: adding the anxiety index explains significantly more variance than the baseline autoregressive model
Claim: estimating worry and fear seems to carry some information about market direction
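A minimal sketch of the Granger-style check described above, assuming `anxiety` and `sp500_returns` are already aligned per-trading-day pandas Series (the variable names and lag choice are placeholders, not the paper's):

```python
# Does lagged anxiety add explanatory power for returns beyond an AR baseline?
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

def anxiety_granger_test(sp500_returns: pd.Series, anxiety: pd.Series, maxlag: int = 3):
    """F-tests of whether the anxiety index Granger-causes market returns."""
    data = pd.concat([sp500_returns, anxiety], axis=1).dropna()
    # grangercausalitytests checks whether the second column Granger-causes the first
    return grangercausalitytests(data, maxlag=maxlag)
```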
Star Quality: Aggregating Reviews to Rank Products and Merchants (McGlohon, Glance, Reiter)
Google product search
The problem: given reviews aggregated from different sources, how do we measure the "true quality" of a product? What is the gold standard?
Challenges:
-Different sources have different review scales
-Different sources have different rating distributions
-Reviews may be plagiarized or irrelevant (cf. Danescu-Niculescu-Mizil 2009)
Outline:
-Analyze ratings aggregated from many review sites
-Propose models to determine “true quality”
-Build evaluation framework
Data:
-Product reviews: 8M ratings (560K products, 3.8M authors, 230 sources)
Observation 1: People like passing out 5s; single-review authors disproportionately more so
Observation 2: Authors / Sources have biases
-Ratings for same product differ widely
-Authors are consistent across products (Like everything or hate everything)
-Sites vary (PriceGrabber averages 4.5 stars, another site 2.9 stars)
Observation 3: The rated object matters
Merchant reviews more “binary”
Netflix more “normal”
Observation 4: How much an object is rated matters (rich-get-richer)
Proposed Models (models 3-5 are sketched in code after the list):
1. Mean rating for an object (baseline)
2. Median rating for an object
3. Lower bound on normal confidence interval
4. Binomial confidence interval
5. Average percentile of order statistic (“most websites liked it better than other products”)
6. Filter anonymous reviews, then average
7. Filter prolific authors, then average
8. Rate authors by reliability
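A sketch of a few of the models above (3-5); the formulas are standard textbook choices and may not match the paper's exact variants:

```python
# Aggregation models 3-5, sketched.
import math
from statistics import mean, stdev

def normal_lower_bound(ratings, z=1.96):
    """Model 3: lower bound of a normal confidence interval on the mean rating."""
    n = len(ratings)
    if n < 2:
        return mean(ratings)
    return mean(ratings) - z * stdev(ratings) / math.sqrt(n)

def wilson_lower_bound(ratings, z=1.96, threshold=4):
    """Model 4: binomial (Wilson) lower bound, treating >= threshold stars as positive."""
    n = len(ratings)
    p = sum(r >= threshold for r in ratings) / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / denom

def average_percentile(ratings_by_source, product):
    """Model 5: average, over sources, of the product's percentile rank within
       each source's own ratings ("most sites liked it better than other products")."""
    pcts = []
    for source_ratings in ratings_by_source:   # each: {product: mean rating at that source}
        if product not in source_ratings:
            continue
        vals = sorted(source_ratings.values())
        rank = sum(v <= source_ratings[product] for v in vals)
        pcts.append(rank / len(vals))
    return mean(pcts) if pcts else None
```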
Evaluation Method
-No “ground truth” for quality
-Goal: to see how reliably our ranking of “true quality” agrees with user preferences
-Hold out a pair of ratings from the same author, test on the hold-outs
-For every “prolific” author hold out two pairs of reviews at random for test data
-Then in training data, calculate estimated quality, rank objects accordingly
-Then compare the given ranking with the ranking within each pair in the test data (see the sketch after this list)
-Results: No method significantly outperforms average rating!
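A sketch of the pairwise hold-out comparison described in this list; `quality` would be the estimated scores from any of the proposed models, and the data layout is an assumption:

```python
# Pairwise evaluation sketch: does the model's ranking of two products agree
# with the preference of an author who rated both?
def pairwise_agreement(quality, held_out_pairs):
    """quality: product -> estimated quality from training data.
       held_out_pairs: list of ((product_a, rating_a), (product_b, rating_b))
       pairs held out from the same author.
       Returns the fraction of pairs where the model ranking matches the author."""
    agree = total = 0
    for (a, ra), (b, rb) in held_out_pairs:
        if ra == rb or a not in quality or b not in quality:
            continue                          # skip ties and unseen products
        total += 1
        agree += ((quality[a] > quality[b]) == (ra > rb))
    return agree / total if total else None
```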