
A Journey to Understand Social Media Sentiment

February 14, 2011
Chrysler stood atop the final standings for Brand Bowl 2011.

On Super Bowl Sunday, 111 million viewers were watching the big game—the largest TV audience ever, according to Nielsen. Many tuned in to witness the Packers battle the Steelers; even more, I imagine, were watching to see emerging brand Groupon face off against fan-favorite Go Daddy and advertising stalwarts Pepsi, Doritos and Volkswagen.

Millions were simultaneously browsing the Web, monitoring game stats and their Super Bowl pools, and checking out the brands advertised in the TV spots. A much smaller group of advertising and social media junkies was glued to “Brand Bowl 2011,” a venture between ad agency Mullen and social media monitoring firm Radian6 to track and rank the sentiment of Twitter references to Super Bowl advertisers.

Within that circle was a tiny group of folks who, like me, weren’t paying much attention to the rankings (Chrysler won, deservedly so), but rather were intensely reading through the Brand Bowl tweet stream and thinking things like, “I wonder if they got the sarcasm correct on that one?” and “How did they handle that horrific typo?”

Why am I so obsessed with sentiment analysis? We determined early on that, for Dialogue Earth to successfully increase public understanding of environmental topics, we needed to first understand the trends and nuances of the current social media dialogue. Over the past year, we’ve contemplated various approaches to interpret social media text. It’s a journey that is far from over, but we have already learned that, to have a shot at correctly interpreting social media, you need a broad approach—one that combines the benefits of subject matter experts, machine algorithms and crowd-sourced workers.

Our work began with a small group of researchers who analyzed the emotions present in comments on New York Times pieces about global warming. We started with this data set mainly because the Times had an excellent API from which we could pull the comment text.
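For readers who want to see what that kind of pull looks like, here is a minimal sketch of fetching the comments for one article over HTTP. The endpoint path, parameter names and response fields below are assumptions based on the public NYT Community API, not a record of our actual scripts, so treat the whole thing as illustrative.

```python
# Illustrative sketch only: fetch reader comments for a single article URL.
# The endpoint, parameters, and response fields are assumptions; check
# developer.nytimes.com for the real Community API contract.
import requests

API_KEY = "YOUR_NYT_API_KEY"  # hypothetical placeholder
ARTICLE_URL = "https://www.nytimes.com/path/to/a-global-warming-article.html"  # hypothetical

resp = requests.get(
    "https://api.nytimes.com/svc/community/v3/user-content/url.json",
    params={"api-key": API_KEY, "url": ARTICLE_URL, "offset": 0},
    timeout=10,
)
resp.raise_for_status()

# Print the body of each comment returned for this article.
for comment in resp.json().get("results", {}).get("comments", []):
    print(comment.get("commentBody", ""))
```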

We soon realized two things. First, each comment contained so much text that coding its many emotions took a substantial amount of time. Second, we weren’t as interested in understanding emotion as we were in opinions on specific topics, like global warming.

We switched from NY Times comments to Twitter, in hopes that the 140-character limit would allow for more rapid coding. We also approached the coding of each tweet with a specific question in mind: Could we infer whether the author believed that global warming was occurring? We developed a process for pulling specific tweets via the Twitter API, created a rule book to govern our judgments and, round after round of coding, focused on improving our team’s inter-coder reliability. As well as our team could code, however, we knew we’d never be able to keep up with the thousands of daily tweets on global warming and climate change.
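To make the inter-coder reliability work concrete, here is a small sketch of the kind of agreement statistic one can track: Cohen’s kappa, a chance-corrected measure of how often two coders assign the same label. The calculation is the standard textbook formula; the example labels are made up, not data from our coding rounds.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two coders."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)

    # Observed agreement: fraction of items the two coders labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each coder labeled independently at their own rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)

    return (observed - expected) / (1 - expected)

# Hypothetical labels from two coders for the same six tweets:
coder_1 = ["believes", "believes", "skeptical", "unclear", "believes", "skeptical"]
coder_2 = ["believes", "skeptical", "skeptical", "unclear", "believes", "believes"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```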

Sentiment results from a Twitter Sentiment search for "global warming" tweets

We simultaneously investigated various automated approaches. We looked at tools claiming to provide business intelligence through social media sentiment, such as Twendz, Social Mention, UberVU, Twitrratr, and many, many others. Some were useful, some were broken, some were free, and many were trying hard to sell us a subscription. In the end, the one tool with which we felt most comfortable was Twitter Sentiment, a project created by some Stanford computer science graduate students.

Twitter Sentiment was an academic endeavor; it used a smart algorithm to classify positive and negative sentiment, and it was completely transparent. We contacted them and began discussing ways to use and improve their classification algorithm. It was clear that their system would be able to code tweets at scale. But, despite some evidence that algorithms can detect sarcasm in written text, we were still convinced that computers were going to need some human help to correctly code for context and colloquialisms.
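To give a feel for what machine classification of tweets involves (this is a generic sketch, not Twitter Sentiment’s actual system), the toy example below trains a bag-of-words Naive Bayes model on a handful of invented tweets and uses it to label new ones. As I understand it, the Stanford team trained on far larger tweet collections, using emoticons as noisy labels.

```python
# Toy sketch of machine-classified tweet sentiment: bag-of-words features
# feeding a Naive Bayes model. The training tweets and labels are invented
# and far too few for real use; this only illustrates the shape of the idea.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_tweets = [
    "love the new volkswagen spot, so good",
    "that ad made my night",
    "worst commercial of the game",
    "so tired of these lazy ads",
]
train_labels = ["positive", "positive", "negative", "negative"]

vectorizer = CountVectorizer()
features = vectorizer.fit_transform(train_tweets)

model = MultinomialNB()
model.fit(features, train_labels)

new_tweets = ["loved that ad", "these commercials are the worst"]
print(model.predict(vectorizer.transform(new_tweets)))
```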

Seeking a balance of accuracy and scalability, we looked to “the crowd,” setting out to test whether a crowd-sourced workforce could correctly code a large number of tweets. We reworked our rule book and created a pilot “job” within CrowdFlower, a company that aggregates and pays crowd-sourced workers to execute small tasks. The results were encouraging: in a matter of hours, hundreds of workers combined to code a thousand tweets, and their sentiment judgments agreed with our research team’s coding as much as 93% of the time.
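For the curious, here is roughly how that kind of comparison works in code: collapse each tweet’s several worker judgments into a single label by majority vote, then score those labels against the research team’s. Everything in the sketch, including the tweet IDs, labels and resulting percentage, is hypothetical.

```python
from collections import Counter

# Hypothetical raw crowd judgments: each tweet id maps to the labels
# submitted by several independent CrowdFlower workers.
crowd_judgments = {
    "tweet_001": ["positive", "positive", "negative"],
    "tweet_002": ["negative", "negative", "negative"],
    "tweet_003": ["neutral", "positive", "neutral"],
}

# Hypothetical "gold" labels from the research team.
expert_labels = {
    "tweet_001": "positive",
    "tweet_002": "negative",
    "tweet_003": "neutral",
}

def majority_vote(labels):
    """Collapse several worker labels into the single most common one."""
    return Counter(labels).most_common(1)[0][0]

crowd_labels = {tid: majority_vote(votes) for tid, votes in crowd_judgments.items()}
matches = sum(crowd_labels[tid] == expert_labels[tid] for tid in expert_labels)
print(f"Agreement with experts: {matches / len(expert_labels):.0%}")
```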

Today, we’re tackling a much broader topic: weather mood, which will feature in a multimedia visualization of sentiment we’re calling Pulse. We’re again leveraging our research team, who serve as subject matter experts to create and refine a set of instructions for interpreting sentiment in tweets about weather conditions. We’re also working with CrowdFlower to develop jobs that best use their workforce. But, with hundreds of thousands of tweets about the weather each day, we’re going to need computer algorithms to help us curate the data before sending it off to the crowd. Sampling the data and filtering out duplicates and re-tweets are among the approaches we’re testing.
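As a small illustration of what the duplicate and re-tweet filtering might look like (a sketch under simple assumptions, not our production pipeline), the snippet below normalizes each tweet’s text, skips anything carrying an “RT @user” marker, and keeps only the first copy of each normalized string.

```python
import re

def normalize(text):
    """Lowercase and strip @mentions, links, and extra whitespace for duplicate checks."""
    text = text.lower()
    text = re.sub(r"@\w+", "", text)          # drop @mentions
    text = re.sub(r"https?://\S+", "", text)  # drop links
    return re.sub(r"\s+", " ", text).strip()

def filter_tweets(tweets):
    """Drop re-tweets and near-verbatim duplicates, keeping first occurrences."""
    seen = set()
    kept = []
    for tweet in tweets:
        if re.search(r"\brt\s+@", tweet, re.IGNORECASE):
            continue  # treat anything with an "RT @user" marker as a re-tweet
        key = normalize(tweet)
        if key in seen:
            continue  # near-verbatim duplicate of a tweet we already kept
        seen.add(key)
        kept.append(tweet)
    return kept

# Hypothetical weather tweets: the re-tweet and the duplicate get filtered out.
sample = [
    "Gorgeous sunshine in Minneapolis today!",
    "RT @someone: Gorgeous sunshine in Minneapolis today!",
    "gorgeous sunshine in minneapolis today!",
    "Another gray, rainy Monday",
]
print(filter_tweets(sample))
```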

As we look to the future, we’re excited for the potential to combine the best of subject matter experts, machine algorithms and crowd-sourced workers. I’m convinced that our team of collaborators is going to figure out how to code social media sentiment in an accurate, scalable and timely manner. But, clearly, since social media dialogue is ever evolving, and human interpretation is always a bit subjective, I have no doubt our journey towards superior sentiment understanding will go on and on.

