
Hope for Human Sentiment Analysis Coding

May 13, 2011

I just read an interesting blog post on Social Times discussing the advantages of machine-based sentiment analysis. In the piece, author Dr. Taras Zagibalov challenges the critics of “automatic” sentiment analysis, who claim that humans can determine the sentiment of social media text better than computers can. He asserts that, with the proper tuning of a system’s classifier—creating specific classifiers for each domain (subject matter) and keeping them current—a machine-based sentiment analysis system can outperform human accuracy.

The discussion of human vs. machine sentiment is core to our work at Dialogue Earth, where we are developing Pulse—a social media analytics tool to help tease out nuances in the social media dialogue about key societal topics. After evaluating machine-based sentiment products and talking to experts in the field, we determined that effective machine-based coding—for the short, often cryptic, Twitter tweets we sought to understand—remained a challenge.

Our research team began the lengthy process of creating a human-based sentiment coding process. More than 18 months later, we’ve learned a great deal, and believe we have developed a tool that can deliver accurate sentiment analysis of Twitter tweets on a timely basis.

Dr. Zagibalov offers valid points about the pitfalls of human-based sentiment analysis. He asserts that, like computers, humans struggle if they don’t have sufficient context about the text they’re evaluating. He also states that, unlike computers, humans also suffer from issues of time, boredom and concentration, and that they are “not ‘designed’ for doing monotonous work.”

Our process for human-coded sentiment addresses Zagibalov’s concerns. As I outlined in a recent post, Dialogue Earth leverages a crowdsourced workforce to analyze Twitter chatter on topics like the U.S. mood about weather.

We begin with a small team of subject matter experts, who research topics and develop two key elements: a keyword list that allows our system to collect only the most relevant tweets, and instructions to provide context for interpreting topic-specific tweets. We have partnered with CrowdFlower to create quality controls on top of distributed workforce channels, including Mechanical Turk.
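To make the keyword-collection step concrete, here is a rough Python sketch of that kind of relevance filter. The keywords shown are made-up examples, not our actual topic list:

```python
# Hypothetical keyword filter: collect a tweet only if it mentions at least
# one term from the topic's keyword list. The list below is illustrative.
KEYWORDS = {"weather", "snowstorm", "heat wave", "thunderstorm"}

def is_relevant(tweet_text, keywords=KEYWORDS):
    """Return True if the tweet mentions any topic keyword."""
    text = tweet_text.lower()
    return any(keyword in text for keyword in keywords)

print(is_relevant("This heat wave is unbearable today"))  # True
print(is_relevant("Just finished a great book"))          # False
```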

As we strive to continually improve accuracy and efficiency, we are focusing on two main quality control tactics, which are explained in detail in my colleague’s post.

First, we create and continually optimize the instructions we provide workers at the beginning of each CrowdFlower job. These instructions include tips for identifying tricky issues like sarcasm and context in the online chatter, as well as specific tweet examples for each potential area of confusion.

Second, we leverage CrowdFlower’s system of “gold” units, which are a small percentage of work units for which we know the answer. Each time a worker gets a gold unit wrong, we are able to explain to them the correct answer. If a worker gets too many of these gold units incorrect, we are able to remove them from a job.
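As a rough illustration of the gold-unit idea (this is not CrowdFlower’s actual API, just a sketch), the screening boils down to tracking each worker’s answers on known-answer units and dropping workers who miss too many:

```python
# Sketch of gold-unit screening: compare a worker's answers against the
# known-correct labels and flag the worker if they miss too many.
def evaluate_worker(gold_answers, worker_answers, max_misses=1):
    """gold_answers / worker_answers: dicts of unit_id -> sentiment label."""
    misses = 0
    for unit_id, correct in gold_answers.items():
        given = worker_answers.get(unit_id)
        if given is not None and given != correct:
            # In practice, this is where the worker would be shown an
            # explanation of the correct answer for the missed gold unit.
            misses += 1
    return misses <= max_misses  # False -> remove the worker from the job

gold = {"g1": "positive", "g2": "negative", "g3": "neutral"}
worker = {"g1": "positive", "g2": "positive", "g3": "positive"}
print(evaluate_worker(gold, worker))  # False: two gold units missed
```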

As we fine-tune our process, the accuracy continues to improve, and we’re getting the turnaround time down. The main barrier to real-time sentiment analysis remains cost, as it can be pricey to pay the crowd for sentiment analysis work. Currently, we require that each tweet be judged by five independent workers. We only add to our data set those tweets for which three or more coders make the same judgment (a confidence score of 60%). This quality control tactic is effective, but costly and time-consuming.
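In Python terms, that agreement rule works roughly like this. The function and sample data are illustrative, not our production code:

```python
# Majority-vote aggregation: five judgments per tweet, keep a tweet only if
# at least three coders agree (a 60% confidence score).
from collections import Counter

def aggregate_judgments(judgments, min_agreement=3, judges_per_tweet=5):
    """judgments: dict mapping tweet_id -> list of sentiment labels."""
    accepted = {}
    for tweet_id, labels in judgments.items():
        if len(labels) < judges_per_tweet:
            continue  # still waiting on workers
        label, count = Counter(labels).most_common(1)[0]
        if count >= min_agreement:
            accepted[tweet_id] = (label, count / judges_per_tweet)
    return accepted

sample = {
    "tweet_1": ["positive", "positive", "positive", "neutral", "negative"],
    "tweet_2": ["positive", "negative", "neutral", "neutral", "negative"],
}
print(aggregate_judgments(sample))  # {'tweet_1': ('positive', 0.6)}
```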

With that in mind, we believe that, for cost-effective sentiment analysis that is both timely and accurate, we will need to combine human coding with machine learning. We’ve been talking with experts in machine-based coding about having algorithms run on the front end of each job, pulling out not only duplicates (as we do now) but coding each tweet and providing a confidence score that we can use to determine the units that should go to our crowdsourced workforce. To complete what we hope will be a cycle of continual improvement, the human-coded data would serve as training data for our machine-based algorithms.
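A rough sketch of that routing step might look like the following. The classifier interface and the confidence cutoff are hypothetical placeholders, not a system we have built:

```python
# Hybrid workflow sketch: a machine classifier scores each tweet first;
# confident predictions are accepted automatically, low-confidence tweets go
# to the crowd, and the human-coded results become new training data.
def route_tweets(tweets, classifier, confidence_cutoff=0.8):
    """classifier: callable returning (label, confidence) for a tweet."""
    auto_coded, needs_humans = [], []
    for tweet in tweets:
        label, confidence = classifier(tweet)
        if confidence >= confidence_cutoff:
            auto_coded.append((tweet, label, confidence))
        else:
            needs_humans.append(tweet)  # send to the crowdsourced workforce
    return auto_coded, needs_humans

# Stand-in classifier; a real one would be retrained on the human-coded data
# that comes back from the crowd, closing the improvement loop.
def dummy_classifier(tweet):
    return ("positive", 0.9) if "love" in tweet.lower() else ("neutral", 0.4)

machine_coded, for_crowd = route_tweets(
    ["Love this sunny weather", "hmm, weather."], dummy_classifier)
print(machine_coded)  # [('Love this sunny weather', 'positive', 0.9)]
print(for_crowd)      # ['hmm, weather.']
```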

Ultimately, we’re committed to continually advancing our sentiment analysis process, and are looking forward to learning the best that humans and machines have to offer.

