Sentiment analysis is tough. Not tough as in strict, like a teacher, or tough as in resilient, like a marathoner. Tough as in hard, like an AP calculus test. Not hard like a block of concrete is hard. Hard, as in difficult. Eh, never mind.
A colleague of mine just sent me a piece from the Miller-McCune site discussing a flawed mood study about September 11 pager text messages.
Researchers from Johannes Gutenberg University in Germany had concluded that there was an escalating level of “anger” words communicated to pagers as time passed on September 11 (here’s the study). I’ve included the original data graph in this post.
A separate study by a psychologist from Clemson University later discovered that 36 percent of the “anger” words were computer-generated instructions to reboot pagers, each including the word “critical.” In this instance, “critical” meant important.
This unfortunate oversight highlights how important it is to determine relevancy and context when coding sentiment.
The original researchers used a popular text analysis software program, Linguistic Inquiry and Word Count (LIWC), to code the sentiment of the pager text (these transcripts had been released to the public by Wikileaks in 2009). As the LIWC team explains on their site, “With a click of a button, you can determine the degree any text uses positive or negative emotions, self-references, causal words, and 70 other language dimensions.”
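To make the failure mode concrete, here is a minimal sketch of dictionary-based word counting in the spirit of tools like LIWC. This is not LIWC itself, and the tiny "anger" lexicon is hypothetical, chosen only to show how a context-blind counter flags a machine-generated reboot instruction as angry:

```python
import re

# Toy "anger" lexicon -- hypothetical, for illustration only.
ANGER_WORDS = {"critical", "hate", "furious", "annoyed"}

def anger_score(text):
    """Fraction of words in `text` that appear in the anger lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in ANGER_WORDS)
    return hits / len(words)

# A machine-generated pager instruction scores as "angry" even though
# it carries no emotion at all: "critical" here just means important.
msg = "CRITICAL: please reboot your pager immediately"
score = anger_score(msg)  # nonzero, because "critical" is in the lexicon
```

The counter has no way to know that "critical" in a reboot message is a severity flag, not an emotion, which is exactly the trap the pager study fell into.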
Clearly, LIWC is a powerful piece of software. In fact, it’s leveraged by Facebook to produce the Facebook Gross National Happiness Index, which determines whether user posts are positive or negative.
For the past 18 months, my colleagues and I at Dialogue Earth have been developing Pulse, a tool to determine trends in sentiment expressed in social media.
After evaluating machine-based sentiment products and talking to experts in the field, we determined that effective machine-based coding remained a challenge for the short, often cryptic tweets we sought to understand. We've been using an approach that relies mainly on crowd-sourced workers to code the sentiment of tweets on various topics, from specific chatter about Google and Apple during SXSW Interactive, to the broad public mood about weather and gas prices.
One goal has been to determine which texts are truly relevant, and to create rules and instructions for interpreting the context of a particular word or phrase. My colleague has written about our approach to developing quality controls with a crowd-sourced workforce.
Before the crowd gets an assignment, we have scripts running on the front end to filter in only the tweets that are most relevant to whatever question we’re aiming to answer, and to identify duplicate text.
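A front-end filter like the one just described could be sketched as follows. This is an illustrative stand-in, not Dialogue Earth's actual pipeline: the topic keywords and the normalization rules for spotting duplicates are assumptions:

```python
import re

# Example topic keywords for a gas-prices question (hypothetical list).
TOPIC_KEYWORDS = {"gas", "gasoline", "fuel", "pump"}

def normalize(text):
    """Lowercase and strip URLs, @mentions, and a leading RT marker,
    so retweets of the same text collapse to one normalized form."""
    text = re.sub(r"https?://\S+|@\w+|^rt\s+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def prefilter(tweets):
    """Yield only relevant, non-duplicate tweets for the crowd to code."""
    seen = set()
    for tweet in tweets:
        norm = normalize(tweet)
        if not any(kw in norm for kw in TOPIC_KEYWORDS):
            continue  # not relevant to the question we're asking
        if norm in seen:
            continue  # duplicate text, already queued for coding
        seen.add(norm)
        yield tweet
```

Running this over a batch keeps one copy of each relevant tweet, so crowd workers never waste time coding off-topic or repeated text.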
To be clear, I’m not faulting automated sentiment analysis. I envision a future state of Pulse where the majority of sentiment coding is done by algorithms, with humans only taking on text units for which the computer algorithms indicate low confidence.
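That hybrid future could look something like the sketch below: an automated classifier codes each text unit, and only the low-confidence items get routed to human coders. The classifier interface and the threshold value are stand-ins, not Pulse's actual design:

```python
# Assumed cutoff; in practice this would be tuned against human-coded data.
CONFIDENCE_THRESHOLD = 0.8

def route(text_units, classify):
    """Split text units into machine-coded results and a human review queue.

    `classify` is any function returning (label, confidence) for one unit.
    """
    machine_coded, needs_human = [], []
    for unit in text_units:
        label, confidence = classify(unit)
        if confidence >= CONFIDENCE_THRESHOLD:
            machine_coded.append((unit, label))
        else:
            needs_human.append(unit)  # hand off to crowd-sourced workers
    return machine_coded, needs_human
```

The design choice is simple: the algorithm handles the easy bulk, and human judgment is reserved for exactly the ambiguous cases where context matters most.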
In a previous post, I wrote a response to a piece in Social Times about automated sentiment in which I expressed our hope for human-coded sentiment. The author, Dr. Taras Zagibalov, makes solid points: automated systems perform best when their classifiers are tuned to a particular domain (subject matter) and are constantly updated to stay current.
As I reflect on how that German research team might be kicking themselves for not filtering out the “critical reboot” pager messages (relevant to the volume of text, perhaps, but not to its sentiment), I’m reminded how critical it is to understand one’s data before drawing major conclusions.