Please Note: This project is not currently active. The content on this site is provided for reference and is not actively maintained.

Teasing Out Opinions About Global Warming From Twitter

June 24, 2011

A couple of months ago, we posted results from a quick sampling of mood about global warming in the Twittersphere that was featured in Momentum, the publication of the University of Minnesota’s Institute on the Environment. Along with our work on weather mood and mood about gas prices, we are on the verge of releasing a more in-depth analysis of sentiment about global warming. Here, we explain the method behind our sentiment analysis related to global warming, building off an earlier post that presented some of the details of our methodology on studying global warming chatter.

Terminology: “Global Warming” vs. “Climate Change”

For this project, we are considering the terms “global warming” and “climate change” to be interchangeable, even though they really are not. Briefly, global warming refers to the phenomenon of increasing atmospheric concentrations of greenhouse gases raising the heat-trapping capacity of the atmosphere, thereby warming it. Climate change encompasses global warming, but also includes phenomena such as long-term shifts in precipitation patterns (and ensuing floods, droughts, shifts in snowfall, reduced river flows, etc.), sea level rise, ocean acidification, and many more. We have tuned our collection strategy (below) based on the realization that the general public, including Twitter users, uses these terms interchangeably.

Collecting Relevant Tweets

The first step in our process is to dip our proverbial cup in the river of Twitter data and extract relevant posts, or tweets. Our early research indicated that tweets containing the keywords “global warming”, “climate change”, or “#climate” are relevant to answering the question: Does the Twitter user believe that global warming (or climate change) is occurring, or not? This certainly does not capture all relevant tweets—for example, a tweet containing only a disparaging comment about Al Gore’s work on the environment would probably be sufficient evidence that the tweet’s author did not believe global warming was occurring. However, we believe these three keyword phrases are a solid starting point.
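The relevance filter described above amounts to a simple phrase match. The sketch below is illustrative only—the phrase list mirrors the post, but the function and sample tweets are our own assumptions, not the project's actual scripts:

```python
# Keyword filter for relevant tweets, as described above.
# The three phrases come from the post; everything else is illustrative.
KEYWORDS = ("global warming", "climate change", "#climate")

def is_relevant(tweet_text):
    """Return True if the tweet contains any tracked phrase (case-insensitive)."""
    text = tweet_text.lower()
    return any(phrase in text for phrase in KEYWORDS)

tweets = [
    "Global warming is a hoax, wake up people",
    "Nice weather today in Minneapolis",
    "New study on #climate impacts in the Midwest",
]
relevant = [t for t in tweets if is_relevant(t)]
```

As the post notes, a filter this simple misses tweets that express an opinion without using any of the three phrases.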

For data collection, our team has developed computer scripts that pull data from Twitter’s APIs into our database. For lower-volume activity, like the chatter about global warming, we rely on Twitter’s search API, which returns tweets matching a specified query. For higher-volume activity, such as we encounter for weather-related tweets, we use Twitter’s streaming API, which allows high-throughput, near-realtime access to various subsets of public and protected Twitter data.
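A collection script of this kind polls the search endpoint for each tracked phrase and records new matches. The sketch below is a rough approximation under stated assumptions: the endpoint URL and JSON response shape are modeled on Twitter's 2011-era search API and may not match it exactly, and the project's real scripts are not shown here. The live HTTP call is replaced by a canned response:

```python
# Hedged sketch of polling a tweet-search endpoint. The URL and the
# response format ("results" list of id/text records) are assumptions.
import json
from urllib.parse import urlencode

SEARCH_URL = "http://search.twitter.com/search.json"  # assumed endpoint

def build_query_url(phrase, since_id=0):
    """Build a search URL for one tracked phrase, resuming after since_id."""
    params = {"q": '"%s"' % phrase, "since_id": since_id, "rpp": 100}
    return SEARCH_URL + "?" + urlencode(params)

def extract_tweets(raw_json):
    """Pull (id, text) pairs out of a search-API-style response body."""
    payload = json.loads(raw_json)
    return [(r["id"], r["text"]) for r in payload.get("results", [])]

# Canned response standing in for a live API call:
sample = '{"results": [{"id": 42, "text": "Is global warming real?"}]}'
url = build_query_url("global warming", since_id=7)
```

Tracking the highest tweet id seen (`since_id`) keeps repeated polls from re-fetching the same tweets.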

Sampling

Our overall approach to sentiment analysis is based on having humans make judgments about the sentiment expressed via tweets. It is our assessment that computer algorithms are not yet capable of high-quality sentiment analyses of these short, often cryptic tweets. Because these judgments are expensive—we pay our crowd-sourced workforce about $0.10 for five “trusted” judgments on a single tweet—we typically extract a random sample of tweets for a given time period. (Note that we are beginning work with a team of computer scientists to see if it will be possible to tune computer algorithms to eliminate the need for some of these costly human judgments.)

Our first attempt to sample the chatter about global warming across the U.S. made it clear that, because of low tweet activity, we would need to collect data over several weeks (probably a month) to have a large enough sample for a reasonable analysis. After watching the flow of tweets into our database during April and May, we settled on taking a sample of up to 500 tweets per geographic unit over a month-long period. Thus, for each state and metropolitan area with more than 500 tweets over a month, we would extract a random sample of 500 tweets; for those with fewer than 500, we would obtain judgments on every tweet. This is a good starting point, but to be clear, our intention is to provide an interesting snapshot of the chatter, not a representative national survey of global warming opinions.
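The per-unit sampling rule is straightforward: take a random sample capped at 500, or everything if fewer exist. A minimal sketch, with illustrative variable names and toy data (the cap of 500 is from the post; the rest is assumption):

```python
# Sampling rule described above: at most 500 tweets per geographic unit,
# sampled without replacement; keep every tweet where fewer than 500 exist.
import random

SAMPLE_CAP = 500

def sample_unit(tweets, cap=SAMPLE_CAP, seed=None):
    """Return up to `cap` tweets from one state or metro area."""
    if len(tweets) <= cap:
        return list(tweets)          # judge every tweet
    rng = random.Random(seed)
    return rng.sample(tweets, cap)   # random subset of exactly `cap`

# Toy data: one high-volume unit and one low-volume unit.
by_unit = {"MN": ["tweet %d" % i for i in range(1200)],
           "WY": ["tweet %d" % i for i in range(80)]}
sampled = {unit: sample_unit(tws, seed=0) for unit, tws in by_unit.items()}
```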

Prior to sending a batch of tweets to CrowdFlower, our crowdsourcing partner, we removed duplicate tweets. All told, for the month-long period beginning the last week of April 2011, we sent just over 18,000 tweets to CrowdFlower.
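De-duplication before judging avoids paying for repeated judgments on identical text. A minimal sketch, keeping the first occurrence of each tweet; the normalization step (trimming and lowercasing before comparison) is our assumption, not a documented part of the pipeline:

```python
# Drop exact text duplicates (e.g. unmodified re-tweets) before sending
# a batch out for human judgment, keeping the first occurrence.
def dedupe(tweets):
    seen = set()
    unique = []
    for text in tweets:
        key = text.strip().lower()   # assumed normalization before comparing
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique

batch = ["Global warming is real.",
         "global warming is real.",
         "Snow in May? So much for global warming"]
```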

Sentiment Judgments

Our research team developed our overall approach to sentiment analysis while focused on the issue of global warming. Early on, we anticipated that global warming would continue to be in the news and social media chatter. Plus, extracting sentiment from Twitter on such a tricky topic appeared to be a good challenge.

Workers who participate in our job on CrowdFlower are presented with the following instructions:
[Image: cf-survey-instructions-a — CrowdFlower survey instructions]

The accompanying table provides a number of examples of how judgments should be made:
[Image: cf-survey-table — table of example judgments]

Workers are then asked to make a judgment about the sentiment of individual tweets. Here are three examples, including the judgments returned by the CrowdFlower workers.
[Image: cf-survey-sample-tweets-annotated — three sample tweets with their returned judgments]

As mentioned above, we currently seek five trusted judgments for each tweet. CrowdFlower offers an innovative solution for ensuring a high degree of quality control in such crowd-based judgments. We have decided to accept CrowdFlower judgments that have a confidence score of at least 60% (see this post for details on how we came to this decision on quality control).
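The acceptance rule above reduces to a threshold check on the confidence score CrowdFlower reports for each aggregated judgment. A minimal sketch, assuming a simple per-tweet record of label plus confidence (the 60% threshold is from the post; the record format and labels are illustrative):

```python
# Acceptance rule described above: keep a tweet's aggregated label only
# when the crowd's confidence score is at least 60%.
CONFIDENCE_THRESHOLD = 0.60

def accept(judgment):
    """Return the label if confidence clears the threshold, else None."""
    if judgment["confidence"] >= CONFIDENCE_THRESHOLD:
        return judgment["label"]
    return None   # tweet cannot be coded successfully at this time

judgments = [
    {"tweet_id": 1, "label": "does not believe", "confidence": 0.83},
    {"tweet_id": 2, "label": "believes", "confidence": 0.57},
]
accepted = [accept(j) for j in judgments]
```

Tweets that fall below the threshold are set aside rather than force-coded, which matches how the third sample tweet is handled below.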

For the three sample tweets shown above, we conclude from the CrowdFlower data that the first author does not believe global warming is occurring (we also tentatively conclude that a person who re-tweets this without adding any additional sentiment agrees with the original author). The second tweet doesn’t reveal enough about the author’s sentiment on global warming. The third one is suggestive of specific sentiment on global warming, although we really would need a bit more information about the author to make a judgment (note also that the confidence score is just below our threshold, so this tweet would fall into the bucket of tweets that cannot be coded successfully at this time).

This process remains a work in progress, and we welcome your feedback!

