Yep, it was cold this morning in the Twin Cities. I didn’t need Twitter to tell me that. Yet, we can’t always assume that, just because it is cold, people are upset, or that because it is warm, people are happy about the weather. But we believe tweets will reveal something quite interesting: how people’s emotions are indirectly affected by the weather. For example, are people happy to be inside watching a movie even though it is “super chilly” outside? Or happy that it is raining because it will help the garden, even though they may not be eager to be out in the rain themselves?
Having set the stage for tackling the issue of weather mood on our Pulse platform, here I describe our process for developing weather as a Pulse topic.
Pulse is fundamentally a tool for monitoring an aspect of public opinion on an ongoing basis. We believe that its interactive nature will help draw people toward Dialogue Earth’s work. Perhaps more importantly, we believe that our team will be able to leverage Pulse to gain a nuanced view of the online chatter on various topics. Those insights will position us well to create timely media that will resonate with the dialogue, ideally targeting information gaps or points of confusion.
For a topic like weather, how do we get started? We are developing a multi-step process that will enable us to start with the kernel of a topic idea and end up with a daily measure of public sentiment on that topic. These are the major process phases:
| Phase | Key Question | Description |
|---|---|---|
| Topic Qualification | Which topic should we tackle next? | Potential topics should be showing up in social media dialogue, such as on Twitter. Typically this will mean that the topic is the focus of recent news coverage. |
| Question Qualification | Can we ask a question about this topic? | Identify a potential question, such as “what emotions are people expressing about the weather?” See if there is a body of literature related to the question. Begin working out a survey: what we will use as a guide for judging expressed emotion. |
| Topic Validation | Is the topic discussed sufficiently in social media? | With a sense of the question we want to answer, estimate the volume of relevant social media chatter. This necessarily requires developing search strategies, including keyword lists. |
| Survey Validation | Does our question survey hold up to internal testing? | Have the in-house research team make judgments for a sample of relevant social media posts using the draft survey. Repeat until the team feels the survey is sound and results are repeatable. |
| Feedback Creation | What feedback should we provide to distributed workers? | Depending on the platform used for presenting jobs to distributed workers, it may be necessary to develop a group of posts for which the correct answer is known, along with feedback for workers who get these known answers wrong. This is the case with CrowdFlower, our crowdsourcing partner. |
| Survey Testing | Can distributed workers (the crowd) use our survey to answer our question? | This is the final stage in developing a new topic. Here, a well-tested survey is presented to crowdsourced workers. Ideally, the performance of the crowd is measured against a set of data for which the internal research team has provided answers. |
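A topic moves through these phases strictly in order. As an illustrative sketch only (this is not Dialogue Earth's actual code; the names are ours), the pipeline could be encoded as an ordered list that a topic advances along:

```python
# Illustrative encoding of the six-phase topic-development pipeline.
# Each entry pairs a phase with the question it answers.
PHASES = [
    ("Topic Qualification", "Which topic should we tackle next?"),
    ("Question Qualification", "Can we ask a question about this topic?"),
    ("Topic Validation", "Is the topic discussed sufficiently in social media?"),
    ("Survey Validation", "Does our question survey hold up to internal testing?"),
    ("Feedback Creation", "What feedback should we provide to distributed workers?"),
    ("Survey Testing", "Can the crowd use our survey to answer our question?"),
]

def next_phase(current):
    """Return the phase after `current`, or None once survey testing is done."""
    names = [name for name, _ in PHASES]
    i = names.index(current)
    return names[i + 1] if i + 1 < len(names) else None

print(next_phase("Topic Validation"))  # Survey Validation
```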
How did we apply this phased approach to the topic of weather? The topic qualification phase was atypically brief: we chose weather because people seem to have a near-universal interest in it, and because it is how we all interface with the environment on a daily basis.
For the question qualification phase, we spent a good bit of time digging into the academic literature. We probably spent more time on this than, in the end, was warranted. Yet, it helped us to develop a list of positive and negative emotions that people might express about the weather, as shown in the table below.
*(Table: positive and negative emotions people might express about the weather.)*
In the topic validation phase, we needed to get a sense of the volume of chatter in social media, and specifically on Twitter. We started with Twitter’s advanced search tool, but soon felt we needed more. We realized we would need to rely on a large set of keywords to collect most of the weather-related tweets, and in many cases we would need complex searches. For example, we would want to exclude “DQ” when searching for “blizzard,” or else we would end up with a pile of tweets that had nothing to do with weather.

We liked the flexibility Twitterfall offered for adding exceptions and multiple keywords, yet the streaming nature of that tool meant we had to analyze things on the fly, and it doesn’t offer a way to download a set of tweets matching the search criteria. To get around these limitations, our team built a Web application that enables us to create complex lists of keywords (and exceptions), view the individual tweets, highlight keywords in the collected tweets, and then export the tweets for further analysis. The screen shot below shows our app in action for a subset of our weather keyword list. We’ll be writing a separate post on the rather complex strategy required to get all of the tweets with geo-location information via Twitter’s API; suffice it to say that it was not at all straightforward.
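The core of a search rule like “blizzard -dq” is simple: keep a tweet if it contains the keyword and none of the excluded terms. As a minimal sketch (a hypothetical helper, not the code behind our app), this matching logic could look like:

```python
import re

def matches(tweet, keyword, exceptions=()):
    """True if `tweet` contains `keyword` as a whole word and
    none of the excluded `exceptions` terms."""
    text = tweet.lower()
    if not re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text):
        return False
    # Reject the tweet if any exception term appears.
    return not any(
        re.search(r"\b" + re.escape(e.lower()) + r"\b", text)
        for e in exceptions
    )

print(matches("Huge blizzard rolling in tonight!", "blizzard", ["dq", "ice cream"]))  # True
print(matches("Got a Blizzard at DQ after school", "blizzard", ["dq", "ice cream"]))  # False
```

The word-boundary anchors (`\b`) keep “snow” from matching inside “snowboard”, which matters for short keywords with common substrings.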
For the survey validation phase, our initial instinct was to create a survey asking whether the author of a particular tweet expressed any of the emotions in the table above. In fact, we began with an abbreviated list of emotions that proved too limiting: joy, pleasantly surprised, unpleasantly surprised, sad, angry, fearful. The research team felt that this limited set of categories didn’t encompass the full range of emotions observed, so we expanded the survey to include groups of emotions (e.g., joy / pleased / happy / satisfied / cheerful). After a number of rounds with the internal research team, we settled on providing the list of emotions (those in the table above) and then simply asking whether the tweet author expressed a positive or negative emotion about the weather conditions. The survey also provides an option for indicating that a particular tweet is not relevant to weather conditions. In addition, we observed that people often simply share information about the weather without adding emotion. Here’s a good example of what we would consider an emotion-free tweet.
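The post doesn’t describe how Pulse turns these judgments into a daily measure, but one natural sketch (an assumption on our part, with labels we made up) is to collapse the four survey outcomes into a positivity score over the emotion-bearing tweets:

```python
from collections import Counter

# Hypothetical labels for the four survey outcomes: positive emotion,
# negative emotion, information only (no emotion), and not weather-related.
POSITIVE, NEGATIVE, NO_EMOTION, NOT_RELEVANT = "pos", "neg", "none", "irrelevant"

def daily_sentiment(judgments):
    """Fraction of emotion-bearing tweets judged positive.
    Information-only and irrelevant tweets are excluded from the score."""
    counts = Counter(judgments)
    emotional = counts[POSITIVE] + counts[NEGATIVE]
    if emotional == 0:
        return None  # no emotional tweets today; no score
    return counts[POSITIVE] / emotional

score = daily_sentiment(["pos", "neg", "pos", "none", "irrelevant"])
print(score)  # 2 positive of 3 emotional tweets
```

Excluding the “information only” category keeps a day of neutral weather reports from being misread as mixed sentiment.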
In a forthcoming companion post, I’ll discuss the last two phases: feedback creation and survey testing. Lastly, here is the complete list of keywords we used to gather tweets related to weather conditions. It is a long list that we developed mostly during the topic validation phase, then tweaked during rounds of survey improvement in the survey validation phase.
- blizzard -dq -dairyqueen -dairy -queen -mmo -war -eat -eating -program -video games -ice cream -entertainment
- cold out
- cold outside
- cold weather
- “cool weather”
- degrees -bachelor -job -doctoral -college
- dry weather
- heat wave, heatwave
- “hot out” -oven -baked
- hot outside
- humidity -weather -temp -wind -winds -temperature -airport -MPH -precip
- rainy -fund
- sleet -weather -temp -wind -winds -temperature -airport -MPH -precip, slush -fund
- snow -temp -wind -winds -temperature -airport -MPH -precip
- storm -blackberry -bberry
- sunny -wx exceptions, sunshine -”little miss”
- temp -”temp-to-perm” -assignment -gig -wx terms (-wind -mph)
- thunderstorm, thunderstorms, thunder storm, thunder storms
- “warm out”
- warm outside
- warm weather
- weather -advisory -temp -wind -winds -temperature -airport -MPH -precip
- wet weather
- windy -city
- #weather -temp -wind -winds -temperature -airport -MPH -precip -humidity
- #wx -advisory -temp -wind -winds -temperature -airport -MPH -precip -humidity
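Each line above follows the same shape: a keyword (sometimes quoted, sometimes with comma-separated alternates) followed by `-`-prefixed exclusions. As a hypothetical parser sketch (not the code behind our app), one could turn a line into a (keyword, exceptions) pair:

```python
import shlex

def parse_rule(line):
    """Parse a rule like 'blizzard -dq -dairy' into ('blizzard', ['dq', 'dairy']).
    Quoted phrases are kept whole. Comma-separated alternates (e.g.
    'heat wave, heatwave') would need to be split into separate rules first."""
    # Normalize curly quotes so shlex can handle quoted phrases.
    tokens = shlex.split(line.replace("“", '"').replace("”", '"'))
    keyword_parts, exceptions = [], []
    for tok in tokens:
        if tok.startswith("-"):
            exceptions.append(tok[1:])
        else:
            keyword_parts.append(tok)
    return " ".join(keyword_parts), exceptions

print(parse_rule('blizzard -dq -dairy -queen'))
# ('blizzard', ['dq', 'dairy', 'queen'])
print(parse_rule('“hot out” -oven -baked'))
# ('hot out', ['oven', 'baked'])
```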