Please Note: This project is not currently active. The content on this site is provided for reference and is not actively maintained.

Teasing Out Weather Mood From Twitter Posts: A Pulse Pilot

by March 8, 2011

In choosing a test case for our Pulse social media analytics tool, we wanted a topic that is broadly discussed. What better place to start than people’s mood about the weather? It is hard to escape having a few thoughts about the weather on a regular basis. Snowstorms, sunny warm days, and heat waves, to mention a few, cross party lines and ideological divides. Plus, people love to discuss the weather, so we figured there would be lots of chatter on social media, and we haven’t been disappointed. Read more on our weather strategy here.

In this post, I describe our first demonstration of the Pulse platform: mapping weather mood across the U.S. using 12,500 tweets collected on February 4th. Our process is still a work in progress, but it has several key steps: identifying and collecting useful social media posts, getting reliable judgments about the sentiment in these posts from crowd-sourced workers, publishing the data on our Pulse platform, and finally, combining our sentiment data with external data sources to tease out a story about the drivers of the observed sentiment.


After several rounds of preliminary testing of the survey that guides our distributed (commonly called “crowd-sourced”) workers, whom we reach via CrowdFlower, we confirmed that these workers can provide high-quality judgments about weather mood (read more here).

Using our Forecasting application, we collected tweets with our suite of weather-related keywords (read more here) on February 4th, 2011. Due to the way the application works, most of the tweets were for the 4th, with decreasing numbers for a few preceding days as can readily be seen in the interactive map and graph. In total, this collection run yielded about 24,000 tweets. (Note that we are developing a more sophisticated system for collecting relevant tweets on an ongoing basis, but that system was not operational for this data run.)

One key step in our process is to filter the collected tweets in order to reduce unnecessary work for the crowd. For example, duplicate tweets or re-tweets that add no sentiment of their own should be analyzed only once by the crowd. Using some simple spreadsheet functions, I removed exact duplicates and re-tweets of original tweets already captured in the data set. This eliminated about 1,800 tweets. The total dropped to about 19,000 after I eliminated the tweets that had no apparent geo-location information associated with them (note that our Forecasting application collects some tweets that have no geo-location information). The next step was to filter out tweets with suspect geo-location information; for example, some people list their location as “Mars.” Our technical team uses geo-location services from Yahoo! and Google that we’ll describe in forthcoming blog posts. We ended up with about 12,500 tweets that had geo-location information that was provisionally deemed useful.
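The de-duplication step was done with spreadsheet functions, but the same idea can be sketched in a few lines of Python. This is only an illustration, not the process we actually ran: the `RT @user:` pattern, the `normalize` and `drop_duplicates` helpers, and the sample tweets are all assumptions made up for the example.

```python
import re

# Assumed convention: a re-tweet starts with "RT @username:" (hypothetical pattern).
RETWEET_PREFIX = re.compile(r"^\s*RT\s+@\w+:?\s*", re.IGNORECASE)


def normalize(text):
    """Strip a leading re-tweet marker, collapse whitespace, and lowercase."""
    stripped = RETWEET_PREFIX.sub("", text)
    return " ".join(stripped.split()).lower()


def drop_duplicates(tweets):
    """Keep the first tweet seen for each normalized text, preserving order."""
    seen = set()
    kept = []
    for tweet in tweets:
        key = normalize(tweet)
        if key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept


# Made-up sample tweets for illustration only.
sample = [
    "Snow again today, so over this winter",
    "RT @example_user: Snow again today, so over this winter",
    "Snow again today, so over this winter",
    "RT @example_user: Snow again today, so over this winter. Me too!",
    "Sunny and 70, loving it",
]
unique = drop_duplicates(sample)
for tweet in unique:
    print(tweet)
```

A re-tweet that adds its own comment (the fourth sample) normalizes to different text, so it stays in the set for the crowd to judge. Note that this sketch keeps whichever copy appears first, which may be the re-tweet rather than the original.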


We set up a job on CrowdFlower with the 12,500 tweets (units) on the afternoon of February 25th with instructions (above) and survey (below). We sought five judgments per tweet from “trusted workers” (see more about that in this post).


In our testing leading up to this run, we paid 1.3 cents per judgment. Largely due to sticker shock at the cost of the overall job, I reduced the amount to 1 cent per judgment. The job ended up running more slowly than the CrowdFlower estimator predicted. It could be that workers found the lower price per judgment less attractive. Another factor was that, by default, each worker is limited to a total of 500 judgments, so for 60,000-plus judgments we needed to attract at least 120 individual workers, and probably a bunch more.
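The back-of-the-envelope numbers behind that paragraph can be checked with a short calculation. This is a rough sketch using only the figures quoted in this post; it ignores any platform fees and says nothing about how untrusted or gold-unit judgments are billed, since neither is covered here.

```python
import math

tweets = 12_500            # units in the job
judgments_per_tweet = 5    # trusted judgments sought per tweet
cap_per_worker = 500       # default limit on judgments per worker

requested = tweets * judgments_per_tweet
min_workers = math.ceil(requested / cap_per_worker)


def cost_in_dollars(judgments, tenths_of_cent_per_judgment):
    """Price is given in tenths of a cent so the arithmetic stays exact."""
    return judgments * tenths_of_cent_per_judgment / 1000


cost_at_test_rate = cost_in_dollars(requested, 13)  # 1.3 cents per judgment
cost_at_run_rate = cost_in_dollars(requested, 10)   # 1 cent per judgment

print(f"Requested judgments: {requested:,}")
print(f"Minimum workers at the {cap_per_worker}-judgment cap: {min_workers}")
print(f"Cost at 1.3 cents: ${cost_at_test_rate:,.2f}")
print(f"Cost at 1.0 cent:  ${cost_at_run_rate:,.2f}")
```

With exactly 12,500 tweets the minimum works out to 125 workers; the "at least 120" figure above comes from rounding the job down to 60,000 judgments. In practice the number is higher, because not every worker reaches the cap and untrusted judgments have to be replaced.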


The job finished in three days, with over 66,000 trusted judgments out of a total of nearly 90,000. About 25% of the total judgments were deemed untrusted because the workers who made them did not get a sufficient number of gold units correct (read more here).

Next, our team moved the judgments into our database, preparing to publish them to our Pulse interface. Based on our preliminary testing, we only used data that had a CrowdFlower confidence score of at least 60%. Then, when mapping the data, we required that a state or metro area have at least 10 tweets for a given day to appear—we’ll definitely need to be more rigorous about this in future tests, although this seemed like a reasonable starting point for a minimum data threshold.
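The two thresholds described above (a confidence score of at least 60% and at least 10 tweets per state or metro area per day) amount to a filter followed by a group-and-count. Here is a minimal sketch of that logic, not our actual database code: the record fields (`region`, `day`, `confidence`, `mood`) and the sample data are hypothetical, and the sketch assumes the 10-tweet minimum is counted after the confidence filter, which this post does not spell out.

```python
from collections import defaultdict

MIN_CONFIDENCE = 0.60            # CrowdFlower confidence threshold from the post
MIN_TWEETS_PER_REGION_DAY = 10   # minimum tweets for a region to appear on the map


def mappable_groups(records, min_conf=MIN_CONFIDENCE,
                    min_count=MIN_TWEETS_PER_REGION_DAY):
    """Group sufficiently confident records by (region, day); drop sparse groups."""
    groups = defaultdict(list)
    for rec in records:
        if rec["confidence"] >= min_conf:
            groups[(rec["region"], rec["day"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) >= min_count}


def make(region, confidence, n):
    """Build n made-up records for one region on one day (illustration only)."""
    return [{"region": region, "day": "2011-02-04",
             "confidence": confidence, "mood": "negative"} for _ in range(n)]


# 12 confident tweets, 4 confident tweets, and 15 low-confidence tweets.
records = make("CO", 0.8, 12) + make("VT", 0.9, 4) + make("TX", 0.5, 15)
shown = mappable_groups(records)
for (region, day), recs in shown.items():
    print(region, day, len(recs))
```

In this made-up data only the first region would appear on the map: the second has too few tweets, and the third has plenty of tweets but none that clear the confidence threshold.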

This link will take you to the interactive Pulse platform for this data set of weather-related tweets collected on February 4th. You’ll find a separate post in the interface that provides a brief summary of what we could tease out from the data. Future pilot tests will focus on improving all aspects of our process, working toward being able to report weather mood across the U.S. on a daily basis. We’ll soon be adding other topics to the mix. Next up: analyzing the mood about gas prices.
