
Digging into the South-by-Southwest (SXSW) Twitter Traffic on Apple & Google

March 16, 2011

Given that we couldn’t be at South by Southwest (SXSW) this year, we thought it would be interesting to apply our developing Pulse technology to the Twitter chatter connected with the event. Pulse represents our approach to sifting interesting information out of social media dialogue. Our first major application has been in the area of weather mood, a pilot study of which is chronicled here.

The quick overview is that we leverage the power of the crowd using CrowdFlower’s platform to extract a high-quality, nuanced understanding of sentiment from tweets. Prior to going to the crowd, we develop a strategy to create a survey that we can give to crowd-based workers so that they can make reliable judgments about author sentiment. We then collect a bunch of relevant tweets, do some pre-processing to limit the size of the sentiment coding job sent to the crowd, do some preliminary rounds of coding to ensure quality control, and then run a coding job on a large number of tweets.

We’re actively tuning the system to be able to monitor weather mood across the U.S. on a continuous basis beginning in a week or so. We also are poised to do a major pilot on mood about gas prices, and we are contemplating one on the topic of spring flooding. In parallel, we think it makes good sense to apply Pulse to some topics outside of our domain. We think this will make us smarter about monitoring social media on topics connected with the environment, yielding a technology platform that is more flexible and adaptive to the continually-evolving dialogue.

The launch of SXSW just happened to coincide with our team discussions about good test cases. Within a few hours, we had turned on our collection apparatus to get all tweets that carried the hashtag #sxsw as well as any one of several keywords representing brands and products. In the end, we decided to focus our initial study on two brands (Apple and Google) and several products/platforms (iPhone, iPad, and Android).
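The collection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not our actual collection apparatus; the keyword lists and the `matches_collection_rule` helper are hypothetical, stand-ins for whatever the real pipeline uses.

```python
import re

# Hypothetical keyword lists, based on the brands and products named in the post.
BRANDS = ["apple", "google"]
PRODUCTS = ["iphone", "ipad", "android"]
KEYWORDS = BRANDS + PRODUCTS

def matches_collection_rule(tweet_text):
    """Keep a tweet only if it carries the #sxsw hashtag AND mentions
    at least one tracked brand/product keyword."""
    text = tweet_text.lower()
    if "#sxsw" not in text:
        return False
    return any(re.search(r"\b" + kw + r"\b", text) for kw in KEYWORDS)

tweets = [
    "Loving the #sxsw iPad demos!",
    "Great BBQ in Austin #sxsw",          # no brand/product keyword
    "Android keynote line is huge #SXSW",
]
kept = [t for t in tweets if matches_collection_rule(t)]
```

Here `kept` retains the first and third tweets; the second carries the hashtag but mentions none of the tracked brands or products.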

Traditionally, we began our analysis work with a universe of tweets that our research team had spent a lot of time studying. This enabled us to kick off a CrowdFlower job with a bunch of units for which we knew the answer—a critical ingredient for the CrowdFlower process to work well. The timeline for this project did not allow that, so we ran three preliminary, small jobs with several hundred tweets in them.

We then used CrowdFlower’s “Gold Digging” approach to create units with known answers. Through this process, we created the survey pictured below. While quite straightforward, it has the potential to yield some interesting data. It is worth noting that we did not initially identify user experience with apps as a key part of the survey, but we quickly realized that a lot of the chatter was more about the user’s experience running an app on a device, rather than the device itself.



It turned out that we collected roughly 15,000 tweets from Friday (3/11/11) through Wednesday morning (3/16/11). We’re working on a robust process for removing duplicates and near-duplicates in preparation for sending a batch of tweets to CrowdFlower. For now, I removed all exact duplicates—including those that only differed by a shortened URL—using a few spreadsheet formulas. This knocked out about 1/3 of the total.
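The de-duplication step above was done with spreadsheet formulas; the same idea can be sketched in Python. The trick is to normalize each tweet before comparing, stripping out URLs so that retweets differing only in their shortened link collapse to one entry. The `normalize` and `dedupe` helpers below are illustrative, not part of our actual tooling.

```python
import re

URL_RE = re.compile(r"https?://\S+")

def normalize(tweet_text):
    """Canonical form for duplicate detection: drop URLs (so tweets that
    differ only by a shortened link match) and squeeze case/whitespace."""
    no_urls = URL_RE.sub("", tweet_text)
    return " ".join(no_urls.lower().split())

def dedupe(tweets):
    """Keep the first occurrence of each normalized tweet."""
    seen = set()
    unique = []
    for t in tweets:
        key = normalize(t)
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique

tweets = [
    "RT iPad 2 line at #sxsw is insane http://bit.ly/abc123",
    "RT iPad 2 line at #sxsw is insane http://t.co/xyz789",  # same text, different short URL
    "Android demo rocked #sxsw",
]
unique_tweets = dedupe(tweets)
```

The first two tweets differ only in their shortened URL, so `dedupe` keeps just one of them along with the third tweet.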

The final step was to kick off a CrowdFlower job, which I did around 6 PM on 3/16. We’ll follow this post with a discussion of the “exciting” results!

