Please Note: This project is not currently active. The content on this site is provided for reference and is not actively maintained.

Posts Tagged ‘machine learning’

Training the Cloud with the Crowd: Training A Google Prediction API Model Using CrowdFlower’s Workforce

February 29, 2012



Can a machine be taught to determine the sentiment of a Twitter message about the weather?  Using data from over 1 million crowdsourced human judgements, the goal was to train a predictive model and then use that machine learning system to make judgements on its own.  Below are the highlights from the research and development of a machine learning model in the cloud that predicts the sentiment of text regarding the weather.  The major technologies used in this research were the Google Prediction API, CrowdFlower, Twitter, and Google Maps.
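To make the training step concrete, here is a minimal sketch of preparing crowd judgements as training data in the label-first CSV layout the Google Prediction API used (no header row, label in the first column, text in the second). The helper name and example tweets are illustrative, not from the actual project:

```python
import csv
import io

def to_prediction_csv(examples):
    """Convert (label, tweet_text) pairs into label-first CSV rows,
    the layout the Google Prediction API used for training data:
    no header, sentiment label in column one, raw text in column two."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for label, text in examples:
        writer.writerow([label, text])
    return buf.getvalue()

# Hypothetical crowd-labelled weather tweets.
examples = [
    ("positive", "Gorgeous sunshine today, loving this weather!"),
    ("negative", "Another freezing gray morning. I am done with winter."),
    ("neutral",  "Forecast says 60% chance of rain tomorrow."),
]
print(to_prediction_csv(examples))
```

The resulting file would be uploaded to cloud storage and referenced when kicking off a training job.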

The only person who can truly determine the sentiment of a tweet is the person who wrote it.  When crowd workers judge tweet sentiment, all 5 workers make the same judgement only 44% of the time.  CrowdFlower’s crowdsourcing processes are well suited to managing the art and science of sentiment analysis: you can scale up the number of crowd workers per record to increase accuracy, though of course at a scaled-up cost.
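The aggregation of multiple workers into one label can be sketched as a simple majority vote with an agreement score; the function below is an illustrative sketch, not CrowdFlower's actual aggregation logic:

```python
from collections import Counter

def consensus(judgements):
    """Aggregate one tweet's worker judgements into a consensus label.
    Returns (label, agreement), where agreement is the fraction of
    workers who picked the winning label (1.0 means unanimous)."""
    counts = Counter(judgements)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(judgements)

# 4 of 5 hypothetical workers call this tweet positive.
label, agreement = consensus(["positive"] * 4 + ["neutral"])
```

Tracking that agreement score per record is what lets you compare the model against the unanimous subset versus all tweets, as in the results below.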

The results of this study show that when all 5 crowd workers agree on the sentiment of a tweet, the predictive model makes the same judgement 90% of the time.  Across all tweets, CrowdFlower and the predictive model return the same judgement 71% of the time.  CrowdFlower and Google Prediction complement rather than substitute for each other.  As shown in this study, CrowdFlower can successfully be used to build a domain/niche-specific data set to train a Google Prediction model.  I see real power in integrating machine learning into crowdsourcing systems like CrowdFlower.  CrowdFlower users could have the option of automatically training a predictive model as the crowd workers make their judgements.  CrowdFlower could continually monitor the model’s trending accuracy and then progressively include machine workers in the worker pool.  Once the model hit X accuracy, the majority of the data stream could be routed to predictive judgements while a small percentage continued to be fed to the crowd to refresh current topics and continually validate accuracy.  MTurk HITs may cost only pennies, but Google Prediction ‘hits’ cost even less.
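The hybrid routing idea described above can be sketched as a simple dispatcher: send everything to the crowd until the model's trending accuracy clears a threshold, then route most traffic to the model while reserving a small slice for the crowd. Everything here is a hypothetical sketch; `model_predict` and `crowd_judge` stand in for the real Prediction API and CrowdFlower calls:

```python
import random

def route(tweets, model_predict, crowd_judge, model_accuracy,
          accuracy_threshold=0.85, refresh_rate=0.05):
    """Route each tweet to the machine or the crowd.

    Until the model's trending accuracy clears accuracy_threshold,
    every tweet goes to the crowd.  After that, a small refresh_rate
    slice still goes to the crowd to refresh topics and keep
    validating the model's accuracy."""
    results = []
    for tweet in tweets:
        use_crowd = (model_accuracy < accuracy_threshold
                     or random.random() < refresh_rate)
        judge = crowd_judge if use_crowd else model_predict
        results.append((tweet, judge(tweet),
                        "crowd" if use_crowd else "model"))
    return results
```

In practice the crowd's judgements on the refresh slice would feed back into retraining, closing the loop between human and machine workers.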


More »

Hope for Human Sentiment Analysis Coding

May 13, 2011

I just read an interesting blog post on Social Times discussing the advantages of machine-based sentiment analysis. In the piece, author Dr. Taras Zagibalov challenges the critics of “automatic” sentiment analysis, who claim that humans can better determine than computers the sentiment of social media text. He asserts that, with the proper tuning of a system’s classifier—creating specific classifiers for each domain (subject matter) and keeping them current—a machine-based sentiment analysis system can outperform human accuracy.

The discussion of human vs. machine sentiment is core to our work at Dialogue Earth, where we are developing Pulse, a social media analytics tool to help tease out nuances in the social media dialogue about key societal topics. (more…)

More »