“Information is money; information in advance is priceless”
Organisations have traditionally treated data as a means to review performance and take decisions. In the current decade, data is the new oil. Oil rigs were once treated as precious assets, back when large internal combustion engines were being engineered for compactness and durability.
From ExxonMobil to Saudi Aramco, companies have reaped riches for decades by drilling deep into ocean beds. Now a similar treatment is applied to data, be it user generated, sensor generated, browser cookies or the temperature records of data centre cooling towers.
The other areas that have evolved along with data are computing power, hardware and algorithms. Data analysis was traditionally thought to be restricted to quantitative analysis. However, another area that has evolved is the one that deals with languages and text. Natural language processing and machine learning based on grammar and phonetics have had an interesting journey thus far, and the future looks very promising.
More intriguing still are the analyses conducted on real-time data streams and on data generated from social media hot spots. Twitter is one such hot spot. Despite a dwindling user base and a plethora of negativity, Twitter continues to wield significant power in the social media ecosystem.
An analysis technique that suits Twitter streams well is a branch of analytics that endeavours to estimate the sentiment of each tweet. Popularly known as sentiment analysis or opinion mining, this method essentially breaks every tweet in the stream down into words and compares each word against a corpus of positive and negative words. The comparison generates a count of positive and negative words, which determines the overall sentiment.
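The word-count comparison described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two word lists are tiny hypothetical stand-ins for a real opinion lexicon, the function names are invented for this example, and taking the score as positive matches minus negative matches is an assumed scoring rule rather than a confirmed detail of the analysis reported here.

```python
import re

# Hypothetical, deliberately tiny lexicons. A real analysis would load a
# full corpus of positive and negative words instead.
POSITIVE_WORDS = {"good", "great", "strong", "win", "hope"}
NEGATIVE_WORDS = {"bad", "weak", "fail", "chaos", "lies"}


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive word count) - (negative word count) for one tweet.

    Assumption: the score is the simple difference of the two counts,
    so 0 means neutral, above 0 positive and below 0 negative.
    """
    words = tokenize(text)
    positives = sum(1 for word in words if word in POSITIVE_WORDS)
    negatives = sum(1 for word in words if word in NEGATIVE_WORDS)
    return positives - negatives
```

A tweet with no lexicon words at all scores zero, which is why a large neutral group is to be expected with this kind of method.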
The Obama administration used sentiment analysis to gauge public opinion on policy announcements and campaign messages ahead of the 2012 presidential election.
Sentiment analysis is knocking on the doors of investment houses as well. Analysts at Goldman Sachs use sentiment analysis to make investment decisions between stocks because shifts in sentiment on social media have been shown to correlate with shifts in the stock market.
Taking cues from the above instances, imagine running a sentiment analysis using Twitter feeds in the context of British elections. Would it be possible to predict vote swings between Labour and the Conservatives using Twitter feeds well in advance?
The answer is yes. Ready-made open source algorithms are available to run such analyses. The prerequisites are a topic of choice for gathering tweets and a programming platform. Numerous open source programming platforms can enable such analysis with minimal programming dexterity.
A sentiment analysis of Theresa May and Jeremy Corbyn, based on Twitter feeds collected on June 6th 2017, showed the following:
As seen on the abscissa of the histogram, the scale runs from -4 to +4. In both instances, there are around 500 tweets with a zero score, meaning neutral. Excluding the neutral ones, the count of scores on the negative side is higher for Theresa May than for Corbyn. The scores on the positive side are comparable. However, the bar at +2 for Corbyn seems to be slightly taller than the one for Theresa May, which can skew his score towards positive.
The overall score computed for Theresa May turned out to be -43 and that of Jeremy Corbyn was 119. The findings are indicative of a shift in sentiment from the Conservatives to Labour. Let’s evaluate the efficacy of such a finding.
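To make the link between the histogram and the overall score concrete, here is a small Python sketch. It assumes that the overall score is simply the sum of the per-tweet scores, which the analysis above does not state explicitly. The sample scores are hypothetical and are not the collected tweets, and the function names are invented for this illustration.

```python
from collections import Counter


def score_histogram(scores):
    """Count how many tweets fall on each score, ordered from low to high."""
    return dict(sorted(Counter(scores).items()))


def overall_score(scores):
    """Assumed aggregation: add up the per-tweet scores.

    Neutral tweets contribute 0, so only the positive and negative
    bars of the histogram move the total.
    """
    return sum(scores)


# Hypothetical per-tweet scores for one politician (not real data).
sample_scores = [0, 0, 1, -1, 2, 0, -2, 1, 2, 0]

histogram = score_histogram(sample_scores)
total = overall_score(sample_scores)
print(histogram)
print(total)
```

Under this assumption, a slightly taller bar at +2 adds two points per extra tweet, which is how a modest visual difference in the histogram can add up to a clearly positive total.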
The final results of the election were declared on 9th of June 2017. The graph below depicts the final tally of the Conservatives and Labour in the 2017 election: a drop of 13 seats for the Conservatives and a gain of 30 seats for the Labour party. The final results are consistent with the sentiments expressed on Twitter.
Graph source: Financial Times