Every day we create over 2.5
quintillion bytes of data. That is over 20,000 times the size of the
English text version of Wikipedia. It includes everything from Facebook
statuses to tweets to product reviews, along with millions of other
things. Now say that you are a company and you want to find out the
public opinion on something, be it a product, a politician, or a
medical procedure. It is not practical to have a person read the
equivalent of 20,000 Wikipedias to decide whether people like something or
not. And while much of this data is irrelevant, it is often very hard
to know where to look for the data you need. Even if you do know, say
you want to look at every tweet from the last week and find the opinion
on something, reading every one of those tweets would
still take more manpower than most countries, let alone businesses,
can provide.
This is where sentiment analysis comes in.
Sentiment analysis is a technique that uses computers to analyze text
to judge opinion. Your initial thought might be "Well, that
shouldn't be too hard, and computers work a lot faster than people."
You would be partly right and partly wrong. Getting computers to
recognize a human concept just from the words used is a very hard
task. A first approach would be to look for positive or negative
words in relation to your product. But what if someone says "I
would hate for someone to live without this product" or "If
you enjoy a pleasant day of sizzling the skin off your feet or eating
food so lively that you get dysentery, this vacation spot is the
place for you!" Sarcasm is a complex linguistic process that
many humans fail to understand, let alone machines.
And yet this is what these computers
do. They analyze text to judge public opinion, and companies look at
the results and make decisions based on what they find. These
techniques are incredibly versatile and used in a myriad of ways in
the electronic world.
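
To make the "first approach" described earlier concrete, here is a minimal sketch of counting positive and negative words. The word lists and the function name naive_sentiment are made up for illustration and are far smaller than any real sentiment lexicon. The point is only to show how this kind of word counting misreads the two example sentences quoted above.

```python
import re

# Tiny hand-picked word lists, purely illustrative (an assumption,
# not a real sentiment lexicon).
POSITIVE = {"love", "great", "good", "enjoy", "pleasant", "lively"}
NEGATIVE = {"hate", "awful", "bad", "terrible", "boring"}


def naive_sentiment(text):
    """Label text by (number of positive words) - (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


examples = [
    "I love this product",
    "I would hate for someone to live without this product",
    "If you enjoy a pleasant day of sizzling the skin off your feet or eating "
    "food so lively that you get dysentery, this vacation spot is the place "
    "for you!",
]

for text in examples:
    print(naive_sentiment(text), "<-", text)
```

With these word lists, the first sentence is labeled positive, which is correct. The second is labeled negative because of the single word "hate", even though the speaker is praising the product. The third is labeled positive because of "enjoy", "pleasant", and "lively", even though it is a sarcastic warning. Counting words alone cannot see either of those meanings.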