Data Analytics and the Modern World: Big Data and Big Problems: March 2014

Sentiment Analysis

Every day we create over 2.5 quintillion bytes of data, more than 20,000 times the size of the English text version of Wikipedia. This data ranges from Facebook statuses to tweets to product reviews to millions of other things. Now say that you are a company and you want to find out public opinion on something, be it a product, a politician, or a medical procedure. You cannot have a person read the equivalent of 20,000 Wikipedias to decide whether people like it or not. And while much of this data is irrelevant, it is often very hard to know where to look for the data you need. Even if you do know, say you want to find the opinion on something in every tweet from the last week, searching all of those tweets by hand would take more manpower than most countries, let alone businesses, can provide.

This is where sentiment analysis comes in. Sentiment analysis is a technique that uses computers to analyze text and judge the opinion it expresses. Your initial thought might be "Well, that shouldn't be too hard, and computers work a lot faster than people." You would be partly right and partly wrong. Getting computers to recognize a human concept just from the words used is a very hard task. A first approach would be to look for positive or negative words used in relation to your product. But what if someone says "I would hate for someone to live without this product" or "If you enjoy a pleasant day of sizzling the skin off your feet or eating food so lively that you get dysentery, this vacation spot is the place for you!" The first praises the product with a negative word, and the second condemns the vacation spot with positive ones. Sarcasm is a complex linguistic device that even many humans fail to recognize, let alone machines.
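To see why the word-counting approach struggles, here is a minimal sketch in Python of a naive lexicon-based scorer. The word lists POSITIVE and NEGATIVE and the function names are hypothetical and deliberately tiny; real systems use much larger lexicons or trained models. The score is simply the number of positive words minus the number of negative words.

```python
# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE = {"love", "great", "enjoy", "pleasant", "excellent"}
NEGATIVE = {"hate", "terrible", "awful", "dysentery", "bad"}

def naive_sentiment(text):
    """Return (# positive words) - (# negative words) found in the text."""
    # Lowercase each word and strip surrounding punctuation before lookup.
    words = [w.strip('.,!?"\'').lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

praise = "I would hate for someone to live without this product"
sarcasm = ("If you enjoy a pleasant day of sizzling the skin off your feet or eating "
           "food so lively that you get dysentery, this vacation spot is the place for you!")

for sentence in (praise, sarcasm):
    score = naive_sentiment(sentence)
    print(score, label(score))
```

Run on the two example sentences, the scorer labels the praise as negative (score -1, because of "hate") and the sarcastic review as positive (score 1, because "enjoy" and "pleasant" outweigh "dysentery"), which is exactly the failure described.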

And yet this is what these computer systems do. They analyze text to gauge public opinion, and companies look at the results and make decisions based on what they find. These techniques are incredibly versatile and are used in a myriad of ways across the electronic world.

Time Series Modelling

Time series are incredibly useful tools for modeling systems. A time series is essentially a record of how a variable changes over time, such as a measurement taken every hour or a price recorded every day. Time series can be used to model ocean currents, stocks, populations, and pretty much anything else that changes over time.

A time series model is constructed by analyzing past data for a number of patterns. These can be as simple as whether the data is cyclic, that is, whether it repeats a pattern over some time interval. Or they can be more complex, such as several frequency components that cause smaller cycles to occur within a larger cycle.
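One common way to test whether data is cyclic is autocorrelation: compare the series with a copy of itself shifted by some lag and see which lag lines up best. Below is a minimal sketch in plain Python; the function names are illustrative, and the synthetic series with a 12-step cycle is an assumption made for the demo, not real data.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance

def dominant_period(series, max_lag):
    """Lag between 1 and max_lag where the series best matches a shifted copy of itself."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))

# Synthetic data: a pattern that repeats every 12 steps, observed for 120 steps.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
print(dominant_period(series, max_lag=20))  # 12
```

For this noise-free sine wave the best lag is 12, matching the cycle built into the data. Real data has noise and long-term trends, so in practice the trend is usually removed before looking for peaks.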

Some time series are chaotic in nature, meaning that starting from similar but not identical initial conditions can yield large differences in how they progress over time. Many physical systems are chaotic, such as water flow during a storm, a double pendulum, or turbulence in a vortex.
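The logistic map is not one of the systems named above, but it is a standard textbook example of chaos and fits in a few lines. The sketch below iterates x -> r * x * (1 - x) with r = 4 from two starting points that differ by only 1e-10 and tracks how far apart they drift; the starting values and step count are arbitrary choices for illustration.

```python
def logistic_trajectory(x0, r=4.0, steps=60):
    """Iterate the logistic map x -> r * x * (1 - x), returning every value including x0."""
    values = [x0]
    for _ in range(steps):
        values.append(r * values[-1] * (1 - values[-1]))
    return values

# Two nearly identical initial conditions.
a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2 + 1e-10)

# Print the gap between the two trajectories every 10 steps.
for step in (0, 10, 20, 30, 40, 50, 60):
    print(step, abs(a[step] - b[step]))
```

At r = 4 the gap roughly doubles each step on average, so the two trajectories typically become unrelated after a few dozen steps even though they started almost identically.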

Time series can also be used to model systems that change with respect to other variables over time. This way they can model things like the stock market, which changes due to many variables such as inflation or earnings. Developing an accurate time series model allows extrapolation to future events, so predictions can be made. This also shows a limitation of these models in practice, because clearly we do not have accurate predictors of the stock market.
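As a minimal sketch of extrapolation, the code below fits one of the simplest time series models, a first-order autoregressive model x[t+1] = c + phi * x[t], by least squares and then runs it forward to forecast. The function names and the toy history are hypothetical, and this model uses only the series' own past; a model of something like the stock market would add other variables (such as inflation or earnings) as extra regression terms.

```python
def fit_ar1(series):
    """Least-squares fit of x[t+1] = c + phi * x[t]; returns (c, phi)."""
    prev, nxt = series[:-1], series[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_next = sum(nxt) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(prev, nxt))
    var = sum((p - mean_prev) ** 2 for p in prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

def forecast(series, steps):
    """Extrapolate `steps` values past the end of the series using the fitted model."""
    c, phi = fit_ar1(series)
    preds = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last  # each prediction feeds the next one
        preds.append(last)
    return preds

# Toy data that follows the rule x[t+1] = 2 + 0.5 * x[t] exactly.
history = [10.0]
for _ in range(9):
    history.append(2 + 0.5 * history[-1])

c, phi = fit_ar1(history)
print(round(c, 6), round(phi, 6))  # 2.0 0.5
print(forecast(history, steps=3))
```

Because the toy data follows the rule exactly, the fit recovers c = 2 and phi = 0.5, and the forecast settles toward 4. Real data such as stock prices never fits this cleanly, so forecasts from such models are only approximations.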

This happens because so many variables affect the system that we cannot model it perfectly. Generally we settle for approximations, which give us a general idea but not perfect results. We construct these models to make general predictions, and we strive to improve them because better models give results that are closer and closer to reality.

SQL and You

If you have ever used the internet in any way, you have almost certainly interacted with SQL without knowing it. SQL (Structured Query Language) is a widely used tool in the backbone of the internet for accessing data stored in databases. It is the communication line between data stored in tables on a server and the user's screen where the data is wanted. This includes data such as your personal information on your Facebook profile, your tastes as catalogued by Netflix, frequently searched terms on Google, or the item types you've shown an interest in on Amazon. Much of this information is stored neatly in tables on servers, and SQL is the key to getting it where it needs to go.

SQL is a language for storing and querying data in relational databases, and it is a key part of many websites. SQL commands can store information entered by a user, retrieve data from a table, and check conditions on user and website variables so that only the matching data is returned. SQL is highly versatile: SQL statements are usually embedded in the server-side code that generates a website's pages, so the same program that builds the page sent to your browser can also contain pieces of SQL that move data from the server to the client.
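Here is a minimal sketch of those three jobs (storing input, recalling data, and checking a condition), using Python's built-in sqlite3 module with an in-memory database. The users table and its columns are hypothetical, and a real website would typically talk to a database server rather than an in-memory SQLite file. The ? placeholders pass user input to the database separately from the SQL text, which protects against SQL injection.

```python
import sqlite3

def demo_users():
    # In-memory database; nothing is written to disk.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
    )
    # Store information, as if users had typed it into a sign-up form.
    conn.executemany(
        "INSERT INTO users (name, city) VALUES (?, ?)",
        [("Ada", "London"), ("Grace", "Arlington"), ("Alan", "London")],
    )
    conn.commit()
    # Recall only the rows that satisfy a condition.
    rows = conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", ("London",)
    ).fetchall()
    conn.close()
    return rows

print(demo_users())  # [('Ada',), ('Alan',)]
```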

SQL is heavily involved in targeted advertising. Companies that have data on you, such as Facebook or Google, store this information in tables. When a web page loads and advertisements are selected, they are picked so that they share characteristics that have appealed to you in the past, as documented in these databases.
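Here is a toy sketch of how stored interests might be matched to ads with a single SQL query, again using Python's sqlite3 module. The interests and ads tables, their columns, and the ranking rule (favor the categories a user has shown interest in most often) are assumptions made for illustration; real ad systems are far more complex.

```python
import sqlite3

def build_demo_db():
    """Create a hypothetical in-memory database of user interests and ads."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE interests (user_id INTEGER, category TEXT);
        CREATE TABLE ads (ad_id INTEGER PRIMARY KEY, title TEXT, category TEXT);
    """)
    # Each row records one time a user showed interest in a category.
    conn.executemany(
        "INSERT INTO interests VALUES (?, ?)",
        [(1, "hiking"), (1, "hiking"), (1, "cooking"), (2, "gaming")],
    )
    conn.executemany(
        "INSERT INTO ads (title, category) VALUES (?, ?)",
        [("Trail boots", "hiking"), ("Chef knife", "cooking"),
         ("Game console", "gaming"), ("Sofa", "furniture")],
    )
    return conn

def ads_for_user(conn, user_id, limit=2):
    """Pick ads whose category matches the user's interests, most frequent first."""
    query = """
        SELECT a.title
        FROM ads AS a
        JOIN (SELECT category, COUNT(*) AS n
              FROM interests
              WHERE user_id = ?
              GROUP BY category) AS i
          ON a.category = i.category
        ORDER BY i.n DESC, a.title
        LIMIT ?
    """
    return [row[0] for row in conn.execute(query, (user_id, limit))]

conn = build_demo_db()
print(ads_for_user(conn, 1))  # ['Trail boots', 'Chef knife']
```

User 1 has shown interest in hiking twice and cooking once, so the hiking ad ranks first; the furniture ad never appears because no stored interest matches it.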

Sunday, March 23, 2014
