My first exposure to statistical analysis was from the marvelous book Chaos Theory Tamed by Garnett P. Williams. This book explores chaotic systems: systems that are governed by clearly defined rules yet have seemingly random behavior and are highly sensitive to initial conditions. In it, Williams covers many of the fundamental techniques I apply when analyzing time series, such as stationarity checks and analysis of seasonality, periodicity, and trend.
The idea behind chaos theory is looking for trends in information, finding clues that indicate there is a pattern behind the data rather than just random noise. And this technique applies to many other branches of statistical analysis. One of the main goals in modelling is determining if there is some predictability in variables or if they have no effect on one another. Finding these correlations is vital to developing proper models.
Chaotic behavior is often found in places like fluid dynamics and is hypothesized to appear in systems as complex as the stock market. It is an incredibly interesting phenomenon that demonstrates many of the key features of statistical modelling.
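To make "clearly defined rules yet highly sensitive to initial conditions" concrete, here is a minimal sketch in Python. It uses the logistic map, a standard textbook example of chaos; the choice of this map, the parameter r = 4, and the starting values are my own assumptions for illustration rather than anything taken from a specific example in the book.

```python
def logistic_map(x0, r=4.0, steps=50):
    """Iterate the rule x -> r * x * (1 - x) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two starting points that differ by only one part in a billion (illustrative values).
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)

# Distance between the two trajectories at every step.
gaps = [abs(x - y) for x, y in zip(a, b)]

for n in (0, 10, 20, 30, 40, 50):
    print(f"step {n:2d}: gap = {gaps[n]:.3e}")
```

The rule is a single line of arithmetic, yet the gap between the two runs grows until they have nothing to do with each other. That is why long-range prediction of a chaotic system is so hard even when the rule is known exactly.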
Data Analytics and the Modern World: Big Data and Big Problems
Monday, April 7, 2014
Universally Selected Hyper Logically Developed Quantum Information Theory (U SHLD QuIT)
A common problem that I have mentioned many times before is the problem of big data. There is an enormous amount of information in the world. We have learned how to harness inputs from a myriad of different fields and the result is more data than can be feasibly handled using classical techniques.
This naturally gives rise to the question: what are some non-classical techniques? Many of these have been discussed before, such as a different way of isolating trends or a different method of modelling. These ideas are based on the fact that we have only so much processing power and increasingly large amounts of data. But there is another option. What if we limit our processing technique, but in exchange give it nearly unlimited power? In other words, running our programs won't tell us the same things, but they will run orders of magnitude faster! What is this amazing technology, you ask? Well, welcome to quantum computing.
Quantum computing has suffered from a great deal of poorly researched journalism over the years, but after focusing my summer research project at Stanford on quantum information theory I feel prepared enough to banish the illusions.
The basis of quantum computing is that the concept of a bit, a "light bulb" that is either on or off, can be slightly changed. In classical computing this idea of on or off, all or nothing, is how we store data. Through long strings of on and off light bulbs (or 1's and 0's as they are often known) we can express all manner of ideas. Quantum computing uses physical properties of the universe to make things a little bit more interesting. Instead of a bit being on or off, it has some probability of being on and some probability of being off. Basically that means that we don't know if it is a 1 or a 0. If we look closely enough we can find out, but without looking closely all we know are these probabilities. (And while this explanation still skates around some major concerns, it is accurate enough for this blog post.)
But Ryan, what does this have to do with analyzing data? I'm glad you asked! It turns out that since this bit can have a whole continuous spectrum of probabilities of being on or off, it can store a lot more data in it. This means that we can take our large amounts of data, translate them into these "quantum bits" and use them for our purposes. But this comes with a great drawback. Information in a "qubit" is not as accessible as information in a regular bit. When we "read" a qubit, it resolves into either a 1 or a 0, and any other information is lost. However, there are certain mathematical techniques that we can use to solve problems faster than we could using classical bits. And thus comes the hope that someday we can use these techniques to analyze large amounts of data in a quick fashion.
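The "reading loses information" point can be sketched with a tiny classical simulation in Python. This is only a toy under stated assumptions: it describes a single qubit by two amplitudes, alpha and beta, whose squared magnitudes give the probabilities of reading 0 and 1 (the standard formulation, which the explanation above glosses over), and the function names and the seed are made up for illustration. It is not how a real quantum computer is programmed.

```python
import math
import random

def measure(alpha, beta, rng):
    """Simulate reading a qubit once: returns 0 with probability |alpha|^2, else 1."""
    p_zero = abs(alpha) ** 2
    return 0 if rng.random() < p_zero else 1

def estimate_p_one(alpha, beta, shots, seed=0):
    """Prepare the same state `shots` times, read each copy, return the fraction of 1s."""
    # Normalize so the two probabilities add up to 1.
    norm = math.sqrt(abs(alpha) ** 2 + abs(beta) ** 2)
    alpha, beta = alpha / norm, beta / norm
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) so runs are repeatable
    ones = sum(measure(alpha, beta, rng) for _ in range(shots))
    return ones / shots

# A single read gives just one classical bit, no matter what the amplitudes were.
print(measure(math.sqrt(0.3), math.sqrt(0.7), random.Random(1)))

# Only by re-preparing and reading many copies can the probability be estimated.
print(estimate_p_one(math.sqrt(0.3), math.sqrt(0.7), shots=10_000))
```

Each read returns a plain 0 or 1, and the continuous amplitudes only show up statistically across many repeated preparations. That is the drawback in action: the extra information is there, but it cannot simply be read out.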
SUMaC and Statistics
Mathematics and modelling go hand in hand. Much of mathematics is simply development of models that fit some subset of the universe or categorize some phenomena. So when it comes to statistical modelling a good deal of math is often involved. Yet this is a problem, because as useful as this modelling is, it suffers an enormous scarcity issue due to a challenging problem.
Thankfully I avoided this issue by enrolling in Mrs. Bailey's Category Theory class. There I learned an appreciation of math that I had lacked before and it led me on my path toward the Stanford University Mathematics Camp. And that was where I learned the true meaning of being a mathematician. It is more than creating formulas and equations. These things are often done, but it comes down to more than that. Mathematics is about solving problems in a logical manner. And these techniques are the cornerstone of succeeding in the modern business world.
Friday, April 4, 2014
A Brief Summary of Ryan Smith
One of my frequent human interactions in the last few weeks has been in the Stanford Facebook group. It is composed of the admitted students, and after regular decisions came out a week ago the group has been inundated with new people. A common trend is for people to introduce themselves, talking about things they like and similar topics. I eventually decided to take a stab at it and here is my introduction.
I want you all to know that you are influencing me with peer pressure and that is wrong and you should feel terrible. Well, with that out of the way...
Hello everyone, I'm Ryan Smith and if you know a way to put me into stasis until September please let me know! I am the youngest of 5, an act-a-holic, frequent video game connoisseur, and math enthusiast. That's in fact how I came to apply to Stanford. A few of my friends had attended the Stanford Mathematics Camp (SUMaC represent!) and this led me to apply in 2013 and I proceeded to have one of the best summers of my life. I met lots of amazing people, many of whom are in this group, and fell in love with the campus, the lifestyle, and the community. From that point on I couldn't see myself going anywhere else, and I was and am very relieved to have found out in December that I was going.
On other topics, I've made a second home at my local community theater and have been a part of over 40 performances in the last 4 years and if there is anything that I am going to miss it will be my lovely Fountain Hills Theater.
The other major influences in my life have been Warcraft III, my first major online game, WoW and LoL as a place where I found many of my closest friends, and my obnoxious older sister who has shaped my mind to her own purposes.
I'm going to major in Mathematics and Computer Science and love learning about all of the amazing technologies we use. Anyway, that's me, hi. How are you?
Abstract
Today I am showing my abstract for my SRP presentation. This is the first step toward my actual presentation, which I will present in May. Without further ado, here is my abstract.
As the world of data analytics becomes increasingly vital to
the business world, many corporations are utilizing it to streamline their
marketing, sales, and development departments. This research project explores
the data manipulation techniques and tools used by software giants like Google,
Facebook, Amazon, and Netflix to market their product and improve their
services. These companies utilize petabytes of information that ranges from
data on their clients to marketing trends of certain products and this
information requires proper handling to prove useful. There are many different
approaches to analyzing this data such as time series analysis or regression
modeling and as time progresses even more advanced techniques are being developed.
The research on this topic was conducted by analyzing the tools used by these
companies, such as sentiment analysis and segmentation modeling and the tools
used to manage data in general such as SQL and R. The purpose of this project
is to provide a perspective on how important information management is to the
modern world and to show that the new techniques in data analysis are critically
important to success as a major business.
Update on Life
Today I'm giving a general update on things I've been doing for the past few weeks. I've been learning a lot about the programming language/analytics tool R, which is enormously useful for creating models and processing data. It shares common features with many languages like C++ or Java and only requires learning a little new syntax. It's made a number of my projects easier.
This last weekend I learned all about the mathematics of sound design helping my theater set up for their annual fundraiser Broadway in the Hills. The gist of it is that setting up a temporary acoustic environment in a day is enormously challenging and requires a LOT of wiring.
In terms of colleges last week was the D-Day for a lot of schools and I am happy to announce that I was rejected by all of the other high end schools I applied to including Harvard, Caltech, MIT, and Harvey Mudd. While slightly saddening I can understand their decisions as my applications may have suffered after I was accepted into Stanford in December.
On top of my internship I am currently a part of 3 performances at the Fountain Hills Theater. I am running sound for the comedy The Man Who Came to Dinner, student stage managing The Little Princess: Sara Crewe, and performing at Papa Vito in our annual Murder Mystery event, Bellamorte! I do these with a mix of pleasure and pain, as I know that there will not be many more chances for me to spend time at what has been my home away from home for these past 4 years, but I hope to go out with a bang!
Sunday, March 23, 2014
Sentiment Analysis
Every day we create over 2.5
quintillion bytes of data. This is over 20,000 times the size of the
English text version of Wikipedia. This is information from Facebook
statuses to Tweets to product reviews to millions of different
things. Now say that you are a company and you want to find out the
public opinion on something be it a product, a politician, or a
medical procedure. It is not easy to have a person read the
equivalent of 20,000 Wikipedias to decide if people like something or
not. And while much of this data is irrelevant, it is often very hard
to know where to look for your data. Even if you do know, say you
want to look at every tweet in the last week and find the opinion
on something, searching every tweet made in the last week would
still take more manpower than most countries, let alone businesses,
can provide.
Here comes sentiment analysis.
Sentiment analysis is a technique that uses computers to analyze text
to judge opinion. Your initial thought might be "Well, that
shouldn't be too hard, and computers work a lot faster than people."
You would be slightly right and slightly wrong. Getting computers to
recognize a human concept just from the words used is a very hard
task. A first approach would be to look for positive or negative
words in relation to your product. But what if someone says "I
would hate for someone to live without this product" or "If
you enjoy a pleasant day of sizzling the skin off your feet or eating
food so lively that you get dysentery, this vacation spot is the
place for you!" Sarcasm is a complex linguistic process that
many humans fail to understand, let alone machines.
And yet this is what these computers
do. They analyze text to judge public opinion and companies look at
the results and make decisions based on what they find. These
techniques are incredibly versatile and used in a myriad of ways in
the electronic world.
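To show why the "first approach" of counting positive and negative words falls short, here is a minimal sketch in Python. The two word lists are tiny hand-picked stand-ins that I made up for illustration (real tools use far larger lexicons and much smarter models), and the function name is hypothetical.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"love", "great", "enjoy", "pleasant", "amazing", "lively"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "dysentery"}

def naive_sentiment(text):
    """Score text as (# positive words) - (# negative words); above 0 reads as 'liked'."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives

examples = [
    "I love this product, it is great",
    "I would hate for someone to live without this product",
    "If you enjoy a pleasant day of sizzling the skin off your feet or eating "
    "food so lively that you get dysentery, this vacation spot is the place for you!",
]

for text in examples:
    print(naive_sentiment(text), "|", text[:50])
```

The first sentence is scored sensibly, but the glowing recommendation in the second comes out negative because of the word "hate", and the sarcastic vacation review comes out positive. Word counting alone cannot see how the words are being used, which is exactly the gap that real sentiment analysis has to close.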