Senior Research Project
Ryan Smith
October 25, 2013
I. Title of Project: Data Analytics and the Modern World: Big Data and Big Problems
II. Statement of Purpose:
In the fast-paced, technological
world in which we live, data is becoming more and more important. Data analysis
is the study of records to provide information. It is clearly important; Google
spent over $300M on data analysis in the last 4 years alone. Yet current data
analysis tells us very little about the data. We can find out basic things such
as the percentage of people who prefer one product to another or which
advertising methods are the most effective. But there is so much more to be
discovered by advanced techniques. Data analysis is the key to unlocking the
large amounts of information available to us.
My research question is "what
are the current applications of data analysis, what companies are currently
using data analysis and how do they do it?" Through my internship as well
as additional research I hope to delve into the field of data analysis and
ultimately answer these questions.
III. Background:
The first true mathematics book I
ever read was Chaos Theory Tamed by Garnett Williams. It delves into the
study of chaos, which arises in complicated deterministic systems that obey many
mathematical laws. The tools that find chaotic trends in data are commonly
used in the data analysis field. This first piqued my interest in data analytics
because I was drawn to its mathematical nature.
I have also taken AP Statistics as a
BASIS student because I was interested in its mathematical nature. The course
expanded my knowledge of elementary data analysis and aroused my curiosity about
the more advanced techniques used in a variety of fields.
My research project at the Stanford
University Mathematics Camp was on Quantum Information Theory which is
tangentially related to Data Analysis. Quantum computers can be used to analyze
data much more efficiently and can find trends impossible to spot using only
classical computers.
I am also highly interested, more
generally, in the trends that can be found in lists of numbers. Disregarding
quantum phenomena, the entire process of the universe could be predicted if one
had enough data and knew how to look at it. This intrigues me and leads to my
interest in the field.
IV. Prior Research:
Enormous amounts of research have
been conducted due to the importance of data analytics to the business world.
An article by CIO discusses big data and says that it
"refers to very large data sets, particularly those not neatly organized to fit into a traditional data warehouse" (Carr, 2012). The article then discusses a major tool of data analysis: Hadoop. Hadoop is an open-source platform for data processing that follows the NoSQL approach, meaning that it uses "not only SQL" to analyze web data. Many CIOs of large companies are on record supporting Hadoop because of the time it saves by analyzing raw data without preprocessing.
The article also delves into a major
need in the world of big data: the need for speed. Because there is so much
data being gathered, speed of analysis is becoming more important to allow for
quick decision making. Processing has sped up thanks both to advances in
computer technology and to smarter storage designs such as
columnar databases. These columnar databases allow for quicker access to data
because a query can read only the columns it needs rather than an entire table.
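The columnar idea can be illustrated with a minimal Python sketch. It is not how any particular database is implemented; the `rows` table, its field names, and the `to_columns` helper are all hypothetical and exist only for this illustration. It shows the same small table stored row by row and column by column, and the same total computed from each layout.

```python
# Hypothetical sales table used only for illustration.
rows = [
    {"customer": "A", "product": "widget", "price": 4.0},
    {"customer": "B", "product": "gadget", "price": 6.5},
    {"customer": "C", "product": "widget", "price": 4.0},
]


def to_columns(table):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every full record is visited just to read one field.
total_from_rows = sum(row["price"] for row in rows)

# Column layout: only the "price" column is touched.
total_from_columns = sum(columns["price"])

print(total_from_rows, total_from_columns)
```

Both totals agree, but in the column layout the customer and product values are never read. On a table with many columns and millions of rows, that is the kind of saving the article attributes to columnar storage.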
The falling costs of technology and
the desire for mobile data have also been important in the spread of data
analytics. Cheaper storage makes it feasible to keep such massive
amounts of information. And because corporations and their
clients want this data accessible from mobile devices, the data must be
processed into a form that is easily interpretable on a phone screen rather than
a monitor.
The article's last major point is on
the impact of social networking. It says,
"With the explosion of Facebook, Twitter and other social media, more
companies want to analyze the data these sites generate" (Carr, 2012).
Taken together, the article shows the importance of
data analytics in the modern world.
Market Watch's article on Hadoop
shows its importance to the field as a tool of analysis. While shorter than the
prior article, it still gives an interesting stance on Hadoop. The article
documents SGI and Extreme Networks partnering to improve Hadoop and make it a
faster tool. With these new improvements, Market Watch claims that the Big Data
industry will reach a value of $24 B by 2016 (Comtex, 2013). The firms are
mainly collaborating on increasing the efficiency of the platform and giving it
better specs to work with. This desire for efficiency shows the importance of
fast and accurate calculation of data in the business world.
An article by IDG
brings to light the competitive search for data analytics talent. Interesting
numbers from the article include that by 2018 there could be a shortage of over
1.8 M people with the proper understanding of data analytics, which will make
hiring practices even more competitive (Hein, 2012). The article also states
that margins could increase by 60% through proper use of data analysis techniques.
These numbers show the importance of data analysis and the desire of companies
to keep up with current trends.
The article says these changes arise
from data analysis companies finding more and more ways to use data and pull
results from it. As the number of uses and possible interpretations of data
increases, so does the number of people needed to analyze this data. This leads
to the enormous need for talent in the field to accommodate the demand for data
analysis techniques.
An article from Tech Republic
emphasizes the need for data analytics because information needs to be
understandable and usable by all. There is a major need for the right
information to be presented. The 80-20 rule says that only 20% of reports are
useful, and this trend holds in Big Data analysis. Thus, tools are necessary
that can figure out which pieces of information are most relevant and can then
streamline their delivery to those who need them. The article argues that there
needs to be a bridge between the various levels of necessary data, from the
overall picture needed by the firm's overseers to the nitty-gritty detail needed by the
analytics teams.
V. Significance:
This project is worth undertaking
because it will consolidate a large amount of categorical data on data analysis
techniques. The Senior Project Committee should allow me to conduct this
research both to expand my knowledge of data analysis and to add to the
existing compilations of data analysis techniques. While new knowledge will not
necessarily be added, existing data will be compiled into an easily
readable form rather than left scattered across several hard-to-access sources.
I bring the perspective of a fresh
pair of eyes to the data analysis scene. I am interested in both theoretical
techniques and practical applications, which allows for unique insights. My final
presentation will provide a broader picture of the data analysis scene to
novices in the field and allow for easier explanation to laymen.
VI. Research Design & Methods:
I will be serving as an intern at
the data analytics firm Axtria. Through this I plan to learn more about the
field of data analysis as well as gain an understanding of its use in the
workplace. I will also be doing independent study in the field in hopes of broadening
my knowledge and answering my research questions.
I intend to analyze the companies of
Google, Netflix, Amazon, and Facebook in terms of how they use data to fill
various roles like creating smart advertisements or suggesting what movies to
watch. I will research the individual companies' methods for gathering and using
data. I will then incorporate all of this information into my final
presentation on how companies use data analytics.
VII. Problems:
The internship at Axtria is an
unknown variable and I do not know what exactly I will be learning or how much
I will be able to use towards answering my research questions. I can get around
this by compensating with independent research. The field is very large, which
makes it more difficult to obtain all the information necessary to answer my
research questions. I can get around this by narrowing my field of study to a
more specific portion of the field once I know what I will be doing during my
internship. I am not sure how much information is readily available about the
various companies I am studying in terms of their data use. I can compensate
for this by focusing on the ones that provide information and possibly researching
other companies if the ones I have selected are not viable.
VIII. Bibliography:
Carr, David F. (2012, March 23). 5 Business Analytics Tech Trends and How to Exploit Them. CIO Magazine Online Edition. Retrieved from http://www.cio.com/article/702779/5_Business_Analytics_Tech_Trends_and_How_to_Exploit_Them.
COMTEX. (2013, November 18). SGI and Extreme Networks Partner to Bring High Performance Big Data and Hadoop Solutions. Market Watch: WSJ Online Edition. Retrieved from http://www.marketwatch.com/story/sgi-and-extreme-networks-partner-to-bring-high-performance-big-data-and-hadoop-solutions-2013-11-18.
Hein, Rich. (2012, December 12). IT Departments Battle for Analytics Talent. IDG News Online. Retrieved from http://news.idg.no/cw/art.cfm?id=9CCA13F5-C543-FDE7-D2A086F5F2A97C65.