Senior Research Project Proposal

Senior Research Project
Ryan Smith
October 25, 2013

I. Title of Project: Data Analytics and the Modern World: Big Data and Big Problems


II. Statement of Purpose:
In the fast-paced, technological world in which we live, data is becoming more and more important. Data analysis is the study of records to extract useful information. Its importance is clear: Google spent over $300M on data analysis in the last 4 years alone. Yet much current data analysis tells us very little about the data. We can find out basic things, such as the percentage of people who prefer one product to another or which advertising methods are the most effective, but there is much more to be discovered with advanced techniques. Data analysis is the key to unlocking the large amounts of information available to us.

My research questions are: "What are the current applications of data analysis, which companies are currently using it, and how do they do so?" Through my internship and additional research, I hope to delve into the field of data analysis and ultimately answer these questions.


III. Background:
The first true mathematics book I ever read was Chaos Theory Tamed by Garnett Williams. It delves into the study of chaos, the behavior of complicated deterministic systems that nevertheless obey mathematical laws. The tools that find chaotic trends in data are commonly used in the data analysis field. This book first piqued my interest in data analytics because I was drawn to its mathematical nature.

I also took AP Statistics as a BASIS student out of the same interest in mathematics. The course expanded my knowledge of elementary data analysis and aroused my curiosity about the more advanced techniques used in a variety of fields.

My research project at the Stanford University Mathematics Camp was on quantum information theory, which is tangentially related to data analysis. Quantum computers could, in principle, analyze certain kinds of data much more efficiently than classical computers and find trends that are impractical to spot with classical computers alone.

I am also highly interested in the trends that can be found in lists of numbers. Disregarding quantum phenomena, the entire course of the universe could in principle be predicted if one had enough data and knew how to look at it. This idea intrigues me and drives my interest in the field.


IV. Prior Research:
Enormous amounts of research have been conducted because of the importance of data analytics to the business world. An article by CIO discusses big data and says that it "refers to very large data sets, particularly those not neatly organized to fit into a traditional data warehouse" (Carr, 2012). The article then discusses a major tool of data analysis: Hadoop. Hadoop is an open-source platform for data processing that follows the NoSQL ("Not only SQL") approach, meaning that it does not rely solely on SQL to analyze web data. Many CIOs of large companies are documented as supporting Hadoop because of the time it saves by analyzing raw data without preprocessing.

The article also delves into a major need in the world of big data: the need for speed. Because so much data is being gathered, speed of analysis is becoming more important for quick decision making. Processing times have fallen thanks to both advances in computer technology and smarter storage designs such as columnar databases. Columnar databases store a table column by column, so a query can read only the small portion of the table it needs rather than every full row.
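
As a rough illustration of why column-oriented storage can speed up this kind of query, the sketch below stores the same small table in two layouts. The table, its field names, and the two helper functions are hypothetical and are not taken from the article; real columnar databases add compression and on-disk formats that are not shown here.

```python
# The same three-record table in two layouts (made-up sample data).

# Row-oriented layout: each record is stored together.
rows = [
    {"user": "a", "region": "west", "spend": 12.0},
    {"user": "b", "region": "east", "spend": 7.5},
    {"user": "c", "region": "west", "spend": 3.0},
]

# Column-oriented layout: each column is stored together.
columns = {
    "user": ["a", "b", "c"],
    "region": ["west", "east", "west"],
    "spend": [12.0, 7.5, 3.0],
}


def total_spend_rows(table):
    """Sum one field by visiting every full record."""
    return sum(record["spend"] for record in table)


def total_spend_columns(table):
    """Sum one field by reading only the column that holds it."""
    return sum(table["spend"])


if __name__ == "__main__":
    print(total_spend_rows(rows))
    print(total_spend_columns(columns))
```

Both functions return the same total. The difference is that the column layout touches only the "spend" values, while the row layout has to pass over every field of every record.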

The falling cost of technology and the desire for mobile data have also been important in the spread of data analytics. Cheaper storage makes it possible to keep massive amounts of information. Because corporations and their clients want this data accessible from mobile devices, it must also be processed into a form that is easy to interpret on a phone screen rather than a monitor.

The article's last major point is the impact of social networking. It says, "With the explosion of Facebook, Twitter and other social media, more companies want to analyze the data these sites generate" (Carr, 2012). Overall, this article shows the importance of data analytics in the modern world.

Market Watch's article on Hadoop shows its importance to the field as a tool of analysis. While shorter than the prior article, it still gives an interesting perspective on Hadoop. The article documents SGI and Extreme Networks partnering to improve Hadoop and make it a faster tool. Market Watch also claims that the Big Data industry will reach a value of $24B by 2016 (Comtex, 2013). The firms are mainly collaborating on increasing the efficiency of the platform and giving it better hardware to run on. This desire for efficiency shows the importance of fast and accurate calculation of data in the business world.

An article from IDG brings to light the competitive search for data analytics talent. According to the article, by 2018 there could be a shortage of over 1.8M people with a proper understanding of data analytics, which will make hiring even more competitive (Hein, 2012). The article also states that margins could increase by 60% through proper use of data analysis techniques. These numbers show the importance of data analysis and the desire of companies to keep up with current trends.

The article says these changes arise from data analysis companies finding more and more ways to use data and pull results from it. As the number of uses and possible interpretations of data increases, so does the number of people needed to analyze it. This leads to an enormous need for talent in the field to meet the demand for data analysis techniques.

An article from Tech Republic emphasizes the need for data analytics because information needs to be understandable and usable by all. There is a major need for the right information to be presented. The 80-20 rule says that only 20% of reports are useful, and this trend holds in Big Data analysis. Thus, tools are necessary that can figure out which pieces of information are most relevant and then streamline their delivery to those who need them. The article argues that there needs to be a bridge between the various levels of necessary data, from the overall picture needed by a firm's overseers to the nitty-gritty detail needed by the analytics teams.

V. Significance:
This project is worth undertaking because it will consolidate a large amount of information on data analysis techniques, organized by category. The Senior Project Committee should allow me to conduct this research so that I can expand my own knowledge of data analysis as well as the available compilations of data analysis techniques. While new knowledge will not necessarily be added, existing information will be compiled into an easily readable form rather than left scattered across several inaccessible sources.

I bring the perspective of a fresh pair of eyes to the data analysis scene. I am interested in both theoretical techniques and practical applications, which allows for unique insights. My final presentation will provide a broader picture of the data analysis scene to novices in the field and allow for easier explanation to laymen.


VI. Research Design & Methods:
I will be serving as an intern at the data analytics firm Axtria. Through this internship I plan to learn more about the field of data analysis and gain an understanding of its use in the workplace. I will also be doing independent study in the field in the hope of broadening my knowledge and answering my research questions.

I intend to analyze how Google, Netflix, Amazon, and Facebook use data to fill various roles, such as creating smart advertisements or suggesting what movies to watch. I will research each company's methods for gathering and using data. I will then incorporate all of this information into my final presentation on how companies use data analytics.



VII. Problems:
The internship at Axtria is an unknown variable: I do not know exactly what I will be learning or how much of it I will be able to use towards answering my research questions. I can compensate for this with independent research. The field is also very large, which makes it difficult to gather all the information necessary to answer my research questions. I can address this by narrowing my study to a more specific portion of the field once I know what I will be doing during my internship. Finally, I am not sure how much information is readily available about how the companies I am studying use their data. I can compensate for this by focusing on the companies that do provide information and possibly researching other companies if the ones I have selected are not viable.


VIII. Bibliography:

Carr, David F. (2012, March 23). 5 Business Analytics Trends and How to Exploit Them. CIO Magazine Online Edition. Retrieved from http://www.cio.com/article/702779/5_Business_Analytics_Tech_Trends_and_How_to_Exploit_Them

COMTEX. (2013, November 18). SGI and Extreme Networks Partner to Bring High Performance Big Data and Hadoop Solutions. Market Watch: WSJ Online Edition. Retrieved from http://www.marketwatch.com/story/sgi-and-extreme-networks-partner-to-bring-high-performance-big-data-and-hadoop-solutions-2013-11-18


Hein, Rich. (2012, December 12). IT Departments Battle for Analytics Talent. IDG News Online. Retrieved from http://news.idg.no/cw/art.cfm?id=9CCA13F5-C543-FDE7-D2A086F5F2A97C65
