21 pages, grade: 1.0
2 The basics of Big Data
2.2 Historic Development
2.3 Usage and Application
3 Market overview Big Data solutions
3.1 Apache Hadoop - Enabling Big Data
3.2 Market analysis of Big Data solutions
3.3 Cost estimates
3.4 Big Data solutions of selected vendors
Market overview of Big Data solutions
Early on in my career, I was responsible for implementing a data warehouse solution for the largest university in Austria, with more than 7,000 employees. After a rigid RFQ process, we selected Cognos as the tool of choice, and we successfully implemented a solution based on a relational Oracle database. Ever since then, business intelligence has fascinated me as a tool to turn data into information. Due to the continuing digitalization of our everyday environment, the amount of data collected by all kinds of devices, processes, and human and machine interactions is growing exponentially. The desire to analyze ever more data, for example to better understand customer needs and manufacturing efficiency, or to create predictive analyses based on past consumer behavior, drove the need to enhance the functionality of existing business intelligence solutions towards a more open Big Data architecture that allows the analysis of massive amounts of structured and unstructured data.
In recent years, several well-known IT companies have released new products that specialize in Big Data analysis. In this research paper, I take a look at the current Big Data vendors and present the status quo of the leading Big Data solutions.
The term Big Data refers to amounts of data that exceed the processing capacity of conventional database systems. In practice, the term is used not only for large amounts of data, but also for the methods used to analyze these data. Thus, "Big Data" can mean the data volume, the method, or the system with which the data are analyzed and evaluated. Big Data is not based on a single technology; it is the result of the interaction of a whole series of innovations and technologies. There is therefore no single, isolated solution that enables the analysis of large amounts of data. It is the combination of different technological developments that allows companies willing to invest in these technologies to transform huge amounts of data into information.
Big Data has four important characteristics:
1. Volume: An increasing number of organizations and enterprises have gigantic data volumes ranging from a few terabytes to petabytes.
2. Variety: More and more data sources are of different types, which can be grouped roughly into unstructured, semi-structured and structured data. Companies are increasingly collecting external data, for example from social networks.
3. Velocity: Huge amounts of data have to be evaluated quickly, often in real time. The processing speed has to keep up with the pace of data growth.
4. Veracity: The collected data have to be clean, which means preventing "dirty" data from accumulating in the databases. Valid data are the basis for all further analyses.
Due to advancing technological development and digitalization, companies and organizations have more and more opportunities to collect different data that can be analyzed to gain a competitive advantage. As a result, the volume of collected data has risen rapidly in recent years. Already back in 2012, an estimated 2.5 exabytes of data were being produced every day, and this number is expected to double every 40 months. This data volume eventually became too large to be evaluated by conventional database systems.
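The growth figures quoted above imply a simple exponential model. The following sketch projects the daily data volume under the stated assumptions (about 2.5 exabytes per day in 2012, doubling roughly every 40 months); the function name and parameters are illustrative, not from any source.

```python
def daily_volume_exabytes(months_since_2012: float,
                          base_eb_per_day: float = 2.5,
                          doubling_months: float = 40.0) -> float:
    """Exponential growth: the volume doubles every `doubling_months`."""
    return base_eb_per_day * 2 ** (months_since_2012 / doubling_months)

# Example: projected daily volume five years (60 months) after 2012.
print(round(daily_volume_exabytes(60), 1))  # -> 7.1 (exabytes per day)
```

Under this model the daily volume would already have roughly tripled by 2017, which illustrates why conventional database systems fell behind.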
Another problem with this enormous amount of collected data is that much of it is unstructured. There is no uniform data format, since data comes from more and more different sources. These different data formats could not be analyzed and interpreted by traditional databases and data models, and the data volume in particular exceeded the limits of the underlying database systems.
A possible solution to these problems was offered by the introduction of Apache Hadoop, an open-source framework for parallel data processing on highly scalable server clusters. With the release of Hadoop, it became possible to analyze these permanently growing amounts of data and to exploit them for one's own advantage almost in real time, using various analysis methods.
Once the potential of Apache Hadoop had been recognized, many BI vendors integrated their own Big Data solutions into database systems based on the Hadoop framework. Thus, Hadoop and the Big Data solutions of the individual providers are constantly being developed and improved.
There are also more and more small and medium-sized enterprises (SMEs) that have recognized the potential of Big Data and are working on integrating Big Data processes. This means that not only large companies such as the global players are affected; SMEs, too, have started to turn Big Data into deep insights.
In addition, SMEs are developing and enhancing their own solutions for their Big Data problems and are offering them to other companies. This is how Big Data software solutions emerge that differ greatly from one another.
Big Data solutions are used for the analysis and reporting of very large data sets with different data formats.
Big Data offers businesses and organizations the opportunity to work with many petabytes of data in a single database system. This opens up new ways for companies and organizations to use and analyze the collected data for their advantages.
As can be seen in the following figure, data volumes are expected to keep growing, with only a portion of this data being useful. The majority of the data is not relevant, and it is precisely this data that is sorted out by the Big Data database functionalities, so that only the relevant data is kept for further analyses.
illustration not visible in this excerpt
Figure 1: Opportunity for Big Data
For example, Big Data methods are increasingly being used for marketing purposes such as personalized product recommendations, trend monitoring and sales forecasting, but they also offer many advantages and new possibilities in almost all industries, e.g. in risk management and fraud detection, innovation, production efficiency, talent management and hiring, or preventive maintenance.
In the coming years, companies will increasingly use Big Data to support their decision-making and business processes. This will provide companies with competitive advantages over other market participants and thus increase their chances of achieving a better market position. This could be done, for example, with predictive analysis, which would allow companies to identify upcoming market changes earlier than their competition and give them time to prepare accordingly.
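To make the idea of predictive analysis concrete, the following is a deliberately minimal sketch, not a production technique: a least-squares linear trend is fitted to fabricated quarterly sales figures and extrapolated one quarter ahead. All numbers and names here are invented for illustration.

```python
def linear_trend(values):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

quarterly_sales = [100.0, 104.0, 109.0, 115.0]  # fabricated example data
a, b = linear_trend(quarterly_sales)
# Project the next (fifth) quarter from the fitted trend line.
forecast_next = a + b * len(quarterly_sales)
print(round(forecast_next, 1))  # -> 119.5
```

Real predictive analytics pipelines use far richer models and data, but the principle is the same: historical data is used to anticipate a change before it happens.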
Hadoop is a project of the Apache Software Foundation that provides open-source utilities, program libraries, and a framework for the development and execution of distributed programs running on clusters of hundreds or thousands of nodes. Hadoop is used to implement the search and context mechanisms of many heavily visited websites, such as Yahoo or Facebook. Hadoop was developed based on Google's MapReduce algorithm: an application is subdivided into a large number of similar elementary tasks that run on cluster nodes and are then combined into the end result.
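The decomposition described above can be sketched in plain Python (this is the MapReduce idea in miniature, not the Hadoop API): the input is split into chunks, a map step emits key/value pairs for each chunk, and a reduce step combines all values per key into the end result.

```python
from collections import defaultdict

def map_phase(chunk: str):
    # Map: emit a (word, 1) pair for every word in this chunk.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce: sum the emitted counts per word across all chunks.
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Two "chunks" standing in for input splits on different cluster nodes.
chunks = ["big data big clusters", "data nodes"]
intermediate = [pair for chunk in chunks for pair in map_phase(chunk)]
print(reduce_phase(intermediate))
# -> {'big': 2, 'data': 2, 'clusters': 1, 'nodes': 1}
```

In a real Hadoop cluster, the map tasks run in parallel on the nodes holding the data, and the framework shuffles the intermediate pairs to the reducers; this sketch only shows the logical structure.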
The project consists of four modules: (1) Hadoop Common (middleware: a collection of infrastructural program libraries and utilities used by the other modules and related projects), (2) HDFS (the Hadoop Distributed File System, used to store large data sets across the cluster nodes), (3) YARN (a system for task scheduling and cluster management) and (4) Hadoop MapReduce (a platform used to program and execute distributed MapReduce calculations). A number of former subprojects later became independent projects within the Apache Software Foundation.
Hadoop is one of the fundamental Big Data technologies. Around Hadoop, an ecosystem of connected projects and technologies has emerged, most of which have developed into independent projects. Since 2005, these technologies have been actively commercialized: several companies focus on the development of commercial Hadoop distributions and technical support services for this ecosystem. Virtually all major suppliers of BI and analytics tools include Hadoop in their product strategies and in their range of solutions.
The implementation of a Big Data software solution is a strategic decision by a company to become an analytical competitor, i.e. to create competitive advantages through data analysis. This goal has to be put in context with the company's core competencies: for a company that focuses solely on the processing and analysis of data, this objective is naturally more important than it is for others.
After a company has developed into an analytical competitor, there is a further improvement at this level that contributes to a competitive advantage. This aspect is called "information supremacy". Information supremacy means that a company is perceived as a reliable supplier of quality-assured analyses to support decision-making processes.
Big Data has grown enormously worldwide over the past few years. Global sales of Big Data products and business analytics applications, tools and services rose to around 122 billion USD in 2015 and are expected to increase to 187 billion USD by 2019. According to the market research firm IDC, the top industries in 2019 will be discrete manufacturing (23 billion USD), banking (22 billion USD) and process manufacturing (16 billion USD). Other industries with more than 10 billion USD in Big Data revenues are federal and central government, professional services, telecommunications and retail.
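As a back-of-the-envelope check, the quoted figures imply the following compound annual growth rate (CAGR); the calculation is a sketch based only on the two IDC data points mentioned above.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction, e.g. 0.11 = 11%."""
    return (end_value / start_value) ** (1 / years) - 1

# Growth from ~122 billion USD (2015) to ~187 billion USD (2019).
growth = cagr(122.0, 187.0, 2019 - 2015)
print(f"{growth:.1%}")  # -> 11.3%
```

An implied growth rate of roughly 11% per year underlines why so many vendors are pushing into this market.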
The dominant players in the Big Data software solutions market are the well-known companies IBM, Microsoft, Oracle and SAP. However, these companies cannot rest on their market leadership: others, such as Exasol or Teradata, are also pushing to the top. IBM leads thanks to its many acquisitions in the BI and BA sectors, the technical advancement of its solution and its strong consulting competence. The following chart shows a comparison of providers of Big Data analytics, taken from the Big Data Vendor Benchmark 2016, which has been conducted annually since 2013 by Experton Group, a well-known German market research and consulting firm. In the chart, the suppliers are divided into four categories: Product Challenger, Leader, Follower and Market Challenger. All vendors listed are benchmarked.
illustration not visible in this excerpt
Figure 2: Comparison of providers for Big Data Analytics