In this paper we discuss security issues in the big data Hadoop environment. Big data applications bring great benefits to organizations, businesses, and industries of every scale. At the same time, the velocity, variety, and volume of big data magnify its security and privacy issues. The Hadoop community now treats security as a top agenda item and classifies it as critical. With the increasing adoption of Hadoop, there is a growing effort to build a comprehensive set of security features. Traditional security mechanisms, which are tailored to securing small-scale static data, are therefore inadequate. The key issues for Hadoop are authentication, authorization, auditing, and encryption within a cluster. In this paper we highlight these different security aspects of big data Hadoop.
Table of Contents
I. INTRODUCTION
A. Volume
B. Variety
C. Velocity
D. Complexity
II. TRADITIONAL HADOOP SECURITY
III. SECURITY ISSUES AND CHALLENGES
A. Fragmented data
B. Node to node communication
C. Distributed computing
D. Interaction with client
E. Controlling data access
IV. SECURITY SOLUTIONS FOR HADOOP
A. Authentication
1) Apache Knox
B. Authorization
1) Apache Sentry
C. Encryption
1) Project Rhino
Objectives and Topics
The primary objective of this paper is to analyze the critical security and privacy challenges inherent in Big Data Hadoop environments and to evaluate existing technological solutions designed to mitigate these risks. The research focuses on the transition from traditional, static data security mechanisms to robust frameworks capable of protecting distributed, high-velocity, and high-volume data ecosystems.
- Evolution of Hadoop security models and historical limitations.
- Core security challenges including data fragmentation, distributed computing risks, and access control.
- Authentication strategies using Kerberos, SPNEGO, and Apache Knox.
- Authorization frameworks, specifically focusing on Apache Sentry and role-based access control.
- Encryption techniques for data in motion and at rest, including the integration of Project Rhino.
Excerpt from the Book
III. SECURITY ISSUES AND CHALLENGES
Hadoop presents a distinct set of security issues for data center managers and security professionals. The main security issues and challenges are:
A. Fragmented data. Big data clusters contain data that is fluid, with multiple copies moving from node to node to provide redundancy and resiliency [7]. The data is fragmented and shared among multiple servers, which adds complexity because there is no security model designed to handle this situation.
B. Node to node communication. A key issue with Hadoop is that it does not implement secure node-to-node communication; it relies on plain RPC over TCP/IP [7].
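Hadoop does expose a switch for hardening its RPC channel. A minimal sketch of the relevant `core-site.xml` fragment is shown below; the choice of the `privacy` level (rather than the weaker `authentication` or `integrity`) is illustrative:

```xml
<!-- core-site.xml: protect Hadoop's node-to-node RPC traffic -->
<property>
  <name>hadoop.rpc.protection</name>
  <!-- authentication = auth only;
       integrity     = auth + message integrity checks;
       privacy       = auth + integrity + encryption on the wire -->
  <value>privacy</value>
</property>
```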
C. Distributed computing. In distributed computing, data is processed at whatever instant and location it is available, so the number of exposed resources increases. This results in a higher risk of attack than in centralized computing.
D. Interaction with client. Clients communicate with the resource manager and with individual nodes. Even though this communication is efficient, it is difficult to shield the nodes from clients and the name server from the nodes.
E. Controlling data access. Available database security schemas provide only role-based access control. Big data systems were designed with very little security in mind: big data installations are based on a web services model with very little protection against web threats, making them highly susceptible to attack.
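HDFS does offer finer-grained controls than role-based schemas alone. As a brief sketch using HDFS's POSIX-style ACL commands (the user name and path are made up for illustration):

```shell
# Grant user 'alice' read/execute on a directory, beyond the owner/group bits
hdfs dfs -setfacl -m user:alice:r-x /data/sales

# Inspect the resulting ACL entries
hdfs dfs -getfacl /data/sales
```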
Summary of Chapters
I. INTRODUCTION: This chapter defines Big Data by its core properties (Volume, Variety, Velocity, Complexity) and provides an overview of the Hadoop framework and its components.
II. TRADITIONAL HADOOP SECURITY: This section discusses the historical lack of security in early Hadoop versions and the initial attempts to implement rudimentary authentication and authorization.
III. SECURITY ISSUES AND CHALLENGES: This chapter outlines the specific vulnerabilities found in Hadoop, such as risks during node-to-node communication, distributed computing threats, and issues with fragmented data.
IV. SECURITY SOLUTIONS FOR HADOOP: This section provides a technical deep dive into modern security measures, covering authentication (Apache Knox), authorization (Apache Sentry), and encryption (Project Rhino) to protect sensitive data.
Keywords
Big Data, Hadoop, HDFS, MapReduce, Security, Authentication, Authorization, Encryption, Kerberos, Apache Knox, Apache Sentry, Project Rhino, Data Privacy, Distributed Systems, Network Security
Frequently Asked Questions
What is the core focus of this publication?
The paper focuses on identifying and analyzing the security vulnerabilities present within Big Data Hadoop environments and exploring modern technological solutions to address these risks.
Which key areas of security are discussed in the paper?
The main areas covered are authentication of users and services, fine-grained authorization, and encryption for both data at rest and data in motion.
What is the primary objective of the research?
The primary objective is to highlight that traditional security mechanisms are inadequate for Big Data and to propose advanced, robust frameworks suitable for securing large-scale distributed Hadoop clusters.
What scientific or technical methods are addressed?
The paper addresses technical implementation methods such as Kerberos, SPNEGO, Apache Knox, Apache Sentry, and block-level encryption techniques like Project Rhino.
What is covered in the main body of the work?
The main body examines the specific challenges arising from data fragmentation, node-to-node communication, and distributed computing, followed by detailed solutions for securing the Hadoop ecosystem.
Which keywords best characterize this work?
The work is characterized by terms such as Big Data, Hadoop, security, authentication, authorization, encryption, and distributed systems.
How does Apache Knox contribute to Hadoop security?
Apache Knox provides a single point of secure access and authentication for Hadoop clusters, acting as a gateway between clients and the clusters so that the clusters themselves can remain behind a firewall.
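As a sketch of this gateway pattern, the snippet below builds a WebHDFS request URL routed through a Knox gateway instead of hitting the cluster directly. The host name, topology name (`default`), and port (Knox's conventional 8443) are assumptions about a particular deployment:

```python
from urllib.parse import quote, urlencode

def knox_webhdfs_url(host: str, topology: str, path: str,
                     op: str = "LISTSTATUS") -> str:
    """Build a WebHDFS URL routed through an Apache Knox gateway.

    Clients talk only to the gateway; Knox authenticates them and
    forwards the call to the cluster's WebHDFS service.
    """
    # Knox exposes cluster services under /gateway/<topology>/...
    return (f"https://{host}:8443/gateway/{topology}/webhdfs/v1"
            f"{quote(path)}?{urlencode({'op': op})}")

url = knox_webhdfs_url("knox.example.com", "default", "/user/alice")
print(url)
# The request itself could then be made with any HTTP client, e.g.:
# requests.get(url, auth=("alice", "alice-password"), verify="/path/to/ca.pem")
```

Because every call goes through the gateway, credentials are checked at one perimeter point rather than by each individual node.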
What is the role of Project Rhino in the Hadoop ecosystem?
Project Rhino provides block-level encryption for data at rest and enhances HBase security by adding cell-level access control and transparent encryption for tables.
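Much of this encryption work surfaced in Hadoop as HDFS transparent data encryption. A sketch of setting up an encryption zone is shown below; the key and path names are illustrative, and the commands assume a running cluster with a configured Hadoop KMS:

```shell
# Create an encryption key in the Hadoop KMS
hadoop key create mykey

# Mark an (empty) directory as an encryption zone tied to that key
hdfs crypto -createZone -keyName mykey -path /secure

# Files written under /secure are now encrypted and decrypted transparently
hdfs dfs -put report.csv /secure/
```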
- Cite this work
- Rohit Sharma (Author), 2018, Security Issues of Big Data Hadoop, München, GRIN Verlag, https://www.hausarbeiten.de/document/413453