Hadoop Developer Training

5216 Learners

Big Data Hadoop Developer training delivers the key concepts and expertise necessary to develop robust data processing applications using Apache Hadoop. Interactive sessions and demonstrations led by an industry expert help participants grasp the features and programming skills with ease. The Hadoop Developer course covers both fundamental and advanced topics: Hadoop, MapReduce, the Hadoop Distributed File System (HDFS), Hadoop clusters, Pig, Hive, HBase, ZooKeeper, Sqoop, and Flume.

By the end of Hadoop Developer training, the participants will be able to:

  • Describe the concepts of Apache Hadoop, Hadoop Ecosystem, MapReduce, and HDFS
  • Develop, debug, and implement the MapReduce applications
  • Set up different configurations of Hadoop cluster
  • Maintain and monitor Hadoop cluster by considering the optimal hardware and networking settings
  • Leverage Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, and other projects from the Apache Hadoop ecosystem
Target audience

Experienced developers who want to write, maintain, and/or optimize Apache Hadoop code

Prerequisites

Candidates with programming experience, preferably in Java, will get the most out of this training. However, candidates with exposure to other programming languages such as PHP, Python, or C# can also benefit.

Hadoop Developer Training Course Content

1. Meet Hadoop

  • Data
  • Data Storage and Analysis
  • Comparison with Other Systems
  • RDBMS
  • Grid Computing
  • Volunteer Computing
  • A Brief History of Hadoop
  • Apache Hadoop and the Hadoop Ecosystem
  • Hadoop Releases

2. MapReduce

  • A Weather Dataset
  • Data Format
  • Analyzing the Data with Unix Tools
  • Analyzing the Data with Hadoop
  • Map and Reduce
  • Java MapReduce
  • Scaling Out
  • Data Flow
  • Combiner Functions
  • Running a Distributed MapReduce Job
  • Hadoop Streaming
  • Compiling and Running
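
The map-sort-reduce flow covered in this module can be sketched with a word count in the Hadoop Streaming style (Streaming is listed above): the framework feeds input lines to a mapper, sorts the mapper's key/value output, and hands each key's group to a reducer. The local pipeline below is an illustration only, not production Hadoop code.

```python
from itertools import groupby


def mapper(lines):
    """Emit (word, 1) for every word on every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum counts for each word; input must be sorted by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate the Streaming pipeline: map, shuffle-sort, reduce."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_local(data))  # {'the': 3, ..., 'fox': 2, ...}
```

Under real Hadoop Streaming, the mapper and reducer would be separate scripts reading stdin and writing tab-separated pairs to stdout; the sort between them is done by the framework's shuffle.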

3. The Hadoop Distributed File System (HDFS)

  • The Design of HDFS
  • HDFS Concepts
  • Blocks
  • Namenodes and Datanodes
  • HDFS Federation
  • HDFS High-Availability
  • The Command-Line Interface
  • Basic Filesystem Operations
  • Hadoop Filesystems
  • Interfaces
  • The Java Interface
  • Reading Data from a Hadoop URL
  • Reading Data Using the FileSystem API
  • Writing Data
  • Directories
  • Querying the Filesystem
  • Deleting Data
  • Data Flow
  • Anatomy of a File Read
  • Anatomy of a File Write
  • Coherency Model
  • Parallel Copying with distcp
  • Keeping an HDFS Cluster Balanced
  • Hadoop Archives
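
The block concept at the heart of HDFS can be illustrated with a small sketch. HDFS stores a file as a sequence of fixed-size blocks (128 MB by default in Hadoop 2, configured via `dfs.blocksize`; earlier releases defaulted to 64 MB), and the last block may be smaller than the block size. This is a local illustration, not the HDFS API.

```python
# Sketch of how HDFS lays a file out as fixed-size blocks.
# A block only occupies as much disk as it actually holds, so a
# 1 KB file in a 128 MB-block filesystem uses 1 KB, not 128 MB.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # Hadoop 2 default


def block_layout(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# A 300 MB file needs three blocks: 128 MB + 128 MB + 44 MB.
print(block_layout(300 * 1024 * 1024))
```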

4. Hadoop I/O

  • Data Integrity
  • Data Integrity in HDFS
  • LocalFileSystem
  • ChecksumFileSystem
  • Compression
  • Codecs
  • Compression and Input Splits
  • Using Compression in MapReduce
  • Serialization
  • The Writable Interface
  • Writable Classes
  • File-Based Data Structures
  • SequenceFile
  • MapFile
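
The data-integrity topic above can be made concrete with a checksum sketch. HDFS computes a CRC-32 checksum for every chunk of data it stores (chunk size set by `dfs.bytes-per-checksum`, 512 bytes by default) and verifies chunks on read. The sketch below mimics that scheme locally with Python's `zlib.crc32`; it is an illustration of the idea, not HDFS's implementation.

```python
import zlib

CHUNK = 512  # matches the dfs.bytes-per-checksum default


def checksum_chunks(data, chunk=CHUNK):
    """Return one CRC-32 per chunk of data."""
    return [zlib.crc32(data[i:i + chunk])
            for i in range(0, len(data), chunk)]


def verify(data, checksums, chunk=CHUNK):
    """Return indices of chunks whose checksum no longer matches."""
    return [i for i, crc in enumerate(checksum_chunks(data, chunk))
            if crc != checksums[i]]


original = bytes(range(256)) * 8                       # 2048 bytes, 4 chunks
stored = checksum_chunks(original)                     # written alongside data
corrupted = original[:600] + b"\x00" + original[601:]  # one byte flipped
print(verify(corrupted, stored))  # [1]: byte 600 lies in chunk 1
```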

5. Developing a MapReduce Application

  • The Configuration API
  • Combining Resources
  • Variable Expansion
  • Configuring the Development Environment
  • Managing Configuration
  • GenericOptionsParser, Tool, and ToolRunner
  • Writing a Unit Test
  • Mapper
  • Reducer
  • Running Locally on Test Data
  • Running a Job in a Local Job Runner
  • Testing the Driver
  • Running on a Cluster
  • Packaging
  • Launching a Job
  • The MapReduce Web UI
  • Retrieving the Results
  • Debugging a Job
  • Hadoop Logs
  • Tuning a Job
  • Profiling Tasks
  • MapReduce Workflows
  • Decomposing a Problem into MapReduce Jobs
  • JobControl

6. How MapReduce Works

  • Anatomy of a MapReduce Job Run
  • Classic MapReduce (MapReduce 1)
  • Failures
  • Failures in Classic MapReduce
  • Failures in YARN
  • Job Scheduling
  • The Capacity Scheduler
  • Shuffle and Sort
  • The Map Side
  • The Reduce Side
  • Configuration Tuning
  • Task Execution
  • The Task Execution Environment
  • Speculative Execution
  • Output Committers
  • Task JVM Reuse
  • Skipping Bad Records

7. MapReduce Types and Formats

  • MapReduce Types
  • The Default MapReduce Job
  • Input Formats
  • Input Splits and Records
  • Text Input
  • Binary Input
  • Multiple Inputs
  • Database Input (and Output)
  • Output Formats
  • Text Output
  • Binary Output
  • Multiple Outputs
  • Lazy Output
  • Database Output

8. MapReduce Features

  • Counters
  • Built-in Counters
  • User-Defined Java Counters
  • User-Defined Streaming Counters
  • Sorting
  • Preparation
  • Partial Sort
  • Total Sort
  • Secondary Sort
  • Joins
  • Map-Side Joins
  • Reduce-Side Joins
  • Side Data Distribution
  • Using the Job Configuration
  • Distributed Cache
  • MapReduce Library Classes
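
The secondary-sort technique listed above exploits the shuffle: MapReduce sorts only by key, so a job that needs values ordered within each group uses a composite key (natural key plus value), partitions and groups on the natural key alone, and lets the sort phase order the values. A minimal local sketch of that idea, using the book's max-temperature-per-year style of data:

```python
from itertools import groupby


def secondary_sort(pairs):
    """Group (key, value) pairs by key with values sorted per group.

    Mimics the composite-key trick: sort on (key, value), then
    group on the natural key alone, as a grouping comparator would.
    """
    ordered = sorted(pairs)  # composite-key sort: key first, then value
    return {k: [v for _, v in grp]
            for k, grp in groupby(ordered, key=lambda kv: kv[0])}


# Temperature readings arriving unordered, grouped by year:
readings = [(1950, 22), (1949, 111), (1950, 0), (1949, 78), (1950, -11)]
print(secondary_sort(readings))  # {1949: [78, 111], 1950: [-11, 0, 22]}
```

In a real job the pieces map onto a custom `WritableComparable` key, a `Partitioner`, and a grouping comparator; the sketch collapses all three into one sort-then-group step.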

9. Setting Up a Hadoop Cluster

  • Cluster Specification
  • Network Topology
  • Cluster Setup and Installation
  • Installing Java
  • Creating a Hadoop User
  • Installing Hadoop
  • Testing the Installation
  • SSH Configuration
  • Hadoop Configuration
  • Configuration Management
  • Environment Settings
  • Important Hadoop Daemon Properties
  • Hadoop Daemon Addresses and Ports
  • Other Hadoop Properties
  • User Account Creation
  • YARN Configuration
  • Important YARN Daemon Properties
  • YARN Daemon Addresses and Ports
  • Security
  • Kerberos and Hadoop
  • Delegation Tokens
  • Other Security Enhancements
  • Benchmarking a Hadoop Cluster
  • Hadoop Benchmarks
  • User Jobs
  • Hadoop in the Cloud
  • Hadoop on Amazon EC2

10. Administering Hadoop

  • HDFS
  • Persistent Data Structures
  • Safe Mode
  • Audit Logging
  • Tools
  • Monitoring
  • Logging
  • Metrics
  • Java Management Extensions
  • Routine Administration Procedures
  • Commissioning and Decommissioning Nodes
  • Upgrades

11. Pig

  • Installing and Running Pig
  • Execution Types
  • Running Pig Programs
  • Grunt
  • Pig Latin Editors
  • An Example
  • Generating Examples
  • Comparison with Databases
  • Pig Latin
  • Structure
  • Statements
  • Expressions
  • Types
  • Schemas
  • Functions
  • Macros
  • User-Defined Functions
  • A Filter UDF
  • An Eval UDF
  • A Load UDF
  • Data Processing Operators
  • Loading and Storing Data
  • Filtering Data
  • Grouping and Joining Data
  • Sorting Data
  • Combining and Splitting Data
  • Pig in Practice
  • Parallelism
  • Parameter Substitution

12. Hive

  • Installing Hive
  • The Hive Shell
  • An Example
  • Running Hive
  • Configuring Hive
  • Hive Services
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Updates, Transactions, and Indexes
  • HiveQL
  • Data Types
  • Operators and Functions
  • Tables
  • Managed Tables and External Tables
  • Partitions and Buckets
  • Storage Formats
  • Importing Data
  • Altering Tables
  • Dropping Tables
  • Querying Data
  • Sorting and Aggregating
  • MapReduce Scripts
  • Joins
  • Subqueries
  • Views
  • User-Defined Functions
  • Writing a UDF
  • Writing a UDAF

13. HBase

  • Backdrop
  • Concepts
  • Whirlwind Tour of the Data Model
  • Implementation
  • Installation
  • Test Drive
  • Clients
  • Java
  • Avro, REST, and Thrift
  • Schemas
  • Loading Data
  • Web Queries
  • HBase Versus RDBMS
  • Successful Service
  • HBase

14. ZooKeeper

  • Installing and Running ZooKeeper
  • Group Membership in ZooKeeper
  • Creating the Group
  • Joining a Group
  • Listing Members in a Group
  • Deleting a Group
  • The ZooKeeper Service
  • Data Model
  • Operations
  • Implementation
  • Consistency
  • Sessions
  • States

15. Sqoop

  • Getting Sqoop
  • A Sample Import
  • Generated Code
  • Additional Serialization Systems
  • Database Imports: A Deeper Look
  • Controlling the Import
  • Imports and Consistency
  • Direct-mode Imports
  • Working with Imported Data
  • Imported Data and Hive
  • Importing Large Objects

16. Flume

  • Introduction
    • Overview
    • Architecture
  • Data flow model
  • Reliability
  • Building Flume
    • Getting the source
    • Compile/test Flume
  • Developing custom components
    • Client
      • Client SDK
      • RPC client interface
      • RPC clients - Avro and Thrift
      • Failover Client
      • Load Balancing RPC client
    • Embedded agent
    • Transaction interface
    • Sink
    • Source
    • Channel

Drop Us a Query

+91 9555006479

Available 24x7 for your queries

Free Hadoop Developer Training Assessment

This assessment tests understanding of course content through MCQ and short answers, analytical thinking, problem-solving abilities, and effective communication of ideas.

Try it Now