- 1. The Case for Apache Hadoop
- 2. Hadoop Distributed File System
- 3. MapReduce
- 4. Overview of the Hadoop Ecosystem
- 5. Planning Your Hadoop Cluster
- 6. Hadoop Installation
- 7. Advanced Configuration
- 8. Hadoop Security
- 9. Managing and Scheduling Jobs
- 10. Cluster Maintenance
- 11. Cluster Monitoring and Troubleshooting
- 12. Populating HDFS From External Sources
- 13. Installing and Managing Other Hadoop Projects
- 14. Hadoop Distributed File System (HDFS)
1. The Case for Apache Hadoop
- Brief History of Hadoop
- Core Hadoop Components
- Fundamental Concepts
2. Hadoop Distributed File System
- HDFS Features
- HDFS Design Assumptions
- Overview of HDFS Architecture
- Writing and Reading Files
- NameNode Considerations
- An Overview of HDFS Security
- Hands-On Exercise
3. MapReduce
- What Is MapReduce?
- Features of MapReduce
- Basic MapReduce Concepts
- Architectural Overview
- MapReduce Version 2
- Failure Recovery
- Hands-On Exercise
4. Overview of the Hadoop Ecosystem
- What Is the Hadoop Ecosystem?
- Integration Tools
- Analysis Tools
- Data Storage and Retrieval Tools
5. Planning Your Hadoop Cluster
- General Planning Considerations
- Choosing the Right Hardware
- Network Considerations
- Configuring Nodes
6. Hadoop Installation
- Deployment Types
- Installing Hadoop
- Using Hadoop Manager for Easy Installation
- Basic Configuration Parameters
- Hands-On Exercise
7. Advanced Configuration
- Advanced Parameters
- Configuring Rack Awareness
- Configuring Federation
- Configuring High Availability
- Using Configuration Management Tools
8. Hadoop Security
- Why Hadoop Security Is Important
- Hadoop’s Security System Concepts
- What Kerberos Is and How It Works
- Configuring Kerberos Security
- Integrating a Secure Cluster with Other Systems
9. Managing and Scheduling Jobs
- Managing Running Jobs
- Hands-On Exercise
- FIFO Scheduler
- FairScheduler
- Configuring the FairScheduler
- Hands-On Exercise
10. Cluster Maintenance
- Checking HDFS Status
- Hands-On Exercise
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Hands-On Exercise
- NameNode Metadata Backup
- Cluster Upgrading
11. Cluster Monitoring and Troubleshooting
- General System Monitoring
- Managing Hadoop’s Log Files
- Using the NameNode and JobTracker Web UIs
- Hands-On Exercise
- Cluster Monitoring with Ganglia
- Common Troubleshooting Issues
- Benchmarking Your Cluster
12. Populating HDFS From External Sources
- An Overview of Flume
- Hands-On Exercise
- An Overview of Sqoop
- Best Practices for Importing Data
13. Installing and Managing Other Hadoop Projects
- Hive
- Pig
- HBase
14. Hadoop Distributed File System (HDFS)
- HDFS Design
- HDFS Daemons
- HDFS Federation
- HDFS HA
- Securing HDFS (Kerberos)
- File Read and Write Paths