Hadoop and Spark for Administrators Training
Introduction
Introduction to Cloud Computing and Big Data solutions
Overview of Apache Hadoop Features and Architecture
Setting up Hadoop
Planning a Hadoop cluster (on-premise, cloud, etc.)
Selecting the OS and Hadoop distribution
Provisioning resources (hardware, network, etc.)
Downloading and installing the software
Sizing the cluster for flexibility
Working with HDFS
Understanding the Hadoop Distributed File System (HDFS)
Overview of HDFS Command Reference
Accessing HDFS
Performing Basic File Operations on HDFS
Using S3 as a complement to HDFS
Overview of MapReduce
Understanding Data Flow in the MapReduce Framework
Map, Shuffle, Sort and Reduce
Demo: Computing Top Salaries
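As a rough illustration of the Map, Shuffle, Sort and Reduce flow listed above, the sketch below finds the top salary per department in plain Python, without a Hadoop cluster. The demo's actual data and code are not given in this outline, so the sample records and the function names (map_phase, shuffle_and_sort, reduce_phase, top_salaries) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records: (department, employee, salary).
RECORDS = [
    ("eng", "ana", 120),
    ("eng", "bo", 150),
    ("ops", "cy", 90),
    ("ops", "di", 110),
    ("eng", "ed", 130),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for dept, name, salary in records:
        yield dept, (salary, name)


def shuffle_and_sort(pairs):
    """Shuffle: group values by key. Sort: order the keys before reducing."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped):
    """Reduce: collapse each key's values to the entry with the highest salary."""
    return {key: max(values) for key, values in grouped}


def top_salaries(records):
    """Run the three phases in order on an in-memory list of records."""
    return reduce_phase(shuffle_and_sort(map_phase(records)))


if __name__ == "__main__":
    print(top_salaries(RECORDS))
```

In a real MapReduce job the map and reduce steps would run on separate nodes and the framework would handle the shuffle and sort between them; here all three phases run in a single process so the data flow is easy to follow.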
Working with YARN
Understanding resource management in Hadoop
Working with ResourceManager, NodeManager, Application Master
Scheduling jobs under YARN
Scheduling for large numbers of nodes and clusters
Demo: Job scheduling
Integrating Hadoop with Spark
Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
Understanding Resilient Distributed Datasets (RDDs)
Creating an RDD
Implementing RDD Transformations
Demo: Implementing a Text Search Program for Movie Titles
Managing a Hadoop Cluster
Monitoring Hadoop
Securing a Hadoop cluster
Adding and removing nodes
Running a performance benchmark
Tuning a Hadoop cluster to optimize performance
Backup, recovery and business continuity planning
Ensuring high availability (HA)
Upgrading and Migrating a Hadoop Cluster
Assessing workload requirements
Upgrading Hadoop
Moving from on-premise to cloud and vice versa
Recovering from failures
Troubleshooting
Summary and Conclusion