Shuhai Training Center - Hadoop and Spark for Administrators Training

Course Outline

Introduction

Introduction to Cloud Computing and Big Data solutions
Overview of Apache Hadoop Features and Architecture

Setting up Hadoop

Planning a Hadoop cluster (on-premise, cloud, etc.)
Selecting the OS and Hadoop distribution
Provisioning resources (hardware, network, etc.)
Downloading and installing the software
Sizing the cluster for flexibility
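
As a rough illustration of the sizing topic, the sketch below estimates how many DataNodes a given volume of data might need. The function name, the 25% temporary-space overhead and the 40 TB of usable disk per node are assumptions chosen for the example, not figures from the course. The replication factor of 3 is the HDFS default.

```python
import math


def estimate_datanodes(raw_tb, replication=3, temp_overhead=0.25,
                       usable_tb_per_node=40.0, growth=0.0):
    """Estimate the number of DataNodes needed to store raw_tb of data.

    All defaults except replication=3 are illustrative assumptions.
    """
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    projected_tb = raw_tb * (1.0 + growth)             # planned data growth
    stored_tb = projected_tb * replication             # every block is replicated
    required_tb = stored_tb * (1.0 + temp_overhead)    # room for intermediate data
    return math.ceil(required_tb / usable_tb_per_node)


print(estimate_datanodes(100))              # 10
print(estimate_datanodes(100, growth=0.5))  # 15
```

Leaving growth as an explicit parameter makes it easy to compare a cluster sized for today with one sized for a planned increase.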

Working with HDFS

Understanding the Hadoop Distributed File System (HDFS)
Overview of HDFS Command Reference
Accessing HDFS
Performing Basic File Operations on HDFS
Using S3 as a complement to HDFS
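
The basic file operations listed above can be sketched as a short sequence of `hdfs dfs` commands. The helper names and the example paths are hypothetical. The script only builds and prints the commands unless dry_run is turned off, in which case it assumes the `hdfs` client is installed and configured on the machine.

```python
import posixpath
import shlex
import subprocess


def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` invocation."""
    return ["hdfs", "dfs", *args]


def basic_file_operations(local_file, hdfs_dir):
    """Return commands that create a directory, upload, list, read and delete."""
    target = posixpath.join(hdfs_dir, posixpath.basename(local_file))
    return [
        hdfs_dfs("-mkdir", "-p", hdfs_dir),      # create the target directory
        hdfs_dfs("-put", local_file, hdfs_dir),  # copy a local file into HDFS
        hdfs_dfs("-ls", hdfs_dir),               # list the directory contents
        hdfs_dfs("-cat", target),                # print the uploaded file
        hdfs_dfs("-rm", target),                 # delete the uploaded file
    ]


def run_all(commands, dry_run=True):
    """Print each command and execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Calling run_all(basic_file_operations("/tmp/salaries.csv", "/user/demo/data")) prints the five commands without touching a cluster, which is a safe way to review them first.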

Overview of MapReduce

Understanding Data Flow in the MapReduce Framework
Map, Shuffle, Sort and Reduce
Demo: Computing Top Salaries
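
The map, shuffle, sort and reduce steps can be imitated in plain Python to show how the data flows. This is not the course's demo code and does not use Hadoop. The records and function names are made up for illustration, and the sketch assumes "top salaries" means the highest salaries per department.

```python
from collections import defaultdict

# Made-up sample records: (employee, department, salary).
RECORDS = [
    ("ana", "eng", 120),
    ("bo", "eng", 135),
    ("cy", "ops", 90),
    ("di", "ops", 105),
    ("ed", "sales", 80),
]


def map_phase(records):
    """Map: emit one (department, (salary, employee)) pair per record."""
    for name, dept, salary in records:
        yield dept, (salary, name)


def shuffle_and_sort(pairs):
    """Shuffle and sort: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped, top_n=1):
    """Reduce: keep the top_n highest salaries for each department."""
    for dept, values in grouped:
        yield dept, sorted(values, reverse=True)[:top_n]


def top_salaries(records, top_n=1):
    return dict(reduce_phase(shuffle_and_sort(map_phase(records)), top_n))


print(top_salaries(RECORDS))
# {'eng': [(135, 'bo')], 'ops': [(105, 'di')], 'sales': [(80, 'ed')]}
```

In a real job the framework performs the shuffle and sort between the mapper and reducer. Here it is written out so each stage is visible.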

Working with YARN

Understanding resource management in Hadoop
Working with ResourceManager, NodeManager, Application Master
Scheduling jobs under YARN
Scheduling for large numbers of nodes and clusters
Demo: Job scheduling
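
One way to observe how jobs are spread across scheduler queues is to ask the ResourceManager REST API for running applications. The sketch below assumes the ResourceManager web endpoint is reachable at a URL you supply (port 8088 is the usual default). The function names are hypothetical and this is not the course's demo.

```python
import json
from collections import Counter
from urllib.request import Request, urlopen


def fetch_running_apps(rm_url):
    """Fetch running applications from the ResourceManager REST API."""
    url = f"{rm_url.rstrip('/')}/ws/v1/cluster/apps?states=RUNNING"
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return json.load(response)


def apps_per_queue(payload):
    """Count applications per queue in a /ws/v1/cluster/apps response."""
    # "apps" can be null when nothing matches, so fall back to an empty dict.
    apps = (payload.get("apps") or {}).get("app", [])
    return Counter(app["queue"] for app in apps)


# Example with a hand-written payload shaped like the API response:
sample = {"apps": {"app": [
    {"id": "application_1_0001", "queue": "default", "state": "RUNNING"},
    {"id": "application_1_0002", "queue": "etl", "state": "RUNNING"},
    {"id": "application_1_0003", "queue": "etl", "state": "RUNNING"},
]}}
print(apps_per_queue(sample))  # Counter({'etl': 2, 'default': 1})
```

Against a live cluster, apps_per_queue(fetch_running_apps("http://rm-host:8088")) would give the same summary, where rm-host is a placeholder for your ResourceManager host.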

Integrating Hadoop with Spark

Setting up storage for Spark (HDFS, Amazon S3, NoSQL, etc.)
Understanding Resilient Distributed Datasets (RDDs)
Creating an RDD
Implementing RDD Transformations
Demo: Implementing a Text Search Program for Movie Titles
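
Creating an RDD and applying transformations to it can be sketched with PySpark as follows. This is an illustrative guess at what a title search might look like, not the course's demo. It assumes a text file with one movie title per line, and the function names and file path are hypothetical.

```python
def title_matches(title, term):
    """Case-insensitive substring match used as the filter predicate."""
    return term.lower() in title.lower()


def search_titles(sc, path, term):
    """Return the titles in the file at `path` that contain `term`.

    `sc` is an existing pyspark.SparkContext. The file is assumed to hold
    one movie title per line.
    """
    titles = sc.textFile(path)                       # create an RDD of lines
    matches = (titles
               .map(lambda line: line.strip())       # transformation: clean up
               .filter(lambda t: title_matches(t, term)))  # transformation: search
    return matches.collect()                         # action: runs the job


# Usage on a machine with PySpark installed (path is a placeholder):
#   from pyspark import SparkContext
#   sc = SparkContext(appName="MovieTitleSearch")
#   print(search_titles(sc, "hdfs:///user/demo/movies.txt", "star"))
#   sc.stop()
```

The map and filter calls are lazy transformations. Nothing is read from storage until collect() is called.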

Managing a Hadoop Cluster

Monitoring Hadoop
Securing a Hadoop cluster
Adding and removing nodes
Running a performance benchmark
Tuning a Hadoop cluster to optimize performance
Backup, recovery and business continuity planning
Ensuring high availability (HA)

Upgrading and Migrating a Hadoop Cluster

Assessing workload requirements
Upgrading Hadoop
Moving from on-premise to cloud and vice-versa
Recovering from failures

Troubleshooting

Summary and Conclusion
