- An Introduction to Spark
- About Resilient Distributed Datasets and DataFrames
- Spark application programming
- An Introduction to Spark libraries
- About Spark configuration, monitoring and tuning
1. An Introduction to Spark
- What is Spark and what is its purpose?
- Components of the Spark unified stack
- Resilient Distributed Dataset (RDD)
- Downloading and installing Spark standalone
- Scala and Python overview
- Launching and using Spark’s Scala and Python shells (a shell session is sketched below)
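As a first taste of the shells above, here is a minimal session sketch. It assumes a local standalone install launched with `./bin/spark-shell`, and `README.md` stands in for any small text file:

```scala
// Inside spark-shell, a SparkContext is pre-built and bound to `sc`.

// Build an RDD from a local text file and run a simple action.
val lines = sc.textFile("README.md") // lazily defines the RDD
val count = lines.count()            // action: triggers the actual read
println(s"Line count: $count")
```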
2. About Resilient Distributed Datasets and DataFrames
- Understand how to create parallelized collections and external datasets
- Work with Resilient Distributed Dataset (RDD) operations
- Utilize shared variables and key-value pairs (example below)
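The three topics above fit into a few lines. A minimal sketch, assuming it runs inside `spark-shell` where the SparkContext `sc` already exists:

```scala
// Parallelized collection: distribute a local sequence as an RDD.
val nums = sc.parallelize(1 to 10)

// RDD operations: transformations are lazy, actions trigger execution.
val evens = nums.filter(_ % 2 == 0)     // transformation
println(evens.collect().mkString(", ")) // action

// Key-value pairs: the classic word count on an in-memory dataset.
val words  = sc.parallelize(Seq("spark", "rdd", "spark"))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)

// Shared variables: a read-only broadcast and a write-only accumulator.
val factor = sc.broadcast(10)
val total  = sc.longAccumulator("total")
nums.foreach(n => total.add(n * factor.value))
println(total.value) // accumulator values are read back on the driver
```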
3. Spark application programming
- Understand the purpose and usage of the SparkContext
- Initialize Spark with the various programming languages
- Describe and run some Spark examples
- Pass functions to Spark
- Create and run a Spark standalone application
- Submit applications to the cluster (a sample application follows)
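A minimal sketch of the standalone application this module builds toward; the object name `SimpleApp`, the `isError` predicate, and the input handling are illustrative, not taken from the course materials:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  // A named function passed to Spark. Functions shipped to executors
  // must be serializable, so prefer small top-level functions over
  // closures that capture large driver-side objects.
  def isError(line: String): Boolean = line.contains("ERROR")

  def main(args: Array[String]): Unit = {
    // In a standalone app you create the SparkContext yourself; the
    // master URL is normally supplied later by spark-submit.
    val conf = new SparkConf().setAppName("SimpleApp")
    val sc   = new SparkContext(conf)

    val logs   = sc.textFile(args(0))   // input path from the command line
    val errors = logs.filter(isError _) // pass the function to Spark
    println(s"Error lines: ${errors.count()}")

    sc.stop()
  }
}
```

Packaged as a JAR, such an application is typically submitted with `spark-submit --class SimpleApp --master <master-url> <app-jar> <input-path>`, where the master URL and paths are placeholders for your cluster.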
4. An Introduction to Spark libraries
- Understand and use the various Spark libraries (a Spark SQL example follows)
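The unified stack ships libraries such as Spark SQL, Spark Streaming, MLlib, and GraphX. As one representative, here is a minimal Spark SQL sketch, assuming Spark 2.x or later and an illustrative `people.json` file with `name` and `age` fields:

```scala
import org.apache.spark.sql.SparkSession

object SqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SqlExample").getOrCreate()

    val people = spark.read.json("people.json") // schema inferred from JSON
    people.createOrReplaceTempView("people")    // expose as a SQL table

    // The same query, once in SQL and once through the DataFrame API.
    spark.sql("SELECT name FROM people WHERE age > 21").show()
    people.filter(people("age") > 21).select("name").show()

    spark.stop()
  }
}
```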
5. About Spark configuration, monitoring and tuning
- Understand components of the Spark cluster
- Configure Spark by modifying Spark properties, environment variables, or logging properties
- Monitor Spark using the web UIs, metrics, and external instrumentation
- Understand performance tuning considerations (a configuration sketch follows)
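A minimal configuration sketch tying the three routes above together; the property values are illustrative, not tuning recommendations:

```scala
import org.apache.spark.SparkConf

// Programmatic route: set Spark properties on a SparkConf before
// creating the context. The same keys can instead live in
// conf/spark-defaults.conf; environment variables (e.g. SPARK_LOCAL_IP)
// go in conf/spark-env.sh, and logging in conf/log4j.properties.
val conf = new SparkConf()
  .setAppName("TunedApp")
  .set("spark.executor.memory", "2g")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
```

Once an application is running, its web UI is served from the driver (port 4040 by default) and is the usual first stop for the monitoring topics above.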