Module 1: Google Cloud Dataproc Overview
- Creating and managing clusters.
- Leveraging custom machine types and preemptible worker nodes.
- Scaling and deleting clusters.
Module 2: Running Dataproc Jobs
- Running Pig and Hive jobs.
- Separation of storage and compute.
Module 3: Integrating Dataproc with Google Cloud Platform
- Customizing clusters with initialization actions.
- BigQuery support.
Module 4: Making Sense of Unstructured Data with Google’s Machine Learning APIs
- Google’s Machine Learning APIs.
- Common ML use cases.
- Invoking ML APIs.
Module 5: Serverless data analysis with BigQuery
- What is BigQuery?
- Queries and Functions.
- Loading data into BigQuery.
- Exporting data from BigQuery.
- Nested and repeated fields.
- Querying multiple tables.
- Performance and pricing.
Module 6: Serverless, autoscaling data pipelines with Dataflow
- The Beam programming model.
- Data pipelines in Beam Python.
- Data pipelines in Beam Java.
- Scalable Big Data processing using Beam.
- Incorporating additional data.
- Handling stream data.
- GCP reference architecture.
Module 7: Getting started with Machine Learning
- What is machine learning (ML)?
- Effective ML: concepts, types.
- ML datasets: generalization.
Module 8: Building ML models with TensorFlow
- Getting started with TensorFlow.
- TensorFlow graphs and loops + lab.
- Monitoring ML training.
Module 9: Scaling ML models with Cloud ML
- Why Cloud ML?
- Packaging up a TensorFlow model.
- End-to-end training.
Module 10: Feature Engineering
- Creating good features.
- Transforming inputs.
- Synthetic features.
- Preprocessing with Cloud ML.
Module 11: Architecture of streaming analytics pipelines
- Stream data processing: Challenges.
- Handling variable data volumes.
- Dealing with unordered/late data.
Module 12: Ingesting Variable Volumes
- What is Cloud Pub/Sub?
- How it works: Topics and Subscriptions.
Module 13: Implementing streaming pipelines
- Challenges in stream processing.
- Handling late data: watermarks, triggers, accumulation.
Module 14: Streaming analytics and dashboards
- Streaming analytics: from data to decisions.
- Querying streaming data with BigQuery.
- What is Google Data Studio?
Module 15: High throughput and low latency with Bigtable
- What is Cloud Spanner?
- Designing Bigtable schema.
- Ingesting into Bigtable.