Apache Beam | A Hands-On course to build Big data Pipelines


What you will learn

  • Learn Apache Beam through hands-on, real-world implementations.
  • Build big data processing pipelines for real-world business use cases using Apache Beam.
  • Learn a portable programming model whose pipelines can be deployed on Spark, Flink, Google Cloud Dataflow (GCP), and other runners.
  • Understand how each component of Apache Beam works through hands-on practicals.
  • Develop pipelines for real-world big data case studies in various business domains.
  • Datasets and Beam code used in the lectures are available in the Resources tab, so you do not have to retype them.
  • The course will be updated with each new Beam version.


Section 1: Introduction

Section 2: Transformations in Beam

Section 3: Side Inputs and Outputs

Section 4: Real Time Case Study - Identifying Bank's Defaulter Customers

Section 5: Data encoding & decoding

Section 6: Type Hints in Beam

Section 7: Build Streaming data Pipelines

Section 8: Implementing Windows in Apache Beam

Section 9: Watermarks in Streaming environment

Section 10: Triggers and its Implementation

Section 11: Real Time Case Study - Mobile Game Analysis

Section 12: Deploy Beam pipeline on Google Cloud Dataflow

Section 13: BONUS

Course Description

Build big data pipelines with Apache Beam in any of its supported SDK languages and run them on Spark, Flink, or Google Cloud Dataflow (GCP).


Requirements

  • Basic knowledge of distributed data processing architecture.
  • Basic knowledge of Python is helpful.


Apache Beam is a unified, portable programming model for both batch and streaming use cases.

Previously, Spark, Flink, and Cloud Dataflow jobs could each run only on their own engine. Apache Beam provides a portable programming model: you build a pipeline once, in any supported SDK language, and run it on the big data engine of your choice, such as Apache Spark, Apache Flink, or Cloud Dataflow on Google Cloud Platform.

Because of this portability, Apache Beam is well positioned for wide adoption in big data processing. Many large companies already run Beam pipelines in production.

What's included in the course?

  • Complete coverage of Apache Beam concepts, from scratch to real-world implementation.
  • Every Apache Beam concept is explained with a hands-on example.
  • Includes concepts that are not clearly explained even in Apache Beam's official documentation.
  • Build two real-world big data case studies using Beam.
  • Code and datasets used in the lectures are attached to the course for your convenience.

Who this course is for:

  • Students who want to learn Apache Beam from scratch through to a live project implementation.
  • Data engineers who want to build unified & portable Big data processing pipelines.
  • Developers who want to learn a futuristic programming model for Big data processing.