
Scala & Apache Spark

11 modules · ~36 hours · Intermediate → Advanced

Master Scala 3 + Apache Spark 3+ for distributed data processing: from functional programming foundations to multi-terabyte ETL, structured streaming, MLlib, and tuning Spark on Kubernetes.

Course roadmap

# | Module | Status | Topics
0 | Setup & Hello Spark | Plan ready | Install Scala 3, sbt, Spark in Docker, first DataFrame
1 | Scala Fundamentals | Plan ready | Types, immutability, case classes, pattern matching
2 | Functional Programming | Plan ready | Higher-order functions, map/flatMap, Option/Either, type classes
3 | Spark Core (RDDs) | Plan ready | RDD API, transformations vs actions, lineage, persist/cache
4 | Spark SQL & DataFrames | Plan ready | DataFrame API, schema, Catalyst optimizer, joins
5 | Datasets & Encoders | Plan ready | Type-safe API, Encoders, performance tradeoffs
6 | Spark Streaming | Plan ready | Structured Streaming, watermarks, exactly-once, Kafka source
7 | MLlib | Plan ready | Pipelines, transformers, estimators, model selection
8 | Tuning & Optimization | Plan ready | Partitioning, shuffles, broadcast joins, AQE, skew handling
9 | Production Spark | Plan ready | Spark on Kubernetes, dynamic allocation, Spark Operator, monitoring
10 | Capstone | Plan ready | Build a streaming ETL: Kafka → Spark Streaming → Iceberg/Delta Lake

What's available now

The curriculum plan is published. Module content is scheduled to roll out in the second half of 2026.

Last updated

2026-05 — Curriculum plan published.