DEV Community

Aditi Sharma

🚀 Day 33 of My Data Journey

Apache Spark Architecture & Fun Facts 🔥

At its core, Spark has a simple but powerful architecture:

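🔹 Driver Program → Runs your application's main code, builds the execution plan, and schedules tasks.
🔹 Cluster Manager → Allocates resources (Standalone, YARN, Kubernetes, or Mesos, which is deprecated since Spark 3.2).
🔹 Executors → Processes on worker nodes that run tasks and hold cached data.
🔹 RDDs (Resilient Distributed Datasets) → Immutable, partitioned collections; the basis for fault tolerance and parallelism.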
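
To make the roles above concrete, here is a tiny toy model in plain Python. It is not Spark code: the thread pool only stands in for executors, and `split_into_partitions`, `run_task`, and `run_job` are made-up names for illustration. The idea it sketches is that the driver splits the data into partitions, each task handles one partition, and the driver combines the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: cut the dataset into roughly equal partitions."""
    # Ceiling division, with a floor of 1 so empty input does not break range().
    size = max(1, -(-len(data) // num_partitions))
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_task(partition):
    """Executor side: one task processes one partition (sum of squares here)."""
    return sum(x * x for x in partition)


def run_job(data, num_executors=3):
    """Driver: plan the tasks, hand them to the pool, combine partial results."""
    partitions = split_into_partitions(data, num_executors)
    # The thread pool is only a stand-in for Spark executors (an assumption
    # of this sketch, not how Spark actually launches work).
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_sums = list(pool.map(run_task, partitions))
    return sum(partial_sums)


total = run_job(list(range(1, 11)))
print(total)  # 385, the sum of squares from 1 to 10
```

In real Spark the cluster manager decides where executors run, and the tasks are shipped across machines rather than threads.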

✨ Interesting Facts:
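• Spark can keep intermediate data in memory instead of writing it to disk between steps = ⚡ much faster for iterative jobs.
• Fault-tolerant by design → if a node fails, Spark recomputes the lost partitions from their lineage (the recorded chain of transformations).
• Supports batch (Spark SQL), streaming (Structured Streaming), ML (MLlib), and graph processing (GraphX) in one framework.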
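
The "rebuilds data" point can be sketched with another toy model in plain Python. Again, this is not Spark's API: `ToyRDD` is a hypothetical class that just remembers its parent and the function applied to it (its lineage), so a lost partition can be recomputed. One simplification to note: this toy caches every step, while real Spark only keeps data in memory when asked to.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: partitions plus the recipe to rebuild them."""

    def __init__(self, source=None, parent=None, fn=None):
        self.source = source  # list of partitions, only set on the root dataset
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # lineage: per-element function applied to the parent
        self.cache = {}       # partition index -> data currently held "in memory"

    def map(self, fn):
        # Lazy: only records the step in the lineage, computes nothing yet.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, index):
        """Return one partition, rebuilding it from lineage if it is not cached."""
        if index in self.cache:
            return self.cache[index]
        if self.parent is None:
            data = list(self.source[index])
        else:
            data = [self.fn(x) for x in self.parent.compute(index)]
        self.cache[index] = data
        return data


base = ToyRDD(source=[[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2)
plus_one = doubled.map(lambda x: x + 1)

first = plus_one.compute(1)    # computed once and kept in memory
plus_one.cache.pop(1)          # simulate a failed node losing this partition
doubled.cache.pop(1)
rebuilt = plus_one.compute(1)  # recomputed by replaying the lineage
print(first, rebuilt)          # [7, 9] [7, 9]
```

Only the lost partition is recomputed; partition 0 is never touched in this run.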

💡 Spark = A powerhouse of speed + scalability in Big Data!
