DEV Community

Prazwal Ks

Databricks and Spark Summary

Databricks was founded by the creators of Apache Spark, Delta Lake, and MLflow.

Over 2,000 companies worldwide use the Databricks platform across the big data and machine learning lifecycle.

Databricks' vision is to accelerate innovation by unifying data science, data engineering, and business.

Databricks offers:
1. Databricks Workspace - Interactive Data Science & Collaboration
2. Databricks Workflows - Production Jobs & Workflow Automation
Databricks Runtime
3. Databricks I/O (DBIO) - Optimized Data Access Layer
4. Databricks Serverless - Fully Managed Auto-Tuning Platform
5. Databricks Enterprise Security (DBES) - End-to-End Security & Compliance

Apache Spark
Spark is a unified processing engine that can analyze big data using SQL, machine learning, graph processing, or real-time stream analysis.

Spark Engine
At its core is the Spark Engine.
The DataFrames API provides an abstraction above Resilient Distributed Datasets (RDDs) while simultaneously improving performance 5-20x over traditional RDDs with its Catalyst Optimizer.
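To make the optimizer idea concrete, here is a toy sketch in plain Python. It is not Spark's API and not how Catalyst is implemented; the Project and Filter classes and the push_filters_down rule are hypothetical names made up for illustration. The point it tries to show: when a query is described as a plan instead of hand-written code, an optimizer can reorder the steps (here, filtering before computing a derived column) and do less work for the same result.

```python
# Toy illustration only: NOT Spark's API or Catalyst's implementation.
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Project:
    """Add a derived column computed from an existing row."""
    new_col: str
    fn: Callable[[dict], object]


@dataclass(frozen=True)
class Filter:
    """Keep rows where `col` equals `value`."""
    col: str
    value: object


def push_filters_down(plan):
    """Move each Filter ahead of a Project that does not create its column."""
    plan = list(plan)
    changed = True
    while changed:
        changed = False
        for i in range(len(plan) - 1):
            a, b = plan[i], plan[i + 1]
            if isinstance(a, Project) and isinstance(b, Filter) and b.col != a.new_col:
                plan[i], plan[i + 1] = b, a
                changed = True
    return plan


def execute(plan, rows):
    """Run the plan and count how often a Project function is evaluated."""
    project_calls = 0
    for step in plan:
        if isinstance(step, Filter):
            rows = [r for r in rows if r[step.col] == step.value]
        else:
            out = []
            for r in rows:
                project_calls += 1
                out.append({**r, step.new_col: step.fn(r)})
            rows = out
    return rows, project_calls


# Hypothetical sample data
rows = [{"country": c, "amount": a}
        for c, a in [("NP", 10), ("US", 20), ("NP", 30), ("IN", 40)]]
plan = [Project("amount_with_tax", lambda r: r["amount"] * 1.13),
        Filter("country", "NP")]

naive_rows, naive_calls = execute(plan, rows)
optimized_rows, optimized_calls = execute(push_filters_down(plan), rows)
```

Both runs return the same rows, but the reordered plan evaluates the derived column only for the rows that survive the filter. Real optimizers apply many more rules than this one.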
Spark ML provides high-quality, finely tuned machine learning algorithms for processing big data.
The Graph processing API gives us an easily approachable API for modeling pairwise relationships between people, objects, or nodes in a network.
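As a small illustration of what "pairwise relationships" means, the sketch below models a few made-up "follows" relationships as edges in plain Python. This is not the Spark graph API; the names and data are hypothetical, and a real graph engine would distribute this work across a cluster.

```python
from collections import defaultdict

# Hypothetical "follows" relationships; each (src, dst) pair is one edge.
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("dave", "alice")]

followers = defaultdict(set)   # person -> people who follow them (in-edges)
following = defaultdict(set)   # person -> people they follow (out-edges)
for src, dst in edges:
    following[src].add(dst)
    followers[dst].add(src)

# How many followers each person has
in_degree = {person: len(srcs) for person, srcs in followers.items()}

# People dave can reach in exactly two hops and does not already follow
direct = following.get("dave", set())
two_hops = {c for b in direct for c in following.get(b, set())} - direct - {"dave"}
```

Questions like "who has the most followers" or "who is two hops away" are the kind of thing a graph API is meant to make easy to express.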
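The Streaming APIs give us end-to-end fault tolerance with exactly-once semantics in the default micro-batch mode, and the possibility of latencies around one millisecond in the experimental continuous processing mode (which offers at-least-once guarantees).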
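The exactly-once guarantee mentioned above is usually built from three pieces: a source that can be replayed, a checkpoint that records how far processing got, and a sink where writing the same batch twice has no extra effect. The sketch below is a toy version of that idea in plain Python, not the Spark Streaming API; IdempotentSink, run, and the crash parameter are hypothetical names used only to simulate a failure.

```python
# Toy illustration only: plain Python, not the Spark Structured Streaming API.
class IdempotentSink:
    """Stores each batch under its batch id, so a replayed batch overwrites itself."""

    def __init__(self):
        self.batches = {}

    def write(self, batch_id, records):
        self.batches[batch_id] = list(records)

    def all_records(self):
        return [r for bid in sorted(self.batches) for r in self.batches[bid]]


def run(source, sink, checkpoint, batch_size, crash_before_commit_of=None):
    """Process `source` in micro-batches, resuming from the checkpointed offset."""
    while checkpoint["offset"] < len(source):
        start = checkpoint["offset"]
        batch_id = start // batch_size
        batch = source[start:start + batch_size]
        sink.write(batch_id, batch)
        if batch_id == crash_before_commit_of:
            raise RuntimeError("simulated crash after write, before commit")
        # Commit progress only after the write succeeded
        checkpoint["offset"] = start + len(batch)


source = list(range(10))        # replayable source: can be re-read from any offset
sink = IdempotentSink()
checkpoint = {"offset": 0}

try:
    run(source, sink, checkpoint, batch_size=4, crash_before_commit_of=1)
except RuntimeError:
    pass  # batch 1 was written, but its offset was never committed

run(source, sink, checkpoint, batch_size=4)  # restart: batch 1 is replayed
result = sink.all_records()
```

After the restart, batch 1 is processed a second time, but because the sink keys data by batch id, each record still appears in the output exactly once.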
And it all works together seamlessly!
