DEV Community

Cover image for Apache Spark Basics
Adi Polak
Adi Polak

Posted on • Originally published at github.com

Apache Spark Basics

This is a basic cheat sheet, glossary and the very beginning of getting started with Apache Spark, every time we will share a new post with terms or code snippets, they will appear here as well at a generic form.

If you work with Apache Spark and look for a cheat sheet, this is for you as well!

First thing first:

-1- the workspace:

First, we need to create the workspace, we are using Databricks workspace and here is a tutorial for creating it.

-2- Basic Apache Spark Vocabulary :

Dataframe 

This is a distributed collection of data organized into named columns that provide operations to filter, group, or compute aggregates. Dataframe data is often distributed across multiple machines. It can be in-memory data or on disk.

Dataset

Strongly typed collection of objects that can be transformed in parallel using functional or relational operations. Each Dataset is a typed view of Dataframe.
Dataset is defined as "lazy", meaning the computations are only triggered when an action is invoked.

RelationalGroupedDataset

A set of methods for aggregations on a DataFrame, created by groupBy, cube or rollup.

This is an evolving page and more terms, code snippets and architecture design will be added.

Image of Timescale

🚀 pgai Vectorizer: SQLAlchemy and LiteLLM Make Vector Search Simple

We built pgai Vectorizer to simplify embedding management for AI applications—without needing a separate database or complex infrastructure. Since launch, developers have created over 3,000 vectorizers on Timescale Cloud, with many more self-hosted.

Read full post →

Top comments (0)

AWS Security LIVE!

Tune in for AWS Security LIVE!

Join AWS Security LIVE! for expert insights and actionable tips to protect your organization and keep security teams prepared.

Learn More

👋 Kindness is contagious

Dive into an ocean of knowledge with this thought-provoking post, revered deeply within the supportive DEV Community. Developers of all levels are welcome to join and enhance our collective intelligence.

Saying a simple "thank you" can brighten someone's day. Share your gratitude in the comments below!

On DEV, sharing ideas eases our path and fortifies our community connections. Found this helpful? Sending a quick thanks to the author can be profoundly valued.

Okay