DEV Community

# spark

Posts

How to run Amazon EMR Serverless with --packages flag

Reactions 3 Comments
6 min read
Sentiment Analysis using Kafka, Apache Spark

Reactions 5 Comments
6 min read
Running Delta Lake on Amazon EMR Serverless

Reactions 14 Comments
7 min read
Deep Dive into Apache Iceberg via Apache Zeppelin

Reactions 7 Comments
7 min read
How to recover from a Kafka topic reset in Spark Structured Streaming

Reactions 2 Comments
4 min read
Build a real-time streaming app with Docker, Redpanda, and Apache Spark

Reactions 7 Comments
6 min read
MongoDB $weeklyUpdate #70 (May 20, 2022): Apache Spark, Verizon, and MongoDB World!

Reactions 3 Comments
3 min read
How to use Spark and Pandas to prepare big data

Reactions 11 Comments
5 min read
ETL with Spark on Azure Databricks and Azure Data Warehouse (Part 2)

Reactions 10 Comments
5 min read
Build a rest service from the command line, as simple as “every request has a response.”

Reactions 6 Comments
3 min read
Details of 4 best opensource projects about big data you should try out (I)

Reactions 8 Comments
5 min read
Spark programming basics (Python version)

Reactions 11 Comments
6 min read
Quick use of CDC: A new demo from lakesoul makes it easier to set up the environment

Reactions 8 Comments
5 min read
4 best opensource projects about big data you should try out

Reactions 16 Comments 3
3 min read
A new unified streaming and batch table storage solution similar to iceberg/hudi/delta lake

Reactions 7 Comments
2 min read
Testing PySpark & Pandas in style

Reactions 3 Comments
2 min read
How to handle nested JSON with Apache Spark

Reactions 3 Comments
3 min read
Spark aggregation with native API's

Reactions 6 Comments
3 min read
Spark Catalyst Optimizer and spark Expression basics

Reactions 4 Comments
4 min read
Quill- Most efficient Scala driver for Apache Cassandra and Spark

Reactions 2 Comments
4 min read
Exploring Apache Spark New Pandas API

Reactions 8 Comments
5 min read
Data Lake explained

Reactions 6 Comments
4 min read
Jupyter notebooks for Spark with customised Docker containers

Reactions 8 Comments
2 min read
Creating and running Spark Jobs in Scala on Cloud Dataproc !!!

Reactions 6 Comments
3 min read
Serverless Spark on GCP : How does it compare with Dataflow ?

Reactions 4 Comments
5 min read
Spark is lit once again

Reactions 9 Comments
4 min read
Updating Partition Values With Apache Hudi

Reactions 5 Comments
3 min read
Using Apache Hudi on Amazon EMR

Reactions 6 Comments 1
5 min read
Running Apache Spark on EKS Fargate

Reactions 6 Comments
4 min read
Data Optimization for Compacted Partitions

Reactions 3 Comments
8 min read
Build your own Air Quality Map with OpenAQ and EMR on EKS

Reactions 4 Comments
12 min read
Databricks and PyODBC - Avoiding another MS repo outage

Reactions 5 Comments
2 min read
Spark : Replace collect()[][]

Reactions 4 Comments 1
1 min read
Getting Info About Spark Partitions

Reactions 5 Comments
3 min read
Creating a Spark Standalone Cluster with Docker and docker-compose(2021 update)

Reactions 21 Comments 2
7 min read
Data storage patterns, versioning and partitions

Reactions 9 Comments
9 min read
My Journey With Spark On Kubernetes... In Python (1/3)

Reactions 32 Comments
9 min read
My Journey With Spark On Kubernetes... In Python (3/3)

Reactions 13 Comments 1
17 min read
My Journey With Spark On Kubernetes... In Python (2/3)

Reactions 16 Comments
9 min read
Unit testing your PySpark library

Reactions 7 Comments
9 min read
How to recover from a deleted _spark_metadata folder in Spark Structured Streaming

Reactions 7 Comments 3
5 min read
Spark and Docker: Your Spark development cycle just got 10x faster !

Reactions 15 Comments
7 min read
How-to guide: Set up, Manage & Monitor Spark on Kubernetes

Reactions 20 Comments
10 min read
Apache Spark Java Tutorial: Simplest Guide to Get Started

Reactions 7 Comments
3 min read
Is Structured Streaming Exactly-Once? Well, it depends...

Reactions 8 Comments
4 min read
can a map function be executed on multiple executors for an item in RDD.

Reactions 3 Comments
1 min read
Predicting machine failures with distributed computing (Spark, AWS EMR, and DL)

Reactions 9 Comments
10 min read
Using Aerospike Connect For Spark

Reactions 6 Comments
5 min read
Migrating from a plain Spark Application to ZIO with ZparkIO

Reactions 9 Comments
6 min read
Spark: unit, integration and end-to-end tests.

Reactions 16 Comments
5 min read
Spark Journey begins...

Reactions 8 Comments
3 min read
Working with nested structures in Spark

Reactions 6 Comments 1
3 min read
Intoduction to Apache Spark

Reactions 10 Comments
6 min read
Large-Scale Data Quality Verification in .NET PT.1

Reactions 2 Comments
9 min read
Spark Side Menu Micro-Interactions Deconstruction

Reactions 3 Comments
2 min read
Unit Testing Apache Spark Structured Streaming using MemoryStream

Reactions 7 Comments
4 min read
Setting up IntelliJ IDEA for Apache Spark and Scala development

Reactions 5 Comments
2 min read
Exploiting Schema Inference in Apache Spark

Reactions 2 Comments
3 min read
How to create a low-cost Apache Spark cluster on Microsoft Azure

Reactions 7 Comments
4 min read
How to make a column non-nullable in Spark Structured Streaming

Reactions 3 Comments
2 min read