DEV Community

# dataengineering

Posts

Day 10: Partitioning vs Bucketing - The Spark Optimization Guide Every Data Engineer Needs

2 min read
Deepening My Roots in the Data Ecosystem - Choosing Depth Over Breadth

2 min read
Automate Python Manual Extraction: Build End-to-End PDF -> LLM -> SQL Flows with CocoIndex, Ollama, and Postgres

3 min read
The Boring Debug Checklist That Fixes Most “RAG Failures”

2 min read
Function Calling and Tool Use: Turning LLMs into Action-Taking Agents

18 min read
dremioframe & iceberg: Pythonic interfaces for Dremio and Apache Iceberg

8 min read
AWS Lambda and AWS Glue Python Shell in the Context of Lightweight ETL

7 min read
SQL: Doing GROUP BY in CsvPath

5 min read
🔥 Day 3: RDDs - The Foundation of Spark

2 min read
🔥 Day 4: RDD Internals - Partitions, Shuffles & Repartitioning Demystified

2 min read
The Developer's Guide to Normalizing Historical Airline Flight Data for Machine Learning

6 min read
Overview of Real-Time Data Synchronization from MySQL to VeloDB

5 min read
Stop Writing df.describe(): Automate EDA with D-Tale (The Lazy Engineer's Way)

3 min read
CHW Monthly Activity Aggregation: Turning Visit Logs into Insight

5 min read
🔥 Day 2: Understanding Spark Architecture - How Spark Executes Your Code Internally

2 min read