Speaker: Fahad Shah @ AWS Amarathon 2025
Summary by Amazon Nova
Modern Logistics Challenges:
Managing multiple streams for trucks, drivers, routes, fuel, maintenance, shipments, and warehouses.
Need for real-time operational views and long-term analytics.
Data Storage Requirements:
Fresh, joined views for immediate operations.
Use of Apache Iceberg for long-term analytics.
Technology Stack:
RisingWave: streaming database for SQL ingestion, joins, and materialized views.
Lakekeeper: open Apache Iceberg REST catalog.
Kafka: event backbone for streaming data.
Object storage (e.g., MinIO): S3-compatible storage for Iceberg data files.
Objective:
Demonstrate how to build streaming Iceberg tables using the specified open stack.
Provide a simple and effective solution for modern logistics data management.
The Logistics Analytics Problem
Today's logistics platforms generate:
Trucks: fleet inventory and locations
Drivers: rosters and assignments
Shipments: origin, destination, and weight
Warehouses: capacity and sites
Routes: ETAs and distances
Fuel & Maintenance: cost and reliability signals
The challenge:
Operational teams need fresh, joined views across all of these streams.
Data teams need the same data in Iceberg for BI, AI, and historical analysis.
What We’ll Build (Streaming Iceberg Pattern)
Kafka feeds seven logistics topics into RisingWave.
A multi-way streaming join is expressed in SQL and materialized continuously inside RisingWave.
The result is persisted from RisingWave as a native Apache Iceberg table in an S3-compatible object store such as MinIO.
Engines like Spark, Trino, and DuckDB query the same Iceberg tables via an open REST catalog.
Why Streaming Iceberg Tables with RisingWave?
[ 1 ] Batch-first workflows:
Periodic jobs, stale joins, and heavy pipelines.
Separate ETL tools to write into Iceberg.
[ 2 ] RisingWave + streaming Iceberg tables:
Continuously updated joins and aggregates in RisingWave MVs.
Iceberg snapshots that are always “almost current.”
One RisingWave pipeline that serves both real-time dashboards and offline analytics.
Goal: Make Iceberg feel like a database by letting RisingWave own the streaming pipeline and Iceberg writes.
High-Level Architecture
Our end-to-end stack:
Kafka — event backbone for seven logistics topics.
RisingWave (streaming database) — ingest, join, and aggregate in SQL; manage materialized views.
RisingWave Iceberg Table Engine + Lakekeeper — native Iceberg tables exposed through an open REST catalog.
MinIO — S3-compatible object storage.
Pattern: Kafka → RisingWave → Iceberg in MinIO → Query from any engine via REST catalog.
Logistics streams in RisingWave & multi-way streaming joins
The Seven Logistics Streams in RisingWave
Our running example uses seven Kafka topics that become sources in RisingWave:
trucks — fleet inventory, capacity, current location.
driver — driver details and assigned_truck_id.
shipments — origin, destination, weight, truck binding.
warehouses — warehouse location and capacity.
route — route_id, truck_id, driver_id, ETD/ETA, distance_km.
fuel — refueling events (time, liters, station).
maint — maintenance history and costs.
RisingWave treats each one as a streaming table, ready to be joined with simple PostgreSQL-style SQL.
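As a concrete illustration, here is a minimal sketch of how one of these topics could be declared in RisingWave; the column names, broker address, and JSON encoding are assumptions, not the speaker's exact DDL:

```sql
-- Hypothetical RisingWave DDL for the `trucks` topic; columns,
-- broker address, and encoding are illustrative assumptions.
CREATE TABLE trucks (
    truck_id    INT PRIMARY KEY,
    capacity_kg INT,
    location    VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'trucks',
    properties.bootstrap.server = 'kafka:9092',
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```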
Pattern 1: Multi-Way Streaming Join in RisingWave
In RisingWave, we express the core logistics logic as one multi-way streaming join.
LEFT JOIN drivers → trucks to keep unmatched drivers visible.
JOIN shipments to attach workload and destinations.
JOIN warehouses to bring in capacity and location.
JOIN route for ETD/ETA and distance.
JOIN fuel and maint for cost and reliability signals.
This becomes logistics_joined_mv — a continuously updated, denormalized logistics record per truck/driver/route inside RisingWave.
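A sketch of what that join could look like, following the join order described above; the join keys and column names are assumptions inferred from the stream descriptions:

```sql
-- Hypothetical shape of the multi-way streaming join; join keys
-- and columns are assumptions, not the exact demo SQL.
CREATE MATERIALIZED VIEW logistics_joined_mv AS
SELECT
    t.truck_id,
    t.capacity_kg,
    d.driver_id,
    d.driver_name,
    s.shipment_id,
    s.origin,
    s.destination,
    s.weight_kg,
    w.warehouse_id,
    r.route_id,
    r.etd,
    r.eta,
    r.distance_km,
    f.fuel_cost,
    m.maintenance_cost
FROM driver d
LEFT JOIN trucks t ON t.truck_id     = d.assigned_truck_id
JOIN shipments   s ON s.truck_id     = t.truck_id
JOIN warehouses  w ON w.warehouse_id = s.origin
JOIN route       r ON r.truck_id     = t.truck_id
                  AND r.driver_id    = d.driver_id
JOIN fuel        f ON f.truck_id     = t.truck_id
JOIN maint       m ON m.truck_id     = t.truck_id;
```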
Fleet KPIs, native Iceberg tables & cross-engine reads
Pattern 2: Fleet KPIs View in RisingWave
On top of the joined MV, we define another RisingWave MV for fleet KPIs:
Capacity utilization (%) per truck.
Total fuel cost and maintenance cost per truck.
Combined total operational cost.
Current route context (ID, ETD, ETA, distance_km).
Associated driver details.
The resulting truck_fleet_overview MV in RisingWave becomes a live fleet performance table, ready for Grafana and operational dashboards.
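A sketch of such a KPI view, assuming the cost and capacity columns from the joined-MV sketch above; the utilization formula is an illustrative assumption:

```sql
-- Hypothetical KPI aggregation over the joined MV.
CREATE MATERIALIZED VIEW truck_fleet_overview AS
SELECT
    truck_id,
    driver_id,
    driver_name,
    route_id,
    etd,
    eta,
    distance_km,
    SUM(weight_kg) * 100.0 / MAX(capacity_kg) AS capacity_utilization_pct,
    SUM(fuel_cost)                             AS total_fuel_cost,
    SUM(maintenance_cost)                      AS total_maintenance_cost,
    SUM(fuel_cost) + SUM(maintenance_cost)     AS total_operational_cost
FROM logistics_joined_mv
GROUP BY truck_id, driver_id, driver_name, route_id, etd, eta, distance_km;
```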
Pattern 3: Streaming to Native Iceberg from RisingWave
Instead of a custom writer service:
[ 1 ] We define logistics_joined_iceberg as a native Iceberg table managed by RisingWave.
[ 2 ] The schema mirrors logistics_joined_mv.
[ 3 ] A small config in RisingWave controls how often streaming changes are committed as Iceberg snapshots.
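A hedged sketch of what that configuration could look like; the connection options, credentials, checkpoint interval, and the column subset shown are placeholders to adapt, not the exact demo settings:

```sql
-- Point RisingWave's Iceberg table engine at the Lakekeeper REST
-- catalog and MinIO; all endpoints and credentials are placeholders.
CREATE CONNECTION lakekeeper_conn WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181/catalog',
    warehouse.path = 'logistics',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin'
);
SET iceberg_engine_connection = 'public.lakekeeper_conn';

-- Native Iceberg table mirroring logistics_joined_mv (a representative
-- column subset is shown); snapshots are committed every N checkpoints.
CREATE TABLE logistics_joined_iceberg (
    truck_id    INT,
    driver_id   INT,
    driver_name VARCHAR,
    route_id    INT,
    eta         TIMESTAMP,
    distance_km DOUBLE PRECISION
) WITH (commit_checkpoint_interval = 10) ENGINE = iceberg;

-- Feed the Iceberg table from the continuously updated MV.
INSERT INTO logistics_joined_iceberg
SELECT truck_id, driver_id, driver_name, route_id, eta, distance_km
FROM logistics_joined_mv;
```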
Pattern 4: Cross-Engine Reads via REST Catalog
With the Iceberg table created by RisingWave and registered in a Lakekeeper REST catalog:
[ 1 ] Spark attaches lakekeeper as a catalog and queries the table with plain SQL.
[ 2 ] Trino / DuckDB / Dremio can use their Iceberg connectors to read the same table.
[ 3 ] All engines see the same Iceberg data that RisingWave continuously updates.
No copies, no proprietary table formats — just plain Iceberg, written by RisingWave.
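For example, a Spark SQL session could read the table like this, assuming Lakekeeper is registered via the standard Iceberg REST catalog properties (endpoint, warehouse, and namespace names below are placeholders):

```sql
-- Spark catalog configuration (set on the SparkSession or in
-- spark-defaults.conf; values are placeholders):
--   spark.sql.catalog.lakekeeper           org.apache.iceberg.spark.SparkCatalog
--   spark.sql.catalog.lakekeeper.type      rest
--   spark.sql.catalog.lakekeeper.uri       http://lakekeeper:8181/catalog
--   spark.sql.catalog.lakekeeper.warehouse logistics

-- Then query the RisingWave-written table directly:
SELECT truck_id, driver_name, route_id, eta
FROM lakekeeper.public.logistics_joined_iceberg
ORDER BY eta
LIMIT 10;
```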
From local laptop to production cluster: deployment options
Deployment Options: From Laptop to Cluster
[ 1 ] Local (for learning and prototyping):
Run RisingWave, Kafka, MinIO, and Lakekeeper with Docker.
Perfect for experimenting with streaming joins and Iceberg tables on your laptop.
[ 2 ] Production (for real workloads):
Deploy RisingWave and the rest of the stack via Kubernetes + Helm.
Use storage classes, resource limits, and persistence suitable for your environment.
Same SQL and patterns in RisingWave — just more durable, scalable, and automated.
Simplifying the Traditional Iceberg Stack
Traditional Iceberg deployments often require:
A separate stream processing engine.
Standalone Iceberg writer jobs.
External compaction and maintenance workflows.
Extra glue to keep catalogs, writers, and storage aligned.
With RisingWave:
[ 1 ] The streaming database handles ingestion, joins, materialized views, and Iceberg writes.
[ 2 ] The REST catalog + MinIO keep everything fully open and interoperable.
Fewer moving parts, less operational overhead.
Reference architecture with RisingWave
Think of the system in three layers, centered on RisingWave:
[ 1 ] Streams → RisingWave Tables.
Kafka topics become streaming tables in RisingWave.
[ 2 ] Tables → RisingWave Materialized Views.
Streaming joins and aggregates become live MVs (logistics_joined_mv, truck_fleet_overview).
[ 3 ] Views → Streaming Iceberg Tables.
RisingWave turns an MV into a streaming Iceberg table with a small config and an INSERT ... SELECT.
Once you see RisingWave as the “streaming SQL + Iceberg engine”, you can reuse this model in many domains.
Reusable Patterns Beyond Logistics
The RisingWave + Iceberg pattern applies to:
E-commerce: orders, inventory, pricing, customer events.
FinTech: transactions, balances, risk signals.
Industrial IoT: machines, sensors, alerts, maintenance.
Telecom: sessions, usage, QoS metrics.
Anywhere you have multiple real-time streams plus a need for open, long-term storage, you can use RisingWave MVs and Iceberg tables the same way.
Key Takeaways (RisingWave + Iceberg)
A reference architecture combining Kafka, RisingWave, REST catalog, MinIO, and Iceberg.
Practical patterns: multi-way streaming joins, KPI views, and native Iceberg writes from RisingWave.
Get real-time logistics analytics without custom writers, ad-hoc compaction jobs, or tight vendor lock-in.