
Eliana Lam for AWS Community On Air

Building Streaming Iceberg Tables for Real-Time Logistics Analytics

Speaker: Fahad Shah @ AWS Amarathon 2025

Summary by Amazon Nova



Modern Logistics Challenges:

  • Managing multiple streams for trucks, drivers, routes, fuel, maintenance, shipments, and warehouses.

  • Need for real-time operational views and long-term analytics.

Data Storage Requirements:

  • Fresh, joined views for immediate operations.

  • Use of Apache Iceberg for long-term analytics.

Technology Stack:

  • RisingWave: Streaming database for ingestion, joins, and materialized views.

  • Lakekeeper: Open REST catalog for Apache Iceberg tables.

  • Kafka: Event backbone for streaming data.

  • Object storage (e.g., MinIO): S3-compatible store for Iceberg data files.

Objective:

  • Demonstrate how to build streaming Iceberg tables using the specified open stack.

  • Provide a simple and effective solution for modern logistics data management.

The Logistics Analytics Problem

  • Today's logistics platforms generate:

  • Trucks: fleet inventory and locations

  • Drivers: rosters and assignments

  • Shipments: origin, destination, and weight

  • Warehouses: capacity and sites

  • Routes: ETAs and distances

  • Fuel & Maintenance: cost and reliability signals

  • The challenge:

  • Operational teams need fresh, joined views across all of these streams.

  • Data teams need the same data in Iceberg for BI, AI, and historical analysis.

What We’ll Build (Streaming Iceberg Pattern)

  • Kafka feeds seven logistics topics into RisingWave.

  • A multi-way streaming join is expressed in SQL and materialized continuously inside RisingWave.

  • The result is persisted from RisingWave as a native Apache Iceberg table in an S3-compatible object store such as MinIO.

  • Engines like Spark, Trino, and DuckDB query the same Iceberg tables via an open REST catalog.

Why Streaming Iceberg Tables with RisingWave?

  • [ 1 ] Batch-first workflows:

  • Periodic jobs, stale joins, and heavy pipelines.

  • Separate ETL tools to write into Iceberg.

  • [ 2 ] RisingWave + streaming Iceberg tables:

  • Continuously updated joins and aggregates in RisingWave MVs.

  • Iceberg snapshots that are always “almost current.”

  • One RisingWave pipeline that serves both real-time dashboards and offline analytics.

  • Goal: Make Iceberg feel like a database by letting RisingWave own the streaming pipeline and Iceberg writes.



High-Level Architecture

  • Our end-to-end stack:

  • Kafka — event backbone for 7 logistics topics.

  • RisingWave (streaming database) — ingest, join, and aggregate in SQL; manage materialized views.

  • RisingWave Iceberg Table Engine + Lakekeeper — open REST catalog over Iceberg tables.

  • MinIO — S3-compatible object storage.

  • Pattern: Kafka → RisingWave → Iceberg in MinIO → Query from any engine via REST catalog.

Logistics streams in RisingWave & multi-way streaming joins

  • The Seven Logistics Streams in RisingWave

  • Our running example uses seven Kafka topics that become sources in RisingWave:

  • trucks — fleet inventory, capacity, current location.

  • driver — driver details and assigned_truck_id.

  • shipments — origin, destination, weight, truck binding.

  • warehouses — warehouse location and capacity.

  • route — route_id, truck_id, driver_id, ETD/ETA, distance_km.

  • fuel — refueling events (time, liters, station).

  • maint — maintenance history and costs.

  • RisingWave treats each one as a streaming table, ready to be joined with simple PostgreSQL-style SQL.
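As a sketch, one such Kafka topic can be mapped into RisingWave like this (broker address, encoding, and column names are illustrative; the talk did not show exact schemas):

```sql
-- Illustrative schema for the trucks topic; adjust columns,
-- broker address, and encoding to match your actual setup.
CREATE SOURCE trucks (
    truck_id    VARCHAR,
    capacity_kg DOUBLE PRECISION,
    location    VARCHAR
) WITH (
    connector = 'kafka',
    topic = 'trucks',
    properties.bootstrap.server = 'kafka:9092'
) FORMAT PLAIN ENCODE JSON;
```

The other six topics (driver, shipments, warehouses, route, fuel, maint) follow the same pattern, after which they can be joined with ordinary SQL.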

Pattern 1: Multi-Way Streaming Join in RisingWave

  • In RisingWave, we express the core logistics logic as one multi-way streaming join.

  • LEFT JOIN drivers → trucks to keep unmatched drivers visible.

  • JOIN shipments to attach workload and destinations.

  • JOIN warehouses to bring in capacity and location.

  • JOIN route for ETD/ETA and distance.

  • JOIN fuel and maint for cost and reliability signals.

  • This becomes logistics_joined_mv — a continuously updated, denormalized logistics record per truck/driver/route inside RisingWave.
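A minimal sketch of that join as a RisingWave materialized view; all column names and join keys are assumptions, since the real schemas depend on your topics:

```sql
-- Continuously maintained, denormalized logistics record.
-- Column names and join keys are illustrative.
CREATE MATERIALIZED VIEW logistics_joined_mv AS
SELECT
    t.truck_id, t.capacity_kg,
    d.driver_id, d.driver_name,
    s.shipment_id, s.origin, s.destination, s.weight_kg,
    w.warehouse_id, w.city AS warehouse_city,
    r.route_id, r.etd, r.eta, r.distance_km,
    f.fuel_cost,
    m.maint_cost
FROM driver d
LEFT JOIN trucks t    ON d.assigned_truck_id = t.truck_id  -- keep unmatched drivers
JOIN shipments s      ON s.truck_id = t.truck_id
JOIN warehouses w     ON s.destination = w.city
JOIN route r          ON r.truck_id = t.truck_id AND r.driver_id = d.driver_id
JOIN fuel f           ON f.truck_id = t.truck_id
JOIN maint m          ON m.truck_id = t.truck_id;
```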



Fleet KPIs, native Iceberg tables & cross-engine reads

Pattern 2: Fleet KPIs View in RisingWave

  • On top of the joined MV, we define another RisingWave MV for fleet KPIs:

  • Capacity utilization (%) per truck.

  • Total fuel cost and maintenance cost per truck.

  • Combined total operational cost.

  • Current route context (ID, ETD, ETA, distance_km).

  • Associated driver details.

  • This truck_fleet_overview MV in RisingWave becomes a live fleet performance table, ready for Grafana and operational dashboards.
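A sketch of such a KPI view over the joined MV, assuming the illustrative column names from the join above; the exact aggregation (e.g., how utilization is computed) is not specified in the talk:

```sql
-- Live fleet KPIs per truck/driver/route; columns are illustrative.
CREATE MATERIALIZED VIEW truck_fleet_overview AS
SELECT
    truck_id,
    driver_id,
    driver_name,
    route_id, etd, eta, distance_km,
    SUM(weight_kg) / MAX(capacity_kg) * 100 AS capacity_utilization_pct,
    SUM(fuel_cost)                          AS total_fuel_cost,
    SUM(maint_cost)                         AS total_maint_cost,
    SUM(fuel_cost) + SUM(maint_cost)        AS total_operational_cost
FROM logistics_joined_mv
GROUP BY truck_id, driver_id, driver_name, route_id, etd, eta, distance_km;
```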

Pattern 3: Streaming to Native Iceberg from RisingWave

  • Instead of a custom writer service:

  • [ 1 ] We define logistics_joined_iceberg as a native Iceberg table managed by RisingWave.

  • [ 2 ] The schema mirrors logistics_joined_mv.

  • [ 3 ] A small config in RisingWave controls how often streaming changes are committed as Iceberg snapshots.
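In recent RisingWave versions the setup looks roughly like the following; connection parameter names vary by version, and every endpoint, credential, and column here is a placeholder:

```sql
-- Point RisingWave's Iceberg table engine at the Lakekeeper REST
-- catalog and MinIO (all endpoints and credentials are placeholders).
CREATE CONNECTION lakekeeper_conn WITH (
    type = 'iceberg',
    catalog.type = 'rest',
    catalog.uri = 'http://lakekeeper:8181/catalog',
    warehouse.path = 'warehouse',
    s3.endpoint = 'http://minio:9000',
    s3.access.key = 'minioadmin',
    s3.secret.key = 'minioadmin'
);

SET iceberg_engine_connection = 'public.lakekeeper_conn';

-- The "small config": commit streaming changes as Iceberg
-- snapshots roughly every 5 checkpoints.
CREATE TABLE logistics_joined_iceberg (
    truck_id  VARCHAR,
    driver_id VARCHAR,
    route_id  VARCHAR
    -- ...mirror the remaining logistics_joined_mv columns
) WITH (commit_checkpoint_interval = 5) ENGINE = iceberg;
```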

Pattern 4: Cross-Engine Reads via REST Catalog

  • With the Iceberg table created by RisingWave and registered in a Lakekeeper REST catalog:

  • [ 1 ] Spark attaches lakekeeper as a catalog.

  • [ 2 ] Trino / DuckDB / Dremio can use their Iceberg connectors to read the same table.

  • [ 3 ] All engines see the same Iceberg data that RisingWave continuously updates.

  • No copies, no proprietary table formats — just plain Iceberg, written by RisingWave.
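For example, once Spark is pointed at Lakekeeper as an Iceberg REST catalog, reading the table is plain SQL; the catalog name, namespace, and session confs below are illustrative:

```sql
-- Spark SQL. Session confs (set outside this script), roughly:
--   spark.sql.catalog.lakekeeper      = org.apache.iceberg.spark.SparkCatalog
--   spark.sql.catalog.lakekeeper.type = rest
--   spark.sql.catalog.lakekeeper.uri  = http://lakekeeper:8181/catalog
SELECT truck_id, route_id, eta
FROM lakekeeper.public.logistics_joined_iceberg
LIMIT 10;
```

Trino and DuckDB reads look similar once their Iceberg connectors are configured against the same REST catalog.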



From local laptop to production cluster: deployment options

  • [ 1 ] Local (for learning and prototyping):

  • Run RisingWave, Kafka, MinIO, and Lakekeeper with Docker.

  • Perfect for experimenting with streaming joins and Iceberg tables on your laptop.

  • [ 2 ] Production (for real workloads):

  • Deploy RisingWave and the rest of the stack via Kubernetes + Helm.

  • Use storage classes, resource limits, and persistence suitable for your environment.

  • Same SQL and patterns in RisingWave — just more durable, scalable, and automated.

Simplifying the Traditional Iceberg Stack

  • Traditional Iceberg deployments often require:

  • A separate stream processing engine.

  • Standalone Iceberg writer jobs.

  • External compaction and maintenance workflows.

  • Extra glue to keep catalogs, writers, and storage aligned.

  • With RisingWave:

  • [ 1 ] The streaming database handles ingestion, joins, materialized views, and Iceberg writes.

  • [ 2 ] The REST catalog + MinIO keep everything fully open and interoperable.

  • Fewer moving parts, less operational overhead.

Reference architecture with RisingWave

  • Think of the system in three layers, centered on RisingWave:

  • [ 1 ] Streams → RisingWave Tables.

  • Kafka topics become streaming tables in RisingWave.

  • [ 2 ] Tables → RisingWave Materialized Views.

  • Streaming joins and aggregates become live MVs (logistics_joined_mv, truck_fleet_overview).

  • [ 3 ] Views → Streaming Iceberg Tables.

  • RisingWave turns an MV into a streaming Iceberg table with a small config and an INSERT ... SELECT.

  • Once you see RisingWave as the “streaming SQL + Iceberg engine”, you can reuse this model in many domains.
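As a sketch, step [3] can be as small as the statements below (table and column names as used earlier are assumptions; RisingWave can also express a continuous feed as a sink into the table):

```sql
-- One-shot backfill of the Iceberg table from the MV
-- (illustrative column list):
INSERT INTO logistics_joined_iceberg
SELECT truck_id, driver_id, route_id FROM logistics_joined_mv;

-- Or keep it continuously updated with a sink into the table:
CREATE SINK logistics_to_iceberg INTO logistics_joined_iceberg
AS SELECT truck_id, driver_id, route_id FROM logistics_joined_mv;
```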

Reusable Patterns Beyond Logistics

  • The RisingWave + Iceberg pattern applies to:

  • E-commerce: orders, inventory, pricing, customer events.

  • FinTech: transactions, balances, risk signals.

  • Industrial IoT: machines, sensors, alerts, maintenance.

  • Telecom: sessions, usage, QoS metrics.

  • Anywhere you have multiple real-time streams plus a need for open, long-term storage, you can use RisingWave MVs and Iceberg tables the same way.

Key Takeaways (RisingWave + Iceberg)

  • A reference architecture combining Kafka, RisingWave, REST catalog, MinIO, and Iceberg.

  • Practical patterns: multi-way streaming joins, KPI views, and native Iceberg writes from RisingWave.

  • Get real-time logistics analytics without custom writers, ad-hoc compaction jobs, or tight vendor lock-in.



Team:

AWS FSI Customer Acceleration Hong Kong

AWS Amarathon Fan Club

AWS Community Builder Hong Kong
