RisingWave v2.8: Query Your Lakehouse, Backfill Faster, and Tune Jobs Individually

#database #sql #opensource #dataengineering

If you run streaming pipelines on RisingWave and batch queries on a separate engine, v2.8 changes the equation. This release adds a DataFusion-powered query engine that lets you run batch SQL directly on your data lake (Iceberg, Delta Lake, Hudi) without moving data.

Beyond the lakehouse, backfilling now runs in parallel, cutting backfill times by up to 50% on large tables in our internal benchmarks, and per-job resource isolation gives you granular control over how individual streaming jobs consume CPU and memory.

Here are the highlights of RisingWave v2.8.

Query Your Lakehouse Directly (Powered by DataFusion)

RisingWave has long supported sinking data to data lakes like Apache Iceberg, Delta Lake, and Apache Hudi. However, querying that data usually required an external engine like Trino, StarRocks, or Spark.

In v2.8, we’ve integrated Apache DataFusion as a native batch query engine. You can now use RisingWave to:

Query Sinks Directly: Run SQL queries on the data you’ve already exported to your lakehouse.
Join Streams with Lake Data: Perform complex joins between real-time streaming data and historical data stored in the lake.
Unified Experience: Use the same SQL dialect and connection for both real-time and historical analysis.

This turns RisingWave into a single engine for the whole data lifecycle: ingestion, streaming, long-term storage, and ad-hoc analysis.
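To make the "join streams with lake data" idea concrete, here is a minimal sketch of the query shape: fresh rows on one side, a historical aggregate on the other. It rests on stated assumptions. In-memory SQLite stands in for both sides purely so the example runs anywhere, and the table and column names (`live_orders`, `lake_orders`, `customer_id`, `amount`) are hypothetical. It does not show RisingWave-specific syntax for registering lake tables; see the release notes for that.

```python
import sqlite3

# In-memory SQLite is only a stand-in so the query shape can run anywhere.
# Table and column names are hypothetical, not RisingWave objects.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lake_orders (customer_id INTEGER, amount REAL);  -- historical side
    CREATE TABLE live_orders (customer_id INTEGER, amount REAL);  -- real-time side
    INSERT INTO lake_orders VALUES (1, 100.0), (1, 50.0), (2, 20.0);
    INSERT INTO live_orders VALUES (1, 30.0), (3, 5.0);
""")

# Compare each customer's fresh spend with their historical total.
JOIN_SQL = """
    SELECT l.customer_id,
           SUM(l.amount)             AS live_total,
           COALESCE(h.hist_total, 0) AS hist_total
    FROM live_orders AS l
    LEFT JOIN (
        SELECT customer_id, SUM(amount) AS hist_total
        FROM lake_orders
        GROUP BY customer_id
    ) AS h ON h.customer_id = l.customer_id
    GROUP BY l.customer_id, h.hist_total
    ORDER BY l.customer_id
"""
rows = conn.execute(JOIN_SQL).fetchall()
for customer_id, live_total, hist_total in rows:
    print(customer_id, live_total, hist_total)
```

The `LEFT JOIN` keeps customers who appear only in the live data, which is the usual choice when the historical side may lag behind the stream.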

Faster Backfilling with Parallelism

Backfilling, the process of populating a new materialized view with historical data, is a critical but resource-intensive task. In previous versions, it was often a bottleneck for large datasets.

v2.8 introduces Parallel Backfilling. By distributing the backfill task across multiple nodes and using more CPU cores, RisingWave processes historical data faster. In our internal benchmarks, backfill times dropped by up to 50% for large-scale tables (a 2x speedup in the best case), so new materialized views come online sooner.
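The general pattern behind a parallel backfill can be sketched in a few lines: split the historical snapshot into chunks, build partial view state for each chunk concurrently, then merge the partial states. This is a toy illustration under stated assumptions, not RisingWave's implementation. The function names are hypothetical, the "view" is a simple count per key, and Python threads are used only to show the split-and-merge structure (they do not give real CPU parallelism for this workload).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, num_workers):
    """Split the snapshot into contiguous, roughly equal chunks."""
    size = max(1, -(-len(rows) // num_workers))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def backfill_chunk(chunk):
    """Build partial view state (row count per key) for one chunk."""
    return Counter(key for key, _value in chunk)


def parallel_backfill(rows, num_workers=4):
    """Backfill chunks concurrently, then merge the partial states."""
    chunks = split_into_chunks(rows, num_workers)
    view = Counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for partial in pool.map(backfill_chunk, chunks):
            view.update(partial)  # Counter.update adds counts together
    return view


# Hypothetical snapshot of (key, value) rows.
snapshot = [(f"user_{i % 3}", i) for i in range(10)]
sequential = backfill_chunk(snapshot)
parallel = parallel_backfill(snapshot, num_workers=4)
print(parallel == sequential)
```

The merge step works here because counting is associative: partial results can be combined in any order. Views with order-sensitive state need more care than this sketch shows.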

Individual Job Tuning & Resource Isolation

In a multi-tenant or complex streaming environment, one "heavy" job can sometimes impact the performance of others. v2.8 addresses this with Per-Job Resource Isolation.

You can now:

Set Resource Limits: Define specific CPU and memory limits for individual streaming jobs.
Prioritize Critical Jobs: Ensure that your most important pipelines always have the resources they need, regardless of other activity on the cluster.
Granular Monitoring: Track resource consumption at the job level to identify and optimize inefficient queries.

This level of control makes RisingWave more robust for production environments where predictable performance is non-negotiable.
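As a rough mental model of why per-job limits protect neighboring jobs, consider the sketch below. It is purely illustrative and assumes a simplified accounting scheme: the `JobBudget` class, its fields, and the job names are hypothetical and do not correspond to RisingWave settings or internals. Each job draws only from its own budget, so a heavy job that hits its cap is refused more memory while other jobs are unaffected.

```python
from dataclasses import dataclass


@dataclass
class JobBudget:
    """Hypothetical per-job memory budget, tracked in MiB."""
    name: str
    limit_mib: int
    used_mib: int = 0

    def try_reserve(self, mib: int) -> bool:
        """Grant the request only if the job stays within its own limit."""
        if self.used_mib + mib > self.limit_mib:
            return False
        self.used_mib += mib
        return True

    def utilization(self) -> float:
        """Fraction of the budget in use, for job-level monitoring."""
        return self.used_mib / self.limit_mib


heavy = JobBudget("heavy_join", limit_mib=1024)
critical = JobBudget("fraud_alerts", limit_mib=512)

heavy_ok = heavy.try_reserve(900)        # fits within the 1024 MiB cap
heavy_denied = heavy.try_reserve(200)    # 900 + 200 exceeds the cap, refused
critical_ok = critical.try_reserve(400)  # unaffected by the heavy job

for job in (heavy, critical):
    print(job.name, f"{job.utilization():.0%}")
```

The same per-job counters that enforce the limit also feed monitoring, which is why limits and job-level visibility tend to arrive together.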

Other Notable Improvements

Enhanced Connector Support: Improved stability and performance for Kafka, Pulsar, and Kinesis connectors.
SQL Enhancements: Support for more window functions and improved query optimization for complex joins.
Dashboard Updates: The RisingWave dashboard now provides more detailed insights into cluster health and job performance.

Get Started with v2.8

RisingWave v2.8 is available now. You can try it out via:

RisingWave Cloud: The easiest way to run RisingWave. Sign up for a free account.
Docker: docker pull risingwavelabs/risingwave:v2.8.0
Binary: Download from our GitHub releases page.

For a full list of changes, check out the v2.8 Release Notes.