kelsey-deltastream for DeltaStream

Posted on Dec 13 • Originally published at deltastream.io on Nov 13

What’s Coming in Apache Flink 2.0?

#technology

What’s Coming in Apache Flink 2.0?

As champions for Apache Flink, we are excited for the 2.0 release and all that it will bring. Apache Flink 1.0 was released in 2016, and while we don’t have an exact release date, it looks like 2.0 will be released in late 2024/early 2025. Version 1.2 was just released in August 2024. Version 2.0 is set to be a major milestone release, marking a significant evolution in the stream processing framework. This blog runs down some of the key features and changes coming in Flink 2.0.

Disaggregated State Storage and Management

One of the most exciting features of Flink 2.0 is the introduction of disaggregated state storage and management. It will utilize a Distributed File System (DFS) as the primary storage for state data. This architecture separates compute and storage resources, addressing key scalability and performance needs for large-scale, cloud-native data processing.

Core Advantages of Disaggregated State Storage

Improved Scalability By decoupling storage from compute resources, Flink can manage massive datasets—into the hundreds of terabytes—without being constrained by local storage. This separation enables efficient scaling in containerized and cloud environments.
Enhanced Recovery and Rescaling The new architecture supports faster state recovery on job restarts, efficient fault tolerance, and quicker job rescaling with minimal downtime. Key components include shareable checkpoints and LazyRestore for on-demand state recovery.
Optimized I/O Performance Flink 2.0 uses asynchronous execution and grouped remote state access to minimize the latency impact of remote storage. A hybrid caching mechanism can improve cache efficiency, providing up to 80% better throughput than traditional file-level caching.
Improved Batch Processing Disaggregated state storage enhances batch processing by better handling large state data and integrating batch and stream processing tasks, making Flink more versatile across diverse workloads.
Dynamic Resource Management The architecture enables flexible resource allocation, minimizing CPU and network usage spikes during maintenance tasks like compaction and cleanup.

API and Configuration Changes

Several API and configuration changes will be introduced, including:

Removal of deprecated APIs, including the DataSet API and Scala versions of DataStream and DataSet APIs
Deprecation of the legacy SinkFunction API in favor of the Unified Sink API
Overhaul of the configuration layer, enhancing user-friendliness and maintainability
Introduction of new abstractions such as Materialized Tables in v1.2 and further enhanced in v2
Updates to configuration options, including proper type usage (e.g., Duration, Enum, Int)

Modernization and Unification

Flink 2.0 aims to further unify batch and stream processing:

Modernization of legacy components, such as replacing the legacy SinkFunction with the new Unified Sink API
Enhanced features that combine batch and stream processing seamlessly
Improvements to Adaptive Batch Execution for optimizing logical and physical plans

Performance Improvements

The community is working on making Flink’s performance on bounded streams (batch use cases) competitive with dedicated batch processors. This can further simplify your data processing stack.

Dynamic Partition Pruning (DPP) to minimize I/O costs
Runtime Filter to reduce I/O and shuffle costs
Operator Fusion CodeGen to improve query execution performance

Cloud-Native Focus

Flink 2.0 is being designed with cloud-native architectures in mind:

Improved efficiency in containerized environments
Better scalability for large state sizes
More efficient fault tolerance and faster rescaling

Summary of Flink 2.0

This is an exciting time for Apache Flink 2.0. It represents a significant leap forward in unified batch and stream processing, focusing on cloud-native architectures, improved performance, and streamlined APIs. These changes aim to address the evolving needs of data-driven applications and set new standards for what’s possible in data processing. DeltaStream is proudly powered by Apache Flink, which makes it easy to start running Flink in minutes. Get a free trial of DeltaStream and see for yourself.

The post What’s Coming in Apache Flink 2.0? appeared first on DeltaStream.

DEV Community

What’s Coming in Apache Flink 2.0?

What’s Coming in Apache Flink 2.0?

Disaggregated State Storage and Management

Core Advantages of Disaggregated State Storage

API and Configuration Changes

Modernization and Unification

Performance Improvements

Cloud-Native Focus

Summary of Flink 2.0

Top comments (0)

Read next

Understanding React Hooks: A Guide to Modern React Development

Mastering React's useReducer Hook: Managing Complex State with Actions

Apollo Client for GraphQL State Management in React: Simplifying Data Fetching and Caching

How do I display images from Google Drive on a website?