<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Thriving.dev | Software Architecture</title>
    <description>The latest articles on DEV Community by Thriving.dev | Software Architecture (@thriving-dev).</description>
    <link>https://dev.to/thriving-dev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1092852%2F517e5c19-f355-4a80-a2e0-1df51b79017c.png</url>
      <title>DEV Community: Thriving.dev | Software Architecture</title>
      <link>https://dev.to/thriving-dev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/thriving-dev"/>
    <language>en</language>
    <item>
      <title>Reduce Rebalance Downtime (by x450) for Stateless Kafka Streams Apps [Simple Steps]</title>
      <dc:creator>Thriving.dev | Software Architecture</dc:creator>
      <pubDate>Sun, 18 Jun 2023 22:43:22 +0000</pubDate>
      <link>https://dev.to/thriving-dev/reduce-rebalance-downtime-by-x450-for-stateless-kafka-streams-apps-simple-steps-4mi6</link>
      <guid>https://dev.to/thriving-dev/reduce-rebalance-downtime-by-x450-for-stateless-kafka-streams-apps-simple-steps-4mi6</guid>
      <description>&lt;p&gt;In this post, we’ll &lt;strong&gt;learn how Kafka Streams Consumers behave differently from regular Kafka Consumers&lt;/strong&gt;, the consequences for the application, as well as steps to minimise downtimes in event processing when consumer group members change.&lt;/p&gt;

&lt;p&gt;With the &lt;strong&gt;default configuration&lt;/strong&gt;, a containerised &lt;em&gt;stateless&lt;/em&gt; &lt;strong&gt;Streams app pauses processing for &amp;gt;45s&lt;/strong&gt; when one app instance (group member) is removed or restarted.&lt;br&gt;
For real-time data streaming workloads with a low e2e latency as an NFR (non-functional requirement), such a long ‘rebalance downtime’ is often unacceptable.&lt;/p&gt;

&lt;p&gt;Fortunately, there’s a simple yet efficient solution to address this problem.&lt;/p&gt;

&lt;p&gt;As a bonus, we will look under the hood of Kafka Consumer Groups, the Group Coordinator &amp;amp; Rebalance Protocol, and measure, analyse and evaluate a simulation of a group member (replica) re-creation running on Kubernetes.&lt;/p&gt;

&lt;p&gt;...&lt;strong&gt;TLDR&lt;/strong&gt;? here's a &lt;strong&gt;spoiler&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"internal.leave.group.on.close"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Theory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Regular Consumer Behaviour
&lt;/h3&gt;

&lt;p&gt;Let’s briefly recap Kafka Consumers and Consumer groups.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;An Apache Kafka® &lt;strong&gt;Consumer&lt;/strong&gt; is a client application that subscribes to (reads and processes) events.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;consumer group&lt;/strong&gt; is a set of consumers which cooperate to consume data from some topics. The partitions of all the topics are divided among the consumers in the group. As new group members arrive and old members leave, the partitions are re-assigned so that each member receives a proportional share of the partitions. This is known as &lt;em&gt;rebalancing&lt;/em&gt; the group.&lt;/p&gt;

&lt;p&gt;(…) One of the brokers is designated as the group’s &lt;strong&gt;coordinator&lt;/strong&gt; and is responsible for managing the members of the group as well as their partition assignments.&lt;/p&gt;

&lt;p&gt;(…) When the consumer starts up, it finds the coordinator for its group and sends a request to join the group. The coordinator then begins a group rebalance so that the new member is assigned its fair share of the group’s partitions. Every rebalance results in a new &lt;strong&gt;generation&lt;/strong&gt; of the group.&lt;/p&gt;

&lt;p&gt;Each member in the group must send heartbeats to the coordinator in order to remain a member of the group. If no heartbeat is received before expiration of the configured &lt;strong&gt;session timeout&lt;/strong&gt;, then the coordinator will kick the member out of the group and reassign its partitions to another member.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;(Source: &lt;a href="https://docs.confluent.io/platform/current/clients/consumer.html"&gt;Kafka Consumer | Confluent Documentation&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;When a consumer leaves a group due to a controlled shutdown or a crash, its partitions are automatically reassigned to other consumers. Similarly, when a consumer (re)joins an existing group, all partitions are rebalanced among the group members. This dynamic group cooperation is facilitated by the &lt;strong&gt;Kafka Rebalance Protocol&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For a rebalance scenario where one instance is stopped, the Consumer sends a &lt;code&gt;LeaveGroup&lt;/code&gt; request to the coordinator before stopping (as part of a graceful shutdown, &lt;code&gt;Consumer#close()&lt;/code&gt;), which triggers a rebalance.&lt;/p&gt;

&lt;p&gt;During the entire rebalancing process, i.e. as long as the partitions are not reassigned, consumers no longer process any data. Fortunately, rebalancing is very fast, typically taking anywhere from 50ms to a few seconds. The duration varies with factors such as the load on your Kafka cluster or the complexity of your Streams topology (no. of input topics, streams tasks := partitions, state stores, … -&amp;gt; total no. of consumers).&lt;/p&gt;

&lt;h3&gt;
  
  
  != Streams Consumer Behaviour
&lt;/h3&gt;

&lt;p&gt;For Kafka Streams, some config properties are overridden via &lt;a href="https://github.com/apache/kafka/blob/3.4.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L1115"&gt;StreamsConfig.CONSUMER_DEFAULT_OVERRIDES&lt;/a&gt;. One of those properties is &lt;code&gt;"internal.leave.group.on.close"&lt;/code&gt;, which Streams sets to &lt;code&gt;false&lt;/code&gt; (for &lt;em&gt;regular&lt;/em&gt; Consumers, it defaults to &lt;code&gt;true&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Please note it’s a &lt;em&gt;non-public&lt;/em&gt; config, which may change without prior notice with new releases.&lt;br&gt;
Reference: &lt;a href="https://github.com/apache/kafka/blob/3.4.1/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L300"&gt;ConsumerConfig.LEAVE_GROUP_ON_CLOSE_CONFIG&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This means Consumers will not send &lt;code&gt;LeaveGroup&lt;/code&gt; requests when stopped but will be removed by the coordinator only when the Consumer session times out (ref. &lt;code&gt;session.timeout.ms&lt;/code&gt;).&lt;br&gt;
The &lt;strong&gt;default Consumer session timeout is 45s&lt;/strong&gt; (note: was 10s before the Kafka 3.0.0 release, ref&lt;br&gt;
&lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-735%3A+Increase+default+consumer+session+timeout"&gt;KIP-735&lt;/a&gt;). Consequently, no data is processed for more than 45 seconds for tasks assigned to the Consumer that had been stopped.&lt;/p&gt;

&lt;p&gt;It gets even worse if a replacement Consumer joins the group before the stopped member’s session has timed out: a rebalance is triggered, but task assignment is blocked until the timeout is exceeded and the coordinator evicts the old, stopped Consumer from the group. Until then, processing comes to a complete halt for all tasks, also known as ‘stop-the-world’ rebalancing. While the ‘incremental cooperative rebalancing protocol’ introduced with Kafka 2.4 avoids ‘stop-the-world’ rebalancing for &lt;em&gt;regular&lt;/em&gt; Consumers, the mentioned Kafka Streams overrides nullify some of its benefits.&lt;/p&gt;
&lt;h2&gt;
  
  
  Example Scenario: Kubernetes Pod Evicted … and Replaced
&lt;/h2&gt;

&lt;p&gt;Running your apps on Kubernetes goes a long way towards a robust, highly-available deployment. Kubernetes monitors your containers’ health, allows you to scale, and ensures all desired replicas are up and running according to your spec.&lt;/p&gt;

&lt;p&gt;But still, to be truly elastic and minimise downtime of your data stream processing, your application must be able to handle Pods (/containers) being restarted, evicted, and re-created gracefully.&lt;br&gt;
There are many potential causes, e.g. application upgrades (CI/CD), k8s cluster security patching, (auto-)scaling, resource shortage, or k8s nodes running on Spot instances being interrupted.&lt;/p&gt;

&lt;p&gt;Next, we look at a simple yet common example.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example infrastructure setup:&lt;/strong&gt; Stateless Kafka Streams app, 6 streams tasks, running on Kubernetes as Deployment, with 3 replicas.&lt;/p&gt;
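&lt;p&gt;To make ‘balanced task assignment’ concrete: 6 streams tasks spread over 3 replicas means 2 tasks per instance. Here’s a simplified, hypothetical sketch (plain round-robin; the actual Streams task assignor is more sophisticated, and the class/method names are made up for illustration):&lt;br&gt;&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TaskSpread {
    // Simplified sketch: spread partitions round-robin over the replicas.
    // Returns a map of replica index to its list of assigned partitions.
    static Map assign(int partitions, int replicas) {
        Map assignment = new HashMap();
        for (int p = 0; p != partitions; p++) {
            int member = p % replicas; // round-robin
            List tasks = (List) assignment.get(member);
            if (tasks == null) {
                tasks = new ArrayList();
                assignment.put(member, tasks);
            }
            tasks.add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // 6 partitions over 3 replicas: 2 tasks each,
        // e.g. replica 1 is assigned partitions [1, 4]
        System.out.println(assign(6, 3));
    }
}
```

&lt;p&gt;With this sketch, replica 1 happens to end up with partitions &lt;code&gt;[1,4]&lt;/code&gt; - the same partitions that build up lag when Pod (P1.1) is terminated in the simulation.&lt;/p&gt;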

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--PPgj7Lgz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h1515988wn8buljegrgr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PPgj7Lgz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/h1515988wn8buljegrgr.png" alt="'Software Architecture' / 'Kubernetes Deployment' diagram, showing the setup of the simulation - Step 1" width="800" height="471"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--SH4tlKH5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bc8aricmh2gxghuszuu3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--SH4tlKH5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bc8aricmh2gxghuszuu3.png" alt="'Software Architecture' / 'Kubernetes Deployment' diagram, showing the setup of the simulation - Step 2" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iow3SJ8W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tzzcjegfu4ai72deziav.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iow3SJ8W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tzzcjegfu4ai72deziav.png" alt="'Software Architecture' / 'Kubernetes Deployment' diagram, showing the setup of the simulation - Step 3" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AbmG9d1c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sz99n36z2zo2mihgyf6y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AbmG9d1c--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sz99n36z2zo2mihgyf6y.png" alt="'Software Architecture' / 'Kubernetes Deployment' diagram, showing the setup of the simulation - Step 4" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scenario:&lt;/strong&gt; One pod is terminated and subsequently replaced.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Initial state, all 3 pods are running &amp;amp; healthy, the streams app is processing, balanced task assignment&lt;/li&gt;
&lt;li&gt;Pod (P1.1) terminated (deleted) by k8s, shutting down gracefully&lt;/li&gt;
&lt;li&gt;A replacement Pod (P1.2) is scheduled &amp;amp; placed&lt;/li&gt;
&lt;li&gt;Final state, the replacement Pod is running &amp;amp; healthy, the streams app is processing, balanced task assignment&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I would like to share a screenshot depicting the consumer lag metrics, rendered in Grafana, for a simulation of our scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V1wxrqFv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nsy66wfzdx2z6pik0fwn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V1wxrqFv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nsy66wfzdx2z6pik0fwn.png" alt="Screenshot depicting the consumer lag metrics, rendered in Grafana - Kafka Streams default config" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s walk through the results and explain the behaviour:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;18:34:00:&lt;/strong&gt; the Pod (P1.1) is terminated and stops processing. The consumer lag of partitions &lt;code&gt;[1,4]&lt;/code&gt; starts to build up&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18:34:20:&lt;/strong&gt; the replacement Pod (P1.2) has come up; its Consumer sends a &lt;code&gt;JoinGroup&lt;/code&gt; request to the group coordinator&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18:34:21:&lt;/strong&gt; rebalancing is triggered and assignments are revoked; the rebalance stalls because no heartbeats are received from (P1.1), whose session has not yet timed out&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18:34:21:&lt;/strong&gt; all consumers pause processing, waiting for assignment; lag starts to build up for all partitions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18:34:45:&lt;/strong&gt; the session of (P1.1) has timed out; rebalancing completes with a new assignment, and processing resumes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;18:34:48:&lt;/strong&gt; all consumers caught up; consumer lags are back to healthy jitter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To better illustrate everything that is happening over time, here’s a time bar diagram highlighting all important steps:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GUnoJ8fE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6mxos1y0qdgojl1yiqm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GUnoJ8fE--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6mxos1y0qdgojl1yiqm.png" alt="Diagram illustrating the rebalancing behaviour for a k8s Pod recreation - Kafka Streams default config" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the corresponding application logs for the rebalancing, which occurred at &lt;code&gt;16:34:44.988&lt;/code&gt; and took &lt;strong&gt;92ms&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2023-06-17 16:34:44,988 INFO State transition from RUNNING to REBALANCING
2023-06-17 16:34:45,080 INFO State transition from REBALANCING to RUNNING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we can conclude the following downtimes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;partitions &lt;code&gt;[1,4]&lt;/code&gt; (owned by the terminated Pod): &lt;strong&gt;48s&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;all other partitions &lt;code&gt;[0,2,3,5]&lt;/code&gt;: &lt;strong&gt;25s&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;…while the actual rebalancing took only &lt;strong&gt;92ms&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  😵 Wait, 48s? Really???
&lt;/h2&gt;

&lt;p&gt;Depending on your stream processing use case, 45s+ downtime might be no big deal, but &lt;strong&gt;for real-time low-latency data streams&lt;/strong&gt;, it’s a massive &lt;strong&gt;breach of the NFR&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So let’s see what options we’ve got to mitigate:&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: Lower consumer session timeout
&lt;/h3&gt;

&lt;p&gt;Since the session timeout determines the downtime, one way to mitigate is to reduce &lt;code&gt;session.timeout.ms&lt;/code&gt;.&lt;br&gt;
Don’t forget to decrease the value of &lt;code&gt;heartbeat.interval.ms&lt;/code&gt; to ensure three heartbeats plus a buffer can fit within the timeout period.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;session.timeout.ms=6000
heartbeat.interval.ms=1500
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Read the config here: &lt;a href="https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html"&gt;Kafka Consumer Configurations&lt;/a&gt;&lt;/p&gt;
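&lt;p&gt;For a Kafka Streams app, the same settings can be expressed in Java using the &lt;code&gt;consumer.&lt;/code&gt; prefix (what &lt;code&gt;StreamsConfig.consumerPrefix(...)&lt;/code&gt; produces), which routes them to the embedded consumers. A minimal sketch; the class/method names are made up for illustration:&lt;br&gt;&lt;/p&gt;

```java
import java.util.Properties;

public class LowerSessionTimeoutConfig {
    // Option 1 sketch: shorten the session timeout from the 45s default to 6s,
    // so a crashed member is evicted (and its partitions reassigned) sooner.
    static Properties overrides() {
        Properties props = new Properties();
        props.setProperty("consumer.session.timeout.ms", "6000");
        // Keep heartbeat.interval.ms at no more than a third of the session
        // timeout, so several heartbeats fit before the member is evicted.
        props.setProperty("consumer.heartbeat.interval.ms", "1500");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(overrides());
    }
}
```

&lt;p&gt;The trade-off: a lower session timeout makes the group more sensitive to long GC pauses or transient network hiccups, which can cause spurious rebalances.&lt;/p&gt;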

&lt;h3&gt;
  
  
  Option 2: Enable ‘leaveGroupOnClose’
&lt;/h3&gt;

&lt;p&gt;…but why work with timeouts when it’s perfectly valid to have your &lt;em&gt;stateless&lt;/em&gt; Streams Consumers notify the coordinator when closing down?!?&lt;/p&gt;

&lt;p&gt;To enable ‘leaveGroupOnClose’ (overriding the &lt;a href="https://github.com/apache/kafka/blob/3.4.1/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java#L1115"&gt;override&lt;/a&gt; 😜), configure your Kafka Streams app with the following property:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;internal.leave.group.on.close=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Warn: Please note it’s a &lt;em&gt;non-public&lt;/em&gt; config, which may change without prior notice with new releases.&lt;br&gt;
Reference: &lt;a href="https://github.com/apache/kafka/blob/3.4.1/clients/src/main/java/org/apache/kafka/clients/consumer/ConsumerConfig.java#L300"&gt;ConsumerConfig.LEAVE_GROUP_ON_CLOSE_CONFIG&lt;/a&gt;.&lt;/p&gt;
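&lt;p&gt;Putting it together, a minimal sketch of the properties for a &lt;em&gt;stateless&lt;/em&gt; Streams app (application id and bootstrap servers are placeholders; the properties would then be passed to &lt;code&gt;new KafkaStreams(topology, props)&lt;/code&gt;):&lt;br&gt;&lt;/p&gt;

```java
import java.util.Properties;

public class StatelessStreamsAppConfig {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "my-stateless-app");  // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        // Override the Streams-internal default (false) so a graceful close
        // sends a LeaveGroup request and triggers an immediate rebalance.
        // Non-public config: re-check it on every Kafka upgrade!
        props.put("internal.leave.group.on.close", true);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProps());
    }
}
```

&lt;p&gt;A JVM shutdown hook calling &lt;code&gt;KafkaStreams#close()&lt;/code&gt; completes the picture, since the &lt;code&gt;LeaveGroup&lt;/code&gt; request is only sent on a graceful shutdown.&lt;/p&gt;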
&lt;h2&gt;
  
  
  Re-do the Example with ‘leaveGroupOnClose’ 🚀
&lt;/h2&gt;

&lt;p&gt;Drum roll 🥁 … and here, without further ado, the results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--wGCJEgOz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tno3m57196jy2h75k7lu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wGCJEgOz--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tno3m57196jy2h75k7lu.png" alt="Screenshot depicting the consumer lag metrics, rendered in Grafana - Kafka Streams with 'internal.leave.group.on.close=true'" width="800" height="602"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can (&lt;em&gt;not&lt;/em&gt;) see - the two rebalances complete so quickly that not even the slightest consumer lag increase is visible in the metrics.&lt;/p&gt;

&lt;p&gt;Here’s the visual explanation:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JUCLvTnB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sukg3nlv036v47n6el74.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JUCLvTnB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sukg3nlv036v47n6el74.png" alt="Diagram illustrating the rebalancing behaviour for a k8s Pod recreation - Kafka Streams with 'internal.leave.group.on.close=true'" width="800" height="776"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, here are the application logs showing the timings of the rebalancing, which happened twice: once at &lt;code&gt;17:46:00.332&lt;/code&gt;, taking &lt;strong&gt;92ms&lt;/strong&gt;, and again at &lt;code&gt;17:46:21.361&lt;/code&gt;, taking &lt;strong&gt;97ms&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2023-06-17 17:46:00,332 INFO State transition from RUNNING to REBALANCING
2023-06-17 17:46:00,424 INFO State transition from REBALANCING to RUNNING
2023-06-17 17:46:21,361 INFO State transition from RUNNING to REBALANCING
2023-06-17 17:46:21,458 INFO State transition from REBALANCING to RUNNING
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pro Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stateless &amp;lt;&amp;gt; Stateful
&lt;/h3&gt;

&lt;p&gt;This post recommends setting &lt;code&gt;internal.leave.group.on.close=true&lt;/code&gt; for &lt;strong&gt;stateless (!)&lt;/strong&gt; Kafka Streams applications.&lt;/p&gt;

&lt;p&gt;Before implementing &lt;code&gt;internal.leave.group.on.close=true&lt;/code&gt; for stateful applications, it is crucial to understand all potential consequences.&lt;/p&gt;

&lt;p&gt;Info: Unfortunately, my evaluation of &lt;code&gt;internal.leave.group.on.close=true&lt;/code&gt; in combination with standby replicas was not very promising.&lt;br&gt;&lt;br&gt;
The expected seamless task re-assignment to a hot standby while one replica "restarts" - with subsequent re-distribution of tasks - does not work.&lt;/p&gt;

&lt;p&gt;The Kafka Streams-specific &lt;code&gt;HighAvailabilityTaskAssignor&lt;/code&gt; has known issues, such as uneven task assignment, frozen warmup tasks (‘task movement’), and failing to recognise caught-up standby tasks when the consumer group changes.&lt;br&gt;
Please note there are plans to address those issues with the next version of the Consumer Rebalance Protocol (see footnotes).&lt;/p&gt;

&lt;p&gt;Often, the best plan for keeping rebalance downtimes low for &lt;em&gt;stateful&lt;/em&gt; apps is to stick with RocksDB + StatefulSet + PersistentVolumes and to restart within (!) the session timeout&lt;br&gt;&lt;br&gt;
=&amp;gt; the member re-joins with its previous assignment, re-uses the RocksDB state, and avoids rebalancing entirely... &lt;/p&gt;
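&lt;p&gt;As a sketch of that setup, the relevant parts of such a StatefulSet could look like this (all names and sizes are placeholders):&lt;br&gt;&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-stateful-streams-app        # placeholder
spec:
  replicas: 3
  serviceName: my-stateful-streams-app
  selector:
    matchLabels:
      app: my-stateful-streams-app
  template:
    metadata:
      labels:
        app: my-stateful-streams-app
    spec:
      containers:
        - name: app
          image: registry.example.com/streams-app:latest   # placeholder
          volumeMounts:
            - name: state
              mountPath: /var/lib/kafka-streams   # mount for the app's state.dir
  volumeClaimTemplates:                 # one PersistentVolume per replica,
    - metadata:                         # re-attached across Pod restarts
        name: state
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi               # placeholder
```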

&lt;p&gt;Tip: Alternatively, take a look at &lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store"&gt;kafka-streams-cassandra-state-store&lt;/a&gt;, introduced in an &lt;a href="https://dev.to/thriving-dev/introducing-kafka-streams-cassandra-state-store-159d"&gt;earlier blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  k8s Deployment .spec.minReadySeconds
&lt;/h3&gt;

&lt;p&gt;Frequently rebalancing within a short timeframe can cause consumer delays and strain the Kafka cluster.&lt;/p&gt;

&lt;p&gt;If your application/container has a quick restart time, such as when running as a GraalVM native executable, it’s worth setting &lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#min-ready-seconds"&gt;.spec.minReadySeconds&lt;/a&gt; to ensure upgrades roll out in a controlled manner and to prevent several rebalances in quick succession.&lt;/p&gt;
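&lt;p&gt;A sketch of the relevant Deployment fields (names and values are placeholders to adjust to your rollout needs):&lt;br&gt;&lt;/p&gt;

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-streams-app        # placeholder
spec:
  replicas: 3
  minReadySeconds: 30         # a new Pod must be Ready for 30s before the
                              # rollout proceeds, spacing out the rebalances
  template:
    # ... Pod template as usual
```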

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;By configuring your Kafka Streams app with &lt;code&gt;internal.leave.group.on.close=true&lt;/code&gt;, a graceful &lt;strong&gt;shutdown immediately triggers a rebalancing process&lt;/strong&gt; and tasks are re-assigned to other active members within the group.&lt;br&gt;
The &lt;strong&gt;processing downtime is significantly reduced&lt;/strong&gt; while &lt;strong&gt;also improving elasticity and resilience&lt;/strong&gt;. As a result, your application enables interruption-free CI/CD and can be auto-scaled.&lt;/p&gt;

&lt;p&gt;Please note that this recommendation only applies to &lt;em&gt;stateless&lt;/em&gt; streams applications! Tread carefully for stateful topologies, and do your homework!&lt;/p&gt;

&lt;p&gt;Remember that &lt;code&gt;internal.leave.group.on.close&lt;/code&gt; is a &lt;em&gt;non-public&lt;/em&gt; config, which may change without prior notice with new releases. Always check the source code for changes when upgrading the Kafka Streams dependency.&lt;/p&gt;

&lt;h2&gt;
  
  
  Footnotes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;When writing this blog post, the latest version of kafka-streams was 3.4.1.&lt;/li&gt;
&lt;li&gt;There’s a ticket &lt;a href="https://issues.apache.org/jira/browse/KAFKA-6995"&gt;KAFKA-6995&lt;/a&gt; from June 2018 proposing to make the config public.
The ticket is closed as &lt;strong&gt;‘Won’t Fix’&lt;/strong&gt;. Concerns of the core developer team can be found in the discussion.&lt;/li&gt;
&lt;li&gt;Looking into the crystal ball: A Kafka Improvement Proposal (KIP) is in progress to introduce a new group membership and rebalance protocol for the Kafka Consumer and, by extension, Kafka Streams.
=&amp;gt; &lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-848:+The+Next+Generation+of+the+Consumer+Rebalance+Protocol"&gt;KIP-848: The Next Generation of the Consumer Rebalance Protocol&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;It was also presented at &lt;em&gt;Current 2022&lt;/em&gt;: &lt;a href="https://www.confluent.io/en-gb/events/current-2022/the-next-generation-of-the-consumer-rebalance-protocol/"&gt;The Next Generation of the Consumer Rebalance Protocol With David Jacot | UK&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;The application + docker-compose setup that was put together for this article can be found on the thriving-dev GitHub Organisation:
&lt;a href="https://github.com/thriving-dev/kafka-streams-leave-group-on-close"&gt;https://github.com/thriving-dev/kafka-streams-leave-group-on-close&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Many thanks to &lt;a href="https://twitter.com/MatthiasJSax"&gt;@MatthiasJSax&lt;/a&gt; for proofreading the blog post! 🙇&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References and Further Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/lydtech-consulting/kafka-consumer-group-rebalance-1-of-2-7a3e00aa3bb4"&gt;Kafka Consumer Group Rebalance (1 of 2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/lydtech-consulting/kafka-consumer-group-rebalance-2-of-2-5d1d60c71e6e"&gt;Kafka Consumer Group Rebalance (2 of 2)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/54398754/kafka-streams-delay-to-kick-rebalancing-on-consumer-graceful-shutdown"&gt;Kafka-streams delay to kick rebalancing on consumer graceful shutdown - Stack Overflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.confluent.io/blog/cooperative-rebalancing-in-kafka-streams-consumer-ksqldb/"&gt;Cooperative Rebalancing in the Kafka Consumer, Streams &amp;amp; ksqlDB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2"&gt;Apache Kafka Rebalance Protocol, or the magic behind your streams applications | by Florian Hussonnois | StreamThoughts | Medium&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-812%3A+Introduce+another+form+of+the+%60KafkaStreams.close%28%29%60+API+that+forces+the+member+to+leave+the+consumer+group"&gt;KIP-812: Introduce another form of the &lt;code&gt;KafkaStreams.close()&lt;/code&gt; API that forces the member to leave the consumer group - Apache Kafka - Apache Software Foundation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post was originally published on &lt;a href="https://thriving.dev/blog/reduce-rebalance-downtime-for-stateless-kafka-streams-apps"&gt;Thriving.dev&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>kafkastreams</category>
      <category>performance</category>
      <category>streamprocessing</category>
    </item>
    <item>
      <title>Introducing 'kafka-streams-cassandra-state-store'</title>
      <dc:creator>Thriving.dev | Software Architecture</dc:creator>
      <pubDate>Fri, 02 Jun 2023 15:06:32 +0000</pubDate>
      <link>https://dev.to/thriving-dev/introducing-kafka-streams-cassandra-state-store-159d</link>
      <guid>https://dev.to/thriving-dev/introducing-kafka-streams-cassandra-state-store-159d</guid>
      <description>&lt;p&gt;The Java library to be introduced - &lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store"&gt;thriving-dev/kafka-streams-cassandra-state-store&lt;/a&gt;  - is a Kafka Streams State Store implementation that persists data to Apache Cassandra. &lt;/p&gt;

&lt;p&gt;It's a 'drop-in' replacement for the official Kafka Streams state store solutions, notably &lt;em&gt;RocksDB&lt;/em&gt; (default) and &lt;em&gt;InMemory&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;By moving the state to an &lt;em&gt;external&lt;/em&gt; datastore the &lt;strong&gt;stateful streams app&lt;/strong&gt; (from a deployment point of view) &lt;strong&gt;effectively becomes &lt;em&gt;stateless&lt;/em&gt;&lt;/strong&gt; - which greatly &lt;strong&gt;improves elasticity, reduces rebalancing downtimes &amp;amp; failure recovery&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Cassandra/ScyllaDB is horizontally scalable and can handle huge amounts of data, giving your existing Kafka Streams application a boost with very little change to the source code.&lt;/p&gt;

&lt;p&gt;In addition to the &lt;code&gt;CassandraKeyValueStore&lt;/code&gt;, this post will also cover all out-of-the-box state store solutions, explaining individual characteristics, benefits, drawbacks, and limitations in detail.&lt;/p&gt;

&lt;p&gt;Following the introduction and getting started guide, there's also a &lt;strong&gt;demo&lt;/strong&gt; available.&lt;br&gt;&lt;br&gt;
If you don't want to wait, feel free to head over to the &lt;a href="https://www.youtube.com/@thriving_dev"&gt;Thriving.dev YouTube Channel&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The first public release was on 9 January 2023.&lt;br&gt;
When writing this blog post the latest version was: &lt;code&gt;0.4.0&lt;/code&gt; - available on &lt;a href="https://central.sonatype.com/artifact/dev.thriving.oss/kafka-streams-cassandra-state-store"&gt;Maven Central&lt;/a&gt;!&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Basics Recap
&lt;/h2&gt;

&lt;p&gt;(Feel free to skip straight to the next section if you're already familiar with Kafka Streams and Apache Cassandra…)&lt;/p&gt;
&lt;h3&gt;
  
  
  Kafka Streams
&lt;/h3&gt;

&lt;p&gt;Quoting &lt;a href="https://en.wikipedia.org/wiki/Apache_Kafka#Streams_API"&gt;Apache Kafka - Wikipedia&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Kafka Streams (or Streams API) is a stream-processing library written in Java. It was added in the Kafka 0.10.0.0 release. The library allows for the development of stateful stream-processing applications that are scalable, elastic, and fully fault-tolerant. The main API is a stream-processing  domain-specific language  (DSL) that offers high-level operators like filter,  map, grouping, windowing, aggregation, joins, and the notion of tables. Additionally, the Processor API can be used to implement custom operators for a more low-level development approach. The DSL and Processor API can be mixed, too. For stateful stream processing, Kafka Streams uses  RocksDB  to maintain local operator state. Because RocksDB can write to disk, the maintained state can be larger than available main memory. For fault-tolerance, all updates to local state stores are also written into a topic in the Kafka cluster. This allows recreating state by reading those topics and feed all data into RocksDB.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In case you are entirely new to Kafka Streams, I recommend getting started by reading some official materials provided by Confluent, e.g. &lt;a href="https://docs.confluent.io/platform/current/streams/introduction.html"&gt;Introduction Kafka Streams API&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Apache Cassandra
&lt;/h3&gt;

&lt;p&gt;Quoting &lt;a href="https://en.wikipedia.org/wiki/Apache_Cassandra"&gt;Apache Cassandra - Wikipedia&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Apache Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.”&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Purpose
&lt;/h2&gt;

&lt;p&gt;While Wikipedia’s summary (see above) only mentions RocksDB, Kafka Streams ships with the following &lt;code&gt;KeyValueStore&lt;/code&gt; implementations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;org.apache.kafka.streams.state.internals.RocksDBStore&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;org.apache.kafka.streams.state.internals.InMemoryKeyValueStore&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;org.apache.kafka.streams.state.internals.MemoryLRUCache&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s look at the traits of each store implementation in more detail…&lt;/p&gt;
&lt;h3&gt;
  
  
  RocksDBStore
&lt;/h3&gt;

&lt;p&gt;RocksDB is the default state store for Kafka Streams.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;RocksDBStore&lt;/code&gt; is a &lt;em&gt;persistent&lt;/em&gt; key-value store based on &lt;a href="https://rocksdb.org/"&gt;RocksDB&lt;/a&gt; (surprise!). State is flushed to disk, allowing the state to exceed the size of available memory.&lt;/p&gt;

&lt;p&gt;Since the state is persisted to disk, it can be re-used and does not need to be restored (changelog topic replay) when the application instance comes up after a restart (e.g. following an upgrade, instance migration, or failure).&lt;br&gt;
The RocksDB state store provides good performance and is well configured out of the box, but might need to be tuned for certain use cases (no small feat, as it requires an understanding of RocksDB configuration). Writing to and reading from disk comes with I/O; for performance reasons, buffering and caching patterns are in place. The record cache (on heap) is particularly useful for optimising writes by reducing the number of updates to local state and changelog topics, while the RocksDB block cache (off heap) optimises reads.&lt;/p&gt;
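&lt;p&gt;As a minimal sketch of the wiring involved (not of the tuning itself): custom RocksDB tuning is hooked into Kafka Streams via the &lt;code&gt;rocksdb.config.setter&lt;/code&gt; property, which names a class implementing &lt;code&gt;RocksDBConfigSetter&lt;/code&gt;. The class name and application id below are hypothetical placeholders.&lt;/p&gt;

```java
import java.util.Properties;

public class RocksDbTuningWiring {
    public static void main(String[] args) {
        // Streams looks up this property and instantiates the named class
        // (it must implement org.apache.kafka.streams.state.RocksDBConfigSetter).
        // `com.example.CustomRocksDBConfig` is a hypothetical class name.
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");   // placeholder
        props.put("rocksdb.config.setter", "com.example.CustomRocksDBConfig");
        System.out.println(props.getProperty("rocksdb.config.setter"));
    }
}
```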

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Info:&lt;/strong&gt; In a typical modern setup stateful Kafka Streams applications run on Kubernetes as a &lt;em&gt;StatefulSet&lt;/em&gt; with persistent state stores (RocksDB) on &lt;em&gt;PersistentVolumes&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  InMemoryKeyValueStore
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;InMemoryKeyValueStore&lt;/code&gt;, as the name suggests, maintains state &lt;em&gt;in-memory&lt;/em&gt; (RAM).&lt;/p&gt;

&lt;p&gt;One obvious benefit is that a pure in-memory store comes with good performance (it operates in RAM…). Further, hosting and operating are simpler compared to RocksDB, since there is no need to provide and manage disks.&lt;/p&gt;

&lt;p&gt;Drawbacks of keeping the store in-memory are limits on store size and increased infrastructure costs (RAM is more expensive than disk storage). Further, state is always lost on application restart and therefore first needs to be restored from changelog topics (recovery takes longer).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Where low rebalance downtime / quick recovery is a concern, standby replicas (&lt;code&gt;num.standby.replicas&lt;/code&gt;) help to reduce recovery time.&lt;/p&gt;
&lt;/blockquote&gt;
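&lt;p&gt;A minimal sketch of enabling standby replicas (application id and bootstrap servers are placeholder values): with &lt;code&gt;num.standby.replicas=1&lt;/code&gt;, a warm copy of each stateful task’s stores is maintained on another app instance, so a failed-over task only has to catch up on the changelog tail instead of replaying it from scratch.&lt;/p&gt;

```java
import java.util.Properties;

public class StandbyReplicasSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");      // placeholder
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        // keep one warm standby copy of each stateful task's stores
        props.put("num.standby.replicas", "1");
        System.out.println(props.getProperty("num.standby.replicas"));
    }
}
```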
&lt;h3&gt;
  
  
  MemoryLRUCache (&lt;code&gt;Stores.lruMap&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;MemoryLRUCache&lt;/code&gt; is an &lt;em&gt;in-memory&lt;/em&gt; store based on &lt;em&gt;HashMap&lt;/em&gt;. The term &lt;em&gt;cache&lt;/em&gt; comes from the &lt;em&gt;LRU&lt;/em&gt; (least recently used) behaviour combined with the &lt;em&gt;maxCacheSize&lt;/em&gt; cap (per streams task!).&lt;/p&gt;

&lt;p&gt;It’s a rather uncommon choice but can be a valid fit for certain use cases. As with the &lt;em&gt;InMemoryKeyValueStore&lt;/em&gt;, state is always lost on application restart and has to be restored from changelog topics.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: &lt;code&gt;maxCacheSize&lt;/code&gt; applies client-side only (in-memory HashMap, per streams task state store -&amp;gt; the least recently used entry is dropped when the underlying HashMap’s capacity is exceeded) but does not ‘clean up’ the changelog topic (send &lt;em&gt;tombstones&lt;/em&gt;). The (&lt;em&gt;compacted&lt;/em&gt;) changelog topic keeps growing in size while the state available to processing is constrained by &lt;em&gt;maxCacheSize&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Therefore, it is recommended to use it in combination with a custom changelog topic config, &lt;code&gt;cleanup.policy=[compact,delete]&lt;/code&gt; (plus &lt;code&gt;retention.ms&lt;/code&gt;), to have a time-based retention in place that satisfies your functional data requirements (if possible).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;⚠️ Reminder:&lt;/strong&gt; The &lt;em&gt;maxCacheSize&lt;/em&gt; is applied per streams task (~input topic partitions), so take this into account when calculating total capacity, memory requirements per app instance, …&lt;/p&gt;
&lt;/blockquote&gt;
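&lt;p&gt;The recommended changelog topic config can be sketched as a plain config map (the 24h retention is an assumption; adapt it to your data requirements). Such a map can be passed to a store builder via &lt;code&gt;withLoggingEnabled(changelogConfig)&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.Map;

public class ChangelogRetentionSketch {
    public static void main(String[] args) {
        // compaction plus time-based deletion for the (otherwise ever-growing)
        // changelog topic backing an lruMap store
        var changelogConfig = Map.of(
                "cleanup.policy", "compact,delete",
                "retention.ms", "86400000"   // 24 hours (assumed)
        );
        System.out.println(changelogConfig.get("cleanup.policy"));
    }
}
```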
&lt;h3&gt;
  
  
  CassandraKeyValueStore
&lt;/h3&gt;

&lt;p&gt;Now we finally get to the subject of this blog post: a custom implementation of a state store that persists data to Apache Cassandra.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;CassandraKeyValueStore&lt;/code&gt;, data is &lt;strong&gt;persistently stored&lt;/strong&gt; in an external database, Apache Cassandra, or compatible solutions (e.g. &lt;a href="https://www.scylladb.com/"&gt;ScyllaDB&lt;/a&gt;). Apache Cassandra is a distributed, clustered data store that scales horizontally up to petabytes of data; thus &lt;strong&gt;very large Kafka Streams state&lt;/strong&gt; can be accommodated.&lt;/p&gt;

&lt;p&gt;Moving the state into an &lt;em&gt;external&lt;/em&gt; data store - outside the application, so to speak - allows you to effectively run the app in a &lt;strong&gt;&lt;em&gt;stateless&lt;/em&gt;&lt;/strong&gt; fashion. Further, with &lt;a href="https://kafka.apache.org/34/javadoc/org/apache/kafka/streams/state/StoreBuilder.html#withLoggingDisabled()"&gt;logging disabled&lt;/a&gt;, there's &lt;strong&gt;no changelog topic&lt;/strong&gt; and therefore &lt;strong&gt;no state restore&lt;/strong&gt; required, which enables fluent rebalancing and helps &lt;strong&gt;reduce rebalance downtime&lt;/strong&gt; and &lt;strong&gt;recovery time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This greatly &lt;strong&gt;improves the elasticity and scalability&lt;/strong&gt; of your application, which opens up further possibilities such as efficient &amp;amp; fluent autoscaling.&lt;/p&gt;

&lt;p&gt;It can also help ease or avoid known problems with the Kafka Streams-specific task assignment, such as 'uneven load distribution' and 'idle consumers' (I'm thinking about writing a separate blog post on these issues…).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; The Kafka Streams property &lt;code&gt;internal.leave.group.on.close=true&lt;/code&gt; helps achieve low rebalance downtimes by telling the consumers to send a &lt;code&gt;LeaveGroup&lt;/code&gt; request to the group coordinator on graceful shutdown.&lt;/p&gt;

&lt;p&gt;For more information on these Kafka internals, I recommend the following Confluent developer guide: &lt;a href="https://developer.confluent.io/learn-kafka/architecture/consumer-group-protocol/"&gt;Consumer Group Protocol&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Note that this property is also used &amp;amp; explained in the demo.&lt;/p&gt;

&lt;p&gt;⚠ Please be aware this is an unofficial property (not part of the public API) and can thus be deprecated or removed at any time.&lt;/p&gt;
&lt;/blockquote&gt;
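&lt;p&gt;For completeness, a sketch of how the flag from the tip above is set alongside regular Streams properties (application id and bootstrap servers are placeholder values); since it is an internal flag, treat it as best-effort.&lt;/p&gt;

```java
import java.util.Properties;

public class LeaveGroupOnCloseSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");      // placeholder
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        // internal flag (not public API): send LeaveGroup on graceful shutdown
        props.put("internal.leave.group.on.close", "true");
        System.out.println(props.getProperty("internal.leave.group.on.close"));
    }
}
```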

&lt;p&gt;⚠ Adding external, 3rd-party software to the heart (or rather the stomach?) of your stream processing application adds a new, additional &lt;strong&gt;single point of failure&lt;/strong&gt; to your architecture.&lt;/p&gt;
&lt;h2&gt;
  
  
  Usage Example
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Get it!
&lt;/h3&gt;

&lt;p&gt;The artifact is available on &lt;a href="https://central.sonatype.com/artifact/dev.thriving.oss/kafka-streams-cassandra-state-store/"&gt;Maven Central&lt;/a&gt;:&lt;/p&gt;
&lt;h4&gt;
  
  
  Maven
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;dev.thriving.oss&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;kafka-streams-cassandra-state-store&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;${version}&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h4&gt;
  
  
  Gradle (Groovy DSL)
&lt;/h4&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight groovy"&gt;&lt;code&gt;&lt;span class="n"&gt;implementation&lt;/span&gt; &lt;span class="err"&gt;'&lt;/span&gt;&lt;span class="n"&gt;dev&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;thriving&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;oss&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="n"&gt;kafka&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;streams&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;cassandra&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nl"&gt;store:&lt;/span&gt;&lt;span class="n"&gt;$&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="err"&gt;’&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Classes of this library are in the package &lt;code&gt;dev.thriving.oss.kafka.streams.cassandra.state.store&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Quick Start
&lt;/h3&gt;
&lt;h4&gt;
  
  
  High-level DSL &amp;lt;&amp;gt; StoreSupplier
&lt;/h4&gt;

&lt;p&gt;When using the high-level DSL, i.e., &lt;code&gt;StreamsBuilder&lt;/code&gt;, users create &lt;code&gt;StoreSupplier&lt;/code&gt;s that can be further customized via &lt;code&gt;Materialized&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example, a topic read as a &lt;code&gt;KTable&lt;/code&gt; can be materialized into a Cassandra key-value store with custom key/value serdes and with logging and caching disabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;StreamsBuilder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamsBuilder&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;KTable&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
  &lt;span class="s"&gt;"topicName"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
  &lt;span class="nc"&gt;Materialized&lt;/span&gt;&lt;span class="o"&gt;.&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;as&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                 &lt;span class="nc"&gt;CassandraStores&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"store-name"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                         &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keyValueStore&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
              &lt;span class="o"&gt;)&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withKeySerde&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Serdes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withValueSerde&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Serdes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;String&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withLoggingDisabled&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
              &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withCachingDisabled&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Processor API &amp;lt;&amp;gt; StoreBuilder
&lt;/h4&gt;

&lt;p&gt;When using the Processor API, i.e., &lt;code&gt;Topology&lt;/code&gt;, users create &lt;code&gt;StoreBuilder&lt;/code&gt;s that can be attached to &lt;code&gt;Processor&lt;/code&gt;s.&lt;/p&gt;

&lt;p&gt;For example, you can create a Cassandra key-value store (string keys, long values) with custom key/value serdes and with logging and caching disabled like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Topology&lt;/span&gt; &lt;span class="n"&gt;topology&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Topology&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;StoreBuilder&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;KeyValueStore&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;storeBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Stores&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keyValueStoreBuilder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;CassandraStores&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"store-name"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keyValueStore&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;Serdes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;String&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                &lt;span class="nc"&gt;Serdes&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Long&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withLoggingDisabled&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withCachingDisabled&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;topology&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;addStateStore&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;storeBuilder&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo
&lt;/h3&gt;

&lt;p&gt;The demo features the notorious &lt;strong&gt;'word-count example'&lt;/strong&gt;, written as a &lt;strong&gt;Quarkus application&lt;/strong&gt;, running in a fully &lt;strong&gt;clustered docker-compose localstack&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Source code for this demo: &lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store/tree/0.4.0/examples/word-count-quarkus"&gt;kafka-streams-cassandra-state-store/examples/word-count-quarkus (at 0.4.0)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/2Co9-8E-uJE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Store Types
&lt;/h3&gt;

&lt;p&gt;kafka-streams-cassandra-state-store comes with two different store types:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;keyValueStore&lt;/li&gt;
&lt;li&gt;globalKeyValueStore&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  keyValueStore (recommended default)
&lt;/h4&gt;

&lt;p&gt;A persistent &lt;code&gt;KeyValueStore&amp;lt;Bytes, byte[]&amp;gt;&lt;/code&gt;.&lt;br&gt;
The underlying cassandra table is &lt;strong&gt;partitioned by&lt;/strong&gt; the store context &lt;strong&gt;task partition&lt;/strong&gt;.&lt;br&gt;
Therefore, all CRUD operations against this store always query by and return results for a single stream task.&lt;/p&gt;
&lt;h4&gt;
  
  
  globalKeyValueStore
&lt;/h4&gt;

&lt;p&gt;A persistent &lt;code&gt;KeyValueStore&amp;lt;Bytes, byte[]&amp;gt;&lt;/code&gt;.&lt;br&gt;
The underlying cassandra table uses the &lt;strong&gt;record key as the sole PRIMARY KEY&lt;/strong&gt;.&lt;br&gt;
Therefore, all CRUD operations against this store work from any streams task and are therefore always “global”.&lt;br&gt;
Due to the nature of cassandra tables having a single PK (no clustering key), this store supports only a limited number of operations.&lt;/p&gt;

&lt;p&gt;⚠ If you're planning to use this store type, please make sure to get a full understanding of the specifics by reading the &lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store#globalkeyvaluestore"&gt;relevant docs&lt;/a&gt; to understand its behaviour.&lt;/p&gt;
&lt;h3&gt;
  
  
  Advanced
&lt;/h3&gt;

&lt;p&gt;For more detailed documentation, please visit the GitHub project…&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store#store-types"&gt;Store types, supported operations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store#builder"&gt;Builder usage + config options&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Implemented/compiled with
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Java 17&lt;/li&gt;
&lt;li&gt;kafka-streams 3.4&lt;/li&gt;
&lt;li&gt;datastax java-driver-core 4.15.0&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Supported client-libs
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Kafka Streams 2.7.0+ (maybe even earlier versions, but wasn’t tested further back)&lt;/li&gt;
&lt;li&gt;Datastax java client (v4) &lt;code&gt;'com.datastax.oss:java-driver-core:4.15.0'&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;ScyllaDB shard-aware datastax java client (v4) fork &lt;code&gt;'com.scylladb:java-driver-core:4.14.1.0'&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Supported databases
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Apache Cassandra 3.11&lt;/li&gt;
&lt;li&gt;Apache Cassandra 4.0, 4.1&lt;/li&gt;
&lt;li&gt;ScyllaDB (should work from 4.3+)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Underlying CQL Schema
&lt;/h3&gt;
&lt;h5&gt;
  
  
  keyValueStore
&lt;/h5&gt;

&lt;p&gt;Using defaults, for a state store named "word-count" the following CQL schema applies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;word_count_kstreams_store&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;partition&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="nb"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="nb"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="k"&gt;partition&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;compaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s1"&gt;'class'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'LeveledCompactionStrategy'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h5&gt;
  
  
  globalKeyValueStore
&lt;/h5&gt;

&lt;p&gt;Using defaults, for a state store named "clicks-global" the following CQL schema applies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;clicks_global_kstreams_store&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;key&lt;/span&gt; &lt;span class="nb"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;time&lt;/span&gt; &lt;span class="nb"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="nb"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="n"&gt;compaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="s1"&gt;'class'&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'LeveledCompactionStrategy'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Feat: Cassandra table with default TTL
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Pro Tip:&lt;/strong&gt;&lt;br&gt;
Cassandra has a table option &lt;code&gt;default_time_to_live&lt;/code&gt; (default expiration time (“TTL”) in seconds for a table) which can be useful for certain use cases where data (state) expires after a known time span.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Please note&lt;/strong&gt; writes to cassandra are made with system time. The table &lt;strong&gt;TTL is applied&lt;/strong&gt; based on the &lt;strong&gt;time of write&lt;/strong&gt;, i.e. when the current record is processed (&lt;em&gt;!= stream time&lt;/em&gt;).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The &lt;code&gt;default_time_to_live&lt;/code&gt; can be defined via the builder &lt;code&gt;withTableOptions&lt;/code&gt; method, e.g.:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;CassandraStores&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"word-grouped-count"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;withTableOptions&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"""
                compaction = { 'class' : 'LeveledCompactionStrategy' }
                AND default_time_to_live = 86400
                """&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;keyValueStore&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cassandra table partitioning (avoiding large partitions)
&lt;/h3&gt;

&lt;p&gt;Kafka persists data in segments and is built for sequential reads/writes. As long as sufficient disk storage space is available to the brokers, a high number of messages per topic partition is not a problem.&lt;/p&gt;

&lt;p&gt;Apache Cassandra, on the other hand, can become inefficient (up to severe failures such as load shedding, dropped messages, and crashed or downed nodes) when partitions grow too large.&lt;br&gt;
The reason is that searching within a large partition is slow, and it puts a lot of pressure on the (JVM) heap.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠ The community’s standard recommendation for Cassandra users is to keep partitions under 400MB, and preferably under 100MB.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the current implementation, the cassandra table created for the ‘default’ key-value store is partitioned by the kafka &lt;em&gt;partition&lt;/em&gt; (“wide partition pattern”).&lt;br&gt;
Please keep these issues in mind when working with relevant data volumes.&lt;br&gt;
In case you only need to look up by key and don’t need to query your store (‘range’, ‘prefixScan’; ref. supported operations by store type), it’s recommended to use &lt;code&gt;globalKeyValueStore&lt;/code&gt; rather than &lt;code&gt;keyValueStore&lt;/code&gt;, since it is partitioned by the &lt;em&gt;event key&lt;/em&gt; (:= primary key).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;blog post on &lt;a href="https://thelastpickle.com/blog/2019/01/11/wide-partitions-cassandra-3-11.html"&gt;Wide Partitions in Apache Cassandra 3.11&lt;/a&gt;
&lt;strong&gt;Note:&lt;/strong&gt; in case anyone has well-founded knowledge of if/how this has changed with Cassandra 4, please share in the comments below!&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/questions/68237371/wide-partition-pattern-in-cassandra"&gt;stackoverflow question&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Known Limitations
&lt;/h2&gt;

&lt;p&gt;Adding additional infrastructure for data persistence external to Kafka comes with certain risks and constraints.&lt;/p&gt;

&lt;h3&gt;
  
  
  Consistency
&lt;/h3&gt;

&lt;p&gt;Kafka Streams supports &lt;em&gt;at-least-once&lt;/em&gt; and &lt;em&gt;exactly-once&lt;/em&gt; processing guarantees. At-least-once semantics is enabled by default.&lt;/p&gt;

&lt;p&gt;Kafka Streams’ &lt;em&gt;exactly-once&lt;/em&gt; processing guarantee uses Kafka transactions. These transactions wrap the entirety of processing a message throughout your streams topology, including messages published to outbound topic(s), changelog topic(s), and the consumer offsets topic.&lt;/p&gt;

&lt;p&gt;This is possible through transactional interaction with a single distributed system (Apache Kafka). Bringing an external system (Cassandra) into play breaks this pattern. Once data is written to the database it can’t be rolled back in the event of a subsequent error / failure to complete the current message processing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;⚠ =&amp;gt; If you need strong consistency, have &lt;em&gt;exactly-once&lt;/em&gt; processing enabled (streams config: &lt;code&gt;processing.guarantee="exactly_once_v2"&lt;/code&gt;), and/or your processing logic is not fully idempotent, then using &lt;strong&gt;kafka-streams-cassandra-state-store&lt;/strong&gt; is discouraged!&lt;/p&gt;

&lt;p&gt;ℹ️ Please note the same applies when using kafka-streams with the native state stores (RocksDB/InMemory) under the &lt;em&gt;at-least-once&lt;/em&gt; &lt;code&gt;processing.guarantee&lt;/code&gt; (the default).&lt;/p&gt;
&lt;/blockquote&gt;
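&lt;p&gt;To make the chosen trade-off explicit in code, the guarantee can be pinned via the &lt;code&gt;processing.guarantee&lt;/code&gt; Streams property; a minimal sketch (the application id is a placeholder) setting the default, &lt;em&gt;at-least-once&lt;/em&gt;, explicitly:&lt;/p&gt;

```java
import java.util.Properties;

public class ProcessingGuaranteeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");      // placeholder
        // "at_least_once" is the default; "exactly_once_v2" enables EOS,
        // which is discouraged in combination with this external store
        props.put("processing.guarantee", "at_least_once");
        System.out.println(props.getProperty("processing.guarantee"));
    }
}
```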

&lt;p&gt;For more information on Kafka Streams processing guarantees, check the sources referenced below.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://medium.com/lydtech-consulting/kafka-streams-transactions-exactly-once-messaging-82194b50900a"&gt;https://medium.com/lydtech-consulting/kafka-streams-transactions-exactly-once-messaging-82194b50900a&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#processing-guarantee"&gt;https://docs.confluent.io/platform/current/streams/developer-guide/config-streams.html#processing-guarantee&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/streams/concepts.html#processing-guarantees"&gt;https://docs.confluent.io/platform/current/streams/concepts.html#processing-guarantees&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Incomplete Implementation of the &lt;code&gt;StateStore&lt;/code&gt; Interfaces
&lt;/h3&gt;

&lt;p&gt;For now, only &lt;code&gt;KeyValueStore&lt;/code&gt; is supported (vs. e.g. &lt;code&gt;WindowStore&lt;/code&gt;/&lt;code&gt;SessionStore&lt;/code&gt;).&lt;br&gt;
Also, not all methods have been implemented. Please check the &lt;a href="https://github.com/thriving-dev/kafka-streams-cassandra-state-store#store-types"&gt;store types method support table&lt;/a&gt; for more details.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;Here are some of the tasks (high level) in the current backlog:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Features

&lt;ul&gt;
&lt;li&gt;Implement &lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores"&gt;KIP-889: Versioned State Stores&lt;/a&gt; (coming soon with Kafka 3.5.0 release)&lt;/li&gt;
&lt;li&gt;Add a simple (optional) InMemory read cache -&amp;gt; &lt;a href="https://github.com/ben-manes/caffeine"&gt;Caffeine&lt;/a&gt;?&lt;/li&gt;
&lt;li&gt;Support &lt;code&gt;WindowStore&lt;/code&gt; / &lt;code&gt;SessionStore&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Non-functional

&lt;ul&gt;
&lt;li&gt;Benchmark&lt;/li&gt;
&lt;li&gt;Add metrics&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Ops

&lt;ul&gt;
&lt;li&gt;GitHub actions to release + publish to maven central (snapshot / releases)&lt;/li&gt;
&lt;li&gt;Add Renovate&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Interested to contribute? Please &lt;a href="https://dev.to/about"&gt;reach out&lt;/a&gt;!&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It's been a fun journey so far, from an initial POC to a working library published to Maven Central, though it's still to be considered 'experimental', since it has not been production-tested yet.&lt;/p&gt;

&lt;p&gt;The out-of-the-box state stores satisfy most requirements; there is no need to switch without necessity.&lt;br&gt;
Still, it's a usable piece of software that may fill a gap for specific requirements.&lt;/p&gt;

&lt;p&gt;I'm looking forward to working on the next steps, such as benchmarking / load testing.&lt;/p&gt;

&lt;p&gt;Feedback is very welcome! Also, if you are planning to use, or have decided to use, the library in a project, please leave a comment below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Footnotes
&lt;/h2&gt;

&lt;p&gt;At the time of writing this blog post, the latest versions of the relevant libs were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka / Streams API: 3.4.0&lt;/li&gt;
&lt;li&gt;Cassandra java-driver-core: 4.15.0 &lt;/li&gt;
&lt;li&gt;kafka-streams-cassandra-state-store: 0.4.0 &lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.oreilly.com/library/view/mastering-kafka-streams/9781492062486/ch04.html"&gt;4. Stateful Processing - Mastering Kafka Streams and ksqlDB Book&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.confluent.io/platform/current/streams/developer-guide/memory-mgmt.html"&gt;Kafka Streams Memory Management | Confluent Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a re-publish of &lt;a href="https://thriving.dev/blog/introducing-kafka-streams-cassandra-state-store"&gt;https://thriving.dev/blog/introducing-kafka-streams-cassandra-state-store&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>cassandra</category>
      <category>streamprocessing</category>
      <category>scylladb</category>
    </item>
  </channel>
</rss>
