<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Brice LEPORINI</title>
    <description>The latest articles on DEV Community by Brice LEPORINI (@bleporini).</description>
    <link>https://dev.to/bleporini</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F835467%2Fa636fd36-1388-43f7-80eb-cfac9c222730.jpeg</url>
      <title>DEV Community: Brice LEPORINI</title>
      <link>https://dev.to/bleporini</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bleporini"/>
    <language>en</language>
    <item>
      <title>Breaking Free from Chaos: Kafka's Epic Quest to Save Microservices from Circuit Breaker</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Mon, 29 May 2023 10:19:21 +0000</pubDate>
      <link>https://dev.to/bleporini/breaking-free-from-chaos-kafkas-epic-quest-to-save-microservices-from-circuit-breaker-4dg9</link>
      <guid>https://dev.to/bleporini/breaking-free-from-chaos-kafkas-epic-quest-to-save-microservices-from-circuit-breaker-4dg9</guid>
      <description>&lt;p&gt;In the realm of microservices architecture, developers often encounter challenges when it comes to handling the resilience and fault tolerance of distributed systems. To address these challenges, the Circuit Breaker pattern emerged as a popular solution. Originally introduced as for managing faults in distributed systems, it aimed to prevent cascading failures and provide a fallback mechanism.&lt;/p&gt;

&lt;p&gt;A common example of this kind of failure is an e-commerce system with three microservices: Inventory, Payment, and Order.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The Inventory service experiences a sudden surge in traffic during a sale event, causing it to slow down or become unresponsive.&lt;/li&gt;
&lt;li&gt;The Payment service, relying on the Inventory service to check product availability, starts experiencing delays in processing payments due to the slow response from Inventory.&lt;/li&gt;
&lt;li&gt;The Order service, depending on both Inventory and Payment services, faces issues in processing customer orders, leading to delays and potential errors in the order fulfillment process.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--mWi2enJg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yt1fkrfb199a7ud0ewso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--mWi2enJg--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yt1fkrfb199a7ud0ewso.png" alt="Image description" width="800" height="512"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By implementing the Circuit Breaker pattern, you can mitigate the impact of cascading failures. The Circuit Breaker would detect the failures in the Inventory Service and trip, temporarily isolating it. This allows the Payment Service and the Order Service to avoid unnecessary requests and quickly fail over to alternative mechanisms, such as using cached data or providing fallback responses. By preventing the propagation of failures, the Circuit Breaker pattern helps to maintain the stability and resilience of the system.&lt;/p&gt;
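&lt;p&gt;To make the pattern concrete, here is a minimal sketch of the tripping logic in plain Python (the class name, thresholds, and injectable clock are illustrative, not taken from any particular library):&lt;/p&gt;

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: opens after `max_failures` consecutive
    failures, rejects calls until `reset_timeout` seconds have passed,
    then lets one trial call through (half-open)."""

    def __init__(self, max_failures=3, reset_timeout=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, fn, fallback):
        # While open, short-circuit to the fallback until the timeout expires.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()   # trip the breaker
            return fallback()
        self.failures = 0           # a success closes the circuit again
        return result
```

&lt;p&gt;Note that the fallback here is exactly the business decision discussed below: a cached value, a default, or an error.&lt;/p&gt;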

&lt;h2&gt;
  
  
  Why should you limit its usage?
&lt;/h2&gt;

&lt;p&gt;First of all, a microservices architecture is designed to provide autonomy and independent scalability for individual services. However, applying the Circuit Breaker pattern de facto introduces a certain level of dependency and coupling between services. This interference with service autonomy can undermine the very principles of microservices architecture. This kind of design is what I tend to call a distributed monolith, rather than a microservices architecture implementing a proper autonomy principle. Therefore, systematic use of the Circuit Breaker might be a clue that your architecture should be revisited.&lt;/p&gt;

&lt;p&gt;Second, it adds an additional layer of complexity to your microservices architecture. Each service needs to include circuit breaker logic, making the code more intricate and harder to maintain. As the number of services and interdependencies grows, managing circuit breakers becomes increasingly challenging, leading to a more convoluted system. It also raises questions such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What's the expected behavior when the circuit breaker trips?&lt;/li&gt;
&lt;li&gt;Should it fall back to a predefined value?&lt;/li&gt;
&lt;li&gt;Return a cached response from a previous call?&lt;/li&gt;
&lt;li&gt;Return an error code?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All those points need to be discussed with the business and implemented carefully. Long story short, it means implementing infrastructure-management code in your application when you should be focusing on the business logic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Event-Driven Architecture to the Rescue
&lt;/h2&gt;

&lt;p&gt;An event-driven architecture can foster loose coupling and reduce the reliance on the Circuit Breaker pattern. Services communicate through events rather than direct synchronous calls: events are produced when significant actions or changes occur, and other services consume those events to react accordingly. This asynchronous style of communication reduces tight coupling between services, as they don't depend directly on each other's interfaces. With fewer direct service-to-service interactions, there are fewer points of failure and fewer opportunities for failures to cascade through the system. With looser coupling and the absence of tight dependencies, the Circuit Breaker pattern becomes far less necessary for isolating failures.&lt;/p&gt;

&lt;p&gt;That being said, it's important to bear in mind that in an event-driven architecture, services maintain eventual consistency rather than immediate consistency. They react to events and update their own state asynchronously, which allows them to operate independently without relying on immediate responses from other services.&lt;/p&gt;
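&lt;p&gt;The decoupling and eventual consistency described above can be sketched with a toy append-only log in plain Python (all names are illustrative; a real system would use Kafka topics and consumer groups):&lt;/p&gt;

```python
class EventLog:
    """Toy append-only log standing in for a Kafka topic: producers append,
    and each consumer keeps its own offset and reads at its own pace."""
    def __init__(self):
        self.records = []

    def append(self, event):
        self.records.append(event)
        return len(self.records) - 1   # offset of the new record

class Consumer:
    def __init__(self, log):
        self.log = log
        self.offset = 0
        self.state = []                # local state, updated asynchronously

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        self.state.extend(batch)       # apply the events to local state
        return batch

# The producer never waits on any consumer: appending always succeeds,
# even while the "payments" consumer lags behind (eventual consistency).
log = EventLog()
for sku in ("sku-1", "sku-2", "sku-3"):
    log.append({"type": "OrderPlaced", "sku": sku})

orders = Consumer(log)
payments = Consumer(log)
orders.poll()                  # catches up fully
payments.poll(max_records=1)   # lagging consumer, catches up later
```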

&lt;h2&gt;
  
  
  Kafka
&lt;/h2&gt;

&lt;p&gt;That kind of architecture is often paired with Kafka because it is built upon a distributed commit log design. It maintains an immutable, ordered sequence of records, or "log," allowing events to be written, stored, and consumed in the order they occurred. This inherent log-based architecture makes Kafka highly suitable for capturing, storing, and processing streams of events at scale.&lt;/p&gt;

&lt;p&gt;In addition, Kafka's fault tolerance, scalability, event retention, stream processing capabilities, exactly-once semantics, and ecosystem make it uniquely suited as a foundational component for building robust and scalable event-driven architectures. While other messaging systems can fulfill some aspects of event-driven architectures, Kafka's specific design and features make it a powerful and instrumental choice for event streaming, data processing, and reliable event-driven communication at scale.&lt;/p&gt;

&lt;p&gt;While the Circuit Breaker pattern can provide some benefits in terms of fault tolerance and resilience, it should be used judiciously in a microservices-oriented architecture. The increased complexity, operational overhead, delayed feedback, interference with service autonomy, and lack of granularity are important factors to consider when deciding whether to adopt this pattern.&lt;/p&gt;

&lt;p&gt;To conclude, this decision is obviously not binary. The rule of thumb should be to avoid the Circuit Breaker; however, implementing an event-driven architecture might not be possible for the whole information system, as you may have to deal with external APIs out of your control, or with legacy systems that would be too expensive to refactor to emit events. &lt;/p&gt;

</description>
      <category>kafka</category>
      <category>microservices</category>
      <category>eventdriven</category>
      <category>circuitbreaker</category>
    </item>
    <item>
      <title>Kafka: Let's talk (again) about replication</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Mon, 24 Apr 2023 07:45:18 +0000</pubDate>
      <link>https://dev.to/bleporini/kafka-lets-talk-again-about-replication-2h76</link>
      <guid>https://dev.to/bleporini/kafka-lets-talk-again-about-replication-2h76</guid>
      <description>&lt;p&gt;Like in many others distributed systems, Kafka leverages data replication to implement reliability and everyone that has tapped into Kafka knows about the replication factor configuration property that is required to create any topic. So why writing a paper in 2023 to talk about that? Well in fact things have evolved other time and data replication is now a broader topic than just the partition replication factor, so let me give you an overview of the current options.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replication factor
&lt;/h2&gt;

&lt;p&gt;Just a brief recap of this capability, which is at the root of Kafka itself. Data in a production Kafka cluster is replicated 3 times: once in the partition replica that is the leader of the replica set, and two additional times in the followers. In fact, 3 is not enforced, but when it comes to replication it's the commonly accepted minimum: it keeps the data available despite one failure or one maintenance operation. Another instrumental configuration parameter complements this: &lt;code&gt;min.insync.replicas&lt;/code&gt;. The in-sync replica (ISR) set is, as its name says, the set of replicas made up of the leader and all other replicas in sync with the leader. So, considering a producer set with &lt;code&gt;acks=all&lt;/code&gt;, a record write is considered successful only if at least &lt;code&gt;min.insync.replicas&lt;/code&gt; replicas were able to acknowledge it. With a &lt;strong&gt;R&lt;/strong&gt;eplication &lt;strong&gt;F&lt;/strong&gt;actor of 3, the usual min ISR value is 2; that way, if one broker is unavailable for any reason, whether because of a failure or a planned operation like an upgrade, a partition can still accept new records, with the guarantee that they will be replicated even in this degraded configuration. This is why a minimal cluster requires 3 brokers, though 4 is recommended to preserve topic creation availability in case of a broker loss. &lt;/p&gt;

&lt;p&gt;On the producer side, &lt;code&gt;acks=all&lt;/code&gt; is the standard setting in the majority of use cases, for all the reasons explained above. Note that even with &lt;code&gt;min.insync.replicas=2&lt;/code&gt;, during nominal operations the ISR set counts all 3 replicas most of the time. Hence, this makes the replication process &lt;strong&gt;synchronous&lt;/strong&gt;.  &lt;/p&gt;
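&lt;p&gt;As a sketch, the settings discussed above map to the following standard Kafka configuration properties; note that the replication factor is actually set per topic at creation time, with &lt;code&gt;default.replication.factor&lt;/code&gt; being the broker-side default:&lt;/p&gt;

```properties
# Broker side (cluster-wide defaults; min.insync.replicas can also be
# overridden per topic)
default.replication.factor=3
min.insync.replicas=2

# Producer side: a send succeeds only once all in-sync replicas acknowledge it
acks=all
```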

&lt;h2&gt;
  
  
  External replication with MirrorMaker or Confluent Replicator
&lt;/h2&gt;

&lt;p&gt;External replication is when records are replicated to another cluster. There are various reasons for doing so:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sharing data between two locations that are too far apart to support synchronous replication: imagine applications hosted on the US East Coast producing records, with consumers located on the West Coast, implying too much network latency.&lt;/li&gt;
&lt;li&gt;Sharing data in real time with partners, by copying the records from a set of topics to a foreign cluster managed by the partner.&lt;/li&gt;
&lt;li&gt;DR scenarios: if your organization has 2 and only 2 DC, you can't stretch the cluster, so you need to run two distinct clusters and replicate records from the primary to the DR one. This also applies if you're hosting your cluster in the public cloud across three availability zones and your business is so critical that you want to cover the risk of a complete region loss. &lt;/li&gt;
&lt;li&gt;When you have on-prem applications like legacy core banking systems or mainframe applications and you need to stream data to new-generation applications hosted in the public cloud, one good way to implement that kind of hybrid scenario is to replicate the on-prem cluster to a fully managed one in the cloud. It also drastically helps to streamline the network round trips, as there's only one kind of flow to govern, and you can read the same data multiple times while paying the network cost between your DC and the cloud only once.&lt;/li&gt;
&lt;li&gt;Data migration between two Kafka clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So tools like &lt;a href="https://kafka.apache.org/documentation/#georeplication"&gt;MirrorMaker&lt;/a&gt; and &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/replicator/index.html"&gt;Confluent Replicator&lt;/a&gt; enable that kind of cross-cluster replication. You can see them as external applications consuming records on one side and producing on the other; obviously the reality is a bit more complex, as they cover a wide range of edge cases. Both are implemented as Kafka Connect connectors (note that MirrorMaker version 1 was not a connector). As they replicate beyond the ISR set, this kind of replication is asynchronous by design, and you should also pay attention to the fact that you can't guarantee that all records have been replicated at the moment of a complete disaster, so the guaranteed &lt;strong&gt;RPO&lt;/strong&gt; can't be 0.&lt;/p&gt;
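&lt;p&gt;As an illustration, a minimal MirrorMaker 2 configuration replicating every topic one way from a primary cluster to a DR cluster could look like this (cluster aliases and bootstrap addresses are examples):&lt;/p&gt;

```properties
# Define the two clusters and how to reach them
clusters = primary, dr
primary.bootstrap.servers = primary-broker:9092
dr.bootstrap.servers = dr-broker:9092

# Enable one-way replication from primary to dr, for all topics
primary->dr.enabled = true
primary->dr.topics = .*
```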

&lt;h2&gt;
  
  
  Asynchronous replication with Cluster Linking
&lt;/h2&gt;

&lt;p&gt;We saw that asynchronous replication makes sense in some scenarios, especially to avoid any latency impact on the producer side. External replication is one option for doing so, but it also comes with a couple of challenges to be aware of:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;those tools are external to the broker, which implies additional resources to manage, in a fault-tolerant manner and with proper monitoring.&lt;/li&gt;
&lt;li&gt;as consumer offsets are stored in a distinct topic, they can't be preserved as-is, so offsets need to be translated from one cluster to the other; this topic is extensively covered in the &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/replicator/replicator-failover.html#understanding-consumer-offset-translation"&gt;documentation&lt;/a&gt;. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/cluster-linking/index.html"&gt;Cluster Linking&lt;/a&gt; comes into play. It's a feature offered by Confluent Server, which can be seen as a broker on steroids with a wide set of added capabilities, Cluster Linking being one of them. The game changer here is that replication is a feature internal to the broker: it performs byte-for-byte replication, which preserves the consumer offsets from the source cluster to the destination one. The other benefit is a reduced infrastructure footprint, as there's no need to manage external components for that matter.&lt;br&gt;
Cluster Linking is also available on &lt;a href="https://confluent.cloud"&gt;Confluent Cloud&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Asynchronous intra-cluster replication
&lt;/h2&gt;

&lt;p&gt;At this stage you may wonder how asynchronous replication is possible at all, given that followers are expected to be part of the ISR set. Here is the trick: Confluent introduced an additional kind of replica, &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/multi-region.html#observers"&gt;the observer&lt;/a&gt;. It's another feature of Confluent Server, and it differs in that it's not part of the ISR set, which allows it to replicate the leader asynchronously.&lt;/p&gt;

&lt;p&gt;Ok, so now let's talk about the use cases where this feature fits. As mentioned earlier, if you need to share data with applications hosted far away from the producer, implying latency beyond what's acceptable from the producer's perspective, then building a &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/multi-region.html"&gt;Multi Region Cluster&lt;/a&gt; spanning those two places makes sense. It relies on the &lt;a href="https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica"&gt;follower fetching feature&lt;/a&gt; introduced in Kafka 2.4. &lt;/p&gt;
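&lt;p&gt;For reference, follower fetching is enabled with the rack-awareness settings introduced by KIP-392 (the rack names below are examples):&lt;/p&gt;

```properties
# Broker side: tag each broker with its location and enable the rack-aware selector
broker.rack=us-east
replica.selector.class=org.apache.kafka.common.replication.RackAwareReplicaSelector

# Consumer side: declare where the client runs so it fetches from the closest replica
client.rack=us-east
```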

&lt;p&gt;Another very interesting scenario is combining observers with &lt;a href="https://docs.confluent.io/platform/current/multi-dc-deployments/multi-region.html#automatic-observer-promotion"&gt;Automatic Observer Promotion&lt;/a&gt;, because it unlocks the option to stretch the cluster across only 2 DC for the data plane. It's quite common in many organizations to have only 2 DC, but remember that the control plane is implemented with Zookeeper, which is quorum based, so it needs an odd number of locations in order to maintain the quorum in case of a DC loss. So, if using the public cloud to host a Zookeeper tie-breaker is an option, which is usually accepted by InfoSec teams as no business data is managed by Zookeeper, then it's possible to overcome the 2 DC limitation mentioned previously. This is what we call the &lt;strong&gt;2.5 DC Architecture&lt;/strong&gt;; to learn more, see this blog post: &lt;a href="https://www.confluent.io/blog/automatic-observer-promotion-for-safe-multi-datacenter-failover-in-confluent-6-1"&gt;Automatic Observer Promotion Brings Fast and Safe Multi-Datacenter Failover with Confluent Platform 6.1&lt;/a&gt;. The main benefits of a stretched cluster over replicated clusters are that you don't need to restart and reconfigure the client applications on the DR site; as it's a single cluster, you need fewer components and, more importantly, you can guarantee that the RPO will be 0, meaning &lt;strong&gt;no data loss&lt;/strong&gt; in the event of an unavailable DC. &lt;/p&gt;

&lt;p&gt;I hope this clarifies the available options in terms of replication. However, if you still need help figuring out the appropriate setup for your use case, let's connect on &lt;a href="https://linkedin.com/in/bleporini"&gt;LinkedIn&lt;/a&gt; and discuss it! &lt;/p&gt;

</description>
      <category>kafka</category>
      <category>replication</category>
      <category>datastreaming</category>
      <category>multidc</category>
    </item>
    <item>
      <title>Canary release with Kafka</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Mon, 27 Mar 2023 13:56:53 +0000</pubDate>
      <link>https://dev.to/bleporini/canary-release-with-kafka-1h89</link>
      <guid>https://dev.to/bleporini/canary-release-with-kafka-1h89</guid>
      <description>&lt;p&gt;When a new version for a product has to be released, it has always been key as it's a critical moment for engineering teams as well as for operation teams; mainly because it's usually managed as a big bang switch over. So during this operation, if something bad happens like a migration procedure that is not running as expected, or the system is not working like it should be, whether it's a functional issue or a performance one, it puts a lot of pressure on the teams because it has a direct impact on the business.&lt;/p&gt;

&lt;p&gt;Canary release, sometimes called friends and family, is an increasingly common practice when a new version of a software product is rolled out. It consists in keeping both versions running side by side and gradually increasing the traffic managed by the new version during the sunset phase of the old one. It then becomes much easier to check that no major bug impacts the new version, and to verify that it can handle real-life traffic. Long story short, whatever problem is faced, the chances that it impacts the business drop drastically. &lt;/p&gt;

&lt;p&gt;The term Canary Release comes from the time when people worked underground in mines: the risk was that the level of carbon monoxide in the air would rise and, as it's odorless, miners would die silently. To cover that risk, they worked with a canary in a cage; if the canary died, it meant the level of carbon monoxide had risen dangerously, as this kind of bird is even more sensitive to it than humans.  &lt;/p&gt;

&lt;p&gt;When thinking about how to implement this for a REST API or a web application, the usual way is to use a reverse proxy to balance the traffic across the two versions. But when considering data streaming, how do you implement it with data processors while maintaining all the guarantees that Kafka offers? This is the challenge I faced, and this post shares my proposal to tackle it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's set some basic requirements
&lt;/h2&gt;

&lt;p&gt;First of all, it shouldn't have any impact on the design or the code of either the producer or the consumer. The producer shouldn't even be aware that the traffic is split across two different processors serving the same purpose, which is a common concept between loosely coupled applications. Finally, it must guarantee that every record is processed, and processed by only one consumer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scenario
&lt;/h2&gt;

&lt;p&gt;Let's say you want to introduce a new version that at first only handles 10% of the traffic. The usual verifications are applied: no excessive consumer lag, no functional regressions or bugs. Then you gradually increase the traffic it handles and, at the final stage, completely switch to the new version, allowing you to dispose of the old one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using ksqlDB as a router
&lt;/h2&gt;

&lt;p&gt;This is the basic idea. As the traffic needs to be split in two based on a ratio, we can leverage the &lt;a href="https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/scalar-functions/#random"&gt;RANDOM&lt;/a&gt; function, which returns a value between 0.0 and 1.0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_key&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;key_type&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;legacy_version&lt;/span&gt; 
    &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; 
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;new_version&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; 
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that using the Schema Registry allows more concise definitions. As the outcome of a persistent query is one and only one topic, you need two persistent queries. &lt;/p&gt;

&lt;h2&gt;
  
  
  Not that fast
&lt;/h2&gt;

&lt;p&gt;This way, the amount of traffic processed by the two versions complies with the defined ratio. However, due to the nature of the &lt;code&gt;random&lt;/code&gt; function, there's no guarantee that every record is processed, nor that each one is processed only once: each query evaluates the function independently, so a record can match both filters or neither. To work around that, we need to set in stone the rate assigned to each record, and then apply the routing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; 
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_key&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;key_type&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;key&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;original_rated&lt;/span&gt; 
    &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; 
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;legacy_version&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original_rated&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;new_version&lt;/span&gt; 
    &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; 
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original_rated&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
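&lt;p&gt;To see why the rate has to be materialized once per record, here is a quick simulation in plain Python (not ksqlDB) contrasting the two approaches: with two independent random draws, some records land in both outputs and some in neither, while a single persisted rate routes every record exactly once:&lt;/p&gt;

```python
import random

random.seed(42)  # deterministic run for the demonstration
records = list(range(10_000))

# Naive routing: each branch draws its own random value, exactly like two
# independent persistent queries each calling random().
legacy = {r for r in records if random.random() > 0.1}
new = {r for r in records if random.random() <= 0.1}
duplicated = legacy & new              # records processed twice
dropped = set(records) - legacy - new  # records never processed

# Fixed routing: draw once, persist the rate with the record, filter on it.
rated = [(r, random.random()) for r in records]
legacy_ok = {r for r, rate in rated if rate > 0.1}
new_ok = {r for r, rate in rated if rate <= 0.1}
# legacy_ok and new_ok are disjoint and together cover all records.
```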



&lt;p&gt;To start the canary release transition, the former version is stopped, the queries above are started, and then the two versions of the service are started, reconfigured to consume their assigned topics.&lt;/p&gt;

&lt;h2&gt;
  
  
  Looks good, but that's not enough.
&lt;/h2&gt;

&lt;p&gt;The default ksqlDB behavior when running a query is to read the topic from the latest offset. That means the sequence above leads to data loss, as messages are almost certainly pushed to the topic while transitioning from the normal to the canary release. As a consequence, the rating query needs to start consuming the input topic from the last offsets committed by the legacy service before it stopped. Unfortunately, ksqlDB doesn't offer the capability to define starting offsets for a given query: the only options are the values accepted by the query configuration parameter &lt;a href="https://docs.ksqldb.io/en/latest/reference/server-configuration/#ksqlstreamsautooffsetreset"&gt;auto.offset.reset&lt;/a&gt;, &lt;code&gt;earliest&lt;/code&gt; or &lt;code&gt;latest&lt;/code&gt;. But the game is not over yet, as the language gives access to the offset and partition of each record thanks to the &lt;a href="https://docs.ksqldb.io/en/latest/reference/sql/data-definition/#pseudocolumns"&gt;pseudo columns&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So the procedure requires additional steps: after the former version is stopped, collect, for each partition, the offset committed by the legacy service's consumer group, and build the rating query accordingly so that it only consumes unprocessed messages. As an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SET&lt;/span&gt; &lt;span class="s1"&gt;'auto.offset.reset'&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'earliest'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; &lt;span class="p"&gt;(...);&lt;/span&gt;
&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="k"&gt;or&lt;/span&gt; &lt;span class="k"&gt;replace&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;original_rated&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; 
    &lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;rate&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;original&lt;/span&gt; 
    &lt;span class="k"&gt;where&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;165&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;OR&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;176&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;OR&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;149&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;OR&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;151&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;OR&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;152&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="k"&gt;OR&lt;/span&gt;  
        &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ROWPARTITION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="k"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ROWOFFSET&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;167&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It does the trick, but it comes with a drawback: it requires streaming the entire content of the input topic. If the amount of data stored in the partitions is huge, it can take a non-negligible amount of time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Way to go!
&lt;/h2&gt;

&lt;p&gt;Now both versions can be started. Notice that none of these operations have any impact on the data producer or on the applications; the only requirement is to provide the consumed topic as a configuration parameter. &lt;/p&gt;

&lt;p&gt;Then, over time, the routing ratio can be revised by running the following sequence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pause the rating query (don't drop it, otherwise the offsets will be lost)&lt;/li&gt;
&lt;li&gt;Update the ratio in the two routing queries&lt;/li&gt;
&lt;li&gt;Resume the rating query&lt;/li&gt;
&lt;/ul&gt;
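&lt;p&gt;As a minimal sketch of that sequence (assuming a recent ksqlDB version that supports the &lt;code&gt;PAUSE&lt;/code&gt;/&lt;code&gt;RESUME&lt;/code&gt; statements, and a hypothetical query id), the rating query can be paused and resumed through the ksqlDB REST API:&lt;/p&gt;

```shell
# Hedged sketch: pause/resume the rating query through the ksqlDB REST API.
# KSQL_URL and the query id CSAS_ORIGINAL_RATED_0 are assumptions, not taken
# from the Canary Router repository.
KSQL_URL="http://localhost:8088"
QUERY_ID="CSAS_ORIGINAL_RATED_0"

ksql_payload() {
  # Build the JSON body expected by the /ksql endpoint
  printf '{"ksql":"%s %s;","streamsProperties":{}}' "$1" "$2"
}

ksql_payload PAUSE "$QUERY_ID"
echo
# Send it (commented out so the sketch stays self-contained):
# curl -s -X POST "$KSQL_URL/ksql" \
#   -H 'Accept: application/vnd.ksql.v1+json' \
#   -d "$(ksql_payload PAUSE "$QUERY_ID")"
# ...update the ratio in the two routing queries, then:
# curl -s -X POST "$KSQL_URL/ksql" \
#   -H 'Accept: application/vnd.ksql.v1+json' \
#   -d "$(ksql_payload RESUME "$QUERY_ID")"
```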

&lt;h2&gt;
  
  
  Final stage: promote the new version
&lt;/h2&gt;

&lt;p&gt;For obvious reasons, even if you assign 100% of the traffic to the new version, this setup can't stay in place forever: it wastes storage (the records are copied twice, across three topics) as well as processing resources (each canary release implies three persistent queries). So a final procedure is required to safely reconfigure the new application to process messages from the original input topic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pause the rating query&lt;/li&gt;
&lt;li&gt;Wait until the consumer groups of the two downstream services have zero lag on all partitions&lt;/li&gt;
&lt;li&gt;Collect the offsets for the consumer groups of the two applications&lt;/li&gt;
&lt;li&gt;For each partition: compute the offset the new version should start from in the original topic, by adding the last offset of the former version in the original partition, the offset of the legacy in the filtered rated partition, and the offset of the new version in its respective partition; then reset the offset of the consumer group for this partition to the result&lt;/li&gt;
&lt;li&gt;Start the new version configured to consume the original topic and dispose of everything else. That's it!&lt;/li&gt;
&lt;/ul&gt;
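&lt;p&gt;To make the offset arithmetic concrete, here is a minimal sketch with made-up numbers (the function name and the &lt;code&gt;kafka-consumer-groups&lt;/code&gt; invocation are illustrative): the new start offset in the original topic is the sum of the three offsets collected for that partition.&lt;/p&gt;

```shell
# Sketch of the per-partition computation for the final promotion.
# $1: committed offset of the legacy group in the original partition
# $2: offset reached by the legacy in its filtered topic partition
# $3: offset reached by the new version in its own topic partition
start_offset() {
  echo $(( $1 + $2 + $3 ))
}

# Illustrative numbers for partition 0 (165 matches the rating query above)
start_offset 165 92 41

# The resulting offset could then be applied with something like:
# kafka-consumer-groups --bootstrap-server $BOOTSTRAP --group new-service \
#   --topic original:0 --reset-offsets --to-offset 298 --execute
```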

&lt;h2&gt;
  
  
  Automate it!
&lt;/h2&gt;

&lt;p&gt;Now that every step is detailed, why not build some tooling to make it automatic? This is what I did in &lt;a href="https://github.com/bleporini/canary-router"&gt;Canary Router&lt;/a&gt;. It's based on a couple of shell scripts and comes with minimal requirements. It mainly leverages &lt;a href="https://confluent.cloud"&gt;Confluent Cloud&lt;/a&gt; resources, so you only need to sign up; you'll be awarded $400 of free credits to spin up streaming resources, which is more than enough for a basic Kafka cluster and a small ksqlDB cluster. It's definitely not production ready, but it shows a path for running that kind of operation, and it can easily be reimplemented in Java or Go to make it a proper tool. &lt;/p&gt;

&lt;p&gt;The demo is based on a Datagen source connector, and the downstream services are nothing more than containers running a dumb &lt;code&gt;kafka-console-consumer&lt;/code&gt;. If you want to test it with real-life consumers, just implement the four &lt;code&gt;(start|stop)_(legacy|new)_service&lt;/code&gt; shell functions to pass as a context to the scripts, like the ones implemented in &lt;a href="https://github.com/bleporini/canary-router/blob/main/services_examples/context.sh"&gt;context.sh&lt;/a&gt;. &lt;/p&gt;
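&lt;p&gt;For instance, a context for services managed with Docker Compose could look like the following sketch (the service names are made up, and a real context would run the commands instead of echoing them):&lt;/p&gt;

```shell
# Hypothetical context.sh: the four functions the scripts expect.
# The compose service names legacy-consumer/new-consumer are assumptions.
start_legacy_service() { echo "docker compose up -d legacy-consumer"; }
stop_legacy_service()  { echo "docker compose stop legacy-consumer"; }
start_new_service()    { echo "docker compose up -d new-consumer"; }
stop_new_service()     { echo "docker compose stop new-consumer"; }

start_new_service
```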

&lt;h2&gt;
  
  
  Remarks and limitations
&lt;/h2&gt;

&lt;p&gt;There's a script that checks that the numbers of messages in the new/legacy topics are consistent with there being no duplicates and no data loss; however, it's a weak check, as the only reliable way would be to verify every message and not just count them.&lt;/p&gt;
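&lt;p&gt;The check boils down to a simple invariant, sketched below with made-up counts: the records routed to the two topics must add up to the records consumed from the original one.&lt;/p&gt;

```shell
# Invariant behind the consistency check (counts are illustrative):
# no loss and no duplication means legacy + new == original.
original=1000
legacy=760
new=240

if [ $(( legacy + new )) -eq "$original" ]; then
  echo "consistent"
else
  echo "MISMATCH: possible loss or duplication"
fi
```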

&lt;p&gt;This design works as long as processing order is not a requirement. Even if there's no doubt that the messages are copied into the same partition numbers in the different topics, there's no guarantee that all messages with the same key end up in the same topic. Especially as the traffic is by design not evenly balanced, some records will inevitably be processed out of the order in which they were produced. That being said, I think logic could be implemented to enforce that, once a key is assigned to the new service, all subsequent records with that key are sent to its topic. Feel free to comment or fork the repository and propose a pull request 😉. &lt;/p&gt;

</description>
      <category>kafka</category>
      <category>ksqldb</category>
      <category>datastreaming</category>
    </item>
    <item>
      <title>Using OpenId Connect with Confluent Cloud</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Mon, 27 Feb 2023 12:05:22 +0000</pubDate>
      <link>https://dev.to/bleporini/using-openid-connect-with-confluent-cloud-2a36</link>
      <guid>https://dev.to/bleporini/using-openid-connect-with-confluent-cloud-2a36</guid>
      <description>&lt;p&gt;I hope you've already read my &lt;a href="https://dev.to/bleporini/openid-connect-authentication-with-apache-kafka-31-5747"&gt;previous post&lt;/a&gt; about the capability that was added in Kafka 3.1 to authenticate applications using an external OpenId Connect identity provider. Now you also can do the same with &lt;a href="https://confluent.cloud" rel="noopener noreferrer"&gt;Confluent Cloud&lt;/a&gt;. Initially, the only way to authenticate applications was to use API keys and secret managed in Confluent Cloud, but offering the capability to manage centrally accounts, credentials and authentication flows in a single identity provider is a common expectation in many organizations.&lt;/p&gt;

&lt;p&gt;Setting it up is quite easy and consists of two steps. First you need to declare a new identity provider for your Confluent Cloud organization. Azure and Okta are fully integrated, but let's focus on vanilla OpenId Connect. One good thing about OIDC is that this standard is completely discoverable; as an example, you can freely dump the configuration of the Google OIDC service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl https://accounts.google.com/.well-known/openid-configuration
&lt;span class="o"&gt;{&lt;/span&gt;
 &lt;span class="s2"&gt;"issuer"&lt;/span&gt;: &lt;span class="s2"&gt;"https://accounts.google.com"&lt;/span&gt;,
 &lt;span class="s2"&gt;"authorization_endpoint"&lt;/span&gt;: &lt;span class="s2"&gt;"https://accounts.google.com/o/oauth2/v2/auth"&lt;/span&gt;,
 &lt;span class="s2"&gt;"device_authorization_endpoint"&lt;/span&gt;: &lt;span class="s2"&gt;"https://oauth2.googleapis.com/device/code"&lt;/span&gt;,
 &lt;span class="s2"&gt;"token_endpoint"&lt;/span&gt;: &lt;span class="s2"&gt;"https://oauth2.googleapis.com/token"&lt;/span&gt;,
 &lt;span class="s2"&gt;"userinfo_endpoint"&lt;/span&gt;: &lt;span class="s2"&gt;"https://openidconnect.googleapis.com/v1/userinfo"&lt;/span&gt;,
 &lt;span class="s2"&gt;"revocation_endpoint"&lt;/span&gt;: &lt;span class="s2"&gt;"https://oauth2.googleapis.com/revoke"&lt;/span&gt;,
 &lt;span class="s2"&gt;"jwks_uri"&lt;/span&gt;: &lt;span class="s2"&gt;"https://www.googleapis.com/oauth2/v3/certs"&lt;/span&gt;,
&lt;span class="o"&gt;[&lt;/span&gt;...]
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;.well-known/openid-configuration&lt;/code&gt; is an endpoint implemented by all providers, and it's the only thing you need to set up the identity provider in Confluent Cloud:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk4vf9gru6mzs5dubsfo.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqk4vf9gru6mzs5dubsfo.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
As a result, with the configuration URL, Confluent Cloud is able to automatically gather the issuer URI, but more importantly the JWKS, which provides the public keys to verify the JWTs.&lt;/p&gt;

&lt;p&gt;The second step is to declare an identity pool. In fact, it's a way to define how JWT tokens issued by the IdP qualify to be authenticated to the Confluent Cloud service:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwp5q36825rwy8fbem4o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flwp5q36825rwy8fbem4o.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For this demo, let's keep it simple. The &lt;code&gt;claims.sub&lt;/code&gt; default value for the identity claim field is perfectly fine as it's a &lt;a href="https://www.rfc-editor.org/rfc/rfc7519#section-4.1.2" rel="noopener noreferrer"&gt;registered claim&lt;/a&gt; to identify the principal. Here's an example payload of a JWT (modified, not really issued by Google 😉):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iss"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://accounts.google.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sub"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dZJPsd9oVtAciRY8F5lHzk4yS0hfnBiE@clients"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"aud"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://kafka.auth"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"iat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1672817905&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"exp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1672904305&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"azp"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"dZJPsd9oVtAciRY8F5lHzk4yS0hfnBiE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"scope"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"gty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"client-credentials"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then let's specify that every JWT that comes with the &lt;code&gt;https://kafka.auth&lt;/code&gt; value in the &lt;code&gt;aud&lt;/code&gt; claim is valid. Notice that the audience claim can be an array of strings instead of a single-valued field. This value is set in the IdP.&lt;/p&gt;
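&lt;p&gt;If you want to double-check what your identity provider actually puts in the &lt;code&gt;aud&lt;/code&gt; claim, the payload segment of a JWT is just base64url-encoded JSON and can be decoded locally; here's a sketch (the token below is crafted for the example, not issued by a real IdP):&lt;/p&gt;

```shell
# Decode the payload (second dot-separated segment) of a JWT.
decode_jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the base64 padding stripped by base64url
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Crafted token: a real one would come from the IdP token endpoint
payload='{"aud":"https://kafka.auth","sub":"demo"}'
token="header.$(printf '%s' "$payload" | base64 | tr -d '=\n' | tr '+/' '-_').signature"
decode_jwt_payload "$token"
```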

&lt;p&gt;To finalize the creation, you need to bind roles and resources to this new identity pool, which is a usual operation for every Confluent Cloud administrator!&lt;/p&gt;

&lt;p&gt;Now let's check that it's working with a dumb Kafka consumer. Thanks to the New Client wizard, getting a base configuration to start with is easy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rc9hpludf6uwuwr16wc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1rc9hpludf6uwuwr16wc.jpg" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But you need to tweak it a bit to define how the Java application must request the JWT to present to Confluent Cloud. It's almost like what was &lt;a href="https://dev.to/bleporini/openid-connect-authentication-with-apache-kafka-31-5747"&gt;shown in my previous post&lt;/a&gt;, but in addition you need to set the JAAS configuration with the logical cluster id and the identity pool id:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;sasl.mechanism&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;OAUTHBEARER&lt;/span&gt;
&lt;span class="py"&gt;sasl.login.callback.handler.class&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler&lt;/span&gt;
&lt;span class="py"&gt;sasl.login.connect.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;15000&lt;/span&gt;
&lt;span class="py"&gt;sasl.oauthbearer.token.endpoint.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;https://oauth2.googleapis.com/token&lt;/span&gt;
&lt;span class="py"&gt;sasl.jaas.config&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;clientId="dZJPsd9oVtAciRY8F5lHzk4yS0hfnBiE" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;clientSecret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;extension_logicalCluster="lkc-000000" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;extension_identityPoolId="pool-XXXXX" ;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you can test it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="nv"&gt;$PWD&lt;/span&gt;:/work &lt;span class="nt"&gt;--workdir&lt;/span&gt; /work confluentinc/cp-kafka kafka-console-consumer &lt;span class="nt"&gt;--consumer&lt;/span&gt;.config config.properties &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; pkc-xxxxxx.europe-west1.gcp.confluent.cloud:9092 &lt;span class="nt"&gt;--from-beginning&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;2023-01-04 12:17:49,565] WARN These configurations &lt;span class="s1"&gt;'[basic.auth.credentials.source, acks, schema.registry.url, basic.auth.user.info]'&lt;/span&gt; were supplied but are not used yet. &lt;span class="o"&gt;(&lt;/span&gt;org.apache.kafka.clients.consumer.ConsumerConfig&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"ordertime"&lt;/span&gt;:1497014222380,&lt;span class="s2"&gt;"orderid"&lt;/span&gt;:18,&lt;span class="s2"&gt;"itemid"&lt;/span&gt;:&lt;span class="s2"&gt;"Item_184"&lt;/span&gt;,&lt;span class="s2"&gt;"address"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"city"&lt;/span&gt;:&lt;span class="s2"&gt;"Mountain View"&lt;/span&gt;,&lt;span class="s2"&gt;"state"&lt;/span&gt;:&lt;span class="s2"&gt;"CA"&lt;/span&gt;,&lt;span class="s2"&gt;"zipcode"&lt;/span&gt;:94041&lt;span class="o"&gt;}}&lt;/span&gt;
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"ordertime"&lt;/span&gt;:1497014222380,&lt;span class="s2"&gt;"orderid"&lt;/span&gt;:18,&lt;span class="s2"&gt;"itemid"&lt;/span&gt;:&lt;span class="s2"&gt;"Item_184"&lt;/span&gt;,&lt;span class="s2"&gt;"address"&lt;/span&gt;:&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"city"&lt;/span&gt;:&lt;span class="s2"&gt;"Mountain View"&lt;/span&gt;,&lt;span class="s2"&gt;"state"&lt;/span&gt;:&lt;span class="s2"&gt;"CA"&lt;/span&gt;,&lt;span class="s2"&gt;"zipcode"&lt;/span&gt;:94041&lt;span class="o"&gt;}}&lt;/span&gt;
^CProcessed a total of 2 messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Give it an automation flavour...
&lt;/h2&gt;

&lt;p&gt;All of that was manually set up, using graphical user interfaces and wizards in order to walk you gradually through this process; however, modern organizations require an automated way to provision resources. Guess what: you have multiple options to do that with Confluent Cloud. The low-level one is to use the &lt;a href="https://docs.confluent.io/cloud/current/api.html#tag/Identity-Providers-(iamv2)" rel="noopener noreferrer"&gt;Confluent Cloud REST API&lt;/a&gt;, but more probably you will opt for the Terraform option. That way, you have a real &lt;strong&gt;Infrastructure as Code&lt;/strong&gt; approach, and it's completely embeddable in a global infrastructure definition. So feel free to read the Confluent Cloud Terraform provider documentation, especially the sections about the &lt;a href="https://registry.terraform.io/providers/confluentinc/confluent/latest/docs/resources/confluent_identity_provider" rel="noopener noreferrer"&gt;identity provider&lt;/a&gt; and the &lt;a href="https://registry.terraform.io/providers/confluentinc/confluent/latest/docs/resources/confluent_identity_pool" rel="noopener noreferrer"&gt;identity pool&lt;/a&gt;.&lt;/p&gt;
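&lt;p&gt;As a hedged sketch of the REST option, declaring an identity provider boils down to a single POST; the endpoint path and field names below are assumptions based on the IAM v2 API documentation and should be double-checked before use:&lt;/p&gt;

```shell
# Sketch: create an identity provider through the Confluent Cloud IAM v2 API.
# The endpoint path and JSON field names are assumptions, verify them in the
# API reference.
new_provider_payload() {
  printf '{"display_name":"%s","issuer":"%s","jwks_uri":"%s"}' "$1" "$2" "$3"
}

new_provider_payload "google" \
  "https://accounts.google.com" \
  "https://www.googleapis.com/oauth2/v3/certs"
echo
# curl -s -X POST https://api.confluent.cloud/iam/v2/identity-providers \
#   -H "Authorization: Bearer $CLOUD_API_TOKEN" \
#   -H 'Content-Type: application/json' \
#   -d "$(new_provider_payload google https://accounts.google.com https://www.googleapis.com/oauth2/v3/certs)"
```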

&lt;p&gt;Obviously all of that is only an initial introduction to OIDC integration in Confluent Cloud, and I recommend having a look at the &lt;a href="https://docs.confluent.io/cloud/current/access-management/authenticate/oauth/identity-providers.html" rel="noopener noreferrer"&gt;comprehensive documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>cloud</category>
      <category>security</category>
      <category>oauth</category>
    </item>
    <item>
      <title>OpenID Connect authentication with Apache Kafka 3.1</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Tue, 03 Jan 2023 09:49:28 +0000</pubDate>
      <link>https://dev.to/bleporini/openid-connect-authentication-with-apache-kafka-31-5747</link>
      <guid>https://dev.to/bleporini/openid-connect-authentication-with-apache-kafka-31-5747</guid>
      <description>&lt;p&gt;Dear reader, this is not going to be fun because today we're talking about security. However, to make it less boring, this is about taking advantage of the support of OpenID Connect (OIDC) in Kafka 3.1, the foundation of &lt;a href="https://www.confluent.io/blog/introducing-confluent-platform-7-1/" rel="noopener noreferrer"&gt;Confluent Platform 7.1&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenID Connect
&lt;/h2&gt;

&lt;p&gt;Let's start with a few words about &lt;a href="https://openid.net/connect/" rel="noopener noreferrer"&gt;OIDC&lt;/a&gt;. It's an open standard that completes OAuth2.0. The aim of this post is not to give a proper introduction to OIDC, but let's emphasize some key differences from OAuth2.0.&lt;/p&gt;

&lt;p&gt;First of all, in OAuth2.0, the token is nothing more than an opaque string that has to be verified against the authorization server to be trusted. OIDC uses &lt;a href="https://datatracker.ietf.org/doc/html/rfc7519" rel="noopener noreferrer"&gt;JSON Web Tokens&lt;/a&gt;: signed, base64-encoded JSON documents. The cool thing is that, as a token is signed, applications can trust it without making any request to the authorization server, so it only costs processing resources, which scales way better than point-to-point connections. The only element the application (here the application is a Kafka broker) needs is the public key for validating the token; it's published by the authorization server through another open specification, &lt;a href="https://datatracker.ietf.org/doc/html/rfc7517" rel="noopener noreferrer"&gt;JWKS&lt;/a&gt;, and is easily cacheable.&lt;/p&gt;

&lt;p&gt;Obviously, this is an extremely incomplete summary of OIDC. What I like about it is that it frees the application from authentication method complexity. With OIDC, the organization can opt for simple user/password authentication, MFA, biometrics, SSO, multiple authentication flows and other options:  it has no impact on the application as long as it complies with the standard.&lt;/p&gt;

&lt;h2&gt;
  
  
  Putting it together with Kafka
&lt;/h2&gt;

&lt;p&gt;Here, we're keeping it simple, as the use case is application-to-application authentication, so I'm using only a client id and client secret. In order to make it as light as possible, the authorization server is &lt;a href="https://auth0.com/" rel="noopener noreferrer"&gt;Auth0&lt;/a&gt;, a fully managed service with a free tier. To set it up, I recommend reading the &lt;a href="https://auth0.com/docs/quickstart/backend" rel="noopener noreferrer"&gt;Backend/API&lt;/a&gt; section of the documentation. Kafka is not part of the listed backends, but the &lt;a href="https://auth0.com/docs/quickstart/backend/java-spring-security5#configure-auth0-apis" rel="noopener noreferrer"&gt;Configure Auth0 APIs&lt;/a&gt; paragraph of any kind of backend fits this PoC. Feel free to opt for any other authentication provider, as the standard is open and there are multiple implementation alternatives, both self-managed and as-a-service.&lt;/p&gt;

&lt;p&gt;The support of OIDC in Kafka 3.1 is an extension of an existing feature and is defined in &lt;a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186877575" rel="noopener noreferrer"&gt;KIP-768&lt;/a&gt;. The authentication flow is pretty simple:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhrrr4b5cvv8mjamohcv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkhrrr4b5cvv8mjamohcv.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During startup, the broker collects the public key set from the authorization server. The client starts by authenticating against the authorization server, then the latter issues a JWT. This token is then used for the SASL/OAUTHBEARER authentication. The broker now validates the token by verifying the signature and claims to clear the client.&lt;/p&gt;

&lt;p&gt;To make it more fun, I'm using Kafka in KRaft mode (so without Zookeeper) based on this example &lt;a href="https://github.com/confluentinc/cp-all-in-one/tree/latest/cp-all-in-one-kraft" rel="noopener noreferrer"&gt;running in Docker&lt;/a&gt; provided by &lt;a href="https://www.confluent.io/" rel="noopener noreferrer"&gt;Confluent&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The first step is to validate the Auth0 setup, and Kafka comes with a handy command line tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker run &lt;span class="nt"&gt;-ti&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; confluentinc/cp-kafka:7.1.0 kafka-run-class org.apache.kafka.tools.OAuthCompatibilityTool &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--clientId&lt;/span&gt; XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--clientSecret&lt;/span&gt; XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sasl&lt;/span&gt;.oauthbearer.jwks.endpoint.url https://xxxx-xxxxx.us.auth0.com/.well-known/jwks.json &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sasl&lt;/span&gt;.oauthbearer.token.endpoint.url https://xxxx-xxxxx.us.auth0.com/oauth/token &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--sasl&lt;/span&gt;.oauthbearer.expected.audience https://kafka.auth

PASSED 1/5: client configuration
PASSED 2/5: client JWT retrieval
PASSED 3/5: client JWT validation
PASSED 4/5: broker configuration
PASSED 5/5: broker JWT validation
SUCCESS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All configurations come from the authentication provider; any other kind of output would require you to take a look at the Auth0 configuration.&lt;br&gt;
Now let's tweak the broker configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;2'&lt;/span&gt;
&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;broker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;confluentinc/cp-kafka:7.1.0&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;broker&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9092:9092"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;9101:9101"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_BROKER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENER_SECURITY_PROTOCOL_MAP&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,OIDC:SASL_PLAINTEXT'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_ADVERTISED_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT://broker:29092,OIDC://localhost:9092'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_MIN_ISR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_JMX_PORT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;9101&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_JMX_HOSTNAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_PROCESS_ROLES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;broker,controller'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_NODE_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_QUORUM_VOTERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1@broker:29093'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENERS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT://broker:29092,CONTROLLER://broker:29093,OIDC://0.0.0.0:9092'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_INTER_BROKER_LISTENER_NAME&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_CONTROLLER_LISTENER_NAMES&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;CONTROLLER'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LOG_DIRS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/tmp/kraft-combined-logs'&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_SASL_ENABLED_MECHANISMS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OAUTHBEARER&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_SASL_OAUTHBEARER_JWKS_ENDPOINT_URL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$JWKS_ENDPOINT_URL&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_OPTS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-Djava.security.auth.login.config=/tmp/kafka_server_jaas.conf&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_SASL_OAUTHBEARER_EXPECTED_AUDIENCE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$OIDC_AUD&lt;/span&gt;
      &lt;span class="na"&gt;KAFKA_LISTENER_NAME_OIDC_OAUTHBEARER_SASL_SERVER_CALLBACK_HANDLER_CLASS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./update_run.sh:/tmp/update_run.sh&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./kafka_server_jaas.conf:/tmp/kafka_server_jaas.conf&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./client.properties:/tmp/client.properties&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bash&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-c&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'if&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;!&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/tmp/update_run.sh&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;];&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;then&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;echo&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;ERROR:&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;Did&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;you&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;forget&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;update_run.sh&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;file&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;came&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;this&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;docker-compose.yml&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;file?&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span 
class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;else&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/tmp/update_run.sh&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;/etc/confluent/docker/run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;;&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fi'"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here are the differences with the original example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight diff"&gt;&lt;code&gt;$ diff compose.yml compose.ori.yml
&lt;span class="p"&gt;14,15c14,15
&lt;/span&gt;&lt;span class="gd"&gt;&amp;lt;       KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,OIDC:SASL_PLAINTEXT'
&amp;lt;       KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker:29092,OIDC://localhost:9092'
&lt;/span&gt;&lt;span class="p"&gt;---
&lt;/span&gt;&lt;span class="gi"&gt;&amp;gt;       KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
&amp;gt;       KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker:29092,PLAINTEXT_HOST://localhost:9092'
&lt;/span&gt;&lt;span class="p"&gt;25c25
&lt;/span&gt;&lt;span class="gd"&gt;&amp;lt;       KAFKA_LISTENERS: 'PLAINTEXT://broker:29092,CONTROLLER://broker:29093,OIDC://0.0.0.0:9092'
&lt;/span&gt;&lt;span class="p"&gt;---
&lt;/span&gt;&lt;span class="gi"&gt;&amp;gt;       KAFKA_LISTENERS: 'PLAINTEXT://broker:29092,CONTROLLER://broker:29093,PLAINTEXT_HOST://0.0.0.0:9092'
&lt;/span&gt;&lt;span class="p"&gt;29,33d28
&lt;/span&gt;&lt;span class="gd"&gt;&amp;lt;       KAFKA_SASL_ENABLED_MECHANISMS: OAUTHBEARER
&amp;lt;       KAFKA_SASL_OAUTHBEARER_JWKS_ENDPOINT_URL: $JWKS_ENDPOINT_URL
&amp;lt;       KAFKA_OPTS: -Djava.security.auth.login.config=/tmp/kafka_server_jaas.conf
&amp;lt;       KAFKA_SASL_OAUTHBEARER_EXPECTED_AUDIENCE: $OIDC_AUD
&amp;lt;       KAFKA_LISTENER_NAME_OIDC_OAUTHBEARER_SASL_SERVER_CALLBACK_HANDLER_CLASS: org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler
&lt;/span&gt;&lt;span class="p"&gt;36,37d30
&lt;/span&gt;&lt;span class="gd"&gt;&amp;lt;       - ./kafka_server_jaas.conf:/tmp/kafka_server_jaas.conf
&amp;lt;       - ./client.properties:/tmp/client.properties
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Long story short, the external listener has been renamed and configured to use SASL_PLAINTEXT with the OAUTHBEARER mechanism. Notice that the coordinates of the authorization server are provided through environment variables in order to keep the setup generic.&lt;/p&gt;
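&lt;p&gt;As a reminder, the Confluent images map every &lt;code&gt;KAFKA_FOO_BAR&lt;/code&gt; environment variable to a &lt;code&gt;foo.bar&lt;/code&gt; broker property, so the variables above translate into the following broker configuration (values taken from this example):&lt;/p&gt;

```properties
# Broker properties resulting from the environment variables above
sasl.enabled.mechanisms=OAUTHBEARER
sasl.oauthbearer.jwks.endpoint.url=https://xxxx-xxxxx.us.auth0.com/.well-known/jwks.json
sasl.oauthbearer.expected.audience=https://kafka.auth
# The listener.name.oidc prefix scopes the validator callback to the OIDC listener only
listener.name.oidc.oauthbearer.sasl.server.callback.handler.class=org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerValidatorCallbackHandler
```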

&lt;p&gt;The JAAS configuration is pretty basic:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;KafkaServer {
    org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required;
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let’s start the broker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ JWKS_ENDPOINT_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://xxxx-xxxxx.us.auth0.com/.well-known/jwks.json &lt;span class="nv"&gt;OIDC_AUD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https://kafka.auth compose up
&lt;span class="o"&gt;[&lt;/span&gt;+] Running 1/1
 ⠿ Container broker  Created                       0.1s
Attaching to broker
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; User
broker  | &lt;span class="nv"&gt;uid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000&lt;span class="o"&gt;(&lt;/span&gt;appuser&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;gid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000&lt;span class="o"&gt;(&lt;/span&gt;appuser&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="nb"&gt;groups&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000&lt;span class="o"&gt;(&lt;/span&gt;appuser&lt;span class="o"&gt;)&lt;/span&gt;
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Configuring ...
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Running preflight checks ...
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Check &lt;span class="k"&gt;if&lt;/span&gt; /var/lib/kafka/data is writable ...
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Check &lt;span class="k"&gt;if &lt;/span&gt;Zookeeper is healthy ...
broker  | ignore zk-ready  40
broker  | Formatting /tmp/kraft-combined-logs
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Launching ...
broker  | &lt;span class="o"&gt;===&amp;gt;&lt;/span&gt; Launching kafka ...
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  | &lt;span class="o"&gt;[&lt;/span&gt;2022-04-15 06:35:13,095] INFO KafkaConfig values:
broker  |   advertised.listeners &lt;span class="o"&gt;=&lt;/span&gt; PLAINTEXT://broker:29092,OIDC://localhost:9092
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  |   listener.security.protocol.map &lt;span class="o"&gt;=&lt;/span&gt; CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,OIDC:SASL_PLAINTEXT
broker  |   listeners &lt;span class="o"&gt;=&lt;/span&gt; PLAINTEXT://broker:29092,CONTROLLER://broker:29093,OIDC://0.0.0.0:9092
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  |   sasl.enabled.mechanisms &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;OAUTHBEARER]
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  |   sasl.oauthbearer.expected.audience &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;https://kafka.auth]
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  |   sasl.oauthbearer.jwks.endpoint.url &lt;span class="o"&gt;=&lt;/span&gt; https://xxxx-xxxxx.us.auth0.com/.well-known/jwks.json
&lt;span class="o"&gt;[&lt;/span&gt;...]
broker  | &lt;span class="o"&gt;[&lt;/span&gt;2022-04-15 06:35:13,159] INFO &lt;span class="o"&gt;[&lt;/span&gt;BrokerLifecycleManager &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1] The broker has been unfenced. Transitioning from RECOVERY to RUNNING. &lt;span class="o"&gt;(&lt;/span&gt;kafka.server.BrokerLifecycleManager&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good stuff! Next, let's configure the client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;security.protocol&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;SASL_PLAINTEXT&lt;/span&gt;
&lt;span class="py"&gt;sasl.mechanism&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;OAUTHBEARER&lt;/span&gt;
&lt;span class="py"&gt;sasl.login.callback.handler.class&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler&lt;/span&gt;
&lt;span class="py"&gt;sasl.login.connect.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;15000&lt;/span&gt;
&lt;span class="py"&gt;sasl.oauthbearer.token.endpoint.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;https://xxxx-xxxxx.us.auth0.com/oauth/token&lt;/span&gt;
&lt;span class="py"&gt;sasl.jaas.config&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;clientId="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;clientSecret="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" ;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're now good to go with some basic produce and consume tests. You may have noticed that I also mounted the client configuration file in the broker container; that's purely for convenience, so the clients can run in the same container:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; broker kafka-console-producer &lt;span class="nt"&gt;--producer&lt;/span&gt;.config /tmp/client.properties &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;Hello OIDC!
&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And in a different terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; broker kafka-console-consumer &lt;span class="nt"&gt;--consumer&lt;/span&gt;.config /tmp/client.properties &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test
&lt;/span&gt;Hello OIDC!
^CProcessed a total of 1 messages
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it!&lt;/p&gt;
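&lt;p&gt;If you'd rather configure a Java client programmatically, the same settings can be assembled like this; it's a minimal sketch where the token endpoint and credentials are placeholders, exactly as in &lt;code&gt;client.properties&lt;/code&gt; above:&lt;/p&gt;

```java
import java.util.Properties;

public class OidcClientConfig {
    // Builds the same SASL/OAUTHBEARER settings as client.properties;
    // the endpoint, clientId and clientSecret arguments are placeholders
    static Properties oauthClientProps(String tokenEndpoint, String clientId, String clientSecret) {
        Properties props = new Properties();
        props.setProperty("security.protocol", "SASL_PLAINTEXT");
        props.setProperty("sasl.mechanism", "OAUTHBEARER");
        props.setProperty("sasl.login.callback.handler.class",
                "org.apache.kafka.common.security.oauthbearer.secured.OAuthBearerLoginCallbackHandler");
        props.setProperty("sasl.oauthbearer.token.endpoint.url", tokenEndpoint);
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule required "
                        + "clientId=\"" + clientId + "\" clientSecret=\"" + clientSecret + "\" ;");
        return props;
    }

    public static void main(String[] args) {
        Properties p = oauthClientProps("https://xxxx-xxxxx.us.auth0.com/oauth/token", "myId", "mySecret");
        System.out.println(p.getProperty("sasl.mechanism")); // prints OAUTHBEARER
    }
}
```

&lt;p&gt;These properties can then be handed to a &lt;code&gt;KafkaProducer&lt;/code&gt; or &lt;code&gt;KafkaConsumer&lt;/code&gt; constructor as usual.&lt;/p&gt;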

&lt;p&gt;Running the client without the proper configuration raises errors on both sides, confirming that the broker rejects it as expected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; &lt;span class="nt"&gt;-ti&lt;/span&gt; broker kafka-console-consumer &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9092 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;2022-04-15 06:57:28,564] WARN &lt;span class="o"&gt;[&lt;/span&gt;Consumer &lt;span class="nv"&gt;clientId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;console-consumer, &lt;span class="nv"&gt;groupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;console-consumer-9357] Bootstrap broker localhost:9092 &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;: &lt;span class="nt"&gt;-1&lt;/span&gt; rack: null&lt;span class="o"&gt;)&lt;/span&gt; disconnected &lt;span class="o"&gt;(&lt;/span&gt;org.apache.kafka.clients.NetworkClient&lt;span class="o"&gt;)&lt;/span&gt;

broker  | &lt;span class="o"&gt;[&lt;/span&gt;2022-04-15 06:57:28,247] INFO &lt;span class="o"&gt;[&lt;/span&gt;SocketServer &lt;span class="nv"&gt;listenerType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;BROKER, &lt;span class="nv"&gt;nodeId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1] Failed authentication with /127.0.0.1 &lt;span class="o"&gt;(&lt;/span&gt;Unexpected Kafka request of &lt;span class="nb"&gt;type &lt;/span&gt;METADATA during SASL handshake.&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;org.apache.kafka.common.network.Selector&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you increase the level of the &lt;code&gt;org.apache.kafka.common.security&lt;/code&gt; logger, you'll be able to see the parsed token.&lt;/p&gt;
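&lt;p&gt;With the Confluent image used above, one way to do that is through the dedicated environment variable in the Compose file; the DEBUG level below is just an illustrative choice:&lt;/p&gt;

```yaml
# Hypothetical addition to the broker's environment section:
# raises the security package logger so token handling details are logged
KAFKA_LOG4J_LOGGERS: 'org.apache.kafka.common.security=DEBUG'
```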

&lt;p&gt;For reference, I also recommend reading the &lt;a href="https://docs.confluent.io/platform/current/kafka/authentication_sasl/authentication_sasl_oauth.html#" rel="noopener noreferrer"&gt;SASL/OAUTHBEARER documentation&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>oauth2</category>
      <category>oidc</category>
      <category>openidconnect</category>
    </item>
    <item>
      <title>ksqlDB : pull queries on streams</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Mon, 14 Nov 2022 12:54:57 +0000</pubDate>
      <link>https://dev.to/bleporini/ksqldb-pull-queries-on-streams-5828</link>
      <guid>https://dev.to/bleporini/ksqldb-pull-queries-on-streams-5828</guid>
<description>&lt;p&gt;&lt;code&gt;ksqlDB&lt;/code&gt; (initially named &lt;code&gt;KSQL&lt;/code&gt;) is not a new product, as the first preview versions were released more than four years ago now. And since the beginning, there’s been a common misunderstanding: developers think at first that KSQL is a language to interrogate topic content. But that isn’t its DNA: as a streaming database, its purpose is to offer the means to process data in real time, at scale and in a resilient manner, with various high-level features such as joining streams of events, continuously aggregating data, etc.&lt;/p&gt;

&lt;p&gt;However, the fact that developers had that kind of expectation from the product is no surprise; trying to do things the way you’ve always done them feels natural. The challenge is the paradigm shift. As a solutions engineer, easing that change of culture and opening eyes to the differences between data at rest and data in motion is my daily job.&lt;/p&gt;

&lt;p&gt;That being said, there’s no point in opposing reacting to events and interrogating state; both are valid designs, to be used appropriately depending on the needs. I guess that’s why ksqlDB evolved over time and offered pull queries in v0.6.0. At first they were only available for materialized views, then for tables, and now they’re finally available for streams.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Yes, you now have the capability to write that kind of statement&lt;/strong&gt;: &lt;code&gt;SELECT ... from &amp;lt;a topic mapped as a stream&amp;gt; WHERE ...&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;As a pull query, it’s executed, the data is fetched and returned to the client, and then the connection is closed.&lt;/p&gt;
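&lt;p&gt;To make the contrast with push queries concrete (the stream and column names below are illustrative):&lt;/p&gt;

```sql
-- Pull query: executed once, the matching rows are returned, the connection closes
SELECT * FROM my_stream WHERE id = 42;

-- Push query: the connection stays open and new matches are streamed as they arrive
SELECT * FROM my_stream WHERE id = 42 EMIT CHANGES;
```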

&lt;h2&gt;
  
  
  The right tool for the job
&lt;/h2&gt;

&lt;p&gt;Before throwing away PostgreSQL and other databases of that kind, let’s take a step back and see what the impact of that kind of query is. I created a small test on a basic &lt;a href="https://confluent.cloud"&gt;Confluent Cloud&lt;/a&gt; cluster and used the &lt;a href="https://www.confluent.io/hub/confluentinc/kafka-connect-datagen"&gt;Datagen Source Connector&lt;/a&gt; to generate some dummy data based on the inventory model. I let it run until I had a significant amount of data in the topic, around 1GB. After that, I mapped the topic in ksqlDB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;STREAM&lt;/span&gt; &lt;span class="n"&gt;inventory&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'inventory'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'AVRO'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that, thanks to the Schema Registry integration, the stream is automatically created with the expected data structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yPoCD8ZV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kbrhnml1pp8xp346t4pc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yPoCD8ZV--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kbrhnml1pp8xp346t4pc.jpg" alt="Data structure" width="310" height="282"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I executed a useless query that returns no records at all:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;select&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt;  &lt;span class="n"&gt;INVENTORY&lt;/span&gt; &lt;span class="k"&gt;where&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As expected, the execution returned no record. What’s good with &lt;a href="https://confluent.cloud"&gt;Confluent Cloud&lt;/a&gt; is that in addition to spinning up a cluster in a couple of seconds, it comes with out-of-the-box metrics in the user interface, and the impact of that query can be checked almost immediately in the consumption graph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--X0hcl6x7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j3d8nzp3gg7xlly1jwzt.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--X0hcl6x7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j3d8nzp3gg7xlly1jwzt.jpg" alt="Metrics" width="880" height="346"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can clearly see that the whole content of the topic was scanned, generating a lot of outgoing traffic.&lt;/p&gt;

&lt;p&gt;Thank you Captain Obvious, you’ve just demonstrated that Kafka is not a database. So in which cases are these pull queries useful? Well, imagine a topic on which you have to do forensics: spotting a set of records against some criteria would require building an application with a consumer that fully reads the topic content and applies the discriminant (in fact, this is what a pull query does). Dramatically painful for such a basic need. With pull queries on streams, the longest part is the execution 😎.&lt;/p&gt;

&lt;p&gt;All of that is to illustrate what is briefly mentioned in the &lt;a href="https://www.confluent.io/blog/ksqldb-0-23-1-features-updates/#pull-queries"&gt;ksqlDB 0.23.1 blog post&lt;/a&gt; with tangible facts about the caution required when using them. If you want to know more about ksqlDB, head over to &lt;a href="https://ksqldb.io"&gt;ksqldb.io&lt;/a&gt; to get started, where you can follow the quick start, read the docs, and learn more! Pro tip: I especially recommend checking out &lt;a href="https://confluent.cloud"&gt;Confluent Cloud&lt;/a&gt; as it will give you the opportunity to have a complete working environment in a couple of minutes 😉.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>ksqldb</category>
      <category>datastreaming</category>
      <category>streamprocessing</category>
    </item>
    <item>
      <title>Why you should not query a database in your stream processors</title>
      <dc:creator>Brice LEPORINI</dc:creator>
      <pubDate>Thu, 24 Mar 2022 06:19:56 +0000</pubDate>
      <link>https://dev.to/bleporini/why-you-should-not-query-a-database-in-your-stream-processors-4829</link>
      <guid>https://dev.to/bleporini/why-you-should-not-query-a-database-in-your-stream-processors-4829</guid>
<description>&lt;p&gt;Enriching an event with data from another source is one of the most common use cases in event streaming. But where does the extra enrichment information for the event come from? In Kafka Streams it could easily be written like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;StreamsBuilder&lt;/span&gt; &lt;span class="n"&gt;streamsBuilder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamsBuilder&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="n"&gt;streamsBuilder&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my_topic"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;mapValues&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;enrichRecord&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;findCustomerById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getCusutomerId&lt;/span&gt;&lt;span class="o"&gt;())))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;to&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"enriched_records"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or in ksqlDB with a User Defined Function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@UdfDescription&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"find_customer_by_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;author&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Brice "&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;version&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0.2"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Finds a Customer entity based on its id."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FindCustomerByIdUdf&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Udf&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"..."&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;Struct&lt;/span&gt; &lt;span class="nf"&gt;findCustomerById&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@UdfParameter&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;[...]&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;customer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;used like that in &lt;code&gt;ksqlDB&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;enriched_records&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;"enriched_records"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[...]&lt;/span&gt;
    &lt;span class="n"&gt;find_customer_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;emit&lt;/span&gt; &lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can replace the database query with an external call of any kind: REST API request, lookup in a file, etc.&lt;br&gt;
It compiles, it works, and it looks like what we've been doing forever, so what’s the problem with doing this?&lt;/p&gt;

&lt;p&gt;First of all, there’s a semantic issue, because stream processing is expected to be idempotent, meaning that processing the same stream of events again and again should produce the same values (unless you change the implementation of the application, obviously). Involving a third party to provide data to enrich your stream breaks this property, because there’s no guarantee that the external call returns the same value each time it’s invoked with the same arguments.&lt;/p&gt;
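&lt;p&gt;A tiny standalone sketch of why the property breaks (the store and names are hypothetical, standing in for any external datastore):&lt;/p&gt;

```java
import java.util.Properties;

public class EnrichmentDeterminism {
    // Hypothetical external store queried at enrichment time
    static final Properties customerStore = new Properties();

    // Enrichment via an external lookup: the result depends on the store's
    // state at processing time, not only on the input event
    static String enrich(String order, String customerId) {
        return order + "/" + customerStore.getProperty(customerId);
    }

    public static void main(String[] args) {
        customerStore.setProperty("42", "ACME");
        String first = enrich("order-1", "42");
        // The external data changes between the first run and a later replay
        customerStore.setProperty("42", "ACME Corp");
        String replay = enrich("order-1", "42");
        // Replaying the very same event no longer yields the same value
        System.out.println(first.equals(replay)); // prints false
    }
}
```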

&lt;p&gt;Then let’s talk about the architecture concerns. Kafka is a distributed system: dealing with failures is part of its DNA, and there are multiple architecture patterns to face almost any kind of outage. This is why Kafka is a first-class choice as a central nervous system for many organizations. If you put in the middle of your pipeline a dependency on an external datastore that doesn’t provide the same guarantees, then the resilience and the performance of your application become the ones offered by this foreign system. And it’s not uncommon to fetch data from a traditional RDBMS; don’t get me wrong, those are really good tools providing great features, but not with the same guarantees. When that system is unavailable, the whole pipeline is down, ruining your efforts to provide a resilient streaming platform.&lt;/p&gt;

&lt;p&gt;My next point against this kind of design concerns external calls that produce a side effect (meaning each call creates or updates foreign data). In addition to the former point, this breaks the Exactly-Once Semantics offered out of the box by ksqlDB and Kafka Streams (and by the vanilla Kafka client, at the cost of some boilerplate), because in case of a failure of any kind during the processing of a record, there’s no means to automatically roll back the changes in the remote system. Let’s illustrate it with a practical scenario: imagine the remote request increments a counter, and during operations one of the ksqlDB workers becomes unreachable for any reason. The workload is rebalanced to the surviving instances and the last uncommitted batch of records is processed once again, meaning there are also unexpected increments in the foreign system. Hashtag data corruption.&lt;/p&gt;

&lt;p&gt;This is the well-known issue of lacking distributed transaction management… but lack may not be the right term, because this is not something that’s expected to be implemented. Indeed, in the past there were options, like XA, to deal with distributed transactions, but they were really cumbersome to set up and raised real scalability concerns by design. So this is definitely not what you expect when building a data streaming platform able to process gigabytes of records per second!&lt;/p&gt;
&lt;h2&gt;
  
  
  So how to sort this out?
&lt;/h2&gt;

&lt;p&gt;Usually data enrichment is nothing more than a data lookup plus record merging, so the best way to do it is to onboard that data into Kafka topics, put a table abstraction on top of it (more details about that concept in this &lt;a href="https://www.confluent.io/blog/kafka-streams-tables-part-1-event-streaming/" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;), and then join the stream of events to this table in order to merge the records. The good thing about this is that the external datastore is no longer interrogated, therefore that point of failure is removed. Even if the remote system is unavailable, it won’t have any effect on the pipeline.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx03nk6zo62373l0v55nn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx03nk6zo62373l0v55nn.png" alt="stream table join"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And this is something that can be translated into ksqlDB as follows (assuming co-partitioning):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;create&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="n"&gt;enriched_records&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;kafka_topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;"enriched_records_by_customer_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;
&lt;span class="k"&gt;select&lt;/span&gt;
    &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;order_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="n"&gt;orders&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="k"&gt;join&lt;/span&gt; &lt;span class="n"&gt;customers&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="k"&gt;on&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;customer_id&lt;/span&gt;
&lt;span class="n"&gt;emit&lt;/span&gt; &lt;span class="n"&gt;changes&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are multiple options to make this data available in a topic: if the remote system is an application already onboarded in Kafka, it can be updated to stream its changes to the destination topic. If it’s a database or a legacy system not expected to share records in Kafka, then you can rely on source connectors, such as Change Data Capture (CDC) or JDBC connectors.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if the remote system is out of my organisation?
&lt;/h2&gt;

&lt;p&gt;This happens when you have to deal with a partner API or any kind of remote system under the control of another business unit, so it’s not possible to onboard this data in Kafka. It looks like you’re doomed to make the call in the stream processor… Well, not so fast, because there’s another concern, a bit more technical, that you should not pass over. To understand it, we have to go deeper in the layers, down to the Kafka client library. At the end of the day, processing a stream of records is nothing more than implementing a kind of loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Consumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;(&lt;/span&gt;&lt;span class="n"&gt;props&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Arrays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;asList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

 &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;DataRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Doing stuff&lt;/span&gt;
            &lt;span class="o"&gt;[...]&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
      &lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether you’re writing ksqlDB queries or Kafka Streams Java code, it results in that kind of poll loop. The Kafka Java client library comes with the following configuration properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;max.poll.interval.ms&lt;/code&gt;:
The maximum delay between invocations of poll() when using consumer group management.[…] If poll() is not called before expiration of this timeout, then the consumer is considered failed and the consumer group coordinator will trigger a rebalance in order to reassign the partitions to another member.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max.poll.records&lt;/code&gt;:
The maximum number of records returned in a single call to poll().&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now let’s say that the remote system slows down for any reason and each request has a one-second response time. The default value for &lt;code&gt;max.poll.records&lt;/code&gt; is 500, so one iteration of the poll loop can take up to &lt;strong&gt;500 seconds…&lt;/strong&gt; And the default value for &lt;code&gt;max.poll.interval.ms&lt;/code&gt; is 300000 (5 minutes), so what will happen in this context is that the &lt;code&gt;GroupCoordinator&lt;/code&gt; will consider the client as down and trigger a rebalance. But your Kafka Streams application (or ksqlDB persistent query) is not actually down, so the batch of records won’t be committed, and after the rebalance the same records will be processed again and again, continuously increasing the consumer lag. This can lead to a snowball effect: the root cause is a slow remote system, and because it’s slow, it’s invoked more and more, without any chance to recover… &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwompdsqfbtgnwtegqpy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcwompdsqfbtgnwtegqpy.gif" alt="Boom"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Don’t think of these as theoretical concerns, because it’s something I’ve seen in the field!&lt;/p&gt;

&lt;p&gt;The cheapest answer is to tune the values of &lt;code&gt;max.poll.records&lt;/code&gt; or &lt;code&gt;max.poll.interval.ms&lt;/code&gt;, which can be fine to adapt to the usual latency and response time, but pushing the limits to absorb occasional spikes is risky, because it can lead to a vicious circle.&lt;/p&gt;
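&lt;p&gt;For illustration, a sketch of that tuning: derive &lt;code&gt;max.poll.records&lt;/code&gt; from an assumed per-call response time, keeping a safety margin under &lt;code&gt;max.poll.interval.ms&lt;/code&gt;. The numbers and the 50% margin are illustrative assumptions, not recommendations:&lt;/p&gt;

```java
import java.util.Properties;

// Derive a max.poll.records budget from an assumed per-call latency, so that a
// full batch stays well below max.poll.interval.ms. This adapts to the usual
// response time, but not to a spike beyond the assumption.
public class PollTuning {
    static long maxPollRecordsFor(long perCallMillis, long maxPollIntervalMillis) {
        long budget = maxPollIntervalMillis / 2; // keep a 50% safety margin
        return Math.max(1, budget / perCallMillis);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        long records = maxPollRecordsFor(1_000L, 300_000L);
        props.setProperty("max.poll.records", Long.toString(records));
        props.setProperty("max.poll.interval.ms", "300000");
        System.out.println(props.getProperty("max.poll.records")); // 150
    }
}
```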

&lt;p&gt;What about using an asynchronous client to avoid blocking the poll loop thread? This design doesn’t work at all, because &lt;code&gt;KafkaConsumer&lt;/code&gt; is not thread safe. And it’s not an oversight: it’s enforced by the Kafka processing model, because otherwise you would lose the ordering guarantee.&lt;/p&gt;

&lt;h2&gt;
  
  
  So is there any viable option?
&lt;/h2&gt;

&lt;p&gt;Yes there is, at the cost of a less straightforward design… The basic idea is to split the process in two, delegating the request processing to another component that can take advantage of an asynchronous design:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e5ixzxhr40vojyeo1qm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4e5ixzxhr40vojyeo1qm.png" alt="Async processor"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, requests are records stored in a topic, consumed by a dedicated processor that runs the requests asynchronously, to avoid blocking the poll loop, and eventually writes the results to an output topic. Then the second part of the initial pipeline is able to move forward by joining the results with the pending jobs.&lt;/p&gt;

&lt;p&gt;Wait a minute, it sounds exactly like what was described as irrelevant in a Kafka context, doesn’t it? Not exactly: as long as the result of the request is not required to commit the offsets of the request topic, that’s OK. On the other hand, it requires implementing, at a higher level, things like timeouts, a maximum number of in-flight requests and crash recovery.&lt;/p&gt;
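&lt;p&gt;To give an idea of what that higher-level bookkeeping can look like, here is a minimal sketch of the in-flight cap and the per-request timeout using only standard Java concurrency tools; &lt;code&gt;callRemoteSystem&lt;/code&gt; and the result counter are stand-ins for the real remote call and for producing to the output topic, not the actual implementation:&lt;/p&gt;

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the dedicated request processor: a cap on in-flight requests
// (Semaphore) and a per-request timeout (orTimeout). callRemoteSystem and
// the counter are stand-ins for the remote call and the output topic.
public class AsyncRequestProcessor {
    static final Semaphore inFlight = new Semaphore(10); // max in-flight requests
    static final AtomicInteger produced = new AtomicInteger();

    static void process(String request) throws InterruptedException {
        inFlight.acquire(); // only blocks the poll loop once the cap is reached
        CompletableFuture
            .runAsync(() -> callRemoteSystem(request))
            .orTimeout(5, TimeUnit.SECONDS) // per-request timeout
            .whenComplete((ok, err) -> {
                if (err == null) produced.incrementAndGet(); // "write to output topic"
                inFlight.release();
            });
    }

    static void callRemoteSystem(String request) { /* stand-in for the real call */ }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 100; i++) process("req-" + i);
        inFlight.acquire(10); // drain: wait until nothing is in flight
        System.out.println(produced.get() + " results produced"); // 100 results produced
    }
}
```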

&lt;p&gt;It obviously increases the complexity; however, it offers a real opportunity to implement rich error management scenarios, like retries with various back-off policies.&lt;/p&gt;
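&lt;p&gt;As a taste of such a scenario, a sketch of an exponential back-off schedule with a cap (the base delay and the cap are made-up numbers):&lt;/p&gt;

```java
// Exponential back-off schedule: the delay before retry n doubles on each
// attempt, starting from a base delay and capped at a maximum. The base and
// cap below are made-up numbers, not recommendations.
public class Backoff {
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        long exponential = baseMillis << Math.min(attempt, 30); // base * 2^attempt
        return Math.min(exponential, capMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 8; attempt++) {
            System.out.println("retry " + attempt + " after "
                    + delayMillis(attempt, 100, 10_000) + " ms");
        }
        // 100, 200, 400, 800, 1600, 3200, 6400, then capped at 10000 ms
    }
}
```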

&lt;p&gt;You can check out an implementation example of that kind of architecture &lt;a href="https://github.com/bleporini/kafka-side-effects-processor" rel="noopener noreferrer"&gt;here&lt;/a&gt;. It’s not battle tested but it can give you inspiration for your own needs.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>ksqldb</category>
      <category>kafkastreams</category>
      <category>streamprocessing</category>
    </item>
  </channel>
</rss>
