<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Moslem Chalfouh</title>
    <description>The latest articles on DEV Community by Moslem Chalfouh (@moslem_chalfouh_967e323f7).</description>
    <link>https://dev.to/moslem_chalfouh_967e323f7</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1400085%2F0cf443ce-6603-43f7-90f0-9c234cef4131.jpeg</url>
      <title>DEV Community: Moslem Chalfouh</title>
      <link>https://dev.to/moslem_chalfouh_967e323f7</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/moslem_chalfouh_967e323f7"/>
    <language>en</language>
    <item>
      <title>Kafka Retry Done Right: The Day I Chose a Simpler Fix Over @RetryableTopic</title>
      <dc:creator>Moslem Chalfouh</dc:creator>
      <pubDate>Sun, 22 Feb 2026 15:56:52 +0000</pubDate>
      <link>https://dev.to/moslem_chalfouh_967e323f7/kafka-retry-done-right-the-day-i-chose-a-simpler-fix-over-retryabletopic-31lp</link>
      <guid>https://dev.to/moslem_chalfouh_967e323f7/kafka-retry-done-right-the-day-i-chose-a-simpler-fix-over-retryabletopic-31lp</guid>
      <description>&lt;h4&gt;
  
  
  When the event is valid but the entity isn’t ready
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Context: Spring Kafka, Confluent Cloud, Java enterprise backend.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Problem: The Update Succeeded… And That Was The Bug
&lt;/h3&gt;

&lt;p&gt;A Kafka consumer. An incoming event carrying data to apply to a core entity. A downstream archival process. Everything looked healthy, until I found silently corrupted entities in production with &lt;strong&gt;zero errors in the logs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The update logic was correct. The entity existed. The data in the event was valid.&lt;/p&gt;

&lt;p&gt;But the entity wasn’t in the right &lt;strong&gt;lifecycle state&lt;/strong&gt; to receive this update yet. An internal validation workflow was still in progress upstream.&lt;/p&gt;

&lt;p&gt;My consumer didn’t know. It applied the update anyway — successfully — and &lt;strong&gt;silently corrupted the entity’s lifecycle in production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobp5wvpfeivj27actnap.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobp5wvpfeivj27actnap.png" alt="Expected lifecycle vs. Reality: The update succeeds, but breaks the logic order." width="800" height="430"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Expected lifecycle vs. Reality: The update succeeds, but breaks the logic order.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A silent lifecycle violation is worse than a crash: nothing alerts, nothing fails visibly. &lt;strong&gt;A technically successful operation at the wrong moment is worse than a failure.&lt;/strong&gt; A failure you can detect; a silent lifecycle corruption you cannot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk8199zv6ie30j7u40pi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjk8199zv6ie30j7u40pi.png" width="741" height="636"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The gap between “Existence” and “Readiness” is where bugs live.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Two Retry Approaches That Look Fine (Until They Hurt)
&lt;/h3&gt;
&lt;h3&gt;
  
  
  Trap #1 — “I’ll just poll until it’s ready”
&lt;/h3&gt;

&lt;p&gt;If I keep the record “in-flight” and refuse to commit the offset until the entity is ready, I don’t just delay that one message. Kafka is FIFO &lt;strong&gt;per partition&lt;/strong&gt;: everything behind that offset on that partition is stuck.&lt;/p&gt;

&lt;p&gt;With concurrency=3, it's not "the broker is blocked" — it's &lt;strong&gt;one partition's entire throughput&lt;/strong&gt; that stalls, silently, under load.&lt;/p&gt;
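
&lt;p&gt;To make the blast radius concrete, here is a tiny simulation (a Python sketch, not the real consumer; the message values are made up) of head-of-line blocking on one partition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Toy model: each partition is a FIFO queue; a consumer cannot
# commit offset N+1 before offset N on the same partition.
from collections import deque

def drain(partitions, is_ready):
    """Process each partition, stopping at its first blocked message."""
    processed = []
    for queue in partitions.values():
        while queue:
            if not is_ready(queue[0]):
                break  # head-of-line blocking: the rest of THIS partition waits
            processed.append(queue.popleft())
    return processed

partitions = {
    0: deque(["a1", "BLOCKED", "a3"]),  # entity not ready for "BLOCKED"
    1: deque(["b1", "b2"]),
    2: deque(["c1"]),
}
done = drain(partitions, is_ready=lambda m: m != "BLOCKED")
print(done)  # partition 0 stalls after a1; partitions 1 and 2 flow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;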

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkmocoyvqm3lxf9nbuxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkmocoyvqm3lxf9nbuxr.png" width="800" height="265"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Holding the offset freezes the queue. Partition 0 is stuck, while others flow.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Trap #2 — “I’ll Thread.sleep() and try again”
&lt;/h3&gt;

&lt;p&gt;Sleeping in the listener thread is a classic way to accidentally trigger a rebalance. If the consumer stops polling for longer than max.poll.interval.ms, the broker assumes the consumer is dead, rebalances the group, and the message gets replayed — potentially forever.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4ozddxwilmdj2o4ugb8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4ozddxwilmdj2o4ugb8.png" width="494" height="556"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The infinite rebalance loop: Broker thinks consumer is dead.&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The “Ideal” Pattern I Didn’t Use (On Purpose)
&lt;/h3&gt;

&lt;p&gt;In theory, &lt;strong&gt;non-blocking retry&lt;/strong&gt; is the cleanest approach: acknowledge the message immediately, park it in a dedicated retry topic, and process it later without blocking the main partition. That’s exactly what Spring Kafka’s @RetryableTopic gives you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RetryableTopic(
    attempts = "4",
    backoff = @Backoff(delay = 30_000, multiplier = 4),
    include = EntityNotReadyException.class
)
@KafkaListener(topics = "entity-created")
public void consume(String message) { ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, RetryableTopic creates dedicated retry topics automatically (e.g. entity-created-retry-0, entity-created-dlt).&lt;/p&gt;

&lt;p&gt;The key benefit: the offset is committed immediately, so the main partition keeps flowing while the message is retried asynchronously.&lt;/p&gt;

&lt;p&gt;It’s a clean option when you’re allowed to create the extra topics.&lt;/p&gt;

&lt;p&gt;One important nuance: RetryableTopic doesn’t make retries disappear — it moves them to intermediate retry topics that Spring Kafka creates and consumes automatically. The delay is enforced by a separate consumer on those retry topics. If the retry topic consumer also fails, you end up managing a second-level DLT. Elegant, yes — but not zero-complexity.&lt;/p&gt;

&lt;p&gt;Why I didn’t use it: in a restricted enterprise environment (managed Confluent Cloud cluster, governance rules), creating extra retry topics can be &lt;strong&gt;forbidden or slow to approve&lt;/strong&gt;. Topic creation goes through a ticket queue — or a flat “no”. I couldn’t assume I had that lever.&lt;/p&gt;

&lt;p&gt;So I went with the best solution available &lt;em&gt;within my constraints&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Business Question That Made The Architecture Obvious
&lt;/h3&gt;

&lt;p&gt;Before designing anything, I asked one question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“How long does it usually take for the entity to reach its stable state after the event fires?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Answer: &lt;strong&gt;“A few minutes. Not guaranteed, but usually fast.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That changed everything. I didn’t need a sophisticated non-blocking retry topology. I needed a pragmatic delay to cover the common case, plus a safety net for the rest. &lt;strong&gt;I sized the solution for the failure rate we actually observed — not for a theoretical worst case.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Fix I Shipped: Reuse What Was Already There
&lt;/h3&gt;

&lt;p&gt;The project already had a retry mechanism wired through Spring Kafka’s DefaultErrorHandler with exponential backoff — already handling transient failures like network timeouts. I just needed to plug in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// This is essentially the only new business logic I added
if (!isEntityInStableState(fetchedEntity)) {
    throw new EntityNotReadyException(
        "Entity lifecycle not ready: " + event.getEntityId()
    );
}
// → DefaultErrorHandler takes over automatically
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The (simplified for confidentiality) backoff configuration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ExponentialBackOff backOff = new ExponentialBackOff(120_000L, 2.0); // start at 2 min, double each attempt
backOff.setMaxInterval(600_000L); // cap at 10 min per attempt
backOff.setMaxElapsedTime(3_600_000L); // stop retrying after ~1h
DeadLetterPublishingRecoverer recoverer = new DeadLetterPublishingRecoverer(kafkaTemplate);
DefaultErrorHandler errorHandler = new DefaultErrorHandler(recoverer, backOff);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Retry schedule: 2 min → 4 min → 8 min → 10 min → 10 min → … → DLQ&lt;/p&gt;

&lt;p&gt;⚠️ Critical rule: max.poll.interval.ms must be strictly greater than maxInterval (your per-attempt cap), not maxElapsedTime. The thread only blocks for one interval at a time, not for the entire retry window.&lt;br&gt;&lt;br&gt;
Concrete example: with maxInterval = 600,000 ms (10 min), set max.poll.interval.ms = 660,000 (11 min). Setting it to match maxElapsedTime (1h+) would dangerously delay dead consumer detection.&lt;/p&gt;
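
&lt;p&gt;The schedule above can be reproduced with a small sketch (plain Python, mirroring my reading of ExponentialBackOff’s initialInterval / multiplier / maxInterval / maxElapsedTime semantics; the exact boundary behavior in Spring may differ slightly):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def backoff_schedule(initial_ms, multiplier, max_interval_ms, max_elapsed_ms):
    """Collect retry delays until the cumulative delay would pass max_elapsed_ms."""
    delays, interval, elapsed = [], initial_ms, 0.0
    while True:
        if elapsed + interval > max_elapsed_ms:
            break
        delays.append(int(interval))
        elapsed += interval
        interval = min(interval * multiplier, max_interval_ms)  # per-attempt cap
    return delays

# Matches the config: start at 2 min, double, cap at 10 min, stop after ~1h
schedule = backoff_schedule(120_000, 2.0, 600_000, 3_600_000)
print([d // 60_000 for d in schedule])  # delays in minutes: [2, 4, 8, 10, 10, 10, 10]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;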

&lt;p&gt;In production, a &lt;strong&gt;CloudWatch alarm on the DLQ&lt;/strong&gt; ensures the on-call team is notified if an entity never reaches its stable state after 1 hour. Don’t ship a retry mechanism without a safety net you can actually see.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87yr3kke0t2deb7a5gsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87yr3kke0t2deb7a5gsr.png" width="800" height="474"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Only 1/3 of throughput is paused during backoff. The rest flows.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Decision Framework (Cheat Sheet)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvd7029wkq5c2wxld72s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjvd7029wkq5c2wxld72s.png" width="800" height="837"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The cheat sheet: Start simple, reuse existing infrastructure, and only accept partition-blocking if your traffic allows it.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In short: if you can create topics → RetryableTopic. If you’re on a governed broker with rare failures → DefaultErrorHandler + backoff. If failures are frequent at high volume → fix the upstream contract.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Learned: Complexity Is a Choice
&lt;/h3&gt;

&lt;p&gt;The temptation in event-driven systems is to reach for the most powerful tool available. Complexity is a choice — and in this case, I chose not to make it.&lt;/p&gt;

&lt;p&gt;This wasn’t a Kafka feature problem. It was a &lt;strong&gt;definition problem&lt;/strong&gt;: the event signaled &lt;em&gt;data availability&lt;/em&gt;, while my process assumed &lt;em&gt;entity readiness&lt;/em&gt;. That gap is invisible until it isn’t — and when it surfaces, it leaves no trace in your logs.&lt;/p&gt;

&lt;p&gt;Technically, I could have fought for @RetryableTopic. I could have built a custom non-blocking retry topology. I could have asked the upstream team to delay their event trigger. Instead, I aligned with business reality ("usually a few minutes") and chose the simplest architecture that respected Kafka's partition semantics and my enterprise constraints.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The best architecture decision I made that week was asking a business question before opening my IDE.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>microservices</category>
      <category>java</category>
      <category>architecture</category>
    </item>
    <item>
      <title>What I Learned Deploying My First RAG System on AWS Bedrock</title>
      <dc:creator>Moslem Chalfouh</dc:creator>
      <pubDate>Wed, 31 Dec 2025 14:55:43 +0000</pubDate>
      <link>https://dev.to/moslem_chalfouh_967e323f7/what-i-learned-deploying-my-first-rag-system-on-aws-bedrock-19lj</link>
      <guid>https://dev.to/moslem_chalfouh_967e323f7/what-i-learned-deploying-my-first-rag-system-on-aws-bedrock-19lj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y2vckss33m0kitndlt3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y2vckss33m0kitndlt3.png" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For the past two years, I’ve been using Generative AI tools as an enthusiast — experimenting with prompts, testing models, seeing what they could do.&lt;/p&gt;

&lt;p&gt;Over the last 12 months, I shifted my focus to understanding how to use these tools professionally on AWS. Not just “Can I build a chatbot?” but “How do I deploy this securely with proper infrastructure?”&lt;/p&gt;

&lt;p&gt;To validate what I’d been learning about AWS AI workflows, I decided to build a concrete example: a RAG chatbot using &lt;strong&gt;AWS Bedrock Knowledge Bases&lt;/strong&gt;, &lt;strong&gt;Aurora Serverless v2&lt;/strong&gt;, and &lt;strong&gt;Terraform&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article walks through that build process — what worked, what didn’t, and what I had to do manually because Terraform couldn’t handle it yet.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. The “Why”: RAG and Embeddings
&lt;/h3&gt;

&lt;p&gt;The first step was understanding why we need this complexity. Large Language Models (LLMs) like Claude have a fixed knowledge cutoff and don’t know my private data.&lt;/p&gt;

&lt;p&gt;We could paste documents into the prompt, but that hits token limits quickly. The standard solution is RAG:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Ingest:&lt;/strong&gt; Convert documents into vectors (embeddings).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Store:&lt;/strong&gt; Save them in a vector database.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve:&lt;/strong&gt; Find relevant chunks when a user asks a question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate:&lt;/strong&gt; Send those chunks to the LLM to write an answer.&lt;/li&gt;
&lt;/ol&gt;
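
&lt;p&gt;The four steps above can be sketched end-to-end with stub components (illustrative Python; a real system swaps in Titan embeddings, a vector database, and an LLM):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def embed(text):
    # Stub "embedding": bag-of-words counts instead of a 1,536-dim vector
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def similarity(a, b):
    # Shared-word count, a stand-in for cosine similarity
    return sum(min(a.get(w, 0), b.get(w, 0)) for w in a)

def retrieve(store, query, k=2):
    # Step 3: rank stored chunks by similarity to the query
    ranked = sorted(store, key=lambda doc: similarity(embed(doc), embed(query)), reverse=True)
    return ranked[:k]

def generate(chunks, question):
    # Step 4 stub "LLM": just stitches the retrieved context together
    return "Answer based on: " + " | ".join(chunks)

store = ["tesla model y range 455 km", "town founded in 1850", "local birds nest in spring"]
print(generate(retrieve(store, "what is the tesla range"), "what is the tesla range"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;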

&lt;h4&gt;
  
  
  What Are Embeddings?
&lt;/h4&gt;

&lt;p&gt;Embeddings are mathematical representations of text meaning. The Titan Embeddings model converts text into a vector of 1,536 numbers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"Tesla Model Y range: 455 km" → [0.23, −0.45, 0.87, …, 0.12]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key insight: &lt;strong&gt;similar meanings produce similar vectors&lt;/strong&gt;. “Range,” “autonomy,” and “distance per charge” all generate nearby vectors, even though the words are different. This solves the problem of traditional keyword search, which only finds exact matches.&lt;/p&gt;
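
&lt;p&gt;“Nearby” is usually measured with cosine similarity. A minimal sketch (Python; the three-dimensional vectors are invented for illustration, real Titan embeddings have 1,536 dimensions):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

range_vec    = [0.9, 0.1, 0.3]  # "range"
autonomy_vec = [0.8, 0.2, 0.3]  # "autonomy": different word, close vector
weather_vec  = [0.1, 0.9, 0.1]  # "weather": unrelated meaning, distant vector

print(round(cosine(range_vec, autonomy_vec), 2))  # close to 1.0
print(round(cosine(range_vec, weather_vec), 2))   # much lower
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;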

&lt;h4&gt;
  
  
  Why Vector Databases?
&lt;/h4&gt;

&lt;p&gt;Regular databases search for exact matches (WHERE id = 123). With embeddings, you need &lt;strong&gt;similarity search&lt;/strong&gt;: "Find the 5 closest vectors to my query."&lt;/p&gt;

&lt;p&gt;This requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specialized indexes (HNSW) to organize vectors spatially&lt;/li&gt;
&lt;li&gt;Distance calculations (cosine similarity) across thousands of vectors&lt;/li&gt;
&lt;li&gt;Fast retrieval (milliseconds, not seconds)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s what pg_vector adds to Postgres—a vector(1536) column type and similarity search operators. Without it, searching embeddings would be impossibly slow.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. The Stack: Turning Theory Into Practice
&lt;/h3&gt;

&lt;p&gt;Now that we understand the concepts, how do we actually implement this on AWS? I wanted an architecture that handled all the embedding and vector search complexity while staying simple to operate.&lt;/p&gt;

&lt;p&gt;Here’s what I chose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Compute:&lt;/strong&gt; AWS Bedrock (Serverless AI).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Model:&lt;/strong&gt; Titan Embeddings G1 (managed by Bedrock).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Storage:&lt;/strong&gt; Aurora Serverless v2 with pg_vector extension.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document Storage:&lt;/strong&gt; S3 (for source PDFs).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IaC:&lt;/strong&gt; Terraform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UI:&lt;/strong&gt; Streamlit (Python).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this stack?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bedrock Knowledge Base handles the entire RAG workflow automatically — chunking documents, calling the Titan embedding model, and storing vectors in Aurora. I don’t write the embedding logic; I just configure where the vectors go.&lt;/p&gt;

&lt;p&gt;Aurora with pg_vector was chosen over specialized vector databases (like Pinecone or Weaviate) for simplicity. It's Postgres with a vector extension—one SQL command to enable, and I can use standard database tooling I already know.&lt;/p&gt;

&lt;h4&gt;
  
  
  Aurora Limitations to Know
&lt;/h4&gt;

&lt;p&gt;pg_vector works great for this use case, but keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HNSW indexes load into memory. With ~10,000 documents (50k chunks), you’re looking at ~300MB of vector data.&lt;/li&gt;
&lt;li&gt;Query performance may degrade above 100,000 vectors. At that scale, consider OpenSearch Serverless.&lt;/li&gt;
&lt;li&gt;No distributed search — Aurora is single-instance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For knowledge bases under 5,000 documents, Aurora + pg_vector is the simplest choice.&lt;/p&gt;
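
&lt;p&gt;The ~300MB figure is a back-of-envelope estimate you can reproduce (pg_vector stores each dimension as a 4-byte float; index overhead is not counted here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CHUNKS = 50_000        # ~10,000 docs at ~5 chunks each
DIMS = 1_536           # Titan Embeddings G1 output size
BYTES_PER_FLOAT = 4

raw_bytes = CHUNKS * DIMS * BYTES_PER_FLOAT
print(f"{raw_bytes / (1024 ** 2):.0f} MB")  # roughly 293 MB of raw vector data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;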

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa25mbjd89z9c3d494dg8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa25mbjd89z9c3d494dg8.png" width="800" height="449"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The complete workflow: User → Guardrail → Bedrock KB → Aurora → Claude&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5enzso6npb3uf8b6a2h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp5enzso6npb3uf8b6a2h.png" width="800" height="500"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The final result: A Streamlit interface querying proprietary data via Bedrock&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  3. The Terraform Struggle: Circular Dependencies
&lt;/h3&gt;

&lt;p&gt;When I tried to automate the deployment, I hit a logic problem. Bedrock Knowledge Base needs the Aurora Cluster ARN to know where to store data. However, the IAM Role for Bedrock needs permission to access specific Aurora tables.&lt;/p&gt;

&lt;p&gt;Trying to do this in one terraform apply resulted in errors because Terraform couldn't resolve the dependencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; I split the project into two separate stacks.&lt;/p&gt;
&lt;h4&gt;
  
  
  Stack 1: Infrastructure
&lt;/h4&gt;

&lt;p&gt;Deploys the VPC, S3 bucket, and Aurora Cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Aurora cluster
resource "aws_rds_cluster" "aurora_serverless" {
  engine = "aurora-postgresql"
  engine_mode = "provisioned"
  serverlessv2_scaling_configuration {
    min_capacity = 0.5
    max_capacity = 16
  }
}

# S3 bucket for documents
resource "aws_s3_bucket" "documents" {
  bucket = "my-bedrock-documents"
}
output "aurora_cluster_arn" {
  value = aws_rds_cluster.aurora_serverless.arn
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Stack 2: Bedrock Knowledge Base
&lt;/h4&gt;

&lt;p&gt;Reads the outputs from Stack 1 via terraform_remote_state and deploys the Knowledge Base.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;data "terraform_remote_state" "stack1" {
  backend = "s3"
  config = {
    bucket = "my-terraform-state"
    key = "stack1/terraform.tfstate"
  }
}
resource "aws_bedrockagent_knowledge_base" "main" {
  name = "my-bedrock-kb"
  role_arn = aws_iam_role.bedrock_kb_role.arn
  knowledge_base_configuration {
    vector_knowledge_base_configuration {
      embedding_model_arn = "arn:aws:bedrock:us-west-2::foundation-model/amazon.titan-embed-text-v1"
    }
    type = "VECTOR"
  }
  storage_configuration {
    type = "RDS"
    rds_configuration {
      credentials_secret_arn = data.terraform_remote_state.stack1.outputs.aurora_secret_arn
      resource_arn = data.terraform_remote_state.stack1.outputs.aurora_cluster_arn
      database_name = "myapp"
      table_name = "bedrock_integration.bedrock_kb"
      field_mapping {
        vector_field = "embedding"
        text_field = "chunks"
        metadata_field = "metadata"
        primary_key_field = "id"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This separation made the state management much cleaner and avoided the circular dependency hell.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fk7bdpm0kopgiq5eiai.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fk7bdpm0kopgiq5eiai.png" width="800" height="792"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Success: The Knowledge Base deployed via Terraform and ready in the console&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  The IAM Role Bedrock Needs
&lt;/h4&gt;

&lt;p&gt;Getting IAM right was critical. Bedrock needs specific permissions to talk to S3, Secrets Manager, and Aurora.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource "aws_iam_role_policy" "bedrock_kb_policy" {
  role = aws_iam_role.bedrock_kb_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:GetObject", "s3:ListBucket"]
        Resource = [aws_s3_bucket.documents.arn, "${aws_s3_bucket.documents.arn}/*"]
      },
      {
        Effect = "Allow"
        Action = ["bedrock:InvokeModel"]
        Resource = "arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v1"
      },
      {
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        Resource = aws_secretsmanager_secret.aurora_credentials.arn
      },
      {
        Effect = "Allow"
        Action = ["rds-data:ExecuteStatement", "rds-data:BatchExecuteStatement"]
        Resource = aws_rds_cluster.aurora_serverless.arn
      }
    ]
  })
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Missing any of these results in silent failures during sync or query.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. The Manual Parts (and One That Is Becoming Obsolete)
&lt;/h3&gt;

&lt;p&gt;Despite using Terraform, I realized that AWS Bedrock isn’t fully automatable yet. However, the platform is maturing fast.&lt;/p&gt;

&lt;h4&gt;
  
  
  Model Access (The “Ghost” Step)
&lt;/h4&gt;

&lt;p&gt;When I started this project back in October, I hit a wall: AccessDeniedException. I had to manually go into the AWS Console and request access for "Titan Embeddings" and "Claude". It was a one-time toggle that Terraform couldn't handle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv27a2qspbnw07ux9j3l7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv27a2qspbnw07ux9j3l7.png" width="800" height="499"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The Model Access screen (a necessary stop for older accounts)&lt;/em&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Good news for you:&lt;/strong&gt; As of late 2025, AWS has largely removed this requirement. Most serverless models are now enabled by default in supported regions. If you are building this today, you likely won’t need to touch this, but if you get a permission error, check the &lt;strong&gt;Model Access&lt;/strong&gt; page just in case.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h4&gt;
  
  
  Database Schema
&lt;/h4&gt;

&lt;p&gt;This is still a manual friction point. Bedrock expects the table to exist before it can sync, but it won’t create it for you. I had to connect to Aurora and run the SQL setup manually:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zi5upvcoxhv9ejivwbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8zi5upvcoxhv9ejivwbi.png" width="800" height="637"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Manually running the SQL in the Query Editor to create the pg_vector schema&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE EXTENSION IF NOT EXISTS vector;
CREATE SCHEMA IF NOT EXISTS bedrock_integration; -- Bedrock expects this schema to already exist
CREATE TABLE bedrock_integration.bedrock_kb (
  id uuid PRIMARY KEY,
  embedding vector(1536), -- Matches Titan G1 output
  chunks text,
  metadata json
);
CREATE INDEX ON bedrock_integration.bedrock_kb USING hnsw (embedding vector_cosine_ops);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Data Sync
&lt;/h4&gt;

&lt;p&gt;Uploading a file to S3 doesn’t automatically trigger ingestion. You still have to manually click “Sync” in the console or trigger it via the API (start_ingestion_job).&lt;/p&gt;
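
&lt;p&gt;Triggering the sync programmatically goes through the &lt;strong&gt;bedrock-agent&lt;/strong&gt; (control-plane) client, not bedrock-agent-runtime. A sketch with the client injected so it can be stubbed in tests (the IDs are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def sync_knowledge_base(client, kb_id, ds_id):
    """Kick off an ingestion job for one data source and return its status."""
    response = client.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=ds_id)
    return response["ingestionJob"]["status"]

# With the real client:
#   import boto3
#   sync_knowledge_base(boto3.client("bedrock-agent"), "YOUR_KB_ID", "YOUR_DS_ID")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;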

&lt;h3&gt;
  
  
  5. The Python Layer: Keeping It Simple
&lt;/h3&gt;

&lt;p&gt;I’m not a Python developer, so I kept the code minimal and modular. Two files handle everything:&lt;/p&gt;

&lt;h4&gt;
  
  
  bedrock_utils.py (The RAG Logic)
&lt;/h4&gt;

&lt;p&gt;This file contains key functions using two different Bedrock clients.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import boto3

# Two separate clients for different purposes
bedrock_runtime = boto3.client('bedrock-runtime')  
bedrock_agent_runtime = boto3.client('bedrock-agent-runtime')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;bedrock-runtime:&lt;/strong&gt; For invoking foundation models (Claude)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bedrock-agent-runtime:&lt;/strong&gt; For querying the Knowledge Base&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This separation was confusing at first, but it makes sense once you understand that the Knowledge Base is technically an “agent” service, while model invocation is a “runtime” service.&lt;/p&gt;

&lt;h4&gt;
  
  
  app.py (The Streamlit Interface)
&lt;/h4&gt;

&lt;p&gt;The UI is straightforward — 54 lines total. The Streamlit framework handles the chat history and UI rendering automatically.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. The Guardrail Pattern (Saving Cost &amp;amp; Tokens)
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 Most RAG tutorials skip this step. They assume every query is valid. In reality, users ask off-topic questions (“What’s the weather?”), and each one triggers an expensive vector search for nothing.&lt;/p&gt;

&lt;p&gt;During testing, I noticed that every question kicked off the full RAG process, which takes time and costs money. A simple “Hello” shouldn’t trigger a vector search. Adding a ~10-token classification step before RAG saves money and improves the UX.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I added a validation step before RAG using bedrock-runtime directly to classify the intent with a cheaper model (Claude Haiku). The function checks if the user's question falls into predefined categories before triggering the expensive RAG workflow.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Note:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;My categories are specific to my test documents (local biodiversity, town history, Tesla specs). For your use case, replace these with your own domain-specific categories. The key is to keep it simple — 3–4 categories maximum for reliable classification.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fav1bq8hxjs6unnk1gmr0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fav1bq8hxjs6unnk1gmr0.png" width="800" height="503"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The logs showing the guardrail in action: Valid categories trigger RAG, while off-topic inputs are filtered out&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This simple pattern saved tokens and made the application feel more responsive. The guardrail uses only ~10 tokens, while a full RAG query can use 200–500 tokens.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Conclusion
&lt;/h3&gt;

&lt;p&gt;Building this project clarified a few things for me about the AWS AI ecosystem:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure is key:&lt;/strong&gt; The Python code is short, but the IAM roles, Terraform configuration, and network setup took the most time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aurora is enough:&lt;/strong&gt; You don’t necessarily need a specialized vector DB; Postgres works fine for this scale and is easier to maintain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bedrock abstractions work:&lt;/strong&gt; The retrieve_and_generate API effectively hides the complexity of vector search, letting you focus on the application logic.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  What’s Missing for Production
&lt;/h3&gt;

&lt;p&gt;This is a &lt;strong&gt;learning project&lt;/strong&gt;, not a production system. To deploy this in a real enterprise environment, you’d need to add:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Authentication &amp;amp; Authorization:&lt;/strong&gt; No login system, no role-based access control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway + Lambda:&lt;/strong&gt; Replace Streamlit with a proper REST API&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD Pipeline:&lt;/strong&gt; Automated testing and deployment (GitHub Actions, CodePipeline)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Monitoring:&lt;/strong&gt; Budget alerts, usage tracking per user/department&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Logging &amp;amp; Observability:&lt;/strong&gt; CloudWatch dashboards, distributed tracing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security hardening:&lt;/strong&gt; VPC endpoints, encryption at rest, audit trails&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rate limiting:&lt;/strong&gt; Prevent abuse and control Bedrock costs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The goal here was to understand how the pieces fit together, not to build a turnkey solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/cmoslem/terraform-bedrock-rag" rel="noopener noreferrer"&gt;👉 View on GitHub: terraform-bedrock-rag&lt;/a&gt;&lt;/p&gt;




</description>
      <category>aws</category>
      <category>cloudcomputing</category>
      <category>generativeaitools</category>
      <category>terraform</category>
    </item>
    <item>
      <title>ECS, Lambda, or EC2? How Hexagonal Architecture Made the Choice Irrelevant</title>
      <dc:creator>Moslem Chalfouh</dc:creator>
      <pubDate>Mon, 22 Dec 2025 17:28:45 +0000</pubDate>
      <link>https://dev.to/moslem_chalfouh_967e323f7/ecs-lambda-or-ec2-how-hexagonal-architecture-made-the-choice-irrelevant-24bj</link>
      <guid>https://dev.to/moslem_chalfouh_967e323f7/ecs-lambda-or-ec2-how-hexagonal-architecture-made-the-choice-irrelevant-24bj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbl3uvr81y88emv5hafi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpbl3uvr81y88emv5hafi.png" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You start a project locally. Everything runs smoothly, tests are green, and infrastructure feels like a “later” problem. Then “later” arrives — bringing deadlines, security compliance, and platform changes.&lt;/p&gt;

&lt;p&gt;On a recent project, a Kafka-driven Java service went through &lt;strong&gt;three major infrastructure pivots&lt;/strong&gt; before hitting production: containers, serverless, and finally classic EC2. The service was designed to generate business documents and call on-premises APIs.&lt;/p&gt;

&lt;p&gt;The only reason I could pivot the project without a full rewrite was strict adherence to &lt;strong&gt;Hexagonal Architecture&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Here is the story of how that structure absorbed the chaos.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Use Case: Kafka In, Legacy Out
&lt;/h3&gt;

&lt;p&gt;On paper, the functional requirement was deceptively simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Consume&lt;/strong&gt; events from a Kafka topic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Apply&lt;/strong&gt; routing and validation rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; a business document.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Call&lt;/strong&gt; an on-premises API to update downstream processes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Locally, it was just a Spring Boot app with some JSON and a few services.&lt;/p&gt;

&lt;p&gt;I focused purely on the domain model and boundaries, ignoring whether the entry point would eventually be a Lambda handler, a Kafka listener, or a container.&lt;/p&gt;

&lt;h3&gt;
  
  
  Act I — The Container Hype (ECS)
&lt;/h3&gt;

&lt;p&gt;Initially, the plan was to use &lt;strong&gt;Amazon ECS&lt;/strong&gt;. It was the exciting option: containerize the app, push it, and run it in a managed cluster.&lt;/p&gt;

&lt;p&gt;But there was a hidden constraint. While ECS was trendy among delivery teams, it &lt;strong&gt;was not yet an officially approved standard&lt;/strong&gt; for our security and compliance department. This meant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extra validation steps.&lt;/li&gt;
&lt;li&gt;Uncertain timelines.&lt;/li&gt;
&lt;li&gt;A high risk of a “No-Go” decision right before launch.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For a project under strict delivery pressure, betting on a platform still awaiting approval was a gamble we couldn’t afford. I had to pivot.&lt;/p&gt;

&lt;h3&gt;
  
  
  Act II — The Serverless Promise (Lambda + Kafka Connector)
&lt;/h3&gt;

&lt;p&gt;The logical plan B was &lt;strong&gt;Serverless&lt;/strong&gt;. Infrastructure teams wanted to avoid OS patching, and AWS Lambda fit that bill perfectly.&lt;/p&gt;

&lt;p&gt;The architecture seemed elegant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Messages arrive in Kafka (Confluent).&lt;/li&gt;
&lt;li&gt;A Kafka connector pushes them to a Lambda trigger.&lt;/li&gt;
&lt;li&gt;The Lambda processes the event, generates the document, calls the API, and vanishes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Java on Lambda: Debunking Myths
&lt;/h3&gt;

&lt;p&gt;Despite skepticism about running Java on Lambda (cold starts, heavy runtime), I leveraged a modern stack: &lt;strong&gt;Java 21 + Spring Boot 3&lt;/strong&gt;. I used &lt;strong&gt;Virtual Threads&lt;/strong&gt; for I/O-bound efficiency and &lt;strong&gt;SnapStart&lt;/strong&gt; to reduce cold start latency.&lt;/p&gt;
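&lt;p&gt;&lt;em&gt;For reference, both knobs are small configuration changes. A sketch, assuming Spring Boot 3.2+ and an AWS SAM template (the resource name is illustrative):&lt;/em&gt;&lt;/p&gt;

```yaml
# template.yaml (AWS SAM): SnapStart snapshots the initialized JVM,
# so invocations of published versions skip most of the Java cold start.
DocumentProcessorFunction:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    SnapStart:
      ApplyOn: PublishedVersions
```

&lt;p&gt;Virtual threads, in turn, are a single property in &lt;code&gt;application.properties&lt;/code&gt;: &lt;code&gt;spring.threads.virtual.enabled=true&lt;/code&gt; (available since Spring Boot 3.2).&lt;/p&gt;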

&lt;p&gt;Technically, it worked. Locally and in non-prod, the Lambda accepted payloads, mapped them to domain objects, and executed the business logic perfectly.&lt;/p&gt;

&lt;p&gt;Then &lt;strong&gt;organizational reality hit&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Compliance Wall
&lt;/h3&gt;

&lt;p&gt;Just before go-live, a new constraint dropped: the specific Confluent connector required to trigger the Lambda was &lt;strong&gt;not qualified for production&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The consequences were immediate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The push model (Connector → Lambda) was banned.&lt;/li&gt;
&lt;li&gt;The service had to &lt;strong&gt;consume directly from Kafka&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Serverless was effectively dead for this release.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I needed a third option. Fast.&lt;/p&gt;

&lt;h3&gt;
  
  
  Act III — Landing on EC2 (And Why It Didn’t Hurt)
&lt;/h3&gt;

&lt;p&gt;With two options off the table, we turned to the most battle-tested solution available: &lt;strong&gt;a classic EC2 instance&lt;/strong&gt; running a Spring Boot application.&lt;/p&gt;

&lt;p&gt;In a tightly coupled architecture, this would have required major surgery: rewriting entry points, refactoring message parsing, and risking regression in the business logic.&lt;/p&gt;

&lt;p&gt;But for us? The impact was trivial.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Real Hero: Hexagonal Architecture
&lt;/h3&gt;

&lt;p&gt;Because I had structured the service around clear Hexagonal principles, the project layout looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00nirz5ieifmrn45z1y8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F00nirz5ieifmrn45z1y8.png" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;only&lt;/strong&gt; thing that changed across our three AWS pivots was the &lt;em&gt;Driving Adapter&lt;/em&gt; (the left side).&lt;/p&gt;

&lt;h3&gt;
  
  
  The Stable Center
&lt;/h3&gt;

&lt;p&gt;The use case remained untouched:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// application/port/in/ProcessRequestUseCase.java
public interface ProcessRequestUseCase {
    void process(BusinessRequestEvent event);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The domain model never knew if the data came from a Lambda JSON payload or a Kafka ConsumerRecord.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adapter 1 — The Lambda Approach (Abandoned)
&lt;/h3&gt;

&lt;p&gt;When we targeted Lambda, our entry point looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// infrastructure/lambda/LambdaHandler.java
public class LambdaHandler implements RequestHandler&amp;lt;Map&amp;lt;String, Object&amp;gt;, String&amp;gt; {
    private final ProcessRequestUseCase useCase;

    @Override
    public String handleRequest(Map&amp;lt;String, Object&amp;gt; event, Context context) {
        // Adapt JSON -&amp;gt; Domain
        BusinessRequestEvent domainEvent = map(event);
        useCase.process(domainEvent);
        return "OK";
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Adapter 2 — The EC2 Approach (Final Production)
&lt;/h3&gt;

&lt;p&gt;When we switched to EC2, we simply swapped in a Spring Kafka listener:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// infrastructure/kafka/KafkaConsumerListener.java
@Component
public class KafkaConsumerListener {
    private final ProcessRequestUseCase useCase;

    @KafkaListener(topics = "${app.kafka.topic}", groupId = "${app.kafka.group}")
    public void onMessage(ConsumerRecord&amp;lt;String, String&amp;gt; record) {
        // Adapt Kafka Record -&amp;gt; Domain
        BusinessRequestEvent domainEvent = map(record.value());
        useCase.process(domainEvent);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What changed?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infrastructure concerns (annotations, configuration, scaling).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What stayed the same?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The Domain Model.&lt;/li&gt;
&lt;li&gt;The Use Case API.&lt;/li&gt;
&lt;li&gt;The entire business logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This decoupling is precisely what allowed the service to survive three infrastructure decisions without rewriting a single line of business code.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Takeaway
&lt;/h3&gt;

&lt;p&gt;The story isn’t about “Containers vs. Serverless vs. EC2.” Those are implementation details that will inevitably change based on cost, trends, and governance.&lt;/p&gt;

&lt;p&gt;The real lesson is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure is volatile&lt;/strong&gt; — internal standards and compliance rules are moving targets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business logic should be stable&lt;/strong&gt; — it shouldn’t break because you changed a compute platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architecture buys you options&lt;/strong&gt; — the freedom to pivot without panic.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;By keeping the domain pure and the adapters thin, I absorbed an ECS experiment, a Serverless attempt, and an EC2 fallback.&lt;/p&gt;

&lt;p&gt;Infrastructure decisions will keep changing. Good architecture is what lets you sleep at night when they do.&lt;/p&gt;

</description>
      <category>java</category>
      <category>kafka</category>
      <category>architecture</category>
      <category>aws</category>
    </item>
  </channel>
</rss>
