<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Redis</title>
    <description>The latest articles on DEV Community by Redis (@redis).</description>
    <link>https://dev.to/redis</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F1284%2Fe3860d38-3900-46c6-a051-1fb66704157c.png</url>
      <title>DEV Community: Redis</title>
      <link>https://dev.to/redis</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/redis"/>
    <language>en</language>
    <item>
      <title>Building Reliable Agents with the Transactional Outbox Pattern and Redis Streams</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Fri, 27 Mar 2026 15:06:01 +0000</pubDate>
      <link>https://dev.to/redis/building-reliable-agents-with-the-transactional-outbox-pattern-and-redis-streams-45e6</link>
      <guid>https://dev.to/redis/building-reliable-agents-with-the-transactional-outbox-pattern-and-redis-streams-45e6</guid>
      <description>&lt;p&gt;AI agents are pretty good at deciding what should happen next, given a well-defined business workflow. In the case of a customer support agent, for example, they can read a conversation, apply a policy, and return a response like "approve the refund" or "escalate this case." That part is exciting, and it is usually what gets demoed first. But the hard part starts right after the decision is made.&lt;/p&gt;

&lt;p&gt;In a real system, a decision only matters if the rest of the platform can trust it. If an agent decides a customer should get a refund, that decision still has to turn into real work across the rest of the application. The support case needs to be updated, and billing needs to issue the refund. The customer may need an email, and the CRM probably needs the updated status, too.&lt;/p&gt;

&lt;p&gt;If the app updates the case and then crashes before billing gets the event, you now have a case that says "refund approved" and a customer who never actually got refunded. That is the kind of bug that makes a system feel flaky even when the model made the right call. But the worst part is the damage to customer experience. I would be really mad at the company if this happened to me.&lt;/p&gt;

&lt;p&gt;For scenarios like this, the &lt;a href="https://microservices.io/patterns/data/transactional-outbox.html" rel="noopener noreferrer"&gt;Transactional Outbox&lt;/a&gt; pattern exists. Instead of treating "update the case" and "tell the rest of the system" as two separate operations, we commit them together and let the rest of the platform react asynchronously afterward. This pattern became fairly famous in the context of microservices, as they often need a reliable way to hand off tasks. I think the pattern is also useful for agents, because the fundamental problem is the same.&lt;/p&gt;

&lt;p&gt;In this post, I will discuss the Transactional Outbox pattern in the context of agents and provide an opinionated view of why I believe it is a best practice for agentic applications. I will discuss the pattern around the following question: once an agent makes a business decision, how can you ensure the rest of the system can rely on it?&lt;/p&gt;

&lt;h2&gt;
  
  
  The problem is the handoff
&lt;/h2&gt;

&lt;p&gt;Developers often stress about designing systems that can survive unimaginable production incidents. In reality, you don't need a major outage or an exotic distributed-systems failure to witness the worst. Sometimes it is as simple as one service trying to do two related things in two separate steps: first it updates the business state, then it publishes an event for downstream systems.&lt;/p&gt;

&lt;p&gt;That looks harmless until something fails in between. If the state update succeeds and the event publish does not, the source of truth has moved forward, but the rest of the workflow has not.&lt;/p&gt;

&lt;p&gt;Here is that failure in one picture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiixo880tbxc2humoyx6u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiixo880tbxc2humoyx6u.png" alt="Diagram 1" width="800" height="368"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;What makes this annoying is that nothing looks obviously broken at first. During an incident investigation, if someone checks the case record, it looks correct. The problem only shows up later when billing never acts, the customer complains, or support has to manually reconcile what happened. That is why I think this is not an AI problem. It is a handoff problem.&lt;/p&gt;
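&lt;p&gt;To make that failure window concrete, here is a small, self-contained Java sketch. Everything in it is simulated (the in-memory collections stand in for a database and a broker; none of these names come from a real system), but it shows how step one can land while step two never happens:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;

// Simulates the fragile "save state, then publish" sequence.
// The in-memory collections stand in for the database and the broker.
public final class DualWriteDemo {
    static final HashMap caseStore = new HashMap();    // business state
    static final ArrayList eventLog = new ArrayList(); // downstream events

    static void approveRefund(String caseId, boolean crashBeforePublish) {
        caseStore.put(caseId, "refund_approved");  // step 1: state update succeeds
        if (crashBeforePublish) {
            throw new IllegalStateException("process crashed");
        }
        eventLog.add("RefundApproved:" + caseId);  // step 2: never runs
    }

    public static void main(String[] args) {
        try {
            approveRefund("case-123", true);
        } catch (IllegalStateException e) {
            // the "crash": the process dies between the two writes
        }
        // The case now says "refund_approved", but no event was ever recorded.
        System.out.println("state=" + caseStore.get("case-123") + " events=" + eventLog.size());
    }
}
```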

&lt;h2&gt;
  
  
  Motivation for the Transactional Outbox pattern
&lt;/h2&gt;

&lt;p&gt;The Transactional Outbox pattern exists because "save state, then publish the event" is fragile by design. The pattern gives you a cleaner contract: when business state changes, the application also writes an outbox event &lt;strong&gt;in the same atomic operation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That one change removes the worst failure mode. You no longer end up in the state where the case changed, but the event silently disappeared. It also keeps the request path honest. The service does not have to directly coordinate billing, notifications, CRM sync, and everything else just to be correct.&lt;/p&gt;

&lt;p&gt;Instead, the request path only needs to guarantee one thing: the decision and the outbox event are committed together. Once that happens, everything else becomes recoverable instead of fragile.&lt;/p&gt;
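&lt;p&gt;In Redis terms, that commit boundary is just a transaction. As an illustration only (the key names and fields here are invented for this example), the whole handoff can be a single &lt;code&gt;MULTI&lt;/code&gt;/&lt;code&gt;EXEC&lt;/code&gt; block in &lt;code&gt;redis-cli&lt;/code&gt;:&lt;/p&gt;

```text
MULTI
HSET support:{tenant-acme}:case:case-123 status refund_approved
XADD support:{tenant-acme}:outbox * event_type RefundApproved case_id case-123
EXEC
```

&lt;p&gt;Both writes either happen together or not at all, which is exactly the contract the request path needs to guarantee.&lt;/p&gt;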

&lt;p&gt;That is why this pattern fits agentic systems so well. Agents make decisions that trigger follow-up work, but those decisions need a durable "this happened" moment before the rest of the system can safely react.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why is "Just Retry the Publish" not enough?
&lt;/h2&gt;

&lt;p&gt;Whenever I bring up the Transactional Outbox pattern with developers, I often hear, "Why not just retry if the publish fails?" I think that happens because implementing the pattern correctly requires certain design decisions and technology choices, so the instinct is to look for a simpler alternative.&lt;/p&gt;

&lt;p&gt;For this reason, I like to stress the following: retries are reasonable until you look closely at where the failure occurs. Retries only help if the application still knows it has something to retry. If the process crashes after the state update but before the event is durably recorded anywhere, there is nothing left to retry.&lt;/p&gt;

&lt;p&gt;That is the key difference between retries and an outbox. Retries help you deliver an event that already exists, while the outbox ensures the event exists in the first place. Once you look at it that way, the pattern feels less like a ceremony and more like basic design principles. If the business state changes, the system needs a durable record of the event that describes that change.&lt;/p&gt;
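&lt;p&gt;The distinction is easier to see in code. The following Java sketch uses in-memory stand-ins (nothing here is a real client API) to show why retries work once the event durably exists: the relay only removes an event from the outbox after a successful publish, so a transient failure just means another pass:&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.ArrayList;

// Sketch of an outbox relay: events already exist durably (the queue),
// so a failed publish can simply be retried later. Names are illustrative.
public final class OutboxRelay {
    final ArrayDeque outbox = new ArrayDeque();   // stand-in for the durable outbox
    final ArrayList delivered = new ArrayList();  // stand-in for the broker
    int failuresLeft;                             // simulates transient publish failures

    OutboxRelay(int failuresLeft) { this.failuresLeft = failuresLeft; }

    void publish(Object event) {
        if (failuresLeft > 0) {
            failuresLeft = failuresLeft - 1;
            throw new IllegalStateException("broker unavailable");
        }
        delivered.add(event);
    }

    // Drain loop: an event is only removed after a successful publish.
    void drain() {
        while (!outbox.isEmpty()) {
            Object event = outbox.peek();
            try {
                publish(event);
                outbox.poll(); // remove only once delivery succeeded
            } catch (IllegalStateException e) {
                // leave the event in place; it will be retried on the next pass
            }
        }
    }
}
```

&lt;p&gt;If the event had never been written to the outbox in the first place, there would be nothing for this loop to retry. That is the whole argument.&lt;/p&gt;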

&lt;h2&gt;
  
  
  Redis Streams is great for this pattern
&lt;/h2&gt;

&lt;p&gt;Redis Streams are a good fit for this kind of outbox because they already behave like the commit log we want. You can append events to them, consume them in order, track what is pending, and let different consumer groups process the same stream independently. That matters because the outbox is not really a queue in the narrow sense. It is a commit log for business events.&lt;/p&gt;

&lt;p&gt;Admittedly, the Transactional Outbox pattern is often implemented with Apache Kafka and change-data-capture tools such as Debezium, and that is where it became best known. I have helped many developers implement the pattern with Kafka, and it works. But having spent years with Kafka, I can also say that the implementation effort sometimes exceeds the problem it was meant to solve: you end up spending more time dealing with Kafka's inherent complexity than with the actual problem.&lt;/p&gt;

&lt;p&gt;Redis Streams, on the other hand, makes that pretty natural. A single event can be appended once and then processed independently by several downstream concerns. The other reason Streams fit well is that they sit comfortably inside Redis. If your support case state also lives in Redis, the state change and the outbox append can share one commit boundary.&lt;/p&gt;

&lt;p&gt;That part is important. The pattern is strongest when the business state and the outbox live in the same datastore, because that gives you a single atomic write instead of a dual-write problem wearing different clothes. With Kafka, you would need to handle two different distributed systems: the commit log itself and the data store where the update must occur.&lt;/p&gt;

&lt;h2&gt;
  
  
  Diving deep into the architecture
&lt;/h2&gt;

&lt;p&gt;For this example, the support case state and the outbox both live in Redis. The current case state is stored in a hash, and the outbox is stored in a Redis Stream.&lt;/p&gt;

&lt;p&gt;A case key might look like &lt;code&gt;support:{tenant-acme}:case:case-123&lt;/code&gt;, while the outbox stream might be &lt;code&gt;support:{tenant-acme}:outbox&lt;/code&gt;. The hash tags here matter because you must be intentional about where the data is stored in Redis. During development, you may work against a single server, which is effectively a single shard, so the data naturally lives in the same place. In production, however, you may run a clustered Redis environment with multiple shards.&lt;/p&gt;

&lt;p&gt;The shared hash tag keeps both keys in the same slot in clustered Redis, &lt;strong&gt;which is what lets them participate in the same transaction&lt;/strong&gt;. That gives us a clean split of responsibilities: the case record tells us what is true now, and the outbox stream tells us what happened and what the rest of the platform still needs to process. A careless choice of key prefixes can quietly break this guarantee, so it deserves deliberate thought.&lt;/p&gt;
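&lt;p&gt;The routing rule itself is simple enough to sketch. Redis Cluster hashes only the substring between the first &lt;code&gt;{&lt;/code&gt; and the next &lt;code&gt;}&lt;/code&gt; (when that substring is non-empty) to pick a slot; this illustrative helper shows why the case key and the outbox key land together:&lt;/p&gt;

```java
// Redis Cluster hashes only the substring between the first "{" and the
// next "}" (when non-empty) to pick a slot. Two keys that share that
// hash tag are guaranteed to land on the same shard.
public final class HashTags {
    static String routingPart(String key) {
        int open = key.indexOf('{');
        if (open == -1) {
            return key;                    // no hash tag: the whole key is hashed
        }
        int close = key.indexOf('}', open + 1);
        if (close == -1) {
            return key;                    // no closing brace: the whole key is hashed
        }
        if (close == open + 1) {
            return key;                    // empty tag "{}": the whole key is hashed
        }
        return key.substring(open + 1, close);
    }
}
```

&lt;p&gt;Both &lt;code&gt;support:{tenant-acme}:case:case-123&lt;/code&gt; and &lt;code&gt;support:{tenant-acme}:outbox&lt;/code&gt; reduce to &lt;code&gt;tenant-acme&lt;/code&gt;, so they map to the same slot.&lt;/p&gt;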

&lt;p&gt;From there, downstream concerns consume the stream through their own consumer groups. Billing can issue the refund, notifications can contact the customer, and CRM sync can update external systems, all without forcing the support service to orchestrate them directly in the request path.&lt;/p&gt;
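&lt;p&gt;For a flavor of what that looks like on the wire, here is an illustrative &lt;code&gt;redis-cli&lt;/code&gt; session for the billing consumer group (the entry ID in the &lt;code&gt;XACK&lt;/code&gt; call is made up):&lt;/p&gt;

```text
XGROUP CREATE support:{tenant-acme}:outbox billing-cg 0-0
XREADGROUP GROUP billing-cg worker-1 COUNT 10 BLOCK 5000 STREAMS support:{tenant-acme}:outbox >
XACK support:{tenant-acme}:outbox billing-cg 1526569495631-0
```

&lt;p&gt;Notifications and CRM sync would create their own groups on the same stream and consume the same events independently, each tracking its own pending entries.&lt;/p&gt;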

&lt;p&gt;That flow looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97a07t8o3e8h4p53zy4s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F97a07t8o3e8h4p53zy4s.png" alt="Diagram 2" width="800" height="515"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The thing I like most about the Transactional Outbox pattern is that it keeps responsibilities clear. The support service is responsible for making the decision durable, and the rest of the platform is responsible for responding to it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Trade-offs worth considering
&lt;/h2&gt;

&lt;p&gt;The basic implementation of the pattern is simple. The design choices around it are where things get interesting. One of the first questions you must ask is where your source of truth lives. If the support case is also in Redis, the case update and the outbox append can share one transaction. If the case lives somewhere else and Redis only holds the stream, you are back in dual-write territory.&lt;/p&gt;

&lt;p&gt;Another big choice is partitioning. It is tempting to imagine a single global outbox stream for the whole application, but that often becomes awkward in a clustered Redis setup. A per-tenant stream is often a better balance. It keeps related events together, provides useful ordering, and avoids making every transactional write depend on a single global key. It also makes querying and data retrieval a bit easier during investigation scenarios.&lt;/p&gt;

&lt;p&gt;Consumer isolation is another trade-off worth saying out loud. One consumer group per downstream concern is a very nice model operationally, because billing, notifications, and CRM sync can all move at their own pace. The flip side is that you now own several background workflows, each with its own lag, retries, health, and recovery behavior to think about. This is where the world of microservices crosses paths again with agentic systems. An agent is not just code and resources; it also brings operational complexity that someone must own.&lt;/p&gt;

&lt;p&gt;Retention matters too. An outbox is a log, and logs grow. If you trim too aggressively, you lose the replay window and the investigation history. If you never trim at all, the stream just keeps growing and eventually becomes an operational problem in its own right. Deciding how large the stream is allowed to grow must be a discussion that takes place before the app even goes to production. Not an afterthought.&lt;/p&gt;
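&lt;p&gt;Redis gives you a few levers here. For example, the stream can be capped approximately at write time or trimmed explicitly; the length below is just a placeholder, so pick one that matches the replay window you need:&lt;/p&gt;

```text
XADD support:{tenant-acme}:outbox MAXLEN ~ 1000000 * event_type RefundApproved case_id case-123
XTRIM support:{tenant-acme}:outbox MAXLEN ~ 1000000
```

&lt;p&gt;The &lt;code&gt;~&lt;/code&gt; asks for approximate trimming, which is cheaper than enforcing an exact length on every write.&lt;/p&gt;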

&lt;p&gt;Durability is another place where the architecture gets real fast. If the outbox carries important business decisions like refunds, escalations, or account changes, Redis is no longer "just a cache" in this design. It is part of the system's correctness model. You must treat Redis as a single source of truth, and as such, think carefully about how to handle details like replication, failover, and geographic disasters.&lt;/p&gt;

&lt;p&gt;Finally, there is idempotency. The outbox makes the handoff reliable, but it does not magically give downstream effects exactly-once behavior in the business sense. If a worker crashes after reading but before acknowledging, another worker may retry the same event later, so the side effect needs to be safe to run more than once. The usual instinct is to write the worker as a function that hooks into the stream, pulls the latest records, and processes them as if the data were simply mutable. It is not: stream entries must be treated as immutable records of what happened, and deduplicated by event ID when they are redelivered.&lt;/p&gt;
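&lt;p&gt;One common way to get that safety is a durable set of processed event IDs checked before the side effect runs. This Java sketch simulates the idea with in-memory stand-ins (the names and storage are illustrative, not a real billing API):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.HashSet;

// Sketch of an idempotent worker: a durable set of processed event IDs
// makes the side effect safe to run more than once. Names are illustrative.
public final class IdempotentWorker {
    final HashSet processedEventIds = new HashSet(); // stand-in for durable dedup storage
    final ArrayList refundsIssued = new ArrayList(); // stand-in for the real side effect

    void handle(String eventId, String refundId) {
        if (!processedEventIds.add(eventId)) {
            return; // already processed: this is a redelivery, not a new refund
        }
        refundsIssued.add(refundId); // the side effect runs once per event ID
    }
}
```

&lt;p&gt;A redelivered event hits the dedup check and becomes a no-op, so the customer is refunded once no matter how many times the stream entry is read.&lt;/p&gt;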

&lt;h2&gt;
  
  
  Okay, let's see some code
&lt;/h2&gt;

&lt;p&gt;This post is not meant to be a complete implementation reference, but I know that, as a developer, looking at code helps make the concepts concrete. I will keep the example light on details so the design principles stand out; I'm sure your coding agent can help with your actual final code. I will use Java because it comes naturally to me, but feel free to ask your coding agent to translate it into another language.&lt;/p&gt;

&lt;p&gt;Let's start by looking at a runtime helper class that instantiates Jedis, a Redis client for Java:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.RedisClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.UnifiedJedis&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RuntimeSupport&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;UnifiedJedis&lt;/span&gt; &lt;span class="nf"&gt;createJedisFromEnv&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;redisHost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getOrDefault&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"REDIS_HOST"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"localhost"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;redisPort&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Integer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;parseInt&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getenv&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;getOrDefault&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"REDIS_PORT"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"6379"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;RedisClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hostAndPort&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisHost&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;redisPort&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, let's look at how key and consumer group names are handled in a small constants class instead of scattering strings through the code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SupportConstants&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;STREAM_GROUP_START_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0-0"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;BILLING_GROUP_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"billing-cg"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;NOTIFICATIONS_GROUP_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"notifications-cg"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="no"&gt;CRM_SYNC_GROUP_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"crm-sync-cg"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the Redis keys themselves, a small helper record keeps the slotting decision obvious:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt; &lt;span class="nf"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;"}"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"support:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;":case:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"support:"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;hashTag&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;":outbox"&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The core write path is where the architectural idea becomes real. When the support service accepts the agent's decision, it updates the case state and appends a &lt;code&gt;RefundApproved&lt;/code&gt; event in a single Redis transaction.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.AbstractTransaction&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.Response&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.UnifiedJedis&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.time.Instant&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.LinkedHashMap&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Map&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Objects&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.UUID&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RefundApprovalService&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;UnifiedJedis&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;RefundApprovalService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;UnifiedJedis&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jedis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requireNonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"jedis must not be null"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;RefundCommitted&lt;/span&gt; &lt;span class="nf"&gt;approveRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RefundDecision&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SupportKeys&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

        &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;caseFields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LinkedHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"case_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"customer_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"refund_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"status"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"refund_approved"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"decision_source"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"support-agent"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"updated_at"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decidedAt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

        &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;outboxFields&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;LinkedHashMap&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_type"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"RefundApproved"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"case_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"customer_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"refund_id"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"decision_source"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"support-agent"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;put&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"occurred_at"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;decidedAt&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;AbstractTransaction&lt;/span&gt; &lt;span class="n"&gt;redisTx&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;multi&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;hset&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseKey&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;caseFields&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;streamEntryId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                    &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xadd&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt; &lt;span class="nc"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;NEW_ENTRY&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outboxFields&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

            &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;execResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redisTx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;exec&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;execResults&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;IllegalStateException&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Refund approval transaction aborted"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;

            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RefundCommitted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;decision&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                    &lt;span class="n"&gt;streamEntryId&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;RefundDecision&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;Instant&lt;/span&gt; &lt;span class="n"&gt;decidedAt&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;RefundDecision&lt;/span&gt; &lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;refundId&lt;/span&gt;
        &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;RefundDecision&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="no"&gt;UUID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;randomUUID&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                    &lt;span class="nc"&gt;Instant&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;now&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;RefundCommitted&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;caseId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;eventId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;streamEntryId&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This one method is the whole architectural point made concrete. If the transaction does not complete, neither the case update nor the outbox event exists. If it does complete, both exist. That is the durability boundary that the rest of the workflow can rely on.&lt;/p&gt;

&lt;p&gt;Here is the same moment as a diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mhvhxgfdv7dwlfa79ht.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8mhvhxgfdv7dwlfa79ht.png" alt="Diagram 3" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;
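&lt;p&gt;That durability boundary covers the write side only. Delivery out of the stream is at-least-once: a worker can be handed the same entry again if it crashes between doing the work and acknowledging it. This is one reason the producer stamps every outbox event with an &lt;code&gt;event_id&lt;/code&gt;; consumers can deduplicate on it. Below is a minimal, hypothetical in-memory sketch of such a guard (the class name and method are illustrative, and a real deployment would persist the seen ids, for example in a Redis set, so deduplication survives restarts):&lt;/p&gt;

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical dedup guard for the billing side. Because stream delivery
// is at-least-once, the same RefundApproved event may be redelivered if a
// worker crashes after issueRefund but before xack. Tracking the event_id
// that the producer wrote into the outbox lets billing refund only once.
public final class ProcessedEvents {
    private final Set seen = new HashSet(); // holds event-id strings

    // Set.add returns false when the id was already recorded, so the
    // caller should skip the refund on a repeated delivery.
    public synchronized boolean firstDelivery(String eventId) {
        return seen.add(eventId);
    }
}
```

&lt;p&gt;A consumer would call &lt;code&gt;firstDelivery&lt;/code&gt; with the &lt;code&gt;event_id&lt;/code&gt; field before invoking the billing gateway, and still acknowledge the entry either way.&lt;/p&gt;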

&lt;p&gt;On the consumer side, the worker will act on the message written to the stream.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.UnifiedJedis&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.exceptions.JedisDataException&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.params.XReadGroupParams&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;redis.clients.jedis.resps.StreamEntry&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.ArrayList&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Collections&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.List&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Map&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;java.util.Objects&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;static&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;clients&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;XREADGROUP_UNDELIVERED_ENTRY&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;BillingConsumer&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;StreamEntryID&lt;/span&gt; &lt;span class="no"&gt;PENDING_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
            &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;STREAM_GROUP_START_ID&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;StreamEntryID&lt;/span&gt; &lt;span class="no"&gt;NEW_ENTRY_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="no"&gt;XREADGROUP_UNDELIVERED_ENTRY&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;UnifiedJedis&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;BillingGateway&lt;/span&gt; &lt;span class="n"&gt;billingGateway&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;consumerName&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;BillingConsumer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;UnifiedJedis&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;BillingGateway&lt;/span&gt; &lt;span class="n"&gt;billingGateway&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
            &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;consumerName&lt;/span&gt;
    &lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;jedis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requireNonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"jedis must not be null"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;billingGateway&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requireNonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;billingGateway&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"billingGateway must not be null"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consumerName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Objects&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;requireNonNull&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumerName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"consumerName must not be null"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SupportKeys&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;forCase&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tenantId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"unused"&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="n"&gt;createConsumerGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;isInterrupted&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;pendingEntries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;PENDING_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;pendingEntries&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pendingEntries&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;

            &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;newEntries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;readGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="no"&gt;NEW_ENTRY_ID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;newEntries&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;newEntries&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sleep&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;200L&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;createConsumerGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xgroupCreate&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BILLING_GROUP_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;STREAM_GROUP_START_ID&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
                    &lt;span class="kc"&gt;true&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;JedisDataException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getMessage&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;contains&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"BUSYGROUP"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;readGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;StreamEntryID&lt;/span&gt; &lt;span class="n"&gt;streamEntryID&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;XReadGroupParams&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XReadGroupParams&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xReadGroupParams&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;rawEntries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xreadGroup&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BILLING_GROUP_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;consumerName&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;streamEntryID&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;parseEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rawEntries&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;processEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt; &lt;span class="n"&gt;entry&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(!&lt;/span&gt;&lt;span class="s"&gt;"RefundApproved"&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;equals&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_type"&lt;/span&gt;&lt;span class="o"&gt;)))&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xack&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BILLING_GROUP_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                        &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;billingGateway&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;issueRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"refund_id"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"customer_id"&lt;/span&gt;&lt;span class="o"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"event_id"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;

            &lt;span class="n"&gt;jedis&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;xack&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;outboxKey&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="nc"&gt;SupportConstants&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;BILLING_GROUP_NAME&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
                    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;StreamEntryID&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;);&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parseEntries&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;rawEntries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rawEntries&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="n"&gt;rawEntries&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;isEmpty&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Collections&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;emptyList&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ArrayList&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;gt;();&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Entry&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StreamEntry&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;streamData&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rawEntries&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;StreamEntry&lt;/span&gt; &lt;span class="n"&gt;streamEntry&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;streamData&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getValue&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;add&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;streamEntry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getID&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;toString&lt;/span&gt;&lt;span class="o"&gt;(),&lt;/span&gt;
                        &lt;span class="n"&gt;streamEntry&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getFields&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;));&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;entries&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="nf"&gt;StreamMessage&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Map&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;interface&lt;/span&gt; &lt;span class="nc"&gt;BillingGateway&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;issueRefund&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;refundId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;customerId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;idempotencyKey&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;What I like most about the Transactional Outbox pattern is that it respects the actual shape of agentic systems. Agents are good at deciding what should happen next in a workflow, but the platform is still responsible for turning that decision into durable state and letting the rest of the workflow react safely. The pattern gives you a clean handoff for that.&lt;/p&gt;

&lt;p&gt;Redis Streams makes the pattern practical when your application state and the outbox both live in Redis. That doesn't make the design free of trade-offs: you still need to think about partitioning, retention, durability, lag, and idempotency. What it does eliminate is the dual-write problem. It gives you a system where an agent's decision becomes a durable fact before the rest of the platform starts depending on it.&lt;/p&gt;
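&lt;p&gt;To make the consumer side of that handoff concrete, here is what reading the outbox through a consumer group looks like from &lt;code&gt;redis-cli&lt;/code&gt;. The stream name &lt;code&gt;outbox&lt;/code&gt;, the group name &lt;code&gt;workers&lt;/code&gt;, and the entry ID below are illustrative, not taken from the article's code:&lt;/p&gt;

```shell
# Create a consumer group on the outbox stream (MKSTREAM creates the
# stream if it does not exist yet)
redis-cli XGROUP CREATE outbox workers '$' MKSTREAM

# Read up to 10 new entries as consumer "worker-1"; entries that are
# never acknowledged stay pending and get redelivered, which is exactly
# why the handler must be idempotent
redis-cli XREADGROUP GROUP workers worker-1 COUNT 10 BLOCK 2000 STREAMS outbox '>'

# Acknowledge an entry only after its side effects are durably applied
redis-cli XACK outbox workers 1726000000000-0
```

&lt;p&gt;These commands assume a reachable Redis server; in an application you would issue the same calls through your client library of choice.&lt;/p&gt;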

&lt;p&gt;Applying the Transactional Outbox pattern to your agents can be the difference between an agent that looks clever in a demo and a system you can actually trust.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>redis</category>
      <category>ai</category>
      <category>eventdriven</category>
    </item>
    <item>
      <title>Syncing Data from Amazon DynamoDB to Redis with Apache SeaTunnel</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Fri, 09 Jan 2026 17:28:17 +0000</pubDate>
      <link>https://dev.to/redis/syncing-data-from-amazon-dynamodb-to-redis-with-apache-seatunnel-4njn</link>
      <guid>https://dev.to/redis/syncing-data-from-amazon-dynamodb-to-redis-with-apache-seatunnel-4njn</guid>
      <description>&lt;p&gt;Let's say you have customer data stored in Amazon DynamoDB, which serves as the single source of truth for your application. However, now you need that same data in Redis for lightning-fast caching, real-time analytics, or perhaps to power a recommendation engine with vector search capabilities. As an example, consider this table named &lt;code&gt;mySourceTable&lt;/code&gt; stored at DynamoDB, which currently contains two items.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr58u1hxxy47v87yj9emi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr58u1hxxy47v87yj9emi.png" alt="DynamoDB Source Table" width="800" height="449"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The challenge? How do you keep them in sync without writing a massive pile of code?&lt;/p&gt;

&lt;p&gt;Personally, my go-to solution is &lt;a href="https://redis.io/data-integration/" rel="noopener noreferrer"&gt;Redis Data Integration (RDI)&lt;/a&gt;, as I have demonstrated and explained why in &lt;a href="https://dev.to/redis/from-postgresql-to-redis-accelerating-your-applications-with-redis-data-integration-3dn2"&gt;this other post&lt;/a&gt;. Unfortunately, at the time of writing, RDI does not support Amazon DynamoDB as a source. That leaves developers with the option to write complex data integration pipelines themselves or to invest in costly (and often overpriced) ETL solutions. Neither option is pleasant.&lt;/p&gt;

&lt;p&gt;But there is another way.&lt;/p&gt;

&lt;p&gt;Enter &lt;a href="https://seatunnel.apache.org/" rel="noopener noreferrer"&gt;Apache SeaTunnel&lt;/a&gt;, an open source data integration project that makes this sync as easy as writing a config file. In this blog post, I will dig into the project and explain, step by step, how to synchronize data from Amazon DynamoDB to Redis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Apache SeaTunnel: Your Open Source Data Pipeline Buddy
&lt;/h2&gt;

&lt;p&gt;Apache SeaTunnel is a multimodal, high-performance, distributed data integration tool built for massive datasets, supporting both batch and streaming modes. Think of it as the universal translator for your data systems: it speaks DynamoDB, Redis, and over 100 other data systems fluently.&lt;/p&gt;

&lt;p&gt;What makes SeaTunnel special for this use case:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero Code&lt;/strong&gt;: Define your pipeline in simple configuration files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Distributed&lt;/strong&gt;: Scale syncing horizontally when your data grows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fault Tolerant&lt;/strong&gt;: Built-in checkpointing ensures no data loss&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can run SeaTunnel in two distinct configurations: standalone mode for one-off data movement, or cluster mode, which keeps a data pipeline infrastructure up and running, always ready to execute new jobs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 1: The Quick Hit (Standalone Mode)
&lt;/h3&gt;

&lt;p&gt;This option is perfect for one-time migrations, local development, or scheduled batch jobs. One command, and you're done.&lt;/p&gt;

&lt;p&gt;Your first step is to create your pipeline configuration (&lt;code&gt;jobs/my.config&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hocon"&gt;&lt;code&gt;&lt;span class="nl"&gt;env&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;parallelism&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;job.mode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BATCH"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nl"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;Amazondynamodb&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="k"&gt;url&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"http://dynamodb.us-east-1.amazonaws.com"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;region&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;access_key_id&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_ACCESS_KEY"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;secret_access_key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"YOUR_SECRET_KEY"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"mySourceTable"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;schema&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;fields&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;customerId&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;int&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;customerName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;address&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;string&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;  
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nl"&gt;sink&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;Redis&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;host&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;host.docker.internal&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="c1"&gt;# or your Redis host&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;port&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;6379&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;support_custom_key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"customer:{customerId}"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;data_type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;hash&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
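&lt;p&gt;Two sink options worth knowing about if your Redis deployment is not wide open: &lt;code&gt;auth&lt;/code&gt; for password authentication and &lt;code&gt;expire&lt;/code&gt; for giving synced keys a TTL. Both come from the Redis sink connector's option list, but verify them against the docs for your SeaTunnel version; the values below are placeholders:&lt;/p&gt;

```hocon
sink {
  Redis {
    host = host.docker.internal
    port = 6379
    auth = "YOUR_REDIS_PASSWORD"  # only needed when Redis requires a password
    support_custom_key = true
    key = "customer:{customerId}"
    data_type = hash
    expire = 86400                # optional TTL (seconds) for each synced key
  }
}
```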



&lt;p&gt;Run it with a single Docker command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;pwd&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;/jobs:/config &lt;span class="se"&gt;\&lt;/span&gt;
  apache/seatunnel:2.3.12 &lt;span class="se"&gt;\&lt;/span&gt;
  ./bin/seatunnel.sh &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="nb"&gt;local&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; /config/my.config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command starts a new container with a standalone instance of SeaTunnel, which executes the job described in the file &lt;code&gt;my.config&lt;/code&gt;. You should see an output like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;2026-01-09 16:57:50,786 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading configuration &lt;span class="s1"&gt;'/opt/seatunnel/config/seatunnel.yaml'&lt;/span&gt; from System property &lt;span class="s1"&gt;'seatunnel.config'&lt;/span&gt;
2026-01-09 16:57:50,788 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Using configuration file at /opt/seatunnel/config/seatunnel.yaml
2026-01-09 16:57:50,789 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.c.c.SeaTunnelConfig   &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - seatunnel.home is /opt/seatunnel
2026-01-09 16:57:50,830 INFO  &lt;span class="o"&gt;[&lt;/span&gt;amlSeaTunnelDomConfigProcessor] &lt;span class="o"&gt;[&lt;/span&gt;main] - Dynamic slot is enabled, the schedule strategy is &lt;span class="nb"&gt;set &lt;/span&gt;to REJECT
2026-01-09 16:57:50,830 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading configuration &lt;span class="s1"&gt;'/opt/seatunnel/config/hazelcast.yaml'&lt;/span&gt; from System property &lt;span class="s1"&gt;'hazelcast.config'&lt;/span&gt;
2026-01-09 16:57:50,830 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Using configuration file at /opt/seatunnel/config/hazelcast.yaml
2026-01-09 16:57:50,962 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading configuration &lt;span class="s1"&gt;'/opt/seatunnel/config/hazelcast-client.yaml'&lt;/span&gt; from System property &lt;span class="s1"&gt;'hazelcast.client.config'&lt;/span&gt;
2026-01-09 16:57:50,962 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Using configuration file at /opt/seatunnel/config/hazelcast-client.yaml
2026-01-09 16:57:50,985 WARN  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.AddressPicker           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;LOCAL] &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] You configured your member address as host name. Please be aware of that your dns can be spoofed. Make sure that your dns configurations are correct.
2026-01-09 16:57:50,985 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.AddressPicker           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;LOCAL] &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Resolving domain name &lt;span class="s1"&gt;'localhost'&lt;/span&gt; to address&lt;span class="o"&gt;(&lt;/span&gt;es&lt;span class="o"&gt;)&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;127.0.0.1, 0:0:0:0:0:0:0:1]
2026-01-09 16:57:50,986 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.AddressPicker           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;LOCAL] &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Interfaces is disabled, trying to pick one address from TCP-IP config addresses: &lt;span class="o"&gt;[&lt;/span&gt;localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1]
SLF4J: Failed to load class &lt;span class="s2"&gt;"org.slf4j.impl.StaticLoggerBinder"&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
SLF4J: Defaulting to no-operation &lt;span class="o"&gt;(&lt;/span&gt;NOP&lt;span class="o"&gt;)&lt;/span&gt; logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder &lt;span class="k"&gt;for &lt;/span&gt;further details.
2026-01-09 16:57:51,189 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.SeaTunnelServer     &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - SeaTunnel server start...
2026-01-09 16:57:51,191 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Based on Hazelcast IMDG version: 5.1.0 &lt;span class="o"&gt;(&lt;/span&gt;20220228 - 21f20e7&lt;span class="o"&gt;)&lt;/span&gt;
2026-01-09 16:57:51,191 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Cluster name: seatunnel-420804
2026-01-09 16:57:51,191 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] 

 _____               _____                             _ 
/  ___|             |_   _|                           | |
&lt;span class="se"&gt;\ &lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;   ___   __ _   | |   _   _  _ __   _ __    ___ | |
 &lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nt"&gt;--&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\ &lt;/span&gt;/ _ &lt;span class="se"&gt;\ &lt;/span&gt;/ _&lt;span class="sb"&gt;`&lt;/span&gt; |  | |  | | | &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="s1"&gt;'_ \ | '&lt;/span&gt;_ &lt;span class="se"&gt;\ &lt;/span&gt; / _ &lt;span class="se"&gt;\|&lt;/span&gt; |
/&lt;span class="se"&gt;\_&lt;/span&gt;_/ /|  __/| &lt;span class="o"&gt;(&lt;/span&gt;_| |  | |  | |_| &lt;span class="o"&gt;||&lt;/span&gt; | | &lt;span class="o"&gt;||&lt;/span&gt; | | &lt;span class="o"&gt;||&lt;/span&gt;  __/| |
&lt;span class="se"&gt;\_&lt;/span&gt;___/  &lt;span class="se"&gt;\_&lt;/span&gt;__| &lt;span class="se"&gt;\_&lt;/span&gt;_,_|  &lt;span class="se"&gt;\_&lt;/span&gt;/   &lt;span class="se"&gt;\_&lt;/span&gt;_,_||_| |_||_| |_| &lt;span class="se"&gt;\_&lt;/span&gt;__||_|


2026-01-09 16:57:51,191 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Copyright © 2021-2024 The Apache Software Foundation. Apache SeaTunnel, SeaTunnel, and its feather logo are trademarks of The Apache Software Foundation.
2026-01-09 16:57:51,191 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Integrity Checker is disabled. Fail-fast on corrupted executables will not be performed.
To &lt;span class="nb"&gt;enable &lt;/span&gt;integrity checker &lt;span class="k"&gt;do &lt;/span&gt;one of the following: 
  - Change member config using Java API: config.setIntegrityCheckerEnabled&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  - Change XML/YAML configuration property: Set hazelcast.integrity-checker.enabled to &lt;span class="nb"&gt;true&lt;/span&gt;
  - Add system property: &lt;span class="nt"&gt;-Dhz&lt;/span&gt;.integritychecker.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;Hazelcast embedded, works only when loading config via Config.load&lt;span class="o"&gt;)&lt;/span&gt;
  - Add environment variable: &lt;span class="nv"&gt;HZ_INTEGRITYCHECKER_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;recommended when running container image. For Hazelcast embedded, works only when loading config via Config.load&lt;span class="o"&gt;)&lt;/span&gt;
2026-01-09 16:57:51,193 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.system                    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] The Jet engine is disabled.
To &lt;span class="nb"&gt;enable &lt;/span&gt;the Jet engine on the members, &lt;span class="k"&gt;do &lt;/span&gt;one of the following:
  - Change member config using Java API: config.getJetConfig&lt;span class="o"&gt;()&lt;/span&gt;.setEnabled&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
  - Change XML/YAML configuration property: Set hazelcast.jet.enabled to &lt;span class="nb"&gt;true&lt;/span&gt;
  - Add system property: &lt;span class="nt"&gt;-Dhz&lt;/span&gt;.jet.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;for &lt;/span&gt;Hazelcast embedded, works only when loading config via Config.load&lt;span class="o"&gt;)&lt;/span&gt;
  - Add environment variable: &lt;span class="nv"&gt;HZ_JET_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;recommended when running container image. For Hazelcast embedded, works only when loading config via Config.load&lt;span class="o"&gt;)&lt;/span&gt;
2026-01-09 16:57:51,331 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.s.security                &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Enable DEBUG/FINE log level &lt;span class="k"&gt;for &lt;/span&gt;log category com.hazelcast.system.security  or use &lt;span class="nt"&gt;-Dhazelcast&lt;/span&gt;.security.recommendations system property to see 🔒 security recommendations and the status of current config.
2026-01-09 16:57:51,392 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.SeaTunnelNodeContext] &lt;span class="o"&gt;[&lt;/span&gt;main] - Using LiteNodeDropOutTcpIpJoiner TCP/IP discovery
2026-01-09 16:57:51,393 WARN  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.CPSubsystem             &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] CP Subsystem is not enabled. CP data structures will operate &lt;span class="k"&gt;in &lt;/span&gt;UNSAFE mode! Please note that UNSAFE mode will not provide strong consistency guarantees.
2026-01-09 16:57:51,462 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.c.DefaultClassLoaderService] &lt;span class="o"&gt;[&lt;/span&gt;main] - start classloader service with cache mode
2026-01-09 16:57:51,464 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading configuration &lt;span class="s1"&gt;'/opt/seatunnel/config/seatunnel.yaml'&lt;/span&gt; from System property &lt;span class="s1"&gt;'seatunnel.config'&lt;/span&gt;
2026-01-09 16:57:51,464 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Using configuration file at /opt/seatunnel/config/seatunnel.yaml
2026-01-09 16:57:51,466 INFO  &lt;span class="o"&gt;[&lt;/span&gt;amlSeaTunnelDomConfigProcessor] &lt;span class="o"&gt;[&lt;/span&gt;main] - Dynamic slot is enabled, the schedule strategy is &lt;span class="nb"&gt;set &lt;/span&gt;to REJECT
2026-01-09 16:57:51,466 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading configuration &lt;span class="s1"&gt;'/opt/seatunnel/config/hazelcast.yaml'&lt;/span&gt; from System property &lt;span class="s1"&gt;'hazelcast.config'&lt;/span&gt;
2026-01-09 16:57:51,466 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.c.AbstractConfigLocator &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Using configuration file at /opt/seatunnel/config/hazelcast.yaml
2026-01-09 16:57:51,471 WARN  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.TaskExecutionService] &lt;span class="o"&gt;[&lt;/span&gt;pool-3-thread-1] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] The Node is not ready yet, Node state STARTING,looking forward to the next scheduling
2026-01-09 16:57:51,471 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.TaskExecutionService] &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Created new BusWork : 1257526338
2026-01-09 16:57:51,474 WARN  &lt;span class="o"&gt;[&lt;/span&gt;a.s.e.s.s.s.DefaultSlotService] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.seaTunnel.slotService.thread] - failed send heartbeat to resource manager, will retry later. this address: &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801
2026-01-09 16:57:51,475 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.CoordinatorService  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Start pending job schedule thread
2026-01-09 16:57:51,554 WARN  &lt;span class="o"&gt;[&lt;/span&gt;o.a.h.u.NativeCodeLoader      &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Unable to load native-hadoop library &lt;span class="k"&gt;for &lt;/span&gt;your platform... using builtin-java classes where applicable
2026-01-09 16:57:51,607 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.CoordinatorService  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;pool-7-thread-1] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] 
&lt;span class="k"&gt;***********************************************&lt;/span&gt;
     CoordinatorService Thread Pool Status
&lt;span class="k"&gt;***********************************************&lt;/span&gt;
activeCount               :                   1
corePoolSize              :                  10
maximumPoolSize           :          2147483647
poolSize                  :                   1
completedTaskCount        :                   0
taskCount                 :                   1
&lt;span class="k"&gt;***********************************************&lt;/span&gt;

2026-01-09 16:57:51,612 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.JettyService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - SeaTunnel REST service will start on port 8080
2026-01-09 16:57:51,617 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.u.log           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Logging initialized @1069ms to org.apache.seatunnel.shade.org.eclipse.jetty.util.log.Slf4jLog
2026-01-09 16:57:51,643 WARN  &lt;span class="o"&gt;[&lt;/span&gt;a.s.s.o.e.j.s.h.ContextHandler] &lt;span class="o"&gt;[&lt;/span&gt;main] - Empty contextPath
2026-01-09 16:57:51,652 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.Server        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - jetty-9.4.56.v20240826&lt;span class="p"&gt;;&lt;/span&gt; built: 2024-08-26T17:15:05.868Z&lt;span class="p"&gt;;&lt;/span&gt; git: ec6782ff5ead824dabdcf47fa98f90a4aedff401&lt;span class="p"&gt;;&lt;/span&gt; jvm 1.8.0_342-b07
2026-01-09 16:57:51,665 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.session       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - DefaultSessionIdManager &lt;span class="nv"&gt;workerName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;node0
2026-01-09 16:57:51,665 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.session       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - No SessionScavenger &lt;span class="nb"&gt;set&lt;/span&gt;, using defaults
2026-01-09 16:57:51,666 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.session       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - node0 Scavenging every 660000ms
2026-01-09 16:57:51,727 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.s.o.e.j.s.h.ContextHandler] &lt;span class="o"&gt;[&lt;/span&gt;main] - Started o.a.s.s.o.e.j.s.ServletContextHandler@7069f076&lt;span class="o"&gt;{&lt;/span&gt;/,null,AVAILABLE&lt;span class="o"&gt;}&lt;/span&gt;
2026-01-09 16:57:51,731 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.s.o.e.j.s.AbstractConnector] &lt;span class="o"&gt;[&lt;/span&gt;main] - Started ServerConnector@6e31d989&lt;span class="o"&gt;{&lt;/span&gt;HTTP/1.1, &lt;span class="o"&gt;(&lt;/span&gt;http/1.1&lt;span class="o"&gt;)}{&lt;/span&gt;0.0.0.0:8080&lt;span class="o"&gt;}&lt;/span&gt;
2026-01-09 16:57:51,732 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.Server        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Started @1184ms
2026-01-09 16:57:51,747 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.d.Diagnostics           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Diagnostics disabled. To &lt;span class="nb"&gt;enable &lt;/span&gt;add &lt;span class="nt"&gt;-Dhazelcast&lt;/span&gt;.diagnostics.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;to the JVM arguments.
2026-01-09 16:57:51,749 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 is STARTING
Members &lt;span class="o"&gt;{&lt;/span&gt;size:1, ver:1&lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;
        Member &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 - 579e9374-7d47-4d91-80d5-c7a9155d42d0 &lt;span class="o"&gt;[&lt;/span&gt;master node] &lt;span class="o"&gt;[&lt;/span&gt;active master] this
&lt;span class="o"&gt;]&lt;/span&gt;

2026-01-09 16:57:52,775 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.CoordinatorService  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;pool-5-thread-1] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] This node become a new active master node, begin init coordinator service
2026-01-09 16:57:52,779 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 is STARTED
2026-01-09 16:57:52,797 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.CoordinatorService  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;pool-5-thread-1] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Loaded event handlers: &lt;span class="o"&gt;[&lt;/span&gt;org.apache.seatunnel.api.event.LoggingEventHandler@1ebbcf65]
2026-01-09 16:57:52,803 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.i.s.ClientInvocationService] &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Running with 2 response threads, &lt;span class="nv"&gt;dynamic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true
&lt;/span&gt;2026-01-09 16:57:52,807 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.h.i.p.i.PartitionStateManager] &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-coordinator-service-1] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Initializing cluster partition table arrangement...
2026-01-09 16:57:52,811 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] HazelcastClient 5.1 &lt;span class="o"&gt;(&lt;/span&gt;20220228 - 21f20e7&lt;span class="o"&gt;)&lt;/span&gt; is STARTING
2026-01-09 16:57:52,811 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] HazelcastClient 5.1 &lt;span class="o"&gt;(&lt;/span&gt;20220228 - 21f20e7&lt;span class="o"&gt;)&lt;/span&gt; is STARTED
2026-01-09 16:57:52,814 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.i.c.ClientConnectionManager] &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Trying to connect to cluster: seatunnel-420804
2026-01-09 16:57:52,815 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.i.c.ClientConnectionManager] &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Trying to connect to &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801
2026-01-09 16:57:52,823 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.p.t.AuthenticationMessageTask] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.priority-generic-operation.thread-0] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Received auth from Connection[id&lt;span class="o"&gt;=&lt;/span&gt;1, /127.0.0.1:5801-&amp;gt;/127.0.0.1:45255, &lt;span class="nv"&gt;qualifier&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;null, &lt;span class="nv"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=[&lt;/span&gt;127.0.0.1]:45255, &lt;span class="nv"&gt;remoteUuid&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3be41dca-f0e7-4603-ad27-4e5b3aecbb4a, &lt;span class="nv"&gt;alive&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;, &lt;span class="nv"&gt;connectionType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;JVM, &lt;span class="nv"&gt;planeIndex&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nt"&gt;-1&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;, successfully authenticated, clientUuid: 3be41dca-f0e7-4603-ad27-4e5b3aecbb4a, client name: hz.client_1, client version: 5.1
2026-01-09 16:57:52,824 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] HazelcastClient 5.1 &lt;span class="o"&gt;(&lt;/span&gt;20220228 - 21f20e7&lt;span class="o"&gt;)&lt;/span&gt; is CLIENT_CONNECTED
2026-01-09 16:57:52,824 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.i.c.ClientConnectionManager] &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Authenticated with server &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801:579e9374-7d47-4d91-80d5-c7a9155d42d0, server version: 5.1, &lt;span class="nb"&gt;local &lt;/span&gt;address: /127.0.0.1:45255
2026-01-09 16:57:52,825 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.d.Diagnostics           &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Diagnostics disabled. To &lt;span class="nb"&gt;enable &lt;/span&gt;add &lt;span class="nt"&gt;-Dhazelcast&lt;/span&gt;.diagnostics.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true &lt;/span&gt;to the JVM arguments.
2026-01-09 16:57:52,829 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.i.s.ClientClusterService] &lt;span class="o"&gt;[&lt;/span&gt;hz.client_1.event-10] - hz.client_1 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] 

Members &lt;span class="o"&gt;[&lt;/span&gt;1] &lt;span class="o"&gt;{&lt;/span&gt;
        Member &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 - 579e9374-7d47-4d91-80d5-c7a9155d42d0 &lt;span class="o"&gt;[&lt;/span&gt;master node]
&lt;span class="o"&gt;}&lt;/span&gt;

2026-01-09 16:57:52,841 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.i.s.ClientStatisticsService] &lt;span class="o"&gt;[&lt;/span&gt;main] - Client statistics is enabled with period 5 seconds.
2026-01-09 16:57:52,858 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.c.s.u.ConfigBuilder     &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Loading config file from path: /config/my.config
2026-01-09 16:57:52,951 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.c.s.u.ConfigShadeUtils  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Load config shade spi: &lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;base64&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
2026-01-09 16:57:52,967 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.c.s.u.ConfigBuilder     &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Parsed config file: 
&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="s2"&gt;"env"&lt;/span&gt; : &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="s2"&gt;"parallelism"&lt;/span&gt; : 1,
        &lt;span class="s2"&gt;"job.mode"&lt;/span&gt; : &lt;span class="s2"&gt;"BATCH"&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;,
    &lt;span class="s2"&gt;"source"&lt;/span&gt; : &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"url"&lt;/span&gt; : &lt;span class="s2"&gt;"http://dynamodb.us-east-1.amazonaws.com"&lt;/span&gt;,
            &lt;span class="s2"&gt;"region"&lt;/span&gt; : &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt;,
            &lt;span class="s2"&gt;"access_key_id"&lt;/span&gt; : &lt;span class="s2"&gt;"&amp;lt;REDACTED&amp;gt;"&lt;/span&gt;,
            &lt;span class="s2"&gt;"secret_access_key"&lt;/span&gt; : &lt;span class="s2"&gt;"&amp;lt;REDACTED&amp;gt;"&lt;/span&gt;,
            &lt;span class="s2"&gt;"table"&lt;/span&gt; : &lt;span class="s2"&gt;"mySourceTable"&lt;/span&gt;,
            &lt;span class="s2"&gt;"schema"&lt;/span&gt; : &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="s2"&gt;"fields"&lt;/span&gt; : &lt;span class="o"&gt;{&lt;/span&gt;
                    &lt;span class="s2"&gt;"customerId"&lt;/span&gt; : &lt;span class="s2"&gt;"int"&lt;/span&gt;,
                    &lt;span class="s2"&gt;"customerName"&lt;/span&gt; : &lt;span class="s2"&gt;"string"&lt;/span&gt;,
                    &lt;span class="s2"&gt;"address"&lt;/span&gt; : &lt;span class="s2"&gt;"string"&lt;/span&gt;
                &lt;span class="o"&gt;}&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;,
            &lt;span class="s2"&gt;"plugin_name"&lt;/span&gt; : &lt;span class="s2"&gt;"Amazondynamodb"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;]&lt;/span&gt;,
    &lt;span class="s2"&gt;"sink"&lt;/span&gt; : &lt;span class="o"&gt;[&lt;/span&gt;
        &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="s2"&gt;"host"&lt;/span&gt; : &lt;span class="s2"&gt;"host.docker.internal"&lt;/span&gt;,
            &lt;span class="s2"&gt;"port"&lt;/span&gt; : 6379,
            &lt;span class="s2"&gt;"support_custom_key"&lt;/span&gt; : &lt;span class="nb"&gt;true&lt;/span&gt;,
            &lt;span class="s2"&gt;"key"&lt;/span&gt; : &lt;span class="s2"&gt;"customer:{customerId}"&lt;/span&gt;,
            &lt;span class="s2"&gt;"data_type"&lt;/span&gt; : &lt;span class="s2"&gt;"hash"&lt;/span&gt;,
            &lt;span class="s2"&gt;"plugin_name"&lt;/span&gt; : &lt;span class="s2"&gt;"Redis"&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

2026-01-09 16:57:52,971 INFO  &lt;span class="o"&gt;[&lt;/span&gt;p.MultipleTableJobConfigParser] &lt;span class="o"&gt;[&lt;/span&gt;main] - add common jar &lt;span class="k"&gt;in &lt;/span&gt;plugins :[]
2026-01-09 16:57:52,978 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Load SeaTunnelSink Plugin from /opt/seatunnel/connectors
2026-01-09 16:57:52,980 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Discovery plugin jar &lt;span class="k"&gt;for&lt;/span&gt;: PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'source'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Amazondynamodb'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; at: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-amazondynamodb-2.3.12.jar]
2026-01-09 16:57:52,980 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - find connector jar and dependency &lt;span class="k"&gt;for &lt;/span&gt;PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'source'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Amazondynamodb'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-amazondynamodb-2.3.12.jar]
2026-01-09 16:57:52,982 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Load SeaTunnelSink Plugin from /opt/seatunnel/connectors
2026-01-09 16:57:52,985 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Load SeaTunnelSink Plugin from /opt/seatunnel/connectors
2026-01-09 16:57:52,986 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Discovery plugin jar &lt;span class="k"&gt;for&lt;/span&gt;: PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sink'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Redis'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; at: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-redis-2.3.12.jar]
2026-01-09 16:57:52,986 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - find connector jar and dependency &lt;span class="k"&gt;for &lt;/span&gt;PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sink'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Redis'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-redis-2.3.12.jar]
2026-01-09 16:57:52,988 INFO  &lt;span class="o"&gt;[&lt;/span&gt;p.MultipleTableJobConfigParser] &lt;span class="o"&gt;[&lt;/span&gt;main] - start generating all sources.
2026-01-09 16:57:53,003 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.a.t.f.FactoryUtil       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - get the CatalogTable from &lt;span class="nb"&gt;source &lt;/span&gt;AmazonDynamodb: .default.default.default
2026-01-09 16:57:53,008 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Load SeaTunnelSource Plugin from /opt/seatunnel/connectors
2026-01-09 16:57:53,011 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Discovery plugin jar &lt;span class="k"&gt;for&lt;/span&gt;: PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'source'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Amazondynamodb'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; at: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-amazondynamodb-2.3.12.jar]
2026-01-09 16:57:53,011 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - find connector jar and dependency &lt;span class="k"&gt;for &lt;/span&gt;PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'source'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Amazondynamodb'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-amazondynamodb-2.3.12.jar]
2026-01-09 16:57:53,012 INFO  &lt;span class="o"&gt;[&lt;/span&gt;p.MultipleTableJobConfigParser] &lt;span class="o"&gt;[&lt;/span&gt;main] - start generating all transforms.
2026-01-09 16:57:53,012 INFO  &lt;span class="o"&gt;[&lt;/span&gt;p.MultipleTableJobConfigParser] &lt;span class="o"&gt;[&lt;/span&gt;main] - start generating all sinks.
2026-01-09 16:57:53,013 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Load SeaTunnelSink Plugin from /opt/seatunnel/connectors
2026-01-09 16:57:53,014 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - Discovery plugin jar &lt;span class="k"&gt;for&lt;/span&gt;: PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sink'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Redis'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt; at: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-redis-2.3.12.jar]
2026-01-09 16:57:53,014 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.p.d.AbstractPluginDiscovery] &lt;span class="o"&gt;[&lt;/span&gt;main] - find connector jar and dependency &lt;span class="k"&gt;for &lt;/span&gt;PluginIdentifier&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;engineType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'seatunnel'&lt;/span&gt;, &lt;span class="nv"&gt;pluginType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'sink'&lt;/span&gt;, &lt;span class="nv"&gt;pluginName&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'Redis'&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;: &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-redis-2.3.12.jar]
2026-01-09 16:57:53,024 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.a.t.f.FactoryUtil       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Create sink &lt;span class="s1"&gt;'Redis'&lt;/span&gt; with upstream input catalog-table[database: default, schema: default, table: default]
2026-01-09 16:57:53,042 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.c.j.ClientJobProxy    &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - Start submit job, job &lt;span class="nb"&gt;id&lt;/span&gt;: 1062052604328017921, with plugin jar &lt;span class="o"&gt;[&lt;/span&gt;file:/opt/seatunnel/connectors/connector-amazondynamodb-2.3.12.jar, file:/opt/seatunnel/connectors/connector-redis-2.3.12.jar]
2026-01-09 16:57:53,047 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.e.s.r.AbstractResourceManager] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.client.thread-4] - Init ResourceManager
2026-01-09 16:57:53,047 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.e.s.r.AbstractResourceManager] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.client.thread-4] - initWorker... 
2026-01-09 16:57:53,047 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.e.s.r.AbstractResourceManager] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.client.thread-4] - init live nodes: &lt;span class="o"&gt;[[&lt;/span&gt;localhost]:5801]
2026-01-09 16:57:53,048 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.e.s.r.AbstractResourceManager] &lt;span class="o"&gt;[&lt;/span&gt;SeaTunnel-CompletableFuture-Thread-2] - received new worker register: &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801
2026-01-09 16:57:53,152 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.c.s.r.c.RedisParameters &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;hz.main.seaTunnel.task.thread-4] - Try to get redis version information from the jedis.info&lt;span class="o"&gt;()&lt;/span&gt; method
2026-01-09 16:57:53,152 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.c.s.r.c.RedisParameters &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;hz.main.seaTunnel.task.thread-4] - The version of Redis is :8.4.0
2026-01-09 16:57:53,154 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.a.s.m.MultiTableSinkWriter] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.seaTunnel.task.thread-4] - init multi table sink writer, queue size: 1
2026-01-09 16:57:53,233 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.a.e.LoggingEventHandler &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;hz.main.generic-operation.thread-12] - log event: ReaderOpenEvent&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;createdTime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1767977873233, &lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;eventType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;LIFECYCLE_READER_OPEN&lt;span class="o"&gt;)&lt;/span&gt;
2026-01-09 16:57:53,402 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.t.SourceSplitEnumeratorTask] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.seaTunnel.task.thread-4] - received reader register, readerID: TaskLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;taskGroupLocation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2&lt;span class="o"&gt;}&lt;/span&gt;, &lt;span class="nv"&gt;taskID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1000200000000, &lt;span class="nv"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0&lt;span class="o"&gt;}&lt;/span&gt;
2026-01-09 16:57:53,429 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.e.s.c.CheckpointCoordinator] &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-coordinator-service-9] - checkpoint is disabled, because &lt;span class="k"&gt;in &lt;/span&gt;batch mode and &lt;span class="s1"&gt;'checkpoint.interval'&lt;/span&gt; of &lt;span class="nb"&gt;env &lt;/span&gt;is missing.
2026-01-09 16:57:53,506 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.AmazonDynamoDBSourceReader] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2&lt;span class="o"&gt;}]&lt;/span&gt; - AmazonDynamoDB Source Reader &lt;span class="o"&gt;[&lt;/span&gt;0] waiting &lt;span class="k"&gt;for &lt;/span&gt;splits
2026-01-09 16:57:53,506 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.AmazonDynamoDBSourceReader] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2&lt;span class="o"&gt;}]&lt;/span&gt; - AmazonDynamoDB Source Reader &lt;span class="o"&gt;[&lt;/span&gt;0] waiting &lt;span class="k"&gt;for &lt;/span&gt;splits
2026-01-09 16:57:53,529 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.t.SourceSplitEnumeratorTask] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;}]&lt;/span&gt; - received enough reader, starting enumerator...
2026-01-09 16:57:53,530 INFO  &lt;span class="o"&gt;[&lt;/span&gt;nDynamoDBSourceSplitEnumerator] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;}]&lt;/span&gt; - Assigning org.apache.seatunnel.connectors.seatunnel.amazondynamodb.source.AmazonDynamoDBSourceSplit@36fe5b1d to 0 reader.
2026-01-09 16:57:53,530 INFO  &lt;span class="o"&gt;[&lt;/span&gt;nDynamoDBSourceSplitEnumerator] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;}]&lt;/span&gt; - Assigning org.apache.seatunnel.connectors.seatunnel.amazondynamodb.source.AmazonDynamoDBSourceSplit@81bd425 to 0 reader.
2026-01-09 16:57:53,530 INFO  &lt;span class="o"&gt;[&lt;/span&gt;nDynamoDBSourceSplitEnumerator] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1&lt;span class="o"&gt;}]&lt;/span&gt; - Assign splits &lt;span class="o"&gt;[&lt;/span&gt;org.apache.seatunnel.connectors.seatunnel.amazondynamodb.source.AmazonDynamoDBSourceSplit@36fe5b1d, org.apache.seatunnel.connectors.seatunnel.amazondynamodb.source.AmazonDynamoDBSourceSplit@81bd425] to reader 0
2026-01-09 16:57:53,533 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.AmazonDynamoDBSourceReader] &lt;span class="o"&gt;[&lt;/span&gt;hz.main.generic-operation.thread-21] - Reader &lt;span class="o"&gt;[&lt;/span&gt;0] received noMoreSplit event.
2026-01-09 16:57:53,934 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.AmazonDynamoDBSourceReader] &lt;span class="o"&gt;[&lt;/span&gt;BlockingWorker-TaskGroupLocation&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="nv"&gt;jobId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1062052604328017921, &lt;span class="nv"&gt;pipelineId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1, &lt;span class="nv"&gt;taskGroupId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2&lt;span class="o"&gt;}]&lt;/span&gt; - AmazonDynamoDB Source Reader &lt;span class="o"&gt;[&lt;/span&gt;0] waiting &lt;span class="k"&gt;for &lt;/span&gt;splits
2026-01-09 16:57:53,934 INFO  [a.s.AmazonDynamoDBSourceReader] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] - Closed the bounded amazonDynamodb source
2026-01-09 16:57:53,941 INFO  [.s.e.s.c.CheckpointCoordinator] [hz.main.generic-operation.thread-23] - skip schedule trigger checkpoint because checkpoint type is COMPLETED_POINT_TYPE
2026-01-09 16:57:53,943 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-8] - wait checkpoint completed: 9223372036854775807
2026-01-09 16:57:55,974 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-8] - pending checkpoint(9223372036854775807/1@1062052604328017921) notify finished!
2026-01-09 16:57:55,974 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-8] - start notify checkpoint completed, job id: 1062052604328017921, pipeline id: 1, checkpoint id:9223372036854775807
2026-01-09 16:57:55,979 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-8] - start clean pending checkpoint cause CheckpointCoordinator completed.
2026-01-09 16:57:55,979 INFO  [.s.e.s.c.CheckpointCoordinator] [seatunnel-coordinator-service-8] - Turn checkpoint_state_1062052604328017921_1 state from null to FINISHED
2026-01-09 16:57:55,987 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}] - [localhost]:5801 [seatunnel-420804] [5.1] taskDone, taskId = 1000100000000, taskGroup = TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}
2026-01-09 16:57:55,987 INFO  [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-32] - log event: EnumeratorCloseEvent(createdTime=1767977875987, jobId=1062052604328017921, eventType=LIFECYCLE_ENUMERATOR_CLOSE)
2026-01-09 16:57:55,998 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}] - [localhost]:5801 [seatunnel-420804] [5.1] taskGroup TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1} complete with FINISHED
2026-01-09 16:57:55,998 INFO  [o.a.s.e.s.TaskExecutionService] [hz.main.seaTunnel.task.thread-5] - [localhost]:5801 [seatunnel-420804] [5.1] Task TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1} complete with state FINISHED
2026-01-09 16:57:55,998 INFO  [o.a.s.e.s.CoordinatorService  ] [hz.main.seaTunnel.task.thread-5] - [localhost]:5801 [seatunnel-420804] [5.1] Received task end from execution TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}, state FINISHED
2026-01-09 16:57:56,000 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-5] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SplitEnumerator (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}] turned from state RUNNING to FINISHED.
2026-01-09 16:57:56,000 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-5] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SplitEnumerator (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}] state process is stopped
2026-01-09 16:57:56,000 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-8] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SplitEnumerator (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=1}] future complete with state FINISHED
2026-01-09 16:57:56,039 INFO  [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-33] - log event: ReaderCloseEvent(createdTime=1767977876039, jobId=1062052604328017921, eventType=LIFECYCLE_READER_CLOSE)
2026-01-09 16:57:56,040 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] - [localhost]:5801 [seatunnel-420804] [5.1] taskDone, taskId = 1000200000000, taskGroup = TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}
2026-01-09 16:57:56,076 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] - [localhost]:5801 [seatunnel-420804] [5.1] taskDone, taskId = 1000200010000, taskGroup = TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}
2026-01-09 16:57:56,077 INFO  [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-34] - log event: WriterCloseEvent(createdTime=1767977876076, jobId=1062052604328017921, eventType=LIFECYCLE_WRITER_CLOSE)
2026-01-09 16:57:56,079 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] - [localhost]:5801 [seatunnel-420804] [5.1] taskGroup TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2} complete with FINISHED
2026-01-09 16:57:56,079 INFO  [o.a.s.e.s.TaskExecutionService] [hz.main.seaTunnel.task.thread-2] - [localhost]:5801 [seatunnel-420804] [5.1] Task TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2} complete with state FINISHED
2026-01-09 16:57:56,079 INFO  [o.a.s.e.s.CoordinatorService  ] [hz.main.seaTunnel.task.thread-2] - [localhost]:5801 [seatunnel-420804] [5.1] Received task end from execution TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}, state FINISHED
2026-01-09 16:57:56,080 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-2] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SourceTask (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] turned from state RUNNING to FINISHED.
2026-01-09 16:57:56,080 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-2] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SourceTask (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] state process is stopped
2026-01-09 16:57:56,080 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-8] - Job (1062052604328017921), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-Amazondynamodb]-SourceTask (1/1)], taskGroupLocation: [TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}] future complete with state FINISHED
2026-01-09 16:57:56,080 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-8] - Job SeaTunnel_Job (1062052604328017921), Pipeline: [(1/1)] will end with state FINISHED
2026-01-09 16:57:56,081 INFO  [o.a.s.e.s.m.JobMaster         ] [hz.main.seaTunnel.task.thread-2] - release the task group resource TaskGroupLocation{jobId=1062052604328017921, pipelineId=1, taskGroupId=2}
2026-01-09 16:57:56,081 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-8] - Job SeaTunnel_Job (1062052604328017921), Pipeline: [(1/1)] turned from state RUNNING to FINISHED.
2026-01-09 16:57:56,081 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-35] - received slot release request, jobID: 1062052604328017921, slot: SlotProfile{worker=[localhost]:5801, slotID=2, ownerJobID=1062052604328017921, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='e992c180-0d29-42db-ac3f-249a1130c0b1'}
2026-01-09 16:57:56,105 INFO  [o.a.s.e.s.m.JobMaster         ] [seatunnel-coordinator-service-8] - release the pipeline Job SeaTunnel_Job (1062052604328017921), Pipeline: [(1/1)] resource
2026-01-09 16:57:56,105 INFO  [a.s.e.s.s.s.DefaultSlotService] [hz.main.generic-operation.thread-39] - received slot release request, jobID: 1062052604328017921, slot: SlotProfile{worker=[localhost]:5801, slotID=1, ownerJobID=1062052604328017921, assigned=true, resourceProfile=ResourceProfile{cpu=CPU{core=0}, heapMemory=Memory{bytes=0}}, sequence='e992c180-0d29-42db-ac3f-249a1130c0b1'}
2026-01-09 16:57:56,106 INFO  [o.a.s.e.s.d.p.SubPlan         ] [seatunnel-coordinator-service-8] - Job SeaTunnel_Job (1062052604328017921), Pipeline: [(1/1)] state process is stop
2026-01-09 16:57:56,106 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-7] - Job SeaTunnel_Job (1062052604328017921), Pipeline: [(1/1)] future complete with state FINISHED
2026-01-09 16:57:56,107 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-7] - Job SeaTunnel_Job (1062052604328017921) turned from state RUNNING to FINISHED.
2026-01-09 16:57:56,107 INFO  [o.a.s.e.s.d.p.PhysicalPlan    ] [seatunnel-coordinator-service-7] - Job SeaTunnel_Job (1062052604328017921) state process is stop
2026-01-09 16:57:56,116 INFO  [o.a.s.a.e.LoggingEventHandler ] [seatunnel-coordinator-service-7] - log event: JobStateEvent(jobId=1062052604328017921, jobName=SeaTunnel_Job, jobStatus=FINISHED, createdTime=1767977876116)
2026-01-09 16:57:56,117 INFO  [o.a.s.e.c.j.ClientJobProxy    ] [main] - Job (1062052604328017921) end with state FINISHED
2026-01-09 16:57:56,132 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - 
***********************************************
           Job Statistic Information
***********************************************
Start Time                : 2026-01-09 16:57:52
End Time                  : 2026-01-09 16:57:56
Total Time(s)             :                   3
Total Read Count          :                   2
Total Write Count         :                   2
Total Failed Count        :                   0
***********************************************

2026-01-09 16:57:56,133 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-420804] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTTING_DOWN
2026-01-09 16:57:56,135 INFO  [c.h.i.s.t.TcpServerConnection ] [hz.main.IO.thread-in-1] - [localhost]:5801 [seatunnel-420804] [5.1] Connection[id=1, /127.0.0.1:5801-&amp;gt;/127.0.0.1:45255, qualifier=null, endpoint=[127.0.0.1]:45255, remoteUuid=3be41dca-f0e7-4603-ad27-4e5b3aecbb4a, alive=false, connectionType=JVM, planeIndex=-1] closed. Reason: Connection closed by the other side
2026-01-09 16:57:56,135 INFO  [.c.i.c.ClientConnectionManager] [main] - hz.client_1 [seatunnel-420804] [5.1] Removed connection to endpoint: [localhost]:5801:579e9374-7d47-4d91-80d5-c7a9155d42d0, connection: ClientConnection{alive=false, connectionId=1, channel=NioChannel{/127.0.0.1:45255-&amp;gt;localhost/127.0.0.1:5801}, remoteAddress=[localhost]:5801, lastReadTime=2026-01-09 16:57:56.131, lastWriteTime=2026-01-09 16:57:56.117, closedTime=2026-01-09 16:57:56.134, connected server version=5.1}
2026-01-09 16:57:56,135 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-420804] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is CLIENT_DISCONNECTED
2026-01-09 16:57:56,136 INFO  [c.h.c.i.ClientEndpointManager ] [hz.main.event-1] - [localhost]:5801 [seatunnel-420804] [5.1] Destroying ClientEndpoint{connection=Connection[id=1, /127.0.0.1:5801-&amp;gt;/127.0.0.1:45255, qualifier=null, endpoint=[127.0.0.1]:45255, remoteUuid=3be41dca-f0e7-4603-ad27-4e5b3aecbb4a, alive=false, connectionType=JVM, planeIndex=-1], clientUuid=3be41dca-f0e7-4603-ad27-4e5b3aecbb4a, clientName=hz.client_1, authenticated=true, clientVersion=5.1, creationTime=1767977872821, latest clientAttributes=lastStatisticsCollectionTime=1767977872841,enterprise=false,clientType=JVM,clientVersion=5.1,clusterConnectionTimestamp=1767977872816,clientAddress=127.0.0.1,clientName=hz.client_1,credentials.principal=null,os.committedVirtualMemorySize=9714790400,os.freePhysicalMemorySize=2077376512,os.freeSwapSpaceSize=1073737728,os.maxFileDescriptorCount=1048576,os.openFileDescriptorCount=103,os.processCpuTime=4280000000,os.systemLoadAverage=3.26,os.totalPhysicalMemorySize=8321540096,os.totalSwapSpaceSize=1073737728,runtime.availableProcessors=14,runtime.freeMemory=286196824,runtime.maxMemory=477626368,runtime.totalMemory=376438784,runtime.uptime=2297,runtime.usedMemory=90241960, labels=[]}
2026-01-09 16:57:56,136 INFO  [c.h.c.LifecycleService        ] [main] - hz.client_1 [seatunnel-420804] [5.1] HazelcastClient 5.1 (20220228 - 21f20e7) is SHUTDOWN
2026-01-09 16:57:56,136 INFO  [s.c.s.s.c.ClientExecuteCommand] [main] - Closed SeaTunnel client......
2026-01-09 16:57:56,136 INFO  [c.h.c.LifecycleService        ] [main] - [localhost]:5801 [seatunnel-420804] [5.1] [localhost]:5801 is SHUTTING_DOWN
2026-01-09 16:57:56,138 INFO  [c.h.i.p.i.MigrationManager    ] [hz.main.cached.thread-15] - [localhost]:5801 [seatunnel-420804] [5.1] Shutdown request of Member [localhost]:5801 - 579e9374-7d47-4d91-80d5-c7a9155d42d0 [master node] [active master] this is handled
2026-01-09 16:57:56,141 INFO  [c.h.i.i.Node                  ] [main] - [localhost]:5801 [seatunnel-420804] [5.1] Shutting down connection manager...
2026-01-09 16:57:56,142 INFO  [c.h.i.i.Node                  ] [main] - [localhost]:5801 [seatunnel-420804] [5.1] Shutting down node engine...
2026-01-09 16:57:56,149 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.s.s.o.e.j.s.AbstractConnector] &lt;span class="o"&gt;[&lt;/span&gt;main] - Stopped ServerConnector@6e31d989&lt;span class="o"&gt;{&lt;/span&gt;HTTP/1.1, &lt;span class="o"&gt;(&lt;/span&gt;http/1.1&lt;span class="o"&gt;)}{&lt;/span&gt;0.0.0.0:8080&lt;span class="o"&gt;}&lt;/span&gt;
2026-01-09 16:57:56,149 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.s.o.e.j.s.session       &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - node0 Stopped scavenging
2026-01-09 16:57:56,150 INFO  &lt;span class="o"&gt;[&lt;/span&gt;a.s.s.o.e.j.s.h.ContextHandler] &lt;span class="o"&gt;[&lt;/span&gt;main] - Stopped o.a.s.s.o.e.j.s.ServletContextHandler@7069f076&lt;span class="o"&gt;{&lt;/span&gt;/,null,STOPPED&lt;span class="o"&gt;}&lt;/span&gt;
2026-01-09 16:57:56,151 INFO  &lt;span class="o"&gt;[&lt;/span&gt;.c.c.DefaultClassLoaderService] &lt;span class="o"&gt;[&lt;/span&gt;main] - close classloader service
2026-01-09 16:57:56,152 INFO  &lt;span class="o"&gt;[&lt;/span&gt;o.a.s.e.s.EventService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;event-forwarder-0] - Event forward thread interrupted
2026-01-09 16:57:56,155 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.i.NodeExtension         &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Destroying node NodeExtension.
2026-01-09 16:57:56,156 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.i.i.Node                  &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] Hazelcast Shutdown is completed &lt;span class="k"&gt;in &lt;/span&gt;18 ms.
2026-01-09 16:57:56,156 INFO  &lt;span class="o"&gt;[&lt;/span&gt;c.h.c.LifecycleService        &lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt;main] - &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 &lt;span class="o"&gt;[&lt;/span&gt;seatunnel-420804] &lt;span class="o"&gt;[&lt;/span&gt;5.1] &lt;span class="o"&gt;[&lt;/span&gt;localhost]:5801 is SHUTDOWN
2026-01-09 16:57:56,156 INFO  &lt;span class="o"&gt;[&lt;/span&gt;s.c.s.s.c.ClientExecuteCommand] &lt;span class="o"&gt;[&lt;/span&gt;main] - Closed HazelcastInstance ......
2026-01-09 16:57:56,156 INFO  &lt;span class="o"&gt;[&lt;/span&gt;s.c.s.s.c.ClientExecuteCommand] &lt;span class="o"&gt;[&lt;/span&gt;main] - Closed metrics executor service ......
2026-01-09 16:57:56,156 INFO  &lt;span class="o"&gt;[&lt;/span&gt;s.c.s.s.c.ClientExecuteCommand] &lt;span class="o"&gt;[&lt;/span&gt;SeaTunnel-CompletableFuture-Thread-7] - run shutdown hook because get close signal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the job finishes, the Docker container created to execute it will be terminated, and you will be able to see the records written to Redis.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25yw63ie5lebystiomjv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F25yw63ie5lebystiomjv.png" alt="Viewing Data with Redis Insight" width="800" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As promised, this was as simple as writing a single configuration file. Of course, the numerous configuration options that both the source and sink connectors offer may take some time to grasp. You can check the configuration properties for Amazon DynamoDB &lt;a href="https://seatunnel.apache.org/docs/2.3.12/connector-v2/source/AmazonDynamoDB" rel="noopener noreferrer"&gt;here&lt;/a&gt; and those for Redis &lt;a href="https://seatunnel.apache.org/docs/2.3.12/connector-v2/sink/Redis" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: The Production Powerhouse (Cluster Mode)
&lt;/h3&gt;

&lt;p&gt;When you need an always-on data pipeline infrastructure that can handle multiple jobs, scale on demand, and provide REST API access, cluster mode is your friend.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Set Up Your Infrastructure
&lt;/h4&gt;

&lt;p&gt;Create a &lt;code&gt;docker-compose.yaml&lt;/code&gt; that spins up a complete SeaTunnel cluster. It also includes Redis, which, for this use case, will be the sink of the data pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;3.8'&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;seatunnel-master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apache/seatunnel:2.3.12&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seatunnel-master&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ST_DOCKER_MEMBER_LIST=172.16.0.3,172.16.0.4,172.16.0.5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/hazelcast-master.yaml:/opt/seatunnel/config/hazelcast-master.yaml&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/seatunnel.yaml:/opt/seatunnel/config/seatunnel.yaml&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;/bin/sh -c "&lt;/span&gt;
      &lt;span class="s"&gt;/opt/seatunnel/bin/seatunnel-cluster.sh -r master&lt;/span&gt;
      &lt;span class="s"&gt;"    &lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5801:5801"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;  &lt;span class="c1"&gt;# REST API endpoint&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;seatunnel_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ipv4_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;172.16.0.3&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;http://localhost:8080/system-monitoring-information&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;||&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;60s&lt;/span&gt;

  &lt;span class="na"&gt;seatunnel-worker1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apache/seatunnel:2.3.12&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seatunnel-worker1&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ST_DOCKER_MEMBER_LIST=172.16.0.3,172.16.0.4,172.16.0.5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/hazelcast-worker.yaml:/opt/seatunnel/config/hazelcast-worker.yaml&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/seatunnel.yaml:/opt/seatunnel/config/seatunnel.yaml&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;/bin/sh -c "&lt;/span&gt;
      &lt;span class="s"&gt;/opt/seatunnel/bin/seatunnel-cluster.sh -r worker&lt;/span&gt;
      &lt;span class="s"&gt;" &lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;seatunnel-master&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;seatunnel_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ipv4_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;172.16.0.4&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pgrep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;seatunnel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;||&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;45s&lt;/span&gt;

  &lt;span class="na"&gt;seatunnel-worker2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apache/seatunnel:2.3.12&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;seatunnel-worker2&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ST_DOCKER_MEMBER_LIST=172.16.0.3,172.16.0.4,172.16.0.5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/hazelcast-worker.yaml:/opt/seatunnel/config/hazelcast-worker.yaml&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./config/seatunnel.yaml:/opt/seatunnel/config/seatunnel.yaml&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;/bin/sh -c "&lt;/span&gt;
      &lt;span class="s"&gt;/opt/seatunnel/bin/seatunnel-cluster.sh -r worker&lt;/span&gt;
      &lt;span class="s"&gt;" &lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;seatunnel-master&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;seatunnel_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ipv4_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;172.16.0.5&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pgrep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;seatunnel&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;||&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;45s&lt;/span&gt;

  &lt;span class="na"&gt;redis-database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-database&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-database&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:8.4.0&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379:6379"&lt;/span&gt;
    &lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;seatunnel_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;ipv4_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;172.16.0.2&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis-cli&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ping&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PONG"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;

&lt;span class="na"&gt;networks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;seatunnel_network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bridge&lt;/span&gt;
    &lt;span class="na"&gt;ipam&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;subnet&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;172.16.0.0/24&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Configure your Cluster
&lt;/h4&gt;

&lt;p&gt;Create &lt;code&gt;config/seatunnel.yaml&lt;/code&gt; to enable the REST API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;seatunnel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;engine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enable-http&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create &lt;code&gt;config/hazelcast-master.yaml&lt;/code&gt; and &lt;code&gt;config/hazelcast-worker.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;hazelcast&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;network&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;rest-api&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;endpoint-groups&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;CLUSTER_READ&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;CLUSTER_WRITE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;HEALTH_CHECK&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;DATA&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;auto-increment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;port-count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
      &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5801&lt;/span&gt;
    &lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;tcp-ip&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;member-list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.3&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.4&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;172.16.0.5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 3: Launch Your Cluster
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your SeaTunnel cluster is now running with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Master node (coordinator + API server)&lt;/li&gt;
&lt;li&gt;2 Worker nodes (execute the actual data pipelines)&lt;/li&gt;
&lt;li&gt;Redis database (your sink and target system)&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Step 4: Submit Jobs via REST API
&lt;/h4&gt;

&lt;p&gt;Now the magic happens! Submit jobs using a simple HTTP POST:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8080/submit-job?jobName&lt;span class="o"&gt;=&lt;/span&gt;dynamodb-to-redis &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
    "env": {
        "parallelism": 2,
        "job.mode": "BATCH"
    },
    "source": [
        {
            "plugin_name": "Amazondynamodb",
            "plugin_output": "dynamodb_source",
            "url": "http://dynamodb.us-east-1.amazonaws.com",
            "region": "us-east-1",
            "access_key_id": "YOUR_ACCESS_KEY",
            "secret_access_key": "YOUR_SECRET_KEY",
            "table": "mySourceTable",
            "schema": {
                "fields": {
                    "customerId": "int",
                    "customerName": "string",
                    "address": "string"
                }
            }
        }
    ],
    "transform": [],
    "sink": [
        {
            "plugin_name": "Redis",
            "host": "redis-database",
            "port": 6379,
            "support_custom_key": true,
            "key": "customer:{customerId}",
            "data_type": "hash"
        }
    ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
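The `support_custom_key` option in the sink above lets the key template reference record fields, so `customer:{customerId}` becomes `customer:42` for a record whose `customerId` is 42. A rough sketch of that substitution (illustrative only — this mimics the observable behavior, it is not SeaTunnel's actual implementation):

```typescript
// Illustrative expansion of a Redis sink key template such as
// "customer:{customerId}" against a record's fields.
type Row = { [field: string]: string | number };

function renderKey(template: string, row: Row): string {
  // Replace each {field} placeholder with the record's value for that
  // field; unknown placeholders are left untouched.
  return template.replace(/\{(\w+)\}/g, (match, field) =>
    field in row ? String(row[field]) : match
  );
}

const key = renderKey("customer:{customerId}", {
  customerId: 42,
  customerName: "Ada Lovelace",
});
console.log(key); // prints "customer:42"
```

With `data_type` set to `hash`, each record is then stored as a Redis hash under its rendered key.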



&lt;p&gt;Once this job is accepted, it will start executing immediately. You can follow the job's execution in the web console that SeaTunnel exposes. Go to the URL &lt;code&gt;http://localhost:8080/#/overview&lt;/code&gt; and navigate to the &lt;code&gt;Jobs&lt;/code&gt; tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopmora1pvvg2m8qj8xu9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopmora1pvvg2m8qj8xu9.png" alt="SeaTunnel Web Console" width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;With Apache SeaTunnel, syncing data from Amazon DynamoDB to Redis is a configuration exercise. Whether you need a one-time migration or an always-on data integration pipeline, Apache SeaTunnel has the tools for your needs.&lt;/p&gt;

&lt;p&gt;The beauty of this approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No code to maintain&lt;/strong&gt; – just configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Battle-tested connectors&lt;/strong&gt; – focus on your business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible deployment&lt;/strong&gt; – from laptop to production cluster&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalable by design&lt;/strong&gt; – scale horizontally with your data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to give it a spin? Start with standalone mode for quick experiments, then graduate to cluster mode when you're ready for production.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Have questions or want to share your specific data integration challenges? Please drop a comment below or find me on social media. Happy data syncing! 🚀&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>dynamodb</category>
      <category>redis</category>
      <category>seatunnel</category>
    </item>
    <item>
      <title>Organizing AI Applications: Lessons from traditional software architecture</title>
      <dc:creator>Ashwin Hariharan</dc:creator>
      <pubDate>Mon, 05 Jan 2026 13:30:00 +0000</pubDate>
      <link>https://dev.to/redis/organizing-ai-applications-lessons-from-traditional-software-architecture-mpe</link>
      <guid>https://dev.to/redis/organizing-ai-applications-lessons-from-traditional-software-architecture-mpe</guid>
      <description>&lt;p&gt;When I started learning AI and diving into frameworks like LangGraph, n8n, and the OpenAI APIs, I found plenty of great tutorials. They taught me how to build a simple chatbot, how to make my first LLM call, how to chain a few prompts together. Useful stuff for getting started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Great for learning. Less great for shipping.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;After the first couple of weeks, I wanted to build an actual production-ready application that goes beyond standard POCs: something that uses AI but also involves dozens of routes, multiple features and services, database operations, and caching layers. But those beginner tutorials weren't enough. &lt;em&gt;Where do the embeddings live? How do I structure my agent workflows? Should my API routes call AI directly, or is there supposed to be a layer in between?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The documentation showed me how to use the framework abstractions and APIs. It didn't show me how to organize them.&lt;/p&gt;

&lt;p&gt;If you're like me, coming from a JavaScript/TypeScript background, you know the language gives you a lot of freedom - no enforced folder structure, no prescribed architecture. You can organize your code however you want. But that freedom comes at a price.&lt;/p&gt;

&lt;p&gt;Without clear patterns to guide where things should go, you might end up with working code in all the wrong places. Calls to OpenAI API scattered everywhere, business logic tangled with your routes, and you just know this is going to be painful to maintain later.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcw45hz8q1lti7j8de7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcw45hz8q1lti7j8de7b.png" alt="XKCD comic"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
    &lt;a href="https://xkcd.com/2044/" rel="noopener noreferrer"&gt;The cost of 'just making it work'&lt;/a&gt;&lt;br&gt;
  &lt;/p&gt;

&lt;p&gt;Here's the thing: software architecture trends come and go. In 2015, everyone said microservices were the future. In 2018, serverless. In 2021, JAMstack. In 2023, everyone quietly went back to monoliths.&lt;/p&gt;

&lt;p&gt;But you know what remained constant through all these trends? The fundamental principle of separating concerns. These same software development principles apply to AI and agentic AI applications. Whether you're building traditional web apps or AI-powered systems, the need for clear architecture remains constant.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes AI projects maintainable
&lt;/h2&gt;

&lt;p&gt;Let's establish what properties a good AI project should have:&lt;/p&gt;

&lt;h3&gt;
  
  
  Clear Ownership Boundaries
&lt;/h3&gt;

&lt;p&gt;Clear ownership boundaries define which part of your system is responsible for what. A good boundary means that when something breaks or needs extension, you immediately know which component to check.&lt;/p&gt;

&lt;p&gt;Each distinct concern in your application should be handled by a single, well-defined module or component.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reusability across entry points
&lt;/h3&gt;

&lt;p&gt;Reusability means writing logic once and calling it from anywhere.&lt;/p&gt;

&lt;p&gt;Your core business logic should work the same way regardless of how it's triggered. Whether called from a web API, an AI agent, a scheduled job, a command-line tool, a message queue, or a test suite, the same functionality should be available without rewriting it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters&lt;/strong&gt;: Today it's a chat API. Tomorrow you may want a Slack bot. Next week, batch processing. And who knows? Maybe you'll discover that your users actually prefer the regular search over your fancy AI chatbot anyway. If your AI code is tied to, say, your controllers, you might have to rewrite it each time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testability
&lt;/h3&gt;

&lt;p&gt;Testability is the degree to which a piece of software can be tested easily and effectively. It describes how simple it is to check that the software works as intended. High testability means tests can be written quickly, run reliably, and give clear results, while low testability leads to tests that are difficult, slow, or unclear.&lt;/p&gt;

&lt;p&gt;AI applications have many moving parts. Vector search slow? Cache not hitting? Agent hallucinating? Embedding generation failing? When you're debugging, you shouldn't have to hunt through your entire codebase to find the problem.&lt;/p&gt;
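&lt;p&gt;This is where layering pays off: when domain logic only talks to a repository object, a test can hand it a stub and verify business rules without Redis, embeddings, or an LLM in the loop. A hypothetical sketch - the function and data below are invented for illustration:&lt;/p&gt;

```javascript
// Domain function under test: filters restaurants by budget.
// It depends on a repository object, not on any database client.
async function findBudgetRestaurants(repository, maxPrice) {
  const all = await repository.listRestaurants();
  return all.filter(function (r) {
    return maxPrice >= r.avgPrice;
  });
}

// In a test, a stub repository replaces the real data layer entirely.
const stubRepository = {
  async listRestaurants() {
    return [
      { name: "Dhaba Express", avgPrice: 400 },
      { name: "Le Jardin", avgPrice: 2500 },
    ];
  },
};
```

&lt;p&gt;Swap the real repository for &lt;code&gt;stubRepository&lt;/code&gt; and the test runs in milliseconds, failing only when the business rule itself is wrong.&lt;/p&gt;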

&lt;h3&gt;
  
  
  Provider Independence
&lt;/h3&gt;

&lt;p&gt;Swapping from GPT-4 to Claude to Gemini shouldn't require changing business logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it matters&lt;/strong&gt;: AI models evolve weekly. Today's best model is next month's deprecated one. Providers change pricing. Features get sunset. Your architecture should make provider switching very straightforward.&lt;/p&gt;
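&lt;p&gt;One way to get there is to hide each vendor SDK behind a small adapter that exposes a single shared function. Here's a minimal sketch - the provider objects and the &lt;code&gt;summarize()&lt;/code&gt; function are made up for illustration, not taken from the demo repo:&lt;/p&gt;

```javascript
// Each provider adapter exposes the same complete(prompt) function.
// Real adapters would call the vendor SDK; these are stubbed.
const openAIProvider = {
  async complete(prompt) {
    return "openai: " + prompt; // stand-in for an SDK call
  },
};

const geminiProvider = {
  async complete(prompt) {
    return "gemini: " + prompt; // stand-in for an SDK call
  },
};

// Domain logic depends only on the shared shape, never on a vendor SDK.
async function summarize(provider, text) {
  return provider.complete("Summarize: " + text);
}

// Switching providers is one line of wiring, not a business-logic change.
const activeProvider =
  process.env.LLM_PROVIDER === "gemini" ? geminiProvider : openAIProvider;
```

&lt;p&gt;Because &lt;code&gt;summarize()&lt;/code&gt; only knows about &lt;code&gt;complete()&lt;/code&gt;, deprecating a model or changing vendors touches the adapter layer alone.&lt;/p&gt;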




&lt;p&gt;If you're a software engineer who has started building AI applications and you want to move beyond simple code snippets and demos, let me show you what worked for me. In the following sections, I'll walk you through patterns I extracted from a real project - patterns that apply whether you're building e-commerce, SaaS tools, content platforms, or any AI-powered application.&lt;/p&gt;

&lt;p&gt;For illustration, I'll use an AI-powered restaurant discovery application. Think of a platform where users search for restaurants, browse by cuisine or location, and make reservations - the typical flow you'd see in apps like Zomato or Yelp. Now imagine you decide to add a chat feature to make restaurant discovery more conversational - a dining assistant in a corner chat window that enables finding the perfect restaurant through natural conversation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqud2lsewg66yjrrap737.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqud2lsewg66yjrrap737.png" alt="A simplified AI powered restaurant discovery app"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;br&gt;
    &lt;a href="https://github.com/redis-developer/restaurant-discovery-ai-agent-demo" rel="noopener noreferrer"&gt;View the implementation on GitHub&lt;/a&gt;&lt;br&gt;
  &lt;/p&gt;

&lt;p&gt;So instead of just filtering by &lt;em&gt;"Italian cuisine"&lt;/em&gt;, diners can now also ask questions like &lt;em&gt;"I need a romantic spot with live music for an anniversary dinner"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What this really comes down to:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;How do you add these AI capabilities without major refactoring of your existing system? And without over-engineering a solution that's way more complex than it needs to be?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Throughout the code examples, I will use Node.js / JavaScript as its syntax is widely familiar to anyone building apps for the web and generalizes well to other languages. But these architectural patterns apply equally to Python, Java, or any other language you're working with.&lt;/p&gt;

&lt;h2&gt;
  
  
  Architectural patterns for AI apps
&lt;/h2&gt;

&lt;p&gt;After some research and quite a few experiments with different approaches, I settled on a few patterns that actually work. Before we dive in, know that these aren't necessarily AI-specific. They're the same principles that make any large application maintainable. But when you build with these patterns from the start, adding and testing AI features later becomes straightforward.&lt;/p&gt;

&lt;p&gt;Here's what they look like:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern 1: Structure by business components or modules
&lt;/h3&gt;

&lt;p&gt;Rather than organizing code by technical function (all controllers together, all models together), organize by business components. Each module represents a bounded context with its own API, logic, and data access.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why This Matters for AI&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;AI applications typically have multiple distinct domains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conversation management (chat history, session state)&lt;/li&gt;
&lt;li&gt;AI workflow orchestration (agents, tools, prompts)&lt;/li&gt;
&lt;li&gt;Business entities (restaurants, users, reservations)&lt;/li&gt;
&lt;li&gt;Data processing (embeddings, vector search)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mixing these concerns creates cognitive overload and makes it hard to reason about the system and maintain it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Bad: Organized by technical layers&lt;/span&gt;

&lt;span class="nx"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;controllers&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;chatController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;restaurantController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;reservationController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;services&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;restaurantService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;reservationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;repositories&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;chatRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
    &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;reservationRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Problem: To understand the "chat" feature, you jump between three different directories. Adding a new feature touches files across the entire codebase.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Colocation&lt;/strong&gt;: For a feature, put related code close together. Code that changes for the same feature should be neighbors and a short navigation away.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is also popularly known as &lt;strong&gt;"domain-driven design"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When you add a new feature to the chat system, you typically need to modify the API endpoint, update the business logic, and adjust the data access layer. With domain-driven design, all these files are in the same &lt;code&gt;chat/&lt;/code&gt; directory - you never leave that folder. Without it, you're jumping between &lt;code&gt;controllers/&lt;/code&gt;, &lt;code&gt;services/&lt;/code&gt;, and &lt;code&gt;repositories/&lt;/code&gt; directories, trying to remember which pieces connect.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrjchqqpzuttzq6zyasl.png" alt="Domain-Driven Design Architecture showing chat, restaurants, and reservations modules"&gt;Each domain is self-contained with its own API, Domain, and Data layers
  


&lt;p&gt;Benefit: Everything related to "chat" lives in one place. Each business component is self-contained.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good: Organized by business components&lt;/span&gt;

&lt;span class="nx"&gt;modules&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;chatController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;chatRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;restaurants&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;restaurantController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;restaurantService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;│&lt;/span&gt;   &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;reservations&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;api&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;reservationController&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
    &lt;span class="err"&gt;├──&lt;/span&gt; &lt;span class="nx"&gt;service&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;reservationService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
    &lt;span class="err"&gt;└──&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;reservationRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ts&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;When you need to modify how restaurants are searched, you go directly to &lt;code&gt;modules/restaurants/&lt;/code&gt;. When you need to add a new AI tool, it goes in &lt;code&gt;modules/ai/agentic/tools&lt;/code&gt;. There's no guessing, no hunting through dozens of files.&lt;/p&gt;

&lt;p&gt;This also means you could extract any of these services into a separate microservice later without major refactoring - the boundaries are already defined.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pattern 2: Layer your feature modules with 3-tier architecture
&lt;/h3&gt;

&lt;p&gt;While Pattern 1 is about grouping by business domain, Pattern 2 applies the same concept of &lt;em&gt;colocation&lt;/em&gt; &lt;em&gt;within&lt;/em&gt; each domain. The API, domain, and data layers for a feature stay together in the same feature module folder, not scattered across the codebase.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsqiat09pmg9v4ofmtjqk.png" alt="Three-tier architecture pattern showing entry points, domain logic, and data access layers"&gt;

  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujddszzifba01n7ztpcf.png" alt="Three-tier architecture pattern showing entry points, domain logic, and data access layers"&gt;

  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiias83cq8nwmtx6vljf9.png" alt="Three-tier architecture pattern showing entry points, domain logic, and data access layers"&gt;Three-tier architecture
  


&lt;p&gt;Within each module, maintain clear separation between three concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Entry Points&lt;/strong&gt; (API routes, message queue consumers, scheduled jobs)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Logic&lt;/strong&gt; (business rules, workflows, services)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data Access&lt;/strong&gt; (database queries, external API calls) - this is also called the &lt;em&gt;repository pattern&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsoojgm0ismted7g8by0.png" alt="Restaurants module three-tier architecture with controller, service, and repository layers"&gt;



  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffwm8h5jfoubd1gdy05dk.png" alt="Reservations module three-tier architecture with controller, service, and repository layers"&gt;


&lt;p&gt;&lt;strong&gt;The Critical Rule:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Never pass framework-specific objects (Express &lt;code&gt;Request&lt;/code&gt;/&lt;code&gt;Response&lt;/code&gt;, HTTP headers, etc.) into your domain layer. Domain logic should be pure and reusable across different entry points.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Don't do this: Domain logic coupled to Express&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="c1"&gt;// AI workflow logic mixed with HTTP handling&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;aiAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The problem with the code above is that the function only works with Express. But AI workflows often need to be triggered from multiple sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP API requests (user chat)&lt;/li&gt;
&lt;li&gt;Scheduled jobs (batch processing)&lt;/li&gt;
&lt;li&gt;Message queues (async workflows)&lt;/li&gt;
&lt;li&gt;Tests (validation)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your AI logic is tightly coupled to your HTTP layer, you can't reuse it elsewhere, and you won't be able to call it from a scheduled job, test, or CLI tool.&lt;/p&gt;

&lt;p&gt;Your AI workflow orchestration should be completely separate from the HTTP layer. Here's an example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good: Clean separation&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;express&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getResponseFromAgent&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modules/chat&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;body&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getResponseFromAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ business logic - no HTTP dependencies&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getResponseFromAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;aiAgent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;toolsUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tools&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now, &lt;code&gt;getResponseFromAgent()&lt;/code&gt; can be called from anywhere - HTTP endpoints, scheduled jobs, tests, or CLI scripts. &lt;/p&gt;

&lt;p&gt;The API layer now focuses only on handling HTTP concerns - receiving the request, extracting the session ID and message, and returning a response - while delegating all business logic to the domain layer.&lt;/p&gt;
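&lt;p&gt;For example, a batch job or CLI script can call the same domain function with no Express in sight. A hypothetical sketch with &lt;code&gt;getResponseFromAgent()&lt;/code&gt; stubbed out, since only the call shape matters here:&lt;/p&gt;

```javascript
// A hypothetical CLI entry point reusing the same domain function - no Express.
// getResponseFromAgent is stubbed here; the real one lives in the chat module.
async function getResponseFromAgent(message, sessionId) {
  // stand-in for the real agent call
  return { response: "echo: " + message, toolsUsed: [] };
}

async function runFromCli(message) {
  // same call shape as the HTTP controller, minus Request/Response
  const sessionId = "cli-session-" + Date.now();
  const result = await getResponseFromAgent(message, sessionId);
  return result.response;
}
```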

&lt;p&gt;Similarly, use the &lt;a href="https://martinfowler.com/eaaCatalog/repository.html" rel="noopener noreferrer"&gt;Repository pattern&lt;/a&gt; to prevent database details from leaking into business logic:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Don't put this in your services / domain logic&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redisClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;idx:restaurants&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="c1"&gt;// transform query response...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorSearchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ restaurant-repository.ts&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RestaurantRepository&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;vectorSearchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;searchVector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Redis-specific implementation hidden&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="cm"&gt;/* ... */&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transformToRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The domain layer remains pure and independent of HTTP and database implementation, returning structured results. Now if you need to switch databases based on performance or cost, your domain services don't change - only the repository implementation does.&lt;/p&gt;
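&lt;p&gt;Concretely, a database swap means writing a second repository with the same method names and changing the wiring. A minimal sketch using in-memory stand-ins - the real implementations would wrap Redis or another database client:&lt;/p&gt;

```javascript
// Two interchangeable repositories: the domain layer calls
// vectorSearchRestaurants on whichever one it is given.
const redisRestaurantRepository = {
  async vectorSearchRestaurants(searchVector) {
    // real version: redis.ft.search(...) plus result transformation
    return [{ name: "from-redis", dimensions: searchVector.length }];
  },
};

const postgresRestaurantRepository = {
  async vectorSearchRestaurants(searchVector) {
    // real version: a pgvector query plus result transformation
    return [{ name: "from-postgres", dimensions: searchVector.length }];
  },
};

// The domain service is written once, against the shared shape.
async function searchRestaurants(repository, queryEmbeddings) {
  return repository.vectorSearchRestaurants(queryEmbeddings);
}
```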
&lt;h3&gt;
  
  
  Pattern 3: Tools and prompts should call domain logic, not implement it
&lt;/h3&gt;

&lt;p&gt;When you're building AI agents with LangChain, LangGraph, or similar frameworks, you define "tools" - functions the AI can call to perform actions. Need to search products? Create a tool for that. Add items to cart? Tool. Get user preferences? Tool.&lt;/p&gt;

&lt;p&gt;There are two common anti-patterns where business logic ends up in the wrong place:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Anti-pattern 1: Business logic in prompts&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`
You are a restaurant discovery assistant. Follow these rules:
1. Only show budget-friendly restaurants (under ₹500 per person) for budget users
2. Apply member discounts on reservations for gold members
3. Suggest fine dining establishments to premium members
...
`&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Problem: Your business rules now live in natural language. They're non-deterministic, untestable, and invisible to code review.&lt;/p&gt;
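&lt;p&gt;The deterministic alternative is to keep the rule in code, where it's testable and visible to code review, and let the prompt describe behavior rather than enforce policy. A hypothetical sketch of the budget rule from the prompt above (the tier names and threshold are illustrative):&lt;/p&gt;

```javascript
// The same "budget under ₹500 per person" rule, as testable code
// instead of prose in a prompt. Tiers and prices are illustrative.
function filterForUser(user, restaurants) {
  if (user.tier === "budget") {
    return restaurants.filter(function (r) {
      return 500 > r.pricePerPerson;
    });
  }
  return restaurants;
}
```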

&lt;p&gt;&lt;strong&gt;Anti-pattern 2: Business logic in tools&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;It feels natural to write business logic directly in tool functions. During a conversation in the chat interface, the user wants to search products, so you write the search logic right there in the tool. You need database access, so you import the database client. You need to validate the search query, calculate relevance scores, apply business rules - all of it goes into the tool.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Bad: Business logic inside AI tool&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/core/tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;redis&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchRestaurantsTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Database access directly in tool&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;createClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ft&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;idx:restaurants&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...);&lt;/span&gt;

    &lt;span class="c1"&gt;// Business logic in tool&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;filtered&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;priceFor2&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;2000&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;filtered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_restaurants&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Search for restaurants&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The problem with the above code is that the core logic is locked inside the AI tool. A few months later, you might realize you need that same search logic in a REST API endpoint, a scheduled job, or a different agent - but it's tightly coupled to the AI framework you used.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Tools should be thin wrappers that translate AI intent into domain service calls.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's it. The tool's job is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Receive parameters from the AI&lt;/li&gt;
&lt;li&gt;Validate/transform them if needed&lt;/li&gt;
&lt;li&gt;Call the appropriate domain service&lt;/li&gt;
&lt;li&gt;Return the result in a format the AI understands&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The actual business logic? That lives in domain services, completely independent of any AI framework.&lt;/p&gt;


  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn90q9ju5bukzkkt25l3m.png" alt="AI Tools Pattern"&gt;AI tools as thin wrappers calling domain services
  


&lt;p&gt;Here's how you can write it:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good: logic lives separately&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchVector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;restaurants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorSearchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;searchVector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;limit&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Tool simply calls domain service&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;searchRestaurants&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modules/restaurants&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;tool&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/core/tools&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchRestaurantsTool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Just a thin wrapper&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;restaurants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;search_restaurants&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Search for restaurants&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Benefit: Your business logic is now independent - plain functions that take input and return output, reusable anywhere. &lt;code&gt;searchRestaurants()&lt;/code&gt; can now be called from AI tools, HTTP endpoints, CLI scripts, or tests.&lt;/p&gt;

&lt;p&gt;Ask yourself: &lt;em&gt;"If I needed this logic in a REST API endpoint tomorrow, would I have to copy-paste code or could I just import a function?"&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If the answer is copy-paste, your logic is in the wrong place.&lt;/p&gt;
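&lt;p&gt;To make that concrete, here is a hedged sketch of the same domain function backing a REST-style handler with no copy-paste, just an import. The stub below stands in for the real &lt;code&gt;searchRestaurants&lt;/code&gt; import; all names are illustrative.&lt;/p&gt;

```typescript
// Hypothetical sketch: reusing the domain function outside the AI tool,
// behind a plain HTTP-style handler.
type Restaurant = { name: string; rating: number };

// Stand-in for the real domain service import:
// import { searchRestaurants } from '@modules/restaurants';
async function searchRestaurants(opts: { query: string; limit: number }): Promise<Restaurant[]> {
  return [{ name: 'Demo Diner', rating: 4.5 }].slice(0, opts.limit);
}

// REST-style handler: same logic the AI tool calls, no duplication.
export async function handleSearchRequest(query: string) {
  const restaurants = await searchRestaurants({ query, limit: 5 });
  return { status: 200, body: restaurants };
}
```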

&lt;p&gt;When your tools are thin adapters to domain services, you can test the core behavior locally without calling the LLM at all. This makes tests fast, reliable, and deterministic. Your code becomes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reusable across HTTP, CLI, scheduled jobs, tests&lt;/li&gt;
&lt;li&gt;Testable without AI framework mocking&lt;/li&gt;
&lt;li&gt;Maintainable because business logic lives in one place&lt;/li&gt;
&lt;li&gt;Flexible because you can change AI frameworks without rewriting business logic&lt;/li&gt;
&lt;/ul&gt;
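&lt;p&gt;As a sketch of what such a test can look like (names hypothetical), the sorting rule can be verified with the repository stubbed out - no LLM call, no framework, just input and output:&lt;/p&gt;

```typescript
// Hypothetical sketch: testing domain behavior with a stubbed repository.
interface Restaurant {
  name: string;
  rating: number;
}

// Stand-in for restaurantRepository.vectorSearchRestaurants in tests:
async function stubVectorSearch(): Promise<Restaurant[]> {
  return [
    { name: 'A', rating: 4.1 },
    { name: 'B', rating: 4.8 },
  ];
}

// The core behavior under test: highest-rated results come first.
async function searchRestaurantsCore(fetchRestaurants: () => Promise<Restaurant[]>) {
  const results = await fetchRestaurants();
  return results.sort((a, b) => b.rating - a.rating);
}
```

&lt;p&gt;Because nothing here touches the network or a model, the test runs in milliseconds and never flakes.&lt;/p&gt;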
&lt;h3&gt;
  
  
  Pattern 4: Dependency Inversion
&lt;/h3&gt;

&lt;p&gt;High-level policy (your business logic) should not depend on low-level details (specific frameworks, providers, or databases). This follows the &lt;a href="https://en.wikipedia.org/wiki/Dependency_inversion_principle" rel="noopener noreferrer"&gt;Dependency Inversion Principle&lt;/a&gt; from Robert C. Martin's SOLID principles: your code should depend on abstractions, not on concrete implementations.&lt;/p&gt;

&lt;p&gt;In simpler terms: when you use external services (whether OpenAI, AWS Bedrock, LangChain, or Redis), hide them behind interfaces that match &lt;em&gt;your domain's&lt;/em&gt; needs, not theirs.&lt;/p&gt;
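&lt;p&gt;A minimal sketch of what such a domain-shaped interface might look like (the &lt;code&gt;EmbeddingProvider&lt;/code&gt; name and fake implementation are assumptions for illustration, not from any SDK):&lt;/p&gt;

```typescript
// Hypothetical sketch: the interface speaks your domain's language
// ("generate embeddings for these texts"), not the vendor's
// ("call the v1/embeddings endpoint with this model id").
export interface EmbeddingProvider {
  generateEmbeddings(texts: string[]): Promise<number[][]>;
}

// Any provider (OpenAI, Bedrock, or a test fake) can satisfy it.
export const fakeEmbeddings: EmbeddingProvider = {
  async generateEmbeddings(texts) {
    return texts.map(() => [0, 0, 0]); // fixed vectors, handy in tests
  },
};
```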

&lt;p&gt;&lt;strong&gt;Why this matters for AI:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AI applications introduce volatile dependencies that change frequently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AI Providers switch constantly&lt;/strong&gt;: Today's OpenAI becomes tomorrow's Anthropic or AWS Bedrock&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Models evolve rapidly&lt;/strong&gt;: GPT-4 becomes GPT-4 Turbo becomes GPT-4o&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Frameworks shift&lt;/strong&gt;: LangChain might become LangGraph, or you might switch to a different agentic framework entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector databases compete&lt;/strong&gt;: Pinecone, Weaviate, Redis Vector Search - requirements change.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without DIP, switching any of these requires changes across your entire codebase. With DIP, you change one file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Real-world example: switching AI providers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Consider an embeddings service. Without DIP, you'd scatter OpenAI SDK calls throughout your codebase:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Bad: Tightly coupled to OpenAI everywhere&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_KEY&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="c1"&gt;// search logic using embedding...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now imagine your company decides to switch to AWS Bedrock for cost savings. You'd need to find and modify every place that calls OpenAI's embedding API.&lt;/p&gt;

&lt;p&gt;With DIP, high-level policy does not depend on low-level details:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good: embeddings.ts&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// Implementation hidden behind interface&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Generate embedding for the search query&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;textEmbeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;

    &lt;span class="c1"&gt;// Use repository for pure data operations&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorSearchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;textEmbeddings&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The implementation file changes based on your provider (here, a Bedrock version followed by an OpenAI version), but the &lt;em&gt;interface&lt;/em&gt; stays the same:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;BedrockEmbeddings&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/aws&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;BedrockEmbeddings&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;awsRegion&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embedQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// openai-version/embeddings.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;OpenAI&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;openai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;CONFIG&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;openAiApiKey&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text-embedding-3-small&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;texts&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Your domain services import from the abstraction:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generateEmbeddings&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modules/ai/helpers&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;findRestaurantsBySemanticSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateEmbeddings&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;searchVector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;queryEmbeddings&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;restaurantRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorSearchRestaurants&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;searchVector&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Switching providers now means changing one file&lt;/strong&gt;, not hunting through your entire codebase. The LLM becomes just one dependency behind a clear interface, which we can replace with mocks or fixtures during testing.&lt;/p&gt;
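&lt;p&gt;One way to wire that single point of change, sketched with stand-in implementations (the provider functions below are placeholders, not real SDK calls, and &lt;code&gt;activeProvider&lt;/code&gt; is a hypothetical config reader):&lt;/p&gt;

```typescript
// Hypothetical sketch: one seam decides the provider; callers never change.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Placeholders standing in for the real OpenAI and Bedrock implementations:
const openAiEmbeddings: EmbedFn = async (texts) => texts.map(() => [1, 0]);
const bedrockEmbeddings: EmbedFn = async (texts) => texts.map(() => [0, 1]);

// In a real app this would read CONFIG or an environment variable.
function activeProvider(): 'openai' | 'bedrock' {
  return 'openai';
}

// The single seam you touch when switching providers:
export const generateEmbeddings: EmbedFn =
  activeProvider() === 'bedrock' ? bedrockEmbeddings : openAiEmbeddings;
```

&lt;p&gt;Everything else in the codebase keeps importing &lt;code&gt;generateEmbeddings&lt;/code&gt; and is none the wiser.&lt;/p&gt;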

&lt;p&gt;&lt;strong&gt;Example: Abstracting agent frameworks&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same principle applies to agentic AI frameworks. Your business logic shouldn't know whether you're using LangGraph, CrewAI, or AutoGen.&lt;/p&gt;

&lt;p&gt;Instead of spreading LangGraph-specific code everywhere, isolate your agentic AI workflow implementation in its own module:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;HumanMessage&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@langchain/core/messages&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;restaurantReservationsWorkflow&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@modules/ai/workflows&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;processUserQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;restaurantReservationsWorkflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;HumanMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
    &lt;span class="nx"&gt;sessionId&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;cacheStatus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;cacheStatus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;toolsUsed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolsUsed&lt;/span&gt;
  &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With Dependency Inversion, you can experiment with different models without touching business logic and move between frameworks gradually as your needs evolve. It does introduce extra abstraction, which can sometimes feel like over-engineering, but it's most valuable when dependencies are volatile, you’re evaluating multiple options, or vendor lock-in is a real risk.&lt;/p&gt;
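&lt;p&gt;As a minimal sketch of that inversion (the names below are illustrative, not taken from the demo project), the business logic depends only on a &lt;code&gt;generate()&lt;/code&gt; contract, and providers become swappable stubs:&lt;/p&gt;

```javascript
// Hypothetical sketch of Dependency Inversion for model providers.
// Names like `answerQuestion`, `fakeOpenAi`, and `fakeBedrock` are
// illustrative, not part of the article's demo codebase.

// The business logic depends only on this abstraction: anything with
// a `generate(prompt)` method qualifies as a chat model.
function answerQuestion(model, question) {
  return model.generate(`Answer briefly: ${question}`);
}

// Two interchangeable implementations. In a real app these would wrap
// the OpenAI and Bedrock SDKs; here they are stubs for illustration.
const fakeOpenAi = {
  generate(prompt) { return `[openai] ${prompt}`; }
};
const fakeBedrock = {
  generate(prompt) { return `[bedrock] ${prompt}`; }
};

// Swapping providers does not touch the business logic.
console.log(answerQuestion(fakeOpenAi, 'What is Redis?'));
console.log(answerQuestion(fakeBedrock, 'What is Redis?'));
```

&lt;p&gt;Swapping a provider is now a one-line change at the call site; the business logic never imports an SDK.&lt;/p&gt;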
&lt;h3&gt;
  
  
  Pattern 5: Use environment-aware, secure, and hierarchical config
&lt;/h3&gt;

&lt;p&gt;AI applications have complex configuration needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple AI provider credentials (OpenAI, Anthropic, AWS), model identifiers and versions&lt;/li&gt;
&lt;li&gt;Rate limits and timeouts&lt;/li&gt;
&lt;li&gt;Feature flags for different AI capabilities&lt;/li&gt;
&lt;li&gt;Vector database connections&lt;/li&gt;
&lt;li&gt;Caching configuration&lt;/li&gt;
&lt;li&gt;Guardrail settings&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So your configuration should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Environment-aware&lt;/strong&gt;: Different values for dev/staging/production&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Secure&lt;/strong&gt;: Secrets never committed to version control&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validated&lt;/strong&gt;: Fail fast on startup if config is invalid&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical&lt;/strong&gt;: Organized for easy discovery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type-safe (optional)&lt;/strong&gt;: Preferably with TypeScript or runtime validation
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ❌ Bad: Secrets hardcoded, scattered config&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;openai&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sk-abc123...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Hardcoded secret!&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;bedrockModel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;anthropic.claude-v2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Magic string&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;redisHost&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;localhost&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;  &lt;span class="c1"&gt;// Where's production config?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The problem with the snippet above is that configuration is scattered everywhere, and there’s no clean way to switch between models or environments.&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// ✅ Good: Centralized, validated, environment-aware&lt;/span&gt;
&lt;span class="c1"&gt;// config.js&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="nx"&gt;dotenv&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;dotenv&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="nx"&gt;dotenv&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;config&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;requiredEnvVars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;

&lt;span class="c1"&gt;// Fail fast on startup if config is missing&lt;/span&gt;
&lt;span class="nx"&gt;requiredEnvVars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;varName&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;varName&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Missing required environment variable: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;varName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="c1"&gt;// AI Providers&lt;/span&gt;
  &lt;span class="na"&gt;openAi&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;apiKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_MODEL&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-4o&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OPENAI_TIMEOUT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;30000&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;aws&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;region&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;us-east-1&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;accessKeyId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;secretAccessKey&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_SECRET_ACCESS_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;bedrockModelId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;BEDROCK_MODEL_ID&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;anthropic.claude-v2&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="c1"&gt;// Databases&lt;/span&gt;
  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIS_PORT&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;6379&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;REDIS_PASSWORD&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# .env (never committed to git)&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-abc123...
&lt;span class="nv"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AKIA...
&lt;span class="nv"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;localhost

&lt;span class="c"&gt;# .env.production (deployed separately)&lt;/span&gt;
&lt;span class="nv"&gt;OPENAI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sk-prod123...
&lt;span class="nv"&gt;AWS_ACCESS_KEY_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;AKIA...
&lt;span class="nv"&gt;REDIS_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;redis.production.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The benefit now is that all configuration lives in one place, it's automatically validated on startup, switching between environments is easy, and secrets are no longer hardcoded in the codebase.&lt;/p&gt;
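&lt;p&gt;Presence checks catch missing variables, but value validation catches misconfiguration too. A hedged sketch, assuming a hypothetical &lt;code&gt;requireIntInRange&lt;/code&gt; helper (not part of any library), could extend the config module above:&lt;/p&gt;

```javascript
// Hypothetical extension of the config module: validate values, not
// just presence. `requireIntInRange` is an illustrative helper.
function requireIntInRange(name, raw, min, max, fallback) {
  const value = parseInt(raw ?? String(fallback), 10);
  if (Number.isNaN(value) || value < min || value > max) {
    throw new Error(`${name} must be an integer between ${min} and ${max}`);
  }
  return value;
}

// Fail fast on startup with a precise message instead of a confusing
// runtime error later (e.g. a Redis client silently using port NaN).
const redisPort = requireIntInRange('REDIS_PORT', process.env.REDIS_PORT, 1, 65535, 6379);
```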
&lt;h3&gt;
  
  
  Pattern 6: Separate persistent data from agent memory
&lt;/h3&gt;

&lt;p&gt;AI agents need to remember things during conversations, but not everything should live in your main database. User preferences? Database. The fact that someone just asked about "cozy sweaters" 30 seconds ago? Agent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for AI:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mixing persistent and ephemeral data leads to bloated databases and slow AI responses. Different types of data have different storage requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Database&lt;/strong&gt;: User profiles, restaurant catalog, reservation history - things that need to persist indefinitely&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent Memory&lt;/strong&gt;: Conversation context, temporary preferences, session state - things that can expire&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Store&lt;/strong&gt;: Restaurant embeddings, semantic search indexes - specialized AI data structures&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The Implementation:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;services/
├── chat/
│   ├── services/
│   │   ├── workflow.js         &lt;span class="c"&gt;# LangGraph orchestration&lt;/span&gt;
│   │   └── memory.js           &lt;span class="c"&gt;# Session state, conversation context&lt;/span&gt;
│   └── data/
│       └── session-store.js    &lt;span class="c"&gt;# Fast, ephemeral storage&lt;/span&gt;
├── restaurants/
│   └── data/
│       └── restaurant-repository.js    &lt;span class="c"&gt;# Persistent restaurant vector store&lt;/span&gt;
└── &lt;span class="nb"&gt;users&lt;/span&gt;/
    └── data/
        └── user-store.js       &lt;span class="c"&gt;# User profiles, preferences&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;&lt;strong&gt;Benefits:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Faster AI responses (no database queries for temporary data)&lt;/li&gt;
&lt;li&gt;Cleaner main database (only persistent data)&lt;/li&gt;
&lt;li&gt;Automatic cleanup (session data expires)&lt;/li&gt;
&lt;li&gt;Better performance (right storage for each data type)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Getting the storage layer right is crucial for AI performance.&lt;/p&gt;
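&lt;p&gt;As a minimal sketch of the ephemeral side, assume an illustrative &lt;code&gt;SessionStore&lt;/code&gt; class. In the article's stack this would be backed by Redis key expiry (&lt;code&gt;SET key value EX seconds&lt;/code&gt;); a &lt;code&gt;Map&lt;/code&gt; stands in here so the example is self-contained:&lt;/p&gt;

```javascript
// Minimal sketch of an ephemeral session store with TTL semantics.
// A Map stands in for Redis so the sketch is self-contained; all
// names are illustrative.
class SessionStore {
  constructor(now = Date.now) {
    this.now = now;           // injectable clock, handy for tests
    this.entries = new Map(); // sessionId -> { value, expiresAt }
  }

  set(sessionId, value, ttlMs) {
    this.entries.set(sessionId, { value, expiresAt: this.now() + ttlMs });
  }

  get(sessionId) {
    const entry = this.entries.get(sessionId);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) { // expired: behave like Redis TTL
      this.entries.delete(sessionId);
      return undefined;
    }
    return entry.value;
  }
}

// Conversation context expires on its own; user profiles would live
// in the persistent user-store instead.
const store = new SessionStore();
store.set('session-42', { lastQuery: 'cozy sweaters' }, 30000);
```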

&lt;p&gt;Check out &lt;strong&gt;&lt;a href="https://university.redis.io/learningpath/hbykf3qrnhwccy?tab=details" rel="noopener noreferrer"&gt;Redis for AI learning path&lt;/a&gt;&lt;/strong&gt; for hands-on experience implementing these memory patterns and vector search capabilities.&lt;/p&gt;



&lt;p&gt;Your final project architecture could look something like this:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;modules/
│
├── restaurants/             &lt;span class="c"&gt;# Restaurants Domain&lt;/span&gt;
│   ├── api/                   &lt;span class="c"&gt;# HTTP layer&lt;/span&gt;
│   ├── service/                &lt;span class="c"&gt;# Business logic&lt;/span&gt;
│   └── data/                  &lt;span class="c"&gt;# Data access&lt;/span&gt;
│
├── reservations/            &lt;span class="c"&gt;# Reservations Domain&lt;/span&gt;
│   ├── api/
│   ├── service/
│   └── data/
│
├── chat/                    &lt;span class="c"&gt;# Conversation Domain&lt;/span&gt;
│   ├── api/
│   ├── service/
│   └── data/
├── ai/                      &lt;span class="c"&gt;# AI Domain&lt;/span&gt;
│   ├── agentic-restaurant-workflow/
│   │   ├── index.js          &lt;span class="c"&gt;# Workflow orchestration&lt;/span&gt;
│   │   ├── nodes.js          &lt;span class="c"&gt;# Agent definitions&lt;/span&gt;
│   │   ├── tools.js          &lt;span class="c"&gt;# AI tools&lt;/span&gt;
│   │   └── state.js          &lt;span class="c"&gt;# State management&lt;/span&gt;
│   └── helpers/
│       ├── embeddings.js     &lt;span class="c"&gt;# Embedding generation&lt;/span&gt;
│       └── caching.js        &lt;span class="c"&gt;# Cache logic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgal43ra1n3vehd9fsvg9.png" alt="Complete Architecture with All Patterns"&gt;Complete architecture showing all six patterns working together
  


&lt;p&gt;Check out the example implementation on GitHub!&lt;/p&gt;

&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/redis-developer" rel="noopener noreferrer"&gt;
        redis-developer
      &lt;/a&gt; / &lt;a href="https://github.com/redis-developer/restaurant-discovery-ai-agent-demo" rel="noopener noreferrer"&gt;
        restaurant-discovery-ai-agent-demo
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      An Agentic AI restaurant discovery platform that combines Redis's speed with LangGraph's intelligent workflow orchestration. Get personalized restaurant recommendations, make reservations, and get lightning-fast responses through semantic caching.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;🍽️ Restaurant Discovery AI Agent&lt;/h1&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Redis-powered restaurant discovery with intelligent dining assistance.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An AI-powered restaurant discovery platform that combines Redis's speed with LangGraph's intelligent workflow orchestration. Get personalized restaurant recommendations, smart dining suggestions, and lightning-fast responses through semantic caching.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;App screenshots&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a rel="noopener noreferrer" href="https://github.com/redis-developer/restaurant-discovery-ai-agent-demo/./docs/screenshots/home-screen.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fgithub.com%2Fredis-developer%2Frestaurant-discovery-ai-agent-demo%2F.%2Fdocs%2Fscreenshots%2Fhome-screen.png" alt="App home page"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Tech Stack&lt;/h2&gt;
&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://nodejs.org/" rel="nofollow noopener noreferrer"&gt;Node.js (v24+)&lt;/a&gt;&lt;/strong&gt; + &lt;strong&gt;Express&lt;/strong&gt; - Backend runtime and API framework&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://redis.io" rel="nofollow noopener noreferrer"&gt;Redis&lt;/a&gt;&lt;/strong&gt; - Restaurant store, agentic AI memory, conversational history, and semantic caching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://redis.io/langcache/" rel="nofollow noopener noreferrer"&gt;Redis LangCache API&lt;/a&gt;&lt;/strong&gt; - Semantic caching for LLM responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LangGraph&lt;/strong&gt; - AI workflow orchestration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://platform.openai.com/account/api-keys" rel="nofollow noopener noreferrer"&gt;OpenAI API&lt;/a&gt;&lt;/strong&gt; - GPT-4 for intelligent responses and embeddings for vector search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML + CSS + Vanilla JS&lt;/strong&gt; - Frontend UI&lt;/li&gt;
&lt;/ul&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Product Features&lt;/h2&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smart Restaurant Discovery&lt;/strong&gt;: AI-powered assistant helps you find restaurants, discover cuisines, and manage your reservations. Both text and vector-based search across restaurants&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dining Intelligence&lt;/strong&gt;: Get restaurant recommendations with detailed information for any cuisine or occasion using RAG (Retrieval Augmented Generation)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo Reservation System&lt;/strong&gt;: Reservation management…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/redis-developer/restaurant-discovery-ai-agent-demo" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;h2&gt;
  
  
  Next steps
&lt;/h2&gt;

&lt;p&gt;Applied AI is quite fluid today - patterns, frameworks, and libraries are changing constantly. Teams are expected to deliver features fast. In this environment, your architecture needs flexibility. You can't afford to lock your code into rigid structures that make change expensive.&lt;/p&gt;

&lt;p&gt;I have been working with the patterns we discussed for quite some time, and they have served me well. Remember, these are not rigid rules. Learn what works for your project, and adapt. Think of your project not as a monolithic whole, but as independent, composable features with clear interfaces:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If your code is scattered across technical layers, start grouping it by domain.&lt;/li&gt;
&lt;li&gt;If your AI tools contain database queries, extract them into domain services.&lt;/li&gt;
&lt;li&gt;If you're tightly coupled to OpenAI, add an abstraction layer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're working in the Node.js ecosystem, frameworks like &lt;a href="https://nestjs.com/" rel="noopener noreferrer"&gt;NestJS&lt;/a&gt; can help you implement many of these patterns out of the box - modules, dependency injection, layered architecture. But here's the thing: these patterns aren't tied to any specific framework or even any specific language.&lt;/p&gt;

&lt;p&gt;Choose the tools that work for your team, and these patterns will help you organize your code so that refactoring becomes a routine, low-risk activity. They will also pave the way to full-blown microservices once your app grows. You'll likely refactor anyway after you've built something real.&lt;/p&gt;




&lt;p&gt;Thanks to &lt;a href="https://www.linkedin.com/in/raphaeldelio" rel="noopener noreferrer"&gt;Raphael De Lio&lt;/a&gt; and &lt;a href="https://www.linkedin.com/in/bhavana-anant-giri-0a43109b" rel="noopener noreferrer"&gt;Bhavana Giri&lt;/a&gt; for the detailed technical review and great suggestions on the article's structure.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>javascript</category>
      <category>programming</category>
    </item>
    <item>
      <title>From PostgreSQL to Redis: Accelerating Your Applications with Redis Data Integration</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Wed, 17 Dec 2025 01:57:25 +0000</pubDate>
      <link>https://dev.to/redis/from-postgresql-to-redis-accelerating-your-applications-with-redis-data-integration-3dn2</link>
      <guid>https://dev.to/redis/from-postgresql-to-redis-accelerating-your-applications-with-redis-data-integration-3dn2</guid>
      <description>&lt;p&gt;Here's a statistic that might surprise you: 90% of all relational OLTP workloads are pure reads. Let that sink in. Nine out of ten database operations in your transactional system are simply fetching data, not modifying it. Yet these reads are competing for the same resources as your critical write operations. Resources like CPU, disk I/O, and network bandwidth.&lt;/p&gt;

&lt;p&gt;Let me illustrate the impact of this with a practical example. Say you are responsible for an e-commerce platform. Orders are flowing in, customers are browsing products, and your PostgreSQL database is handling transactions as expected. However, a problem lies beneath the surface, one that becomes apparent during peak shopping hours. Page load times creep up. Product searches feel sluggish. Cart updates lag just enough to frustrate users. Yes, this is a bad user experience for sure. In the world of e-commerce, where Amazon has accustomed customers to expect nothing less than sub-second responses, every millisecond of delay translates to lost revenue.&lt;/p&gt;

&lt;p&gt;The root cause? Disk and network I/O are hindering your transactions. Your perfectly normalized PostgreSQL database, while excellent at maintaining data consistency and handling complex transactions, wasn't designed for the read-heavy, millisecond-response-time demands of modern applications. Every product view, every category browse, and every user profile fetch requires a round trip to disk-based storage, and this is expensive as it competes for resources with write operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cache-Aside Pattern: A Band-Aid, Not a Cure
&lt;/h2&gt;

&lt;p&gt;For years, developers have turned to the cache-aside pattern as the go-to solution for the load challenges of read-intensive applications. The logic seems sound: serve reads primarily from Redis, hit the source database only on cache misses, and update the cache with the fresh data. It's the "happy path" developers all dream about.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp55hqdduc8358ceebwa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyp55hqdduc8358ceebwa.png" alt="Cache-Aside Pattern" width="800" height="576"&gt;&lt;/a&gt;&lt;/p&gt;
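&lt;p&gt;A minimal sketch of the read path, with &lt;code&gt;Map&lt;/code&gt;s standing in for Redis and PostgreSQL so the example is self-contained (all names are illustrative):&lt;/p&gt;

```javascript
// Minimal sketch of the cache-aside read path. In a real app `cache`
// would be a Redis client and `db.get` a SQL query.
const cache = new Map();
const db = new Map([['product:1', { name: 'Espresso Machine' }]]);

let dbReads = 0; // instrumentation to show the miss/hit behavior

function getProduct(id) {
  const key = `product:${id}`;
  const cached = cache.get(key);
  if (cached !== undefined) return cached; // cache hit: no DB round trip

  dbReads += 1;                            // cache miss: hit the source DB...
  const fresh = db.get(key);
  if (fresh !== undefined) cache.set(key, fresh); // ...and repopulate the cache
  return fresh;
}

getProduct(1); // miss: reads the database
getProduct(1); // hit: served from cache
```

&lt;p&gt;Every application that needs the data has to carry this same logic, which is exactly where the flaws below creep in.&lt;/p&gt;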

&lt;p&gt;Everything is great until reality sets in. The cache-aside pattern quickly reveals three critical flaws:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Repetitive Update Logic&lt;/strong&gt;: Every application must implement the same caching logic. Each microservice, each new feature, each development team reinvents the wheel. It's challenging to maintain best practices across projects, and every database schema change risks breaking the caching logic in each new release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The Thundering Herd Problem&lt;/strong&gt;: When many cache keys expire simultaneously, thousands of requests hammer your database at once; imagine a flash sale starting at midnight. Your database must be sized not for average load, but for these sporadic read spikes. Query times slow to a crawl, eventually causing cascading failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Data Invalidation Nightmares&lt;/strong&gt;: What happens when records are deleted from the database? How do you handle updates that affect multiple cached entries? There's no atomic way to write to both Redis and your database, leading to inconsistency windows that corrupt user experiences.&lt;/p&gt;

&lt;p&gt;After years of running into problems like these, developers came up with another pattern that extends cache-aside with a more proactive approach. This pattern is known as refresh-ahead.&lt;/p&gt;

&lt;h2&gt;
  
  
  Refresh-Ahead Pattern: You Don't Call Me; I Call You!
&lt;/h2&gt;

&lt;p&gt;Right, so you know reads must be served by Redis, as it is faster than disk-based databases. But with cache-aside, the source database is only read after a request comes in and misses the cache. Why not change this paradigm and let the cache be populated proactively by a dedicated update engine?&lt;/p&gt;

&lt;p&gt;This is what the refresh-ahead pattern is all about. You leverage an engine that will be responsible for pulling records from your source database and moving the data to Redis for eventual reads. The same engine must also be responsible for periodically monitoring the source database to identify changes and updating Redis accordingly. This includes monitoring deleted records to trigger key invalidation at Redis.&lt;/p&gt;
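&lt;p&gt;One refresh cycle of such an engine can be sketched as follows; &lt;code&gt;Map&lt;/code&gt;s stand in for PostgreSQL and Redis, and the version-based change detection is illustrative (a real engine would use CDC):&lt;/p&gt;

```javascript
// Minimal sketch of one refresh-ahead cycle: an update engine pulls
// changed rows from the source and pushes them into the cache, so
// reads never have to fall back to the database. All names are
// illustrative.
function refreshCycle(source, cache, lastSyncVersion) {
  let maxVersion = lastSyncVersion;
  for (const [key, row] of source) {
    if (row.version <= lastSyncVersion) continue; // unchanged since last sync
    if (row.deleted) {
      cache.delete(key);        // deletes trigger key invalidation
    } else {
      cache.set(key, row.data); // inserts/updates refresh the cache
    }
    maxVersion = Math.max(maxVersion, row.version);
  }
  return maxVersion; // checkpoint for the next cycle
}

const source = new Map([
  ['product:1', { version: 1, data: { name: 'Espresso Machine' } }],
  ['product:2', { version: 2, deleted: true }]
]);
const cache = new Map([['product:2', { name: 'Old Grinder' }]]);
const checkpoint = refreshCycle(source, cache, 0);
```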

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb9tsimuj0o3o1s3m74m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsb9tsimuj0o3o1s3m74m.png" alt="Refresh Ahead Pattern" width="800" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is a great pattern to implement in conjunction with Redis for read-intensive use cases. Some teams turn to Change Data Capture (CDC) using tools like Apache Kafka and Debezium to achieve this. Others decide to implement complex ETL pipelines. Regardless of the implementation stack, the idea is the same: capture database changes as events and stream them to Redis. However, this approach introduces what we call the "distributed systems hole".&lt;/p&gt;

&lt;p&gt;It is a complexity trap that consumes entire development teams whose main job was never to maintain data pipelines. Implementing the refresh-ahead pattern manually often creates the following problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Developer Overutilization&lt;/strong&gt;: Your best engineers will spend months building and maintaining data pipelines instead of working on the systems that actually drive the company's revenue, creating the perception that they are not contributing to the organization's goals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Expertise Tax&lt;/strong&gt;: Apache Kafka, Debezium, and ETL experts command premium salaries; they are hard to retain and, more importantly, to replace. Unless the team is carefully planned from day one, it will be hard to justify to the business why an important launch must slip because a key engineer has left.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operational Complexity&lt;/strong&gt;: Every schema change necessitates pipeline updates, and every deployment carries the risk of data inconsistency. Teams end up on call whenever the domain model changes, because such changes can break the integration that sustains the data pipeline.&lt;/p&gt;

&lt;p&gt;Let's go back to the original problem. All you wanted was to speed up your application because reads are more frequent than writes. Instead, you've created a distributed systems monster that requires constant feeding and care.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing the Refresh-Ahead Pattern with RDI
&lt;/h2&gt;

&lt;p&gt;This is where &lt;a href="https://redis.io/data-integration/" rel="noopener noreferrer"&gt;Redis Data Integration (RDI)&lt;/a&gt; changes the game entirely. RDI implements the refresh-ahead pattern with a future-proof solution that moves data proactively from your source database to Redis, keeping both in perfect sync without the complexity overhead. Unlike traditional CDC solutions, RDI requires no expertise in distributed systems. It's configuration, not code. It's operational simplicity, not complexity. It's a solution that doesn't hold you back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6jhz44ivivv4u6mainn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6jhz44ivivv4u6mainn.png" alt="RDI Architecture" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Major enterprises, &lt;a href="https://redis.io/customers/axis-bank" rel="noopener noreferrer"&gt;such as Axis Bank&lt;/a&gt;, are already utilizing RDI to accelerate their applications, and guess what: you can use it too. RDI is available for on-premise deployments via &lt;a href="https://redis.io/software" rel="noopener noreferrer"&gt;Redis Enterprise&lt;/a&gt;, and for cloud users via &lt;a href="https://redis.io/cloud" rel="noopener noreferrer"&gt;Redis Cloud&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Let's see how this works with a real e-commerce dataset stored in PostgreSQL to showcase RDI's capabilities. For this example, you can use the Docker Compose file below, which creates a pre-configured PostgreSQL database with CDC enabled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;debezium/postgres:15-alpine&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_USER=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_PASSWORD=postgres&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;POSTGRES_DB=postgres&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;5432:5432&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pg_isready"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./pgdata:/var/lib/postgresql/data&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./scripts/initial-load.sql:/docker-entrypoint-initdb.d/initial-load.sql&lt;/span&gt;
  &lt;span class="na"&gt;pgadmin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dpage/pgadmin4&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgadmin4&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;always&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8888:80"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;PGADMIN_DEFAULT_EMAIL&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;admin@postgres.com&lt;/span&gt;
      &lt;span class="na"&gt;PGADMIN_DEFAULT_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgadmin4pwd&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wget"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-O"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:80/misc/ping"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./pgadmin:/var/lib/pgadmin&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the container with the PostgreSQL database starts up, a script runs to create the necessary tables. You can find this script &lt;a href="https://github.com/redis-developer/postgres-to-redis-rdi-demo/blob/main/source-db/scripts/initial-load.sql" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The database contains normalized tables, some of which have foreign key relationships. These tables are exactly what you'd expect in a typical transactional system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Categories&lt;/strong&gt; and &lt;strong&gt;Products&lt;/strong&gt; with a one-to-many relationship&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customers&lt;/strong&gt; placing &lt;strong&gt;Orders&lt;/strong&gt; containing multiple &lt;strong&gt;OrderItems&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Suppliers&lt;/strong&gt; connected to &lt;strong&gt;Products&lt;/strong&gt; through a many-to-many relationship&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These tables represent your source of truth—but optimized for consistency, not speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  How RDI Works
&lt;/h3&gt;

&lt;p&gt;Once you have installed RDI and deployed your data pipeline, here is what happens behind the scenes. First, RDI performs an initial cache load, populating Redis with all your existing data. In the demo below, you can see 78 records being synchronized initially, creating data streams for each table. The RDI dashboard displays real-time metrics, including records inserted, updated, and deleted, with timestamps accurate to the millisecond.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Failhqfwuu63g1l21sxf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Failhqfwuu63g1l21sxf9.png" alt="RDI Analytics" width="800" height="474"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After this, every time you insert a new user through pgAdmin, it appears in Redis within milliseconds. When you update an order status, Redis reflects the change instantly. When you delete a product, it is automatically removed from Redis. This isn't eventual consistency with fingers crossed. It's guaranteed synchronization through CDC.&lt;/p&gt;

&lt;p&gt;By default, records are written to Redis using the hash data type. Hashes suit simple entities, such as categories, that have flat, predictable fields. With hashes, the primary key forms part of the Redis key (e.g., &lt;code&gt;category:1&lt;/code&gt;), which gives you O(1) access to any record: no table scans, no index lookups. This is an example of how the data looks in Redis using hashes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ni3yyrv1ew4q3166j11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ni3yyrv1ew4q3166j11.png" alt="Data at Redis with Hashes" width="800" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more complex entities that need nested data, arrays, or flexible schemas, you can use the JSON data type in Redis instead. This is an example of how the data looks in Redis using JSON:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7rpgy9btzzcafq0288o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb7rpgy9btzzcafq0288o.png" alt="Data at Redis with JSON" width="800" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But RDI goes beyond simple replication. The stream processor layer continuously reshapes data into your preferred data model and layout via the transformations you define. For example, consider the &lt;code&gt;users&lt;/code&gt; table with the following record:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;52&lt;/span&gt;
&lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;riferrei&lt;/span&gt;  
&lt;span class="n"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Ricardo&lt;/span&gt;
&lt;span class="n"&gt;last_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Ferreira&lt;/span&gt;
&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ricardo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ferreira&lt;/span&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You want to create two additional fields in the final record that will be written to Redis. You can create a transformation using YAML like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;custom-job&lt;/span&gt;
&lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;public&lt;/span&gt;
  &lt;span class="na"&gt;table&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user&lt;/span&gt;
&lt;span class="na"&gt;transform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add_field&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;expression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;first_name || ' ' || last_name&lt;/span&gt;
      &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;display_name&lt;/span&gt;
      &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sql&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;add_field&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;expression&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="s"&gt;CASE&lt;/span&gt;
          &lt;span class="s"&gt;WHEN email LIKE '%@example.com' THEN 'internal'&lt;/span&gt;
          &lt;span class="s"&gt;ELSE 'external'&lt;/span&gt;
        &lt;span class="s"&gt;END&lt;/span&gt;
      &lt;span class="na"&gt;field&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;user_type&lt;/span&gt;
      &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sql&lt;/span&gt;
&lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis.write&lt;/span&gt;
    &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;connection&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;target&lt;/span&gt;
      &lt;span class="na"&gt;data_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the transformation that implements data enrichment, the final record becomes a Redis JSON document with computed fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;52&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"username"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"riferrei"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"first_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ricardo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; 
  &lt;/span&gt;&lt;span class="nl"&gt;"last_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ferreira"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ricardo.ferreira@example.com"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"display_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Ricardo Ferreira"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"user_type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"internal"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the two new fields that don't exist in PostgreSQL:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;display_name&lt;/code&gt;: Concatenated from first and last names&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_type&lt;/code&gt;: Computed based on email domain logic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This transformation happens in the RDI stream processor, which parses the transformations from your YAML configuration. No code is required in your application, and no cache invalidation is needed. Just pure, configuration-driven transformation.&lt;/p&gt;
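&lt;p&gt;To make the enrichment concrete, here is an illustrative Python mirror of what the two &lt;code&gt;add_field&lt;/code&gt; steps compute for each change record. RDI evaluates the SQL expressions itself inside the stream processor; this sketch only reproduces the logic:&lt;/p&gt;

```python
# Illustrative mirror of the two add_field transformations from the YAML job.
def enrich(record: dict) -> dict:
    out = dict(record)
    # display_name: first_name || ' ' || last_name
    out["display_name"] = f"{record['first_name']} {record['last_name']}"
    # user_type: CASE WHEN email LIKE '%@example.com' THEN 'internal' ELSE 'external' END
    out["user_type"] = ("internal" if record["email"].endswith("@example.com")
                        else "external")
    return out

row = {"id": 52, "username": "riferrei", "first_name": "Ricardo",
       "last_name": "Ferreira", "email": "ricardo.ferreira@example.com"}
enriched = enrich(row)
print(enriched["display_name"])  # Ricardo Ferreira
print(enriched["user_type"])     # internal
```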

&lt;p&gt;If you want to try this demo yourself, follow the instructions in this GitHub repository:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis-developer/postgres-to-redis-rdi-demo" rel="noopener noreferrer"&gt;https://github.com/redis-developer/postgres-to-redis-rdi-demo&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The beauty of this repository is that you can run it entirely on your local machine using Kubernetes. The repository includes everything you need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automated deployment scripts for RDI (both local and cloud options)&lt;/li&gt;
&lt;li&gt;A pre-configured PostgreSQL database with sample e-commerce data&lt;/li&gt;
&lt;li&gt;Transformation job examples showing JSON and Hash outputs&lt;/li&gt;
&lt;li&gt;Step-by-step instructions with visual guides&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within minutes, you'll have a complete CDC pipeline streaming data from PostgreSQL to Redis, transforming relational tables into high-performance key-value structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Beyond Simple Caching: A Living Data Layer
&lt;/h2&gt;

&lt;p&gt;What sets Redis apart from other caching solutions is what RDI turns it into. This isn't a cache that might be stale or that needs complex invalidation logic. It's a real-time materialized view of your source database, transformed and optimized for high-speed access. RDI's configuration-driven approach means you can evolve your data pipeline without touching application code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Need to add a new computed field? Just update the YAML configuration file.&lt;/li&gt;
&lt;li&gt;Want to change how data is structured in Redis? Modify the transformation job.&lt;/li&gt;
&lt;li&gt;Schema changed in the source? RDI adapts automatically. No action needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No redeployment, no code changes, no downtime. Just operational simplicity that lets developers focus on innovation instead of infrastructure. RDI resolves the fundamental tension in modern application architecture: you no longer have to choose between consistency and speed, between simplicity and performance, or between developer productivity and operational excellence.&lt;/p&gt;

&lt;p&gt;RDI delivers exactly that: an out-of-the-box data pipeline that offloads reads to Redis, speeding up both your applications and your learning curve.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Future Is Refresh-Ahead
&lt;/h2&gt;

&lt;p&gt;As we move toward software architectures where data is always in flux, where latency is measured in microseconds, and where scale is assumed rather than planned, the ability to seamlessly synchronize and transform data between complementary stores becomes essential.&lt;/p&gt;

&lt;p&gt;Redis Data Integration represents more than a technical solution. It's a paradigm shift in how we think about data architecture. It's the realization that we don't need to accept the trade-offs we've lived with for years. We can have transactional consistency in disk-based databases and blazing-fast reads in Redis, without taking on the complexity.&lt;/p&gt;

&lt;p&gt;The question isn't whether you need real-time data synchronization; it's when you need it. It's whether you can afford to keep solving the 90% problem with yesterday's solutions. Welcome to the refresh-ahead revolution. Your applications and your users will thank you.&lt;/p&gt;

</description>
      <category>redis</category>
      <category>rdi</category>
      <category>database</category>
      <category>etl</category>
    </item>
    <item>
      <title>Implementing Semantic Anomaly Detection with OpenTelemetry and Redis</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Tue, 09 Dec 2025 17:07:16 +0000</pubDate>
      <link>https://dev.to/redis/implementing-semantic-anomaly-detection-with-opentelemetry-and-redis-4506</link>
      <guid>https://dev.to/redis/implementing-semantic-anomaly-detection-with-opentelemetry-and-redis-4506</guid>
      <description>&lt;h2&gt;
  
  
  &lt;strong&gt;Detecting Anomalies in OpenTelemetry Logs Using Vector Embeddings and Redis&lt;/strong&gt;
&lt;/h2&gt;




&lt;h2&gt;
  
  
  &lt;strong&gt;Finding Unknown Unknowns in Your Logs&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Picture this: It's 3 AM, and your payment service starts failing. The errors don't match any of your alert rules. The log messages are textually distinct from anything you've seen before, but semantically, they're similar to a database connection issue that occurred six months ago. Your rule-based system missed it because the error message format changed. Your keyword searches failed because the new error uses different terminology. By the time your team discovers the issue through customer complaints, you have already lost thousands in revenue.&lt;/p&gt;

&lt;p&gt;This scenario plays out daily across engineering teams. Modern applications generate millions of logs, and buried within them are critical anomalies, such as security breaches, performance degradations, and system failures. The challenge isn't just the volume; it's that we don't know what we're looking for. New failure modes emerge constantly, attackers develop novel techniques, and dependencies fail in unexpected ways. Meanwhile, traditional monitoring relies on flawed assumptions, such as anticipating all failure modes (rule-based detection) and assuming that anomalies will use predictable keywords (search-based detection). But what if, instead of looking for specific patterns, we could teach our system to understand what "normal" looks like and flag anything that deviates from that understanding?&lt;/p&gt;

&lt;p&gt;This is where semantic anomaly detection comes in. By converting logs into vector embeddings that capture their meaning, we can identify anomalies based on how different they are from normal behavior, even if we've never seen that specific error before. It's like teaching your monitoring system to understand context and meaning, not just pattern matching.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Understanding the Solution: Embeddings and Vector Search&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Before diving into code, let's understand what makes this approach powerful. An embedding represents data as an array of numbers (a vector) that captures its semantic meaning. When we convert the log message "User authentication failed for admin account" into an embedding, we obtain a vector like [0.23, -0.45, 0.67, ...] in a 768-dimensional space, where similar meanings result in similar vectors. We can then find related logs by using a vector store, such as Redis, and executing a &lt;a href="https://www.youtube.com/watch?v=o3XN4dImESE" rel="noopener noreferrer"&gt;semantic search&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's the key insight: logs that describe similar events will have similar embeddings, even if they use different words. "Authentication failed" and "Login unsuccessful" will produce vectors that are close together in the 768-dimensional space. Meanwhile, "Authentication failed" and "Payment processed successfully" will be far apart. This distance becomes our anomaly score.&lt;/p&gt;
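&lt;p&gt;A toy illustration of distance as an anomaly score, using made-up 4-dimensional vectors in place of real 768-dimensional embeddings:&lt;/p&gt;

```python
# Toy illustration of semantic distance as an anomaly score; the vectors
# are invented stand-ins for real embedding-model output.
import math

def cosine_distance(a, b):
    # 0.0 means identical direction (same meaning); values near 1.0 or more
    # mean the vectors point in very different directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

auth_failed  = [0.9, 0.1, 0.0, 0.2]  # "Authentication failed"
login_failed = [0.8, 0.2, 0.1, 0.3]  # "Login unsuccessful"
payment_ok   = [0.0, 0.9, 0.8, 0.1]  # "Payment processed successfully"

# Similar meanings yield a small distance; unrelated ones a large distance.
assert cosine_distance(auth_failed, login_failed) < cosine_distance(auth_failed, payment_ok)
```

&lt;p&gt;In the full system, the vector store computes these distances server-side during search; the application only has to pick a distance threshold above which a log counts as anomalous.&lt;/p&gt;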

&lt;p&gt;This is powerful, but it can be further enhanced. When we combine this with OpenTelemetry's structured logging, we can add additional context to the semantic search. OpenTelemetry (or OTEL for short) provides standardized fields for observability data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;service.name&lt;/code&gt; tells us which service generated the log&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;http.status_code&lt;/code&gt; indicates success or failure
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;net.peer.ip&lt;/code&gt; shows who's connecting&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;trace.id&lt;/code&gt; links related logs together&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are only a few examples of fields that can be provided by OTEL structured logging. They were included in the OTEL project in 2023 when Elastic donated its structured approach to logs, known as Elastic Common Schema (ECS), to OTEL. You can read more about this &lt;a href="https://opentelemetry.io/blog/2023/ecs-otel-semconv-convergence/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By incorporating this context into our embeddings, we create rich representations that not only understand what happened, but also where, when, and under what circumstances. A "connection timeout" from your database service at 3 AM is very different from the same message from a third-party API during business hours, and our embeddings will reflect that.&lt;/p&gt;
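&lt;p&gt;One simple way to fold that context into the embedding is to append selected OTEL attributes to the log body before embedding the combined text. The helper below is hypothetical; the chosen fields and the &lt;code&gt;key=value&lt;/code&gt; formatting are assumptions for illustration, not an OTEL or RedisVL API:&lt;/p&gt;

```python
# Hypothetical helper: fold selected OTEL attributes into the text that gets
# embedded, so the vector captures context alongside the message itself.
from datetime import datetime, timezone

CONTEXT_FIELDS = ("service.name", "http.status_code", "net.peer.ip")

def log_to_embedding_text(body: str, attrs: dict, ts: datetime) -> str:
    parts = [body]
    for key in CONTEXT_FIELDS:
        if key in attrs:
            parts.append(f"{key}={attrs[key]}")
    # Time of day matters: a 3 AM timeout is not a business-hours timeout.
    parts.append(f"hour_of_day={ts.hour}")
    return " | ".join(parts)

text = log_to_embedding_text(
    "connection timeout",
    {"service.name": "billing-db", "http.status_code": 504},
    datetime(2025, 12, 9, 3, 0, tzinfo=timezone.utc),
)
print(text)  # connection timeout | service.name=billing-db | http.status_code=504 | hour_of_day=3
```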

&lt;h2&gt;
  
  
  &lt;strong&gt;Building the Anomaly Detection System&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's build a production-ready anomaly detector using &lt;a href="https://redis.io/open-source/" rel="noopener noreferrer"&gt;Redis Open Source&lt;/a&gt;, which includes support for &lt;a href="https://redis.io/docs/latest/develop/ai/search-and-query/" rel="noopener noreferrer"&gt;Redis Query Engine (RQE)&lt;/a&gt;. This will allow us to easily store vector embeddings and implement semantic search, which is required for this use case.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Setting Up Dependencies&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The Docker Compose file below provides you with an easy way to spin up Redis, as well as &lt;a href="https://redis.io/insight/" rel="noopener noreferrer"&gt;Redis Insight&lt;/a&gt;, which you can use to browse and inspect data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

  &lt;span class="na"&gt;redis-database&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-database&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-database&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis:8.4.0&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./data:/data&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;REDIS_ARGS&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;--save 30 &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379:6379"&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis-cli&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;ping&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;PONG"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;

  &lt;span class="na"&gt;redis-insight&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-insight&lt;/span&gt;
    &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis-insight&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis/redisinsight:2.70.1&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;redis-database&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;RI_REDIS_HOST&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis-database"&lt;/span&gt;
      &lt;span class="na"&gt;RI_REDIS_PORT&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;6379"&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5540:5540"&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sh"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wget&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-q&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-O-&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;http://redis-insight:5540/api/health&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;|&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;grep&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-q&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;up&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;'"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When both services are up, you can access your Redis database using a browser. Navigate to &lt;code&gt;http://localhost:5540&lt;/code&gt;, and you should see the following page:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0259sjp1kur6uxk49oth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0259sjp1kur6uxk49oth.png" alt="Redis Insight" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To interact with RQE, we will use &lt;a href="https://github.com/redis/redis-vl-python" rel="noopener noreferrer"&gt;RedisVL (Redis Vector Library)&lt;/a&gt;, which provides a clean Python interface for vector operations in Redis. We'll start with the core components and gradually build up to a fully functional system.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Setting Up RedisVL and the Vector Index&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;First, we need to create a vector index that can store our log embeddings and enable fast similarity search. RedisVL makes this easy while handling the complexity of index management behind the scenes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redisvl.schema&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IndexSchema&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redisvl.index&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SearchIndex&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sentence_transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SentenceTransformer&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OTELAnomalyDetector&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        Initialize our anomaly detection system with RedisVL.

        We&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re using sentence-transformers for embeddings because:
        1. They&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re specifically trained for semantic similarity
        2. They run efficiently on CPU (no GPU required)
        3. They produce fixed-size vectors perfect for similarity search
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

        &lt;span class="c1"&gt;# Initialize the embedding model
&lt;/span&gt;        &lt;span class="c1"&gt;# all-MiniLM-L6-v2 gives us 768-dimensional embeddings
&lt;/span&gt;        &lt;span class="c1"&gt;# It's fast (2000+ sentences/sec on CPU) and accurate
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SentenceTransformer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;all-MiniLM-L6-v2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Define our Redis schema
&lt;/span&gt;        &lt;span class="c1"&gt;# This tells RedisVL how to store and index our data
&lt;/span&gt;        &lt;span class="n"&gt;schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;index&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;otel-logs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prefix&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;log:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# All keys will start with "log:"
&lt;/span&gt;                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;storage_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Redis hash for flexible field storage
&lt;/span&gt;            &lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fields&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="c1"&gt;# Vector field for embeddings - this is where the magic happens
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;vector&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attrs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;dims&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;768&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Dimensions from our model
&lt;/span&gt;                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;distance_metric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cosine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Cosine similarity for semantic matching
&lt;/span&gt;                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;algorithm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hnsw&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Hierarchical Navigable Small World for fast search
&lt;/span&gt;                        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;datatype&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;float32&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="c1"&gt;# OTEL semantic convention fields for context
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Which service?
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# ERROR, WARN, INFO
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http_status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numeric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# 200, 404, 500
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numeric&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# When did this happen?
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# Original log message
&lt;/span&gt;                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;  &lt;span class="c1"&gt;# For tracing correlation
&lt;/span&gt;            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;# Create the RedisVL index
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SearchIndex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;IndexSchema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;schema&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;overwrite&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Don't overwrite if exists
&lt;/span&gt;
        &lt;span class="c1"&gt;# Anomaly detection threshold
&lt;/span&gt;        &lt;span class="c1"&gt;# Cosine distance &amp;gt; 0.7 indicates an anomaly
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;anomaly_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The schema design is crucial. We're not just storing vectors; we're creating a searchable space where we can filter by service, time, or severity before applying vector similarity. This contextual search is what makes our anomaly detection intelligent rather than just mathematical.&lt;/p&gt;
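&lt;p&gt;As a sanity check on the &lt;code&gt;anomaly_threshold&lt;/code&gt; above: cosine distance is 1 minus cosine similarity, so a distance above 0.7 means the closest known log points in a substantially different semantic direction. A minimal, dependency-free sketch of that arithmetic (the toy vectors below stand in for real embeddings):&lt;/p&gt;

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; 0.0 = identical direction, up to 2.0 = opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

ANOMALY_THRESHOLD = 0.7  # same threshold the detector uses

routine = [0.9, 0.1, 0.0]  # toy stand-in for a "normal" log embedding
similar = [0.8, 0.2, 0.1]  # a semantically close log
odd     = [0.0, 0.1, 0.9]  # a semantically unrelated log

print(cosine_distance(routine, similar) > ANOMALY_THRESHOLD)  # False: looks normal
print(cosine_distance(routine, odd) > ANOMALY_THRESHOLD)      # True: flagged as anomalous
```

&lt;p&gt;RedisVL computes this distance inside Redis during the search; the snippet only illustrates what the threshold compares against.&lt;/p&gt;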

&lt;h3&gt;
  
  
  &lt;strong&gt;Processing OTEL Logs into Embeddings&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Now let's transform OpenTelemetry logs into embeddings. The key is creating a text representation that preserves the semantic meaning and context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_otel_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Process an OpenTelemetry log record through our anomaly detection pipeline.

    This method is the heart of our system. It takes structured OTEL data,
    converts it to a semantic representation, and determines if it&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s anomalous.

    Returns:
        - is_anomaly: Boolean indicating if this log is anomalous
        - anomaly_score: Float between 0 and 1 (higher = more anomalous)
        - explanation: Human-readable explanation of the decision
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 1: Create a semantic representation of the log
&lt;/span&gt;    &lt;span class="c1"&gt;# We're not just concatenating fields - we're creating a narrative
&lt;/span&gt;    &lt;span class="c1"&gt;# that captures the meaning and context
&lt;/span&gt;
    &lt;span class="n"&gt;log_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_semantic_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Generate the embedding
&lt;/span&gt;    &lt;span class="c1"&gt;# This converts our text into a 768-dimensional vector
&lt;/span&gt;    &lt;span class="c1"&gt;# The model understands semantic relationships, so similar events
&lt;/span&gt;    &lt;span class="c1"&gt;# produce similar vectors even with different wording
&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;convert_to_numpy&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 3: Search for similar historical logs
&lt;/span&gt;    &lt;span class="c1"&gt;# This is where we determine if this log is normal or anomalous
&lt;/span&gt;
    &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similar_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_check_anomaly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 4: Store this log for future comparisons
&lt;/span&gt;    &lt;span class="c1"&gt;# Every log helps improve our understanding of "normal"
&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_store_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 5: Generate explanation
&lt;/span&gt;    &lt;span class="c1"&gt;# This helps operators understand why something was flagged
&lt;/span&gt;
    &lt;span class="n"&gt;explanation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_generate_explanation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;explanation&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_create_semantic_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Convert OTEL structured data into semantic text for embedding.

    The order and format matter! We&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re creating a consistent narrative
    that helps the embedding model understand context. Think of this as
    writing a one-sentence story about what happened.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Start with the service context
&lt;/span&gt;    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;In service &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add severity context
&lt;/span&gt;    &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;an error occurred&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;severity&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;WARN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a warning was raised&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;an event happened&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add HTTP context if present
&lt;/span&gt;    &lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;method&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.route&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;during &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;method&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; with status &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add network context
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from IP &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add the actual message
&lt;/span&gt;    &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;no message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add trace context for distributed tracing correlation
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[trace:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Crafting this semantic text is more art than science. We're building a narrative that preserves the important context while staying consistent enough for the embedding model to find patterns. The model has been trained on billions of sentences, so it understands that "error occurred during POST /api/login with status 401" indicates a failed authentication attempt.&lt;/p&gt;
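&lt;p&gt;As a standalone sketch of the idea (the field names mirror the OTel-style records used above; the exact assembly inside the article's class may differ), the narrative for a failed login could be built like this:&lt;/p&gt;

```python
def create_semantic_text(log_record: dict) -> str:
    # Flatten a structured log record into a natural-language
    # sentence the embedding model can find patterns in.
    parts = [f"{log_record.get('severity_text', 'INFO')} log"]

    attrs = log_record.get('attributes', {})
    if 'http.method' in attrs and 'http.target' in attrs:
        parts.append(f"during {attrs['http.method']} {attrs['http.target']}")
    if 'http.status_code' in attrs:
        parts.append(f"with status {attrs['http.status_code']}")

    parts.append(f": {log_record.get('body', 'no message')}")
    return " ".join(parts)

record = {
    'severity_text': 'ERROR',
    'attributes': {'http.method': 'POST', 'http.target': '/api/login',
                   'http.status_code': 401},
    'body': 'invalid credentials',
}
print(create_semantic_text(record))
# ERROR log during POST /api/login with status 401 : invalid credentials
```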

&lt;h3&gt;
  
  
  &lt;strong&gt;Detecting Anomalies with Semantic Search&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here's where Redis really shines. We'll implement a fast similarity search that decides whether a log is anomalous, along with a companion method that stores each processed log back in Redis so future comparisons have more history to draw on.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;redisvl.query&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VectorQuery&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_check_anomaly&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Determine if a log is anomalous by comparing it to historical patterns.

    The core insight: normal logs cluster together in vector space,
    while anomalies are outliers. We use k-NN (k-Nearest Neighbors)
    to find similar logs and calculate how different this one is.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Build a vector similarity query with context filters
&lt;/span&gt;    &lt;span class="c1"&gt;# We're not searching all logs - we're searching logs from the same service
&lt;/span&gt;    &lt;span class="c1"&gt;# in a recent time window for more accurate anomaly detection
&lt;/span&gt;
    &lt;span class="n"&gt;service_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;current_time&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;one_hour_ago&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;current_time&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;3600&lt;/span&gt;

    &lt;span class="c1"&gt;# Create RedisVL vector query
&lt;/span&gt;    &lt;span class="c1"&gt;# This combines semantic similarity with metadata filtering
&lt;/span&gt;    &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VectorQuery&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;vector_field_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;num_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# Find 20 most similar logs
&lt;/span&gt;        &lt;span class="n"&gt;return_fields&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;filter_expression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;@service_name:{{&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;service_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;}} @timestamp:[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;one_hour_ago&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;current_time&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Execute the search
&lt;/span&gt;    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Not enough historical data - can't determine if anomalous
&lt;/span&gt;        &lt;span class="c1"&gt;# Default to not anomalous to avoid false positives
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate anomaly score based on distances to nearest neighbors
&lt;/span&gt;    &lt;span class="n"&gt;distances&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;similar_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;  &lt;span class="c1"&gt;# Use top 10 for scoring
&lt;/span&gt;        &lt;span class="c1"&gt;# RedisVL returns cosine distance (0 = identical, 2 = opposite)
&lt;/span&gt;        &lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;vector_distance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;distances&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;distance&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;distance&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distance&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Convert to similarity score
&lt;/span&gt;        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate anomaly score using statistics
&lt;/span&gt;    &lt;span class="c1"&gt;# We use both mean and minimum distance for robustness
&lt;/span&gt;    &lt;span class="n"&gt;mean_distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distances&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;min_distance&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distances&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Weighted combination - if even the closest log is far, it's very anomalous
&lt;/span&gt;    &lt;span class="n"&gt;anomaly_score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;mean_distance&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;min_distance&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Determine if anomalous based on threshold
&lt;/span&gt;    &lt;span class="n"&gt;is_anomaly&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anomaly_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;anomaly_threshold&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anomaly_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# Return top 3 for explanation
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_store_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;anomaly_score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Store the processed log with its embedding in RedisVL.

    This builds our historical baseline. Every normal log makes our
    anomaly detection more accurate by better defining &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;normal&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="c1"&gt;# Generate unique ID using timestamp and trace ID
&lt;/span&gt;    &lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;trace_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;notrace&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;log:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timestamp&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# Prepare document for RedisVL
&lt;/span&gt;    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;embedding&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# RedisVL handles serialization
&lt;/span&gt;        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service_name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;unknown&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;severity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http_status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;trace_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;anomaly_score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;raw_log&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Store original for debugging
&lt;/span&gt;    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;# Store in Redis via RedisVL
&lt;/span&gt;    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;keys&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;log_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The beauty of this approach is its simplicity. We're not training complex models or maintaining elaborate rule sets. We're just asking: "Have we seen something like this before?" If the answer is no (high distance to all historical logs), it's an anomaly worth investigating.&lt;/p&gt;
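&lt;p&gt;To see the scoring rule in isolation (the distances below are made-up values and 0.75 is a placeholder threshold; the 0.7/0.3 weights match the code above):&lt;/p&gt;

```python
import numpy as np

def anomaly_score(distances: list) -> float:
    # Weighted blend of mean and minimum cosine distance to the
    # nearest historical logs: even one close match pulls the
    # score down, but only so far.
    return 0.7 * float(np.mean(distances)) + 0.3 * float(np.min(distances))

THRESHOLD = 0.75  # placeholder; tune against your own traffic

# A log surrounded by near-duplicates scores low...
print(anomaly_score([0.05, 0.08, 0.10]) > THRESHOLD)  # False
# ...while one far from everything seen before scores high.
print(anomaly_score([0.9, 1.1, 1.3]) > THRESHOLD)     # True
```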

&lt;h3&gt;
  
  
  &lt;strong&gt;Making Results Actionable&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;An anomaly detection system is only useful if it produces actionable insights. Let's add explanation generation so operators can understand why something was flagged:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_generate_explanation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
                         &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;List&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Generate human-readable explanations for anomaly decisions.

    This is crucial for building trust in the system. Operators need
    to understand not just that something is anomalous, but why.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Normal behavior. Similar to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; recent logs &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                   &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;from the same service. Closest match: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                   &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;with &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; similarity.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Normal behavior. Insufficient historical data for detailed comparison.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="c1"&gt;# For anomalies, provide detailed explanation
&lt;/span&gt;    &lt;span class="n"&gt;severity_assessment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;High&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Medium&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Low&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;explanation_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;severity_assessment&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; severity anomaly detected (score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;).&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;This log is significantly different from recent patterns in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="si"&gt;{}&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;the service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Show what normal looks like for comparison
&lt;/span&gt;        &lt;span class="n"&gt;explanation_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Most similar normal log: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;message&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;but with only &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;similar_logs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;similarity&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; similarity.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;explanation_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No similar logs found in recent history.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Add specific indicators if present
&lt;/span&gt;    &lt;span class="n"&gt;attributes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;attributes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;explanation_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Server error status code detected.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;log_record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;explanation_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error keyword present in message.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;explanation_parts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A variation of this implementation could use an LLM to turn these fragments into a more structured, human-like narrative. We decided to keep things simple here, but it is worth knowing the art of the possible.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Testing Everything with an Example&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Let's see our anomaly detector in action with real OpenTelemetry logs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Initialize the detector
&lt;/span&gt;&lt;span class="n"&gt;detector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OTELAnomalyDetector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Example OTEL logs - some normal, some anomalous
&lt;/span&gt;&lt;span class="n"&gt;test_logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# Normal authentication log
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1699564800.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User login successful for user123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;192.168.1.100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;abc123&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="c1"&gt;# Another normal log - similar pattern
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1699564860.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;INFO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authentication completed for user456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;192.168.1.101&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;def456&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="c1"&gt;# Suspicious log - SQL injection attempt
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1699564920.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid input detected: &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;; DROP TABLE users; --&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;185.220.101.45&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;  &lt;span class="c1"&gt;# Known malicious IP range
&lt;/span&gt;        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ghi789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;

    &lt;span class="c1"&gt;# Another anomaly - unusual error pattern
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;1699564980.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;severity_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ERROR&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Database connection pool exhausted - unable to serve requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;auth-service&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attributes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.method&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;POST&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.route&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/api/login&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http.status_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;503&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;net.peer.ip&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;192.168.1.102&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;trace_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;jkl012&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Process logs and detect anomalies
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ANOMALY DETECTION RESULTS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;log&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_logs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;is_anomaly&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;explanation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;detector&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_otel_log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Log: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Timestamp: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fromtimestamp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Service: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resource&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;service.name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Anomaly: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;🔴 YES&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;is_anomaly&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;🟢 NO&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Score: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explanation: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;explanation&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you run this, you'll see the system correctly identify the SQL injection attempt and database error as anomalies, while recognizing the normal authentication logs as expected behavior. The explanations help you understand why each decision was made. Our anomaly detection system works because embeddings capture semantic meaning, not just text patterns.&lt;/p&gt;

&lt;p&gt;When we process "Invalid input detected: '; DROP TABLE users; --", the embedding model recognizes this as semantically different from normal authentication logs. It understands that DROP TABLE is a SQL command, that the semicolon and comment markers indicate injection attempts, and that this pattern is inconsistent with successful logins.&lt;/p&gt;

&lt;p&gt;The anomaly scores tell us how unusual each log is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;0.0 - 0.3&lt;/strong&gt;: Very normal, closely matches historical patterns&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.3 - 0.7&lt;/strong&gt;: Somewhat unusual but likely benign&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.7 - 0.9&lt;/strong&gt;: Anomalous, worth investigating&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0.9+&lt;/strong&gt;: Highly anomalous, immediate attention needed&lt;/li&gt;
&lt;/ul&gt;
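&lt;p&gt;These bands translate directly into a small triage helper. The function below is our own illustration (the name and labels are not part of the detector's API); the thresholds mirror the list above:&lt;/p&gt;

```python
def classify_anomaly_score(score: float) -> str:
    """Map a raw anomaly score to a triage label using the bands above."""
    if score < 0.3:
        return "normal"      # closely matches historical patterns
    if score < 0.7:
        return "unusual"     # somewhat unusual but likely benign
    if score < 0.9:
        return "anomalous"   # worth investigating
    return "critical"        # immediate attention needed


for s in (0.12, 0.55, 0.81, 0.97):
    print(f"{s:.2f} -> {classify_anomaly_score(s)}")
```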

&lt;p&gt;The contextual filtering (by service and time) is crucial. A "database connection timeout" might be normal for a batch processing service, but anomalous for your authentication service. By comparing logs within the same context, we avoid false positives from cross-service differences.&lt;/p&gt;
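&lt;p&gt;Stripped of the vector search, the contextual filter amounts to restricting comparisons to the same service within a recent time window. This is a minimal plain-Python sketch of that idea, assuming each stored log carries &lt;code&gt;service&lt;/code&gt; and &lt;code&gt;timestamp&lt;/code&gt; fields (the field names and window size are illustrative, not taken from the original implementation):&lt;/p&gt;

```python
import time


def filter_candidates(logs, service, now=None, window_seconds=3600):
    """Keep only logs from the same service within a recent time window,
    so anomaly scores compare like with like."""
    now = now if now is not None else time.time()
    return [
        log for log in logs
        if log["service"] == service
        and now - log["timestamp"] <= window_seconds
    ]


logs = [
    {"service": "auth-service", "timestamp": 1000.0, "body": "login ok"},
    {"service": "batch-service", "timestamp": 1000.0, "body": "db timeout"},
    {"service": "auth-service", "timestamp": 100.0, "body": "old login"},
]

# Only the recent auth-service log survives: the batch-service log is a
# different context, and the old auth-service log falls outside the window.
recent = filter_candidates(logs, "auth-service", now=1500.0, window_seconds=600)
print([log["body"] for log in recent])
```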

&lt;h2&gt;
  
  
  &lt;strong&gt;Monitoring the Monitor&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;As with any system, you must ensure your anomaly detector itself is up and running so you can rely on it when you need it most. As a best practice, track key metrics to confirm it is working correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_system_metrics&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Get metrics about the anomaly detection system itself.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;info&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;total_logs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_docs&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index_memory_mb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory_usage_mb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_embedding_time_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_measure_embedding_speed&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;avg_search_time_ms&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_measure_search_speed&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anomaly_rate&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_calculate_recent_anomaly_rate&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;index_fragmentation&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;info&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;fragmentation_ratio&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
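&lt;p&gt;The timing helpers referenced above (&lt;code&gt;_measure_embedding_speed&lt;/code&gt;, &lt;code&gt;_measure_search_speed&lt;/code&gt;) are not shown in the article. A plausible sketch simply averages the wall-clock time of a few sample calls; the &lt;code&gt;fn&lt;/code&gt; callable below is a stand-in for the real embedding or search call:&lt;/p&gt;

```python
import time


def measure_call_speed_ms(fn, sample_input, runs=5):
    """Average wall-clock time of fn(sample_input) in milliseconds."""
    start = time.perf_counter()
    for _ in range(runs):
        fn(sample_input)
    elapsed = time.perf_counter() - start
    return (elapsed / runs) * 1000.0


# Stand-in for an embedding call; any callable with one argument works.
avg_ms = measure_call_speed_ms(lambda text: [0.0] * 384, "sample log line")
print(f"avg embedding time: {avg_ms:.3f} ms")
```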



&lt;h2&gt;
  
  
  &lt;strong&gt;Why This Approach Works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The combination of OpenTelemetry structure, semantic embeddings, and Redis's fast vector search capabilities creates a powerful anomaly detection system that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Understands Context&lt;/strong&gt;: Unlike regex patterns, embeddings understand that "authentication failed" and "login unsuccessful" mean the same thing. This is powerful. Anomalies don't have to be fancy IP-masking attempts like the ones we see in movies; often it's just a familiar log message phrased differently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Learns Continuously&lt;/strong&gt;: Every log processed improves the baseline, making detection more accurate over time. In this case, more data literally means more value.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scales Efficiently&lt;/strong&gt;: RedisVL's HNSW index maintains fast search times even with millions of logs stored. Redis's in-memory design enables queries to execute with extremely low latency, while providing the tools to scale out as needed—both vertically and horizontally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Requires No Training&lt;/strong&gt;: Pre-trained embedding models work out of the box, no ML expertise required. Often, this is what hurts engineering teams the most: needing ML expertise from day one just to get things working.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Provides Explanations&lt;/strong&gt;: Operators understand why something was flagged, building trust in the system. They may not be experts, but they can grasp the essence of what went wrong and take the required action.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
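&lt;p&gt;The "understands context" point rests on vector similarity: with real embeddings, "authentication failed" and "login unsuccessful" land close together in vector space while a SQL injection payload lands far away. A toy cosine-similarity function shows the arithmetic; the three-dimensional vectors below are made up for illustration and stand in for real embedding outputs:&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors: the first two are near-parallel (similar meaning),
# the third points elsewhere (a different kind of message).
auth_failed = [0.9, 0.1, 0.0]
login_unsuccessful = [0.85, 0.15, 0.05]
drop_table = [0.0, 0.2, 0.95]

print(round(cosine_similarity(auth_failed, login_unsuccessful), 3))
print(round(cosine_similarity(auth_failed, drop_table), 3))
```

&lt;p&gt;The first pair scores close to 1.0, the second close to 0.0, which is exactly the signal the detector thresholds on.&lt;/p&gt;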

&lt;h2&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;In this blog post, we've built an anomaly detection system that understands the semantic meaning of logs, not just their text patterns. By combining OpenTelemetry's structured observability with embedding-based semantic search in Redis, we can detect novel anomalies that rule-based systems would likely miss.&lt;/p&gt;

&lt;p&gt;The beauty of this approach is its simplicity. We're not training complex models or maintaining hundreds of rules. We're just asking, "Is this log semantically similar to what we've seen before?" When the answer is no, we've found an anomaly worth investigating. You can verify this by deploying this implementation for one critical service and allowing it to learn for a week. You'll be surprised how quickly it starts catching issues your existing monitoring misses. As you gain confidence, expand to more services, each maintaining its own baseline of normal behavior.&lt;/p&gt;

&lt;p&gt;The future of observability isn't about writing more alerts or hiring more analysts. It's about systems that understand meaning, learn patterns, and surface what truly matters. By using OpenTelemetry for structure, embeddings for understanding, and RedisVL for scale, that future is accessible to any engineering team today.&lt;/p&gt;




</description>
      <category>opentelemetry</category>
      <category>vectorsearch</category>
      <category>anomalydetection</category>
      <category>redis</category>
    </item>
    <item>
      <title>Designing Data Systems with Vector Embeddings using Redis Vector Sets</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Sun, 03 Aug 2025 15:10:07 +0000</pubDate>
      <link>https://dev.to/redis/designing-vector-embeddings-a-finding-nemo-journey-through-redis-vector-sets-3i9j</link>
      <guid>https://dev.to/redis/designing-vector-embeddings-a-finding-nemo-journey-through-redis-vector-sets-3i9j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Every software engineer has faced this challenge: how do you model complex, multifaceted relationships in your data? Whether it's products in an e-commerce catalog, documents in a search engine, or content in a recommendation system, we often struggle to represent data entities with multiple attributes and complex relationships.&lt;/p&gt;

&lt;p&gt;Today, we'll explore this fundamental problem through an unexpected lens: &lt;a href="https://en.wikipedia.org/wiki/Finding_Nemo" rel="noopener noreferrer"&gt;Pixar's Finding Nemo&lt;/a&gt;. By modeling Marlin's journey to find his son, Nemo, you'll learn how Redis Vector Sets and vector embeddings can elegantly solve problems that most traditional approaches struggle with.&lt;/p&gt;

&lt;p&gt;Best of all? You'll learn how to do this hands-on.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Challenge: Data Design
&lt;/h2&gt;

&lt;p&gt;Imagine you're tasked with building a system representing Marlin's journey in Finding Nemo. If you haven't watched the movie (well, if that is true, then shame on you), here is a &lt;strong&gt;TL;DR&lt;/strong&gt; of the storyline.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Nemo, a clownfish, gets lost in the sea and is dragged to Sydney&lt;/li&gt;
&lt;li&gt;Marlin, his father, starts a rescue journey alone and afraid&lt;/li&gt;
&lt;li&gt;He meets Dory, a helpful but rather forgetful blue tang&lt;/li&gt;
&lt;li&gt;They encounter Bruce and his shark friends, reformed predators&lt;/li&gt;
&lt;li&gt;A school of moonfish gives Marlin and Dory directions&lt;/li&gt;
&lt;li&gt;Crush and the sea turtles help them ride the sea to the EAC&lt;/li&gt;
&lt;li&gt;A whale swallows them and takes them to Sydney by accident&lt;/li&gt;
&lt;li&gt;Nigel the pelican recognizes Marlin and shares where Nemo is&lt;/li&gt;
&lt;li&gt;Finally, Marlin reunites with Nemo&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;These are the requirements your system needs to keep in mind:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Preserve the journey order&lt;/strong&gt; — Who does Marlin meet first, second, third?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Find similar characters&lt;/strong&gt; — Which characters are most like Dory?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter by attributes&lt;/strong&gt; — Show me all the "helpers" or "large creatures"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handle flexible insertion&lt;/strong&gt; — Add characters in any order, not just chronologically&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Support proximity queries&lt;/strong&gt; — Who does Marlin meet after the sharks?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Which well-known approaches would you use to address these requirements? Let's discuss some options and their trade-offs.&lt;/p&gt;

&lt;p&gt;🔗 &lt;strong&gt;Using Linked Lists&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Character&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Name&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt;
    &lt;span class="n"&gt;Next&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;marlin&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Marlin"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;dory&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Dory"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;marlin&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Next&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dory&lt;/span&gt;
    &lt;span class="c"&gt;// Same for next ones...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Linked Lists are great for preserving the journey order. They let you navigate the relationships in different directions: forward-only, bidirectional, or circular. But here is the problem:&lt;/p&gt;

&lt;p&gt;💡 You can't query by attributes, there is no similarity search, and the insertion order is rigid.&lt;/p&gt;

&lt;p&gt;🗄️ &lt;strong&gt;Relational Database&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="n"&gt;journey&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;character_name&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="k"&gt;position&lt;/span&gt; &lt;span class="nb"&gt;INT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;species&lt;/span&gt; &lt;span class="nb"&gt;VARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;helpfulness&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;size&lt;/span&gt; &lt;span class="nb"&gt;FLOAT&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Relational databases are fairly attractive because they provide a proven programming model. If your data is in a table, you probably know how to query it with SQL. But here is the problem:&lt;/p&gt;

&lt;p&gt;💡 Similarity queries require complex JOINs, "find nearest neighbors" is expensive, and multi-attribute distance calculations are cumbersome.&lt;/p&gt;

&lt;p&gt;🕸️ &lt;strong&gt;Graph Database&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cypher"&gt;&lt;code&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;marlin:&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:MEETS&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;dory:&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="ss"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;:MEETS&lt;/span&gt;&lt;span class="ss"&gt;]&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="py"&gt;sharks:&lt;/span&gt;&lt;span class="n"&gt;Character&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Graph databases excel in scenarios requiring complex relationships. Let's face it, this is how the world looks most of the time. But here is the problem:&lt;/p&gt;

&lt;p&gt;💡 While suitable for relationships, calculating multi-dimensional similarity is still complex, and ordering isn't inherent.&lt;/p&gt;

&lt;p&gt;📄 &lt;strong&gt;Document Store with Full-Text Search&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"character"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Dory"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"helpful"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"forgetful"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"blue tang"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"position"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Document stores are great because they provide flexible data models that allow you to adapt your query needs faster and painlessly. But here is the problem:&lt;/p&gt;

&lt;p&gt;💡 Text matching isn't the same as mathematical similarity, and there is no way to compute accurate distances.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Missing?
&lt;/h3&gt;

&lt;p&gt;All these approaches treat the journey order and character attributes as separate concerns. As developers, we often pick the approach we're most used to and then struggle to implement the remaining requirements, almost hoping it will all work out in the end, like magic.&lt;/p&gt;

&lt;p&gt;What if we could encode both in a unified mathematical representation? What if each character could exist as a point in multi-dimensional space, where its position stores both when it appears and what it's like? Well, this is exactly what vector embeddings are for.&lt;/p&gt;

&lt;h2&gt;
  
  
  A New Perspective with Vectors
&lt;/h2&gt;

&lt;p&gt;Instead of thinking of characters as records, nodes, or documents, imagine them as points in a multi-dimensional space. Each dimension represents an attribute that helps you design your data entity correctly:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Journey Position&lt;/strong&gt; (when Marlin meets them)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Helpfulness&lt;/strong&gt; (how much they assist)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Size&lt;/strong&gt; (physical dimensions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swimming Style&lt;/strong&gt; (movement patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Courage&lt;/strong&gt; (bravery level)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With these five dimensions, finding "who comes next in the journey" now becomes a nearest-neighbor search. Finding "similar characters" is just measuring distances in this space. Magic? No. Mathematics. Using dimensions in a vector store gives you the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Flexible Insertion&lt;/strong&gt;: Add data in any order; relationships are maintained by vector dimensions with loose references across different records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Attribute Similarity&lt;/strong&gt;: Distance calculations consider all dimensions simultaneously. There are no more per-field comparisons.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic Queries&lt;/strong&gt;: Search by example or by constructing meaningful query vectors. Search by meaning instead of purely precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid Search&lt;/strong&gt;: Combine vector similarity with attribute filtering. Vectors for semantic search, filters for additional result precision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Performance&lt;/strong&gt;: HNSW indices make even high-dimensional searches fast. Search massive amounts of data with approximately O(log N) performance.&lt;/li&gt;
&lt;/ul&gt;
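&lt;p&gt;To make "similarity is just distance" concrete, here is a minimal sketch in plain Python. The character vectors and the Euclidean metric are illustrative choices, not canonical values from any particular embedding model:&lt;/p&gt;

```python
import math

# Five dimensions: journey position, helpfulness, size, swimming style, courage.
# These example values are illustrative, not canonical.
characters = {
    "marlin": [0.0, 0.5, 0.2, 0.1, 0.1],
    "dory":   [1.0, 1.0, 0.2, 0.2, 0.2],
    "whale":  [5.0, 0.9, 1.0, 0.9, 0.6],
}

def euclidean(a, b):
    """Straight-line distance between two points in 5-D space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(name):
    """Nearest neighbor of `name`, excluding itself."""
    me = characters[name]
    others = {k: v for k, v in characters.items() if k != name}
    return min(others, key=lambda k: euclidean(me, others[k]))

print(most_similar("marlin"))  # dory: far closer to marlin than the whale
```

All five attributes contribute to the distance at once, which is exactly the multi-attribute similarity that the table, graph, and document approaches above struggle to express.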

&lt;h2&gt;
  
  
  Building Marlin's Journey with Redis Vector Sets
&lt;/h2&gt;

&lt;p&gt;Let's build this step by step. We'll use &lt;a href="https://redis.io/docs/latest/develop/data-types/vector-sets/" rel="noopener noreferrer"&gt;Redis Vector Sets&lt;/a&gt;, providing high-performance vector similarity search and additional filtering capabilities. You will need &lt;a href="https://redis.io/docs/latest/operate/oss_and_stack/install/" rel="noopener noreferrer"&gt;Redis Open Source&lt;/a&gt; for this.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Creating Our Universe
&lt;/h3&gt;

&lt;p&gt;First, let's check that we're starting fresh and understand what we're building. Let's see what data type we're creating:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TYPE finding-nemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"none"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great! We're starting with a clean slate. Now let's add our first character—Marlin. He is the anxious father starting his journey.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VADD finding-nemo VALUES 5 0.0 0.5 0.2 0.1 0.1 marlin SETATTR '{"species":"clownfish","type":"father","quote":"I have to find my son!"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(integer) 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What just happened?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We've created a new vector set, under the key &lt;code&gt;finding-nemo&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;We've added Marlin as a 5-dimensional point at position (0.0, 0.5, 0.2, 0.1, 0.1)&lt;/li&gt;
&lt;li&gt;We've attached metadata to Marlin about species, role, and his iconic quote&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vector embedding &lt;code&gt;[0.0, 0.5, 0.2, 0.1, 0.1]&lt;/code&gt; tells us Marlin starts at the first position (0.0), has moderate helpfulness (0.5), is small (0.2), swims cautiously (0.1), and begins with low courage (0.1). In this example, we have used only five dimensions because these are all the attributes we need to implement the scenario. However, you are not limited to this. You can use as many dimensions as you need. Have you ever heard of OpenAI's embedding models that produce 1536 dimensions? That is one example of how far you can go!&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Building the Journey (Out of Order!)
&lt;/h3&gt;

&lt;p&gt;Here's where it gets interesting. In traditional systems, we'd need to insert characters in order. But with vectors, watch this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VADD finding-nemo VALUES 5 6.0 0.9 0.5 0.7 0.7 nigel SETATTR '{"species":"pelican","type":"informant","quote":"Hop inside my mouth!"}'
VADD finding-nemo VALUES 5 1.0 1.0 0.2 0.2 0.2 dory SETATTR '{"species":"blue tang","type":"helper","quote":"Just keep swimming!"}'
VADD finding-nemo VALUES 5 5.0 0.9 1.0 0.9 0.6 whale SETATTR '{"species":"whale","type":"transporter","quote":"*whale sounds*"}'
VADD finding-nemo VALUES 5 3.0 0.7 0.7 0.4 0.3 moonfish SETATTR '{"species":"moonfish","type":"guides","quote":"Follow the EAC!"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice we're adding Nigel (position 6.0) before Dory (position 1.0). In a Linked List, this would break our ordering. But vectors don't care about insertion order. They care about position in space.&lt;/p&gt;

&lt;p&gt;Let's complete our cast:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VADD finding-nemo VALUES 5 7.0 0.0 0.2 0.1 0.8 nemo SETATTR '{"species":"clownfish","type":"son","quote":"Dad!"}'
VADD finding-nemo VALUES 5 4.0 0.9 0.6 0.8 0.5 turtles SETATTR '{"species":"sea turtles","type":"transporters","quote":"Righteous! Righteous!"}'
VADD finding-nemo VALUES 5 2.0 0.7 0.8 0.3 0.3 sharks SETATTR '{"species":"sharks","type":"reformed predators","quote":"Fish are friends, not food!"}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Verifying Our Vector Universe
&lt;/h3&gt;

&lt;p&gt;Let's examine what we've built. What type of data structure did we create?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TYPE finding-nemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vectorset
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Vector Sets provide a way for you to inspect your data very easily. Use the command &lt;code&gt;VCARD&lt;/code&gt; to count how many characters are in our journey.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VCARD finding-nemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(integer) 8
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What if you want to know how many dimensions you are using? This is quite common, as the team that loads vectors into the database is not always the same one that queries them. Use the command &lt;code&gt;VDIM&lt;/code&gt; to find how many dimensions each character has.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VDIM finding-nemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(integer) 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you need to retrieve information about your vector set, use the command &lt;code&gt;VINFO&lt;/code&gt; for this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VINFO finding-nemo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) "quant-type"
2) "int8"
3) "hnsw-m"
4) "16"
5) "vector-dim"
6) "5"
7) "projection-input-dim"
8) "0"
9) "size"
10) "8"
11) "max-level"
12) "1"
13) "attributes-count"
14) "8"
15) "vset-uid"
16) "0"
17) "hnsw-max-node-uid"
18) "8"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Tracing the Journey
&lt;/h3&gt;

&lt;p&gt;Now for the revealing moment. Despite inserting characters in random order, can we trace Marlin's journey correctly? The answer is yes: query from a point "before" the journey starts (position -1.0), and the nearest neighbors come back in journey order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VSIM finding-nemo VALUES 5 -1.0 0.5 0.2 0.1 0.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) "marlin"
2) "dory"
3) "sharks"
4) "moonfish"
5) "turtles"
6) "whale"
7) "nigel"
8) "nemo"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🎉 Perfect! But how did this work?&lt;/p&gt;

&lt;p&gt;The query vector &lt;code&gt;[-1.0, 0.5, 0.2, 0.1, 0.1]&lt;/code&gt; is cleverly designed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Position -1.0 places us "before" the journey starts&lt;/li&gt;
&lt;li&gt;The other values (0.5, 0.2, 0.1, 0.1) match Marlin's characteristics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Redis finds the nearest neighbors in order, effectively tracing the path from start to finish. The significant gaps between journey positions (0, 1, 2... 7) ensure the first dimension dominates the distance calculation.&lt;/p&gt;
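&lt;p&gt;You can verify this with a quick back-of-the-envelope calculation. The sketch below uses plain Euclidean distance as a stand-in for &lt;code&gt;VSIM&lt;/code&gt;'s similarity metric, with a subset of the vectors from the &lt;code&gt;VADD&lt;/code&gt; commands above:&lt;/p&gt;

```python
import math

# Journey query: position -1.0 (before the start), rest matches Marlin.
query = [-1.0, 0.5, 0.2, 0.1, 0.1]

# A subset of the vectors from the VADD commands above.
journey = {
    "marlin": [0.0, 0.5, 0.2, 0.1, 0.1],
    "dory":   [1.0, 1.0, 0.2, 0.2, 0.2],
    "sharks": [2.0, 0.7, 0.8, 0.3, 0.3],
    "nemo":   [7.0, 0.0, 0.2, 0.1, 0.8],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Sorting by distance to the query reproduces the journey order, because
# the gaps in the first dimension (0, 1, 2, ..., 7) dwarf the sub-1.0
# differences in the other four dimensions.
order = sorted(journey, key=lambda name: euclidean(query, journey[name]))
print(order)  # ['marlin', 'dory', 'sharks', 'nemo']
```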

&lt;h3&gt;
  
  
  Step 5: Finding Similar Characters
&lt;/h3&gt;

&lt;p&gt;Vector sets shine at similarity search. Let's explore relationships.  For instance, who are the three characters most similar to Marlin?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VSIM finding-nemo ELE marlin COUNT 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) "marlin"
2) "dory"    # Makes sense - small fish, next in journey
3) "sharks"  # Next encounter after Dory
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Who's closest to Nemo?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VSIM finding-nemo ELE nemo COUNT 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) "nemo"
2) "nigel"   # Met right before reunion
3) "whale"   # Carried Marlin to Sydney
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The similarity considers all dimensions, not just journey position, but also size, helpfulness, and other attributes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 6: Finding Helpers with Filtered Searches
&lt;/h3&gt;

&lt;p&gt;Here's where vector sets truly excel over traditional approaches. Let's find all helpers and transporters near Dory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VSIM finding-nemo ELE dory FILTER '.type == "helper" || .type == "transporters"' COUNT 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1) "dory"
2) "turtles"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This combines vector similarity with attribute filtering, which would require complex queries in traditional databases.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 7: Semantic Queries
&lt;/h3&gt;

&lt;p&gt;Let's find the largest creatures by searching near a point that represents a large, helpful creature.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;VSIM finding-nemo VALUES 5 3.5 0.8 1.0 0.5 0.5 COUNT 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Expected output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;2) "moonfish" # Reef largest (size=0.8)
1) "whale"    # Largest (size=1.0)
3) "turtles"  # Also sizeable (size=0.6)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We didn't need to write &lt;code&gt;WHERE size &amp;gt; 0.7&lt;/code&gt; because the vector space naturally clusters large creatures together. Semantic search is a powerful type of querying that exploits proximity in the data rather than exact matches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We've solved challenges that stump traditional approaches by representing Marlin's journey as vectors. Through the elegant mathematics of vector spaces, we can query by order, find similar entities, filter by attributes, and add data with flexible insertion. While the Finding Nemo example in this blog post may have been whimsical, all the underlying principles are foundational to modern AI and search systems.&lt;/p&gt;

&lt;p&gt;Vector Sets, part of &lt;a href="https://redis.io/open-source/" rel="noopener noreferrer"&gt;Redis Open Source&lt;/a&gt;, provide the perfect environment for exploring data modeling with vector embeddings, while providing a robust implementation of the HNSW algorithm.&lt;/p&gt;

&lt;p&gt;So, the next time you face a complex data modeling challenge, ask yourself, could this be a vector? Who knows. Sometimes the best solutions come from seeing your data from a different dimension. Pun intended.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>vectordatabase</category>
      <category>redis</category>
      <category>vectorsets</category>
    </item>
    <item>
      <title>Semantic Caching with Spring AI &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Thu, 31 Jul 2025 09:37:38 +0000</pubDate>
      <link>https://dev.to/redis/semantic-caching-with-spring-ai-redis-2aa4</link>
      <guid>https://dev.to/redis/semantic-caching-with-spring-ai-redis-2aa4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; You’re building a semantic caching system using Spring AI and Redis to improve LLM application performance.&lt;/p&gt;

&lt;p&gt;Unlike traditional caching that requires exact query matches, semantic caching understands the meaning behind queries and can return cached responses for semantically similar questions.&lt;/p&gt;

&lt;p&gt;It works by storing query-response pairs as vector embeddings in Redis, allowing your application to retrieve cached answers for similar questions without calling the expensive LLM, reducing both latency and costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstkb1wkl2hb1arkilbff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstkb1wkl2hb1arkilbff.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Problem with Traditional LLM Applications
&lt;/h1&gt;

&lt;p&gt;LLMs are powerful but expensive. Every API call costs money and takes time. When users ask similar questions like “What beer goes with grilled meat?” and “Which beer pairs well with barbecue?”, traditional systems would make separate LLM calls even though these queries are essentially asking the same thing.&lt;/p&gt;

&lt;p&gt;Traditional exact-match caching only works if users ask the identical question word-for-word. But in real applications, users phrase questions differently while seeking the same information.&lt;/p&gt;

&lt;h1&gt;
  
  
  How Semantic Caching Works
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=AtVTT_s8AGc&amp;amp;t=1s" rel="noopener noreferrer"&gt;What is a semantic cache?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Semantic caching solves this by understanding the &lt;strong&gt;&lt;em&gt;meaning&lt;/em&gt;&lt;/strong&gt; behind queries rather than matching exact text. When a user asks a question:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The system converts the query into a vector embedding&lt;/li&gt;
&lt;li&gt; It searches for semantically similar cached queries using vector similarity&lt;/li&gt;
&lt;li&gt; If a similar query exists above a certain threshold, it returns the cached response&lt;/li&gt;
&lt;li&gt; If not, it calls the LLM, gets a response, and caches both the query and response for future use&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Behind the scenes, this works thanks to vector similarity search. It turns text into vectors (embeddings) — lists of numbers — stores them in a vector database, and then finds the ones closest to your query when checking for cached responses.&lt;/p&gt;
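&lt;p&gt;The four steps above can be sketched in a few lines of plain Python. This is an illustrative, in-memory toy: the &lt;code&gt;SemanticCache&lt;/code&gt; class, the 2-D vectors, and the &lt;code&gt;embed&lt;/code&gt; function are all hypothetical stand-ins for Redis and a real embedding model, not the Spring AI API used in this article:&lt;/p&gt;

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy in-memory semantic cache illustrating the lookup flow."""
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # function: text -> vector
        self.threshold = threshold  # minimum similarity for a cache hit
        self.entries = []           # list of (vector, response) pairs

    def get(self, query):
        qv = self.embed(query)
        for vec, response in self.entries:
            if cosine_sim(qv, vec) >= self.threshold:
                return response     # semantic hit: skip the LLM call
        return None                 # miss: caller invokes the LLM, then put()s

    def put(self, query, response):
        self.entries.append((self.embed(query), response))

# Hypothetical 2-D "embeddings" standing in for a real embedding model.
toy_vectors = {
    "What beer goes with grilled meat?":    [1.00, 0.10],
    "Which beer pairs well with barbecue?": [0.95, 0.15],
    "What wine goes with fish?":            [0.00, 1.00],
}
cache = SemanticCache(embed=toy_vectors.get)

cache.put("What beer goes with grilled meat?", "Try an amber ale.")
print(cache.get("Which beer pairs well with barbecue?"))  # hit: cached answer
print(cache.get("What wine goes with fish?"))             # None: cache miss
```

In the real application, Redis stores the embeddings and performs the similarity search at scale; the control flow, however, is exactly this get-or-compute-then-put pattern.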

&lt;p&gt;Today, we’re gonna build a semantic caching system for a beer recommendation assistant. It will remember previous responses to similar questions, dramatically improving response times and reducing API costs.&lt;/p&gt;

&lt;p&gt;To do that, we’ll build a Spring Boot app from scratch and use Redis as our semantic cache store. It’ll handle vector embeddings for similarity matching, enabling our application to provide lightning-fast responses for semantically similar queries.&lt;/p&gt;

&lt;h1&gt;
  
  
  Redis as a Semantic Cache for AI Applications
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=Yhv19le0sBw&amp;amp;t=1s" rel="noopener noreferrer"&gt;What's a vector database&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Redis Open Source 8 not only turns the community version of Redis into a Vector Database, but also makes it the fastest and most scalable database in the market today. Redis 8 allows you to scale to one billion vectors without penalizing latency.&lt;/p&gt;

&lt;p&gt;For semantic caching, Redis serves as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A vector store using Redis JSON and the Redis Query Engine for storing query embeddings&lt;/li&gt;
&lt;li&gt;  A metadata store for cached responses and additional context&lt;/li&gt;
&lt;li&gt;  A high-performance search engine for finding semantically similar queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Spring AI and Redis
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=0U1S0WSsPuE&amp;amp;t=1s" rel="noopener noreferrer"&gt;What’s an embedding model?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Spring AI provides a unified API for working with various AI models and vector stores. Combined with Redis, it allows developers to easily build semantic caching systems that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Store and retrieve vector embeddings for semantic search&lt;/li&gt;
&lt;li&gt;  Cache LLM responses with semantic similarity matching&lt;/li&gt;
&lt;li&gt;  Reduce API costs by avoiding redundant LLM calls&lt;/li&gt;
&lt;li&gt;  Improve response times for similar queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Building the Application
&lt;/h1&gt;

&lt;p&gt;Our application will be built using Spring Boot with Spring AI and Redis. It will implement a beer recommendation assistant that caches responses semantically, providing fast answers to similar questions about beer pairings.&lt;/p&gt;

&lt;h2&gt;
  
  
  0. GitHub Repository
&lt;/h2&gt;

&lt;p&gt;The full application can be found on GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/semantic-caching-with-spring-ai" rel="noopener noreferrer"&gt;https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/semantic-caching-with-spring-ai&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Add the required dependencies
&lt;/h2&gt;

&lt;p&gt;From a Spring Boot application, add the following dependencies to your Maven or Gradle file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;implementation("org.springframework.ai:spring-ai-transformers:1.0.0")
implementation("org.springframework.ai:spring-ai-starter-vector-store-redis")
implementation("org.springframework.ai:spring-ai-starter-model-openai")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Configure the Semantic Cache Vector Store
&lt;/h2&gt;

&lt;p&gt;We’ll use Spring AI’s &lt;code&gt;RedisVectorStore&lt;/code&gt; to store and search vector embeddings of cached queries and responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Configuration
class SemanticCacheConfig {
    @Bean
    fun semanticCachingVectorStore(
        embeddingModel: TransformersEmbeddingModel,
        jedisPooled: JedisPooled
    ): RedisVectorStore {
        return RedisVectorStore.builder(jedisPooled, embeddingModel)
            .indexName("semanticCachingIdx")
            .contentFieldName("content")
            .embeddingFieldName("embedding")
            .metadataFields(
                RedisVectorStore.MetadataField("answer", Schema.FieldType.TEXT)
            )
            .prefix("semantic-caching:")
            .initializeSchema(true)
            .vectorAlgorithm(RedisVectorStore.Algorithm.HNSW)
            .build()
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Index Name&lt;/strong&gt;: &lt;code&gt;semanticCachingIdx&lt;/code&gt; — Redis will create an index with this name for searching cached responses&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Field&lt;/strong&gt;: &lt;code&gt;content&lt;/code&gt; — The raw prompt that will be embedded&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Embedding Field&lt;/strong&gt;: &lt;code&gt;embedding&lt;/code&gt; — The field that will store the resulting vector embedding&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Metadata Fields&lt;/strong&gt;: &lt;code&gt;answer&lt;/code&gt;, a TEXT field for storing the LLM's response&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Prefix&lt;/strong&gt;: &lt;code&gt;semantic-caching:&lt;/code&gt; — All keys in Redis will be prefixed with this to organize the data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vector Algorithm&lt;/strong&gt;: HNSW — Hierarchical Navigable Small World algorithm for efficient approximate nearest neighbor search&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Implement the Semantic Caching Service
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;SemanticCachingService&lt;/code&gt; handles storing and retrieving cached responses from Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
class SemanticCachingService(
    private val semanticCachingVectorStore: RedisVectorStore
) {
    private val logger = LoggerFactory.getLogger(SemanticCachingService::class.java)
    fun storeInCache(prompt: String, answer: String) {
        // Create a document for the vector store
        val document = Document(
            prompt,
            mapOf("answer" to answer)
        )
        // Store the document in the vector store
        semanticCachingVectorStore.add(listOf(document))

        logger.info("Stored response in semantic cache for prompt: ${prompt.take(50)}...")
    }
    fun getFromCache(prompt: String, similarityThreshold: Double = 0.8): String? {
        // Execute similarity search
        val results = semanticCachingVectorStore.similaritySearch(
            SearchRequest.builder()
                .query(prompt)
                .topK(1)
                .build()
        )
        // Check if we found a semantically similar query above threshold
        if (results?.isNotEmpty() == true) {
            val score = results[0].score ?: 0.0
            if (score &amp;gt; similarityThreshold) {
                logger.info("Cache hit! Similarity score: $score")
                return results[0].metadata["answer"] as String
            } else {
                logger.info("Similar query found but below threshold. Score: $score")
            }
        }
        logger.info("No cached response found for prompt")
        return null
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the semantic caching service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Stores query-response pairs as vector embeddings in Redis&lt;/li&gt;
&lt;li&gt;  Retrieves cached responses using vector similarity search&lt;/li&gt;
&lt;li&gt;  Configurable similarity threshold for cache hits&lt;/li&gt;
&lt;li&gt;  Comprehensive logging for debugging and monitoring&lt;/li&gt;
&lt;/ul&gt;
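&lt;p&gt;The same cache-first idea can be sketched framework-agnostically. Below is a toy in-memory version in Python; the word-hashing embedder and class names are illustrative stand-ins, not the Spring AI API, and the article's service delegates all of this to &lt;code&gt;RedisVectorStore&lt;/code&gt;:&lt;/p&gt;

```python
import zlib
import numpy as np

def toy_embed(text):
    # Deterministic stand-in for a real embedding model:
    # hash each word into one of 64 buckets, then L2-normalize
    v = np.zeros(64)
    for w in text.lower().split():
        v[zlib.crc32(w.encode()) % 64] += 1.0
    return v / np.linalg.norm(v)

class SemanticCache:
    """Toy in-memory semantic cache using cosine similarity over prompts."""

    def __init__(self, embed, threshold=0.8):
        self.embed = embed
        self.threshold = threshold
        self.entries = []            # list of (unit-norm embedding, answer)

    def store(self, prompt, answer):
        self.entries.append((self.embed(prompt), answer))

    def get(self, prompt):
        if not self.entries:
            return None
        q = self.embed(prompt)
        scores = [q @ emb for emb, _ in self.entries]
        best = int(np.argmax(scores))
        # Only return a hit if the best match clears the threshold
        return self.entries[best][1] if scores[best] > self.threshold else None

cache = SemanticCache(toy_embed)
cache.store("what beer pairs well with pizza", "Try an American IPA.")
print(cache.get("which beer pairs well with pizza"))  # a near-duplicate question
```

&lt;p&gt;A near-duplicate phrasing of a stored question can score above the threshold and return the cached answer without an LLM call, which is exactly the behavior the Kotlin service implements against Redis.&lt;/p&gt;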

&lt;h2&gt;
  
  
  4. Integrate with the RAG Service
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;RagService&lt;/code&gt; orchestrates the semantic caching with the standard RAG pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
class RagService(
    private val chatModel: ChatModel,
    private val vectorStore: RedisVectorStore,
    private val semanticCachingService: SemanticCachingService
) {
    private val logger = LoggerFactory.getLogger(RagService::class.java)
    fun retrieve(message: String): RagResult {
        // Check semantic cache first
        val startCachingTime = System.currentTimeMillis()
        val cachedAnswer = semanticCachingService.getFromCache(message, 0.8)
        val cachingTimeMs = System.currentTimeMillis() - startCachingTime
        if (cachedAnswer != null) {
            logger.info("Returning cached response")
            return RagResult(
                generation = Generation(AssistantMessage(cachedAnswer)),
                metrics = RagMetrics(
                    embeddingTimeMs = 0,
                    searchTimeMs = 0,
                    llmTimeMs = 0,
                    cachingTimeMs = cachingTimeMs,
                    fromCache = true
                )
            )
        }
        // Standard RAG process if no cache hit
        logger.info("No cache hit, proceeding with RAG pipeline")

        // Retrieve relevant documents
        val startEmbeddingTime = System.currentTimeMillis()
        val searchResults = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(message)
                .topK(5)
                .build()
        )
        val embeddingTimeMs = System.currentTimeMillis() - startEmbeddingTime
        // Create context from retrieved documents
        val context = searchResults.joinToString("\n") { it.text ?: "" }

        // Generate response using LLM
        val startLlmTime = System.currentTimeMillis()
        val prompt = createPromptWithContext(message, context)
        val response = chatModel.call(prompt)
        val llmTimeMs = System.currentTimeMillis() - startLlmTime
        // Store the response in semantic cache for future use
        val responseText = response.result.output.text ?: ""
        semanticCachingService.storeInCache(message, responseText)
        return RagResult(
            generation = response.result,
            metrics = RagMetrics(
                embeddingTimeMs = embeddingTimeMs,
                searchTimeMs = 0, // Combined with embedding time
                llmTimeMs = llmTimeMs,
                cachingTimeMs = 0,
                fromCache = false
            )
        )
    }
    private fun createPromptWithContext(query: String, context: String): Prompt {
        val systemMessage = SystemMessage("""
            You are a beer recommendation assistant. Use the provided context to answer 
            questions about beer pairings, styles, and recommendations.

            Context: $context
        """.trimIndent())

        val userMessage = UserMessage(query)

        return Prompt(listOf(systemMessage, userMessage))
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the integrated RAG service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Checks semantic cache before expensive LLM calls&lt;/li&gt;
&lt;li&gt;  Falls back to standard RAG pipeline for cache misses&lt;/li&gt;
&lt;li&gt;  Automatically caches new responses for future use&lt;/li&gt;
&lt;li&gt;  Provides detailed performance metrics including cache hit indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Running the Demo
&lt;/h1&gt;

&lt;p&gt;The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Clone the repository
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/redis-developer/redis-springboot-resources.git
cd redis-springboot-resources/artificial-intelligence/semantic-caching-with-spring-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Configure your environment
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file with your OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-your-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Start the services
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;redis&lt;/strong&gt;: for storing both vector embeddings and cached responses&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;redis-insight&lt;/strong&gt;: a UI to explore the Redis data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;semantic-caching-app&lt;/strong&gt;: the Spring Boot app that implements the semantic caching system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Use the application
&lt;/h2&gt;

&lt;p&gt;When all services are running, go to &lt;code&gt;localhost:8080&lt;/code&gt; to access the demo. You'll see a beer recommendation interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lv3axkphkkkqa213woq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lv3axkphkkkqa213woq.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you click on &lt;code&gt;Start Chat&lt;/code&gt;, the embeddings may still be being created, and you'll see a message asking you to wait for this operation to complete. This is the step where the documents we'll search through are turned into vectors and stored in the database. It runs only the first time the app starts up and is required regardless of which vector database you use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq0n7l8a9n6qdeze522b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq0n7l8a9n6qdeze522b.png" width="554" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all the embeddings have been created, you can start asking your chatbot questions. It will semantically search through the documents we have stored, try to find the best answer for your questions, and cache the responses semantically in Redis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcg5ebqkdyhq9rdu13f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcg5ebqkdyhq9rdu13f.gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you ask something similar to a question that has already been asked, your chatbot will retrieve the answer from the cache instead of sending the query to the LLM, returning a response much faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zqfpbqosu9bl779g02.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zqfpbqosu9bl779g02.gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Exploring the Data in Redis Insight
&lt;/h1&gt;

&lt;p&gt;Redis Insight provides a visual interface for exploring the cached data in Redis. Access it at &lt;code&gt;localhost:5540&lt;/code&gt; to see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Semantic Cache Entries&lt;/strong&gt;: Stored as JSON documents with vector embeddings&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Vector Index Schema&lt;/strong&gt;: The schema used for similarity search&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Performance Metrics&lt;/strong&gt;: Monitor cache hit rates and response times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3i763jso2a7nrfhb22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3i763jso2a7nrfhb22.png" alt="captionless image" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you run the &lt;code&gt;FT.INFO semanticCachingIdx&lt;/code&gt; command in the Redis Insight workbench, you'll see the details of the vector index schema that enables efficient semantic matching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix5rqlugz9lv1uoxecat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix5rqlugz9lv1uoxecat.png" alt="captionless image" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping up
&lt;/h1&gt;

&lt;p&gt;And that’s it — you now have a working semantic caching system using Spring Boot and Redis.&lt;/p&gt;

&lt;p&gt;Instead of making expensive LLM calls for every similar question, your application can now intelligently cache and retrieve responses based on semantic meaning. Redis handles the vector storage and similarity search with the performance and scalability it's known for.&lt;/p&gt;

&lt;p&gt;With Spring AI and Redis, you get an easy way to integrate semantic caching into your Java applications. The combination of vector similarity search for semantic matching and efficient caching gives you a powerful foundation for building cost-effective, high-performance AI applications.&lt;/p&gt;

&lt;p&gt;Whether you’re building chatbots, recommendation engines, or question-answering systems, this semantic caching architecture gives you the tools to dramatically reduce costs while maintaining response quality and improving user experience.&lt;/p&gt;

&lt;p&gt;Try it out, experiment with different similarity thresholds, explore other embedding models, and see how much you can save on LLM costs while delivering faster responses!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Stay Curious!&lt;/strong&gt;
&lt;/h2&gt;

</description>
      <category>redis</category>
      <category>systemdesign</category>
      <category>springboot</category>
      <category>ai</category>
    </item>
    <item>
      <title>Is your Vector Database Really Fast?</title>
      <dc:creator>Ricardo Ferreira</dc:creator>
      <pubDate>Tue, 22 Jul 2025 15:56:57 +0000</pubDate>
      <link>https://dev.to/redis/is-your-vector-database-really-fast-i62</link>
      <guid>https://dev.to/redis/is-your-vector-database-really-fast-i62</guid>
      <description>&lt;p&gt;A few weeks ago, I was re-watching &lt;a href="https://en.wikipedia.org/wiki/Ford_v_Ferrari" rel="noopener noreferrer"&gt;Ford v Ferrari&lt;/a&gt; (a great movie, by the way), and there's this scene where Carroll Shelby explains that winning isn't just about having the fastest car. It's about the perfect lap. The driver, the weather, the tires, the brake assembly, and even the timing of gear shifts matter. Everything matters.&lt;/p&gt;

&lt;p&gt;That got me thinking about vector databases. We spend so much time debating Postgres vs. Redis vs. Pinecone vs. Weaviate vs. Qdrant vs. Milvus, comparing benchmarks, arguing about which one is "fastest." But here's the thing: we're missing the forest for the trees. Like in racing, the database is only one component of a complex system.&lt;/p&gt;

&lt;p&gt;After spending the last few weeks researching vector search systems, I've learned that the difference between a blazing-fast vector search and a sluggish one rarely comes down to which database you picked. It's about understanding the entire system and optimizing each component. Let me share what I've learned about what really makes vector databases fast.&lt;/p&gt;

&lt;p&gt;💡 &lt;strong&gt;Heads up&lt;/strong&gt;: if you are more of a video person and prefer to learn about this from the comfort of your couch, here is one that summarizes this blog post.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/3ZOt3iwBNdE"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Indexing Algorithms: The Engine Under the Hood
&lt;/h2&gt;

&lt;p&gt;Let's start with the heart of any vector database: the indexing algorithm. This allows us to search millions or billions of vectors without comparing every single one. Let's review the most popular ones.&lt;/p&gt;

&lt;h3&gt;
  
  
  HNSW (Hierarchical Navigable Small World)
&lt;/h3&gt;

&lt;p&gt;HNSW is like building a multi-story parking garage for your vectors. Each level has connections between vectors, with the top levels having long-range connections (think express highways) and lower levels having local connections (neighborhood streets).&lt;/p&gt;

&lt;p&gt;Here's what happens during a search:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start at the top layer with few, long-range connections&lt;/li&gt;
&lt;li&gt;Greedily traverse to find the general neighborhood&lt;/li&gt;
&lt;li&gt;Descend to lower layers for increasingly precise navigation&lt;/li&gt;
&lt;li&gt;Final layer has all vectors with dense local connections&lt;/li&gt;
&lt;/ol&gt;
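&lt;p&gt;The four steps above can be sketched as a toy greedy descent in Python. This is an illustration only, with adjacency dicts standing in for a real graph index, not a production HNSW implementation:&lt;/p&gt;

```python
import numpy as np

def greedy_layer_search(neighbors, vectors, query, entry):
    # Greedy walk on one layer: hop to the closest neighbor until stuck
    current = entry
    best = np.linalg.norm(vectors[current] - query)
    improved = True
    while improved:
        improved = False
        for nb in neighbors.get(current, []):
            dist = np.linalg.norm(vectors[nb] - query)
            if dist < best:
                current, best, improved = nb, dist, True
    return current

def hnsw_search(layers, vectors, query, entry):
    # Descend layer by layer, reusing each result as the next entry point
    for neighbors in layers:                      # top (sparse) layer first
        entry = greedy_layer_search(neighbors, vectors, query, entry)
    return entry

# Toy 1-D example: 4 vectors, a sparse top layer and a dense bottom layer
vectors = np.array([[0.0], [1.0], [2.0], [3.0]])
top = {0: [3], 3: [0]}                            # long-range "express" links
bottom = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # dense local links
result = hnsw_search([top, bottom], vectors, np.array([2.1]), entry=0)
print(result)  # index of the nearest vector
```

&lt;p&gt;The top layer jumps the search into the right neighborhood quickly; the bottom layer then refines the result with local hops.&lt;/p&gt;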

&lt;p&gt;The beauty of HNSW is its logarithmic scaling. Doubling your dataset size only adds one more hop to your search path. But this speed comes at a cost in memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Memory usage = Vector data + (M × 2 × sizeof(int) × number_of_vectors × average_layers)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;M&lt;/code&gt; is the number of bi-directional links per node. With M=16, which is a common default, you're looking at roughly 64-128 bytes of overhead per vector. For a million 1536-dimensional float32 vectors, that's:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector data: 5.7 GB&lt;/li&gt;
&lt;li&gt;Index overhead: ~100 MB&lt;/li&gt;
&lt;li&gt;Total: ~5.8 GB&lt;/li&gt;
&lt;/ul&gt;
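&lt;p&gt;The arithmetic above is easy to sanity-check in Python. This is a sketch of the formula only; real implementations add allocator and metadata overhead on top:&lt;/p&gt;

```python
GiB, MiB = 2**30, 2**20

def hnsw_memory(num_vectors, dim, m=16, avg_layers=1.0, bytes_per_float=4):
    # Raw vector storage, plus graph links: M bi-directional links per node,
    # stored as 4-byte ints, per layer the node appears in
    vector_bytes = num_vectors * dim * bytes_per_float
    index_bytes = int(m * 2 * 4 * num_vectors * avg_layers)
    return vector_bytes, index_bytes

vec, idx = hnsw_memory(1_000_000, 1536)
print(f"vectors: {vec / GiB:.1f} GiB, index: {idx / MiB:.0f} MiB")
```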

&lt;p&gt;The key insight: HNSW trades memory for speed. If you have enough RAM to hold your entire dataset, it's hard to beat.&lt;/p&gt;

&lt;h3&gt;
  
  
  IVF (Inverted File Index)
&lt;/h3&gt;

&lt;p&gt;IVF takes a different approach. It's like organizing a library by topic before searching. During index building:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run k-means clustering to create nlist centroids&lt;/li&gt;
&lt;li&gt;Assign each vector to its nearest centroid&lt;/li&gt;
&lt;li&gt;During search, find the nprobe nearest centroids&lt;/li&gt;
&lt;li&gt;Only search vectors within those clusters&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The math here is elegant. If you have N vectors split into √N clusters, each cluster holds about N/√N = √N vectors, so probing only nprobe of them means examining roughly nprobe × √N vectors instead of N. That's a massive reduction.&lt;/p&gt;
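&lt;p&gt;Assuming roughly balanced clusters, the candidate-count reduction is easy to verify (a sketch that ignores cluster imbalance):&lt;/p&gt;

```python
import math

def ivf_candidates(num_vectors, nlist, nprobe):
    # Each of nlist roughly balanced clusters holds N / nlist vectors,
    # so probing nprobe clusters examines nprobe * N / nlist candidates
    return nprobe * num_vectors // nlist

n = 1_000_000
nlist = int(math.sqrt(n))                  # 1,000 clusters of ~1,000 vectors
print(ivf_candidates(n, nlist, nprobe=10))  # candidates examined, out of 1M
```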

&lt;p&gt;But here's where it gets interesting. The optimal nlist isn't always √N. It depends on your data distribution:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# For uniformly distributed data
&lt;/span&gt;&lt;span class="n"&gt;optimal_nlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# For clustered data (common with text embeddings)
&lt;/span&gt;&lt;span class="n"&gt;optimal_nlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;  &lt;span class="c1"&gt;# Start higher
&lt;/span&gt;
&lt;span class="c1"&gt;# For highly skewed distributions
# You might need even more clusters
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Product Quantization: The Compression Game
&lt;/h3&gt;

&lt;p&gt;PQ is fascinating—it's like JPEG compression for vectors. Instead of storing exact coordinates, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Split your vector into m subvectors&lt;/li&gt;
&lt;li&gt;Learn a codebook of 256 centroids for each subspace&lt;/li&gt;
&lt;li&gt;Replace each subvector with its nearest centroid ID (1 byte)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A 1536-dimensional float32 vector (6KB) can be compressed to 96 bytes with m=96, a 98.4% reduction. The tradeoff is accuracy—you're essentially rounding each subvector to one of 256 possible values.&lt;/p&gt;

&lt;p&gt;The clever bit: you can precompute distances between codebook entries. During search, you're just doing lookups and additions, not floating-point multiplications.&lt;/p&gt;
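&lt;p&gt;Here's a minimal numpy sketch of both ideas: encoding each subvector to a 1-byte centroid ID, and asymmetric distance computed via per-query lookup tables. The codebooks below are random for illustration; real systems learn them with k-means per subspace:&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 1536, 96, 256            # dims, subvectors, centroids per subspace
sub = d // m                       # 16 dims per subvector

# Toy codebooks (in practice, learned with k-means on training vectors)
codebooks = rng.standard_normal((m, k, sub)).astype(np.float32)

def encode(x):
    # Replace each subvector with the ID of its nearest centroid (1 byte each)
    codes = np.empty(m, dtype=np.uint8)
    for j in range(m):
        diffs = codebooks[j] - x[j*sub:(j+1)*sub]
        codes[j] = np.argmin((diffs ** 2).sum(axis=1))
    return codes

def adc_distance(q, codes):
    # Build per-subspace distance tables once per query,
    # then scoring a stored code is just lookups and additions
    tables = np.stack([((codebooks[j] - q[j*sub:(j+1)*sub]) ** 2).sum(axis=1)
                       for j in range(m)])
    return tables[np.arange(m), codes].sum()

x = rng.standard_normal(d).astype(np.float32)
codes = encode(x)
print(codes.nbytes)                # 96 bytes instead of 6,144
```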

&lt;h3&gt;
  
  
  Here's my decision framework
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Algorithm&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Memory&lt;/th&gt;
&lt;th&gt;Query Time&lt;/th&gt;
&lt;th&gt;Build Time&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;HNSW&lt;/td&gt;
&lt;td&gt;&amp;lt;1M vectors, &amp;lt;5ms latency&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Fastest&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IVF&lt;/td&gt;
&lt;td&gt;1M-100M vectors&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Fast&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;IVF-PQ&lt;/td&gt;
&lt;td&gt;&amp;gt;100M vectors&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;Slow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&amp;lt; 1M vectors, need &amp;lt; 5ms latency&lt;/strong&gt;: HNSW, no question&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1M-100M vectors, can tolerate 10-20ms&lt;/strong&gt;: IVF with tuned parameters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100M-1B vectors, memory constrained&lt;/strong&gt;: IVF-PQ or optimized product quantization&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;1B+ vectors&lt;/strong&gt;: Distributed IVF-PQ or specialized systems&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  2. Hardware Optimization: The Track Matters
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The GPU Acceleration Paradox
&lt;/h3&gt;

&lt;p&gt;I was surprised that GPUs could slow vector search for many real-world workloads. The issue is data transfer overhead. Moving vectors from CPU to GPU memory takes time—often more than the computation for small batches.&lt;/p&gt;

&lt;p&gt;Consider these benchmarks on a typical setup (NVIDIA A100, PCIe Gen4):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Single query (1536d vector, 1M dataset):
- CPU (AVX-512): 2.3ms
- GPU (including transfer): 5.1ms

Batch of 100 queries:
- CPU: 180ms
- GPU: 22ms

Batch of 1000 queries:
- CPU: 1,750ms
- GPU: 87ms
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The breakeven point is typically around 50-100 concurrent queries. If your workload is primarily single queries, believe it or not, the CPU might be faster!&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory vs. Disk: The Brutal Truth
&lt;/h3&gt;

&lt;p&gt;Everyone knows memory is faster than disk, but the magnitude might surprise you:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Random access latency:
- RAM: ~100 nanoseconds
- NVMe SSD: ~100 microseconds (1,000x slower)
- SATA SSD: ~500 microseconds (5,000x slower)
- HDD: ~10 milliseconds (100,000x slower)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This translates directly to query latency for vector search. A memory-based search might traverse 20 nodes in 2ms, while the same search hitting disk could take 200ms or more.&lt;/p&gt;

&lt;p&gt;But here's the thing—modern NVMe drives with proper prefetching can narrow this gap. I've seen well-tuned disk-based systems achieve 20-30ms latencies for million-scale datasets. The key is minimizing random access:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use larger page sizes (64KB instead of 4KB)&lt;/li&gt;
&lt;li&gt;Implement aggressive prefetching&lt;/li&gt;
&lt;li&gt;Keep hot paths (graph upper layers for HNSW) in memory&lt;/li&gt;
&lt;li&gt;Use memory-mapped files with proper madvise hints&lt;/li&gt;
&lt;/ol&gt;
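&lt;p&gt;Point 4 is accessible even from Python. A small sketch follows; the file and sizes are placeholders, and &lt;code&gt;madvise&lt;/code&gt; hints are Linux/macOS-specific and advisory only:&lt;/p&gt;

```python
import mmap
import os
import tempfile

# Create a small stand-in "index" file (a real one would be the on-disk graph)
path = os.path.join(tempfile.mkdtemp(), "index.bin")
with open(path, "wb") as f:
    f.write(os.urandom(1 << 20))        # 1 MiB of dummy index data

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Advisory hint: we expect to read this region soon, so prefetch it
    mm.madvise(mmap.MADV_WILLNEED, 0, 1 << 16)
    hot = mm[:4096]                     # e.g., keep upper HNSW layers hot
    mm.close()

print(len(hot))
```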

&lt;h3&gt;
  
  
  CPU Vectorization: Free Performance
&lt;/h3&gt;

&lt;p&gt;Modern CPUs have SIMD instructions that can process multiple vector elements simultaneously. The impact is substantial:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;// To compile with AVX-512 support:&lt;/span&gt;
&lt;span class="c1"&gt;// gcc -mavx512f -O3 vector_ops.c -o vector_ops&lt;/span&gt;

&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;immintrin.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="c1"&gt;// Scalar dot product (simplified)&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nf"&gt;dot_product_scalar&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// AVX-512 dot product (processes 16 floats at once)&lt;/span&gt;
&lt;span class="kt"&gt;float&lt;/span&gt; &lt;span class="nf"&gt;dot_product_avx512&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;float&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;__m512&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mm512_setzero_ps&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;__m512&lt;/span&gt; &lt;span class="n"&gt;va&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mm512_loadu_ps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="n"&gt;__m512&lt;/span&gt; &lt;span class="n"&gt;vb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mm512_loadu_ps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
        &lt;span class="n"&gt;sum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_mm512_fmadd_ps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;va&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vb&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_mm512_reduce_add_ps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Distance calculations can be 8-16x faster on modern Intel/AMD processors. Most vector databases enable this automatically, but verify that yours does.&lt;/p&gt;
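&lt;p&gt;You can see the effect without writing intrinsics. The sketch below compares a scalar Python loop against NumPy's &lt;code&gt;np.dot&lt;/code&gt;, which dispatches to a SIMD-vectorized BLAS kernel; the exact speedup depends on your CPU and BLAS build.&lt;/p&gt;

```python
import time
import numpy as np

d = 1024
a = np.random.rand(d).astype(np.float32)
b = np.random.rand(d).astype(np.float32)

# Scalar path: one multiply-add per iteration, no vectorization
start = time.perf_counter()
for _ in range(100):
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
loop_time = time.perf_counter() - start

# Vectorized path: np.dot calls into a SIMD-optimized BLAS kernel
start = time.perf_counter()
for _ in range(100):
    s_np = float(np.dot(a, b))
simd_time = time.perf_counter() - start

print(f"scalar: {loop_time:.4f}s, vectorized: {simd_time:.4f}s")
```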

&lt;h2&gt;
  
  
  3. Distance Metrics &amp;amp; Dimensionality: The Physics of Similarity
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The Computational Cost Hierarchy
&lt;/h3&gt;

&lt;p&gt;Not all distance metrics are created equal. Here's the computational cost for d-dimensional vectors:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dot Product&lt;/strong&gt;: d multiplications, d-1 additions&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time complexity: O(d)
Operations: 2d - 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Euclidean Distance&lt;/strong&gt;: d subtractions, d multiplications, d-1 additions, 1 square root&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time complexity: O(d)
Operations: 3d
Note: Can skip square root for nearest neighbor search
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cosine Similarity&lt;/strong&gt;: 3d multiplications, 3d-2 additions, 1 division, 2 square roots&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Time complexity: O(d)
Operations: 6d + 1 (if not pre-normalized)
Operations: 2d - 1 (if pre-normalized)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The lesson? If you're using cosine similarity, &lt;strong&gt;always pre-normalize your vectors&lt;/strong&gt;. This one-time cost at insertion may save 67% of computation on every query.&lt;/p&gt;
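&lt;p&gt;A minimal sketch of what pre-normalization buys you: after a one-time division by the L2 norm at insertion time, the query-time cosine computation collapses to a plain dot product.&lt;/p&gt;

```python
import numpy as np

vectors = np.random.randn(1000, 384).astype(np.float32)
query = np.random.randn(384).astype(np.float32)

# One-time cost at insertion: divide each vector by its L2 norm
unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
unit_query = query / np.linalg.norm(query)

# Query time: cosine similarity is now just a dot product
fast_cosine = unit_vectors @ unit_query

# Full cosine formula on the raw vectors, for comparison
full_cosine = (vectors @ query) / (
    np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
)

print(np.allclose(fast_cosine, full_cosine, atol=1e-5))
```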

&lt;h3&gt;
  
  
  The Curse of Dimensionality
&lt;/h3&gt;

&lt;p&gt;High dimensions aren't just about more computation. They fundamentally change the geometry of your search space. With higher dimensions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All vectors become approximately equidistant&lt;/li&gt;
&lt;li&gt;The ratio between nearest and furthest neighbors approaches 1&lt;/li&gt;
&lt;li&gt;Traditional indexing structures become less effective&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've measured this effect directly. With random vectors:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;10 dimensions: Nearest/furthest neighbor ratio ≈ 0.5&lt;/li&gt;
&lt;li&gt;100 dimensions: Ratio ≈ 0.8&lt;/li&gt;
&lt;li&gt;1000 dimensions: Ratio ≈ 0.95&lt;/li&gt;
&lt;/ul&gt;
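&lt;p&gt;The effect is easy to reproduce. This sketch measures the nearest/furthest distance ratio for uniform random vectors; the exact numbers depend on the distribution, but the climb toward 1 is consistent.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(42)

def nn_ratio(dim, n=2000):
    # Ratio of nearest to furthest neighbor distance from a random query point
    points = rng.random((n, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

for dim in (10, 100, 1000):
    print(dim, round(nn_ratio(dim), 2))
```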

&lt;p&gt;This is why dimension reduction can paradoxically improve search quality while reducing computation. Here's what I've used to verify this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.decomposition&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;PCA&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.random_projection&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GaussianRandomProjection&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Analyze intrinsic dimensionality
# Assuming 'vectors' is a numpy array of shape (n_samples, n_features)
&lt;/span&gt;&lt;span class="n"&gt;pca&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;PCA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_components&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Retain 95% variance
&lt;/span&gt;&lt;span class="n"&gt;pca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Intrinsic dimensionality: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;pca&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;n_components_&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# If significant reduction possible, consider:
# 1. PCA for optimal reduction (slower, better quality)
# 2. Random projection for fast reduction (faster, good quality)
# 3. Autoencoder for non-linear reduction (slowest, best quality)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Embedding Models: The Hidden Performance Lever
&lt;/h2&gt;

&lt;p&gt;Your choice of embedding model doesn't just affect search quality. It has massive performance implications. Here is what I've learned.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model Characteristics That Matter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Dimensionality&lt;/strong&gt;: This is obvious but often overlooked. OpenAI's text-embedding-ada-002 produces 1536-dimensional vectors, while many open-source models produce 384 or 768 dimensions, which means a quarter or half the computation and memory per vector.&lt;/p&gt;
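&lt;p&gt;A back-of-envelope calculation makes the difference concrete (raw float32 storage only; index structures add overhead on top):&lt;/p&gt;

```python
def vector_storage_gb(num_vectors, dims, bytes_per_component=4):
    # Raw float32 vector storage; graph links, metadata, etc. come on top
    return num_vectors * dims * bytes_per_component / 1024**3

# One million vectors at 1536 dims vs. a 384-dim open-source model
print(round(vector_storage_gb(1_000_000, 1536), 2))  # 5.72 GB
print(round(vector_storage_gb(1_000_000, 384), 2))   # 1.43 GB
```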

&lt;p&gt;&lt;strong&gt;Vector Distribution&lt;/strong&gt;: Some models produce vectors with very different statistical properties:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Analyzing vector distributions
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;analyze_vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Average norm
&lt;/span&gt;    &lt;span class="n"&gt;norms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;linalg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;norm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axis&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Avg norm: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; ± &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;std&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;norms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Sparsity (near-zero components)
&lt;/span&gt;    &lt;span class="n"&gt;sparsity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;abs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Sparsity: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sparsity&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Component distribution
&lt;/span&gt;    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Component range: [&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I've found that models with more uniform distributions (higher entropy) create more challenging search spaces. Models with sparse activations can benefit from specialized indexes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Numerical Precision&lt;/strong&gt;: Many embeddings maintain quality with reduced precision:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Test precision reduction impact
# Assuming you have a function to load your vectors
# original_vectors = load_vectors()  # float32
&lt;/span&gt;
&lt;span class="c1"&gt;# Example with random vectors for demonstration:
&lt;/span&gt;&lt;span class="n"&gt;original_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;384&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;float16_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;original_vectors&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float16&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;int8_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;original_vectors&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;127&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Measure recall degradation
# Often &amp;lt; 1% loss for float16, &amp;lt; 5% for int8
# You would need to implement a recall measurement function
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
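&lt;p&gt;A sketch of one way to measure that recall degradation, comparing exact brute-force neighbors before and after a float16 round-trip. &lt;code&gt;recall_at_k&lt;/code&gt;, &lt;code&gt;top_k&lt;/code&gt;, and the dataset sizes are illustrative, not a standard API.&lt;/p&gt;

```python
import numpy as np

def recall_at_k(true_neighbors, approx_neighbors, k=10):
    # Fraction of the true top-k that the approximate result also returned
    hits = sum(
        len(set(t[:k]).intersection(a[:k]))
        for t, a in zip(true_neighbors, approx_neighbors)
    )
    return hits / (k * len(true_neighbors))

def top_k(base, q, k=10):
    # Exact brute-force nearest neighbors by Euclidean distance
    return np.argsort(np.linalg.norm(base - q, axis=1))[:k]

rng = np.random.default_rng(0)
base = rng.standard_normal((5000, 64)).astype(np.float32)
queries = rng.standard_normal((20, 64)).astype(np.float32)

exact = [top_k(base, q) for q in queries]
fp16_base = base.astype(np.float16).astype(np.float32)
approx = [top_k(fp16_base, q) for q in queries]

print(recall_at_k(exact, approx))
```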



&lt;h3&gt;
  
  
  The Model-Database Co-Design Opportunity
&lt;/h3&gt;

&lt;p&gt;Here's an insight that's often missed: your embedding model and vector database should be designed together. Example optimizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Binary embeddings&lt;/strong&gt; (SBERT with binary quantization) can use Hamming distance, enabling bitwise operations&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sparse embeddings&lt;/strong&gt; (SPLADE, ColBERT) benefit from inverted indexes rather than dense indexes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-vector embeddings&lt;/strong&gt; need specialized retrieval strategies&lt;/li&gt;
&lt;/ol&gt;
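&lt;p&gt;The first optimization is easy to sketch in plain NumPy: binarize by sign, pack the bits, and Hamming distance becomes an XOR plus a popcount.&lt;/p&gt;

```python
import numpy as np

rng = np.random.default_rng(1)
embeddings = rng.standard_normal((4, 256)).astype(np.float32)

# Binarize by sign, then pack 256 bits into 32 bytes per vector
bits = np.packbits((embeddings > 0).astype(np.uint8), axis=1)

def hamming(a, b):
    # XOR the packed bytes, then count the set bits (popcount)
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

print(hamming(bits[0], bits[1]))  # number of differing bits, out of 256
```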

&lt;h2&gt;
  
  
  5. Index Tuning: The Art of Configuration
&lt;/h2&gt;

&lt;p&gt;Default parameters are rarely optimal. Here's how to systematically tune your index:&lt;/p&gt;

&lt;h3&gt;
  
  
  HNSW Tuning Strategy
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tune_hnsw_parameters&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ground_truth&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    This is a template function. You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll need to implement:
    - build_hnsw: function to build HNSW index with given parameters
    - benchmark_index: function to measure recall and queries per second
    - index.memory_usage(): placeholder below; substitute your library's memory accounting

    Example usage with FAISS:
    import faiss

    def build_hnsw(vectors, M, ef_construction):
        index = faiss.IndexHNSWFlat(vectors.shape[1], M)
        index.hnsw.efConstruction = ef_construction
        index.add(vectors)
        return index
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="c1"&gt;# Test M values (connectivity)
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="c1"&gt;# Test ef_construction values (build quality)
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ef_c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;build_hnsw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ef_construction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ef_c&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# Test ef_search values (search quality)
&lt;/span&gt;            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ef_s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qps&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;benchmark_index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ground_truth&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ef_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ef_s&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;M&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;M&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ef_construction&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ef_c&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ef_search&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ef_s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;recall&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;qps&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;qps&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory_mb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;memory_usage&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;1024&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key insights from tuning hundreds of indexes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;M=16 is a sweet spot for most workloads&lt;/li&gt;
&lt;li&gt;ef_construction can often be lower than defaults (200 vs 500) with minimal impact&lt;/li&gt;
&lt;li&gt;ef_search can be adjusted per query based on importance&lt;/li&gt;
&lt;/ul&gt;
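&lt;p&gt;M also drives memory. A commonly quoted rule of thumb (hnswlib's documentation uses a similar estimate) is roughly d*4 bytes of raw floats plus about M*2 neighbor links of 4 bytes each per vector; the helper below is just that arithmetic, not a library call.&lt;/p&gt;

```python
def hnsw_memory_mb(num_vectors, dims, M):
    # Rough estimate: float32 components plus ~M*2 neighbor links of 4 bytes each
    bytes_per_vector = dims * 4 + M * 2 * 4
    return num_vectors * bytes_per_vector / 1024**2

# 1M vectors, 384 dims: see how memory grows with connectivity
for M in (8, 16, 32, 64):
    print(M, round(hnsw_memory_mb(1_000_000, 384, M)))
```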

&lt;h3&gt;
  
  
  IVF Tuning Strategy
&lt;/h3&gt;

&lt;p&gt;The nlist/nprobe relationship is critical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;

&lt;span class="c1"&gt;# Theoretical optimal for uniform data
# Assuming you have the number of vectors
&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000000&lt;/span&gt;  &lt;span class="c1"&gt;# Example
&lt;/span&gt;&lt;span class="n"&gt;nlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_vectors&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;span class="c1"&gt;# But real data isn't uniform. Measure cluster imbalance:
# This assumes you've already performed clustering
# Example implementation:
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn.cluster&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;KMeans&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_cluster_imbalance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nlist&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;kmeans&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;KMeans&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n_clusters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;nlist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;random_state&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kmeans&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_predict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;cluster_sizes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;bincount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;imbalance_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_sizes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cluster_sizes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;imbalance_ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Highly imbalanced - need more clusters
&lt;/span&gt;        &lt;span class="n"&gt;nlist&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;nlist&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;imbalance_ratio&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
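&lt;p&gt;The intuition behind that relationship is worth making explicit: with roughly balanced clusters, each probe scans about num_vectors/nlist candidates, so nprobe/nlist is the fraction of the dataset you touch per query. A back-of-envelope helper (illustrative, not a library API):&lt;/p&gt;

```python
import numpy as np

def ivf_scan_fraction(num_vectors, nprobe, nlist=None):
    # With nlist ~ sqrt(N) and balanced clusters, each probe scans ~N/nlist vectors
    if nlist is None:
        nlist = int(np.sqrt(num_vectors))
    vectors_scanned = nprobe * (num_vectors / nlist)
    return nlist, vectors_scanned / num_vectors

nlist, fraction = ivf_scan_fraction(1_000_000, nprobe=10)
print(nlist, fraction)  # 1000 clusters, 1% of vectors scanned per query
```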



&lt;h2&gt;
  
  
  6. Chunking Strategies: The Multiplier Effect
&lt;/h2&gt;

&lt;p&gt;Chunking might seem like a preprocessing detail, but it's actually a major performance factor. Your chunking strategy determines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Vector count&lt;/strong&gt; (linear impact on search time)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector quality&lt;/strong&gt; (affects search precision)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Index efficiency&lt;/strong&gt; (impacts clustering and traversal)&lt;/li&gt;
&lt;/ol&gt;
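&lt;p&gt;The vector-count impact is pure arithmetic: chunk size and overlap directly set how many vectors a corpus produces. A small illustrative helper, assuming fixed-size chunking:&lt;/p&gt;

```python
def chunk_count(total_tokens, chunk_size, overlap=0):
    # Number of vectors produced by fixed-size chunking with optional overlap
    stride = chunk_size - overlap
    return -(-max(total_tokens - overlap, 0) // stride)  # ceiling division

# A 1M-token corpus: overlap raises the vector count, and search cost with it
print(chunk_count(1_000_000, 512))       # 1954 vectors
print(chunk_count(1_000_000, 512, 128))  # 2604 vectors
```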

&lt;h3&gt;
  
  
  Performance-Oriented Chunking
&lt;/h3&gt;

&lt;p&gt;This is how I tested my chunks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;optimize_chunking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;target_latency_ms&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Template function for chunking optimization.

    You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll need to implement:
    - estimate_max_vectors: based on your benchmarks
    - use_hierarchical_chunking: your chunking strategy
    - test_semantic_coherence: your quality measurement
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Example implementation
&lt;/span&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;estimate_max_vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_latency_ms&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Based on your benchmarks, e.g., 1ms per 10k vectors
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_latency_ms&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Estimate vectors needed for target latency
&lt;/span&gt;    &lt;span class="n"&gt;max_vectors&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;estimate_max_vectors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_latency_ms&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Calculate optimal chunk size
&lt;/span&gt;    &lt;span class="c1"&gt;# Assuming documents have a 'tokens' attribute
&lt;/span&gt;    &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tokens&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;optimal_chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;total_tokens&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;max_vectors&lt;/span&gt;

    &lt;span class="c1"&gt;# Adjust for semantic boundaries
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;optimal_chunk_size&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Too small - lose context
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consider hierarchical chunking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# use_hierarchical_chunking()
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;optimal_chunk_size&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Might be too large - test precision
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Consider testing semantic coherence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# test_semantic_coherence()
&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;optimal_chunk_size&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Putting It All Together
&lt;/h2&gt;

&lt;p&gt;After all this technical detail, here's the practical framework I use for optimization:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Optimization Checklist
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Profile First&lt;/strong&gt;: Measure where time is actually spent&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break latency down by stage (embedding, index traversal, post-filtering) to find the real culprit&lt;/li&gt;
&lt;li&gt;Adopt an observability-first strategy: instrument before you tune, so every change is measurable&lt;/li&gt;
&lt;li&gt;Isolate outliers and validate them; tail latency often has different causes than the median&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Low-Hanging Fruit&lt;/strong&gt; (often 2-5x improvement):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pre-normalize vectors for cosine similarity&lt;/li&gt;
&lt;li&gt;Enable CPU vectorization&lt;/li&gt;
&lt;li&gt;Tune batch sizes for your hardware&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Algorithmic Changes&lt;/strong&gt; (can be 10x+ improvement):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the right index for your scale&lt;/li&gt;
&lt;li&gt;Consider quantization for large datasets&lt;/li&gt;
&lt;li&gt;Optimize chunking strategy&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;System-Level Optimization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Co-locate compute and data&lt;/li&gt;
&lt;li&gt;Use connection pooling&lt;/li&gt;
&lt;li&gt;Implement caching for repeated queries&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
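&lt;p&gt;As a concrete sketch of the first "low-hanging fruit" item, here's what pre-normalization looks like in NumPy (the function name is mine, for illustration): normalize once at indexing time, and every cosine-similarity query collapses into a single dot product.&lt;/p&gt;

```python
import numpy as np

def l2_normalize(vectors):
    # Divide each row by its L2 norm once, at indexing time,
    # so per-query norm computations disappear entirely.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

rng = np.random.default_rng(42)
corpus = l2_normalize(rng.standard_normal((10_000, 384)))
query = l2_normalize(rng.standard_normal((1, 384)))[0]

# With unit-length vectors, cosine similarity is just a dot product.
scores = corpus @ query
top5 = np.argsort(scores)[-5:][::-1]
print(top5)
```

&lt;p&gt;The same trick applies regardless of which vector database sits underneath, because the savings happen before the index is ever queried.&lt;/p&gt;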

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Vector database performance isn't about picking the "fastest" database—it's about understanding and optimizing the entire system. Depending on how you configure and use it, the same database can be blazing fast or frustratingly slow.&lt;/p&gt;

&lt;p&gt;The good news? You don't need a multi-million-dollar budget to achieve great performance. You need to understand the system and optimize methodically. What's your experience with vector database optimization? Have you found other bottlenecks I didn't cover? Let me know—I'm always learning, and the best insights often come from real-world production challenges.&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>database</category>
      <category>dataengineering</category>
      <category>performance</category>
    </item>
    <item>
      <title>Agent Long-term Memory with Spring AI &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Wed, 16 Jul 2025 19:57:23 +0000</pubDate>
      <link>https://dev.to/redis/agent-memory-with-spring-ai-redis-58g5</link>
      <guid>https://dev.to/redis/agent-memory-with-spring-ai-redis-58g5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;TL;DR:&lt;br&gt;
We're building an AI agent with memory using Spring AI and Redis.&lt;/p&gt;

&lt;p&gt;Unlike traditional chatbots that forget previous interactions, memory-enabled agents can recall past conversations and facts.&lt;/p&gt;

&lt;p&gt;It works by storing two types of memory in Redis: short-term (conversation history) and long-term (facts and experiences as vectors), allowing agents to provide personalized, context-aware responses.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLMs respond to each message in isolation, treating every interaction as if it's the first time they've spoken with a user. They lack the ability to remember previous conversations, preferences, or important facts.&lt;/p&gt;

&lt;p&gt;Memory-enabled AI agents, on the other hand, can maintain context across multiple interactions. They remember who you are, what you've told them before, and can use that information to provide more personalized, relevant responses.&lt;/p&gt;

&lt;p&gt;In a travel assistant scenario, for example, if a user mentions "I'm allergic to shellfish" in one conversation, and later asks for restaurant recommendations in Boston, a memory-enabled agent would recall the allergy information and filter out inappropriate suggestions, creating a much more helpful and personalized experience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/0U1S0WSsPuE" rel="noopener noreferrer"&gt;What is an embedding model?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Behind the scenes, this works thanks to vector similarity search. It turns text into vectors (embeddings) — lists of numbers — stores them in a vector database, and then finds the ones closest to your query when relevant information needs to be recalled.&lt;/p&gt;
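&lt;p&gt;To make that concrete, here's a deliberately tiny sketch. The bag-of-words "embedding" below is a toy stand-in for a real embedding model (which would place similar meanings close together even without shared words), but the recall mechanics are the same: embed the stored memories, embed the query, and return the nearest match.&lt;/p&gt;

```python
import numpy as np

VOCAB = ["shellfish", "allergy", "restaurant", "boston", "flight", "hotel"]

def embed(text):
    # Toy bag-of-words "embedding", for illustration only;
    # a real system would call a trained embedding model here.
    words = set(text.lower().split())
    vec = np.array([1.0 if w in words else 0.0 for w in VOCAB])
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

memories = [
    "user has a shellfish allergy",
    "user prefers hotel stays near the beach",
    "user booked a flight to boston",
]
index = np.stack([embed(m) for m in memories])

# Cosine similarity between the query and every stored memory.
query = embed("restaurant ideas given the shellfish allergy")
scores = index @ query
print(memories[int(np.argmax(scores))])
```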

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/o3XN4dImESE" rel="noopener noreferrer"&gt;What is semantic search?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, we're going to build a memory-enabled AI agent that helps users plan travel. It will remember user preferences, past trips, and important details across multiple conversations — even if the user leaves and comes back later.&lt;/p&gt;

&lt;p&gt;To do that, we'll build a Spring Boot app from scratch and use Redis as our memory store. It'll handle both short-term memory (conversation history) and long-term memory (facts and preferences as vector embeddings), enabling our agent to provide truly personalized assistance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis as a Memory Store for AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/Yhv19le0sBw" rel="noopener noreferrer"&gt;What is a vector database?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over the last 15 years, Redis has become foundational infrastructure for real-time applications. Today, with Redis Open Source 8, it's committed to becoming foundational infrastructure for AI applications as well.&lt;/p&gt;

&lt;p&gt;Redis Open Source 8 not only turns the community version of Redis into a vector database, but also makes it the fastest and most scalable vector database on the market today: Redis 8 lets you scale to one billion vectors without penalizing latency.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://redis.io/blog/searching-1-billion-vectors-with-redis-8/" rel="noopener noreferrer"&gt;https://redis.io/blog/searching-1-billion-vectors-with-redis-8/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For AI agents, Redis serves as both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A short-term memory store using Redis Lists to maintain conversation history&lt;/li&gt;
&lt;li&gt;A long-term memory store using Redis JSON and the Redis Query Engine that enables vector search to store and retrieve facts and experiences&lt;/li&gt;
&lt;/ol&gt;
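&lt;p&gt;A quick way to picture the difference between these two roles (plain-Python stand-ins here, not actual Redis calls; in Redis itself these would be LPUSH/LTRIM/LRANGE on a List, and a vector query against the index):&lt;/p&gt;

```python
from collections import deque

# Short-term memory: an ordered, capped conversation history,
# like a Redis List kept to the most recent N entries with LTRIM.
short_term = deque(maxlen=4)
for turn in [
    "hi",
    "I'm allergic to shellfish",
    "plan a trip to Boston",
    "any restaurant suggestions?",
    "thanks!",
]:
    short_term.append(turn)

# Long-term memory: durable facts recalled by meaning rather than
# recency (stored as JSON documents plus embeddings in Redis).
long_term = {"allergy": "shellfish", "preferred_city": "Boston"}

print(list(short_term))  # the oldest turn ("hi") has been trimmed away
print(long_term["allergy"])
```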

&lt;h2&gt;
  
  
  Spring AI and Redis
&lt;/h2&gt;

&lt;p&gt;Spring AI provides a unified API for working with various AI models and vector stores. Combined with Redis, it makes it easy to build memory-enabled AI agents that can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store and retrieve vector embeddings for semantic search&lt;/li&gt;
&lt;li&gt;Maintain conversation context across sessions&lt;/li&gt;
&lt;li&gt;Extract and deduplicate memories from conversations&lt;/li&gt;
&lt;li&gt;Summarize long conversations to prevent context window overflow&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Building the Application
&lt;/h2&gt;

&lt;p&gt;Our application will be built using Spring Boot with Spring AI and Redis. It will implement a travel assistant that remembers user preferences and past trips, providing personalized recommendations based on this memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  0. GitHub Repository
&lt;/h3&gt;

&lt;p&gt;The full application can be found on GitHub: &lt;a href="https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/agent-long-term-memory-with-spring-ai" rel="noopener noreferrer"&gt;https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/agent-long-term-memory-with-spring-ai&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add the required dependencies
&lt;/h3&gt;

&lt;p&gt;In your Spring Boot application, add the following dependencies to your Maven or Gradle build file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-transformers:1.0.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-starter-vector-store-redis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-starter-model-openai"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.redis.om:redis-om-spring:1.0.0-RC3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Define the Memory model
&lt;/h3&gt;

&lt;p&gt;The core of our implementation is the &lt;code&gt;Memory&lt;/code&gt; class, which represents items stored in long-term memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;EPISODIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Personal experiences and preferences&lt;/span&gt;
    &lt;span class="nc"&gt;SEMANTIC&lt;/span&gt;   &lt;span class="c1"&gt;// General knowledge and facts&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure the Vector Store
&lt;/h3&gt;

&lt;p&gt;We'll use Spring AI's &lt;code&gt;RedisVectorStore&lt;/code&gt; to store and search vector embeddings of memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryVectorStoreConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;jedisPooled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JedisPooled&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jedisPooled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"longTermMemoryIdx"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contentFieldName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embeddingFieldName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadataFields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TAG&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TAG&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"long-term-memory:"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initializeSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorAlgorithm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Algorithm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HSNW&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index Name&lt;/strong&gt;: &lt;code&gt;longTermMemoryIdx&lt;/code&gt; - Redis will create an index with this name for searching memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Field&lt;/strong&gt;: &lt;code&gt;content&lt;/code&gt; - The raw memory content that will be embedded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Field&lt;/strong&gt;: &lt;code&gt;embedding&lt;/code&gt; - The field that will store the resulting vector embedding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Fields&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;memoryType&lt;/code&gt;: TAG field for filtering by memory type (EPISODIC or SEMANTIC)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;metadata&lt;/code&gt;: TEXT field for storing additional context about the memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;userId&lt;/code&gt;: TAG field for filtering by user ID&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;createdAt&lt;/code&gt;: TEXT field for storing the creation timestamp&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
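&lt;p&gt;Because &lt;code&gt;initializeSchema(true)&lt;/code&gt; is set, Spring AI creates this index for you on startup. Conceptually, the result is roughly equivalent to an &lt;code&gt;FT.CREATE&lt;/code&gt; along these lines (illustrative only: the exact schema Spring AI emits may differ, and the vector dimension depends on your embedding model — 384 below assumes the default transformers model):&lt;/p&gt;

```
FT.CREATE longTermMemoryIdx ON JSON PREFIX 1 "long-term-memory:" SCHEMA
  $.content    AS content    TEXT
  $.embedding  AS embedding  VECTOR HNSW 6 TYPE FLOAT32 DIM 384 DISTANCE_METRIC COSINE
  $.memoryType AS memoryType TAG
  $.metadata   AS metadata   TEXT
  $.userId     AS userId     TAG
  $.createdAt  AS createdAt  TEXT
```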

&lt;h3&gt;
  
  
  4. Implement the Memory Service
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;MemoryService&lt;/code&gt; handles storing and retrieving memories from Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;systemUserId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;storeMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Check if a similar memory already exists to avoid duplicates&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;similarMemoryExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Create a document for the vector store&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;document&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"memoryType"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"metadata"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"userId"&lt;/span&gt; &lt;span class="nf"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="s"&gt;"createdAt"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Store the document in the vector store&lt;/span&gt;
        &lt;span class="n"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;retrieveMemories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distanceThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Float&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9f&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Build filter expression&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;b&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FilterExpressionBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;filterList&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;FilterExpressionBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;// Add user filter&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;effectiveUserId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;
        &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;effectiveUserId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

        &lt;span class="c1"&gt;// Add memory type filter if specified&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Combine filters&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;filterExpression&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
            &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;// Execute search&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;searchResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similaritySearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;SearchRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filterExpression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Transform results to StoredMemory objects&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;searchResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mapNotNull&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distanceThreshold&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryObj&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;valueOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SEMANTIC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryObj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;null&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the memory service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores memories as vector embeddings in Redis&lt;/li&gt;
&lt;li&gt;Retrieves memories using vector similarity search&lt;/li&gt;
&lt;li&gt;Filters memories by user ID and memory type&lt;/li&gt;
&lt;li&gt;Prevents duplicate memories through similarity checking&lt;/li&gt;
&lt;/ul&gt;
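The score-thresholding step behind that retrieval behavior can be exercised in isolation. A minimal pure-Kotlin sketch (the `Hit` type and `filterByThreshold` name are ours, hypothetical stand-ins for a vector search result, not the article's actual code):

```kotlin
// Hypothetical stand-in for a vector search hit: content plus an optional similarity score.
data class Hit(val content: String, val score: Double?)

// Mirrors the service's filtering step: keep only hits whose score exceeds
// the threshold, treating a missing score as 1.0 (i.e. keep the hit).
fun filterByThreshold(hits: List<Hit>, distanceThreshold: Double): List<String> =
    hits.mapNotNull { hit ->
        if (distanceThreshold < (hit.score ?: 1.0)) hit.content else null
    }
```

With a threshold of 0.5, for example, a hit scoring 0.9 survives, a hit scoring 0.2 is dropped, and a hit with no score is kept.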

&lt;h3&gt;
  
  
  5. Implement Spring AI Advisors
&lt;/h3&gt;

&lt;p&gt;We’re going to rely on the Spring AI Advisors API. Advisors are a way to intercept, modify, and enhance AI-driven interactions.&lt;br&gt;
We will implement two advisors: one for retrieval and one for recording. These advisors will be plugged into our ChatClient and will intercept every interaction with the LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Advisor for Long-term memory retrieval
&lt;/h3&gt;

&lt;p&gt;The retrieval advisor runs before each LLM call. It takes the user’s current message, performs a vector similarity search over Redis, and injects the most relevant memories into the system portion of the prompt so the model can ground its answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRetrievalAdvisor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;companion&lt;/span&gt; &lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;USER_ID&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;   
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;TOP_K&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ltm_top_k"&lt;/span&gt;      
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getOrder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HIGHEST_PRECEDENCE&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"LongTermMemoryRetrievalAdvisor"&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;adviseCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClientRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisorChain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClientResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;context&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="nc"&gt;USER_ID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;k&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;context&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="nc"&gt;TOP_K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieveRelevantMemories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryBlock&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Use the MEMORY below if relevant. Keep answers factual and concise."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"----- MEMORY -----"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEachIndexed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${i+1}. ${m.memory.content}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"------------------"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;enrichedPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;augmentSystemMessage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;existing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
      &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryBlock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotBlank&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
              &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;enrichedReq&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrichedPrompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nextCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrichedReq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
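The MEMORY block assembly is easy to verify on its own. A self-contained sketch of the same `buildString` formatting (the function name `buildMemoryBlock` is ours, not the article's):

```kotlin
// Formats retrieved memories the same way the advisor does before
// prepending them to the system message.
fun buildMemoryBlock(memories: List<String>): String = buildString {
    appendLine("Use the MEMORY below if relevant. Keep answers factual and concise.")
    appendLine("----- MEMORY -----")
    memories.forEachIndexed { i, m -> appendLine("${i + 1}. $m") }
    appendLine("------------------")
}
```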



&lt;h3&gt;
  
  
  5.2 Advisor for Long-term memory recording
&lt;/h3&gt;

&lt;p&gt;The recorder advisor runs after the assistant responds. It looks at the last user message and the assistant’s reply, asks the model to extract atomic, useful facts (episodic or semantic), deduplicates them, and stores them in Redis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRecorderAdvisor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModel&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;MemoryCandidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt;
  &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MemoryCandidate&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractorConverter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeanOutputConverter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getOrder&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HIGHEST_PRECEDENCE&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getName&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"LongTermMemoryRecorderAdvisor"&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;adviseCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClientRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisorChain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClientResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1) Proceed with the normal call (other advisors may have enriched the prompt)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;res&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nextCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// 2) Build extraction prompt (user + assistant text of *this* turn)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;assistantText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chatResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="c1"&gt;// 3) Ask the model to extract long-term memories as structured JSON&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;schemaHint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractorConverter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonSchema&lt;/span&gt; &lt;span class="c1"&gt;// JSON schema string for the POJO&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractSystem&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
            You extract LONG-TERM MEMORIES from a dialogue turn.

            A memory is either:

            1. EPISODIC MEMORIES: Personal experiences and user-specific preferences
               Examples: "User prefers Delta airlines", "User visited Paris last year"

            2. SEMANTIC MEMORIES: General domain knowledge and facts
               Examples: "Singapore requires passport", "Tokyo has excellent public transit"

            Only extract clear, factual information. Do not make assumptions or infer information that isn't explicitly stated.
            If no memories can be extracted, return an empty array.

            The instance must conform to this JSON Schema (for validation, do not output it):
              $schemaHint

            Do not include code fences, schema, or properties. Output a single-line JSON object.
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractUser&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
            USER SAID:
            $userText

            ASSISTANT REPLIED:
            $assistantText

            Extract up to 5 memories with correct type; set userId if present/known.
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatOptions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAiChatOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;responseFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ResponseFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ResponseFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSON_OBJECT&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extraction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extractUser&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extractSystem&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;parsed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractorConverter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;// 4) Persist memories (MemoryService handles dedupe/thresholding)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// optional per-call param&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;owner&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;
      &lt;span class="n"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;storeMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Plugging the advisors into our ChatClient
&lt;/h3&gt;

&lt;p&gt;In our ChatConfig class, we configure our ChatClient as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;chatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// chatMemory: ChatMemory, (Necessary for short-term memory)&lt;/span&gt;
        &lt;span class="n"&gt;longTermRecorder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRecorderAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;longTermMemoryRetrieval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRetrievalAdvisor&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;defaultAdvisors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="c1"&gt;// MessageChatMemoryAdvisor.builder(chatMemory).build(),&lt;/span&gt;
                &lt;span class="n"&gt;longTermRecorder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;longTermMemoryRetrieval&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7. Implement the Chat Service
&lt;/h3&gt;

&lt;p&gt;Since the advisors are plugged into the ChatClient itself, we don’t need to manage memory ourselves when interacting with the LLM. The only thing we need to ensure is that every interaction sends the expected parameters, namely the session or user ID, so that the advisors know which history to look at.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;shortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ShortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatMemoryRepository&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;log&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoggerFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Use userId as the key for conversation history and long-term memory&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Processing message from user $userId: $message"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advisors&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatMemory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CONVERSATION_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chatResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;!!&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;?&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findByConversationId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;clearConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;shortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cleared conversation history for user $userId from Redis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  8. Configure the Agent System Prompt
&lt;/h3&gt;

&lt;p&gt;The agent is configured with a system prompt that explains its capabilities and access to different types of memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;promptText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
        You are a travel assistant helping users plan their trips. You remember user preferences
        and provide personalized recommendations based on past interactions.

        You have access to the following types of memory:
        1. Short-term memory: The current conversation thread
        2. Long-term memory:
           - Episodic: User preferences and past trip experiences (e.g., "User prefers window seats")
           - Semantic: General knowledge about travel destinations and requirements

        Always be helpful, personal, and context-aware in your responses.

        Always answer in text format. No markdown or special formatting.
    """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;promptText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9. Create the REST Controller
&lt;/h3&gt;

&lt;p&gt;The REST controller exposes endpoints for chat and memory management:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="nd"&gt;@RequestMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/chat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/history/{userId}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@PathVariable&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MessageDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="nc"&gt;MessageDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;UserMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;AssistantMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"unknown"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;UserMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;AssistantMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@DeleteMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/history/{userId}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;clearHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@PathVariable&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
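&lt;p&gt;With the controller in place, the API can be exercised straight from the command line. A quick sketch, assuming the app is running on &lt;code&gt;localhost:8080&lt;/code&gt; and that &lt;code&gt;ChatRequest&lt;/code&gt; is bound from a JSON body with &lt;code&gt;message&lt;/code&gt; and &lt;code&gt;userId&lt;/code&gt; fields (the two properties the controller reads):&lt;/p&gt;

```shell
# Send a chat message for user "raphael"; the advisors record and retrieve memories
curl -X POST localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Help me plan a trip to Paris", "userId": "raphael"}'

# Fetch the conversation history for the same user
curl localhost:8080/api/history/raphael

# Clear the short-term memory (conversation history) for the user
curl -X DELETE localhost:8080/api/history/raphael
```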



&lt;h2&gt;
  
  
  Running the Demo
&lt;/h2&gt;

&lt;p&gt;The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Clone the repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/redis/redis-springboot-recipes.git
&lt;span class="nb"&gt;cd &lt;/span&gt;redis-springboot-recipes/artificial-intelligence/agent-long-term-memory-with-spring-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Configure your environment
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file with your OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-your-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start the services
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redis: for storing both vector embeddings and chat history&lt;/li&gt;
&lt;li&gt;redis-insight: a UI to explore the Redis data&lt;/li&gt;
&lt;li&gt;agent-memory-app: the Spring Boot app that implements the memory-aware AI agent&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Use the application
&lt;/h3&gt;

&lt;p&gt;When all services are running, go to &lt;code&gt;localhost:8080&lt;/code&gt; to access the demo. You'll see a travel assistant interface with a chat panel and a memory management sidebar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjtn2ks4srwwtmu3mme8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjtn2ks4srwwtmu3mme8.png" alt="Screenshot of the Redis Agent Memory demo web interface. The interface is titled “Travel Agent with Redis Memory” and features two main panels: a “Memory Management” section on the left with tabs for Episodic and Semantic memories (currently showing “No episodic memories yet”), and a “Travel Assistant” chat on the right displaying a welcome message. At the top right, there’s a field to enter a user ID and buttons to start or clear the chat. The interface is clean and styled with Redis branding." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter a user ID and click "Start Chat":&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc9982peiet2f9hsb0py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc9982peiet2f9hsb0py.png" alt="Close-up screenshot of the user ID input and chat controls. The label “User ID:” appears on the left with a text input field containing the value “raphael”. To the right are two red buttons labeled “Start Chat” and “Clear Chat”." width="363" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a message like: "Hi, my name's Raphael. I went to Paris back in 2009 with my wife for our honeymoon and we had a lovely time. For our 10-year anniversary we're planning to go back. Help us plan the trip!"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbavzfshheyvac8o3zs3x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbavzfshheyvac8o3zs3x.gif" alt="Animated screen recording of a user sending a message in the Redis Agent Memory demo. The user, identified as “raphael”, types a detailed message into the chat input box: “Hi, my name’s Raphael. I went to Paris back in 2009 with my wife for our honeymoon and we had a lovely time. For our 10-year anniversary we’re planning to go back. Help us plan the trip!” The cursor then clicks the red “Send” button, initiating the interaction with the AI travel assistant." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system will reply to your message and, if it identifies potential memories worth keeping, store them as either semantic or episodic memories. You can see the stored memories in the "Memory Management" sidebar.&lt;/p&gt;

&lt;p&gt;On top of that, with each message, the system will also return performance metrics.&lt;/p&gt;

&lt;p&gt;If you refresh the page, you will see that all memories and the chat history are gone. &lt;/p&gt;

&lt;p&gt;If you re-enter the same user ID, the long-term memories will be reloaded in the sidebar, and the short-term memory (the chat history) will be reloaded as well:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi3t9hf3563r6tqnedjl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi3t9hf3563r6tqnedjl.gif" alt="Animated screen recording of the Redis Agent Memory demo after sending a message. The sidebar under “Episodic Memories” now shows two stored entries: one noting that the user went to Paris in 2009 for their honeymoon, and another about planning a return for their 10-year anniversary. The chat assistant responds with a personalized message suggesting activities and asking follow-up questions. The browser page is then refreshed, clearing both the chat history and memory display. After re-entering the same user ID, the agent reloads the long-term memories in the sidebar and restores the conversation history, demonstrating persistent memory retrieval." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you refresh the page and enter the same user ID, your memories and conversation history will be reloaded&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eided1rhbxxoes2eaht.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eided1rhbxxoes2eaht.gif" alt="Animated screen recording of a cleared chat session in the Redis Agent Memory demo. The “Episodic Memories” panel still shows two past memories about a trip to Paris. In the chat panel, the message “Conversation cleared. How can I assist you today?” appears, indicating that the short-term memory has been reset. The user is about to start a new conversation. This demonstrates that although the short-term context is gone, the agent retains access to long-term memories, allowing it to respond with relevant information from past interactions." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring the Data in Redis Insight
&lt;/h2&gt;

&lt;p&gt;RedisInsight provides a visual interface for exploring the data stored in Redis. Access it at &lt;code&gt;localhost:5540&lt;/code&gt; to see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Short-term memory (conversation history) stored in Redis Lists&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq3d8c7jub35362acujt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq3d8c7jub35362acujt.png" alt="Screenshot of RedisInsight displaying the contents of the conversation:raphael key. The selected key is a Redis list representing a conversation history. On the right panel, the list shows four indexed elements: system prompts defining the assistant’s role and memory access, a user message asking “Where did I go back in 2009?”, and the assistant’s reply recalling a previous trip to Paris. Below this, several memory entries stored as JSON keys are also visible. This illustrates how short-term chat history is preserved in Redis and replayed per user session." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
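&lt;p&gt;You can inspect the same data directly from the CLI. A quick sketch, assuming the key names shown in the screenshot above (the conversation list for user "raphael", and JSON documents under the &lt;code&gt;memory:&lt;/code&gt; prefix):&lt;/p&gt;

```shell
# List every message in raphael's conversation history (stored as a Redis List)
redis-cli LRANGE conversation:raphael 0 -1

# Count the stored messages
redis-cli LLEN conversation:raphael

# List the long-term memory documents (JSON keys with the memory: prefix)
redis-cli --scan --pattern "memory:*"
```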

&lt;ol&gt;
&lt;li&gt;Long-term memory (facts and experiences) stored as JSON documents with vector embeddings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5e9x8q1d0pb2mizac9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5e9x8q1d0pb2mizac9k.png" alt="Screenshot of RedisInsight showing a semantic memory stored in Redis. The selected key is a JSON object with the name memory:04d04.... The right panel displays the memory’s fields: createdAt timestamp, empty metadata, memoryType set to “SEMANTIC”, an embedding vector (collapsed), userId set to “system”, and the memory content: “Paris is a beautiful city known for celebrating love”. This illustrates how general knowledge is stored as semantic memory in the AI agent." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The vector index schema used for similarity search&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you run the &lt;code&gt;FT.INFO longTermMemoryIdx&lt;/code&gt; command in the RedisInsight workbench, you'll see the details of the vector index schema that enables efficient memory retrieval.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hwu6hquvpfik5lhrs31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hwu6hquvpfik5lhrs31.png" alt="Screenshot of RedisInsight Workbench showing the schema details of the longTermMemoryIdx vector index. The result of the FT.INFO longTermMemoryIdx command displays an index on JSON documents prefixed with memory:. The schema includes: •    $.content as a TEXT field named content  •    $.embedding as a VECTOR field using HNSW with 384-dimension FLOAT32 vectors and COSINE distance  •    $.memoryType and $.userId as TAG fields  •    $.metadata and $.createdAt as TEXT fields  This shows how memory data is structured and searchable in Redis using RediSearch vector similarity." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
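&lt;p&gt;With that schema, you can also query memories by hand in the Workbench. The retrieval advisor itself performs a KNN vector search with the query embedding; the filtered form below is just a sketch for exploring the data, using the &lt;code&gt;memoryType&lt;/code&gt; and &lt;code&gt;userId&lt;/code&gt; TAG fields from the index definition:&lt;/p&gt;

```shell
# Episodic memories recorded for user "raphael"
FT.SEARCH longTermMemoryIdx "@memoryType:{EPISODIC} @userId:{raphael}" RETURN 2 content memoryType
```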

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;And that's it — you now have a working AI agent with memory using Spring Boot and Redis.&lt;/p&gt;

&lt;p&gt;Instead of forgetting everything between conversations, your agent can now remember user preferences, past experiences, and important facts. Redis handles both short-term memory (conversation history) and long-term memory (vector embeddings) — all with the performance and scalability Redis is known for.&lt;/p&gt;

&lt;p&gt;With Spring AI and Redis, you get an easy way to integrate this into your Java applications. The combination of vector similarity search for semantic retrieval and traditional data structures for conversation history gives you a powerful foundation for building truly intelligent agents.&lt;/p&gt;

&lt;p&gt;Whether you're building customer service bots, personal assistants, or domain-specific experts, this memory architecture gives you the tools to create more helpful, personalized, and context-aware AI experiences.&lt;/p&gt;

&lt;p&gt;Try it out, experiment with different memory types, explore other embedding models, and see how far you can push the boundaries of AI agent capabilities!&lt;/p&gt;

&lt;p&gt;Stay Curious!&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>ai</category>
      <category>redis</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>Semantic Search with Spring Boot &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Tue, 29 Apr 2025 08:48:59 +0000</pubDate>
      <link>https://dev.to/redis/semantic-search-with-spring-boot-redis-48l0</link>
      <guid>https://dev.to/redis/semantic-search-with-spring-boot-redis-48l0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;
You’re building a &lt;strong&gt;semantic search app&lt;/strong&gt; using &lt;strong&gt;Spring Boot&lt;/strong&gt; and &lt;strong&gt;Redis&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of matching exact words, semantic search finds &lt;strong&gt;meaning&lt;/strong&gt; using &lt;strong&gt;Vector Similarity Search (VSS)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It works by turning movie synopses into &lt;strong&gt;vectors&lt;/strong&gt; with &lt;strong&gt;embedding models&lt;/strong&gt;, storing them in &lt;strong&gt;Redis&lt;/strong&gt; (as a vector database), and finding the closest matches to user queries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9xqm8mh0qh137eo9uab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9xqm8mh0qh137eo9uab.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=o3XN4dImESE" rel="noopener noreferrer"&gt;What is semantic search?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A traditional search system works by matching the words a user types with the words stored in a database or document collection. It usually looks for exact or partial matches without understanding the meaning behind the words.&lt;/p&gt;

&lt;p&gt;Semantic search, on the other hand, tries to understand the meaning behind what the user is asking. &lt;strong&gt;It focuses on the concepts, not just the keywords,&lt;/strong&gt; making it much easier for users to find what they really want.&lt;/p&gt;

&lt;p&gt;In a movie streaming service, for example, if a movie’s synopsis is stored in a database as &lt;strong&gt;“A cowboy doll feels threatened when a new space toy becomes his owner’s favorite,”&lt;/strong&gt; but the user searches for &lt;strong&gt;“jealous toy struggles with new rival,”&lt;/strong&gt; a traditional search system might not find the movie because the exact words don’t line up. &lt;/p&gt;

&lt;p&gt;But a semantic search system can still connect the two ideas and bring up the right movie. It understands the &lt;em&gt;meaning&lt;/em&gt; behind your query, not just the exact words.&lt;/p&gt;

&lt;p&gt;Behind the scenes, this works thanks to &lt;strong&gt;vector similarity search&lt;/strong&gt;. It turns text (or images, or audio) into vectors, which are lists of numbers, stores them in a vector database, and then finds the ones closest to your query. &lt;/p&gt;
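
&lt;p&gt;To make "closest" concrete, here is a tiny, self-contained sketch of cosine similarity, one of the distance metrics commonly used for vector similarity search (and the one we'll configure later). The three-dimensional vectors below are made up purely for illustration; real embedding models produce hundreds or thousands of dimensions, and Redis runs this comparison server-side at scale:&lt;/p&gt;

```java
// Illustrative sketch only: cosine similarity over tiny hand-made vectors.
// Real embeddings have hundreds or thousands of dimensions.
public class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|); 1.0 means identical direction
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] query = {0.9, 0.1, 0.2};     // hypothetical embedding of "jealous toy struggles with new rival"
        double[] toyStory = {0.8, 0.2, 0.3};  // hypothetical embedding of the Toy Story synopsis
        double[] unrelated = {0.1, 0.9, 0.7}; // hypothetical embedding of an unrelated synopsis
        System.out.printf("query vs Toy Story: %.3f%n", cosineSimilarity(query, toyStory));
        System.out.printf("query vs unrelated: %.3f%n", cosineSimilarity(query, unrelated));
    }
}
```

&lt;p&gt;The search simply returns the documents whose vectors score highest against the query vector, which is why the wording of the synopsis and the query can differ completely.&lt;/p&gt;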

&lt;p&gt;Today, &lt;strong&gt;we’re going to build a vector similarity search app that lets users find movies based on the &lt;em&gt;meaning&lt;/em&gt; of their synopsis, not just exact keyword matches&lt;/strong&gt;. Even if users don’t know the title, they can still find the right movie from a generic description of the plot.&lt;/p&gt;

&lt;p&gt;To do that, we’ll build a Spring Boot app from scratch and plug in &lt;strong&gt;Redis OM Spring&lt;/strong&gt;. It’ll handle turning our data into vectors, storing them in Redis, and running fast vector searches when users send a query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis as a Vector Database
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=Yhv19le0sBw" rel="noopener noreferrer"&gt;What is a vector database?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over the last 15 years, Redis has become the foundational infrastructure for real-time applications. Today, with Redis 8, it’s committed to becoming the foundational infrastructure for AI applications as well. &lt;/p&gt;

&lt;p&gt;Redis 8 not only turns the community version of Redis into a Vector Database, but also makes it the fastest and most scalable database in the market today. &lt;strong&gt;Redis 8 allows you to scale to one billion vectors without penalizing latency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://redis.io/blog/searching-1-billion-vectors-with-redis-8/" rel="noopener noreferrer"&gt;https://redis.io/blog/searching-1-billion-vectors-with-redis-8/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Redis OM Spring&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To allow our users and customers to take full advantage of everything Redis can do — with the speed Redis is known for — we decided to implement &lt;strong&gt;Redis OM Spring&lt;/strong&gt;, a library built on top of Spring Data Redis. &lt;/p&gt;

&lt;p&gt;Redis OM Spring allows our users to easily communicate with Redis, model their entities as &lt;strong&gt;JSON&lt;/strong&gt; documents or Hashes, efficiently query them by leveraging the &lt;strong&gt;Redis Query Engine&lt;/strong&gt;, and even take advantage of probabilistic data structures such as &lt;strong&gt;Count-min Sketch, Bloom Filters, Cuckoo Filters&lt;/strong&gt;, and more. &lt;/p&gt;

&lt;p&gt;Redis OM Spring on GitHub: &lt;a href="https://github.com/redis/redis-om-spring" rel="noopener noreferrer"&gt;https://github.com/redis/redis-om-spring&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;The dataset we’ll be working with is a catalog of thousands of movies. Each of these movies has metadata such as its title, cast, genre, year, and synopsis. The JSON file representing this dataset can be found in the repository that accompanies this article.&lt;/p&gt;

&lt;p&gt;Sample:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "title": "Toy Story",
  "year": 1995,
  "cast": [
   "Tim Allen",
   "Tom Hanks",
   "Don Rickles"
  ],
  "genres": [
   "Animated",
   "Comedy"
  ],
  "href": "Toy_Story",
  "extract": "Toy Story is a 1995 American computer-animated comedy film directed by John Lasseter, produced by Pixar Animation Studios and released by Walt Disney Pictures. The first installment in the  Toy Story franchise, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar. It was written by Joss Whedon, Andrew Stanton, Joel Cohen, and Alec Sokolow from a story by Lasseter, Stanton, Pete Docter, and Joe Ranft. The film features music by Randy Newman, was produced by Bonnie Arnold and Ralph Guggenheim, and was executive-produced by Steve Jobs and Edwin Catmull. The film features the voices of Tom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, R. Lee Ermey, John Morris, Laurie Metcalf, and Erik von Detten.",
  "thumbnail": "https://upload.wikimedia.org/wikipedia/en/1/13/Toy_Story.jpg",
  "thumbnail_width": 250,
  "thumbnail_height": 373
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Building the Application
&lt;/h2&gt;

&lt;p&gt;Our application will be built using Spring Boot with Redis OM Spring. &lt;strong&gt;It will allow movies to be searched by their synopsis using semantic search rather than keyword matching.&lt;/strong&gt; Besides that, our application will also allow its users to perform &lt;strong&gt;hybrid search&lt;/strong&gt;, &lt;strong&gt;a technique that combines vector similarity with traditional filtering and sorting.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  0. GitHub Repository
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The full application can be found on GitHub:&lt;/strong&gt; &lt;a href="https://github.com/redis/redis-om-spring/tree/main/demos/roms-vss-movies/src" rel="noopener noreferrer"&gt;https://github.com/redis/redis-om-spring/tree/main/demos/roms-vss-movies/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Add the required dependencies
&lt;/h3&gt;

&lt;p&gt;From a Spring Boot application, add the following dependencies to your Maven or Gradle file: &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- Redis OM Spring for Redis object mapping and vector search --&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;com.redis.om.spring&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;redis-om-spring&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;0.9.11&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;

&amp;lt;!-- Redis OM Spring uses Spring AI for creating embeddings (vectors) --&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.springframework.ai&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;spring-ai-openai&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;1.0.0-M6&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.springframework.ai&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;spring-ai-transformers&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;1.0.0-M6&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  2. Define the Movie entity
&lt;/h3&gt;

&lt;p&gt;Redis OM Spring provides two annotations that make it easy to vectorize data and perform vector similarity search from within Spring Boot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;@Vectorize: Automatically generates vector embeddings from the text field&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;@Indexed: Enables vector indexing on a field for efficient search&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core of the implementation is the Movie class with Redis vector indexing annotations:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RedisHash // This annotation is used by Redis OM Spring to store the entity as a hash in Redis
public class Movie {

    @Id // The title serves as the entity ID (Redis OM Spring generates a ULID when the ID is left null)
    private String title;

    @Indexed(sortable = true) // This annotation enables indexing on the field for filtering and sorting
    private int year;

    @Indexed
    private List&amp;lt;String&amp;gt; cast;

    @Indexed
    private List&amp;lt;String&amp;gt; genres;

    private String href;

    // This annotation automatically generates vector embeddings from the text
    @Vectorize(
            destination = "embeddedExtract", // The field where the embedding will be stored
            embeddingType = EmbeddingType.SENTENCE, // Type of embedding to generate (SENTENCE, IMAGE, FACE, or WORD)
            provider = EmbeddingProvider.OPENAI, // The provider for generating embeddings (OpenAI, Transformers, VertexAI, etc.)
            openAiEmbeddingModel = OpenAiApi.EmbeddingModel.TEXT_EMBEDDING_3_LARGE // The specific OpenAI model to use for embeddings
    )
    private String extract;

    // This defines the vector field that will store the embeddings
    // The indexed annotation enables vector search on this field
    @Indexed(
            schemaFieldType = SchemaFieldType.VECTOR, // Defines the field type as a vector
            algorithm = VectorField.VectorAlgorithm.FLAT, // The algorithm used for vector search (FLAT or HNSW)
            type = VectorType.FLOAT32,
            dimension = 3072, // The dimension of the vector (must match the embedding model)
            distanceMetric = DistanceMetric.COSINE, // The distance metric used for similarity search (Cosine or Euclidean)
            initialCapacity = 10
    )
    private byte[] embeddedExtract;

    private String thumbnail;
    private int thumbnailWidth;
    private int thumbnailHeight;

    // Getters and setters...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example we're using OpenAI's embedding model, which requires an OpenAI API key to be set in the &lt;code&gt;application.properties&lt;/code&gt; file of your application:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;redis.om.spring.ai.open-ai.api-key=${OPEN_AI_KEY}&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If an embedding model is not specified, Redis OM Spring will use Hugging Face’s Transformers model (all-MiniLM-L6-v2) by default. In this case, make sure you set the dimension in the @Indexed annotation to 384, which is the number of dimensions produced by the default embedding model.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Repository Interface
&lt;/h3&gt;

&lt;p&gt;Next, create a simple repository interface that extends RedisEnhancedRepository. It will be used to load the data into Redis via the saveAll() method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public interface MovieRepository extends RedisEnhancedRepository&amp;lt;Movie, String&amp;gt; {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This provides basic CRUD operations for Movie entities, with the first generic parameter being the entity type and the second being the ID type.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Search Service&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The search service uses two beans provided by Redis OM Spring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;EntityStream: Creates a stream of entities to perform searches. It should not be confused with the Java Streams API: the EntityStream generates a Redis command that is sent to Redis, so the searching, filtering, and sorting happen efficiently on the server side.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedder: Used for generating the embedding for the query sent by the user. The embedding follows the configuration of the @Vectorize annotation defined in the Movie class.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The search functionality is implemented in the SearchService:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
public class SearchService {

    private static final Logger logger = LoggerFactory.getLogger(SearchService.class);
    private final EntityStream entityStream;
    private final Embedder embedder;

    public SearchService(EntityStream entityStream, Embedder embedder) {
        this.entityStream = entityStream;
        this.embedder = embedder;
    }

    public List&amp;lt;Pair&amp;lt;Movie, Double&amp;gt;&amp;gt; search(
            String query,
            Integer yearMin,
            Integer yearMax,
            List&amp;lt;String&amp;gt; cast,
            List&amp;lt;String&amp;gt; genres,
            Integer numberOfNearestNeighbors) {
        logger.info("Received text: {}", query);
        logger.info("Received yearMin: {} yearMax: {}", yearMin, yearMax);
        logger.info("Received cast: {}", cast);
        logger.info("Received genres: {}", genres);

        if (numberOfNearestNeighbors == null) numberOfNearestNeighbors = 3;
        if (yearMin == null) yearMin = 1900;
        if (yearMax == null) yearMax = 2100;

        // Convert query text to vector embedding
        byte[] embeddedQuery = embedder.getTextEmbeddingsAsBytes(List.of(query), Movie$.EXTRACT).getFirst();

        // Perform vector search with additional filters
        SearchStream&amp;lt;Movie&amp;gt; stream = entityStream.of(Movie.class);
        return stream
                // KNN search for nearest vectors
                .filter(Movie$.EMBEDDED_EXTRACT.knn(numberOfNearestNeighbors, embeddedQuery))
                // Additional metadata filters (hybrid search)
                .filter(Movie$.YEAR.between(yearMin, yearMax))
                .filter(Movie$.CAST.eq(cast))
                .filter(Movie$.GENRES.eq(genres))
                // Sort by similarity score
                .sorted(Movie$._EMBEDDED_EXTRACT_SCORE)
                // Return both the movie and its similarity score
                .map(Fields.of(Movie$._THIS, Movie$._EMBEDDED_EXTRACT_SCORE))
                .collect(Collectors.toList());
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Key features of the search service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Uses EntityStream to create a search stream for Movie entities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Converts the text query into a vector embedding&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses K-nearest neighbors (KNN) search to find similar vectors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Applies additional filters for hybrid search (combining vector and traditional search)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Returns pairs of movies and their similarity scores&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
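
&lt;p&gt;To see what the hybrid search is doing conceptually, here is a brute-force, in-memory equivalent in plain Java: filter by metadata, rank the survivors by vector distance, and keep the k nearest. This is illustrative only (names, vectors, and data are hypothetical); in the real app, Redis performs all of this server-side against its index:&lt;/p&gt;

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only: an in-memory version of hybrid search (metadata
// filter + KNN by cosine distance). Redis does this server-side at scale.
public class HybridSearchSketch {
    record Doc(String title, int year, double[] vector) {}

    // Cosine distance = 1 - cosine similarity; lower means closer
    static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static List<Doc> search(List<Doc> docs, double[] query, int yearMin, int yearMax, int k) {
        return docs.stream()
                .filter(d -> d.year() >= yearMin && d.year() <= yearMax)                     // metadata filter
                .sorted(Comparator.comparingDouble((Doc d) -> cosineDistance(d.vector(), query))) // nearest first
                .limit(k)                                                                     // KNN: keep k closest
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("Toy Story", 1995, new double[]{0.9, 0.1}),
                new Doc("Back to the Future", 1985, new double[]{0.2, 0.8}),
                new Doc("E.T.", 1982, new double[]{0.6, 0.4}));
        // A "time travel"-flavored query vector, restricted to 1980s releases
        List<Doc> hits = search(docs, new double[]{0.2, 0.8}, 1980, 1989, 1);
        System.out.println(hits.get(0).title());
    }
}
```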

&lt;h3&gt;
  
  
  5. Movie Service for Data Loading
&lt;/h3&gt;

&lt;p&gt;The MovieService handles loading movie data into Redis. It reads a JSON file containing movie data and saves the movies into Redis. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It may take a minute or two to load the data for the thousands of movies in the file because embeddings are generated during the save. The @Vectorize annotation generates the embedding for the extract field before each movie is saved into Redis.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
public class MovieService {

    private static final Logger log = LoggerFactory.getLogger(MovieService.class);
    private final ObjectMapper objectMapper;
    private final ResourceLoader resourceLoader;
    private final MovieRepository movieRepository;

    public MovieService(ObjectMapper objectMapper, ResourceLoader resourceLoader, MovieRepository movieRepository) {
        this.objectMapper = objectMapper;
        this.resourceLoader = resourceLoader;
        this.movieRepository = movieRepository;
    }

    public void loadAndSaveMovies(String filePath) throws Exception {
        Resource resource = resourceLoader.getResource("classpath:" + filePath);
        try (InputStream is = resource.getInputStream()) {
            List&amp;lt;Movie&amp;gt; movies = objectMapper.readValue(is, new TypeReference&amp;lt;&amp;gt;() {});
            List&amp;lt;Movie&amp;gt; unprocessedMovies = movies.stream()
                    .filter(movie -&amp;gt; !movieRepository.existsById(movie.getTitle()) &amp;amp;&amp;amp;
                            movie.getYear() &amp;gt; 1980
                    ).toList();
            long systemMillis = System.currentTimeMillis();
            movieRepository.saveAll(unprocessedMovies);
            long elapsedMillis = System.currentTimeMillis() - systemMillis;
            log.info("Saved " + unprocessedMovies.size() + " movies in " + elapsedMillis + " ms");
        }
    }

    public boolean isDataLoaded() {
        return movieRepository.count() &amp;gt; 0;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  6. Search Controller
&lt;/h3&gt;

&lt;p&gt;The REST controller exposes the search endpoint:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RestController
public class SearchController {

    private final SearchService searchService;

    public SearchController(SearchService searchService) {
        this.searchService = searchService;
    }

    @GetMapping("/search")
    public Map&amp;lt;String, Object&amp;gt; search(
            @RequestParam(required = false) String text,
            @RequestParam(required = false) Integer yearMin,
            @RequestParam(required = false) Integer yearMax,
            @RequestParam(required = false) List&amp;lt;String&amp;gt; cast,
            @RequestParam(required = false) List&amp;lt;String&amp;gt; genres,
            @RequestParam(required = false) Integer numberOfNearestNeighbors
    ) {
        List&amp;lt;Pair&amp;lt;Movie, Double&amp;gt;&amp;gt; matchedMovies = searchService.search(
                text,
                yearMin,
                yearMax,
                cast,
                genres,
                numberOfNearestNeighbors
        );
        return Map.of(
                "matchedMovies", matchedMovies,
                "count", matchedMovies.size()
        );
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  7. Application Bootstrap
&lt;/h3&gt;

&lt;p&gt;The main application class initializes Redis OM Spring and loads data. The @EnableRedisEnhancedRepositories annotation activates Redis OM Spring's repository support:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@SpringBootApplication
@EnableRedisEnhancedRepositories(basePackages = {"dev.raphaeldelio.redis8demo*"})
public class Redis8DemoVectorSimilaritySearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(Redis8DemoVectorSimilaritySearchApplication.class, args);
    }

    @Bean
    CommandLineRunner loadData(MovieService movieService) {
        return args -&amp;gt; {
            if (movieService.isDataLoaded()) {
                System.out.println("Data already loaded. Skipping data load.");
                return;
            }
            movieService.loadAndSaveMovies("movies.json");
        };
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  8. Sample Requests
&lt;/h3&gt;

&lt;p&gt;You can make requests to the search endpoint:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET http://localhost:8082/search?text=A movie about a young boy who goes to a wizardry school

GET http://localhost:8082/search?numberOfNearestNeighbors=1&amp;amp;yearMin=1970&amp;amp;yearMax=1990&amp;amp;text=A movie about a kid and a scientist who go back in time

GET http://localhost:8082/search?cast=Dee Wallace,Henry Thomas&amp;amp;text=A boy who becomes friend with an alien
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Sample request:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET http://localhost:8082/search?numberOfNearestNeighbors=1&amp;amp;yearMin=1970&amp;amp;yearMax=1990&amp;amp;text=A movie about a kid and a scientist who go back in time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Sample response:&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "count": 1,
  "matchedMovies": [
    {
      "first": { // matched movie
        "title": "Back to the Future",
        "year": 1985,
        "cast": [
          "Michael J. Fox",
          "Christopher Lloyd"
        ],
        "genres": [
          "Science Fiction"
        ],
        "extract": "Back to the Future is a 1985 American science fiction film directed by Robert Zemeckis and written by Zemeckis, and Bob Gale. It stars Michael J. Fox, Christopher Lloyd, Lea Thompson, Crispin Glover, and Thomas F. Wilson. Set in 1985, it follows Marty McFly (Fox), a teenager accidentally sent back to 1955 in a time-traveling DeLorean automobile built by his eccentric scientist friend Emmett \"Doc\" Brown (Lloyd), where he inadvertently prevents his future parents from falling in love – threatening his own existence – and is forced to reconcile them and somehow get back to the future.",
        "thumbnail": "https://upload.wikimedia.org/wikipedia/en/d/d2/Back_to_the_Future.jpg"
      },
      "second": 0.463297247887 // similarity score (lower means closer)
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;And that’s it — you now have a working semantic search app using Spring Boot and Redis. &lt;/p&gt;

&lt;p&gt;Instead of relying on exact keyword matches, your app understands the meaning behind the query. Redis handles the heavy part: embedding storage, similarity search, and even traditional filters — all at lightning speed.&lt;/p&gt;

&lt;p&gt;With Redis OM Spring, you get an easy way to integrate this into your Java apps. You only need two annotations, @Vectorize and @Indexed, and two beans, EntityStream and Embedder. &lt;/p&gt;

&lt;p&gt;Whether you’re building search, recommendations, or AI-powered assistants, this setup gives you a solid and scalable foundation.&lt;/p&gt;

&lt;p&gt;Try it out, tweak the filters, explore other models, and see how far you can go!&lt;/p&gt;

&lt;h3&gt;
  
  
  More AI Resources
&lt;/h3&gt;

&lt;p&gt;The best way to stay on the path of learning AI is by following the recipes available on the Redis AI Resources GitHub repository. There you can find dozens of recipes that will get you started building AI apps, fast!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis-developer/redis-ai-resources/tree/main" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - redis-developer/redis-ai-resources: ✨ A curated list of awesome community resources, integrations, and examples of Redis in the AI ecosystem.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>java</category>
      <category>vectordatabase</category>
      <category>springboot</category>
      <category>redis</category>
    </item>
    <item>
      <title>Token Bucket Rate Limiter (Redis &amp; Java)</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Mon, 13 Jan 2025 14:15:29 +0000</pubDate>
      <link>https://dev.to/redis/token-bucket-rate-limiter-redis-java-4pi3</link>
      <guid>https://dev.to/redis/token-bucket-rate-limiter-redis-java-4pi3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://youtu.be/cfF6nXIpDwE" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft58edaghnxrbmjdyhkf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft58edaghnxrbmjdyhkf7.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Token Bucket&lt;/strong&gt; algorithm is a flexible and efficient rate-limiting mechanism. It works by filling a bucket with tokens at a fixed rate (e.g., one token per second). Each request consumes a token, and if no tokens are available, the request is rejected. The bucket has a maximum capacity, so it can handle bursts of traffic as long as the burst doesn’t exceed the number of tokens in the bucket.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Looking for a different rate limiter algorithm? &lt;a href="https://raphaeldelio.medium.com/rate-limiting-with-redis-an-essential-guide-df798b1c63db?source=user_profile_page---------1-------------17e03c232bd9---------------" rel="noopener noreferrer"&gt;Check the essential guide.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How the Token Bucket Rate Limiter Works&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implementation with Redis and Java&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing with TestContainers and AssertJ&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conclusion (GitHub Repo)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2160%2F1%2A7cDKq5yh5RD0ygvb3mVwfQ.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2160%2F1%2A7cDKq5yh5RD0ygvb3mVwfQ.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Define a Token Refill Rate&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Set a rate at which tokens are added to the bucket, such as 1 token per second or 10 tokens per minute.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Track Token Consumption&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For each incoming request, deduct one token from the bucket.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Refill Tokens&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Continuously refill the bucket at the defined rate, up to its maximum capacity, ensuring unused tokens can accumulate for future bursts.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Rate Limit Check&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before processing a request, check if there are enough tokens in the bucket. If the bucket is empty, reject the request until tokens are replenished.&lt;/p&gt;
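
&lt;p&gt;The four steps above can be sketched in plain Java before bringing Redis into the picture. This in-memory version is illustrative only: it works for a single JVM, and the Redis implementation below is what makes the same bookkeeping work across application instances. The clock is passed in explicitly to keep the sketch deterministic:&lt;/p&gt;

```java
// Illustrative sketch only: token-bucket bookkeeping with in-memory state.
// The Redis-backed implementation later in the article replaces these fields
// with Redis keys so the bucket is shared across instances.
public class TokenBucketSketch {
    private final int capacity;              // maximum tokens the bucket can hold
    private final double refillRatePerSecond; // tokens added per second
    private double tokens;
    private long lastRefillMillis;

    public TokenBucketSketch(int capacity, double refillRatePerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillRatePerSecond = refillRatePerSecond;
        this.tokens = capacity;              // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    // Returns true if the request is allowed; nowMillis is injected for testability.
    public boolean isAllowed(long nowMillis) {
        // Refill based on elapsed time, capped at the bucket's capacity
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillRatePerSecond);
        lastRefillMillis = nowMillis;

        if (tokens >= 1) {
            tokens -= 1;                     // consume one token for this request
            return true;
        }
        return false;                        // bucket empty: reject
    }
}
```

&lt;p&gt;With a capacity of 2 and a rate of 1 token per second, two back-to-back requests succeed, a third is rejected, and a request one second later succeeds again once a token has been refilled.&lt;/p&gt;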

&lt;h2&gt;
  
  
  How to Implement It with Redis and Java
&lt;/h2&gt;

&lt;p&gt;For the &lt;strong&gt;Token Bucket Rate Limiter&lt;/strong&gt;, Redis provides an efficient way to track tokens and implement the algorithm. Here’s how to do it:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Retrieve current token count and last refill time
&lt;/h3&gt;

&lt;p&gt;First, retrieve the current token count and the last refill time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET rate_limit:&amp;lt;clientId&amp;gt;:count  
GET rate_limit:&amp;lt;clientId&amp;gt;:lastRefill  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If these keys don’t exist, initialize the token count to the bucket’s maximum capacity and set the current time as the last refill time using SET.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Refill tokens if necessary and update the bucket
&lt;/h3&gt;

&lt;p&gt;Update the token count and the last refill time after processing each request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET rate_limit:&amp;lt;clientId&amp;gt;:count &amp;lt;new_token_count&amp;gt;  
SET rate_limit:&amp;lt;clientId&amp;gt;:lastRefill &amp;lt;current_time&amp;gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Allow or reject the request
&lt;/h3&gt;

&lt;p&gt;If tokens are available, allow the request and decrement the count by one using:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DECR rate_limit:&amp;lt;clientId&amp;gt;:count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Implementing it with Jedis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jedis&lt;/strong&gt; is a popular Java library used to interact with &lt;strong&gt;Redis&lt;/strong&gt;, and we will use it to implement our rate limiter because it provides a simple and intuitive API for executing Redis commands from JVM applications.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Add Jedis to Your Maven File&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Check the latest version &lt;a href="https://redis.io/docs/latest/develop/clients/jedis/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;redis.clients&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;jedis&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;5.2.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Create a &lt;strong&gt;TokenBucketRateLimiter&lt;/strong&gt; class:
&lt;/h3&gt;

&lt;p&gt;The class will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Accept a Jedis instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the maximum capacity of the token bucket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Specify the token refill rate (tokens per second).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TokenBucketRateLimiter {
    private final Jedis jedis;
    private final int bucketCapacity; // Maximum tokens the bucket can hold
    private final double refillRate; // Tokens refilled per second

    public TokenBucketRateLimiter(Jedis jedis, int bucketCapacity, double refillRate) {
        this.jedis = jedis;
        this.bucketCapacity = bucketCapacity;
        this.refillRate = refillRate;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Validate the Requests
&lt;/h3&gt;

&lt;p&gt;The main task of this rate limiter is to determine whether a client has sufficient tokens to process their request. If yes, the request is allowed, and tokens are deducted. If not, the request is blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate the keys&lt;/strong&gt;&lt;br&gt;
We’ll store each client’s token count and last refill time in Redis using unique keys. The keys will look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public boolean isAllowed(String clientId) {
    String keyCount = "rate_limit:" + clientId + ":count";
    String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, if the client ID is user123, their keys would be rate_limit:user123:count and rate_limit:user123:lastRefill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Fetch Current State&lt;/strong&gt;&lt;br&gt;
We use Redis’s GET command to retrieve the current token count and the last refill time. If the keys don’t exist, we assume the bucket is full, and the last refill time is the current timestamp.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public boolean isAllowed(String clientId) {
    String keyCount = "rate_limit:" + clientId + ":count";
    String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";

    Transaction transaction = jedis.multi();
    transaction.get(keyLastRefill);
    transaction.get(keyCount);
    var results = transaction.exec();

    long currentTime = System.currentTimeMillis();
    long lastRefillTime = results.get(0) != null ? Long.parseLong((String) results.get(0)) : currentTime;
    int tokenCount = results.get(1) != null ? Integer.parseInt((String) results.get(1)) : bucketCapacity;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Refill Tokens&lt;/strong&gt;&lt;br&gt;
Calculate how many tokens should be added based on the time elapsed since the last refill. Ensure the bucket doesn’t exceed its maximum capacity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;long elapsedTimeMs = currentTime - lastRefillTime;
double elapsedTimeSecs = elapsedTimeMs / 1000.0;
int tokensToAdd = (int) (elapsedTimeSecs * refillRate);

tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
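&lt;p&gt;To see what this arithmetic does in practice, here is a minimal standalone sketch (the class and method names are illustrative, not part of the limiter). Note that the int cast discards any fractional remainder, so elapsed time shorter than one full token interval adds nothing:&lt;/p&gt;

```java
// Standalone check of the refill arithmetic: the int cast truncates
// fractional tokens, and Math.min caps the bucket at its capacity.
public class RefillMathExample {
    static int tokensToAdd(long elapsedTimeMs, double refillRate) {
        double elapsedTimeSecs = elapsedTimeMs / 1000.0;
        return (int) (elapsedTimeSecs * refillRate); // truncates, never rounds up
    }

    public static void main(String[] args) {
        System.out.println(tokensToAdd(2500, 1.0)); // 2: 2.5 s at 1 token/s
        System.out.println(tokensToAdd(900, 1.0));  // 0: under one second adds nothing
        // Refilling 60 s worth of tokens into a bucket of capacity 5 holding 3:
        System.out.println(Math.min(5, 3 + tokensToAdd(60000, 1.0))); // 5: capped
    }
}
```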

&lt;p&gt;&lt;strong&gt;Step 4: Check Token Availability&lt;/strong&gt;&lt;br&gt;
Compare the current token count to determine if the request can be allowed. &lt;strong&gt;If tokens are available, deduct one token; otherwise, block the request.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;boolean isAllowed = tokenCount &amp;gt; 0;

if (isAllowed) {
    tokenCount--;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Update Redis&lt;/strong&gt;&lt;br&gt;
We update the token count and the last refill time in Redis, using a MULTI/EXEC transaction so both keys are written together:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Transaction transaction = jedis.multi();
transaction.set(keyLastRefill, String.valueOf(currentTime)); // Update last refill time
transaction.set(keyCount, String.valueOf(tokenCount));       // Update token count
transaction.exec();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Complete Implementation
&lt;/h3&gt;

&lt;p&gt;Here’s the full code for the TokenBucketRateLimiter class:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TokenBucketRateLimiter {
    private final Jedis jedis;
    private final int bucketCapacity; // Maximum tokens the bucket can hold
    private final double refillRate; // Tokens refilled per second

    public TokenBucketRateLimiter(Jedis jedis, int bucketCapacity, double refillRate) {
        this.jedis = jedis;
        this.bucketCapacity = bucketCapacity;
        this.refillRate = refillRate;
    }

    public boolean isAllowed(String clientId) {
        String keyCount = "rate_limit:" + clientId + ":count";
        String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";

        long currentTime = System.currentTimeMillis();

        // Fetch current state
        Transaction transaction = jedis.multi();
        transaction.get(keyLastRefill);
        transaction.get(keyCount);
        var results = transaction.exec();

        long lastRefillTime = results.get(0) != null ? Long.parseLong((String) results.get(0)) : currentTime;
        int tokenCount = results.get(1) != null ? Integer.parseInt((String) results.get(1)) : bucketCapacity;

        // Refill tokens
        long elapsedTimeMs = currentTime - lastRefillTime;
        double elapsedTimeSecs = elapsedTimeMs / 1000.0;
        int tokensToAdd = (int) (elapsedTimeSecs * refillRate);
        tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);

        // Check if the request is allowed
        boolean isAllowed = tokenCount &amp;gt; 0;

        if (isAllowed) {
            tokenCount--; // Consume one token
        }

        // Update Redis state
        transaction = jedis.multi();
        transaction.set(keyLastRefill, String.valueOf(currentTime));
        transaction.set(keyCount, String.valueOf(tokenCount));
        transaction.exec();

        return isAllowed;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And we’re ready to start testing its behavior!&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing our Rate Limiter
&lt;/h2&gt;

&lt;p&gt;To ensure our Token Bucket Rate Limiter behaves as expected, we’ll write tests for various scenarios. For this, we’ll use three tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis TestContainers&lt;/strong&gt;: This library spins up an isolated Redis container for testing. This means we don’t need to rely on an external Redis server during our tests. Once the tests are done, the container is stopped, leaving no leftover data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JUnit 5&lt;/strong&gt;: Our main testing framework, which helps us define and structure tests with lifecycle methods like @BeforeEach and @AfterEach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AssertJ&lt;/strong&gt;: A library that makes assertions readable and expressive, like assertThat(result).isTrue().&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s begin by adding the necessary dependencies to our pom.xml.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Dependencies
&lt;/h3&gt;

&lt;p&gt;Here’s what you’ll need in your Maven pom.xml file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.junit.jupiter&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;junit-jupiter-engine&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;5.10.0&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;com.redis&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;testcontainers-redis&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;2.2.2&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.assertj&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;assertj-core&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;3.11.1&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once you’ve added these dependencies, you’re ready to start writing your test class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Test Class
&lt;/h3&gt;

&lt;p&gt;The first step is to create a test class named TokenBucketRateLimiterTest. Inside, we’ll define three main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Test Container&lt;/strong&gt;: This launches a Redis instance in a Docker container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jedis Instance&lt;/strong&gt;: This connects to the Redis container for sending commands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate Limiter&lt;/strong&gt;: The actual TokenBucketRateLimiter instance we’re testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how the skeleton of our test class looks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class TokenBucketRateLimiterTest {

    private static RedisContainer redisContainer;
    private Jedis jedis;
    private TokenBucketRateLimiter rateLimiter;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Preparing the Environment Before Each Test
&lt;/h3&gt;

&lt;p&gt;Before running any test, we need to ensure a clean Redis environment. Here’s what we’ll do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect to Redis&lt;/strong&gt;: Use a Jedis instance to connect to the Redis container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flush Data&lt;/strong&gt;: Clear any leftover data in Redis to ensure consistent results for each test.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll start the Redis container once for all tests in a method annotated with @BeforeAll, and set up the connection and flush in a method annotated with @BeforeEach, which runs before every test case.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@BeforeAll
static void startContainer() {
    redisContainer = new RedisContainer("redis:latest");
    redisContainer.withExposedPorts(6379).start();
}

@BeforeEach
void setup() {
    jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
    jedis.flushAll();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;FLUSHALL is an actual Redis command that deletes all the keys of all the existing databases. &lt;a href="https://redis.io/docs/latest/commands/flushall/" rel="noopener noreferrer"&gt;Read more about it in the official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cleaning Up After Each Test
&lt;/h3&gt;

&lt;p&gt;After each test, we need to close the Jedis connection to free up resources. This ensures no lingering connections interfere with subsequent tests.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@AfterEach
void tearDown() {
    jedis.close();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Full Setup
&lt;/h3&gt;

&lt;p&gt;Here’s how the complete test class looks with everything in place:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class TokenBucketRateLimiterTest {

    private static RedisContainer redisContainer;
    private Jedis jedis;
    private TokenBucketRateLimiter rateLimiter;

    @BeforeAll
    static void startContainer() {
        redisContainer = new RedisContainer("redis:latest");
        redisContainer.withExposedPorts(6379).start();
    }

    @AfterAll
    static void stopContainer() {
        redisContainer.stop();
    }

    @BeforeEach
    void setup() {
        jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
        jedis.flushAll();
    }

    @AfterEach
    void tearDown() {
        jedis.close();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Requests Within the Bucket Capacity
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter allows requests within the defined bucket capacity.&lt;/p&gt;

&lt;p&gt;We configure it with a &lt;strong&gt;capacity of 5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, then call isAllowed("client-1") &lt;strong&gt;5 times&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each call should return true, confirming the rate limiter correctly tracks and permits requests within the capacity.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldAllowRequestsWithinBucketCapacity() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed("client-1"))
            .withFailMessage("Request %d should be allowed within bucket capacity", i)
            .isTrue();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Requests Are Denied When Bucket is Empty
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter correctly denies requests once the bucket is empty.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, we call isAllowed("client-1") &lt;strong&gt;5 times&lt;/strong&gt; and expect all calls to return true.&lt;/p&gt;

&lt;p&gt;On the 6th call, it should return false, verifying the rate limiter blocks requests once the bucket is empty.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldDenyRequestsOnceBucketIsEmpty() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed("client-1"))
            .withFailMessage("Request %d should be allowed within bucket capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed("client-1"))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Bucket is Gradually Refilled
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter refills the bucket correctly after every second.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, the first 5 requests (isAllowed("client-1")) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for two seconds, the next two requests are allowed and the third one is denied, confirming the refill behavior works as expected.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldRefillTokensGraduallyAndAllowRequestsOverTime() throws InterruptedException {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
    String clientId = "client-1";

    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed within bucket capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();

    TimeUnit.SECONDS.sleep(2);

    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request after partial refill should be allowed")
        .isTrue();
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Second request after partial refill should be allowed")
        .isTrue();
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond available tokens should be denied")
        .isFalse();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Independent Handling of Multiple Clients
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter handles multiple clients independently.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, the first 5 requests (isAllowed("client-1")) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;Simultaneously, all 5 requests from &lt;strong&gt;client-2&lt;/strong&gt; are allowed (true), confirming the rate limiter maintains separate counters for each client.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldHandleMultipleClientsIndependently() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);

    String clientId1 = "client-1";
    String clientId2 = "client-2";

    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed(clientId1))
            .withFailMessage("Client 1 request %d should be allowed", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId1))
        .withFailMessage("Client 1 request beyond bucket capacity should be denied")
        .isFalse();

    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed(clientId2))
            .withFailMessage("Client 2 request %d should be allowed", i)
            .isTrue();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Token Refill Does Not Exceed Bucket Capacity
&lt;/h3&gt;

&lt;p&gt;This test verifies that the token bucket rate limiter correctly refills tokens up to the defined capacity without exceeding it.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 3 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of 2 tokens per second&lt;/strong&gt;, the first 3 requests (isAllowed("client-1")) return true, while the 4th request is denied (false), indicating the bucket is empty.&lt;/p&gt;

&lt;p&gt;After waiting 3 seconds (enough to refill 6 tokens), the bucket refills only up to its maximum capacity of 3 tokens. The next 3 requests are allowed (true), but any additional request is denied (false), confirming that the rate limiter maintains the specified capacity limit regardless of refill surplus.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldRefillTokensUpToCapacityWithoutExceedingIt() throws InterruptedException {
    int capacity = 3;
    double refillRate = 2.0;
    String clientId = "client-1";
    rateLimiter = new TokenBucketRateLimiter(jedis, capacity, refillRate);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed within initial bucket capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();

    TimeUnit.SECONDS.sleep(3);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed as bucket refills up to capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Denied Requests Do Not Affect Token Count
&lt;/h3&gt;

&lt;p&gt;This test ensures that the token bucket rate limiter does not count denied requests when updating the token count.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 3 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of 0.5 tokens per second&lt;/strong&gt;, the first 3 requests (isAllowed("client-1")) are allowed (true), depleting the bucket. The 4th request is denied (false), confirming the bucket is empty.&lt;/p&gt;

&lt;p&gt;The Redis token count (rate_limit:client-1:count) is then verified to ensure it accurately reflects the remaining tokens (0 in this case) and does not include denied requests. This confirms that the rate limiter updates the token count only when requests are successfully processed.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void testRateLimitDeniedRequestsAreNotCounted() {
    int capacity = 3;
    double refillRate = 0.5;
    String clientId = "client-1";
    rateLimiter = new TokenBucketRateLimiter(jedis, capacity, refillRate);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("This request should be denied")
        .isFalse();

    String key = "rate_limit:" + clientId + ":count";
    int requestCount = Integer.parseInt(jedis.get(key));
    assertThat(requestCount)
        .withFailMessage("The count should match remaining tokens and not include denied requests")
        .isEqualTo(0);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Is there any other behavior we should verify? Let me know in the comments!&lt;/p&gt;

&lt;p&gt;The Token Bucket Rate Limiter is a flexible and efficient way to manage request rates, and &lt;strong&gt;Redis&lt;/strong&gt; makes it incredibly fast and reliable.&lt;/p&gt;

&lt;p&gt;By leveraging commands like GET, SET, and MULTI/EXEC, we implemented a solution that tracks token counts, refills tokens dynamically based on time elapsed, and ensures the bucket never exceeds its defined capacity.&lt;/p&gt;

&lt;p&gt;Using &lt;strong&gt;Jedis&lt;/strong&gt;, we built a clear and intuitive &lt;strong&gt;Java&lt;/strong&gt; implementation, and with thorough testing using Redis TestContainers, JUnit 5, and AssertJ, we can confidently verify that it works as expected.&lt;/p&gt;

&lt;p&gt;This approach offers a robust foundation for managing request limits while allowing for burst handling and gradual refill, making it adaptable for more advanced rate-limiting scenarios when needed.&lt;/p&gt;
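&lt;p&gt;To make that burst-then-gradual-refill behavior concrete, the same algorithm can be sketched entirely in memory, with the clock passed in as a parameter instead of read from System.currentTimeMillis(). This is a hypothetical simplification (no Redis, single process, illustrative names), not the Redis-backed implementation above:&lt;/p&gt;

```java
// Hypothetical in-memory version of the token bucket, with an injectable
// clock so the refill behavior can be exercised without sleeping.
public class InMemoryTokenBucket {
    private final int bucketCapacity;
    private final double refillRate; // tokens per second
    private int tokenCount;
    private long lastRefillTime;

    public InMemoryTokenBucket(int bucketCapacity, double refillRate, long nowMs) {
        this.bucketCapacity = bucketCapacity;
        this.refillRate = refillRate;
        this.tokenCount = bucketCapacity; // bucket starts full
        this.lastRefillTime = nowMs;
    }

    public boolean isAllowed(long nowMs) {
        // Refill based on elapsed time, capped at capacity (same math as above)
        double elapsedSecs = (nowMs - lastRefillTime) / 1000.0;
        int tokensToAdd = (int) (elapsedSecs * refillRate);
        tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);
        lastRefillTime = nowMs;

        if (tokenCount > 0) {
            tokenCount--; // consume one token
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        InMemoryTokenBucket bucket = new InMemoryTokenBucket(5, 1.0, 0);
        int allowed = 0;
        for (int i = 0; i != 6; i++) {
            if (bucket.isAllowed(0)) allowed++;
        }
        System.out.println(allowed);                // 5: the burst drains the bucket
        System.out.println(bucket.isAllowed(2000)); // true: tokens refilled after 2 s
    }
}
```

&lt;p&gt;Two seconds after the burst exhausts the bucket, two tokens are available again, matching the gradual-refill test we wrote earlier.&lt;/p&gt;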

&lt;h3&gt;
  
  
  GitHub Repo
&lt;/h3&gt;

&lt;p&gt;You can find this implementation in &lt;strong&gt;Java&lt;/strong&gt; and &lt;strong&gt;Kotlin&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Java (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/tree/main/src/main/java/io/redis" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/test/java/io/redis/TokenBucketRateLimiterTest.java" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kotlin (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/main/kotlin/org/example/TokenBucketRateLimiter.kt" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/test/kotlin/org/example/TokenBucketRateLimiterTest.kt" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>redis</category>
      <category>java</category>
      <category>systemdesign</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Fixed Window Counter Rate Limiter (Redis &amp; Java)</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Mon, 30 Dec 2024 13:30:24 +0000</pubDate>
      <link>https://dev.to/redis/fixed-window-counter-rate-limiter-redis-java-dik</link>
      <guid>https://dev.to/redis/fixed-window-counter-rate-limiter-redis-java-dik</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://youtu.be/Ki3WKSNpdRU" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j5ljyqccpn0v4aa2kkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j5ljyqccpn0v4aa2kkv.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Fixed Window Counter&lt;/strong&gt; is the simplest and most straightforward rate-limiting algorithm. It divides time into fixed intervals (e.g., seconds, minutes, or hours) and counts the number of requests within each interval. If the count exceeds a predefined threshold, the requests are rejected until the next interval begins.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Looking for a more precise algorithm? Take a look at the Sliding Window Log implementation. (Coming soon)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How the Fixed Window Counter Rate Limiter Works&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implementation with Redis and Java&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing with TestContainers and AssertJ&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conclusion (GitHub Repo)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5vldnjqp6aos1afq9et.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5vldnjqp6aos1afq9et.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Define a Window Interval&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Choose a time interval, such as 1 second, 1 minute, or 1 hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Track Requests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use a counter to track the number of requests made during the current window.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Reset Counter:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the end of the time window, reset the counter to zero and start counting again for the new window.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Rate Limit Check:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Compare the counter against the allowed limit. If it exceeds the limit, reject further requests until the next window.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Implement It with Redis and Java
&lt;/h2&gt;

&lt;p&gt;There are two ways to implement the Fixed Window Rate Limiter with Redis. The simplest approach consists of the following steps:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Use the INCR command to increment the counter in Redis each time a request is allowed
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCR my_counter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If there's no counter set yet, the INCR command will create one as zero and then increment it to one.&lt;/p&gt;

&lt;p&gt;If the counter is already set, the INCR command will simply increment it by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Set the key to expire in one minute if it’s newly created
&lt;/h3&gt;

&lt;p&gt;If the counter doesn’t exist, we need to set a time-to-live to ensure the time window lasts only for the specified period. &lt;strong&gt;But we should only set an expiration if it doesn’t already exist&lt;/strong&gt;. Otherwise, Redis would reset the expiration, and older requests could be counted beyond the allowed time.&lt;/p&gt;

&lt;p&gt;We’ll use the EXPIRE command with the NX flag on the key. &lt;strong&gt;The NX flag ensures the expiration is only set if the key doesn’t already have one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach is smart because the counter will only track requests during the key’s lifespan. &lt;strong&gt;Once the key expires and is removed, the counter resets, ensuring we only account for requests within the intended time window.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPIRE my_counter 60 NX
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Check the counter for each new request
&lt;/h3&gt;

&lt;p&gt;When a new request comes in, check the counter to see how many requests have been made. If it’s below the threshold, allow the process and increment the counter. If not, block the process from proceeding.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the key doesn’t exist, assume the counter starts at 0.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET my_counter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
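&lt;p&gt;The three steps above can be sketched in plain Java for a single client, with an in-memory counter standing in for the Redis key and a manual window reset standing in for key expiry. This is a hypothetical illustration (the names are made up), not the Jedis implementation we build next:&lt;/p&gt;

```java
// Hypothetical in-memory sketch of the fixed-window check for one client.
// The lazy window reset plays the role of Redis deleting the expired key.
public class FixedWindowSketch {
    private final int limit;
    private final long windowSizeMs;
    private boolean started = false;
    private long windowStart;
    private int count;

    public FixedWindowSketch(int limit, long windowSizeMs) {
        this.limit = limit;
        this.windowSizeMs = windowSizeMs;
    }

    public boolean isAllowed(long nowMs) {
        // Window elapsed: reset the counter, as Redis would via key expiry
        if (!started || nowMs - windowStart >= windowSizeMs) {
            started = true;
            windowStart = nowMs;
            count = 0;
        }
        // Same check as comparing GET my_counter against the limit
        if (limit > count) {
            count++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FixedWindowSketch limiter = new FixedWindowSketch(3, 60000);
        System.out.println(limiter.isAllowed(0));     // true
        System.out.println(limiter.isAllowed(10));    // true
        System.out.println(limiter.isAllowed(20));    // true
        System.out.println(limiter.isAllowed(30));    // false: limit of 3 reached
        System.out.println(limiter.isAllowed(60000)); // true: new window, counter reset
    }
}
```

&lt;p&gt;Just as with the Redis version, the reset happens lazily: nothing changes until the next request arrives after the window has elapsed.&lt;/p&gt;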

&lt;p&gt;Cool! Now that we understand the basics of our implementation, let’s implement it in Java with Jedis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing it with Jedis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jedis&lt;/strong&gt; is a popular Java library for interacting with &lt;strong&gt;Redis&lt;/strong&gt;. We will use it to implement our rate limiter because it provides a simple, intuitive API for executing Redis commands from JVM applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start by adding the Jedis library to your Maven file:
&lt;/h3&gt;

&lt;p&gt;Check the latest version &lt;a href="https://redis.io/docs/latest/develop/clients/jedis/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;redis.clients&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;jedis&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;5.2.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a FixedWindowRateLimiter class:
&lt;/h3&gt;

&lt;p&gt;The class will take:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A Jedis instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A time window size (e.g., 60 seconds).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The maximum number of allowed requests.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;
import redis.clients.jedis.args.ExpiryOption;

public class FixedWindowRateLimiter {

    private final Jedis jedis;
    private final int windowSize;
    private final int limit;

    public FixedWindowRateLimiter(Jedis jedis, int windowSize, int limit) {
        this.jedis = jedis;
        this.limit = limit;
        this.windowSize = windowSize;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Validate the Requests
&lt;/h3&gt;

&lt;p&gt;The main job of this rate limiter is to check if a client is within their allowed request limit. If yes, the request is allowed, and the counter is updated. If not, the request is blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate a key&lt;/strong&gt;&lt;br&gt;
We’ll store each client’s request count as a Redis key. To make keys unique for each client, we’ll format them like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if the client ID is user123, their key would be rate_limit:user123.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Fetch the Current Counter&lt;/strong&gt;&lt;br&gt;
We’ll use Redis’s GET command to check how many requests the client has made so far. If the key doesn’t exist, we assume the client hasn’t made any requests, so the counter is 0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
        String currentCountStr = jedis.get(key);
        int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Check the Request Limit&lt;/strong&gt;&lt;br&gt;
Next, we compare the current count to the allowed limit. If the counter is less than the limit, the request is allowed. Otherwise, it’s blocked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
        String currentCountStr = jedis.get(key);
        int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;

        boolean isAllowed = currentCount &amp;lt; limit;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Increment the Counter and Set Expiration&lt;/strong&gt;&lt;br&gt;
If the request is allowed, we need to do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increment the Counter&lt;/strong&gt;: Use the Redis INCR command to increase the request count by 1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set an Expiration&lt;/strong&gt;: Use the EXPIRE command to ensure the counter resets at the end of the time window. To make sure the expiration isn’t reset every time we increment the counter, we also need to set the NX flag.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll do this in a &lt;strong&gt;transaction&lt;/strong&gt; to ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both INCR and EXPIRE happen together, avoiding race conditions.&lt;/li&gt;
&lt;li&gt;Both INCR and EXPIRE are pipelined (sent in a batch to Redis) to reduce the number of network trips, improving performance.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    if (isAllowed) {
        Transaction transaction = jedis.multi();
        transaction.incr(key); // Increment the counter
        transaction.expire(key, windowSize, ExpiryOption.NX); // Set expiration only if not already set
        transaction.exec(); // Execute both commands atomically
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;The first request marks the start of the time window. Any subsequent requests during this window’s lifespan will increment the counter.&lt;br&gt;
 Once the window expires, the key is automatically removed from Redis. The next request after that will define the start of a new window.&lt;br&gt;
 If we didn’t set the NX flag, the expiration would be reset every time the counter is incremented, extending the lifespan of the window.&lt;/p&gt;
&lt;/blockquote&gt;
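
&lt;p&gt;As an aside, a common variation worth knowing about: instead of letting the first request open the window (as the EXPIRE NX approach here does), some fixed-window limiters align windows to the clock and embed the window start in the key (e.g. rate_limit:user123:120). The following is a minimal sketch of that bucket arithmetic in plain Java; WindowBuckets and windowStartFor are illustrative names, not part of the implementation in this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public class WindowBuckets {

        // Aligns a timestamp (in seconds) to the start of its fixed window.
        static long windowStartFor(long epochSeconds, int windowSizeSeconds) {
            return (epochSeconds / windowSizeSeconds) * windowSizeSeconds;
        }

        public static void main(String[] args) {
            // With a 60-second window, requests at t=125s and t=170s fall into
            // the same window (start 120), while t=180s opens a new one (start 180).
            System.out.println(windowStartFor(125, 60));
            System.out.println(windowStartFor(170, 60));
            System.out.println(windowStartFor(180, 60));
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With clock-aligned keys, the counter key itself changes when the window rolls over, so expiration only needs to garbage-collect old keys rather than define the window boundary.&lt;/p&gt;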
&lt;h3&gt;
  
  
  &lt;strong&gt;Complete Implementation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s the full code for the FixedWindowRateLimiter class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;
    import redis.clients.jedis.args.ExpiryOption;

    public class FixedWindowRateLimiter {

        private final Jedis jedis;
        private final int windowSize;
        private final int limit;

        public FixedWindowRateLimiter(Jedis jedis, int windowSize, int limit) {
            this.jedis = jedis;
            this.limit = limit;
            this.windowSize = windowSize;
        }

        public boolean isAllowed(String clientId) {
            String key = "rate_limit:" + clientId;
            String currentCountStr = jedis.get(key);
            int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;

            boolean isAllowed = currentCount &amp;lt; limit;

            if (isAllowed) {
                Transaction transaction = jedis.multi();
                transaction.incr(key);
                transaction.expire(key, windowSize, ExpiryOption.NX); // Set expire only if not set
                transaction.exec();
            }

            return isAllowed;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we’re ready to start testing its behavior!&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing our Rate Limiter
&lt;/h2&gt;

&lt;p&gt;To ensure our Fixed Window Rate Limiter behaves as expected, we’ll write tests for various scenarios. For this, we’ll use three tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis TestContainers&lt;/strong&gt;: This library spins up an isolated Redis container for testing. This means we don’t need to rely on an external Redis server during our tests. Once the tests are done, the container is stopped, leaving no leftover data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JUnit 5&lt;/strong&gt;: Our main testing framework, which helps us define and structure tests with lifecycle methods like @BeforeEach and @AfterEach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AssertJ&lt;/strong&gt;: A library that makes assertions readable and expressive, like assertThat(result).isTrue().&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s begin by adding the necessary dependencies to our pom.xml.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Dependencies
&lt;/h3&gt;

&lt;p&gt;Here’s what you’ll need in your Maven pom.xml file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;org.junit.jupiter&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;junit-jupiter-engine&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;5.10.0&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;com.redis&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;testcontainers-redis&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;2.2.2&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;org.assertj&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;assertj-core&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;3.11.1&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you’ve added these dependencies, you’re ready to start writing your test class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Test Class
&lt;/h3&gt;

&lt;p&gt;The first step is to create a test class named FixedWindowRateLimiterTest. Inside, we’ll define three main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Test Container&lt;/strong&gt;: This launches a Redis instance in a Docker container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jedis Instance&lt;/strong&gt;: This connects to the Redis container for sending commands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate Limiter&lt;/strong&gt;: The actual FixedWindowRateLimiter instance we’re testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how the skeleton of our test class looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FixedWindowRateLimiterTest {

        private static final RedisContainer redisContainer = new RedisContainer("redis:latest")
                .withExposedPorts(6379);

        private Jedis jedis;
        private FixedWindowRateLimiter rateLimiter;

        // Start Redis container once before any tests run
        static {
            redisContainer.start();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Preparing the Environment Before Each Test&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before running any test, we need to ensure a clean Redis environment. Here’s what we’ll do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect to Redis&lt;/strong&gt;: Use a Jedis instance to connect to the Redis container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flush Data&lt;/strong&gt;: Clear any leftover data in Redis to ensure consistent results for each test.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll set this up in a method annotated with @BeforeEach, which runs before every test case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @BeforeEach
    public void setup() {
        jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
        jedis.flushAll();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;FLUSHALL is an actual Redis command that deletes all the keys of all the existing databases. &lt;a href="https://redis.io/docs/latest/commands/flushall/" rel="noopener noreferrer"&gt;Read more about it in the official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cleaning Up After Each Test&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After each test, we need to close the Jedis connection to free up resources. This ensures no lingering connections interfere with subsequent tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @AfterEach
    public void tearDown() {
        jedis.close();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Full Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s how the complete test class looks with everything in place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public class FixedWindowRateLimiterTest {
        private static final RedisContainer redisContainer = new RedisContainer("redis:latest")
                .withExposedPorts(6379);

        private Jedis jedis;
        private FixedWindowRateLimiter rateLimiter;

        static {
            redisContainer.start();
        }

        @BeforeEach
        public void setup() {
            jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
            jedis.flushAll();
        }

        @AfterEach
        public void tearDown() {
            jedis.close();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests Within the Limit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter allows requests within the defined limit.&lt;/p&gt;

&lt;p&gt;We configure it with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; and a &lt;strong&gt;10-second window&lt;/strong&gt;, then call isAllowed("client-1") &lt;strong&gt;5 times&lt;/strong&gt;. Each call should return true, confirming the rate limiter correctly tracks and permits requests under the limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldAllowRequestsWithinLimit() {
        rateLimiter = new FixedWindowRateLimiter(jedis, 10, 5);
        for (int i = 1; i &amp;lt;= 5; i++) {
            assertThat(rateLimiter.isAllowed("client-1"))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verifying &lt;strong&gt;Requests Beyond the Limit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter correctly denies requests once the defined limit is exceeded.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; in a &lt;strong&gt;60-second window&lt;/strong&gt;, we call isAllowed("client-1") &lt;strong&gt;5 times&lt;/strong&gt; and expect all to return true. On the 6th call, it should return false, verifying the rate limiter blocks requests beyond the allowed limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldDenyRequestsOnceLimitIsExceeded() {
        rateLimiter = new FixedWindowRateLimiter(jedis, 60, 5);
        for (int i = 1; i &amp;lt;= 5; i++) {
            assertThat(rateLimiter.isAllowed("client-1"))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed("client-1"))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests After Window Reset&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter resets correctly after the fixed window expires.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; and a &lt;strong&gt;1-second window&lt;/strong&gt;, the first 5 requests (isAllowed("client-1")) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for the window to expire, the next request is allowed (true), confirming the reset behavior works as expected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldAllowRequestsAgainAfterFixedWindowResets() throws InterruptedException {
        int limit = 5;
        String clientId = "client-1";
        int windowSize = 1;
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();

        Thread.sleep((windowSize + 1) * 1000);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request after window reset should be allowed")
                .isTrue();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Independent Handling of Multiple Clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter handles multiple clients independently.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; and a &lt;strong&gt;10-second window&lt;/strong&gt;, the first 5 requests from &lt;strong&gt;client-1&lt;/strong&gt; are allowed (true), while the 6th is denied (false).&lt;/p&gt;

&lt;p&gt;Simultaneously, all 5 requests from &lt;strong&gt;client-2&lt;/strong&gt; are allowed (true), confirming the rate limiter maintains separate counters for each client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldHandleMultipleClientsIndependently() {
        int limit = 5;
        String clientId1 = "client-1";
        String clientId2 = "client-2";
        int windowSize = 10;
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId1))
                    .withFailMessage("Client 1 request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId1))
                .withFailMessage("Client 1 request beyond limit should be denied")
                .isFalse();

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId2))
                    .withFailMessage("Client 2 request " + i + " should be allowed")
                    .isTrue();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests Are Denied Until Fixed Window Resets&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter denies additional requests until the fixed window expires.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 3 requests&lt;/strong&gt; and a &lt;strong&gt;5-second window&lt;/strong&gt;, the first 3 requests (isAllowed("client-1")) are allowed (true), while the 4th is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for half the window duration (2.5 seconds), requests are still denied (false).&lt;/p&gt;

&lt;p&gt;Once the window fully resets (after another 2.5 seconds), the next request is allowed (true), confirming proper behavior during and after the fixed window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldDenyAdditionalRequestsUntilFixedWindowResets() throws InterruptedException {
        int limit = 3;
        int windowSize = 5;
        String clientId = "client-1";
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed within limit")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();

        Thread.sleep(2500);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request should still be denied within the same fixed window")
                .isFalse();

        Thread.sleep(2500);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request should be allowed after fixed window reset")
                .isTrue();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Denied Requests Are Not Counted&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures that requests denied by the rate limiter are not included in the request count.&lt;/p&gt;

&lt;p&gt;Configured with a limit of 3 requests and a 5-second window, the first 3 requests (isAllowed("client-1")) are allowed (true), while the 4th is denied (false).&lt;/p&gt;

&lt;p&gt;Afterward, the Redis key for the client is checked to confirm the stored count equals the limit (3), ensuring denied requests do not increase the counter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void testRateLimitDeniedRequestsAreNotCounted() {
        int limit = 3;
        int windowSize = 5;
        String clientId = "client-1";
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("This request should be denied")
                .isFalse();

        String key = "rate_limit:" + clientId;
        int requestCount = Integer.parseInt(jedis.get(key));
        assertThat(requestCount)
                .withFailMessage("The count (" + requestCount + ") should be equal to the limit (" + limit + "), not counting the denied request")
                .isEqualTo(limit);
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is there any other behavior we should verify? Let me know in the comments!&lt;/p&gt;

&lt;p&gt;The Fixed Window Rate Limiter is a simple yet effective way to manage request rates, and &lt;strong&gt;Redis&lt;/strong&gt; makes it incredibly fast and reliable.&lt;/p&gt;

&lt;p&gt;By using commands like INCR and EXPIRE, we created a solution that tracks and limits requests while automatically resetting counters when the time window expires.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Jedis&lt;/strong&gt;, we built an easy-to-understand Java implementation, and thanks to thorough testing with Redis TestContainers, JUnit 5, and AssertJ, we can trust it works as expected.&lt;/p&gt;

&lt;p&gt;This approach is a great starting point for handling request limits and can easily be adapted for more complex scenarios if needed.&lt;/p&gt;
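
&lt;p&gt;If you want to experiment with the fixed-window counting rule without a running Redis server, the same logic can be sketched as a plain in-memory Java class. This is an illustrative analogue for local experimentation only; the class name and structure are hypothetical and it is not the Redis-backed implementation above (a synchronized method stands in for the atomicity that Redis gives us, and key expiration becomes a timestamp check).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical in-memory analogue of the Redis-backed limiter above.
    public class InMemoryFixedWindowRateLimiter {

        private static class Window {
            final long startMillis; // set by the first request, like EXPIRE NX
            int count;
            Window(long startMillis) { this.startMillis = startMillis; }
        }

        private final long windowMillis;
        private final int limit;
        private final Map&amp;lt;String, Window&amp;gt; windows = new HashMap&amp;lt;&amp;gt;();

        public InMemoryFixedWindowRateLimiter(int windowSizeSeconds, int limit) {
            this.windowMillis = windowSizeSeconds * 1000L;
            this.limit = limit;
        }

        public synchronized boolean isAllowed(String clientId) {
            long now = System.currentTimeMillis();
            Window w = windows.get(clientId);
            if (w == null || now - w.startMillis &amp;gt;= windowMillis) {
                w = new Window(now); // expired or absent: start a new window
                windows.put(clientId, w);
            }
            if (w.count &amp;gt;= limit) {
                return false;        // denied requests are not counted
            }
            w.count++;
            return true;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The trade-off is the usual one: an in-memory map only limits a single process, while the Redis counter is shared across every instance of your application.&lt;/p&gt;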

&lt;h3&gt;
  
  
  GitHub Repo
&lt;/h3&gt;

&lt;p&gt;You can find this implementation in &lt;strong&gt;Java&lt;/strong&gt; and &lt;strong&gt;Kotlin&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Java (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/main/java/io/redis/FixedWindowRateLimiter.java" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/test/java/io/redis/FixedWindowRateLimiterTest.java" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kotlin (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/main/kotlin/org/example/FixedWindowRateLimiter.kt" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/test/kotlin/org/example/FixedWindowRateLimiterTest.kt" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>java</category>
      <category>redis</category>
      <category>systemdesign</category>
    </item>
  </channel>
</rss>
