<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Raphael De Lio</title>
    <description>The latest articles on DEV Community by Raphael De Lio (@raphaeldelio).</description>
    <link>https://dev.to/raphaeldelio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1009851%2Fd33dfef0-1a4e-49e6-bce1-17198b7238cb.jpeg</url>
      <title>DEV Community: Raphael De Lio</title>
      <link>https://dev.to/raphaeldelio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/raphaeldelio"/>
    <language>en</language>
    <item>
      <title>Semantic Caching with Spring AI &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Thu, 31 Jul 2025 09:37:38 +0000</pubDate>
      <link>https://dev.to/redis/semantic-caching-with-spring-ai-redis-2aa4</link>
      <guid>https://dev.to/redis/semantic-caching-with-spring-ai-redis-2aa4</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt; You’re building a semantic caching system using Spring AI and Redis to improve LLM application performance.&lt;/p&gt;

&lt;p&gt;Unlike traditional caching that requires exact query matches, semantic caching understands the meaning behind queries and can return cached responses for semantically similar questions.&lt;/p&gt;

&lt;p&gt;It works by storing query-response pairs as vector embeddings in Redis, allowing your application to retrieve cached answers for similar questions without calling the expensive LLM, reducing both latency and costs.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstkb1wkl2hb1arkilbff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fstkb1wkl2hb1arkilbff.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The Problem with Traditional LLM Applications
&lt;/h1&gt;

&lt;p&gt;LLMs are powerful but expensive. Every API call costs money and takes time. When users ask similar questions like “What beer goes with grilled meat?” and “Which beer pairs well with barbecue?”, traditional systems would make separate LLM calls even though these queries are essentially asking the same thing.&lt;/p&gt;

&lt;p&gt;Traditional exact-match caching only works if users ask the identical question word-for-word. But in real applications, users phrase questions differently while seeking the same information.&lt;/p&gt;

&lt;h1&gt;
  
  
  How Semantic Caching Works
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=AtVTT_s8AGc&amp;amp;t=1s" rel="noopener noreferrer"&gt;What is a semantic cache?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Semantic caching solves this by understanding the &lt;strong&gt;&lt;em&gt;meaning&lt;/em&gt;&lt;/strong&gt; behind queries rather than matching exact text. When a user asks a question:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; The system converts the query into a vector embedding&lt;/li&gt;
&lt;li&gt; It searches for semantically similar cached queries using vector similarity&lt;/li&gt;
&lt;li&gt; If a similar query exists above a certain threshold, it returns the cached response&lt;/li&gt;
&lt;li&gt; If not, it calls the LLM, gets a response, and caches both the query and response for future use&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Behind the scenes, this works thanks to vector similarity search. It turns text into vectors (embeddings) — lists of numbers — stores them in a vector database, and then finds the ones closest to your query when checking for cached responses.&lt;/p&gt;
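&lt;p&gt;As a toy illustration of that similarity check (the three-dimensional vectors below are made up; real embedding models produce hundreds of dimensions), cosine similarity between two vectors can be computed like this:&lt;/p&gt;

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

fun main() {
    // Made-up 3-dimensional "embeddings"; purely illustrative values.
    val grilledMeat = doubleArrayOf(0.90, 0.10, 0.40) // "What beer goes with grilled meat?"
    val barbecue = doubleArrayOf(0.85, 0.15, 0.45)    // "Which beer pairs well with barbecue?"
    val weather = doubleArrayOf(0.10, 0.90, 0.20)     // an unrelated query

    println(cosineSimilarity(grilledMeat, barbecue))  // high similarity: likely a cache hit
    println(cosineSimilarity(grilledMeat, weather))   // low similarity: cache miss
}
```

&lt;p&gt;Two phrasings of the same question land close together in vector space, while an unrelated query scores much lower; that gap is what the similarity threshold separates.&lt;/p&gt;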

&lt;p&gt;Today, we’re going to build a semantic caching system for a beer recommendation assistant. It will remember previous responses to similar questions, dramatically improving response times and reducing API costs.&lt;/p&gt;

&lt;p&gt;To do that, we’ll build a Spring Boot app from scratch and use Redis as our semantic cache store. It’ll handle vector embeddings for similarity matching, enabling our application to provide lightning-fast responses for semantically similar queries.&lt;/p&gt;

&lt;h1&gt;
  
  
  Redis as a Semantic Cache for AI Applications
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=Yhv19le0sBw&amp;amp;t=1s" rel="noopener noreferrer"&gt;What's a vector database&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Redis Open Source 8 not only turns the community version of Redis into a Vector Database, but also makes it the fastest and most scalable database in the market today. Redis 8 allows you to scale to one billion vectors without penalizing latency.&lt;/p&gt;

&lt;p&gt;For semantic caching, Redis serves as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  A vector store using Redis JSON and the Redis Query Engine for storing query embeddings&lt;/li&gt;
&lt;li&gt;  A metadata store for cached responses and additional context&lt;/li&gt;
&lt;li&gt;  A high-performance search engine for finding semantically similar queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Spring AI and Redis
&lt;/h1&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=0U1S0WSsPuE&amp;amp;t=1s" rel="noopener noreferrer"&gt;What’s an embedding model?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Spring AI provides a unified API for working with various AI models and vector stores. Combined with Redis, it allows developers to easily build semantic caching systems that can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Store and retrieve vector embeddings for semantic search&lt;/li&gt;
&lt;li&gt;  Cache LLM responses with semantic similarity matching&lt;/li&gt;
&lt;li&gt;  Reduce API costs by avoiding redundant LLM calls&lt;/li&gt;
&lt;li&gt;  Improve response times for similar queries&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Building the Application
&lt;/h1&gt;

&lt;p&gt;Our application will be built using Spring Boot with Spring AI and Redis. It will implement a beer recommendation assistant that caches responses semantically, providing fast answers to similar questions about beer pairings.&lt;/p&gt;

&lt;h2&gt;
  
  
  0. GitHub Repository
&lt;/h2&gt;

&lt;p&gt;The full application can be found on GitHub:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/semantic-caching-with-spring-ai" rel="noopener noreferrer"&gt;https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/semantic-caching-with-spring-ai&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Add the required dependencies
&lt;/h2&gt;

&lt;p&gt;From a Spring Boot application, add the following dependencies to your Maven or Gradle file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;implementation("org.springframework.ai:spring-ai-transformers:1.0.0")
implementation("org.springframework.ai:spring-ai-starter-vector-store-redis")
implementation("org.springframework.ai:spring-ai-starter-model-openai")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Configure the Semantic Cache Vector Store
&lt;/h2&gt;

&lt;p&gt;We’ll use Spring AI’s &lt;code&gt;RedisVectorStore&lt;/code&gt; to store and search vector embeddings of cached queries and responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Configuration
class SemanticCacheConfig {
    @Bean
    fun semanticCachingVectorStore(
        embeddingModel: TransformersEmbeddingModel,
        jedisPooled: JedisPooled
    ): RedisVectorStore {
        return RedisVectorStore.builder(jedisPooled, embeddingModel)
            .indexName("semanticCachingIdx")
            .contentFieldName("content")
            .embeddingFieldName("embedding")
            .metadataFields(
                RedisVectorStore.MetadataField("answer", Schema.FieldType.TEXT)
            )
            .prefix("semantic-caching:")
            .initializeSchema(true)
            .vectorAlgorithm(RedisVectorStore.Algorithm.HSNW)
            .build()
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Index Name&lt;/strong&gt;: &lt;code&gt;semanticCachingIdx&lt;/code&gt; — Redis will create an index with this name for searching cached responses&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Field&lt;/strong&gt;: &lt;code&gt;content&lt;/code&gt; — The raw prompt that will be embedded&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Embedding Field&lt;/strong&gt;: &lt;code&gt;embedding&lt;/code&gt; — The field that will store the resulting vector embedding&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Metadata Fields&lt;/strong&gt;: &lt;code&gt;answer&lt;/code&gt;, a TEXT field for storing the LLM's response&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Prefix&lt;/strong&gt;: &lt;code&gt;semantic-caching:&lt;/code&gt; — All keys in Redis will be prefixed with this to organize the data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Vector Algorithm&lt;/strong&gt;: the &lt;code&gt;HSNW&lt;/code&gt; constant selects HNSW, the Hierarchical Navigable Small World algorithm, for efficient approximate nearest neighbor search&lt;/li&gt;
&lt;/ul&gt;
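
&lt;p&gt;For illustration, a cached entry stored under this configuration would look roughly like the following JSON document, kept under a key starting with &lt;code&gt;semantic-caching:&lt;/code&gt; followed by a generated ID (all field values here are made up):&lt;/p&gt;

```json
{
  "content": "What beer goes with grilled meat?",
  "embedding": [0.012, -0.341, 0.087, "..."],
  "answer": "A smoky porter or a malty amber ale pairs well with grilled meat."
}
```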

&lt;h2&gt;
  
  
  3. Implement the Semantic Caching Service
&lt;/h2&gt;

&lt;p&gt;The SemanticCachingService handles storing and retrieving cached responses from Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
class SemanticCachingService(
    private val semanticCachingVectorStore: RedisVectorStore
) {
    private val logger = LoggerFactory.getLogger(SemanticCachingService::class.java)
    fun storeInCache(prompt: String, answer: String) {
        // Create a document for the vector store
        val document = Document(
            prompt,
            mapOf("answer" to answer)
        )
        // Store the document in the vector store
        semanticCachingVectorStore.add(listOf(document))

        logger.info("Stored response in semantic cache for prompt: ${prompt.take(50)}...")
    }
    fun getFromCache(prompt: String, similarityThreshold: Double = 0.8): String? {
        // Execute similarity search
        val results = semanticCachingVectorStore.similaritySearch(
            SearchRequest.builder()
                .query(prompt)
                .topK(1)
                .build()
        )
        // Check if we found a semantically similar query above threshold
        if (results?.isNotEmpty() == true) {
            val score = results[0].score ?: 0.0
            if (similarityThreshold &amp;lt; score) {
                logger.info("Cache hit! Similarity score: $score")
                return results[0].metadata["answer"] as String
            } else {
                logger.info("Similar query found but below threshold. Score: $score")
            }
        }
        logger.info("No cached response found for prompt")
        return null
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the semantic caching service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Stores query-response pairs as vector embeddings in Redis&lt;/li&gt;
&lt;li&gt;  Retrieves cached responses using vector similarity search&lt;/li&gt;
&lt;li&gt;  Configurable similarity threshold for cache hits&lt;/li&gt;
&lt;li&gt;  Comprehensive logging for debugging and monitoring&lt;/li&gt;
&lt;/ul&gt;
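
&lt;p&gt;As a hypothetical usage sketch (assuming the service is injected from the Spring context, and with &lt;code&gt;callLlm&lt;/code&gt; standing in for the RAG pipeline shown in the next step), the cache behaves like this:&lt;/p&gt;

```kotlin
// Hypothetical usage of SemanticCachingService; callLlm is a placeholder.
val question = "What beer goes with grilled meat?"
val cached = semanticCachingService.getFromCache(question)
if (cached == null) {
    // Cache miss: pay for one LLM call, then store the answer for next time
    val answer = callLlm(question)
    semanticCachingService.storeInCache(question, answer)
}

// Later, a semantically similar question can be served from the cache
// without another LLM call, as long as its similarity score exceeds 0.8:
val hit = semanticCachingService.getFromCache("Which beer pairs well with barbecue?")
```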

&lt;h2&gt;
  
  
  4. Integrate with the RAG Service
&lt;/h2&gt;

&lt;p&gt;The RagService orchestrates the semantic caching with the standard RAG pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
class RagService(
    private val chatModel: ChatModel,
    private val vectorStore: RedisVectorStore,
    private val semanticCachingService: SemanticCachingService
) {
    private val logger = LoggerFactory.getLogger(RagService::class.java)
    fun retrieve(message: String): RagResult {
        // Check semantic cache first
        val startCachingTime = System.currentTimeMillis()
        val cachedAnswer = semanticCachingService.getFromCache(message, 0.8)
        val cachingTimeMs = System.currentTimeMillis() - startCachingTime
        if (cachedAnswer != null) {
            logger.info("Returning cached response")
            return RagResult(
                generation = Generation(AssistantMessage(cachedAnswer)),
                metrics = RagMetrics(
                    embeddingTimeMs = 0,
                    searchTimeMs = 0,
                    llmTimeMs = 0,
                    cachingTimeMs = cachingTimeMs,
                    fromCache = true
                )
            )
        }
        // Standard RAG process if no cache hit
        logger.info("No cache hit, proceeding with RAG pipeline")

        // Retrieve relevant documents
        val startEmbeddingTime = System.currentTimeMillis()
        val searchResults = vectorStore.similaritySearch(
            SearchRequest.builder()
                .query(message)
                .topK(5)
                .build()
        )
        val embeddingTimeMs = System.currentTimeMillis() - startEmbeddingTime
        // Create context from retrieved documents
        val context = searchResults.joinToString("\n") { it.text }

        // Generate response using LLM
        val startLlmTime = System.currentTimeMillis()
        val prompt = createPromptWithContext(message, context)
        val response = chatModel.call(prompt)
        val llmTimeMs = System.currentTimeMillis() - startLlmTime
        // Store the response in semantic cache for future use
        val responseText = response.result.output.text ?: ""
        semanticCachingService.storeInCache(message, responseText)
        return RagResult(
            generation = response.result,
            metrics = RagMetrics(
                embeddingTimeMs = embeddingTimeMs,
                searchTimeMs = 0, // Combined with embedding time
                llmTimeMs = llmTimeMs,
                cachingTimeMs = 0,
                fromCache = false
            )
        )
    }
    private fun createPromptWithContext(query: String, context: String): Prompt {
        val systemMessage = SystemMessage("""
            You are a beer recommendation assistant. Use the provided context to answer 
            questions about beer pairings, styles, and recommendations.

            Context: $context
        """.trimIndent())

        val userMessage = UserMessage(query)

        return Prompt(listOf(systemMessage, userMessage))
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the integrated RAG service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Checks semantic cache before expensive LLM calls&lt;/li&gt;
&lt;li&gt;  Falls back to standard RAG pipeline for cache misses&lt;/li&gt;
&lt;li&gt;  Automatically caches new responses for future use&lt;/li&gt;
&lt;li&gt;  Provides detailed performance metrics including cache hit indicators&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Running the Demo
&lt;/h1&gt;

&lt;p&gt;The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Clone the repository
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/redis-developer/redis-springboot-resources.git
cd redis-springboot-resources/artificial-intelligence/semantic-caching-with-spring-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2: Configure your environment
&lt;/h2&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file with your OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-your-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3: Start the services
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker compose up --build
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;redis&lt;/strong&gt;: for storing both vector embeddings and cached responses&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;redis-insight&lt;/strong&gt;: a UI to explore the Redis data&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;semantic-caching-app&lt;/strong&gt;: the Spring Boot app that implements the semantic caching system&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 4: Use the application
&lt;/h2&gt;

&lt;p&gt;When all services are running, go to &lt;code&gt;localhost:8080&lt;/code&gt; to access the demo. You'll see a beer recommendation interface:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lv3axkphkkkqa213woq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5lv3axkphkkkqa213woq.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you click on &lt;code&gt;Start Chat&lt;/code&gt;, the embeddings may still be being created, and you'll see a message asking you to wait for this operation to complete. This is the step where the documents we'll search through are turned into vectors and stored in the database. It runs only the first time the app starts up and is required regardless of the vector database you use.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq0n7l8a9n6qdeze522b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgq0n7l8a9n6qdeze522b.png" width="554" height="235"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all the embeddings have been created, you can start asking your chatbot questions. It will semantically search through the documents we have stored, try to find the best answer for your questions, and cache the responses semantically in Redis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcg5ebqkdyhq9rdu13f.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frfcg5ebqkdyhq9rdu13f.gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you ask something similar to a question that has already been asked, your chatbot will retrieve the answer from the cache instead of sending the query to the LLM, returning a response much faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zqfpbqosu9bl779g02.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F34zqfpbqosu9bl779g02.gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Exploring the Data in Redis Insight
&lt;/h1&gt;

&lt;p&gt;Redis Insight provides a visual interface for exploring the cached data in Redis. Access it at &lt;code&gt;localhost:5540&lt;/code&gt; to see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Semantic Cache Entries&lt;/strong&gt;: Stored as JSON documents with vector embeddings&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Vector Index Schema&lt;/strong&gt;: The schema used for similarity search&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Performance Metrics&lt;/strong&gt;: Monitor cache hit rates and response times&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3i763jso2a7nrfhb22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno3i763jso2a7nrfhb22.png" alt="captionless image" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you run the &lt;code&gt;FT.INFO semanticCachingIdx&lt;/code&gt; command in the Redis Insight workbench, you'll see the details of the vector index schema that enables efficient semantic matching.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix5rqlugz9lv1uoxecat.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fix5rqlugz9lv1uoxecat.png" alt="captionless image" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Wrapping up
&lt;/h1&gt;

&lt;p&gt;And that’s it — you now have a working semantic caching system using Spring Boot and Redis.&lt;/p&gt;

&lt;p&gt;Instead of making expensive LLM calls for every similar question, your application can now intelligently cache and retrieve responses based on semantic meaning. Redis handles the vector storage and similarity search with the performance and scalability it is known for.&lt;/p&gt;

&lt;p&gt;With Spring AI and Redis, you get an easy way to integrate semantic caching into your Java applications. The combination of vector similarity search for semantic matching and efficient caching gives you a powerful foundation for building cost-effective, high-performance AI applications.&lt;/p&gt;

&lt;p&gt;Whether you’re building chatbots, recommendation engines, or question-answering systems, this semantic caching architecture gives you the tools to dramatically reduce costs while maintaining response quality and improving user experience.&lt;/p&gt;

&lt;p&gt;Try it out, experiment with different similarity thresholds, explore other embedding models, and see how much you can save on LLM costs while delivering faster responses!&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Stay Curious!&lt;/strong&gt;
&lt;/h2&gt;

</description>
      <category>redis</category>
      <category>systemdesign</category>
      <category>springboot</category>
      <category>ai</category>
    </item>
    <item>
      <title>Agent Long-term Memory with Spring AI &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Wed, 16 Jul 2025 19:57:23 +0000</pubDate>
      <link>https://dev.to/redis/agent-memory-with-spring-ai-redis-58g5</link>
      <guid>https://dev.to/redis/agent-memory-with-spring-ai-redis-58g5</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;TL;DR:&lt;br&gt;
You're building an AI agent with memory using Spring AI and Redis.&lt;/p&gt;

&lt;p&gt;Unlike traditional chatbots that forget previous interactions, memory-enabled agents can recall past conversations and facts.&lt;/p&gt;

&lt;p&gt;It works by storing two types of memory in Redis: short-term (conversation history) and long-term (facts and experiences as vectors), allowing agents to provide personalized, context-aware responses.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;LLMs respond to each message in isolation, treating every interaction as if it's the first time they've spoken with a user. They lack the ability to remember previous conversations, preferences, or important facts.&lt;/p&gt;

&lt;p&gt;Memory-enabled AI agents, on the other hand, can maintain context across multiple interactions. They remember who you are, what you've told them before, and can use that information to provide more personalized, relevant responses.&lt;/p&gt;

&lt;p&gt;In a travel assistant scenario, for example, if a user mentions "I'm allergic to shellfish" in one conversation, and later asks for restaurant recommendations in Boston, a memory-enabled agent would recall the allergy information and filter out inappropriate suggestions, creating a much more helpful and personalized experience.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/0U1S0WSsPuE" rel="noopener noreferrer"&gt;What is an embedding model?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Behind the scenes, this works thanks to vector similarity search. It turns text into vectors (embeddings) — lists of numbers — stores them in a vector database, and then finds the ones closest to your query when relevant information needs to be recalled.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/o3XN4dImESE" rel="noopener noreferrer"&gt;What is semantic search?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Today, we're going to build a memory-enabled AI agent that helps users plan travel. It will remember user preferences, past trips, and important details across multiple conversations — even if the user leaves and comes back later.&lt;/p&gt;

&lt;p&gt;To do that, we'll build a Spring Boot app from scratch and use Redis as our memory store. It'll handle both short-term memory (conversation history) and long-term memory (facts and preferences as vector embeddings), enabling our agent to provide truly personalized assistance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis as a Memory Store for AI Agents
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://youtu.be/Yhv19le0sBw" rel="noopener noreferrer"&gt;What is a vector database?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over the last 15 years, Redis has become the foundational infrastructure for real-time applications. Today, with Redis Open Source 8, it's committed to becoming the foundational infrastructure for AI applications as well.&lt;/p&gt;

&lt;p&gt;Redis Open Source 8 not only turns the community version of Redis into a Vector Database, but also makes it the fastest and most scalable database in the market today. Redis 8 allows you to scale to one billion vectors without penalizing latency.&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://redis.io/blog/searching-1-billion-vectors-with-redis-8/" rel="noopener noreferrer"&gt;https://redis.io/blog/searching-1-billion-vectors-with-redis-8/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For AI agents, Redis serves as both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A short-term memory store using Redis Lists to maintain conversation history&lt;/li&gt;
&lt;li&gt;A long-term memory store using Redis JSON and the Redis Query Engine that enables vector search to store and retrieve facts and experiences&lt;/li&gt;
&lt;/ol&gt;
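
&lt;p&gt;The short-term side can be sketched with plain Redis Lists. This is a minimal example assuming a local Redis instance and an illustrative key-naming scheme (not the application's actual keys):&lt;/p&gt;

```kotlin
import redis.clients.jedis.JedisPooled

fun main() {
    val jedis = JedisPooled("localhost", 6379)
    val key = "chat:history:user-123" // illustrative key name

    // Append each conversation turn to the history
    jedis.rpush(key, "USER: I'm allergic to shellfish")
    jedis.rpush(key, "ASSISTANT: Noted, I'll avoid shellfish in my suggestions")

    // Keep only the most recent 20 turns to bound the prompt size
    jedis.ltrim(key, -20, -1)

    // Reload the history when the conversation resumes
    val history = jedis.lrange(key, 0, -1)
    history.forEach { println(it) }
}
```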

&lt;h2&gt;
  
  
  Spring AI and Redis
&lt;/h2&gt;

&lt;p&gt;Spring AI provides a unified API for working with various AI models and vector stores. Combined with Redis, it allows developers to easily build memory-enabled AI agents that can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Store and retrieve vector embeddings for semantic search&lt;/li&gt;
&lt;li&gt;Maintain conversation context across sessions&lt;/li&gt;
&lt;li&gt;Extract and deduplicate memories from conversations&lt;/li&gt;
&lt;li&gt;Summarize long conversations to prevent context window overflow&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Building the Application
&lt;/h2&gt;

&lt;p&gt;Our application will be built using Spring Boot with Spring AI and Redis. It will implement a travel assistant that remembers user preferences and past trips, providing personalized recommendations based on this memory.&lt;/p&gt;

&lt;h3&gt;
  
  
  0. GitHub Repository
&lt;/h3&gt;

&lt;p&gt;The full application can be found on GitHub: &lt;a href="https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/agent-long-term-memory-with-spring-ai" rel="noopener noreferrer"&gt;https://github.com/redis-developer/redis-springboot-resources/tree/main/artificial-intelligence/agent-long-term-memory-with-spring-ai&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Add the required dependencies
&lt;/h3&gt;

&lt;p&gt;From a Spring Boot application, add the following dependencies to your Maven or Gradle file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-transformers:1.0.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-starter-vector-store-redis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"org.springframework.ai:spring-ai-starter-model-openai"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.redis.om:redis-om-spring:1.0.0-RC3"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Define the Memory model
&lt;/h3&gt;

&lt;p&gt;The core of our implementation is the Memory class that represents items stored in long-term memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;EPISODIC&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;// Personal experiences and preferences&lt;/span&gt;
    &lt;span class="nc"&gt;SEMANTIC&lt;/span&gt;   &lt;span class="c1"&gt;// General knowledge and facts&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Configure the Vector Store
&lt;/h3&gt;

&lt;p&gt;We'll use Spring AI's &lt;code&gt;RedisVectorStore&lt;/code&gt; to store and search vector embeddings of memories:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Configuration&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryVectorStoreConfig&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;EmbeddingModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;jedisPooled&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;JedisPooled&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jedisPooled&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embeddingModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;indexName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"longTermMemoryIdx"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;contentFieldName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;embeddingFieldName&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"embedding"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;metadataFields&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TAG&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TAG&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;MetadataField&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FieldType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;TEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"long-term-memory:"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;initializeSchema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;true&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorAlgorithm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Algorithm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HSNW&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's break this down:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Index Name&lt;/strong&gt;: &lt;code&gt;longTermMemoryIdx&lt;/code&gt; - Redis will create an index with this name for searching memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content Field&lt;/strong&gt;: &lt;code&gt;content&lt;/code&gt; - The raw memory content that will be embedded&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Field&lt;/strong&gt;: &lt;code&gt;embedding&lt;/code&gt; - The field that will store the resulting vector embedding&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata Fields&lt;/strong&gt;:

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;memoryType&lt;/code&gt;: TAG field for filtering by memory type (EPISODIC or SEMANTIC)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;metadata&lt;/code&gt;: TEXT field for storing additional context about the memory&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;userId&lt;/code&gt;: TAG field for filtering by user ID&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;createdAt&lt;/code&gt;: TEXT field for storing the creation timestamp&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
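&lt;p&gt;Note that this configuration assumes a &lt;code&gt;JedisPooled&lt;/code&gt; bean is available in the application context. A minimal sketch of such a bean (the host and port here are assumptions; point them at your own Redis instance):&lt;/p&gt;

```kotlin
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import redis.clients.jedis.JedisPooled

@Configuration
class RedisConfig {

    // Hypothetical connection settings; adjust to your environment
    @Bean
    fun jedisPooled(): JedisPooled = JedisPooled("localhost", 6379)
}
```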

&lt;h3&gt;
  
  
  4. Implement the Memory Service
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;MemoryService&lt;/code&gt; handles storing and retrieving memories from Redis:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RedisVectorStore&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;systemUserId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;storeMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Check if a similar memory already exists to avoid duplicates&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;similarMemoryExists&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Create a document for the vector store&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;document&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Document&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="nf"&gt;mapOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"memoryType"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"metadata"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="s"&gt;"userId"&lt;/span&gt; &lt;span class="nf"&gt;to&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="s"&gt;"createdAt"&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toString&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Store the document in the vector store&lt;/span&gt;
        &lt;span class="n"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;retrieveMemories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distanceThreshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Float&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.9f&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Build filter expression&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;b&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FilterExpressionBuilder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;filterList&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mutableListOf&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;FilterExpressionBuilder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Op&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;// Add user filter&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;effectiveUserId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;
        &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;effectiveUserId&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

        &lt;span class="c1"&gt;// Add memory type filter if specified&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;!=&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;memoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="c1"&gt;// Combine filters&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;filterExpression&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;null&lt;/span&gt;
            &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;filterList&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;and&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;acc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;expr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;// Execute search&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;searchResults&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryVectorStore&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similaritySearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="nc"&gt;SearchRequest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;topK&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filterExpression&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filterExpression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Transform results to StoredMemory objects&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;searchResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mapNotNull&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;distanceThreshold&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;
                &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryObj&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;valueOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"memoryType"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SEMANTIC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="n"&gt;metadata&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"metadata"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;systemUserId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;createdAt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"createdAt"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="nc"&gt;LocalDateTime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nc"&gt;StoredMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryObj&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;null&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key features of the memory service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stores memories as vector embeddings in Redis&lt;/li&gt;
&lt;li&gt;Retrieves memories using vector similarity search&lt;/li&gt;
&lt;li&gt;Filters memories by user ID and memory type&lt;/li&gt;
&lt;li&gt;Prevents duplicate memories through similarity checking&lt;/li&gt;
&lt;/ul&gt;
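&lt;p&gt;One detail worth calling out: although the parameter is named &lt;code&gt;distanceThreshold&lt;/code&gt;, it is compared against the similarity &lt;em&gt;score&lt;/em&gt; on each search result, so only results scoring &lt;em&gt;above&lt;/em&gt; the threshold are kept, and a missing score defaults to 1.0 and passes. A standalone sketch of that filtering logic (the &lt;code&gt;Scored&lt;/code&gt; type and the sample values are hypothetical):&lt;/p&gt;

```kotlin
// Hypothetical stand-in for a vector store search result
data class Scored(val content: String, val score: Double?)

// Mirrors the check in retrieveMemories: keep results whose
// similarity score exceeds the threshold; a null score defaults to 1.0
fun filterByThreshold(results: List<Scored>, threshold: Double = 0.9): List<Scored> =
    results.filter { threshold < (it.score ?: 1.0) }

fun main() {
    val kept = filterByThreshold(
        listOf(
            Scored("user enjoys hiking", 0.95),   // above threshold: kept
            Scored("loosely related note", 0.42), // below threshold: dropped
            Scored("no score reported", null)     // defaults to 1.0: kept
        )
    )
    println(kept.map { it.content }) // [user enjoys hiking, no score reported]
}
```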

&lt;h3&gt;
  
  
  5. Implement Spring AI Advisors
&lt;/h3&gt;

&lt;p&gt;We’re going to rely on the Spring AI Advisors API. Advisors are a way to intercept, modify, and enhance AI-driven interactions.&lt;br&gt;
We will implement two advisors: one for retrieving memories and another for recording them. These advisors will be plugged into our &lt;code&gt;ChatClient&lt;/code&gt; and will intercept every interaction with the LLM.&lt;/p&gt;

&lt;h3&gt;
  
  
  5.1 Advisor for Long-term memory retrieval
&lt;/h3&gt;

&lt;p&gt;The retrieval advisor runs before LLM calls. It takes the user’s current message, performs a vector similarity search over Redis, and injects the most relevant memories into the system portion of the prompt so the model can ground its answer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRetrievalAdvisor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="k"&gt;companion&lt;/span&gt; &lt;span class="k"&gt;object&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;USER_ID&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;   
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;TOP_K&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ltm_top_k"&lt;/span&gt;      
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getOrder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HIGHEST_PRECEDENCE&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;40&lt;/span&gt;
  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getName&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"LongTermMemoryRetrievalAdvisor"&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;adviseCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClientRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisorChain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClientResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;context&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="nc"&gt;USER_ID&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;k&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;context&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="nc"&gt;TOP_K&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;query&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memories&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;retrieveRelevantMemories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;take&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryBlock&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Use the MEMORY below if relevant. Keep answers factual and concise."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"----- MEMORY -----"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEachIndexed&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"${i+1}. ${m.memory.content}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"------------------"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;enrichedPrompt&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;augmentSystemMessage&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;existing&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
      &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nf"&gt;buildString&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;memoryBlock&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;isNotBlank&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
              &lt;span class="nf"&gt;appendLine&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
              &lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;existing&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
          &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;enrichedReq&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mutate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrichedPrompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nextCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enrichedReq&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  5.2 Advisor for Long-Term Memory Recording
&lt;/h3&gt;

&lt;p&gt;The recorder advisor runs after the assistant responds. It looks at the last user message and the assistant’s reply, asks the model to extract atomic, useful facts (episodic or semantic), deduplicates them, and stores them in Redis.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Component&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRecorderAdvisor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryService&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModel&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

  &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;MemoryCandidate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;MemoryType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;?)&lt;/span&gt;
  &lt;span class="kd"&gt;data class&lt;/span&gt; &lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MemoryCandidate&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;emptyList&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

  &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractorConverter&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;BeanOutputConverter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getOrder&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Int&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Ordered&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HIGHEST_PRECEDENCE&lt;/span&gt; &lt;span class="p"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getName&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"LongTermMemoryRecorderAdvisor"&lt;/span&gt;

  &lt;span class="k"&gt;override&lt;/span&gt; &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;adviseCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClientRequest&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;CallAdvisorChain&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClientResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// 1) Proceed with the normal call (other advisors may have enriched the prompt)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;res&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nextCall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// 2) Build extraction prompt (user + assistant text of *this* turn)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;assistantText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chatResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;?.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="c1"&gt;// 3) Ask the model to extract long-term memories as structured JSON&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;schemaHint&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractorConverter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jsonSchema&lt;/span&gt; &lt;span class="c1"&gt;// JSON schema string for the POJO&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractSystem&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
            You extract LONG-TERM MEMORIES from a dialogue turn.

            A memory is either:

            1. EPISODIC MEMORIES: Personal experiences and user-specific preferences
               Examples: "User prefers Delta airlines", "User visited Paris last year"

            2. SEMANTIC MEMORIES: General domain knowledge and facts
               Examples: "Singapore requires passport", "Tokyo has excellent public transit"

            Only extract clear, factual information. Do not make assumptions or infer information that isn't explicitly stated.
            If no memories can be extracted, return an empty array.

            The instance must conform to this JSON Schema (for validation, do not output it):
              $schemaHint

            Do not include code fences, schema, or properties. Output a single-line JSON object.
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extractUser&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
            USER SAID:
            $userText

            ASSISTANT REPLIED:
            $assistantText

            Extract up to 5 memories with correct type; set userId if present/known.
        """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatOptions&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAiChatOptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;responseFormat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ResponseFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;type&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ResponseFormat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;JSON_OBJECT&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;extraction&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nf"&gt;listOf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
          &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extractSystem&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
          &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extractUser&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;options&lt;/span&gt;
      &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;parsed&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extractorConverter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;convert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="nc"&gt;ExtractionResult&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c1"&gt;// 4) Persist memories (MemoryService handles dedupe/thresholding)&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;// optional per-call param&lt;/span&gt;
    &lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
      &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;owner&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;
      &lt;span class="n"&gt;memoryService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;storeMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;memoryType&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;owner&lt;/span&gt;
      &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
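&lt;p&gt;The deduplication mentioned above happens inside MemoryService rather than in the advisor itself. The core idea can be sketched without any Spring AI types: embed the candidate fact, compare it against already-stored embeddings, and skip the write when a near-duplicate exists. The names below (InMemoryMemoryStore, storeIfNovel) and the 0.95 threshold are illustrative only; the article’s MemoryService does the equivalent with a Redis vector search.&lt;br&gt;
&lt;/p&gt;

```kotlin
// Illustrative similarity-based dedupe. The real MemoryService replaces the
// in-memory list with a Redis vector index; everything here is a sketch.
fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]
    }
    return dot / (kotlin.math.sqrt(na) * kotlin.math.sqrt(nb))
}

class InMemoryMemoryStore(private val threshold: Float = 0.95f) {
    private val stored = mutableListOf<Pair<String, FloatArray>>()

    /** Stores the memory unless an existing embedding is within the threshold. */
    fun storeIfNovel(content: String, embedding: FloatArray): Boolean {
        if (stored.any { (_, e) -> cosine(e, embedding) >= threshold }) return false
        stored += content to embedding
        return true
    }
}
```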



&lt;h3&gt;
  
  
  6. Plugging the Advisors into Our ChatClient
&lt;/h3&gt;

&lt;p&gt;In our ChatConfig class, we configure the ChatClient as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;    &lt;span class="nd"&gt;@Bean&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;chatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatModel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;// chatMemory: ChatMemory, (Necessary for short-term memory)&lt;/span&gt;
        &lt;span class="n"&gt;longTermRecorder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRecorderAdvisor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;longTermMemoryRetrieval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemoryRetrievalAdvisor&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;builder&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chatModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;defaultAdvisors&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="c1"&gt;// MessageChatMemoryAdvisor.builder(chatMemory).build(),&lt;/span&gt;
                &lt;span class="n"&gt;longTermRecorder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;longTermMemoryRetrieval&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;build&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  7. Implement the Chat Service
&lt;/h3&gt;

&lt;p&gt;Since the advisors are plugged into the ChatClient itself, we don’t need to manage memory ourselves when interacting with the LLM. The only thing we need to ensure is that every interaction sends the expected parameters, namely the session or user ID, so that the advisors know which history to look at.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Service&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatClient&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;shortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ShortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatMemoryRepository&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;log&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoggerFactory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;java&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Use userId as the key for conversation history and long-term memory&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Processing message from user $userId: $message"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nc"&gt;Prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="nc"&gt;UserMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advisors&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;it&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatMemory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;CONVERSATION_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;param&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ltm_user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chatResponse&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="o"&gt;!!&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;


    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Message&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;findByConversationId&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;clearConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;shortTermMemoryRepository&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deleteById&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Cleared conversation history for user $userId from Redis"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
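&lt;p&gt;Once the REST controller from section 9 is running, exercising the agent is a single HTTP call. The host and port below assume Spring Boot’s local defaults; the message and userId fields match the ChatRequest the controller consumes.&lt;br&gt;
&lt;/p&gt;

```shell
# Send a chat message; userId keys both short-term history and long-term memory.
curl -s -X POST http://localhost:8080/api/chat \
  -H 'Content-Type: application/json' \
  -d '{"message": "I prefer window seats on long flights", "userId": "raphael"}'
```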



&lt;h3&gt;
  
  
  8. Configure the Agent System Prompt
&lt;/h3&gt;

&lt;p&gt;The agent is configured with a system prompt that explains its capabilities and access to different types of memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;travelAgentSystemPrompt&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nc"&gt;Message&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;promptText&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"""
        You are a travel assistant helping users plan their trips. You remember user preferences
        and provide personalized recommendations based on past interactions.

        You have access to the following types of memory:
        1. Short-term memory: The current conversation thread
        2. Long-term memory:
           - Episodic: User preferences and past trip experiences (e.g., "User prefers window seats")
           - Semantic: General knowledge about travel destinations and requirements

        Always be helpful, personal, and context-aware in your responses.

        Always answer in text format. No markdown or special formatting.
    """&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trimIndent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;promptText&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  9. Create the REST Controller
&lt;/h3&gt;

&lt;p&gt;The REST controller exposes endpoints for chat and memory management:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="nd"&gt;@RequestMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/api"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatController&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatService&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;

    &lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/chat"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;ChatRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;val&lt;/span&gt; &lt;span class="py"&gt;result&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ChatResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;?:&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;metrics&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metrics&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/history/{userId}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;getHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@PathVariable&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="p"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;MessageDto&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt;
            &lt;span class="nc"&gt;MessageDto&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;role&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"system"&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;UserMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"user"&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;AssistantMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"assistant"&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;"unknown"&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;when&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;SystemMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;UserMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;is&lt;/span&gt; &lt;span class="nc"&gt;AssistantMessage&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
                    &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@DeleteMapping&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/history/{userId}"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;fun&lt;/span&gt; &lt;span class="nf"&gt;clearHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@PathVariable&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;chatService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clearConversationHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
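&lt;p&gt;The &lt;code&gt;ChatRequest&lt;/code&gt;, &lt;code&gt;ChatResponse&lt;/code&gt;, and &lt;code&gt;MessageDto&lt;/code&gt; classes referenced by the controller aren't shown above. Here's a minimal sketch of what they might look like; the field names are inferred from the controller code, and the metrics type is an assumption:&lt;/p&gt;

```kotlin
// Hypothetical DTOs matching the controller above.
// Field names are inferred from the controller; the metrics type is an assumption.
data class ChatRequest(
    val message: String,  // the user's chat message
    val userId: String    // identifies whose memories and history to use
)

data class ChatResponse(
    val message: String,                        // the assistant's reply text
    val metrics: Map<String, Any> = emptyMap()  // per-message performance metrics
)

data class MessageDto(
    val role: String,     // "system", "user", "assistant", or "unknown"
    val content: String   // the message text
)
```

&lt;p&gt;Kotlin data classes give you &lt;code&gt;equals&lt;/code&gt;/&lt;code&gt;hashCode&lt;/code&gt; and JSON-friendly serialization out of the box, which is all these DTOs need.&lt;/p&gt;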



&lt;h2&gt;
  
  
  Running the Demo
&lt;/h2&gt;

&lt;p&gt;The easiest way to run the demo is with Docker Compose, which sets up all required services in one command.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Clone the repository
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/redis/redis-springboot-recipes.git
&lt;span class="nb"&gt;cd &lt;/span&gt;redis-springboot-recipes/artificial-intelligence/agent-long-term-memory-with-spring-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Configure your environment
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;.env&lt;/code&gt; file with your OpenAI API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;OPENAI_API_KEY=sk-your-api-key
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Start the services
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;--build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redis: for storing both vector embeddings and chat history&lt;/li&gt;
&lt;li&gt;redis-insight: a UI to explore the Redis data&lt;/li&gt;
&lt;li&gt;agent-memory-app: the Spring Boot app that implements the memory-aware AI agent&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Step 4: Use the application
&lt;/h3&gt;

&lt;p&gt;When all services are running, go to &lt;code&gt;localhost:8080&lt;/code&gt; to access the demo. You'll see a travel assistant interface with a chat panel and a memory management sidebar:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjtn2ks4srwwtmu3mme8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftjtn2ks4srwwtmu3mme8.png" alt="Screenshot of the Redis Agent Memory demo web interface. The interface is titled “Travel Agent with Redis Memory” and features two main panels: a “Memory Management” section on the left with tabs for Episodic and Semantic memories (currently showing “No episodic memories yet”), and a “Travel Assistant” chat on the right displaying a welcome message. At the top right, there’s a field to enter a user ID and buttons to start or clear the chat. The interface is clean and styled with Redis branding." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enter a user ID and click "Start Chat":&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc9982peiet2f9hsb0py.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzc9982peiet2f9hsb0py.png" alt="Close-up screenshot of the user ID input and chat controls. The label “User ID:” appears on the left with a text input field containing the value “raphael”. To the right are two red buttons labeled “Start Chat” and “Clear Chat”." width="363" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Send a message like: "Hi, my name's Raphael. I went to Paris back in 2009 with my wife for our honeymoon and we had a lovely time. For our 10-year anniversary we're planning to go back. Help us plan the trip!"&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbavzfshheyvac8o3zs3x.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbavzfshheyvac8o3zs3x.gif" alt="Animated screen recording of a user sending a message in the Redis Agent Memory demo. The user, identified as “raphael”, types a detailed message into the chat input box: “Hi, my name’s Raphael. I went to Paris back in 2009 with my wife for our honeymoon and we had a lovely time. For our 10-year anniversary we’re planning to go back. Help us plan the trip!” The cursor then clicks the red “Send” button, initiating the interaction with the AI travel assistant." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The system will reply to your message and, if it identifies potential memories worth keeping, store them as either semantic or episodic memories. You can see the stored memories in the "Memory Management" sidebar.&lt;/p&gt;

&lt;p&gt;On top of that, with each message, the system will also return performance metrics.&lt;/p&gt;
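&lt;p&gt;You can also exercise the same endpoints directly from the command line. The paths and field names below come from the REST controller shown earlier (&lt;code&gt;message&lt;/code&gt; and &lt;code&gt;userId&lt;/code&gt; from &lt;code&gt;ChatRequest&lt;/code&gt;):&lt;/p&gt;

```shell
# Send a chat message for a given user
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Help me plan a trip to Paris", "userId": "raphael"}'

# Fetch the stored conversation history for that user
curl http://localhost:8080/api/history/raphael

# Clear the short-term conversation history
curl -X DELETE http://localhost:8080/api/history/raphael
```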

&lt;p&gt;If you refresh the page, you will see that all memories and the chat history disappear from the interface.&lt;/p&gt;

&lt;p&gt;If you re-enter the same user ID, the long-term memories will be reloaded in the sidebar and the short-term memory (the chat history) will be restored as well:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi3t9hf3563r6tqnedjl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsi3t9hf3563r6tqnedjl.gif" alt="Animated screen recording of the Redis Agent Memory demo after sending a message. The sidebar under “Episodic Memories” now shows two stored entries: one noting that the user went to Paris in 2009 for their honeymoon, and another about planning a return for their 10-year anniversary. The chat assistant responds with a personalized message suggesting activities and asking follow-up questions. The browser page is then refreshed, clearing both the chat history and memory display. After re-entering the same user ID, the agent reloads the long-term memories in the sidebar and restores the conversation history, demonstrating persistent memory retrieval." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you clear the chat, the short-term conversation history is reset, but the long-term memories remain in the sidebar and can still inform the agent's responses:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eided1rhbxxoes2eaht.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6eided1rhbxxoes2eaht.gif" alt="Animated screen recording of a cleared chat session in the Redis Agent Memory demo. The “Episodic Memories” panel still shows two past memories about a trip to Paris. In the chat panel, the message “Conversation cleared. How can I assist you today?” appears, indicating that the short-term memory has been reset. The user is about to start a new conversation. This demonstrates that although the short-term context is gone, the agent retains access to long-term memories, allowing it to respond with relevant information from past interactions." width="760" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Exploring the Data in Redis Insight
&lt;/h2&gt;

&lt;p&gt;RedisInsight provides a visual interface for exploring the data stored in Redis. Access it at &lt;code&gt;localhost:5540&lt;/code&gt; to see:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Short-term memory (conversation history) stored in Redis Lists&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq3d8c7jub35362acujt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzq3d8c7jub35362acujt.png" alt="Screenshot of RedisInsight displaying the contents of the conversation:raphael key. The selected key is a Redis list representing a conversation history. On the right panel, the list shows four indexed elements: system prompts defining the assistant’s role and memory access, a user message asking “Where did I go back in 2009?”, and the assistant’s reply recalling a previous trip to Paris. Below this, several memory entries stored as JSON keys are also visible. This illustrates how short-term chat history is preserved in Redis and replayed per user session." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Long-term memory (facts and experiences) stored as JSON documents with vector embeddings&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5e9x8q1d0pb2mizac9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5e9x8q1d0pb2mizac9k.png" alt="Screenshot of RedisInsight showing a semantic memory stored in Redis. The selected key is a JSON object with the name memory:04d04.... The right panel displays the memory’s fields: createdAt timestamp, empty metadata, memoryType set to “SEMANTIC”, an embedding vector (collapsed), userId set to “system”, and the memory content: “Paris is a beautiful city known for celebrating love”. This illustrates how general knowledge is stored as semantic memory in the AI agent." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The vector index schema used for similarity search&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you run the &lt;code&gt;FT.INFO longTermMemoryIdx&lt;/code&gt; command in the RedisInsight workbench, you'll see the details of the vector index schema that enables efficient memory retrieval.&lt;/p&gt;
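&lt;p&gt;With that schema in place, you can also query the index yourself from the workbench. Below is a sketch of the kind of hybrid query such an application might run: a tag filter on &lt;code&gt;userId&lt;/code&gt; combined with a KNN vector search. The query vector would normally be supplied as a 384-dimension FLOAT32 binary blob by the client library, so the &lt;code&gt;$vec&lt;/code&gt; parameter below is a placeholder:&lt;/p&gt;

```
FT.SEARCH longTermMemoryIdx
  "(@userId:{raphael})=>[KNN 5 @embedding $vec AS score]"
  PARAMS 2 vec "<384-dim FLOAT32 blob>"
  SORTBY score
  RETURN 3 content memoryType score
  DIALECT 2
```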

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hwu6hquvpfik5lhrs31.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0hwu6hquvpfik5lhrs31.png" alt="Screenshot of RedisInsight Workbench showing the schema details of the longTermMemoryIdx vector index. The result of the FT.INFO longTermMemoryIdx command displays an index on JSON documents prefixed with memory:. The schema includes: •    $.content as a TEXT field named content  •    $.embedding as a VECTOR field using HNSW with 384-dimension FLOAT32 vectors and COSINE distance  •    $.memoryType and $.userId as TAG fields  •    $.metadata and $.createdAt as TEXT fields  This shows how memory data is structured and searchable in Redis using RediSearch vector similarity." width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;And that's it — you now have a working AI agent with memory using Spring Boot and Redis.&lt;/p&gt;

&lt;p&gt;Instead of forgetting everything between conversations, your agent can now remember user preferences, past experiences, and important facts. Redis handles both short-term memory (conversation history) and long-term memory (vector embeddings) — all with the performance and scalability Redis is known for.&lt;/p&gt;

&lt;p&gt;With Spring AI and Redis, you get an easy way to integrate this into your Java applications. The combination of vector similarity search for semantic retrieval and traditional data structures for conversation history gives you a powerful foundation for building truly intelligent agents.&lt;/p&gt;

&lt;p&gt;Whether you're building customer service bots, personal assistants, or domain-specific experts, this memory architecture gives you the tools to create more helpful, personalized, and context-aware AI experiences.&lt;/p&gt;

&lt;p&gt;Try it out, experiment with different memory types, explore other embedding models, and see how far you can push the boundaries of AI agent capabilities!&lt;/p&gt;

&lt;p&gt;Stay Curious!&lt;/p&gt;

</description>
      <category>springboot</category>
      <category>ai</category>
      <category>redis</category>
      <category>vectordatabase</category>
    </item>
    <item>
      <title>How I Improved Zero-Shot Classification in Deep Java Library (DJL) OSS</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Sun, 15 Jun 2025 17:15:42 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/how-i-improved-zero-shot-classification-in-deep-java-library-djl-oss-1ni0</link>
      <guid>https://dev.to/raphaeldelio/how-i-improved-zero-shot-classification-in-deep-java-library-djl-oss-1ni0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Did you know the Deep Java Library (DJL) powers Spring AI and Redis OM Spring? DJL helps you run machine learning models right inside your Java applications. &lt;/p&gt;

&lt;p&gt;Check them out:&lt;br&gt;
Spring AI with DJL: &lt;a href="https://docs.spring.io/spring-ai/reference/api/embeddings/onnx.html" rel="noopener noreferrer"&gt;https://docs.spring.io/spring-ai/reference/api/embeddings/onnx.html&lt;/a&gt;&lt;br&gt;
Semantic Search with SpringBoot &amp;amp; Redis: &lt;a href="https://foojay.io/today/semantic-search-with-spring-boot-redis/" rel="noopener noreferrer"&gt;https://foojay.io/today/semantic-search-with-spring-boot-redis/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;TL;DR:&lt;/strong&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;You’re doing zero-shot classification in a Java app using DJL.&lt;/li&gt;
&lt;li&gt;DJL didn’t handle some models well — like DeBERTa. It missed support for token_type_ids, assumed wrong label positions, and oversimplified the softmax implementation.&lt;/li&gt;
&lt;li&gt;It was fixed by reading the model config files and adjusting DJL's translator logic.&lt;/li&gt;
&lt;li&gt;Now DJL gives correct results across different models — just like the Transformers library does in Python.&lt;/li&gt;
&lt;li&gt;The fix is merged and will probably be released with version 0.34.0.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📚 Index
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduction: What is Zero-Shot Classification?&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Integrating a Zero-Shot Classification Model with the Deep Java Library&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Problem #1: No support for token_type_ids&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Problem #2: Hard coded logit positions and wrong softmax implementation&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Contributing to the Deep Java Library&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Final Words&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What’s Zero-Shot Classification (and Why It Matters)
&lt;/h2&gt;

&lt;p&gt;Zero-shot classification is a machine learning technique that allows models to classify text into categories they haven’t explicitly seen during training. Unlike traditional classification models that can only predict classes they were trained on, zero-shot classifiers can generalize to new, unseen categories.&lt;/p&gt;

&lt;p&gt;One example of a zero-shot classification model is &lt;code&gt;MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli&lt;/code&gt;. Like many other models for this task, it works by comparing a sentence (the premise) to different hypotheses (the labels) and scoring how likely each one is to be true.&lt;/p&gt;

&lt;p&gt;For example, we can compare “Java is a great programming language”, the premise, to “Software Engineering, Software Programming, and Politics”, the hypotheses. In this case, the model will return:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Software Programming: 0.984
Software Engineering: 0.015
Politics: 0.001
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This means “Software Programming” is the hypothesis that best classifies the premise.&lt;/p&gt;

&lt;p&gt;In this example, we’re comparing the premise to all hypotheses at once, but we could also score each hypothesis independently by enabling the “multi_label” option. In this case, it will return:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Software Programming: 0.998
Software Engineering: 0.668
Politics: 0.000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a higher score for “Software Engineering” and an even lower score for “Politics.”&lt;/p&gt;
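&lt;p&gt;The difference between the two modes comes down to how the softmax is applied: in single-label mode, the entailment logits of all hypotheses are normalized together in one softmax (so the scores sum to 1), while in multi-label mode each hypothesis's entailment and contradiction logits are softmaxed independently (so the scores don't have to sum to 1). Here's a small numeric sketch, with made-up logit values:&lt;/p&gt;

```kotlin
import kotlin.math.exp

// Softmax over an arbitrary list of logits.
fun softmax(logits: List<Double>): List<Double> {
    val exps = logits.map { exp(it) }
    val sum = exps.sum()
    return exps.map { it / sum }
}

fun main() {
    // Made-up entailment logits for three hypotheses (not real model outputs).
    val entailment = mapOf(
        "Software Programming" to 5.1,
        "Software Engineering" to 0.9,
        "Politics" to -1.8
    )

    // Single-label mode: one softmax across all hypotheses; scores sum to 1.
    val single = softmax(entailment.values.toList())
    entailment.keys.zip(single).forEach { (label, score) ->
        println("single  %-22s %.3f".format(label, score))
    }

    // Multi-label mode: per hypothesis, softmax over its own
    // (contradiction, entailment) pair; made-up contradiction logits.
    val contradiction = mapOf(
        "Software Programming" to -4.0,
        "Software Engineering" to 0.2,
        "Politics" to 6.0
    )
    entailment.forEach { (label, ent) ->
        val score = softmax(listOf(contradiction.getValue(label), ent))[1]
        println("multi   %-22s %.3f".format(label, score))
    }
}
```

&lt;p&gt;This mirrors how NLI-based zero-shot pipelines behave: multi-label mode answers "is this label entailed?" for each label on its own, which is why "Software Engineering" can still score high even when "Software Programming" scores higher.&lt;/p&gt;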

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0eh1wxby225hn7xsjoso.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0eh1wxby225hn7xsjoso.gif" width="760" height="646"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can easily try it out at: &lt;a href="https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli?candidate_labels=Software+Engineering%2C+Software+Programming%2C+Politics&amp;amp;multi_class=true&amp;amp;text=Java+is+a+great+programming+language" rel="noopener noreferrer"&gt;https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Integrating a Zero-Shot Classification Model with the Deep Java Library
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6x56bnythsqpuyrlde5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr6x56bnythsqpuyrlde5.png" width="800" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Deep Java Library (DJL) is an open-source library that makes it easier to work with machine learning models in Java.&lt;/strong&gt; It lets you run models locally, in-process, inside your Java application. It supports many engines (like PyTorch and TensorFlow), and it can load models directly from Hugging Face or from disk.&lt;/p&gt;

&lt;p&gt;A cool thing about this library is that &lt;strong&gt;it hosts a collection of pre-trained models in its model zoo.&lt;/strong&gt; Those models are ready to use for common tasks like image classification, object detection, text classification, and more. They are curated and maintained by the DJL team to ensure they work out of the box with DJL’s APIs and that developers can load these models easily using a simple criteria-based API.&lt;/p&gt;

&lt;p&gt;One example is this zero-shot classification model developed by Facebook: &lt;code&gt;facebook/bart-large-mnli&lt;/code&gt;. This model is hosted by DJL in its model zoo and can easily be reached at the following URI: &lt;code&gt;djl://ai.djl.huggingface.pytorch/facebook/bart-large-mnli&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Let’s see how we can easily load it into our Java application and use it to classify text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependencies
&lt;/h3&gt;

&lt;p&gt;The dependencies we’re going to use are:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight kotlin"&gt;&lt;code&gt;&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ai.djl.huggingface:tokenizers:0.32.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ai.djl.pytorch:pytorch-engine:0.32.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;implementation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"ai.djl:model-zoo:0.32.0"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Criteria Class
&lt;/h3&gt;

&lt;p&gt;The Criteria class in DJL is a builder-style utility that tells DJL &lt;strong&gt;how to load and use a model&lt;/strong&gt;. It defines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Input and output types&lt;/strong&gt; (e.g., ZeroShotClassificationInput, ZeroShotClassificationOutput)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Where to get the model from&lt;/strong&gt; (like a URL or model zoo ID)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Which engine to use&lt;/strong&gt; (e.g., PyTorch, TensorFlow, ONNX)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Extra arguments&lt;/strong&gt; (like tokenizer ID, batch size, device)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Custom logic&lt;/strong&gt;, like a translator to convert between raw inputs/outputs and tensors&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;modelUrl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"djl://ai.djl.huggingface.pytorch/facebook/bart-large-mnli"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nc"&gt;Criteria&lt;/span&gt; &lt;span class="n"&gt;criteria&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Criteria&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optModelUrls&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;modelUrl&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optEngine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PyTorch"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setTypes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationOutput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optTranslatorFactory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationTranslatorFactory&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When building a Criteria in DJL, we need to pick an engine that matches what the model was trained with. Most Hugging Face models use PyTorch. We also have to define the input and output types the model expects. &lt;strong&gt;For zero-shot classification, DJL gives us ready-to-use classes:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ZeroShotClassificationInput&lt;/strong&gt;: lets us set the text (premise), candidate labels (hypotheses), whether it’s multi-label, and a hypothesis template;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ZeroShotClassificationOutput&lt;/strong&gt;: returns the labels with their confidence scores.&lt;/p&gt;

&lt;p&gt;Under the hood, machine learning models work with tensors that are basically arrays of numbers. &lt;strong&gt;To go from readable input to tensors and then back from model output to readable results, DJL uses a Translator.&lt;/strong&gt; The &lt;strong&gt;ZeroShotClassificationTranslatorFactory&lt;/strong&gt; creates a translator that knows how to tokenize the input text and how to turn raw model outputs (logits) into useful scores.&lt;/p&gt;

&lt;h3&gt;
  
  
  Loading and using the model
&lt;/h3&gt;

&lt;p&gt;Loading the model is easy — you just call ModelZoo.loadModel(criteria). The criteria tells DJL what kind of model you’re looking for, like the engine (PyTorch), input/output types, and where to find it. Once the model is loaded, we get a Predictor from it. That’s what we use to actually run the predictions.&lt;/p&gt;

&lt;p&gt;Next, we prepare the input. In this example, we’re checking how related the sentence &lt;em&gt;“Java is the best programming language”&lt;/em&gt; is to a few labels like &lt;em&gt;“Software Engineering”&lt;/em&gt;, &lt;em&gt;“Software Programming”&lt;/em&gt;, and &lt;em&gt;“Politics”&lt;/em&gt;. Since a sentence can relate to more than one label, we set multiLabel to true.&lt;/p&gt;

&lt;p&gt;Then, we run the prediction and check the result that contains the labels and their scores. Basically, how likely it is that the input belongs to each category.&lt;/p&gt;

&lt;p&gt;Finally, we loop over the results and print each label with its score. Once we’re done, we clean up by closing the predictor and model, which is always a good practice to free up resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load the model&lt;/span&gt;
&lt;span class="nc"&gt;Model&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelZoo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;loadModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;criteria&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Predictor&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newPredictor&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Create the input&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;inputText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Java is the best programming language"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;candidateLabels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Software Engineering"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Software Programming"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Politics"&lt;/span&gt;&lt;span class="o"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;multiLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidateLabels&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;multiLabel&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Perform the prediction&lt;/span&gt;
&lt;span class="nc"&gt;ZeroShotClassificationOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;predict&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Print results&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\nClassification results:"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getLabels&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getScores&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;length&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;labels&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="s"&gt;": "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;]);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Clean up resources&lt;/span&gt;
&lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By running the code above, we should see the following output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Classification results:
Software Programming: 0.82975172996521
Software Engineering: 0.15263372659683228
Politics: 0.017614541575312614
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This has been easy so far. But what if you want to use a different model?&lt;/p&gt;

&lt;h2&gt;
  
  
  Using different models
&lt;/h2&gt;

&lt;p&gt;If you want to use a different model, you have two options: pick one that’s hosted by DJL or load one directly from Hugging Face. To see all the models DJL hosts, just run the code below; it will list all the available models.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Create an empty criteria to fetch all available models&lt;/span&gt;
&lt;span class="nc"&gt;Criteria&lt;/span&gt; &lt;span class="n"&gt;criteria&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Criteria&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// List available model names&lt;/span&gt;
&lt;span class="nc"&gt;Set&lt;/span&gt; &lt;span class="n"&gt;modelNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelZoo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;listModels&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;criteria&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Available models from DJL:"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;modelNames&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"- "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output multiple models along with their respective URIs, which you can simply swap into the criteria we implemented earlier in this tutorial. It should just work.&lt;/p&gt;

&lt;p&gt;However, if you want to use a model that is not available in the Model Zoo, you will have to not only download it from Hugging Face, but also convert it to a format that is compatible with DJL.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using a model that is not available in the Model Zoo
&lt;/h2&gt;

&lt;p&gt;The model I want to use is the one I introduced in the beginning of this article: MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli. It’s not available in the Model Zoo, so we will need to perform a few extra steps to make it compatible with DJL.&lt;/p&gt;

&lt;p&gt;Hugging Face models are made for Python. So, we need to convert them before using them with DJL.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To bridge this gap, DJL provides a tool called djl-convert that transforms these models into a format that works in Java&lt;/strong&gt;, removing Python-specific dependencies to make them ready for efficient inference with DJL.&lt;/p&gt;

&lt;p&gt;To install djl-convert, you can run the following commands in your terminal: (All details here)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;    &lt;span class="c"&gt;# install release version of djl-converter&lt;/span&gt;
    pip &lt;span class="nb"&gt;install &lt;/span&gt;https://publish.djl.ai/djl_converter/djl_converter-0.30.0-py3-none-any.whl
    &lt;span class="c"&gt;# install from djl master branch&lt;/span&gt;
    pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="s2"&gt;"git+https://github.com/deepjavalibrary/djl.git#subdirectory=extensions/tokenizers/src/main/python"&lt;/span&gt;
    &lt;span class="c"&gt;# install djl-convert from local djl repo&lt;/span&gt;
    git clone https://github.com/deepjavalibrary/djl.git
    &lt;span class="nb"&gt;cd &lt;/span&gt;djl/extensions/tokenizers/src/main/python
    python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
    &lt;span class="c"&gt;# Add djl-convert to PATH (if installed locally or not globally available)&lt;/span&gt;
    &lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$HOME&lt;/span&gt;&lt;span class="s2"&gt;/.local/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="c"&gt;# install optimum if you want to convert to OnnxRuntime&lt;/span&gt;
    pip &lt;span class="nb"&gt;install &lt;/span&gt;optimum
    &lt;span class="c"&gt;# convert a single model to TorchScript, Onnxruntime or Rust&lt;/span&gt;
    djl-convert &lt;span class="nt"&gt;--help&lt;/span&gt;
    &lt;span class="c"&gt;# import models as DJL Model Zoo&lt;/span&gt;
    djl-import &lt;span class="nt"&gt;--help&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, you can run the following command to convert the model to a format DJL can understand:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;djl-convert -m MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This will store the converted model under the folder &lt;code&gt;model/DeBERTa-v3-large-mnli-fever-anli-ling-wanli&lt;/code&gt; in the working directory.&lt;/p&gt;

&lt;p&gt;Now we’re ready to go back to our Java application.&lt;/p&gt;

&lt;h3&gt;
  
  
  Loading a local model with DJL
&lt;/h3&gt;

&lt;p&gt;Loading a local model is also straightforward. Instead of loading it from the DJL URL, you’re going to load it from the directory that was created during the conversion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;Criteria&lt;/span&gt; &lt;span class="n"&gt;criteria&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Criteria&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optModelPath&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Paths&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"model/DeBERTa-v3-large-mnli-fever-anli-ling-wanli"&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optEngine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PyTorch"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setTypes&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationOutput&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;optTranslatorFactory&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationTranslatorFactory&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
                &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running it should be as straightforward as before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Load the model&lt;/span&gt;
&lt;span class="nc"&gt;Model&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ModelZoo&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;loadModel&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;criteria&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;Predictor&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newPredictor&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="c1"&gt;// Create the input&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;inputText&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Java is the best programming language"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;candidateLabels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s"&gt;"Software Engineering"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Software Programming"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Politics"&lt;/span&gt;&lt;span class="o"&gt;};&lt;/span&gt;
&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;multiLabel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationInput&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputText&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;candidateLabels&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;multiLabel&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Perform the prediction&lt;/span&gt;
&lt;span class="nc"&gt;ZeroShotClassificationOutput&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;predictor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;predict&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Print results&lt;/span&gt;
&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"\nClassification results:"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;labels&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getLabels&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="kt"&gt;double&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;scores&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getScores&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  &lt;span class="nf"&gt;Dict&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;str&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Tensor&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Problem #1: No support for token_type_ids
&lt;/h2&gt;

&lt;p&gt;Not every Zero-Shot Classification Model is the same, and one thing that sets them apart is whether they use &lt;strong&gt;token type IDs&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Token type IDs are just extra markers that tell the model where one part of the input ends and the other begins, like separating the main sentence from the label it’s being compared to.&lt;/p&gt;

&lt;p&gt;Some models, like BERT or DeBERTa, were trained to expect these markers, so they need them to work properly. Others, like RoBERTa or BART, were trained without them and just ignore that input.&lt;/p&gt;
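&lt;p&gt;To make this concrete, here’s a small illustration (my own sketch, not DJL’s actual tokenizer output) of how token type IDs mark which segment each token belongs to in a premise/hypothesis pair:&lt;/p&gt;

```java
// Illustration only (not DJL's tokenizer output): token type IDs mark which
// segment each token belongs to: 0 for the premise, 1 for the hypothesis.
public class TokenTypeIdsDemo {

    // Layout assumed here: [CLS] premise [SEP] hypothesis [SEP]
    static int[] tokenTypeIds(int premiseTokens, int hypothesisTokens) {
        int[] ids = new int[premiseTokens + hypothesisTokens + 3];
        // Everything after the premise's [SEP] belongs to segment 1
        for (int i = premiseTokens + 2; i < ids.length; i++) {
            ids[i] = 1;
        }
        return ids;
    }

    public static void main(String[] args) {
        // premise = 3 tokens, hypothesis = 5 tokens
        System.out.println(java.util.Arrays.toString(tokenTypeIds(3, 5)));
        // prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
    }
}
```

&lt;p&gt;Pair-trained models like BERT and DeBERTa rely on this segment array; models like RoBERTa and BART simply don’t take it as input.&lt;/p&gt;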

&lt;p&gt;And well, DJL’s ZeroShotClassificationTranslator had been implemented and tested with a BART model, which didn’t require token_type_ids to work properly.&lt;/p&gt;

&lt;p&gt;By digging into the implementation of ZeroShotClassificationTranslator, I was able to see that token_type_ids were actually supported by DJL; the flag was simply hardcoded in the Translator, so we couldn’t set it even if we initialized the Translator with its Builder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Line 85 of ZeroShotClassificationTranslator: https://github.com/deepjavalibrary/djl/blob/fe8103c7498f23e209adc435410d9f3731f8dd65/extensions/tokenizers/src/main/java/ai/djl/huggingface/translator/ZeroShotClassificationTranslator.java&lt;/span&gt;
&lt;span class="c1"&gt;// Token Type Ids is hardcoded to false&lt;/span&gt;
&lt;span class="nc"&gt;NDList&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;toNDList&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;manager&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;int32&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I fixed this by adding a method to the Translator’s Builder that sets the tokenTypeId property during initialization, and refactored the class to honor it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ZeroShotClassificationTranslator&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="nf"&gt;optTokenTypeId&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;withTokenType&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;tokenTypeId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;withTokenType&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And even though the change worked, I was surprised to find that the scores it output were way off from the ones I expected.&lt;/p&gt;

&lt;p&gt;While Python’s Transformers library produced the following correct results:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Software Programming: 0.9982864856719971
Software Engineering: 0.7510316371917725
Politics: 0.00020543287973850965
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The Deep Java Library was outputting completely wrong scores:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Politics: 0.9988358616828918
Software Engineering: 0.0009450475918129086
Software Programming: 0.00021904722962062806
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You can see that the scores were so wrong that it actually output that Politics was the label that best fit our premise: “Java is the best programming language.”&lt;/p&gt;

&lt;p&gt;What’s going on here?&lt;/p&gt;

&lt;h2&gt;
  
  
  Problem #2: Hardcoded logit positions and oversimplified softmax implementation
&lt;/h2&gt;

&lt;p&gt;To understand what’s going on, we also need to understand how Zero-Shot Classification models work. These models aren’t trained to classify things directly. Instead, they take two sentences, the input and the label as a hypothesis, and decide how they relate.&lt;/p&gt;

&lt;p&gt;They return logits: raw scores for each label like “entailment”, “contradiction”, or “neutral”. These logits are just numbers. To make them readable, we apply softmax, which turns them into probabilities between 0 and 1.&lt;/p&gt;
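&lt;p&gt;To make the logits-to-probabilities step concrete, here’s a minimal softmax in plain Java (my own illustration, not DJL’s implementation):&lt;/p&gt;

```java
// Minimal softmax sketch (not DJL's implementation): turns raw logits into
// probabilities that are all positive and sum to 1.
public class SoftmaxDemo {

    static double[] softmax(double[] logits) {
        // Subtract the max logit first for numerical stability
        double max = Double.NEGATIVE_INFINITY;
        for (double l : logits) max = Math.max(max, l);

        double sum = 0.0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        // e.g. raw NLI logits for [contradiction, neutral, entailment]
        double[] probs = softmax(new double[]{-2.1, 0.3, 3.8});
        System.out.println(java.util.Arrays.toString(probs));
    }
}
```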

&lt;p&gt;DJL’s original implementation didn’t handle this properly. It grabbed the last logit from each label’s output, assuming it was the “entailment” score. Then, it normalized those scores across all labels.&lt;/p&gt;

&lt;p&gt;This approach ignored the fact that each label is its own comparison: each one is a separate classification task, so softmax must be applied within each label, not across all labels.&lt;/p&gt;

&lt;p&gt;Also, not all models use the same order for their logits. We can’t assume “entailment” is always the last. To know the correct position, we should read the model’s config.json and check the label2id field.&lt;/p&gt;

&lt;p&gt;This mapping shows which index belongs to each class. Using it, we can apply softmax to the correct pair, usually “entailment” and “contradiction,” for each label.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Check an example of a config.json file here: &lt;a href="https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli/blob/main/config.json" rel="noopener noreferrer"&gt;https://huggingface.co/MoritzLaurer/DeBERTa-v3-large-mnli-fever-anli-ling-wanli/blob/main/config.json&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Therefore, I not only had to fix the way softmax was applied, but also make sure we were using the correct index for the entailment score — based on what the model actually defines in its config. That meant reading the label2id mapping from config.json, identifying which index corresponds to “&lt;em&gt;entailment&lt;/em&gt;” and “&lt;em&gt;contradiction&lt;/em&gt;”, and then applying softmax to just those two values for each label.&lt;/p&gt;
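&lt;p&gt;As an illustration of the corrected logic (the label2id indices and logit values below are hypothetical; a real implementation reads the indices from the model’s config.json), the per-label score is a softmax over just the entailment/contradiction pair:&lt;/p&gt;

```java
// Sketch of per-label scoring. The indices are assumptions for illustration;
// a real implementation reads label2id from the model's config.json. For each
// candidate label the model produces one NLI logit vector, and the label's
// score is the softmax of its (entailment, contradiction) pair, taken within
// that label only, never across labels.
public class EntailmentScoreDemo {

    static double entailmentScore(double[] logits, int entailmentIdx, int contradictionIdx) {
        double e = logits[entailmentIdx];
        double c = logits[contradictionIdx];
        double max = Math.max(e, c);             // stabilize before exponentiating
        double expE = Math.exp(e - max);
        double expC = Math.exp(c - max);
        return expE / (expE + expC);
    }

    public static void main(String[] args) {
        // Hypothetical label2id: {"entailment": 0, "neutral": 1, "contradiction": 2}
        int entailmentIdx = 0, contradictionIdx = 2;
        double[] programmingLogits = {4.2, 0.1, -3.5};  // one logit vector per candidate label
        double[] politicsLogits = {-3.9, 0.2, 4.0};
        System.out.printf("Software Programming: %.4f%n",
                entailmentScore(programmingLogits, entailmentIdx, contradictionIdx));
        System.out.printf("Politics: %.4f%n",
                entailmentScore(politicsLogits, entailmentIdx, contradictionIdx));
    }
}
```

&lt;p&gt;Because each label is scored independently, the results can legitimately sum to more than 1 in multi-label mode.&lt;/p&gt;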

&lt;p&gt;After refactoring the softmax logic, the translator started outputting the expected results. To test it with different types of models, I created a GitHub repository comparing the expected results from Python’s Transformers Library with the refactored ZeroShotClassificationTranslator. &lt;/p&gt;

&lt;p&gt;You can check it out at: &lt;a href="https://github.com/raphaeldelio/deep-java-library-zero-shot-classification-comparison-to-python/" rel="noopener noreferrer"&gt;https://github.com/raphaeldelio/deep-java-library-zero-shot-classification-comparison-to-python/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Contributing to the Deep Java Library
&lt;/h2&gt;

&lt;p&gt;After I had tested and made sure the translator was working as expected, it was time to contribute back to the library. I opened a pull request to the DJL repository with the changes I had made. The maintainer was super responsive and helped me refactor my changes to follow the guidelines of the project, and after a few tweaks, the changes were approved and merged.&lt;/p&gt;

&lt;p&gt;You can find the PR here: &lt;a href="https://github.com/deepjavalibrary/djl/pull/3712" rel="noopener noreferrer"&gt;https://github.com/deepjavalibrary/djl/pull/3712&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Words
&lt;/h2&gt;

&lt;p&gt;If you’re a Java developer working with AI, I really encourage you to check out the &lt;a href="https://github.com/deepjavalibrary/djl" rel="noopener noreferrer"&gt;Deep Java Library&lt;/a&gt;, the &lt;a href="https://docs.spring.io/spring-ai/" rel="noopener noreferrer"&gt;Spring AI&lt;/a&gt;, and the &lt;a href="https://github.com/redis/redis-om-spring" rel="noopener noreferrer"&gt;Redis OM Spring&lt;/a&gt; projects, which build on top of it.&lt;/p&gt;

&lt;p&gt;Thank you for following along! &lt;/p&gt;

&lt;h3&gt;
  
  
  Stay Curious
&lt;/h3&gt;

</description>
      <category>java</category>
      <category>machinelearning</category>
      <category>ai</category>
      <category>huggingfacetransformers</category>
    </item>
    <item>
      <title>How to send prompts in bulk with Spring AI and Java Virtual Threads</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Tue, 13 May 2025 08:40:00 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/how-to-send-prompts-in-bulk-with-spring-ai-and-virtual-threads-30f7</link>
      <guid>https://dev.to/raphaeldelio/how-to-send-prompts-in-bulk-with-spring-ai-and-virtual-threads-30f7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftlib40kdyzhyjkr0f4r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fftlib40kdyzhyjkr0f4r.png" alt=" " width="800" height="310"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;TL;DR: You’re building an AI-powered app that needs to send lots of prompts to OpenAI.&lt;br&gt;
 Instead of sending them one by one, you want to do it in bulk — efficiently and safely.&lt;br&gt;
 This is how you can use Spring AI with Java Virtual Threads to process hundreds of prompts in parallel.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When calling LLM APIs like OpenAI, you’re dealing with a high-latency, network-bound task. Normally, doing that in a loop slows you down and blocks threads. But with Spring AI and Java 21 Virtual Threads, you can fire off hundreds of requests in parallel without killing your app.&lt;/p&gt;

&lt;p&gt;This is particularly useful when you want the LLM to perform actions such as summarizing or extracting information from lots of documents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Here’s the flow:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Get your list of text inputs.&lt;/li&gt;
&lt;li&gt;Filter the ones that need processing.&lt;/li&gt;
&lt;li&gt;Split them into batches.&lt;/li&gt;
&lt;li&gt;For each batch:
— Use Virtual Threads to make OpenAI calls in parallel
— Wait for all calls to finish (using CompletableFuture)
— Save the results&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Virtual Threads for Massive Parallelism
&lt;/h3&gt;

&lt;p&gt;Java Virtual Threads are perfect for this. They’re lightweight, run on the JVM, and don’t block OS threads. Ideal for I/O-heavy operations like talking to APIs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ExecutorService executorService = Executors.newVirtualThreadPerTaskExecutor()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each OpenAI request runs in its own thread, but without the overhead of platform (OS) threads.&lt;/p&gt;
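
&lt;p&gt;To see the effect in isolation, here’s a tiny standalone sketch (not from the article’s codebase; assumes Java 21+). It starts 200 virtual threads that each sleep 100 ms to simulate a network call, and the whole run still finishes in roughly 100 ms because the sleeps overlap:&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

public class VirtualThreadsDemo {

    // Launch n virtual threads, each simulating a slow I/O call, and wait for all of them.
    static int runAll(int n) {
        AtomicInteger completed = new AtomicInteger();
        Thread[] workers = IntStream.range(0, n)
                .mapToObj(i -> Thread.ofVirtual().start(() -> {
                    try {
                        Thread.sleep(100); // stand-in for a network call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                }))
                .toArray(Thread[]::new);
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runAll(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in about " + elapsedMs + " ms");
    }
}
```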

&lt;h3&gt;
  
  
  Spring AI Prompt Call
&lt;/h3&gt;

&lt;p&gt;You create a Prompt, then send it to the model:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ChatResponse response = chatModel.call(
  new Prompt(List.of(
    new SystemMessage("You are a helpful assistant…"),
    new UserMessage(userInput)
  ))
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You get back a structured response. From there, you just extract the output:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;String summary = response.getResult().getOutput().getText();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Processing in Batches
&lt;/h3&gt;

&lt;p&gt;Sending all prompts at once isn’t a good idea (rate limits, reliability, memory). Instead, chunk them into smaller batches (e.g., 300 items):&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;int batchSize = 300;
int totalBatches = (inputs.size() + batchSize - 1) / batchSize;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
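
&lt;p&gt;That expression is just integer ceiling division: for example, 750 inputs with a batch size of 300 yield 3 batches (two full ones and one of 150). A quick standalone check, with hypothetical input counts:&lt;/p&gt;

```java
public class BatchMath {

    // Ceiling division: the number of batches needed to cover n items.
    static int totalBatches(int n, int batchSize) {
        return (n + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(totalBatches(750, 300)); // 3
        System.out.println(totalBatches(600, 300)); // 2
        System.out.println(totalBatches(1, 300));   // 1
    }
}
```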

&lt;p&gt;For each batch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Launch a CompletableFuture for every input&lt;/li&gt;
&lt;li&gt;Wait for all with CompletableFuture.allOf(…).join()&lt;/li&gt;
&lt;li&gt;Collect the results&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Handling Errors Gracefully
&lt;/h3&gt;

&lt;p&gt;Each task is wrapped in a try/catch block. So if one OpenAI call fails, it doesn’t crash the batch. You just skip that result.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.map(input -&amp;gt; CompletableFuture.supplyAsync(() -&amp;gt; {
  try {
    ChatResponse r = chatModel.call(…);
    return r.getResult().getOutput().getText();
  } catch (Exception e) {
    return null;
  }
}))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Process Results in Bulk
&lt;/h3&gt;

&lt;p&gt;After processing each batch:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Filter out the failed ones&lt;/li&gt;
&lt;li&gt;Process the valid results&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;List&amp;lt;TextData&amp;gt; processed = futures.stream()
    .map(CompletableFuture::join)
    .filter(Objects::nonNull)
    .toList();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Full Implementation
&lt;/h3&gt;

&lt;p&gt;In this example, we take a list of texts and send them to OpenAI in batches to get summaries. We do that in parallel, which makes the process much faster. After getting the summaries, we save the results. Everything runs in a way that handles errors and avoids overloading the system.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
public class BulkSummarizationService {

    private static final Logger logger = LoggerFactory.getLogger(BulkSummarizationService.class);
    private final ChatClient chatClient;
    private final TextRepository textRepository;

    public BulkSummarizationService(ChatClient chatClient, TextRepository textRepository) {
        this.chatClient = chatClient;
        this.textRepository = textRepository;
    }

    public void summarizeTexts(boolean overwrite) {
        logger.info("Starting bulk summarization");
        List&amp;lt;TextData&amp;gt; textsToSummarize = textRepository.findAll();
        logger.info("Found {} texts to summarize", textsToSummarize.size());

        if (textsToSummarize.isEmpty()) return;

        int batchSize = 300;
        int totalBatches = (textsToSummarize.size() + batchSize - 1) / batchSize;

        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i &amp;lt; totalBatches; i++) {
                int start = i * batchSize;
                int end = Math.min(start + batchSize, textsToSummarize.size());
                List&amp;lt;TextData&amp;gt; batch = textsToSummarize.subList(start, end);

                logger.info("Processing batch {} of {} ({} items)", i + 1, totalBatches, batch.size());

                List&amp;lt;CompletableFuture&amp;lt;TextData&amp;gt;&amp;gt; futures = batch.stream()
                        .map(text -&amp;gt; CompletableFuture.supplyAsync(() -&amp;gt; {
                            try {
                                ChatResponse response = chatClient.call(
                                        new Prompt(List.of(
                                                new SystemMessage("""
                                                    You are a helpful assistant that summarizes long pieces of text.
                                                    Focus on keeping the summary dense and informative.
                                                    Limit to 512 words.
                                                """),
                                                new UserMessage(text.getContent())
                                        ))
                                );
                                text.setSummary(response.getResult().getOutput().getText());
                                return text;
                            } catch (Exception e) {
                                logger.error("Failed to summarize text with ID: {}", text.getId(), e);
                                return null;
                            }
                        }, executor))
                        .toList();

                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

                List&amp;lt;TextData&amp;gt; summarized = futures.stream()
                        .map(CompletableFuture::join)
                        .filter(Objects::nonNull)
                        .toList();

                if (!summarized.isEmpty()) {
                    textRepository.saveAll(summarized);
                    logger.info("Saved {} summaries", summarized.size());
                }
            }
        }

        logger.info("Bulk summarization complete");
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And that’s it! You now have a fully async, high-throughput pipeline that can send hundreds of prompts to OpenAI — safely and efficiently — using nothing but Spring AI, Java Virtual Threads, and good batching.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stay curious!
&lt;/h3&gt;

</description>
      <category>springboot</category>
      <category>java</category>
      <category>ai</category>
      <category>jvm</category>
    </item>
    <item>
      <title>Semantic Search with Spring Boot &amp; Redis</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Tue, 29 Apr 2025 08:48:59 +0000</pubDate>
      <link>https://dev.to/redis/semantic-search-with-spring-boot-redis-48l0</link>
      <guid>https://dev.to/redis/semantic-search-with-spring-boot-redis-48l0</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;TL;DR:&lt;/strong&gt;&lt;br&gt;
You’re building a &lt;strong&gt;semantic search app&lt;/strong&gt; using &lt;strong&gt;Spring Boot&lt;/strong&gt; and &lt;strong&gt;Redis&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of matching exact words, semantic search finds &lt;strong&gt;meaning&lt;/strong&gt; using &lt;strong&gt;Vector Similarity Search (VSS)&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It works by turning movie synopses into &lt;strong&gt;vectors&lt;/strong&gt; with &lt;strong&gt;embedding models&lt;/strong&gt;, storing them in &lt;strong&gt;Redis&lt;/strong&gt; (as a vector database), and finding the closest matches to user queries.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9xqm8mh0qh137eo9uab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx9xqm8mh0qh137eo9uab.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=o3XN4dImESE" rel="noopener noreferrer"&gt;What is semantic search?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A traditional search system works by matching the words a user types with the words stored in a database or document collection. It usually looks for exact or partial matches without understanding the meaning behind the words.&lt;/p&gt;

&lt;p&gt;Semantic searching, on the other hand, tries to understand the meaning behind what the user is asking. &lt;strong&gt;It focuses on the concepts, not just the keywords,&lt;/strong&gt; making it much easier for users to find what they really want.&lt;/p&gt;

&lt;p&gt;In a movie streaming service, for example, if a movie’s synopsis is stored in a database as &lt;strong&gt;“A cowboy doll feels threatened when a new space toy becomes his owner’s favorite,”&lt;/strong&gt; but the user searches for &lt;strong&gt;“jealous toy struggles with new rival,”&lt;/strong&gt; a traditional search system might not find the movie because the exact words don’t line up. &lt;/p&gt;

&lt;p&gt;But a semantic search system can still connect the two ideas and bring up the right movie. It understands the &lt;em&gt;meaning&lt;/em&gt; behind your query — not just the exact words.&lt;/p&gt;

&lt;p&gt;Behind the scenes, this works thanks to &lt;strong&gt;vector similarity search&lt;/strong&gt;. It turns text (or images, or audio) into vectors (lists of numbers), stores them in a vector database, and then finds the ones closest to your query. &lt;/p&gt;
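
&lt;p&gt;To make “closest” concrete: this article’s demo ranks vectors by cosine distance, which measures the angle between two vectors. Here is a minimal plain-Java sketch of that math, with made-up 2-D vectors standing in for real embeddings (illustrative only; in the app, Redis computes this server-side over the stored embeddings):&lt;/p&gt;

```java
import java.util.stream.IntStream;

public class CosineDistance {

    static double dot(double[] a, double[] b) {
        return IntStream.range(0, a.length).mapToDouble(i -> a[i] * b[i]).sum();
    }

    // Cosine distance = 1 - cos(angle between a and b); lower means more similar.
    static double distance(double[] a, double[] b) {
        return 1.0 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        double[] query     = {1.0, 0.0};  // hypothetical 2-D embeddings
        double[] similar   = {0.9, 0.1};
        double[] unrelated = {0.0, 1.0};
        System.out.println(distance(query, similar));   // close to 0 (similar meaning)
        System.out.println(distance(query, unrelated)); // 1.0 (orthogonal, unrelated)
    }
}
```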

&lt;p&gt;Today, &lt;strong&gt;we’re gonna build a vector similarity search app that lets users find movies based on the &lt;em&gt;meaning&lt;/em&gt; of their synopsis — not just exact keyword matches&lt;/strong&gt;. So that even if they don’t know the title, they can still get the right movie based on a generic description of the synopsis.&lt;/p&gt;

&lt;p&gt;To do that, we’ll build a Spring Boot app from scratch and plug in &lt;strong&gt;Redis OM Spring&lt;/strong&gt;. It’ll handle turning our data into vectors, storing them in Redis, and running fast vector searches when users send a query.&lt;/p&gt;

&lt;h2&gt;
  
  
  Redis as a Vector Database
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Video: &lt;a href="https://www.youtube.com/watch?v=Yhv19le0sBw" rel="noopener noreferrer"&gt;What is a vector database?&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Over the last 15 years, Redis has become the foundational infrastructure for real-time applications. Today, with Redis 8, it’s committed to becoming the foundational infrastructure for AI applications as well. &lt;/p&gt;

&lt;p&gt;Redis 8 not only turns the community version of Redis into a Vector Database, but also makes it the fastest and most scalable database in the market today. &lt;strong&gt;Redis 8 allows you to scale to one billion vectors without penalizing latency.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Learn more: &lt;a href="https://redis.io/blog/searching-1-billion-vectors-with-redis-8/" rel="noopener noreferrer"&gt;https://redis.io/blog/searching-1-billion-vectors-with-redis-8/&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Redis OM Spring&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To allow our users and customers to take full advantage of everything Redis can do — with the speed Redis is known for — we decided to implement &lt;strong&gt;Redis OM Spring&lt;/strong&gt;, a library built on top of Spring Data Redis. &lt;/p&gt;

&lt;p&gt;Redis OM Spring allows our users to easily communicate with Redis, model their entities as &lt;strong&gt;JSONs&lt;/strong&gt; or Hashes, efficiently query them by leveraging the &lt;strong&gt;Redis Query Engine&lt;/strong&gt;, and even take advantage of probabilistic data structures such as &lt;strong&gt;Count-min Sketch, Bloom Filters, Cuckoo Filters&lt;/strong&gt;, and more. &lt;/p&gt;

&lt;p&gt;Redis OM Spring on GitHub: &lt;a href="https://github.com/redis/redis-om-spring" rel="noopener noreferrer"&gt;https://github.com/redis/redis-om-spring&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Dataset
&lt;/h2&gt;

&lt;p&gt;The dataset we’ll be looking at is a catalog of thousands of movies. Each of these movies has metadata such as its title, cast, genre, year, and synopsis. The JSON file representing this dataset can be found in the repository that accompanies this article.&lt;/p&gt;

&lt;p&gt;Sample:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "title": "Toy Story",
  "year": 1995,
  "cast": [
   "Tim Allen",
   "Tom Hanks",
   "Don Rickles"
  ],
  "genres": [
   "Animated",
   "Comedy"
  ],
  "href": "Toy_Story",
  "extract": "Toy Story is a 1995 American computer-animated comedy film directed by John Lasseter, produced by Pixar Animation Studios and released by Walt Disney Pictures. The first installment in the  Toy Story franchise, it was the first entirely computer-animated feature film, as well as the first feature film from Pixar. It was written by Joss Whedon, Andrew Stanton, Joel Cohen, and Alec Sokolow from a story by Lasseter, Stanton, Pete Docter, and Joe Ranft. The film features music by Randy Newman, was produced by Bonnie Arnold and Ralph Guggenheim, and was executive-produced by Steve Jobs and Edwin Catmull. The film features the voices of Tom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, R. Lee Ermey, John Morris, Laurie Metcalf, and Erik von Detten.",
  "thumbnail": "https://upload.wikimedia.org/wikipedia/en/1/13/Toy_Story.jpg",
  "thumbnail_width": 250,
  "thumbnail_height": 373
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Building the Application
&lt;/h2&gt;

&lt;p&gt;Our application will be built using Spring Boot with Redis OM Spring. &lt;strong&gt;It will allow movies to be searched by their synopsis based on semantic search rather than keyword matching.&lt;/strong&gt; Besides that, our application will also allow its users to perform &lt;strong&gt;hybrid search&lt;/strong&gt;, &lt;strong&gt;a technique that combines vector similarity with traditional filtering and sorting.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  0. GitHub Repository
&lt;/h3&gt;

&lt;p&gt;The full application can be found on GitHub: &lt;a href="https://github.com/redis/redis-om-spring/tree/main/demos/roms-vss-movies/src" rel="noopener noreferrer"&gt;https://github.com/redis/redis-om-spring/tree/main/demos/roms-vss-movies/&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  1. Add the required dependencies
&lt;/h3&gt;

&lt;p&gt;From a Spring Boot application, add the following dependencies to your Maven or Gradle file: &lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!-- Redis OM Spring for Redis object mapping and vector search --&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;com.redis.om.spring&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;redis-om-spring&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;0.9.11&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;

&amp;lt;!-- Redis OM Spring uses Spring AI for creating embeddings (vectors) --&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.springframework.ai&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;spring-ai-openai&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;1.0.0-M6&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.springframework.ai&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;spring-ai-transformers&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;1.0.0-M6&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  2. Define the Movie entity
&lt;/h3&gt;

&lt;p&gt;Redis OM Spring provides two annotations that make it easy to vectorize data and perform vector similarity search from within Spring Boot.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;@Vectorize: Automatically generates vector embeddings from the text field&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;@Indexed: Enables vector indexing on a field for efficient search&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The core of the implementation is the Movie class with Redis vector indexing annotations:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RedisHash // This annotation is used by Redis OM Spring to store the entity as a hash in Redis
public class Movie {

    @Id // The movie title serves as the entity ID (Redis OM Spring would otherwise generate a ULID)
    private String title;

    @Indexed(sortable = true) // This annotation enables indexing on the field for filtering and sorting
    private int year;

    @Indexed
    private List&amp;lt;String&amp;gt; cast;

    @Indexed
    private List&amp;lt;String&amp;gt; genres;

    private String href;

    // This annotation automatically generates vector embeddings from the text
    @Vectorize(
            destination = "embeddedExtract", // The field where the embedding will be stored
            embeddingType = EmbeddingType.SENTENCE, // Type of embedding to generate (SENTENCE, IMAGE, FACE, or WORD)
            provider = EmbeddingProvider.OPENAI, // The provider for generating embeddings (OpenAI, Transformers, VertexAI, etc.)
            openAiEmbeddingModel = OpenAiApi.EmbeddingModel.TEXT_EMBEDDING_3_LARGE // The specific OpenAI model to use for embeddings
    )
    private String extract;

    // This defines the vector field that will store the embeddings
    // The indexed annotation enables vector search on this field
    @Indexed(
            schemaFieldType = SchemaFieldType.VECTOR, // Defines the field type as a vector
            algorithm = VectorField.VectorAlgorithm.FLAT, // The algorithm used for vector search (FLAT or HNSW)
            type = VectorType.FLOAT32,
            dimension = 3072, // The dimension of the vector (must match the embedding model)
            distanceMetric = DistanceMetric.COSINE, // The distance metric used for similarity search (Cosine or Euclidean)
            initialCapacity = 10
    )
    private byte[] embeddedExtract;

    private String thumbnail;
    private int thumbnailWidth;
    private int thumbnailHeight;

    // Getters and setters...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this example we're using OpenAI's embedding model, which requires an OpenAI API Key to be set in the &lt;code&gt;application.properties&lt;/code&gt; file of your application:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;redis.om.spring.ai.open-ai.api-key=${OPEN_AI_KEY}&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If an embedding provider is not specified, Redis OM Spring will use Hugging Face’s Transformers model (all-MiniLM-L6-v2) by default. In this case, make sure you set the dimension in the @Indexed annotation to 384, which is the number of dimensions produced by the default embedding model.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Repository Interface
&lt;/h3&gt;

&lt;p&gt;A simple repository interface that extends RedisEnhancedRepository. This will be used to load the data into Redis using the saveAll() method:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public interface MovieRepository extends RedisEnhancedRepository&amp;lt;Movie, String&amp;gt; {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This provides basic CRUD operations for Movie entities, with the first generic parameter being the entity type and the second being the ID type.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;4. Search Service&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;The search service uses two beans provided by Redis OM Spring:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;EntityStream: For creating a stream of entities to perform searches. The EntityStream should not be confused with the Java Streams API: it generates a Redis command that is sent to Redis, so that Redis can perform the searching, filtering, and sorting efficiently on its side.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Embedder: Used for generating the embedding for the query sent by the user. The embedding is generated following the configuration of the @Vectorize annotation defined in the Movie class.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The search functionality is implemented in the SearchService:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
public class SearchService {

    private static final Logger logger = LoggerFactory.getLogger(SearchService.class);
    private final EntityStream entityStream;
    private final Embedder embedder;

    public SearchService(EntityStream entityStream, Embedder embedder) {
        this.entityStream = entityStream;
        this.embedder = embedder;
    }

    public List&amp;lt;Pair&amp;lt;Movie, Double&amp;gt;&amp;gt; search(
            String query,
            Integer yearMin,
            Integer yearMax,
            List&amp;lt;String&amp;gt; cast,
            List&amp;lt;String&amp;gt; genres,
            Integer numberOfNearestNeighbors) {
        logger.info("Received text: {}", query);
        logger.info("Received yearMin: {} yearMax: {}", yearMin, yearMax);
        logger.info("Received cast: {}", cast);
        logger.info("Received genres: {}", genres);

        if (numberOfNearestNeighbors == null) numberOfNearestNeighbors = 3;
        if (yearMin == null) yearMin = 1900;
        if (yearMax == null) yearMax = 2100;

        // Convert query text to vector embedding
        byte[] embeddedQuery = embedder.getTextEmbeddingsAsBytes(List.of(query), Movie$.EXTRACT).getFirst();

        // Perform vector search with additional filters
        SearchStream&amp;lt;Movie&amp;gt; stream = entityStream.of(Movie.class);
        return stream
                // KNN search for nearest vectors
                .filter(Movie$.EMBEDDED_EXTRACT.knn(numberOfNearestNeighbors, embeddedQuery))
                // Additional metadata filters (hybrid search)
                .filter(Movie$.YEAR.between(yearMin, yearMax))
                .filter(Movie$.CAST.eq(cast))
                .filter(Movie$.GENRES.eq(genres))
                // Sort by similarity score
                .sorted(Movie$._EMBEDDED_EXTRACT_SCORE)
                // Return both the movie and its similarity score
                .map(Fields.of(Movie$._THIS, Movie$._EMBEDDED_EXTRACT_SCORE))
                .collect(Collectors.toList());
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Key features of the search service:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Uses EntityStream to create a search stream for Movie entities&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Converts the text query into a vector embedding&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Uses K-nearest neighbors (KNN) search to find similar vectors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Applies additional filters for hybrid search (combining vector and traditional search)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Returns pairs of movies and their similarity scores&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  5. Movie Service for Data Loading
&lt;/h3&gt;

&lt;p&gt;The MovieService handles loading movie data into Redis. It reads a JSON file containing movie data and saves the movies into Redis. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;It may take a minute or two to load the thousands of movies in the file because the embeddings are generated as part of the save: the @Vectorize annotation generates the embedding for the extract field before each movie is saved into Redis.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
public class MovieService {

    private static final Logger log = LoggerFactory.getLogger(MovieService.class);
    private final ObjectMapper objectMapper;
    private final ResourceLoader resourceLoader;
    private final MovieRepository movieRepository;

    public MovieService(ObjectMapper objectMapper, ResourceLoader resourceLoader, MovieRepository movieRepository) {
        this.objectMapper = objectMapper;
        this.resourceLoader = resourceLoader;
        this.movieRepository = movieRepository;
    }

    public void loadAndSaveMovies(String filePath) throws Exception {
        Resource resource = resourceLoader.getResource("classpath:" + filePath);
        try (InputStream is = resource.getInputStream()) {
            List&amp;lt;Movie&amp;gt; movies = objectMapper.readValue(is, new TypeReference&amp;lt;&amp;gt;() {});
            List&amp;lt;Movie&amp;gt; unprocessedMovies = movies.stream()
                    .filter(movie -&amp;gt; !movieRepository.existsById(movie.getTitle()) &amp;amp;&amp;amp;
                            movie.getYear() &amp;gt; 1980
                    ).toList();
            long systemMillis = System.currentTimeMillis();
            movieRepository.saveAll(unprocessedMovies);
            long elapsedMillis = System.currentTimeMillis() - systemMillis;
            log.info("Saved {} movies in {} ms", unprocessedMovies.size(), elapsedMillis);
        }
    }

    public boolean isDataLoaded() {
        return movieRepository.count() &amp;gt; 0;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  6. Search Controller
&lt;/h3&gt;

&lt;p&gt;The REST controller exposes the search endpoint:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RestController
public class SearchController {

    private final SearchService searchService;

    public SearchController(SearchService searchService) {
        this.searchService = searchService;
    }

    @GetMapping("/search")
    public Map&amp;lt;String, Object&amp;gt; search(
            @RequestParam(required = false) String text,
            @RequestParam(required = false) Integer yearMin,
            @RequestParam(required = false) Integer yearMax,
            @RequestParam(required = false) List&amp;lt;String&amp;gt; cast,
            @RequestParam(required = false) List&amp;lt;String&amp;gt; genres,
            @RequestParam(required = false) Integer numberOfNearestNeighbors
    ) {
        List&amp;lt;Pair&amp;lt;Movie, Double&amp;gt;&amp;gt; matchedMovies = searchService.search(
                text,
                yearMin,
                yearMax,
                cast,
                genres,
                numberOfNearestNeighbors
        );
        return Map.of(
                "matchedMovies", matchedMovies,
                "count", matchedMovies.size()
        );
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  7. Application Bootstrap
&lt;/h3&gt;

&lt;p&gt;The main application class initializes Redis OM Spring and loads data. The @EnableRedisEnhancedRepositories annotation activates Redis OM Spring's repository support:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@SpringBootApplication
@EnableRedisEnhancedRepositories(basePackages = {"dev.raphaeldelio.redis8demo*"})
public class Redis8DemoVectorSimilaritySearchApplication {

    public static void main(String[] args) {
        SpringApplication.run(Redis8DemoVectorSimilaritySearchApplication.class, args);
    }

    @Bean
    CommandLineRunner loadData(MovieService movieService) {
        return args -&amp;gt; {
            if (movieService.isDataLoaded()) {
                System.out.println("Data already loaded. Skipping data load.");
                return;
            }
            movieService.loadAndSaveMovies("movies.json");
        };
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  8. Sample Requests
&lt;/h3&gt;

&lt;p&gt;You can make requests to the search endpoint:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET http://localhost:8082/search?text=A movie about a young boy who goes to a wizardry school

GET http://localhost:8082/search?numberOfNearestNeighbors=1&amp;amp;yearMin=1970&amp;amp;yearMax=1990&amp;amp;text=A movie about a kid and a scientist who go back in time

GET http://localhost:8082/search?cast=Dee Wallace,Henry Thomas&amp;amp;text=A boy who becomes friend with an alien
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Sample request:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET http://localhost:8082/search?numberOfNearestNeighbors=1&amp;amp;yearMin=1970&amp;amp;yearMax=1990&amp;amp;text=A movie about a kid and a scientist who go back in time
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Sample response:&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{&lt;br&gt;
  "count": 1,&lt;br&gt;
  "matchedMovies": [&lt;br&gt;
    {&lt;br&gt;
      "first": { // matched movie&lt;br&gt;
        "title": "Back to the Future",&lt;br&gt;
        "year": 1985,&lt;br&gt;
        "cast": [&lt;br&gt;
          "Michael J. Fox",&lt;br&gt;
          "Christopher Lloyd"&lt;br&gt;
        ],&lt;br&gt;
        "genres": [&lt;br&gt;
          "Science Fiction"&lt;br&gt;
        ],&lt;br&gt;
        "extract": "Back to the Future is a 1985 American science fiction film directed by Robert Zemeckis and written by Zemeckis, and Bob Gale. It stars Michael J. Fox, Christopher Lloyd, Lea Thompson, Crispin Glover, and Thomas F. Wilson. Set in 1985, it follows Marty McFly (Fox), a teenager accidentally sent back to 1955 in a time-traveling DeLorean automobile built by his eccentric scientist friend Emmett \"Doc\" Brown (Lloyd), where he inadvertently prevents his future parents from falling in love – threatening his own existence – and is forced to reconcile them and somehow get back to the future.",&lt;br&gt;
        "thumbnail": "&lt;a href="https://upload.wikimedia.org/wikipedia/en/d/d2/Back_to_the_Future.jpg" rel="noopener noreferrer"&gt;https://upload.wikimedia.org/wikipedia/en/d/d2/Back_to_the_Future.jpg&lt;/a&gt;"&lt;br&gt;
      },&lt;br&gt;
      "second": 0.463297247887 // similarity score (the lowest the closest)&lt;br&gt;
    }&lt;br&gt;
  ]&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Wrapping up
&lt;/h2&gt;

&lt;p&gt;And that’s it — you now have a working semantic search app using Spring Boot and Redis. &lt;/p&gt;

&lt;p&gt;Instead of relying on exact keyword matches, your app understands the meaning behind the query. Redis handles the heavy part: embedding storage, similarity search, and even traditional filters — all at lightning speed.&lt;/p&gt;

&lt;p&gt;With Redis OM Spring, you get an easy way to integrate this into your Java apps. You only need two annotations, @Vectorize and @Indexed, and two beans, EntityStream and Embedder. &lt;/p&gt;

&lt;p&gt;Whether you’re building search, recommendations, or AI-powered assistants, this setup gives you a solid and scalable foundation.&lt;/p&gt;

&lt;p&gt;Try it out, tweak the filters, explore other models, and see how far you can go!&lt;/p&gt;

&lt;h3&gt;
  
  
  More AI Resources
&lt;/h3&gt;

&lt;p&gt;The best way to stay on the path of learning AI is by following the recipes available on the Redis AI Resources GitHub repository. There you can find dozens of recipes that will help you start building AI apps, fast!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/redis-developer/redis-ai-resources/tree/main" rel="noopener noreferrer"&gt;&lt;strong&gt;GitHub - redis-developer/redis-ai-resources: ✨ A curated list of awesome community resources, integrations, and examples of Redis in the AI ecosystem.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>java</category>
      <category>vectordatabase</category>
      <category>springboot</category>
      <category>redis</category>
    </item>
    <item>
      <title>Token Bucket Rate Limiter (Redis &amp; Java)</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Mon, 13 Jan 2025 14:15:29 +0000</pubDate>
      <link>https://dev.to/redis/token-bucket-rate-limiter-redis-java-4pi3</link>
      <guid>https://dev.to/redis/token-bucket-rate-limiter-redis-java-4pi3</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://youtu.be/cfF6nXIpDwE" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft58edaghnxrbmjdyhkf7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft58edaghnxrbmjdyhkf7.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Token Bucket&lt;/strong&gt; algorithm is a flexible and efficient rate-limiting mechanism. It works by filling a bucket with tokens at a fixed rate (e.g., one token per second). Each request consumes a token, and if no tokens are available, the request is rejected. The bucket has a maximum capacity, so it can handle bursts of traffic as long as the burst doesn’t exceed the number of tokens in the bucket.&lt;/p&gt;
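&lt;p&gt;The idea is easier to see without any infrastructure. Here is a minimal, single-process sketch of the algorithm in plain Java, for illustration only; the Redis-backed version we build below applies the same logic, but shares the bucket state across processes:&lt;/p&gt;

```java
// Minimal in-memory token bucket (single process, illustration only).
class SimpleTokenBucket {
    private final int capacity;      // maximum tokens the bucket can hold
    private final double refillRate; // tokens added per second
    private double tokens;           // current token count
    private long lastRefillMs;       // timestamp of the last refill

    SimpleTokenBucket(int capacity, double refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;      // start with a full bucket
        this.lastRefillMs = System.currentTimeMillis();
    }

    synchronized boolean isAllowed() {
        long now = System.currentTimeMillis();
        // Refill proportionally to the elapsed time, capped at capacity
        tokens = Math.min(capacity, tokens + (now - lastRefillMs) / 1000.0 * refillRate);
        lastRefillMs = now;
        if (tokens >= 1) {
            tokens--;                // consume one token for this request
            return true;
        }
        return false;                // bucket empty: reject the request
    }
}
```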

&lt;blockquote&gt;
&lt;p&gt;Looking for a different rate limiter algorithm? &lt;a href="https://raphaeldelio.medium.com/rate-limiting-with-redis-an-essential-guide-df798b1c63db?source=user_profile_page---------1-------------17e03c232bd9---------------" rel="noopener noreferrer"&gt;Check the essential guide.&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How the Token Bucket Rate Limiter Works&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implementation with Redis and Java&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing with TestContainers and AssertJ&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conclusion (GitHub Repo)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2160%2F1%2A7cDKq5yh5RD0ygvb3mVwfQ.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F2160%2F1%2A7cDKq5yh5RD0ygvb3mVwfQ.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Define a Token Refill Rate&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Set a rate at which tokens are added to the bucket, such as 1 token per second or 10 tokens per minute.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Track Token Consumption&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;For each incoming request, deduct one token from the bucket.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Refill Tokens&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Continuously refill the bucket at the defined rate, up to its maximum capacity, ensuring unused tokens can accumulate for future bursts.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Rate Limit Check&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before processing a request, check if there are enough tokens in the bucket. If the bucket is empty, reject the request until tokens are replenished.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Implement It with Redis and Java
&lt;/h2&gt;

&lt;p&gt;For the &lt;strong&gt;Token Bucket Rate Limiter&lt;/strong&gt;, Redis provides an efficient way to track tokens and implement the algorithm. Here’s how to do it:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Retrieve current token count and last refill time
&lt;/h3&gt;

&lt;p&gt;First, retrieve the current token count and the last refill time:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET rate_limit:&amp;lt;clientId&amp;gt;:count  
GET rate_limit:&amp;lt;clientId&amp;gt;:lastRefill  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If these keys don’t exist, initialize the token count to the bucket’s maximum capacity and set the current time as the last refill time using SET.&lt;/p&gt;
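&lt;p&gt;For example, for a bucket with a capacity of 10 tokens, the initialization could look like this (placeholder values shown in angle brackets):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET rate_limit:&amp;lt;clientId&amp;gt;:count 10  
SET rate_limit:&amp;lt;clientId&amp;gt;:lastRefill &amp;lt;current_time_millis&amp;gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;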

&lt;h3&gt;
  
  
  2. Refill tokens if necessary and update the bucket
&lt;/h3&gt;

&lt;p&gt;Update the token count and last refill date time after processing each request:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SET rate_limit:&amp;lt;clientId&amp;gt;:count &amp;lt;new_token_count&amp;gt;  
SET rate_limit:&amp;lt;clientId&amp;gt;:lastRefill &amp;lt;current_time&amp;gt;  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Allow or reject the request
&lt;/h3&gt;

&lt;p&gt;If tokens are available, allow the request and decrement the count by one using:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;DECR rate_limit:&amp;lt;clientId&amp;gt;:count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Implementing it with Jedis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jedis&lt;/strong&gt; is a popular Java library for interacting with &lt;strong&gt;Redis&lt;/strong&gt;. We’ll use it to implement our rate limiter because it provides a simple and intuitive API for executing Redis commands from JVM applications.&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;strong&gt;Add Jedis to Your Maven File&lt;/strong&gt;:
&lt;/h3&gt;

&lt;p&gt;Check the latest version &lt;a href="https://redis.io/docs/latest/develop/clients/jedis/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;redis.clients&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;jedis&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;5.2.0&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Create a &lt;strong&gt;TokenBucketRateLimiter&lt;/strong&gt; class:
&lt;/h3&gt;

&lt;p&gt;The class will:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Accept a Jedis instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Define the maximum capacity of the token bucket.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Specify the token refill rate (tokens per second).&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    package io.redis;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;

    public class TokenBucketRateLimiter {
        private final Jedis jedis;
        private final int bucketCapacity; // Maximum tokens the bucket can hold
        private final double refillRate; // Tokens refilled per second

        public TokenBucketRateLimiter(Jedis jedis, int bucketCapacity, double refillRate) {
            this.jedis = jedis;
            this.bucketCapacity = bucketCapacity;
            this.refillRate = refillRate;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Validate the Requests
&lt;/h3&gt;

&lt;p&gt;The main task of this rate limiter is to determine whether a client has sufficient tokens to process their request. If yes, the request is allowed, and tokens are deducted. If not, the request is blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate the keys&lt;/strong&gt;&lt;br&gt;
We’ll store each client’s token count and last refill time in Redis using unique keys. The keys will look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public boolean isAllowed(String clientId) {
    String keyCount = "rate_limit:" + clientId + ":count";
    String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, if the client ID is user123, their keys would be rate_limit:user123:count and rate_limit:user123:lastRefill.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Fetch Current State&lt;/strong&gt;&lt;br&gt;
We use Redis’s GET command to retrieve the current token count and the last refill time. If the keys don’t exist, we assume the bucket is full, and the last refill time is the current timestamp.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public boolean isAllowed(String clientId) {
    String keyCount = "rate_limit:" + clientId + ":count";
    String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";

    Transaction transaction = jedis.multi();
    transaction.get(keyLastRefill);
    transaction.get(keyCount);
    var results = transaction.exec();

    long currentTime = System.currentTimeMillis();
    long lastRefillTime = results.get(0) != null ? Long.parseLong((String) results.get(0)) : currentTime;
    int tokenCount = results.get(1) != null ? Integer.parseInt((String) results.get(1)) : bucketCapacity;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 3: Refill Tokens&lt;/strong&gt;&lt;br&gt;
Calculate how many tokens should be added based on the time elapsed since the last refill. Ensure the bucket doesn’t exceed its maximum capacity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;long elapsedTimeMs = currentTime - lastRefillTime;
double elapsedTimeSecs = elapsedTimeMs / 1000.0;
int tokensToAdd = (int) (elapsedTimeSecs * refillRate);

tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
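&lt;p&gt;As a quick sanity check of this arithmetic with concrete numbers (plain Java, no Redis involved): after 2.5 seconds at 2 tokens per second, the cast to int truncates 5.0 down to 5 tokens, and Math.min keeps the bucket from overflowing its capacity.&lt;/p&gt;

```java
// Worked example of the refill arithmetic (illustrative values)
long elapsedTimeMs = 2500;       // 2.5 seconds since the last refill
double refillRate = 2.0;         // tokens per second
int bucketCapacity = 4;
int tokenCount = 1;              // tokens left in the bucket

double elapsedTimeSecs = elapsedTimeMs / 1000.0;
int tokensToAdd = (int) (elapsedTimeSecs * refillRate); // truncates toward zero
tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);

System.out.println(tokensToAdd); // 5
System.out.println(tokenCount);  // 4 (1 + 5 = 6, capped at the capacity of 4)
```

Note that the cast discards partial tokens, so a fraction of a token earned between requests is lost rather than carried over.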

&lt;p&gt;&lt;strong&gt;Step 4: Check Token Availability&lt;/strong&gt;&lt;br&gt;
Compare the current token count to determine if the request can be allowed. &lt;strong&gt;If tokens are available, deduct one token; otherwise, block the request.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;boolean isAllowed = tokenCount &amp;gt; 0;

if (isAllowed) {
    tokenCount--;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Step 5: Update Redis&lt;/strong&gt;&lt;br&gt;
We update the token count and last refill time in Redis. Use a transaction to ensure atomic updates:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Transaction transaction = jedis.multi();
transaction.set(keyLastRefill, String.valueOf(currentTime)); // Update last refill time
transaction.set(keyCount, String.valueOf(tokenCount));       // Update token count
transaction.exec();
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Complete Implementation
&lt;/h3&gt;

&lt;p&gt;Here’s the full code for the TokenBucketRateLimiter class:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Transaction;

public class TokenBucketRateLimiter {
    private final Jedis jedis;
    private final int bucketCapacity; // Maximum tokens the bucket can hold
    private final double refillRate; // Tokens refilled per second

    public TokenBucketRateLimiter(Jedis jedis, int bucketCapacity, double refillRate) {
        this.jedis = jedis;
        this.bucketCapacity = bucketCapacity;
        this.refillRate = refillRate;
    }

    public boolean isAllowed(String clientId) {
        String keyCount = "rate_limit:" + clientId + ":count";
        String keyLastRefill = "rate_limit:" + clientId + ":lastRefill";

        long currentTime = System.currentTimeMillis();

        // Fetch current state
        Transaction transaction = jedis.multi();
        transaction.get(keyLastRefill);
        transaction.get(keyCount);
        var results = transaction.exec();

        long lastRefillTime = results.get(0) != null ? Long.parseLong((String) results.get(0)) : currentTime;
        int tokenCount = results.get(1) != null ? Integer.parseInt((String) results.get(1)) : bucketCapacity;

        // Refill tokens
        long elapsedTimeMs = currentTime - lastRefillTime;
        double elapsedTimeSecs = elapsedTimeMs / 1000.0;
        int tokensToAdd = (int) (elapsedTimeSecs * refillRate);
        tokenCount = Math.min(bucketCapacity, tokenCount + tokensToAdd);

        // Check if the request is allowed
        boolean isAllowed = tokenCount &amp;gt; 0;

        if (isAllowed) {
            tokenCount--; // Consume one token
        }

        // Update Redis state
        transaction = jedis.multi();
        transaction.set(keyLastRefill, String.valueOf(currentTime));
        transaction.set(keyCount, String.valueOf(tokenCount));
        transaction.exec();

        return isAllowed;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;And we’re ready to start testing its behavior!&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing our Rate Limiter
&lt;/h2&gt;

&lt;p&gt;To ensure our Token Bucket Rate Limiter behaves as expected, we’ll write tests for various scenarios. For this, we’ll use three tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis TestContainers&lt;/strong&gt;: This library spins up an isolated Redis container for testing. This means we don’t need to rely on an external Redis server during our tests. Once the tests are done, the container is stopped, leaving no leftover data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JUnit 5&lt;/strong&gt;: Our main testing framework, which helps us define and structure tests with lifecycle methods like @BeforeEach and @AfterEach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AssertJ&lt;/strong&gt;: A library that makes assertions readable and expressive, like assertThat(result).isTrue().&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s begin by adding the necessary dependencies to our pom.xml.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Dependencies
&lt;/h3&gt;

&lt;p&gt;Here’s what you’ll need in your Maven pom.xml file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.junit.jupiter&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;junit-jupiter-engine&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;5.10.0&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;com.redis&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;testcontainers-redis&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;2.2.2&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.assertj&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;assertj-core&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;3.11.1&amp;lt;/version&amp;gt;
    &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Once you’ve added these dependencies, you’re ready to start writing your test class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Test Class
&lt;/h3&gt;

&lt;p&gt;The first step is to create a test class named TokenBucketRateLimiterTest. Inside, we’ll define three main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Test Container&lt;/strong&gt;: This launches a Redis instance in a Docker container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jedis Instance&lt;/strong&gt;: This connects to the Redis container for sending commands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate Limiter&lt;/strong&gt;: The actual TokenBucketRateLimiter instance we’re testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how the skeleton of our test class looks:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class TokenBucketRateLimiterTest {

    private static RedisContainer redisContainer;
    private Jedis jedis;
    private TokenBucketRateLimiter rateLimiter;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Preparing the Environment Before Each Test
&lt;/h3&gt;

&lt;p&gt;Before running any test, we need to ensure a clean Redis environment. Here’s what we’ll do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect to Redis&lt;/strong&gt;: Use a Jedis instance to connect to the Redis container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flush Data&lt;/strong&gt;: Clear any leftover data in Redis to ensure consistent results for each test.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll start the Redis container once in a method annotated with @BeforeAll, and set up the Jedis connection in a method annotated with @BeforeEach, which runs before every test case.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@BeforeAll
static void startContainer() {
    redisContainer = new RedisContainer("redis:latest");
    redisContainer.withExposedPorts(6379).start();
}

@BeforeEach
void setup() {
    jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
    jedis.flushAll();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;FLUSHALL is an actual Redis command that deletes all the keys of all the existing databases. &lt;a href="https://redis.io/docs/latest/commands/flushall/" rel="noopener noreferrer"&gt;Read more about it in the official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Cleaning Up After Each Test
&lt;/h3&gt;

&lt;p&gt;After each test, we need to close the Jedis connection to free up resources. This ensures no lingering connections interfere with subsequent tests.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@AfterEach
void tearDown() {
    jedis.close();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Full Setup
&lt;/h3&gt;

&lt;p&gt;Here’s how the complete test class looks with everything in place:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class TokenBucketRateLimiterTest {

    private static RedisContainer redisContainer;
    private Jedis jedis;
    private TokenBucketRateLimiter rateLimiter;

    @BeforeAll
    static void startContainer() {
        redisContainer = new RedisContainer("redis:latest");
        redisContainer.withExposedPorts(6379).start();
    }

    @AfterAll
    static void stopContainer() {
        redisContainer.stop();
    }

    @BeforeEach
    void setup() {
        jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
        jedis.flushAll();
    }

    @AfterEach
    void tearDown() {
        jedis.close();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Requests Within the Bucket Capacity
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter allows requests within the defined bucket capacity.&lt;/p&gt;

&lt;p&gt;We configure it with a &lt;strong&gt;capacity of&lt;/strong&gt; &lt;strong&gt;5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, then call isAllowed(“client-1”) &lt;strong&gt;5 times&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Each call should return true, confirming the rate limiter correctly tracks and permits requests within the capacity.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldAllowRequestsWithinBucketCapacity() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed("client-1"))
            .withFailMessage("Request %d should be allowed within bucket capacity", i)
            .isTrue();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Requests Are Denied When Bucket is Empty
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter correctly denies requests once the bucket is empty.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of&lt;/strong&gt; &lt;strong&gt;5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, we call isAllowed(“client-1”) &lt;strong&gt;5 times&lt;/strong&gt; and expect all to return true.&lt;/p&gt;

&lt;p&gt;On the 6th call, it should return false, verifying the rate limiter blocks requests once the bucket is empty.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldDenyRequestsOnceBucketIsEmpty() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed("client-1"))
            .withFailMessage("Request %d should be allowed within bucket capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed("client-1"))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Bucket is Gradually Refilled
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter refills the bucket correctly after every second.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of&lt;/strong&gt; &lt;strong&gt;5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, the first 5 requests (isAllowed(“client-1”)) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for two seconds, the next two requests are allowed and the third one is denied, confirming the refill behavior works as expected.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    void shouldRefillTokensGraduallyAndAllowRequestsOverTime() throws InterruptedException {
        rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);
        String clientId = "client-1";

        for (int i = 1; i &amp;lt;= 5; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request %d should be allowed within bucket capacity", i)
                .isTrue();
        }
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request beyond bucket capacity should be denied")
            .isFalse();

        TimeUnit.SECONDS.sleep(2);

        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request after partial refill should be allowed")
            .isTrue();
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Second request after partial refill should be allowed")
            .isTrue();
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request beyond available tokens should be denied")
            .isFalse();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Independent Handling of Multiple Clients
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter handles multiple clients independently.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of&lt;/strong&gt; &lt;strong&gt;5 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of one token per second&lt;/strong&gt;, the first 5 requests (isAllowed(“client-1”)) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;Simultaneously, all 5 requests from &lt;strong&gt;client-2&lt;/strong&gt; are allowed (true), confirming the rate limiter maintains separate counters for each client.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldHandleMultipleClientsIndependently() {
    rateLimiter = new TokenBucketRateLimiter(jedis, 5, 1.0);

    String clientId1 = "client-1";
    String clientId2 = "client-2";

    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed(clientId1))
            .withFailMessage("Client 1 request %d should be allowed", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId1))
        .withFailMessage("Client 1 request beyond bucket capacity should be denied")
        .isFalse();

    for (int i = 1; i &amp;lt;= 5; i++) {
        assertThat(rateLimiter.isAllowed(clientId2))
            .withFailMessage("Client 2 request %d should be allowed", i)
            .isTrue();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Token Refill Does Not Exceed Bucket Capacity
&lt;/h3&gt;

&lt;p&gt;This test verifies that the token bucket rate limiter correctly refills tokens up to the defined capacity without exceeding it.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 3 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of 2 tokens per second&lt;/strong&gt;, the first 3 requests (isAllowed(“client-1”)) return true, while the 4th request is denied (false), indicating the bucket is empty.&lt;/p&gt;

&lt;p&gt;After waiting 3 seconds (enough to refill 6 tokens), the bucket refills only up to its maximum capacity of 3 tokens. The next 3 requests are allowed (true), but any additional request is denied (false), confirming that the rate limiter maintains the specified capacity limit regardless of refill surplus.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void shouldRefillTokensUpToCapacityWithoutExceedingIt() throws InterruptedException {
    int capacity = 3;
    double refillRate = 2.0;
    String clientId = "client-1";
    rateLimiter = new TokenBucketRateLimiter(jedis, capacity, refillRate);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed within initial bucket capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();

    TimeUnit.SECONDS.sleep(3);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed as bucket refills up to capacity", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("Request beyond bucket capacity should be denied")
        .isFalse();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Verifying Denied Requests Do Not Affect Token Count
&lt;/h3&gt;

&lt;p&gt;This test ensures that the token bucket rate limiter does not count denied requests when updating the token count.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;capacity of 3 tokens&lt;/strong&gt; and a &lt;strong&gt;refill rate of 0.5 tokens per second&lt;/strong&gt;, the first 3 requests (isAllowed(“client-1”)) are allowed (true), depleting the bucket. The 4th request is denied (false), confirming the bucket is empty.&lt;/p&gt;

&lt;p&gt;The Redis token count (rate_limit:client-1:count) is then verified to ensure it accurately reflects the remaining tokens (0 in this case) and does not include denied requests. This confirms that the rate limiter updates the token count only when requests are successfully processed.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Test
void testRateLimitDeniedRequestsAreNotCounted() {
    int capacity = 3;
    double refillRate = 0.5;
    String clientId = "client-1";
    rateLimiter = new TokenBucketRateLimiter(jedis, capacity, refillRate);

    for (int i = 1; i &amp;lt;= capacity; i++) {
        assertThat(rateLimiter.isAllowed(clientId))
            .withFailMessage("Request %d should be allowed", i)
            .isTrue();
    }
    assertThat(rateLimiter.isAllowed(clientId))
        .withFailMessage("This request should be denied")
        .isFalse();

    String key = "rate_limit:" + clientId + ":count";
    int requestCount = Integer.parseInt(jedis.get(key));
    assertThat(requestCount)
        .withFailMessage("The count should match remaining tokens and not include denied requests")
        .isEqualTo(0);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Is there any other behavior we should verify? Let me know in the comments!&lt;/p&gt;

&lt;p&gt;The Token Bucket Rate Limiter is a flexible and efficient way to manage request rates, and &lt;strong&gt;Redis&lt;/strong&gt; makes it incredibly fast and reliable.&lt;/p&gt;

&lt;p&gt;By leveraging commands like GET, SET, and MULTI/EXEC, we implemented a solution that tracks token counts, refills tokens dynamically based on time elapsed, and ensures the bucket never exceeds its defined capacity.&lt;/p&gt;

&lt;p&gt;Using &lt;strong&gt;Jedis&lt;/strong&gt;, we built a clear and intuitive &lt;strong&gt;Java&lt;/strong&gt; implementation, and with thorough testing using Redis TestContainers, JUnit 5, and AssertJ, we can confidently verify that it works as expected.&lt;/p&gt;

&lt;p&gt;This approach offers a robust foundation for managing request limits while allowing for burst handling and gradual refill, making it adaptable for more advanced rate-limiting scenarios when needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Repo
&lt;/h3&gt;

&lt;p&gt;You can find this implementation in &lt;strong&gt;Java&lt;/strong&gt; and &lt;strong&gt;Kotlin&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Java (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/tree/main/src/main/java/io/redis" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/test/java/io/redis/TokenBucketRateLimiterTest.java" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kotlin (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/main/kotlin/org/example/TokenBucketRateLimiter.kt" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/test/kotlin/org/example/TokenBucketRateLimiterTest.kt" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>redis</category>
      <category>java</category>
      <category>systemdesign</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Fixed Window Counter Rate Limiter (Redis &amp; Java)</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Mon, 30 Dec 2024 13:30:24 +0000</pubDate>
      <link>https://dev.to/redis/fixed-window-counter-rate-limiter-redis-java-dik</link>
      <guid>https://dev.to/redis/fixed-window-counter-rate-limiter-redis-java-dik</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://youtu.be/Ki3WKSNpdRU" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j5ljyqccpn0v4aa2kkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2j5ljyqccpn0v4aa2kkv.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;Fixed Window Counter&lt;/strong&gt; is the simplest and most straightforward rate-limiting algorithm. It divides time into fixed intervals (e.g., seconds, minutes, or hours) and counts the number of requests within each interval. If the count exceeds a predefined threshold, the requests are rejected until the next interval begins.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Looking for a more precise algorithm? Take a look at the Sliding Window Log implementation. (Coming soon)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Index
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;How the Fixed Window Counter Rate Limiter Works&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Implementation with Redis and Java&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testing with TestContainers and AssertJ&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conclusion (GitHub Repo)&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;How It Works&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5vldnjqp6aos1afq9et.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi5vldnjqp6aos1afq9et.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;1. Define a Window Interval&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Choose a time interval, such as 1 second, 1 minute, or 1 hour.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Track Requests&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Use a counter to track the number of requests made during the current window.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Reset Counter:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;At the end of the time window, reset the counter to zero and start counting again for the new window.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;strong&gt;Rate Limit Check:&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Compare the counter against the allowed limit. If it exceeds the limit, reject further requests until the next window.&lt;/p&gt;
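&lt;p&gt;The four steps above can be sketched as a minimal, single-process Java version before bringing Redis in. This is an illustration only (the class name InMemoryFixedWindow is made up for this sketch), and its state lives in one JVM, so it can’t coordinate limits across multiple application instances — which is exactly what moving the counter into Redis solves.&lt;/p&gt;

```java
// Minimal in-memory sketch of a fixed window counter (illustrative only).
class InMemoryFixedWindow {
    private final long windowMillis; // 1. the window interval
    private final int limit;         // the allowed number of requests
    private long windowStart = -1;   // start of the current window; -1 = no window yet
    private int count = 0;           // 2. requests seen in the current window

    InMemoryFixedWindow(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    boolean isAllowed(long nowMillis) {
        // 3. reset the counter once the window has elapsed
        if (windowStart == -1 || nowMillis - windowStart >= windowMillis) {
            windowStart = nowMillis;
            count = 0;
        }
        // 4. rate limit check: reject if the counter already hit the limit
        if (count >= limit) {
            return false;
        }
        count++;
        return true;
    }
}
```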

&lt;h2&gt;
  
  
  How to Implement It with Redis and Java
&lt;/h2&gt;

&lt;p&gt;There is more than one way to implement the Fixed Window Rate Limiter with Redis. The simplest is:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Use the INCR command to increment the counter in Redis each time a request is allowed
&lt;/h3&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCR my_counter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If no counter is set yet, the INCR command will create one with a value of zero and then increment it to one.&lt;/p&gt;

&lt;p&gt;If the counter is already set, the INCR command will simply increment it by one.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Set the key to expire in one minute if it’s newly created
&lt;/h3&gt;

&lt;p&gt;If the counter doesn’t exist, we need to set a time-to-live to ensure the time window lasts only for the specified period. &lt;strong&gt;But we should only set an expiration if it doesn’t already exist&lt;/strong&gt;. Otherwise, Redis would reset the expiration, and older requests could be counted beyond the allowed time.&lt;/p&gt;

&lt;p&gt;We’ll use the EXPIRE command with the NX flag on the key. &lt;strong&gt;The NX flag ensures the expiration is only set if the key doesn’t already have one.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This approach is smart because the counter will only track requests during the key’s lifespan. &lt;strong&gt;Once the key expires and is removed, the counter resets, ensuring we only account for requests within the intended time window.&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;EXPIRE my_counter 60 NX
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Check the counter for each new request
&lt;/h3&gt;

&lt;p&gt;When a new request comes in, check the counter to see how many requests have been made. If it’s below the threshold, allow the request and increment the counter. If not, block it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If the key doesn’t exist, assume the counter starts at 0.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;GET my_counter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Cool! Now that we understand the basics of our implementation, let’s implement it in Java with Jedis.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing it with Jedis
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Jedis&lt;/strong&gt; is a popular Java library for interacting with &lt;strong&gt;Redis&lt;/strong&gt;. We’ll use it to implement our rate limiter because it provides a simple and intuitive API for executing Redis commands from JVM applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  Start by adding the Jedis library to your Maven file:
&lt;/h3&gt;

&lt;p&gt;Check the latest version &lt;a href="https://redis.io/docs/latest/develop/clients/jedis/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;redis.clients&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;jedis&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;5.2.0&amp;lt;/version&amp;gt;
    &amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Create a FixedWindowRateLimiter class:
&lt;/h3&gt;

&lt;p&gt;The class will take:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;A Jedis instance.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A time window size (e.g., 60 seconds).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The maximum number of allowed requests.&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    package io.redis;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;
    import redis.clients.jedis.args.ExpiryOption;

    public class FixedWindowRateLimiter {

        private final Jedis jedis;
        private final int windowSize;
        private final int limit;

        public FixedWindowRateLimiter(Jedis jedis, int windowSize, int limit) {
            this.jedis = jedis;
            this.limit = limit;
            this.windowSize = windowSize;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Validate the Requests
&lt;/h3&gt;

&lt;p&gt;The main job of this rate limiter is to check if a client is within their allowed request limit. If yes, the request is allowed, and the counter is updated. If not, the request is blocked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Generate a key&lt;/strong&gt;&lt;br&gt;
We’ll store each client’s request count as a Redis key. To make keys unique for each client, we’ll format them like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For example, if the client ID is user123, their key would be rate_limit:user123.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Fetch the Current Counter&lt;/strong&gt;&lt;br&gt;
We’ll use Redis’s GET command to check how many requests the client has made so far. If the key doesn’t exist, we assume the client hasn’t made any requests, so the counter is 0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
        String currentCountStr = jedis.get(key);
        int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 3: Check the Request Limit&lt;/strong&gt;&lt;br&gt;
Next, we compare the current count to the allowed limit. If the counter is less than the limit, the request is allowed. Otherwise, it’s blocked.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public boolean isAllowed(String clientId) {
        String key = "rate_limit:" + clientId;
        String currentCountStr = jedis.get(key);
        int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;

        boolean isAllowed = currentCount &amp;lt; limit;
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Step 4: Increment the Counter and Set Expiration&lt;/strong&gt;&lt;br&gt;
If the request is allowed, we need to do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Increment the Counter&lt;/strong&gt;: Use the Redis INCR command to increase the request count by 1.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set an Expiration&lt;/strong&gt;: Use the EXPIRE command to ensure the counter resets at the end of the time window. To make sure the expiration won’t reset every time we increment the counter, we also need to set the NX flag.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll do this in a &lt;strong&gt;transaction&lt;/strong&gt; to ensure that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Both INCR and EXPIRE happen together, avoiding race conditions.&lt;/li&gt;
&lt;li&gt;Both INCR and EXPIRE are pipelined (sent in a batch to Redis) to reduce the number of network trips, improving performance.
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    if (isAllowed) {
        Transaction transaction = jedis.multi();
        transaction.incr(key); // Increment the counter
        transaction.expire(key, windowSize, ExpiryOption.NX); // Set expiration only if not already set
        transaction.exec(); // Execute both commands atomically
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;blockquote&gt;
&lt;p&gt;The first request marks the start of the time window. Any subsequent requests during this window’s lifespan will increment the counter.&lt;br&gt;
 Once the window expires, the key is automatically removed from Redis. The next request after that will define the start of a new window.&lt;br&gt;
 If we didn’t set the NX flag, the expiration would be reset every time the counter is incremented, extending the lifespan of the window.&lt;/p&gt;
&lt;/blockquote&gt;
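&lt;p&gt;The difference the NX flag makes can be shown with a little arithmetic (timestamps in seconds; WindowExpirySketch is a made-up helper for this illustration, not part of the implementation):&lt;/p&gt;

```java
// Illustrates how re-applying EXPIRE on every increment would slide the
// window forward, while the NX flag keeps it anchored to the first request.
class WindowExpirySketch {
    // With EXPIRE ... NX: the expiry is set once, when the key is created.
    static long expiryWithNx(long firstRequestAt, long windowSize) {
        return firstRequestAt + windowSize;
    }

    // Without NX: every increment would reset the TTL, so the key only
    // expires windowSize seconds after the latest request.
    static long expiryWithoutNx(long latestRequestAt, long windowSize) {
        return latestRequestAt + windowSize;
    }
}
```

&lt;p&gt;With a 60-second window and requests at t=0 and t=45, the NX version expires the key at t=60, while the non-NX version would push expiry to t=105, silently lengthening the window.&lt;/p&gt;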
&lt;h3&gt;
  
  
  &lt;strong&gt;Complete Implementation&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s the full code for the FixedWindowRateLimiter class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package io.redis;

    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;
    import redis.clients.jedis.args.ExpiryOption;

    public class FixedWindowRateLimiter {

        private final Jedis jedis;
        private final int windowSize;
        private final int limit;

        public FixedWindowRateLimiter(Jedis jedis, int windowSize, int limit) {
            this.jedis = jedis;
            this.limit = limit;
            this.windowSize = windowSize;
        }

        public boolean isAllowed(String clientId) {
            String key = "rate_limit:" + clientId;
            String currentCountStr = jedis.get(key);
            int currentCount = currentCountStr != null ? Integer.parseInt(currentCountStr) : 0;

            boolean isAllowed = currentCount &amp;lt; limit;

            if (isAllowed) {
                Transaction transaction = jedis.multi();
                transaction.incr(key);
                transaction.expire(key, windowSize, ExpiryOption.NX); // Set expire only if not set
                transaction.exec();
            }

            return isAllowed;
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And we’re ready to start testing its behavior!&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing our Rate Limiter
&lt;/h2&gt;

&lt;p&gt;To ensure our Fixed Window Rate Limiter behaves as expected, we’ll write tests for various scenarios. For this, we’ll use three tools:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis TestContainers&lt;/strong&gt;: This library spins up an isolated Redis container for testing. This means we don’t need to rely on an external Redis server during our tests. Once the tests are done, the container is stopped, leaving no leftover data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;JUnit 5&lt;/strong&gt;: Our main testing framework, which helps us define and structure tests with lifecycle methods like @BeforeEach and @AfterEach.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AssertJ&lt;/strong&gt;: A library that makes assertions readable and expressive, like assertThat(result).isTrue().&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Let’s begin by adding the necessary dependencies to our pom.xml.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding Dependencies
&lt;/h3&gt;

&lt;p&gt;Here’s what you’ll need in your Maven pom.xml file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;org.junit.jupiter&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;junit-jupiter-engine&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;5.10.0&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;com.redis&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;testcontainers-redis&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;2.2.2&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;

    &amp;lt;dependency&amp;gt;
        &amp;lt;groupId&amp;gt;org.assertj&amp;lt;/groupId&amp;gt;
        &amp;lt;artifactId&amp;gt;assertj-core&amp;lt;/artifactId&amp;gt;
        &amp;lt;version&amp;gt;3.11.1&amp;lt;/version&amp;gt;
        &amp;lt;scope&amp;gt;test&amp;lt;/scope&amp;gt;
    &amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you’ve added these dependencies, you’re ready to start writing your test class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting Up the Test Class
&lt;/h3&gt;

&lt;p&gt;The first step is to create a test class named FixedWindowRateLimiterTest. Inside, we’ll define three main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Redis Test Container&lt;/strong&gt;: This launches a Redis instance in a Docker container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Jedis Instance&lt;/strong&gt;: This connects to the Redis container for sending commands.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Rate Limiter&lt;/strong&gt;: The actual FixedWindowRateLimiter instance we’re testing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s how the skeleton of our test class looks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;public class FixedWindowRateLimiterTest {

        private static final RedisContainer redisContainer = new RedisContainer("redis:latest")
                .withExposedPorts(6379);

        private Jedis jedis;
        private FixedWindowRateLimiter rateLimiter;

        // Start Redis container once before any tests run
        static {
            redisContainer.start();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Preparing the Environment Before Each Test&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Before running any test, we need to ensure a clean Redis environment. Here’s what we’ll do:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Connect to Redis&lt;/strong&gt;: Use a Jedis instance to connect to the Redis container.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flush Data&lt;/strong&gt;: Clear any leftover data in Redis to ensure consistent results for each test.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We’ll set this up in a method annotated with @BeforeEach, which runs before every test case.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @BeforeEach
    public void setup() {
        jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
        jedis.flushAll();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;FLUSHALL is an actual Redis command that deletes all the keys of all the existing databases. &lt;a href="https://redis.io/docs/latest/commands/flushall/" rel="noopener noreferrer"&gt;Read more about it in the official documentation&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Cleaning Up After Each Test&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;After each test, we need to close the Jedis connection to free up resources. This ensures no lingering connections interfere with subsequent tests.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @AfterEach
    public void tearDown() {
        jedis.close();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Full Setup&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;Here’s how the complete test class looks with everything in place:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    public class FixedWindowRateLimiterTest {
        private static final RedisContainer redisContainer = new RedisContainer("redis:latest")
                .withExposedPorts(6379);

        private Jedis jedis;
        private FixedWindowRateLimiter rateLimiter;

        static {
            redisContainer.start();
        }

        @BeforeEach
        public void setup() {
            jedis = new Jedis(redisContainer.getHost(), redisContainer.getFirstMappedPort());
            jedis.flushAll();
        }

        @AfterEach
        public void tearDown() {
            jedis.close();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests Within the Limit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter allows requests within the defined limit.&lt;/p&gt;

&lt;p&gt;We configure it with a &lt;strong&gt;limit of&lt;/strong&gt; &lt;strong&gt;5 requests&lt;/strong&gt; and a &lt;strong&gt;10-second window&lt;/strong&gt;, then call isAllowed(“client-1”) &lt;strong&gt;5 times&lt;/strong&gt;. Each call should return true, confirming the rate limiter correctly tracks and permits requests under the limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldAllowRequestsWithinLimit() {
        rateLimiter = new FixedWindowRateLimiter(jedis, 10, 5);
        for (int i = 1; i &amp;lt;= 5; i++) {
            assertThat(rateLimiter.isAllowed("client-1"))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verifying &lt;strong&gt;Requests Beyond the Limit&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter correctly denies requests once the defined limit is exceeded.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of&lt;/strong&gt; &lt;strong&gt;5 requests&lt;/strong&gt; in a &lt;strong&gt;60-second window&lt;/strong&gt;, we call isAllowed(“client-1”) &lt;strong&gt;5 times&lt;/strong&gt; and expect all to return true. On the 6th call, it should return false, verifying the rate limiter blocks requests beyond the allowed limit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldDenyRequestsOnceLimitIsExceeded() {
        rateLimiter = new FixedWindowRateLimiter(jedis, 60, 5);
        for (int i = 1; i &amp;lt;= 5; i++) {
            assertThat(rateLimiter.isAllowed("client-1"))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed("client-1"))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests After Window Reset&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter resets correctly after the fixed window expires.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; and a &lt;strong&gt;1-second window&lt;/strong&gt;, the first 5 requests (isAllowed(“client-1”)) return true, while the 6th request is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for the window to expire, the next request is allowed (true), confirming the reset behavior works as expected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldAllowRequestsAgainAfterFixedWindowResets() throws InterruptedException {
        int limit = 5;
        String clientId = "client-1";
        int windowSize = 1;
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();

        Thread.sleep((windowSize + 1) * 1000);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request after window reset should be allowed")
                .isTrue();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Independent Handling of Multiple Clients&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter handles multiple clients independently.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 5 requests&lt;/strong&gt; and a &lt;strong&gt;10-second window&lt;/strong&gt;, the first 5 requests from &lt;strong&gt;client-1&lt;/strong&gt; are allowed (true), while the 6th is denied (false).&lt;/p&gt;

&lt;p&gt;Simultaneously, all 5 requests from &lt;strong&gt;client-2&lt;/strong&gt; are allowed (true), confirming the rate limiter maintains separate counters for each client.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldHandleMultipleClientsIndependently() {
        int limit = 5;
        String clientId1 = "client-1";
        String clientId2 = "client-2";
        int windowSize = 10;
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId1))
                    .withFailMessage("Client 1 request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId1))
                .withFailMessage("Client 1 request beyond limit should be denied")
                .isFalse();

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId2))
                    .withFailMessage("Client 2 request " + i + " should be allowed")
                    .isTrue();
        }
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Requests Are Denied Until Fixed Window Resets&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures the rate limiter denies additional requests until the fixed window expires.&lt;/p&gt;

&lt;p&gt;Configured with a &lt;strong&gt;limit of 3 requests&lt;/strong&gt; and a &lt;strong&gt;5-second window&lt;/strong&gt;, the first 3 requests (isAllowed(“client-1”)) are allowed (true), while the 4th is denied (false).&lt;/p&gt;

&lt;p&gt;After waiting for half the window duration (2.5 seconds), requests are still denied (false).&lt;/p&gt;

&lt;p&gt;Once the window fully resets (after another 2.5 seconds), the next request is allowed (true), confirming proper behavior during and after the fixed window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void shouldDenyAdditionalRequestsUntilFixedWindowResets() throws InterruptedException {
        int limit = 3;
        int windowSize = 5;
        String clientId = "client-1";
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed within limit")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request beyond limit should be denied")
                .isFalse();

        Thread.sleep(2500);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request should still be denied within the same fixed window")
                .isFalse();

        Thread.sleep(2500);

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("Request should be allowed after fixed window reset")
                .isTrue();
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  &lt;strong&gt;Verifying Denied Requests Are Not Counted&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This test ensures that requests denied by the rate limiter are not included in the request count.&lt;/p&gt;

&lt;p&gt;Configured with a limit of 3 requests and a 5-second window, the first 3 requests (isAllowed(“client-1”)) are allowed (true), while the 4th is denied (false).&lt;/p&gt;

&lt;p&gt;Afterward, the Redis key for the client is checked to confirm the stored count equals the limit (3), ensuring denied requests do not increase the counter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    @Test
    public void testRateLimitDeniedRequestsAreNotCounted() {
        int limit = 3;
        int windowSize = 5;
        String clientId = "client-1";
        rateLimiter = new FixedWindowRateLimiter(jedis, windowSize, limit);

        for (int i = 1; i &amp;lt;= limit; i++) {
            assertThat(rateLimiter.isAllowed(clientId))
                    .withFailMessage("Request " + i + " should be allowed")
                    .isTrue();
        }

        assertThat(rateLimiter.isAllowed(clientId))
                .withFailMessage("This request should be denied")
                .isFalse();

        String key = "rate_limit:" + clientId;
        int requestCount = Integer.parseInt(jedis.get(key));
        assertThat(requestCount)
                .withFailMessage("The count (" + requestCount + ") should be equal to the limit (" + limit + "), not counting the denied request")
                .isEqualTo(limit);
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Is there any other behavior we should verify? Let me know in the comments!&lt;/p&gt;

&lt;p&gt;The Fixed Window Rate Limiter is a simple yet effective way to manage request rates, and &lt;strong&gt;Redis&lt;/strong&gt; makes it incredibly fast and reliable.&lt;/p&gt;

&lt;p&gt;By using commands like INCR and EXPIRE, we created a solution that tracks and limits requests while automatically resetting counters when the time window expires.&lt;/p&gt;

&lt;p&gt;With &lt;strong&gt;Jedis&lt;/strong&gt;, we built an easy-to-understand Java implementation, and thanks to thorough testing with Redis TestContainers, JUnit 5, and AssertJ, we can trust it works as expected.&lt;/p&gt;

&lt;p&gt;This approach is a great starting point for handling request limits and can easily be adapted for more complex scenarios if needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  GitHub Repo
&lt;/h3&gt;

&lt;p&gt;You can find this implementation in &lt;strong&gt;Java&lt;/strong&gt; and &lt;strong&gt;Kotlin&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Java (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/main/java/io/redis/FixedWindowRateLimiter.java" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-java-example/blob/main/src/test/java/io/redis/FixedWindowRateLimiterTest.java" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kotlin (&lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/main/kotlin/org/example/FixedWindowRateLimiter.kt" rel="noopener noreferrer"&gt;Implementation&lt;/a&gt;, &lt;a href="https://github.com/raphaeldelio/redis-rate-limiter-kotlin-example/blob/main/src/test/kotlin/org/example/FixedWindowRateLimiterTest.kt" rel="noopener noreferrer"&gt;Test&lt;/a&gt;)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Stay Curious!
&lt;/h3&gt;

</description>
      <category>java</category>
      <category>redis</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Rate limiting with Redis: An essential guide</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Mon, 23 Dec 2024 15:28:17 +0000</pubDate>
      <link>https://dev.to/redis/rate-limiting-with-redis-an-essential-guide-4jll</link>
      <guid>https://dev.to/redis/rate-limiting-with-redis-an-essential-guide-4jll</guid>
<description>&lt;p&gt;&lt;a href="https://bsky.app/profile/raphaeldelio.dev" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; | &lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.youtube.com/watch?v=YV4ePyW3DO8" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rate limiting — it’s something you’ve likely encountered, even if you haven’t directly implemented one. For example, have you ever been greeted by a “429 Too Many Requests” error? That’s a rate limiter in action, protecting a resource from overload. Or maybe you’ve used a service with explicit request quotas based on your payment tier — same concept, just more transparent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxvzr7zxnbw9wpydi4v9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffxvzr7zxnbw9wpydi4v9.png" alt="ChatGPT warning user that they have reached the limit of messages they can send in 24 hours." width="800" height="107"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Rate limiting isn’t just about setting limits; it serves a variety of purposes. Take Figma, for instance. Their rate limiter, built with Redis, saved them from a spam attack where bad actors sent massive document invitations to random email addresses. Without it, Figma could have faced skyrocketing email delivery costs and damaged reputation. Or look at Stripe: as their platform grew, they realized they couldn’t just throw more infrastructure at the problem. They needed a smarter solution to prevent resource monopolization by misconfigured scripts or bad actors.&lt;/p&gt;

&lt;p&gt;These stories show just how versatile rate limiting is. It prevents abuse, ensures fair access, manages load, cuts costs, and even protects against downtime. But here’s the kicker: the hard part isn’t knowing &lt;em&gt;why&lt;/em&gt; you need a rate limiter. The real challenge is building one that’s both efficient and tailored to your needs.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Why Redis for Rate Limiting?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Redis has become a go-to tool for implementing rate limiters, and for good reason. It’s fast, reliable, and packed with features like atomic operations, data persistence, and Lua scripting. Just ask GitHub. When they migrated to a Redis-backed solution with client-side sharding, they solved tough challenges like replication, consistency, and scalability while ensuring reliable behavior across their infrastructure.&lt;/p&gt;

&lt;p&gt;So, why Redis? Its speed, versatility, and built-in capabilities make it perfect for handling distributed traffic patterns. But what’s even more important is &lt;em&gt;how&lt;/em&gt; you use it. Let’s break down the most common rate-limiting patterns you can implement with Redis and what each one brings to the table.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Popular Rate-Limiting Patterns&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Choosing the right rate-limiting algorithm can be challenging. Here’s a breakdown of the most popular options, when to use them, and their trade-offs, with practical examples to help you decide:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Leaky Bucket&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How It Works&lt;/strong&gt;: Imagine a bucket with a small hole at the bottom. Requests (water) are added to the bucket and processed at a steady “drip” rate, preventing sudden floods.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2FLeaky-Bucket.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2FLeaky-Bucket.gif" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; Ideal for smoothing traffic flow, such as in streaming services or payment processing, where a predictable output is critical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A video streaming platform regulates API calls to its content delivery network, ensuring consistent playback quality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drawback:&lt;/strong&gt; Not suitable for handling sudden bursts, like flash sales or promotional campaigns.&lt;/p&gt;
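&lt;p&gt;To make the “drip” concrete, here is a minimal in-memory sketch of the leaky bucket logic (the class and method names are illustrative, not from any library). A Redis-backed version typically models the bucket as a LIST: a request is enqueued with RPUSH after checking LLEN against the capacity, while a worker pops entries at the fixed drip rate.&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative in-memory leaky bucket. In Redis, the Deque below would be
// a LIST per client: LLEN to check capacity, RPUSH to enqueue, LPOP to drip.
class LeakyBucketRateLimiter {
    private final int capacity;
    private final Deque<String> bucket = new ArrayDeque<>();

    LeakyBucketRateLimiter(int capacity) {
        this.capacity = capacity;
    }

    // Called on each incoming request; false means the bucket overflowed.
    synchronized boolean tryEnqueue(String requestId) {
        if (bucket.size() >= capacity) {
            return false; // bucket is full: reject the request
        }
        bucket.addLast(requestId);
        return true;
    }

    // Called by a scheduler at the steady drip rate.
    synchronized String drip() {
        return bucket.pollFirst(); // process one request, or null if empty
    }
}
```

&lt;p&gt;The steady output rate comes entirely from how often &lt;code&gt;drip()&lt;/code&gt; is scheduled, which is exactly why this pattern struggles with legitimate bursts.&lt;/p&gt;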

&lt;h3&gt;
  
  
  &lt;strong&gt;Token Bucket&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt; Tokens are generated at a fixed rate and stored in a bucket. Each request consumes a token, allowing for short bursts as long as tokens are available.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2Ftoken-bucket.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2Ftoken-bucket.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; Perfect for APIs that need to handle occasional traffic spikes while enforcing overall limits, such as login attempts or search queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; An e-commerce site allows bursts of up to 20 requests per second during checkout but limits the overall rate to 100 requests per minute.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drawback:&lt;/strong&gt; Requires periodic token replenishment, which can introduce minor overhead in distributed systems.&lt;/p&gt;
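&lt;p&gt;Here is a minimal in-memory sketch of the token bucket logic (names are illustrative). In a Redis deployment, the token count and last-refill timestamp are usually stored together and updated atomically, often with a Lua script, so concurrent clients can’t double-spend tokens.&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory token bucket. A Redis version would keep
// {tokens, lastRefillMs} per client in a hash and update both atomically.
class TokenBucketRateLimiter {
    private final double capacity;    // maximum burst size
    private final double refillPerMs; // tokens added per millisecond
    private final Map<String, double[]> state = new HashMap<>(); // {tokens, lastRefillMs}

    TokenBucketRateLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
    }

    synchronized boolean isAllowed(String clientId, long nowMs) {
        double[] s = state.computeIfAbsent(clientId, k -> new double[] {capacity, nowMs});
        // Refill tokens for the time elapsed since the last request, capped at capacity
        s[0] = Math.min(capacity, s[0] + (nowMs - s[1]) * refillPerMs);
        s[1] = nowMs;
        if (s[0] >= 1.0) {
            s[0] -= 1.0; // consume one token
            return true;
        }
        return false; // bucket empty: reject
    }
}
```

&lt;p&gt;Bursts are allowed up to &lt;code&gt;capacity&lt;/code&gt;, while the refill rate enforces the long-term limit.&lt;/p&gt;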

&lt;h3&gt;
  
  
  Fixed Window Counter
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt; Tracks the number of requests in fixed intervals (e.g., 1 minute). Once the limit is reached, all subsequent requests in that window are denied.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbib74ec8hvwfgiqnctyb.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbib74ec8hvwfgiqnctyb.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; Simple APIs with predictable traffic and low precision needs, like throttling a hobbyist developer’s free-tier usage.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A public weather API allows 100 requests per user per minute, with any extra requests returning a “429 Too Many Requests” response.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drawback:&lt;/strong&gt; Users can game the system by stacking requests at the boundary of two time windows (e.g., 100 at 59 seconds and 100 at 1 second of the next window).&lt;/p&gt;
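&lt;p&gt;Here is a minimal in-memory sketch of the fixed window counter (names are illustrative). The Redis equivalent is a single INCR on a key derived from the current window (e.g., &lt;code&gt;rate:{user}:{windowStart}&lt;/code&gt;), with an EXPIRE set on the first increment so counters clean themselves up.&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory fixed window counter. In Redis this is INCR on a
// window-scoped key plus EXPIRE, so no manual reset logic is needed.
class FixedWindowRateLimiter {
    private final int limit;
    private final long windowMillis;
    private final Map<String, Integer> counters = new HashMap<>();
    private final Map<String, Long> windowStarts = new HashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean isAllowed(String clientId, long nowMs) {
        long windowStart = nowMs - (nowMs % windowMillis); // fixed window boundary
        Long prev = windowStarts.get(clientId);
        if (prev == null || prev != windowStart) {
            windowStarts.put(clientId, windowStart); // new window: reset the counter
            counters.put(clientId, 0);
        }
        int count = counters.merge(clientId, 1, Integer::sum); // Redis: INCR
        return count <= limit;
    }
}
```

&lt;p&gt;Note how the counter resets abruptly at every boundary — that hard reset is the source of the boundary-stacking drawback above.&lt;/p&gt;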

&lt;h3&gt;
  
  
  &lt;strong&gt;Sliding Window Log&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt; Maintains a log of timestamps for each request and calculates limits based on a rolling time window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijey90dcz0ijt3cxlqjw.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fijey90dcz0ijt3cxlqjw.gif" width="720" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; Critical systems requiring high accuracy, such as financial transaction APIs or fraud detection mechanisms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A banking API limits withdrawals to 10 per hour, with each new request evaluated against the timestamps of the last 10 requests.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drawback:&lt;/strong&gt; High memory usage and computational cost when scaling to millions of users or frequent requests.&lt;/p&gt;
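&lt;p&gt;Here is a minimal in-memory sketch of the sliding window log (names are illustrative). In Redis the log is typically a sorted set per client: ZADD the request timestamp, ZREMRANGEBYSCORE to evict entries older than the window, then ZCARD to count what remains.&lt;/p&gt;

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory sliding window log. The Deque plays the role of a
// Redis sorted set keyed by timestamp (ZADD / ZREMRANGEBYSCORE / ZCARD).
class SlidingWindowLogRateLimiter {
    private final int limit;
    private final long windowMs;
    private final Map<String, Deque<Long>> logs = new HashMap<>();

    SlidingWindowLogRateLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    synchronized boolean isAllowed(String clientId, long nowMs) {
        Deque<Long> log = logs.computeIfAbsent(clientId, k -> new ArrayDeque<>());
        // Evict timestamps that slid out of the window (ZREMRANGEBYSCORE)
        while (!log.isEmpty() && log.peekFirst() <= nowMs - windowMs) {
            log.pollFirst();
        }
        if (log.size() >= limit) { // ZCARD check
            return false;
        }
        log.addLast(nowMs); // ZADD the new request's timestamp
        return true;
    }
}
```

&lt;p&gt;Because every request is stored individually, enforcement is exact — and that is also why memory grows with traffic.&lt;/p&gt;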

&lt;h3&gt;
  
  
  &lt;strong&gt;Sliding Window Counter&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How It Works:&lt;/strong&gt; Divides the time window into smaller intervals (e.g., 10-second buckets) and aggregates request counts to approximate a rolling window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2FSliding-Window-Counter.gif%3Fresize%3D1080%252C608%26ssl%3D1" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fi0.wp.com%2Fraphaeldelio.com%2Fwp-content%2Fuploads%2F2024%2F12%2FSliding-Window-Counter.gif%3Fresize%3D1080%252C608%26ssl%3D1" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Use Cases:&lt;/strong&gt; APIs that need a balance between accuracy and efficiency, like chat systems or lightweight rate-limiting for microservices.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt; A messaging app limits users to 30 messages per minute but divides the minute into 6 buckets, allowing more flexibility in traffic patterns.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Drawback:&lt;/strong&gt; Small inaccuracies can occur, especially during highly bursty traffic patterns.&lt;/p&gt;
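&lt;p&gt;Here is a minimal in-memory sketch of the sliding window counter (names are illustrative). A Redis version keeps one counter per sub-bucket — for example, HINCRBY on fields keyed by bucket index, or INCR on per-bucket keys with a TTL slightly longer than the window — and sums the live buckets on each check.&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative in-memory sliding window counter. Each sub-bucket maps to
// one Redis counter (e.g., HINCRBY field per bucket index); expired buckets
// simply age out via TTL instead of the removeIf below.
class SlidingWindowCounterRateLimiter {
    private final int limit;
    private final long bucketMs;
    private final int bucketsPerWindow;
    private final Map<String, Map<Long, Integer>> counters = new HashMap<>();

    SlidingWindowCounterRateLimiter(int limit, long windowMs, int bucketsPerWindow) {
        this.limit = limit;
        this.bucketMs = windowMs / bucketsPerWindow;
        this.bucketsPerWindow = bucketsPerWindow;
    }

    synchronized boolean isAllowed(String clientId, long nowMs) {
        long currentBucket = nowMs / bucketMs;
        Map<Long, Integer> buckets = counters.computeIfAbsent(clientId, k -> new HashMap<>());
        // Drop sub-buckets that have slid entirely out of the window
        buckets.keySet().removeIf(b -> b <= currentBucket - bucketsPerWindow);
        int total = buckets.values().stream().mapToInt(Integer::intValue).sum();
        if (total >= limit) {
            return false;
        }
        buckets.merge(currentBucket, 1, Integer::sum); // HINCRBY equivalent
        return true;
    }
}
```

&lt;p&gt;The approximation comes from evicting whole sub-buckets at once: the finer the buckets, the closer this gets to the exact sliding window log at a fraction of the memory.&lt;/p&gt;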

&lt;h2&gt;
  
  
  &lt;strong&gt;Choosing the Right Tool for the Job&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Selecting a rate-limiting strategy isn’t just about matching patterns to scenarios; it’s about understanding the trade-offs and the specific needs of your application. Here’s how to make a more informed choice:&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Understand Your Traffic Patterns&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Predictable Traffic&lt;/strong&gt;: If your API serves consistent request rates (e.g., hourly status checks or regular polling), &lt;strong&gt;Leaky Bucket&lt;/strong&gt; is excellent for maintaining a steady flow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Burst Traffic&lt;/strong&gt;: If you expect short bursts of traffic, such as during product launches or login spikes, &lt;strong&gt;Token Bucket&lt;/strong&gt; allows controlled bursts while enforcing limits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Mixed Traffic&lt;/strong&gt;: APIs with unpredictable traffic may benefit from &lt;strong&gt;Sliding Window Counter&lt;/strong&gt;, which balances accuracy and resource usage.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Assess the Level of Precision Needed&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;High Precision&lt;/strong&gt;: If exact limits are critical (e.g., financial transactions or fraud detection), &lt;strong&gt;Sliding Window Log&lt;/strong&gt; provides the most accurate enforcement by logging every request.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Approximation is Okay&lt;/strong&gt;: For most APIs, &lt;strong&gt;Sliding Window Counter&lt;/strong&gt; strikes a balance between precision and efficiency, as it uses aggregated data instead of tracking every request.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Consider Resource Constraints&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Memory and CPU Overhead&lt;/strong&gt;: Algorithms like &lt;strong&gt;Sliding Window Log&lt;/strong&gt; can become resource-intensive at scale, especially with millions of users. For a lightweight alternative, &lt;strong&gt;Fixed Window Counter&lt;/strong&gt; is simple but effective for low-traffic APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Redis makes scaling rate limiting easier with atomic operations, Lua scripting, and replication features, but your choice of algorithm still affects performance. For instance, &lt;strong&gt;Token Bucket&lt;/strong&gt; is computationally cheaper than &lt;strong&gt;Sliding Window Log&lt;/strong&gt; in most distributed systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Account for User Experience
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User Tolerance for Errors&lt;/strong&gt;: Fixed-window approaches like &lt;strong&gt;Fixed Window Counter&lt;/strong&gt; may frustrate users due to rigid resets. Sliding-window methods smooth out these boundaries, leading to a better user experience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Handling Edge Cases&lt;/strong&gt;: Algorithms like &lt;strong&gt;Token Bucket&lt;/strong&gt; allow some flexibility for bursts, which can help avoid unnecessary rate-limit errors during legitimate usage spikes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the end, rate limiting is about more than just enforcing boundaries — it’s about designing systems that are efficient, fair, and user-friendly. By carefully matching the algorithm to your use case, you’re not just managing traffic — you’re shaping a better experience for everyone involved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stay curious!
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiob68ouzayite9w4zs56.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiob68ouzayite9w4zs56.png" width="800" height="418"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>redis</category>
      <category>systemdesign</category>
      <category>java</category>
      <category>architecture</category>
    </item>
    <item>
      <title>What do 200 electrocuted monks have to do with Redis 8, the fastest Redis ever?</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Tue, 19 Nov 2024 10:49:43 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/what-do-200-electrocuted-monks-have-to-do-with-redis-8-the-fastest-redis-ever-3kca</link>
      <guid>https://dev.to/raphaeldelio/what-do-200-electrocuted-monks-have-to-do-with-redis-8-the-fastest-redis-ever-3kca</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://bsky.app/profile/raphaeldelio.dev" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; | &lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://youtu.be/ok2mSw-z1Q0" rel="noopener noreferrer"&gt;This article is also available on YouTube!&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m6owqefila8qermn1e7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8m6owqefila8qermn1e7.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Have you ever heard of Jean-Antoine Nollet? Back in the 18th century, Nollet carried out an experiment where he lined up 200 monks, each connected hand-to-hand with iron wires, forming a continuous chain over a mile (1.6 km) long. &lt;strong&gt;Once everything was set up, he connected a primitive electrical battery to the line, delivering a powerful electric shock to all of them simultaneously.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now, &lt;strong&gt;Nollet wasn’t just zapping monks for kicks.&lt;/strong&gt; His experiment had a serious purpose: to study the properties of electricity and see how far and how fast it could travel along a wire. &lt;strong&gt;This was groundbreaking at a time when sending a message 100 miles took nearly a day by horseback&lt;/strong&gt;. Nollet’s work hinted at something revolutionary — &lt;em&gt;the potential for electricity to be used for communication&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Fast forward to the 19th century, and the telegraph brought this idea to life. &lt;strong&gt;Suddenly, messages that used to take days could travel in minutes.&lt;/strong&gt; Samuel Morse and other inventors transformed Nollet’s findings into a world-changing technology. &lt;strong&gt;The telegraph became the 19th-century equivalent of the internet, connecting people in ways no one had imagined before.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;As Tom Standage describes in &lt;em&gt;The Victorian Internet&lt;/em&gt;, &lt;em&gt;the telegraph was so fast it scared some people&lt;/em&gt;. Critics even argued it was “too fast for the truth.” It sounds funny now, doesn’t it?&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Today, we see the internet as almost instantaneous, but back then, the telegraph felt like a leap into hyperspeed.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;That said, even with our modern tech, &lt;strong&gt;we sometimes still think the internet is slow&lt;/strong&gt;. To Nollet, the speed we’ve reached would have been incomprehensible, but we know there are limits. For example, &lt;strong&gt;even if data could travel at the speed of light in a vacuum, it would still take about 56.7 milliseconds to get from London to Sydney.&lt;/strong&gt; That’s just physics — it can’t get any faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;But speed isn’t just about how fast data travels; it’s also about how quickly it gets processed.&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;With applications like real-time gaming, video streaming, and AI-powered services, every millisecond matters. &lt;strong&gt;That’s where Redis comes in.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis&lt;/strong&gt; is an in-memory database designed for speed. Unlike traditional databases that rely on disks, &lt;strong&gt;Redis keeps everything in RAM, giving you access times measured in microseconds.&lt;/strong&gt; This makes it ideal for real-time analytics, online gaming, and AI workloads where responsiveness is critical.&lt;/p&gt;

&lt;h3&gt;
  
  
  And guess what? Redis just got even faster with Redis 8.
&lt;/h3&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Redis 8: Faster Than Ever&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The latest milestone, Redis 8.0 M02, brings significant latency reductions across widely-used commands, such as up to a 36% reduction in latency for ZADD, 28% for SMEMBERS, and 10% for HGETALL compared to Redis 7.2.5. Over 70% of Redis users will experience noticeably faster responses with these improvements.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis, like the telegraph once was, is revolutionizing our expectations of speed.&lt;/strong&gt; It ensures that not only does data reach its destination quickly, but that it’s immediately available for processing and analysis. &lt;strong&gt;In a world where even a 100-millisecond delay can impact user experience, Redis plays a crucial role in minimizing the lag.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Scaling Like Never Before&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Redis 8 isn’t just faster, it’s more scalable too&lt;/strong&gt;. It brings features that were previously only available in Redis Cloud and Redis Software, like horizontal and vertical scaling for the Redis Query Engine.&lt;/p&gt;

&lt;p&gt;With horizontal scaling, you can handle much larger datasets by clustering databases, which boosts read and write throughput. &lt;strong&gt;Vertical scaling adds processing power, delivering up to 16x more throughput.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Benchmarking Redis 8: Breaking Records&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;To showcase its improvements, Redis partnered with Intel to test its performance with one billion 768-dimensional vector embeddings. The results? &lt;strong&gt;Redis handled up to 66,000 vector insertions per second with indexing for 95% precision and up to 160,000 insertions per second for lower precision indexing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Even with high-precision queries, Redis delivered a median latency of 200 milliseconds for a 90% precision rate when searching the top 100 nearest neighbors.&lt;/strong&gt; And by tweaking HNSW (Hierarchical Navigable Small World) parameters, you can fine-tune Redis to balance speed and accuracy for your specific use case.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://redis.io/blog/redis-8-0-m02-the-fastest-redis-ever/" rel="noopener noreferrer"&gt;See more of the benchmarks in the official Redis Blog.&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Try Redis 8 Today&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Redis 8.0 M02 is available now, and you can experience its speed and scalability for yourself. &lt;strong&gt;Whether you’re looking for better latency, scalable query engines, or support for billion-scale vector search workloads, Redis 8 is ready to deliver.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Start experimenting today by downloading an Alpine or Debian Docker image from &lt;a href="https://hub.docker.com/_/redis" rel="noopener noreferrer"&gt;Redis Docker Hub&lt;/a&gt;. See what Redis 8 can do for your real-time applications!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faehfczgj0dqxzsgqs4p9.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faehfczgj0dqxzsgqs4p9.gif" width="1080" height="608"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>redis</category>
      <category>database</category>
      <category>opensource</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>Don't forget to flush! — Ensuring Data Integrity in Spring Data JPA</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Sun, 29 Sep 2024 12:05:14 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/dont-forget-to-flush-ensuring-data-integrity-in-spring-data-jpa-aab</link>
      <guid>https://dev.to/raphaeldelio/dont-forget-to-flush-ensuring-data-integrity-in-spring-data-jpa-aab</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0qi8sl9n468xyfiazyx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl0qi8sl9n468xyfiazyx.png" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Just like you wouldn’t leave the bathroom without flushing, you shouldn’t navigate through Spring Data JPA without understanding the importance of flushing. Flushing, in the context of JPA (Java Persistence API), is like telling your application, “Hey, let’s make sure all our pending changes to the database are actually sent and stored properly!”. It ensures that your in-memory changes are synchronized with the database.&lt;/p&gt;

&lt;p&gt;Imagine you’re editing a document; flushing is like hitting the ‘save’ button to ensure all your changes are permanently stored. In the context of JPA, this means ensuring that any modifications made to your entities are actually reflected in the database. It’s a process that can happen automatically, like a sensor-flush in modern toilets, or manually, where you decide the right moment to sync, similar to the traditional toilet flush lever.&lt;/p&gt;

&lt;p&gt;Grasping the flushing mechanism is vital. Without proper flushing, you might end up with data discrepancies, where changes in your application’s memory don’t match what’s in the database. It’s like assuming your toilet will flush on its own, only to find out it doesn’t, leading to an unpleasant situation. Proper flushing ensures that your data integrity is maintained and your application’s interaction with the database is smooth and error-free.&lt;/p&gt;

&lt;p&gt;Let’s take a look at an example:&lt;/p&gt;

&lt;h3&gt;
  
  
  The Deduplication Strategy with Flushing in Spring Boot JPA
&lt;/h3&gt;

&lt;p&gt;Imagine you’re working with a function in Spring Boot that should run only once for a unique set of parameters. To ensure this uniqueness, you use a deduplication strategy involving a database table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Transactional
public void processIdempotent(
        String eventId,
        String data
) {
    deduplicate(eventId);
    updateDatabase(data);
    sendMessage(data);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  The Deduplication Table:
&lt;/h3&gt;

&lt;p&gt;You create a special table in your database. This table’s job is to store each unique set of parameters your function uses. It’s designed so that if you try to insert a set of parameters that’s already in the table, the database will throw a constraint violation exception.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Entity(name="processed_events")
public class ProcessedEvent implements Serializable, Persistable&amp;lt;String&amp;gt; {

    @Id
    @Column(name="eventid")
    private String eventId;

    public ProcessedEvent(){}

    public ProcessedEvent(final String eventId) {
        this.eventId = eventId;
    }

    /**
     * Ensures Hibernate always does an INSERT operation when save() is called.
     */
    @Transient
    @Override
    public boolean isNew() {
        return true;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Transactional Integrity and the Challenge of Parallel Execution:
&lt;/h3&gt;

&lt;p&gt;In Spring Boot JPA, database interactions are often wrapped in transactions. This means all operations, including the insertion into your deduplication table, are only finalized when the transaction commits. If any part of the transaction fails, everything is rolled back.&lt;/p&gt;

&lt;p&gt;However, imagine two instances of your function running at the same time, each within its own transaction. They both check the deduplication table and, finding no existing entries for their parameters, proceed.&lt;/p&gt;

&lt;p&gt;Even though one of the transactions will fail by the time it tries to commit, this may still cause inconsistencies, especially when your function interacts with external systems, such as a message broker or a REST API, operations that won't be rolled back with the database.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixz4qnwxd3cj9oup6zgv.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixz4qnwxd3cj9oup6zgv.gif" width="760" height="491"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Flushing Solution:
&lt;/h3&gt;

&lt;p&gt;To prevent this issue, you can use flushing right after inserting into the deduplication table. Flushing forces JPA to immediately synchronize the current state of the session with the database. So, if two instances of the function run in parallel, as soon as one tries to flush its insertion into the deduplication table, it’ll either succeed or fail immediately if the other has already inserted the same parameters.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;private void deduplicate(UUID eventId) throws DuplicateEventException {
    try {
        processedEventRepository.saveAndFlush(new
ProcessedEvent(eventId));
        log.debug("Event persisted with Id: {}", eventId);
    } catch (DataIntegrityViolationException | PessimisticLockingFailureException e) {
        log.warn("Event already processed: {}", eventId);
        throw new DuplicateEventException(eventId);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This immediate feedback is crucial. It prevents the function from fully executing if another instance has already run with the same parameters, ensuring that each unique set of parameters triggers the function only once. Flushing here acts as an early alert system, maintaining the integrity of your deduplication logic and preventing potential inconsistencies, especially when your function interacts with other systems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lspx4pp3fvv5ytwzbd1.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6lspx4pp3fvv5ytwzbd1.gif" width="760" height="491"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;As a developer, knowing when to flush in JPA is key to ensuring your data changes are properly saved and reflected in the database. It’s one of those fundamental skills that can save you from a lot of headaches down the road. So, remember to flush wisely and keep your data in sync — it’s as crucial in JPA as it is in real life after using the restroom!&lt;/p&gt;

&lt;h3&gt;
  
  
  Stay curious!
&lt;/h3&gt;

&lt;h2&gt;
  
  
  Contribute
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Writing takes time and effort.&lt;/em&gt;&lt;/strong&gt; I love writing and sharing knowledge, but I also have bills to pay. If you like my work, please, &lt;strong&gt;consider donating through Buy Me a Coffee: &lt;a href="https://www.buymeacoffee.com/RaphaelDeLio" rel="noopener noreferrer"&gt;https://www.buymeacoffee.com/RaphaelDeLio&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Or by sending me BitCoin: 1HjG7pmghg3Z8RATH4aiUWr156BGafJ6Zw&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow Me on Social Media
&lt;/h2&gt;

&lt;p&gt;Stay connected and dive deeper into the world of Spring with me! Follow my journey across all major social platforms for exclusive content, tips, and discussions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;/p&gt;

</description>
      <category>java</category>
      <category>springboot</category>
      <category>database</category>
      <category>spring</category>
    </item>
    <item>
      <title>The 6 Principles of Microservices Architecture</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Sat, 28 Sep 2024 10:36:15 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/the-6-principles-of-microservices-architecture-17ng</link>
      <guid>https://dev.to/raphaeldelio/the-6-principles-of-microservices-architecture-17ng</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I recently attended &lt;a href="https://www.linkedin.com/in/urs-peter-70a2882/?originalSubdomain=nl" rel="noopener noreferrer"&gt;Urs Peter&lt;/a&gt;’s course on Event-Driven Architecture, and one of the cool things we dived into right at the start was the six key principles of Microservices architecture.&lt;/p&gt;

&lt;p&gt;It’s important to remember that microservices aren’t a magic fix: they won’t solve every issue, and if implemented incorrectly, they can introduce significant new challenges. So, today, I’m excited to share these six principles of Microservices architecture, which you need to get right for the architecture to work well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ya5oozdz4lgwjzobyvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5ya5oozdz4lgwjzobyvo.png" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Isolation
&lt;/h2&gt;

&lt;p&gt;In a microservices architecture, isolation ensures each microservice functions independently with its own codebase, data storage, and runtime environment, preventing process and resource sharing with other services.&lt;/p&gt;

&lt;p&gt;One of the key advantages of isolation is that it contains failures within a single service. If one microservice fails, it doesn’t necessarily bring down the entire system, as other services continue to operate independently.&lt;/p&gt;

&lt;p&gt;Following this principle means that each microservice owns its data and data model, and that no database is shared directly between services. Data sharing, if necessary, is done through well-defined interfaces (APIs).&lt;/p&gt;

&lt;p&gt;Isolated services also typically run in isolated environments, such as containers, ensuring that issues in one service (like a memory leak) do not affect other services.&lt;/p&gt;

&lt;p&gt;However, isolating our microservices makes the overall architecture more complex, with multiple isolated services interacting with each other.&lt;/p&gt;

&lt;p&gt;Moreover, managing communication between services, especially in an asynchronous environment, may also be challenging. Multiple databases add complexity to ensuring consistency across different services, especially when handling distributed transactions.&lt;/p&gt;

&lt;p&gt;And, naturally, more services mean more deployments, more monitoring, and potentially more points of failure, increasing the overall operational complexity of your system.&lt;/p&gt;

&lt;p&gt;In microservices architecture, isolation is all about finding the perfect balance between letting each service do its own thing and making sure they all work well together. The goal is to create a system where each service can stand on its own, handle problems without causing a domino effect, and easily grow as needed. But at the same time, all these services need to work together smoothly as part of a bigger picture. Getting this balance right isn’t just about the technical aspects; it also involves how teams work together and how the whole operation is run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Autonomy
&lt;/h2&gt;

&lt;p&gt;Each microservice should be autonomous, meaning it makes decisions based on its context without depending on other services. This includes how it processes data, handles business logic, and responds to requests. An autonomous service encapsulates a specific business functionality. It’s responsible for all aspects of that function, from data processing to business rules.&lt;/p&gt;

&lt;p&gt;Teams should be able to develop and test their services independently, using the tools and languages best suited to each service's functionality. They should own their data and define their data schema. This data is exposed only through APIs, so each service retains control over how its data is accessed and used.&lt;/p&gt;

&lt;p&gt;Moreover, services should be deployable independently. This means a service can be updated, fixed, or scaled without needing to redeploy the entire application.&lt;/p&gt;

&lt;p&gt;These benefits come with added complexity. While services are independent, they often need to communicate, and managing these communication patterns without creating tight coupling is a challenge.&lt;/p&gt;

&lt;p&gt;Besides that, autonomous services can lead to duplication of effort or infrastructure, as each service may require its own support mechanisms like databases, caching, and logging.&lt;/p&gt;

&lt;p&gt;Autonomy in microservices is about empowering individual services to operate independently while still contributing effectively to the overall system. It brings significant benefits regarding flexibility, resilience, and development speed. However, it also introduces challenges related to communication, consistency, and potential overhead. Careful design, clear service contracts, and a focus on well-defined boundaries are key to harnessing the full potential of autonomous microservices.&lt;/p&gt;

&lt;h2&gt;
  
  
  Single Responsibility
&lt;/h2&gt;

&lt;p&gt;The Single Responsibility Principle (SRP) is a guiding concept dictating that each service should be responsible for a single piece of functionality or a single aspect of a system’s business logic.&lt;/p&gt;

&lt;p&gt;A microservice following SRP should have one, and only one, reason to change. This means it should focus on a single business capability or function. The service’s responsibilities are well-defined, and it does not overlap with or bleed into the functionalities of other services.&lt;/p&gt;

&lt;p&gt;Properly defining what each service should and should not do is crucial. This often involves identifying domain boundaries, which can be guided by practices like Domain-Driven Design (DDD).&lt;/p&gt;

&lt;p&gt;However, it's important to note that while services should be focused, overly granular services can lead to unnecessary complexity. Finding the right balance between service size and responsibility is key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Exclusive State
&lt;/h2&gt;

&lt;p&gt;Exclusive State emphasizes the importance of each microservice managing its own data independently.&lt;/p&gt;

&lt;p&gt;With exclusive state, each microservice owns and controls its own database or state: no other microservice has direct access to this data. Each service manages its own data schema and storage mechanisms, which may differ from those of other services. Data sharing or synchronization between microservices, if necessary, is achieved through API calls, event streaming, or message brokers, maintaining data encapsulation.&lt;/p&gt;
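&lt;p&gt;To make the idea concrete, here is a toy sketch in plain Java. It is an analogy only: real services would sit behind HTTP or messaging APIs, and the service names below are made up for illustration. Each "service" keeps its store private and exposes data only through its own interface.&lt;/p&gt;

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of exclusive state: each "service" keeps its data store private
// and exposes it only through a well-defined interface, which stands in for
// an HTTP or messaging API in a real system.
public class ExclusiveStateDemo {

    static class CustomerService {
        // Private store: no other service can reach this map directly.
        private final Map db = new HashMap();

        void create(String id, String name) { db.put(id, name); }

        // The only way other services see customer data.
        String getName(String id) { return (String) db.get(id); }
    }

    static class InvoiceService {
        private final CustomerService customers; // depends on the API, not the DB

        InvoiceService(CustomerService customers) { this.customers = customers; }

        String invoiceHeader(String customerId) {
            return "Invoice for " + customers.getName(customerId);
        }
    }

    public static void main(String[] args) {
        CustomerService customers = new CustomerService();
        customers.create("c1", "Ada");
        InvoiceService invoices = new InvoiceService(customers);
        System.out.println(invoices.invoiceHeader("c1")); // prints "Invoice for Ada"
    }
}
```

&lt;p&gt;Swapping &lt;code&gt;CustomerService&lt;/code&gt;'s map for a different storage engine would not affect &lt;code&gt;InvoiceService&lt;/code&gt; at all, which is exactly the point of exclusive state.&lt;/p&gt;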

&lt;p&gt;By owning its data, each service ensures the integrity and consistency of the data it manages. Besides that, different services can scale their data storage and processing capabilities independently based on their specific requirements. Moreover, with exclusive state, the failure of one service’s data store does not directly impact other services, enhancing the system’s overall resilience.&lt;/p&gt;

&lt;p&gt;However, it comes with a price. Transactions and operations that span multiple services become more complex, as they require coordination across independent data stores. Also, managing separate databases or state stores for each service can increase infrastructure complexity and costs.&lt;/p&gt;

&lt;p&gt;In a nutshell, Exclusive State ensures that each service is self-contained in terms of its data, contributing to the overall robustness and scalability of the system. However, it introduces challenges in terms of data management, particularly when dealing with operations that span multiple services. Effective implementation of this principle requires thoughtful system design and a clear understanding of the trade-offs involved in managing data within a distributed environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Async Message Passing
&lt;/h2&gt;

&lt;p&gt;In asynchronous message passing, a microservice sends a message (a request, data, notification) to another service without waiting for an immediate response. The sending service continues its operation and can handle the response at a later point in time. This often involves an event-driven architecture where services react to events and communicate changes through messages.&lt;/p&gt;

&lt;p&gt;Systems implement asynchronous communication using technologies like message queues (e.g., RabbitMQ, Kafka), which store messages until they are processed by the receiving service. Services then notify other parts of the system about changes or updates through events rather than direct calls or requests.&lt;/p&gt;
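&lt;p&gt;As a minimal sketch of this pattern, the example below uses an in-memory &lt;code&gt;BlockingQueue&lt;/code&gt; to stand in for a broker like RabbitMQ or Kafka: the producer enqueues events and moves on, while a consumer thread handles them later. The names and events are hypothetical.&lt;/p&gt;

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy illustration of async message passing: an in-memory queue stands in for
// a broker. The producer enqueues events and continues immediately; a consumer
// thread picks them up and processes them later.
public class AsyncMessagingDemo {
    static final BlockingQueue queue = new ArrayBlockingQueue(16);
    static final List processed = new CopyOnWriteArrayList();

    // "Publishing" returns as soon as the event is on the queue.
    static void publish(String event) throws InterruptedException {
        queue.put(event);
    }

    public static void main(String[] args) throws Exception {
        Thread consumer = new Thread(() -> {
            try {
                // A real consumer loops forever; here we handle two events.
                processed.add("handled:" + queue.take());
                processed.add("handled:" + queue.take());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        publish("order-created"); // producer does not wait for handling
        publish("order-paid");
        consumer.join();

        System.out.println(processed); // prints [handled:order-created, handled:order-paid]
    }
}
```

&lt;p&gt;Note that &lt;code&gt;publish&lt;/code&gt; returns as soon as the event is enqueued; the producer never blocks on the consumer’s processing, which is what decouples the two services.&lt;/p&gt;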

&lt;p&gt;Benefits include services that are not tightly coupled to each other’s processes, leading to a more resilient system architecture. As services don’t wait for responses, they can handle more requests and scale better under load. Also, temporary failures in one service don’t immediately impact others, as messages can be retried or delayed.&lt;/p&gt;

&lt;p&gt;However, ensuring reliable delivery and processing of messages can be complex, especially in a distributed system. While decoupling services, asynchronous communication can introduce delays in processing, which might not be suitable for time-sensitive operations. Moreover, tracing a request’s path and debugging issues can be more challenging in an asynchronous setup.&lt;/p&gt;

&lt;p&gt;To summarize, Async Message Passing enables microservices to communicate in a decoupled, efficient, and resilient manner, which is particularly beneficial in distributed and scalable systems. However, it introduces complexities in managing and monitoring message flows and requires careful design to ensure consistency and reliability. Embracing this principle often involves a shift towards an event-driven architecture, which brings its own considerations in system design and operation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Location Transparency
&lt;/h2&gt;

&lt;p&gt;In a system with location transparency, microservices are designed and operated without other services needing to know their specific physical location (e.g., an IP address). Services communicate with each other based on logical identifiers rather than physical network addresses. This often involves mechanisms for dynamic service discovery, where services can find and communicate with each other through a registry or directory service, regardless of where they are deployed.&lt;/p&gt;

&lt;p&gt;Tools like Kubernetes or service meshes provide a dynamic registry where services register themselves. Other services use this registry to discover and communicate with them. Location transparency allows for intelligent load balancing and rerouting of requests in case of service failures, enhancing the system’s fault tolerance.&lt;/p&gt;
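&lt;p&gt;A toy registry can illustrate the idea: callers ask for a service by its logical name, and the registry picks one of the currently registered instances. This is a simplified sketch with made-up names; in practice Kubernetes DNS, a service mesh, or a dedicated registry like Consul or Eureka does this work.&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy service registry: callers address a service by its logical name, and
// the registry resolves that name to one of the registered instances using
// naive round-robin load balancing.
public class RegistryDemo {
    static final Map instances = new HashMap();
    static int counter = 0;

    static void register(String service, String address) {
        instances.computeIfAbsent(service, k -> new ArrayList());
        ((List) instances.get(service)).add(address);
    }

    static String resolve(String service) {
        List addrs = (List) instances.get(service);
        if (addrs == null || addrs.isEmpty()) {
            throw new IllegalStateException("no instance of " + service);
        }
        // Rotate through the registered instances.
        String addr = (String) addrs.get(counter % addrs.size());
        counter++;
        return addr;
    }

    public static void main(String[] args) {
        register("orders", "10.0.0.1:8080");
        register("orders", "10.0.0.2:8080");
        // The caller only ever knows the logical name "orders".
        System.out.println(resolve("orders")); // prints 10.0.0.1:8080
        System.out.println(resolve("orders")); // prints 10.0.0.2:8080
    }
}
```

&lt;p&gt;Because the caller depends only on the logical name, instances can be moved, scaled, or replaced without touching any calling code.&lt;/p&gt;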

&lt;p&gt;This approach allows services to be easily scaled up or down, moved, or replicated across different servers or clusters without impacting the system’s operation. Besides that, services can be deployed on various platforms (on-premises, cloud, hybrid) without affecting their interaction with other services. And the system can automatically handle service failures by rerouting requests to other instances or locations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Event-Driven Microservices Training
&lt;/h2&gt;

&lt;p&gt;The “Event-Driven Microservices Training” by Urs Peter is open to all interested in enhancing their knowledge of Event-Driven Architecture, and I highly recommend it.&lt;/p&gt;

&lt;p&gt;This two-day, in-person course is conducted in the Netherlands. For upcoming dates and pricing details, visit &lt;a href="https://xebia.com/academy/nl/training/event-driven-microservices-training/" rel="noopener noreferrer"&gt;Event-Driven Microservices Training&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stay Curious!
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Contribute
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Writing takes time and effort.&lt;/em&gt;&lt;/strong&gt; I love writing and sharing knowledge, but I also have bills to pay. If you like my work, please, &lt;strong&gt;consider donating through Buy Me a Coffee: &lt;a href="https://www.buymeacoffee.com/RaphaelDeLio" rel="noopener noreferrer"&gt;https://www.buymeacoffee.com/RaphaelDeLio&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Or by sending me BitCoin: 1HjG7pmghg3Z8RATH4aiUWr156BGafJ6Zw&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow Me on Social Media
&lt;/h2&gt;

&lt;p&gt;Stay connected and dive deeper into the world of Software Architecture with me! Follow my journey across all major social platforms for exclusive content, tips, and discussions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://twitter.com/raphaeldelio" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt; | &lt;a href="https://www.linkedin.com/in/raphaeldelio/?originalSubdomain=nl" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; | &lt;a href="https://www.youtube.com/@raphaeldelio" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt; | &lt;a href="https://www.instagram.com/raphaeldelio/" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;/p&gt;

</description>
      <category>microservices</category>
      <category>architecture</category>
      <category>softwaredevelopment</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>What’s the Connection Between Leonardo Da Vinci, a Cup of Coffee in Lisbon, and the Nature of Software Development?</title>
      <dc:creator>Raphael De Lio</dc:creator>
      <pubDate>Fri, 27 Sep 2024 13:12:03 +0000</pubDate>
      <link>https://dev.to/raphaeldelio/whats-the-connection-between-leonardo-da-vinci-a-cup-of-coffee-in-lisbon-and-the-nature-of-software-development-4jlo</link>
      <guid>https://dev.to/raphaeldelio/whats-the-connection-between-leonardo-da-vinci-a-cup-of-coffee-in-lisbon-and-the-nature-of-software-development-4jlo</guid>
      <description>&lt;p&gt;In Walter Isaacson’s biography of Leonardo Da Vinci, he writes about an incident that occurred while Leonardo was painting one of his most famous works, “The Last Supper.”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3vjq5ws18lsp3o9uqz1.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3vjq5ws18lsp3o9uqz1.jpeg" alt="The Last Supper (Leonardo)" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Isaacson describes how the Prior of the church that had commissioned the work became irritated with Leonardo’s procrastination and complained to Ludovico Sforza, the then Duke of Milan. He wanted Leonardo never to put down the brush, as if he were an employee working in his garden.&lt;/p&gt;

&lt;p&gt;When the artist was summoned by the Duke, the two ended up discussing how creativity manifests. Leonardo explained that sometimes you need to go slow, take breaks, and even procrastinate. This allows ideas to mature and intuition to be stimulated. Men of high intellect, he said to the duke, sometimes make their greatest advances when they work less, as their minds are occupied with their ideas and the refinement of concepts that will later take shape.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi558kviuz1z7j4pub7yc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi558kviuz1z7j4pub7yc.png" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This passage reminded me of a moment I experienced in a café in Lisbon in 2022. In one of our conversations, my colleague &lt;a href="https://www.linkedin.com/in/celomaluf/" rel="noopener noreferrer"&gt;Marcelo Maluf Teixeira&lt;/a&gt;, holding a cup in his hand, compared it to the ever-evolving nature of software.&lt;/p&gt;

&lt;p&gt;"A coffee cup is a finished product; once it is molded, baked, and painted, it is complete. There are no updates or revisions needed. In contrast, software is a dynamic entity, constantly in a state of development and improvement."&lt;/p&gt;

&lt;p&gt;As programmers, we often encounter the “Prior of the Church” mentality in our workplaces, represented by managers or executives who expect us to always be on standby, tirelessly typing to deliver software. They often see programming as a continuous production line, where the work is simply completing tasks one after the other. However, the reality of programming is that it is a creative and iterative process, where ‘active procrastination’ plays a crucial role.&lt;/p&gt;

&lt;p&gt;Procrastination, when understood as a period of reflection and incubation of ideas, is essential in the world of programming. It’s not about avoiding work, but recognizing that conscious breaks and periods of reflection are vital for innovation and creative problem-solving. In these moments, instead of incessantly writing code, we allow ourselves to absorb and contemplate the problem as a whole, often finding more effective and innovative solutions.&lt;/p&gt;

&lt;p&gt;Leonardo often took years to finish a painting, and in some cases, like the famous “Mona Lisa,” he continued to work and make changes until the end of his life. He was always experimenting with new techniques, like sfumato, a shading technique that creates a smooth transition between colors, giving an almost ethereal quality to his paintings.&lt;/p&gt;

&lt;p&gt;Leonardo also left us a tip on how to deal with that stubborn manager. He told the Prior that he still had two heads to paint, Christ’s and Judas’, and claimed he was having trouble finding a model for Judas and would use the Prior’s image if he continued to pester him. The Duke burst out laughing, saying Leonardo had thousands of reasons to do so. And the poor Prior was embarrassed and went back to taking care of his garden, leaving Leonardo in peace.&lt;/p&gt;

&lt;p&gt;Stay curious!&lt;/p&gt;

</description>
      <category>softwaredevelopment</category>
      <category>programming</category>
      <category>productivity</category>
      <category>mindfulness</category>
    </item>
  </channel>
</rss>
