<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Lav Kumar Dixit</title>
    <description>The latest articles on DEV Community by Lav Kumar Dixit (@lovekumardixit).</description>
    <link>https://dev.to/lovekumardixit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3934288%2Fdd19ec76-5b7e-4a1f-a0cd-69661c941c2a.jpeg</url>
      <title>DEV Community: Lav Kumar Dixit</title>
      <link>https://dev.to/lovekumardixit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/lovekumardixit"/>
    <language>en</language>
    <item>
      <title>Semantic Caching with Spring AI and PgVector: Reduce LLM Costs and Improve Response Time by 90%</title>
      <dc:creator>Lav Kumar Dixit</dc:creator>
      <pubDate>Tue, 26 May 2026 06:04:12 +0000</pubDate>
      <link>https://dev.to/lovekumardixit/semantic-caching-with-spring-ai-and-pgvector-reduce-llm-costs-and-improve-response-time-by-90-2oi1</link>
      <guid>https://dev.to/lovekumardixit/semantic-caching-with-spring-ai-and-pgvector-reduce-llm-costs-and-improve-response-time-by-90-2oi1</guid>
      <description>&lt;p&gt;Large Language Models are powerful, but they're also expensive and slow when handling repetitive queries. If your AI application receives thousands of similar questions every day, repeatedly calling an LLM for nearly identical requests is inefficient.&lt;/p&gt;

&lt;p&gt;What if you could intelligently reuse previous AI responses—even when the wording is different?&lt;/p&gt;

&lt;p&gt;This is where Semantic Caching comes in.&lt;/p&gt;

&lt;p&gt;In this article, we'll build a production-ready semantic caching layer using Spring AI and PgVector, enabling Java developers to dramatically reduce AI costs, lower latency, and improve user experience.&lt;/p&gt;

&lt;p&gt;The Problem: Traditional Caching Doesn't Work for AI&lt;/p&gt;

&lt;p&gt;Consider these user queries:&lt;/p&gt;

&lt;p&gt;What is Spring Boot?&lt;br&gt;
Explain Spring Boot framework.&lt;br&gt;
Can you tell me about Spring Boot?&lt;/p&gt;

&lt;p&gt;A traditional cache such as Redis treats these as completely different keys:&lt;/p&gt;

&lt;p&gt;cache.get("What is Spring Boot?");&lt;br&gt;
cache.get("Explain Spring Boot framework.");&lt;/p&gt;

&lt;p&gt;Result:&lt;/p&gt;

&lt;p&gt;❌ Cache Miss&lt;/p&gt;

&lt;p&gt;❌ New LLM Call&lt;/p&gt;

&lt;p&gt;❌ Increased Cost&lt;/p&gt;

&lt;p&gt;❌ Higher Latency&lt;/p&gt;

&lt;p&gt;Although the intent is identical, traditional caching cannot understand meaning.&lt;/p&gt;

&lt;p&gt;Semantic caching solves this problem.&lt;/p&gt;

&lt;p&gt;What is Semantic Caching?&lt;/p&gt;

&lt;p&gt;Semantic caching stores:&lt;/p&gt;

&lt;p&gt;User query&lt;br&gt;
Query embedding&lt;br&gt;
AI response&lt;/p&gt;

&lt;p&gt;When a new request arrives:&lt;/p&gt;

&lt;p&gt;Generate an embedding&lt;br&gt;
Search for similar embeddings&lt;br&gt;
Return cached response if similarity exceeds a threshold&lt;br&gt;
Otherwise call the LLM and store the result&lt;/p&gt;

&lt;p&gt;Instead of matching text, we match meaning.&lt;/p&gt;
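
&lt;p&gt;The lookup steps above can be sketched in plain Java. This is a minimal in-memory illustration, not the PgVector-backed implementation built later in the article: the class name SemanticCacheSketch, the hand-written three-dimensional vectors, and the linear scan are all assumptions made for the example. Real embeddings come from an embedding model and have far more dimensions.&lt;/p&gt;

```java
import java.util.Arrays;

// Toy in-memory semantic cache: a sketch of the lookup flow, not a production store.
final class SemanticCacheSketch {

    // One cached item: the original query, its embedding, and the stored answer.
    record Entry(String query, float[] embedding, String response) {}

    private Entry[] entries = new Entry[0];
    private final double threshold;

    SemanticCacheSketch(double threshold) {
        this.threshold = threshold;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|). Assumes equal-length, non-zero vectors.
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the best cached response whose similarity exceeds the threshold, or null on a miss.
    String lookup(float[] queryEmbedding) {
        Entry best = null;
        double bestScore = threshold;
        for (Entry e : entries) {
            double score = cosineSimilarity(queryEmbedding, e.embedding());
            if (score > bestScore) {
                best = e;
                bestScore = score;
            }
        }
        return best == null ? null : best.response();
    }

    // Appends a new entry; called after a miss once the LLM has answered.
    void store(String query, float[] embedding, String response) {
        entries = Arrays.copyOf(entries, entries.length + 1);
        entries[entries.length - 1] = new Entry(query, embedding, response);
    }

    public static void main(String[] args) {
        SemanticCacheSketch cache = new SemanticCacheSketch(0.85);
        // Hand-written vectors stand in for real embeddings.
        cache.store("What is Spring Boot?", new float[] {0.9f, 0.1f, 0.0f}, "cached answer about Spring Boot");
        System.out.println(cache.lookup(new float[] {0.85f, 0.15f, 0.0f})); // close to the stored query: hit
        System.out.println(cache.lookup(new float[] {0.0f, 0.1f, 0.9f}));   // unrelated direction: miss (null)
    }
}
```

&lt;p&gt;On a miss the caller would invoke the LLM and then call store, which mirrors the last step in the list above.&lt;/p&gt;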

&lt;p&gt;Why Use PgVector?&lt;/p&gt;

&lt;p&gt;PgVector extends PostgreSQL with vector similarity search capabilities.&lt;/p&gt;

&lt;p&gt;Benefits:&lt;/p&gt;

&lt;p&gt;Open source&lt;br&gt;
No additional vector database required&lt;br&gt;
Works directly with PostgreSQL&lt;br&gt;
Supports cosine similarity&lt;br&gt;
Production-ready&lt;br&gt;
Easy integration with Spring AI&lt;/p&gt;

&lt;p&gt;For many enterprise applications, PgVector eliminates the need for separate infrastructure like Pinecone or Weaviate.&lt;/p&gt;

&lt;p&gt;High-Level Architecture&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;                User Query
                     |
                     v
              Generate Embedding
                     |
                     v
           PgVector Similarity Search
                     |
          +----------+----------+
          |                     |
     Cache Hit             Cache Miss
          |                     |
          v                     v
 Cached Response          Call LLM
          |                     |
          +----------+----------+
                     |
                     v
              Return Result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This architecture reduces both latency and token consumption.&lt;/p&gt;

&lt;p&gt;Technology Stack&lt;/p&gt;

&lt;p&gt;Java 21&lt;br&gt;
Spring Boot 3&lt;br&gt;
Spring AI&lt;br&gt;
PostgreSQL&lt;br&gt;
PgVector&lt;br&gt;
OpenAI Embeddings&lt;br&gt;
Maven&lt;/p&gt;

&lt;p&gt;Setting Up PgVector&lt;/p&gt;

&lt;p&gt;Enable the extension:&lt;/p&gt;

&lt;p&gt;CREATE EXTENSION IF NOT EXISTS vector;&lt;/p&gt;

&lt;p&gt;Create a cache table:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CREATE TABLE semantic_cache (
    id BIGSERIAL PRIMARY KEY,
    query TEXT NOT NULL,
    response TEXT NOT NULL,
    embedding VECTOR(1536),
    created_at TIMESTAMP DEFAULT NOW()
);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Create an index for fast similarity search:&lt;/p&gt;

&lt;p&gt;CREATE INDEX semantic_cache_embedding_idx&lt;br&gt;
ON semantic_cache&lt;br&gt;
USING ivfflat (embedding vector_cosine_ops);&lt;/p&gt;

&lt;p&gt;The index becomes increasingly important as cached entries grow into the thousands or millions.&lt;/p&gt;

&lt;p&gt;Maven Dependencies&lt;/p&gt;

&lt;p&gt;Add Spring AI and PostgreSQL dependencies:&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.springframework.ai&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;spring-ai-openai-spring-boot-starter&amp;lt;/artifactId&amp;gt;
&amp;lt;/dependency&amp;gt;

&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;org.postgresql&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;postgresql&amp;lt;/artifactId&amp;gt;
&amp;lt;/dependency&amp;gt;

&amp;lt;dependency&amp;gt;
    &amp;lt;groupId&amp;gt;com.pgvector&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;pgvector&amp;lt;/artifactId&amp;gt;
    &amp;lt;version&amp;gt;0.1.6&amp;lt;/version&amp;gt;
&amp;lt;/dependency&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Application Configuration&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/ai_db
    username: postgres
    password: postgres
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Store sensitive credentials using environment variables or a secret management solution.&lt;/p&gt;

&lt;p&gt;Entity Model&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Entity
@Table(name = "semantic_cache")
@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
@Builder
public class SemanticCache {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    private String query;

    @Column(columnDefinition = "TEXT")
    private String response;

    private LocalDateTime createdAt;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Embedding Generation Service&lt;/p&gt;

&lt;p&gt;Spring AI makes embedding generation straightforward.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
@RequiredArgsConstructor
public class EmbeddingService {

    private final EmbeddingModel embeddingModel;

    // Converts the query text into its embedding vector.
    public float[] generateEmbedding(String text) {
        return embeddingModel.embed(text);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every query will be converted into a high-dimensional vector representation.&lt;/p&gt;

&lt;p&gt;Similarity Search Repository&lt;/p&gt;

&lt;p&gt;Using PostgreSQL cosine similarity:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Repository
@RequiredArgsConstructor
public class SemanticCacheRepository {

    private final JdbcTemplate jdbcTemplate;

    public Optional&amp;lt;String&amp;gt; findSimilarResponse(
            PGvector embedding,
            double threshold) {

        // Order by the distance operator itself (ascending, with LIMIT) so
        // pgvector can use the ivfflat index, then apply the similarity
        // threshold to the nearest row.
        String sql = """
            SELECT response, similarity
            FROM (
                SELECT response,
                       1 - (embedding &amp;lt;=&amp;gt; ?) AS similarity
                FROM semantic_cache
                ORDER BY embedding &amp;lt;=&amp;gt; ?
                LIMIT 1
            ) nearest
            WHERE similarity &amp;gt; ?
        """;

        List&amp;lt;String&amp;gt; responses =
                jdbcTemplate.query(
                        sql,
                        ps -&amp;gt; {
                            ps.setObject(1, embedding);
                            ps.setObject(2, embedding);
                            ps.setDouble(3, threshold);
                        },
                        (rs, rowNum) -&amp;gt; rs.getString("response")
                );

        return responses.stream().findFirst();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The threshold controls how strict the cache matching should be.&lt;/p&gt;

&lt;p&gt;Typical values:&lt;/p&gt;

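&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Threshold&lt;/th&gt;&lt;th&gt;Behavior&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;0.70&lt;/td&gt;&lt;td&gt;Aggressive caching&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;0.80&lt;/td&gt;&lt;td&gt;Balanced&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;0.90&lt;/td&gt;&lt;td&gt;Very strict&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;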

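&lt;p&gt;A threshold of 0.80–0.85 is a reasonable starting point for many systems. Tune it against real queries, because the right value depends on the embedding model and on how costly a wrong cache hit is for your application.&lt;/p&gt;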
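
&lt;p&gt;To make the threshold concrete, here is a small sketch of the hit decision. PgVector's cosine operator returns a distance, and the SQL above converts it to a similarity with 1 - distance. The class name ThresholdSketch and the example distance of 0.18 are assumptions chosen for illustration.&lt;/p&gt;

```java
// Sketch of the cache-hit decision for one candidate row.
final class ThresholdSketch {

    // Converts a cosine distance (as returned by pgvector) into a cosine similarity.
    static double similarityFromDistance(double cosineDistance) {
        return 1.0 - cosineDistance;
    }

    // A row counts as a hit only when its similarity is strictly above the threshold.
    static boolean isCacheHit(double cosineDistance, double threshold) {
        return similarityFromDistance(cosineDistance) > threshold;
    }

    public static void main(String[] args) {
        double distance = 0.18; // hypothetical nearest-neighbour distance, similarity about 0.82
        System.out.println(isCacheHit(distance, 0.70)); // true
        System.out.println(isCacheHit(distance, 0.80)); // true
        System.out.println(isCacheHit(distance, 0.90)); // false
    }
}
```

&lt;p&gt;The same candidate is served from cache under the two looser settings and rejected under the strict one, which is the trade-off the table describes.&lt;/p&gt;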

&lt;p&gt;Semantic Cache Service&lt;/p&gt;

&lt;p&gt;Now let's connect everything.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@Service
@RequiredArgsConstructor
@Slf4j
public class SemanticCacheService {

    private final EmbeddingService embeddingService;

    private final SemanticCacheRepository repository;

    private final ChatClient chatClient;

    public String getResponse(String query) {

        // Embed the incoming query once and reuse the vector for lookup and storage.
        float[] vector =
                embeddingService.generateEmbedding(query);

        PGvector embedding =
                new PGvector(vector);

        Optional&amp;lt;String&amp;gt; cachedResponse =
                repository.findSimilarResponse(
                        embedding,
                        0.85
                );

        if (cachedResponse.isPresent()) {

            log.info("Semantic Cache Hit");

            return cachedResponse.get();
        }

        log.info("Semantic Cache Miss");

        // No close match: call the LLM and cache the new answer.
        String response =
                chatClient.prompt(query)
                          .call()
                          .content();

        saveResponse(query, response, embedding);

        return response;
    }

    private void saveResponse(
            String query,
            String response,
            PGvector embedding) {

        // Persist cache record
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This is the core semantic caching workflow.&lt;/p&gt;

&lt;p&gt;Real-World Example&lt;/p&gt;

&lt;p&gt;Imagine an HR chatbot receiving these questions:&lt;/p&gt;

&lt;p&gt;What is the company's leave policy?&lt;br&gt;
How many annual leaves do employees get?&lt;br&gt;
Can I take paid vacation days?&lt;/p&gt;

&lt;p&gt;Without semantic caching:&lt;/p&gt;

&lt;p&gt;3 LLM Requests&lt;br&gt;
3 API Charges&lt;br&gt;
3 Response Generations&lt;/p&gt;

&lt;p&gt;With semantic caching:&lt;/p&gt;

&lt;p&gt;1 LLM Request&lt;br&gt;
2 Cache Hits&lt;br&gt;
Much Lower Cost&lt;/p&gt;

&lt;p&gt;At enterprise scale, where a large share of questions are repeats, avoiding two out of every three LLM calls adds up to a substantial reduction in the monthly API bill.&lt;/p&gt;

&lt;p&gt;Performance Results&lt;/p&gt;

&lt;p&gt;A typical benchmark:&lt;/p&gt;

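&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;&lt;th&gt;Scenario&lt;/th&gt;&lt;th&gt;Response Time&lt;/th&gt;&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;OpenAI API Call&lt;/td&gt;&lt;td&gt;1500–3000 ms&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Semantic Cache Hit&lt;/td&gt;&lt;td&gt;20–50 ms&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;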

&lt;p&gt;Improvement:&lt;/p&gt;

&lt;p&gt;Up to 90% faster responses&lt;br&gt;
Up to 80% lower AI costs&lt;br&gt;
Reduced API rate-limit pressure&lt;/p&gt;

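&lt;p&gt;Actual numbers vary with model choice, infrastructure, and cache hit rate. A cache hit still needs one embedding call for the incoming query, so include that round trip when you measure the hit path.&lt;/p&gt;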
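
&lt;p&gt;How much the average user gains depends on the hit rate. A rough way to estimate it is to weight the two latencies by how often each path is taken. The sketch below uses the upper ends of the ranges in the table (50 ms and 3000 ms); the hit rates and the class name LatencyEstimateSketch are assumptions for illustration, not measurements.&lt;/p&gt;

```java
// Back-of-the-envelope estimate of average latency for a given cache hit rate.
final class LatencyEstimateSketch {

    // Weighted average: hits take hitMs, misses take missMs.
    static double expectedLatencyMs(double hitRate, double hitMs, double missMs) {
        return hitRate * hitMs + (1.0 - hitRate) * missMs;
    }

    public static void main(String[] args) {
        double hitMs = 50;    // upper end of the cache-hit range above
        double missMs = 3000; // upper end of the API-call range above
        System.out.println(expectedLatencyMs(0.0, hitMs, missMs)); // 3000.0, no caching benefit
        System.out.println(expectedLatencyMs(0.5, hitMs, missMs)); // 1525.0, half the queries hit
        System.out.println(expectedLatencyMs(1.0, hitMs, missMs)); // 50.0, every query hits
    }
}
```

&lt;p&gt;The headline improvement only applies to queries that actually hit the cache, so tracking the hit rate matters as much as the per-request timings.&lt;/p&gt;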

&lt;p&gt;Production Considerations&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cache Expiration&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AI responses can become outdated.&lt;/p&gt;

&lt;p&gt;Add TTL support:&lt;/p&gt;

&lt;p&gt;DELETE FROM semantic_cache&lt;br&gt;
WHERE created_at &amp;lt; NOW() - INTERVAL '30 days';&lt;/p&gt;

&lt;p&gt;Schedule cleanup jobs regularly.&lt;/p&gt;
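
&lt;p&gt;Between cleanup runs, rows older than the TTL are still in the table and can be served. One option is to apply the same age rule when reading. The following is a minimal sketch of that check using java.time; the class name CacheTtlSketch and the method isExpired are hypothetical, and the 30-day value simply mirrors the SQL above.&lt;/p&gt;

```java
import java.time.Duration;
import java.time.Instant;

// Read-time TTL check that mirrors the scheduled DELETE statement.
final class CacheTtlSketch {

    // An entry is expired when it was created before (now - ttl).
    static boolean isExpired(Instant createdAt, Instant now, Duration ttl) {
        return createdAt.isBefore(now.minus(ttl));
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        Duration ttl = Duration.ofDays(30);
        System.out.println(isExpired(now.minus(Duration.ofDays(31)), now, ttl)); // true
        System.out.println(isExpired(now.minus(Duration.ofDays(29)), now, ttl)); // false
    }
}
```

&lt;p&gt;Passing the current time in as a parameter keeps the check easy to test with fixed timestamps.&lt;/p&gt;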

&lt;ol start="2"&gt;
&lt;li&gt;Multi-Tenant Systems&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Store tenant IDs:&lt;/p&gt;

&lt;p&gt;tenant_id VARCHAR(50)&lt;/p&gt;

&lt;p&gt;Only search cache within the current tenant.&lt;/p&gt;

&lt;ol start="3"&gt;
&lt;li&gt;Response Quality Monitoring&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Track:&lt;/p&gt;

&lt;p&gt;Cache hit rate&lt;br&gt;
Similarity score&lt;br&gt;
User feedback&lt;br&gt;
Incorrect cache matches&lt;/p&gt;

&lt;p&gt;Observability is critical when deploying semantic caching at scale.&lt;/p&gt;

&lt;ol start="4"&gt;
&lt;li&gt;Hybrid Cache Strategy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Best practice:&lt;/p&gt;

&lt;p&gt;Redis&lt;br&gt;
   |&lt;br&gt;
Semantic Cache&lt;br&gt;
   |&lt;br&gt;
LLM&lt;/p&gt;

&lt;p&gt;Flow:&lt;/p&gt;

&lt;p&gt;Redis lookup&lt;br&gt;
Semantic lookup&lt;br&gt;
LLM call&lt;/p&gt;

&lt;p&gt;This delivers maximum performance.&lt;/p&gt;
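
&lt;p&gt;The three-step flow can be sketched as a chain of lookups that falls through to the LLM. This is an illustration only: the Layer interface, the class name TieredLookupSketch, and the lambda stand-ins for Redis, the semantic cache, and the model call are all hypothetical and do not talk to any real service.&lt;/p&gt;

```java
// Tiered lookup: try cheap layers first, call the model only when all of them miss.
final class TieredLookupSketch {

    // A layer returns the cached answer, or null on a miss.
    interface Layer {
        String find(String query);
    }

    // Walks the cache layers in order and falls back to the (expensive) model call.
    static String answer(String query, Layer[] cacheLayers, Layer llm) {
        for (Layer layer : cacheLayers) {
            String hit = layer.find(query);
            if (hit != null) {
                return hit;
            }
        }
        return llm.find(query);
    }

    public static void main(String[] args) {
        // Stand-in for an exact-match key lookup such as Redis.
        Layer exact = q -> q.equals("What is Spring Boot?") ? "exact-match answer" : null;
        // Stand-in for the vector similarity search.
        Layer semantic = q -> q.toLowerCase().contains("spring boot") ? "semantic answer" : null;
        // Stand-in for the model call.
        Layer llm = q -> "fresh LLM answer";

        Layer[] layers = {exact, semantic};
        System.out.println(answer("What is Spring Boot?", layers, llm));           // exact-match answer
        System.out.println(answer("Explain Spring Boot framework.", layers, llm)); // semantic answer
        System.out.println(answer("How do I bake bread?", layers, llm));           // fresh LLM answer
    }
}
```

&lt;p&gt;Putting the exact-match layer first means identical repeat queries skip the embedding call entirely.&lt;/p&gt;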

&lt;ol start="5"&gt;
&lt;li&gt;Embedding Model Consistency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Never change embedding models without re-embedding and re-indexing the stored vectors.&lt;/p&gt;

&lt;p&gt;For example, text-embedding-3-small and text-embedding-3-large produce different vector spaces, and by default different vector sizes (1536 versus 3072 dimensions), so vectors from one model cannot be compared meaningfully with vectors from the other.&lt;/p&gt;

&lt;p&gt;Mixing them will reduce search accuracy.&lt;/p&gt;
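
&lt;p&gt;A cheap safeguard is to check the vector size before storing or searching. The sketch below assumes the VECTOR(1536) column defined earlier; the class name EmbeddingGuardSketch and the method requireDimensions are hypothetical.&lt;/p&gt;

```java
// Guard that rejects embeddings whose size does not match the table column.
final class EmbeddingGuardSketch {

    // Matches the VECTOR(1536) column in the semantic_cache table.
    static final int COLUMN_DIMENSIONS = 1536;

    // Returns the embedding unchanged, or throws if its length is unexpected.
    static float[] requireDimensions(float[] embedding) {
        if (embedding.length != COLUMN_DIMENSIONS) {
            throw new IllegalArgumentException(
                    "Expected " + COLUMN_DIMENSIONS + " dimensions but got " + embedding.length);
        }
        return embedding;
    }

    public static void main(String[] args) {
        System.out.println(requireDimensions(new float[1536]).length); // 1536
        try {
            requireDimensions(new float[3072]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

&lt;p&gt;This catches a model swap that changes the vector size. It does not detect two different models that happen to share a size, so recording the model name alongside each row is still worth considering.&lt;/p&gt;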

&lt;p&gt;When Should You Use Semantic Caching?&lt;/p&gt;

&lt;p&gt;Use semantic caching when:&lt;/p&gt;

&lt;p&gt;✅ Users ask repetitive questions&lt;/p&gt;

&lt;p&gt;✅ Building AI chatbots&lt;/p&gt;

&lt;p&gt;✅ Customer support assistants&lt;/p&gt;

&lt;p&gt;✅ Internal company knowledge bases&lt;/p&gt;

&lt;p&gt;✅ HR assistants&lt;/p&gt;

&lt;p&gt;✅ Documentation search systems&lt;/p&gt;

&lt;p&gt;✅ High-volume AI applications&lt;/p&gt;

&lt;p&gt;Avoid it when:&lt;/p&gt;

&lt;p&gt;❌ Every query is unique&lt;/p&gt;

&lt;p&gt;❌ Responses depend heavily on real-time data&lt;/p&gt;

&lt;p&gt;❌ Accuracy requirements are extremely strict&lt;/p&gt;

&lt;p&gt;Key Takeaways&lt;/p&gt;

&lt;p&gt;Traditional caching fails for AI applications because it relies on exact text matching.&lt;br&gt;
Semantic caching matches user intent using vector embeddings.&lt;br&gt;
Spring AI simplifies embedding generation and LLM integration.&lt;br&gt;
PgVector provides efficient vector similarity search directly inside PostgreSQL.&lt;br&gt;
Cache hits can reduce response times from seconds to milliseconds.&lt;br&gt;
Production systems should include TTL policies, monitoring, and hybrid cache strategies.&lt;br&gt;
Properly implemented semantic caching can significantly reduce AI infrastructure costs while improving user experience.&lt;/p&gt;

&lt;p&gt;Final Thoughts&lt;/p&gt;

&lt;p&gt;As AI applications scale, managing cost and latency becomes just as important as model quality. Semantic caching is one of the highest-impact optimizations you can implement because it reduces unnecessary LLM calls while delivering faster responses to users.&lt;/p&gt;

&lt;p&gt;With Spring AI and PgVector, Java developers can build a robust semantic caching layer using technologies they already know—without introducing a dedicated vector database.&lt;/p&gt;

&lt;p&gt;If you're building AI-powered applications with Spring Boot, semantic caching should be one of the first production optimizations on your roadmap.&lt;/p&gt;

&lt;p&gt;Have you implemented semantic caching in your AI applications? Share your experience and performance gains in the comments.&lt;/p&gt;


</description>
      <category>ai</category>
      <category>vectordatabase</category>
      <category>llm</category>
      <category>programming</category>
    </item>
    <item>
      <title>Stop Making Your AI Chatbot Slower: Streaming Responses with Spring AI and Server-Sent Events</title>
      <dc:creator>Lav Kumar Dixit</dc:creator>
      <pubDate>Tue, 26 May 2026 06:00:44 +0000</pubDate>
      <link>https://dev.to/lovekumardixit/stop-making-your-ai-chatbot-slower-streaming-responses-with-spring-ai-and-server-sent-events-357g</link>
      <guid>https://dev.to/lovekumardixit/stop-making-your-ai-chatbot-slower-streaming-responses-with-spring-ai-and-server-sent-events-357g</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogfbco117h5auufchbfw.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fogfbco117h5auufchbfw.jpg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6ybwrmlzkn0387pgyhf.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg6ybwrmlzkn0387pgyhf.jpg" alt=" " width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Wrong Approach&lt;/p&gt;

&lt;p&gt;Most applications follow this flow:&lt;/p&gt;

&lt;p&gt;User Query&lt;br&gt;
    ↓&lt;br&gt;
LLM Request&lt;br&gt;
    ↓&lt;br&gt;
Wait 5-10 Seconds&lt;br&gt;
    ↓&lt;br&gt;
Return Full Response&lt;/p&gt;

&lt;p&gt;The Better Architecture&lt;/p&gt;

&lt;p&gt;Use Spring AI's streaming support combined with Server-Sent Events (SSE).&lt;/p&gt;

&lt;p&gt;User Query&lt;br&gt;
    ↓&lt;br&gt;
Spring AI&lt;br&gt;
    ↓&lt;br&gt;
Streaming Tokens&lt;br&gt;
    ↓&lt;br&gt;
SSE Endpoint&lt;br&gt;
    ↓&lt;br&gt;
Browser Updates UI Instantly&lt;/p&gt;
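
&lt;p&gt;On the wire, each streamed chunk reaches the browser as an SSE frame: one or more lines starting with "data: ", terminated by a blank line. The sketch below formats such frames by hand purely to show the shape of the stream; in a Spring application the framework does this framing for you, and the class name SseFrameSketch and the method toSseFrame are hypothetical.&lt;/p&gt;

```java
// Hand-rolled SSE framing, for illustration of the text/event-stream format only.
final class SseFrameSketch {

    // Prefixes every line of the payload with "data: " and ends the event with a blank line.
    static String toSseFrame(String data) {
        StringBuilder frame = new StringBuilder();
        for (String line : data.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }

    public static void main(String[] args) {
        // Three token chunks as a model might stream them.
        String[] tokens = {"Spring", " AI", " streams"};
        for (String token : tokens) {
            System.out.print(toSseFrame(token));
        }
    }
}
```

&lt;p&gt;Because every frame is delivered as soon as it is written, the browser can render the first tokens long before the full answer exists.&lt;/p&gt;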

&lt;p&gt;Spring AI Streaming Example&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@RestController
@RequiredArgsConstructor
public class ChatController {

    private final ChatClient chatClient;

    @GetMapping(value = "/chat/stream",
            produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux&amp;lt;String&amp;gt; streamResponse(
            @RequestParam String message) {

        return chatClient.prompt()
                .user(message)
                .stream()
                .content();
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Frontend Integration&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
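&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Encode the message so spaces and special characters survive in the URL.
const message = encodeURIComponent("Explain Spring AI");
const eventSource = new EventSource("/chat/stream?message=" + message);

eventSource.onmessage = (event) =&amp;gt; {
    // textContent avoids interpreting model output as HTML.
    document.getElementById("output").textContent += event.data;
};

// EventSource reconnects automatically when the stream ends;
// close it so the prompt is not sent again.
eventSource.onerror = () =&amp;gt; eventSource.close();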
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Performance Benefits&lt;/p&gt;

&lt;p&gt;Faster Perceived Response Time&lt;/p&gt;

&lt;p&gt;Even if the model takes 8 seconds to complete:&lt;/p&gt;

&lt;p&gt;Without Streaming → First token after 8s&lt;/p&gt;

&lt;p&gt;With Streaming → First token after 200-500ms&lt;/p&gt;

&lt;p&gt;The total generation time remains the same, but users perceive the application as significantly faster.&lt;/p&gt;

&lt;p&gt;Reduced Bounce Rate&lt;/p&gt;

&lt;p&gt;Users are less likely to leave while waiting because they can see progress immediately.&lt;/p&gt;

&lt;p&gt;Better AI UX&lt;/p&gt;

&lt;p&gt;Streaming makes even local Ollama models feel responsive.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>springboot</category>
      <category>programming</category>
    </item>
    <item>
      <title>Getting Started with Spring AI: Transforming Java Backend Development</title>
      <dc:creator>Lav Kumar Dixit</dc:creator>
      <pubDate>Sat, 16 May 2026 06:00:43 +0000</pubDate>
      <link>https://dev.to/lovekumardixit/getting-started-with-spring-ai-transforming-java-backend-development-3a8b</link>
      <guid>https://dev.to/lovekumardixit/getting-started-with-spring-ai-transforming-java-backend-development-3a8b</guid>
      <description>&lt;p&gt;Artificial Intelligence is rapidly becoming an essential part of modern software systems, and Java developers now have powerful tools to integrate AI capabilities into enterprise applications through Spring AI.&lt;/p&gt;

&lt;p&gt;Spring AI extends the Spring ecosystem by providing seamless integration with leading AI models, enabling developers to build intelligent applications with familiar Spring Boot architecture.&lt;/p&gt;

&lt;p&gt;Key Benefits of Spring AI:&lt;/p&gt;

&lt;p&gt;Simplified integration with AI models such as OpenAI&lt;br&gt;
Faster development of AI-powered REST APIs&lt;br&gt;
Enterprise-grade scalability&lt;br&gt;
Enhanced developer productivity&lt;br&gt;
Easy implementation of chatbots, recommendation systems, and automation tools&lt;/p&gt;

&lt;p&gt;Why It Matters:&lt;/p&gt;

&lt;p&gt;For backend developers, Spring AI bridges the gap between traditional Java development and next-generation AI solutions, creating opportunities to build smarter, more adaptive applications.&lt;/p&gt;

&lt;p&gt;As the demand for AI-integrated systems continues to grow, learning Spring AI can be a valuable step for developers looking to stay relevant in the evolving tech landscape.&lt;/p&gt;

&lt;p&gt;Spring AI represents a significant advancement for Java developers who want to combine robust backend engineering with modern AI innovation.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9aezl4vq85r6t19a9af.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff9aezl4vq85r6t19a9af.jpg" alt=" " width="320" height="158"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyouf3ixwa36sve140wwj.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyouf3ixwa36sve140wwj.jpg" alt=" " width="268" height="188"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>backend</category>
      <category>java</category>
      <category>springboot</category>
    </item>
  </channel>
</rss>
