<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhijith</title>
    <description>The latest articles on DEV Community by Abhijith (@abhijithzero).</description>
    <link>https://dev.to/abhijithzero</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F920114%2Fcb6c5293-43ac-4841-8057-450571542a33.png</url>
      <title>DEV Community: Abhijith</title>
      <link>https://dev.to/abhijithzero</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abhijithzero"/>
    <language>en</language>
    <item>
      <title>How I Built an AI Conversation Coach with the Gemini Live API for the Gemini Live Agent Challenge</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Mon, 16 Mar 2026 20:52:40 +0000</pubDate>
      <link>https://dev.to/abhijithzero/how-i-build-an-ai-conversation-coach-with-gemini-live-api-for-gemini-live-agent-challenge-p6f</link>
      <guid>https://dev.to/abhijithzero/how-i-build-an-ai-conversation-coach-with-gemini-live-api-for-gemini-live-agent-challenge-p6f</guid>
      <description>&lt;h2&gt;
  
  
  Why I Built This
&lt;/h2&gt;

&lt;p&gt;I'm socially anxious. Not the cute "I'm introverted" kind, but the actual scrambling-for-words, wishing-the-ground-would-open-up-and-swallow-me kind.&lt;/p&gt;

&lt;p&gt;So I moved to Ireland for grad school. Figured a fresh start, new country, new people. Maybe things would be different. A few months in, I went to a tech conference. Perfect opportunity to network, I thought. I psyched myself up.&lt;/p&gt;

&lt;p&gt;Then it happened. I got there, saw a group of people chatting, and my brain just... shut down. Froze completely. I wanted to join the conversation so badly, but I just couldn't do it. I spent half the conference wandering around, pretending to look at the schedule.&lt;/p&gt;

&lt;p&gt;The frustrating part? It wasn't because I had nothing to say. I work in tech, I find the problems interesting, I wanted to talk to these people. The problem was pure lack of practice. I'd never actually &lt;em&gt;practiced&lt;/em&gt; starting conversations with strangers. And it showed.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Idea
&lt;/h2&gt;

&lt;p&gt;So what if there was a way to practice this? Like, actually practice talking to people without the fear of judgment. A realistic AI person who could act distracted, busy, or skeptical, the way real people at conferences actually are.&lt;/p&gt;

&lt;p&gt;IceBreaker is basically that. You pick any scenario (a casual conversation, a tech mixer, a cold intro at a founder booth, whatever you want) and you practice out loud. The AI responds in real time, like an actual person. It can be warm or guarded depending on the difficulty level.&lt;/p&gt;

&lt;p&gt;While you're talking, it analyzes your audio and video in real time and gives you live tips. Things like "try asking a question here" or "you said 'um' five times in the last minute." After you're done, you get a full breakdown with scores: how much you talked vs. listened, how many questions you asked, filler words, body language confidence, your sentiment trend through the conversation, and how well you recovered from awkward moments.&lt;/p&gt;

&lt;p&gt;Plus a dashboard to track your progress over multiple sessions. Watch your scores improve over time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building It
&lt;/h2&gt;

&lt;p&gt;This wasn't a simple chatbot. I needed real-time audio conversations that actually felt like talking to a person. Which meant WebSockets, streaming audio, video frames, and multiple systems all talking to each other.&lt;/p&gt;

&lt;p&gt;Here's what I used:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Part&lt;/th&gt;
&lt;th&gt;Stack&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Frontend&lt;/td&gt;
&lt;td&gt;React 19 + Vite + Tailwind + Recharts (for the debrief charts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI Engine&lt;/td&gt;
&lt;td&gt;Google Gemini 2.5 Flash via Gemini Live API&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;Python + FastAPI on Cloud Run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Database&lt;/td&gt;
&lt;td&gt;Firestore&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Deployment&lt;/td&gt;
&lt;td&gt;Docker, Cloud Build, Vercel&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The tricky part was the real-time bit. The browser opens a WebSocket directly to Gemini, streams audio to it (16 kHz), gets audio back (24 kHz), and also sends video frames (~1 per second) so Gemini can see your body language.&lt;/p&gt;
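
&lt;p&gt;To get a feel for the data rates involved, here is a back-of-the-envelope sketch in Java (the project itself streams from the browser). The sample rates come from the paragraph above; the 16-bit mono PCM format and the 100 ms chunk length are assumptions made for illustration, not values taken from the IceBreaker code, and &lt;code&gt;AudioChunkMath&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```java
class AudioChunkMath {
    // Bytes needed for one chunk of uncompressed mono audio.
    // bytesPerSample = 2 corresponds to 16-bit PCM (an assumption, see above).
    static int bytesPerChunk(int sampleRateHz, int bytesPerSample, int chunkMillis) {
        return sampleRateHz * bytesPerSample * chunkMillis / 1000;
    }

    public static void main(String[] args) {
        // Microphone audio going up at 16 kHz
        System.out.println(bytesPerChunk(16_000, 2, 100)); // prints 3200
        // Model audio coming back at 24 kHz
        System.out.println(bytesPerChunk(24_000, 2, 100)); // prints 4800
    }
}
```

&lt;p&gt;Under those assumptions the upstream audio is about 32 KB per second, which is small next to even one video frame per second. That is one reason to keep the frame rate low.&lt;/p&gt;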

&lt;p&gt;Two function calls handle the feedback loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;submit_tip()&lt;/strong&gt;: After you speak, Gemini calls this to send live coaching&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;submit_metrics()&lt;/strong&gt;: At the end, Gemini calls this to calculate your scores&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Why Function Calls Mattered
&lt;/h3&gt;

&lt;p&gt;Here's something I learned the hard way: asking an LLM to output JSON mixed in with regular text is a mess. I started by having Gemini just output tips and metrics as text, then I'd parse them. It was unreliable. Sometimes it would paraphrase what you said wrong. Sometimes the JSON wasn't valid. Sometimes it just forgot what you were asking for.&lt;/p&gt;

&lt;p&gt;Then I switched to function calling. Instead of asking Gemini to "output this as JSON," I gave it actual functions to call. Now the coaching tips and metrics arrive as structured arguments that match a declared schema. No parsing guesswork and no malformed output to clean up.&lt;/p&gt;
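
&lt;p&gt;The practical difference shows up on the receiving side. The sketch below is a minimal Java illustration, not the IceBreaker implementation: the record fields (&lt;code&gt;message&lt;/code&gt;, &lt;code&gt;questionsAsked&lt;/code&gt;, &lt;code&gt;fillerWords&lt;/code&gt;) and the &lt;code&gt;route&lt;/code&gt; helper are hypothetical. It shows how each function call can map to a typed value that the app routes without any text parsing.&lt;/p&gt;

```java
class ToolCallDemo {
    // One type per function the model is allowed to call.
    sealed interface ToolCall permits SubmitTip, SubmitMetrics {}

    // Hypothetical argument shapes; the real schemas may differ.
    record SubmitTip(String message) implements ToolCall {}
    record SubmitMetrics(int questionsAsked, int fillerWords) implements ToolCall {}

    // Routes a call by its type instead of scanning free-form text.
    static String route(ToolCall call) {
        if (call instanceof SubmitTip tip) {
            return "TIP: " + tip.message();
        }
        if (call instanceof SubmitMetrics metrics) {
            return "METRICS: questions=" + metrics.questionsAsked()
                    + ", fillers=" + metrics.fillerWords();
        }
        throw new IllegalStateException("Unknown tool call: " + call);
    }

    public static void main(String[] args) {
        System.out.println(route(new SubmitTip("Try asking a question here")));
        System.out.println(route(new SubmitMetrics(3, 5)));
    }
}
```

&lt;p&gt;Because the set of calls is closed, an unexpected shape fails loudly at the boundary instead of slipping through as half-parsed text.&lt;/p&gt;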




&lt;h2&gt;
  
  
  What Went Wrong (And How I Fixed It)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The AI Kept Forgetting Things
&lt;/h3&gt;

&lt;p&gt;Early on, if your internet dropped for a second, the AI would lose context. It'd restart the conversation or start making stuff up. Imagine you're halfway through pitching your startup and suddenly the AI forgets what you said two turns ago. It was bad.&lt;/p&gt;

&lt;p&gt;I fixed this by saving sessions on the backend. Now if your connection drops, you can pick up right where you left off.&lt;/p&gt;

&lt;h3&gt;
  
  
  The AI Sounded Like a Chatbot
&lt;/h3&gt;

&lt;p&gt;Getting Gemini to sound like an actual person at a conference, not overly helpful and not overly eager, took a lot of tweaking. The same prompt would sometimes produce a warm, natural response and other times something super formal and robotic. It'd ask too many questions at once, or use phrases nobody actually says.&lt;/p&gt;

&lt;p&gt;I spent a lot of time on prompt engineering. Testing different personas, different conversational styles, getting more specific about what "natural" means. It's still not perfect, but it's way better.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parsing Was Killing Me
&lt;/h3&gt;

&lt;p&gt;Before I switched to function calls, I was asking Gemini to output coaching tips as text, then trying to parse that. It was fragile. The model would sometimes rewrite what you said, sometimes format things weirdly. I'd built all this parsing logic and it still broke constantly.&lt;/p&gt;

&lt;p&gt;Once I switched to function calling, it was night and day. Clean JSON every time. No more parsing headaches.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things I'm Actually Proud Of
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Getting Gemini Live to work end-to-end.&lt;/strong&gt; Real-time audio conversations that don't feel like you're talking to a bot. That's genuinely hard. WebSockets, audio streaming, keeping state, managing latency: there's a lot that can go wrong. I made it work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building something people can actually use.&lt;/strong&gt; Not just a demo that works once. I mean error handling, reconnection logic, data persistence, tracking progress over time. The boring stuff that makes a product actually useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The feedback system actually helps.&lt;/strong&gt; You don't just get random stats. The coaching tips are timely (in-the-moment), and the debrief metrics are actually actionable. You can see exactly where you improved.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Actually Learned
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Real-time streaming is fragile.&lt;/strong&gt; Audio dropping, video lag, WebSocket timeouts—there are so many places where things can break. It's not as simple as "just stream the data." You have to think about buffers, reconnection, graceful degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Never ask an LLM for JSON and try to parse it.&lt;/strong&gt; This is a hard lesson. The model will sometimes output valid JSON, sometimes not, sometimes it'll add comments, sometimes it'll mess up the schema. Function calling is the right answer. Give the model an actual function to call, not a text format to output.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering is really hard.&lt;/strong&gt; I thought I could write a prompt once and be done. Nope. The same prompt produces different outputs depending on temperature, context, the moon phase, who knows. It takes iteration, testing, examples, and luck. Don't underestimate it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shipping matters more than having the perfect feature.&lt;/strong&gt; I could spend months perfecting the persona AI. But shipping an 80% solution that people can use and give feedback on is way more valuable. You learn what actually matters from real users, not from theorizing.&lt;/p&gt;




&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;More personas and scenarios.&lt;/strong&gt; Right now there are a few conversation types. I want to expand that—different industries, different difficulty levels, different people types.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let people create custom scenarios.&lt;/strong&gt; Instead of me pre-building everything, what if you could describe an event you're going to, describe the kind of person you expect to meet, and have IceBreaker generate a practice session tailored to that? Way more useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  That's It
&lt;/h2&gt;

&lt;p&gt;Building this thing taught me that good products come from real problems. I built it because I was frustrated with my own anxiety, not because I wanted to solve "networking" for everyone. And honestly? My anxiety is still there. I'm still nervous at conferences.&lt;/p&gt;

&lt;p&gt;But now I can practice. I can work on it. And maybe that's the difference between just being stuck with something and actually being able to improve.&lt;/p&gt;

&lt;p&gt;If you're like me, if you struggle with social stuff and want to get better, try it out. If you have feedback, let me know.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://ice-breaker-ai.vercel.app/" rel="noopener noreferrer"&gt;Try IceBreaker&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/abhijith-zero/IceBreaker" rel="noopener noreferrer"&gt;View Code&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>geminiliveagentchallenge</category>
    </item>
    <item>
      <title>Building a RAG Powered Assistant with Spring AI and LM Studio</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Tue, 17 Feb 2026 00:25:34 +0000</pubDate>
      <link>https://dev.to/abhijithzero/building-a-rag-powered-financial-assistant-with-spring-boot-1dam</link>
      <guid>https://dev.to/abhijithzero/building-a-rag-powered-financial-assistant-with-spring-boot-1dam</guid>
      <description>&lt;p&gt;&lt;em&gt;How to Create an Intelligent Document Q&amp;amp;A System Using Spring AI, PostgreSQL, and LM Studio&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Imagine having an AI assistant that can instantly answer questions about hundreds of financial documents (quarterly reports, market analyses, policy papers) without you having to manually search through pages of text. That's exactly what Retrieval Augmented Generation (RAG) enables, and in this tutorial, we'll build one from scratch using Spring Boot.&lt;/p&gt;

&lt;p&gt;By the end of this guide, you'll have a fully functional application that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ingests PDF documents and extracts their content&lt;/li&gt;
&lt;li&gt;Converts text into semantic embeddings using AI models&lt;/li&gt;
&lt;li&gt;Stores embeddings in a PostgreSQL vector database&lt;/li&gt;
&lt;li&gt;Answers natural language queries with contextual accuracy&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is RAG and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;Retrieval Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant context from external knowledge sources. Instead of relying solely on the model's training data, RAG systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve&lt;/strong&gt; relevant documents based on semantic similarity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Augment&lt;/strong&gt; the LLM's prompt with retrieved context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate&lt;/strong&gt; accurate, grounded answers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This approach is particularly powerful for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enterprise knowledge bases with proprietary information&lt;/li&gt;
&lt;li&gt;Financial document analysis and compliance&lt;/li&gt;
&lt;li&gt;Customer support systems with extensive documentation&lt;/li&gt;
&lt;li&gt;Research paper exploration and literature reviews&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Architecture Overview
&lt;/h2&gt;

&lt;p&gt;Our FinanceRag application follows a straightforward yet powerful architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Document Ingestion Pipeline
&lt;/h3&gt;

&lt;p&gt;PDF documents are read from the classpath and processed by Spring AI's &lt;code&gt;ParagraphPdfDocumentReader&lt;/code&gt;, which splits the text into paragraphs while preserving document structure.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Text Chunking
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;TokenTextSplitter&lt;/code&gt; divides the extracted text into manageable chunks (about 800 tokens each with the splitter's default settings). This is crucial because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding models have token limits&lt;/li&gt;
&lt;li&gt;Smaller chunks provide more precise semantic matching&lt;/li&gt;
&lt;li&gt;Context windows in LLMs benefit from focused, relevant information&lt;/li&gt;
&lt;/ul&gt;
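
&lt;p&gt;The idea behind chunking can be sketched in a few lines of plain Java. This is a simplified illustration, not what &lt;code&gt;TokenTextSplitter&lt;/code&gt; does internally: it counts whitespace-separated words instead of model tokens, and &lt;code&gt;ChunkSketch&lt;/code&gt; and &lt;code&gt;maxWords&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```java
import java.util.Arrays;

class ChunkSketch {
    // Splits text into consecutive chunks of at most maxWords words each.
    // Words stand in for tokens here; a real splitter uses a tokenizer.
    static String[] chunk(String text, int maxWords) {
        String[] words = text.strip().split("\\s+");
        int chunkCount = (words.length + maxWords - 1) / maxWords; // ceiling division
        String[] chunks = new String[chunkCount];
        for (int i = 0; i != chunkCount; i++) {
            int start = i * maxWords;
            int end = Math.min(start + maxWords, words.length);
            chunks[i] = String.join(" ", Arrays.copyOfRange(words, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Five words with a limit of two words per chunk gives three chunks.
        System.out.println(Arrays.toString(chunk("a b c d e", 2))); // prints [a b, c d, e]
    }
}
```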

&lt;h3&gt;
  
  
  3. Vector Embedding Generation
&lt;/h3&gt;

&lt;p&gt;Each text chunk is converted into a high-dimensional vector (embedding) using the &lt;code&gt;nomic-embed-text&lt;/code&gt; model. These embeddings capture semantic meaning: similar concepts cluster together in vector space.&lt;/p&gt;
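
&lt;p&gt;"Close together" needs a concrete measure. Cosine similarity is one common choice, sketched below with tiny hand-written vectors. The vectors are toy values rather than real embeddings, &lt;code&gt;CosineSketch&lt;/code&gt; is a hypothetical name, and in the application itself the comparison happens inside pgvector, not in Java code like this.&lt;/p&gt;

```java
class CosineSketch {
    // Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new double[] {1, 0}, new double[] {1, 0})); // prints 1.0
        System.out.println(cosine(new double[] {1, 0}, new double[] {0, 1})); // prints 0.0
    }
}
```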

&lt;h3&gt;
  
  
  4. Vector Storage with pgvector
&lt;/h3&gt;

&lt;p&gt;Embeddings are persisted in PostgreSQL using the &lt;code&gt;pgvector&lt;/code&gt; extension, which enables efficient similarity searches. We use HNSW indexing for fast approximate nearest neighbor (ANN) queries.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Query Processing
&lt;/h3&gt;

&lt;p&gt;When a user asks a question:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The question is embedded using the same model&lt;/li&gt;
&lt;li&gt;Vector similarity search retrieves the most relevant document chunks&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;QuestionAnswerAdvisor&lt;/code&gt; augments the LLM prompt with this context&lt;/li&gt;
&lt;li&gt;The LLM generates a contextual answer&lt;/li&gt;
&lt;/ol&gt;
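
&lt;p&gt;Step 3 is the part that turns retrieval into RAG. The sketch below shows the general shape of an augmented prompt under simple assumptions: the wording of the template and the &lt;code&gt;buildPrompt&lt;/code&gt; helper are hypothetical, and the template that &lt;code&gt;QuestionAnswerAdvisor&lt;/code&gt; actually uses is different.&lt;/p&gt;

```java
class RagPromptSketch {
    // Combines the retrieved chunks and the user's question into one prompt.
    static String buildPrompt(String question, String... retrievedChunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer using only the context below.\n\nContext:\n");
        for (String chunk : retrievedChunks) {
            prompt.append("- ").append(chunk).append('\n');
        }
        prompt.append("\nQuestion: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        // Placeholder strings stand in for chunks returned by the vector search.
        System.out.println(buildPrompt("What does the report say?",
                "first retrieved chunk", "second retrieved chunk"));
    }
}
```

&lt;p&gt;The LLM never queries the database itself. It only sees whatever text the retrieval step placed in the prompt, which is why chunk quality matters so much.&lt;/p&gt;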

&lt;h2&gt;
  
  
  Building the Application: Step by Step
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Before diving into code, ensure you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Java 17+&lt;/strong&gt; installed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL 12+&lt;/strong&gt; with pgvector extension&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LM Studio&lt;/strong&gt; (or another OpenAI-compatible LLM endpoint)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maven 3+&lt;/strong&gt; for dependency management&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setting Up PostgreSQL with pgvector
&lt;/h3&gt;

&lt;p&gt;You have two options for setting up PostgreSQL:&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 1: Using Docker (Recommended for Quick Start)
&lt;/h4&gt;

&lt;p&gt;The repository includes a &lt;code&gt;compose.yaml&lt;/code&gt; file for easy setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;postgres&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgvector/pgvector:pg16&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;55419:5432"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;postgres&lt;/span&gt;
      &lt;span class="na"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;finance&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pgdata:/var/lib/postgresql/data&lt;/span&gt;

&lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;pgdata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Simply run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker-compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This spins up PostgreSQL with pgvector pre-installed on port 55419.&lt;/p&gt;

&lt;h4&gt;
  
  
  Option 2: Manual Installation
&lt;/h4&gt;

&lt;p&gt;First, create a database and enable the vector extension:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;DATABASE&lt;/span&gt; &lt;span class="n"&gt;finance&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="err"&gt;\&lt;/span&gt;&lt;span class="k"&gt;c&lt;/span&gt; &lt;span class="n"&gt;finance&lt;/span&gt;
&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="n"&gt;EXTENSION&lt;/span&gt; &lt;span class="n"&gt;IF&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;EXISTS&lt;/span&gt; &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The pgvector extension adds a new &lt;code&gt;vector&lt;/code&gt; data type to PostgreSQL, enabling efficient storage and querying of high-dimensional vectors.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring Spring Boot
&lt;/h3&gt;

&lt;p&gt;Your &lt;code&gt;application.properties&lt;/code&gt; file should include:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Application Name
&lt;/span&gt;&lt;span class="py"&gt;spring.application.name&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;financeRag&lt;/span&gt;

&lt;span class="c"&gt;# Database Configuration (Docker setup)
&lt;/span&gt;&lt;span class="py"&gt;spring.datasource.url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;jdbc:postgresql://localhost:55419/finance&lt;/span&gt;
&lt;span class="py"&gt;spring.datasource.username&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;
&lt;span class="py"&gt;spring.datasource.password&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;postgres&lt;/span&gt;

&lt;span class="c"&gt;# LLM Configuration (LM Studio)
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.openai.base-url&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;http://localhost:1234/&lt;/span&gt;
&lt;span class="py"&gt;spring.ai.openai.api-key&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;dummy&lt;/span&gt;

&lt;span class="c"&gt;# Embedding Model
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.openai.embedding.options.model&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;nomic-embed-text&lt;/span&gt;

&lt;span class="c"&gt;# Chat Model
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.openai.chat.options.model&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;google/gemma-3-4b&lt;/span&gt;

&lt;span class="c"&gt;# Vector Store Configuration
&lt;/span&gt;&lt;span class="py"&gt;spring.ai.vectorstore.pgvector.initialize-schema&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Ingestion Control (IMPORTANT!)
&lt;/span&gt;&lt;span class="py"&gt;financerag.ingest.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Configuration Notes:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Port &lt;code&gt;55419&lt;/code&gt; matches the Docker Compose setup&lt;/li&gt;
&lt;li&gt;Setting &lt;code&gt;initialize-schema=true&lt;/code&gt; creates the vector store table automatically&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;nomic-embed-text&lt;/code&gt; is a lightweight, high-quality embedding model&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;google/gemma-3-4b&lt;/code&gt; is the chat model served by LM Studio&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Important: Ingestion Control&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;financerag.ingest.enabled&lt;/code&gt; property controls whether documents are ingested at startup:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First Run (Initial Setup):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;financerag.ingest.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This processes your PDFs and populates the vector store.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Subsequent Runs:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;financerag.ingest.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This skips ingestion and starts the application immediately. The embeddings are already in PostgreSQL, so there's no need to re-process documents every time!&lt;/p&gt;

&lt;p&gt;This design prevents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicate embeddings in the database&lt;/li&gt;
&lt;li&gt;Slow startup times on every restart&lt;/li&gt;
&lt;li&gt;Unnecessary calls to the embedding model&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Setting Up LM Studio
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Download and Install LM Studio&lt;/strong&gt; from &lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;lmstudio.ai&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Download the Required Models:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embedding Model:&lt;/strong&gt; Search for "nomic-embed-text" in LM Studio and download it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chat Model:&lt;/strong&gt; Search for "google/gemma-3-4b" (or similar) and download it&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Start the Local Server:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Open LM Studio&lt;/li&gt;
&lt;li&gt;Go to the "Local Server" tab&lt;/li&gt;
&lt;li&gt;Select your chat model (gemma-3-4b)&lt;/li&gt;
&lt;li&gt;Click "Start Server" (it will run on &lt;code&gt;http://localhost:1234&lt;/code&gt; by default)&lt;/li&gt;
&lt;li&gt;Ensure the embedding model is also loaded&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Verify the Connection:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   curl http://localhost:1234/v1/models
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see your loaded models listed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Ingestion Service
&lt;/h3&gt;

&lt;p&gt;The heart of our document processing pipeline is the &lt;code&gt;IngestionService&lt;/code&gt;. Here's how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
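&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.slf4j.Logger&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.slf4j.LoggerFactory&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.ai.reader.pdf.ParagraphPdfDocumentReader&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.ai.transformer.splitter.TextSplitter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.ai.transformer.splitter.TokenTextSplitter&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.ai.vectorstore.VectorStore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.beans.factory.annotation.Value&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.boot.CommandLineRunner&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.boot.autoconfigure.condition.ConditionalOnProperty&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.core.io.Resource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;org.springframework.stereotype.Component&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

&lt;span class="nd"&gt;@Component&lt;/span&gt;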
&lt;span class="nd"&gt;@ConditionalOnProperty&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"financerag.ingest.enabled"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;havingValue&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"true"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;matchIfMissing&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;IngestionService&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;CommandLineRunner&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;Logger&lt;/span&gt; &lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoggerFactory&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getLogger&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;IngestionService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="nd"&gt;@Value&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"classpath:/docs/article.pdf"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="nc"&gt;Resource&lt;/span&gt; &lt;span class="n"&gt;pdfResource&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;IngestionService&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;VectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;vectorStore&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="kd"&gt;throws&lt;/span&gt; &lt;span class="nc"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Starting data ingestion process..."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// 1. Read PDF using paragraph-based reader&lt;/span&gt;
        &lt;span class="kt"&gt;var&lt;/span&gt; &lt;span class="n"&gt;pdfReader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;ParagraphPdfDocumentReader&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdfResource&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

        &lt;span class="c1"&gt;// 2. Split text into chunks&lt;/span&gt;
        &lt;span class="nc"&gt;TextSplitter&lt;/span&gt; &lt;span class="n"&gt;splitter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;TokenTextSplitter&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

        &lt;span class="c1"&gt;// 3. Process and store in vector database&lt;/span&gt;
        &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;splitter&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;apply&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pdfReader&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;get&lt;/span&gt;&lt;span class="o"&gt;()));&lt;/span&gt;

        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;info&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Vector store updated with PDF content."&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Key Implementation Insights:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Conditional Ingestion&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;@ConditionalOnProperty&lt;/code&gt; annotation creates the ingestion bean only when the property is explicitly set to &lt;code&gt;true&lt;/code&gt;, so ingestion runs only when you enable it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable ingestion on first run
&lt;/span&gt;&lt;span class="py"&gt;financerag.ingest.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;

&lt;span class="c"&gt;# Disable after initial setup to avoid re-ingesting
&lt;/span&gt;&lt;span class="py"&gt;financerag.ingest.enabled&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prevents re-processing documents on every application restart!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. CommandLineRunner Interface&lt;/strong&gt;&lt;br&gt;
By implementing &lt;code&gt;CommandLineRunner&lt;/code&gt;, the ingestion runs automatically at startup, once the application context has been created and before Spring Boot reports the application as ready.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. ParagraphPdfDocumentReader vs PagePdfDocumentReader&lt;/strong&gt;&lt;br&gt;
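The code uses &lt;code&gt;ParagraphPdfDocumentReader&lt;/code&gt;, which relies on the PDF's catalog (its table of contents) to locate paragraphs, so it only works on PDFs that include one. Compared with &lt;code&gt;PagePdfDocumentReader&lt;/code&gt;, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preserves document structure better by respecting paragraph boundaries&lt;/li&gt;
&lt;li&gt;Creates more semantically meaningful chunks&lt;/li&gt;
&lt;li&gt;Is better suited for financial documents with structured content&lt;/li&gt;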
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;4. Simplified API&lt;/strong&gt;&lt;br&gt;
The &lt;code&gt;vectorStore.accept()&lt;/code&gt; method elegantly handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Embedding generation for each chunk&lt;/li&gt;
&lt;li&gt;Batch insertion into PostgreSQL&lt;/li&gt;
&lt;li&gt;All the complexity hidden behind a clean API&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  The Chat Controller
&lt;/h3&gt;

&lt;p&gt;Now let's expose a REST endpoint for queries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RestController&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ChatController&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;ChatClient&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;ChatController&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ChatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;Builder&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;PgVectorStore&lt;/span&gt; &lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;chatClient&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;builder&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;defaultAdvisors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;QuestionAnswerAdvisor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vectorStore&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;())&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@GetMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/chat"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestParam&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;chatClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;call&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
            &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Magic of QuestionAnswerAdvisor:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;QuestionAnswerAdvisor&lt;/code&gt; is where RAG happens. Behind the scenes, it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Converts the user's question into an embedding&lt;/li&gt;
&lt;li&gt;Performs a similarity search against the vector store&lt;/li&gt;
&lt;li&gt;Injects the most relevant document chunks into the prompt&lt;/li&gt;
&lt;li&gt;Sends the augmented prompt to the LLM&lt;/li&gt;
&lt;/ul&gt;
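&lt;p&gt;To make those steps concrete, here is a minimal plain-Java sketch of the retrieval and prompt-augmentation part. It is not Spring AI's implementation: the class &lt;code&gt;RagFlowSketch&lt;/code&gt;, its method names, and the prompt layout are hypothetical, and the embeddings are assumed to be precomputed vectors rather than the output of a real model.&lt;/p&gt;

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the retrieve-then-augment flow; not Spring AI's actual code. */
class RagFlowSketch {

    /** Cosine similarity between two embedding vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i != a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Similarity search: indices of the topK stored chunks closest to the question embedding. */
    static int[] mostSimilar(double[] question, double[][] chunkEmbeddings, int topK) {
        Integer[] order = new Integer[chunkEmbeddings.length];
        for (int i = 0; i != order.length; i++) {
            order[i] = i;
        }
        // Sort chunk indices by descending similarity to the question.
        Arrays.sort(order, Comparator.comparingDouble(i -> -cosine(question, chunkEmbeddings[i])));
        int[] picked = new int[Math.min(topK, order.length)];
        for (int i = 0; i != picked.length; i++) {
            picked[i] = order[i];
        }
        return picked;
    }

    /** Prompt augmentation: place the retrieved chunks ahead of the user's question. */
    static String augment(String question, String[] chunks, int[] picked) {
        StringBuilder context = new StringBuilder();
        for (int index : picked) {
            context.append("- ").append(chunks[index]).append('\n');
        }
        return "Context:\n" + context + "\nQuestion: " + question;
    }
}
```

&lt;p&gt;In the real application the advisor performs the equivalent work against pgvector, so none of this code is needed; the sketch only shows why the retrieved chunks end up in the prompt the LLM sees.&lt;/p&gt;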

&lt;p&gt;&lt;strong&gt;Key Implementation Details:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The advisor is built using the builder pattern: &lt;code&gt;QuestionAnswerAdvisor.builder(vectorStore).build()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Spring AI automatically handles the vector search and context injection&lt;/li&gt;
&lt;li&gt;The controller method is elegantly simple - just pass the question through the chat client&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Real World Considerations
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Choosing the Right Chunk Size
&lt;/h3&gt;

&lt;p&gt;The 800-token chunk size is a starting point. Consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Smaller chunks (200-400 tokens):&lt;/strong&gt; Better precision, but may lose context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Larger chunks (1000-1500 tokens):&lt;/strong&gt; More context, but less precise matching&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Experiment with your specific use case. Financial reports might need larger chunks to preserve numerical context, while FAQs work better with smaller, focused chunks.&lt;/p&gt;
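&lt;p&gt;One way to get a feel for the trade-off is to chunk the same text at different sizes and compare the results. The sketch below is a hypothetical word-based chunker (words stand in for tokens, and the &lt;code&gt;overlap&lt;/code&gt; parameter is an addition not used elsewhere in this article); it is not how Spring AI's &lt;code&gt;TokenTextSplitter&lt;/code&gt; works internally.&lt;/p&gt;

```java
import java.util.Arrays;

/** Hypothetical word-based chunker; words stand in for tokens. */
class ChunkSketch {

    /** Splits text into chunks of at most chunkSize words, repeating `overlap` words between neighbours. */
    static String[] chunk(String text, int chunkSize, int overlap) {
        if (0 >= chunkSize || 0 > overlap || overlap >= chunkSize) {
            throw new IllegalArgumentException("need chunkSize > overlap >= 0");
        }
        if (text.isBlank()) {
            return new String[0];
        }
        String[] words = text.trim().split("\\s+");
        int step = chunkSize - overlap;
        // Words left over after the first chunk, rounded up to whole steps.
        int remaining = Math.max(0, words.length - chunkSize);
        int count = 1 + (remaining + step - 1) / step;
        String[] chunks = new String[count];
        for (int i = 0; i != count; i++) {
            int start = i * step;
            int end = Math.min(start + chunkSize, words.length);
            chunks[i] = String.join(" ", Arrays.copyOfRange(words, start, end));
        }
        return chunks;
    }
}
```

&lt;p&gt;Calling &lt;code&gt;ChunkSketch.chunk(text, 4, 1)&lt;/code&gt; on a ten-word string yields three chunks that each share one word with their neighbour; raising the chunk size yields fewer, broader chunks.&lt;/p&gt;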

&lt;h3&gt;
  
  
  Scaling to Production
&lt;/h3&gt;

&lt;p&gt;For production deployments, consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Async ingestion:&lt;/strong&gt; Move document processing to background jobs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching:&lt;/strong&gt; Cache embeddings for frequently accessed documents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata filtering:&lt;/strong&gt; Add tags (date, category, source) to narrow searches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitoring:&lt;/strong&gt; Track query latency and similarity scores&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Hybrid Search Strategies
&lt;/h3&gt;

&lt;p&gt;Pure vector search isn't always optimal. Combine it with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full-text search:&lt;/strong&gt; For exact keyword matches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;BM25 ranking:&lt;/strong&gt; Traditional relevance scoring&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-ranking:&lt;/strong&gt; Use a cross-encoder model to refine top results&lt;/li&gt;
&lt;/ul&gt;
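&lt;p&gt;As a rough illustration of combining signals, the sketch below blends a vector-similarity score with a keyword score using a weight &lt;code&gt;alpha&lt;/code&gt;. Everything here is an assumption made for illustration: the class name, the weighted-sum formula, and especially the keyword scorer, which is a crude term-overlap stand-in and not BM25 or PostgreSQL full-text search.&lt;/p&gt;

```java
/** Hypothetical score blending; the keyword scorer is a crude stand-in, not BM25. */
class HybridScoreSketch {

    /** Fraction of whitespace-separated query terms that appear as whole words in the text. */
    static double keywordScore(String query, String text) {
        String[] terms = query.toLowerCase().trim().split("\\s+");
        String haystack = " " + text.toLowerCase() + " ";
        int hits = 0;
        for (String term : terms) {
            if (haystack.contains(" " + term + " ")) {
                hits++;
            }
        }
        return (double) hits / terms.length;
    }

    /** Weighted sum: alpha = 1 means pure vector search, alpha = 0 means pure keyword search. */
    static double hybrid(double vectorScore, double keywordScore, double alpha) {
        return alpha * vectorScore + (1 - alpha) * keywordScore;
    }
}
```

&lt;p&gt;A re-ranking stage would then sort candidates by the blended score before the top results are passed to the LLM.&lt;/p&gt;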

&lt;h2&gt;
  
  
  Testing Your RAG System
&lt;/h2&gt;

&lt;p&gt;Start the application and test with curl:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:8080/chat?question=What%20were%20the%20key%20trends%20in%20Q4%20earnings?"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see an answer grounded in your ingested documents. Compare responses with and without RAG to appreciate the difference in accuracy and relevance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Pitfalls and Solutions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Embedding Dimension Mismatch
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Embeddings fail to store with dimension errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Ensure &lt;code&gt;spring.ai.vectorstore.pgvector.dimensions&lt;/code&gt; matches your embedding model. For &lt;code&gt;nomic-embed-text&lt;/code&gt;, use 768.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Poor Retrieval Quality
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Answers don't align with document content.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Adjust chunk size, increase &lt;code&gt;topK&lt;/code&gt;, or lower the similarity threshold. Also verify your embedding model is appropriate for your domain.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Memory Issues During Ingestion
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt; Application crashes with &lt;code&gt;OutOfMemoryError&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Process documents in batches, increase JVM heap size (&lt;code&gt;-Xmx4g&lt;/code&gt;), or limit the &lt;code&gt;maxNumChunks&lt;/code&gt; parameter.&lt;/p&gt;
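&lt;p&gt;Batching can be sketched as follows. The &lt;code&gt;BatchSink&lt;/code&gt; interface and &lt;code&gt;ingestInBatches&lt;/code&gt; method are hypothetical names; in this project the sink could plausibly wrap a call to &lt;code&gt;vectorStore.accept(...)&lt;/code&gt;, but that wiring is an assumption. Note that this only bounds how much is handed to the store at once; it does not avoid loading the whole document first.&lt;/p&gt;

```java
import java.util.Arrays;

/** Hypothetical batching helper for already-split chunks. */
class BatchIngestSketch {

    /** Hypothetical callback that receives one batch at a time. */
    interface BatchSink {
        void accept(String[] batch);
    }

    /** Hands the chunks to the sink in slices of at most batchSize; returns the number of batches sent. */
    static int ingestInBatches(String[] chunks, int batchSize, BatchSink sink) {
        if (1 > batchSize) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        for (int start = 0; chunks.length > start; start += batchSize) {
            int end = Math.min(start + batchSize, chunks.length);
            sink.accept(Arrays.copyOfRange(chunks, start, end));
            batches++;
        }
        return batches;
    }
}
```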

&lt;h2&gt;
  
  
  Extending FinanceRag: Ideas for Enhancement
&lt;/h2&gt;

&lt;p&gt;This project is a foundation. Here are some powerful extensions:&lt;/p&gt;

&lt;h3&gt;
  
  
  Multi-Document Support
&lt;/h3&gt;

&lt;p&gt;Instead of hardcoding a single PDF, scan a directory or accept uploads via REST API. Add metadata (filename, upload date) to enable filtered searches.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conversational Memory
&lt;/h3&gt;

&lt;p&gt;Implement session-based chat history so users can ask follow-up questions without repeating context. Spring AI supports this with &lt;code&gt;MessageChatMemoryAdvisor&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Source Attribution
&lt;/h3&gt;

&lt;p&gt;Return not just the answer but citations showing which document chunks were used. This builds trust and allows users to verify information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Analytics
&lt;/h3&gt;

&lt;p&gt;Track which documents are queried most frequently, average similarity scores, and query patterns to identify knowledge gaps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You've now built a working RAG system that answers questions grounded in your own documents. With the production considerations above in place, the same architecture can grow to thousands of documents and be adapted for use cases such as customer support, legal document analysis, medical research, and more.&lt;/p&gt;

&lt;p&gt;The beauty of Spring AI is how it abstracts the complexity of embeddings, vector stores, and LLM orchestration, letting you focus on business logic. With just three components (IngestionService, ChatController, and pgvector), we've created a powerful AI assistant.&lt;/p&gt;

&lt;p&gt;The full source code for FinanceRag is available on &lt;a href="https://github.com/abhijith-zero/FinanceRag" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Clone it, experiment with different models and chunk sizes, and adapt it to your domain. The future of enterprise AI is built on foundations like these, which combine the power of LLMs with your organization's proprietary knowledge.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Special Thanks:&lt;/strong&gt; This project was inspired by the excellent Spring AI content from &lt;a href="https://www.danvega.dev/" rel="noopener noreferrer"&gt;Dan Vega&lt;/a&gt;, whose tutorials have helped countless developers understand the power of RAG architectures.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Happy coding, and may your AI assistants always retrieve the right context!&lt;/em&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  About the Author
&lt;/h2&gt;

&lt;p&gt;This tutorial is brought to you by &lt;a href="https://abhizero.vercel.app/" rel="noopener noreferrer"&gt;Abhijith Rajesh&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Links:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/abhijith-zero/FinanceRag" rel="noopener noreferrer"&gt;GitHub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.spring.io/spring-ai/reference/" rel="noopener noreferrer"&gt;Spring AI Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://lmstudio.ai/" rel="noopener noreferrer"&gt;LM Studio&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




</description>
      <category>ai</category>
      <category>rag</category>
      <category>springboot</category>
      <category>llm</category>
    </item>
    <item>
      <title>Top 5 Infrastructure-Level Techniques to Handle High Traffic in Spring Boot: Part 2</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Sun, 13 Jul 2025 18:26:02 +0000</pubDate>
      <link>https://dev.to/abhijithzero/top-5-infrastructure-level-techniques-to-handle-high-traffic-in-spring-boot-part-2-3g48</link>
      <guid>https://dev.to/abhijithzero/top-5-infrastructure-level-techniques-to-handle-high-traffic-in-spring-boot-part-2-3g48</guid>
      <description>&lt;p&gt;In &lt;a href="https://dev.to/abhijithzero/top-5-code-level-techniques-to-handle-high-traffic-in-spring-boot-part-1-29en"&gt;Part 1&lt;/a&gt; of this blog series, we focused on &lt;strong&gt;code-level techniques&lt;/strong&gt; to make your Spring Boot APIs more resilient: connection pooling, caching, async processing, rate limiting, and circuit breakers.&lt;/p&gt;

&lt;p&gt;But when traffic &lt;strong&gt;really&lt;/strong&gt; surges — due to a flash sale, viral feature, or seasonal peak — smart code alone may not be enough.&lt;/p&gt;

&lt;p&gt;That’s where &lt;strong&gt;infrastructure-level strategies&lt;/strong&gt; come in.&lt;/p&gt;

&lt;p&gt;From auto-scaling groups and load balancers to observability, CDNs, and container orchestration — these tools and patterns ensure your backend scales &lt;strong&gt;horizontally&lt;/strong&gt;, responds &lt;strong&gt;intelligently&lt;/strong&gt;, and recovers &lt;strong&gt;automatically&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Let’s break down how you can build an infrastructure that’s ready for real-world traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  1.  Load Balancing
&lt;/h2&gt;

&lt;p&gt;When thousands (or millions) of users start hitting your application, routing &lt;strong&gt;all that traffic to a single server&lt;/strong&gt; is a recipe for disaster. That's where &lt;strong&gt;load balancers&lt;/strong&gt; come in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Load Balancing?
&lt;/h3&gt;

&lt;p&gt;Load balancing is the process of &lt;strong&gt;distributing incoming requests across multiple instances&lt;/strong&gt; of your application, so that no single server gets overwhelmed.&lt;/p&gt;

&lt;p&gt;It ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;High availability&lt;/strong&gt; (if one instance goes down, others take over)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better performance&lt;/strong&gt; (requests are split evenly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability&lt;/strong&gt; (you can add/remove servers dynamically)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of it like a traffic cop that routes vehicles (requests) evenly across open lanes (app instances).&lt;/p&gt;

&lt;h3&gt;
  
  
  L4 vs L7 Load Balancing
&lt;/h3&gt;

&lt;p&gt;There are two main types of load balancing:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L4 (Transport Layer)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routes traffic based on IP address and port (TCP/UDP)&lt;/td&gt;
&lt;td&gt;Fast, protocol-agnostic routing of raw TCP/UDP traffic (e.g., database connections or TLS pass-through)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;L7 (Application Layer)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Routes based on request content (URL path, headers, cookies)&lt;/td&gt;
&lt;td&gt;Direct &lt;code&gt;/api/users&lt;/code&gt; to user-service and &lt;code&gt;/api/orders&lt;/code&gt; to order-service&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Tip:&lt;/strong&gt; Most modern apps use &lt;strong&gt;L7 load balancing&lt;/strong&gt; because it provides more control and intelligent routing.&lt;/p&gt;
&lt;/blockquote&gt;
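&lt;p&gt;The L7 example from the table, combined with even distribution across instances, can be sketched in a few lines of Java. This is a toy illustration, not how NGINX or an ALB is implemented: the class name, the instance names, and the choice of round-robin selection are all assumptions.&lt;/p&gt;

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical L7 router: picks a pool by URL path, then rotates through its instances. */
class L7RouterSketch {
    private final String[] userPool;
    private final String[] orderPool;
    private final AtomicInteger userNext = new AtomicInteger();
    private final AtomicInteger orderNext = new AtomicInteger();

    L7RouterSketch(String[] userPool, String[] orderPool) {
        this.userPool = userPool;
        this.orderPool = orderPool;
    }

    /** Returns the instance that should handle the request for the given path. */
    String route(String path) {
        if (path.startsWith("/api/users")) {
            return pick(userPool, userNext);
        }
        if (path.startsWith("/api/orders")) {
            return pick(orderPool, orderNext);
        }
        throw new IllegalArgumentException("No route for " + path);
    }

    /** Round-robin selection; floorMod keeps the index valid even if the counter overflows. */
    private static String pick(String[] pool, AtomicInteger counter) {
        int index = Math.floorMod(counter.getAndIncrement(), pool.length);
        return pool[index];
    }
}
```

&lt;p&gt;An L4 balancer could not make the path-based decision in &lt;code&gt;route&lt;/code&gt;, because it never inspects the HTTP request.&lt;/p&gt;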

&lt;h3&gt;
  
  
  Popular Load Balancers
&lt;/h3&gt;

&lt;p&gt;Here are some tools you can use depending on your environment:&lt;/p&gt;

&lt;h4&gt;
  
  
  - NGINX
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Lightweight and widely used L7 load balancer&lt;/li&gt;
&lt;li&gt;Great for self-managed or on-prem deployments&lt;/li&gt;
&lt;li&gt;Can route based on path, headers, or even cookie values&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  - AWS Application Load Balancer (ALB)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Fully managed L7 load balancer in AWS&lt;/li&gt;
&lt;li&gt;Works seamlessly with EC2, ECS, EKS, etc.&lt;/li&gt;
&lt;li&gt;Supports auto-scaling + health checks&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  - Spring Cloud Gateway
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Java-based API gateway built on Spring Boot + Reactor&lt;/li&gt;
&lt;li&gt;Ideal for microservices and reactive apps&lt;/li&gt;
&lt;li&gt;Can be used for dynamic routing, rate limiting, and circuit breaking&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  2.  Auto Scaling Groups (ASGs)
&lt;/h2&gt;

&lt;p&gt;No matter how well you’ve tuned your code or balanced your load, there’s a limit to what a single instance of your application can handle.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Auto Scaling Groups (ASGs)&lt;/strong&gt; let you automatically adjust the number of application instances based on real-time traffic and performance — scaling &lt;strong&gt;out&lt;/strong&gt; during spikes and &lt;strong&gt;in&lt;/strong&gt; when things are quiet.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is an Auto Scaling Group?
&lt;/h3&gt;

&lt;p&gt;An Auto Scaling Group is an AWS construct that manages a &lt;strong&gt;group of virtual machines&lt;/strong&gt; (EC2 instances) running your app. Azure and GCP offer equivalents (Virtual Machine Scale Sets and managed instance groups).&lt;/p&gt;

&lt;p&gt;It can automatically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Scale out&lt;/strong&gt;: Add more instances when load increases&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Scale in&lt;/strong&gt;: Remove excess instances when traffic drops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures your app has &lt;strong&gt;just enough capacity&lt;/strong&gt; — not too little (which causes downtime) and not too much (which wastes money).&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Scaling Triggers
&lt;/h3&gt;

&lt;p&gt;ASGs respond to key metrics like:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;CPU Utilization&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scale out when CPU &amp;gt; 70% for X minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Request Count&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scale based on incoming HTTP request rate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Latency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scale if average response time increases&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Custom Metrics&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Queue length, memory usage, DB connections&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;You can configure these in tools like &lt;strong&gt;AWS CloudWatch&lt;/strong&gt; or &lt;strong&gt;Kubernetes HPA&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Horizontal vs Vertical Scaling
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Example&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Vertical Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Increase resources on a single machine (CPU, RAM)&lt;/td&gt;
&lt;td&gt;Upgrade from t3.small → t3.large&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Horizontal Scaling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Add more instances of the app&lt;/td&gt;
&lt;td&gt;Launch 3 → 10 EC2 instances&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Horizontal scaling&lt;/strong&gt; (ASG) is preferred for high availability and fault tolerance.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Warm vs Cold Starts
&lt;/h3&gt;

&lt;p&gt;When an ASG scales out, new instances need to &lt;strong&gt;boot up&lt;/strong&gt;, pull code, and initialize. This startup delay, typically 30–90 seconds, is called a &lt;strong&gt;cold start&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To reduce cold start impact:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;AMIs&lt;/strong&gt; or &lt;strong&gt;Docker images&lt;/strong&gt; preloaded with your app&lt;/li&gt;
&lt;li&gt;Prefer &lt;strong&gt;warm pools&lt;/strong&gt; or &lt;strong&gt;pre-provisioned containers&lt;/strong&gt; (ECS, EKS)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: ASG in AWS
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;You set up an ASG with:

&lt;ul&gt;
&lt;li&gt;Min size: 2 instances
&lt;/li&gt;
&lt;li&gt;Max size: 10 instances
&lt;/li&gt;
&lt;li&gt;Scale out when CPU &amp;gt; 70% for 3 mins
&lt;/li&gt;
&lt;li&gt;Scale in when CPU &amp;lt; 30% for 5 mins&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;At low traffic, it runs 2 instances. During a traffic spike, it can scale out to as many as 10 instances automatically, with no manual intervention.&lt;/p&gt;
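&lt;p&gt;The rules in this example can be written out as a small decision function. In practice an ASG is configured through scaling policies rather than coded, so treat this purely as a sketch: the class name, the one-sample-per-minute input, and the one-instance-per-evaluation step size are assumptions.&lt;/p&gt;

```java
/** Hypothetical scaling rule mirroring the example thresholds; real ASGs are configured, not coded. */
class ScalingPolicySketch {
    static final int MIN_INSTANCES = 2;
    static final int MAX_INSTANCES = 10;

    /** True when each of the last `minutes` samples is above (or below) the threshold. */
    static boolean sustained(double[] cpuPerMinute, int minutes, double threshold, boolean above) {
        if (minutes > cpuPerMinute.length) {
            return false; // not enough history yet
        }
        for (int i = cpuPerMinute.length - minutes; i != cpuPerMinute.length; i++) {
            boolean hit = above ? cpuPerMinute[i] > threshold : threshold > cpuPerMinute[i];
            if (!hit) {
                return false;
            }
        }
        return true;
    }

    /** Adds or removes one instance per evaluation (the step size is an assumption). */
    static int desiredInstances(int current, double[] cpuPerMinute) {
        if (sustained(cpuPerMinute, 3, 70.0, true)) {
            return Math.min(current + 1, MAX_INSTANCES);
        }
        if (sustained(cpuPerMinute, 5, 30.0, false)) {
            return Math.max(current - 1, MIN_INSTANCES);
        }
        return current;
    }
}
```

&lt;p&gt;Requiring the condition to hold for several consecutive minutes keeps a single noisy sample from triggering a scale event.&lt;/p&gt;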

&lt;h3&gt;
  
  
  Spring Boot Compatibility
&lt;/h3&gt;

&lt;p&gt;Spring Boot apps work well in auto-scaling environments when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They are &lt;strong&gt;stateless&lt;/strong&gt; (no in-memory session data)&lt;/li&gt;
&lt;li&gt;Configs like DB connections and cache clients are tuned for dynamic environments&lt;/li&gt;
&lt;li&gt;Health checks (like &lt;code&gt;/actuator/health&lt;/code&gt;) are configured properly&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Auto Scaling gives you elasticity — your app grows and shrinks with your traffic, keeping costs down and uptime high.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  3.  Containerization &amp;amp; Orchestration
&lt;/h2&gt;

&lt;p&gt;Scaling manually — provisioning servers, installing dependencies, deploying code — becomes a bottleneck as traffic increases. That’s why modern Spring Boot applications are &lt;strong&gt;containerized&lt;/strong&gt; with tools like &lt;strong&gt;Docker&lt;/strong&gt; and managed by orchestration platforms like &lt;strong&gt;Kubernetes&lt;/strong&gt; or &lt;strong&gt;AWS ECS&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is Containerization?
&lt;/h3&gt;

&lt;p&gt;Containerization packages your app and its dependencies into a &lt;strong&gt;self-contained unit&lt;/strong&gt; that runs anywhere — consistently.&lt;/p&gt;

&lt;p&gt;Popular tool:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Docker&lt;/strong&gt; — the most widely used container platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Docker, you can "bake" your Spring Boot app into an image using a &lt;code&gt;Dockerfile&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  📄 Example Dockerfile:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; openjdk:17&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; target/myapp.jar app.jar&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["java", "-jar", "app.jar"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why Containers Help Handle High Traffic
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; &lt;strong&gt;Fast startup&lt;/strong&gt;: Containers boot in seconds, perfect for scaling.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Consistency&lt;/strong&gt;: "It works on my machine" becomes irrelevant.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Portability&lt;/strong&gt;: Works across environments — cloud, local, CI/CD.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Isolation&lt;/strong&gt;: Each app instance runs independently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During traffic spikes, containers let you scale &lt;strong&gt;quickly and cleanly&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is Orchestration?
&lt;/h3&gt;

&lt;p&gt;After containerizing your app, you need a system to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start and stop containers
&lt;/li&gt;
&lt;li&gt;Restart failed ones
&lt;/li&gt;
&lt;li&gt;Scale based on load
&lt;/li&gt;
&lt;li&gt;Handle networking between services
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is called &lt;strong&gt;container orchestration&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Popular Orchestration Tools
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cloud-agnostic, powerful container orchestrator&lt;/td&gt;
&lt;td&gt;Complex, production-grade deployments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS ECS&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;AWS-managed orchestration for Docker containers&lt;/td&gt;
&lt;td&gt;AWS-native apps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS Fargate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Serverless compute engine for ECS and EKS containers (no servers to manage)&lt;/td&gt;
&lt;td&gt;Quick, scalable deployments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;A common stack today: &lt;strong&gt;Spring Boot + Docker + Kubernetes&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  4.  CDN &amp;amp; Edge Caching
&lt;/h2&gt;

&lt;p&gt;When your APIs or static assets are publicly accessible, you don’t want every request to hit your Spring Boot server — especially during traffic spikes.&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;CDNs (Content Delivery Networks)&lt;/strong&gt; and &lt;strong&gt;edge caching&lt;/strong&gt; come in.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Is a CDN?
&lt;/h3&gt;

&lt;p&gt;A CDN is a network of geographically distributed servers that &lt;strong&gt;cache and serve content closer to the user&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of serving static files (images, CSS, JS) or even public APIs from your origin server every time, a CDN:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces latency&lt;/li&gt;
&lt;li&gt;Caches content near the user&lt;/li&gt;
&lt;li&gt;Shields your backend from spikes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Common CDNs
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;CDN Service&lt;/th&gt;
&lt;th&gt;Ideal Use Case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloudflare&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Static content, public APIs, free tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AWS CloudFront&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep AWS integration, S3, Lambda@Edge&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Fastly&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time edge logic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Akamai&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Enterprise-grade, massive scale&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  What You Can Cache
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Images, stylesheets, JS bundles&lt;/li&gt;
&lt;li&gt;Product listings or public blogs&lt;/li&gt;
&lt;li&gt;Public GET endpoints (e.g., &lt;code&gt;/products&lt;/code&gt;, &lt;code&gt;/news&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;API responses with &lt;code&gt;Cache-Control&lt;/code&gt; headers&lt;/li&gt;
&lt;/ul&gt;
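&lt;p&gt;Whether a CDN may cache a response is usually signalled through the &lt;code&gt;Cache-Control&lt;/code&gt; header. The helper below is a hypothetical sketch of choosing a header value per endpoint; the class name and the specific policy (public content cached for a fixed time, everything else not stored) are assumptions, not a recommendation for every API.&lt;/p&gt;

```java
import java.time.Duration;

/** Hypothetical helper that picks a Cache-Control value for a response. */
class CacheHeaderSketch {

    /** Builds a Cache-Control value; the policy choices here are illustrative assumptions. */
    static String cacheControl(boolean publicContent, Duration maxAge) {
        if (!publicContent) {
            // Per-user responses should not be stored by a shared cache such as a CDN.
            return "private, no-store";
        }
        return "public, max-age=" + maxAge.getSeconds();
    }
}
```

&lt;p&gt;For example, a public &lt;code&gt;/products&lt;/code&gt; listing cached for five minutes would be sent with &lt;code&gt;Cache-Control: public, max-age=300&lt;/code&gt;.&lt;/p&gt;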

&lt;h3&gt;
  
  
  Benefits in High Traffic
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt; Faster response time globally&lt;/li&gt;
&lt;li&gt; Offloads requests from backend&lt;/li&gt;
&lt;li&gt; Protects origin via DDoS shielding&lt;/li&gt;
&lt;li&gt; Handles traffic spikes better than your server alone&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  5.  Observability &amp;amp; Load Testing
&lt;/h2&gt;

&lt;p&gt;You can’t scale or debug what you can’t see. When your APIs are under heavy load, things can go wrong — services might slow down, databases could become bottlenecks, or dependencies might fail.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Observability + Load Testing&lt;/strong&gt; helps you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect bottlenecks&lt;/li&gt;
&lt;li&gt;Understand failure points&lt;/li&gt;
&lt;li&gt;Prepare for real-world traffic&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Is Observability?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; means your system can answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What’s happening?&lt;/strong&gt; → Metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What happened?&lt;/strong&gt; → Logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why did it happen?&lt;/strong&gt; → Traces&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Think of it as a monitoring + debugging toolkit for production.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Key Tools for Observability
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Logging&lt;/td&gt;
&lt;td&gt;Logback, Log4j2, Loki&lt;/td&gt;
&lt;td&gt;Application-level logs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Metrics&lt;/td&gt;
&lt;td&gt;Micrometer + Prometheus&lt;/td&gt;
&lt;td&gt;JVM, HTTP, DB metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tracing&lt;/td&gt;
&lt;td&gt;OpenTelemetry, Zipkin&lt;/td&gt;
&lt;td&gt;Distributed request tracing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dashboards&lt;/td&gt;
&lt;td&gt;Grafana&lt;/td&gt;
&lt;td&gt;Visualize data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alerts&lt;/td&gt;
&lt;td&gt;Alertmanager, CloudWatch&lt;/td&gt;
&lt;td&gt;Notify on failures/thresholds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Metrics in Spring Boot with Prometheus
&lt;/h3&gt;

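&lt;p&gt;Add the Micrometer Prometheus registry to your Spring Boot project (the &lt;code&gt;/actuator/prometheus&lt;/code&gt; endpoint also requires &lt;code&gt;spring-boot-starter-actuator&lt;/code&gt;):&lt;br&gt;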
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- pom.xml --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.micrometer&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;micrometer-registry-prometheus&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Enable Prometheus Endpoint
&lt;/h3&gt;

&lt;p&gt;Enable actuator metrics in your &lt;code&gt;application.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;endpoints&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;web&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;exposure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;include&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;health,info,metrics,prometheus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Prometheus can now scrape from:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/actuator/prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Distributed Tracing with OpenTelemetry
&lt;/h3&gt;

&lt;p&gt;Tracing helps you follow requests across microservices.&lt;/p&gt;

&lt;h4&gt;
  
  
  Add Tracing Dependencies
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.opentelemetry.instrumentation&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;opentelemetry-spring-boot-autoconfigure&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;version&amp;gt;&lt;/span&gt;1.32.0&lt;span class="nt"&gt;&amp;lt;/version&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Add Headers to Outgoing Calls Using Interceptors
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
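&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// TracingClientHttpRequestInterceptor is Brave's class (brave-instrumentation-spring-web).&lt;/span&gt;
&lt;span class="c1"&gt;// It is created through a static factory; an injected HttpTracing (httpTracing) is assumed here.&lt;/span&gt;
&lt;span class="nc"&gt;RestTemplate&lt;/span&gt; &lt;span class="n"&gt;restTemplate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RestTemplateBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;interceptors&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;TracingClientHttpRequestInterceptor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;create&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;httpTracing&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
    &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;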
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can view request flow and bottlenecks in &lt;strong&gt;Zipkin&lt;/strong&gt; or &lt;strong&gt;Jaeger&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Metrics to Monitor
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Why It Matters&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;http.server.requests&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;API latency, error rates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;jvm.memory.used&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Memory health, garbage collection issues&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;hikaricp.connections.active&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Detect DB pool exhaustion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;cache.gets&lt;/code&gt; (tagged &lt;code&gt;result=hit&lt;/code&gt; or &lt;code&gt;miss&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Caching effectiveness&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;kafka.consumer.lag&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Async queue health&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Set Up Smart Alerts
&lt;/h3&gt;

&lt;p&gt;Set alerts like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Response time &amp;gt; 1s on &lt;code&gt;/checkout&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Error rate &amp;gt; 5% for any endpoint&lt;/li&gt;
&lt;li&gt;JVM heap usage &amp;gt; 85%&lt;/li&gt;
&lt;li&gt;DB connection pool usage &amp;gt; 90%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use tools like &lt;strong&gt;Alertmanager&lt;/strong&gt;, &lt;strong&gt;CloudWatch&lt;/strong&gt;, or &lt;strong&gt;Grafana alerts&lt;/strong&gt; to notify via Slack, email, or PagerDuty.&lt;/p&gt;
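
&lt;p&gt;As a rough sketch, two of the thresholds above could be written as Prometheus alerting rules. This assumes Micrometer's default Prometheus metric names (&lt;code&gt;http_server_requests_seconds_count&lt;/code&gt;, &lt;code&gt;jvm_memory_used_bytes&lt;/code&gt;, &lt;code&gt;jvm_memory_max_bytes&lt;/code&gt;). The group name, alert names, severity labels, and 5m windows are illustrative only:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;groups:
  - name: spring-boot-alerts             # hypothetical group name
    rules:
      - alert: HighErrorRate             # 5xx share above 5% on any endpoint
        expr: |
          sum by (uri) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
            / sum by (uri) (rate(http_server_requests_seconds_count[5m])) &amp;gt; 0.05
        for: 5m
        labels:
          severity: page
      - alert: HighHeapUsage             # JVM heap above 85%
        expr: |
          sum(jvm_memory_used_bytes{area="heap"})
            / sum(jvm_memory_max_bytes{area="heap"}) &amp;gt; 0.85
        for: 5m
        labels:
          severity: warn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;for&lt;/code&gt; clause keeps a brief spike from firing an alert; the condition has to hold for the whole window first.&lt;/p&gt;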

&lt;h3&gt;
  
  
  Load &amp;amp; Stress Testing with JMeter
&lt;/h3&gt;

&lt;p&gt;Before your app hits real traffic, simulate it using &lt;a href="https://jmeter.apache.org" rel="noopener noreferrer"&gt;Apache JMeter&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Load Test vs Stress Test
&lt;/h4&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Load Test&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Simulate expected traffic volume&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Stress Test&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Push system beyond its limits to find breaks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h4&gt;
  
  
  How to Test Spring Boot APIs with JMeter
&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Download JMeter from &lt;a href="https://jmeter.apache.org" rel="noopener noreferrer"&gt;jmeter.apache.org&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Open JMeter GUI and create a Thread Group:

&lt;ul&gt;
&lt;li&gt;Number of Threads (users): 100&lt;/li&gt;
&lt;li&gt;Ramp-up period: 10s&lt;/li&gt;
&lt;li&gt;Loop Count: 10 (1,000 requests in total)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add HTTP Request:

&lt;ul&gt;
&lt;li&gt;Method: &lt;code&gt;GET&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;URL: &lt;code&gt;http://localhost:8080/api/products&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Add a Summary Report or Graph Results listener&lt;/li&gt;
&lt;li&gt;Run the test and observe response times, throughput, and failures&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Handling high traffic isn't just about writing better code — it's about building a system that can scale, self-heal, and stay visible under pressure.&lt;/p&gt;

&lt;p&gt;In this post, we covered infrastructure-level strategies that help Spring Boot applications survive and thrive in production:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Load Balancers spread traffic evenly and prevent single points of failure.&lt;/li&gt;
&lt;li&gt;Auto Scaling Groups grow or shrink your app based on demand.&lt;/li&gt;
&lt;li&gt;Containerization ensures fast, portable deployments.&lt;/li&gt;
&lt;li&gt;CDNs and edge caching offload static and public traffic from your backend.&lt;/li&gt;
&lt;li&gt;Observability tools like Prometheus and Zipkin give you deep visibility into how your system behaves under load.&lt;/li&gt;
&lt;li&gt;Load testing helps you validate performance before traffic actually hits.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These infrastructure patterns complement the code-level techniques discussed in &lt;a href="https://dev.to/abhijithzero/top-5-code-level-techniques-to-handle-high-traffic-in-spring-boot-part-1-29en"&gt;Part 1&lt;/a&gt;, creating a robust, production-ready system.&lt;/p&gt;

&lt;p&gt;When you combine resilient code with scalable infrastructure, you're not just handling traffic — you're welcoming it.&lt;/p&gt;




&lt;p&gt;What other strategies have you used to scale Spring Boot apps? Drop a comment below or share your thoughts!&lt;/p&gt;

</description>
      <category>java</category>
      <category>systemdesign</category>
      <category>api</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Top 5 Code-Level Techniques to Handle High Traffic in Spring Boot: Part 1</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Wed, 09 Jul 2025 20:56:57 +0000</pubDate>
      <link>https://dev.to/abhijithzero/top-5-code-level-techniques-to-handle-high-traffic-in-spring-boot-part-1-29en</link>
      <guid>https://dev.to/abhijithzero/top-5-code-level-techniques-to-handle-high-traffic-in-spring-boot-part-1-29en</guid>
      <description>&lt;p&gt;When your app goes viral or hits a major user milestone, there’s one thing you absolutely can’t afford: your APIs crashing.&lt;/p&gt;

&lt;p&gt;Whether you're building an e-commerce backend, a social platform, or a microservices-based system with Spring Boot, designing for peak load isn't just a best practice — it's &lt;strong&gt;essential&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The good news? You don’t need a massive budget or complex infrastructure to start preparing. Often, it begins with smart choices in your &lt;strong&gt;codebase&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this two-part blog series, we’ll explore practical strategies to make your Spring Boot APIs &lt;strong&gt;resilient&lt;/strong&gt; and &lt;strong&gt;performant&lt;/strong&gt; under heavy traffic.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What Is Peak Load and Why Does It Matter?
&lt;/h2&gt;

&lt;p&gt;Peak load is when your application receives an unusually high number of requests — like during sales, promotions, or trending events. If your app isn’t ready, users might see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;⛔️ 500 Internal Server Errors&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🐢 Slow responses&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;🔄 Timeouts&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🧱 The Core Strategy: Absorb, Redirect, and Recover
&lt;/h2&gt;

&lt;p&gt;Think of your API system like a dam:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Absorb&lt;/strong&gt; sudden spikes, &lt;strong&gt;redirect&lt;/strong&gt; excess load, and &lt;strong&gt;recover&lt;/strong&gt; quickly from overload.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let’s break down the key components using the Spring Framework.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. 🔌 Connection Pooling with Spring Boot
&lt;/h2&gt;

&lt;p&gt;Every time your Spring application needs to interact with a database—whether it's saving user data, retrieving product information, or running a report—it must establish a connection, perform the operation, and then close the connection. Creating and tearing down these connections repeatedly under high load introduces latency and exhausts database and system resources.&lt;/p&gt;

&lt;p&gt;Connection pooling solves this by maintaining a set of pre-established connections that are reused across requests. Popular connection pool libraries include Apache Commons DBCP, HikariCP, and c3p0; HikariCP is Spring Boot's default. With Spring Boot and HikariCP, the pool is initialized when the application starts, so connections are ready before the first request arrives. When a request comes in, Spring borrows an available connection from the pool, performs the operation, and returns the connection to the pool instead of closing it. This greatly reduces overhead, lowers latency, and helps keep the database from becoming a bottleneck during peak traffic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv70up241eazpzwqvch9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv70up241eazpzwqvch9.png" alt="Image showing how connection pool works" width="800" height="795"&gt;&lt;/a&gt;&lt;br&gt;
Example with HikariCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;datasource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;jdbc:mysql://localhost:3306/mydb&lt;/span&gt;
    &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;root&lt;/span&gt;
    &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;secret&lt;/span&gt;
    &lt;span class="na"&gt;hikari&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;maximum-pool-size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;20&lt;/span&gt; &lt;span class="c1"&gt;# Max number of connections in the pool&lt;/span&gt;
      &lt;span class="na"&gt;minimum-idle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="c1"&gt;# Min number of idle (ready) connections&lt;/span&gt;
      &lt;span class="na"&gt;connection-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30000&lt;/span&gt; &lt;span class="c1"&gt;# Max time (ms) the application waits to get a connection from the pool&lt;/span&gt;
      &lt;span class="na"&gt;idle-timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;600000&lt;/span&gt; &lt;span class="c1"&gt;# Max time (ms) a connection may sit idle in the pool before being closed&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;➡️ Match pool size to the number of concurrent DB connections your app can handle efficiently.&lt;/p&gt;




&lt;h2&gt;
  
  
  2. 🚦 Rate Limiting to Control Abuse
&lt;/h2&gt;

&lt;p&gt;If users or bots hit your API too often, they can bring your server down.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Solution: Add Rate Limiting or Throttling&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  🚦 What is Rate Limiting?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt; controls how many requests a client (user, IP, token, etc.) can make to your API within a specific time window.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 Why Use Rate Limiting?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Protects your app from &lt;strong&gt;abuse or misuse&lt;/strong&gt; (e.g., brute-force attacks or API scraping).&lt;/li&gt;
&lt;li&gt;Keeps your &lt;strong&gt;backend and database healthy&lt;/strong&gt; under high load.&lt;/li&gt;
&lt;li&gt;Ensures &lt;strong&gt;fair use&lt;/strong&gt; across all users.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  🔧 Example
&lt;/h4&gt;

&lt;blockquote&gt;
&lt;p&gt;"A user can call the &lt;code&gt;/login&lt;/code&gt; API &lt;strong&gt;5 times per minute&lt;/strong&gt;."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If the user exceeds that, they get a &lt;code&gt;429 Too Many Requests&lt;/code&gt; error.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔁 What is Throttling?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Throttling&lt;/strong&gt; is closely related to rate limiting. But while rate limiting &lt;em&gt;blocks&lt;/em&gt; requests beyond a threshold, &lt;strong&gt;throttling may slow them down&lt;/strong&gt; or &lt;strong&gt;queue&lt;/strong&gt; them instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  📌 Difference in a Nutshell
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concept&lt;/th&gt;
&lt;th&gt;Behavior&lt;/th&gt;
&lt;th&gt;Goal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Rate Limiting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Reject excess requests&lt;/td&gt;
&lt;td&gt;Prevent overload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Throttling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Delay or queue excess requests&lt;/td&gt;
&lt;td&gt;Smooth traffic flow&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

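&lt;p&gt;Use libraries like &lt;strong&gt;Bucket4j&lt;/strong&gt; or &lt;strong&gt;Resilience4j&lt;/strong&gt; to implement rate limits. Note that a single Resilience4j limiter applies to every caller of the annotated method; limiting per IP or per user needs one limiter (or bucket) per key.&lt;/p&gt;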

&lt;p&gt;Example with Resilience4j:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RateLimiter&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"productDetailRateLimiter"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="nf"&gt;fetchData&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s"&gt;"Success!"&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;application.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resilience4j&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ratelimiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;productDetailRateLimiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limitForPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;       &lt;span class="c1"&gt;# Allow 100 requests (customers)&lt;/span&gt;
        &lt;span class="na"&gt;limitRefreshPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1s&lt;/span&gt;    &lt;span class="c1"&gt;# every 1 second&lt;/span&gt;
        &lt;span class="na"&gt;timeoutDuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0s&lt;/span&gt;       &lt;span class="c1"&gt;# if full, immediately say "no"&lt;/span&gt;

      &lt;span class="na"&gt;checkoutRateLimiter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limitForPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;        &lt;span class="c1"&gt;# Allow only 10 requests (customers)&lt;/span&gt;
        &lt;span class="na"&gt;limitRefreshPeriod&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;5s&lt;/span&gt;    &lt;span class="c1"&gt;# every 5 seconds (checkout is resource intensive)&lt;/span&gt;
        &lt;span class="na"&gt;timeoutDuration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2s&lt;/span&gt;       &lt;span class="c1"&gt;# if full, wait up to 2 seconds&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. 🗃️ Add Caching for Frequently Requested Data
&lt;/h2&gt;

&lt;p&gt;APIs that serve the same data repeatedly — like product lists, configurations, or top-rated items — should avoid hitting the database every time. Caching helps improve response times and reduce load.&lt;/p&gt;

&lt;p&gt;In Spring Boot, you can use &lt;strong&gt;Caffeine&lt;/strong&gt; for fast in-memory (local) caching or &lt;strong&gt;Redis&lt;/strong&gt; for distributed caching. Combining both gives you the best of both worlds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;Caffeine&lt;/strong&gt;: Blazing-fast in-process memory cache
&lt;/li&gt;
&lt;li&gt;🌐 &lt;strong&gt;Redis&lt;/strong&gt;: Shared cache across app instances (useful in cloud or clustered environments)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Solution: Use Spring Cache with Caffeine + Redis&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Step 1: Add Dependencies
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;pom.xml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- Spring Cache Abstraction --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.boot&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-boot-starter-cache&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- Caffeine --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;com.github.ben-manes.caffeine&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;caffeine&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;span class="c"&gt;&amp;lt;!-- Redis --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.boot&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-boot-starter-data-redis&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Enable Caching
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;
&lt;span class="nd"&gt;@SpringBootApplication&lt;/span&gt;
&lt;span class="nd"&gt;@EnableCaching&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;Application&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SpringApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Application&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Use &lt;code&gt;@Cacheable&lt;/code&gt; to Cache Methods
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Cacheable&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cacheNames&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"products"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Product&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;getAllProducts&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;productRepository&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;findAll&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Configure Cache in &lt;code&gt;application.yml&lt;/code&gt;
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis&lt;/span&gt;
    &lt;span class="na"&gt;caffeine&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;maximumSize=500,expireAfterWrite=5m&lt;/span&gt; &lt;span class="c1"&gt;# read only when Spring Boot auto-configures a Caffeine cache manager&lt;/span&gt;

  &lt;span class="na"&gt;redis&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# Spring Boot 2.x prefix; Spring Boot 3.x uses spring.data.redis&lt;/span&gt;
    &lt;span class="na"&gt;host&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;localhost&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6379&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 5: Combine Caffeine (L1) + Redis (L2)
&lt;/h3&gt;

&lt;p&gt;To set up Caffeine + Redis hybrid caching, define a custom &lt;code&gt;CacheManager&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@Bean&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;CacheManager&lt;/span&gt; &lt;span class="nf"&gt;cacheManager&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;RedisConnectionFactory&lt;/span&gt; &lt;span class="n"&gt;redisConnectionFactory&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;Caffeine&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Object&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;caffeine&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Caffeine&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;newBuilder&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;maximumSize&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;expireAfterWrite&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;TimeUnit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;MINUTES&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;CaffeineCacheManager&lt;/span&gt; &lt;span class="n"&gt;caffeineCacheManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CaffeineCacheManager&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="n"&gt;caffeineCacheManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setCaffeine&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;caffeine&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="nc"&gt;RedisCacheManager&lt;/span&gt; &lt;span class="n"&gt;redisCacheManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisCacheManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;builder&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;redisConnectionFactory&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;&lt;span class="na"&gt;build&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="nc"&gt;CompositeCacheManager&lt;/span&gt; &lt;span class="n"&gt;compositeCacheManager&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;CompositeCacheManager&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;caffeineCacheManager&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redisCacheManager&lt;/span&gt;
    &lt;span class="o"&gt;);&lt;/span&gt;

    &lt;span class="n"&gt;compositeCacheManager&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;setFallbackToNoOpCache&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;compositeCacheManager&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



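&lt;p&gt;👉 Keep in mind that &lt;code&gt;CompositeCacheManager&lt;/code&gt; does not read through from Caffeine to Redis: for each cache name it uses the first manager that can supply a cache. A &lt;code&gt;CaffeineCacheManager&lt;/code&gt; without fixed cache names creates caches on demand, so as written every cache is served locally; call &lt;code&gt;setCacheNames(...)&lt;/code&gt; on it to keep selected caches in Caffeine and let the rest fall through to Redis. This suits applications that want fast local reads for some data and a shared cache for the rest.&lt;/p&gt;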




&lt;h2&gt;
  
  
  4. ⏳ Async Processing with Queues
&lt;/h2&gt;

&lt;p&gt;When your API needs to perform a heavy or time-consuming task — like sending emails, processing images, generating reports, or calling external services — doing it &lt;strong&gt;synchronously&lt;/strong&gt; (i.e., within the request-response cycle) can slow things down or even cause timeouts during high traffic.&lt;/p&gt;

&lt;p&gt;Instead, you can &lt;strong&gt;process these tasks asynchronously&lt;/strong&gt;, freeing up your API to respond quickly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Solution 1: Use &lt;code&gt;@Async&lt;/code&gt; for Fire-and-Forget Tasks&lt;/p&gt;
&lt;/blockquote&gt;

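&lt;p&gt;Spring Boot makes asynchronous method execution easy with the &lt;code&gt;@Async&lt;/code&gt; annotation. It takes effect once &lt;code&gt;@EnableAsync&lt;/code&gt; is declared on a configuration class, and only when the method is called from another bean.&lt;/p&gt;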

&lt;p&gt;&lt;strong&gt;Example:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/send-email"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;EmailDto&lt;/span&gt; &lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;emailService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;sendEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dto&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// this is @Async&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Email scheduled"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Responds fast, but if the app crashes before the task completes, the work is lost (no durability).&lt;/li&gt;
&lt;/ul&gt;
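
&lt;p&gt;One caveat under load: &lt;code&gt;@Async&lt;/code&gt; work runs on a thread pool, and Spring Boot's auto-configured task executor queues tasks without an upper bound by default. Below is a hedged sketch of bounding it in &lt;code&gt;application.yml&lt;/code&gt;. The numbers and the thread name prefix are placeholders to tune, not recommendations:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;spring:
  task:
    execution:
      thread-name-prefix: async-   # hypothetical prefix, handy when reading logs
      pool:
        core-size: 8               # threads kept ready for async work
        max-size: 16               # extra threads are added only once the queue is full
        queue-capacity: 100        # pending tasks held before the pool grows or rejects
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a bounded queue, a traffic spike surfaces as rejected tasks you can handle or alert on, instead of memory quietly filling up.&lt;/p&gt;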

&lt;blockquote&gt;
&lt;p&gt;✅ Solution 2: Use RabbitMQ for Queued Jobs&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/register"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;registerUser&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;User&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;userService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;rabbitTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;convertAndSend&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"emailQueue"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getEmail&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"User registered"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@RabbitListener&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;queues&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"emailQueue"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;sendEmail&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Send confirmation email&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Messages survive a broker restart, provided the queue is declared durable and the messages are persistent
&lt;/li&gt;
&lt;li&gt;It decouples the API from the email logic&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;✅ Solution 3. Using Kafka for Logging or Events&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@PostMapping&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/checkout"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;?&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;checkout&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;@RequestBody&lt;/span&gt; &lt;span class="nc"&gt;Order&lt;/span&gt; &lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;orderService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;save&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;kafkaTemplate&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;send&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"order-events"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OrderEvent&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;order&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;ResponseEntity&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ok&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Order placed"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;It can log events for analytics
&lt;/li&gt;
&lt;li&gt;It is scalable under high load
&lt;/li&gt;
&lt;li&gt;Async consumers can process downstream (e.g., inventory, invoice)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 Why It Helps with High Traffic
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Reduces response time → Frees up API threads
&lt;/li&gt;
&lt;li&gt;Avoids blocking on slow operations (email, DB writes, external APIs)
&lt;/li&gt;
&lt;li&gt;Smooths traffic spikes via message queues
&lt;/li&gt;
&lt;li&gt;Scales better with distributed consumers&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔄 Use Cases in High API Traffic
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;strong&gt;Scenario&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Problem&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Async Solution&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sending emails or SMS&lt;/td&gt;
&lt;td&gt;Slow 3rd-party API blocks request thread&lt;/td&gt;
&lt;td&gt;Use &lt;code&gt;@Async&lt;/code&gt; or queue message via RabbitMQ&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Generating reports&lt;/td&gt;
&lt;td&gt;Takes seconds/minutes&lt;/td&gt;
&lt;td&gt;Queue job and return job ID instantly&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit logging&lt;/td&gt;
&lt;td&gt;Every request writes to DB&lt;/td&gt;
&lt;td&gt;Send logs to Kafka (high throughput)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Image or video processing&lt;/td&gt;
&lt;td&gt;CPU-intensive&lt;/td&gt;
&lt;td&gt;Offload via RabbitMQ or Kafka&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Webhook forwarding&lt;/td&gt;
&lt;td&gt;Call to external service may timeout&lt;/td&gt;
&lt;td&gt;Queue and process later&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  5. 🛑 Circuit Breakers with Resilience4j
&lt;/h2&gt;

&lt;p&gt;When your API relies on &lt;strong&gt;external services&lt;/strong&gt; like payment gateways, email providers, or third-party APIs, there's always a risk that they might fail or become slow.&lt;/p&gt;

&lt;p&gt;Under high traffic, repeated failed calls can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cascading failures&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thread exhaustion&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Service-wide slowdowns&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is where the &lt;strong&gt;Circuit Breaker&lt;/strong&gt; pattern shines. It helps your app &lt;strong&gt;fail fast&lt;/strong&gt;, protect itself, and &lt;strong&gt;recover gracefully&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ✅ What is a Circuit Breaker?
&lt;/h3&gt;

&lt;p&gt;A circuit breaker monitors external calls and &lt;strong&gt;"opens the circuit"&lt;/strong&gt; if too many failures happen in a short time. This stops further attempts temporarily, giving the system time to recover.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🟢 &lt;strong&gt;Closed&lt;/strong&gt;: Normal operation
&lt;/li&gt;
&lt;li&gt;🔴 &lt;strong&gt;Open&lt;/strong&gt;: Calls are blocked immediately
&lt;/li&gt;
&lt;li&gt;🟡 &lt;strong&gt;Half-Open&lt;/strong&gt;: Allows a few test calls to check recovery
&lt;/li&gt;
&lt;/ul&gt;
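&lt;p&gt;The three states can be sketched as a small state machine. This is a simplified illustration, not how Resilience4j is implemented: it assumes the circuit opens after a fixed number of consecutive failures rather than a failure rate over a sliding window, and the names &lt;code&gt;SimpleCircuitBreaker&lt;/code&gt;, &lt;code&gt;allowCall&lt;/code&gt;, &lt;code&gt;recordSuccess&lt;/code&gt; and &lt;code&gt;recordFailure&lt;/code&gt; are made up for this example.&lt;/p&gt;

```java
import java.time.Duration;
import java.time.Instant;

/** Simplified three-state circuit breaker; illustration only, not the Resilience4j implementation. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;   // consecutive failures that open the circuit (assumed policy)
    private final Duration openDuration;  // how long to stay open before allowing a trial call
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private Instant openedAt = null;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration) {
        this.failureThreshold = failureThreshold;
        this.openDuration = openDuration;
    }

    /** Returns true if a call may go through at the given time. */
    boolean allowCall(Instant now) {
        if (state == State.OPEN) {
            if (now.isBefore(openedAt.plus(openDuration))) {
                return false; // still open: fail fast without calling the dependency
            }
            state = State.HALF_OPEN; // wait elapsed: let trial calls through
        }
        return true;
    }

    /** A successful call closes the circuit and clears the failure count. */
    void recordSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    /** A failed trial call reopens immediately; in CLOSED the threshold must be reached first. */
    void recordFailure(Instant now) {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures == failureThreshold) {
            state = State.OPEN;
            openedAt = now;
            consecutiveFailures = 0;
        }
    }

    State state() {
        return state;
    }
}

class CircuitBreakerDemo {
    public static void main(String[] args) {
        Instant t0 = Instant.parse("2025-01-01T00:00:00Z");
        SimpleCircuitBreaker breaker = new SimpleCircuitBreaker(3, Duration.ofSeconds(30));
        breaker.recordFailure(t0);
        breaker.recordFailure(t0);
        breaker.recordFailure(t0);
        System.out.println(breaker.state());                       // OPEN
        System.out.println(breaker.allowCall(t0.plusSeconds(10))); // false
        System.out.println(breaker.allowCall(t0.plusSeconds(30))); // true
        System.out.println(breaker.state());                       // HALF_OPEN
        breaker.recordSuccess();
        System.out.println(breaker.state());                       // CLOSED
    }
}
```

&lt;p&gt;Passing the current time in as a parameter keeps the sketch deterministic and easy to test. A production breaker would also need to be thread-safe, which this one is not.&lt;/p&gt;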

&lt;blockquote&gt;
&lt;p&gt;✅ Solution: Use Resilience4j Circuit Breaker in Spring Boot&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Add the dependency in &lt;code&gt;pom.xml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.github.resilience4j&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;resilience4j-spring-boot2&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.boot&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
  &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-boot-starter-aop&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  🔧 Example Usage
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nd"&gt;@CircuitBreaker&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"paymentService"&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fallbackMethod&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"fallbackPayment"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt; &lt;span class="nf"&gt;chargeCard&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;paymentClient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;charge&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// external API call&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fallback method runs whenever the call throws an exception, including when the circuit is open and the call is rejected without ever reaching the payment service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nc"&gt;PaymentResponse&lt;/span&gt; &lt;span class="nf"&gt;fallbackPayment&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt; &lt;span class="n"&gt;userId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;Throwable&lt;/span&gt; &lt;span class="n"&gt;ex&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;PaymentResponse&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Payment service unavailable"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  ⚙️ Configure Circuit Breaker in &lt;code&gt;application.yml&lt;/code&gt;
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;resilience4j.circuitbreaker&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;paymentService&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;registerHealthIndicator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;slidingWindowSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10&lt;/span&gt; 
      &lt;span class="na"&gt;failureRateThreshold&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt; &lt;span class="c1"&gt;#If more than 5 out of 10 calls fail, open the circuit&lt;/span&gt;

      &lt;span class="na"&gt;waitDurationInOpenState&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;30s&lt;/span&gt; &lt;span class="c1"&gt;#Stay open for 30 seconds before allowing test calls again&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🧠 When to Use Circuit Breakers
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use Case&lt;/th&gt;
&lt;th&gt;Should You Use It?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;External APIs (payments, SMS, etc.)&lt;/td&gt;
&lt;td&gt;✅ Definitely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Internal microservices (over network)&lt;/td&gt;
&lt;td&gt;✅ Recommended&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local in-memory methods&lt;/td&gt;
&lt;td&gt;❌ Not needed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Controlling high traffic isn’t just about throwing hardware at the problem — it starts with &lt;strong&gt;writing efficient, resilient code&lt;/strong&gt;. In this post, we explored essential &lt;strong&gt;code-level strategies&lt;/strong&gt; to prepare your Spring Boot APIs for peak load:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔌 Connection pooling to avoid DB overload&lt;/li&gt;
&lt;li&gt;🚦 Rate limiting to protect endpoints from abuse&lt;/li&gt;
&lt;li&gt;🗃️ Caching with Caffeine (and Redis) to serve repeated requests faster&lt;/li&gt;
&lt;li&gt;⏳ Async processing to offload heavy background tasks&lt;/li&gt;
&lt;li&gt;🛑 Circuit breakers to prevent cascading failures from unstable dependencies&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each of these techniques helps your application stay responsive, even when traffic spikes or dependencies slow down.&lt;/p&gt;




&lt;h2&gt;
  
  
  👀 What’s Next?
&lt;/h2&gt;

&lt;p&gt;Code-level techniques take you far, but without the right infrastructure, you're still at risk.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Stay tuned for &lt;strong&gt;Part 2: Infrastructure-Level Strategies for Handling High Traffic in Spring Boot APIs.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>java</category>
      <category>webdev</category>
      <category>api</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Demystifying AI Agents: How Language Models Think, Act, and Learn in the Real World</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Sun, 22 Jun 2025 13:57:33 +0000</pubDate>
      <link>https://dev.to/abhijithzero/demystifying-ai-agents-how-language-models-think-act-and-learn-in-the-real-world-5612</link>
      <guid>https://dev.to/abhijithzero/demystifying-ai-agents-how-language-models-think-act-and-learn-in-the-real-world-5612</guid>
      <description>&lt;p&gt;AI agents are the next step in making intelligent systems more interactive, capable, and autonomous. Instead of just answering questions, agents can reason through complex tasks, use tools, interact with their environment, and adapt to feedback. In this blog, we break down the core building blocks of AI agents in simple terms.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧠 What is an Agent?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;agent&lt;/strong&gt; is a system that can:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Perceive&lt;/strong&gt; its environment (through inputs like queries or data)
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reason&lt;/strong&gt; or plan its next steps
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Act&lt;/strong&gt; by calling external tools or APIs
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Learn or adapt&lt;/strong&gt; based on the outcome of its actions
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In LLM-powered systems, the agent uses a language model to "think," tools to "act," and observations to improve future decisions.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧱 What is an LLM?
&lt;/h2&gt;

&lt;p&gt;An &lt;strong&gt;LLM (Large Language Model)&lt;/strong&gt; like GPT-4, Claude, or Gemini is trained on large amounts of text to predict the next token in a sequence. It powers the reasoning, planning, and language generation abilities of an agent.&lt;/p&gt;

&lt;p&gt;Think of it as the brain of the agent that understands instructions, generates thoughts, and decides what to do next.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Tools: Extending the LLM's Abilities
&lt;/h2&gt;

&lt;p&gt;LLMs are limited by design; they can't access real-time information or perform actions on external systems. That's where &lt;strong&gt;tools&lt;/strong&gt; come in:&lt;/p&gt;

&lt;p&gt;Tools are external functions the agent can call to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search the web
&lt;/li&gt;
&lt;li&gt;Query a database
&lt;/li&gt;
&lt;li&gt;Fetch weather or stock data
&lt;/li&gt;
&lt;li&gt;Execute code
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example tool call:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"input"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"India"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  💬 Messages and Special Tokens
&lt;/h2&gt;

&lt;p&gt;Agentic systems rely on structured communication using &lt;strong&gt;messages&lt;/strong&gt; and, in some frameworks, &lt;strong&gt;special tokens&lt;/strong&gt;. These help manage conversations, tool usage, and the agent’s internal reasoning.&lt;/p&gt;

&lt;h3&gt;
  
  
  📬 Message Roles
&lt;/h3&gt;

&lt;p&gt;Each message has a &lt;strong&gt;role&lt;/strong&gt; that defines its purpose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;system&lt;/code&gt;&lt;/strong&gt; – Sets the agent's behavior or instructions. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Example: “You are an AI agent that can use tools.”&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;user&lt;/code&gt;&lt;/strong&gt; – The human's or calling app’s input.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Example: “What’s the weather in Tokyo?”&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;assistant&lt;/code&gt;&lt;/strong&gt; – The LLM's response (thoughts, plans, or final answers).
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Example: “Action: get_weather, Input: Tokyo”&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tool&lt;/code&gt;&lt;/strong&gt; – The result of a tool call. &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Example: “Observation: It's 27°C and sunny in Tokyo.”&lt;/em&gt;&lt;/p&gt;
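&lt;p&gt;Put together, the four roles form an ordered conversation. The sketch below models that exchange as a plain array. The &lt;code&gt;Message&lt;/code&gt; class and its field names are assumptions made for illustration and do not match the schema of any particular LLM API.&lt;/p&gt;

```java
/** One chat message: a role plus its text content (hypothetical shape, for illustration). */
class Message {
    final String role;     // "system", "user", "assistant" or "tool"
    final String content;

    Message(String role, String content) {
        this.role = role;
        this.content = content;
    }
}

class ConversationSketch {
    /** Builds the four-message weather exchange used in the role examples. */
    static Message[] weatherConversation() {
        return new Message[] {
            new Message("system", "You are an AI agent that can use tools."),
            new Message("user", "What's the weather in Tokyo?"),
            new Message("assistant", "Action: get_weather, Input: Tokyo"),
            new Message("tool", "Observation: It's 27\u00B0C and sunny in Tokyo.")
        };
    }

    public static void main(String[] args) {
        // Print the conversation in order, one message per line.
        for (Message m : weatherConversation()) {
            System.out.println("[" + m.role + "] " + m.content);
        }
    }
}
```

&lt;p&gt;The order matters: the &lt;code&gt;tool&lt;/code&gt; message only makes sense after the &lt;code&gt;assistant&lt;/code&gt; message that requested the call.&lt;/p&gt;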

&lt;h3&gt;
  
  
  🧪 Special Tokens
&lt;/h3&gt;

&lt;p&gt;Some agent frameworks and prompt templates use &lt;strong&gt;tokens or delimiters&lt;/strong&gt; to mark parts of the response, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;|thought|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|action|&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;lt;|observation|&amp;gt;&lt;/code&gt; – Used to guide parsing
&lt;/li&gt;
&lt;li&gt;These markers let the system &lt;strong&gt;stop at the right point&lt;/strong&gt; and &lt;strong&gt;extract actions&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🔁 Why It Matters
&lt;/h3&gt;

&lt;p&gt;This structure lets agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Manage multi-turn workflows
&lt;/li&gt;
&lt;li&gt;Separate thought from action
&lt;/li&gt;
&lt;li&gt;Safely interact with tools
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, messages and special tokens form the backbone of how agents &lt;strong&gt;think, act, and learn step-by-step&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⟳ The Thought → Action → Observation Cycle
&lt;/h2&gt;

&lt;p&gt;This cycle is at the heart of agentic reasoning. The model reasons, acts, observes the result, and thinks again.&lt;/p&gt;

&lt;h3&gt;
  
  
  🔎 Diagram: Thought-Action-Observation Cycle
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb67nwo2331e25cltwq95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb67nwo2331e25cltwq95.png" alt="Image showing Thought-Action-Observation Cycle" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This loop continues until the task is complete.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧬 Thought = Internal Reasoning
&lt;/h2&gt;

&lt;p&gt;Not every step involves an action. Sometimes, the agent just &lt;strong&gt;thinks&lt;/strong&gt; out loud to plan its next move.&lt;/p&gt;

&lt;p&gt;These internal thoughts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Help break down complex problems&lt;/li&gt;
&lt;li&gt;Allow for step-by-step execution&lt;/li&gt;
&lt;li&gt;Improve transparency&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  ⚛️ The ReAct Approach
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;ReAct&lt;/strong&gt; stands for &lt;strong&gt;Reasoning + Acting&lt;/strong&gt;. It’s a popular approach for LLM-based agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  ReAct Agent Output Example:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Convert 10 kilometers to miles.

Thought: I need to convert 10 kilometers to miles.

Action: Call a unit conversion tool.

Observation: 10 kilometers is approximately 6.21 miles.

Response: 10 kilometers is approximately 6.21 miles.


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By alternating between reasoning and acting, the agent becomes more accurate and reliable.&lt;/p&gt;
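&lt;p&gt;A single pass of that cycle can be sketched in code. This is a toy version under clear assumptions: the Thought and Action are hard-coded where a real agent would ask an LLM for them, and &lt;code&gt;kmToMiles&lt;/code&gt; is a hypothetical stand-in for the unit conversion tool.&lt;/p&gt;

```java
import java.util.Locale;

class ReActSketch {
    static final double MILES_PER_KM = 0.621371;

    /** Hypothetical tool: converts kilometers to miles. */
    static double kmToMiles(double km) {
        return km * MILES_PER_KM;
    }

    /** Runs one hard-coded Thought, Action, Observation pass and returns the transcript. */
    static String run(int km) {
        StringBuilder log = new StringBuilder();
        // Thought: a real agent would get this text from the LLM.
        log.append("Thought: I need to convert ").append(km).append(" kilometers to miles.\n");
        // Action: call the tool the thought decided on.
        log.append("Action: km_to_miles(").append(km).append(")\n");
        double miles = kmToMiles(km);
        // Observation: the tool result, fed back into the transcript.
        String observation = String.format(Locale.ROOT,
                "%d kilometers is approximately %.2f miles.", km, miles);
        log.append("Observation: ").append(observation).append("\n");
        // Response: the final answer, built from the observation.
        log.append("Response: ").append(observation);
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```

&lt;p&gt;In a full agent this pass would sit inside a loop that repeats until the model produces a final response instead of another action.&lt;/p&gt;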




&lt;h2&gt;
  
  
  🌍 Actions: Interacting with the Environment
&lt;/h2&gt;

&lt;p&gt;Once the model has thought through its strategy, it uses actions to make changes in the world:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Query APIs&lt;/li&gt;
&lt;li&gt;Execute shell commands&lt;/li&gt;
&lt;li&gt;Send messages&lt;/li&gt;
&lt;li&gt;Retrieve or update records&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what makes agents actually &lt;em&gt;do&lt;/em&gt; things instead of just &lt;em&gt;say&lt;/em&gt; things.&lt;/p&gt;




&lt;h2&gt;
  
  
  👀 Observation: Reflect and React
&lt;/h2&gt;

&lt;p&gt;Every action yields an &lt;strong&gt;observation&lt;/strong&gt; — feedback from the environment.&lt;/p&gt;

&lt;p&gt;The agent then:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Evaluates whether the result met the goal&lt;/li&gt;
&lt;li&gt;Adapts its next thought&lt;/li&gt;
&lt;li&gt;May retry or take alternative actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This closes the loop and makes agents dynamic and responsive.&lt;/p&gt;




&lt;h2&gt;
  
  
  ✅ Final Thoughts
&lt;/h2&gt;

&lt;p&gt;LLMs become truly powerful when you turn them into &lt;strong&gt;agents&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can plan and act&lt;/li&gt;
&lt;li&gt;Use tools to bridge gaps&lt;/li&gt;
&lt;li&gt;Think, act, and observe in cycles&lt;/li&gt;
&lt;li&gt;Improve with feedback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You’ve just seen the architecture behind the smartest AI systems today — from coding copilots to research assistants. Whether using LangChain, SmolAgents, or custom frameworks, AI agents are how we move from static chat to &lt;strong&gt;autonomous intelligence&lt;/strong&gt;.&lt;/p&gt;




</description>
      <category>ai</category>
      <category>learning</category>
      <category>llm</category>
      <category>python</category>
    </item>
    <item>
      <title>Introduction to MCP: Making AI More Connected</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Sat, 24 May 2025 18:38:30 +0000</pubDate>
      <link>https://dev.to/abhijithzero/introduction-to-mcp-making-ai-more-connected-3kp2</link>
      <guid>https://dev.to/abhijithzero/introduction-to-mcp-making-ai-more-connected-3kp2</guid>
      <description>&lt;p&gt;With the rising capabilities of Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini, the AI ecosystem is changing rapidly. Yet these models are limited by their training data and, on their own, have no access to real-time data or specialized tools. That’s where &lt;strong&gt;MCP&lt;/strong&gt;, or &lt;strong&gt;Model Context Protocol&lt;/strong&gt;, comes in.&lt;/p&gt;

&lt;p&gt;In this blog, we’ll break down what MCP is, why it’s useful, and how it helps AI work better with the tools and data we already use.&lt;/p&gt;




&lt;h2&gt;
  
  
  So what is MCP?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; is a new open standard that helps AI models interact with external tools and data in a structured, secure, and consistent way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Imagine this:
&lt;/h3&gt;

&lt;p&gt;You're chatting with an AI and you ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Can you summarize the latest file in my Downloads folder?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Without MCP, the AI wouldn’t have access to that file.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;With MCP&lt;/strong&gt;, the AI can ask an external tool (called a “Server”) for help, get the file, and provide the summary — all behind the scenes.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Problem Does MCP Solve?
&lt;/h2&gt;

&lt;p&gt;MCP helps solve the M×N integration problem: the challenge of connecting M different AI applications to N different tools or data sources without a standardized approach.&lt;br&gt;
For example, say we have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6 different AI models&lt;/li&gt;
&lt;li&gt;10 different tools (weather APIs, databases, calculators, file readers, etc.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without a shared protocol, you'd need &lt;strong&gt;6 × 10 = 60 custom integrations&lt;/strong&gt;.&lt;br&gt;&lt;br&gt;
Talk about a maintenance nightmare 😫 !&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MCP simplifies this&lt;/strong&gt; by transforming it to an M + N problem using a &lt;strong&gt;single, shared protocol&lt;/strong&gt;. So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Each tool implements the server side of MCP once&lt;/li&gt;
&lt;li&gt;Each AI application implements the client side of MCP once&lt;/li&gt;
&lt;li&gt;AI Hosts that support MCP can connect to any MCP Server right away&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This drastically reduces integration complexity and the maintenance burden.&lt;/p&gt;
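&lt;p&gt;The arithmetic behind that claim is easy to check. The sketch below simply counts integrations for the 6 applications and 10 tools from the example. The class and method names are made up for illustration.&lt;/p&gt;

```java
class IntegrationCount {
    /** Without a shared protocol, every application needs its own adapter for every tool. */
    static int pointToPoint(int apps, int tools) {
        return apps * tools;
    }

    /** With a shared protocol, each application implements the client once and each tool the server once. */
    static int withSharedProtocol(int apps, int tools) {
        return apps + tools;
    }

    public static void main(String[] args) {
        System.out.println(pointToPoint(6, 10));       // 60
        System.out.println(withSharedProtocol(6, 10)); // 16
    }
}
```

&lt;p&gt;The gap widens as either side grows: adding an eleventh tool costs one more implementation with a shared protocol, but six more without one.&lt;/p&gt;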




&lt;h2&gt;
  
  
  Key MCP Concepts
&lt;/h2&gt;

&lt;p&gt;Let’s break down some important terms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Term&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Host&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;The AI application or product users interact with (e.g., chatbot, IDE). It initiates connections to MCP Servers and orchestrates the overall flow between user requests, LLM processing, and external tools.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Client&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A component in the Host that talks to a specific MCP Server. Each Client maintains a 1:1 connection with a Server, handles the protocol-level details of MCP communication, and acts as an intermediary between the Host’s logic and the external Server.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Server&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A tool or service that exposes capabilities (can be Tools, Resources, Prompts) via MCP protocol&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Tool&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A function that the AI model can invoke to perform a specific action, e.g., a Python code executor tool that lets the model run Python code and get the result back.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Resource&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Read-only data like documents or files that provide context to models.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Prompt&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A predefined text-based instruction the AI can use, e.g., a summarization prompt.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Sampling&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Server-initiated requests that ask the Host’s model to generate a completion, for instance to review and improve earlier work, e.g., the AI writes some code, then the Server asks the model to run again to check whether the code works and fix any errors.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MCP URI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A special format to identify tools and capabilities (e.g. &lt;code&gt;mcp://tools/python_executor/run_python_code&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  How MCP is Built: The Architecture
&lt;/h2&gt;

&lt;p&gt;MCP follows a clear architecture made of three layers:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;strong&gt;Host Application&lt;/strong&gt;
&lt;/h3&gt;

&lt;p&gt;This is the AI-powered app you're using — like a coding assistant or smart chatbot. It includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The Model&lt;/strong&gt; (e.g., an LLM)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Client&lt;/strong&gt;, which talks to external Servers via MCP&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;strong&gt;Client Layer&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Think of the &lt;strong&gt;Client&lt;/strong&gt; as a translator.&lt;/li&gt;
&lt;li&gt;It speaks the MCP language and handles communication between the AI (Host) and tools (Servers).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Client does things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Registering available tools and capabilities&lt;/li&gt;
&lt;li&gt;Routing the AI’s requests to the right Server&lt;/li&gt;
&lt;li&gt;Handling inputs/outputs securely&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. &lt;strong&gt;Server Layer&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;These are the actual tools and services that do the work.&lt;/li&gt;
&lt;li&gt;Servers define one or more &lt;strong&gt;tools&lt;/strong&gt; (like Python runners, file searchers, or translators).&lt;/li&gt;
&lt;li&gt;Each tool offers &lt;strong&gt;capabilities&lt;/strong&gt;, which the AI models can use.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  A Real Example: Using a PDF Summarizer Tool
&lt;/h2&gt;

&lt;p&gt;Let’s say you ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Can you summarize the contents of my &lt;code&gt;meeting_notes.pdf&lt;/code&gt; file?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here’s what happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Host (AI app)&lt;/strong&gt; receives your request to summarize a PDF
&lt;/li&gt;
&lt;li&gt;It forwards the request to the &lt;strong&gt;Client&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The Client calls the &lt;strong&gt;Server&lt;/strong&gt; that exposes the &lt;code&gt;summarize_pdf&lt;/code&gt; capability
&lt;/li&gt;
&lt;li&gt;The Server reads the PDF file and generates a summary
&lt;/li&gt;
&lt;li&gt;The Host includes that summary in the AI’s response
&lt;/li&gt;
&lt;/ol&gt;
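&lt;p&gt;The five steps above can be sketched as three small classes. Everything here is a hypothetical stand-in: &lt;code&gt;ToolServer&lt;/code&gt;, &lt;code&gt;McpClient&lt;/code&gt; and &lt;code&gt;McpHost&lt;/code&gt; are not part of any MCP SDK, the summary is a placeholder string, and the JSON-RPC messages that real MCP traffic uses are left out entirely.&lt;/p&gt;

```java
/** Hypothetical stand-in for an MCP Server; not a real MCP SDK type. */
interface ToolServer {
    String call(String capability, String argument);
}

/** Exposes a single capability, summarize_pdf, and returns a placeholder summary. */
class PdfSummarizerServer implements ToolServer {
    public String call(String capability, String argument) {
        if (!"summarize_pdf".equals(capability)) {
            throw new IllegalArgumentException("Unknown capability: " + capability);
        }
        // A real server would read and summarize the file here.
        return "Summary of " + argument;
    }
}

/** One Client talks to exactly one Server. */
class McpClient {
    private final ToolServer server;

    McpClient(ToolServer server) {
        this.server = server;
    }

    String request(String capability, String argument) {
        return server.call(capability, argument);
    }
}

/** The Host receives the user request, forwards it, and folds the result into its reply. */
class McpHost {
    private final McpClient client;

    McpHost(McpClient client) {
        this.client = client;
    }

    String handle(String fileName) {
        String summary = client.request("summarize_pdf", fileName);
        return "Here is the summary: " + summary;
    }
}

class McpFlowDemo {
    public static void main(String[] args) {
        McpHost host = new McpHost(new McpClient(new PdfSummarizerServer()));
        System.out.println(host.handle("meeting_notes.pdf"));
    }
}
```

&lt;p&gt;Because the Host only sees the &lt;code&gt;McpClient&lt;/code&gt;, swapping the PDF summarizer for another Server would not change the Host code, which is the modularity the protocol is after.&lt;/p&gt;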

&lt;p&gt;And just like that — your AI becomes a PDF summarizer!&lt;/p&gt;




&lt;h2&gt;
  
  
  Why MCP Matters
&lt;/h2&gt;

&lt;p&gt;Here’s why MCP is a game-changer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ &lt;strong&gt;Standardized&lt;/strong&gt; – Write once, use anywhere&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Interoperable&lt;/strong&gt; – Connect different tools to different AIs easily&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Modular&lt;/strong&gt; – Add/remove tools without breaking things&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Flexible&lt;/strong&gt; – Works locally or remotely&lt;/li&gt;
&lt;li&gt;✅ &lt;strong&gt;Scalable&lt;/strong&gt; – No need for N × M integrations anymore&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Capability Types in MCP
&lt;/h2&gt;

&lt;p&gt;There are four main capability types in MCP:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Tool&lt;/strong&gt;: Runs actions like executing code or searching files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource&lt;/strong&gt;: Read-only, like a document or file the model can view&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt&lt;/strong&gt;: Template instructions to guide the AI’s responses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sampling&lt;/strong&gt;: Server-initiated requests let the AI model run itself again to review and improve its own work.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The following diagram shows how these capabilities work together in the PDF summarizer use case.&lt;/p&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fngq0q4s0ydxgkjt8iava.png" alt="Diagram shows the collective capabilities for the use case of a pdf summarizer." width="800" height="400"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Thoughts
&lt;/h2&gt;

&lt;p&gt;MCP is a powerful way to &lt;strong&gt;connect AI with the real world&lt;/strong&gt; — in a safe, simple, and scalable manner. Whether you’re building smart assistants, data dashboards, or developer tools, MCP can make your AI much more capable.&lt;/p&gt;

&lt;p&gt;We’re just scratching the surface — the future of AI will be connected, and MCP is helping lead the way.&lt;/p&gt;




</description>
    </item>
    <item>
      <title>Getting Started with Microservices: A Beginner's Guide Using Spring Boot</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Sat, 08 Mar 2025 15:22:50 +0000</pubDate>
      <link>https://dev.to/abhijithzero/getting-started-with-microservices-a-beginners-guide-using-spring-boot-13la</link>
      <guid>https://dev.to/abhijithzero/getting-started-with-microservices-a-beginners-guide-using-spring-boot-13la</guid>
      <description>&lt;p&gt;Microservices have become an essential part of modern software architecture due to their flexibility, scalability, and ease of maintenance. In this blog, we will explore how to build microservices using Spring Boot. We will cover the integration of essential tools like &lt;strong&gt;Eureka&lt;/strong&gt; for service discovery, &lt;strong&gt;API Gateway&lt;/strong&gt; for routing, &lt;strong&gt;Config Server&lt;/strong&gt; for centralized configuration, and &lt;strong&gt;Zipkin&lt;/strong&gt; for distributed tracing. &lt;/p&gt;

&lt;p&gt;By the end of this guide, you will have a working Spring Boot project with two microservices: &lt;strong&gt;Company&lt;/strong&gt; and &lt;strong&gt;Employee&lt;/strong&gt;, running alongside an API Gateway, Eureka Discovery Server, Config Server, and Zipkin.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before we begin, ensure you have the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Basic understanding of &lt;strong&gt;Spring Boot&lt;/strong&gt; and &lt;strong&gt;Java&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Familiarity with &lt;strong&gt;Spring Cloud&lt;/strong&gt; concepts (Eureka, Config Server, etc.).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maven&lt;/strong&gt; for dependency management.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; (optional for Zipkin).&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Overview of the Components
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrzeu54rgkzuonk8yx3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgrzeu54rgkzuonk8yx3x.png" alt="Architecture Diagram of the project" width="800" height="592"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Microservices with Spring Boot
&lt;/h3&gt;

&lt;p&gt;In this architecture, we have two microservices:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Company Service&lt;/strong&gt;: Manages company-related data.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Employee Service&lt;/strong&gt;: Handles employee data.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each microservice is a Spring Boot application that operates independently but interacts with other services via HTTP requests.&lt;/p&gt;
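&lt;p&gt;To make "interacts via HTTP" concrete, here is a minimal sketch of how the Company Service might address the Employee Service using only the JDK's HTTP client types. Port 8090 is the Employee Service port used later in this guide; the class name, method name, and the &lt;code&gt;by-company&lt;/code&gt; path segment are hypothetical and only serve as an illustration.&lt;/p&gt;

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch only: builds (but does not send) a request from one service to another.
// The "/by-company/" path segment is a made-up endpoint for illustration.
final class EmployeeRequestSketch {

    // Describes a GET call to the Employee Service for one company's employees.
    static HttpRequest employeesOf(long companyId) {
        URI uri = URI.create("http://localhost:8090/api/v1/employee/by-company/" + companyId);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = employeesOf(7);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

&lt;p&gt;In a real project a declarative client or a load-balanced web client would usually replace this hand-built request, but the underlying call is the same plain HTTP exchange.&lt;/p&gt;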

&lt;h3&gt;
  
  
  2. Eureka Discovery Server
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://cloud.spring.io/spring-cloud-netflix/reference/html/" rel="noopener noreferrer"&gt;Eureka&lt;/a&gt; provides service discovery. It allows microservices to register themselves and discover each other dynamically. By using Eureka, you eliminate the need to hard-code service URLs, enabling a more flexible system.&lt;/p&gt;
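&lt;p&gt;The idea behind service discovery can be sketched as a tiny in-memory registry. This is a toy model, not Eureka's implementation: the class and method names are made up, and the service names and ports are the ones used in this guide.&lt;/p&gt;

```java
import java.util.Properties;

// Toy registry illustrating register-then-lookup; not how Eureka works internally.
final class RegistrySketch {
    // Maps a logical service name to the URL where it can be reached.
    private final Properties services = new Properties();

    // A service announces its logical name and its address on startup.
    void register(String name, String url) {
        services.setProperty(name, url);
    }

    // A caller resolves the logical name at request time instead of hard-coding a URL.
    String lookup(String name) {
        String url = services.getProperty(name);
        if (url == null) {
            throw new IllegalStateException("No instance registered for " + name);
        }
        return url;
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register("company-service", "http://localhost:8070");
        registry.register("employee-service", "http://localhost:8090");
        System.out.println(registry.lookup("employee-service"));
    }
}
```

&lt;p&gt;Because callers only know the logical name, a service can move to another host or port without any change on the calling side.&lt;/p&gt;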

&lt;h3&gt;
  
  
  3. API Gateway
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;API Gateway&lt;/strong&gt; is responsible for routing requests from clients to the appropriate microservices. It also offers additional features such as load balancing and security. In this demo, we will use &lt;strong&gt;Spring Cloud Gateway&lt;/strong&gt; for routing.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Config Server
&lt;/h3&gt;

&lt;p&gt;A &lt;strong&gt;Config Server&lt;/strong&gt; centralizes the configuration for all microservices, making it easier to manage and update configurations without redeploying individual services.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Distributed Tracing with Zipkin
&lt;/h3&gt;

&lt;p&gt;Distributed tracing helps track requests as they move through the various microservices. We'll use &lt;strong&gt;Zipkin&lt;/strong&gt; to visualize and trace requests across services. On Spring Boot 3, &lt;strong&gt;Micrometer Tracing&lt;/strong&gt; (the successor to Spring Cloud Sleuth) supplies the trace and span information and reports it to Zipkin.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step-by-Step Implementation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Setting up Eureka Discovery Server
&lt;/h3&gt;

&lt;p&gt;Start by creating a Spring Boot application for the &lt;strong&gt;Eureka Server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;1) Add the required dependencies in your &lt;code&gt;pom.xml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.cloud&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-cloud-starter-netflix-eureka-server&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2) Enable Eureka Server in your main application class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;
&lt;span class="nd"&gt;@SpringBootApplication&lt;/span&gt;
&lt;span class="nd"&gt;@EnableEurekaServer&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DiscoveryServer&lt;/span&gt;&lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SpringApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;EurekaServerApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3) Add Eureka configuration in application.yml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8761&lt;/span&gt;

&lt;span class="na"&gt;eureka&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;register-with-eureka&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;fetch-registry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the Eureka Server on port 8761. The Eureka dashboard can be accessed at &lt;a href="http://localhost:8761" rel="noopener noreferrer"&gt;http://localhost:8761&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Step 2: Creating Microservices
&lt;/h3&gt;

&lt;p&gt;Both the Company Service and Employee Service will register with Eureka. Here's how to create them:&lt;/p&gt;

&lt;p&gt;1) Add the following dependency to pom.xml for each microservice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.cloud&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-cloud-starter-netflix-eureka-client&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



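&lt;p&gt;2) Enable Eureka Client in both microservices (optional in recent Spring Cloud releases, where having the Eureka client starter on the classpath is enough; the latest releases no longer include the &lt;code&gt;@EnableEurekaClient&lt;/code&gt; annotation):&lt;br&gt;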
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;
&lt;span class="nd"&gt;@SpringBootApplication&lt;/span&gt;
&lt;span class="nd"&gt;@EnableEurekaClient&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CompanyServiceApplication&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SpringApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;CompanyServiceApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3) Configure application properties (application.yml):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;application&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;company-service&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;eureka&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;service-url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;defaultZone&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8761/eureka&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Step 3: Setting up API Gateway
&lt;/h3&gt;

&lt;p&gt;We'll use Spring Cloud Gateway to handle requests and route them to the appropriate microservices.&lt;/p&gt;

&lt;p&gt;1) Add the required dependency for Spring Cloud Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.cloud&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-cloud-starter-gateway&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2) Define routing in the application.yml:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8222&lt;/span&gt;
&lt;span class="na"&gt;spring&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cloud&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;discovery&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;locator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;routes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;employees&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8090&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/v1/employee/**&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;company&lt;/span&gt;
          &lt;span class="na"&gt;uri&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://localhost:8070&lt;/span&gt;
          &lt;span class="na"&gt;predicates&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;Path=/api/v1/company/**&lt;/span&gt;
&lt;span class="na"&gt;management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sampling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;probability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This configuration ensures that the API Gateway routes requests to company-service and employee-service based on the request path.&lt;/p&gt;
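&lt;p&gt;The routing decision made by the configuration above can be sketched in a few lines of plain Java. This is a simplification: Spring Cloud Gateway's &lt;code&gt;Path&lt;/code&gt; predicate uses richer pattern matching, and plain prefix checks stand in for it here. The class and method names are hypothetical; the paths and ports come from the YAML.&lt;/p&gt;

```java
// Simplified sketch of path-based routing; not Spring Cloud Gateway's implementation.
final class RouteSketch {

    // Returns the backend base URI for a request path, or null when no route matches.
    static String resolve(String path) {
        if (path.startsWith("/api/v1/employee/")) {
            return "http://localhost:8090";
        }
        if (path.startsWith("/api/v1/company/")) {
            return "http://localhost:8070";
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("/api/v1/employee/42"));
        System.out.println(resolve("/api/v1/company/7"));
        System.out.println(resolve("/health"));
    }
}
```

&lt;p&gt;The client only ever talks to port 8222; which backend answers is decided entirely by the request path.&lt;/p&gt;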




&lt;h3&gt;
  
  
  Step 4: Config Server Setup
&lt;/h3&gt;

&lt;p&gt;Create a new Spring Boot application for the Config Server.&lt;/p&gt;

&lt;p&gt;1) Add the following dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.cloud&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-cloud-config-server&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;org.springframework.cloud&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
    &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;spring-cloud-starter-eureka&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;2) Enable Config Server in the main class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;
&lt;span class="nd"&gt;@SpringBootApplication&lt;/span&gt;
&lt;span class="nd"&gt;@EnableConfigServer&lt;/span&gt;
&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ConfigServerApplication&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;[]&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;SpringApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConfigServerApplication&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;class&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3) Point the Config Server to a Git repository (or file system) that holds the configuration files for your microservices.&lt;/p&gt;
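&lt;p&gt;Once the Config Server is pointed at a repository, it serves each service's configuration over HTTP at &lt;code&gt;/{application}/{profile}&lt;/code&gt;, which is handy for checking the setup in a browser. The sketch below only builds that URL. The class and method names are hypothetical, port 8888 is the Config Server port used in this guide, and &lt;code&gt;default&lt;/code&gt; is assumed as the active profile.&lt;/p&gt;

```java
import java.net.URI;

// Sketch: builds the URL under which the Config Server exposes one service's configuration.
final class ConfigUrlSketch {

    // Follows the /{application}/{profile} layout of the Config Server's HTTP API.
    static URI configUri(String serverBase, String application, String profile) {
        return URI.create(serverBase + "/" + application + "/" + profile);
    }

    public static void main(String[] args) {
        System.out.println(configUri("http://localhost:8888", "company-service", "default"));
    }
}
```

&lt;p&gt;Opening the printed URL while the Config Server is running should return the properties it resolved for that service.&lt;/p&gt;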




&lt;h3&gt;
  
  
  Step 5: Integrating Zipkin for Distributed Tracing
&lt;/h3&gt;

&lt;p&gt;Add the Zipkin dependencies to the employee, company, and gateway services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.micrometer&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;micrometer-tracing-bridge-brave&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;dependency&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;groupId&amp;gt;&lt;/span&gt;io.zipkin.reporter2&lt;span class="nt"&gt;&amp;lt;/groupId&amp;gt;&lt;/span&gt;
            &lt;span class="nt"&gt;&amp;lt;artifactId&amp;gt;&lt;/span&gt;zipkin-reporter-brave&lt;span class="nt"&gt;&amp;lt;/artifactId&amp;gt;&lt;/span&gt;
        &lt;span class="nt"&gt;&amp;lt;/dependency&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure trace sampling in application.yml (a probability of 1.0 traces every request, which suits a demo):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;
&lt;span class="na"&gt;management&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tracing&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;sampling&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;probability&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run Zipkin (via Docker or standalone) on port 9411. You can now trace requests across the microservices.&lt;/p&gt;
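&lt;p&gt;To see what the sampling probability means, here is a conceptual sketch of probability-based sampling. The real tracer's sampler is implemented differently; the class and method names are hypothetical and only illustrate why 1.0 traces every request while a lower value traces a fraction of them.&lt;/p&gt;

```java
import java.util.concurrent.ThreadLocalRandom;

// Conceptual sketch of probability-based trace sampling; not the tracer's actual code.
final class SamplingSketch {

    // roll is a number in [0.0, 1.0); the request is traced when roll falls below probability.
    static boolean shouldSample(double probability, double roll) {
        return probability > roll;
    }

    public static void main(String[] args) {
        double roll = ThreadLocalRandom.current().nextDouble();
        System.out.println("probability 1.0 -> " + shouldSample(1.0, roll));
        System.out.println("probability 0.1 -> " + shouldSample(0.1, roll));
    }
}
```

&lt;p&gt;With a probability of 1.0 every roll qualifies, so every request shows up in Zipkin. In production a lower value is a common way to reduce tracing overhead.&lt;/p&gt;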




&lt;h3&gt;
  
  
  Running the Application
&lt;/h3&gt;

&lt;p&gt;Once everything is set up, start the following services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Eureka Server: localhost:8761&lt;/li&gt;
&lt;li&gt;Company Service: localhost:8070&lt;/li&gt;
&lt;li&gt;Employee Service: localhost:8090&lt;/li&gt;
&lt;li&gt;API Gateway: localhost:8222&lt;/li&gt;
&lt;li&gt;Config Server: localhost:8888 (only needed if you use a Config Server)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Access the API Gateway at &lt;a href="http://localhost:8222" rel="noopener noreferrer"&gt;http://localhost:8222&lt;/a&gt; and make requests to paths under /api/v1/company and /api/v1/employee. The gateway routes each request to the matching microservice.&lt;/p&gt;

&lt;p&gt;You can also monitor traces in Zipkin's web UI at &lt;a href="http://localhost:9411" rel="noopener noreferrer"&gt;http://localhost:9411&lt;/a&gt;.&lt;/p&gt;




&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;In this guide, we created two microservices with Spring Boot and integrated Eureka for service discovery, an API Gateway for routing, OpenFeign for communication between the two services, a Config Server for centralized configuration, and Zipkin for distributed tracing. These tools work together to help manage and monitor microservices effectively, providing a scalable and maintainable architecture.&lt;/p&gt;

&lt;p&gt;With this setup, your microservices can scale independently, discover each other dynamically, and be monitored for performance and issues through distributed tracing.&lt;/p&gt;




&lt;h3&gt;
  
  
  Code Repository
&lt;/h3&gt;

&lt;p&gt;You can access the full &lt;a href="https://github.com/abhijith-zero/MicroServiceDummy/tree/master" rel="noopener noreferrer"&gt;source code&lt;/a&gt; for this project on GitHub.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Thank you for reading!&lt;/strong&gt; Happy coding with &lt;strong&gt;Spring Boot and Microservices!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>java</category>
      <category>microservices</category>
      <category>springboot</category>
    </item>
    <item>
      <title>Getting Started with Docker: Essential Commands for Beginners</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Sat, 16 Nov 2024 11:58:46 +0000</pubDate>
      <link>https://dev.to/abhijithzero/getting-started-with-docker-essential-commands-for-beginners-b60</link>
      <guid>https://dev.to/abhijithzero/getting-started-with-docker-essential-commands-for-beginners-b60</guid>
      <description>&lt;p&gt;So you're venturing into the realm of Docker? Great choice! This technology is a game changer for developers, making it incredibly simple to package and run apps in containers.&lt;/p&gt;

&lt;p&gt;To help you get started, here are some important Docker commands you'll commonly use.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Installing Docker
&lt;/h2&gt;

&lt;p&gt;Before you start, make sure Docker is installed on your machine. You can follow the official installation guide for &lt;a href="https://docs.docker.com/get-docker/" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Basic Docker Commands
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker --version&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;verifies your Docker installation&lt;/strong&gt; by checking the installed version.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker pull &amp;lt;image_name&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;pulls a Docker image from Docker Hub&lt;/strong&gt;, the default registry.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker run &amp;lt;image_name&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;creates and runs a container&lt;/strong&gt; from a Docker image. To run the container in detached mode, add the &lt;code&gt;-d&lt;/code&gt; flag.&lt;br&gt;
To map a container port to a local port, add the &lt;code&gt;-p&lt;/code&gt; flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Example:
docker run -d -p 8080:80 nginx

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This runs an nginx container in detached mode, meaning it runs in the background with no terminal attached. The &lt;code&gt;-p 8080:80&lt;/code&gt; flag maps container port 80 to local port 8080, so nginx is reachable at localhost:8080.&lt;/p&gt;
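&lt;p&gt;If you want to launch the same container from application code, the command can be assembled as an argument list. The sketch below only builds the arguments and does not start Docker; the class and method names are hypothetical. The main point is the order inside &lt;code&gt;-p&lt;/code&gt;: host port first, container port second.&lt;/p&gt;

```java
// Sketch: assembles the "docker run" command from this section as an argument array.
final class DockerRunSketch {

    // -p takes HOST_PORT:CONTAINER_PORT, so the host port comes first.
    static String[] command(String image, int hostPort, int containerPort) {
        return new String[] {"docker", "run", "-d", "-p", hostPort + ":" + containerPort, image};
    }

    public static void main(String[] args) {
        // Prints: docker run -d -p 8080:80 nginx
        System.out.println(String.join(" ", command("nginx", 8080, 80)));
    }
}
```

&lt;p&gt;The returned array could be handed to &lt;code&gt;ProcessBuilder&lt;/code&gt; to actually start the container, assuming Docker is installed and on the PATH.&lt;/p&gt;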

&lt;h3&gt;
  
  
  &lt;code&gt;docker ps&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This &lt;strong&gt;shows all running containers&lt;/strong&gt;. Use &lt;code&gt;docker ps -a&lt;/code&gt; to see all containers, including those that are stopped.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker images&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;lists all Docker images&lt;/strong&gt; downloaded to your local machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Commands to Manage Containers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker stop &amp;lt;container_id&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;stops a running container&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker start &amp;lt;container_id&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;starts a stopped container&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker logs &amp;lt;container_id&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;shows the logs of a container&lt;/strong&gt;. Add &lt;code&gt;-f&lt;/code&gt; to follow the output live.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker restart &amp;lt;container_id&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;restarts a running container&lt;/strong&gt;. &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker rm &amp;lt;container_id&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;deletes a stopped container&lt;/strong&gt;. Use &lt;code&gt;-f&lt;/code&gt; to force remove a running container.&lt;/p&gt;

&lt;p&gt;(You can replace &lt;code&gt;&amp;lt;container_id&amp;gt;&lt;/code&gt; with the actual container ID or name.)&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker rmi &amp;lt;image_name&amp;gt;&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;deletes an image from your local machine&lt;/strong&gt;, which frees up disk space.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;docker system prune&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;This command &lt;strong&gt;cleans up all stopped containers&lt;/strong&gt;, dangling images, and unused networks.&lt;/p&gt;

&lt;p&gt;Docker makes it easy to package and deploy applications. Mastering these commands will give you a solid foundation as you begin exploring more advanced features. If you have any questions, ask them below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Happy Dockerizing!&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>learning</category>
      <category>docker</category>
      <category>beginners</category>
    </item>
    <item>
      <title>AWS Basics 1: How I hosted a static website on amazon S3</title>
      <dc:creator>Abhijith</dc:creator>
      <pubDate>Thu, 17 Oct 2024 15:19:37 +0000</pubDate>
      <link>https://dev.to/abhijithzero/aws-basics-1-how-i-hosted-a-static-website-on-amazon-s3-14pp</link>
      <guid>https://dev.to/abhijithzero/aws-basics-1-how-i-hosted-a-static-website-on-amazon-s3-14pp</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Have you ever wondered how to host a static website on AWS? I certainly did! For a while, it felt daunting, and I kept putting it off. But when I finally decided to dive in, I was pleasantly surprised by how simple the process turned out to be. It was quite a journey, and I learned a lot along the way. I’d love to share my experience with you!&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Need
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An AWS account&lt;/li&gt;
&lt;li&gt;About 20 minutes of your time&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Let’s Start with What S3 Is
&lt;/h2&gt;

&lt;p&gt;Amazon Simple Storage Service (S3) is a scalable cloud storage solution offered by AWS. It's designed to store and retrieve any amount of data from anywhere on the web. Here are some key features that make S3 an excellent choice for hosting static websites:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; S3 can handle any size of data, from a few bytes to terabytes, making it perfect for growing websites.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Durability and Availability:&lt;/strong&gt; S3 is designed for 99.999999999% (11 nines) durability, and the S3 Standard storage class is designed for 99.99% availability, so your data is safe and almost always accessible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Effectiveness:&lt;/strong&gt; You only pay for what you use, making it an economical option for hosting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static Website Hosting:&lt;/strong&gt; S3 provides a straightforward way to host static websites without needing to manage servers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, S3 is a reliable, efficient, and user-friendly option for anyone looking to host a static website.&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps to Create Your S3 Bucket
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Log in to the AWS Management Console.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Navigate to S3&lt;/strong&gt; and click &lt;strong&gt;"Create Bucket."&lt;/strong&gt;
&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F95zejtq7ni2rbe5peul8.png" alt=" " width="800" height="421"&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Configure Bucket Settings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bucket Name:&lt;/strong&gt; Choose a unique name for your bucket (this name must be globally unique across all of S3).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region:&lt;/strong&gt; Select the AWS region where you want your bucket to be located. It’s best to choose a region closest to your target audience. (You can find this option in the top navbar.)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Set Permissions:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable ACLs (for more fine-grained control over permissions in the S3 bucket).&lt;/li&gt;
&lt;li&gt;Uncheck "Block all public access" so that people can view your website.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable Bucket Versioning:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This step is optional. Versioning keeps previous versions of each object, similar to version control in Git.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Review and Create:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Review your settings and click the &lt;strong&gt;Create bucket&lt;/strong&gt; button. Your new bucket will be created!&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyib0w1nykw2vaer7yub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvyib0w1nykw2vaer7yub.png" alt=" " width="800" height="274"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Steps to Upload Your Files
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload your &lt;code&gt;index.html&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
This file serves as the main entry point for your website.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Upload the folder containing all website assets:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Make sure to upload the folder that contains your CSS, JavaScript, images, and other assets. &lt;strong&gt;Note:&lt;/strong&gt; Do not upload a zipped version, as S3 cannot unzip files.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvbo95wbvm8t343yuh1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvbo95wbvm8t343yuh1g.png" alt=" " width="800" height="203"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Enable Static Website Hosting:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Go to the &lt;strong&gt;Properties&lt;/strong&gt; section of your bucket, enable static website hosting, and specify the index document (the default page of your website, for example &lt;code&gt;index.html&lt;/code&gt;). This allows S3 to serve your website files directly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Testing Your URL
&lt;/h3&gt;

&lt;p&gt;Now, test the URL generated by S3. Did you encounter an error? If so, it’s likely because your bucket permissions need to be set to allow public access to your files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4m4bdn9r00va180al2c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp4m4bdn9r00va180al2c.png" alt="Error Example" width="800" height="178"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why the Error Occurs
&lt;/h3&gt;

&lt;p&gt;Even with "Block all public access" turned off, the objects in an S3 bucket are private by default for security reasons. It’s like having a beautifully displayed store window: everyone can see the store itself, but the products inside are locked away and inaccessible. To fix this, you need to change the permissions of your files to make them public. Once that’s done, visitors will be able to see and access your content as intended.&lt;/p&gt;

&lt;h3&gt;
  
  
  Making Objects Public
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Select the objects in your S3 bucket, open the &lt;strong&gt;Actions&lt;/strong&gt; menu, and choose &lt;strong&gt;Make public using ACL&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Refresh the website URL to view your site.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aws</category>
      <category>s3</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
