DEV Community: Abhijith

How I Built an AI Conversation Coach with the Gemini Live API for the Gemini Live Agent Challenge

Abhijith — Mon, 16 Mar 2026 20:52:40 +0000

Why I Built This

I'm socially anxious. Not the cute "I'm introverted" kind, but the actual scrambling-for-words, wishing-the-ground-would-open-up-and-swallow-me kind.

So I moved to Ireland for grad school. Figured a fresh start, new country, new people. Maybe things would be different. A few months in, I went to a tech conference. Perfect opportunity to network, I thought. I psyched myself up.

Then it happened. I got there, saw a group of people chatting, and my brain just... shut down. Froze completely. I wanted to join the conversation so badly, but I just couldn't do it. I spent half the conference wandering around, pretending to look at the schedule.

The frustrating part? It wasn't because I had nothing to say. I work in tech, I find the problems interesting, I wanted to talk to these people. The problem was pure lack of practice. I'd never actually practiced starting conversations with strangers. And it showed.

The Idea

So what if there was a way to practice this? Like, actually practice talking to people without the fear of judgment. A realistic AI person who could act distracted, busy, skeptical the way real people at conferences actually are.

IceBreaker is basically that. You pick any scenario (a casual conversation, a tech mixer, a cold intro at a founder booth, whatever you want) and you practice out loud. The AI responds in real time, like an actual person. It can be warm or guarded depending on the difficulty level.

While you're talking, it analyzes your audio and video in real-time and gives you live tips. Things like "try asking a question here" or "you said 'um' 5 times that minute." After you're done, you get a full breakdown with scores: how much you talked vs. listened, how many questions you asked, filler words, body language confidence, your sentiment trend through the conversation, and how well you recovered from awkward moments.

Plus a dashboard to track your progress over multiple sessions. Watch your scores improve over time.

Building It

This wasn't a simple chatbot. I needed real-time audio conversations that actually felt like talking to a person. Which meant WebSockets, streaming audio, video frames, and multiple systems all talking to each other.

Here's what I used:

Part	Stack
Frontend	React 19 + Vite + Tailwind + Recharts (for the debrief charts)
AI Engine	Google Gemini 2.5 Flash via Gemini Live API
Backend	Python + FastAPI on Cloud Run
Database	Firestore
Deployment	Docker, Cloud Build, Vercel

The tricky part was the real-time bit. The browser opens a WebSocket directly to Gemini, streams audio to it (16 kHz), gets audio back (24 kHz), and also sends video frames (~1 per second) so Gemini can see your body language.
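To get a feel for what those sample rates mean on the wire, here's a quick back-of-the-envelope sketch. It's in Java purely for illustration (the project itself is Python and React), and it assumes 16-bit mono PCM and 100 ms chunks, neither of which is stated above. `AudioChunkMath` is a made-up name.

```java
// Sketch only: assumes 16-bit mono PCM, which the article does not state.
final class AudioChunkMath {
    static final int BYTES_PER_SAMPLE = 2; // assumed 16-bit samples

    /** Bytes needed for `millis` of mono PCM audio at `sampleRateHz`. */
    static int bytesFor(int sampleRateHz, int millis) {
        return sampleRateHz * BYTES_PER_SAMPLE * millis / 1000;
    }

    public static void main(String[] args) {
        // 100 ms is an arbitrary chunk length picked for the example.
        System.out.println("uplink 100 ms:   " + bytesFor(16_000, 100) + " bytes");
        System.out.println("downlink 100 ms: " + bytesFor(24_000, 100) + " bytes");
    }
}
```

Under those assumptions, the audio coming back is 1.5x the size of the audio going out for the same duration, which is worth remembering when sizing playback buffers.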

Two function calls handle the feedback loop:

submit_tip(): After you speak, Gemini calls this to send live coaching
submit_metrics(): At the end, Gemini calls this to calculate your scores

Why Function Calls Mattered

Here's something I learned the hard way: asking an LLM to output JSON mixed in with regular text is a mess. I started by having Gemini just output tips and metrics as text, then I'd parse them. It was unreliable. Sometimes it would paraphrase what you said wrong. Sometimes the JSON wasn't valid. Sometimes it just forgot what you were asking for.

Then I switched to function calling. Instead of asking Gemini to "output this as JSON," I gave it actual functions to call. Now the coaching tips and metrics come through as clean, structured data. No parsing guesswork, and no stray prose wrapped around the payload. It just works.
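Here's roughly what the receiving side of that looks like, as a minimal sketch. The function names `submit_tip` and `submit_metrics` come from the app; everything else (the `CoachingCallRouter` class, the `text` argument name, the Java itself) is hypothetical and not the Gemini SDK. The point is that the handler gets a name plus a map of arguments, so there's nothing to parse out of free text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical router: receives an already-structured function call and dispatches it.
final class CoachingCallRouter {
    final List<String> tips = new ArrayList<>();
    Map<String, Object> metrics = Map.of();

    /** Routes one function call by name; rejects anything outside the declared set. */
    void dispatch(String name, Map<String, Object> args) {
        if ("submit_tip".equals(name)) {
            Object text = args.get("text"); // "text" is an assumed argument name
            if (!(text instanceof String) || ((String) text).trim().isEmpty()) {
                throw new IllegalArgumentException("submit_tip requires a non-empty 'text'");
            }
            tips.add((String) text);
        } else if ("submit_metrics".equals(name)) {
            metrics = Map.copyOf(args); // keep an immutable snapshot of the scores
        } else {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
    }

    public static void main(String[] args) {
        CoachingCallRouter router = new CoachingCallRouter();
        router.dispatch("submit_tip", Map.of("text", "Try asking a question here"));
        System.out.println(router.tips);
    }
}
```

Validation still matters: structured calls remove the parsing problem, but the arguments themselves can be missing or wrong, so the handler checks them before use.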

What Went Wrong (And How I Fixed It)

The AI Kept Forgetting Things

Early on, if your internet dropped for a second, the AI would lose context. It'd restart the conversation or start making stuff up. Imagine you're halfway through pitching your startup and suddenly the AI forgets what you said two turns ago. It was bad.

I fixed this by saving sessions on the backend. Now if you drop connection, you can pick up right where you left off.

The AI Sounded Like a Chatbot

Getting Gemini to sound like an actual person at a conference (not helpful, not overly eager) took a lot of tweaking. The same prompt would sometimes produce a warm, natural response and other times something super formal and robotic. It'd ask too many questions at once, or use phrases nobody actually says.

I spent a lot of time on prompt engineering. Testing different personas, different conversational styles, getting more specific about what "natural" means. It's still not perfect, but it's way better.

Parsing Was Killing Me

Before I switched to function calls, I was asking Gemini to output coaching tips as text, then trying to parse that. It was fragile. The model would sometimes rewrite what you said, sometimes format things weird. I'd built all this parsing logic and it still broke constantly.

Once I switched to function calling, it was night and day. Clean JSON every time. No more parsing headaches.

Things I'm Actually Proud Of

Getting Gemini Live to work end-to-end. Real-time audio conversations that don't feel like you're talking to a bot. That's genuinely hard. WebSockets, audio streaming, keeping state, managing latency: there's a lot that can go wrong. I made it work.

Building something people can actually use. Not just a demo that works once. I mean error handling, reconnection logic, data persistence, tracking progress over time. The boring stuff that makes a product actually useful.

The feedback system actually helps. You don't just get random stats. The coaching tips are timely (in-the-moment), and the debrief metrics are actually actionable. You can see exactly where you improved.

What I Actually Learned

Real-time streaming is fragile. Audio dropping, video lag, WebSocket timeouts—there are so many places where things can break. It's not as simple as "just stream the data." You have to think about buffers, reconnection, graceful degradation.

Never ask an LLM for JSON and try to parse it. This is a hard lesson. The model will sometimes output valid JSON, sometimes not, sometimes it'll add comments, sometimes it'll mess up the schema. Function calling is the right answer. Give the model an actual function to call, not a text format to output.

Prompt engineering is really hard. I thought I could write a prompt once and be done. Nope. The same prompt produces different outputs depending on temperature, context, the moon phase, who knows. It takes iteration, testing, examples, and luck. Don't underestimate it.

Shipping matters more than having the perfect feature. I could spend months perfecting the persona AI. But shipping an 80% solution that people can use and give feedback on is way more valuable. You learn what actually matters from real users, not from theorizing.

Next Steps

More personas and scenarios. Right now there are a few conversation types. I want to expand that—different industries, different difficulty levels, different people types.

Let people create custom scenarios. Instead of me pre-building everything, what if you could describe an event you're going to, describe the kind of person you expect to meet, and have IceBreaker generate a practice session tailored to that? Way more useful.

That's It

Building this thing taught me that good products come from real problems. I built it because I was frustrated with my own anxiety, not because I wanted to solve "networking" for everyone. And honestly? My anxiety is still there. I'm still nervous at conferences.

But now I can practice. I can work on it. And maybe that's the difference between just being stuck with something and actually being able to improve.

If you're like me, if you struggle with social stuff and want to get better, try it out. If you have feedback, let me know.

Try IceBreaker
View Code

Building a RAG Powered Assistant with Spring AI and LM Studio

Abhijith — Tue, 17 Feb 2026 00:25:34 +0000

How to Create an Intelligent Document Q&A System Using Spring AI, PostgreSQL, and LM Studio

Imagine having an AI assistant that can instantly answer questions about hundreds of financial documents (quarterly reports, market analyses, policy papers) without you having to manually search through pages of text. That's exactly what Retrieval Augmented Generation (RAG) enables, and in this tutorial, we'll build one from scratch using Spring Boot.

By the end of this guide, you'll have a fully functional application that:

Ingests PDF documents and extracts their content
Converts text into semantic embeddings using AI models
Stores embeddings in a PostgreSQL vector database
Answers natural language queries with contextual accuracy

What is RAG and Why Does It Matter?

Retrieval Augmented Generation (RAG) is a technique that enhances Large Language Models (LLMs) by providing them with relevant context from external knowledge sources. Instead of relying solely on the model's training data, RAG systems:

Retrieve relevant documents based on semantic similarity
Augment the LLM's prompt with retrieved context
Generate accurate, grounded answers

This approach is particularly powerful for:

Enterprise knowledge bases with proprietary information
Financial document analysis and compliance
Customer support systems with extensive documentation
Research paper exploration and literature reviews

Architecture Overview

Our FinanceRag application follows a straightforward yet powerful architecture:

1. Document Ingestion Pipeline

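PDF documents are read from the classpath and processed by one of Spring AI's PDF readers, which extract text while preserving structure. The ingestion code later in this tutorial uses ParagraphPdfDocumentReader; PagePdfDocumentReader is the page-based alternative.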

2. Text Chunking

The TokenTextSplitter divides the extracted text into manageable chunks (800 tokens each). This is crucial because:

Embedding models have token limits
Smaller chunks provide more precise semantic matching
Context windows in LLMs benefit from focused, relevant information
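To make the chunking idea concrete, here is a deliberately naive sketch. It is not how TokenTextSplitter works internally: it treats whitespace-separated words as stand-ins for tokens (a real splitter uses a tokenizer), and the class name `NaiveChunker` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: words approximate tokens; a real splitter uses a tokenizer.
final class NaiveChunker {
    /** Splits text into chunks of at most maxWords whitespace-separated words. */
    static List<String> chunk(String text, int maxWords) {
        if (maxWords <= 0) throw new IllegalArgumentException("maxWords must be positive");
        List<String> chunks = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return chunks;
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start += maxWords) {
            int end = Math.min(start + maxWords, words.length);
            chunks.add(String.join(" ", Arrays.copyOfRange(words, start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(chunk("Revenue grew in the fourth quarter despite higher costs", 4));
    }
}
```

Each chunk is later embedded on its own, which is why the chunk size directly affects how precise the similarity matches can be.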

3. Vector Embedding Generation

Each text chunk is converted into a high-dimensional vector (embedding) using the nomic-embed-text model. These embeddings capture semantic meaning: similar concepts cluster together in vector space.
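"Close together in vector space" is usually measured with a similarity function. Cosine similarity is one common choice; the sketch below shows the arithmetic on plain arrays. It is a standalone illustration with a hypothetical class name, not code from Spring AI or pgvector.

```java
// Standalone illustration of cosine similarity between two embeddings.
final class CosineSimilarity {
    /** Returns a value in [-1, 1]; higher means the vectors point in more similar directions. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("dimension mismatch");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) return 0; // convention chosen here for zero vectors
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors; real embeddings have hundreds of dimensions.
        System.out.println(cosine(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
    }
}
```

The dimension check matters in practice: comparing vectors from two different embedding models is meaningless, which is why queries must be embedded with the same model as the documents.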

4. Vector Storage with pgvector

Embeddings are persisted in PostgreSQL using the pgvector extension, which enables efficient similarity searches. We use HNSW indexing for fast approximate nearest neighbor (ANN) queries.

5. Query Processing

When a user asks a question:

The question is embedded using the same model
Vector similarity search retrieves the most relevant document chunks
The QuestionAnswerAdvisor augments the LLM prompt with this context
The LLM generates a contextual answer
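The retrieve-then-augment steps above can be sketched in plain Java. This is a conceptual, in-memory illustration only: Spring AI and pgvector do this for you, the class and method names are hypothetical, the prompt wording is invented, and the two-dimensional vectors in `main` are toy values rather than real embeddings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Conceptual sketch of "retrieve top-k chunks, then prepend them to the prompt".
final class RetrieveAndAugment {
    static final class Chunk {
        final String text;
        final double[] embedding;
        Chunk(String text, double[] embedding) { this.text = text; this.embedding = embedding; }
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Returns the k chunks most similar to the query embedding, best first. */
    static List<Chunk> topK(double[] query, List<Chunk> store, int k) {
        List<Chunk> sorted = new ArrayList<>(store);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> cosine(query, c.embedding)).reversed());
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    /** Builds the augmented prompt: retrieved context first, then the question. */
    static String augment(String question, List<Chunk> context) {
        StringBuilder sb = new StringBuilder("Answer using only this context:\n");
        for (Chunk c : context) sb.append("- ").append(c.text).append('\n');
        return sb.append("Question: ").append(question).toString();
    }

    public static void main(String[] args) {
        List<Chunk> store = List.of(
            new Chunk("Q4 revenue grew", new double[] {1.0, 0.0}),
            new Chunk("Office relocation notice", new double[] {0.0, 1.0}));
        System.out.println(augment("How did revenue change?", topK(new double[] {0.9, 0.1}, store, 1)));
    }
}
```

A linear scan like this is fine for a handful of chunks; the point of an ANN index such as HNSW is to avoid comparing the query against every stored vector.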

Building the Application: Step by Step

Prerequisites

Before diving into code, ensure you have:

Java 17+ installed
PostgreSQL 12+ with pgvector extension
LM Studio (or another OpenAI-compatible LLM endpoint)
Maven 3+ for dependency management

Setting Up PostgreSQL with pgvector

You have two options for setting up PostgreSQL:

Option 1: Using Docker (Recommended for Quick Start)

The repository includes a compose.yaml file for easy setup:

services:
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "55419:5432"
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: finance
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Simply run:

docker-compose up -d

This spins up PostgreSQL with pgvector pre-installed on port 55419.

Option 2: Manual Installation

First, create a database and enable the vector extension:

CREATE DATABASE finance;
\c finance
CREATE EXTENSION IF NOT EXISTS vector;

The pgvector extension adds a new vector data type to PostgreSQL, enabling efficient storage and querying of high-dimensional vectors.

Configuring Spring Boot

Your application.properties file should include:

# Application Name
spring.application.name=financeRag

# Database Configuration (Docker setup)
spring.datasource.url=jdbc:postgresql://localhost:55419/finance
spring.datasource.username=postgres
spring.datasource.password=postgres

# LLM Configuration (LM Studio)
spring.ai.openai.base-url=http://localhost:1234/
spring.ai.openai.api-key=dummy

# Embedding Model
spring.ai.openai.embedding.options.model=nomic-embed-text

# Chat Model
spring.ai.openai.chat.options.model=google/gemma-3-4b

# Vector Store Configuration
spring.ai.vectorstore.pgvector.initialize-schema=true

# Ingestion Control (IMPORTANT!)
financerag.ingest.enabled=true

Key Configuration Notes:

Port 55419 matches the Docker Compose setup
The initialize-schema=true automatically creates the vector store table
nomic-embed-text is a lightweight, high-quality embedding model
google/gemma-3-4b is the chat model served by LM Studio

Important: Ingestion Control

The financerag.ingest.enabled property is a smart optimization:

First Run (Initial Setup):

financerag.ingest.enabled=true

This processes your PDFs and populates the vector store.

Subsequent Runs:

financerag.ingest.enabled=false

This skips ingestion and starts the application immediately. The embeddings are already in PostgreSQL, so there's no need to re-process documents every time!

This design prevents:

Duplicate embeddings in the database
Slow startup times on every restart
Unnecessary embedding API calls

Setting Up LM Studio

Download and Install LM Studio from lmstudio.ai
Download the Required Models:
- Embedding Model: Search for "nomic-embed-text" in LM Studio and download it
- Chat Model: Search for "google/gemma-3-4b" (or similar) and download it
Start the Local Server:
- Open LM Studio
- Go to the "Local Server" tab
- Select your chat model (gemma-3-4b)
- Click "Start Server" (it will run on http://localhost:1234 by default)
- Ensure the embedding model is also loaded
Verify the Connection:

   curl http://localhost:1234/v1/models

You should see your loaded models listed.

The Ingestion Service

The heart of our document processing pipeline is the IngestionService. Here's how it works:

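import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.reader.pdf.ParagraphPdfDocumentReader;
import org.springframework.ai.transformer.splitter.TextSplitter;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Component;

@Component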
@ConditionalOnProperty(
    name = "financerag.ingest.enabled",
    havingValue = "true",
    matchIfMissing = false
)
public class IngestionService implements CommandLineRunner {
    private static final Logger logger = LoggerFactory.getLogger(IngestionService.class);

    private final VectorStore vectorStore;

    @Value("classpath:/docs/article.pdf")
    private Resource pdfResource;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    @Override
    public void run(String... args) throws Exception {
        logger.info("Starting data ingestion process...");

        // 1. Read PDF using paragraph-based reader
        var pdfReader = new ParagraphPdfDocumentReader(pdfResource);

        // 2. Split text into chunks
        TextSplitter splitter = new TokenTextSplitter();

        // 3. Process and store in vector database
        vectorStore.accept(splitter.apply(pdfReader.get()));

        logger.info("Vector store updated with PDF content.");
    }
}

Key Implementation Insights:

1. Conditional Ingestion
The @ConditionalOnProperty annotation means the bean is only created, and ingestion only runs, when you explicitly enable it:

# Enable ingestion on first run
financerag.ingest.enabled=true

# Disable after initial setup to avoid re-ingesting
financerag.ingest.enabled=false

This prevents re-processing documents on every application restart!

2. CommandLineRunner Interface
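Because IngestionService implements CommandLineRunner, ingestion runs automatically once the application context has started, before Spring Boot publishes ApplicationReadyEvent and marks the application as ready to accept traffic. The embedded web server is already listening at that point, so a long ingestion run delays readiness rather than startup of the port.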

3. ParagraphPdfDocumentReader vs PagePdfDocumentReader
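The code uses ParagraphPdfDocumentReader, which:

Preserves document structure better by respecting paragraph boundaries
Creates more semantically meaningful chunks
Is better suited to financial documents with structured content

It relies on the PDF's catalog (table of contents) metadata to find paragraphs, so PDFs without one need PagePdfDocumentReader instead.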

4. Simplified API
The vectorStore.accept() method elegantly handles:

Embedding generation for each chunk
Batch insertion into PostgreSQL
All the complexity hidden behind a clean API

The Chat Controller

Now let's expose a REST endpoint for queries:

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder chatClient, PgVectorStore vectorStore) {
        this.chatClient = chatClient
            .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())
            .build();
    }

    @GetMapping("/chat")
    public String chat(@RequestParam String question) {
        return chatClient.prompt()
            .user(question)
            .call()
            .content();
    }
}

The Magic of QuestionAnswerAdvisor:

The QuestionAnswerAdvisor is where RAG happens. Behind the scenes, it:

Converts the user's question into an embedding
Performs a similarity search against the vector store
Injects the most relevant document chunks into the prompt
Sends the augmented prompt to the LLM

Key Implementation Details:

The advisor is built using the builder pattern: QuestionAnswerAdvisor.builder(vectorStore).build()
Spring AI automatically handles the vector search and context injection
The controller method is elegantly simple - just pass the question through the chat client

Real World Considerations

Choosing the Right Chunk Size

The 800-token chunk size is a starting point. Consider:

Smaller chunks (200-400 tokens): Better precision, but may lose context
Larger chunks (1000-1500 tokens): More context, but less precise matching

Experiment with your specific use case. Financial reports might need larger chunks to preserve numerical context, while FAQs work better with smaller, focused chunks.

Scaling to Production

For production deployments, consider:

Async ingestion: Move document processing to background jobs
Caching: Cache embeddings for frequently accessed documents
Metadata filtering: Add tags (date, category, source) to narrow searches
Monitoring: Track query latency and similarity scores

Hybrid Search Strategies

Pure vector search isn't always optimal. Combine it with:

Full-text search: For exact keyword matches
BM25 ranking: Traditional relevance scoring
Re-ranking: Use a cross-encoder model to refine top results
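One simple way to combine a vector ranking with a keyword ranking is Reciprocal Rank Fusion (RRF). This is a technique swapped in here for illustration; it isn't part of FinanceRag or a Spring AI API. Each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The class name is hypothetical, and k = 60 is a commonly used constant rather than a tuned value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative Reciprocal Rank Fusion over lists of document ids.
final class ReciprocalRankFusion {
    /** Fuses ranked id lists: score(id) = sum over lists of 1 / (k + rank), rank starting at 1. */
    static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        List<String> ids = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by id so the output is deterministic.
        ids.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return ids;
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("A", "B", "C");   // made-up ids from a vector search
        List<String> keywordHits = List.of("B", "D", "A");  // made-up ids from a keyword search
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60));
    }
}
```

RRF only needs ranks, not raw scores, which makes it convenient when the two searches produce scores on incomparable scales.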

Testing Your RAG System

Start the application and test with curl:

curl "http://localhost:8080/chat?question=What%20were%20the%20key%20trends%20in%20Q4%20earnings?"

You should see an answer grounded in your ingested documents. Compare responses with and without RAG to appreciate the difference in accuracy and relevance.

Common Pitfalls and Solutions

1. Embedding Dimension Mismatch

Problem: Embeddings fail to store with dimension errors.

Solution: Ensure spring.ai.vectorstore.pgvector.dimensions matches your embedding model. For nomic-embed-text, use 768.

2. Poor Retrieval Quality

Problem: Answers don't align with document content.

Solution: Adjust chunk size, increase topK, or lower the similarity threshold. Also verify your embedding model is appropriate for your domain.

3. Memory Issues During Ingestion

Problem: Application crashes with OutOfMemoryError.

Solution: Process documents in batches, increase JVM heap size (-Xmx4g), or limit the maxNumChunks parameter.

Extending FinanceRag: Ideas for Enhancement

This project is a foundation. Here are some powerful extensions:

Multi Document Support

Instead of hardcoding a single PDF, scan a directory or accept uploads via REST API. Add metadata (filename, upload date) to enable filtered searches.

Conversational Memory

Implement session based chat history so users can ask follow up questions without repeating context. Spring AI supports this with MessageChatMemoryAdvisor.

Source Attribution

Return not just the answer but citations showing which document chunks were used. This builds trust and allows users to verify information.

Advanced Analytics

Track which documents are queried most frequently, average similarity scores, and query patterns to identify knowledge gaps.

Conclusion

You've now built a working RAG system that can answer questions about your documents. With the production considerations above in place, this architecture can scale to thousands of documents and be adapted to use cases such as customer support, legal document analysis, and medical research.

The beauty of Spring AI is how it abstracts the complexity of embeddings, vector stores, and LLM orchestration, letting you focus on business logic. With just three components (IngestionService, ChatController, and pgvector) we've created a powerful AI assistant.

The full source code for FinanceRag is available on GitHub. Clone it, experiment with different models and chunk sizes, and adapt it to your domain. The future of enterprise AI is built on foundations like these: combining the power of LLMs with your organization's proprietary knowledge.

Special Thanks: This project was inspired by the excellent Spring AI content from Dan Vega, whose tutorials have helped countless developers understand the power of RAG architectures.

Happy coding, and may your AI assistants always retrieve the right context!

About the Author

This tutorial is brought to you by Abhijith Rajesh.

Top 5 Infrastructure-Level Techniques to Handle High Traffic in Spring Boot: Part 2

Abhijith — Sun, 13 Jul 2025 18:26:02 +0000

In Part 1 of this blog series, we focused on code-level techniques to make your Spring Boot APIs more resilient: connection pooling, caching, async processing, rate limiting, and circuit breakers.

But when traffic really surges — due to a flash sale, viral feature, or seasonal peak — smart code alone may not be enough.

That’s where infrastructure-level strategies come in.

From auto-scaling groups and load balancers to observability, CDNs, and container orchestration — these tools and patterns ensure your backend scales horizontally, responds intelligently, and recovers automatically.

Let’s break down how you can build an infrastructure that’s ready for real-world traffic.

1. Load Balancing

When thousands (or millions) of users start hitting your application, routing all that traffic to a single server is a recipe for disaster. That's where load balancers come in.

What Is Load Balancing?

Load balancing is the process of distributing incoming requests across multiple instances of your application, so that no single server gets overwhelmed.

It ensures:

High availability (if one instance goes down, others take over)
Better performance (requests are split evenly)
Scalability (you can add/remove servers dynamically)

Think of it like a traffic cop that routes vehicles (requests) evenly across open lanes (app instances).
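The simplest distribution strategy is round robin: hand each new request to the next instance in the list. Here is a minimal sketch of the idea. It is illustrative only (real load balancers also track health and in-flight connections), and the class name and instance names are made up.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal round-robin selection over a fixed list of instances.
final class RoundRobinBalancer {
    private final List<String> instances;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> instances) {
        if (instances.isEmpty()) throw new IllegalArgumentException("need at least one instance");
        this.instances = List.copyOf(instances);
    }

    /** Picks the next instance in rotation; safe to call from multiple threads. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(i);
    }

    public static void main(String[] args) {
        RoundRobinBalancer lb = new RoundRobinBalancer(List.of("app-1", "app-2", "app-3"));
        for (int n = 0; n < 4; n++) System.out.println(lb.next());
    }
}
```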

L4 vs L7 Load Balancing

There are two main types of load balancing:

Layer	Description	Example Use Case
L4 (Transport Layer)	Routes traffic based on IP address and port (TCP/UDP)	Fast pass-through of raw TCP/UDP traffic (e.g., database or message-broker connections)
L7 (Application Layer)	Routes based on request content (URL path, headers, cookies)	Direct `/api/users` to user-service and `/api/orders` to order-service

Tip: Most modern apps use L7 load balancing because it provides more control and intelligent routing.
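To illustrate what "routes based on request content" means at L7, here is a small path-prefix router using the `/api/users` and `/api/orders` example from the table. It is a conceptual sketch with a hypothetical class name, not how NGINX or an ALB is configured.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual L7 routing: choose a backend service from the request path.
final class PathRouter {
    private final Map<String, String> routes = new LinkedHashMap<>();

    PathRouter route(String pathPrefix, String service) {
        routes.put(pathPrefix, service);
        return this;
    }

    /** Returns the service whose prefix is the longest match for the path, or the fallback. */
    String resolve(String path, String fallback) {
        String best = null;
        for (String prefix : routes.keySet()) {
            if (path.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? fallback : routes.get(best);
    }

    public static void main(String[] args) {
        PathRouter router = new PathRouter()
            .route("/api/users", "user-service")
            .route("/api/orders", "order-service");
        System.out.println(router.resolve("/api/orders/17", "default-service"));
    }
}
```

An L4 balancer could not make this decision, because it never looks inside the HTTP request.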

Popular Load Balancers

Here are some tools you can use depending on your environment:

- NGINX

Lightweight and widely used L7 load balancer
Great for self-managed or on-prem deployments
Can route based on path, headers, or even cookie values

- AWS Application Load Balancer (ALB)

Fully managed L7 load balancer in AWS
Works seamlessly with EC2, ECS, EKS, etc.
Supports auto-scaling + health checks

- Spring Cloud Gateway

Java-based API gateway built on Spring Boot + Reactor
Ideal for microservices and reactive apps
Can be used for dynamic routing, rate limiting, and circuit breaking

2. Auto Scaling Groups (ASGs)

No matter how well you’ve tuned your code or balanced your load, there’s a limit to what a single instance of your application can handle.

Auto Scaling Groups (ASGs) let you automatically adjust the number of application instances based on real-time traffic and performance — scaling out during spikes and in when things are quiet.

What Is an Auto Scaling Group?

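An Auto Scaling Group is the AWS name for a service that manages a group of virtual machines (like EC2 instances) running your app. Other clouds offer the same idea under different names: Virtual Machine Scale Sets on Azure and managed instance groups on GCP.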

It can automatically:

Scale out: Add more instances when load increases
Scale in: Remove excess instances when traffic drops

This ensures your app has just enough capacity — not too little (which causes downtime) and not too much (which wastes money).

Common Scaling Triggers

ASGs respond to key metrics like:

Metric	Description
CPU Utilization	Scale out when CPU > 70% for X minutes
Request Count	Scale based on incoming HTTP request rate
Latency	Scale if average response time increases
Custom Metrics	Queue length, memory usage, DB connections

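On AWS, these triggers are configured as CloudWatch alarms attached to scaling policies. Kubernetes applies the same idea to pods through the Horizontal Pod Autoscaler (HPA).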

Horizontal vs Vertical Scaling

Type	Description	Example
Vertical Scaling	Increase resources on a single machine (CPU, RAM)	Upgrade from t3.small → t3.large
Horizontal Scaling	Add more instances of the app	Launch 3 → 10 EC2 instances

Horizontal scaling (ASG) is preferred for high availability and fault tolerance.

Warm vs Cold Starts

When an ASG scales out, new instances need to boot up, pull code, and initialize. This delay, often 30–90 seconds, is called a cold start.

To reduce cold start impact:

Use pre-baked AMIs (Amazon Machine Images) or Docker images preloaded with your app
Prefer warm pools or pre-provisioned containers (ECS, EKS)

Example: ASG in AWS

You set up an ASG with:
- Min size: 2 instances
- Max size: 10 instances
- Scale out when CPU > 70% for 3 mins
- Scale in when CPU < 30% for 5 mins

At low traffic, it runs 2 instances. During a traffic spike, it can scale up to 10 instances automatically — no manual intervention required.
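The rules in this example can be written down as a small decision function. The sketch below uses the thresholds from the example (min 2, max 10, scale out above 70% for 3 minutes, scale in below 30% for 5 minutes). It assumes one CPU sample per minute and a step of one instance per evaluation, both of which are simplifications; the class name is hypothetical and this is not the AWS API.

```java
import java.util.List;

// Decision logic only: given recent CPU samples, what should the instance count be?
final class ScalingPolicy {
    static final int MIN = 2, MAX = 10;

    /** True if each of the last `minutes` samples is above (or below) the threshold. */
    static boolean sustained(List<Double> cpuPerMinute, int minutes, boolean above, double threshold) {
        int n = cpuPerMinute.size();
        if (n < minutes) return false; // not enough history yet
        for (double cpu : cpuPerMinute.subList(n - minutes, n)) {
            if (above ? cpu <= threshold : cpu >= threshold) return false;
        }
        return true;
    }

    /** Returns the desired instance count, changing by at most one per evaluation. */
    static int desired(int current, List<Double> cpuPerMinute) {
        if (sustained(cpuPerMinute, 3, true, 70.0)) return Math.min(current + 1, MAX);
        if (sustained(cpuPerMinute, 5, false, 30.0)) return Math.max(current - 1, MIN);
        return Math.max(MIN, Math.min(current, MAX));
    }

    public static void main(String[] args) {
        // Made-up samples: one quiet minute followed by three hot ones.
        System.out.println(desired(2, List.of(50.0, 75.0, 80.0, 90.0)));
    }
}
```

Requiring the condition to hold for several minutes, and scaling in more slowly than scaling out, keeps the group from flapping on short spikes.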

Spring Boot Compatibility

Spring Boot apps work well in auto-scaling environments when:

They are stateless (no in-memory session data)
Configs like DB connections and cache clients are tuned for dynamic environments
Health checks (like /actuator/health) are configured properly

Auto Scaling gives you elasticity — your app grows and shrinks with your traffic, keeping costs down and uptime high.

3. Containerization & Orchestration

Scaling manually — provisioning servers, installing dependencies, deploying code — becomes a bottleneck as traffic increases. That’s why modern Spring Boot applications are containerized with tools like Docker and managed by orchestration platforms like Kubernetes or AWS ECS.

What is Containerization?

Containerization packages your app and its dependencies into a self-contained unit that runs anywhere — consistently.

Popular tool:

Docker — the most widely used container platform.

With Docker, you can "bake" your Spring Boot app into an image using a Dockerfile.

📄 Example Dockerfile:

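# Base image: JRE only (the official openjdk image is deprecated)
FROM eclipse-temurin:17-jre
# Copy the fat JAR produced by the build
COPY target/myapp.jar app.jar
ENTRYPOINT ["java", "-jar", "app.jar"]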

Why Containers Help Handle High Traffic

Fast startup: Containers boot in seconds, perfect for scaling.
Consistency: "It works on my machine" becomes irrelevant.
Portability: Works across environments — cloud, local, CI/CD.
Isolation: Each app instance runs independently.

During traffic spikes, containers let you scale quickly and cleanly.

What Is Orchestration?

After containerizing your app, you need a system to:

Start and stop containers
Restart failed ones
Scale based on load
Handle networking between services

This is called container orchestration.

Popular Orchestration Tools

Tool	Description	Best For
Kubernetes	Cloud-agnostic, powerful container orchestrator	Complex, production-grade deployments
AWS ECS	AWS-managed orchestration for Docker containers	AWS-native apps
AWS Fargate	Serverless containers (no servers to manage)	Quick, scalable deployments

A common stack today: Spring Boot + Docker + Kubernetes

4. CDN & Edge Caching

When your APIs or static assets are publicly accessible, you don’t want every request to hit your Spring Boot server — especially during traffic spikes.

This is where CDNs (Content Delivery Networks) and edge caching come in.

What Is a CDN?

A CDN is a network of geographically distributed servers that cache and serve content closer to the user.

Instead of serving static files (images, CSS, JS) or even public APIs from your origin server every time, a CDN:

Reduces latency
Caches content near the user
Shields your backend from spikes

Common CDNs

CDN Service	Ideal Use Case
Cloudflare	Static content, public APIs, free tier
AWS CloudFront	Deep AWS integration, S3, Lambda@Edge
Fastly	Real-time edge logic
Akamai	Enterprise-grade, massive scale

What You Can Cache

Images, stylesheets, JS bundles
Product listings or public blogs
Public GET endpoints (e.g., /products, /news)
API responses with Cache-Control headers
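Conceptually, an edge cache keeps a response for a fixed time to live (TTL) and only goes back to the origin when the entry has expired, which is roughly what a `max-age` value asks for. The sketch below illustrates that behaviour in memory. It is a simplification with a hypothetical class name, not a CDN API, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// In-memory illustration of TTL-based caching in front of an origin server.
final class TtlCache {
    private static final class Entry {
        final String body;
        final long expiresAtMillis;
        Entry(String body, long expiresAtMillis) { this.body = body; this.expiresAtMillis = expiresAtMillis; }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final long ttlMillis;
    int originHits = 0; // how many requests actually reached the origin

    TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** Serves from cache while the entry is fresh; otherwise calls the origin and stores the result. */
    synchronized String get(String key, long nowMillis, Supplier<String> origin) {
        Entry e = entries.get(key);
        if (e != null && nowMillis < e.expiresAtMillis) return e.body;
        originHits++;
        String body = origin.get();
        entries.put(key, new Entry(body, nowMillis + ttlMillis));
        return body;
    }

    public static void main(String[] args) {
        TtlCache cache = new TtlCache(60_000); // 60 s TTL, an arbitrary example value
        long now = System.currentTimeMillis();
        cache.get("/products", now, () -> "[...]");
        cache.get("/products", now + 1_000, () -> "[...]");
        System.out.println("origin hits: " + cache.originHits);
    }
}
```

During a spike, every request served from the cache is one the Spring Boot server never sees.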

Benefits in High Traffic

Faster response time globally
Offloads requests from backend
Protects origin via DDoS shielding
Handles traffic spikes better than your server alone

5. Observability & Load Testing

You can’t scale or debug what you can’t see. When your APIs are under heavy load, things can go wrong — services might slow down, databases could become bottlenecks, or dependencies might fail.

Observability + Load Testing helps you:

Detect bottlenecks
Understand failure points
Prepare for real-world traffic

What Is Observability?

Observability means your system can answer:

What’s happening? → Metrics
What happened? → Logs
Why did it happen? → Traces

Think of it as a monitoring + debugging toolkit for production.

Key Tools for Observability

Layer	Tool	Purpose
Logging	Logback, Log4j2, Loki	Application-level logs
Metrics	Micrometer + Prometheus	JVM, HTTP, DB metrics
Tracing	OpenTelemetry, Zipkin	Distributed request tracing
Dashboards	Grafana	Visualize data
Alerts	Alertmanager, CloudWatch	Notify on failures/thresholds

Metrics in Spring Boot with Prometheus

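Add the Micrometer Prometheus registry to your Spring Boot project (it works alongside spring-boot-starter-actuator, which exposes the endpoint):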

<!-- pom.xml -->
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Enable Prometheus Endpoint

Enable actuator metrics in your application.yml:

management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus

Prometheus can now scrape from:

/actuator/prometheus

Distributed Tracing with OpenTelemetry

Tracing helps you follow requests across microservices.

Add Tracing Dependencies

<dependency>
  <groupId>io.opentelemetry.instrumentation</groupId>
  <artifactId>opentelemetry-spring-boot-autoconfigure</artifactId>
  <version>1.32.0</version>
</dependency>

Add Headers to Outgoing Calls Using Interceptors

RestTemplate restTemplate = new RestTemplateBuilder()
    .interceptors(new TracingClientHttpRequestInterceptor())
    .build();

You can view request flow and bottlenecks in Zipkin or Jaeger.

Common Metrics to Monitor

Metric	Why It Matters
`http.server.requests`	API latency, error rates
`jvm.memory.used`	Memory health, garbage collection issues
`hikaricp.connections.active`	Detect DB pool exhaustion (HikariCP, Spring Boot's default pool)
`cache.gets` (tagged `result=hit` or `miss`)	Caching effectiveness
`kafka.consumer.lag`	Async queue health

Set Up Smart Alerts

Set alerts like:

Response time > 1s on /checkout
Error rate > 5% for any endpoint
JVM memory > 85%
DB connection pool > 90%

Use tools like Alertmanager, CloudWatch, or Grafana alerts to notify via Slack, email, or PagerDuty.

Load & Stress Testing with JMeter

Before your app hits real traffic, simulate it using Apache JMeter.

Load Test vs Stress Test

Type	Goal
Load Test	Simulate expected traffic volume
Stress Test	Push system beyond its limits to find breaks

How to Test Spring Boot APIs with JMeter

Download from jmeter.apache.org
Open JMeter GUI and create a Thread Group:
- Threads: 100
- Ramp-up: 10s
- Loop: 10
Add HTTP Request:
- Method: GET
- URL: http://localhost:8080/api/products
Add Summary Report or Graph Results
Run and observe response times, throughput, and failures
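
To get a feel for what the Thread Group settings above add up to, here is a small back-of-the-envelope sketch. The class and method names are made up for illustration, and it assumes every thread completes all of its loops.

```java
// Back-of-the-envelope numbers for the Thread Group settings above.
final class ThreadGroupMath {
    // Each thread runs the HTTP sampler once per loop.
    static int totalRequests(int threads, int loops) {
        return threads * loops;
    }

    // JMeter spreads thread start-up evenly across the ramp-up period.
    static double threadsStartedPerSecond(int threads, int rampUpSeconds) {
        return (double) threads / rampUpSeconds;
    }

    public static void main(String[] args) {
        System.out.println(totalRequests(100, 10));            // prints 1000
        System.out.println(threadsStartedPerSecond(100, 10));  // prints 10.0
    }
}
```

So this plan sends 1000 requests in total, adding 10 new virtual users every second until all 100 are running.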

Conclusion

Handling high traffic isn't just about writing better code — it's about building a system that can scale, self-heal, and stay visible under pressure.

In this post, we covered infrastructure-level strategies that help Spring Boot applications survive and thrive in production:

Load Balancers spread traffic evenly and prevent single points of failure.
Auto Scaling Groups grow or shrink your app based on demand.
Containerization ensures fast, portable deployments.
CDNs and edge caching offload static and public traffic from your backend.
Observability tools like Prometheus and Zipkin give you deep visibility into how your system behaves under load.
Load testing helps you validate performance before traffic actually hits.

These infrastructure patterns complement the code-level techniques discussed in Part 1, creating a robust, production-ready system.

When you combine resilient code with scalable infrastructure, you're not just handling traffic — you're welcoming it.

What other strategies have you used to scale Spring Boot apps? Drop a comment below or share your thoughts!

Top 5 Code-Level Techniques to Handle High Traffic in Spring Boot: Part 1

Abhijith — Wed, 09 Jul 2025 20:56:57 +0000

When your app goes viral or hits a major user milestone, there’s one thing you absolutely can’t afford: your APIs crashing.

Whether you're building an e-commerce backend, a social platform, or a microservices-based system with Spring Boot, designing for peak load isn't just a best practice — it's essential.

The good news? You don’t need a massive budget or complex infrastructure to start preparing. Often, it begins with smart choices in your codebase.

In this two-part blog series, we’ll explore practical strategies to make your Spring Boot APIs resilient and performant under heavy traffic.

🧠 So What Is Peak Load and Why It Matters

Peak load is when your application receives an unusually high number of requests — like during sales, promotions, or trending events. If your app isn’t ready, users might see:

⛔️ 500 Internal Server Errors
🐢 Slow responses
🔄 Timeouts

🧱 The Core Strategy: Absorb, Redirect, and Recover

Think of your API system like a dam:

Absorb sudden spikes, redirect excess load, and recover quickly from overload.

Let’s break down the key components using the Spring Framework.

1. 🔌 Connection Pooling with Spring Boot

Without pooling, every time your Spring application needs to interact with a database (whether it's saving user data, retrieving product information, or running a report), it must establish a connection, perform the operation, and then close the connection. Creating and tearing down these connections repeatedly under high load adds latency and exhausts database and system resources.

Connection pooling solves this by maintaining a set of pre-established connections that are reused across requests. Popular connection pooling libraries include Apache Commons DBCP, HikariCP, and c3p0; HikariCP is the default pool in Spring Boot 2 and later. With Spring Boot and HikariCP, the pool is initialized when the application starts, creating a ready-to-use pool of connections. When a request comes in, Spring borrows an available connection from the pool, performs the operation, and returns the connection to the pool instead of closing it. This greatly reduces overhead, lowers latency, and helps keep the database from becoming a bottleneck during peak traffic.

Example with HikariCP:

spring:
  datasource:
    url: jdbc:mysql://localhost:3306/mydb
    username: root
    password: secret
    hikari:
      maximum-pool-size: 20      # max number of connections in the pool
      minimum-idle: 5            # min number of idle (ready) connections
      connection-timeout: 30000  # max time (ms) the application waits to get a connection from the pool
      idle-timeout: 600000       # max time (ms) a connection may sit idle in the pool before being closed

➡️ Match pool size to the number of concurrent DB connections your app can handle efficiently.
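
If you have no measurements yet, one common starting point is the heuristic quoted in HikariCP's pool-sizing guidance: connections = (core count × 2) + effective spindle count. The sketch below just encodes that formula; the class name is made up, and the result is a first guess to validate with load tests, not a rule.

```java
// Starting-point heuristic from HikariCP's pool-sizing guidance:
//   connections = (cpuCores * 2) + effectiveSpindles
final class PoolSizeEstimate {

    // cpuCores: cores on the database server; effectiveSpindles: roughly 1 for a single disk or SSD.
    static int estimate(int cpuCores, int effectiveSpindles) {
        if (cpuCores < 1 || effectiveSpindles < 0) {
            throw new IllegalArgumentException("cpuCores must be >= 1 and effectiveSpindles >= 0");
        }
        return cpuCores * 2 + effectiveSpindles;
    }

    public static void main(String[] args) {
        System.out.println(estimate(4, 1)); // prints 9
    }
}
```

A 4-core database server with one disk lands at 9 connections, which is often far smaller than people expect.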

2. 🚦 Rate Limiting to Control Abuse

If users or bots hit your API too often, they can bring your server down.

✅ Solution: Add Rate Limiting or Throttling

🚦 What is Rate Limiting?

Rate Limiting controls how many requests a client (user, IP, token, etc.) can make to your API within a specific time window.

🧠 Why Use Rate Limiting?

Protects your app from abuse or misuse (e.g., brute-force attacks or API scraping).
Keeps your backend and database healthy under high load.
Ensures fair use across all users.

🔧 Example

"A user can call the /login API 5 times per minute."

If the user exceeds that, they get a 429 Too Many Requests error.

🔁 What is Throttling?

Throttling is closely related to rate limiting. But while rate limiting blocks requests beyond a threshold, throttling may slow them down or queue them instead.

📌 Difference in a Nutshell

Concept	Behavior	Goal
Rate Limiting	Reject excess requests	Prevent overload
Throttling	Delay or queue excess requests	Smooth traffic flow

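Use libraries like Bucket4j or Resilience4j to implement rate limits. Keep in mind that a named Resilience4j limiter is shared by every caller of the annotated method; for per-IP or per-user limits, keep one limiter (or one Bucket4j bucket) per client key.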
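
To make the idea concrete, here is a minimal token-bucket sketch in plain Java. It is not how Bucket4j or Resilience4j are implemented; the class name and the injectable clock are assumptions made so the example can be tested without sleeping.

```java
import java.util.function.LongSupplier;

// Minimal token bucket: holds up to `capacity` tokens, refilled at a steady rate.
// Each request spends one token; with no tokens left the caller should answer 429.
final class TokenBucket {
    private final long capacity;
    private final double tokensPerNano;
    private final LongSupplier nanoClock; // injectable so tests can control time
    private double tokens;
    private long lastRefill;

    TokenBucket(long capacity, long refillTokens, long refillPeriodNanos, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.tokensPerNano = (double) refillTokens / refillPeriodNanos;
        this.nanoClock = nanoClock;
        this.tokens = capacity;                 // start with a full bucket
        this.lastRefill = nanoClock.getAsLong();
    }

    // Returns true if the request may proceed, false if it should be rejected.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * tokensPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

For the "/login 5 times per minute" rule you could create `new TokenBucket(5, 5, 60_000_000_000L, System::nanoTime)` per user and return 429 whenever `tryAcquire()` is false.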

Example with resilience4j:

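import io.github.resilience4j.ratelimiter.annotation.RateLimiter;

// Calls beyond the limit configured for "productDetailRateLimiter" fail with RequestNotPermitted
@RateLimiter(name = "productDetailRateLimiter")
public String fetchData() {
    return "Success!";
}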

In application.yml:

resilience4j:
  ratelimiter:
    instances:
      productDetailRateLimiter:
        limitForPeriod: 100       # Allow 100 requests (customers)
        limitRefreshPeriod: 1s    # every 1 second
        timeoutDuration: 0s       # if full, immediately say "no"

      checkoutRateLimiter:
        limitForPeriod: 10        # Allow only 10 requests (customers)
        limitRefreshPeriod: 5s    # every 5 seconds (checkout is resource intensive)
        timeoutDuration: 2s       # if full, wait up to 2 seconds

3. 🗃️ Add Caching for Frequently Requested Data

APIs that serve the same data repeatedly — like product lists, configurations, or top-rated items — should avoid hitting the database every time. Caching helps improve response times and reduce load.

In Spring Boot, you can use Caffeine for fast in-memory (local) caching or Redis for distributed caching. Combining both gives you the best of both worlds:

🧠 Caffeine: Blazing-fast in-process memory cache
🌐 Redis: Shared cache across app instances (useful in cloud or clustered environments)

✅ Solution: Use Spring Cache with Caffeine + Redis

Step 1: Add Dependencies

In pom.xml:

<!-- Spring Cache Abstraction -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-cache</artifactId>
</dependency>

<!-- Caffeine -->
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
</dependency>

<!-- Redis -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Step 2: Enable Caching


@SpringBootApplication
@EnableCaching
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

Step 3: Use `@Cacheable` to Cache Methods

@Cacheable(cacheNames = "products")
public List<Product> getAllProducts() {
    return productRepository.findAll();
}

Step 4: Configure Cache in `application.yml`

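spring:
  cache:
    type: redis   # use "caffeine" instead to auto-configure the local cache
    caffeine:
      spec: maximumSize=500,expireAfterWrite=5m   # only applied when type is caffeine

  redis:          # on Spring Boot 3 these keys live under spring.data.redis
    host: localhost
    port: 6379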

Step 5: Combine Caffeine (L1) + Redis (L2)

To set up Caffeine + Redis hybrid caching, define a custom CacheManager:

@Bean
public CacheManager cacheManager(RedisConnectionFactory redisConnectionFactory) {
    Caffeine<Object, Object> caffeine = Caffeine.newBuilder()
        .maximumSize(500)
        .expireAfterWrite(5, TimeUnit.MINUTES);

    CaffeineCacheManager caffeineCacheManager = new CaffeineCacheManager();
    caffeineCacheManager.setCaffeine(caffeine);

    RedisCacheManager redisCacheManager = RedisCacheManager.builder(redisConnectionFactory).build();

    CompositeCacheManager compositeCacheManager = new CompositeCacheManager(
        caffeineCacheManager,
        redisCacheManager
    );

    compositeCacheManager.setFallbackToNoOpCache(false);
    return compositeCacheManager;
}

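👉 Note that CompositeCacheManager does not layer the two caches: it asks each manager in order and uses the first one that has a cache with the requested name. Because a default CaffeineCacheManager creates caches on demand, it answers every lookup and Redis is never consulted. For a true L1 + L2 setup with fast local reads and a shared distributed copy, you need a cache that checks Caffeine first and falls back to Redis on a miss.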
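
Here is a rough sketch of what an L1 + L2 read path looks like, written in plain Java so it runs on its own. Plain maps stand in for Caffeine and Redis, the class name is made up, and eviction, TTLs and serialization are left out on purpose.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through lookup across two tiers. Maps stand in for Caffeine (L1) and Redis (L2).
// Assumes the loader never returns null.
final class TwoTierCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();  // L1: per application instance
    private final Map<K, V> shared;                             // L2: shared between instances
    private final Function<K, V> loader;                        // source of truth, e.g. a repository call

    TwoTierCache(Map<K, V> shared, Function<K, V> loader) {
        this.shared = shared;
        this.loader = loader;
    }

    V get(K key) {
        V value = local.get(key);
        if (value != null) {
            return value;                   // L1 hit
        }
        value = shared.get(key);
        if (value == null) {
            value = loader.apply(key);      // miss in both tiers: load and publish to L2
            shared.put(key, value);
        }
        local.put(key, value);              // promote to L1 for the next read
        return value;
    }
}
```

Two instances that share the same L2 map only hit the loader once for a given key; a real setup also has to decide how L1 entries get invalidated when the data changes.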

4. ⏳ Async Processing with Queues

When your API needs to perform a heavy or time-consuming task — like sending emails, processing images, generating reports, or calling external services — doing it synchronously (i.e., within the request-response cycle) can slow things down or even cause timeouts during high traffic.

Instead, you can process these tasks asynchronously, freeing up your API to respond quickly.

✅ Solution 1: Use @Async for Fire-and-Forget Tasks

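Spring Boot makes asynchronous method execution easy with the @Async annotation. Enable it once with @EnableAsync on a configuration class, and call the @Async method from another bean so the call goes through Spring's proxy.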

Example:

@PostMapping("/send-email")
public ResponseEntity<String> sendEmail(@RequestBody EmailDto dto) {
    emailService.sendEmail(dto); // this is @Async
    return ResponseEntity.ok("Email scheduled");
}

This responds fast, but if the app crashes before the task completes, the work is lost (no durability).

✅ Solution 2: Use RabbitMQ for Queued Jobs

@PostMapping("/register")
public ResponseEntity<?> registerUser(@RequestBody User user) {
    userService.save(user);
    rabbitTemplate.convertAndSend("emailQueue", user.getEmail());
    return ResponseEntity.ok("User registered");
}

@RabbitListener(queues = "emailQueue")
public void sendEmail(String email) {
    // Send confirmation email
}

It is durable, even after restart
It decouples API from email logic

✅ Solution 3: Use Kafka for Logging or Events

@PostMapping("/checkout")
public ResponseEntity<?> checkout(@RequestBody Order order) {
    orderService.save(order);
    kafkaTemplate.send("order-events", new OrderEvent(order));
    return ResponseEntity.ok("Order placed");
}

It can log events for analytics
It is scalable under high load
Async consumers can process downstream (e.g., inventory, invoice)

🧠 Why It Helps with High Traffic

Reduces response time → Frees up API threads
Avoids blocking on slow operations (email, DB writes, external APIs)
Smooths traffic spikes via message queues
Scales better with distributed consumers
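
The points above can be illustrated without any broker at all. The sketch below is an in-process stand-in for a message queue: the "request" side only enqueues, and a single worker thread drains the queue at its own pace. The class name and the stop marker are invented for the example, and unlike RabbitMQ or Kafka nothing here survives a restart.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

// In-process stand-in for a message queue with one consumer.
final class EmailQueueSketch {
    private static final String STOP = "__stop__";   // hypothetical shutdown marker
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<String> sent = new CopyOnWriteArrayList<>();
    private final Thread worker = new Thread(this::drain, "email-worker");

    EmailQueueSketch() {
        worker.start();
    }

    // Called from the request path: returns as soon as the message is queued.
    void enqueue(String email) {
        queue.add(email);
    }

    private void drain() {
        try {
            for (String email = queue.take(); !STOP.equals(email); email = queue.take()) {
                sent.add(email);   // a real worker would send the email here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Processes everything queued so far, stops the worker, and returns what was handled.
    List<String> shutdownAndGetSent() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent;
    }
}
```

A burst of `enqueue` calls returns immediately while the worker catches up in the background, which is the same smoothing effect a real broker gives you across processes.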

🔄 Use Cases in High API Traffic

Scenario	Problem	Async Solution
Sending emails or SMS	Slow 3rd-party API blocks request thread	Use `@Async` or queue message via RabbitMQ
Generating reports	Takes seconds/minutes	Queue job and return job ID instantly
Audit logging	Every request writes to DB	Send logs to Kafka (high throughput)
Image or video processing	CPU-intensive	Offload via RabbitMQ or Kafka
Webhook forwarding	Call to external service may timeout	Queue and process later

5. 🛑 Circuit Breakers with Resilience4j

When your API relies on external services like payment gateways, email providers, or third-party APIs, there's always a risk that they might fail or become slow.

Under high traffic, repeated failed calls can lead to:

Cascading failures
Thread exhaustion
Service-wide slowdowns

This is where the Circuit Breaker pattern shines. It helps your app fail fast, protect itself, and recover gracefully.

✅ What is a Circuit Breaker?

A circuit breaker monitors external calls and "opens the circuit" if too many failures happen in a short time. This stops further attempts temporarily, giving the system time to recover.

🟢 Closed: Normal operation
🔴 Open: Calls are blocked immediately
🟡 Half-Open: Allows a few test calls to check recovery
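
To see how the three states fit together, here is a bare-bones state machine in plain Java. It is a simplification, not Resilience4j's implementation: it counts consecutive failures instead of a failure rate over a sliding window, and the class name and injectable clock are assumptions for the example.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Bare-bones circuit breaker: opens after N consecutive failures, waits, then allows a trial call.
// Calls are serialized with `synchronized` to keep the sketch short.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openMillis;
    private final LongSupplier clockMillis;   // injectable clock for deterministic tests
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clockMillis = clockMillis;
    }

    synchronized State state() {
        if (state == State.OPEN && clockMillis.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN;          // wait is over: allow a trial call
        }
        return state;
    }

    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get();            // fail fast without touching the dependency
        }
        try {
            T result = action.get();
            state = State.CLOSED;             // success closes the circuit again
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

In production you would pass `System::currentTimeMillis` as the clock; the point of the sketch is only the Closed → Open → Half-Open → Closed transitions.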

✅ Solution: Use Resilience4j Circuit Breaker in Spring Boot

Add the dependency in pom.xml:

<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-spring-boot2</artifactId>
</dependency>
<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-aop</artifactId>
</dependency>

🔧 Example Usage

@CircuitBreaker(name = "paymentService", fallbackMethod = "fallbackPayment")
public PaymentResponse chargeCard(String userId) {
    return paymentClient.charge(userId); // external API call
}

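The fallback method runs whenever the call throws an exception. Once the call has failed often enough, the circuit "opens" and further calls go straight to the fallback without reaching the payment service: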

public PaymentResponse fallbackPayment(String userId, Throwable ex) {
    return new PaymentResponse("Payment service unavailable");
}

⚙️ Configure Circuit Breaker in `application.yml`

resilience4j.circuitbreaker:
  instances:
    paymentService:
      registerHealthIndicator: true
      slidingWindowSize: 10
      failureRateThreshold: 50 # If 50% or more of the last 10 calls fail, open the circuit
      waitDurationInOpenState: 30s # Stay open for 30 seconds before allowing test calls again

🧠 When to Use Circuit Breakers

Use Case	Should You Use It?
External APIs (payments, SMS, etc.)	✅ Definitely
Internal microservices (over network)	✅ Recommended
Local in-memory methods	❌ Not needed

🏁 Conclusion

Controlling high traffic isn’t just about throwing hardware at the problem — it starts with writing efficient, resilient code. In this post, we explored essential code-level strategies to prepare your Spring Boot APIs for peak load:

🔌 Connection pooling to avoid DB overload
🚦 Rate limiting to protect endpoints from abuse
🗃️ Caching with Caffeine (and Redis) to serve repeated requests faster
⏳ Async processing to offload heavy background tasks
🛑 Circuit breakers to prevent cascading failures from unstable dependencies

Each of these techniques helps your application stay responsive, even when traffic spikes or dependencies slow down.

👀 What’s Next?

Code-level techniques take you far, but without the right infrastructure, you're still at risk.

Stay tuned for Part 2: Infrastructure-Level Strategies for Handling High Traffic in Spring Boot APIs.

Demystifying AI Agents: How Language Models Think, Act, and Learn in the Real World

Abhijith — Sun, 22 Jun 2025 13:57:33 +0000

AI agents are the next step in making intelligent systems more interactive, capable, and autonomous. Instead of just answering questions, agents can reason through complex tasks, use tools, interact with their environment, and adapt to feedback. In this blog, we break down the core building blocks of AI agents in simple terms.

🧠 What is an Agent?

An agent is a system that can:

Perceive its environment (through inputs like queries or data)
Reason or plan its next steps
Act by calling external tools or APIs
Learn or adapt based on the outcome of its actions

In LLM-powered systems, the agent uses a language model to "think," tools to "act," and observations to improve future decisions.

🧱 What is an LLM?

An LLM (Large Language Model) like GPT-4, Claude, or Gemini is trained on large amounts of text to predict the next token in a sequence. It powers the reasoning, planning, and language generation abilities of an agent.

Think of it as the brain of the agent that understands instructions, generates thoughts, and decides what to do next.

🛠️ Tools: Extending the LLM's Abilities

LLMs are limited by design; they can't access real-time information or perform actions on external systems. That's where tools come in:

Tools are external functions the agent can call to:

Search the web
Query a database
Fetch weather or stock data
Execute code

Example tool call:

{
  "action": "get_weather",
  "input": "India"
}

💬 Messages and Special Tokens

Agentic systems rely on structured communication using messages and, in some frameworks, special tokens. These help manage conversations, tool usage, and the agent’s internal reasoning.

📬 Message Roles

Each message has a role that defines its purpose:

system – Sets the agent's behavior or instructions.

_Example: “You are an AI agent that can use tools.”_

user – The human's or calling app’s input.

_Example: “What’s the weather in Tokyo?”_

assistant – The LLM's response (thoughts, plans, or final answers).

_Example: “Action: get_weather, Input: Tokyo”_

tool – The result of a tool call.

_Example: “Observation: It's 27°C and sunny in Tokyo.”_

🧪 Special Tokens

Some models and frameworks use special tokens or delimiters to mark parts of the response:

Markers such as <|thought|>, <|action|>, <|observation|> (illustrative names; the exact tokens vary by model and framework) guide parsing
They let the system stop generation at the right point and extract actions

🔁 Why It Matters

This structure lets agents:

Manage multi-turn workflows
Separate thought from action
Safely interact with tools

Together, messages and special tokens form the backbone of how agents think, act, and learn step-by-step.

⟳ The Thought → Action → Observation Cycle

This cycle is at the heart of agentic reasoning. The model reasons, acts, observes the result, and thinks again.

🔎 Diagram: Thought-Action-Observation Cycle

This loop continues until the task is complete.

🧬 Thought = Internal Reasoning

Not every step involves an action. Sometimes, the agent just thinks out loud to plan its next move.

These internal thoughts:

Help break down complex problems
Allow for step-by-step execution
Improve transparency

⚛️ The ReAct Approach

ReAct stands for Reasoning + Acting. It’s a popular approach for LLM-based agents.

ReAct Agent Output Example:

User: Convert 10 kilometers to miles.

Thought: I need to convert 10 kilometers to miles.

Action: Call a unit conversion tool.

Observation: 10 kilometers is approximately 6.21 miles.

Response: 10 kilometers is approximately 6.21 miles.

By alternating between reasoning and acting, the agent becomes more accurate and reliable.
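
Here is a toy version of that loop in Java, just to show the control flow. The "model" is a hard-coded stub rather than an LLM, and the tool name km_to_miles and the class name are made up for this example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Toy Thought -> Action -> Observation loop with a stubbed model and one tool.
final class ReActSketch {

    // Tools the agent may call, keyed by name. Input and output are plain strings.
    static final Map<String, Function<String, String>> TOOLS = Map.of(
        "km_to_miles", km -> String.format(Locale.ROOT, "%.2f", Double.parseDouble(km) * 0.621371)
    );

    // Stub for the LLM: picks the next step from the latest observation (null on the first turn).
    static String[] think(String kilometers, String observation) {
        if (observation == null) {
            return new String[] {"action", "km_to_miles", kilometers};
        }
        return new String[] {"final", kilometers + " kilometers is approximately " + observation + " miles."};
    }

    static String run(String kilometers) {
        String observation = null;
        for (int step = 0; step < 5; step++) {   // cap the loop so a confused model cannot spin forever
            String[] decision = think(kilometers, observation);
            if (decision[0].equals("final")) {
                return decision[1];
            }
            observation = TOOLS.get(decision[1]).apply(decision[2]);   // act, then observe
        }
        throw new IllegalStateException("no final answer within the step limit");
    }

    public static void main(String[] args) {
        System.out.println(run("10")); // prints: 10 kilometers is approximately 6.21 miles.
    }
}
```

In a real agent, `think` would be a call to the language model with the conversation so far, and the returned text would be parsed into either an action or a final response.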

🌍 Actions: Interacting with the Environment

Once the model has thought through its strategy, it uses actions to make changes in the world:

Query APIs
Execute shell commands
Send messages
Retrieve or update records

This is what makes agents actually do things instead of just say things.

👀 Observation: Reflect and React

Every action yields an observation — feedback from the environment.

The agent then:

Evaluates whether the result met the goal
Adapts its next thought
May retry or take alternative actions

This closes the loop and makes agents dynamic and responsive.

✅ Final Thoughts

LLMs become truly powerful when you turn them into agents:

They can plan and act
Use tools to bridge gaps
Think, act, and observe in cycles
Improve with feedback

You’ve just seen the architecture behind the smartest AI systems today — from coding copilots to research assistants. Whether using LangChain, SmolAgents, or custom frameworks, AI agents are how we move from static chat to autonomous intelligence.

Introduction to MCP: Making AI More Connected

Abhijith — Sat, 24 May 2025 18:38:30 +0000

With the rising capabilities of Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini, the AI ecosystem is changing rapidly. Yet these models are limited by their training data and, on their own, have no access to real-time data or specialized tools. That’s where MCP, or Model Context Protocol, comes in.

In this blog, we’ll break down what MCP is, why it’s useful, and how it helps AI work better with the tools and data we already use.

So what is MCP?

MCP (Model Context Protocol) is a new open standard that helps AI models interact with external tools and data in a structured, secure, and consistent way.

Imagine this:

You're chatting with an AI and you ask:

"Can you summarize the latest file in my Downloads folder?"

Without MCP, the AI wouldn’t have access to that file.

With MCP, the AI can ask an external tool (called a “Server”) for help, get the file, and provide the summary — all behind the scenes.

What Problem Does MCP Solve?

MCP helps solve the M×N integration problem: the challenge of connecting M different AI applications to N different tools or data sources without a standardized approach.
For example, let’s say we have:

6 different AI models
10 different tools (weather APIs, databases, calculators, file readers, etc.)

Without a shared protocol, you'd need 6 × 10 = 60 custom integrations.

Talk about a maintenance nightmare 😫 !

MCP simplifies this by transforming it to an M + N problem using a single, shared protocol. So:

Each tool implements the server side of MCP once
Each AI application implements the client side of MCP once
Any Host that supports MCP can then connect to any MCP Server

For the example above, that is 6 + 10 = 16 implementations instead of 60, which drastically reduces integration complexity and maintenance effort.
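
The difference between the two approaches is easy to put in numbers. The tiny sketch below (class and method names are made up) compares point-to-point integrations with a shared protocol for the 6 apps and 10 tools from the example.

```java
// Integration count with and without a shared protocol.
final class IntegrationCount {
    // Without a protocol, every application needs its own adapter for every tool.
    static int pointToPoint(int apps, int tools) {
        return apps * tools;
    }

    // With a protocol, each application implements the client side once
    // and each tool implements the server side once.
    static int sharedProtocol(int apps, int tools) {
        return apps + tools;
    }

    public static void main(String[] args) {
        System.out.println(pointToPoint(6, 10));   // prints 60
        System.out.println(sharedProtocol(6, 10)); // prints 16
    }
}
```

Adding an eleventh tool costs 6 new integrations in the first model and only 1 in the second.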

Key MCP Concepts

Let’s break down some important terms:

Term	Description
Host	The AI application or product users interact with (e.g., chatbot, IDE). They initiate the connections to MCP Servers and orchestrate the overall flow between user requests, LLM processing, and external tools
Client	A component in the Host that talks to a specific MCP Server. Each client maintains a 1:1 connection with a server and handles the protocol-level details of MCP communication and acts as an intermediary between the Host’s logic and the external Server
Server	A tool or service that exposes capabilities (can be Tools, Resources, Prompts) via MCP protocol
Tool	Functions that the AI model can invoke to perform specific actions, e.g. a Python code executor tool lets the model execute Python code and get the result back.
Resource	Read-only data like documents or files that provide context to models.
Prompt	A predefined text-based instruction the AI can use. e.g. A Summarization prompt
Sampling	Server-initiated requests that ask the Host's LLM for a completion, for example so the AI can review and improve its own work: the AI writes some code, then the server asks it to run again to check whether the code works and fix any errors.
MCP URI	A special format to identify tools and capabilities (e.g. `mcp://tools/python_executor/run_python_code`)

How MCP is Built: The Architecture

MCP follows a clear architecture made of three layers:

1. Host Application

This is the AI-powered app you're using — like a coding assistant or smart chatbot. It includes:

The Model (e.g., an LLM)
A Client, which talks to external Servers via MCP

2. Client Layer

Think of the Client as a translator.
It speaks the MCP language and handles communication between the AI (Host) and tools (Servers).

The Client does things like:

Registering available tools and capabilities
Routing the AI’s requests to the right Server
Handling inputs/outputs securely

3. Server Layer

These are the actual tools and services that do the work.
Servers define one or more tools (like Python runners, file searchers, or translators).
Each tool offers capabilities, which the AI models can use.

A Real Example: Using a PDF Summarizer Tool

Let’s say you ask:

“Can you summarize the contents of my meeting_notes.pdf file?”

Here’s what happens:

Host (AI app) receives your request to summarize a PDF
It forwards the request to the Client
The Client calls the Server that exposes the summarize_pdf capability
The Server reads the PDF file and generates a summary
The Host includes that summary in the AI’s response

And just like that — your AI becomes a PDF summarizer!
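
The flow above can be walked through with a small in-memory sketch. There is no real MCP transport or SDK here; the class names and the canned summary text are invented purely to show who calls whom.

```java
import java.util.Map;
import java.util.function.Function;

// In-memory walk-through of Host -> Client -> Server for the PDF example.
final class McpFlowSketch {

    // Server: exposes named capabilities.
    static final class Server {
        final Map<String, Function<String, String>> tools = Map.of(
            "summarize_pdf", path -> "Summary of " + path + ": (summary text would go here)"
        );
    }

    // Client: the Host's connection to one Server; routes a tool call by name.
    static final class Client {
        private final Server server;

        Client(Server server) {
            this.server = server;
        }

        String callTool(String name, String argument) {
            Function<String, String> tool = server.tools.get(name);
            if (tool == null) {
                throw new IllegalArgumentException("unknown tool: " + name);
            }
            return tool.apply(argument);
        }
    }

    // Host: receives the user request, delegates to the Client, and builds the reply.
    static String handleUserRequest(Client client, String pdfPath) {
        String summary = client.callTool("summarize_pdf", pdfPath);
        return "Here is what I found. " + summary;
    }

    public static void main(String[] args) {
        System.out.println(handleUserRequest(new Client(new Server()), "meeting_notes.pdf"));
    }
}
```

In a real deployment the Client and Server would exchange protocol messages over a transport instead of calling each other directly, but the division of responsibilities is the same.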

Why MCP Matters

Here’s why MCP is a game-changer:

✅ Standardized – Write once, use anywhere
✅ Interoperable – Connect different tools to different AIs easily
✅ Modular – Add/remove tools without breaking things
✅ Flexible – Works locally or remotely
✅ Scalable – No need for N × M integrations anymore

Capability Types in MCP

There are 4 main capability types in MCP:

Tool: Runs actions like executing code or searching files
Resource: Read-only, like a document or file the model can view
Prompt: Template instructions to guide the AI’s responses
Sampling: Server-initiated requests let the AI model run itself again to review and improve its own work.

In the following diagram we can see the collective capabilities for the use case of a pdf summarizer.

Final Thoughts

MCP is a powerful way to connect AI with the real world — in a safe, simple, and scalable manner. Whether you’re building smart assistants, data dashboards, or developer tools, MCP can make your AI much more capable.

We’re just scratching the surface — the future of AI will be connected, and MCP is helping lead the way.

Getting Started with Microservices: A Beginner's Guide Using Spring Boot

Abhijith — Sat, 08 Mar 2025 15:22:50 +0000

Microservices have become an essential part of modern software architecture due to their flexibility, scalability, and ease of maintenance. In this blog, we will explore how to build microservices using Spring Boot. We will cover the integration of essential tools like Eureka for service discovery, API Gateway for routing, Config Server for centralized configuration, and Zipkin for distributed tracing.

By the end of this guide, you will have a working Spring Boot project with two microservices: Company and Employee, running alongside an API Gateway, Eureka Discovery Server, Config Server, and Zipkin.

Prerequisites

Before we begin, ensure you have the following:

Basic understanding of Spring Boot and Java.
Familiarity with Spring Cloud concepts (Eureka, Config Server, etc.).
Maven for dependency management.
Docker (optional for Zipkin).

Overview of the Components

1. Microservices with Spring Boot

In this architecture, we have two microservices:

Company Service: Manages company-related data.
Employee Service: Handles employee data.

Each microservice is a Spring Boot application that operates independently but interacts with other services via HTTP requests.

2. Eureka Discovery Server

Eureka provides service discovery. It allows microservices to register themselves and discover each other dynamically. By using Eureka, you eliminate the need to hard-code service URLs, enabling a more flexible system.

3. API Gateway

The API Gateway is responsible for routing requests from clients to the appropriate microservices. It also offers additional features such as load balancing and security. In this demo, we will use Spring Cloud Gateway for routing.

4. Config Server

A Config Server centralizes the configuration for all microservices, making it easier to manage and update configurations without redeploying individual services.

5. Distributed Tracing with Zipkin

Distributed tracing helps track requests as they move through the various microservices. We'll use Zipkin to visualize and trace requests across services. On Spring Boot 3, Micrometer Tracing (the successor to Spring Cloud Sleuth) integrates with Zipkin, providing trace and span information.

Step-by-Step Implementation

Step 1: Setting up Eureka Discovery Server

Start by creating a Spring Boot application for the Eureka Server.

1) Add the required dependencies in your pom.xml:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-server</artifactId>
</dependency>

2) Enable Eureka Server in your main application class:


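import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer
public class DiscoveryServer {
    public static void main(String[] args) {
        // The class passed to run() must be this application class
        SpringApplication.run(DiscoveryServer.class, args);
    }
}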

3) Add Eureka configuration in application.yml:


server:
  port: 8761

eureka:
  client:
    register-with-eureka: false
    fetch-registry: false

Run the Eureka Server on port 8761. The Eureka dashboard can be accessed at http://localhost:8761.

Step 2: Creating Microservices

Both the Company Service and Employee Service will register with Eureka. Here's how to create them:

1) Add the following dependency to pom.xml for each microservice:


<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

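2) Enable Eureka Client in both microservices (not required in newer versions of Spring Cloud, where registration happens automatically once the Eureka client starter is on the classpath; @EnableEurekaClient was removed in Spring Cloud 2022.0):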


@SpringBootApplication
@EnableEurekaClient
public class CompanyServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(CompanyServiceApplication.class, args);
    }
}

3) Configure application properties (application.yml):


spring:
  application:
    name: company-service

eureka:
  client:
    service-url:
      defaultZone: http://localhost:8761/eureka

Step 3: Setting up API Gateway

We'll use Spring Cloud Gateway to handle requests and route them to the appropriate microservices.

1) Add the required dependency for Spring Cloud Gateway:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
</dependency>

2) Define routing in the application.yml:


server:
  port: 8222
spring:
  cloud:
    gateway:
      discovery:
        locator:
          enabled: true
      routes:
        - id: employees
          uri: http://localhost:8090
          predicates:
            - Path=/api/v1/employee/**
        - id: company
          uri: http://localhost:8070
          predicates:
            - Path=/api/v1/company/**
management:
  tracing:
    sampling:
      probability: 1.0

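This configuration routes requests to company-service and employee-service based on the request path. The uri values above point at fixed hosts; to resolve instances through Eureka instead, use the lb:// scheme with the registered service name (for example lb://company-service).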
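
If you're wondering what "routing by path" boils down to, here is a simplified sketch in plain Java. It only does prefix matching, which is less than Spring Cloud Gateway's Path predicate supports, and the class name is made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Simplified path-based routing: the first route whose prefix matches wins.
final class PrefixRouter {
    private final Map<String, String> routes = new LinkedHashMap<>(); // keeps declaration order

    PrefixRouter add(String pathPrefix, String targetUri) {
        routes.put(pathPrefix, targetUri);
        return this;
    }

    // Matches the prefix itself or anything below it, e.g. /api/v1/employee and /api/v1/employee/42.
    Optional<String> route(String requestPath) {
        return routes.entrySet().stream()
            .filter(e -> requestPath.equals(e.getKey()) || requestPath.startsWith(e.getKey() + "/"))
            .map(Map.Entry::getValue)
            .findFirst();
    }
}
```

With routes for /api/v1/employee and /api/v1/company registered, a request to /api/v1/employee/42 resolves to the employee target, and an unknown path resolves to nothing.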

Step 4: Config Server Setup

Create a new Spring Boot application for the Config Server.

1) Add the following dependencies:


<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-config-server</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
</dependency>

2) Enable Config Server in the main class:


@SpringBootApplication
@EnableConfigServer
public class ConfigServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(ConfigServerApplication.class, args);
    }
}

3) Point the Config Server to a Git repository (or file system) that holds the configuration files for your microservices.

Step 5: Integrating Zipkin for Distributed Tracing

Add Zipkin dependencies to the employee, company, gateway microservices:

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-brave</artifactId>
</dependency>
<dependency>
    <groupId>io.zipkin.reporter2</groupId>
    <artifactId>zipkin-reporter-brave</artifactId>
</dependency>

Configure Zipkin in application.yml:


management:
  tracing:
    sampling:
      probability: 1.0

Run Zipkin (via Docker or standalone) on port 9411. You can now trace requests across the microservices.

Running the Application

Once everything is set up, run the following services:

Eureka Server: localhost:8761
Company Service: localhost:8070
Employee Service: localhost:8090
API Gateway: localhost:8222
Config Server: localhost:8888 (optional)

Access the API Gateway at http://localhost:8222 and make requests to /api/v1/company and /api/v1/employee. All requests will be routed to the appropriate microservices.

You can also monitor traces in Zipkin's web UI at http://localhost:9411.

Conclusion

In this guide, we have successfully created two microservices with Spring Boot, integrating Eureka for service discovery, API Gateway for routing, OpenFeign for communication between the two microservices, Config Server for centralized configuration, and Zipkin for distributed tracing. These tools work together to help manage and monitor microservices effectively, providing a scalable and maintainable architecture.

With this setup, your microservices can scale independently, discover each other dynamically, and be monitored for performance and issues through distributed tracing.

Code Repository

You can access the full source code for this project on GitHub.

Thank you for reading! Happy coding with Spring Boot and Microservices!

Getting Started with Docker: Essential Commands for Beginners

Abhijith — Sat, 16 Nov 2024 11:58:46 +0000

So you're venturing into the realm of Docker? Great choice! This technology is a game changer for developers, making it incredibly simple to package and run apps in containers.

To help you get started, here are some important Docker commands you'll commonly use.

1. Installing Docker

Before you start, make sure Docker is installed on your machine. You can follow the official installation guide for Docker Desktop.

2. Basic Docker Commands

`docker --version`

This command verifies your Docker installation by checking the installed version.

`docker pull <image_name>`

This command downloads a Docker image from a registry (Docker Hub by default).

`docker run <image_name>`

This command creates and runs a container from a Docker image. To run the container in detached mode, add the -d flag.
To map a container port to a port on your machine, add the -p flag.

Example:
`docker run -d -p 8080:80 nginx`

This runs an nginx container in detached mode (in the background, with no terminal attached) and maps container port 80 to port 8080 on your machine.
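
If you want to script this, one option is to build the argument list in Python and hand it to `subprocess`. This is a minimal sketch, not an official Docker API: `docker_run_args` is a hypothetical helper invented for this example, and it only covers the -d and -p flags discussed above.

```python
def docker_run_args(image, detached=False, ports=None):
    """Build the argument list for `docker run`.

    ports maps host port -> container port, for example {8080: 80}.
    """
    args = ["docker", "run"]
    if detached:
        args.append("-d")
    for host_port, container_port in (ports or {}).items():
        # The -p flag takes HOST_PORT:CONTAINER_PORT, in that order.
        args += ["-p", f"{host_port}:{container_port}"]
    args.append(image)
    return args
```

`docker_run_args("nginx", detached=True, ports={8080: 80})` produces the same command as the example above, and you can run it with `subprocess.run(args, check=True)` on a machine where Docker is installed.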

`docker ps`

This shows all running containers. Use `docker ps -a` to see all containers, including those that are stopped.
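
The `--format` flag makes the output of `docker ps` easier to read from a script. The sketch below assumes a "|" separator between fields; `PS_COMMAND` and `parse_ps` are names made up for this example, and the sample lines in the usage note are invented, not real output.

```python
# Ask Docker for one "id|name|status" line per container.
PS_COMMAND = ["docker", "ps", "-a", "--format", "{{.ID}}|{{.Names}}|{{.Status}}"]


def parse_ps(output):
    """Turn the text printed by PS_COMMAND into a list of dictionaries."""
    containers = []
    for line in output.splitlines():
        if not line.strip():
            continue
        container_id, name, status = line.split("|", 2)
        containers.append({
            "id": container_id,
            "name": name,
            "status": status,
            # Assumption: a status starting with "Up" means the container
            # is running (a paused container also starts with "Up").
            "running": status.startswith("Up"),
        })
    return containers
```

For example, `parse_ps("abc123|web|Up 2 hours\ndef456|db|Exited (0) 3 minutes ago")` returns two entries, the first marked as running and the second as stopped. To parse real output, run `subprocess.run(PS_COMMAND, capture_output=True, text=True)` and pass its `stdout` to `parse_ps`.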

`docker images`

This command lists all Docker images downloaded to your local machine.

3. Commands to Manage Containers

`docker stop <container_id>`

This command stops a running container.

`docker start <container_id>`

This command starts a stopped container.

`docker logs <container_id>`

This command shows the logs of a container, whether it is running or stopped.

`docker restart <container_id>`

This command restarts a running container.

`docker rm <container_id>`

This command deletes a stopped container. Use -f to force-remove a running container.

(You can replace `<container_id>` with the actual container ID or name.)

`docker rmi <image_name>`

This command deletes an image from your local machine. Use it to free up disk space.

`docker system prune`

This command removes all stopped containers, dangling images, unused networks, and unused build cache.

Docker makes it easy to package and deploy applications. Mastering these commands gives you a solid foundation as you begin exploring more advanced features. If you have any questions, ask them below.

Happy Dockerizing!

AWS Basics 1: How I hosted a static website on Amazon S3

Abhijith — Thu, 17 Oct 2024 15:19:37 +0000

Introduction

Have you ever wondered how to host a static website on AWS? I certainly did! For a while, it felt daunting, and I kept putting it off. But when I finally decided to dive in, I was pleasantly surprised by how simple the process turned out to be. It was quite a journey, and I learned a lot along the way. I’d love to share my experience with you!

What You Need

An AWS account
About 20 minutes of your time

Let’s Start with What S3 Is

Amazon Simple Storage Service (S3) is a scalable cloud storage solution offered by AWS. It's designed to store and retrieve any amount of data from anywhere on the web. Here are some key features that make S3 an excellent choice for hosting static websites:

Scalability: S3 can handle any size of data, from a few bytes to terabytes, making it perfect for growing websites.
Durability and Availability: S3 is designed for 99.999999999% (11 nines) durability, and the S3 Standard storage class is designed for 99.99% availability, so your data is safe and almost always accessible.
Cost-Effectiveness: You only pay for what you use, making it an economical option for hosting.
Static Website Hosting: S3 provides a straightforward way to host static websites without needing to manage servers.

In short, S3 is a reliable, efficient, and user-friendly option for anyone looking to host a static website.

Steps to Create Your S3 Bucket

Log in to the AWS Management Console.
Navigate to S3 and click "Create Bucket."
Configure Bucket Settings:
- Bucket Name: Choose a unique name for your bucket (this name must be globally unique across all of S3).
- Region: Select the AWS region where you want your bucket to be located. It’s best to choose a region closest to your target audience. (You can find this option in the top navbar.)
Set Permissions:
- Enable ACLs (for more fine-grained control over permissions in the S3 bucket).
- Uncheck "Block all public access" so that people can view your website.
Enable Bucket Versioning:
- Think of it as similar to version control in Git: S3 keeps earlier versions of your files so you can restore them.
Review and Create:
- Review your settings and click the Create bucket button. Your new bucket will be created!

Steps to upload your files

Upload your index.html file:
- This file serves as the main entry point for your website.
Upload the folder containing all website assets:
- Make sure to upload the folder that contains your CSS, JavaScript, images, and other assets. Note: Do not upload a zipped version, as S3 cannot unzip files.
Enable Static Website Hosting:
- Go to the Properties section of your bucket, enable static website hosting, and specify the index document (the default page of your website, usually index.html). This allows S3 to serve your website files directly.

Testing Your URL

Now, test the URL generated by S3. Did you encounter an error? If so, it’s likely because your bucket permissions need to be set to allow public access to your files.
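
If you'd like to know what the URL should look like before opening the console, website endpoints follow a predictable pattern. The sketch below is a hypothetical helper, not part of any AWS SDK. Depending on the region, AWS puts either a dash or a dot between "s3-website" and the region name, so copy the exact endpoint from the Properties tab when in doubt. The bucket names used here are placeholders.

```python
def website_endpoint(bucket, region, separator="-"):
    """Build the S3 static website URL for a bucket.

    separator is "-" for regions such as us-east-1 and eu-west-1, and "."
    for regions such as eu-central-1. Check the console if you are unsure.
    """
    return f"http://{bucket}.s3-website{separator}{region}.amazonaws.com"
```

For a bucket named `my-example-bucket` in eu-west-1 this gives `http://my-example-bucket.s3-website-eu-west-1.amazonaws.com`. A 403 response from that address usually means the objects are not public yet.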

Why the Error Occurs

By default, S3 buckets block public access for security reasons. It’s like having a beautifully displayed store window — everyone can see the store itself, but the products inside are locked away and inaccessible. To fix this, you need to change the permissions of your files to make them public. Once that’s done, visitors will be able to see and access your content as intended.

Making objects public

Select the objects in your S3 bucket, open the Actions menu, and choose "Make public using ACL".
Refresh the website URL to view your site.
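
An alternative to making each object public with ACLs is a bucket policy that allows anyone to read every object in the bucket. This is a different technique from the ACL steps above, shown only as a sketch: `public_read_policy` is a made-up helper, and `my-example-bucket` is a placeholder for your own bucket name.

```python
import json


def public_read_policy(bucket):
    """Return a bucket policy that lets anyone read objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # The trailing /* applies the rule to every object.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


# Print the policy as JSON so it can be pasted into the console.
print(json.dumps(public_read_policy("my-example-bucket"), indent=2))
```

You can paste the printed JSON into the Bucket policy editor under the Permissions tab. It only takes effect if "Block all public access" is turned off, as in the bucket setup steps earlier.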