DEV Community: Threshika Vijayakumar

I Thought My RAG Was Broken. The Real Problem Was Chunking.

Threshika Vijayakumar — Wed, 10 Jun 2026 06:22:23 +0000

When I started learning RAG, I assumed the difficult parts would be:

Embeddings
Vector databases
LLMs

I was wrong.

My embeddings were working.

My vector database was returning results.

The LLM was generating answers.

Yet the responses were often incomplete, irrelevant, or missing important context.

After hours of debugging, I discovered the problem wasn't the model.

It was how I was splitting my documents.

Why Chunking Matters More Than Most People Think

A RAG system can only retrieve what it can find.

And what it can find depends heavily on how your documents are chunked.

Bad chunking leads to:

Missing context
Poor retrieval
Irrelevant answers
Hallucinations

Even when everything else is configured correctly.

Figure 1: Good chunking improves retrieval quality, while bad chunking fragments context and hurts answer quality.

In many cases, the quality of your answers is decided before the LLM generates a single token.

Mistake #1: Chunks That Are Too Large

Imagine storing an entire chapter as a single chunk.

20-page chapter
        ↓
      1 chunk

Now a user asks a question about one paragraph.

The retrieval system has to bring back the entire chapter.

This introduces a lot of irrelevant context and makes retrieval less precise.

Bigger chunks don't always mean better answers.

Mistake #2: Chunks That Are Too Small

I then tried the opposite approach.

Tiny chunks.

Something like:

Chunk 1:
The capital of France is

Chunk 2:
Paris

The problem?

Context gets destroyed.

The retrieval system may find only part of the answer.

The information exists, but the meaning is fragmented.

Figure 2: Effective chunking is a balance. Chunks that are too large introduce noise, while chunks that are too small lose context.

This was the first time I realized that chunk size isn't just a preprocessing setting—it directly impacts retrieval quality.

Mistake #3: No Chunk Overlap

This was one of the most surprising lessons.

Without overlap:

Chunk 1
--------
Embeddings
Vector Search

Chunk 2
--------
Retrieval
Generation

What happens if an important concept sits right on the boundary between two chunks?

You lose context.

Adding overlap helps preserve information that naturally spans multiple chunks.
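Here is a minimal sketch of what overlap looks like in practice. It assumes word-based windows; the function name `chunk_with_overlap` and the sample sentence are mine, not from any particular library.

```python
def chunk_with_overlap(words, size, overlap):
    """Split a list of words into windows of `size` that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


words = "embeddings feed vector search which drives retrieval before generation".split()
for chunk in chunk_with_overlap(words, size=4, overlap=2):
    print(" ".join(chunk))
```

Each window repeats the last two words of the previous one, so a phrase that straddles a boundary still appears whole in at least one chunk. The cost is some duplicated text in the index.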

Mistake #4: Splitting by Character Count Alone

A lot of tutorials do something like:

chunk_size = 500

and stop there.

The problem is that text doesn't naturally organize itself into 500-character blocks.

You might accidentally split:

The vector database stores embeddings used for...

and

...semantic search across documents.

The words survive.

The meaning doesn't.
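One way around this is to treat the character count as a budget and only cut at sentence boundaries. The sketch below assumes a naive regex sentence splitter (it will stumble on abbreviations like "e.g."), and `sentence_chunks` is a hypothetical helper name.

```python
import re


def sentence_chunks(text, max_chars=500):
    """Pack whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than `max_chars` is kept intact as its own chunk.
    """
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)   # budget exceeded: close the current chunk
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = (
    "The vector database stores embeddings used for semantic search across documents. "
    "Each embedding represents one chunk. "
    "Retrieval compares the query embedding with the stored ones."
)
for chunk in sentence_chunks(text, max_chars=120):
    print(repr(chunk))
```

No chunk ends mid-sentence, so the "embeddings used for... semantic search" split from the example above can't happen.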

Mistake #5: Using the Same Strategy Everywhere

Not every document should be chunked the same way.

Documentation, codebases, contracts, and research papers all have different structures.

For example:

Documentation → section-based chunks
Code → function or class-based chunks
Research papers → section-based chunks
Contracts → clause-based chunks

The document structure often provides better chunk boundaries than arbitrary token counts.
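For code, the structure is easy to get at. This is a rough sketch that uses Python's standard `ast` module to produce one chunk per top-level function or class; the `code_chunks` name and the sample source are illustrative assumptions.

```python
import ast


def code_chunks(source):
    """Return one chunk per top-level function or class in Python source."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                # Exact source text of this definition
                "text": ast.get_source_segment(source, node),
            })
    return chunks


sample = '''
def embed(text):
    return [float(len(text))]

class Retriever:
    def search(self, query):
        return []
'''
for chunk in code_chunks(sample):
    print(chunk["kind"], chunk["name"])
```

The same idea applies to the other document types: split documentation on headings and contracts on clause numbers, and fall back to size limits only inside a unit that is still too large.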

The Lesson That Changed My Thinking

When I started learning RAG, I viewed chunking as a preprocessing step.

Now I see it differently.

Chunking is retrieval engineering.

Because retrieval quality directly affects answer quality.

Better chunks lead to:

Better retrieval
Better context
Better answers

Without changing the LLM at all.

Final Thoughts

The biggest surprise in my RAG journey wasn't embeddings or vector databases.

It was discovering how much impact document splitting has on retrieval.

If your RAG system isn't performing well, don't immediately blame the model.

Look at your chunks first.

The problem might already exist before the LLM ever sees the question.

💡 What's your preferred chunking strategy when building RAG systems?

The Day I Realized RAG Isn't an AI Problem

Threshika Vijayakumar — Wed, 10 Jun 2026 05:30:10 +0000

When I first started learning Retrieval-Augmented Generation (RAG), I thought the hardest part would be understanding Large Language Models.

I was wrong.

I thought I would spend most of my time:

Choosing the best LLM
Writing better prompts
Tweaking model parameters

Instead, I ended up spending most of my time thinking about search.

And that's when something clicked:

RAG isn't primarily an AI problem.

It's a search problem.

Let me explain.

The Mental Model Most Beginners Have

When we first interact with ChatGPT or any AI assistant, we imagine something like this:

Question
   ↓
  AI
   ↓
Answer

Simple, right?

Ask a question.

Get an answer.

But once you start building applications with your own data, this model breaks.

Fast.

My First "Wait... Why Is This Wrong?" Moment

Imagine asking an AI:

"What happened in yesterday's IPL match?"

Or:

"What's the latest version of this framework?"

Or:

"What does page 42 of this PDF say?"

The model might answer confidently.

The problem?

It may not actually know.

Why?

Because LLMs don't magically know everything.

They only know what was available during training.

Anything outside that knowledge is a problem.

And that's exactly the problem RAG tries to solve.

What I Thought RAG Was

When I first heard about RAG, I imagined something extremely complicated.

Maybe:

Multiple AI models
Complex reasoning systems
Fancy prompt engineering tricks

But after digging deeper, I realized RAG is surprisingly simple.

The biggest surprise wasn't the "Generation" part.

It was the "Retrieval" part.

The key detail is that the answer isn't generated immediately. The system first searches for relevant information, gathers context, and only then asks the LLM to generate a response.

This was the moment I started seeing RAG as a search problem rather than an AI problem.

The Library Analogy That Made Everything Click 📚

Imagine walking into a library with one million books.

You ask:

"How do black holes affect time?"

What would a librarian do?

Probably not this:

Read 1,000,000 books
      ↓
Find answer

Instead:

Find relevant books
      ↓
Open relevant pages
      ↓
Read only what's needed
      ↓
Answer

That's basically RAG.

The system first finds relevant information.

Then the LLM uses that information to generate an answer.

Why Traditional Search Isn't Enough

Let's say a document contains:

Electric vehicles are becoming more popular.

Now imagine the user searches:

Why are battery-powered cars growing in popularity?

Keyword matching struggles.

Humans don't.

We instantly understand both sentences are talking about the same thing.

Machines need help understanding that connection.

This is where embeddings enter the story.
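To see how little a keyword matcher has to work with, here is a small check on those two sentences. It assumes the simplest possible matching (lowercased word sets, no stemming or synonyms); the `keywords` helper is hypothetical.

```python
import re


def keywords(text):
    """Lowercased word set; hyphenated words are split into their parts."""
    return set(re.findall(r"[a-z]+", text.lower()))


document = "Electric vehicles are becoming more popular."
query = "Why are battery-powered cars growing in popularity?"

shared = keywords(document) & keywords(query)
jaccard = len(shared) / len(keywords(document) | keywords(query))
print(shared, round(jaccard, 2))
```

The only shared word is "are", which says nothing about the topic. Stemming would help with "popular" and "popularity", but nothing at the keyword level connects "electric vehicles" to "battery-powered cars".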

The Concept That Changed Everything: Embeddings

Embeddings sounded scary when I first heard the term.

In reality, the idea is beautiful.

We convert text into numbers.

Something like:

"car"
      ↓
[0.12, -0.55, 0.89, ...]

"vehicle"
      ↓
[0.15, -0.51, 0.91, ...]

The exact numbers don't matter.

What matters is this:

Similar meanings produce similar vectors.

Which means:

car
vehicle
automobile

end up close together in vector space.

Now the machine can search by meaning instead of exact words.

That's huge.
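"Close together" is usually measured with cosine similarity. A minimal sketch, assuming toy 3-dimensional vectors built from the first few numbers shown above; the `banana` vector is made up to stand for an unrelated word.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors; real embeddings typically have hundreds of dimensions.
car = [0.12, -0.55, 0.89]
vehicle = [0.15, -0.51, 0.91]
banana = [-0.70, 0.40, 0.10]  # hypothetical unrelated word

print(cosine_similarity(car, vehicle))  # close to 1
print(cosine_similarity(car, banana))   # much lower
```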

The Most Underrated Part of RAG: Chunking

When people talk about RAG, they usually talk about:

OpenAI
Gemini
Claude
Vector Databases

But one of the most important decisions happens before any of that.

Chunking

Imagine storing an entire 200-page book as a single document.

A user asks about one sentence.

Good luck retrieving that efficiently.

Instead we split content into smaller chunks:

Document
   ↓
Chunk 1
Chunk 2
Chunk 3
Chunk 4
...

Now retrieval becomes much more precise.

One thing I've learned:

Bad chunking can destroy a RAG system.

Even when everything else is configured correctly.

So Why Do We Need Vector Databases?

After creating embeddings, we need somewhere to store them.

That's where vector databases come in.

Traditional databases answer questions like:

SELECT * FROM users
WHERE name = 'John';

Vector databases answer questions like:

Find content most similar
to this question

That's a completely different problem.

And it's what makes semantic search possible.

Popular options include:

PostgreSQL + pgvector
Pinecone
Weaviate
Qdrant
Milvus
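To make "find content most similar to this question" concrete, here is a brute-force stand-in for what a vector database does. It is only a sketch: `TinyVectorStore` is a hypothetical class, the vectors are hand-written, and real vector databases typically use approximate nearest-neighbor indexes rather than comparing against every row.

```python
import math


class TinyVectorStore:
    """In-memory, brute-force imitation of a vector database."""

    def __init__(self):
        self.rows = []  # list of (text, vector) pairs

    def add(self, text, vector):
        self.rows.append((text, vector))

    def search(self, query_vector, k=2):
        """Return the k stored texts whose vectors are most similar to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norms

        scored = [(cosine(query_vector, vec), text) for text, vec in self.rows]
        return sorted(scored, reverse=True)[:k]


store = TinyVectorStore()
store.add("Electric vehicles are becoming more popular.", [0.9, 0.1, 0.0])
store.add("The capital of France is Paris.", [0.0, 0.2, 0.9])
store.add("Battery costs keep falling.", [0.8, 0.3, 0.1])

for score, text in store.search([0.85, 0.2, 0.05], k=2):
    print(round(score, 3), text)
```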

What Actually Happens Inside a RAG Pipeline?

Here's the simplified flow:

Question
   ↓
Search for relevant information
   ↓
Gather context
   ↓
LLM generates the response
   ↓
Answer

Notice something?

The LLM appears near the end.

Most of the work happens before generation.
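The same flow as a toy script, stopping right before the LLM call. Everything here is an assumption made for illustration: `embed` is a bag-of-words counter standing in for a real embedding model (so it matches words, not meaning), and the sample chunks are invented.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=1):
    """Rank chunks by similarity to the question and keep the top k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Keycloak is an identity and access management tool.",
    "Redis keeps data in memory for fast reads.",
    "Chunk overlap preserves context across chunk boundaries.",
]
question = "Why does chunk overlap matter?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)  # this string is what would be sent to the LLM
```

Three of the four functions run before any model is involved, which is the point of the section.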

The Biggest Lesson From My RAG Journey

When I started learning RAG, I thought:

Better model = better answers

Now I think:

Better retrieval = better answers

Because even the most powerful model can't answer questions if the relevant information never reaches it.

That's why experienced engineers spend so much time improving:

Chunking
Embeddings
Search quality
Metadata filtering
Reranking

The answer quality often depends more on retrieval than generation.
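Metadata filtering and reranking are easier to picture with a tiny example. This sketch assumes each retrieved candidate carries a `source` field, and it uses shared-term counting as a stand-in for a real reranker model; all names and sample texts are hypothetical.

```python
def filter_and_rerank(candidates, question, source, k=2):
    """Keep candidates from `source`, then reorder them with a second-stage score."""
    q_terms = set(question.lower().split())

    def rerank_score(candidate):
        # Stand-in for a reranker model: count terms shared with the question.
        return len(q_terms & set(candidate["text"].lower().split()))

    filtered = [c for c in candidates if c["source"] == source]
    return sorted(filtered, key=rerank_score, reverse=True)[:k]


candidates = [
    {"text": "redis pub/sub delivers messages to subscribers", "source": "redis-docs"},
    {"text": "keycloak realms isolate users and clients", "source": "keycloak-docs"},
    {"text": "keycloak clients define redirect uris", "source": "keycloak-docs"},
]
best = filter_and_rerank(candidates, "how do keycloak clients handle redirect uris", "keycloak-docs")
for item in best:
    print(item["text"])
```

The filter removes chunks that can't be relevant no matter how similar they look, and the rerank step spends more effort on the few candidates that remain.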

Final Thoughts

The most surprising thing I've learned about RAG is that it changed the way I think about AI systems.

I used to believe the intelligence lived entirely inside the model.

Now I realize a huge part of the intelligence comes from finding the right information at the right time.

And that's why I no longer see RAG as just an AI technique.

I see it as a search problem that happens to use AI.

And honestly?

That realization taught me more about modern AI than any prompt engineering tutorial ever did.

💡 What's the most surprising thing you've learned while building or learning RAG? I'd love to hear your experience in the comments.

Keycloak: The Open-Source Hero Behind Secure Logins

Threshika Vijayakumar — Sun, 26 Oct 2025 16:11:40 +0000

Every time you click “Login with Google” or “Sign in with GitHub,” a complex dance happens in the background: tokens are exchanged, your identity is verified, and permissions are granted, all in a matter of seconds.

While many developers rely on cloud services like AWS Cognito or Firebase Authentication, there’s a powerful open-source alternative that gives you full control over authentication and user management: Keycloak.

What is Keycloak?

Keycloak is an open-source Identity and Access Management (IAM) solution developed by Red Hat.
It helps developers add authentication, authorization, and single sign-on (SSO) to their applications without writing security code from scratch.

In simple terms:

Keycloak helps you manage who can access your application, how they log in, and what permissions they have.

Why Keycloak When There Are So Many Cloud Options?

You might wonder: why not use AWS Cognito, Firebase Auth, or Azure AD instead?

Here’s what makes Keycloak special:

Open Source
Self-hosted
Easy Integration

Keycloak in a Nutshell

Realm – Your own isolated space managing users, roles, and clients. (You can have multiple realms like dev, test, prod.)

User – Represents a person or service that can log in. Can be created manually, registered, or linked via external IdPs.

Client – Any app using Keycloak for login (e.g., frontend, backend). Defines redirect URIs, access type, and permissions.

Identity Provider (IdP) – External service verifying user identity (e.g., Google, GitHub, Azure AD, AWS Cognito, GCP). Keycloak connects them all in one place.

Hands-On: Run Keycloak Using Docker

Step 1: Pull the Keycloak Image

docker pull quay.io/keycloak/keycloak:latest

Step 2: Run Keycloak in Development Mode

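docker run -d \
  --name keycloak \
  -p 8080:8080 \
  -e KC_BOOTSTRAP_ADMIN_USERNAME=admin \
  -e KC_BOOTSTRAP_ADMIN_PASSWORD=admin \
  quay.io/keycloak/keycloak:latest start-dev

(Keycloak 26 and later use the KC_BOOTSTRAP_ADMIN_* variables; older images expect KEYCLOAK_ADMIN and KEYCLOAK_ADMIN_PASSWORD instead.)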

Step 3: Log in to the Admin Console

Go to:
http://localhost:8080
Login using:
Username: admin
Password: admin

You’ll see the Keycloak dashboard with options to manage realms, users, and clients.

Step 4: Create a Realm

Click on the top-left dropdown → Create Realm
Name it (e.g., myapp-realm)
Save

Step 5: Add a Client

Go to Clients → Create Client
Name: react-app
Root URL: http://localhost:3000 (your app’s URL)
Save and configure redirect URIs

Step 6: Add a User

Go to Users → Add User
Set username (e.g., john)
Go to Credentials tab → Set password
Set Temporary to OFF so the user isn't asked to change the password on first login
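With the realm, client, and user in place, your app sends people to Keycloak's login page through the OpenID Connect authorization endpoint. Below is a minimal sketch of building that URL. It assumes the realm and client names from the steps above, the authorization code flow, and that `http://localhost:3000/` is listed as a valid redirect URI for the client; `build_login_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

KEYCLOAK_URL = "http://localhost:8080"
REALM = "myapp-realm"
CLIENT_ID = "react-app"
REDIRECT_URI = "http://localhost:3000/"  # assumption: must match a redirect URI configured on the client


def build_login_url(state):
    """Build the URL that starts an authorization code login for the client."""
    params = {
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "response_type": "code",  # authorization code flow
        "scope": "openid",
        "state": state,           # opaque value echoed back to the app
    }
    base = f"{KEYCLOAK_URL}/realms/{REALM}/protocol/openid-connect/auth"
    return f"{base}?{urlencode(params)}"


print(build_login_url("xyz123"))
```

Opening the printed URL in a browser should show the realm's login form. If Keycloak complains about the redirect URI instead, the value sent here doesn't match what is configured on the client.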

My Experience Working with Keycloak

Recently, I came across Keycloak while exploring secure authentication. I started experimenting with it, and soon I was able to integrate the latest Quarkus-based Keycloak distribution (earlier versions were based on WildFly). The Quarkus-based version felt significantly lighter, started faster, and was easier to configure, which made the entire setup experience smoother.

However, it wasn’t without challenges. One of the main issues I faced was with webhook-like event integrations, which weren’t available directly through the UI. I had to configure them manually using Keycloak’s event listener mechanism. Since Keycloak is open-source and fully extensible, I could add custom logic and workarounds, but it took some digging through the documentation to get it right.

Another challenge was handling redirect URIs and token configurations for clients. A small mismatch in redirect URLs or access type (public vs. confidential) can cause authentication loops or token errors. Understanding how Keycloak issues tokens and how the client consumes them took some trial and error, but once it clicked, the flow made perfect sense.
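What helped me most while debugging tokens was simply looking inside them. The sketch below decodes a JWT payload without verifying the signature, so it is for inspection only and never for authorization decisions. The token is a fake built in the script so it runs without a server; the claim values are assumptions modelled on the realm and client from the steps above.

```python
import base64
import json


def b64url_decode(segment):
    """Decode base64url text, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_payload(token):
    """Return the claims of a JWT WITHOUT verifying its signature (debugging only)."""
    _header, payload, _signature = token.split(".")
    return json.loads(b64url_decode(payload))


def b64url_encode(data):
    """Helper used only to build the fake token below."""
    return base64.urlsafe_b64encode(json.dumps(data).encode()).rstrip(b"=").decode()


# Fake, unsigned token so the example runs without a Keycloak server.
fake_token = ".".join([
    b64url_encode({"alg": "none", "typ": "JWT"}),
    b64url_encode({
        "iss": "http://localhost:8080/realms/myapp-realm",
        "azp": "react-app",
        "preferred_username": "john",
    }),
    "signature-placeholder",
])

claims = decode_jwt_payload(fake_token)
print(claims["iss"], claims["azp"], claims["preferred_username"])
```

Checking `iss` (which realm issued the token) and `azp` (which client it was issued to) against what the app expects is a quick way to spot a wrong realm or client configuration.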

Despite these hurdles, the experience was amazing. Once the integration was complete, authentication and user management became seamless. It felt rewarding to see how flexible and powerful Keycloak can be when you really understand its structure and flow.

When you finally get Keycloak working after the setup struggle 😎

Final Thoughts

Authentication is a complex but critical part of every application.
Instead of building your own login system and handling tokens manually, Keycloak provides a ready-to-use, secure, and flexible identity management solution.

Whether you’re securing a single web app or managing microservices in the cloud, Keycloak simplifies identity so you can focus on building your core product.

Start your Keycloak journey today because secure login doesn’t have to be hard.

Redis Explained: The Secret Ingredient Behind Fast Apps & Smooth DevOps

Threshika Vijayakumar — Wed, 15 Oct 2025 18:18:49 +0000

Have you ever wondered how apps like Instagram, Netflix, or GitHub handle millions of users without breaking a sweat?
How do they make things load instantly, even when thousands of people are online at the same time?

The secret sauce is often something hidden behind the scenes, a little hero called Redis.

What Exactly is Redis?

Redis stands for Remote Dictionary Server, but don’t let the name scare you.
Think of Redis as a super-speedy notepad that your app can use to remember things temporarily (or even permanently, if you want).

It’s an open-source, in-memory data store, which means it keeps your data in RAM instead of a hard drive. And since RAM is way faster than disk, Redis can respond in microseconds, which is why it’s so popular.

It can act as:

A database (for storing data)
A cache (for speeding up responses)
A message broker (for managing queues & communication)

How Does Redis Work?

Redis stores data in the form of key-value pairs like a dictionary in Python, a Map in JavaScript, or a HashMap in Java.

Key: "user:101"
Value: "{name: 'Threshika', age: 22, country: 'India'}"

You can set, get, update, or delete values using simple commands:

SET user:101 "Threshika"
GET user:101

Setting Up Redis Using Docker

Now that you know what Redis is, let’s actually run it!

The easiest way to get started is by using Docker: no installation headaches, just pull and run.

If you have Docker installed, open your terminal and run:

docker pull redis/redis-stack

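This pulls the Redis Stack image, which bundles Redis with RedisInsight (a UI tool) and modules for JSON, Search, time series, and probabilistic data structures. (Older Redis Stack releases also shipped a Graph module, which has since been discontinued.)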

Once pulled, start the container with:

docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack

Port 6379 → Redis server
Port 8001 → RedisInsight dashboard (http://localhost:8001)

Now Redis is up and running inside Docker!
You can connect to it using any Redis client, or open the CLI that ships inside the container:

docker exec -it redis-stack redis-cli

Redis in Development: The Developer’s Superpower

Redis isn’t just about speed; it helps developers build smarter and more efficient apps.

Caching for Instant Responses

Your backend can cache database queries or API results in Redis, so users don’t have to wait every time.

Example:
If a user opens your profile page 10 times, your app reads it from the database once and serves the other nine requests from Redis instead of reloading everything again and again.

Session Storage

Web apps use Redis to store user sessions, those tiny pieces of data that remember you’re logged in.
If you’ve ever been logged into a website even after closing your browser, there’s a good chance Redis was behind it.

Real-Time Applications

Redis has a Pub/Sub (Publish/Subscribe) feature that makes it ideal for chat apps, live notifications, or multiplayer games, allowing updates to be sent to users instantly.
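The publish/subscribe idea itself fits in a few lines. This is an in-process imitation of the semantics, not a Redis client; `TinyPubSub` and the channel names are hypothetical.

```python
from collections import defaultdict


class TinyPubSub:
    """In-process imitation of publish/subscribe semantics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to current subscribers and return how many received it."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


bus = TinyPubSub()
inbox = []
bus.subscribe("chat:room1", inbox.append)

print(bus.publish("chat:room1", "hello"))   # one subscriber receives it
print(bus.publish("chat:room2", "anyone?")) # nobody is listening
print(inbox)
```

The second message is simply dropped. Redis Pub/Sub behaves the same way: messages are not stored for subscribers that connect later, so work that must not be lost belongs in a list or stream instead.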

Queues and Background Jobs

Redis Lists and Streams are great for managing background tasks like sending emails or processing payments asynchronously.
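A list-based job queue boils down to pushing on one end and popping from the other. The sketch below imitates that with a Python `deque` in place of a Redis list, so it runs without a server; the function names and job payloads are made up.

```python
from collections import deque

queue = deque()  # stands in for a Redis list such as "jobs"


def enqueue(job):
    queue.appendleft(job)  # like LPUSH: add at the head


def dequeue():
    return queue.pop() if queue else None  # like RPOP: take from the tail


enqueue({"type": "email", "to": "john"})
enqueue({"type": "payment", "order": 42})

processed = []
while (job := dequeue()) is not None:
    processed.append(job["type"])  # a real worker would do the actual work here

print(processed)
```

Pushing at the head and popping from the tail gives first-in, first-out order, so the email job is handled before the payment job.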

Redis in DevOps: The Backbone of Speed and Reliability

Developers use Redis in their code, but DevOps engineers rely on it to make entire systems faster and more reliable.

Here's how:

Shared Caching Layer Across Microservices

In large systems, Redis acts as a central cache, helping multiple microservices share data efficiently.

Message Broker for Smooth Communication

Redis Streams enable services to communicate with each other. One service sends a message, and another receives and processes it. This is how scalable systems handle background work seamlessly.

CI/CD Optimization

In DevOps pipelines, Redis can store temporary build data, job states, and cached dependencies to speed up deployments.

Scalable Infrastructure

Redis runs beautifully inside Docker, Kubernetes, or cloud platforms like AWS, Azure, and Google Cloud. It can even be clustered for high availability, with no single point of failure.

Wrapping Up: Why Redis is Worth Learning

Redis is more than just a tool; it’s a mindset of speed, simplicity, and scalability.
Whether you’re building a small side project or managing a large-scale distributed system, Redis fits right in.

Once you start using Redis, you’ll realize how much time and effort it saves and how much faster your apps feel.

Final Thought

Redis isn’t just for developers or DevOps; it’s for anyone who loves building things that feel instant.
Learn it once, and you’ll find yourself using it everywhere.