<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sakshi Srivastava</title>
    <description>The latest articles on DEV Community by Sakshi Srivastava (@sakshi_srivastava).</description>
    <link>https://dev.to/sakshi_srivastava</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3273110%2Fa3504d84-ddae-44af-b0cf-773e60b84d38.png</url>
      <title>DEV Community: Sakshi Srivastava</title>
      <link>https://dev.to/sakshi_srivastava</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sakshi_srivastava"/>
    <language>en</language>
    <item>
      <title>Behind the Scenes of RAG</title>
      <dc:creator>Sakshi Srivastava</dc:creator>
      <pubDate>Fri, 15 Aug 2025 13:28:38 +0000</pubDate>
      <link>https://dev.to/sakshi_srivastava/behind-the-scenes-of-rag-3p0h</link>
      <guid>https://dev.to/sakshi_srivastava/behind-the-scenes-of-rag-3p0h</guid>
      <description>&lt;p&gt;&lt;strong&gt;Introduction&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In my last article, "RAG to Riches", we explored why Retrieval-Augmented Generation (RAG) is transforming the way AI models deliver accurate, up-to-date answers by blending retrieval and generation. As promised, it’s time to roll up our sleeves and see RAG in action—step by step, from setting up your knowledge base to watching your AI answer real questions with live data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What You’ll Learn&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How to prepare and structure your knowledge base&lt;/li&gt;
&lt;li&gt;The basics of embeddings and vector databases&lt;/li&gt;
&lt;li&gt;How to connect a retriever and a language model&lt;/li&gt;
&lt;li&gt;How to run real-time Q&amp;amp;A with your own data&lt;/li&gt;
&lt;li&gt;Tools and code snippets to get you started&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 1: Define Your Use Case and Gather Data&lt;/strong&gt;&lt;br&gt;
Start by identifying your use case—customer support, internal documentation, research assistant, etc. Gather all relevant documents, FAQs, manuals, or datasets that your AI will reference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2: Preprocess and Chunk Your Data&lt;/strong&gt;&lt;br&gt;
Large documents need to be broken into smaller, meaningful chunks. This makes retrieval more precise and efficient. For example, split a 100-page manual into sections or paragraphs, each focused on a specific topic.&lt;/p&gt;
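&lt;p&gt;To make this concrete, here is a minimal chunking sketch. It splits on paragraph breaks and packs paragraphs up to a rough word budget; production pipelines often add overlap between chunks, but this shows the idea:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def chunk_text(text, max_words=200):
    """Split text into paragraph-based chunks of roughly max_words words."""
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        words = para.split()
        # Flush the current chunk once adding this paragraph would overflow
        if current and count + len(words) &gt; max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para.strip())
        count += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;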

&lt;p&gt;&lt;strong&gt;Step 3: Create Embeddings and Store Vectors&lt;/strong&gt;&lt;br&gt;
Transform each chunk into a vector (embedding) that captures its semantic meaning. Use models like Sentence Transformers, OpenAI, or even local tools via Ollama. Store these vectors in a vector database such as ChromaDB, Pinecone, or FAISS for fast similarity search.&lt;/p&gt;
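&lt;p&gt;Here is a toy sketch of that interface using only the standard library. The hashed bag-of-words "embedding" is a stand-in for a real model like Sentence Transformers, and the in-memory list stands in for ChromaDB, Pinecone, or FAISS; only the shape of the workflow matters here:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import hashlib
import math

def embed(text, dim=64):
    """Toy hashed bag-of-words embedding, L2-normalized.
    A stand-in for a real embedding model such as Sentence Transformers."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

# In-memory "vector store": (chunk_id, text, vector) rows. A real system
# would write these to ChromaDB, Pinecone, or FAISS instead.
store = []

def add_chunk(chunk_id, text):
    store.append((chunk_id, text, embed(text)))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;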

&lt;p&gt;&lt;strong&gt;Step 4: Build the Retriever&lt;/strong&gt;&lt;br&gt;
When a user asks a question, convert it to an embedding and search your vector database for the most relevant chunks. This is your retriever at work—bringing back the best context for your model to use.&lt;/p&gt;
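&lt;p&gt;The core of retrieval is a nearest-neighbour search over the stored vectors. This sketch ranks pre-computed (text, vector) pairs by cosine similarity; a vector database does exactly this, just at scale and fast:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(y * y for y in b)) or 1.0
    return dot / (na * nb)

def retrieve(query_vec, store, top_k=3):
    """Rank (text, vector) pairs by cosine similarity and keep the top_k.
    `store` is any iterable of (chunk_text, vector) pairs."""
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in store),
                    reverse=True)
    return scored[:top_k]
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;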

&lt;p&gt;&lt;strong&gt;Step 5: Connect to a Language Model&lt;/strong&gt;&lt;br&gt;
Feed the retrieved context and the user’s question to a language model (like GPT, Llama, or Mistral). The model generates a response grounded in the retrieved information, reducing hallucinations and improving accuracy.&lt;/p&gt;
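&lt;p&gt;The "grounding" boils down to prompt assembly. A minimal sketch, where the instruction wording is purely illustrative and &lt;code&gt;call_llm&lt;/code&gt; is a placeholder for whatever client you use (openai, ollama, etc.):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def build_prompt(context, question):
    """Assemble a grounded prompt. The instruction wording is illustrative;
    tune it for your model and domain."""
    return (
        "Answer the question using only the context below. "
        "If the context is not sufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# answer = call_llm(build_prompt(context, question))
# call_llm is a placeholder for your LLM client of choice.
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;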

&lt;p&gt;&lt;strong&gt;Step 6: Run, Test, and Iterate&lt;/strong&gt;&lt;br&gt;
Now, you’re ready to test! Ask real questions and see your AI respond with answers sourced from your own knowledge base. Analyze the results, tweak chunk sizes, retriever settings, or the prompt to improve performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tools &amp;amp; Frameworks to Make It Easier&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;FastAPI&lt;/strong&gt;: Build REST endpoints for document ingestion and query handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;PyMuPDF&lt;/strong&gt;: Extract text from PDFs without external parsers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ollama&lt;/strong&gt;: Run LLMs and embedding models locally—no internet required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ChromaDB/FAISS&lt;/strong&gt;: Popular open-source vector databases.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Code Example: PDF Document Analysis&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fastapi import APIRouter, UploadFile, File, HTTPException
import uuid

router = APIRouter()
# text_processor and rag_system are application services defined elsewhere
# in the project; see the full repository for their implementations.

@router.post("/upload")
async def upload_documents(file: UploadFile = File(...)):
    """
    Uploads and processes documents for RAG systems by:
    1. Reading file content
    2. Extracting and validating text
    3. Chunking content
    4. Storing in vector database

    Returns document metadata with processing details
    """

    # 1. Read file content asynchronously
    content = await file.read()

    # 2. Decode and validate content
    decoded_text = text_processor.decode_content(content)
    if not decoded_text or len(decoded_text.strip()) &amp;lt; 10:
        raise HTTPException(
            status_code=400,
            detail="File is empty or contains no readable text"
        )

    # 3. Generate unique document ID
    doc_id = str(uuid.uuid4())

    # 4. Chunk text into processable segments
    chunks = text_processor.chunk_text(decoded_text)
    if not chunks:
        raise HTTPException(
            status_code=400,
            detail="Failed to extract meaningful text chunks"
        )

    # 5. Prepare chunks with metadata
    doc_chunks = [
        {
            'id': f"{doc_id}_{index}",
            'text': chunk,
            'source': file.filename,
            'doc_id': doc_id
        }
        for index, chunk in enumerate(chunks)
    ]

    # 6. Store chunks in vector database
    try:
        await rag_system.add_document_chunks(
            chunks=doc_chunks,
            doc_id=doc_id,
            file=file,
            encoding="utf-8"
        )
    except Exception as error:
        print(f"Database error: {error}")
        raise HTTPException(
            status_code=500,
            detail=f"Failed to store document: {str(error)}"
        )

    # 7. Return processing metadata
    return {
        'document_id': doc_id,
        'filename': file.filename,
        'file_size': len(content),
        'chunks_generated': len(chunks),
        'message': 'Document processed successfully'
    }

# ---- Query endpoint (HTTPException is already imported above) ----
from datetime import datetime
from pydantic import BaseModel

# Define structured request/response models
class QueryRequest(BaseModel):
    query: str
    top_k: int = 5  # Default value for optional parameter

class QueryResponse(BaseModel):
    answer: str
    query_time: float

@router.post("/query")
async def query_document(request: QueryRequest):
    """Processes user queries through RAG pipeline"""
    start_time = datetime.now()

    try:
        # 1. Retrieve relevant document chunks
        relevant_chunks = rag_system.search_chunks(
            query_text=request.query,
            result_count=request.top_k
        )

        # 2. Build context from retrieved chunks
        context = "\n\n".join(chunk["text"] for chunk in relevant_chunks)

        # 3. Generate AI-powered response
        answer = rag_system.generate_answer(
            user_query=request.query,
            context=context
        )

        # 4. Calculate processing time
        query_time = (datetime.now() - start_time).total_seconds()

        return QueryResponse(
            answer=answer,
            query_time=query_time
        )

    except Exception as error:
        # Handle specific error types
        error_msg = f"Query processing failed: {str(error)}"
        print(f"ERROR: {error_msg}")
        raise HTTPException(
            status_code=500,
            detail=error_msg
        ) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is only part of the code; see the full implementation on &lt;a href="https://github.com/sakshi30/document-analysis" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tips for Success&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Start small: Test with a limited dataset before scaling up.&lt;/li&gt;
&lt;li&gt;Use open-source tools for flexibility and privacy.&lt;/li&gt;
&lt;li&gt;Always evaluate your system with real user queries and iterate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s make AI smarter—one chunk at a time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Smart Document Hub - Algolia MCP Server Challenge</title>
      <dc:creator>Sakshi Srivastava</dc:creator>
      <pubDate>Sat, 26 Jul 2025 11:08:23 +0000</pubDate>
      <link>https://dev.to/sakshi_srivastava/smart-document-hub-algolia-mcp-server-challenge-57d7</link>
      <guid>https://dev.to/sakshi_srivastava/smart-document-hub-algolia-mcp-server-challenge-57d7</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/algolia-2025-07-09"&gt;Algolia MCP Server Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;An AI-powered learning dashboard with a React/Vite frontend and a Flask backend. Users can upload PDFs or submit web links; the backend extracts text (using pdfplumber for PDFs and Jina Reader for web links), then enriches it with AI-generated summaries and key points via OpenAI. All enriched data and metadata are indexed in the Algolia MCP Server, enabling fast, unified, semantic search across all resources. The system also manages user authentication securely with AWS, letting users search, review, and download their learning materials with ease.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Deployed Link&lt;/strong&gt;: &lt;a href="https://study-documents-fe.vercel.app/login" rel="noopener noreferrer"&gt;https://study-documents-fe.vercel.app/login&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub Repos:&lt;/strong&gt;&lt;br&gt;
Frontend: &lt;a href="https://github.com/sakshi30/study_documents_fe" rel="noopener noreferrer"&gt;https://github.com/sakshi30/study_documents_fe&lt;/a&gt;&lt;br&gt;
Backend: &lt;a href="https://github.com/sakshi30/study-enhancement-bknd" rel="noopener noreferrer"&gt;https://github.com/sakshi30/study-enhancement-bknd&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Demo:&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://drive.google.com/file/d/1AhO3UQ-9s43K_jO6AXwx7yfeQRU9HRJb/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1AhO3UQ-9s43K_jO6AXwx7yfeQRU9HRJb/view?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Screenshots:&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fficvpv1ocbd7r13ogxe2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fficvpv1ocbd7r13ogxe2.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4ai9fblkvfd6bp4j0w9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4ai9fblkvfd6bp4j0w9.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kbo122h46bg7h9cda0a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1kbo122h46bg7h9cda0a.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1myk4r81g03ptnkfi7ux.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1myk4r81g03ptnkfi7ux.png" alt=" " width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Utilized the Algolia MCP Server
&lt;/h2&gt;

&lt;p&gt;I utilized the Algolia MCP Server as the central indexing and retrieval layer for all the learning materials my users upload, including PDFs and web links. By sending AI-enriched summaries and metadata to MCP, I enable fast, semantic search across diverse content sources. This integration greatly simplifies how my platform organizes and delivers intelligent, relevant information to users instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Development Process:&lt;/strong&gt;&lt;br&gt;
I started by designing a modular backend with Flask, integrating pdfplumber for PDF text extraction and Jina Reader for web-link content parsing. I added user authentication backed by AWS RDS to ensure secure uploads. For each resource, the backend used OpenAI to generate structured summaries and key points in JSON. All enriched metadata and download links were indexed into the Algolia MCP Server, which powered the frontend’s unified, semantic search experience built with React and Vite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges Faced:&lt;/strong&gt;&lt;br&gt;
One major challenge was handling diverse input formats—extracting clean, useful text from PDFs (which can be poorly formatted) and ensuring reliable content parsing from web links, given varied site structures. Integrating multiple external APIs (AWS, OpenAI, Jina, and Algolia MCP) required careful error handling and thoughtful workflow design, especially for asynchronous processing and returning accurate status to users. If I were to improve it, I would use Amazon Cognito for authentication and store the PDFs in S3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I Learned:&lt;/strong&gt;&lt;br&gt;
Through this project, I learned effective strategies for building robust, API-driven backend pipelines that leverage multiple third-party services. I gained hands-on experience integrating advanced AI (summarization, key points) and using Algolia MCP Server to create a scalable, interoperable search layer. Most importantly, I saw the value of modular, service-oriented architecture: it made troubleshooting easier, future expansion straightforward, and gave end users a seamless, intelligent learning experience.&lt;/p&gt;

&lt;p&gt;Sakshi Srivastava &lt;br&gt;
Dev.to: &lt;a href="https://dev.to/sakshi_srivastava"&gt;https://dev.to/sakshi_srivastava&lt;/a&gt;&lt;br&gt;
Linkedin: &lt;a href="https://www.linkedin.com/in/srivassa/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/srivassa/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>algoliachallenge</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>Pet Care: Voice Agent for Automated Pet Profile Management &amp; Health Tracking</title>
      <dc:creator>Sakshi Srivastava</dc:creator>
      <pubDate>Tue, 22 Jul 2025 19:19:21 +0000</pubDate>
      <link>https://dev.to/sakshi_srivastava/pet-care-voice-agent-for-automated-pet-profile-management-health-tracking-242o</link>
      <guid>https://dev.to/sakshi_srivastava/pet-care-voice-agent-for-automated-pet-profile-management-health-tracking-242o</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/assemblyai-2025-07-16"&gt;AssemblyAI Voice Agents Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;Pet Care is an AI-powered voice agent that automates key workflows in pet clinics, grooming centers, and veterinary hospitals. With just a conversation, pet owners can:&lt;/p&gt;

&lt;p&gt;Create detailed pet profiles (name, breed, age, weight, etc.)&lt;br&gt;
Ask health-related questions (e.g., "What food is good for a 3-month-old Shih Tzu?")&lt;br&gt;
Set reminders for vaccinations, deworming, grooming, and vet visits&lt;br&gt;
This system eliminates manual form-filling and scheduling, making pet care seamless for both users and clinics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Category:&lt;/strong&gt;&lt;br&gt;
Business Automation&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Hosted at&lt;/strong&gt;: &lt;a href="https://pet-care-ai-assist-git-d334ae-sakshidinesh-srivastavas-projects.vercel.app" rel="noopener noreferrer"&gt;https://pet-care-ai-assist-git-d334ae-sakshidinesh-srivastavas-projects.vercel.app&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Video Demo&lt;/strong&gt;: &lt;a href="https://drive.google.com/file/d/1i5ry7hSk76Qkc7KNBDksel53rBe4sD5K/view?usp=sharing" rel="noopener noreferrer"&gt;https://drive.google.com/file/d/1i5ry7hSk76Qkc7KNBDksel53rBe4sD5K/view?usp=sharing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Screenshot: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92yjy2kcg7x41ngsyy9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F92yjy2kcg7x41ngsyy9n.png" alt=" " width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  GitHub Repository
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Frontend&lt;/strong&gt;: &lt;a href="https://github.com/sakshi30/PetCare-AI-Assistant" rel="noopener noreferrer"&gt;https://github.com/sakshi30/PetCare-AI-Assistant&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Backend&lt;/strong&gt;: &lt;a href="https://github.com/sakshi30/PetCare-AI-Assistant-Backend" rel="noopener noreferrer"&gt;https://github.com/sakshi30/PetCare-AI-Assistant-Backend&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Technical Implementation &amp;amp; AssemblyAI Integration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Unique Value Proposition&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Pet Care is designed for real-world B2C/B2B deployment in pet-focused businesses. It enables multi-step workflows using natural speech — for example:&lt;/p&gt;

&lt;p&gt;“Create a profile for Bruno, a 2-year-old male Beagle.”&lt;/p&gt;

&lt;p&gt;"Also remind me to schedule Bruno's next vaccination in 3 weeks."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why it's unique:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Transcribes and understands domain-specific language (e.g., breed names, medical terms)&lt;/li&gt;
&lt;li&gt;Supports contextual follow-up queries without restarting the flow&lt;/li&gt;
&lt;li&gt;Handles structured data extraction, scheduling logic, and AI-driven Q&amp;amp;A in one voice interface&lt;/li&gt;
&lt;li&gt;Ideal for integration with CRM, pet clinic dashboards, and mobile apps&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AssemblyAI’s API is used to stream the spoken query and fetch a response from the backend, where the answer is generated using various LLMs.&lt;/p&gt;

&lt;p&gt;Submitted by Sakshi Srivastava (&lt;a href="https://www.linkedin.com/in/srivassa/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/srivassa/&lt;/a&gt;)&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>assemblyaichallenge</category>
      <category>ai</category>
      <category>api</category>
    </item>
    <item>
      <title>RAG to Riches</title>
      <dc:creator>Sakshi Srivastava</dc:creator>
      <pubDate>Thu, 19 Jun 2025 10:31:49 +0000</pubDate>
      <link>https://dev.to/sakshi_srivastava/rag-to-riches-2e91</link>
      <guid>https://dev.to/sakshi_srivastava/rag-to-riches-2e91</guid>
      <description>&lt;p&gt;Let’s face it: even the smartest AI can sometimes act like that friend who’s super confident but not always right. Enter RAG—Retrieval-Augmented Generation—the secret sauce that’s taking AI from “pretty smart” to “wow, did my AI just cite the latest company policy?"&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s RAG, and Why Should You Care?
&lt;/h2&gt;

&lt;p&gt;Imagine your favorite language model (think ChatGPT, Gemini, or Claude) as a trivia champ who’s been living under a rock since 2023. Sure, it knows a ton, but ask it about last month’s news or your company’s latest guidelines, and you’ll get a blank stare—or worse, a wild guess.&lt;/p&gt;

&lt;p&gt;RAG changes the game. It gives your AI a digital backpack filled with up-to-date info from trusted sources. When you ask a question, RAG lets the AI rummage through this backpack, grab the freshest, most relevant facts, and weave them into its answer. No more outdated info. No more hallucinations. Just smart, on-the-money responses.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Day in the Life: RAG in Action
&lt;/h2&gt;

&lt;p&gt;Let’s say you work at a big company. You need to know the latest policy on remote work.&lt;/p&gt;

&lt;p&gt;Old-school AI: “Based on my training data, here’s what I think…” (Cue generic, possibly outdated answer.)&lt;br&gt;
RAG-powered AI: “According to the HR memo from last week, here’s the new policy—and here’s a link to the document.”&lt;/p&gt;

&lt;p&gt;Boom. Instant, accurate, and you look like a rockstar in your next meeting.&lt;/p&gt;

&lt;h2&gt;
  
  
  Real-World RAG Magic
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;HR &amp;amp; Enterprise Assistants&lt;/strong&gt;: Employees ask questions, RAG fetches answers straight from the latest internal docs, policies, or wikis. No more endless email chains or outdated FAQs!&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Healthcare&lt;/strong&gt;: Doctors get summaries from the newest research papers—no more flipping through journals during a consult.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal&lt;/strong&gt;: Lawyers retrieve the latest case law and statutes, saving hours of manual research.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer Support&lt;/strong&gt;: Chatbots serve up solutions from the freshest product manuals and troubleshooting guides.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why Is RAG Such a Big Deal?
&lt;/h2&gt;

&lt;p&gt;Because it solves two of AI’s biggest headaches:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Memory Gaps&lt;/strong&gt;: No more “Sorry, my data only goes up to 2023.”&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hallucinations&lt;/strong&gt;: If the AI doesn’t know, it checks the facts—just like a good journalist.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Plus, RAG-powered systems can show you exactly where their answers come from. Want to double-check? Here’s the source. Total transparency.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Set Up Your Own RAG System (It’s Easier Than You Think!)
&lt;/h2&gt;

&lt;p&gt;Ready to give your AI a memory boost? Here’s a high-level look at how you can set up a RAG pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick Your Language Model&lt;/strong&gt;: Start with a solid foundation—an LLM like OpenAI’s GPT, Google’s Gemini, or an open-source model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build or Choose a Knowledge Base&lt;/strong&gt;: Gather your documents, PDFs, wikis, or any data you want your AI to access. Store them in a searchable format (think databases or vector stores like Pinecone, FAISS, or ChromaDB).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add a Retriever&lt;/strong&gt;: This is the librarian of your system. Use tools like Elasticsearch or vector search to quickly fetch the most relevant chunks of data when a question comes in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Connect the Dots&lt;/strong&gt;: When a user asks something, the retriever grabs the best info, and the language model uses it to generate a grounded, accurate answer.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bonus—Show Your Work&lt;/strong&gt;: For extra trust points, display the sources or links your AI used to answer the question.&lt;/li&gt;
&lt;/ol&gt;
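&lt;p&gt;The five steps above can be sketched as a single function. Here &lt;code&gt;retriever&lt;/code&gt; and &lt;code&gt;llm&lt;/code&gt; are placeholders for whatever clients you pick (e.g., FAISS plus an OpenAI or local model):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def rag_answer(question, retriever, llm, top_k=3):
    """High-level RAG loop: retrieve, assemble context, generate, cite.
    retriever(question, top_k) returns dicts with "text" and "source";
    llm(prompt) returns a string. Both are placeholders for real clients."""
    chunks = retriever(question, top_k)
    context = "\n\n".join(c["text"] for c in chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    answer = llm(prompt)
    sources = [c["source"] for c in chunks]  # bonus: show your work
    return answer, sources
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;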

&lt;p&gt;Pro Tip: There are open-source frameworks like Haystack, LlamaIndex, and LangChain that make building RAG pipelines a breeze—even if you’re not a hardcore coder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stay Tuned: Full RAG Demo Coming Soon!
&lt;/h2&gt;

&lt;p&gt;Curious to see RAG in action, step by step? I’ll be posting a follow-up article soon, walking you through the entire process—from setting up your knowledge base to seeing your AI answer real questions with live data. Follow me to get notified when it drops!&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line: From RAG to Riches
&lt;/h2&gt;

&lt;p&gt;RAG isn’t just another AI buzzword—it’s the upgrade that’s making AI genuinely useful for real-world work. Whether you’re building the next-gen chatbot, automating research, or just want smarter answers, RAG is your ticket from “meh” to “magnificent.”&lt;/p&gt;

&lt;p&gt;Follow me on &lt;a href="https://www.linkedin.com/in/srivassa/" rel="noopener noreferrer"&gt;Linkedin&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>rag</category>
      <category>programming</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
