DEV Community

Hiroki Kameyama

Taking RAG to Production — Evals, Observability, Security, and Beyond (Introduction)

About This Guide

In the previous guide, Building a RAG System from Scratch with pgvector and Gemini, we built a RAG system from the ground up, then extended it with Tool Use, AI Agents, MCP, and cloud deployment.

This guide is the sequel. It takes you from "building a system that works" to "making a system that works in production".

[Previous Guide]
RAG → Tool Use → AI Agents → MCP → Render × Supabase deployment

[This Guide]
Evals → Observability → Security → MLOps → Fine-tuning
→ Multi-Agent → Governance

Why Production Operations Are Hard

After implementing RAG or Agents, you'll inevitably hit these problems when trying to go live:

Quality problems
Manually checking "is this answer correct?" doesn't scale. You need an automated system to measure quality.

Visibility problems
When something goes wrong in production, you can't diagnose it unless you can trace what happened at each step.
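
As a rough illustration of step-level tracking, here is a minimal sketch using only the Python standard library. The names `TRACE`, `traced_step`, and `answer` are hypothetical, and the retrieval and generation steps are placeholders rather than real pgvector or Gemini calls. In practice a tracing backend such as Langfuse would take the place of the in-memory list.

```python
import time
from contextlib import contextmanager

# In-memory stand-in for a real tracing backend (assumption for this sketch).
TRACE: list[dict] = []


@contextmanager
def traced_step(name: str):
    """Record the duration and outcome of one pipeline step."""
    start = time.perf_counter()
    record = {"step": name, "status": "ok"}
    try:
        yield record  # the caller may attach extra fields to the record
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        TRACE.append(record)


def answer(question: str) -> str:
    """Toy RAG pipeline: both steps are placeholders, not real API calls."""
    with traced_step("retrieve") as rec:
        docs = ["pgvector stores embeddings in Postgres."]  # placeholder retrieval
        rec["num_docs"] = len(docs)
    with traced_step("generate"):
        return f"Based on {len(docs)} document(s): {docs[0]}"  # placeholder LLM call
```

After a call to `answer(...)`, `TRACE` holds one record per step, so a failed or slow request can be narrowed down to the step that caused it.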

Security problems
When accepting requests from external users, you need to defend against prompt injection attacks and filter out prohibited content.

Continuous improvement problems
If you have no way to verify "did this actually get better?" after improving a prompt, your iteration cycle stalls.
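
One way to answer that question is to score the old and new versions against the same fixed set of test cases. The sketch below is only an illustration under simplifying assumptions: the eval set, the keyword-based metric, and the `baseline` and `candidate` functions are all hypothetical stand-ins, and a real pipeline would call the RAG system instead of returning canned strings.

```python
from typing import Callable

# Hypothetical eval set: each case lists keywords a good answer should contain.
EVAL_SET = [
    {"question": "What does pgvector store?", "expected_keywords": ["embedding"]},
    {"question": "Which database does pgvector extend?", "expected_keywords": ["postgres"]},
]


def keyword_score(answer: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords found in the answer (case-insensitive)."""
    text = answer.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords)


def evaluate(pipeline: Callable[[str], str]) -> float:
    """Mean keyword score of a pipeline over the whole eval set."""
    scores = [
        keyword_score(pipeline(case["question"]), case["expected_keywords"])
        for case in EVAL_SET
    ]
    return sum(scores) / len(scores)


def baseline(question: str) -> str:
    # Stand-in for the pipeline running the old prompt.
    return "pgvector stores embeddings."


def candidate(question: str) -> str:
    # Stand-in for the pipeline running the revised prompt.
    return "pgvector is a Postgres extension that stores embeddings."


def is_improvement(old: Callable[[str], str], new: Callable[[str], str],
                   min_gain: float = 0.0) -> bool:
    """True if the new pipeline beats the old one by more than min_gain."""
    return evaluate(new) - evaluate(old) > min_gain
```

Keyword matching is a deliberately crude metric. The point is the shape of the loop: a fixed eval set, a numeric score, and a comparison that can gate a prompt change.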

Model problems
Sometimes a general-purpose model isn't enough — you need a model specialized for your specific domain.


Guide Structure

Each chapter can be read independently. The content assumes the previous guide's implementation (pgvector, Gemini, RAG), but you can also read it for conceptual understanding alone.

| Chapter | Theme | Problem Solved |
|---|---|---|
| Ch. 2 | Evals | Automated measurement of answer quality |
| Ch. 3 | Observability | Tracing and cost management |
| Ch. 4 | Security | Guardrails and attack defense |
| Ch. 5 | MLOps / LLMOps | CI/CD and prompt management |
| Ch. 6 | Fine-tuning | Domain-specific model specialization |
| Ch. 7 | Multi-Agent | Orchestrator × Worker architecture |
| Ch. 8 | Governance | EU AI Act compliance, audit logs |

Prerequisites

  • Completed the previous guide's pgvector tutorial
  • Python 3.11, Docker, and a working pgvector environment
  • GEMINI_API_KEY configured in .env
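
To confirm the Python and `.env` prerequisites before starting, a small check along these lines may help. It is a sketch using only the standard library: `read_env_file` and `check_prerequisites` are hypothetical helper names, the parser handles only simple `KEY=VALUE` lines, and Docker and pgvector are not checked here.

```python
import sys
from pathlib import Path


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_prerequisites(env_path: str = ".env") -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < (3, 11):
        problems.append("Python 3.11 or newer is expected")
    if not read_env_file(env_path).get("GEMINI_API_KEY"):
        problems.append(f"GEMINI_API_KEY is missing or empty in {env_path}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Prerequisite problem:", problem)
```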

Tools Used

| Tool | Purpose | Free Tier |
|---|---|---|
| Google Gemini API | LLM + Embedding | 1,500 requests/day |
| pgvector | Vector DB | Unlimited (local) |
| Langfuse | Observability | Free cloud tier available |
| GitHub Actions | CI/CD pipeline | 2,000 minutes/month (free) |
| Hugging Face | Fine-tuning models | Free |

Let's start with Chapter 2: Evals.
