Digvijay Singh
I Built a Production-Grade AI Platform From Scratch (Here’s the Exact Folder Structure)

How I Structured a Production-Grade AI Platform From Scratch

I stopped doing tutorials.

Not because tutorials are bad. But because after finishing one, I could follow code. I couldn't explain why the code was written that way.

So I decided to build something real from scratch. No copy-paste. No shortcuts. Every decision justified. Every file explained.

This is the first article in a series documenting how I build a production-grade Agentic RAG Document Intelligence System — phase by phase, file by file.

What We're Building

The GenAI DocQA Platform is a system where users upload documents (PDF, DOCX, CSV, PPTX) and ask complex natural language questions. A 10-node LangGraph agent retrieves relevant chunks, reasons over them, self-corrects, and streams sourced answers back to the user.

Think: mini Perplexity AI + Notion AI + an OpenAI API platform. Built entirely from scratch.

Total cost to run: $0 — all free tiers.

The Full Stack

| Layer | Technology |
|---|---|
| API Framework | FastAPI + async SQLAlchemy 2.0 |
| AI Agent | LangGraph (10-node ReAct workflow) |
| RAG | pgvector + BM25 hybrid search + Cohere reranking |
| LLMs | Groq (free) → OpenAI → Anthropic (fallback chain) |
| Embeddings | Sentence-Transformers (local, free, CPU) |
| Cache | Redis (rate limiting + query cache + embeddings) |
| Database | PostgreSQL 16 + pgvector extension |
| Monitoring | LangSmith + Prometheus + Grafana + RAGAS |
| Security | JWT + bcrypt + AES-256-GCM + Presidio PII |
| Infrastructure | Docker Compose + GitHub Actions CI/CD |

13 Phases

| Phase | What Gets Built |
|---|---|
| 01 | Project scaffold — this article |
| 02 | JWT auth, bcrypt, AES encryption, rate limiting |
| 03 | Document parsers, chunking, WebSocket progress |
| 04 | Embeddings, pgvector, hybrid search, reranking |
| 05 | LLM router — 7 providers, fallback chain, cost tracking |
| 06 | RAG pipeline — prompt engineering, query rewriting, CRAG |
| 07 | LangGraph 10-node agent, self-correction loops |
| 08 | MCP integration — external tool calling |
| 09 | SSE streaming, conversation memory |
| 10 | Safety layer — PII masking, RAGAS evaluation, CI gates |
| 11–13 | React frontend, production deployment |

This article covers Phase 1 — the complete project scaffold. No AI yet. Just the foundation that everything else sits on.


The Philosophy: Why This Matters

Before writing a single line of code, I made one decision:

Every file gets a reason. Every decision gets a justification. Nothing exists "because the tutorial said so."

This forces architectural clarity. When you know WHY each piece exists, you can adapt it. When you only know WHAT it does, you're stuck.


Phase 1 File Structure

genai-platform/
├── .gitignore                     # what git never tracks
├── .env.example                   # env var template — committed
├── README.md                      # project front door
├── .github/
│   └── workflows/
│       ├── ci.yml                 # runs on every push
│       └── deploy.yml             # runs on main merge only
├── infrastructure/
│   └── docker-compose.yml         # PostgreSQL + Redis + Backend
└── backend/
    ├── requirements.txt           # pinned Python dependencies
    ├── pyproject.toml             # ruff + mypy + pytest config
    ├── Dockerfile                 # multi-stage production build
    ├── alembic.ini                # migration configuration
    ├── alembic/
    │   ├── env.py                 # async migration bridge
    │   └── versions/              # migration files (Phase 2+)
    └── app/
        ├── __init__.py            # package marker + version
        ├── config.py              # Pydantic Settings — all env vars
        ├── main.py                # FastAPI app + lifespan + middleware
        ├── dependencies.py        # get_db, get_redis, get_current_user
        ├── db/
        │   ├── database.py        # async engine + session factory
        │   └── init_db.py         # pgvector extension + admin seed
        ├── monitoring/
        │   ├── logger.py          # structured JSON logging (structlog)
        │   └── metrics.py         # 12 Prometheus metrics defined
        └── api/v1/
            └── health.py          # /health (liveness) + /ready (readiness)

24 files. Let's go through the key decisions.


Decision 1: .gitignore — What Never Gets Committed

The .gitignore has one critical pattern most developers miss:

# The pattern every production codebase uses
.env        # real secrets — never committed
.env.*      # covers .env.local, .env.production, etc.
!.env.example  # ← the ! means EXCEPTION — template IS committed

.env.example is committed. It has placeholder values. When someone clones the repo they do:

cp .env.example .env
# then fill in real values

Zero guessing about what variables are needed.
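To keep that guarantee honest, a tiny stdlib-only helper can diff the two files and complain about anything `.env` is missing. This is a hypothetical convenience script, not a file in the repo:

```python
# check_env.py (hypothetical helper): verify your local .env defines
# every variable listed in .env.example.
from pathlib import Path


def env_keys(path: str) -> set[str]:
    """Collect variable names from a dotenv-style file, skipping comments."""
    keys = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example: str = ".env.example", actual: str = ".env") -> set[str]:
    """Names present in the template but absent from the real .env."""
    return env_keys(example) - env_keys(actual)


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing in .env: {sorted(missing)}")
    print("All variables present.")
```

Run it after `cp .env.example .env` and again whenever you pull changes that add new variables.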

We also ignore uploaded documents:

uploads/
*.pdf
*.docx
*.pptx
*.csv

In production, files go to Cloudflare R2 object storage — not git. Git is for code. Not user data.


Decision 2: Environment Variables Done Right

The beginner approach:

# scattered across 15 files — dangerous
db_url = os.getenv("DATABASE_URL")         # returns None silently if missing
secret = os.getenv("SECRIT_KEY")           # typo — also None, no warning
expire = int(os.getenv("TOKEN_EXPIRE"))    # crashes here if None

Three problems: typos return None silently, no types, missing variables don't surface until runtime — deep inside a failing request.
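All three failure modes are easy to reproduce with nothing but the standard library. The variable names mirror the snippet above; none of this touches a real app:

```python
import os

# 1. Typo'd names fail silently: getenv returns None instead of raising.
os.environ["SECRET_KEY"] = "real-value"
assert os.getenv("SECRIT_KEY") is None  # typo goes completely unnoticed

# 2. No types: every env var is a string, and string truthiness lies.
os.environ["DEBUG"] = "false"
assert bool(os.getenv("DEBUG")) is True  # any non-empty string is truthy!

# 3. Missing vars only blow up at the point of use, deep inside a request.
os.environ.pop("TOKEN_EXPIRE", None)
try:
    int(os.getenv("TOKEN_EXPIRE"))  # int(None) raises at runtime
except TypeError as e:
    print("crashed late:", e)
```

The `bool("false") is True` trap alone justifies typed settings: Pydantic parses `"false"` to `False`, `os.getenv` does not.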

The production approach — Pydantic Settings:

# app/config.py
from pydantic_settings import BaseSettings, SettingsConfigDict
from pydantic import Field, field_validator
from functools import lru_cache

class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        env_file=".env",
        extra="ignore",
    )

    # Required — app refuses to start without these
    DATABASE_URL: str = Field(...)
    SECRET_KEY: str = Field(...)
    GROQ_API_KEY: str = Field(...)

    # Optional — typed, with defaults
    ACCESS_TOKEN_EXPIRE_MINUTES: int = 15   # "15" → int automatically
    DEBUG: bool = False                      # "true" → bool automatically

    # Custom validators — fail fast with clear messages
    @field_validator("SECRET_KEY")
    @classmethod
    def validate_secret_key(cls, v: str) -> str:
        if len(v) < 32:
            raise ValueError(
                "SECRET_KEY must be at least 32 characters. "
                "Generate with: openssl rand -hex 32"
            )
        return v

    @field_validator("DATABASE_URL")
    @classmethod
    def validate_database_url(cls, v: str) -> str:
        if not v.startswith("postgresql+asyncpg://"):
            raise ValueError(
                "DATABASE_URL must use asyncpg driver. "
                "Change postgresql:// to postgresql+asyncpg://"
            )
        return v

@lru_cache()
def get_settings() -> Settings:
    return Settings()

settings = get_settings()  # read once, cached forever

If DATABASE_URL is missing:

ValidationError: DATABASE_URL field required

If SECRET_KEY is too short:

ValidationError: SECRET_KEY must be at least 32 characters.
Generate with: openssl rand -hex 32

Clear. Specific. Actionable. At startup — not runtime.
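If you don't have `openssl` handy, Python's stdlib `secrets` module produces an equivalent key. A quick sketch, not part of the project code:

```python
import secrets

# Equivalent of `openssl rand -hex 32`: 32 random bytes as 64 hex characters,
# comfortably past the 32-character validator above.
key = secrets.token_hex(32)
print(key)
```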

The @lru_cache() means the .env file is read once at startup. Not on every request. Not on every import. Once. Cached forever. This also ensures immutable configuration — the app has one consistent config for its entire lifetime.
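The caching behaviour is easy to see with a stand-in function. `get_settings_stub` here is illustrative, not the real `Settings` class; it just counts how often its body runs:

```python
from functools import lru_cache

calls = 0


@lru_cache()
def get_settings_stub() -> dict:
    """Stand-in for get_settings(): imagine the body re-reading .env."""
    global calls
    calls += 1
    return {"DEBUG": False}


a = get_settings_stub()
b = get_settings_stub()
assert a is b      # every caller gets the same cached object
assert calls == 1  # the "file read" happened exactly once
```

This is also why tests use `app.dependency_overrides` or `get_settings_stub.cache_clear()` rather than mutating a live settings object.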


Decision 3: Async SQLAlchemy 2.0 — Why It Changes Everything

# app/db/database.py
from sqlalchemy.ext.asyncio import (
    create_async_engine,
    async_sessionmaker,
    AsyncSession,
)
from sqlalchemy.orm import DeclarativeBase

engine = create_async_engine(
    url=settings.DATABASE_URL,  # postgresql+asyncpg:// — MUST be asyncpg
    pool_size=20,                # 20 connections in pool
    max_overflow=10,             # 10 extra when pool is full
    pool_pre_ping=True,          # test before using (prevents stale connections)
    echo=settings.DEBUG,         # log SQL in development
)

AsyncSessionLocal = async_sessionmaker(
    bind=engine,
    expire_on_commit=False,  # ← CRITICAL — more on this below
    class_=AsyncSession,
    autoflush=False,
)

class Base(DeclarativeBase):
    pass

The expire_on_commit=False Trap

This is the most common async SQLAlchemy mistake. By default, after commit(), SQLAlchemy marks all objects as "expired". The next attribute access triggers a new DB query. In sync code — fine. In async code:

user = await db.get(User, user_id)
await db.commit()
print(user.email)   # ← CRASH in async: MissingGreenlet error

With expire_on_commit=False:

user = await db.get(User, user_id)
await db.commit()
print(user.email)   # ✅ works — values kept in memory

When you actually need fresh data, use await db.refresh(user) explicitly. Explicit is better than implicit.

Why postgresql+asyncpg:// Not postgresql://?

One small difference in the connection URL. Completely different behaviour.

postgresql:// → sync driver → blocks the event loop → your server handles one request at a time during DB calls.

postgresql+asyncpg:// → async driver → non-blocking → server handles hundreds of concurrent requests during DB calls.
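The difference is easy to measure with a toy benchmark, where `time.sleep` stands in for a blocking sync driver and `asyncio.sleep` for a non-blocking one (illustrative only, no real database involved):

```python
import asyncio
import time


async def sync_style_query():
    time.sleep(0.05)  # blocking call: freezes the whole event loop


async def async_style_query():
    await asyncio.sleep(0.05)  # yields: other "requests" keep running


async def handle_many(query, n=10):
    """Simulate 10 concurrent requests each doing one DB call."""
    start = time.perf_counter()
    await asyncio.gather(*(query() for _ in range(n)))
    return time.perf_counter() - start


blocking = asyncio.run(handle_many(sync_style_query))
concurrent = asyncio.run(handle_many(async_style_query))
print(f"blocking: {blocking:.2f}s, non-blocking: {concurrent:.2f}s")
```

The blocking version runs its ten "queries" one after another; the async version overlaps them, finishing in roughly the time of a single query.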

Our validator in config.py catches the wrong driver at startup:

ValidationError: DATABASE_URL must use asyncpg driver.

Decision 4: FastAPI Application Structure

# app/main.py
from contextlib import asynccontextmanager
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

@asynccontextmanager
async def lifespan(app: FastAPI):
    # ── STARTUP ──────────────────────────────────────────────
    setup_logging()        # 1. logging first — everything else logs
    setup_prometheus()     # 2. metrics
    await init_db()        # 3. DB — needs logging ready
    connect_redis()        # 4. Redis

    yield  # ← app handles requests here

    # ── SHUTDOWN ─────────────────────────────────────────────
    await redis.aclose()
    await engine.dispose()

app = FastAPI(lifespan=lifespan)

Why lifespan Instead of @app.on_event?

@app.on_event("startup") is deprecated since FastAPI 0.93. It's still in most tutorials. Don't use it.

The lifespan pattern:

  • Startup and shutdown in one function — paired naturally
  • Code after yield runs at shutdown; wrap the yield in try/finally if cleanup must also run when startup fails partway
  • Testable — the lifespan can be mocked cleanly
  • No deprecation warnings

The Startup Order

Order matters. Logging must be first so everything after it can produce logs.

Logging → Prometheus → Database → Redis → Ready

If database fails at startup — we raise the exception. An app that starts without a database looks healthy but serves broken responses. Fail fast. Fail loudly.
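The order-plus-fail-fast idea can be sketched without FastAPI. The step names here are illustrative stand-ins for the real setup functions:

```python
# Sketch of fail-fast, ordered startup. Step names are hypothetical.
started = []


def setup_logging_step():
    started.append("logging")


def setup_metrics_step():
    started.append("metrics")


def init_db_step():
    raise ConnectionError("database unreachable")  # simulate a dead DB


def connect_redis_step():
    started.append("redis")


def run_startup():
    # Order matters: logging first, and any failure aborts the whole boot.
    for step in (setup_logging_step, setup_metrics_step,
                 init_db_step, connect_redis_step):
        step()  # no try/except: a broken dependency must crash the app


try:
    run_startup()
except ConnectionError as e:
    print("refusing to start:", e)
```

With the database step failing, Redis is never even attempted; the process exits instead of serving half-broken responses.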

Middleware — Three Layers

# Layer 1: CORS — browsers need this to call your API
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
    allow_headers=["*"],
)

# Layer 2: Request ID — every request gets a unique ID
@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request_id = request.headers.get("X-Request-ID", str(uuid.uuid4()))
    structlog.contextvars.bind_contextvars(request_id=request_id)
    response = await call_next(request)
    response.headers["X-Request-ID"] = request_id
    structlog.contextvars.clear_contextvars()
    return response

# Layer 3: Request Logging — every request logged automatically
@app.middleware("http")
async def log_requests(request: Request, call_next):
    start = time.perf_counter()
    response = await call_next(request)
    duration_ms = (time.perf_counter() - start) * 1000
    log.info("request_completed",
             method=request.method,
             path=request.url.path,
             status_code=response.status_code,
             duration_ms=round(duration_ms, 2))
    return response

CORS is registered first so its headers end up on every response, including error responses. If an error response goes out without CORS headers, the browser hides the real status code and reports an opaque network failure — confusing to debug.


Decision 5: Dependency Injection

# app/dependencies.py
async def get_db() -> AsyncGenerator[AsyncSession, None]:
    async with AsyncSessionLocal() as session:
        try:
            yield session        # route runs with this session
            await session.commit()
        except Exception:
            await session.rollback()
            raise
        # session closes automatically — even on exception

async def get_redis(request: Request):
    return request.app.state.redis

In every route:

@router.get("/documents")
async def list_documents(
    db: AsyncSession = Depends(get_db),
    current_user: User = Depends(get_current_user),
):
    # db is ready, current_user is verified
    # no setup code needed here
    ...

The yield pattern ensures the session always closes, even if the route raises an exception. No leaked connections.
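The lifecycle is the same as a plain Python context manager, which makes it easy to demonstrate without FastAPI or a database. `get_db_stub` is a hypothetical stand-in that records each phase:

```python
from contextlib import contextmanager

events = []


@contextmanager
def get_db_stub():
    """Framework-free sketch of get_db: setup, commit/rollback, close."""
    events.append("open")
    try:
        yield "session"
        events.append("commit")
    except Exception:
        events.append("rollback")
        raise
    finally:
        events.append("close")  # always runs, success or failure


# Happy path: commit, then close.
with get_db_stub():
    pass
assert events == ["open", "commit", "close"]

# Failing route: rollback, close, and the error still propagates.
events.clear()
try:
    with get_db_stub():
        raise RuntimeError("route blew up")
except RuntimeError:
    pass
assert events == ["open", "rollback", "close"]
```

FastAPI drives yield-dependencies through essentially this machinery, which is why the session can never leak.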

For testing:

app.dependency_overrides[get_db] = override_get_db  # swap real DB for test DB

Decision 6: Liveness vs Readiness Probes

Two endpoints. Two completely different questions.

# GET /api/v1/health — liveness: is the process alive?
# NEVER checks external services
@router.get("/health")
async def health_check():
    return {"status": "ok", "version": __version__}

# GET /api/v1/ready — readiness: can this instance handle traffic?
# Checks ALL dependencies
@router.get("/ready")
async def readiness_check(request: Request):
    checks = {}
    all_ready = True

    try:
        async with engine.connect() as conn:
            await conn.execute(text("SELECT 1"))
        checks["database"] = {"status": "ok"}
    except Exception as e:
        all_ready = False
        checks["database"] = {"status": "error", "error": str(e)}

    try:
        await request.app.state.redis.ping()
        checks["redis"] = {"status": "ok"}
    except Exception as e:
        all_ready = False
        checks["redis"] = {"status": "error"}

    return JSONResponse(
        status_code=200 if all_ready else 503,
        content={"status": "ready" if all_ready else "not_ready",
                 "checks": checks},
    )

Real scenario — PostgreSQL restarts for 30 seconds:

With one combined endpoint:

  • /health returns 503
  • Kubernetes thinks the process is dead
  • Kubernetes kills and restarts the container
  • The restart doesn't fix PostgreSQL
  • Kubernetes keeps restarting
  • This is a crash loop — users see errors for minutes

With two separate endpoints:

  • /health returns 200 (process is alive)
  • /ready returns 503 (can't reach DB)
  • Load balancer stops routing to this instance
  • Traffic goes to other healthy instances
  • Users see nothing
  • PostgreSQL comes back → /ready returns 200 → traffic resumes

Decision 7: Structured Logging

# app/monitoring/logger.py
import structlog

def setup_logging():
    structlog.configure(
        processors=[
            structlog.stdlib.add_log_level,
            structlog.stdlib.add_logger_name,
            structlog.processors.TimeStamper(fmt="iso"),
            structlog.contextvars.merge_contextvars,  # includes request_id
            structlog.processors.JSONRenderer() if not settings.DEBUG
            else structlog.dev.ConsoleRenderer(colors=True),
        ],
        wrapper_class=structlog.stdlib.BoundLogger,
        logger_factory=structlog.stdlib.LoggerFactory(),
    )

Development output (colored, human-readable):

2025-03-14 10:30:00 [info] document_uploaded  filename=report.pdf size_mb=2.4

Production output (JSON, machine-readable):

{"event":"document_uploaded","filename":"report.pdf","size_mb":2.4,
 "request_id":"abc-123","level":"info","timestamp":"2025-03-14T10:30:00Z"}

Same logging call. Different format. Zero code changes.

Usage anywhere in the app:

log = structlog.get_logger(__name__)
log.info("chunks_created", count=47, strategy="parent_child", duration_ms=234)
log.error("llm_failed", provider="groq", error=str(e), exc_info=True)

The request_id bound in middleware flows through every log line automatically. When something breaks, search request_id = "abc-123" and see the complete story of that request.
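What merge_contextvars does can be sketched with the stdlib alone. `log_info` here is a hypothetical stand-in for structlog, not its real API:

```python
import contextvars
import json

# A context variable survives across await points within one request.
request_id_var = contextvars.ContextVar("request_id", default=None)


def log_info(event: str, **fields) -> str:
    """Merge the bound request_id into every log line, structlog-style."""
    record = {"event": event, "request_id": request_id_var.get(), **fields}
    line = json.dumps(record)
    print(line)
    return line


# Middleware binds the id once per request...
request_id_var.set("abc-123")

# ...and every later log call in that context carries it automatically,
# without any function threading request_id through its arguments.
line = log_info("document_uploaded", filename="report.pdf", size_mb=2.4)
```

Because the binding lives in a `ContextVar` rather than a global, concurrent requests each see their own id.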


Decision 8: Multi-Stage Docker Build

# Stage 1 — Builder (large, temporary)
FROM python:3.12-slim AS builder
WORKDIR /build

RUN apt-get update && apt-get install -y gcc python3-dev libpq-dev \
    && rm -rf /var/lib/apt/lists/*

# requirements.txt BEFORE app code — enables layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2 — Production (small, deployed)
FROM python:3.12-slim AS production

# Runtime dependencies only — no build tools
RUN apt-get update && apt-get install -y libpq5 tesseract-ocr curl \
    && rm -rf /var/lib/apt/lists/*

# Copy ONLY compiled packages — not gcc, not make, not compilers
COPY --from=builder /install /usr/local

# Non-root user — least privilege principle
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
RUN mkdir -p /app/uploads && chown -R appuser:appgroup /app

USER appuser
COPY --chown=appuser:appgroup ./app /app/app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Result: 812MB → 298MB. Same application. Same functionality.

Layer caching rule:

Things that change RARELY → top of Dockerfile
Things that change OFTEN  → bottom of Dockerfile

requirements.txt changes rarely. App code changes constantly. Copy requirements first → pip install is cached → code changes don't trigger pip install.


Decision 9: Alembic Async Bridge

Standard Alembic is synchronous. Our app uses async SQLAlchemy. The bridge:

# alembic/env.py
async def run_async_migrations() -> None:
    connectable = create_async_engine(settings.DATABASE_URL)

    async with connectable.connect() as connection:
        # run_sync() extracts a sync connection from async
        # Alembic runs inside that sync connection
        await connection.run_sync(do_run_migrations)

    await connectable.dispose()

def run_migrations_online() -> None:
    asyncio.run(run_async_migrations())

connection.run_sync(do_run_migrations) is the bridge.

  • connection is async
  • run_sync() extracts a sync version
  • Alembic runs normally inside do_run_migrations()

This is the official Alembic async pattern.

Decision 10: GitHub Actions CI

Every push triggers four jobs in parallel:

jobs:
  lint:      # ruff — style + security issues
  typecheck: # mypy — type errors
  test:      # pytest with real postgres + redis
    services:
      postgres:
        image: pgvector/pgvector:pg16
      redis:
        image: redis:7-alpine

  docker:    # verify image builds
    needs: [lint, typecheck, test]  # only if all pass

Key decisions:

  • Real PostgreSQL and Redis in test job — not mocks
  • Docker build only runs after all three pass — fail fast on cheap checks
  • cache: "pip" in setup-python — subsequent runs are 10× faster

  • cache-from: type=gha for Docker — layer caching across CI runs

Running Phase 1

# 1. Clone and set up
git clone https://github.com/digvijaysingh21/genai-docqa.git
cd genai-docqa
cp .env.example .env

# 2. Generate secrets
openssl rand -hex 32  # paste as SECRET_KEY
openssl rand -hex 32  # paste as ENCRYPTION_KEY
# Get free Groq key at: console.groq.com → paste as GROQ_API_KEY

# 3. Start everything
cd infrastructure
docker compose up

# 4. Test
curl http://localhost:8000/api/v1/health
# {"status":"ok","version":"1.0.0","environment":"development"}

curl http://localhost:8000/api/v1/ready
# {"status":"ready","checks":{"database":{"status":"ok"},"redis":{"status":"ok"}}}

# 5. Open Swagger UI
open http://localhost:8000/docs

What Phase 1 Establishes

Before a single AI feature is built, we have:

  • ✅ Deterministic builds (pinned versions)
  • ✅ Fail-fast configuration (Pydantic validators)
  • ✅ Async database with connection pooling
  • ✅ Structured JSON logging with request tracing
  • ✅ 12 Prometheus metrics defined
  • ✅ Liveness + readiness health probes
  • ✅ Multi-stage Docker build (300MB vs 800MB)
  • ✅ Non-root container user
  • ✅ Layer-optimised Dockerfile (10s rebuild vs 6min)
  • ✅ Async Alembic migration bridge
  • ✅ FastAPI DI system (get_db, get_redis)
  • ✅ CI pipeline (lint + typecheck + tests + docker build)

This is the foundation. Everything from Phase 2 through Phase 13 sits on top of this.

Next Article

Phase 2: Auth and Security — JWT tokens with 15-minute access + 7-day refresh, bcrypt password hashing, AES-256-GCM encryption for LLM API keys (BYOK pattern), and Redis sliding window rate limiting.

If you're building along: the link to the full code repo will be added soon.


Building this project phase by phase. Every decision explained. Follow along for Phase 2.
