Every ML system is like a spacecraft: powerful, intricate, and temperamental.
But without telemetry, you have no idea where it's headed.
Introduction
The CRAG (Comprehensive RAG Benchmark) from Meta AI is the control panel for Retrieval-Augmented Generation systems.
It measures how well model responses stay grounded in facts, remain robust under noise, and maintain contextual relevance.
As is often the case with research projects, CRAG required engineering adaptation to operate reliably in a modern environment:
incompatible library versions, dependency conflicts, unclear paths, and manual launch steps.
I wanted to bring CRAG to a state where it could be launched with a single command: no dependency chaos, no manual fixes.
The result is a fully reproducible Dockerized environment, available here:
github.com/astronaut27/CRAG_with_Docker
What I Improved
In the original build, several issues made CRAG difficult to run:
- Conflicting library versions;
- An incorrect PYTHONPATH that broke the mock-API launch;
- No unified, reproducible start-up workflow.
Now, everything comes to life with a single command:
docker-compose up --build
After building, two containers start automatically:
- mock-api: an emulator for the web search and Knowledge Graph APIs;
- crag-app: the main container with the benchmark and built-in baseline models (a quick reachability check follows below).
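To confirm lift-off, you can ping the mock API from the host. The snippet below is a minimal smoke test; it assumes only the 8000:8000 port mapping from the compose file shown further down and no particular endpoint path, so any HTTP response at all counts as a sign of life.

# smoke_test.py - quick check that the mock-api container is reachable.
# Assumes only the 8000:8000 port mapping from docker-compose.yml; no specific
# endpoint path is assumed, so even a 404 counts as "alive".
import urllib.error
import urllib.request

MOCK_API_URL = "http://localhost:8000/"

try:
    with urllib.request.urlopen(MOCK_API_URL, timeout=5) as resp:
        print(f"mock-api is up (HTTP {resp.status})")
except urllib.error.HTTPError as exc:
    # The server answered, just not with a 2xx status - still proof of life.
    print(f"mock-api is up (HTTP {exc.code})")
except urllib.error.URLError as exc:
    print(f"mock-api is not reachable: {exc.reason}")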
Pre-Launch Preparation: Handling the Mission Artifacts
Before firing up the Docker build, make sure all mission artifacts (the large data and model files) are present locally.
Because CRAG includes files over 100 MB, it uses Git Large File Storage (LFS). Without these files, your containers won't initialize.
So the first command in your console is essentially fueling the ship with data:
git lfs pull
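If you want to verify that the pull actually replaced the LFS pointer stubs with real files, a small check like the one below helps. It is my own helper, not part of the benchmark; it relies only on the standard Git LFS pointer header and scans the data/ directory that is later mounted into the crag-app container.

# check_lfs.py - verify that Git LFS files were pulled, not left as pointer stubs.
# This helper is not part of CRAG; it only relies on the standard LFS pointer header.
from pathlib import Path

LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"

def find_pointer_stubs(root: str = "data") -> list[Path]:
    """Return files under `root` that still look like LFS pointer stubs."""
    stubs = []
    for path in Path(root).rglob("*"):
        # Pointer files are tiny text files, so skip anything larger.
        if path.is_file() and path.stat().st_size < 200:
            if path.read_bytes().startswith(LFS_POINTER_HEADER):
                stubs.append(path)
    return stubs

if __name__ == "__main__":
    stubs = find_pointer_stubs()
    if stubs:
        print("Run `git lfs pull` first - these files are still pointer stubs:")
        for p in stubs:
            print(f"  {p}")
    else:
        print("All good: no LFS pointer stubs found under data/.")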
How It Works
CRAG in Autonomous Mode
- mock-api: simulates the external data sources (web search, KG API) used by the RAG system.
- crag-app: the main container running the benchmark and the model used for response generation (a dummy model at this stage).
- local_evaluation.py: coordinates the pipeline, calls the mock API, and handles metric evaluation (a sketch of this loop follows the list).
- ChatGPT: serves as an LLM judge that evaluates generated responses against CRAG's metrics.
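For intuition, here is a minimal sketch of that loop in plain Python. Every name in it is illustrative rather than CRAG's actual API; the real orchestration, prompts, and judging logic live in local_evaluation.py.

# Illustrative sketch of the evaluation loop - not CRAG's real code.
# `retrieve`, `generate`, and `judge` are hypothetical callables standing in for
# the mock API, the baseline model, and the ChatGPT judge, respectively.
def evaluate(examples, retrieve, generate, judge):
    counts = {"total": 0, "n_correct": 0, "n_hallucination": 0, "n_miss": 0}
    for ex in examples:
        context = retrieve(ex["query"])                     # mock web search / KG API
        answer = generate(ex["query"], context)             # dummy baseline for now
        verdict = judge(ex["query"], answer, ex["answer"])  # "correct" | "hallucination" | "miss"
        counts["total"] += 1
        counts[f"n_{verdict}"] += 1
    return counts

# Tiny stand-in run, just so the sketch executes end to end:
if __name__ == "__main__":
    examples = [{"query": "Who wrote Dune?", "answer": "Frank Herbert"}]
    print(evaluate(
        examples,
        retrieve=lambda q: ["Dune is a 1965 novel by Frank Herbert."],
        generate=lambda q, ctx: "Frank Herbert",
        judge=lambda q, a, gold: "correct" if gold.lower() in a.lower() else "miss",
    ))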
What CRAG Measures: The Telemetry Dashboard
CRAG reports quantitative indicators, a flight log of your system after a test mission:
- total: Total number of evaluated examples.
- n_correct: Count of responses that are fully supported by retrieved context.
- n_hallucination: Number of responses containing unsupported or invented facts.
- n_miss: Responses missing key information or empty answers.
- accuracy / score: Overall accuracy, the ratio of correct responses (n_correct / total).
- hallucination: Ratio = n_hallucination / total.
- missing: Ratio = n_miss / total.
These metrics are the sensors on your RAG ship's dashboard.
If any of them starts flashing red, it's time to check the model's engine. (The arithmetic behind the ratios is sketched right below.)
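In code, those three ratios are just divisions over the counters above. The snippet below mirrors the definitions from the list; it is plain arithmetic, not CRAG's own reporting code.

# Turning the raw counters into the dashboard ratios described above.
def dashboard(total: int, n_correct: int, n_hallucination: int, n_miss: int) -> dict:
    if total == 0:
        raise ValueError("No evaluated examples - nothing to report.")
    return {
        "accuracy": n_correct / total,
        "hallucination": n_hallucination / total,
        "missing": n_miss / total,
    }

# Example run: 100 questions, 62 correct, 23 hallucinated, 15 missed.
print(dashboard(total=100, n_correct=62, n_hallucination=23, n_miss=15))
# {'accuracy': 0.62, 'hallucination': 0.23, 'missing': 0.15}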
Docker Architecture
version: '3.8'

services:
  # Mock API service for RAG data
  mock-api:
    build:
      context: ../mock_api
      dockerfile: ../deployments/Dockerfile.mock-api
    container_name: crag-mock-api
    ports:
      - "8000:8000"
    volumes:
      - ../mock_api/cragkg:/app/cragkg
    environment:
      - PYTHONPATH=/app
    networks:
      - crag-network
    restart: unless-stopped

  # CRAG application container
  crag-app:
    build:
      context: ..
      dockerfile: deployments/Dockerfile.crag-app
    container_name: crag-app
    depends_on:
      - mock-api
    environment:
      # OpenAI for evaluation (optional)
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      # Mock API connection (Docker service)
      - CRAG_MOCK_API_URL=http://mock-api:8000
      # Evaluation model
      - EVALUATION_MODEL_NAME=${EVALUATION_MODEL_NAME:-gpt-4-0125-preview}
    volumes:
      # Mount large data directories (read-only)
      - ../data:/app/data:ro
      - ../results:/app/results
      - ../example_data:/app/example_data:ro
      # Tokenizer (if needed)
      - ../tokenizer:/app/tokenizer:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - crag-network
    stdin_open: true
    tty: true
    command: ["python", "local_evaluation.py"]

networks:
  crag-network:
    driver: bridge
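The two ${...} substitutions in the crag-app environment are read from your shell or from an .env file that Docker Compose picks up in the project directory. Something like the following, with a placeholder key of your own:

# .env (placeholder values - substitute your own OpenAI key)
OPENAI_API_KEY=your-openai-api-key
# Optional - gpt-4-0125-preview is already the default in docker-compose.yml
EVALUATION_MODEL_NAME=gpt-4-0125-preview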
Why This Matters
RAG systems are quickly becoming the core engines of modern LLM-based products.
CRAG allows engineers to evaluate their reliability and factual grounding before shipping to production.
This Docker build transforms Meta AIโs research benchmark into a practical engineering environment:
- fully isolated and reproducible;
- runnable locally or in CI pipelines;
- easily extendable with your own models (for example, via LM Studio, coming in the next mission).
The Next Mission
Right now, CRAG runs on its built-in baselines: a test flight before mounting the real engine.
The next step is integrating the LM Studio API and evaluating a live LLM within the same container setup.
That will be Mission II.
Mission Summary
"Sometimes engineering magic isn't about building a brand-new ship,
but about preparing an existing one for its next flight."
CRAG now launches reliably, telemetry is stable, and the mission is a success.
Next up: integrating LM Studio and real models.
For now, the ship holds a steady course.
Mission Repository
github.com/astronaut27/CRAG_with_Docker
License
CRAG is distributed under the MIT License, developed by Meta AI / Facebook Research.
All modifications in CRAG_with_Docker preserve the original copyright notices.