๐Ÿง‘โ€๐Ÿš€ Mission Accomplished: How an Engineer-Astronaut Prepared Metaโ€™s CRAG Benchmark for Launch in Docker

Every ML system is like a spacecraft: powerful, intricate, and temperamental.
But without telemetry, you have no idea where it's headed.

🌌 Introduction

The CRAG (Comprehensive RAG Benchmark) from Meta AI is the control panel for Retrieval-Augmented Generation systems.
It measures how well model responses stay grounded in facts, remain robust under noise, and maintain contextual relevance.

As is often the case with research projects, CRAG required engineering adaptation to operate reliably in a modern environment:
incompatible library versions, dependency conflicts, unclear paths, and manual launch steps.

🧰 I wanted to bring CRAG to a state where it could be launched with a single command: no dependency chaos, no manual fixes.
The result is a fully reproducible Dockerized environment, available here:

👉 github.com/astronaut27/CRAG_with_Docker

🚀 What I Improved

In the original build, several issues made CRAG difficult to run:

  • 🔧 conflicting library versions;
  • 📦 an incorrect PYTHONPATH that broke the mock-api launch;
  • ⚙️ no unified, reproducible start-up workflow.

Now, everything comes to life with a single command:

docker-compose up --build

After building, two containers start automatically:

  • ๐Ÿ›ฐ๏ธ mock-api โ€” an emulator for web search and Knowledge Graph APIs;
  • ๐Ÿš€ crag-app โ€” the main container with the benchmark and built-in baseline models.

🧱 Pre-Launch Preparation: Handling the Mission Artifacts

Before firing up the Docker build, make sure all mission artifacts (the large data and model files) are present locally.

Because CRAG includes files over 100 MB, it uses Git Large File Storage (LFS). Without these files, your container won't initialize.

So the first command in your console is essentially fueling the ship with data:

git lfs pull
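If you skip this step, the large files remain as small text stubs: Git LFS replaces un-pulled content with pointer files whose first line is the LFS spec URL. A quick pre-flight check can spot them (a hypothetical helper of mine, not part of the repo):

```python
# Detect Git LFS pointer files that were never replaced by real content.
# An un-pulled LFS file is a tiny text stub starting with the spec URL below.
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_pointer(path: Path) -> bool:
    """True if the file is still an un-pulled LFS pointer stub."""
    try:
        with path.open("rb") as f:
            return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
    except OSError:
        return False

def find_unpulled(root: str) -> list:
    """List every pointer stub under root; a non-empty result means run `git lfs pull`."""
    return [p for p in Path(root).rglob("*") if p.is_file() and is_lfs_pointer(p)]
```

Running `find_unpulled(".")` in the repo root before `docker-compose up` tells you immediately whether the fuel tanks are actually full.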

🧩 How It Works

๐Ÿ“ก โš™๏ธ CRAG in Autonomous Mode

  • mock-api: simulates the external data sources (Web Search, KG API) used by the RAG system.
  • crag-app: the main container running the benchmark and the model used for response generation (a dummy model at this stage).
  • local_evaluation.py: coordinates the pipeline, calls the mock API, and handles metric evaluation.
  • ChatGPT: serves as an LLM judge that evaluates generated responses against CRAG's metrics.
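In autonomous mode the generator really is a stub. A minimal sketch of what such a dummy model can look like, assuming a `batch_generate_answer(batch)` interface (the method name and batch shape are my assumptions; the actual baseline API in the CRAG repo may differ):

```python
# A stand-in generator in the spirit of CRAG's dummy baseline: it ignores any
# retrieved context and always answers "i don't know", which is a safe,
# non-hallucinating default for wiring up the pipeline end to end.
class DummyModel:
    def batch_generate_answer(self, batch: dict) -> list:
        """Return one answer per query in the batch (interface is assumed)."""
        queries = batch.get("query", [])
        return ["i don't know" for _ in queries]
```

A refusal-only model scores zero hallucinations by construction, which makes it a useful calibration point for the telemetry described next.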

🧠 What CRAG Measures: The Telemetry Dashboard

CRAG reports quantitative indicators, a flight log of your system after a test mission:

  • total: Total number of evaluated examples.
  • n_correct: Count of responses that are fully supported by retrieved context.
  • n_hallucination: Number of responses containing unsupported or invented facts.
  • n_miss: Responses missing key information or empty answers.
  • accuracy / score: Overall accuracy, the ratio of correct responses (n_correct / total).
  • hallucination: Ratio = n_hallucination / total.
  • missing: Ratio = n_miss / total.

💡 These metrics are the sensors on your RAG ship's dashboard.
If any of them start flashing red, it's time to check the model's engine.
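The ratio metrics follow directly from the raw counts; a small sketch of the arithmetic (the helper name is mine, not CRAG's):

```python
# Turn raw judge counts into the dashboard ratios CRAG reports.
def summarize(total: int, n_correct: int, n_hallucination: int, n_miss: int) -> dict:
    """Compute accuracy, hallucination, and missing ratios from the counts."""
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        "total": total,
        "n_correct": n_correct,
        "n_hallucination": n_hallucination,
        "n_miss": n_miss,
        "accuracy": n_correct / total,
        "hallucination": n_hallucination / total,
        "missing": n_miss / total,
    }
```

For example, 6 correct, 3 hallucinated, and 1 missed answer out of 10 yields accuracy 0.6, hallucination 0.3, and missing 0.1.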

🧱 Docker Architecture

version: '3.8'

services:
  # Mock API service for RAG data
  mock-api:
    build:
      context: ../mock_api
      dockerfile: ../deployments/Dockerfile.mock-api
    container_name: crag-mock-api
    ports:
      - "8000:8000"
    volumes:
      - ../mock_api/cragkg:/app/cragkg
    environment:
      - PYTHONPATH=/app
    networks:
      - crag-network
    restart: unless-stopped

  # CRAG application container
  crag-app:
    build:
      context: ..
      dockerfile: deployments/Dockerfile.crag-app
    container_name: crag-app
    depends_on:
      - mock-api
    environment:
      # OpenAI for evaluation (optional)
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      # Mock API connection (Docker service)
      - CRAG_MOCK_API_URL=http://mock-api:8000
      # Evaluation model
      - EVALUATION_MODEL_NAME=${EVALUATION_MODEL_NAME:-gpt-4-0125-preview}
    volumes:
      # Mount large data directories (read-only)
      - ../data:/app/data:ro
      - ../results:/app/results
      - ../example_data:/app/example_data:ro
      # Tokenizer (if needed)
      - ../tokenizer:/app/tokenizer:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    networks:
      - crag-network
    stdin_open: true
    tty: true
    command: ["python", "local_evaluation.py"]

networks:
  crag-network:
    driver: bridge
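Inside crag-app, the evaluation code can pick these settings up with plain environment lookups; the fallback values below mirror the `${VAR:-default}` substitution syntax in the compose file (the helper name and the localhost default are my assumptions for illustration):

```python
import os

# Read the same settings docker-compose injects into crag-app.
def load_config(env=None) -> dict:
    """Collect CRAG runtime settings from the environment (or a test dict)."""
    e = os.environ if env is None else env
    return {
        # Inside the compose network the service name resolves to the mock API.
        "mock_api_url": e.get("CRAG_MOCK_API_URL", "http://localhost:8000"),
        "evaluation_model": e.get("EVALUATION_MODEL_NAME", "gpt-4-0125-preview"),
        "openai_api_key": e.get("OPENAI_API_KEY"),  # optional: only the LLM judge needs it
    }
```

Keeping defaults in one place means the same script runs unchanged inside the container and on the host during debugging.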

๐Ÿช Why This Matters

RAG systems are quickly becoming the core engines of modern LLM-based products.
CRAG allows engineers to evaluate their reliability and factual grounding before shipping to production.

This Docker build transforms Meta AIโ€™s research benchmark into a practical engineering environment:

  • 📦 fully isolated and reproducible;
  • 🧠 runnable locally or in CI pipelines;
  • 🚀 easily extendable with your own models (for example, via LM Studio, coming in the next mission).
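As an illustration of the CI point, a minimal GitHub Actions job could look like this (a hypothetical workflow sketch, not part of the repo; the compose file path and secret name are assumptions):

```yaml
# .github/workflows/crag.yml (hypothetical sketch)
name: crag-benchmark
on: [push]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          lfs: true   # fetch the large data files, same role as `git lfs pull`
      - name: Run CRAG
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: docker compose -f deployments/docker-compose.yml up --build --abort-on-container-exit
```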

🔭 The Next Mission

Right now, CRAG runs on its built-in baselines, a test flight before mounting the real engine.
The next step is integrating the LM Studio API and evaluating a live LLM within the same container setup.
That will be Mission II 🚀

🧭 Mission Summary

"Sometimes engineering magic isn't about building a brand-new ship,
but about preparing an existing one for its next flight."

CRAG now launches reliably, telemetry is stable, and the mission is a success.

Next up: integrating LM Studio and real models.
For now, the ship holds a steady course. 🪐

🔗 Mission Repository

📦 github.com/astronaut27/CRAG_with_Docker

📜 License
CRAG is distributed under the MIT License and was developed by Meta AI / Facebook Research.
All modifications in CRAG_with_Docker preserve the original copyright notices.
