DEV Community

Parmod Gandhi

I built a "boring" RAG demo over World Cup data — SQLite, sqlite-vec, and no framework

Most RAG tutorials reach for a vector database and a heavy framework before they’ve answered a single question. I wanted to see how small the whole thing could be — so I built a question-answering demo over real soccer data using nothing but a file-based SQLite database, a vector extension, and an LLM call.
You can try it, free and with no signup: WorldCup.GetToKnowYourOwnData.com

Ask it things like “Who scored in the 2022 World Cup final?” or “How did Morocco’s group stage go?” and it answers in plain English — and points back to the match record it used, so you can verify it rather than trust it.

This post walks through the architecture, which is deliberately unexciting. That’s the point.

What RAG actually is, in one paragraph
Retrieval-Augmented Generation doesn’t change the model. It changes what you put in front of the model when you ask a question. Keep a collection of your own content, find the few pieces most relevant to a question, hand those to an LLM along with the question, and ask it to answer from what you gave it. The model doesn’t have to remember your data — it just reads the snippets you retrieved. Think of a knowledgeable friend with the right page of a handbook open in front of them.
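To make "hand those to an LLM along with the question" concrete, here is a minimal sketch of the prompt-assembly step. It is an illustration, not the demo's actual code: the function name `build_prompt`, the instruction wording, and the placeholder snippets and file paths are all assumptions.

```python
def build_prompt(question: str, snippets: list[tuple[str, str]]) -> str:
    """Assemble a grounded prompt from retrieved (text, source) pairs."""
    # Number each snippet so the model can cite it by index.
    context = "\n\n".join(
        f"[{i}] (source: {source})\n{text}"
        for i, (text, source) in enumerate(snippets, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the bracketed number of each snippet you rely on. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Placeholder snippets; in a real pipeline these come from retrieval.
snippets = [
    ("Example passage about a match.", "matches/example-final.txt"),
    ("Example passage about a group stage.", "matches/example-group.txt"),
]
prompt = build_prompt("Who scored in the final?", snippets)
print(prompt)
```

The model never sees the whole corpus, only the numbered snippets, which is what makes the answer checkable afterwards.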

The whole stack
Here is everything involved:

  1. Chunk the source documents into passages.
  2. Embed each chunk — turn its text into a vector — with an embedding model (I use Voyage AI).
  3. Store the chunks and their vectors in SQLite, using the sqlite-vec extension for vector search.
  4. At query time, embed the question, run a vector similarity search to get the top-k closest chunks, and hand them to an LLM (I use Claude) with a prompt that says: answer only from this context, and cite it.

No vector-DB service. No orchestration framework. The database is a single file you can copy with scp. The retrieval is one SQL query:

SELECT c.text, c.source
FROM chunk_vectors v
JOIN chunks c ON c.chunk_id = v.chunk_id
WHERE v.embedding MATCH :question_vector
ORDER BY distance
LIMIT 8;
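To show what a top-k query like this computes, here is a dependency-free stand-in that uses only Python's standard library. It is not sqlite-vec and not the demo's code: it stores each vector as a float32 BLOB in an ordinary table and ranks rows by brute-force cosine distance in Python. The table layout, the three-dimensional toy vectors, and the choice of cosine distance are assumptions for illustration.

```python
import math
import sqlite3
import struct


def pack(vector):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack(blob):
    """Inverse of pack(): bytes back to a tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE chunks ("
    "chunk_id INTEGER PRIMARY KEY, text TEXT, source TEXT, embedding BLOB)"
)
# Toy rows; real embeddings would come from an embedding model.
rows = [
    ("passage A", "doc-a.txt", [1.0, 0.0, 0.0]),
    ("passage B", "doc-b.txt", [0.0, 1.0, 0.0]),
    ("passage C", "doc-c.txt", [0.9, 0.1, 0.0]),
]
db.executemany(
    "INSERT INTO chunks (text, source, embedding) VALUES (?, ?, ?)",
    [(text, source, pack(vector)) for text, source, vector in rows],
)


def top_k(question_vector, k=8):
    """Score every stored chunk and return the k closest."""
    scored = [
        (cosine_distance(question_vector, unpack(blob)), text, source)
        for text, source, blob in db.execute(
            "SELECT text, source, embedding FROM chunks"
        )
    ]
    return sorted(scored)[:k]


results = top_k([1.0, 0.0, 0.0], k=2)
for distance, text, source in results:
    print(f"{distance:.3f}  {source}  {text}")
```

A full scan like this is slow for large corpora, but at a few thousand chunks it is a useful baseline for checking what an index returns.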

Why SQLite instead of a vector database
For a corpus of a few thousand chunks — which covers an enormous number of real-world use cases — a dedicated vector database is solving a scale problem you don’t have. SQLite with sqlite-vec gives you vector search in-process: zero servers, zero network hops, and a database that is a single portable file. Back it up by copying it. Deploy it by copying it. When you genuinely outgrow it you’ll know — and most projects never do.

The honest answer to “what framework should I use for RAG?” is often: none. The moving parts are a chunker, an embedder, a vector index, and a prompt. All four are visible here.
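Of those four parts, the chunker is the easiest to underestimate, so here is a minimal sketch of one. It splits on whitespace into fixed-size windows with a small overlap. The function name, the word-based windowing, and the default sizes are assumptions, not necessarily how the demo chunks its documents.

```python
def chunk_text(text: str, max_words: int = 120, overlap: int = 20) -> list[str]:
    """Split text into overlapping word windows."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text,
        # so no trailing chunk is made only of overlap.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, max_words=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing a few words twice.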

The data
It runs on free, open data:

  1. StatsBomb open data for completed tournaments — the 2022 World Cup, Euro 2024, and Copa América 2024 — with full match detail: shots, expected goals, scorers.
  2. Openfootball for the 2026 World Cup schedule and results as they fill in (open data, updated the next day rather than live in-game).

Each match becomes a few readable text documents, which get chunked and embedded like anything else. The corpus is just files on disk.
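As an illustration of "each match becomes a few readable text documents", the sketch below renders one match record as plain text. The dictionary keys and the placeholder teams and players are hypothetical. They are not the StatsBomb or Openfootball schema, and the demo's real documents may look different.

```python
def match_to_document(match: dict) -> str:
    """Render one match record as a short, readable text document."""
    lines = [
        f"{match['competition']}, {match['stage']}, {match['date']}",
        f"{match['home']} {match['home_goals']} - "
        f"{match['away_goals']} {match['away']}",
    ]
    # One line per goal, so a question about scorers can match directly.
    for goal in match.get("goals", []):
        lines.append(
            f"Goal: {goal['player']} ({goal['team']}), minute {goal['minute']}"
        )
    return "\n".join(lines)


# Placeholder record, not real match data.
match = {
    "competition": "Example Cup",
    "stage": "Final",
    "date": "2000-01-01",
    "home": "Team A",
    "away": "Team B",
    "home_goals": 2,
    "away_goals": 1,
    "goals": [{"player": "Player One", "team": "Team A", "minute": 12}],
}
document = match_to_document(match)
print(document)
```

Writing facts as full sentences or labelled lines, rather than raw JSON, tends to give the embedding model and the LLM something closer to the way questions are phrased.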

Cloud today, local tomorrow — the part I care about most
The demo calls a cloud model (Claude) for generation. But the LLM is the one part of a RAG pipeline that is genuinely swappable: nothing else in the system cares which model answers. Change two lines of config and the exact same pipeline runs against a local model with Ollama — so the whole thing can run on one machine with no data leaving it. That matters for the real reason most people want RAG over their own documents: privacy. A lawyer’s contracts, a doctor’s records, a company’s internal documents — none of that should reach a cloud API.
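One way to keep the model swappable is to treat generation as a plain function from prompt to text and pick the implementation from config. The sketch below is an assumption about how that could look, not the demo's code. The Ollama backend posts to Ollama's local `/api/generate` endpoint, the model name is a placeholder, and the `stub` backend exists only so the example runs without any model installed. A cloud backend would be one more function with the same shape.

```python
import json
import urllib.request
from typing import Callable

Generator = Callable[[str], str]


def ollama_generator(model: str, host: str = "http://localhost:11434") -> Generator:
    """Return a generator that calls a local Ollama server."""
    def generate(prompt: str) -> str:
        body = json.dumps(
            {"model": model, "prompt": prompt, "stream": False}
        ).encode("utf-8")
        request = urllib.request.Request(
            f"{host}/api/generate",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())["response"]
    return generate


def echo_generator() -> Generator:
    """Offline stand-in so the sketch runs without any model."""
    return lambda prompt: f"(stub answer for a {len(prompt)}-character prompt)"


def make_generator(config: dict) -> Generator:
    """Choose the backend from config; nothing else in the pipeline changes."""
    if config["backend"] == "ollama":
        return ollama_generator(config["model"])
    if config["backend"] == "stub":
        return echo_generator()
    raise ValueError(f"unknown backend: {config['backend']}")


def answer(question: str, context: str, generate: Generator) -> str:
    prompt = (
        "Answer only from this context and cite it.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return generate(prompt)


# Placeholder config; switch "backend" to "ollama" to use a local model.
config = {"backend": "stub", "model": "llama3.2"}
print(answer("Who won?", "Example context.", make_generator(config)))
```

Retrieval, chunking, and prompt assembly never import a model client, so swapping backends touches only the config.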

This live soccer demo is the cheap, public proof that the pipeline works. The same architecture, pointed at a local model, is what you’d actually use for private data.

What it’s good at — and not
RAG shines when an answer lives in one or two pieces of your content: what was the score of X, who scored in Y. It’s weak on questions that require synthesizing across your entire corpus at once — it only sees what it retrieves. And it can still be wrong: retrieval can miss, or the model can misread what it got. That’s exactly why every answer in the demo cites its source. Grounding helps enormously; verification is still yours.
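Part of that verification can be automated cheaply: check that every source the answer cites was actually among the retrieved chunks. The sketch below assumes a hypothetical citation format of `[source: name]` in the model's answer. The demo's real citation format is not shown in this post, so treat the pattern as a placeholder.

```python
import re


def unknown_citations(answer: str, retrieved_sources: list[str]) -> list[str]:
    """Return cited sources that were not part of the retrieved context."""
    cited = re.findall(r"\[source: ([^\]]+)\]", answer)
    return [source for source in cited if source not in retrieved_sources]


retrieved = ["doc-a.txt", "doc-b.txt"]
good = "Team A won 2-1 [source: doc-a.txt]."
bad = "Team A won 2-1 [source: doc-z.txt]."

print(unknown_citations(good, retrieved))
print(unknown_citations(bad, retrieved))
```

A check like this catches invented sources, but it cannot tell whether the model read a real source correctly. That part still needs a human looking at the cited record.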

Try it, or dig in
Demo: WorldCup.GetToKnowYourOwnData.com — free, no signup. Try to break it and tell me where retrieval falls down.

The demo is the worked example from a book I wrote on building your own RAG end to end in Delphi and Python, Get to Know Your Own Data with RAG. The companion code will be free on GitHub when the book is published this month.

If you take one thing from this: before you install a vector database and a framework, try the boring version. A file, a SQL query, and a prompt go a remarkably long way.
