<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Parmod Gandhi</title>
    <description>The latest articles on DEV Community by Parmod Gandhi (@gandhipk).</description>
    <link>https://dev.to/gandhipk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3981611%2F3ee3145e-d09e-47e6-9205-074a80292207.jpg</url>
      <title>DEV Community: Parmod Gandhi</title>
      <link>https://dev.to/gandhipk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gandhipk"/>
    <language>en</language>
    <item>
      <title>I built a "boring" RAG demo over World Cup data — SQLite, sqlite-vec, and no framework</title>
      <dc:creator>Parmod Gandhi</dc:creator>
      <pubDate>Fri, 12 Jun 2026 17:30:14 +0000</pubDate>
      <link>https://dev.to/gandhipk/i-built-a-boring-rag-demo-over-world-cup-data-sqlite-sqlite-vec-and-no-framework-3l21</link>
      <guid>https://dev.to/gandhipk/i-built-a-boring-rag-demo-over-world-cup-data-sqlite-sqlite-vec-and-no-framework-3l21</guid>
      <description>&lt;p&gt;Most RAG tutorials reach for a vector database and a heavy framework before they’ve answered a single question. I wanted to see how small the whole thing could be — so I built a question-answering demo over real soccer data using nothing but a file-based SQLite database, a vector extension, and an LLM call.&lt;br&gt;
You can try it, free and with no signup: WorldCup.GetToKnowYourOwnData.com&lt;/p&gt;

&lt;p&gt;Ask it things like “Who scored in the 2022 World Cup final?” or “How did Morocco’s group stage go?” and it answers in plain English — and points back to the match record it used, so you can verify it rather than trust it.&lt;/p&gt;

&lt;p&gt;This post walks through the architecture, which is deliberately unexciting. That’s the point. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What RAG actually is, in one paragraph&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Retrieval-Augmented Generation doesn’t change the model. It changes what you put in front of the model when you ask a question. Keep a collection of your own content, find the few pieces most relevant to a question, hand those to an LLM along with the question, and ask it to answer from what you gave it. The model doesn’t have to remember your data — it just reads the snippets you retrieved. Think of a knowledgeable friend with the right page of a handbook open in front of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The whole stack&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Here is everything involved:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;  Chunk the source documents into passages.&lt;/li&gt;
&lt;li&gt;  Embed each chunk — turn its text into a vector — with an embedding model (I use Voyage AI).&lt;/li&gt;
&lt;li&gt;  Store the chunks and their vectors in SQLite, using the sqlite-vec extension for vector search.&lt;/li&gt;
&lt;li&gt;  At query time, embed the question, run a vector similarity search to get the top-k closest chunks, and hand them to an LLM (I use Claude) with a prompt that says: answer only from this context, and cite it.
No vector-DB service. No orchestration framework. The database is a single file you can copy with scp. The retrieval is one SQL query:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;SELECT c.text, c.source&lt;br&gt;
FROM chunk_vectors v&lt;br&gt;
JOIN chunks c ON c.chunk_id = v.chunk_id&lt;br&gt;
WHERE v.embedding MATCH :question_vector&lt;br&gt;
ORDER BY distance&lt;br&gt;
LIMIT 8;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Why SQLite instead of a vector database&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
For a corpus of a few thousand chunks — which covers an enormous number of real-world use cases — a dedicated vector database is solving a scale problem you don’t have. SQLite with sqlite-vec gives you vector search in-process: zero servers, zero network hops, and a database that is a single portable file. Back it up by copying it. Deploy it by copying it. When you genuinely outgrow it you’ll know — and most projects never do.&lt;/p&gt;

&lt;p&gt;The honest answer to “what framework should I use for RAG?” is often: none. The moving parts are a chunker, an embedder, a vector index, and a prompt. All four are visible here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;The data&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
It runs on free, open data:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;  StatsBomb open data for completed tournaments — the 2022 World Cup, Euro 2024, and Copa América 2024 — with full match detail: shots, expected goals, scorers.&lt;/li&gt;
&lt;li&gt;  Openfootball for the 2026 World Cup schedule and results as they fill in (open data, next-day, not live in-game).&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each match becomes a few readable text documents, which get chunked and embedded like anything else. The corpus is just files on disk.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Cloud today, local tomorrow — the part I care about most&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
The demo calls a cloud model (Claude) for generation. But the LLM is the one part of a RAG pipeline that is genuinely swappable: nothing else in the system cares which model answers. Change two lines of config and the exact same pipeline runs against a local model with Ollama — so the whole thing can run on one machine with no data leaving it. That matters for the real reason most people want RAG over their own documents: privacy. A lawyer’s contracts, a doctor’s records, a company’s internal documents — none of that should reach a cloud API.&lt;/p&gt;

&lt;p&gt;This live soccer demo is the cheap, public proof that the pipeline works. The same architecture, pointed at a local model, is what you’d actually use for private data.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;What it’s good at — and not&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
RAG shines when an answer lives in one or two pieces of your content: what was the score of X, who scored in Y. It’s weak on questions that require synthesizing across your entire corpus at once — it only sees what it retrieves. And it can still be wrong: retrieval can miss, or the model can misread what it got. That’s exactly why every answer in the demo cites its source. Grounding helps enormously; verification is still yours.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Try it, or dig in&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
Demo: WorldCup.GetToKnowYourOwnData.com — free, no signup. Try to break it and tell me where retrieval falls down.&lt;/p&gt;

&lt;p&gt;The demo is the worked example from a book I wrote on building your own RAG end to end (in Delphi, and Python) — Get to Know Your Own Data with RAG — and the companion code will be free on GitHub (when book is published this month).&lt;/p&gt;

&lt;p&gt;If you take one thing from this: before you install a vector database and a framework, try the boring version. A file, a SQL query, and a prompt go a remarkably long way.&lt;/p&gt;

</description>
      <category>rag</category>
      <category>ai</category>
      <category>sqlite</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
