How I built a real Pokédex with Postgres, linear algebra, and mass nostalgia
You know the one. Enormous. Blocks the road. Eats all day and sleeps the rest. You can see it perfectly in your head — the round belly, the closed eyes, the absolute commitment to doing nothing. You just can't remember if it's called Snorlax or Munchlax or Relaxo or something else entirely.
This isn't Alzheimer's. This is being 30 and having played Pokémon Red at age 8. You remember everything about these creatures except the one thing a search engine needs: the exact name.
So you type "big sleepy eater" into the search bar.
Zero results.
The database has "An enormous, rotund Pokémon that spends most of its life eating and sleeping" — which is exactly what you meant — but the search engine doesn't know that. It compares letters. It doesn't read.
This is a stupid problem. It's also completely unsolvable with traditional tools. And I spent a week fixing it with linear algebra, graph theory, and a Postgres extension. For 151 Pokémon.
Worth it? Absolutely not. Did I learn something genuinely useful? Actually, yes.
Search Engines Don't Read
Here's something that sounds obvious once you hear it but that most people — including most developers — never think about.
Every search engine works the same way at its core: you type words, it finds those exact words (or close variations) in the data. If the words match, you get results. If they don't, you get nothing. That's it. There's no understanding involved.
This fails constantly:
- You search for "comfy shoes for long days" — zero results, because the product is listed as "ergonomic footwear with memory foam"
- You search for "legendary thunder bird" — zero results, even though Zapdos is right there, literally tagged as legendary, literally Electric/Flying type
- You search for "psychic genetic clone laboratory" — zero results, even though Mewtwo's description says "created from Mew through horrific genetic experiments"
The meanings are identical. The words are different. The search engine shrugs.
This isn't just a Pokémon problem. This is every e-commerce site where users describe what they want instead of knowing what it's called. You just never notice because people silently give up and blame themselves.
The Trick: GPS for Meaning
Here's the idea that makes everything work. And I promise it's simpler than it sounds.
There's a family of AI models — called embedding models — that have been trained on enormous amounts of text (basically the internet) and have learned something remarkable: how to convert any sentence into a set of coordinates.
Think of it like GPS, but for meaning instead of geography.
When you feed "fire lizard with a flame on its tail" into the model, it returns 384 numbers — coordinates in meaning-space. When you feed "Charmander" into the same model, it returns a different set of 384 numbers — but they're remarkably close to the first set. Because the model understands that these two phrases describe the same thing.
Meanwhile, "ghost haunting shadows" produces coordinates that are, mathematically speaking, on a different continent from "friendly goldfish".
The key principle: similar meanings → nearby coordinates. Different meanings → distant coordinates.
That's it. That's the whole trick. Once you have coordinates for meaning, search becomes a geometry problem:
- Pre-compute coordinates for everything in your database.
- When someone searches, compute coordinates for their query.
- Return the entries with the nearest coordinates.
No keyword matching. No synonym dictionaries. The model already understands that "big sleepy eater" and "enormous rotund Pokémon that spends its life eating and sleeping" are the same idea — because it has read the internet.
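The three steps above fit in a few lines. Here's a toy illustration of "nearby coordinates" — the four-dimensional vectors below are invented for the sketch (a real model produces hundreds of dimensions), but the geometry is exactly what the database does:

```python
import numpy as np

# Invented toy coordinates -- a real embedding model would produce
# hundreds of dimensions; four are enough to show the geometry.
catalog = {
    "Snorlax":    np.array([0.9, 0.8, 0.1, 0.0]),  # sleepy, gluttonous
    "Zapdos":     np.array([0.0, 0.1, 0.9, 0.8]),  # electric, legendary
    "Charmander": np.array([0.2, 0.1, 0.1, 0.9]),  # fiery starter
}

def cosine_distance(a, b):
    """Smaller = more similar (this is what pgvector's <=> operator computes)."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.8, 0.9, 0.0, 0.1])  # pretend this encodes "big sleepy eater"
ranked = sorted(catalog, key=lambda name: cosine_distance(query, catalog[name]))
print(ranked[0])  # Snorlax -- the nearest neighbor in this toy space
```

Pre-compute the catalog vectors once, compute the query vector at search time, sort by distance. That's the whole algorithm.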
Wait, That's All?
Here's the part that surprised me.
You don't need a new database. You don't need a subscription to some AI platform. You don't need a PhD. You need Postgres (the database most companies already run) with one extension called pgvector.
Here's the entire setup. If you're not a developer, don't worry about the syntax — just notice how little there is:
One new column on your existing table:
```sql
ALTER TABLE pokemon ADD COLUMN embedding vector(384);
```
One index to make searches fast:
```sql
CREATE INDEX ON pokemon
USING hnsw (embedding vector_cosine_ops);
```
The search query:
```sql
SELECT name FROM pokemon
ORDER BY embedding <=> :query_vector  -- the user's search, converted to coordinates
LIMIT 5;
```
That <=> symbol measures how close two sets of coordinates are. Postgres sorts by closeness and returns the top 5 matches. That's natural language search in four lines of SQL.
No new service to operate. No API key. No machine learning pipeline. You add a column to your existing database, and it suddenly understands what people mean, not just what they type.
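If you're wiring this up from application code, the query vector travels to Postgres as a plain string: pgvector accepts a bracketed text literal like `'[0.1,0.2,0.3]'`. A minimal sketch — the `to_vector_literal` helper is my own naming, and the commented-out part assumes a psycopg cursor and an embedding model you've already loaded:

```python
def to_vector_literal(vec):
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# Hypothetical wiring -- assumes `cur` is a psycopg cursor and `model` an
# embedding model:
#
#   query_vector = model.encode(["big sleepy eater"])[0]
#   cur.execute(
#       "SELECT name FROM pokemon ORDER BY embedding <=> %s::vector LIMIT 5",
#       (to_vector_literal(query_vector),),
#   )
#   print([row[0] for row in cur.fetchall()])

print(to_vector_literal([0.1, 0.2]))  # [0.1,0.2]
```

Passing the vector as a parameter (rather than interpolating it into the SQL string) keeps the query plan cacheable and avoids injection issues, same as any other query parameter.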
So Does It Work?
I built a proper benchmark to find out. 151 Gen 1 Pokémon, each encoded with its name, category, types, and description. Then 16 search queries designed to be deliberately tricky — none of the query words appear in the target Pokémon's description. If this works, it's purely because the model understands meaning.
Here's what happened:
| What I typed | What I wanted | What the database says | Found? |
|---|---|---|---|
| big heavy sleepy gluttonous lazy | Snorlax | "enormous, rotund, eating and sleeping" | #1 |
| legendary thunder lightning bird | Zapdos | legendary, Electric/Flying type | #1 |
| ethereal gaseous invisible ghost haunting | Gastly | "toxic gas, eerie purple haze" | #1 |
| bone skull dead ghost bony | Cubone | "wears skull of deceased mother" | #1 |
| mysterious ancestor pink psychic origins | Mew | "rarest, contains genetic code of every Pokémon" | #1 |
| fire lizard tail flame starter | Charmander | "small orange lizard with flame on tail" | #1 |
| psychic genetic clone laboratory | Mewtwo | "created through genetic experiments" | #1 |
| giant furious blue sea serpent | Gyarados | "fearsome sea serpent, violent temperament" | top 5 |
15 out of 16: correct answer as the very first result. The last one (Gyarados) lands in the top 5. Zero word overlap between queries and data.
And the search doesn't just find the right Pokémon — it understands families:
```
query: 'ethereal gaseous invisible ghost haunting'
1. 0.842  Gastly   [Ghost/Poison]
2. 0.821  Haunter  [Ghost/Poison]
3. 0.809  Gengar   [Ghost/Poison]
```
Gastly, Haunter, Gengar — the ghost evolution line, ranked in order. In 2 milliseconds. From a regular Postgres database.
What the Numbers Actually Mean
I threw a lot of numbers at this benchmark. Here's how to read them without a statistics degree.
The scorecard
Hit@1 — "Was the correct answer the very first result?" This is the metric that matters most. When you search for something, you look at the first result. If it's wrong, the search feels broken, even if the right answer is hiding at position 3.
Hit@5 — "Was the correct answer somewhere in the top 5?" More forgiving. Most people scan a short list. This tells you whether the answer is findable, even if it's not immediately obvious.
MRR@5 (Mean Reciprocal Rank) — a single number that captures where the right answer typically lands. Always #1? MRR is 100%. Always #2? MRR is 50%. Always #5? MRR is 20%. Think of it as "how many results does the user need to scan before finding what they want?"
Recall@10 — this one is different. It doesn't measure search quality at all. It measures whether the index (the thing that makes searches fast) is giving you honest results. 100% means the index isn't cutting corners. Less than 100% means the index is secretly hiding some results from you. More on that shortly.
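These metrics are simple enough to compute yourself. A minimal sketch with toy data (the ranked lists below are invented, not the benchmark's actual results):

```python
def hit_at_k(ranked_ids, correct_id, k):
    """1 if the correct answer appears in the top k results, else 0."""
    return int(correct_id in ranked_ids[:k])

def reciprocal_rank(ranked_ids, correct_id, k=5):
    """1/position of the correct answer, or 0 if it's not in the top k."""
    for pos, rid in enumerate(ranked_ids[:k], start=1):
        if rid == correct_id:
            return 1.0 / pos
    return 0.0

# Toy example: three queries, correct answer at ranks 1, 2, and 1.
results = [
    (["snorlax", "munchlax"], "snorlax"),
    (["haunter", "gastly"],   "gastly"),
    (["zapdos", "moltres"],   "zapdos"),
]
mrr = sum(reciprocal_rank(r, c) for r, c in results) / len(results)
print(f"MRR@5: {mrr:.1%}")  # (1 + 0.5 + 1) / 3 = 83.3%
```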
Four models, two surprises
I tested four AI models. Here are the results:
| Model | Brain size | MRR@5 | Hit@1 | Hit@5 |
|---|---|---|---|---|
| MPNet-base | 109M params | 90.6% | 81% | 100% |
| BGE-small-en | 33M params | 90.6% | 88% | 94% |
| BGE-small (ONNX) | same brain, lighter runtime | 90.6% | 88% | 94% |
| MiniLM-L6 | 23M params | 82.8% | 69% | 100% |
Surprise #1: bigger isn't always better. MPNet is 3× larger than BGE-small, but BGE-small actually gets the right answer first more often (88% vs 81%). The trade-off: MPNet never loses the right answer entirely — it always shows up somewhere in the top 5. BGE-small misses one query completely (Mewtwo).
So: the small model is more precise, the big model is more reliable. Both are excellent. "Just use the biggest model" isn't the right advice.
Surprise #2: MiniLM is the unreliable genius. 100% Hit@5 (it always finds the right answer) but only 69% Hit@1 (it often buries it behind other results). It's the student who knows the material but writes messy exams. Cheap to run, but you'll need to show more than one result.
The query that stumped everyone
The hardest query: "psychic genetic clone laboratory" → Mewtwo.
Mewtwo's actual description: "The most powerful Pokémon in existence, created from Mew through horrific genetic experiments."
The words "clone" and "laboratory" never appear. The model has to reason: "genetic experiments" implies a laboratory. The output of genetic experiments is a clone. That's not word matching — that's understanding implications.
The largest model (MPNet) nails it: first result. The smaller models struggle. This is where brain size actually matters — not on easy queries like "fire lizard" → Charmander, but on queries that require connecting dots the text doesn't explicitly draw.
The Index That Lies to You
This is the most important section if you're actually going to build this.
pgvector (the Postgres extension) ships two ways to index your data. One of them has a default setting that silently destroys your search quality. No error message. No warning. Just wrong results.
IVFFlat splits your data into groups (say, 50 groups). When you search, it only looks at the closest group. With 50 groups and 151 Pokémon, each group has about 3 Pokémon. You're searching 3 out of 151 — 2% of your data.
The query is fast. Because it's barely doing anything.
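For reference, this is roughly what the IVFFlat setup looks like, along with the knob that controls how many groups get searched. `ivfflat.probes` defaults to 1, which is exactly the corner-cutting described above (`lists = 50` here matches the example; you'd tune both for your data):

```sql
-- IVFFlat: data split into 50 lists; only `probes` lists are searched per query
CREATE INDEX ON pokemon
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 50);

-- The default is 1 probed list; raising it trades speed for recall
SET ivfflat.probes = 10;
```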
The benchmark makes the damage obvious:
| Index | Recall | Speed | Search quality (MRR@5) |
|---|---|---|---|
| No index (scan everything) | 100% | 4.0 ms | 90.6% |
| IVFFlat (default settings) | 30% | 3.9 ms | 56.2% |
| HNSW (recommended) | 100% | 2.0 ms | 90.6% |
That middle row: 30% recall means 70% of the correct results are hidden. The search quality drops from 90.6% to 56.2%. And it's not even faster — 3.9ms vs 4.0ms without any index.
HNSW (the other index): 100% recall, identical quality to exhaustive search, and it's actually the fastest of the three at 2.0ms. It builds a navigable graph that connects similar entries, and walks that graph at query time. No configuration needed.
The nasty thing about IVFFlat: your queries still return results. They look normal. There are no errors. You just quietly get the wrong Pokémon, your users think the search is bad, and nobody figures out it's the index configuration.
Use HNSW. Always.
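HNSW has tuning knobs too, but unlike IVFFlat its defaults are generous. For completeness, the main query-time one in pgvector is `hnsw.ef_search` (default 40, the number of candidates considered per search) — you can raise it if you ever see recall dip on a larger dataset:

```sql
-- Optional: widen the candidate list at query time for even higher recall
SET hnsw.ef_search = 100;
```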
Choosing a Model
The model converts text to coordinates. It's the only piece that runs outside Postgres. Two practical choices:
Option A — Maximum quality: all-mpnet-base-v2
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
vector = model.encode(["legendary thunder bird"])
```
109M parameters, 768 dimensions. Requires PyTorch (~1.5 GB install). Loads in 3–4 seconds. 100% Hit@5 in every single test. Best choice for a server that stays running.
Option B — Lightweight: BAAI/bge-small-en-v1.5 via FastEmbed
```python
from fastembed import TextEmbedding

model = TextEmbedding("BAAI/bge-small-en-v1.5")
vector = list(model.embed(["legendary thunder bird"]))
```
33M parameters, 384 dimensions. No PyTorch — runs on ONNX (~100 MB install). Loads in under a second. Slightly weaker on the hardest queries. Best choice for serverless, containers, or anything where startup time matters.
Both output a list of numbers. Both go into the same Postgres column. Both use the same SQL query. The choice is about your deployment, not your architecture.
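Ingestion is the same either way: encode the text, write it back. A sketch of the backfill loop — the `batched` helper is my own, and the UPDATE is commented out since it assumes a live connection and a loaded model:

```python
def batched(items, size):
    """Yield successive chunks so the model encodes many texts per call."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

rows = [(1, "Snorlax. An enormous, rotund Pokemon that eats and sleeps."),
        (2, "Zapdos. A legendary Electric/Flying bird.")]

for chunk in batched(rows, 64):
    texts = [text for _, text in chunk]
    # vectors = model.encode(texts)          # either backend works here
    # for (pk, _), vec in zip(chunk, vectors):
    #     literal = "[" + ",".join(map(str, vec)) + "]"
    #     cur.execute("UPDATE pokemon SET embedding = %s::vector WHERE id = %s",
    #                 (literal, pk))
```

Batching matters: encoding texts one at a time is the slowest possible way to run these models.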
What It Costs
Nothing is free. Here's what you're actually paying for.
Storage. Each row gains 1.5 KB (at 384 dimensions) or 3 KB (at 768). One million rows: 1.5–3 GB extra. Non-trivial, but manageable by modern Postgres standards.
Ingestion speed. Encoding 151 Pokémon takes 2–12 seconds depending on the model. It scales linearly. For a million items, plan for a background job and a coffee break.
Memory. The AI model lives in RAM while it's running. PyTorch models: 200–400 MB. FastEmbed: much less. If you're running serverless, this is your cold-start bottleneck.
Query overhead. Every search converts the user's text to coordinates before asking Postgres. A few milliseconds on a warm server. On a cold start, add the model load time.
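The storage figure is easy to sanity-check: pgvector stores 4 bytes per dimension plus 8 bytes of header per value, so a back-of-the-envelope estimate is one multiplication:

```python
def embedding_storage_bytes(rows, dims, bytes_per_dim=4, header=8):
    """Rough per-column estimate: 4-byte floats plus an 8-byte header per vector."""
    return rows * (dims * bytes_per_dim + header)

for dims in (384, 768):
    gb = embedding_storage_bytes(1_000_000, dims) / 1e9
    print(f"{dims} dims, 1M rows: ~{gb:.1f} GB")
```

This ignores index size (HNSW adds its own graph on top) and TOAST overhead, so treat it as a floor, not a ceiling.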
None of these are dealbreakers. But they're not zero, and you should know what you're signing up for.
When to Outgrow This
This approach works beautifully for thousands to a few million entries. Beyond that:
- Tens of millions of entries with strict latency requirements → look at dedicated vector databases (Qdrant, Weaviate)
- If vector search becomes the primary workload, not a feature → dedicated infrastructure starts to make sense
But those are scaling problems. For most applications — product search, knowledge bases, recommendation engines, and yes, Pokédexes — pgvector inside Postgres is more than enough. Start simple.
Fine Print (Before You Get Too Excited)
A few things I conveniently glossed over while showing you impressive results:
151 is not a lot. This benchmark runs on 151 Pokémon. Your product catalog has 2 million SKUs. Will it scale? Probably. Will the results be as clean? Honestly, I don't know. 151 entries is a demo, not a stress test.
English only. Every model tested here is English. If your users search in French, Japanese, or a mix of both, you need multilingual models — and those are a different conversation with different trade-offs.
Pokémon are easy to search. Think about it: a fire-breathing dragon, an electric mouse, a ghost made of toxic gas. These are absurdly distinctive creatures with exaggerated, unmistakable characteristics. Real-world data — beige office chairs, slightly different insurance policies, 47 variations of the same running shoe — is much harder to distinguish. Don't expect 88% Hit@1 on your SaaS knowledge base.
And the most important thing: it's Postgres. Vector search is a nice trick. But you're still inside a full relational database. You can combine <=> with WHERE type = 'Fire', AND legendary = true, OR generation = 'Gen I'. Filter by category, join with your user table, add a LIMIT based on user preferences — all in the same query. That's not a limitation. That's the whole point. You don't choose between smart search and structured queries. You get both. Because it's a database, baby.
Now Imagine What Else You Could Do
This was 151 Pokémon. A toy. But the technique is the same whether you're searching fictional creatures or anything else. And "anything else" is where it gets interesting.
Let customers describe what they need, not what it's called. A customer walks into a hardware store's website and types "something to hang a heavy mirror on a plaster wall". The product is listed as "toggle bolt M5 zinc-plated 10-pack". No keyword search connects those. Embeddings will. Same story for every retailer where customers describe a problem instead of naming a product.
Find the right section in documentation. You have 2,000 pages of internal docs. A new hire asks "how do I request access to the staging environment?" — the answer is buried in a paragraph titled "Infrastructure Onboarding Procedures, Section 4.2.b". No keyword search will connect those. Embeddings will.
Search scientific conferences by topic. Thousands of posters and abstracts from a medical conference. A researcher is looking for "immune response in patients with liver transplants" — the relevant poster is titled "Hepatic Allograft Tolerance and T-Cell Mediated Immunity". Same idea, completely different vocabulary. Coordinates don't care about jargon.
Detect near-duplicate entries. Customer support tickets, bug reports, patient records — anywhere humans describe the same thing in different words. Encode everything, find entries whose coordinates are suspiciously close, flag them for review. Deduplication without writing a single regex.
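The deduplication idea in sketch form — toy vectors again (real ones come from an embedding model), and the 0.9 threshold is a placeholder you'd tune against your own data:

```python
import numpy as np

def find_near_duplicates(vectors, threshold=0.9):
    """Return index pairs whose cosine similarity exceeds the threshold."""
    # Normalize rows so the dot product IS the cosine similarity
    normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

# Toy data: entries 0 and 1 point the same way, entry 2 doesn't.
vecs = np.array([[1.0, 0.1, 0.0],
                 [0.9, 0.2, 0.0],
                 [0.0, 0.1, 1.0]])
print(find_near_duplicates(vecs))  # [(0, 1)]
```

The double loop is O(n²), fine for thousands of entries; at larger scale you'd use the same HNSW index to find each entry's nearest neighbors instead.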
Search across images and text. Multimodal embedding models (like CLIP) encode both images and text into the same coordinate space. Upload a photo, find similar products. Describe what you see, find matching images. The principle is identical — coordinates for meaning — just applied to pixels as well as words.
Go multilingual. Multilingual models encode "chien", "dog", and "Hund" to nearly the same coordinates. One column. One index. Every language at once.
The point is: what works for Pokémon will work for your domain, with your data, in your language. Maybe not with the same model or the same parameters — what's perfect for short product names might underperform on long medical reports. The models are different. The data is different. The right tuning is different.
That's not a problem. That's an invitation.
Clone the repo. Swap the dataset. Swap the model. Run the benchmark. See what works for your data. The whole setup is a few files and a Docker image — you can have results in an afternoon.
What I Actually Learned
I set out to build a stupid toy project. I ended up understanding something real.
The gap between "search for words" and "search for meaning" was genuinely unsolvable for decades. Synonym lists helped a little. Fancy text analysis helped a little more. But the core problem — that "big sleepy eater" and "enormous rotund Pokémon that spends its life eating and sleeping" mean the same thing — that required a breakthrough.
That breakthrough: convert text to coordinates, store them in a column, sort by distance.
The language understanding? Done by researchers who trained models on the internet. The fast vector search? Done by the pgvector team. The storage, indexing, and querying? Done by Postgres, which has been quietly excellent at this since 1996.
If you want to try it yourself, the full code — benchmark, dataset, search demo — is open source: gitlab.com/nicolas.vbgh/python-pg-embeddings.
The bottom line:

- Use HNSW (not IVFFlat)
- `all-mpnet-base-v2` if you want the best quality
- `BAAI/bge-small-en-v1.5` via FastEmbed if you want the lightest setup that still delivers good quality
- Pack as much meaning as possible into the text you encode
- It's one column. One index.
All of this to help someone remember that the big sleepy one is called Snorlax.
I regret nothing.