A few weeks ago, I decided to stop just reading about vector databases and actually build something with them.
Not a tutorial clone. Not a copy-paste project.
Something messy, slightly broken, and real.
This is a write-up of what I built, what I learned the hard way, and what I would not do again.
Why I Started This
Everywhere I looked, people were talking about embeddings, semantic search, and AI-powered retrieval.
But most tutorials felt shallow. They showed how to use a library, not why things work or what breaks when you go off-script.
So I set a simple goal:
Build a small vector database project that actually solves a problem.
The Project Idea
I built a semantic search engine for personal notes.
Instead of keyword search, I wanted to search like this:
- “notes about scaling backend”
- “ideas I wrote about startups”
- “that thing about caching I wrote last week”
Even if those exact words weren’t in the note.
The Basic Setup
Here’s what I used:
- Python
- An embedding model (I started with OpenAI embeddings, later tried local ones)
- A vector store (initially FAISS)
- A simple API layer
The flow looked like this:
- Take a note
- Convert it into an embedding (vector)
- Store it in the database
- When searching:
  - Convert query to embedding
  - Find closest vectors
  - Return matching notes
Simple in theory.
Not so simple in practice.
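To make the flow concrete, here's a minimal sketch in plain Python. The `embed` function is a toy stand-in (bag-of-words counts over a tiny vocabulary), not a real embedding model, and the "database" is just a list searched by cosine similarity:

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model. Real embeddings come from
    # a model (OpenAI, sentence-transformers, etc.), not code like this.
    vocab = ["scale", "backend", "cache", "startup", "traffic"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

notes = ["how to scale a backend", "cache invalidation ideas", "startup pitch notes"]
index = [(note, embed(note)) for note in notes]  # the "store" step

def search(query, k=2):
    # The "search" steps: embed the query, rank stored vectors by similarity.
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [note for note, _ in ranked[:k]]
```

Swap `embed` for a real model and the list for FAISS and you have the actual pipeline; the shape of the flow stays the same.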
What I Learned (The Real Stuff)
1. Embeddings Are Not Magic
At first, I thought:
“Same meaning = always similar vectors”
Not true.
Small wording differences sometimes gave weird results.
Example:
- “How to scale a backend” worked well
- “Handling traffic spikes” returned unrelated notes
Lesson:
You can’t rely on embeddings alone. You sometimes need:
- better chunking
- metadata
- re-ranking
2. Chunking Matters More Than I Expected
Initially, I stored full notes as single vectors.
Bad idea.
Long notes diluted meaning.
When I switched to smaller chunks (like 200–500 words), search improved a lot.
But then a new problem appeared:
Too many chunks = noisy results
So there’s a balance:
- Too big → vague results
- Too small → fragmented context
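A basic word-window chunker with overlap is enough to experiment with that balance. The `size` and `overlap` defaults below are just starting points to tune, not recommendations:

```python
def chunk(text, size=300, overlap=50):
    # Split a note into overlapping word windows. Overlap keeps context
    # that would otherwise be cut at a chunk boundary.
    words = text.split()
    if len(words) <= size:
        return [text]
    chunks, step = [], size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks
```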
3. Vector Search Alone Is Not Enough
I assumed nearest neighbor search would solve everything.
It didn’t.
Sometimes results were “technically similar” but not useful.
What helped:
- adding metadata filters
- sorting by both similarity and recency
- sometimes even mixing keyword search
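Here's a sketch of the similarity-plus-recency idea, assuming each result carries a `similarity` score and a `created` timestamp. The `half_life_days` and `weight` knobs are made up for illustration; tuning them against real queries is the actual work:

```python
def rank(results, now, half_life_days=30, weight=0.3):
    # Blend vector similarity with recency: a note loses half its
    # recency score every `half_life_days` days.
    scored = []
    for note in results:
        age_days = (now - note["created"]).days
        recency = 0.5 ** (age_days / half_life_days)  # decays toward 0
        score = (1 - weight) * note["similarity"] + weight * recency
        scored.append((score, note))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored]
```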
4. Performance Sneaks Up on You
At the beginning, everything felt fast.
Then I added more data.
Search slowed down. Memory usage increased.
Things I learned:
- Index type matters (FAISS has options, and they behave very differently)
- You don’t notice problems until you scale even a little
- Testing with 50 records is meaningless
5. Local vs API Embeddings Is a Tradeoff
I tried both:
API-based embeddings
- Easy
- High quality
- Costs money
Local models
- Free after setup
- Slower (on my machine)
- Sometimes lower quality
There’s no “best” option. It depends on:
- your budget
- latency requirements
- privacy needs
Mistakes I Made (So You Don’t Have To)
Treating It Like a Normal Database
This is not SQL.
You don’t query exact matches. You deal with probabilities.
That mindset shift takes time.
Ignoring Evaluation
At first, I just “felt” like results were good.
That’s dangerous.
You need test queries like:
- expected input → expected output
Otherwise you’re just guessing.
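Even a tiny golden set beats eyeballing. Something like this, where `search_fn` is whatever search function you're testing:

```python
def evaluate(search_fn, cases):
    # cases: list of (query, expected_note) pairs - a minimal golden set.
    # Returns the fraction of queries whose expected note shows up
    # in the returned results at all.
    hits = sum(1 for query, expected in cases if expected in search_fn(query))
    return hits / len(cases)
```

Run it after every change to chunking or ranking; a number that moves is worth far more than a vague feeling that results got better.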
Overengineering Too Early
I wasted time trying:
- fancy pipelines
- multiple models
- complex ranking
When a simple setup worked fine.
Not Logging Results
I didn’t log queries at first.
Big mistake.
Logs help you understand:
- what users search
- where results fail
- patterns you didn’t expect
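Logging doesn't need infrastructure; one JSON line per query is enough to start. A minimal sketch (the field names are my own choice, not a standard):

```python
import json
import time

def log_query(path, query, results):
    # Append one JSON line per search - enough to grep later for
    # queries that returned nothing or returned the wrong notes.
    record = {"ts": time.time(), "query": query, "results": results}
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```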
What I’d Do Differently
If I started again, I’d:
- Start with a very small dataset
- Add proper evaluation early
- Keep chunking simple
- Avoid premature optimization
- Focus on real use cases, not benchmarks
Final Thoughts
Building this project changed how I think about search systems.
Vector databases are powerful, but they’re not plug-and-play magic.
You still need:
- good data
- thoughtful design
- constant iteration
If you’re planning to build something similar, my advice is simple:
Don’t try to be perfect. Try to be real.
Build something small. Break it. Fix it. Repeat.
That’s where the actual learning happens.
