
Li-Hsuan Lung

Posted on • Originally published at blog.projectbrain.tools

Semantic Search — How ProjectBrain Finds What You Mean

The filing cabinet problem

Imagine your project's knowledge base as a massive library with thousands of books, each one containing facts, decisions, and lessons learned by your team. The challenge? There’s no universal catalog. Every book is shelved by whatever label the author thought made sense at the time.

When you need to find something, you rarely remember the exact phrase that was used. You search for "token expiration" and miss the entry titled "auth session handling." You search for "rate limit" and miss the fact logged as "API throttling ceiling is 1000 req/min." The answer is there. You just can’t reach it.

How most search works — and where it falls short

Most search systems operate on exact word matching. The technical term is lexical search.

The idea is simple: take the words in your query, find documents that contain those words, and rank them by how often the words appear.

If you search for "rate limit," you get back entries that literally contain the words "rate" and "limit." If someone logged a fact called "API throttling ceiling is 1000 requests per minute," you won't find it — even though it's exactly what you were looking for.

Lexical search has real strengths. It's fast, reliable, and perfect for exact identifiers. If you need to find a specific ticket number, an error code, or a function name, word-matching is what you want.

But for a knowledge base full of human-authored notes, decisions, and procedures, literal word matching misses half the content.
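To make the word-matching idea concrete, here's a minimal term-frequency ranker. This is a toy sketch, not ProjectBrain's actual lexical engine: notice that for the query "rate limit," the throttling entry scores zero and never surfaces.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def lexical_search(query, docs):
    """Rank docs by how often they contain the query's words.

    Docs containing none of the query terms are dropped entirely --
    which is exactly the blind spot described above.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, doc))
    return [doc for score, doc in sorted(scored, reverse=True)]

docs = [
    "Rate limit is enforced per API key",
    "API throttling ceiling is 1000 requests per minute",
]
print(lexical_search("rate limit", docs))
# Only the first doc comes back; the throttling entry is invisible.
```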

A different approach: search by meaning

In recent years, semantic search powered by vector embeddings has become accessible and practical for most teams.

Here is the idea. Modern AI models can read a piece of text and produce a numerical fingerprint — a list of hundreds of numbers that represents the meaning of the text. Similar meanings produce similar fingerprints. Different meanings produce very different ones.

When you store a fact in ProjectBrain, we run it through OpenAI's embedding model and save this numerical fingerprint alongside the text. When you search, we fingerprint your query the same way. Then we find the stored entries whose fingerprints are most similar to yours.

Because the fingerprints encode meaning rather than words, this works even when the vocabulary is completely different. "Rate limit," "API throttling ceiling," and "maximum requests per minute" all point to the same region in meaning-space. The search finds all of them.
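Under the hood, "most similar fingerprints" typically means cosine similarity between embedding vectors. Here's a toy sketch with made-up three-dimensional vectors; real embeddings have hundreds of dimensions, and the vectors below are illustrative, not output from any actual model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "fingerprints" for three pieces of text.
rate_limit      = [0.9, 0.1, 0.0]
api_throttling  = [0.8, 0.2, 0.1]   # different words, similar meaning
dockerfile_note = [0.0, 0.1, 0.9]   # unrelated topic

print(cosine_similarity(rate_limit, api_throttling))  # close to 1.0
print(cosine_similarity(rate_limit, dockerfile_note)) # close to 0.0
```

Search then reduces to: embed the query, compute this similarity against every stored fingerprint, and return the highest-scoring entries.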

Here's a real example from our own knowledge base. We logged this fact:

Docker test stage must reset ENTRYPOINT inherited from production stage
When a Dockerfile test stage extends a production base stage that sets ENTRYPOINT, the test stage inherits it. This causes docker compose run to pass the test command as arguments to the production entrypoint instead of executing it directly.

If you search for "run tests locally docker compose", a lexical search on that query finds it because "docker" and "compose" appear in the title. But if you search for "test container starts server instead of running pytest" — which is the actual symptom someone debugging this would type — a lexical search finds nothing. Semantic search finds it immediately, because the meaning of those two descriptions is the same.

The problem with semantic-only search

Semantic search sounds perfect. Why not just use it for everything?

Because it has its own blind spots.

Semantic search relies on your embeddings being up to date. A newly added entry needs to be indexed before it can be found. And the embedding model sometimes misses on very technical content — exact identifiers, version numbers, and project-specific abbreviations that have no semantic neighborhood in the training data.

If someone on our team logged a fact about migration revision 053_task_id_facts_skills, a semantic search for that exact string might rank it lower than other migration-related entries. Lexical search would nail it immediately.

The two approaches are genuinely complementary.

How we combined them

ProjectBrain's search uses both — and then ranks the combined results using four signals. The weights below were tuned empirically against real search sessions on our own knowledge base:

Semantic similarity (55%) is the dominant factor when embeddings are available. It captures meaning, synonyms, and conceptual proximity.

Lexical overlap (25%) handles exact matches — identifiers, code snippets, specific error messages. This is our Elasticsearch-style fallback.

Recency (15%) gives newer entries a boost. A fact logged last week is more likely to be current than one from six months ago.

Task linkage (5%) is a small tiebreaker: entries linked to specific tasks in the project rank slightly higher than general, free-floating knowledge.
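Put together, the ranking reduces to a weighted sum. The sketch below assumes each signal has already been normalized to the 0-1 range; the signal names and input values are illustrative, not ProjectBrain's actual code.

```python
# Weights from the list above: semantic 55%, lexical 25%,
# recency 15%, task linkage 5%.
WEIGHTS = {
    "semantic": 0.55,
    "lexical": 0.25,
    "recency": 0.15,
    "task_linkage": 0.05,
}

def hybrid_score(signals):
    """Combine per-signal scores (each 0..1) into one rank score.

    Returns the total plus the per-signal contributions, so the
    breakdown can be shown alongside each result.
    """
    breakdown = {name: WEIGHTS[name] * signals.get(name, 0.0)
                 for name in WEIGHTS}
    return sum(breakdown.values()), breakdown

score, breakdown = hybrid_score(
    {"semantic": 0.72, "lexical": 0.1, "recency": 0.8, "task_linkage": 1.0}
)
print(round(score, 3))  # 0.591
```

Keeping the per-signal contributions around, rather than just the total, is what makes a breakdown like "semantic similarity: 72%" cheap to surface with every result.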

What this looks like in practice

Here are two real searches we ran against ProjectBrain's own knowledge base after building this feature.

Query: "run tests locally docker compose"

| Rank | Entry | Type | Score |
|------|-------|------|-------|
| 1 | Run tests locally using docker compose (matches CI) | Skill | 0.58 |
| 2 | Containerise CI test runs with docker compose | Decision | 0.52 |
| 3 | Docker test stage must reset ENTRYPOINT inherited from production stage | Fact | 0.52 |

The top three results are exactly the three entries we logged earlier that day. They weren't the most recent entries in the system, and they didn't use the same phrasing as the query. But they matched the meaning.

Query: "git hooks enforce lint before push"

| Rank | Entry | Score |
|------|-------|-------|
| 1 | Store git hooks in .githooks/ and activate via core.hooksPath | 0.55 |

A gap of ten points (0.10) to the next result. No other entry came close.

Why transparency matters

One thing we were careful about: every search result includes a score breakdown. You can see exactly how much of the score came from semantic similarity, lexical overlap, recency, and task linkage.

This matters for a couple of reasons.

First, it builds trust. When an agent retrieves knowledge and acts on it, you want to understand why that entry was selected. "Semantic similarity: 72%, also linked to the current task" is a lot more trustworthy than "it came from the search."

Second, it makes the system debuggable. If a result that should rank first is coming in third, the breakdown tells you exactly which signal is dragging it down. Maybe the entry is old and needs refreshing. That's a fixable problem.

What this means for agents

For AI agents working through ProjectBrain, the search improvement has a direct effect on session startup quality.

When an agent begins a session with an intent — say, "implement the new billing flow" — the context tool now runs a semantic search behind the scenes. Instead of returning the most recently logged entries, it returns the entries most relevant to billing: the rate limit facts, the payment gateway decisions, the deployment skill for this service.

The agent starts with the right context instead of the most recent context. In practice, that means fewer cases of an agent re-discovering something the team already knew, and fewer cases of contradicting a decision that was logged months ago.


If you're already using ProjectBrain, your existing knowledge base has already been indexed. The next agent session you run will pull in the entries most relevant to whatever it's working on, not merely the most recent ones. You don't need to do anything.

If you're not yet using ProjectBrain, get started here.
