Adding Vector Search to a Zero-Dependency Python Package

#python #ai #vectorsearch #opensource

Last week I built agent-memory, a lightweight memory system for AI agents. It started with TF-IDF keyword search — simple, fast, zero dependencies.

But keyword search has limits. "What did I learn about deployment?" won't match "Figured out how to ship to production." I needed semantic search.

The obvious answer: sentence-transformers + numpy. But that's 2GB of PyTorch for a 672-line package. The whole point was zero dependencies.

Here's how I added vector search without adding a single dependency.

The Architecture

User configures embedding API (optional)
         ↓
    add() → text → HTTP POST /v1/embeddings → vector
         ↓
    vectors.jsonl (id + float array)
         ↓
    search() → query → embed → cosine similarity → ranked results

The key insight: embeddings are an API call, not a local computation. OpenAI, Cohere, Jina, and dozens of providers all expose the same /v1/embeddings endpoint. Use urllib (stdlib) to call it.

Pure Python Cosine Similarity

No numpy needed:

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

For a typical agent memory store (hundreds of entries, 1536-dim vectors), this runs in single-digit milliseconds. You don't need BLAS for 500 dot products.

Three Search Modes

Keyword (TF-IDF) — fast, exact matching, no API calls:

mem.search("dark mode", mode="keyword")

Vector — semantic similarity via embeddings:

mem.search("UI preferences", mode="vector")

Hybrid — weighted blend (0.4 keyword + 0.6 vector):

mem.search("settings", mode="hybrid")

When no embedding API is configured, everything falls back to keyword search. Zero-config degradation.

The TF-IDF Bug Nobody Talks About

While building this, I found a subtle bug in my TF-IDF implementation.

The standard IDF formula: log(N / df). Many implementations use smoothing: log((N + 1) / (df + 1)).

The problem: with 1 document where df=1, you get log(2/2) = log(1) = 0. Every term scores zero. Single-document search is broken.

The fix: log((N + 1) / (df + 0.5)). With N=1, df=1: log(2/1.5) ≈ 0.29. Not zero.

This is a known issue in BM25 literature (Okapi BM25 uses df + 0.5), but most toy implementations copy the wrong formula.

Configuration

Embedding config goes in .agent-memory/config.json:

{
  "embedding": {
    "api_base": "https://api.openai.com/v1",
    "api_key": "sk-...",
    "model": "text-embedding-3-small"
  }
}

Or environment variables: AGENT_MEMORY_EMBEDDING_API_BASE, AGENT_MEMORY_EMBEDDING_API_KEY.

Works with any OpenAI-compatible API — local Ollama, Jina, LiteLLM proxy, whatever.

What I Learned

stdlib is underrated. urllib.request handles 90% of HTTP needs. math.sqrt is fine for cosine similarity.
Optional > Required. Vector search enhances; keyword search is the floor. Never break the simple path.
Small corpuses don't need numpy. Profile before you import.
Test with mocks. All 10 vector tests use mock embeddings (deterministic hash vectors). No API calls in CI.