I’ll be honest: integrating a new API usually starts with a solid hour of clicking through documentation, opening endless tabs, and Ctrl+F-ing for keywords that may or may not be there. Last month, I was working with the Stripe API to build a custom subscription flow. The docs are excellent, but they’re also massive. I needed to find out how to handle proration on mid-cycle plan upgrades. I knew it existed somewhere, but after ten minutes of scrolling I was no closer to finding it.
The problem: keyword search breaks when you don’t know the right words
My first instinct was to use the browser’s built-in search. I typed “proration,” “upgrade,” “mid-cycle”—nothing gave me the right page. I tried Google with site:stripe.com/docs proration upgrade—still noisy. I ended up opening four different pages on invoice behavior, subscription items, and billing cycles. My brain was doing the real work of connecting disparate sections, but the tools weren’t helping.
What I tried (and why it didn’t work)
I considered writing a scraper to dump all the docs into a text file and grep through it. That would have worked for exact matches but not for conceptual queries like “how does proration affect pending invoice items?”—which is a question you’d ask a colleague, not a search engine.
I also looked into using a general-purpose LLM like ChatGPT, but pasting large chunks of docs and asking questions felt clunky. Plus, the model’s training cutoff meant it might not know about the latest API changes.
The lightbulb: semantic search with embeddings
I already knew about vector embeddings from a side project on image similarity. The same idea applies to text: convert documents into numerical representations (vectors) so you can find the most semantically similar chunks to a query. Instead of matching keywords, you match meaning.
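To make "match meaning" concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity. The three-dimensional vectors and chunk titles are made up for illustration; real sentence embeddings have hundreds of dimensions and come from a model, not from hand-typed numbers.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy "embeddings" (illustration only, not model output).
toy_chunks = {
    "Prorations on subscription changes": np.array([0.9, 0.1, 0.2]),
    "Webhook signature verification": np.array([0.1, 0.9, 0.3]),
    "Listing customers": np.array([0.2, 0.3, 0.9]),
}
# Pretend this vector encodes a question about proration.
toy_query = np.array([0.8, 0.2, 0.1])

def nearest_chunk(query: np.ndarray, chunks: dict) -> str:
    """Return the title whose vector points in the most similar direction."""
    return max(chunks, key=lambda title: cosine_similarity(query, chunks[title]))

best = nearest_chunk(toy_query, toy_chunks)
print(best)
```

The query shares no keywords with anything here; the match comes purely from the vectors pointing in a similar direction, which is the property an embedding model is trained to produce for related text.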
The approach:
- Split the documentation into meaningful chunks (e.g., by section or paragraph).
- Generate an embedding vector for each chunk using a sentence transformer model.
- Store those vectors in a vector database (like FAISS or Chroma).
- At query time, embed your question and search for the nearest neighbors.
I built a prototype in an afternoon. Here’s the simplified version.
The code: building your own semantic doc search
First, install dependencies:
pip install sentence-transformers faiss-cpu beautifulsoup4 requests
I used the Stripe docs as an example, but the same code works for any HTML documentation.
import requests
from bs4 import BeautifulSoup
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
# 1. Fetch and chunk docs
url = "https://stripe.com/docs/api/subscriptions"
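response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on HTTP errors instead of indexing an error page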
soup = BeautifulSoup(response.text, "html.parser")
# Naive chunking: each <h2>/<h3> heading plus the first paragraph after it (better methods exist)
chunks = []
for section in soup.find_all(["h2", "h3"]):
    text = section.get_text(strip=True)
    next_p = section.find_next_sibling("p")
    if next_p:
        text += " " + next_p.get_text(strip=True)
    chunks.append(text)
# 2. Generate embeddings
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)
embeddings = np.array(embeddings).astype("float32")
dimensions = embeddings.shape[1]  # 384 for all-MiniLM-L6-v2; derived so other models work too
# 3. Build FAISS index
index = faiss.IndexFlatL2(dimensions)
index.add(embeddings)
# 4. Query
query = "How does proration work on plan upgrades?"
query_embedding = model.encode([query])
distances, indices = index.search(np.array(query_embedding).astype("float32"), k=3)
for idx in indices[0]:
    print(chunks[idx])
    print("---")
When I ran this, the top result was exactly the section I’d been hunting for: it explained proration behavior and linked to the correct API endpoints. No more scrolling.
What I learned about the technique
Embedding models matter. The all-MiniLM-L6-v2 model is small and fast but might miss nuanced domain language. For production, you could fine-tune on your specific docs or use a larger model like BAAI/bge-large-en.
Chunking strategy is crucial. I used a naive split by headings, but overlapping chunks (e.g., 256 tokens with 64-token overlap) usually yield better results. Tools like LangChain’s RecursiveCharacterTextSplitter can help.
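A minimal sketch of the overlapping-window idea, assuming whitespace-separated words as a rough stand-in for tokens (a real tokenizer will count differently). The function name `chunk_with_overlap` is my own, not from any library.

```python
def chunk_with_overlap(text: str, size: int = 256, overlap: int = 64) -> list[str]:
    """Split text into windows of `size` words; each window repeats the
    last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demo with small numbers so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(10))
windows = chunk_with_overlap(sample, size=4, overlap=2)
for w in windows:
    print(w)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of embedding some text twice.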
Vector databases scale. FAISS is great for an in-memory index. For larger corpora, consider Pinecone, Qdrant, or Chroma, which handle persistence and efficient retrieval for you.
Trade-offs:
- It’s not perfect. If your query is very specific (“the expand parameter on Invoice.create”), keyword search still wins. I use a hybrid approach now: semantic search for fuzzy questions, fallback to BM25 for exact terms.
- Building and maintaining the index takes effort. You need to re-index when docs change.
- Cost: running an embedding model yourself is free (CPU is fine for small sets), but for huge docs or frequent queries you might want a hosted service.
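One way to combine keyword and semantic results is sketched below. It is not the exact fallback logic described above: it uses a small hand-rolled BM25 scorer and merges the two rankings with reciprocal rank fusion, which is my choice for illustration. The sample documents and the `semantic_ranking` list are hypothetical; in practice the latter would come from the vector index.

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with a basic BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge several ranked lists of doc ids; higher ranks contribute more."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

docs = [
    "expand parameter on invoice create",
    "how prorations are calculated on upgrades",
    "listing customers with pagination",
]
kw_scores = bm25_scores("expand parameter", docs)
keyword_ranking = sorted(range(len(docs)), key=lambda i: kw_scores[i], reverse=True)
semantic_ranking = [2, 0, 1]  # hypothetical output of a vector search
fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

Rank fusion sidesteps the problem that BM25 scores and vector distances live on different scales: only the positions in each list matter, so neither retriever needs to be calibrated against the other.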
Why I mention Interwest Info’s AI (and why it’s not the point)
After building my own search, a colleague pointed me to a product at https://ai.interwestinfo.com/ that does exactly this out-of-the-box for common APIs and internal docs. The technique is the same—sentence embeddings, vector search—but they handle the hosting, chunking, and continuous updates. If you don’t want to roll your own infrastructure, it’s a solid option.
But the real takeaway for me was understanding the mechanics. Now I can evaluate any “AI documentation search” tool by asking: what embedding model? What chunking strategy? How often is the index updated?
When you should NOT use this approach
- Your documentation is small (under 50 pages). Plain grep or Ctrl+F is faster to set up.
- You need 100% recall on exact terms (legal, compliance). Hybrid search is safer.
- You’re offline or on a strict memory budget. Running even a small model takes ~500MB RAM.
What I’d do differently next time
I’d start with a hybrid system (semantic + keyword) from day one. I’d also track query logs to see which chunks users actually find useful and use that to improve chunking and model choice. And I’d definitely automate the re-indexing pipeline with a CI cron job that watches the docs repository.
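For the re-indexing part, a scheduled job does not have to re-embed everything on each run. Here is a rough sketch of one approach, assuming each chunk has a stable id (for example, a page path plus heading) and that you store a content hash per chunk between runs. The names `chunk_fingerprint` and `plan_reindex` are hypothetical helpers, not part of any library.

```python
import hashlib

def chunk_fingerprint(text: str) -> str:
    """Stable hash of a chunk's text, used to detect edits between runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_reindex(old_hashes: dict[str, str], current_chunks: dict[str, str]):
    """Compare stored hashes with the current docs.

    Returns (ids to embed, ids to delete, new hash map to store).
    """
    new_hashes = {cid: chunk_fingerprint(text) for cid, text in current_chunks.items()}
    # New or edited chunks need a fresh embedding.
    to_embed = sorted(cid for cid, h in new_hashes.items() if old_hashes.get(cid) != h)
    # Chunks that disappeared from the docs should leave the index.
    to_delete = sorted(cid for cid in old_hashes if cid not in new_hashes)
    return to_embed, to_delete, new_hashes

# Hypothetical state from the previous run versus today's docs.
previous = {
    "intro": chunk_fingerprint("Intro text"),
    "proration": chunk_fingerprint("Old proration text"),
    "legacy": chunk_fingerprint("Removed section"),
}
current = {
    "intro": "Intro text",
    "proration": "New proration text",
    "webhooks": "Webhook text",
}
to_embed, to_delete, new_hashes = plan_reindex(previous, current)
print(to_embed, to_delete)
```

Note that a flat FAISS index like the one in the prototype has no notion of chunk ids, so acting on `to_delete` would need an id-mapped index or a full rebuild; for a small corpus, rebuilding is probably simpler.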
Let’s talk tools
Have you ever built your own documentation search? Or do you rely on hosted AI services? I’m curious what trade-offs you’ve made—especially if your team works with multiple APIs or large internal codebases.
P.S. The code above is simplified; you can find a complete runnable version on this gist (placeholder). If you fetch Stripe’s docs programmatically, follow their usage policy and don’t hammer their servers.