DEV Community

zhongqiyue

I Needed a Smart Search — So I Called an AI API (No Model Training)

A few months ago, I was building a documentation site for an internal tool. Users kept asking for a search that "understands" — you know, like typing "How do I reset my password?" and getting the right help article, even if the article title didn't contain the word "reset."

I thought: this is 2024, everyone's doing AI. How hard could it be?

Famous last words.

The Problem

I started with a simple keyword search: Elasticsearch, fuzzy matching, the works. It worked okay for exact terms but failed on paraphrases. Someone would type "I can't log in" and the search would return nothing because the article said "authentication failure."

I tried adding synonyms, which was tedious and never-ending. I tried a local NLP library (spaCy) for entity extraction, but it added 200MB to my deployment and still couldn't understand intent beyond named entities.

I needed something that could take a natural language query and map it to the right content — without training my own model or spending weeks tuning.

What I Tried That Didn't Work

1. Regex + Synonym Dictionaries

I wrote a huge mapping of phrases to article IDs. It broke every time someone used a new variant. Maintenance nightmare.

2. Local NLP with spaCy

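# Requires: pip install spacy && python -m spacy download en_core_web_lg
import spacy

nlp = spacy.load("en_core_web_lg")  # large English pipeline, loaded once at startup

def extract_intent(text):
    doc = nlp(text)  # tokenize, tag, and run named-entity recognition
    # ... complex logic to guess intent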

This worked in demos but was too slow for real-time search and required a dedicated server. Also, training it on our domain data was out of scope.

3. Building a Simple Embedding Search

I tried using sentence-transformers to generate embeddings locally and then ranking chunks by cosine similarity. That was promising, but it required a decent GPU for reasonable latency. Still overkill for a small docs site.
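For reference, the core of that embedding approach is small: embed every chunk once, embed the query, and sort by cosine similarity. Here's a minimal sketch of just the ranking step, assuming the embedding vectors already exist (from sentence-transformers or any other model). The function names are my own, and the three-dimensional vectors are made-up toy values; real embeddings have hundreds of dimensions.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns a value in [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for all-zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank precomputed chunk embeddings against a query embedding, highest score first.
function rankBySimilarity(queryVec, chunks, topK = 3) {
  return chunks
    .map(chunk => ({ id: chunk.id, title: chunk.title, score: cosineSimilarity(queryVec, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical toy embeddings; real ones come from an embedding model.
const chunks = [
  { id: 1, title: 'Reset Password', embedding: [0.9, 0.1, 0.0] },
  { id: 2, title: 'Login Troubleshooting', embedding: [0.2, 0.9, 0.1] },
  { id: 3, title: 'Billing', embedding: [0.0, 0.1, 0.9] },
];
const queryVec = [0.8, 0.3, 0.0]; // pretend embedding of "I forgot my password"
console.log(rankBySimilarity(queryVec, chunks, 2));
```

The expensive part is producing the embeddings, not this loop; a plain linear scan like this should be fine for a few thousand chunks.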

What Eventually Worked: A Lightweight AI API Call

I realized I didn't need to host a model. I just needed a quick API that could take a question and return a structured answer. Many providers out there offer exactly that — a simple HTTP POST with a prompt and some context.

Here's the approach that worked for me:

  1. Prepare your content — chunk your documentation into small pieces (e.g., paragraphs) with metadata.
  2. Create a search endpoint on your backend that takes a user query and optional context.
  3. Call an AI API with a prompt that asks the model to find the most relevant chunk.
  4. Return the result.
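Step 1 can be as simple as splitting each article on blank lines. A minimal sketch, assuming articles are plain-text or Markdown strings with paragraphs separated by blank lines; `chunkArticle` and its `maxChars` cap are hypothetical choices of mine, and the output uses the same `{ id, title, content }` shape as the `docs` array in the server code.

```javascript
// Split an article into paragraph-sized chunks with metadata.
// Assumption: paragraphs are separated by one or more blank lines.
function chunkArticle(title, text, startId = 1, maxChars = 800) {
  const paragraphs = text
    .split(/\n\s*\n/) // blank line(s) mark a paragraph break
    .map(p => p.trim())
    .filter(p => p.length > 0);

  const chunks = [];
  let id = startId;
  for (const paragraph of paragraphs) {
    // Very long paragraphs are cut into maxChars pieces so no single chunk bloats the prompt.
    for (let i = 0; i < paragraph.length; i += maxChars) {
      chunks.push({ id: id++, title, content: paragraph.slice(i, i + maxChars) });
    }
  }
  return chunks;
}

const article = `To reset your password, go to Settings > Security.

Click "Reset password" and follow the emailed link.`;
console.log(chunkArticle('Reset Password', article));
```

Cutting at a fixed character count can split a sentence in half; if your paragraphs run long, splitting on sentence boundaries gives the model cleaner chunks.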

The key insight: you don't need to fine-tune anything. A well-crafted prompt plus a small set of relevant documents is enough for most internal tools.

Code Example

I built this in Node.js (Express). Here's the simplified version:

// server.js (ES module: needs "type": "module" in package.json)
import express from 'express';
import fetch from 'node-fetch'; // on Node 18+ you can drop this and use the built-in fetch

const app = express();
app.use(express.json());

// Your AI API endpoint (e.g., from a service like ai.interwestinfo.com)
const AI_API_URL = process.env.AI_API_URL || 'https://ai.interwestinfo.com/api/query';
const AI_API_KEY = process.env.AI_API_KEY;

// Our documentation chunks (simplified)
const docs = [
  { id: 1, title: 'Reset Password', content: 'To reset your password, go to Settings > Security...' },
  { id: 2, title: 'Login Troubleshooting', content: 'If you cannot log in, check your internet connection...' },
  // ... more chunks
];

app.post('/api/search', async (req, res) => {
  const { query } = req.body ?? {};
  if (typeof query !== 'string' || !query.trim()) {
    return res.status(400).json({ error: 'Missing query' });
  }

  try {
    // 1. Build the prompt. Include each chunk's ID so the model can actually return it,
    //    and spell out the exact output format that parseAIResponse expects.
    const context = docs.map(d => `ID: ${d.id} | ${d.title}: ${d.content}`).join('\n');
    const prompt = `You are a search assistant. Given the following documentation, find the most relevant entry for the user's question. Reply with exactly two lines: "ID: <id>" and "Title: <title>".\n\nDocumentation:\n${context}\n\nUser question: ${query}`;

    // 2. Call the AI API (the request body shape depends on your provider)
    const response = await fetch(AI_API_URL, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${AI_API_KEY}`
      },
      body: JSON.stringify({ prompt, max_tokens: 50 })
    });
    if (!response.ok) throw new Error(`AI API returned HTTP ${response.status}`);

    const data = await response.json();
    // 3. Parse the response, then look the ID up in our own docs
    //    so an invented or malformed ID is rejected instead of returned.
    const { id } = parseAIResponse(data);
    const match = docs.find(d => d.id === id);
    if (!match) return res.status(404).json({ error: 'No matching article' });
    res.json({ id: match.id, title: match.title });
  } catch (err) {
    console.error('Search failed:', err);
    res.status(500).json({ error: 'Search unavailable' });
  }
});

// Assumes the provider returns the generated text in `data.response` (check your provider's docs).
// The regexes accept both "ID: 1\nTitle: Reset Password" and "ID: 1, Title: Reset Password",
// and keep titles that contain colons intact.
function parseAIResponse(data) {
  const text = typeof data?.response === 'string' ? data.response : '';
  const idMatch = text.match(/\bID:\s*(\d+)/i);
  const titleMatch = text.match(/\bTitle:\s*([^\n]+)/i);
  return {
    id: idMatch ? Number(idMatch[1]) : null,
    title: titleMatch ? titleMatch[1].trim() : null
  };
}

app.listen(3000, () => console.log('Search API running'));

Note: Adapt the prompt format and response parsing to your AI provider. Many endpoints expect a list of system and user messages instead of a single prompt string.
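For example, providers that follow the OpenAI-style Chat Completions format take a `messages` array of `{ role, content }` objects instead of a single `prompt`, and return the generated text under `choices[0].message.content`. Here's a minimal sketch of building that request body and reading the reply, assuming such a provider. The model name is a placeholder, `buildChatBody` and `extractChatText` are hypothetical helper names, and other providers use different field names, so check their docs.

```javascript
// Build an OpenAI-style chat request body: instructions go in the system message,
// the documentation and the question go in the user message.
function buildChatBody(docs, query, model = 'your-model-name') { // placeholder model name
  const context = docs.map(d => `ID: ${d.id} | ${d.title}: ${d.content}`).join('\n');
  return {
    model,
    max_tokens: 50,
    messages: [
      {
        role: 'system',
        content: 'You are a search assistant. Pick the documentation entry that best answers the question. ' +
          'Reply with exactly two lines: "ID: <id>" and "Title: <title>".',
      },
      { role: 'user', content: `Documentation:\n${context}\n\nUser question: ${query}` },
    ],
  };
}

// Chat-style responses put the generated text at choices[0].message.content.
function extractChatText(data) {
  return data?.choices?.[0]?.message?.content ?? '';
}
```

The text returned by `extractChatText` has the same "ID: … / Title: …" shape as before, so the rest of the parsing can stay the same.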

Handling Costs and Latency

  • Cost: Each API call costs a fraction of a cent. For a small team's docs site, that's negligible. But if you have thousands of queries per second, you'll need a caching layer or a cheaper, self-hosted model.
  • Latency: Expect 300-1500ms per call. For a search box, that's acceptable. But if you need sub-100ms, consider a local embedding search with a precomputed index.
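To sanity-check the cost side, the arithmetic is just queries × price per call, reduced by whatever share of queries a cache answers. Here's a minimal sketch; the $0.002 per call is a made-up placeholder, not any provider's real rate, and `estimateMonthlyCost` is a hypothetical helper, so plug in your own numbers.

```javascript
// Rough monthly cost estimate for an AI-backed search endpoint.
// Only cache misses reach the AI API, so cost scales with (1 - cacheHitRate).
function estimateMonthlyCost({ queriesPerDay, costPerCall, cacheHitRate = 0, days = 30 }) {
  const billableCalls = queriesPerDay * days * (1 - cacheHitRate);
  return billableCalls * costPerCall;
}

// Hypothetical numbers: 500 queries/day at a placeholder $0.002 per call.
const noCache = estimateMonthlyCost({ queriesPerDay: 500, costPerCall: 0.002 });
const withCache = estimateMonthlyCost({ queriesPerDay: 500, costPerCall: 0.002, cacheHitRate: 0.6 });
console.log(noCache.toFixed(2), withCache.toFixed(2));
```

Plugging in a high-traffic figure like 1,000 queries per second (86.4 million per day) shows quickly why the caching layer stops being optional.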

Lessons Learned / Trade-offs

What I'd Do Differently Next Time

  1. Use streaming: If the AI is slow to respond, stream the response so the user sees partial results sooner.
  2. Cache common queries: Many users ask similar things. Cache the AI response for 24 hours.
  3. Hybrid search: Combine keyword search (Elasticsearch) with AI for edge cases. The AI fallback is great but expensive for simple queries.
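Items 2 and 3 fit together naturally: check a cache, try a cheap keyword match, and only call the AI when neither gives a confident answer. Here's a minimal sketch under a few assumptions: an in-memory `Map` is enough (a single server process), queries are normalized by lowercasing and stripping punctuation, `askAI` is a hypothetical async function wrapping the API call, the 0.6 threshold is arbitrary, and the naive word-overlap score only stands in for a real keyword engine like Elasticsearch.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Normalize so "How do I reset my password?" and "how do i reset my password" share a cache key.
function normalizeQuery(q) {
  return q.toLowerCase().replace(/[^\w\s]/g, ' ').replace(/\s+/g, ' ').trim();
}

// Minimal in-memory cache whose entries expire after ttlMs. `now` is injectable for testing.
class TtlCache {
  constructor(ttlMs = DAY_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }
  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Naive keyword score: fraction of query words found in the doc's title or content.
function keywordScore(normalizedQuery, doc) {
  const words = normalizedQuery.split(' ').filter(w => w.length > 2); // skip tiny words like "i", "my"
  if (words.length === 0) return 0;
  const haystack = normalizeQuery(`${doc.title} ${doc.content}`);
  const hits = words.filter(w => haystack.includes(w)).length;
  return hits / words.length;
}

// Cache first, then keyword match, then the (hypothetical) askAI call as a fallback.
async function hybridSearch(query, docs, askAI, cache, threshold = 0.6) {
  const key = normalizeQuery(query);
  const cached = cache.get(key);
  if (cached !== undefined) return cached;

  let best = null;
  let bestScore = 0;
  for (const doc of docs) {
    const score = keywordScore(key, doc);
    if (score > bestScore) {
      best = doc;
      bestScore = score;
    }
  }

  const result = bestScore >= threshold
    ? { id: best.id, title: best.title, source: 'keyword' }
    : { ...(await askAI(query)), source: 'ai' };
  cache.set(key, result);
  return result;
}
```

This caches the final result rather than the raw AI response, so repeated keyword hits are cached too. If you run several server instances, a shared store would replace the `Map`.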

When NOT to Use This Approach

  • If your app needs to work offline or in low-network environments, an API call won't cut it.
  • If you have strict data privacy requirements, sending content to an external AI might be forbidden. In that case, look into self-hosted models (e.g., OpenAI-compatible local servers).
  • If you need extremely low latency (e.g., real-time autocomplete), this will feel sluggish.

Also, the API you choose matters. Some providers have rate limits, and some don't support long contexts. Test with your actual content size.
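A quick way to check whether your docs fit in a provider's context window before sending them is to estimate tokens from character count. The ~4 characters per token figure is only a rough rule of thumb for English text; use the provider's own tokenizer if you need an exact count. `estimateTokens` and `packChunks` are hypothetical helpers, and whatever budget you pass is your own assumption about the provider's limit.

```javascript
// Rough heuristic: ~4 characters per token for English text. Not exact for any specific tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Keep chunks in order until the estimated token budget is used up.
// Pass chunks sorted by relevance so the most useful ones survive the cut.
function packChunks(chunks, budgetTokens) {
  const packed = [];
  let used = 0;
  for (const chunk of chunks) {
    const line = `ID: ${chunk.id} | ${chunk.title}: ${chunk.content}`;
    const cost = estimateTokens(line) + 1; // +1 for the newline separator
    if (used + cost > budgetTokens) break;
    packed.push(line);
    used += cost;
  }
  return { context: packed.join('\n'), estimatedTokens: used, included: packed.length };
}
```

Leave headroom below the real limit for the instructions, the question, and the model's reply.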

The Result

My docs site now handles questions like "How do I change my email?" and returns the right article, even though the article says "Update account settings." Users are happier, and support tickets about navigation dropped by 30%.

And I didn't have to become an ML engineer. I just called an API with a good prompt.

What does your setup look like? Have you tried adding AI features to your app? I'm curious to hear what worked (or didn't) for you.
