Alan West

How to Stop Over-Engineering with AI When a Simple Query Will Do

I spent three days last month building an AI-powered search feature for an internal tool. Embeddings, vector database, retrieval-augmented generation — the whole stack. My teammate looked at it and said, "Couldn't you just use PostgreSQL full-text search?"

He was right. The dataset was 12,000 records with well-structured fields. I'd built a Ferrari to drive to the mailbox.

If you've caught yourself reaching for an LLM API or embedding model before even considering whether the problem needs it, this post is for you. Let's walk through how to diagnose over-engineering with AI and apply the right tool for the job.

The Root Cause: Solution-First Thinking

The core problem isn't AI itself — it's that we've started working backwards. Instead of asking "what does this feature need to do?" we're asking "how can I use AI here?" I've done it. You've probably done it too.

This leads to three predictable failure modes:

  • Unnecessary latency — an LLM call adds 500ms-3s to something that could resolve in 5ms
  • Unpredictable costs — token-based pricing on a feature that gets hit 10,000 times a day adds up fast
  • Brittleness — non-deterministic outputs in places where you actually need deterministic behavior
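Those latency numbers compound quickly. A quick back-of-envelope sketch, using the figures above (10,000 daily requests, 500ms as the low end for an LLM call vs. 5ms for a local query):

```python
# Cumulative wait time per day: LLM call vs. local query,
# using the figures quoted above.
requests_per_day = 10_000
llm_latency_s = 0.5      # 500 ms, the low end for an LLM call
local_latency_s = 0.005  # 5 ms for a local query

llm_total_min = requests_per_day * llm_latency_s / 60
local_total_min = requests_per_day * local_latency_s / 60

print(f"LLM path:   ~{llm_total_min:.0f} minutes of cumulative waiting per day")
print(f"Local path: ~{local_total_min:.1f} minutes per day")
```

That's over an hour of users waiting every day, at the *optimistic* end of the latency range.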

Let me show you what I mean with real examples.

Example 1: Search That Doesn't Need Vectors

Here's what I originally built for that internal tool:

import openai
import numpy as np

def search_documents(query, documents):
    # Generate an embedding for the search query (one API call per search)
    query_embedding = openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    ).data[0].embedding

    # Compare against every document embedding. OpenAI embeddings are
    # unit-normalized, so the dot product is the cosine similarity.
    results = []
    for doc in documents:
        similarity = np.dot(query_embedding, doc["embedding"])
        results.append((doc, similarity))

    # Highest-similarity documents first, top 10
    return sorted(results, key=lambda x: x[1], reverse=True)[:10]

This works, but it requires maintaining embeddings for every document, calling an external API on every search, and paying per request. For a structured dataset of employee records where people search by name, department, and role, this is absurd.

Here's what replaced it:

-- Create a tsvector column for full-text search
-- (in production, keep it fresh with a trigger or a stored generated column)
ALTER TABLE employees ADD COLUMN search_vector tsvector;

UPDATE employees SET search_vector =
    to_tsvector('english',
        coalesce(name, '') || ' ' ||
        coalesce(department, '') || ' ' ||
        coalesce(role, '') || ' ' ||
        coalesce(location, '')
    );

-- Add a GIN index so queries are fast
CREATE INDEX idx_employee_search ON employees USING GIN(search_vector);

-- Now search is instant and free
SELECT name, department, role
FROM employees
WHERE search_vector @@ plainto_tsquery('english', 'backend engineer london')
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'backend engineer london')) DESC
LIMIT 10;

Zero external dependencies. Sub-millisecond response times. Completely deterministic. The PostgreSQL full-text search docs cover this thoroughly — it handles stemming, ranking, and even fuzzy matching out of the box.
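The same idea scales down, too: if you don't even have a PostgreSQL server handy, SQLite's FTS5 module gives you the identical pattern with nothing but the Python standard library. A minimal sketch, assuming your SQLite build includes FTS5 (most standard CPython builds do); the table and column names mirror the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# FTS5 virtual table: handles tokenization, indexing, and bm25 ranking
conn.execute(
    "CREATE VIRTUAL TABLE employees USING fts5(name, department, role, location)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        ("Ada Lovelace", "Engineering", "Backend Engineer", "London"),
        ("Grace Hopper", "Engineering", "Compiler Engineer", "New York"),
        ("Jean Bartik", "Support", "Support Lead", "London"),
    ],
)

# MATCH queries the full-text index; multiple terms are ANDed by default,
# and bm25() returns lower (more negative) scores for better matches
results = conn.execute(
    "SELECT name FROM employees "
    "WHERE employees MATCH ? ORDER BY bm25(employees) LIMIT 10",
    ("backend london",),
).fetchall()
print(results)  # only the row matching both terms
```

Same trade as the PostgreSQL version: no API calls, no embedding upkeep, deterministic results.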

Example 2: Classification That's Really Just Pattern Matching

Another one I see constantly: using an LLM to classify or categorize things that follow clear rules. A colleague was sending every incoming support ticket to an API to determine priority level. The prompt was something like "classify this ticket as low, medium, or high priority."

The actual logic, once we sat down and wrote it out:

import re

# Keywords determined by looking at 6 months of ticket data
HIGH_PRIORITY_PATTERNS = [
    r"(can'?t|cannot|unable to)\s+(log\s*in|access|authenticate)",
    r"(data\s+loss|missing\s+data|deleted)",
    r"(outage|down|5[0-9]{2}\s+error|service unavailable)",
    r"(security|breach|unauthorized)",
]

MEDIUM_PRIORITY_PATTERNS = [
    r"(slow|latency|timeout|performance)",
    r"(bug|broken|doesn'?t work)",
    r"(error|exception|crash)",
]

def classify_ticket(text):
    text_lower = text.lower()

    for pattern in HIGH_PRIORITY_PATTERNS:
        if re.search(pattern, text_lower):
            return "high"

    for pattern in MEDIUM_PRIORITY_PATTERNS:
        if re.search(pattern, text_lower):
            return "medium"

    return "low"  # default for feature requests, questions, etc.

Is this as nuanced as an LLM? No. But it's consistent, testable, instant, and costs nothing. When the rules need updating, you add a regex — you don't retune a prompt and hope the output stays stable across model versions.
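That "testable" property is concrete: you can pin the classifier's behavior with plain assertions and run them on every commit, which you can't do reliably against a non-deterministic API. A minimal sketch, re-declaring a trimmed version of the classifier so it runs standalone:

```python
import re

# Trimmed pattern lists from the classifier above
HIGH_PRIORITY_PATTERNS = [
    r"(can'?t|cannot|unable to)\s+(log\s*in|access|authenticate)",
    r"(outage|down|5[0-9]{2}\s+error|service unavailable)",
]
MEDIUM_PRIORITY_PATTERNS = [
    r"(slow|latency|timeout|performance)",
    r"(bug|broken|doesn'?t work)",
]

def classify_ticket(text):
    text_lower = text.lower()
    if any(re.search(p, text_lower) for p in HIGH_PRIORITY_PATTERNS):
        return "high"
    if any(re.search(p, text_lower) for p in MEDIUM_PRIORITY_PATTERNS):
        return "medium"
    return "low"

# These run in milliseconds in CI and never drift across model versions
assert classify_ticket("I can't log in to my account") == "high"
assert classify_ticket("Dashboard is really slow today") == "medium"
assert classify_ticket("Feature request: dark mode") == "low"
print("all classifier tests passed")
```

When a ticket gets misclassified in production, you add it as a failing test case, add a pattern, and the fix is locked in forever.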

The Decision Framework

After burning myself a few times, I started running through these questions before reaching for any AI tool:

1. Is the input structured or unstructured?

If your data has clear fields and types, traditional queries and logic almost always win. AI shines with genuinely unstructured input — free-form text where the meaning is ambiguous, images, audio.

2. Do you need deterministic output?

If the answer to "should the same input always produce the same output?" is yes, think hard before using a model. LLMs are probabilistic by design. You can set temperature to 0, but even then, outputs can vary across model versions.

3. What's the latency budget?

An API call to a hosted model is at minimum 200-500ms for simple tasks. If your feature sits in a hot path or user-facing flow where every millisecond matters, that's a real cost.
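If you want to know what the local path costs before committing to the remote one, measure it. A quick sketch with `time.perf_counter` (the hosted-model figure printed at the end is the estimate from above, hardcoded for contrast, not a measurement):

```python
import re
import time

# One representative pattern from a rule-based classifier
PATTERN = re.compile(r"(outage|down|service unavailable)")

start = time.perf_counter()
for _ in range(10_000):
    PATTERN.search("the service seems to be down again")
elapsed_ms = (time.perf_counter() - start) * 1000

per_call_us = elapsed_ms / 10_000 * 1000
print(f"10,000 regex checks: {elapsed_ms:.1f} ms total (~{per_call_us:.2f} µs each)")
print("one hosted-model call: ~200-500 ms")
```

Microseconds versus hundreds of milliseconds is five orders of magnitude. In a hot path, that difference is the whole ballgame.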

4. Could you write the rules in an afternoon?

If a senior developer on your team could sit down and write the business logic in a few hours, do that first. You can always add AI later when the simple version proves insufficient.

5. What happens when the AI is wrong?

This is the big one. If an incorrect classification, search result, or summary causes a real problem — data corruption, security issues, billing errors — you need guarantees that a probabilistic model can't give you.

When AI Actually Is the Right Call

I'm not saying never use it. There are problems where AI is genuinely the best tool:

  • Summarizing long documents — you can't regex your way through condensing a 50-page report
  • Semantic search over truly unstructured data — when users need to find things by meaning, not keywords
  • Code generation and transformation — turning natural language specs into boilerplate
  • Content moderation at scale — where the rules are too nuanced and context-dependent to hardcode

The key is reaching for it because the problem demands it, not because it's the shiny tool in your toolbox.

Prevention: A Pre-Flight Checklist

I keep this taped to my monitor. Seriously.

Before adding any AI/ML component to a feature:

  • [ ] Write down exactly what the feature needs to do in plain English
  • [ ] Attempt a solution using only standard library tools and your database
  • [ ] If that fails, try a well-established non-AI library (regex, full-text search, rule engines)
  • [ ] If that fails, prototype with AI but measure latency, cost, and accuracy
  • [ ] Compare the AI prototype against the simpler alternatives with real data

If you can't articulate why the AI version is meaningfully better after running through this, you don't need it.

The Boring Truth

The most productive engineering decision I've made this year was deleting an AI feature and replacing it with 40 lines of SQL. It was faster to build, faster to run, easier to debug, and hasn't needed a single prompt adjustment since.

I'm not bored of AI. I'm bored of using it as a hammer when I'm not even looking at a nail. The best engineers I know treat AI the same way they treat any other tool — with a healthy dose of "do I actually need this?" before writing the first line of code.

Start with the simplest thing that works. Add complexity only when the simple thing fails. That advice is older than machine learning itself, and it still applies.
