What "Boring" Actually Means
Boring technology does not mean old technology. It does not mean slow, limited, or low-quality. It means technology that has been in production long enough that its failure modes are documented, its operational characteristics are well understood, and the person debugging it at 2am has a reasonable chance of finding an answer on Stack Overflow rather than in a forum thread about a beta release from 2024.
Postgres is boring. Redis is boring. S3 is boring. A plain HTTP API with JSON is boring. SQLite, for things that fit in SQLite, is boring. None of these things are slow, limited, or embarrassing to use. They are boring because they have been deployed by enough people, at enough scale, for long enough that the surprises have mostly been found. The surface area of "things that can go wrong that nobody has written about" is small.
When Dan McKinley wrote the Choose Boring Technology essay in 2015, he framed it as a budget: you get a limited number of new technologies per project, and you should spend that budget intentionally. That framing is still correct. What's changed is that AI products have a non-negotiable budget item now: the model and the scaffolding around it. That item is expensive. It is genuinely new. It has failure modes that nobody fully understands yet. That is the place you are choosing to spend your novelty budget. Everywhere else, the argument for boring is stronger than it has ever been.
The Vector Database Problem
The most common place I see this play out is in the retrieval layer. A team is building RAG — retrieval-augmented generation, some form of semantic search over a corpus. They need to store embeddings and query them by similarity. There are purpose-built vector databases for this: Pinecone, Weaviate, Qdrant, Chroma. They have impressive benchmarks, polished SDKs, and marketing copy that makes Postgres look like a horse and buggy.
So teams reach for them. Then six months later they are managing two separate databases — Postgres for everything else, Pinecone for vectors — running two separate migration workflows, debugging sync issues between them, and paying for an additional managed service. The team that wanted to move fast has added an operational surface area that requires dedicated attention.
pgvector exists. It is a Postgres extension. It is boring. It stores vectors in Postgres, queries them in Postgres, and keeps them inside the same Postgres transactions as the rest of your data. You run one database. You use the migration tooling you already have. You query it with SQL you already know. The performance ceiling is lower than that of a dedicated system optimised for nothing but ANN search — but the teams I've talked to who hit that ceiling with pgvector are building at a scale where infrastructure complexity is genuinely their problem to manage. Most teams are not those teams.
The right question is not "what is the best vector database." It is "what is the simplest thing that handles my actual query volume, that I can operate with my existing knowledge, that does not require me to manage data consistency across two systems." The answer to that question, for most products, is Postgres.
-- pgvector: you already know how to do this
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    content text NOT NULL,
    embedding vector(1536),
    metadata jsonb,
    created_at timestamptz DEFAULT now()
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
-- Retrieval: pure SQL, same connection pool as everything else
SELECT
    id,
    content,
    metadata,
    1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
That is the retrieval layer for a production RAG system. It is a Postgres query. You already know how to read it, index it, back it up, and monitor it.
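From the application side it stays boring too. Here is a minimal sketch of the same query run through psycopg, assuming you already have the query embedding as a list of floats (producing that embedding is whatever you are already doing); pgvector accepts the vector as a plain text literal, so no special driver plumbing is required:

import psycopg

RETRIEVAL_SQL = """
    SELECT id, content, metadata,
           1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 10
"""

def retrieve(conn: psycopg.Connection, query_embedding: list[float]) -> list[dict]:
    # pgvector parses the '[0.1, 0.2, ...]' text form, so a stringified
    # Python list is enough; no extra adapter registration needed.
    vec = str(query_embedding)
    with conn.cursor() as cur:
        cur.execute(RETRIEVAL_SQL, (vec, vec))
        return [
            {"id": r[0], "content": r[1], "metadata": r[2], "similarity": r[3]}
            for r in cur.fetchall()
        ]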
Agent Frameworks Are the Same Problem, Bigger
The vector database situation is a contained example. Agent frameworks are the same problem, scaled up.
There are now a meaningful number of agent frameworks in active development: LangChain, LangGraph, AutoGen, CrewAI, Pydantic AI, and several more depending on when you are reading this. They differ in their abstractions for tool calling, memory management, multi-agent coordination, and state persistence. Some of them are good. Some of them are in the process of becoming good. All of them are new enough that you are, to some degree, a beta tester.
The alternative is to not use a framework for the parts that don't require one. The model's tool-calling API is not complicated. You define tools as JSON schemas. The model returns a function call. You route it and return the result. That is the loop. You can implement the core of it in a hundred lines of Python that you wrote, that you understand completely, that has no transitive dependencies you didn't choose.
import anthropic
import json
client = anthropic.Anthropic()
def run_agent(system: str, user_message: str, tools: list, tool_handlers: dict) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system,
            tools=tools,
            messages=messages,
        )

        # No tool use → we're done
        if response.stop_reason == "end_turn":
            return next(b.text for b in response.content if b.type == "text")

        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            handler = tool_handlers.get(block.name)
            if not handler:
                raise ValueError(f"No handler for tool: {block.name}")
            result = handler(**block.input)
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": json.dumps(result),
            })

        # Extend conversation with model turn and tool results
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
That is a complete agentic loop. No framework. No magic. Every line of it is readable by someone who has never seen it before. When it breaks, you know where to look. When you need to add a checkpoint before a destructive operation, you know exactly where to put it. When a framework update ships a breaking change to how tool results are structured, you are unaffected because you wrote the tool result handling yourself.
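Wiring it up is equally plain. A hypothetical example, with get_weather as a stand-in for whatever your real tools do; the tool definition is the standard JSON-schema shape the messages API expects:

# Hypothetical tool: a stand-in for whatever your real tools do.
def get_weather(city: str) -> dict:
    # A real handler would call a weather service; hard-coded for illustration.
    return {"city": city, "forecast": "light rain", "high_c": 17}

weather_tool = {
    "name": "get_weather",
    "description": "Get today's weather forecast for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}

answer = run_agent(
    system="You are a concise assistant. Use tools when they help.",
    user_message="Should I bring an umbrella in Lisbon today?",
    tools=[weather_tool],
    tool_handlers={"get_weather": get_weather},
)
print(answer)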
Frameworks earn their keep when they solve problems you genuinely have: complex multi-agent coordination, built-in state persistence, graph-based execution flows where you need cycle detection and conditional edges. If you have those problems, use a framework. But reach for your own loop first, and upgrade to a framework when you have a reason, not because the README has a compelling architecture diagram.
The Counterargument Is Real
I want to be honest about the case against this position, because it is not trivial.
Boring technology is not always available in the form you need. pgvector has a performance ceiling. If you are running similarity search across a hundred million vectors with sub-10ms latency requirements, you need a dedicated ANN index and the purpose-built databases are probably worth their operational cost. If your agent coordination is genuinely complex — multiple agents with heterogeneous capabilities, conditional routing based on intermediate state, nested tool calls — a framework that has solved those problems is better than reinventing it.
The real trap is not "using new technology." It is using new technology as the default rather than as the exception. When you reach for Pinecone before asking whether pgvector handles your actual query volume, you have made a choice you probably did not mean to make. The question is whether you made it consciously.
What Changes When AI Is Involved
The argument for boring technology is not new. What AI changes is the urgency of it, for a specific reason: the model is already the source of novel, hard-to-predict behavior in your system. The model hallucinates. The model handles edge cases in ways you did not anticipate. The model's output quality varies with context length, with phrasing, with temperature settings you forgot you changed. The model is a continuous source of surprises, and managing those surprises is the actual engineering work.
When the model is already the unpredictable component, adding unpredictable infrastructure around it is compounding risk. A flaky external API call in your tool chain plus a model that sometimes decides to call that tool three times in a row plus a vector database that occasionally returns inconsistent results under concurrent load is not three small problems. It is three small problems that interact in ways you cannot enumerate in advance.
Boring infrastructure shrinks the problem space. When the retrieval layer is Postgres and the queue is Redis and the API is plain HTTP, the list of things that can behave unexpectedly in hard-to-reproduce ways is shorter. You are not eliminating surprises — the model will still surprise you — but you are constraining where they can come from.
The system that is easiest to debug is not the one with the fewest components. It is the one where the largest number of components have predictable, documented behavior. Build toward that, and let the model be the interesting part.
The Heuristic I Actually Use
When evaluating a new technology for an AI product, I ask three questions before I let it into the stack:
1. What happens when this fails? Not "can it fail" — everything can fail. What does failure look like? Is it a clean error or silent corruption? Is it recoverable without data loss? Is there a runbook for it, or will I be writing one?
2. Can I replace it in a weekend? This is not about whether I will replace it. It is about whether the abstraction is thin enough that swapping the implementation does not require a rewrite. If replacing the vector store requires touching thirty files, the abstraction is wrong (there is a sketch of what thin enough looks like after this list).
3. Does my boring alternative exist and have I ruled it out? Postgres, Redis, S3, plain HTTP. If one of these handles the problem, I need a specific reason not to use it — not just a feeling that the new thing is more purpose-built.
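For the second question, "thin enough" has a concrete shape. A minimal sketch in Python, with illustrative names rather than any particular library's API: the rest of the application depends on a small Protocol, and the store-specific code lives behind it.

from typing import Protocol

class Retriever(Protocol):
    # The only retrieval surface the rest of the application sees.
    # A thin class wrapping the pgvector query from earlier satisfies it;
    # so would a Pinecone-backed one, if you ever genuinely need it.
    def search(self, query_embedding: list[float], limit: int = 10) -> list[dict]: ...

def build_context(retriever: Retriever, query_embedding: list[float]) -> str:
    # Call sites depend on the Protocol, not the store. Swapping the
    # implementation means writing one new class, not touching thirty files.
    rows = retriever.search(query_embedding)
    return "\n\n".join(row["content"] for row in rows)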
If a technology passes all three, it can earn its place in the stack. If it fails any one of them, the burden of proof for keeping it is high.
The teams that ship boring AI products are not the teams that lack ambition. They are the teams that understand where the ambition should go. The model is where the novel bets live. The model is where you spend the engineering attention on failure modes you have never seen before, on evaluation strategies that do not exist in textbooks yet, on product decisions that require genuine taste about AI behavior. That is the hard, interesting work.
Letting the infrastructure be interesting too is not ambitious. It is just expensive.
Make the retrieval layer boring. Make the queue boring. Make the API boring. Let Postgres handle the things Postgres is good at, which turn out to be most things. And spend the attention you just freed up on the part of the system that actually requires it.