Vasiliy Shilov

Stop Using LLMs for Everything: The Power of Hybrid Architectures

Over the past month, my thinking about AI systems has changed dramatically.

Many teams are quietly making the same architectural mistake:

They use LLMs for problems that should remain deterministic.

The result is predictable:

  • higher latency
  • higher cost
  • lower reliability
  • harder debugging

The irony?

Most intelligent systems don't need more AI. They need better architecture.

The common narrative today is simple:

Intelligence = large probabilistic models.

This assumption quietly pushes many teams into a dangerous design mistake: using probabilistic models for problems that should remain deterministic.

But when you start building systems that actually work reliably, a different picture appears.

Most practical systems are not purely probabilistic — they are architectures combining deterministic and probabilistic computation.

Understanding the difference between these two classes of computation turns out to be extremely important, not only for AI engineers but for system architects in general.

Where This Idea Came From

My perspective on this topic evolved through several stages.

First, years of writing software the traditional way — carefully designing deterministic systems where behavior is predictable and constraints are explicit.

Then the arrival of AI coding tools. Suddenly code generation became extremely cheap. Many tasks that used to require careful implementation could be produced instantly.

At first this felt like pure acceleration. But over time it became clear that cheap execution has a side effect: architectural drift.

This line of thinking started when I began exploring the hidden cost of cheap execution in AI-accelerated development (which I wrote about earlier in a LinkedIn post: "The Hidden Cost of Cheap Execution").

More recently I've been building tools that intentionally combine deterministic and probabilistic computation — applying probabilistic reasoning only where deterministic structure cannot reduce the problem space further.

This article summarizes the core principle behind that approach.


Two Classes of Computation

At a very high level, most computational tasks fall into two categories:

  • Deterministic computation
  • Probabilistic computation

They solve fundamentally different kinds of problems.

Property     Deterministic      Probabilistic
Output       fixed              distribution
Debugging    straightforward    statistical
Failure      explicit           uncertain
Cost         cheap              expensive
Use case     constraints        ambiguity

Deterministic Computation

Deterministic computation is what classical software engineering is built on. Given the same input, the system always produces the same output.

Examples:

  • compilers
  • parsers
  • type checkers
  • database queries
  • validation rules
  • cryptography
  • routing logic
  • protocol implementations
  • regular expressions (parsing, validation, extraction)

In deterministic systems:

output = f(input)

The function f is explicit, stable, and predictable.
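As a minimal illustration, a validation rule is such an f. The function name and pattern here are hypothetical:

```typescript
// Hypothetical deterministic rule: f is explicit and stable, so the
// same input always produces the same output.
function isValidUsername(input: string): boolean {
  // Lowercase letter first, then 2-15 letters, digits, or underscores.
  return /^[a-z][a-z0-9_]{2,15}$/.test(input);
}
```

Calling it twice with the same string can never disagree — that is the whole contract.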

Strengths

Deterministic computation is extremely powerful when:

  • rules are known
  • constraints are strict
  • correctness matters
  • behavior must be explainable
  • failure modes must be controlled

Properties:

  • predictable
  • debuggable
  • verifiable
  • cheap to run
  • safe for critical paths

This is why the core infrastructure of the digital world — databases, compilers, operating systems — is deterministic. No surprises. You can reason about it.

Limitations

Deterministic systems struggle when:

  • rules are unknown
  • inputs are ambiguous
  • the space of possibilities is huge
  • knowledge must be compressed from data

For example:

  • natural language interpretation
  • image recognition
  • semantic similarity
  • reasoning under uncertainty

These problems are hard to encode with explicit rules. That's where probability earns its place.


Probabilistic Computation

Probabilistic systems operate differently. Instead of explicit rules, they model probability distributions.

For example, a language model estimates:

P(next_token | context)

The system does not compute the answer through rules; it computes the most likely continuation.
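A toy sketch of that idea (the distribution values are invented, not from any real model): the system selects from a distribution rather than evaluating a rule.

```typescript
// Toy next-token distribution P(token | context) — values are made up.
const nextTokenDist: Record<string, number> = {
  world: 0.6,
  there: 0.3,
  everyone: 0.1,
};

// Greedy decoding: return the argmax of the distribution.
function mostLikelyToken(dist: Record<string, number>): string {
  return Object.entries(dist).reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}
```

With sampling instead of argmax, the same context can yield different outputs on each call — which is exactly where the non-determinism comes from.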

Examples:

  • language models
  • speech recognition
  • recommender systems
  • ranking models
  • anomaly detection
  • computer vision models

Probabilistic systems are extremely powerful for problems where:

  • rules are unknown
  • data is noisy
  • patterns must be inferred

Strengths

Probabilistic systems are excellent at:

  • pattern recognition
  • generalization
  • handling ambiguity
  • synthesizing new combinations
  • compressing large knowledge spaces
  • operating without a spec — when explicit rules don't exist, they are often the only option

This is why modern AI works at all. The catch: this power says nothing about where in a system it should be applied.

Limitations

But probabilistic systems have fundamental weaknesses:

  • non-deterministic outputs
  • hallucinations
  • difficulty enforcing constraints
  • limited explainability
  • cost and latency — model inference is expensive compared to deterministic logic

A regex, an if statement, or a database lookup executes in microseconds and costs essentially nothing. A model call costs money and introduces latency. At scale, this difference becomes a primary architectural constraint.

If used incorrectly, they introduce uncertainty into places where certainty is required.


The False Dichotomy

Many discussions today frame the problem incorrectly:

Should we replace deterministic systems with AI?

This is the wrong question. The real question is:

How should deterministic and probabilistic computation be composed?


Where Deterministic Computation Wins

Deterministic systems dominate when:

  • the structure is known
  • constraints exist
  • invariants must be preserved

Examples:

  • Programming languages — Compilers are deterministic for a reason. A probabilistic compiler would be catastrophic.
  • Databases — SQL engines are deterministic because queries must be correct.
  • Protocols — Network protocols rely on deterministic state machines.
  • Validation — Formats like JSON, protobuf, and schema validation require exact correctness.
  • Regular expressions — Same pattern and input always yield the same match. In hybrid systems they often do the first cut — extracting structure (dates, IDs, emails) from raw text before any LLM sees it. That reduces ambiguity and keeps the model away from tasks that don't need probability.

Where Probabilistic Computation Wins

Probabilistic systems dominate when the problem is inherently ambiguous.

Examples:

  • Natural language — Human language contains ambiguity everywhere.
  • Retrieval and ranking — Choosing the most relevant document is rarely deterministic. Ever tried to make that 100% rule-based? It doesn't scale.
  • Vision — Images are noisy and high dimensional.
  • Code synthesis — Generating new code often requires combining patterns probabilistically.

Deterministic Risk Control

Deterministic layers are where you enforce invariants and reduce risk. Probabilistic components don't get to override these rules.

  • Input validation — length, charset, schema (e.g. JSON schema). Invalid input never reaches the model.
  • Output validation — allowlists of actions, formats, or categories; length limits; PII checks. The model may suggest something, but only allowed values are executed or stored.
  • Regular expressions — extract and validate structure (emails, IDs, tags) before the model; same for checking model output against expected patterns.
  • Audit and idempotency — deterministic request IDs and idempotency keys ensure that critical actions are logged and not duplicated, regardless of model non-determinism.
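As a sketch of the last point — the field names and serialization here are assumptions, not a prescription:

```typescript
import { createHash } from "node:crypto";

// Deterministic idempotency key: the same logical request always maps
// to the same key, regardless of how many times a model proposed it.
// Note: JSON.stringify is key-order-sensitive; a production system
// would use a canonical serialization instead.
function idempotencyKey(userId: string, action: string, payload: unknown): string {
  const raw = `${userId}:${action}:${JSON.stringify(payload)}`;
  return createHash("sha256").update(raw).digest("hex");
}
```

Because the key is a pure function of the request, retries and duplicate model suggestions collapse to a single logged, executed action.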

I've seen codebases that sent every user message straight to an LLM. The bill and the latency told the story.

The rule of thumb: anything that would cause legal, safety, or data-integrity issues must be enforced in deterministic code, not in prompt engineering or "smarter" models.

Example: deterministic extraction before any LLM call:

function extractStructuredParts(userMessage: string): {
  emails: string[];
  ticketIds: string[];
  hasUrgent: boolean;
} {
  const emailRegex = /[\w.-]+@[\w.-]+\.\w+/g;
  const ticketRegex = /#(\d+)/g;
  const urgentRegex = /\b(urgent|asap|critical)\b/i;
  return {
    emails: userMessage.match(emailRegex) ?? [],
    ticketIds: [...userMessage.matchAll(ticketRegex)].map((m) => m[1]),
    hasUrgent: urgentRegex.test(userMessage),
  };
}
// Same input => same output. No model needed for this.

Example: deterministic guardrails on model output — only allowlisted actions are executed:

const ALLOWED_ACTIONS = new Set(["view", "edit", "submit", "cancel"]);

function safeExecute(modelOutput: string): string {
  const action = modelOutput.trim().toLowerCase().split(/\s+/)[0]; // e.g. "submit form"
  if (!ALLOWED_ACTIONS.has(action)) {
    return "error: unknown action"; // never pass through raw model output
  }
  return executeAction(action);
}

The Real Architecture: Hybrid Systems

The most powerful systems are hybrid. Instead of replacing deterministic computation, probabilistic models should operate inside deterministic scaffolding.

Deterministic logic defines the boundaries. Probabilistic models explore inside those boundaries. That is the metaphor worth keeping in mind.

Conceptually, the flow looks like this:

          Problem Space

┌──────────────────────────────┐
│                              │
│   Deterministic Reduction    │
│  (rules, validation, index)  │
│                              │
└──────────────┬───────────────┘
               │
               ▼
      Residual Uncertainty
               │
               ▼
     Probabilistic Reasoning
        (LLM / ML models)

Good architecture reduces the problem space deterministically before applying probabilistic intelligence.

A typical pipeline in code looks like this:

input
   │
   ▼
deterministic preprocessing
   │
   ▼
constraint reduction
   │
   ▼
retrieval / memory
   │
   ▼
probabilistic reasoning
   │
   ▼
deterministic validation
   │
   ▼
output

In code, that often looks like this:

// Hybrid: deterministic shell around probabilistic core
async function processUserRequest(raw: string): Promise<string> {
  // 1. Deterministic: normalize and validate input
  const text = raw.trim();
  if (text.length < 1 || text.length > 10000) {
    throw new Error("Invalid length");
  }

  // 2. Deterministic: extract known structure (e.g. with regex)
  const refs = [...text.matchAll(/#(\d+)/g)].map((m) => m[1]); // ticket IDs

  // 3. Probabilistic: only for the ambiguous part
  const response = await llm.generate({ context: text, refs });

  // 4. Deterministic: validate output shape and safety
  if (!response || response.length > 5000) {
    return fallbackResponse();
  }
  return response;
}

In other words:

  • Remove everything that can be solved deterministically.
  • Narrow the search space.
  • Retrieve known information.
  • Use probabilistic reasoning only for the residual uncertainty.

Ports and Adapters: Structure Decides

The same pipeline fits naturally into a port-adapter (hexagonal) view. What matters is the structure — the ports and the flow — not whether a given step is implemented deterministically or probabilistically.

          ┌─────────────────────────────────────┐
          │         Application Core            │
          │  (orchestration, use cases, ports)  │
          └──────────────────┬──────────────────┘
                             │
     ┌───────────────────────┼────────────────────────┐
     │                       │                        │
     ▼                       ▼                        ▼
┌─────────┐ ┌──────────┐ ┌────────┐ ┌──────────┐ ┌─────────┐
│ Preproc │ │ Retrieve │ │ Reason │ │ Validate │ │  Output │
│  port   │ │  port    │ │  port  │ │  port    │ │  port   │
└────┬────┘ └────┬─────┘ └────┬───┘ └────┬─────┘ └────┬────┘
     │           │            │          │            │
     ▼           ▼            ▼          ▼            ▼
  adapter     adapter      adapter     adapter      adapter
  (determ.    (vector DB,  (LLM /      (schema,     (format,
  or LLM)     or rule)     or rules)   allowlist)   log)

The core depends only on ports (interfaces). Each adapter can be deterministic or probabilistic. You can replace a deterministic preprocessor with a probabilistic one (e.g. "normalize with an LLM") or the other way around — the architecture stays the same. Structure decides; implementations are pluggable.

// Port: the core only depends on this contract
interface ReasonerPort {
  generate(ctx: { context: string; refs: string[] }): Promise<string>
}

// Adapter A: deterministic (rules, template)
class RuleBasedReasoner implements ReasonerPort {
  async generate({ context, refs }: { context: string; refs: string[] }): Promise<string> {
    return applyTemplates(context, refs) // same input => same output
  }
}

// Adapter B: probabilistic (LLM)
class LLMReasoner implements ReasonerPort {
  async generate({ context, refs }: { context: string; refs: string[] }): Promise<string> {
    return llm.generate({ context, refs }) // same input => may vary
  }
}

// Application code is identical; swap the adapter to switch behaviour
const reasoner: ReasonerPort = new RuleBasedReasoner() // or new LLMReasoner()

So: the decision of where to use deterministic vs probabilistic logic lives in the choice of adapters, not in the core. The core defines what steps exist and in what order — that is what we mean by "architecture is the multiplier."


Residual Intelligence

The Residual Intelligence Principle

Probabilistic models should solve only the residual uncertainty after deterministic reduction of the problem space.

Good architecture does not ask AI to solve everything. It asks AI to solve only what cannot be solved deterministically. Get that wrong and you're paying for intelligence you don't need.

This dramatically reduces complexity and leads to:

  • cheaper systems
  • more reliable outputs
  • fewer hallucinations
  • easier governance

Example: Code Completion

Modern IDEs illustrate this hybrid approach well. Many completions do not require LLMs. They rely on deterministic information:

  • syntax
  • types
  • symbol tables
  • project index
  • scope rules

Only when the system cannot determine a clear continuation does it use probabilistic generation. This combination is far more efficient than using an LLM everywhere.
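That fallback logic can be sketched as follows — the names and the stubbed model call are hypothetical:

```typescript
type Completion = { source: "deterministic" | "probabilistic"; suggestions: string[] };

// Try cheap, exact information first; invoke the model only when it fails.
function complete(prefix: string, symbolTable: string[]): Completion {
  const matches = symbolTable.filter((s) => s.startsWith(prefix));
  if (matches.length > 0) {
    // The symbol table answered — no model call needed.
    return { source: "deterministic", suggestions: matches };
  }
  // No clear continuation: fall back to probabilistic generation
  // (stubbed here; a real IDE would call its completion model).
  return { source: "probabilistic", suggestions: [] };
}
```

In a real editor the deterministic branch handles the vast majority of keystrokes, which is where the cost and latency savings come from.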


Extreme Cases

Understanding the extremes is also instructive: pure deterministic systems suffer from rule explosion, pure probabilistic ones from uncontrolled uncertainty.

Pure deterministic systems

  • Strengths: reliability, predictability, efficiency
  • Weaknesses: brittleness, inability to generalize, enormous rule complexity

Pure probabilistic systems

  • Strengths: flexibility, adaptability, pattern recognition
  • Weaknesses: instability, hallucinations, lack of guarantees

Most systems that "went full AI" learned that the hard way.


Architecture Is the Multiplier

The biggest performance gains rarely come from making probabilistic models bigger. They come from structuring the system correctly.

A well-designed deterministic layer can reduce the search space by orders of magnitude, so the probabilistic layer works on a much smaller and easier problem — and that is where nonlinear efficiency gains appear. One good deterministic filter can shrink the problem tenfold before the model ever runs.
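A minimal illustration of such a filter (the documents and the metadata field are invented):

```typescript
type Doc = { id: number; lang: string; text: string };

const docs: Doc[] = [
  { id: 1, lang: "en", text: "hybrid systems" },
  { id: 2, lang: "de", text: "hybride Systeme" },
  { id: 3, lang: "en", text: "pure LLM pipelines" },
];

// Cheap, exact metadata filter runs first...
const candidates = docs.filter((d) => d.lang === "en");

// ...so the expensive probabilistic ranker only ever sees the reduced set:
// rankWithModel(candidates)  // hypothetical model call
```

The filter costs microseconds; every candidate it removes is a candidate the model never has to score.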


A Different Way to Think About AI

Instead of thinking about AI as a replacement for software engineering, we can think about it as a new computational layer.

Not:

software => replaced by AI

But:

deterministic systems
      +
probabilistic models
      =
hybrid intelligent architectures

The future of intelligent systems is likely not pure AI; it is architecture — the art of deciding which parts of the system must be deterministic, and where probability should be allowed to exist.


Closing Thought

AI did not eliminate engineering. It exposed something deeper.

Execution was never the hardest problem. The real challenge has always been structuring the problem space so that expensive intelligence is used only where it is truly needed. That is the job of architecture.


One question

If you removed all deterministic layers from your system and replaced them with LLM calls...

would it become smarter — or just more expensive?
