DEV Community: Vasiliy Shilov

Inspecting @cursor/sdk: what npm installs - and what it doesn't decide for you

Vasiliy Shilov — Wed, 29 Apr 2026 20:27:33 +0000

How this started

A few hours ago I scrolled past Cursor's own post. It said they're introducing the Cursor SDK so you can build agents with the same runtime, harness, and models that power Cursor - CI/CD, end-to-end automations, agents embedded in products. The post also named several companies as examples.

My brain did the thing every tired developer does: same runtime -> I'll read the code -> I'll see how sandboxing works.

Official announcement: Cursor on LinkedIn

Spoiler: the SDK is on npm. What ships isn't human-readable application source laid out for audit - it's compiled bundles (still plain text on disk, not intended for line-by-line inspection of logic).

What I expected vs what I got

| Expectation | Reality |
| --- | --- |
| Clone-adjacent transparency | `@cursor/sdk` is compiled bundles (`dist/cjs`, `dist/esm`), not a source tree you can grep like your own repo - not intended for line-by-line inspection (nothing stops you opening the file; it's not the packaging intent) |
| "I'll verify security in the agent loop" | You verify your process boundary (user, container, network). The types hint at `local.sandboxOptions.enabled` - a boolean, not a full FS ACL |
| Open license vibes | `LICENSE.md` is short and proprietary (Anysphere, tied to Terms of Service) |

No pitchforks - just alignment: proprietary code on my laptop is a contract, not a community repo. I'm fine with that when I choose it. I had simply mixed up "SDK on npm" with "something I can read like first-party source."

What actually sits in `node_modules/@cursor`

I pinned what I saw from @cursor/sdk@1.0.9 (public beta in package.json).

`@cursor/sdk`

Name / version: @cursor/sdk 1.0.9
Description: TypeScript SDK for Cursor agents (public beta).
Entry: main -> ./dist/cjs/index.js, ESM -> ./dist/esm/index.js, types in dist
Exports: "." and "./agent" both resolve to the same built entrypoints (so import '@cursor/sdk/agent' isn't a second codebase - it's the same bundle, different export path)
Runtime deps (what npm actually installs for consumers): @bufbuild/protobuf, @connectrpc/connect, @connectrpc/connect-node, @statsig/js-client, sqlite3, zod
Optional dependencies: platform packages that pull native-ish payloads - @cursor/sdk-darwin-arm64, @cursor/sdk-darwin-x64, @cursor/sdk-linux-arm64, @cursor/sdk-linux-x64, @cursor/sdk-win32-x64 (same version pin)

On paper the dependency list is Connect RPC + protobuf + sqlite + zod + Statsig; on disk there is also a large bundled index.js for each of CJS and ESM (~5 MB each in 1.0.9). Good for shipping; not packaged for comfortable line-by-line review of behavior (the bundle is readable, but the ergonomics aren't those of a source drop).
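The same inventory can be repeated for any installed package with a small helper over a parsed package.json. This is only a sketch: `summarizeManifest` and the `Manifest` shape are my own names, and the sample object just mirrors a few fields from the listing above, with `"*"` as a placeholder for version ranges I did not record.

```typescript
// Minimal shape of the package.json fields this inventory cares about.
interface Manifest {
  name: string;
  version: string;
  dependencies?: Record<string, string>;
  optionalDependencies?: Record<string, string>;
}

interface ManifestSummary {
  id: string;
  runtimeDeps: string[];
  platformPackages: string[];
}

// Summarize what a consumer install pulls in, sorted for stable diffs.
function summarizeManifest(pkg: Manifest): ManifestSummary {
  return {
    id: `${pkg.name}@${pkg.version}`,
    runtimeDeps: Object.keys(pkg.dependencies ?? {}).sort(),
    platformPackages: Object.keys(pkg.optionalDependencies ?? {}).sort(),
  };
}

// Sample input: names come from the listing above; "*" ranges are placeholders.
const sample: Manifest = {
  name: "@cursor/sdk",
  version: "1.0.9",
  dependencies: { zod: "*", sqlite3: "*", "@connectrpc/connect": "*" },
  optionalDependencies: {
    "@cursor/sdk-linux-x64": "1.0.9",
    "@cursor/sdk-darwin-x64": "1.0.9",
  },
};

const summary = summarizeManifest(sample);
console.log(summary.id, summary.runtimeDeps, summary.platformPackages);
```

Against a real install, the input would be `JSON.parse` of the file at node_modules/@cursor/sdk/package.json; diffing the summary between versions shows when the dependency surface changes.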

Connect RPC (Buf). Dependencies like @connectrpc/connect, @connectrpc/connect-node, and @bufbuild/protobuf point at Protobuf over HTTP - the stack Buf popularised as "typed RPC without reinventing REST." That doesn't reveal application logic, but it's a stack choice: typical patterns are generated contracts and fewer stringly-typed boundaries than ad hoc JSON.

Statsig. The tree includes @statsig/js-client (Statsig's own product category is feature flags and related tooling). I'm not going to guess what it does in this SDK at runtime; it's listed here so you can check Cursor's docs and your own network traces if that matters for your policy. More on treating unknown egress below.

`@cursor/sdk-darwin-x64`

Separate tiny package, same version 1.0.9:

Description: Ripgrep binary for darwin-x64, bundled for @cursor/sdk.
Constraints: os: ["darwin"], cpu: ["x64"]
Ships: bin/**/* (plus package.json / README)

So one "platform" slice is literally ripgrep for macOS x64 - a native binary inside npm. On Linux x64 the optional package is @cursor/sdk-linux-x64 with bin/rg - same pattern. That's normal for tooling; it's also another reminder that trust isn't abstract: you're executing vendor-supplied code and vendor-supplied binaries.

License (the one file I didn't need a source map for)

From LICENSE.md in the package:

© Anysphere Inc. All rights reserved. Use is subject to Cursor's Terms of Service.

Short. Clear. Not OSI-approved openness.

Security: still not "in the box"

The SDK types expose things like local.cwd (workspace) and local.sandboxOptions.enabled - useful switches, not a full policy engine. There is no fine-grained filesystem ACL in the public types (no per-path allow/deny lists, no separate read/write policies per tree). So enabled helps but is not enough without everything below.

Privacy, telemetry, and dependencies

If @statsig/js-client (or any other listed dependency) is a concern for air-gapped use, strict CI determinism, or third-party analytics policies, the practical step is the same as for any SDK: read Cursor's docs and ToS, and validate egress in your environment (proxy, firewall logs, packet capture if your policy allows).

Egress allowlisting applies both to RPC/API/model traffic you already expect and to anything else the process might open - without assuming good or bad intent; just know what leaves the box.

Checklist: safe local use of `@cursor/sdk`

Treat the SDK as the driver; security comes from the OS, container, network, permissions, and your own guardrails (audit, approvals, limits).

Recommended baseline (any OS)

Run the agent in an isolated environment (container, VM, or dedicated user).
Use a dedicated workspace per task (cwd), not your home directory or multi-project trees with secrets.
Set local.sandboxOptions.enabled: true for local agents.
Restrict outbound network (allowlist only what you need).
Require human approval for risky actions (delete, bulk writes, destructive shell).
Log and audit prompts, tool calls, shell commands, and file changes.
Apply timeouts and step/output limits to stop runaway runs.
Pin dependencies, use lockfiles, and avoid arbitrary installs at runtime.
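For the timeout and step/output limits in that baseline, one option is a budget object that your own driver loop consults before each step. This is a sketch under my own assumptions: `RunBudget` and its fields are hypothetical names, not part of `@cursor/sdk`, and the clock is injectable only so the logic can be tested without waiting.

```typescript
interface BudgetLimits {
  maxSteps: number;
  maxOutputBytes: number;
  maxDurationMs: number;
}

type BudgetVerdict =
  | { ok: true }
  | { ok: false; reason: "timeout" | "steps" | "output" };

class RunBudget {
  private steps = 0;
  private outputBytes = 0;
  private readonly limits: BudgetLimits;
  private readonly now: () => number;
  private readonly startedAt: number;

  // `now` defaults to the wall clock; tests can pass a fake clock instead.
  constructor(limits: BudgetLimits, now: () => number = () => Date.now()) {
    this.limits = limits;
    this.now = now;
    this.startedAt = now();
  }

  // Record one finished step and how many bytes of output it produced.
  record(outputBytes: number): void {
    this.steps += 1;
    this.outputBytes += outputBytes;
  }

  // Decide whether another step may start. Timeout is checked first.
  check(): BudgetVerdict {
    if (this.now() - this.startedAt >= this.limits.maxDurationMs) {
      return { ok: false, reason: "timeout" };
    }
    if (this.steps >= this.limits.maxSteps) {
      return { ok: false, reason: "steps" };
    }
    if (this.outputBytes >= this.limits.maxOutputBytes) {
      return { ok: false, reason: "output" };
    }
    return { ok: true };
  }
}
```

The caller stops the run as soon as `check()` returns `ok: false` and logs the reason. A budget like this only bounds a cooperative loop; a hard kill still belongs to the container or process supervisor.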

Filesystem

One workspace directory per job; avoid pointing cwd at trees that hold SSH keys, cloud creds, or personal data.
Prefer read-only mounts for source and a writable scratch for artifacts only.
Run as a non-root / non-admin user dedicated to the agent.
Block access to typical secret locations unless strictly required: ~/.ssh; ~/.aws, ~/.config/gcloud, similar cloud CLI dirs; password managers, browser profiles, credential stores.
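As an extra guardrail on top of OS permissions, a tool wrapper can refuse any path that escapes the workspace. This is a minimal sketch with hypothetical names (`normalizePosix`, `isInsideWorkspace`); it handles absolute POSIX paths only and does not resolve symlinks, so it complements a read-only mount or a dedicated user rather than replacing them.

```typescript
// Resolve "." and ".." segments of an absolute POSIX path without touching disk.
function normalizePosix(absPath: string): string {
  if (!absPath.startsWith("/")) {
    throw new Error(`not an absolute path: ${absPath}`);
  }
  const out: string[] = [];
  for (const seg of absPath.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// True when `candidate` is the workspace itself or lives below it.
function isInsideWorkspace(workspace: string, candidate: string): boolean {
  const root = normalizePosix(workspace);
  const target = normalizePosix(candidate);
  if (root === "/") return true;
  // Compare with a trailing slash so "/work/42" does not admit "/work/421".
  return target === root || target.startsWith(root + "/");
}
```

A real deployment would also resolve symlinks (for example with `fs.realpath`) before comparing, because a link inside the workspace can point anywhere.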

Secrets and tokens

Prefer short-lived credentials.
Redact secrets in logs and artifacts.
Avoid baking long-lived tokens into env unless necessary; rotate regularly.

Tools and automation risk

shell is usually the highest risk; write / delete are next. Apply least privilege: disable or gate tools you do not need, and require confirmation for destructive operations.

Network

Use an egress allowlist (only package registries and APIs you actually use).
Block access to internal RFC1918 ranges and cloud metadata endpoints unless required.
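If some of your tools make HTTP calls through your own code, the same two rules can be checked in-process before a request is made. The sketch below is an assumption-heavy illustration: `isEgressAllowed` and the allowlist contents (including `api.example.com`) are hypothetical, it only inspects the URL, and it cannot see DNS names that resolve to internal addresses, so the firewall or proxy remains the real control.

```typescript
// Hosts the agent process may reach; everything else is denied.
const ALLOWED_HOSTS = new Set(["registry.npmjs.org", "api.example.com"]);

// Parse a dotted IPv4 literal into four octets, or return null.
function parseIPv4(host: string): number[] | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  const octets = parts.map((p) => (/^\d{1,3}$/.test(p) ? Number(p) : NaN));
  return octets.every((o) => o >= 0 && o <= 255) ? octets : null;
}

// RFC 1918 ranges, loopback, and link-local (which covers 169.254.169.254).
function isInternalIPv4(octets: number[]): boolean {
  const [a, b] = octets;
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function isEgressAllowed(rawUrl: string, allowed: Set<string> = ALLOWED_HOSTS): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URLs are denied
  }
  if (url.protocol !== "https:") return false;
  const ip = parseIPv4(url.hostname);
  // Internal addresses are denied even if someone adds them to the allowlist.
  if (ip !== null && isInternalIPv4(ip)) return false;
  return allowed.has(url.hostname);
}
```

Deny-by-default is the design choice here: a host that is not listed is rejected without needing a matching block rule.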

Audit and observability

Log at minimum: who / when / what prompt / which tools / which commands / what changed / outcome. Essential for incident review.
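One way to make that minimum concrete is a typed audit record that is redacted before it is written. The field names and the two secret patterns below are my own illustrative choices, not an exhaustive detector and not anything the SDK provides.

```typescript
interface AuditEntry {
  who: string;
  when: string; // ISO 8601 timestamp
  prompt: string;
  tools: string[];
  commands: string[];
  changedFiles: string[];
  outcome: "success" | "failure" | "rejected";
}

// Patterns for values that should never reach a log. Illustrative only.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/g, // AWS access key id format
  /(?<=\b(?:token|password|secret)=)\S+/gi, // key=value style secrets
];

function redact(text: string): string {
  return SECRET_PATTERNS.reduce((acc, re) => acc.replace(re, "[REDACTED]"), text);
}

// Serialize one entry as a single JSON line with free-text fields redacted.
function toAuditLine(entry: AuditEntry): string {
  const safe: AuditEntry = {
    ...entry,
    prompt: redact(entry.prompt),
    commands: entry.commands.map(redact),
  };
  return JSON.stringify(safe);
}
```

One JSON object per line keeps the log greppable and easy to ship to whatever store you already use for incident review.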

Supply chain

Commit lockfiles; pin critical versions.
Scan dependencies for known issues.
Do not allow unrestricted npm install / package pulls inside the agent runtime without review.

Pre-flight

[ ] Isolated runtime (user, container, or VM).
[ ] Narrow cwd to a dedicated workspace.
[ ] Sandbox on: local.sandboxOptions.enabled for local agent options, or sandboxOptions on createLocalExecutor (same boolean - pick the API you use).
[ ] Network restricted (allowlist).
[ ] Secrets not exposed on the agent's filesystem or env.
[ ] Dangerous tools gated or approved manually.
[ ] Logging and audit enabled.
[ ] Timeouts and step/output limits set.

Linux (short)

Dedicated system user, no sudo; per-run directory (e.g. /srv/agent-runs/<id>), minimal home for that user.
Prefer containers: read-only root, single writable mount for work; --cap-drop=ALL, no-new-privileges, resource limits.
Egress: nftables/iptables or Kubernetes/CNI policies.
Optional hardening: systemd (NoNewPrivileges, ProtectSystem, ProtectHome), cgroups / ulimits, seccomp / AppArmor / SELinux where available.

macOS (short)

Separate local user for the agent (not administrator); dedicated folder for work - do not reuse your personal home for untrusted automation.
No Linux-style containers by default, so rely on user separation plus strict permissions; consider a VM, for example a Linux VM via Lima, Colima, or UTM, for stronger boundaries.
TCC: avoid granting Full Disk Access or other broad disk access unless required; keep Keychain / iCloud Drive off reachable paths unless intentional.
Application firewall plus egress allowlist; block localhost/internal services if not needed.
Same SDK settings: tight local.cwd, local.sandboxOptions.enabled: true, with OS isolation as the real barrier.

That checklist is operations: isolation, egress, logs. API details: Cursor TypeScript SDK docs.

Admissibility (next section) sits on top of that - who may propose which actions before anything runs.

The missing layer: admissibility

This part is my framing, not something missing from a README by mistake.

The Cursor SDK gives you:

a runtime
tools (read, write, shell)
an agent loop

What you still have to supply, in any serious deployment, is:

a definition of what is allowed to happen - or accept that only culture and prompts stand between intent and execution.

One common mental model is:

Agent -> executes actions -> system changes

A stricter pattern many teams want is:

Agent -> proposes actions -> policy validates -> executor applies

The difference:

one style leans on trusting the agent within a sandbox flag
another splits policy (what may exist) from execution (what actually runs), so disallowed actions are unrepresentable or rejected before an executor touches disk or shell

That admissibility layer - policies that decide whether a proposed action may exist at all - is where I spend time in design reviews, regardless of whether the SDK bundle is open source.
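The propose / validate / apply split can be sketched in a few types. Everything here is my own illustration and none of it is the Cursor SDK's API: the action vocabulary, the `/srv/agent-runs/42/` workspace, and the command allowlist are assumptions. The point is that the executor's signature only accepts actions that went through `admit`.

```typescript
// Actions the agent may propose. The union is the whole vocabulary.
type ProposedAction =
  | { kind: "read"; path: string }
  | { kind: "write"; path: string; content: string }
  | { kind: "shell"; command: string };

// An action that passed policy. The brand is only added inside `admit`.
type Admitted = ProposedAction & { readonly __admitted: true };

type Decision =
  | { ok: true; action: Admitted }
  | { ok: false; reason: string };

const WORKSPACE = "/srv/agent-runs/42/"; // hypothetical per-run directory
const ALLOWED_COMMANDS = new Set(["npm test", "npm run lint"]);

// Policy: decides whether a proposed action may exist at all.
function admit(action: ProposedAction): Decision {
  switch (action.kind) {
    case "read":
    case "write":
      if (!action.path.startsWith(WORKSPACE) || action.path.includes("..")) {
        return { ok: false, reason: `path outside workspace: ${action.path}` };
      }
      break;
    case "shell":
      if (!ALLOWED_COMMANDS.has(action.command)) {
        return { ok: false, reason: `command not allowlisted: ${action.command}` };
      }
      break;
  }
  return { ok: true, action: { ...action, __admitted: true } as Admitted };
}

// Executor: accepts only admitted actions. Here it just records them.
function execute(action: Admitted, applied: string[]): void {
  applied.push(`${action.kind} applied`);
}

function run(proposals: ProposedAction[]): { applied: string[]; rejected: string[] } {
  const applied: string[] = [];
  const rejected: string[] = [];
  for (const proposal of proposals) {
    const decision = admit(proposal);
    if (decision.ok) execute(decision.action, applied);
    else rejected.push(decision.reason);
  }
  return { applied, rejected };
}
```

Passing a raw `ProposedAction` to `execute` is a compile error, which is the "unrepresentable" half of the idea; the runtime rejection in `admit` is the other half.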

Closing

The LinkedIn post describes the same product surface, packaged for automation. I had conflated that with shipping source I could audit line-by-line like my own repo; npm install showed published packages and a license, not a navigable source tree.

If you're embedding this on a machine that also holds sensitive data - treat it like any other proprietary runtime: isolate, limit, log, and read the license.

If readable sources or a published threat model for local execution appear later, I'll read them with interest.

The SDK is easy to ship and wire up; safety still rides on your threat model, isolation, egress, logging, and any admissibility rules outside what one npm package implies. That isn't replaced by prompts or local.sandboxOptions.enabled alone.

How do you run proprietary agents in CI - sandbox flags only, or gVisor/Firecracker (or similar) around the process?

When diffs outrun gates: admissibility, not vibes

Vasiliy Shilov — Thu, 23 Apr 2026 11:42:45 +0000

When diffs land faster than your gates can reject bad ones, the repo feels limitless.

That used to be mostly human typing speed. Now it is often assistant-proposed patches, merged on vibe when they "look right". Same geometry, faster collapse if admission lags.

It is not limitless. You are trading away room to change the thing later.

Freedom in software is not how much you can ship. It is how much you can still change safely six months out.

That is not a one-time "we fixed how we ship". It is evolution in the evolutionary architecture sense (Fowler's foreword to Ford/Parsons/Kua): the product moves in steps, and the admission regime moves with it - new gates, retired checks, shifted seams - as explicit change, not folklore baked into tribal merge habits.

Control has to scale with proposal rate - not with how fast someone can read patches. So the blunt move: replace review with rejection, opinion with admissibility. You are not rubber-stamping plausibility; you refuse invalid transitions.

The model

proposal
  -> machine constraints (reject / accept)
  -> repo state (only admissible states land)

Correctness is enforced before merge, not argued after.

If staying correct still means "someone read the whole diff", you lose at high volume. Proposal outruns reading; verification has to be mechanical.

The correction loop (this is the engine)

proposal
  -> tool says no (structured, machine-readable errors)
  -> patch again
  -> repeat until merge carries proof

The environment teaches through errors - not through a thread of explanations. Good failure messages converge; vague ones thrash.
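A toy version of that loop, assuming the checker returns structured rejections and the proposer reacts to codes rather than prose. All names here (`Rejection`, `correctionLoop`, the `E_*` codes, the `Config` example) are hypothetical; the patch function stands in for a human or an assistant.

```typescript
interface Rejection {
  code: string; // stable, machine-readable reason code
  message: string; // human-readable detail
}

type Check<P> = (proposal: P) => Rejection[];
type Patch<P> = (proposal: P, errors: Rejection[]) => P;

interface LoopResult<P> {
  accepted: boolean;
  proposal: P;
  rounds: number;
  lastErrors: Rejection[];
}

// Re-check after every patch; stop on acceptance or when the round budget is spent.
function correctionLoop<P>(initial: P, check: Check<P>, patch: Patch<P>, maxRounds: number): LoopResult<P> {
  let proposal = initial;
  let errors = check(proposal);
  let rounds = 0;
  while (errors.length > 0 && rounds < maxRounds) {
    proposal = patch(proposal, errors);
    errors = check(proposal);
    rounds += 1;
  }
  return { accepted: errors.length === 0, proposal, rounds, lastErrors: errors };
}

// Example domain: a config that must declare a timeout and cap retries.
interface Config {
  timeoutMs?: number;
  retries: number;
}

const checkConfig: Check<Config> = (c) => {
  const errs: Rejection[] = [];
  if (c.timeoutMs === undefined) {
    errs.push({ code: "E_TIMEOUT_MISSING", message: "timeoutMs is required" });
  }
  if (c.retries > 3) {
    errs.push({ code: "E_RETRIES_MAX", message: "retries must be <= 3" });
  }
  return errs;
};

const patchConfig: Patch<Config> = (c, errs) => {
  let next: Config = { ...c };
  for (const e of errs) {
    if (e.code === "E_TIMEOUT_MISSING") next = { ...next, timeoutMs: 5000 };
    if (e.code === "E_RETRIES_MAX") next = { ...next, retries: 3 };
  }
  return next;
};

const result = correctionLoop<Config>({ retries: 10 }, checkConfig, patchConfig, 5);
console.log(result.accepted, result.rounds, result.proposal);
```

The `maxRounds` cap matters as much as the codes: a proposer that cannot act on the errors should fail loudly instead of thrashing.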

The failure pattern is predictable

The problem is rarely "we shipped a bug". It is "we let implicit behavior through".

Then the same chain:

not rejected
  -> observable
  -> depended on (Hyrum)
  -> contracted
  -> expensive to fix

Plasticity drops: features keep landing, safe edits shrink.

High proposal volume makes the middle steps cheap: easy to merge plausible patches, easy to skip intent and invariants, hard for admission to keep up.

AI-assist raises proposal rate. Vibe-coding lowers the cost of acceptance - merge on feel, argue after. Together they widen the gap between output volume and proof.

You get more "looks right", less "we can defend this change".

Summary

The failure mode is unbounded admissibility.

If invalid states are representable and mergeable, something will eventually occupy them - especially under pressure. The same dynamic is spelled out as Hyrum's Law and in more depth in Software Engineering at Google, chapter 1.

Scalable control is not better prose in the PR. It is correctness you can enforce in CI.

Law

Proposal scales.
Only constraints scale correctness.

The stack (admissibility layers, not a checklist)

Each layer shrinks what can exist without proof.

1) State space - illegal states unrepresentable

If a state is representable, it will be produced.

Shrink the surface:

opaque / branded types at boundaries
ADTs + exhaustiveness so new cases do not compile until handled
total-ish functions where "partial" hides bugs

That is state topology, not typing for aesthetics.

Merge fails if: new state / case escapes the model.
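In TypeScript, the first two bullets can look like the sketch below. The `UserId` format and the order states are invented for illustration; the mechanism (a branded type created only at the boundary, and a `never` check in the `default` branch) is standard TypeScript.

```typescript
// Branded type: a plain string cannot be passed where a UserId is expected.
type UserId = string & { readonly __brand: "UserId" };

// The only way to obtain a UserId is through validation at the boundary.
// The "u_" + 6 or more lowercase alphanumerics format is a made-up example.
function parseUserId(raw: string): UserId | null {
  return /^u_[0-9a-z]{6,}$/.test(raw) ? (raw as UserId) : null;
}

// ADT: every order state is listed; there are no free-form status strings.
type OrderState =
  | { tag: "draft" }
  | { tag: "paid"; paidBy: UserId }
  | { tag: "shipped"; paidBy: UserId; trackingCode: string };

function assertNever(x: never): never {
  throw new Error(`unhandled case: ${JSON.stringify(x)}`);
}

function describeOrder(state: OrderState): string {
  switch (state.tag) {
    case "draft":
      return "draft";
    case "paid":
      return `paid by ${state.paidBy}`;
    case "shipped":
      return `shipped (${state.trackingCode})`;
    default:
      // If a new tag is added to OrderState, this line stops compiling.
      return assertNever(state);
  }
}
```

Adding a fourth variant such as `{ tag: "refunded" }` breaks the build at `assertNever(state)` until every switch handles it, which is the "merge fails" condition expressed by the compiler.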

2) Topology - graph with enforced edges

You do not architecture-review line by line. You kill invalid edges.

layer rules, forbidden imports
private / internal modules so reachability is smaller
one real way to cross a boundary (protocol, schema, bus)
automate the graph in CI (ArchUnit for Java, dependency-cruiser for TS)

Merge fails if: dependency rule or boundary breaks.
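To show the idea without tying it to a tool, here is a toy checker over a list of import edges. It is not the dependency-cruiser or ArchUnit API; `findViolations`, the rule names, and the `src/...` layout are assumptions for the example.

```typescript
interface ImportEdge {
  from: string; // importing file
  to: string; // imported file
}

// A rule forbids edges whose source matches `from` and whose target matches `to`.
interface LayerRule {
  name: string;
  from: RegExp;
  to: RegExp;
}

interface Violation {
  rule: string;
  edge: ImportEdge;
}

function findViolations(edges: ImportEdge[], rules: LayerRule[]): Violation[] {
  const out: Violation[] = [];
  for (const edge of edges) {
    for (const rule of rules) {
      if (rule.from.test(edge.from) && rule.to.test(edge.to)) {
        out.push({ rule: rule.name, edge });
      }
    }
  }
  return out;
}

const layerRules: LayerRule[] = [
  { name: "domain-must-not-import-infra", from: /^src\/domain\//, to: /^src\/infra\// },
  // Only files under src/billing/ may import from src/billing/internal/.
  { name: "no-reaching-into-internal", from: /^src\/(?!billing\/)/, to: /^src\/billing\/internal\// },
];

const edges: ImportEdge[] = [
  { from: "src/domain/order.ts", to: "src/infra/db.ts" },
  { from: "src/api/routes.ts", to: "src/billing/internal/ledger.ts" },
  { from: "src/billing/index.ts", to: "src/billing/internal/ledger.ts" },
  { from: "src/api/routes.ts", to: "src/domain/order.ts" },
];

const violations = findViolations(edges, layerRules);
console.log(violations.map((v) => `${v.rule}: ${v.edge.from} -> ${v.edge.to}`));
```

In CI the step would exit non-zero when `violations.length > 0`; a real setup would get the edge list from the tool that parses the imports rather than from a hand-written array.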

3) Contracts - what may exist

Contracts are not discovered in the patch. They are declared first; implementation proves conformance.

interface / trait as shape
OpenAPI / Protobuf / JSON Schema as truth
generated validators and clients off one spec

Merge fails if: code drifts from the declared contract.
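A compact way to see "one spec, derived validator" is to declare the contract once and derive both the static type and the runtime check from it. This hand-rolled sketch stands in for what OpenAPI / Protobuf / JSON Schema generators do at scale; `userContract`, `FromSpec`, and `validate` are hypothetical names.

```typescript
// One declaration is the source of truth for the static type and the runtime check.
const userContract = {
  id: "string",
  email: "string",
  age: "number",
} as const;

type FieldKind = "string" | "number" | "boolean";
type Spec = Record<string, FieldKind>;
type KindToType = { string: string; number: number; boolean: boolean };

// Derive the TypeScript type from the spec instead of writing it a second time.
type FromSpec<S extends Spec> = { -readonly [K in keyof S]: KindToType[S[K]] };
type User = FromSpec<typeof userContract>;

// Derive the validator from the same spec. An empty list means the value conforms.
function validate<S extends Spec>(spec: S, value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["not an object"];
  const record = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const [key, kind] of Object.entries(spec)) {
    if (typeof record[key] !== kind) problems.push(`${key}: expected ${kind}`);
  }
  for (const key of Object.keys(record)) {
    if (!(key in spec)) problems.push(`${key}: not in contract`);
  }
  return problems;
}

function isUser(value: unknown): value is User {
  return validate(userContract, value).length === 0;
}
```

Because `User` is computed from `userContract`, changing the contract changes the type, and any implementation that drifted stops compiling or fails validation.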

4) Semantic policy - beyond "it parses"

Parser green is cheap. Policy is the rest: complexity, nesting, fan-out, banned APIs.

Non-syntax observables count too: latency floors, ordering quirks, anything users can see in production. Those expectations harden like a documented field - even when the contract never promised them (same shape as implicit-performance expectations in Hyrum's Law - see Summary).

Merge fails if: budgets or deny-lists trip.
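A deliberately crude illustration of a nesting budget plus a deny-list, operating on source text. It counts braces and matches call names with a regular expression, ignores strings and comments, and is therefore only a rough signal; real projects would use an AST-based linter. `checkPolicy` and the finding codes are my own names.

```typescript
interface PolicyBudget {
  maxNesting: number;
  bannedCalls: string[];
}

interface PolicyFinding {
  code: "NESTING" | "BANNED_API";
  detail: string;
}

// Deepest level of `{}` nesting. Does not understand strings or comments.
function maxBraceDepth(source: string): number {
  let depth = 0;
  let max = 0;
  for (const ch of source) {
    if (ch === "{") {
      depth += 1;
      max = Math.max(max, depth);
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
    }
  }
  return max;
}

function checkPolicy(source: string, budget: PolicyBudget): PolicyFinding[] {
  const findings: PolicyFinding[] = [];
  const depth = maxBraceDepth(source);
  if (depth > budget.maxNesting) {
    findings.push({ code: "NESTING", detail: `depth ${depth} > ${budget.maxNesting}` });
  }
  for (const name of budget.bannedCalls) {
    // Escape regex metacharacters in the banned name, then look for `name(`.
    const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const callPattern = new RegExp(`(?<![\\w.])${escaped}\\s*\\(`);
    if (callPattern.test(source)) {
      findings.push({ code: "BANNED_API", detail: name });
    }
  }
  return findings;
}
```

ESLint's `max-depth`, `complexity`, and `no-eval` rules cover the same ground properly; the sketch only shows that the verdict is a list of coded findings, not an opinion.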

5) Fitness - health and cost, not only logic

Building Evolutionary Architectures calls this kind of thing a fitness function: dependency growth, cycles, perf, memory, "how hard is revert". That is the bill for next quarter, not today's green build.

Merge fails if: fitness budgets regress.
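A fitness budget can be as simple as comparing this run's metrics to a stored baseline with a per-metric tolerance. The metric names and numbers below are invented for the example, and all metrics are assumed to be "lower is better".

```typescript
// Metric name -> measured value. All metrics here are "lower is better".
type Metrics = Record<string, number>;

interface Regression {
  metric: string;
  baseline: number;
  current: number;
  allowed: number;
}

// `tolerance` is the allowed relative growth per metric (0.05 = 5%).
// Metrics without an entry get zero tolerance.
function findRegressions(baseline: Metrics, current: Metrics, tolerance: Record<string, number>): Regression[] {
  const out: Regression[] = [];
  for (const [metric, base] of Object.entries(baseline)) {
    const now = current[metric];
    if (now === undefined) continue; // metric not measured in this run
    const allowed = base * (1 + (tolerance[metric] ?? 0));
    if (now > allowed) {
      out.push({ metric, baseline: base, current: now, allowed });
    }
  }
  return out;
}

const regressions = findRegressions(
  { directDeps: 40, importCycles: 0, p95LatencyMs: 120 },
  { directDeps: 41, importCycles: 1, p95LatencyMs: 125 },
  { directDeps: 0.05, p95LatencyMs: 0.05 },
);
console.log(regressions.map((r) => r.metric));
```

Here only the new import cycle fails the gate; the dependency count and latency stay inside their 5% budgets. Updating the baseline should itself be an explicit, reviewed change.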

6) Decision economy - intent is not optional

Big anonymous diffs are debt. Prefer small changes, each linked to a ticket, an ADR, or an explicit statement of intent ("why now, what breaks if wrong, when removable").

If there is no machine-linkable why for the what, treat the change as invalid.

Merge fails if: trace missing.
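A minimal trace gate might look like the following. The reference formats (`ADR-12`, `PROJ-345`) and the ten-line threshold for "trivial" are assumptions; substitute whatever your tracker and team actually use.

```typescript
interface Change {
  title: string;
  body: string;
  linesChanged: number;
}

type IntentVerdict =
  | { ok: true; ref: string | null }
  | { ok: false; code: "TRACE_MISSING" };

// Accepted reference prefixes are hypothetical: architecture records and tickets.
const INTENT_REF = /\b(?:ADR|PROJ)-\d+\b/;
const TRIVIAL_LINES = 10;

function checkIntent(change: Change): IntentVerdict {
  const match = INTENT_REF.exec(`${change.title}\n${change.body}`);
  if (match) return { ok: true, ref: match[0] };
  // Small changes may pass without a link; everything else needs a trace.
  if (change.linesChanged <= TRIVIAL_LINES) return { ok: true, ref: null };
  return { ok: false, code: "TRACE_MISSING" };
}
```

The gate only proves that a link exists, not that the linked record says anything useful; that part still needs a template or a human.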

7) Feedback semantics

Verdict + reason codes + stable repro steps for the tool chain. Invest in error quality like you invest in features - bad messages waste everyone's time.

Another shift (architectural, not only procedural)

Gates reject bad merges. Good. You still lose if volatility leaks outward and becomes the contract.

Principle

Volatile parts must not be contracts. They live behind contracts - recursively:

churny implementation behind a stable module surface
churny module behind a stable service contract
vendor quirks behind an internal adapter
accidental shapes behind a semantic API

Stability is not "clean core, messy edges". Every boundary that other teams or binaries see has to stabilize meaning - not mirror today's DB row, library type, or temporary workflow. The old split is still the reference: Parnas on module criteria; for "shallow interface, deep module", Ousterhout.

When volume is high, the dangerous move is the same every time: an internal detail ships as a public shape (DTO as API, enum as protocol, error text as behavior). Assistants are especially good at surfacing whatever shape is easiest in the moment - which is exactly what you do not want on a public edge. Then the usual chain: observable -> depended -> expensive.
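As a small illustration of keeping vendor quirks behind an internal adapter, suppose a vendor SDK returns a DTO with terse names, numeric status codes, and free-text errors. The vendor shape below is entirely hypothetical; the stable `Payment` type is what the rest of the system would be allowed to see.

```typescript
// Shape returned by a hypothetical vendor SDK.
interface VendorPaymentDto {
  txn_id: string;
  st: number; // 0 = pending, 1 = ok, anything else = failure
  amt_cents: number;
  err_text?: string;
}

// Stable, semantic shape that other modules depend on.
type Payment =
  | { id: string; status: "pending"; amount: number }
  | { id: string; status: "settled"; amount: number }
  | { id: string; status: "failed"; amount: number; reason: "declined" | "unknown" };

// The adapter is the only place that knows the vendor's encoding.
function toPayment(dto: VendorPaymentDto): Payment {
  const base = { id: dto.txn_id, amount: dto.amt_cents / 100 };
  if (dto.st === 0) return { ...base, status: "pending" };
  if (dto.st === 1) return { ...base, status: "settled" };
  // Vendor error text is mapped to a closed set so callers never parse strings.
  const reason = /declin/i.test(dto.err_text ?? "") ? "declined" : "unknown";
  return { ...base, status: "failed", reason };
}
```

If the vendor renames `st` or rewords its error text, only `toPayment` changes; nothing outside the adapter ever depended on those details.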

Good architecture does not publish change. It absorbs it where the system can still afford to rewrite.

The balance (do not over-rotate)

"Hide everything behind interfaces" is how you buy a second job: adapters, mapping layers, ceremony, integration tax.

The target is not maximal isolation. The target is governed coupling:

volatile internals
stable semantics
economical integration

Contracts must be cheap enough to compose. A boundary is good if it absorbs local churn without making cross-boundary change disproportionately expensive.

Prefer a thin waist: small surface, explicit invariants, few operations, clear error semantics - not a universal wrapper over the universe.

DRY caveat

Duplication inside a volatile zone is often cheaper than a shared abstraction that freezes churn too early.

Law-shaped rule:

Don't share what is still changing.
Hide it until the semantics stabilize.

Sanity checks on any seam

Does it hide volatility, or just export the internal model?
Can you swap implementation without changing the contract? (Seams and tests around them are still the practical toolkit in Feathers.)
What does a cross-cutting change cost across two or three of these seams?
Did the adapter layer become the slowest and smartest part of the system?

Not everything should be isolated. Isolate where volatility would otherwise escape and become public truth.

The purpose of a boundary is not separation for its own sake. It is preserving cheap change for everyone outside.

Rollout (without freezing the team)

Pick 5-10 non-negotiables: invariants + boundary rules.
Encode as CI with explicit failure modes.
Require intent links for non-trivial patches.
Add fitness budgets (deps, complexity, perf, reversibility).
Trend the meta: rejections by rule, repeat offenders, cost to delete recent work, deps touched per change.

What to optimize for

Not "more lines merged". More reversible change.

Execution got cheap; mistakes did not. They got easier to commit and harder to remove.

So the question is not "how fast do we ship?" It is "do we keep the right to change this tomorrow?"

"Read every line" is not a strategy at volume.

What scales: invariants and contracts that reject invalid states before they become history.

Code Is Not the Source of Truth. It's a Materialized View.

Vasiliy Shilov — Wed, 18 Mar 2026 23:46:27 +0000

A short piece about what happens to development when code stops being the center. More of a reflection: how we think, what we fix in place, and why speed is no longer the main constraint.

Introduction: What this is about and why it matters now. The shift from "write faster" to "understand more clearly".
Code Is Not the Center: Code as cache and as a projection of decisions. Source of truth is not files but invariants and intent.
Intent and Invariants: What to fix before code and why. Decision first, then code - not the other way around.
Two Modes: Flow and serialization: when something new is born, when clarity appears. How not to get stuck packaging the past.
Comprehension Debt: We generate understanding but don't know how to keep it as a system. And what to do about it.
Equilibrium: Architecture as a balance of forces: latency - throughput, flexibility - simplicity. Change = shift in space, invariants = boundaries.
One Formula: Meaning is fixed, execution is computed, the system lives through the evolution of decisions.

1. Introduction

For the last six months I've had the sense that I'm not so much writing code as trying to get a feel for the invariants of development itself.

At first it looked like a local task: speed up development with AI without losing control. It seemed to be about tools - Cursor, models, pipelines.

Very quickly it became clear: tools aren't the problem at all.

The problem is undefined intent.

AI doesn't break architecture. It just makes its weak spots visible faster than you can paper over them. You used to be able to write by hand for a week and not notice the mess in your head. Now - ten minutes, and you already have an architectural mess in the code.

I stopped trying to "write faster". That's no longer the problem. AI simply removed it.

The problem now is different:

how well you actually understand what you're doing
how well you can say it
whether you have invariants, not just "seems fine"

And the most interesting observation: speed is no longer the constraint. The new constraint is clarity of thinking, sharpness of invariants, the ability to formalize intent.

This booklet is about where that leads: not another framework, but a shift of gravity. Code moves to the background. What we fix before code and how we keep the system within the bounds of understanding moves to the center.

In short: not "how to write better", but "how to make thinking a bit more engineering". And that turned out harder than any NestJS service.

2. Code Is Not the Center

We've lived a long time with the idea that code is the system. That if the code exists - the system exists. If we lost the code - disaster.

AI quietly broke that assumption. Not loudly, not declaratively - through practice.

Code became cheap. So cheap that losing it is no longer a tragedy. Not because code got worse, but because a layer above it appeared that used to be blurry.

Before: idea, discussions, code, workarounds, documentation (if you're lucky).

Now something else is taking shape: idea, invariants, boundaries, decision, execution, code.

That's where the main shift happens: code no longer explains the system, it only reflects it.

Code is not the asset. The asset is invariants and decisions.

Code is their projection. You can rebuild it.

If you have intent, invariants, constraints, decision graph - code becomes computable. Not in a magical sense, but in this sense: computation used to be expensive, so we fixed the result (code). Now computation is cheap - we can fix the description and rebuild the result.

A handy formula:

CHANGE_PLAN / DOSA / invariants - source of truth
code - materialized view
execution - computing system state

You're just applying to development what databases, functional systems, and distributed systems already do: schema and change log matter more than the current snapshot.
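The database analogy can be made literal with a fold over a decision log. This is my own toy model of the idea, not a definition of DOSA or CHANGE_PLAN: the event names and the `SystemView` shape are assumptions chosen to keep the example small.

```typescript
// The log of decisions is the source of truth; the view is derived from it.
type DecisionEvent =
  | { type: "invariant_added"; id: string; text: string }
  | { type: "invariant_retired"; id: string }
  | { type: "boundary_set"; module: string; owner: string };

interface SystemView {
  invariants: Record<string, string>;
  boundaries: Record<string, string>;
}

// Apply one event to a view without mutating the input.
function applyEvent(view: SystemView, event: DecisionEvent): SystemView {
  switch (event.type) {
    case "invariant_added":
      return { ...view, invariants: { ...view.invariants, [event.id]: event.text } };
    case "invariant_retired": {
      const invariants = { ...view.invariants };
      delete invariants[event.id];
      return { ...view, invariants };
    }
    case "boundary_set":
      return { ...view, boundaries: { ...view.boundaries, [event.module]: event.owner } };
  }
}

// Rebuilding the view is a fold over the log; the snapshot is disposable.
function materialize(log: DecisionEvent[]): SystemView {
  return log.reduce<SystemView>(
    (view, event) => applyEvent(view, event),
    { invariants: {}, boundaries: {} },
  );
}

const log: DecisionEvent[] = [
  { type: "invariant_added", id: "I1", text: "orders are never deleted" },
  { type: "invariant_added", id: "I2", text: "amounts are integer cents" },
  { type: "boundary_set", module: "billing", owner: "team-a" },
  { type: "invariant_retired", id: "I1" },
];

console.log(materialize(log));
```

Losing the output of `materialize` costs nothing as long as the log survives, which is the sense in which the snapshot (code) is a cache and the log (decisions) is the asset.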

One important anti-pattern: "I made it pretty but lost understanding" - that's a real anchor. So the model needs one more invariant: if you can't verify and understand the result - it's invalid, even if it's "computable".

The reality of the system is slowly moving: it used to be "files + classes + functions", it's becoming "decisions + constraints + their evolution".

3. Intent and Invariants

Clean Architecture, DDD, Ports & Adapters - they define structure and boundaries. But the lifecycle of a decision - how it is captured, constrained, and evolved - remains implicit. The closest thing we have is ADR. But ADR turns decisions into documents. What we need is to turn decisions into system primitives.

When you start explicitly fixing intent, invariants, constraints, kill criteria - you don't get documentation. You get a decision graph. And at some point it becomes obvious: code is not the center of the system. Code is a projection of decisions.

Hence the next step. If decisions are primary, they can be versioned, checked, executed, tied to metrics. Then architecture isn't a set of layers, it's a system for managing decisions under uncertainty.

In practice it looks like this. First you form the context of intent: talk to people, gather the picture. Then - a dump of thoughts, into a chat or on paper. Then structuring: not "make it pretty", but "split by intent, invariants, boundaries, steps". One plan file, one execution step at a time, a separate context per step. The model gets not "everything at once" but the minimal sufficient slice. And that gives both quality and economy: moving along meaning is cheaper than recovering meaning through search.

The main observation: a separate chat - less context -> often better quality. LLM quality is proportional to the clarity of local context, not its size. A tree model fits: root -> intent and invariants, branches -> change plans, leaves -> execution steps. That's indexing by meaning, not by text.
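The tree model translates directly into code: the context for one step is the path from the root to that leaf, and nothing from sibling branches. The node shape and the sample plan are hypothetical; how the resulting text is sent to a model is left out on purpose.

```typescript
interface PlanNode {
  id: string;
  kind: "intent" | "change_plan" | "step";
  text: string;
  children: PlanNode[];
}

// Return the chain root -> ... -> target, or null if the id is not in the tree.
function pathTo(node: PlanNode, targetId: string): PlanNode[] | null {
  if (node.id === targetId) return [node];
  for (const child of node.children) {
    const sub = pathTo(child, targetId);
    if (sub) return [node, ...sub];
  }
  return null;
}

// Context for one step: its ancestors' text plus its own, one line per node.
function contextFor(root: PlanNode, stepId: string): string {
  const path = pathTo(root, stepId);
  if (!path) throw new Error(`unknown step: ${stepId}`);
  return path.map((n) => `[${n.kind}] ${n.text}`).join("\n");
}

const plan: PlanNode = {
  id: "root",
  kind: "intent",
  text: "Keep billing changes reversible",
  children: [
    {
      id: "plan-a",
      kind: "change_plan",
      text: "Introduce ledger adapter",
      children: [
        { id: "s1", kind: "step", text: "Define adapter interface", children: [] },
        { id: "s2", kind: "step", text: "Migrate invoice writer", children: [] },
      ],
    },
    {
      id: "plan-b",
      kind: "change_plan",
      text: "Add retention policy",
      children: [{ id: "s3", kind: "step", text: "Write purge job", children: [] }],
    },
  ],
};

console.log(contextFor(plan, "s2"));
```

The slice for `s2` carries the intent and its own change plan but not `s1` or anything under `plan-b`, which is the "minimal sufficient slice" in practice.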

The human is responsible for meaning and boundaries. The system (AI + runtime) - for execution. AI shouldn't define the system, it should live inside the frame we set.

4. Two Modes

There are two modes we slip into when we think and do.

Flow / exploration. That's where something new is born. No clarity, just "feels like something's here", contradiction is allowed.

Serialization / fixation. That's where clarity, words, boundaries appear. But almost nothing new appears there anymore.

The problem isn't that the second mode is bad. The problem is that it's very sticky. It gives a sense of control, completeness - "I did good, I packaged it". And you can sit in it for hours, days, and at some point catch yourself: you haven't discovered anything in a long time, you're just neatly packing the past.

The key skill isn't "write better", it's switching between modes on purpose. Almost like a toggle: now I'm exploring, now I'm fixing.

A strict rule helps: if you're in flow - don't try to make it pretty and correct right away. At most - short markers, anchors, scraps of thought so you can come back later. And only in a separate state: ok, now let's turn this into a plan, an article, an architecture.

And an important effect: if you do it that way, serialization stops "killing the flow" and starts feeding it. You return not to a void but to a space that's already a bit structured.

CIMP and similar things are basically an attempt to make even the fixed part not die but stay part of a living thinking process. Not a "memory dump", but a frame you can build the next loop on.

5. Comprehension Debt

You talk with someone - and it's great. But if you close the chat, half the meaning just vanishes.

We generate understanding but don't know how to keep it as a system. I call this comprehension debt - like technical debt, only in understanding. I didn't coin the term; it has been around in various circles for a long time, but it is the name that works best for me.

Any conversation with AI is ephemeral. It doesn't persist as a knowledge system. You kind of understood - but that understanding isn't anchored anywhere. Hence ideas like RAG, context graphs, structured artifacts. But even that's not enough: knowledge isn't text, it's the link between decisions, invariants, and consequences.

In the new model, documentation stops being "extra work". It becomes the main interface for controlling the system. Not an afterthought, but what execution loses its anchor without.

Formulating outward - articles, posts, plans - is a way to unload your head, check what matters, and make room for new things. Not a side effect but a natural extension of the same approach: structure lowers the cognitive cost.

One more thought: if you can't hold the system as a model - it doesn't exist. Doesn't matter how many lines of code. So fixing meaning isn't bureaucracy, it's the condition for the system to exist in your head and in the heads of everyone who works with it.

6. Equilibrium

Architecture works with a limited set of concepts and meanings. They're almost deterministic, what changes is mainly context. And there are concrete forces that are always being traded off.

Technical: compute vs data, latency vs throughput, consistency vs availability.

Structural: flexibility vs simplicity, experimentation and research vs control.

Architecture isn't a choice of "what's better", it's equilibrium. Every change isn't a "feature", it's a shift of the system in the space of these forces. And if you have invariants, you limit where you can move. If you don't - the system just spreads.

It helps to think of it as vectors. The system is a point in the space of forces, a change is a shift of the vector, invariants are constraints on the allowed region. You start to see that system architecture and model architecture (the same vectors, matrices, parallel computation) aren't different worlds - they're the same math in different guises.

Splitting into two layers keeps the zones from mixing: deterministic (invariants, policies, boundaries, checkable rules) and probabilistic (AI, inference, generation, search). AI should live inside constraints, not define them. Then the human is responsible for meaning and boundaries, the system for execution.

In the end, development isn't writing code, it's managing the computation of decisions. And your workflow with plans, steps, and review stops looking like a hack and starts looking like a clean model: you generate intent, fix it, split into steps, execute one at a time, check, adjust. The main thing: you don't hold everything in your head at once. You build a system where moving along meaning is cheaper than recovering it.

7. One Formula

If you pack it all into one short formula:

Meaning is fixed. Execution is computed. The system lives through the evolution of decisions.

You're no longer trying to "write correctly". You're trying to make it so that even if you forget - the system can be restored. That's already very close to how math, physics, and stable systems in general work.

The shift is essentially:

was: code-centric engineering
becomes: decision-centric engineering

Like git for code - only for decisions. At some point I realized architecture isn't about diagrams. It's about taking a blurry thought and making it concrete enough to be checked, executed, and not lost a week later.

The final picture: not "speed up development", but make it observable, explainable, controllable, and scalable without losing meaning. Thinking becomes executable. And that's much deeper than it seems at first.

This idea doesn't stop here.

If code is not the source of truth, then:

what is?
how is it structured?
how does it evolve?

I'll explore this in the next short parts:

DOSA (Decision-Oriented System Architecture) - what happens when decisions become first-class system primitives, and architecture turns into a graph of evolving decisions rather than static structure
Context-oriented engineering - why smaller, structured context outperforms large prompts, and how to make context composable
Feedback-centric engineering - how systems close the loop between execution and understanding, and learn through decisions
System evolution - how systems become capable of change while preserving stability

Stop Using LLMs for Everything: The Power of Hybrid Architectures

Vasiliy Shilov — Sun, 08 Mar 2026 19:40:10 +0000

Over the past month my thinking about AI systems changed dramatically.

Many teams are quietly making the same architectural mistake:

They use LLMs for problems that should remain deterministic.

The result is predictable:

higher latency
higher cost
lower reliability
harder debugging

The irony?

Most intelligent systems don't need more AI. They need better architecture.

The common narrative today is simple:

Intelligence = large probabilistic models.

This assumption is what pushes teams into that mistake: probabilistic models end up handling problems that should remain deterministic.

But when you start building systems that actually work reliably, a different picture appears.

Most practical systems are not purely probabilistic — they are architectures combining deterministic and probabilistic computation.

Understanding the difference between these two classes of computation turns out to be extremely important, not only for AI engineers but for system architects in general.

Where This Idea Came From

My perspective on this topic evolved through several stages.

First, years of writing software the traditional way — carefully designing deterministic systems where behavior is predictable and constraints are explicit.

Then the arrival of AI coding tools. Suddenly code generation became extremely cheap. Many tasks that used to require careful implementation could be produced instantly.

At first this felt like pure acceleration. But over time it became clear that cheap execution has a side effect: architectural drift.

This line of thinking started when I began exploring the hidden cost of cheap execution in AI-accelerated development (which I wrote about earlier in a LinkedIn post: "The Hidden Cost of Cheap Execution").

More recently I've been building tools that intentionally combine deterministic and probabilistic computation — applying probabilistic reasoning only where deterministic structure cannot reduce the problem space further.

This article summarizes the core principle behind that approach.

Two Classes of Computation

At a very high level, most computational tasks fall into two categories:

Deterministic computation
Probabilistic computation

They solve fundamentally different kinds of problems.

Property	Deterministic	Probabilistic
Output	fixed	distribution
Debugging	straightforward	statistical
Failure	explicit	uncertain
Cost	cheap	expensive
Use case	constraints	ambiguity

Deterministic Computation

Deterministic computation is what classical software engineering is built on. Given the same input, the system always produces the same output.

Examples:

compilers
parsers
type checkers
database queries
validation rules
cryptography
routing logic
protocol implementations
regular expressions (parsing, validation, extraction)

In deterministic systems:

output = f(input)

The function f is explicit, stable, and predictable.

Strengths

Deterministic computation is extremely powerful when:

rules are known
constraints are strict
correctness matters
behavior must be explainable
failure modes must be controlled

Properties:

predictable
debuggable
verifiable
cheap to run
safe for critical paths

This is why the core infrastructure of the digital world — databases, compilers, operating systems — is deterministic. No surprises. You can reason about it.

Limitations

Deterministic systems struggle when:

rules are unknown
inputs are ambiguous
the space of possibilities is huge
knowledge must be compressed from data

For example:

natural language interpretation
image recognition
semantic similarity
reasoning under uncertainty

These problems are hard to encode with explicit rules. That's where probability earns its place.

Probabilistic Computation

Probabilistic systems operate differently. Instead of explicit rules, they model probability distributions.

For example, a language model estimates:

P(next_token | context)

The system does not compute the answer through rules; it estimates a distribution over possible continuations and picks one, typically by sampling.
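
As a rough illustration of the difference between always taking the single most likely token and drawing from the distribution, here is a minimal sketch. The tokens, the probabilities, and the helper names `mostLikely` and `sampleToken` are made up for illustration; real models work over far larger vocabularies and add temperature and other sampling controls.

```typescript
// A toy next-token distribution: P(next_token | context) for one fixed context.
// Tokens and probabilities are invented for illustration.
type Distribution = Record<string, number>;

const nextToken: Distribution = { cat: 0.6, dog: 0.3, car: 0.1 };

// Deterministic choice: always return the single most likely token.
function mostLikely(dist: Distribution): string {
  return Object.entries(dist).reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}

// Probabilistic choice: draw a token in proportion to its probability.
// `random` is injected so the draw can be made reproducible in tests.
function sampleToken(dist: Distribution, random: () => number = Math.random): string {
  const r = random();
  let cumulative = 0;
  let last = "";
  for (const [token, p] of Object.entries(dist)) {
    cumulative += p;
    last = token;
    if (r < cumulative) return token;
  }
  // Floating-point rounding can leave the total slightly below 1;
  // in that case fall back to the final token.
  return last;
}
```

With the default `Math.random`, repeated calls to `sampleToken(nextToken)` can return different tokens for the same input, which is exactly the property that makes debugging statistical rather than straightforward.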

Examples:

language models
speech recognition
recommender systems
ranking models
anomaly detection
computer vision models

Probabilistic systems are extremely powerful for problems where:

rules are unknown
data is noisy
patterns must be inferred

Strengths

Probabilistic systems are excellent at:

pattern recognition
generalization
handling ambiguity
synthesizing new combinations
compressing large knowledge spaces
when you don't have a spec, they're often the only option

This is why modern AI works at all. The catch: it doesn't tell you where to use it.

Limitations

But probabilistic systems have fundamental weaknesses:

non-deterministic outputs
hallucinations
difficulty enforcing constraints
limited explainability
cost and latency — model inference is expensive compared to deterministic logic

A regex or an if statement typically executes in microseconds, and an indexed database lookup is usually not far behind; each costs essentially nothing per call. A model call costs money and adds noticeable latency on every request. At scale, this difference becomes a primary architectural constraint.

If used incorrectly, they introduce uncertainty into places where certainty is required.

The False Dichotomy

Many discussions today frame the problem incorrectly:

Should we replace deterministic systems with AI?

This is the wrong question. The real question is:

How should deterministic and probabilistic computation be composed?

Where Deterministic Computation Wins

Deterministic systems dominate when:

the structure is known
constraints exist
invariants must be preserved

Examples:

Programming languages — Compilers are deterministic for a reason. A probabilistic compiler would be catastrophic.
Databases — SQL engines are deterministic because queries must be correct.
Protocols — Network protocols rely on deterministic state machines.
Validation — Formats like JSON, protobuf, and schema validation require exact correctness.
Regular expressions — Same pattern and input always yield the same match. In hybrid systems they often do the first cut — extracting structure (dates, IDs, emails) from raw text before any LLM sees it. That reduces ambiguity and keeps the model away from tasks that don't need probability.

Where Probabilistic Computation Wins

Probabilistic systems dominate when the problem is inherently ambiguous.

Examples:

Natural language — Human language contains ambiguity everywhere.
Retrieval and ranking — Choosing the most relevant document is rarely deterministic. Ever tried to make that 100% rule-based? It doesn't scale.
Vision — Images are noisy and high dimensional.
Code synthesis — Generating new code often requires combining patterns probabilistically.

Deterministic Risk Control

Deterministic layers are where you enforce invariants and reduce risk. Probabilistic components don't get to override these rules.

Input validation — length, charset, schema (e.g. JSON schema). Invalid input never reaches the model.
Output validation — allowlists of actions, formats, or categories; length limits; PII checks. The model may suggest something, but only allowed values are executed or stored.
Regular expressions — extract and validate structure (emails, IDs, tags) before the model; same for checking model output against expected patterns.
Audit and idempotency — deterministic request IDs and idempotency keys ensure that critical actions are logged and not duplicated, regardless of model non-determinism.

I've seen codebases that sent every user message straight to an LLM. The bill and the latency told the story.

The rule of thumb: anything that would cause legal, safety, or data-integrity issues must be enforced in deterministic code, not in prompt engineering or "smarter" models.

Example: deterministic extraction before any LLM call:

function extractStructuredParts(userMessage: string): {
  emails: string[];
  ticketIds: string[];
  hasUrgent: boolean;
} {
  const emailRegex = /[\w.-]+@[\w.-]+\.\w+/g;
  const ticketRegex = /#(\d+)/g;
  const urgentRegex = /\b(urgent|asap|critical)\b/i;
  return {
    emails: userMessage.match(emailRegex) ?? [],
    ticketIds: [...userMessage.matchAll(ticketRegex)].map((m) => m[1]),
    hasUrgent: urgentRegex.test(userMessage),
  };
}
// Same input => same output. No model needed for this.

Example: deterministic guardrails on model output — only allowlisted actions are executed:

const ALLOWED_ACTIONS = new Set(["view", "edit", "submit", "cancel"]);

// executeAction is assumed to be defined elsewhere in the application.
declare function executeAction(action: string): string;

function safeExecute(modelOutput: string): string {
  // Take only the first word, normalized: "Submit form" -> "submit"
  const action = modelOutput.trim().toLowerCase().split(/\s+/)[0];
  if (!ALLOWED_ACTIONS.has(action)) {
    return "error: unknown action"; // never pass through raw model output
  }
  return executeAction(action);
}
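
The "audit and idempotency" item from the list above can be sketched in the same spirit. This is a minimal in-memory version under stated assumptions: the `ActionRequest` shape, the `idempotencyKey` and `executeOnce` helpers, and the use of a `Set` plus an array as the store are all hypothetical; a real system would keep keys and audit entries in durable storage.

```typescript
// Hypothetical request shape; field names are assumptions for illustration.
interface ActionRequest {
  userId: string;
  action: string;
  targetId: string;
}

// Deterministic key: the same logical request always maps to the same string,
// no matter how the model phrased its suggestion. JSON.stringify on an array
// avoids ambiguity when a field happens to contain a separator character.
function idempotencyKey(req: ActionRequest): string {
  return JSON.stringify([req.userId, req.action.trim().toLowerCase(), req.targetId]);
}

const auditLog: string[] = [];
const seenKeys = new Set<string>();

// Runs `execute` at most once per key and records every attempt in the log.
function executeOnce(req: ActionRequest, execute: () => void): "executed" | "duplicate" {
  const key = idempotencyKey(req);
  if (seenKeys.has(key)) {
    auditLog.push(`duplicate ${key}`);
    return "duplicate";
  }
  seenKeys.add(key);
  auditLog.push(`executed ${key}`);
  execute();
  return "executed";
}
```

If the model suggests the same action twice with slightly different casing or whitespace, the second attempt is logged but not executed.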

The Real Architecture: Hybrid Systems

The most powerful systems are hybrid. Instead of replacing deterministic computation, probabilistic models should operate inside deterministic scaffolding.

Deterministic logic defines the boundaries. Probabilistic models explore inside those boundaries. That is the metaphor worth keeping in mind.

Conceptually, the flow looks like this:

          Problem Space

┌──────────────────────────────┐
│                              │
│   Deterministic Reduction    │
│  (rules, validation, index)  │
│                              │
└──────────────┬───────────────┘
               │
               ▼
      Residual Uncertainty
               │
               ▼
     Probabilistic Reasoning
        (LLM / ML models)

Good architecture reduces the problem space deterministically before applying probabilistic intelligence.

A typical pipeline looks like this:

input
   │
   ▼
deterministic preprocessing
   │
   ▼
constraint reduction
   │
   ▼
retrieval / memory
   │
   ▼
probabilistic reasoning
   │
   ▼
deterministic validation
   │
   ▼
output

In code, that often looks like this:

// Hybrid: deterministic shell around probabilistic core
// (`llm` and `fallbackResponse` are assumed to be provided elsewhere)
async function processUserRequest(raw: string): Promise<string> {
  // 1. Deterministic: normalize and validate input
  const text = raw.trim()
  if (text.length < 1 || text.length > 10000) {
    throw new Error('Invalid length')
  }

  // 2. Deterministic: extract known structure (e.g. with regex)
  const refs = [...text.matchAll(/#(\d+)/g)].map((m) => m[1]) // ticket IDs

  // 3. Probabilistic: only for the ambiguous part
  const response = await llm.generate({ context: text, refs })

  // 4. Deterministic: validate output (here: non-empty and within a length cap)
  if (!response || response.length > 5000) {
    return fallbackResponse()
  }
  return response
}

In other words:

Remove everything that can be solved deterministically.
Narrow the search space.
Retrieve known information.
Use probabilistic reasoning only for the residual uncertainty.

Ports and Adapters: Structure Decides

The same pipeline fits naturally into a port-adapter (hexagonal) view. What matters is the structure — the ports and the flow — not whether a given step is implemented deterministically or probabilistically.

          ┌─────────────────────────────────────┐
          │         Application Core            │
          │  (orchestration, use cases, ports)  │
          └──────────────────┬──────────────────┘
                             │
     ┌───────────────────────┼────────────────────────┐
     │                       │                        │
     ▼                       ▼                        ▼
┌─────────┐ ┌──────────┐ ┌────────┐ ┌──────────┐ ┌─────────┐
│ Preproc │ │ Retrieve │ │ Reason │ │ Validate │ │  Output │
│  port   │ │  port    │ │  port  │ │  port    │ │  port   │
└────┬────┘ └────┬─────┘ └────┬───┘ └────┬─────┘ └────┬────┘
     │           │            │          │            │
     ▼           ▼            ▼          ▼            ▼
  adapter     adapter      adapter     adapter      adapter
  (determ.    (vector DB,  (LLM /      (schema,     (format,
  or LLM)     or rule)     or rules)   allowlist)   log)

The core depends only on ports (interfaces). Each adapter can be deterministic or probabilistic. You can replace a deterministic preprocessor with a probabilistic one (e.g. "normalize with an LLM") or the other way around — the architecture stays the same. Structure decides; implementations are pluggable.

// Port: the core only depends on this contract
interface ReasonerPort {
  generate(ctx: { context: string; refs: string[] }): Promise<string>
}

// Adapter A: deterministic (rules, template)
class RuleBasedReasoner implements ReasonerPort {
  async generate({ context, refs }: { context: string; refs: string[] }): Promise<string> {
    return applyTemplates(context, refs) // same input => same output
  }
}

// Adapter B: probabilistic (LLM)
class LLMReasoner implements ReasonerPort {
  async generate({ context, refs }: { context: string; refs: string[] }): Promise<string> {
    return llm.generate({ context, refs }) // same input => may vary
  }
}

// Application code is identical; swap the adapter to switch behaviour
const reasoner: ReasonerPort = new RuleBasedReasoner() // or new LLMReasoner()

So: the decision of where to use deterministic vs probabilistic logic lives in the choice of adapters, not in the core. The core defines what steps exist and in what order — that is what we mean by "architecture is the multiplier."
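
To show what "the core defines the steps and their order" might look like, here is a minimal sketch of a core function that depends only on ports. The port names (`PreprocessPort`, `ReasonPort`, `ValidatePort`), the `handleRequest` function, and the `"fallback"` string are illustrative assumptions, not part of any framework.

```typescript
// Ports: the contracts the core depends on. Names are illustrative.
interface PreprocessPort { normalize(raw: string): string }
interface ReasonPort { generate(text: string): Promise<string> }
interface ValidatePort { isAcceptable(output: string): boolean }

interface Ports {
  preprocess: PreprocessPort;
  reason: ReasonPort;
  validate: ValidatePort;
}

// Core: fixes which steps exist and in what order; knows nothing about adapters.
async function handleRequest(raw: string, ports: Ports): Promise<string> {
  const text = ports.preprocess.normalize(raw);
  const draft = await ports.reason.generate(text);
  return ports.validate.isAcceptable(draft) ? draft : "fallback";
}

// Deterministic adapters, handy for tests. An LLM-backed ReasonPort could be
// swapped in here without touching handleRequest.
const testPorts: Ports = {
  preprocess: { normalize: (raw) => raw.trim().toLowerCase() },
  reason: { generate: async (text) => `echo: ${text}` },
  validate: { isAcceptable: (output) => output.length <= 20 },
};
```

Because the core receives its ports as an argument, the same `handleRequest` can run against fully deterministic adapters in tests and against a model-backed adapter in production.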

Residual Intelligence

The Residual Intelligence Principle

Probabilistic models should solve only the residual uncertainty after deterministic reduction of the problem space.

Good architecture does not ask AI to solve everything. It asks AI to solve only what cannot be solved deterministically. Get that wrong and you're paying for intelligence you don't need.

This dramatically reduces complexity and leads to:

cheaper systems
more reliable outputs
fewer hallucinations
easier governance

Example: Code Completion

Modern IDEs illustrate this hybrid approach well. Many completions do not require LLMs. They rely on deterministic information:

syntax
types
symbol tables
project index
scope rules

Only when the system cannot determine a clear continuation does it use probabilistic generation. This combination is far more efficient than using an LLM everywhere.
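
A deterministic-first completion path could be sketched roughly as follows. This is not how any particular IDE is implemented: `completeFromSymbols`, `complete`, and the `askModel` callback are hypothetical names, and "no symbol matches the prefix" is just one simple way to define "cannot determine a clear continuation".

```typescript
// Stand-in for a model-backed completion call; purely hypothetical.
type ModelFallback = (prefix: string) => string[];

interface Completion {
  source: "symbols" | "model";
  items: string[];
}

// Deterministic step: look the prefix up among the symbols known to be in scope.
function completeFromSymbols(prefix: string, symbolsInScope: string[]): string[] {
  return symbolsInScope.filter((s) => s.startsWith(prefix)).sort();
}

// Use the symbol table whenever it yields a match; call the model only
// when the deterministic lookup comes back empty.
function complete(prefix: string, symbolsInScope: string[], askModel: ModelFallback): Completion {
  const known = completeFromSymbols(prefix, symbolsInScope);
  if (known.length > 0) {
    return { source: "symbols", items: known };
  }
  return { source: "model", items: askModel(prefix) };
}
```

In this sketch the model callback is never invoked for prefixes the symbol table can answer, so the common case stays cheap and reproducible.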

Extreme Cases

Understanding the extremes is also instructive: pure deterministic systems suffer from rule explosion, pure probabilistic ones from uncontrolled uncertainty.

Pure deterministic systems

Strengths: reliability, predictability, efficiency
Weaknesses: brittleness, inability to generalize, enormous rule complexity

Pure probabilistic systems

Strengths: flexibility, adaptability, pattern recognition
Weaknesses: instability, hallucinations, lack of guarantees

Most systems that "went full AI" learned that the hard way.

Architecture Is the Multiplier

The biggest performance gains rarely come from making probabilistic models bigger. They come from structuring the system correctly.

A well-designed deterministic layer can reduce the search space by orders of magnitude, so the probabilistic layer works on a much smaller and easier problem — and that is where nonlinear efficiency gains appear. One good deterministic filter can shrink the problem tenfold before the model ever runs.
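
A minimal sketch of that idea: apply hard constraints first, then spend the expensive scorer only on what survives. The `Doc` shape, the `lang` and `minYear` constraints, and the `Scorer` callback (standing in for an embedding or LLM call) are assumptions chosen for illustration.

```typescript
interface Doc {
  id: string;
  lang: string;
  year: number;
  text: string;
}

// Stand-in for an expensive probabilistic scorer (e.g. an embedding or LLM call).
type Scorer = (query: string, doc: Doc) => number;

// Deterministic reduction: hard constraints that need no model.
function reduceCandidates(docs: Doc[], lang: string, minYear: number): Doc[] {
  return docs.filter((d) => d.lang === lang && d.year >= minYear);
}

// Score only the surviving candidates and report how many scorer calls were made.
function rank(
  query: string,
  docs: Doc[],
  lang: string,
  minYear: number,
  score: Scorer,
): { ranked: Doc[]; scoredCount: number } {
  const candidates = reduceCandidates(docs, lang, minYear);
  const scored = candidates.map((doc) => ({ doc, value: score(query, doc) }));
  scored.sort((a, b) => b.value - a.value); // highest score first
  return { ranked: scored.map((s) => s.doc), scoredCount: scored.length };
}
```

`scoredCount` makes the saving visible: it is the number of scorer calls actually paid for, compared with `docs.length` if every document had been sent to the model.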

A Different Way to Think About AI

Instead of thinking about AI as a replacement for software engineering, we can think about it as a new computational layer.

Not:

software => replaced by AI

But:

deterministic systems
      +
probabilistic models
      =
hybrid intelligent architectures

The future of intelligent systems is likely not pure AI; it is architecture — the art of deciding which parts of the system must be deterministic, and where probability should be allowed to exist.

Closing Thought

AI did not eliminate engineering. It exposed something deeper.

Execution was never the hardest problem. The real challenge has always been structuring the problem space so that expensive intelligence is used only where it is truly needed. That is the job of architecture.

One question

If you removed all deterministic layers from your system and replaced them with LLM calls...

would it become smarter — or just more expensive?

Clean Architecture in the Age of AI: Preventing Architectural Liquefaction

Vasiliy Shilov — Mon, 02 Mar 2026 00:21:21 +0000

AI has made execution cheap; models optimize locally, not for architecture. In many teams the side effect is not bad code or broken builds, but something more structural: architectural liquefaction.

Architectural liquefaction is the progressive loss of structural boundaries under sustained probabilistic code generation and accelerated change cycles. It does not happen in one PR — layer boundaries soften, dependencies cross the wrong way, contracts drift, invariants weaken, "temporary" shortcuts pile up. Everything still works. Until the cost of change quietly multiplies. Without explicit constraints, entropy grows as we ship faster.

Clean Architecture is often described as a layering discipline. But in the context of AI-assisted development, it may serve a different purpose: a deterministic shell around probabilistic execution. Not dogma, not aesthetic preference — a stabilizing mechanism. When boundaries are explicit and dependency direction is enforced:

The solution space narrows.
Drift becomes detectable.
Structural violations surface earlier.
Local optimization cannot silently destroy global design.

The architecture becomes a control surface.

Before AI, architectural violations required effort. A developer had to consciously decide to break a boundary.

Now, violations can be generated in seconds.

And because AI-generated code often "looks right", structural erosion is harder to notice. The real cost is not bad code in the moment; it's that the drift stays invisible until you hit a refactor that suddenly touches half the codebase. One more thing: the more "flexible" and underspecified your prompts and rules are, the faster liquefaction tends to happen, because the model fills in the gaps in whatever direction is locally easiest.

I once wrote down all our architectural principles — boundaries, dependency rules, what lives where — into a docs/ folder in plain Markdown, then wired them into Cursor as project rules so they get injected into every prompt.

tree ./docs/
.
├── ARCHITECTURAL-STYLE-GUIDE.md
├── CLEAN-NEST-APP.md
├── architecture
│   ├── adapters.md
│   ├── core.md
│   ├── controllers.md
│   ├── events.md
│   ├── inter-module-communication.md
│   ├── modules.md
│   ├── structure.md
│   ├── testing.md
│   └── when-to-simplify.md
└── guides
    ├── cheat-sheet.md
    ├── common-patterns.md
    └── quick-start.md

Before that, Cursor would often put repository calls straight into controllers or leak infrastructure imports into the domain layer — it just followed the patterns it saw in the codebase. After the rules were in place, it started routing through use cases and keeping adapters out of core. Still not perfect: sometimes it over-engineers or picks the wrong abstraction. But the rate of cross-layer violations dropped sharply. The model had something to optimize for instead of only optimizing for "code that runs".

That is one data point. It fits the hypothesis: explicit boundaries plus enforcement reduce structural drift, even when the code is AI-generated.

To make this testable we'd need drift metrics (e.g. dependency violations, cross-layer calls), review cost over time, and refactor scope when fixing violations. The hypothesis would be falsified if teams with strict rules drift as much as others, or if review and refactor cost keep growing despite enforcement. I'm preparing concrete ways to define and track these — drift metrics and cost — for follow-up posts.
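
One possible starting point for a drift metric is counting imports that point the wrong way across layers. The sketch below is an assumption-heavy illustration: the layer names, the `src/<layer>/...` path convention, and the helpers `layerOf`, `findViolations`, and `violationRatio` are all hypothetical, and the import edges would have to come from whatever dependency analysis a project already has.

```typescript
// Layers ordered from innermost to outermost; names are assumptions.
const LAYER_ORDER = ["domain", "application", "infrastructure"] as const;
type Layer = (typeof LAYER_ORDER)[number];

// One import statement: file `from` imports file `to`.
interface ImportEdge {
  from: string;
  to: string;
}

// Hypothetical convention: the layer is the path segment right after "src/".
function layerOf(path: string): Layer | undefined {
  const segment = path.split("/")[1];
  return LAYER_ORDER.find((layer) => layer === segment);
}

// A violation is an import pointing outward: an inner layer depending on an outer one.
// Edges touching files outside the known layers are ignored.
function findViolations(edges: ImportEdge[]): ImportEdge[] {
  return edges.filter(({ from, to }) => {
    const source = layerOf(from);
    const target = layerOf(to);
    if (source === undefined || target === undefined) return false;
    return LAYER_ORDER.indexOf(source) < LAYER_ORDER.indexOf(target);
  });
}

// Share of all import edges that violate the dependency direction.
function violationRatio(edges: ImportEdge[]): number {
  return edges.length === 0 ? 0 : findViolations(edges).length / edges.length;
}
```

Tracked per commit or per week, a ratio like this would give the hypothesis something to be tested against: if it keeps rising despite enforced rules, the rules are not doing their job.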

Clean Architecture is usually framed as boundaries, inward dependencies, business logic isolated from the rest. True enough — but in an AI-heavy workflow the useful way to see it is: probabilistic execution, deterministic governance. We are not removing uncertainty. We are putting a box around it so that the model's choices stay inside the box. The architecture becomes the box.

If you are using AI heavily in development: are your boundaries getting stronger or weaker? Is the cost of keeping the structure in your head going up or down? I don't have a conclusion yet — only a hypothesis. AI has optimized execution; whether we've optimized stability, or are just producing entropy faster, is open. Obvious structures are often the first to dissolve when everything speeds up. In the next posts I'll look at other ways to keep things from liquefying.
