Bala Paranj

Posted on May 24

Thoughtworks Just Moved LangGraph Out of Adopt. Our Security Tool Was Already Built for What Comes Next.

#security #ai #agents #architecture

The April 2026 issue of the Thoughtworks Technology Radar moved LangGraph out of Adopt. The reasoning is an architecture argument:

Instead of starting with a rigid graph and a massive shared state, this approach favors simple agents communicating through code execution, with graph structures added later when needed. … Because each agent only has access to the state it needs, reasoning, testing and debugging become easier.

Two claims are doing the work there:

Communicate through code execution, not a shared mutable blackboard.
Each agent gets only the state it needs, not a view into one global object.

We build a cloud-security evaluator: a CLI, not an agent framework. We never wrote a graph orchestrator or a shared-state store. But when we recently traced an agent-driven workflow through our tool end to end, we realized we had arrived, by accident, at exactly the pattern the Radar now recommends. We got there through an older discipline: make every capability a deterministic command that reads files and writes files.

The workflow that gave it away

Here is a four-agent task an engineer might hand to an LLM orchestrator, using our tool's commands:

Engineer: "Connect Steampipe to AWS and produce observations for S3 and IAM."
Agent 1:   reads contracts/steampipe/aws_s3_bucket.yaml
           reads contracts/steampipe/aws_iam_role.yaml
           queries Steampipe, transforms, validates  →  observations/

Engineer: "Evaluate and show me compound risks."
Agent 2:   stave apply   → findings.json
           stave gaps    → gap-report.json

Engineer: "Prove whether anonymous access to PHI is reachable."
Agent 3:   reads reasoning-specs/.../z3-public-read-bucket/spec.yaml
           stave export-sir  → SMT-LIB facts
           follows the spec  → SAT / UNSAT

Engineer: "Map findings to HIPAA Technical Safeguards."
Agent 4:   reads the compliance crosswalk
           stave export compliance --framework hipaa  → status report

Look at what is not here:

No shared AgentState object threaded through a graph.
No orchestrator that has to know about all four steps in advance.
No agent that can read or corrupt another agent's working memory.

Each agent's state is exactly the slice of the filesystem its job requires. Agent 1 touches Steampipe contracts and writes observations. Agent 3 touches a reasoning spec and the exported facts. Agent 3 has no idea Agent 1 exists — it consumes a snapshot, not Agent 1's internal variables. The integration surface between agents is the thing every engineer already knows how to inspect, diff, and version: files and exit codes.

That is "communicate through code execution" in its most literal form. The agent runs a command; the command's output is the message.
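In code, "the command's output is the message" needs very little machinery. The sketch below is a minimal Python illustration, not part of our tool: a hypothetical run_step helper runs one command, captures its stdout, and returns the exit code alongside it. To stay self-contained it runs the Python interpreter as a stand-in command rather than stave, so no real stave flags are assumed.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResult:
    """Everything one agent step hands to the next: an exit code and its output."""
    exit_code: int
    stdout: str


def run_step(argv: list[str]) -> StepResult:
    """Hypothetical helper: run one command and return its exit code and stdout.

    Nothing goes in except the argument list; nothing comes out except the result.
    """
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    return StepResult(exit_code=proc.returncode, stdout=proc.stdout)


if __name__ == "__main__":
    # Stand-in command: the interpreter prints a tiny JSON document and exits 0.
    result = run_step([sys.executable, "-c", 'print(\'{"findings": 0}\')'])
    print(result.exit_code, result.stdout.strip())
```

The next agent never sees the previous agent's variables, only this result (or the files the command wrote).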

Why "the state it needs" falls out for free

In a global-shared-state design, scoping is something you have to impose — you write reducers, you namespace keys, you hope no node mutates a field another node depends on. The Radar's critique is that this is where reasoning and debugging go to die: when everything can touch everything, a wrong value has N possible authors.

Our agents get scoped state without anyone designing scoping, because the unit of work is a single-responsibility command over an explicit input:

LangGraph-as-default	Command-and-file composition
One graph, defined up front	No graph; agents call commands ad hoc
Global shared state object	State = the files each command reads
Scoping is engineered (reducers, namespaces)	Scoping is the command's argument list
A bad value has many possible authors	A bad value came from one command's input
Test a node by mocking the whole state	Test a command with an input file

stave gaps cannot accidentally read the reasoning spec. stave export compliance cannot mutate the findings. Not because we forbade it — because those things were never in scope. The argument list is the scope.
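One way to picture "the argument list is the scope" is a command written as a function whose only inputs are the paths it is handed. The names here (count_gaps, and the control_id and status fields) are hypothetical and do not describe the real stave gaps format. The point is the test: it needs nothing but an input file, which is the last row of the comparison table.

```python
import json
import tempfile
from pathlib import Path


def count_gaps(findings_path: Path, out_path: Path) -> int:
    """Hypothetical single-responsibility command.

    It can only see what its arguments point at: one input file in,
    one output file out. Returns a CLI-style exit code (1 = gaps found,
    an assumed convention for this sketch).
    """
    findings = json.loads(findings_path.read_text())
    failing = sorted(f["control_id"] for f in findings if f["status"] == "fail")
    report = {"gap_count": len(failing), "controls": failing}
    out_path.write_text(json.dumps(report, sort_keys=True, indent=2) + "\n")
    return 1 if failing else 0


def test_count_gaps() -> None:
    # Testing the command means writing an input file, not mocking a global state.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "findings.json"), Path(tmp, "gap-report.json")
        src.write_text(json.dumps([
            {"control_id": "S3.2", "status": "fail"},
            {"control_id": "IAM.1", "status": "pass"},
        ]))
        assert count_gaps(src, dst) == 1
        assert json.loads(dst.read_text()) == {"gap_count": 1, "controls": ["S3.2"]}


if __name__ == "__main__":
    test_count_gaps()
    print("ok")
```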

The hidden requirement: determinism

Here is the part that is easy to miss. "Agents communicate through code execution" only works if a command's output is trustworthy as a message, which means the same input must always produce the same output. If Agent 2's stave apply returned subtly different findings on each run, Agent 4's compliance mapping would be built on sand, and you would be back to debugging a distributed system whose state cannot be reproduced.

We made determinism a founding rule long before any of this was about agents: same inputs + same --now produce byte-identical output. Snapshots instead of live API calls. Time as an explicit input. Sorted, canonical JSON. A byte-for-byte verification command and golden tests that fail the build the instant output drifts.
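A minimal Python sketch of two of those rules, time passed in explicitly instead of read from the clock, and canonical JSON (sorted keys, sorted records, fixed separators), looks like this. It illustrates the rules under those assumptions and is not our implementation; render_findings and its record shape are hypothetical.

```python
import json
from datetime import datetime, timezone


def _canonical(obj) -> str:
    # One fixed encoding per value: sorted keys, no optional whitespace, ASCII only.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def render_findings(findings: list[dict], now: datetime) -> bytes:
    """Serialize findings so the same inputs always yield the same bytes.

    `now` is an explicit argument (the analogue of a --now flag); the
    function never reads the system clock itself.
    """
    # Sorting by each record's canonical encoding gives a total order,
    # so the input order of the list cannot leak into the output.
    ordered = sorted(findings, key=_canonical)
    doc = {
        "evaluated_at": now.astimezone(timezone.utc).isoformat(),
        "findings": ordered,
    }
    return _canonical(doc).encode("utf-8")


if __name__ == "__main__":
    now = datetime(2026, 5, 24, tzinfo=timezone.utc)
    a = [{"asset": "bucket-b", "control": "S3.1"}, {"control": "S3.1", "asset": "bucket-a"}]
    b = list(reversed(a))  # same content, different order
    assert render_findings(a, now) == render_findings(b, now)
    print(render_findings(a, now).decode())
```

A golden test then only has to compare these bytes against a checked-in file.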

That rule turns out to be the thing that makes code-execution composition safe:

An agent can cache and reuse a prior step's output, because re-running would produce the same bytes.
An agent can verify a claim by re-deriving it — export-sir on the same snapshot yields the same facts, so the SAT/UNSAT proof is reproducible, not a one-time oracle reading.
A human can debug the pipeline by running any single command in isolation and getting the exact output the agent saw.
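Because a deterministic step is a pure function of its input bytes and its explicit time, a caller can key a cache on a hash of exactly those inputs, and can check a cached claim by re-running the step and comparing bytes. The sketch assumes steps shaped as step(input_bytes, now_iso) -> bytes; StepCache and toy_step are hypothetical names for illustration only.

```python
import hashlib
from typing import Callable

Step = Callable[[bytes, str], bytes]


def cache_key(step_name: str, input_bytes: bytes, now_iso: str) -> str:
    """Key derived only from what the step is allowed to read."""
    h = hashlib.sha256()
    for part in (step_name.encode(), now_iso.encode(), input_bytes):
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps parts unambiguous
        h.update(part)
    return h.hexdigest()


class StepCache:
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}
        self.runs = 0  # how many times a step actually executed

    def run(self, name: str, step: Step, input_bytes: bytes, now_iso: str) -> bytes:
        """Reuse a prior output when the inputs are identical."""
        key = cache_key(name, input_bytes, now_iso)
        if key not in self._store:
            self.runs += 1
            self._store[key] = step(input_bytes, now_iso)
        return self._store[key]

    def verify(self, name: str, step: Step, input_bytes: bytes, now_iso: str) -> bool:
        """Re-derive the output and compare it byte for byte with the cached copy."""
        cached = self.run(name, step, input_bytes, now_iso)
        return step(input_bytes, now_iso) == cached


def toy_step(input_bytes: bytes, now_iso: str) -> bytes:
    # Deterministic stand-in for a real command: output depends only on its inputs.
    return now_iso.encode() + b"|" + bytes(sorted(input_bytes))


if __name__ == "__main__":
    cache = StepCache()
    snap, t = b"snapshot-v1", "2026-05-24T00:00:00Z"
    assert cache.run("apply", toy_step, snap, t) == cache.run("apply", toy_step, snap, t)
    assert cache.runs == 1
    assert cache.verify("apply", toy_step, snap, t)
```

If a step were not deterministic, verify would return False, which is exactly the signal that caching its output is unsafe.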

A global mutable graph state gives you none of this. A value in the blackboard has no provenance — you can't re-derive it, you can only trust that whatever node wrote it was correct. Every fact our tool emits, by contrast, carries a deterministic id that traces back through the export and the projector to the specific observation property that produced it. Provenance is a property of the architecture, not a logging afterthought.
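A deterministic id can be as simple as a hash over the coordinates that produced a fact. The sketch below is hypothetical and is not our actual id scheme: fact_id hashes a snapshot digest, an asset id, and a property path, so the same observation property always yields the same id, and a claimed origin can be checked by recomputing the id instead of trusting it.

```python
import hashlib
import json


def fact_id(snapshot_digest: str, asset_id: str, property_path: str) -> str:
    """Hypothetical provenance id: a stable hash of where the fact came from."""
    origin = json.dumps(
        {"snapshot": snapshot_digest, "asset": asset_id, "property": property_path},
        sort_keys=True,
        separators=(",", ":"),
    )
    # Truncated to 16 hex characters to keep the illustration readable.
    return "fact_" + hashlib.sha256(origin.encode("utf-8")).hexdigest()[:16]


def traces_to(fid: str, snapshot_digest: str, asset_id: str, property_path: str) -> bool:
    """Check a claimed origin by re-deriving the id."""
    return fid == fact_id(snapshot_digest, asset_id, property_path)


if __name__ == "__main__":
    origin = ("sha256:abc123", "arn:aws:s3:::phi-bucket", "public_access_block.block_public_acls")
    fid = fact_id(*origin)
    print(fid)
    assert traces_to(fid, *origin)
    assert not traces_to(fid, "sha256:abc123", "arn:aws:s3:::other-bucket", origin[2])
```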

We made this stronger by deleting things

The most counterintuitive move was subtraction. Over the last stretch we removed the commands that didn't fit "snapshots in → findings out": continuous monitoring, remediation planning, incident timelines, external enrichment, multi-account orchestration. Those are real jobs — but they are orchestration jobs, and orchestration must be owned by the calling agent (or CI, or a scheduler).

In LangGraph terms, we resisted the temptation to grow our own graph. We kept the tool a set of leaf functions — evaluate, export, prove, map — and let the agent be the graph, when a graph is even needed. That mirrors the Radar's final point: add graph structure later, when the use case demands it, instead of paying for it everywhere up front.
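When the caller owns the graph, the simplest graph is an ordered list of commands that stops at the first non-zero exit code, roughly what a bash script with set -e gives you. A minimal Python sketch of that caller-side loop follows. The step names and the use of the interpreter as a stand-in command are assumptions for illustration; real steps would be argument lists for whichever commands the caller chooses.

```python
import subprocess
import sys


def run_pipeline(steps: list[tuple[str, list[str]]]) -> list[str]:
    """Run named commands in order and stop at the first non-zero exit code.

    The ordering lives here, in the caller, not inside any of the commands.
    Returns the names of the steps that completed successfully.
    """
    done: list[str] = []
    for name, argv in steps:
        proc = subprocess.run(argv, check=False)
        if proc.returncode != 0:
            print(f"step {name!r} failed with exit code {proc.returncode}", file=sys.stderr)
            break
        done.append(name)
    return done


if __name__ == "__main__":
    ok = [sys.executable, "-c", "pass"]
    fail = [sys.executable, "-c", "raise SystemExit(2)"]
    print(run_pipeline([("evaluate", ok), ("export", fail), ("map", ok)]))
```

Growing this into a real graph (branches, retries, parallel steps) is a change to the caller only; the leaf commands stay exactly as they are.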

The result is a tool that is agent-ready by being boring: deterministic, file-based, single-responsibility commands with no shared state to corrupt and no orchestration opinions to fight. An LLM can compose them. A bash script can compose them. A human can run them one at a time. The composition layer is free to be as simple — or, when warranted, as graph-shaped — as the problem actually requires.

Global state and simplicity

The Radar's pullback on LangGraph is not "graphs are bad." It is "don't start with a rigid graph and a global state when simple agents communicating through code execution would be leaner and easier to reason about, test, and debug."

If you're building a capability for agents to use — rather than the agent framework itself — the lesson is sharper: expose deterministic commands over explicit files, scope every command to its inputs, and let the caller own the graph. You'll get the scoping, testability, and debuggability the Radar is asking for, and you'll get them without writing a single reducer.

We didn't build for agents. We built for determinism and single responsibility. It turns out that's the same thing.

Top comments (2)

Truong Bui • May 25

The "scoping is the argument list" observation is the insight that usually gets buried. In shared-state designs, scoping is work you do continuously — you add reducers, namespace your keys, audit who can write what. With command-and-file composition, scope falls out of the architecture without anyone designing it. Each command only touches what's in its argument list because that's all it knows about.

This has a direct security implication that your table captures but doesn't name explicitly: when a bad value surfaces, it has one possible author. That's not just better for debugging — it's better for auditing and blast radius containment. An agent that only touches its slice of the filesystem can't corrupt state it was never handed.

The MCP server ecosystem is learning this the hard way from the other direction. MCP servers often expose capabilities far beyond what their stated purpose requires — a simple database query wrapper that also has file system access, or an API integration that pulls in a shell execution tool. We scanned 508 public servers at MCPSafe (mcpsafe.io) and that mismatch between stated purpose and actual capability footprint shows up in a lot of the SSRF and tool poisoning findings. The Radar's principle — "each agent gets only the state it needs" — applies at the tool definition layer too, before the agent framework even enters the picture.

The determinism requirement is the part that's easy to skip and painful to add later. You clearly made it a founding constraint rather than a retrofit, which is why it holds up across four different agent compositions.

Bala Paranj • May 25

Very good observations, and now I am gaining deeper insights into my own article.