DEV Community: Felix Huttmann

The frontier for economic value from AI agents is non-gullibility

Felix Huttmann — Sat, 13 Jun 2026 23:53:07 +0000

The usual measures of AI progress have not suited my lived experience for some time now:

One measure is “maximum length of human task that AI can complete.” The ideal goal here seems to be AI developing ever larger software systems, like browsers.
Another measure is “key reasoning breakthroughs,” like proving some math theorem or finding zero-days. The ideal goal here seems to be the Riemann hypothesis.

I think both are worthwhile goals. But in my day job doing enterprise software development, neither of these is limiting me. What limits me is the degree of trust I am permitted to place in an agent’s actions without compromising my employer’s information security.

Almost everybody limits what their AI agent can do, through sandboxing, approval flows, or manual review.

This is not mostly about wanting to judge whether the AI’s work was good enough to merge. It is about defending against low-probability but devastating scenarios: the agent exfiltrating information to an attacker, installing malware, or granting privileged access to company computers.

The missing property is non-gullibility: the ability to distinguish trusted instructions from untrusted data, even when the untrusted data is adversarially shaped to look like instructions.

Breaking the lethal trifecta is only a transitory solution

To be as useful as a human, the AI agent needs to consume potentially malicious input, like web search results, documentation, GitHub issues, and repository files. It needs to follow legitimate instructions in documentation, while rejecting malicious search results that instruct it to install malware.

The agent also needs to work with confidential information, and must not exfiltrate it, including through the side effects of its actions, such as fetching URLs with confidential information in them.

And the agent needs to perform privileged actions: deployments, service integrations, infrastructure changes. It is not practically possible to validate Terraform code without running it against a real cloud, nor to validate an integration with another service without performing real calls to their systems. Modern web application development often consists largely of stitching together services, some external SaaS vendors, others internal but separately deployed.

So while my X feed celebrates an Erdős problem solution or the Bun Zig-to-Rust migration, many people are still stuck in manual code review and approval fatigue because they feel obliged to prevent unlikely but devastating consequences.

The interface between the agent harness and the model lacks a granular trust model

Non-gullibility is not a property that can be achieved by the foundation model alone. It also requires correct construction of the agent harness, because only the harness knows the data's true source and trustworthiness, and the harness assembles trusted and untrusted data into the model’s input.

Here, it seems to me that non-gullibility could be supported much better. LLM APIs today distinguish system, user, assistant, and tool messages in principle, but practical systems often mix authority levels for two reasons:

APIs and harnesses restrict which message types are permitted in which positions. System messages are often only available at the beginning of the conversation. Tool messages are only allowed in response to tool calls.

A workaround in practical harnesses like Claude Code and opencode is to include a textual fragment like <system-reminder> wherever the information needs to go. A user or tool message may include such a reminder to tell the model that another file changed, or that the user interrupted the model mid-task.

This is convenient, but it muddies the security boundary. A model trained to perform well in such a harness may learn to treat <system-reminder> as carrying special authority. How should the harness defend against prompt injection containing <system-reminder> tags? The obvious answer is escaping. But it does not seem customary to XML-escape all untrusted input, and escaping everything would also increase token usage.
Message history is a sequence, while authority is naturally a hierarchy.

Consider the output of a hypothetical grep tool with line numbers. Different parts of the output have different authority levels:

Higher-authority information: the fact that the trusted harness ran the built-in grep tool and found matches at certain paths and line numbers.

Lower-authority information: the path strings and file contents themselves. These may come from an untrustworthy source. If a file contains a tag, the model should not treat it as special.

A proposal for nested trust indicator delimiters

To supplement fixed roles like system, user, assistant, and tool, imagine an API that also supports a pair of special tokens for opening and closing a lower-privilege scope within a message. The harness developer could then express the authority of conversation fragments precisely.

If we visualize the lower-privilege opening token as {, and the corresponding closing token as }, then the grep tool could put quoted file contents inside {} to make clear that they are untrusted data. Here, { is a special token, not a regular character that could occur in text.

A conversation history might look like this:

System: You are a helpful assistant.
User: {Where does the term "foo" occur in this repository?}
Assistant: {<use-tool git-grep term={foo}>}
Tool: { {blubb.txt}:3:{*foo* bar baz} }

The system message is not wrapped in a trust limiter because it is already the highest-authority context.

The user message is wrapped in a trust limiter because the user must not be able to override the system message.

For the tool message, the important part is that the delimiters can be nested. The tool should not be able to assume the authority of the user or system, so the entire tool output is wrapped in a trust limiter. At the same time, the tool output contains parts that are themselves arbitrary data, so the tool wraps those in another, nested pair of delimiters.

This proposal would have to be part of the API between harness and model. In particular, the LLM server would reject improperly balanced delimiters in the input, and would enforce that model responses are balanced as well. Literal { and } characters in user data would not matter, because they would not be the special delimiter tokens.

Why a special token to indicate trust level is justified

One could argue that if we used escaping correctly, the LLM should be able to reason about the effective trustworthiness of data in context, and introducing a new set of delimiters would run against the spirit of having general-purpose models. I see three counterarguments to this:

The best open-source LLMs already use special tokens to indicate roles. The present proposal is not less generic, or less bitter-lesson-pilled, than what is already done.
We already allow LLMs to use code interpreters for efficiency, even though LLMs should in principle be able to reason about the outcome of code execution. It is just more efficient and reliable to interpret code before the result reaches the LLM. Likewise, it may be more reliable to deterministically encode quoting and privilege boundaries before the token stream reaches the LLM.
LLM input commonly has a very ad hoc nested structure. A Markdown code block may contain XML, which contains JSON, or any nesting thereof. It is not necessarily invalid input if a Markdown code block contains broken XML or broken JSON. Perhaps broken XML is exactly what the code block is supposed to show. Explaining to the model which parts of the input are to be expected to be properly escaped and which are not without ambiguity is tricky. The relevant question is not whether the model can reason about trust, but rather whether the harness has a precise way to express trust to the model.

Conclusion

For economically useful agents, I expect non-gullibility to matter more than another jump in task length or theorem-proving. The bottleneck is whether we can safely let the agent use the capability it already has.

How Kubernetes-Inspired API Design Helps LLMs

Felix Huttmann — Tue, 23 Sep 2025 19:12:09 +0000

A recent blog post from Vercel on MCP API design triggered me.

According to the blog post, it is not a great idea to just wrap your normal API with an MCP server. Instead, the developer should build new MCP operations that focus on workflows that combine what would otherwise take multiple sequential API calls.

Vercel's Workflow Proposal

They give the following example for an implementation of a workflow-style MCP operation called deploy_project:

server.tool(
  "deploy_project",
  "Deploy a project with environment variables and custom domain",
  {
    repo_url: z.string(),
    domain: z.string(),
    environment_variables: z.record(z.string()),
    branch: z.string().default("main")
  },
  async ({ repo_url, domain, environment_variables, branch }) => {
    // Handle the complete workflow internally
    const project = await createProject(repo_url, branch);
    await addEnvironmentVariables(project.id, environment_variables);
    const deployment = await deployProject(project.id);
    await addCustomDomain(project.id, domain);

    return {
      content: [{
        type: "text",
        text: `Project deployed successfully at ${domain}. Build completed in ${deployment.duration}s.`
      }]
    };
  }
);

This bundles what would otherwise be four individual operations:

create_project(name, repo)
add_environment_variables(project_id, variables)
create_deployment(project_id, branch)
add_domain(project_id, domain)

This is not obviously a bad idea.

Why Workflows Help

The performance improvement when bundling multiple API calls into workflows comes from multiple sources:

Fewer LLM re-invocations: Every time an LLM agent uses a tool and then receives the tool result, it results in another LLM invocation. Even though prompt prefix caching reduces the cost (as the conversation history is the same), cached tokens still cost money. For Claude 4 Sonnet, where cached tokens cost 10% of uncached ones, a complex workflow taking more than O(10) sequential tool invocations can result in costs being dominated by cached tokens.
Lower chance of errors: The LLM does not have to take the result from one tool call and use it as a parameter to a subsequent tool call. Servers often generate random IDs upon entity creation, and these need to be used in follow-up calls. LLM agents can hallucinate IDs or lose track of their task mid-workflow. Bundling the workflow reduces this surface area for error.

The Downside

But there are problems with workflows, too:

Loss of flexibility: If only workflows are exposed as MCP tools, certain operations (create_project, add_environment, etc.) are no longer possible to run in isolation. For example, the agent might want to add more environment variables or domains to a pre-existing project.
Token overhead: If the new "workflow-based" MCP tools are in addition to the low-level MCP tools, it increases the amount of tokens that the MCP tool definitions consume.
Extra design effort: Deciding what makes a “good workflow” is nontrivial. And if a workflow is good, then why expose it only via MCP, and not to other API users? Humans might also want that simpler API.

Kubernetes Shows the Way

For an example of what I would regard as a great "workflow," take a look at the Kubernetes deployment entity.

Consider you are tasked with deploying a new version of your application without downtime using only the low-level API calls to create and delete pods. You would have to:

Create new pods
Wait for them to start up and ensure they're healthy
Stop the old pods

The k8s deployment entity can take care of all of this for you! You create the deployment entity, and the k8s-internal controllers will perform the complex multi-step tasks needed to achieve the final goal of rolling out a new version without downtime.

At the same time, Kubernetes does not take away low-level control. You can still delete or create pods in an ad-hoc manner if needed.

Names, Not Random IDs

The Kubernetes API provides inspiration for another thing: avoid requiring the client to deal with server-generated random identifiers.

In Kubernetes, related entities instead refer to each other using their client-generated name or labels. The result is that it is typically possible to create multi-entity configs in the k8s control plane in one go and let Kubernetes figure out how to wire up the entities.

Bringing It Together

Applying these ideas to Vercel's deploy_project example from above would look like this:

apiVersion: vercel.com/v1
kind: project
metadata:
  name: $project_name
spec:
  repo: $project_repo
  environment_variables: $env_variables
---
apiVersion: vercel.com/v1
kind: deployment
metadata:
  name: $deployment_name
spec:
  branch: $branch
  project: $project_name
---
apiVersion: vercel.com/v1
kind: domain
metadata:
  name: $domain_name
spec:
  domain: $domain
  project: $project_name

All of this could be applied in a single step, analogous to:

kubectl apply -f .

To know whether it worked, the LLM would need another tool to wait for the k8s objects to attain readiness or failure, similar to:

kubectl wait

Conclusion

Kubernetes shows us a design pattern that is:

Declarative: Clients state intent, not steps
Composable: Both low- and high-level operations remain possible
Clean: Internal IDs do not bother the client
Efficient: Agents need to treat only the final result

Designing APIs in this style wouldn’t just make LLM agents more capable — it would also make APIs more usable for humans and easier to treat with infrastructure-as-code tooling.

Instead of choosing between “wrapping APIs” and “building workflows,” we could be asking: What would the Kubernetes version of this API look like?

This blog post was partly inspired by experience working on an internal AI assistant at my employer TNG Technology Consulting.

If the teams in your organization are independent, the organization has failed its purpose

Felix Huttmann — Thu, 13 Jul 2023 21:34:18 +0000

The value of any single organization is that a common leadership allows people to create value more efficiently than would be possible if there were multiple smaller organizations collaborating toward the same goal.

If teams within an organization are operating independently, then those teams could function equally well as separate entities. This implies that the larger organization, comprised of multiple independent teams, would offer no advantages over numerous small companies. In such a scenario, the organization is nothing greater than the sum of its parts.

At the same time, a large organization comprised of independent team still faces all the downsides of a large organization, namely:

Regulation: Bigger companies inherently face more regulatory challenges due to their size and reach.
Misalignment of shareholders and employees. In larger orgs the employee's individually small stock share dilutes the impact of their actions on the overall value of the firm, making them less likely to act in their role as shareholder and more as employees.
Lower granularity for evolutionary pressure and natural selection: Businesses that deliver value should grow, those that do not should die. But larger orgs lump poorly- and well-performing units together, and prevent them from dying or growing independently.