DEV Community: Fran

AI Security: The OWASP Top 10 LLM Risks Every Developer Should Know

Fran — Sun, 12 Apr 2026 08:48:51 +0000

Most LLM security articles warn you about the AI your users interact with. They don’t mention the AI tools you’re building with. I’ve used AI coding assistants to write code, generate documentation, and even learn cryptography fundamentals, all to deploy services in production. The OWASP Top 10 for LLM applications, updated after 2025, describes 10 risks that apply just as much to your internal AI toolchain as to the chatbot you’re shipping. The threat surface isn’t in front of your users. It starts in your IDE.

While writing this post, the articles covering this list that I read focus on external-facing chatbots. I wrote this one to also consider all 10 risks in the AI workflows engineers are already running inside their companies. If you’re a developer using AI tools like Claude Code, Codex, or GitHub Copilot, not just someone building an AI product, this is written for you.

Get the free AI Agent Building Blocks ebook when you subscribe to my substack:

Subscribe now

In this post, you’ll learn

What the OWASP Top 10 for LLM applications covers and why it was updated for 2025
How prompt injection, sensitive data disclosure, and excessive agency affect real engineering workflows
What changed between the 2023/24 and 2025 OWASP LLM Top 10 lists
How to apply a practical security checklist mapped to all 10 LLM vulnerabilities
Why agentic AI in 2026 makes several of these risks significantly more dangerous

What is AI security for LLM applications?

AI security means two different things, and the distinction matters.

The first is using AI to improve security: threat detection, automated code reviews, and vulnerability scanning. The second is securing the AI itself: protecting the models, the pipelines, the APIs, and the data those systems handle. This article is about the second kind.

LLMs introduce attack surfaces that traditional software doesn’t have. A conventional application has deterministic logic. You can audit a decision tree. LLMs are probabilistic and context-sensitive. The same prompt doesn’t always produce the same output. Adversarial inputs can produce emergent, unpredictable behavior.

New attack vectors showed up with LLMs that didn’t exist before: crafted prompts, poisoned training data, plugin chains, and autonomous agent actions. The OWASP community responded by extending its trusted web application framework to cover these risks. The LLM Top 10 was built by over 600 contributors across 18+ countries, and the 2025 update reflects how much the threat landscape has changed in a single year.

Subscribe now

Why the OWASP Top 10 for LLMs matters in 2026

The original OWASP Top 10 for web applications became the de facto standard for secure development. Security certifications, compliance frameworks like SOC 2 and ISO 27001, and enterprise security reviews all cite it. The LLM version carries the same weight. If you’re working at a company that ships software, the OWASP LLM list will show up in your audits and your security checklists.

This list was written for three audiences: developers building LLM-powered features like chatbots and copilots, security engineers reviewing AI integrations, and engineering leaders approving AI tooling for their teams.

There was a first list on 2023/24 focused on first-wave LLM integrations. Insecure plugins, model theft, and overreliance. The 2025 update restructured everything around agentic AI, RAG systems, and supply chain risks that emerged when those deployments hit production. The AI landscape is evolving very fast, and three risks were entirely new in 2025: System Prompt Leakage, Vector and Embedding Weaknesses, and Misinformation. Several others were renamed or merged to reflect how the attacks evolved.

Let’s review the full list.

Subscribe now

The OWASP Top 10 LLM security risks (2025)

LLM01:2025 — Prompt Injection

Prompt injection is an attack where user prompts alter the LLM’s behavior or output in unintended ways, potentially causing it to violate guidelines, generate harmful content, or influence critical decisions.

There are two types.

Direct injection happens when a user’s input directly alters model behavior, whether intentionally by a malicious actor or unintentionally by hitting an unexpected trigger.
Indirect injection happens when the LLM processes external content — a web page, a document, a RAG result — that contains embedded adversarial instructions the model then follows.

Prompt injection is different from jailbreaking, though both are related. Prompt injection manipulates model responses through specific inputs. Jailbreaking is a form of prompt injection where the attacker causes the model to disregard its safety protocols entirely. You can mitigate prompt injection through system prompt safeguards. Jailbreaking requires ongoing model training updates.

This is ranked number one because it’s the most exploitable vulnerability on the list. No special access required. Anyone with a text field can attempt it. RAG and fine-tuning don’t fully eliminate the risk. Research confirms the vulnerability persists across model architectures.

To mitigate: treat all external content as untrusted data, not as instructions. Use separate input channels for system instructions versus user content where possible. Add output validation before executing any LLM-generated actions.

Prompts is one paradigm for coding with AI, but there’s also the paradigm of Spec-Driven Development to create a plan (a spec) before doing code changes. You can learn more in this article:

Prompt Engineering vs Spec Engineering: Coding with AI Like a Senior Engineer

LLM02:2025 — Sensitive Information Disclosure

Sensitive information disclosure occurs when an LLM reveals confidential data from its training set, context window, or prior user interactions.

GitHub announced that starting April 24th, GitHub Copilot will use your code and prompts to train its models. Besides taking advantage of your work, you can see how this is a problem. That’s LLM02 in practice. Feeding proprietary data into a model that could surface it to someone else.

There are three main disclosure vectors.

First, the model was trained on sensitive data and can be prompted to reproduce it.
Second, sensitive data is in the context window and leaks through crafted user questions.
Third, multi-tenant deployments where one user’s context bleeds into another’s responses.

What makes this worse than a traditional data leak is that you can’t always tell what the model “knows.” The disclosure is probabilistic. The same prompt may not reproduce the data every time, which makes it hard to test systematically.

To mitigate: never include credentials or confidential business data in system prompts unless necessary. Use data masking before sending sensitive content to external AI APIs. Audit what your AI tools can actually access, especially browser plugins and meeting tools. If you can, use models deployed in your cloud, under your control.

LLM03:2025 — Supply Chain

LLM supply chain vulnerabilities come from insecure components in the AI development pipeline that introduce risks into production applications. This includes pre-trained models, datasets, libraries, or AI-assisted tooling.

I used AI tools at work, but they have to be approved by the security department. Think about it, the AI tool is itself a supply chain component. If the model powering your coding assistant was fine-tuned on poisoned data, or the tool has an insecure plugin, your AI output that is sent to prod inherits that risk. That’s LLM03 in practice.

Common supply chain risks include pre-trained models from open-source hubs with unknown training provenance, third-party AI SDKs with insecure dependencies, AI coding assistants that access your codebase and external APIs simultaneously, and models with undisclosed training data.

This moved to number three in the 2025 update because the AI tooling ecosystem exploded in variety and popularity. The LLM itself is now one component in a larger pipeline. Each layer introduces risk.

To mitigate: pin model versions and don’t auto-update AI dependencies, similarly to what you’d do with the other software dependencies. Treat AI tools used in security reviews or certifications as components requiring their own security review. Maintain an AI Bill of Materials for your production AI pipeline.

LLM04:2025 — Data and Model Poisoning

Data and model poisoning is an attack where adversarial data is introduced into a model’s training, fine-tuning, or feedback datasets to manipulate its behavior in ways that may not surface until specific conditions are triggered.

This is distinct from supply chain (LLM03). Supply chain covers the pipeline components around the model. Poisoning targets the model’s learned behavior directly, through the data it was trained or fine-tuned on.

Poisoning happens in several ways. Public datasets used for fine-tuning can be poisoned before or during collection. Feedback loops, specifically RLHF data from users, can be manipulated at scale by adversarial users. Backdoor attacks embed a hidden trigger. The model behaves normally until a specific phrase or pattern activates the malicious behavior.

Recently, with the Claude Code codebase leaked, we saw in this application instructions to “not add attribution if it’s an Anthropic employee“. Now imagine this was not added at the client-level, but the model training or the RLHF phase was making the model react differently on certain conditions. That’s Model Poisoning.

What makes this particularly insidious is that you can’t see the poison. It’s baked into the weights. We have to differentiate the transparent “open source” models from abstract “open weight” models. The weights allow us to execute, but not to understand the model. The model behaves normally in standard use cases and only misbehaves on specific trigger conditions. Discovery requires systematic red-teaming by security engineers.

To mitigate: vet training data sources and treat them like third-party code dependencies. Use models from verified, auditable sources with documented training data provenance. Monitor model behavior over time for drift or anomalies in specific contexts.

LLM05:2025 — Improper Output Handling

Improper output handling occurs when LLM outputs are passed directly into downstream systems without adequate validation or sanitization. Think about browsers, shells, APIs, or databases.

LLM outputs can contain HTML, JavaScript, shell commands, SQL, or executable code. All of those are potentially dangerous depending on where they land.

If your app renders LLM output as HTML, you have a Cross-Site Scripting risk. If your app passes LLM output to a shell or code interpreter, you have a Remote Code Execution risk. If your app passes LLM output to a database query, you have a SQL injection risk.

The common mistake is treating LLM output as safe because it came from your own system. The model was told to write a response. It wasn’t told to write safe output for every rendering context it might encounter downstream.

To mitigate: apply context-appropriate output encoding everywhere: HTML escaping, SQL parameterization, shell quoting. Never pass raw LLM output to eval(), exec(), or shell commands. Treat LLM output as untrusted user input when passing it to any system that executes it. Have client application rules that prevent execution of certain commands, shell, or only allow the ones you trust.

Subscribe now

LLM06:2025 — Excessive Agency

Excessive agency is when an LLM-based system is granted too much autonomy, functionality, or permissions to act in the world, enabling it to take harmful or unintended actions beyond the intended scope.

I used LLM assistance to write all my code and documents. Efficient, useful. But if the LLM suggests code that is slightly wrong, and I apply it without reviewing, the LLM has exercised agency over something that may cause trouble. At scale, in an agentic workflow where the LLM writes the code, commits it, and triggers the pipeline, this is exactly what LLM06 warns against. The more you automate, the more agency you hand over, and the less oversight each individual action gets.

Excessive agency has three dimensions.

Excessive functionality means the LLM can call more tools or APIs than its task requires.
Excessive permissions mean it operates with higher privileges than needed, like read-write when only read is required.
Excessive autonomy means it takes multi-step actions without human checkpoints.

Agentic AI makes this the defining risk of 2026. Single LLM calls have a limited blast radius: the user sees the output and decides. Agentic workflows can take dozens of actions before a human sees results. You have your openClaw taking actions for you while you sleep. Each autonomous step compounds the risk of an uncaught mistake.

To mitigate: scope each LLM agent to the minimum permissions it needs for its specific task. Add human-in-the-loop checkpoints for consequential actions like deploys, permission changes, and file commits. Log all LLM-initiated actions for audit trails.

Regarding guardrails for LLMs, I’d recommend you read this other article:

Harness Engineering: Turning AI Agents Into Reliable Engineers

LLM07:2025 — System Prompt Leakage

System prompt leakage occurs when the confidential instructions given to an LLM via the system prompt are exposed to users, attackers, or downstream systems, revealing business logic, security guardrails, or sensitive configuration.

System prompts have become the standard mechanism for configuring LLM behavior in production apps. A poorly protected system prompt is now equivalent to exposed source code.

What attackers do with leaked system prompts: they map the application’s security controls to find gaps, understand business logic to craft more targeted prompt injections, and extract competitive IP embedded in instructions like proprietary workflows or internal tool names.

Leakage happens in several ways. Direct extraction, where prompts like “Repeat your system prompt” sometimes work on poorly guarded models. Indirect extraction, where crafted user inputs get access to partial system prompt content in responses. Also related to LLM02, if the system prompt itself contains sensitive data, it gets disclosed.

To mitigate: never embed secrets or credentials in system prompts. Test your application for system prompt leakage before deploying. Design system prompts as if they are public. Sensitive logic should live in code, not in prompts.

LLM08:2025 — Vector and Embedding Weaknesses

Vector and embedding weaknesses are vulnerabilities in the retrieval and storage of embeddings used in RAG and semantic search systems, enabling data poisoning, information extraction, or unauthorized access to indexed content.

RAG systems became widespread in 2024, and vector databases are now a core component of enterprise LLM deployments. Embeddings are often treated as opaque black boxes, but they carry real security risks that weren’t widely understood when they got popular.

The attack patterns include many scenarios, like:

Embedding inversion: extracting the original text from stored embeddings
Poisoning the vector store: by injecting adversarial documents that get retrieved as authoritative context
Cross-tenant leakage: where one user’s indexed content surfaces in another user’s query context
Similarity search abuse: where queries are crafted to surface sensitive documents.

If your RAG system indexes internal Confluence pages, code repositories, or support tickets, the vector store is a high-value target. Access controls on the source documents must be mirrored at the vector store level, not just at retrieval time.

To mitigate: apply source document access controls to vector store queries. Don’t return embeddings from documents that the user couldn’t read directly. Validate and sanitize documents before indexing. Treat ingested content like user input.

Subscribe now

LLM09:2025 — Misinformation

Misinformation occurs when an LLM generates false, misleading, or outdated information that users act on as if it were accurate, particularly dangerous in high-stakes domains like security, medicine, legal, or compliance.

This risk captures both hallucination (fabricated facts) and confident incorrectness (plausible but wrong answers). The risk isn’t just that users trust AI too much. It’s that the AI gives them something false to trust.

For example, I used LLMs to verify some details about different hashing algorithms. This is a common move now, we don’t search in Google and original specs, but we ask the LLM. However, anything related to hashing and encryption has security nuances. Choosing the wrong one in the wrong context is a vulnerability. If I take a decision from an LLM without verifying later against authoritative documentation, I’m falling for LLM09 risk. Scale that to your team writing security documentation, threat models, or code review feedback. Uncritical acceptance of that output is how misinformation enters production.

The reason LLMs produce confident misinformation is structural. LLMs don’t signal uncertainty the way a search result does. They don’t have a cause-and-effect relationship (if A, then B). They only have probabilistic correlations (A and B happen at the same time 99% of the time). In security contexts, a confident but wrong answer can pass through review unchallenged. The fluency of LLM output with human-readable text creates a false sense of verification.

To mitigate: treat LLM outputs as first drafts, not final answers. Establish a verification step: AI output, then human review, then authoritative source check. I personally like moving the human all the way to the right of the process, but not removing the human. Require in your prmopts citation of primary sources where data comes from.

LLM10:2025 — Unbounded Consumption

Unbounded consumption occurs when an LLM application allows users or adversaries to consume excessive computational resources, causing degraded service, runaway costs, or denial of availability.

It covers not just availability attacks but cost exhaustion, budget blowouts, and resource abuse by legitimate users who hit no limits.

This matters more in 2026 because agentic workflows can trigger cascading API calls. Anthropic just announced a few days ago that they will explicitly forbid the use of their subscription with OpenClaw. One user action with openclaw may spawn dozens of LLM requests. Cost-per-inference has dropped, making it easy to deploy LLMs, and also easy to accidentally burn through API budgets. Multi-step agents and techniques like Ralph Loops with no token or turn limits can self-loop forever.

Attack patterns include flooding the API with max-context-length requests to maximize per-request cost, crafting prompts designed to trigger recursive or verbose responses, and prompt injection that triggers agentic loops consuming resources without termination.

To mitigate: implement rate limiting per user, API key, and endpoint. Set max token limits on inputs and outputs, both per request and per session. Set hard budget caps on AI API spending and alert before they’re hit, not after. Design agentic workflows with explicit termination conditions and maximum iteration counts.

Subscribe now

How to secure LLM applications: practical checklist

Here is every risk mapped to its most important mitigation action.

Honestly, I’d forward this to every engineer I know:

LLM01 — Prompt Injection: Treat all external content as untrusted data, not instructions. Validate before acting on LLM outputs.
LLM02 — Sensitive Information Disclosure: Mask sensitive data before sending to AI APIs. Audit what your AI tools can access.
LLM03 — Supply Chain: Treat AI tooling as supply chain. Pin model versions. Audit tools used in regulated processes.
LLM04 — Data and Model Poisoning: Vet training data sources. Use models with documented provenance. Monitor for behavioral drift.
LLM05 — Improper Output Handling: Apply context-appropriate encoding on all LLM outputs before rendering or executing.
LLM06 — Excessive Agency: Apply least-privilege to AI agents. Add human checkpoints for consequential actions.
LLM07 — System Prompt Leakage: Design system prompts as if public. Test for leakage before deployment. Keep secrets in code, not prompts.
LLM08 — Vector and Embedding Weaknesses: Mirror source document access controls at the vector store level. Validate documents before indexing.
LLM09 — Misinformation: Treat LLM outputs as first drafts. Require authoritative source verification for security guidance.
LLM10 — Unbounded Consumption: Set rate limits, token caps, and hard budget limits. Monitor inference costs in real time.

By the way, checklists are one of the most important resources to use AI. You can learn my experience with “read“ checklists and “do“ checklists in this other article

How Checklists + AI Automation Made Me a 10x Engineer (And Can Do the Same For You)

AI security in the real enterprise: what the OWASP list doesn’t tell you

The OWASP list describes risks. It doesn’t describe who gets blamed when they materialize.

We’re using AI tools all the time. When we ship fast, we only take partial credit for using AI, but the credits go to AI. When the AI tool misses a vulnerability, the developer get 100% of the blame. This asymmetry shapes how engineers should actually use AI: Aggressively for prototyping, conservatively when there’s accountability at risk. The struggle is that we don’t realize the asymmetry until something goes wrong.

Organizations are also deploying AI faster than legal and compliance teams can catch up. I’ve developed small tooling, like getting meeting transcripts, and only months later, I got a notice about the legal requirements for it and to switch to an approved tool. LLM02 and LLM06 risks often materialize not from malicious actors but from well-meaning engineers working around slow policy processes. No malice required.

Then there’s the review paradox. Using AI to pass to review is efficient, and I’d always encourage to use AI. But we can’t skip a human reviewing the AI review output. If the AI tool has its own supply chain risks (LLM03), and you’re using it to review a security standard, you’ve introduced the risk you’re reviewing against. This isn’t an argument against using AI tools for security reviews. It’s an argument for understanding what you’re trusting.

The biggest shift in 2026 is what agentic AI does to blast radius. If we go to the first OWASP list for LLM applications in 2023, it assumed the prompts were triggered by a human. A user submitting a prompt and reviewing an output. In 2026, agents act autonomously across multiple steps. These risks all become more dangerous when there’s no human checkpoint in the workflow.

Practically: figure out which of the 10 risks actually apply to your specific AI integration. They don’t all apply equally. Treat each AI tool used in a regulated process as a component requiring its own security review. Document your AI tool usage in your threat models.

Get the free AI Agent Building Blocks ebook when you subscribe to my substack:

Subscribe now

Common Questions about LLM Security

What are the OWASP Top 10 risks for LLMs in 2025?

The 2025 OWASP Top 10 for LLM Applications covers Prompt Injection, Sensitive Information Disclosure, Supply Chain, Data and Model Poisoning, Improper Output Handling, Excessive Agency, System Prompt Leakage, Vector and Embedding Weaknesses, Misinformation, and Unbounded Consumption. The full list is at genai.owasp.org/llm-top-10.

What is prompt injection in AI security?

Prompt injection is an attack where malicious input overrides an LLM’s instructions, causing it to take unintended actions. It can be direct (from user input) or indirect (from external content the LLM processes, like a web page or document). It is ranked the top LLM vulnerability because it requires no special access to attempt.

How is the OWASP LLM Top 10 different from the OWASP Web Application Top 10?

The OWASP Web App Top 10 covers traditional web vulnerabilities like XSS and SQL injection. The OWASP LLM Top 10 covers risks specific to large language models: prompt-based attacks, training data risks, autonomous agent behavior, and RAG system vulnerabilities. The two lists have overlapping mitigations but address fundamentally different attack surfaces.

What changed between the 2023/24 and 2025 OWASP LLM Top 10?

The 2025 update added System Prompt Leakage, Vector and Embedding Weaknesses, and Misinformation. It removed Model Theft, Insecure Plugin Design, and Overreliance, which were absorbed into other categories. The update reflects the rise of agentic AI and RAG-based production systems that weren’t widespread when the 2023 list was written.

Subscribe now

Conclusion: the AI you trust is the AI you’re responsible for

The code got shipped, and the feature launched. But the question stayed with me: What unknown unknowns may I be missing?

The OWASP LLM Top 10 doesn’t answer that question for you. It gives you the vocabulary to ask it precisely. It’s not a compliance checkbox, but a thinking tool. Use it to audit how you’re using AI tools for building.

Key Takeaways:

The OWASP Top 10 for LLM Applications (2025) defines 10 security risks specific to large language model systems, updated from the 2023/24 list to reflect agentic AI and RAG deployments.
Prompt Injection (LLM01) remains the top risk because it requires no special access and persists even when using RAG or fine-tuning.
Excessive Agency (LLM06) and Unbounded Consumption (LLM10) are the defining risks of agentic AI in 2026, where autonomous multi-step workflows amplify every unchecked mistake.
Sensitive Information Disclosure (LLM02) often comes not from model misbehavior but from engineers inadvertently feeding sensitive data into AI tools they trust.
Three risks are new in the 2025 update: System Prompt Leakage (LLM07), Vector and Embedding Weaknesses (LLM08), and Misinformation (LLM09), each reflecting how production LLM deployments evolved since 2023.

Which of these 10 risks is already present in your AI workflow?

If you read until here, continue reading about how to scale your software development process to handle the surge of code of AI: https://strategizeyourcareer.com/p/scaling-software-engineering-with-ai

References: OWASP LLM Top 10

Harness Engineering: Turning AI Agents Into Reliable Engineers

Fran — Sun, 12 Apr 2026 08:37:48 +0000

Most AI coding agents can write impressive demos. Few can ship production code without breaking everything around it. The difference is harness engineering: the discipline of building systems that make AI agents reliable.

Here is how I used it to ship 100+ PRs/month at Amazon

Get the free AI Agent Building Blocks ebook when you subscribe to my Substack

Subscribe now

I’m Fran. I’m a software engineer at Amazon during the day, and I write and experiment with AI during the night.

I want to tell you about the moment I realized that prompting alone would never work for production AI agents.

I was working on an automation project at Amazon. The goal was simple: update large JSON configuration files automatically based on requirements. These configs were thousands of lines long, and the updates followed predictable patterns.

A perfect job for an AI agent, right? That’s what everyone thought. Engineers on the team opened their AI-powered IDE or CLI, typed their prompts to modify the JSONs, and watched the LLM struggle to modify the target node correctly.

It failed to implement the changes properly. Every single time.

The model wasn’t broken. We were on Opus 4.6 with a one-million context window.

The context window was a problem. When you feed multiple 10,000-line JSON files into an LLM, the model loses track of the surrounding structure. It edits what you asked it to edit, but it quietly breaks everything around it. No error message. No warning. Just a structurally invalid file that passes a surface-level glance but fails in production.

This is not a model quality problem. It is an environment problem. And the fix is not a better prompt.

You may think the fix is Anthropic to release a 10M context window, but we know that a bigger context window still degrades after 100k or 200k tokens.

The real fix is a harness.

Harness engineering is the discipline that turned my broken prototype at Amazon into a system that now ships over 100 PRs per month. Fully autonomous.

I wrote a 10-step guide to build that agent in this previous post:

How I built an agent that works at Amazon while I sleep (10 steps)

In this post, you’ll learn

What harness engineering is and how it differs from prompt engineering, context engineering, and agent engineering
Why AI agents fail on large structured files like JSON, and how to fix it with deterministic scripts
The four pillars of a production AI harness: state management, context architecture, guardrails, and entropy management
How I built a harness at Amazon that ships 100+ PRs/month without human intervention
The mindset shift that separates engineers who demo AI from engineers who deploy it

Why AI Agents Fail on Large Files

Most engineers today interact with AI coding tools the same way: open an IDE, type a prompt, review the output, repeat. For small files and isolated tasks, this works beautifully. But the moment the problem involves a large amount of files, the whole approach falls apart.

Large Language Models are probabilistic engines. They predict the next token based on patterns in their context window. When the context window is filled with thousands of lines of structured data, the model’s attention gets diluted. It correctly identifies the node you want to modify, but it loses track of sibling keys, nested brackets, and structural integrity. The result is a file that looks right at the point of change but is broken somewhere else.

We have to understand that Context Window isn’t the same as Context Attention. As a human, I can store hundreds of items in a storage unit, but I will remember about a fraction of the items I have there.

Same with LLMs. Performance degrades as the context window gets filled (and costs).

Did you know that every message you send is sending all the previous conversation in an API call? Yes, you’re billed also for those past messages. The servers in the cloud don’t keep any state, they only have a cache.

When the model fails to make an update, the instinct is to write a better prompt.

Add more constraints.

Tell the model to “preserve the surrounding structure.”

“Make no mistakes.”

But that is like asking someone to juggle while blindfolded and then giving them more detailed instructions about hand positioning. The problem is not the instructions. The problem is the blindfold.

The context window itself becomes a liability when it’s packed with thousands of lines of repetitive structure. No prompt can fix that.

I covered in this post how to scale AI setting up guardrails

Scaling Software Engineering with AI

What Is Harness Engineering?

Harness engineering is the discipline of designing the systems, architectural constraints, execution environments, and automated feedback loops that wrap around AI agents to make them reliable in production.

The term was first coined by Mitchell Hashimoto, the founder of HashiCorp. The metaphor comes from horse riding. Think of the LLM as a powerful horse. It has raw energy, speed, and strength. But without reins, a saddle, and a bridle, that energy is undirected and potentially destructive (the horse kicks you, the LLM runs a rm -rf, and I don’t know which is worse). The harness allows the rider to direct the horse’s power productively.

To understand where harness engineering fits, here’s how it relates to the other disciplines you’ve probably heard about:

Prompt Engineering → Single interaction to craft the best input to the model (single request-response interaction).
Context Engineering → Control what the model sees during a whole session (multiple interactions until clearing).
Harness Engineering → Designs the environment, tools, guardrails, and feedback loops (multiple sessions).
Agent Engineering → Design the agent’s internal reasoning loop (define specialized agents).
Platform Engineering → Infrastructure to manage deployment, scaling, and cloud operations (where agents can run).

Prompt engineering is about what you say to the model.

Context engineering is about what the model sees.

Harness engineering is about the entire world the model operates in. It includes the tools the agent can call, the constraints it cannot violate, the documentation structure it reads, and the automated feedback loops that catch its mistakes before they reach production.

Subscribe now

How I Built a Harness That Ships 100+ PRs/Month at Amazon

Let me walk you through the specific problem I solved, because abstract talk about agents only becomes useful when you see them applied to a real constraint.

The problem: We had large JSON configuration files that needed automated, repetitive updates. These files were too big for the LLM’s context window. Every manual update was tedious, error-prone, and time-consuming.

What everyone else tried: Engineers on the team opened their IDEs and started prompting. The LLM would correctly modify the target node, but would fail to identify which other files had to be updated, and it would fail to keep the correct JSON structure. There was no awareness of JSON structural integrity as a hard constraint. Every run was a coin flip. Sometimes it worked. Most times it broke. You can’t trust an AI like this.

The harness approach: Instead of trying to update the prompt, I narrowed the problem to one specific operation: How to read and write into our JSON files. I wasn’t trying to build a general-purpose agent. I built a scoped one. I wrote deterministic Python scripts to handle the actual JSON surgery: read the file, apply a precise modification, validate the structure, write it back. The agent’s only job was to provide the intent, the what, and the where. The script provided the execution guarantee.

The key insight was this: the agent calls the script as a tool. It does not generate JSON directly. It tells the script what to change, and the script changes it with zero ambiguity. This means the AI is the brain that chooses which steps to take, like a CEO indicating directions. The AI didn’t have to make the groundwork itself.

I then added a structural validation step as a guardrail. If the resulting JSON is malformed, the agent cannot proceed. It physically cannot ship a broken config. This provides a feedback loop, which is something managers and C-level executives also want when delegating to humans.

The result: 100+ PRs per month. Zero structural corruption. Fully autonomous. The system has been running for months, and after a few weeks of tweaking edge cases in the deterministic scripts, the Agent nails the updates.

At some point, we realized the only reason a PR gets rejected is that the requirement was wrong, not because the AI didn’t execute the requirement.

That’s when you are into something good.

This is what harness engineering looks like in practice. You stop asking the model to do things it’s bad at. You give it the tools for the parts that require precision, you let the agent handle the parts that require judgment, and you instruct it not to jump in to do the job itself.

Subscribe now

The Four Pillars of Harness Engineering

My JSON automation project taught me the pattern to build a good AI agent, but the approach is generic. After studying how OpenAI, Anthropic, and other teams have built their own harnesses, I’ve identified four pillars that every production harness needs.

1. State Management

AI agents are stateless by default. Every API call starts with a blank slate. For a task that takes five minutes, this is fine. For a task that spans hours or requires following the updates of dozens of files, statelessness is bad. The agent forgets what it did 20 steps ago. It repeats the same mistake in a loop. It loses track of the overall architecture. This “AI amnesia” is the most common failure mode in long-running agent tasks, and it’s why Openclaw got very popular.

A harness solves this by serializing context snapshots and restoring them across sessions. Think of it as save points in a video game. The agent does work, the harness saves a snapshot, and if the agent crashes or hits a rate limit, the harness restores the snapshot and picks up exactly where it left off.

Advanced implementations use structured state objects that persist across runs. There are two main strategies here:

Context Compaction, where the harness continuously summarizes the agent’s history as it approaches the token limit
Context Resets, where the harness clears the window entirely and boots a fresh agent with a structured handoff of artifacts.

Both work. The right choice depends on your task length and coherence requirements.

2. Context Architecture (Progressive Disclosure)

The first agent-friendly codebases I saw produced gigantic AGENTS.md files. This approach fails for the same reason a 500-page employee handbook fails on someone’s first day. The agent gets confused, misses critical rules, and follows outdated instructions that were never cleaned up.

The better approach is progressive disclosure. Give the agent a short table of contents that points to a structured docs/ directory. The agent reads the table of contents first, then navigates to the specific document it needs for the task at hand.

This is the same pattern introduced with the Agent Skills standard. Instead of the early MCP implementations that loaded all the definitions above the user’s first prompt, let the agent find them when needed.

The agent gets a map, not an encyclopedia.

One more thing that is easy to forget: anything the agent cannot access in-context does not exist for it. Your Slack threads, Google Docs, and verbal agreements in meetings… None of that is real to the agent unless provided or instructed to fetch them.

3. Deterministic Guardrails

This is where harness engineering diverges most sharply from prompt engineering. Prompt engineering asks the agent to write clean code or make no mistakes. Harness engineering mechanically enforces it.

You’d need custom linters, structural tests, and CI jobs that validate architecture before merge.

The agent isn’t “discouraged” or “instructed against” skipping those. The agent is blocked.

If a file exceeds a size limit, the linter rejects it.
If a dependency flows in the wrong direction, the structural test fails.
If the JSON output is malformed, the validation script prevents mering the PR.

The error messages in your custom lints and validations should include remediation instructions. When the agent hits a linter failure, the error message itself tells the agent exactly how to fix the problem. That error message gets injected directly into the agent’s context, creating a tight feedback loop that requires zero human intervention.

This was a realization of my early attempts in the agent that modifies JSONs. I was using JQ commands instead of Python scripts. JQ ended all possible failures with a 0 or 1 exit code. These outputs are intended for terminals, not for LLMs to recover from them.

One more thing worth noting: A “boring” codebase is better for agents. Stable APIs, predictable patterns, and simple architectures are far easier for agents to model than clever abstractions. Every layer of complexity you add to your codebase is a layer the agent has to navigate.

Keep it simple.

4. Entropy Management (Garbage Collection)

This is something most people skip. AI agents replicate patterns, including bad ones. Over time, your codebase accumulates “AI slop”: redundant logic, verbose implementations, subtly hallucinated variables that the model keeps copying because they exist in the context.

Left unchecked, this entropy degrades the entire codebase. People call it context poisoning.

Some people use this as an argument that AI is bad. But whenever I face a bad AI output, instead of judging if AI is good or bad for this task, I ask myself how can we make AI work here? The answer is usually adding another harness.

We can have a recurring cleanup agent. Think of it as garbage collection for your repo. For any implementation task, have a separate agent that scans the codebase, looks for drift from your golden principles, and fixes things before raising the PR. You can also execute this kind of agent on a schedule. Because you already designed other harnesses, like having unit tests and linters, you can allow AI to refactor code with confidence.

It is the same concept as a “doc-gardening” agent that scans for stale documentation and updates it. Technical debt is called like this becuase it works like money debt. If you pay it daily, you stay solvent. If you let it accumulate, you end up spending a lot of time later.

The harness should include entropy management from day one, not as an afterthought.

To know where to apply the harnesses, I covered a 3-level framework for AI-assisted coding in this previous post:

My code was AI Slop until I learned this system

The Mindset Shift: From Prompts to Harness Engineering

The biggest change harness engineering requires is not technical. It is mental.

You stop writing prompts. You start designing environments. Your job is neither to write code nor to write the detailed prompts. It is to make the codebase legible to the agent. Every file name, every directory structure, every naming convention, every piece of documentation exists not just for human developers but for the autonomous agents that will read, modify, and extend the codebase at machine speed.

Constraints stop being restrictions and start being multipliers. A custom linter you write once applies to every line of code the agent writes, deterministically, and forever. A structural test you build today catches every future violation automatically. You invest once, and the return compounds with every agent run. That is the leverage engineers had for humans, and we need it for AI agents.

The engineers shipping the most code right now all converged on this independently. OpenAI’s internal team shipped one million lines of code and 1,500 PRs in five months using this approach. Anthropic has released 52 features in 50 days. My team at Amazon ships 100+ PRs per month. The patterns are the same: narrow the problem, use deterministic scripts at the execution boundary, enforce constraints mechanically, and make the codebase legible to the agent.

source

Now, to apply these harnesses, you need to know the building blocks of IA agents.

If you want the full guide, let me know your email below, and I’ll send you the free “AI Agents Building Blocks” guide inside the newsletter’s welcome email

Subscribe now

To Recap:

What is harness engineering in AI?

Harness engineering is the discipline of designing the systems, constraints, execution environments, and feedback loops that wrap around AI agents to make them reliable in production. Unlike prompt engineering, which focuses on a single model interaction, harness engineering governs the entire agent lifecycle, from state management to automated validation.

How is harness engineering different from prompt engineering?

Prompt engineering crafts the input to the model in a single interaction. Harness engineering designs the entire environment the agent operates in: tools, guardrails, documentation structure, and automated feedback loops. The goal is reliable behavior across thousands of runs, not just one.

Why do AI agents fail on large structured files like JSON?

Large JSON files exceed or crowd out the model’s context window, causing the agent to lose track of the surrounding structure. It may correctly modify the target node but corrupt adjacent keys, producing a broken file. The fix is a deterministic script that handles the file surgery, with the agent only providing the intent.

How do you build a simple AI agent harness?

Start by narrowing the problem to one operation. Write deterministic scripts for the execution step. Wire the agent to tool-call those scripts instead of generating the output directly. Add a validation step that the agent cannot bypass (embed it in scripts if needed!). This three-part loop, intent to deterministic execution to validation, is the minimal viable harness.

What is an AGENTS.md file and why does it matter?

AGENTS.md is a file in your repository that tells an AI agent the rules, conventions, and architectural constraints of your codebase. It acts as the agent’s static context, injected at startup, so it knows your team’s norms without you having to repeat them in every prompt. Keep it short (under 100 lines) and use it as a table of contents pointing to deeper documentation.

Conclusion: The Harness IS the Product

The model is the easy part. Everyone has access to the same foundation models. GPT, Claude, Gemini, they are all remarkably capable. The harness is the hard part. The harness is what separates a demo that impresses your manager from a production system that ships real code every day without breaking things.

Here is what I want you to take away from this article:

Narrow the problem before you build the agent. A scoped agent that does one thing well beats a general-purpose agent that does everything poorly.
Use deterministic scripts at the execution boundary. Let the agent provide intent. Let the script provide the guarantee.
Enforce constraints mechanically, not verbally. If a rule matters, make it a linter, a test, or a validation step. Do not put it in a prompt and hope for the best.
Make the codebase legible to the agent, not just to humans. Progressive disclosure, structured documentation, and the repo as the single system of record.

The engineers who figure this out first will have an enormous advantage. Not because they have better models, but because they have better harnesses.

Subscribe now

If you read until this point, you have to read this other article with the AI concepts every software engineer needs to know in 2026:

Software engineer's AI stack in 2026