DEV Community

Fran
Fran

Posted on • Originally published at strategizeyourcareer.com

AI Security: The OWASP Top 10 LLM Risks Every Developer Should Know

Most LLM security articles warn you about the AI your users interact with. They don’t mention the AI tools you’re building with. I’ve used AI coding assistants to write code, generate documentation, and even learn cryptography fundamentals, all to deploy services in production. The OWASP Top 10 for LLM applications, updated after 2025, describes 10 risks that apply just as much to your internal AI toolchain as to the chatbot you’re shipping. The threat surface isn’t in front of your users. It starts in your IDE.

While writing this post, the articles covering this list that I read focus on external-facing chatbots. I wrote this one to also consider all 10 risks in the AI workflows engineers are already running inside their companies. If you’re a developer using AI tools like Claude Code, Codex, or GitHub Copilot, not just someone building an AI product, this is written for you.


Get the free AI Agent Building Blocks ebook when you subscribe to my substack:

Ebook cover of

Subscribe now


In this post, you’ll learn

  • What the OWASP Top 10 for LLM applications covers and why it was updated for 2025
  • How prompt injection, sensitive data disclosure, and excessive agency affect real engineering workflows
  • What changed between the 2023/24 and 2025 OWASP LLM Top 10 lists
  • How to apply a practical security checklist mapped to all 10 LLM vulnerabilities
  • Why agentic AI in 2026 makes several of these risks significantly more dangerous

What is AI security for LLM applications?

AI security means two different things, and the distinction matters.

The first is using AI to improve security: threat detection, automated code reviews, and vulnerability scanning. The second is securing the AI itself: protecting the models, the pipelines, the APIs, and the data those systems handle. This article is about the second kind.

LLMs introduce attack surfaces that traditional software doesn’t have. A conventional application has deterministic logic. You can audit a decision tree. LLMs are probabilistic and context-sensitive. The same prompt doesn’t always produce the same output. Adversarial inputs can produce emergent, unpredictable behavior.

New attack vectors showed up with LLMs that didn’t exist before: crafted prompts, poisoned training data, plugin chains, and autonomous agent actions. The OWASP community responded by extending its trusted web application framework to cover these risks. The LLM Top 10 was built by over 600 contributors across 18+ countries, and the 2025 update reflects how much the threat landscape has changed in a single year.

Subscribe now


Why the OWASP Top 10 for LLMs matters in 2026

The original OWASP Top 10 for web applications became the de facto standard for secure development. Security certifications, compliance frameworks like SOC 2 and ISO 27001, and enterprise security reviews all cite it. The LLM version carries the same weight. If you’re working at a company that ships software, the OWASP LLM list will show up in your audits and your security checklists.

This list was written for three audiences: developers building LLM-powered features like chatbots and copilots, security engineers reviewing AI integrations, and engineering leaders approving AI tooling for their teams.

There was a first list on 2023/24 focused on first-wave LLM integrations. Insecure plugins, model theft, and overreliance. The 2025 update restructured everything around agentic AI, RAG systems, and supply chain risks that emerged when those deployments hit production. The AI landscape is evolving very fast, and three risks were entirely new in 2025: System Prompt Leakage, Vector and Embedding Weaknesses, and Misinformation. Several others were renamed or merged to reflect how the attacks evolved.

Let’s review the full list.

Subscribe now

The OWASP Top 10 LLM security risks (2025)

LLM01:2025 — Prompt Injection

Prompt injection is an attack where user prompts alter the LLM’s behavior or output in unintended ways, potentially causing it to violate guidelines, generate harmful content, or influence critical decisions.

There are two types.

  • Direct injection happens when a user’s input directly alters model behavior, whether intentionally by a malicious actor or unintentionally by hitting an unexpected trigger.
  • Indirect injection happens when the LLM processes external content — a web page, a document, a RAG result — that contains embedded adversarial instructions the model then follows.

A web page with a malicious propmt, at the right the LLM takes everyuthing as input, further at the right the LLM produces unintended output

Prompt injection is different from jailbreaking, though both are related. Prompt injection manipulates model responses through specific inputs. Jailbreaking is a form of prompt injection where the attacker causes the model to disregard its safety protocols entirely. You can mitigate prompt injection through system prompt safeguards. Jailbreaking requires ongoing model training updates.

This is ranked number one because it’s the most exploitable vulnerability on the list. No special access required. Anyone with a text field can attempt it. RAG and fine-tuning don’t fully eliminate the risk. Research confirms the vulnerability persists across model architectures.

To mitigate: treat all external content as untrusted data, not as instructions. Use separate input channels for system instructions versus user content where possible. Add output validation before executing any LLM-generated actions.

Prompts is one paradigm for coding with AI, but there’s also the paradigm of Spec-Driven Development to create a plan (a spec) before doing code changes. You can learn more in this article:

Prompt Engineering vs Spec Engineering: Coding with AI Like a Senior Engineer

Prompt Engineering vs Spec Engineering: Coding with AI Like a Senior Engineer

LLM02:2025 — Sensitive Information Disclosure

Sensitive information disclosure occurs when an LLM reveals confidential data from its training set, context window, or prior user interactions.

GitHub announced that starting April 24th, GitHub Copilot will use your code and prompts to train its models. Besides taking advantage of your work, you can see how this is a problem. That’s LLM02 in practice. Feeding proprietary data into a model that could surface it to someone else.

At the left, an engineer uses confidencial data for training and puts it in a black box. At the right another engineer is able to retrieve the exact propietary data

There are three main disclosure vectors.

  1. First, the model was trained on sensitive data and can be prompted to reproduce it.
  2. Second, sensitive data is in the context window and leaks through crafted user questions.
  3. Third, multi-tenant deployments where one user’s context bleeds into another’s responses.

What makes this worse than a traditional data leak is that you can’t always tell what the model “knows.” The disclosure is probabilistic. The same prompt may not reproduce the data every time, which makes it hard to test systematically.

To mitigate: never include credentials or confidential business data in system prompts unless necessary. Use data masking before sending sensitive content to external AI APIs. Audit what your AI tools can actually access, especially browser plugins and meeting tools. If you can, use models deployed in your cloud, under your control.

LLM03:2025 — Supply Chain

LLM supply chain vulnerabilities come from insecure components in the AI development pipeline that introduce risks into production applications. This includes pre-trained models, datasets, libraries, or AI-assisted tooling.

I used AI tools at work, but they have to be approved by the security department. Think about it, the AI tool is itself a supply chain component. If the model powering your coding assistant was fine-tuned on poisoned data, or the tool has an insecure plugin, your AI output that is sent to prod inherits that risk. That’s LLM03 in practice.

Common supply chain risks include pre-trained models from open-source hubs with unknown training provenance, third-party AI SDKs with insecure dependencies, AI coding assistants that access your codebase and external APIs simultaneously, and models with undisclosed training data.

This moved to number three in the 2025 update because the AI tooling ecosystem exploded in variety and popularity. The LLM itself is now one component in a larger pipeline. Each layer introduces risk.

To mitigate: pin model versions and don’t auto-update AI dependencies, similarly to what you’d do with the other software dependencies. Treat AI tools used in security reviews or certifications as components requiring their own security review. Maintain an AI Bill of Materials for your production AI pipeline.

LLM04:2025 — Data and Model Poisoning

Data and model poisoning is an attack where adversarial data is introduced into a model’s training, fine-tuning, or feedback datasets to manipulate its behavior in ways that may not surface until specific conditions are triggered.

This is distinct from supply chain (LLM03). Supply chain covers the pipeline components around the model. Poisoning targets the model’s learned behavior directly, through the data it was trained or fine-tuned on.

Poisoning happens in several ways. Public datasets used for fine-tuning can be poisoned before or during collection. Feedback loops, specifically RLHF data from users, can be manipulated at scale by adversarial users. Backdoor attacks embed a hidden trigger. The model behaves normally until a specific phrase or pattern activates the malicious behavior.

Recently, with the Claude Code codebase leaked, we saw in this application instructions to “not add attribution if it’s an Anthropic employee“. Now imagine this was not added at the client-level, but the model training or the RLHF phase was making the model react differently on certain conditions. That’s Model Poisoning.

What makes this particularly insidious is that you can’t see the poison. It’s baked into the weights. We have to differentiate the transparent “open source” models from abstract “open weight” models. The weights allow us to execute, but not to understand the model. The model behaves normally in standard use cases and only misbehaves on specific trigger conditions. Discovery requires systematic red-teaming by security engineers.

To mitigate: vet training data sources and treat them like third-party code dependencies. Use models from verified, auditable sources with documented training data provenance. Monitor model behavior over time for drift or anomalies in specific contexts.

LLM05:2025 — Improper Output Handling

Improper output handling occurs when LLM outputs are passed directly into downstream systems without adequate validation or sanitization. Think about browsers, shells, APIs, or databases.

LLM outputs can contain HTML, JavaScript, shell commands, SQL, or executable code. All of those are potentially dangerous depending on where they land.

If your app renders LLM output as HTML, you have a Cross-Site Scripting risk. If your app passes LLM output to a shell or code interpreter, you have a Remote Code Execution risk. If your app passes LLM output to a database query, you have a SQL injection risk.

The common mistake is treating LLM output as safe because it came from your own system. The model was told to write a response. It wasn’t told to write safe output for every rendering context it might encounter downstream.

To mitigate: apply context-appropriate output encoding everywhere: HTML escaping, SQL parameterization, shell quoting. Never pass raw LLM output to eval(), exec(), or shell commands. Treat LLM output as untrusted user input when passing it to any system that executes it. Have client application rules that prevent execution of certain commands, shell, or only allow the ones you trust.

Subscribe now

LLM06:2025 — Excessive Agency

Excessive agency is when an LLM-based system is granted too much autonomy, functionality, or permissions to act in the world, enabling it to take harmful or unintended actions beyond the intended scope.

I used LLM assistance to write all my code and documents. Efficient, useful. But if the LLM suggests code that is slightly wrong, and I apply it without reviewing, the LLM has exercised agency over something that may cause trouble. At scale, in an agentic workflow where the LLM writes the code, commits it, and triggers the pipeline, this is exactly what LLM06 warns against. The more you automate, the more agency you hand over, and the less oversight each individual action gets.

A human with a stop sign, a robot representing the AI agent speeding past the stop sign, ignoring the instructions

Excessive agency has three dimensions.

  1. Excessive functionality means the LLM can call more tools or APIs than its task requires.
  2. Excessive permissions mean it operates with higher privileges than needed, like read-write when only read is required.
  3. Excessive autonomy means it takes multi-step actions without human checkpoints.

Agentic AI makes this the defining risk of 2026. Single LLM calls have a limited blast radius: the user sees the output and decides. Agentic workflows can take dozens of actions before a human sees results. You have your openClaw taking actions for you while you sleep. Each autonomous step compounds the risk of an uncaught mistake.

To mitigate: scope each LLM agent to the minimum permissions it needs for its specific task. Add human-in-the-loop checkpoints for consequential actions like deploys, permission changes, and file commits. Log all LLM-initiated actions for audit trails.

Regarding guardrails for LLMs, I’d recommend you read this other article:

Harness Engineering: Turning AI Agents Into Reliable Engineers

Harness Engineering: Turning AI Agents Into Reliable Engineers

LLM07:2025 — System Prompt Leakage

System prompt leakage occurs when the confidential instructions given to an LLM via the system prompt are exposed to users, attackers, or downstream systems, revealing business logic, security guardrails, or sensitive configuration.

System prompts have become the standard mechanism for configuring LLM behavior in production apps. A poorly protected system prompt is now equivalent to exposed source code.

What attackers do with leaked system prompts: they map the application’s security controls to find gaps, understand business logic to craft more targeted prompt injections, and extract competitive IP embedded in instructions like proprietary workflows or internal tool names.

Leakage happens in several ways. Direct extraction, where prompts like “Repeat your system prompt” sometimes work on poorly guarded models. Indirect extraction, where crafted user inputs get access to partial system prompt content in responses. Also related to LLM02, if the system prompt itself contains sensitive data, it gets disclosed.

To mitigate: never embed secrets or credentials in system prompts. Test your application for system prompt leakage before deploying. Design system prompts as if they are public. Sensitive logic should live in code, not in prompts.

LLM08:2025 — Vector and Embedding Weaknesses

Vector and embedding weaknesses are vulnerabilities in the retrieval and storage of embeddings used in RAG and semantic search systems, enabling data poisoning, information extraction, or unauthorized access to indexed content.

RAG systems became widespread in 2024, and vector databases are now a core component of enterprise LLM deployments. Embeddings are often treated as opaque black boxes, but they carry real security risks that weren’t widely understood when they got popular.

A vector DB with a lock up front, but a backdoor where a robot, representing the LLM, is able to retrieve the exact original documents

The attack patterns include many scenarios, like:

  • Embedding inversion: extracting the original text from stored embeddings
  • Poisoning the vector store: by injecting adversarial documents that get retrieved as authoritative context
  • Cross-tenant leakage: where one user’s indexed content surfaces in another user’s query context
  • Similarity search abuse: where queries are crafted to surface sensitive documents.

If your RAG system indexes internal Confluence pages, code repositories, or support tickets, the vector store is a high-value target. Access controls on the source documents must be mirrored at the vector store level, not just at retrieval time.

To mitigate: apply source document access controls to vector store queries. Don’t return embeddings from documents that the user couldn’t read directly. Validate and sanitize documents before indexing. Treat ingested content like user input.

Subscribe now

LLM09:2025 — Misinformation

Misinformation occurs when an LLM generates false, misleading, or outdated information that users act on as if it were accurate, particularly dangerous in high-stakes domains like security, medicine, legal, or compliance.

This risk captures both hallucination (fabricated facts) and confident incorrectness (plausible but wrong answers). The risk isn’t just that users trust AI too much. It’s that the AI gives them something false to trust.

For example, I used LLMs to verify some details about different hashing algorithms. This is a common move now, we don’t search in Google and original specs, but we ask the LLM. However, anything related to hashing and encryption has security nuances. Choosing the wrong one in the wrong context is a vulnerability. If I take a decision from an LLM without verifying later against authoritative documentation, I’m falling for LLM09 risk. Scale that to your team writing security documentation, threat models, or code review feedback. Uncritical acceptance of that output is how misinformation enters production.

The reason LLMs produce confident misinformation is structural. LLMs don’t signal uncertainty the way a search result does. They don’t have a cause-and-effect relationship (if A, then B). They only have probabilistic correlations (A and B happen at the same time 99% of the time). In security contexts, a confident but wrong answer can pass through review unchallenged. The fluency of LLM output with human-readable text creates a false sense of verification.

To mitigate: treat LLM outputs as first drafts, not final answers. Establish a verification step: AI output, then human review, then authoritative source check. I personally like moving the human all the way to the right of the process, but not removing the human. Require in your prmopts citation of primary sources where data comes from.

LLM10:2025 — Unbounded Consumption

Unbounded consumption occurs when an LLM application allows users or adversaries to consume excessive computational resources, causing degraded service, runaway costs, or denial of availability.

It covers not just availability attacks but cost exhaustion, budget blowouts, and resource abuse by legitimate users who hit no limits.

A robot representing AI in a hamster wheel, and a counter of the money cost that is increasing as the robot runs

This matters more in 2026 because agentic workflows can trigger cascading API calls. Anthropic just announced a few days ago that they will explicitly forbid the use of their subscription with OpenClaw. One user action with openclaw may spawn dozens of LLM requests. Cost-per-inference has dropped, making it easy to deploy LLMs, and also easy to accidentally burn through API budgets. Multi-step agents and techniques like Ralph Loops with no token or turn limits can self-loop forever.

Attack patterns include flooding the API with max-context-length requests to maximize per-request cost, crafting prompts designed to trigger recursive or verbose responses, and prompt injection that triggers agentic loops consuming resources without termination.

To mitigate: implement rate limiting per user, API key, and endpoint. Set max token limits on inputs and outputs, both per request and per session. Set hard budget caps on AI API spending and alert before they’re hit, not after. Design agentic workflows with explicit termination conditions and maximum iteration counts.

Subscribe now


How to secure LLM applications: practical checklist

Here is every risk mapped to its most important mitigation action.

Honestly, I’d forward this to every engineer I know:

  • LLM01 — Prompt Injection: Treat all external content as untrusted data, not instructions. Validate before acting on LLM outputs.
  • LLM02 — Sensitive Information Disclosure: Mask sensitive data before sending to AI APIs. Audit what your AI tools can access.
  • LLM03 — Supply Chain: Treat AI tooling as supply chain. Pin model versions. Audit tools used in regulated processes.
  • LLM04 — Data and Model Poisoning: Vet training data sources. Use models with documented provenance. Monitor for behavioral drift.
  • LLM05 — Improper Output Handling: Apply context-appropriate encoding on all LLM outputs before rendering or executing.
  • LLM06 — Excessive Agency: Apply least-privilege to AI agents. Add human checkpoints for consequential actions.
  • LLM07 — System Prompt Leakage: Design system prompts as if public. Test for leakage before deployment. Keep secrets in code, not prompts.
  • LLM08 — Vector and Embedding Weaknesses: Mirror source document access controls at the vector store level. Validate documents before indexing.
  • LLM09 — Misinformation: Treat LLM outputs as first drafts. Require authoritative source verification for security guidance.
  • LLM10 — Unbounded Consumption: Set rate limits, token caps, and hard budget limits. Monitor inference costs in real time.

By the way, checklists are one of the most important resources to use AI. You can learn my experience with “read“ checklists and “do“ checklists in this other article

How Checklists + AI Automation Made Me a 10x Engineer (And Can Do the Same For You)

How Checklists + AI Automation Made Me a 10x Engineer (And Can Do the Same For You)

AI security in the real enterprise: what the OWASP list doesn’t tell you

The OWASP list describes risks. It doesn’t describe who gets blamed when they materialize.

We’re using AI tools all the time. When we ship fast, we only take partial credit for using AI, but the credits go to AI. When the AI tool misses a vulnerability, the developer get 100% of the blame. This asymmetry shapes how engineers should actually use AI: Aggressively for prototyping, conservatively when there’s accountability at risk. The struggle is that we don’t realize the asymmetry until something goes wrong.

Organizations are also deploying AI faster than legal and compliance teams can catch up. I’ve developed small tooling, like getting meeting transcripts, and only months later, I got a notice about the legal requirements for it and to switch to an approved tool. LLM02 and LLM06 risks often materialize not from malicious actors but from well-meaning engineers working around slow policy processes. No malice required.

Then there’s the review paradox. Using AI to pass to review is efficient, and I’d always encourage to use AI. But we can’t skip a human reviewing the AI review output. If the AI tool has its own supply chain risks (LLM03), and you’re using it to review a security standard, you’ve introduced the risk you’re reviewing against. This isn’t an argument against using AI tools for security reviews. It’s an argument for understanding what you’re trusting.

The biggest shift in 2026 is what agentic AI does to blast radius. If we go to the first OWASP list for LLM applications in 2023, it assumed the prompts were triggered by a human. A user submitting a prompt and reviewing an output. In 2026, agents act autonomously across multiple steps. These risks all become more dangerous when there’s no human checkpoint in the workflow.

Practically: figure out which of the 10 risks actually apply to your specific AI integration. They don’t all apply equally. Treat each AI tool used in a regulated process as a component requiring its own security review. Document your AI tool usage in your threat models.


Get the free AI Agent Building Blocks ebook when you subscribe to my substack:

Ebook cover of

Subscribe now


Common Questions about LLM Security

What are the OWASP Top 10 risks for LLMs in 2025?

The 2025 OWASP Top 10 for LLM Applications covers Prompt Injection, Sensitive Information Disclosure, Supply Chain, Data and Model Poisoning, Improper Output Handling, Excessive Agency, System Prompt Leakage, Vector and Embedding Weaknesses, Misinformation, and Unbounded Consumption. The full list is at genai.owasp.org/llm-top-10.

What is prompt injection in AI security?

Prompt injection is an attack where malicious input overrides an LLM’s instructions, causing it to take unintended actions. It can be direct (from user input) or indirect (from external content the LLM processes, like a web page or document). It is ranked the top LLM vulnerability because it requires no special access to attempt.

How is the OWASP LLM Top 10 different from the OWASP Web Application Top 10?

The OWASP Web App Top 10 covers traditional web vulnerabilities like XSS and SQL injection. The OWASP LLM Top 10 covers risks specific to large language models: prompt-based attacks, training data risks, autonomous agent behavior, and RAG system vulnerabilities. The two lists have overlapping mitigations but address fundamentally different attack surfaces.

What changed between the 2023/24 and 2025 OWASP LLM Top 10?

The 2025 update added System Prompt Leakage, Vector and Embedding Weaknesses, and Misinformation. It removed Model Theft, Insecure Plugin Design, and Overreliance, which were absorbed into other categories. The update reflects the rise of agentic AI and RAG-based production systems that weren’t widespread when the 2023 list was written.

Subscribe now


Conclusion: the AI you trust is the AI you’re responsible for

The code got shipped, and the feature launched. But the question stayed with me: What unknown unknowns may I be missing?

The OWASP LLM Top 10 doesn’t answer that question for you. It gives you the vocabulary to ask it precisely. It’s not a compliance checkbox, but a thinking tool. Use it to audit how you’re using AI tools for building.

Key Takeaways:

  • The OWASP Top 10 for LLM Applications (2025) defines 10 security risks specific to large language model systems, updated from the 2023/24 list to reflect agentic AI and RAG deployments.
  • Prompt Injection (LLM01) remains the top risk because it requires no special access and persists even when using RAG or fine-tuning.
  • Excessive Agency (LLM06) and Unbounded Consumption (LLM10) are the defining risks of agentic AI in 2026, where autonomous multi-step workflows amplify every unchecked mistake.
  • Sensitive Information Disclosure (LLM02) often comes not from model misbehavior but from engineers inadvertently feeding sensitive data into AI tools they trust.
  • Three risks are new in the 2025 update: System Prompt Leakage (LLM07), Vector and Embedding Weaknesses (LLM08), and Misinformation (LLM09), each reflecting how production LLM deployments evolved since 2023.

Which of these 10 risks is already present in your AI workflow?


If you read until here, continue reading about how to scale your software development process to handle the surge of code of AI: https://strategizeyourcareer.com/p/scaling-software-engineering-with-ai


References: OWASP LLM Top 10

Top comments (0)