Josh Lee

Posted on Mar 27

AI Governance 101: How to Assess Risks in LLM-Driven Applications

#ai #llm #security

You built an LLM-powered feature. It works in testing, users seem to like it, and now it's heading to production. Before it ships, someone in legal or compliance asks: "What's our risk assessment for this?"

That question used to be easy to dodge. Now it isn't. The EU AI Act, NIST's AI Risk Management Framework, and OWASP's LLM Top 10 have given regulators, auditors, and enterprise customers a shared vocabulary for what "responsible AI" looks like in practice. If you can't answer that question, you're going to lose deals and create liability.

The good news: this doesn't require becoming a policy expert. It requires understanding a handful of frameworks, applying them to your specific application, and documenting what you find. That's what this tutorial covers.

Why Developers Need to Care About This

AI governance used to be something that legal teams worried about. That's changed. The risks that regulators care about are technical risks, and the people who can actually mitigate them are engineers.

Prompt injection. Sensitive data leaking through model outputs. Models making decisions with real-world consequences and no human review. These aren't abstract policy concerns. They're code problems. And they show up in the code you write around the model, not inside the model itself.

The three frameworks you need to know are:

OWASP Top 10 for LLM Applications — the most practical, developer-facing list of LLM-specific security risks
NIST AI Risk Management Framework — the governance structure used by enterprises and federal agencies to manage AI risk
EU AI Act — the regulatory framework with teeth, especially if you have European customers

We're going to focus on OWASP and NIST because they're actionable. The EU AI Act matters for compliance, but the risk controls it requires are largely the same ones OWASP and NIST already prescribe.

The OWASP Top 10 for LLMs

OWASP's LLM Top 10 is the most useful starting point for developers because it maps directly to things you can fix in your code. The 2025 update reflects real-world LLM deployments, and a few of these have burned enough companies to be worth understanding in depth.

LLM01: Prompt Injection

This is the top risk for a reason. Prompt injection happens when user input (or content your app retrieves from external sources) changes how the LLM behaves in ways you didn't intend.

Direct injection is straightforward: a user types something like "ignore all previous instructions and instead..." and the model follows the injected instruction instead of your system prompt. Indirect injection is sneakier: your app retrieves a document, a webpage, or a database record and passes it to the model, and that content contains embedded instructions that hijack the model's behavior.

The mitigation isn't a single fix. It's a combination of things:

Treat all external content as untrusted. Don't pass raw user input or retrieved content directly into a privileged prompt context.
Apply least-privilege thinking to your model's tool access. If the model can take actions (send emails, query databases, call APIs), limit those capabilities to exactly what each task requires.
Validate and filter outputs, not just inputs. A model that gets injected might produce outputs that trigger downstream exploits.

One important note: RAG (retrieval-augmented generation) and fine-tuning don't solve this. OWASP is explicit that these techniques don't mitigate prompt injection. Your documents can contain injections. Your fine-tuned model can still be redirected by crafted inputs.

LLM02: Sensitive Information Disclosure

LLMs memorize things from their training data and from the context you give them. This creates two problems. First, models can regurgitate sensitive information from training if prompted correctly. Second, your application might pass sensitive data (API keys, user PII, internal configurations) into the context window, and the model might echo that information back in outputs.

The practical controls:

Never put credentials, internal URLs, or customer PII into prompts unless absolutely necessary.
If sensitive data has to be in context, strip or redact it from outputs.
For retrieval-based apps, implement access controls at the retrieval layer. Users should only get documents they're authorized to see, even when the model is doing the retrieval.

LLM03: Supply Chain Vulnerabilities

This one jumped to third place in 2025. When you use a third-party model, a pre-trained embedding, or a fine-tuned checkpoint from somewhere like Hugging Face, you're trusting a supply chain you probably haven't fully audited.

Model cards describe what a model does. They don't provide cryptographic guarantees about where the model came from or whether it's been tampered with. A poisoned model or embedding can behave correctly on most inputs while producing manipulated outputs on specific trigger inputs.

What this looks like in practice:

Pin your model versions. Don't pull latest in production.
Prefer models from providers with documented security practices and model provenance guarantees.
Treat third-party embeddings and fine-tuned checkpoints with the same scrutiny you'd give a third-party dependency.

LLM06: Excessive Agency

This is the governance risk that gets overlooked most often by developers who are excited about agentic features. Excessive agency means you've given the model the ability to take real-world actions (send emails, modify records, call external APIs, run code) without adequate guardrails on what it can do and without human review for high-impact actions.

The model might be technically correct most of the time. But "most of the time" isn't good enough when the action is sending an email to all your customers or deleting a database record.

The fix is designing for the least privilege your feature actually needs. If the model needs to read calendar events to schedule a meeting, it doesn't need write access to the calendar until you've confirmed the proposed meeting with the user. Human-in-the-loop isn't just a nice-to-have for high-stakes actions. It's the difference between a product that builds trust and one that creates liability.

LLM09: Misinformation

This one doesn't get treated as a security risk, but it is. If your application presents model output as authoritative, and that output is wrong, you own that. Customer support bots that confidently cite wrong policies. Medical tools that hallucinate dosages. Legal assistants that cite non-existent case law.

The technical mitigation is grounding: use RAG or structured data sources so the model's responses are anchored to verified content. Add confidence signaling when the model is working outside of verified data. Make it clear in the UX when output is AI-generated and what its limitations are.

The NIST AI Risk Management Framework

NIST's AI RMF is the framework enterprises and government agencies use to structure their AI governance programs. It has four core functions: Govern, Map, Measure, and Manage. Think of it as the organizational layer on top of the technical controls you get from OWASP.

You don't need to implement the entire framework to get value from it. The structure helps you think through risk at the application level and document your decisions, which is what you need when compliance or legal comes asking.

Govern

Govern is about who owns what. Before you ship an LLM feature, someone needs to be accountable for each of the following:

Risk appetite: what level of AI-related risk is acceptable for this application?
Model stewardship: who owns documentation, versioning, and evaluation of the model?
Security: who owns adversarial testing and incident response?
Privacy and compliance: who's reviewing data handling and regulatory requirements?

This doesn't have to be four different people. On a small team, one person might own several of these. The point is that these questions have explicit answers, not implicit assumptions.

Map

Map is where you document what your application does and what could go wrong. For each LLM-powered feature, you want to capture:

What is the model being asked to do?
What data goes in, and where does that data come from?
What actions can the model trigger?
Who are the users, and what's the impact if the model gets it wrong?

This doesn't have to be elaborate. A one-page document per feature that answers these questions is enough to get started. The value is in forcing explicit thinking before you're in incident response mode.

Measure

Measure is ongoing evaluation. For LLM applications, this means:

Accuracy and drift monitoring: is the model's performance staying consistent over time? Model behavior can shift as the underlying model is updated by the provider.
Bias and fairness audits: for features that affect different groups of users differently, are outcomes equitable?
Red-teaming: regularly stress-test your prompts and flows with adversarial inputs. Treat this like penetration testing.
Output quality sampling: periodically review a sample of real production outputs. This is how you catch problems that automated metrics miss.

The cadence depends on risk level. A customer support bot that gives wrong answers needs more frequent evaluation than a summarization feature for internal documents.

Manage

Manage is what happens when something goes wrong, and what you do to prevent it at scale. The key components:

Incident response plan: what do you do when the model produces harmful output? Who gets notified? How do you mitigate it?
Override and appeal mechanisms: for any decision the model participates in that affects users (loan approvals, content moderation, pricing), users need a way to get a human review.
Decommissioning plan: how do you retire a model version safely? What happens to data that was used for training or evaluation?

Building Your Risk Assessment

When you're ready to document a risk assessment for an LLM feature, here's a structure that works:

Application description
What does this feature do? What model does it use? What data goes in and out?
OWASP LLM risk mapping
Go through the OWASP Top 10 and for each risk, note: is this applicable to our feature? If yes, what controls do we have? What residual risk remains?
Impact and likelihood
For each applicable risk, rate the potential impact (low, medium, high) and the likelihood given your controls. High impact + high likelihood = must fix before launch. High impact + low likelihood = mitigate and monitor. Low impact = document and accept.
Governance ownership
Name the person accountable for each governance responsibility from the NIST framework.
Monitoring plan
How will you know if something goes wrong in production? What metrics or sampling processes will catch issues?

This doesn't need to be a 40-page document. A clear one-pager that covers these five areas is more useful than an elaborate framework nobody reads.

The Practical Starting Point

If you're building an LLM feature today and haven't thought about governance yet, start here.

First, go through OWASP's LLM Top 10 and check your application against each risk. The ones that require immediate attention are prompt injection (if you accept user input or retrieve external content), excessive agency (if your model can take real-world actions), and sensitive information disclosure (if any sensitive data passes through context).

Second, implement the principle of least privilege everywhere. Least privilege for model tool access. Least privilege for data retrieval. Least privilege for actions the model can trigger.

Third, add human review for any action that's hard to reverse. Delete, send, publish, approve. If the model suggests it, a human should confirm it.

Governance isn't about slowing down development. It's about building things that work reliably at scale and hold up when someone looks closely at how they work. Start with the OWASP list and build from there.

DEV Community