Claude code

Posted on Jun 15

The complete guide to llm security best practices

#llmsecuritybestpractices

What Are LLM Security Best Practices?

LLM security best practices are a set of technical controls, architectural patterns, and operational procedures that prevent language model deployments from being exploited, manipulated, or used as vectors for data exfiltration, privilege escalation, or unauthorized code execution.

That definition matters because most teams treat LLM security like they treated API security in 2012 — as an afterthought bolted on after deployment. The threat surface is different here. You are not just protecting an endpoint; you are protecting a reasoning system that can be convinced to misuse its own capabilities. Prompt injection, indirect injection through tool outputs, and confused deputy attacks against MCP servers are not hypothetical risks. They are documented, reproducible, and increasingly common in production environments.

This guide covers what the attack surface looks like in 2026, how to structure a defense, and which tools are actually worth deploying.

Why LLM Security Matters More in 2026

In 2023, LLM deployments were mostly chatbots. In 2026, they are autonomous agents with filesystem access, shell execution, and connections to internal APIs. Claude Code, GitHub Copilot, and similar tools don't just answer questions — they read files, write code, run tests, and commit to repositories. That capability expansion has made the security stakes meaningfully higher.

A 2025 analysis by Trail of Bits identified prompt injection as present in over 70% of production agentic systems they audited, with roughly half allowing indirect injection through tool outputs that were never sanitized before being passed back to the model. The attack pattern is straightforward: an attacker embeds instructions in a document, web page, or API response that the agent will eventually read. The model processes the injected content as instructions and executes them with whatever permissions it holds.

The second trend worth understanding is tool proliferation through the Model Context Protocol. MCP servers give agents capabilities — running queries, reading secrets, calling external APIs. Each tool is an attack surface. Each server that isn't properly scoped is a potential confused deputy: an agent with read-write database access that can be instructed, via injected content, to delete records it was never supposed to touch.

For teams deploying AI coding assistants specifically, the Claude Code Security documentation covers the specific threat model for agentic code execution environments, including how tool authorization scoping should be structured and what audit logging looks like at the reasoning-trace level.

How to Approach LLM Security

Start With the Threat Model

Before you configure anything, map what your agent can do. List every tool it has access to, the permissions each tool requires, and what the worst-case outcome is if a tool is misused. A read-only database query tool carries different risk than a tool that can execute arbitrary SQL. Write this down. Most teams skip this step and end up retrofitting controls after an incident.

A useful framing: treat every tool as a potential attack vector, not just a capability. If an attacker could control what the agent reads, could they cause it to misuse any of these tools? If yes, that tool needs explicit authorization controls — not just authentication, but per-invocation authorization against a known-good policy.

Implement Input and Output Validation

Every external input that enters the model context should be treated as untrusted. This includes tool outputs, retrieved documents, web content, and user-provided files. The validation approach depends on the use case, but at minimum you need:

Instruction-stripping or instruction-flagging for retrieved content before it enters the prompt
- Output filtering that catches known exfiltration patterns (e.g., base64-encoded payloads, anomalous network requests generated by the agent)
- Structural validation for tool parameters — if a tool expects a filename, it should not accept a string containing shell metacharacters

Scope Tool Permissions Precisely

The principle of least privilege applies to agent tools just as it does to service accounts. An agent that needs to read files in /src should not have a filesystem tool scoped to /. An agent that queries one database schema should not have credentials that reach a second schema.

This is operationally harder than it sounds because most MCP server implementations don't have fine-grained tool scoping built in. You frequently have to implement it at the wrapper layer — intercepting tool calls and validating them against a policy before execution. At Enkrypt AI, we've found that teams who implement this at the tool-call layer catch significantly more exploitation attempts than teams who rely on model-level instructions alone.

Log Reasoning Traces, Not Just Actions

Standard audit logs capture what an agent did. Reasoning-trace logs capture why it thought it should do it. The distinction matters for incident response: if an agent exfiltrates data, knowing which tool it called tells you what happened. Knowing what it was reasoning about tells you how the injection worked and where the control failed.

This requires your logging infrastructure to capture intermediate reasoning steps, not just final tool invocations. Some platforms surface this natively; others require instrumentation at the prompt-construction layer.

LLM Security Tools and Solutions Worth Deploying

The tooling category is still maturing, but several solutions have demonstrated practical value in production environments:

Prompt injection scanners — tools like Rebuff and Lakera Guard evaluate incoming prompts and retrieved content for injection attempts before they reach the model. Neither is foolproof, but both add a meaningful detection layer at low latency cost.
- Policy engines — OPA (Open Policy Agent) can be adapted to evaluate tool-call requests against defined authorization policies. It's infrastructure-level work, but it gives you verifiable, auditable controls that aren't dependent on the model's behavior.
- Sandboxed execution environments — for agents that execute code, sandboxing with strict egress controls limits blast radius. An agent that can't make outbound network calls from its execution environment can't exfiltrate data even if it's successfully injected.
- Behavioral anomaly detection — monitoring for unusual patterns in tool usage (e.g., an agent suddenly accessing files it has never accessed before, or generating network requests to external hosts) catches attacks that rule-based systems miss.

For teams running Claude Code in enterprise environments, the Claude Code Security product overview details the built-in controls — including tool-call interception, allowlist enforcement, and reasoning-trace logging — that are available without requiring custom instrumentation.

If you're evaluating options and need to understand how pricing scales with team size and audit requirements, the Claude Code Security pricing page breaks down tier differences clearly.

LLM Security Best Practices: Operational Checklist

Theory is useful; a repeatable checklist is more useful. Before deploying any LLM agent to production, verify:

$1
1. $1
2. $1
3. $1
4. $1
5. $1
6. $1

The last point is underweighted by most teams. Running known prompt injection payloads against your own agent in a staging environment is cheap. Discovering your controls fail after a production incident is not. The Claude Code Security blog covers specific attack scenarios and test cases teams can adapt for their own adversarial testing programs.

Frequently Asked Questions

What is prompt injection in LLMs?

Prompt injection is an attack where malicious instructions are embedded in content that an LLM processes — user inputs, retrieved documents, tool outputs, or web pages. When the model reads the injected content, it may treat the embedded instructions as legitimate commands and execute them, potentially bypassing intended behavior constraints or misusing tools it has access to.

What is indirect prompt injection?

Indirect prompt injection occurs when the attack payload reaches the model through a secondary channel rather than directly from the user. For example: an agent searches the web and retrieves a page containing hidden instructions; the agent reads a file that an attacker has written to a shared directory; or an API response includes embedded directives. The model processes all of this as context, making indirect injection harder to detect than direct injection.

How do I secure MCP tool access for LLM agents?

Treat each MCP tool as a separate attack surface. Scope permissions to the minimum the tool requires, implement per-invocation authorization checks using a policy engine rather than relying on model-level instructions, log every tool call with its parameters and stated rationale, and periodically audit which tools are actually being used versus which were provisioned. Remove tools that aren't actively needed — every unused tool is unnecessary attack surface.

What are the best tools for LLM security in 2026?

For prompt injection detection: Rebuff and Lakera Guard provide real-time scanning at the input layer. For tool authorization: Open Policy Agent adapted to evaluate tool-call requests against defined policies. For execution sandboxing: containerized environments with strict egress controls limit exfiltration risk. For agentic code execution specifically: Claude Code Security provides built-in tool interception, allowlist enforcement, and reasoning-trace audit logging without requiring custom instrumentation.

How do I get started with LLM security best practices?

Start by documenting your threat model: list every tool your agent has access to, the permissions each tool requires, and the worst-case outcome if misused. Then implement controls in priority order — tool scoping first (highest impact, catches the widest range of attacks), input validation second, output filtering third. Don't wait until your security posture is perfect to deploy; deploy with the controls you have, log everything, and iterate. The operational checklist in this article is a practical starting point.

What are common LLM security mistakes to avoid?

The most common mistakes: treating model-level instructions as a security control (they're not — they can be overridden by injection); provisioning tools with broad permissions because scoping is inconvenient; logging only tool invocations without capturing the model's reasoning; skipping adversarial testing before production deployment; and assuming that because an agent "seems to behave well" in normal use, it's secure against targeted attack. Normal-use behavior tells you almost nothing about adversarial robustness.

What is a confused deputy attack in LLM contexts?

A confused deputy attack occurs when an agent is tricked into misusing its own legitimate permissions on behalf of an attacker. The agent isn't compromised in the traditional sense — its credentials and tools are valid. Instead, an attacker manipulates the agent's reasoning, via prompt injection, to use those legitimate capabilities for unauthorized purposes. An agent with write access to a production database can become a confused deputy if it reads injected instructions telling it to modify or delete records. Preventing confused deputy attacks requires per-invocation authorization checks that verify the agent's intent, not just its credentials.

DEV Community