
Pico

Posted on • Originally published at agentlair.dev

Don't Let Your AI Agents Hold Their Own Credentials

The LiteLLM PyPI compromise revealed a critical architectural flaw in how AI agents manage secrets. A hidden .pth file executed at Python startup, harvesting environment variables, SSH keys, and cloud credentials before any application code ran.

The Supply Chain Attack That Worked Because Credentials Were There

On March 25, 2026, versions 1.82.7 and 1.82.8 of LiteLLM—a popular Python routing layer for AI model providers—were compromised on PyPI. The attack used a .pth file that executes automatically when Python starts, requiring no imports.

The payload collected:

  • Environment variables (API keys, service tokens, database passwords)
  • SSH private keys and authorized_keys files
  • AWS credentials, IMDS tokens, and IAM role files
  • Kubernetes configs and service account tokens
  • Docker authentication configs
  • Git credentials and shell history
  • Cryptocurrency wallet directories

The encrypted data was sent to a domain mimicking the legitimate service. This succeeded not because the attack was sophisticated, but because credentials were sitting exactly where an attacker would look first.

Why Python Startup Attacks Work

The .pth mechanism is a standard Python feature. Files in site-packages ending in .pth are processed by site.py at interpreter startup to extend sys.path, and any line in them that begins with `import` is executed as code. A compromised .pth file therefore runs arbitrary code before your application starts, with no import statement required in your own program.
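A harmless sketch of that mechanism, using a temporary directory instead of the real site-packages: `site.addsitedir` processes a directory's .pth files exactly the way interpreter startup does, executing any line that begins with `import`.

```python
import os
import site
import tempfile
import pathlib

# A benign .pth demo: lines beginning with "import" are executed
# when the directory is processed, just as site.py does at startup.
d = tempfile.mkdtemp()
pth = pathlib.Path(d) / "demo.pth"
pth.write_text("import os; os.environ['PTH_RAN'] = '1'\n")

site.addsitedir(d)  # simulates interpreter-startup .pth processing

print(os.environ.get("PTH_RAN"))  # → 1
```

Replace the environment-variable write with credential harvesting and exfiltration, and you have the LiteLLM payload in miniature.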

Current agent deployments typically load all credentials at startup:

import os
import litellm

# Every secret is read from the environment before any work happens
response = litellm.completion(
    model="openai/gpt-4o",
    api_key=os.environ["OPENAI_API_KEY"],
    messages=[{"role": "user", "content": task}],  # `task` defined elsewhere
)

All secrets—OPENAI_API_KEY, AWS_SECRET_ACCESS_KEY, GITHUB_TOKEN, database URLs—live in memory throughout the agent's lifetime. A startup attack bypasses application logic and finds them anyway.
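The harvest step needs nothing more than a dictionary comprehension. Any code running inside the process, including code executed from a .pth file before main(), sees the same os.environ:

```python
import os

os.environ["DEMO_API_KEY"] = "sk-demo"  # stand-in for a real secret

# Anything executing inside the process can enumerate every secret
# the agent loaded at startup; no privilege escalation is needed.
suspects = ("KEY", "TOKEN", "SECRET", "PASSWORD")
harvested = {k: v for k, v in os.environ.items()
             if any(s in k for s in suspects)}

assert "DEMO_API_KEY" in harvested
```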

The Structural Problem: Agents as Credential Holders

The real issue isn't supply chain hygiene alone. It's architectural: agents load all credentials at startup and hold them in memory for their entire lifecycle.

The OWASP MCP Top 10 ranks "token mismanagement" as the #1 risk, observing that most agents treat credentials as ambient environment rather than as scoped, time-limited resources. Published research has found that 92% of MCP servers store secrets in plaintext configuration files.

This mirrors the risk model of container escapes and process injection attacks. Owning the process means owning everything it holds. For an agent running for hours while managing multiple tool connections, that's a substantial attack surface.

From Credential Holders to Vault Accessors

The alternative is architecturally straightforward: agents don't store credentials. They store vault access.

Current model:

[Python Startup]
  → Load all credentials from environment
  → Hold in memory
  → Use throughout agent lifetime
  → Attack surface: entire process, all credentials

Vault model:

[Python Startup]
  → Load vault API key (limited scope only)

[At moment of need]
  → Fetch specific secret from vault
  → Use immediately
  → Discard from memory
  → Attack surface: one credential, moment of use

Even if an attacker steals the vault API key, its scope is bounded. They access only the specific agent's secrets, not your entire credential store. And they can only grab secrets that were in memory at compromise time.

HashiCorp Vault pioneered this approach for infrastructure: dynamic secrets, short TTLs, audit logging, revocation. But it's enterprise software requiring ops teams, Kubernetes, and dedicated infrastructure. Solo developers building autonomous agents or MCP servers need the architecture without the operational overhead.

The Missing Layer: Agent-Native Credential Management

The current landscape offers two extremes:

Option 1: .env files and environment variables—zero setup, zero security. The LiteLLM attack target.

Option 2: HashiCorp Vault, AWS Secrets Manager, Azure Key Vault—correct architecture, designed for enterprise infrastructure teams, requires organizational IAM and network policies.

Between them: a gap. Developers building agents need a vault model implementation that works with a simple API call, not enterprise platform setup.

Implementing Vault Architecture for Agents

An agent-focused vault solution should be:

  • Client-side encrypted: Server sees only ciphertext blobs, never plaintext
  • Moment-of-use fetching: Credentials loaded when needed, not at startup
  • Per-agent scoped: One vault key per agent limits compromise blast radius
  • Audit-ready: Track what was accessed and when
  • Developer-simple: Minimal setup friction

Example implementation:

import { VaultClient } from '@agentlair/vault';
import OpenAI from 'openai';

const vault = new VaultClient({
  apiKey: process.env.VAULT_API_KEY
});

// At the moment of use—not startup:
const openaiKey = await vault.get('openai-api-key');
const client = new OpenAI({ apiKey: openaiKey });

const result = await client.chat.completions.create({ ... });

// openaiKey falls out of scope after use
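The "client-side encrypted" property can be illustrated with the real `cryptography` package. This is a sketch of the principle, not any particular vault's wire format: the key never leaves the agent, so the server only ever stores an opaque blob.

```python
from cryptography.fernet import Fernet

agent_key = Fernet.generate_key()  # held by the agent, never uploaded
f = Fernet(agent_key)

ciphertext = f.encrypt(b"sk-live-example")  # what the vault server stores
assert b"sk-live" not in ciphertext          # server sees only ciphertext

plaintext = f.decrypt(ciphertext)            # decrypted at the moment of use
print(plaintext.decode())  # → sk-live-example
```

A server breach under this model yields nothing usable without each agent's individual key.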

What This Architecture Prevents

Against startup-based supply chain attacks:

| Attack vector | Environment variables | Vault model |
| --- | --- | --- |
| .pth file at startup | All env vars harvested | Vault API key only (limited scope) |
| Process memory dump | All credentials in memory | Only credentials fetched in the last N seconds |
| Container escape | Entire environment exfiltrated | Ciphertext only, unreadable to the server |
| Log exposure | Plaintext in logs | Never logged; referenced by key name |

The vault API key is still a credential, but it's bounded in scope. Rotating it is one operation. Auditing access is straightforward.

Minimum Defensive Posture

If you're using LiteLLM or similar AI proxies, the baseline defense isn't just dependency pinning (though you should pin). It's relocating credentials:

  1. Move credentials from environment variables into a vault
  2. Fetch credentials when needed, not at startup
  3. Scope access: one vault key per agent or service
  4. Audit access and retrieval patterns
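Steps 1–3 above can be sketched in a few lines. `VaultClient` here is a hypothetical stand-in (an in-memory dict rather than a real HTTPS vault) kept only to show the fetch-use-discard shape:

```python
import os

class VaultClient:
    """Hypothetical stand-in for an agent-scoped vault client.
    A real client would fetch ciphertext over HTTPS and decrypt locally."""
    def __init__(self, api_key: str):
        self._api_key = api_key                      # the ONLY startup secret
        self._store = {"openai-api-key": "sk-demo"}  # server-side in reality

    def get(self, name: str) -> str:
        return self._store[name]

def run_task(task: str) -> str:
    vault = VaultClient(api_key=os.environ.get("VAULT_API_KEY", "va-demo"))
    key = vault.get("openai-api-key")  # fetched at the moment of need
    # ... authenticate exactly one request with `key` ...
    return f"completed {task!r} with key ending {key[-4:]}"
    # `key` goes out of scope here; it never lived in os.environ

print(run_task("summarize"))
```

The startup-time attack surface shrinks to the vault API key alone; every other secret exists in memory only for the duration of the call that needs it.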

This doesn't require enterprise platforms. It requires treating agent credentials with the same discipline applied to infrastructure credentials over the past decade.

Convenience and security aren't in conflict. A vault client requiring 30 seconds to integrate is as convenient as environment variables, with a fundamentally different risk posture.


The LiteLLM compromise succeeded because agents are architecturally convenient—they load everything at startup. That convenience created the attack surface. Shifting to vault-based architecture—fetching credentials at the moment of use, scoping access per agent, maintaining audit trails—doesn't add complexity. It replaces a naive model with a defensible one.

The question isn't whether supply chain attacks will happen again. They will. The question is whether your agents will be holding your entire credential store when they do.


AgentLair provides a credential vault and identity layer for AI agents.
