Eli

Posted on Jun 13 • Originally published at aiglimpse.ai

FastAPI and Starlette Security for AI Agents: Production Hardening

#tools #machinelearning #tutorial

Vulnerabilities, patches, and a hardening checklist for Python teams deploying AI agent backends.

FastAPI and Starlette are the default Python frameworks for deploying AI agent backends, but they inherit a legacy of web framework vulnerabilities that take on new meaning when agents have permission to call external APIs, modify databases, or execute tool code. This guide covers the specific vulnerabilities that matter to agent deployments, current patch status, and a hardening checklist for teams moving to production in 2026.

Why this matters now

AI agents are no longer experimental. Teams are running production multi-agent systems that execute real actions via tool calls, and those tools often talk to FastAPI or Starlette backends. A path traversal vulnerability or CORS misconfiguration in a simple tool API becomes a security incident when an untrusted agent (or a prompt-injected trusted agent) can read files, modify data, or invoke unintended tools.

The risk profile is asymmetric: your Starlette app might be well-intentioned and tightly scoped, but the agents calling it are not. They are incentivized to probe endpoints, try parameter injection, and escalate privilege. Unlike a traditional API attack, agent-driven exploitation is automated, rapid, and often invisible until logs are reviewed days later. FastAPI's documentation is excellent for basic use cases, but it does not emphasize threat modeling for agent-driven access patterns or the interaction between ASGI middleware, dependency injection, and untrusted input.

Common FastAPI and Starlette vulnerabilities in agent deployments


FastAPI sits on top of Starlette's ASGI middleware stack. Understanding the surface area means understanding both layers.

Path traversal via route parameters is the most common vulnerability. When a handler accepts a string (for example, a route declared as /file/{filename:path}, or a filename query parameter) and passes it directly to a file operation, an agent can send ../../etc/passwd and read arbitrary files. FastAPI validates types, not filesystem safety: nothing in the framework stops a string parameter from containing .. segments or an absolute path.
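A minimal sketch of the usual fix, assuming all readable files live under one base directory (the /srv/agent-data path and the safe_path helper are hypothetical): resolve the joined path, then confirm it is still inside the base. The check compares path components with Path.is_relative_to (Python 3.9+) rather than string prefixes, so a sibling directory such as /srv/agent-data-private does not slip through.

```python
from pathlib import Path

# Hypothetical directory that file-reading tools are allowed to touch
BASE_DIR = Path("/srv/agent-data").resolve()


def safe_path(filename: str, base: Path = BASE_DIR) -> Path:
    """Return filename resolved under base, or raise if it escapes base.

    resolve() collapses '..' segments and follows symlinks, so the
    containment check runs on the real target. An absolute filename
    replaces base entirely when joined, and is rejected the same way.
    """
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {filename!r}")
    return candidate
```

In a route, call safe_path() before opening the file and translate ValueError into a 400 or 404 response.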

Dependency injection bypasses occur when a Depends() function is used for authentication but its result is cached at the wrong scope. FastAPI itself caches a dependency's result only for the duration of a single request, so the problem comes from application code: if the resolved user is stored in a module-level variable, or in thread-local storage inside a sync dependency (which FastAPI runs in a shared threadpool), a second request from a different user may reuse the first user's identity. This is rare in modern FastAPI code but appears whenever global state creeps into dependency functions.
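A stripped-down illustration in plain Python of why resolving identity from process-level state is dangerous. The token table and both functions are hypothetical stand-ins for a real auth backend and a FastAPI dependency.

```python
from typing import Dict, Optional

# Hypothetical token store standing in for a real auth backend
TOKENS: Dict[str, str] = {"tok-alice": "alice", "tok-bob": "bob"}

_cached_user: Optional[str] = None  # process-wide: shared by every request


def current_user_buggy(token: str) -> str:
    """Anti-pattern: resolves the user once and reuses it for everyone."""
    global _cached_user
    if _cached_user is None:
        _cached_user = TOKENS[token]
    return _cached_user


def current_user(token: str) -> str:
    """Safe pattern: derive identity from this request's credentials every time."""
    user = TOKENS.get(token)
    if user is None:
        raise PermissionError("invalid token")
    return user
```

Calling current_user_buggy("tok-alice") and then current_user_buggy("tok-bob") returns "alice" both times, while current_user("tok-bob") returns "bob". In FastAPI, the safe version corresponds to the function passed to Depends(), taking the token from a header parameter so that each request supplies its own.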

CORS (Cross-Origin Resource Sharing) misconfiguration is endemic. Setting allow_origins to ["*"] or a broad list lets JavaScript on any website call your agent API from a user's browser and read the responses. If the API also sets allow_credentials=True and authenticates with cookies, Starlette's CORSMiddleware echoes the caller's origin back instead of "*", so an attacker's website can make authenticated requests on the user's behalf. Many teams enable CORS permissively to "make development easier" and never tighten it in production.
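One way to keep CORS tight is to build the middleware settings from an explicit allowlist and refuse to start if a wildcard slips in. A sketch, assuming origins arrive as a comma-separated string such as a hypothetical CORS_ORIGINS environment variable. Requiring https:// also rejects http://localhost, so development origins would need a separate path.

```python
from typing import Dict, List


def cors_settings(origins_csv: str) -> Dict[str, object]:
    """Build CORSMiddleware keyword arguments from an explicit origin allowlist."""
    origins: List[str] = [o.strip() for o in origins_csv.split(",") if o.strip()]
    if not origins:
        raise ValueError("no CORS origins configured")
    if "*" in origins:
        raise ValueError("wildcard origin is not allowed")
    if any(not o.startswith("https://") for o in origins):
        raise ValueError("only https:// origins are allowed")
    return {
        "allow_origins": origins,
        "allow_credentials": False,  # no cross-origin cookies
        "allow_methods": ["GET", "POST"],
        "allow_headers": ["Authorization", "Content-Type"],
    }
```

Usage would be app.add_middleware(CORSMiddleware, **cors_settings(os.environ["CORS_ORIGINS"])), with CORSMiddleware imported from fastapi.middleware.cors; the four keys above are CORSMiddleware's own parameter names.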

Unencrypted WebSocket connections (ws:// instead of wss://) are used frequently in agent systems for streaming responses. A WebSocket upgrade request can leak authentication tokens in the URL or headers, and the connection itself carries unencrypted tool results and agent reasoning.
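A client-side guard is cheap insurance here. This hypothetical helper refuses plain ws:// URLs except for local development hosts and rejects credentials placed in the query string, where proxies and access logs tend to record them. The set of parameter names it checks is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}
# Assumed names; extend with whatever your services actually use
CREDENTIAL_PARAMS = {"token", "access_token", "api_key", "apikey", "key"}


def check_ws_url(url: str) -> str:
    """Return url unchanged if it is safe to connect to, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {url!r}")
    if parts.scheme == "ws" and parts.hostname not in LOCAL_HOSTS:
        raise ValueError("plain ws:// is only allowed for local development")
    leaked = {name.lower() for name in parse_qs(parts.query)} & CREDENTIAL_PARAMS
    if leaked:
        raise ValueError(f"credentials in query string: {sorted(leaked)}")
    return url
```

Where the WebSocket client library supports it, send the token in a header during the upgrade request instead of the URL.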

Deserialization of untrusted JSON via Pydantic is generally safe (Pydantic never evaluates input as code), but complex nested models with validators that call external services can be exploited: an agent sending a deeply nested JSON payload with thousands of objects can trigger a DoS if a validator runs, and calls out, once per object.
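A cheap pre-check before handing a payload to Pydantic is to bound its size, depth, and element count. This sketch uses only the standard library; the three limits are illustrative values, not recommendations from any framework.

```python
import json
from typing import Any

MAX_BODY_BYTES = 64_000  # illustrative limits; tune for your tools
MAX_DEPTH = 8
MAX_NODES = 1_000


def check_payload(raw: bytes) -> Any:
    """Parse JSON and reject bodies that are too large, too deep, or too wide."""
    if len(raw) > MAX_BODY_BYTES:
        raise ValueError("payload too large")
    try:
        data = json.loads(raw)
    except RecursionError:  # pathological nesting can exhaust the parser's stack
        raise ValueError("payload nested too deeply") from None
    stack = [(data, 1)]
    nodes = 0
    while stack:  # iterative walk, so the check itself cannot recurse too deeply
        value, depth = stack.pop()
        nodes += 1
        if depth > MAX_DEPTH:
            raise ValueError("payload nested too deeply")
        if nodes > MAX_NODES:
            raise ValueError("payload has too many elements")
        if isinstance(value, dict):
            stack.extend((v, depth + 1) for v in value.values())
        elif isinstance(value, list):
            stack.extend((v, depth + 1) for v in value)
    return data
```

In a FastAPI route, this would run on await request.body(), followed by your Pydantic model's model_validate() (Pydantic v2) on the returned data. Malformed JSON raises json.JSONDecodeError, which is also a ValueError.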

Debug mode left enabled in production is a serious vulnerability. With FastAPI(debug=True) (or Starlette(debug=True)), unhandled exceptions return a full traceback to the client, including file paths and the source lines around each frame, which hands an attacker a map of your code. This should never be enabled in production, but teams that wire the flag to a DEBUG environment variable sometimes deploy with DEBUG=True by accident.
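A small startup guard turns an accidental debug deployment into a failed deploy instead of a leak. A sketch, assuming an ENVIRONMENT variable that is set to development only on developer machines; the variable name and both helpers are hypothetical.

```python
import os
from typing import Mapping, Optional


def debug_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Fail closed: debug is allowed only when ENVIRONMENT is exactly 'development'."""
    env = os.environ if env is None else env
    return env.get("ENVIRONMENT") == "development"


def assert_safe_startup(app_debug: bool, env: Optional[Mapping[str, str]] = None) -> None:
    """Refuse to start if debug mode is on anywhere other than development."""
    if app_debug and not debug_allowed(env):
        raise RuntimeError("debug mode is enabled outside development; refusing to start")
```

A typical wiring would be app = FastAPI(debug=debug_allowed()) followed by assert_safe_startup(app.debug) at import time, so a debug flag set somewhere else still cannot ship. An unset ENVIRONMENT counts as "not development".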

Starlette CVEs and patch status

Older Starlette 0.x releases had several notable issues. Two examples: CVE-2023-29159, a path traversal in StaticFiles, fixed in 0.27.0; and CVE-2024-47874, a denial of service in multipart form parsing caused by buffering form fields in memory without a size limit, fixed in 0.40.0.

Starlette 0.26.0 through 0.28.0 had a minor vulnerability in the TestClient where session cookies were not properly isolated between test cases, but this only affects test code, not production deployments.

Starlette has continued to publish security fixes in newer releases, so staying current matters more than any single patch. And "no known vulnerabilities" never means "unhackable": the framework's design assumes developers will validate inputs, configure middleware correctly, and not expose internal APIs.

The practical guidance: upgrade FastAPI to a current release (0.115.x or later) and let it pull in the newest Starlette version it supports. FastAPI pins a compatible Starlette range, so upgrading Starlette on its own can conflict with your installed FastAPI. Check the release notes for your current versions at github.com/encode/starlette/releases and github.com/tiangolo/fastapi/releases, and use pip list | grep -E 'starlette|fastapi' to audit versions across all services.
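The same audit can run inside each service at startup or in CI. A sketch using importlib.metadata from the standard library; the minimum versions are illustrative (Starlette 0.40.0 is the release that fixed CVE-2024-47874, and 0.115.0 mirrors the FastAPI floor suggested above), and the parser deliberately ignores pre-release suffixes such as rc1.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Illustrative floors; set these from the advisories that apply to you
MINIMUMS: Dict[str, Tuple[int, int, int]] = {
    "starlette": (0, 40, 0),
    "fastapi": (0, 115, 0),
}


def numeric_version(text: str) -> Tuple[int, ...]:
    """'0.41.2' -> (0, 41, 2); '1.0' -> (1, 0, 0); '0.40.0rc1' -> (0, 40, 0)."""
    parts = []
    for segment in text.split("."):
        match = re.match(r"\d+", segment)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(segment):  # stop at a suffix such as 'rc1'
            break
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def audit(minimums: Dict[str, Tuple[int, int, int]] = MINIMUMS) -> Dict[str, str]:
    """Map each package name to 'ok', 'upgrade (<installed>)', or 'not installed'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = numeric_version(installed) >= minimum
        report[name] = "ok" if ok else f"upgrade ({installed})"
    return report
```

Printing audit() in a startup hook or CI step gives a per-package verdict; anything other than "ok" is the signal to upgrade.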

MCP server exposure and agent-specific risks


MCP (Model Context Protocol) servers are a common pattern in agent systems: an agent asks the server which tools are available (the tools/list method), then asks it to execute them (tools/call). MCP servers, and homegrown tool APIs modeled on them, are often built on FastAPI or Starlette. The vulnerability comes from exposing these tool execution endpoints without authentication or rate limiting.

Consider a simple example: an agent calls /api/tools/execute with a body like {"tool": "read_file", "path": "/home/user/data.csv"}. If this endpoint is world-readable (bound to 0.0.0.0 and reachable from the internet), any attacker can enumerate tools and call them. Even if the endpoint requires an API key, a leaked key (via logs, error messages, or a compromised client) gives full agent capability to an attacker.

The hardening approach:

Never bind MCP tool endpoints to 0.0.0.0:8000. Bind to 127.0.0.1 with the reverse proxy on the same host, or to a private network interface reachable only from a reverse proxy on another machine, and let that proxy route authenticated requests inbound.
Require API key rotation on a short schedule (weekly or monthly, not annually). Store keys in a secrets manager (HashiCorp Vault, AWS Secrets Manager, 1Password) and rotate them without redeploying the app.
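Implement per-tool rate limiting. Destructive tools deserve much tighter limits than read-only ones: if an agent is allowed to call delete_user 1000 times per hour, an alert should fire long before that, for example at the 100th call.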
Log every tool call with the agent ID, timestamp, parameters, and result. Use structured logging (JSON) so security teams can query the logs later.
Use JWT tokens with a short expiration (5 to 15 minutes) instead of static API keys. Agents must refresh tokens by calling an auth endpoint, which is another place to check rate limits and agent status.
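Per-tool limits can be sketched as an in-process sliding window keyed by agent and tool. This is a hypothetical helper, not slowapi's API. State lives in one process, so a deployment with several workers would need a shared store instead. Tools missing from the limits table are denied, which doubles as a tool allowlist.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class ToolRateLimiter:
    """Sliding-window rate limiter keyed by (agent_id, tool)."""

    def __init__(
        self,
        limits: Dict[str, int],  # tool name -> max calls per window
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits
        self.window = window_seconds
        self.clock = clock
        self.calls: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, agent_id: str, tool: str) -> bool:
        """Record and allow the call if under the limit; return False otherwise."""
        now = self.clock()
        history = self.calls[(agent_id, tool)]
        while history and now - history[0] >= self.window:
            history.popleft()  # drop calls that fell out of the window
        if len(history) >= self.limits.get(tool, 0):  # unknown tools get 0
            return False
        history.append(now)
        return True
```

For example, ToolRateLimiter({"read_file": 1000, "delete_user": 100}) allows 100 delete_user calls per agent per hour; when allow() returns False, the endpoint would reject the call with HTTP 429 and raise an alert.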
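For the logging step, one JSON object per tool call keeps records queryable. A sketch using the standard logging and json modules; the field names and the set of keys to redact are assumptions, and redaction here is shallow (top-level parameters only).

```python
import json
import logging
import time
from typing import Any, Dict, Optional

logger = logging.getLogger("agent.tools")

# Assumed parameter names that should never reach the logs
SENSITIVE_KEYS = {"password", "token", "api_key", "secret", "authorization"}


def redact(params: Dict[str, Any]) -> Dict[str, Any]:
    """Replace values of sensitive top-level keys."""
    return {k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else v for k, v in params.items()}


def tool_call_record(
    agent_id: str,
    tool: str,
    params: Dict[str, Any],
    status: str,
    timestamp: Optional[float] = None,
) -> str:
    """Serialize one tool call as a single-line JSON string."""
    record = {
        "ts": time.time() if timestamp is None else timestamp,
        "agent_id": agent_id,
        "tool": tool,
        "params": redact(params),
        "status": status,
    }
    return json.dumps(record, sort_keys=True, default=str)


def log_tool_call(agent_id: str, tool: str, params: Dict[str, Any], status: str) -> None:
    logger.info(tool_call_record(agent_id, tool, params, status))
```

Because each record is a single JSON line, a log shipper can forward it to a central store without custom parsing.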

ASGI middleware security and the order of operations

Starlette, and therefore FastAPI, wraps middleware so that the last one registered with add_middleware becomes the outermost layer and sees each request first. A common mistake is to assume the opposite, which puts security middleware behind middleware that already inspects or acts on the request.

For example:

```python
app.add_middleware(CORSMiddleware, ...)       # registered first: innermost
app.add_middleware(CustomLoggingMiddleware)
app.add_middleware(AuthenticationMiddleware)  # registered last: outermost, runs first
```

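Requests pass through these in reverse order of registration: AuthenticationMiddleware first, then CustomLoggingMiddleware, then CORSMiddleware. That ordering has its own problems. Requests rejected by authentication never reach the logger, so failed attempts go unrecorded. And if AuthenticationMiddleware rejects requests without credentials, browser CORS preflight (OPTIONS) requests, which never carry credentials, are refused before CORSMiddleware can answer them. Swap the order and the logger records request bodies before authentication has run. If AuthenticationMiddleware is missing from the stack entirely, requests bypass authentication.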
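A toy model makes the ordering rule easy to check. This is not real ASGI (each layer is just an async function of a request dict), but it wraps layers the way Starlette's add_middleware ends up doing: the first layer registered sits closest to the endpoint, and the last one registered runs first.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], Awaitable[str]]


def layer(name: str, trace: List[str]) -> Callable[[Handler], Handler]:
    """Build a middleware factory that records when it sees the request."""
    def factory(inner: Handler) -> Handler:
        async def middleware(request: Dict[str, Any]) -> str:
            trace.append(name)
            return await inner(request)
        return middleware
    return factory


def build_stack(app: Handler, registered: List[Callable[[Handler], Handler]]) -> Handler:
    """Wrap app in registration order, so the last-registered layer is outermost."""
    for factory in registered:
        app = factory(app)
    return app


async def endpoint(request: Dict[str, Any]) -> str:
    return "ok"


if __name__ == "__main__":
    trace: List[str] = []
    stack = build_stack(endpoint, [layer("CORS", trace), layer("Logging", trace), layer("Auth", trace)])
    asyncio.run(stack({}))
    print(trace)  # ['Auth', 'Logging', 'CORS']
```

Registering CORS, Logging, and Auth in that order produces the Auth, Logging, CORS request order described above.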

The safer pattern is to use FastAPI's dependency injection for per-route authentication instead of global middleware. Define a dependency that reads the credential header (Authorization, or a custom header such as X-API-Key), validates it, and returns the caller's identity to the route. This is more explicit and easier to audit.

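```python
from fastapi import Depends, FastAPI, HTTPException, Request
from pydantic import BaseModel

app = FastAPI()
VALID_KEYS = {"example-key"}  # placeholder: load real keys from a secrets manager

class ToolRequest(BaseModel):
    tool: str

async def verify_api_key(request: Request) -> str:
    key = request.headers.get("X-API-Key")
    if not key or key not in VALID_KEYS:
        raise HTTPException(status_code=403, detail="Forbidden")
    return key

@app.post("/api/tools/execute")
async def execute_tool(body: ToolRequest, key: str = Depends(verify_api_key)):
    # Only reached if verify_api_key returned, so the request is authenticated
    return {"result": "..."}
```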

This approach also allows different routes to have different authentication levels. An internal health check endpoint might not require a key, while the tool execution endpoint does.

Production hardening: a concrete checklist

Start with these items and expand based on your threat model:

Update dependencies. Run pip install -U starlette fastapi uvicorn and test in a staging environment before deploying to production. Use pip-audit or Snyk to scan for known vulnerabilities in your dependency tree.
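Disable debug mode. Leave FastAPI's debug parameter at its default of False in every production environment. Avoid a single DEBUG environment variable that can be copied into production by mistake, and make any check fail closed: enable debug only when the environment is explicitly development, for example debug = os.getenv("ENVIRONMENT") == "development", rather than whenever it is not production.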
Enable HTTPS/TLS. Bind the app to 127.0.0.1:8000 (or to a private interface when the proxy runs elsewhere) and run a reverse proxy (nginx, Caddy, or AWS ALB) in front that handles TLS termination. Let's Encrypt provides free certificates. Never serve an agent API over plain HTTP across a network in production, even an internal one: anyone on the path can read API keys, tokens, and tool results.
Configure CORS correctly. Instead of allow_origins=["*"], specify the exact origins that need access: allow_origins=["https://agent-frontend.example.com", "https://dashboard.example.com"]. Set allow_credentials=False unless you specifically need cross-origin cookies.
Implement authentication on all agent endpoints. Use API keys with a short expiration, JWT tokens, or OAuth2. Store keys in a secrets manager, not in source code or environment files checked into git. Rotate keys on a fixed schedule independent of deployment.
Add rate limiting. Use a library like slowapi (based on Flask-Limiter) to limit requests per agent or IP. Set conservative limits: 10 requests per second per agent, 100 per minute for public endpoints. Monitor for unusual patterns and alert.
Log all agent tool calls. Include the agent ID, tool name, parameters (redacted if sensitive), timestamp, and result. Use structured logging (JSON) and forward logs to a centralized store (ELK, Datadog, CloudWatch). Review logs daily or set up alerts for unauthorized tool use.
Run in a container, not as root. Create a non-root user in your Dockerfile and run the app as that user. This limits the blast radius if the process is compromised.
Validate all input. Use Pydantic models with strict validation: max_length on strings and lists, ge/le bounds on numeric fields, and pattern constraints on identifiers. This blocks oversized fields and special-character injection in identifiers. Also cap the total request size at the reverse proxy, because Pydantic validates a body only after it has been read.
Isolate MCP endpoints. If your agent calls tool execution endpoints, bind them to a private network or localhost only. Use a different port from the public-facing API, or use a routing rule in the reverse proxy to require special headers before forwarding to the tool endpoint.
Use environment variables for secrets, but not as global state. Load secrets at startup, validate them, and pass them to functions as arguments. Never store secrets in global variables that might be logged or cached.
Implement request timeouts. Set a timeout on all external calls (to other services, databases, or anything an agent can trigger). An agent that stalls a request indefinitely can tie up connections, threadpool slots, or database pool connections until the service stops responding. Use asyncio.wait_for() or httpx.AsyncClient with timeout parameters.
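The input-validation item, sketched as a Pydantic v2 model for a tool request like the read_file example earlier. The field names, patterns, and bounds are assumptions chosen for illustration; extra="forbid" makes unknown keys an error instead of silently dropping them.

```python
from typing import List

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class ToolCall(BaseModel):
    model_config = ConfigDict(extra="forbid")  # unknown keys are rejected

    tool: str = Field(pattern=r"^[a-z_]{1,64}$")  # identifiers only
    agent_id: str = Field(max_length=64, pattern=r"^[A-Za-z0-9-]+$")
    path: str = Field(max_length=255)
    limit: int = Field(default=100, ge=1, le=1000)
    tags: List[str] = Field(default_factory=list, max_length=20)


def parse_tool_call(payload: dict) -> ToolCall:
    """Validate a decoded JSON payload; raises ValidationError on bad input."""
    return ToolCall.model_validate(payload)
```

When ToolCall is used directly as a route's body parameter, FastAPI runs the same validation automatically and returns a 422 response on failure.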
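For the secrets item, a sketch that loads and validates secrets once at startup into an immutable object, which is then passed explicitly to the code that needs it. TOOL_API_KEY, the 32-character minimum, and call_tool_backend are assumptions; field(repr=False) keeps the value out of repr(), so logging the settings object does not print the key.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass(frozen=True)
class Secrets:
    tool_api_key: str = field(repr=False)  # excluded from repr() output

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Secrets":
        """Read and validate secrets once; fail fast if anything is missing."""
        env = os.environ if env is None else env
        key = env.get("TOOL_API_KEY", "")
        if len(key) < 32:
            raise RuntimeError("TOOL_API_KEY is missing or too short")
        return cls(tool_api_key=key)


def call_tool_backend(secrets: Secrets, tool: str) -> dict:
    """Hypothetical consumer: receives secrets as an argument, not from a global."""
    return {"tool": tool, "authenticated": bool(secrets.tool_api_key)}
```

Because the dataclass is frozen, code that receives a Secrets object cannot overwrite the key by accident.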
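For the timeout item, asyncio.wait_for bounds any awaitable. A minimal sketch with a hypothetical run_tool_with_timeout wrapper: on timeout, wait_for cancels the underlying task and the wrapper returns an error result instead of raising. For outbound HTTP calls, httpx.AsyncClient(timeout=httpx.Timeout(10.0, connect=2.0)) sets equivalent limits on the client itself.

```python
import asyncio
from typing import Any, Awaitable

TOOL_TIMEOUT_SECONDS = 5.0  # illustrative default


async def run_tool_with_timeout(call: Awaitable[Any], timeout: float = TOOL_TIMEOUT_SECONDS) -> Any:
    """Await a tool call, giving up (and cancelling it) after `timeout` seconds."""
    try:
        return await asyncio.wait_for(call, timeout=timeout)
    except asyncio.TimeoutError:
        return {"error": "tool call timed out", "timeout_seconds": timeout}


async def slow_tool() -> str:
    await asyncio.sleep(10)  # stands in for a hung downstream service
    return "finished"


if __name__ == "__main__":
    print(asyncio.run(run_tool_with_timeout(slow_tool(), timeout=0.1)))
    # {'error': 'tool call timed out', 'timeout_seconds': 0.1}
```

Returning a structured error lets the agent see that the tool timed out rather than waiting on a connection that never completes.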

When this fails: real incidents and limitations

Even with a hardened setup, several attack vectors remain difficult to prevent at the application level.

Prompt injection is the canonical example. If an agent reads user-supplied data (from a file, database, or API), then includes that data in a prompt sent to an LLM, an attacker can inject instructions into the LLM that cause it to call tools incorrectly. No amount of FastAPI security stops this, because the attack happens at the LLM layer, not the HTTP layer. Mitigation requires model-level controls (instruction hierarchies, tool allowlisting per agent, prompt filtering).
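Per-agent tool allowlisting is the one control on that list the tool API itself can enforce: it does not stop the injection, but a hijacked agent can only invoke tools it was explicitly granted. A sketch with a hypothetical, hard-coded mapping; in practice the grants would come from configuration that agents cannot modify.

```python
from typing import Dict, FrozenSet

# Hypothetical grants: agent ID -> tools it may call
TOOL_GRANTS: Dict[str, FrozenSet[str]] = {
    "research-agent": frozenset({"search", "read_file"}),
    "ops-agent": frozenset({"read_file", "restart_service"}),
}


class ToolNotAllowed(Exception):
    pass


def authorize_tool(agent_id: str, tool: str) -> None:
    """Raise unless this agent has been granted this tool; unknown agents get nothing."""
    if tool not in TOOL_GRANTS.get(agent_id, frozenset()):
        raise ToolNotAllowed(f"{agent_id!r} may not call {tool!r}")
```

The agent ID should come from the authenticated credential, never from the request body, and ToolNotAllowed maps naturally to a 403 response.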

Side-channel attacks via timing analysis are theoretically possible if an agent times how long an endpoint takes to respond and infers information about data existence. For example, if /api/check_user/{username} takes 50ms for valid usernames and 10ms for invalid ones, an attacker can enumerate valid usernames. This is hard to prevent at the application level without adding artificial delays to all responses, which hurts performance.
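Where the timing leak comes from a credential check, a cheaper option than padding every response is to do the same work whether or not the user exists. A sketch using the standard library's hashlib.pbkdf2_hmac and hmac.compare_digest; the in-memory user table is hypothetical and the iteration count is illustrative. This does not help an endpoint whose whole purpose is to report whether a username exists.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

ITERATIONS = 100_000  # illustrative; follow current guidance for your hash


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical user table: username -> (salt, password hash)
USERS: Dict[str, Tuple[bytes, bytes]] = {}

# Used for unknown usernames so they cost the same hashing work as real ones
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = _hash(os.urandom(16).hex(), _DUMMY_SALT)


def add_user(username: str, password: str) -> None:
    salt = os.urandom(16)
    USERS[username] = (salt, _hash(password, salt))


def check_login(username: str, password: str) -> bool:
    """Hash once on every path and compare in constant time."""
    known = username in USERS
    salt, expected = USERS.get(username, (_DUMMY_SALT, _DUMMY_HASH))
    matches = hmac.compare_digest(_hash(password, salt), expected)
    return known and matches
```

Both branches perform one PBKDF2 computation and one constant-time comparison, so response time no longer depends on whether the username exists.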

Distributed DoS (DDoS) attacks that overwhelm your reverse proxy are beyond application-level mitigation. You need a DDoS protection service (Cloudflare, AWS Shield) or infrastructure-level rate limiting (at the network edge).

Compromised dependencies are a systemic risk. If a transitive dependency in your requirements.txt is compromised, you inherit its vulnerabilities. Use pip-audit regularly and subscribe to security advisories for your core dependencies. Consider using a software composition analysis (SCA) tool in your CI/CD pipeline.

Misconfigurations in production are the most common failure mode. A team hardens the code perfectly but misconfigures the reverse proxy, or enables CORS for debugging and forgets to disable it. Use infrastructure-as-code tools (Terraform, CDK) and enforce code review on infrastructure changes, just as you do for application code.

Conclusion and next steps

FastAPI and Starlette are production-ready frameworks, but agent deployments carry asymmetric risk because agents are not human users. They are automated, tireless, and incentivized to find edge cases. Start by upgrading to the latest Starlette and FastAPI, then work through the hardening checklist above. Bind internal APIs to localhost, use a reverse proxy for public traffic, implement per-agent rate limiting and authentication, and log all tool calls. None of these steps are novel, but their combination is what separates a research project from a production system that teams can rely on. After that, focus on the agent layer: validate outputs from external APIs before passing them to the LLM, implement instruction hierarchies so agents cannot redefine their own permissions, and monitor for anomalous tool use patterns. Security is a process, not a checklist.

This article was originally published on AI Glimpse.
