You’ve got your AI model behaving well. You’ve cleaned your data. You’ve built guardrails to handle prompt injection. But here’s the catch — none of that matters if your API is wide open or your frontend leaks keys.
In this post, we’re tackling a layer that often gets ignored: the infrastructure between the user and the model — specifically, your API layer and frontend interface.
If you’re using FastAPI, Gradio, or any framework for your AI apps, this is for you.
Why API and Frontend Security Matters
AI APIs are a goldmine for attackers:
- They expose high-value endpoints (e.g., GPT-4, Gemini, Claude)
- They often have low/no auth in MVPs and prototypes
- They can leak sensitive info in logs or responses
- They are expensive to run, so abuse translates directly into real money lost
Your model might be smart, but if anyone can POST to your /generate
endpoint without limits, you’ve built an open faucet — and it won’t end well.
Common Risks in AI API Layers
1. Exposed API Keys
Storing OpenAI or Gemini keys directly in frontend code — often in JavaScript or HTML, or on GitHub with the code files — allows anyone to grab and abuse them.
2. Unprotected Inference Endpoints
APIs that accept user prompts and return model responses without auth, validation, or throttling.
3. Rate-limit bypass
If your rate-limiting is weak or IP-based only, attackers can rotate proxies and spam your model.
4. Prompt leaking via logs
Logging raw prompts and outputs for debugging or analytics — without redaction or masking.
5. CSRF / CORS misconfigurations
Allowing requests from any domain or lacking proper CSRF tokens in session-based apps.
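The fix side of this one is cheap. A minimal FastAPI sketch of a restrictive CORS policy (the origin below is a placeholder for your real frontend domain):

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Explicit allowlist instead of "*": only your own frontend may call this API from a browser.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourapp.example.com"],  # placeholder origin
    allow_credentials=True,
    allow_methods=["POST"],
    allow_headers=["Authorization", "Content-Type"],
)
```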
Secure API Design for AI Apps
1. Move API keys to the backend
The frontend should never talk to OpenAI or Gemini directly.
Instead:
- Frontend → your backend → model provider
- Add an auth layer and usage quotas per user
- Store keys in environment variables (or a secrets manager) and rotate them regularly
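As a concrete sketch of the server-side half of that proxy, assuming the official OpenAI Python SDK (v1+) — the model name and helper function are illustrative:

```python
import os
from openai import OpenAI  # official OpenAI Python SDK (v1+)

# The key comes from an environment variable (or secrets manager) and never reaches the browser.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def complete(prompt: str) -> str:
    """Called by your own backend route; the frontend only ever talks to that route."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```

The frontend then calls your route, which enforces auth and quotas before this function ever runs.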
2. Use middlewares
Protect endpoints with:
- Authentication (JWTs, OAuth, session tokens)
- Request validation (e.g., `pydantic` or `zod`; see the sketch after this list)
- Rate-limiting (`slowapi` for FastAPI, `express-rate-limit` for Node)
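For the validation item, a hedged sketch of a pydantic request model (field names and limits are illustrative); the endpoint in the example below could accept this instead of a raw dict:

```python
from pydantic import BaseModel, Field

class GenerateRequest(BaseModel):
    # Caps prompt size up front, which also helps against context flooding.
    prompt: str = Field(..., min_length=1, max_length=4000)
    temperature: float = Field(0.7, ge=0.0, le=2.0)
```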
3. Example: FastAPI Endpoint
```python
from fastapi import FastAPI, Request, HTTPException
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app = FastAPI()
app.state.limiter = limiter
# Without this handler, exceeding the limit raises an unhandled exception instead of returning 429.
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/generate")
@limiter.limit("5/minute")
async def generate(request: Request, payload: dict):
    if not request.headers.get("Authorization"):
        raise HTTPException(status_code=401, detail="Missing auth")
    # sanitize and validate payload here
    # forward to OpenAI / Gemini via your server-side client
    return {"response": "..."}
```
Frontend Security
1. Never expose secrets
Even `.env` variables become public if not scoped properly.

Bad: `NEXT_PUBLIC_OPENAI_API_KEY` on the frontend.

Good: call your backend route (`/api/chat`) and store keys on the server only.
2. Don’t trust user input blindly
Escape HTML or Markdown. Don't render untrusted strings as JSX or via `dangerouslySetInnerHTML` without sanitization.
Use:
- DOMPurify (React/Next.js)
- `bleach` (Python; see the sketch below)
- Built-in escape methods in Gradio
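As a sketch of the Python route, bleach can strip anything outside a small allowlist before model output is rendered as HTML (the tag list below is illustrative):

```python
import bleach  # pip install bleach

ALLOWED_TAGS = ["b", "i", "code", "pre", "a"]  # illustrative allowlist

def sanitize_for_display(model_text: str) -> str:
    # strip=True removes disallowed tags entirely instead of escaping them.
    return bleach.clean(model_text, tags=ALLOWED_TAGS, strip=True)
```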
3. Input size limits
Prevent abuse by setting max character lengths for inputs, file uploads, or text areas. This avoids context flooding and DoS-like behavior.
Observability + Logging: Do It Right
You still need logs — but with guardrails.
- Mask API keys, tokens, emails in logs
- Truncate or hash prompts before storing
- Never log full model outputs in production unless scrubbed
- Store logs securely (e.g., encrypted S3, Redact.dev)
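A small helper along these lines (the regexes are illustrative, not exhaustive) keeps raw prompts out of your logs while still giving you something to correlate on:

```python
import hashlib
import logging
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{8,}")  # OpenAI-style key pattern, illustrative

logger = logging.getLogger("inference")

def log_prompt(prompt: str) -> None:
    masked = KEY_RE.sub("[REDACTED_KEY]", EMAIL_RE.sub("[REDACTED_EMAIL]", prompt))
    digest = hashlib.sha256(prompt.encode()).hexdigest()[:12]
    # Store a short hash for correlation plus a truncated, masked preview; never the raw prompt.
    logger.info("prompt_hash=%s preview=%r", digest, masked[:200])
```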
Bonus: RAG & Vector DB Endpoints
If you’re using Pinecone, Weaviate, or Qdrant for semantic search:
- Require signed or tokenized queries to access embeddings
- Validate source documents before they’re chunked and embedded
- Don’t expose raw vector data to users (it can be reverse engineered)
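One way to do the "signed or tokenized queries" part is a FastAPI dependency that checks a shared token before any search runs. A hedged sketch — the header name, env var, and `vector_store` helper are all hypothetical:

```python
import hmac
import os
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

def verify_query_token(x_query_token: str = Header(...)) -> None:
    # Constant-time comparison against a server-side secret; never trust the client's claim.
    if not hmac.compare_digest(x_query_token, os.environ["VECTOR_QUERY_TOKEN"]):
        raise HTTPException(status_code=403, detail="Invalid query token")

@app.post("/search", dependencies=[Depends(verify_query_token)])
async def search(payload: dict):
    # hits = vector_store.search(payload["query"], top_k=5)  # hypothetical wrapper around Pinecone/Qdrant/Weaviate
    # Return document ids and scores only; raw embedding vectors stay server-side.
    return {"results": []}
```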
Final Thoughts
AI security isn’t just about what happens inside the model.
It’s about everything surrounding it — the wrappers, the servers, the user interface, and the network traffic.
Your AI app should behave like any production-grade backend:
- Secure endpoints
- Isolated secrets
- Clean logging
- Strict rate limiting
In the next post, we’ll explore Deployment Security — securing AI apps once they’re live on Hugging Face Spaces, VMs, or cloud platforms.
Until then, audit your own API layer. Try hitting your endpoints like an attacker. You’ll learn a lot about what you missed.
Connect & Share
I’m Faham — currently diving deep into AI and security while pursuing my Master’s at the University at Buffalo. Through this series, I’m sharing what I learn as I build real-world AI apps.
If you find this helpful, or have any questions, let’s connect on LinkedIn and X (formerly Twitter).
This is blog post #6 of the Security in AI series. Let's build AI that's not just smart, but safe and secure.
See you guys in the next blog.