FastAPI + LLM: Build a Production-Ready AI API in 30 Minutes
Need to serve an LLM through a real API? Here's how to build one in under 30 minutes, and what to add before it takes production traffic.
Why FastAPI + LLM?
FastAPI is a strong fit for AI APIs because:
- Native async support: async endpoints keep serving other requests while an LLM call is in flight
- Auto-generated docs: Swagger UI out of the box at /docs
- Type validation: Pydantic models reject bad requests before they reach your LLM
- WebSocket support: stream tokens to clients as they are generated
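To see why the async point matters, here is a minimal sketch that uses only the standard library. The function fake_llm_call is a hypothetical stand-in for a network-bound LLM request, and the 0.2 second latency is an arbitrary assumption. Three calls run concurrently and finish in roughly the time of one, which is the behaviour an async endpoint relies on.

```python
import asyncio
import time


async def fake_llm_call(prompt: str, latency: float = 0.2) -> str:
    """Hypothetical stand-in for a network-bound LLM request."""
    # Sleeping yields control to the event loop, like waiting on a socket.
    await asyncio.sleep(latency)
    return f"reply to: {prompt}"


async def handle_many(prompts):
    """Run all calls concurrently and report the wall-clock time taken."""
    start = time.perf_counter()
    replies = await asyncio.gather(*(fake_llm_call(p) for p in prompts))
    return list(replies), time.perf_counter() - start


if __name__ == "__main__":
    replies, elapsed = asyncio.run(handle_many(["a", "b", "c"]))
    print(replies, f"{elapsed:.2f}s")
```

The same effect only holds in a real endpoint if the LLM client itself is awaited. A blocking call inside an async function stalls every other request on that worker.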
Minimal Working API
from fastapi import FastAPI
from openai import AsyncOpenAI
from pydantic import BaseModel

app = FastAPI(title="AI API", version="1.0")
# Async client, so awaiting the LLM call does not block the event loop.
# It reads OPENAI_API_KEY from the environment.
client = AsyncOpenAI()


class ChatRequest(BaseModel):
    message: str
    system_prompt: str = "You are a helpful AI assistant."
    model: str = "gpt-4"
    max_tokens: int = 1000


class ChatResponse(BaseModel):
    reply: str
    model: str
    tokens_used: int


@app.post("/chat", response_model=ChatResponse)
async def chat(request: ChatRequest):
    # Await the call so other requests are served while the LLM responds.
    response = await client.chat.completions.create(
        model=request.model,
        messages=[
            {"role": "system", "content": request.system_prompt},
            {"role": "user", "content": request.message},
        ],
        max_tokens=request.max_tokens,
    )
    return ChatResponse(
        # content can be None for some responses, so fall back to "".
        reply=response.choices[0].message.content or "",
        model=request.model,
        tokens_used=response.usage.total_tokens,
    )
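To try the endpoint from another process, a small client along these lines should work. It is a sketch that uses only the standard library. The base URL is an assumption (a server started locally on port 8000), and build_chat_request and send_chat are hypothetical helper names, not part of FastAPI or the OpenAI SDK.

```python
import json
import urllib.request

# Assumption: the API is running locally on port 8000.
BASE_URL = "http://localhost:8000"


def build_chat_request(message: str, max_tokens: int = 200,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the ChatRequest model."""
    payload = {"message": message, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=f"{base_url}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(message: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the ChatResponse JSON (needs a live server)."""
    request = build_chat_request(message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_chat_request("Hello!")
    print(req.get_method(), req.full_url)
```

Fields left out of the payload (system_prompt, model) fall back to the defaults declared on ChatRequest. The interactive Swagger UI at /docs offers the same call without writing any client code.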
Production Checklist
- [ ] Add rate limiting (slowapi or custom middleware)
- [ ] Add API key authentication
- [ ] Add request logging for monitoring
- [ ] Set up health check endpoint
- [ ] Configure CORS for web clients
- [ ] Add timeout middleware (LLM calls can hang)
- [ ] Use environment variables for secrets
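Three of these items (API key authentication, timeouts, and secrets in environment variables) share some framework-independent core logic, sketched below with the standard library only. The environment variable names AI_API_KEY and LLM_TIMEOUT_SECONDS, the 30 second default, and the helper names are all assumptions made for illustration.

```python
import asyncio
import hmac
import os
from typing import Optional

# Assumptions: both variable names and the 30 s default are illustrative.
API_KEY_ENV = "AI_API_KEY"
LLM_TIMEOUT_SECONDS = float(os.environ.get("LLM_TIMEOUT_SECONDS", "30"))


def is_valid_api_key(presented: Optional[str]) -> bool:
    """Compare the client's key with the secret stored in the environment."""
    expected = os.environ.get(API_KEY_ENV)
    if not expected or not presented:
        return False  # fail closed if the secret is unset or no key was sent
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())


class LLMTimeoutError(Exception):
    """Raised when the upstream LLM call exceeds its time budget."""


async def call_with_timeout(awaitable, timeout: float = LLM_TIMEOUT_SECONDS):
    """Await an LLM call, cancelling it if it runs past the timeout."""
    try:
        return await asyncio.wait_for(awaitable, timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise LLMTimeoutError(f"LLM call exceeded {timeout}s") from exc
```

In a FastAPI app, is_valid_api_key would typically be called from a dependency that raises HTTPException with status 401, and LLMTimeoutError would be mapped to a 504 response. That wiring is left out here to keep the sketch self-contained.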
Deploy in 5 Minutes
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4
That's it. Once the checklist items above are in place, you have a production-ready AI API.
Building AI tools? Follow me for more practical guides. Code available at GitHub.