How I Built a Multi-Agent Prompt Engineering Runbook with pydantic-ai and FastAPI
Most teams building AI tooling eventually hit the same wall: they have five different prompt patterns scattered across Notion docs, Slack threads, and someone's local Python file. Nobody agrees on the output format. The SWOT analysis prompt returns markdown sometimes and JSON sometimes. The code reviewer just dumps text. When something breaks in production, you spend 40 minutes figuring out which version of the prompt was actually running.
This article walks through an architecture that solves that problem using pydantic-ai, FastAPI, and structured Pydantic outputs. The result is a prompt engineering runbook: a single deployable service that handles SWOT analysis, social post generation, code review, multi-format summarisation, and a decision framework, all returning typed, validated responses.
The Problem: Prompt Sprawl Kills Reliability
Here is a concrete scenario that plays out in teams of five or more engineers.
Someone writes a useful SWOT analyser prompt in a Jupyter notebook. It works great. A teammate copies it into a FastAPI route, changes a few words, and hardcodes the model name. Three months later, a third person builds a Slack bot that uses a slightly different version. Now you have three SWOT analysers in production with no shared contract on what the output looks like.
Downstream systems start breaking because one version returns strengths as a list and another returns them as a comma-separated string. The code reviewer prompt just returns raw text, so the frontend has to parse it with regex. When you upgrade the model, you have no idea which of your prompt functions will silently regress.
Teams that use Slack as their source of truth are the most exposed to this problem. Context lives in threads that expire from memory, decisions get buried, and when someone needs to extract structured insights from that context, they either do it manually or rely on informal scripts that nobody maintains. The chaos compounds because there is no single place that says "this is what our AI outputs look like."
The fix is not better prompt writing. It is a typed contract layer between your prompts and the rest of your system.
The Approach: pydantic-ai + FastAPI as a Typed Contract Layer
The core idea is simple: every agent in the runbook has a Pydantic model as its output type. pydantic-ai enforces that contract at the LLM call boundary. FastAPI exposes each agent as an endpoint with typed request and response bodies.
Why pydantic-ai over alternatives?
LangChain is the obvious comparison. LangChain has output parsers and structured output support, but the abstraction layer is thick. Debugging a failed parse means tracing through multiple internal chain objects. For a runbook that needs to be maintained by the whole team, that opacity is a liability.
Calling the provider SDK directly with instructor is closer to what this is doing, and honestly a valid choice. The tradeoff is that pydantic-ai gives you agent-level retries and tool support out of the box, which matters when you start adding context retrieval or multi-step reasoning.
Raw OpenAI structured outputs work but lock you to one provider. pydantic-ai is provider-agnostic, so swapping from OpenAI to Anthropic or a local model is a config change, not a rewrite.
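As a rough illustration of what "a config change" can look like, the sketch below reads the model identifier from an environment variable. The variable name RUNBOOK_MODEL and the helper resolve_model are hypothetical, and the identifier strings simply follow the provider:model form used in the SWOT example later in this article; check the pydantic-ai docs for the names your version supports.

```python
from __future__ import annotations

import os

DEFAULT_MODEL = "openai:gpt-4o"  # same identifier as the SWOT example


def resolve_model(env: dict[str, str] | None = None) -> str:
    """Return the model identifier to pass to Agent(model=...).

    RUNBOOK_MODEL is a hypothetical variable name chosen for this sketch.
    """
    source = os.environ if env is None else env
    model = source.get("RUNBOOK_MODEL", DEFAULT_MODEL).strip()
    # Expect the "provider:model-name" shape and fail fast on anything else.
    provider, sep, name = model.partition(":")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider:model', got {model!r}")
    return model
```

Passing the result into each Agent keeps the provider choice in one place, though the prompts themselves may still need retuning per provider.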
The key design decision that makes this reliable: every agent is defined with a result_type that is a Pydantic model, not a string. If the output fails validation, pydantic-ai retries the LLM call and feeds the validation error back to the model so the next attempt can correct it. This is the thing that plain prompt engineering cannot give you on its own.
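To make the retry idea concrete, here is a library-free sketch of a generic validate-and-retry loop. It is not pydantic-ai's implementation: call_model is a hypothetical stand-in for the LLM call, validate_swot is a hand-rolled check against a simplified contract, and the feedback wording is invented for illustration.

```python
import json
from typing import Callable

REQUIRED_LISTS = ("strengths", "weaknesses", "opportunities", "threats")


def validate_swot(raw: str) -> dict:
    """Parse raw model output and check it against a simplified SWOT contract."""
    data = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for key in REQUIRED_LISTS:
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{key}' must be a list of strings")
    if not isinstance(data.get("summary"), str):
        raise ValueError("'summary' must be a string")
    return data


def run_with_retries(call_model: Callable[[str], str], prompt: str, retries: int = 1) -> dict:
    """Call the model; on validation failure, retry with the error appended."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    current_prompt = prompt
    for attempt in range(retries + 1):
        raw = call_model(current_prompt)
        try:
            return validate_swot(raw)
        except ValueError as exc:
            if attempt == retries:
                raise  # out of attempts: surface the last validation error
            # Feed the validation error back so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {exc}. "
                "Return valid JSON matching the schema."
            )
    raise RuntimeError("unreachable")
```

Each failed validation costs one more full model call, which is why the retry limit deserves an explicit value.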
The FastAPI layer adds HTTP-level validation on the way in and serialisation on the way out. Every request and response is typed. Your frontend, your Slack bot, and your CI pipeline all talk to the same contract.
The Code Pattern: Typed Agents with Structured Outputs
Here is the central pattern. Everything in the runbook follows this shape.
from pydantic import BaseModel, Field
from pydantic_ai import Agent
from fastapi import FastAPI, HTTPException

# 1. Define the output contract
class SWOTAnalysis(BaseModel):
    strengths: list[str] = Field(description="Internal positive factors")
    weaknesses: list[str] = Field(description="Internal negative factors")
    opportunities: list[str] = Field(description="External positive factors")
    threats: list[str] = Field(description="External negative factors")
    summary: str = Field(description="Two-sentence executive summary")

# 2. Define the input
class SWOTRequest(BaseModel):
    context: str = Field(description="Business or product context to analyse")
    focus_area: str | None = Field(default=None, description="Optional domain focus")

# 3. Create the agent with result_type enforcing the contract
swot_agent = Agent(
    model="openai:gpt-4o",
    result_type=SWOTAnalysis,
    system_prompt=(
        "You are a strategic analyst. Analyse the provided context and return "
        "a structured SWOT analysis. Be specific and actionable. "
        "Each list should contain 3-5 items."
    ),
)

app = FastAPI()

# 4. Expose it as a typed FastAPI endpoint
@app.post("/analyse/swot", response_model=SWOTAnalysis)
async def analyse_swot(request: SWOTRequest) -> SWOTAnalysis:
    prompt = request.context
    if request.focus_area:
        prompt = f"Focus area: {request.focus_area}\n\nContext: {request.context}"
    try:
        result = await swot_agent.run(prompt)
        return result.data
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
What each part does and why it matters:
result_type=SWOTAnalysis is the critical line. This tells pydantic-ai to send your schema to the model (by default as a tool call rather than free text) and to validate the response against your Pydantic model. If the LLM returns malformed JSON or missing fields, pydantic-ai retries automatically, up to the agent's configured retry limit.
response_model=SWOTAnalysis on the FastAPI route means the OpenAPI docs are generated from your actual output type. Your frontend developers can see exactly what fields are returned without reading the prompt.
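result.data gives you the validated Pydantic instance directly. No JSON parsing, no .get() calls with fallbacks. (Newer pydantic-ai releases rename result_type to output_type and result.data to result.output, so match the names to the version you have installed.)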
The same pattern is repeated for every agent in the runbook: code reviewer, social post generator, multi-format summariser, and decision framework. They each have a different Pydantic model and a different system prompt, but the structural shape is identical.
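As an illustration of how the shape carries over, a code-review contract might look like the sketch below. The ReviewIssue and CodeReview models, their field names, and the severity levels are assumptions made for this example, not the models from the repository. It uses only pydantic (v2 API), so the contract can be exercised without an LLM call.

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field, ValidationError


class ReviewIssue(BaseModel):
    severity: Literal["low", "medium", "high"]
    line: Optional[int] = Field(default=None, description="Line number, if known")
    message: str = Field(description="What is wrong and how to fix it")


class CodeReview(BaseModel):
    issues: list[ReviewIssue] = Field(default_factory=list)
    verdict: Literal["approve", "request_changes"]
    summary: str = Field(description="One-paragraph overall assessment")


# A well-formed payload validates into typed objects.
good = CodeReview.model_validate(
    {
        "issues": [{"severity": "high", "line": 12, "message": "SQL built by string concat"}],
        "verdict": "request_changes",
        "summary": "One injection risk; otherwise fine.",
    }
)

# A free-text style payload is rejected instead of leaking downstream.
error_count = 0
try:
    CodeReview.model_validate({"issues": "looks fine", "verdict": "ok", "summary": "n/a"})
except ValidationError as exc:
    error_count = exc.error_count()
```

Plugging a model like this into an agent follows the same steps as the SWOT example: pass it as the agent's output type and as the route's response_model.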
Integration: Connecting to External Sources
The runbook becomes genuinely useful when it is connected to external data sources. The most impactful integration for most teams is Slack.
The data flow looks like this:
Slack channel/thread
  -> Slack API (conversations.history, conversations.replies, or webhooks)
  -> extraction endpoint on the runbook
  -> summariser or SWOT agent
  -> structured output stored in Postgres or returned to Slack
For the Slack integration, you fetch message history using slack_sdk, concatenate the thread into a single context string, and pass it to whichever agent fits the use case. Decision threads go to the decision framework agent. Product discussion threads go to the SWOT analyser. Code snippets shared in chat go to the code reviewer.
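from slack_sdk import WebClient

# `settings` is assumed to be your application's config object holding the bot token.
slack_client = WebClient(token=settings.slack_bot_token)

def extract_thread_context(channel_id: str, thread_ts: str) -> str:
    # Fetch the parent message and its replies (first page only; long threads need pagination).
    response = slack_client.conversations_replies(
        channel=channel_id,
        ts=thread_ts,
    )
    messages = response["messages"]
    # User messages carry a "user" ID; "username" is generally only set on bot posts.
    return "\n".join(
        f"{msg.get('username') or msg.get('user', 'user')}: {msg.get('text', '')}"
        for msg in messages
    )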
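The routing described before the snippet (decision threads to the decision framework, product discussions to the SWOT analyser, code to the reviewer) could be sketched as a simple heuristic. The keyword list, the route_thread name, and the route labels are all hypothetical; a real deployment might route on channel, emoji reaction, or an explicit slash command instead.

```python
FENCE = "`" * 3  # Slack wraps shared code snippets in triple backticks

# Hypothetical phrases that suggest a thread is about making a decision.
DECISION_HINTS = ("should we", "decide", "decision", "trade-off", "tradeoff")


def route_thread(context: str) -> str:
    """Pick an agent label for a thread: 'code_review', 'decision', or 'swot'."""
    if FENCE in context:
        return "code_review"
    lowered = context.lower()
    if any(hint in lowered for hint in DECISION_HINTS):
        return "decision"
    # Default: treat everything else as a product discussion.
    return "swot"
```

Keeping the router as a pure function of the context string makes it easy to unit test before any agent is involved.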
One gotcha worth knowing: Slack message text contains user ID mentions in the format <@U12345>. These will confuse the LLM if left in. Preprocess the context string to replace user IDs with display names or generic placeholders before passing to any agent. You can do this with the users.info API call or by maintaining a local ID-to-name cache.
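A minimal sketch of that preprocessing step, assuming you already hold a local ID-to-name cache. The function name replace_mentions, the "someone" fallback, and the cache contents are illustrative; the pattern also tolerates the older <@U12345|name> form.

```python
import re

# Matches <@U12345> and the legacy <@U12345|label> form.
MENTION_RE = re.compile(r"<@([UW][A-Z0-9]+)(?:\|[^>]*)?>")


def replace_mentions(text: str, names: dict[str, str]) -> str:
    """Swap Slack user-ID mentions for display names from a local cache."""

    def _substitute(match: re.Match) -> str:
        user_id = match.group(1)
        # Unknown IDs fall back to a generic placeholder.
        return "@" + names.get(user_id, "someone")

    return MENTION_RE.sub(_substitute, text)
```

Running this over each message's text before joining the thread keeps raw IDs out of the prompt.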
Tradeoffs and Limitations
This architecture has real costs that you should weigh before building it.
Latency. Every request makes at least one LLM API call. For a code reviewer on a hot path, that is 1-3 seconds minimum. Do not use this for anything that needs sub-200ms response times.
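Retry costs. pydantic-ai's automatic retries on validation failure mean a badly calibrated system prompt can silently double your API spend, since each failed validation triggers another full LLM call. Monitor retry rates and set the agent's retry limit explicitly (the retries argument on Agent) rather than relying on the default.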
Overkill for small teams. If you have two engineers and three prompts, a shared Python module with well-named functions and type hints is probably the right answer. The FastAPI layer adds deployment overhead that only pays off when multiple systems are consuming the same agents.
Provider lock-in is deferred, not eliminated. Switching providers is easier than with raw OpenAI calls, but system prompts that are tuned for GPT-4o may behave differently on Claude or Gemini. You still need to test across providers if portability matters.
For teams with strict documentation habits already, the marginal value is lower. This runbook is most valuable when your AI prompts are currently scattered and your outputs are inconsistent.
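To put a number on the retry-cost point above, the sketch below tracks how many LLM calls each request actually consumed and reports a retry rate. RetryStats and its method names are hypothetical, and how you obtain the per-request call count depends on your pydantic-ai version and logging setup, so that value is simply taken as an input here.

```python
from dataclasses import dataclass


@dataclass
class RetryStats:
    requests: int = 0
    llm_calls: int = 0

    def record(self, calls_used: int) -> None:
        """Record one finished request and the number of LLM calls it needed."""
        if calls_used < 1:
            raise ValueError("a request uses at least one LLM call")
        self.requests += 1
        self.llm_calls += calls_used

    @property
    def retry_rate(self) -> float:
        """Extra calls per request: 0.0 means no retries, 1.0 means spend has doubled."""
        if self.requests == 0:
            return 0.0
        return (self.llm_calls - self.requests) / self.requests
```

A rate that creeps upward after a prompt or model change is an early sign that the system prompt no longer matches the output contract.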
Get the Code and Keep the Conversation Going
I packaged this as an open-source template on GitHub: https://github.com/Reactance0083/pydantic-ai-prompt-engineering-runbook
The scaffold gives you the core patterns for all five agents and the FastAPI setup. If you want the full production version with tests, error handling, provider configuration, logging middleware, and deployment docs, that is available here: https://reactance0083.gumroad.com/l/mdsbpc
If you are building something similar and have hit a different set of tradeoffs, specifically around retry strategies or multi-tenant prompt isolation, I would like to hear about it in the comments. This architecture has a few rough edges I am still working through and real-world feedback tends to surface the problems that local testing misses.