Building a Production-Ready LangGraph-Style Agent: From Raw Documents to Structured Intelligence

A pragmatic walkthrough of orchestrating extraction, summarization, memory, and routing using graph patterns and modular agent components—fully generic.

  1. Why Another “Agent” Article?
    Most write‑ups about agents stop at toy examples. This guide focuses on practical layering: ingest unstructured content (like PDFs), extract what matters, summarize with guardrails, persist long‑term memory, and route requests through specialized workflows—while keeping everything cloud‑friendly and composable. All patterns are standalone; you do not need any specific repository.

  2. Conceptual Architecture (LangGraph Pattern)
    We adopt a graph mindset (inspired by LangGraph) where each node encapsulates a responsibility:

Ingestion Node: Accepts user query + optional file references.
Extraction Node: Pulls raw text from uploaded documents (e.g., PDFs in object storage).
Summarization Node: Produces structured or free‑form summaries (LLM with JSON schema enforcement).
Memory Node: Persists distilled knowledge for subsequent sessions.
Routing Node: Selects workflow type (foundation / RAG / extractor) based on config.
Output Node: Returns assistant response + structural content for UI or downstream processes.
Represented abstractly:

User Input --> [Router] --> ( Foundation | RAG | Extractor )
                                  |         |        |
                           [Summarizer] [Retriever] [Text Parser]
                                  \         |        /
                                   [Memory & Persist]
                                            |
                                       [Response]
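
The snippets below pass state around as a Message model and accept an AgentMessageRequest. Neither is defined in this article, so here is a minimal Pydantic sketch with field names inferred from the examples (treat it as an assumption, not a fixed contract):

from pydantic import BaseModel, Field

class Message(BaseModel):
    """Unified unit of state flowing between nodes (fields inferred from the snippets)."""
    role: str
    content: str = ""
    structural_content: dict = Field(default_factory=dict)
    metadata: dict = Field(default_factory=dict)

class AgentMessageRequest(BaseModel):
    """Inbound request shape assumed by the routing examples."""
    message: str
    sessionId: str
    metadata: dict = Field(default_factory=dict)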

  3. Core Building Blocks
3.1 Structured Summarization
The summarizer turns extracted content plus an existing structural scaffold into a validated JSON object. Generic pattern:

def summarize_json(state: Message, llm: ChatBedrock, schema: dict) -> dict:
    # Build a Pydantic model on the fly so the LLM output is validated.
    DynamicModel = json_schema_to_pydantic(schema)
    structured_llm = llm.with_structured_output(DynamicModel)
    system_context = f"CONTENT: {state.content}\nSTRUCTURED_OUTPUT: {state.structural_content}"
    messages = [
        SystemMessage(content=system_context),
        HumanMessage(content="Summarize and fill the schema accurately."),
    ]
    result = structured_llm.invoke(messages)
    return result.model_dump()
Key takeaways:

Use JSON Schema → dynamic Pydantic model for strong typing (one possible helper is sketched after this list).
Keep prompt minimal; the schema drives completeness.
Separate raw content (unstructured text) from structural_content (prior structured context or template fields).
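The json_schema_to_pydantic helper used above is assumed rather than standard. A minimal sketch for flat object schemas, built on pydantic.create_model, could look like this (nested objects and arrays would need more handling):

from typing import Optional
from pydantic import create_model

_JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool, "array": list, "object": dict}

def json_schema_to_pydantic(schema: dict, name: str = "DynamicModel"):
    # Map each top-level property to a (type, default) pair for create_model.
    required = set(schema.get("required", []))
    fields = {}
    for key, spec in schema.get("properties", {}).items():
        py_type = _JSON_TO_PY.get(spec.get("type", "string"), str)
        if key in required:
            fields[key] = (py_type, ...)             # required field
        else:
            fields[key] = (Optional[py_type], None)  # optional field
    return create_model(name, **fields)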
3.2 Generic Generation (Content vs Structure)
Switch temperature / max tokens depending on output goal:

def generate(context: dict | str, prompt: str, llm: ChatBedrock, structured: bool = False):
    # structured=True is the hook for swapping sampling params (lower
    # temperature, higher max_tokens) when a JSON-shaped answer is the goal.
    ctx = json.dumps(context, indent=2) if isinstance(context, dict) else context
    messages = [
        SystemMessage(content=prompt),
        HumanMessage(content=f"Structured data:\n{ctx}\n"),
    ]
    resp = llm.invoke(messages)
    raw = resp.model_dump().get("content", "").strip()
    # Try JSON first, fall back to text
    try:
        parsed = json.loads(raw)
        return parsed if isinstance(parsed, dict) else raw
    except Exception:
        return raw
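A possible usage, with illustrative names: the same helper returns prose for a free-form prompt and a parsed dict when the model answers with valid JSON.

overview = generate({"title": "Q3 Report", "pages": 42}, "Write a one-paragraph overview.", llm)
fields = generate(raw_text, "Return a JSON object with keys 'title' and 'topics'.", llm, structured=True)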
3.3 Workflow Routing (Generic)
The router inspects configuration (e.g., workflow_type) and dynamically dispatches:

def run_agent(request: AgentMessageRequest) -> Message:
    cfg = load_config()
    workflow = cfg.get("workflow_type", "foundation")
    if workflow == "rag":
        return run_rag(request)  # retrieval + synthesis path
    if workflow == "extractor":
        return run_extractor(request)  # text parsing path
    # Foundation path
    crew = FoundationCrew()  # sets up Agent + Task
    result = crew.crew.kickoff(inputs={"query": request.message})
    return Message(
        role="assistant",
        structural_content={"response": str(result.raw)},
        content=str(result.raw),
        metadata={"workflow_type": workflow, "status": "success"},
    )
Routing Principles:

Keep each specialized agent self‑contained.
Avoid heavy if/else trees by mapping workflow keys to callables (see the sketch after this list).
Return a unified Message model to downstream consumers.
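A minimal sketch of that dispatch-table idea, assuming run_rag, run_extractor, and a run_foundation wrapper (extracted from the foundation branch above) share the same signature:

from typing import Callable

WORKFLOWS: dict[str, Callable[[AgentMessageRequest], Message]] = {
    "rag": run_rag,
    "extractor": run_extractor,
    "foundation": run_foundation,
}

def run_agent(request: AgentMessageRequest) -> Message:
    cfg = load_config()
    handler = WORKFLOWS.get(cfg.get("workflow_type", "foundation"), run_foundation)
    return handler(request)  # every handler returns the unified Message model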
3.4 Memory & Persistence
Persist long‑term summaries (e.g., DynamoDB, PostgreSQL, Redis) via a memory node. Simplified pattern:

def persist_summary(state: Message, user_id: str, session_id: str, table,
                    llm: ChatBedrock, schema: dict) -> None:
    # Distill the session into a structured summary, then store it.
    summary_blob = summarize_json(state, llm, schema)
    item = {
        "user_id": user_id,
        "session_id": session_id,
        "session_time": iso_now_utc(),
        "agent_summary": json.dumps(summary_blob),
    }
    table.put_item(Item=item)

def load_memory(user_id: str, session_id: str, table) -> dict | None:
    resp = table.get_item(Key={"user_id": user_id, "session_id": session_id})
    return resp.get("Item")
Add caching for reads; keep writes idempotent when possible.
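One way to add that read cache, as a sketch: close over the table handle so functools.lru_cache only sees hashable arguments.

from functools import lru_cache

def make_cached_loader(table):
    @lru_cache(maxsize=1024)
    def cached_load(user_id: str, session_id: str):
        # A cache hit avoids a round trip; invalidate with cached_load.cache_clear().
        return load_memory(user_id, session_id, table)
    return cached_load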

3.5 Asynchronous Fan‑Out (Optional)
Queue raw payloads for UI updates / analytics:

def enqueue_payload(message: AgentMessageResponse, queue_url: str, sqs_client) -> None:
    body = json.dumps(message.model_dump())
    sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
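
On the consuming side, a hypothetical drain loop with boto3 (handle is a placeholder for your UI/analytics callback):

def drain_queue(queue_url: str, sqs_client, handle) -> None:
    resp = sqs_client.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=5
    )
    for msg in resp.get("Messages", []):
        handle(json.loads(msg["Body"]))
        # Delete only after successful handling so failures are redelivered.
        sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])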

  4. Text Extraction Use Case (Generic Flow)
    An end-to-end invocation typically looks like this:

Deploy supporting infra (locally or remote) – object storage, agent endpoint.
Upload PDFs to a bucket.
POST an agent payload referencing uploaded file keys.
Receive structured summary / overview.
Condensed generic flow:

pdf_files = discover_local_pdfs("./samples")
for path in pdf_files:
    s3_key = upload_pdf(path, bucket)
    payload = {
        "message": "Give me a concise overview.",
        "sessionId": uuid.uuid4().hex,
        "metadata": {"files": [s3_key]},
    }
    # Enforce a timeout (see guidelines below); large PDFs can stall requests.
    resp = requests.post(agent_url, headers=auth_headers(), json=payload, timeout=120)
    print(parse_summary(resp.json()))
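Here discover_local_pdfs, upload_pdf, auth_headers, and parse_summary are placeholders for your own plumbing. For instance, upload_pdf might wrap boto3 like this:

import pathlib
import boto3

s3 = boto3.client("s3")

def upload_pdf(path: str, bucket: str, prefix: str = "uploads/") -> str:
    key = prefix + pathlib.Path(path).name
    s3.upload_file(path, bucket, key)  # multipart handled automatically
    return key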
Guidelines:

Keep uploads batched to reduce auth overhead.
Return both human‑readable content and machine‑friendly structural_content.
Enforce timeouts; PDFs can be large.

  5. Optional: A Minimal LangGraph-Style Graph Definition
    If you formalize nodes with LangGraph, a simple graph assembly could look like:

from langgraph.graph import Graph

graph = Graph()
graph.add_node("route", route_node)
graph.add_node("extract", extract_node)
graph.add_node("summarize", summarize_node)
graph.add_node("memory", memory_node)
graph.add_node("respond", respond_node)

# Linear flow for brevity; conditional edges can branch per workflow type.
graph.add_edge("route", "extract")
graph.add_edge("extract", "summarize")
graph.add_edge("summarize", "memory")
graph.add_edge("memory", "respond")

# The graph needs explicit entry and finish points before compiling.
graph.set_entry_point("route")
graph.set_finish_point("respond")

app = graph.compile()
result = app.invoke({"message": "Summarize the uploaded docs", "files": file_keys})
print(result["content"])
Design Notes:

Each node keeps a single responsibility.
The compiled graph enforces explicit data flow—easier to test.
Inject observability (timers, counters) at node boundaries (see the sketch below).
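
One sketch of that boundary instrumentation: wrap each node callable before registering it (the metrics sink here is just a logger):

import functools
import logging
import time

log = logging.getLogger("graph")

def observed(name: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(state):
            start = time.perf_counter()
            try:
                return fn(state)
            finally:
                log.info("node=%s latency_ms=%.1f", name, (time.perf_counter() - start) * 1000)
        return inner
    return wrap

# Register the wrapped callable instead of the raw node.
graph.add_node("summarize", observed("summarize")(summarize_node))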

  6. Production Hardening Checklist
Input Validation: Reject oversized or malformed files early.
Structured Output Enforcement: Fail fast if schema fields are missing.
Idempotency: Re-summarize only when source or config changes (sketched below).
Observability: Log node start/end + latency; emit metrics per workflow.
Cost Controls: Cache summaries; adjust temperature and max_tokens conservatively.
Security: Signed URLs for document retrieval; strict auth on agent endpoint.
Drift Handling: Maintain versioned JSON schemas for structured outputs.
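For the idempotency item, one possible approach: derive a deterministic key from the source content and schema version, and skip re-summarizing when that key already exists in your store.

import hashlib
import json

def summary_cache_key(content: str, schema: dict, schema_version: str) -> str:
    # Identical (content, schema, version) triples always map to the same key.
    payload = json.dumps(
        {"content": content, "schema": schema, "version": schema_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()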
  7. Common Pitfalls
Overstuffing the system prompt: better to pass clean content + minimal instruction.
Mixing storage concerns (session state) with transformation logic: keep the memory node isolated.
Ignoring error surfaces: always wrap LLM calls and return structured error objects (see the sketch below).
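For that last pitfall, a minimal wrapper sketch that turns LLM failures into a structured error Message instead of an unhandled exception:

def safe_invoke(llm, messages) -> Message:
    try:
        resp = llm.invoke(messages)
        return Message(role="assistant", content=str(resp.content),
                       metadata={"status": "success"})
    except Exception as exc:
        # Surface the failure as data the caller can route on.
        return Message(role="assistant", content="",
                       structural_content={"error": type(exc).__name__, "detail": str(exc)},
                       metadata={"status": "error"})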
  8. Extending the Graph
Add an Evaluation Node: Auto-grade summaries against reference heuristics.
Add a Retrieval Node: Hybrid semantic + metadata filtering before summarization.
Add a Redaction Node: Strip PII before persistence.
  9. Try It Yourself (Generic Mini Script)

def quick_demo(file_paths: list[str]):
    # Pretend upload + extraction
    extracted_chunks = [open(p, encoding="utf-8").read()[:4000] for p in file_paths]
    merged = "\n".join(extracted_chunks)
    state = Message(role="user", content=merged, structural_content={"sections": []})
    llm = return_llm()  # any provider instance (Bedrock, OpenAI, local, etc.)
    schema = {"type": "object", "properties": {"summary": {"type": "string"}}}
    summary = summarize_json(state, llm, schema)
    print("Summary:\n", summary["summary"])
  10. Conclusion
    A production-ready agent is not magic; it is a disciplined composition of small, testable nodes: routing, extraction, summarization, memory, and output shaping. By expressing the system as a graph, you gain clarity, resilience, and extensibility. Start simple (foundation workflow), then layer retrieval, structured output, and long-term memory as concrete value drivers.

Want to explore evaluation, caching, or multi-modal inputs next? Turn each responsibility into a node, experiment, and iterate.

Happy building.

About the Author
Written by Suraj Khaitan — Gen AI Architect | Working on serverless AI & cloud platforms.
