Most AI systems break the moment they leave a notebook. They work fine as demos (one prompt in, one response out) but fall apart when asked to reason in steps, collaborate across tasks, recover from errors, or operate securely at scale. This is where Agentic AI becomes necessary. Instead of a single large prompt, we design systems that plan, execute, validate, and respond, much like a small team of engineers working together.
In this article, I’ll walk through how to build a production-grade Agentic AI system on AWS, using LangGraph and CrewAI for orchestration, AWS Bedrock and SageMaker for intelligence, and Amazon EKS to deploy the whole thing as a scalable API.
The Problem: Why a Single LLM Call Is Not Enough
If you’ve built LLM-powered features before, you’ve probably run into the same issues:
- The model produces inconsistent results.
- A single failure breaks the entire flow.
- There's no memory or state across steps.
- Observability is poor.
- Security and access control feel bolted on.
Agentic AI solves this by explicitly modeling how thinking happens instead of pretending everything fits into one prompt. But to do that well, we need structure.
A Quick Look at the Architecture
At runtime, the system behaves like a normal backend service. A client sends a request, an API responds. Internally, however, that request triggers a multi-step reasoning workflow.
The request enters through an API Gateway endpoint and is routed to services running on Amazon EKS. Inside the cluster, LangGraph orchestrates the reasoning flow, CrewAI manages collaboration between agents, and the agents themselves call AWS Bedrock for foundation models or SageMaker endpoints for custom ML predictions. State is persisted, validated, and finally returned to the caller.
The key idea is simple: treat intelligence like a distributed system, not a function call.
Modeling Reasoning Explicitly with LangGraph
LangGraph is the backbone of the system because it forces us to be honest about how reasoning works.
Instead of chaining prompts, we define a graph where each node represents a step in thinking or execution, and edges represent transitions. State flows through the graph and gets updated along the way.
Let’s start by defining the shared state.
```python
from typing import TypedDict, List

class AgentState(TypedDict):
    user_query: str
    plan: str
    research_notes: List[str]
    risk_score: float
    final_answer: str
```
This state object becomes the contract between all agents. No hidden context. No magic.
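To see what a node owes this contract, here is a minimal stdlib sketch of the read/update pattern. The node body is hypothetical; in the real system it would delegate to an agent:

```python
from typing import TypedDict, List

class AgentState(TypedDict, total=False):
    user_query: str
    plan: str
    research_notes: List[str]
    risk_score: float
    final_answer: str

def plan_node(state: AgentState) -> AgentState:
    # Each node reads only the fields it needs and returns
    # only the fields it updates; the graph merges them in.
    return {"plan": f"1. Clarify: {state['user_query']}\n2. Research\n3. Validate"}

state: AgentState = {"user_query": "Analyze compliance risks"}
state.update(plan_node(state))
```

Because updates are partial dicts, no node can silently clobber fields it never declared an interest in.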
Designing Agents That Do One Thing Well
Rather than creating a single “smart” agent, we split responsibilities. This mirrors how humans actually work.
- One agent plans.
- Another researches.
- Another validates.
- Another synthesizes the final answer.
CrewAI makes this collaboration straightforward.
```python
from crewai import Agent

planner = Agent(
    role="Planner",
    goal="Break the user query into clear, actionable steps",
    backstory="Senior system architect who excels at structured thinking"
)

researcher = Agent(
    role="Researcher",
    goal="Gather accurate information and supporting details",
    backstory="Meticulous analyst with a strong research background"
)

validator = Agent(
    role="Validator",
    goal="Check correctness, risk, and policy compliance",
    backstory="Risk and compliance expert"
)

responder = Agent(
    role="Responder",
    goal="Produce a clear and concise final response",
    backstory="Excellent technical communicator"
)
```
Each agent can use different tools, models, or permissions. That flexibility becomes crucial later.
Using AWS Bedrock for Foundation Model Reasoning
For language reasoning, summarization, and planning, we rely on AWS Bedrock. It removes the operational burden of managing model infrastructure and integrates cleanly with IAM and VPC networking.
Here’s a simple helper function agents can use.
```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def bedrock_call(prompt: str) -> str:
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps({
            # Required field for Anthropic models on Bedrock
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 600
        })
    )
    payload = json.loads(response["body"].read())
    return payload["content"][0]["text"]
```
This function can now be wrapped as a tool and used by any agent during execution.
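In production, Bedrock calls get throttled and fail transiently, so it pays to wrap the helper before handing it to agents as a tool. A minimal retry sketch; `flaky_call` below is a hypothetical stand-in for `bedrock_call` so the example is self-contained:

```python
import time

def with_retries(fn, attempts=3, backoff=0.5):
    # Retry transient failures with exponential backoff before giving up.
    def wrapped(*args, **kwargs):
        for i in range(attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if i == attempts - 1:
                    raise
                time.sleep(backoff * (2 ** i))
    return wrapped

# Stand-in for bedrock_call: fails once, then succeeds.
calls = {"n": 0}
def flaky_call(prompt: str) -> str:
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("ThrottlingException")
    return "answer"

safe_call = with_retries(flaky_call)
result = safe_call("hello")
```

The same decorator applies unchanged to the SageMaker helper later in the article.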
Adding Deterministic Intelligence with SageMaker
Large language models are probabilistic. That’s fine for reasoning, but risky for things like scoring, classification, or prediction.
This is where SageMaker fits in.
Imagine a custom risk model trained on historical data. We deploy it as a real-time endpoint and call it from our agent.
```python
sagemaker = boto3.client("sagemaker-runtime")

def get_risk_score(features: dict) -> float:
    response = sagemaker.invoke_endpoint(
        EndpointName="risk-scoring-endpoint",
        ContentType="application/json",
        Body=json.dumps(features)
    )
    result = json.loads(response["Body"].read())
    return result["risk_score"]
```
Now our system combines:
- LLM reasoning (Bedrock)
- Deterministic ML predictions (SageMaker)
This hybrid approach is far more robust than LLM-only designs.
Wiring Everything Together with LangGraph
With agents and tools defined, we assemble the reasoning workflow.
```python
from langgraph.graph import StateGraph, END

graph = StateGraph(AgentState)

graph.add_node("plan", planner.run)
graph.add_node("research", researcher.run)
graph.add_node("validate", validator.run)
graph.add_node("respond", responder.run)

graph.add_edge("plan", "research")
graph.add_edge("research", "validate")
graph.add_edge("validate", "respond")
graph.add_edge("respond", END)

graph.set_entry_point("plan")
agent_app = graph.compile()
```
This graph enforces discipline. Planning must happen before research. Validation must happen before response. If something fails, we know exactly where.
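For a linear pipeline like this one, the execution semantics boil down to walking an edge map and merging each node's partial update into the state. A toy stdlib interpreter makes that concrete (a sketch of the idea, not LangGraph's actual implementation):

```python
def run_graph(state, nodes, edges, entry):
    # Walk the edge map from the entry point, merging each node's
    # partial update into the shared state until there is no next node.
    current = entry
    while current is not None:
        state = {**state, **nodes[current](state)}
        current = edges.get(current)
    return state

# Stub nodes standing in for the four agents above.
nodes = {
    "plan":     lambda s: {"plan": "steps"},
    "research": lambda s: {"research_notes": ["note"]},
    "validate": lambda s: {"risk_score": 0.18},
    "respond":  lambda s: {"final_answer": "done"},
}
edges = {"plan": "research", "research": "validate", "validate": "respond"}

result = run_graph({"user_query": "q"}, nodes, edges, "plan")
```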
Deploying the Agent as a Service on Amazon EKS
Everything we’ve built runs inside containers deployed to Amazon EKS. Each agent system becomes just another microservice.
This gives us predictable scaling, rolling deployments, health checks, and isolation. If demand spikes, Kubernetes scales. If a bad release happens, we roll back.
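A minimal Deployment manifest gives a feel for what "just another microservice" means here. All names, the image URI, and the probe path are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent-service          # hypothetical service name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: agent-service
  template:
    metadata:
      labels:
        app: agent-service
    spec:
      containers:
        - name: agent
          image: <account-id>.dkr.ecr.us-east-1.amazonaws.com/agent-service:latest
          ports:
            - containerPort: 8080
          readinessProbe:        # keeps traffic off pods that are still warming up
            httpGet:
              path: /healthz
              port: 8080
```

Pair this with a HorizontalPodAutoscaler and the "if demand spikes, Kubernetes scales" claim becomes declarative configuration rather than an aspiration.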
From the outside, it behaves like a normal API.
Exposing the Agent Through an API
Using Amazon API Gateway and an Application Load Balancer, we expose a single endpoint.
A client sends:
```json
{
  "query": "Analyze compliance risks in this policy document"
}
```
The system responds with:
```json
{
  "plan": "...",
  "risk_score": 0.18,
  "final_answer": "..."
}
```
Any application — web, mobile, backend, or partner system — can now consume agentic intelligence through a clean interface.
Making It Production-Grade
This is where AWS shines. We use DynamoDB for state persistence, S3 for documents and artifacts, Secrets Manager for credentials, CloudWatch and X-Ray for observability, and IAM for fine-grained access control.
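State persistence is the piece worth sketching: if each graph step checkpoints under the request ID, a crashed pod can resume mid-flow instead of restarting from scratch. An in-memory dict stands in for the DynamoDB table here; the table name in the comment and the key schema are hypothetical:

```python
import json

_table = {}  # in-memory stand-in for a DynamoDB table keyed by request_id

def checkpoint(request_id: str, step: str, state: dict) -> None:
    # In production: dynamodb.put_item(TableName="agent-state", Item=...)
    _table[request_id] = {"step": step, "state": json.dumps(state)}

def resume(request_id: str):
    # Returns (last completed step, saved state), or (None, None) if unknown.
    item = _table.get(request_id)
    if item is None:
        return None, None
    return item["step"], json.loads(item["state"])

checkpoint("req-123", "validate", {"plan": "steps", "risk_score": 0.18})
step, saved = resume("req-123")
```

Serializing through JSON also forces the state to stay plain data, which keeps the `AgentState` contract honest.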
Conclusion
Agentic AI is not about building smarter prompts. It's about designing systems that think in steps, collaborate in roles, and operate under constraints. By combining LangGraph, CrewAI, AWS Bedrock, SageMaker, and EKS, we can build AI systems that integrate with enterprise systems and handle complex workflow problems reliably.