Seven Lambda functions. Seven isolated AI tools that couldn't talk to each other.
Each function worked perfectly alone - document summarization, content extraction, classification. But when I tried to chain them together for complex workflows, the entire architecture crumbled. Timeouts cascaded. State management became impossible. Users got no feedback until everything finished or failed.
I learned the hard way that Lambdas are tools, not orchestrators.
The Lambda Limitation Wall
Here's what I was trying to build: an AI research agent that could take a topic, find relevant documents, extract key information, summarize findings, and present a coherent report. Simple enough, right?
Wrong. With Lambda functions, this became a nightmare of nested API calls, state management, and timeout issues. Each function was capped at Lambda's hard 15-minute timeout, but my research workflow needed 45 minutes for complex topics. And I had no way to maintain conversation context or stream partial results back to users.
The breaking point came when a user asked: "Can you research climate change impacts on agriculture and give me updates as you find things?"
With Lambdas, my answer was: "No, wait 15 minutes and I'll dump everything on you at once."
That's when I knew I needed a different approach.
Enter ECS Fargate: The Orchestrator
ECS Fargate solved every problem I had with Lambda orchestration:
- No timeout limits - tasks can run for hours if needed
- Streaming capabilities - real-time updates to users
- Persistent connections - maintain state across tool calls
- Still serverless-ish - pay per second, auto-scaling
The key realization: use Lambda functions as specialized tools, but orchestrate them with ECS Fargate running AI agents.
The ReAct Pattern: Think, Act, Observe, Repeat
The breakthrough came when I implemented the ReAct pattern (Reasoning and Acting). It's how humans solve complex problems:
- Think - analyze the current situation and plan next steps
- Act - use a tool to gather information or perform an action
- Observe - examine the results and update understanding
- Repeat - continue until the goal is achieved
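Before the class itself, here's a hedged sketch of the supporting types it imports from `./types`. These aren't shown in the excerpt below, so the exact shapes are my assumptions, reverse-engineered from how the Agent uses them:

```typescript
// Assumed shapes for the ./types module referenced by the Agent class.
// Reconstructed from usage - the real definitions may differ.
export interface ToolResult {
  success: boolean;
  data: unknown;
  error: string | null;
}

export interface ToolCall {
  id: string;
  toolName: string;
  parameters: Record<string, unknown>;
  timestamp: Date;
}

export interface Tool {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
  execute(params: Record<string, unknown>): Promise<ToolResult>;
}

export interface AgentMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
  toolCall?: ToolCall;      // present on assistant messages that invoked a tool
  toolResult?: ToolResult;  // present on tool messages
  timestamp: Date;
}

export interface AgentMemory {
  addMessage(msg: AgentMessage): Promise<void>;
  getHistory(): Promise<AgentMessage[]>;
}

export interface AgentConfig {
  name: string;
  description: string;
  model: string;
  maxIterations?: number;
  tools: Tool[];
  memory: AgentMemory;
  onToolCall?: (call: ToolCall) => Promise<boolean>; // human-approval hook
}

export interface AgentResult {
  success: boolean;
  output: string;
  toolCalls: ToolCall[];
  iterations: number;
  error?: string;
}
```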
Here's my implementation of the Agent class using the ReAct pattern:
```typescript
import type { AIGateway } from '@ai-platform-aws/sdk';
import type { Tool, ToolCall, ToolResult, AgentConfig, AgentResult } from './types';

export class Agent {
  private config: AgentConfig;
  private gateway: AIGateway;

  constructor(config: AgentConfig, gateway: AIGateway) {
    this.config = config;
    this.gateway = gateway;
  }

  async run(task: string): Promise<AgentResult> {
    const maxIterations = this.config.maxIterations || 10;
    const toolCalls: ToolCall[] = [];
    let iterations = 0;

    const systemPrompt = this.buildSystemPrompt();
    const memory = this.config.memory;

    // Add the user task to memory
    await memory.addMessage({
      role: 'user',
      content: task,
      timestamp: new Date()
    });

    while (iterations < maxIterations) {
      iterations++;

      // Get the conversation history and format it for the LLM
      const history = await memory.getHistory();
      const messages = this.formatMessages(systemPrompt, history);

      // Call the LLM for the reasoning step
      const response = await this.gateway.complete({
        model: this.config.model,
        messages
      });

      const content = response.content || '';

      // Check whether the response contains a tool call or a final answer
      const parsedToolCall = this.parseToolCall(content);

      if (!parsedToolCall) {
        // No tool call found - this is the final answer
        await memory.addMessage({
          role: 'assistant',
          content,
          timestamp: new Date()
        });

        return {
          success: true,
          output: this.extractFinalAnswer(content),
          toolCalls,
          iterations
        };
      }

      // Look up the requested tool
      const { toolName, parameters } = parsedToolCall;
      const tool = this.config.tools.find(t => t.name === toolName);

      if (!tool) {
        await memory.addMessage({
          role: 'tool',
          content: `Error: Tool "${toolName}" not found`,
          timestamp: new Date()
        });
        continue;
      }

      // Build the call record first, so the approval hook can inspect it
      const call: ToolCall = {
        id: `call_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
        toolName,
        parameters,
        timestamp: new Date()
      };

      // Human approval check for sensitive operations
      if (this.config.onToolCall) {
        const approved = await this.config.onToolCall(call);
        if (!approved) {
          await memory.addMessage({
            role: 'tool',
            content: `Tool call "${toolName}" rejected by human operator`,
            timestamp: new Date()
          });
          continue;
        }
      }

      // Execute the tool, recording the call even if execution fails
      let result: ToolResult;
      try {
        result = await tool.execute(parameters);
      } catch (err) {
        result = {
          success: false,
          data: null,
          error: `Tool execution error: ${(err as Error).message}`
        };
      }
      toolCalls.push(call);

      // Add both the thought and the tool result to memory
      await memory.addMessage({
        role: 'assistant',
        content,
        toolCall: call,
        timestamp: new Date()
      });

      await memory.addMessage({
        role: 'tool',
        content: JSON.stringify(result, null, 2),
        toolResult: result,
        timestamp: new Date()
      });
    }

    return {
      success: false,
      output: 'Agent reached maximum iterations without finding answer',
      toolCalls,
      iterations,
      error: `Max iterations (${maxIterations}) reached`
    };
  }

  private parseToolCall(content: string): { toolName: string; parameters: Record<string, unknown> } | null {
    // Look for the pattern: "Action: toolname\nAction Input: {...}"
    const actionMatch = content.match(/Action:\s*(\w+)\s*\n\s*Action Input:\s*(\{[\s\S]*?\})\s*$/m);
    if (actionMatch) {
      try {
        return {
          toolName: actionMatch[1],
          parameters: JSON.parse(actionMatch[2])
        };
      } catch {
        return null; // Malformed JSON - treat as no tool call
      }
    }
    return null;
  }

  private buildSystemPrompt(): string {
    const toolDescriptions = this.config.tools
      .map(t => `- **${t.name}**: ${t.description}\n  Parameters: ${JSON.stringify(t.parameters)}`)
      .join('\n');

    return `You are ${this.config.name}: ${this.config.description}

Available tools:
${toolDescriptions}

When you need to use a tool, respond with:
Thought: [your reasoning]
Action: [tool_name]
Action Input: [JSON parameters]

When you have enough information to answer, respond with:
Thought: [final reasoning]
Final Answer: [complete answer]`;
  }
}
```
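To make the text protocol concrete, here's what a tool-calling turn from the model looks like and how the `parseToolCall` regex picks it apart. The sample text is illustrative; note that the lazy `\{[\s\S]*?\}` stops at the first closing brace followed by a line end, so deeply nested JSON arguments would need a more robust parser:

```typescript
// A typical ReAct-formatted model response (illustrative sample)
const sample = [
  'Thought: I should search for recent documents first.',
  'Action: search_documents',
  'Action Input: {"query": "renewable energy trends 2024", "limit": 10}'
].join('\n');

// The same pattern the Agent uses in parseToolCall
const match = sample.match(/Action:\s*(\w+)\s*\n\s*Action Input:\s*(\{[\s\S]*?\})\s*$/m);

if (match) {
  const toolName = match[1];                // "search_documents"
  const parameters = JSON.parse(match[2]);  // { query: "...", limit: 10 }
  console.log(toolName, parameters.limit);  // search_documents 10
}
```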
Connecting Lambda Functions as Tools
The key move was turning my existing Lambda functions into agent tools. Each Lambda becomes a specialized capability the agent can invoke:
```typescript
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
import type { Tool, ToolResult } from './types';

export class LambdaTool implements Tool {
  constructor(
    public name: string,
    public description: string,
    public parameters: Record<string, unknown>,
    private lambdaClient: LambdaClient,
    private functionName: string
  ) {}

  async execute(params: Record<string, unknown>): Promise<ToolResult> {
    try {
      const command = new InvokeCommand({
        FunctionName: this.functionName,
        Payload: JSON.stringify(params)
      });

      const response = await this.lambdaClient.send(command);

      if (!response.Payload) {
        return { success: false, data: null, error: 'No response from Lambda' };
      }

      const result = JSON.parse(Buffer.from(response.Payload).toString());

      // Unhandled Lambda errors surface as an errorMessage field in the payload
      if (result.errorMessage) {
        return { success: false, data: null, error: result.errorMessage };
      }

      return { success: true, data: result, error: null };
    } catch (error) {
      return {
        success: false,
        data: null,
        error: `Lambda invocation failed: ${(error as Error).message}`
      };
    }
  }
}
```
```typescript
// Configure the agent with Lambda tools
const lambdaClient = new LambdaClient({ region: 'us-east-1' });

const agent = new Agent({
  name: 'ResearchAgent',
  description: 'Analyzes documents and extracts insights',
  model: 'gpt-4',
  maxIterations: 10,
  memory: conversationMemory, // any store implementing addMessage/getHistory
  tools: [
    new LambdaTool(
      'summarize_document',
      'Summarize document text content with specified length',
      { text: 'string', max_length: 'number' },
      lambdaClient,
      'ai-platform-summarize'
    ),
    new LambdaTool(
      'extract_entities',
      'Extract named entities from text',
      { text: 'string', entity_types: 'string[]' },
      lambdaClient,
      'ai-platform-extract-entities'
    ),
    new LambdaTool(
      'classify_content',
      'Classify text into predefined categories',
      { text: 'string', categories: 'string[]' },
      lambdaClient,
      'ai-platform-classify'
    )
  ]
}, gateway);
```
Real Example: Research Agent in Action
Here's what happens when I ask the agent to research renewable energy trends:
Iteration 1:
- Think: "I need to find recent information about renewable energy trends. Let me search for relevant documents first."
- Act: `search_documents({ query: "renewable energy trends 2024", limit: 10 })`
- Observe: Found 8 relevant documents

Iteration 2:
- Think: "I found several documents. Let me extract key information from the most recent one."
- Act: `extract_entities({ text: document_content, entity_types: ["technologies", "statistics", "companies"] })`
- Observe: Extracted solar, wind, and battery storage technologies with growth statistics

Iteration 3:
- Think: "Now I have some data points. Let me summarize the key findings and get more specific information about growth rates."
- Act: `summarize_document({ text: combined_content, max_length: 500 })`
- Observe: Generated a concise summary with key trends
Final Answer: Comprehensive report on renewable energy trends with specific data points and growth projections.
Lambda vs ECS: Side-by-Side Comparison
What this research task looks like with Lambda:
```typescript
// Impossible to implement cleanly
export const handler = async (event) => {
  // Step 1: Search (works)
  const documents = await searchLambda.invoke({...});

  // Step 2: Process each document (timeout risk)
  const summaries = [];
  for (const doc of documents) {
    const summary = await summarizeLambda.invoke({...});
    summaries.push(summary);
    // What if this takes 20 minutes total?
  }

  // Step 3: Synthesize (might not even get here)
  const final = await synthesizeLambda.invoke({...});

  // No streaming, no progress updates, all-or-nothing
  return final;
};
```
What it looks like with ECS:
```typescript
// Clean, maintainable, observable
export class ResearchAgent extends ReActAgent {
  async researchTopic(topic: string): Promise<AgentResult> {
    this.onProgress = (update) => this.streamToClient(update);
    return this.run(`Research ${topic} and provide comprehensive analysis`);
  }

  private streamToClient(update: string): void {
    // WebSocket or SSE to the client
    this.websocket.send(JSON.stringify({
      type: 'progress',
      message: update,
      timestamp: new Date().toISOString()
    }));
  }
}
```
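The `streamToClient` method assumes a socket already exists. If you'd rather expose the agent over plain HTTP, a minimal Server-Sent Events wrapper is enough. This is a sketch under my own assumptions - the `/research` route, the `sseFrame` helper, and the inline agent wiring are hypothetical, not part of the platform's actual code:

```typescript
import { createServer } from 'node:http';

// SSE frames are just "event:"/"data:" lines terminated by a blank line
export function sseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

const server = createServer((req, res) => {
  if (req.url?.startsWith('/research')) {
    res.writeHead(200, {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      Connection: 'keep-alive'
    });

    // Hypothetical wiring: pipe agent progress into the response stream
    // agent.onProgress = (update) => res.write(sseFrame('progress', { message: update }));
    // agent.run(topic).then(result => { res.write(sseFrame('result', result)); res.end(); });
    res.write(sseFrame('progress', { message: 'Starting research…' }));
    res.end();
  } else {
    res.writeHead(404);
    res.end();
  }
});

// server.listen(8080);
```

Browsers reconnect to SSE endpoints automatically, which makes this a forgiving default for long-running research tasks.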
Human-in-the-Loop for High-Stakes Actions
Not all actions should be automated. For high-stakes operations, I implemented a human approval pattern:
```typescript
class ApprovalTool implements Tool {
  name = 'request_approval';
  description = 'Request human approval for sensitive actions';
  parameters = { action: 'string', reasoning: 'string', risk_level: 'string' };

  // Inject whatever channel reaches a human (Slack, email, dashboard)
  constructor(private notificationService: NotificationService) {}

  async execute(params: any): Promise<any> {
    const approval = await this.notificationService.requestApproval({
      message: `Agent wants to: ${params.action}`,
      reasoning: params.reasoning,
      riskLevel: params.risk_level,
      timeout: 300000 // 5 minutes
    });

    if (!approval.approved) {
      throw new Error(`Action rejected: ${approval.reason}`);
    }

    return { approved: true, conditions: approval.conditions };
  }
}
```
When the agent wants to delete files, send emails, or make API calls with financial implications, it asks for permission first. The user gets a notification and can approve/reject with reasoning.
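A minimal in-memory version of that notification service can be sketched with a pending-promise map and a timeout race. The class name and shapes here are my own illustration, not the platform's actual service:

```typescript
type ApprovalDecision = { approved: boolean; reason?: string; conditions?: string[] };

// Hypothetical in-memory approval broker: the agent awaits a decision,
// and a human (via UI webhook, Slack action, etc.) resolves it by id.
export class InMemoryApprovalService {
  private pending = new Map<string, (decision: ApprovalDecision) => void>();

  requestApproval(id: string, timeoutMs: number): Promise<ApprovalDecision> {
    const decision = new Promise<ApprovalDecision>(resolve => {
      this.pending.set(id, resolve);
    });
    // If no human responds in time, fail closed: treat silence as rejection
    const timeout = new Promise<ApprovalDecision>(resolve =>
      setTimeout(() => resolve({ approved: false, reason: 'Approval timed out' }), timeoutMs)
    );
    return Promise.race([decision, timeout]);
  }

  // Called from the human-facing side (e.g. a webhook handler)
  decide(id: string, decision: ApprovalDecision): void {
    this.pending.get(id)?.(decision);
    this.pending.delete(id);
  }
}
```

Failing closed on timeout is the important design choice: an unanswered approval request should never silently become a yes.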
Performance and Cost Reality
After running this architecture for 3 months in production:
ECS Fargate costs:
- Average task: 1 vCPU, 2GB RAM
- Runtime: 5-15 minutes per complex query
- Cost: $0.04-0.12 per research task
- Monthly infrastructure: ~$25 for moderate usage
Lambda costs remained the same:
- $0.01-0.05 per tool invocation
- No change to individual function costs
- Better utilization since agents batch calls efficiently
Performance improvements:
- 40% reduction in total execution time (parallel tool calls)
- 90% improvement in user experience (streaming updates)
- Zero timeout failures
- 60% reduction in support tickets ("is it still working?")
The Architecture Today
My current setup runs 3 ECS tasks:
- Research Agent - multi-tool workflows like the example above
- Content Agent - writing, editing, formatting workflows
- Analysis Agent - data processing and reporting workflows
Each agent can call any of the 7 Lambda tools as needed. The agents scale independently based on demand, and I can deploy new tools without touching the orchestration logic.
What's Next
The complete code is available in my repositories:
- Agent implementation: ai-platform-aws/packages/agents
- Lambda tools example: 03-lambda-ai-tool
- ECS agent orchestrator: 04-ecs-agent-orchestrator
Next up: Part 6 covers the TypeScript SDK that makes this platform actually enjoyable for developers to use. No more raw HTTP calls or manual error handling.
Part 5 of 8 in the series "Building an AI Platform on AWS from Scratch". Everything I learned building ai-platform-aws - including the expensive mistakes.