DEV Community

Tilak Raj

Building AI Agents That Actually Work in Production: My Technical Approach

Building an AI agent that works in a demo is easy. Building one that works reliably in production is a completely different engineering challenge.

Production systems must handle real users, real data, and real consequences when things fail.

This is the production agent architecture I use across Brainfy AI and Navlyt, along with real code patterns and failure modes I design around.


What Makes Production Agents Different From Demo Agents

Demo agents optimize for the happy path.

Production agents must handle:

  • Real data variance
    Production inputs are messy, ambiguous, and full of edge cases.

  • Concurrent executions
    Multiple agent instances running simultaneously with shared state.

  • Long-running tasks
    Agents may run for minutes or hours, requiring durable execution state.

  • Cost management
    Confused agents making unnecessary tool calls can become expensive quickly.

  • Observability
    You must understand exactly what the agent decided and why.


The Core Architecture: Durable Agent State

The most important production decision:

Keep agent state in a database — not in memory.

In-memory state:

  • Dies with the server
  • Cannot scale horizontally
  • Cannot be audited

Database state:

  • Survives restarts
  • Enables horizontal scaling
  • Provides observability
  • Enables debugging

Example schema:

```sql
-- Agent execution state table
CREATE TABLE agent_executions (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  user_id UUID REFERENCES auth.users NOT NULL,
  agent_type TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'pending',
  CONSTRAINT valid_status CHECK (
    status IN (
      'pending',
      'running',
      'completed',
      'failed',
      'cancelled',
      'awaiting_review'
    )
  ),
  input_data JSONB NOT NULL,
  state JSONB DEFAULT '{}',
  result JSONB,
  error TEXT,
  step_count INTEGER DEFAULT 0,
  token_count INTEGER DEFAULT 0,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  completed_at TIMESTAMPTZ
);

-- Tool call log for observability
CREATE TABLE agent_tool_calls (
  id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
  execution_id UUID REFERENCES agent_executions NOT NULL,
  step_number INTEGER NOT NULL,
  tool_name TEXT NOT NULL,
  tool_input JSONB NOT NULL,
  tool_output JSONB,
  status TEXT NOT NULL DEFAULT 'pending',
  latency_ms INTEGER,
  error TEXT,
  called_at TIMESTAMPTZ DEFAULT NOW()
);
```
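
One thing the `valid_status` CHECK constraint does not enforce is *which* transitions between statuses are legal. A minimal sketch of enforcing that in application code — the transition map and the `canTransition` name are my own assumptions, not part of the schema:

```typescript
// Statuses mirror the valid_status CHECK constraint above.
type AgentStatus =
  | 'pending' | 'running' | 'completed'
  | 'failed' | 'cancelled' | 'awaiting_review'

// Assumed workflow: terminal states allow no further transitions,
// and awaiting_review can resume or be cancelled. Adjust to taste.
const ALLOWED_TRANSITIONS: Record<AgentStatus, AgentStatus[]> = {
  pending: ['running', 'cancelled'],
  running: ['completed', 'failed', 'cancelled', 'awaiting_review'],
  awaiting_review: ['running', 'cancelled'],
  completed: [],
  failed: [],
  cancelled: []
}

function canTransition(from: AgentStatus, to: AgentStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to)
}
```

Checking transitions before every status update catches bugs like a worker marking a cancelled execution as running.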

The Agent Loop With Production Safeguards

Production agents need hard limits.

Example safeguards:

  • Step limits
  • Token limits
  • Timeout limits
  • Failure conditions

Example TypeScript loop:

```typescript
// lib/agents/production-agent.ts
// Helpers (loadExecution, updateStatus, failWithReason, buildMessages,
// withTimeout, callModel, persistState) are elided for brevity.

const AGENT_LIMITS = {
  maxSteps: 25,
  maxTokens: 50_000,
  stepTimeoutMs: 30_000,
  totalTimeoutMs: 300_000
}

export async function runAgent(
  executionId: string,
  supabase: SupabaseClient
): Promise<void> {
  const startTime = Date.now()

  const execution = await loadExecution(executionId, supabase)
  const messages = buildMessages(execution.input_data)

  await updateStatus(executionId, 'running', supabase)

  while (true) {
    const elapsed = Date.now() - startTime

    if (execution.step_count >= AGENT_LIMITS.maxSteps) {
      await failWithReason(executionId, 'MAX_STEPS_EXCEEDED', supabase)
      return
    }

    if (execution.token_count >= AGENT_LIMITS.maxTokens) {
      await failWithReason(executionId, 'MAX_TOKENS_EXCEEDED', supabase)
      return
    }

    if (elapsed >= AGENT_LIMITS.totalTimeoutMs) {
      await failWithReason(executionId, 'TOTAL_TIMEOUT', supabase)
      return
    }

    // Each individual model call is bounded by stepTimeoutMs
    const response = await withTimeout(
      callModel(messages, TOOLS),
      AGENT_LIMITS.stepTimeoutMs
    )

    execution.step_count++
    execution.token_count += response.usage?.total_tokens ?? 0

    await persistState(executionId, execution, supabase)

    // No tool calls means the model has produced a final answer
    if (!response.tool_calls?.length) {
      await updateStatus(executionId, 'completed', supabase)
      return
    }
  }
}
```
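
The guard clauses in the loop are easy to test if you pull them out into a pure function. A sketch of that refactor — `checkLimits` and `LimitState` are names I'm introducing, not part of the original code:

```typescript
// Pure limit check: returns a failure reason, or null if within limits.
interface LimitState {
  stepCount: number
  tokenCount: number
  elapsedMs: number
}

const LIMITS = {
  maxSteps: 25,
  maxTokens: 50_000,
  totalTimeoutMs: 300_000
}

function checkLimits(s: LimitState): string | null {
  if (s.stepCount >= LIMITS.maxSteps) return 'MAX_STEPS_EXCEEDED'
  if (s.tokenCount >= LIMITS.maxTokens) return 'MAX_TOKENS_EXCEEDED'
  if (s.elapsedMs >= LIMITS.totalTimeoutMs) return 'TOTAL_TIMEOUT'
  return null
}
```

The loop body then reduces to one call: fail with the returned reason if it is non-null, otherwise continue.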

The Human-in-the-Loop Gate

For actions that are difficult to reverse, I require human approval.

The agent:

  • Prepares the action
  • Sets status to awaiting_review
  • Stops execution
  • Waits for approval

Example:

```typescript
const APPROVAL_REQUIRED_TOOLS = [
  'send_email',
  'update_customer_record',
  'generate_compliance_document',
  'submit_to_regulator'
]

async function executeToolCall(
  toolCall: ToolCall,
  executionId: string,
  supabase: SupabaseClient
) {
  const { name, args } = toolCall

  if (APPROVAL_REQUIRED_TOOLS.includes(name)) {
    await updateStatus(executionId, 'awaiting_review', supabase)
    throw new AgentPausedError('Human approval required')
  }

  return await callTool(name, args)
}
```
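
The other half of the gate is resuming after a reviewer decides. A minimal sketch of the decision logic — `ReviewDecision` and `resolveReview` are illustrative names, not part of the original code:

```typescript
// Map a reviewer's decision to the next execution status and whether
// the pending tool call should actually run.
type ReviewDecision = 'approve' | 'reject'

function resolveReview(decision: ReviewDecision): {
  nextStatus: 'running' | 'cancelled'
  executeTool: boolean
} {
  return decision === 'approve'
    ? { nextStatus: 'running', executeTool: true }
    : { nextStatus: 'cancelled', executeTool: false }
}
```

On approval, a worker re-enters the agent loop, executes the held tool call, and continues; on rejection, the execution terminates without side effects.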

Monitoring: What I Track in Production

Metrics I monitor:

  • Step efficiency
  • Tool success rate
  • Human review escalation rate
  • Token cost per completion
  • Completion rate

Example health query:

```typescript
const { data } = await supabase.rpc('agent_health_metrics', {
  agent_type: 'compliance_document_generator',
  since: new Date(
    Date.now() - 7 * 24 * 60 * 60 * 1000
  ).toISOString()
})
```
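
If you'd rather skip a database RPC, the same metrics can be computed client-side from raw `agent_executions` rows. A sketch under that assumption (the `healthMetrics` function and `ExecutionRow` shape are mine):

```typescript
// Row shape mirrors the relevant columns of agent_executions.
interface ExecutionRow {
  status: string
  step_count: number
  token_count: number
}

function healthMetrics(rows: ExecutionRow[]) {
  const total = rows.length
  if (total === 0) {
    return { completionRate: 0, reviewRate: 0, avgSteps: 0 }
  }
  const completed = rows.filter(r => r.status === 'completed').length
  const inReview = rows.filter(r => r.status === 'awaiting_review').length
  const avgSteps = rows.reduce((sum, r) => sum + r.step_count, 0) / total
  return {
    completionRate: completed / total,
    reviewRate: inReview / total,
    avgSteps
  }
}
```

For large tables the RPC is the better choice, since it aggregates in the database instead of shipping every row to the client.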

Typical results:

  • Completion rate: 94%
  • Avg steps: 8.3
  • Human review rate: 3.1%

Key Lessons

Production agents require:

  • Durable state
  • Hard execution limits
  • Observability
  • Cost controls
  • Human approval gates

Most failures come from missing safeguards, not model quality.


About the Author

Tilak Raj
Founder & CEO — Brainfy AI

Building vertical AI SaaS across compliance, real estate, agriculture, and aviation.

Website: https://www.tilakraj.info

Projects: https://www.tilakraj.info/projects


Questions about production agents? Drop a comment — I reply to all of them.
