Elena Revicheva

Posted on • Originally published at aideazz.xyz

What Is an AI Agent: A Builder's Definition from Production

Originally published on AIdeazz — cross-posted here with canonical link.

Everyone's building "AI agents" now. Most are glorified chatbots with API calls. Here's what an actual agent looks like when you're running dozens in production, serving thousands of users daily across Telegram and WhatsApp.

The Production Definition: Observe → Decide → Act → Persist

An AI agent is software that maintains state across interactions while autonomously executing multi-step workflows. Not "chat + function calling." Not "GPT wrapper with memory."

The core loop:

  • Observe: Ingest data from multiple sources (webhooks, APIs, user messages, scheduled triggers)
  • Decide: Route to appropriate models based on task complexity and cost constraints
  • Act: Execute workflows spanning multiple systems and services
  • Persist: Maintain context and state across sessions, users, and time
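The loop above can be sketched as a minimal skeleton. This is an illustration of the pattern, not code from any deployed system; `observe`, `decide`, `act`, and `persist` are placeholders you would wire to your own webhooks, router, executors, and storage:

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Persisted context the agent carries across loop iterations."""
    history: list = field(default_factory=list)

def run_agent_loop(observe, decide, act, persist, state, max_iterations=100):
    """One Observe -> Decide -> Act -> Persist cycle per iteration;
    stops when no new events arrive or the iteration cap is hit."""
    for _ in range(max_iterations):
        events = observe()            # webhooks, messages, scheduled triggers
        if not events:
            break
        plan = decide(events, state)  # choose model/workflow per event
        results = act(plan)           # execute across systems and services
        state.history.extend(results)
        persist(state)                # write state back to durable storage
    return state
```

The iteration cap matters: it is the simplest guard against a runaway loop.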

My WhatsApp invoice processor doesn't just extract data—it maintains conversation state across days, remembers user preferences, autonomously retries failed OCR attempts, and escalates to human review when confidence drops below thresholds. That's an agent.

Architecture Reality: What Production Agents Actually Need

Running agents at scale on Oracle Cloud Infrastructure taught me the non-negotiables:

State Management

Agents die without proper state persistence. We use Oracle Autonomous Database for conversation history, user context, and workflow state. Redis handles session caching. Every agent maintains:

  • Conversation memory (last 10-50 messages depending on use case)
  • User profile data (preferences, permissions, usage patterns)
  • Workflow state machines (current step, pending actions, retry counts)
  • Cross-session context (previous interactions, learned patterns)
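As a rough sketch of the shape that state takes (not our actual schema; field names are illustrative), a single record per conversation covers the first three items, with the memory window enforced on write:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ConversationState:
    user_id: str
    messages: list = field(default_factory=list)   # conversation memory
    profile: dict = field(default_factory=dict)    # preferences, permissions
    workflow_step: str = "idle"                    # state machine position
    retry_count: int = 0
    memory_limit: int = 50                         # 10-50 depending on use case

    def remember(self, message: str) -> None:
        """Append a message, keeping only the most recent window."""
        self.messages.append(message)
        self.messages = self.messages[-self.memory_limit:]

    def to_json(self) -> str:
        """Serialize for a relational row or key-value entry."""
        return json.dumps(asdict(self))
```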

Model Routing

Single-model agents fail in production. We route dynamically:

  • Groq's Llama 70B for high-volume, low-latency tasks (€0.0008/1K tokens)
  • Claude 3.5 Sonnet for complex reasoning and code generation
  • GPT-4 Vision for document processing when OCR confidence is low
  • Local Whisper models for voice transcription

The router itself is a lightweight classifier that considers: token cost, latency requirements, task complexity, and current queue depths.
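A toy version of such a router can be a handful of rules. The thresholds and model names below are illustrative, not the production classifier:

```python
def route_model(task_type: str, ocr_confidence: float = 1.0,
                est_tokens: int = 500) -> str:
    """Pick a model from cheap-and-fast by default, escalating on
    complexity signals. Thresholds here are placeholders."""
    if ocr_confidence < 0.8:
        return "gpt-4-vision"          # documents the OCR pass isn't sure about
    if task_type in {"reasoning", "codegen"} or est_tokens > 4000:
        return "claude-3.5-sonnet"     # complex work justifies the cost
    return "llama-70b-groq"            # high-volume, low-latency default
```

A real router would also weigh current queue depths and per-workflow budgets, but the structure stays the same: cheap default, explicit escalation rules.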

Error Handling

Agents fail constantly. Network timeouts, model hallucinations, API rate limits, malformed responses. Production agents need:

  • Exponential backoff with jitter for API retries
  • Fallback models for critical paths
  • Human escalation workflows
  • Graceful degradation (partial results > no results)
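The first item, exponential backoff with jitter, fits in a few lines. This is a generic full-jitter retry sketch, not code lifted from our stack:

```python
import random
import time

def retry_with_backoff(fn, attempts=4, base=0.5, cap=8.0, sleep=time.sleep):
    """Retry fn on any exception, sleeping a random interval drawn from
    U(0, min(cap, base * 2^n)) between tries; re-raise on final failure."""
    for n in range(attempts):
        try:
            return fn()
        except Exception:
            if n == attempts - 1:
                raise
            sleep(random.uniform(0, min(cap, base * 2 ** n)))
```

The jitter is what keeps a fleet of agents from retrying in lockstep and hammering a recovering API at the same instant.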

The Difference: Agents vs Chat Wrappers

Most "AI agents" are chat interfaces with function calling. Here's how to spot the difference:

Chat Wrapper Characteristics:

  • Stateless between conversations
  • Single model dependency
  • Synchronous request/response only
  • No autonomous execution
  • Context limited to current session

True Agent Characteristics:

  • Maintains state across weeks/months
  • Multi-model orchestration
  • Asynchronous, event-driven execution
  • Autonomous decision-making within bounds
  • Context spans users, sessions, and systems

My logistics coordination agent doesn't wait for commands. It monitors shipment webhooks, detects delays, calculates impact across supply chains, notifies affected parties, and suggests rerouting options—all before a human notices the delay.

Multi-Agent Coordination: The Next Complexity Level

Single agents are straightforward. Multi-agent systems are where architecture decisions compound.

In our Oracle Cloud setup, agents communicate through:

  • Message queues (Oracle Streaming) for async coordination
  • Shared state stores for collaborative workflows
  • Event buses for system-wide notifications
  • Consensus protocols for conflicting decisions
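The publish/subscribe shape behind the first and third items can be shown with an in-process stand-in for the streaming service (a real deployment would use a durable queue, not `queue.Queue`):

```python
import queue

class EventBus:
    """In-process stand-in for a message queue: agents publish events
    to a topic; every subscriber gets its own copy asynchronously."""
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic: str) -> "queue.Queue":
        q = queue.Queue()
        self.subscribers.setdefault(topic, []).append(q)
        return q

    def publish(self, topic: str, event: dict) -> None:
        for q in self.subscribers.get(topic, []):
            q.put(event)
```

The key property is decoupling: the publisher never knows which agents are listening, so you can add a QA agent to a topic without touching the intake agent.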

Example: Our customer service system runs three agent types:

  1. Intake agents: Classify requests, extract entities, route to specialists
  2. Specialist agents: Domain-specific problem solvers (billing, technical, logistics)
  3. QA agents: Monitor other agents' responses, flag anomalies, trigger retraining

They share context through a distributed cache but maintain separate decision loops. When the billing agent needs shipping data, it queries the logistics agent through our internal API—not direct database access.

Building Your First Real Agent

Skip the tutorials building "email summarizers." Here's a production-worthy starting point:

Document Processing Agent (what we deploy for SMB clients):

  1. Monitor email inbox or cloud folder
  2. Classify document type (invoice, PO, contract)
  3. Extract structured data using appropriate model
  4. Validate against business rules
  5. Push to ERP/accounting system
  6. Handle exceptions with human escalation
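The six steps compose into one pipeline function. The version below is a skeleton with the stages passed in as callables (names are illustrative); the point is that every failure path lands in escalation, never in silence:

```python
def process_document(doc, classify, extract, validate, push, escalate):
    """Run classify -> extract -> validate -> push; any validation
    failure or exception routes the document to human review."""
    try:
        doc_type = classify(doc)
        data = extract(doc, doc_type)
        if not validate(data):
            return escalate(doc, reason="validation_failed")
        return push(data)
    except Exception as exc:
        return escalate(doc, reason=str(exc))
```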

Technical stack:

  • Python + FastAPI for agent service
  • Celery + Redis for task queue
  • PostgreSQL for state persistence
  • Webhook endpoints for email/cloud triggers
  • Docker + Kubernetes for deployment

Cost reality: Running this for 1,000 documents/month:

  • Infrastructure: ~€50 (Oracle Cloud Always Free tier covers most)
  • Model API costs: €20-80 depending on document complexity
  • Development: 2-3 weeks for MVP, 2-3 months for production-ready

Common Failure Modes I've Seen

1. Infinite Loops

Agent decides to call itself recursively. Solution: Implement call stack depth limits and circuit breakers.
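A depth limit can be as simple as a context manager wrapped around every agent invocation (a sketch of the idea, not production code):

```python
class RecursionGuard:
    """Circuit breaker against runaway self-calls: refuses to enter
    once the nested call depth would exceed a fixed limit."""
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth
        self.depth = 0

    def __enter__(self):
        if self.depth >= self.max_depth:
            raise RuntimeError("call depth limit exceeded")
        self.depth += 1
        return self

    def __exit__(self, *exc):
        self.depth -= 1
        return False  # never swallow exceptions
```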

2. Context Pollution

Agent accumulates irrelevant context over time, degrading performance. Solution: Sliding window memory with relevance scoring.
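A minimal version of relevance-scored pruning, with the scoring function left abstract (in practice it might be embedding similarity to the current task):

```python
def prune_context(messages, scorer, window=10):
    """Keep the `window` highest-relevance messages, preserving their
    original order so the conversation still reads chronologically."""
    if len(messages) <= window:
        return list(messages)
    ranked = sorted(range(len(messages)),
                    key=lambda i: scorer(messages[i]), reverse=True)
    keep = sorted(ranked[:window])
    return [messages[i] for i in keep]
```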

3. Cost Explosion

Agent makes unnecessary API calls or uses expensive models for simple tasks. Solution: Implement cost tracking per workflow and model routing logic.
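Per-workflow cost tracking is mostly bookkeeping plus a hard cap. Prices below are illustrative (only the Groq figure appears earlier in this post; the Sonnet figure is a placeholder):

```python
from collections import defaultdict

# Illustrative EUR prices per 1K tokens; Sonnet figure is a placeholder.
PRICE_PER_1K = {"llama-70b-groq": 0.0008, "claude-3.5-sonnet": 0.003}

class CostTracker:
    """Per-workflow spend accounting with a hard budget cap: a call
    that would blow the budget is rejected before it is made."""
    def __init__(self, budget_per_workflow: float):
        self.budget = budget_per_workflow
        self.spent = defaultdict(float)

    def record(self, workflow: str, model: str, tokens: int) -> float:
        cost = tokens / 1000 * PRICE_PER_1K[model]
        if self.spent[workflow] + cost > self.budget:
            raise RuntimeError(f"budget exceeded for workflow {workflow!r}")
        self.spent[workflow] += cost
        return cost
```

Checking the budget before the call, not after, is the difference between a rejected request and a surprise invoice.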

4. Hallucinated Actions

Agent invents functions or parameters that don't exist. Solution: Strict action validation against predefined schemas.
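Schema validation before execution can look like this sketch (in production you would likely use JSON Schema or Pydantic; the `send_invoice` action here is hypothetical):

```python
# Registry of allowed actions and their required parameter types.
ACTION_SCHEMAS = {
    "send_invoice": {"recipient": str, "amount": float},
}

def validate_action(name: str, params: dict) -> bool:
    """Reject model-invented actions or parameters before execution."""
    schema = ACTION_SCHEMAS.get(name)
    if schema is None:
        raise ValueError(f"unknown action: {name}")
    unknown = set(params) - set(schema)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for key, typ in schema.items():
        if key not in params or not isinstance(params[key], typ):
            raise ValueError(f"bad or missing parameter: {key}")
    return True
```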

5. State Corruption

Concurrent updates corrupt agent state. Solution: Implement proper locking mechanisms and state versioning.
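Versioning plus rejection of stale writes is the optimistic-locking pattern; a toy in-memory version of the idea (a real store would do this in the database with a version column):

```python
class VersionedStore:
    """Optimistic concurrency: every write must cite the version it
    read; stale writes are rejected instead of silently clobbering
    another agent's update."""
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current, _ = self.data.get(key, (0, None))
        if current != expected_version:
            raise RuntimeError("stale write: state changed since read")
        self.data[key] = (current + 1, value)
```

On a stale-write error the agent re-reads, re-applies its change, and retries, which is cheaper than holding locks across LLM calls that take seconds.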

The Business Reality

After building agents for logistics companies, government contractors, and financial services, the pattern is clear: successful agents handle narrow, well-defined workflows with clear success metrics.

My most successful deployment? A procurement agent that:

  • Monitors supplier catalogs for price changes
  • Compares against contract terms
  • Flags discrepancies to procurement team
  • Generates monthly savings reports

Boring? Yes. Valuable? It saves one client €200K annually in overcharges.

The failures? Always overambitious "general purpose" agents that tried to handle everything. Constraint is a feature, not a bug.

Moving Beyond Demos

The gap between demo and production agents is massive. Demo agents work on happy paths with perfect inputs. Production agents handle:

  • Malformed data (PDFs with tables, handwritten notes, mixed languages)
  • Intermittent failures (network issues, service outages, rate limits)
  • Adversarial inputs (prompt injections, deliberately confusing requests)
  • Scale (thousands of concurrent conversations, millions of state transitions)
  • Compliance (audit trails, data residency, privacy regulations)

If you're building agents, start narrow. Pick one workflow. Handle every edge case. Add monitoring. Then expand.

The market's flooded with "AI agent frameworks" that abstract away the complexity. They work for demos. For production, you need to understand the plumbing: state management, error handling, cost control, and operational monitoring.

That's what an AI agent actually is—not a chatbot with memory, but an autonomous system that observes, decides, acts, and persists, handling real workflows with real constraints.

Frequently Asked Questions

Q: What's the minimum viable agent architecture for a startup?
A: Single Python service with FastAPI, PostgreSQL for state, Redis for caching, and Celery for async tasks. Deploy on a single VM initially. This handles 1,000s of daily interactions before needing horizontal scaling.

Q: How do you prevent agents from making costly mistakes autonomously?
A: Implement approval workflows for high-risk actions, set spending limits per time period, and use canary deployments where agents operate in shadow mode before getting write permissions. We also log every decision for audit trails.

Q: Should I use LangChain/similar frameworks or build from scratch?
A: Start with frameworks for prototypes, but expect to replace components as you scale. Most frameworks optimize for flexibility over production concerns like cost control and error handling. We use LangChain for experiments, custom code for production.

Q: What's the typical latency for multi-step agent workflows?
A: Depends on complexity, but our invoice processing agent averages 8-12 seconds for: receive document → OCR → extract data → validate → update database → send confirmation. User-facing chat agents target sub-2 second responses by preprocessing and caching common paths.

Q: How do you handle agent testing and deployment?
A: Three-stage pipeline: (1) Unit tests for individual components with mocked LLM responses, (2) Integration tests using recorded real interactions, (3) Shadow mode in production for 48-72 hours before full deployment. Rollback is one-click through feature flags.

— Elena Revicheva · AIdeazz · Portfolio
