Waqar Anjum
What Running 71 Production AI Agents Across 21 Industry Verticals Actually Looks Like

Most articles about AI agents describe what they could do. This one describes what they do — every day, in production, across accounting firms, mortgage brokerages, manufacturing floors, hospitality operations, logistics companies, and 16 other verticals where downtime means lost revenue.

At the time of writing, the system runs 71 production agents backed by 145 edge functions, 193 database tables, and a deployment methodology that puts new agents into live client operations in 30 days or less. This is not a prototype, not a demo, and not a pitch deck. This is the operational infrastructure behind TFSF Ventures, a venture architecture firm that deploys intelligent agent systems under a model where every client owns 100% of the code.

Here is what that architecture actually looks like, what broke along the way, and what we learned building it.

The Agent Stack: What Each Layer Does

The system has four layers. Every agent deployment touches all four, regardless of vertical.

Layer 1: Ingestion and Normalization

Agents that process inbound data — invoices, customer records, sensor readings, transaction logs — start at the ingestion layer. This is where the format problem lives. A mortgage brokerage sends rate lock confirmations as PDFs. A manufacturing client streams sensor data as JSON from PLCs. A staffing agency pushes candidate records through a SOAP API that was built in 2009. The ingestion layer normalizes all of it into a consistent schema before anything downstream touches it.

The agents at this layer handle format detection, field extraction, validation against known schemas, and exception routing. When an invoice comes in with a vendor ID that does not match any known supplier, the agent does not guess — it flags the exception, routes it to a human review queue, and continues processing the remaining batch. That exception handling pattern is non-negotiable. An agent that guesses on ambiguous data is worse than no agent at all.
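
The flag-and-continue pattern described above can be sketched roughly as follows. This is a minimal illustration, not the production code: `KNOWN_VENDORS`, `BatchResult`, and `process_batch` are hypothetical names, and the in-memory set stands in for whatever supplier lookup a real deployment uses.

```python
from dataclasses import dataclass, field

# Hypothetical supplier registry; a real deployment would query a database.
KNOWN_VENDORS = {"V-1001", "V-1002"}


@dataclass
class BatchResult:
    processed: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)


def process_batch(invoices: list) -> BatchResult:
    """Process every invoice; route ambiguous ones to review instead of guessing."""
    result = BatchResult()
    for invoice in invoices:
        vendor_id = invoice.get("vendor_id")
        if vendor_id not in KNOWN_VENDORS:
            # Flag the exception and keep going: one bad record must not stop the batch.
            result.review_queue.append(
                {"invoice": invoice, "reason": f"unknown vendor_id: {vendor_id!r}"}
            )
            continue
        result.processed.append(invoice)
    return result


batch = [
    {"id": "INV-1", "vendor_id": "V-1001", "total": 120.0},
    {"id": "INV-2", "vendor_id": "V-9999", "total": 75.5},
    {"id": "INV-3", "vendor_id": "V-1002", "total": 310.0},
]
result = process_batch(batch)
```

Here `INV-2` lands in the review queue with its reason attached, while `INV-1` and `INV-3` are processed as usual.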

Layer 2: Orchestration

The orchestration layer determines which agents fire, in what order, and with what data. A single client workflow — say, processing a new mortgage application — might involve seven agents working in sequence: document intake, identity verification, credit data pull, compliance check, rate calculation, disclosure generation, and notification dispatch. The orchestration layer manages the state machine that tracks where each application sits in that pipeline and handles retries when an upstream service times out.

State management is the hardest engineering problem at this layer. Each agent is stateless by design, but the workflow is inherently stateful. The orchestration layer holds the state in the database — 193 tables exist because every workflow stage, every agent decision, every exception, every retry gets persisted. If the system crashes mid-workflow, it resumes from the last committed state, not from the beginning. This is table stakes for production systems, but you would be surprised how many AI platforms lose state on restart.
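
One way to picture resume-from-last-committed-state is the sketch below. It makes several assumptions: an in-memory SQLite database stands in for the real database, the single `workflow_state` table and the `run_workflow` helper are hypothetical, and the stage names are taken from the mortgage example above.

```python
import sqlite3

STAGES = [
    "document_intake", "identity_verification", "credit_data_pull",
    "compliance_check", "rate_calculation", "disclosure_generation",
    "notification_dispatch",
]


def init_db(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflow_state "
        "(application_id TEXT PRIMARY KEY, next_stage INTEGER NOT NULL)"
    )
    conn.commit()


def run_workflow(conn, application_id, run_stage):
    """Run the remaining stages, committing progress after each one."""
    row = conn.execute(
        "SELECT next_stage FROM workflow_state WHERE application_id = ?",
        (application_id,),
    ).fetchone()
    next_stage = row[0] if row else 0
    for index in range(next_stage, len(STAGES)):
        # If this raises, the persisted state still points at this stage.
        run_stage(STAGES[index], application_id)
        conn.execute(
            "INSERT OR REPLACE INTO workflow_state VALUES (?, ?)",
            (application_id, index + 1),
        )
        conn.commit()


# Simulate a crash at the third stage, then a restart.
conn = sqlite3.connect(":memory:")
init_db(conn)
executed = []
crash_once = {"armed": True}


def run_stage(stage, application_id):
    if stage == "credit_data_pull" and crash_once["armed"]:
        crash_once["armed"] = False
        raise TimeoutError("simulated upstream timeout")
    executed.append(stage)


try:
    run_workflow(conn, "APP-42", run_stage)
except TimeoutError:
    pass
run_workflow(conn, "APP-42", run_stage)  # resumes at credit_data_pull
```

The second call does not repeat document intake or identity verification, because their completion was committed before the simulated timeout.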

Layer 3: Execution

This is where the LLM calls happen, but calling it the LLM layer misses the point. The execution layer wraps every model call in guardrails: input validation, output parsing, confidence thresholds, fallback chains, and cost controls. If Claude returns a response that fails the output schema validation, the agent retries with a modified prompt. If the retry fails, it falls back to a rules-based processor. If that fails, it routes to the exception queue. Three automated attempts (the original call, the modified-prompt retry, and the rules-based processor) run before a human ever sees it.
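
The fallback chain can be sketched along these lines. Everything here is an illustrative assumption: `call_model` and `rules_extract` are caller-supplied stand-ins for a real model client and rules engine, and the two-field schema is invented for the example.

```python
import json
from typing import Callable, Optional

REQUIRED_FIELDS = {"vendor", "total"}  # hypothetical output schema


def parse_output(raw: str) -> Optional[dict]:
    """Return the parsed dict if it matches the expected schema, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if isinstance(data, dict) and REQUIRED_FIELDS.issubset(data):
        return data
    return None


def extract_with_fallback(
    document: str,
    call_model: Callable[[str], str],
    rules_extract: Callable[[str], Optional[dict]],
    exception_queue: list,
) -> dict:
    prompts = [
        f"Extract vendor and total as JSON:\n{document}",
        # Modified prompt used only when the first response fails validation.
        f"Return ONLY a JSON object with keys vendor and total:\n{document}",
    ]
    for attempt, prompt in enumerate(prompts, start=1):
        parsed = parse_output(call_model(prompt))
        if parsed is not None:
            return {"source": f"model_attempt_{attempt}", "data": parsed}
    ruled = rules_extract(document)  # deterministic fallback
    if ruled is not None:
        return {"source": "rules", "data": ruled}
    exception_queue.append(document)  # last resort: human review
    return {"source": "exception_queue", "data": None}
```

Recording the `source` alongside the data makes it possible to see later how often each rung of the chain is actually used.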

Cost control is a real engineering concern at scale. When you are processing 10,000 invoices per month for a single client, and each invoice requires 3-4 model calls for extraction, classification, and validation, that is 30,000 to 40,000 model calls a month for one workflow, and the API costs compound. The execution layer tracks token usage per agent, per client, per workflow, and enforces budgets. A runaway prompt loop that burns $400 in API calls in 20 minutes is not a theoretical risk: it is a bug we caught in month two and built circuit breakers to prevent.
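
A budget-based circuit breaker might look something like the sketch below. The `TokenBudget` class, its limits, and the token counts are all hypothetical, and a real breaker would probably also track a time window, which is left out here for brevity.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a tripped budget blocks further model calls."""


class TokenBudget:
    """Tracks token usage per (client, workflow) and trips once a limit is reached."""

    def __init__(self, limits: dict):
        self.limits = limits  # {(client, workflow): max_tokens}
        self.used = {}
        self.tripped = set()

    def check(self, client: str, workflow: str) -> None:
        """Call before every model request; raises if the breaker is open."""
        if (client, workflow) in self.tripped:
            raise BudgetExceeded(f"budget exhausted for {client}/{workflow}")

    def record(self, client: str, workflow: str, tokens: int) -> None:
        """Call after every model request with the tokens it consumed."""
        key = (client, workflow)
        self.used[key] = self.used.get(key, 0) + tokens
        if self.used[key] >= self.limits[key]:
            self.tripped.add(key)


budget = TokenBudget({("client_a", "invoices"): 10_000})
calls_made = 0
try:
    while True:  # simulated runaway prompt loop
        budget.check("client_a", "invoices")
        calls_made += 1
        budget.record("client_a", "invoices", tokens=3_000)
except BudgetExceeded:
    pass
```

In this run the loop is cut off after four calls instead of spinning indefinitely, which is the whole point of putting the check in front of the model call rather than in a report afterwards.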

Layer 4: Integration

Agents are worthless if they cannot write results back into the systems the client actually uses. The integration layer handles outbound writes to ERPs, CRMs, accounting platforms, payment processors, and communication tools. This is where the last-mile problem of agent deployment lives. An agent can perfectly extract and classify an invoice, but if the integration with the client’s QuickBooks instance drops a required field on write, the entire workflow fails from the user’s perspective.

The integration layer maintains adapter patterns for each target system. When a client uses Sage instead of QuickBooks, the agent logic stays identical — only the adapter changes. This separation is what makes it possible to deploy across 21 different verticals without rebuilding core agents for each one. The business logic is vertical-specific. The plumbing is reusable.
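
The adapter separation can be illustrated with a small sketch. The class names and payload field names below are invented for illustration and are not the real QuickBooks or Sage API schemas. The adapters only build payloads here rather than making network calls.

```python
from typing import Protocol


class LedgerAdapter(Protocol):
    """Anything that can turn a normalized invoice into a target-system payload."""

    def write_invoice(self, invoice: dict) -> dict: ...


class QuickBooksAdapter:
    # Illustrative field names only, not the actual QuickBooks schema.
    def write_invoice(self, invoice: dict) -> dict:
        return {"system": "quickbooks", "vendor_name": invoice["vendor"],
                "amount": invoice["total"]}


class SageAdapter:
    # Illustrative field names only, not the actual Sage schema.
    def write_invoice(self, invoice: dict) -> dict:
        return {"system": "sage", "supplier": invoice["vendor"],
                "gross": invoice["total"]}


def post_invoice(invoice: dict, adapter: LedgerAdapter) -> dict:
    """Agent-side logic: identical regardless of which adapter is plugged in."""
    missing = [key for key in ("vendor", "total") if key not in invoice]
    if missing:
        # Fail loudly before the write instead of silently dropping a field.
        raise ValueError(f"invoice missing required fields: {missing}")
    return adapter.write_invoice(invoice)


invoice = {"vendor": "Acme Supply", "total": 310.0}
qb_payload = post_invoice(invoice, QuickBooksAdapter())
sage_payload = post_invoice(invoice, SageAdapter())
```

Swapping the target system changes one constructor call. `post_invoice` and everything upstream of it stay untouched.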

Ghost Architecture: Why the Client Owns Everything

The deployment model is called ghost architecture, and it solves a problem that most operators do not think about until it is too late: vendor lock-in.

In a typical AI consulting engagement, the vendor builds a system on their infrastructure, using their proprietary frameworks, and the client gets access to it — not ownership of it. When the engagement ends or the relationship sours, the client faces a choice: keep paying the vendor indefinitely, or rebuild everything from scratch. We have talked to operators who spent $200,000 on an AI deployment and then spent another $150,000 migrating off the vendor’s stack because they could not modify anything without the vendor’s involvement.

Ghost architecture eliminates that problem entirely. Every agent, every edge function, every database table, every line of orchestration logic lives in the client’s own infrastructure — their own Supabase project, their own repository, their own CI/CD pipeline. TFSF does not appear anywhere in the deployed system. No branding, no proprietary SDK calls, no phone-home telemetry. When the engagement ends, the client has a fully functional production system that their internal team or any other firm can maintain and extend.

The infrastructure itself runs on Supabase with a pass-through cost of approximately $400 to $500 per month — charged at cost, no markup. The client pays Supabase directly. This is not a SaaS margin play disguised as consulting. It is production infrastructure that the client controls.

What We Learned From Deploying Across 21 Verticals

Every vertical thinks its problems are unique. About 30% of them are. The other 70% are the same problems wearing different clothes.

Invoice processing works almost identically whether the client is a construction firm, an accounting practice, or a hospitality group. The fields change, the validation rules change, the exception thresholds change — but the agent pattern is the same: ingest, extract, validate, route exceptions, write to target system.

Scheduling and resource allocation follow the same pattern across staffing agencies, janitorial companies, and manufacturing operations. The constraints differ: a staffing agency optimizes for worker availability and compliance certifications, while a manufacturer optimizes for machine uptime and batch sequencing. The orchestration logic, however, is structurally identical.

The 30% that is genuinely unique tends to be regulatory. Mortgage lending has TILA-RESPA compliance requirements that do not exist in any other vertical. Healthcare has HIPAA constraints on data handling that change how the ingestion layer works. Insurance underwriting has state-by-state regulatory variation that multiplies the validation rule sets. These vertical-specific requirements are where the real engineering effort goes, and why a deployment firm needs genuine operational experience in each vertical, not just a generic “AI for enterprise” pitch.

The 23-Section Daily Validation System

Production AI systems degrade silently. Model performance drifts. Upstream data formats change without notice. Integration endpoints update their authentication requirements. Edge functions hit new rate limits. If you are not actively monitoring every layer of the stack every day, you are running on faith.

The validation system runs 23 checks daily across every active deployment. It validates agent response accuracy against a rolling sample of recent outputs. It checks edge function execution times and failure rates. It monitors database query performance and table growth rates. It verifies that integration adapters are authenticating successfully and writing data correctly. It tests exception handling paths to confirm that edge cases still route properly. It checks API cost trajectories against budgets. It verifies that RLS policies are enforcing correctly — the wrong row-level security policy on a multi-tenant table is a data breach, not a bug.

When a check fails, it does not send a Slack notification and hope someone reads it. It creates a structured incident record with the failing check, the affected client, the severity level, and the remediation path. Critical failures trigger immediate response. Degraded performance triggers a 24-hour investigation window.
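
A structured incident record of this kind could be modeled roughly as below. The `Check` and `Incident` shapes, the check names, and the stubbed pass/fail functions are assumptions made for the sketch. The article does not specify the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Check:
    name: str
    severity: str       # "critical" or "degraded"
    remediation: str    # where the responder should start
    passed: Callable[[str], bool]  # takes a client id, returns True if healthy


@dataclass
class Incident:
    check: str
    client: str
    severity: str
    remediation: str
    opened_at: datetime


def run_checks(client: str, checks: list) -> list:
    """Run every check for one client and return an incident per failure."""
    incidents = []
    for check in checks:
        if not check.passed(client):
            incidents.append(Incident(
                check=check.name,
                client=client,
                severity=check.severity,
                remediation=check.remediation,
                opened_at=datetime.now(timezone.utc),
            ))
    return incidents


# Stubbed checks: real ones would query logs, metrics, or the database.
checks = [
    Check("adapter_auth", "critical", "rotate credentials, re-run adapter test",
          passed=lambda client: False),
    Check("api_cost_trajectory", "degraded", "review token usage by workflow",
          passed=lambda client: True),
]
incidents = run_checks("client_a", checks)
```

Because each incident carries its own severity and remediation path, routing (immediate response versus a 24-hour investigation window) can be decided from the record itself.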

This is not aspirational — it is the system that caught a Supabase connection pooler change that would have silently dropped 12% of webhook deliveries if it had reached production undetected.

Agent-to-Agent Payment Infrastructure

One of the less obvious problems with multi-agent systems is what happens when agents need to trigger financial transactions as part of their workflow. A procurement agent that identifies a restock need, generates a purchase order, and routes it for approval is useful. A procurement agent that can also initiate payment to the supplier through an agent-to-agent payment protocol — without human intervention for pre-approved transaction types — is transformative.

This is the problem that the A2A Payment Protocol addresses. It is one of three systems covered by US provisional patents filed April 2026, with non-provisional conversion targeting November 2026, alongside SLPI (a structured ledger protocol) and ADRE (an autonomous dispute resolution engine). The patent claims cover 47 distinct mechanisms across the three systems.

The payment layer is not theoretical. It runs on nontraditional payment rails that process agent-initiated transactions with full audit trails, compliance checks, and exception handling built into the protocol itself. When an agent initiates a payment, the transaction passes through a compliance gate, a fraud detection layer, and a reconciliation check before execution. If any gate fails, the transaction suspends and routes to human review with the full decision chain preserved for audit.
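
The gate sequence can be pictured with a generic sketch like the one below. It is not the A2A Payment Protocol itself, whose details are not given here: the gate rules, the `APPROVED_SUPPLIERS` set, and the `MAX_AUTO_AMOUNT` threshold are placeholder assumptions.

```python
from dataclasses import dataclass, field

# Placeholder policy values, invented for this example.
APPROVED_SUPPLIERS = {"acme-supply"}
MAX_AUTO_AMOUNT = 5_000.0


@dataclass
class Payment:
    supplier: str
    amount: float
    status: str = "pending"
    decision_chain: list = field(default_factory=list)  # (gate, passed) pairs


def compliance_gate(payment: Payment) -> bool:
    return payment.supplier in APPROVED_SUPPLIERS


def fraud_gate(payment: Payment) -> bool:
    return payment.amount <= MAX_AUTO_AMOUNT


def reconciliation_gate(payment: Payment) -> bool:
    return payment.amount > 0


GATES = [
    ("compliance", compliance_gate),
    ("fraud", fraud_gate),
    ("reconciliation", reconciliation_gate),
]


def execute_payment(payment: Payment, review_queue: list) -> Payment:
    """Run gates in order; suspend on the first failure and keep the audit trail."""
    for name, gate in GATES:
        passed = gate(payment)
        payment.decision_chain.append((name, passed))
        if not passed:
            payment.status = "suspended"
            review_queue.append(payment)  # human review sees the full chain
            return payment
    payment.status = "executed"
    return payment


review_queue = []
ok = execute_payment(Payment("acme-supply", 1_200.0), review_queue)
held = execute_payment(Payment("acme-supply", 9_000.0), review_queue)
```

The oversized payment stops at the fraud gate with its decision chain intact, so the reviewer can see that compliance passed before the suspension.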

What the First 30 Days Look Like

Every deployment follows the same 30-day methodology, regardless of vertical or complexity.

Days 1-5: Operational assessment. The 19-question diagnostic evaluates the client’s current workflows, data availability, integration landscape, and operational bottlenecks. This is not a survey — it is a structured evaluation that produces a deployment blueprint with specific agent designs, integration maps, and ROI projections within 48 hours.

Days 6-15: Agent development. Core agents are built, tested against sample client data, and connected to the integration layer. Exception handling paths are defined and tested. The orchestration layer is configured for the client’s specific workflow sequence.

Days 16-25: Integration and testing. Agents connect to live client systems in a staging environment. End-to-end workflow testing with real data. Edge cases identified and handled. Performance benchmarking against the baseline metrics established during the assessment.

Days 26-30: Production deployment. Agents go live with monitoring enabled. The 23-section validation system activates. The client’s team receives handover documentation and training on the system they now own.

On day 31, the client has production agents running live workflows. Not a roadmap. Not a Phase 2 proposal. Working infrastructure.

The Numbers That Matter

After 18 months of production deployments, these are the numbers that survived contact with reality.

Average agent accuracy on structured data extraction: 94-97%, depending on input quality.
Exception rates on clean data: 3-5%.
Exception rates on messy, real-world data: 8-15%.
Average time from engagement start to first agent in production: 22 days.
Infrastructure cost per client: $400-$500/month at pass-through.
Average workflow processing time reduction: 60-75% versus manual baseline.
Longest-running production agent uptime without intervention: 847 hours.

These are not cherry-picked demo metrics. They are aggregate numbers across clients in accounting, mortgage lending, manufacturing, hospitality, logistics, staffing, and facilities management. Some deployments perform better. Some perform worse. The numbers above are the honest middle of the distribution.

What Breaks

Everything breaks. The question is how it breaks and whether your architecture handles the failure gracefully.

Model API outages happen 2-3 times per month across major providers. The execution layer’s fallback chain means a single provider outage does not stop processing — it degrades to the next model in the chain, or to rules-based processing for well-defined tasks. The client never sees a blank screen.

Data format changes are the most common source of agent failures. A client’s ERP vendor pushes an update that adds a new required field to their API, and suddenly the integration adapter fails on every write. The validation system catches these within hours, but the fix requires human intervention to update the adapter. Automating adapter updates for breaking API changes is an unsolved problem industry-wide.

Scope creep from success is the most dangerous failure mode. An agent deployment works well for invoice processing, and the client immediately wants to extend it to purchase order management, vendor onboarding, and contract review, all within the same sprint. Each of those is a separate agent build with its own edge cases and integration requirements. Treating them as incremental extensions of the existing system leads to fragile, over-coupled architectures. The discipline is saying “that is a new deployment, not a feature request.”

Who This Is For

If you are evaluating AI agent deployment for a business operation — whether that is a single high-volume workflow or a multi-department transformation — the framework described here is what production looks like. Not what it could look like. What it does look like, today, across 21 verticals.

The operational assessment is available and takes twenty seconds to score your highest-cost workflow. Continue into the full 19-question diagnostic for a deployment blueprint delivered in 24 to 48 hours.

Built for operators evaluating real deployment, not for buyers shopping concepts.
