DEV Community: Arjun

You're Not Building an AI Agent. You're Building a Very Expensive Chatbot.

Arjun — Wed, 13 May 2026 06:15:52 +0000

The architectural difference developers keep missing ,and why it's costing teams months of rework.

The ticket came in on a Friday: "We need an AI agent that handles customer onboarding end-to-end." By Tuesday, the team had a working demo. A polished chat interface. It asked the right questions. The stakeholders loved it.

Six weeks later, it was in production doing approximately one thing: answering FAQ questions in a slightly fancier wrapper than the old help center.

This pattern is everywhere right now. Teams ship what they call an "AI agent" and discover it is, functionally, a better-dressed chatbot. Not because the developers are cutting corners ,but because the distinction between the two architectures is still genuinely blurry in most sprint rooms, product briefs, and vendor pitches.

It matters. Getting the architecture wrong at the start costs three to six months of rework. Here is what developers and technical decision-makers actually need to know.

The Difference Is Architectural, Not a Marketing Label

A chatbot responds. An AI agent acts.

That sounds like a slogan, but the technical implication is significant. A chatbot ,even an LLM-powered one ,operates in a closed loop: user sends input, model generates output, conversation continues. It has no persistent state beyond the context window, no access to external systems unless explicitly hardwired, and no ability to decide whichtool to reach for based on the task at hand.

The definition of an AI agent shifted significantly in 2025 ,from the academic framing of systems that "perceive, reason and act" to a more operational description: LLMs capable of using software tools and taking autonomous action, calling APIs, coordinating with other systems and completing tasks independently.

The inflection point that made this practical was Anthropic's release of the Model Context Protocol in late 2024. MCP allowed developers to connect large language models to external tools in a standardized way, effectively giving models the ability to act beyond generating text. Before that, most "agentic" implementations were brittle custom wiring.

The architectural checklist is blunt: does the system have persistent memory across sessions, access to real tools and APIs it selects dynamically, a planning loop that breaks goals into sub-tasks, and a feedback mechanism that evaluates its own output? If the answer to most of those is no, it is a chatbot. A useful one, possibly. But not an agent.

GeekyAnts' engineering team describes this precisely in their breakdown of building AI agents vs chatbots: "Chatbots follow scripted flows and handle basic queries. AI agents go beyond ,they understand context, access tools, trigger APIs, and make decisions across complex workflows."

Where Developers Actually Get Burned

The wrong architecture causes two distinct failure modes, and they hit at different points in the development cycle.

The first failure arrives at demo. The team builds something with LangChain, hooks it into a few APIs, and it works ,in the demo environment, with the happy path, with a human watching and course-correcting. Production looks different. Edge cases, ambiguous user inputs, and multi-step tasks that require the agent to recover from a failed tool call all expose the fact that the "reasoning" layer was mostly prompt engineering, not genuine planning. Agentic systems often trade latency and cost for better task performance, and teams should consider carefully when this tradeoff makes sense.

The second failure arrives at scale. Teams that build chatbot architectures and call them agents hit a wall when the use case grows. Adding a new workflow means hardwiring new paths. Memory doesn't carry context across sessions. Observability is non-existent. Debugging a multi-tool failure chain in production without proper logging is ,to use a technical term ,a nightmare.

Real-world enterprise deployments tell a different story from the demos: Majesco's AI copilot achieved 23% faster task completion and 84% daily adoption rates when the underlying architecture matched the use case. The underreported part of that stat is how many deployments didn't achieve it because the architecture was mismatched from the start.

GeekyAnts' Aman Soni documented a practical example of this ,building a multi-agent SQL workflow where each agent handled a specific responsibility (query generation, validation, testing, response synthesis). That separation of concerns only works if the system is genuinely agentic. A chatbot would have collapsed that into a single prompt and called it done.

Choosing the Right Tool Before Writing the First Line

The honest decision framework is not "chatbot vs agent." It is: how much autonomous decision-making does the task actually require?

Most internal tools, customer FAQs, support ticket triage, and document summarization workflows do not need an agent. They need a well-designed chatbot with good retrieval (RAG), clear fallback handling, and fast response times. Building an agent here adds latency, cost, and debugging complexity with no user-facing benefit.

Where agents become necessary:

The task requires multi-step execution across different systems that cannot be predetermined at build time. Order processing that touches inventory, payments, notifications, and CRM simultaneously ,that is an agent problem.
The system must recover from failures mid-task without human intervention, re-plan based on new information, and maintain state across a session that spans days, not messages.

For multi-stage or multi-agent pipelines ,supply chain management, financial trading, complex support escalations ,orchestrated workflows offer better performance control. Agents are appropriate for tasks requiring flexibility and model-driven decision-making at scale. For simple, self-contained tasks, a well-structured chain is usually sufficient.

GeekyAnts has published a useful comparison of RAG vs fine-tuning vs AI agents that maps use cases to architecture choices without defaulting to "always use the most complex option." It is worth reading before committing to a stack.

The framework decision also has cost implications. Many applications are fully served by optimizing a single LLM call with retrieval and in-context examples. Reaching for agent architecture before validating that simpler approaches fail is a common and expensive mistake.

The Part Nobody Puts in the Sprint Brief

There is a conversation that happens in most teams after a chatbot-marketed-as-agent ships and underperforms. Someone says the model needs to be smarter. Someone else says the prompts need work. The actual answer, usually, is that the architecture was wrong before the first commit landed.

The distinction between a chatbot and an AI agent is not a vocabulary debate. It determines memory strategy, tool integration design, observability requirements, cost modeling, and how the system behaves when something goes wrong at 2am on a Sunday.

Get the architecture decision right first. The frameworks ,LangChain, LangGraph, CrewAI, AutoGen, Google's ADK ,are all buildable once the decision is clear. GeekyAnts' step-by-step guide to building and deploying AI agents covers the implementation path once the architecture decision is made.

The demo worked. The question is whether the architecture behind it is built for what comes after the demo.

Inside a Real-Time AI Fraud Detection Engine That Makes Decisions in Under 50ms

Arjun — Wed, 06 May 2026 08:53:43 +0000

Every time a payment is submitted, a system somewhere has a matter of milliseconds to decide whether it's legitimate. Not seconds. Milliseconds. By the time the loading spinner appears on your screen, the verdict has already been issued.

That constraint ,act fast or be useless ,is what makes fraud detection one of the most interesting engineering challenges in fintech. This article breaks down how a production-grade, real-time fraud engine actually works: the architecture, the tradeoffs, and the decisions that make sub-50ms possible.

The Real Problem With Fraud Systems Today

The naive version of fraud detection is simple: write rules. Block transactions over a certain amount. Flag new devices. Reject international transfers from accounts that have never made them.

That works until it doesn't.

Modern financial platforms process tens of thousands of transactions per minute. Fraudsters adapt quickly. Static rules age out. And the collateral damage ,legitimate transactions blocked because they look unusual ,quietly destroys user trust. A customer whose payment gets declined at a restaurant doesn't file a complaint. They just switch banks
.
Four compounding problems define the current state of fraud systems:

Volume at scale. No human review queue can keep up. The system must make autonomous decisions, every time, without a queue.

Legacy latency. Many fraud systems were built when a two-second check was acceptable. Today, a two-second delay is noticeable. Users expect payments to feel instant.

False positive rates. Overly aggressive models block real customers. Under-tuned models miss actual fraud. Both outcomes cost money.

Explainability gaps. Regulators increasingly require that automated financial decisions come with a reason. "The model said no" isn't a compliant answer.

What a Modern Fraud Engine Actually Looks Like

The solution isn't a single smarter model. It's a system made of specialized components that work in coordination.

Machine Learning for Behavioral Anomalies

An ML model trained on transaction history can detect patterns that no human would think to write a rule for. A user who always pays for groceries in one neighborhood, then suddenly makes a high-value purchase from a device in another country ,that's a behavioral drift the model picks up on, even if no explicit rule covers it.

A Rules Engine for Known Attack Patterns

Purely learned models have a weakness: they need examples. If a new fraud vector appears that the model has never seen, it won't catch it. Rules handle the known universe: velocity limits, block lists, device fingerprint anomalies, card testing patterns. Rules are fast, auditable, and precise.

AI Reasoning for Explanation

This is the layer that often gets skipped in engineering discussions, but it's increasingly non-negotiable. An LLM layer (or a structured reasoning module) generates a human-readable explanation for why a transaction was flagged. This serves compliance, powers customer support, and makes the system debuggable by the engineers maintaining it.

No single one of these layers is sufficient on its own. The fraud engine is the combination.

How the Pipeline Works, Step by Step

Here's the end-to-end flow of a transaction moving through a production fraud engine:

1. Signal Collection
When a transaction arrives, the system immediately gathers context: device fingerprint, IP geolocation, session behavior (how fast the user is typing, whether they copied and pasted fields), and historical patterns for that user. This signal package is assembled in parallel ,not sequentially ,to minimize latency.

2. Fraud Categorization
Before scoring, the system classifies the type of risk being evaluated. Is this potentially account takeover? Card-not-present fraud? Synthetic identity? The category determines which downstream models and rules are most relevant.

3. Risk Scoring
The ML model runs against the collected signals and returns a probability score. The rules engine runs simultaneously, checking the transaction against known patterns. Both outputs feed into an aggregation layer that produces a single composite risk score.

4. Decision
The composite score maps to one of three outcomes: approve, challenge (step-up authentication like OTP), or block. Thresholds are tunable per merchant, per transaction type, and per user segment.

5. Explanation Generation
For any flagged transaction, the reasoning layer generates a structured explanation. Something like: "Transaction flagged due to device mismatch combined with velocity anomaly ,three transactions in 90 seconds from two different countries." This gets logged, surfaced to compliance tools, and used in customer communication if the user disputes.

The Key Insight: Separate Your Fast Path from Your Deep Path

This is the architectural decision that makes sub-50ms realistic.

Not every decision needs the same depth of analysis. A transaction that matches a known fraud fingerprint exactly can be blocked in under 15ms via the rules engine alone. A transaction with ambiguous signals needs deeper analysis ,but that deeper analysis doesn't have to block the primary response.

The pattern that works in production:

Fast path (5–15ms): Rules engine + cached ML inference on pre-computed user features. Returns a decision immediately. Handles the majority of clear-cut cases.

Deep path (~200ms, asynchronous): Full ML inference, behavioral sequence modeling, cross-account graph analysis. Runs in the background. If the deep path disagrees with the fast path decision, it can trigger a follow-up action ,not reverse the initial decision, but queue a secondary review or increase monitoring on the account.

Separating these paths means the user experience never waits on the heavy computation. The system feels instant. The sophisticated analysis still happens; it just doesn't block the response.

Why Hybrid Systems Win

It's tempting to frame this as "ML vs. rules" and pick a side. In practice, the two approaches have complementary failure modes.

Rules are interpretable, fast, and excellent at catching known attack patterns. They degrade when fraud evolves in ways the rule authors didn't anticipate.

ML models generalize across unseen patterns and adapt to behavioral drift. They're opaque, require training data for each new fraud type, and can drift silently if monitoring isn't tight.

LLM-based reasoning adds the explainability layer that neither rules nor ML models natively provide. It's the component that makes the system auditable.

Together, the three layers cover each other's weaknesses. Rules handle the known. ML handles the novel. Reasoning handles the explainability requirement. Some engineering teams are already shipping this in production ,GeekyAnts published a detailed breakdown of how they built exactly this kind of multi-agent fraud pipeline if you want a concrete reference point.

Real-World Takeaways for Engineers

If you're building or evaluating a fraud system ,or any real-time decision system ,these are the things worth internalizing:

Latency is a product requirement, not just an engineering metric. The 50ms target isn't arbitrary. It's derived from what users perceive as "instant." Build your SLAs from that constraint backward.

Explainability is a first-class concern. Compliance requirements are tightening globally. If your system can't generate a structured, human-readable rationale for a decision, you're accumulating regulatory debt. Build the explanation layer early, not as an afterthought.

Observability is different in distributed pipelines. When your decision engine spans a rules service, an ML inference endpoint, and a reasoning module, a single slow component can cascade. Instrument every layer independently. Track p95 and p99 latency per stage, not just end-to-end.

A single model is a single point of failure. The model that catches 95% of fraud today will miss a new attack vector tomorrow. Hybrid architecture gives you fallback depth. When one layer fails to catch something, another layer might.

Cache aggressively, but carefully. Pre-computed user feature vectors dramatically reduce inference latency. But stale features can introduce subtle bugs ,a user's "normal" location from 12 hours ago might not reflect their current context. Build cache invalidation logic that's aware of the feature's temporal sensitivity.

Building systems like this is a balance of product thinking and systems engineering. The fraud problem is ultimately a latency problem, a data problem, and a trust problem at the same time. The teams that treat all three seriously are the ones shipping fraud engines that actually work in production.