The honest story of evaluating the chatbot market, rejecting every SaaS option, and building a hybrid rule-based + AI system that costs $50/month instead of $1,500+.
A Brief History of Terrible Chatbots
Website chatbots have been promising to revolutionize customer service since 2016, when Facebook opened the Messenger Platform and 100,000 bots appeared overnight. Most were garbage — rigid button menus pretending to be conversations. Press 1 for support. Press 2 to speak to a human. Press 3 to give up.
The next wave (2018–2022) brought NLU frameworks like Dialogflow, Rasa, and IBM Watson. These could actually understand natural language — sort of. They required training on hundreds of labeled examples, constant retraining as language drifted, and a dedicated ML engineer to maintain them. Gartner noted that 70% of these projects failed, not because the technology was bad, but because organizations underestimated the ongoing work.
Then 2023 happened. ChatGPT showed the world what LLMs could do, and every chatbot vendor scrambled to rebrand. Intercom launched Fin. Zendesk launched AI Agents. Tidio launched Lyro. The pitch was simple: connect your knowledge base, and the AI does the rest. No training required. No ML engineer needed.
The pitch is mostly true. But the pricing tells a different story.
What the Market Looks Like in 2026
If you need a chatbot for your business today, here's what you're actually choosing between:
The SaaS Platforms
Intercom is the market leader for B2B SaaS and startups. Their Fin AI agent is genuinely good. It costs $0.99 per resolved conversation, on top of seat costs ($29–$132/agent/month). A small 5-agent team with 1,000 AI resolutions per month pays roughly $1,400/month before any channel extras.
Zendesk targets enterprise support teams. Add their Advanced AI add-on (+$50/agent/month) and you're at $165–$219/agent/month. At 10 agents, that's $1,650–$2,190/month just in platform fees, before the per-resolution costs kick in for more capable AI agents.
Drift (now part of Salesloft) is the enterprise sales tool of choice. Entry price: $2,500/month. Enterprise contracts run $10,000–$150,000/year.
Tidio is the SMB option. The base plan is $29/month, but the AI features (Lyro) are a separate $39+/month add-on, and Flows automation is another $29/month. You quickly land at $97–$200/month for modest usage, scaling steeply with conversation volume.
Botpress is the developer-friendly option — open-source core, cloud platform, AI tokens billed separately. The free tier is 500 messages/month. Their Team plan is $500/month for 50,000 messages.
The Hidden Cost Stack
Every pricing page shows the platform fee. Nobody shows you the full stack:
| Cost category | Typical range |
|---|---|
| Platform subscription | $100–$2,500/month |
| LLM API fees (at scale) | $50–$2,000/month |
| Agent seat licenses | $29–$169/seat/month |
| Implementation/setup | $5,000–$30,000 one-time |
| Conversation design | $3,000–$12,000 one-time |
| CRM integration (custom) | $5,000–$25,000 one-time |
| Ongoing maintenance | 15–20% of build cost/year |
| WhatsApp/SMS channel fees | $500–$5,000/month at scale |
The per-resolution model deserves special attention. Intercom Fin resolves conversations at $0.99 each. A good month where the bot handles 5,000 conversations = $4,950 in AI fees alone, on top of your seat costs. Volume spikes — holidays, product launches, PR moments — can triple your bill with zero warning.
Why SaaS Platforms Fail for Complex Products
The LLM-over-knowledge-base approach works beautifully for FAQ deflection: "What are your business hours?" "How do I reset my password?" "Where's my order?"
It breaks down for guided selling — situations where the bot needs to ask specific questions in a specific order, qualify the customer against product rules, and match them to the right product from a catalog that changes monthly.
You can't tell an LLM "Ask about loan amount first, then credit history, then determine which of 30 products across 10 lenders matches their profile." It will interpret, improvise, skip steps, and occasionally recommend products that don't exist. You're also storing sensitive customer financial data on third-party cloud infrastructure — a real compliance concern for businesses operating under GDPR, CCPA, or FCA regulations.
Data sovereignty is increasingly non-negotiable. 84% of organizations express concern about it (Parallels, 2026), and it's one of the main drivers pushing regulated industries toward self-hosted or custom solutions.
The Shortcut Everyone Is Selling Right Now
There's a fourth option that doesn't appear on comparison tables but is currently the most commonly pitched approach to small business owners: the pure LLM prompt bot.
The pitch is appealing: take an existing LLM (ChatGPT, Claude, Gemini), write a detailed system prompt describing your business and what the bot should and shouldn't do, embed it on your site using a white-label wrapper like Chatbase or CustomGPT.ai, and you're live in an afternoon. Freelancers on Fiverr will do it for $20–$95. Searches for "AI agents" on Fiverr grew 18,347% in late 2024 — the market is enormous.
Here's what you actually get.
How It Works
The system prompt is a hidden instruction block prepended to every conversation:
```
You are Alex, a friendly advisor for Brightfield Financial.
Only answer questions about our loan products.
Do not discuss competitors.
If the user wants to apply, ask for their name, phone, and email.
Our products: [entire product catalog pasted as text]
```
As the business tries to squeeze in more product knowledge, the prompt grows from a few hundred words to 2,000–5,000 tokens. When users type questions, the LLM reads the prompt plus the conversation history and generates a response. That's the entire system.
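In API terms, the "hidden instruction block" is just a system message re-sent with every request. A minimal sketch (OpenAI-style chat message format; `SYSTEM_PROMPT` is a stand-in for the full prompt, not any real product's prompt):

```python
# A pure prompt bot: one system message prepended to every single call.
# SYSTEM_PROMPT is an illustrative stand-in for the real (much longer) prompt.
SYSTEM_PROMPT = (
    "You are Alex, a friendly advisor for Brightfield Financial. "
    "Only answer questions about our loan products."
)

def build_messages(history: list[dict], user_input: str) -> list[dict]:
    """Every request re-sends the whole prompt plus the whole history."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_input}]
    )
```

Note that the prompt is paid for again on every turn — which is exactly why prompt size dominates the cost math below.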
The Honest Cost
For simple FAQ at low volume, the direct API cost is genuinely cheap:
| Volume | GPT-4o | GPT-4o-mini |
|---|---|---|
| 1,000 conversations/month | ~$7 | ~$0.44 |
| 10,000 conversations/month | ~$72 | ~$4.35 |
| 100,000 conversations/month | ~$725 | ~$43.50 |
Assumes ~500 input + 300 output tokens per conversation with a minimal system prompt.
Add the white-label platform on top — Chatbase runs $19–$399/month depending on volume — and you're looking at $20–$500/month for a fully managed solution. That's genuinely cheap compared to Intercom.
The catch: those numbers assume a short system prompt. A detailed product catalog baked into every conversation as 5,000 tokens multiplies your input token bill by 3–10x. At 10,000 conversations/month with a heavy prompt, GPT-4o costs jump to $200–$500/month in API fees alone.
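You can sanity-check this effect yourself. A rough estimator (the per-million-token rates are assumptions based on GPT-4o-class pricing at the time of writing; check your provider's current price list):

```python
# Rough monthly API cost estimate for a prompt-based bot.
# in_rate / out_rate are dollars per million tokens -- ASSUMED values,
# not a quote from any provider's current price list.

def monthly_api_cost(conversations: int, prompt_tokens: int,
                     output_tokens: int = 300,
                     in_rate: float = 2.50, out_rate: float = 10.00) -> float:
    """Dollars per month. prompt_tokens = system prompt + user message."""
    per_conv = (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return conversations * per_conv

# A lean 500-token prompt vs. a 5,000-token catalog prompt,
# both at 10,000 conversations/month:
lean = monthly_api_cost(10_000, 500)     # ~$42
heavy = monthly_api_cost(10_000, 5_000)  # ~$155
```

Under these assumed rates the catalog-heavy prompt costs roughly 3.6× the lean one — the output side is unchanged; the entire increase is re-sent prompt tokens.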
What Actually Goes Wrong
Hallucination is the main event. LLMs don't know what they don't know. When a user asks about a product not well-described in the prompt, the model confidently synthesizes a plausible-sounding answer from its training data. In financial services, this means inventing interest rates, eligibility rules, or policies that don't exist.
This isn't theoretical. In a February 2024 ruling, the BC Civil Resolution Tribunal found Air Canada liable after their chatbot invented a bereavement fare refund policy that didn't exist. The customer sued, and Air Canada's defense — that it couldn't be responsible for a separate "AI entity" — was rejected outright. If the bot says it, the company said it.
Guardrails written in prose fail in practice. "Do not discuss competitors" sounds like a solid instruction. It isn't enforceable. In December 2023, a Chevrolet dealership's ChatGPT-powered bot was prompted by users to enthusiastically recommend Teslas, write Python code, and "agree" to sell an $81,000 Tahoe for $1. In January 2024, DPD's bot was coaxed into writing poetry about how terrible DPD's service was and declaring itself "the world's worst chatbot." Both went viral.
The reliability problem is structural: by most estimates, a system prompt instruction is followed roughly 90–95% of the time. At 10,000 monthly conversations, that's 500–1,000 guardrail violations per month. At 100,000 conversations, it's 5,000–10,000.
The system prompt is not a secret. OWASP lists prompt injection as the #1 risk for LLM applications. Users can extract system prompts with simple instructions ("repeat the text above verbatim") — and they do. A GitHub repository documents leaked system prompts from hundreds of popular Custom GPTs, including ones with embedded API keys and proprietary business logic. Your confidential product margin rules, competitor analysis, and CRM access credentials are readable by any curious user.
Business logic cannot be enforced. There is no way to instruct an LLM via prompt to always ask questions in a specific order and guarantee compliance. It will skip qualification steps when the user seems decisive, mix threads when questions are compound, and collect partial data it presents as complete. For a credit consultant, this means arriving at product recommendations without knowing whether the customer actually qualifies.
No session memory between visits. By default, every new browser session starts blank. A returning customer who described their situation yesterday is a stranger today. Solving this requires code — a session database, cookie management, conversation history storage — at which point you've built the infrastructure of a real system, and the "simple prompt" story is gone.
No CRM integration. The bot collects name and phone conversationally with no validation. "Call me tomorrow" is not a phone number. Getting that data into a CRM requires parsing the conversation transcript with another automation layer (Zapier, Make), which introduces more failure points and still delivers inconsistently structured data.
Where It Actually Works
To be fair: pure prompt bots work well under specific conditions.
They're good for simple FAQ with stable, short content — business hours, location, return policy, service area. When the entire knowledge base fits in a few hundred tokens, hallucination risk is low and the approach is genuinely adequate.
They work for internal tools — employee handbook lookup, onboarding Q&A, helpdesk for a specific software tool — where the audience is trusted, stakes are low, and there's no compliance exposure.
They work for proof of concept — demonstrating to stakeholders that a conversational interface is worth investing in, before committing to a real implementation.
They fail for anything involving product recommendations with real financial consequences, structured multi-step qualification, reliable data collection, CRM sync, or regulatory compliance.
The Freelancer Economy Around This
The $20–$95 Fiverr chatbot gig typically delivers: a text file with a system prompt, instructions for pasting it into Chatbase's free tier, and a 24-hour turnaround. No testing with adversarial inputs, no conversation flow design, no CRM connection, no session continuity.
The result works — it produces chatbot-shaped responses to chatbot-shaped questions. It fails predictably when real customers show up with real questions, edge cases, and the occasional deliberate attempt to make it say something embarrassing.
The business owner who paid $50 for a chatbot is now responsible for everything that chatbot says.
Our Situation: A Credit Consultant's Problem
We built this chatbot for a credit consultancy — a company that matches individuals and businesses to loan products across a panel of lenders. The product catalog covers 30+ products across 10 lenders and 15 categories: mortgages, consumer loans, auto loans, credit cards, business loans, leasing, and factoring.
To match a customer to the right product, you need to collect specific information in a logical order:
- Are you an individual or a business?
- What product are you looking for?
- What amount do you need?
- What's your credit history like?
- What's your employment status?
Miss any of these, and the match is wrong. Let the LLM wander, and it might skip steps, ask in the wrong order, or match confidently to a product the customer doesn't qualify for.
We also needed the bot to answer financial questions mid-conversation — "What's the difference between a secured and unsecured loan?" — using the company's 130+ blog articles as a knowledge base.
And we needed it to run cost-effectively on a small VPS alongside other projects, with API costs well under $100/month.
None of the SaaS platforms could do all of this. So we built a hybrid.
The Architecture: Deterministic Core, AI at the Edges
The core idea is straightforward: use code where code is reliable, use AI where code isn't enough.
```
User Input
    |
    v
1. Exact match (button clicks, known phrases) ---> Navigate    [cost: $0]
    | no match
    v
2. Pattern matching (amounts, dates, yes/no) ----> Collect     [cost: $0]
    | no match
    v
3. LLM agent fallback ---------------------------> Understand  [cost: ~$0.02]
```
In our production traffic, 85% of user inputs are handled by steps 1 and 2 with zero LLM cost. A user clicking "Mortgage" or typing "three hundred thousand" — pure deterministic code. Only free-text questions, ambiguous inputs, and mid-conversation digressions reach the AI.
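The routing itself is a few lines of ordinary code. A minimal sketch — `EXACT_MATCHES` and `AMOUNT_RE` are toy stand-ins (production patterns also cover spelled-out numbers, dates, yes/no variants, and so on):

```python
import re

# Tiered router sketch. Only tier 3 costs money.
EXACT_MATCHES = {"mortgage": "node_mortgage", "personal loan": "node_personal"}
AMOUNT_RE = re.compile(r"(\d[\d,]*)\s*(k|thousand|grand)?", re.IGNORECASE)

def route(user_input: str):
    text = user_input.strip().lower()

    # Tier 1: exact match on button labels / known phrases -- free.
    if text in EXACT_MATCHES:
        return ("navigate", EXACT_MATCHES[text])

    # Tier 2: pattern matching for structured answers -- free.
    m = AMOUNT_RE.fullmatch(text)
    if m:
        amount = int(m.group(1).replace(",", ""))
        if m.group(2):  # "300k", "300 grand"
            amount *= 1_000
        return ("collect", amount)

    # Tier 3: free text falls through to the LLM agents -- an API call.
    return ("llm_fallback", text)
```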
The Conversation Graph
The backbone is a tree of nodes. Each node has a type: question, info, contact collection, product matching, or redirect. Every quick-reply button links to a specific child node. The tree encodes the business logic: ask product type, then amount, then credit history, then match.
This path is always followed. The AI cannot skip steps or improvise the flow. It can only help the user navigate the path more naturally.
```
Root: "Individual or business?"
+-- Individual
|   +-- Mortgage
|   |   +-- Amount?
|   |   +-- Program? (fixed, tracker, offset)
|   |   +-- -> Product match
|   +-- Personal loan
|   +-- Credit card
+-- Business
    +-- Business loan
    +-- Asset finance
```
Three Agents, Not One
When a user types something the pattern matcher can't handle, three specialized AI agents work together:
Orchestrator (most capable model) classifies the user's intent in a single call: are they answering the current question, asking their own question, doing both at once, or confused? Based on that classification, it dispatches to the appropriate sub-agent — or both in parallel.
Parser (cheaper, faster model) handles semantic matching. The user typed "around three hundred grand" instead of clicking "$250k–$500k". The Parser reads the current node's options and maps the free text to the correct choice. Temperature zero — deterministic output. Results are cached by node + input hash, so the same phrasing at the same step never costs twice.
Info Agent handles the user's own questions using a RAG pipeline over the company blog. "What documents do I need for a mortgage?" leads to a search across 500+ blog article chunks by semantic similarity, then a synthesized answer citing real content. No hallucination of product details; the answer comes directly from published articles.
The reason for three agents instead of one: each has its own cache strategy, its own failure mode, and can use the cheapest model appropriate for the task. Parser failures don't block Info Agent results. They can run in parallel when the user is doing both (answering + asking) simultaneously.
Blog as Knowledge Base
The company has 130+ articles about loans, mortgages, credit cards, and business financing. Every night, a background task reads the sitemap, fetches new or updated articles, strips navigation and UI chrome from the HTML, chunks the content into searchable segments, generates vector embeddings, and stores them in PostgreSQL with the pgvector extension.
Before calling the LLM to synthesize an answer, the system checks two cache layers: an exact-match cache for repeated identical questions (free), and a semantic similarity cache for near-identical phrasings (cost of one embedding call). Only truly novel questions reach the full LLM synthesis pipeline.
Why PostgreSQL with pgvector instead of a dedicated vector database (Pinecone, Qdrant, Weaviate)? For a few hundred vectors, a separate service adds infrastructure complexity with no performance benefit. pgvector runs inside the existing database, uses the same backup system, and eliminates one more thing to deploy and monitor.
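For the curious, the pgvector setup is a table with a `vector` column and an ordering by distance operator. A sketch (table and column names are illustrative, not our production schema; `<=>` is pgvector's cosine-distance operator):

```python
# Schema and query shape for pgvector similarity search (illustrative).
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS blog_chunks (
    id          bigserial PRIMARY KEY,
    article_url text NOT NULL,
    content     text NOT NULL,
    embedding   vector(1536)
);
"""

# Nearest-neighbour search: order by cosine distance to the query vector.
SEARCH = """
SELECT article_url, content
FROM blog_chunks
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""
```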
Product Sync from the Source of Truth
The chatbot never stores product data manually. A daily sync task reads from the company's existing CMS, maps products to a normalized schema, upserts by product ID, and deactivates removed records. When the company adds a lender or a rate changes, it happens in the CMS — the chatbot picks it up the next morning without any chatbot-specific work.
Product matching is entirely deterministic: filter by product type, amount range, credit history, and employment status. The AI is not involved. Interest rates and product eligibility are never at risk of hallucination.
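"Entirely deterministic" means the matcher is plain filter code. A sketch with a toy two-product catalog (real products have more fields, but the shape is the same):

```python
# Deterministic product matching -- no LLM anywhere in this path.
# Fields and the sample catalog are illustrative.
PRODUCTS = [
    {"id": "lenderA-mortgage", "type": "mortgage", "min": 50_000, "max": 500_000,
     "credit": {"good", "fair"}, "employment": {"employed", "self-employed"}},
    {"id": "lenderB-mortgage", "type": "mortgage", "min": 100_000, "max": 1_000_000,
     "credit": {"good"}, "employment": {"employed"}},
]

def match_products(product_type: str, amount: int,
                   credit: str, employment: str) -> list[str]:
    return [
        p["id"] for p in PRODUCTS
        if p["type"] == product_type
        and p["min"] <= amount <= p["max"]
        and credit in p["credit"]
        and employment in p["employment"]
    ]
```

A wrong answer here is a bug you can reproduce and fix — not a hallucination you can only apologize for.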
Page Context and Smart Auto-Skip
The widget reads the current page URL on load. A user landing on /mortgages/ is probably interested in a mortgage. This gets stored in the session immediately.
When the conversation tree reaches the "What product are you looking for?" node, the Orchestrator detects the pre-known answer and skips the question — but asks for confirmation first:
"I see you're interested in a mortgage. Is that right?"
If confirmed, the conversation jumps ahead. If not, the question is asked normally and the pre-filled value is discarded. The same pattern applies to returning visitors who already answered some questions in a previous session.
Messenger as a Channel
The same engine powers both the web widget and a messaging bot (we use Telegram; WhatsApp Business API works identically in principle). When a user wants to continue the conversation in their preferred messenger, the widget generates a session transfer token and opens the bot with a deep link. The bot claims the token and picks up exactly where the web conversation left off — same graph, same collected data, same product matches. The bot also ingests the company's social channel posts into the same RAG knowledge base, giving the Info Agent access to announcements and promotions alongside the blog.
What It Actually Costs
Running at roughly 10,000 conversations per month (we use a regional LLM provider; costs below use GPT-4o equivalents for reference):
| Component | Monthly cost |
|---|---|
| VPS (shared with other projects) | ~$5 |
| LLM API — Orchestrator + Info Agent (GPT-4o class) | ~$42 |
| LLM API — Parser (GPT-4o-mini class, 40% cached) | ~$11 |
| Embeddings (ingestion + search) | ~$2.50 |
| Total | ~$60/month |
The comparable Intercom configuration — 5 agents, Fin AI, similar conversation volume — would run $1,400–$1,600/month, plus one-time setup costs.
The tradeoff is upfront development time. This isn't a weekend project. It took several months of focused engineering to build the conversation graph, integrate with the CRM, implement the RAG pipeline, and wire up the messaging bot. For a business that can afford the investment, the long-term economics are clear.
Which Approach Is Actually Right for You?
Four options, honest about what each one is:
Pure prompt bot ($20–$500/month):
Use this if you need simple FAQ coverage, your entire knowledge base fits in a short prompt, you have no compliance exposure, and you understand you're accepting hallucination and guardrail risk. Good for proof of concept. Not good for financial services, healthcare, legal, or any domain where the bot's wrong answer has consequences.
SaaS platform ($100–$2,500+/month):
Use this if you primarily need FAQ deflection and ticket routing, your product catalog is stable and simple, you have a support team that wants a unified inbox, and you want to be running in days. Intercom Fin and Zendesk AI Agents are genuinely good at what they do. The per-resolution pricing is painful at scale but may be worth it for teams that would otherwise need to hire.
Open-source framework (Rasa/Botpress, $0 software + infra + engineers):
Rasa gives you full control with self-hosted NLU and dialogue management. It requires a Python engineer and real training data. Enterprise licensing starts at $35,000/year. Botpress has become increasingly LLM-first with a visual builder and code escape hatches — worth evaluating if you want something between no-code and fully custom. Both require ongoing maintenance that SaaS absorbs for you.
Custom hybrid (engineering investment + $50–$100/month API costs):
Use this if you need guided multi-step qualification flows, product data that stays synchronized with an existing system, data residency or compliance requirements (GDPR, HIPAA, FCA), non-standard channel integration, or long-term cost predictability. Not a weekend project — but at scale, the economics are clear. Expect fully-loaded engineering costs of $40K–$150K to get to production.
A rough decision matrix:
| | Prompt bot | SaaS | Open source | Custom hybrid |
|---|---|---|---|---|
| Time to deploy | Hours | Days | Weeks | Months |
| Upfront cost | $0–$800 | $0–$30K | $0 + dev time | $40K–$150K |
| Monthly cost | $20–$500 | $100–$2,500+ | Infra + salaries | $50–$150 |
| Hallucination risk | High | Low–medium | Low–medium | Low (rule-based core) |
| Business logic enforcement | None | Partial | Yes | Yes |
| CRM integration | None | Via webhooks | Custom | Native |
| Session memory | None | Yes | Yes | Yes |
| Compliance / data residency | Risky | Complex | Manageable | Full control |
| Best for | FAQ / PoC | Support teams | ML-heavy use cases | Complex sales/qualification |
The Honest Tradeoffs
What hybrid gets you:
- Predictable monthly costs that don't spike with conversation volume
- Complete control over conversation flow and business logic
- Data stays in your infrastructure
- Product matching is deterministic — no hallucinated interest rates
- AI handles what it's good at: understanding natural language and synthesizing knowledge base answers
What you give up:
- A polished multi-agent inbox for your support team (build or integrate separately)
- The "set it up in a week" promise of SaaS
- Automatic updates as LLM providers improve
- Someone else's engineering team maintaining the underlying platform
The lingering questions:
The SaaS platforms are improving fast. Intercom Fin and Zendesk AI Agents are significantly better than they were a year ago. The per-resolution pricing is painful now but may normalize. For many businesses, the build option will stop making sense as the platforms mature.
The differentiator that won't go away is data and compliance. If your business collects sensitive personal information and operates under regulatory requirements, you need to know exactly where that data goes. SaaS platforms can claim GDPR compliance, but the actual data processing location and sub-processor chain is complex. For fintech, healthcare, and legal services, that complexity carries real risk.
What We Learned
Building a chatbot is not primarily a technology problem. The technology — conversation graphs, LLM APIs, vector search — is well-understood and accessible. The real work is conversation design: figuring out what to ask, in what order, with what fallbacks, and what to do when users say something you didn't anticipate.
The hybrid approach forced us to be explicit about every step of the conversation. Every node in the graph is a decision. Every quick-reply option is a commitment. That explicitness is a feature — it means the chatbot's behavior is auditable, testable, and predictable. You can read the graph and understand exactly what the bot will do in any situation.
Pure LLM chatbots are flexible in ways that can be dangerous for financial advice: they'll confidently answer questions outside their knowledge base, recommend products that don't exist, or skip qualification steps that matter for compliance. The graph prevents that class of error entirely.
The cost math works in our favor today. Whether it still will in two years depends on how LLM pricing evolves and how good the SaaS platforms get at respecting data boundaries. Both are moving fast.
Built with Django, PostgreSQL + pgvector, and vanilla JavaScript.