
Geminate Solutions

Originally published at geminatesolutions.com

How Do You Build AI Agents in Node.js for Production?

84% of developers now use or plan to use AI in their workflow, according to GitHub's 2025 Developer Survey. But most AI agent tutorials stop at "hello world" — a chatbot that answers trivia, a script that summarizes text. Useless when you need an agent handling 10,000 support tickets per day without crashing at 3 AM.

Geminate Solutions has shipped AI agents processing 10,000+ daily requests for SaaS startups and growing businesses in the UK and worldwide. Agents that book appointments, process refunds, triage support tickets, and chain multi-step workflows without human intervention. Here's every pattern that survived production.

What Are AI Agents and How Do They Differ From Chatbots?

An AI agent is not a chatbot with extra steps. A chatbot responds to prompts; an agent acts on them.

Three properties separate agents from chatbots:

  1. Autonomy — the agent decides what action to take based on context, not a hardcoded flow
  2. Tool use — it calls external APIs, queries databases, sends emails, triggers webhooks
  3. Looping — it keeps working through multiple steps until the task is complete

Think of it this way. A chatbot is a calculator — you ask, it answers. An AI agent is an accountant who decides which calculations to run, pulls the right numbers from your books, and delivers the finished report without being told each step.

In March 2026, Claude's tool_use API, OpenAI's function calling, and Gemini's function declarations all support agentic patterns natively. The framework war is over. The real challenge? Getting agents to work reliably when real money is on the line.

How Do You Build the Core Agent Loop in Node.js?

Every production AI agent follows a single pattern. Master this loop and you can build anything from a support bot to an autonomous DevOps agent.

import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic();

const tools = [
  {
    name: "search_orders",
    description: "Search customer orders by email or order ID",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Email or order ID" },
        status: { type: "string", enum: ["pending", "shipped", "delivered", "refunded"] }
      },
      required: ["query"]
    }
  },
  {
    name: "process_refund",
    description: "Issue a refund for a specific order",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string" },
        reason: { type: "string" },
        amount_cents: { type: "number" }
      },
      required: ["order_id", "reason"]
    }
  }
];

async function runAgent(userQuery) {
  let messages = [{ role: "user", content: userQuery }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      system: "You are a customer support agent. Use tools to look up orders and process refunds.",
      tools,
      messages
    });

    if (response.stop_reason === "end_turn") {
      return response.content.find(b => b.type === "text")?.text;
    }

    const toolBlocks = response.content.filter(b => b.type === "tool_use");
    if (toolBlocks.length === 0) {
      // Model stopped without requesting tools — return whatever text it produced
      // instead of breaking out and silently returning undefined.
      return response.content.find(b => b.type === "text")?.text;
    }

    messages.push({ role: "assistant", content: response.content });

    const toolResults = [];
    for (const block of toolBlocks) {
      const result = await executeTool(block.name, block.input);
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result)
      });
    }
    messages.push({ role: "user", content: toolResults });
  }
}

The while(true) loop is deliberate. The agent keeps calling tools — searching orders, checking statuses, issuing refunds — until it has everything needed to respond. One user message can trigger five tool calls before the agent says "Your refund has been processed."

How Should You Handle Tool Calling With Claude API in Node.js?

The agent loop above is generic plumbing. The executeTool function is where your business logic lives. How do you structure it so the agent can recover from errors instead of crashing?

async function executeTool(name, input) {
  const toolHandlers = {
    search_orders: async ({ query, status }) => {
      const orders = await db.orders.findMany({
        where: {
          OR: [{ email: query }, { id: query }],
          ...(status && { status })
        },
        take: 10
      });
      return { orders, count: orders.length };
    },

    process_refund: async ({ order_id, reason, amount_cents }) => {
      const order = await db.orders.findUnique({ where: { id: order_id } });
      if (!order) return { error: "Order not found" };
      if (order.status === "refunded") return { error: "Already refunded" };

      const refund = await stripe.refunds.create({
        payment_intent: order.paymentIntentId,
        amount: amount_cents || undefined
      });

      await db.orders.update({
        where: { id: order_id },
        data: { status: "refunded", refundReason: reason }
      });
      return { success: true, refund_id: refund.id };
    }
  };

  const handler = toolHandlers[name];
  if (!handler) return { error: `Unknown tool: ${name}` };

  try {
    return await handler(input);
  } catch (err) {
    return { error: err.message };
  }
}

Two patterns worth noting. Every tool returns structured data — never raw stack traces. If a refund fails, the agent gets { error: "Already refunded" } and explains it to the user naturally. And every tool validates before executing side effects. The refund handler checks existence and status before touching Stripe. Never trust the LLM to validate inputs.

What Makes AI Agent Error Handling Production-Ready?

Tutorial code crashes in production within hours. What's the difference between a demo agent and one that handles real SaaS traffic?

async function runAgentWithGuards(userQuery, maxIterations = 10) {
  let messages = [{ role: "user", content: userQuery }];
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    let response;

    try {
      response = await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        system: "You are a customer support agent. Use tools to look up orders and process refunds.",
        tools,
        messages
      });
    } catch (err) {
      if (err.status === 429) {
        const delay = Math.min(1000 * Math.pow(2, iterations), 30000);
        await new Promise(r => setTimeout(r, delay));
        iterations--;
        continue;
      }
      throw new Error("Agent failed: " + err.message);
    }

    if (response.stop_reason === "end_turn") {
      return {
        answer: response.content.find(b => b.type === "text")?.text,
        iterations,
        tokensUsed: response.usage.input_tokens + response.usage.output_tokens
      };
    }

    const toolBlocks = response.content.filter(b => b.type === "tool_use");
    if (toolBlocks.length === 0) {
      // Model stopped without requesting tools — surface any text it produced
      // rather than falling through to the maxed-out response.
      return {
        answer: response.content.find(b => b.type === "text")?.text,
        iterations
      };
    }

    messages.push({ role: "assistant", content: response.content });

    const toolResults = [];
    for (const block of toolBlocks) {
      try {
        const result = await executeTool(block.name, block.input);
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify(result) });
      } catch (toolErr) {
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: JSON.stringify({error: toolErr.message}), is_error: true });
      }
    }
    messages.push({ role: "user", content: toolResults });
  }

  return { answer: "Could not complete within allowed steps.", maxedOut: true };
}

Three patterns that prevent 3 AM pages:

Max iteration cap. Without it, a confused agent loops forever burning API credits. Use 10 for simple agents, 25 for complex multi-step workflows. One client's uncapped agent ran 847 iterations on a single malformed request — $23 in tokens before anyone noticed.

Exponential backoff on 429/529. Claude and OpenAI both rate-limit aggressively under load. Crashing on a rate limit is amateur hour. Back off, retry, succeed.

Tool-level error isolation. If one tool throws an exception, catch it and return the error as a tool result. The LLM can often recover — "That order wasn't found. Can you double-check the order number?"

What Are the Best Agentic AI Architecture Patterns for SaaS?

Shipping one agent is straightforward. Shipping an agent system for a production SaaS product means solving coordination, state management, and cost control simultaneously.

Pattern 1: Router Agent to Specialist Agents. Don't build one mega-agent with 30 tools. Build a lightweight router (Claude Haiku — fast, 1/10th the cost) that classifies intent and delegates to specialists. A billing agent gets Stripe tools only. A support agent gets ticket tools. A deployment agent gets infrastructure tools. Each specialist's narrow tool set reduces hallucination and keeps token costs predictable.
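The router-to-specialist handoff can be sketched in a few lines. The model call is injected so the routing logic stays testable; the specialist registry, the Haiku model id, and the `callModel` signature are illustrative assumptions, not production code:

```javascript
// Each specialist gets only the tools it needs — a narrow tool set
// reduces hallucination and keeps token costs predictable.
const specialists = {
  billing: { tools: ["search_orders", "process_refund"] },
  support: { tools: ["search_tickets", "faq_lookup"] }
};

// Normalize the router model's one-word reply to a known specialist key.
function pickSpecialist(label) {
  const key = String(label).trim().toLowerCase();
  return key in specialists ? key : "support"; // safe default
}

// callModel(model, systemPrompt, userQuery) -> string.
// In production, inject a thin wrapper around client.messages.create here.
async function route(userQuery, callModel) {
  const label = await callModel(
    "claude-haiku-4-5", // assumed Haiku model id — cheap, fast classification
    "Reply with exactly one word: billing or support.",
    userQuery
  );
  return pickSpecialist(label);
}
```

The router's reply is normalized defensively — LLMs occasionally add whitespace or casing, and an unrecognized label should fall back to a safe default rather than crash.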

Pattern 2: Human-in-the-Loop Checkpoints. For high-stakes actions — refunds over $100, account deletions, data exports — pause the agent loop and fire a Slack webhook for human approval. Non-negotiable for B2B SaaS. Your enterprise customers will ask about this in security questionnaires.
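A checkpoint can be a thin wrapper around executeTool. The rule thresholds, tool names, and the `notifySlack` function below are assumptions for illustration — adapt them to your own risk policy:

```javascript
import { randomUUID } from 'node:crypto';

// Per-tool rules deciding which calls need a human sign-off.
const APPROVAL_RULES = {
  process_refund: input => (input.amount_cents ?? 0) > 10000, // refunds over $100
  delete_account: () => true                                   // always gated
};

function needsApproval(toolName, input) {
  const rule = APPROVAL_RULES[toolName];
  return rule ? rule(input) : false;
}

// High-risk calls are parked and a Slack webhook fires; the agent gets back
// a structured "pending" result it can explain to the user naturally.
async function executeToolWithApproval(name, input, { executeTool, notifySlack }) {
  if (needsApproval(name, input)) {
    const ticketId = randomUUID();
    await notifySlack({ ticketId, tool: name, input }); // human approves async
    return { pending_approval: true, ticket_id: ticketId };
  }
  return executeTool(name, input);
}
```

Because the pending result is just another structured tool result, the agent loop needs no changes — the LLM reads it and tells the customer the action is awaiting approval.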

Pattern 3: Sliding Window Memory. Long conversations destroy token budgets. Keep the system prompt plus the last 6 message exchanges plus a compressed summary of everything before that. The Claude API SaaS integration guide covers token optimization strategies that cut costs 40-60% on high-volume agents.
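A minimal sliding-window trim looks like this. It assumes the actual summarization (e.g. a cheap Haiku call that compresses old turns) happens elsewhere; here the fold is just counted:

```javascript
// Keep the last N exchanges verbatim; everything older gets folded into
// a running summary carried in the system prompt.
function slideWindow(messages, summary, maxExchanges = 6) {
  // One exchange = user message + assistant reply = 2 entries.
  const keep = maxExchanges * 2;
  if (messages.length <= keep) return { messages, summary };

  const dropped = messages.slice(0, messages.length - keep);
  const recent = messages.slice(-keep);

  // In production, fold `dropped` into the summary with a cheap model call;
  // this placeholder just records how much was compressed.
  const newSummary = `${summary} [+${dropped.length} earlier messages compressed]`;
  return { messages: recent, summary: newSummary.trim() };
}
```

The system prompt then becomes the base prompt plus "Conversation so far: " and the summary, so the agent keeps long-range context without paying for every old token on every turn.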

Can You Combine AI Agents With No-Code Automation?

Not every AI workflow needs custom code. For internal operations — lead scoring, email triage, content pipelines, CRM updates — combining Node.js agents with n8n workflow automation cuts development time by 40-60%.

The pattern works like this: n8n handles the trigger (new email arrives, form gets submitted, cron job fires) and calls your Node.js agent via HTTP webhook. The agent does the reasoning and decision-making. n8n handles the plumbing — moving data between Slack, Google Sheets, Notion, and 400+ other integrations.

Why not just build everything in n8n? Because n8n's AI nodes can't do multi-step reasoning with tool calling. And why not build everything in Node.js? Because writing Slack-to-Sheets-to-CRM integration code from scratch is a waste of engineering time when n8n does it in a drag-and-drop workflow.
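The handoff boils down to one HTTP endpoint. The webhook path, payload shape, and shared-secret header below are assumptions — adapt them to whatever your n8n HTTP Request node sends:

```javascript
import http from 'node:http';

// Pure handler: validate the n8n payload, then hand off to the agent loop.
async function handleN8nTrigger(payload, { runAgent, secret, providedSecret }) {
  if (providedSecret !== secret) return { status: 401, body: { error: "bad secret" } };
  if (!payload?.query) return { status: 400, body: { error: "missing query" } };
  const answer = await runAgent(payload.query);
  return { status: 200, body: { answer } };
}

// Thin HTTP wiring; point the n8n HTTP Request node at this endpoint.
function createWebhookServer(runAgent, secret) {
  return http.createServer((req, res) => {
    let raw = "";
    req.on("data", chunk => (raw += chunk));
    req.on("end", async () => {
      const result = await handleN8nTrigger(JSON.parse(raw || "{}"), {
        runAgent,
        secret,
        providedSecret: req.headers["x-webhook-secret"]
      });
      res.writeHead(result.status, { "content-type": "application/json" });
      res.end(JSON.stringify(result.body));
    });
  });
}
// createWebhookServer(runAgent, process.env.N8N_SECRET).listen(3000);
```

Keeping the handler pure (no HTTP objects in its signature) means the validation and handoff logic can be unit-tested without spinning up a server.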

What Does a Production AI Agent Architecture Look Like?

Here's the reference architecture handling 10K+ daily requests for a SaaS client:

API Gateway (Express/Fastify) — rate limiting, auth, request queue
  |
  Router Agent (Claude Haiku) — intent classification in ~200ms
  |
  ├── Support Agent (Sonnet) — 6 tools: tickets, orders, FAQ, refunds
  ├── Sales Agent (Sonnet) — 4 tools: CRM, calendar, email templates
  └── Ops Agent (Sonnet) — 8 tools: deploys, logs, alerts, rollbacks
  |
  Tool Execution Layer — Postgres, Stripe, SendGrid, Slack (shared)
  |
  Observability — tokens per conversation, latency, cost per resolution

Haiku for routing, Sonnet for reasoning. The router classifies intent in 200ms at 1/10th the cost. Only specialist agents use the more capable model. This single decision cut one client's monthly AI spend from $4,200 to $1,100.

Shared tool execution layer. All agents use the same database client, same Stripe instance, same SendGrid connection. Keeps connection pools manageable and avoids the "each microservice has its own DB connection" trap.

Observability isn't optional. Track tokens per conversation, cost per resolution, and error rates per tool. Without this, you're spending money with no idea which conversations cost $0.02 and which cost $2.00.
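A per-conversation cost ledger is a few lines of bookkeeping. The per-million-token prices below are placeholder assumptions — plug in your provider's current rates:

```javascript
// USD per million tokens — placeholder values, verify against current pricing.
const PRICES = {
  "claude-haiku-4-5": { input: 1, output: 5 },
  "claude-sonnet-4-6": { input: 3, output: 15 }
};

// Call this after every client.messages.create with response.usage.
// `ledger` is a Map keyed by conversation id; swap for Redis/Postgres in prod.
function recordUsage(ledger, conversationId, model, usage) {
  const p = PRICES[model] ?? { input: 0, output: 0 };
  const cost =
    (usage.input_tokens * p.input + usage.output_tokens * p.output) / 1e6;
  const entry = ledger.get(conversationId) ?? { tokens: 0, costUsd: 0 };
  entry.tokens += usage.input_tokens + usage.output_tokens;
  entry.costUsd += cost;
  ledger.set(conversationId, entry);
  return entry;
}
```

With this in place, a daily job can sort the ledger by `costUsd` and flag the $2.00 conversations that deserve a prompt or tool-set review.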

How Do You Deploy an AI Agent to Production?

Before shipping, hit every item. Skip one and you'll learn about it from a customer, not from your test suite:

  • Max iteration cap — prevent infinite loops and runaway spend
  • Request timeout — 30s for simple queries, 120s for multi-step workflows
  • Rate limit handling — exponential backoff on 429/529 responses
  • Tool error isolation — catch at tool level, return structured errors
  • Token budget tracking — log per-request usage, set daily spend alerts at 80%
  • Input sanitization — clean user input before it reaches the LLM
  • Human-in-the-loop — any action above your risk threshold gets approval
  • Fallback path — when the agent fails, route to human support gracefully
  • Cost monitoring — daily and monthly caps with automatic notification
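The timeout and fallback items from the checklist combine into one wrapper. The timeout values mirror the guidance above; `escalateToHuman` is an assumed hook into your support tooling:

```javascript
// Race the agent against a deadline; clear the timer either way.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("agent timeout")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Fallback path: a timed-out or crashed agent routes to a human,
// and the user always gets a graceful answer.
async function answerOrEscalate(userQuery, { runAgent, escalateToHuman, ms = 30000 }) {
  try {
    return await withTimeout(runAgent(userQuery), ms);
  } catch (err) {
    await escalateToHuman(userQuery, err.message);
    return { answer: "I've passed this to our support team — they'll follow up shortly." };
  }
}
```

Use 30s for simple queries and raise `ms` to 120s for multi-step workflows, matching the checklist above.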

Start with customer support triage. Clear intent classification, bounded tool set, measurable ROI (tickets deflected per day). Once that runs reliably, add specialist agents one at a time. Every agent you add compounds the system's capability.

The teams shipping fastest right now aren't the ones with the most ML engineers. They're the ones treating AI agents as software engineering problems — with proper error handling, observability, testing, and deployment pipelines. Build it like you'd build any production system.


Geminate Solutions is a custom software development company that has shipped 50+ web, mobile, and AI-powered products for startups and growing businesses worldwide. From EdTech platforms serving 250K+ daily active users to IoT systems tracking 30,000+ vehicles — the team delivers production-ready software from week one. Explore services | View portfolio
