DEV Community

Geminate Solutions

Posted on • Originally published at geminatesolutions.com

Build AI Agents in Node.js That Handle 10K Requests/Day


Most AI agent tutorials teach you to build a toy. A chatbot that answers trivia questions. A script that summarizes text. Cool for a weekend project — useless in production.

I've spent the last 18 months shipping AI agents that handle 10,000+ requests per day for SaaS companies. Agents that book appointments, process refunds, triage support tickets, and orchestrate multi-step workflows without human intervention.

Here's everything I learned about building AI agents with Node.js that actually survive contact with real users.

What Are AI Agents (And What They're Not)

An AI agent is not a chatbot with extra steps. A chatbot responds to messages. An agent acts.

The difference comes down to three properties:

  1. Autonomy — the agent decides what to do next based on context
  2. Tool use — it can call external APIs, query databases, send emails
  3. Looping — it keeps working until the task is complete, not just one response

Think of it this way: a chatbot is a calculator. An AI agent is an accountant. The accountant decides which calculations to run, pulls the right numbers from your books, and delivers the finished report.

In March 2026, every major LLM provider supports agentic patterns natively. Claude's tool_use API, OpenAI's function calling, and Gemini's function declarations all follow the same core loop: prompt, think, act, observe, repeat.

The real challenge isn't getting an agent to work. It's getting it to work reliably at scale.

The Agent Loop: Build an AI Agent in Node.js

Every production AI agent follows this pattern:

User Input -> LLM Reasoning -> Tool Selection -> Tool Execution -> Result Observation -> (loop or respond)

Here's the minimal implementation using the Claude API:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

const tools = [
  {
    name: "search_orders",
    description: "Search customer orders by email or order ID",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Email or order ID" },
        status: { type: "string", enum: ["pending", "shipped", "delivered", "refunded"] }
      },
      required: ["query"]
    }
  },
  {
    name: "process_refund",
    description: "Issue a refund for a specific order",
    input_schema: {
      type: "object",
      properties: {
        order_id: { type: "string" },
        reason: { type: "string" },
        amount_cents: { type: "number" }
      },
      required: ["order_id", "reason"]
    }
  }
];

async function runAgent(userQuery) {
  let messages = [{ role: "user", content: userQuery }];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      system: "You are a customer support agent. Use tools to look up orders and process refunds.",
      tools,
      messages
    });

    if (response.stop_reason === "end_turn") {
      const textBlock = response.content.find(b => b.type === "text");
      return textBlock?.text || "Task completed.";
    }

    const toolBlocks = response.content.filter(b => b.type === "tool_use");
    if (toolBlocks.length === 0) break;

    messages.push({ role: "assistant", content: response.content });

    const toolResults = [];
    for (const block of toolBlocks) {
      const result = await executeTool(block.name, block.input);
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result)
      });
    }
    messages.push({ role: "user", content: toolResults });
  }

  // Safety net: reached only if the model stops without text or tool calls,
  // so the function never returns undefined
  return "Task ended without a final response.";
}

This is the foundation. The while(true) loop is intentional — the agent keeps calling tools until it has enough information to respond.

Claude API in Node.js: Tool Calling That Works

The executeTool function is where your business logic lives:

async function executeTool(name, input) {
  const toolHandlers = {
    search_orders: async ({ query, status }) => {
      const orders = await db.orders.findMany({
        where: {
          OR: [{ email: query }, { id: query }],
          ...(status && { status })
        },
        take: 10
      });
      return { orders, count: orders.length };
    },

    process_refund: async ({ order_id, reason, amount_cents }) => {
      const order = await db.orders.findUnique({ where: { id: order_id } });
      if (!order) return { error: "Order not found" };
      if (order.status === "refunded") return { error: "Already refunded" };

      const refund = await stripe.refunds.create({
        payment_intent: order.paymentIntentId,
        amount: amount_cents || undefined
      });

      await db.orders.update({
        where: { id: order_id },
        data: { status: "refunded", refundReason: reason }
      });

      return { success: true, refund_id: refund.id };
    }
  };

  const handler = toolHandlers[name];
  if (!handler) return { error: `Unknown tool: ${name}` };

  try {
    return await handler(input);
  } catch (err) {
    return { error: err.message };
  }
}

Two patterns to notice: every tool returns structured data (never raw errors), and every tool validates inputs before executing side effects. Never trust the LLM to validate — validate in the handler.
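To make "validate in the handler" concrete, here is one shape such a guard might take for process_refund. The $500 cap and the minimum reason length are invented business rules for illustration, not anything from the handlers above:

```javascript
// Hypothetical pre-flight validation for process_refund. Run this at the
// top of the handler, before touching Stripe or the database.
const MAX_REFUND_CENTS = 50_000; // assumed business limit: $500

function validateRefundInput({ order_id, reason, amount_cents }) {
  const errors = [];
  if (typeof order_id !== "string" || order_id.length === 0) {
    errors.push("order_id must be a non-empty string");
  }
  if (typeof reason !== "string" || reason.length < 5) {
    errors.push("reason must be at least 5 characters");
  }
  if (amount_cents !== undefined) {
    if (!Number.isInteger(amount_cents) || amount_cents <= 0) {
      errors.push("amount_cents must be a positive integer");
    } else if (amount_cents > MAX_REFUND_CENTS) {
      errors.push(`amount_cents exceeds limit of ${MAX_REFUND_CENTS}`);
    }
  }
  return errors.length === 0 ? { ok: true } : { ok: false, errors };
}
```

Returning `{ ok: false, errors }` as the tool result lets the model read the validation failure and ask the user for a corrected amount, instead of crashing the loop.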

Production Error Handling: Deploy AI Agents That Don't Crash

The tutorial code above will crash in production within hours. Here's what you actually need:

async function runAgentWithGuards(userQuery, maxIterations = 10) {
  let messages = [{ role: "user", content: userQuery }];
  let iterations = 0;

  while (iterations < maxIterations) {
    iterations++;
    let response;

    try {
      response = await client.messages.create({
        model: "claude-sonnet-4-6",
        max_tokens: 1024,
        system: "You are a customer support agent. Use tools to look up orders and process refunds.",
        tools,
        messages
      });
    } catch (err) {
      if (err.status === 429 || err.status === 529) {
        const delay = Math.min(1000 * Math.pow(2, iterations), 30000);
        await new Promise(r => setTimeout(r, delay));
        iterations--;
        continue;
      }
      throw new Error(`Agent failed: ${err.message}`);
    }

    if (response.stop_reason === "end_turn") {
      const text = response.content.find(b => b.type === "text");
      return {
        answer: text?.text || "Done.",
        iterations,
        tokensUsed: response.usage.input_tokens + response.usage.output_tokens
      };
    }

    const toolBlocks = response.content.filter(b => b.type === "tool_use");
    if (toolBlocks.length === 0) break;

    messages.push({ role: "assistant", content: response.content });

    const toolResults = [];
    for (const block of toolBlocks) {
      try {
        const result = await executeTool(block.name, block.input);
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify(result)
        });
      } catch (toolErr) {
        toolResults.push({
          type: "tool_result",
          tool_use_id: block.id,
          content: JSON.stringify({ error: toolErr.message }),
          is_error: true
        });
      }
    }
    messages.push({ role: "user", content: toolResults });
  }

  return { answer: "Could not complete within allowed steps.", iterations, maxedOut: true };
}

Three production patterns that save you at 3 AM:

  1. Max iterations cap. Without it, a confused agent loops forever burning API credits. We use 10 for simple agents, 25 for complex workflows.
  2. Exponential backoff on 429/529. Claude and OpenAI both rate-limit under load. Crashing on a 429 is amateur hour.
  3. Tool-level error isolation. If one tool throws, catch it and return the error as a tool result. The LLM can often recover.

Agentic AI Patterns for SaaS Applications

Shipping a single agent is the easy part. Shipping an agent system for a SaaS product means solving coordination, state management, and cost control.

Here are the patterns we use at Geminate Solutions when building AI-powered features for production SaaS:

Pattern 1: Router Agent to Specialist Agents

Don't build one mega-agent. Build a lightweight router that classifies intent and delegates to specialists. A billing agent has Stripe tools only. A technical agent has logs and deployment tools. Each specialist has a narrow tool set, which reduces hallucination and keeps token costs down.
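One way the router-to-specialist handoff might look. The intent labels, tool lists, and escalation fallback below are assumptions for the sketch; the label itself would come from a cheap classification call (for example, asking Claude Haiku to reply with exactly one label):

```javascript
// Illustrative routing table: each specialist sees only its own tools.
// The tool names under "technical" are hypothetical.
const SPECIALISTS = {
  billing: { tools: ["search_orders", "process_refund"] },
  technical: { tools: ["fetch_logs", "check_deploy_status"] },
};

function pickSpecialist(label) {
  // Unknown or low-confidence labels escalate to a human instead of guessing
  return SPECIALISTS[label] ?? { escalate: true };
}
```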

Pattern 2: Human-in-the-Loop Checkpoints

For high-stakes actions (refunds over $100, account deletions), pause the agent loop and request human approval via Slack webhook. This isn't optional for B2B SaaS — your customers will demand it.
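A minimal sketch of such a checkpoint, with invented thresholds. The Slack call is left as a commented stub because its shape depends entirely on your workspace setup:

```javascript
// Hypothetical approval gate, checked before executeTool in the agent loop.
// Thresholds are illustrative, not a recommendation.
const APPROVAL_RULES = {
  // Gate refunds over $100; a missing amount means "full refund", so gate that too
  process_refund: (input) => (input.amount_cents ?? Infinity) > 10_000,
  delete_account: () => true, // always requires sign-off
};

function requiresApproval(toolName, input) {
  const rule = APPROVAL_RULES[toolName];
  return rule ? rule(input) : false;
}

// Inside the agent loop, before executing a tool block:
//   if (requiresApproval(block.name, block.input)) {
//     await sendSlackApproval(block); // hypothetical webhook helper
//     return { answer: "Waiting for human approval.", pending: true };
//   }
```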

Pattern 3: Conversation Memory with Sliding Window

Long conversations blow up your token budget. Keep the system prompt + last 6 messages + a compressed summary of earlier messages. The Claude API SaaS integration guide covers token optimization strategies in depth.
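The sliding window can start as simple as this sketch. The summary here is a placeholder string; a real system would generate it with a cheap summarization call:

```javascript
// Keep the last `keep` messages and fold older ones into a summary stub.
// The system prompt lives outside the messages array, so it is unaffected.
function windowMessages(messages, keep = 6) {
  if (messages.length <= keep) return messages;
  const dropped = messages.length - keep;
  const summary = {
    role: "user",
    content: `[Summary of ${dropped} earlier messages omitted for brevity]`,
  };
  return [summary, ...messages.slice(-keep)];
}
```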

Hybrid Automation: When Code Meets No-Code

Not every AI workflow needs custom code. For internal operations — lead scoring, email triage, content pipelines — combining Node.js agents with tools like n8n gives you the best of both worlds.

The pattern: n8n handles the trigger (new email, form submission, scheduled job) and calls your Node.js agent via HTTP webhook. The agent does the reasoning, n8n handles the plumbing. This hybrid approach cuts development time by 40-60% for internal automation.
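The webhook side of that handoff can stay tiny. This sketch assumes a JSON body with a `query` field and reuses the runAgent loop from earlier; the agent function is passed in so the handler stays framework-agnostic (the same function works behind Express, Fastify, or a serverless runtime) and easy to stub in tests:

```javascript
// Hypothetical handler the n8n HTTP Request node would call.
async function handleAgentWebhook(body, runAgent) {
  if (!body || typeof body.query !== "string" || body.query.length === 0) {
    return { status: 400, json: { error: "Missing 'query' field" } };
  }
  try {
    const answer = await runAgent(body.query);
    return { status: 200, json: { answer } };
  } catch (err) {
    // n8n can branch on non-200 responses and route the job to a human
    return { status: 502, json: { error: "Agent failed", detail: err.message } };
  }
}
```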

AI Agent Architecture: Production Reference

Here's the architecture handling 10K+ daily requests for one of our SaaS clients:

API Gateway (Express/Fastify) - Rate limiting, auth, request queue
  |
  Router Agent (Claude Haiku) - Intent classification, fast + cheap
  |
  +-- Support Agent (Sonnet) - 6 tools (tickets, orders, FAQ)
  +-- Sales Agent (Sonnet) - 4 tools (CRM, calendar, email)
  +-- Ops Agent (Sonnet) - 8 tools (deploy, logs, alerts)
  |
  Tool Execution Layer - Postgres, Stripe, SendGrid, Slack
  |
  Observability - Token tracking, latency, cost per conversation

Key decisions: Haiku for routing (200ms, 1/10th cost), Sonnet for reasoning. Tool execution layer is shared across agents. Observability is non-negotiable — track tokens per conversation and cost per resolution.
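Token tracking can start as a pure function over the usage objects the API already returns. The per-million-token prices below are placeholders, not current Anthropic pricing — load real rates from config:

```javascript
// Illustrative cost tracker: sum usage events into a dollar figure per
// conversation. Prices are assumed USD per million tokens.
const PRICES_PER_MTOK = {
  "claude-haiku-4-5": { input: 1.0, output: 5.0 },   // placeholder rates
  "claude-sonnet-4-6": { input: 3.0, output: 15.0 }, // placeholder rates
};

function conversationCostUSD(usageEvents) {
  return usageEvents.reduce((total, { model, input_tokens, output_tokens }) => {
    const p = PRICES_PER_MTOK[model];
    if (!p) return total; // unknown model: skip rather than guess
    return total + (input_tokens * p.input + output_tokens * p.output) / 1e6;
  }, 0);
}
```

Feed it the `response.usage` fields the loop already collects, and alert when a single conversation crosses your cost-per-resolution budget.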

Deploy Your AI Agent: Production Checklist

Before you ship:

  • Max iteration cap to prevent infinite loops
  • Request timeout (30s simple, 120s multi-step)
  • Rate limit handling with exponential backoff
  • Tool error isolation with structured error returns
  • Token budget tracking with daily spend alerts
  • Input sanitization before LLM processing
  • Output guardrails validating response format
  • Human-in-the-loop for high-risk actions
  • Fallback path routing to human support on failure
  • Cost monitoring with 80% threshold alerts
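Several of these items reduce to small wrappers. For the timeout line, a sketch using Promise.race — the 30s/120s budgets come from the checklist above; everything else here is generic:

```javascript
// Usage sketch: await withTimeout(runAgentWithGuards(query), 30_000)
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer to avoid leaks
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```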

Start with a single, well-scoped agent. Customer support triage is the best first project — clear intent classification, bounded tool set, measurable ROI.

The teams shipping the fastest right now aren't the ones with the most ML engineers. They're the ones who treat AI agents as software engineering problems — with proper error handling, observability, testing, and deployment pipelines.

Build it like you'd build any production system. Because that's exactly what it is.


Yash Korat is the CEO of Geminate Solutions, a custom software development company shipping AI-powered products for startups across the US, UK, and Australia. His team integrates Claude, GPT-4, and Gemini APIs into production SaaS applications.
