Last month I shipped AgentDesk — a SaaS platform where consultants and agencies can run pre-built AI agents for client intake, proposal generation, and reporting. The stack is Next.js 16, Claude Sonnet 4 with tool use, Stripe subscriptions, and Vercel. This post walks through the architecture decisions and actual code that powers it.
The Problem
Every consulting firm does the same repetitive work: qualifying inbound leads, writing proposals from call notes, generating client reports. These tasks follow predictable patterns, which makes them perfect candidates for AI agents — not chatbots, but structured agents with defined tools and outputs.
I wanted to build a platform where each agent has a specific job, uses typed tools, and produces structured output that plugs directly into existing workflows.
Architecture Overview
The project uses Next.js 16.2.1 with the App Router, React 19, TypeScript, and Tailwind CSS 4. The core is a central agent engine that defines agent configurations and orchestrates Claude API calls. The API layer exposes trial and authenticated endpoints, Stripe handles billing, and the whole thing deploys to Vercel.
src/
├── app/
│   ├── api/
│   │   ├── agents/[agentId]/route.ts   # Authenticated agent runs
│   │   ├── trial/[agentId]/route.ts    # Free trial with rate limiting
│   │   ├── billing/checkout/route.ts   # Stripe checkout sessions
│   │   └── billing/webhook/route.ts    # Stripe webhook handler
│   ├── dashboard/
│   │   ├── agents/[agentId]/page.tsx   # Individual agent UI
│   │   ├── layout.tsx                  # Dashboard shell
│   │   └── page.tsx                    # Agent list
│   └── page.tsx                        # Landing page
├── lib/
│   ├── agent-engine.ts                 # Core agent definitions + runner
│   └── stripe.ts                       # Stripe config + plan definitions
The Agent Engine
The heart of the platform is agent-engine.ts. Each agent is defined as a typed AgentConfig object: a system prompt that defines its behavior, a set of Claude tools (function-calling schemas), token limits, and a temperature setting tuned per agent type.
export interface AgentConfig {
  id: string;
  name: string;
  systemPrompt: string;
  tools: Anthropic.Messages.Tool[];
  maxTokens: number;
  temperature: number;
}
Here's a condensed look at how the Intake Agent is defined. The system prompt instructs it to qualify leads on a 1-10 scale, suggest a next action, and draft a personalized response. The qualify_lead tool gives Claude a structured way to process the inquiry:
export const INTAKE_AGENT: AgentConfig = {
  id: "intake",
  name: "Intake Agent",
  systemPrompt: `You are a professional client intake agent...
Your job is to:
1. Read incoming inquiries (emails, form submissions, messages)
2. Qualify the lead based on: budget indicators, timeline urgency,
   service fit, company size
3. Generate a personalized, professional response
4. Score the lead on a 1-10 scale with reasoning`,
  tools: [
    {
      name: "qualify_lead",
      description:
        "Analyze an incoming inquiry and produce a qualification score, suggested response, and next action.",
      input_schema: {
        type: "object" as const,
        properties: {
          inquiry_text: {
            type: "string",
            description: "The full text of the incoming inquiry",
          },
          source: {
            type: "string",
            description: "Where the inquiry came from (email, form, chat)",
          },
          sender_info: {
            type: "string",
            description: "Any available info about the sender",
          },
        },
        required: ["inquiry_text"],
      },
    },
  ],
  maxTokens: 2048,
  temperature: 0.3,
};
Temperature tuning matters. The Intake Agent runs at 0.3 for consistent, reliable lead scoring. The Proposal Agent runs at 0.4 — slightly more creative for writing proposals, but still grounded. The Report Agent is back at 0.3 because reports need to be precise, not creative.
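Configs like these live in a registry so the runner can resolve them by ID. The post doesn't show the registry, but a minimal sketch looks like this; the AgentConfig shape is reduced to the fields used here, and PROPOSAL_AGENT and REPORT_AGENT stand in for the real config objects:

```typescript
// Minimal registry sketch: resolve an agent config by its ID.
// AgentConfig is trimmed down, and the non-intake agents are
// hypothetical stand-ins for the real definitions.
interface AgentConfig {
  id: string;
  name: string;
}

const INTAKE_AGENT: AgentConfig = { id: "intake", name: "Intake Agent" };
const PROPOSAL_AGENT: AgentConfig = { id: "proposal", name: "Proposal Agent" };
const REPORT_AGENT: AgentConfig = { id: "report", name: "Report Agent" };

// Keyed by ID so route handlers can look agents up from the URL segment.
const AGENTS: Record<string, AgentConfig> = {
  [INTAKE_AGENT.id]: INTAKE_AGENT,
  [PROPOSAL_AGENT.id]: PROPOSAL_AGENT,
  [REPORT_AGENT.id]: REPORT_AGENT,
};

function getAgent(agentId: string): AgentConfig | undefined {
  return AGENTS[agentId];
}
```

Returning undefined (rather than throwing) lets the caller decide whether a bad ID is a 404 or a programming error.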
The runAgent function is the orchestrator. It accepts an agent ID, user input, and optional context, then calls Claude's Messages API:
export async function runAgent(
  agentId: string,
  input: string,
  context?: Record<string, unknown>
): Promise<AgentTask> {
  const agent = getAgent(agentId);
  if (!agent) throw new Error(`Agent "${agentId}" not found`);

  const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

  const contextStr = context
    ? `\n\nAdditional context:\n${JSON.stringify(context, null, 2)}`
    : "";

  // Create the task record up front (condensed here; the full version
  // also records the pending -> running transition).
  const task: AgentTask = {
    id: crypto.randomUUID(),
    agentId,
    status: "running",
    createdAt: Date.now(),
  };

  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: agent.maxTokens,
    temperature: agent.temperature,
    system: agent.systemPrompt,
    tools: agent.tools, // expose the agent's typed tools to Claude
    messages: [{ role: "user", content: `${input}${contextStr}` }],
  });

  const textBlocks = response.content.filter(
    (block): block is Anthropic.Messages.TextBlock => block.type === "text"
  );

  return {
    ...task,
    output: textBlocks.map((b) => b.text).join("\n\n"),
    status: "completed",
  };
}
Each task gets a unique ID, tracks its lifecycle (pending → running → completed/failed), and returns structured output. The type guard on TextBlock handles Claude's multi-block response format cleanly.
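The AgentTask type isn't shown in the post; a plausible shape, inferred from the lifecycle described above (the real interface may carry more fields):

```typescript
import { randomUUID } from "node:crypto";

// A plausible AgentTask shape, inferred from the prose: unique ID,
// lifecycle status, and structured output once the run finishes.
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface AgentTask {
  id: string;
  agentId: string;
  status: TaskStatus;
  output?: string;
  createdAt: number;
}

function createTask(agentId: string): AgentTask {
  return {
    id: randomUUID(), // unique per run
    agentId,
    status: "pending",
    createdAt: Date.now(),
  };
}
```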
Trial Rate Limiting Without a Database
For the free trial, I wanted users to try agents without signing up — but I also needed to prevent abuse. The solution: an in-memory Map with IP-based tracking, scoped to 5 requests per 24-hour window.
const trialUsage = new Map<string, { count: number; resetAt: number }>();
const TRIAL_LIMIT = 5;
const WINDOW_MS = 24 * 60 * 60 * 1000; // 24 hours

function checkTrialLimit(ip: string): { allowed: boolean; remaining: number } {
  const now = Date.now();
  const entry = trialUsage.get(ip);
  if (!entry || now > entry.resetAt) {
    trialUsage.set(ip, { count: 0, resetAt: now + WINDOW_MS });
    return { allowed: true, remaining: TRIAL_LIMIT };
  }
  if (entry.count >= TRIAL_LIMIT) {
    return { allowed: false, remaining: 0 };
  }
  return { allowed: true, remaining: TRIAL_LIMIT - entry.count };
}
Is this production-grade rate limiting? No. The Map resets on every Vercel cold start. But for a trial feature, that's actually a feature — it's forgiving, and a hard-core abuser who discovers the reset behavior still isn't costing much since each request is a single Claude call. When you need actual metering, move to Redis or a database. For launch, this took 20 minutes to build and it works.
The trial endpoint also caps input at 3,000 characters to prevent prompt-stuffing:
const trimmedInput = body.input.slice(0, 3000);
const task = await runAgent(agentId, trimmedInput, body.context);
const remainingAfter = recordUsage(ip);
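recordUsage isn't shown in the post. A minimal sketch consistent with checkTrialLimit (the map and constants are reproduced here so the snippet stands alone):

```typescript
// Sketch of recordUsage: increment the counter for this IP and return
// how many trial runs remain. Map and constants mirror checkTrialLimit.
const TRIAL_LIMIT = 5;
const WINDOW_MS = 24 * 60 * 60 * 1000;
const trialUsage = new Map<string, { count: number; resetAt: number }>();

function recordUsage(ip: string): number {
  const now = Date.now();
  const entry = trialUsage.get(ip);
  if (!entry || now > entry.resetAt) {
    // First run in a fresh window.
    trialUsage.set(ip, { count: 1, resetAt: now + WINDOW_MS });
    return TRIAL_LIMIT - 1;
  }
  entry.count += 1;
  return Math.max(0, TRIAL_LIMIT - entry.count);
}
```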
When the limit is hit, the response includes an upgrade URL:
if (!allowed) {
  return NextResponse.json(
    {
      error: "trial_limit_reached",
      message: "You've used all 5 free trial runs. Upgrade to keep using AgentDesk.",
      upgradeUrl: "https://buy.stripe.com/...",
    },
    { status: 429 }
  );
}
A cleanup interval runs every 10 minutes to prune expired entries and prevent the Map from growing unbounded — a detail that's easy to forget and causes a slow memory leak if you skip it.
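The pruning loop might look like this; pruneExpired is a hypothetical name, and the `now` parameter exists so the logic is testable without a real clock:

```typescript
// Sketch of the cleanup pass: delete entries whose 24-hour window has
// expired so the map can't grow without bound between cold starts.
const trialUsage = new Map<string, { count: number; resetAt: number }>();

function pruneExpired(now: number = Date.now()): void {
  for (const [ip, entry] of trialUsage) {
    if (now > entry.resetAt) trialUsage.delete(ip);
  }
}

// Run every 10 minutes. In a long-lived Node process, unref() keeps
// the interval from holding the process open.
const timer = setInterval(pruneExpired, 10 * 60 * 1000);
if (typeof (timer as any).unref === "function") (timer as any).unref();
```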
Stripe Subscription Tiers
Billing is structured as three tiers with clear capability gates. The Starter plan ($99/month) gives access to the Intake Agent only. Professional ($349/month) unlocks all three agents. Agency ($799/month) adds unlimited tasks:
export const PLANS = {
  starter: {
    name: "Starter",
    priceId: process.env.STRIPE_PRICE_STARTER!,
    agents: ["intake"],
    monthlyPrice: 99,
    taskLimit: 100,
  },
  professional: {
    name: "Professional",
    priceId: process.env.STRIPE_PRICE_PROFESSIONAL!,
    agents: ["intake", "proposal", "report"],
    monthlyPrice: 349,
    taskLimit: 500,
  },
  agency: {
    name: "Agency",
    priceId: process.env.STRIPE_PRICE_AGENCY!,
    agents: ["intake", "proposal", "report"],
    monthlyPrice: 799,
    taskLimit: -1, // unlimited
  },
} as const;
Price IDs come from environment variables (never hardcoded), and the as const assertion gives us full type narrowing on PlanId. The agents array doubles as an access control list — the API checks whether the user's plan includes the requested agent before running it.
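That access check is a one-liner once PlanId is derived from the const object. A sketch, with PLANS trimmed to just the agents field and planAllowsAgent as a hypothetical helper name:

```typescript
// Sketch of the plan-based access check. PLANS is reduced to the
// agents field; `as const` lets PlanId be derived from the keys.
const PLANS = {
  starter: { agents: ["intake"] },
  professional: { agents: ["intake", "proposal", "report"] },
  agency: { agents: ["intake", "proposal", "report"] },
} as const;

type PlanId = keyof typeof PLANS; // "starter" | "professional" | "agency"

function planAllowsAgent(planId: PlanId, agentId: string): boolean {
  return (PLANS[planId].agents as readonly string[]).includes(agentId);
}
```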
The Stripe integration uses SDK v21 with the latest API version. Checkout sessions are created server-side, and a webhook handler processes checkout.session.completed and customer.subscription.updated events to keep plan status in sync.
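The dispatch inside that webhook handler might be shaped like this. This is a sketch, not the post's actual handler: the event is reduced to the fields read here, signature verification (stripe.webhooks.constructEvent with STRIPE_WEBHOOK_SECRET) is assumed to happen before this function runs, applyPlan is a hypothetical persistence hook, and the subscription-deleted branch is an addition the post doesn't mention:

```typescript
// Sketch of webhook event dispatch. MinimalEvent mirrors just the
// Stripe event fields this logic reads; applyPlan is a hypothetical
// hook that persists the customer's plan status.
interface MinimalEvent {
  type: string;
  data: { object: { customer: string; metadata?: { planId?: string } } };
}

function handleStripeEvent(
  event: MinimalEvent,
  applyPlan: (customerId: string, planId: string | null) => void
): void {
  switch (event.type) {
    case "checkout.session.completed":
    case "customer.subscription.updated":
      applyPlan(
        event.data.object.customer,
        event.data.object.metadata?.planId ?? null
      );
      break;
    case "customer.subscription.deleted":
      applyPlan(event.data.object.customer, null); // downgrade on cancel
      break;
  }
}
```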
Deployment
The whole thing deploys to Vercel with zero configuration beyond environment variables. The App Router's file-based routing means every file in src/app/api/ is automatically a serverless function. No Express server, no Docker, no infrastructure management.
Key environment variables:
- ANTHROPIC_API_KEY — Claude API access
- STRIPE_SECRET_KEY — Stripe billing
- STRIPE_PRICE_STARTER, STRIPE_PRICE_PROFESSIONAL, STRIPE_PRICE_AGENCY — plan price IDs
- STRIPE_WEBHOOK_SECRET — webhook signature verification
One Vercel gotcha: when setting env vars, use printf instead of echo to pipe values. echo adds a trailing newline that corrupts Stripe price IDs and causes silent checkout failures.
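You can see the difference in any shell — echo appends a newline byte, printf emits exactly the characters you give it:

```shell
# "price_123" is 9 characters. echo appends a trailing "\n"; printf does not.
printf 'price_123' | wc -c   # 9 bytes
echo 'price_123' | wc -c     # 10 bytes (the 10th is the newline)
```

That stray 10th byte ends up inside the stored env var, and Stripe rejects the malformed price ID at checkout time.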
What I'd Do Differently
Persistent rate limiting from day one. The in-memory approach works for launch, but the moment you scale past one serverless instance, users get free resets. Vercel KV or Upstash Redis is the obvious fix and takes about 30 minutes to swap in.
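One way to make that swap painless later is to hide the storage behind a small interface now. A sketch, assuming a generic key-value Store that an in-memory map satisfies locally and a KV/Redis client would satisfy in production (the checkAndRecord and memoryStore names are hypothetical):

```typescript
// Sketch: store-backed trial limiting behind a pluggable interface so
// the in-memory version can be swapped for Vercel KV or Upstash Redis.
interface Store {
  get(key: string): Promise<{ count: number; resetAt: number } | null>;
  set(key: string, value: { count: number; resetAt: number }): Promise<void>;
}

const WINDOW_MS = 24 * 60 * 60 * 1000;
const LIMIT = 5;

async function checkAndRecord(store: Store, ip: string): Promise<boolean> {
  const now = Date.now();
  const entry = await store.get(`trial:${ip}`);
  if (!entry || now > entry.resetAt) {
    await store.set(`trial:${ip}`, { count: 1, resetAt: now + WINDOW_MS });
    return true;
  }
  if (entry.count >= LIMIT) return false;
  await store.set(`trial:${ip}`, { ...entry, count: entry.count + 1 });
  return true;
}

// In-memory Store for local development and tests.
function memoryStore(): Store {
  const m = new Map<string, { count: number; resetAt: number }>();
  return {
    async get(k) { return m.get(k) ?? null; },
    async set(k, v) { m.set(k, v); },
  };
}
```

Note this sketch isn't atomic; a real Redis version would use an atomic increment (e.g. INCR with an expiry) to avoid races between concurrent requests.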
Streaming responses. The current implementation waits for the full Claude response before returning. For the Proposal Agent, which generates 4,096 tokens of structured markdown, that's a noticeable wait. Server-Sent Events via the App Router's streaming support would improve perceived latency significantly.
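The SSE plumbing for that is small if the token source is abstracted as an async iterable — the route handler would feed it Anthropic's streaming text deltas, while tests can feed it anything. A sketch (sseResponse is a hypothetical helper, not the platform's code):

```typescript
// Sketch: wrap a stream of text chunks in a Server-Sent Events
// response. In a route handler, `chunks` would be the text deltas
// from Anthropic's streaming API; here it's any async iterable.
function sseResponse(chunks: AsyncIterable<string>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      for await (const text of chunks) {
        // Each SSE frame is "data: <payload>\n\n"; JSON-encode so
        // newlines inside chunks can't break the framing.
        controller.enqueue(encoder.encode(`data: ${JSON.stringify(text)}\n\n`));
      }
      controller.enqueue(encoder.encode("data: [DONE]\n\n"));
      controller.close();
    },
  });
  return new Response(stream, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```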
Multi-turn agent conversations. Right now each agent call is a single turn. The next iteration will add conversation history so agents can ask clarifying questions before producing output — especially useful for the Intake Agent when an inquiry is ambiguous.
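The core change is threading an Anthropic-style message array through each run instead of a single user string. A minimal sketch of the history handling (appendTurn is a hypothetical helper):

```typescript
// Sketch: accumulate conversation history as alternating user and
// assistant messages, matching the Messages API's message shape.
type Role = "user" | "assistant";
interface ChatMessage { role: Role; content: string; }

function appendTurn(
  history: ChatMessage[],
  userInput: string,
  assistantReply: string
): ChatMessage[] {
  // Return a new array so stored task state stays immutable.
  return [
    ...history,
    { role: "user", content: userInput },
    { role: "assistant", content: assistantReply },
  ];
}
```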
Try It
AgentDesk is live with a free trial — no signup required. Paste a client inquiry into the Intake Agent, feed call notes into the Proposal Agent, or drop project metrics into the Report Agent and see what comes back.
Try the agents at agentdesk.thewedgemethodai.com/dashboard
If you're building something similar, the key architectural insight is this: define agents as typed configuration objects, not ad-hoc prompt strings scattered across your codebase. It makes agents testable and composable, and it lets you add new ones without touching the engine.