NeuroLink AI

Posted on • Originally published at blog.neurolink.ink

How to Build a Cost-Optimized AI Pipeline (Without Learning the Hard Way)

Let me tell you about the $400 mistake.

A team I know was building a document analysis feature. They wired up Claude Opus for every request — because Opus gives the best results, right? It worked great in testing with 20 documents. They shipped it.

A week later, a batch job processed 3,000 documents. Each one called Opus. Each call cost roughly $0.13. The bill: $390. The on-call alert came in at 2am.

The problem wasn't Claude Opus. It's a great model. The problem was that nobody had put any cost awareness into the pipeline. No tracking. No limits. No alerts. Just API calls flying into the void until the bill arrived.

This post is about doing it the right way from the start: per-call cost tracking, budget limits that actually block requests, analytics you can inspect in code, and model selection based on real cost data.

All of this is built into NeuroLink.

The Real Cost of Ignorance

Most teams discover their AI costs the same way: the monthly bill. By then, the damage is done.

The problem is architectural. Standard AI SDKs give you this:

// What most developers write
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: prompt }],
});
// How much did that cost? ¯\_(ツ)_/¯

You get a response. You have no idea what it cost. Token counts are buried in response.usage if you remember to check, and turning them into a dollar figure means looking up the current pricing and multiplying by hand.

Multiply this across thousands of requests, multiple models, and multiple providers, and cost visibility becomes a spreadsheet exercise after the fact — not an operational control in real time.
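To see why this turns into a spreadsheet exercise, here's roughly what hand-rolled tracking looks like on top of a raw SDK. A sketch only: the pricing constants and the manualCost helper are mine, and every number has to be verified against the provider's current pricing page.

```typescript
// Hand-rolled cost tracking: every number here must be looked up and
// kept in sync with the provider's pricing page manually.
const PRICE_PER_MILLION_USD: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 }, // verify against current pricing
};

function manualCost(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICE_PER_MILLION_USD[model];
  if (!p) throw new Error(`No pricing data for ${model}`);
  return (inputTokens * p.input + outputTokens * p.output) / 1_000_000;
}

// After every single call, you would have to remember to do something like:
// manualCost("gpt-4o", response.usage.prompt_tokens, response.usage.completion_tokens)
console.log(manualCost("gpt-4o", 5000, 1200).toFixed(4));
```

Now multiply that maintenance burden by every model and provider you use, and by every pricing change they ship.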

NeuroLink's Approach: Cost as a First-Class Citizen

NeuroLink ships with a pricing utility (src/lib/utils/pricing.ts) that contains current pricing data for every supported provider and model. Cost is calculated automatically after every API call and included in the result.

Here's the difference:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Analyze this contract for liability clauses" },
  provider: "anthropic",
  model: "claude-opus-4-6",
});

// Cost is right here, no calculation needed
console.log(`Cost: $${result.analytics?.cost}`);
console.log(`Input tokens: ${result.usage?.input}`);
console.log(`Output tokens: ${result.usage?.output}`);
console.log(`Total tokens: ${result.usage?.totalTokens}`);
console.log(`Response time: ${result.responseTime}ms`);

result.analytics.cost is a USD float. You know exactly what each call cost, at call time, without touching a spreadsheet.

The Pricing Table

NeuroLink's pricing data covers all 13 supported providers. As of February 2026, the table includes:

| Provider | Models | Notes |
| --- | --- | --- |
| Anthropic | claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5 | Includes cache read/write tokens |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o1-mini | |
| Google | gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash | |

Anthropic's pricing is particularly nuanced because they have separate rates for cache creation and cache read tokens. NeuroLink tracks these separately:

import { calculateCost } from "@juspay/neurolink";

// Manual cost calculation (if you need it outside generate())
const cost = calculateCost("anthropic", "claude-opus-4-6", {
  input: 5000,
  output: 1200,
  totalTokens: 6200,
});
console.log(`Calculated cost: $${cost}`);

The calculateCost() utility is exported directly, so you can use it in your own analytics pipelines or cost estimation tools.
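That also makes pre-flight estimation possible: estimate what a batch will cost before spending anything. A sketch with the assumptions flagged — the ~4 characters/token heuristic, the estimateBatchCost name, and the costFn parameter are all mine, not NeuroLink API; you would pass NeuroLink's calculateCost in as costFn.

```typescript
type Usage = { input: number; output: number; totalTokens: number };
// Shape-compatible with the calculateCost() signature shown above
type CostFn = (provider: string, model: string, usage: Usage) => number;

// Rough pre-flight estimate. ~4 characters per token is a common
// heuristic, not an exact tokenizer; treat the result as a ballpark.
function estimateBatchCost(
  costFn: CostFn, // pass NeuroLink's calculateCost here
  documents: string[],
  provider: string,
  model: string,
  expectedOutputTokens = 500,
): number {
  return documents.reduce((total, doc) => {
    const input = Math.ceil(doc.length / 4);
    return (
      total +
      costFn(provider, model, {
        input,
        output: expectedOutputTokens,
        totalTokens: input + expectedOutputTokens,
      })
    );
  }, 0);
}
```

Gating a batch job on this estimate — refusing to start if the projected total already exceeds the budget you'd pass as maxBudgetUsd — catches runaway jobs before the first API call, not after.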

Budget Limits: Blocking Requests Before They Run

The real safety valve is maxBudgetUsd. This is a per-instance accumulated cost limit — set it on a NeuroLink instance, and every subsequent call checks the running total before making the API request.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

try {
  const result = await neurolink.generate({
    input: { text: "Analyze all 3,000 documents in this batch" },
    provider: "anthropic",
    model: "claude-opus-4-6",
    maxBudgetUsd: 10.00, // Stop if accumulated cost exceeds $10
  });

  console.log(`Cost this call: $${result.analytics?.cost}`);
  console.log(result.content);
} catch (err) {
  if (err.name === "BudgetExceededError") {
    console.error("Budget limit reached. Switching to a cheaper model...");
    // Handle gracefully: retry with cheaper model, alert ops, queue for manual review
  }
}

BudgetExceededError is thrown before the API call is made — when the projected cost would push the instance over its limit. You're not billed for requests that get blocked.

This is the credit card limit for your AI pipeline. It should be standard practice. It isn't — yet.

Practical Patterns for Cost Optimization

Pattern 1: Tiered Model Selection

Not every task needs the most expensive model. Use cheap models for simple tasks, expensive ones only when necessary:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function analyzeDocument(text: string, isHighPriority: boolean) {
  // For quick classification, a fast cheap model is fine
  const classification = await neurolink.generate({
    input: { text: `Classify this document: ${text.substring(0, 500)}` },
    provider: "openai",
    model: "gpt-4o-mini", // ~20x cheaper than gpt-4o
    maxTokens: 100,
  });

  console.log(`Classification cost: $${classification.analytics?.cost}`);

  // Only call the expensive model for complex analysis
  if (isHighPriority || classification.content.includes("HIGH_RISK")) {
    const deepAnalysis = await neurolink.generate({
      input: { text: `Perform deep legal analysis of: ${text}` },
      provider: "anthropic",
      model: "claude-opus-4-6",
      maxBudgetUsd: 5.00,
    });

    console.log(`Deep analysis cost: $${deepAnalysis.analytics?.cost}`);
    return deepAnalysis.content;
  }

  return classification.content;
}

By routing simple classification to gpt-4o-mini (input pricing of roughly $0.15 per million tokens versus $2.50 per million for gpt-4o), you cut the cost of the high-volume path by more than an order of magnitude while reserving the powerful model for the cases that genuinely need it.

Pattern 2: Budget Per User or Workflow

Create separate NeuroLink instances with separate budget limits per user session or workflow:

import { NeuroLink } from "@juspay/neurolink";

async function processUserRequest(userId: string, prompt: string) {
  // Each user gets their own budget-tracked instance
  const userSession = new NeuroLink();

  try {
    // Free tier: $0.50 limit
    const result = await userSession.generate({
      input: { text: prompt },
      provider: "openai",
      model: "gpt-4o-mini",
      maxBudgetUsd: 0.50,
      requestId: `user-${userId}-${Date.now()}`,
    });

    // Log cost for billing/analytics
    const callCost = result.analytics?.cost ?? 0;
    await logUsage(userId, callCost, result.usage);

    return result.content;
  } catch (err) {
    if (err.name === "BudgetExceededError") {
      return "You've reached your usage limit for this session. Upgrade to continue.";
    }
    throw err;
  }
}

Each new NeuroLink() instance maintains its own accumulated cost counter independently. This is the natural primitive for per-user or per-workflow cost isolation.
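A natural companion is a tier-to-budget lookup so the limit lives in one place instead of being hard-coded at each call site. The tier names and dollar amounts below are invented for illustration:

```typescript
// Hypothetical tier → budget mapping; tune the numbers to your own margins.
const TIER_BUDGETS_USD: Record<string, number> = {
  free: 0.5,
  pro: 5.0,
  enterprise: 50.0,
};

// Unknown tiers fall back to the most restrictive budget
function budgetForTier(tier: string): number {
  return TIER_BUDGETS_USD[tier] ?? TIER_BUDGETS_USD.free;
}
```

The per-user call then becomes maxBudgetUsd: budgetForTier(user.tier) instead of a scattered literal 0.50.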

Pattern 3: Cost Logging for Analytics

The result.analytics object contains everything you need to build a cost dashboard:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function trackedGenerate(prompt: string, context: { feature: string; userId: string }) {
  const result = await neurolink.generate({
    input: { text: prompt },
    provider: "anthropic",
    model: "claude-sonnet-4-5",
  });

  // Structured cost event for your analytics pipeline
  const costEvent = {
    timestamp: new Date().toISOString(),
    feature: context.feature,
    userId: context.userId,
    provider: result.provider,
    model: result.model,
    inputTokens: result.usage?.input,
    outputTokens: result.usage?.output,
    totalTokens: result.usage?.totalTokens,
    costUsd: result.analytics?.cost,
    responseTimeMs: result.responseTime,
  };

  // Send to your analytics store (DataDog, Mixpanel, your own DB)
  await analytics.track("ai_call", costEvent);

  return result.content;
}

After a week of this, you'll know exactly which features drive the most cost, which users generate the most tokens, and where your optimization effort should go.
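If you want a quick answer before wiring up a real analytics store, the same grouping can be done in memory. The CostEvent shape mirrors the costEvent object above, trimmed to the fields used; the helper itself is mine:

```typescript
interface CostEvent {
  feature: string;
  userId: string;
  costUsd: number;
}

// Total cost per feature, most expensive first
function costByFeature(events: CostEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    totals.set(e.feature, (totals.get(e.feature) ?? 0) + e.costUsd);
  }
  // Sort descending so the biggest spender comes first
  return new Map([...totals.entries()].sort((a, b) => b[1] - a[1]));
}
```

The same two-line reduction works grouped by userId or model — whichever axis you want to optimize first.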

Pattern 4: Switching Providers on Budget Exhaustion

When a budget limit hits, fall back to a cheaper provider automatically:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function costAwareGenerate(prompt: string) {
  // Try the best model first
  try {
    return await neurolink.generate({
      input: { text: prompt },
      provider: "anthropic",
      model: "claude-opus-4-6",
      maxBudgetUsd: 2.00,
    });
  } catch (err) {
    if (err.name !== "BudgetExceededError") throw err;
    console.warn("Opus budget exceeded, falling back to Haiku");
  }

  // Fall back to a much cheaper model
  return await neurolink.generate({
    input: { text: prompt },
    provider: "anthropic",
    model: "claude-haiku-4-5", // ~50x cheaper than Opus
    maxBudgetUsd: 0.50,
  });
}

This pattern gives you quality-first AI with a hard cost ceiling. Haiku delivers very good results for many tasks at a fraction of Opus pricing.

The Math: Why This Matters at Scale

Let's look at the document analysis scenario from the opening story, with proper cost controls:

| Approach | Model | Cost per doc | 3,000 docs |
| --- | --- | --- | --- |
| Naive (what happened) | claude-opus-4-6 | ~$0.13 | ~$390 |
| Classify first, Opus only for high-risk (10%) | gpt-4o-mini + claude-opus-4-6 | ~$0.013 avg | ~$40 |
| Haiku for everything | claude-haiku-4-5 | ~$0.003 | ~$9 |
| With maxBudgetUsd: 50 | Any | n/a | Capped at $50 |

Cost controls don't just save money after the fact. With maxBudgetUsd, the runaway batch job stops at $50 instead of $390. The 2am alert becomes a non-event.
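The blended "classify first" figure is easy to reproduce. The ~$0.0002 classification cost is an assumed ballpark for a short gpt-4o-mini call; the other numbers come from the opening story:

```typescript
const DOCS = 3000;
const OPUS_COST_PER_DOC = 0.13; // from the opening incident
const CLASSIFY_COST_PER_DOC = 0.0002; // assumed ballpark for a short gpt-4o-mini call
const HIGH_RISK_FRACTION = 0.1; // only these docs reach Opus

// Every doc pays for classification; 10% also pay for Opus
const total =
  DOCS * CLASSIFY_COST_PER_DOC + DOCS * HIGH_RISK_FRACTION * OPUS_COST_PER_DOC;

console.log(`Batch: $${total.toFixed(2)}`); // ≈ $40
console.log(`Per doc: $${(total / DOCS).toFixed(4)}`); // ≈ $0.013
```

Note how little the classification step adds to the total: the savings come almost entirely from keeping 90% of documents away from Opus.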

Connecting Cost Data to Observability

If you're using Langfuse for tracing (which NeuroLink supports natively), cost data flows into your traces automatically:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
  },
});

const result = await neurolink.generate({
  input: { text: "Summarize this legal document" },
  provider: "openai",
  model: "gpt-4o-mini",
  requestId: "doc-summary-001",
});

// result.analytics.cost appears in Langfuse trace automatically
// No additional instrumentation needed

In Langfuse, you can filter traces by cost, see cost trends over time, and identify which prompts or features drive the most spending — all without writing custom instrumentation.

Putting It Together: A Cost-Aware Document Processor

Here's a complete, production-ready document processor that applies all the patterns:

import { NeuroLink } from "@juspay/neurolink";

interface ProcessingResult {
  summary: string;
  costUsd: number;
  model: string;
  tokensUsed: number;
}

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: !!process.env.LANGFUSE_PUBLIC_KEY,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
      secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
      environment: process.env.NODE_ENV ?? "development",
    },
  },
});

async function processDocument(
  documentText: string,
  options: { maxCost?: number; highQuality?: boolean } = {}
): Promise<ProcessingResult> {
  const { maxCost = 1.00, highQuality = false } = options;

  // Step 1: Quick classification with cheap model
  const classifyResult = await neurolink.generate({
    input: {
      text: `Classify this document as one of: LEGAL, FINANCIAL, TECHNICAL, GENERAL.
      Document: ${documentText.substring(0, 500)}`,
    },
    provider: "openai",
    model: "gpt-4o-mini",
    maxTokens: 20,
  });

  const category = classifyResult.content.trim();
  console.log(`Category: ${category} (cost: $${classifyResult.analytics?.cost})`);

  // Step 2: Choose model based on category and quality flag
  const needsPowerfulModel =
    highQuality || category === "LEGAL" || category === "FINANCIAL";

  const summaryModel = needsPowerfulModel ? "claude-opus-4-6" : "claude-haiku-4-5";
  const summaryProvider = "anthropic";

  // Step 3: Generate summary with budget cap
  try {
    const summaryResult = await neurolink.generate({
      input: {
        text: `Provide a concise, accurate summary of this ${category} document.
        Focus on key decisions, dates, amounts, and action items.

        Document:
        ${documentText}`,
      },
      provider: summaryProvider,
      model: summaryModel,
      maxTokens: 500,
      maxBudgetUsd: maxCost,
    });

    return {
      summary: summaryResult.content,
      costUsd:
        (classifyResult.analytics?.cost ?? 0) +
        (summaryResult.analytics?.cost ?? 0),
      model: summaryResult.model,
      tokensUsed:
        (classifyResult.usage?.totalTokens ?? 0) +
        (summaryResult.usage?.totalTokens ?? 0),
    };
  } catch (err) {
    if (err.name === "BudgetExceededError") {
      // Retry with cheapest model
      console.warn(`Budget exceeded for ${summaryModel}, retrying with Haiku`);
      const fallbackResult = await neurolink.generate({
        input: {
          text: `Summarize this document in 2-3 sentences: ${documentText}`,
        },
        provider: "anthropic",
        model: "claude-haiku-4-5",
        maxTokens: 200,
      });

      return {
        summary: fallbackResult.content,
        costUsd:
          (classifyResult.analytics?.cost ?? 0) +
          (fallbackResult.analytics?.cost ?? 0),
        model: fallbackResult.model,
        tokensUsed:
          (classifyResult.usage?.totalTokens ?? 0) +
          (fallbackResult.usage?.totalTokens ?? 0),
      };
    }
    throw err;
  }
}

// Process a batch
async function processBatch(documents: string[]) {
  let totalCost = 0;
  const results: ProcessingResult[] = [];

  for (const doc of documents) {
    const result = await processDocument(doc, { maxCost: 0.50 });
    totalCost += result.costUsd;
    results.push(result);
    console.log(
      `Processed: $${result.costUsd.toFixed(4)} | Running total: $${totalCost.toFixed(4)}`
    );
  }

  console.log(`\nBatch complete: ${documents.length} docs, $${totalCost.toFixed(4)} total`);
  return results;
}

If you had run this against 3,000 documents with maxCost: 0.50 per document, the absolute ceiling would have been $1,500 — with the actual cost far lower due to Haiku fallback and cheap classification. More importantly, you'd have full visibility into every call.

Summary

Cost-aware AI pipelines aren't a nice-to-have. They're the difference between a predictable operational expense and a 2am incident.

NeuroLink gives you:

  • Per-call cost tracking: result.analytics?.cost is a USD float, always available
  • calculateCost() utility: manual cost estimation with current pricing data
  • maxBudgetUsd: hard stop before budget is exceeded, throws BudgetExceededError
  • Pricing for all 13 providers: OpenAI, Anthropic, Google, and more, updated Feb 2026
  • Anthropic cache token tracking: separate input/cacheRead/cacheCreation pricing
  • Langfuse integration: cost visible in traces without custom instrumentation

The pattern is simple: track everything, limit ruthlessly, log aggressively. You'll never be surprised by an AI bill again.


Try NeuroLink:

What's your current approach to AI cost tracking? Share in the comments — I'm curious what others are doing.
