How to Build a Cost-Optimized AI Pipeline (Without Learning the Hard Way)
Let me tell you about the $400 mistake.
A team I know was building a document analysis feature. They wired up Claude Opus for every request — because Opus gives the best results, right? It worked great in testing with 20 documents. They shipped it.
A week later, a batch job processed 3,000 documents. Each one called Opus. Each call cost roughly $0.13. The bill: $390. The on-call alert came in at 2am.
The problem wasn't Claude Opus. It's a great model. The problem was that nobody had put any cost awareness into the pipeline. No tracking. No limits. No alerts. Just API calls flying into the void until the bill arrived.
This post is about doing it the right way from the start: per-call cost tracking, budget limits that actually block requests, analytics you can inspect in code, and model selection based on real cost data.
All of this is built into NeuroLink.
The Real Cost of Ignorance
Most teams discover their AI costs the same way: the monthly bill. By then, the damage is done.
The problem is architectural. Standard AI SDKs give you this:
```typescript
// What most developers write
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: prompt }],
});
// How much did that cost? ¯\_(ツ)_/¯
```
You get a response. You have no idea what it cost. Token counts are buried in response.usage if you remember to check, and you'd need to manually multiply by the current pricing (which you'd need to look up) to get a dollar figure.
Multiply this across thousands of requests, multiple models, and multiple providers, and cost visibility becomes a spreadsheet exercise after the fact — not an operational control in real time.
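For contrast, here is roughly what that manual bookkeeping looks like. The rates below are illustrative placeholders, not guaranteed-current provider pricing — which is exactly the problem: you have to look them up and keep them in sync yourself.

```typescript
// Illustrative rates in USD per 1M tokens -- placeholders you would have to
// look up on the provider's pricing page and keep fresh by hand.
const RATES_PER_1M: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

// Convert a response's token counts into a dollar figure.
function manualCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES_PER_1M[model];
  if (!rate) throw new Error(`No pricing data for model: ${model}`);
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// After every call, dig into response.usage and do this by hand:
const cost = manualCostUsd("gpt-4o", 5_000, 1_200);
console.log(`Cost: $${cost.toFixed(4)}`); // 5000*2.5 + 1200*10 = 24,500 / 1e6 = $0.0245
```

And this table silently rots the moment a provider changes its pricing page.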
NeuroLink's Approach: Cost as a First-Class Citizen
NeuroLink ships with a pricing utility (src/lib/utils/pricing.ts) that contains current pricing data for every supported provider and model. Cost is calculated automatically after every API call and included in the result.
Here's the difference:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Analyze this contract for liability clauses" },
  provider: "anthropic",
  model: "claude-opus-4-6",
});

// Cost is right here, no calculation needed
console.log(`Cost: $${result.analytics?.cost}`);
console.log(`Input tokens: ${result.usage?.input}`);
console.log(`Output tokens: ${result.usage?.output}`);
console.log(`Total tokens: ${result.usage?.totalTokens}`);
console.log(`Response time: ${result.responseTime}ms`);
```
result.analytics.cost is a USD float. You know exactly what each call cost, at call time, without touching a spreadsheet.
The Pricing Table
NeuroLink's pricing data covers all 13 supported providers. As of February 2026, the table includes:
| Provider | Model | Notes |
|---|---|---|
| Anthropic | claude-opus-4-6, claude-sonnet-4-5, claude-haiku-4-5 | Includes cache read/write tokens |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, o1, o1-mini | |
| Google | gemini-2.0-flash, gemini-2.0-pro, gemini-1.5-pro, gemini-1.5-flash | |
Anthropic's pricing is particularly nuanced because they have separate rates for cache creation and cache read tokens. NeuroLink tracks these separately:
```typescript
import { calculateCost } from "@juspay/neurolink";

// Manual cost calculation (if you need it outside generate())
const cost = calculateCost("anthropic", "claude-opus-4-6", {
  input: 5000,
  output: 1200,
  totalTokens: 6200,
});
console.log(`Calculated cost: $${cost}`);
```
The calculateCost() utility is exported directly, so you can use it in your own analytics pipelines or cost estimation tools.
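One place this is useful is rough pre-flight estimation before a batch job runs. The sketch below is hypothetical glue, not NeuroLink API: it uses the common ~4-characters-per-token heuristic and takes the cost function as a parameter, so you could plug in `calculateCost()` or the illustrative rate shown here.

```typescript
// Rough heuristic: ~4 characters per token for English text.
// Good enough for a ballpark, not for billing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimate what a batch would cost before running it. `costFn` is whatever
// per-call cost function you have; it takes estimated input/output token
// counts and returns USD.
function estimateBatchCost(
  documents: string[],
  expectedOutputTokens: number,
  costFn: (inputTokens: number, outputTokens: number) => number,
): number {
  return documents.reduce(
    (total, doc) => total + costFn(estimateTokens(doc), expectedOutputTokens),
    0,
  );
}

// Example with an illustrative Opus-like rate ($15/M input, $75/M output):
const opusLikeRate = (input: number, output: number) =>
  (input * 15 + output * 75) / 1_000_000;
const docs = ["a".repeat(4_000), "b".repeat(8_000)]; // ~1k and ~2k tokens
console.log(`Estimated: $${estimateBatchCost(docs, 500, opusLikeRate).toFixed(4)}`);
```

Run this against a sample of your real documents and you know the order of magnitude before the batch job ever touches the API.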
Budget Limits: Blocking Requests Before They Run
The real safety valve is maxBudgetUsd. This is a per-instance accumulated cost limit — set it on a NeuroLink instance, and every subsequent call checks the running total before making the API request.
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

try {
  const result = await neurolink.generate({
    input: { text: "Analyze all 3,000 documents in this batch" },
    provider: "anthropic",
    model: "claude-opus-4-6",
    maxBudgetUsd: 10.00, // Stop if accumulated cost exceeds $10
  });
  console.log(`Cost this call: $${result.analytics?.cost}`);
  console.log(result.content);
} catch (err) {
  if (err instanceof Error && err.name === "BudgetExceededError") {
    console.error("Budget limit reached. Switching to a cheaper model...");
    // Handle gracefully: retry with cheaper model, alert ops, queue for manual review
  }
}
```
BudgetExceededError is thrown before the API call is made — when the projected cost would push the instance over its limit. You're not billed for requests that get blocked.
This is the credit card limit for your AI pipeline. It should be standard practice. It isn't — yet.
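To make the semantics concrete, here is a minimal sketch of what an accumulated budget cap does. This is an illustration of the behavior, not NeuroLink's actual implementation:

```typescript
class BudgetExceededError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "BudgetExceededError";
  }
}

// Minimal accumulated-cost gate: reject a call *before* it runs if the
// projected running total would cross the cap.
class BudgetGate {
  private spent = 0;
  constructor(private readonly maxBudgetUsd: number) {}

  // Call before each request with that request's projected cost.
  check(projectedCostUsd: number): void {
    const projected = this.spent + projectedCostUsd;
    if (projected > this.maxBudgetUsd) {
      throw new BudgetExceededError(
        `Projected $${projected.toFixed(2)} exceeds cap $${this.maxBudgetUsd.toFixed(2)}`,
      );
    }
  }

  // Call after each request with its actual cost.
  record(actualCostUsd: number): void {
    this.spent += actualCostUsd;
  }
}

const gate = new BudgetGate(10);
gate.check(4); gate.record(4); // running total: $4
gate.check(5); gate.record(5); // running total: $9
// gate.check(2) would now throw: projected $11 > $10 cap
```

The key property is the same one NeuroLink gives you: the check happens before the spend, so a blocked request costs nothing.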
Practical Patterns for Cost Optimization
Pattern 1: Tiered Model Selection
Not every task needs the most expensive model. Use cheap models for simple tasks, expensive ones only when necessary:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function analyzeDocument(text: string, isHighPriority: boolean) {
  // For quick classification, a fast cheap model is fine
  const classification = await neurolink.generate({
    input: { text: `Classify this document: ${text.substring(0, 500)}` },
    provider: "openai",
    model: "gpt-4o-mini", // ~20x cheaper than gpt-4o
    maxTokens: 100,
  });
  console.log(`Classification cost: $${classification.analytics?.cost}`);

  // Only call the expensive model for complex analysis
  if (isHighPriority || classification.content.includes("HIGH_RISK")) {
    const deepAnalysis = await neurolink.generate({
      input: { text: `Perform deep legal analysis of: ${text}` },
      provider: "anthropic",
      model: "claude-opus-4-6",
      maxBudgetUsd: 5.00,
    });
    console.log(`Deep analysis cost: $${deepAnalysis.analytics?.cost}`);
    return deepAnalysis.content;
  }
  return classification.content;
}
```
By routing simple classification to gpt-4o-mini (roughly $0.00015/1K input tokens, versus roughly $0.0025/1K for gpt-4o), you cut costs on the high-volume path by roughly 20x while reserving the powerful model for cases that genuinely need it.
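The routing math generalizes: every document pays the cheap classification cost, and only a fraction also pays for the deep pass. A quick helper, using illustrative per-document figures:

```typescript
// Expected cost per document under tiered routing: every document pays for
// the cheap classification; only `expensiveFraction` of them also pay for
// the expensive deep-analysis call.
function blendedCostPerDoc(
  cheapCostUsd: number,
  expensiveCostUsd: number,
  expensiveFraction: number,
): number {
  return cheapCostUsd + expensiveFraction * expensiveCostUsd;
}

// Illustrative figures: $0.0003 per classification, $0.13 per Opus call,
// 10% of documents flagged high-risk.
const perDoc = blendedCostPerDoc(0.0003, 0.13, 0.1);
console.log(`Per doc: $${perDoc.toFixed(4)}`);             // $0.0133
console.log(`3,000 docs: $${(perDoc * 3_000).toFixed(2)}`); // $39.90 vs ~$390 all-Opus
```

Tuning `expensiveFraction` (via a stricter or looser classifier) is the main cost lever in this pattern.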
Pattern 2: Budget Per User or Workflow
Create separate NeuroLink instances with separate budget limits per user session or workflow:
```typescript
import { NeuroLink } from "@juspay/neurolink";

async function processUserRequest(userId: string, prompt: string) {
  // Each user gets their own budget-tracked instance
  const userSession = new NeuroLink();
  try {
    // Free tier: $0.50 limit
    const result = await userSession.generate({
      input: { text: prompt },
      provider: "openai",
      model: "gpt-4o-mini",
      maxBudgetUsd: 0.50,
      requestId: `user-${userId}-${Date.now()}`,
    });
    // Log cost for billing/analytics
    const callCost = result.analytics?.cost ?? 0;
    await logUsage(userId, callCost, result.usage);
    return result.content;
  } catch (err) {
    if (err instanceof Error && err.name === "BudgetExceededError") {
      return "You've reached your usage limit for this session. Upgrade to continue.";
    }
    throw err;
  }
}
```
Each new NeuroLink() instance maintains its own accumulated cost counter independently. This is the natural primitive for per-user or per-workflow cost isolation.
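If you need budgets that survive beyond a single request (the instance above is created per call), a small ledger keyed by user ID works alongside this. The tier limits and function names below are hypothetical, not NeuroLink API:

```typescript
// Hypothetical tier limits in USD -- adjust to your own plans.
const TIER_LIMITS: Record<"free" | "pro", number> = { free: 0.5, pro: 10 };

// Per-user accumulated spend. In-memory here; persist it in production.
const spentByUser = new Map<string, number>();

// Remaining budget for a user, floored at zero once exhausted.
function remainingBudget(userId: string, tier: "free" | "pro"): number {
  const spent = spentByUser.get(userId) ?? 0;
  return Math.max(0, TIER_LIMITS[tier] - spent);
}

// Record a call's actual cost against the user's ledger.
function recordSpend(userId: string, costUsd: number): void {
  spentByUser.set(userId, (spentByUser.get(userId) ?? 0) + costUsd);
}

// Usage sketch: pass remainingBudget(userId, tier) as maxBudgetUsd on the
// generate() call, then recordSpend(userId, result.analytics?.cost ?? 0).
```

This keeps the per-call cap (NeuroLink's job) and the cross-session accounting (your job) cleanly separated.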
Pattern 3: Cost Logging for Analytics
The result.analytics object contains everything you need to build a cost dashboard:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function trackedGenerate(prompt: string, context: { feature: string; userId: string }) {
  const result = await neurolink.generate({
    input: { text: prompt },
    provider: "anthropic",
    model: "claude-sonnet-4-5",
  });

  // Structured cost event for your analytics pipeline
  const costEvent = {
    timestamp: new Date().toISOString(),
    feature: context.feature,
    userId: context.userId,
    provider: result.provider,
    model: result.model,
    inputTokens: result.usage?.input,
    outputTokens: result.usage?.output,
    totalTokens: result.usage?.totalTokens,
    costUsd: result.analytics?.cost,
    responseTimeMs: result.responseTime,
  };

  // Send to your analytics store (DataDog, Mixpanel, your own DB)
  await analytics.track("ai_call", costEvent);
  return result.content;
}
```
After a week of this, you'll know exactly which features drive the most cost, which users generate the most tokens, and where your optimization effort should go.
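Once those events are in a store, the rollup is a simple group-by. A sketch over a subset of the event shape above:

```typescript
// Subset of the cost-event fields relevant to the rollup.
interface CostEvent {
  feature: string;
  costUsd: number;
  totalTokens: number;
}

// Group-by-feature rollup: total spend and tokens per feature,
// sorted most expensive first.
function costByFeature(
  events: CostEvent[],
): Array<{ feature: string; costUsd: number; totalTokens: number }> {
  const totals = new Map<string, { costUsd: number; totalTokens: number }>();
  for (const e of events) {
    const t = totals.get(e.feature) ?? { costUsd: 0, totalTokens: 0 };
    t.costUsd += e.costUsd;
    t.totalTokens += e.totalTokens;
    totals.set(e.feature, t);
  }
  return [...totals.entries()]
    .map(([feature, t]) => ({ feature, ...t }))
    .sort((a, b) => b.costUsd - a.costUsd);
}

const report = costByFeature([
  { feature: "doc-summary", costUsd: 0.12, totalTokens: 9_000 },
  { feature: "chat", costUsd: 0.02, totalTokens: 1_500 },
  { feature: "doc-summary", costUsd: 0.08, totalTokens: 6_000 },
]);
// report[0] is the most expensive feature (doc-summary)
```

The same shape works grouped by userId or model; in practice you would run this as a query in your analytics store rather than in application code.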
Pattern 4: Switching Providers on Budget Exhaustion
When a budget limit hits, fall back to a cheaper provider automatically:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

async function costAwareGenerate(prompt: string) {
  // Try the best model first
  try {
    return await neurolink.generate({
      input: { text: prompt },
      provider: "anthropic",
      model: "claude-opus-4-6",
      maxBudgetUsd: 2.00,
    });
  } catch (err) {
    if (!(err instanceof Error) || err.name !== "BudgetExceededError") throw err;
    console.warn("Opus budget exceeded, falling back to Haiku");
  }
  // Fall back to a much cheaper model
  return await neurolink.generate({
    input: { text: prompt },
    provider: "anthropic",
    model: "claude-haiku-4-5", // ~50x cheaper than Opus
    maxBudgetUsd: 0.50,
  });
}
```
This pattern gives you quality-first AI with a hard cost ceiling. Haiku delivers very good results for many tasks at a fraction of Opus pricing.
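The same idea extends to a chain of any length. A generic helper (hypothetical, not part of NeuroLink) that walks attempts in order and falls through only on budget errors:

```typescript
// Try each attempt in order; fall through to the next only when the failure
// is a budget error. Any other error propagates immediately.
async function firstWithinBudget<T>(attempts: Array<() => Promise<T>>): Promise<T> {
  let lastErr: unknown;
  for (const attempt of attempts) {
    try {
      return await attempt();
    } catch (err) {
      if (!(err instanceof Error) || err.name !== "BudgetExceededError") throw err;
      lastErr = err; // budget error: try the next, cheaper option
    }
  }
  throw lastErr ?? new Error("No attempts provided");
}

// Usage sketch: most capable model first, cheapest last.
// const result = await firstWithinBudget([
//   () => neurolink.generate({ ...opts, model: "claude-opus-4-6", maxBudgetUsd: 2.0 }),
//   () => neurolink.generate({ ...opts, model: "claude-sonnet-4-5", maxBudgetUsd: 1.0 }),
//   () => neurolink.generate({ ...opts, model: "claude-haiku-4-5", maxBudgetUsd: 0.5 }),
// ]);
```

Because only `BudgetExceededError` triggers the fallback, genuine failures (auth, network, validation) still surface immediately instead of being silently retried on a worse model.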
The Math: Why This Matters at Scale
Let's look at the document analysis scenario from the opening story, with proper cost controls:
| Approach | Model | Cost per doc | 3,000 docs |
|---|---|---|---|
| Naive (what happened) | claude-opus-4-6 | ~$0.13 | ~$390 |
| Classify first, Opus only for high-risk (10%) | gpt-4o-mini + claude-opus-4-6 | ~$0.013 avg | ~$40 |
| Haiku for everything | claude-haiku-4-5 | ~$0.003 | ~$9 |
| With maxBudgetUsd: 50 | Any | — | Capped at $50 |
Cost controls don't just save money after the fact. With maxBudgetUsd, the runaway batch job stops at $50 instead of $390. The 2am alert becomes a non-event.
Connecting Cost Data to Observability
If you're using Langfuse for tracing (which NeuroLink supports natively), cost data flows into your traces automatically:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
  },
});

const result = await neurolink.generate({
  input: { text: "Summarize this legal document" },
  provider: "openai",
  model: "gpt-4o-mini",
  requestId: "doc-summary-001",
});

// result.analytics.cost appears in the Langfuse trace automatically
// No additional instrumentation needed
```
In Langfuse, you can filter traces by cost, see cost trends over time, and identify which prompts or features drive the most spending — all without writing custom instrumentation.
Putting It Together: A Cost-Aware Document Processor
Here's a complete, production-ready document processor that applies all the patterns:
```typescript
import { NeuroLink } from "@juspay/neurolink";

interface ProcessingResult {
  summary: string;
  costUsd: number;
  model: string;
  tokensUsed: number;
}

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: !!process.env.LANGFUSE_PUBLIC_KEY,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY ?? "",
      secretKey: process.env.LANGFUSE_SECRET_KEY ?? "",
      environment: process.env.NODE_ENV ?? "development",
    },
  },
});

async function processDocument(
  documentText: string,
  options: { maxCost?: number; highQuality?: boolean } = {}
): Promise<ProcessingResult> {
  const { maxCost = 1.00, highQuality = false } = options;

  // Step 1: Quick classification with cheap model
  const classifyResult = await neurolink.generate({
    input: {
      text: `Classify this document as one of: LEGAL, FINANCIAL, TECHNICAL, GENERAL.
Document: ${documentText.substring(0, 500)}`,
    },
    provider: "openai",
    model: "gpt-4o-mini",
    maxTokens: 20,
  });
  const category = classifyResult.content.trim();
  console.log(`Category: ${category} (cost: $${classifyResult.analytics?.cost})`);

  // Step 2: Choose model based on category and quality flag
  const needsPowerfulModel =
    highQuality || category === "LEGAL" || category === "FINANCIAL";
  const summaryModel = needsPowerfulModel ? "claude-opus-4-6" : "claude-haiku-4-5";
  const summaryProvider = "anthropic";

  // Step 3: Generate summary with budget cap
  try {
    const summaryResult = await neurolink.generate({
      input: {
        text: `Provide a concise, accurate summary of this ${category} document.
Focus on key decisions, dates, amounts, and action items.
Document:
${documentText}`,
      },
      provider: summaryProvider,
      model: summaryModel,
      maxTokens: 500,
      maxBudgetUsd: maxCost,
    });
    return {
      summary: summaryResult.content,
      costUsd:
        (classifyResult.analytics?.cost ?? 0) +
        (summaryResult.analytics?.cost ?? 0),
      model: summaryResult.model,
      tokensUsed:
        (classifyResult.usage?.totalTokens ?? 0) +
        (summaryResult.usage?.totalTokens ?? 0),
    };
  } catch (err) {
    if (err instanceof Error && err.name === "BudgetExceededError") {
      // Retry with cheapest model
      console.warn(`Budget exceeded for ${summaryModel}, retrying with Haiku`);
      const fallbackResult = await neurolink.generate({
        input: {
          text: `Summarize this document in 2-3 sentences: ${documentText}`,
        },
        provider: "anthropic",
        model: "claude-haiku-4-5",
        maxTokens: 200,
      });
      return {
        summary: fallbackResult.content,
        costUsd:
          (classifyResult.analytics?.cost ?? 0) +
          (fallbackResult.analytics?.cost ?? 0),
        model: fallbackResult.model,
        tokensUsed:
          (classifyResult.usage?.totalTokens ?? 0) +
          (fallbackResult.usage?.totalTokens ?? 0),
      };
    }
    throw err;
  }
}

// Process a batch
async function processBatch(documents: string[]) {
  let totalCost = 0;
  const results: ProcessingResult[] = [];
  for (const doc of documents) {
    const result = await processDocument(doc, { maxCost: 0.50 });
    totalCost += result.costUsd;
    results.push(result);
    console.log(
      `Processed: $${result.costUsd.toFixed(4)} | Running total: $${totalCost.toFixed(4)}`
    );
  }
  console.log(`\nBatch complete: ${documents.length} docs, $${totalCost.toFixed(4)} total`);
  return results;
}
```
If you had run this against 3,000 documents with maxCost: 0.50 per document, the absolute ceiling would have been $1,500 — with the actual cost far lower due to Haiku fallback and cheap classification. More importantly, you'd have full visibility into every call.
Summary
Cost-aware AI pipelines aren't a nice-to-have. They're the difference between a predictable operational expense and a 2am incident.
NeuroLink gives you:
- Per-call cost tracking: `result.analytics?.cost` is a USD float, always available
- `calculateCost()` utility: manual cost estimation with current pricing data
- `maxBudgetUsd`: hard stop before the budget is exceeded, throws `BudgetExceededError`
- Pricing for all 13 providers: OpenAI, Anthropic, Google, and more, updated Feb 2026
- Anthropic cache token tracking: separate input/cacheRead/cacheCreation pricing
- Langfuse integration: cost visible in traces without custom instrumentation
The pattern is simple: track everything, limit ruthlessly, log aggressively. You'll never be surprised by an AI bill again.
Try NeuroLink:
- GitHub: github.com/juspay/neurolink — give it a star
- Discord: Ask questions, share patterns
- Install: `npm install @juspay/neurolink`
What's your current approach to AI cost tracking? Share in the comments — I'm curious what others are doing.