DEV Community: NeuroLink AI

Tracing Tool Calls in MCP Workflows: Per-Tool Latency, Cost, and Failure Modes

NeuroLink AI — Sat, 16 May 2026 06:40:13 +0000


MCP agents are opaque by default. You call generate(), the model decides to invoke four tools, and three seconds later you get an answer. Which tool took 2.8 of those seconds? Was it the database lookup? The file read? The search API that's been flaky since Tuesday?

Without per-tool tracing, you're debugging a black box.

This article builds on three earlier pieces — AI Observability: Logging, Tracing, and Monitoring Your AI Calls, Building AI Agents with MCP and TypeScript in 2026, and OpenTelemetry for AI: Tracing Every Token Through Your Pipeline — and goes one level deeper: tracing individual tool invocations in multi-step MCP workflows, not just the outer LLM call.

Why Tool-Level Tracing Matters

A typical MCP workflow looks like this:

User prompt
  → search (github.search_code)         ~200ms
  → read × 3 (github.read_file)         ~150ms × 3
  → analyze (code-analyzer.check)       ~2,800ms  ← the real bottleneck
  → write (github.create_issue)         ~300ms
Total: ~3,750ms

If you only trace the outer generate() call, you see 3.75 seconds and conclude "the LLM is slow." It isn't. The analyzer tool is slow, and it's probably doing something fixable — a missing index, a cold lambda, an API without keepalive.

A "slow agent" is almost always one slow tool. The rest is noise.
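To make that concrete, here is a small sketch that takes per-tool durations and reports each tool's share of the run. It uses the numbers from the diagram above, hard-coded rather than read from a real trace. The `ToolTiming` shape and the `latencyBreakdown` helper are hypothetical.

```typescript
// Hypothetical record of one finished tool span: tool name plus measured duration.
interface ToolTiming {
  tool: string;
  durationMs: number;
}

// Sum durations per tool and sort by total time, largest first.
function latencyBreakdown(
  timings: ToolTiming[]
): Array<{ tool: string; totalMs: number; share: number }> {
  const totals = new Map<string, number>();
  for (const t of timings) {
    totals.set(t.tool, (totals.get(t.tool) ?? 0) + t.durationMs);
  }
  const grandTotal = [...totals.values()].reduce((a, b) => a + b, 0);
  return [...totals.entries()]
    .map(([tool, totalMs]) => ({
      tool,
      totalMs,
      share: grandTotal === 0 ? 0 : totalMs / grandTotal,
    }))
    .sort((a, b) => b.totalMs - a.totalMs);
}

// The workflow from the diagram above (sequential calls, so durations add up).
const run: ToolTiming[] = [
  { tool: "github.search_code", durationMs: 200 },
  { tool: "github.read_file", durationMs: 150 },
  { tool: "github.read_file", durationMs: 150 },
  { tool: "github.read_file", durationMs: 150 },
  { tool: "code-analyzer.check", durationMs: 2800 },
  { tool: "github.create_issue", durationMs: 300 },
];

for (const row of latencyBreakdown(run)) {
  console.log(
    `${row.tool.padEnd(22)} ${String(row.totalMs).padStart(5)}ms  ${(row.share * 100).toFixed(1)}%`
  );
}
```

With these numbers, code-analyzer.check comes out on top at 2,800ms, which is 74.7% of the 3,750ms total. The sketch assumes the calls run one after another. With parallel tool calls, per-tool sums can exceed wall-clock time.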

The Tracing Architecture

The pattern has three parts:

NeuroLink's transformParams hook — fires before each tool invocation, starts an OTEL span
NeuroLink's transformResponse hook — fires after each tool returns, ends the span with latency + outcome
onError hook — fires on tool failures, records the error type and whether it's retryable

You wrap the MCP tool layer with spans, not the entire generate() call. This gives you one span per tool call, nested under the parent trace.

Setup: OpenTelemetry + NeuroLink

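npm install @juspay/neurolink
npm install @opentelemetry/api @opentelemetry/sdk-node @opentelemetry/exporter-trace-otlp-http

Initialize your tracer once at startup:

// tracing.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { trace, SpanStatusCode } from "@opentelemetry/api";

export const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // The traces-specific variable holds the full URL (including /v1/traces);
    // OTEL_EXPORTER_OTLP_ENDPOINT is a base URL and would need the path appended
    url: process.env.OTEL_EXPORTER_OTLP_TRACES_ENDPOINT ?? "http://localhost:4318/v1/traces",
  }),
});

sdk.start();

// Flush buffered spans on shutdown so the last agent runs aren't lost
process.on("SIGTERM", () => {
  sdk.shutdown().finally(() => process.exit(0));
});

export const tracer = trace.getTracer("neurolink-mcp-agent", "1.0.0");
export { SpanStatusCode };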

Wrapping Each Tool Call in a Span

NeuroLink's middleware system exposes transformParams and transformResponse. For MCP tool tracing, you write middleware that instruments the tool dispatch layer.

Here's the core pattern:

import { randomUUID } from "node:crypto";
import { context, type Span } from "@opentelemetry/api";
import { tracer, SpanStatusCode } from "./tracing";

// Track in-flight spans by tool call ID so we can close them on response (or in onError)
const activeSpans = new Map<string, Span>();

const mcpTracingMiddleware = {
  name: "mcp-tool-tracer",
  priority: 115, // Runs before analytics, after transform

  transformParams: async (params: any, ctx: any) => {
    // Only instrument MCP tool calls, not direct generate() calls
    if (!params.toolCall) return params;

    const toolName = params.toolCall.name;
    const serverId = params.toolCall.serverId ?? "unknown";
    // Random fallback ID: `${toolName}-${Date.now()}` would collide when the same
    // tool is called twice in one millisecond (e.g. parallel read_file calls)
    const callId = params.toolCall.id ?? `${toolName}-${randomUUID()}`;

    // Start a span as a child of the current active trace
    const parentContext = ctx?.traceContext ?? context.active();
    const span = tracer.startSpan(
      `mcp.tool.${toolName}`,
      {
        attributes: {
          "mcp.tool.name": toolName,
          "mcp.server.id": serverId,
          "mcp.call.id": callId,
          "mcp.input.size": JSON.stringify(params.toolCall.input ?? {}).length,
        },
      },
      parentContext
    );

    activeSpans.set(callId, span);

    // Attach call start time for latency calculation
    return { ...params, _spanCallId: callId, _spanStartMs: Date.now() };
  },

  transformResponse: async (response: any, params: any) => {
    const callId = params._spanCallId;
    if (!callId) return response;

    const span = activeSpans.get(callId);
    if (!span) return response;

    const latencyMs = Date.now() - (params._spanStartMs ?? Date.now());

    span.setAttributes({
      "mcp.latency_ms": latencyMs,
      "mcp.output.size": JSON.stringify(response?.result ?? {}).length,
      "mcp.success": true,
    });
    span.setStatus({ code: SpanStatusCode.OK });
    span.end();
    activeSpans.delete(callId);

    return response;
  },
};

Handling Failure Modes

MCP tool failures come in three distinct flavors, and they need different handling:

const mcpErrorMiddleware = {
  name: "mcp-error-classifier",
  onError: (error: Error, metadata: any) => {
    const callId = metadata._spanCallId;
    const span = callId ? activeSpans.get(callId) : undefined;

    // Classify the failure
    let errorClass: string;
    let retryable: boolean;

    if (error.message.includes("ETIMEDOUT") || error.message.includes("timeout")) {
      // Tool server didn't respond in time
      errorClass = "timeout";
      retryable = true;
    } else if (error.message.includes("rate_limit") || error.message.includes("429")) {
      // Tool's upstream API is rate-limited
      errorClass = "rate_limit";
      retryable = true; // After backoff
    } else if (
      error.message.includes("SyntaxError") ||
      error.message.includes("JSON") ||
      error.message.includes("schema")
    ) {
      // Tool returned malformed output
      // This is almost always a tool implementation bug, not a transient error
      errorClass = "malformed_output";
      retryable = false;
    } else {
      errorClass = "unknown";
      retryable = metadata.recoverable ?? false;
    }

    if (span) {
      span.setAttributes({
        "mcp.error.class": errorClass,
        "mcp.error.message": error.message,
        "mcp.error.retryable": retryable,
        "mcp.latency_ms": Date.now() - (metadata._spanStartMs ?? Date.now()),
      });
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: `${errorClass}: ${error.message}`,
      });
      span.end();

      if (callId) activeSpans.delete(callId);
    }

    // Don't swallow — NeuroLink re-throws after onError fires
    console.error("mcp_tool_error", { // stand-in for your structured logger
      tool: metadata.toolCall?.name,
      errorClass,
      retryable,
      message: error.message,
    });
  },
};

The malformed output case deserves special attention. When a tool returns JSON that doesn't match the schema the LLM expected, the model may retry the tool call silently — or hallucinate an answer without the data. Both are bad. Tag these errors distinctly so you can find them in your trace backend.
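One way to make malformed output fail loudly is to validate the tool's result against the shape you expect before handing it back to the model. The sketch below hand-rolls the check for a hypothetical `read_file` result of type `{ path: string; content: string }`. In practice you might use a schema library instead. The thrown messages deliberately contain "schema", so the classifier above buckets them as `malformed_output`.

```typescript
// Hypothetical expected shape of a github.read_file result.
interface ReadFileResult {
  path: string;
  content: string;
}

// Parse raw tool output and verify its shape; throw a "schema" error otherwise.
function parseReadFileResult(raw: string): ReadFileResult {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    // JSON.parse throws a SyntaxError; re-throw with a message the classifier recognises
    throw new Error(`schema: tool output is not valid JSON (${(err as Error).message})`);
  }
  if (typeof value !== "object" || value === null) {
    throw new Error("schema: expected an object");
  }
  const obj = value as Record<string, unknown>;
  for (const field of ["path", "content"] as const) {
    if (typeof obj[field] !== "string") {
      throw new Error(`schema: field "${field}" must be a string, got ${typeof obj[field]}`);
    }
  }
  return { path: obj.path as string, content: obj.content as string };
}
```

Rejecting bad output at the tool boundary turns a silent retry or hallucination into an error span you can query.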

Trace ID Propagation Across the Agent Run

All spans from one agent run should share a single trace ID so you can reconstruct the full tool chain in your trace viewer. Create a root span for the run and make it the active context while generate() executes:

import { NeuroLink } from "@juspay/neurolink";
import { context, trace } from "@opentelemetry/api";
import { tracer, SpanStatusCode } from "./tracing";
// Assumes the two middleware objects defined above are exported from this (hypothetical) module
import { mcpTracingMiddleware, mcpErrorMiddleware } from "./mcp-middleware";

export async function runAgent(prompt: string, agentRunId: string) {
  // Create the root span for the entire agent run
  const rootSpan = tracer.startSpan("agent.run", {
    attributes: {
      "agent.run_id": agentRunId,
      "agent.prompt_length": prompt.length,
    },
  });

  // Spans started from context.active() inside agentContext become children of rootSpan
  const agentContext = trace.setSpan(context.active(), rootSpan);

  try {
    const ai = new NeuroLink({
      provider: "anthropic",
      model: "claude-sonnet-4-6",
      apiKey: process.env.ANTHROPIC_API_KEY,
      middleware: [mcpTracingMiddleware, mcpErrorMiddleware],
    });

    await ai.addExternalMCPServer("github", {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-github"],
      transport: "stdio",
      // The reference GitHub MCP server reads its token from this variable
      env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GITHUB_TOKEN },
    });

    // context.with() returns whatever the callback returns, so no outer `let` is needed
    const result: any = await context.with(agentContext, () =>
      ai.generate({ input: { text: prompt } })
    );

    rootSpan.setAttributes({
      "agent.tool_calls": result?.toolCalls?.length ?? 0,
      "agent.tokens.total": result?.usage?.totalTokens ?? 0,
      "agent.success": true,
    });
    rootSpan.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (err) {
    // Record the failure on the root span; the finally block still ends it
    rootSpan.recordException(err as Error);
    rootSpan.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
    throw err;
  } finally {
    rootSpan.end();
  }
}

In Jaeger, Grafana Tempo, or Honeycomb, this renders as a flame graph: the root span contains the LLM call, the LLM call contains all the tool spans, each tool span shows latency and outcome. The slow tool stands out immediately.

What to Track Per Tool

Once you have spans, you want these attributes on every tool call:

Attribute	Why
`mcp.tool.name`	Which tool — the primary grouping key
`mcp.server.id`	Which MCP server — helps distinguish tools with the same name on different servers
`mcp.latency_ms`	The single most useful number for debugging slow agents
`mcp.input.size`	Large inputs often cause timeouts; correlate with latency
`mcp.output.size`	Unexpected output sizes suggest a schema mismatch
`mcp.error.class`	`timeout` / `rate_limit` / `malformed_output` — drives alerting strategy
`mcp.error.retryable`	Whether the agent should attempt this tool call again

For cost-bearing tools (ones that hit paid APIs internally), you can add mcp.estimated_cost_usd from your internal pricing table. The tool cost is often larger than the LLM cost for data-heavy workflows.
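A minimal sketch of that idea follows. It assumes a hypothetical internal pricing table keyed by tool name, with a flat per-call fee plus a per-KB charge on output. The numbers are placeholders, not real prices. The returned value is what you would attach as `mcp.estimated_cost_usd` with `span.setAttribute`.

```typescript
// Hypothetical internal pricing: flat fee per call plus a charge per KB of output.
// These figures are placeholders; substitute your vendors' actual rates.
interface ToolPricing {
  perCallUsd: number;
  perOutputKbUsd: number;
}

const TOOL_PRICING: Record<string, ToolPricing> = {
  "web.search": { perCallUsd: 0.005, perOutputKbUsd: 0 },
  "code-analyzer.check": { perCallUsd: 0.002, perOutputKbUsd: 0.0001 },
};

// Estimate the cost of one tool call; tools missing from the table count as free.
function estimateToolCostUsd(toolName: string, outputBytes: number): number {
  const pricing = TOOL_PRICING[toolName];
  if (!pricing) return 0;
  return pricing.perCallUsd + (outputBytes / 1024) * pricing.perOutputKbUsd;
}

// In transformResponse, next to the other attributes, something like:
// span.setAttribute("mcp.estimated_cost_usd", estimateToolCostUsd(name, outputBytes));
```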

A note on where this data lives: NeuroLink's generate() result exposes a toolCalls array (each entry is {toolCallId, toolName, args}) plus toolExecutions ({name, input, output}) and toolResults. The middleware sees the full call/response inside transformParams / wrapGenerate — that's where you do the rich span work — but the post-generation summary is also available on the result object for ad-hoc analytics.
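For that ad-hoc path, a small helper can count calls per tool from the result. The `ToolCallEntry` type below mirrors the `{toolCallId, toolName, args}` entries described above. Treat it as an assumption about the shape, not as the SDK's exported type.

```typescript
// Assumed shape of one entry in result.toolCalls (per the description above).
interface ToolCallEntry {
  toolCallId: string;
  toolName: string;
  args: unknown;
}

// Count how many times each tool was invoked during one generate() call.
function countToolCalls(toolCalls: ToolCallEntry[] | undefined): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const call of toolCalls ?? []) {
    counts[call.toolName] = (counts[call.toolName] ?? 0) + 1;
  }
  return counts;
}

// Usage after a run, e.g.:
// const result = await ai.generate({ input: { text: prompt } });
// console.log(countToolCalls(result.toolCalls));
```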

Per-Tool Latency Budgets

You can extend the tracing middleware to enforce latency budgets per tool class — failing fast instead of waiting for a timeout:

const TOOL_TIMEOUT_MS: Record<string, number> = {
  "github.read_file": 5_000,
  "github.search_code": 8_000,
  "database.query": 3_000,
  "web.search": 10_000,
};

const timeoutMiddleware = {
  name: "tool-timeout",
  priority: 125,
  transformParams: async (params: any) => {
    if (!params.toolCall) return params;

    const toolName = params.toolCall.name;
    const budget = TOOL_TIMEOUT_MS[toolName];

    if (budget) {
      // Attach deadline for downstream middleware/provider to enforce
      return { ...params, toolCall: { ...params.toolCall, timeoutMs: budget } };
    }
    return params;
  },
};

A practical note on timeoutMs: NeuroLink's tool registration accepts a per-tool timeoutMs (see ToolRegistrationOptions in the SDK) and forwards it to the tool executor for MCP-level enforcement — so registering a tool with { execute, timeoutMs: 3000 } is the cleanest way to set a hard per-tool deadline. The middleware pattern above mutates params.toolCall.timeoutMs, which is only useful if you have a downstream middleware that reads it — pick one approach, not both.
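Suppose you take the registration route, but a tool's own implementation has no deadline. A generic wrapper is one way to enforce one. This is a hand-rolled sketch using `Promise.race`, not NeuroLink's built-in `timeoutMs` handling. The error message includes "timeout" so that `mcpErrorMiddleware` classifies it as retryable. Racing does not cancel the underlying work, so pass an `AbortSignal` into the tool if it supports one.

```typescript
// Wrap an async tool function so it rejects if it runs past `timeoutMs`.
function withDeadline<A extends unknown[], R>(
  toolName: string,
  timeoutMs: number,
  execute: (...args: A) => Promise<R>
): (...args: A) => Promise<R> {
  return (...args: A) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`${toolName} timeout after ${timeoutMs}ms`)),
        timeoutMs
      );
    });
    // Whichever settles first wins; always clear the timer so it can't fire later
    return Promise.race([execute(...args), deadline]).finally(() => clearTimeout(timer));
  };
}
```

For example, `withDeadline("github.read_file", TOOL_TIMEOUT_MS["github.read_file"], readFile)` would give a hypothetical `readFile` implementation the same 5-second budget as the table above.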

Putting It All Together

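import { randomUUID } from "node:crypto";
import { NeuroLink } from "@juspay/neurolink";
// Assumes the middleware and runAgent() from the sections above are exported from these (hypothetical) modules
import { timeoutMiddleware, mcpTracingMiddleware, mcpErrorMiddleware } from "./mcp-middleware";
import { runAgent } from "./agent";

const ai = new NeuroLink({
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  apiKey: process.env.ANTHROPIC_API_KEY,
  middleware: [
    timeoutMiddleware,      // Priority 125: set per-tool deadlines (drop it if tools register their own timeoutMs)
    mcpTracingMiddleware,   // Priority 115: open/close OTEL spans
    mcpErrorMiddleware,     // onError hook: classify errors and close failed spans
    // Priority 100: NeuroLink's built-in analytics middleware
  ],
});

await ai.addExternalMCPServer("github", {
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-github"],
  transport: "stdio",
  env: { GITHUB_PERSONAL_ACCESS_TOKEN: process.env.GITHUB_TOKEN },
});

// Run with trace propagation. As written above, runAgent() builds its own NeuroLink
// instance; in a long-running service, refactor it to accept this shared `ai` so the
// MCP server isn't relaunched on every run.
const result = await runAgent(
  "Find all files using processPayment() and create a security issue if any lack input validation",
  `run-${randomUUID()}`
);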

Your trace now shows:

Root span: total agent execution time
LLM span: actual model inference time (usually the smallest component)
Tool spans: github.search_code, github.read_file × N, github.create_issue with individual latencies
Error spans: any tools that failed, with error class and retryable flag

The model is usually not your bottleneck. The tools are. Now you can prove it.

Get started with NeuroLink:

GitHub: https://github.com/juspay/neurolink
npm: npm install @juspay/neurolink
Docs: https://blog.neurolink.ink/docs
Blog: https://blog.neurolink.ink

Productionizing Ollama: Rate Limits, Cloud Fallback, and Cost Guardrails

NeuroLink AI — Sat, 16 May 2026 02:39:44 +0000


Running Ollama locally is easy. Running it in a production service that handles concurrent users without melting your box — that's a different problem.

I wrote up the basic Ollama + NeuroLink setup in Running Local LLMs with NeuroLink and Ollama: Complete Guide. This article is the follow-up: what happens after you ship it and it gets real traffic.

Three things break first: request queues pile up under concurrency, latency spikes on heavier models, and you have no budget guardrails because "it's free" turns out not to mean "it can't cause you problems." Here's how to solve all three.

The Problem: Ollama Has No Native Rate Limiting

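OpenAI returns a 429 when you hit its rate limit. Ollama has no per-client rate limit. It accepts requests and queues them, serving only as many in parallel as its settings and your GPU memory allow (see OLLAMA_NUM_PARALLEL). It rejects requests only once its internal queue (OLLAMA_MAX_QUEUE) is full. With a model as large as llama3.1:70b on a single machine, requests often run one at a time, so the fifth of five concurrent requests waits for the first four to finish.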

In practice, your p99 latency goes from 4 seconds to 20 seconds and your users give up.

You need to impose your own rate limiting at the SDK layer before requests reach the Ollama process.

Pattern 1: Request Throttling via Middleware

NeuroLink's middleware system runs as a pipeline on every generate() call. A throttling middleware can reject or queue requests before they're dispatched to the provider:

import { NeuroLink } from "@juspay/neurolink";

// Simple token-bucket rate limiter
class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity: number,
    private readonly refillRatePerSecond: number
  ) {
    this.tokens = capacity;
  }

  consume(): boolean {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRatePerSecond
    );
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(10, 2); // 10 burst, 2 req/sec sustained

const throttleMiddleware = {
  name: "ollama-throttle",
  priority: 120, // Runs before everything else
  transformParams: async (params: any) => {
    if (!bucket.consume()) {
      throw new Error("LOCAL_RATE_LIMIT: Ollama request queue full");
    }
    return params;
  },
};

const ai = new NeuroLink({
  provider: "ollama",
  model: "llama3.1",
  middleware: [throttleMiddleware],
});

The middleware throws a LOCAL_RATE_LIMIT error before the request reaches Ollama. Your calling code catches this and routes elsewhere — which brings us to the next pattern.
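The throttle above only rejects. You may prefer to let a few callers wait for a slot and shed load only once that waiting line is full. If so, a concurrency limiter is one option. This is a hand-rolled sketch; the class name and limits are illustrative. It throws with the same `LOCAL_RATE_LIMIT` prefix, so the fallback code in the next pattern still recognises it.

```typescript
// Bounds in-flight requests to a local model; a few extra callers wait in line,
// and anything beyond that is rejected with the LOCAL_RATE_LIMIT prefix.
class ConcurrencyLimiter {
  private active = 0;
  private readonly waiters: Array<() => void> = [];

  constructor(
    private readonly maxConcurrent: number,
    private readonly maxQueue: number
  ) {}

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.maxConcurrent) {
      this.active++; // free slot: take it immediately
    } else if (this.waiters.length < this.maxQueue) {
      // Wait until a finishing task hands its slot directly to us
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      throw new Error("LOCAL_RATE_LIMIT: Ollama request queue full");
    }

    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next(); // hand the slot over; the active count stays the same
      else this.active--;
    }
  }
}

// e.g. 2 requests on the GPU at once, up to 8 more waiting
const limiter = new ConcurrencyLimiter(2, 8);
```

You would wrap each local call as `limiter.run(() => localAI.generate({ input: { text: prompt } }))`. Unlike the token bucket, this bounds in-flight work rather than request rate. That maps more directly onto a GPU that can serve only a few requests at once. Handing the slot straight to the next waiter stops a newly arriving request from jumping the queue.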

Pattern 2: Falling Back to Cloud When Local is Overloaded

This is the multi-provider fallback pattern from Building Resilient AI: Multi-Provider Fallback Patterns in TypeScript applied specifically to the Ollama overload scenario.

NeuroLink's fallbackChain handles provider-level failures automatically, but the throttle middleware above throws before the provider is even called. You need to catch that specific error and escalate.

Here's the full pattern:

import { NeuroLink } from "@juspay/neurolink";

// Primary: local Ollama with throttle
const localAI = new NeuroLink({
  provider: "ollama",
  model: "llama3.1",
  middleware: [throttleMiddleware],
});

// Fallback: cloud providers in priority order
const cloudAI = new NeuroLink({
  providers: [
    { name: "anthropic", model: "claude-3-5-haiku-20241022", priority: 1 },
    { name: "openai", model: "gpt-4o-mini", priority: 2 },
  ],
  fallbackChain: ["anthropic", "openai"],
});

async function generate(prompt: string) {
  try {
    return await localAI.generate({ input: { text: prompt } });
  } catch (err: any) {
    if (err.message?.startsWith("LOCAL_RATE_LIMIT")) {
      // Ollama queue full — route to cloud
      console.warn("Ollama saturated, routing to cloud");
      return await cloudAI.generate({ input: { text: prompt } });
    }
    throw err; // Re-throw unexpected errors
  }
}

const result = await generate("Summarize this support ticket...");
console.log(`Provider used: ${result.provider}`);

The critical thing here: you want Haiku or GPT-4o-mini as your cloud fallback, not Claude Sonnet or GPT-4o. The fallback scenario is "Ollama is busy": you're handling overflow, not upgrading quality. Pick cloud models in the same capability tier as your local model instead of reaching for the most capable one you can afford.

Pattern 3: Latency Budgets — Switching on Timeout

Queue saturation isn't the only signal that Ollama is struggling. A 70B model under thermal throttling might accept the request but take 30 seconds to answer. You need a latency budget.

NeuroLink's generate() accepts a timeout option (number ms or string like "8s") plus an abortSignal, and the FallbackConfig chain triggers on errors — including timeout errors. Combine both for a clean latency-budget pattern:

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink({
  providers: [
    {
      name: "ollama",
      model: "llama3.1",
      priority: 1,
    },
    {
      name: "anthropic",
      model: "claude-3-5-haiku-20241022",
      priority: 2,
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  ],
  fallbackConfig: {
    enabled: true,
    maxAttempts: 2, // ollama, then anthropic
    circuitBreaker: true,
  },
});

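// Stand-in metrics client for this example; replace with your real one (StatsD, Prometheus, ...)
const metrics = { increment: (name: string) => console.log(`[metric] ${name}`) };
const prompt = "Summarize this support ticket...";

const result = await ai.generate({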
  input: { text: prompt },
  timeout: 8000, // 8s budget for the call; throws → fallback chain takes over
});

// Log which provider actually served this request
if (result.provider !== "ollama") {
  console.warn(`Latency budget exceeded, fell back to ${result.provider}`);
  metrics.increment("ollama.latency_fallback");
}

Set your timeout conservatively. An 8-second budget for an interactive request is already too slow for chat. If you're building a real-time interface, consider 3-4 seconds and accepting that heavy models will frequently fall back. Batch processing can afford 15-30 seconds.

The timeout option applies to the whole generate() call. For a strict per-provider deadline (e.g., "give Ollama exactly 3 seconds before racing Claude"), wrap each provider's call in a Promise.race with your own AbortController — the SDK doesn't expose a per-provider timeout field directly.
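Here is a minimal sketch of that per-provider deadline. It assumes you wrap each client in a function that accepts an `AbortSignal`, for example by forwarding it as generate()'s `abortSignal` option mentioned above. The helper name and function shapes are illustrative.

```typescript
// A provider call that can be cancelled through an AbortSignal.
type CancellableCall<T> = (signal: AbortSignal) => Promise<T>;

// Give `primary` (e.g. Ollama) up to `budgetMs`. If it hasn't answered by then,
// abort it and return the result of `fallback` (e.g. a cloud provider) instead.
async function withLocalDeadline<T>(
  primary: CancellableCall<T>,
  fallback: () => Promise<T>,
  budgetMs: number
): Promise<{ value: T; usedFallback: boolean }> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => {
      resolve("timeout"); // settle the race first...
      controller.abort(); // ...then cancel the in-flight primary request
    }, budgetMs);
  });

  try {
    // Errors thrown by the primary before the deadline propagate to the caller
    const winner = await Promise.race([
      primary(controller.signal).then((value) => ({ value })),
      deadline,
    ]);
    if (winner !== "timeout") return { value: winner.value, usedFallback: false };
  } finally {
    clearTimeout(timer);
  }
  return { value: await fallback(), usedFallback: true };
}
```

Wired up, that might look like `withLocalDeadline((signal) => localAI.generate({ input: { text: prompt }, abortSignal: signal }), () => cloudAI.generate({ input: { text: prompt } }), 3000)`, assuming generate() honours `abortSignal` as described above. Genuine Ollama errors inside the budget still surface, so you don't mask bugs as "slowness".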

Pattern 4: Cost Guardrails with the onFinish Hook

"Ollama is free" is true for the LLM calls themselves. It's not true for:

Cloud fallback calls (every Anthropic/OpenAI request costs money)
Your compute bill if you're running Ollama on cloud GPU instances
The engineering time debugging a service that's silently spending money

The onFinish lifecycle hook fires after every successful generation with usage data and provider info. Use it to track where your spend is going:

import { NeuroLink } from "@juspay/neurolink";

// Per-1K token pricing (cloud fallback providers)
const CLOUD_PRICING: Record<string, { input: number; output: number }> = {
  "claude-3-5-haiku-20241022": { input: 0.0008, output: 0.004 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
};

let sessionCost = 0;
let budgetAlertSent = false;
const BUDGET_ALERT_USD = 5.0; // Alert when session spend hits $5

// Stand-ins for your real metrics and paging clients (hypothetical; replace them)
const metrics = {
  increment: (name: string, tags: Record<string, string> = {}) => console.log(`[metric] ${name}`, tags),
};
const notifyOps = (message: string) => console.warn(`[ops] ${message}`);

const ai = new NeuroLink({
  providers: [
    { name: "ollama", model: "llama3.1", priority: 1 },
    {
      name: "anthropic",
      model: "claude-3-5-haiku-20241022",
      priority: 2,
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  ],
  fallback: true,
  fallbackConfig: { timeoutMs: 8000, retryAttempts: 1 },
  middleware: [
    {
      name: "cost-guard",
      onFinish: (result: any, metadata: any) => {
        // Ollama has no entry in CLOUD_PRICING, so its calls cost $0, but the hook still fires
        const pricing = CLOUD_PRICING[metadata.model] ?? { input: 0, output: 0 };
        const callCost =
          ((result.usage?.promptTokens ?? 0) / 1000) * pricing.input +
          ((result.usage?.completionTokens ?? 0) / 1000) * pricing.output;

        sessionCost += callCost;

        // Always log provider: visibility into fallback frequency is useful
        console.log(
          `[cost-guard] provider=${metadata.provider} ` +
          `model=${metadata.model} ` +
          `tokens=${result.usage?.totalTokens ?? 0} ` +
          `cost=$${callCost.toFixed(6)} ` +
          `session_total=$${sessionCost.toFixed(4)}`
        );

        if (metadata.provider !== "ollama") {
          metrics.increment("ollama.fallback_call", {
            provider: metadata.provider,
          });
        }

        // Alert once per session, not on every call after the threshold is crossed
        if (!budgetAlertSent && sessionCost > BUDGET_ALERT_USD) {
          budgetAlertSent = true;
          notifyOps(`Cloud fallback cost alert: $${sessionCost.toFixed(2)} this session`);
        }
      },
    },
  ],
});

Because the hook logs every call, including the ones Ollama serves, these log lines also give you your fallback rate. If 30% of requests are hitting cloud fallback, your Ollama instance is undersized for your traffic.
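Per-call log lines are easy to grep but awkward to alert on. One option is a rolling window that turns them into a fallback rate. The class below is a hypothetical in-process sketch; the window size and names are illustrative, and a real service would more likely compute this in its metrics backend. You could call `record(metadata.provider !== "ollama")` from the `onFinish` hook above.

```typescript
// Rolling fallback-rate tracker over the last `windowSize` requests.
class FallbackRateWindow {
  private readonly outcomes: boolean[] = []; // true = served by a cloud fallback

  constructor(private readonly windowSize: number) {}

  record(usedFallback: boolean): void {
    this.outcomes.push(usedFallback);
    // Drop the oldest outcome once the window is full (fine for small windows)
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }

  // Fraction of recent requests that fell back (0 when nothing has been recorded)
  rate(): number {
    if (this.outcomes.length === 0) return 0;
    return this.outcomes.filter(Boolean).length / this.outcomes.length;
  }
}
```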

Putting It Together: A Production-Ready Ollama Service

Here's the complete pattern for a service that handles realistic traffic:

import { NeuroLink } from "@juspay/neurolink";
// TokenBucket is the class from Pattern 1.

// Per-1K token pricing for the cloud fallback models
const CLOUD_PRICING: Record<string, { input: number; output: number }> = {
  "claude-3-5-haiku-20241022": { input: 0.0008, output: 0.004 },
  "gpt-4o-mini": { input: 0.00015, output: 0.0006 },
};

// Placeholders for your metrics pipeline and structured logger (hypothetical; replace them)
const recordMetrics = (fields: Record<string, unknown>) => console.log("[metrics]", fields);
const logger = {
  error: (event: string, fields: Record<string, unknown>) => console.error(event, fields),
};

const bucket = new TokenBucket(10, 2);

export const ai = new NeuroLink({
  providers: [
    { name: "ollama", model: "llama3.1", priority: 1 },
    {
      name: "anthropic",
      model: "claude-3-5-haiku-20241022",
      priority: 2,
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  ],
  fallback: true,
  fallbackConfig: {
    timeoutMs: 8000,
    retryAttempts: 1,
  },
  middleware: [
    {
      name: "throttle",
      priority: 120,
      transformParams: async (params: any) => {
        if (!bucket.consume()) {
          throw new Error("LOCAL_RATE_LIMIT");
        }
        return params;
      },
    },
    {
      name: "cost-guard",
      onFinish: (result: any, metadata: any) => {
        const pricing = CLOUD_PRICING[metadata.model] ?? { input: 0, output: 0 };
        const cost =
          ((result.usage?.promptTokens ?? 0) / 1000) * pricing.input +
          ((result.usage?.completionTokens ?? 0) / 1000) * pricing.output;

        recordMetrics({
          provider: metadata.provider,
          model: metadata.model,
          tokens: result.usage?.totalTokens ?? 0,
          cost,
          duration: metadata.duration,
          wasLocal: metadata.provider === "ollama",
        });
      },
      onError: (error: Error, metadata: any) => {
        logger.error("generation_failed", {
          provider: metadata.provider,
          error: error.message,
          recoverable: metadata.recoverable,
        });
      },
    },
  ],
});

// Cloud-only client for the explicit queue-full path, created once rather than per request
const cloudOnlyAI = new NeuroLink({
  providers: [
    {
      name: "anthropic",
      model: "claude-3-5-haiku-20241022",
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
  ],
});

export async function generateWithFallback(prompt: string) {
  try {
    return await ai.generate({ input: { text: prompt } });
  } catch (err: any) {
    if (err.message?.startsWith("LOCAL_RATE_LIMIT")) {
      // Explicit queue-full path: skip Ollama entirely, go straight to cloud
      return await cloudOnlyAI.generate({ input: { text: prompt } });
    }
    throw err;
  }
}

What to Watch in Production

A few metrics worth tracking:

ollama.fallback_rate: What percentage of requests don't complete on Ollama. Over 10% means your instance is undersized.
ollama.p95_latency: If your 70B model's p95 goes above your timeout threshold, you need a smaller model or more hardware.
cloud_fallback.cost_per_hour: Your actual cloud spend from overflow requests. This is your real Ollama infrastructure cost.
token_bucket.rejection_rate: How often you're hitting the local rate limit before even trying Ollama. A spike here usually means a burst of traffic, not a hardware problem.
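If you compute ollama.p95_latency yourself rather than relying on your metrics backend's histograms, the nearest-rank method is the simplest definition. A minimal sketch, with made-up sample latencies:

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic before dividing avoids float surprises like 0.95 * 20
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

const latencies = [850, 920, 1100, 4300, 980]; // made-up sample, in ms
console.log(percentile(latencies, 50)); // → 980
console.log(percentile(latencies, 95)); // → 4300
```

On a small sample, one slow request defines the p95 outright. Compute it over enough requests, or over a time window, before comparing it with your timeout threshold.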

The Ollama guide covers what to run. This setup covers what to watch after you run it.


Building Your Own AI Proxy: Route, Cache, and Monitor LLM Requests in TypeScript

NeuroLink AI — Mon, 06 Apr 2026 10:43:05 +0000


Large Language Models (LLMs) now sit inside a wide range of production applications. Running them in production, however, brings its own set of challenges: spiraling costs, vendor lock-in, inconsistent APIs, and a lack of observability. An AI proxy, a single service between your applications and every LLM provider, is a practical way to address all four.

At Juspay, a fintech company dealing with high-volume, mission-critical transactions, we've learned the hard way that robust infrastructure is paramount. Our experience building and scaling payment systems has directly informed our approach to AI integration, leading to the creation of NeuroLink—our universal AI development platform. NeuroLink isn't just an SDK; it's the foundation upon which you can build sophisticated AI infrastructure, including your own AI proxy.

This article will guide you through the process of building a powerful AI proxy using NeuroLink in TypeScript, covering key components like routing, caching, rate limiting, cost tracking, and logging.

Why Teams Build AI Proxies

Before diving into the "how," let's understand the "why." Why do engineering teams, especially in enterprise environments, invest in building their own AI proxies?

Cost Control and Optimization: LLM usage can get expensive, fast. A proxy allows you to implement intelligent routing to the cheapest available model for a given task, enforce rate limits to prevent accidental overspending, and track costs per user or project.
Multi-Tenancy and Access Control: For platforms serving multiple users or internal teams, a proxy can manage API keys, enforce usage quotas, and isolate access, ensuring fair usage and preventing resource contention.
Vendor Abstraction and Resilience: Relying on a single LLM provider creates vendor lock-in and a single point of failure. A proxy abstracts away provider-specific APIs, allowing you to seamlessly switch between models (e.g., OpenAI, Anthropic, Google Gemini, AWS Bedrock) or even implement failover to a different provider if one goes down. NeuroLink, with its unified API across 13+ providers, makes this abstraction a core feature.
Audit Logs and Observability: Understanding how LLMs are being used is crucial for debugging, compliance, and performance optimization. A proxy acts as a central point to log all requests and responses, track latency, monitor errors, and gain insights into usage patterns.
Data Governance and Security: In sensitive environments, proxies can sanitize requests, redact Personally Identifiable Information (PII) from prompts and responses, and enforce data residency policies.
Performance Enhancement: Caching LLM responses for common or deterministic queries can significantly reduce latency and API calls, improving user experience and cutting costs.

Key Components of an AI Proxy

A robust AI proxy typically comprises several core components:

Request Router: Directs incoming LLM requests to the appropriate provider and model based on predefined rules (e.g., cost, latency, capability).
Caching Layer: Stores responses for frequently asked or deterministic queries to reduce latency and API costs.
Rate Limiting: Prevents abuse and controls spending by limiting the number of requests within a given timeframe.
Cost Tracking: Monitors token usage and API costs, providing granular insights.
Logging and Monitoring: Captures detailed logs of all interactions, errors, and performance metrics.
Security & Data Sanitization: Handles API key management, input validation, and output redaction.
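To make the request router concrete, a minimal rule might pick the cheapest healthy model that meets a request's capability requirement. The catalog, the two-level tier labels, and the `routeRequest` helper below are hypothetical. The input prices per 1K tokens are illustrative, so check current provider pricing before relying on them.

```typescript
type Tier = "small" | "large";

interface ModelOption {
  provider: string;
  model: string;
  tier: Tier;
  inputUsdPer1K: number; // illustrative; verify against current pricing
  healthy: boolean; // e.g. fed by a health check or circuit breaker
}

const TIER_RANK: Record<Tier, number> = { small: 1, large: 2 };

// Cheapest healthy model whose tier is at least the requested one.
function routeRequest(catalog: ModelOption[], required: Tier): ModelOption {
  const candidates = catalog
    .filter((m) => m.healthy && TIER_RANK[m.tier] >= TIER_RANK[required])
    .sort((a, b) => a.inputUsdPer1K - b.inputUsdPer1K);
  if (candidates.length === 0) throw new Error(`no healthy model for tier "${required}"`);
  return candidates[0];
}

const CATALOG: ModelOption[] = [
  { provider: "openai", model: "gpt-4o-mini", tier: "small", inputUsdPer1K: 0.00015, healthy: true },
  { provider: "anthropic", model: "claude-3-5-haiku-20241022", tier: "small", inputUsdPer1K: 0.0008, healthy: true },
  { provider: "openai", model: "gpt-4o", tier: "large", inputUsdPer1K: 0.0025, healthy: true },
  { provider: "anthropic", model: "claude-3-5-sonnet-20241022", tier: "large", inputUsdPer1K: 0.003, healthy: true },
];
```

Ranking on input price alone is a simplification. Output-heavy workloads would weight output pricing too, and latency-sensitive routes might sort on observed p95 instead.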

Building One with NeuroLink as the Foundation

NeuroLink is designed to be the "pipe layer for the AI nervous system," making it an ideal foundation for an AI proxy. Its key features—unified API, multi-provider support, middleware system, and built-in telemetry—directly address the needs of proxy development.

Let's explore how to build some of these components using NeuroLink.

Initial Setup

First, ensure you have NeuroLink installed:

npm install @juspay/neurolink

Then, configure your NeuroLink instance with the LLM providers you want to proxy. NeuroLink lets you register multiple providers behind a single API, so the proxy can choose between them per request.

// src/proxy.ts
import { NeuroLink, type Middleware } from "@juspay/neurolink";
import { type IncomingMessage, type ServerResponse } from "http";

// Initialize NeuroLink with your desired providers
const neurolink = new NeuroLink({
  // Configure providers with their API keys (ideally from environment variables)
  openai: { apiKey: process.env.OPENAI_API_KEY },
  anthropic: { apiKey: process.env.ANTHROPIC_API_KEY },
  googleAI: { apiKey: process.env.GOOGLE_AI_API_KEY },
  // ... add other providers
});

console.log("NeuroLink AI Proxy initialized.");

// This will be our HTTP server handler
async function handleRequest(req: IncomingMessage, res: ServerResponse) {
  // ... proxy logic goes here
}

// Example of a simple HTTP server (can be integrated with Express, Fastify, etc.)
// import * as http from 'http';
// const server = http.createServer(handleRequest);
// server.listen(3000, () => {
//   console.log('AI Proxy listening on port 3000');
// });
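`handleRequest` is left as a stub above. A reasonable first step inside it is validating the JSON body before anything reaches a provider. The request shape here (`prompt`, plus optional `provider` and `model`) and the size limit are assumptions for illustration, not a NeuroLink convention.

```typescript
// Hypothetical body accepted by the proxy's generate endpoint.
interface ProxyRequest {
  prompt: string;
  provider?: string;
  model?: string;
}

type ParseResult =
  | { ok: true; request: ProxyRequest }
  | { ok: false; status: 400 | 413; error: string };

const MAX_PROMPT_CHARS = 32_000; // illustrative limit; tune for your models

// Validate a raw request body before it reaches any provider.
function parseProxyRequest(rawBody: string): ParseResult {
  let body: unknown;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, error: "body must be valid JSON" };
  }
  if (typeof body !== "object" || body === null) {
    return { ok: false, status: 400, error: "body must be a JSON object" };
  }
  const { prompt, provider, model } = body as Record<string, unknown>;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    return { ok: false, status: 400, error: "prompt must be a non-empty string" };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return { ok: false, status: 413, error: `prompt exceeds ${MAX_PROMPT_CHARS} characters` };
  }
  for (const [name, value] of Object.entries({ provider, model })) {
    if (value !== undefined && typeof value !== "string") {
      return { ok: false, status: 400, error: `${name} must be a string` };
    }
  }
  return {
    ok: true,
    request: { prompt, provider: provider as string | undefined, model: model as string | undefined },
  };
}
```

Inside `handleRequest`, you would read the body, call `parseProxyRequest`, and write `status` plus `error` back immediately on failure. A malformed request then never costs a provider call.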

1. Middleware for Logging and Monitoring

NeuroLink's middleware system is perfect for implementing cross-cutting concerns like logging, cost tracking, and performance monitoring.

Let's create a logging middleware:

// src/middleware/logging.ts
import { type Middleware, type GenerateOptions, type GenerateResult } from "@juspay/neurolink";

// Referenced directly rather than via `this`, whose binding depends on how hooks are invoked
const NAME = "logging-middleware";

// Truncate to 100 chars for logs; returns undefined (not the string "undefined...") when text is missing
function preview(text: string | undefined): string | undefined {
  if (text === undefined) return undefined;
  return text.length > 100 ? `${text.substring(0, 100)}...` : text;
}

export const loggingMiddleware: Middleware = {
  name: NAME,
  async onBeforeGenerate(options: GenerateOptions) {
    const startTime = Date.now();
    console.log(`[${NAME}] Request received:`, {
      model: options.model,
      provider: options.provider,
      input: preview(options.input?.text),
      // ... other relevant options
    });
    return { ...options, __startTime: startTime }; // Attach startTime for later use
  },
  async onAfterGenerate(result: GenerateResult, options: GenerateOptions & { __startTime: number }) {
    const duration = Date.now() - options.__startTime;
    console.log(`[${NAME}] Request completed in ${duration}ms:`, {
      model: options.model,
      provider: options.provider,
      output: preview(result.content),
      // ... other relevant results (e.g. result.usage)
    });
    // Here, you could send metrics to an observability platform like OpenTelemetry, Langfuse, etc.
    return result;
  },
  async onError(error: Error, options: GenerateOptions) {
    console.error(`[${NAME}] Request failed:`, {
      model: options.model,
      provider: options.provider,
      input: preview(options.input?.text),
      error: error.message,
    });
    throw error; // Re-throw so callers still see the failure
  },
};

// Apply the middleware to your NeuroLink instance
// neurolink.use(loggingMiddleware);

You can extend this middleware to track token usage (from result.usage), record costs, and send data to your observability platform of choice. NeuroLink also supports OpenTelemetry integration natively.
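A minimal sketch of the cost-tracking part, assuming `result.usage` exposes `inputTokens` and `outputTokens`. The `PRICING` table, its model names, and its prices are placeholders, not real rates:

```typescript
// Shape of result.usage as assumed here.
interface Usage {
  inputTokens?: number;
  outputTokens?: number;
}

// Placeholder prices in USD per 1M tokens; substitute your providers' current published rates.
const PRICING: Record<string, { inputPerM: number; outputPerM: number }> = {
  "example-large-model": { inputPerM: 2.5, outputPerM: 10 },
  "example-small-model": { inputPerM: 0.1, outputPerM: 0.4 },
};

// Estimated cost in USD, or undefined when the model has no price entry.
function estimateCostUSD(model: string, usage: Usage): number | undefined {
  const price = PRICING[model];
  if (!price) return undefined;
  const input = usage.inputTokens ?? 0;
  const output = usage.outputTokens ?? 0;
  return (input * price.inputPerM + output * price.outputPerM) / 1_000_000;
}

// Running spend per model, e.g. updated from onAfterGenerate.
const spendByModel = new Map<string, number>();

function recordCost(model: string, usage: Usage): void {
  const cost = estimateCostUSD(model, usage);
  if (cost === undefined) return; // Unknown model: skip rather than guess
  spendByModel.set(model, (spendByModel.get(model) ?? 0) + cost);
}

function totalCostUSD(): number {
  let total = 0;
  for (const cost of spendByModel.values()) total += cost;
  return total;
}
```

From `onAfterGenerate`, a call such as `recordCost(options.model, result.usage ?? {})` would keep the running totals current; exporting them to your metrics backend is a separate step.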

2. Caching Layer

A caching layer is crucial for optimizing performance and cost. NeuroLink's MCP (Model Context Protocol) enhancements include a built-in ToolCache. ToolCache targets tool calls, but the same pattern works for LLM responses, either by adapting it or by writing a custom middleware.

Here's a simplified caching middleware example:

// src/middleware/caching.ts
import { type Middleware, type GenerateOptions, type GenerateResult } from "@juspay/neurolink";
import { LRUCache } from "lru-cache"; // npm install lru-cache (v10+ uses a named export)

const NAME = "caching-middleware";

// lru-cache drops entries once their TTL expires, so no manual timestamp check is needed.
const cache = new LRUCache<string, GenerateResult>({
  max: 1000, // Max 1000 entries
  ttl: 1000 * 60 * 5, // Cache for 5 minutes
});

// Key on the fields that determine the response; internal extras like __startTime are left out.
// You might need a more sophisticated key for complex scenarios.
function cacheKeyFor(options: GenerateOptions): string {
  return JSON.stringify({
    input: options.input,
    model: options.model,
    provider: options.provider,
  });
}

export const cachingMiddleware: Middleware = {
  name: NAME,
  async onBeforeGenerate(options: GenerateOptions) {
    const cached = cache.get(cacheKeyFor(options));
    if (cached) {
      console.log(`[${NAME}] Cache hit`);
      // This only swaps the cached result in afterwards. Unless the middleware API lets
      // onBeforeGenerate short-circuit generation, the provider is still called (and billed).
      return { ...options, __cachedResult: cached };
    }
    console.log(`[${NAME}] Cache miss`);
    return options; // Proceed with generation
  },
  async onAfterGenerate(result: GenerateResult, options: GenerateOptions & { __cachedResult?: GenerateResult }) {
    if (options.__cachedResult) {
      return options.__cachedResult; // Serve the cached result
    }
    cache.set(cacheKeyFor(options), result); // Store the fresh result
    return result;
  },
};

// Add to NeuroLink:
// neurolink.use(cachingMiddleware);
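A `JSON.stringify` key depends on property order inside `input`, so two logically identical requests can miss each other, and the serialized key embeds the full prompt, which makes it long and unsafe to log. A sketch of a sturdier key that canonicalizes key order and hashes with Node's built-in `crypto` module; `stableCacheKey` is a hypothetical helper name:

```typescript
import { createHash } from "node:crypto";

// Recursively sort object keys so logically equal requests serialize identically.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

// SHA-256 of the canonical JSON: fixed length (64 hex chars) and free of raw prompt text.
function stableCacheKey(request: { input?: unknown; model?: string; provider?: string }): string {
  const canonical = JSON.stringify(
    canonicalize({ input: request.input, model: request.model, provider: request.provider }),
  );
  return createHash("sha256").update(canonical).digest("hex");
}
```

Values without a plain-JSON form (a `Date`, a `Map`, binary attachments) would need explicit handling before hashing.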

3. Request Router

NeuroLink's core functionality includes intelligent provider selection. You can configure it to automatically pick the cheapest or fastest model, or implement custom routing logic in a middleware.

For example, to prioritize a specific model or provider for certain requests:

// src/middleware/routing.ts
import { type Middleware, type GenerateOptions } from "@juspay/neurolink";

export const routingMiddleware: Middleware = {
  name: "routing-middleware",
  async onBeforeGenerate(options: GenerateOptions) {
    // Example: Route specific keywords to a powerful but expensive model
    if (options.input?.text?.toLowerCase().includes("financial analysis")) {
      console.log(`[${this.name}] Routing "financial analysis" to gpt-4o.`);
      return { ...options, provider: "openai", model: "gpt-4o" };
    }

    // Example: Route shorter requests to a cheaper, faster model
    if (options.input?.text && options.input.text.length < 50) {
      console.log(`[${this.name}] Routing short request to gemini-3-flash.`);
      return { ...options, provider: "googleAI", model: "gemini-3-flash" };
    }

    // Otherwise, keep the provider/model already in options (or fall back to NeuroLink's auto-selection)
    return options;
  },
};

// neurolink.use(routingMiddleware);

NeuroLink also has a ToolRouter within its MCP enhancements that supports various strategies (e.g., capability-based, round-robin). While this is for tool calls, the principles can be applied to LLM routing.
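To illustrate those strategies applied to model selection, here is a capability filter followed by round-robin. `Backend` and `RoundRobinRouter` are hypothetical names, not the ToolRouter API:

```typescript
// Hypothetical backend descriptor; not NeuroLink's ToolRouter types.
interface Backend {
  provider: string;
  model: string;
  capabilities: ReadonlySet<string>; // e.g. "tools", "vision", "pdf"
}

// Capability-based filtering, then round-robin among the backends that qualify.
class RoundRobinRouter {
  private cursor = 0;
  private readonly backends: readonly Backend[];

  constructor(backends: readonly Backend[]) {
    this.backends = backends;
  }

  pick(required: readonly string[] = []): Backend {
    const eligible = this.backends.filter((b) => required.every((c) => b.capabilities.has(c)));
    if (eligible.length === 0) {
      throw new Error(`No backend supports: ${required.join(", ")}`);
    }
    const choice = eligible[this.cursor % eligible.length];
    this.cursor += 1;
    return choice;
  }
}
```

The cursor is shared across all calls, so rotation is only approximate when requests ask for different capabilities; a cursor per capability set would make it exact.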

4. Security: API Key Management & Request Sanitization

Your proxy should manage API keys securely and potentially sanitize user inputs.

For API key management, load keys from environment variables or a secrets manager; never hardcode them. NeuroLink supports this by default: it picks up provider keys from environment variables such as process.env.YOUR_API_KEY.

For request sanitization, you can add another middleware:

// src/middleware/sanitization.ts
import { type Middleware, type GenerateOptions } from "@juspay/neurolink";

const NAME = "sanitization-middleware";

export const sanitizationMiddleware: Middleware = {
  name: NAME,
  async onBeforeGenerate(options: GenerateOptions) {
    if (options.input?.text) {
      // Simple example: mask common PII patterns
      const sanitizedText = options.input.text
        .replace(/\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/g, "[CREDIT_CARD_NUMBER]") // 16-digit card numbers, with or without separators
        .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]"); // US Social Security Numbers
      // More robust PII detection requires NLP libraries or dedicated services

      // Prevent prompt injection (basic example; easily bypassed by rephrasing)
      if (sanitizedText.toLowerCase().includes("ignore previous instructions")) {
        console.warn(`[${NAME}] Potential prompt injection detected. Blocking request.`);
        throw new Error("Invalid input: Potential prompt injection detected.");
      }

      return { ...options, input: { ...options.input, text: sanitizedText } };
    }
    return options;
  },
};

// neurolink.use(sanitizationMiddleware);
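Pattern-only matching also redacts any long digit run that merely looks like a card number, such as an order ID. A cheap refinement is to redact only candidates that pass the Luhn checksum that payment card numbers carry. A sketch, where `passesLuhn` and `redactCardNumbers` are hypothetical helpers you could call from the middleware:

```typescript
// Luhn checksum over a string of digits: card numbers pass, most arbitrary digit runs do not.
function passesLuhn(digits: string): boolean {
  let sum = 0;
  let doubleIt = false; // double every second digit, starting from the rightmost
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return sum % 10 === 0;
}

// Candidates: 13 to 19 digits, optionally separated by single spaces or dashes.
const CARD_CANDIDATE = /\b\d(?:[ -]?\d){12,18}\b/g;

// Replace only candidates that pass the checksum; everything else is left untouched.
function redactCardNumbers(text: string): string {
  return text.replace(CARD_CANDIDATE, (match) =>
    passesLuhn(match.replace(/[ -]/g, "")) ? "[CREDIT_CARD_NUMBER]" : match,
  );
}
```

This is still heuristic: a card number that runs straight into other digits is matched as one longer run and can fail the checksum.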

When to Build vs. Buy

Building an AI proxy provides ultimate control and customization, which is critical for complex enterprise needs, stringent security requirements, or highly specialized routing logic. However, it requires development and maintenance effort.

For many teams, especially those starting out or with simpler needs, commercial solutions like Portkey, Helicone, OpenPipe, or LiteLLM Proxy offer off-the-shelf capabilities that cover many common proxy use cases (caching, logging, cost tracking). NeuroLink itself can be seen as an SDK that complements these, allowing you to integrate with them or build similar features on top.

Consider building if:

You have unique routing logic or business rules.
You need deep integration with existing internal systems (e.g., identity, billing, audit).
You have strict compliance or security requirements that off-the-shelf solutions don't fully meet.
You want complete control over the infrastructure and data flow.
You are already using NeuroLink for unified AI access and want to leverage its ecosystem.

Consider buying if:

You need a quick, managed solution.
Your requirements are standard (basic caching, rate limiting, logging).
You want to offload infrastructure maintenance.

Conclusion

Building your own AI proxy with NeuroLink in TypeScript gives you granular control over your LLM infrastructure. Intelligent routing and caching cut costs, comprehensive logging improves observability, and input sanitization hardens security; together, these address the core challenges of production AI.

By leveraging NeuroLink's unified API and powerful middleware system, you can develop a robust, resilient, and cost-effective AI gateway tailored to your specific needs, enabling your team to build and scale AI applications with confidence.

NeuroLink — The Universal AI SDK for TypeScript

GitHub: github.com/juspay/neurolink
Install: npm install @juspay/neurolink
Docs: docs.neurolink.ink
Blog: blog.neurolink.ink — 150+ technical articles

Why We Built NeuroLink: Making AI Development Practically Free

NeuroLink AI — Mon, 06 Apr 2026 08:39:47 +0000

How a fintech company processing millions of payments ended up building the universal AI SDK—and why we open-sourced it.

The Problem We Couldn't Ignore

At Juspay, we process millions of payments daily across India and Southeast Asia. When you're moving that much money, you don't get to experiment with "nice-to-have" AI features. Every integration has to work, scale, and comply with strict financial regulations.

In 2023, we started integrating AI across our products:

HyperSDK: AI-powered payment error detection and recovery suggestions
Breeze: One-click checkout with intelligent fraud scoring
Euler: AI-assisted merchant analytics and anomaly detection
Lighthouse: Automated alert triaging and root cause analysis

Each product team started its AI integration differently. One team used the OpenAI SDK. Another tried Anthropic. A third experimented with Google's Gemini. By Q2 2024, we had seven different AI integration patterns across our codebase.

Here's what that looked like in practice:

// Team A's OpenAI integration
import OpenAI from "openai";

// Team B's Anthropic integration
import Anthropic from "@anthropic-ai/sdk";

// Team C's Bedrock integration (for compliance)
import { BedrockRuntimeClient } from "@aws-sdk/client-bedrock-runtime";

// Team D's Vertex integration (for PDF processing)
import { VertexAI } from "@google-cloud/vertexai";

Four teams. Four SDKs. Four different error handling patterns. Four different streaming implementations. Four different authentication flows.

And the kicker? They were all doing fundamentally the same thing: sending text to an LLM and getting text back.

The Cost of Fragmentation

Our infrastructure team started seeing the pain first:

Credential Sprawl

Every SDK needed its own API key management. Some used environment variables. Others needed credential files. Bedrock required IAM roles. Vertex needed service account JSON.

Our secrets management system wasn't designed for "one key per AI provider per service." We had API keys scattered across AWS Secrets Manager, HashiCorp Vault, and (we're not proud of this) a few hardcoded in environment configs that we had to rotate in a panic.

Observability Nightmares

Want to know your total AI spend across all providers? Good luck. Each SDK had its own way of exposing token counts. Some didn't expose them at all. We ended up building a Frankenstein monitoring dashboard that queried four different APIs and tried to normalize the data.
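For a sense of what that normalization involves: each API reports token counts under different names, `prompt_tokens`/`completion_tokens` in OpenAI Chat Completions, `input_tokens`/`output_tokens` in Anthropic Messages, `usageMetadata.promptTokenCount`/`candidatesTokenCount` in Gemini, and `inputTokens`/`outputTokens` in Bedrock's Converse API. A sketch, assuming your own wrapper tags each raw response with a `provider` field (the APIs do not add it themselves):

```typescript
// Provider-neutral usage record for dashboards and spend reports.
interface NormalizedUsage {
  provider: string;
  inputTokens: number;
  outputTokens: number;
}

// The `provider` tag comes from our own wrapper; the usage field names follow each API.
type TaggedResponse =
  | { provider: "openai"; usage?: { prompt_tokens?: number; completion_tokens?: number } }
  | { provider: "anthropic"; usage?: { input_tokens?: number; output_tokens?: number } }
  | { provider: "gemini"; usageMetadata?: { promptTokenCount?: number; candidatesTokenCount?: number } }
  | { provider: "bedrock"; usage?: { inputTokens?: number; outputTokens?: number } };

function normalizeUsage(res: TaggedResponse): NormalizedUsage {
  switch (res.provider) {
    case "openai":
      return { provider: res.provider, inputTokens: res.usage?.prompt_tokens ?? 0, outputTokens: res.usage?.completion_tokens ?? 0 };
    case "anthropic":
      return { provider: res.provider, inputTokens: res.usage?.input_tokens ?? 0, outputTokens: res.usage?.output_tokens ?? 0 };
    case "gemini":
      return { provider: res.provider, inputTokens: res.usageMetadata?.promptTokenCount ?? 0, outputTokens: res.usageMetadata?.candidatesTokenCount ?? 0 };
    case "bedrock":
      return { provider: res.provider, inputTokens: res.usage?.inputTokens ?? 0, outputTokens: res.usage?.outputTokens ?? 0 };
  }
}

// Total tokens per provider across many calls.
function totalsByProvider(records: readonly NormalizedUsage[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.provider, (totals.get(r.provider) ?? 0) + r.inputTokens + r.outputTokens);
  }
  return totals;
}
```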

When Claude went down for 20 minutes in March 2024, we didn't even know which services were affected because our alerting was fragmented by SDK, not unified by function.

The Onboarding Tax

New engineers joining AI-related projects needed to learn the quirks of whichever SDK that team had chosen. "Oh, you're working on Lighthouse? That's the Anthropic SDK. Here's the 12-page internal doc on how we handle streaming errors."

We were spending more time training people on SDK specifics than on AI concepts that actually mattered.

Provider Lock-In Anxiety

Every architectural decision came with a haunting question: "What if we need to switch providers later?"

OpenAI had an outage. Anthropic changed their API. Gemini launched a feature we needed. Each time, teams hesitated because switching meant rewriting integration code, retesting error handling, and retraining the team.

We weren't choosing the best AI for the job. We were choosing the AI that would minimize migration work.

The Internal Project That Changed Everything

In June 2024, a small team of three engineers got permission to build something experimental: a unified AI client that could route to any provider through a single, consistent API.

The requirements were simple:

One import regardless of which provider you used
Identical error handling across all providers
Automatic failover when a provider went down
Cost optimization without code changes
Full TypeScript safety with IntelliSense support

We called it "NeuroLink"—the idea being that AI intelligence flows like signals through a nervous system, and we needed a unified layer to carry those signals wherever they needed to go.

The Architecture Decisions That Mattered

TypeScript-First (Not TypeScript-Compatible)

Most AI SDKs are written in Python first, with TypeScript bindings added later. The types are often loose. The streaming interfaces feel bolted on.

We built NeuroLink in TypeScript from day one:

// Everything is fully typed
const result = await neurolink.generate({
  input: { text: "Hello" },
  provider: "anthropic",
  model: "claude-3-5-sonnet-20241022", // Autocomplete shows all available models
});

// result is fully typed - content, token counts, finish reason
console.log(result.content);
console.log(result.usage?.inputTokens);

No any types. No "check the documentation for response shape." If it compiles, the request and response shapes match.

Provider-Agnostic by Design

We didn't build an "OpenAI client with fallback." We built a unified protocol that normalizes every provider into a common interface:

// The same code works with any provider
await neurolink.generate({
  input: { text: "Analyze this" },
  provider: "openai",    // GPT-4o
});

await neurolink.generate({
  input: { text: "Analyze this" },
  provider: "anthropic", // Claude
});

await neurolink.generate({
  input: { text: "Analyze this" },
  provider: "vertex",    // Gemini
});

The differences between providers (message format, function calling syntax, error structures) are handled internally. Your code stays clean.
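One concrete example of such a difference is where the system prompt goes. OpenAI's Chat Completions API accepts it as a `system` message inside `messages`, while Anthropic's Messages API takes a top-level `system` field and requires `max_tokens`. A sketch of the conversion, using a hypothetical provider-neutral `ChatMessage` type (this is not NeuroLink's internal code):

```typescript
// Hypothetical provider-neutral chat message.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// OpenAI Chat Completions: system prompts stay inline in the messages array.
function toOpenAIBody(model: string, messages: readonly ChatMessage[]) {
  return { model, messages: messages.map((m) => ({ role: m.role, content: m.content })) };
}

// Anthropic Messages: system prompt moves to a top-level field; max_tokens is required.
function toAnthropicBody(model: string, messages: readonly ChatMessage[], maxTokens = 1024) {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n\n");
  const rest = messages
    .filter((m) => m.role !== "system")
    .map((m) => ({ role: m.role, content: m.content }));
  return { model, max_tokens: maxTokens, ...(system ? { system } : {}), messages: rest };
}
```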

MCP Native from the Start

When we learned about the Model Context Protocol (MCP), we realized it was the missing piece. AI tools shouldn't be tied to a specific provider. They should be infrastructure that any AI can use.

We built MCP support directly into the core:

// Add GitHub as a tool - works with ANY provider
await neurolink.addExternalMCPServer("github", {
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-github"],
  transport: "stdio",
  env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
});

// Claude can use it
await neurolink.generate({
  input: { text: "Create a GitHub issue" },
  provider: "anthropic",
});

// So can GPT-4
await neurolink.generate({
  input: { text: "Create a GitHub issue" },
  provider: "openai",
});

Tools became portable. Teams could share MCP servers across projects without worrying about which LLM was being used.

Intelligent Orchestration

We didn't want engineers to hardcode provider choices. We wanted the system to be smart:

const neurolink = new NeuroLink({
  enableOrchestration: true,
});

// NeuroLink automatically selects the best provider
// based on cost, availability, and task complexity
const result = await neurolink.generate({
  input: { text: "Summarize this legal document" },
  // No provider specified - intelligent routing
});

The orchestration layer considers:

Cost: Use cheaper models for simple tasks
Capability: Route PDF processing to providers with native support
Availability: Fail over automatically during outages
Latency: Choose the fastest provider for real-time features
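A toy version of that decision, assuming each candidate carries a health flag, a feature set, a blended price, and a median latency (all hypothetical fields with placeholder numbers; NeuroLink's actual orchestration logic is not shown here). It filters on availability and capability, then minimizes cost, or latency for real-time tasks:

```typescript
// Hypothetical candidate description.
interface Candidate {
  provider: string;
  costPerMTokens: number; // blended USD per 1M tokens (placeholder)
  p50LatencyMs: number;
  healthy: boolean;
  features: ReadonlySet<string>; // e.g. "text", "pdf"
}

interface Task {
  requiredFeatures: readonly string[];
  latencySensitive: boolean;
}

// Availability and capability act as hard filters; cost or latency decides among the rest.
function selectProvider(task: Task, candidates: readonly Candidate[]): Candidate {
  const eligible = candidates.filter(
    (c) => c.healthy && task.requiredFeatures.every((f) => c.features.has(f)),
  );
  if (eligible.length === 0) throw new Error("No healthy provider supports this task");
  const metric = task.latencySensitive
    ? (c: Candidate) => c.p50LatencyMs
    : (c: Candidate) => c.costPerMTokens;
  return eligible.reduce((best, c) => (metric(c) < metric(best) ? c : best));
}
```

A production router would likely blend these signals into a weighted score rather than switch on a single flag, but the filter-then-rank structure stays the same.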

Engineers stopped thinking about "which provider" and started thinking about "what task."

From Internal Tool to Open Source

By August 2024, NeuroLink was powering AI features across all Juspay products. New integrations that used to take 2-3 weeks were taking 2-3 hours. The math was undeniable.

But we kept thinking: "Every company building with AI is facing this same fragmentation problem."

The decision to open-source wasn't just about being good open-source citizens (though that mattered). It was about creating a standard. If we wanted to hire engineers who already knew NeuroLink, we needed to release it. If we wanted vendors to integrate with our tooling, we needed to be open.

In September 2024, we released NeuroLink on GitHub under the MIT license.

The Impact: Before and After

Here's what changed at Juspay after NeuroLink became our standard:

Metric	Before	After
New AI integration time	2-3 weeks	2-3 hours
Lines of integration code per feature	500+	~50
Provider switch cost	Full rewrite	1 parameter change
Credential management	7 different systems	1 unified config
Onboarding time	3 days (SDK training)	30 minutes
Production incidents (AI-related)	12/quarter	2/quarter

The incident reduction was the surprise benefit. When you have one error handling pattern instead of seven, you get really good at handling those errors.

The Vision: AI Should Be Infrastructure, Not Integration

We're building toward a future where AI is as easy to use as any other infrastructure service.

Think about databases. You don't import pg-sdk, mysql-sdk, and mongo-sdk in the same project. You use an ORM or a query builder that abstracts the differences. You choose PostgreSQL or MySQL based on your needs, not based on which SDK you prefer.

AI should work the same way. The provider is an implementation detail. Your code should focus on the task, not the transport layer.

NeuroLink is our step toward that future:

13+ providers unified under one API
58+ MCP tools that work everywhere
TypeScript-first design for developer confidence
Production-ready features like Redis memory and HITL workflows
Cost optimization that happens automatically

Try What We Built

# Install and setup in under 5 minutes
npm install @juspay/neurolink
npx @juspay/neurolink setup

# Generate with automatic provider selection
npx @juspay/neurolink generate "Hello from NeuroLink"

Or use it in your TypeScript project:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Your prompt here" },
});

From weeks of integration work to hours. From SDK complexity to clean abstraction. From provider lock-in to complete flexibility.

That's why we built NeuroLink. And that's why we think you'll want to use it.

Stop Using 5 Different AI SDKs in Your TypeScript Project

NeuroLink AI — Mon, 06 Apr 2026 08:38:49 +0000

You're creating tech debt for no reason. Here's how to fix it.

Let me guess: your package.json looks something like this right now:

{
  "dependencies": {
    "openai": "^4.0.0",
    "@anthropic-ai/sdk": "^0.24.0",
    "@google/generative-ai": "^0.21.0",
    "@aws-sdk/client-bedrock-runtime": "^3.0.0",
    "@azure/openai": "^2.0.0"
  }
}

Five different SDKs. Five different import styles. Five different response formats. Five different ways to handle streaming. Five different error handling patterns.

And for what? To talk to LLMs that fundamentally do the same thing: take text in, return text out.

You're not being pragmatic. You're being a pack rat, collecting SDKs like they're going out of style. Let's talk about why this is costing you more than you think.

The Real Cost of SDK Fragmentation

Bundle Size Bloating

Each SDK adds weight. OpenAI's SDK alone is ~200KB. Anthropic's is another ~150KB. By the time you've imported all five, you've added more than a megabyte to your bundle just for HTTP wrappers around JSON APIs.

// Your current bundle impact:
// openai: ~200KB
// @anthropic-ai/sdk: ~150KB
// @google/generative-ai: ~180KB
// @aws-sdk/client-bedrock-runtime: ~300KB
// @azure/openai: ~250KB
// Total: ~1.08MB of SDK overhead

That's before you write a single line of application code.

The Mental Model Tax

Every SDK has its own quirks:

SDK	Streaming Pattern	Error Shape	Auth Method
OpenAI	`for await...of`	`error.message`	`apiKey` param
Anthropic	`stream.on()`	`error.error.message`	`apiKey` param (sent as `x-api-key` header)
Google AI	`async generator`	`error.message`	API key passed to `new GoogleGenerativeAI()`
Bedrock	`response.body`	SDK-specific	AWS credentials
Azure	`stream.iterator()`	Nested error	`azureApiKey` + endpoint

You need to remember which is which. Your team needs documentation for each. Code reviews become a game of "did you handle the Anthropic error format correctly this time?"

Inconsistent Error Handling

Here's what error handling looks like across different SDKs:

// OpenAI
try {
  const response = await openai.chat.completions.create({...});
} catch (error) {
  // error is an APIError with nested props
  console.log(error.message);
  console.log(error.code); // rate_limit_exceeded, etc.
}

// Anthropic
try {
  const response = await anthropic.messages.create({...});
} catch (error) {
  // error is an AnthropicError
  // need error.error for details
  console.log(error.error?.message);
  console.log(error.error?.type); // rate_limit_error, etc.
}

// Google AI
try {
  const result = await model.generateContentStream(...);
} catch (error) {
  // Google wraps errors differently
  console.log(error.message);
  // No standardized error codes
}

You end up writing adapter layers anyway. So why not use one that's already built?
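That adapter layer usually boils down to reading an HTTP status from each SDK's error and mapping it to a shared code. A sketch, assuming errors carry a numeric `status` (as the OpenAI and Anthropic SDK error classes do) or AWS SDK v3's `$metadata.httpStatusCode`; `UnifiedError` and `normalizeError` are hypothetical names:

```typescript
// Hypothetical unified error shape.
interface UnifiedError {
  provider: string;
  message: string;
  status?: number;
  code: "rate_limited" | "auth" | "bad_request" | "server" | "unknown";
  retryable: boolean;
}

// Read an HTTP status from common shapes: `status` or AWS SDK v3's `$metadata.httpStatusCode`.
function statusOf(err: unknown): number | undefined {
  if (typeof err !== "object" || err === null) return undefined;
  const e = err as { status?: unknown; $metadata?: { httpStatusCode?: unknown } };
  if (typeof e.status === "number") return e.status;
  const aws = e.$metadata?.httpStatusCode;
  return typeof aws === "number" ? aws : undefined;
}

function normalizeError(provider: string, err: unknown): UnifiedError {
  const status = statusOf(err);
  const message = err instanceof Error ? err.message : String(err);
  let code: UnifiedError["code"] = "unknown";
  if (status === 429) code = "rate_limited";
  else if (status === 401 || status === 403) code = "auth";
  else if (status !== undefined && status >= 400 && status < 500) code = "bad_request";
  else if (status !== undefined && status >= 500) code = "server";
  // Rate limits and server-side failures are usually transient; client errors are not.
  return { provider, message, status, code, retryable: code === "rate_limited" || code === "server" };
}
```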

Testing Multiplies

Every SDK needs its own test setup. Mocking OpenAI responses? Different from mocking Anthropic. Testing streaming? Five different patterns to validate. Integration tests? You need real credentials for each provider.

Your CI pipeline thanks you for the complexity.

The Provider Switching Myth

"But I need to support multiple providers for redundancy!"

Sure. But you don't need five SDKs for that. You need one SDK that understands how to route between providers. The abstraction should happen at the integration layer, not in your application code.

Here's what "provider redundancy" looks like with multiple SDKs:

// The nightmare you wrote
async function generateWithFallback(prompt: string) {
  try {
    return await callOpenAI(prompt);
  } catch (e) {
    console.log("OpenAI failed, trying Anthropic...");
    try {
      return await callAnthropic(prompt);
    } catch (e) {
      console.log("Anthropic failed, trying Google...");
      return await callGoogle(prompt);
    }
  }
}

Nested try-catch hell. Hardcoded fallback order. No cost optimization. No intelligent routing. Just desperation-driven retry logic.
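For comparison, here is roughly what a routing layer does underneath: the nested try/catch collapses into a loop over an ordered list, and every failure is kept for diagnosis. `withFallback` is a hypothetical helper, not a NeuroLink API; its ordering is still hardcoded, which is exactly what cost- and health-aware routing replaces:

```typescript
// One ordered attempt per provider; `run` wraps whatever SDK call you already have.
interface Attempt<T> {
  name: string;
  run: () => Promise<T>;
}

// Try each attempt in order; return the first success, or throw with every failure attached.
async function withFallback<T>(attempts: ReadonlyArray<Attempt<T>>): Promise<T> {
  const errors: unknown[] = [];
  for (const attempt of attempts) {
    try {
      return await attempt.run();
    } catch (err) {
      errors.push(err);
      console.warn(`${attempt.name} failed, trying the next provider...`);
    }
  }
  throw new AggregateError(errors, "All providers failed");
}
```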

The Unified Alternative

What if you could do this instead?

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Works with any provider - same API
const result = await neurolink.generate({
  input: { text: "Explain quantum computing" },
  provider: "openai", // or "anthropic", "vertex", "bedrock", "azure"...
});

console.log(result.content);

That's it. Same code. Same error handling. Same streaming pattern. Just change one parameter to switch providers.

Streaming: One Pattern, Every Provider

Remember the streaming chaos? Here's what unified streaming looks like:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Same streaming pattern for ALL 13 providers
const result = await neurolink.stream({
  input: { text: "Write a story" },
  provider: "anthropic", // or any other provider
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

No more memorizing stream.on('data') vs for await...of vs response.body.pipe(). One pattern. Every provider.

Automatic Provider Fallback (That Actually Works)

const neurolink = new NeuroLink({
  enableOrchestration: true, // Smart routing + failover
});

// NeuroLink automatically:
// 1. Selects the optimal provider
// 2. Falls back if one fails
// 3. Optimizes for cost when appropriate
const result = await neurolink.generate({
  input: { text: "Analyze this data" },
  // No provider specified - intelligent auto-selection
});

No nested try-catch. No manual failover logic. No hardcoded provider preferences. Just intelligent routing that works.

The Bundle Size Reality Check

NeuroLink: ~150KB total for 13+ providers.

Your current setup: ~1MB+ for 5 providers.

And with NeuroLink, adding provider #6, #7, #13 costs you zero additional bundle size. The provider routing happens server-side or through a unified client. You're not importing SDK bloat for providers you might use once a month.

Error Handling That Makes Sense

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

try {
  const result = await neurolink.generate({
    input: { text: "Hello" },
    provider: "openai",
  });
} catch (error) {
  // Same error structure regardless of provider
  console.log(error.message);
  console.log(error.provider); // Which provider failed
  console.log(error.code); // Standardized error codes
  console.log(error.retryable); // Can we retry?
}

One error format. Standardized codes. Provider-agnostic handling. Your error monitoring tools will thank you.
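A boolean `retryable` flag is what lets retry logic be written once. A minimal sketch, assuming errors expose `retryable` as in the example above; `retryIfRetryable` and its options are hypothetical names:

```typescript
// Retry only errors flagged retryable, with exponential backoff plus jitter.
async function retryIfRetryable<T>(
  fn: () => Promise<T>,
  { maxAttempts = 3, baseDelayMs = 200 }: { maxAttempts?: number; baseDelayMs?: number } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const retryable =
        typeof err === "object" && err !== null && (err as { retryable?: unknown }).retryable === true;
      if (!retryable || attempt >= maxAttempts) throw err;
      // 1x, 2x, 4x ... the base delay, scaled by a random factor in [0.5, 1)
      const delay = baseDelayMs * 2 ** (attempt - 1) * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Wrapping a call as `retryIfRetryable(() => neurolink.generate({ ... }))` would then retry rate limits and transient outages while failing fast on bad requests.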

Tools Without the Configuration Tax

Adding tools to OpenAI vs Anthropic? Different parameter structures. Different function calling formats. Different response parsing.

With NeuroLink:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  tools: [
    {
      name: "getWeather",
      description: "Get weather for a location",
      parameters: {
        type: "object",
        properties: {
          location: { type: "string" },
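        },
        required: ["location"], // The model must always supply a location
      },
      execute: async ({ location }: { location: string }) => {
        return await fetchWeather(location); // fetchWeather: your own weather lookup, defined elsewhere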
      },
    },
  ],
});

// Works identically across all providers
const result = await neurolink.generate({
  input: { text: "What's the weather in Tokyo?" },
  provider: "vertex", // or "anthropic", "bedrock", etc.
});

One tool definition. Universal compatibility. No provider-specific format conversions.

MCP: The Tool Ecosystem You Didn't Know You Needed

NeuroLink ships with 6 built-in tools and supports 58+ external MCP servers:

// GitHub MCP server - works across all providers
await neurolink.addExternalMCPServer("github", {
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-github"],
  transport: "stdio",
  env: { GITHUB_TOKEN: process.env.GITHUB_TOKEN },
});

// Now AI can create issues, list repos, create PRs
const result = await neurolink.generate({
  input: { text: 'Create a GitHub issue for this bug' },
  provider: "anthropic", // MCP works with any provider
});

Your tools aren't tied to a provider. They're infrastructure.

What You're Actually Defending

When you say "I need separate SDKs for flexibility," what you're actually saying is:

"I enjoy writing adapter code"
"I like debugging why Anthropic's error format broke my handler again"
"Bundle size doesn't matter" (it does)
"My team enjoys context-switching between 5 documentation sites"
"I prefer writing 5 different test mocks"

You're not preserving flexibility. You're preserving complexity for its own sake.

The Migration Path

"But I already have code using these SDKs!"

Fine. Keep it. But ask yourself: every new feature you build, every new AI integration you add—do you want to keep multiplying your SDK dependencies? Or do you want to consolidate?

// Legacy code - keep it working
import OpenAI from "openai";
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// New code - use NeuroLink
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();

// Gradually migrate. No big-bang rewrite required.

Start new features with NeuroLink. Migrate legacy code when you touch it. In 6 months, you'll wonder why you ever managed five SDKs.

The Hard Truth

The AI landscape is consolidating. Providers are becoming commodities. The value isn't in which LLM you use—it's in how you use them.

Your job isn't to be an expert in OpenAI's SDK quirks or Anthropic's response format. Your job is to build products that solve problems. Every hour spent debugging SDK differences is an hour not spent on your actual product.

Stop collecting SDKs like Pokémon cards. Start building with a unified platform.

Try It In 30 Seconds

# Install once, get 13+ providers
npm install @juspay/neurolink

# Run the setup wizard (configures your API keys)
npx @juspay/neurolink setup

# Generate with any provider
npx @juspay/neurolink generate "Hello world" --provider openai
npx @juspay/neurolink generate "Hello world" --provider anthropic
npx @juspay/neurolink generate "Hello world" --provider vertex

Same command. Same interface. Different providers. Zero cognitive overhead.

Building a Slack AI Assistant with NeuroLink: From Prototype to Production

NeuroLink AI — Mon, 06 Apr 2026 08:37:49 +0000

Internal support consumes engineering time. At Juspay, our 500+ engineers constantly asked questions like:

"What's the status of the Euler payment API?"
"How do I get credentials for the sandbox environment?"
"Who owns the HyperSDK Android module?"
"Deploy the latest Breeze release to staging"

These questions needed answers, but pulling engineers from deep work was expensive. We needed an AI assistant that could:

Answer questions using our internal knowledge
Execute actions (deployments, credential provisioning)
Remember conversation context across sessions
Integrate with our existing tools (Jira, Bitbucket, Kubernetes)

Meet Tara — our Slack AI assistant built with NeuroLink and Claude Sonnet.

Architecture Overview

┌──────────────┐     ┌──────────────┐     ┌─────────────────┐
│   Slack      │────▶│  Slack Bolt  │────▶│   Tara Service  │
│   Message    │     │   App        │     │   (FastAPI)     │
└──────────────┘     └──────────────┘     └────────┬────────┘
                                                   │
                          ┌────────────────────────┼────────────────────────┐
                          ▼                        ▼                        ▼
                   ┌──────────────┐      ┌─────────────────┐      ┌──────────────┐
                   │  NeuroLink   │      │  MCP Servers    │      │   Redis      │
                   │  SDK         │      │  - Jira         │      │   Memory     │
                   │  (Claude)    │      │  - Bitbucket    │      │              │
                   └──────────────┘      │  - K8s          │      └──────────────┘
                                          └─────────────────┘

Getting Started: The Prototype

Our first version was surprisingly simple. Here's the core loop:

import { NeuroLink } from "@juspay/neurolink";
import { App } from "@slack/bolt";

// Initialize NeuroLink with Claude
const neurolink = new NeuroLink({
  conversationMemory: {
    enabled: true,
    enableSummarization: true, // Auto-summarize long conversations
  },
});

// Slack Bolt app
const slack = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});

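// Handle direct messages and mentions
slack.event("app_mention", async ({ event, say }) => {
  await handleMessage(event.user, event.text, say);
});

slack.event("message", async ({ event, say }) => {
  // Only plain user DMs: skip edits, deletions, and bot messages (including Tara's own replies)
  if (event.subtype === undefined && event.channel_type === "im" && !event.bot_id) {
    await handleMessage(event.user, event.text ?? "", say);
  }
});

(async () => {
  await slack.start(Number(process.env.PORT) || 3000);
  console.log("Tara is running");
})();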

Conversation Handling with Memory

The magic of Tara is maintaining context. NeuroLink's conversation memory handles this automatically:

async function handleMessage(
  userId: string,
  text: string,
  say: (text: string) => Promise<void>
) {
  // Stream the response for better UX
  const result = await neurolink.stream({
    input: { text },
    provider: "anthropic",
    model: "claude-4-sonnet",
    user: userId, // Enables per-user memory automatically
    system: `You are Tara, Juspay's AI assistant. You help engineers with:
             - Finding documentation and code
             - Checking deployment status
             - Answering questions about services
             - Creating Jira tickets and PRs

             Be concise and helpful. If you need to take action,
             use the available tools.`,
    enableOrchestration: true, // Allow tool use
  });

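  // Stream chunks back to Slack
  // (each say() posts a new message; Slack's chat.update can edit one message in place instead)
  let response = "";
  let lastSentLength = 0;
  for await (const chunk of result.stream) {
    if ("content" in chunk) {
      response += chunk.content;
      // Post progress once at least 100 new characters have arrived
      // (chunk sizes vary, so don't rely on exact multiples of 100)
      if (response.length - lastSentLength >= 100) {
        await say(response + "⏳");
        lastSentLength = response.length;
      }
    }
  }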

  await say(response);
}
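Chunks arrive in uneven sizes, so progress updates are easier to reason about when they are throttled by how much new text has accumulated since the last post. Below is a minimal sketch of that loop as a reusable function. `streamWithThrottle` is a hypothetical helper, the chunk shape (an optional `content` string) is assumed from the loop above, and `onUpdate` stands in for whatever posts or edits the Slack message.

```typescript
// Assumed chunk shape, based on the `"content" in chunk` check above.
type StreamChunk = { content?: string };

/**
 * Hypothetical helper (not a NeuroLink or Bolt API).
 * Accumulates streamed text and calls `onUpdate` whenever at least
 * `minDelta` new characters have arrived since the previous update.
 * Resolves with the full response once the stream ends.
 */
async function streamWithThrottle(
  stream: AsyncIterable<StreamChunk>,
  onUpdate: (partial: string) => Promise<void>,
  minDelta = 100,
): Promise<string> {
  let response = "";
  let lastFlushedLength = 0;
  for await (const chunk of stream) {
    if (chunk.content === undefined) continue; // skip non-text chunks
    response += chunk.content;
    if (response.length - lastFlushedLength >= minDelta) {
      lastFlushedLength = response.length;
      await onUpdate(response + "⏳");
    }
  }
  return response;
}
```

Because the threshold is a difference rather than a modulus, an update fires no matter where the chunk boundaries fall.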

Adding Tool Capabilities

Tara becomes powerful when she can actually do things. We added MCP servers for our internal tools:

// Jira integration for ticket creation
await neurolink.addExternalMCPServer("jira", {
  transport: "stdio",
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-jira"],
  env: {
    JIRA_TOKEN: process.env.JIRA_TOKEN,
    JIRA_HOST: "https://juspay.atlassian.net",
  },
});

// Kubernetes for deployment status
await neurolink.addExternalMCPServer("k8s", {
  transport: "stdio",
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-kubernetes"],
});

// Internal API server (custom MCP)
await neurolink.addExternalMCPServer("juspay-api", {
  transport: "http",
  url: "https://internal-api.juspay.net/mcp",
  headers: {
    Authorization: `Bearer ${process.env.INTERNAL_API_TOKEN}`,
  },
});

Now users can say things like:

"Create a Jira ticket for the HyperSDK crash on Android"

And Tara will:

Use the Jira tool to create the ticket
Return the ticket URL
Remember the ticket ID for follow-up questions

Structured Commands with Zod

For common operations, we use structured output to ensure reliability:

import { z } from "zod";

const DeploymentRequest = z.object({
  service: z.enum(["euler", "breeze", "hyper-sdk", "neurolink"]),
  environment: z.enum(["dev", "staging", "prod"]),
  version: z.string(),
  confirm: z.boolean(),
});

async function handleDeploymentRequest(userId: string, text: string) {
  const result = await neurolink.generate({
    input: {
      text: `Parse this deployment request: "${text}"`,
    },
    provider: "anthropic",
    model: "claude-4-haiku",
    schema: DeploymentRequest,
    output: { format: "json" },
  });

  const deployment = result.parsed as z.infer<typeof DeploymentRequest>;

  if (!deployment.confirm) {
    return `You want to deploy ${deployment.service} v${deployment.version} to ${deployment.environment}. Reply "yes" to confirm.`;
  }

  // Execute deployment via MCP
  await neurolink.generate({
    input: {
      text: `Deploy ${deployment.service} version ${deployment.version} to ${deployment.environment}`,
    },
  });

  return `✅ Deployment initiated for ${deployment.service} v${deployment.version} to ${deployment.environment}`;
}
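In a chat, the user's "yes" usually arrives as a separate message, so the bot has to remember what it asked them to confirm. A minimal sketch of that state, assuming an in-process `Map` keyed by Slack user ID with a five-minute expiry; `PendingDeployments` is a hypothetical class, and a multi-instance deployment would more likely keep this state in Redis.

```typescript
// Mirrors the fields of the DeploymentRequest schema above (minus `confirm`).
interface Deployment {
  service: string;
  environment: "dev" | "staging" | "prod";
  version: string;
}

/** Hypothetical per-user store of deployments awaiting confirmation. */
class PendingDeployments {
  private pending = new Map<string, { deployment: Deployment; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  /** Remember a proposed deployment and return the confirmation prompt. */
  propose(userId: string, deployment: Deployment, now = Date.now()): string {
    this.pending.set(userId, { deployment, expiresAt: now + this.ttlMs });
    return `Deploy ${deployment.service} v${deployment.version} to ${deployment.environment}? Reply "yes" to confirm.`;
  }

  /** Return the deployment only if the reply confirms an unexpired proposal. */
  confirm(userId: string, reply: string, now = Date.now()): Deployment | undefined {
    const entry = this.pending.get(userId);
    this.pending.delete(userId); // each proposal can be answered at most once
    if (!entry || now > entry.expiresAt) return undefined;
    return reply.trim().toLowerCase() === "yes" ? entry.deployment : undefined;
  }
}
```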

Multi-Modal Support: Screenshots and Logs

Engineers often share screenshots of errors or paste log snippets. Tara handles these with NeuroLink's multimodal capabilities:

slack.event("message", async ({ event, say }) => {
  if (event.files && event.files.length > 0) {
    // Download the attachments locally (downloadSlackFiles is an app-specific helper, not shown)
    const filePaths = await downloadSlackFiles(event.files);

    const result = await neurolink.generate({
      input: {
        text: "What's in this screenshot? If it's an error, suggest fixes.",
        files: filePaths,
      },
      provider: "google-ai",
      model: "gemini-2.5-pro", // Vision-capable model
    });

    await say(result.content);
  }
});
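Screenshots and pasted logs usually call for different prompts, so it can help to sort attachments by type before downloading them. A sketch, assuming each Slack file object exposes `name` and `mimetype` (Slack's file objects include both); `partitionSlackFiles` is a hypothetical helper, not part of Bolt or NeuroLink.

```typescript
// Minimal subset of a Slack file object used here.
interface SlackFileLike {
  name: string;
  mimetype: string;
}

/** Hypothetical helper: split attachments into images, text/log files, and everything else. */
function partitionSlackFiles(files: SlackFileLike[]) {
  const images: SlackFileLike[] = [];
  const textFiles: SlackFileLike[] = [];
  const other: SlackFileLike[] = [];
  for (const file of files) {
    if (file.mimetype.startsWith("image/")) {
      images.push(file); // candidates for the vision model
    } else if (file.mimetype.startsWith("text/") || /\.(log|txt)$/i.test(file.name)) {
      textFiles.push(file); // logs are often uploaded with a generic mimetype
    } else {
      other.push(file);
    }
  }
  return { images, textFiles, other };
}
```

Images could then go to the vision-capable model as above, while log files might be read as text and placed directly in the prompt.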

Advanced Features

1. RAG for Documentation

Tara answers questions about our internal docs using RAG:

const answer = await neurolink.generate({
  input: { text: "How does the Euler payment flow work?" },
  rag: {
    files: [
      "./docs/euler/architecture.md",
      "./docs/euler/payment-flow.md",
      "./docs/euler/webhooks.md",
    ],
    strategy: "markdown",
    topK: 5,
  },
});
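The `strategy: "markdown"` option implies that documents are chunked along their heading structure before retrieval, with `topK` choosing how many chunks reach the prompt. The sketch below illustrates only the heading-based idea; `chunkMarkdown` is hypothetical, and NeuroLink's actual chunker may also split long sections by size or handle code fences differently.

```typescript
interface MarkdownChunk {
  heading: string; // nearest heading above the chunk ("" for text before any heading)
  text: string;
}

/**
 * Hypothetical chunker: start a new chunk at every ATX heading (1-6 '#').
 * Limitation: '#' lines inside fenced code blocks would also trigger a split.
 */
function chunkMarkdown(markdown: string): MarkdownChunk[] {
  const chunks: MarkdownChunk[] = [];
  let heading = "";
  let lines: string[] = [];
  const flush = () => {
    const text = lines.join("\n").trim();
    if (text.length > 0) chunks.push({ heading, text });
    lines = [];
  };
  for (const line of markdown.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = (match[1] ?? "").trim();
    }
    lines.push(line);
  }
  flush();
  return chunks;
}
```

Keeping each heading inside its chunk gives the retriever a short summary of what the section is about.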

2. Human-in-the-Loop for Sensitive Actions

For destructive operations, we require approval:

const neurolink = new NeuroLink({
  hitl: {
    enabled: true,
    requireApproval: ["deployToProduction", "deleteDatabase", "revokeCredentials"],
    reviewCallback: async (action, context) => {
      // Post to admin Slack channel for approval
      return await requestSlackApproval(action, context.user);
    },
  },
});
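Conceptually, the `hitl` block puts a gate in front of tool execution: listed actions wait for a reviewer, and everything else runs directly. A framework-free sketch of that gate, in which `withApproval`, `ToolCall`, and the callback shape are hypothetical stand-ins rather than NeuroLink's internals.

```typescript
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
  user: string;
}

type ReviewCallback = (call: ToolCall) => Promise<boolean>;

/**
 * Hypothetical wrapper: actions listed in `requireApproval` run only after
 * `review` approves them; all other actions run immediately.
 */
function withApproval<T>(
  execute: (call: ToolCall) => Promise<T>,
  requireApproval: readonly string[],
  review: ReviewCallback,
): (call: ToolCall) => Promise<T> {
  const gated = new Set(requireApproval);
  return async (call) => {
    if (gated.has(call.name) && !(await review(call))) {
      throw new Error(`Action "${call.name}" was rejected by a reviewer`);
    }
    return execute(call);
  };
}
```

Rejecting with an error (rather than silently skipping) lets the agent report back to the user that the action was declined.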

3. Cost Optimization with Model Routing

We use different models for different tasks:

// Simple queries: fast, cheap model
const quickAnswer = await neurolink.generate({
  input: { text: "Which time zone is Bangalore in?" },
  provider: "google-ai",
  model: "gemini-2.5-flash",
});

// Complex analysis: reasoning model
const architectureReview = await neurolink.generate({
  input: { text: "Review this system design..." },
  provider: "anthropic",
  model: "claude-4-opus",
  thinkingConfig: { thinkingLevel: "high" },
});
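How a query gets classified as "simple" or "complex" is not described here. A deliberately naive heuristic sketch follows, in which `pickModel`, the keyword list, and the 400-character cutoff are all assumptions for illustration; a real router might use a small classifier model instead.

```typescript
interface ModelChoice {
  provider: string;
  model: string;
}

// Hypothetical signals that a request needs deeper reasoning.
const COMPLEX_HINTS = ["review", "design", "architecture", "debug", "trade-off", "why"];

/** Hypothetical router: long or analysis-heavy prompts go to the reasoning model. */
function pickModel(prompt: string, maxSimpleLength = 400): ModelChoice {
  const lower = prompt.toLowerCase();
  const looksComplex =
    prompt.length > maxSimpleLength || COMPLEX_HINTS.some((hint) => lower.includes(hint));
  return looksComplex
    ? { provider: "anthropic", model: "claude-4-opus" }
    : { provider: "google-ai", model: "gemini-2.5-flash" };
}
```

Erring toward the expensive model on ambiguous prompts trades some cost for answer quality.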

Production Deployment

We run Tara as a containerized service with the following configuration:

// production-config.ts
export const taraConfig = {
  neurolink: {
    conversationMemory: {
      enabled: true,
      redisConfig: {
        host: process.env.REDIS_HOST,
        port: 6379,
        ttl: 86400 * 30, // 30-day retention
      },
    },
    // Multi-provider failover
    fallbackProviders: ["anthropic", "google-ai", "vertex"],
  },
  slack: {
    port: 3000,
    logLevel: "info",
  },
  // Rate limiting per user
  rateLimit: {
    requestsPerMinute: 20,
    burstSize: 5,
  },
};
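The `rateLimit` block reads naturally as a per-user token bucket: tokens refill at `requestsPerMinute`, and `burstSize` caps how many requests can fire back to back. A minimal in-memory sketch under that interpretation; `UserRateLimiter` is hypothetical, and with several containers the buckets would need to live in shared storage such as Redis.

```typescript
interface RateLimitConfig {
  requestsPerMinute: number;
  burstSize: number;
}

/** Hypothetical per-user token bucket limiter. */
class UserRateLimiter {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly refillPerMs: number;
  private readonly capacity: number;

  constructor(config: RateLimitConfig) {
    this.refillPerMs = config.requestsPerMinute / 60_000;
    this.capacity = config.burstSize;
  }

  /** Returns true (and spends one token) if `userId` may make a request at time `now`. */
  tryAcquire(userId: string, now = Date.now()): boolean {
    const bucket = this.buckets.get(userId) ?? { tokens: this.capacity, updatedAt: now };
    // Refill in proportion to elapsed time, never beyond the burst capacity.
    bucket.tokens = Math.min(this.capacity, bucket.tokens + (now - bucket.updatedAt) * this.refillPerMs);
    bucket.updatedAt = now;
    this.buckets.set(userId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

With 20 requests per minute, an empty bucket regains one token every 3 seconds.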

Results

After deploying Tara to our engineering organization:

Metric	Before	After
Avg. support response time	4 hours	30 seconds
Tickets created correctly	N/A	98%
Engineer satisfaction	65%	92%
Cost per interaction	$2.50 (human)	$0.03 (AI)

Key Learnings

Conversation memory is essential: Users expect context continuity. Redis-backed memory made Tara feel truly intelligent.
Streaming improves perception: Even if total time is the same, streaming responses feel faster and more engaging.
Tool use requires guardrails: Start with read-only tools, add write operations gradually with HITL.
Model selection matters: Routing simple queries to cheaper models cut costs by 75% without quality loss.
MCP > Custom integrations: Using standard MCP servers for Jira, K8s, etc. meant we spent days, not weeks, on integrations.

Getting Started

Want to build your own Slack assistant? Here's the minimal setup:

import { NeuroLink } from "@juspay/neurolink";
import { App } from "@slack/bolt";

const neurolink = new NeuroLink({
  conversationMemory: { enabled: true },
});

const slack = new App({
  token: process.env.SLACK_BOT_TOKEN,
  signingSecret: process.env.SLACK_SIGNING_SECRET,
});

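slack.event("message", async ({ event, say }) => {
  // Ignore edits, joins, and bot messages (including this bot's own replies) to avoid reply loops
  if (event.subtype !== undefined || "bot_id" in event || !event.text) return;
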
  const result = await neurolink.stream({
    input: { text: event.text },
    provider: "anthropic",
    model: "claude-4-sonnet",
    user: event.user,
  });

  let response = "";
  for await (const chunk of result.stream) {
    if ("content" in chunk) response += chunk.content;
  }
  await say(response);
});

await slack.start(3000);

Conclusion

Building Tara with NeuroLink let us create a production-ready AI assistant in days, not months. The combination of Claude's reasoning, NeuroLink's memory management, and MCP's tool ecosystem gave us everything we needed to automate internal support at scale.

If you're considering an internal AI assistant, start with NeuroLink — the unified API means you can experiment with different models and tools without rewriting your integration code.

NeuroLink — The Universal AI SDK for TypeScript

GitHub: github.com/juspay/neurolink
Install: npm install @juspay/neurolink
Docs: docs.neurolink.ink
Blog: blog.neurolink.ink — 150+ technical articles

How We Built an AI Code Reviewer with NeuroLink and Bitbucket

NeuroLink AI — Mon, 06 Apr 2026 08:36:50 +0000

At Juspay, we process thousands of pull requests across 100+ repositories every month. Code review bottlenecks were slowing our release velocity, and we needed a solution that could:

Understand our domain-specific patterns and conventions
Integrate seamlessly with Bitbucket and Jira
Learn from past reviews to improve over time
Run entirely within our infrastructure for security

Enter Yama — our AI-native code review tool built on NeuroLink, the universal AI SDK for TypeScript. This is the story of how we built it.

The Architecture Decision

We evaluated several approaches:

Off-the-shelf AI code review tools: Great for generic checks, but couldn't understand our Haskell payment systems or custom conventions
Direct LLM API integration: Would require building provider abstraction, memory management, and tool integration from scratch
NeuroLink with MCP: Best of both worlds — provider flexibility + standardized tool integration

We chose NeuroLink because it gave us:

13 AI providers under one API (we use Claude for reasoning, Gemini for cost-effective checks)
MCP (Model Context Protocol) for Bitbucket/Jira integration
Conversation memory for learning reviewer preferences
Streaming responses for real-time progress updates

Core Architecture

┌─────────────────┐     ┌──────────────┐     ┌─────────────────┐
│  Bitbucket PR   │────▶│   Yama API   │────▶│   NeuroLink     │
│   Webhook       │     │   (Node.js)  │     │   SDK           │
└─────────────────┘     └──────────────┘     └────────┬────────┘
       │                                              │
       ▼                                              ▼
┌─────────────────┐                         ┌─────────────────┐
│  Jira Issues    │                         │  MCP Servers    │
│  (context)      │                         │  - Bitbucket    │
│                 │                         │  - Jira         │
└─────────────────┘                         └─────────────────┘

Building the Review Pipeline

1. Setting Up NeuroLink with MCP Integration

First, we initialize NeuroLink with our MCP servers for Bitbucket and Jira:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  conversationMemory: {
    enabled: true,
    redisConfig: {
      host: process.env.REDIS_HOST,
      port: 6379,
      ttl: 86400 * 7, // Keep PR context for a week
    },
  },
});

// Connect to Bitbucket MCP server
await neurolink.addExternalMCPServer("bitbucket", {
  transport: "stdio",
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-bitbucket"],
  env: {
    BITBUCKET_TOKEN: process.env.BITBUCKET_TOKEN,
    BITBUCKET_WORKSPACE: "juspay",
  },
});

// Connect to Jira for ticket context
await neurolink.addExternalMCPServer("jira", {
  transport: "stdio",
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-jira"],
  env: {
    JIRA_TOKEN: process.env.JIRA_TOKEN,
    JIRA_HOST: "https://juspay.atlassian.net",
  },
});

2. Fetching PR Context

When a webhook fires, we gather all relevant context:

interface PRContext {
  prId: string;
  repoSlug: string;
  branch: string;
  author: string;
  jiraTicket?: string;
}

async function gatherPRContext(
  neurolink: NeuroLink,
  ctx: PRContext
): Promise<string> {
  // Let the AI use MCP tools to fetch PR data
  const result = await neurolink.generate({
    input: {
      // jiraTicket is optional, so only ask for it when the PR has one
      text: `Fetch the diff, files changed, and description for PR ${ctx.prId}
             in repo ${ctx.repoSlug}.${
               ctx.jiraTicket ? ` Also fetch related Jira ticket ${ctx.jiraTicket}.` : ""
             }`,
    },
    provider: "anthropic",
    model: "claude-4-sonnet",
    enableOrchestration: true, // Let AI decide which tools to call
  });

  return result.content;
}
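Since `jiraTicket` is optional, something has to fill it in when the webhook does not. One common approach is to parse the Jira key out of the branch name. The sketch assumes branches embed uppercase keys such as `PAY-1234` (a team naming convention, not something Bitbucket enforces); `extractJiraKey` and `buildContextPrompt` are hypothetical helpers.

```typescript
/**
 * Hypothetical helper: extract the first Jira issue key (e.g. "PAY-1234")
 * from a branch name. Assumes uppercase project keys, Jira's default.
 */
function extractJiraKey(branch: string): string | undefined {
  const match = /\b([A-Z][A-Z0-9_]+-\d+)\b/.exec(branch);
  return match?.[1];
}

/** Hypothetical helper: build the context prompt, mentioning Jira only when a key is known. */
function buildContextPrompt(prId: string, repoSlug: string, jiraTicket?: string): string {
  const base = `Fetch the diff, files changed, and description for PR ${prId} in repo ${repoSlug}.`;
  return jiraTicket ? `${base} Also fetch related Jira ticket ${jiraTicket}.` : base;
}
```

For example, `extractJiraKey("feature/PAY-1234-fix-refunds")` yields `"PAY-1234"`, while a branch like `hotfix/typo` yields `undefined` and the prompt simply omits Jira.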

3. The Multi-Stage Review Engine

Yama performs reviews in three stages, pairing each stage with a model suited to its difficulty to keep costs down:

async function performCodeReview(
  neurolink: NeuroLink,
  prContext: string,
  files: string[]
): Promise<ReviewComment[]> {
  const comments: ReviewComment[] = [];

  // Stage 1: Security scan (fast, cheap model)
  const securityResult = await neurolink.generate({
    input: {
      text: `Analyze this PR for security issues:
             ${prContext}

             Check for:
             - Hardcoded secrets or credentials
             - SQL injection vulnerabilities
             - Unsafe file operations
             - Authentication bypasses`,
    },
    provider: "google-ai",
    model: "gemini-2.5-flash", // Fast and cost-effective
    schema: z.object({
      issues: z.array(z.object({
        severity: z.enum(["critical", "high", "medium", "low"]),
        file: z.string(),
        line: z.number(),
        description: z.string(),
        suggestion: z.string(),
      })),
    }),
    output: { format: "json" },
  });

  comments.push(...parseSecurityIssues(securityResult));

  // Stage 2: Architecture review (reasoning model)
  const archResult = await neurolink.stream({
    input: {
      text: `Review this PR for architectural concerns:
             ${prContext}

             Consider our conventions:
             - Haskell services should use EulerHS patterns
             - Database queries must use Beam ORM
             - API responses follow Juspay standard format`,
      files: files.filter(f => f.endsWith(".hs") || f.endsWith(".ts")),
    },
    provider: "anthropic",
    model: "claude-4-sonnet",
    thinkingConfig: {
      thinkingLevel: "medium", // Enable extended reasoning
    },
  });

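  // Stream the architecture review in real time, keeping the full text
  let archReview = "";
  for await (const chunk of archResult.stream) {
    if ("content" in chunk) {
      archReview += chunk.content;
      process.stdout.write(chunk.content);
    }
  }

  // Stage 3: Style and conventions (cheap model + RAG over our style docs)
  const styleResult = await neurolink.generate({
    input: {
      text: `Check style compliance. Be concise.`,
      files: files,
    },
    provider: "google-ai",
    model: "gemini-2.5-flash",
    rag: {
      files: ["./docs/coding-standards.md", "./docs/style-guide.md"],
      strategy: "markdown",
      topK: 3,
    },
  });

  // archReview and styleResult.content are free-form text at this point;
  // converting them into ReviewComment objects (e.g. with a schema, as in
  // stage 1) is omitted here, so only stage 1 findings are returned.
  return comments;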
}
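Three stages can flag the same line, and reviewers read top-down, so it can help to deduplicate and order comments before posting. A sketch, assuming `ReviewComment` carries the fields of the stage 1 schema (the interface below is reconstructed for illustration); `prioritizeComments` is hypothetical, and keeping only one comment per file and line is a deliberately simple policy.

```typescript
type Severity = "critical" | "high" | "medium" | "low";

// Reconstructed from the stage 1 schema; `suggestion` treated as optional.
interface ReviewComment {
  severity: Severity;
  file: string;
  line: number;
  description: string;
  suggestion?: string;
}

const SEVERITY_RANK: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

/** Hypothetical helper: keep the most severe comment per file:line, then sort by severity, file, line. */
function prioritizeComments(comments: ReviewComment[]): ReviewComment[] {
  const byLocation = new Map<string, ReviewComment>();
  for (const comment of comments) {
    const key = `${comment.file}:${comment.line}`;
    const existing = byLocation.get(key);
    if (!existing || SEVERITY_RANK[comment.severity] < SEVERITY_RANK[existing.severity]) {
      byLocation.set(key, comment);
    }
  }
  return Array.from(byLocation.values()).sort(
    (a, b) =>
      SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
      a.file.localeCompare(b.file) ||
      a.line - b.line,
  );
}
```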

4. Posting Review Comments

Using the Bitbucket MCP tool to post comments:

async function postReviewComments(
  neurolink: NeuroLink,
  prId: string,
  comments: ReviewComment[]
) {
  for (const comment of comments) {
    await neurolink.generate({
      input: {
        text: `Post this review comment to PR ${prId}:
               File: ${comment.file}
               Line: ${comment.line}
               Comment: ${comment.description}

               ${comment.suggestion ? `Suggestion: ${comment.suggestion}` : ""}`,
      },
      provider: "anthropic",
      // MCP tool will be automatically invoked
    });
  }
}
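Because each comment's content is already known, the body can also be assembled deterministically in code, so the only step that depends on a tool call is the actual posting. A sketch of such a formatter; `formatReviewComment` and its markdown layout are illustrative choices, and the `ReviewComment` shape is assumed from the stage 1 schema.

```typescript
// Assumed from the stage 1 schema; `suggestion` treated as optional.
interface ReviewComment {
  severity: "critical" | "high" | "medium" | "low";
  file: string;
  line: number;
  description: string;
  suggestion?: string;
}

/** Hypothetical formatter: render a review comment as a short markdown body. */
function formatReviewComment(comment: ReviewComment): string {
  const parts = [`**[${comment.severity.toUpperCase()}]** ${comment.description}`];
  if (comment.suggestion) {
    parts.push(`Suggestion: ${comment.suggestion}`);
  }
  parts.push(`_${comment.file}:${comment.line} (generated by Yama)_`);
  return parts.join("\n\n");
}
```

Fixed formatting also makes Yama's comments easy to recognize when collecting reviewer feedback later.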

Learning from Feedback

Yama improves over time by learning from developer feedback. When a reviewer dismisses or modifies a Yama comment, we capture that signal:

async function learnFromFeedback(
  neurolink: NeuroLink,
  originalComment: ReviewComment,
  reviewerAction: "accepted" | "modified" | "dismissed",
  reviewerNote?: string
) {
  // Store feedback in Redis memory for future context
  await neurolink.generate({
    input: {
      text: `Learning from review feedback:
             Original: ${originalComment.description}
             Action: ${reviewerAction}
             Note: ${reviewerNote || "None"}

             Adjust future recommendations accordingly.`,
    },
    provider: "anthropic",
    model: "claude-4-haiku",
  });
}
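Alongside the memory-based approach, the same signals can be tallied to show which kinds of comments reviewers keep dismissing. A sketch, in which `FeedbackRecord`, the category labels, and `summarizeFeedback` are assumptions (the original does not say how comments are categorized).

```typescript
type ReviewerAction = "accepted" | "modified" | "dismissed";

interface FeedbackRecord {
  category: string; // hypothetical labels such as "security", "architecture", "style"
  action: ReviewerAction;
}

interface CategoryStats {
  total: number;
  accepted: number;
  modified: number;
  dismissed: number;
  dismissalRate: number; // dismissed / total
}

/** Hypothetical aggregation of reviewer actions per comment category. */
function summarizeFeedback(records: FeedbackRecord[]): Map<string, CategoryStats> {
  const stats = new Map<string, CategoryStats>();
  for (const { category, action } of records) {
    const s = stats.get(category) ?? { total: 0, accepted: 0, modified: 0, dismissed: 0, dismissalRate: 0 };
    s.total += 1;
    s[action] += 1;
    s.dismissalRate = s.dismissed / s.total;
    stats.set(category, s);
  }
  return stats;
}
```

A category with a persistently high dismissal rate is a candidate for prompt changes or for being dropped from automated review.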

Results & Lessons Learned

After 6 months in production:

70% reduction in trivial review comments (style, formatting)
40% faster PR turnaround time
Zero security issues reached production undetected (all were caught during review)
$0.12 average cost per PR review (using cost-optimized model routing)

Key Lessons

Multi-model strategy works: Using cheaper models for simple checks and expensive ones for complex reasoning cut costs by 80%
MCP is a game-changer: Tool integration that "just works" across providers saved us weeks of integration work
Memory matters: Per-PR conversation context dramatically improved review quality over stateless approaches
Streaming UX: Real-time progress updates made developers trust the system more

The Code

Yama is now part of our internal tooling suite. Here's how the pieces above fit together if you want to build something similar:

import { NeuroLink } from "@juspay/neurolink";
import { z } from "zod";

// Initialize
const yama = new NeuroLink({
  conversationMemory: { enabled: true },
});

// Add your MCP servers
await yama.addExternalMCPServer("bitbucket", {
  transport: "stdio",
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-bitbucket"],
  env: { BITBUCKET_TOKEN: process.env.BITBUCKET_TOKEN },
});

// Review webhook handler
export async function handlePRWebhook(payload: PRWebhook) {
  const context = await gatherPRContext(yama, payload);
  const comments = await performCodeReview(yama, context, payload.files);
  await postReviewComments(yama, payload.prId, comments);
}

Conclusion

Building Yama with NeuroLink let us focus on the review logic instead of AI infrastructure. The combination of provider flexibility, MCP tool integration, and conversation memory made it possible to ship a production-grade code review system in weeks, not months.

If you're building AI-powered developer tools, NeuroLink's unified API and MCP ecosystem will save you significant engineering time — it certainly did for us.

Semantic Search with TypeScript: Using embed() and embedMany() for Vector Search

NeuroLink AI — Mon, 06 Apr 2026 08:35:51 +0000

In the age of information overload, keyword-based search often falls short. Users aren't just looking for exact matches; they're looking for meaning. This is where semantic search shines, allowing systems to understand the intent behind a query and retrieve results that are conceptually similar, even if they don't contain the exact keywords.

At the heart of semantic search lies the concept of embeddings – dense numerical representations of text that capture its meaning. NeuroLink, the universal AI SDK for TypeScript, simplifies the process of generating and utilizing these embeddings, making it straightforward to build powerful semantic search capabilities into your applications.

This article will guide you through generating embeddings with NeuroLink's embed() and embedMany() methods, performing similarity search, and integrating with vector databases to build a complete semantic search engine.

What are Embeddings and Why Are They Crucial for Semantic Search?

Imagine mapping every word, sentence, or document into a multi-dimensional space where items with similar meanings are located closer to each other. That's essentially what an embedding model does. Each piece of text is transformed into a fixed-size vector (a list of numbers) that encapsulates its semantic properties.

For semantic search, this means:

Understanding Context: A search for "car repair" can return results about "automobile maintenance" or "vehicle servicing" even if the exact phrase isn't present.
Ranking Relevance: Results can be ranked based on their semantic similarity to the query, providing more relevant outcomes.
Bridging Vocabulary Gaps: It overcomes issues arising from synonyms, paraphrases, or different ways of expressing the same idea.

NeuroLink provides a unified API to generate these crucial vectors from various leading AI providers.

Generating Embeddings with NeuroLink: `embed()` and `embedMany()`

NeuroLink offers two primary methods for generating embeddings, available on provider instances created with ProviderFactory: embed() for single text strings and embedMany() for efficient batch processing.

`provider.embed(text, modelName?)`: Single Text Embedding

The embed() method takes a single string of text and returns its corresponding embedding vector.

import { ProviderFactory } from "@juspay/neurolink";

async function getEmbedding(text: string) {
  // Create an OpenAI provider instance for embedding.
  // NeuroLink supports OpenAI, Google AI Studio, Google Vertex, and Amazon Bedrock for embeddings.
  const provider = await ProviderFactory.createProvider("openai");

  // Generate the embedding vector
  const vector = await provider.embed(text);

  console.log(`Text: "${text}"`);
  console.log(`Embedding dimension: ${vector.length}`);
  // console.log("Embedding vector (first 5 elements):", vector.slice(0, 5));

  return vector;
}

// Example usage
const queryEmbedding = await getEmbedding("How do I reset my password?");
const documentEmbedding = await getEmbedding("Troubleshooting password issues");

Key Parameters:

text (string, required): The input text to be embedded.
modelName (string, optional): Allows you to override the default embedding model for the chosen provider. For example, text-embedding-3-small for OpenAI or gemini-embedding-001 for Google AI Studio.

`provider.embedMany(texts, modelName?)`: Batch Embedding for Efficiency

For scenarios involving multiple documents or a large corpus, embedMany() is significantly more efficient. It accepts an array of text strings and returns an array of corresponding embedding vectors. NeuroLink (via Vercel AI SDK) intelligently handles batching for providers that have batch size limits.

import { ProviderFactory } from "@juspay/neurolink";

async function getManyEmbeddings(texts: string[]) {
  const provider = await ProviderFactory.createProvider("googleAiStudio");

  // Generate embeddings for all texts; large inputs are split into batches automatically
  const embeddings = await provider.embedMany(texts);

  console.log(`Generated ${embeddings.length} embeddings.`);
  embeddings.forEach((emb, index) => {
    console.log(`Embedding ${index + 1} dimension: ${emb.length}`);
  });

  return embeddings;
}

// Example usage with multiple document snippets
const documents = [
  "NeuroLink provides a unified API for 13+ AI providers.",
  "Semantic search helps find documents by meaning, not just keywords.",
  "Event-driven AI applications can leverage lifecycle hooks for analytics.",
];

const documentEmbeddings = await getManyEmbeddings(documents);

Key Parameters:

texts (string[], required): An array of text strings to be embedded.
modelName (string, optional): Same as embed(), allows overriding the default embedding model.

Supported Providers and Model Selection

NeuroLink integrates with several top-tier embedding providers:

OpenAI: Uses models like text-embedding-3-small (default) or text-embedding-3-large.
Google AI Studio: Uses gemini-embedding-001 (default).
Google Vertex: Uses text-embedding-004 (default).
Amazon Bedrock: Uses models like amazon.titan-embed-text-v2:0 (default).

You can configure the default models using environment variables (e.g., OPENAI_EMBEDDING_MODEL, VERTEX_EMBEDDING_MODEL) or directly within the embed()/embedMany() calls.

For providers that do not natively support embeddings (e.g., Anthropic, Mistral), you can still use NeuroLink for text generation and then use a separate NeuroLink provider instance configured for an embedding-capable provider to handle your embedding needs.

import { ProviderFactory } from "@juspay/neurolink";

// Use Anthropic for chat generation
const chatProvider = await ProviderFactory.createProvider("anthropic");
const response = await chatProvider.generate({
  input: { text: "Tell me a story about a wizard." },
});

// Use OpenAI for embeddings, independently
const embedProvider = await ProviderFactory.createProvider("openai");
const storyEmbedding = await embedProvider.embed(response.text);

Building a Semantic Search Engine: From Embeddings to Vector Databases

Once you have embeddings, the next step is to store them and perform similarity searches. This typically involves a vector database (or vector store), which is optimized for storing and querying high-dimensional vectors.

The general workflow for building a semantic search engine looks like this:

Index Documents:
- Take your corpus of documents (e.g., articles, product descriptions, support tickets).
- Chunk them into manageable segments if they are large.
- Use embedMany() to generate embeddings for each segment.
- Store these embeddings, along with their original text and any metadata, in a vector database.
Query and Retrieve:
- When a user submits a query, use embed() to generate an embedding for the query.
- Query the vector database to find document embeddings that are most "similar" to the query embedding. Similarity is usually calculated using distance metrics like cosine similarity.
- Retrieve the original text segments corresponding to the most similar embeddings.
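The similarity step itself is small. Below is a brute-force sketch of cosine similarity and top-K retrieval over stored vectors, roughly what a simple in-memory store does; dedicated vector databases typically use approximate nearest-neighbor indexes so they don't have to scan every vector. `cosineSimilarity`, `topK`, and `StoredVector` are hypothetical names.

```typescript
interface StoredVector {
  id: string;
  vector: number[];
  text: string;
}

/** cos(a, b) = (a · b) / (|a| |b|): 1 = same direction, 0 = orthogonal, -1 = opposite. */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat similarity with a zero vector as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Score every stored vector against the query and return the k best matches. */
function topK(query: number[], docs: StoredVector[], k: number) {
  return docs
    .map((doc) => ({ id: doc.id, text: doc.text, score: cosineSimilarity(query, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```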

Practical Code Example: In-Memory Semantic Search

Let's illustrate with a basic in-memory vector store, focusing on the core embedding and similarity logic. For production systems, you would integrate with dedicated vector databases like Pinecone, Weaviate, or ChromaDB.

import { NeuroLink, ProviderFactory, InMemoryVectorStore } from "@juspay/neurolink";

// 1. Prepare your documents
const articles = [
  { id: "article1", text: "NeuroLink simplifies AI integration across 13 providers." },
  { id: "article2", text: "Vector embeddings enable semantic search by capturing meaning." },
  { id: "article3", text: "Lifecycle hooks in NeuroLink's middleware system manage AI events." },
  { id: "article4", text: "Building reactive applications with event-driven AI architectures." },
  { id: "article5", text: "The importance of cost tracking and analytics in AI applications." },
];

async function buildSemanticSearchEngine() {
  const embedProvider = await ProviderFactory.createProvider("openai");
  const vectorStore = new InMemoryVectorStore(); // For simplicity, use in-memory

  // 2. Generate embeddings and index documents
  console.log("Indexing documents...");
  const textsToEmbed = articles.map((a) => a.text);
  const embeddings = await embedProvider.embedMany(textsToEmbed);

  const documentNodes = articles.map((article, index) => ({
    id: article.id,
    vector: embeddings[index],
    metadata: { text: article.text },
  }));
  await vectorStore.upsert("docs_index", documentNodes); // Store embeddings in an index
  console.log("Documents indexed successfully.");

  // Function to perform semantic search
  async function semanticSearch(query: string, topK: number = 2) {
    console.log(`Searching for: "${query}"`);
    const queryVector = await embedProvider.embed(query);

    // 3. Query the vector store for similar documents
    const searchResults = await vectorStore.query("docs_index", queryVector, topK);

    console.log("Top results:");
    searchResults.forEach((result, i) => {
      console.log(
        `  ${i + 1}. Score: ${result.score.toFixed(4)}, Text: "${result.metadata.text}"`
      );
    });
    return searchResults;
  }

  // 4. Test the semantic search engine
  await semanticSearch("How to connect to multiple AI services?");
  await semanticSearch("Tell me about application events in AI.");
}

buildSemanticSearchEngine().catch(console.error);

Integration with RAG Pipelines

NeuroLink's rag feature simplifies the entire process by internally handling document chunking, embedding generation, and similarity search. When you use rag: { files } in neurolink.generate() or neurolink.stream(), it transparently leverages embed() and embedMany() under the hood to build context for your AI model.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// NeuroLink automatically handles embedding and retrieval for RAG
const result = await neurolink.generate({
  input: { text: "What are the benefits of event-driven AI?" },
  rag: {
    files: ["./my-event-driven-ai-guide.md", "./middleware-patterns.pdf"], // Your documents
    chunkSize: 512,
    topK: 3,
  },
});

console.log(result.content); // AI response informed by semantic search of your files
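The `chunkSize: 512` option controls how documents are cut before embedding. NeuroLink's exact chunking rules aren't described here (the unit could be tokens rather than characters), so the sketch below only illustrates the common fixed-size-with-overlap idea in characters; `chunkText` and its `overlap` parameter are assumptions.

```typescript
/**
 * Hypothetical chunker: split text into windows of at most `chunkSize`
 * characters, each starting `chunkSize - overlap` characters after the
 * previous one, so text near a boundary appears in both neighboring chunks.
 */
function chunkText(text: string, chunkSize = 512, overlap = 64): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window already reaches the end
  }
  return chunks;
}
```

Overlap costs some duplicate embeddings but reduces the chance that a sentence split across a boundary is missed at query time.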

For advanced use cases requiring more control over the embedding and retrieval steps, NeuroLink allows you to use createVectorQueryTool with explicit embed() calls and your chosen vector store.

Conclusion

Semantic search is a game-changer for building intelligent applications, moving beyond keyword matching to true understanding. NeuroLink, with its powerful embed() and embedMany() methods, makes it incredibly simple to integrate embedding generation into your TypeScript projects. Whether you're building a sophisticated RAG pipeline or a standalone semantic search engine, NeuroLink provides the tools to unlock the full potential of vector search. By leveraging these capabilities, you can build AI applications that are not just smart, but truly intuitive and responsive to user intent.

Event-Driven AI: Building Reactive Applications with Lifecycle Hooks

NeuroLink AI — Mon, 06 Apr 2026 08:34:53 +0000

In the rapidly evolving landscape of AI, building robust, observable, and cost-effective applications is paramount. Traditional request-response patterns often fall short when dealing with the complexities of AI workflows, which involve multiple steps, external tool calls, and varying response times. This is where an event-driven architecture, powered by a flexible middleware or hook system, becomes indispensable.

NeuroLink, the universal AI SDK for TypeScript, provides a powerful and extensible middleware system that acts as the "lifecycle hooks" for your AI operations. These hooks allow you to inject custom logic at various stages of an AI request, enabling capabilities like real-time analytics, guardrails, automated evaluation, and comprehensive error handling.

The Power of NeuroLink's Middleware System

NeuroLink's middleware system transforms your AI interactions into an event-driven flow. Instead of a monolithic block of code, your AI requests pass through a chain of configurable functions, each capable of inspecting, modifying, or reacting to the request and response. This architecture is reminiscent of web frameworks like Express.js or Koa.js, but tailored specifically for AI.

This event-driven approach provides several key advantages:

Modularity: Each piece of logic (e.g., logging, cost calculation, safety check) is encapsulated in its own middleware, promoting cleaner code and easier maintenance.
Extensibility: Easily add new functionality without modifying core AI logic. Want to add a new monitoring tool? Write a new middleware.
Observability: Centralize logging, metrics, and tracing by hooking into every AI operation.
Control: Implement fine-grained control over AI behavior, from pre-call validations to post-response processing.
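The Express/Koa comparison can be made concrete with a tiny "onion" compose function, in which each middleware runs code before and after the rest of the chain. This is a framework-free sketch of the pattern, not NeuroLink's middleware API; `AIContext`, `Middleware`, and `compose` are hypothetical.

```typescript
interface AIContext {
  prompt: string;
  response?: string;
  log: string[];
}

type Middleware = (ctx: AIContext, next: () => Promise<void>) => Promise<void>;

/** Hypothetical compose: run middleware in order around a final handler. */
function compose(middleware: Middleware[], handler: (ctx: AIContext) => Promise<void>) {
  return async (ctx: AIContext): Promise<void> => {
    const dispatch = async (i: number): Promise<void> => {
      if (i === middleware.length) return handler(ctx); // innermost layer: the "AI call"
      await middleware[i](ctx, () => dispatch(i + 1));
    };
    await dispatch(0);
  };
}

// Usage: a logging middleware around a stand-in model call that upper-cases the prompt.
const logging: Middleware = async (ctx, next) => {
  ctx.log.push("before");
  await next();
  ctx.log.push(`after: ${ctx.response}`);
};
const run = compose([logging], async (ctx) => {
  ctx.response = ctx.prompt.toUpperCase();
});
```

Code before `await next()` sees the request (validation, guardrails); code after it sees the response (analytics, evaluation).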

Understanding NeuroLink's Built-in Lifecycle Hooks

NeuroLink comes with several production-ready middleware components that exemplify the power of lifecycle hooks: Analytics, Guardrails, and Auto-Evaluation. Let's explore how these translate into event-driven patterns.

1. Analytics: Capturing Every Pulse of Your AI Application

The Analytics Middleware is a prime example of an onFinish hook – it captures comprehensive metrics after an AI operation completes, whether successfully or with an error.

How it Works:

This middleware intercepts every AI request and response, recording:

Token Usage: Input, output, and total tokens consumed. Crucial for cost tracking.
Response Time: Latency for each AI call. Essential for performance monitoring.
Request Status: Success or failure of the operation.
Provider/Model Information: Which AI provider and model were used.

All this data is automatically attached to the response metadata, making it easily accessible for further processing.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Explain quantum computing" },
  provider: "openai",
  model: "gpt-4",
});

const analytics = result.experimental_providerMetadata?.neurolink?.analytics;
if (analytics) {
  console.log(`Tokens used: ${analytics.usage.total}`);
  console.log(`Response time: ${analytics.responseTime}ms`);
}

Event-Driven Benefits:

Cost Tracking: Automatically calculate costs per request, enabling budget management and optimization.
Performance Monitoring: Identify slow AI calls or bottlenecks in real-time.
Usage Analytics: Build dashboards to understand how your AI is being used across different models and providers.
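Analytics reports token counts, not money, so cost tracking needs one more step. Below is a minimal sketch that turns usage numbers into a per-request cost and aggregates spend per model. The `TokenUsage` shape mirrors the input/output counts listed above. The `PRICES` table, its model keys, and the helper names are hypothetical placeholders, not real provider pricing or a NeuroLink API, so check your provider's current price sheet.

```typescript
// Hypothetical sketch: cost tracking on top of analytics token counts.
interface TokenUsage {
  input: number;
  output: number;
}

// Placeholder prices in USD per 1M tokens (NOT real provider pricing).
const PRICES: Record<string, { inputPerM: number; outputPerM: number }> = {
  "example-large": { inputPerM: 10, outputPerM: 30 },
  "example-small": { inputPerM: 0.5, outputPerM: 1.5 },
};

// Cost of a single request: input and output tokens are usually priced differently.
function requestCostUSD(model: string, usage: TokenUsage): number {
  const price = PRICES[model];
  if (!price) {
    throw new Error(`No price configured for model "${model}"`);
  }
  return (usage.input * price.inputPerM + usage.output * price.outputPerM) / 1_000_000;
}

// Sum spend per model across many requests, e.g. for a budget dashboard.
function spendByModel(records: { model: string; usage: TokenUsage }[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const r of records) {
    totals.set(r.model, (totals.get(r.model) ?? 0) + requestCostUSD(r.model, r.usage));
  }
  return totals;
}
```

Throwing on an unknown model is deliberate: silently pricing a new model at zero is how budgets drift unnoticed.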

2. Guardrails: Proactive Error Handling and Content Moderation (`onError`, `onChunk`)

The Guardrails Middleware works both as a pre-call hook that blocks unsafe prompts and as a post-response hook that moderates generated content. For streaming, it also demonstrates onChunk behavior.

How it Works:

Guardrails intercept both incoming prompts and outgoing responses.

Precall Evaluation (Preventative onError): Before a prompt even reaches the LLM, NeuroLink can evaluate its safety. If it's deemed unsafe, the request is blocked, preventing costly and inappropriate AI generation. This acts as an early-stage onError by preventing the main AI call from occurring.

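// Assumes MiddlewareFactory is imported from the NeuroLink package and that this factory
// is wired into the `neurolink` instance used below (see the NeuroLink middleware docs).
const factory = new MiddlewareFactory({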
  middlewareConfig: {
    guardrails: {
      enabled: true,
      config: {
        precallEvaluation: {
          enabled: true,
          provider: "openai",
          evaluationModel: "gpt-4",
          thresholds: { safetyScore: 8 },
          blockUnsafeRequests: true,
        },
      },
    },
  },
});

// If the input is unsafe, this will be blocked before calling the LLM
const result = await neurolink.generate({
  input: { text: "unsafe content" },
});
// result.text will be "<BLOCKED BY PRECALL GUARDRAILS>"

Bad Word Filtering (Reactive onChunk / onFinish): Scans both requests and responses for prohibited terms and redacts them. For streaming responses, this happens in real-time on each onChunk event.
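Redacting per chunk has one catch: a banned word can arrive split across two chunks ("da" + "rn"), so neither chunk matches on its own. Below is a minimal sketch that holds back the trailing partial word until the next chunk arrives. `StreamingRedactor`, `redact`, and the word list are hypothetical names for illustration, not NeuroLink's Guardrails API.

```typescript
// Hypothetical sketch: chunk-safe bad-word redaction for streamed text.

// Escape regex metacharacters so banned terms are matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace each banned whole word (case-insensitive) with asterisks of equal length.
function redact(text: string, banned: string[]): string {
  if (banned.length === 0) return text;
  const pattern = new RegExp(`\\b(?:${banned.map(escapeRegExp).join("|")})\\b`, "gi");
  return text.replace(pattern, (match) => "*".repeat(match.length));
}

class StreamingRedactor {
  private buffer = "";
  private readonly banned: string[];

  constructor(banned: string[]) {
    this.banned = banned;
  }

  // Called once per streamed chunk; returns the text that is safe to emit now.
  push(chunk: string): string {
    this.buffer += chunk;
    // Index of the last whitespace character; everything up to it is complete words.
    const lastSpace = this.buffer.search(/\s(?=\S*$)/);
    if (lastSpace === -1) return ""; // still inside a word: keep buffering
    const ready = this.buffer.slice(0, lastSpace + 1);
    this.buffer = this.buffer.slice(lastSpace + 1);
    return redact(ready, this.banned);
  }

  // Called when the stream ends to release the final (possibly partial) word.
  flush(): string {
    const rest = redact(this.buffer, this.banned);
    this.buffer = "";
    return rest;
  }
}
```

A long run of text with no whitespace stays buffered, so production code would also cap the buffer size.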

Event-Driven Benefits:

Content Safety: Automatically filter out or redact inappropriate content, ensuring your AI applications remain compliant and ethical.
Prompt Injection Protection: Prevent malicious prompts from compromising your AI's behavior.
Cost Savings: Block unsafe requests early, avoiding unnecessary token consumption.

3. Auto-Evaluation: Ensuring Quality and Responding to Failures (`onFinish`, `onError` for retry)

The Auto-Evaluation Middleware is a sophisticated onFinish hook that assesses the quality of AI responses. If the quality falls below a certain threshold, it can trigger retry mechanisms, effectively acting as an onError handler for suboptimal outputs.

How it Works:

After an AI response is generated, this middleware uses another AI model (or custom logic) to evaluate criteria like relevance, accuracy, and coherence.

Blocking Mode: The user waits for the evaluation to complete. If the quality is too low, NeuroLink can automatically retry the request or return an error, enforcing a minimum quality bar. This is effectively an onError path for unacceptable output.
Non-Blocking Mode: Evaluation happens in the background, making it suitable for applications where latency is critical. The results can be logged or used asynchronously.
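Blocking mode boils down to a generate, score, retry loop. Below is a minimal sketch of that loop, with the model call and the judge injected as plain async functions. `generateWithQualityGate`, its parameters, and the 0-10 score scale are assumptions for illustration, not the Auto-Evaluation middleware's internals.

```typescript
// Hypothetical sketch: a quality gate around an AI call.
interface GateResult {
  text: string;
  score: number;
  attempts: number; // total attempts made
  passed: boolean;
}

async function generateWithQualityGate(
  generate: (attempt: number) => Promise<string>,
  evaluate: (text: string) => Promise<number>, // e.g. an LLM judge returning 0-10
  threshold: number,
  maxAttempts: number,
): Promise<GateResult> {
  if (maxAttempts < 1) throw new Error("maxAttempts must be at least 1");
  let bestText = "";
  let bestScore = -Infinity;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const text = await generate(attempt);
    const score = await evaluate(text);
    if (score >= threshold) {
      return { text, score, attempts: attempt, passed: true };
    }
    // Keep the best failing answer so the caller can still fall back to it.
    if (score > bestScore) {
      bestText = text;
      bestScore = score;
    }
  }
  return { text: bestText, score: bestScore, attempts: maxAttempts, passed: false };
}
```

In non-blocking mode you would instead return the first answer immediately and run `evaluate` in the background, only logging the score.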

const factory = new MiddlewareFactory({
  middlewareConfig: {
    autoEvaluation: {
      enabled: true,
      config: {
        threshold: 7, // Minimum quality score
        blocking: true, // Wait for evaluation
        onEvaluationComplete: async (evaluation) => {
          if (!evaluation.passed) {
            console.log("Low quality response detected. Consider retrying.");
          }
        },
      },
    },
  },
});

Event-Driven Benefits:

Quality Assurance: Maintain high standards for AI output in customer-facing applications.
Automatic Improvement: Trigger retries or use adaptive strategies when responses are subpar.
Continuous Learning: Collect quality metrics to fine-tune prompts, models, or even the middleware itself.

Implementing Custom Lifecycle Hooks: The Middleware Architecture

NeuroLink's middleware system isn't just about built-in features; it's about providing a framework for you to implement your own event-driven logic. Every AI operation (generate, stream, embed, etc.) passes through a middleware chain, allowing you to intercept and act upon events.

The core of this system involves transformParams (pre-call hook) and transformResponse (post-call hook).

import { NeuroLink } from "@juspay/neurolink";

// Example: A custom logging middleware
const customLoggingMiddleware = {
  name: "custom-logger",
  priority: 50, // Runs after analytics, before guardrails
  transformParams: async (params, context) => {
    console.log(`[Custom Logger] AI Request: ${JSON.stringify(params.input)}`);
    return params;
  },
  transformResponse: async (response, context) => {
    console.log(
      `[Custom Logger] AI Response (Status: ${response.ok ? "OK" : "Error"})`
    );
    // You could also log specific parts of the response here
    return response;
  },
};

const neurolink = new NeuroLink({
  middleware: [customLoggingMiddleware],
});
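To make the pre-call/post-call flow concrete, here is a minimal, self-contained sketch of how such a chain could run. Params flow through `transformParams` in priority order, and the response flows back through `transformResponse` in reverse. The `Middleware` type, the "higher priority runs first" rule, and `runChain` are assumptions for illustration; NeuroLink's actual ordering rules are defined in its docs.

```typescript
// Hypothetical sketch: running a middleware chain around one AI call.
interface Middleware<P, R> {
  name: string;
  priority: number;
  transformParams?: (params: P) => Promise<P>;
  transformResponse?: (response: R) => Promise<R>;
}

async function runChain<P, R>(
  middleware: Middleware<P, R>[],
  params: P,
  call: (params: P) => Promise<R>,
): Promise<R> {
  // Assumption for this sketch: higher priority runs first on the way in.
  const ordered = [...middleware].sort((a, b) => b.priority - a.priority);
  let p = params;
  for (const m of ordered) {
    if (m.transformParams) p = await m.transformParams(p);
  }
  let response = await call(p);
  // Unwind in reverse so the outermost middleware sees the final response last.
  for (const m of [...ordered].reverse()) {
    if (m.transformResponse) response = await m.transformResponse(response);
  }
  return response;
}
```

The "onion" order (in forwards, out in reverse) lets a middleware that starts a timer in `transformParams` stop it last in `transformResponse`, wrapping everything inside it.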

By leveraging transformParams and transformResponse, you can build custom onFinish, onError, and other patterns:

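onFinish: Put the logic in transformResponse. Note that transformResponse only sees responses that were actually produced, so logic that must also run on failures needs the error interception described next.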
onError: Catch errors within transformResponse or implement a dedicated error-handling middleware that acts if a preceding middleware or the AI call itself throws. NeuroLink's onCatch mechanism in middleware allows for specific error interception.
onChunk: For streaming responses, specific middleware can process each chunk as it arrives, enabling real-time filtering or transformations.
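Whatever the middleware mechanics, the onFinish/onError split maps onto ordinary `try`/`catch`/`finally`. Below is a minimal sketch of a wrapper that times any async AI call, fires `onError` only on failure, and fires `onFinish` in every case. `withLifecycle` and the `LifecycleHooks` shape are hypothetical names, not NeuroLink APIs.

```typescript
// Hypothetical sketch: onError/onFinish semantics around any async call.
interface FinishInfo {
  ok: boolean;
  durationMs: number;
}

interface LifecycleHooks {
  onError?: (error: unknown) => void;
  onFinish?: (info: FinishInfo) => void;
}

async function withLifecycle<T>(call: () => Promise<T>, hooks: LifecycleHooks): Promise<T> {
  const start = Date.now();
  let ok = false;
  try {
    const result = await call();
    ok = true;
    return result;
  } catch (error) {
    hooks.onError?.(error);
    throw error; // re-throw so callers still see the failure
  } finally {
    // Runs after success or failure, like an onFinish hook.
    hooks.onFinish?.({ ok, durationMs: Date.now() - start });
  }
}
```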

Conclusion

NeuroLink's event-driven middleware system provides a flexible foundation for building sophisticated AI applications. By treating AI operations as a series of events and exposing lifecycle hooks, it lets developers integrate real-time analytics, enforce guardrails for safety and compliance, check output quality through automated evaluation, and handle errors gracefully. This modular approach simplifies development and helps you build reactive, observable, production-ready AI systems.

NeuroLink — The Universal AI SDK for TypeScript

GitHub: github.com/juspay/neurolink
Install: npm install @juspay/neurolink
Docs: docs.neurolink.ink
Blog: blog.neurolink.ink — 150+ technical articles

11 Auth Providers for AI Apps: Securing Your LLM API Keys in TypeScript

NeuroLink AI — Mon, 06 Apr 2026 08:33:55 +0000

Building AI applications often involves interacting with multiple Large Language Model (LLM) providers. Managing API keys, credentials, and authentication across these diverse platforms can quickly become a complex security and operational challenge. In this article, we'll explore authentication patterns for AI apps and highlight eleven key providers, focusing on how NeuroLink, the universal AI SDK for TypeScript, simplifies this landscape.

The Challenge of Multi-Provider Authentication

When you integrate with various AI services like OpenAI, Anthropic, Google Cloud, or AWS Bedrock, each comes with its own authentication mechanisms. This typically involves:

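API Keys: The most common method, usually sent in an HTTP header (OpenAI expects an Authorization: Bearer header, Anthropic an x-api-key header); some APIs also accept the key as a query parameter.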
OAuth 2.0: Used for user authorization, granting limited access to resources without sharing credentials.
JWT (JSON Web Tokens): For secure information exchange, often used in service-to-service communication.
Service Accounts: For programmatic access by applications rather than individual users.
Environment Variables: A common way to manage sensitive keys in development and production environments.

The sheer variety can lead to:

Security Risks: Hardcoding keys, improper storage, or insecure transmission.
Operational Overhead: Managing key rotation, access control, and environment-specific configurations.
Developer Friction: Inconsistent APIs and authentication flows across providers.

NeuroLink: Unifying AI Authentication

NeuroLink addresses these challenges by providing a consistent API layer over 13 major AI providers. This means you configure your authentication once, and NeuroLink handles the provider-specific nuances under the hood.
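To show what "provider-specific nuances" means in practice, here is a hypothetical helper (not NeuroLink code) that builds the auth headers each provider's REST API expects. The header names reflect the providers' public REST documentation at the time of writing; verify them against current docs before relying on this.

```typescript
// Hypothetical sketch: the same "API key" is presented differently per provider.
type KeyProvider = "openai" | "anthropic" | "google-ai" | "azure-openai";

function authHeaders(provider: KeyProvider, apiKey: string): Record<string, string> {
  switch (provider) {
    case "openai":
      return { Authorization: `Bearer ${apiKey}` };
    case "anthropic":
      // Anthropic also requires an API version header on every request.
      return { "x-api-key": apiKey, "anthropic-version": "2023-06-01" };
    case "google-ai":
      return { "x-goog-api-key": apiKey };
    case "azure-openai":
      return { "api-key": apiKey };
    default: {
      // Compile-time exhaustiveness check: adding a provider without a case fails to type-check.
      const unreachable: never = provider;
      throw new Error(`Unknown provider: ${String(unreachable)}`);
    }
  }
}
```

IAM-based providers do not fit this static-header pattern at all: AWS Bedrock requests are SigV4-signed and Vertex AI uses OAuth 2.0 access tokens, which is exactly the kind of difference an SDK layer hides.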

Here's a look at how NeuroLink helps secure your AI applications across various providers:

Key Authentication Patterns for AI Apps

Environment-Based Configuration: NeuroLink leverages environment variables for API keys and other credentials, promoting secure storage and easy management across different deployment environments. This avoids hardcoding sensitive information.
Unified Credential Management: Instead of managing individual SDKs and authentication logic for each provider, NeuroLink centralizes this, reducing boilerplate and potential for errors.
Human-in-the-Loop (HITL) for Sensitive Operations: For regulated industries or high-stakes AI operations, NeuroLink offers a production-ready HITL system. This allows you to require human approval before AI executes sensitive tools or processes critical data, adding an extra layer of security and compliance. This includes:
- Tool Approval Workflows: Require human approval before AI executes sensitive tools (e.g., financial transactions, data modifications).
- Output Validation: Route AI outputs through human review pipelines (e.g., medical diagnosis, legal documents).
- Complete Audit Trail: Full audit logging for compliance (HIPAA, SOC2, GDPR).
Credential Management & Auditing: NeuroLink emphasizes secure credential management and provides auditing capabilities to ensure compliance and track access to sensitive AI resources.
Hardened OS Verification & Zero Credential Logging: NeuroLink is designed with enterprise security in mind, including hardened OS verification (SELinux, AppArmor) and a strict policy of zero credential logging to prevent accidental exposure.
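A minimal sketch of the environment-based, no-credential-logging pattern: read keys from the environment, fail fast when a required one is missing, and only ever log a masked form. The variable names follow common vendor conventions (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`); `loadCredentials` and `maskSecret` are hypothetical helpers, not NeuroLink's internals.

```typescript
// Hypothetical sketch: environment-based credentials with safe logging.
type Env = Record<string, string | undefined>;

// Show only the last 4 characters, e.g. for startup logs.
function maskSecret(secret: string): string {
  if (secret.length <= 8) return "****";
  return `****${secret.slice(-4)}`;
}

function loadCredentials(env: Env, required: string[]): Map<string, string> {
  const creds = new Map<string, string>();
  const missing: string[] = [];
  for (const name of required) {
    const value = env[name]?.trim();
    if (value) creds.set(name, value);
    else missing.push(name);
  }
  if (missing.length > 0) {
    // Name the missing variables, never the values of the present ones.
    throw new Error(`Missing credentials: ${missing.join(", ")}`);
  }
  return creds;
}
```

In a Node service you would pass `process.env` as `env`; taking the environment as a parameter keeps the function easy to test with fake values.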

11 Auth Providers Supported by NeuroLink

NeuroLink unifies access to these providers, simplifying authentication and interaction:

OpenAI (GPT-4o, GPT-4o-mini, etc.): Typically uses API keys. NeuroLink securely manages and passes these keys.
- Setup Guide: docs/getting-started/provider-setup.md#openai
Anthropic (Claude 4.5 Opus/Sonnet/Haiku): Also relies on API keys. NeuroLink abstracts this for seamless integration.
- Setup Guide: docs/getting-started/provider-setup.md#anthropic
Google AI Studio (Gemini 3 Flash/Pro): Often uses API keys. NeuroLink integrates these with a consistent interface.
- Setup Guide: docs/getting-started/provider-setup.md#google-ai
AWS Bedrock (Claude, Titan, Llama, Nova): AWS services use IAM roles and access keys. NeuroLink handles the underlying AWS SDK authentication.
- Setup Guide: docs/getting-started/provider-setup.md#bedrock
Google Vertex AI (Gemini 3/2.5): Leverages Google Cloud IAM for authentication. NeuroLink facilitates this integration.
- Setup Guide: docs/getting-started/provider-setup.md#vertex
Azure OpenAI (GPT-4, GPT-4o): Uses Microsoft Entra ID (formerly Azure Active Directory) and API keys. NeuroLink supports secure configuration.
- Setup Guide: docs/getting-started/provider-setup.md#azure
LiteLLM: Acts as a proxy for 100+ models. NeuroLink's integration means you authenticate with LiteLLM, and it manages the downstream provider authentication.
- Setup Guide: docs/litellm-integration.md
AWS SageMaker: For custom deployed models, authentication involves AWS IAM. NeuroLink integrates with your SageMaker endpoints.
- Setup Guide: docs/sagemaker-integration.md
Mistral AI (Mistral Large, Small): Uses API keys, which NeuroLink manages.
- Setup Guide: docs/getting-started/provider-setup.md#mistral
Hugging Face (100,000+ models): Often uses API tokens. NeuroLink streamlines this for models compatible with its system.
- Setup Guide: docs/getting-started/provider-setup.md#huggingface
OpenRouter (200+ Models): Provides a unified API for many models. NeuroLink integrates with OpenRouter, simplifying authentication to a single point.
- Setup Guide: docs/getting-started/providers/openrouter.md

Practical Example: NeuroLink Setup

With NeuroLink, setting up your providers and validating keys is a straightforward process:

# 1. Run the interactive setup wizard (select providers, validate keys)
pnpm dlx @juspay/neurolink setup

# 2. Start generating with automatic provider selection
npx @juspay/neurolink generate "Write a launch plan for multimodal chat"

This command-line setup wizard guides you through configuring each provider, securely storing API keys, and validating your credentials.

Conclusion

Securing your AI applications and managing API keys across a multitude of LLM providers can be a significant undertaking. NeuroLink simplifies this by offering a unified TypeScript SDK that abstracts away provider-specific authentication complexities, promotes secure credential management practices, and provides enterprise-grade security features like Human-in-the-Loop workflows. By centralizing your AI interactions through NeuroLink, you can focus on building innovative AI features with confidence in your application's security and maintainability.

Running AI in the Browser: NeuroLink's Client-Side SDK for Web Apps

NeuroLink AI — Mon, 06 Apr 2026 08:32:57 +0000

The landscape of AI development is rapidly expanding, moving beyond traditional server-side inference to embrace the power of the client. Running AI directly in the browser offers exciting possibilities for enhanced user experiences, improved privacy, and reduced infrastructure costs. However, it also introduces unique challenges, particularly around bundle size, efficient execution, and secure API key management.

Enter NeuroLink, Juspay's universal AI SDK for TypeScript. While NeuroLink is a comprehensive platform designed for both client and server environments, its client-side SDK is specifically engineered to bring robust AI capabilities directly to your web applications. This article explores how NeuroLink addresses the intricacies of client-side AI, allowing developers to integrate powerful language models and tools into their browser-based projects seamlessly.

Why Client-Side AI for Web Applications?

Before diving into NeuroLink's specifics, let's consider the compelling reasons to run AI in the browser:

Reduced Latency and Real-time Feedback: Processing AI tasks directly on the user's device avoids network round trips, which can make responses feel near-instant. This is critical for interactive applications like real-time chat, content generation, and intelligent UIs.
Enhanced Privacy and Data Security: Sensitive user data can remain on the client, never leaving the user's browser. This local processing significantly improves privacy posture and simplifies compliance with data protection regulations.
Lower Infrastructure Costs: Offloading AI inference to the client reduces the computational burden on your backend servers, potentially leading to substantial cost savings on cloud resources.
Offline Functionality: For certain models or tasks, client-side execution can enable AI features even when the user is offline, providing a more resilient and consistent experience.
Personalization at Scale: Each user's browser becomes a personalized AI engine, capable of tailoring experiences based on local data and preferences without constant server communication.

NeuroLink's Client-Side SDK: A Powerful Foundation

NeuroLink's client-side SDK is built from the ground up for web environments, offering a suite of tools and integrations that make browser-based AI development a breeze.

At its core, the SDK provides a type-safe HTTP client (createClient) for interacting with NeuroLink APIs, whether hosted on your own infrastructure or through a managed service. This client handles everything from request/response serialization to automatic retries and middleware management.

For modern web frameworks, NeuroLink offers first-class integrations:

React Hooks: A rich set of React hooks like useChat, useAgent, useWorkflow, useVoice, useStream, and useTools simplify the integration of AI functionalities into React applications. These hooks manage state, handle streaming, and provide intuitive interfaces for building AI-powered UIs.
Vercel AI SDK Compatibility: For those already using the popular Vercel AI SDK, NeuroLink provides a LanguageModelV1 adapter (createNeuroLinkProvider). This allows NeuroLink models to be used interchangeably with generateText, streamText, and other AI SDK functions, providing flexibility and leveraging an existing ecosystem.

import { createClient } from "@juspay/neurolink/client";

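const client = createClient({
  baseUrl: "https://api.neurolink.example.com",
  // Bundlers inline process.env values at build time, so this value ships to the browser.
  // Use a key scoped to your own backend, never a raw provider key (see the proxy pattern below).
  apiKey: process.env.NEUROLINK_API_KEY,
});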

// Generate text
const result = await client.generate({
  input: { text: "Explain TCP in two sentences" },
  provider: "openai",
  model: "gpt-4o",
});

console.log(result.data.content);

For React developers:

import { NeuroLinkProvider, useChat } from "@juspay/neurolink/client";

function App() {
  return (
    <NeuroLinkProvider
      config={{
        baseUrl: "https://api.neurolink.example.com",
        apiKey: process.env.NEUROLINK_API_KEY,
      }}
    >
      <ChatComponent />
    </NeuroLinkProvider>
  );
}

function ChatComponent() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } =
    useChat({
      agentId: "my-agent",
    });

  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

Bundle Size and Tree Shaking: Optimizing for the Web

One of the primary concerns with integrating complex SDKs into web applications is the impact on bundle size, which directly affects load times and user experience. NeuroLink is designed with this in mind.

The SDK leverages modern bundling techniques to ensure that only the necessary code is included in your client-side applications. The scripts/build-browser.mjs script, for instance, uses esbuild to create optimized browser bundles. A crucial part of this is stubbing out Node.js-specific modules and dependencies. Many internal NeuroLink components rely on Node.js APIs (like fs, path, crypto) or server-only npm packages. During the browser build, these are replaced with lightweight, browser-compatible stubs or polyfills, or removed entirely if the client-side functionality does not need them.

// Excerpt from scripts/build-browser.mjs
const nodeBuiltins = [
  'fs','fs/promises','path','crypto','os','events','http','https','net','tls',
  // ... and many more Node.js specific modules
];

const npmStubs = [
  'sharp','canvas','ffmpeg-static','pdf-parse','exceljs','adm-zip',
  // ... and many server-only npm packages
];

// In the esbuild configuration, these are marked as external or resolved to noop stubs.
// This ensures they don't get bundled into the client-side code.

This aggressive tree-shaking and stubbing strategy ensures that the client-side bundle remains as small as possible, minimizing overhead and maximizing performance for web users.
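One way to "resolve to noop stubs" is an esbuild plugin that intercepts matching imports and serves an empty module from a virtual namespace. Below is a minimal sketch. To keep it runnable without esbuild installed, it declares minimal structural types that mirror the parts of esbuild's plugin API it uses (`setup`, `onResolve`, `onLoad`); in a real build script you would import the `Plugin` type from "esbuild" instead. The plugin and helper names are hypothetical, not the contents of scripts/build-browser.mjs.

```typescript
// Hypothetical sketch: stubbing server-only modules in an esbuild browser build.
// Minimal structural types mirroring the subset of esbuild's plugin API used here.
interface ResolveArgs {
  path: string;
}
interface PluginBuild {
  onResolve(options: { filter: RegExp }, cb: (args: ResolveArgs) => { path: string; namespace: string }): void;
  onLoad(options: { filter: RegExp; namespace: string }, cb: () => { contents: string; loader: "js" }): void;
}
interface StubPlugin {
  name: string;
  setup(build: PluginBuild): void;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches "fs", "node:fs", and subpaths such as "fs/promises" or "pdf-parse/lib/x.js".
function moduleFilter(modules: string[]): RegExp {
  return new RegExp(`^(node:)?(${modules.map(escapeRegExp).join("|")})(/.*)?$`);
}

function stubModulesPlugin(modules: string[]): StubPlugin {
  const filter = moduleFilter(modules);
  return {
    name: "stub-server-modules",
    setup(build) {
      // Redirect matching imports into a virtual "stub" namespace.
      build.onResolve({ filter }, (args) => ({ path: args.path, namespace: "stub" }));
      // A CommonJS empty object lets named imports resolve to undefined instead of failing the build.
      build.onLoad({ filter: /.*/, namespace: "stub" }, () => ({
        contents: "module.exports = {};",
        loader: "js",
      }));
    },
  };
}
```

Anchoring the filter avoids accidentally stubbing look-alike packages: "fs" must not swallow "fsevents".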

Proxy Patterns for API Keys: Keeping Secrets Safe

Directly embedding API keys in client-side code is a significant security risk. NeuroLink facilitates secure interaction with AI services by encouraging and supporting proxy patterns for API key management. Instead of making direct calls to AI provider APIs from the browser with exposed keys, the NeuroLink client SDK is designed to communicate with your own backend (which then securely proxies requests to the AI providers using its own, secret API keys).

This can be achieved by setting baseUrl to your own API endpoint:

const client = createClient({
  // Your backend acts as a secure proxy
  baseUrl: "https://your-backend.com/neurolink-proxy",
  // API key for your backend, which then uses its own keys for AI providers
  apiKey: "your-backend-api-key-if-any",
});

This approach not only protects your sensitive credentials but also allows you to implement custom logic, rate limiting, logging, and caching on your backend, providing a robust and secure AI integration layer. OAuth2 client credentials and JWT token management are also supported, enabling more sophisticated authentication flows for enterprise applications.
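Below is a minimal sketch of the core of such a proxy, written as a pure function so it can run anywhere. It validates the browser's request, ignores anything the browser sent for auth, and attaches the server-side key. The request shape mirrors the `generate` calls above. The model allow-list, the size limit, `buildUpstreamRequest`, and the mapping to OpenAI's Chat Completions endpoint are assumptions for illustration, not NeuroLink's proxy contract.

```typescript
// Hypothetical sketch: turning a browser request into an authenticated upstream request.
interface ClientGenerateRequest {
  input: { text: string };
  model: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Server-side policy: which models browsers may use, and how large a prompt may be.
const ALLOWED_MODELS = new Set(["gpt-4o", "gpt-4o-mini"]);
const MAX_PROMPT_CHARS = 8_000;

function buildUpstreamRequest(raw: unknown, serverApiKey: string): UpstreamRequest {
  const req = raw as Partial<ClientGenerateRequest> | null;
  const text = req?.input?.text;
  const model = req?.model;
  if (typeof text !== "string" || text.length === 0) throw new Error("input.text is required");
  if (text.length > MAX_PROMPT_CHARS) throw new Error("prompt too long");
  if (typeof model !== "string" || !ALLOWED_MODELS.has(model)) throw new Error("model not allowed");
  return {
    url: "https://api.openai.com/v1/chat/completions",
    // Only the server's key goes upstream; client-supplied headers are never forwarded.
    headers: { Authorization: `Bearer ${serverApiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: text }] }),
  };
}
```

In an HTTP handler you would call this, `fetch` the result, and stream the upstream response back, with rate limiting and logging around it.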

Streaming in React/Vue/Svelte: Real-time AI Experiences

Modern AI applications thrive on real-time interaction, and streaming is a cornerstone of this experience. NeuroLink's client SDK offers comprehensive streaming capabilities, crucial for applications built with frameworks like React, Vue, or Svelte.

The SDK supports three primary streaming mechanisms:

Callback-Based Streaming (HTTP Client): The client.stream() method allows you to define callbacks (onText, onToolCall, onDone, onError, etc.) that are triggered as chunks of AI responses arrive. This is often the simplest way to integrate streaming into any JavaScript framework.

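// Assumes an element like <div id="output"></div> exists on the page.
const output = document.getElementById("output")!;

await client.stream(
  { input: { text: "Explain quantum computing" }, provider: "openai" },
  {
    onText: (text) => output.append(text), // Append each chunk to the UI as it arrives
    onDone: (result) => console.log("\nUsage:", result.usage),
  },
);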

Server-Sent Events (SSE): For long-lived, unidirectional streaming from the server to the client, the createSSEClient provides a dedicated, auto-reconnecting SSE client. This is ideal for scenarios where the server pushes updates (e.g., agent progress, ongoing generation).
WebSockets: For bidirectional, real-time communication, the createWebSocketClient enables full-duplex interactions, perfect for interactive AI agents that require constant back-and-forth messaging.
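Under the hood, an SSE client consumes a text stream of `field: value` lines, with a blank line marking the end of each event. Below is a simplified sketch of the parsing step. It handles `\n` and `\r\n` line endings but not a bare `\r`, and it ignores `retry`. `SSEParser` is a hypothetical name, not the `createSSEClient` internals. Tracking the last `id` is what lets a reconnecting client resume from where it left off via the `Last-Event-ID` header.

```typescript
// Hypothetical sketch: incremental parsing of a Server-Sent Events stream.
interface SSEEvent {
  event: string;
  data: string;
  id?: string;
}

class SSEParser {
  private buffer = "";
  private dataLines: string[] = [];
  private eventType = "";
  private lastId: string | undefined = undefined;

  // Feed raw text as it arrives; returns every event completed by this chunk.
  push(chunk: string): SSEEvent[] {
    this.buffer += chunk;
    const events: SSEEvent[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      let line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      if (line.endsWith("\r")) line = line.slice(0, -1);
      if (line === "") {
        // Blank line: dispatch the pending event, if it has any data.
        if (this.dataLines.length > 0) {
          events.push({ event: this.eventType || "message", data: this.dataLines.join("\n"), id: this.lastId });
        }
        this.dataLines = [];
        this.eventType = "";
        continue;
      }
      if (line.startsWith(":")) continue; // comment, often used as a keep-alive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1);
      if (field === "data") this.dataLines.push(value);
      else if (field === "event") this.eventType = value;
      else if (field === "id") this.lastId = value; // persists across events
    }
    return events;
  }
}
```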

The React hooks, such as useChat and useStream, abstract away much of this complexity, providing ready-to-use solutions for building streaming UIs that automatically update as AI generates content.

Client-Side AI and Edge Computing

Client-side AI is a natural complement to edge computing strategies. By performing inference directly in the browser, you effectively push computation to the "edge" of the network – the user's device. This distributed approach reduces reliance on centralized cloud resources, minimizes data transfer, and can lead to more resilient and scalable applications.

NeuroLink's design philosophy aligns with this trend, providing the tools necessary to build hybrid AI architectures where some tasks run on powerful cloud GPUs, while others, particularly those requiring low latency or high privacy, execute efficiently in the browser or on nearby edge devices.

Conclusion

The ability to run powerful AI models and tools directly within the browser opens up a new frontier for web application development. NeuroLink's Client-Side SDK for TypeScript provides a robust, type-safe, and highly optimized solution to navigate this landscape. By carefully managing bundle size, facilitating secure API key handling, and offering flexible streaming and framework integrations, NeuroLink empowers developers to create intelligent, responsive, and private AI-powered web experiences.

Whether you're building a real-time AI chat application, an intelligent content editor, or a personalized recommendation engine, NeuroLink's client SDK offers the foundation you need to bring your AI vision to the web.

Testing AI Outputs: 14 Scoring Strategies for Reliable LLM Applications

NeuroLink AI — Mon, 06 Apr 2026 08:31:56 +0000

As Large Language Models (LLMs) become central to modern applications, ensuring the quality and reliability of their outputs is paramount. Without systematic evaluation, you risk deploying models that are inaccurate, biased, or even harmful. At Juspay, with our NeuroLink SDK, we've developed a robust evaluation system that helps developers rigorously test and score AI outputs.

This article dives into 14 key scoring strategies, ranging from simple string matching to sophisticated LLM-as-judge techniques, and showcases how NeuroLink enables you to integrate these into your development workflow.

Why Systematic AI Output Evaluation Matters

AI systems, especially LLMs, are probabilistic by nature. Their outputs can vary based on input nuances, model versions, and even the random seed used during generation. Relying solely on anecdotal testing or manual review is unsustainable and prone to human error. A systematic approach to evaluation allows you to:

Ensure Accuracy: Verify that the LLM generates factually correct and relevant information.
Maintain Consistency: Check for consistent behavior across different inputs and scenarios.
Detect Issues Early: Identify hallucinations, biases, and toxic outputs before they reach production.
Optimize Performance: Fine-tune prompts and models based on quantifiable metrics.
Build Trust: Deliver reliable AI applications that users can depend on.

NeuroLink, our TypeScript-first Universal AI SDK, provides an extensive framework for building and running evaluation pipelines. Let's explore the strategies.

NeuroLink's Evaluation System: A Deep Dive into Scoring Strategies

NeuroLink's src/lib/evaluation module is designed for comprehensive AI output assessment. It categorizes scorers into two main types: Rule-based Scorers and LLM-based Scorers (often referred to as "LLM-as-a-judge").

Rule-Based Scoring Strategies

These strategies are excellent for objective, quantifiable checks that don't require semantic understanding from another LLM. They are fast, deterministic, and can act as powerful first-pass filters.

String Matching:
- Description: Checks if the output contains specific keywords, phrases, or exact substrings. Ideal for verifying the inclusion of required information or the absence of forbidden terms.
- Use Case: Ensuring a chatbot includes a disclaimer, or a generated summary mentions key entities.
Regex Validation:
- Description: Uses regular expressions to validate the format of an AI output.
- Use Case: Checking whether an extracted email address matches ^\S+@\S+\.\S+$, or whether a generated date follows the YYYY-MM-DD format. (Validating the structure of JSON output is better handled by the schema checks below.)
Zod Schema Checks:
- Description: Leverages Zod, a TypeScript-first schema declaration and validation library, to ensure AI-generated JSON or structured data conforms to a predefined schema.
- Use Case: Critical for ensuring reliable function calling and structured output, where the AI is expected to return data in a specific shape. NeuroLink's structured output feature pairs perfectly with this.
Length Scoring (lengthScorer.ts):
- Description: Evaluates the output based on its character or token count.
- Use Case: Enforcing conciseness in summaries, or ensuring generated marketing copy meets minimum length requirements.
Keyword Coverage (keywordCoverageScorer.ts):
- Description: Measures the percentage of predefined keywords present in the AI's response.
- Use Case: Verifying that an article covers all essential topics, or a product description includes relevant features.
Content Similarity (contentSimilarityScorer.ts):
- Description: Compares the AI's output against a reference text using metrics like Jaccard similarity, cosine similarity (on embeddings), or Levenshtein distance.
- Use Case: Assessing how closely a generated response matches a golden answer, or detecting plagiarism.
Format Scoring (formatScorer.ts):
- Description: General checks for specific formatting requirements beyond regex, such as markdown correctness, code syntax, or adherence to a style guide.
- Use Case: Ensuring generated code snippets are valid, or a report follows a specific document structure.
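Several of the rule-based checks above fit in a few lines each. Below is a minimal sketch, assuming every scorer maps an output string to a number in [0, 1]. These functions are illustrative stand-ins, not the exports of NeuroLink's lengthScorer.ts, keywordCoverageScorer.ts, or contentSimilarityScorer.ts.

```typescript
// Hypothetical sketch: rule-based scorers returning values in [0, 1].
type Scorer = (output: string) => number;

// Keyword coverage: fraction of required keywords present (case-insensitive substring match).
function keywordCoverage(keywords: string[]): Scorer {
  return (output) => {
    if (keywords.length === 0) return 1;
    const lower = output.toLowerCase();
    const hits = keywords.filter((k) => lower.includes(k.toLowerCase())).length;
    return hits / keywords.length;
  };
}

// Length: 1 when the output has between min and max characters (inclusive), else 0.
function lengthWithin(min: number, max: number): Scorer {
  return (output) => (output.length >= min && output.length <= max ? 1 : 0);
}

// Regex validation: 1 if the trimmed output matches. Pass a regex without the g flag,
// since a global regex keeps state between test() calls.
function matchesRegex(pattern: RegExp): Scorer {
  return (output) => (pattern.test(output.trim()) ? 1 : 0);
}

// Content similarity: Jaccard index (intersection / union) over lowercase word sets.
function jaccardSimilarity(reference: string): Scorer {
  const words = (s: string) => new Set(s.toLowerCase().match(/\w+/g) ?? []);
  const ref = words(reference);
  return (output) => {
    const out = Array.from(words(output));
    const union = new Set(Array.from(ref).concat(out));
    if (union.size === 0) return 1; // both empty: treat as identical
    const intersection = out.filter((w) => ref.has(w)).length;
    return intersection / union.size;
  };
}
```

Because they are deterministic and cheap, these make good first-pass filters in front of the slower LLM-based scorers.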

LLM-as-a-Judge Scoring Strategies

These advanced strategies utilize another LLM to evaluate the primary LLM's output. This allows for nuanced, semantic assessments that rule-based systems cannot perform. NeuroLink's scorers/llm directory houses a rich collection of these.

Answer Relevancy (answerRelevancyScorer.ts):
- Description: An LLM judges whether the generated answer directly addresses the user's query and provides relevant information.
- Use Case: Essential for chatbots and Q&A systems to prevent off-topic responses.
Context Relevancy (contextRelevancyScorer.ts):
- Description: Evaluates if the AI's response uses only information available in the provided context, without introducing external knowledge.
- Use Case: Crucial for RAG (Retrieval Augmented Generation) systems to ensure grounded responses.
Faithfulness (faithfulnessScorer.ts):
- Description: Similar to context relevancy, but specifically checks if all claims made in the AI's output are directly supported by the source material.
- Use Case: Verifying summaries or factual extractions from documents.
Hallucination Detection (hallucinationScorer.ts):
- Description: An LLM-as-judge identifies instances where the primary LLM generates information that is factually incorrect or unsupported by its knowledge base/context.
- Use Case: A critical safety check for all LLM applications, especially those delivering factual content.
Toxicity Scoring (toxicityScorer.ts):
- Description: Determines if the AI's output contains offensive, hateful, or inappropriate language.
- Use Case: Content moderation, ensuring polite and safe interactions in user-facing applications.
Bias Detection (biasDetectionScorer.ts):
- Description: An LLM-as-judge assesses whether the output exhibits unwanted biases (e.g., gender, racial, cultural).
- Use Case: Promoting fairness and ethical AI in sensitive applications.
Prompt Alignment (promptAlignmentScorer.ts):
- Description: Evaluates how well the AI's response adheres to the specific instructions, tone, and style requested in the prompt.
- Use Case: Ensuring consistency in brand voice, adherence to legal guidelines, or specific output formats.
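All of the LLM-as-a-judge scorers share the same two steps: ask a second model to grade the output against one criterion, then parse its verdict defensively. Below is a minimal sketch of both steps (the judge call itself is left out). The prompt wording, the 0-10 scale, the JSON reply format, and the function names are assumptions for illustration, not the prompts NeuroLink's scorers use.

```typescript
// Hypothetical sketch: building a judge prompt and parsing the judge's reply.
interface JudgeVerdict {
  score: number; // normalized to [0, 1]
  reason: string;
}

function buildJudgePrompt(criterion: string, question: string, answer: string): string {
  return [
    `You are grading an AI answer for: ${criterion}.`,
    `Question: ${question}`,
    `Answer: ${answer}`,
    `Reply with JSON only: {"score": <integer 0-10>, "reason": "<one sentence>"}`,
  ].join("\n");
}

// Judges often wrap JSON in prose, so parse the span from the first "{" to the last "}".
function parseJudgeVerdict(raw: string): JudgeVerdict {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) throw new Error("judge reply contained no JSON object");
  const parsed = JSON.parse(raw.slice(start, end + 1)) as { score?: unknown; reason?: unknown };
  const score = Number(parsed.score);
  if (!Number.isFinite(score)) throw new Error("judge reply had no numeric score");
  // Clamp out-of-range scores rather than trusting the judge to respect the scale.
  const clamped = Math.min(10, Math.max(0, score));
  return { score: clamped / 10, reason: String(parsed.reason ?? "") };
}
```

Failing loudly on an unparseable reply is safer than defaulting to a score, which would silently skew aggregate metrics.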

Building Evaluation Pipelines with NeuroLink

NeuroLink's evaluation/pipeline module allows you to chain these scorers together, define sampling strategies, and build comprehensive evaluation workflows. You can:

Create Custom Pipelines: Combine various rule-based and LLM-based scorers to form a multi-faceted evaluation.
Define Strategies: Implement batch processing or sampling strategies (batchStrategy.ts, samplingStrategy.ts) to manage evaluation costs and time for large datasets.
Generate Reports: Use the reporting module to aggregate metrics and generate actionable reports, providing insights into your LLM's performance over time.

For instance, a typical RAG evaluation pipeline might combine contextRelevancyScorer, faithfulnessScorer, answerRelevancyScorer, and toxicityScorer to ensure a grounded, relevant, and safe response.
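A pipeline is then just scorers applied to a dataset and aggregated. Below is a minimal sketch with deterministic sampling: a stable hash of each sample id decides whether that sample is evaluated, so reruns score the same subset. `runPipeline`, the FNV-1a hash choice, and mean aggregation are assumptions for illustration, not the batchStrategy.ts or samplingStrategy.ts implementations.

```typescript
// Hypothetical sketch: a scoring pipeline with deterministic sampling.
type Scorer = (output: string) => number;

interface Sample {
  id: string;
  output: string;
}

// 32-bit FNV-1a over UTF-16 code units: fast, stable across runs and machines.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Include roughly `rate` of all samples (rate 1 = all, rate 0 = none).
function inSample(id: string, rate: number): boolean {
  return (fnv1a(id) % 10_000) / 10_000 < rate;
}

function runPipeline(
  samples: Sample[],
  scorers: Record<string, Scorer>,
  rate = 1,
): { evaluated: number; means: Record<string, number> } {
  const chosen = samples.filter((s) => inSample(s.id, rate));
  const means: Record<string, number> = {};
  for (const name of Object.keys(scorers)) {
    const total = chosen.reduce((sum, s) => sum + scorers[name](s.output), 0);
    means[name] = chosen.length > 0 ? total / chosen.length : NaN;
  }
  return { evaluated: chosen.length, means };
}
```

Rule-based scorers are cheap enough to run on every sample; the sampling rate mainly matters for LLM-judge scorers, where each evaluation costs tokens.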

Conclusion

Testing AI outputs systematically is no longer optional; it's a foundational requirement for building reliable and trustworthy LLM applications. NeuroLink provides a powerful, flexible, and TypeScript-native framework to implement a wide array of scoring strategies, from rigid rule-based checks to nuanced LLM-as-judge evaluations.

By integrating these strategies into your development lifecycle, you can confidently deploy LLM applications that meet high standards of accuracy, safety, and performance.
