DEV Community

NeuroLink AI

AI Observability: Logging, Tracing, and Monitoring Your AI Calls

Your AI is a black box. Here's how to open it.

You deployed an AI feature. Users are complaining it's slow. Sometimes it returns garbage. You have no idea which model ran, what prompt was sent, or how many tokens it consumed. You open your cloud dashboard and see a single line item: "AI API calls — $847.23 this month."

That's the state of most AI applications in production. You are flying blind.

The good news: this is a solved problem. OpenTelemetry has defined standard semantic conventions for AI systems. Langfuse gives you a beautiful UI to inspect every trace. And NeuroLink wires all of this up automatically — zero boilerplate, one config block.

This article shows you how to go from zero observability to full tracing, monitoring, and EU AI Act-ready audit logging in under 30 minutes.


Why AI Observability Is No Longer Optional

Observability has always mattered for APIs and databases. For AI, it matters more — and for reasons beyond debugging.

Debugging: When your model returns a hallucination, you need to know the exact prompt, the provider, the model version, the temperature setting, and the full token usage. Without traces, you're guessing.

Cost management: LLM API costs are non-linear. A single misbehaving agent can consume thousands of dollars in a weekend. Token-level tracing lets you catch runaway usage before it hits your invoice.
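One way to catch runaway usage early is a per-day token budget checked on every call. A minimal sketch, assuming nothing beyond plain TypeScript; `TokenBudget` and `record` are illustrative names, not part of NeuroLink's API:

```typescript
// Sketch: a simple runaway-cost guard (hypothetical helper, not a NeuroLink API).
interface UsageEvent {
  inputTokens: number;
  outputTokens: number;
}

class TokenBudget {
  private spent = 0;
  constructor(private readonly dailyLimit: number) {}

  // Record a call's usage; returns false once the budget is exhausted.
  record(usage: UsageEvent): boolean {
    this.spent += usage.inputTokens + usage.outputTokens;
    return this.spent <= this.dailyLimit;
  }

  get remaining(): number {
    return Math.max(0, this.dailyLimit - this.spent);
  }
}

const budget = new TokenBudget(1_000_000); // 1M tokens/day
const ok = budget.record({ inputTokens: 1200, outputTokens: 800 });
console.log(ok, budget.remaining); // true 998000
```

Feed each call's `result.usage` into a guard like this and alert (or stop the agent) when `record` returns false, rather than discovering the overrun on your invoice.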

Performance: Is your p99 latency 8 seconds because of the model, the network, your RAG pipeline, or a slow tool call? You can only answer that with distributed tracing.

Compliance: The EU AI Act's high-risk provisions go live August 2, 2026. Penalties reach €35M or 7% of global revenue. Among the requirements: maintaining auditable records of AI decision-making, human oversight procedures, and risk documentation. Auditable AI is now a regulatory requirement, not a best practice.


NeuroLink's Observability Stack

NeuroLink ships with two observability integrations out of the box:

  1. Langfuse — the leading open-source LLM observability platform, with a hosted cloud option and self-hosted Docker deployment
  2. OpenTelemetry — the CNCF standard for distributed tracing, compatible with Jaeger, Tempo, Honeycomb, Datadog, and any OTel-compatible backend

You can use one, the other, or both simultaneously. Here is the full configuration surface:

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",

      // How traces are named in Langfuse
      traceNameFormat: "userId:operationName",
      // Format options:
      //   "userId:operationName"  -> "user@email.com:ai.streamText"
      //   "operationName:userId"  -> "ai.streamText:user@email.com"
      //   "operationName"         -> "ai.streamText"
      //   "userId"                -> "user@email.com"
      //   (ctx) => `[${ctx.operationName}] ${ctx.userId}`  // custom function

      autoDetectOperationName: true,

      // If your app already has an OTel setup, plug in instead of creating a new one
      useExternalTracerProvider: false,
      autoDetectExternalProvider: true,
      skipLangfuseSpanProcessor: false,
    },

    openTelemetry: {
      enabled: true,
      endpoint: "https://otel-collector.example.com",
      serviceName: "my-ai-service",
      serviceVersion: "1.0.0",
    },
  },
});

That's the entire setup. Every generate() call is now automatically instrumented.


Langfuse: Seeing Inside Every LLM Call

Langfuse traces show you the full lifecycle of each AI request: inputs, outputs, model selection, token counts, latency, and cost — all in a searchable UI.

Basic Setup

Sign up at langfuse.com (or self-host with Docker). Grab your public and secret keys from the project settings.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
  },
});

Correlating Traces with requestId

Every AI call can carry a requestId that flows through your entire observability stack. This is the key to answering "which AI call was responsible for this user complaint?"

// requestId appears in Langfuse traces, OTel spans, and your application logs
const result = await neurolink.generate({
  input: { text: "Analyze sentiment of customer feedback" },
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  requestId: "req-customer-feedback-001",  // your request correlation ID
});

// result.analytics contains per-call metrics
console.log(`Tokens used: ${result.usage?.totalTokens}`);
console.log(`Response time: ${result.responseTime}ms`);
console.log(`Provider: ${result.provider}, Model: ${result.model}`);

In Langfuse, you can search by requestId and immediately pull up the full trace: the exact prompt, the model response, token counts, latency breakdown, and cost.

Trace Naming Strategies

The traceNameFormat option controls how traces are organized in Langfuse. For multi-tenant applications, userId:operationName groups all of a user's AI activity together. For debugging by operation type, operationName:userId lets you filter by what the AI was doing.

You can also use a custom function for complete control:

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      // Custom trace naming for your specific context
      traceNameFormat: (ctx) => `[${ctx.operationName}] user=${ctx.userId} env=${process.env.NODE_ENV}`,
    },
  },
});

OpenTelemetry: Distributed Tracing with GenAI Semantic Conventions

OpenTelemetry's GenAI semantic conventions define a standard set of attributes for AI spans. NeuroLink automatically populates all of them on every call.

What Gets Traced

The following OTel attributes are captured on every generate() call:

gen_ai.system                  -> "anthropic" | "openai" | "vertex" | ...
gen_ai.request.model           -> "claude-sonnet-4-6" | "gpt-4o" | ...
gen_ai.response.model          -> the model that actually responded
gen_ai.usage.input_tokens      -> prompt token count
gen_ai.usage.output_tokens     -> completion token count
gen_ai.request.temperature     -> temperature setting
gen_ai.request.max_tokens      -> max tokens setting
ai.operationId                 -> operation identifier
ai.finishReason                -> "stop" | "length" | "tool_calls" | ...

These are the same attributes used by Datadog, Honeycomb, and every major OTel-compatible APM. Your AI spans integrate seamlessly with your existing infrastructure traces.
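If you also log call results yourself, you can emit the same attribute names so your logs line up with the spans. A sketch, assuming a result shape like NeuroLink's `generate()` output; the mapping function itself is illustrative:

```typescript
// Sketch: mapping a call result onto the GenAI attribute names listed above.
// The result shape mirrors generate()'s output; toGenAiAttributes is a
// hypothetical helper, not part of NeuroLink.
interface CallResult {
  provider: string;
  model: string;
  usage?: { inputTokens: number; outputTokens: number };
  finishReason?: string;
}

function toGenAiAttributes(r: CallResult): Record<string, string | number> {
  const attrs: Record<string, string | number> = {
    "gen_ai.system": r.provider,
    "gen_ai.request.model": r.model,
  };
  if (r.usage) {
    attrs["gen_ai.usage.input_tokens"] = r.usage.inputTokens;
    attrs["gen_ai.usage.output_tokens"] = r.usage.outputTokens;
  }
  if (r.finishReason) {
    attrs["ai.finishReason"] = r.finishReason;
  }
  return attrs;
}
```

Using the standard names means a Datadog or Honeycomb dashboard built for OTel GenAI spans will also understand your application-level logs.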

Connecting to Your OTel Collector

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    openTelemetry: {
      enabled: true,
      endpoint: "https://otel-collector.your-domain.com",
      serviceName: "ai-service",
      serviceVersion: "2.1.0",
    },
  },
});

For teams already running OTel in their application (common in microservice architectures), NeuroLink can plug into your existing tracer provider instead of creating a new one:

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      // Don't create a new OTel provider — use the one already initialized
      useExternalTracerProvider: true,
      autoDetectExternalProvider: true,
    },
  },
});

This means your AI spans appear in the same trace as your database queries and HTTP calls — full end-to-end visibility.


Context Compaction: Observing What You Can't See

Long-running agent conversations hit context limits. When they do, something has to give. NeuroLink handles this with a 4-stage context compaction pipeline — and understanding this pipeline is critical for observability.

Stage 1: Tool Output Pruning     (no LLM call — free)
Stage 2: File Read Deduplication (no LLM call — free)
Stage 3: LLM Summarization       (LLM call — costs tokens)
Stage 4: Sliding Window Truncation (no LLM call — fallback)

The pipeline tries the cheapest option first and escalates only when necessary. But Stage 3 — LLM summarization — is itself an AI call that consumes tokens and incurs cost. Without observability, you would not know it was happening.

With Langfuse tracing enabled, each compaction stage appears as a child span in your trace. You can see exactly when context compaction triggers, which stage ran, and how many tokens the summarization consumed.
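The cheapest-first escalation can be sketched as a short loop. The stage implementations below are stubs with made-up token savings; only the control flow mirrors the pipeline described above:

```typescript
// Sketch of the cheapest-first escalation. Stage bodies are stubs;
// the numbers are illustrative, not NeuroLink's actual behavior.
type Stage = { name: string; run: (tokens: number) => number };

function compact(
  tokens: number,
  limit: number,
  stages: Stage[]
): { tokens: number; ran: string[] } {
  const ran: string[] = [];
  for (const stage of stages) {
    if (tokens <= limit) break; // stop as soon as the context fits
    tokens = stage.run(tokens);
    ran.push(stage.name); // each stage that runs shows up as a child span
  }
  return { tokens, ran };
}

const stages: Stage[] = [
  { name: "prune",       run: (t) => t - 20_000 },            // free
  { name: "deduplicate", run: (t) => t - 10_000 },            // free
  { name: "summarize",   run: (t) => Math.floor(t * 0.4) },   // LLM call
  { name: "truncate",    run: (t) => Math.floor(t * 0.5) },   // fallback
];
```

Running `compact(150_000, 100_000, stages)` would escalate through prune, deduplicate, and summarize, then stop before truncation once the context fits, which is exactly the shape you want to see in the trace.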

Here is how to configure the compaction pipeline:

import { ContextCompactor } from "@juspay/neurolink";

const compactor = new ContextCompactor({
  enablePrune: true,
  enableDeduplicate: true,
  enableSummarize: true,       // Stage 3: uses an LLM — visible in traces
  enableTruncate: true,        // Stage 4: fallback, no LLM cost

  pruneProtectTokens: 40_000,    // protect the last 40k tokens from pruning
  pruneMinimumSavings: 20_000,   // only prune if it saves 20k+ tokens
  pruneProtectedTools: ["skill"],

  summarizationProvider: "vertex",
  summarizationModel: "gemini-2.5-flash",
  keepRecentRatio: 0.3,
  truncationFraction: 0.5,
});

The choice of summarizationProvider and summarizationModel lets you route compaction calls to a cheaper model — for example, using Gemini Flash for summarization while your main agent uses Claude Sonnet. This cost optimization is visible in your Langfuse traces: you will see two different models in the same conversation trace.


HITL Audit Logging: The Compliance Layer

Human-in-the-Loop (HITL) is NeuroLink's safety system for AI agents that take real-world actions. When an agent tries to call a dangerous tool — delete, drop, truncate, kill — HITL intercepts it and waits for human approval.

HITL's audit logging is a core part of your observability stack, especially for EU AI Act compliance. Every approval, rejection, and timeout is logged with full context.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  hitl: {
    enabled: true,
    dangerousActions: ["delete", "drop", "truncate", "remove", "kill"],
    timeout: 30000,              // 30 seconds for human to respond
    allowArgumentModification: true,   // human can edit args before approving
    autoApproveOnTimeout: false,       // reject on timeout — safe default
    auditLogging: true,                // write compliance audit trail

    customRules: [
      {
        name: "production-database-rule",
        condition: (toolName, args) => {
          return toolName.includes("database") &&
                 JSON.stringify(args).includes("production");
        },
        requiresConfirmation: true,
        customMessage: "This action touches the production database!",
      },
    ],
  },
});

The HITLAuditLog type captures everything the EU AI Act's human oversight requirements ask for:

// What the audit log captures (HITLAuditLog type):
{
  eventType:    "confirmation-request" | "confirmation-response" | "timeout",
  toolName:     string,           // which tool was called
  arguments:    unknown,          // what arguments it was called with
  approved:     boolean,          // what the human decided
  reason:       string,           // why they approved or rejected
  userId:       string,           // who made the decision
  ipAddress:    string,           // from where
  userAgent:    string,           // from which client
  responseTime: number,           // how long the human took (ms)
  timestamp:    string,           // ISO timestamp
}

This audit log is your paper trail. When a regulator asks "did a human review this AI action?", you have a timestamped, tamper-evident record.
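If you want a copy of that trail under your own retention policy, one simple option is an append-only JSON Lines file. A sketch under that assumption; NeuroLink's `auditLogging` handles its own persistence, and `writeAudit` here is a hypothetical helper:

```typescript
import { appendFileSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Sketch: mirroring HITL audit events into your own append-only JSONL store.
// writeAudit is illustrative; the event shape follows the HITLAuditLog fields.
interface AuditEvent {
  eventType: "confirmation-request" | "confirmation-response" | "timeout";
  toolName: string;
  approved?: boolean;
  userId?: string;
  timestamp: string;
}

const AUDIT_FILE = join(tmpdir(), "hitl-audit.jsonl");

function writeAudit(event: AuditEvent): void {
  // One JSON object per line: trivially greppable, trivially replayable.
  appendFileSync(AUDIT_FILE, JSON.stringify(event) + "\n");
}

writeAudit({
  eventType: "confirmation-response",
  toolName: "drop_table",
  approved: false,
  userId: "ops@example.com",
  timestamp: new Date().toISOString(),
});
```

Append-only JSONL is a reasonable baseline; for tamper evidence you would layer on write-once storage or hash chaining.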

HITL Statistics

You can query live statistics from the HITL manager at any time:

const stats = hitlManager.getStatistics();
// {
//   totalRequests:       number,  // all-time confirmation requests
//   pendingRequests:     number,  // currently awaiting human decision
//   averageResponseTime: number,  // ms average across all decisions
//   approvedRequests:    number,
//   rejectedRequests:    number,
//   timedOutRequests:    number,
// }

Expose this via a metrics endpoint and you have a live dashboard of how often your AI is asking for human oversight — and how quickly humans are responding.
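One way to expose those numbers is Prometheus text format. A sketch assuming the statistics shape shown above; the metric names are illustrative, not an official exporter:

```typescript
// Sketch: rendering HITL statistics in Prometheus exposition format.
// The stats shape matches getStatistics() above; metric names are made up.
interface HITLStatistics {
  totalRequests: number;
  pendingRequests: number;
  averageResponseTime: number;
  approvedRequests: number;
  rejectedRequests: number;
  timedOutRequests: number;
}

function toPrometheus(stats: HITLStatistics): string {
  return [
    `hitl_requests_total ${stats.totalRequests}`,
    `hitl_requests_pending ${stats.pendingRequests}`,
    `hitl_response_time_ms_avg ${stats.averageResponseTime}`,
    `hitl_requests_approved ${stats.approvedRequests}`,
    `hitl_requests_rejected ${stats.rejectedRequests}`,
    `hitl_requests_timed_out ${stats.timedOutRequests}`,
  ].join("\n");
}
```

Serve this string from a `/metrics` route and any Prometheus-compatible scraper can graph approval rates and human response times over time.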


Putting It All Together: A Production-Ready Observable AI Setup

Here is a complete production setup combining all three layers: Langfuse tracing, OTel spans, and HITL audit logging.

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  // Full observability stack
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: process.env.NODE_ENV as "production" | "development",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
    openTelemetry: {
      enabled: true,
      endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT!,
      serviceName: "ai-backend",
      serviceVersion: process.env.npm_package_version ?? "unknown",
    },
  },

  // HITL with audit logging for compliance
  hitl: {
    enabled: true,
    dangerousActions: ["delete", "drop", "truncate", "remove", "kill"],
    timeout: 30000,
    allowArgumentModification: true,
    autoApproveOnTimeout: false,
    auditLogging: true,
  },
});

// Every generate() call is now fully traced
async function analyzeCustomerFeedback(
  userId: string,
  feedbackText: string,
  requestId: string
) {
  const result = await neurolink.generate({
    input: { text: `Analyze the sentiment and key themes in: ${feedbackText}` },
    provider: "anthropic",
    model: "claude-sonnet-4-6",
    requestId,  // correlates this call across Langfuse, OTel, and your own logs
  });

  // result.analytics — per-call metrics
  console.log(`Tokens: ${result.usage?.totalTokens}`);
  console.log(`Cost: $${result.analytics?.cost?.toFixed(6)}`);
  console.log(`Latency: ${result.responseTime}ms`);
  console.log(`Provider: ${result.provider}`);

  return result.content;
}

When you call analyzeCustomerFeedback, here is what happens automatically:

  1. A trace is created in Langfuse with the name userId:ai.generate
  2. An OTel span is created with gen_ai.* attributes
  3. The span is exported to your OTel collector
  4. requestId appears in both traces, linking them
  5. If a tool call triggers HITL, the approval decision is written to the audit log
  6. result.analytics gives you the cost and token breakdown

Reading the Analytics Data

Every generate() result includes an analytics field with per-call metrics. You do not need observability enabled to access this — it is always present.

const result = await neurolink.generate({
  input: { text: "Summarize this quarterly report" },
  provider: "openai",
  model: "gpt-4o",
});

// Usage breakdown
console.log(result.usage?.inputTokens);    // prompt tokens
console.log(result.usage?.outputTokens);   // completion tokens
console.log(result.usage?.totalTokens);    // sum

// Performance
console.log(result.responseTime);          // milliseconds

// Which provider/model actually ran
console.log(result.provider);             // "openai"
console.log(result.model);               // "gpt-4o"

// Tools called (if any)
console.log(result.toolsUsed);           // ["search", "read_file"]

// Cost (if analytics enabled — it is by default)
console.log(result.analytics?.cost);     // USD float

Combine this with requestId correlation and you can build a complete picture: which user triggered which AI call, what it cost, how long it took, and which tools it used — all from your own application logs, without needing to open Langfuse.
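That picture is easiest to assemble if every call emits one structured log line. A sketch, assuming the result fields shown above; `logAiCall` is a hypothetical helper for your own logging layer:

```typescript
// Sketch: one structured log entry per AI call, keyed by requestId.
// The result fields mirror generate()'s output; logAiCall is illustrative.
interface AiCallResult {
  provider: string;
  model: string;
  responseTime: number;
  usage?: { totalTokens: number };
  analytics?: { cost: number };
  toolsUsed?: string[];
}

function logAiCall(requestId: string, userId: string, r: AiCallResult): string {
  return JSON.stringify({
    event: "ai.call",
    requestId, // same ID that appears in Langfuse and OTel
    userId,
    provider: r.provider,
    model: r.model,
    latencyMs: r.responseTime,
    totalTokens: r.usage?.totalTokens ?? null,
    costUsd: r.analytics?.cost ?? null,
    tools: r.toolsUsed ?? [],
  });
}
```

Ship these lines to your existing log pipeline and `requestId` becomes the join key between your logs, Langfuse traces, and OTel spans.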


EU AI Act Compliance Checklist

With NeuroLink's observability stack fully configured, here is what you can demonstrate to an auditor:

| Requirement | How NeuroLink Covers It |
| --- | --- |
| Audit trail for AI decisions | HITL HITLAuditLog with userId, timestamp, decision, responseTime |
| Human oversight records | HITL approval/rejection events with allowArgumentModification |
| AI system inventory | result.provider + result.model in every trace gives you a live model inventory |
| Input/output logging | Langfuse traces capture full prompt and response |
| Performance monitoring | OTel spans + result.responseTime per call |
| Cost and usage tracking | result.analytics?.cost + result.usage per call |
| Risk documentation | HITLStatistics gives aggregate oversight metrics |

The August 2026 deadline is months away. The teams scrambling to retrofit compliance into their AI stack are the ones who did not build with an observable SDK from the start.


What's Next

You have seen how NeuroLink turns your AI calls from a black box into a fully observable system. Every call is traced. Every cost is captured. Every human oversight decision is audited.

Try it yourself:

npm install @juspay/neurolink

Then sign up for a free Langfuse account, drop in your keys, and run your first traced AI call in under 5 minutes.


This is part of the NeuroLink AI Development series. Previous articles covered HITL safety systems, RAG pipelines, MCP in production, and multi-model workflow engines.
