# Your AI is a black box. Here's how to open it.
You deployed an AI feature. Users are complaining it's slow. Sometimes it returns garbage. You have no idea which model ran, what prompt was sent, or how many tokens it consumed. You open your cloud dashboard and see a single line item: "AI API calls — $847.23 this month."
That's the state of most AI applications in production. You are flying blind.
The good news: this is a solved problem. OpenTelemetry has defined standard semantic conventions for AI systems. Langfuse gives you a beautiful UI to inspect every trace. And NeuroLink wires all of this up automatically — zero boilerplate, one config block.
This article shows you how to go from zero observability to full tracing, monitoring, and EU AI Act-ready audit logging in under 30 minutes.
## Why AI Observability Is No Longer Optional
Observability has always mattered for APIs and databases. For AI, it matters more — and for reasons beyond debugging.
- **Debugging:** When your model returns a hallucination, you need to know the exact prompt, the provider, the model version, the temperature setting, and the full token usage. Without traces, you're guessing.
- **Cost management:** LLM API costs are non-linear. A single misbehaving agent can consume thousands of dollars in a weekend. Token-level tracing lets you catch runaway usage before it hits your invoice.
- **Performance:** Is your p99 latency 8 seconds because of the model, the network, your RAG pipeline, or a slow tool call? You can only answer that with distributed tracing.
- **Compliance:** The EU AI Act's high-risk provisions take effect on August 2, 2026. Penalties reach €35M or 7% of global revenue. Among the requirements: maintaining auditable records of AI decision-making, human oversight procedures, and risk documentation. Auditable AI is now a regulatory requirement, not a best practice.
## NeuroLink's Observability Stack
NeuroLink ships with two observability integrations out of the box:

- **Langfuse** — the leading open-source LLM observability platform, with a hosted cloud option and self-hosted Docker deployment
- **OpenTelemetry** — the CNCF standard for distributed tracing, compatible with Jaeger, Tempo, Honeycomb, Datadog, and any OTel-compatible backend
You can use one, the other, or both simultaneously. Here is the full configuration, showing every available option:
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",
      // How traces are named in Langfuse
      traceNameFormat: "userId:operationName",
      // Format options:
      //   "userId:operationName" -> "user@email.com:ai.streamText"
      //   "operationName:userId" -> "ai.streamText:user@email.com"
      //   "operationName"        -> "ai.streamText"
      //   "userId"               -> "user@email.com"
      //   (ctx) => `[${ctx.operationName}] ${ctx.userId}` // custom function
      autoDetectOperationName: true,
      // If your app already has an OTel setup, plug in instead of creating a new one
      useExternalTracerProvider: false,
      autoDetectExternalProvider: true,
      skipLangfuseSpanProcessor: false,
    },
    openTelemetry: {
      enabled: true,
      endpoint: "https://otel-collector.example.com",
      serviceName: "my-ai-service",
      serviceVersion: "1.0.0",
    },
  },
});
```
That's the entire setup. Every `generate()` call is now automatically instrumented.
## Langfuse: Seeing Inside Every LLM Call
Langfuse traces show you the full lifecycle of each AI request: inputs, outputs, model selection, token counts, latency, and cost — all in a searchable UI.
### Basic Setup
Sign up at langfuse.com (or self-host with Docker). Grab your public and secret keys from the project settings.
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: "production",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
  },
});
```
### Correlating Traces with `requestId`
Every AI call can carry a `requestId` that flows through your entire observability stack. This is the key to answering "which AI call was responsible for this user complaint?"
```typescript
// requestId appears in Langfuse traces, OTel spans, and your application logs
const result = await neurolink.generate({
  input: { text: "Analyze sentiment of customer feedback" },
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  requestId: "req-customer-feedback-001", // your request correlation ID
});

// result.analytics contains per-call metrics
console.log(`Tokens used: ${result.usage?.totalTokens}`);
console.log(`Response time: ${result.responseTime}ms`);
console.log(`Provider: ${result.provider}, Model: ${result.model}`);
```
In Langfuse, you can search by `requestId` and immediately pull up the full trace: the exact prompt, the model response, token counts, latency breakdown, and cost.
### Trace Naming Strategies
The `traceNameFormat` option controls how traces are organized in Langfuse. For multi-tenant applications, `userId:operationName` groups all of a user's AI activity together. For debugging by operation type, `operationName:userId` lets you filter by what the AI was doing.
You can also use a custom function for complete control:
```typescript
const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      // Custom trace naming for your specific context
      traceNameFormat: (ctx) =>
        `[${ctx.operationName}] user=${ctx.userId} env=${process.env.NODE_ENV}`,
    },
  },
});
```
## OpenTelemetry: Distributed Tracing with GenAI Semantic Conventions
OpenTelemetry's GenAI semantic conventions define a standard set of attributes for AI spans. NeuroLink automatically populates all of them on every call.
### What Gets Traced
The following OTel attributes are captured on every `generate()` call:
```
gen_ai.system              -> "anthropic" | "openai" | "vertex" | ...
gen_ai.request.model       -> "claude-sonnet-4-6" | "gpt-4o" | ...
gen_ai.response.model      -> the model that actually responded
gen_ai.usage.input_tokens  -> prompt token count
gen_ai.usage.output_tokens -> completion token count
gen_ai.request.temperature -> temperature setting
gen_ai.request.max_tokens  -> max tokens setting
ai.operationId             -> operation identifier
ai.finishReason            -> "stop" | "length" | "tool_calls" | ...
```
These are the same attributes used by Datadog, Honeycomb, and every major OTel-compatible APM. Your AI spans integrate seamlessly with your existing infrastructure traces.
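To make the mapping concrete, here is a small sketch of how a call's metadata translates into that attribute record. The `CallInfo` shape and the `genAiAttributes` helper are illustrative only, not NeuroLink exports:

```typescript
// Hypothetical shape for illustration — not a NeuroLink export.
interface CallInfo {
  provider: string;
  requestModel: string;
  responseModel: string;
  inputTokens: number;
  outputTokens: number;
  finishReason: string;
}

// Build an OTel GenAI attribute record for a span, following the
// gen_ai.* naming from the table above.
function genAiAttributes(call: CallInfo): Record<string, string | number> {
  return {
    "gen_ai.system": call.provider,
    "gen_ai.request.model": call.requestModel,
    "gen_ai.response.model": call.responseModel,
    "gen_ai.usage.input_tokens": call.inputTokens,
    "gen_ai.usage.output_tokens": call.outputTokens,
    "ai.finishReason": call.finishReason,
  };
}
```

Because the attribute names are standardized, any backend that understands the GenAI conventions can aggregate these spans without custom parsing.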
### Connecting to Your OTel Collector
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  observability: {
    openTelemetry: {
      enabled: true,
      endpoint: "https://otel-collector.your-domain.com",
      serviceName: "ai-service",
      serviceVersion: "2.1.0",
    },
  },
});
```
For teams already running OTel in their application (common in microservice architectures), NeuroLink can plug into your existing tracer provider instead of creating a new one:
```typescript
const neurolink = new NeuroLink({
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      // Don't create a new OTel provider — use the one already initialized
      useExternalTracerProvider: true,
      autoDetectExternalProvider: true,
    },
  },
});
```
This means your AI spans appear in the same trace as your database queries and HTTP calls — full end-to-end visibility.
## Context Compaction: Observing What You Can't See
Long-running agent conversations hit context limits. When they do, something has to give. NeuroLink handles this with a 4-stage context compaction pipeline — and understanding this pipeline is critical for observability.
- **Stage 1: Tool Output Pruning** (no LLM call — free)
- **Stage 2: File Read Deduplication** (no LLM call — free)
- **Stage 3: LLM Summarization** (LLM call — costs tokens)
- **Stage 4: Sliding Window Truncation** (no LLM call — fallback)
The pipeline tries the cheapest option first and escalates only when necessary. But Stage 3 — LLM summarization — is itself an AI call that consumes tokens and incurs cost. Without observability, you would not know it was happening.
With Langfuse tracing enabled, each compaction stage appears as a child span in your trace. You can see exactly when context compaction triggers, which stage ran, and how many tokens the summarization consumed.
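The escalation logic is easy to reason about in isolation. Here is a simplified, self-contained sketch of "cheapest first, escalate only when necessary"; the reduction ratios are made up for illustration and are not NeuroLink's actual numbers:

```typescript
// Simplified illustration of staged escalation — not NeuroLink's implementation.
// Each stage shrinks the context by a made-up ratio.
type Stage = { name: string; usesLlm: boolean; apply: (tokens: number) => number };

const stages: Stage[] = [
  { name: "prune",       usesLlm: false, apply: (t) => Math.round(t * 0.8) },
  { name: "deduplicate", usesLlm: false, apply: (t) => Math.round(t * 0.9) },
  { name: "summarize",   usesLlm: true,  apply: (t) => Math.round(t * 0.4) },
  { name: "truncate",    usesLlm: false, apply: (t) => Math.round(t * 0.5) },
];

// Run stages cheapest-first; stop as soon as the context fits the budget.
function compact(tokens: number, budget: number): { tokens: number; ran: string[] } {
  const ran: string[] = [];
  for (const stage of stages) {
    if (tokens <= budget) break;
    tokens = stage.apply(tokens);
    ran.push(stage.name);
  }
  return { tokens, ran };
}
```

Note that in this model, a conversation only pays for an LLM call when the two free stages were not enough, which is exactly the behavior you want to confirm in your traces.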
Here is how to configure the compaction pipeline:
```typescript
import { ContextCompactor } from "@juspay/neurolink";

const compactor = new ContextCompactor({
  enablePrune: true,
  enableDeduplicate: true,
  enableSummarize: true, // Stage 3: uses an LLM — visible in traces
  enableTruncate: true,  // Stage 4: fallback, no LLM cost
  pruneProtectTokens: 40_000,  // protect the last 40k tokens from pruning
  pruneMinimumSavings: 20_000, // only prune if it saves 20k+ tokens
  pruneProtectedTools: ["skill"],
  summarizationProvider: "vertex",
  summarizationModel: "gemini-2.5-flash",
  keepRecentRatio: 0.3,
  truncationFraction: 0.5,
});
```
The choice of `summarizationProvider` and `summarizationModel` lets you route compaction calls to a cheaper model — for example, using Gemini Flash for summarization while your main agent uses Claude Sonnet. This cost optimization is visible in your Langfuse traces: you will see two different models in the same conversation trace.
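A quick back-of-the-envelope helper shows why this routing matters. The per-million-token prices below are hypothetical placeholders, not real rates; substitute your providers' actual pricing:

```typescript
// Illustrative only — these prices are hypothetical placeholders, not real rates.
const PRICE_PER_MTOK_INPUT: Record<string, number> = {
  "claude-sonnet-4-6": 3.0, // hypothetical $ per 1M input tokens
  "gemini-2.5-flash": 0.3,  // hypothetical $ per 1M input tokens
};

// Cost of a summarization pass that reads `inputTokens` tokens of context.
function summarizationCost(model: string, inputTokens: number): number {
  const price = PRICE_PER_MTOK_INPUT[model];
  if (price === undefined) throw new Error(`unknown model: ${model}`);
  return (inputTokens / 1_000_000) * price;
}
```

With these placeholder numbers, summarizing 100k tokens of context on the cheaper model costs a tenth of doing it on the main agent's model, and your traces let you verify the routing actually happened.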
## HITL Audit Logging: The Compliance Layer
Human-in-the-Loop (HITL) is NeuroLink's safety system for AI agents that take real-world actions. When an agent tries to call a dangerous tool — delete, drop, truncate, kill — HITL intercepts it and waits for human approval.
HITL's audit logging is a core part of your observability stack, especially for EU AI Act compliance. Every approval, rejection, and timeout is logged with full context.
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  hitl: {
    enabled: true,
    dangerousActions: ["delete", "drop", "truncate", "remove", "kill"],
    timeout: 30000, // 30 seconds for human to respond
    allowArgumentModification: true, // human can edit args before approving
    autoApproveOnTimeout: false, // reject on timeout — safe default
    auditLogging: true, // write compliance audit trail
    customRules: [
      {
        name: "production-database-rule",
        condition: (toolName, args) =>
          toolName.includes("database") &&
          JSON.stringify(args).includes("production"),
        requiresConfirmation: true,
        customMessage: "This action touches the production database!",
      },
    ],
  },
});
```
The `HITLAuditLog` type captures everything the EU AI Act's human oversight requirements ask for:
```typescript
// What the audit log captures (HITLAuditLog type):
interface HITLAuditLog {
  eventType: "confirmation-request" | "confirmation-response" | "timeout";
  toolName: string;     // which tool was called
  arguments: unknown;   // what arguments it was called with
  approved: boolean;    // what the human decided
  reason: string;       // why they approved or rejected
  userId: string;       // who made the decision
  ipAddress: string;    // from where
  userAgent: string;    // from which client
  responseTime: number; // how long the human took (ms)
  timestamp: string;    // ISO timestamp
}
```
This audit log is your paper trail. When a regulator asks "did a human review this AI action?", you have a timestamped, tamper-evident record.
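If you archive these entries yourself, a small completeness check can catch records that would be useless in an audit. This validator is a sketch written against the shape above; it is not part of the NeuroLink API:

```typescript
// Fields an oversight record needs to answer "who decided what, when" —
// a subset of the HITLAuditLog shape, chosen here for illustration.
const REQUIRED_FIELDS = [
  "eventType", "toolName", "approved", "userId", "timestamp",
] as const;

// Returns the list of oversight fields missing from an audit entry.
function missingOversightFields(entry: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter((f) => entry[f] === undefined);
}
```

Running this check at archive time means you find incomplete records immediately, not during a regulator's review.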
### HITL Statistics
You can query live statistics from the HITL manager at any time:
```typescript
const stats = hitlManager.getStatistics();
// {
//   totalRequests: number,       // all-time confirmation requests
//   pendingRequests: number,     // currently awaiting human decision
//   averageResponseTime: number, // ms average across all decisions
//   approvedRequests: number,
//   rejectedRequests: number,
//   timedOutRequests: number,
// }
```
Expose this via a metrics endpoint and you have a live dashboard of how often your AI is asking for human oversight — and how quickly humans are responding.
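As a sketch, the raw statistics can be reduced to a few dashboard-ready ratios before you expose them. The `HITLStatistics` interface below mirrors the shape shown above; the derived metric names are my own:

```typescript
// Mirrors the statistics shape shown above.
interface HITLStatistics {
  totalRequests: number;
  pendingRequests: number;
  averageResponseTime: number;
  approvedRequests: number;
  rejectedRequests: number;
  timedOutRequests: number;
}

// Derive dashboard-ready metrics from raw HITL statistics.
function hitlMetrics(stats: HITLStatistics) {
  const decided =
    stats.approvedRequests + stats.rejectedRequests + stats.timedOutRequests;
  return {
    approvalRate: decided === 0 ? 0 : stats.approvedRequests / decided,
    timeoutRate: decided === 0 ? 0 : stats.timedOutRequests / decided,
    backlog: stats.pendingRequests,
    avgHumanResponseMs: stats.averageResponseTime,
  };
}
```

A rising `timeoutRate` is an early warning that humans cannot keep up with the volume of approval requests — worth alerting on before it silently blocks your agents.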
## Putting It All Together: A Production-Ready Observable AI Setup
Here is a complete production setup combining all three layers: Langfuse tracing, OTel spans, and HITL audit logging.
```typescript
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({
  // Full observability stack
  observability: {
    langfuse: {
      enabled: true,
      publicKey: process.env.LANGFUSE_PUBLIC_KEY!,
      secretKey: process.env.LANGFUSE_SECRET_KEY!,
      environment: process.env.NODE_ENV as "production" | "development",
      traceNameFormat: "userId:operationName",
      autoDetectOperationName: true,
    },
    openTelemetry: {
      enabled: true,
      endpoint: process.env.OTEL_EXPORTER_OTLP_ENDPOINT!,
      serviceName: "ai-backend",
      serviceVersion: process.env.npm_package_version ?? "unknown",
    },
  },
  // HITL with audit logging for compliance
  hitl: {
    enabled: true,
    dangerousActions: ["delete", "drop", "truncate", "remove", "kill"],
    timeout: 30000,
    allowArgumentModification: true,
    autoApproveOnTimeout: false,
    auditLogging: true,
  },
});

// Every generate() call is now fully traced
async function analyzeCustomerFeedback(
  userId: string,
  feedbackText: string,
  requestId: string
) {
  const result = await neurolink.generate({
    input: { text: `Analyze the sentiment and key themes in: ${feedbackText}` },
    provider: "anthropic",
    model: "claude-sonnet-4-6",
    requestId, // correlates this call across Langfuse, OTel, and your own logs
  });

  // result.analytics — per-call metrics
  console.log(`Tokens: ${result.usage?.totalTokens}`);
  console.log(`Cost: $${result.analytics?.cost?.toFixed(6)}`);
  console.log(`Latency: ${result.responseTime}ms`);
  console.log(`Provider: ${result.provider}`);

  return result.content;
}
```
When you call `analyzeCustomerFeedback`, here is what happens automatically:

- A trace is created in Langfuse with the name `userId:ai.generate`
- An OTel span is created with `gen_ai.*` attributes
- The span is exported to your OTel collector
- `requestId` appears in both traces, linking them
- If a tool call triggers HITL, the approval decision is written to the audit log
- `result.analytics` gives you the cost and token breakdown
## Reading the Analytics Data
Every `generate()` result includes an `analytics` field with per-call metrics. You do not need observability enabled to access this — it is always present.
```typescript
const result = await neurolink.generate({
  input: { text: "Summarize this quarterly report" },
  provider: "openai",
  model: "gpt-4o",
});

// Usage breakdown
console.log(result.usage?.inputTokens);  // prompt tokens
console.log(result.usage?.outputTokens); // completion tokens
console.log(result.usage?.totalTokens);  // sum

// Performance
console.log(result.responseTime); // milliseconds

// Which provider/model actually ran
console.log(result.provider); // "openai"
console.log(result.model);    // "gpt-4o"

// Tools called (if any)
console.log(result.toolsUsed); // ["search", "read_file"]

// Cost (if analytics enabled — it is by default)
console.log(result.analytics?.cost); // USD float
```
Combine this with `requestId` correlation and you can build a complete picture: which user triggered which AI call, what it cost, how long it took, and which tools it used — all from your own application logs, without needing to open Langfuse.
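One way to build that picture is to emit a structured log line per call from the result fields above. The `CallRecord` shape and the `callLogLine` helper are illustrative, not part of the SDK:

```typescript
// Hypothetical per-call record assembled from the result fields above.
interface CallRecord {
  requestId: string;
  provider: string;
  model: string;
  totalTokens: number;
  responseTimeMs: number;
  costUsd?: number;
}

// One structured log line per AI call, keyed by requestId so your log
// pipeline can join it against Langfuse traces and OTel spans.
function callLogLine(rec: CallRecord): string {
  return JSON.stringify({
    event: "ai_call",
    requestId: rec.requestId,
    provider: rec.provider,
    model: rec.model,
    totalTokens: rec.totalTokens,
    responseTimeMs: rec.responseTimeMs,
    costUsd: rec.costUsd ?? null,
  });
}
```

Feeding these lines into your existing log aggregator gives you cost and latency queries per user or per feature without any extra tooling.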
## EU AI Act Compliance Checklist
With NeuroLink's observability stack fully configured, here is what you can demonstrate to an auditor:
| Requirement | How NeuroLink Covers It |
|---|---|
| Audit trail for AI decisions | HITL `HITLAuditLog` with `userId`, timestamp, decision, `responseTime` |
| Human oversight records | HITL approval/rejection events with `allowArgumentModification` |
| AI system inventory | `result.provider` + `result.model` in every trace gives you a live model inventory |
| Input/output logging | Langfuse traces capture full prompt and response |
| Performance monitoring | OTel spans + `result.responseTime` per call |
| Cost and usage tracking | `result.analytics?.cost` + `result.usage` per call |
| Risk documentation | `HITLStatistics` gives aggregate oversight metrics |
The August 2026 deadline is months away. The teams scrambling to retrofit compliance into their AI stack are the ones who did not build with an observable SDK from the start.
## What's Next
You have seen how NeuroLink turns your AI calls from a black box into a fully observable system. Every call is traced. Every cost is captured. Every human oversight decision is audited.
Try it yourself:
```bash
npm install @juspay/neurolink
```
Then sign up for a free Langfuse account, drop in your keys, and run your first traced AI call in under 5 minutes.
Resources:
- NeuroLink GitHub Repository — source code, examples, and full documentation
- NeuroLink Discord — community, Q&A, and announcements
- Langfuse Docs — setting up your observability dashboard
- OpenTelemetry GenAI Semantic Conventions — the standard your traces follow
This is part of the NeuroLink AI Development series. Previous articles covered HITL safety systems, RAG pipelines, MCP in production, and multi-model workflow engines.