Muhammad Arslan

Posted on Jun 14

@hazeljs/agent 1.0.1: Production Hardening for Real Deployments

#ai #agents #typescript #devops

We are shipping @hazeljs/agent 1.0.1 — a patch release focused on operational durability, resilience consolidation, and production observability. If you run agents behind a load balancer, need human-in-the-loop tool approvals, or want circuit breakers and traces in production, this release is for you.

1.0.1 is backward compatible. No breaking API changes — only new optional configuration, factories, and exports.

Quick example

A minimal production bootstrap with Redis-backed state, durable approvals, and strict event handling:

import { HazelApp } from '@hazeljs/core';
import { Agent, Tool, AgentModule, AgentService } from '@hazeljs/agent';
import { createClient } from 'redis';

@Agent({ name: 'ops-agent', description: 'Operations assistant' })
class OpsAgent {
  @Tool({ description: 'Restart a service', requiresApproval: true })
  async restartService(input: { service: string }) {
    return { restarted: input.service, at: new Date().toISOString() };
  }
}

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

await AgentModule.forRootAsync({
  redis: { client: redis },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: true,
    enableCircuitBreaker: true,
    observabilityProvider: myObservabilityProvider, // optional
  },
});

const app = new HazelApp({ modules: [AgentModule] });
const agentService = app.get(AgentService);

agentService.on('agent.tool.approval.requested', (event) => {
  // Approve from any replica — request is stored in Redis
  agentService.approveToolExecution(event.data.requestId, 'admin');
});

await agentService.execute('ops-agent', 'Restart the payment worker');

Same agent code as 1.0.0 — only module wiring changes for production.

Why this release?

@hazeljs/agent 1.0.0 shipped a full agent runtime: execution loop, tools, memory/RAG, multi-agent graphs, A2A, streaming, and guardrails hooks. What it did not optimize for was multi-instance production:

| Area | 1.0.0 default | Production risk |
| --- | --- | --- |
| Execution state | In-memory | Lost on process restart |
| Tool approvals | In-memory `Map`s | Broken across replicas; lost on crash |
| Retry / rate limit | Local utilities | Drift from `@hazeljs/resilience` |
| Observability | Local metrics + events | No OTel spans or LLM cost bridge |
| RAG errors | Silently returned `[]` | Hard to debug in prod |

1.0.1 closes these gaps without changing how you define agents or tools.

Durable state and approvals

Environment-driven state backends

New factory helpers pick the right persistence backend from config or environment:

import {
  createStateManager,
  createStateManagerFromEnv,
  AgentModule,
} from '@hazeljs/agent';

// Sync: use when you already hold a connected Redis client
// (`redisClient` is assumed to be created and connected elsewhere)
const stateManager = createStateManager({
  backend: 'redis',
  redisClient,
});

// Async alternative: connects using REDIS_URL
const envStateManager = await createStateManagerFromEnv({
  redisUrl: process.env.REDIS_URL,
});

Environment variables:

| Variable | Values | Behavior |
| --- | --- | --- |
| `AGENT_STATE_BACKEND` | `memory`, `redis`, `database` | Explicit backend selection |
| `REDIS_URL` | Redis connection URL | Auto-selects Redis when set |
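
The table does not say which variable wins when both are set. One plausible resolution order is sketched below with a hypothetical `resolveStateBackend` helper, which is not part of `@hazeljs/agent`. It assumes an explicit `AGENT_STATE_BACKEND` is authoritative and that `REDIS_URL` only acts as a fallback hint:

```typescript
type StateBackend = 'memory' | 'redis' | 'database';

const BACKENDS: readonly StateBackend[] = ['memory', 'redis', 'database'];

// Hypothetical helper: the precedence shown here is an assumption,
// not the documented behavior of createStateManagerFromEnv.
function resolveStateBackend(
  env: Record<string, string | undefined>,
): StateBackend {
  const explicit = env.AGENT_STATE_BACKEND;
  if (explicit !== undefined && explicit !== '') {
    if (!BACKENDS.includes(explicit as StateBackend)) {
      throw new Error(`Unknown AGENT_STATE_BACKEND: ${explicit}`);
    }
    return explicit as StateBackend;
  }
  // No explicit choice: a Redis URL implies the Redis backend.
  return env.REDIS_URL ? 'redis' : 'memory';
}
```

Rejecting unknown values up front means a typo in a deployment manifest fails at boot instead of silently falling back to in-memory state.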

AgentModule production defaults

AgentModule.forRoot() and AgentModule.forRootAsync() wire Redis state when a client or URL is provided:

import { AgentModule } from '@hazeljs/agent';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

await AgentModule.forRootAsync({
  redis: { client: redisClient },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: process.env.NODE_ENV === 'production',
  },
});

forRoot() — use when you pass a pre-connected redis.client
forRootAsync() — connects via REDIS_URL or redis.url before boot

See PERSISTENCE.md for Redis, Prisma, and hybrid setups.

Pluggable approval store

Tool approvals no longer live only in process memory. A new IApprovalStore interface supports:

InMemoryApprovalStore — default for development and single-instance
RedisApprovalStore — durable requests with TTL; polling for cross-instance resolve

When AgentModule is configured with Redis, approvals are stored in Redis automatically so human-in-the-loop flows work across replicas.

New exports:

import {
  IApprovalStore,
  InMemoryApprovalStore,
  RedisApprovalStore,
  createApprovalStore,
} from '@hazeljs/agent';
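
To make the idea of a pluggable store concrete, here is a rough in-memory sketch of what such a contract could look like. The interface and class names (`ApprovalStoreSketch`, `MemoryApprovalStoreSketch`) and every method signature are assumptions for illustration; the real `IApprovalStore` may be shaped differently. A Redis-backed variant would keep the same contract but store each record under a key with a TTL.

```typescript
type ApprovalStatus = 'pending' | 'approved' | 'rejected' | 'expired';

interface ApprovalRecord {
  requestId: string;
  status: ApprovalStatus;
  expiresAt: number; // epoch milliseconds
  decidedBy?: string;
}

// Hypothetical contract; the real IApprovalStore may differ.
interface ApprovalStoreSketch {
  create(requestId: string, ttlMs: number): Promise<void>;
  resolve(requestId: string, approved: boolean, by: string): Promise<boolean>;
  get(requestId: string): Promise<ApprovalRecord | undefined>;
}

class MemoryApprovalStoreSketch implements ApprovalStoreSketch {
  private readonly records = new Map<string, ApprovalRecord>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  async create(requestId: string, ttlMs: number): Promise<void> {
    this.records.set(requestId, {
      requestId,
      status: 'pending',
      expiresAt: this.now() + ttlMs,
    });
  }

  async resolve(requestId: string, approved: boolean, by: string): Promise<boolean> {
    const record = await this.get(requestId);
    // Only a pending, unexpired request can be decided.
    if (!record || record.status !== 'pending') return false;
    record.status = approved ? 'approved' : 'rejected';
    record.decidedBy = by;
    return true;
  }

  async get(requestId: string): Promise<ApprovalRecord | undefined> {
    const record = this.records.get(requestId);
    // Lazily mark requests whose TTL has passed.
    if (record && record.status === 'pending' && this.now() >= record.expiresAt) {
      record.status = 'expired';
    }
    return record;
  }
}
```

In this sketch the replica that is waiting on an approval would poll `get()` until the status leaves `pending`, which mirrors the polling approach described for cross-instance resolution.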

Resilience consolidation

Local retry and rate-limit utilities now delegate to @hazeljs/resilience:

RetryHandler → RetryPolicy
RateLimiter → TokenBucketLimiter

The public API of RetryHandler and RateLimiter is preserved. Both are now marked @deprecated, and a future minor will steer callers to @hazeljs/resilience directly. The deprecated circuit-breaker.js shim was removed.
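
For readers unfamiliar with the token bucket technique behind the new limiter, the following is a minimal illustrative sketch. It is not the `TokenBucketLimiter` from `@hazeljs/resilience`; the class name, constructor parameters, and injectable clock are assumptions made so the example is self-contained and testable.

```typescript
// Illustrative token bucket; not the @hazeljs/resilience implementation.
class TokenBucketSketch {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerSecond: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Consumes one token and returns true, or returns false when the bucket is empty.
  tryAcquire(): boolean {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Adds tokens in proportion to elapsed time, capped at capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;
  }
}
```

The bucket allows short bursts up to `capacity` while holding the long-run rate to `refillPerSecond`, which is why it suits LLM and tool call limits better than a fixed window.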

Circuit breaker behavior is now validated end-to-end: repeated LLM failures through AgentRuntime.execute() open the circuit and subsequent calls fail fast with CircuitBreakerError.

Failed agent executions (AgentState.FAILED) now propagate as errors through the circuit breaker and retry layers instead of returning silently.
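
The fail-fast behavior described above can be sketched with a small consecutive-failure breaker. This is an illustration only, not the `@hazeljs/resilience` implementation: `CircuitBreakerSketch`, `CircuitOpenError`, the threshold, and the cool-down parameter are all hypothetical names and choices. It also shows why failed executions need to surface as thrown errors, since a breaker can only count failures it actually sees.

```typescript
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit is open; failing fast');
    this.name = 'CircuitOpenError';
  }
}

// Illustrative breaker; names and parameters are assumptions.
class CircuitBreakerSketch {
  private consecutiveFailures = 0;
  private openedAt: number | undefined;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetAfterMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  async run<T>(operation: () => Promise<T>): Promise<T> {
    if (this.openedAt !== undefined) {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        throw new CircuitOpenError(); // skip the call entirely
      }
      // Cool-down elapsed: allow one trial call (half-open).
      this.openedAt = undefined;
      this.consecutiveFailures = this.failureThreshold - 1;
    }
    try {
      const result = await operation();
      this.consecutiveFailures = 0;
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw error; // the caller still sees the original failure
    }
  }
}
```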

Observability

Optional peers were added for production tracing and cost tracking:

@hazeljs/observability (optional)
@opentelemetry/api (optional)

When you pass an observabilityProvider in runtime config, the agent emits OpenTelemetry spans:

| Span | When |
| --- | --- |
| `agent.execute` | Full agent run |
| `agent.tool.execute` | Tool invocation |
| `agent.llm` | LLM chat call |

Span attributes include agent.name, agent.execution_id, agent.tool.name, and session metadata. LLM usage is bridged to trackCost() when the provider is configured.

import { AgentRuntime } from '@hazeljs/agent';

const runtime = new AgentRuntime({
  observabilityProvider: myObservabilityProvider,
  llmProvider,
});

No hard dependency on OTel — spans are no-ops unless a provider is injected.
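
One common way to get "no-op unless injected" behavior is to route every span through a helper that substitutes a do-nothing span when no provider is present. The sketch below illustrates that pattern under assumed names: `SpanSketch`, `TracingProviderSketch`, and `withSpan` are hypothetical and do not describe the actual provider interface in `@hazeljs/observability`.

```typescript
interface SpanSketch {
  setAttribute(key: string, value: string | number | boolean): void;
  end(): void;
}

// Hypothetical provider shape; the real observabilityProvider may differ.
interface TracingProviderSketch {
  startSpan(name: string): SpanSketch;
}

// Stand-in span used when no provider is configured.
const NOOP_SPAN: SpanSketch = {
  setAttribute() {},
  end() {},
};

async function withSpan<T>(
  provider: TracingProviderSketch | undefined,
  name: string,
  attributes: Record<string, string>,
  work: () => Promise<T>,
): Promise<T> {
  const span = provider ? provider.startSpan(name) : NOOP_SPAN;
  for (const [key, value] of Object.entries(attributes)) {
    span.setAttribute(key, value);
  }
  try {
    return await work();
  } finally {
    span.end(); // always close the span, even when work() throws
  }
}
```

Because the call sites look the same with or without a provider, instrumentation can stay in the hot path without an optional dependency check at every step.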

Error handling and production guardrails

RAG failures are visible

RAG search failures no longer silently return an empty context. The runtime now:

Logs the error with execution/session metadata
Emits agent.rag.failed (AgentEventType.RAG_QUERY_FAILED)
Records metrics when enabled
Continues execution with ragContext: [] (graceful degradation)
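
The four steps above amount to a catch, report, and degrade wrapper around the search call. Here is a minimal sketch of that flow; `retrieveContextSketch`, the listener signature, and the event payload fields are assumptions for illustration, and only the `agent.rag.failed` event name comes from the release notes.

```typescript
interface RagFailedEvent {
  type: 'agent.rag.failed';
  executionId: string;
  error: Error;
}

// Hypothetical wrapper showing log, emit, and fall back to an empty context.
async function retrieveContextSketch(
  search: (query: string) => Promise<string[]>,
  query: string,
  executionId: string,
  onFailure: (event: RagFailedEvent) => void,
): Promise<string[]> {
  try {
    return await search(query);
  } catch (cause) {
    const error = cause instanceof Error ? cause : new Error(String(cause));
    // Make the failure visible with enough metadata to trace it.
    console.error(`[rag] search failed (execution ${executionId}): ${error.message}`);
    onFailure({ type: 'agent.rag.failed', executionId, error });
    return []; // degrade gracefully: the agent continues without RAG context
  }
}
```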

Strict event handlers

AgentEventEmitter accepts strictEventHandlers: true. When enabled, errors in event handlers propagate instead of being swallowed — recommended for production.
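
The difference between the two modes is easiest to see in a toy emitter. This sketch is not `AgentEventEmitter`; the `EmitterSketch` class is hypothetical and only borrows the `strictEventHandlers` option name to show swallow-versus-propagate behavior.

```typescript
type Handler<T> = (payload: T) => void;

// Toy emitter illustrating strict versus lenient handler errors.
class EmitterSketch<T> {
  private readonly handlers: Handler<T>[] = [];
  private readonly strict: boolean;

  constructor(options: { strictEventHandlers?: boolean } = {}) {
    this.strict = options.strictEventHandlers ?? false;
  }

  on(handler: Handler<T>): void {
    this.handlers.push(handler);
  }

  emit(payload: T): void {
    for (const handler of this.handlers) {
      try {
        handler(payload);
      } catch (error) {
        if (this.strict) throw error; // strict: surface the bug to the caller
        console.error('event handler failed:', error); // lenient: log and continue
      }
    }
  }
}
```

In lenient mode a broken approval handler would leave requests hanging with only a log line as evidence, which is the kind of failure strict mode is meant to expose early.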

LLM provider bootstrap

If @hazeljs/ai never registers __HAZELJS_AI_ENHANCED_SERVICE__, AgentService now logs a clear error after 500ms instead of failing silently. Set runtime.llmProvider explicitly or ensure the AI module loads first.
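
A bounded wait of this kind can be sketched as a poll with a deadline. The helper below is hypothetical (`waitForProvider`, its parameters, and the polling interval are assumptions), and it rejects on timeout where the release notes describe AgentService as logging an error, so treat it as an illustration of the timing logic only.

```typescript
// Hypothetical helper: polls a lookup until it yields a value or the deadline passes.
async function waitForProvider<T>(
  lookup: () => T | undefined,
  timeoutMs = 500,
  pollMs = 25,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const provider = lookup();
    if (provider !== undefined) return provider;
    if (Date.now() >= deadline) {
      throw new Error(
        `No LLM provider registered within ${timeoutMs}ms; ` +
          'set runtime.llmProvider or load the AI module first',
      );
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```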

Typed infrastructure clients

State managers use minimal interfaces instead of any:

RedisClientLike — Redis state and approval stores
PrismaClientLike — DatabaseStateManager

Safer to wire real clients without losing type checking at the boundary.
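
The benefit of a minimal structural interface is that both a real client and a test double satisfy it without casts. The sketch below uses an assumed shape: `RedisClientLikeSketch`, its three methods, and the `agent:state:` key prefix are illustrative guesses, not the actual `RedisClientLike` definition.

```typescript
// Hypothetical minimal shape; the real RedisClientLike may declare other methods.
interface RedisClientLikeSketch {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
  del(key: string): Promise<unknown>;
}

// In-memory double that satisfies the same structural contract.
class FakeRedis implements RedisClientLikeSketch {
  private readonly data = new Map<string, string>();

  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }

  async set(key: string, value: string): Promise<unknown> {
    this.data.set(key, value);
    return 'OK';
  }

  async del(key: string): Promise<unknown> {
    return this.data.delete(key) ? 1 : 0;
  }
}

// State helpers depend only on the narrow interface, never on `any`.
async function saveState(
  client: RedisClientLikeSketch,
  executionId: string,
  state: Record<string, unknown>,
): Promise<void> {
  await client.set(`agent:state:${executionId}`, JSON.stringify(state));
}

async function loadState(
  client: RedisClientLikeSketch,
  executionId: string,
): Promise<Record<string, unknown> | undefined> {
  const raw = await client.get(`agent:state:${executionId}`);
  return raw === null ? undefined : (JSON.parse(raw) as Record<string, unknown>);
}
```

With this shape, passing an object that lacks `get` or returns the wrong type fails at compile time instead of surfacing as a runtime error inside the state manager.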

Testing

474 tests pass with coverage thresholds enforced. New integration coverage includes:

Redis state persistence
Approval flow via RedisApprovalStore
Circuit breaker opening under repeated LLM failures
RAG failure events

Test locations:

tests/integration/production-hardening.test.ts
tests/integration/hardening-coverage.test.ts

Jest uses tsconfig.jest.json for monorepo-friendly typechecking; optional @hazeljs/eval peer is stubbed during tests.

Upgrade from 1.0.0

npm install @hazeljs/agent@1.0.1

No code changes required for existing apps. To adopt production features incrementally:

Redis state — add redis: { client } or await AgentModule.forRootAsync({ redis: { url } })
Durable approvals — set useRedisApprovals: true with a Redis client
Observability — install @hazeljs/observability and pass observabilityProvider
Strict events — set runtime.strictEventHandlers: true in production

What's next (1.x roadmap)

Not in 1.0.1, planned for future minors:

Durable A2A task store (same Redis pattern as approvals)
Full @hazeljs/flow integration for long-running workflows
Typed DI token replacing global __HAZELJS_AI_ENHANCED_SERVICE__

Links

Questions or feedback? Open an issue or join the discussion on GitHub.
