Muhammad Arslan

@hazeljs/agent 1.0.1: Production Hardening for Real Deployments

We are shipping @hazeljs/agent 1.0.1 — a patch release focused on operational durability, resilience consolidation, and production observability. If you run agents behind a load balancer, need human-in-the-loop tool approvals, or want circuit breakers and traces in production, this release is for you.

1.0.1 is backward compatible. No breaking API changes — only new optional configuration, factories, and exports.


Quick example

A minimal production bootstrap with Redis-backed state, durable approvals, and strict event handling:

import { HazelApp } from '@hazeljs/core';
import { Agent, Tool, AgentModule, AgentService } from '@hazeljs/agent';
import { createClient } from 'redis';

@Agent({ name: 'ops-agent', description: 'Operations assistant' })
class OpsAgent {
  @Tool({ description: 'Restart a service', requiresApproval: true })
  async restartService(input: { service: string }) {
    return { restarted: input.service, at: new Date().toISOString() };
  }
}

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

await AgentModule.forRootAsync({
  redis: { client: redis },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: true,
    enableCircuitBreaker: true,
    observabilityProvider: myObservabilityProvider, // optional
  },
});

const app = new HazelApp({ modules: [AgentModule] });
const agentService = app.get(AgentService);

agentService.on('agent.tool.approval.requested', (event) => {
  // Approve from any replica — request is stored in Redis
  agentService.approveToolExecution(event.data.requestId, 'admin');
});

await agentService.execute('ops-agent', 'Restart the payment worker');

Same agent code as 1.0.0 — only module wiring changes for production.


Why this release?

@hazeljs/agent 1.0.0 shipped a full agent runtime: execution loop, tools, memory/RAG, multi-agent graphs, A2A, streaming, and guardrails hooks. What it did not optimize for was multi-instance production:

| Area | 1.0.0 default | Production risk |
| --- | --- | --- |
| Execution state | In-memory | Lost on process restart |
| Tool approvals | In-memory Maps | Broken across replicas; lost on crash |
| Retry / rate limit | Local utilities | Drift from @hazeljs/resilience |
| Observability | Local metrics + events | No OTel spans or LLM cost bridge |
| RAG errors | Silently returned [] | Hard to debug in prod |

1.0.1 closes these gaps without changing how you define agents or tools.


Durable state and approvals

Environment-driven state backends

New factory helpers pick the right persistence backend from config or environment:

import {
  createStateManager,
  createStateManagerFromEnv,
  AgentModule,
} from '@hazeljs/agent';

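// Sync: use when you already have a connected Redis client
// (redisClient is assumed to be created and connected elsewhere)
const stateManager = createStateManager({
  backend: 'redis',
  redisClient,
});

// Async: connects using REDIS_URL
// (named differently so both examples can share one scope)
const envStateManager = await createStateManagerFromEnv({
  redisUrl: process.env.REDIS_URL,
});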

Environment variables:

| Variable | Values | Behavior |
| --- | --- | --- |
| AGENT_STATE_BACKEND | memory, redis, database | Explicit backend selection |
| REDIS_URL | Redis connection URL | Auto-selects Redis when set |
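
To make the selection rules in the table concrete, here is a minimal sketch of how a backend could be resolved from the environment. It is not the library's actual logic: `resolveStateBackend` is a hypothetical helper, and both the precedence (explicit `AGENT_STATE_BACKEND` first, then `REDIS_URL`) and the in-memory fallback are assumptions.

```typescript
// Hypothetical sketch; not the resolution logic shipped in @hazeljs/agent.
type StateBackend = 'memory' | 'redis' | 'database';

const KNOWN_BACKENDS: readonly StateBackend[] = ['memory', 'redis', 'database'];

function resolveStateBackend(env: Record<string, string | undefined>): StateBackend {
  const explicit = env['AGENT_STATE_BACKEND']?.trim().toLowerCase();
  if (explicit) {
    // An explicit value must be one of the documented backends.
    const match = KNOWN_BACKENDS.find((backend) => backend === explicit);
    if (!match) {
      throw new Error(`Unsupported AGENT_STATE_BACKEND: ${explicit}`);
    }
    return match;
  }
  // Assumption: REDIS_URL only decides the backend when nothing explicit is set.
  if (env['REDIS_URL']) return 'redis';
  // Assumption: fall back to in-memory state, the 1.0.0 default.
  return 'memory';
}
```

Passing the environment in as a plain object (for example `resolveStateBackend(process.env)`) keeps the function easy to exercise in tests without mutating global state.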

AgentModule production defaults

AgentModule.forRoot() and AgentModule.forRootAsync() wire Redis state when a client or URL is provided:

import { AgentModule } from '@hazeljs/agent';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

await AgentModule.forRootAsync({
  redis: { client: redisClient },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: process.env.NODE_ENV === 'production',
  },
});
  • forRoot() — use when you pass a pre-connected redis.client
  • forRootAsync() — connects via REDIS_URL or redis.url before boot

See PERSISTENCE.md for Redis, Prisma, and hybrid setups.

Pluggable approval store

Tool approvals no longer live only in process memory. A new IApprovalStore interface supports:

  • InMemoryApprovalStore — default for development and single-instance
  • RedisApprovalStore — durable requests with TTL; polling for cross-instance resolve

When AgentModule is configured with Redis, approvals are stored in Redis automatically so human-in-the-loop flows work across replicas.
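
For readers who want a feel for what a pluggable store with TTL involves, here is a rough in-memory sketch. The interface and class names (`ApprovalStoreSketch`, `InMemoryApprovalStoreSketch`) and every method signature are hypothetical; the real IApprovalStore may look quite different, and a Redis-backed version would be asynchronous.

```typescript
// Hypothetical shapes for illustration; the real IApprovalStore may differ.
type ApprovalStatus = 'pending' | 'approved' | 'rejected' | 'expired';

interface ApprovalRecord {
  requestId: string;
  toolName: string;
  status: ApprovalStatus;
  expiresAt: number; // epoch milliseconds
  decidedBy?: string;
}

interface ApprovalStoreSketch {
  create(requestId: string, toolName: string, ttlMs: number): ApprovalRecord;
  resolve(requestId: string, approved: boolean, decidedBy: string): boolean;
  get(requestId: string): ApprovalRecord | undefined;
}

class InMemoryApprovalStoreSketch implements ApprovalStoreSketch {
  private readonly records = new Map<string, ApprovalRecord>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without real timers.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  create(requestId: string, toolName: string, ttlMs: number): ApprovalRecord {
    const record: ApprovalRecord = {
      requestId,
      toolName,
      status: 'pending',
      expiresAt: this.now() + ttlMs,
    };
    this.records.set(requestId, record);
    return { ...record };
  }

  // Returns false when the request is unknown, expired, or already decided.
  resolve(requestId: string, approved: boolean, decidedBy: string): boolean {
    const record = this.get(requestId);
    if (!record || record.status !== 'pending') return false;
    this.records.set(requestId, {
      ...record,
      status: approved ? 'approved' : 'rejected',
      decidedBy,
    });
    return true;
  }

  get(requestId: string): ApprovalRecord | undefined {
    const record = this.records.get(requestId);
    if (!record) return undefined;
    // Pending requests past their TTL are reported (and stored) as expired.
    if (record.status === 'pending' && this.now() >= record.expiresAt) {
      const expired: ApprovalRecord = { ...record, status: 'expired' };
      this.records.set(requestId, expired);
      return { ...expired };
    }
    return { ...record };
  }
}
```

In a multi-replica setup, the replica waiting on a decision would poll `get` (or its Redis equivalent) until the status leaves `pending`, which is the cross-instance resolve pattern mentioned above.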

New exports:

import {
  IApprovalStore,
  InMemoryApprovalStore,
  RedisApprovalStore,
  createApprovalStore,
} from '@hazeljs/agent';

Resilience consolidation

Local retry and rate-limit utilities now delegate to @hazeljs/resilience:

  • RetryHandler → RetryPolicy
  • RateLimiter → TokenBucketLimiter

The public API of RetryHandler and RateLimiter is preserved. Both are marked @deprecated in favor of using @hazeljs/resilience directly in a future minor release. The deprecated circuit-breaker.js shim was removed.
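
If the token-bucket idea behind the limiter is unfamiliar, the sketch below shows the general technique. It is a generic illustration, not the TokenBucketLimiter from @hazeljs/resilience: the class name `TokenBucketSketch`, its constructor parameters, and `tryAcquire` are all made up for this example.

```typescript
// Generic token bucket; not the @hazeljs/resilience implementation.
class TokenBucketSketch {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(),
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now();
  }

  // Takes `cost` tokens if available; returns false when the caller should back off.
  tryAcquire(cost = 1): boolean {
    this.refill();
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }

  // Adds tokens in proportion to elapsed time, capped at the bucket capacity.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    if (elapsedSeconds <= 0) return;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;
  }
}
```

The capacity bounds how large a burst can be, while the refill rate sets the sustained throughput; tuning the two separately is the main reason to prefer a token bucket over a fixed window counter.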

Circuit breaker behavior is now validated end-to-end: repeated LLM failures through AgentRuntime.execute() open the circuit and subsequent calls fail fast with CircuitBreakerError.

Failed agent executions (AgentState.FAILED) now propagate as errors through the circuit breaker and retry layers instead of returning silently.
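
The open-then-fail-fast behavior described above can be pictured with a small sketch. This is a simplified, generic circuit breaker, not the one from @hazeljs/resilience: `CircuitBreakerSketch`, `CircuitOpenErrorSketch`, the failure threshold, and the reset timeout are assumptions chosen for illustration.

```typescript
// Simplified circuit breaker; names and thresholds are illustrative only.
class CircuitOpenErrorSketch extends Error {
  constructor() {
    super('Circuit is open; failing fast');
    this.name = 'CircuitOpenErrorSketch';
  }
}

class CircuitBreakerSketch {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetTimeoutMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Wraps an async call: rejects immediately while the circuit is open.
  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (!this.allowRequest()) throw new CircuitOpenErrorSketch();
    try {
      const result = await operation();
      this.recordSuccess();
      return result;
    } catch (error) {
      this.recordFailure();
      throw error;
    }
  }

  allowRequest(): boolean {
    if (this.openedAt === null) return true;
    // After the reset timeout, let trial requests through (half-open).
    return this.now() - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAt = this.now(); // open (or re-open) the circuit
    }
  }
}
```

A failed execution has to surface as a thrown error for this to work at all, which is why propagating AgentState.FAILED as an error matters: a silently returned failure would never increment the failure count.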


Observability

Optional peers were added for production tracing and cost tracking:

  • @hazeljs/observability (optional)
  • @opentelemetry/api (optional)

When you pass an observabilityProvider in runtime config, the agent emits OpenTelemetry spans:

| Span | When |
| --- | --- |
| agent.execute | Full agent run |
| agent.tool.execute | Tool invocation |
| agent.llm | LLM chat call |

Span attributes include agent.name, agent.execution_id, agent.tool.name, and session metadata. LLM usage is bridged to trackCost() when the provider is configured.

import { AgentRuntime } from '@hazeljs/agent';

const runtime = new AgentRuntime({
  observabilityProvider: myObservabilityProvider,
  llmProvider,
});

No hard dependency on OTel — spans are no-ops unless a provider is injected.
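
One common way to get "no-op unless injected" behavior is a shared do-nothing span, sketched below. The `ObservabilityProviderSketch` and `SpanSketch` interfaces and the `startSpan` and `traced` helpers are hypothetical stand-ins, not the @hazeljs/observability or OpenTelemetry APIs.

```typescript
// Hypothetical interfaces; not the @hazeljs/observability or OTel types.
type SpanAttributes = Record<string, string | number | boolean>;

interface SpanSketch {
  setAttribute(key: string, value: string | number | boolean): void;
  end(): void;
}

interface ObservabilityProviderSketch {
  startSpan(name: string, attributes: SpanAttributes): SpanSketch;
}

// A single shared span that ignores everything.
const NOOP_SPAN: SpanSketch = {
  setAttribute() {},
  end() {},
};

// Returns a real span only when a provider was injected.
function startSpan(
  provider: ObservabilityProviderSketch | undefined,
  name: string,
  attributes: SpanAttributes = {},
): SpanSketch {
  return provider ? provider.startSpan(name, attributes) : NOOP_SPAN;
}

// Runs an async operation inside a span and always ends the span.
async function traced<T>(
  provider: ObservabilityProviderSketch | undefined,
  name: string,
  attributes: SpanAttributes,
  operation: () => Promise<T>,
): Promise<T> {
  const span = startSpan(provider, name, attributes);
  try {
    return await operation();
  } catch (error) {
    span.setAttribute('error', true);
    throw error;
  } finally {
    span.end();
  }
}
```

Because call sites always receive an object with the same shape, the runtime code needs no `if (provider)` checks scattered around it.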


Error handling and production guardrails

RAG failures are visible

RAG search failures no longer silently return an empty context. The runtime now:

  • Logs the error with execution/session metadata
  • Emits agent.rag.failed (AgentEventType.RAG_QUERY_FAILED)
  • Records metrics when enabled
  • Continues execution with ragContext: [] (graceful degradation)
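
The steps above can be sketched roughly as follows. Only the event name agent.rag.failed comes from the release notes; the `RagDeps` interface, the helper names, and the event payload shape are assumptions, and metrics recording is left out for brevity.

```typescript
// Illustrative only; payload shape and helper names are assumptions.
interface RagContextMeta {
  executionId: string;
  sessionId: string;
}

interface RagFailureEvent extends RagContextMeta {
  type: 'agent.rag.failed';
  message: string;
}

interface RagDeps {
  search(query: string): Promise<string[]>;
  emit(event: RagFailureEvent): void;
  logError(message: string, meta: Record<string, string>): void;
}

// Logs and emits the failure, then degrades to an empty context.
function handleRagFailure(
  error: unknown,
  ctx: RagContextMeta,
  deps: Pick<RagDeps, 'emit' | 'logError'>,
): string[] {
  const message = error instanceof Error ? error.message : String(error);
  deps.logError('RAG search failed', {
    executionId: ctx.executionId,
    sessionId: ctx.sessionId,
    message,
  });
  deps.emit({
    type: 'agent.rag.failed',
    executionId: ctx.executionId,
    sessionId: ctx.sessionId,
    message,
  });
  return [];
}

// Execution continues even when retrieval fails.
async function retrieveContext(
  query: string,
  ctx: RagContextMeta,
  deps: RagDeps,
): Promise<string[]> {
  try {
    return await deps.search(query);
  } catch (error) {
    return handleRagFailure(error, ctx, deps);
  }
}
```

The difference from the 1.0.0 behavior is not the return value, which is still an empty array, but that the failure now leaves a log entry and an event that operators can alert on.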

Strict event handlers

AgentEventEmitter accepts strictEventHandlers: true. When enabled, errors in event handlers propagate instead of being swallowed — recommended for production.
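
To show what the flag changes in practice, here is a toy emitter that either swallows or rethrows handler errors. `EventEmitterSketch` is a hypothetical stand-in for AgentEventEmitter; only the option name strictEventHandlers is taken from the release notes.

```typescript
// Toy emitter for illustration; not the AgentEventEmitter implementation.
type Handler<T> = (payload: T) => void;

class EventEmitterSketch<T> {
  private readonly handlers = new Map<string, Handler<T>[]>();
  private readonly strict: boolean;
  // Errors swallowed in lenient mode, kept here so the sketch can be inspected.
  readonly swallowed: unknown[] = [];

  constructor(options: { strictEventHandlers?: boolean } = {}) {
    this.strict = options.strictEventHandlers ?? false;
  }

  on(event: string, handler: Handler<T>): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: T): void {
    for (const handler of this.handlers.get(event) ?? []) {
      try {
        handler(payload);
      } catch (error) {
        // Strict mode surfaces the bug to the caller; lenient mode hides it.
        if (this.strict) throw error;
        this.swallowed.push(error);
      }
    }
  }
}
```

In lenient mode a buggy approval handler would fail without anyone noticing, which is a plausible reason to turn strict mode on wherever approvals gate real actions.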

LLM provider bootstrap

If @hazeljs/ai never registers __HAZELJS_AI_ENHANCED_SERVICE__, AgentService now logs a clear error after 500ms instead of failing silently. Set runtime.llmProvider explicitly or ensure the AI module loads first.

Typed infrastructure clients

State managers use minimal interfaces instead of any:

  • RedisClientLike — Redis state and approval stores
  • PrismaClientLike → DatabaseStateManager

Safer to wire real clients without losing type checking at the boundary.


Testing

474 tests pass with coverage thresholds enforced. New integration coverage includes:

  • Redis state persistence
  • Approval flow via RedisApprovalStore
  • Circuit breaker opening under repeated LLM failures
  • RAG failure events

Test locations:

  • tests/integration/production-hardening.test.ts
  • tests/integration/hardening-coverage.test.ts

Jest uses tsconfig.jest.json for monorepo-friendly typechecking; optional @hazeljs/eval peer is stubbed during tests.


Upgrade from 1.0.0

npm install @hazeljs/agent@1.0.1

No code changes required for existing apps. To adopt production features incrementally:

  1. Redis state — add redis: { client } or await AgentModule.forRootAsync({ redis: { url } })
  2. Durable approvals — set useRedisApprovals: true with a Redis client
  3. Observability — install @hazeljs/observability and pass observabilityProvider
  4. Strict events — set runtime.strictEventHandlers: true in production
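
As one way to adopt steps 1, 2, and 4 together, the options object could be derived from the environment. The sketch below uses only option keys that appear in this post (redis.url, useRedisApprovals, runtime.strictEventHandlers); `buildAgentModuleOptions` and the `AgentModuleOptionsSketch` type are hypothetical and not part of @hazeljs/agent.

```typescript
// Hypothetical helper; the option keys mirror the examples in this post.
interface AgentModuleOptionsSketch {
  redis?: { url: string };
  useRedisApprovals?: boolean;
  runtime: { strictEventHandlers: boolean };
}

function buildAgentModuleOptions(
  env: Record<string, string | undefined>,
): AgentModuleOptionsSketch {
  const options: AgentModuleOptionsSketch = {
    // Step 4: strict handlers only in production.
    runtime: { strictEventHandlers: env['NODE_ENV'] === 'production' },
  };
  const redisUrl = env['REDIS_URL'];
  if (redisUrl) {
    // Steps 1 and 2: Redis state plus durable approvals when a URL is available.
    options.redis = { url: redisUrl };
    options.useRedisApprovals = true;
  }
  return options;
}
```

The result is intended to be passed to AgentModule.forRootAsync(), for example with `buildAgentModuleOptions(process.env)`. Without REDIS_URL the options stay at the in-memory behavior, so local development is unaffected.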

What's next (1.x roadmap)

Not in 1.0.1, planned for future minors:

  • Durable A2A task store (same Redis pattern as approvals)
  • Full @hazeljs/flow integration for long-running workflows
  • Typed DI token replacing global __HAZELJS_AI_ENHANCED_SERVICE__

Questions or feedback? Open an issue or join the discussion on GitHub.
