We are shipping @hazeljs/agent 1.0.1 — a patch release focused on operational durability, resilience consolidation, and production observability. If you run agents behind a load balancer, need human-in-the-loop tool approvals, or want circuit breakers and traces in production, this release is for you.
1.0.1 is backward compatible. No breaking API changes — only new optional configuration, factories, and exports.
Quick example
A minimal production bootstrap with Redis-backed state, durable approvals, and strict event handling:
import { HazelApp } from '@hazeljs/core';
import { Agent, Tool, AgentModule, AgentService } from '@hazeljs/agent';
import { createClient } from 'redis';
@Agent({ name: 'ops-agent', description: 'Operations assistant' })
class OpsAgent {
  @Tool({ description: 'Restart a service', requiresApproval: true })
  async restartService(input: { service: string }) {
    return { restarted: input.service, at: new Date().toISOString() };
  }
}
const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();
await AgentModule.forRootAsync({
  redis: { client: redis },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: true,
    enableCircuitBreaker: true,
    observabilityProvider: myObservabilityProvider, // optional
  },
});
const app = new HazelApp({ modules: [AgentModule] });
const agentService = app.get(AgentService);
agentService.on('agent.tool.approval.requested', (event) => {
  // Approve from any replica: the request is stored in Redis
  agentService.approveToolExecution(event.data.requestId, 'admin');
});
await agentService.execute('ops-agent', 'Restart the payment worker');
Same agent code as 1.0.0 — only module wiring changes for production.
Why this release?
@hazeljs/agent 1.0.0 shipped a full agent runtime: execution loop, tools, memory/RAG, multi-agent graphs, A2A, streaming, and guardrails hooks. What it did not optimize for was multi-instance production:
| Area | 1.0.0 default | Production risk |
|---|---|---|
| Execution state | In-memory | Lost on process restart |
| Tool approvals | In-memory Maps | Broken across replicas; lost on crash |
| Retry / rate limit | Local utilities | Drift from @hazeljs/resilience |
| Observability | Local metrics + events | No OTel spans or LLM cost bridge |
| RAG errors | Silently returned [] | Hard to debug in prod |
1.0.1 closes these gaps without changing how you define agents or tools.
Durable state and approvals
Environment-driven state backends
New factory helpers pick the right persistence backend from config or environment:
import {
  createStateManager,
  createStateManagerFromEnv,
  AgentModule,
} from '@hazeljs/agent';
// Sync: when you already have a connected Redis client (redisClient)
const stateManager = createStateManager({
  backend: 'redis',
  redisClient,
});
// Async: connects from REDIS_URL (named differently so both calls can share a file)
const envStateManager = await createStateManagerFromEnv({
  redisUrl: process.env.REDIS_URL,
});
Environment variables:
| Variable | Values | Behavior |
|---|---|---|
| AGENT_STATE_BACKEND | memory, redis, database | Explicit backend selection |
| REDIS_URL | Redis connection URL | Auto-selects Redis when set |
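The selection rules in the table can be pictured as a small pure function. This is a sketch, not the package's implementation: resolveBackend is a hypothetical name, and both the precedence (explicit AGENT_STATE_BACKEND first, then REDIS_URL) and the in-memory fallback are assumptions that the release notes do not spell out.

```typescript
type StateBackend = 'memory' | 'redis' | 'database';

const BACKENDS: readonly StateBackend[] = ['memory', 'redis', 'database'];

// Hypothetical helper that mirrors the table above; not the library's real code.
function resolveBackend(env: Record<string, string | undefined>): StateBackend {
  const raw = env['AGENT_STATE_BACKEND'];
  const explicit = raw === undefined ? '' : raw.trim().toLowerCase();
  if (explicit !== '') {
    const match = BACKENDS.find((backend) => backend === explicit);
    if (match === undefined) {
      throw new Error(`Unsupported AGENT_STATE_BACKEND: ${explicit}`);
    }
    return match; // explicit selection wins (assumed precedence)
  }
  // No explicit choice: a Redis URL implies the Redis backend.
  if (env['REDIS_URL']) return 'redis';
  // Assumed default when nothing is configured.
  return 'memory';
}
```

Passing the environment in as an argument (for example process.env in Node) keeps the rule easy to unit test.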
AgentModule production defaults
AgentModule.forRoot() and AgentModule.forRootAsync() wire Redis state when a client or URL is provided:
import { AgentModule } from '@hazeljs/agent';
import { createClient } from 'redis';
const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();
await AgentModule.forRootAsync({
  redis: { client: redisClient },
  useRedisApprovals: true,
  runtime: {
    strictEventHandlers: process.env.NODE_ENV === 'production',
  },
});
- forRoot(): use when you pass a pre-connected redis.client
- forRootAsync(): connects via REDIS_URL or redis.url before boot
See PERSISTENCE.md for Redis, Prisma, and hybrid setups.
Pluggable approval store
Tool approvals no longer live only in process memory. A new IApprovalStore interface supports:
- InMemoryApprovalStore: default for development and single-instance deployments
- RedisApprovalStore: durable requests with TTL; polling for cross-instance resolution
When AgentModule is configured with Redis, approvals are stored in Redis automatically so human-in-the-loop flows work across replicas.
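To make the idea of a pluggable approval store concrete, here is a minimal in-memory sketch with TTL-based expiry. Every name in it (ApprovalStoreSketch, ApprovalRecord, the method signatures) is hypothetical; the real IApprovalStore contract may differ, and a Redis-backed store would typically be asynchronous.

```typescript
type ApprovalStatus = 'pending' | 'approved' | 'rejected';

interface ApprovalRecord {
  requestId: string;
  toolName: string;
  status: ApprovalStatus;
  resolvedBy?: string;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical store shape; the real IApprovalStore may look different.
interface ApprovalStoreSketch {
  create(requestId: string, toolName: string, ttlMs: number): ApprovalRecord;
  get(requestId: string): ApprovalRecord | undefined;
  resolve(requestId: string, status: 'approved' | 'rejected', by: string): boolean;
}

class MemoryApprovalStoreSketch implements ApprovalStoreSketch {
  private readonly records = new Map<string, ApprovalRecord>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  create(requestId: string, toolName: string, ttlMs: number): ApprovalRecord {
    const record: ApprovalRecord = {
      requestId,
      toolName,
      status: 'pending',
      expiresAt: this.now() + ttlMs,
    };
    this.records.set(requestId, record);
    return record;
  }

  get(requestId: string): ApprovalRecord | undefined {
    const record = this.records.get(requestId);
    if (record === undefined) return undefined;
    if (record.expiresAt <= this.now()) {
      this.records.delete(requestId); // expired requests disappear, as with a TTL
      return undefined;
    }
    return record;
  }

  resolve(requestId: string, status: 'approved' | 'rejected', by: string): boolean {
    const record = this.get(requestId);
    // Only a live, still-pending request can be resolved.
    if (record === undefined || record.status !== 'pending') return false;
    record.status = status;
    record.resolvedBy = by;
    return true;
  }
}
```

A shared-store variant would keep the same shape but write each record to a key with an expiry, so a replica that polls get() sees a resolution made on another instance.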
New exports:
import {
  IApprovalStore,
  InMemoryApprovalStore,
  RedisApprovalStore,
  createApprovalStore,
} from '@hazeljs/agent';
Resilience consolidation
Local retry and rate-limit utilities now delegate to @hazeljs/resilience:
- RetryHandler → RetryPolicy
- RateLimiter → TokenBucketLimiter
The public API of RetryHandler and RateLimiter is preserved; both are marked @deprecated, with direct use of @hazeljs/resilience intended to replace them in a future minor. The deprecated circuit-breaker.js shim was removed.
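For readers unfamiliar with the token-bucket idea that TokenBucketLimiter is named after, the sketch below shows the general technique. It is not the @hazeljs/resilience implementation: TokenBucketSketch, its constructor arguments, and the injectable clock are illustrative assumptions.

```typescript
// Illustrative token bucket; not the @hazeljs/resilience TokenBucketLimiter.
class TokenBucketSketch {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private readonly now: () => number;

  constructor(
    capacity: number,
    refillPerSecond: number,
    now: () => number = () => Date.now(),
  ) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now();
  }

  // Returns true and consumes one token if a call is allowed right now.
  tryAcquire(): boolean {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill in proportion to elapsed time, never above capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = current;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}
```

The bucket permits bursts up to its capacity and then settles at the refill rate, which is why it suits rate-limited LLM and tool calls.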
Circuit breaker behavior is now validated end-to-end: repeated LLM failures through AgentRuntime.execute() open the circuit and subsequent calls fail fast with CircuitBreakerError.
Failed agent executions (AgentState.FAILED) now propagate as errors through the circuit breaker and retry layers instead of returning silently.
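The open-then-fail-fast behavior described above follows the standard circuit breaker pattern, sketched here in a few lines. The class names, the threshold and cool-down parameters, and the error shape are all assumptions for illustration (the release notes name CircuitBreakerError but not its fields), and the wrapped call is synchronous only to keep the example short.

```typescript
// Hypothetical error type standing in for the library's CircuitBreakerError.
class CircuitOpenError extends Error {
  constructor() {
    super('Circuit is open; call rejected without invoking the target');
    this.name = 'CircuitOpenError';
  }
}

class CircuitBreakerSketch {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly resetAfterMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold: number,
    resetAfterMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.resetAfterMs = resetAfterMs;
    this.now = now;
  }

  call<T>(fn: () => T): T {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        throw new CircuitOpenError(); // fail fast while the circuit is open
      }
      // Cool-down elapsed: let one trial call through (half-open).
      this.openedAt = null;
    }
    try {
      const result = fn();
      this.consecutiveFailures = 0; // any success closes the circuit
      return result;
    } catch (error) {
      this.consecutiveFailures += 1;
      if (this.consecutiveFailures >= this.failureThreshold) {
        this.openedAt = this.now(); // open (or re-open) the circuit
      }
      throw error;
    }
  }
}
```

Because failures must surface as thrown errors for the breaker to count them, propagating AgentState.FAILED as an error is what lets this kind of layer work.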
Observability
Optional peers were added for production tracing and cost tracking:
- @hazeljs/observability (optional)
- @opentelemetry/api (optional)
When you pass an observabilityProvider in runtime config, the agent emits OpenTelemetry spans:
| Span | When |
|---|---|
| agent.execute | Full agent run |
| agent.tool.execute | Tool invocation |
| agent.llm | LLM chat call |
Span attributes include agent.name, agent.execution_id, agent.tool.name, and session metadata. LLM usage is bridged to trackCost() when the provider is configured.
import { AgentRuntime } from '@hazeljs/agent';
const runtime = new AgentRuntime({
  observabilityProvider: myObservabilityProvider,
  llmProvider,
});
No hard dependency on OTel — spans are no-ops unless a provider is injected.
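One common way to get "no-ops unless a provider is injected" is to fall back to a do-nothing tracer. The sketch below illustrates that pattern only; SpanSketch, TracerSketch, and withSpan are hypothetical names, not the @hazeljs/observability or OpenTelemetry API.

```typescript
// Hypothetical minimal span and tracer shapes, for illustration only.
interface SpanSketch {
  setAttribute(key: string, value: string): void;
  end(): void;
}

interface TracerSketch {
  startSpan(name: string): SpanSketch;
}

// Fallbacks used when no provider is supplied: every operation does nothing.
const NOOP_SPAN: SpanSketch = { setAttribute() {}, end() {} };
const NOOP_TRACER: TracerSketch = { startSpan: () => NOOP_SPAN };

function withSpan<T>(
  tracer: TracerSketch | undefined,
  name: string,
  attributes: Record<string, string>,
  fn: () => T,
): T {
  const span = (tracer ?? NOOP_TRACER).startSpan(name);
  for (const key of Object.keys(attributes)) {
    const value = attributes[key];
    if (value !== undefined) span.setAttribute(key, value);
  }
  try {
    return fn();
  } finally {
    span.end(); // always close the span, even if fn throws
  }
}
```

Calling code stays identical whether or not tracing is configured, for example withSpan(tracer, 'agent.tool.execute', { 'agent.name': 'ops-agent' }, runTool), where runTool is whatever function performs the work.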
Error handling and production guardrails
RAG failures are visible
RAG search failures no longer silently return an empty context. The runtime now:
- Logs the error with execution/session metadata
- Emits agent.rag.failed (AgentEventType.RAG_QUERY_FAILED)
- Records metrics when enabled
- Continues execution with ragContext: [] (graceful degradation)
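The steps above amount to "report the failure, then degrade gracefully". A minimal sketch of that shape follows; searchWithFallback, the event fields other than the agent.rag.failed name, and the synchronous search signature are assumptions made to keep the example short.

```typescript
interface RagFailureEvent {
  type: 'agent.rag.failed';
  executionId: string;
  message: string;
}

// Hypothetical wrapper: surfaces the failure, then continues with empty context.
function searchWithFallback(
  search: (query: string) => string[],
  query: string,
  executionId: string,
  emit: (event: RagFailureEvent) => void,
): string[] {
  try {
    return search(query);
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    // Make the failure observable instead of hiding it behind an empty result.
    emit({ type: 'agent.rag.failed', executionId, message });
    return []; // graceful degradation: the run proceeds without RAG context
  }
}
```

The caller still receives an empty array, as in 1.0.0, but the event gives operators something to alert on.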
Strict event handlers
AgentEventEmitter accepts strictEventHandlers: true. When enabled, errors in event handlers propagate instead of being swallowed — recommended for production.
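The difference between the two modes is easiest to see in a toy emitter. This is a sketch of the general idea, not AgentEventEmitter itself: EmitterSketch and its swallowed list are hypothetical, and only the strictEventHandlers option name comes from the release notes.

```typescript
type Handler<E> = (event: E) => void;

class EmitterSketch<E> {
  private readonly handlers: Handler<E>[] = [];
  private readonly strict: boolean;
  // Errors caught in lenient mode, kept here so the sketch can expose them.
  readonly swallowed: unknown[] = [];

  constructor(options: { strictEventHandlers?: boolean } = {}) {
    this.strict = options.strictEventHandlers === true;
  }

  on(handler: Handler<E>): void {
    this.handlers.push(handler);
  }

  emit(event: E): void {
    for (const handler of this.handlers) {
      try {
        handler(event);
      } catch (error) {
        if (this.strict) throw error; // strict: surface the failure to the caller
        this.swallowed.push(error); // lenient: record it and keep going
      }
    }
  }
}
```

In strict mode a broken approval or audit handler stops the flow immediately instead of being lost, which is usually what you want when handlers carry production side effects.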
LLM provider bootstrap
If @hazeljs/ai never registers __HAZELJS_AI_ENHANCED_SERVICE__, AgentService now logs a clear error after 500ms instead of failing silently. Set runtime.llmProvider explicitly or ensure the AI module loads first.
Typed infrastructure clients
State managers use minimal interfaces instead of any:
- RedisClientLike: Redis state and approval stores
- PrismaClientLike: DatabaseStateManager
This makes it safer to wire real clients without losing type checking at the boundary.
Testing
474 tests pass with coverage thresholds enforced. New integration coverage includes:
- Redis state persistence
- Approval flow via RedisApprovalStore
- Circuit breaker opening under repeated LLM failures
- RAG failure events
Test locations:
- tests/integration/production-hardening.test.ts
- tests/integration/hardening-coverage.test.ts
Jest uses tsconfig.jest.json for monorepo-friendly typechecking; optional @hazeljs/eval peer is stubbed during tests.
Upgrade from 1.0.0
npm install @hazeljs/agent@1.0.1
No code changes required for existing apps. To adopt production features incrementally:
- Redis state: add redis: { client } or await AgentModule.forRootAsync({ redis: { url } })
- Durable approvals: set useRedisApprovals: true with a Redis client
- Observability: install @hazeljs/observability and pass observabilityProvider
- Strict events: set runtime.strictEventHandlers: true in production
What's next (1.x roadmap)
Not in 1.0.1, planned for future minors:
- Durable A2A task store (same Redis pattern as approvals)
- Full @hazeljs/flow integration for long-running workflows
- Typed DI token replacing global __HAZELJS_AI_ENHANCED_SERVICE__
Links
Questions or feedback? Open an issue or join the discussion on GitHub.