The latest update to hermes-memory-installer introduces a focused set of features that directly address production-level concerns: observability, storage management, security, fault tolerance, and performance introspection. If you maintain a message-processing pipeline or job queue, these are the components that often decide whether your system survives peak loads or security audits without manual heroics. Let's break down each addition and how you can integrate them into your workflow.
System Metrics
Exposing runtime health is no longer an afterthought. The new metrics module taps into the core processing loop and emits data in the Prometheus exposition format: message throughput (count and rate), latency percentiles, queue depths, and goroutine or thread pool utilization. This goes beyond a simple "up/down" gauge: you get histograms for processing duration and derived metrics such as consumer lag. For example, if you run multiple worker instances, you can compare their processing speeds side by side in a Grafana dashboard. The endpoint path is configurable, so you can keep it behind a reverse proxy or internal load balancer. Under memory pressure, a separate gauge reports heap usage per queue, which helps with capacity planning before it becomes a midnight incident.
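To make the histogram idea concrete, here is a minimal Python sketch of how Prometheus-style cumulative buckets are counted, and how they could be used to compare workers. It is not the tool's implementation: the function names are hypothetical, the bucket bounds are illustrative, and the units are assumed to be seconds.

```python
from bisect import bisect_left

# Illustrative bucket upper bounds; units are assumed to be seconds.
BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 5]


def cumulative_histogram(durations, buckets=BUCKETS):
    """Count observations the way Prometheus histograms do:
    each bucket counts every observation less than or equal to its bound."""
    per_bucket = [0] * (len(buckets) + 1)  # last slot is the +Inf bucket
    for d in durations:
        # bisect_left finds the first bound >= d, the smallest bucket holding d
        per_bucket[bisect_left(buckets, d)] += 1
    cumulative, running = {}, 0
    for bound, count in zip(list(buckets) + [float("inf")], per_bucket):
        running += count
        cumulative[bound] = running
    return cumulative


def compare_workers(samples_by_worker, bound=0.1):
    """Fraction of messages each worker finished within `bound`.
    Assumes `bound` is one of the bucket bounds and each sample list is non-empty."""
    return {
        worker: cumulative_histogram(samples)[bound] / len(samples)
        for worker, samples in samples_by_worker.items()
    }
```

Because the buckets are cumulative, the share of messages processed within a given bound is a single lookup divided by the total count, which is what makes side-by-side worker comparisons cheap on a dashboard.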
Auto-Archive
Without auto-archive, old messages accumulate in memory or primary storage, driving up costs and slowing down scans. This feature moves processed or expired messages to a cheaper tier (S3, GCS, or local file system) based on age or queue size. The archive process is a background task that runs on a cron-like schedule; you can define how many messages to retain per queue before archiving kicks in. The compression is transparent—gzip by default, but you can switch to snappy or zstd. A key detail: archived messages retain their metadata and can be restored if needed, though the replay path skips them automatically unless explicitly requested. This is useful for audit trails or multi-region cold replicas.
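The retention rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: `archive_excess` and `restore` are hypothetical names, the queue is modelled as an in-memory `deque` of dicts, and the archive is a gzip-compressed JSON Lines file on the local file system.

```python
import gzip
import json
from collections import deque


def archive_excess(queue, retain, archive_path):
    """Move the oldest messages beyond `retain` into a gzip-compressed
    JSON Lines file, keeping payload and metadata together."""
    moved = 0
    # "at" appends text to the gzip file, so repeated runs add to the same archive
    with gzip.open(archive_path, "at", encoding="utf-8") as fh:
        while len(queue) > retain:
            message = queue.popleft()  # oldest message sits at the left
            fh.write(json.dumps(message) + "\n")
            moved += 1
    return moved


def restore(archive_path):
    """Read archived messages back, metadata included."""
    with gzip.open(archive_path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

A scheduler would call `archive_excess` once per queue on each tick. Writing whole messages rather than just payloads is what keeps the metadata available for a later restore.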
Token Rotation
Security teams hate long-lived static credentials. The token rotation feature automates the issuance and revocation of bearer tokens used for inter-service authentication. You set a rotation interval (hours or days), and the system generates a new token before the current one expires, revoking the old one within a few seconds. The tokens are JWT-based with a configurable issuer and audience, making it easy to integrate with your existing OAuth2 infrastructure. The rotation itself is handled by a dedicated worker that runs independently from the main processing pipeline, so even under heavy load, token refresh events don't block message handling. The feature also logs every rotation event with the token's fingerprint (a hash of the token, never the secret itself), so you can audit key changes without exposing credentials.
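The overlap between the new and old token, and the fingerprint-only audit log, can be sketched as follows. Everything here is an assumption made for illustration: the `TokenRotator` class is hypothetical, opaque random strings stand in for signed JWTs, and the fingerprint is taken to be a truncated SHA-256 digest.

```python
import hashlib
import secrets
import time


def fingerprint(token):
    """Short SHA-256 digest that identifies a token without revealing it."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


class TokenRotator:
    """Hypothetical rotator: opaque random tokens stand in for signed JWTs."""

    def __init__(self, interval_s, grace_s=5.0, clock=time.monotonic):
        self.interval_s = interval_s
        self.grace_s = grace_s   # how long the previous token stays valid
        self.clock = clock
        self.audit_log = []      # fingerprints only, never raw tokens
        self.current = None
        self.previous = None     # (token, revoke_at) or None
        self.rotate()

    def rotate(self):
        now = self.clock()
        if self.current is not None:
            self.previous = (self.current, now + self.grace_s)
        self.current = secrets.token_urlsafe(32)
        self.issued_at = now
        self.audit_log.append(("rotated", fingerprint(self.current)))
        return self.current

    def tick(self):
        """Call periodically from a background worker; rotates when due."""
        if self.clock() - self.issued_at >= self.interval_s:
            self.rotate()

    def is_valid(self, token):
        if secrets.compare_digest(token, self.current):
            return True
        if self.previous is not None:
            old, revoke_at = self.previous
            return secrets.compare_digest(token, old) and self.clock() < revoke_at
        return False
```

The short grace window lets in-flight requests that still carry the old token succeed during a rotation. Injecting the clock keeps the rotation logic testable without waiting 12 hours.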
Dead-Letter Replay
Messages that exhaust all retries land in a dead-letter queue (DLQ). Previously, you had to export them manually and re-inject them. Now, you can replay individual messages or an entire batch with a single API call. The replay respects the original ordering and deduplication keys, but you can strip or modify headers (for example, exponential backoff counters) before re-queuing. This is critical when a downstream service recovers and you need to reprocess failures without losing progress. The replay also logs each attempt with the original error message, so you can correlate the fix with the retry result. And if a replayed message fails again, it goes back to the DLQ with an incremented retry count, so a replay cannot loop forever.
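The replay semantics described above can be modelled in a short Python sketch. It does not use the tool's API: `replay_dead_letters`, the message dict layout, and the `backoff_count` header name are all hypothetical, chosen only to show ordering, header stripping, and the bounded retry behaviour.

```python
from collections import deque


def replay_dead_letters(dlq, handler, reset_headers=("backoff_count",)):
    """Replay every message currently in the DLQ exactly once, in order.

    Messages that fail again return to the DLQ with retry_count + 1,
    so a single call can never loop forever."""
    results = []
    for _ in range(len(dlq)):  # size is fixed up front: re-queued failures are not retried
        message = dlq.popleft()
        for name in reset_headers:  # strip stale headers before re-processing
            message["headers"].pop(name, None)
        original_error = message.get("last_error")
        try:
            handler(message)
            results.append((message["id"], "ok", original_error))
        except Exception as exc:
            message["retry_count"] = message.get("retry_count", 0) + 1
            message["last_error"] = str(exc)
            dlq.append(message)
            results.append((message["id"], "failed", original_error))
    return results
```

Each result pairs the outcome with the error that originally sent the message to the DLQ, which is the correlation the replay log is meant to give you.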
Prof
The prof feature is a lightweight performance sampler that collects CPU and memory profiles on demand. You trigger it via a signal or an HTTP endpoint, and it dumps a pprof-compatible trace. Unlike always-on instrumentation, this avoids overhead in normal operation. The profiles include component-level labels (queue name, worker ID, codec used), so you can see which part of the pipeline consumes the most resources. For memory-heavy transformations, this helps you decide whether to pre-allocate buffers or switch to stream processing. Combined with the system metrics, profiling gives you the why behind the what.
Code Example
Below is a minimal configuration snippet that enables metrics and token rotation (TOML format). The metrics endpoint will be available at /internal/metrics, and tokens rotate every 12 hours.
```toml
[metrics]
enabled = true
endpoint = "/internal/metrics"   # path where the metrics are served
histogram_buckets = [0.01, 0.05, 0.1, 0.5, 1, 5]

[security]
token_rotation = true
rotation_interval = "12h"        # rotate tokens every 12 hours
token_issuer = "hermes-cluster"
```
Restart the service, and you can curl the endpoint immediately. The rotation module will start the background refresh on the first tick.
Conclusion
This release closes several operational gaps without adding unnecessary abstraction. Metrics and profiling together cover both what is slow and why; auto-archive reduces storage costs and keeps hot queues lean; token rotation limits how long a leaked credential stays useful; dead-letter replay saves hours of manual recovery. Each feature can be enabled independently, so you can roll them out gradually. If you manage a production system that processes messages or jobs, this update makes hermes-memory-installer a more resilient backbone for your infrastructure.