mage0535

Posted on • Originally published at hermes-agent.nousresearch.com

Memory Sidecar v3.5.1

The latest release from the hermes-memory-installer pipeline is here: Memory Sidecar v3.5.1, the operational hardening release for the public agent-agnostic memory stack. If you’ve been running sidecars in production, you know the difference between a demo and a reliable service. This release narrows that gap without introducing new surfaces. It is all about making the existing architecture more robust.

Memory Sidecar has always been designed to be agent-agnostic. Instead of embedding memory logic into every agent binary, you run a lightweight sidecar process that any agent (a chatbot, a workflow engine, a simulation coordinator) can talk to over a simple gRPC or HTTP interface. The sidecar owns shared state, persistence, and recall; agents just push and pull. v3.5.1 doesn’t change that contract. It hardens the existing code paths under load.

Key Operational Improvements

The release notes are relatively short, but each commit addresses a real pain point from production deployments:

  • Connection pooling under contention. Earlier versions created a new internal connection for every agent request that required a join across memory segments. Under high concurrency, this led to socket exhaustion and retry storms. v3.5.1 introduces a fixed-size work-stealing pool for internal joins: by default, no more than 16 concurrent join operations run per sidecar instance (max_join_workers), regardless of agent count.

  • Backpressure on buffer writes. Agents that produce memory events faster than the sidecar can persist them now get a clear RESOURCE_EXHAUSTED signal instead of silent drops. The sidecar monitors its internal ring-buffer occupancy and starts rejecting new writes when the buffer hits 80% capacity. This prevents the OOM kills that plagued earlier releases in bursty environments.

  • Idempotent replay guards. When a sidecar restarted and replayed journal entries, it previously trusted the agent to de-duplicate. In practice, agents often re-sent the same events during backoff periods, causing duplicate memory entries. v3.5.1 includes a lightweight dedup cache (TTL 30 seconds, dedup_cache_ttl_secs) keyed on (agent_id, event_seq) pairs. If you’re using the built-in journal, duplicates that arrive within that window are now silently ignored.

  • Reduced heap pressure from expired entries. Expired memory entries (set by TTL) were previously cleaned up via a full GC pass every 60 seconds. Large working sets caused noticeable latency spikes on GC cycles. This release replaces that with a generational approach: short-lived entries are evicted inline during writes, and only long-lived garbage is handled by the background collector.
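The backpressure behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea (reject writes once occupancy reaches a high-water mark instead of dropping them silently), not the sidecar’s actual implementation. The names BoundedEventBuffer and ResourceExhausted are hypothetical, and the buffer here counts events rather than bytes.

```python
from collections import deque


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the RESOURCE_EXHAUSTED signal."""


class BoundedEventBuffer:
    """Illustrative buffer that rejects writes past a high-water mark."""

    def __init__(self, capacity: int, high_water: float = 0.8) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.high_water = high_water  # 0.8 mirrors the 80% threshold
        self._events = deque()

    def occupancy(self) -> float:
        """Fraction of the buffer currently in use."""
        return len(self._events) / self.capacity

    def write(self, event) -> None:
        # Fail loudly so the producer can back off and retry,
        # rather than dropping the event without telling anyone.
        if self.occupancy() >= self.high_water:
            raise ResourceExhausted("buffer at or above high-water mark")
        self._events.append(event)

    def drain(self, max_items: int) -> list:
        """Remove up to max_items events in FIFO order (the persist step)."""
        drained = []
        while self._events and len(drained) < max_items:
            drained.append(self._events.popleft())
        return drained
```

A producer that catches ResourceExhausted would typically back off and retry once the persistence side has drained some events, which is the behaviour the explicit signal makes possible.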

Code Example: Running with hermes-memory-installer

The hermes-memory-installer tool is the recommended way to deploy Memory Sidecar v3.5.1. It generates a static binary with the correct module set for your target platform. Here’s a minimal invocation for a two-agent setup:

# Install the sidecar binary and generate a default config
hermes-memory-installer install \
  --release 3.5.1 \
  --arch amd64 \
  --agents agent-a,agent-b \
  --max-memory 256MB \
  /opt/memory-sidecar

This produces a directory with the sidecar binary and a sidecar.yaml. The generated configuration already has the new v3.5.1 defaults:

version: "3.5.1"
service:
  listen: "0.0.0.0:9090"
  max_join_workers: 16
  buffer_capacity_mb: 64
  dedup_cache_ttl_secs: 30
  garbage_collector:
    mode: generational
    short_lived_threshold_ms: 5000
agents:
  - id: agent-a
    ingress: "unix:///run/agent-a.sock"
  - id: agent-b
    ingress: "unix:///run/agent-b.sock"

Notice the max_join_workers and dedup_cache_ttl_secs fields. Both are new to v3.5.1. If you’re upgrading from an earlier config, the installer merges these defaults into it while preserving your agent definitions and listener ports.
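The installer’s merge logic isn’t documented here, so the following is only a sketch of what “merge defaults while preserving existing values” could look like, using plain Python dictionaries in place of parsed YAML. The function name merge_defaults and the V351_DEFAULTS constant are hypothetical; the two default values are taken from the generated config above.

```python
import copy

# New fields introduced in v3.5.1, as shown in the generated sidecar.yaml.
V351_DEFAULTS = {
    "service": {
        "max_join_workers": 16,
        "dedup_cache_ttl_secs": 30,
    }
}


def merge_defaults(existing: dict, defaults: dict) -> dict:
    """Return a copy of `existing` with missing keys filled from `defaults`.

    Values already present in `existing` always win, so agent definitions
    and listener ports are left untouched.
    """
    merged = copy.deepcopy(existing)
    for key, default_value in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(default_value)
        elif isinstance(merged[key], dict) and isinstance(default_value, dict):
            # Recurse into nested sections such as `service`.
            merged[key] = merge_defaults(merged[key], default_value)
    return merged
```

Under this scheme a value you have already set, such as a custom listen address, is never overwritten. Only keys that are absent from the old config are added.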

Why “Operational Hardening” Matters

When we say this release is about hardening, we mean it explicitly does not add new features. There’s no new SDK integration, no new persistence backend, and no change to the gRPC protobuf definitions. Every agent written against v3.3 or v3.4 will work without any code change. For a sidecar that is meant to be a transparent middleware layer, stability of the interface is critical.

The real value of v3.5.1 is in how it behaves when you stop watching. Traffic profiles that caused periodic micro-outages under the previous release now produce smooth latency curves. The concurrency pool and backpressure mechanism prevent resource cascades. The dedup cache eliminates a class of subtle consistency bugs that only appear after hours of runtime. The generational GC cuts tail latencies by roughly 40% in our benchmarks with large entry volumes.
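To make the dedup cache concrete, here is a minimal Python sketch of a TTL cache keyed on (agent_id, event_seq) pairs. It illustrates the technique only. The class name DedupCache, the method is_duplicate, and the injectable clock are assumptions made for this example, and the real sidecar may expire entries differently.

```python
import time


class DedupCache:
    """Illustrative TTL cache that remembers recently seen events."""

    def __init__(self, ttl_secs: float = 30.0, clock=time.monotonic) -> None:
        self.ttl_secs = ttl_secs  # 30 mirrors dedup_cache_ttl_secs
        self._clock = clock       # injectable so tests can control time
        self._seen = {}           # (agent_id, event_seq) -> expiry time

    def is_duplicate(self, agent_id: str, event_seq: int) -> bool:
        """Return True if this event was already seen within the TTL."""
        now = self._clock()

        # Drop entries whose TTL has elapsed so the cache stays small.
        expired = [key for key, expiry in self._seen.items() if expiry <= now]
        for key in expired:
            del self._seen[key]

        key = (agent_id, event_seq)
        if key in self._seen:
            return True

        # First sighting: remember it until the TTL runs out.
        self._seen[key] = now + self.ttl_secs
        return False
```

A replay loop would call is_duplicate for each journal entry and skip the write when it returns True. The linear scan for expired keys keeps the sketch short; a production cache would more likely use an ordered structure so that expiry does not touch every entry.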

Migration Path

If you’re already running Memory Sidecar via hermes-memory-installer, the upgrade path is:

hermes-memory-installer update --release 3.5.1

This replaces the binary and merges your existing config with the new defaults. The sidecar then performs a hot restart: existing connections drop, and the journal is replayed from disk on startup. Plan for a few seconds of unavailability per instance if you’re running in a cluster.

For new deployments, the installer handles all bootstrapping. The public agent-agnostic memory model means you don’t need a custom backend for each agent type. The same sidecar instance can serve a LangChain agent, a custom RAG pipeline, and a simple state machine simultaneously.

The Bottom Line

Memory Sidecar v3.5.1 doesn’t do anything flashy. It does the dirty work: fewer crashes, cleaner backpressure, and less latency variance. If you’ve been running any version earlier than 3.5, this upgrade pays for itself in reduced operational toil. The hermes-memory-installer reduces the switch to a single command, and agents written against v3.3 or v3.4 continue to work unchanged. For a background component like a memory sidecar, that’s exactly the kind of release you want.
