DEV Community: Dhruv Sharma

How We Ensured API Keys Never Linger in RAM

Dhruv Sharma — Wed, 25 Mar 2026 15:12:37 +0000

Rust's ownership model cleans up memory automatically — but it doesn't overwrite it. A dropped String containing an API key still has its bytes sitting in physical RAM until something else claims that page. The zeroize crate fixes that. Here's every pattern we used in a production secrets vault.

The Problem

When you store and retrieve API keys in a credentials vault, the sensitive bytes touch several places in memory:

The Argon2-derived encryption key (lives for the session)
The raw key value as a String (lives during add/retrieve operations)
The master password from stdin (lives until validated)

Rust's drop returns the allocation to the allocator, but nothing zeroes it: the bytes stay in place until that memory is reused. A memory dump, cold boot attack, or crash dump can recover the value seconds to minutes after drop.

Three Patterns, Applied

Pattern 1 — Zeroize on a custom struct with Drop

The encryption key is a fixed-size byte array stored in a struct that holds it for the lifetime of the vault session. We implement Drop manually to ensure it's overwritten before the memory is released:

struct LockedSecretboxKey {
    key: [u8; DERIVED_KEY_LEN],
    locked: bool,
}

impl Drop for LockedSecretboxKey {
    fn drop(&mut self) {
        if self.locked {
            unsafe { libc::munlock(self.key.as_ptr().cast(), self.key.len()); }
        }
        self.key.zeroize(); // overwrite with zeros before dealloc
    }
}

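The mlock call, made when the key is constructed, keeps the OS from swapping the page to disk; Drop releases that pin with munlock. zeroize overwrites the bytes in RAM. Together they cover both exposure paths: swap and freed memory.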
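To make the "overwrite before dealloc" step concrete, here is a std-only sketch of the idea behind zeroize: volatile writes followed by a compiler fence, so the optimizer cannot discard the stores as dead writes. This is not the crate's source, and `wipe` and `SessionKey` are illustrative names, not types from the vault.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Overwrite `buf` with zeros using volatile writes, so the optimizer
/// cannot elide the stores even though the memory is about to be freed.
fn wipe(buf: &mut [u8]) {
    for byte in buf.iter_mut() {
        // SAFETY: `byte` is a valid, aligned, exclusive reference into `buf`.
        unsafe { ptr::write_volatile(byte, 0) };
    }
    // Stop the compiler from reordering later memory accesses before the wipe.
    compiler_fence(Ordering::SeqCst);
}

/// Hypothetical stand-in for a session key holder (no mlock here).
struct SessionKey {
    key: [u8; 32],
}

impl Drop for SessionKey {
    fn drop(&mut self) {
        // Runs on every normal drop path, including early returns via `?`.
        wipe(&mut self.key);
    }
}
```

In real code, prefer the crate: it covers more types and is maintained against compiler changes.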

Pattern 2 — Zeroizing<T> wrapper for automatic zeroing

For the decrypted credential returned to callers, we wrap the value type in Zeroizing<String>. It implements Drop internally — you get automatic zeroing without writing any Drop code:

pub struct DecryptedCredential {
    pub id: String,
    pub key: Zeroizing<String>, // zeros itself on drop
}

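Because Zeroizing<T> implements Drop, a struct that contains it can never be Copy. Clone is a different story: Zeroizing<T> is Clone whenever T is, so we deliberately do not derive Clone on DecryptedCredential. That is exactly what you want: no accidental duplication of secret values.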

Pattern 3 — Explicit .zeroize() before end of scope

During add_credential, the raw key string lives as a local while we encrypt it. After encryption completes, we call .zeroize() explicitly rather than waiting for the scope to end:

key_value.zeroize(); // explicit: zero now, not at brace

And during key derivation, we wrap the intermediate buffer in Zeroizing::new() so even if hash_password_into returns an error partway through, the partial derivation is wiped:

let mut derived = Zeroizing::new(vec![0u8; DERIVED_KEY_LEN]);
argon2.hash_password_into(master_password.as_bytes(), salt, &mut derived)?;

The Pitfalls

Drop order matters during error paths. In LockedSecretboxKey::new, if mlock fails and require_mlock is true, we call key.zeroize() before returning the error — because the key still exists in that stack frame and we would otherwise return with sensitive bytes uncleared.

String is special. The Zeroize trait works on String and Vec<u8> because they own their heap allocation and can be mutated in place. You cannot zero a secret through a shared &str: a shared borrow gives no mutable access to the bytes.

Clone/Copy on secret types is a footgun. We assert in tests that DecryptedCredential does not implement Copy or Clone. If it did, callers could silently duplicate the key into a plain String that never gets zeroed.
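The first pitfall, wiping on the error path, can be sketched without any external crates. This is a hedged illustration, not the vault's `LockedSecretboxKey::new`: `pin_or_wipe` and `KeyError` are hypothetical names, the `lock` closure stands in for an mlock call, and the volatile loop stands in for `key.zeroize()`.

```rust
#[derive(Debug, PartialEq)]
enum KeyError {
    LockRequired,
}

/// Try to pin `key` in memory. If pinning fails and is mandatory,
/// wipe the key before returning the error.
/// Returns Ok(true) when pinned, Ok(false) when running unpinned.
fn pin_or_wipe(
    key: &mut [u8; 32],
    require_lock: bool,
    lock: impl Fn(&[u8]) -> bool, // assumption: reports whether pinning succeeded
) -> Result<bool, KeyError> {
    let locked = lock(&key[..]);
    if !locked && require_lock {
        // Without this wipe, the early return would leave the key bytes
        // sitting in the caller's buffer.
        for b in key.iter_mut() {
            // SAFETY: `b` is a valid, exclusive reference into `key`.
            unsafe { std::ptr::write_volatile(b, 0) };
        }
        return Err(KeyError::LockRequired);
    }
    Ok(locked)
}
```

The design point is that the wipe sits next to the `return Err(...)`, so nobody has to remember which owner still holds the bytes when the constructor bails out.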

Takeaway

zeroize is a one-crate solution to a real gap in Rust's memory model: ownership handles cleanup, but not sanitization. The three patterns cover the full lifecycle — long-lived session keys, short-lived plaintext values, and intermediate derivation buffers. Pair it with mlock for anything that should never hit swap.

I Tried Duplicating Layers in Qwen 3.5 to Reduce Hallucinations — Here's What Actually Happened

Dhruv Sharma — Sun, 22 Mar 2026 11:32:41 +0000

I read two papers about improving LLMs at inference time — no training, no fine-tuning, just architectural surgery. I tried applying these ideas to Qwen 3.5-9B. The initial results looked incredible (+245% reasoning!). Then I ran fair evaluations and discovered most of the improvement was an evaluation artifact. Here's the full story, including what I got wrong and what's genuinely new.

The Research That Started This

Two pieces of research motivated this experiment:

1. The RYS Method (David Ng) — Transformers contain "reasoning circuits": contiguous blocks of 3-5 layers that act as indivisible cognitive units. Duplicate them in the GGUF file and the model gets a second pass through its reasoning pipeline. The llm-circuit-finder toolkit validated this on Devstral-24B (+245% logical deduction on BBH) and Qwen2.5-32B (+23% reasoning). The boundaries are sharp — shift by one layer and the improvement vanishes.

2. H-Neurons Paper (arXiv:2512.01797) — Fewer than 0.1% of neurons in an LLM predict whether it will hallucinate. These neurons are baked in during pre-training and survive instruction tuning. Scaling their activations at inference time controls hallucination rates.

Both papers point to the same idea: you can change model behavior at inference time by manipulating the architecture, without touching the weights. I wanted to try this on Qwen 3.5 — a newer, community-loved model.

Discovery 1: Qwen 3.5's Hybrid Architecture Requires Cycle-Aligned Duplication

Qwen 3.5 doesn't use standard transformer layers. It uses a repeating pattern of [DeltaNet, DeltaNet, DeltaNet, Attention] — three linear attention layers followed by one full quadratic attention layer. This 4-layer cycle repeats 8 times for 32 total layers.

I discovered this empirically. My first sweep tried duplicating 3-layer blocks. Every config crashed:

# Config (2,5) - 3 layers
llama_model_load: error: missing tensor 'blk.6.ssm_conv1d.weight'

# Config (4,7) - 3 layers
llama_model_load: error: missing tensor 'blk.7.attn_q.weight'

The errors alternate: ssm_conv1d (DeltaNet tensor) missing, then attn_q (Attention tensor) missing. Duplicating 3 layers shifts the pattern, putting the wrong layer type at each position. But duplicating 4 layers (one complete cycle) works — the pattern stays aligned.

This is new. The original RYS work only tested standard transformers where all layers are identical. Nobody had tried it on a hybrid DeltaNet architecture before. The finding: layer duplication on hybrid models must respect the architectural cycle.
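The alignment rule can be checked before touching a GGUF file. The sketch below assumes the loader decides which tensors to expect purely from the layer's position in the [D,D,D,A] cycle, which is what the error messages suggest; `kind_at`, `duplicate`, and `first_mismatch` are illustrative names, not part of llm-circuit-finder.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    DeltaNet,
    Attention,
}

const CYCLE: usize = 4; // [DeltaNet, DeltaNet, DeltaNet, Attention]

/// Layer type expected at a given position (assumed to be position-based).
fn kind_at(pos: usize) -> Kind {
    if pos % CYCLE == CYCLE - 1 {
        Kind::Attention
    } else {
        Kind::DeltaNet
    }
}

/// Execution path for config (start, end): layers start..end run twice.
/// (0, 4) on 32 layers yields 0..3, 0..3, 4..31.
fn duplicate(start: usize, end: usize, total: usize) -> Vec<usize> {
    (0..end).chain(start..end).chain(end..total).collect()
}

/// First position whose source layer has the wrong type, if any.
fn first_mismatch(path: &[usize]) -> Option<usize> {
    path.iter()
        .enumerate()
        .position(|(pos, &src)| kind_at(pos) != kind_at(src))
}
```

Under that assumption, `first_mismatch(&duplicate(2, 5, 32))` reports position 6 and `first_mismatch(&duplicate(4, 7, 32))` reports position 7, the same block indices as in the two error messages, while `duplicate(0, 4, 32)` has no mismatch.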

Discovery 2: Initial Results Looked Amazing (But Were Wrong)

I built custom probes (code generation, hallucination detection, reasoning) and swept all cycle-aligned configs. The initial results were dramatic:

Config	Code Gen	Hallucination Resistance	Reasoning
Baseline	7%	54%	29%
(0,4) layers 0-3 duplicated	79%	96%	88%

Code generation went from 7% to 79%. Hallucination resistance nearly doubled. Reasoning tripled. I was convinced I'd found the reasoning circuit in Qwen 3.5.

Discovery 3: The Improvement Was an Evaluation Artifact

Then I ran fair evaluations. The initial sweep used max_tokens of 512-1024. Qwen 3.5 wraps responses in <think>...</think> tags, which consume tokens. With limited budget:

Base model: Spent 500+ tokens thinking, ran out before producing an answer → empty response → scored 0
RYS model: Didn't use think tags, answered directly in 50-200 tokens → correct response → scored 1

The "improvement" was measuring which model fits its answer within the token budget, not which model is smarter.
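One way to keep this artifact out of the scores is to classify each raw completion before grading it. The sketch below is not the probe code used here; it assumes the tags appear literally in the output text, and `Outcome` and `classify` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Generation stopped inside an unclosed <think> block: budget exhausted.
    Truncated,
    /// Reasoning finished, but nothing followed it.
    Empty,
    /// Final answer with the reasoning block stripped.
    Answer(String),
}

fn classify(raw: &str) -> Outcome {
    let visible = match (raw.find("<think>"), raw.find("</think>")) {
        // Opened but never closed: the model ran out of tokens mid-thought.
        (Some(_), None) => return Outcome::Truncated,
        // Keep only the text after the closing tag.
        (_, Some(close)) => &raw[close + "</think>".len()..],
        // No think tags at all: the whole response is the answer.
        (None, None) => raw,
    };
    let visible = visible.trim();
    if visible.is_empty() {
        Outcome::Empty
    } else {
        Outcome::Answer(visible.to_string())
    }
}
```

A harness built on this would rerun `Truncated` cases with a larger budget, or report them separately, instead of scoring them as 0.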

When I re-ran with max_tokens=4096 (fair for both):

Config	Code Gen	Hallucination Resistance	Reasoning	Overall
Baseline	80%	40%	100%	73.3%
(0,4)	60%	80%	100%	80.0%
(4,8)	80%	60%	80%	73.3%
(8,12)	0%	40%	80%	40.0%
(12,16)	0%	60%	80%	46.7%
(16,20)	0%	40%	100%	46.7%
(20,24)	60%	60%	100%	73.3%

The real improvement from (0,4) is +6.7 percentage points overall (73.3% to 80.0%), not the +286% from the flawed evaluation. Of the six configs, three actually hurt the model and two only match the baseline. And the baseline reasoning score is 100%, not 29%.
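Spelled out, with Overall taken as the unweighted mean of the three probe scores (an assumption, but one that matches every row of the table):

```latex
\begin{align*}
\text{Overall}_{\text{baseline}} &= \frac{80 + 40 + 100}{3} \approx 73.3\% \\
\text{Overall}_{(0,4)} &= \frac{60 + 80 + 100}{3} = 80.0\% \\
\Delta_{\text{abs}} &= 80.0 - 73.3 \approx 6.7 \text{ percentage points} \\
\Delta_{\text{rel}} &= \frac{6.7}{73.3} \approx 9.1\%
\end{align*}
```

Either way you express it, the gain is small, and with only a handful of prompts per probe a single answer moves a category by a large step.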

Discovery 4: When Both Models Answer, They're Identical

I tested both models on 10 hard hallucination prompts (fake APIs, version confusion, tricky Python behavior). Side by side, with identical settings:

Both correctly rejected list.add(), dict.sort_by_value(), json.parse()
Both correctly refused to name a 2028 World Cup winner
Both correctly explained that list.sort() returns None
Both incorrectly said match/case works in Python 3.9 (it's 3.10+)
Both correctly explained banker's rounding for round(2.5)

The layer duplication doesn't change the model's knowledge. When both models respond, they give the same answers — same correct ones, same mistakes.

What the Original Author Actually Said

Going back to the original RYS blog, David Ng explicitly noted:

"Smaller models seem to be more complex...I never found a single area of duplication that generalised across tasks."

His successful results were on 72B+ parameter models. I used 9B. He also said:

"Every architecture has its own neuroanatomy...The brains are different."

And critically: neither the original author nor anyone else had tested RYS on hybrid DeltaNet architectures. The method was validated exclusively on standard transformers (Qwen2, Llama, Mistral, Phi). Qwen 3.5's hybrid architecture was untested territory.

Even though the author warned about small models, I tried it anyway and quantified exactly what happens. Next up: running this on Qwen 3.5 122B, the scale where Ng saw real gains.

What's Genuinely New Here

Despite the accuracy improvement not holding up, this experiment produced three findings nobody else has published:

Hybrid architectures require cycle-aligned duplication. On Qwen 3.5's [D,D,D,A] pattern, only block-size-4 duplication works. Block-size-3 crashes. This constrains how RYS can be applied to next-generation architectures.
Layer duplication can change output behavior. The (0,4) config switched the model from using <think> tags to responding directly. This is an unexpected side effect — duplicating layers doesn't just affect accuracy, it can change the model's generation strategy.
Evaluation methodology on thinking models is treacherous. Token budget, think-tag handling, and response parsing can swing results from "dramatic improvement" to "no improvement". Anyone evaluating thinking models needs to control for these factors.

How to Reproduce

# Clone the circuit finder toolkit
git clone https://github.com/alainnothere/llm-circuit-finder.git
cd llm-circuit-finder
pip install gguf

# Download Qwen3.5-9B GGUF (from unsloth on HuggingFace)
# Then build the modified model:
python layer_path.py Qwen3.5-9B-Q4_K_M.gguf \
    Qwen3.5-9B-RYS-0-4.gguf \
    -p "0..3,0,1,2,3,4..31" -v

# Run with llama.cpp
llama-server -m Qwen3.5-9B-RYS-0-4.gguf -c 8192 -ngl 99

Lessons Learned

Always run fair evaluations first. Same max_tokens, same conditions, same scoring for both models. Our first sweep used different effective token budgets and produced wildly misleading results.
Check what the original authors actually tested. We assumed RYS works on all transformers. The author explicitly said small models are harder and every architecture is different.
Empty responses are not zero capability. The base model returned empty strings on some prompts, but with enough tokens it answered correctly. Scoring empty as zero inflated the apparent improvement.
Hybrid architectures are genuinely different. Techniques proven on standard transformers don't transfer automatically. DeltaNet layers maintain recurrent state — duplicating them isn't the same as "thinking longer."

References & Links

RYS Model on HuggingFace — The modified GGUF with layers 0-3 duplicated
llm-circuit-finder — The sweep and GGUF surgery toolkit
RYS Method — David Ng — Original blog post and method
H-Neurons Paper (arXiv:2512.01797) — Hallucination-associated neurons in LLMs
Qwen 3.5 Architecture — Model card with hybrid DeltaNet details

Agent Orchestrator vs T3 Code vs OpenAI Symphony vs Cmux: Hands-On Comparison

Dhruv Sharma — Wed, 18 Mar 2026 21:49:18 +0000

Which tool fits your workflow? A cross-examined comparison.

Four tools shipped recently that keep getting compared as if they're competitors. They're not — they sit at different layers:

Orchestration layer (full lifecycle): AO, Symphony
Interaction layer (human-in-the-loop): T3 Code
Terminal layer (agent-aware environment): Cmux

AO and Symphony compete most directly. T3 Code and Cmux solve different problems entirely.

I ran all four on real codebases. Here's what I found.

What each tool actually is

Agent Orchestrator (AO) — Give it a GitHub/Linear/Jira issue. It spawns an agent in an isolated worktree, opens a PR, auto-fixes CI failures, routes review comments back. You intervene when it's done, stuck, or needs approval.

T3 Code — Desktop app by Theo Browne. Chat with a coding agent, see visual diffs, stay close to every change before it lands. Currently wraps Codex, Claude Code adapter in progress.

OpenAI Symphony — Its reference Elixir implementation polls your Linear board, auto-claims tickets, spawns Codex agents, delivers PRs with proof-of-work. Elixir/OTP for fault tolerance. Linear-only. Still an engineering preview.

Cmux — Native macOS terminal built for AI agents. Split panes, notification rings, scriptable in-app browser, Unix socket automation. Not an orchestrator — it's where you run your agents. macOS only.

Quick pick

If you want...	Use
Fire-and-forget: issue in, PR out, CI fixes handled	AO or Symphony
Review every change before it lands	T3 Code
Autonomous agents + GitHub Issues or Jira	AO
Autonomous agents + Linear	Symphony (or AO — it has a Linear plugin too)
A better terminal for running any AI agent	Cmux
The easiest first experience	T3 Code (`npx t3`) or Cmux (`brew install`)
Run 10+ agents on a backlog in parallel	AO or Symphony
Maximum fault tolerance (restart recovery on crash)	Symphony
Swap agents, runtimes, trackers, SCM via config	AO

Feature matrix

	AO	T3 Code	Symphony	Cmux
Type	Orchestrator + dashboard	GUI for coding agents	Autonomous pipeline	Agent-aware terminal
Spawns agents on issues	Yes (manual + auto-poller)	No	Yes (polls Linear)	No
CI failure → auto-fix	Yes (retries, then escalates)	No	Yes (verifies before landing)	No
Review comment handling	Forwards to agent incrementally	No	Restarts from scratch (ref. workflow)	No
Auto-merge	Configurable	No	Yes (ref. workflow)	No
Agents supported	Claude Code, Codex, Aider, others	Codex (Claude soon)	Codex (community Claude port)	Any CLI agent
Issue trackers	GitHub, Linear, Jira	None	Linear only	None
SCM	GitHub	GitHub	GitHub	N/A
Extensibility	8 plugin slots, swappable via config	Provider adapters (early)	Agent runtime swappable, rest fixed	Unix socket API
Dashboard / UI	Web (Next.js)	Electron desktop app	No UI	Native macOS terminal
Platform	Cross-platform	Mac, Win, Linux	Cross-platform	macOS only
License	MIT	MIT	Apache 2.0	AGPL-3.0

Where the real differences show up

The differences surface when things go wrong.

When an agent crashes

AO: Lifecycle manager detects the dead session via polling (~30s). Recovery system classifies it, attempts automatic recovery, then escalates to human notification.

Symphony (reference impl): OTP supervisor handles restart recovery with error context — designed to be transparent to the user.

T3 Code: The thread shows an error. You restart manually.

When CI fails

AO: Lifecycle manager detects the failure, fetches CI logs, sends them to the agent, agent fixes and pushes. Configurable retries, then escalates.

Symphony (reference workflow): Agent must provide proof-of-work — checks must pass before the PR is considered complete. If CI fails, the agent retries within its implementation run.

T3 Code: No CI handling. That's on you.

When a reviewer requests changes

AO: Forwards the review comments to the agent on the existing branch. Agent addresses them incrementally and pushes.

Symphony (reference workflow): Closes the PR, creates a new branch, re-implements from scratch.

T3 Code: No automated handling — you manage reviews manually.

Cross-examined: When to use what

Use AO when...

You want full lifecycle automation — issue in, PR out, CI fixes handled, review comments routed.

But can't Symphony do this too?
Yes. Both go from ticket to PR autonomously. Both handle CI verification. The differences: AO works with GitHub Issues, Linear, and Jira — Symphony is Linear-only. AO's plugin slots let you swap agent, runtime, tracker, SCM, and notifier independently. Symphony's reference implementation is more tightly integrated.

But can't T3 Code also run autonomously?
T3 Code has a mode where the agent writes files without asking. But there's no CI failure handling, review routing, or auto-merge. T3 Code automates coding. AO and Symphony automate the entire PR lifecycle.

Use T3 Code when...

You want to stay close to every change before it lands — visual diffs, structured chat.

But can't AO also do human-in-the-loop?
AO has task-level approval gates and escalation notifications. But AO delegates code review to GitHub. T3 Code lets you stay close to individual changes before they touch the filesystem. Different granularity: AO is human-on-the-loop (oversight at milestones), T3 Code is human-in-the-loop (oversight at every edit).

Use Symphony when...

You want fault-tolerant autonomous agents with strong concurrency guarantees, and you use Linear.

But AO also has a Linear plugin?
Yes. If you use Linear, both work. The concurrency architecture differs: Symphony runs on Erlang/OTP with supervision trees — designed for stronger restart recovery. Per-state concurrency limits bound concurrent agents. AO detects dead agents via polling and recovers, but doesn't transparently restart mid-execution.

What if I don't use Linear?
Symphony won't work for you today. AO supports GitHub Issues, Linear, and Jira.

Use Cmux when...

You want a better terminal experience for running AI agents — any agent, any orchestrator.

But AO already has a web dashboard with a terminal?
Yes. AO's dashboard has an xterm.js terminal via WebSocket. Cmux adds: native GPU rendering, lower latency, notification rings, drag-and-drop panes, scriptable browser. AO's dashboard adds: PR lifecycle cards, CI status, review comments, fleet overview. Different layers — Cmux is a terminal, AO's dashboard is a management plane.

Key architectural differences

AO — 8 plugin slots (runtime, agent, workspace, tracker, SCM, notifier, terminal, lifecycle). Reaction engine auto-handles CI failures, review comments, merge readiness — each with configurable retries and escalation. Session state is file-based. Polling-based detection (30s intervals).

Symphony — Erlang/OTP supervision trees for process-level fault tolerance. Agent behavior defined in WORKFLOW.md versioned with your code. Per-state concurrency limits. Review rework is destructive (full reset) in the reference workflow. Linear + Codex only (officially). Still an engineering preview.

T3 Code — Wraps coding agents with a conversational UI and visual diffs. Designed for focused 1-on-1 work where you want to see every change.

Cmux — Unix socket IPC. Agents can programmatically create panes, send notifications, control browsers. GPU-rendered via libghostty. Notification rings, in-app browser, socket/CLI automation. No higher-level orchestration logic.

Trust model

Execution happens locally for all four tools. Code and metadata go to GitHub/Linear and LLM providers (Anthropic, OpenAI) depending on your agent and tracker config.

All tools work via git branches. Agents push to feature branches and open PRs — your main branch is never touched until you merge.

Getting started

	AO	T3 Code	Symphony	Cmux
Prerequisites	Node 20+, pnpm, tmux, git 2.25+	Node, OpenAI API key	Elixir, Linear workspace, OpenAI key	macOS 14+
Install	`pnpm install && pnpm build`	`npx t3`	`mix setup && mix build`	`brew install --cask manaflow-ai/cmux/cmux`
Time to first run	~10 min	~2 min	~30-60 min	~1 min
Cost	Free + LLM API costs	Free + API costs	Free + OpenAI costs	Free

Combos (honest assessment)

Combo	Reality
AO + Cmux	Complementary layers (orchestration + terminal), but no native integration yet. Manual tmux-attach inside Cmux panes.
AO + T3 Code	Aspirational. T3 Code has no "review existing PR" workflow today.
Symphony + Cmux	Same as AO + Cmux. Manual terminal attachment.

FAQ

How many agents can run in parallel?
AO: no hard limit, defaults to 5 concurrent (configurable). Symphony: defaults to 10 with per-state limits. T3 Code: no hard limit but designed for focused work.

Which one will still exist in 6 months?
AO: backed by Composio, actively maintained. Symphony: backed by OpenAI, "engineering preview" — production commitment unclear. T3 Code: backed by Theo/Ping, active development. Cmux: backed by Manaflow (YC S24), actively maintained.

Solo dev or team?
Solo: T3 Code or AO (manual spawn). Small team: AO with dashboard. Platform/infra team: AO (plugin arch) or Symphony (if on Linear). Enterprise: evaluate carefully — none are enterprise-hardened yet.

The bottom line

AO and Symphony compete at the orchestration layer. T3 Code and Cmux sit at different layers entirely.

The question isn't "which is best." It's "which layer are you missing?"

Full discussion with architecture deep-dive and security FAQ: GitHub Discussion

Looking for volunteers

We want hands-on, honest comparisons — not marketing, just data:

AO vs Symphony — Same backlog, same codebase. Time-to-PR, fix rate, human time spent, cost.
AO vs T3 Code — Same issue. Autonomous vs human-in-the-loop when the agent gets something wrong.
AO + Cmux — Does Cmux actually improve the AO supervision experience?

Interested? Drop a comment on the GitHub Discussion.

Compiled March 2026. This space moves fast — comments welcome if anything's outdated.

A Hybrid Key Architecture for Autonomous Agent Credential Management

Dhruv Sharma — Mon, 09 Mar 2026 13:34:57 +0000

AI agents that move money on-chain have a problem nobody talks about cleanly: who holds the keys?

That's the problem I ran into building Fishnet, an AI agent transaction security proxy in Rust. Fishnet sits between the AI agent and the chain — a control plane that necessarily holds signing keys. You can't give it zero secrets. So the question becomes: how do you minimize blast radius when secrets are unavoidable?

The naive answer is to pick one storage primitive and use it for everything. That breaks down immediately when your system has multiple cryptographic operations with different security requirements. Keychain is good for secret storage but not the same thing as hardware-backed signing. In this flow, Secure Enclave gives me P-256, while Ethereum signing requires secp256k1. File storage is portable, but it mostly relies on filesystem permissions rather than hardware isolation.

The answer I landed on: use the right storage primitive for each key's threat model, and compose them behind a clean trait abstraction.

The Architecture at a Glance

Fishnet sits between the AI agent and the chain. Every transaction goes through it. That means Fishnet holds three distinct cryptographic identities — each with a completely different threat model.

Three keys. Three blast radii. Vault compromise does not imply signing access. Signing compromise does not imply credential access. The approval key is hardware-backed when Secure Enclave mode is active.

The Three Operations

Operation	Key Type	Threat Model	Storage
Vault encryption	Symmetric (256-bit)	Credential exposure at rest	Argon2id-derived key, optionally cached in Keychain
Onchain approval	P-256 asymmetric	Unauthorized permit approval and replay	Secure Enclave in runtime; software signer type exists for tests and explicit construction paths
Ethereum signing	secp256k1 asymmetric	Unauthorized permit signing	File (`.hex`)

Layer 1: Vault Encryption (Argon2id + Keychain)

The credential vault stores API keys encrypted at rest. Its encryption key is derived from a user password using Argon2id, a widely recommended memory-hard password KDF. Fishnet can also cache that derived 32-byte key in macOS Keychain when the operator opts in, so the security story has two paths: password-based unlock when the cache is absent, and Keychain-protected unlock when the cache is present.

const ARGON2_MEMORY_COST_KIB: u32 = 262_144;  // 256 MiB
const ARGON2_TIME_COST: u32 = 3;
const ARGON2_PARALLELISM: u32 = 1;
const DERIVED_KEY_LEN: usize = 32;

The 256 MiB memory cost is intentional. When the password-based unlock path is used, it pushes brute-force cost into memory bandwidth as well as compute, which makes large-scale GPU cracking materially more expensive and less efficient. It does not make GPU attacks impossible; it raises their cost.
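A back-of-the-envelope sketch of why that matters: 262,144 KiB is 268,435,456 bytes (256 MiB) per guess, so device memory caps how many guesses can run at once. The helper names are illustrative, the constant is widened to u64 for the arithmetic, and the 24 GiB device in the usage note is a hypothetical example, not a benchmark.

```rust
const ARGON2_MEMORY_COST_KIB: u64 = 262_144;

/// Bytes of RAM a single Argon2id evaluation needs at this memory cost.
const fn bytes_per_guess() -> u64 {
    ARGON2_MEMORY_COST_KIB * 1024
}

/// Upper bound on guesses that fit in `device_bytes` of memory at once.
/// Ignores bandwidth, which is usually the tighter limit in practice.
fn max_parallel_guesses(device_bytes: u64) -> u64 {
    device_bytes / bytes_per_guess()
}
```

For a hypothetical 24 GiB accelerator, `max_parallel_guesses(24 * 1024 * 1024 * 1024)` is 96, far fewer than the thousands of lanes such hardware can keep busy on a fast, memory-light hash.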

The resulting 32-byte key feeds directly into libsodium's crypto_secretbox_easy, which provides XSalsa20-Poly1305 authenticated encryption (not AES).

Vault Unlock Flow

The version prefix on the cached Keychain entry (derived_hex:v1:) provides a migration path. Future derivation formats can use v2:, v3:, and so on without breaking existing entries.
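A minimal sketch of how a versioned entry like this can be parsed. Only the derived_hex:v1: prefix comes from Fishnet; the payload layout (64 hex characters for the 32-byte key) and the names `parse_cached_key` and `CacheError` are assumptions for illustration.

```rust
const PREFIX: &str = "derived_hex:";

#[derive(Debug, PartialEq)]
enum CacheError {
    MissingPrefix,
    UnsupportedVersion(String),
    BadHex,
}

/// Parse "derived_hex:v1:<64 hex chars>" into a 32-byte key.
fn parse_cached_key(entry: &str) -> Result<[u8; 32], CacheError> {
    let rest = entry.strip_prefix(PREFIX).ok_or(CacheError::MissingPrefix)?;
    let (version, hex) = rest.split_once(':').ok_or(CacheError::MissingPrefix)?;
    match version {
        "v1" => decode_hex_32(hex),
        // A future "v2" arm slots in here without touching v1 entries.
        other => Err(CacheError::UnsupportedVersion(other.to_string())),
    }
}

fn decode_hex_32(hex: &str) -> Result<[u8; 32], CacheError> {
    // Checking every byte up front also guarantees the slicing below
    // lands on character boundaries.
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(CacheError::BadHex);
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16)
            .map_err(|_| CacheError::BadHex)?;
    }
    Ok(out)
}
```

Unknown versions fail loudly instead of being guessed at. A production version would also hold both the entry string and the decoded key in zeroizing buffers.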

The in-memory key is pinned with mlock() where the OS allows it, to keep it out of swap, and zeroed on drop via the zeroize crate:

impl Drop for LockedSecretboxKey {
    fn drop(&mut self) {
        if self.locked {
            unsafe { libc::munlock(self.key.as_ptr().cast(), self.key.len()); }
        }
        self.key.zeroize();
    }
}

On normal teardown, the key bytes are overwritten before the allocator can reuse that memory. That reduces post-use exposure in freed memory, but it does not protect against live-memory capture or a crash that happens before Drop runs.

Caching the derived key in Keychain is a conscious tradeoff: it improves operator ergonomics, but once that cache exists, the strength of that path depends more on Keychain access controls than on Argon2 parameters.

Layer 2: Onchain Approval Key (P-256 + Secure Enclave)

When onchain.approval.enabled is set, Fishnet adds a P-256 second signature requirement before it emits the secp256k1 permit signature. This is a hardware-backed approval proof layered in front of normal onchain permit signing.

That P-256 approval is enforced by Fishnet's control plane, not by the EVM itself. Its purpose is to gate whether the secp256k1 permit signature is ever emitted.

The type names still use BridgeSigner and BridgeApprovalSigner because the feature originated around bridge-style risk controls, but the current runtime wiring applies the approval layer to generic onchain permit issuance.

The BridgeApprovalSigner trait makes the approval layer pluggable. In the current macOS runtime, the signer is Secure Enclave-backed when the platform allows it. A software P-256 signer type also exists in the codebase for tests and explicit construction paths:

pub trait BridgeApprovalSigner: Send + Sync {
    fn mode(&self) -> &str;
    fn public_key_hex(&self) -> &str;
    fn sign_prehash(&self, prehash: &[u8; 32]) -> Result<P256Signature, SignerError>;
}

When the persistent Secure Enclave path is active, the key is created with:

kSecAccessControlPrivateKeyUsage — usable for private-key operations like signing
kSecAccessControlUserPresence — user presence required
kSecAttrAccessibleWhenUnlockedThisDeviceOnly — inaccessible while the device is locked and bound to that device

The non-exportability comes from Secure Enclave key generation itself; the ThisDeviceOnly accessibility class keeps the keychain item from migrating to another device.

The graceful degradation story is important. If persistent Secure Enclave storage is denied or unavailable, which is common in unsigned CLI or non-interactive contexts, Fishnet falls back to a session-only Secure Enclave key and surfaces the mode string to the caller:

Mode string	Meaning
`p256-secure-enclave-bridge`	Hardware-backed, persists across restarts
`p256-secure-enclave-bridge-session`	Hardware-backed, rotates on restart
`p256-local-bridge`	Software signer type present in tests/dev code, not the automatic runtime fallback on this branch

It never silently downgrades from persistent to session-only Secure Enclave storage without labeling the mode. On the current branch, non-macOS runtime approval is fail-closed rather than an automatic software fallback.
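On the caller side, a surfaced mode string is only useful if something acts on it. A hedged sketch of what that could look like: the three strings come from the table above, while `ApprovalMode` and the `accept` policy are hypothetical and not part of Fishnet.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalMode {
    EnclavePersistent,
    EnclaveSession,
    LocalSoftware,
}

impl ApprovalMode {
    /// Map a mode string to a typed value; unknown strings are rejected.
    fn parse(mode: &str) -> Option<Self> {
        match mode {
            "p256-secure-enclave-bridge" => Some(Self::EnclavePersistent),
            "p256-secure-enclave-bridge-session" => Some(Self::EnclaveSession),
            "p256-local-bridge" => Some(Self::LocalSoftware),
            _ => None,
        }
    }

    fn hardware_backed(self) -> bool {
        !matches!(self, Self::LocalSoftware)
    }
}

/// Hypothetical caller policy: fail closed on software or unknown modes.
fn accept(mode: &str) -> bool {
    ApprovalMode::parse(mode).map_or(false, ApprovalMode::hardware_backed)
}
```

Parsing into an enum means a new mode string is rejected until someone decides explicitly how to treat it.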

Layer 3: Ethereum Signing Key (secp256k1 + file)

EIP-712 permit signing happens on every agent transaction. The secp256k1 key lives in a hex file with 0600 permissions. The tradeoff is made for portability: Linux agents do not have macOS Keychain, so a file is the one store available on every target. The on-chain nonce provides the final replay backstop.

The address derivation follows the Ethereum spec exactly:

pub fn try_from_bytes(secret_bytes: [u8; 32]) -> Result<Self, SignerError> {
    let signing_key = SigningKey::from_bytes((&secret_bytes).into())?;
    let verifying_key = signing_key.verifying_key();
    let public_key_bytes = verifying_key.to_encoded_point(false); // uncompressed (65 bytes)
    let hash = Keccak256::digest(&public_key_bytes.as_bytes()[1..]); // drop 0x04 prefix
    let mut address = [0u8; 20];
    address.copy_from_slice(&hash[12..]); // last 20 bytes
    Ok(Self { signing_key, address })
}

The uint48 footgun

The permit schema uses a uint48 expiry field, while Rust stores it as u64. If the Rust side accepts values above 2^48 - 1, the request is now outside the Solidity type's valid domain. Depending on the encoder or verifier, that can show up as rejected inputs, invalid typed-data payloads, or signatures that no longer match what the contract expects to hash.

Fishnet validates this at the boundary before any signature runs:

const UINT48_MAX: u64 = (1u64 << 48) - 1;

if self.expiry > UINT48_MAX {
    return Err(SignerError::InvalidPermit(format!(
        "expiry {} exceeds uint48 max ({}), invalid for Solidity uint48",
        self.expiry, UINT48_MAX
    )));
}

Hard rejection at input. Not a warning. Not a clamp. A rejection that keeps off-chain inputs inside the exact range the Solidity side accepts.

Composing the Layers: `BridgeSigner`

The three layers compose cleanly. BridgeSigner wraps any SignerTrait (the secp256k1 signer) with any BridgeApprovalSigner (P-256, backed by software or by Secure Enclave). Despite the name, this wrapper currently sits in the generic onchain permit path:

pub struct BridgeSigner {
    inner: Arc<dyn SignerTrait>,                     // secp256k1 layer
    approval_signer: Arc<dyn BridgeApprovalSigner>,  // P-256 layer
    approval_ttl_seconds: u64,
    replay_cache: Mutex<HashMap<[u8; 32], u64>>,     // keyed by derived replay hash over stable permit fields
}

Approval Signing Flow

Step 7 (sign, then verify) catches key corruption immediately rather than producing an invalid proof that propagates deeper into the system. Step 8's rollback ensures a failed secp256k1 signing does not leave a consumed replay cache entry behind that would block a retry.
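The reserve-then-rollback behavior can be sketched in isolation. This is an illustration of the pattern, not BridgeSigner's code: `ReplayCache`, `guarded`, and `GuardError` are hypothetical names, timestamps are plain Unix seconds passed in by the caller, and the closure stands in for the secp256k1 signing call.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct ReplayCache {
    ttl_seconds: u64,
    seen: Mutex<HashMap<[u8; 32], u64>>, // replay hash -> expiry (unix seconds)
}

#[derive(Debug, PartialEq)]
enum GuardError<E> {
    Replay,
    Inner(E),
}

impl ReplayCache {
    fn new(ttl_seconds: u64) -> Self {
        Self { ttl_seconds, seen: Mutex::new(HashMap::new()) }
    }

    /// Reserve `hash`, run `sign`, and release the reservation if `sign` fails.
    fn guarded<T, E>(
        &self,
        hash: [u8; 32],
        now: u64,
        sign: impl FnOnce() -> Result<T, E>,
    ) -> Result<T, GuardError<E>> {
        {
            let mut seen = self.seen.lock().unwrap();
            seen.retain(|_, expiry| *expiry > now); // drop expired entries
            if seen.contains_key(&hash) {
                return Err(GuardError::Replay);
            }
            seen.insert(hash, now + self.ttl_seconds);
        } // lock released before the (possibly slow) signing call

        sign().map_err(|e| {
            // Rollback: a failed signature must not block a legitimate retry.
            self.seen.lock().unwrap().remove(&hash);
            GuardError::Inner(e)
        })
    }
}
```

Reserving before signing closes the window in which two concurrent requests could both pass the replay check; the rollback keeps that strictness from turning a transient signer failure into a lockout.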

Key Hierarchy Summary

┌──────────────────┬──────────────────────┬───────────────────────┐
│   Vault Layer    │   Approval Layer     │   Signing Layer       │
├──────────────────┼──────────────────────┼───────────────────────┤
│ Argon2id         │ P-256 (secp256r1)    │ secp256k1             │
│   ↓              │                      │ (k256 crate)          │
│ XSalsa20-Poly    │                      │                       │
├──────────────────┼──────────────────────┼───────────────────────┤
│ macOS Keychain   │ Secure Enclave       │ .hex file             │
│ (cached key)     │ (user presence)      │ (0600 permissions)    │
├──────────────────┼──────────────────────┼───────────────────────┤
│ + mlock()        │ In enclave mode,     │ Validated at input    │
│ + zeroize on drop│ key stays on-chip    │ (uint48, U256, addr)  │
├──────────────────┼──────────────────────┼───────────────────────┤
│ Protects:        │ Protects:            │ Produces:             │
│ API credentials  │ Permit approvals     │ EIP-712 permit sigs   │
│ at rest          │ from replay + abuse  │ for on-chain actions  │
└──────────────────┴──────────────────────┴───────────────────────┘

What This Architecture Gets Right

Blast radius containment. Each key has exactly one job. Compromising the secp256k1 key lets an attacker sign Ethereum transactions, but not decrypt vault credentials. Compromising the vault key exposes API keys, but doesn't enable on-chain actions. The approval key adds a second factor that must be compromised independently — and, in Secure Enclave mode, it never leaves hardware.

Hardware backing where it matters. The approval key is a likely target for a "sign this transaction" attack. When Secure Enclave mode is active, the private key is non-exportable and isolated from normal process memory.

Graceful degradation without silent failure. When persistent Secure Enclave storage is unavailable, Fishnet surfaces the mode string to callers. No silent downgrade to session-only mode, and no automatic runtime software fallback on unsupported platforms.

Versioned storage formats. The Keychain prefix (derived_hex:v1:), replay cache key (fishnet-bridge-replay-v1|), and intent hash prefix (fishnet-bridge-approval-v1|) all include version identifiers. The approval-related prefixes still carry bridge-flavored names for historical reasons, but the versioning itself is what matters. Future migrations can introduce new formats without ambiguous parsing or ad hoc compatibility logic.

Boundary validation. Rust uses u64, Solidity expects uint48, and Fishnet rejects out-of-range values before signing.
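As an illustration of that boundary check, here is a minimal sketch. `check_uint48`, `OutOfRange`, and the `"expiry"` field name in the usage note are hypothetical and not taken from Fishnet's code; the only fixed fact is that a Solidity `uint48` tops out at 2^48 - 1.

```rust
/// Largest value representable in a Solidity `uint48` (2^48 - 1).
pub const UINT48_MAX: u64 = (1u64 << 48) - 1;

/// Error returned when a Rust-side u64 does not fit the on-chain type.
#[derive(Debug, PartialEq)]
pub struct OutOfRange {
    pub field: &'static str,
    pub value: u64,
}

/// Reject values that would overflow a `uint48` before anything is signed.
pub fn check_uint48(field: &'static str, value: u64) -> Result<u64, OutOfRange> {
    if value > UINT48_MAX {
        Err(OutOfRange { field, value })
    } else {
        Ok(value)
    }
}
```

A caller would run something like `check_uint48("expiry", expiry)?` on each bounded field before building the EIP-712 payload, so an oversized value fails loudly on the Rust side instead of producing a signature the contract can never accept.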

What I'd Do Differently

The secp256k1 key in a hex file is the weakest link. For production, this should move to an HSM, KMS, or another OS-managed key store appropriate to the deployment target. The hex file was chosen for portability, but that is still architectural debt worth acknowledging explicitly.

The replay cache is in-memory only. A process restart clears it, meaning a cached permit could be replayed across a restart boundary. For Fishnet's current use case, the on-chain nonce provides the final replay protection, but a persistent replay store would be more robust.

The goal is always to minimize what any single compromise can reach. When you can't give your control plane zero secrets, the next best thing is ensuring each secret only unlocks one blast radius.

How do you handle key management in systems where secrets are unavoidable?

How I Saved 20,000 Gas Per Transaction by Reordering One Line in Solidity

Dhruv Sharma — Sun, 01 Mar 2026 18:58:02 +0000

While building a smart wallet contract for Fishnet — an AI agent transaction security proxy — I ran a self-imposed code review and found a subtle optimization that every Solidity developer should know about.

One variable reorder. 20,000 gas saved per transaction.

Here's the full breakdown.

The Problem: Silent Storage Slot Waste

My state variables looked like this:

address public owner;          // 20 bytes → Slot 0
address public fishnetSigner;  // 20 bytes → Slot 1
mapping(uint256 => bool) public usedNonces; // Slot 2
bool public paused;            // 1 byte  → Slot 3  ← wasting 31 bytes

That bool paused at the bottom? It's only 1 byte, but it was consuming an entire 32-byte storage slot. That's 31 bytes of wasted space and, more importantly, a separate slot to read on every pause check and to write on every pause.

Why the EVM cares

The EVM operates on 32-byte words. Every storage slot is exactly 32 bytes. When the Solidity compiler lays out your state variables, it goes top to bottom in declaration order:

Slot 0: [owner ─────────────── 20 bytes][── 12 bytes empty ──]
Slot 1: [fishnetSigner ─────── 20 bytes][── 12 bytes empty ──]
Slot 2: [usedNonces mapping hash ───────────────── 32 bytes ─]
Slot 3: [paused ─ 1 byte][─────── 31 bytes empty ───────────]

The compiler does not reorder your variables for you. If a variable can't fit in the remaining space of the current slot, it starts a new one. An address is 20 bytes. A bool is 1 byte. They fit together with 11 bytes to spare — but only if they're adjacent in your declaration.
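The packing rule is simple enough to simulate. The sketch below is written in Rust to match the rest of the Fishnet tooling and is a simplified model, not the compiler's actual algorithm: it handles only fixed-size value types plus types that always take a fresh full slot (such as mappings). `Var`, `layout`, and `slots_used` are names invented for this example.

```rust
/// A state variable as this simplified layout model sees it.
pub struct Var {
    pub name: &'static str,
    /// Size in bytes (20 for address, 1 for bool, 32 for a mapping's slot).
    pub bytes: u32,
    /// True for types that always start a new slot and occupy it fully.
    pub own_slot: bool,
}

/// Assign (name, slot, offset) to each variable in declaration order.
pub fn layout(vars: &[Var]) -> Vec<(&'static str, u32, u32)> {
    let mut out = Vec::new();
    let mut slot = 0u32;
    let mut used = 0u32; // bytes already occupied in the current slot
    for v in vars {
        // Start a new slot if this variable demands one or does not fit.
        let needs_fresh = v.own_slot || used + v.bytes > 32;
        if needs_fresh && used > 0 {
            slot += 1;
            used = 0;
        }
        out.push((v.name, slot, used));
        if v.own_slot {
            // Nothing may share this slot; the next variable starts fresh.
            slot += 1;
            used = 0;
        } else {
            used += v.bytes;
        }
    }
    out
}

/// Number of top-level slots the declaration order consumes.
pub fn slots_used(vars: &[Var]) -> u32 {
    layout(vars)
        .iter()
        .map(|&(_, slot, _)| slot + 1)
        .max()
        .unwrap_or(0)
}
```

Feeding it `owner`, `fishnetSigner`, `usedNonces`, `paused` in that order yields four slots; moving `paused` directly after `owner` yields three, with `paused` at offset 20 of slot 0.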

The Fix: Storage Slot Packing

Move paused right after owner:

address public owner;          // 20 bytes ─┐
bool public paused;            // 1 byte  ──┘ Slot 0 (21/32 bytes)
address public fishnetSigner;  // 20 bytes → Slot 1
mapping(uint256 => bool) public usedNonces; // Slot 2

New layout:

Slot 0: [owner ─────────────── 20 bytes][paused 1B][─ 11 bytes empty ─]
Slot 1: [fishnetSigner ─────── 20 bytes][── 12 bytes empty ──────────]
Slot 2: [usedNonces mapping hash ───────────────── 32 bytes ─────────]

4 slots → 3 slots. One fewer storage slot touched at runtime.

The Gas Math

Here's what this saves in practice:

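| Operation | Before (separate slots) | After (packed) | Savings |
|-----------|-------------------------|----------------|---------|
| Cold `SLOAD` (first read in tx) | 2,100 gas × 2 slots | 2,100 gas × 1 slot | 2,100 gas |
| `SSTORE` when pausing | ~22,100 gas (cold slot, zero → nonzero) | ~2,900 gas if the slot is warm, ~5,000 if cold (slot is already nonzero because of `owner`) | ~17,000 to ~19,000 gas |
| `whenNotPaused` modifier per call | Reads its own slot | Reads `owner`'s slot (often already warm) | Up to 2,000 gas |

The big win is on the write path. In the original layout, paused sat alone in a slot whose value is zero while the contract is unpaused, so pausing was a zero-to-nonzero write: 20,000 gas, plus 2,100 for touching a cold slot. Once paused shares a slot with owner, that slot is already nonzero, so the same write is priced as an update: about 2,900 gas if the slot is warm (for example because owner was read earlier in the transaction), or about 5,000 if it is cold. Note that this saving lands on the transaction that sets the flag, not on every call; ordinary calls benefit from the cheaper read in the whenNotPaused check.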
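To sanity-check the arithmetic, here is a rough model of post-Berlin (EIP-2929) storage pricing, written in Rust like the earlier Fishnet code. The constants are the protocol values; the function names are made up for this sketch, and it deliberately ignores refunds and repeat writes to the same slot within one transaction.

```rust
/// Extra cost of touching a storage slot for the first time in a transaction.
pub const COLD_SLOAD: u64 = 2_100;
/// Cost of reading a slot that is already warm.
pub const WARM_ACCESS: u64 = 100;
/// Base cost of writing a nonzero value into a slot that currently holds zero.
pub const SSTORE_SET: u64 = 20_000;
/// Base cost of changing a slot that already holds a nonzero value.
pub const SSTORE_RESET: u64 = 2_900;

/// Cost of one SLOAD, depending on whether the slot is warm.
pub fn sload_cost(warm: bool) -> u64 {
    if warm { WARM_ACCESS } else { COLD_SLOAD }
}

/// Cost of the first value-changing SSTORE to a slot in a transaction.
/// Assumes that if the slot was zero, the new value is nonzero.
pub fn first_sstore_cost(slot_was_zero: bool, warm: bool) -> u64 {
    let base = if slot_was_zero { SSTORE_SET } else { SSTORE_RESET };
    let access = if warm { 0 } else { COLD_SLOAD };
    base + access
}

/// Gas saved when pausing by packing `paused` next to `owner`.
/// `owner_slot_warm` says whether slot 0 was already touched in this tx.
pub fn pause_savings(owner_slot_warm: bool) -> u64 {
    // Before: `paused` alone in a cold slot holding zero.
    let before = first_sstore_cost(true, false);
    // After: `paused` shares slot 0, which is nonzero because of `owner`.
    let after = first_sstore_cost(false, owner_slot_warm);
    before - after
}
```

Under these assumptions the unpacked pause write costs 22,100 gas, the packed one costs 2,900 (warm) or 5,000 (cold), and the per-call read in a pause check drops from 2,100 to 100 gas when the shared slot is already warm.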

How to Check Your Own Contracts

Foundry makes this trivial:

forge inspect YourContract storage-layout

This outputs every state variable with its slot number, offset, and byte size. Look for:

Variables that could pack together (combined size ≤ 32 bytes) but are in separate slots
bool, uint8, uint16, address separated by mappings or larger types
Related variables read together that are in different slots

Example output:

| Name          | Type                        | Slot | Offset | Bytes |
|---------------|-----------------------------|------|--------|-------|
| owner         | address                     | 0    | 0      | 20    |
| paused        | bool                        | 0    | 20     | 1     |
| fishnetSigner | address                     | 1    | 0      | 20    |
| usedNonces    | mapping(uint256 => bool)    | 2    | 0      | 32    |

When Offset > 0, you've got packing happening. When small types have Offset = 0 and their own slot — that's a packing opportunity.

5 Other Things I Found in the Same Review

Storage packing was the optimization win, but the same code review caught much more:

1. Critical permit.value vulnerability

The execute() function accepted a permit signature but never validated that permit.value matched msg.value. An attacker could get a permit signed for 0.01 ETH but submit the transaction with 100 ETH, draining the wallet.

// Before: no validation
function execute(Permit calldata permit, ...) external payable {
    // permit.value could be anything vs msg.value
}

// After: explicit check, placed at the top of execute()
// (require with a custom error needs Solidity 0.8.26 or later)
require(permit.value == msg.value, InsufficientValue());

2. Chain ID validation for fork protection

The contract cached DOMAIN_SEPARATOR at deployment but never recomputed it. On a chain fork (like ETH/ETH Classic), signatures from one chain would be valid on the other.

function _domainSeparator() internal view returns (bytes32) {
    if (block.chainid == _CACHED_CHAIN_ID) {
        return _CACHED_DOMAIN_SEPARATOR;
    }
    return _computeDomainSeparator(); // recompute on fork
}

3. Fail-fast signature validation

The original code ran an expensive keccak256 hash before checking if the signature was even the right length. Flipping the order saves gas on every invalid input.

// Before: hash first, then check length
bytes32 hash = keccak256(abi.encodePacked(...));
require(signature.length == 65, InvalidSignature());

// After: check length first, hash only if valid
require(signature.length == 65, InvalidSignature());
bytes32 hash = keccak256(abi.encodePacked(...));

4. Custom errors over string reverts

Replaced all require(condition, "String message") with custom errors. Each string revert stores the message in bytecode and costs ~50 extra gas per revert.

// Before
require(msg.sender == owner, "Not authorized");

// After
error Unauthorized(); // declared once, at contract or file level
if (msg.sender != owner) revert Unauthorized(); // inside the function body

5. Dead test code cleanup

Found leftover console.log imports and unused test helper functions that had accumulated during rapid iteration. They don't affect runtime gas, but they bloat deployment bytecode.

Key Takeaway

Code review isn't just about finding bugs. It's about understanding the machine your code runs on.

The EVM has a 32-byte word size, and every storage slot costs real money. Knowing how the compiler lays out storage is the difference between a contract that costs users $2 per transaction and one that costs $5.

Run forge inspect YourContract storage-layout. Look at your slot assignments. You might be surprised what you find.

This came out of building Fishnet — an open-source security proxy for AI agent transactions on Ethereum. If you're working on AI × Web3 infra, check it out.