<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Highpass Studio</title>
    <description>The latest articles on DEV Community by Highpass Studio (@highpass_studio_382ce5641).</description>
    <link>https://dev.to/highpass_studio_382ce5641</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3834240%2F1f62459d-6ea5-4825-a012-8a7c395b46c5.png</url>
      <title>DEV Community: Highpass Studio</title>
      <link>https://dev.to/highpass_studio_382ce5641</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/highpass_studio_382ce5641"/>
    <language>en</language>
    <item>
      <title>AI memory is broken. We built one that forgets.</title>
      <dc:creator>Highpass Studio</dc:creator>
      <pubDate>Sun, 05 Apr 2026 01:35:33 +0000</pubDate>
      <link>https://dev.to/highpass_studio_382ce5641/ai-memory-is-broken-we-built-one-that-forgets-dmc</link>
      <guid>https://dev.to/highpass_studio_382ce5641/ai-memory-is-broken-we-built-one-that-forgets-dmc</guid>
      <description>&lt;p&gt;Every agent framework has the same problem with memory: it doesn't forget.&lt;/p&gt;

&lt;p&gt;Context windows reset between sessions. RAG and vector DBs store everything with equal weight and grow until they're noisy. So when your project changes direction two weeks in, the AI still pulls up week-one decisions like they're current.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this actually looks like
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Week 1:&lt;/strong&gt; You tell the agent "we're using React for the frontend."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 2:&lt;/strong&gt; You switch. "Moving to Svelte, React bundle is too big."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Week 4:&lt;/strong&gt; You ask "what's our frontend stack?"&lt;/p&gt;

&lt;p&gt;A normal retrieval system hands back both answers. React and Svelte sit side by side with equal weight. Nothing in the system knows one replaced the other. So the agent might reference React, Svelte, or some confused mix of both.&lt;/p&gt;

&lt;p&gt;We kept running into this while building agent tooling, and it became clear the issue isn't retrieval quality — it's that these systems have no concept of time or obsolescence.&lt;/p&gt;

&lt;h2&gt;
  
  
  The numbers
&lt;/h2&gt;

&lt;p&gt;We ran a 4-week simulated project through both systems. 24 events total — decisions, corrections, errors, repeated observations. Two major direction changes mid-project.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Naive&lt;/th&gt;
&lt;th&gt;Sparsion&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Top result correct&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pruned stale memories&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrievable at week 4&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Naive retrieval puts a stale entry on top. Sparsion puts the correction first — salience 1.65 vs 0.55 for the outdated original.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Sparsion actually does
&lt;/h2&gt;

&lt;p&gt;It treats memory as a lifecycle instead of a log.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Events → Salience Scoring → Hot → Warm → Cold → Forgotten
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
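&lt;p&gt;The post doesn't give the tier boundaries, but the mapping is easy to picture. A minimal sketch with made-up cutoffs:&lt;/p&gt;

```python
# Hypothetical tier cutoffs: illustrative only, the post doesn't state
# Sparsion's actual thresholds.
def tier(salience: float) -> str:
    """Map a salience score onto the lifecycle stages in the pipeline above."""
    if salience >= 2.0:
        return "Hot"
    if salience >= 0.5:
        return "Warm"
    if salience >= 0.1:
        return "Cold"
    return "Forgotten"  # below the salience floor: dropped from retrieval entirely
```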



&lt;ul&gt;
&lt;li&gt;Old memories weaken over time (exponential decay, configurable half-life)&lt;/li&gt;
&lt;li&gt;Repeated events get stronger (log-frequency)&lt;/li&gt;
&lt;li&gt;You can flag things as critical — those survive 4x longer&lt;/li&gt;
&lt;li&gt;Corrections score 3x higher than observations by default&lt;/li&gt;
&lt;li&gt;Anything below a salience floor gets dropped from retrieval entirely&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A critical correction enters the system at salience 13.18. A throwaway observation enters at 0.77. After six weeks with no reinforcement, the observation is gone. The correction is still there.&lt;/p&gt;
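&lt;p&gt;Numbers like those come out of a formula shaped roughly like the following sketch (hypothetical weights and constants, not Sparsion's actual values):&lt;/p&gt;

```python
import math

# Hypothetical weights and constants: illustrative only, not Sparsion's actual values.
TYPE_WEIGHTS = {"correction": 3.0, "decision": 2.0, "error": 1.5,
                "action": 1.2, "observation": 1.0}
IMPORTANCE = {"low": 0.5, "normal": 1.0, "high": 2.0, "critical": 4.0}

def salience(event_type, importance, age_days, repeats=1, half_life_days=14.0):
    """Type weight times importance, reinforced by log-frequency, eroded by decay."""
    # Critical memories survive 4x longer: stretch their half-life.
    hl = half_life_days * (4.0 if importance == "critical" else 1.0)
    base = TYPE_WEIGHTS[event_type] * IMPORTANCE[importance]
    reinforcement = 1.0 + math.log(repeats)  # repeated events get stronger
    decay = 0.5 ** (age_days / hl)           # exponential decay, configurable half-life
    return base * reinforcement * decay

# A fresh critical correction towers over a six-week-old plain observation:
print(salience("correction", "critical", age_days=0))   # 12.0
print(salience("observation", "normal", age_days=42))   # 0.125
```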

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sparsion&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Runtime&lt;/span&gt;

&lt;span class="n"&gt;rt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_memory.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Week 1
&lt;/span&gt;&lt;span class="n"&gt;rt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;decision&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Frontend framework: React&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;high&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Week 2
&lt;/span&gt;&lt;span class="n"&gt;rt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;record&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;correction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Switching to Svelte — React bundle too large&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Query
&lt;/span&gt;&lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frontend&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;tier&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (salience: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salience&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [Hot] Switching to Svelte — React bundle too large (salience: 13.18)
# [Hot] Frontend framework: React (salience: 4.39)
&lt;/span&gt;
&lt;span class="c1"&gt;# Age everything
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sweep&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Forgot &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;forgotten&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; stale memories&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Under the hood
&lt;/h2&gt;

&lt;p&gt;Rust core, Python bindings via PyO3/maturin, SQLite for storage. No model dependency — salience scoring is heuristic for now.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Rust core
  ├── Event store (SQLite)
  ├── Salience scorer
  ├── Tier manager (hot/warm/cold)
  ├── Decay engine
  └── Ranked retrieval
       ↓
  PyO3 → Python SDK (pip install sparsion)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Tests: 12 Rust unit, 5 integration (deterministic time via MockClock), 4 Python end-to-end.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's in v0.1
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Temporal decay with configurable half-life&lt;/li&gt;
&lt;li&gt;Reinforcement through repetition&lt;/li&gt;
&lt;li&gt;Importance hints (low/normal/high/critical)&lt;/li&gt;
&lt;li&gt;Event type weighting — corrections &amp;gt; decisions &amp;gt; errors &amp;gt; actions &amp;gt; observations&lt;/li&gt;
&lt;li&gt;Tier migration and a forgetting sweep through storage&lt;/li&gt;
&lt;li&gt;Python SDK&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's coming
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Plugging into real agent workflows&lt;/li&gt;
&lt;li&gt;Bigger benchmarks, longer time horizons&lt;/li&gt;
&lt;li&gt;Contradiction-aware updates&lt;/li&gt;
&lt;li&gt;LangChain memory backend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're building agents and keep hitting stale context problems, I'd like to hear about your use case.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Sparsion Runtime&lt;/strong&gt; — &lt;a href="https://github.com/HighpassStudio/sparsion-runtime" rel="noopener noreferrer"&gt;github.com/HighpassStudio/sparsion-runtime&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>rust</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Your logs are still a text file</title>
      <dc:creator>Highpass Studio</dc:creator>
      <pubDate>Sun, 22 Mar 2026 04:52:12 +0000</pubDate>
      <link>https://dev.to/highpass_studio_382ce5641/your-logs-are-still-a-text-file-3gl4</link>
      <guid>https://dev.to/highpass_studio_382ce5641/your-logs-are-still-a-text-file-3gl4</guid>
      <description>&lt;h2&gt;
  
  
  Your logs are still a text file
&lt;/h2&gt;

&lt;p&gt;Every incident investigation starts the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;zgrep &lt;span class="s2"&gt;"user_id=51013"&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;...and you wait.&lt;/p&gt;

&lt;p&gt;30 seconds. A minute.&lt;/p&gt;

&lt;p&gt;You tweak the query. Run it again. Another minute.&lt;/p&gt;

&lt;p&gt;Same files. Same decompression. Same full scan.&lt;/p&gt;

&lt;p&gt;After ten queries, you've spent ten minutes rereading the same data.&lt;/p&gt;

&lt;h2&gt;
  
  
  What if grep could remember?
&lt;/h2&gt;

&lt;p&gt;I built xgrep for that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-time: build index (~2 min for 1.7GB)&lt;/span&gt;
xgrep &lt;span class="nt"&gt;--build-index&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz

&lt;span class="c"&gt;# Every query after that&lt;/span&gt;
xgrep &lt;span class="s2"&gt;"user_id=51013"&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz    &lt;span class="c"&gt;# 25ms&lt;/span&gt;
xgrep &lt;span class="s2"&gt;"ERROR"&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz            &lt;span class="c"&gt;# 25ms&lt;/span&gt;
xgrep &lt;span class="s2"&gt;"timeout.*conn"&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz    &lt;span class="c"&gt;# 25ms&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Instead of decompressing everything every time, xgrep:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;splits logs into 64KB blocks&lt;/li&gt;
&lt;li&gt;builds a bloom filter per block&lt;/li&gt;
&lt;li&gt;only reads blocks that might match&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything else is skipped.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The result: read 1% of the data instead of 100%.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That's the whole idea.&lt;/p&gt;
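&lt;p&gt;A rough picture of the mechanism, token-level and simplified (the real index surely differs in hashing and layout):&lt;/p&gt;

```python
import hashlib

BLOCK = 64 * 1024  # 64KB blocks, matching the post

def bloom_bits(token, num_bits=8192, k=4):
    """k bit positions for one token (slices of a sha256 digest)."""
    digest = hashlib.sha256(token.encode()).digest()
    return [int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % num_bits
            for i in range(k)]

def build_index(data: bytes):
    """Split into fixed-size blocks; record a bloom bitset per block."""
    index = []
    for off in range(0, len(data), BLOCK):
        bits = set()
        # Whitespace tokens for simplicity; a real index would also handle
        # tokens that straddle block boundaries.
        for tok in data[off:off + BLOCK].split():
            bits.update(bloom_bits(tok.decode(errors="ignore")))
        index.append((off, bits))
    return index

def candidate_blocks(index, query):
    """A block can match only if every hash bit of the query is set."""
    want = bloom_bits(query)
    return [off for off, bits in index if all(b in bits for b in want)]
```

Only the candidate blocks ever get read and verified; everything else is skipped without touching the bytes.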

&lt;h2&gt;
  
  
  Benchmarks on real production logs
&lt;/h2&gt;

&lt;p&gt;Datasets from &lt;a href="https://github.com/logpai/loghub" rel="noopener noreferrer"&gt;LogHub&lt;/a&gt;: Hadoop (HDFS), Blue Gene/L (BGL), and Spark.&lt;/p&gt;

&lt;h3&gt;
  
  
  HDFS — 7.5GB decompressed
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;xgrep&lt;/th&gt;
&lt;th&gt;zgrep&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Block ID&lt;/td&gt;
&lt;td&gt;30ms&lt;/td&gt;
&lt;td&gt;27s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;913x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;WARN&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;28ms&lt;/td&gt;
&lt;td&gt;25s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;907x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;INFO&lt;/code&gt; (very common)&lt;/td&gt;
&lt;td&gt;23ms&lt;/td&gt;
&lt;td&gt;28s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1,217x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  BGL — 5.0GB decompressed
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;xgrep&lt;/th&gt;
&lt;th&gt;zgrep&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Node ID&lt;/td&gt;
&lt;td&gt;26ms&lt;/td&gt;
&lt;td&gt;17s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;655x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;FATAL&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;25ms&lt;/td&gt;
&lt;td&gt;17s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;708x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  Spark — 3,852 files
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;xgrep&lt;/th&gt;
&lt;th&gt;zgrep&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;Executor&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;5s&lt;/td&gt;
&lt;td&gt;10m&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;118x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ERROR&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;2.7s&lt;/td&gt;
&lt;td&gt;10m&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;220x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are repeated-query (cached) results.&lt;/p&gt;

&lt;p&gt;First query is still ~18x faster than zgrep (parallel decompression), but the real win is every query after that — which is how incident debugging actually works.&lt;/p&gt;




&lt;h2&gt;
  
  
  JSON logs
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;zcat logs.json.gz | jq 'select(.user_id == "42042")'&lt;/code&gt; is the standard workflow. It works. It's also brutally slow — full decompression, full JSON parse, zero skipping, on every query.&lt;/p&gt;

&lt;p&gt;xgrep's &lt;code&gt;-j&lt;/code&gt; flag does field-aware search on NDJSON/JSONL logs:&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark
&lt;/h3&gt;

&lt;p&gt;1M NDJSON lines, 244MB uncompressed, 22MB gzip.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Matches&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;xgrep -j&lt;/th&gt;
&lt;th&gt;Speedup&lt;/th&gt;
&lt;th&gt;Block skip&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;user_id=42042&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;40.6s&lt;/td&gt;
&lt;td&gt;0.22s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;188x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;97%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;status=503&lt;/td&gt;
&lt;td&gt;111,130&lt;/td&gt;
&lt;td&gt;40.6s&lt;/td&gt;
&lt;td&gt;1.75s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;23x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;level=error status=503&lt;/td&gt;
&lt;td&gt;15,838&lt;/td&gt;
&lt;td&gt;40.6s&lt;/td&gt;
&lt;td&gt;1.71s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;24x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Baseline: &lt;code&gt;zcat logs.json.gz | jq 'select(...)'&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Every count matches jq exactly. 9/9, 111,130/111,130, 15,838/15,838. No approximations, no missed lines.&lt;/p&gt;

&lt;h3&gt;
  
  
  How it works
&lt;/h3&gt;

&lt;p&gt;During index build, xgrep hashes three things per JSON field into each block's bloom filter: the field name, the value, and the field-value pair. When you query &lt;code&gt;user_id=42042&lt;/code&gt;, the bloom can distinguish "42042 appears in the user_id field" from "42042 appears somewhere in the line." That precision is what drives the skip rate.&lt;/p&gt;
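&lt;p&gt;That tokenization scheme is simple to sketch (illustrative Python, not xgrep's actual Rust internals):&lt;/p&gt;

```python
import json

def field_tokens(line: str):
    """Hash inputs per JSON line: the field name, the value, and the pair."""
    tokens = []
    for key, val in json.loads(line).items():
        v = str(val)
        tokens.extend([key, v, f"{key}={v}"])
    return tokens

# "42042 in the user_id field" and "42042 somewhere in the line"
# now produce different index entries:
print(field_tokens('{"user_id": "42042", "status": 200}'))
# ['user_id', '42042', 'user_id=42042', 'status', '200', 'status=200']
```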


&lt;h3&gt;
  
  
  Why it's still fast at 0% skip
&lt;/h3&gt;

&lt;p&gt;The selective query (188x) is the classic block-pruning win — 97% of blocks never get read. But the broad queries are the interesting result. At 0% skip, every block is searched, and xgrep is still 23x faster than &lt;code&gt;zcat | jq&lt;/code&gt;. That's because jq parses every line into a full JSON AST and evaluates an expression tree. xgrep does a targeted field lookup — no AST, no expression evaluator, just hash check then verify.&lt;/p&gt;

&lt;p&gt;Two advantages compound: I/O avoidance (skip blocks) and CPU avoidance (lighter evaluation). Even when the first one doesn't apply, the second one still delivers.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works (short version)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Index&lt;/strong&gt;: decompress once, split into blocks, build bloom filters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query&lt;/strong&gt;: check filters, read only candidate blocks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution&lt;/strong&gt;: memory-mapped, OS loads only what's needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The key metric isn't speed. It's &lt;strong&gt;bytes touched per query.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;zgrep: 100% every time&lt;/li&gt;
&lt;li&gt;xgrep: 0.1-1%&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's why the gap grows with data size.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tradeoffs (honest)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cache size&lt;/strong&gt;: ~5x compressed size (stores decompressed data)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;First run&lt;/strong&gt;: ~2 min index build (amortized quickly)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not universal grep&lt;/strong&gt;: built for compressed logs + repeated search&lt;/li&gt;
&lt;li&gt;For plain text: use ripgrep.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Who this is for
&lt;/h2&gt;

&lt;p&gt;If you've ever:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;waited on &lt;code&gt;zgrep&lt;/code&gt; during an incident&lt;/li&gt;
&lt;li&gt;rerun the same search 10 times&lt;/li&gt;
&lt;li&gt;dealt with rotated &lt;code&gt;.gz&lt;/code&gt; logs&lt;/li&gt;
&lt;li&gt;wanted log-platform speed without log-platform overhead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then xgrep was built for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;xgrep-cli
xgrep &lt;span class="s2"&gt;"ERROR"&lt;/span&gt; logs/&lt;span class="k"&gt;*&lt;/span&gt;.gz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/HighpassStudio/xgrep" rel="noopener noreferrer"&gt;github.com/HighpassStudio/xgrep&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Deep dive
&lt;/h2&gt;

&lt;p&gt;Architecture + benchmark methodology: &lt;a href="https://github.com/HighpassStudio/xgrep/blob/main/ARCHITECTURE.md" rel="noopener noreferrer"&gt;ARCHITECTURE.md&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;xgrep is Apache-2.0 licensed. Built with Rust, rayon, memchr, and flate2.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cli</category>
      <category>performance</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>Stop decompressing entire archives to get one file — introducing ARCX</title>
      <dc:creator>Highpass Studio</dc:creator>
      <pubDate>Thu, 19 Mar 2026 20:43:35 +0000</pubDate>
      <link>https://dev.to/highpass_studio_382ce5641/stop-decompressing-entire-archives-to-get-one-file-introducing-arcx-5dhn</link>
      <guid>https://dev.to/highpass_studio_382ce5641/stop-decompressing-entire-archives-to-get-one-file-introducing-arcx-5dhn</guid>
      <description>&lt;p&gt;Most archive formats make a simple task unnecessarily expensive: you need one file, so you download and decompress everything.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;ARCX&lt;/strong&gt;, a compressed archive format designed to fix that.&lt;/p&gt;

&lt;p&gt;ARCX combines cross-file compression (like tar+zstd) with indexed random access (like zip), so you can retrieve a single file from a large archive in milliseconds without decompressing the rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/getarcx/arcx" rel="noopener noreferrer"&gt;https://github.com/getarcx/arcx&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo &lt;span class="nb"&gt;install &lt;/span&gt;arcx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Benchmark results
&lt;/h2&gt;

&lt;p&gt;Across 5 real-world datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;~7ms to retrieve a file from a ~200MB archive&lt;/li&gt;
&lt;li&gt;up to 200x less data read vs tar+zstd&lt;/li&gt;
&lt;li&gt;compression within ~3% of tar+zstd&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;ARCX Bytes Read&lt;/th&gt;
&lt;th&gt;TAR+ZSTD Bytes Read&lt;/th&gt;
&lt;th&gt;Reduction&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Python ML&lt;/td&gt;
&lt;td&gt;326 KB&lt;/td&gt;
&lt;td&gt;63.1 MB&lt;/td&gt;
&lt;td&gt;198x less&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Build Artifacts&lt;/td&gt;
&lt;td&gt;714 KB&lt;/td&gt;
&lt;td&gt;140.4 MB&lt;/td&gt;
&lt;td&gt;202x less&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why this matters
&lt;/h2&gt;

&lt;p&gt;Modern systems don't need entire archives. They need one file, immediately.&lt;/p&gt;

&lt;p&gt;This shows up in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CI/CD pipelines (artifacts)&lt;/li&gt;
&lt;li&gt;cloud storage (partial retrieval)&lt;/li&gt;
&lt;li&gt;large codebases&lt;/li&gt;
&lt;li&gt;package registries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ARCX reduces archive access to a manifest lookup, one block read, and one block decompress.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it works
&lt;/h2&gt;

&lt;p&gt;ARCX uses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;block-based compression&lt;/li&gt;
&lt;li&gt;a binary manifest index&lt;/li&gt;
&lt;li&gt;direct offset reads&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of scanning or decompressing the full archive:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Look up the file in the index&lt;/li&gt;
&lt;li&gt;Seek to the relevant block&lt;/li&gt;
&lt;li&gt;Decompress only that block&lt;/li&gt;
&lt;/ol&gt;
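&lt;p&gt;Those three steps map onto a format sketch like this toy layout (one file per block for brevity; it is not the real ARCX spec, which also compresses across files):&lt;/p&gt;

```python
import io, json, struct, zlib

# Toy layout, not the real ARCX format:
#   [block 0][block 1]...[JSON manifest][8-byte manifest length]

def write_archive(files: dict) -> bytes:
    out, manifest, off = io.BytesIO(), {}, 0
    for name, data in files.items():
        block = zlib.compress(data)            # one file per block, for brevity
        manifest[name] = {"offset": off, "size": len(block)}
        out.write(block)
        off += len(block)
    m = json.dumps(manifest).encode()
    out.write(m)
    out.write(struct.pack(">Q", len(m)))       # manifest goes at the end (see Tradeoffs)
    return out.getvalue()

def read_one(archive: bytes, name: str) -> bytes:
    (mlen,) = struct.unpack(">Q", archive[-8:])
    manifest = json.loads(archive[-8 - mlen:-8])
    entry = manifest[name]                        # 1. look up the file in the index
    start = entry["offset"]
    block = archive[start:start + entry["size"]]  # 2. seek to the relevant block
    return zlib.decompress(block)                 # 3. decompress only that block
```

Retrieval cost here is a manifest parse plus one block decompress, independent of how many other files the archive holds.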

&lt;h2&gt;
  
  
  Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Format&lt;/th&gt;
&lt;th&gt;Compression&lt;/th&gt;
&lt;th&gt;Selective Access&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ZIP&lt;/td&gt;
&lt;td&gt;weaker&lt;/td&gt;
&lt;td&gt;fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;tar+zstd&lt;/td&gt;
&lt;td&gt;strong&lt;/td&gt;
&lt;td&gt;slow&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ARCX&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;strong&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;fast&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Tradeoffs
&lt;/h2&gt;

&lt;p&gt;Unlike tar, ARCX is not designed for streaming: the archive must be complete before reading, because the manifest is written at the end.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Remote/S3 range-read workflows not fully benchmarked yet&lt;/li&gt;
&lt;li&gt;Metadata/index overhead still being optimized for very large file counts&lt;/li&gt;
&lt;li&gt;Full extraction benchmarks in Rust are still in progress&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Feedback
&lt;/h2&gt;

&lt;p&gt;Still early — feedback welcome.&lt;/p&gt;

</description>
      <category>performance</category>
      <category>rust</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
