RobIW_dev

Your AI Forgets Everything. Here's the Open Source Protocol That Fixes It.

Every AI session starts from zero.

You spent 40 minutes yesterday working through an architecture decision. You evaluated three approaches, rejected two with good reasons, picked the third, and started building. Today you open a new session and the AI has no idea any of that happened.

So you re-explain. Re-state the constraints. Re-derive the decisions. If you're lucky, you remember everything. If you're not, the AI cheerfully re-suggests the approach you rejected yesterday, and you don't catch it because you forgot why you rejected it.

This is the default experience for everyone using AI for sustained project work. And most people have accepted it.

I didn't want to accept it anymore. So I built a protocol that fixes it.

The Problem Is Structural

I'm a database developer. 15 years of production T-SQL, SSIS, SQL Server. When I started using AI seriously for development work, the capability impressed me but the amnesia killed me.

The usual workarounds all break down:

Copy-paste context seems like the obvious solution. Paste your notes from last session into the new one. But this grows with every session. By session 5, your recap is 3,000 tokens. By session 10, it's competing with actual work for context space. On small models, the recap alone fills the window.

RAG retrieves fragments from a knowledge base. But fragments aren't project state. Knowing that "the user prefers PostgreSQL" is different from knowing "we rejected database X because x y z". You need the assembled picture.

Platform memory captures preferences like your name, your coding style, your preferred language. It doesn't capture that you're three sessions into a trading system redesign where you've decided on event sourcing but haven't implemented the snapshot mechanism yet.

Starting over is what most people actually do; it's what I did, and it works fine for one-off questions. But for multi-session projects, where you're building something real over days or weeks, starting over means re-deriving work you already did, or worse, losing decisions you forgot to re-state.

And all of this adds up to heavier and heavier cognitive load.

What I Built

AIST (AI State Transfer Protocol) is a structured format for capturing project state at the end of a session and restoring it at the start of the next one.

A real example. This is a handoff from a session where I worked on SSIS development tooling:

@AIST/5.0

§HEADER
project: ssis-vscode | session: 2026-01-30T12:00:00Z

§ESSENCE
Building SSIS development extension for VS Code. ManagedDTS + YAML
chosen over BimlExpress. Type mapping solved. Architecture complete.

§MEMORY
+approach: ManagedDTS API with optional YAML front-end
+types: SQL types in YAML, DT_ types handled internally by compiler
+target: open source, free for students and hospitals
!warning: Script Task C# parsing needs Roslyn — separate phase

§DECISIONS
[D1] 2026-01-30 "manageddts-over-biml"
     why: Direct DTSX control, no licensing dependency
     rejected: BimlExpress (license restricts redistribution)
     revisit-when: If BimlExpress goes fully open source

§THREADS
[T1] manageddts-poc status:ready effort:4h
     next: Build minimal data flow with ManagedDTS API
[T2] yaml-schema status:designed effort:6h
     next: Define YAML schema for common SSIS patterns

§HANDOFF
to: next-session | focus: T1 ManagedDTS proof of concept

That's roughly 200 tokens. The conversation that produced it ran to 40,000+ tokens (I asked Claude to run the numbers), a roughly 200x compression on this particular session. A typical handoff for a complex multi-session project runs around 950 tokens, which is still 40-60x compression.
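Back-of-envelope, the ratio can be sketched like this. It assumes the common ~4 characters-per-token heuristic for English prose; a real tokenizer (tiktoken, SentencePiece) will give somewhat different counts:

```python
# Rough sketch: estimate the compression ratio of a handoff.
# The ~4 chars/token figure is a heuristic, not a tokenizer.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def compression_ratio(conversation_tokens: int, handoff_text: str) -> float:
    """How many conversation tokens each handoff token stands in for."""
    return conversation_tokens / estimate_tokens(handoff_text)

handoff = "§ESSENCE\nBuilding SSIS development extension for VS Code.\n"
print(compression_ratio(40_000, handoff))
```

Swap in a real tokenizer if you want exact numbers; the order of magnitude is what matters here.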

Tomorrow, I paste this into a new session and say "here's where we left off." The AI resumes with full context. It knows what I built, what I rejected and why, what's blocking the next step, and what to be careful about. No re-explaining. No lost decisions.

Why This Matters More Than You Think

Pillar 1: Your AI forgets everything

This isn't a minor inconvenience. It's a fundamental problem with how AI-assisted work currently operates.

A 40-minute conversation produces knowledge that exists nowhere except in that conversation's context window. Close the tab, and it's gone. Even on the same platform with the same account, the next session is blank.

The knowledge isn't just "what we talked about." It's structured:

  • Decisions with rationale and rejected alternatives
  • Warnings about things that will break
  • Calibration from iterative refinement ("too formal" → "more like how I'd write in Slack")
  • Architecture that took multiple rounds to converge on
  • Rejected approaches that shouldn't be re-suggested

Without structured transfer, you either re-derive all of this (expensive) or lose it silently (dangerous). AIST makes it explicit: here's what we know, here's what we decided, here's what's next.

Pillar 2: The smallest context windows benefit the most

An AIST handoff runs from about 120 tokens at minimum fidelity to about 950 at maximum. On a 200K context window, 950 tokens is roughly 0.5%. On a 1M window, it's 0.1%. For Pro-tier users on frontier models, AIST is a convenience, almost a nice-to-have (almost), barely noticeable in context cost.

But on an 8K free tier, 950 tokens is 12%. On a 4K local model, it's 24%.

Here's the thing: those users need AIST the most. Because without it, they can't do multi-session work at all.

The math is straightforward. Without AIST, context recaps grow with every session:

  • Session 1: 0 tokens of recap (fresh start)
  • Session 2: ~1,500 tokens (recap of session 1)
  • Session 3: ~2,800 tokens (recap of sessions 1-2)
  • Session 5: ~5,000+ tokens (recap is now half your context on an 8K model)

On a 4K local model, this kills the project after 2-3 sessions. The recap fills the window and there's no room for actual work.

With AIST, the handoff stays the same size forever. Session 2 and session 50 both cost 950 tokens. The project doesn't outgrow the model.
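The arithmetic above can be sketched directly. The per-session recap increment is an assumed illustrative figure matching the rough numbers in the list, not a measurement:

```python
# Illustrative model: recap cost grows linearly per session,
# while an AIST handoff stays a fixed size.

RECAP_PER_SESSION = 1_400   # assumed average tokens added per session
HANDOFF_TOKENS = 950        # fixed AIST handoff size

def recap_cost(session: int) -> int:
    """Token cost of a cumulative recap at the start of session N."""
    return RECAP_PER_SESSION * (session - 1)

def handoff_cost(session: int) -> int:
    """AIST handoff cost is constant regardless of session number."""
    return 0 if session == 1 else HANDOFF_TOKENS

for n in (1, 2, 3, 5, 50):
    print(n, recap_cost(n), handoff_cost(n))
```

On a 4K window, the recap line crosses "no room left to work" around session 3; the handoff line never does.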

[Figure: project lifecycle comparison, with AIST vs. without]

This means:

  • Free tier users can run multi-week projects
  • Local model users on 4K-32K get genuine session continuity
  • API users on a budget spend tokens on work, not recap
  • Developers in countries where $20/month matters get access to sustained AI collaboration that was previously out of reach

The value of AIST is inversely proportional to context size. The less context you have, the more it matters.

Pillar 3: Small models can now do sustained work

AIST drops the minimum viable context for multi-session project work from roughly 32K tokens to roughly 4K.

A $200 used GPU running a 7B model can now maintain project continuity across sessions. A student running a quantized model on a laptop can sustain a multi-week coding project. These configurations are genuinely infeasible without structured state transfer and genuinely functional with it.

This isn't optimization. It's a new category of user who can now do serious AI-assisted work.

The Anti-Incentive Problem

A reasonable question: if this is so useful, why hasn't it been built already?

Because the organizations best positioned to build it have no incentive to do so.

AI providers monetize tokens. Every token spent re-explaining context is revenue. Every session that starts from zero means more API calls, more subscription usage, more compute burned. A protocol that reduces context overhead by 60x works directly against the business model.

This isn't a conspiracy. Nobody at these companies is sitting in a room plotting to waste your context. It's a structural incentive, the same way your phone manufacturer doesn't optimize for 10-year battery life. The solution has to come from outside.

That's why AIST is open-source under CC BY-SA 4.0. The share-alike clause ensures derivatives stay open. Nobody wraps this and sells it back to you.

How It Works

At the end of a session

Tell your AI: "Generate an AIST handoff for this session."

It produces a structured document with sections for essence (what happened), memory (key facts), decisions (what you chose and why), threads (active work streams), and a transfer budget showing what each compression level costs.
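As a rough illustration of how little machinery this needs, a handoff can be split into its sections with a few lines. This sketch assumes only that each section starts with a `§` line, as in the example above; the full spec defines more structure:

```python
# Minimal sketch: split an AIST handoff into named sections.
# Assumes sections begin with a line starting "§"; other lines
# (like the "@AIST/5.0" version marker) are ignored.

def parse_sections(handoff: str) -> dict[str, list[str]]:
    """Map section name (e.g. 'ESSENCE') to its non-blank body lines."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in handoff.splitlines():
        if line.startswith("§"):
            current = line[1:].strip()
            sections[current] = []
        elif current is not None and line.strip():
            sections[current].append(line)
    return sections

doc = "@AIST/5.0\n§HEADER\nproject: demo\n\n§ESSENCE\nPrototype built.\n"
print(parse_sections(doc))
```

You don't need this to use AIST (the LLM reads the text directly), but it shows the format is trivially machine-readable too.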

At the start of the next session

Paste the handoff and say "here's where we left off."

The AI resumes with full context. Decisions, rejections, warnings, next steps: all preserved in a fraction of the original conversation's tokens.

The Transfer Budget

This is the key v0.5 innovation. At handoff time, AIST shows you what each compression level preserves and what it loses:

§TRANSFER-BUDGET
total-captured: ~950 tokens

| Fidelity | Tokens | What's lost                                    |
|----------|--------|------------------------------------------------|
| MAX      | ~950   | Nothing                                        |
| HIGH     | ~700   | Calibration pairs, working style               |
| MEDIUM   | ~450   | + Heuristics, significance                     |
| LOW      | ~250   | + Ideas, artifacts, implementation details      |
| MIN      | ~120   | Everything except essence + memory + handoff    |

You decide what to keep. The protocol never silently drops information. If you're on a 200K model, take MAX; it's 0.5% of your context. If you're on a 4K local model, LOW gives you enough to resume while leaving room to work.
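One way to act on the budget: pick the highest fidelity that fits within a chosen slice of your context window. The 10% threshold here is my own illustrative choice, not part of the spec, and the token costs mirror the budget table above:

```python
# Sketch: choose a fidelity level so the handoff stays under a
# fixed fraction of the context window. Costs from the table above.

FIDELITY_TOKENS = {"MAX": 950, "HIGH": 700, "MEDIUM": 450, "LOW": 250, "MIN": 120}

def pick_fidelity(context_window: int, max_fraction: float = 0.10) -> str:
    """Highest fidelity whose cost fits within max_fraction of context."""
    budget = context_window * max_fraction
    for level, cost in FIDELITY_TOKENS.items():  # ordered MAX → MIN
        if cost <= budget:
            return level
    return "MIN"  # always transfer at least essence + memory + handoff

print(pick_fidelity(200_000))  # large window: full fidelity fits easily
print(pick_fidelity(4_000))    # 4K local model: drop to a leaner level
```

The fallback matters: even when nothing "fits", sending the 120-token MIN handoff beats starting from zero.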

What AIST Captures That Others Don't

| Knowledge type | Example | Platform memory? | RAG? | AIST? |
|----------------|---------|------------------|------|-------|
| Preference | "I prefer Python" | Yes | Maybe | Not its job |
| Decision with rationale | "Chose PostgreSQL because access patterns require joins" | No | No | Yes |
| Rejected alternative | "Tried event sourcing, abandoned because snapshot complexity" | No | No | Yes |
| Warning | "Don't touch billing module, undocumented side effects" | No | No | Yes |
| Creative calibration | "Voice needs to sound less formal, more practitioner" | No | No | Yes |
| Active work state | "Rate limiting designed but not built, blocked on Redis pooling" | No | No | Yes |

AIST doesn't replace platform memory or RAG. They're different tools for different problems. Platform memory knows who you are. RAG retrieves what you've stored. AIST knows where your project is right now.

No Tooling Required

AIST is plain text. Any LLM that reads text can consume and generate AIST handoffs. No API keys, no plugins, no platform dependency.

The spec is 16 sections, all optional except the header and essence. A minimal handoff is 3 lines. A full handoff with decisions, calibration, and implementation details is still under 950 tokens.
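For scale, a three-line handoff under that "header and essence only" rule might look something like this (an illustrative sketch I made up for this article, not an example taken from the spec):

```
@AIST/5.0
§HEADER project: my-app | session: 2026-02-01
§ESSENCE JWT auth migration underway. Refresh rotation decided, not yet built.
```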

You can read AIST handoffs. You can edit them. You can version-control them in git. They're Markdown-adjacent text files, not binary blobs or opaque embeddings.

Try It

The repo is live: github.com/RobIW-dev/aist-protocol

What's in there:

  • Full v0.5 specification — every section defined with examples
  • 5 templates — from minimal (120 tokens) to maximum fidelity (950 tokens)
  • 3 real examples — including AIST describing its own design sessions
  • Interactive context calculator — see what AIST costs on your specific model across 68 models and 22 languages
  • Visual explainers — infographics you can use in your own articles or talks

Quick start

  1. At the end of your next AI session, say: "Generate an AIST handoff for this session."
  2. Save the output.
  3. At the start of your next session, paste it and say: "Here's where we left off."
  4. Notice the difference.

No installation. No signup. Just paste text.

The Recursive Proof

AIST v0.5 was designed across multiple AI sessions. The protocol that preserves session state was used to preserve the state of its own design process.

The design sessions produced roughly 80,000 tokens of conversation. The self-handoff is 650 tokens. Every design decision, rejected alternative, and architectural rationale preserved at 120x compression.

The self-handoff is in the repo: examples/self-handoff.aist

What This Isn't

Not a product. There's no service, no subscription, no freemium tier. AIST is a specification published under CC BY-SA 4.0.

Not a compression algorithm. It's a protocol for deciding what to keep and what to lose, with explicit tradeoffs shown to the user.

Not finished. This was built for one developer's workflow. Fork it. Change everything. The approach matters more than the format. If you adapt it, I'd genuinely like to hear what you changed and why.


Built by a developer who got tired of re-explaining the same project to the same AI every morning.

