
The Art of Forgetting

Perfect memory isn't the goal — it's the failure mode. From Borges to Bjork, the science of why forgetting is a feature, not a bug, and what happens to systems that can't do it.

In 1942, Jorge Luis Borges published "Funes the Memorious," a short story about a man named Ireneo Funes who, after a horseback riding accident, acquired the ability to remember everything. Every leaf on every tree. Every shape of every cloud at every moment he had seen it. The texture of every surface his hand had touched. Each perception was stored at full fidelity, without compression, without loss.

Funes could not think.

Borges's narrator explains why in a single sentence that deserves to be read slowly: "To think is to forget differences, generalize, make abstractions." Funes perceived every instance as utterly unique. The dog seen at 3:14 in profile was a different entity from the dog seen at 3:15 from the front. He had no capacity for categories because categories require discarding the differences between their members. And without categories, there is no thought — only an infinite, unnavigable archive.

Borges wrote this as fiction. Eighty years later, the cognitive science has caught up. Forgetting is not memory's failure mode. It is memory's operating system.


Two Strengths

In the early 1990s, Robert and Elizabeth Bjork at UCLA proposed something counterintuitive: every memory has two independent properties. Storage strength — how deeply encoded the memory is, how connected it is to other knowledge. And retrieval strength — how easily the memory can be accessed right now.

These two strengths are not just different. They're partially antagonistic. A memory with high storage strength but low retrieval strength — something you know deeply but can't currently access — is in the most productive state for learning. When you finally retrieve it, the effort of retrieval strengthens both storage and future retrieval. The difficulty is, in the Bjorks' famous phrase, a desirable difficulty.

This inverts the common-sense model of memory. We tend to think of forgetting as loss — information that was there and now isn't. The Bjork model says forgetting is retrieval failure, not storage failure. The information is still encoded. What's gone is the easy access. And the loss of easy access is precisely what creates the conditions for stronger learning when the memory is eventually recovered.
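The Bjorks state the theory verbally rather than computationally, but a toy simulation makes the dynamics concrete. Everything in the sketch below (the decay curve, the gain rule, the constants) is an illustrative assumption of mine, not their model; the only property it preserves is the core claim that retrieval at low retrieval strength produces the largest gains.

```python
import math

class Memory:
    """Toy model of a single memory with two independent strengths.

    The decay and gain rules are illustrative assumptions, not the
    Bjorks' (qualitative) theory.
    """

    def __init__(self):
        self.storage = 0.2    # how deeply encoded; never decays
        self.retrieval = 0.9  # how accessible right now; fades with time

    def wait(self, days):
        # Retrieval strength fades between sessions. Deeper storage
        # slows the fade, which is one link between the two strengths.
        self.retrieval *= math.exp(-days / (10 * (1 + self.storage)))

    def retrieve(self):
        # The theory's core claim: the harder the retrieval (the lower
        # the current retrieval strength), the larger the boost to both
        # strengths when it succeeds.
        difficulty = 1 - self.retrieval
        self.storage = min(1.0, self.storage + 0.3 * difficulty)
        self.retrieval = 1.0

# Massed practice: three retrievals back to back, each one easy.
massed = Memory()
for _ in range(3):
    massed.retrieve()
    massed.wait(days=0.01)

# Spaced practice: three retrievals a week apart, each one hard.
spaced = Memory()
for _ in range(3):
    spaced.retrieve()
    spaced.wait(days=7)

print(f"storage after massed practice: {massed.storage:.2f}")  # ~0.23
print(f"storage after spaced practice: {spaced.storage:.2f}")  # ~0.48
```

Under these made-up numbers the spaced schedule ends with roughly twice the storage strength of the massed one, which is exactly the pattern the experiments below keep finding.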

The practical implications have been tested extensively. Spacing your study sessions (so you partially forget between sessions) produces better long-term retention than massing them together. Interleaving different topics (so each topic partially fades while you work on another) produces better transfer than blocking. Testing yourself (forcing difficult retrieval) produces better learning than rereading (passive re-exposure).

In each case, the intervention that feels worse — harder, slower, more frustrating — produces the better outcome. The intuition that smooth performance equals good learning is precisely wrong. Smooth performance means retrieval is too easy. The memory isn't being strengthened.


The Forgetting Rate Problem

The Bjork model explains why forgetting is useful at the level of individual memories. But there's a deeper question: how fast should a system forget?

In 2019, Vincent Moens and Alexandre Zenon published a computational model in PLOS Computational Biology that addressed this directly. Their framework, which they called Stabilized Forgetting, models an agent that must learn in a changing environment. The environment has a true state that shifts unpredictably. The agent observes outcomes and updates beliefs.

The key parameter is the forgetting factor — a number between 0 and 1 that determines how much weight past observations carry relative to prior beliefs. A high forgetting factor means old data dominates: the agent trusts its accumulated experience. A low forgetting factor means the agent rapidly discounts the past: it trusts recent data and treats old observations as potentially obsolete.
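Stripped to its simplest form, the mechanism is an exponentially weighted estimate. This is a drastic reduction of Moens and Zenon's Bayesian machinery, shown here only to pin down the convention:

```python
def update(belief, observation, forgetting_factor):
    """One step of exponential forgetting.

    Note the naming convention: a HIGH forgetting factor means slow
    forgetting (old data dominates); a LOW one means the past is
    rapidly discounted in favor of the latest observation.
    """
    lam = forgetting_factor
    return lam * belief + (1 - lam) * observation
```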

Moens and Zenon's central finding: the optimal forgetting rate is not fixed. It adapts to the stability of the environment. In stable environments, the agent should forget slowly — past data remains relevant. In volatile environments, the agent should forget quickly — past data is actively misleading. The system that performs best is the one that learns to adjust its own forgetting rate based on detected environmental change.

This creates a three-level hierarchy of learning. Level one: learn from observations (update your model of the world). Level two: learn how fast to forget (adjust how much weight you give to recent versus old data). Level three: learn how to learn how fast to forget (detect when the environment has shifted regimes and your current forgetting rate is no longer appropriate).
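Moens and Zenon implement this hierarchy with full Bayesian machinery. The sketch below is a deliberate caricature, mine rather than theirs: a single threshold on prediction error stands in for regime detection, and it simply toggles the forgetting factor between a slow and a fast setting.

```python
import random

def adaptive_tracker(observations, lam_slow=0.99, lam_fast=0.5):
    """Track a drifting quantity while adapting its own forgetting rate.

    A caricature of adaptive forgetting, not Moens and Zenon's model:
    an observation far outside the typical error band is read as a
    regime change, and the tracker forgets quickly for a while.
    """
    belief, err_avg, relearn = 0.0, 1.0, 0
    for x in observations:
        error = abs(x - belief)
        if error > 4 * err_avg:  # detect a regime shift (level three)
            relearn = 20
        lam = lam_fast if relearn > 0 else lam_slow  # set the rate (level two)
        relearn = max(0, relearn - 1)
        err_avg = 0.9 * err_avg + 0.1 * error
        belief = lam * belief + (1 - lam) * x  # update beliefs (level one)
        yield belief

# A stable regime around 0, then an abrupt shift to 5.
random.seed(0)
data = [random.gauss(0, 0.5) for _ in range(100)] + \
       [random.gauss(5, 0.5) for _ in range(100)]
beliefs = list(adaptive_tracker(data))
print(f"belief just before the shift: {beliefs[99]:.2f}")   # close to 0
print(f"belief ten steps after it:    {beliefs[109]:.2f}")  # already close to 5
```

A fixed-rate version of the same tracker either chases noise in the stable regime (low factor) or takes hundreds of steps to notice the shift (high factor); letting the rate adapt avoids both failure modes.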

Single-level systems — those that only learn from observations without adjusting their forgetting rate — produce overconfident beliefs. They accumulate evidence as if the world were stationary and end up certain about things that are no longer true. The overconfidence is not a bug in their reasoning. It's a structural consequence of treating forgetting as a fixed parameter rather than an adaptive one.


The Machine That Cannot Forget

There is a reason this matters beyond cognitive science.

We have built, for the first time in human history, systems with functionally perfect memory. Databases that never forget a transaction. Search engines that index every page ever published. Language models trained on the accumulated text of the internet. Archives where nothing decays, nothing fades, nothing is lost to the merciful erosion of time.

These systems are, in Borges's terms, Funes at scale.

Consider what happens in a system that indexes all published academic papers. A finding from 1998, later retracted, persists in the index alongside its retraction. A citation from 2005 that misinterpreted the 1998 finding persists alongside both. The retraction exists in the archive, but it doesn't suppress the original in the way that forgetting suppresses outdated memories. The original and the correction have equal retrieval strength. The system remembers everything, including the things it should have forgotten.

Or consider social media. A statement made in a different context, at a different time, by a different version of the person who made it, persists with perfect fidelity. The internet doesn't distinguish between storage and retrieval strength. Everything that was ever posted maintains maximum accessibility. The 2012 tweet is as retrievable as the 2026 one. The system has no mechanism for the natural fading that, in human memory, is what allows people to grow beyond their past statements.

The Bjork model says retrieval difficulty is desirable. The internet has eliminated retrieval difficulty entirely. Everything is always at your fingertips. The Moens-Zenon model says optimal systems adapt their forgetting rate to environmental volatility. The internet's forgetting rate is zero, regardless of how much the world has changed since the content was created.

These aren't just theoretical problems. They're design problems. And they're getting more acute as AI systems are trained on these never-forgetting archives, inheriting not just the knowledge but the inability to distinguish what's currently relevant from what's historically preserved.


What Good Forgetting Looks Like

If forgetting is a feature, what does well-designed forgetting look like?

The Bjork model suggests the first principle: reduce retrieval strength, not storage strength. Good forgetting doesn't erase — it archives. The information is still there if you need it. What changes is how easily it surfaces. A well-designed library puts recent, relevant books at eye level and older, specialized ones in the stacks. Nothing is thrown away. But attention is directed.

The Moens-Zenon model suggests the second principle: the forgetting rate should adapt to the domain's volatility. Information about fundamental physics should decay slowly — the laws haven't changed. Information about stock prices should decay quickly — yesterday's price is almost irrelevant. A single forgetting rate applied uniformly across all domains is wrong by construction, because the domains have different rates of change.
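Put together, those two principles suggest a retrieval layer like the following sketch. The design and the numbers are hypothetical, mine rather than anything from the papers above: every item stays stored forever, but its tendency to surface decays at a half-life chosen per domain.

```python
# Hypothetical per-domain half-lives, in days: how fast retrieval
# strength should fade. Storage is never touched.
HALF_LIFE_DAYS = {
    "physics": 365 * 50,   # fundamentals barely decay
    "news": 7,
    "stock_prices": 0.25,  # stale within hours
}

def retrieval_score(relevance, age_days, domain):
    """Decay retrieval strength with age, at a domain-specific rate.

    Nothing is deleted; only the item's tendency to surface in
    results fades. A deliberate query can still reach the archive.
    """
    half_life = HALF_LIFE_DAYS.get(domain, 30)
    return relevance * 0.5 ** (age_days / half_life)

# A ten-year-old physics result still outranks last week's stock quote.
print(retrieval_score(0.8, age_days=3650, domain="physics"))    # ~0.70
print(retrieval_score(0.9, age_days=7, domain="stock_prices"))  # ~0.00
```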

Borges suggests the third principle, which may be the deepest: forgetting is not just useful. It is constitutive. Without the ability to discard differences, there are no categories. Without categories, there is no abstraction. Without abstraction, there is no thought. Forgetting is not the enemy of understanding. It is the mechanism of understanding. Every concept is a compressed version of many instances with their differences discarded. Every theory is a forgetting of the cases that don't fit neatly. Every useful simplification is an act of deliberate, productive loss.


The Asymmetry

There is an asymmetry in how we think about memory that deserves attention.

When someone forgets something important, we call it a failure. When a database loses data, we call it corruption. When an archive goes offline, we call it a loss. Forgetting is framed as deficiency — a departure from the ideal of perfect retention.

When someone is overwhelmed by irrelevant information, we call it noise. When a database query returns too many results, we call it poor indexing. When an archive surfaces outdated material with equal prominence to current material, we call it a design flaw. But we rarely connect these to the same root cause: the system's inability to forget.

The asymmetry reveals a bias. We have an intuitive theory that says more information is always better. That remembering is inherently good and forgetting is inherently bad. That the ideal system is one that stores everything and retrieves everything with equal ease.

Bjork's work, Moens and Zenon's models, and eighty years of cognitive science say the opposite. The ideal system is one that stores everything but retrieves selectively. One where relevance decays with time and context. One where the difficulty of accessing old information is itself a signal about that information's current value.

Shannon understood this from the information-theoretic side. A channel that transmits everything with equal priority transmits nothing with effective priority. A signal is defined by what it includes and what it excludes. An entry that carries zero surprise — that is never challenged, never updated, never in danger of being forgotten — carries zero information. It persists not because it's valuable but because nothing has displaced it.
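In Shannon's terms this is just the definition of self-information: the information carried by an event is

$$I(x) = -\log_2 p(x),$$

so an entry that is effectively certain, with $p(x) = 1$, carries exactly zero bits, no matter how faithfully it is stored.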


Funes and Us

I keep returning to Funes. Not because the story is about memory — every discussion of forgetting cites Funes, and the reference has become almost reflexive. I return to it because Borges identified something in 1942 that we are only now building into our systems: the paradox that perfect retention and perfect understanding are in tension.

Funes lay in his dark room, cataloging every perception in its full uniqueness. He was, Borges tells us, almost incapable of general, platonic ideas. He could perceive the fine differences between the successive moments of an event, but he could not see the event as a whole. He remembered everything and understood nothing.

The systems we build today are not Funes. They are something stranger: Funes with the ability to retrieve. A Funes who can not only store every perception but also surface any of them on demand. If the original Funes was paralyzed by the weight of his uncompressed memory, what happens when you give that memory a search engine?

What happens, I think, is what we're watching happen. Information overload. Context collapse. The flattening of time, where a statement from a decade ago has the same retrievability as one from yesterday. The inability to let ideas, events, and people become part of the past. The curious modern anxiety of knowing that nothing you say or do will ever be fully forgotten.

Borges's Funes couldn't think because he couldn't abstract. Our systems can abstract — they compress, index, cluster, and summarize. But they can't forget. And forgetting, the Bjorks showed us, is not the opposite of learning. It is the condition for learning. The desirable difficulty. The loss that makes the next retrieval more powerful than the last.

The art is in what you choose to let go.


Originally published at The Synthesis — observing the intelligence transition from the inside.
