NARESH

Posted on May 29

LLM Wiki Solved Memory for AI. I Wanted Memory for Humans.

#ai #productivity #tutorial #architecture

TL;DR
AI coding agents are helping software evolve faster than humans can mentally keep up with it.
Tools like LLM Wiki give AI agents persistent memory of the repository, but while using these workflows across multiple projects, I realized something important:
The AI could continuously understand the evolving system.
I could not.
That led me to build Architecture-as-Memory (AAM) - a lightweight architectural memory layer that helps humans stay oriented as AI agents continuously modify and evolve the architecture.
Instead of relying only on documentation or chat history, AAM maintains a structured architectural memory directly inside the repository using YAML + live visual graphs.
LLM Wiki focuses on memory for AI.
AAM focuses on memory for humans.

🔗 npm package:
Architecture-as-Memory on npm

🌐 Website:
Architecture-as-Memory Website

A few months ago, most of my development workflow started shifting toward AI coding agents like Claude Code, Cursor, Gemini CLI, and similar systems. And if you have seriously built projects with these tools, you probably already know what happens after a while.

Things stop moving at normal human speed.

Features that normally take days of implementation, debugging, and architectural planning can suddenly appear through a single conversation. A small project with two or three clean ideas starts evolving rapidly. One feature creates three more ideas. Those ideas become workflows. Workflows become systems. Systems start depending on other systems. And because the agents already understand the project context, they do not just implement what you ask for anymore. They extend it.

At first, that speed feels incredible.

Then you slowly realize something strange is happening.

The bottleneck is no longer writing code.

The bottleneck becomes keeping up with the architecture that is evolving around you.

That was the part I kept running into while building. The AI agents did not seem lost. In many cases, they understood the structure of the project better than I did because they could continuously reload context, inspect relationships, revisit implementation details, and reason across the repository without mental fatigue.

Humans do not work like that.

We context-switch. We step away from projects. We come back after work, meetings, side quests, experiments, and life itself. Meanwhile, the system continues evolving at machine speed.

At some point, it stopped feeling like traditional software development and started feeling more like trying to hold onto something accelerating beyond human recall. Like the architecture had entered the Speed Force, but my own mental model had not caught up yet.

Around that same time, I came across the idea of LLM Wiki from Andrej Karpathy. The idea immediately clicked for me because it approached AI coding from a very different angle: persistent memory instead of repeated context reconstruction.

Instead of forcing an AI coding agent to repeatedly scan an entire repository every time it needed context, the agent could maintain its own structured understanding of the system through a persistent wiki generated from the architecture, workflows, relationships, and capabilities inside the project itself.

And honestly, it worked extremely well.

The agents became more context-aware. They navigated repositories faster. They made better implementation decisions. They stopped treating projects like unstructured piles of files.

But while using this workflow across multiple projects, I started noticing another problem.

LLM Wiki was solving memory for the AI.

I still had not solved memory for myself.

Even with documentation, I kept returning to the same questions whenever I reopened a project after a few days:

What changed?
Which systems are connected now?
Why does this feature exist?
What depends on this service?
What is stable?
What is still evolving?

The problem was not missing information.

The problem was cognitive overload.

Humans do not naturally reason about software through folders, imports, dependency trees, or implementation details. We reason through capabilities. Authentication. Payments. Notifications. Search. Analytics. Operational boundaries. We understand systems through compressed mental models, not through thousands of lines of code.

That realization eventually led me to build Architecture-as-Memory (AAM).

Not as a replacement for LLM Wiki.

Not as another documentation generator.

And not as a static architecture visualization tool.

But as a persistent architectural memory layer designed for humans living inside AI-native software development workflows.

The Problem Was Never Just Context Windows

One of the biggest conversations around AI coding today is context management.

People talk about token limits, repository indexing, retrieval systems, memory injection, and ways to stop AI agents from rescanning the same codebase again and again. That is exactly why approaches like LLM Wiki became so valuable in the first place. Instead of treating a repository like an unstructured pile of files every single time, the AI maintains a structured memory of the system.

And honestly, it works really well.

The agents become faster, more context-aware, and significantly better at making implementation decisions across large projects. They stop navigating the repository blindly and start reasoning about the system with continuity.

But after using these workflows for months, I started realizing something important:

Context windows were never the real bottleneck.

Human cognition was.

The AI could continuously reload architectural context without fatigue. It could inspect relationships across hundreds of files, revisit implementation details instantly, and reconnect decisions made weeks earlier within seconds. Humans do not work that way.

We context-switch constantly.

We leave projects for days. We come back after work. We jump between meetings, side projects, production issues, experiments, and life itself. Meanwhile, the architecture continues evolving at machine speed.

And modern AI coding workflows accelerate that evolution even further.

When an AI coding agent implements a feature, it rarely changes only one thing. A single request can quietly affect multiple workflows, services, dependencies, and future architectural decisions. The agent might restructure abstractions, introduce new relationships, optimize adjacent systems, or extend capabilities beyond the original scope of the request.

Most of the time, these are actually good improvements.

But the speed of architectural mutation becomes difficult for humans to track mentally.

The codebase starts evolving faster than the developer's internal model of the system.

That creates a strange asymmetry inside AI-native development workflows:

The AI understands the project because it has persistent context.

The human slowly loses architectural orientation because human memory does not scale at the same speed as implementation.

That was the point where I realized the problem was not just about helping AI agents remember repositories better.

It was also about helping humans continue understanding systems that are now evolving faster than human recall cycles.

LLM Wiki Changed My Thinking

Around this time, I came across the idea of LLM Wiki from Andrej Karpathy, and it completely changed the way I started thinking about AI-native development workflows.

The idea was surprisingly simple.

Instead of forcing an AI coding agent to repeatedly scan an entire repository every single time it needed context, the agent could maintain its own structured memory of the project. A persistent wiki generated from the architecture, workflows, components, relationships, and operational understanding of the system itself.

If you have seriously used tools like Claude Code or Cursor on larger projects, you immediately understand why this matters.

Normally, when you ask an agent to implement a feature, a significant portion of its work is not actually implementation. It is context reconstruction. The agent searches files, reads folders, follows dependencies, tries to understand workflows, and slowly rebuilds a mental map of the repository before it can confidently make changes.

LLM Wiki changes that dynamic completely.

Instead of rediscovering the system repeatedly, the agent already has a compressed memory layer describing how the project works. That means less unnecessary context rebuilding, better architectural consistency, and faster implementation decisions over time.

So I started using this workflow heavily across multiple projects.

And honestly, the difference was noticeable almost immediately.

The agents stopped behaving like code generators and started behaving more like systems-aware collaborators. Features became more coherent. Refactors became safer. The agents could understand relationships between workflows without repeatedly traversing the entire repository from scratch.

But while using this approach, I started noticing another problem that was much harder to ignore.

The AI was becoming better at understanding the evolving architecture.

I was not.

A project that originally started with three clean ideas would slowly evolve into something much larger after dozens of implementation cycles. One feature would branch into multiple adjacent capabilities. The agent would optimize surrounding systems automatically. Shared abstractions would appear. Service boundaries would evolve. New dependencies would quietly emerge between workflows.

The architecture was no longer growing linearly.

It was compounding.

And because these changes were happening incrementally across hundreds of interactions, the architectural intent slowly started disappearing into chat history.

That was the moment where I realized something important:

LLM Wiki solved persistent memory for the AI.

But there was still no equivalent memory layer for the human trying to keep up with the system.

Why Documentation Started Breaking Down

At first, I thought the solution was simple: document everything better.

If the architecture was becoming harder to track, then the obvious answer seemed straightforward. Write cleaner notes. Maintain better documentation. Record architectural decisions somewhere permanent.

But AI-native development changes the scale of software evolution completely.

The problem is not that documentation becomes useless. The problem is that systems now evolve faster than most humans can continuously rebuild their understanding of them.

A project that once changed gradually can now evolve multiple times in a single evening. New workflows appear quickly. Existing services gain new responsibilities. Boundaries shift. Relationships between systems become more interconnected over time. And because many of these changes happen incrementally across hundreds of interactions, architectural intent slowly starts disappearing into chat history, implementation diffs, and scattered conversations.

The information technically still exists.

But reconstructing the entire mental model repeatedly becomes exhausting.

That was the part I kept running into.

I did not need more raw information.

I needed faster reorientation.

And this is where I think most existing tooling still optimizes primarily for machines instead of humans.

Most systems focus on retrieval, indexing, semantic search, token efficiency, repository understanding, and implementation context. Those things absolutely matter. They make AI agents more effective.

But humans do not naturally rebuild understanding through massive textual context dumps.

We think visually.

We think relationally.

We think through compressed abstractions.

When developers think about a system, they usually are not asking:

"Which file imports this dependency?"

They are asking:

"How does authentication connect to onboarding?"

"What breaks if this service changes?"

"Which systems are still unstable?"

"Why does this workflow even exist?"

That is architectural cognition.

And the more I worked with AI coding systems, the more I realized that traditional documentation alone was never designed for software evolving at machine iteration speed.

I Wanted Something That Could Answer a Few Questions Instantly

At some point, I stopped thinking about this as a documentation problem and started seeing it as a cognition problem.

The issue was not that the information was missing. Most of the time, the information already existed somewhere inside the repository, inside documentation, or buried across previous conversations with the AI. The real problem was that the architecture was evolving faster than my brain could continuously rebuild and maintain a stable mental model of it.

And honestly, I do not think this has anything to do with intelligence or being a "better engineer."

The speed itself is the problem.

AI agents can now modify multiple parts of a system within minutes. They introduce new abstractions, extend workflows, reorganize responsibilities, and improve adjacent systems while implementing the original request. Even when those changes are correct, the architecture starts shifting continuously beneath you.

After enough iterations, you are no longer struggling because the system is poorly designed.

You are struggling because the system evolved faster than your ability to mentally reorient yourself inside it.

That was the point where I realized I did not need more documentation.

I needed faster reorientation.

I wanted something that could instantly answer a few simple but extremely important questions whenever I returned to a project:

What exists in the system right now?
Why does this feature exist?
Which systems are connected?
What changed recently?
What is stable, and what is still evolving?
What could break if this service changes?

That idea eventually became the foundation behind Architecture-as-Memory (AAM).

Not as another documentation platform, and not as a replacement for deep technical systems like LLM Wiki. In fact, I think both solve different layers of the same problem.

LLM Wiki gives AI agents persistent textual understanding of the repository.

AAM focuses on giving humans compressed architectural understanding of the system itself.

The goal was to create something lightweight enough that AI agents could maintain it incrementally alongside the codebase, while remaining visual and structured enough that a developer could return after days or weeks away and regain architectural orientation within minutes instead of spending hours reconstructing context from scratch.

Introducing Architecture-as-Memory (AAM)

At its core, Architecture-as-Memory (AAM) is a lightweight architectural memory layer designed for AI-native software development workflows. The idea is simple: as AI agents continuously evolve the system, the architecture should evolve alongside it in a structured and visible way instead of slowly disappearing into chat history, implementation diffs, and scattered documentation.

The goal was never to create another static diagram generator or another documentation platform that developers eventually stop updating.

I wanted something much more practical.

Something that could help me return to a project after days or weeks away and understand the current shape of the system within minutes instead of spending hours reconstructing context manually.

What exists right now?
How are systems connected?
What changed recently?
Which services are stable?
Which workflows are still evolving?
What could break if this component changes?

Those are the kinds of questions AAM is designed to answer quickly.

The architecture itself lives as structured YAML directly inside the repository, while AI agents continuously update that memory layer alongside implementation changes. On top of that, AAM generates a live architectural graph that helps visualize relationships, dependencies, workflows, and system evolution in a way that humans can process much faster than raw documentation alone.
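To make the idea concrete, here is a minimal sketch of what such a memory layer could hold, written in Python for illustration. The field names (`why`, `status`, `depends_on`, `last_changed`) and the sample capabilities are assumptions made up for this example, not AAM's actual schema, and in a repository this data would live as YAML rather than as Python objects.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Capability:
    """One node in the architectural memory (hypothetical schema)."""
    name: str
    why: str                      # the intent behind the capability
    status: str = "evolving"      # "stable" or "evolving"
    depends_on: list = field(default_factory=list)
    last_changed: Optional[date] = None


# Illustrative data only; none of this comes from a real project.
memory = {
    "auth": Capability("auth", "Sign users in", "stable", [], date(2025, 5, 1)),
    "payments": Capability("payments", "Charge customers", "evolving", ["auth"], date(2025, 5, 27)),
    "notifications": Capability("notifications", "Email receipts", "evolving", ["payments"], date(2025, 5, 28)),
}


def changed_since(memory: dict, cutoff: date) -> list:
    """Answer 'what changed recently?' from the memory alone."""
    return sorted(
        c.name for c in memory.values()
        if c.last_changed is not None and c.last_changed >= cutoff
    )


def with_status(memory: dict, status: str) -> list:
    """Answer 'what is stable?' or 'what is still evolving?'."""
    return sorted(c.name for c in memory.values() if c.status == status)


print(changed_since(memory, date(2025, 5, 20)))  # ['notifications', 'payments']
print(with_status(memory, "stable"))             # ['auth']
```

The point of the sketch is that the reorientation questions become cheap lookups once intent, status, and relationships are recorded next to each capability instead of being scattered across diffs and chat history.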

And this is also where AAM differs from most traditional architecture tooling.

It is not trying to replace deep documentation systems like LLM Wiki. In fact, I think both approaches work extremely well together.

LLM Wiki gives AI agents persistent textual understanding of the repository.

AAM focuses on compressed architectural cognition for humans.

One helps the AI reason deeply about implementation.

The other helps humans stay oriented while the system continues evolving at machine speed.

A Real Problem I Kept Running Into

One situation kept repeating itself while working on larger projects.

Assume there are multiple connected services or workflows inside the system.

Service A depends on Service B.

Service B is connected to Services C and D.

Now imagine asking an AI coding agent to improve or extend Service B.

The agent finishes the original task, but while doing that, it also updates adjacent workflows, restructures a shared abstraction, modifies event handling, and introduces a cleaner dependency flow between systems. Technically, the implementation is correct. In many cases, it is actually better than what I originally planned.

But after enough iterations, something subtle starts happening.

The downstream architectural impact becomes difficult to track mentally.

You know something changed.

You know the system evolved.

But unless you spend significant time rereading diffs, documentation, chat history, and implementation details, your understanding of how everything currently connects starts becoming fragmented.

And this becomes even more noticeable in architectures that naturally evolve into:

microservices,
event-driven systems,
modular workflows,
AI orchestration layers,
or highly interconnected feature systems.

Because in those environments, changing one capability rarely affects only one capability.

Relationships compound over time.

That was one of the biggest reasons I started building AAM around relationships and architectural state instead of only static structure.

If a service changes, I want to immediately see which systems are connected to it.

If a workflow is evolving rapidly, I want that visible.

If a capability becomes unstable because multiple neighboring systems changed recently, I want architectural drift exposed before it becomes invisible technical debt buried inside implementation history.
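Once relationships are recorded, those three wishes reduce to small graph queries. The Python sketch below reuses the A/B/C/D services from the example earlier. Reading "connected to" as "depends on", the `recently_changed` input, and the drift threshold of two changed neighbors are all assumptions for illustration, not a description of how AAM computes anything.

```python
from collections import deque

# Hypothetical memory: each service maps to the services it depends on.
depends_on = {
    "A": ["B"],
    "B": ["C", "D"],   # assumption: "connected to" read as "depends on"
    "C": [],
    "D": [],
}


def dependents_of(target: str) -> set:
    """Everything that could break if `target` changes (transitive dependents)."""
    # Invert the edges once so we can walk from a service to its dependents.
    reverse = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            reverse[dep].append(name)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def drifting(recently_changed: set, threshold: int = 2) -> list:
    """Flag services with at least `threshold` recently changed direct neighbors."""
    flagged = []
    for name, deps in depends_on.items():
        neighbors = set(deps) | {n for n, d in depends_on.items() if name in d}
        if len(neighbors & recently_changed) >= threshold:
            flagged.append(name)
    return sorted(flagged)


print(sorted(dependents_of("C")))   # ['A', 'B']
print(drifting({"C", "D"}))         # ['B']
```

In this toy graph, a change to C reaches both B and A, and B is flagged as drifting because two of its neighbors changed even though B itself was untouched.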

That visibility matters because the challenge is no longer just "understanding code."

The challenge is maintaining system-level awareness while the architecture keeps evolving continuously around you.

Humans Do Not Think in Files. They Think in Capabilities.

One of the biggest things I realized while building AAM is that humans and AI agents do not experience software architecture the same way.

AI agents can comfortably navigate repositories through files, imports, dependency chains, and implementation relationships. That is natural for them because they can continuously reload context directly from the codebase itself.

Humans usually do not think that way.

When developers think about a system, they are rarely visualizing folder structures in their heads.

They are thinking about capabilities.

Authentication.
Payments.
Notifications.
Analytics.
Search.
Workflows.
Operational boundaries.

That is how architectural understanding actually exists inside human cognition.

And this distinction becomes extremely important once projects start growing rapidly with AI-assisted development.

Because the real problem is not "Where is this file located?"

The real problem becomes:

"How does this capability interact with the rest of the system?"

That is a very different question.

Traditional dependency graphs are usually too implementation-focused for this kind of thinking. They expose technical relationships, but they often fail to preserve architectural meaning. After a certain scale, they become visually dense without actually helping developers regain orientation quickly.

That was another major reason behind the way AAM was designed.

Instead of treating architecture as a collection of files and imports, AAM treats architecture as a network of evolving capabilities and relationships. The graph is not there just to visualize connections. It acts more like a cognitive compression layer for the system.
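One way to picture that compression is to take file-level import edges and collapse them into capability-level relationships. The Python sketch below uses made-up file paths and a hand-written prefix-to-capability mapping; both are assumptions for illustration, and this is not a description of how AAM builds its graph.

```python
# Hypothetical file-level import edges: (importer, imported).
file_imports = [
    ("src/auth/login.py", "src/auth/tokens.py"),
    ("src/payments/checkout.py", "src/auth/tokens.py"),
    ("src/payments/checkout.py", "src/payments/ledger.py"),
    ("src/notifications/email.py", "src/payments/ledger.py"),
]

# Hand-written mapping from path prefix to capability (an assumption).
capability_of_prefix = {
    "src/auth/": "Authentication",
    "src/payments/": "Payments",
    "src/notifications/": "Notifications",
}


def capability_of(path: str) -> str:
    """Map a file path to the capability that owns it."""
    for prefix, capability in capability_of_prefix.items():
        if path.startswith(prefix):
            return capability
    return "Unassigned"


def capability_edges(edges: list) -> list:
    """Collapse file edges into unique capability-to-capability edges."""
    collapsed = set()
    for importer, imported in edges:
        source, target = capability_of(importer), capability_of(imported)
        if source != target:          # drop edges internal to one capability
            collapsed.add((source, target))
    return sorted(collapsed)


for source, target in capability_edges(file_imports):
    print(f"{source} -> {target}")
# Notifications -> Payments
# Payments -> Authentication
```

Four file-level edges shrink to two capability-level ones, which is much closer to the question "how does this capability interact with the rest of the system?"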

You are not trying to understand every implementation detail at once.

You are trying to rebuild enough architectural awareness to confidently continue evolving the project.

That difference matters a lot more in AI-native development than I initially expected.

What AAM Is Not

One thing I want to make very clear is that AAM is not trying to become another overly complex architecture management platform.

It is not UML.

It is not repository indexing.

It is not an AI replacement layer.

And it is definitely not trying to generate massive enterprise diagrams that become outdated after two sprint cycles.

In fact, one of the biggest design goals behind AAM was reducing friction instead of adding more process.

The goal is intentionally simple.

Whenever an AI coding assistant like Claude Code, Cursor, Gemini CLI, or a similar system ships a feature, refactors a workflow, or modifies architectural relationships, it should also update the architectural memory of the project alongside those changes.

That is the core idea.

When you install AAM, the project gets an architecture memory layer directly inside the repository along with agent instruction files like aam-skill.md. Those instructions define how the AI should maintain the architecture continuously as the system evolves.

So instead of architecture becoming disconnected from implementation over time, the memory evolves together with the codebase itself.

That part matters a lot because manually maintaining architecture documentation almost always breaks once development speed increases.

Especially in AI-native workflows.

If developers need to constantly pause implementation to manually synchronize diagrams, rewrite architecture docs, or maintain separate tooling outside the repository, the documentation system eventually gets ignored.

AAM tries to avoid that failure mode completely.

The goal is not perfect architectural representation.

The goal is persistent architectural awareness.

Because once software starts evolving at machine iteration speed, even a lightweight but continuously evolving architectural memory layer becomes more valuable than perfectly designed diagrams that nobody updates anymore.

And honestly, I think this gap between implementation speed and human architectural recall is only going to grow from here.

Setting Up AAM

One thing I cared about while building AAM was keeping the setup extremely simple.

I did not want another system that required complicated infrastructure, external services, or hours of configuration before it became useful. The entire idea was to make architectural memory feel like a natural extension of the existing AI coding workflow instead of another layer developers have to maintain separately.

So the setup is intentionally lightweight.

You initialize the package inside the root of your project, and AAM scaffolds the architectural memory layer directly into the repository. From there, the workflow becomes mostly automatic. The full setup process and commands are covered in the npm package and its documentation.

Internally, AAM looks for existing AI instruction files already present in the project, things like CLAUDE.md, AGENT.md, .gemini/GEMINI.md, AI-INSTRUCTIONS.md, and similar instruction layers depending on the coding assistant you use. Instead of replacing those workflows, AAM simply appends lightweight instructions telling the agent to read the architectural memory and update it incrementally whenever the system evolves.

That part was very important to me.

AAM is not trying to take control of the workflow.

It is simply teaching the coding assistant one additional habit:

Whenever the system changes, update the architectural memory too.
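The append step described a few paragraphs up can be sketched in a few lines of Python. This is a hypothetical illustration, not AAM's actual code: the marker comment, the instruction wording, and the `architecture.yaml` file name are all assumptions, and the candidate list only repeats the file names mentioned above.

```python
import tempfile
from pathlib import Path

# File names taken from the text above; the real list may differ.
CANDIDATES = ["CLAUDE.md", "AGENT.md", ".gemini/GEMINI.md", "AI-INSTRUCTIONS.md"]

# Hypothetical marker and wording, used only to make the append idempotent.
MARKER = "<!-- architectural-memory -->"
INSTRUCTION = (
    f"\n{MARKER}\n"
    "Before making changes, read architecture.yaml.\n"
    "Whenever the system changes, update architecture.yaml too.\n"
)


def append_instructions(project_root: Path) -> list:
    """Append the instruction block to each existing file, at most once."""
    updated = []
    for relative in CANDIDATES:
        path = project_root / relative
        if not path.is_file():
            continue                      # never create files the project lacks
        text = path.read_text(encoding="utf-8")
        if MARKER in text:
            continue                      # already appended on an earlier run
        path.write_text(text + INSTRUCTION, encoding="utf-8")
        updated.append(relative)
    return updated


# Usage against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("# Project rules\n", encoding="utf-8")
    print(append_instructions(root))   # ['CLAUDE.md']
    print(append_instructions(root))   # []
```

The two design choices worth noting are that existing content is never rewritten, only extended, and that a second run is a no-op, so the habit can be installed repeatedly without piling up duplicate instructions.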

For Claude Code specifically, there is also optional hook support and slash command integration so the workflow stays consistent across sessions, even when context resets happen.

And honestly, that small behavioral loop is the entire foundation behind the system.

Build the feature.

Ship the change.

Update the architectural memory alongside it.

Continuously.

Because the moment architecture becomes something developers plan to update "later," it usually stops getting updated entirely.

Where I Think This Is Going

I honestly do not think this problem is temporary.

AI coding systems are getting dramatically better at implementation, architectural reasoning, and repository-scale context management, which means software itself is going to keep evolving faster over time.

And I think that changes the role of architecture completely.

For a long time, architecture mostly existed inside human memory, diagrams, documentation, and team conversations. That worked because software evolved slowly enough for humans to continuously rebuild and maintain the mental model.

But AI-native development changes that balance.

Now architecture evolves continuously across conversations, implementations, refactors, abstractions, and autonomous improvements happening at machine iteration speed. And once that happens, relying entirely on human recall becomes fragile.

That is really the core idea behind AAM.

Not replacing developers.

Not replacing documentation.

And not trying to automate architectural thinking away.

The goal is much simpler than that.

I think AI-native software development needs persistent cognition layers shared between humans and AI systems. Something that allows the AI to continuously evolve the system while still allowing humans to stay oriented inside that evolution without constantly reconstructing the architecture from scratch.

Because at some point, the problem stops being:

"How do we generate code faster?"

And starts becoming:

"How do humans continue understanding systems that no longer evolve at human speed?"

That is the problem I kept running into while building.

And honestly, Architecture-as-Memory is my current attempt at solving it.

Final Thoughts

To be honest, I am not trying to convince everyone to use AAM specifically.

The important idea is not the package itself.

The important idea is the workflow.

If this problem resonates with you, you can honestly build a lightweight version of this with almost any AI coding assistant today. You can ask Claude Code, Cursor, Gemini CLI, or whichever system you use to maintain a structured architectural memory layer directly inside the repository as the project evolves.

I personally prefer YAML because it is lightweight, readable, diff-friendly, and works well for both humans and AI systems. But the format itself is not really the important part.

The important part is preserving architectural understanding continuously instead of repeatedly reconstructing it from scratch.

You can even layer a simple local visualization system on top of it so you can quickly inspect relationships, workflows, dependencies, and evolving system boundaries whenever you return to the project.
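As one possible starting point, the Python sketch below turns a small capability map into Mermaid flowchart text, which many Markdown viewers can render. The capability names, the statuses, and the choice to give evolving nodes a dashed outline are assumptions for illustration, not AAM's own visualization.

```python
# Hypothetical memory: capability -> (status, capabilities it depends on).
capability_map = {
    "auth": ("stable", []),
    "payments": ("evolving", ["auth"]),
    "notifications": ("evolving", ["payments"]),
}


def to_mermaid(capability_map: dict) -> str:
    """Render the capability map as Mermaid flowchart text."""
    lines = ["flowchart LR"]
    # Declare every node first so isolated capabilities still appear.
    for name in capability_map:
        lines.append(f"    {name}[{name}]")
    # One arrow per dependency, pointing from dependent to dependency.
    for name, (_status, deps) in capability_map.items():
        for dep in deps:
            lines.append(f"    {name} --> {dep}")
    evolving = [
        name for name, (status, _deps) in capability_map.items()
        if status == "evolving"
    ]
    if evolving:
        # Dashed outline for anything still in flux.
        lines.append("    classDef evolving stroke-dasharray: 5 5")
        lines.append(f"    class {','.join(evolving)} evolving")
    return "\n".join(lines)


print(to_mermaid(capability_map))
```

Because the output is plain text, it can be regenerated on every change and diffed like any other file in the repository.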

And honestly, that alone already changes a lot.

Because this is not really about generating prettier diagrams or replacing UML tooling. Any AI coding assistant can already generate diagrams if you ask for them.

The real problem is consistency.

How many times are you realistically going to regenerate architecture diagrams manually while the system keeps evolving every single day?

That is the gap this workflow tries to solve.

The architecture evolves together with the project itself.

So if someone asks about the structure of your system, you do not need to mentally reconstruct everything again or dig through old implementation history. You can simply open the architectural memory layer and immediately understand how the system currently behaves, how capabilities connect, what changed recently, and where the important boundaries exist.

That is the core idea behind AAM.

Not static documentation.

Not enterprise process.

Just persistent architectural memory for systems evolving at AI speed.

And honestly, I think approaches like this will slowly become normal in AI-native development workflows, especially alongside ideas like LLM Wiki.

Because the faster AI systems become at building software, the more important architectural memory becomes for the humans working with them.

🔗 Connect with Me

📖 Blog by Naresh B. A.

👨‍💻 Building AI & ML Systems | Backend-Focused Full Stack

🌐 Portfolio: Naresh B A

📫 Let's connect on LinkedIn | GitHub: Naresh B A

Thanks for spending your precious time reading this. It's my personal take on a tech topic, and I really appreciate you being here. ❤️

Top comments (2)

Harjot Singh • May 31

"Memory for AI vs memory for humans" is a sharp distinction and worth pulling apart, because the two problems pull in opposite directions. AI memory optimizes for retrieval - dense embeddings, fast nearest-neighbor, recall whatever's relevant to this token. Human memory optimizes for the opposite: forgetting on purpose, surfacing at the right moment (spaced repetition), and connecting ideas you didn't ask for. A system that just bolts vector search onto your notes gives you a search box, not memory - the human version needs intent (why did I save this), resurfacing, and association, not just "find the nearest chunk."

The design tension I'd watch: trust and noise. Human memory tools live or die on signal - resurface the wrong thing and people stop trusting it, same way a noisy notification gets muted. So the hard part isn't storing, it's deciding what's worth bringing back and when. That "surface only what earns it" discipline is something I think about a lot in Moonshift, the thing I build - a multi-agent pipeline that takes a prompt to a deployed SaaS, where a verify layer decides what's good enough to act on rather than dumping everything. Multi-model routing keeps a build ~$3 flat, first run's free no card. Really like the framing. What's your resurfacing trigger - time-based (spaced repetition), or contextual (you're working on X, here's what you saved about X)? The contextual one is harder but it's the thing that would actually feel like memory.

NARESH • Jun 14

all great points. i actually lean more toward contextual resurfacing than time-based triggers. if i'm working on payments, i care about payment-related architectural context, not something that changed three weeks ago. i think the real challenge isn't storing memory, it's deciding what deserves attention at a given moment.