Vektor Memory

The Capability Curve Has No Memory

And everyone keeps building anyway. What choice do we really have?

Anthropic urges coordinated pause on advanced AI development
Last week they published a progress report by Marina Favaro and Jack Clark that I have not been able to stop thinking about. It argues that AI systems are accelerating and could reach “recursive self-improvement”:
https://www.anthropic.com/institute/recursive-self-improvement

Not because of the headline numbers, though those are striking enough. Claude authored over 80% of the code merged into Anthropic’s own codebase, and the same is now happening at other frontier companies. Engineers are shipping eight times more output per quarter than they did two years ago. An agent completing tasks that would take a skilled human sixteen hours, working continuously, without being redirected once.

What got me was the graph showing lines of code per engineer over time. Flat for four years. Then a sharp bend upward in 2025, when Claude started running code rather than just suggesting it: the ouroboros, a binary Gödel machine feeding code back into itself. Then steeper again in 2026, when agents started working autonomously over longer horizons.

Since I wrote this piece yesterday, Anthropic has released Fable 5

Their most capable model yet, available for general use. The numbers are striking: Stripe reported it compressed months of engineering into a single day on a 50-million-line codebase. Drug design running ten times faster. A week of autonomous genomics research producing results that outperformed a published paper in Science.

But the detail that stood out most was buried in the memory section of the announcement. When Anthropic gave Fable 5 access to persistent file-based notes while playing a game, performance improved three times more than it did for their previous model.

The same capability jump, amplified dramatically by memory. Anthropic built that test into their own product launch because they already know what the data shows: the more capable the model, the more it benefits from structured state across time. The capability curve and the memory curve are not independent. They compound each other, and right now only one of them is being invested in at scale.

And stop building!

But I don’t think anyone will, even at Anthropic's request. Technology is like an organism: it just keeps evolving.

Smart cookies, Anthropic. In just a few years they managed to get the moola, 1 trillion, in fact. Purchasing the missing puzzle pieces of infrastructure, like Vercept, Bun, Coefficient Biohealth, Fractional AI, and Stainless (the SDK experts, who counted Anthropic among their largest clients), makes sense symbiotically and strategically. Well played.

I don’t know everything going on inside Anthropic, but Dario and his team are starting to look like 4D chess grandmasters.

I looked at that graph and felt two things at the same time: genuinely impressed (I really like Anthropic) and, if I’m honest, a little concerned.

The concentration of control: pretty much all of the brains and infrastructure in AI will be consolidated into a handful of Silicon Valley tech companies, reminiscent of the late 80s and 90s, when Microsoft's licensing deals with hardware manufacturers made its OS the default on almost every PC sold. That's why Linux was smart to pivot to servers, where it has retained something like 60% of market share to this day. Ubuntu is great: it works and very rarely has any reliability issues, and the same goes for Red Hat and Debian.

The Inflection Point Nobody Has a Map For

Here is what I think is actually happening, and why the idea of a pause is rational rather than alarmist.

We are approaching a threshold. Not gradually, but in the way the frog in hot water approaches boiling: nothing much visible, then everything all at once. Once an agent can reliably replicate its own development cycle and sustain above 90% code accuracy on open-ended tasks, the nature of human work does not just change. It restructures from the ground up; it amplifies and compounds.

The Anthropic article is careful to frame this as a positive development, and they are not wrong. More code shipped faster, bugs caught before production, research that would have taken humans months or years completed in weeks. Real gains for real problems.

But here is what it means on the ground for the people doing the work. The volume of what needs to get done does not decrease. It multiplies. What changes is the type of work. Manual execution gives way to high-level direction. Writing code gives way to reviewing it, shaping it, and deciding what strategic problems it should be solving. The human role becomes a layer of high-level authorisation above an autonomous system that is already capable of most of the execution.

That is not less work. It is a more complex, more cerebral job, one that requires multidisciplinary experience and a detective's problem-solving skills. Ten times the output means ten times the decisions, ten times the context to hold, and ten times the responsibility for what ships correctly. That's the compounding effect.

And agentic bots are going to do all of this for us; some already are.

Being the head of human-in-the-loop (HITL) is not easy; stuff moves so quickly. Did you really read 20 pages of code and text from 20 different projects on your mobile phone and approve all of them instantly?

You Already Need to Know 100 Things

I feel this shift personally, and I feel it constantly.

Building VEKTOR as a solo developer means I am a developer, a product manager, a security engineer, a devops engineer, a content writer, a growth person, a customer support function, and a business owner, all at once.

AI has made each of those roles individually more accessible and even feasible. It has also made it technically possible to run all of them simultaneously in a way that was not realistic before, via delegation. Going back in time, I remember when we had 5 systems at work: an Oracle Unix green screen (it never crashed once), which was fast but took repetition to learn; one database; Outlook; and the intranet. Then Salesforce came along, and 20 other apps were bolted on.

The result is not fewer tasks. It is more complicated work, spread across more domains and more systems with APIs and MFA logins, with higher stakes at each one.

Even humans can't work this captcha out. Agentic bots are going to need a standardized system to traverse the internet without getting blocked.

And yes, the biggest brains are working on this problem right now. Solving multiple agentic bot layers with credentialed passports.

Whoever thought of this captcha idea above needs to be spanked immediately.

This week I was mid-session debugging a certbot renewal failure on the VPS when it became clear the issue was a credentials format mismatch between an old apt-installed certbot version and a Cloudflare API token that expected a newer format. The fix required understanding the snap package ecosystem, the certbot renewal hook architecture, and the Cloudflare API token permission model, all at the same time.

Claude handled it flawlessly, logging into the VPS via Vektor Cloak SSH tools and working through all of it in 5 minutes. I didn't really do anything but authorise and ask a few questions about how we could fix it for good, as I was working on other issues in another browser window.

Without Claude, I would have lost a few hours manually running cert checks in Ubuntu and scratching my head.
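For anyone hitting a similar credentials mismatch, here is a minimal sketch of a pre-flight check. It assumes the renewal goes through the certbot-dns-cloudflare plugin, whose credentials file comes in two styles: a scoped API token (`dns_cloudflare_api_token`) or the older global key pair (`dns_cloudflare_email` plus `dns_cloudflare_api_key`). Older packaged plugin versions may only understand the global-key style, which is one way a mismatch like this can arise. The function names and the file path are hypothetical, and this is not a record of the fix Claude actually applied.

```python
from pathlib import Path

# Key names read by the certbot-dns-cloudflare plugin's credentials file.
TOKEN_KEY = "dns_cloudflare_api_token"                             # scoped API token style
GLOBAL_KEYS = {"dns_cloudflare_email", "dns_cloudflare_api_key"}   # older global-key style


def parse_credentials(text: str) -> dict[str, str]:
    """Parse a flat `key = value` credentials file into a dict (hypothetical helper)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def credential_style(text: str) -> str:
    """Return 'token', 'global-key', 'mixed', or 'unknown' for a credentials file."""
    keys = set(parse_credentials(text))
    has_token = TOKEN_KEY in keys
    has_global = GLOBAL_KEYS <= keys  # both email and key must be present
    if has_token and has_global:
        return "mixed"
    if has_token:
        return "token"
    if has_global:
        return "global-key"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical location; point this at wherever your credentials file lives.
    path = Path("/root/.secrets/cloudflare.ini")
    if path.exists():
        print(credential_style(path.read_text()))
```

Knowing which style the file uses tells you whether the installed plugin version needs upgrading or the file needs rewriting, before the renewal hook fails at 3am.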

But here is the thing: Claude did not know any of that context when the session started. I had to authorise the skill file, which has all the system commands to access the VPS, and explain what the cert structure issue looked like. The intelligence was there to diagnose and fix the problem quickly once the context was known. The memory of the prior work was not yet fully formed, so another memory node was added to save time in the future.

That gap is not a minor inconvenience. At the pace we are now expected to operate, losing context between sessions is structurally expensive. It is a heck of a lot better than it was six months to a year ago, that's for sure.

It is getting to the point where my prompts are caveman-like: the memory graph and skill files are so dense that Claude already knows the context, and I very rarely have to explain anything in great detail.

The Todo List That Taught Me Something

A few days ago I asked Claude to help me build a proper to-do list.

Standard Claude behavior followed. Within a few exchanges there was a proposed graphical interface, charts, colour coding by priority, a dashboard with status indicators. Genuinely impressive in its way. Also completely wrong for what I needed.

I told him: text-based list only. He complied immediately, without argument. Produced exactly what I asked for.

That small interaction has stayed with me because it captures something important about where we actually are. The capability is extraordinary. The judgment about what is appropriate for a given context is not yet reliable.

The human in the loop is not just there to authorise; we are there to calibrate. To say: not a dashboard, a list. Not sixteen layers of abstraction, one flat file. Not the most impressive Kanban board solution, the right one for the moment.

That calibration role is real and valuable. But it requires the human to maintain a clear head about what they actually want, which is harder than it sounds when the agent is confidently generating impressive-looking shiny output at speed.

The Things AI Still Cannot Remember

The hardest part of being a developer right now is not writing code. The code is, increasingly, becoming the easy part.

The hard part is remembering everything that needs to get done and when.

The VPS certificates that expire in 27 days. The mobile app submission that is sitting at step four of nine waiting for a policy acknowledgement. The three emails from Google Play that each require a checkbox response and a reupload and then a wait and then another adjustment. Google has now become more bureaucratic than the government. The dependency that needs updating before the next release. The blog post half-written. The changelog not updated. The analytics tag still pointing at the wrong domain.

None of that is difficult work. All of it requires context, continuity, and memory across time. And that is precisely what current AI systems do not have in a structured form.

I built Vektor partly because I kept running into this problem in my own work. Not the capability gap, but the memory gap. The agent could help me fix the certbot issue, but it could not remember that we had looked at this same problem six weeks ago and had taken a different approach that turned out to be wrong. It could not connect the current error to the prior attempt, or work out how to get back into the VPS folder structure to look at it. It could not carry forward the context that makes accumulated work compound rather than reset.

That is what persistent memory architecture is actually for. Not impressing people with recall of trivia from earlier conversations. Enabling agents to do work that compounds across time the way a human engineer’s experience does. Building the institutional knowledge layer that makes the difference between an agent that is capable and an agent that actually learns.

There are going to be many new issues, but once one is resolved, you don't want to have to repeat yourself. That is the metric that needs calculating: how accurately an agent recalls past tasks when working on present ones.
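One simple way to put a number on that recall metric, as a sketch: take a set of tasks that were resolved in earlier sessions, ask the agent what the resolution was, and score the fraction it gets right. The function name, the exact-match scoring rule, and the task labels below are all assumptions for illustration; this is not how VEKTOR measures anything.

```python
def recall_accuracy(expected: dict[str, str], recalled: dict[str, str]) -> float:
    """Fraction of previously resolved tasks whose resolution was recalled correctly.

    expected: task id -> the resolution that actually worked in a past session
    recalled: task id -> what the agent says the resolution was (may omit tasks)
    """
    if not expected:
        raise ValueError("need at least one previously resolved task")
    # A task counts as a hit only if it was recalled and matches exactly.
    hits = sum(1 for task, fix in expected.items() if recalled.get(task) == fix)
    return hits / len(expected)


if __name__ == "__main__":
    # Hypothetical task labels and resolutions.
    expected = {"task-a": "fix-1", "task-b": "fix-2", "task-c": "fix-3"}
    recalled = {"task-a": "fix-1", "task-b": "something else"}
    print(f"{recall_accuracy(expected, recalled):.2f}")
```

Exact string matching is the crudest possible scoring rule. In practice you would probably want a looser comparison, but the shape of the metric stays the same: resolved tasks in, fraction correctly recalled out.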

What the Graphs Do Not Show

The Anthropic charts are impressive. The productivity curves bending upward. The benchmark saturations. The task horizon doubling every few months.

What those charts do not show is what is happening inside the agents doing the work. They show output. They do not show whether the system is building structured knowledge of its own history, or whether each session is still starting cold and rediscovering the same territory.

A dozen agents that can work for 24 hours but forget everything at the end of the session are not a self-improving system. The difference matters enormously once you are thinking about what recursive self-improvement actually means in practice.

For that loop to close properly, for AI development of AI systems to genuinely compound rather than just accelerate, the memory architecture has to be as solid as the capability architecture. The causal record of what was tried and why. The structured knowledge of what failed and under what conditions. The accumulated context that lets the next session start from where the last one ended rather than from zero.
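To make "the causal record of what was tried and why" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, column names, and helper functions are all hypothetical, chosen only to mirror the three things listed above: what was tried and why, what failed and under what conditions, and a way for the next session to pick that up. It is not VEKTOR's schema or API.

```python
import sqlite3

# Hypothetical schema: one row per attempt at a problem.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attempts (
    id         INTEGER PRIMARY KEY,
    problem    TEXT NOT NULL,             -- short key for the issue
    approach   TEXT NOT NULL,             -- what was tried
    reason     TEXT NOT NULL,             -- why it was tried
    outcome    TEXT NOT NULL CHECK (outcome IN ('worked', 'failed')),
    conditions TEXT NOT NULL DEFAULT '',  -- environment the outcome was observed in
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the attempts store; pass a file path to persist across sessions."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SCHEMA)
    return conn


def record_attempt(conn, problem, approach, reason, outcome, conditions=""):
    """Append one attempt to the record. The `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO attempts (problem, approach, reason, outcome, conditions) "
            "VALUES (?, ?, ?, ?, ?)",
            (problem, approach, reason, outcome, conditions),
        )


def prior_attempts(conn, problem):
    """Return every earlier attempt at this problem, oldest first."""
    rows = conn.execute(
        "SELECT approach, reason, outcome, conditions FROM attempts "
        "WHERE problem = ? ORDER BY id",
        (problem,),
    )
    return [dict(row) for row in rows]


if __name__ == "__main__":
    conn = open_memory()
    record_attempt(conn, "certbot-renewal", "approach A", "looked quickest",
                   "failed", "old apt-installed certbot")
    record_attempt(conn, "certbot-renewal", "approach B", "approach A failed", "worked")
    for row in prior_attempts(conn, "certbot-renewal"):
        print(row)
```

The point of the sketch is the read path: a new session that calls `prior_attempts` before touching the problem starts from the last session's conclusions, including the dead ends, rather than from zero.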

That is the infrastructure problem that needs solving in parallel with the capability problem. And it is, right now, significantly underdeveloped relative to the capability curve that Anthropic’s graphs describe.

On Being Concerned and Building Anyway

I want to return to that concern for a moment, because I do not think it should be dismissed.

The concern is not that AI will become capable. It already is. The concern is the pace of restructuring that follows from that capability, and whether the humans doing the work have enough time and support to adapt to roles that are genuinely different from the ones they trained for.

The shift toward high-level direction and authorisation work is real, but it is not a gentle transition. It happens fast, it is not evenly distributed, and the skills it requires (broad theoretical knowledge, clear communication of intent, calibration of agent output, and strategic decision-making across many domains simultaneously) are not skills that most people have been explicitly developing.

I feel the gap in my own work every day. Not in my ability to use the tools, but in the cognitive load of operating across so many domains at once while maintaining the judgment to know when the impressive output is the right output and when it needs to be deflated to a plain text list.

That cognitive load is going to increase, not decrease, as the capability curve steepens.

The answer is not to slow the curve. That is neither possible nor, honestly, desirable. The gains are real. The work being done is genuinely good.

The answer is to build the infrastructure (the ability to traverse memory, context, continuity, and structured knowledge) that makes the human direction layer sustainable rather than overwhelming. To make the authorisation work tractable rather than a firehose of decisions without adequate context.

That is what I am building toward. Not because the problem is solved, but because I can see clearly that it is the right problem to be working on.

The graphs are impressive. The gap they do not show is who is going to maintain 200 agentic bot decisions across 200 different API-connected systems every hour on cron autopilot mode and still manage to have lunch.

I guess we all could have more pressing issues to worry about when that finally happens.

I'm going out to lunch; Claude, you run the show and make good decisions.

Made by the developer behind VEKTOR Slipstream, a local-first persistent memory SDK for AI agents. It runs on SQLite, recalls in 8ms, and ships with a 4-layer causal graph architecture. vektormemory.com

