clearloop for OpenWalrus

Originally published at openwalrus.xyz

Less code, more skills

OpenWalrus is a single binary. No Docker, no microservices, no plugin
runtime with a package manager. One cargo install, one process, and
you have a fully autonomous AI agent runtime on your machine.

Keeping it that way while scaling to every possible use case is the
central design tension of the project. And it's the same tension every
agent framework faces: how do you stay small without becoming limited?

Our answer is a design principle we keep coming back to: less code,
more skills.

The framework bloat trap

Agent frameworks grow fast. A team ships a coding agent. Users ask for
web browsing, so they add a browser tool. Users ask for memory, so they
add a memory subsystem. Users ask for RAG, so they bundle an embedding
model. Users ask for customization, so they add configuration layers —
CLAUDE.md, .cursorrules, AGENTS.md, TOOLS.md, MEMORY.md, memory banks,
auto-generated observations, reflections, compressed histories.

Every feature request answered with framework code makes the repo bigger,
the binary heavier, the surface area wider, and the maintenance burden
steeper. Eventually the framework is doing so much that it becomes the
bottleneck — slow to build, hard to debug, impossible to audit.

The system prompt suffers the same inflation. Research shows frontier LLMs
reliably follow around 150-200 instructions. Past that, adherence degrades
— sometimes exponentially for smaller models. Every feature that injects
more context into the prompt makes the agent worse at everything else.

We've watched this happen. We hit the ceiling ourselves. And we stopped
pushing through it.

The principle: small core, open surface

The walrus repo should stay compact. Not because we're lazy, but because
a compact core is a correct core — easier to audit, easier to trust,
easier to run on constrained hardware.

But a compact core only works if the surface area for extension is wide
open. This is where skills come in.

Diagram — see original post

The core handles what only the core can handle: LLM inference, agent
lifecycle, tool dispatch, and a
graph memory layer backed by
LanceDB + lance-graph.
Both are embedded, Rust-native, and compile into the walrus binary — no
separate database server, no Docker. This is the code we maintain.
It should be small, correct, and boring.
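To make "tool dispatch" concrete, here is a minimal sketch of what a small, boring dispatch layer can look like. This is illustrative only — the trait, the `Echo` tool, and the `Dispatcher` type are hypothetical names, not OpenWalrus's actual API:

```rust
use std::collections::HashMap;

// Hypothetical tool interface: one capability, one trait impl.
// OpenWalrus's real interface may differ.
trait Tool {
    fn name(&self) -> &'static str;
    fn call(&self, input: &str) -> Result<String, String>;
}

// A trivial built-in tool for demonstration.
struct Echo;
impl Tool for Echo {
    fn name(&self) -> &'static str { "echo" }
    fn call(&self, input: &str) -> Result<String, String> {
        Ok(input.to_string())
    }
}

// The dispatcher the core owns: route a model-issued call by name.
struct Dispatcher {
    tools: HashMap<&'static str, Box<dyn Tool>>,
}

impl Dispatcher {
    fn new() -> Self {
        Dispatcher { tools: HashMap::new() }
    }

    fn register(&mut self, tool: Box<dyn Tool>) {
        let name = tool.name();
        self.tools.insert(name, tool);
    }

    fn dispatch(&self, name: &str, input: &str) -> Result<String, String> {
        match self.tools.get(name) {
            Some(tool) => tool.call(input),
            None => Err(format!("unknown tool: {name}")),
        }
    }
}
```

The point of keeping this layer this small is that everything interesting lives behind the `Tool` boundary — the dispatcher itself never needs to change when capabilities grow.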

Skills and MCP servers handle everything else. A skill is a behavioral
template — instructions and patterns that tell an agent how to approach a
domain, including which entity types and relationships to extract from
conversations. MCP servers can register new entity types at runtime.
The community writes them. Users mix and match them. The repo doesn't grow.

This is the Unix philosophy applied to agent runtimes. Small tools that
compose, not monolithic systems that configure.

Three layers of extension

The "small core, open surface" idea plays out in a consistent
three-layer model across every walrus subsystem — tools, memory, and
entity types all follow the same pattern.

Diagram — see original post

Layer 1 — Framework built-ins. The things only the core can provide.
A filesystem tool, a shell tool, an HTTP client, four memory tools
(remember, recall, relate, forget), and three base entity types
(Agent, User, Episode). This is the floor — always available,
always correct.

Layer 2 — Skills. Behavioral templates that tell the agent how to
approach a domain. A coding skill declares entity types like File,
TestFailure, ArchDecision and teaches the agent how to extract them.
A research skill declares Paper, Topic, Citation. A DevOps skill
teaches the agent to compose kubectl and terraform commands. Skills
are a few hundred lines of behavioral description, not compiled code.

Layer 3 — MCP servers. External capabilities connected at runtime.
A Jira MCP registers Ticket, Sprint, Epic as first-class entities.
A GitHub MCP adds PR, Issue, Commit. The agent's capability surface
grows without any framework changes.
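The three layers can share one registration path. The sketch below is a toy model of that idea, assuming a hypothetical `EntityRegistry` type — the real OpenWalrus code is not shown in this post:

```rust
use std::collections::BTreeSet;

// Illustrative labels for where a registration came from.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Layer {
    Builtin, // Layer 1: framework
    Skill,   // Layer 2: behavioral template
    Mcp,     // Layer 3: external server
}

// Hypothetical registry of extractable entity types.
struct EntityRegistry {
    types: BTreeSet<String>,
}

impl EntityRegistry {
    // Layer 1: the floor — the three base types are always present.
    fn new() -> Self {
        let mut types = BTreeSet::new();
        for t in ["Agent", "User", "Episode"] {
            types.insert(t.to_string());
        }
        EntityRegistry { types }
    }

    // Layers 2 and 3 go through the same call: registering new
    // entity types never requires a framework change.
    fn register(&mut self, _layer: Layer, names: &[&str]) {
        for n in names {
            self.types.insert(n.to_string());
        }
    }

    fn knows(&self, name: &str) -> bool {
        self.types.contains(name)
    }
}
```

Installing a coding skill would then be `registry.register(Layer::Skill, &["File", "TestFailure", "ArchDecision"])`, and connecting a Jira MCP would be `registry.register(Layer::Mcp, &["Ticket", "Sprint", "Epic"])` — same mechanism, different layer.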

Every subsystem follows this pattern. Memory isn't special. Tools aren't
special. Entity types aren't special. The extension model is the same
everywhere — which means learning it once is enough.

Memory: the first test of the principle

Memory was where we first applied "less code, more skills" — and where
the principle proved itself.

Our survey of existing memory systems
showed every product building a comprehensive memory subsystem. Claude Code
with markdown files and auto-memory. OpenClaw with SQLite + vectors and
hybrid search. ChatGPT with a proprietary backend. Each is a bet on one
particular memory layout being right for most users.

Instead of building a universal memory framework with config files and
journal directories, we collapsed everything into a single layer: a
temporal knowledge graph backed by
LanceDB + lance-graph. Agent identity, user preferences, conversation
episodes, extracted entities — all graph nodes. Four tools to interact
with it. Skills define what to extract; the core handles how.
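The four-verb surface is small enough to sketch in full. This is a toy in-memory stand-in — the real layer is a temporal knowledge graph on LanceDB + lance-graph, and the `Memory` type here is purely hypothetical:

```rust
use std::collections::{HashMap, HashSet};

// Toy node/edge store illustrating the four memory tools:
// remember, recall, relate, forget.
#[derive(Default)]
struct Memory {
    nodes: HashMap<String, String>,           // id -> content
    edges: HashSet<(String, String, String)>, // (from, relation, to)
}

impl Memory {
    // remember: store a node.
    fn remember(&mut self, id: &str, content: &str) {
        self.nodes.insert(id.to_string(), content.to_string());
    }

    // recall: fetch a node (the real tool does hybrid graph +
    // vector retrieval, not key lookup).
    fn recall(&self, id: &str) -> Option<&String> {
        self.nodes.get(id)
    }

    // relate: add a typed edge between two nodes.
    fn relate(&mut self, from: &str, relation: &str, to: &str) {
        self.edges
            .insert((from.to_string(), relation.to_string(), to.to_string()));
    }

    // forget: drop a node and every edge touching it.
    fn forget(&mut self, id: &str) {
        self.nodes.remove(id);
        self.edges.retain(|(from, _, to)| from != id && to != id);
    }
}
```

Four verbs is the whole tool surface an agent sees; what gets extracted into those nodes and edges is decided by whichever skills are installed.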

The memory schema grows with the agent's capability surface. Install a
coding skill and File, TestFailure become extractable entity types.
Connect a Jira MCP and Ticket, Sprint appear. No framework changes.
No config files. The three-layer extension model does the work.

Read the full deep-dive in
Graph + vector: how OpenWalrus agents remember.

Beyond memory

"Less code, more skills" isn't just a memory strategy. It's how we think
about every feature request.

When someone asks "can walrus browse the web?" — the answer isn't a
built-in browser engine. It's an HTTP tool and a web browsing skill that
knows how to navigate, extract, and summarize.

When someone asks "can walrus manage my infrastructure?" — the answer
isn't a built-in cloud SDK. It's a shell tool and a DevOps skill that
knows how to compose kubectl, terraform, and aws commands.

When someone asks "can walrus do X?" — the answer is almost always:
the tools already exist; we just need a skill.

This keeps the repo compact. Every skill is a few hundred lines of
behavioral description, not thousands of lines of compiled code. The
core stays auditable. The binary stays small. And the ecosystem of what
walrus can do grows without bound — because the community builds it,
not us.

The tradeoff

This isn't free. Pushing intelligence to skills means:

  • The core tools have to be excellent. If the built-in tools are unreliable, no skill can compensate. This is where our engineering effort goes — making the foundational layer rock-solid.
  • Quality varies. Community skills won't all be good. Some will be brilliant, most will be adequate, a few will be wrong. Curation and testing matter.
  • Discovery is harder. Users need to find the right skill for their use case. This is a community infrastructure problem we haven't fully solved yet.
  • Skills need good documentation. A skill is only as useful as its instructions are clear. Bad behavioral descriptions produce bad agent behavior — garbage in, garbage out.

But the alternative — baking every capability into the framework — is
worse. It makes the repo unmaintainable, the binary bloated, and the
system prompt overloaded. We'd rather have a small, correct core and a
messy ecosystem than a bloated, fragile framework and no ecosystem at all.

Stop injecting, start enabling

The system prompt was never meant to be a database. It was meant to be
a brief set of instructions — who you are, how you behave, what tools
you have. The moment we started using it as a persistence layer, we
created a problem that no amount of engineering can solve.

The fix isn't more framework code. It's better tools and shareable skills.

Keep the core compact. Keep the surface open. Let agents and communities
build the intelligence. Less code, more skills.

Get started with OpenWalrus →

