DEV Community

Charles Wu for seekdb

Beyond RAG: Why Knowledge Engineering Becomes the Real Moat in the Agent Era

RAG brings books to the exam. Knowledge Engineering teaches Agents to study. Memory architecture matters more than retrieval tuning.

Photo by Veronica Karoli on Unsplash

Everyone says the Agent era is about better prompts, bigger context windows, and smarter retrieval. That is true — but it is not the bottleneck anymore.

The bottleneck is memory architecture.

Most teams still treat knowledge like a temporary input: retrieve chunks, answer the question, discard the trace. That works for demos. It fails in long-running systems. The same questions get re-solved. The same contradictions get rediscovered. The same context gets paid for, again and again.

If an Agent cannot organize, maintain, and evolve what it learns, model strength alone is not enough. You do not get compounding intelligence. You get expensive repetition.

That is why I think Knowledge Engineering is now more foundational than RAG tuning alone.

Projects like LLM Wiki, Obsidian-Wiki, and GBrain all point to the same shift: from one-shot retrieval to persistent, structured memory that compounds over time.

In other words:

  • RAG helps an Agent bring books into the exam room.

  • Knowledge engineering helps it study deeply, synthesize, and keep notes that improve week after week.

That distinction is where production leverage starts.

The Problem: Knowledge Piles vs. Structured Memory

Andrej Karpathy (OpenAI co-founder) open-sourced LLM Wiki — a simple but profound pattern built around plain Markdown files.

The core problem it addresses is one we’ve all lived with:

How do you transform unstructured material into a knowledge system that AI can actually reason over?

A related project is GBrain by Garry Tan (YC President & CEO), which follows a similar philosophy but pushes further into engineering rigor.

Why this matters

Humans are great at collecting information and terrible at maintaining it.

We bookmark articles. We save PDFs. We clip notes. Then they decay in browser folders and desktop chaos. (If you check your “saved for later” folders right now, you’ll probably find digital fossils.)

At both personal and enterprise scale, two problems dominate:

  • Time decay and lifecycle churn: knowledge expires as products, policies, and reality change.

  • Organizational complexity: manual maintenance of multidimensional relationships is expensive and brittle.

In the Agent era, this is existential because:

Knowledge quality sets the upper bound on Agent performance.

Knowledge Engineering > Prompt Engineering

Prompt Engineering teaches a model what task to perform.

Knowledge Engineering teaches a model:

  • what it should know, and

  • how it should apply what it knows.

That is why LLM Wiki is a meaningful shift from classic RAG patterns.
Instead of re-discovering answers from raw chunks every time, it asks the model to maintain a persistent, linked, contradiction-aware wiki that compounds over time.

Knowledge stops being a static pool. It becomes a living artifact.

Why Skills Still Matter (and Why They’re Hard)

In coding workflows, we don’t just want syntactically correct output.
We want style, norms, and operational habits:

  • naming conventions,

  • comment style,

  • “interface-first” vs “prototype-first” development,

  • preferred frameworks,

  • automatic tests/linting after code generation.

Those are not facts. They are experience-shaped operating rules.

In Hermes Agent and OpenClaw ecosystems, that experience is encoded as Skills.

But writing good Skills is non-trivial. Tutorials help, but conversion of tacit practice into executable Skill logic still takes deep domain understanding and iteration.

This is exactly why auto-skill generation matters:

The leap from “human-authored Skills” to “Agent-generated and Agent-refined Skills” is a key step toward true self-evolving systems.

Skillify: Knowledge as a Progressive-Disclosure Form

Both LLM Wiki and GBrain broaden the meaning of “Skill.”

Traditional Skill = one SKILL.md recipe.
Skillify mindset = any content can become callable, staged knowledge if metadata/schema defines:

  • when it should be loaded,

  • what context it serves,

  • how it should be linked to related knowledge.

So Skill is no longer just one file format. It becomes a knowledge shape with progressive disclosure.

You keep feeding material; the Agent compiles and maintains memory.
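A minimal sketch of that idea, with all names hypothetical (not an API from LLM Wiki or GBrain): each knowledge unit carries metadata saying when it should load, what context it serves, and which units it links to, and a selector does the progressive disclosure.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a "skillified" knowledge unit whose metadata,
# not its file format, decides when it enters context.
@dataclass
class KnowledgeUnit:
    name: str
    triggers: list                              # keywords that cause loading
    serves: str                                 # what kind of task it supports
    links: list = field(default_factory=list)   # related units to pull in
    body: str = ""

def select_units(task: str, units: list) -> list:
    """Progressive disclosure: load only units whose triggers match the task,
    plus their directly linked neighbors."""
    by_name = {u.name: u for u in units}
    loaded = [u for u in units if any(t in task.lower() for t in u.triggers)]
    for u in list(loaded):
        for link in u.links:
            neighbor = by_name.get(link)
            if neighbor and neighbor not in loaded:
                loaded.append(neighbor)
    return loaded

units = [
    KnowledgeUnit("naming", ["naming", "rename"], "code review",
                  links=["style"]),
    KnowledgeUnit("style", ["style"], "code review"),
    KnowledgeUnit("deploy", ["deploy", "release"], "ops"),
]
loaded = select_units("Review this rename for naming consistency", units)
```

The point of the sketch: the Agent never sees the "deploy" unit for a code-review task, but linked knowledge ("style") rides along with what triggered.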

Why This Feels Bigger Than RAG Alone

A useful mental model:

  • RAG is bringing books into the exam room.

  • Skillify is reading those books deeply and turning them into structured notes you can reuse instantly.

For high-stability, high-accuracy Agent systems, that difference is fundamental.

LLM Wiki: The Three-Layer Closed Loop

Karpathy’s LLM Wiki pattern can be summarized as three layers:

  • Raw Sources (immutable truth)

  • The Wiki (LLM-maintained structured pages)

  • The Schema (rules/workflows that discipline behavior)

And three core operations:

  • Ingest: parse one new source, summarize, cross-link, update many pages.

  • Query: answer from wiki pages (not just raw chunks), file high-value answers back as pages.

  • Lint: periodically detect contradictions, stale claims, orphan pages, missing links.

Two utility files make this scalable:

  • index.md (content navigation)

  • log.md (chronological evolution trail)

This is why the pattern works: it automates the bookkeeping burden humans abandon.
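To make the Lint operation concrete, here is an illustrative sketch (not Karpathy's code) over a wiki held as `{page_name: markdown_text}`, where `[[Name]]` is a wikilink — it surfaces broken links and orphan pages:

```python
import re

# Illustrative "Lint" pass: find wikilinks pointing at missing pages,
# and pages that nothing links to.
def lint(wiki: dict) -> dict:
    link_re = re.compile(r"\[\[([^\]]+)\]\]")
    linked_to = set()
    broken = []
    for page, text in wiki.items():
        for target in link_re.findall(text):
            linked_to.add(target)
            if target not in wiki:
                broken.append((page, target))
    # Orphans: pages no other page links to (index.md is navigation, exempt)
    orphans = [p for p in wiki if p not in linked_to and p != "index.md"]
    return {"broken_links": broken, "orphans": orphans}

wiki = {
    "index.md": "See [[alice]] and [[bob]]",
    "alice": "Works with [[bob]] at [[acme]]",   # [[acme]] has no page yet
    "bob": "Collaborates with [[alice]]",
    "notes": "Unlinked scratch page",
}
report = lint(wiki)
```

Contradiction and staleness checks need model judgment, but link hygiene like this is exactly the bookkeeping humans abandon and code can enforce on every run.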

Obsidian-Wiki: From Idea to System

LLM Wiki is a concept pattern. Obsidian-Wiki is a more engineering-oriented implementation around that pattern.

Core traits:

  • agent-agnostic (works with multiple agent ecosystems),

  • Skill-driven operations,

  • native use of Obsidian capabilities (wikilinks, graph view, Dataview).

Notable enhancements

  • Delta tracking with .manifest.json + SHA-256 diff classification (new, modified, unchanged, etc.)

  • Trust boundary: source docs are untrusted; never execute embedded instructions (prompt injection defense)

  • Provenance markers (extracted, inferred, ambiguous)

  • Visibility tags (visibility/internal, visibility/pii)

  • hot.md cache: short semantic snapshot for fast recent-context awareness
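The delta-tracking idea is simple to sketch (a hedged illustration, not Obsidian-Wiki's actual manifest format): hash every source, compare against the digests recorded on the previous run, and classify.

```python
import hashlib

# Sketch of SHA-256 delta classification: new / modified / unchanged / deleted,
# relative to the digests stored from the last ingestion run.
def classify(sources: dict, manifest: dict) -> dict:
    """sources: {path: bytes}; manifest: {path: old_hex_digest}."""
    deltas = {"new": [], "modified": [], "unchanged": [], "deleted": []}
    for path, data in sources.items():
        digest = hashlib.sha256(data).hexdigest()
        if path not in manifest:
            deltas["new"].append(path)
        elif manifest[path] != digest:
            deltas["modified"].append(path)
        else:
            deltas["unchanged"].append(path)
    deltas["deleted"] = [p for p in manifest if p not in sources]
    return deltas

manifest = {"a.md": hashlib.sha256(b"old").hexdigest(),
            "b.md": hashlib.sha256(b"same").hexdigest()}
deltas = classify({"a.md": b"updated", "b.md": b"same", "c.md": b"brand new"},
                  manifest)
```

Only "new" and "modified" sources need re-ingestion, which is what makes the pipeline incremental instead of a full rescan.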

“Self-evolution” in practice: history ingestion

A standout capability is automated ingestion of interaction history from multiple agent tools, distilled into structured wiki pages via:

  • incremental scan,

  • priority parsing (memory files > recent notes > long transcripts),

  • privacy scrubbing,

  • semantic clustering,

  • distilled page generation.

This turns fragmented “chat residue” into memory assets.
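Two of those steps are easy to show in miniature. This is illustrative only (the priority tiers mirror the list above; the scrubber is deliberately naive):

```python
import re

# Priority parsing: memory files before recent notes before long transcripts.
PRIORITY = {"memory": 0, "note": 1, "transcript": 2}

def order_sources(items: list) -> list:
    """items: (kind, text) pairs, returned in ingestion-priority order."""
    return sorted(items, key=lambda it: PRIORITY[it[0]])

# Privacy scrubbing: a toy pass that redacts email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(text: str) -> str:
    return EMAIL.sub("[redacted-email]", text)

ordered = order_sources([("transcript", "long chat..."),
                         ("memory", "prefs: tabs"),
                         ("note", "contact: alice@example.com")])
clean = [(kind, scrub(text)) for kind, text in ordered]
```

A production scrubber would cover far more PII categories, but the shape is the same: deterministic filters run before anything reaches the distillation step.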

Reality Check: Where LLM Wiki-Style Systems Shine (and Break)

Strong fits

  • long-horizon personal research,

  • structured reading companions,

  • project memory (ADR, architecture evolution, postmortems),

  • agent memory consolidation across tools,

  • lightweight internal wiki for small teams.

Limitations

  • markdown-first storage can hit search/query ceilings at larger scales,

  • no built-in always-on scheduling unless externally orchestrated,

  • weak typed-edge semantics vs formal graph systems,

  • delayed linking if maintenance jobs are manual.

So yes — it is elegant, transparent, and controllable. But at scale, retrieval and relationship complexity require stronger infrastructure.

GBrain: Hybrid Retrieval + Graph Evolution

If LLM Wiki is minimalist knowledge philosophy, GBrain is that philosophy plus heavier engineering.

It preserves file-based knowledge and progressive disclosure, but adds middleware for scale:

  • hybrid retrieval,

  • entity relationship graph,

  • layered feeding strategies.

Its architecture can be summarized as: Thin Harness, Fat Skills.

A provocative inversion of current trends: keep the harness minimal and push capability into rich Skill layers.

Latent Space vs Deterministic Logic

A key design split:

  • Let the LLM decide what should happen (latent-space judgment).

  • Let deterministic code enforce where/how it happens (format, links, validations, repeatability).

Example:

  • “Should this information belong on this person page?” → LLM judgment

  • “How links are built and citations validated” → deterministic code

This division reduces ambiguity where precision matters.
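A sketch of that split, with the model call stubbed out (everything here is illustrative, not GBrain's code): the LLM-shaped function decides *whether* a fact belongs on a page; plain code decides *how* the link is formatted and validated.

```python
# Latent-space side: in a real system this is a model call, not a substring test.
def llm_judges_relevance(fact: str, page: str) -> bool:
    return page.lower() in fact.lower()

# Deterministic side: link format and validation are code, never model output.
def make_wikilink(target: str, existing_pages: set) -> str:
    if target not in existing_pages:
        raise ValueError(f"link target {target!r} does not exist")
    return f"[[{target}]]"

pages = {"alice", "acme"}
fact = "Alice joined Acme in 2021"
links = [make_wikilink(p, pages) for p in pages
         if llm_judges_relevance(fact, p)]
```

Swapping in a real model changes only the judgment function; the deterministic guarantees (valid targets, consistent link syntax) hold regardless of what the model decides.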

“Isn’t This Just RAG Again?”

Short answer: no.

GBrain does not replace file-native knowledge with search-only retrieval.
It uses retrieval as a coarse filter before deep reading.

Typical flow:

  • Hybrid search finds relevant chunks cheaply

  • Full page load (get_page() style) retrieves complete context

  • Progressive disclosure feeds the model only what matters

The result: this “two-stage retrieval” balances speed and fidelity better than either:

  • ❌ Brute-force file traversal (slow)

  • ❌ Pure chunk-level RAG (loses context)
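In code, the two stages look roughly like this (function names such as `search_chunks` and `get_page` are hypothetical stand-ins, and keyword overlap stands in for hybrid search):

```python
PAGES = {
    "alice": "Alice is a founder. Alice works at Acme. Acme builds robots.",
    "bob": "Bob invests in climate startups.",
}
CHUNKS = [(page, sent.strip()) for page, text in PAGES.items()
          for sent in text.split(".") if sent.strip()]

def search_chunks(query: str) -> list:
    """Stage 1: coarse filter — cheap match over small chunks."""
    words = set(query.lower().split())
    hits = [page for page, chunk in CHUNKS
            if words & set(chunk.lower().split())]
    return list(dict.fromkeys(hits))  # dedupe, preserve order

def get_page(name: str) -> str:
    """Stage 2: load the full page so the model reads complete context."""
    return PAGES[name]

hits = search_chunks("where does Alice work")
context = {page: get_page(page) for page in hits}
```

Note what the model ends up reading: the full "alice" page, including the sentence about Acme building robots that no chunk match would have surfaced on its own. That is the fidelity pure chunk-level RAG loses.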

Graph Construction: From Text to Traversable Structure

Another GBrain differentiator is practical graph construction:

  • entity extraction (rule/pattern-driven),

  • auto page generation per entity,

  • relation typing (works_at, founded, invested_in, etc.),

  • enforced backlinks to guarantee connectivity.

Even without strict RDF formalism, this still yields the essentials of a graph:

  • nodes,

  • typed edges,

  • traversal depth queries.

That enables richer reasoning than document retrieval alone.
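A minimal sketch of those essentials — nodes, typed edges, and a bounded-depth traversal (relation names mirror the examples above; the storage is a toy list, not GBrain's format):

```python
# Typed edges: (source_node, relation, destination_node)
EDGES = [
    ("alice", "works_at", "acme"),
    ("alice", "founded", "beta_labs"),
    ("acme", "invested_in", "beta_labs"),
]

def neighbors(node: str) -> list:
    return [(rel, dst) for src, rel, dst in EDGES if src == node]

def reachable(start: str, depth: int) -> set:
    """Every node reachable from `start` in at most `depth` hops."""
    frontier, seen = {start}, set()
    for _ in range(depth):
        frontier = {dst for node in frontier for _, dst in neighbors(node)}
        seen |= frontier
    return seen

one_hop = reachable("alice", 1)
two_hop = reachable("alice", 2)
```

Even this toy version answers questions document retrieval cannot, such as "what is connected to alice within two hops?" — the traversal composes relations instead of matching text.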

Multimodal and Operational Loop

GBrain also supports multimodal ingestion (text, PDFs, audio/video transcripts, screenshots + OCR), then runs a closed operational cycle:

ingest → summarize/transcribe → extract entities → archive → index → retrieve → cite/repair → iterate

Compared with naive memory accumulation, this is the difference between self-evolution and self-chaos.

The Infrastructure Layer: Where seekdb Fits

Projects like LLM Wiki, Obsidian-Wiki, and GBrain explain how knowledge should be structured and evolved.

AI-native hybrid retrieval databases like seekdb address the infrastructure layer:

  • semantic vector recall,

  • keyword/full-text matching,

  • scalar filtering,

  • re-ranking,

  • unified SQL/SDK interface.

That matters because production systems need all of these at once — without gluing together fragile retrieval stacks.

At enterprise scale, hybrid architecture is usually the practical answer:

  • fast first-stage filtering for latency,

  • deep model reading + progressive disclosure for accuracy and durable memory.
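To make "hybrid" concrete, here is a generic illustration of what such a stage combines — this is NOT seekdb's actual API, and the toy scores stand in for real vector similarity and full-text relevance:

```python
import math

DOCS = [
    {"id": 1, "text": "agent memory architecture", "vec": [0.9, 0.1], "year": 2024},
    {"id": 2, "text": "prompt tuning tricks",      "vec": [0.2, 0.8], "year": 2021},
    {"id": 3, "text": "memory consolidation",      "vec": [0.8, 0.3], "year": 2024},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_text, query_vec, min_year, k=2, alpha=0.5):
    words = set(query_text.lower().split())
    scored = []
    for d in DOCS:
        if d["year"] < min_year:                # scalar filtering
            continue
        kw = len(words & set(d["text"].split())) / max(len(words), 1)  # keyword
        sem = cosine(query_vec, d["vec"])       # semantic vector recall
        scored.append((alpha * kw + (1 - alpha) * sem, d["id"]))  # re-ranking
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]

top = hybrid_search("agent memory", [1.0, 0.0], min_year=2023)
```

The value of a unified engine is that the filter, both recall paths, and the fused re-rank run in one query, rather than across three services whose results you must reconcile yourself.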

Final Takeaway

The real frontier is not “one better prompt” or “one better retriever.”
It is whether an Agent can reliably move from episodic trial-and-error to persistent learning.

That transition depends on:

  • Skill systems,

  • knowledge lifecycle management,

  • and progressive, structured disclosure.

LLM Wiki and GBrain represent different ends of the same trajectory:

  • one maximizes simplicity and transparency,

  • the other emphasizes engineering robustness and scale.

The shared objective is identical:

Give Agents memory that can be maintained, trusted, and evolved.

Building Agent memory systems? I would love to hear what patterns you are using — drop a comment below.

👏 Clap if this helped · 🔔 Follow for more Agent engineering deep dives
