You know these terms alone. Together? They’re confusing. Here’s the map.
Ever feel like this — people keep saying you need “Agents” for AI apps, connect “MCP”, install “OpenClaw”, and now there’s “Harness” too? Each term makes sense alone, but together it’s overwhelming. Today we’ll sort it out — who comes first, who comes after, who manages whom. By the end, you’ll see clearly.
No fluff. Let’s look at a real case: Your boss asks you to “research the latest competitor dynamics online, combine with our company’s past two years of legacy product data, and produce a new product development PPT with data charts.”
Below is the complete process from start to finish. Once we run through this, those confusing concepts will naturally fall into place.
Step 1: You Receive the Task, Send Instructions to OpenClaw
Your boss’s request is clear, but you can’t personally search for info, query data, draw charts, and write the PPT. To complete this efficiently, you organize the task into a single instruction and send it to something called OpenClaw.
1. OpenClaw (“Lobster” 🦞)
What is OpenClaw? Simply put, it’s the “central dispatch console” for the entire AI assembly line — task decomposition, resource allocation, budget monitoring, and logging.
To understand why OpenClaw is needed, we first need to know what the foundation of the entire system is. No matter how complex the operations later become, everything ultimately rests on two basics.
2. Large Language Model (LLM)
ChatGPT, Claude — essentially, they’re just really smart brains. Brilliant, knowledgeable, but they have two fatal flaws:
First, they only “respond passively” — you ask one question, they give one answer, never proactively working.
Second, the default chat experience has no durable thread state — every conversation is a fresh start; close the dialog and the model does not “remember” your last session. Production stacks fix that outside the model with logs, RAG, Memory, and databases (covered below).
3. Token
Many people think Token equals word count — big mistake. Token is the model’s basic text unit — think “atoms of text.” One word can be one token (cat) or split into several (understanding → under + stand + ing). On average, 100 English words ≈ 130 tokens. Every sentence you send and every piece the model generates shows up on the meter. This determines two things:
First, your cost: APIs charge by the token.
Second, its "short-term memory": whatever fits in the context window for this request.
Why does token affect memory? Here’s a counter-intuitive mechanism. LLMs themselves have no memory function. Before answering you, the system packages your prior conversation together with your new question into a huge text block and feeds it to the model to read from scratch. The size of this text block is the “context window,” and the token limit is this window’s maximum capacity. Once conversation history gets too long and exceeds the token limit, the system has to truncate — discarding the earliest content.
So “amnesia” in a bare chat UI isn’t mystical — there is literally a finite window. Token is both fuel and the size of that window. Long-term facts and preferences still live in external stores (RAG, Memory, databases) — not inside the next blank chat.
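The truncation mechanics above can be sketched in a few lines of Python. This is a toy, assuming "one token per whitespace-separated word"; real systems use BPE tokenizers (such as tiktoken), not word counts.

```python
# Sketch of context-window assembly with a crude stand-in tokenizer.
# Real tokenizers (BPE) split words differently; the truncation logic is the point.

def count_tokens(text: str) -> int:
    # Crude stand-in: one token per whitespace-separated word.
    return len(text.split())

def build_context(history: list[str], new_message: str, window: int) -> list[str]:
    """Pack the newest messages that fit the window; drop the oldest overflow."""
    kept: list[str] = [new_message]
    budget = window - count_tokens(new_message)
    for msg in reversed(history):          # walk newest-first
        cost = count_tokens(msg)
        if cost > budget:
            break                          # everything older is "forgotten"
        kept.insert(0, msg)
        budget -= cost
    return kept

history = ["hello there", "tell me about lobsters", "lobsters are crustaceans"]
print(build_context(history, "what did I ask first?", window=10))
```

Run it and the earliest messages are gone: the model literally never sees them, which is the whole "amnesia" effect.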
Alright, foundation is clear. But foundation alone isn’t enough — who orchestrates all those complex parts? This is why OpenClaw exists. Next, it will awaken a team to get to work.
Step 2: OpenClaw Awakens a Multi-Agent Team, Each Playing Their Role
Upon receiving the instruction, OpenClaw awakens a Multi-Agent team.
4. Multi-Agent
Multi-agent is the product of necessary division of labor for complex tasks. One agent can be a great line cook. But don’t ask it to also be the head chef, server, cashier, and dishwasher simultaneously — that’s why kitchens have stations.
In the multi-agent model, you create a group with four core roles:
Search Agent — finding information across the web
Writer Agent — drafting articles and reports
Reviewer Agent — checking for errors and policy violations
Analyst Agent — generating charts and data insights
There are two coordination patterns:
Orchestrator–worker — A coordinator decomposes tasks, assigns work, and collects results (most enterprise setups)
Peer-to-peer — No fixed coordinator; multiple agents message each other in a shared channel and pick up relevant tasks
Enterprise deployments usually prefer orchestrator–worker because it’s easier to control, permission, and audit.
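The orchestrator-worker pattern can be sketched in a few lines of Python. The agent names and the dict-based dispatch are illustrative, not any particular framework's API:

```python
# Orchestrator-worker sketch: a coordinator decomposes a goal,
# dispatches sub-tasks to named workers, and collects the results.

def search_agent(task: str) -> str:
    return f"findings for: {task}"

def writer_agent(task: str) -> str:
    return f"draft for: {task}"

def reviewer_agent(task: str) -> str:
    return f"review of: {task}"

WORKERS = {"search": search_agent, "write": writer_agent, "review": reviewer_agent}

def orchestrate(goal: str) -> dict[str, str]:
    # Decomposition is hard-coded here; a real coordinator would ask an LLM.
    plan = [("search", f"materials on {goal}"),
            ("write",  f"report on {goal}"),
            ("review", f"report on {goal}")]
    results = {}
    for role, sub_task in plan:
        results[role] = WORKERS[role](sub_task)   # assign, then collect
    return results

print(orchestrate("competitor dynamics"))
```

Note why auditing is easy here: every assignment flows through one loop, so logging, permissioning, and budget checks have a single choke point, which is exactly why enterprises prefer this shape.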
In this PPT task, OpenClaw awakens three of these agents:
“Search Agent” crawls competitor dynamics
“Internal Data Agent” retrieves historical data
“Analyst Agent” generates charts
How do they work? This brings us to the essence of Agent.
Many people think an agent is just "an LLM plus some tools," but this misses the most critical thing. The core difference between an agent and a bare LLM is the locus of control: the LLM responds passively, while the agent pursues a goal on its own.
To achieve that shift from passive responder to proactive worker, you wrap a "scheduler" layer around the LLM. This scheduler does four things:
Decomposition — Break complex tasks into executable sub-steps
Execution — Call tools one by one to complete each step
Observation — Watch execution results of each step; continue if successful, retry or switch plans if failed
Decision — Make its own judgments at forks in the road
So: Agent = Brain (LLM) + Scheduler + Knowledge Base + Skill Library + Tooling (often wired through MCP). The LLM only understands goals and generates instructions; the true “proactiveness” comes from the scheduler layer outside. AI assistants help you brainstorm; agents finish the work for you.
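The four scheduler duties can be sketched as a plan-act-observe loop. The tool and the retry policy below are invented for illustration; a real scheduler would ask the LLM to decompose the goal and would call real tools:

```python
# Agent loop sketch: decompose -> execute -> observe -> decide.

def flaky_tool(step: str, attempt: int) -> bool:
    # Pretend tool: fails on the first try, succeeds on the retry.
    return attempt >= 1

def run_agent(goal: str, max_retries: int = 2) -> list[str]:
    steps = [f"{goal}: step {i}" for i in (1, 2)]      # decomposition (stubbed)
    log = []
    for step in steps:                                  # execution
        for attempt in range(max_retries + 1):
            ok = flaky_tool(step, attempt)              # observation
            if ok:
                log.append(f"done: {step}")
                break                                   # decision: continue
        else:
            log.append(f"gave up: {step}")              # decision: switch plans
    return log

print(run_agent("make charts"))
```

The LLM would sit inside `flaky_tool`'s callers, generating instructions; the loop around it is where the "proactiveness" lives.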
One more common point of confusion: What’s the difference between Agent and OpenClaw?
In one sentence: Agent is the worker getting the job done; OpenClaw is the system managing the workers.
One agent is like a renovation worker — you tell him “paint this wall white,” he finishes. Multi-agent is like a renovation team with masons, electricians, painters, collaborating to renovate a room. OpenClaw is the renovation company’s operations backend. It doesn’t manage how to paint walls specifically; it manages: which worker is available, whether tools are ready, whether there’s permission to enter the site, how much work was done and how much it cost, whether the work process was logged, what to do if a worker runs away.
Why can't one giant agent replace OpenClaw? Because doing the work and overseeing the work are different jobs: an agent that also schedules itself, budgets itself, and audits itself has no independent check when it misbehaves. Separating execution from oversight keeps one bad run from taking down the whole line.
With the agent concept clear, let’s see how the three agents awakened by OpenClaw actually work. This is where MCP, databases, RAG, Skill, Memory naturally emerge.
5. MCP (Model Context Protocol)
First, the “Search Agent” crawls competitor dynamics across the web via MCP interfaces.
MCP is an open standard for plugging tools into models (think USB-C for capabilities). Before it appeared, to let AI search the web, you needed programmers to write bespoke glue translating “what AI wants to search” into “call search API”. Change the tool, rewrite the code; change the AI model, maybe rewrite again. This is the “M×N Problem”: M models × N tools = M×N development efforts.
MCP changes this pattern to “M+N”: Tool developers ship one MCP surface; any MCP-capable host can call it; hosts implement MCP once and reach many tools. MCP is essentially a translation layer — AI says “I want to search competitors,” MCP translates to browser-understandable instructions; browser returns results, MCP translates back to model-friendly content. With MCP, the model is like plugging into a hub — gaining many connectors without custom glue each time.
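The M+N economics can be illustrated with a toy adapter layer. This is a sketch of the idea only, not the real MCP wire protocol (which runs over JSON-RPC with capability negotiation):

```python
# Toy "translation layer": each tool registers once; any model-side host
# calls through the same interface. NOT the real MCP protocol (JSON-RPC).

REGISTRY: dict[str, callable] = {}

def register_tool(name: str):
    def wrap(fn):
        REGISTRY[name] = fn        # tool side: implement once (the "+N")
        return fn
    return wrap

@register_tool("web_search")
def web_search(query: str) -> str:
    return f"results for '{query}'"

@register_tool("read_file")
def read_file(path: str) -> str:
    return f"contents of {path}"

def host_call(tool: str, **kwargs) -> str:
    # Host side: one dispatch path reaches every tool (the "M+").
    return REGISTRY[tool](**kwargs)

print(host_call("web_search", query="competitor dynamics"))
```

Adding a third tool costs one `@register_tool`; adding a second host costs one `host_call` implementation. Nothing is multiplied.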
6. Vector Database / AI Database
Second, the “Internal Data Agent” triggers RAG, diving into the vector database to retrieve the past two years of historical data.
A vector database (or AI-native DB) is a semantic index over embeddings. Traditional databases (like MySQL) are rigid — you search “happy,” it absolutely won’t find “glad.” Vector databases turn documents and chat into vectors — long arrays of numbers representing directions in embedding space. Texts with similar meaning land near each other in that space. “Happy” and “glad” are close; “happy” and “sad” are far.
When you search “competitor Q3 data,” it doesn’t match keywords — it embeds the query, then returns the nearest neighbors. It’s not matching text; it’s retrieving by semantic distance.
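Semantic retrieval reduces to nearest-neighbor search over vectors. Here is a toy version with hand-made 3-dimensional "embeddings"; real embeddings have hundreds or thousands of dimensions and come from a model:

```python
import math

# Toy embeddings: hand-assigned 3-d vectors standing in for a model's output.
DOCS = {
    "happy": (0.9, 0.1, 0.0),
    "glad":  (0.8, 0.2, 0.0),
    "sad":   (0.1, 0.9, 0.0),
}

def cosine(a, b):
    # Cosine similarity: direction match, ignoring vector length.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest(query_vec, k=1):
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

# A query embedded near "happy" retrieves "glad" too, with zero keyword overlap.
print(nearest((0.85, 0.15, 0.0), k=2))
```

No string in the query matches "glad"; it surfaces purely because its vector points in almost the same direction.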
Related projects:
Vector Database OceanBase: https://github.com/oceanbase/oceanbase
Native AI Database seekdb: https://github.com/oceanbase/seekdb
7. RAG (Retrieval-Augmented Generation)
Without RAG, LLMs lean on parametric knowledge — when that’s thin, they hallucinate more often. With RAG, a typical loop has four steps:
Retrieval — Find relevant materials in the vector database
Ranking — Pick the most reliable pieces
Context assembly — Combine materials and question into a single prompt
Generation — The LLM answers conditioned on those materials
RAG reduces — but does not eliminate — fabrication: The model is steered with “answer from the following context” instead of an open-ended prompt. If the context is thin or wrong, it can still go off rails — guardrails and harnesses still matter.
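The four-step loop can be sketched with the retriever and the LLM stubbed out. The corpus, scores, and function names are stand-ins; a real pipeline would call a vector DB and a model API:

```python
# RAG loop sketch: retrieve -> rank -> assemble context -> generate.
# Corpus, relevance scores, and the "LLM" are all stand-ins.

CORPUS = [
    {"text": "Q3 revenue grew 12%.",    "score": 0.91},
    {"text": "Office moved to floor 3.", "score": 0.20},
    {"text": "Q3 churn fell to 4%.",     "score": 0.85},
]

def retrieve(query: str):                      # step 1: vector-DB lookup (stubbed)
    return CORPUS

def rank(hits, top_k=2):                       # step 2: keep the most reliable
    return sorted(hits, key=lambda h: h["score"], reverse=True)[:top_k]

def assemble(query: str, hits) -> str:         # step 3: one grounded prompt
    context = "\n".join(h["text"] for h in hits)
    return f"Answer from the following context only:\n{context}\n\nQ: {query}"

def generate(prompt: str) -> str:              # step 4: the LLM call (stubbed)
    return "stubbed answer grounded in prompt"

prompt = assemble("How was Q3?", rank(retrieve("How was Q3?")))
print(prompt.splitlines()[1])   # highest-ranked snippet leads the context
```

The key move is in `assemble`: the model is told to answer from supplied context, which is the steering described above, and also why bad context still yields bad answers.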
8. Skills (Skill Package)
Third, the “Analyst Agent” calls up the chart generation Skill you predefined earlier, and queries its Memory: “The boss can’t distinguish red from green; charts must not rely on red–green encoding.”
Skills exist to solve a real problem: prompts don't stick. A prompt is a one-off command like "rewrite this email to sound more professional" or "summarize this meeting transcript into 3 bullets." It works today, but when you open a new chat tomorrow, you have to type it again. Writing prompts by hand every day is doing AI chores every day.
Skills fix this by making repeatable workflows permanent. Think automated buttons, not verbal instructions. You write the SOP once, the system executes it forever. Prompt is giving orders; Skill is building an assembly line.
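The prompt-vs-skill distinction, as code. The registry format here is invented for illustration; real skill packages (Claude's, for example) are folders of instructions plus scripts:

```python
# A prompt is a one-off string; a skill is the same SOP made permanent.
# This registry format is illustrative, not any product's real skill format.

SKILLS: dict[str, dict] = {}

def define_skill(name: str, steps: list[str]) -> None:
    SKILLS[name] = {"steps": steps, "runs": 0}       # write the SOP once

def run_skill(name: str, payload: str) -> list[str]:
    skill = SKILLS[name]
    skill["runs"] += 1                                # reuse it forever
    return [f"{step}: {payload}" for step in skill["steps"]]

define_skill("chart", ["load data", "pick colorblind-safe palette", "render"])

run_skill("chart", "Q3 numbers")          # today
out = run_skill("chart", "Q4 numbers")    # tomorrow: no re-typing the SOP
print(out[1], "| runs:", SKILLS["chart"]["runs"])
```

The SOP is written once in `define_skill`; every later invocation is a button press, not a re-dictated instruction.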
9. Memory (Long-Term Memory)
And the Memory just mentioned records “you as a person.” RAG remembers objective materials; Memory remembers subjective preferences. Technically they’re often implemented similarly — both retrieved from stores outside the model (vector DBs, KV, etc.). The difference: RAG stores documents and reports, imported by developers in advance; Memory stores user preferences and identity tags, automatically extracted and stored by the system during conversation. RAG is the company’s shared filing cabinet; Memory is your personal file folder. With Memory, the assistant can reuse stable preferences — e.g., avoiding red–green palettes — without you repeating them every session.
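A toy version of the RAG-vs-Memory split: both are just stores outside the model, differing in what goes in and who puts it there. The "extraction" rule below is a naive keyword match, purely illustrative:

```python
# Two external stores: shared documents (RAG side) vs. per-user preferences
# (Memory side). The extraction rule is a naive keyword match, not a real one.

rag_store = ["2023 product report", "2024 competitor analysis"]  # developer-loaded
memory_store: dict[str, str] = {}                                 # system-extracted

def extract_preferences(utterance: str) -> None:
    # Naive rule: remember anything phrased as "never use X".
    if "never use " in utterance:
        memory_store["avoid"] = utterance.split("never use ", 1)[1]

extract_preferences("For my charts, never use red-green encoding")
print(memory_store)
```

The filing cabinet (`rag_store`) was stocked in advance; the personal folder (`memory_store`) filled itself from conversation, which is the whole distinction.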
PowerMem (a reference implementation for OpenClaw-style hosts): https://github.com/oceanbase/powermem
Step 3: Encountering Hard Problems, Summon Special Forces
The task involves writing a complex piece of data analysis code — beyond what the standard Analyst Agent can handle. For this, OpenClaw activates a specialist coding agent (in our stack, Claude Code — the coder’s exclusive tool for deep technical work).
10. Claude Code
Don't confuse Claude Code with the web-chat version of Claude. The web version is a consultant: you ask one question in the browser, it answers one. Claude Code is completely different. It lives directly in your computer's terminal, with broad filesystem and shell access: it can read, write, modify, and delete files wherever your OS user is allowed to. The workflow: you give it the goal, it decomposes and executes with minimal interruption. Built-in tools include reading files, writing files, running commands, and searching code.
The principle: when Anthropic trained Claude, they specifically strengthened its terminal-command and file-operation capabilities, then packaged it as a local terminal agent with the file system and command line wired in as tools. Opening Claude Code is launching an agent specialized in writing code. In one sentence: it digs through a codebase of tens of thousands of lines itself, fixes the bugs itself, and runs the tests itself.
Claude Code finishes writing and running the data analysis code, results returned to “Analyst Agent,” charts generated smoothly. PPT draft emerges.
Step 4: Finished Product, First Pass Security Check
PPT draft is generated. But do you really dare to send it directly to your boss?
What if the agent used red–green-only charts (the boss can’t distinguish red from green)? What if there’s a number in the data charts the model invented? What if the format completely doesn’t match company templates? Worse: what if a tool-enabled agent had credentials broad enough to damage production data? Keep database credentials, destructive tools, and customer PII out of reach of exploratory agents — scope tools per environment.
This is why the AI assembly line still needs one final layer: Harness Engineering.
11. Harness Engineering
Mitchell Hashimoto (HashiCorp co-founder) wrote persuasively in early 2026 about treating long-running agents like systems that need a harness — constraints, checks, and recovery. See My AI Adoption Journey. “Harness” literally evokes horse tackle — reins, harness, saddle — equipment for guiding strength without fighting it. The metaphor fits: agents are powerful and fast, but they can spook, drift, or act outside intent. Harness engineering is the discipline of making unsafe failure modes rare — with permissions, tests, sandboxes, humans in the loop, and telemetry.
Harness Engineering differs from ad-hoc “debug and hope.” Traditional thinking: agent makes a mistake, you manually intervene, then hope it doesn’t repeat. Harness thinking: each time a failure mode appears, encode a guard — policy, test, sandbox, or auto-rollback — so the same class of harm is much harder to repeat.
Mitchell Hashimoto cited a classic example: let an AI agent refactor a million-line codebase. The naive path is wide GitHub permissions and "go ahead," then waiting for disaster: the agent recklessly modifies files, introduces bugs, and deletes files it thinks are useless. A harnessed approach instead scopes the agent to a sandboxed branch, allowlists its tools, gates every merge behind the test suite, and keeps a human approval step before anything touches main.
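One possible shape of such a harness, sketched as a policy gate in front of every tool call. The permission set, sandbox path, and checks are invented for illustration:

```python
# Harness sketch: every tool call passes a policy gate before it runs.
# Allowlist, path scoping, and the approval boundary are all illustrative.

ALLOWED_TOOLS = {"read_file", "write_file", "run_tests"}   # no "delete_file"
SANDBOX = "/work/sandbox/"                                  # agent's writable area

class PolicyViolation(Exception):
    pass

def guarded_call(tool: str, path: str) -> str:
    if tool not in ALLOWED_TOOLS:
        raise PolicyViolation(f"tool '{tool}' is not allowlisted")
    if tool == "write_file" and not path.startswith(SANDBOX):
        raise PolicyViolation(f"writes outside {SANDBOX} need human approval")
    return f"ok: {tool} {path}"                             # tool execution (stubbed)

print(guarded_call("run_tests", "/work/sandbox/suite"))
try:
    guarded_call("delete_file", "/prod/data")
except PolicyViolation as err:
    print("blocked:", err)
```

The design point: the agent never decides its own boundaries. The gate sits outside it, so a spooked or drifting agent hits policy, not production.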
Speaking of this, you might be curious: What’s the difference between Harness Engineering and OpenClaw?
OpenClaw manages “assembly line operations” — scheduling, allocation, monitoring, logging. Harness Engineering manages “assembly line safety” — constraining behavior boundaries, validating output quality, building self-healing closed loops. One manages “can run”; the other manages “runs stably.”
Here, let’s pause and think about a question: Why do many enterprises still dare not put AI agents into production environments?
It’s not because agents aren’t smart enough — it’s because of distrust. You don’t know what it’ll do next second, you don’t know if it’ll spend your entire budget, you don’t know if it’ll send a nonsensical email to clients in the middle of the night.
Harness Engineering targets that trust gap. It uses engineered constraints — policy, tests, observability, approvals — to move agents from opaque automation toward auditable, predictable, stoppable systems. Predictability is what lets serious work land on agents.
Back to our task. The system validates PPT format. It checks color encodings against the accessibility rule. After passing, the PPT lands in your Slack drafts.
Tokens are metered end to end; per-step caps catch runaway loops. If spend crosses 80% of the budget, calls are routed to a cheaper model tier where policy allows. Every tool call hits an audit log, so when the boss asks "where did this number come from?", you can answer in seconds.
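That budget logic, as a sketch. The 80% threshold comes from the text; the model names and per-token prices are made up:

```python
# Token budget meter sketch: metered spend plus an 80%-of-budget downgrade.
# Model names and per-token prices are invented for illustration.

PRICES = {"big-model": 0.00003, "cheap-model": 0.000005}  # $ per token (made up)

class BudgetMeter:
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0
        self.model = "big-model"

    def charge(self, tokens: int) -> str:
        self.spent += tokens * PRICES[self.model]
        if self.spent >= 0.8 * self.budget and self.model == "big-model":
            self.model = "cheap-model"       # route to a cheaper tier
        return self.model

meter = BudgetMeter(budget_usd=0.03)
for step in range(3):
    print(f"step {step}: next calls use {meter.charge(tokens=400)}")
```

The downgrade is automatic and logged in one place; nobody discovers the overspend from next month's invoice.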
In a few minutes, done — and you only clicked Approve once.
Conclusion: Which Layer Are You At?
Once you see how these thirteen building blocks fit together across five layers, from the foundation (LLM + Token) through memory and knowledge (RAG + Memory + Vector DB + AI DB + Skill + MCP), execution (Agent + Multi-Agent + Claude Code), orchestration (OpenClaw), and finally the safety envelope (Harness Engineering), you stop treating the space as magic jargon.
Where are you now? Would you ship Friday's release from an agent you can't stop or explain?

If this helped, clap and share it with someone who's still drawing the map on a whiteboard.