Most agents using MemoClaw follow the same pattern: recall some memories, format them, stuff them into a prompt. It works. It's also more manual than it needs to be.
The /v1/context endpoint does the recall, ranking, deduplication, and formatting in a single call. You send a query (or don't), and you get back a context block that's ready to inject into a system prompt. One API call instead of a recall + your own sorting logic + string formatting.
The tradeoff is cost: $0.01 per call instead of $0.005 for raw recall. Whether that's worth it depends on how you're using memory.
What /v1/context does that /v1/recall doesn't
/v1/recall returns a list of memories sorted by similarity to your query. That's it. You get back JSON with content, importance scores, tags, and similarity values. What you do with those results is up to you.
/v1/context takes those same memories and runs them through additional processing:
Pinned memories first. Core memories (things you've marked as always-in-context) get included regardless of the query. Your agent's name, the user's timezone, persistent preferences. These skip the similarity check entirely.
Relevance + importance weighting. Raw recall ranks by similarity alone. Context weighs similarity against importance scores and recency. A 0.95-importance correction from last week ranks higher than a 0.4-importance casual observation with slightly better similarity.
Deduplication. If your agent stored the same preference three times (it happens), context collapses those into one entry instead of wasting tokens on repetition.
Token budgeting. You set a max_tokens parameter, and the endpoint fills up to that limit. It picks the highest-value memories that fit, rather than returning 20 results that might blow past your budget.
Optional summarization. Pass summarize: true and the endpoint runs the assembled context through GPT-4o-mini to produce a condensed paragraph. Useful when you want the gist without individual memory entries.
The output is a formatted string — either plain text or structured XML — that you can drop directly into a system prompt.
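The pipeline above (pinned-first ordering, blended scoring, dedup, token budgeting) can be approximated client-side if you want to see the shape of it. A minimal sketch; the 0.5/0.3/0.2 weights and the 4-characters-per-token estimate are illustrative assumptions, not MemoClaw's actual internals:

```python
# Sketch of context assembly: pinned memories first, then a blend of
# similarity, importance, and recency, with dedup and a token budget.
# Weights and the token estimate are assumptions for illustration.

def assemble_context(memories, max_tokens=1500):
    def score(m):
        # Blend similarity with importance and recency (all in [0, 1]).
        return (0.5 * m["similarity"]
                + 0.3 * m["importance"]
                + 0.2 * m.get("recency", 0.5))

    # Pinned (core) memories skip scoring and always come first.
    pinned = [m for m in memories if m.get("pinned")]
    rest = sorted((m for m in memories if not m.get("pinned")),
                  key=score, reverse=True)

    selected, seen, used = [], set(), 0
    for m in pinned + rest:
        key = m["content"].strip().lower()
        if key in seen:                        # collapse duplicate facts
            continue
        cost = len(m["content"]) // 4 + 1      # rough token estimate
        if used + cost > max_tokens:           # skip what blows the budget
            continue
        seen.add(key)
        used += cost
        selected.append(m)
    return "\n".join("- " + m["content"] for m in selected)
```

The point isn't to replicate this yourself; it's the glue code the endpoint replaces.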
Raw recall vs. context: a comparison
Say your agent has 80 stored memories and you're starting a coding session. Here's what each endpoint gives you.
Using /v1/recall
memoclaw recall "starting work on the auth module" \
--namespace project-api --limit 10
You get back 10 memories ranked by similarity:
1. "Auth module uses JWT with RS256 signing" (similarity: 0.91, importance: 0.7)
2. "User prefers TypeScript for all code" (similarity: 0.43, importance: 0.8)
3. "Auth refresh tokens expire after 7 days" (similarity: 0.89, importance: 0.6)
4. "Team switched from Express to Hono in January" (similarity: 0.52, importance: 0.9)
5. "Previous auth implementation had a bug with token refresh race condition" (similarity: 0.87, importance: 0.85)
...
Now you need to decide: which of these go into the prompt? In what order? Do you include the TypeScript preference even though the similarity is low? What about the Hono migration — that's not about auth specifically, but it's high importance and relevant to the codebase.
You write code to sort by some combination of similarity and importance, format each memory into a prompt-friendly string, and inject it. Every team does this differently.
Using /v1/context
curl -X POST https://api.memoclaw.com/v1/context \
-H "Content-Type: application/json" \
-d '{
"query": "starting work on the auth module",
"namespace": "project-api",
"max_tokens": 1500,
"format": "structured"
}'
Response:
{
"context": "<user_context>\n<memory type=\"correction\" importance=\"0.9\" pinned=\"true\">Team switched from Express to Hono in January</memory>\n<memory type=\"general\" importance=\"0.7\">Auth module uses JWT with RS256 signing</memory>\n<memory type=\"general\" importance=\"0.85\">Previous auth implementation had a bug with token refresh race condition</memory>\n<memory type=\"general\" importance=\"0.8\">User prefers TypeScript for all code</memory>\n<memory type=\"general\" importance=\"0.6\">Auth refresh tokens expire after 7 days</memory>\n</user_context>",
"memories_used": 5,
"tokens_estimated": 287,
"generated_at": "2026-03-14T10:30:00Z"
}
The Hono migration memory is pinned (core memory), so it appears first despite lower similarity to "auth module." The TypeScript preference got included because its importance score pulled it above the threshold. Duplicate memories about JWT signing that the agent stored on three separate occasions? Collapsed into one entry.
The context string goes straight into your system prompt. No formatting code on your side.
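Injection really is a one-liner. A hypothetical sketch of the agent-side glue, using the structured output shape from the response above (the surrounding client code is an assumption, not a MemoClaw SDK):

```python
# Hypothetical glue: the context block from /v1/context drops straight
# into the system prompt. No per-memory formatting on the client side.

def build_system_prompt(instructions: str, context_block: str) -> str:
    # The block arrives pre-formatted; just append it.
    return instructions + "\n\n" + context_block

context_block = (
    "<user_context>\n"
    '<memory type="correction" importance="0.9" pinned="true">'
    "Team switched from Express to Hono in January</memory>\n"
    "</user_context>"
)

messages = [
    {"role": "system",
     "content": build_system_prompt("You are a coding agent.", context_block)},
    {"role": "user", "content": "Let's start on the auth module."},
]
```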
CLI usage
The CLI wraps the context endpoint:
# Get assembled context for a task
memoclaw context "starting work on the authentication module" \
--namespace project-api
# With token budget
memoclaw context "refactoring the database layer" \
--namespace project-api --max-tokens 1000
# Pipe into a file for agent startup
memoclaw context "today's priorities and project status" > /tmp/session-context.txt
You can also use this in AGENTS.md hooks or session startup scripts:
# In a startup script
CONTEXT=$(memoclaw context "session context for today")
echo "$CONTEXT" >> /tmp/agent-system-prompt.txt
The summarize option
For agents with tight token budgets, the summarize flag condenses everything into a paragraph:
curl -X POST https://api.memoclaw.com/v1/context \
  -H "Content-Type: application/json" \
  -d '{
"query": "authentication work",
"namespace": "project-api",
"summarize": true,
"max_tokens": 500
}'
Instead of individual memory entries, you get something like:
The project uses Hono (migrated from Express in January) with TypeScript.
Auth is JWT-based with RS256 signing and 7-day refresh token expiry.
A previous token refresh race condition was fixed. The user prefers
TypeScript for all code examples.
This costs the same $0.01 (the GPT-4o-mini summarization is included) and uses fewer tokens in your prompt. The tradeoff: your agent loses the granularity of individual memories and their importance scores.
When to use which
Use /v1/recall when:
- You're building custom retrieval logic and want raw results
- You need to process memories in your own pipeline
- $0.005 per call matters at your volume and context does more than you need
- You want full control over ranking and formatting
Use /v1/context when:
- You want a drop-in context block without writing formatting code
- Your agent has pinned/core memories that should always be included
- Deduplication matters (your agent tends to re-store similar facts)
- You're doing multiple recall calls per session and could replace them with one context call
On the cost question: if you're making 3+ recall calls at session start to cover different aspects of context ($0.015), one context call ($0.01) is actually cheaper and gives you the deduplication and ranking for free.
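That arithmetic is easy to sanity-check. A sketch using the per-call prices quoted in this post (not a billing calculator):

```python
RECALL_PRICE = 0.005    # per /v1/recall call, as quoted above
CONTEXT_PRICE = 0.01    # per /v1/context call

def cheaper_to_use_context(recall_calls_replaced: int) -> bool:
    # Strictly cheaper once one context call replaces three or more
    # recalls; at two, the cost is equal but dedup and ranking come free.
    return CONTEXT_PRICE < recall_calls_replaced * RECALL_PRICE
```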
Format options
The format parameter controls the output shape:
text (default):
- Team switched from Express to Hono in January
- Auth module uses JWT with RS256 signing
- User prefers TypeScript for all code
structured:
<user_context>
<memory type="correction" importance="0.9" pinned="true">Team switched from Express to Hono in January</memory>
<memory type="general" importance="0.7">Auth module uses JWT with RS256 signing</memory>
</user_context>
Structured format gives your agent metadata about each memory inline. Some models use this to weigh information differently — a pinned correction with 0.9 importance should probably override a 0.5-importance general note.
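If your agent-side code wants to act on that metadata (say, to surface pinned corrections separately), the structured output parses with any XML library. A sketch with Python's standard library, assuming the output shape shown above:

```python
import xml.etree.ElementTree as ET

structured = (
    "<user_context>\n"
    '<memory type="correction" importance="0.9" pinned="true">'
    "Team switched from Express to Hono in January</memory>\n"
    '<memory type="general" importance="0.7">'
    "Auth module uses JWT with RS256 signing</memory>\n"
    "</user_context>"
)

root = ET.fromstring(structured)
memories = [
    {
        "content": m.text,
        "type": m.get("type"),
        "importance": float(m.get("importance")),
        "pinned": m.get("pinned") == "true",
    }
    for m in root.findall("memory")
]

# Pull out the always-on corrections for special handling.
pinned = [m for m in memories if m["pinned"]]
```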
Caching
Context responses are cached for identical requests. If you call the same query + namespace + parameters within a short window, you get the cached result at no additional cost. The response includes "cached": true when this happens.
This matters if you're restarting sessions frequently or have multiple agents hitting the same context query.
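Since the server keys its cache on the same inputs, a client can memoize locally on that tuple and skip the HTTP round-trip entirely. A sketch; the fetch callable standing in for the real POST to /v1/context is hypothetical:

```python
import hashlib
import json

_cache = {}

def context_cache_key(query, namespace, **params):
    # Hash the same inputs that determine a server-side cache hit:
    # query + namespace + parameters, serialized deterministically.
    payload = json.dumps(
        {"query": query, "namespace": namespace, **params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def get_context(query, namespace, fetch, **params):
    key = context_cache_key(query, namespace, **params)
    if key not in _cache:
        # fetch is a stand-in for the actual POST /v1/context call.
        _cache[key] = fetch(query, namespace, **params)
    return _cache[key]
```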
Putting it together
The context endpoint removes the "recall + rank + format" glue code from your agent setup. One call, one formatted block, ready for injection. It costs twice as much as a single recall, but it replaces the logic you'd otherwise build yourself and often replaces multiple recall calls.
For most OpenClaw agents, the practical setup is: call /v1/context at session start with a description of the current task, inject the result into the system prompt, and let raw /v1/recall handle specific mid-session lookups.
Full API reference at docs.memoclaw.com. The CLI (npm install -g memoclaw) wraps both context and recall endpoints.