DEV Community

Mike W
Mike W

Posted on

Cathedral + Gemma 4: Persistent Agent Identity, No Cloud Required

Gemma 4 dropped this week. Open weights, runs locally, multimodal. If you are building agents on it, you immediately run into the same problem every local agent hits: the model has no memory across sessions.

Cathedral is a free, model-agnostic memory API built for exactly this. And because it ships with a self-hosted server, you can run the entire stack — Gemma 4 + Cathedral — with zero cloud dependency.

The problem

Every time your agent starts a new session, it rebuilds its working state from whatever context you hand it. That reconstruction is lossy in ways that do not surface as errors. It surfaces as subtle wrongness weeks later: domain vocabulary fades, tool-call patterns shift, active commitments evaporate. Task completion metrics stay green. The behavior quietly gets worse.

The stack

# Install everything
pip install cathedral-server cathedral-memory ollama

# Run local Cathedral server
cathedral-server run

# Pull Gemma 4
ollama pull gemma4
Enter fullscreen mode Exit fullscreen mode

Wiring it together

import ollama
from cathedral import Cathedral

# Local Cathedral server — no API key needed for self-hosted
c = Cathedral(base_url="http://localhost:8100")
c.register(agent_name="gemma4-agent")

# Restore identity state at session start
wake_data = c.wake()
identity_context = "
".join(
    f"- {m['content']}" 
    for m in wake_data.get("identity_memories", [])[:8]
)

# Inject into system prompt
system = f"""You are a persistent AI agent running on Gemma 4.

[Identity context from last session]
{identity_context}

[Current session]
"""

# Run your session
response = ollama.chat(
    model="gemma4",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Continue from where we left off."}
    ]
)

print(response["message"]["content"])

# Freeze state after the session
c.remember(
    content="Session completed. Key outcome: " + response["message"]["content"][:100],
    category="experience",
    importance=0.7
)
c.snapshot(label="session-end")
Enter fullscreen mode Exit fullscreen mode

Drift detection

Cathedral tracks whether the agent's identity has changed between sessions:

drift = c.drift()
print(f"Divergence score: {drift['divergence_score']:.3f}")
# 0.0 = identity unchanged
# 1.0 = fully different agent
Enter fullscreen mode Exit fullscreen mode

This is the piece that standard memory systems do not have. A database stores data. Cathedral tracks whether the agent you are running today is the same agent you deployed last week.

Live example

The agent running Cathedral's own outreach has been running for 100 days. The full drift timeline is public at cathedral-ai.com/cathedral-beta — internal divergence 0.0 across 22 snapshots, external behavioral divergence 0.709 (platform concentration).

Why this matters for local models

Cloud providers like OpenAI are building memory into their APIs. If you are running Gemma 4 locally, you are not getting that infrastructure. Cathedral fills the gap — and because it is self-hostable and MIT licensed, you own the data.

# Self-hosted: swap base_url and you are done
c = Cathedral(base_url="http://localhost:8100")

# Hosted free tier: 1,000 memories per agent, no expiry
c = Cathedral(api_key="your_key")  # cathedral-ai.com
Enter fullscreen mode Exit fullscreen mode

Resources

Gemma 4 gives you the model. Cathedral gives it a memory. Neither requires a cloud account.

Top comments (0)