DEV Community

ClawBase

I Connected PewDiePie's Odysseus to a Cloud Memory Stack — Zero API Costs, Persistent Memory

PewDiePie's Odysseus just hit 44,000 GitHub stars in four days. The pitch is simple: a self-hosted AI workspace that runs on your hardware, with your data, no subscriptions.

I set it up the day it dropped. The local model setup is genuinely impressive — Cookbook scans your GPU, recommends models, and you're chatting in minutes. No API keys, no monthly bills.

But within a couple of days, I already hit the wall I always hit with self-hosted AI: memory.

Odysseus has ChromaDB for basic vector memory. It works for recall within a session. But it won't connect dots across weeks of conversations. It doesn't run agents in the background while I sleep. And when I close my laptop, everything stops.

So I built a hybrid: Odysseus runs my local model (free inference), and a cloud agent layer, ClawBase, handles persistent memory, scheduling, and background tasks. Both talk to the same local LLM; the cloud side reaches it through an authenticated tunnel.

Here's the full technical setup.


The architecture

┌──────────────────────────┐                     ┌───────────────────────────┐
│     YOUR MACHINE         │                     │     CLOUD (ClawBase)      │
│                          │   authenticated     │                           │
│  Odysseus (port 7000)    │     tunnel          │  OpenClaw Agent           │
│  ├─ Chat UI              │◄──────────────────►│  ├─ Agent logic + tools    │
│  ├─ Agent (MCP, tools)   │                     │  ├─ 6-layer memory stack  │
│  ├─ Documents, Email     │                     │  │  ├─ Daily journal      │
│  └─ ChromaDB (basic mem) │                     │  │  ├─ DAG lossless ctx   │
│                          │                     │  │  ├─ QMD semantic search │
│  Ollama (port 11434)     │                     │  │  ├─ Mem0 curated facts │
│  ├─ Your local model     │◄── LLM inference ──│  │  ├─ Cognee knowledge    │
│  └─ Your GPU (free!)     │   /v1/chat/complete │  │  └─ Graphiti temporal  │
│                          │                     │  ├─ Cron scheduling       │
│  nginx (port 11435)      │                     │  ├─ Telegram/Slack/Disc.  │
│  └─ Auth proxy + TLS     │                     │  └─ Background tasks 24/7 │
│                          │                     │                           │
└──────────────────────────┘                     └───────────────────────────┘

Ollama, vLLM, and llama.cpp all expose an OpenAI-compatible /v1/chat/completions endpoint. Any service that speaks OpenAI API format can use your local model — it just needs a way to reach it.

The tunnel bridges your local model server to the cloud agent. Your GPU does the inference. The cloud handles everything else.

What you get:

  • $0 in API costs (your GPU runs the model)
  • 6-layer persistent memory that builds up over weeks
  • Agents that stay on a schedule even when your machine is off (runs that need the model queue and execute when you reconnect)
  • Odysseus as your local workspace for chat, documents, research
  • Telegram/WhatsApp/Slack access to your agent from anywhere

Prerequisites

  • Odysseus installed and running (Quick Start)
  • Ollama serving a model (the Cookbook makes this easy)
  • A ClawBase account (or any OpenClaw instance)
  • 10-15 minutes

Step 1: Verify your local model is running

After setting up Odysseus and downloading a model through Cookbook, confirm Ollama is serving:

curl http://localhost:11434/v1/models

You should see your model listed. Test a completion:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 50
  }'

If you're using vLLM instead of Ollama, it's on port 8000 by default:

curl http://localhost:8000/v1/models

For the llama.cpp server, the default port is 8080:

curl http://localhost:8080/v1/models

All three speak the same OpenAI-compatible format. The rest of this guide uses Ollama on port 11434, but substitute your port if different.
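To find out which backend is actually listening, a small probe can help. This is a minimal sketch that assumes the default ports mentioned above; `default_port` and `probe_backend` are hypothetical helper names, not part of Odysseus or any of the three servers.

```shell
#!/bin/sh
# Sketch: map a backend name to its default port, then probe /v1/models.
# default_port and probe_backend are hypothetical helper names.

default_port() {
    case "$1" in
        ollama)    echo 11434 ;;
        vllm)      echo 8000 ;;
        llama.cpp) echo 8080 ;;
        *)         echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

probe_backend() {
    port=$(default_port "$1") || return 1
    # -f: treat HTTP errors as failure, -sS: quiet, --max-time: never hang
    if curl -fsS --max-time 3 "http://localhost:${port}/v1/models" >/dev/null 2>&1; then
        echo "$1 is answering on port ${port}"
    else
        echo "$1 is not answering on port ${port}"
    fi
}
```

Run it as `for b in ollama vllm llama.cpp; do probe_backend "$b"; done`. If you changed a port, the mapping in `default_port` is the only place to adjust.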


Step 2: Set up an authenticated reverse proxy

This is critical. Your local model server has zero authentication by default. Before exposing it through any tunnel, you need a proxy that enforces a Bearer token.

Option A: nginx (recommended for production)

Install nginx if not already present:

# Ubuntu/Debian
sudo apt install nginx

# macOS
brew install nginx

Create the proxy config:

sudo tee /etc/nginx/sites-available/llm-proxy << 'EOF'
server {
    listen 11435;

    location / {
        # Enforce Bearer token authentication
        set $expected_token "sk-local-YOUR-SECRET-TOKEN-HERE";

        if ($http_authorization != "Bearer $expected_token") {
            return 401 '{"error": "unauthorized"}';
        }

        # Proxy to local Ollama
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 300s;  # LLM inference can be slow
        proxy_send_timeout 300s;

        # Streaming support (important for chat completions)
        proxy_buffering off;
        proxy_cache off;
        chunked_transfer_encoding on;
    }
}
EOF

sudo ln -sf /etc/nginx/sites-available/llm-proxy /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Generate a strong token:

# Generate a random token
openssl rand -hex 32
# Output: a1b2c3d4e5f6...  (use this as your token)

Test the authenticated endpoint:

# Should fail (no token)
curl http://localhost:11435/v1/models
# → 401 unauthorized

# Should succeed (with token)
curl http://localhost:11435/v1/models \
  -H "Authorization: Bearer sk-local-YOUR-SECRET-TOKEN-HERE"
# → {"object":"list","data":[{"id":"qwen2.5:14b",...}]}
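The two curl checks above can be folded into one pass/fail script, which is handy to rerun after every config change. This is a sketch, not an official tool: `http_status`, `auth_verdict`, and `check_proxy` are hypothetical helper names, and `LLM_TOKEN` in the usage line is an assumed environment variable holding your token.

```shell
#!/bin/sh
# Sketch: confirm the proxy rejects anonymous requests and accepts the token.
# http_status, auth_verdict and check_proxy are hypothetical helper names.

# Print only the HTTP status code of a GET; extra arguments go to curl.
http_status() {
    url=$1; shift
    curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$@" "$url"
}

# Interpret the pair: (status without token, status with token).
auth_verdict() {
    if [ "$1" = "401" ] && [ "$2" = "200" ]; then
        echo "ok: proxy enforces the token"
    elif [ "$1" = "200" ]; then
        echo "DANGER: endpoint answers without a token"
    else
        echo "check config: got $1 without token, $2 with token"
    fi
}

check_proxy() {
    base=$1; token=$2
    anon=$(http_status "$base/v1/models")
    authed=$(http_status "$base/v1/models" -H "Authorization: Bearer $token")
    auth_verdict "$anon" "$authed"
}
```

Usage: `check_proxy http://localhost:11435 "$LLM_TOKEN"`. The same call works against the tunnel URL later, so one script covers both sides of the tunnel.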

Option B: Caddy (simpler config)

# Caddyfile
:11435 {
    @auth {
        header Authorization "Bearer sk-local-YOUR-SECRET-TOKEN-HERE"
    }
    handle @auth {
        reverse_proxy localhost:11434
    }
    respond 401
}

Option C: litellm proxy (if you want model aliasing)

LiteLLM can sit in front of Ollama and add auth + model name mapping:

# litellm_config.yaml
model_list:
  - model_name: "gpt-4"  # alias your local model as gpt-4
    litellm_params:
      model: "ollama/qwen2.5:14b"
      api_base: "http://localhost:11434"

general_settings:
  master_key: "sk-local-YOUR-SECRET-TOKEN-HERE"

Then start the proxy on port 11435:

litellm --config litellm_config.yaml --port 11435

This is useful if the cloud agent expects specific model names like gpt-4 — you can alias your local model without changing the cloud config.


Step 3: Create the tunnel

You need to expose port 11435 (the authenticated proxy) to the internet so the cloud agent can reach it. Here are four options, from easiest to most control.

Option A: Cloudflare Tunnel (easiest, free)

# Install cloudflared
# macOS: brew install cloudflare/cloudflare/cloudflared
# Linux: https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/

# Quick tunnel (no Cloudflare account needed, ephemeral URL)
cloudflared tunnel --url http://localhost:11435

Output:

Your quick Tunnel has been created! Visit it at:
https://random-words-here.trycloudflare.com
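Because the quick-tunnel URL changes on every run, it can be convenient to pull it out of the cloudflared output instead of copying it by hand. A minimal sketch, assuming the output has been saved to a file; `extract_tunnel_url` and the `tunnel.log` file name are hypothetical.

```shell
#!/bin/sh
# Sketch: print the first trycloudflare.com URL found on standard input.
# extract_tunnel_url is a hypothetical helper name.

extract_tunnel_url() {
    grep -Eo 'https://[a-z0-9-]+\.trycloudflare\.com' | head -n 1
}
```

For example, run `cloudflared tunnel --url http://localhost:11435 2>&1 | tee tunnel.log` in one terminal, then `echo "Base URL: $(extract_tunnel_url < tunnel.log)/v1"` in another to get the value needed in Step 4.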

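That URL is your tunnel endpoint. For a persistent tunnel (survives reboots, stable URL), you need a Cloudflare account with your domain added to it:

# One-time browser login that authorizes cloudflared for your account
cloudflared tunnel login
cloudflared tunnel create llm-tunnel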
cloudflared tunnel route dns llm-tunnel llm.yourdomain.com

# Create config
cat > ~/.cloudflared/config.yml << EOF
tunnel: <tunnel-id>
credentials-file: /home/user/.cloudflared/<tunnel-id>.json

ingress:
  - hostname: llm.yourdomain.com
    service: http://localhost:11435
  - service: http_status:404
EOF

# Run as service
cloudflared service install

Option B: Tailscale (best for existing Tailscale users)

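If you already use Tailscale, your machine has a stable IP on the mesh network. No extra tunnel is needed, as long as the host running the cloud agent is on the same tailnet (that address is not reachable from the public internet):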

# Your Tailscale IP (e.g., 100.x.y.z)
tailscale ip -4

# The cloud agent connects to:
# http://100.x.y.z:11435/v1/chat/completions

For HTTPS, use Tailscale HTTPS:

tailscale cert your-machine.tailnet-name.ts.net

Option C: SSH Reverse Tunnel (quick and dirty)

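If you have a VPS or any server with a public IP, you can forward a port from it. The VPS needs GatewayPorts yes in /etc/ssh/sshd_config; by default the forwarded port listens only on the VPS loopback interface. Traffic to port 9000 is plain HTTP, so treat this as a testing setup unless you add TLS on the VPS: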

# From your local machine, tunnel port 11435 to the remote server's port 9000
ssh -R 9000:localhost:11435 user@your-vps.com -N

# The cloud agent connects to:
# http://your-vps.com:9000/v1/chat/completions

Make it persistent with autossh:

autossh -M 0 -f -R 9000:localhost:11435 user@your-vps.com -N \
  -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3"

Option D: NAT Port Forward (classic, no dependencies)

On your router:

  1. Forward external port 443 → internal IP:443, the port the TLS config below listens on (plus port 80 while certbot runs its HTTP challenge)
  2. Set up Dynamic DNS (e.g., noip.com, DuckDNS) if you don't have a static IP

Add TLS with Let's Encrypt + certbot on your nginx:

sudo apt install certbot python3-certbot-nginx
sudo certbot --nginx -d llm.yourdomain.com

Updated nginx config becomes:

server {
    listen 443 ssl;
    server_name llm.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/llm.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/llm.yourdomain.com/privkey.pem;

    location / {
        set $expected_token "sk-local-YOUR-SECRET-TOKEN-HERE";
        if ($http_authorization != "Bearer $expected_token") {
            return 401 '{"error": "unauthorized"}';
        }
        proxy_pass http://127.0.0.1:11434;
        proxy_read_timeout 300s;
        proxy_send_timeout 300s;
        proxy_buffering off;
    }
}

Step 4: Point ClawBase at your tunnel

This is the only change on the ClawBase side. Open your agent, go to the Model tab, and:

  1. Under AI Source, select "Use your own API key"
  2. Set Provider to "Custom (OpenAI-compatible)"
  3. Fill in the three fields that appear:
Base URL:  https://your-tunnel-url.com/v1
Model:     qwen2.5:14b   (or whatever you're serving)
API Key:   sk-local-YOUR-SECRET-TOKEN-HERE
  4. Click Save Settings

That's it. The "Custom (OpenAI-compatible)" provider accepts any endpoint that speaks the standard /v1/chat/completions format — Ollama, vLLM, llama.cpp, or anything behind your tunnel.

Verify it works before saving:

curl https://your-tunnel-url.com/v1/chat/completions \
  -H "Authorization: Bearer sk-local-YOUR-SECRET-TOKEN-HERE" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:14b",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 50
  }'

If you get a response from your local model, the tunnel is working.


Step 5: Verify the hybrid setup

At this point you have two parallel paths to the same local model:

| Interface | Path | Memory | Background tasks |
|---|---|---|---|
| Odysseus (local UI) | Direct to Ollama on localhost | ChromaDB (basic vector) | Only while app is open |
| ClawBase (cloud agent) | Through tunnel to Ollama | 6-layer compound stack | Cron, scheduled, 24/7 |
| Telegram/Slack | Through ClawBase → tunnel → Ollama | 6-layer compound stack | Anytime, anywhere |

Both use your GPU for inference. Neither pays OpenAI or Anthropic a cent.

Test the memory:

  1. Tell ClawBase something: "My main project uses Next.js with Supabase. I prefer terse responses."
  2. Close the conversation.
  3. Open a new conversation hours later: "What stack is my project using?"
  4. The agent remembers.

Try the same in Odysseus. Depending on the model and ChromaDB config, it may or may not retain this. The 6-layer stack (journal, DAG, QMD, Mem0, Cognee, Graphiti) is what makes the difference — each layer captures context differently, so things don't just get stuffed into a vector store and forgotten.


Step 6: Systemd service (keep it running)

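Make the tunnel start on boot. nginx installed from a distro package normally runs as an enabled systemd service already (systemctl is-enabled nginx confirms it), so only the tunnel needs a unit. If you ran cloudflared service install in Step 3, that already covers the Cloudflare tunnel and you can skip this file: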

# /etc/systemd/system/llm-tunnel.service
[Unit]
Description=LLM Tunnel (Cloudflare)
After=network-online.target ollama.service
Wants=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/bin/cloudflared tunnel run llm-tunnel
Restart=always
RestartSec=10
User=your-username

[Install]
WantedBy=multi-user.target

Enable and start it:

sudo systemctl enable --now llm-tunnel

For the SSH tunnel variant:

# /etc/systemd/system/llm-ssh-tunnel.service
[Unit]
Description=LLM SSH Reverse Tunnel
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/bin/ssh -R 9000:localhost:11435 user@your-vps.com -N -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" -o "ExitOnForwardFailure yes"
Restart=always
RestartSec=15
User=your-username

[Install]
WantedBy=multi-user.target

Security considerations

You're exposing a local service to the internet. Take this seriously:

  1. Always use the auth proxy. Never tunnel raw Ollama/vLLM without authentication.
  2. Rotate your token periodically. Store it as an environment variable, not hardcoded.
  3. Use TLS. Cloudflare Tunnel handles this automatically. For NAT port forward, use Let's Encrypt.
  4. Rate limit. Add rate limiting in nginx to prevent abuse if your token leaks:
   # limit_req_zone must sit in the http block, outside server
   limit_req_zone $binary_remote_addr zone=llm:10m rate=10r/m;
   location / {
       limit_req zone=llm burst=5;
       # ... rest of proxy config
   }
  5. Monitor logs. Check nginx access logs for rejected requests (the default log format does not record the listening port, so filter on the 401 status instead):
   tail -f /var/log/nginx/access.log | grep '" 401 '
  6. IP allowlist. If your cloud agent has a static IP and connects to nginx directly (for example the NAT option; behind Cloudflare Tunnel every request arrives from localhost), lock it down:
   allow 1.2.3.4;  # ClawBase IP
   deny all;
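Token rotation from item 2 can be scripted so it is not skipped. The sketch below rewrites the `set $expected_token` line of an nginx config in place. `new_token` and `rotate_token` are hypothetical helper names, and the script assumes the config contains exactly the `set $expected_token "...";` form used in this guide.

```shell
#!/bin/sh
# Sketch: rotate the Bearer token in an nginx config file.
# new_token and rotate_token are hypothetical helper names.

new_token() {
    # 32 random bytes as 64 hex characters, prefixed like the examples above
    printf 'sk-local-%s' "$(openssl rand -hex 32)"
}

rotate_token() {
    conf=$1; token=$2
    tmp=$(mktemp) || return 1
    # Replace whatever value is currently assigned to $expected_token.
    # Writing back with cat keeps the owner and mode of the config file.
    sed 's|set \$expected_token "[^"]*";|set $expected_token "'"$token"'";|' "$conf" > "$tmp" \
        && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A rotation run would look like `t=$(new_token); sudo sh -c ". ./rotate.sh; rotate_token /etc/nginx/sites-available/llm-proxy '$t'"` (with the functions saved in a hypothetical `rotate.sh`), followed by `sudo nginx -t && sudo systemctl reload nginx` and pasting the new value into the API Key field in ClawBase.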

Performance notes

Local model inference over a tunnel adds network latency. Expect:

| Setup | Time to first token |
|---|---|
| Odysseus → Ollama (localhost) | ~50-200ms |
| ClawBase → Tunnel → Ollama | ~200-500ms (depending on tunnel) |
| ClawBase → OpenAI API | ~300-800ms |

The tunnel adds latency comparable to a normal API call. For most use cases (agent tasks, background work, Telegram messages), this is imperceptible. For real-time streaming chat, you'll feel it — use Odysseus locally for that.

Throughput depends on your GPU and model size. A 14B model on an RTX 4090 generates ~50 tokens/sec. Through a tunnel, the bottleneck is almost always inference speed, not the network.
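The latency figures above are worth checking on your own setup. One rough way is curl's built-in timing: request a single token and read the time until the first response byte. This is an approximation of time to first token, not an exact measurement; `to_ms` and `ttfb_ms` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: approximate time to first token via curl's time_starttransfer.
# to_ms and ttfb_ms are hypothetical helper names.

# Convert curl's seconds (e.g. 0.25) to whole milliseconds.
to_ms() {
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000 }'
}

ttfb_ms() {
    base=$1; token=$2; model=$3
    # max_tokens 1 keeps generation short, so the first byte arrives
    # roughly when the first token is ready.
    secs=$(curl -s -o /dev/null --max-time 120 \
        -w '%{time_starttransfer}' \
        -H "Authorization: Bearer $token" \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 1}" \
        "$base/v1/chat/completions")
    to_ms "$secs"
}
```

Compare `ttfb_ms http://localhost:11435 "$LLM_TOKEN" qwen2.5:14b` with the same call against your tunnel URL (`LLM_TOKEN` is an assumed environment variable). Run each a few times, since the first request after a model load is usually much slower.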


What's next

This works today with no code changes to either project. A couple of things I'm watching:

  • Odysseus API — Odysseus is 4 days old. If it exposes an API for external access or webhooks for incoming messages, the integration gets tighter: conversations stored in both places, memory synced both ways.
  • MCP bridge — Both Odysseus and OpenClaw support MCP. A shared MCP server for memory could let both frontends read and write to the same knowledge base.

You don't have to pick sides. Your model stays local, your inference stays free, and the memory layer lives wherever makes sense for your setup.


If you want to try this setup, Odysseus is MIT-licensed and free. ClawBase has a 7-day free trial; paid plans start at $16/mo. The tunnel takes about 10 minutes.

Questions? Drop a comment or find me on Twitter/X.
