
Steven Aragón Urrea


Experience Working with OpenClaw (Clawbot)


I want to share my experience as a developer working with OpenClaw (Clawbot), including a real-world setup and some practical insights after using it in production-like scenarios.

My setup is based on:

  • AMD Ryzen 5 5600X
  • 32GB RAM
  • RTX 3060 (LHR)
  • NVMe SSD storage
  • Ubuntu Server (headless environment)

I also experimented with multiple Ollama local models as part of a fallback strategy, along with cloud models like Kimi 2.5.

From a configuration standpoint, OpenClaw is a powerful and flexible system — but that flexibility comes at a cost.


Real Setup (Explained)

Instead of showing raw configuration, here’s how my setup is structured conceptually:

  • Primary model: Claude Opus 4.6
  • Cloud alternative: Kimi 2.5
  • Fallback #1: Local models via Ollama (GPU)
  • Fallback #2: OpenAI models
  • Interface: Telegram
  • Remote access: Tailscale as a service

Flow:

Primary cloud model → Alternative cloud → Local fallback → Secondary cloud fallback

This setup aims to balance:

  • Performance
  • Reliability
  • Offline capability
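The fallback flow above can be sketched as a simple chain that tries each provider in order and moves on when one fails. Everything here is hypothetical and for illustration only — the `ask` helper and the provider callables are not OpenClaw's actual API:

```python
# Hypothetical sketch of the fallback chain described above.
# None of these names come from OpenClaw's real API.

def ask(providers, prompt):
    """Try each provider in order; return the first successful answer."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # provider down, rate-limited, etc.
            errors.append((name, exc))
    raise RuntimeError(f"All providers failed: {errors}")

# Mock providers standing in for the real chain:
# primary cloud -> alternative cloud -> local fallback -> secondary cloud
def cloud_down(prompt):
    raise TimeoutError("cloud unreachable")

chain = [
    ("claude-opus", cloud_down),
    ("kimi", lambda p: f"kimi says: {p}"),
    ("ollama-local", lambda p: f"local says: {p}"),
    ("openai", lambda p: f"openai says: {p}"),
]

name, answer = ask(chain, "hello")
print(name, answer)  # kimi kimi says: hello
```

The point of the ordering is that the local fallback only has to carry the load when both preferred cloud providers are unavailable.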

Infrastructure & Tooling

Ubuntu Server

I chose Ubuntu Server to fully utilize the machine:

  • Lower overhead (no GUI)
  • Better resource allocation for models
  • More predictable performance for long-running processes

Tailscale (as a service)

I used Tailscale running as a background service to access the machine remotely.

The experience was excellent:

  • Fast and stable connections
  • Zero-config networking
  • Secure remote access without exposing ports

This made it extremely easy to:

  • Manage OpenClaw remotely
  • Debug issues
  • Interact with the system from anywhere

Claude Code for Setup

I used Claude Code to bootstrap and configure the environment.

This significantly reduced setup friction:

  • Faster iteration
  • Easier debugging
  • Better guidance when wiring models and fallbacks

Local Models Tested (Ollama)

I tested several local models using Ollama:

  • Gemma 3 (12B)
  • Qwen 3 (14B, abliterated)
  • Qwen 3.5 (9B)
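For reference, Ollama serves a local HTTP API (by default on `localhost:11434`), and talking to any of these models boils down to a POST against `/api/generate`. The `generate` wrapper below is my own minimal sketch, not part of Ollama, and the exact model tag may differ on your install:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to a locally running Ollama server and return its response text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

# Example (requires `ollama serve` running and the model pulled locally):
# print(generate("gemma3:12b", "Summarize fallback strategies in one line."))
```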

Model Performance Ranking

🧠 Overall Ranking (Reasoning + Speed)

  1. Kimi 2.5 (cloud) → Best overall performance
  2. Gemma 3 (local)
  3. Qwen 3 (local)
  4. Qwen 3.5 (local)

☁️ Provider Ranking

  1. Anthropic (Claude models) → Most reliable reasoning
  2. OpenAI models → Strong and consistent
  3. Ollama (local models) → Significantly weaker

My Experience with Local Models

Using Ollama with local models is a great idea in theory.

Pros:

  • Works without internet
  • Fully local
  • Good fallback strategy

Reality:

Even running on an RTX 3060 and after testing multiple models, local models were, in practice, a major downgrade.

It almost feels like the system gets “lobotomized” when switching to them:

  • Weak reasoning
  • Poor context handling
  • Inconsistent outputs

This becomes very clear when compared to:

  • Claude (Anthropic)
  • OpenAI models
  • Kimi 2.5 (which performed surprisingly well)

Key Challenges

1. Opacity in the TUI

The TUI feels like a black box.

You don’t know:

  • Which model is active
  • When fallbacks trigger
  • Why decisions are made

This makes debugging painful.


2. Lack of Cross-Channel Consistency

  • Telegram ≠ TUI
  • No shared continuity
  • Fragmented sessions

3. Configuration Complexity

You must carefully align:

  • Models
  • Providers
  • Fallbacks
  • Channels

Otherwise:

  • Silent failures
  • Weird behaviors
  • Hard-to-debug issues
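To illustrate the alignment problem, consider a small validation pass over the configuration. The shape below is hypothetical — it is not OpenClaw's actual schema — but it shows the kind of check that turns a silent failure into an explicit error:

```python
# Hypothetical config shape, for illustration only -- not OpenClaw's real schema.
config = {
    "providers": {"anthropic": {}, "openai": {}, "ollama": {}},
    "models": {
        "primary": {"provider": "anthropic", "name": "claude-opus"},
        "fallback": {"provider": "ollama", "name": "gemma3:12b"},
    },
    "channels": {"telegram": {"model": "primary"}},
}

def validate(cfg: dict) -> list:
    """Return a list of misalignments instead of failing silently."""
    problems = []
    for alias, model in cfg["models"].items():
        if model["provider"] not in cfg["providers"]:
            problems.append(
                f"model '{alias}' points at unknown provider '{model['provider']}'"
            )
    for chan, settings in cfg["channels"].items():
        if settings["model"] not in cfg["models"]:
            problems.append(
                f"channel '{chan}' points at unknown model '{settings['model']}'"
            )
    return problems

print(validate(config))  # [] -> models, providers, and channels all line up
```

A check like this is exactly what I found myself wanting from the system itself: a loud error at startup instead of weird behavior at runtime.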

Final Thoughts

OpenClaw has a strong architectural foundation:

  • Multi-model orchestration
  • Fallback strategies
  • Multi-channel interaction

But it needs improvements in:

  • Transparency (TUI)
  • Cross-channel consistency
  • Developer experience
  • Local model performance

Conclusion

Combining:

  • Ubuntu Server
  • Tailscale
  • Cloud + local models

creates a very powerful personal AI infrastructure.

However:

Today, cloud models still massively outperform local ones, even on decent hardware like an RTX 3060.

The idea is solid.

The execution is promising.

But the ecosystem — especially around local models — still has a long way to go.
