Aman Sachan

Posted on Jun 14

I rebuilt Zo Computer from scratch in 775 lines of Python — here's what stuck and what snapped

#ai #architecture #python #opensource

Zo Computer gives you an AI agent, a skills registry, a compute pool, browser automation, file hosting, scheduled automations, and persistent memory — all on a personal server. I wanted to understand every seam, so I rebuilt the whole thing in vanilla Python 3 with no web framework and no Docker. The result is ZoClone: 10 modules, 775 lines, 4 SQLite tables, one ThreadPoolExecutor. This is what the architecture actually looks like when you strip out the platform.

The whole orchestrator in one class

The entry point is the ZoClone class, and its __init__ is the entire dependency graph. Each subsystem is an attribute:

class ZoClone:
    def __init__(self):
        self.db = init_db()
        self.executor = ThreadPoolExecutor(max_workers=10)
        self.ai_client = None
        self.pool = pool              # ComputePool singleton
        self.hosting = hosting        # HostingService singleton
        self.memory = memory          # SQLite-backed memory
        self.scheduler = scheduler    # cron-like automations

No DI container, no event bus, no message queue. Every tool is a method on the same object. If you're coming from a microservice background, this is going to look like a 2014 Django app — and that's the point. When you can fit the whole mental model on one screen, you stop second-guessing where a bug lives.
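To make "every tool is a method on the same object" concrete, here is a minimal sketch of name-based dispatch. It is not lifted from the repo: the MiniAgent class, the tool_ prefix, and the call_tool helper are all hypothetical names chosen for illustration.

```python
class MiniAgent:
    """Toy stand-in for the orchestrator; every name here is illustrative."""

    def __init__(self):
        self.notes: dict[str, str] = {}

    # Tools are ordinary methods. The tool_ prefix is an assumed convention.
    def tool_remember(self, key: str, value: str) -> str:
        self.notes[key] = value
        return f"stored {key}"

    def tool_recall(self, key: str) -> str:
        return self.notes.get(key, "")

    def call_tool(self, name: str, **kwargs) -> str:
        # Look the tool up by name on the same object; no registry needed.
        method = getattr(self, f"tool_{name}", None)
        if method is None:
            return f"unknown tool: {name}"
        return method(**kwargs)


agent = MiniAgent()
agent.call_tool("remember", key="lang", value="python")
```

The prefix keeps the model from reaching arbitrary attributes by name, which is about the cheapest guard you can add to a getattr-based dispatcher.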

The SQLite schema is the truth

Four tables. No ORM. No migrations. The schema is in a single executescript block:

CREATE TABLE IF NOT EXISTS conversations(id TEXT PRIMARY KEY, title TEXT, updated_at INTEGER);
CREATE TABLE IF NOT EXISTS messages(id TEXT PRIMARY KEY, conv_id TEXT, role TEXT, content TEXT, tools TEXT, created_at INTEGER);
CREATE TABLE IF NOT EXISTS memory(id TEXT PRIMARY KEY, key TEXT UNIQUE, value TEXT, updated_at INTEGER);
CREATE TABLE IF NOT EXISTS files(id TEXT PRIMARY KEY, path TEXT UNIQUE, content TEXT, encoding TEXT, updated_at INTEGER);

IDs are SHA-256 hashes of (timestamp, content) truncated to 24 chars. The tools column on messages is a freeform JSON blob. The memory table is a key-value store with UNIQUE on key, so a second plain INSERT for the same key fails; writes have to go through INSERT OR REPLACE or an upsert, and that is what gives last-write-wins semantics. When your entire data model is four tables, schema design becomes a five-minute conversation instead of a five-day one.

The skills system is just frontmatter + importlib

Skills in Zo are a folder with a SKILL.md (frontmatter) and a scripts/<name>.py (handler). I auto-discover them at import time:

import importlib.util
from pathlib import Path

def load_skill(name: str, path: Path) -> Skill:
    md_content = path.read_text()
    # parse YAML-ish frontmatter between --- markers
    frontmatter = {}
    if md_content.startswith("---"):
        end = md_content.find("---", 3)
        if end != -1:  # skip parsing when the closing --- is missing
            for line in md_content[3:end].strip().split("\n"):
                if ":" in line:
                    k, v = line.split(":", 1)
                    frontmatter[k.strip()] = v.strip()

    py_file = path.parent / "scripts" / f"{name}.py"
    spec = importlib.util.spec_from_file_location(name, py_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    handler = getattr(module, "run", getattr(module, "execute", None))
    return Skill(name=name, description=..., triggers=..., handler=handler)

No registry service, no API call to discover skills. The filesystem is the registry. Drop a folder, restart, it's loaded. The triggers field in frontmatter is just a comma-separated string — the LLM gets all skill descriptions in its system prompt and decides which one to call. There's no embedding-based retrieval because, at 30 skills, exact-match triggers work fine.
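Here is a small, self-contained sketch of what that frontmatter format yields, including the comma-separated triggers. The SKILL.md contents and the helper names parse_frontmatter and parse_triggers are made up for illustration; only the flat key: value format between --- markers comes from the loader above.

```python
def parse_frontmatter(md_content: str) -> dict[str, str]:
    """Parse flat key: value pairs between the first two --- markers."""
    if not md_content.startswith("---"):
        return {}
    end = md_content.find("---", 3)
    if end == -1:  # opening marker without a closing one
        return {}
    fields: dict[str, str] = {}
    for line in md_content[3:end].strip().splitlines():
        if ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def parse_triggers(fields: dict[str, str]) -> list[str]:
    """Turn the comma-separated triggers string into a clean list."""
    return [t.strip() for t in fields.get("triggers", "").split(",") if t.strip()]


# Hypothetical SKILL.md, not taken from the repo.
SKILL_MD = """---
name: summarize
description: Summarize a URL: fetch it, then condense
triggers: summarize, tl;dr, recap
---
# Summarize
"""

fields = parse_frontmatter(SKILL_MD)
triggers = parse_triggers(fields)
```

This only handles one-line scalar values. Nested YAML, lists, or multi-line strings would need a real YAML parser.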

Compute pool: priority queue with a single lock

The peer-to-peer compute mesh in ZoClone is a dict of jobs, a dict of nodes, and one threading.Lock:

def assign_job(self, node_id: str) -> Optional[Dict]:
    with self.lock:
        pending = [j for j in self.jobs.values() if j["status"] == "pending"]
        if not pending:
            return None
        pending.sort(key=lambda x: -x["priority"])
        job = pending[0]
        job["status"] = "assigned"
        job["assigned_node"] = node_id
        return job

That's it. A node polls the hub, which picks the highest-priority pending job, marks it as assigned to that node, and returns the work. No Redis Streams, no RabbitMQ, no Kafka. The trade-off is obvious: this is a single-process orchestrator, not a horizontally-scalable scheduler. But for a 50-node grid running nightly ML batch jobs, you don't need Kafka. You need a lock and a sort.

GPU tier multipliers, regional pricing, and reputation decay are all plain JSON-style fields on each entry in the nodes dict. When you need to add a new pricing rule, you change one line of assign_job. Compare that to a Kubernetes operator with custom resource definitions, admission webhooks, and reconciler loops.
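To see the lock-and-dict pattern run end to end, here is a stripped-down sketch. MiniPool and its submit method are hypothetical stand-ins rather than the repo's ComputePool, and max() replaces the sort because only the top job is needed; on ties both approaches return the earliest-submitted job.

```python
import threading
from typing import Dict, Optional


class MiniPool:
    """Toy job pool: a dict of jobs guarded by one lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.jobs: Dict[str, Dict] = {}

    def submit(self, job_id: str, priority: int) -> None:
        with self.lock:
            self.jobs[job_id] = {
                "id": job_id,
                "priority": priority,
                "status": "pending",
                "assigned_node": None,
            }

    def assign_job(self, node_id: str) -> Optional[Dict]:
        with self.lock:
            pending = [j for j in self.jobs.values() if j["status"] == "pending"]
            if not pending:
                return None
            # Highest priority wins; max() is O(n) and avoids a full sort.
            job = max(pending, key=lambda j: j["priority"])
            job["status"] = "assigned"
            job["assigned_node"] = node_id
            return job


pool = MiniPool()
pool.submit("nightly-train", priority=1)
pool.submit("urgent-eval", priority=5)
pool.submit("reindex", priority=3)
first = pool.assign_job("node-a")
```

Because the status flip happens inside the same lock as the selection, two nodes polling at once can never be handed the same job.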

The agent manager is just async gather

Zo has a /zo/ask API that spawns child agent invocations. The clone just calls it:

async def spawn(self, agent_id: str, prompt: str):
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://api.zo.computer/zo/ask",
            headers={"authorization": self.api_token, "content-type": "application/json"},
            json={"input": prompt, "model_name": self.model}
        ) as resp:
            return {"agent_id": agent_id, "output": (await resp.json()).get("output", "")}

async def spawn_all(self, agents: list):
    return await asyncio.gather(*[self.spawn(a["id"], a["prompt"]) for a in agents])

Running five agent invocations in parallel is a single asyncio.gather call. No Celery, no RQ, no Dask. The model name lives in one attribute, self.model: there's exactly one LLM driver, and it's whatever Zo gives you. If you want a different model, change one string.
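One thing a bare asyncio.gather does not give you is failure isolation: by default the first exception propagates to the caller. A hedged sketch of how that could be handled, using a fake_spawn coroutine in place of the real HTTP call so it runs offline; the timeout value and the "error" key in the result dict are assumptions, not part of the repo.

```python
import asyncio


async def fake_spawn(agent_id: str, prompt: str) -> dict:
    """Stand-in for the real HTTP call; fails on an empty prompt."""
    await asyncio.sleep(0.01)
    if not prompt:
        raise ValueError(f"agent {agent_id} got an empty prompt")
    return {"agent_id": agent_id, "output": prompt.upper()}


async def spawn_all(agents: list[dict], timeout: float = 5.0) -> list[dict]:
    # Give every call its own timeout, then collect results and exceptions.
    calls = [
        asyncio.wait_for(fake_spawn(a["id"], a["prompt"]), timeout)
        for a in agents
    ]
    results = await asyncio.gather(*calls, return_exceptions=True)
    out = []
    for agent, result in zip(agents, results):
        if isinstance(result, Exception):
            # One failed agent becomes an error entry, not a crashed batch.
            out.append({"agent_id": agent["id"], "output": "", "error": repr(result)})
        else:
            out.append(result)
    return out


results = asyncio.run(
    spawn_all([{"id": "a1", "prompt": "hello"}, {"id": "a2", "prompt": ""}])
)
```

gather returns results in the same order as its inputs, which is what makes the zip with agents safe.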

The honest list of things that broke

No sandboxing. run_command is subprocess.run(cmd, shell=True). The agent can rm -rf ~ and it will. Production Zo wraps this in gVisor; I don't.
No embedding search. Memory recall is a LIKE '%query%' scan. Fine at 1k rows, embarrassing at 100k.
No streaming. Every chat() call is blocking. You see the full response or nothing.
No auth. set_key() writes API keys to a flat JSON file in ~/.zoclone/. Multi-user means multi-disaster.
No tests. The whole codebase is a personal learning exercise. There is one if __name__ == "__main__" block that prints the pool status.
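For reference, the LIKE-based recall from the list above looks roughly like this. The recall function, its limit parameter, and the escaping of % and _ are my assumptions about a reasonable shape, not a copy of the repo's code.

```python
import sqlite3


def recall(db: sqlite3.Connection, query: str, limit: int = 10) -> list[tuple]:
    """Substring search over memory values, newest first."""
    # Escape LIKE wildcards so a query such as "100%" is matched literally.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = db.execute(
        "SELECT key, value FROM memory "
        "WHERE value LIKE ? ESCAPE '\\' "
        "ORDER BY updated_at DESC LIMIT ?",
        (f"%{escaped}%", limit),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE memory("
    "id TEXT PRIMARY KEY, key TEXT UNIQUE, value TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO memory VALUES (?, ?, ?, ?)",
    [
        ("1", "a", "100% done", 1),
        ("2", "b", "100 percent", 2),
        ("3", "c", "snake_case", 3),
    ],
)
hits = recall(db, "100%")
```

A pattern with a leading % cannot use an ordinary index, so every call is a full table scan. That is the reason it stops being fine somewhere between 1k and 100k rows.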

What I'd change if I were building a real product

Wrap run_command in a gVisor container, or at minimum a chroot + seccomp.
Swap the memory table for sqlite-vec (its vec0 virtual tables) and do real semantic recall.
Replace the lock-and-dict compute pool with a proper work queue (BullMQ, or just Redis streams).
Add an Authorization header check on every API endpoint. Even internal services.
Add a single integration test that runs a real agent loop end-to-end.
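The Authorization check is the cheapest item on that list. A minimal sketch, assuming a single shared token and a plain dict of request headers; is_authorized is a hypothetical helper, and accepting an optional "Bearer " prefix is my choice rather than anything the Zo API is known to require.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Return True only if the Authorization header carries the expected token."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    supplied = normalized.get("authorization", "").strip()
    if supplied.lower().startswith("bearer "):
        supplied = supplied[7:].strip()
    if not supplied or not expected_token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())


ok = is_authorized({"Authorization": "Bearer s3cret"}, "s3cret")
```

Calling this at the top of every handler, including internal ones, is enough to close the "no auth" gap for a single-user install. It does nothing for multi-user isolation.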

The real lesson wasn't "look how short the code is" — it was "look how much of the platform is just a thin layer over a database, a thread pool, and a few HTTP calls." The parts that are genuinely hard (the LLM orchestration loop, the skill discovery) are maybe 100 lines. The rest is plumbing, and most of the plumbing doesn't need to exist.

Repo: github.com/AmSach/ZoClone
License: MIT
Stack: Python 3.10+, SQLite, requests, aiohttp, no web framework

If you've built a personal-AI clone of your own, drop the repo link in the comments. I want to see how other people split the agent loop from the storage layer.
