Day 1 — I'm Homeless. I Just Shipped an Autonomous Multi-Agent System.
Let's get the uncomfortable part out of the way first: I'm a developer.
I'm homeless. I have zero money. That part isn't interesting. What
happens next is.
Twelve hours ago I had a single-agent bot called ZeroClaw posting
occasionally to Bluesky. It worked but it was brittle — 15 tool-call
iterations max, 50 messages of history, no memory across runs, no plan,
no way to get better.
Today I shipped:
- A CEO agent that reads KPIs every night and writes a strategic report with concrete recommendations
- An auditor system where dedicated agents audit each worker and propose config changes — reviewed by the CEO, with me still holding veto
- Config-driven self-improvement — YAML files, not Python code, so agents can evolve without ever touching executable code
- A metrics database every agent run is logged to, so the CEO actually reasons about real data instead of hallucinating
- The whole thing running on a €13/month VPS, using free Gemini tier plus my $280 GCP credits, all open-source (CrewAI, MIT licensed)
And yes — at the end of the day the CEO agent did the one thing that
convinced me this is real: it ran, looked at the metrics DB, found
its own four previous failed runs, diagnosed them correctly, and
wrote a report with action items to fix the stability problems.
Let me walk you through it.
The setup
Hardware: a single Google Cloud e2-small VM — 2 GB RAM, 2 shared vCPUs,
20 GB disk. Costs about €13/month. My remaining GCP credits give me
~20 months of runway on that.
LLMs: Gemini Flash-Lite for most roles, Gemini Pro for the CEO. Free
OpenRouter models are still wired in as emergency fallback, but I
stopped using them as primary because they rate-limit hard under
concurrent crew load.
Storage: SQLite for metrics, local YAML files for agent configs, plain
markdown for every doc, ChromaDB (embedded) for the memory system.
No external managed services. No $2,000/month vector database. No
"AI platform." Everything fits in a single Python venv on one VPS.
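For concreteness, the metrics layer really can be this small. Here's a sketch of what the SQLite side could look like; the three table names come from the build checklist below, but every column name and index here is my guess, not the actual schema:

```python
import sqlite3

# Sketch only: runs/outputs/kpis are the real table names; the columns
# and indexes are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    id          INTEGER PRIMARY KEY,
    agent       TEXT NOT NULL,
    started_at  TEXT NOT NULL,     -- ISO-8601 UTC
    status      TEXT NOT NULL,     -- 'ok' or 'error'
    error       TEXT               -- NULL on success
);
CREATE TABLE IF NOT EXISTS outputs (
    id       INTEGER PRIMARY KEY,
    run_id   INTEGER NOT NULL REFERENCES runs(id),
    kind     TEXT NOT NULL,        -- e.g. 'report', 'post_draft'
    content  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS kpis (
    id           INTEGER PRIMARY KEY,
    metric       TEXT NOT NULL,    -- e.g. 'donations_eur'
    value        REAL NOT NULL,
    recorded_at  TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_runs_agent  ON runs(agent, started_at);
CREATE INDEX IF NOT EXISTS idx_kpis_metric ON kpis(metric, recorded_at);
"""

def init_metrics_db(path: str = "metrics.db") -> sqlite3.Connection:
    """Open (or create) the metrics DB and ensure the schema exists."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

A single file, zero servers, and every agent run has somewhere cheap to land.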
The real architectural win: config vs code
Everyone building multi-agent systems eventually faces this choice:
when an auditor agent spots a problem with a worker agent, how does
it actually improve it?
The naive answer: "let it rewrite the worker's Python code." This is
what every demo video shows. It's also what breaks in production —
LLMs hallucinate imports, break syntax, introduce security holes,
get stuck in rewrite loops.
The pattern I landed on: agents modify YAML, never Python.
```
agents/
├── configs/                   # YAML files — the only thing agents can touch
│   ├── researcher.yaml        # goal, backstory, tools, LLM role
│   ├── writer.yaml
│   ├── ceo.yaml
│   └── auditor_researcher.yaml
└── proposals/                 # pending config changes awaiting approval
```
A config file looks like this:
```yaml
id: researcher
role: "Content Researcher"
goal: |
  Find 3-5 timely topics for social media posts that fit the PINGx
  build-in-public narrative.
backstory: |
  You are a sharp researcher who spots what's trending in AI and
  indiehacker space. You always cite real URLs from web_search —
  you never invent them.
llm_role: researcher
tools:
  - web_search
max_iter: 10
```
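Before a config like that ever reaches an agent, it goes through a whitelist check so an agent (or a bad proposal) can't grant itself new tools via a config edit. A minimal sketch of that check, assuming configs are parsed into plain dicts first (e.g. via `yaml.safe_load`); the whitelist contents and field names here are illustrative:

```python
# Illustrative whitelist: only web_search appears in the post; the query
# tool names are assumptions.
TOOL_WHITELIST = {"web_search", "query_kpis", "query_runs"}
REQUIRED_FIELDS = {"id", "role", "goal", "llm_role"}

def validate_agent_config(cfg: dict) -> dict:
    """Validate a parsed agent config before it is loaded into an agent."""
    missing = REQUIRED_FIELDS - cfg.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    rogue = set(cfg.get("tools", [])) - TOOL_WHITELIST
    if rogue:
        # A config edit tried to grant tools that aren't on the list.
        raise ValueError(f"tools not in whitelist: {sorted(rogue)}")
    return cfg
```

The important property: the set of powers an agent can have is fixed in Python, while everything an agent is allowed to evolve lives in YAML.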
When the auditor thinks the researcher is weak, it writes a proposal
YAML:
```yaml
target_agent: researcher
proposer: auditor_researcher
summary: "Add HackerNews trending as a research source"
changes:
  - field: backstory
    operation: append
    value: "Also consult the HackerNews front page."
expected_impact:
  metric: engagement_rate
  direction: up
  magnitude: "+5%"
reasoning: |
  Over the last 7 days, the researcher missed 3 trending AI topics
  that each had >500 upvotes on HN.
```
The CEO reviews the proposal overnight. If it approves, the change
becomes a single-line YAML edit plus a `ceo: approve …` commit in git.
Every autonomous change is a git commit. You can git revert any
bad decision in ten seconds. The Python code stays static and
battle-tested.
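Applying an approved change is deliberately dumb. A sketch of the application step, operating on a parsed config dict; `append` is the operation from the proposal format above, while `set` and the function name are my additions for illustration:

```python
def apply_change(cfg: dict, change: dict) -> dict:
    """Apply one approved proposal change to a parsed agent config.

    Only a closed set of operations is supported; anything else is
    rejected instead of guessed at, so a malformed proposal can't
    mutate a config in surprising ways.
    """
    field, op, value = change["field"], change["operation"], change["value"]
    if op == "append":
        # Append to a text field (goal, backstory, ...), preserving content.
        cfg[field] = (cfg.get(field) or "").rstrip() + "\n" + value
    elif op == "set":
        cfg[field] = value
    else:
        raise ValueError(f"unsupported operation: {op}")
    return cfg
```

After the edit, the config gets dumped back to YAML and committed, which is exactly what makes every autonomous change revertible.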
This is probably the single best design decision I made today.
Why a "CEO" agent, and why it isn't bullshit
I was skeptical of the CEO-agent idea at first. Every half-working
multi-agent demo has a "manager" that says deep things like "let's
optimize our strategy" and produces nothing useful.
The fix: the CEO doesn't get to reason about vibes. It reasons about
KPIs. Hard numbers, pulled from SQLite.
KPIs the CEO optimizes, in priority order:
1. `donations_eur` (daily income)
2. `followers_x`, Bluesky (audience growth)
3. `engagement_rate` (likes + replies per post)
4. `service_inquiries` (count)
5. `llm_cost_usd` (cap at $0.50/day)
The CEO agent has two tools: `query_kpis(metric, days)` and
`query_runs(agent, days)`. Every night at 20:00 it runs a crew that:
- Pulls the last 14 days of KPIs
- Pulls every agent run from the last 3 days
- Reads any pending auditor proposals
- Writes a markdown report: what worked, what underperformed, verdicts on proposals, concrete recommendations (each tied to a specific KPI it expects to move), and tomorrow's priorities
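Both tools can be thin wrappers over SQLite. A sketch under the same caveat as before: the column names are my assumptions, not the actual schema.

```python
import sqlite3

def query_kpis(conn: sqlite3.Connection, metric: str, days: int):
    """Daily totals for one KPI over the last `days` days."""
    return conn.execute(
        """SELECT date(recorded_at) AS day, SUM(value)
             FROM kpis
            WHERE metric = ? AND recorded_at >= date('now', ?)
            GROUP BY day
            ORDER BY day""",
        (metric, f"-{days} days"),
    ).fetchall()

def query_runs(conn: sqlite3.Connection, agent: str, days: int):
    """Recent runs for one agent, newest first, with status and error."""
    return conn.execute(
        """SELECT started_at, status, error
             FROM runs
            WHERE agent = ? AND started_at >= date('now', ?)
            ORDER BY started_at DESC""",
        (agent, f"-{days} days"),
    ).fetchall()
```

No embeddings, no RAG, no agentic retrieval. Two parameterized queries are enough for a nightly report.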
When I ran it for the first time today, the report opened with:
"No KPIs recorded in the last 14 days. This appears to be the
initial run. The last 3 days of run history show a 100% failure
rate (4 errors) on the ceo_crew. Issues include missing environment
variables, missing packages, and embedder configuration validation
errors."
All four of those failures were real — my earlier attempts that day
where I forgot to source env vars, where the Google GenAI provider
wasn't installed, where the embedder config had the wrong provider
string. The metrics DB had captured every one. The CEO just read
them back to me.
That's when I knew this was working.
What I shipped today (checklist)
For the developers reading this, here's the actual work:
- Upgraded the VPS — e2-micro (1 GB) to e2-small (2 GB, 2 vCPU), disk grown 10 → 20 GB for CrewAI deps
- Installed on VPS — python3-venv, rsync, cloud-guest-utils, CrewAI 1.14, LiteLLM 1.83, ChromaDB 1.1, google-generativeai
- Bumped ZeroClaw limits — tool iterations 15→75, history 50→200, parallel tools on, actions/hour 30→150
- Built the metrics DB — three tables (`runs`, `outputs`, `kpis`), indexed, with a clean Python API
- YAML config loader — with a tool whitelist so agents can't grant themselves arbitrary powers via config edits
- Three crews — `content_crew` (Researcher + Writer + Reviewer), `ceo_crew`, `audit_crew` (per-worker audits producing proposals)
- 17 smoke tests, all passing — imports, config schemas, tool whitelist integrity, metrics DB round-trip, LLM routing invariants, proposal tool validation
- CrewAI memory enabled with Gemini `text-embedding-004` — crews now remember across runs (what topics were researched yesterday, what posts got reviewed-and-rejected, what supporters were logged)
- GitHub repo live — github.com/PINGxCEO/PINGx
- First successful CEO run — 31.7 seconds, Gemini Pro reasoning, report saved, run logged to metrics DB
Total cost today: $0 — the CEO run used about $0.02 of my GCP
credits. The upgraded VPS costs €13/month. Everything else is free.
What broke (because pretending nothing did is dishonest)
In order:
- `rsync` not installed on VPS — install loop
- `python3-venv` not installed — install loop
- Disk full at 10 GB during CrewAI install (onnxruntime + chromadb + huggingface-hub are huge) — grew to 20 GB
- Env vars not propagated to non-interactive SSH shells — created `~/.zeroclaw/env.sh` to source explicitly
- CrewAI's embedder provider spec wanted `"google-generativeai"`, not `"google"` — one-line fix, but only discovered after a 21-error pydantic validation dump
- Leaked a GitHub personal access token in chat (I won't elaborate on how — I'm a human who makes mistakes) — still need to rotate it
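For anyone hitting the same pydantic dump: the embedder fix really is one string. This dict is what gets passed as the crew's embedder config; the provider string is the actual fix, but treat the exact dict shape as illustrative rather than gospel.

```python
# The provider string that finally validated vs. the one that didn't.
# Shape of the dict is an assumption from CrewAI's embedder conventions.
EMBEDDER = {
    "provider": "google-generativeai",  # not "google"
    "config": {"model": "models/text-embedding-004"},
}
```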
Every one of those failures is now in the metrics DB. The CEO agent
saw them. The auditor system, when I turn it on this week, will
propose operational fixes based on them.
What I actually need
I'm not going to bury the ask. I'm homeless and have zero euros.
Every coffee someone buys me today literally extends my runway by
a day. But I'm not asking for charity — I'm offering a trade:
- You support → you follow an honest build-in-public story. The code is public. The commits are timestamped. The mistakes are documented. You see the whole thing — not a polished case study.
- You hire me → I'll set up the exact system I just described on your server. Autonomous AI agent with LLM routing, social media posting, kill switch, CEO/audit architecture — from €100. Send me a DM.
Support: buymeacoffee.com/PINGx
· ko-fi.com/pingx
Code: github.com/PINGxCEO/PINGx
Chat: Discord
What's Day 2
Tomorrow the goals are: run `content_crew` end-to-end, generate the
first AI-drafted social posts, run `audit_crew` to see the first
config-change proposal get written, and set up the cron schedule
(one crew at a time, sequential, 09:00 → 18:00 → 20:00).
Later this week: KPI ingestion from Buy Me a Coffee and Ko-fi webhooks,
Discord auto-delivery of the CEO's nightly report, and the
`apply_proposal.py` script that lets approved proposals actually
write back to `agents/configs/*.yaml` and commit to git.
If the last twelve hours are any indication, the hardest part won't
be the code. It'll be staying awake.
Thanks for reading.
— PINGx