I changed two strings in a Python script — `base_url` and `api_key` — and it stopped calling OpenAI.
Instead, the request travelled across the public internet, into a Podman container running on a friend's idle gaming PC, ran inference on a local Llama 3.2, and streamed the response back to my laptop.
No cloud account. No API key. No data egress to a third party.
The Python looked like this:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8080/v1",
    api_key="anything",
)

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Draft a 500-word leave policy."}],
)
print(resp.choices[0].message.content)
```
To the `openai` SDK, it was just another OpenAI endpoint. To everyone else watching the wire, it was two laptops talking to each other through a peer-to-peer mesh.
This post is about what's underneath that snippet, and why someone built it.
## What is AgentFM
AgentFM is a peer-to-peer compute grid for AI agents.
You package your agent as a container — any container that reads stdin and writes stdout — and it joins a mesh of other people's containers. Anyone on the mesh can dispatch tasks to your container, your container can dispatch tasks to theirs, and the data never touches a centralized cloud.
Think of it as SETI@home, but for AI. Or BitTorrent, but the thing being shared is GPU time.
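The contract is deliberately tiny. Here's a hypothetical sketch of it in Python, assuming the worker pipes the task payload to the container's stdin and captures whatever lands on stdout (check the repo for the exact framing AgentFM expects):

```python
# Hypothetical minimal agent, illustrating the stdin/stdout contract.
# Assumption: the task payload arrives whole on stdin, and anything the
# process writes to stdout becomes the response.
import sys

def main() -> None:
    prompt = sys.stdin.read()
    # Run a model or tool here; a trivial transform stands in for inference.
    sys.stdout.write(f"You asked: {prompt.strip()}")

if __name__ == "__main__":
    main()
```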
Two pieces, that's the whole vocabulary:
- A Worker runs on whoever's donating compute. It wraps your container and broadcasts what hardware it has and how busy it is.
- A Boss is your laptop. It dispatches tasks. It can be an interactive radar terminal, or a headless HTTP gateway your apps talk to.
That's it. No central API, no rate limit, no billing dashboard.
## Why someone built this
There's a gaming PC in your bedroom, a workstation at your co-worker's house, and a GPU server at the office. All idle, most of the day.
There's also you, three tabs deep into the OpenAI billing dashboard wondering why an evening of LangChain experiments cost $40.
The gap between those two facts has bothered me for a while. The hardware exists. The models exist. What's missing is the connective tissue — something that lets the gaming PC and the openai-python script find each other across NATs, firewalls, and continents, without anyone having to set up a VPN, open a port, or learn a new SDK.
A dozen projects have tried "distributed inference" before. They mostly failed not on the runtime, but on the integration story. Each one shipped a custom client library, a custom auth scheme, a custom retry policy. By the time you'd wired your existing LangChain code into it, you'd written more glue than original code.
So the constraint became: whatever this is, it has to look like OpenAI from the outside.
Not "OpenAI-inspired." Not "OpenAI-flavored." Same routes. Same JSON. Same SSE framing. Same error envelopes.
So that LangChain, LlamaIndex, LiteLLM, Continue, Open WebUI, the raw openai Python and Node SDKs — every existing AI tool that already knows how to call OpenAI — would work without modification.
Spoiler from the opening: they do.
## A real example, end to end
Here's the whole thing in one terminal session.
On the machine donating compute (the "gaming PC"):
```bash
agentfm -mode worker \
  -agentdir "./my-agent" -image "my-agent:v1" \
  -model "llama3.2" -agent "Home Rig" \
  -maxtasks 10 -maxcpu 60 -maxgpu 70
```
That command does two things. It builds the container in `./my-agent` and starts broadcasting "I have llama3.2, I'm at 14% CPU load, I have 0/10 tasks running" to the mesh every two seconds. The `-maxcpu 60` is a hard circuit breaker — if your CPU climbs past 60% (you started a game, you opened Photoshop), the worker auto-rejects new tasks and flips to BUSY. A node serving the mesh can't be DoS'd into hurting its operator's actual workflow.
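In toy form, that admission rule is just two comparisons. This is an illustrative Python model of the behavior described above, not the worker's actual implementation:

```python
# Toy model of the worker's circuit breaker -- illustrative only.
# The limits mirror the flags above: -maxtasks 10, -maxcpu 60.
def should_accept(running: int, max_tasks: int,
                  cpu_pct: float, max_cpu_pct: float) -> bool:
    if running >= max_tasks:    # all task slots taken
        return False
    if cpu_pct >= max_cpu_pct:  # operator's machine is busy; back off
        return False
    return True

# should_accept(running=3, max_tasks=10, cpu_pct=72.5, max_cpu_pct=60.0)
# -> False: the worker advertises BUSY until CPU load drops back under 60%.
```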
On your laptop (the "client"):
```bash
agentfm -mode api -apiport 8080 &
```
That starts the local HTTP gateway. The OpenAI SDK call from the top of the post talks to this — a process running on 127.0.0.1. The gateway is the bridge: it speaks OpenAI's HTTP dialect on one side and AgentFM's libp2p protocols on the other.
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="anything")
resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Draft a 500-word leave policy."}],
)
```
What actually happens between those lines:
- The local gateway gets the OpenAI request.
- It looks up which workers are advertising `llama3.2`.
- It picks the least-loaded one (a toy sketch of that step follows this list).
- It opens a direct, encrypted peer-to-peer tunnel to that worker. (If both sides are behind NAT, libp2p coordinates a hole-punch through the relay — the kind of trick BitTorrent and IPFS pioneered.)
- It ships the prompt, drains the worker's container stdout straight into your HTTP response.
- Server-Sent Events frames go out exactly as OpenAI clients expect.
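Here's the selection step as a toy sketch. It's illustrative Python, not AgentFM's actual scheduler; the `Advert` fields mirror what the worker broadcasts above (model name, task slots, CPU load):

```python
# Toy "least-loaded worker" picker -- illustrative, not AgentFM's scheduler.
from dataclasses import dataclass

@dataclass
class Advert:
    peer_id: str
    model: str
    running: int     # tasks currently executing
    max_tasks: int   # the worker's -maxtasks limit
    cpu_pct: float   # last advertised CPU load

def pick_worker(adverts: list[Advert], model: str) -> Advert | None:
    candidates = [a for a in adverts
                  if a.model == model and a.running < a.max_tasks]
    # Fewest running tasks wins; ties broken by lower CPU load.
    return min(candidates, key=lambda a: (a.running, a.cpu_pct), default=None)
```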
LangChain doesn't see any of this. LlamaIndex doesn't see any of this. Open WebUI just sees another OpenAI-compatible model dropdown to populate.
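Streaming is the same story: pass `stream=True` to the stock SDK and the gateway's SSE frames arrive as ordinary chunks. (`client` here is the gateway-pointed client constructed above; nothing in this snippet is AgentFM-specific.)

```python
# Standard openai SDK streaming against the local gateway.
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Draft a 500-word leave policy."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```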
Files the agent drops into `/tmp/output` get zipped and shipped back automatically too. The agent writes a PDF, a generated image, a CSV — and it lands on the client's disk in `./agentfm_artifacts/`. No SDK call, no callback, no decorator. Just write the file.
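The agent side of that flow is literally just filesystem writes. A hypothetical example (the file name is made up; the paths are the ones from the paragraph above):

```python
# Inside the agent container: drop anything you want returned into /tmp/output.
from pathlib import Path

out = Path("/tmp/output")
out.mkdir(parents=True, exist_ok=True)
(out / "report.csv").write_text("month,headcount\n2024-01,42\n")

# After the task completes, the client finds it locally at:
#   ./agentfm_artifacts/report.csv
```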
## The composability surprise
The thing I didn't expect when starting this: once the OpenAI dialect was working, every existing AI tool just worked.
LangChain agents that call `model="gpt-4"` against the standard endpoint? Change two strings and they call into your friend's GPU instead. n8n workflow nodes that POST to OpenAI's API? Point them at `http://127.0.0.1:8080/v1` and they're now dispatching across a peer-to-peer grid. A Continue extension in VS Code? It just sees another backend.
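For instance, the LangChain version of the two-string change looks something like this (assuming the `langchain-openai` package; the model name matches the worker started earlier):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.2",
    base_url="http://127.0.0.1:8080/v1",  # the local AgentFM gateway
    api_key="anything",                   # the gateway doesn't check it
)
print(llm.invoke("Summarize the leave policy in one sentence.").content)
```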
It cuts the other direction too. A "planner" agent running inside someone's worker container can itself POST to the same OpenAI-compatible endpoint to fan out sub-tasks across other workers in the mesh. Agent of agents, peer to peer, no central coordinator at any layer.
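A sketch of that fan-out, assuming the planner's container can reach an OpenAI-compatible gateway (the endpoint and model names here are illustrative):

```python
from openai import OpenAI

# A planner inside a worker container, dispatching sub-tasks back through
# the same OpenAI-compatible dialect it was invoked with.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="anything")

subtasks = ["outline the report", "draft the intro", "draft the conclusion"]
results = [
    client.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content
    for task in subtasks
]
print("\n\n".join(results))
```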
This isn't because AgentFM is special. It's because OpenAI's wire format won the protocol war. Adopting it means the entire ecosystem becomes a free distribution channel for whatever runs underneath.
The decentralized compute layer was the part that needed building. The integration layer was already there, waiting.
## What's underneath, briefly
- The mesh runs on libp2p — same networking stack that powers IPFS, Ethereum, Filecoin. NAT punching, peer discovery, end-to-end encryption all come for free.
- Workers are identified by Ed25519 public keys, not IPs. A worker's home router can flip its IP at 3am — the peer ID stays stable and the mesh rediscovers it.
- Containers run in Podman sandboxes tied to the libp2p stream's lifecycle. If the boss disappears mid-task, the container is killed within 2 seconds. No wasted compute.
- Want privacy? `agentfm -mode genkeys` spits out a 256-bit pre-shared key. Nodes that share it form a closed darknet, invisible to the public mesh. Useful for "my office laptop and my home GPU box, nothing else."
- Everything is observable: Prometheus metrics on `/metrics`, structured `slog` JSON logs, six metric families per role. Drop it into your existing Grafana stack and you have visibility across the whole grid.
## Where to look
The whole thing is open source: github.com/Agent-FM/agentfm-core
The Hello World in the README boots a worker running Llama 3.2 and dispatches a task to it in about five minutes if you already have Podman and Ollama installed.
The most fun part to read, if you're curious about how the OpenAI bridge works, is `agentfm-go/internal/boss/openai.go` — that's where the wire-format translation happens. About 300 lines.
## Why it matters
We've spent the last two years renting GPUs from three companies. Meanwhile we collectively own more compute than those companies have, sitting idle, waiting for nothing.
The interesting question isn't "can we build distributed AI infrastructure?" — we obviously can. The interesting question is "can we make it integrate so cleanly that nobody has to choose between the convenient OpenAI API and running on hardware they actually trust?"
Two strings — `base_url` and `api_key` — turn out to be enough.
The gaming PC in the bedroom is waiting.
Repo: github.com/Agent-FM/agentfm-core