<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SaifRehman</title>
    <description>The latest articles on DEV Community by SaifRehman (@saifrehman).</description>
    <link>https://dev.to/saifrehman</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1393359%2F4ae8683c-19ab-4b63-94e6-a332b543c6bd.jpeg</url>
      <title>DEV Community: SaifRehman</title>
      <link>https://dev.to/saifrehman</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/saifrehman"/>
    <language>en</language>
    <item>
      <title>I pointed the OpenAI SDK at my friend's gaming PC. It just worked.</title>
      <dc:creator>SaifRehman</dc:creator>
      <pubDate>Mon, 27 Apr 2026 20:46:46 +0000</pubDate>
      <link>https://dev.to/saifrehman/i-pointed-the-openai-sdk-at-my-friends-gaming-pc-it-just-worked-55kp</link>
      <guid>https://dev.to/saifrehman/i-pointed-the-openai-sdk-at-my-friends-gaming-pc-it-just-worked-55kp</guid>
      <description>&lt;p&gt;I changed two strings in a Python script — &lt;code&gt;base_url&lt;/code&gt; and &lt;code&gt;api_key&lt;/code&gt; — and it stopped calling OpenAI.&lt;/p&gt;

&lt;p&gt;Instead, the request travelled across the public internet, into a Podman container running on a friend's idle gaming PC, ran inference on a local Llama 3.2, and streamed the response back to my laptop.&lt;/p&gt;

&lt;p&gt;No cloud account. No API key. No data egress to a third party.&lt;/p&gt;

&lt;p&gt;The Python looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8080/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft a 500-word leave policy.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To the &lt;code&gt;openai&lt;/code&gt; SDK, it was just another OpenAI endpoint. To everyone else watching the wire, it was two laptops talking to each other through a peer-to-peer mesh.&lt;/p&gt;

&lt;p&gt;This post is about what's underneath that snippet, and why I built it.&lt;/p&gt;




&lt;h2&gt;What is AgentFM&lt;/h2&gt;

&lt;p&gt;AgentFM is a peer-to-peer compute grid for AI agents.&lt;/p&gt;

&lt;p&gt;You package your agent as a container — any container that reads stdin and writes stdout — and it joins a mesh of other people's containers. Anyone on the mesh can dispatch tasks to your container, your container can dispatch tasks to theirs, and the data never touches a centralized cloud.&lt;/p&gt;
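
&lt;p&gt;That contract is small enough to show whole. A minimal sketch of an agent, where the uppercasing stands in for whatever inference you'd actually run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# minimal agent: the entire contract is stdin in, stdout out
import sys

task = sys.stdin.read()      # the dispatched task payload
result = task.upper()        # stand-in for real model inference
sys.stdout.write(result)     # whatever hits stdout becomes the response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;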

&lt;p&gt;Think of it as &lt;strong&gt;SETI@home, but for AI&lt;/strong&gt;. Or &lt;strong&gt;BitTorrent, but the thing being shared is GPU time&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Two pieces make up the whole vocabulary:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;Worker&lt;/strong&gt; runs on whoever's donating compute. It wraps your container and broadcasts what hardware it has and how busy it is.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Boss&lt;/strong&gt; is your laptop. It dispatches tasks. It can be an interactive radar terminal, or a headless HTTP gateway your apps talk to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's it. No central API, no rate limit, no billing dashboard.&lt;/p&gt;




&lt;h2&gt;Why I built this&lt;/h2&gt;

&lt;p&gt;There's a gaming PC in your bedroom, a workstation at your co-worker's house, and a GPU server at the office. All idle, most of the day.&lt;/p&gt;

&lt;p&gt;There's also you, three tabs deep into the OpenAI billing dashboard wondering why an evening of LangChain experiments cost $40.&lt;/p&gt;

&lt;p&gt;The gap between those two facts has bothered me for a while. The hardware exists. The models exist. What's missing is the &lt;strong&gt;connective tissue&lt;/strong&gt; — something that lets the gaming PC and the openai-python script find each other across NATs, firewalls, and continents, without anyone having to set up a VPN, open a port, or learn a new SDK.&lt;/p&gt;

&lt;p&gt;A dozen projects have tried "distributed inference" before. They mostly failed not on the runtime, but on the integration story. Each one shipped a custom client library, a custom auth scheme, a custom retry policy. By the time you'd wired your existing LangChain code into it, you'd written more glue than original code.&lt;/p&gt;

&lt;p&gt;So the constraint became: &lt;strong&gt;whatever this is, it has to look like OpenAI from the outside.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "OpenAI-inspired." Not "OpenAI-flavored." Same routes. Same JSON. Same SSE framing. Same error envelopes.&lt;/p&gt;

&lt;p&gt;That way LangChain, LlamaIndex, LiteLLM, Continue, Open WebUI, the raw &lt;code&gt;openai&lt;/code&gt; Python and Node SDKs — every existing AI tool that already knows how to call OpenAI — would all work without modification.&lt;/p&gt;

&lt;p&gt;Spoiler from the opening: they do.&lt;/p&gt;
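
&lt;p&gt;And because the dialect is identical down to the routes, you don't even need an SDK. A hedged sketch of the same request as a raw POST (the gateway address is the one used throughout this post; the rest is the standard OpenAI wire shape):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# raw POST against the local gateway: same route, same JSON as OpenAI
# (whether the gateway inspects the Authorization header is an assumption)
import json, urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps({
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "ping"}],
    }).encode(),
    headers={"Content-Type": "application/json",
             "Authorization": "Bearer anything"},
)
with urllib.request.urlopen(req) as r:
    print(json.load(r)["choices"][0]["message"]["content"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;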




&lt;h2&gt;A real example, end to end&lt;/h2&gt;

&lt;p&gt;Here's the whole thing in one terminal session.&lt;/p&gt;

&lt;p&gt;On the machine donating compute (the "gaming PC"):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentfm &lt;span class="nt"&gt;-mode&lt;/span&gt; worker &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-agentdir&lt;/span&gt; &lt;span class="s2"&gt;"./my-agent"&lt;/span&gt; &lt;span class="nt"&gt;-image&lt;/span&gt; &lt;span class="s2"&gt;"my-agent:v1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-model&lt;/span&gt; &lt;span class="s2"&gt;"llama3.2"&lt;/span&gt; &lt;span class="nt"&gt;-agent&lt;/span&gt; &lt;span class="s2"&gt;"Home Rig"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-maxtasks&lt;/span&gt; 10 &lt;span class="nt"&gt;-maxcpu&lt;/span&gt; 60 &lt;span class="nt"&gt;-maxgpu&lt;/span&gt; 70
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That command does two things. It builds the container in &lt;code&gt;./my-agent&lt;/code&gt; and starts broadcasting "I have llama3.2, I'm at 14% CPU load, I have 0/10 tasks running" to the mesh every two seconds. The &lt;code&gt;-maxcpu 60&lt;/code&gt; is a hard circuit breaker — if your CPU climbs past 60% (you started a game, you opened Photoshop), the worker auto-rejects new tasks and flips to BUSY. A node serving the mesh can't be DoS'd into hurting its operator's actual workflow.&lt;/p&gt;
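
&lt;p&gt;The worker itself is Go, but the admission logic is simple enough to sketch in a few lines of Python. Everything here is illustrative, not AgentFM's actual code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# hedged sketch of the circuit breaker; names and thresholds are
# illustrative, and the real implementation lives in the Go worker
def should_accept(cpu_percent, running, max_cpu=60, max_tasks=10):
    # refuse new work once the operator's machine is under load
    if cpu_percent &amp;gt; max_cpu or running &amp;gt;= max_tasks:
        return False  # the worker advertises BUSY instead
    return True
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;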

&lt;p&gt;On your laptop (the "client"):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;agentfm &lt;span class="nt"&gt;-mode&lt;/span&gt; api &lt;span class="nt"&gt;-apiport&lt;/span&gt; 8080 &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That starts the local HTTP gateway. The OpenAI SDK call from the top of the post talks to &lt;em&gt;this&lt;/em&gt; — a process running on &lt;code&gt;127.0.0.1&lt;/code&gt;. The gateway is the bridge: it speaks OpenAI's HTTP dialect on one side and AgentFM's libp2p protocols on the other.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8080/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anything&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;llama3.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Draft a 500-word leave policy.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What actually happens between those lines:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The local gateway gets the OpenAI request.&lt;/li&gt;
&lt;li&gt;It looks up which workers are advertising &lt;code&gt;llama3.2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It picks the least-loaded one.&lt;/li&gt;
&lt;li&gt;It opens a direct, encrypted peer-to-peer tunnel to that worker. (If both sides are behind NAT, libp2p coordinates a hole-punch through the relay — the kind of trick BitTorrent and IPFS pioneered.)&lt;/li&gt;
&lt;li&gt;It ships the prompt and drains the worker's container stdout straight into your HTTP response.&lt;/li&gt;
&lt;li&gt;Server-Sent Events frames go out exactly as OpenAI clients expect (a streaming sketch follows this list).&lt;/li&gt;
&lt;/ol&gt;
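
&lt;p&gt;Step 6 means streaming needs nothing special on the client. A minimal sketch using the same &lt;code&gt;client&lt;/code&gt; from above; this is plain OpenAI SDK streaming, nothing AgentFM-specific:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# stream=True makes the gateway relay SSE frames as they arrive
stream = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Draft a 500-word leave policy."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # the final chunk carries no content
        print(delta, end="", flush=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;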

&lt;p&gt;LangChain doesn't see any of this. LlamaIndex doesn't see any of this. Open WebUI just sees another OpenAI-compatible model dropdown to populate.&lt;/p&gt;

&lt;p&gt;Files the agent drops into &lt;code&gt;/tmp/output&lt;/code&gt; get zipped and shipped back automatically too. The agent writes a PDF, a generated image, a CSV — and it lands on the client's disk in &lt;code&gt;./agentfm_artifacts/&lt;/code&gt;. No SDK call, no callback, no decorator. Just write the file.&lt;/p&gt;
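
&lt;p&gt;From inside the container that looks like nothing at all. A sketch, assuming the &lt;code&gt;/tmp/output&lt;/code&gt; convention described above (the CSV contents are made up):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# inside the agent container: write a file to /tmp/output and it lands
# in the client's ./agentfm_artifacts/ automatically
import csv

with open("/tmp/output/leave_summary.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["employee", "days_remaining"])  # illustrative data
    writer.writerow(["alice", 12])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;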




&lt;h2&gt;The composability surprise&lt;/h2&gt;

&lt;p&gt;The thing I didn't expect when starting this: once the OpenAI dialect was working, &lt;em&gt;every existing AI tool just worked&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;LangChain agents that call &lt;code&gt;model="gpt-4"&lt;/code&gt; against the standard endpoint? Change two strings, they call into your friend's GPU instead. n8n workflow nodes that POST to OpenAI's API? Point them at &lt;code&gt;127.0.0.1:8080/v1&lt;/code&gt; and they're now dispatching across a peer-to-peer grid. A Continue extension in VS Code? It just sees another backend.&lt;/p&gt;
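
&lt;p&gt;For the record, here's the LangChain version of the two-string change. This assumes the standard &lt;code&gt;langchain-openai&lt;/code&gt; package; nothing AgentFM-specific is involved:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# point LangChain's OpenAI chat model at the local AgentFM gateway
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="llama3.2",
    base_url="http://127.0.0.1:8080/v1",
    api_key="anything",
)
print(llm.invoke("Summarize our leave policy in one line.").content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;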

&lt;p&gt;It cuts the other direction too. A "planner" agent running inside someone's worker container can itself POST to the same OpenAI-compatible endpoint to fan out sub-tasks across other workers in the mesh. &lt;strong&gt;Agent of agents, peer to peer, no central coordinator at any layer.&lt;/strong&gt;&lt;/p&gt;
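
&lt;p&gt;A hedged sketch of that fan-out as seen from inside a planner agent. The sub-task strings are invented, and the assumption that the planner can reach a gateway at this address is mine:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# a planner agent fanning sub-tasks out across the mesh through
# the same OpenAI-compatible endpoint
from openai import OpenAI

mesh = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="anything")

subtasks = ["outline the report", "draft section 1", "draft section 2"]
results = [
    mesh.chat.completions.create(
        model="llama3.2",
        messages=[{"role": "user", "content": t}],
    ).choices[0].message.content
    for t in subtasks
]
print("\n---\n".join(results))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;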

&lt;p&gt;This isn't because AgentFM is special. It's because OpenAI's wire format won the protocol war. Adopting it means the entire ecosystem becomes a free distribution channel for whatever runs underneath.&lt;/p&gt;

&lt;p&gt;The decentralized compute layer was the part that needed building. The integration layer was already there, waiting.&lt;/p&gt;




&lt;h2&gt;What's underneath, briefly&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The mesh runs on &lt;strong&gt;&lt;a href="https://libp2p.io" rel="noopener noreferrer"&gt;libp2p&lt;/a&gt;&lt;/strong&gt; — the same networking stack that powers IPFS, Ethereum, and Filecoin. NAT punching, peer discovery, and end-to-end encryption all come for free.&lt;/li&gt;
&lt;li&gt;Workers are &lt;strong&gt;identified by Ed25519 public keys&lt;/strong&gt;, not IPs. A worker's home router can flip its IP at 3am — the peer ID stays stable and the mesh rediscovers it.&lt;/li&gt;
&lt;li&gt;Containers run in &lt;strong&gt;Podman&lt;/strong&gt; sandboxes tied to the libp2p stream's lifecycle. If the boss disappears mid-task, the container is killed within 2 seconds. No wasted compute.&lt;/li&gt;
&lt;li&gt;Want privacy? &lt;strong&gt;&lt;code&gt;agentfm -mode genkey&lt;/code&gt;&lt;/strong&gt; spits out a 256-bit pre-shared key. Nodes that share it form a closed darknet, invisible to the public mesh. Useful for "my office laptop and my home GPU box, nothing else."&lt;/li&gt;
&lt;li&gt;Everything is observable. &lt;strong&gt;Prometheus metrics&lt;/strong&gt; on &lt;code&gt;/metrics&lt;/code&gt;, structured &lt;code&gt;slog&lt;/code&gt; JSON logs, six metric families per role. Drop it into your existing Grafana stack and you have visibility across the whole grid (a quick scrape sketch follows this list).&lt;/li&gt;
&lt;/ul&gt;
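
&lt;p&gt;For that last point, pulling the metrics by hand is one request. The &lt;code&gt;/metrics&lt;/code&gt; path is from the list above; the port is an assumption on my part:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# dump the gateway's Prometheus metrics; exact metric names will vary
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8080/metrics") as r:
    for line in r.read().decode().splitlines():
        if line and not line.startswith("#"):  # skip HELP/TYPE lines
            print(line)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;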




&lt;h2&gt;Where to look&lt;/h2&gt;

&lt;p&gt;The whole thing is open source: &lt;strong&gt;&lt;a href="https://github.com/Agent-FM/agentfm-core" rel="noopener noreferrer"&gt;github.com/Agent-FM/agentfm-core&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Hello World in the README boots a worker running Llama 3.2 and dispatches a task to it in about five minutes if you already have Podman and Ollama installed.&lt;/p&gt;

&lt;p&gt;The most fun part to read, if you're curious about how the OpenAI bridge works, is &lt;code&gt;agentfm-go/internal/boss/openai.go&lt;/code&gt; — that's where the wire-format translation happens. About 300 lines.&lt;/p&gt;




&lt;h2&gt;Why it matters&lt;/h2&gt;

&lt;p&gt;We've spent the last two years renting GPUs from three companies. Meanwhile we collectively own more compute than those companies have, sitting idle, waiting for nothing.&lt;/p&gt;

&lt;p&gt;The interesting question isn't "can we build distributed AI infrastructure?" — we obviously can. The interesting question is "can we make it integrate so cleanly that nobody has to choose between &lt;em&gt;the convenient OpenAI API&lt;/em&gt; and &lt;em&gt;running on hardware they actually trust&lt;/em&gt;?"&lt;/p&gt;

&lt;p&gt;Two strings — &lt;code&gt;base_url&lt;/code&gt; and &lt;code&gt;api_key&lt;/code&gt; — turn out to be enough.&lt;/p&gt;

&lt;p&gt;The gaming PC in the bedroom is waiting.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/Agent-FM/agentfm-core" rel="noopener noreferrer"&gt;github.com/Agent-FM/agentfm-core&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>p2p</category>
      <category>llm</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
