DEV Community

Aman Sachan

I built a three-tier monorepo for a peer-to-peer GPU grid — here is why the API is the loneliest part

There are a lot of "decentralized compute" projects. Most of them have a slick landing page, a token whitepaper, and a backend that quietly turns out to be three cron jobs on a single VM. I wanted to build one where the boring infrastructure is also open source, deployable today, and small enough for one person to actually understand end-to-end. That's Sarva: a 200-line FastAPI hub, a Next.js 14 dashboard, and a two-dependency Python agent that you can run on a laptop with pip install.

Here's what each tier actually does, and the design calls that made it possible to keep all three in one repo without the monorepo rotting.

The shape of the system

┌──────────────┐       poll        ┌────────────────────────────┐
│  SarvaNode   │ ◄──────────────►  │  Sarva API (FastAPI+Neon)  │
│  (Python)    │   heartbeat       │  /jobs/next, /jobs/complete│
│              │   job claim       │  /credits/*, /nodes/*      │
└──────────────┘                   └────────────────────────────┘
        │                                       ▲
   runs job locally                              │ submit / monitor
        ▼                                       │
┌──────────────┐                   ┌────────────────────────────┐
│  ML output   │                   │  Sarva Web (Next.js 14)    │
│  / log / png │                   │  Vercel · credits UI ·     │
└──────────────┘                   │  node map · leaderboard    │
                                   └────────────────────────────┘

The split is intentional: the API knows nothing about ML, the node runs anything that can be expressed as a script, and the web is a pure read/write client. That last rule is what kept me from painting myself into a corner — there's no business logic in the Next.js app at all.

Backend: FastAPI + SQLAlchemy + Neon

The hub is one file (main.py, ~340 lines including comments) and five SQLAlchemy models. I use Neon for Postgres because I didn't want a docker-compose.yml to be a deployment prerequisite; DATABASE_URL is the only env var the API cares about.

Pricing logic lives in two dicts and two constants at module top:

GPU_MULT = {
    "rtx-4090": 3.0, "rtx-5090": 3.0, "rtx-3090": 2.5,
    "rtx-4070": 2.5, "rtx-3060": 2.0, "rtx-2070": 2.0,
    "gtx-1080ti": 1.5, "gtx-1080": 1.5, "gtx-1660": 1.3, "cpu": 0.8
}
GEO_RATE = {"in": 0.7, "india": 0.7, "us": 1.0, "uk": 1.0, "eu": 0.95}
PLATFORM_FEE = 0.20
CASHOUT_MIN = 500.0

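Assuming the two multipliers are multiplied together, a 4090 in Mumbai gets an effective rate of 3.0 × 0.7 = 2.1 and a 1080ti in London gets 1.5 × 1.0 = 1.5, so the Mumbai node earns 1.4× as much on the same job. The geo rate isn't a tax on geography; it's a recognition that an Indian contributor can buy 3× more compute hours per rupee than a US one, and the platform's job is to surface that arbitrage honestly.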
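A rough sketch of how these tables might combine, assuming the payout is the base price times the GPU multiplier times the geo rate, with the platform fee taken off the top. The helper names, the multiplicative formula, and the fallback defaults are my assumptions for illustration, not code from main.py:

```python
GPU_MULT = {
    "rtx-4090": 3.0, "rtx-5090": 3.0, "rtx-3090": 2.5,
    "rtx-4070": 2.5, "rtx-3060": 2.0, "rtx-2070": 2.0,
    "gtx-1080ti": 1.5, "gtx-1080": 1.5, "gtx-1660": 1.3, "cpu": 0.8,
}
GEO_RATE = {"in": 0.7, "india": 0.7, "us": 1.0, "uk": 1.0, "eu": 0.95}
PLATFORM_FEE = 0.20


def effective_multiplier(gpu: str, region: str) -> float:
    """Hypothetical helper: GPU tier multiplier times regional rate."""
    # Assumed defaults: unknown GPUs use the cpu tier, unknown regions use 1.0.
    return GPU_MULT.get(gpu, GPU_MULT["cpu"]) * GEO_RATE.get(region.lower(), 1.0)


def node_payout(base_credits: float, gpu: str, region: str) -> float:
    """Hypothetical helper: credits the node keeps after the platform fee."""
    return base_credits * effective_multiplier(gpu, region) * (1 - PLATFORM_FEE)


mumbai = effective_multiplier("rtx-4090", "in")    # 3.0 * 0.7
london = effective_multiplier("gtx-1080ti", "uk")  # 1.5 * 1.0
print(round(mumbai, 2), round(london, 2), round(mumbai / london, 2))
```

Under that assumption the Mumbai 4090 comes out at 2.1, the London 1080ti at 1.5, and the ratio between them at 1.4.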

Job dispatch is naive by design:

@app.get("/jobs/next")
def next_job(node_id: str, db: Session = Depends(get_db)):
    node = db.query(Node).filter(Node.id == node_id).first()
    if not node:
        raise HTTPException(status_code=404, detail="Node not found")
    # Highest priority first, oldest first within the same priority.
    job = (db.query(Job)
             .filter(Job.status == JobStatus.PENDING)
             .order_by(Job.priority.desc(), Job.created_at.asc())
             .first())
    if job is None:
        # Empty queue: without this guard the next line raises AttributeError.
        # The {"job": None} response shape is an assumption.
        return {"job": None}
    job.status = JobStatus.ASSIGNED
    job.assigned_node_id = node_id
    node.status = NodeStatus.BUSY
    db.commit()
    return {"job": {"id": job.id, "type": job.type, "script": job.script, "slices": job.slices}}

First-come-first-served by priority, no scheduling cleverness, no GPU-affinity matching yet. The job of v0.1 is to prove that a node can claim work, run it, and get credited for it — adding a smarter scheduler on top of broken plumbing just hides the real bugs.
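To make the ordering rule concrete without a database, here is a minimal in-memory sketch of "highest priority first, oldest first within a priority". The Job dataclass and claim_next are stand-ins I made up for illustration; the real hub does this with a SQLAlchemy query:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    id: int
    priority: int
    created_at: float                      # e.g. a Unix timestamp
    status: str = "pending"
    assigned_node_id: Optional[str] = None


def claim_next(jobs: list[Job], node_id: str) -> Optional[Job]:
    """Assign the highest-priority, oldest pending job to node_id."""
    pending = [j for j in jobs if j.status == "pending"]
    if not pending:
        return None  # empty queue: the node simply polls again later
    # Negating priority makes min() prefer high priority, then early created_at.
    job = min(pending, key=lambda j: (-j.priority, j.created_at))
    job.status = "assigned"
    job.assigned_node_id = node_id
    return job


queue = [
    Job(1, priority=0, created_at=10.0),
    Job(2, priority=5, created_at=30.0),
    Job(3, priority=5, created_at=20.0),
]
order = [claim_next(queue, "node-a").id for _ in range(3)]
print(order)  # [3, 2, 1]
```

Job 3 wins over job 2 because it is older at the same priority, and job 1 goes last despite being the oldest overall.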

audit(db, "job_assigned", {...}) writes to an AuditLog table on every state transition. Three months from now, when someone asks "why was I underpaid last Tuesday", I want the answer to be a SQL query, not a guess.
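A minimal sketch of that idea, using the standard-library sqlite3 module as a stand-in for Postgres and SQLAlchemy. The table layout, the payload fields, and the LIKE-based lookup are assumptions for illustration; only the audit(db, event, payload) call shape comes from the hub:

```python
import json
import sqlite3
from datetime import datetime, timezone


def audit(db: sqlite3.Connection, event: str, payload: dict) -> None:
    """Append one row per state transition; rows are never updated."""
    db.execute(
        "INSERT INTO audit_log (ts, event, payload) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), event, json.dumps(payload)),
    )
    db.commit()


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY, ts TEXT, event TEXT, payload TEXT)"
)

audit(db, "job_assigned", {"job_id": 7, "node_id": "node-a"})
audit(db, "job_completed", {"job_id": 7, "node_id": "node-a", "credits": 168.0})

# "Why was I paid this much for job 7?" becomes a query over the trail.
# A text match is crude (it would also hit job 70); a real table would
# more likely carry job_id as its own indexed column.
rows = db.execute(
    "SELECT event, payload FROM audit_log WHERE payload LIKE ? ORDER BY id",
    ('%"job_id": 7%',),
).fetchall()
events = [event for event, _ in rows]
print(events)
```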

Node agent: two dependencies, cross-platform

The node is the part that has to work on a contributor's actual machine: Windows, macOS, Linux, WSL, a 2015 ThinkPad, a brand-new 4090 rig. The dependency list is sized for the worst case in that compatibility matrix:

requests>=2.31.0
psutil>=5.9.8

That's it. GPU detection falls back from GPUtil (cleanest, and only imported if it happens to be installed) to nvidia-smi (available on any Linux box with the NVIDIA driver) to a CPU-only result of None, cpu (everything else). The agent sniffs its own hardware at startup, claims the right tier, and goes.

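import platform
import subprocess


def get_gpu_info():
    # 1. GPUtil, if it happens to be installed.
    try:
        import GPUtil
        gpus = GPUtil.getGPUs()
        if gpus:
            g = gpus[0]
            name, vram = g.name, round(g.memoryTotal / 1024, 1)  # MB -> GB
            if "4090" in name: return "rtx-4090", vram
            # ...
            return "gpu", vram
    except ImportError:
        pass
    # 2. nvidia-smi on Linux.
    if platform.system() == "Linux":
        try:
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=5,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            result = None  # no NVIDIA driver installed, or the tool hung
        # ... same name-to-tier mapping
    # 3. Anything else falls through to the cpu fallback (not shown).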
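For the nvidia-smi branch, one way to turn a CSV row into a tier is a small lookup table. This is a hypothetical sketch, not the agent's actual mapping: TIER_MARKERS, tier_for, and parse_nvidia_smi_line are names I invented, the tiers simply mirror the GPU_MULT keys, and the sample row format ("name, NNNN MiB") is what I'd expect from that query rather than something taken from the repo:

```python
# Hypothetical substring -> tier table; the tiers mirror the GPU_MULT keys.
# "1080 ti" is listed before "1080" so the Ti card is not matched as a plain 1080.
TIER_MARKERS = [
    ("4090", "rtx-4090"), ("5090", "rtx-5090"), ("3090", "rtx-3090"),
    ("4070", "rtx-4070"), ("3060", "rtx-3060"), ("2070", "rtx-2070"),
    ("1080 ti", "gtx-1080ti"), ("1080", "gtx-1080"), ("1660", "gtx-1660"),
]


def tier_for(name: str) -> str:
    """Map a GPU product name to a pricing tier by substring match."""
    lowered = name.lower()
    for marker, tier in TIER_MARKERS:
        if marker in lowered:
            return tier
    return "gpu"  # same generic fallback as the GPUtil branch


def parse_nvidia_smi_line(line: str) -> tuple[str, float]:
    """Parse one 'name, memory.total' row into (tier, VRAM in GB)."""
    # Split on the last comma so a comma inside the name cannot break parsing.
    name, memory = (part.strip() for part in line.rsplit(",", 1))
    mib = float(memory.split()[0])  # drop the trailing 'MiB' unit
    return tier_for(name), round(mib / 1024, 1)


print(parse_nvidia_smi_line("NVIDIA GeForce RTX 4090, 24564 MiB"))
```

Substring matching is deliberately loose: a "4070 Ti" lands in the rtx-4070 tier here, which may or may not be what the pricing table intends.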

The polling loop is time.sleep(POLL_INTERVAL) between requests.get("/jobs/next", ...). Not async, not threaded, not clever. If the node has 32 cores, only one is in use — that's a known limitation, and the v0.2 plan is to make job execution a subprocess so the polling thread isn't blocked by compute. For now, simplicity wins.
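The shape of that loop, as a minimal sketch. To keep it runnable without a hub, fetch_job stands in for the requests.get call to /jobs/next and run_job for whatever executes the script; both names, the max_polls knob, and the 5-second interval are assumptions:

```python
import time
from typing import Callable, Optional

POLL_INTERVAL = 5.0  # seconds; the real value is not given in the post


def poll_loop(
    fetch_job: Callable[[], Optional[dict]],
    run_job: Callable[[dict], None],
    sleep: Callable[[float], None] = time.sleep,
    max_polls: Optional[int] = None,
) -> int:
    """Poll, run at most one job per poll, sleep, repeat. Returns jobs completed."""
    completed = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        job = fetch_job()
        if job is not None:
            run_job(job)  # blocks the loop until the job finishes
            completed += 1
        sleep(POLL_INTERVAL)
    return completed


# Dry run with fake callables, so no hub or network is needed.
responses = iter([None, {"id": 1, "script": "print('hi')"}, None])
ran, naps = [], []
done = poll_loop(lambda: next(responses), ran.append, sleep=naps.append, max_polls=3)
print(done, [j["id"] for j in ran], naps)
```

Because run_job is called inline, nothing else happens while a job runs, which is exactly the limitation described above.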

Frontend: Next.js 14, no business logic

The web tier is the strictest in the repo: it only calls the API. No credit calculations, no node scoring, no priority math. The reason is that every bug I've ever shipped in a distributed system came from a place where business logic was duplicated across two tiers. If you want to compute "what will I earn if I submit this 4090 job", the API tells you. The web just renders.

Stack:

  • Next.js 14 (App Router)
  • TypeScript
  • Tailwind CSS
  • React 18

Vercel auto-deploys from main. The dev server doesn't share state with the API — they're separate processes that happen to share a DATABASE_URL and a HUB_URL.

Why a monorepo and not three repos

I almost split this into sarva-api, sarva-web, and sarva-node. The reason I didn't: breaking changes need to land in all three at once. A new /jobs/submit parameter is a backend, frontend, and node change. If those live in different repos with different release cadences, you end up with a node agent from January hitting an API from March and a web from December, and nobody can reproduce the bug.

The monorepo rules that out at the source level: every commit contains a matching API, web app, and node agent, and one git log shows the entire history of a feature.

sarva/
├── backend/    # FastAPI + SQLAlchemy + Neon
├── frontend/   # Next.js 14 + Tailwind
└── node/       # Python agent (psutil + requests)

What's not done

I won't pretend this is production-grade. Things I know are missing:

  • Gaming job type — the schema supports it, but no node runs it yet.
  • gVisor sandboxing — nodes run submitted scripts as the user. Don't run this on a machine you care about yet.
  • Reputation decay — a node that's been online 30 days earns the same as a node that just registered. That should change.
  • Multi-tenant auth — currently user_id == "god" is admin. Don't ship that past 10 users.

Try it

If you run the node on a 4090 and see credits flow in, open an issue — I'd like to know what GPU detection corner case you hit.


Three tiers, one repo, no token. Built to be forked, not funded.

#Python #FastAPI #NextJS #Distributed #OpenSource #GPU
