Daniil Koryto for GonkaGate

Cut LLM API costs with a decentralized inference network

LLM bills can climb fast, especially when you're shipping an MVP, an internal tool, or a workflow-heavy app. We hit that wall and started building GonkaGate to make open-source inference usable without a full rewrite.

This is a practical walkthrough. We'll show how it works, how to migrate with minimal code changes, and where the trade-offs are real.

Disclosure: We are the team behind GonkaGate.

What you'll learn

  • Why centralized LLM APIs are expensive
  • How decentralized inference changes the cost structure
  • How to switch from the OpenAI SDK with a simple base URL change
  • When this approach fits and when it doesn't

Why LLM APIs are expensive

Centralized providers have a cost stack that adds up:

  • Infrastructure (top-tier GPUs cost tens of thousands of dollars per card)
  • Frontier-model R&D
  • Enterprise sales and compliance
  • Margin

Pricing changes frequently, so check each provider's official pricing page for current rates.

The alternative is self-hosting open-source models, and that has its own pain:

  • Renting and maintaining GPU clusters
  • CUDA, drivers, and dependency setup
  • Scaling, load balancing, failover
  • Ongoing infra maintenance

For small teams, both options can hurt in different ways.

A third path: decentralized inference

Gonka is a decentralized network for open-source model inference. The public network tracker showed roughly 5.4k H100-equivalent GPUs as of February 3, 2026. In December 2025, Bitfury announced a $50M investment in Gonka as part of its $1B program for decentralized AI projects.

How the price drops

  1. Compute is nearly fully utilized. Traditional PoW chains burn a lot of compute on consensus. Gonka uses a mechanism called Sprint (Transformer-based Proof-of-Work), described in the whitepaper. The work is closer to LLM inference than to hashing.
  2. Distributed GPU hosts. Hardware owners (individual and enterprise) contribute idle compute and earn rewards. The network aggregates existing capacity instead of building centralized data centers. A wide range of GPUs is supported (H100/H200, A100) with a 48 GB VRAM minimum.
  3. Dynamic on-chain pricing. Prices adjust with network load, so per-token cost reflects real-time supply and demand.
  4. Open-source model ecosystem. The model list changes as operators join the network. New models appear as they are added.

What GonkaGate does

GonkaGate is an API gateway to the Gonka network. You pay in USD, and the integration is compatible with the OpenAI SDK. In most cases, you just swap the base URL and API key.

Available models

These are open-source models, not frontier proprietary ones:

Model                                    Context  Best for
qwen/qwen3-235b-a22b-instruct-2507-fp8   262K     Complex reasoning, code

Honest take: qwen3-235b is strong at code, summarization, and reasoning. It still lags frontier models on nuanced creative writing and the hardest multi-step tasks. Test on your real use cases.

Why it's interesting: open-source means transparency and no vendor lock-in, and the list grows as more operators join.
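
Because the catalog changes over time, you can also query it at runtime. A minimal sketch, assuming the gateway exposes the standard OpenAI-compatible /v1/models endpoint and that your key lives in a GONKAGATE_KEY environment variable:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key=os.environ["GONKAGATE_KEY"],
)

# Print the ID of every model the network currently serves
for model in client.models.list():
    print(model.id)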

Migrate from OpenAI SDK (minimal changes)

If you're already using the OpenAI SDK, you only change the endpoint and key.

Python

from openai import OpenAI

# Before: OpenAI
# client = OpenAI(api_key="sk-...")

# After: GonkaGate
client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-key"
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages=[
        {"role": "user", "content": "Explain recursion in simple terms"}
    ]
)

print(response.choices[0].message.content)

JavaScript / Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gonkagate.com/v1',
  apiKey: process.env.GONKAGATE_KEY,
});

const response = await client.chat.completions.create({
  model: 'qwen/qwen3-235b-a22b-instruct-2507-fp8',
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }],
});

console.log(response.choices[0].message.content);

curl

curl https://api.gonkagate.com/v1/chat/completions \
  -H "Authorization: Bearer $GONKAGATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'

Supported: streaming responses, chat completions API, and standard OpenAI SDK methods.
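
For example, streaming works through the regular SDK interface. A minimal Python sketch (same endpoint and model as above; the prompt is just an illustration):

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key=os.environ["GONKAGATE_KEY"],
)

# stream=True yields chunks as tokens are generated instead of one final response
stream = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages=[{"role": "user", "content": "Explain backpropagation in one paragraph"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()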

n8n and workflow automation

If you use n8n, GonkaGate is a good fit. n8n allows custom base URLs in OpenAI credentials:

  1. Set base URL to https://api.gonkagate.com/v1.
  2. Add your GonkaGate API key.
  3. Use any AI nodes (Chat, Agent, etc.) with lower costs.

Why this matters: most n8n workflows are simple tasks (summarization, classification, extraction, basic Q&A). You don't need frontier-level reasoning for that.

With hundreds or thousands of daily workflow runs, the price difference adds up quickly.
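
As a concrete example, here's what a typical workflow-style task looks like through the gateway. This is a sketch only; the ticket text and label set are made up for illustration:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key=os.environ["GONKAGATE_KEY"],
)

# A common automation task: route a support ticket into a fixed label set
ticket = "My invoice from last month shows the wrong amount."
resp = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages=[
        {"role": "system", "content": "Classify the ticket as one of: billing, bug, feature_request, other. Reply with the label only."},
        {"role": "user", "content": ticket},
    ],
    temperature=0,
)

print(resp.choices[0].message.content)  # expected: billing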

The numbers

Current pricing is about $0.0021 per 1M tokens (input + output) for all models at the time of writing. See the pricing page for details and fees.

For comparison, check the official pricing pages of the centralized providers you use today.

Simple math: if you currently pay $X per 1M tokens and your token volume stays the same, your GonkaGate spend is roughly current_budget * (0.0021 / X).
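
In code, with purely hypothetical numbers (a $300/month budget at $2.00 per 1M tokens today):

def estimate_gonkagate_budget(current_budget, current_price_per_1m, gonka_price_per_1m=0.0021):
    """Estimate new monthly spend assuming the same token volume."""
    return current_budget * (gonka_price_per_1m / current_price_per_1m)

# Hypothetical example: $300/month at $2.00 per 1M tokens
print(f"${estimate_gonkagate_budget(300, 2.00):.2f}/month")  # ~$0.32/month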

Important: These savings are illustrative. Real costs depend on model choice, traffic patterns, and current network load (supply/demand).

Tip: Use the pricing calculator on the pricing page to estimate your monthly spend based on your actual traffic profile.

When it fits (and when it doesn't)

Criterion        GonkaGate                             Centralized providers
Budget           Tight                                 Flexible
Model quality    Good enough for the task              Frontier-level
Use case         MVPs, internal tools, n8n workflows   Production, enterprise, critical features
Models           Prefer open-source                    Proprietary is OK
Stability        Occasional hiccups acceptable         High uptime required
Vendor lock-in   Want to avoid it                      Not a priority

If you're closer to the left column, try it. If you need strict SLAs and top-tier quality, stick with centralized providers.

Limitations and risks

  1. Early-stage network. Gonka is new, and instability is possible. If your app is mission-critical, plan accordingly; a simple fallback pattern is sketched after this list.
  2. Open-source model ceiling. qwen3-235b is strong, but it's not a frontier proprietary model. Some tasks will show a gap.
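
One way to hedge the stability risk: route requests to GonkaGate first and fall back to a centralized provider on failure. A sketch, assuming OPENAI_API_KEY and GONKAGATE_KEY env vars and a fallback model of your choice:

import os
from openai import OpenAI

primary = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key=os.environ["GONKAGATE_KEY"],
)
fallback = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def complete(messages):
    # Try the cheap decentralized path first, then fall back on any error
    try:
        return primary.chat.completions.create(
            model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
            messages=messages,
            timeout=30,
        )
    except Exception:
        return fallback.chat.completions.create(
            model="gpt-4o-mini",  # whichever model you use today
            messages=messages,
        )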

Wrap up

If you want a quick way to reduce LLM spend without a big rewrite:

  1. Sign up at GonkaGate
  2. Get an API key
  3. Swap the endpoint in your code
  4. Test on your real use cases (a quick comparison sketch follows)
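
For step 4, a minimal side-by-side harness. Assumptions: both API keys live in env vars, gpt-4o-mini stands in for whatever model you use today, and the placeholder prompts should be replaced with ones from your real workload:

import os
from openai import OpenAI

current = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
candidate = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key=os.environ["GONKAGATE_KEY"],
)

# Replace with prompts sampled from your real traffic
prompts = [
    "Summarize this changelog in three bullet points: ...",
    "Extract the total amount from this invoice text: ...",
]

for prompt in prompts:
    for name, client, model in [
        ("current", current, "gpt-4o-mini"),
        ("gonkagate", candidate, "qwen/qwen3-235b-a22b-instruct-2507-fp8"),
    ]:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"--- {name} ---\n{resp.choices[0].message.content}\n")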

Questions about GonkaGate or decentralized inference? Drop them in the comments.
