LLM bills can climb fast, especially when you're shipping an MVP, an internal tool, or a workflow-heavy app. We hit that wall and started building GonkaGate to make open-source inference usable without a full rewrite.
This is a practical walkthrough. We'll show how it works, how to migrate with minimal code changes, and where the trade-offs are real.
Disclosure: We are the team behind GonkaGate.
What you'll learn
- Why centralized LLM APIs are expensive
- How decentralized inference changes the cost structure
- How to switch from the OpenAI SDK with a simple base URL change
- When this approach fits and when it doesn't
Why LLM APIs are expensive
Centralized providers have a cost stack that adds up:
- Infrastructure (top-tier GPUs cost tens of thousands of dollars per card)
- Frontier-model R&D
- Enterprise sales and compliance
- Margin
Pricing changes frequently, so check each provider's official pricing page for current numbers.
The alternative is self-hosting open-source models, and that has its own pain:
- Renting and maintaining GPU clusters
- CUDA, drivers, and dependency setup
- Scaling, load balancing, failover
- Ongoing infra maintenance
For small teams, both options can hurt in different ways.
A third path: decentralized inference
Gonka is a decentralized network for open-source model inference. The public network tracker shows roughly 5.4k H100-equivalent GPUs as of February 3, 2026. In December 2025, Bitfury announced a $50M investment in Gonka as part of its $1B program for decentralized AI projects.
How the price drops
- Compute is nearly fully utilized. Traditional PoW chains burn a lot of compute on consensus. Gonka uses a mechanism called Sprint (Transformer-based Proof-of-Work), described in the whitepaper. The work is closer to LLM inference than to hashing.
- Distributed GPU hosts. Hardware owners (individual and enterprise) contribute idle compute and earn rewards. The network aggregates existing capacity instead of building centralized data centers. A wide range of GPUs is supported (H100/H200, A100) with a 48 GB VRAM minimum.
- Dynamic on-chain pricing. Prices follow network load, adjusting with supply and demand.
- Open-source model ecosystem. The model list changes as operators join the network. New models appear as they are added.
What GonkaGate does
GonkaGate is an API gateway to the Gonka network. You pay in USD, and the integration is compatible with the OpenAI SDK. In most cases, you just swap the base URL and API key.
Available models
These are open-source models, not frontier proprietary ones:
| Model | Context | Best for |
|---|---|---|
| qwen/qwen3-235b-a22b-instruct-2507-fp8 | 262K | Complex reasoning, code |
Honest take: qwen3-235b is strong at code, summarization, and reasoning. It still lags frontier models on nuanced creative writing and the hardest multi-step tasks. Test on your real use cases.
Why it's interesting: open-source means transparency and no vendor lock-in, and the list grows as more operators join.
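Since the catalog changes over time, you can check what's live from code. Here's a minimal sketch, assuming GonkaGate exposes the standard OpenAI-compatible `/v1/models` endpoint (verify against the docs):

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-key",
)

# List every model the network currently serves.
# Assumes the standard OpenAI-compatible /v1/models endpoint.
for model in client.models.list():
    print(model.id)
```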
Migrate from OpenAI SDK (minimal changes)
If you're already using the OpenAI SDK, you only change the endpoint and key.
Python
```python
from openai import OpenAI

# Before: OpenAI
# client = OpenAI(api_key="sk-...")

# After: GonkaGate
client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-key",
)

response = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages=[
        {"role": "user", "content": "Explain recursion in simple terms"}
    ],
)

print(response.choices[0].message.content)
```
JavaScript / Node.js
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.gonkagate.com/v1',
  apiKey: process.env.GONKAGATE_KEY,
});

const response = await client.chat.completions.create({
  model: 'qwen/qwen3-235b-a22b-instruct-2507-fp8',
  messages: [{ role: 'user', content: 'Solve: 2x + 5 = 13' }],
});

console.log(response.choices[0].message.content);
```
curl
```bash
curl https://api.gonkagate.com/v1/chat/completions \
  -H "Authorization: Bearer $GONKAGATE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-235b-a22b-instruct-2507-fp8",
    "messages": [{"role": "user", "content": "Hello, world!"}]
  }'
```
Supported: streaming responses, chat completions API, and standard OpenAI SDK methods.
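For example, streaming uses the same `stream=True` flag as the OpenAI SDK. A minimal sketch:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-key",
)

# stream=True yields chunks as tokens are generated
# instead of waiting for the full response.
stream = client.chat.completions.create(
    model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
    messages=[{"role": "user", "content": "Explain recursion in simple terms"}],
    stream=True,
)

for chunk in stream:
    # The final chunk may carry no content delta.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```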
n8n and workflow automation
If you use n8n, GonkaGate is a good fit. n8n allows custom base URLs in OpenAI credentials:
- Set the base URL to `https://api.gonkagate.com/v1`.
- Add your GonkaGate API key.
- Use any AI nodes (Chat, Agent, etc.) with lower costs.
Why this matters: most n8n workflows are simple tasks (summarization, classification, extraction, basic Q&A). You don't need frontier-level reasoning for that.
With hundreds or thousands of daily workflow runs, the price difference adds up quickly.
The numbers
Current pricing is about $0.0021 per 1M tokens (input + output) for all models at the time of writing. See the pricing page for details and fees.
For comparison, check the official pricing pages of the centralized providers mentioned above.
Simple math: if you currently pay $X per 1M tokens, your GonkaGate budget is roughly current_budget * (0.0021 / X).
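As a worked example, with all numbers as placeholders you'd replace with your own:

```python
# Illustrative back-of-the-envelope estimate; every input is a placeholder.
current_price_per_1m = 2.10      # $ per 1M tokens with your current provider
gonkagate_price_per_1m = 0.0021  # $ per 1M tokens (rate quoted above)
current_budget = 1000.00         # your current monthly LLM spend in $

estimated_budget = current_budget * (gonkagate_price_per_1m / current_price_per_1m)
print(f"Estimated monthly spend: ${estimated_budget:.2f}")  # -> $1.00
```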
Important: These savings are illustrative. Real costs depend on model choice, traffic patterns, and current network load (supply/demand).
Tip: Use the pricing calculator on the pricing page to estimate your monthly spend based on your actual traffic profile.
When it fits (and when it doesn't)
| Criterion | GonkaGate | Centralized providers |
|---|---|---|
| Budget | Tight | Flexible |
| Model quality | Good enough for the task | Frontier-level |
| Use case | MVPs, internal tools, n8n workflows | Production, enterprise, critical features |
| Models | Prefer open-source | Proprietary is OK |
| Stability | Occasional hiccups acceptable | High uptime required |
| Vendor lock-in | Want to avoid it | Not a priority |
If you're closer to the left column, try it. If you need strict SLAs and top-tier quality, stick with centralized providers.
Limitations and risks
- Early-stage network. Gonka is new, and instability is possible. If your app is mission-critical, plan accordingly (see the fallback sketch below).
- Open-source model ceiling. qwen3-235b is strong, but it's not a frontier proprietary model. Some tasks will show a gap.
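One way to hedge against instability is a fallback path: route requests to GonkaGate first, and retry against your existing provider on failure. A minimal sketch of that pattern (this is our suggestion, not a built-in GonkaGate feature; the fallback model name is a placeholder):

```python
from openai import OpenAI, APIError

primary = OpenAI(
    base_url="https://api.gonkagate.com/v1",
    api_key="your-gonkagate-key",
)
fallback = OpenAI(api_key="sk-...")  # your existing centralized provider

def chat(messages):
    """Try the cheap decentralized route first; fall back on any API error."""
    try:
        return primary.chat.completions.create(
            model="qwen/qwen3-235b-a22b-instruct-2507-fp8",
            messages=messages,
            timeout=30,  # fail fast so the fallback kicks in quickly
        )
    except APIError:
        return fallback.chat.completions.create(
            model="gpt-4o-mini",  # placeholder: whatever model you use today
            messages=messages,
        )

response = chat([{"role": "user", "content": "Summarize this ticket: ..."}])
print(response.choices[0].message.content)
```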
Wrap up
If you want a quick way to reduce LLM spend without a big rewrite:
- Sign up at GonkaGate
- Get an API key
- Swap the endpoint in your code
- Test on your real use cases
Questions about GonkaGate or decentralized inference? Drop them in the comments.