DEV Community: Andrew S. Bandy

Your API Key Can't See Its Own Usage - So We Fixed That in GoModel

Andrew S. Bandy — Wed, 08 Jul 2026 14:10:18 +0000

Every AI coding agent, plugin, and status bar eventually wants to answer one small question for its user: how much have I used, and when do I get cut off?

You'd expect a one-liner for that. GET /v1/usage, send the same API key you use for inference, get a number back. We went looking for the standard way to do this — across OpenAI, Anthropic, Codex, Claude Code, OpenCode, and every major AI gateway — because we wanted GoModel to follow it.

Here's the punchline: there is no standard. There isn't even a common convention. What we found instead was a landscape of admin-only reporting APIs, private endpoints reserved for first-party tools, and community plugins literally scraping HTML dashboards with browser cookies.

What the big providers give you: nothing (on purpose)

OpenAI has a real Usage API - GET /v1/organization/usage/completions, plus a Costs API. It's well designed: paginated time buckets, group-by model or API key, cached-token breakdowns. But it requires a separate Admin API key that only an org owner can mint, and that admin key can't do inference. The key your app actually holds can't ask about itself at all. The old self-service endpoints people remember (GET /v1/usage?date=..., /v1/dashboard/billing/usage) were never official and have been locked to browser sessions since 2023.

Anthropic made the same call. Usage and cost reports live under /v1/organizations/usage_report/* and need an sk-ant-admin... key. They even ship a read-only Rate Limits API that names "sync your gateway with your org limits" as the use case — and gated that behind the admin key too.

With a regular key, both vendors give you exactly one thing: rate-limit response headers (x-ratelimit-remaining-tokens, anthropic-ratelimit-requests-remaining, ...) on each inference call. Useful, but reactive — you learn your headroom as a side effect of spending it, and you learn nothing about budgets or historical usage.
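As a rough illustration of what that in-band channel gives a client, here is a minimal sketch that pulls the rate-limit headers out of a response's header mapping. The two header names come from the paragraph above; the helper names and the sample values are hypothetical, and real responses carry more headers than these.

```python
from typing import Dict, Mapping, Optional


def ratelimit_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return only the rate-limit headers, with lower-cased names."""
    return {
        name.lower(): value
        for name, value in headers.items()
        if "ratelimit" in name.lower()
    }


def remaining(headers: Mapping[str, str], name: str) -> Optional[int]:
    """Read one '...-remaining' header as an int, or None if absent or non-numeric."""
    value = ratelimit_headers(headers).get(name.lower())
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


# Hypothetical header values, for illustration only.
sample = {
    "Content-Type": "application/json",
    "x-ratelimit-remaining-tokens": "149000",
    "anthropic-ratelimit-requests-remaining": "49",
}
print(ratelimit_headers(sample))
```

Note what is missing: nothing here says anything about budgets or past usage, which is the gap the rest of this post is about.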

Meanwhile, their own tools poll private endpoints

The funny part: both companies clearly know consumers need this, because their own products have it — privately.

Codex polls chatgpt.com/backend-api/wham/usage (OAuth-only, undocumented) about once a minute to render its usage display. The response is percent-of-window: {"primary_window": {"used_percent": 12, "reset_at": ...}}.
Claude Code's /usage command calls api.anthropic.com/api/oauth/usage — also undocumented, also OAuth-only, and it even rate-limits you into oblivion if you don't send the right User-Agent. Community usage monitors reverse-engineered it anyway.

So the pattern at the top of the market is: self-service usage exists, but only for our own clients, on internal endpoints you're not supposed to use.

And the tools downstream are hurting

The demand side is not subtle. The example that motivated this work: OpenCode — the open-source coding agent — has a gateway (Zen) with real usage windows and balances, and no API for any of it. The result:

A community plugin (opencode-quota) scrapes the billing dashboard's HTML using browser cookie auth to display quota percentages.
There are open issues asking for GET /zen/v1/balance and usage-window endpoints, and an open proposal for GET /zen/go/v1/usage returning {"windows": [{"name": "5 hour", "usagePercent": 42, "resetInSec": 4711}]}.
Ollama Cloud users are filing the same issues (/api/me, /api/usage) for the same reason.

When your users are scraping your dashboard with cookies, they have told you what API they want.

The closest thing to a convention

Two gateways actually shipped self-service introspection, and one got close:

OpenRouter: GET /api/v1/key with your normal inference key returns usage, usage_daily/weekly/monthly, limit, limit_remaining, limit_reset. This is the de-facto reference design.
Requesty: GET /v1/manage/apikey/self returns monthly_spend and monthly_limit with the same bearer key.
LiteLLM: GET /key/info with no parameter defaults to the calling key and returns spend, max_budget, budget_reset_at, tpm_limit, rpm_limit.

Distill those and you get the only "standard" that exists — a design pattern, not a schema: GET, same base URL, same bearer key, self-referential, no parameters required. Plus rate-limit headers on inference responses as the complementary channel.
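The pattern is small enough to sketch in a few lines. The paths below are the ones listed above; the base URL, the key, and the helper name are hypothetical placeholders, and the request is only built, not sent.

```python
import urllib.request

# Paths as named above. The base URL is whatever host you already call for inference.
SELF_INTROSPECTION_PATHS = {
    "openrouter": "/api/v1/key",
    "requesty": "/v1/manage/apikey/self",
    "litellm": "/key/info",
}


def introspection_request(base_url: str, gateway: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET asking about the calling key itself."""
    url = base_url.rstrip("/") + SELF_INTROSPECTION_PATHS[gateway]
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},  # same key as inference
        method="GET",
    )


# Hypothetical base URL and key, for illustration only.
req = introspection_request("https://gateway.example", "litellm", "sk-example")
print(req.get_method(), req.full_url)
```

No query parameters and no separate credential: the key identifies the caller, and the response describes that caller.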

How GoModel does it

GoModel now answers the whole question in one call. GET /v1/usage, authenticated with the same key you use for chat completions:

curl http://localhost:8080/v1/usage \
  -H "Authorization: Bearer sk_gom_..."

{
  "user_path": "/team/alpha",
  "server_time": "2026-07-07T09:30:00Z",
  "usage": {
    "start_date": "2026-06-08",
    "end_date": "2026-07-07",
    "total_requests": 128,
    "total_tokens": 91234,
    "cached_input_tokens": 12790,
    "total_cost": 1.73
  },
  "budgets": [
    {
      "user_path": "/team/alpha",
      "period_label": "monthly",
      "amount": 100,
      "spent": 41.2,
      "remaining": 58.8,
      "usage_ratio": 0.412,
      "period_end": "2026-08-01T00:00:00Z",
      "resets_in_seconds": 2125800,
      "exceeded": false
    }
  ],
  "rate_limits": [
    {
      "user_path": "/team/alpha",
      "period_label": "minute",
      "max_requests": 60,
      "requests_used": 3,
      "requests_remaining": 57,
      "requests_usage_ratio": 0.05,
      "window_end": "2026-07-07T09:31:00Z",
      "resets_in_seconds": 60,
      "exhausted": false
    }
  ]
}

Three blocks, three questions answered:

usage — requests, tokens (with cache breakdowns), and estimated cost over a date window you control (days, or start_date/end_date, up to 365 days; last 30 by default).
budgets — every spend limit that gates you, including ones inherited from parent paths, with spent, remaining, the period bounds, and an exceeded flag that mirrors what enforcement will actually do.
rate_limits — live counters for every rate-limit rule covering you: used, remaining, and when the window resets.
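To show how little work a client has to do with those three blocks, here is a minimal sketch that turns a payload into status lines. The field names are taken from the example response above; the function name and output wording are hypothetical, and the payload is a trimmed copy of that example rather than a live response.

```python
def status_lines(payload: dict) -> list:
    """Turn a /v1/usage payload into short human-readable lines."""
    usage = payload["usage"]
    lines = [
        f"{usage['start_date']}..{usage['end_date']}: "
        f"{usage['total_requests']} requests, {usage['total_tokens']} tokens, "
        f"cost {usage['total_cost']:.2f}"
    ]
    for budget in payload.get("budgets", []):
        flag = " EXCEEDED" if budget["exceeded"] else ""
        lines.append(
            f"budget {budget['period_label']}: {budget['usage_ratio']:.0%} used, "
            f"{budget['remaining']} remaining{flag}"
        )
    for limit in payload.get("rate_limits", []):
        if "max_requests" not in limit:
            continue  # e.g. token or concurrency rules, which carry other fields
        flag = " EXHAUSTED" if limit["exhausted"] else ""
        lines.append(
            f"rate limit per {limit['period_label']}: "
            f"{limit['requests_remaining']}/{limit['max_requests']} requests left{flag}"
        )
    return lines


# Trimmed copy of the example response above.
payload = {
    "usage": {"start_date": "2026-06-08", "end_date": "2026-07-07",
              "total_requests": 128, "total_tokens": 91234, "total_cost": 1.73},
    "budgets": [{"period_label": "monthly", "remaining": 58.8,
                 "usage_ratio": 0.412, "exceeded": False}],
    "rate_limits": [{"period_label": "minute", "max_requests": 60,
                     "requests_remaining": 57, "exhausted": False}],
}
for line in status_lines(payload):
    print(line)
```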

A few design decisions worth explaining:

It's scoped by user path, not by key. In GoModel, budgets, rate limits, and usage accounting all hang off a hierarchical user_path (like /team/alpha/service), and API keys bind to a path. Several keys can share one path, and when we add a full users system, a user will simply be a path. Every self-service precedent out there is key-centric; ours is identity-centric — but from the caller's perspective it behaves identically: send your key, get your own status, and you can't read outside your subtree.
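The path model can be illustrated with two small helpers. This is a sketch of the idea only, not GoModel's implementation: the function names are hypothetical, and treating "/" as the top-level parent of every path is an assumption.

```python
def ancestors(user_path: str) -> list:
    """Return the path and all of its parents, from the root down."""
    parts = [p for p in user_path.split("/") if p]
    return ["/"] + ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]


def in_subtree(root: str, candidate: str) -> bool:
    """True if candidate is root itself or lies somewhere below it."""
    root = root.rstrip("/")
    return candidate == root or candidate.startswith(root + "/")


# Budgets set on any of these paths would apply to the service path.
print(ancestors("/team/alpha/service"))

# A key bound to /team/alpha can see its own subtree, but not siblings or parents.
print(in_subtree("/team/alpha", "/team/alpha/service"))
print(in_subtree("/team/alpha", "/team/alphabet"))
```

Comparing on `root + "/"` rather than on the bare prefix is what keeps `/team/alphabet` out of the `/team/alpha` subtree.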

Polling is free. The endpoint doesn't consume rate-limit quota, so a status widget checking every few seconds never eats into the budget it's displaying. (Codex polls its internal endpoint every 60 seconds for the same reason.)

Absolute numbers and percentages. The consumer-facing idiom at OpenAI and Anthropic is percent-of-window (used_percent, utilization), which is deliberately opaque about actual quotas. We return real counts and real money with real window boundaries, and then precompute the display idiom anyway. Every budget carries a usage_ratio, a resets_in_seconds countdown, and an exceeded flag. Every rate limit carries an exhausted flag, a usage ratio per limited dimension (requests_usage_ratio, tokens_usage_ratio), and, for windowed rules, the same countdown; a concurrency cap has no window, so it reports in-flight slots instead. Countdowns are relative to the server_time in the same payload, so client clock skew can't break them, and both flags mirror what enforcement will actually do. A status bar renders this with zero math; a plugin that wants the raw numbers has them too.
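A short sketch of why the server-relative countdown is convenient, using the values from the example response. The helper names are hypothetical; the only assumption about the wire format is that timestamps look like the ones shown, with a trailing "Z".

```python
from datetime import datetime, timedelta


def parse_utc(timestamp: str) -> datetime:
    """Parse a timestamp like 2026-07-07T09:30:00Z into an aware datetime."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def reset_at(server_time: str, resets_in_seconds: int) -> datetime:
    """Absolute reset instant, derived only from values in the payload."""
    return parse_utc(server_time) + timedelta(seconds=resets_in_seconds)


def seconds_left(resets_in_seconds: int, elapsed_since_response: float) -> float:
    """Countdown driven by locally measured elapsed time, never by wall-clock time."""
    return max(0.0, resets_in_seconds - elapsed_since_response)


# Values from the example response: the monthly budget and the per-minute rule.
print(reset_at("2026-07-07T09:30:00Z", 2125800))
print(reset_at("2026-07-07T09:30:00Z", 60))
print(seconds_left(60, 12.5))
```

A client can feed `seconds_left` a `time.monotonic()` delta taken since the response arrived, so a wrong local clock never enters the calculation.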

Headers still work. GoModel keeps emitting OpenAI-style x-ratelimit-* headers on inference responses, so the endpoint and the in-band signals tell one consistent story.

The endpoint is in GoModel today, and it exists because someone asked for exactly the OpenCode-plugin use case described above. Full request and response reference — every field, the date window, the error cases — is in the GoModel Usage API docs.

The takeaway

Usage visibility has quietly become admin-only across the LLM ecosystem — not because it's hard, but because nobody made it part of the API contract. The result is a cottage industry of OAuth token borrowing, dashboard scraping, and GitHub issues asking for the same three fields: used, limit, resets at.

If you run a gateway, your callers shouldn't need admin access to know when they're about to hit a wall. Give them one GET.

Benchmarking AI Gateways: GoModel vs LiteLLM vs Portkey vs Bifrost

Andrew S. Bandy — Fri, 26 Jun 2026 17:51:26 +0000

In October 2025 I tried to build my startup on top of LiteLLM.

At first it looked like the obvious choice. It supported many providers, it had
an OpenAI-compatible API, and it was already used by a lot of people. I did not
want to write an AI gateway. I wanted to build the product behind it.

Then I started running it on the hot path.

My opinion changed there.

A gateway is not a dashboard or integration glue you call once in a while. It
sits on every request, every retry, every stream, every tool call, every
fallback, every timeout.

A heavy gateway charges rent forever.

Most AI gateway comparisons miss that part. They talk about provider count,
dashboards, tracing, and "support for 1000+ models". Those things matter, but
they are not free. Before the gateway calls OpenAI, Anthropic, Gemini, vLLM, or
anything else, it has already spent your CPU, memory, cold-start time, and
operational budget.

I am not comparing full product maturity here. I am comparing how these gateways
behave on the hot path.

So I started writing GoModel: a small
open-source AI gateway and AI control plane in Go, with an OpenAI-compatible API
and explicit provider adapters.

When I launched GoModel on Hacker News,
I promised a real, reproducible benchmark. This article is that follow-up.

The benchmark question is simple:

How lean is each AI gateway when it sits on the request path?

That question runs through the whole benchmark: GoModel vs LiteLLM vs Portkey vs
Bifrost, measured by latency, throughput, memory, CPU, cold start, and image
size rather than landing pages or feature matrices.

The runtime footprint matters

Latency gets the easiest arguments. It rarely tells the whole story.

Most real LLM calls are dominated by inference time. If a model takes 2000 ms
to answer, the difference between 5 ms and 15 ms of proxy overhead is not
the main story.

The main story is the deployment envelope:

How much RAM does the gateway need under load?
How much CPU does it burn per request?
How many requests can it serve per core?
How fast does it cold-start?
How large is the Docker image?
Can you run it as a sidecar, on a small VM, in serverless, or near local models?
Is the core gateway actually open-source?

Those numbers decide whether the gateway can run where you want it to run.

A 372 MB compressed image (1.2 GB unpacked) that needs gigabytes of RAM under
load and takes 25 s to cold-start is a different operational thing than a
16 MB image that peaks at 37 MB of RAM and is serving traffic 0.56 s after
launch.

So I care about the runtime footprint.

What this benchmark does not prove

This benchmark does not prove that one gateway is best for every company.

I am not measuring:

bug counts or overall correctness
semantic cache quality
tracing UI quality
guardrail quality
admin dashboards
long-term provider maintenance
every possible provider-specific feature
total provider count

Those things matter. Some of them matter a lot.

LiteLLM in particular has more integrated providers and more gateway features
than GoModel today. If your first requirement is maximum provider coverage right
now, LiteLLM has a real advantage. This benchmark does not erase that. It
measures the runtime footprint of putting each gateway on the request path. In
practice, many smaller or newer providers already expose an OpenAI-compatible
API, so provider count is not always the same as practical routing coverage.

The benchmark measures one narrower thing: runtime and deployment overhead on
the request path.

That still matters, because the gateway is on the hot path. If you run high
request volume, local models, serverless workloads, edge workloads, or many small
model calls, the overhead stops being theoretical.

AI gateway benchmark setup

I tested four AI gateways people actually compare:

GoModel
LiteLLM
Portkey
Bifrost

Every gateway talked to the same instant mock backend, on purpose. I did not
want to benchmark OpenAI, Anthropic, AWS networking, or random internet jitter.
I wanted to isolate the gateway itself.

Each gateway ran one at a time, in Docker, on an AWS c7i.large with
2 vCPU and 4 GiB RAM, running the latest Amazon Linux 2023 AMI. The whole
thing is Terraform'd, runs with one command, and tears itself down afterwards.

I first ran this on a free-tier t2.micro. That was cheap and easy to
reproduce, but unfair to the heavier gateways. A 1 GiB machine cannot hold a
gateway that wants gigabytes of memory, so it starts swapping. At that point you
are benchmarking the host being too small.

So I moved to c7i.large: still small, but non-burstable and large enough that
nothing swaps. It also makes the LiteLLM setup more honest. LiteLLM recommends
one worker per vCPU, and this machine has 2 vCPUs, so LiteLLM gets 2
workers. That gives it the multi-core access it is supposed to have instead of
pinning it to a single worker on a tiny box.

The test covered six workloads:

chat completions, non-streaming
chat completions, streaming
Responses API, non-streaming
Responses API, streaming
Anthropic messages, non-streaming
Anthropic messages, streaming

Each workload used 8,000 requests at concurrency 10, across two trials
with randomized gateway order. Latency is the median across trials, and I
report p99 with its min-max range so one noisy window cannot tell the whole
story.
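The aggregation described above can be sketched as follows. This is not the harness code: the nearest-rank percentile definition and the function names are assumptions, and the latencies are synthetic.

```python
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division, integers only
    return ordered[rank - 1]


def summarize(trials):
    """trials: one list of latencies (ms) per trial. Returns the cross-trial summary."""
    p50s = [percentile(t, 50) for t in trials]
    p99s = [percentile(t, 99) for t in trials]
    return {
        "p50": statistics.median(p50s),
        "p99": statistics.median(p99s),
        "p99_range": (min(p99s), max(p99s)),
    }


# Synthetic latencies, not benchmark data.
trial_a = [1.0] * 98 + [5.0, 9.0]
trial_b = [2.0] * 98 + [6.0, 7.0]
print(summarize([trial_a, trial_b]))
```

With only two trials the median is the midpoint of the two values, which is why the min-max range is reported next to it.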

I would not call this a statistically exhaustive study. It is a reproducible
engineering benchmark, and the harness is public so people can rerun it, change
the machine, or add their own workloads.

A few details matter if you want to reproduce or criticize the numbers:

Throughput is measured, not inferred. The latency runs report completed-req/s at fixed concurrency, but real capacity comes from a separate concurrency sweep that drives each gateway to saturation.
Every dialect is warmed up before measurement. LiteLLM lazily imports some per-dialect translation code on first use. A chat-only warmup made its Responses and Messages paths look worse than they should. I warmed up all dialects to avoid that.
Retries are disabled for all gateways. I also disabled GoModel's circuit breaker for this benchmark. In production, rejecting traffic after upstream trouble is the right behavior. In a saturation benchmark, it would make the throughput number unfairly low.
LiteLLM runs with its recommended worker count. A LiteLLM worker is effectively single-threaded, and its production guidance is one worker per vCPU. On this box that means 2 workers.
Streaming uses terminal-marker or idle-gap detection. If a gateway streams content but never sends a terminal event, the harness measures to last byte instead of hanging forever.
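The last rule is the least obvious one, so here is a minimal sketch of terminal-marker or idle-gap detection over a recorded stream. It is an illustration under stated assumptions, not the harness itself: it replays timestamped chunks instead of reading a socket, the function name is hypothetical, and `[DONE]` (the OpenAI chat-completions terminator) stands in for whatever marker each dialect actually uses.

```python
def stream_end_time(chunks, idle_gap, terminal_marker="[DONE]"):
    """chunks: list of (arrival_time_seconds, text) in arrival order.

    Returns the time at which the stream counts as finished, or None if empty.
    """
    last_time = None
    for arrival, text in chunks:
        if last_time is not None and arrival - last_time > idle_gap:
            # The stream went quiet for too long: measure to the last byte before the gap.
            return last_time
        last_time = arrival
        if terminal_marker in text:
            return arrival  # clean finish on the terminal event
    return last_time  # no marker and no long gap: last byte seen


clean = [(0.00, "data: a"), (0.01, "data: b"), (0.02, "data: [DONE]")]
stalled = [(0.00, "data: a"), (0.01, "data: b"), (5.00, "data: late")]
print(stream_end_time(clean, idle_gap=1.0))
print(stream_end_time(stalled, idle_gap=1.0))
```

A live harness would get the same effect from a read timeout on the connection; the point is that a missing terminal event degrades to a last-byte measurement instead of a hang.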

GoModel vs LiteLLM vs Portkey vs Bifrost

Representative latency is chat completions, non-streaming. All resource figures
are measured under load on the same box.

Metric	GoModel	Bifrost	Portkey	LiteLLM
Runtime	Go	Go	Node.js	Python
Latency overhead `p50`	`1.8 ms`	`2.5 ms`	`9.7 ms`	`30.6 ms`
Latency `p99`	`6.9 ms`	`18.3 ms`	`30.5 ms`	`39.3 ms`
Throughput (sustained)	`4900 req/s`	`3100 req/s`	`950 req/s`	`324 req/s`
Peak RAM under load	`37 MB`	`143 MB`	`112 MB`	`2.3 GB`
Efficiency (req/s per CPU %)	`52`	`25`	`8.2`	`2.6`
Cold start to first request	`0.56 s`	`7.1 s`	`1.1 s`	`25.5 s`
Docker image (compressed pull)	`16 MB`	`77 MB`	`59 MB`	`372 MB`
Workload coverage	`6/6`	`6/6`	`4/6`	`6/6`
Vendor-neutral core	Yes	Partial †	Yes	Yes
Core source available	Yes ‡	Partial ‡	Partial ‡	Yes
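For readers who want to sanity-check the efficiency row, here is a small worked calculation on the table's own numbers. It assumes the row is sustained throughput divided by average CPU percent during the sweep; if the harness computes it differently, the implied CPU figures below do not apply.

```python
# Values copied from the table above.
throughput = {"GoModel": 4900, "Bifrost": 3100, "Portkey": 950, "LiteLLM": 324}  # req/s
efficiency = {"GoModel": 52, "Bifrost": 25, "Portkey": 8.2, "LiteLLM": 2.6}      # req/s per CPU %

# CPU % each gateway would be using at its sustained throughput, under the assumption above.
implied_cpu = {name: throughput[name] / efficiency[name] for name in throughput}

# Sustained throughput relative to LiteLLM.
vs_litellm = {name: throughput[name] / throughput["LiteLLM"] for name in throughput}

for name in throughput:
    print(f"{name}: ~{implied_cpu[name]:.0f}% CPU, {vs_litellm[name]:.1f}x LiteLLM throughput")
```

Under that reading, GoModel's throughput lead comes with a similar or lower CPU bill rather than a higher one.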

What stood out

GoModel had the lowest median latency and the tightest tail: 1.8 ms p50 and
6.9 ms p99.

Bifrost was close on median latency at 2.5 ms, which is a good result. The
gap opened at the tail and in memory: 18.3 ms p99 and 143 MB peak RAM under
load.

Portkey was heavier than I expected for this narrow proxy benchmark. It served
950 req/s sustained and used 112 MB peak RAM under load. In this setup it did
not serve the Anthropic /v1/messages dialect, so it gets 4/6 workload
coverage. Treat that as a setup limitation, not a claim that Portkey cannot
support Anthropic in a fuller virtual-key configuration.

LiteLLM was the outlier. At its recommended worker count, it used about
2.3 GB of RAM, cold-started in 25.5 s, and sustained 324 req/s.

Not because Python is morally bad. The language matters only when it changes the
deployment envelope. Here it does: memory floor, image size, cold-start time,
dependency graph, and throughput per core.

The later supply-chain incident around LiteLLM
also made me more confident in GoModel's design direction. A small Go binary
with a standard-library-heavy dependency tree is structurally less exposed to
that class of problem than a large Python dependency graph.

What AI gateway benchmarks do not capture

Forwarding JSON is not the hard part.

The hard part is provider drift.

OpenAI, Anthropic, Gemini, AWS Bedrock, Azure OpenAI, Groq, xAI, Cerebras, vLLM,
and local servers all disagree in small ways. Then they change those ways. Tool
calling changes. Streaming changes. Reasoning parameters change. Image inputs
change. Error formats change. Rate-limit semantics change.

An AI gateway or AI control plane has to absorb that without becoming magic.

GoModel's bet is not "support every model name on the internet".

The bet is:

support the providers people actually deploy
keep provider adapters explicit
accept OpenAI-compatible requests generously
translate only what needs translation
pass through what should stay provider-specific
return conservative OpenAI-compatible responses

For the same reason, GoModel starts as a small OpenAI-compatible gateway, not as
a dashboard with a proxy attached.

Why this matters for local models and vLLM

If all your traffic goes to a cloud model that takes several seconds to answer,
gateway overhead can look academic.

Local models change the math.

If you are routing through an AI gateway to vLLM, Ollama, LM Studio, llama.cpp,
or small specialized models on your own network, the model call can be much
faster. Then gateway overhead, cold starts, memory, and sidecar size matter more.
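A back-of-the-envelope sketch makes the shift concrete. The p50 overheads come from the results table and the 2000 ms figure from the earlier cloud example; the 50 ms local call is a hypothetical number, and simply adding gateway overhead to model time is a simplification.

```python
def overhead_share(model_ms: float, gateway_ms: float) -> float:
    """Fraction of end-to-end latency spent in the gateway."""
    return gateway_ms / (model_ms + gateway_ms)


P50_OVERHEAD_MS = {"GoModel": 1.8, "LiteLLM": 30.6}  # from the results table

# 2000 ms: the cloud example used earlier. 50 ms: a hypothetical fast local call.
for model_ms in (2000, 50):
    for name, gateway_ms in P50_OVERHEAD_MS.items():
        share = overhead_share(model_ms, gateway_ms)
        print(f"model {model_ms} ms via {name}: gateway is {share:.1%} of the request")
```

Against a multi-second cloud call both gateways round to noise; against a fast local call the heavier one becomes a large slice of every request.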

One reason I want GoModel to stay small: a gateway should be cheap enough to put
near the workload.

Notes on neutrality and open source

Bifrost is built by Maxim AI, an LLM
evaluation and observability platform. It routes to many model providers, but
the gateway also sits close to Maxim's eval and observability ecosystem. If you
want to choose your own eval platform, or stay independent from any eval
platform, ask whether Bifrost is the right match for you. Good software can
still have incentives attached. "Vendor-neutral" needs an asterisk here.

"Open-source" also needs care.

Portkey keeps observability storage, dashboard, multi-team RBAC, and at-scale
semantic caching in a closed managed tier. Bifrost's core gateway is Apache-2.0,
but its Enterprise edition adds closed or managed features. LiteLLM's proxy core
is MIT, but enterprise features like SSO, audit logs, and fine-grained access
control sit behind a proprietary commercial license.

GoModel is open-source today. Some enterprise-grade AI control plane features may
stay private. The core gateway is intended to remain useful without those private
features.

Reproduce it yourself

The benchmark is built to be self-verifiable. It provisions the AWS instance,
runs every gateway against the same backend, prints the tables, and destroys the
infrastructure.

Run it with one command:

./run.sh

One caveat: it runs on paid AWS infrastructure, not the free tier. A
c7i.large is about $0.09/hour and the run self-destructs within an hour or
two, so budget under $1 per run to be safe.

If you pass KEEP=1 or teardown fails, you keep paying until you destroy the
box, so double-check the teardown.

Conclusion

I did not start GoModel because I wanted another AI gateway in the world.

I started it because the gateway I wanted to use became part of the problem. It
sat on the hot path, but did not feel like hot-path software: too heavy, too
slow to start, too expensive to keep around, too large for the job.

This benchmark is the result of turning that frustration into numbers.

The numbers say GoModel is small in the places I care about: 16 MB image,
37 MB peak RAM, 0.56 s cold start, 1.8 ms p50, 6.9 ms p99, and
4900 req/s sustained throughput on a small AWS box.

LiteLLM still has more providers and more features today. Portkey and Bifrost
have their own strengths. But if the gateway is going to sit between your users
and every model call, I think it should first be cheap, predictable, and boring
to run.

GoModel is my attempt to build that kind of gateway.

LiteLLM was compromised, but GoModel is a good alternative

Andrew S. Bandy — Tue, 24 Mar 2026 18:39:51 +0000

LiteLLM just had a serious supply chain incident.

According to the public GitHub reports, malicious PyPI versions of LiteLLM were published, including 1.82.8, with code that could run automatically on Python startup and steal secrets like environment variables, SSH keys, and cloud credentials. The reported payload sent that data to an attacker-controlled domain. A follow-up issue says the PyPI package was compromised through the maintainer's PyPI account, and that the bad releases were not shipped through the official GitHub CI/CD flow.

This is bigger than one package. It is a reminder that the AI infra layer is now part of your security boundary.

Fortunately, there is a good alternative: GoModel, a faster, simpler LLM gateway written in Go. It is smaller and performs better, which suits teams that want a reliable LLM gateway.

Repo link: https://github.com/ENTERPILOT/GOModel/

Benchmarking GoModel, a LiteLLM alternative: lessons learned from building a simple benchmark

Andrew S. Bandy — Mon, 23 Mar 2026 15:36:23 +0000

When I started looking for an AI gateway for my project, I came across GoModel. I did not plan to spend much time on benchmarking.

I assumed benchmarking would be annoying, fragile, and probably much harder than it looked. In my head, it felt like one of those tasks that sounds simple at first, but turns into a mini research project once you actually start.

What I learned is the opposite: creating a useful benchmark is much easier than most people think.

And one big reason is that AI makes the whole process much easier than it was a few years ago.

That was the biggest lesson for me.

What is GoModel?

GoModel is an open-source AI gateway / LLM proxy written in Go. It sits between your app and model providers like OpenAI, Anthropic, Gemini, Groq, xAI, and Ollama, and exposes a single OpenAI-compatible API.

I built it because I wanted a lightweight, production-friendly gateway that was easy to deploy, easy to reason about, and fully open-source.

Why I decided to benchmark it

At some point, I kept making the same claim in my head:

“GoModel feels lighter and faster.”

That may be true, but “feels” is not evidence.

I was mostly comparing it against LiteLLM, because LiteLLM is the best-known option in this space and the default reference point for many people looking at LLM gateways.

So I decided to stop guessing and just measure it.

That turned out to be one of the most useful things I have done for the project, not only because of the results, but because of what I learned while building the benchmark itself.

The biggest change: benchmarking is easier now because you can just talk to AI (Lesson 1)

A few years ago, even starting a benchmark felt heavy.

First you had to think through the methodology. Then you had to decide what to measure. Then you had to write the scripts. Then you had to figure out how to run them, collect the numbers, and make sense of the results.

Now a lot of that work is much easier.

You can literally start by describing what you want in plain English:

I have two services
they do the same job
I want to compare throughput, latency, and memory usage
I want a simple repeatable benchmark
I do not need a perfect academic setup
I just want something fair and useful

That is already enough to get moving.

AI is very good at helping with exactly this kind of task. Not because it magically solves benchmarking for you, but because it removes a lot of the friction around getting started.

It can help you:

define a reasonable benchmark scope
generate load scripts
suggest what metrics to collect
point out obvious mistakes in the setup
format results
help you explain the limitations clearly

That part feels very different from how things used to be.

Before, benchmarking often felt blocked by setup cost.

Now it is much more like: just talk to AI, get a first version working, then iterate.

That does not mean you should trust every output blindly. You still need to think. You still need to validate the setup. You still need to understand what is actually being measured.

But the barrier to entry is much lower now.

And I think that is a big deal.

Lesson 2: a benchmark does not need to be perfect to be useful

This was the biggest mindset shift.

I think many developers avoid benchmarking because they imagine they need a huge setup: many machines, a big test matrix, production traffic replay, deep statistical analysis, and charts for every possible scenario.

In reality, you can learn a lot from a small benchmark if you ask a clear question.

My question was simple:

If both tools are used as an LLM gateway in front of the same kind of workload, how do they behave in terms of throughput, latency, and memory usage?

That is already enough.

You do not need to model the entire internet. You just need a test that is fair enough to reveal something meaningful.
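To give a sense of how small such a test can be, here is a minimal sketch of a fixed-concurrency load run. It is not the benchmark from this post: the function names are hypothetical, `fake_request` stands in for a real HTTP call to the gateway under test, and memory would have to be sampled separately (for example from the container runtime).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(call, total_requests, concurrency):
    """Run call() total_requests times on a fixed number of worker threads."""
    def timed(_):
        start = time.perf_counter()
        call()
        return (time.perf_counter() - start) * 1000.0  # latency in ms

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed, range(total_requests)))
    wall = time.perf_counter() - wall_start

    n = len(latencies)
    return {
        "completed": n,
        "req_per_s": n / wall,
        "p50_ms": latencies[n // 2],
        "p99_ms": latencies[min(n - 1, int(n * 0.99))],
    }


def fake_request():
    """Stand-in for one request to the gateway under test."""
    time.sleep(0.001)


result = run_load(fake_request, total_requests=200, concurrency=10)
print(result)
```

Point the same function at each service in turn, keep the request count and concurrency identical, and the comparison is already fair enough to be informative.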

AI also helps here because it forces you to phrase the question clearly. If you cannot explain the benchmark clearly to an AI assistant, there is a good chance your scope is still too vague.

Lesson 3: benchmarking forces product clarity

This part surprised me.

I expected benchmarking to tell me about performance.

What it also did was clarify the product itself.

Once you measure something, you are forced to answer questions like:

What is this product actually optimized for?
Where should it be better?
What trade-offs did I make intentionally?
What should users care about most?

In my case, the benchmark made the positioning much clearer.

GoModel is not just “an AI gateway.”

It is a Go-based, open-source, single-binary gateway designed to be lightweight, simple to deploy, and efficient in the hot path of LLM requests.

Without benchmarking, those are just words.

With benchmarking, they become testable claims.

Lesson 4: benchmarking is also a debugging tool

Before doing this, I mostly thought about benchmarks as something you publish.

That was a mistake.

A benchmark is also one of the fastest ways to find weak spots in your own system.

As soon as you push something under repeatable load, you start noticing where memory grows faster than expected, where latency becomes uneven, and where parts of the system become bottlenecks.

Even if I had never published the results, building the benchmark would still have been worth it.

It gave me a much more honest picture of the system.

And again, AI helps here not by replacing the benchmark, but by helping you move faster once you find a problem. You can ask it to review the script, suggest what might be skewing the result, or help you isolate one part of the test.

My biggest takeaway

The biggest lesson I learned is very simple:

Benchmarking is much more accessible today with AI tools.

You do not need a lab.

You do not need a giant team.

You do not need a perfect methodology.

And now, you also do not need to start from a blank page.

You can just describe what you want to measure, use AI to help generate a first version, and improve it from there.

You still need to think.

You still need to validate the setup.

You still need to be honest about the limits.

But getting started is much easier than it used to be.

Final thought

If you are building infrastructure, developer tools, or performance-sensitive software, I think it is worth benchmarking earlier than you expect.

Not because you need a marketing graph.

Because benchmarking forces clarity.

It helps you understand your product better, find bottlenecks faster, and communicate value more concretely.

And today, with AI, it is easier than ever to start.

That was true for me with benchmarking GoModel, and it is probably true for a lot of other projects too.

If you want to check out the project, GoModel is open-source and available on GitHub:

GOModel on GitHub

There is also a benchmark result published there by the ENTERPILOT startup:

GoModel vs LiteLLM benchmark (March 2026)