DEV Community: Daniel Dong

New on AIBridge: Kimi K3 (1M Context) and GitHub OAuth Login

Daniel Dong — Fri, 24 Jul 2026 01:14:55 +0000

Two things shipped this week:

Kimi K3

A reasoning model that doesn't ask you to turn on "think mode."

1M context window. Always reasoning. No toggle.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)
response = client.chat.completions.create(
    model="kimi-k3",
    messages=[{"role": "user", "content": "Analyze this codebase for security bugs"}],
)
print(response.choices[0].message.content)  # the model's reply text

Drop a repo. Get a review. No prompt engineering.

GitHub OAuth

Registration went from 4 steps to 1 click.

Email → password → verification code? Replaced by: GitHub → authorize → done.

5 seconds. Same 500K free tokens. Same 15 models.

One key. 15 Chinese AI models (DeepSeek, Qwen, GLM, Kimi K3).
Free to start. No credit card.

→ aibridge-api.com/playground.html (try K3, no signup)
→ aibridge-api.com/register.html (GitHub login, one click)

Kimi K3 + GitHub Login — now live on AIBridge

Daniel Dong — Thu, 23 Jul 2026 13:47:47 +0000

Two things just went live on my API gateway:

Kimi K3 — 1M context, always reasoning

No toggle. No "think step by step." It just thinks.

1M context window. 5x Claude's 200K, roughly 8x GPT-4o's 128K.
Drop a whole codebase in one prompt.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)
response = client.chat.completions.create(
    model="kimi-k3",
    messages=[{"role": "user", "content": "Review this entire repo for bugs"}],
)
print(response.choices[0].message.content)  # the model's reply text

GitHub OAuth — one click, no password

Email → password → 6-digit code? Gone.

Now it's: click GitHub → authorize → dashboard.

Registration time: ~5 seconds.

15 Chinese AI models (DeepSeek, Qwen, GLM, Kimi K3).
One OpenAI-compatible endpoint. One API key.

Free 500K tokens/month. Try K3 with no signup:
→ aibridge-api.com/playground.html

Or skip the email form entirely:
→ aibridge-api.com/register.html

Kimi K3 meets GitHub login: two features, one week, zero friction

Daniel Dong — Thu, 23 Jul 2026 02:34:29 +0000

I added two things to my API gateway this week:

1. Kimi K3 — a reasoning model that thinks by default

No toggle. No "please think step by step." No config.
Just send the prompt. K3 reasons through it. Every time.

1M context window. Drop a whole codebase in one prompt.
5x what Claude or o1 gives you. Roughly 8x GPT-4o.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)
response = client.chat.completions.create(
    model="kimi-k3",
    messages=[{"role": "user", "content": "Find every silent failure in this 5,000-line module"}],
)
print(response.choices[0].message.content)  # the model's reply text

2. GitHub OAuth — registration in one click

Our old flow: email → password → verify inbox → type 6-digit code → finally get API key.
Conversion rate: 1.4%.

Now there's a "Continue with GitHub" button on the signup page.
One click → authorize → you're in. Under 5 seconds.

Same 15 models. Same 500K free tokens.
Just no more 6-digit codes to copy-paste.

Try either one free — no signup needed for the playground:
→ aibridge-api.com/playground.html

Or grab a key and skip the email verification entirely:
→ aibridge-api.com/register.html

Most reasoning models let you choose when to think.

Daniel Dong — Wed, 22 Jul 2026 13:17:39 +0000

Most reasoning models let you choose when to think.

Kimi K3 doesn't give you a choice. It always reasons — whether you ask it to or not.

The difference shows up in the weirdest places. Debugging a race condition.
Refactoring a 3,000-line module. Analyzing a log file. Tasks where most models
skim the surface and K3 goes two layers deeper without being told.

1M context window. 5x what Claude or o1 gives you. Roughly 8x GPT-4o.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)
response = client.chat.completions.create(
    model="kimi-k3",
    messages=[{"role": "user", "content": "Find the subtle bug in this codebase"}],
)
print(response.choices[0].message.content)  # the model's reply text

One key. 15 models. K3 just joined the lineup.

Free trial — no signup: aibridge-api.com/playground.html

Most reasoning models make you choose: think mode ON or OFF.

Daniel Dong — Tue, 21 Jul 2026 10:58:11 +0000

Most reasoning models make you choose: think mode ON or OFF.

Kimi K3 doesn't have a toggle. It just thinks. Every time.

I tested four models on the same complex debugging task:

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)

Response:
GPT-4o → quick but shallow, missed the off-by-one
DeepSeek R1 → thorough but 64K context clipped my trace
Claude 3.5 → caught the bug, but needed me to prompt "think step by step"
Kimi K3 → 1M context swallowed the entire 2000-line trace.
Always-on reasoning caught the off-by-one
without me asking for it.

No reasoning_effort setting to tune.
No thinking parameter.
No "please reason carefully" in the system message.

Just send the prompt. K3 thinks. You get the answer.

1M context. Always reasoning. One key alongside 14 other models.

→ aibridge-api.com/playground.html

I asked Kimi K3 to review my entire codebase. Here's what happened.

Daniel Dong — Tue, 21 Jul 2026 03:36:21 +0000

I asked Kimi K3 to review my entire codebase. Here's what happened.

Most AI code reviews work like this:

Paste a function
Get a suggestion
Paste another function
Get another suggestion
Lose context. Miss the big picture.

Kimi K3 works differently. 1M context window.
I dropped 8,000 lines of Python in one prompt.

"Find every place where error handling is silently swallowing exceptions,
cross-reference the call chain, and suggest a consistent fix."

It thought for a moment — the reasoning pass is always on, no toggle needed —
then returned a list of 12 locations, ranked by severity, with the exact
line numbers and proposed diffs.

Not just "add try/except here" — actual refactoring suggestions with
call chain context. The kind of review you'd pay a senior dev for.

The 1M context advantage

Other thinking models (Claude, o1) top out at 200K tokens. K3 gives you
5x that. For codebases, that's the difference between reviewing one module
and reviewing the whole app.

Model	Context Window	Reasoning
Kimi K3	1M tokens	Always on
Claude 3.5 Sonnet	200K	Manual prompt
GPT-4o	128K	Manual prompt
DeepSeek R1	64K	Always on

K3 is the only model on this list that combines always-on reasoning with
a context window large enough for an entire production codebase.
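Before sending a whole repo, it helps to check that it plausibly fits. A minimal sketch, assuming a rough four-characters-per-token rule of thumb (not K3's actual tokenizer) and a hypothetical `pack_files` helper that joins files under path headers so the model can cite locations:

```python
CHARS_PER_TOKEN = 4  # assumption: crude heuristic, not the model's real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_files(files: dict[str, str], budget_tokens: int = 1_000_000):
    """Join files into one prompt, skipping any file that would exceed the budget.

    Returns (prompt, estimated_tokens_used, skipped_paths).
    """
    parts, used, skipped = [], 0, []
    for path in sorted(files):
        block = f"### {path}\n{files[path]}\n"  # path header, then file contents
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            skipped.append(path)
            continue
        parts.append(block)
        used += cost
    return "".join(parts), used, skipped


prompt, used, skipped = pack_files(
    {"a.py": "x" * 36, "b.py": "y" * 400}, budget_tokens=50
)
print(used, skipped)
```

The estimate is only a sanity check; the real token count depends on the model's tokenizer and on how much room the reply needs.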

The catch

It's slower than a non-reasoning model. The thinking pass adds latency.
For "What's 2+2?" it's overkill. For "Where's the race condition in my
3,000-line async pipeline?" — worth every millisecond.

How to try it

K3 is available through AIBridge alongside 14 other Chinese models
(DeepSeek, Qwen, GLM) behind one OpenAI-compatible endpoint:
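The request presumably has the same shape as the snippets in the other posts. A minimal sketch of the payload, where `chat_request` is a hypothetical helper and the prompt text is a placeholder:

```python
import json


def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion payload for the given model."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


k3 = chat_request("kimi-k3", "Review this module for swallowed exceptions")
other = chat_request("deepseek-chat", "Review this module for swallowed exceptions")

# Only the model string differs between the two payloads.
print(json.dumps(k3, indent=2))

# With the openai package installed and a client configured for the gateway:
#   client.chat.completions.create(**k3)
```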

Same code. Different model string. That's the whole migration.

Free 500K tokens/month to test K3 against your own codebase.
No signup needed for the playground:

→ aibridge-api.com/playground.html

Kimi K3: A model that thinks before it answers.

Daniel Dong — Mon, 20 Jul 2026 15:21:55 +0000

Kimi K3: A model that thinks before it answers.

No toggle. No config. Every request runs a reasoning pass first —
then gives you the result.

1M context window. Drop a whole codebase in one prompt.
14 models total. One key. Same code.

Free 500K tokens/month. Try it without signing up:
aibridge-api.com/playground.html

I route 90% of my AI traffic to the cheapest model. Here's why your app should too.

Daniel Dong — Mon, 20 Jul 2026 14:42:18 +0000

Most devs pick one model and send everything to it. That's like
using a sledgehammer to hang a picture frame.

The wake-up call

My app had three types of requests:

"Summarize this email" — 200 tokens in, 100 out
"Is this sentiment positive?" — 150 in, 5 out
"Debug this Python function" — 2000 in, 800 out

All three went to glm-4-plus at $7.10/M tokens.

The sentiment check cost me about $0.0011 per call (155 tokens at $7.10/M).
The same call on deepseek-chat ($0.27/M) would cost about $0.00004.

About 26x cheaper. Same accuracy. I was burning money on a yes/no answer.
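The per-call arithmetic is easy to check. A minimal sketch, assuming input and output tokens are billed at the same flat per-million rate (the post quotes one price per model) and using the token counts from the sentiment example; `call_cost` is a hypothetical helper, not part of any SDK. Working from the unrounded list prices, the ratio comes out near 26x.

```python
def call_cost(tokens_in: int, tokens_out: int, price_per_million: float) -> float:
    """Dollar cost of one call at a flat per-million-token price."""
    return (tokens_in + tokens_out) * price_per_million / 1_000_000


glm = call_cost(150, 5, 7.10)        # sentiment check on glm-4-plus
deepseek = call_cost(150, 5, 0.27)   # same call on deepseek-chat

print(f"glm-4-plus:    ${glm:.6f} per call")
print(f"deepseek-chat: ${deepseek:.6f} per call")
print(f"ratio:         {glm / deepseek:.1f}x")
```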

The routing logic

It took 20 lines of code:

def pick_model(messages, task_type):
    # Cheap model for simple tasks
    if task_type in ("classify", "sentiment", "summarize_short"):
        return "deepseek-chat"          # $0.27/M

    # Mid-tier for general work
    if task_type in ("draft", "translate", "summarize_long"):
        return "qwen-plus"              # ~$0.40/M

    # Power model only when reasoning matters
    if task_type in ("debug", "reason", "analyze"):
        return "glm-4-plus"             # $7.10/M

    # Default: cheapest
    return "deepseek-chat"


# Same endpoint, different model string — no code changes
client.chat.completions.create(
    model=pick_model(msgs, task_type),
    messages=msgs,
)

That's it. One function. One endpoint. My code never needs to know
which provider serves each model; the gateway handles it.

What changed

Metric	Before (all glm-4-plus)	After (routed)
Monthly cost	$180	$34
Avg latency	1.4s	0.6s
Quality complaints	0	0

Quality didn't drop because the cheap model is plenty smart for
90% of requests. It's the long-tail reasoning tasks that need the
expensive brain — and those are only 10% of volume.

The counter-argument I hear
"But what if the cheap model gives a worse answer?"

Test it. Run the same 100 prompts through both models blind.
You'll be surprised how often the $0.27 model matches the $7 one.

The gap between "best model" and "good-enough model" has
collapsed. The gap between "good-enough model" and "your bill"
has not.
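A blind test along those lines can be sketched in a few lines. Everything here is an assumption for illustration: `ask_cheap` and `ask_premium` stand in for calls to the two models, and `judge` is whatever does the grading (a human, a script, or another model) without being told which answer came from where.

```python
import random
from collections import Counter
from typing import Callable, Iterable

Ask = Callable[[str], str]                # prompt -> answer text
Judge = Callable[[str, str, str], str]    # (prompt, left, right) -> "left" | "right" | "tie"


def blind_compare(prompts: Iterable[str], ask_cheap: Ask, ask_premium: Ask,
                  judge: Judge, seed: int = 0) -> Counter:
    """Tally which model the judge prefers when answer order is randomized."""
    rng = random.Random(seed)
    tally = Counter()
    for prompt in prompts:
        answers = [("cheap", ask_cheap(prompt)), ("premium", ask_premium(prompt))]
        rng.shuffle(answers)  # hide which model produced which answer
        (left_name, left), (right_name, right) = answers
        verdict = judge(prompt, left, right)
        if verdict == "left":
            tally[left_name] += 1
        elif verdict == "right":
            tally[right_name] += 1
        else:
            tally["tie"] += 1
    return tally


# Stand-in models that always agree, and a judge that calls identical answers a tie.
same = blind_compare(
    ["p1", "p2", "p3"],
    ask_cheap=lambda p: "positive",
    ask_premium=lambda p: "positive",
    judge=lambda p, left, right: "tie" if left == right else "left",
)
print(same)
```

A high tie count on your own prompts is the signal that the cheap model is good enough for that task type.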

When NOT to route

Creative writing where you want the model's "voice"
Safety-critical medical/legal reasoning
Anything where 95% accuracy isn't acceptable

For everything else — chatbots, classifiers, summarizers, drafters,
translators — route to the cheap model and pocket the difference.

The setup

This only works if switching models is free. With direct provider
accounts, each switch means a new SDK, new auth, new billing.

With an OpenAI-compatible gateway, it's a string change:

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1"
)

# Same call, 14 possible models, one bill
client.chat.completions.create(model="deepseek-chat", messages=msgs)
client.chat.completions.create(model="glm-4-plus", messages=msgs)

Stop paying premium prices for yes/no answers.

→ aibridge-api.com — 14 models, one endpoint, free 500K tokens/month

70% of your AI API calls are duplicates. You're paying for them anyway.

Daniel Dong — Sun, 19 Jul 2026 07:28:45 +0000

I analyzed 3 months of API logs for a production app. The results
were embarrassing.

The data

Type	% of requests	Cost share
Unique prompts	28%	34%
Near-duplicates	41%	52%
Exact duplicates	31%	14%

31% of calls were literally identical — same model, same prompt,
same temperature. Every single one hit the provider and burned
tokens.

41% were "near-duplicates" — same system prompt, slightly different
user input. A 10-word difference on a 500-token context. Wasted.

28%. That's how many of my API calls actually needed to reach the
model.
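Counting exact duplicates in a log takes very little code. A minimal sketch, assuming each log entry is a dict with `model`, `messages`, and `temperature` fields (the real log format will differ); `request_key` and `exact_duplicate_share` are hypothetical helpers. It only catches exact repeats; near-duplicates need fuzzier matching.

```python
import hashlib
import json
from collections import Counter


def request_key(model: str, messages: list, temperature: float) -> str:
    """Stable fingerprint of a request: same model, prompt, and temperature give the same key."""
    payload = json.dumps(
        {"model": model, "messages": messages, "temperature": temperature},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def exact_duplicate_share(log: list) -> float:
    """Fraction of requests that repeat an earlier, identical request."""
    if not log:
        return 0.0
    counts = Counter(
        request_key(r["model"], r["messages"], r["temperature"]) for r in log
    )
    repeats = sum(n - 1 for n in counts.values())  # first occurrence is not a repeat
    return repeats / len(log)


faq = [{"role": "user", "content": "What's your return policy?"}]
log = [
    {"model": "deepseek-chat", "messages": faq, "temperature": 0},
    {"model": "deepseek-chat", "messages": faq, "temperature": 0},
    {"model": "deepseek-chat", "messages": faq, "temperature": 0.7},
    {"model": "qwen-max", "messages": faq, "temperature": 0},
]
print(exact_duplicate_share(log))
```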

The fix that took 5 minutes

AIBridge has response caching built in. Not a separate Redis layer.
Not a middleware I had to deploy. Just... on.

curl https://aibridge-api.com/v1/chat/completions \
  -H "Authorization: Bearer mb-xxx" \
  -H "Content-Type: application/json" \
  -H "X-Cache-TTL: 3600" \
  -d '{"model":"deepseek-chat","messages":[...]}'

Add one header. X-Cache-TTL: 3600. That's it.
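The same request can be built from Python with only the standard library. A minimal sketch, where `cached_chat_request` is a hypothetical helper; it constructs the request without sending it, and the header name and endpoint are taken from the curl example.

```python
import json
import urllib.request

ENDPOINT = "https://aibridge-api.com/v1/chat/completions"


def cached_chat_request(api_key: str, model: str, messages: list,
                        ttl_seconds: int = 3600) -> urllib.request.Request:
    """Build (but do not send) a chat completion request carrying X-Cache-TTL."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "X-Cache-TTL": str(ttl_seconds),
        },
    )


req = cached_chat_request(
    "mb-xxx", "deepseek-chat", [{"role": "user", "content": "What's your return policy?"}]
)
print(req.get_method(), req.full_url)

# To actually send it:
#   with urllib.request.urlopen(req) as resp:
#       data = json.load(resp)
```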

Same prompt within the hour? Returned from cache. Zero tokens.
No provider round trip. Zero cost.

What changed

Before	After
100% of requests hit provider	~30% hit provider
$120/month in API costs	$38/month
Avg latency 1.2s	200ms (cached hits)
Rate limit anxiety	Gone

The $82/month I saved isn't life-changing. But the latency
improvement for cached responses is — users notice 200ms vs 1200ms.
Every single time.

When to use it

Not everything should be cached. Here's my rule:

✅ Static system prompts ("You are a helpful coding assistant")
✅ FAQ-style user prompts ("What's your return policy?")
✅ Deterministic generation (temperature=0, JSON mode)
❌ Creative generation (temperature > 0.5)
❌ Per-user personalized responses
❌ Real-time data queries

For most SaaS apps — chatbots, customer support, documentation
search — the cache hit rate should be above 50%.

For API wrappers and middleware that resell AI access? 70%+ is
realistic. Every cached response is pure margin.

The lesson

Before you spend another dollar on API credits, check your logs
for repeats. You're probably paying for the same answer twice.

If you're on AIBridge, add X-Cache-TTL. If you're not, build a
simple in-memory cache. Either way, stop burning tokens on answers
you already have.
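A simple in-memory cache can look like this. A minimal sketch, not how AIBridge implements caching: `TTLCache` is a hypothetical class, the clock is injectable so expiry is easy to test, and there is no eviction, so a long-running process would need a size limit.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Cache values for ttl_seconds, recomputing them after they expire."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict = {}  # key -> (stored_at, value)

    def get_or_call(self, key: Hashable, fn: Callable[[], Any]) -> Any:
        """Return the cached value for key, or call fn() and cache its result."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fn()
        self._store[key] = (now, value)
        return value


calls = []


def fake_model_call() -> str:
    """Stand-in for a real API call; records that it ran."""
    calls.append(1)
    return "Returns accepted within 30 days."


cache = TTLCache(ttl_seconds=3600)
first = cache.get_or_call("return-policy", fake_model_call)
second = cache.get_or_call("return-policy", fake_model_call)  # served from cache
print(len(calls))
```

In practice the key would be derived from the model name, the messages, and the sampling parameters, so that only truly identical requests share an entry.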

→ aibridge-api.com — 14 models, caching built in, free to start.

I spent $47 testing 4 AI models last month. Here's what I learned.

Daniel Dong — Sat, 18 Jul 2026 13:14:07 +0000

I spent $47 testing 4 AI models last month. Here's what I learned.

Not about the models. About the workflow.

The setup

Same prompt. Same temperature (0.3). Same max_tokens (500).
Four different models through the same endpoint.

Model	Avg Latency	Cost/1M tokens	Best for
deepseek-chat	1.2s	$0.27	General purpose, cheapest
qwen-max	0.8s	$2.80	Multilingual, structured output
glm-4-plus	1.5s	$7.10	Complex Chinese reasoning
moonshot-v1-32k	1.0s	$12.00	Long context summarization

Notice something? The cheapest model is only 0.4s behind the fastest one.
And the most expensive isn't always the best; it depends on
what you're asking.
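A comparison like this is easy to reproduce. A minimal sketch of a latency harness, where `benchmark` and the `ask(model, prompt)` callable are hypothetical; in real use `ask` would wrap `client.chat.completions.create(...)` with `temperature=0.3` and `max_tokens=500`.

```python
import time
from statistics import mean
from typing import Callable, Iterable


def benchmark(ask: Callable[[str, str], object], models: Iterable[str], prompt: str,
              runs: int = 3, timer: Callable[[], float] = time.perf_counter) -> dict:
    """Return the average wall-clock seconds per call for each model."""
    results = {}
    for model in models:
        samples = []
        for _ in range(runs):
            start = timer()
            ask(model, prompt)  # same prompt for every model
            samples.append(timer() - start)
        results[model] = mean(samples)
    return results


def fake_ask(model: str, prompt: str) -> str:
    """Stand-in for a real API call so the sketch runs offline."""
    return f"{model} says hi"


timings = benchmark(fake_ask, ["deepseek-chat", "qwen-max"], "Summarize this email")
for model, seconds in timings.items():
    print(f"{model}: {seconds:.6f}s")
```

Three runs per model is a small sample; network jitter can easily swamp a few hundred milliseconds of difference.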

The discovery that made me angry

I was using 4 different API keys. Four pip install commands.
Four different create() signatures. Four dashboards to check
at the end of the month.

Then I realized: they're all OpenAI-compatible.

All of them. Every single one.

So I swapped 4 base_url strings for 1. I changed nothing else.
Same openai library. Same code. Same response parsing.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)

# DeepSeek
client.chat.completions.create(model="deepseek-chat", messages=msgs)

# Qwen: literally the same call
client.chat.completions.create(model="qwen-max", messages=msgs)

What I actually gained

Not just cleaner code. Three things I didn't expect:

1. Free exploration. The free tier (500K tokens/month) means I
   can test models without committing to a paid account.
2. Model routing. My app now picks the model based on the task
   at runtime: cheap model for summarization, powerful model for
   reasoning. One endpoint makes this trivial.
3. One bill. I don't log into 4 dashboards anymore. I don't
   worry about 4 separate rate limits. I don't get 4 emails when
   my card expires.

The rule I follow now

If you're building anything that calls an AI model, abstract the
provider layer on day one. Not day 30 when DeepSeek rate-limits
you and you're scrambling to wire up Qwen before users notice.

The abstraction pays for itself the first time you switch.
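With the provider behind one function, a fallback is a loop. A minimal sketch, where `complete_with_fallback` and the `ask(model, messages)` callable are hypothetical; real code would catch the SDK's rate-limit and timeout errors specifically rather than every exception.

```python
from typing import Callable, Sequence


def complete_with_fallback(ask: Callable[[str, list], str],
                           models: Sequence[str], messages: list):
    """Try each model in order; return (model, answer) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, ask(model, messages)
        except Exception as exc:  # assumption: broad catch, for illustration only
            errors.append((model, exc))
    raise RuntimeError(f"all models failed: {errors}")


def flaky_ask(model: str, messages: list) -> str:
    """Stand-in for a real call: the first model is rate-limited."""
    if model == "deepseek-chat":
        raise TimeoutError("rate limited")
    return f"answer from {model}"


used, answer = complete_with_fallback(
    flaky_ask, ["deepseek-chat", "qwen-max"], [{"role": "user", "content": "hello"}]
)
print(used, "->", answer)
```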

→ aibridge-api.com — 14 models, one endpoint, free to start.

I built a working AI chatbot in 50 lines of HTML.

Daniel Dong — Fri, 17 Jul 2026 11:17:30 +0000

I built a working AI chatbot in 50 lines of HTML.

No backend. No build step. No framework.

One HTML file. One API key. Open it in your browser.
It works.

Here's the code:

<!DOCTYPE html>
<html>
<head><meta charset="UTF-8"><title>Minimal Chat</title></head>
<body>
  <div id="chat"></div>
  <input id="msg" placeholder="Ask anything..."
         style="width:300px;padding:8px">
  <button onclick="send()">Send</button>

  <script>
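    // Anyone who opens this file can read the key; keep it local, don't publish it.
    const KEY = "mb-your-api-key-here";

    function append(role, text) {
      const div = document.createElement("div");
      const label = document.createElement("b");
      label.textContent = `${role}: `;
      // Strings passed to append() become text nodes, so replies are never parsed as HTML.
      div.append(label, text);
      document.getElementById("chat").appendChild(div);
    }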

    async function send() {
      const input = document.getElementById("msg");
      const prompt = input.value; input.value = "";
      append("You", prompt);

      const res = await fetch(
        "https://aibridge-api.com/v1/chat/completions",
        {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "Authorization": `Bearer ${KEY}`,
          },
          body: JSON.stringify({
            model: "deepseek-chat",
            messages: [{ role: "user", content: prompt }],
          }),
        }
      );
      if (!res.ok) { append("Error", `HTTP ${res.status}`); return; }
      const data = await res.json();
      append("AI", data.choices[0].message.content);
    }
  </script>
</body>
</html>

No npm install. No Express server. No .env files.
Save as chat.html, double-click, start talking.

Want streaming? Add "stream": true to the body and read res.body as a stream of server-sent events instead of calling res.json().
Want a different model? Change "deepseek-chat" to "qwen-max".

That's the power of a truly OpenAI-compatible API.

14 models. One endpoint. Free tier is 500K tokens/month.

→ aibridge-api.com

I tried switching my app from OpenAI to DeepSeek.

Daniel Dong — Thu, 16 Jul 2026 14:17:38 +0000

I tried switching my app from OpenAI to DeepSeek.

Here's what I had to learn: nothing.

import openai

client = openai.OpenAI(
    api_key="mb-xxx",
    base_url="https://aibridge-api.com/v1",
)
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "hello"}],
)
print(response.choices[0].message.content)  # the model's reply text

Same library. Same function signatures. Same response format.
Just a different model string.

That's the whole migration. Zero new SDKs. Zero new documentation.
Zero hours spent learning another provider's quirks.

14 models (DeepSeek, Qwen, GLM, Moonshot) behind the same interface
you're already using.

Free tier: 500K tokens/month. No credit card.

aibridge-api.com