DEV Community: LaoWuuu

401 Unauthorized: The API Error That's Easier to Fix Than You Think

LaoWuuu — Sat, 20 Jun 2026 05:33:59 +0000

If you've ever stared at a 401 error wondering what went wrong, you're not alone. It's one of the most common API errors, and unlike 429 rate limits (which I covered last time), 401 is almost always your fault. That's actually good news because it means you can fix it.

A 401 error means the server doesn't recognize your credentials. It's not saying "go away" like a 403 Forbidden. It's saying "I don't know who you are." The difference matters. With 403, you're authenticated but not authorized. With 401, you're not authenticated at all.

The most common cause is a typo in your API key. I've seen developers spend hours debugging their code only to find a trailing space in their key. Copy-paste is your enemy here. Some terminals add invisible characters when you copy. Always trim your keys and check for whitespace.
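
A minimal sketch of that trimming step, assuming Python. The helper names `clean_api_key` and `describe_key_problems` are made up for illustration and are not part of any SDK. The idea is to strip ordinary whitespace plus the zero-width characters that survive a copy-paste, and to report what was wrong so you know whether the stored key needs fixing.

```python
import unicodedata


def clean_api_key(raw: str) -> str:
    """Return the key without surrounding whitespace or invisible characters."""
    # Unicode category "Cf" covers format characters such as U+200B
    # (zero-width space) and U+FEFF (byte order mark).
    visible = "".join(ch for ch in raw if unicodedata.category(ch) != "Cf")
    return visible.strip()


def describe_key_problems(raw: str) -> list[str]:
    """List the whitespace problems found in a raw key string."""
    problems = []
    if raw != raw.strip():
        problems.append("leading or trailing whitespace")
    if any(unicodedata.category(ch) == "Cf" for ch in raw):
        problems.append("invisible format characters")
    if any(ch.isspace() for ch in raw.strip()):
        problems.append("whitespace inside the key")
    return problems
```

Running `describe_key_problems` on the value your app actually loads (not the one you see in the dashboard) is usually enough to confirm or rule out this cause.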

Another frequent culprit is expired keys. Most providers don't send you a friendly reminder when your key expires. They just start returning 401. If your code worked yesterday and fails today, check the key's expiration date first.

Environment variables are another trap. You update the key in .env and open a fresh terminal, but the process that is actually serving requests is still running with the old environment. This is especially common with Docker, where a container keeps the environment it was created with (or the one baked into the image at build time) until you recreate it.

The sneakiest 401 error comes from incorrect headers. Some APIs expect "Authorization: Bearer sk-..." while others want just "Authorization: sk-..." or even a custom header like "X-API-Key". Read the documentation carefully. I once spent two hours debugging a 401 only to discover I was using the wrong header format for that particular provider.
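
The three header shapes mentioned above can be sketched as one small function. This is an illustration only: `build_auth_headers` and its `style` values are hypothetical names, and which style a given provider wants is something only its documentation can tell you.

```python
def build_auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Build the auth header for one of three common API key conventions."""
    key = api_key.strip()
    if style == "bearer":
        # "Authorization: Bearer sk-..."
        return {"Authorization": f"Bearer {key}"}
    if style == "raw":
        # "Authorization: sk-..." with no scheme prefix
        return {"Authorization": key}
    if style == "x-api-key":
        # Custom header carrying the bare key
        return {"X-API-Key": key}
    raise ValueError(f"unknown auth style: {style!r}")
```

Keeping the style as an explicit setting per provider makes a wrong header format show up in config review instead of in a two-hour debugging session.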

If you're using a gateway or proxy, the 401 might not be from the final API but from the gateway itself. Check both layers. The gateway might have its own authentication that's separate from the upstream API.

Here's my debugging checklist when I hit a 401: First, print the exact key being sent (redacted, of course). Second, verify the key hasn't expired. Third, check the header format. Fourth, test the key directly with curl outside your code. Fifth, check if you're behind a proxy that's stripping or modifying headers.
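
For the first step of that checklist, here is a rough sketch of how to log a key without leaking it. `redact_key` and `key_fingerprint` are hypothetical helpers. The fingerprint is just a truncated SHA-256, which lets you compare "the key my process loaded" against "the key I think I configured" without printing either.

```python
import hashlib


def redact_key(key: str, visible: int = 4) -> str:
    """Show only the first and last few characters, plus the length."""
    if len(key) <= visible * 2:
        # Too short to reveal anything safely.
        return "*" * len(key)
    return f"{key[:visible]}...{key[-visible:]} (len={len(key)})"


def key_fingerprint(key: str) -> str:
    """Short, non-reversible identifier for comparing two key values."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
```

The length in the redacted output is the useful part: a key that is one character longer than expected usually means a stray space or newline.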

The fix is usually simple once you find the cause. Regenerate the key if it's expired. Trim whitespace. Use the correct header format. Update your environment variables. The hard part is finding which of these it is.

Unlike 429 errors where you need to implement backoff and rate limiting, 401 errors are binary. Either your credentials work or they don't. There's no retry strategy that will fix a 401. You need to fix the credentials themselves.

429 rate limit errors: why they happen and what to do about them

LaoWuuu — Sun, 14 Jun 2026 15:58:58 +0000

You're running a batch job, hitting the API in a loop, and suddenly everything stops. No error message in your app, just silence. Or maybe you see a 429 status code and have no idea what it means or how long to wait.

429 means "Too Many Requests." The server is telling you to slow down. Every API provider has rate limits — OpenAI, Anthropic, DeepSeek, all of them. The limits vary by plan, by model, and sometimes by time of day when the service is under heavy load.

There are two types of limits most providers enforce. Requests per minute (RPM) controls how many API calls you can make in a 60-second window. Tokens per minute (TPM) controls how much text you can process in that same window. You might hit either one depending on your usage pattern. A loop that sends 100 small requests fast will hit RPM. A single request with a huge prompt might hit TPM.
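
To make the RPM/TPM distinction concrete, here is a client-side sketch that tracks both over a rolling 60 seconds. It is only an approximation: providers compute their windows in their own way, and `MinuteBudget` and the limits you pass in are assumptions for illustration.

```python
import time
from collections import deque
from typing import Optional


class MinuteBudget:
    """Client-side estimate of requests and tokens used in the last 60 seconds."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) per recorded request

    def _prune(self) -> None:
        # Forget requests that are 60 seconds old or older.
        cutoff = self.clock() - 60.0
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def would_exceed(self, tokens: int) -> Optional[str]:
        """Return "rpm" or "tpm" if one more request would cross a limit."""
        self._prune()
        if len(self.events) + 1 > self.rpm:
            return "rpm"
        if sum(t for _, t in self.events) + tokens > self.tpm:
            return "tpm"
        return None

    def record(self, tokens: int) -> None:
        self.events.append((self.clock(), tokens))
```

A loop of small requests trips the `"rpm"` branch first, while one oversized prompt trips `"tpm"` even when it is the only request in the window.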

When you get a 429, the response usually includes a Retry-After header. This tells you how long to wait before trying again, usually as a number of seconds (the HTTP spec also allows a date). If the header isn't there, a safe default is to wait 60 seconds. Don't immediately retry; that makes the problem worse. The server already told you to back off, and hammering it again just extends your penalty.

If you're building an application that calls the API, implement retry logic with exponential backoff. Wait 1 second, then 2, then 4, then 8. Most HTTP libraries have this built in or as a plugin. Don't just wrap the call in a while loop with no delay — that's how you turn a temporary rate limit into a permanent ban.
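
The 1, 2, 4, 8 schedule can be sketched in a few lines. This is a bare-bones illustration, not a replacement for your HTTP library's retry support. `RateLimited` is a hypothetical exception your own client code would raise on a 429, and `retry_after` is assumed to already be parsed into seconds.

```python
import time


class RateLimited(Exception):
    """Raised by your own client wrapper when the server answers 429."""

    def __init__(self, retry_after=None):
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def backoff_delay(attempt: int, retry_after=None, base=1.0, cap=60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # The server's own hint wins over our schedule.
        return float(retry_after)
    return min(cap, base * (2 ** attempt))  # 1, 2, 4, 8, ... capped


def call_with_backoff(func, max_attempts=5, sleep=time.sleep):
    """Call func(), sleeping between attempts while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimited as exc:
            if attempt == max_attempts - 1:
                raise  # give up instead of looping forever
            sleep(backoff_delay(attempt, exc.retry_after))
```

The `max_attempts` cap matters as much as the delay: it is what stops a rate limit from turning into an endless retry loop.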

For teams or projects that need higher limits, there are a few options. Upgrade your plan — most providers offer higher tiers with more generous limits. Spread requests across multiple keys if the provider allows it. Use a gateway that can distribute load across multiple upstream keys automatically.

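One thing that trips people up: whether a rate limit applies per key or per account depends on the provider. With some providers and gateways each key has its own limit, so 3 keys on the same account means 3 separate budgets. Others enforce limits at the account or organization level, so extra keys buy you nothing, and some have account-level limits that are lower than the sum of your individual keys. Check the docs.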

Another gotcha: some providers count failed requests against your rate limit. If your request is malformed and returns a 400, that still counts as a request. Fix your request format before retrying, or you'll burn through your limit on errors.

If you're hitting rate limits consistently, it's worth checking if your usage pattern can be optimized. Batch multiple messages into one request instead of sending them individually. Use streaming to get partial results faster instead of waiting for the full response. Cache results where possible — if you're asking the same question multiple times, store the answer.
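
The caching idea is the easiest of the three to sketch. This is an in-memory example under simple assumptions: identical model plus identical messages means the stored answer is acceptable. `ResponseCache` is a hypothetical name, and `call` stands in for whatever function actually hits the API.

```python
import hashlib
import json


class ResponseCache:
    """Remember answers for identical (model, messages) requests."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model: str, messages: list) -> str:
        # Serialize deterministically so equal requests hash equally.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model: str, messages: list, call):
        """Return the cached answer, calling `call(model, messages)` only on a miss."""
        key = self._key(model, messages)
        if key not in self._store:
            self._store[key] = call(model, messages)
        return self._store[key]
```

This only makes sense for requests where a repeated answer is fine; anything that depends on sampling variety or fresh data should skip the cache.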

Rate limits are annoying but they exist for a reason. Without them, a single bad actor could monopolize the service and everyone else suffers. The key is building your application to handle them gracefully instead of treating them as unexpected errors.

Your API key might already be leaked. Here's how to check.

LaoWuuu — Sat, 13 Jun 2026 15:59:14 +0000

Most developers don't think about API key security until something goes wrong. A surprise bill, a rate limit you didn't trigger, or worse — someone using your key to run inference on models you never touched.

Here's the uncomfortable truth: if you've ever committed a .env file to a public repo, pasted a key in a Slack channel, or shared it in a support ticket, your key is probably out there. GitHub scans for secrets, but it doesn't catch everything. And once a key is in a public commit history, even if you delete the repo, it's already been scraped.

The first thing to do is check if your key has been exposed. Search your GitHub repos for your key prefix. Most API providers use a prefix that identifies the service — OpenAI keys start with "sk-", Anthropic with "sk-ant-", DeepSeek with "sk-". Run a search across all your repos, including forks and gists.
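
For a local version of that search, here is a rough sketch that looks for "sk-" style strings in a checked-out directory. The pattern (at least 20 key-like characters after the prefix) is an assumption, not any provider's official format, and `find_key_like_strings` and `scan_tree` are hypothetical helpers. It only sees the working tree, not the commit history, so a clean result here does not prove the key was never committed.

```python
import re
from pathlib import Path

# Assumed shape: "sk-" followed by 20+ letters, digits, "_" or "-".
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")


def find_key_like_strings(text: str) -> list[str]:
    """Return every substring that looks like an sk- style API key."""
    return [m.group(0) for m in KEY_PATTERN.finditer(text)]


def scan_tree(root: str) -> dict[str, list[str]]:
    """Scan all files under root (skipping .git) and map path -> matches."""
    hits = {}
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        found = find_key_like_strings(text)
        if found:
            hits[str(path)] = found
    return hits
```

Treat any hit as a reason to rotate first and investigate second.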

If you find a match, rotate the key immediately. Don't just delete the file — the commit history still has it. Generate a new key and delete the old one from the provider's dashboard.

Beyond checking for leaks, here are some habits that help. Never hardcode keys in source code. Use environment variables or a secrets manager. If you're in a team, use a shared vault instead of passing keys around in chat. Set spending limits on your API accounts so a leaked key can't rack up a huge bill before you notice.
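
As a small illustration of the "environment variable, not source code" habit: the variable name `LLM_API_KEY` and the helper `load_api_key` below are placeholders, so use whatever name your own setup defines. Failing loudly at startup beats sending an empty key and chasing a 401 later.

```python
import os


def load_api_key(var_name: str = "LLM_API_KEY", env=None) -> str:
    """Read the API key from the environment and refuse to run without it."""
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if not value:
        raise RuntimeError(
            f"{var_name} is not set; export it or configure your secrets manager"
        )
    return value
```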

For teams managing multiple keys across multiple models, a gateway adds another layer of control. Instead of distributing individual provider keys to every developer, you give them one gateway key. If someone leaves the team or a key gets compromised, you only need to rotate one key instead of tracking down every place a provider key was used.

The worst feeling is finding out your key was leaked because of an unexpected bill. Check your repos today — takes five minutes and could save you a lot of trouble.

Stop Sharing One API Key Across All Your AI Tools

LaoWuuu — Wed, 10 Jun 2026 15:37:11 +0000

A mistake I keep seeing with AI tools is simple:

One developer creates one API key, then uses it everywhere.

Cursor.

Open WebUI.

Cherry Studio.

A local script.

A prototype SaaS app.

Maybe even a teammate's machine.

At first, this feels convenient. One key, one balance, one endpoint. Done.

But once usage grows, this becomes one of the fastest ways to lose control of your AI costs.

One Key Means No Visibility
If everything uses the same key, your logs become noisy.

You may know that your balance dropped, but you do not know why.

Was it Cursor doing long codebase edits?

Was it Open WebUI running long conversations?

Was it a test script stuck in a loop?

Was it a teammate experimenting with a larger model?

Was the key leaked somewhere?

With one shared key, all of these look like the same user.

That is fine for a five-minute test. It is terrible for anything you plan to keep using.

Cost Control Starts With Separation
The simplest rule is:

Create one API key per tool or project.

For example:

cursor-dev
open-webui-team
cherry-studio-personal
backend-staging
backend-production

Then set a limit for each key.

Cursor should not be able to burn the same balance as your production backend.

A local experiment should not share the same key as your customer-facing app.

A teammate's client should not be able to affect your main service.

This is not overengineering. It is basic cost isolation.
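
If your provider or gateway does not enforce per-key limits for you, the same isolation can be approximated in your own code. A toy sketch: the dollar amounts are invented for illustration, and `KeyBudget` is a hypothetical class, not a feature of any gateway.

```python
from dataclasses import dataclass


@dataclass
class KeyBudget:
    """Spending limit and running total for one named API key."""

    name: str
    monthly_limit_usd: float
    spent_usd: float = 0.0

    def can_spend(self, amount_usd: float) -> bool:
        return self.spent_usd + amount_usd <= self.monthly_limit_usd

    def charge(self, amount_usd: float) -> None:
        """Record spend, refusing anything that would cross the limit."""
        if not self.can_spend(amount_usd):
            raise RuntimeError(f"key {self.name!r} would exceed its limit")
        self.spent_usd += amount_usd


# One budget per key name; limits here are made-up example values.
BUDGETS = {
    "cursor-dev": KeyBudget("cursor-dev", 20.0),
    "backend-staging": KeyBudget("backend-staging", 5.0),
    "backend-production": KeyBudget("backend-production", 200.0),
}
```

With this shape, a runaway experiment on `cursor-dev` stops at its own limit and never touches the production budget.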

The Real Problem Is Not The Model
When developers talk about AI API cost, they often focus on model pricing.

That matters, but it is only part of the story.

Your bill is also affected by:

long context windows
repeated retries
verbose outputs
background agents
accidental loops
expensive models used for simple tasks
multiple tools sharing the same key
no daily or monthly spending limit

A cheap model can still become expensive if a tool sends too much context too often.

A powerful model can be reasonable if you only use it for the right tasks.

The key is not "always use the cheapest model."

The key is "know which tool is spending what."
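
Once each tool has its own key, "which tool is spending what" becomes a simple aggregation over usage logs. A sketch, assuming each log record is a dict with `key_name` and `cost_usd` fields. Those field names are made up, so adapt them to whatever your provider or gateway exports.

```python
from collections import defaultdict


def spend_by_key(records):
    """Total cost per key name, highest spender first."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["key_name"]] += rec["cost_usd"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

With a single shared key this function would return one row, which is exactly the visibility problem described above.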

Debugging Also Gets Easier
Separate keys are useful even when money is not the issue.

If a request fails with 401, 404, 429, or timeout errors, you can narrow the problem faster.

With separate keys, you can ask:

Is this only happening in Cursor?
Is Open WebUI using the wrong model name?
Did this specific key hit a quota limit?
Is one tool sending requests too frequently?
Did I accidentally disable the wrong key?
Is production healthy while staging is failing?

Without separation, you are guessing.

Key Leaks Become Less Dangerous
API keys leak more often than people admit.

They end up in:

frontend code
GitHub commits
screenshots
shared config files
browser extensions
team chats
old local projects

If one shared key leaks, everything is exposed.

If a limited tool-specific key leaks, you can disable it quickly and limit the damage.

That is why I prefer API keys with:

clear names
spending limits
usage logs
expiration dates when possible
separate keys for each environment
separate keys for each client or tool

A Practical Setup
For small teams or solo developers, I would start with this:

One key per AI client
One key per backend environment
Low limits for experiments
Higher limits only for production
Check logs before increasing limits
Disable unused keys

If you are using an OpenAI-compatible gateway, make this your default workflow.

A custom endpoint might look like:

https://your-domain/v1

For my own testing, I use AI OpenCloud as an OpenAI-compatible multi-model gateway:

https://aiopencloud.xyz/v1

The important part is not the endpoint itself. The important part is having key-level visibility and limits.

Final Thought
AI tools are becoming more powerful, but also easier to overuse.

A year ago, the main question was:

"Which model should I use?"

Now the better question is:

"Which tool is spending money, and can I stop it quickly if something goes wrong?"

If you cannot answer that, your API key setup is too loose.
Start by splitting your keys.
It is one of the simplest changes you can make, and it pays off the first time something misbehaves.

What I see running an AI API gateway (and why the rsync/Claude debate misses the point)

LaoWuuu — Tue, 09 Jun 2026 14:58:40 +0000

There's a post on HN right now asking whether Claude introduced bugs into rsync. 475 comments, mostly arguing about whether AI code is trustworthy. It got me thinking about what I see from the other side of the API gateway.

My last post was about troubleshooting model-list-works-but-chat-fails. That's the infrastructure side. Today I want to talk about what people are actually sending through.

I run AIOpenCloud, a small API gateway. Not huge traffic, but enough to see patterns.

About 60% of requests are coding-related. The rest is writing, analysis, random stuff. But the interesting part isn't the volume. It's how differently people use the same models.

Some users have built entire review workflows around their AI coding setup. They'll send a prompt, get the response, then immediately send a second request that says "review the above code for bugs." Two API calls per task. Their code probably has fewer bugs than most human-written code, because they've automated the paranoia.

Other users paste directly. No review, no second pass. Copy, paste, ship. I know this because I can see the request patterns. One request, long pause, done.

The rsync analysis everyone's debating on HN is interesting, but it's looking at one specific case: AI contributing to a decades-old C codebase with strict correctness requirements. That's about the hardest scenario there is. Most people aren't writing rsync. They're writing a script to parse CSV files, or a CRUD endpoint, or a test suite.

For those tasks, AI code with human review beats human code with no review. And that's what I see in the usage data. The users who spend more (meaning they're using it heavily) tend to have review patterns built in. The drop-in-and-out users don't.

I'm not saying AI code is safe. I'm saying the risk is mostly in the workflow, not the model. The rsync case is a cautionary tale about dropping AI suggestions into code you don't fully understand. But for most developers writing most code, the question isn't "is AI code perfect." It's "is my review process good enough."

The models keep getting better. The review habits don't automatically improve with them.

Last post: troubleshooting why chat requests fail when model lists work. Next up: what actually happens when you switch models mid-conversation (hint: it's messier than you'd think).

What's your review process for AI-generated code? Do you do a second pass, or trust and ship?

Running your own AI gateway? Try AIOpenCloud — $8.88 free, no credit card.

API gateway troubleshooting: model list works but chat requests fail

LaoWuuu — Mon, 08 Jun 2026 16:09:11 +0000

Got a weird one that keeps coming up. You set up your API client — Cursor, Open WebUI, Cherry Studio, whatever — point it at your gateway, hit test connection, and it passes. Model list loads fine. Then you actually try to chat and... timeout. Or 400. Or just silence.

The model list working means the URL and key are correct. So what breaks between "can list models" and "can actually generate text"?

Here are the five things I keep seeing.

Model name mismatch. The /models endpoint returns what the gateway registered, which might not match what the upstream actually expects. Your gateway calls it "gpt4", upstream wants "gpt-4o". Request goes through, upstream says "I don't know this model", you get a 400 or 404. Quick test: curl the chat endpoint directly with the exact model name you're using. If it fails, check what the upstream actually expects.
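
A quick way to catch a name mismatch before sending a chat request is to compare the name you plan to use against the list you got back from /models. This sketch uses Python's standard `difflib` for the "did you mean" part. `check_model_name` is a hypothetical helper, and the 0.5 similarity cutoff is an arbitrary choice.

```python
import difflib


def check_model_name(requested: str, available: list[str]) -> tuple[bool, list[str]]:
    """Return (is_exact_match, close_suggestions) for a model name."""
    if requested in available:
        return True, []
    # Suggest up to three similarly spelled names from the available list.
    suggestions = difflib.get_close_matches(requested, available, n=3, cutoff=0.5)
    return False, suggestions
```

Note that this only checks the gateway's own list. If the gateway maps names to different upstream names, the upstream can still reject a name that passes this check.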

Streaming breaks things. Most clients default to stream:true. Some gateways handle streaming poorly — buffers don't flush, SSE format gets mangled, chunks arrive incomplete. The client chokes and either hangs or drops the connection. Test: turn off streaming in the client settings. If it suddenly works, that's your problem.

Timeout too short. LLM inference takes time. Long prompts, large models, high server load — all push response times up. If your client or gateway times out at 10 seconds, longer requests will fail even though the server is still working on them. The tell: requests fail at a consistent time boundary (10s, 30s, 60s). Fix: find the timeout setting and bump it to 60s or more.
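
The "consistent time boundary" tell is easy to check if you log how long each failed request took. A rough sketch: `looks_like_timeout` is a hypothetical helper, and the boundaries, the half-second tolerance and the 80% share are all arbitrary assumptions.

```python
def looks_like_timeout(failure_durations_s, boundaries=(10, 30, 60),
                       tolerance=0.5, min_share=0.8):
    """Return the boundary (in seconds) most failures cluster around, or None."""
    if not failure_durations_s:
        return None
    for boundary in boundaries:
        # Count failures that ended within `tolerance` seconds of this boundary.
        near = sum(1 for d in failure_durations_s if abs(d - boundary) <= tolerance)
        if near / len(failure_durations_s) >= min_share:
            return boundary
    return None
```

If it returns a number, look for a setting with that value in the client, the gateway and any reverse proxy in between.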

Request body incompatibility. OpenAI's API format is the de facto standard, but details vary. tool_choice, response_format, function calling — some upstreams don't support all fields. Your client auto-includes these params, upstream rejects the whole request with a 400. Debug: grab the actual JSON your client sends and strip fields one by one until it works.
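
That bisecting can be automated. The sketch below is a variation on stripping fields one by one: it starts from the required fields only, then adds each optional field back on its own to see which ones the upstream rejects. `find_rejected_fields` is a hypothetical helper, and `send` is assumed to be your own function that posts a payload and returns True when the request is accepted.

```python
def find_rejected_fields(payload: dict, send, required=("model", "messages")) -> list[str]:
    """Return the optional fields that make `send` reject the request."""
    if send(payload):
        return []  # nothing to debug
    base = {k: v for k, v in payload.items() if k in required}
    if not send(base):
        raise RuntimeError("request is rejected even with only the required fields")
    rejected = []
    for field in payload:
        if field in required:
            continue
        # Try the minimal payload plus this single optional field.
        candidate = dict(base)
        candidate[field] = payload[field]
        if not send(candidate):
            rejected.append(field)
    return rejected
```

Testing fields individually will miss problems that only appear when two fields are combined, so treat an empty result after a failing full payload as a hint to look at combinations.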

Rate limiting hidden from the UI. 429 errors sometimes get swallowed by the client. No error shown, just empty responses or infinite retries. Shared keys, free tier accounts, and low-tier API plans hit concurrency limits fast. Check: curl the same request manually and look at the HTTP status code. If it's 429, check the Retry-After header.

The universal debugging step: bypass the client and curl the upstream directly. If curl works, the problem is in your client or gateway config. If curl fails, it's upstream or the key itself. This step is non-negotiable — saves hours of guessing.
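
If you would rather script that check than type curl by hand, here is an equivalent sketch using only the Python standard library. It assumes an OpenAI-compatible `/chat/completions` path and Bearer auth, which may not match your upstream. `build_chat_request` and `send_and_report` are hypothetical helpers.

```python
import json
import urllib.error
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a minimal, non-streaming chat request with no optional fields."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # rule out streaming problems
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_and_report(request: urllib.request.Request, timeout_s: float = 120.0):
    """Send the request and return (status_code, Retry-After header or None)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as resp:
            return resp.status, resp.headers.get("Retry-After")
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the status is still useful.
        return err.code, err.headers.get("Retry-After")
```

Run it once against the gateway URL and once against the upstream URL with the same model and prompt; the pair of status codes tells you which layer to dig into.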

What other weird failures have you hit with API gateways? Curious what edge cases people have run into.

"Fine-tuning an LLM to write docs" made me realize why I need a gateway

LaoWuuu — Fri, 05 Jun 2026 14:33:44 +0000

Saw a post on HN today about fine-tuning an LLM to write documentation like it's 1995. The idea is neat — train a model on your old docs, get new ones in the same style. But it got me thinking about the infrastructure side of things.

If you're fine-tuning, you're probably touching at least three different APIs. OpenAI for GPT-4 base, maybe Claude for comparison runs, DeepSeek for the cheap bulk processing. Each one has its own billing cycle, its own key management, its own rate limits.

I used to keep a spreadsheet. Seriously. Five API keys, five dashboards, five invoices at the end of the month. The worst part wasn't even the money — it was the context switching. You're in the middle of debugging a prompt, and suddenly you hit a rate limit on one provider. Now you're scrambling to switch keys mid-thought.

That's what pushed me to build AIOpenCloud. One key, one bill, models behind it just work. DeepSeek v4 for the heavy lifting, GPT-4 when I need it, Claude for the nuanced stuff. The routing happens at the gateway level — I don't think about it anymore.

The fine-tuning post also reminded me: when you're experimenting with multiple models, the cost adds up fast. Having a single gateway with transparent pricing means I can actually track what's happening. No surprise bills on the 1st.

If you're running multi-model workflows, how do you manage the keys? Still got that spreadsheet?

→ aiopencloud.xyz

5-minute setup

LaoWuuu — Wed, 03 Jun 2026 10:02:26 +0000

Wrote about why I picked New API — got asked "ok but how do I actually use it?" Fair. Here's the 5-min setup. One key, all models.

→ aiopencloud.xyz?utm_source=devto

5 minutes to your first API call with AIOpenCloud

LaoWuuu — Wed, 03 Jun 2026 10:02:03 +0000

A couple weeks ago I wrote about why I chose New API as my gateway backend. Since then the most common question has been: "ok, how do I actually use it?" So here's the 5-minute setup.

If you're managing multiple API keys for different AI models, you know the pain. One key for OpenAI, another for Claude, a third for DeepSeek. Different billing dashboards, different SDKs, different endpoints.

AIOpenCloud wraps them all behind one endpoint. Here's how to start using it in 5 minutes.

Step 1: Get your key
Sign up at aiopencloud.xyz. You get $8.88 free credit — no credit card needed.

Step 2: Install the client (optional)
You don't actually need a client. Any OpenAI-compatible SDK works. But for a quick test:

pip install openai

Step 3: Make your first call

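from openai import OpenAI

# Point the standard OpenAI SDK at the gateway instead of the default endpoint.
client = OpenAI(
    base_url="https://aiopencloud.xyz/v1",
    api_key="your-key-here",  # placeholder; load the real key from an environment variable
)

# Same chat completions call as with any OpenAI-compatible API; only the model name changes.
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Hello, what models do you support?"}]
)

# Print the text of the first choice in the response.
print(response.choices[0].message.content)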

That's it. One endpoint. One key. All models.

What you get:

DeepSeek, GPT, Claude, Gemini, Qwen — all through the same API
One dashboard for usage and billing
No lock-in — your key works with any OpenAI SDK

Pricing example

Model	Input (per 1M tokens)
DeepSeek V4 Flash	$0.16
GPT-5	$6.00
Claude Sonnet 4	$3.60
Gemini Flash	$0.60

Full pricing comparison coming soon at aiopencloud.xyz/pricing-vs

What's your setup for managing multiple AI models? Still juggling separate keys, or found a workflow that works? I'd love to hear what others are doing — drop a comment.

Next up: a breakdown of what models people actually use vs what they think they need. Follow if you want the real numbers.

https://aiopencloud.xyz?utm_source=devto

one week

LaoWuuu — Tue, 02 Jun 2026 10:28:12 +0000

Everyone's split on whether the stock market can swallow Anthropic and OpenAI. Meanwhile I'm running a one-man AI gateway. 12 users, a monitoring alarm that fires on every Cloudflare hiccup, and zero revenue. People are using it though. That counts.

One week of running my own AI gateway

LaoWuuu — Tue, 02 Jun 2026 10:27:49 +0000

There's a thread on HN today asking whether the stock market can swallow Anthropic, SpaceX, and OpenAI. Big money circling big names. Meanwhile I hit one week running a side-project AI gateway, and the contrast is something.

The first 48 hours were silent. I kept refreshing the dashboard. Nobody. Around 3am on day two a user registered and immediately tested Claude. Complained about latency. I switched upstreams and it worked. That was when it started to feel real.

My monitoring setup was the biggest headache. Fired 17 times in one night. Every alert was a Cloudflare hiccup, not an actual outage. Took a few rounds of retries and a cooldown before I could trust it.
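
The post does not show the actual monitoring config, so the following is only a generic sketch of the "retries plus cooldown" idea: alert only after several consecutive failed checks, then stay quiet for a while. `AlertGate`, the three-failure threshold and the ten-minute cooldown are all assumptions.

```python
import time


class AlertGate:
    """Fire only after N consecutive failed checks, then respect a cooldown."""

    def __init__(self, failures_needed=3, cooldown_s=600.0, clock=time.monotonic):
        self.failures_needed = failures_needed
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive = 0
        self.last_alert = None

    def observe(self, check_ok: bool) -> bool:
        """Feed one health-check result; return True if an alert should fire."""
        if check_ok:
            self.consecutive = 0  # a single success clears a transient blip
            return False
        self.consecutive += 1
        if self.consecutive < self.failures_needed:
            return False
        now = self.clock()
        if self.last_alert is not None and now - self.last_alert < self.cooldown_s:
            return False  # already alerted recently
        self.last_alert = now
        return True
```

A one-off network hiccup produces a single failed check followed by a success, so it never reaches the threshold, while a real outage still alerts after a few checks.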

Seven days in, twelve users. DeepSeek is the most requested model by miles. Nobody asked for GPT-4. Enough to rethink the whole model lineup.

Zero revenue. Working software that people chose. I'll take that for week one.

https://aiopencloud.xyz?utm_source=devto

Why I chose New API over the alternatives for my AI gateway

LaoWuuu — Sun, 31 May 2026 10:05:05 +0000

A couple weeks ago I wrote about building a unified API gateway and the monitoring headaches that followed. Some people asked why I picked New API as the backend in the first place.

The requirements were straightforward: support multiple upstream providers, manage API keys, track usage, and not break the bank.

I tried a custom proxy first. It worked but every time I needed rate limiting, user management, or logging, I had to build it from scratch. That got old fast.

Then I found New API. It's open source and does most of what I needed out of the box: multi-model routing, key management, usage tracking, and a dashboard. Setup took about a weekend. Been running it for a month and it's held up fine.

The main tradeoff is you're tied to its data model and API conventions. But for a small operation like mine, that's worth it. I'd rather spend time on the service than reinventing user management.

Next up: how I handle pricing and free tiers without losing money.