The rollout looks responsible. A team wants AI in their editors, so someone wires up the tooling against a hosted model, drops a shared key into the config, ships it to the team, and moves on to the next thing. An afternoon's work. Everyone's happy. It feels like the careful version of "just let people use it."
It isn't, and the reason it isn't is that everything wrong with it is invisible on day one. There's no error. Nothing breaks. The tools work, the developers are productive, and the problems you've created don't announce themselves — they wait. That's what makes this the dangerous version, not the careful one.
I've built the thing that sits between a team of developers and a hosted model — the gateway that fronts the API, hands out keys, and logs the traffic. And the single most useful decision I made was treating three controls as non-negotiable from the first commit rather than features to add once we'd "proven it out": per-user keys, audit logging, and cost controls. Not because we were big. Because they're trivial to build in and genuinely painful to bolt on later, and that asymmetry is the whole argument.
Per-user keys
A shared key is one key that everyone's tooling carries. It works right up until you need it not to.
Someone leaves the company. Now you either rotate the key — which breaks every developer's tools at once and turns into a coordinated fire drill — or you don't, and a former employee's laptop still talks to your models. A key leaks into a screenshot, a commit, a pasted config in a support channel. Same fork. Rotate and break everyone, or shrug and hope. Neither is a position you want to be standing in.
And every day in between, you can't answer the most basic question: who did this? Who sent that request? Who's responsible for that spike? With a shared key the answer is "the team," which is the same as no answer.
Per-user keys make all of that boring. One person leaves, you deactivate one key, nobody else notices. One key leaks, you kill one key. Every request carries a person's identity, so "who did this" is a lookup instead of an investigation. The cost to build this in on day one is one table and a five-line CLI to mint a key. The cost to add it later is touching every developer's machine to swap the shared key for an individual one, coordinated across the whole team, while everyone's mid-task. Same outcome, an order of magnitude more pain, and you only do it after something already went wrong.
Audit logging
Here's a question someone will eventually ask you, and the only thing that varies is who and when: what did the AI tool actually see, do, and cost? Could be security after an incident. Could be compliance during a review. Could be you, at 2 a.m., trying to understand why something went sideways.
If you logged from day one, that's a query. If you didn't, it's an investigation with no evidence, because the logs you didn't write don't exist. You cannot audit the past retroactively. That's the trap with logging specifically — every other control you can add late and at least cover yourself going forward, but the gap in your audit trail is permanent. The day you wish you'd been logging is the day it's already too late to have been.
One thing worth getting right, since it's where I see people overcorrect: log metadata by default, not content. Who, when, which model, how many tokens, what it cost. The moment you start logging the actual prompts and completions, you've built a store of potentially sensitive text — source code, internal data, whatever developers paste in — and that store has its own retention, access, and compliance weight hanging off it. Good auditing isn't "record everything." It's recording enough to answer the questions you'll be asked, and making the choice to capture content a deliberate one with a reason behind it, not a default you backed into.
Cost controls
Token spend is invisible until the invoice, and AI usage is spiky in a way that makes that especially dangerous. It's not a smooth line you can extrapolate. It's a flat baseline with occasional cliffs — an agent stuck in a retry loop, someone pasting an entire repository into context, a script that was supposed to run once running ten thousand times. The spend that hurts you is the spend you didn't see coming, and without controls you see none of it coming.
Two cheap things change that. A per-user cap, checked before the request goes through, so one runaway can't run away very far. And a budget alert at the provider level as a backstop, so even if your app-level accounting has a blind spot, the cloud bill itself trips a wire. The difference between having these and not is the difference between a Slack notification on a Tuesday and a meeting with finance about last month.
Notice this one needs the first one. You can't cap or attribute spend per person if you can't tell people apart — per-user keys are what make per-user cost controls possible. The three aren't a checklist of separate features. They're one posture, and they reinforce each other: identity makes attribution possible, attribution makes both auditing and cost control meaningful, and all three come for nearly free if you lay them down together at the start.
"We're moving fast — isn't this premature?"
This is the real objection, and it deserves a real answer rather than a dismissal, because the instinct behind it is correct most of the time. Early on, you shouldn't gold-plate. Most "enterprise readiness" work genuinely is premature, and shipping beats polishing more often than engineers like to admit. If I thought these three were gold-plating, I'd agree with the objection.
They're not, and here's the distinction. Gold-plating is work whose payoff is speculative and whose cost is the same whenever you do it. These three are the opposite: their payoff is near-certain — someone will leave, a key will leak, a bill will surprise you, given enough time — and their cost is wildly different depending on when you do them. A day at the start. A migration under pressure after an incident later. When the cost of doing something is low now and high later, and the need is a matter of when rather than if, "we're moving fast" is the argument for doing it now, not against. Speed is exactly why you don't want to be retrofitting auth and accounting across a live fleet six months from now.
The fast path and the responsible path are the same path here. That's not usually true. It's true this time.
Day one is the cheap day
You don't earn these controls by hitting some headcount or some spend threshold where they suddenly become worth it. The first developer and the first key is the cheapest moment they'll ever be — one table, one CLI, one budget alert, an afternoon. Every day after that, the team grows, the configs proliferate, the unattributed spend piles up, and the same work gets a little more expensive and a little more disruptive to do.
I built all three in on day one. It cost me about a day, and I have never once wished I'd skipped it. The regret in this space only ever runs the other direction — it's always the team that handed out a shared key and figured they'd sort out the rest later, standing in the rubble of a leaked credential or a surprise invoice, doing the expensive version of the work they could have done cheap. Do it cheap. Do it first.
Top comments (2)
One thing I found while building an x402-powered API is that payment systems introduce a similar challenge. You need to know not only who is making requests, but also who is paying for them, what was consumed, and whether the transaction completed successfully.
The audit trail ended up being just as important as the API itself. It's easy to focus on the endpoint and forget that attribution, logging, and usage visibility are what make a service manageable once real users arrive.
I also agree that these things are much easier to build before adoption rather than after. Retrofitting them later usually means touching every part of the system.
Yeah, payments make it even more obvious. With the gateway I was tracking who, what, and how much. You've got one more on top: did the money actually move. A settlement that can fail means "did this complete" is something you have to record, not something you can assume from a 200 coming back. The audit trail mattering as much as the endpoint itself lines up exactly with what I ran into. It's the boring part nobody designs first and everybody needs by month two.