After my last post about a Stripe webhook silently failing for 5 days, the next incident hit two days later.
It started with one support ticket from a customer:
"Our staff says they can't log in. They didn't change their password."
Another store reported the same symptom. "It happens occasionally."
That "occasionally" turned out to be a 4-year-old API auth-bypass vulnerability. Build-in-Public post #8 — full incident log.
The morning: investigation begins
I checked the database. The affected account's password column (a bcrypt hash) had indeed been updated that morning. But the user says they didn't change it.
My first hypothesis: a bug in the staff admin panel where editing a cast (= performer / staff member) silently overwrites their password. Classic React form-state hidden-field issue.
I reproduced in QA:
- Pick a test cast, open the edit modal
- Inspect the DOM → no password input field exists at all
- Save without changes → password hash unchanged
- Change the display name and save → check Network tab → request body has no `password` field
- DB password hash unchanged
→ The edit modal is not the culprit. It has to be something else.
Going through the history
I dug back through the database for similar cases. One specific email address had the same "can't log in" event hit twice already:
| Year | Event |
|---|---|
| 2021-12 | One staff with that email locked out → admin creates a new staff record with the same email |
| 2023-09 | Different staff with the same email, same symptom → admin creates yet another record |
| 2026-05 | Today's incident |
So this is a chronic, recurring problem — at least 4 years running.
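A minimal sketch of that kind of history check, assuming the account rows have already been loaded into memory. The `AccountRow` shape and the `findSharedEmails` name are hypothetical, not taken from the real codebase; the idea is only to group rows by email and keep the addresses that more than one record shares.

```typescript
// Hypothetical row shape; the real tables have more columns.
interface AccountRow {
  id: number;
  email: string;
}

// Return every email that appears on more than one row,
// mapped to the ids of the rows that share it.
function findSharedEmails(rows: AccountRow[]): Map<string, number[]> {
  const byEmail = new Map<string, number[]>();
  for (const row of rows) {
    const ids = byEmail.get(row.email) ?? [];
    ids.push(row.id);
    byEmail.set(row.email, ids);
  }
  // Keep only emails shared by two or more rows.
  return new Map(Array.from(byEmail).filter(([, ids]) => ids.length > 1));
}
```

Any email this returns is one where a single email-based password reset would touch several records at once.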
The root cause
I dove into the server code for the password-reset endpoint:
```typescript
// controller (cast password reset)
@Post(`/passwordReset`)
async passwordReset(@Body() req: { email: string; password: string }) {
  return await this.connection.transaction(async (entityManager) => {
    return await this.service.passwordReset(entityManager, req);
  });
}

// service
async passwordReset(entityManager, req: { email: string; password: string }) {
  const casts = await ...createQueryBuilder("cast")
    .where("cast.email = :email", { email: req.email })
    .getMany();
  if (!casts.length) throw new HttpException("...", 400);
  const password = await bcrypt.hash(req.password, 10);
  // overwrite all matching casts' password
  ...
}
```
The structural issues:
- No auth guard (no `@UseGuards(...)` or auth decorator)
- No `resetToken` validation
- POST `{ email, password }` and the endpoint will overwrite that account's password, full stop
The "reset URL" sent in the password-reset email contains a `?token=...` query string, but the frontend uses that token only to fetch the email address (via `findByResetToken`). The server never validates the token on the actual reset call.
→ Anyone who knows an email address can hit the API directly and overwrite that account's password. That's been live for 4 years.
In our industry (vertical SaaS for Japan's nightlife sector), customer email addresses circulate among adjacent vendors. The attack vector is real.
Hot fix design
The full proper fix (change the controller signature to { resetToken, password } and update the frontend in two apps) requires rebuilding both frontends and invalidating CloudFront caches. Heavy for an emergency deploy.
Minimum-surface fix:
```typescript
// service.ts (cast)
async passwordReset(entityManager, req: { email: string; password: string }) {
  const casts = await ...createQueryBuilder("cast")
    .where("cast.email = :email", { email: req.email })
    .andWhere("cast.reset_token IS NOT NULL") // ← one-line guard
    .getMany();
  ...
}

// service.ts (staff): same single-line addition
.andWhere("staff.reset_token IS NOT NULL")
```
Effects:
- ✅ Direct hits without going through `sendEmail` first are rejected (`reset_token` is null)
- ✅ After a successful reset, `resetToken` clears to null, which prevents back-to-back tampering
- ✅ The legitimate flow (frontend `sendEmail` → email → URL → new password) still works without any frontend changes
- ✅ No frontend rebuild required, server-only deploy
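To make the guard's behavior concrete, here is a small in-memory model of it. This is a sketch, not the production service: the `Cast` shape, the plain array standing in for the table, and the returned status object are all assumptions for illustration. The filter mirrors the two query conditions (matching email and a non-null reset token).

```typescript
// In-memory stand-in for the cast table; field names are illustrative.
interface Cast {
  id: number;
  email: string;
  passwordHash: string;
  resetToken: string | null;
}

// Mirrors the hot-fixed query: match on email AND a pending reset token.
function passwordReset(
  casts: Cast[],
  email: string,
  newHash: string,
): { status: number; updated: number } {
  const targets = casts.filter(
    (c) => c.email === email && c.resetToken !== null,
  );
  // No pending reset for this email: reject, as with a direct hit.
  if (targets.length === 0) return { status: 400, updated: 0 };
  for (const c of targets) {
    c.passwordHash = newHash;
    c.resetToken = null; // cleared after use, so a replay finds no rows
  }
  return { status: 201, updated: targets.length };
}
```

In this model a call made before any reset email has been requested returns 400, a call made while a token is pending returns 201, and an immediate replay returns 400 again. It also shows what the guard does not do: while a token is pending, the request is still identified by email alone.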
QA E2E test
I deployed to QA and ran 4 cases:
| Test | Expected | Actual |
|---|---|---|
| Direct hit (cast, no token) | 400 | ✅ 400 |
| Direct hit (staff, no token) | 400 | ✅ 400 |
| Legit flow (sendEmail → reset) | 201 | ✅ 201 |
| Replay after token clears | 400 | ✅ 400 |
All as expected. Pushed to production.
Production deploy + recovery
Deployed to production EC2 (Node.js + PM2 + NestJS), built, and ran `pm2 restart api`. The API was back online in five seconds, with memory stable at 92 MB.
Verified the same 400 on production direct hits → vulnerability closed.
But the affected account already had its password overwritten by the attacker, so the legitimate user still can't log in. I ran an admin script to force-reset their password to a safe random value, then communicated the temp password to the customer through a side channel and asked them to log in and immediately change it themselves.
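For the "safe random value" part, a minimal sketch of how such a temporary password could be generated with Node's built-in `crypto` module. The `generateTempPassword` name and the 18-byte default are assumptions for illustration, not the actual admin script; hashing the value with bcrypt and writing it to the database are left out.

```typescript
import { randomBytes } from "node:crypto";

// Generate a one-time temporary password from the OS CSPRNG.
// 18 random bytes encode to 24 URL-safe base64 characters (144 bits of entropy).
function generateTempPassword(byteLength: number = 18): string {
  return randomBytes(byteLength).toString("base64url");
}
```

The `base64url` alphabet (letters, digits, `-` and `_`) keeps the value easy to paste into a chat or read over the phone without escaping issues.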
Lessons
1. "Happens occasionally" is not a feature, it's an unsolved bug
The store treated this as a known quirk and just kept asking us to reissue accounts. For 4 years. Take the customer's words ("but I didn't change it") seriously instead of pattern-matching to "yet another forgotten password."
2. PR plan < Emergency repair
I had a whole day of PR work scheduled, and all of it was canceled. Of course. Publishing the incident as a Build-in-Public post is also more transparent than "we shipped what we planned."
3. The "implicit trust" assumption is where vulnerabilities hide
"Server doesn't validate resetToken here, but the frontend uses it for fetching email, so it's fine." That kind of implicit-trust reasoning is exactly how 4-year-old vulnerabilities survive.
The right design assumption: attackers will hit your API directly, regardless of what your frontend does.
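One way to act on that assumption is to make the token, not the email, the only thing the server trusts on the reset call. The sketch below is an illustration under stated assumptions, not the code that shipped: the in-memory `pendingResets` map, the function names, and the 30-minute expiry are all hypothetical. Tokens are stored as SHA-256 hashes, expire, and are consumed on first use.

```typescript
import { createHash, randomBytes } from "node:crypto";

interface PendingReset {
  accountId: number;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store keyed by the SHA-256 of the token,
// so a leaked store does not reveal usable tokens.
const pendingResets = new Map<string, PendingReset>();

const hashToken = (token: string): string =>
  createHash("sha256").update(token).digest("hex");

// Called by the "send reset email" step; returns the raw token for the URL.
function issueResetToken(
  accountId: number,
  now: number,
  ttlMs: number = 30 * 60 * 1000,
): string {
  const token = randomBytes(32).toString("base64url");
  pendingResets.set(hashToken(token), { accountId, expiresAt: now + ttlMs });
  return token;
}

// Called by the reset endpoint; the token alone identifies the account.
// Returns the account id to update, or null if the token is unknown,
// already used, or expired.
function consumeResetToken(token: string, now: number): number | null {
  const key = hashToken(token);
  const pending = pendingResets.get(key);
  if (!pending) return null;
  pendingResets.delete(key); // single use, even if it turns out to be expired
  return pending.expiresAt >= now ? pending.accountId : null;
}
```

With this shape, a request that carries only an email address has nothing to present, and a guessed or replayed token resolves to `null`.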
4. Minimum-surface hot fix is a discipline
The full proper fix takes longer; a one-line service-layer guard closes the immediate attack surface in minutes. The tradeoff is fine: schedule the proper refactor for later.
What's left
- Full fix: change the controller signature to `{ resetToken, password }` + frontend updates in both cast-app and staff-app. Closes the remaining theoretical "attacker hits sendEmail then guesses the next request" path
- WAF / rate limit: 1-IP burst protection on the password-reset endpoint
- ALB access log: enable for forensic capability — ours had access logs disabled, so we can't reconstruct the past 4 years of incidents
- Audit other "implicit trust" endpoints: there are likely a few more
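For the rate-limit item, a minimal application-level sketch of per-IP burst protection using a fixed-window counter. The `FixedWindowLimiter` class and its parameters are hypothetical, and it assumes a single process holding the counters in memory; with several PM2 workers or servers, each would count separately, which is one reason to put the limit at the WAF or in a shared store instead.

```typescript
// Fixed-window counter per client key (for example, an IP address).
class FixedWindowLimiter {
  private hits = new Map<string, { windowStart: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if the key is over its limit.
  allow(key: string, now: number): boolean {
    const entry = this.hits.get(key);
    // First request from this key, or the previous window has ended.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.hits.set(key, { windowStart: now, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.limit;
  }
}
```

For example, `new FixedWindowLimiter(3, 60_000)` would allow three reset attempts per key per minute and reject the rest until the window rolls over.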
If you run a SaaS with a similar password-reset flow, here's the test — try this from curl:
```shell
curl -X POST https://your-api.example.com/auth/passwordReset \
  -H "Content-Type: application/json" \
  -d '{"email":"someone@example.com","password":"attackerWasHere"}'
```
If that returns 200/201, you have the same vulnerability. The fix takes one line in your service layer.
Original Japanese version: Build-in-Public 第 8 弾
Hire me for security / API auth design reviews: tasteck.tech/work — non-industry projects welcome, English OK
Previous post: Stripe webhook silently failing for 5 days
Top comments (2)
The `getMany()` on email without a unique constraint is doing more damage than it looks here. Your incident table shows the workaround was "create another staff record with the same email", so now you've got multiple rows matching, and the password reset overwrites all of them in one shot. That turns a targeted account takeover into a blast radius problem where one attacker request corrupts every account sharing that email. The hot fix closes the unauthenticated path, but for the full fix I'd also look at whether email should be unique per-tenant (or at least per-role), and whether the reset should operate on a specific credential ID rather than a bare email lookup. Otherwise you're still one `getMany()` away from the next class of bugs where operations meant for one account silently affect siblings.

Sharp catch. Yes, the unique-constraint gap is the structural issue, not just the unauthenticated path. Two follow-ups going into the backlog after this:

- UNIQUE INDEX on `(tenant_id, email)` for both casts and staff: turns the `getMany()` blast radius into single-row semantics, which then makes the `reset_token IS NOT NULL` guard a real one-shot operation.
- Migrate `passwordReset` to a credential-id-based contract (`POST /reset/:tokenId`, where the token resolves to tenant + role + user), so the API stops doing email lookups at all.

The hot fix is genuinely just stopping the bleed. The full fix is the schema + contract change you described. Going to credit your comment in the post-incident review blog; this is exactly the kind of review I publish for. Thanks.
passwordResetto a credential-id-based contract (POST /reset/:tokenIdwhere the token resolves to tenant + role + user), so the API stops doing email lookups at all.The hot fix is genuinely just stopping the bleed. The full fix is the schema + contract change you described. Going to credit your comment in the post-incident review blog — this is exactly the kind of review I publish for. Thanks.