edhiblemeer

Posted on May 8 • Originally published at tasteck.tech

A 4-year-old auth-bypass vulnerability hidden in our password-reset API — discovery, hot fix, recovery

#security #saas #buildinpublic #webdev

After my last post about a Stripe webhook silently failing for 5 days, the next incident hit two days later.

It started with one support ticket from a customer:

"Our staff says they can't log in. They didn't change their password."

Another store reported the same symptom. "It happens occasionally."

That "occasionally" turned out to be a 4-year-old API auth-bypass vulnerability. Build-in-Public post #8 — full incident log.

The morning: investigation begins

I checked the database. The affected account's password column (a bcrypt hash) had indeed been updated that morning. But the user says they didn't change it.

My first hypothesis: a bug in the staff admin panel where editing a cast (= performer / staff member) silently overwrites their password. Classic React form-state hidden-field issue.

I reproduced in QA:

Pick a test cast, open the edit modal
Inspect the DOM → no password input field exists at all
Save without changes → password hash unchanged
Change the display name and save → check Network tab → request body has no password field
DB password hash unchanged

→ The edit modal is not the culprit. It has to be something else.

Going through the history

I dug back through the database for similar cases. One specific email address had the same "can't log in" event hit twice already:

Date	Event
2021-12	One staff member with that email locked out → admin creates a new staff record with the same email
2023-09	A different staff member with the same email, same symptom → admin creates yet another record
2026-05	Today's incident

So this is a chronic, recurring problem — at least 4 years running.

The root cause

I dove into the server code for the password-reset endpoint:

// controller (cast password reset)
@Post(`/passwordReset`)
async passwordReset(@Body() req: { email: string; password: string }) {
  return await this.connection.transaction(async (entityManager) => {
    return await this.service.passwordReset(entityManager, req);
  });
}

// service
async passwordReset(entityManager, req: { email: string; password: string }) {
  const casts = await ...createQueryBuilder("cast")
    .where("cast.email = :email", { email: req.email })
    .getMany();
  if (!casts.length) throw new HttpException("...", 400);
  const password = await bcrypt.hash(req.password, 10);
  // overwrite all matching casts' password
  ...
}

The structural issues:

No auth guard (no @UseGuards(...) or auth decorator)
No resetToken validation
POST { email, password } and the endpoint overwrites the password of every record matching that email, full stop

The "reset URL" sent in the password-reset email contains a ?token=... query string — but the frontend uses that token only to fetch the email address (via findByResetToken). The server never validates the token on the actual reset call.

→ Anyone who knows an email address can hit the API directly and overwrite that account's password. That's been live for 4 years.
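To make the failure mode concrete, here is a minimal sketch of the same logic against an in-memory array. It is not the production code: the `Cast` shape, the sample rows, and `fakeHash` (a stand-in for `bcrypt.hash`) are assumptions for illustration. It shows that a request carrying only an email and a new password overwrites every record sharing that email.

```typescript
// Hypothetical in-memory stand-in for the cast table; not the production schema.
type Cast = { id: number; email: string; passwordHash: string; resetToken: string | null };

const casts: Cast[] = [
  { id: 1, email: "staff@example.com", passwordHash: "hash-A", resetToken: null },
  // A re-issued record with the same email, as in the incident history.
  { id: 2, email: "staff@example.com", passwordHash: "hash-B", resetToken: null },
  { id: 3, email: "other@example.com", passwordHash: "hash-C", resetToken: null },
];

// Stand-in for bcrypt.hash so the sketch runs without dependencies.
const fakeHash = (password: string): string => `hashed(${password})`;

// Mirrors the vulnerable service: look up by email only, overwrite every match.
function vulnerablePasswordReset(req: { email: string; password: string }): number {
  const matches = casts.filter((c) => c.email === req.email);
  if (matches.length === 0) throw new Error("400: no matching email");
  for (const c of matches) c.passwordHash = fakeHash(req.password);
  return matches.length;
}

// No login and no token: knowing the email address is enough.
const overwritten = vulnerablePasswordReset({
  email: "staff@example.com",
  password: "attackerWasHere",
});
console.log(overwritten); // 2
console.log(casts.map((c) => c.passwordHash));
```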

In our industry (vertical SaaS for Japan's nightlife sector), customer email addresses circulate among adjacent vendors. The attack vector is real.

Hot fix design

The full proper fix (change the controller signature to { resetToken, password } and update the frontend in two apps) requires rebuilding both frontends and invalidating CloudFront caches. Heavy for an emergency deploy.

Minimum-surface fix:

// service.ts (cast)
async passwordReset(entityManager, req: { email: string; password: string }) {
  const casts = await ...createQueryBuilder("cast")
    .where("cast.email = :email", { email: req.email })
    .andWhere("cast.reset_token IS NOT NULL")  // ← one-line guard
    .getMany();
  ...
}

// service.ts (staff) — same single-line addition
.andWhere("staff.reset_token IS NOT NULL")

Effects:

✅ Direct hits without going through sendEmail first are rejected (reset_token is null)
✅ After a successful reset, reset_token is cleared to null, so a replayed request is rejected
✅ The legitimate flow (frontend sendEmail → email → URL → new password) still works without any frontend changes
✅ No frontend rebuild required, server-only deploy
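The behavior listed above can be modeled in a few lines. This is a sketch under assumptions, not the deployed service: the `Account` shape, the in-memory array, the placeholder token, and `fakeHash` are all hypothetical, and the numeric return values just mimic the HTTP status codes.

```typescript
// Hypothetical in-memory model of the guarded flow; names are illustrative.
type Account = { email: string; passwordHash: string; resetToken: string | null };

const accounts: Account[] = [
  { email: "staff@example.com", passwordHash: "hash-A", resetToken: null },
];

// Stand-in for bcrypt.hash so the sketch runs without dependencies.
const fakeHash = (password: string): string => `hashed(${password})`;

// Stand-in for the sendEmail step: it stores a token on the matching rows.
function sendEmail(email: string): void {
  for (const a of accounts) {
    // Placeholder value; a real token must be random and unguessable.
    if (a.email === email) a.resetToken = "placeholder-token";
  }
}

// Guarded reset: only rows with a pending token qualify, and the token is
// cleared after a successful reset.
function passwordReset(req: { email: string; password: string }): number {
  const matches = accounts.filter((a) => a.email === req.email && a.resetToken !== null);
  if (matches.length === 0) return 400;
  for (const a of matches) {
    a.passwordHash = fakeHash(req.password);
    a.resetToken = null;
  }
  return 201;
}

const directHit = passwordReset({ email: "staff@example.com", password: "attackerWasHere" });
sendEmail("staff@example.com");
const legitFlow = passwordReset({ email: "staff@example.com", password: "newPassword" });
const replay = passwordReset({ email: "staff@example.com", password: "attackerWasHere" });
console.log(directHit, legitFlow, replay); // 400 201 400
```

Note what the guard does and does not check: it confirms that a reset is pending for the email, not that the caller holds the token.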

QA E2E test

I deployed to QA and ran 4 cases:

Test	Expected	Actual
Direct hit (cast, no token)	400	✅ 400
Direct hit (staff, no token)	400	✅ 400
Legit flow (sendEmail → reset)	201	✅ 201
Replay after token clears	400	✅ 400

All as expected. Pushed to production.

Production deploy + recovery

Deployed to the production EC2 instance (Node.js + PM2 + NestJS), built, and ran pm2 restart api. The API was back online in five seconds, with memory stable at 92 MB.

Verified the same 400 on production direct hits → vulnerability closed.

But the affected account already had its password overwritten by the attacker, so the legitimate user still can't log in. I ran an admin script to force-reset their password to a safe random value, then communicated the temp password to the customer through a side channel and asked them to log in and immediately change it themselves.
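The admin script itself isn't shown here, so the following is only a sketch of how a "safe random value" could be generated with Node's built-in crypto module. The helper name `generateTempPassword` and the 18-byte length are assumptions; the real script would still hash the result with bcrypt before storing it, as the service code does.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical helper: returns a URL-safe random string from the OS CSPRNG.
// 18 bytes is 144 bits of entropy, which encodes to 24 base64url characters.
function generateTempPassword(byteLength: number = 18): string {
  return randomBytes(byteLength).toString("base64url");
}

const tempPassword = generateTempPassword();
console.log(tempPassword.length); // 24
```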

Lessons

1. "Happens occasionally" is not a feature, it's an unsolved bug

The store treated this as a known quirk and just kept asking us to reissue accounts. For 4 years. Take the customer's words ("but I didn't change it") seriously instead of pattern-matching to "yet another forgotten password."

2. PR plan < Emergency repair

I had a whole day of PR work scheduled — all canceled. Of course. And then publishing the incident as a Build-in-Public post is more transparent than "we shipped what we planned."

3. The "implicit trust" assumption is where vulnerabilities hide

"Server doesn't validate resetToken here, but the frontend uses it for fetching email, so it's fine." That kind of implicit-trust reasoning is exactly how 4-year-old vulnerabilities survive.

The right design assumption: attackers will hit your API directly, regardless of what your frontend does.
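One way to encode that assumption is to make the reset call itself prove possession of the token. The sketch below is hypothetical and uses an in-memory array: `issueResetToken`, `resetWithToken`, the 30-minute expiry, and storing only a SHA-256 hash of the token are my assumptions for illustration, not the current implementation. `fakeHash` stands in for `bcrypt.hash`.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical account shape; real tables and column names may differ.
type Account = {
  email: string;
  passwordHash: string;
  resetTokenHash: Buffer | null; // only a hash of the token is stored
  resetTokenExpiresAt: number; // epoch milliseconds
};

const accounts: Account[] = [
  { email: "staff@example.com", passwordHash: "hash-A", resetTokenHash: null, resetTokenExpiresAt: 0 },
];

const sha256 = (value: string): Buffer => createHash("sha256").update(value).digest();
const fakeHash = (password: string): string => `hashed(${password})`; // stand-in for bcrypt.hash

// Email step: the raw token goes only into the emailed URL.
function issueResetToken(email: string, now: number, ttlMs: number = 30 * 60 * 1000): string | null {
  let token: string | null = null;
  for (const a of accounts) {
    if (a.email !== email) continue;
    token = randomBytes(32).toString("base64url");
    a.resetTokenHash = sha256(token);
    a.resetTokenExpiresAt = now + ttlMs;
  }
  return token;
}

// Reset step: the caller must present the token; an email address proves nothing.
function resetWithToken(req: { resetToken: string; password: string }, now: number): number {
  const presented = sha256(req.resetToken);
  for (const a of accounts) {
    if (a.resetTokenHash === null || now > a.resetTokenExpiresAt) continue;
    // Both buffers are 32-byte SHA-256 digests, so the lengths always match.
    if (!timingSafeEqual(a.resetTokenHash, presented)) continue;
    a.passwordHash = fakeHash(req.password);
    a.resetTokenHash = null; // single use
    return 201;
  }
  return 400;
}

const t0 = 1000000;
const token = issueResetToken("staff@example.com", t0);
if (token === null) throw new Error("account not found");
const wrongToken = resetWithToken({ resetToken: "guess", password: "attackerWasHere" }, t0);
const rightToken = resetWithToken({ resetToken: token, password: "newPassword" }, t0 + 1000);
const replayed = resetWithToken({ resetToken: token, password: "again" }, t0 + 2000);
console.log(wrongToken, rightToken, replayed); // 400 201 400
```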

4. Minimum-surface hot fix is a discipline

Full proper fix takes longer; "service-layer one-line guard" closes the immediate attack surface in minutes. The tradeoff is fine — schedule the proper refactor later.

What's left

Full fix: change the controller signature to { resetToken, password } + frontend updates in both cast-app and staff-app. Closes the remaining path where an attacker triggers sendEmail for a victim's address and then posts { email, password } directly while reset_token is still set
WAF / rate limit: per-IP burst protection on the password-reset endpoint
ALB access log: enable for forensic capability — ours had access logs disabled, so we can't reconstruct the past 4 years of incidents
Audit other "implicit trust" endpoints: there are likely a few more
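For the rate-limit item, an application-level version of per-IP burst protection could look roughly like this. It is a sketch only: the limits (5 attempts per 60 seconds), the names `allowRequest` and `attemptsByIp`, and the in-memory store are assumptions, and an in-memory counter would not be shared across multiple processes or instances. A WAF rule would be configured separately.

```typescript
// Hypothetical limits: at most 5 reset attempts per IP per 60 seconds.
const WINDOW_MS = 60 * 1000;
const MAX_ATTEMPTS = 5;

// Timestamps (epoch milliseconds) of recent attempts, keyed by client IP.
const attemptsByIp: Record<string, number[]> = {};

// Sliding-window check: returns true if the request may proceed.
function allowRequest(ip: string, now: number): boolean {
  // Keep only attempts that are still inside the window.
  const recent = (attemptsByIp[ip] || []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= MAX_ATTEMPTS) {
    attemptsByIp[ip] = recent;
    return false;
  }
  recent.push(now);
  attemptsByIp[ip] = recent;
  return true;
}

// Seven requests one second apart: the first five pass, the rest are blocked.
const results: boolean[] = [];
for (let i = 0; i < 7; i++) results.push(allowRequest("203.0.113.7", i * 1000));
// Once the window has passed, the same IP is allowed again.
const afterWindow = allowRequest("203.0.113.7", 70 * 1000);
console.log(results, afterWindow);
```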

If you run a SaaS with a similar password-reset flow, here's the test — try this from curl:

curl -X POST https://your-api.example.com/auth/passwordReset \
  -H "Content-Type: application/json" \
  -d '{"email":"someone@example.com","password":"attackerWasHere"}'

If that returns 200/201, you have the same vulnerability. The stopgap takes one line in your service layer; the full fix is validating the reset token on the server.

Original Japanese version: Build-in-Public 第 8 弾
Hire me for security / API auth design reviews: tasteck.tech/work — non-industry projects welcome, English OK
Previous post: Stripe webhook silently failing for 5 days

Top comments (2)

Rahul S • May 8

The getMany() on email without a unique constraint is doing more damage than it looks here. Your incident table shows the workaround was "create another staff record with the same email" — so now you've got multiple rows matching, and the password reset overwrites all of them in one shot. That turns a targeted account takeover into a blast radius problem where one attacker request corrupts every account sharing that email. The hot fix closes the unauthenticated path, but for the full fix I'd also look at whether email should be unique per-tenant (or at least per-role), and whether the reset should operate on a specific credential ID rather than a bare email lookup. Otherwise you're still one getMany() away from the next class of bugs where operations meant for one account silently affect siblings.

edhiblemeer • May 8

Sharp catch — yes, the unique-constraint gap is the structural issue, not just the unauthenticated path. Two follow-ups going into the backlog after this:

UNIQUE INDEX on (tenant_id, email) for both casts and staff — turns the getMany() blast radius into single-row semantics, which then makes the reset_token IS NOT NULL guard a real one-shot operation.
Migrate passwordReset to a credential-id-based contract (POST /reset/:tokenId where the token resolves to tenant + role + user), so the API stops doing email lookups at all.

The hot fix is genuinely just stopping the bleed. The full fix is the schema + contract change you described. Going to credit your comment in the post-incident review blog — this is exactly the kind of review I publish for. Thanks.