This is a bonus post in the AuthShield series - a production-ready standalone authentication microservice. The original 4-part series covered building auth from scratch. This post covers what to watch after you ship it: structured logging, the two alert patterns that actually matter in production, and where the gaps still are.
Previous parts:
Part 1 is here: Why I Stopped Writing Auth Code for Every Project and Built AuthShield
Part 2 is here: I Thought OAuth Was Just Adding a Google Button. Turns Out It's a CSRF Problem Disguised as a Feature
Part 3 is here: I Thought JWTs Were Stateless. Turns Out Logout Made Me Build a Stateful Layer Anyway.
Part 4 is here: I Thought the Hard Part Was the Code. Turns Out Production Is Where Security Assumptions Go to Die.
AuthShield is done. Four posts, every decision documented, the repo is public. I thought that was the end of it.
Then a security architect left a comment on the last post asking how I set up logging and monitoring. His exact words: "This is invaluable info when investigating incidents." I replied in the comments but the reply kept growing. So here it is properly.
This is not about how to build auth. That's done. This is about what you actually need to watch once it's running. Because shipping secure auth and knowing whether it's holding up in production are two completely different problems - and most engineers only think about the first one.
The Problem With Most Auth Logging
When you first set up logging, the instinct is to log everything and figure out what matters later. That is the wrong instinct.
I made this mistake early in AuthShield. I had logs running. Every request was leaving a trace. I felt covered. But what I actually had was volume without signal - a stream of events that told me the system was running but wouldn't tell me anything useful the moment something went wrong.
The problem is that not all logs are equal. There's a difference between logging that something happened and logging enough context to understand what happened, why it happened, and who triggered it. Most default setups give you the first. What you need for security is the second.
Plain text logs look like this in practice:
2026-04-19 08:23:11 WARNING Login failed for user@example.com
That tells you a login failed. It doesn't tell you which IP made the attempt, how many times that IP has tried in the last sixty seconds, whether this is the same email being targeted repeatedly or a different one each time, or what the specific failure reason was. During normal operation that feels fine. During an incident at 2am it means you're guessing.
Structured JSON logging changes the format of every event:
{
"event": "AUTH_LOGIN_FAILED",
"user_id": null,
"email": "user@example.com",
"ip_address": "203.0.113.42",
"failure_reason": "invalid_credentials",
"timestamp": "2026-04-19T08:23:11Z",
"level": "warning"
}
Same event. Completely different investigative value. Every field is queryable. You can ask your log aggregator: how many AUTH_LOGIN_FAILED events came from this IP in the last five minutes? How many distinct emails were targeted? Was this a password failure or an account that doesn't exist? You can't ask those questions when the answer is buried in a plain text string.
AuthShield uses structlog for this reason. Not because it's trendy but because when something goes wrong - and something will go wrong in any production auth system that's actually being used - you need to be able to reconstruct exactly what happened without digging through unstructured text.
The fields that matter on every auth event: event type, user ID or email, IP address, timestamp, outcome, and failure reason when applicable. That's the minimum set that makes logs useful rather than decorative.
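If you want to see that shape without pulling in structlog, here's a minimal stdlib sketch that emits the same kind of event. The helper name and logger wiring are mine, not AuthShield's actual code - AuthShield itself does this through structlog processors.

```python
import json
import logging
from datetime import datetime, timezone

# Illustrative logger name; AuthShield's real setup goes through structlog.
logger = logging.getLogger("auth")

def log_auth_event(event, level="warning", **fields):
    """Serialize one auth event as a single queryable JSON line."""
    record = {
        "event": event,
        **fields,
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "level": level,
    }
    line = json.dumps(record)
    logger.warning(line)  # every field stays queryable in the aggregator
    return line

# The failed login from above, as one structured line:
line = log_auth_event(
    "AUTH_LOGIN_FAILED",
    user_id=None,
    email="user@example.com",
    ip_address="203.0.113.42",
    failure_reason="invalid_credentials",
)
```

The point isn't the helper - it's that every event goes out as one JSON object with the same minimum field set, so the aggregator can filter on any of them.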
The reframe that helped me was this: don't ask "what should I log." Ask "what would I need if I had to explain exactly what happened during an incident to someone who wasn't there." Design your logs to answer that question and you'll log the right things.
The Two Signals That Actually Matter
You can instrument every auth event with structured logs and still miss what's important if you don't know which patterns to watch for. Most of what your logs generate day to day is normal - logins, refreshes, logouts, profile updates. The signal you need is buried in the noise.
After thinking carefully about what AuthShield actually emits and what each event means from a security perspective, two signals stand out as genuinely actionable. Everything else is context. These two are alerts.
AUTH_INVALID_CREDENTIALS Spike
One failed login is a user mistyping their password. It happens constantly and means nothing. Ten failed logins in sixty seconds from the same IP is a brute force attempt. Twenty failed logins targeting ten different email addresses from the same IP in the same window is credential stuffing - an attacker running a known list of email/password pairs from a previous data breach against your login endpoint.
The individual event is noise. The pattern is the signal.
This is the alert you set up first, and it's rate-based not occurrence-based. You don't want to be notified every time a login fails. You want to be notified when the failure rate crosses a threshold that indicates automated attack behaviour rather than human error.
What distinguishes brute force from credential stuffing in the logs is the email pattern. Brute force typically hammers a single account - same email, many passwords, same IP. Credential stuffing cycles through many accounts - different email each attempt, same IP or a small cluster of IPs, usually one attempt per email because the attacker is testing known pairs not guessing passwords.
Both look like AUTH_INVALID_CREDENTIALS in your logs. The difference is visible when you query: is this one email being targeted repeatedly, or is the failure spread across many different emails from the same source? That query is only possible if you're logging the email and IP on every failure event.
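That query logic is simple enough to sketch. Given the failure events from one IP in one window, count distinct emails. The thresholds here (10 attempts, 80% distinct) are illustrative assumptions, not tuned values:

```python
def classify_failures(events, ip):
    """Given AUTH_INVALID_CREDENTIALS events in one rolling window, guess
    whether the pattern from `ip` looks automated, and which kind.
    Thresholds are illustrative only - tune against your own traffic."""
    emails = [e["email"] for e in events if e["ip_address"] == ip]
    if len(emails) < 10:
        return "likely_human_error"       # low volume: probably typos
    distinct = len(set(emails))
    if distinct == 1:
        return "brute_force"              # one account, many passwords
    if distinct >= 0.8 * len(emails):
        return "credential_stuffing"      # roughly one attempt per email
    return "mixed"
```

This only works because email and IP are on every failure event - drop either field and the classification is impossible.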
AuthShield's rate limiting - Redis sliding window per IP - slows this down at the endpoint level. But rate limiting is not a detection mechanism. It's a speed bump. The logs are what tell you the attack is happening, how severe it is, and whether it's a single-IP brute force that your rate limiter is already handling or a distributed attack that needs a different response.
One more thing the commenter pointed out that I hadn't fully implemented initially: granular failure reasons. My original setup logged invalid_credentials for both wrong password and nonexistent email - intentionally, to prevent email enumeration at the API level. But internally, in logs that are never exposed to the client, you want to distinguish between them.
email_not_found versus wrong_password versus account_disabled tells you fundamentally different things about what's happening. If you're seeing hundreds of email_not_found failures across many different addresses it's almost certainly enumeration or credential stuffing. If you're seeing wrong_password failures concentrated on a handful of accounts it's targeted brute force. The error code the client sees can stay generic. The log your monitoring system sees should be specific. I've updated AuthShield to log the granular reason internally since that comment.
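A sketch of that split - the client always gets the generic code, the log gets the specific reason. The in-memory user store and plaintext comparison here are stand-ins for the real database and bcrypt check:

```python
# Hypothetical user store; in the real service this is a DB lookup.
USERS = {"alice@example.com": {"password": "correct-horse", "disabled": False}}

def check_login(email, password):
    """Return (client_error, internal_reason). The client sees the same
    generic code on every failure; only the log gets the granular reason."""
    user = USERS.get(email)
    if user is None:
        return "invalid_credentials", "email_not_found"
    if user["disabled"]:
        return "invalid_credentials", "account_disabled"
    if user["password"] != password:  # real code compares bcrypt hashes
        return "invalid_credentials", "wrong_password"
    return None, None
```

The first element goes into the HTTP response; the second goes into the structured log event and nowhere else.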
AUTH_REFRESH_TOKEN_REUSED
This one operates on completely different logic. There is no volume threshold. There is no rate to monitor. Every single occurrence of this event deserves immediate investigation, full stop.
Here's the mechanism behind why. AuthShield rotates refresh tokens on every use. When you call the refresh endpoint, you hand in your current refresh token, and in return you get a new access token and a new refresh token. The old refresh token is immediately marked as used and invalidated. It can never be used again.
If an old refresh token - one that has already been rotated - shows up at the refresh endpoint, there are only two possible explanations. Either your client code has a bug that's somehow sending the old token instead of the new one, which you would catch immediately in development. Or someone stole the refresh token from your user's device or intercepted it in transit, and they're trying to use it.
In practice, in a production system where your client code is working correctly, AUTH_REFRESH_TOKEN_REUSED means token theft. Not probably. Not possibly. Means.
What AuthShield does when this happens is revoke the entire token family. The family is the chain of tokens connected through rotation - the original token and every successor. Revoking the family means both the attacker and the legitimate user are logged out simultaneously. The attacker's stolen token is dead. The legitimate user's current token is dead. The user notices their session ended unexpectedly, logs back in, and gets a new token family. The attacker has nothing.
This is the right behaviour. But the log event is the early warning signal. If you see AUTH_REFRESH_TOKEN_REUSED once for a user, investigate - look at which IP submitted the reuse request versus which IP the legitimate user has been authenticating from. If you see it repeatedly for the same user, their device or a session token stored somewhere is compromised and they need to know.
The reason this event needs an any-occurrence alert rather than a rate-based one is that a single occurrence is already significant. You don't wait for ten token theft signals before you respond.
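The rotation-plus-family-revocation mechanism fits in a few lines. This is an in-memory illustration of the logic, not AuthShield's actual storage schema - the names are mine:

```python
import secrets

# token -> state; in production this lives server-side (DB or Redis).
tokens = {}

def issue(family=None):
    """Mint a refresh token, starting a new family unless one is given."""
    family = family or secrets.token_hex(8)
    t = secrets.token_hex(16)
    tokens[t] = {"family": family, "used": False, "revoked": False}
    return t

def refresh(token):
    """Rotate a refresh token. Reuse of an already-rotated token kills
    the entire family - attacker and legitimate user alike."""
    rec = tokens.get(token)
    if rec is None or rec["revoked"]:
        return None, "rejected"
    if rec["used"]:
        # AUTH_REFRESH_TOKEN_REUSED: revoke every token in the family.
        for r in tokens.values():
            if r["family"] == rec["family"]:
                r["revoked"] = True
        return None, "AUTH_REFRESH_TOKEN_REUSED"
    rec["used"] = True
    return issue(rec["family"]), "rotated"
```

Notice the ordering: the reuse check fires before any new token is minted, and revocation sweeps the whole chain, so the successor the attacker might already hold dies too.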
Where These Logs Should Go
Structured JSON logs sitting inside a running container are better than plain text logs sitting in that same container - but they're still useless during an incident if you can't query them.
The logs need to ship to an external aggregator - somewhere you can write queries, set up alerts, and look at trends without SSH-ing into a server and running grep commands while something is actively on fire.
Better Stack and Datadog are both solid choices depending on where you are.
Better Stack is the right starting point if you're running a solo project or a small team and don't need a complex setup. The log ingestion is straightforward - point your structlog output at their endpoint and logs start appearing within seconds. The alerting UI is clean enough to set up the two alerts above without writing complex rules. The pricing is much more accessible at low volume, which matters when you're running a project that isn't yet processing millions of auth events per day.
Datadog makes more sense once you need depth. The alerting is significantly more powerful - you can write multi-condition alerts, correlate across multiple log streams, set up anomaly detection that learns your normal baseline rather than requiring you to set static thresholds. The dashboards let you visualise AUTH_INVALID_CREDENTIALS over time in a way that makes the spike pattern immediately obvious rather than having to manually count events. The tradeoff is cost and setup complexity, both of which are higher than Better Stack.
Either way, the two alerts worth configuring first are the same:
A rate-based alert on AUTH_INVALID_CREDENTIALS - trigger when the count from a single IP exceeds a threshold within a rolling time window. What threshold? Start conservative. Look at your normal failure rate first and set the alert at three to five times that. Tune it down as you understand what your baseline actually looks like. A threshold set too low generates alert fatigue. A threshold set too high means the attack is well underway before you find out.
An any-occurrence alert on AUTH_REFRESH_TOKEN_REUSED - no threshold, no rate, no rolling window. The moment this event appears in your logs, you want to know.
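The first alert reduces to a rolling-window count per IP, which your aggregator does for you - but the logic is worth seeing. A sketch with illustrative constants you'd tune against your own baseline:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # rolling window; illustrative
THRESHOLD = 10        # fire at 10 failures/IP/window; tune to your baseline

failures = defaultdict(deque)  # ip -> timestamps of recent failures

def record_failure(ip, now=None):
    """Record one AUTH_INVALID_CREDENTIALS event. Return True when the
    rolling-window count for this IP crosses the alert threshold."""
    now = time.time() if now is None else now
    q = failures[ip]
    q.append(now)
    while q and q[0] <= now - WINDOW_SECONDS:
        q.popleft()  # drop events that fell out of the window
    return len(q) >= THRESHOLD
```

The AUTH_REFRESH_TOKEN_REUSED alert needs no code at all: it's a match-any rule on the event name.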
Redis memory is worth adding to your monitoring even though it's not an auth event. Rate limit keys accumulate in Redis under sustained attack - every IP attempting authentication generates a sorted set entry that expires after the rate limit window. Under normal traffic this is trivial. Under a sustained attack from many IPs, the key count climbs. Redis memory usage climbing during the same window that AUTH_INVALID_CREDENTIALS is spiking is corroborating evidence that you're under active attack at scale. Infrastructure metric and application log telling the same story at the same time.
What's Still Missing
I think one of the most useful things you can do in any technical post is be honest about where the gaps are. Here are the two I know about.
CAPTCHA on registration
The commenter who prompted this post flagged this directly: per-IP rate limiting does not stop distributed attacks. He's right. If an attacker has a botnet with a thousand distinct IP addresses, each IP can attempt the registration endpoint three times before hitting the rate limit. That's three thousand registration attempts before a single IP is blocked. The rate limiter was never designed to defend against that scenario.
The proper fix is CAPTCHA on the registration endpoint. hCaptcha or Cloudflare Turnstile specifically - both are privacy-respecting, both are effective against automated registration attempts, and both are significantly harder to bypass than IP-based rate limiting alone. This is on the AuthShield roadmap. The current defence is Nginx rate limiting layered with Redis sliding window and bcrypt's cost factor making each attempt slow even if it gets through - but that's a mitigation stack, not a solution.
GeoIP on auth events
Right now every auth event logs the IP address. What it doesn't log is where that IP address is geographically. Adding country code to every auth event costs almost nothing - MaxMind GeoLite2 is a free offline database, no API call per request, just a local lookup that adds a few microseconds.
Once you have it, a whole category of detection opens up. The most obvious is impossible travel - a user successfully authenticates from India at 08:00, then a successful authentication for the same account appears from Germany at 08:04. That's not a timezone difference. That's either a stolen session or a compromised account, and it's worth an immediate alert and a forced re-authentication.
Without GeoIP you can spot the pattern if you manually look at the IPs and trace them. With GeoIP your monitoring system can spot it automatically. The difference in response time between those two scenarios is the difference between catching a stolen session in minutes and finding out about it days later when the user notices something wrong.
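Impossible travel is cheap to compute once you have coordinates. The sketch below hardcodes approximate city coordinates as stand-ins for GeoLite2 lookups, and the 1000 km/h cutoff - roughly airliner speed - is my own assumption:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

MAX_PLAUSIBLE_KMH = 1000  # assumed cutoff: nothing legitimate moves faster

def impossible_travel(prev, curr):
    """prev/curr: (lat, lon, unix_ts) for two successful logins on one
    account, where lat/lon would come from the GeoIP lookup."""
    km = haversine_km(prev[0], prev[1], curr[0], curr[1])
    hours = max((curr[2] - prev[2]) / 3600.0, 1e-9)
    return km / hours > MAX_PLAUSIBLE_KMH

# Delhi at t=0, Frankfurt four minutes later (coordinates approximate):
delhi = (28.61, 77.21, 0)
frankfurt = (50.11, 8.68, 240)
# impossible_travel(delhi, frankfurt) -> True: ~6100 km in 4 minutes
```

Two logins, one subtraction, one distance - which is why it's frustrating that the blocker is just the missing country/coordinate field on the log event.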
Neither of these is in the current version. But knowing they're missing means the gap is visible and on the list rather than invisible and forgotten.
The Honest Summary
Shipping secure auth is not the finish line. It's the starting line for a different problem.
The code defines what should happen. The logs record what actually happened. The alerts tell you when what actually happened requires you to act right now. All three layers need to exist and all three need to be wired together before you can say your auth system is genuinely production-ready rather than just production-deployed.
I thought shipping AuthShield meant I was done thinking about security. Turns out shipping it just meant the attack surface was now live and the security thinking had to shift from design to observation.
Two events. One pattern-based, one occurrence-based. That's where I'd start.
Always learning, always observing.
AuthShield repo: https://github.com/ravigupta97/authshield