This is Part 4 of a 4-part series on building AuthShield - a production-ready standalone authentication microservice. This post covers rate limiting, integration testing, switching from Mailtrap to production SMTP, and multi-stage Docker builds.
Previous parts:

- Part 1: Why I Stopped Writing Auth Code for Every Project and Built AuthShield
- Part 2: I Thought OAuth Was Just Adding a Google Button. Turns Out It's a CSRF Problem Disguised as a Feature
- Part 3: I Thought JWTs Were Stateless. Turns Out Logout Made Me Build a Stateful Layer Anyway.
I finished the AuthShield codebase. The auth flows worked. The token rotation worked. The OAuth implementation worked. Tests passed locally.
I thought I was done.
I was not done.
Shipping to production introduced four problems I had underestimated - rate limiting, testing strategy, email delivery, and Docker. None of them were complicated in isolation. But each one had a detail that mattered and that I only found by running into it.
Rate Limiting: It Is Not a Config Setting
I knew AuthShield needed rate limiting on sensitive endpoints. Login, registration, password reset - these are the obvious brute force targets. I assumed adding rate limiting would be straightforward.
What I had not thought carefully about was the difference between fixed-window and sliding-window rate limiting, and why that difference is a security concern rather than just an implementation detail.
A fixed-window rate limiter works by counting requests within a fixed time window. Allow 5 login attempts per minute. The window resets at the top of every minute. Simple to understand, simple to implement.
The problem is the boundary. An attacker who knows the window resets at :00 can send 5 requests at :59 and 5 more at :01. That is 10 requests in 2 seconds against a limiter that is supposed to allow 5 per minute. The boundary is exploitable.
A sliding-window rate limiter has no fixed boundary. It counts requests in the last N seconds from the current moment, not from the start of a fixed window. The window moves with time. There is no reset to exploit.
AuthShield implements sliding-window rate limiting using Redis sorted sets:
```python
import time

from fastapi import HTTPException, Request
from redis.asyncio import Redis


class SlidingWindowRateLimiter:
    def __init__(self, redis: Redis):
        self.redis = redis

    async def check_rate_limit(
        self,
        key: str,
        max_requests: int,
        window_seconds: int
    ) -> tuple[bool, int]:
        now = time.time()
        window_start = now - window_seconds

        pipe = self.redis.pipeline()
        # Remove entries outside the current window
        # Sorted set score is the timestamp — remove anything older than window_start
        pipe.zremrangebyscore(key, 0, window_start)
        # Count remaining entries in the window
        pipe.zcard(key)
        # Add current request with timestamp as score
        # (denied requests are recorded too, so hammering while limited extends the wait)
        pipe.zadd(key, {str(now): now})
        # Set TTL so keys expire automatically when no longer needed
        pipe.expire(key, window_seconds)
        results = await pipe.execute()

        current_count = results[1]
        if current_count >= max_requests:
            # Calculate when the oldest request in the window will expire
            oldest = await self.redis.zrange(key, 0, 0, withscores=True)
            retry_after = int(oldest[0][1] + window_seconds - now) if oldest else window_seconds
            return False, retry_after
        return True, 0


async def rate_limit_login(
    request: Request,
    redis: Redis
) -> None:
    client_ip = request.client.host
    key = f"rate:login:{client_ip}"

    limiter = SlidingWindowRateLimiter(redis)
    allowed, retry_after = await limiter.check_rate_limit(
        key=key,
        max_requests=5,
        window_seconds=60
    )
    if not allowed:
        raise HTTPException(
            status_code=429,
            detail="Too many login attempts. Please try again later.",
            headers={
                "Retry-After": str(retry_after),
                "X-RateLimit-Limit": "5",
                "X-RateLimit-Window": "60"
            }
        )
```
Three things worth noting here.
The pipeline executes all Redis operations atomically in a single round trip. Without pipelining, a high-traffic scenario could have a race condition between the count check and the new entry addition.
The Retry-After header tells the client exactly how many seconds to wait before retrying. This is important for programmatic clients - without it, they have to guess or implement exponential backoff blindly.
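On the client side, honoring the header might look like this. A sketch: the function name is mine, and the jittered-exponential fallback is a common convention, not anything AuthShield prescribes:

```python
import random


def backoff_delay(headers: dict[str, str], attempt: int, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After value; otherwise fall back to
    capped exponential backoff with full jitter."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            # Retry-After may also be an HTTP-date; this sketch ignores that form
            pass
    return random.uniform(0, min(cap, 2 ** attempt))


# With the header present, the client waits exactly as long as the server asks
print(backoff_delay({"Retry-After": "17"}, attempt=0))  # → 17.0
```

The point is the asymmetry: with the header, the client waits exactly as long as it must; without it, it can only guess.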
Rate limiting is disabled during testing via a TESTING environment variable. All tests except the dedicated rate limiting tests run without limits - otherwise every test suite would need to manage rate limit state between tests, which is painful and slow.
```python
async def rate_limit_login(request: Request, redis: Redis) -> None:
    # Skip rate limiting in test environment
    if settings.TESTING:
        return
    # Normal rate limit check...
```
The rate limiting tests then explicitly unset this flag to test the real behaviour:
```python
# tests/test_rate_limiting.py
async def test_login_rate_limit_enforced(client, test_settings):
    # Temporarily disable the TESTING bypass
    test_settings.TESTING = False

    # Make 6 requests — the first 5 should pass, the 6th should return 429
    for i in range(5):
        response = await client.post("/api/v1/auth/login", json=valid_credentials)
        assert response.status_code != 429

    response = await client.post("/api/v1/auth/login", json=valid_credentials)
    assert response.status_code == 429
    assert "Retry-After" in response.headers
```
Integration Testing: Mocks Test Your Assumptions, Not Your Code
When I started writing tests for AuthShield, I had a choice. Mock the database and Redis and test the business logic in isolation. Or run tests against real PostgreSQL and Redis instances.
I chose real infrastructure. Here is why.
Mocks are a contract between your test and your assumptions about how a dependency behaves. If your assumption is wrong - if Redis behaves differently than you expect under certain conditions, if a SQLAlchemy async session does not flush the way you assumed, if a transaction rollback leaves state you did not account for - the mock will pass and the bug will ship.
Real infrastructure tests what actually happens. The test either works or it does not.
AuthShield has 48 integration tests. Six of them caught bugs that mocked tests would not have found:
- A race condition in the token rotation logic that only appeared under the async session behaviour of the real database driver
- A Redis pipeline that behaved differently than expected when the key did not exist yet
- An email verification token that was being deleted before the transaction committed, causing intermittent failures
- A session query that returned stale data because of SQLAlchemy's identity map caching
- A rate limit key that was not expiring correctly because of a TTL calculation error that only showed with real time passing
- A refresh token family query that was missing an index, only visible with real query execution
None of these would have been caught by mocks. All of them would have shipped to production.
The test setup uses a transaction rollback pattern to keep tests isolated without resetting the entire database between each test:
```python
# tests/conftest.py
import pytest
import pytest_asyncio
from httpx import ASGITransport, AsyncClient
from sqlalchemy.ext.asyncio import AsyncSession, create_async_engine


@pytest_asyncio.fixture
async def db():
    # Create a connection and begin a transaction
    async with test_engine.connect() as connection:
        await connection.begin()
        # Create a session bound to this connection
        async with AsyncSession(bind=connection) as session:
            yield session
        # Roll back after every test — no persistent test data
        await connection.rollback()


@pytest_asyncio.fixture
async def client(db):
    # Override the database dependency to use the test session
    app.dependency_overrides[get_db] = lambda: db
    async with AsyncClient(
        transport=ASGITransport(app=app),
        base_url="http://test"
    ) as ac:
        yield ac
    app.dependency_overrides.clear()
```
Every test runs inside a transaction that rolls back when the test finishes. The database is never in a dirty state between tests. No teardown logic, no truncating tables, no test ordering dependencies.
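The principle is worth seeing stripped of the fixture machinery. This stdlib-only sqlite3 sketch (my own illustration, not AuthShield's fixtures) shows why rollback-based isolation needs no teardown at all:

```python
import sqlite3

# One shared in-memory database stands in for the test database.
# isolation_level=None means autocommit, so BEGIN/ROLLBACK are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (email TEXT)")


def run_isolated_test():
    # Begin a transaction, do "test" writes, then roll everything back
    conn.execute("BEGIN")
    conn.execute("INSERT INTO users VALUES ('test@example.com')")
    # Inside the transaction the row is visible to this connection
    assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1
    conn.rollback()


run_isolated_test()
run_isolated_test()  # Runs again against a clean table — no teardown, no ordering issues
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # → 0
```

Each "test" sees its own writes while it runs, and the rollback erases them completely, which is exactly what the conftest fixtures above do with the real PostgreSQL engine.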
Email: The Mailtrap to Production SMTP Gap
During development, AuthShield uses Mailtrap for email delivery. Mailtrap is a fake SMTP server - it catches all outgoing emails and shows them in a web inbox. Nothing reaches a real email address. This is the right approach for development: you can test email flows without worrying about accidentally sending to real users or burning through email provider quotas.
In production, Mailtrap stops working. Real emails need to go to real inboxes.
The switch to Gmail SMTP in production introduced problems I had not anticipated.
The first was credentials. Gmail does not accept your regular password for SMTP authentication. You need to enable two-factor authentication on the Google account and generate an app password - a 16-character password specifically for the application. This is not obvious from the error messages when it fails.
The second was deliverability. Verification emails were arriving in spam. The from address, the subject line, and the sending domain all affect spam scoring. Getting a verification email reliably into the inbox required testing from multiple email providers - Gmail to Gmail, Gmail to Outlook, Gmail to other domains - because each has different filtering behaviour.
The third was debugging. When an email does not arrive locally, Mailtrap tells you immediately - it is sitting in the inbox. When an email does not arrive in production, you are debugging across SMTP logs, spam folders, and delivery receipts. The feedback loop is slower and the failure modes are less obvious.
```python
# config.py
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # Email configuration
    EMAIL_PROVIDER: str = "smtp"  # "smtp" for both dev and prod
    SMTP_HOST: str = "sandbox.smtp.mailtrap.io"  # Override in production
    SMTP_PORT: int = 587
    SMTP_USER: str = ""
    SMTP_PASSWORD: str = ""
    SMTP_USE_TLS: bool = True
    EMAIL_FROM: str = "noreply@authshield.dev"
    EMAIL_FROM_NAME: str = "AuthShield"

    # Frontend URL used in email links
    FRONTEND_URL: str = "http://localhost:3000"
```

```bash
# In development .env:
SMTP_HOST=sandbox.smtp.mailtrap.io
SMTP_USER=your_mailtrap_username
SMTP_PASSWORD=your_mailtrap_password

# In production .env:
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your.email@gmail.com
# Not your regular Gmail password:
SMTP_PASSWORD=your_16_char_app_password
FRONTEND_URL=https://yourdomain.com
```
The configuration is identical between environments - only the values change. This is intentional. Switching from Mailtrap to production SMTP should be a config change, not a code change.
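The post does not show the sending code itself, but a minimal smtplib sketch against the Settings fields above could look like the following. The function names and the verification-link format are illustrative assumptions, not AuthShield's actual code:

```python
import smtplib
from email.message import EmailMessage


def build_verification_email(to_addr: str, token: str, settings) -> EmailMessage:
    """Assemble a verification email from the app settings.
    The /verify-email?token=... link format is an assumption for illustration."""
    msg = EmailMessage()
    msg["From"] = f"{settings.EMAIL_FROM_NAME} <{settings.EMAIL_FROM}>"
    msg["To"] = to_addr
    msg["Subject"] = "Verify your email address"
    msg.set_content(
        "Click the link to verify your account:\n"
        f"{settings.FRONTEND_URL}/verify-email?token={token}"
    )
    return msg


def send_email(msg: EmailMessage, settings) -> None:
    # STARTTLS on port 587 works for both Mailtrap and Gmail,
    # so only the host and credentials differ between environments
    with smtplib.SMTP(settings.SMTP_HOST, settings.SMTP_PORT) as server:
        if settings.SMTP_USE_TLS:
            server.starttls()
        server.login(settings.SMTP_USER, settings.SMTP_PASSWORD)
        server.send_message(msg)
```

Because the send path reads everything from settings, pointing it at Gmail instead of Mailtrap really is just an environment change.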
Docker: Multi-Stage Builds and Why the Final Image Matters
AuthShield's Dockerfile uses a multi-stage build. This is worth explaining because the default approach most developers take - a single Dockerfile that installs everything - ships unnecessary weight and unnecessary risk to production.
A single-stage build installs build tools, compilers, and development dependencies, then runs the application. All of that stays in the image. The final image is large and contains tools that have no business running in production.
A multi-stage build separates the build environment from the runtime environment:
```dockerfile
# Stage 1: Build
# Install all dependencies including build tools
FROM python:3.12 AS builder
WORKDIR /app
COPY requirements.txt .
# Install dependencies into a specific directory
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime
# Start from a minimal base image — no build tools
FROM python:3.12-slim
WORKDIR /app
# Copy only the installed packages from the builder stage
COPY --from=builder /install /usr/local
# Copy application code
COPY app/ ./app/
COPY alembic/ ./alembic/
COPY alembic.ini .
# Non-root user — never run application code as root
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
```
The final image is built from python:3.12-slim - a minimal base image. It contains only the installed packages and the application code. No pip, no gcc, no build tools of any kind. Smaller image, faster deployments, and a reduced attack surface.
The non-root user is also worth noting. Running application code as root inside a container is a bad practice - if the application is compromised, the attacker has root access inside the container. Running as a dedicated non-root user limits the damage.
Docker Compose ties everything together:
```yaml
# docker-compose.yml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - DATABASE_URL=postgresql+asyncpg://user:password@postgres:5432/authshield
      - REDIS_URL=redis://redis:6379/0
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password
      POSTGRES_DB: authshield
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d authshield"]
      interval: 5s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    volumes:
      - redisdata:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5

volumes:
  pgdata:
  redisdata:
```
The depends_on with condition: service_healthy matters. Without health checks, Docker starts the API container the moment the Postgres and Redis containers start - not when they are ready to accept connections. The API tries to connect, fails, and crashes. Health checks ensure the API only starts after its dependencies are actually ready.
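Health checks solve the common case, but a small retry loop at application startup is cheap extra insurance against a dependency that restarts mid-deploy. A generic sketch, not AuthShield's actual startup code; `connect` stands for any async connection factory:

```python
import asyncio


async def wait_for_dependency(connect, attempts: int = 10, delay: float = 1.0):
    """Call an async connect() until it succeeds or attempts run out.
    Connection errors typically surface as OSError subclasses."""
    for attempt in range(1, attempts + 1):
        try:
            return await connect()
        except OSError:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)


# Demo: a dependency that becomes ready on the third attempt
calls = {"n": 0}

async def flaky_connect():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("not ready")
    return "connection"

result = asyncio.run(wait_for_dependency(flaky_connect, attempts=5, delay=0))
print(result, calls["n"])  # → connection 3
```

With both the compose-level health check and an in-process retry, a slow Postgres boot delays the API instead of crashing it.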
What Production Actually Taught Me
Looking back at the full AuthShield build, the code was the part I thought about most before starting. Turned out it was the part that went most smoothly.
Rate limiting required understanding the security distinction between fixed and sliding windows before I could implement it correctly. Testing required a deliberate decision about what kind of confidence I actually wanted - and real infrastructure gave me confidence that mocks could not. Email required understanding the gap between a development convenience and production reality. Docker required thinking about what belongs in a production image and what does not.
None of these are about writing clever code. They are about understanding what you are actually building and what it means for it to work in the real world.
That is the thing about security-focused infrastructure. The code is the beginning. Everything after it is where the assumptions get tested.
What Is Next for AuthShield
The current version is stable and in use. A few things are on the roadmap:
GitHub OAuth is fully implemented and working alongside Google. The contribution guide in the README covers how to add additional OAuth providers - the pattern is consistent once you understand the first one.
Multi-tenancy is something I am thinking about for a future version - allowing AuthShield to serve multiple isolated applications from a single deployment. That is a significant architectural change and not something I want to rush.
The repo is open. If you have built something on top of AuthShield or found something worth improving, contributions and feedback are welcome.
Integrating AuthShield Into Any Backend Project
Once AuthShield is deployed, any backend project needs exactly one thing to start validating its tokens - the same SECRET_KEY AuthShield uses to sign them. Your frontend handles all auth operations through AuthShield directly. Your backend receives the JWT, validates it locally using the shared secret, and reads the user ID and roles from the payload. No call back to AuthShield on every request, no users table in your application, no auth logic anywhere except AuthShield itself.
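The validation step is small enough to show from first principles. This stdlib-only sketch verifies an HS256 signature and reads the claims; it is purely illustrative, since in a real backend you would use a maintained library such as PyJWT, which also enforces `exp`, `aud`, and the other registered claims:

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(s: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def verify_hs256(token: str, secret: str) -> dict:
    """Check an HS256 JWT signature against the shared secret and return its claims.
    Illustration only — no exp/aud/alg checks here."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    # Constant-time comparison to avoid signature-timing leaks
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))
```

No network call, no shared database: the only thing the backend needs from AuthShield is the secret that signed the token.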
The full integration guide covering FastAPI, Django, Flask and Express is in the README: https://github.com/ravigupta97/authshield
Closing the Loop
This series started with a simple observation: I kept writing the same auth code across every project. AuthShield was the answer to that - build it once, properly, and never write it again.
Four posts later, the honest summary is this. The auth flows are the visible part. The rate limiting, the testing discipline, the production email setup, the Docker configuration - these are the invisible part that determines whether the visible part actually holds up.
Security is not a feature you implement. It is a property you maintain across every layer, from the JWT claims to the Docker image to the SMTP credentials in your environment variables.
AuthShield taught me that more clearly than anything I had read about it.
Always learning, always observing.