DEV Community: Wilfrid Okorie

The Bcrypt Trap: How Hashing JWTs Silently Breaks Refresh Token Rotation

Wilfrid Okorie — Thu, 02 Jul 2026 18:55:20 +0000

This article covers an easy-to-miss danger of using JWTs for authentication tokens. It is rarely covered by negative tests, so it often slips into production codebases, yet it is bad enough to bring an entire system down.

JSON Web Tokens

JSON Web Tokens (JWTs) are an open industry standard used to securely transmit information between parties as a JSON object. This information is verifiable and trusted because it is digitally signed.
JWTs are stateless, and the server does not need to store the session information. Instead, the server issues a token to the client, and the client sends that token back with every subsequent request. The server can verify the token's authenticity independently without hitting a database.

Structure of a JSON Web Token

A JWT is composed of three parts, separated by dots (.): header.payload.signature

Header

The header consists of the type of token which is usually JWT, and the signing algorithm used, such as RSA or HMAC SHA256. It is encoded in base64URL.
The decoded header looks something like:

{
   "alg": "HS256",
   "typ": "JWT"
}

Payload

The payload consists of metadata, sometimes known as Claims. They are statements about an entity (in authentication, the user) and additional data. They could be registered claims such as issuance time (iat), issuer (iss), expiration time (exp), or subject (sub). They could be public claims defined publicly in IANA JSON Web Token Registry. They could also (and most often used in authentication) be private claims, which is where user metadata falls, such as user_id, role, etc.
The payload is also encoded in base64URL.

{
   "sub": "117272829822882828",
   "name": "John Doe",
   "admin": "true"
}

Signature

This is created by signing the encoded header and payload with a secret key, using the algorithm specified in the header.
It looks something like:
HMACSHA256(base64UrlEncode(header) + "." + base64UrlEncode(payload), secret)
The signature is used to verify that the sender of the JWT is who they say they are, and ensure the message wasn't changed along the way.

A JWT is a compact, URL-safe means of representing claims to be transferred between two parties. A sample string looks something like:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
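
To make the three-part structure concrete, here is a minimal Python sketch using only the standard library. It decodes the header and payload of the sample string above, and shows how an HS256 signature can be produced and checked. The helper names (b64url_decode, sign_hs256, verify_hs256) and the demo secret are assumptions made for illustration, not part of any particular JWT library.

```python
import base64
import hashlib
import hmac
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop the "=" padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    # "header.payload" is the signing input; the signature becomes the third part.
    parts = [
        b64url_encode(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
        for obj in (header, payload)
    ]
    signing_input = ".".join(parts).encode("ascii")
    signature = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return ".".join(parts + [b64url_encode(signature)])


def verify_hs256(token: str, secret: bytes) -> bool:
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url_encode(expected).encode(), signature.encode())


# The sample string from the article, split into its three segments.
SAMPLE = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9."
    "eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ."
    "SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)
header_b64, payload_b64, _signature_b64 = SAMPLE.split(".")
decoded_header = json.loads(b64url_decode(header_b64))
decoded_payload = json.loads(b64url_decode(payload_b64))

# Round trip with a hypothetical demo secret.
demo_token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"sub": "42"}, b"demo-secret")
```

Anyone can run the decoding half without the secret, which is why the header and payload should be treated as readable by whoever holds the token.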

Modern-day Authentication

Modern-day authentication commonly uses JWTs, with the server issuing two tokens to the client: a short-lived access token (typically 3 to 15 minutes), sent on every request and used to authenticate the user, and a long-lived refresh token (sometimes up to 7 days), which is used to generate a new pair of tokens (access and refresh). The long lifetime lets a user who has been away for several days obtain new tokens and continue their session without having to log in again.

Login: User sends credentials to the auth server
Creation: Auth server authenticates the user and generates JWT(s) using the secret key(s).
Transmission: The server sends the tokens back to the client
Storage: The client stores the token(s) wherever, which could be in cookies, localStorage, sessionStorage, or whatever resourceful solution the client comes up with.
Request: For every protected request, the client sends the JWT in the Authorization header: Bearer <token> (or it is sent automatically as with httpOnly cookies).
Verification: The API server receives the token, decodes the header to see the algorithm, and uses its copy of the secret to verify the signature. If valid, it trusts the claim in the payload.
Refresh: The access token is short-lived, so once it expires, it can no longer be used to authenticate requests. The client then uses the refresh token to obtain a new pair of tokens.

During initial login, the generated refresh token is hashed, and the hash is persisted. When the client wants to refresh, it sends the refresh token; the server hashes it and compares the result against the hash stored in the database. If they match, new tokens are signed and sent to the user in the response. The new refresh token is hashed, and its hash replaces the old one in the database. This way, once a refresh token has been used to refresh, it cannot be used again.
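
The rotation flow just described can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary in place of the database, one stored hash per user, and SHA-256 as a placeholder hash function; the class and method names are hypothetical.

```python
import hashlib
import secrets
from typing import Optional


class RefreshTokenStore:
    """In-memory stand-in for a database table holding one token hash per user."""

    def __init__(self) -> None:
        self._hash_by_user = {}

    @staticmethod
    def _digest(token: str) -> str:
        # Placeholder hash; which hash function to use is the subject of this article.
        return hashlib.sha256(token.encode("utf-8")).hexdigest()

    def issue(self, user_id: str) -> str:
        # Only the hash is persisted; the raw token goes back to the client.
        token = secrets.token_urlsafe(32)
        self._hash_by_user[user_id] = self._digest(token)
        return token

    def rotate(self, user_id: str, presented: str) -> Optional[str]:
        stored = self._hash_by_user.get(user_id)
        if stored is None:
            return None
        if not secrets.compare_digest(stored, self._digest(presented)):
            return None  # unknown or already-rotated token
        return self.issue(user_id)  # overwrites the old hash
```

After one successful rotate call, presenting the same token again returns None, which is the single-use property the rest of the article depends on.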

The Easy-to-miss Danger

Everything below applies when JWTs are used as refresh tokens, not access tokens. The access token should contain user metadata, since it is used to authenticate the user on every request.
Different hashing algorithms are used to hash the generated refresh token before storing the hash in the database. Many systems use SHA256, but one of the most frequently used hashing methods is bcrypt. Bcrypt (a hashing function built on the Blowfish cipher) differs from SHA256 in that it is intentionally slow, which slows down brute-force attacks, and it has a built-in salt, so hashing '123' twice produces two different hashes, which defeats rainbow table attacks.
However, bcrypt only uses the first 72 bytes of its input, and many implementations silently truncate anything longer. JWT strings are long, typically more than 200 bytes, so hashing a JWT with bcrypt hashes just its first 72 bytes.
Only the signature part of a JWT is cryptographically secure, and it comes last, well beyond byte 72. The rest of it is encoded, not encrypted.
Let's take a practical look at this.
For the next section, we use a JWT encoder to sign the tokens and a random key generator to produce random UUIDs, so that each token has a different jti.
Let's construct a JWT, starting with the secret key and the header.

Our secret key: a-string-secret-at-least-256-bits-long
The header would look something like:
{
  "alg": "HS256",
  "typ": "JWT"
}

We will consistently use this secret key to sign different payloads with the jwt encoder.

Payload: 
{
  "sub": "1234567890",
  "name": "John Doe",
  "admin": true,
  "jti": "f27ca436-01b6-42fb-854b-f2de6e29c337"
}

Resulting token: 
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWUsImp0aSI6ImYyN2NhNDM2LTAxYjYtNDJmYi04NTRiLWYyZGU2ZTI5YzMzNyJ9.C9yFWhlCNLvPnEGo37fYX93C2hgyQMBjvOm91tPw2oY

Payload:
{
  "sub": "1234567890",
  "name": "John Doe",
  "admin": true,
  "jti": "02657189-789c-4ef8-9412-d5f3ad48f260"
}

Resulting token: 
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWUsImp0aSI6IjAyNjU3MTg5LTc4OWMtNGVmOC05NDEyLWQ1ZjNhZDQ4ZjI2MCJ9.2Jc6qLmbKyWP0Y1g4xCZPVVv3KJQBuf89cNoiqfOu78

Payload:
{
  "sub": "1234567890",
  "name": "John Doe",
  "jti": "02657189-789c-4ef8-9412-d5f3ad48f260",
  "admin": true
}

Resulting token: 
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwianRpIjoiMDI2NTcxODktNzg5Yy00ZWY4LTk0MTItZDVmM2FkNDhmMjYwIiwiYWRtaW4iOnRydWV9.t1SoE0cz40FpRNGy9e9mj4OGAmFWzJnx-3_OIAg8n9A

From the results above, using the header.payload.signature structure, the header is exactly the same in all cases - it is mere encoding. You can decode the header with this base64URL decoder.

The payloads also start identically, because the JSON serializer writes the fields in a consistent order. In the first two tokens, the payload sections are almost the same, and they differ only towards the end, where the JTIs differ. To reinforce the point, when the jti and admin fields swap positions, the divergence point shifts earlier in the string.

However, the fact remains that the first 72 bytes, eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI, are the same in all of the resulting tokens.

This means that for two different refresh tokens, the hash matches when checked with bcrypt's compare method.

While this would not directly cause refresh tokens not to work, it renders the invalidating process of refresh tokens (replacing old hashes with new hashes) pointless, because no matter how many times the refresh token is generated, the same bytes are in the first 72 bytes of the resulting JWT string, and these 72 bytes are hashed. The implication is that old refresh tokens still work.

This means that if an attacker gets hold of a refresh token issued by a server with this flaw, the attacker can extend their session for as long as they want. Even when the real owner refreshes, the old refresh token remains valid, so the attacker can keep refreshing, obtain new access tokens, and make requests to every protected route.
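
The effect can be reproduced with the standard library alone. In the sketch below, truncating_hash is NOT bcrypt: it is a stand-in that only mimics the "ignore everything after byte 72" behaviour, so the example runs without third-party packages. The token builder and the salt value are likewise assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
import uuid

SECRET = b"a-string-secret-at-least-256-bits-long"  # the demo key used above


def b64url(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_jwt(jti: str) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": "1234567890", "name": "John Doe", "admin": True, "jti": jti}
    body = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(SECRET, body.encode("ascii"), hashlib.sha256).digest()
    return f"{body}.{b64url(signature)}"


def truncating_hash(token: str, salt: bytes) -> bytes:
    # Stand-in for a truncating bcrypt implementation: only 72 bytes are hashed.
    return hashlib.sha256(salt + token.encode("utf-8")[:72]).digest()


def full_hash(token: str) -> bytes:
    return hashlib.sha256(token.encode("utf-8")).digest()


old_token = make_jwt(str(uuid.uuid4()))
new_token = make_jwt(str(uuid.uuid4()))
salt = b"demo-salt"

# After rotation the database holds the hash of the NEW token only.
stored = truncating_hash(new_token, salt)
old_still_accepted = hmac.compare_digest(stored, truncating_hash(old_token, salt))
old_rejected_by_sha256 = full_hash(old_token) != full_hash(new_token)
```

With the truncating hash, the old token still matches the stored value, while a hash over the full string tells the two tokens apart.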

Mitigation:

Since we now know that using bcrypt to hash JWT refresh tokens is a disaster, we have three logical options. Here is what each of them entails:
Option 1: Keep using JWT, use SHA256. Since the issue is bcrypt's 72-byte limit, we could use SHA-256. It has no input length limit, so token rotation works fine. However, a JWT used as a refresh token carries redundant data (the payload, which is already in the DB). This would work, and the refresh token hash can then be used directly for DB lookup, since with SHA256 the hash of a given token is always the same down to the byte. When you hash the presented token and don't find it in the database, the request can be treated as unauthorized. There is no real disadvantage here, but there is also no benefit.

Option 2: Use cryptographically-secure random bytes, with bcrypt. 32 random bytes result in 64 hex characters that are completely opaque, high in entropy, and do not leak session information. There is no bcrypt truncation issue here, since the token can be generated to be shorter than 72 bytes. Bcrypt works exactly as intended, but your refresh token is not a JWT anymore, so it does not look pretty. No real issue here. However, for a randomly generated token, bcrypt is not exactly the right pick. The features of bcrypt mentioned earlier in the article are specifically built to resist brute-forcing low-entropy secrets like passwords. Random bytes like these have at least 256 bits of entropy, so the chances of a successful brute-force attack are negligible. The slow hashing and built-in salt are just extra work for the CPU. Also, bcrypt's built-in salt makes it unsuitable for hash-based lookups, meaning the stored record would have to be fetched first before checking the presented token against the stored hash using bcrypt's compare method. This does not scale well to multiple active sessions per user, forces you to know the user id before verifying the token, and adds more calls to what could have been solved by a simple hash lookup.

Option 3: Use cryptographically-secure random bytes + SHA256. This also works fine, and SHA256 is faster and still secure for high-entropy inputs like random bytes. Because it is so cheap to compute, it offers less brute-force resistance per guess than bcrypt, but for 32 or more random bytes, the practical difference is negligible. Furthermore, SHA256 is deterministic, making hash-based lookups possible and avoiding the issues mentioned in option 2 with bcrypt.

I would go for option 3. Instead of signing a JWT at all for refresh token generation, cryptographically-secure random bytes are the better choice, because they already come with the entropy they need.
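
Option 3 could look roughly like the sketch below. It assumes a dictionary keyed by the SHA-256 hex digest as a stand-in for a table with a unique index on the token hash; the names new_refresh_token and SessionTable are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Optional, Tuple


def new_refresh_token() -> Tuple[str, str]:
    """Return (raw_token, sha256_hex). 32 random bytes give 64 hex characters."""
    raw = secrets.token_hex(32)
    return raw, hashlib.sha256(raw.encode("ascii")).hexdigest()


class SessionTable:
    """Dict stand-in for a table with a unique index on token_hash."""

    def __init__(self) -> None:
        self._user_by_hash: Dict[str, str] = {}

    def issue(self, user_id: str) -> str:
        raw, digest = new_refresh_token()
        self._user_by_hash[digest] = user_id
        return raw

    def rotate(self, presented: str) -> Optional[Tuple[str, str]]:
        # The hash is deterministic, so it can be the lookup key itself;
        # no user id is needed before the token is verified.
        digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
        user_id = self._user_by_hash.pop(digest, None)
        if user_id is None:
            return None  # unknown or already used: treat as unauthorized
        return user_id, self.issue(user_id)
```

Because the lookup key is the hash, several sessions per user fit naturally: each one is just another row.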

It is surprising how often this combination is misused in production codebases. How often have you seen this mistake made?
What are some other oversights in terms of security often made in authentication?

From Resilience Infrastructure to Event Driven Architectures - What HNGi14 Taught Me About Real Systems

Wilfrid Okorie — Sat, 13 Jun 2026 15:56:18 +0000

Most backend systems look simple, until they don't. A single server, a database, with requests coming in and responses going out: it looks clean, predictable and easy to reason about. All of a sudden, the business grows, traffic grows, external services start failing, processes crash mid-job, latency increases. What looked like a solid foundation reveals itself as a system that only works in the happy path.
During HNGi14, I spent months building systems that had to work outside the happy path: intelligent retries, automatic failure recovery and mishap management, asynchronous job processing, latency reduction, and system hardening in general. This article covers two of those systems: a retry engine built on exponential backoff and jitter, and a Redis pub/sub metrics pipeline that demonstrates an event-driven architecture.

Stage 8 Task - Retry Engine

Problem it Solved

Imagine you have a system, and your server is down for 5 seconds. Say your system uses Stripe for payments, and Stripe is supposed to deliver a webhook to your server on successful payments. A payment happens during that window and your server does not receive the webhook, so it never learns that the customer actually paid for the service, and the customer doesn't get served. In this context, I will refer to Stripe as the client, and your server as the server. In this case, it is not the client's fault that the webhook wasn't delivered. It is actually the server's fault. If the client tried that request again now that your server is up, it would probably succeed. This is a typical case where a retry engine is needed.

What is a Retry Engine?

A Retry Engine is a small HTTP service that acts as a reliable proxy for outbound HTTP requests. Instead of making a request directly and hoping it succeeds, it persists the job, and returns immediately with an ID, while a background worker handles the actual call. If the call succeeds, that is the happy path. It stores the result, and you can get it whenever you want. If it fails with a retryable error, the worker backs off and tries again. If it fails permanently, it stops, and marks the job dead.

When a server experiences transient failures, the clients that have made requests to it see errors, and the typical solution is to retry the requests, provided the errors are retryable, that is, transient errors. Transient errors are those that would probably succeed if tried again. However, if the requests are retried immediately, they would probably crash your just-recovering server. That is where exponential backoff comes in. The clients give the server some time to recover from the error, and the longer the error persists, the longer the backoff period before the next request.

However, the issue with this is that if the server crashed due to high traffic arriving all at once, and those retries, though exponentially backed off, happen at the same time, they would probably produce the same spike in traffic and crash the server again. That is where jitter comes in. Jitter adds randomness to the retries and spreads them out. If they were all supposed to happen 10 seconds after the previous try, jitter spreads them to 8.5s, 8.7s, 8.8s, 9.1s, 9.5s, 10.2s, 10.4s, etc. It relies on randomness to vary the next_retry_at of each request.
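
As a rough illustration of the arithmetic, the delay can be computed as min(cap, base * 2^attempt) and then multiplied by a random factor. The sketch below is in Python rather than the project's Node.js, and the base, cap, and 15% jitter values are assumptions chosen to resemble the 8.5s to 10.4s spread above, not the values used in the actual engine.

```python
import random
from typing import Optional


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    jitter: float = 0.15,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    rng = rng or random.Random()
    # Exponential growth: base, 2*base, 4*base, ... limited by cap.
    exponential = min(cap, base * (2 ** attempt))
    # Spread the delay by +/- `jitter` so retries do not all land together.
    return exponential * rng.uniform(1 - jitter, 1 + jitter)


rng = random.Random(42)  # seeded only to make the demo repeatable
delays = [backoff_delay(attempt, rng=rng) for attempt in range(5)]
```

A job's next_retry_at would then be the current time plus this delay. Other jitter strategies exist, such as picking uniformly between zero and the exponential value.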

How I Approached The Task

I was genuinely confused when I received the task, so the first thing I did was give the task to Claude to break it down. The prompt asked for a breakdown and the key concepts I needed to study to understand and implement the task. After this, I went to YouTube and searched through different videos for explanations.
I spent more time doing research than writing the code for that task, because the code was genuinely very easy to write, maybe because of how much research I did.
After understanding, I wrote a summary of what I got, and questions I had, and headed back to Claude. It was only after I had understood the concept to its core that I started writing the code, and it took me just an afternoon to complete the task.

What Broke and How I Fixed it

During implementation, the main things that broke had to do with my use of the sqlite3 library, and how it differs from Postgres.
Although it uses SQL, it is file-based, unlike PostgreSQL.
I also had some issues with ES modules and CommonJS during the implementation, because I did not use any framework to bootstrap it - I wrote from scratch.

What I Took Away From It

Concepts:

How workers are essentially setInterval or recursive setTimeout loops backed by a storage layer, such as a DB, or Redis in the case of BullMQ.
How exponential backoff and jitter work mathematically, not just conceptually
The thundering herd problem and why jitter solves it
What transient errors are and why the distinction between client errors and server errors matters for retry logic, including the exceptions to it
How SQLite DB works, using it for the first time
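
The client-versus-server error distinction from the list above can be sketched as a small classifier. The exact set of exceptions (408 and 429 here) is an assumption for illustration; the engine's real rules may differ.

```python
from typing import Optional

# Assumed exceptions to "client errors are permanent":
# 408 Request Timeout and 429 Too Many Requests usually succeed later.
RETRYABLE_CLIENT_STATUSES = {408, 429}


def is_retryable(status: Optional[int] = None, network_error: bool = False) -> bool:
    """Decide whether a failed call is worth retrying."""
    if network_error:
        return True  # timeouts and connection resets are transient
    if status is None:
        return False
    if 500 <= status <= 599:
        return True  # server-side failure: the same request may work later
    return status in RETRYABLE_CLIENT_STATUSES
```

A 404 or 422 would mark the job dead immediately, since repeating the same request cannot fix a problem in the request itself.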

Patterns:

The claim-before-process pattern: previously learned in Web3 and applied here, to prevent duplicate work in polling workers
Using a future timestamp as a self-healing lock instead of always relying on boolean flags
How recursive setTimeouts can be a safe alternative to setInterval for async work loops
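
The first two patterns can be combined in one query pair. The sketch below uses Python's built-in sqlite3 module rather than the Node.js library from the project, and the table layout, column names, and 30-second lease are assumptions for illustration.

```python
import sqlite3
from typing import Optional


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT NOT NULL,"
        " attempts INTEGER NOT NULL DEFAULT 0, next_retry_at REAL NOT NULL)"
    )
    return db


def claim_next_job(
    db: sqlite3.Connection, now: float, lease_seconds: float = 30.0
) -> Optional[int]:
    """Claim one due job by pushing its next_retry_at into the future."""
    row = db.execute(
        "SELECT id, next_retry_at FROM jobs"
        " WHERE status = 'pending' AND next_retry_at <= ?"
        " ORDER BY next_retry_at LIMIT 1",
        (now,),
    ).fetchone()
    if row is None:
        return None
    job_id, seen_retry_at = row
    # The UPDATE only matches if nobody changed next_retry_at since the SELECT,
    # so two workers cannot both claim the same job.
    cursor = db.execute(
        "UPDATE jobs SET next_retry_at = ?, attempts = attempts + 1"
        " WHERE id = ? AND status = 'pending' AND next_retry_at = ?",
        (now + lease_seconds, job_id, seen_retry_at),
    )
    db.commit()
    return job_id if cursor.rowcount == 1 else None
```

If a worker crashes after claiming, nothing needs to be unlocked: once the lease time passes, the job is due again and another worker picks it up.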

Debugging Techniques

Using console.time() to track time in the console. Used in debugging, but not currently present

Why I Picked It

Among the many individual tasks that were done throughout the internship, this was the most exciting, because the retry engine, exponential backoff and jitter, were all completely foreign concepts to me before the task. After the task, I understood these things very well in my opinion, and good enough to take on the next task, which was basically a highly scoped-up version of it - a full production-grade Job Scheduler.

Project Details:

Team Task - Real-Time Inverter Metrics Streaming

In the HNG Internship, I was the backend track lead, and general technical leader of the EnergyIQ team. We built an application that turns solar inverter data, obtained from the specific inverter brand APIs, into actionable intelligence for financial optimization, costs and savings, and device maintenance, for the user.

Initially, the flow had the frontend polling the backend for updates from these APIs. This would mean a browser requests data, and the backend requests the data from the external service, gets it, and then returns it to the user, while also storing it in the DB.

The Problem

The problem is what is mentioned in the section above. For a handful of users, this works very well. However, as the user base for the app grows, this starts to crack under pressure. EnergyIQ supports multiple users viewing the dashboard for a particular inverter, in a feature known as Team Access. This means that if 10 people are viewing the dashboard of a particular inverter, their 10 different browser instances periodically make requests for the same data. This also means that if no client asks, the backend doesn't get the inverter data, and cannot use it for the other things it should do. Also, these brand APIs have rate limits and prescribed request intervals, so it becomes chaos when multiple clients ask for data at different times. One simple solution is for clients to query the DB instead, but this way, they don't get data the moment it is available, and for critical cases, this latency can mean a lot.

What It Was

As a result, I created:

a single poller service for the brand APIs
a redis pubsub service for publishing data
an SSE endpoint for connecting clients to this stream by subscribing

The poller service is a background service that polls all the brand APIs at scheduled times, depending on the specific rate limit for that brand's API. This way, it consolidates the traffic into a few organized requests, using Promise.all to make these requests concurrently. When it makes these requests, if there are errors, the errors are handled and logged.
If the data is gotten successfully, it uses the pubsub service to publish this data to a specific channel, which is determined by the id of the inverter for which it has just polled.
The SSE endpoint enables the client to connect once to the server and listen for updates, by subscribing to the same channel pattern used by the poller/pubsub service.
As a result, data is gotten real-time, eliminating the polling from the frontend completely, and replacing chaos with order.
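
The publish and subscribe flow described above can be illustrated without a running Redis server. The sketch below is a Python stand-in, not the project's Node.js code: InMemoryPubSub approximates Redis pattern subscriptions with glob matching, and the channel name inverter:42:metrics and the payload field are hypothetical.

```python
import fnmatch
import json
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryPubSub:
    """Stand-in for Redis pub/sub with glob-style pattern subscriptions."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def psubscribe(self, pattern: str, handler: Callable[[str, str], None]) -> None:
        self._handlers[pattern].append(handler)

    def publish(self, channel: str, message: str) -> int:
        delivered = 0
        for pattern, handlers in self._handlers.items():
            if fnmatch.fnmatchcase(channel, pattern):
                for handler in handlers:
                    handler(channel, message)
                    delivered += 1
        return delivered


def format_sse(data: dict, event: str = "metrics") -> str:
    # One SSE frame: an event name, a data line, then a blank line.
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


bus = InMemoryPubSub()
frames: List[str] = []

# What the SSE endpoint would do per connected client: subscribe, then
# write each message to the open HTTP response as an SSE frame.
bus.psubscribe("inverter:42:*", lambda ch, msg: frames.append(format_sse(json.loads(msg))))

# What the poller would do after a successful brand API call.
delivered = bus.publish("inverter:42:metrics", json.dumps({"soc": 87}))
```

The poller never calls the endpoint and the endpoint never calls the poller; the channel name is the only thing they share.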

How I Approached It:

As the head of the team, I chose the task for myself, since I had been learning about different API and endpoint types and shapes. I thought of different solutions to it, of which the main ones were websockets, long-polling, server-sent events using the EventSource API, or Redis Pubsub + SSE, which is the one I chose.

The real challenge was how to make sure it was separate from other services i.e. it didn't result in circular dependencies by calling a function in the inverters service directly to deliver the data. One of the important design justifications in this implementation was avoiding the poller service being called by the endpoint controller directly.

Before writing a single line of code, I did my research to find out why redis was as fast as it was, how it was able to do a number of different things (also used in cache), and how SSE endpoints worked, and why they differ from websockets.

After this, I wrote an implementation plan, and used it.

What broke and How I fixed:

Here are some things that broke:

Provision for a sandbox
Initially, EnergyIQ had no inverter to test this feature on. I had to write a mock-inverter-server that periodically emits data that makes sense. This server was configured to run its own internal physics, and is a worker that emits data derived from that physics on every tick, with daytime and night time awareness. I wrote this, deployed it, and created a new inverter brand to be used in development, which I called the SANDBOX INVERTER. This also turned out to be central to EnergyIQ, as it was used to support implementation of basically every other feature.
The device type assumption
I hardcoded deviceType: 'min' in the Growatt adapter. When I tested against a real inverter, it returned empty arrays for min, inv, sph, and max. The fix was discovering the device was type: 2 (storage) and switching to deviceType: storage, and later making device type dynamic, stored at onboarding from the device list response rather than hardcoded.
Content-Type for the Growatt v4 endpoint
The initial adapter was sending JSON. The Growatt v4 queryLastData endpoint requires application/x-www-form-urlencoded. The fix was switching to URLSearchParams and setting the correct Content-Type header.
Field name mismatches from the TRD
The TRD documented bmsSOC - the actual API returned bdc1Soc and bmsSoc as separate fields depending on the device type. The storage response used completely different field names (capacity for SoC instead of bmsSOC, vBat instead of bdc1Vbat). The fix was mapping the storage-specific fields correctly.
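
The field mapping fix can be pictured as a lookup table per device type. The sketch below is in Python for illustration only; the raw field names come from the text above, while the normalized keys (soc_percent, battery_voltage), the fallback order for non-storage devices, and the function names are assumptions.

```python
from typing import Any, Dict, Optional, Sequence

# Raw field names to try, per normalized key.
STORAGE_FIELDS = {"soc_percent": ("capacity",), "battery_voltage": ("vBat",)}
# Assumed fallback order for the other device types.
DEFAULT_FIELDS = {"soc_percent": ("bmsSoc", "bdc1Soc"), "battery_voltage": ("bdc1Vbat",)}
FIELDS_BY_DEVICE_TYPE = {"storage": STORAGE_FIELDS}


def first_present(raw: Dict[str, Any], names: Sequence[str]) -> Optional[float]:
    for name in names:
        value = raw.get(name)
        if value is not None and value != "":
            return float(value)
    return None


def normalize(device_type: str, raw: Dict[str, Any]) -> Dict[str, Optional[float]]:
    """Map a brand-specific response onto one internal shape."""
    fields = FIELDS_BY_DEVICE_TYPE.get(device_type, DEFAULT_FIELDS)
    return {key: first_present(raw, names) for key, names in fields.items()}
```

Keeping the mapping in data rather than in branching code means a newly discovered device type only needs a new table entry.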

What I Took Away From It:

From this task:

I learned how Redis works, and how it uses the RAM for data storage, for speed.
It was my first implementation ever of a lasting connection between client and server, so I got to understand practically how server-sent events work.
I learned alternatives that can be used in other cases. I had heard about Kafka for streaming. While it was not a direct solution to this, it was worth learning how and where it is used.

Why I Picked This:

I picked this because it is still my proudest contribution to the team, not because it is a very big thing, but because it enhanced EnergyIQ a lot. Many other services now use pattern pubsub (listen for updates on patterned channels) to perform their functions, without need to poll the DB. It became a backbone for the entire app, since the app is centered around getting and using solar inverter data from the brand APIs.

It is something worth being proud of.

CONCLUSION

In conclusion, I wrote my first Express server in HNG, coming with the intention of learning how systems work, and understanding Systems Design. It is not a destination, but I can proudly say I am moving fast and effectively on that journey, as a result of the well thought-through tasks that have been given in this HNG cohort. I am proud of myself.

How I Created a DDoS Protection Engine

Wilfrid Okorie — Wed, 29 Apr 2026 20:33:14 +0000

As part of my tasks in HNG14, track DevOps, stage 3, I was to build an engine to protect a live Nextcloud server from DDoS attacks, without using any existing security tools like Fail2Ban.

In summary, this means writing a program that watches traffic in real time, learns what regular traffic looks like, and automatically locks out attackers the moment something goes wrong, i.e. in times of suspicious traffic spikes.

This post explains exactly how I did it — in plain English, no security background required.

What is a DDoS Attack

DDoS stands for Distributed Denial of Service. Imagine there is a club that has regular traffic, say 10 people entering every minute on average. A DDoS attack would be an attacker sending 200 random people to stand in line without entering, so that honest people who want to enter the club are denied the service.
In this case, instead of fake customers, it is fake HTTP requests, thousands per second, flooding your server until it cannot serve real requests anymore.
The protection engine serves as the bouncer in such a case.

Architecture: The Pieces that Work Together

In this setup, there are three pieces to focus on that work together. Here is a diagram of the flow:

- Internet Traffic

- Nginx -> reverse proxy, logs everything to JSON

- Nextcloud -> the actual app, don't touch this

- The Detector Daemon -> reads Nginx logs continuously

The tool:
- Detects anomalies
- Blocks IPs via iptables
- Sends Slack alerts
- Serves a live dashboard
- Auto-unbans on schedule

The idea is that Nextcloud runs behind Nginx (which is a web server, acting as a gatekeeper).
Nginx logs every request that comes in, in real time. The tool adapts to the traffic it sees: it reads the logs as they arrive, calculates the average request rate over a given period to understand what normal is, and then reacts to suspicious spikes in this rate. The reaction is automatically blocking attackers.
In addition, there is a live web dashboard showing what is happening.

Here are the steps to building such a tool:

1: Set Up Your VPS

Go to your cloud provider (I used AWS). Create an EC2 instance (or a "Virtual machine", or whatever it is called with your cloud provider) with at least 2 vCPUs and 2GB RAM. Start the instance, and copy the Public IPv4.

With your instance running, SSH into it and install the tools you need:

sudo apt update && sudo apt upgrade -y
# docker-compose-v2 provides the "docker compose" subcommand on recent Ubuntu releases
sudo apt install -y docker.io docker-compose-v2 git
sudo systemctl enable docker && sudo systemctl start docker

Also open these ports in your cloud firewall (AWS calls it a Security Group):

22 - SSH
80 - HTTP (Nginx/Nextcloud)
443 - HTTPS
5000 - Your detector dashboard

Point a domain or subdomain at your server's public IP - you'll need this for the dashboard URL. If your IP changes after restarting the instance (it will on AWS unless you use an Elastic IP), update your DNS A record.

Problems I faced at this step:

The default Ubuntu image comes with containerd already installed, which conflicts with docker.io. Remove it first, then install Docker.
I initially created an instance with less RAM and a tiny 8GB disk, and spent a lot of time debugging crashes that were simply caused by running out of memory and disk space. Save yourself the pain: start with a t3.small (2 vCPU, 2GB RAM) and a 16GB disk minimum if you are using AWS.
Don't forget to also allow port 5000 in UFW (Ubuntu's local firewall) - I missed this and spent time confused about why the dashboard wasn't accessible even though the Security Group was correct:

sudo ufw allow 5000

2: Set Up the Docker Compose Stack

I did all of this in my Python codebase, and pulled from git from within my EC2 instance, but besides the code, here is what you need:

Create your project folder:

mkdir -p ~/hng_devops_stage_3/nginx
cd ~/hng_devops_stage_3

Create docker-compose.yml:

version: "3.9"

volumes:
  HNG-nginx-logs:      # nginx writes here, detector reads here
  detector-audit:      # persists audit logs across restarts

services:

  nginx:
    image: nginx:alpine
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - HNG-nginx-logs:/var/log/nginx
    depends_on:
      - nextcloud

  nextcloud:
    image: kefaslungu/hng-nextcloud
    restart: unless-stopped
    volumes:
      - HNG-nginx-logs:/var/log/nginx:ro

  detector:
    build:
      context: ./detector
      dockerfile: Dockerfile
    restart: unless-stopped
    network_mode: host        # required for iptables to affect host firewall
    env_file: .env            # contains your Slack webhook URL
    volumes:
      - HNG-nginx-logs:/var/log/nginx:ro
      - ./config.yaml:/app/config.yaml:ro
      - detector-audit:/var/log/detector
    cap_add:
      - NET_ADMIN             # required to run iptables commands
    depends_on:
      - nginx
    environment:
      - DETECTOR_CONFIG=/app/config.yaml

Here are some things to note:

Why network_mode: host? Docker containers normally have their own isolated network. But iptables rules you add inside a container only affect that container, not the actual host machine. With host networking, the container shares the host's network stack, so iptables rules you add actually block traffic at the server level. Without this, your bans do nothing.
Why cap_add: NET_ADMIN? By default, containers can't modify firewall rules, since that is a privileged operation. This capability grants the network administration permission needed for that, without running the whole container in privileged mode.

Why a named volume HNG-nginx-logs? This is the shared pipe between Nginx and your detector. Nginx writes logs into it. Your detector reads from it. The name must be exactly HNG-nginx-logs, since the task requires it.
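
To show what host networking and NET_ADMIN make possible, here is a hedged Python sketch of the commands a blocker could build. The function names are hypothetical, and the chain is left as a parameter on purpose: which chain actually sees the traffic depends on the setup (for example, Docker documents the DOCKER-USER chain for filtering traffic to published container ports).

```python
import ipaddress
from typing import List


def ban_command(ip: str, chain: str = "INPUT") -> List[str]:
    """Build the iptables command that drops all packets from one IPv4 address."""
    ipaddress.IPv4Address(ip)  # raises ValueError for anything that is not IPv4
    return ["iptables", "-I", chain, "-s", ip, "-j", "DROP"]


def unban_command(ip: str, chain: str = "INPUT") -> List[str]:
    """Build the matching command that deletes the rule again."""
    ipaddress.IPv4Address(ip)
    return ["iptables", "-D", chain, "-s", ip, "-j", "DROP"]
```

Passing the result to subprocess.run(command, check=True) as a list, rather than building a shell string, avoids shell injection through a crafted log entry.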

When your disk fills up (and it will if you're not careful - the Nextcloud image alone is over 1GB), clean unused Docker data:

sudo docker system prune -f

If you resize your cloud disk, remember to extend the filesystem too:

sudo growpart /dev/nvme0n1 1
sudo resize2fs /dev/nvme0n1p1

3: Configure Nginx

I also did this in my IDE so it would appear when I pulled from github.

Create nginx/nginx.conf:

user  nginx;
worker_processes  auto;
error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    # JSON log format — every field the detector needs
    log_format json_log escape=json
        '{'
            '"source_ip":"$remote_addr",'
            '"timestamp":"$time_iso8601",'
            '"method":"$request_method",'
            '"path":"$request_uri",'
            '"status":$status,'
            '"response_size":$body_bytes_sent'
        '}';

    access_log /var/log/nginx/hng-access.log json_log;

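    # Trust X-Forwarded-For so real client IPs are logged.
    # 0.0.0.0/0 trusts this header from any sender, so clients can spoof it;
    # narrow it to your proxy or load balancer range where possible.
    real_ip_header    X-Forwarded-For;
    set_real_ip_from  0.0.0.0/0;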

    sendfile       on;
    keepalive_timeout  65;

    upstream nextcloud {
        server nextcloud:80;
    }

    server {
        listen 80;
        server_name _;

        location / {
            proxy_pass         http://nextcloud;
            proxy_set_header   Host              $host;
            proxy_set_header   X-Real-IP         $remote_addr;
            proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
            proxy_set_header   X-Forwarded-Proto $scheme;

            client_max_body_size    10G;    # allow large file uploads
            proxy_request_buffering off;
        }
    }
}

Two important things here:

JSON logs: the detector parses these logs line by line. They must be valid JSON. The escape=json directive ensures special characters in URLs don't break the JSON structure.

Real IP forwarding: without real_ip_header X-Forwarded-For, every log entry shows Nginx's internal Docker IP instead of the actual visitor's IP. Your detector would see every request coming from the same internal address and never identify real attackers.
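
Given the log_format above, parsing one line on the detector side could look like the sketch below. The sample line is made up for illustration, and parse_log_line is a hypothetical helper name.

```python
import json
from datetime import datetime
from typing import Optional


def parse_log_line(line: str) -> Optional[dict]:
    """Parse one JSON access-log line; return None for anything malformed."""
    try:
        entry = json.loads(line)
        # $time_iso8601 looks like 2026-04-29T20:33:14+00:00
        entry["timestamp"] = datetime.fromisoformat(entry["timestamp"])
        entry["source_ip"]  # must be present for per-IP tracking
    except (ValueError, KeyError, TypeError):
        return None
    return entry


SAMPLE_LINE = (
    '{"source_ip":"203.0.113.7","timestamp":"2026-04-29T20:33:14+00:00",'
    '"method":"GET","path":"/login","status":200,"response_size":512}'
)
parsed = parse_log_line(SAMPLE_LINE)
```

Returning None instead of raising keeps a single bad line, such as a half-written one during log rotation, from stopping the tailing loop.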

Test Nginx is working before moving on:

sudo docker compose up -d nginx nextcloud
curl http://YOUR_SERVER_IP

You should see the Nextcloud setup page. If you see a 502 error, Nextcloud is still starting up; wait 30 seconds and try again. If you get a port conflict error, something else is using port 80:

sudo systemctl stop nginx    # stop any system nginx
sudo systemctl disable nginx

4: Build the Detector App

Your detector lives in a detector/ folder and is made up of several Python files, each with a single responsibility. Here's what each one does and why it exists:

config.py: loads config.yaml and environment variables. All thresholds live here. Nothing is hardcoded anywhere else in the codebase.

monitor.py: tails the Nginx log file line by line, exactly like tail -f in your terminal. Every new line gets parsed from JSON and fed into the sliding windows. This runs in its own thread continuously.

baseline.py: keeps a 30-minute rolling history of per-second request counts. Every 60 seconds it recalculates the mean and standard deviation. It also maintains per-hour slots so peak-hour traffic doesn't distort off-peak baselines.

detector.py: evaluates current request rates against the baseline. Fires if z-score exceeds 3.0 or rate exceeds 5x the mean. Tightens thresholds for IPs with high error rates.

blocker.py: executes iptables to block flagged IPs and records the ban.

unbanner.py: runs on a schedule, checks expired bans, removes iptables rules, and escalates the backoff level for repeat offenders.

notifier.py: sends HTTP POST requests to your Slack webhook with ban/unban/global alert details.

dashboard.py: a Flask web server serving a live metrics page that refreshes every 3 seconds.

main.py: the entry point. Starts all threads and keeps the daemon running.
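As a rough illustration of what monitor.py's tail -f behaviour might look like, here is a small sketch. The follow function and its max_idle_polls parameter are hypothetical (the latter exists only so the example can terminate), and log rotation is not handled.

```python
import os
import tempfile
import time


def follow(fh, poll_interval=0.2, max_idle_polls=None):
    """Yield complete lines from an open text file as they are appended.

    max_idle_polls=None polls forever, like tail -f. A number stops the
    generator after that many consecutive empty reads (useful for demos).
    """
    buffer = ""
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = fh.readline()
        if not chunk:
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        buffer += chunk
        if buffer.endswith("\n"):  # only hand out fully written lines
            yield buffer.rstrip("\n")
            buffer = ""


# Demo on a temporary file that stands in for the Nginx access log.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write('{"status":200}\n{"status":404}\n')
with open(tmp.name) as fh:
    lines = list(follow(fh, poll_interval=0.0, max_idle_polls=1))
os.unlink(tmp.name)
print(lines)
```

A real monitor would typically call fh.seek(0, os.SEEK_END) first so that it only sees new traffic, then run the generator in its own thread.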

Your config.yaml holds all the tunable values:

slack_webhook_url: "${SLACK_WEBHOOK_URL}"   # loaded from .env at runtime

zscore_threshold: 3.0
rate_multiplier: 5.0
error_rate_multiplier: 3.0

sliding_window_seconds: 60
baseline_window_minutes: 30
baseline_recalc_interval_seconds: 60

dashboard_port: 5000
log_file_path: "/var/log/nginx/hng-access.log"
audit_log_path: "/var/log/detector/audit.log"

For the Slack webhook URL, never put the real URL in your config file if your repo is public. Instead, I created a .env file on my server (which I added to .gitignore) and let Docker inject it as an environment variable:

.env (on your server only, never committed):

SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/REAL/WEBHOOK
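One way the ${SLACK_WEBHOOK_URL} placeholder could be resolved at load time is sketched below. It assumes the YAML file has already been parsed into a dict (raw_config stands in for that), and expand_env is a hypothetical helper name, not necessarily what config.py does.

```python
import os


def expand_env(value):
    """Recursively replace ${VAR} placeholders in strings with environment values."""
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value


# Stand-in for the dict a YAML parser would return for config.yaml.
raw_config = {
    "slack_webhook_url": "${SLACK_WEBHOOK_URL}",
    "zscore_threshold": 3.0,
    "dashboard_port": 5000,
}

# In production Docker injects this from .env; it is set here only for the demo.
os.environ["SLACK_WEBHOOK_URL"] = "https://hooks.slack.com/services/YOUR/REAL/WEBHOOK"

config = expand_env(raw_config)
print(config["slack_webhook_url"])
```

Note that os.path.expandvars leaves unknown variables untouched, so a missing environment variable shows up as a literal ${...} string that is easy to spot at startup.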

Your Dockerfile for the detector:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . ./detector/
CMD ["python", "-m", "detector.main"]

To bring the full stack up:

sudo docker compose up -d --build
sudo docker compose logs -f detector

You should see the Flask dashboard starting and log lines being processed. If you see No space left on device, clean up Docker and resize your disk as described in section 2.

5: Audit Log

Every significant action the detector takes gets written to a structured audit log at /var/log/detector/audit.log. The format is:

[timestamp] ACTION ip | condition | rate | baseline | duration

Real examples from my running system:

[2026-04-28T08:57:09Z] BAN ip=102.90.99.58 | condition=zscore | rate=1.2/s | baseline=1.0/s | duration=600s
[2026-04-28T09:07:28Z] UNBAN ip=102.90.99.58 | condition=backoff-0 | rate=N/A | baseline=1.0/s | duration=1800s
[2026-04-28T09:00:00Z] BASELINE_RECALC ip=global | mean=1.0 | stddev=0.06
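A formatter that produces BAN and UNBAN lines in this shape could look roughly like the sketch below. The function name and parameters are my own; only the output format is taken from the examples above.

```python
from datetime import datetime, timezone


def format_audit_entry(action, ip, condition, rate, baseline, duration_s, when=None):
    """Build one audit line: [timestamp] ACTION ip | condition | rate | baseline | duration."""
    when = when or datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    rate_text = "N/A" if rate is None else f"{rate:.1f}/s"
    return (
        f"[{stamp}] {action} ip={ip} | condition={condition} | rate={rate_text} "
        f"| baseline={baseline:.1f}/s | duration={duration_s}s"
    )


line = format_audit_entry(
    "BAN", "102.90.99.58", "zscore", 1.2, 1.0, 600,
    when=datetime(2026, 4, 28, 8, 57, 9, tzinfo=timezone.utc),
)
print(line)
```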

To read it live:

sudo docker exec $(sudo docker ps -qf "name=detector") tail -f /var/log/detector/audit.log

The detector-audit Docker volume means this log survives container restarts — if your detector crashes and restarts, the full ban history is still there. This matters because the unbanner needs ban history to know which backoff level to apply next.

6: Set Up Slack

Go to api.slack.com/apps and click Create New App
Give it a name (mine was "HNG DDoS Protection Engine") and pick your workspace
In the left sidebar, click Incoming Webhooks, toggle it On
Click Add New Webhook to Workspace → pick the channel you want alerts in, and Allow
Copy the webhook URL that looks like https://hooks.slack.com/services/T.../B.../...
On your server, add it to your .env file:

echo "SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK" > ~/hng_devops_stage_3/.env

Recreate the detector container so it picks up the new value (a plain docker compose restart does not reload environment variables):

sudo docker compose up -d detector

Test it's working by sending a flood of requests to trigger a ban:

for i in {1..300}; do curl -s http://YOUR_SERVER_IP/ > /dev/null; done

Within 10 seconds you should see a Slack message like this:

🚨 IP Banned
IP: YOUR_IP
Condition: zscore
Current Rate: 4.8 req/s
Baseline Mean: 1.0 req/s
Ban Duration: 600s

Wait 10 minutes and you'll get the unban notification automatically. That confirms the full cycle — detection, blocking, alerting, and auto-unban — is working end to end.
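For reference, posting to a Slack incoming webhook only needs a JSON body with a text field. The sketch below uses the standard library; the function names are illustrative and the real notifier.py may be structured differently.

```python
import json
import urllib.request


def build_ban_message(ip, condition, rate, mean, duration_s):
    """Build the JSON payload for a ban alert in the format shown above."""
    text = (
        "🚨 IP Banned\n"
        f"IP: {ip}\n"
        f"Condition: {condition}\n"
        f"Current Rate: {rate:.1f} req/s\n"
        f"Baseline Mean: {mean:.1f} req/s\n"
        f"Ban Duration: {duration_s}s"
    )
    return {"text": text}


def send_to_slack(webhook_url, payload, timeout=5.0):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


payload = build_ban_message("203.0.113.7", "zscore", 4.8, 1.0, 600)
print(payload["text"])
# send_to_slack(os.environ["SLACK_WEBHOOK_URL"], payload)  # needs a real webhook URL
```

Keeping message building separate from sending makes the formatting testable without network access.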

Here is How Some Components Work:

Sliding Window: The problem this solves is how to measure requests per second in real time. A naive solution would be to count requests in fixed one-minute buckets, but that is too coarse compared with how long an attack takes: the attacker could be done before your system has finished counting the minute.
Instead, imagine a stick with a number of spots where items can sit. Each item (an analogy for a request) is placed at one end and moves to the other end over 60 seconds. At any moment, the number of items on the stick gives you your current rate (the count divided by 60 is requests per second). Once an item reaches the far end it falls off, while new items keep coming in.
During an attack, the number of items sitting on the 60-second stick at the same time gets abnormally high, and that is how you know there is an attack.
This is implemented in Python with a double-ended queue (a deque).
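Here is a minimal sketch of the stick analogy using collections.deque. The class and method names are my own, and timestamps are passed in explicitly so the behaviour is easy to follow; a live detector would use the current time.

```python
from collections import deque


class SlidingWindow:
    """Counts events seen in the last window_seconds (the 60-second stick)."""

    def __init__(self, window_seconds=60.0):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def _evict(self, now):
        # Items that have reached the far end of the stick fall off.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()

    def add(self, timestamp):
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now):
        """Requests per second averaged over the window ending at now."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds


window = SlidingWindow(window_seconds=60)
for second in range(120):          # one request per second for two minutes
    window.add(float(second))
steady = window.rate(now=119.0)

for _ in range(240):               # a burst of 240 requests half a second later
    window.add(119.5)
burst = window.rate(now=119.5)

idle = window.rate(now=200.0)      # everything has fallen off the stick
print(steady, burst, idle)
```

Because old timestamps are only ever removed from the left and new ones appended on the right, both operations stay constant time, which is why a deque fits this job.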
Baseline Mean: The function of this is to learn from traffic. The sliding window is very good, but it gives false positives if you do not know how many requests count as "too many" in the first place. For a personal blog, for instance, 50 requests per second is a massive spike. For a large cloud platform or a social media app, it is normal during peak hours. Therefore, you can't hardcode a number for this. The value has to be specific to your actual traffic patterns.
Every second, the detector records how many requests came in, keeping a rolling 30-minute history of these per-second counts. Every 60 seconds, it recalculates two things:
- Mean: the average requests per second over the last 30 minutes
- Standard deviation: how much the rate typically varies from the mean

It also maintains per-hour slots, so that peak times are kept separate from quiet times. The baseline mean is also floored at 1.0, which guards against division by zero. All of this means the baseline needs some time to warm up, but after that period it reflects what normal looks like for the server.
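The rolling history and periodic recalculation can be sketched as follows. This is a simplified illustration: the class name is hypothetical, the per-hour slots are left out, and I use the population standard deviation, which may not match the detector's exact choice.

```python
from collections import deque
from statistics import fmean, pstdev


class Baseline:
    """Rolling history of per-second request counts with a floored mean."""

    def __init__(self, window_minutes=30, mean_floor=1.0):
        # maxlen makes the deque drop the oldest second automatically.
        self.counts = deque(maxlen=window_minutes * 60)
        self.mean_floor = mean_floor
        self.mean = mean_floor
        self.stddev = 0.0

    def record_second(self, request_count):
        self.counts.append(request_count)

    def recalculate(self):
        """Meant to be called every 60 seconds."""
        if len(self.counts) >= 2:
            self.mean = max(fmean(self.counts), self.mean_floor)
            self.stddev = pstdev(self.counts)
        return self.mean, self.stddev


busy = Baseline()
for second in range(60):
    busy.record_second(2 if second % 2 == 0 else 4)   # alternating 2 and 4 req/s
busy_mean, busy_stddev = busy.recalculate()

quiet = Baseline()
for _ in range(60):
    quiet.record_second(0)                            # no traffic at all
quiet_mean, quiet_stddev = quiet.recalculate()
print(busy_mean, busy_stddev, quiet_mean, quiet_stddev)
```

The quiet case shows the floor at work: the raw mean is 0, but the baseline reports 1.0.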
Detection Logic Decision: From the above two, we have a current rate, a mean, and a standard deviation. The detector calculates a z-score:

z = (current_rate - mean)/standard_deviation

The z-score answers how many standard deviations above normal a particular rate is. A z-score of 1.0 means it is slightly above average. 3.0 means this happens by random chance less than 0.3% of the time, assuming roughly normally distributed traffic. 10.0 means something is very wrong.
When something is wrong, the detector takes action by banning the source IP, and a notification is sent (to Slack in this case).
There is also an error surge detector. If an IP is getting errors at a much higher rate than normal, its detection thresholds tighten automatically. This is to catch attackers who might not send high volumes of requests.
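Putting the two triggers together, the decision could be sketched like this. The default thresholds come from config.yaml; the tighten_factor parameter and the "rate_multiplier" label are assumptions of mine, since the article does not say exactly how thresholds tighten during an error surge.

```python
def evaluate(rate, mean, stddev, zscore_threshold=3.0, rate_multiplier=5.0,
             error_surge=False, tighten_factor=0.5):
    """Return the name of the condition that fired, or None if traffic looks normal."""
    if error_surge:
        # Assumption: tightening scales both thresholds down by the same factor.
        zscore_threshold *= tighten_factor
        rate_multiplier *= tighten_factor

    # Guard against a zero standard deviation before dividing.
    if stddev > 0 and (rate - mean) / stddev > zscore_threshold:
        return "zscore"
    if rate > rate_multiplier * mean:
        return "rate_multiplier"
    return None


# Values from the audit log example: rate 1.2/s, mean 1.0, stddev 0.06.
from_audit_log = evaluate(1.2, 1.0, 0.06)
mild = evaluate(1.1, 1.0, 0.06)
flat_baseline = evaluate(6.0, 1.0, 0.0)
mild_with_errors = evaluate(1.1, 1.0, 0.06, error_surge=True)
print(from_audit_log, mild, flat_baseline, mild_with_errors)
```

With the numbers from the audit log, z is about 3.3, which is why a rate of only 1.2 req/s was enough to trigger a ban on a very steady baseline.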

How iptables Blocks an IP

Whenever an IP is flagged, the detector runs the command:

iptables -A INPUT -s 1.2.3.4 -j DROP

iptables is Linux's built-in firewall
-A INPUT adds a rule to the input chain i.e. incoming traffic
-s 1.2.3.4 selects traffic coming from the source IP that is given as argument to the flag.
-j DROP discards the packet.

With this, the attacker's requests never reach Nginx; the Linux kernel drops the packets at the firewall, before any application code runs.
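From Python, the same command can be run with subprocess. The sketch below is an illustration rather than the actual blocker.py: the function names are mine, it only handles IPv4, and it validates the address first so that nothing other than an IP can end up in the command.

```python
import ipaddress
import subprocess


def build_block_command(ip):
    """Return the iptables argument list for dropping traffic from one IPv4 address."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError for anything else
    return ["iptables", "-A", "INPUT", "-s", str(address), "-j", "DROP"]


def block_ip(ip):
    """Apply the rule. Requires root and a host with iptables installed."""
    subprocess.run(build_block_command(ip), check=True)


command = build_block_command("1.2.3.4")
print(" ".join(command))

try:
    build_block_command("1.2.3.4; rm -rf /")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Passing the command as a list (instead of a shell string) means the IP is never interpreted by a shell, which matters because the value ultimately comes from a request header.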

Bans are not permanent by default. The ban duration follows a backoff schedule: the first ban lasts 10 minutes, the second 30 minutes, the third two hours, and the fourth is permanent. This means repeat offenders eventually get permanently blocked.
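The schedule maps naturally onto a small lookup table. In this sketch the names are hypothetical and None stands for a permanent ban:

```python
# 10 minutes, 30 minutes, 2 hours, then permanent (None).
BACKOFF_SCHEDULE_SECONDS = [600, 1800, 7200, None]


def ban_duration(previous_bans):
    """Return the ban length in seconds for an IP with this many earlier bans."""
    index = min(previous_bans, len(BACKOFF_SCHEDULE_SECONDS) - 1)
    return BACKOFF_SCHEDULE_SECONDS[index]


durations = [ban_duration(count) for count in range(6)]
print(durations)
```

This is also why the ban history has to survive restarts: without the count of earlier bans, every offender would start again at 10 minutes.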

## Personal Takeaway:
My personal takeaway from this project was the logic used to build adaptive thresholds. I love the math in that, and how it makes sure - to a good extent - that the thresholds adjust very well to different traffic patterns. It adds the time dimension to the volume of requests, which is what accurate systems need: **CONTEXT**.

Here is the GitHub repository: **https://github.com/OWK50GA/ddos-attack-protection-engine**

Bitcoin Dust Attacks: What They Are and How to Defend Against Them

Wilfrid Okorie — Wed, 15 Apr 2026 06:04:54 +0000

In the context of blockchain, dust refers to tiny amounts of cryptocurrency that are uneconomical to spend.
An amount is considered uneconomical to spend when the transaction fee it incurs is greater than its value.
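As a quick worked example of that definition, the sketch below compares a UTXO's value with the fee needed to spend it. The 68 vbyte figure is an approximate size for spending a single P2WPKH input and is an assumption here; other script types cost more or less.

```python
import math

# Assumption: roughly 68 vbytes are added to a transaction by one P2WPKH input.
P2WPKH_INPUT_VBYTES = 68


def cost_to_spend(fee_rate_sat_per_vb, input_vbytes=P2WPKH_INPUT_VBYTES):
    """Fee in sats attributable to including this one input."""
    return math.ceil(fee_rate_sat_per_vb * input_vbytes)


def is_uneconomical(value_sats, fee_rate_sat_per_vb, input_vbytes=P2WPKH_INPUT_VBYTES):
    """True when spending the UTXO costs more than the UTXO is worth."""
    return cost_to_spend(fee_rate_sat_per_vb, input_vbytes) > value_sats


fee_at_10 = cost_to_spend(10)                 # sats needed at 10 sat/vB
dust_at_10 = is_uneconomical(546, 10)         # 546 sats at 10 sat/vB
dust_at_1 = is_uneconomical(546, 1)           # same UTXO when fees are low
print(fee_at_10, dust_at_10, dust_at_1)
```

The same 546-sat output flips between spendable and unspendable as fee rates move, which is why "dust" is better thought of as relative to current fees than as one fixed number.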

An attack is a malicious action that attempts to do things like steal funds, exploit loopholes in the rules, or disrupt the network. There are different types of attacks in Bitcoin. One of them is the Dust Attack.

How addresses and keys work in Bitcoin today:

Bitcoin uses Elliptic Curve Cryptography, a form of asymmetric cryptography that uses a keypair rather than a single key: a private key and a public key. Public keys are derived from private keys, and the scheme is secured by the elliptic-curve discrete logarithm problem, which makes it extremely difficult to work back from a public key to its private key. There are different forms of spending, but the basic rule is simple: public keys are associated with bitcoins recorded on the blockchain, and whoever controls the corresponding private key can spend those bitcoins by addressing them to some other public key. Addresses are derived from public keys.
Now, the blockchain is a public network, so transactions are completely visible: when a transaction happens, anyone can see which addresses are involved and the amounts spent.
Bitcoiners love their privacy, however, so a natural solution is to control more than one keypair. That way you can give out a different address for each transaction and make it harder for addresses to be traced to you.

Modern wallets have a clever way of doing this: from a single seed, a master private key is derived, and from it a whole tree of child keys (private and public). Each user has a master key that gives rise to descriptors, which are used to deterministically derive new keypairs. Any key in the tree can be a parent to other keys, and the tree can be many levels deep. More on this can be found in the Bitcoin Book.

What This Implies

This means that for every transaction in which you receive coins, you can publish a different address, making it harder to trace the coins to you. There is also the concept of internal and external descriptors. Internal descriptors derive addresses you never share; they are also called change descriptors, because you use them to send change back to yourself in transactions. External descriptors derive the addresses you share to receive coins from other people.
The main point of descriptors and many addresses is privacy through non-reuse. Having many addresses breaks transaction links and helps hide which output of a transaction is your change (to an outside observer, it is not obviously addressed to your wallet). It also limits the damage from a key leak: if a private key is leaked, only that address and its UTXOs are in danger, not your entire wallet.

Note: As you continue through this article, keep in mind that Bitcoin does not use an account-balance model but the UTXO model. Instead of a single balance that you subtract from when you spend, you hold discrete units of cryptocurrency, like separate notes and coins in a physical wallet. Each unit is called an Unspent Transaction Output (UTXO).

Enter Dust Attacks

In dust attacks, the attacker is not directly after your coins. Instead, they want to uncover your identity by plotting your identity graph: a map of which Bitcoin addresses belong to the same wallet. This can be used to deanonymize a person and work out their total holdings. In the past, knowledge of who holds how many bitcoins has led to threats, physical attacks, abuse, and torture, so you want to protect yourself.
The knowledge can also be used to link a pseudonymous identity - yours, to a real-world activity.

How Dust Attacks Happen:

Step 1: Target Selection:
First of all, the attacker identifies the address(es) they want to deanonymize. This is done by looking at the blockchain directly, since transactions are public. They could get the address from anywhere at all.

Step 2: Dust Delivery:
Next, the attacker builds the dust delivery transaction. They can do this any way they like, but to be economical it is usually a single transaction that sends tiny amounts (dust) to all target addresses simultaneously. The amount is chosen carefully: very tiny, yet not so tiny that spending it would make no economic sense, and above the relay floor so that nodes relay the transaction even if the victims never notice it.

Step 3: Waiting/Monitoring:
Next, the attacker waits for the wallet owner to spend the outputs they sent. This could be a gamble, depending on how long it takes for the user to spend the coins. If somehow, the victim's wallet's coin selection algorithm picks any of these UTXOs, alongside a real UTXO, it becomes part of the graph.

Step 4: Clustering:
When the user spends this UTXO, the inputs reveal co-ownership, since all inputs to a transaction usually belong to one controller (this is known as the common-input-ownership heuristic). The attacker now knows that the addresses in the transaction's inputs belong to the target.

Step 5: Graph Expansion:
The attacker can now expand their knowledge graph of you by tracing forward or backward from the clustered addresses, getting closer to whatever goal they have.
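Steps 4 and 5 can be pictured as a union-find over addresses: every transaction merges the addresses in its inputs into one cluster. The sketch below is only an illustration of that idea with made-up address labels; real chain analysis combines this heuristic with many others.

```python
class AddressClusters:
    """Groups addresses that were spent together as inputs of the same transaction."""

    def __init__(self):
        self._parent = {}

    def _find(self, address):
        self._parent.setdefault(address, address)
        while self._parent[address] != address:
            # Path halving keeps lookups fast as clusters grow.
            self._parent[address] = self._parent[self._parent[address]]
            address = self._parent[address]
        return address

    def add_transaction(self, input_addresses):
        roots = [self._find(address) for address in input_addresses]
        for root in roots[1:]:
            self._parent[root] = roots[0]

    def same_owner(self, first, second):
        return self._find(first) == self._find(second)


clusters = AddressClusters()
# The victim's wallet spends the dust (sent to addr_A) together with a UTXO on addr_B.
clusters.add_transaction(["addr_A", "addr_B"])
# Later, a UTXO on addr_B is spent together with one on addr_C.
clusters.add_transaction(["addr_B", "addr_C"])

linked = clusters.same_owner("addr_A", "addr_C")
unrelated = clusters.same_owner("addr_A", "addr_D")
print(linked, unrelated)
```

Note how addr_A and addr_C end up linked even though they never appeared in the same transaction; that transitive growth is the graph expansion described in Step 5.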

How To Combat dust attacks:

From the steps above, the way to defeat a dust attack is simple: do not spend the dust. More accurately, do not spend the dust in an ordinary transaction alongside your other UTXOs. Instead, you can build a transaction with no spendable output, so that the entire dust amount goes to network fees and the trail ends there, or you can mix the dust through a coinjoin with other users' dust. There it gets mixed with hundreds of other dust UTXOs, and the link breaks.

I built a tool in Rust called Hoover that identifies dust attack UTXOs and constructs Partially Signed Bitcoin Transactions (PSBTs) that safely spend them to fees, removing them from the user's wallet.

Hoover

Hoover is one of many tools that have been created to counter dust attacks, and they all share one general idea.
These tools take in your descriptors, internal and/or external, and register them. Then, on your command, they scan your wallet for dust UTXOs and show them to you. They then construct PSBTs that spend these UTXOs to an OP_RETURN output, so that no spendable UTXO is created. These PSBTs can then be signed however you like and broadcast to the network as regular signed transactions.

One critical part of creating the PSBTs is that you do not create a PSBT that contains dust from multiple addresses. The consolidation must be per address, else it compromises the privacy it aims to protect.
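The per-address rule can be sketched as a grouping step that runs before any PSBT is built. Hoover itself is written in Rust; this Python sketch only illustrates the idea, and the dict layout, function name, and 546-sat default are assumptions for the example.

```python
from collections import defaultdict


def group_dust_by_address(utxos, dust_threshold_sats=546):
    """Return {address: [dust UTXOs]} so each group can become its own PSBT."""
    groups = defaultdict(list)
    for utxo in utxos:
        if utxo["value_sats"] < dust_threshold_sats:
            groups[utxo["address"]].append(utxo)
    return dict(groups)


# Made-up wallet contents for illustration.
wallet_utxos = [
    {"txid": "aa", "vout": 0, "address": "addr_A", "value_sats": 330},
    {"txid": "bb", "vout": 3, "address": "addr_A", "value_sats": 400},
    {"txid": "cc", "vout": 1, "address": "addr_B", "value_sats": 500},
    {"txid": "dd", "vout": 0, "address": "addr_C", "value_sats": 250_000},
]

plans = group_dust_by_address(wallet_utxos)
for address, dust in plans.items():
    print(address, [utxo["txid"] for utxo in dust])
```

Two PSBTs would be built here, one for addr_A and one for addr_B. Putting all three dust inputs into a single transaction would hand the attacker exactly the address link they were fishing for.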

Of course, these steps/processes differ from tool to tool, with different tools making innovations in different parts of the process, such as the nature of the PSBT.
In the same light, the part of the process Hoover does differently is the dust identification.

Usually, a tool just checks every UTXO to see whether it is below (or sometimes equal to) the relay floor for its script type. Optionally, a user may configure the minimum number of sats for a UTXO to be considered non-dust, so that with a relay floor of 546 sats, a user may choose to remove every UTXO under 600 sats.
However, a user may by coincidence hold a lot of UTXOs that are not economically unspendable but might still be flagged as dust attack UTXOs; depending on how many there are, sweeping all of them from the wallet may be wasteful.
Another case is that, depending on how sophisticated the attacker is, they could send as much as 700 sats so that the user is much more likely to spend the UTXO and less likely to suspect it, which increases the attacker's chances of succeeding.
Here is what Hoover does differently: Hoover divides dust into different types. UTXOs under the relay floor for a particular script type are automatically counted as dust, and since they are economically unspendable, they are added to the "psbt staging area".
Hoover goes further and inspects UTXOs up to an amount configurable by the user, trying to detect dishonest patterns from the source, and marks each UTXO in the list with a suspicion level. This lets a user know that a UTXO may be from a dust attack even if it is not canonical dust. Some of the flags used to detect these dishonest patterns are:

- Dust UTXOs received on a change address: this could be dust, or it could be a bit more than a dust UTXO, but your change addresses should not receive UTXOs from external parties in the first place. A person should only know your change address if they have been monitoring your transactions on the blockchain. This particular flag is highly suspicious.
- Multiple of your addresses receiving dust from the same txid: this is also almost certainly a dust attack. The attacker has done their research well, and at this point spending any of that dust could help them finally achieve their goal. The suspicion on this flag is also very high.
- Number of outputs of the sending tx: this is another fingerprint of a dust attack. The outputs of the sending transaction are inspected, and if a high ratio of them are dust, it is almost certainly a dust attack. The suspicion on this flag is also high, but not as high as the previous two.
- Suspicious round value: in case it was not caught as the first type of dust, a UTXO whose value is exactly at, or very close to, the relay floor is inspected as a possible dust attack UTXO. Other checks are applied too, but being so close to the relay floor adds suspicion weight to the UTXO.
- Same address sending dust in more than one tx: receiving dust from the same address more than once, even when not in the same transaction, is also a bad sign.

There are more flags Hoover uses for detection (some still being implemented), but the idea is the same: In what context did my wallet receive this UTXO? That will tell whether or not the UTXO is a dust-attack UTXO, even for UTXOs above the canonical dust threshold.
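To show how context-based flags like these might be combined into a suspicion level, here is a small Python sketch. It is not Hoover's implementation (Hoover is written in Rust): the field names, the weights, the 60-sat margin, and the 0.5 ratio cutoff are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical weights. The article only says the first two flags are the most
# suspicious and the third slightly less so.
FLAG_WEIGHTS = {
    "change_address": 5,
    "multi_address_same_tx": 5,
    "mostly_dust_outputs": 4,
    "near_relay_floor": 2,
    "repeat_sender": 2,
}


@dataclass
class UtxoContext:
    value_sats: int
    on_change_address: bool
    own_addresses_paid_by_tx: int   # how many of your addresses the sending tx paid
    dust_outputs_in_tx: int
    total_outputs_in_tx: int
    sender_seen_before: bool


def suspicion_flags(ctx, relay_floor=546, floor_margin=60, dust_ratio_cutoff=0.5):
    """Return the names of the flags raised by the context of one received UTXO."""
    flags = []
    if ctx.on_change_address:
        flags.append("change_address")
    if ctx.own_addresses_paid_by_tx > 1:
        flags.append("multi_address_same_tx")
    if ctx.total_outputs_in_tx and (
        ctx.dust_outputs_in_tx / ctx.total_outputs_in_tx > dust_ratio_cutoff
    ):
        flags.append("mostly_dust_outputs")
    if relay_floor <= ctx.value_sats <= relay_floor + floor_margin:
        flags.append("near_relay_floor")
    if ctx.sender_seen_before:
        flags.append("repeat_sender")
    return flags


def suspicion_score(flags):
    return sum(FLAG_WEIGHTS[flag] for flag in flags)


ctx = UtxoContext(value_sats=600, on_change_address=True, own_addresses_paid_by_tx=3,
                  dust_outputs_in_tx=40, total_outputs_in_tx=42, sender_seen_before=False)
flags = suspicion_flags(ctx)
score = suspicion_score(flags)
print(flags, score)
```

The 600-sat UTXO in this example is above the 546-sat floor, so a plain threshold check would miss it, yet its context raises four flags.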

Conclusion

When a dust attack succeeds, the attacker has clustered your addresses. The attacker may now know that you hold significant amounts of bitcoin, and you become a target for extortion; kidnapping for ransom has happened to known Bitcoin holders. For an individual, transaction history exposure could reveal salary, spending habits, or political donations. Every transaction you have made or will ever make becomes readable in the context of your identity. Social engineering also becomes possible with this information in the hands of bad actors: they know when you received large amounts and which services you use, and can do terrible things such as well-timed impersonation of those services.

What are some other flags you think would mean UTXOs probably are part of a dust attack campaign?