The Struggle
I remember the payment bug that kept me up until 3 AM.
Stripe was sending an invoice.payment_failed webhook, but only in production.
I checked my logs: Truncated.
I checked my tunneling tool: Session expired.
I checked my SaaS bin: History limit reached.
I realized I didn't have a debugging tool; I had a toy.
The Solution: Webhook Debugger
I decided to build my own solution. But I didn't just want a "bucket" that catches requests. I wanted to build a Reference Implementation for how a modern, secure Node.js application should look in 2026.
Here are the 13 Engineering Patterns I used to build it:
1. Global SSE Heartbeat & Padding
Most SSE implementations leak memory by creating a timer per connection.
My Approach: A single setInterval iterates a Set of clients.
The Pro Tip: I added res.write(' '.repeat(2048)) (2KB whitespace) and X-Accel-Buffering: no headers. Why? Because corporate firewalls (and Nginx) love buffering streams. The padding forces them to flush the connection immediately.
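Here is a minimal sketch of the single-timer idea, not the project's actual code. The names clients, addClient, sendHeartbeat and startHeartbeat are hypothetical, and the padding is written as an SSE comment line (a line starting with ":"), which is an assumption on top of the raw whitespace described above.

```javascript
// Hypothetical sketch: one shared Set of clients and one shared timer.
const clients = new Set();

// Register a response as an SSE client and push the initial padding.
function addClient(res) {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('X-Accel-Buffering', 'no'); // asks Nginx not to buffer this response
  res.write(':' + ' '.repeat(2048) + '\n\n'); // 2 KB of padding, sent as an SSE comment
  clients.add(res);
  res.on('close', () => clients.delete(res)); // forget the client once the socket closes
}

// One pass over every connected client. Comment lines are ignored by EventSource.
function sendHeartbeat() {
  for (const res of clients) {
    res.write(': heartbeat\n\n');
  }
}

// A single setInterval for the whole process instead of one per connection.
function startHeartbeat(intervalMs = 30000) {
  const timer = setInterval(sendHeartbeat, intervalMs);
  timer.unref(); // the heartbeat alone should not keep the process alive
  return timer;
}
```

Because the timer count stays at one no matter how many clients connect, cleanup is just a Set deletion on close.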
2. SSRF Protection (DNS & IP Verification)
Allowing user-defined webhooks is dangerous (Server-Side Request Forgery).
The Fix: I wrote a custom validator in src/utils/ssrf.js that resolves the DNS before the request. It checks the IP against a blocklist of private ranges (RFC 1918) and cloud metadata services (169.254.169.254). It even handles IPv4-mapped IPv6 addresses (::ffff:127.0.0.1).
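To make the check concrete, here is a rough sketch of what such a validator could look like. It is not the contents of src/utils/ssrf.js; the function names (isBlockedIPv4, isBlockedAddress, assertSafeUrl) and the exact blocklist are assumptions for illustration.

```javascript
const dns = require('node:dns/promises');
const net = require('node:net');

// True for loopback, RFC 1918 private ranges, link-local and 0.0.0.0/8.
function isBlockedIPv4(ip) {
  const [a, b] = ip.split('.').map(Number);
  return (
    a === 0 ||
    a === 10 ||                           // 10.0.0.0/8
    a === 127 ||                          // loopback
    (a === 169 && b === 254) ||           // link-local, includes 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) ||  // 172.16.0.0/12
    (a === 192 && b === 168)              // 192.168.0.0/16
  );
}

function isBlockedAddress(ip) {
  if (net.isIPv4(ip)) return isBlockedIPv4(ip);
  if (!net.isIPv6(ip)) return true; // not an IP address at all: reject
  const lower = ip.toLowerCase();
  // IPv4-mapped IPv6 in dotted form, e.g. ::ffff:127.0.0.1
  const dotted = lower.match(/^::ffff:(\d+\.\d+\.\d+\.\d+)$/);
  if (dotted) return isBlockedIPv4(dotted[1]);
  // IPv4-mapped IPv6 in hex form, e.g. ::ffff:7f00:1
  const hex = lower.match(/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
  if (hex) {
    const hi = parseInt(hex[1], 16);
    const lo = parseInt(hex[2], 16);
    return isBlockedIPv4(`${hi >> 8}.${hi & 255}.${lo >> 8}.${lo & 255}`);
  }
  // Loopback, unspecified, unique-local (fc00::/7) and link-local (fe80::/10).
  return lower === '::1' || lower === '::' ||
    /^f[cd][0-9a-f]{2}:/.test(lower) || /^fe[89ab][0-9a-f]:/.test(lower);
}

// Resolve the hostname first, then reject if any resolved address is blocked.
async function assertSafeUrl(rawUrl) {
  const hostname = new URL(rawUrl).hostname.replace(/^\[|\]$/g, '');
  const addresses = await dns.lookup(hostname, { all: true });
  for (const { address } of addresses) {
    if (isBlockedAddress(address)) throw new Error(`Blocked address: ${address}`);
  }
  return addresses;
}
```

One caveat with any resolve-then-request design: if the HTTP client resolves the hostname again, the second answer can differ from the one that was validated, so connecting to the validated IP directly is the safer follow-up.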
3. Deep Replay with Exponential Backoff
Retrying a failed webhook isn't just "try again".
The Logic: If the destination yields a transient error (ECONNABORTED, 503), the system waits 1s, then 2s, then 4s.
Header Stripping: The replay engine automatically strips sensitive headers (Authorization, Cookie) so you don't accidentally send production credentials to your local dev environment.
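The two behaviours above can be sketched roughly like this. This is an illustration rather than the replay engine itself: replayWithBackoff, stripSensitiveHeaders, the injectable sleep option and the err.status check are all assumptions.

```javascript
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie']);
const TRANSIENT_CODES = new Set(['ECONNABORTED']);

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Return a copy of the headers without credentials (header names are case-insensitive).
function stripSensitiveHeaders(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => !SENSITIVE_HEADERS.has(name.toLowerCase()))
  );
}

// Assumption: errors carry a Node-style `code` or an HTTP `status`.
function isTransient(err) {
  return TRANSIENT_CODES.has(err.code) || err.status === 503;
}

// Call `send` until it succeeds, a non-transient error occurs, or attempts run out.
async function replayWithBackoff(send, options = {}) {
  const { maxAttempts = 3, baseDelayMs = 1000, sleep = defaultSleep } = options;
  for (let attempt = 1; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (attempt >= maxAttempts || !isTransient(err)) throw err;
      await sleep(baseDelayMs * 2 ** (attempt - 1)); // 1s, 2s, 4s, ...
    }
  }
}
```

Passing sleep in as an option keeps the backoff testable without waiting for real seconds to pass.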
4. Timing-Safe Authentication
Never compare API keys with ===.
The Attack: An attacker can measure how long your server takes to say "No" to guess the key character-by-character.
The Fix: I use crypto.timingSafeEqual in src/utils/auth.js to ensure the comparison takes the exact same time whether the key is 99% correct or 0% correct.
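A small sketch of the comparison, assuming keys arrive as strings. The safeCompare name and the hashing step are illustrative choices, not necessarily what src/utils/auth.js does. Hashing is used here because crypto.timingSafeEqual throws when the two buffers differ in length.

```javascript
const crypto = require('node:crypto');

// Compare two secrets without leaking how many leading characters matched.
function safeCompare(provided, expected) {
  // SHA-256 digests are always 32 bytes, so the length precondition always holds.
  const a = crypto.createHash('sha256').update(String(provided)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}
```

Usage would look like safeCompare(req.headers['x-api-key'], configuredKey), where the header name is only an example.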
5. Memory-Safe Rate Limiting (LRU)
Standard rate limiters are often unbounded in-memory maps. If a botnet hits you with 1 million IPs, the map grows until your server crashes (OOM).
The Pattern: My RateLimiter uses a Sliding Window with LRU Eviction. It hard-caps at 1,000 entries. If the map is full, the oldest IP is evicted to make room. It prioritizes stability over strictness.
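As a rough illustration of the pattern, here is a sliding-window limiter with a hard cap. It is a sketch under assumptions: the class shape, option names and the injectable now argument are mine, and it relies on the fact that a JavaScript Map iterates in insertion order.

```javascript
class RateLimiter {
  constructor({ limit = 100, windowMs = 60000, maxEntries = 1000 } = {}) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
    this.hits = new Map(); // ip -> timestamps of recent requests
  }

  allow(ip, now = Date.now()) {
    // Sliding window: keep only the timestamps still inside the window.
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    // Delete first so the later set() moves this IP to the "most recent" end.
    this.hits.delete(ip);
    if (this.hits.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used entry.
      const oldest = this.hits.keys().next().value;
      this.hits.delete(oldest);
    }
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(ip, recent);
    return allowed;
  }
}
```

The trade-off is visible in the eviction branch: an evicted IP gets a fresh window, which is less strict but keeps memory bounded.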
6. Memory-Safe Dataset Filtering
Searching for a single event in a 1GB JSON dataset will crash a standard Node.js process if the whole dataset is loaded into memory first.
The Solution: Iterative Pagination. The /replay endpoint reads chunks of 1000 items (dataset.getData({ limit, offset })), searches for the event ID, and fetches the next chunk only if not found. This ensures we never load the entire dataset into memory.
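The loop can be sketched like this. It assumes getData resolves to an object with an items array and that each item carries an id field; findEventById is a hypothetical name.

```javascript
// Scan the dataset one chunk at a time and stop as soon as the event is found.
async function findEventById(dataset, eventId, chunkSize = 1000) {
  for (let offset = 0; ; offset += chunkSize) {
    const { items } = await dataset.getData({ limit: chunkSize, offset });
    const match = items.find((item) => item.id === eventId);
    if (match) return match;
    if (items.length < chunkSize) return null; // a short page means the end was reached
  }
}
```

Only one chunk is referenced at a time, so earlier chunks can be garbage collected while the scan continues.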
7. Input Sanitization & Coercion
Inputs from the wild are messy. Strings look like numbers; booleans look like strings.
The Pattern: A dedicated coerceRuntimeOptions utility in src/utils/config.js recursively walks the input object, coercing "true" -> true and "5" -> 5, so that type mismatches don't crash the app at runtime.
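A recursive walk of this kind might look like the sketch below. The coerceValue name and the exact rules (trimmed "true"/"false", any finite numeric string) are assumptions, not the real coerceRuntimeOptions.

```javascript
// Recursively convert boolean-like and number-like strings to real booleans and numbers.
function coerceValue(value) {
  if (Array.isArray(value)) return value.map(coerceValue);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, coerceValue(inner)])
    );
  }
  if (typeof value === 'string') {
    const trimmed = value.trim();
    if (trimmed === 'true') return true;
    if (trimmed === 'false') return false;
    // Empty strings stay strings; Number('') would otherwise become 0.
    if (trimmed !== '' && Number.isFinite(Number(trimmed))) return Number(trimmed);
  }
  return value;
}
```

A blanket rule like this would also turn an all-digit API key into a number, so a real implementation probably needs a list of fields to leave alone.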
8. Index.html Caching
We serve a UI, but we aren't a CDN.
The Optimization: The index.html template is read from disk once at startup and cached in a string variable (indexTemplate). Placeholders like {{VERSION}} are replaced on-the-fly using escapeHtml(), but the disk I/O cost is paid only once.
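Here is one way the read-once, render-many idea could be written. createIndexRenderer and this particular escapeHtml are illustrative stand-ins, not the project's own functions.

```javascript
// Escape the five characters that matter inside HTML text and attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// `readTemplate` runs exactly once; the returned function only does string work.
function createIndexRenderer(readTemplate) {
  const indexTemplate = readTemplate();
  return (values) =>
    indexTemplate.replace(/\{\{(\w+)\}\}/g, (placeholder, key) =>
      Object.prototype.hasOwnProperty.call(values, key)
        ? escapeHtml(values[key])
        : placeholder // unknown placeholders are left untouched
    );
}
```

At startup this could be wired up as createIndexRenderer(() => fs.readFileSync(templatePath, 'utf8')), where templatePath is whatever path the app uses.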
9. Bootstrap Validation Logic
What happens if the user manually edits the INPUT.json and breaks the JSON syntax?
Self-Healing: The ensureLocalInputExists function in src/utils/bootstrap.js detects corrupt JSON on startup. Instead of crashing, it automatically renames the bad file to .tmp and writes a fresh default configuration, logging a warning. The app always starts.
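The self-healing step could be sketched as follows. This is not ensureLocalInputExists itself: ensureValidJsonFile is a hypothetical name, and the file path and defaults are passed in as parameters to keep the example self-contained.

```javascript
const fs = require('node:fs');

// Return the parsed config, or quarantine a bad file and fall back to defaults.
function ensureValidJsonFile(filePath, defaults) {
  try {
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
  } catch (err) {
    if (err.code !== 'ENOENT') {
      // The file exists but could not be parsed: keep it around for inspection.
      fs.renameSync(filePath, `${filePath}.tmp`);
      console.warn(`Invalid JSON in ${filePath}; moved to ${filePath}.tmp`);
    }
    fs.writeFileSync(filePath, JSON.stringify(defaults, null, 2));
    return defaults;
  }
}
```

Renaming rather than deleting means the user's broken edit is still recoverable after the app has started with defaults.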
10. Optimized Headers
Every millisecond counts.
- Content-Encoding: identity: Disables gzip for the SSE stream (gzip buffers, which kills real-time).
- Cache-Control: no-cache: Forces browsers to verify the stream status.
- Connection: keep-alive: Critical for long-lived streams.
11. Testing Asynchronous Code
Testing streaming and retries is notoriously hard.
The Strategy: We use jest with custom helpers (waitForCondition in tests/helpers/test-utils.js) and mocked timers. In resilience.test.js, we mock axios to fail exactly twice with ECONNABORTED to verify the retry logic attempts exactly 3 times before giving up.
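For readers who have not written a polling helper before, here is a guess at what one could look like. It shares a name with the helper in tests/helpers/test-utils.js but is an independent sketch; the option names and defaults are assumptions.

```javascript
// Poll `condition` until it returns a truthy value, or throw after `timeoutMs`.
async function waitForCondition(condition, { timeoutMs = 2000, intervalMs = 10 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await condition()) return;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In a test this reads as await waitForCondition(() => received.length === 3), which avoids sprinkling fixed sleeps through streaming tests. It relies on real timers, so it needs care when combined with mocked ones.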
12. Hot-Reloading (Zero Downtime Config)
The Problem: Restarting the server just to change the API key or add a webhook URL loses all SSE connections.
The Solution: A background poller in src/main.js reads the INPUT.json from the Key-Value Store every 5 seconds. When a change is detected:
- It diffs the new config against the old one.
- It updates middleware (body parser limits, rate limiter), auth keys, and webhook counts dynamically.
- It reconciles the webhook IDs: if the user increased urlCount, new IDs are generated; if decreased, no IDs are removed to prevent data loss.
This is all enabled by the loggerMiddleware.updateOptions() function, which allows runtime reconfiguration of the logger instance.
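The diff and reconcile steps above can be sketched in a few lines. Both function names are hypothetical, the JSON.stringify comparison is a simplification that assumes plain JSON config values, and crypto.randomUUID() stands in for whatever ID scheme the app really uses.

```javascript
const crypto = require('node:crypto');

// List the top-level keys whose values differ between two config objects.
function changedKeys(oldConfig, newConfig) {
  const keys = new Set([...Object.keys(oldConfig), ...Object.keys(newConfig)]);
  return [...keys].filter(
    (key) => JSON.stringify(oldConfig[key]) !== JSON.stringify(newConfig[key])
  );
}

// Grow the ID list up to urlCount, but never shrink it.
function reconcileWebhookIds(existingIds, urlCount) {
  const ids = [...existingIds];
  while (ids.length < urlCount) ids.push(crypto.randomUUID());
  return ids;
}
```

A poller would call changedKeys on each tick and only touch the middleware, auth keys, or webhook IDs whose keys actually appear in the result.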
13. Escape the "SaaS Tax" (Self-Hosting)
If you are an agency handling 50 clients, paying $30/mo per seat for debugging tools adds up.
Since this is a standard Dockerized Node.js app, you can deploy it to your own generic VPS.
FROM apify/actor-node:20
COPY . .
RUN npm install
CMD npm start
GitHub Repo (v2.8.7 is out now!)