DEV Community

Ahmed Rehan

Stop paying for webhook debuggers. I built a better one (Open Source).

The Struggle

I remember the payment bug that kept me up until 3 AM.
Stripe was sending an invoice.payment_failed webhook, but only in production.
I checked my logs: Truncated.
I checked my tunneling tool: Session expired.
I checked my SaaS bin: History limit reached.

I realized I didn't have a debugging tool; I had a toy.

The Solution: Webhook Debugger

I decided to build my own solution. But I didn't just want a "bucket" that catches requests. I wanted to build a Reference Implementation for how a modern, secure Node.js application should look in 2026.

Here are the 13 Engineering Patterns I used to build it:

1. Global SSE Heartbeat & Padding 💓

Many SSE implementations create a timer per connection, which wastes resources and leaks memory whenever a timer is not cleared on disconnect.
My Approach: A single setInterval iterates a Set of clients.
The Pro Tip: I added res.write(' '.repeat(2048)) (2KB whitespace) and X-Accel-Buffering: no headers. Why? Because corporate firewalls (and Nginx) love buffering streams. The padding forces them to flush the connection immediately.
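Here is a minimal sketch of the single-timer idea, not the code from the repo. The names `clients`, `addClient`, `beat`, and `startHeartbeat` are hypothetical, and sending the padding as an SSE comment line (leading colon) is an assumption of this sketch.

```javascript
// One shared Set of open responses and one timer for all of them.
const clients = new Set();

// 2 KB of whitespace, sent as an SSE comment so EventSource ignores it (assumption).
const PADDING = `:${' '.repeat(2048)}\n\n`;

function addClient(res) {
  res.setHeader('X-Accel-Buffering', 'no'); // ask Nginx not to buffer the stream
  res.write(PADDING); // push buffering proxies past their flush threshold
  clients.add(res);
  res.on('close', () => clients.delete(res)); // nothing to clear per connection
}

function beat() {
  for (const res of clients) {
    res.write(': heartbeat\n\n'); // comment line, keeps the connection warm
  }
}

function startHeartbeat(intervalMs = 30000) {
  const timer = setInterval(beat, intervalMs);
  timer.unref(); // do not keep the process alive just for heartbeats
  return timer;
}
```

Calling `startHeartbeat()` once at boot is enough. Each new connection only costs one Set entry. The 30 second interval is a placeholder value.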

2. SSRF Protection (DNS & IP Verification) 🛡️

Allowing user-defined webhooks is dangerous (Server-Side Request Forgery).
The Fix: I wrote a custom validator in src/utils/ssrf.js that resolves the DNS before the request. It checks the IP against a blocklist of private ranges (RFC 1918) and cloud metadata services (169.254.169.254). It even handles IPv4-mapped IPv6 addresses (::ffff:127.0.0.1).
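To make the check concrete, here is a rough sketch of a resolve-then-verify validator. It is not the contents of src/utils/ssrf.js. The function names and the exact blocklist are assumptions, and the IPv6 handling is deliberately simplified.

```javascript
const net = require('node:net');
const dns = require('node:dns').promises;

// Blocked IPv4 ranges as [network, prefix length] (illustrative list).
const BLOCKED_V4 = [
  ['0.0.0.0', 8],
  ['10.0.0.0', 8], // RFC 1918
  ['172.16.0.0', 12], // RFC 1918
  ['192.168.0.0', 16], // RFC 1918
  ['127.0.0.0', 8], // loopback
  ['169.254.0.0', 16], // link-local, covers 169.254.169.254
];

const toInt = (ip) => ip.split('.').reduce((acc, part) => acc * 256 + Number(part), 0);

// Turn ::ffff:127.0.0.1 or ::ffff:7f00:1 into 127.0.0.1; other input is returned as-is.
function unwrapMappedIpv4(ip) {
  const dotted = /^::ffff:(\d+\.\d+\.\d+\.\d+)$/i.exec(ip);
  if (dotted) return dotted[1];
  const hex = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/i.exec(ip);
  if (!hex) return ip;
  const high = parseInt(hex[1], 16);
  const low = parseInt(hex[2], 16);
  return [high >> 8, high & 255, low >> 8, low & 255].join('.');
}

function isBlockedIp(ip) {
  const addr = unwrapMappedIpv4(ip);
  if (net.isIPv4(addr)) {
    const value = toInt(addr);
    return BLOCKED_V4.some(([network, bits]) => {
      const start = toInt(network);
      return value >= start && value < start + 2 ** (32 - bits);
    });
  }
  if (net.isIPv6(addr)) {
    const lower = addr.toLowerCase();
    return lower === '::1' || lower === '::'
      || /^f[cd][0-9a-f]{2}:/.test(lower) // unique local fc00::/7
      || /^fe[89ab][0-9a-f]:/.test(lower); // link-local fe80::/10
  }
  return true; // not an IP at all: refuse
}

// Resolve first, then reject if any resolved address is on the blocklist.
async function assertPublicHost(hostname) {
  const records = await dns.lookup(hostname, { all: true });
  const blocked = records.find(({ address }) => isBlockedIp(address));
  if (blocked) throw new Error(`Refusing to call ${hostname} (${blocked.address})`);
  return records.map(({ address }) => address);
}
```

For example, `isBlockedIp('::ffff:127.0.0.1')` and `isBlockedIp('169.254.169.254')` both return `true`, while `isBlockedIp('8.8.8.8')` returns `false`.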

3. Deep Replay with Exponential Backoff 🔄

Retrying a failed webhook isn't just "try again".
The Logic: If the destination yields a transient error (ECONNABORTED, 503), the system waits 1s, then 2s, then 4s.
Header Stripping: The replay engine automatically strips sensitive headers (Authorization, Cookie) so you don't accidentally send production credentials to your local dev environment.
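A small sketch of both halves, backoff and header stripping. This is illustrative only: `replayWithBackoff`, `stripSensitiveHeaders`, the `maxAttempts` default, and the axios-style error shape (`error.code`, `error.response.status`) are assumptions, not the project's API.

```javascript
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie']);

// Drop credentials before a captured request is replayed elsewhere.
function stripSensitiveHeaders(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => !SENSITIVE_HEADERS.has(name.toLowerCase())),
  );
}

// Only timeouts and 503s are treated as worth retrying in this sketch.
function isTransient(error) {
  return error.code === 'ECONNABORTED' || error.response?.status === 503;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `send` is any function returning a promise; `wait` is injectable for tests.
async function replayWithBackoff(send, { maxAttempts = 4, baseDelayMs = 1000, wait = sleep } = {}) {
  for (let attempt = 1; ; attempt += 1) {
    try {
      return await send();
    } catch (error) {
      if (attempt >= maxAttempts || !isTransient(error)) throw error;
      await wait(baseDelayMs * 2 ** (attempt - 1)); // 1s, 2s, 4s
    }
  }
}
```

With the defaults, a destination that keeps timing out is tried four times, with waits of 1s, 2s, and 4s in between. Any non-transient error is rethrown immediately.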

4. Timing-Safe Authentication ⏱️

Never compare API keys with ===.
The Attack: A === comparison can bail out at the first mismatching character, so an attacker can measure how long your server takes to say "No" and guess the key character by character.
The Fix: I use crypto.timingSafeEqual in src/utils/auth.js so that the comparison time does not depend on how much of the key is correct.
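One detail worth showing: crypto.timingSafeEqual throws if the two buffers differ in length. A common workaround, sketched below, is to hash both values first so the buffers are always the same size. The helper name `safeEqual` is hypothetical, and the hashing step is an assumption rather than a description of src/utils/auth.js.

```javascript
const crypto = require('node:crypto');

function safeEqual(provided, expected) {
  // SHA-256 digests are always 32 bytes, so timingSafeEqual never throws here.
  const a = crypto.createHash('sha256').update(String(provided)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}
```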

5. Memory-Safe Rate Limiting (LRU) 🧠

Simple rate limiters often keep an unbounded in-memory map of IPs. If a botnet hits you with 1 million IPs, the map keeps growing until the server runs out of memory (OOM).
The Pattern: My RateLimiter uses a Sliding Window with LRU Eviction. It hard-caps at 1,000 entries. If the map is full, the oldest IP is evicted to make room. It prioritizes stability over strictness.
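The capped-map idea can be sketched with a plain Map, which iterates in insertion order, so the first key is always the least recently used one. This is an illustration under assumptions (the option names and the per-IP timestamp array are mine), not the actual RateLimiter class.

```javascript
class RateLimiter {
  constructor({ limit = 60, windowMs = 60000, maxEntries = 1000 } = {}) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.maxEntries = maxEntries;
    this.hits = new Map(); // ip -> timestamps inside the current window
  }

  allow(ip, now = Date.now()) {
    // Sliding window: keep only hits newer than windowMs.
    const recent = (this.hits.get(ip) || []).filter((t) => now - t < this.windowMs);
    // Delete and re-insert so this IP moves to the "most recent" end of the Map.
    this.hits.delete(ip);
    if (this.hits.size >= this.maxEntries) {
      const oldest = this.hits.keys().next().value;
      this.hits.delete(oldest); // evict the least recently seen IP
    }
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now);
    this.hits.set(ip, recent);
    return allowed;
  }
}
```

The trade-off is the one described above: an evicted IP gets a fresh window, so the limiter is slightly less strict under a flood, but memory stays bounded.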

6. Memory-Safe Dataset Filtering 🔍

Searching for a single event in a 1GB JSON dataset can crash a Node.js process if the whole dataset is loaded into memory first.
The Solution: Iterative Pagination. The /replay endpoint reads chunks of 1000 items (dataset.getData({ limit, offset })), searches for the event ID, and fetches the next chunk only if not found. This ensures we never load the entire dataset into memory.
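In code, the loop looks roughly like this. The sketch assumes that `dataset.getData({ limit, offset })` resolves to an object with an `items` array and that each item carries an `id` field. `findEventById` is a hypothetical name.

```javascript
async function findEventById(dataset, eventId, pageSize = 1000) {
  for (let offset = 0; ; offset += pageSize) {
    // Only one page is held in memory at a time.
    const { items } = await dataset.getData({ limit: pageSize, offset });
    const match = items.find((item) => item.id === eventId);
    if (match) return match;
    if (items.length < pageSize) return null; // short page means end of data
  }
}
```

Memory use is bounded by `pageSize`, at the cost of more round trips when the event sits deep in the dataset.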

7. Input Sanitization & Coercion 🧹

Inputs from the wild are messy. Strings look like numbers; booleans look like strings.
The Pattern: A dedicated coerceRuntimeOptions utility in src/utils/config.js recursively walks the input object, coercing "true" -> true and "5" -> 5, ensuring the runtime configuration isn't crashed by type mismatches.
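A recursive coercion pass can be quite small. The sketch below is a stand-in for illustration, not the real coerceRuntimeOptions. The name `coerceDeep` and the rule that only plain decimal strings become numbers are assumptions.

```javascript
function coerceDeep(value) {
  if (Array.isArray(value)) return value.map(coerceDeep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) => [key, coerceDeep(inner)]),
    );
  }
  if (typeof value !== 'string') return value; // already typed
  const trimmed = value.trim();
  if (trimmed === 'true') return true;
  if (trimmed === 'false') return false;
  // Only plain decimals are converted; '0x10', '1e3' and '' stay strings.
  if (/^-?\d+(\.\d+)?$/.test(trimmed)) return Number(trimmed);
  return value;
}
```

One caveat with any blanket coercion: fields that are meant to stay strings (an all-digit API key, for example) need to be excluded by the caller.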

8. Index.html Caching 🚀

We serve a UI, but we aren't a CDN.
The Optimization: The index.html template is read from disk once at startup and cached in a string variable (indexTemplate). Placeholders like {{VERSION}} are replaced on-the-fly using escapeHtml(), but the disk I/O cost is paid only once.
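Sketched out, the pattern is a closure over the template string. `createIndexRenderer` is a hypothetical name, and the `escapeHtml` shown here is a generic implementation written for the example, not necessarily the project's.

```javascript
const fs = require('node:fs');

function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

function createIndexRenderer(templatePath) {
  // Disk I/O happens once, when the renderer is created.
  const indexTemplate = fs.readFileSync(templatePath, 'utf8');
  // Each call only does string replacement on the cached template.
  return (values) => indexTemplate.replace(/\{\{(\w+)\}\}/g, (placeholder, key) => (
    Object.hasOwn(values, key) ? escapeHtml(values[key]) : placeholder
  ));
}
```

Unknown placeholders are left untouched in this sketch, which makes a missing value easy to spot in the rendered page.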

9. Bootstrap Validation Logic 🩹

What happens if the user manually edits the INPUT.json and breaks the JSON syntax?
Self-Healing: The ensureLocalInputExists function in src/utils/bootstrap.js detects corrupt JSON on startup. Instead of crashing, it automatically renames the bad file to .tmp and writes a fresh default configuration, logging a warning. The app always starts.
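The recovery path can be sketched in a few lines. This is not the real ensureLocalInputExists: the name `ensureValidJsonFile`, the `<file>.tmp` backup name, and the choice to rethrow unexpected I/O errors are assumptions made for the example.

```javascript
const fs = require('node:fs');

function ensureValidJsonFile(filePath, defaults) {
  let raw = null;
  try {
    raw = fs.readFileSync(filePath, 'utf8');
  } catch (error) {
    if (error.code !== 'ENOENT') throw error; // permissions etc. are not "healed"
  }
  if (raw !== null) {
    try {
      return JSON.parse(raw); // the happy path: file exists and parses
    } catch {
      // Corrupt JSON: keep the bad file for inspection instead of deleting it.
      fs.renameSync(filePath, `${filePath}.tmp`);
      console.warn(`Invalid JSON in ${filePath}; moved to ${filePath}.tmp and restored defaults`);
    }
  }
  fs.writeFileSync(filePath, JSON.stringify(defaults, null, 2));
  return defaults;
}
```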

10. Optimized Headers ⚡

Every millisecond counts.

  • Content-Encoding: identity: Disables gzip for the SSE stream (gzip buffers, which kills real-time).
  • Cache-Control: no-cache: Tells browsers and proxies not to serve the stream from a cache.
  • Connection: keep-alive: Keeps the HTTP/1.1 connection open, which a long-lived stream depends on.
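Put together, the three headers above can be applied in one place. `openEventStream` is a hypothetical helper, and the `Content-Type: text/event-stream` line is an addition that any SSE response needs, not something taken from the list.

```javascript
const SSE_HEADERS = {
  'Content-Type': 'text/event-stream', // required for EventSource (added for completeness)
  'Content-Encoding': 'identity', // no gzip, so nothing buffers the stream
  'Cache-Control': 'no-cache',
  Connection: 'keep-alive',
};

function openEventStream(res) {
  res.writeHead(200, SSE_HEADERS);
  res.flushHeaders(); // send headers now instead of waiting for the first event
}
```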

11. Testing Asynchronous Code 🧪

Testing streaming and retries is notoriously hard.
The Strategy: We use Jest with custom helpers (waitForCondition in tests/helpers/test-utils.js) and mocked timers. In resilience.test.js, we mock axios to fail exactly twice with ECONNABORTED and verify that the retry logic makes exactly 3 attempts.
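For readers who have not written such helpers, here is one plausible shape for a polling helper and a "fail N times" stub, in plain Node.js without Jest. Both are guesses at the idea: the real waitForCondition may have a different signature, and `failTimes` is a hypothetical name.

```javascript
// Poll `check` until it returns a truthy value or the timeout elapses.
async function waitForCondition(check, { timeoutMs = 1000, intervalMs = 10 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await check()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Condition not met within ${timeoutMs} ms`);
}

// Stand-in for a mocked HTTP client: rejects `failures` times, then resolves.
function failTimes(failures) {
  let calls = 0;
  const request = async () => {
    calls += 1;
    if (calls <= failures) {
      throw Object.assign(new Error('timeout'), { code: 'ECONNABORTED' });
    }
    return { status: 200 };
  };
  request.calls = () => calls;
  return request;
}
```

In Jest, the same stub is usually built by chaining `mockRejectedValueOnce` twice and `mockResolvedValueOnce` once on a `jest.fn()`, then asserting on the call count.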

12. Hot-Reloading (Zero Downtime Config) 🔥

The Problem: Restarting the server just to change the API key or add a webhook URL loses all SSE connections.
The Solution: A background poller in src/main.js reads the INPUT.json from the Key-Value Store every 5 seconds. When a change is detected:

  1. It diffs the new config against the old one.
  2. It updates middleware (body parser limits, rate limiter), auth keys, and webhook counts dynamically.
  3. It reconciles the webhook IDs: if the user increased urlCount, new IDs are generated; if decreased, no IDs are removed to prevent data loss.

This is all enabled by the loggerMiddleware.updateOptions() function, which allows runtime reconfiguration of the logger instance.
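The poll, diff, and reconcile steps can be sketched as two small functions. Everything here is illustrative: `createConfigWatcher`, `reconcileWebhookIds`, the UUID-based IDs, and the JSON string comparison used as a cheap diff are assumptions, not the code in src/main.js.

```javascript
const crypto = require('node:crypto');

// Grow the ID list up to urlCount; never shrink it, so no data is orphaned.
function reconcileWebhookIds(existingIds, urlCount) {
  const ids = [...existingIds];
  while (ids.length < urlCount) ids.push(crypto.randomUUID());
  return ids;
}

// Returns a poll function that calls applyConfig only when the config changed.
function createConfigWatcher(loadConfig, applyConfig) {
  let lastSeen = null;
  return async function poll() {
    const next = await loadConfig();
    const serialized = JSON.stringify(next); // cheap diff; sensitive to key order
    if (serialized === lastSeen) return false;
    lastSeen = serialized;
    await applyConfig(next);
    return true;
  };
}
```

Wiring it up would be a matter of `setInterval(poll, 5000)`, with `applyConfig` updating the middleware and auth keys in place so open SSE connections are never dropped.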

13. Escape the "SaaS Tax" (Self-Hosting) 💸

If you are an agency handling 50 clients, paying $30/mo per seat for debugging tools adds up.
Since this is a standard Dockerized Node.js app, you can deploy it to your own generic VPS.

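FROM apify/actor-node:20
# Copy the application source into the image
COPY . ./
# Install dependencies
RUN npm install
# Exec form, so no extra shell wraps the start command
CMD ["npm", "start"]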

GitHub Repo (v2.8.7 is out now!)

