The series “Working With Webhooks” explores the most important concepts to consider when receiving incoming webhooks.
As a developer, ensuring your back-end works smoothly is key to delivering reliable services. If you don’t ingest and process your webhooks properly, you risk poor performance and server outages, which can negatively impact your product, your users, and your team.
In this language-agnostic reference guide, you’ll learn to take your webhook game to the next level by implementing delayed processing, one of the most important concepts when building reliable webhooks. Along the way, we’ll cover the basics of ingestion, queueing, processing, retries, and alerts.
After reading this guide, you’ll have a solid understanding of what it takes to build a robust and well-designed webhook infrastructure.
Webhooks are asynchronous by nature: no specific response is expected from your server beyond a confirmation of receipt.
Developers who are just getting started with webhooks will often set up their handlers in a very simple way:
- Receive the request.
- Perform some method.
- Return once the method is completed.
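Sketched in Python, the naive pattern looks like this (`handle_webhook` and `process_event` are illustrative names, not part of any real framework):

```python
import time

def process_event(payload: bytes) -> None:
    # Stand-in for real work: database writes, third-party API calls, etc.
    time.sleep(0.01)

def handle_webhook(payload: bytes) -> int:
    # 1. Receive the request (the framework has already parsed the payload).
    # 2. Perform some method, synchronously, while the sender waits.
    process_event(payload)
    # 3. Return only once the method has completed.
    return 200
```

Note that the sender's connection stays open for the full duration of `process_event`.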
While this might seem like a solid approach to handling webhooks, it has several weaknesses.
- You don't control the rate at which you receive webhooks. The platform sending webhooks to your server might deliver a burst of requests all at once. This could be due to bulk actions taken on your end (like uploading a new email list to an email service), or it could be because the platform has accumulated a backlog of webhooks to send after an outage. Whatever the cause, you can see how a large volume of requests in a short span of time might overwhelm a system built using a bare-bones approach.
- It doesn’t account for errors. Your webhook handling method might run into an error because the server is running out of resources, because it depends on a third party that is returning errors, or simply because of a bad deploy.
- It doesn’t account for outages. Outages happen, and your webhook infrastructure needs to be prepared for them. You can’t assume your HTTP endpoint will be listening when you receive the webhook. In the case of an outage, you still need to be able to deliver consistent results.
- It doesn’t account for timeout windows. Many platforms enforce a short timeout window on the request; if your server fails to complete the work within that window, the connection is terminated. In the overly simplistic approach outlined above, the method you perform upon receiving the inbound webhook must complete before the window closes, or the requesting platform may record a failure, even if the method eventually succeeds.
You’ve decided that the bare-bones approach isn’t for you. Good instincts. So, how should you design your webhook infrastructure? This is where delayed processing for webhooks comes in.
Most of the time, webhooks can be handled easily. But if your server receives too many requests at once, it can become overloaded. Add all the other unrelated jobs your server handles to that ballooning volume of requests, and you may have a service outage on your hands.
A well-designed webhook system will handle these volume spikes efficiently and process all inbound requests, regardless of volume. To that end, your server should perform the bare minimum amount of work upon receiving a webhook. We call this delayed processing.
A server that follows the principles of delayed processing will first return an HTTP 200 response, then queue the event to be processed asynchronously by your system.
Utilizing delayed processing has several advantages:
- It limits the number of resources necessary to process a request.
- It gives you more capacity to handle multiple concurrent requests.
- It allows you the freedom to employ processing-intensive rules and logic on inbound webhooks, without affecting the reliability of your infrastructure.
A webhook infrastructure with delayed processing must successfully implement three key steps: ingestion, queuing and processing.
Our only goal here is to expose a simple HTTP POST endpoint. This endpoint must:
- Receive webhooks
- Store the request headers & body in a queue
- Return an HTTP 200 status code
The endpoint should be built to utilize as few resources as possible, which will ensure it remains scalable.
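A minimal sketch of such an endpoint's logic, using Python's standard-library `queue.Queue` as a stand-in for a managed queue (the function and variable names are hypothetical):

```python
import queue

# Stand-in for a managed queue such as SQS or Pub/Sub.
event_queue: "queue.Queue[dict]" = queue.Queue()

def ingest_webhook(headers: dict, body: bytes) -> int:
    """Do the bare minimum: store the headers and body, then acknowledge."""
    event_queue.put({"headers": headers, "body": body})
    return 200  # the sender only needs a receipt confirmation
```

No parsing, no validation beyond what's strictly necessary, no business logic: everything else happens later, off the request path.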
Once we’ve successfully received the request, we need to keep track of it to process at a later time. The best solution here is to implement a queue.
However you design your queue, make sure it won't crumble under the load. Most queues are limited by the number of concurrent connections or by throughput in GB/s, so you want to make sure your queueing system can keep pace with any unexpected spikes in volume. Managed queues like AWS SQS and Google Cloud Pub/Sub scale virtually without limit, making them great candidates for this use case.
One good strategy to save on costs is to store the webhook headers and bodies in a file on AWS S3 or GCP Cloud Storage and queue a reference to the file instead of the actual contents.
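This claim-check pattern can be sketched as follows, with an in-memory dict standing in for S3 or Cloud Storage (all names here are illustrative):

```python
import hashlib
import queue

blob_store: dict = {}                          # stand-in for S3 / Cloud Storage
ref_queue: "queue.Queue[str]" = queue.Queue()  # holds object keys, not payloads

def ingest_large_webhook(headers: dict, body: bytes) -> str:
    # Content-addressed key; a UUID or the provider's event ID would also work.
    key = hashlib.sha256(body).hexdigest()
    blob_store[key] = {"headers": headers, "body": body}  # "upload" the payload
    ref_queue.put(key)                                    # queue only the reference
    return key
```

The queue then moves small, fixed-size keys instead of arbitrarily large request bodies, which keeps per-message costs down.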
Depending on the nature of your use case, you might have to have multiple queues for different webhook topics or types. Some request types will naturally be more important than others. By introducing multiple queues, you can assign different priorities to each one.
Once you have your webhooks in a queue, you can safely process them. Just be sure to do so at a rate that is reasonable for your infrastructure and use case. Different queues take different approaches to processing, but generally speaking, you will want a set of worker services, each pulling from the queue at its own pace.
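A rough sketch of such a worker, again using the standard-library queue (the uppercasing is just a stand-in for real processing logic):

```python
import queue
import threading

jobs: "queue.Queue[str]" = queue.Queue()
results: list = []

def worker(stop: threading.Event) -> None:
    # Each worker pulls at its own pace; add rate limiting here if needed.
    while not stop.is_set():
        try:
            event = jobs.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(event.upper())  # stand-in for real processing
        jobs.task_done()
```

Running several such workers against the same queue is how you scale processing throughput without touching the ingestion endpoint.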
A webhook queueing system is a great way to ingest and process webhook requests. But even the best queueing systems can drop requests when faced with server overload or other network issues. When this happens, your system won’t complete the underlying transaction or event notification, which can lead to a poor user experience. To handle such cases, we need to implement a retry feature that triggers on two scenarios: ingestion failure and processing failure.
If you fail to ingest a webhook request, you'll need to rely on your provider's retry policy. Most platforms have some form of retry strategy. If the platform doesn't offer retries, or you fail to receive all of the retries, you will need to reconcile your data by polling the provider's API. Since this is not always possible, the reliability of your ingestion service is critical.
It's virtually guaranteed that, at one point or another, you will run into errors when processing your queue. Thankfully, you can leverage your queue to perform retries by properly acknowledging (or not acknowledging) each message.
Be sure to also include a dead-letter policy that sets aside messages that fail to process multiple times in a row. If you don't, you may end up jamming the queue with unprocessable events.
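The ack/nack cycle with a dead-letter cutoff can be sketched like this (a simplification: real brokers track delivery counts for you, and `MAX_ATTEMPTS = 3` is an arbitrary choice):

```python
import queue

MAX_ATTEMPTS = 3
main_queue: "queue.Queue[dict]" = queue.Queue()
dead_letter: list = []

def drain_with_retries(handler) -> None:
    while not main_queue.empty():
        msg = main_queue.get()
        try:
            handler(msg["body"])  # success acts as the ack: message is done
        except Exception:
            msg["attempts"] = msg.get("attempts", 0) + 1
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letter.append(msg)  # park poison messages for inspection
            else:
                main_queue.put(msg)      # nack: redeliver for another attempt
        finally:
            main_queue.task_done()
```

In production you would also add exponential backoff between attempts rather than redelivering immediately.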
You need to know when things aren't going as planned in your production environment. When it comes to delayed processing of webhooks, there are two places where alerts become critical.
If your ingestion service responds with an error, this means you are actively missing webhooks. You will want to monitor HTTP responses from your endpoint, and set up an alert if the status isn't a 2xx. Some API providers will also offer some form of alerting if your endpoint doesn’t return a successful response.
If you fail to process a message multiple times in a row, you'll want to inspect it so you can determine the root cause of the failure. You will want to monitor the depth of your dead letter queue to alert you when new messages appear.
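Both alerting conditions boil down to a simple check you can wire into whatever monitoring stack you use (the function and its thresholds are illustrative):

```python
def should_alert(ingestion_status: int, dead_letter_depth: int) -> bool:
    # Alert when ingestion is failing (non-2xx responses mean you are
    # actively missing webhooks) or when poison messages pile up in the DLQ.
    ingestion_failing = not (200 <= ingestion_status < 300)
    return ingestion_failing or dead_letter_depth > 0
```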
If you’re looking for a robust platform to handle all the above complexity for you, Hookdeck has your back. You can easily set up and monitor your webhooks without worrying about ingestion, queuing, or troubleshooting processes in your workflow. And it’s free!
To test-drive these features and more, sign up today!