If you have ever tried to wire up webhooks from more than three SaaS apps, you already know the punchline: every vendor invented their own conventions, and none of them are wrong, but none of them agree either.
I was building agent tooling that had to understand many of them. Halfway through, I stopped trying to keep it in my head and started writing it down. Then I kept writing it down. The result is now an open dataset.
1,119 webhook events. 30 platforms. One schema. Free, CC-BY-4.0.
- HuggingFace: automatelab/saas-webhooks
- Browsable index: automatelab.tech/products/datasets/saas-webhooks/
What it is
A normalized catalog covering Stripe, GitHub, Slack, Notion, Linear, Jira, HubSpot, Salesforce, Zendesk, Intercom, Discord, Twilio, Calendly, Mailchimp, Zoom, Microsoft Teams, PagerDuty, Pipedrive, Asana, ClickUp, Front, Help Scout, Loom, Greenhouse, Ashby, BambooHR, Gusto, Attio, Close, and Freshdesk.
For every event, you get:
- event_name and trigger_description
- payload_schema as JSON Schema (draft 2020-12)
- auth_method, signature_header, and the exact signing algorithm
- delivery_guarantees and retry_policy
- idempotency_key_header (if the vendor provides one)
- docs_url back to the canonical vendor docs
- Format: JSONL per vendor, plus Parquet for the full set
A sample row
Here is what a single Stripe event looks like, trimmed:
{
  "vendor": "stripe",
  "category": "payments",
  "event_name": "account.application.authorized",
  "trigger_description": "Fires when a user authorizes a Stripe application.",
  "auth_method": "hmac-sha256",
  "signature_header": "Stripe-Signature",
  "signature_algorithm_detail": "HMAC-SHA256 with versioned scheme; header contains timestamp and v1 hash. Verify timestamp to prevent replay attacks.",
  "delivery_guarantees": "at-least-once",
  "retry_policy": {
    "max_attempts": null,
    "backoff": "Exponential backoff over multiple hours.",
    "total_retry_window": "PT72H"
  },
  "payload_schema": { "$schema": "https://json-schema.org/draft/2020-12/schema", "type": "object", "properties": { "...": "..." } },
  "docs_url": "https://docs.stripe.com/api/events/types"
}
Same shape across all 30 vendors. That is the entire point.
The surprising stuff
A few things jumped out once everything was in one schema:
Signing is not standardized beyond the choice of hash. A small sampler:
| Vendor | Auth | Header | Detail |
|---|---|---|---|
| Stripe | HMAC-SHA256 | Stripe-Signature | Timestamp + v1 hash, replay-window enforced |
| GitHub | HMAC-SHA256 | X-Hub-Signature-256 | Plus legacy SHA1 on X-Hub-Signature |
| Slack | HMAC-SHA256 | X-Slack-Signature | 5-minute timestamp window |
| Shopify | HMAC-SHA256 | X-Shopify-Hmac-Sha256 | Base64 of HMAC of raw body |
| Linear | HMAC-SHA256 | Linear-Signature | Hex digest only |
Same algorithm, five different envelopes. Anyone writing one verifier and trying to reuse it has a bad afternoon ahead.
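To make the envelope problem concrete, here is a minimal sketch of three of those styles using only the Python standard library. The function names, the 300-second tolerance, and the `now` parameter are my own illustrative choices, not part of the dataset, and the Stripe-style parser ignores the case of multiple `v1` entries in one header. Treat each vendor's docs as the authority.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def _hmac_sha256(secret: bytes, message: bytes) -> bytes:
    """Raw HMAC-SHA256 digest; every style below wraps this differently."""
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_github_style(secret: bytes, body: bytes, header: str) -> bool:
    """'sha256=' prefix followed by a hex digest of the raw body."""
    expected = "sha256=" + _hmac_sha256(secret, body).hex()
    return hmac.compare_digest(expected.encode(), header.encode())


def verify_shopify_style(secret: bytes, body: bytes, header: str) -> bool:
    """Base64 of the HMAC of the raw body, no prefix."""
    expected = base64.b64encode(_hmac_sha256(secret, body))
    return hmac.compare_digest(expected, header.encode())


def verify_stripe_style(secret: bytes, body: bytes, header: str,
                        tolerance_s: int = 300,
                        now: Optional[float] = None) -> bool:
    """'t=<unix ts>,v1=<hex>' where the signed message is '<ts>.<body>'."""
    parts = dict(p.split("=", 1) for p in header.split(",") if "=" in p)
    ts, sig = parts.get("t"), parts.get("v1")
    if ts is None or sig is None or not ts.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(ts)) > tolerance_s:
        return False  # outside the replay window (tolerance is an assumption)
    expected = _hmac_sha256(secret, ts.encode() + b"." + body).hex()
    return hmac.compare_digest(expected.encode(), sig.encode())
```

All three share one HMAC call. What differs is the encoding, the prefix, and what exactly gets signed, which is why a single reusable verifier ends up needing a per-vendor adapter.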
Retry policies are wildly different. Stripe retries for 72 hours with exponential backoff. GitHub does not retry failed deliveries by default (the behavior depends on app type). Slack retries 3 times. Some vendors do not publish a retry policy in their docs, which means you should not rely on one.
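If you want to compare retry windows programmatically, a small helper along these lines may be enough. It assumes `total_retry_window` holds an ISO 8601 duration like the `"PT72H"` in the sample row, and it only understands the hours/minutes/seconds subset. The `retry_window` name is hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional

# Handles only the time part of ISO 8601 durations, e.g. PT72H or PT1H30M.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def retry_window(retry_policy: Optional[dict]) -> Optional[timedelta]:
    """Return the published retry window, or None when none is stated."""
    raw = (retry_policy or {}).get("total_retry_window")
    if not raw:
        return None
    match = _DURATION.match(raw)
    if match is None or not any(match.groups()):
        return None  # unsupported format; treat as unknown rather than guess
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)
```

Returning `None` for anything missing or unparseable keeps "the vendor did not say" distinct from "the vendor retries for zero seconds".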
Idempotency support is hit-or-miss. GitHub gives you X-GitHub-Delivery. Stripe gives you the event id. Several vendors give you nothing, which means you either dedupe by payload hash or accept duplicates.
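A rough sketch of that fallback: use the vendor's delivery header when the catalog lists one, otherwise hash the raw body. The `dedupe_key` and `is_duplicate` names and the in-memory set are illustrative assumptions. It also only covers header-based ids, so a vendor that puts the id inside the payload (as Stripe does with the event id) would need its own branch.

```python
import hashlib
from typing import Mapping, Optional, Set


def dedupe_key(vendor: str, headers: Mapping[str, str], raw_body: bytes,
               idempotency_key_header: Optional[str]) -> str:
    """Prefer the vendor's delivery id; fall back to a hash of the raw body."""
    lowered = {k.lower(): v for k, v in headers.items()}  # headers are case-insensitive
    if idempotency_key_header:
        value = lowered.get(idempotency_key_header.lower())
        if value:
            return f"{vendor}:id:{value}"
    return f"{vendor}:sha256:{hashlib.sha256(raw_body).hexdigest()}"


_seen: Set[str] = set()  # illustration only; a real service needs a shared store with a TTL


def is_duplicate(key: str) -> bool:
    """Return True if this key was already recorded, else record it."""
    if key in _seen:
        return True
    _seen.add(key)
    return False
```

Hashing the payload is a weaker guarantee than a real delivery id: two legitimately distinct events with byte-identical bodies would collapse into one.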
Max payload sizes are mostly undocumented. GitHub publishes 25 MB. Most vendors do not say.
These are not opinions. They are facts I would rather not have had to learn the hard way.
Why this exists
I am building agents that integrate with many SaaS products. Two things have to be true for an agent to call any webhook tool correctly:
- The agent needs the payload schema to know what fields it can rely on.
- The agent needs the auth contract to know how to validate inbound deliveries.
If that information is scattered across 30 different docs sites in 30 different shapes, the agent cannot use it. Once it is in a single schema, the agent can.
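As a sketch of what "the agent can use it" might look like, the snippet below folds catalog rows into a lookup keyed by vendor and event name. The `EventContract` and `build_index` names are hypothetical, and it assumes `payload_schema` arrives as a parsed dict that may carry the standard JSON Schema `required` keyword.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class EventContract(NamedTuple):
    auth_method: Optional[str]
    signature_header: Optional[str]
    required_fields: Tuple[str, ...]


def build_index(rows: Iterable[dict]) -> Dict[Tuple[str, str], EventContract]:
    """Map (vendor, event_name) to what an agent needs before trusting a delivery."""
    index: Dict[Tuple[str, str], EventContract] = {}
    for row in rows:
        schema = row.get("payload_schema") or {}
        index[(row["vendor"], row["event_name"])] = EventContract(
            auth_method=row.get("auth_method"),
            signature_header=row.get("signature_header"),
            # Fields the schema marks as required; empty if it marks none.
            required_fields=tuple(schema.get("required", [])),
        )
    return index
```

With the index in hand, a tool handler can look up `("stripe", "account.application.authorized")` once and know which header to verify and which fields it can count on.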
Same logic applies to anything that has to interop with many vendors at once: an integration platform, a security scanner that audits webhook configurations, a docs site that needs to render comparison tables, a research notebook.
Use it however
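from datasets import load_dataset

# Load the full catalog from the Hugging Face Hub.
ds = load_dataset("automatelab/saas-webhooks")
# Keep only the Stripe rows, then print the first event's payload schema.
stripe_events = ds["train"].filter(lambda r: r["vendor"] == "stripe")
print(stripe_events[0]["payload_schema"])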
The dataset is CC-BY-4.0. No email gate, no sign-up. Attribution is appreciated when you ship something interesting on top of it.
It updates monthly from source. If you find a missing event or a vendor that should be there, the GitHub issue tracker is the right place to drop it.
Links
- Dataset on HuggingFace: automatelab/saas-webhooks
- Browsable catalog (one page per vendor): automatelab.tech/products/datasets/saas-webhooks/
- All open datasets: automatelab.tech/products/datasets/
If you build something on top of it I would genuinely like to know. The whole point of putting it under CC-BY is that the work compounds.