DEV Community: DevOps Daily

What Actually Happens After You Send a Webhook

DevOps Daily — Thu, 30 Jul 2026 11:21:14 +0000

The first version of a webhook is always the same four lines:

await fetch(customer.webhookUrl, {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify(event),
});

It works. You ship it. Then the tickets start.

A customer's endpoint was down during a deploy and they want the events from that window. Another integration processed the same order twice. Someone asks how they can prove the request came from you. Six hours after an incident, a customer wants to know exactly when you attempted event evt_8813 and what their server returned.

The HTTP call is the easy part. Delivery is the queue, retry policy, signature scheme, idempotency story, and attempt log around it.

Those failure paths are difficult to learn from a finished code sample, so we built an interactive Webhook Delivery Simulator. It lets you break a delivery on purpose, follow every attempt, inspect the signed bytes, and redeliver the same message to see what a safe receiver does.

The delivery outcomes are simulated. The HMAC-SHA256 signing and verification are real and run locally in your browser.

TL;DR

A webhook sender is a durable state machine, not an HTTP client with a retry loop.
Retry timeouts, 429, and 5xx responses. Most other 4xx responses should stop immediately.
A timeout means the outcome is unknown. The receiver may have completed the work before its response was lost.
At-least-once delivery makes receiver-side deduplication mandatory.
Sign the message ID, timestamp, and raw body. Verify before parsing.
The simulator uses the published Svix retry schedule and Standard Webhooks signing model so the behavior maps to a concrete production system.

Start with the failure that retries are for

Open the simulator, leave the endpoint response set to Intermittent, and send invoice.paid.

The endpoint returns two temporary failures and then recovers:

Attempt 1: 503 -> retry in 5 seconds
Attempt 2: 503 -> retry in 5 minutes
Attempt 3: 200 -> delivered

This is the shape of a deploy, a brief database outage, or a process restarting at the wrong moment. Trying again helps because the endpoint is broken now, not permanently.

The simulator compresses the real waits into a few seconds. Its time-warp indicator still shows the actual gap being skipped, and every attempt keeps its own status, scheduled time, duration, response body, and explanation.

That attempt history is not optional operational polish. It is how you answer the customer who asks why an event arrived five minutes late.

Retry failures, not mistakes

The useful question is not "Did the request fail?" It is "Could the same request plausibly succeed later?"

Result	Retry?	Reason
Timeout	Yes, carefully	The outcome is unknown: the receiver may already have done the work
`429 Too Many Requests`	Yes	Back off and reduce the endpoint's rate
`500`, `502`, `503`	Yes	The server may recover
`408 Request Timeout`	Yes	The server gave up waiting for the request, and HTTP lets the client repeat it
`400`, `422`	No	The payload is still wrong on the next attempt
`401`, `403`	No	Retrying cannot repair credentials
`404`, `410`	No	The endpoint is gone

Try the 400, 429, 500, and timeout modes. They all produce a non-successful first attempt, but they should not produce the same next state.

Hammering a malformed request for 27 hours does not make it valid. It turns a customer's configuration error into your outbound traffic problem and buries useful failures in the delivery log.
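The retry table above reduces to a small classifier. This is a minimal TypeScript sketch, not the simulator's source: the AttemptResult shape and the three decision names are assumptions made for illustration, and a production sender would probably also honor a Retry-After header on 429 and 503 responses.

```typescript
// Outcome of one delivery attempt: an HTTP status, or no response at all.
type AttemptResult =
  | { kind: "response"; status: number }
  | { kind: "timeout" };

type Decision = "delivered" | "retry" | "stop";

// Hypothetical classifier that follows the retry table above.
function classify(result: AttemptResult): Decision {
  // No response: the outcome is unknown, so retry and rely on receiver dedup.
  if (result.kind === "timeout") return "retry";

  const { status } = result;
  if (status >= 200 && status < 300) return "delivered";
  // The two 4xx codes where a later attempt can plausibly succeed.
  if (status === 408 || status === 429) return "retry";
  // 5xx: the server may recover.
  if (status >= 500 && status < 600) return "retry";
  // 400, 401, 403, 404, 410, 422 and the rest fail the same way next time.
  return "stop";
}
```

Keeping the decision in one pure function makes the policy easy to unit-test and easy to record in the attempt log next to each response.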

Why the retry gaps get large

Fixed retries are cheap to write and expensive to operate. Retrying every 30 seconds for an hour produces 120 attempts against an endpoint that may already be struggling to recover.

The simulator uses Svix's published retry schedule:

attempt 1   immediately
attempt 2   +5 seconds
attempt 3   +5 minutes
attempt 4   +30 minutes
attempt 5   +2 hours
attempt 6   +5 hours
attempt 7   +10 hours
attempt 8   +10 hours

The final attempt lands roughly 27 hours and 35 minutes after the first.

The exact numbers are less important than the shape. Try quickly enough to ride out a small network blip, then spread later attempts out so a long outage costs a handful of requests rather than thousands. A production sender also adds jitter and per-endpoint rate limits so thousands of delayed messages do not wake up together.
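As a sketch of that shape, here is the Svix schedule from this section expressed as waits between attempts, with jitter applied and an explicit give-up point. The plus or minus 10% jitter and the nextRetryDelay name are illustrative assumptions, not Svix's implementation.

```typescript
const SECOND = 1_000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;

// Svix's published schedule as the wait before attempts 2 through 8.
const RETRY_DELAYS_MS: readonly number[] = [
  5 * SECOND,
  5 * MINUTE,
  30 * MINUTE,
  2 * HOUR,
  5 * HOUR,
  10 * HOUR,
  10 * HOUR,
];

// Returns the wait before the next attempt, or null when the schedule is
// exhausted and the message should be marked failed (the give-up point).
// attemptsMade counts attempts already sent; random is injectable for tests.
function nextRetryDelay(
  attemptsMade: number,
  random: () => number = Math.random,
  jitterFraction = 0.1, // assumed: spread each wait by up to +/-10%
): number | null {
  if (!Number.isInteger(attemptsMade) || attemptsMade < 1) {
    throw new RangeError("attemptsMade must be a positive integer");
  }
  if (attemptsMade > RETRY_DELAYS_MS.length) return null;

  const base = RETRY_DELAYS_MS[attemptsMade - 1]!; // in range after the checks above
  // random() is in [0, 1), so the multiplier is in [1 - jitter, 1 + jitter).
  const multiplier = 1 + jitterFraction * (2 * random() - 1);
  return Math.round(base * multiplier);
}
```

Summing the base waits gives 27 hours, 35 minutes and 5 seconds, matching the figure above. A null return is where the sender stops and surfaces the failure instead of retrying forever.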

A timeout does not mean the work failed

Timeout is the case that makes webhook delivery genuinely awkward.

Consider this sequence:

The receiver accepts the request.
It updates its database.
Its response is lost, or arrives after your timeout.
Your sender schedules another attempt.

The receiver completed the work, but the sender cannot know that. If you retry, the receiver sees the message twice. If you do not retry, you might silently lose an event. There is no third choice that avoids both risks.

The normal answer is at-least-once delivery: retry the unknown outcome and make repeated processing safe.

A stable message ID must survive every retry and manual redelivery. The receiver stores that ID in the same transaction as the business change:

async function handleWebhook(messageId, event) {
  await database.transaction(async (tx) => {
    if (await tx.processedMessages.exists(messageId)) {
      return; // Seen already. Return 200 so retries stop.
    }

    await applyBusinessChange(tx, event);
    await tx.processedMessages.insert({ messageId });
  });
}

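Writing the dedup row in a separate step before the work risks losing the event after a crash. Writing it after the work, outside a transaction, lets two concurrent deliveries pass the check. One transaction closes both gaps, provided processedMessages has a unique constraint on messageId: at common default isolation levels two racing transactions can both pass the exists check, and it is the constraint that makes the second insert fail and roll back its business change.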

After a successful run in the simulator, click Redeliver same message. The message ID stays stable and the receiver acknowledges the duplicate without applying the event twice.

Verify bytes before you trust JSON

Webhook signatures prove that the sender knows a shared secret and that the signed content was not changed in transit.

The simulator follows the Standard Webhooks format used by Svix. The signature covers the message ID, timestamp, and raw request body:

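const signedContent = `${messageId}.${timestamp}.${rawBody}`;
// hmacSha256 is a placeholder for your crypto library. Standard Webhooks
// base64-encodes the digest and sends it as "v1,<signature>".
const signature = hmacSha256(secret, signedContent);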

The important word is raw.

If your framework parses the JSON and you serialize it again, whitespace, escaping, or property order can change. The object may mean the same thing while its bytes are different, which correctly produces a different signature.

A receiver should:

Read the raw body.
Reject timestamps outside the accepted replay window.
Verify the HMAC using constant-time comparison.
Parse the JSON only after verification passes.
Deduplicate using the stable message ID.
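Here is a sketch of those receiver steps in TypeScript on Node, assuming the Standard Webhooks conventions as Svix uses them: a whsec_-prefixed base64 secret, a Unix-seconds timestamp, and a signature header holding space-separated v1,<base64> entries. The five-minute tolerance and the sign and verify names are illustrative choices, not the simulator's code; in production, prefer an official Standard Webhooks or Svix library.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

type Verification =
  | { ok: true; event: unknown }
  | { ok: false; reason: "stale timestamp" | "bad signature" };

const TOLERANCE_SECONDS = 5 * 60; // assumed replay window

// A "whsec_<base64>" secret: the decoded bytes are the HMAC key.
function keyFromSecret(secret: string): Buffer {
  return Buffer.from(secret.replace(/^whsec_/, ""), "base64");
}

// base64(HMAC-SHA256(key, "<id>.<timestamp>.<raw body>"))
function sign(secret: string, id: string, timestamp: string, rawBody: string): string {
  return createHmac("sha256", keyFromSecret(secret))
    .update(`${id}.${timestamp}.${rawBody}`)
    .digest("base64");
}

function verify(
  headers: { id: string; timestamp: string; signature: string },
  rawBody: string, // the exact bytes received, decoded as UTF-8, never re-serialized
  secret: string,
  nowSeconds: number = Math.floor(Date.now() / 1000),
): Verification {
  // Replay window. The timestamp is also inside the HMAC, so it cannot be swapped.
  const ts = Number(headers.timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSeconds - ts) > TOLERANCE_SECONDS) {
    return { ok: false, reason: "stale timestamp" };
  }

  const expected = Buffer.from(sign(secret, headers.id, headers.timestamp, rawBody));

  // The header can list several "v1,<sig>" entries, e.g. during secret rotation.
  const matched = headers.signature.split(" ").some((entry) => {
    const [version, sig] = entry.split(",");
    if (version !== "v1" || !sig) return false;
    const candidate = Buffer.from(sig);
    // timingSafeEqual throws on unequal lengths, so check that first.
    return candidate.length === expected.length && timingSafeEqual(candidate, expected);
  });
  if (!matched) return { ok: false, reason: "bad signature" };

  // Only now parse. The caller still deduplicates on headers.id before applying it.
  return { ok: true, event: JSON.parse(rawBody) };
}
```

Returning a specific reason rather than a bare boolean mirrors what the simulator does: an edited body or wrong secret reports a bad signature, while a late replay reports a stale timestamp.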

Open Verify signature in the simulator and try each failure mode:

Edit body changes one value after the request was signed.
Wrong secret calculates a valid HMAC with the wrong key.
Replay later moves verification outside the timestamp tolerance.
Untouched restores the valid request.

The simulator shows the signed content and the exact rejection reason rather than reducing every failure to "invalid signature."

Where Svix fits

The useful thing about Svix Dispatch is not that it sends an HTTP request. The useful thing is that it packages the operational surface around the request: durable delivery, automatic retries, signing and secret rotation, rate limits, event filtering, searchable attempt logs, manual replay, and a customer-facing endpoint portal.

Building can still be reasonable when you control every consumer, event volume is low, or you already run a capable job system such as Temporal, Sidekiq, or River.

The boundary changes when the endpoints belong to customers. At that point, retries are only one part of the product. Customers also need to configure endpoints, rotate secrets, inspect failed attempts, and replay messages without asking an engineer to search production logs.

A useful test is to write down who answers this question:

"Did you send event evt_8813, and what did our endpoint return?"

If the answer is "an engineer with log access," include that recurring support cost in the build-versus-buy calculation.

A five-minute simulator walkthrough

Send invoice.paid to the Intermittent endpoint.
Watch the delivery path and compressed retry delays.
Select each attempt and compare its response and scheduled time.
Inspect the request headers and raw body.
Open Verify signature and edit the body.
Restore the untouched request and verify it successfully.
Redeliver the same message and confirm that the receiver ignores it.
Repeat with Timeout, then with 400 bad request.

The final comparison is the useful one. A timeout and a 400 both lack a successful response, but they require opposite decisions: retry the unknown outcome and drop the invalid payload.

What to carry into production

Before calling a webhook sender reliable, check that it has:

Durable storage before the first attempt
Explicit response classification
Backoff with jitter and a defined give-up point
Stable message IDs across retries
HMAC signatures over the raw body
Timestamp-based replay protection
Receiver-side deduplication
Per-endpoint rate limits
Searchable attempt history
Manual replay with an audit trail

The POST is the smallest part of the system. Reliability comes from the state and operational controls around it.

Use the Webhook Delivery Simulator to exercise the failure paths, then read the full production webhook delivery guide for a typed Svix sender and Express receiver.

Receiving webhooks without getting burned

DevOps Daily — Mon, 27 Jul 2026 16:48:59 +0000

Sending a webhook is easy. You POST some JSON at a URL and move on.

Receiving one is where the bodies are buried. The endpoint is public, so anyone can call it. It gets retried, so it will run twice. It arrives out of order, so "delivered" can land before "sent". And it is on the critical path of somebody else's system, so if you are slow they will time out and retry, which makes you slower.

None of this is hard once you know it. All of it is invisible until production. Here is the complete set of things a webhook receiver has to handle, with the failure each one prevents.

The running example is email delivery webhooks (bounces and complaints), because they happen to have every awkward property at once: they are security-sensitive, they retry, they arrive out of order, and processing one twice corrupts real state. Everything here applies just as well to Stripe, GitHub, Shopify or anything else that calls you back.

1. Verify the signature, on the raw body

Your endpoint is a public URL. Without verification, anyone who learns it can post a fake "this address hard bounced" event and get a customer suppressed, or a fake "payment succeeded" and get a free subscription.

Providers sign each request with a shared secret. You recompute the signature and compare.

import { createHmac, timingSafeEqual } from "node:crypto";

export function verifySignature(
  rawBody: string,
  signatureHeader: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");

  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);

  // Different lengths mean it cannot match, and timingSafeEqual throws
  // rather than returning false if the lengths differ.
  if (a.length !== b.length) return false;

  return timingSafeEqual(a, b);
}

Two things in there matter more than they look.

Use timingSafeEqual, not ===. A normal string comparison returns as soon as it finds a differing byte. An attacker who can measure that timing can recover a valid signature byte by byte. It is a real attack, it is easy to avoid, and the fix is one function call.

Verify against the raw body, not the parsed object. This is the single most common webhook bug, and it is maddening to debug because everything looks correct.

// This breaks signature verification. JSON.parse then JSON.stringify is not
// byte-identical to what was sent: key order, whitespace and unicode escaping
// can all change.
app.use(express.json());
app.post("/webhooks/email", (req, res) => {
  verifySignature(JSON.stringify(req.body), req.get("X-Signature"), SECRET);
});

// Capture the raw bytes for this route before anything parses them.
app.post(
  "/webhooks/email",
  express.raw({ type: "application/json" }),
  (req, res) => {
    const raw = req.body.toString("utf8");
    if (!verifySignature(raw, req.get("X-Signature") ?? "", SECRET)) {
      return res.sendStatus(401);
    }
    const event = JSON.parse(raw);
    // ...
  },
);

In a Next.js route handler you get this for free, because await req.text() gives you the untouched body:

export async function POST(req: Request) {
  const raw = await req.text();
  const signature = req.headers.get("x-signature") ?? "";

  if (!verifySignature(raw, signature, process.env.WEBHOOK_SECRET!)) {
    return new Response("invalid signature", { status: 401 });
  }

  const event = JSON.parse(raw);
  // ...
}

If your framework has body-parsing middleware enabled globally, exempt the webhook path. Most "the signature is valid in curl but fails in my app" questions trace back to this.

2. Reject replays

A valid signature proves the payload came from your provider. It does not prove it is happening now. Someone who captures one request can send it again in a month, signature intact.

Most providers include a timestamp in the signed content for this, so it cannot be changed without breaking the signature. Check it:

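const FIVE_MINUTES = 5 * 60 * 1000;

// timestamp must be in milliseconds. Many providers send Unix seconds,
// so convert first (seconds * 1000) or every event will look stale.
function isFresh(timestamp: number): boolean {
  return Math.abs(Date.now() - timestamp) < FIVE_MINUTES;
}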

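Use Math.abs so clock skew in either direction is handled. Five minutes is the usual tolerance: long enough to absorb clock skew and network delay, short enough that a captured request is useless by the time anyone finds it. It does not need to cover retry delays, because providers such as Stripe sign each retry with a fresh timestamp. Inside the window, the event-id deduplication in section 4 catches a straight replay.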

3. Return 200 immediately, do the work afterwards

Providers give you a short timeout, often 5 to 30 seconds. Go over it and they record a failure and retry. If your handler is slow because it sends an email, updates three tables and calls another API, you will get retried while the first attempt is still running, and now you have two of them.

Acknowledge first, process after:

export async function POST(req: Request) {
  const raw = await req.text();
  if (!verifySignature(raw, req.headers.get("x-signature") ?? "", SECRET)) {
    return new Response("invalid signature", { status: 401 });
  }

  const event = JSON.parse(raw);

  // Durable queue, not a floating promise. If the process dies between the
  // 200 and the work, a background job survives it; a stray async call does not.
  await enqueue("webhook-events", event);

  return new Response("ok", { status: 200 });
}

The enqueue has to be durable. processEvent(event) without an await, fired off before returning, will silently lose events on deploy, restart or crash, and you will never know which ones.

Return the right status, keeping in mind that providers differ in how strictly they honor it. 200 means "I have it, stop retrying". A 4xx means "this is broken, do not bother retrying" (some providers retry it anyway). A 5xx means "try me again". Returning 200 on an error you could have recovered from throws the event away permanently.

4. Assume every event arrives twice

Retries are not an edge case. They happen on timeouts, on deploys, on network blips, and some providers retry on a schedule for hours. Your handler will run twice on the same event, and it must not do the work twice.

Do not solve this with a "have I seen this?" check in application code. Two concurrent retries will both read "no" before either writes. Let the database decide:

CREATE TABLE processed_webhook_events (
    event_id     TEXT PRIMARY KEY,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

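async function handleOnce(event: WebhookEvent) {
  // Claim the id and do the work in one transaction. If the work throws, the
  // claim rolls back with it, so the provider's retry is processed, not skipped.
  await db.transaction().execute(async (trx) => {
    const claimed = await trx
      .insertInto("processed_webhook_events")
      .values({ event_id: event.id })
      .onConflict((oc) => oc.doNothing())
      .executeTakeFirst();

    // Somebody else already claimed this id: a retry, or a concurrent delivery.
    if (claimed.numInsertedOrUpdatedRows === 0n) return;

    await doTheActualWork(trx, event);
  });
}

The primary key does the work. Whichever request inserts first wins; a concurrent insert waits for that transaction and then does nothing, and no amount of concurrency changes that. Because the claim commits only together with the work, a failure halfway through releases the id for the next retry instead of marking a lost event as processed. Keep the work's database writes on trx so they commit or roll back with the claim; side effects outside the database, such as sending an email, still need their own idempotency.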

If the provider does not send a stable event id, build one from fields that identify the occurrence: the message id plus the event type plus the timestamp. Do not hash the whole payload, because providers add fields over time and your fingerprint would change for what is really the same event.
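A sketch of that fallback, assuming the payload carries messageId, type and timestamp fields and that timestamp records when the event happened rather than when this delivery attempt was sent (otherwise every retry would get a new id). The field names are hypothetical; map them to your provider's payload.

```typescript
import { createHash } from "node:crypto";

// Hypothetical payload shape: only the identifying fields are typed.
interface DeliveryEvent {
  messageId: string;
  type: string;
  timestamp: string; // when the event happened, not when it was delivered
  [field: string]: unknown;
}

// Hash the identity of the occurrence, not the whole payload, so fields the
// provider adds later do not change the id of what is really the same event.
function syntheticEventId(event: DeliveryEvent): string {
  // A separator that cannot appear in the fields keeps ("a:b", "c") and
  // ("a", "b:c") from producing the same input.
  const identity = [event.messageId, event.type, event.timestamp].join("\u0000");
  return createHash("sha256").update(identity).digest("hex");
}
```

The result slots straight into the event_id primary key, so the database-enforced deduplication works the same way whether the id came from the provider or from this fingerprint.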

5. Events arrive out of order

There is no ordering guarantee. A retried "sent" event from three minutes ago can land after the "delivered" event that followed it. Naive handling walks the status backwards:

// Wrong: last writer wins, whatever it says.
await db.update(emails).set({ status: event.type }).where(eq(emails.id, id));

An email that was delivered, then bounced, then had its "sent" event retried will finish as "sent". Every dashboard is now wrong.

Fix it with precedence, not timestamps. Some states are terminal and outrank late arrivals:

const RANK = { queued: 0, sent: 1, delivered: 2, complained: 3, bounced: 4 } as const;
type Status = keyof typeof RANK;

function nextStatus(current: Status, incoming: Status): Status {
  return RANK[incoming] > RANK[current] ? incoming : current;
}

Now a late sent cannot overwrite bounced, and the order events happen to arrive in stops mattering. Provider timestamps work too, but they come from someone else's clock, and precedence encodes what you actually mean.

6. Fail loudly, and keep what you could not process

Events you reject are gone. The provider retries a few times and gives up, and by then nobody is looking. Store what you could not handle:

try {
  await handleOnce(event);
} catch (err) {
  await db.insert(failedWebhookEvents).values({
    eventId: event.id,
    payload: JSON.stringify(event),
    error: String(err),
  });
  throw err; // rethrow so the handler returns 5xx and the provider retries as well
}

Two reasons this earns its place. You can replay after fixing the bug, and a sudden pile of rows in that table is the clearest possible alert that something changed on their side.

Testing it locally

You cannot receive a webhook on localhost from the internet, so use a tunnel:

# cloudflared, ngrok or similar: whatever puts a public URL on your local port
cloudflared tunnel --url http://localhost:3000

For the tests that matter, skip the network entirely. Ordering is a pure function, signature verification needs only the payload and the secret, and idempotency needs nothing more than a test database, so test them directly:

it("processes a retried event only once", async () => {
  const event = { id: "evt_1", type: "bounced", messageId: "m1" };

  await handleOnce(event);
  await handleOnce(event); // the retry

  expect(await countSuppressions("m1")).toBe(1);
});

it("does not let a late sent overwrite a bounce", () => {
  expect(nextStatus("bounced", "sent")).toBe("bounced");
});

These run in milliseconds, need no tunnel, and cover the failures that actually happen in production.

The checklist

Verify the signature on the raw body, with a constant-time compare.
Reject events outside a few minutes of now.
Return 200 fast, do the work in a durable queue.
Deduplicate on the provider's event id, enforced by a unique constraint.
Rank your states so late events cannot walk them backwards.
Persist failures so you can replay them and notice them.

Most providers document their half of this well. SMTPfast's delivery webhooks send a signed payload with a stable event id per delivery, bounce and complaint, which is what makes points 1 and 4 straightforward, and it is worth checking your provider gives you both before you build against them. If they do not send a stable event id, you have to synthesise one, and that is a good thing to find out on day one rather than after your first duplicate.

The pattern is the same everywhere: an endpoint anyone can call, that runs more than once, in an order you do not control. Build for that from the start and webhooks are boring. Discover it in production and they are the thing that quietly corrupted a week of data.

Build a Terraform provider for your side project's API (it's smaller than you think)

DevOps Daily — Thu, 23 Jul 2026 19:56:52 +0000

Writing a Terraform provider sounds like a big-company activity, something HashiCorp partners do with a team and a roadmap. In practice, if your API has CRUD endpoints, the provider is mostly plumbing. Here is the map worth having before you start.

The running example throughout is the open-source Terraform provider for SMTPfast, a transactional email API. It is a small, complete provider you can read end to end as a reference:

https://github.com/smtpfast/terraform-provider-smtpfast

Why bother

Because your users' infrastructure already lives in Terraform. If creating a domain or an API key in your product requires clicking a dashboard, your product is the one manual step in their otherwise automated stack. A provider turns your product into:

resource "smtpfast_domain" "prod" {
  name = "mail.example.com"
}

resource "smtpfast_api_key" "ci" {
  name = "ci-sender"
}

That is a real workflow now: domains and keys created in code review, not in a browser.

The stack in 2026: Plugin Framework

Use the Terraform Plugin Framework (terraform-plugin-framework), not the older SDKv2. It is the maintained path, strongly typed, and much less magical. A provider is a Go module with three kinds of pieces:

The provider type: configuration (API endpoint, API key) and client setup.
Resources: things Terraform creates, updates, and deletes. Each one implements Create, Read, Update, Delete.
Data sources: read-only lookups.

The skeleton for a resource looks like this (trimmed):

type domainResource struct {
    client *apiClient
}

func (r *domainResource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
    var plan domainModel
    resp.Diagnostics.Append(req.Plan.Get(ctx, &plan)...)
    if resp.Diagnostics.HasError() {
        return
    }

    domain, err := r.client.CreateDomain(ctx, plan.Name.ValueString())
    if err != nil {
        resp.Diagnostics.AddError("create failed", err.Error())
        return
    }

    plan.ID = types.StringValue(domain.ID)
    plan.Status = types.StringValue(domain.Status)
    resp.Diagnostics.Append(resp.State.Set(ctx, plan)...)
}

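Read fetches current state, Delete calls the delete endpoint, and Update is often empty for immutable resources: mark their attributes with the RequiresReplace plan modifier so any change destroys and recreates the resource instead of calling Update. If you have written an API client before, none of this is new; the framework just dictates where each piece lives.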

The parts that actually take time

State drift is the whole game. Terraform's job is comparing desired state to real state, and your Read function is where reality comes from. Get lazy there (returning stale fields, ignoring server-side changes) and users get mysterious diffs on every plan. Rule of thumb: Read should map every attribute you declared, from the API response, every time. If the API reports that the object is gone (a 404), call resp.State.RemoveResource(ctx) so the next plan recreates it rather than failing.

Sensitive values need care. Some APIs return an API key only once, at creation (SMTPfast's does). That means the provider stores the secret in state at create time and must not try to re-read it later. Mark it Sensitive: true in the schema and document that state files must be treated as secrets (they always should be anyway).

Import support is what makes it adoptable. Users have existing domains. terraform import support (implementing ImportState) lets them adopt the provider without recreating anything. Skipping it is the difference between a toy and a tool.

Acceptance tests hit the real API. The acceptance test harness (the terraform-plugin-testing module, enabled with TF_ACC=1) spins resources up and down against a live account. Budget for a test tenant, because these tests create and destroy real things.

Publishing to the registry

This part is more checklist than code:

Repo named terraform-provider-<name>, tagged releases with semver (v0.1.0).
Release binaries built by GoReleaser for the usual OS/arch matrix.
Sign the release with a GPG key.
Publish the GPG public key on the Terraform Registry, then add the provider from your GitHub repo.

After that, terraform init fetches the provider like any other, and users write source = "smtpfast/smtpfast" in their required_providers block.

Is it worth it?

For the effort (roughly a weekend for v0 with two resources and a data source), unreasonably yes. It closes a real gap for users who automate everything, and writing the provider tends to surface API inconsistencies you did not know you had (drift makes a sloppy API visible). Infrastructure-as-code users are exactly the audience a developer API wants.

If your side project has an API, put a provider on the list. Start with your two most-created resources, implement Read properly, ship import support, and grow from there. The SMTPfast provider repo is MIT and small enough to read in one sitting; steal the structure.

Send an Email by Hand: The Raw SMTP Conversation (and Why You Should Not Do It in Production)

DevOps Daily — Thu, 23 Jul 2026 10:21:43 +0000

Every email your application sends is, underneath the library and the API, a short text conversation between two servers. You can have that conversation yourself: open a socket to a mail server, type a handful of commands, and a real message lands in a real inbox. Doing it once, by hand, teaches you more about email than any amount of reading, because it shows you exactly what your send() call is doing on your behalf.

This post walks the whole SMTP conversation one command at a time, then explains the harder truth: the reason nobody sends production email this way. The gap between "I typed the commands and it worked" and "millions of messages reach the inbox every day" is where retries, encryption, authentication, DKIM, suppression, and sender reputation live. Understanding the raw protocol is exactly what makes those production concerns make sense.

If you would rather watch the flow than type it, our SMTP Flow Simulator animates the same conversation, from app submission through TLS, auth, DNS checks, the recipient MX relay, retries, and bounces. Keep it open in a tab as you read.

TL;DR

SMTP is a line-based text protocol. The client types commands (EHLO, MAIL FROM, RCPT TO, DATA); the server answers with 3-digit codes (220, 250, 354).
You can send a real email by hand with telnet or openssl s_client. It works, and it is the single best way to understand the protocol.
The envelope (MAIL FROM / RCPT TO) is separate from the headers (From: / To: inside DATA). That split is why spoofing is easy and why SPF, DKIM, and DMARC exist.
Production sending needs everything the raw conversation does not give you: TLS everywhere, authentication, DKIM signing, connection reuse, retry-with-backoff, bounce and complaint handling, suppression lists, and IP/domain reputation.
Once you have seen the protocol, an API like SMTPfast stops being a black box: it is the raw conversation plus every production concern handled for you.

Prerequisites

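A terminal with openssl (it ships with macOS and most Linux distros) and telnet for the plaintext opening. Recent macOS releases no longer include telnet, so you may need to install it.
A rough idea of TCP ports and DNS. You do not need to know SMTP yet; that is the point.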
A domain you control if you want to test authenticated sending. Sending to your own address is the safe way to experiment.

The conversation, one command at a time

SMTP runs on a few well-known ports: 25 (server-to-server relay), 465 (implicit TLS submission), and 587 (submission with STARTTLS). As a client submitting mail, you want 587.

Every exchange follows the same rhythm: you send a line, the server replies with a 3-digit status code and some text. 2xx means success, 3xx means "keep going, send more", 4xx is a temporary failure (try again later), and 5xx is permanent (do not retry).

Here is the opening. Connect to the submission port (587) of a mail server and say hello with EHLO (the extended HELO), which asks the server to list what it supports:

That 250- block is the server advertising what it can do: it supports STARTTLS (upgrade the connection to encrypted), AUTH (log in), a max message SIZE, and 8BITMIME. The last line uses 250 (space, not dash) to signal the end of the list.
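The reply rules described so far (three-digit codes, a dash for "more lines follow", a space for the last line) fit in a few lines of code. This is a sketch, assuming the client has already split the socket stream into lines and stripped the CRLF; the function names and the sample EHLO lines in the checks are made up for illustration.

```typescript
// One SMTP reply, which may span several lines.
interface SmtpReply {
  code: number;
  lines: string[];
}

// Parse reply lines (CRLF already stripped). "250-..." means more lines follow;
// "250 ..." or a bare "250" is the last line of the reply.
function parseReply(rawLines: string[]): SmtpReply {
  const lines: string[] = [];
  for (const raw of rawLines) {
    const match = /^(\d{3})(?:([ -])(.*))?$/.exec(raw);
    if (!match) throw new Error(`not an SMTP reply line: ${raw}`);
    lines.push(match[3] ?? "");
    if (match[2] !== "-") return { code: Number(match[1]), lines };
  }
  throw new Error("reply ended without a final line");
}

type ReplyClass = "success" | "send more" | "retry later" | "permanent failure";

// The first digit carries the meaning: 2xx, 3xx, 4xx, 5xx.
function classifyReply(code: number): ReplyClass {
  if (code >= 200 && code < 300) return "success";
  if (code >= 300 && code < 400) return "send more";
  if (code >= 400 && code < 500) return "retry later";
  return "permanent failure"; // 5xx, and anything unexpected
}

// EHLO: the first line greets; each later line is a keyword plus parameters.
function ehloCapabilities(reply: SmtpReply): string[] {
  return reply.lines.slice(1).map((line) => {
    const [keyword = ""] = line.split(" ");
    return keyword.toUpperCase();
  });
}
```

A client decides whether it can STARTTLS or AUTH by checking this capability list, never by assuming the server supports them.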

Notice what the server told us: it offers STARTTLS, so right now we are talking in plaintext. Anything we send, including a password, is readable on the wire. So before authenticating, we upgrade.

Never send AUTH credentials over an un-upgraded connection. If a server lets you authenticate in plaintext on port 25, that is a red flag, not a convenience. Always STARTTLS (or connect to the implicit-TLS port 465) before AUTH.

Encrypt, authenticate, and send

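After STARTTLS, the connection becomes TLS-encrypted and the plaintext telnet can no longer read it. The practical way to do the encrypted half by hand is openssl s_client -starttls smtp -crlf -quiet -connect mail.example.com:587 (substitute your server). It performs STARTTLS for you; -crlf sends the CRLF line endings SMTP requires, and -quiet stops s_client from treating a line that starts with R (such as RCPT TO) as its renegotiate command. It then drops you into the now-secure session: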

That 250 Ok: queued as 4F1a2b3c is the moment the server accepts responsibility for your message. You just sent an email with your bare hands.
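Typing AUTH by hand means encoding credentials yourself. AUTH PLAIN and AUTH LOGIN carry them base64-encoded, which is an encoding, not encryption, and is exactly why STARTTLS has to come first. A small Node helper that produces the lines you type; the function names are illustrative, and most servers prompt for AUTH LOGIN with 334 VXNlcm5hbWU6 and 334 UGFzc3dvcmQ6, base64 for Username: and Password:.

```typescript
import { Buffer } from "node:buffer";

// Base64 is an encoding, not encryption: anyone on the wire can reverse it.
const toBase64 = (text: string): string => Buffer.from(text, "utf8").toString("base64");
const fromBase64 = (text: string): string => Buffer.from(text, "base64").toString("utf8");

// AUTH PLAIN in one line: base64 of "\0username\0password" (empty authorization identity).
function authPlainCommand(user: string, pass: string): string {
  return `AUTH PLAIN ${toBase64(`\u0000${user}\u0000${pass}`)}`;
}

// AUTH LOGIN is a dialogue: after "AUTH LOGIN" the server sends a base64 prompt
// on a 334 line, and you answer each prompt with one base64-encoded line.
function decodePrompt(replyLine: string): string {
  return fromBase64(replyLine.slice(4)); // drop the "334 " prefix
}

function authLoginAnswers(user: string, pass: string): [string, string] {
  return [toBase64(user), toBase64(pass)];
}
```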

Open the simulator alongside this section to watch the whole handshake animate, including what happens after the queue (DNS lookups, the recipient's MX, retries, and inbox placement).

The one detail that explains a decade of email security

Look again at two different places the sender address appeared:

In the envelope: MAIL FROM:<you@example.com>
In the headers, inside DATA: From: You <you@example.com>

These are two independent fields, and nothing in SMTP forces them to match. The envelope MAIL FROM is where bounces are returned (it ends up as the Return-Path header), while routing follows the envelope RCPT TO; the header From: is what the recipient sees in their mail client. You can put anything you like in either.

That single design fact is why email spoofing is trivial and why the entire modern anti-abuse stack exists:

SPF checks whether the sending IP is allowed to use the envelope MAIL FROM domain.
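DKIM cryptographically signs the message so a receiver can verify that the signing domain (the d= tag) vouched for it and that the signed headers and body were not altered in transit.
DMARC ties the two together: it passes only when SPF or DKIM passes for a domain aligned with the header From:, and it tells receivers what to do when that check fails.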

You cannot understand why deliverability is hard until you have seen that the protocol itself will happily let you claim to be anyone. If you want the practical setup for the three records, we walk through them in the SMTP Flow Simulator's DNS-check stage.

Why you should not do this in production

Typing the conversation once is enlightening. Building your production sending on top of raw SMTP calls is a mistake, and here is the specific list of what the happy-path telnet session quietly skips.

Delivery is not a single request. Your 250 queued only means the first hop accepted the message. The receiving server still has to be found (MX lookup), might be down, might greylist you with a 4xx and expect a retry in a few minutes, or might defer under load. Production senders need a real retry queue with exponential backoff that distinguishes 4xx (retry) from 5xx (give up and record a bounce). A shell one-liner does none of this.

Authentication of the message, not just the connection. AUTH LOGIN proved you could log in. It did nothing to prove to the recipient that the message is legitimate. That requires DKIM signing every outgoing message with a private key whose public half lives in your DNS. Get the canonicalization or header selection wrong and signatures fail silently at the receiver.
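To make "canonicalization" concrete, here is a sketch of DKIM's relaxed header canonicalization from RFC 6376, section 3.4.2, for a single header field. It is only one piece of signing: real DKIM also canonicalizes the body, hashes the selected headers, builds the DKIM-Signature header, and signs with RSA or Ed25519, so use a maintained library rather than this.

```typescript
// DKIM "relaxed" header canonicalization (RFC 6376, section 3.4.2) for one
// header field. Input may include folded continuation lines; the result omits
// the trailing CRLF that the signer appends when hashing.
function relaxedHeader(rawField: string): string {
  const colon = rawField.indexOf(":");
  if (colon < 0) throw new Error("not a header field");

  // Lowercase the field name and drop whitespace before the colon.
  const name = rawField.slice(0, colon).trim().toLowerCase();

  const value = rawField
    .slice(colon + 1)
    .replace(/\r\n(?=[ \t])/g, "") // unfold: remove CRLF that precedes whitespace
    .replace(/[ \t]+/g, " ") // collapse each run of spaces and tabs to one space
    .trim(); // drop whitespace after the colon and at the end of the value

  return `${name}:${value}`;
}
```

Under the simple algorithm, by contrast, headers are signed exactly as sent, so a relay that merely refolds a long Subject line breaks the signature.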

Connections are expensive and rate-limited. Opening a fresh TCP + TLS handshake per message is slow and will get you throttled. Real senders pool connections, pipeline commands, and respect per-receiver rate limits (Gmail, Outlook, and Yahoo each have their own).

Bounces and complaints must feed back. When a 5xx bounce or a spam complaint (via a feedback loop) comes in, you must stop mailing that address, immediately. Keep hitting dead addresses and mailbox providers read it as spammer behavior and start filtering everything you send. This means maintaining a suppression list and honoring it on every send.
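A minimal in-memory sketch of that feedback loop. The class and method names are hypothetical; treating every 5xx as a hard bounce is a simplification (production systems look at enhanced status codes, such as 5.1.1 for an unknown mailbox, before suppressing); and comparing addresses case-insensitively is a policy choice, since RFC 5321 allows the local part to be case-sensitive.

```typescript
type SuppressionReason = "hard-bounce" | "complaint";

class SuppressionList {
  private readonly entries = new Map<string, SuppressionReason>();

  // Assumed policy: addresses compare case-insensitively.
  private static key(address: string): string {
    return address.trim().toLowerCase();
  }

  // Only permanent signals suppress; a 4xx deferral goes back to the retry queue.
  recordOutcome(address: string, outcome: { smtpCode: number } | { complaint: true }): void {
    if ("complaint" in outcome) {
      this.entries.set(SuppressionList.key(address), "complaint");
    } else if (outcome.smtpCode >= 500 && outcome.smtpCode < 600) {
      this.entries.set(SuppressionList.key(address), "hard-bounce");
    }
  }

  // Checked before every send: a suppressed address is skipped, never retried.
  canSend(address: string): boolean {
    return !this.entries.has(SuppressionList.key(address));
  }
}
```

In a real system the list lives in a database shared by every sending process, and the stored reason is what you show when a customer asks why an address stopped receiving mail.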

Reputation is earned slowly and lost fast. Mailbox providers score the IP and domain you send from. New senders must warm up gradually; a sudden spike from a cold IP looks like a compromised account. One bad campaign, or one afternoon of retrying dead addresses, can tank delivery for weeks.

None of these are protocol features. They are operational systems you would have to build and run around SMTP. That is the actual product an email platform sells.

The two production paths (and where each fits)

Once you have decided not to hand-roll SMTP, you have two real options, and they are not mutually exclusive.

1. Keep speaking SMTP, but let something else manage it. Your app already knows how to talk SMTP (every language has a client), so the smallest change is to point that client at a service that handles TLS, auth, DKIM, retries, and reputation for you. That is exactly what the SMTPfast SMTP bridge is: you keep your existing nodemailer / smtplib / Mail::Sender code and just change the host, port, and credentials. Everything from the "why not in production" list above becomes someone else's job. This is the path of least resistance for legacy apps and anything that already emits SMTP.

2. Send over a REST API. If you are writing new code, a JSON POST is simpler than managing an SMTP client, connection pool, and MIME construction. You hand over the from, to, subject, and body; the platform builds the message, signs it, sends it, retries it, and streams back delivery events. SMTPfast exposes this as a plain REST API (and there is a hosted MCP server if you want an AI agent to send on your behalf).

The useful way to think about it: the raw conversation you just typed is the floor. An API is that floor plus the retry queue, the DKIM signer, the suppression list, and the reputation management, all of which you would otherwise build and babysit yourself.

Raw SMTP (by hand):

EHLO laptop.local
AUTH LOGIN
...
MAIL FROM:<you@example.com>
RCPT TO:<friend@example.net>
DATA
Subject: Sent by hand

hello
.

SMTP client (bridge):

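import nodemailer from 'nodemailer';
// Point an existing SMTP client at the bridge. With port 587 (secure: false, the default),
// nodemailer upgrades the connection with STARTTLS when the server offers it.
const t = nodemailer.createTransport({
  host: 'smtp.smtpfa.st', port: 587,
  auth: { user: 'apikey', pass: process.env.SMTPFAST_KEY }
});
await t.sendMail({ from: 'you@example.com', to: 'friend@example.net', subject: 'hi', text: 'hello' });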

REST API:

curl https://smtpfa.st/api/v1/emails \
  -H "Authorization: Bearer $SMTPFAST_KEY" \
  -H 'Content-Type: application/json' \
  -d '{"from":"you@example.com","to":"friend@example.net","subject":"hi","text":"hello"}'

What to take away

The SMTP conversation is small enough to type by hand and old enough to have accumulated every workaround the internet ever invented for trust. Sending one message manually is the fastest way to internalize three things: the protocol is just text, the envelope and headers are separate (so the sender is unverified by default), and the 250 queued you get back is the easy part.

Everything hard about email (deliverability, authentication, retries, reputation) lives above the protocol, in the operational layer. That is precisely the layer you are choosing to build yourself or hand to a service like SMTPfast when you pick how your app sends mail.

Go type the conversation once. Then go watch the whole delivery path, retries and bounces included, in the SMTP Flow Simulator. After that, send() will never look like a black box again.

SPF, DKIM, DMARC: the 15-minute setup that actually passes

DevOps Daily — Wed, 22 Jul 2026 10:23:20 +0000

Every guide to email authentication starts with a history lesson. Skip it. You are here because your emails land in spam, or a client asked "is DMARC set up?", or you saw dmarc=fail in a bounced message. Here is the 15-minute version: the exact records, why each one exists in one sentence, and how to verify you got it right.

The one-sentence versions

SPF lists which servers may send mail claiming to be from your domain.
DKIM cryptographically signs each message so receivers can verify it was not altered and really came from you.
DMARC tells receivers what to do when SPF or DKIM fail, and sends you reports about it.

SPF and DKIM are the mechanisms. DMARC is the policy on top. You need all three.

SPF: one TXT record

At your domain root (or the subdomain you send from), add a TXT record:

v=spf1 include:_spf.yourprovider.com ~all

Replace the include with whatever your email provider documents. For Amazon SES it is include:amazonses.com, and for Google Workspace it is include:_spf.google.com. If you send through several providers, chain the includes in a single record:

v=spf1 include:amazonses.com include:_spf.google.com ~all

Mistake #1: two SPF records. SPF allows exactly one TXT record starting with v=spf1 per domain. Two records means SPF returns a permanent error, which is worse than no record at all. Merge them.

Also know the 10-DNS-lookup limit: every include costs lookups, and past 10 the check fails. If you have collected includes from years of tools, prune them.
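The lookup limit counts recursively. Every include pulls in another record, and the include, a, mx, ptr, exists, and redirect terms inside it count too. The sketch below checks that count and the one-record rule. It assumes an in-memory map in place of real DNS, and the domains and records in the example are made up. A real checker would query TXT records and also enforce the void-lookup limit.

```typescript
// Stand-in for DNS: domain -> TXT strings published at that name.
type TxtDb = Record<string, string[]>;

// Terms that cost a DNS lookup under RFC 7208, section 4.6.4.
const LOOKUP_TERMS = new Set(['include', 'a', 'mx', 'ptr', 'exists', 'redirect']);

function spfRecord(db: TxtDb, domain: string): string {
  const records = (db[domain] ?? []).filter((r) => /^v=spf1(\s|$)/i.test(r));
  // Zero records is "none" at the top level (and a permanent error inside an include);
  // two or more is always a permanent error.
  if (records.length !== 1) {
    throw new Error(`${domain}: expected one v=spf1 record, found ${records.length}`);
  }
  return records[0];
}

function countLookups(db: TxtDb, domain: string): number {
  let count = 0;
  for (const term of spfRecord(db, domain).split(/\s+/).slice(1)) {
    const bare = term.replace(/^[+\-~?]/, '').toLowerCase(); // drop the qualifier
    const name = bare.split(/[:=/]/)[0];                       // "include:x" -> "include"
    if (!LOOKUP_TERMS.has(name)) continue;                     // ip4, ip6, all are free
    count += 1;
    if (name === 'include' || name === 'redirect') {
      count += countLookups(db, bare.slice(name.length + 1));  // follow the nested record
    }
  }
  return count;
}
```

A result of 9 or 10 means you are one new tool away from a failing SPF check, so that is the moment to prune.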

Verify it with dig:

dig +short TXT yourdomain.com | grep spf1

or use a checker that also counts your lookups, like this free SPF checker.

DKIM: the CNAMEs your provider gives you

You do not write DKIM records by hand. Your provider generates a key pair, keeps the private key, and gives you DNS records (usually 1 to 3 CNAMEs) that publish the public key. They look like:

abc123._domainkey.yourdomain.com  CNAME  abc123.dkim.provider.com

Add them exactly as given and wait for verification. That is it.

Mistake #2: proxying the DKIM CNAMEs. If your DNS is behind Cloudflare, those records must be DNS only (grey cloud). Proxied CNAMEs resolve to Cloudflare IPs and DKIM verification never completes. This one costs people days.

Verify with the selector your provider used:

dig +short TXT abc123._domainkey.yourdomain.com

You should see a v=DKIM1; k=rsa; p=... blob. A DKIM checker does the same with the parsing done for you.
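DKIM key records and DMARC records share the same tag=value; syntax, so most of a checker is a tag-list parser. The sketch below assumes the quoted chunks that dig prints for long records have already been joined into one string. It checks only what this section cares about: a DKIM1 record with a non-empty p=.

```typescript
// Parse a DKIM/DMARC-style tag list: "tag=value; tag=value;" with optional whitespace.
function parseTagList(record: string): Map<string, string> {
  const tags = new Map<string, string>();
  for (const part of record.split(';')) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // skip segments with no "=", e.g. the empty one after a trailing ";"
    const name = part.slice(0, eq).trim();
    if (tags.has(name)) throw new Error(`duplicate tag: ${name}`);
    tags.set(name, part.slice(eq + 1).trim());
  }
  return tags;
}

// Check a DKIM public-key record and return the base64 key from p=.
function checkDkimKey(record: string): string {
  const tags = parseTagList(record);
  const version = tags.get('v');
  if (version !== undefined && version !== 'DKIM1') throw new Error(`unexpected v=${version}`);
  // Whitespace inside p= is ignored; an empty p= means the key was revoked.
  const key = (tags.get('p') ?? '').replace(/\s+/g, '');
  if (key === '') throw new Error('missing or empty p= (no usable public key)');
  return key;
}
```

The same parser reads a DMARC record: `parseTagList('v=DMARC1; p=none; rua=mailto:dmarc-reports@yourdomain.com').get('p')` returns `'none'`.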

DMARC: start monitoring, then enforce

Add a TXT record at _dmarc.yourdomain.com:

v=DMARC1; p=none; rua=mailto:dmarc-reports@yourdomain.com

p=none means "change nothing, just send me aggregate reports about who is sending as my domain." Run in this mode for a couple of weeks and read the reports; you will usually discover a forgotten tool sending as your domain.

Then enforce:

v=DMARC1; p=quarantine; rua=mailto:dmarc-reports@yourdomain.com; pct=100

and eventually p=reject. Enforcement is what actually stops spoofing. Publishing a record at all now matters too: since 2024, Gmail and Yahoo have required bulk senders to have a DMARC record, and p=none satisfies that requirement.

Mistake #3: jumping straight to p=reject. If some legitimate system sends unaligned mail (a CRM, a billing tool, an old cron job), p=reject silently kills those messages. Monitor first, enforce second.
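"Unaligned" is the key word. DMARC passes only when SPF or DKIM passes for a domain that matches the visible From: domain. Here is a minimal sketch of that rule with one loud simplification. It approximates the organizational domain as the last two labels, while real implementations use the Public Suffix List, so this version is wrong for names like example.co.uk.

```typescript
type Mode = 'r' | 's'; // relaxed (default) or strict, from the aspf= and adkim= tags

// SIMPLIFICATION: real DMARC uses the Public Suffix List to find the organizational domain.
function orgDomain(domain: string): string {
  return domain.toLowerCase().split('.').slice(-2).join('.');
}

function aligned(authDomain: string, fromDomain: string, mode: Mode): boolean {
  return mode === 's'
    ? authDomain.toLowerCase() === fromDomain.toLowerCase()
    : orgDomain(authDomain) === orgDomain(fromDomain);
}

interface AuthResults {
  fromDomain: string; // domain of the visible From: header
  spfPass: boolean;
  spfDomain: string;  // envelope MAIL FROM domain that SPF evaluated
  dkimPass: boolean;
  dkimDomain: string; // d= of the DKIM signature that verified
}

// DMARC passes if at least one mechanism passes AND is aligned with From:.
function dmarcPass(r: AuthResults, aspf: Mode = 'r', adkim: Mode = 'r'): boolean {
  return (r.spfPass && aligned(r.spfDomain, r.fromDomain, aspf)) ||
    (r.dkimPass && aligned(r.dkimDomain, r.fromDomain, adkim));
}
```

A CRM that sends as your domain but authenticates only its own is Mistake #3 in miniature. SPF and DKIM both pass, DMARC still fails, and p=reject drops the message. Signing with a DKIM key on your own domain fixes it.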

Verify:

dig +short TXT _dmarc.yourdomain.com

or decode the policy in plain English with a DMARC analyzer.

The 15-minute checklist

1. One SPF record, correct include, ~all at the end. Check the lookup count.
2. Add the provider's DKIM CNAMEs, unproxied. Confirm the selector resolves.
3. _dmarc record at p=none with a rua address. Calendar reminder for two weeks: read reports, move to quarantine, then reject.
4. Send a test email to a Gmail account, open "Show original", and confirm all three lines say PASS.

Step 4 is the ground truth. Gmail's "Show original" view shows spf=pass dkim=pass dmarc=pass right at the top, and it is checking the real thing rather than just the DNS.

Once these pass, deliverability problems stop being an authentication problem and start being a reputation problem. That is a different article, but you cannot get there without this one.

Speaking of email: if you want to see how SMTP works under the hood, the SMTP Flow Simulator lets you watch a message move from application code to an SMTP relay, through TLS and AUTH, across DNS and recipient MX checks, and finally into a mailbox.

I gave my AI agent the ability to send email

DevOps Daily — Mon, 20 Jul 2026 19:43:43 +0000

Last month I wired Claude up to my email infrastructure. Not "Claude writes an email draft and I paste it somewhere," but an agent that checks my domain status, sends the email, and reads back the delivery events, all from the chat.

The glue that makes this possible is MCP (Model Context Protocol), and the whole setup takes about five minutes. Here is exactly how to do it, plus what surprised me once agents could actually touch production infrastructure.

What MCP actually is

MCP is a small JSON-RPC protocol that lets an AI client (Claude, Cursor, Windsurf, your own agent) call tools exposed by a server. The server describes its tools with JSON schemas, the model picks a tool and fills in the arguments, and the client executes the call.
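Here is the "server describes, model picks, client executes" loop in concrete form: a minimal in-memory sketch of the server side of two MCP methods, tools/list and tools/call. The method names and result shapes follow the MCP specification. The send_email tool and its schema are hypothetical, and a real server also handles initialize, the HTTP transport, and auth.

```typescript
type Json = Record<string, unknown>;

interface Tool {
  name: string;
  description: string;
  inputSchema: { type: 'object'; properties: Json; required: string[] };
  run(args: Json): string;
}

interface RpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: { name?: string; arguments?: Json };
}

interface RpcResponse {
  jsonrpc: '2.0';
  id: number;
  result?: Json;
  error?: { code: number; message: string };
}

// Hypothetical tool: a real one would call the email provider's REST API.
const tools: Tool[] = [{
  name: 'send_email',
  description: 'Send one email. Only call this when the user explicitly asks to send.',
  inputSchema: {
    type: 'object',
    properties: { to: { type: 'string' }, subject: { type: 'string' }, text: { type: 'string' } },
    required: ['to', 'subject', 'text'],
  },
  run: (args) => `queued email to ${String(args.to)}`,
}];

const fail = (id: number, code: number, message: string): RpcResponse =>
  ({ jsonrpc: '2.0', id, error: { code, message } });

function handle(req: RpcRequest): RpcResponse {
  if (req.method === 'tools/list') {
    // The model only ever sees names, descriptions, and JSON schemas.
    const list = tools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
    return { jsonrpc: '2.0', id: req.id, result: { tools: list } };
  }
  if (req.method === 'tools/call') {
    const tool = tools.find((t) => t.name === req.params?.name);
    if (!tool) return fail(req.id, -32602, `Unknown tool: ${String(req.params?.name)}`);
    const args = req.params?.arguments ?? {};
    const missing = tool.inputSchema.required.filter((k) => !(k in args));
    if (missing.length > 0) return fail(req.id, -32602, `Missing arguments: ${missing.join(', ')}`);
    return { jsonrpc: '2.0', id: req.id, result: { content: [{ type: 'text', text: tool.run(args) }] } };
  }
  return fail(req.id, -32601, `Method not found: ${req.method}`);
}
```

The description string is the only usage guidance the model gets, which is why tight tool descriptions matter as much as prompts.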

The important design decision is where the server runs. A lot of MCP servers are local stdio processes you install per machine. For anything talking to a hosted API, a hosted MCP server is the better shape: nothing to install, your API key is the auth, and every client that speaks HTTP can use it.

I use SMTPfast for transactional email, and it ships a hosted MCP endpoint at https://smtpfa.st/api/mcp. That is what I will use in the examples, but the pattern applies to any hosted MCP server.

Step 1: Connect Claude Code

One command:

claude mcp add --transport http smtpfast https://smtpfa.st/api/mcp \
  --header "Authorization: Bearer sf_your_api_key"

For Cursor, it is a snippet in .cursor/mcp.json:

{
  "mcpServers": {
    "smtpfast": {
      "url": "https://smtpfa.st/api/mcp",
      "headers": { "Authorization": "Bearer sf_your_api_key" }
    }
  }
}

That is the entire installation. No npm package, no local process, no version drift between machines.

Step 2: See what the agent can do

MCP servers self-describe. You can poke one with curl to see the tool list:

curl -sS https://smtpfa.st/api/mcp \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sf_your_api_key" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}'

The SMTPfast server exposes eight tools: send_email, list_emails, get_email, list_domains, verify_domain, list_suppressions, get_analytics, and list_contacts. The model reads those schemas and figures out the rest on its own.

Step 3: Just ask

With the server connected, I can type things like this into Claude:

"Check if my domain is verified, then send a test email from hello@mydomain.com to my personal address and tell me when it is delivered."

Behind the scenes the agent calls list_domains, sees the DKIM status, calls send_email, waits, then calls get_email to read the delivery events. I watch each tool call go by and approve it. No SDK, no glue code, no copy-pasting message IDs.
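The "waits" step is a polling loop on the agent's side. The sketch below assumes a hypothetical `getStatus()` wrapper around the get_email tool and uses made-up status names. Backoff keeps a slow delivery from turning into a burst of identical tool calls.

```typescript
// Hypothetical delivery states; the real get_email response may name them differently.
type Status = 'queued' | 'sent' | 'delivered' | 'bounced';

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the email reaches a final state or we run out of tries,
// doubling the wait between calls (1s, 2s, 4s, ...).
async function waitForFinal(
  getStatus: () => Promise<Status>,
  tries = 6,
  firstDelayMs = 1000,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<Status> {
  let status = await getStatus();
  let delay = firstDelayMs;
  for (let i = 1; i < tries && (status === 'queued' || status === 'sent'); i++) {
    await wait(delay);
    delay *= 2;
    status = await getStatus();
  }
  return status; // may still be 'queued' or 'sent' if tries ran out
}
```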

The debugging workflow is where this gets genuinely useful:

"Why did the email to jane@example.com bounce yesterday?"

The agent pulls the email, reads the bounce event with the SMTP diagnostic code, checks whether the address landed on the suppression list, and explains it in plain language. That used to be five minutes of clicking through a dashboard.

What surprised me

1. The approval step matters more than I expected. Most MCP clients show you each tool call before it runs. For read tools that feels like friction. For send_email it is exactly right. I would not connect a send-capable tool to an agent that runs unattended without scoping the key first.

2. Agents are great at chaining, bad at restraint. Ask a vague question and the agent will happily call four tools when one would do. Tight tool descriptions in the server matter as much as good prompts.

3. The server is the easy part. If your product already has a REST API, an MCP server is mostly a translation layer: tool schema in, API call out. The hard work (auth, rate limits, validation) already exists in the API. That is also why I prefer hosted MCP over stdio: it reuses everything the API already enforces, including that a compromised key can be revoked in one place.
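The approval lesson from point 1 can be enforced in code, not only in the client UI. The sketch below puts a gate in front of tool execution and assumes a hypothetical `approve()` callback: a human prompt in an interactive client, and always false in an unattended job. The read-only set uses the SMTPfast tool names listed earlier. Anything not on that list needs approval, including verify_domain.

```typescript
type ToolCall = { name: string; args: Record<string, unknown> };

// Tools that only read data. Unknown or new tools fall through to "needs approval".
const READ_ONLY = new Set([
  'list_emails', 'get_email', 'list_domains', 'list_suppressions', 'get_analytics', 'list_contacts',
]);

async function execute(
  call: ToolCall,
  approve: (call: ToolCall) => Promise<boolean>,
  run: (call: ToolCall) => Promise<string>,
): Promise<string> {
  if (!READ_ONLY.has(call.name) && !(await approve(call))) {
    return `denied: ${call.name} requires human approval`;
  }
  return run(call);
}
```

Deny-by-default is the design choice that matters here. A tool added to the server later cannot become send-capable for an unattended agent by accident.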

Try it

If you want to reproduce this end to end: grab a free SMTPfast account (3,000 emails/month, no card), verify a domain, create an API key, and run the claude mcp add command above. The MCP docs cover the tool schemas and a few example prompts.

And if you are building a dev tool yourself: ship the hosted MCP endpoint. It is a weekend of work and it makes your product usable by every AI agent your customers already run.

I Gave an AI Agent a Database, Compute, Storage, and Models From One CLI

DevOps Daily — Wed, 15 Jul 2026 15:21:45 +0000

A working AI agent has an unglamorous shopping list. It needs a database to remember things, somewhere to run that can stream tokens without timing out, object storage for whatever it produces, and access to a model. Assembled the usual way, that is four separate signups: a Postgres host, a compute platform, an S3 bucket, and an OpenAI or Anthropic account, each with its own credential to provision, inject, and rotate per environment.

Neon's June 2026 platform preview collapses that list. The pitch is that the database, the compute, the storage, and the model gateway all come from one account and branch together. I wanted to know if that was real or a slide, so I built the canonical example end to end: an image-generating agent that takes a prompt, calls a model, stores the result, and indexes it in Postgres. This is the build log, with the real commands and output, and the parts where the preview still shows.

(Companion repo: The-DevOps-Daily/neon-ai-agent. Everything below ran against a fresh project created while writing.)

One command to scaffold the whole stack

Neon ships starter templates through its CLI. The image agent is one of them:

neonctl bootstrap ./ai-agent --template ai-sdk

That scaffolds 26 files: a Hono function, a Drizzle schema, a neon.ts config, and (a nice touch) a .agents/skills/ directory with skill docs for the AI assistant you are probably using to edit the project. Neon bundles agent instructions for its own products, which tells you who this template is aimed at.

The file that matters is neon.ts. It is the entire backend declared in one object:

import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    aiGateway: true,
    buckets: {
      images: {},
    },
    functions: {
      imagegen: {
        name: 'AI SDK image agent',
        source: 'src/index.ts',
      },
    },
  },
});

Three lines of intent: turn on the AI gateway, give me a bucket called images, and deploy src/index.ts as a function. No connection strings, no bucket ARNs, no model API keys. Those get filled in later, automatically.

Linking creates the project, deploying creates everything else

neon link creates and attaches a Neon project. The new platform features are private preview, so there are two constraints worth stating up front: everything is in AWS us-east-2, and it only works on projects created inside the preview. Your existing Neon databases do not grow these features in place.

Then neon deploy reads neon.ts and provisions the declared services. Here is the whole sequence, link through deploy:

The last line of the deploy output is the actual product: eleven environment variables, including DATABASE_URL, the S3 access key, secret, and endpoint, and the AI gateway token and base URL. All of them were written for me, and all are scoped to this branch. The four credentials I would normally collect from four dashboards arrived from one deploy.

The model call: one credential, any provider

The AI Gateway is OpenAI-compatible. Your existing SDK works by changing only the base URL, so the same chat completion against the cheapest catalog model looks like this in whatever you already use:

curl:

curl "$NEON_AI_GATEWAY_BASE_URL/ai-gateway/mlflow/v1/chat/completions" \
  -H "Authorization: Bearer $NEON_AI_GATEWAY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"model":"gpt-5-nano","messages":[
        {"role":"user","content":"What is Neon branching?"}]}'

Python:

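import os
from openai import OpenAI

# Same chat-completions route as the curl call: the SDK appends /chat/completions to base_url.
client = OpenAI(
    base_url=os.environ["NEON_AI_GATEWAY_BASE_URL"] + "/ai-gateway/mlflow/v1",
    api_key=os.environ["NEON_AI_GATEWAY_TOKEN"],
)
reply = client.chat.completions.create(
    model="gpt-5-nano",
    messages=[{"role": "user", "content": "What is Neon branching?"}],
)
print(reply.choices[0].message.content)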

Node:

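import OpenAI from 'openai';

// Same chat-completions route as the curl call; the SDK appends /chat/completions.
const client = new OpenAI({
  baseURL: `${process.env.NEON_AI_GATEWAY_BASE_URL}/ai-gateway/mlflow/v1`,
  apiKey: process.env.NEON_AI_GATEWAY_TOKEN,
});
const reply = await client.chat.completions.create({
  model: 'gpt-5-nano',
  messages: [{ role: 'user', content: 'What is Neon branching?' }],
});
console.log(reply.choices[0].message.content);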

Hitting it once returns exactly what you would expect from the model:

{
  "model": "gpt-5-nano-2025-08-07",
  "choices": [{ "message": { "role": "assistant",
    "content": "Neon Postgres branching creates lightweight, independent
      clones of a running database that can be developed in isolation..." }}]
}

The same token reaches around 25 models across Anthropic, OpenAI, Google, and a few open-source providers. You move between them by changing one model string. There is no separate OpenAI or Anthropic account in this project. The published prices look like each provider's own list rate, so the gateway reads as pass-through with the convenience of a single credential:

The point is not the specific numbers, it is that "use a cheap model in CI and a frontier model in prod" becomes a config value rather than a second vendor integration.

Storage that the function can reach with the normal S3 SDK

The images bucket is plain S3 as far as your code is concerned. The injected AWS_* variables point the standard AWS SDK at a branch-scoped endpoint, so this just works inside the function with no custom client:

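import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';
// Credentials and the branch-scoped endpoint come from the injected AWS_* variables.
// key and jpeg (object name and image bytes) come from the surrounding handler.
const s3 = new S3Client({ forcePathStyle: true });
await s3.send(new PutObjectCommand({
  Bucket: 'images', Key: key, Body: jpeg, ContentType: 'image/jpeg',
}));
const url = await getSignedUrl(s3, new GetObjectCommand({ Bucket: 'images', Key: key }));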

I confirmed it directly: a PutObject then GetObject round-tripped, and the presigned URL came back on a host scoped to the branch (br-green-star-….storage.c-3.us-east-2.aws.neon.tech). That branch scoping is the part you cannot get by bolting an external S3 bucket onto a database: open a branch and its files fork with it, so a preview environment never writes into production's objects.

Putting it together: the agent runs

The function is a small handler. It streams a model response, and when the model calls its image-generation tool, it uploads the JPEG to the bucket, inserts a row in Postgres, and returns a presigned URL. Calling the deployed agent:

curl -X POST "$IMAGEGEN_URL" -H 'content-type: application/json' \
  -d '{"messages":[{"role":"user",
       "content":"Draw a small minimalist server rack icon, flat style"}]}'

The response streams back as the agent narrates and draws, and afterward the side effects are all there. The object is in the bucket, and the row is in Postgres pointing at it:

 id |              prompt               |             bucket_key              | bytes
----+-----------------------------------+-------------------------------------+-------
  2 | Draw a small minimalist server... | generated/ed49b102-…-f8c46e2f8c16.jpg | 47372
  1 | Draw a small minimalist server... | generated/9125d5b4-…-63b54a892695.jpg | 47372

From an empty directory to a deployed agent that generates an image, stores it, and indexes it in Postgres took a few minutes and exactly one credential. The model call, the file write, and the database insert were all wired by the platform, not by me.

Where it still shows the preview

The build was smooth, but it is private preview and a few seams are worth knowing before you plan around it.

One region, new projects only. Everything is in AWS us-east-2 and only works on projects created inside the preview. You cannot bolt these features onto an existing production database today.
Functions are request/response, not a job runner. Great for the agent's synchronous loop and streaming; background work (queues, retries, schedules) still belongs to something like Inngest or QStash.
Two gateway dialects, and it matters. The OPENAI_BASE_URL Neon injects points at the OpenAI Responses API route. A plain chat-completions call needs the mlflow dialect route instead. I hit a 404 until I switched routes. The SKILL docs the template ships actually explain this, which is the kind of detail that saves you ten minutes if you read it first.
Billing is half-public. Per-model token prices are listed, but whether there is a markup or preview credits on top is not spelled out. Fine for a demo, a question to ask before a budget.
The convenience is also coupling. One config file declaring your functions, buckets, and gateway is, by design, Neon-shaped. The S3-compatible API and standard SDKs keep the exit ramps wide, but this is a bet on one vendor for four things you used to buy separately.

So is it real?

Yes, with an asterisk for "preview." The genuinely useful part is not any single feature, it is that the four pieces an agent needs arrive together, branch together, and authenticate with one credential. If you have ever spent the first afternoon of an AI side project wiring a database to a compute host to an S3 bucket to a model provider, collapsing that into one neon.ts and one deploy is a real reduction in moving parts.

Whether you should build on it today depends on your appetite for a private preview and for vendor consolidation. But as a statement of direction, an agent stack from one CLI is a clear one. We dig into the strategy behind it in Neon is becoming a backend platform, not just Postgres, and we benchmark Neon's database side in the Neon vs Supabase series. As these features leave preview, we will keep testing them the same way: real projects, real output, and the demo code published so you can run it yourself.

The full project is on GitHub. Clone it, point neonctl at a new us-east-2 project, and deploy:

https://github.com/The-DevOps-Daily/neon-ai-agent

Realtime Without a WebSocket Service

DevOps Daily — Wed, 15 Jul 2026 15:09:58 +0000

The moment a feature needs to update live (a live counter, a presence indicator, a "new message" badge, an activity feed), the reflex is to reach for a websocket service: Pusher, Ably, a Socket.IO server, or a stateful Node process parked next to your stateless app. That is one more thing to deploy, scale, secure, and pay for, and it exists mostly to move small events from one place to a bunch of connected browsers.

If your data already lives in Postgres, you already have a message bus for that. Postgres ships with LISTEN and NOTIFY, a lightweight publish/subscribe system built into the database. Pair it with server-sent events from a serverless function and you can fan realtime updates out to every connected client without standing up any realtime infrastructure at all. In this post I build exactly that on a Neon Function, explain the one part that is subtle on serverless, and prove it works with two live subscribers. The repo is at the end.

TL;DR

Postgres LISTEN/NOTIFY is a built-in pub/sub. NOTIFY channel, 'payload' delivers to every connection that has run LISTEN channel.
A serverless function holds each browser's SSE connection open, and each of its isolates keeps one Postgres LISTEN connection. On a write, the app calls pg_notify, and every isolate pushes the event to its SSE clients.
The subtle part on serverless: the runtime runs several isolates, each with its own in-memory set of clients. LISTEN/NOTIFY is what fans an event across all of them; an in-process broadcast alone would only reach one isolate's clients.
One real gotcha: LISTEN needs a session, so it must use a direct (unpooled) connection, not the transaction pooler.
It is fan-out for small live events, not a durable queue. For guaranteed delivery or bidirectional low-latency you still want a real broker or websockets.

Prerequisites

A Neon project on the platform preview (Functions, us-east-2)
The Neon CLI (npm i -g neon, then neon login)
Familiarity with Postgres and with SSE / EventSource on the client

The two pieces

Postgres LISTEN/NOTIFY is a pub/sub channel inside the database. A connection subscribes with LISTEN counter_updates, and any connection (from anywhere) that runs NOTIFY counter_updates, '42' causes Postgres to deliver that payload to every subscriber. No extra service, no broker to run; it is a feature of the database you already have.

Server-sent events (SSE) are the other half. SSE is a long-lived HTTP response that streams text frames (lines such as data: 42, each event ended by a blank line) to the browser, which consumes them with the built-in EventSource API. It is one-directional (server to client), which is exactly the shape of most realtime UI: the server has news, the browser wants it. And because it is just an HTTP response, a serverless function can serve it.
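An SSE frame is plain text with a small set of fields. This minimal encoder covers the fields this post needs: optional id: and event: lines, one data: line per line of payload (the browser rejoins them with newlines), and the blank line that ends the event.

```typescript
// Encode one server-sent event. The trailing blank line is what dispatches it.
function sseFrame(data: string, opts: { event?: string; id?: string } = {}): string {
  let frame = '';
  if (opts.id !== undefined) frame += `id: ${opts.id}\n`;          // lets EventSource resume via Last-Event-ID
  if (opts.event !== undefined) frame += `event: ${opts.event}\n`; // named event instead of "message"
  // A payload containing newlines needs one data: line per line.
  for (const line of data.split(/\r\n|\r|\n/)) frame += `data: ${line}\n`;
  return frame + '\n';
}
```

On the client, `new EventSource('/events')` delivers each unnamed frame's payload as `event.data` to `onmessage`. Frames sent with an event: name go to `addEventListener` for that name instead.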

Put them together: the function streams SSE to browsers and relays anything it hears on a Postgres channel.

The part that is subtle on serverless

Here is the trap. A function under load does not run as one process; the runtime spins up several isolates in parallel. Each isolate has its own memory, so each keeps its own set of open SSE connections. If you only broadcast in-process, a client connected to isolate A never sees an event triggered through isolate B.

LISTEN/NOTIFY is what closes that gap. Every isolate opens its own LISTEN connection to Postgres. When any code anywhere calls NOTIFY, Postgres delivers it to all of those connections, so every isolate gets the event and pushes it to its own clients. Postgres is the shared fan-out point that the isolates do not otherwise have.

import { Client } from 'pg';

// One dedicated LISTEN connection per isolate. LISTEN needs a real session,
// so use the DIRECT (unpooled) URL, not the transaction pooler.
const listener = new Client({ connectionString: env.postgres.databaseUrlUnpooled });
await listener.connect();
await listener.query('LISTEN counter_updates');

// SSE connections held open by THIS isolate.
const clients = new Set<ReadableStreamDefaultController<Uint8Array>>();
const encoder = new TextEncoder();

listener.on('notification', (msg) => {
  const frame = encoder.encode(`data: ${msg.payload}\n\n`);
  for (const c of clients) c.enqueue(frame); // push to this isolate's browsers
});

The write path is a normal query plus a NOTIFY:

import { sql } from 'drizzle-orm';

// app (Hono), db (Drizzle), pool (pg, pooled URL), and the counters table
// are set up elsewhere in the project.
app.post('/increment', async (c) => {
  const [row] = await db
    .insert(counters)
    .values({ id: 1, value: 1 })
    .onConflictDoUpdate({ target: counters.id, set: { value: sql`${counters.value} + 1` } })
    .returning({ value: counters.value });
  // Fan the new value out to every isolate, and thus every browser.
  await pool.query('SELECT pg_notify($1, $2)', ['counter_updates', String(row.value)]);
  return c.json({ value: row.value });
});

And the SSE endpoint just registers the browser and streams:

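app.get('/events', async (c) => {
  // Remember this stream's controller so cancel() can unregister it.
  let ctrl: ReadableStreamDefaultController<Uint8Array> | undefined;
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      ctrl = controller;
      clients.add(controller);
      // send the current value immediately so a new tab is correct on load
      // (readCount is the app's own helper that selects the counter row)
      readCount().then((v) => controller.enqueue(new TextEncoder().encode(`data: ${v}\n\n`)));
    },
    cancel() {
      // the browser disconnected: stop pushing frames to this stream
      if (ctrl) clients.delete(ctrl);
    },
  });
  return new Response(stream, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' },
  });
});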

LISTEN holds a session-level subscription, which the transaction pooler (PgBouncer in transaction mode) does not support. Use the direct, unpooled connection string for the listener (Neon injects it as DATABASE_URL_UNPOOLED). Keep using the pooled URL for your normal queries. Getting this wrong is the usual reason "notifications never arrive."
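The isolate problem is easy to see in a simulation. This sketch replaces Postgres with a tiny in-memory bus (a hypothetical `FakeNotifyBus` standing in for LISTEN/NOTIFY) and represents each isolate's SSE connections as arrays, so it runs without a database. It models only the fan-out, not timing or dropped connections.

```typescript
type OnNotify = (payload: string) => void;

// Stand-in for Postgres: every LISTEN-ing connection receives every NOTIFY.
class FakeNotifyBus {
  private readonly listeners: OnNotify[] = [];
  listen(onNotify: OnNotify): void {
    this.listeners.push(onNotify);
  }
  notify(payload: string): void {
    for (const listener of this.listeners) listener(payload);
  }
}

// One isolate: its own memory, its own SSE clients, its own LISTEN connection.
class Isolate {
  private readonly clients: string[][] = []; // each array is one browser's received events

  constructor(bus: FakeNotifyBus) {
    bus.listen((payload) => this.broadcastLocal(payload));
  }

  connect(): string[] {
    const inbox: string[] = [];
    this.clients.push(inbox);
    return inbox;
  }

  // In-process broadcast: only reaches browsers connected to THIS isolate.
  broadcastLocal(payload: string): void {
    for (const inbox of this.clients) inbox.push(payload);
  }
}
```

If isolate A broadcasts "1" in-process and a write then sends "2" through the bus, the tab on isolate A sees both values and the tab on isolate B sees only "2". That missing "1" is the bug LISTEN/NOTIFY removes.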

Proving it works

I deployed the counter as a Neon Function and connected two independent SSE subscribers, then fired three increments. Every subscriber should see its starting value on connect and then each new value as it happens. Here is the actual run:

Both streams saw every value. Neither subscriber talked to the other, and there is no websocket server anywhere in this picture; the events traveled browser → function → Postgres NOTIFY → every function isolate → every browser.

WebSocket service vs LISTEN/NOTIFY + SSE

Aspect	Dedicated websocket service	LISTEN/NOTIFY + SSE on a function
Extra infrastructure	A service to run, scale, secure	None; uses Postgres + the function
Direction	Bidirectional	Server to client (SSE)
Fan-out bus	The service	Postgres NOTIFY
Delivery	Often buffered / retried	Best-effort; dropped if no listener
Best for	Chat, cursors, games, huge fan-out	Live counters, feeds, notifications, presence

Where this stops being enough

This pattern is a genuine "delete a service" win for a large class of realtime features, but be honest about its edges:

It is not a durable queue. NOTIFY is fire-and-forget. If nobody is listening at that instant, the message is gone. That is fine for a live UI that re-reads state on reconnect; it is not fine for guaranteed delivery or work queues.
Payloads are small. In the default configuration, a NOTIFY payload must be shorter than 8000 bytes. Send an id or a small value and let clients fetch details, rather than shipping large blobs through the channel.
SSE is one-way. For low-latency bidirectional traffic (multiplayer, live cursors, collaborative editing) a websocket is still the right tool.
At very high scale a dedicated broker earns its keep. This shines at the small-to-medium fan-out that most apps actually need, without the standing infrastructure.
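A cheap guard against the payload limit is to measure the payload before sending it. This sketch assumes a UTF-8 database, because the limit is counted in bytes rather than JavaScript string length. The `rowChanged()` helper is hypothetical and sends a pointer to the row rather than the row itself.

```typescript
// Postgres default: a NOTIFY payload must be shorter than 8000 bytes.
const NOTIFY_LIMIT_BYTES = 8000;

function checkNotifyPayload(payload: string): string {
  // UTF-8 byte length, not payload.length: "é" is 1 JS character but 2 bytes.
  const bytes = new TextEncoder().encode(payload).length;
  if (bytes >= NOTIFY_LIMIT_BYTES) {
    throw new Error(`NOTIFY payload is ${bytes} bytes; send an id and let clients fetch details`);
  }
  return payload;
}

// Hypothetical helper: announce which row changed, not what it contains.
function rowChanged(table: string, id: number): string {
  return checkNotifyPayload(JSON.stringify({ table, id }));
}
```

The returned string is what goes into the `pg_notify($1, $2)` call on the write path.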

The repo

The full counter, backend function plus a small web client, is here:

https://github.com/The-DevOps-Daily/neon-realtime-demo

Wrapping up

Realtime does not always mean a websocket service. For the common cases (a live number, a badge, a feed, an activity stream), Postgres LISTEN/NOTIFY is a pub/sub you already run, and SSE from a serverless function is enough to get those events to the browser. On Neon the function lives on the branch next to Postgres, so the listener connection is a local hop and the whole realtime path is one deploy, with no separate service to operate. Reach for a real broker or websockets when you need durability or two-way low latency; reach for this when you just want the UI to update and would rather not run another box to make it happen.

Preview Environments That Include the Backend, Not Just the Frontend

DevOps Daily — Wed, 15 Jul 2026 15:00:57 +0000

Open a pull request and your frontend host hands you a preview URL. Vercel, Netlify, Cloudflare Pages all do it: every PR gets its own isolated build you can click through before merging. It is one of the genuinely great DevOps conveniences of the last decade.

Then you look at what that preview talks to. The API and the database behind it are almost always a single shared staging environment. Every open PR hits the same backend, runs migrations against the same schema, and reads and writes the same rows. So the preview is only half a preview. The frontend is isolated; the thing it depends on is a free-for-all.

Neon changes what a "branch" contains. A branch is not just a copy of your schema, it is a copy-on-write copy of the data too, and with Neon Functions the compute deploys onto that branch as well. So a branch is the database, its data, and the backend, forked together, each with its own URL. That makes a real per-PR backend cheap enough to create and throw away on every pull request. In this post I show the workflow and prove the isolation with a live function, then sketch how to wire it into CI.

TL;DR

Frontend previews are isolated per PR. The backend they call usually is not, so previews share one staging database and its migrations and data.
A Neon branch copies the schema and the data (copy-on-write), and Neon Functions deploy onto the branch, so each branch is a full isolated backend with its own function URL.
I tested it: branched a live todos API, the branch came up with a copy of main's rows, a write to the branch left main untouched, and the branch had its own URL.
In CI this is: on PR open, create a branch and deploy the function; hand the frontend preview that branch's URL; on PR close, delete the branch and everything goes with it.
Because branches are copy-on-write and functions scale to zero, a preview backend costs almost nothing while it sits idle.

Prerequisites

A Neon project on the platform preview (Functions, us-east-2) with a deployed function
The Neon CLI (npm i -g neon, then neon login)
A CI system that can run CLI commands on pull-request events (the example uses GitHub Actions)

Why shared staging quietly hurts

A shared staging backend fails in ways that are easy to miss until they bite:

Migrations collide. Two PRs each add a column, or one renames a table the other still reads. Whoever runs their migration second gets a broken staging environment, and now both previews are wrong.
Data bleeds between PRs. One PR's test run creates records another PR's preview then displays. Bugs appear and vanish depending on who ran what, and nobody can reproduce them.
The preview is not like production. To avoid touching real data, staging often runs a thin set of seed fixtures, so the preview never sees the shape or volume of real data and "works in preview" does not mean "works in prod."
Resetting is scary. Because everyone shares it, nobody wants to be the one who wipes staging, so bad data accumulates for months.

None of this is a tooling failure on the frontend side. It is that the backend was never actually part of the preview.

What a Neon branch gives you

A Neon branch is a copy-on-write fork of the database at a point in time. It starts with the parent's schema and data instantly, without physically copying the bytes, and it diverges only as you write to it. Neon Functions extend that: when you deploy, the function is applied to a branch, and every branch gets its own function URL of the form https://<branch>-<function>.compute.<region>.aws.neon.tech.

Put those together and a branch is a self-contained backend: its own database, its own copy of the data, and its own API endpoint. Nothing it does touches the parent.

Proving the isolation

I have a small todos API (Hono + Drizzle on a Neon Function) already deployed on main, with a handful of rows. Here is the whole preview-backend lifecycle against it, with the real output.

That is the whole point in one sequence. The branch came up with its own function URL and a copy of main's four rows, a write landed only on the branch, main stayed at four, and deleting the branch cleaned up the database, the data, and the endpoint in one step. Every number there is from the real run.
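The same isolation check can be scripted against the todos API's GET /todos and POST /todos routes. This is a minimal sketch, not the script used for the run above: mainUrl and branchUrl are hypothetical placeholders for the two function URLs your deploys print, and fetch is injectable so the logic can be exercised without a network. The "started from a copy" check only holds for a freshly created branch whose parent has not changed since.

```typescript
// Hypothetical isolation check for the todos API (GET /todos, POST /todos).
type FetchLike = (url: string, init?: RequestInit) => Promise<Response>;

interface Todo {
  id: number;
  text: string;
  createdAt: string;
}

async function countTodos(baseUrl: string, fetchImpl: FetchLike): Promise<number> {
  const res = await fetchImpl(`${baseUrl}/todos`);
  if (!res.ok) throw new Error(`GET ${baseUrl}/todos failed: ${res.status}`);
  const rows = (await res.json()) as Todo[];
  return rows.length;
}

async function checkIsolation(mainUrl: string, branchUrl: string, fetchImpl: FetchLike = fetch) {
  const mainBefore = await countTodos(mainUrl, fetchImpl);
  const branchBefore = await countTodos(branchUrl, fetchImpl);

  // Write only to the branch.
  const res = await fetchImpl(`${branchUrl}/todos`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ text: 'branch-only write' }),
  });
  if (res.status !== 201) throw new Error(`POST failed: ${res.status}`);

  const mainAfter = await countTodos(mainUrl, fetchImpl);
  const branchAfter = await countTodos(branchUrl, fetchImpl);

  return {
    startedFromCopy: branchBefore === mainBefore, // fresh branch mirrors its parent
    branchGrew: branchAfter === branchBefore + 1, // the write landed on the branch
    mainUntouched: mainAfter === mainBefore, // and nowhere else
  };
}
```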

Wire it into CI

The manual commands map directly onto pull-request automation. On open or update, create a branch named after the PR and deploy the function; expose the branch's function URL to your frontend preview as its API base; on close, delete the branch.

# .github/workflows/preview-backend.yml
name: preview-backend
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]

jobs:
  preview:
    runs-on: ubuntu-latest
    env:
      NEON_API_KEY: ${{ secrets.NEON_API_KEY }}
      BRANCH: pr-${{ github.event.number }}-preview
    steps:
      - uses: actions/checkout@v4

      # Create-or-update the branch and (re)deploy the function to it.
      - if: github.event.action != 'closed'
        run: |
          npx neon branches create --name "$BRANCH" || echo "branch exists"
          npx neon deploy --branch "$BRANCH"
          # Expose the branch's function URL to the frontend preview, e.g. as
          # an env var on the Vercel/Netlify deploy for this PR.

      # Tear it all down when the PR closes.
      - if: github.event.action == 'closed'
        run: npx neon branches delete "$BRANCH"
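One way to fill in the "expose the branch's function URL" comment is to derive the URL from the workflow's branch naming convention. This is a sketch under an assumption: it uses the https://<branch>-<function>.compute.<region>.aws.neon.tech form described earlier, while the deployed URLs quoted elsewhere in this series carry an extra c-3 segment, so prefer the URL the deploy actually prints when you can capture it. The helper names are hypothetical.

```typescript
// Hypothetical helpers mirroring BRANCH: pr-<number>-preview from the workflow.
function previewBranchName(prNumber: number): string {
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`invalid PR number: ${prNumber}`);
  }
  return `pr-${prNumber}-preview`;
}

// Assumes the <branch>-<function>.compute.<region>.aws.neon.tech URL form.
function previewFunctionUrl(prNumber: number, functionName: string, region = 'us-east-2'): string {
  return `https://${previewBranchName(prNumber)}-${functionName}.compute.${region}.aws.neon.tech`;
}
```

The result would then be set as whatever API-base environment variable your frontend preview reads; the variable name is up to you.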

Now the frontend preview and the backend preview live and die together. Reviewers click a preview that is running that PR's real code against that PR's own database, seeded from a real copy of production data, and none of it can affect anyone else.

Shared staging vs a branch per PR

	Shared staging backend	Branch per PR
Isolation	One database for all PRs	Own database + data + URL per PR
Migrations	Collide across PRs	Run only against that branch
Data realism	Thin seed fixtures	Copy-on-write copy of real data
Teardown	Manual, scary, shared	Delete the branch, everything goes
Idle cost	An always-on staging box	Copy-on-write storage + scale-to-zero compute

Because a branch is copy-on-write, it does not duplicate your data on disk; it stores only what diverges. Combined with functions that scale to zero when idle, a preview backend for a PR that nobody is actively clicking costs close to nothing, which is what makes one-per-PR practical rather than a budget conversation.

The repo

The todos API used here (Hono + Drizzle on a Neon Function) is the same one from the first post in this series:

https://github.com/The-DevOps-Daily/neon-functions-demo

Wrapping up

Preview environments earned their reputation on the frontend, where every PR gets a clean, clickable, isolated build. The backend got left behind on shared staging, and that is where the confusing bugs and the migration standoffs come from. Because a Neon branch carries the schema, the data, and now the function together, you can give each pull request a real backend of its own and delete it on merge. The frontend preview finally talks to something as disposable and isolated as it is.

A Postgres-Backed MCP Server in ~20 Lines

DevOps Daily — Wed, 15 Jul 2026 14:46:58 +0000

The Model Context Protocol is how an AI agent gets tools. You stand up an MCP server, it advertises a set of tools with typed inputs, and the agent calls them. For a huge number of real MCP servers, those tools are thin wrappers around a database: search these records, create this row, update that field. The server is mostly a translator between JSON-RPC and SQL.

Which raises an obvious question. If an MCP server spends its life talking to Postgres, why does it so often run somewhere far away from Postgres? The usual setup is an MCP server on one host and the database on another, so every tool call pays a network round trip to reach the data it needs.

Neon Functions let you skip that. You deploy the MCP server as a function that lives on the same database branch it queries, in the same region, so the server-to-Postgres hop is a local one. In this post I build a Postgres-backed MCP server, deploy it onto a branch, connect a real MCP client, and show what the round trips actually look like. The whole thing is about twenty lines of interesting code, and the repo is at the end.

TL;DR

An MCP server that exposes database tools is mostly network plus queries. Running it next to the database removes a cross-region hop from every tool call.
Neon Functions deploy your MCP server onto a database branch, co-located with Postgres. The server-to-database query is a same-region hop of a millisecond or two, not a transatlantic one.
The core is small: define a Drizzle schema, register a tool whose handler runs a query, and expose the MCP server over the streamable HTTP transport at /mcp. That is the ~20 lines.
Any MCP client that speaks streamable HTTP connects to it: mcporter, the MCP SDK, or an agent like Claude or Cursor pointed at the URL.
Each branch gets its own function URL, so every preview or test branch can have its own isolated MCP endpoint over its own copy of the data.

Prerequisites

Node.js 20+ and the Neon CLI (npm i -g neon, then neon login)
A Neon account with the platform preview enabled (Functions, new us-east-2 projects)
Basic familiarity with Postgres and TypeScript
Optional: an MCP client to point at it, such as mcporter, Claude, or Cursor

What an MCP server actually is

Strip away the branding and an MCP server is a small RPC service. It speaks JSON-RPC over a transport, and it advertises a list of tools. Each tool has a name, a description, and an input schema. When the agent decides to call a tool, the server runs a handler and returns a result. That is the whole contract.

The transport here is streamable HTTP: the client POSTs JSON-RPC messages to a single endpoint (/mcp) and reads responses back, with server-sent events for anything streamed. It works over plain HTTPS, which is exactly what a serverless function serves, so an MCP server and a Neon Function are a natural fit.
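To make the transport concrete, here is a sketch of the JSON-RPC 2.0 messages a client POSTs to /mcp. The SDKs build these for you; the protocolVersion string and client name below are illustrative assumptions, and after initialize a real client also sends a notifications/initialized notification (which has no id), omitted here.

```typescript
// JSON-RPC 2.0 request shape used by MCP.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

function rpc(id: number, method: string, params?: Record<string, unknown>): JsonRpcRequest {
  return params === undefined ? { jsonrpc: '2.0', id, method } : { jsonrpc: '2.0', id, method, params };
}

// Streamable HTTP: every message is a POST to the single endpoint, and the
// client must accept either a plain JSON reply or an SSE stream. If the server
// assigned a session on initialize, later requests echo it back.
function postInit(message: JsonRpcRequest, sessionId?: string): RequestInit {
  const headers: Record<string, string> = {
    'content-type': 'application/json',
    accept: 'application/json, text/event-stream',
  };
  if (sessionId) headers['mcp-session-id'] = sessionId;
  return { method: 'POST', headers, body: JSON.stringify(message) };
}

const initialize = rpc(1, 'initialize', {
  protocolVersion: '2025-03-26', // assumption: use whatever your SDK negotiates
  capabilities: {},
  clientInfo: { name: 'sketch', version: '0.0.1' },
});
const listTools = rpc(2, 'tools/list');
const callSearch = rpc(3, 'tools/call', { name: 'search_contacts', arguments: { query: 'ada' } });
```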

The ~20 lines

Here is the core of a Postgres-backed MCP server. A schema, one tool whose handler runs a query, and the wiring to expose it over streamable HTTP. Everything else is more of the same.

import { Hono } from 'hono';
import { drizzle } from 'drizzle-orm/node-postgres';
import { Pool } from 'pg';
import { ilike } from 'drizzle-orm';
import { z } from 'zod';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPTransport } from '@hono/mcp';
import { contacts } from './db/schema';

// One pool per isolate, reused across requests.
const db = drizzle(new Pool({ connectionString: process.env.DATABASE_URL }));

const mcp = new McpServer({ name: 'contacts', version: '1.0.0' });

mcp.registerTool(
  'search_contacts',
  {
    description: 'Search contacts by name. Omit the query to list everyone.',
    inputSchema: { query: z.string().optional().describe('substring to match') },
  },
  async ({ query }) => {
    const rows = await db
      .select()
      .from(contacts)
      .where(query ? ilike(contacts.name, `%${query}%`) : undefined);
    return { content: [{ type: 'text', text: JSON.stringify(rows) }] };
  },
);

// Expose the server over streamable HTTP at /mcp.
const app = new Hono();
const transport = new StreamableHTTPTransport();
app.all('/mcp', async (c) => {
  if (!mcp.isConnected()) await mcp.connect(transport);
  return transport.handleRequest(c);
});

export default app;

The tool handler is the interesting part. It is just a query. registerTool gives the agent the name, the description, and a Zod input schema (the SDK turns that into the JSON schema the model sees), and your handler returns content. The companion repo fills this out to full CRUD (create_contact, update_contact, delete_contact, search_contacts) against a small contacts table, but every tool follows this same shape: describe it, run a query, return the rows.
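One detail in the handler above: Drizzle sends the pattern as a bound parameter, so there is no SQL injection, but a % or _ inside the agent's query still acts as a LIKE wildcard, which does not match the "substring to match" description. A minimal sketch of escaping it, relying on Postgres's default LIKE escape character (backslash); the helper names are hypothetical and not part of the companion repo.

```typescript
// Escape LIKE/ILIKE metacharacters so user input matches literally.
// Backslash is escaped first-class too, since it is the escape character.
function escapeLike(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Build a "contains" pattern for ilike().
function containsPattern(query: string): string {
  return `%${escapeLike(query)}%`;
}
```

In the tool handler the where clause would then read ilike(contacts.name, containsPattern(query)).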

The schema is ordinary Drizzle:

import { pgTable, serial, text, timestamp } from 'drizzle-orm/pg-core';

export const contacts = pgTable('contacts', {
  id: serial('id').primaryKey(),
  name: text('name').notNull(),
  email: text('email'),
  company: text('company'),
  notes: text('notes'),
  createdAt: timestamp('created_at').defaultNow().notNull(),
});

And the function declaration that tells Neon what to deploy:

// neon.ts
import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    functions: {
      contacts: { name: 'contacts mcp server', source: 'src/index.ts' },
    },
  },
});

Deploy it onto the branch

The Neon CLI scaffolds the template, links (or creates) a project, pushes the schema, and deploys the function. From an empty directory:

That last URL is the deployed MCP server. The function and the Postgres branch it queries are in the same region, us-east-2. The MCP endpoint is that URL plus /mcp. If you want to iterate before deploying, neon dev serves the same function locally at http://localhost:8787 with the MCP endpoint at /mcp.

A Neon Function has a public HTTPS URL, reachable by anyone who has it. This example runs open for the demo, which is not acceptable for anything real: these tools read and write your database. Gate the endpoint before you share the URL.

The gate is a few lines of Hono middleware in front of /mcp. The repo ships it env-gated: leave MCP_TOKEN unset and the demo stays open, set it and every request needs the bearer token.

app.use('/mcp', async (c, next) => {
  const token = process.env.MCP_TOKEN;
  if (token && c.req.header('authorization') !== `Bearer ${token}`) {
    return c.json({ error: 'unauthorized' }, 401);
  }
  await next();
});

Most MCP clients can send custom headers, so the agent side is one config line (Authorization: Bearer <token>). I verified the gate directly against the app: no header and a wrong token both get a 401, the right token passes through to the transport, and with MCP_TOKEN unset the endpoint behaves exactly as before.
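If you keep the gate, one optional hardening is to compare the token in constant time rather than with !==. This is a sketch, not what the repo ships: it uses Node's built-in crypto, and hashing both sides first gives timingSafeEqual the equal-length buffers it requires. The function name is hypothetical.

```typescript
import { createHash, timingSafeEqual } from 'node:crypto';

// Constant-time check of an Authorization header against the expected token.
function bearerMatches(header: string | undefined, token: string): boolean {
  if (header === undefined) return false;
  const expected = createHash('sha256').update(`Bearer ${token}`).digest();
  const actual = createHash('sha256').update(header).digest();
  return timingSafeEqual(expected, actual);
}
```

The middleware condition would become token && !bearerMatches(c.req.header('authorization'), token).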

Wire up a client and watch it work

Any MCP client that speaks streamable HTTP can connect to /mcp. Here are three ways: a CLI, the SDK, and adding it to an agent.

mcporter (CLI):

# List the tools the server advertises
mcporter list https://<branch>-contacts.compute.c-3.us-east-2.aws.neon.tech/mcp --schema

# Call a tool
mcporter call ".../mcp.create_contact" name="Ada Lovelace" company="Analytical Engines"
mcporter call ".../mcp.search_contacts" query="engine"

MCP SDK (Node):

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

const url = new URL('https://<branch>-contacts.compute.c-3.us-east-2.aws.neon.tech/mcp');
const client = new Client({ name: 'test', version: '1.0.0' });
await client.connect(new StreamableHTTPClientTransport(url));

console.log((await client.listTools()).tools.map((t) => t.name));
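const r = await client.callTool({ name: 'search_contacts', arguments: { query: 'ada' } });
// Tool results are a list of typed content items; print the text ones.
const items = r.content as Array<{ type: string; text?: string }>;
console.log(items.filter((item) => item.type === 'text').map((item) => item.text).join('\n'));

await client.close();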

Claude / Cursor:

# Point an MCP-aware agent at the URL as a streamable HTTP server.
# add-mcp writes the client config for you:
npx add-mcp https://<branch>-contacts.compute.c-3.us-east-2.aws.neon.tech/mcp -a claude

# Then in the agent: "search my contacts for anyone at the Navy"

I ran the SDK client against the deployed server from a machine in Europe. The handshake and the tool calls all worked on the first try:

connect (initialize + handshake): ~1.5 s   (cold start ~2 s the first time)
tools/list: create_contact, update_contact, delete_contact, search_contacts
create_contact: 196 ms  ->  { "created": { "id": 1, "name": "Ada Lovelace", ... } }
search_contacts "navy": 150 ms  ->  { "count": 1, "contacts": [ { "name": "Grace Hopper", ... } ] }

A direct SELECT count(*) against the branch afterwards showed the rows really landed in Postgres. Nothing is held in memory; the tools are just queries.

Why co-location is the point

Those tool-call numbers are around 150 to 200 milliseconds, but that is a measurement of my distance to the function, not the function's speed. I am in Europe and the function is in us-east-2, so each call is roughly one transatlantic round trip. An agent running near the region, or the model provider's own infrastructure calling the tool, sees a small fraction of that.

The number that does not move with the client's location is the hop from the function to Postgres, and that is the one co-location fixes. In the first post in this series I measured exactly that: a SELECT from inside the function against the co-located branch ran in about 1.2 ms, versus about 135 ms for the same query issued across the Atlantic.

A tool call that runs one or two queries inherits that difference on every invocation. Put the MCP server a region away from its database and each tool call carries an extra cross-region round trip on top of whatever the client already paid to reach the server. Put the server on the branch and that part is effectively free. For a server whose entire job is querying Postgres, that is the hop worth optimizing.

One endpoint per branch

There is a second thing you get for free here. Neon Functions are deployed per branch, and each branch has its own function URL. Because a branch is also a copy of your data, that means every branch can have its own MCP server over its own dataset.

Spin up a branch for a preview environment and it comes with an MCP endpoint backed by that branch's data. Give an agent a scratch branch to work against and it cannot touch production. Run your CI against a branch and the agent's tools operate on the ephemeral copy, then it all gets thrown away with the branch. You are not standing up and tearing down a separate MCP service per environment; the endpoint rides along with the branch you already have.

The repo

The full example, with all four CRUD tools, the schema, the deploy config, and client test scripts, is here:

https://github.com/The-DevOps-Daily/neon-mcp-demo

Wrapping up

An MCP server that fronts a database is mostly network and queries, and the network part is worth taking seriously because an agent may call these tools dozens of times in a single task. Neon Functions let you collapse the server-to-database distance to a same-region hop by deploying the MCP server onto the branch it queries, and the code to do it is small: a schema, a tool that runs a query, and the streamable HTTP transport. Point any MCP client at the URL and the agent has typed, database-backed tools running right next to the data. Give each branch its own endpoint and you get isolated, per-environment agent tooling without any extra services to run.

Streaming an AI Agent Without a Function Timeout

DevOps Daily — Wed, 15 Jul 2026 14:16:30 +0000

An AI agent and a serverless function want different things. The agent wants to think, call a tool, stream some tokens, call another tool, and keep the connection open the whole time, which can be tens of seconds or more. A lot of serverless tiers want the opposite: do your work quickly and return, because the invocation has an execution cap. Put them together and you get the failure everyone who has shipped an agent has seen at least once: the response is still streaming when the platform decides time is up and closes the socket.

This is the second post in our series on Neon Functions. The first was about where your compute runs relative to your data; this one is about how long it is allowed to keep talking. Neon Functions are built to hold long-lived streaming connections, so a slow agent or a long stream is a normal request, not a fight with a timeout. To show it rather than assert it, I deployed two endpoints and measured them.

(Companion repo, deploy it yourself: The-DevOps-Daily/neon-streaming-demo.)

Two endpoints, one config

The whole backend is a single Hono function with the AI Gateway switched on in neon.ts:

import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    aiGateway: true,
    functions: {
      stream: { name: 'streaming demo', source: 'src/index.ts' },
    },
  },
});

The streaming itself is ordinary Hono. The first endpoint holds a server-sent-events connection open and emits a tick every second, for as many seconds as you ask:

import { Hono } from 'hono';
import { streamSSE } from 'hono/streaming';

const app = new Hono();

app.get('/long-stream', (c) => {
  // Clamp to 1..600 whole seconds; fall back to 90 if the param is not a number.
  const requested = Number(c.req.query('seconds') ?? '90');
  const seconds = Number.isFinite(requested) ? Math.min(600, Math.max(1, Math.floor(requested))) : 90;
  const start = Date.now();
  return streamSSE(c, async (stream) => {
    for (let i = 1; i <= seconds; i++) {
      await stream.writeSSE({ event: 'tick', data: JSON.stringify({ tick: i, elapsed_ms: Date.now() - start }) });
      await stream.sleep(1000);
    }
    await stream.writeSSE({ event: 'done', data: JSON.stringify({ ticks: seconds }) });
  });
});

export default app;

It streamed for 90 seconds without being asked twice

I called /long-stream?seconds=90 and let it run. It ticked once a second, on the second, for a minute and a half, and closed cleanly on its own terms:

Ninety seconds is not a magic number; I picked it because it is comfortably past the execution cap a lot of serverless functions ship with by default, and the function did not care. No special mode, no config flag, no "streaming response" opt-in. The handler just held the connection.

To be precise about the comparison: this is about defaults and design, not "infinite versus finite." Traditional serverless functions cap a single invocation low by default (Vercel's Hobby tier at 10 seconds, Pro at 60), which is exactly where a slow agent gets cut off. Platforms do offer longer runs when you reach for them: Vercel's Fluid Compute extends to 300 to 1800 seconds, and AWS Lambda allows up to 15 minutes. The point is that long-lived streaming is the default behavior of a Neon Function, not a setting you discover after your agent times out in production.

Now stream an actual agent

A ticking clock proves the connection lasts. The real workload is a model streaming tokens. The second endpoint sends the prompt to the Neon AI Gateway with stream: true and relays each token to the caller as it arrives:

const upstream = await fetch(`${process.env.NEON_AI_GATEWAY_BASE_URL}/ai-gateway/mlflow/v1/chat/completions`, {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.NEON_AI_GATEWAY_TOKEN}`, 'content-type': 'application/json' },
  body: JSON.stringify({ model: 'gpt-5-nano', stream: true, messages }),
});
// ...parse the upstream SSE and re-emit each delta as it lands
await stream.writeSSE({ event: 'token', data: JSON.stringify({ delta }) });
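The elided "parse the upstream SSE" step could look roughly like this. It is a sketch that assumes the gateway streams OpenAI-style chat.completion.chunk events (data: lines carrying choices[0].delta.content, terminated by data: [DONE]) with LF line endings; the function name is hypothetical.

```typescript
// Shape of an OpenAI-style streaming chunk (assumed gateway format).
interface ChatChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Yield each content delta from an upstream SSE body as it arrives.
async function* contentDeltas(body: ReadableStream<Uint8Array>): AsyncGenerator<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break; // an unterminated final event is dropped, as SSE specifies
      buffer += decoder.decode(value, { stream: true });
      // Events end with a blank line; keep any partial event buffered.
      let boundary: number;
      while ((boundary = buffer.indexOf('\n\n')) !== -1) {
        const rawEvent = buffer.slice(0, boundary);
        buffer = buffer.slice(boundary + 2);
        for (const line of rawEvent.split('\n')) {
          if (!line.startsWith('data:')) continue;
          const data = line.slice(5).trim();
          if (data === '[DONE]') return;
          const delta = (JSON.parse(data) as ChatChunk).choices?.[0]?.delta?.content;
          if (delta) yield delta;
        }
      }
    }
  } finally {
    reader.releaseLock();
  }
}
```

Inside the streamSSE callback, the relay then becomes a for await (const delta of contentDeltas(upstream.body!)) loop around the writeSSE call shown above.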

When I called it with a small prompt, the first token came back at 466 ms and the full 62-token reply finished at about 2.0 seconds. The reader sees the answer forming almost immediately instead of waiting two seconds for a wall of text:

Two seconds is short because the model and the prompt are small. The reason this matters is that real agents are not short: they make several model calls, run tools between them, and a full run is routinely tens of seconds. On a platform that caps invocations at 10 or 60 seconds, that run is a gamble against the clock. On a function built to hold the stream, it is just a request that takes a while.
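Time to first token and total time are the two numbers worth tracking for any streamed response. A minimal sketch of measuring them over an async stream of deltas (such as the upstream parse sketched for the agent endpoint, or a client reading the relayed events); the clock is injectable so it can be tested, and the count is of deltas, which approximate but do not necessarily equal model tokens. Names are hypothetical.

```typescript
interface StreamTiming {
  firstTokenMs: number | null; // null if the stream produced nothing
  totalMs: number;
  tokens: number; // number of deltas received
}

async function timeStream(
  tokens: AsyncIterable<string>,
  now: () => number = () => performance.now(),
): Promise<StreamTiming> {
  const start = now();
  let firstTokenMs: number | null = null;
  let count = 0;
  for await (const _token of tokens) {
    if (firstTokenMs === null) firstTokenMs = now() - start; // first delta landed
    count++;
  }
  return { firstTokenMs, totalMs: now() - start, tokens: count };
}
```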

What it is, and what it is not

Private preview, one region, new projects only. Everything is in AWS us-east-2 and only works on projects created inside the preview. Plan accordingly before building on it.

Two more things worth knowing before you reach for this:

It is request/response, even when the response is long. These functions answer a caller and can keep streaming to it for a long time, including over WebSockets and SSE. They are not a background job runner. Work that should outlive the request (queues, retries, scheduled tasks) belongs to something like Inngest or QStash.
Idle functions can be evicted. A long active stream is fine; a function sitting idle may be scaled to zero and cold-start on the next call. That is the usual serverless tradeoff, not a streaming-specific one.

Who this is for

If you are shipping anything agentic (a chat assistant, a tool-using agent, a long generation, an MCP server holding a session), the timeout is the wall you hit first, and the usual workaround is to learn your platform's extended-duration mode and hope you configured it right. A function that holds the stream by default removes that whole category of "why did my response get cut off" debugging.

The full demo, both endpoints, is here. The streaming logic is about 80 lines:

https://github.com/The-DevOps-Daily/neon-streaming-demo

Next in the series: a Postgres-backed MCP server in about twenty lines, and preview environments that include the backend, not just the frontend. The strategy behind all of it is in Neon is becoming a backend platform, not just Postgres.

Compute That Lives on Your Database Branch

DevOps Daily — Wed, 15 Jul 2026 12:59:23 +0000

Ask where your backend code runs relative to your database and the answer is often "somewhere else." Your function is in one provider's us-east-1, your Postgres is in another region entirely, and every query crosses that gap. Most of the time you don't see it, because one query is fast enough to ignore. Then a request makes eight queries in sequence, each pays the round trip, and suddenly an endpoint that should take milliseconds takes most of a second.

Neon Functions, part of Neon's June 2026 platform preview, take a different position: run the compute in the same region as the database branch, on a URL scoped to that branch. This is the first in a series on what that buys you. It is also the simplest to demonstrate, because the benefit is something you can measure. I deployed a small REST API and timed a trivial query two ways. The numbers are at the bottom, and they are not close.

(Companion repo, deploy it yourself: The-DevOps-Daily/neon-functions-demo.)

The whole backend is one config file

Neon ships starter templates through its CLI. The REST API is one of them:

What gets deployed is declared in neon.ts. For this API it is three lines of intent: take src/index.ts and run it as a function called todos.

import { defineConfig } from '@neondatabase/config/v1';

export default defineConfig({
  preview: {
    functions: {
      todos: { name: 'todo api', source: 'src/index.ts' },
    },
  },
});

No connection string in there, no region to pick for the compute, no URL to reserve. The DATABASE_URL is injected at deploy time, and the function lands in the same region as the branch automatically.

The function is a normal web handler

There is nothing Neon-specific in the application code. It is a standard Hono app talking to Postgres through a connection pool, the same code you would write for any Node host:

import { Hono } from 'hono';
import { drizzle } from 'drizzle-orm/node-postgres';
import { Pool } from 'pg';
import { parseEnv } from '@neondatabase/env';
import config from '../neon';
import { todos } from './db/schema';

const env = parseEnv(config);
const pool = new Pool({ connectionString: env.postgres.databaseUrl, max: 5 });
const db = drizzle(pool);

const app = new Hono();
app.get('/todos', async (c) => c.json(await db.select().from(todos)));
app.post('/todos', async (c) => {
  const { text } = await c.req.json<{ text: string }>();
  const [row] = await db.insert(todos).values({ text }).returning();
  return c.json(row, 201);
});

export default app;

After neonctl deploy, that handler answers at a branch-scoped URL, and the create/read path works end to end:

curl -X POST "$URL/todos" -H 'content-type: application/json' -d '{"text":"ship it"}'
# {"id":1,"text":"ship it","createdAt":"2026-06-25T16:17:10.692Z"}  (201)
curl "$URL/todos"
# [{"id":1,"text":"ship it","createdAt":"2026-06-25T16:17:10.692Z"}]  (200)

The phrase "branch-scoped URL" is the part worth slowing down on. Open a branch off this one and it gets its own function at its own URL, running your latest code against that branch's data. The preview environment for a pull request stops being "the frontend plus a shared backend" and becomes a real, isolated copy. We will spend a whole post on that later; for now, the point is that the function and the branch are one unit.

Now measure the distance

Here is the part you can put a number on. The function exposes a /db-latency endpoint that times thirty SELECT 1 round trips from inside the handler and returns the median. Because the function runs in the same region as the branch, this is the local hop:

curl "$URL/db-latency"
# { "from": "neon function (us-east-2, co-located with Postgres)",
#   "runs": 30, "min_ms": 1.13, "median_ms": 1.19, "p95_ms": 1.62 }

Just over a millisecond. Then I ran the exact same SELECT 1, against the exact same database, from a machine in Europe (this site's build box, a Raspberry Pi a long way from us-east-2):

# same query, same database, from a machine on another continent
# { "from": "europe -> us-east-2", "runs": 30,
#   "min_ms": 130.46, "median_ms": 134.54, "p95_ms": 138 }

Same query, same database. The only thing that changed is where the caller sits.

That is about 113x (a 134.54 ms median against 1.19 ms), and that is for one round trip. A request that reads a session, loads a user, fetches their settings, and runs three more queries pays that distance once per query if it runs them in sequence. At 1.2 ms the six-query endpoint spends roughly 7 ms talking to the database; at 135 ms it spends most of a second, and no amount of application tuning fixes it, because the time is in the network. This is the tax co-located compute removes. It is also where a lot of "serverless Postgres is slow" folklore actually comes from: not the database, but a function in one region reconnecting to a database in another on every cold start.
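The arithmetic behind that paragraph, as a back-of-envelope model using the two measured medians: a request that issues its queries one after another pays one full round trip per query. The six-query count is the example from the text.

```typescript
// Sequential queries each pay one full round trip.
function sequentialDbTimeMs(queries: number, roundTripMs: number): number {
  return queries * roundTripMs;
}

const medianColocatedMs = 1.19; // /db-latency median inside us-east-2
const medianRemoteMs = 134.54; // same query from Europe

const colocatedTotal = sequentialDbTimeMs(6, medianColocatedMs);
const remoteTotal = sequentialDbTimeMs(6, medianRemoteMs);
const slowdown = medianRemoteMs / medianColocatedMs;

// prints "7.14 ms vs 807.24 ms (113x)"
console.log(`${colocatedTotal.toFixed(2)} ms vs ${remoteTotal.toFixed(2)} ms (${slowdown.toFixed(0)}x)`);
```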

To be fair about the comparison: a real deployment is rarely as far away as Europe-to-Virginia. If your Lambda and your database are both in us-east-1 the gap is smaller. But "both in the same region" is exactly the property Neon Functions give you by default instead of by careful configuration, and "smaller" is not "zero."
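For readers who want to reproduce the measurement, here is a sketch of what a /db-latency handler could compute. It is not the repo's code: runQuery stands in for something like () => pool.query('SELECT 1'), the round trips run strictly one at a time, and the nearest-rank p95 is an assumption about how the percentile was taken.

```typescript
interface LatencySummary {
  runs: number;
  min_ms: number;
  median_ms: number;
  p95_ms: number;
}

function summarize(samplesMs: number[]): LatencySummary {
  if (samplesMs.length === 0) throw new Error('no samples');
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
  // Nearest-rank: smallest sample with at least 95% of samples at or below it.
  const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];
  const round = (x: number) => Math.round(x * 100) / 100;
  return { runs: sorted.length, min_ms: round(sorted[0]), median_ms: round(median), p95_ms: round(p95) };
}

async function measureRoundTrips(runQuery: () => Promise<unknown>, runs = 30): Promise<LatencySummary> {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    await runQuery(); // sequential on purpose: one round trip at a time
    samples.push(performance.now() - t0);
  }
  return summarize(samples);
}

// Hypothetical wiring inside the Hono app shown earlier:
// app.get('/db-latency', async (c) =>
//   c.json({ from: 'neon function', ...(await measureRoundTrips(() => pool.query('SELECT 1'))) }));
```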

What it is, and what it is not

A few things are worth stating plainly before you build around this, because it is a private preview and it has clear edges.

Private preview, one region, new projects only. Everything is in AWS us-east-2 and only works on projects created inside the preview. You cannot turn this on for an existing production database today.

Beyond that:

These are request/response functions, not a job runner. They are built for APIs, agents, webhooks, and real-time connections (they support streaming and long-lived sockets, not just quick replies). Background work, queues, retries, and schedules are a different kind of compute; pair them with something like Inngest or QStash.
Function memory is fixed (2048 MiB at preview), so this is not yet a knob-for-everything compute platform.
It is a Neon-shaped commitment. One config file declaring your functions is convenient precisely because it is integrated. That is coupling, traded for the locality and the branching.

Who this is for

If your backend already lives in mature infrastructure-as-code with compute and database carefully placed in the same region, Neon Functions are not solving a problem you have. You already paid the cost to make the hop short.

The teams this helps are the ones who never got around to that: side projects and small teams whose compute and database drifted into different regions because nobody decided otherwise, and anyone who wants a pull request to spin up a genuinely isolated backend without wiring it by hand. For them, "the function runs next to the database, on this branch's data, at this URL" is a real reduction in both latency and moving parts, and it is the default rather than a configuration you have to get right.

We dig into the bigger picture in Neon is becoming a backend platform, not just Postgres, and the rest of this series walks through the other things a branch-scoped function unlocks: streaming agents, MCP servers, and preview environments that include the backend. The full demo, including the /db-latency endpoint, is here:

https://github.com/The-DevOps-Daily/neon-functions-demo