HelperX

Posted on Jun 28

Idempotency Keys for Social Automation: Never Double-Post on a Timeout

#backend #api #node #architecture

A scheduled post fires. The request to publish it goes out. The network hangs. After 30 seconds, our client times out. We retry. The tweet publishes. Then the original request completes too — and a second, identical tweet goes out.

That's a double-post. On a personal account it's embarrassing. On a client account at an agency it's a support ticket and a credibility hit. Either way, it's the single most visible failure mode in any posting system, and it's caused by one of the most common conditions in distributed systems: ambiguous outcomes under timeout.

At HelperX, we ship scheduled posts, replies, and DMs across hundreds of accounts. Every one of those actions can time out, retry, and double-execute. This article is about how we prevent that with idempotency keys — the same pattern payment systems use to prevent double-charges, applied to social actions.

The problem, precisely

A timeout is ambiguous. When a publish request times out, the action is in one of three states:

1. Never reached the server. The post didn't publish. Safe to retry.
2. Reached the server, failed there. The post didn't publish. Safe to retry.
3. Reached the server, succeeded, response lost. The post did publish. Retrying publishes it again.

The client cannot distinguish states 1 and 2 from state 3. They all look identical: "I sent a request and didn't get a response." Naive retry logic treats all three the same and retries — which is correct for 1 and 2 but catastrophic for 3.

This is the classic "exactly-once is impossible" problem. You can't guarantee an action executes exactly once over an unreliable network. But you can guarantee it takes effect exactly once, using idempotency.

The idempotency key idea

An idempotency key is a unique identifier the client generates before sending the request and includes with it. The server uses the key to recognize a retry and avoid re-executing:

First request with key K → server executes, stores "K succeeded, result was R."
Retry with same key K → server sees K already executed, returns the stored result R without re-executing.

The key transforms "ambiguous timeout" into "safe retry." If we time out, we retry with the same key. If the original succeeded, the retry returns the original result (no double-post). If the original didn't succeed, the retry executes once (correct). Either way, exactly one execution takes effect.

This is how Stripe, Square, and many other payment APIs prevent double-charges. The pattern transfers directly to social actions.
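
To make the two steps above concrete, here is a minimal sketch of what a server that supports idempotency keys might do internally. Everything in it is illustrative: IdempotencyStore, execute, and the in-memory Map are assumptions for the example, not any real API, and a real server would persist keys durably and expire them at some point.

```javascript
// Hypothetical in-memory dedup store, standing in for the server side.
class IdempotencyStore {
  constructor() {
    this.outcomes = new Map(); // key -> Promise of the action's result
  }

  // Runs `action` at most once per key while it is pending or has succeeded.
  execute(key, action) {
    if (!this.outcomes.has(key)) {
      // Store the promise itself, so a retry that arrives while the first
      // request is still running waits for it instead of executing again.
      const pending = Promise.resolve().then(action);
      this.outcomes.set(key, pending);
      // A failed attempt is forgotten, so a later retry may execute.
      pending.catch(() => this.outcomes.delete(key));
    }
    return this.outcomes.get(key);
  }
}

// Usage: the retry reuses key K and gets the original result back.
async function demo() {
  const store = new IdempotencyStore();
  let published = 0;
  const publish = async () => ({ tweetId: `tweet-${++published}` });

  const first = await store.execute('K', publish);
  const retry = await store.execute('K', publish);
  return { first, retry, published };
}
```

Storing the promise rather than the finished result is a deliberate choice in this sketch: it covers the case where the retry arrives before the original request has completed, which is exactly what happens after a client-side timeout.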

The catch: the server has to support it

The idempotency-key pattern requires the server to deduplicate. If you're calling an API that doesn't accept idempotency keys (X's posting endpoints, as of this writing, don't), you can't rely on the server. You have to implement dedup yourself, on your side, before the request even goes out.

That's our situation. So we do client-side idempotency: we guarantee, through our own state, that we never send a duplicate request — even under timeout and retry.

Client-side idempotency for posting

The core idea: decide, durably, that you're going to post before you send the request. Then a retry never re-decides; it just continues a decision already made.

Our scheduled-post flow, simplified:

async function publishScheduledPost(slotId, postId, content) {
  // Step 1: claim the post atomically. This is the idempotency boundary.
  const claim = await db.claimPost(slotId, postId);
  if (!claim.acquired) {
    // Another run already claimed it. Whether that run succeeded or is
    // still in flight, we must NOT publish again from here.
    return { status: 'already_in_progress', priorResult: claim.priorResult };
  }

  // Step 2: we own this post. Attempt the publish.
  let result;
  try {
    result = await xClient.createTweet(content);
    await db.recordPostSuccess(slotId, postId, result.tweetId);
  } catch (e) {
    if (isAmbiguousError(e)) {
      // Timeout or network error — the post MIGHT have published.
      // Mark ambiguous, let reconciliation figure it out. DO NOT retry here.
      await db.recordPostAmbiguous(slotId, postId);
      return { status: 'ambiguous', message: 'will reconcile' };
    }
    // Definitive failure (e.g., 403, content policy) — safe to mark failed
    await db.recordPostFailure(slotId, postId, e);
    return { status: 'failed', error: e };
  }

  return { status: 'sent', tweetId: result.tweetId };
}

Two things make this idempotent.

The claim (Step 1) is the idempotency key. db.claimPost atomically marks the post as "being processed" in a transaction. If two runs race — say, the scheduler fired twice, or a retry overlapped a manual trigger — only one acquires the claim. The other sees the post is already claimed and bows out, regardless of whether the first run has finished.

Ambiguous errors are NOT retried inline (Step 2). This is the subtle, critical part. When a publish times out, the temptation is to retry immediately. We don't. We record the ambiguity and let a separate reconciliation process resolve it. Retrying inline would risk the exact double-post we're trying to prevent.

Reconciliation: resolving the ambiguous state

An "ambiguous" post is one where we don't know if it published. The reconciliation job, running every few minutes, picks these up and determines the truth:

async function reconcileAmbiguousPosts() {
  const ambiguous = await db.getPostsWithStatus('ambiguous');

  for (const post of ambiguous) {
    // Did the post actually go out? Search the account's recent tweets
    // for one matching our content (by text + scheduled time window).
    const liveTweet = await findLiveTweet(post.slotId, post.content);

    if (liveTweet) {
      // It published! Record the tweet ID and move on. No retry needed.
      await db.recordPostSuccess(post.slotId, post.id, liveTweet.id);
    } else if (await hasTimeToRetry(post)) {
      // No matching tweet found, and we're still inside the grace window
      // around the scheduled time (see hasTimeToRetry).
      // Release the claim so a fresh publish attempt can occur.
      await db.releaseClaim(post.slotId, post.id);
    } else {
      // Didn't publish, and the scheduled window has passed. Mark failed.
      await db.recordPostFailure(post.slotId, post.id, 'window_elapsed');
    }
  }
}

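Reconciliation is what makes the whole thing safe. By checking reality (did the tweet actually appear on the account?) rather than guessing, we avoid re-publishing a post that already went out. The worst case is a delayed single publish while we wait for reconciliation, not a double publish. The check is only as good as its timing: it has to run after the original request can no longer land, so it matters that the job runs minutes after the timeout rather than immediately.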
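
The matching step inside findLiveTweet isn't shown above. A rough sketch of the "text + scheduled time window" comparison could look like the following. The function name matchLiveTweet, the tweet shape ({ id, text, createdAtMs }), and the whitespace normalization are all assumptions for illustration; fetching the account's recent tweets is left out entirely.

```javascript
// Hypothetical matcher. Tweets are assumed to be already fetched and mapped
// to { id, text, createdAtMs }; a real API response would need mapping first.
function normalizeText(text) {
  // Collapse whitespace so trivial formatting differences don't block a match.
  return text.trim().replace(/\s+/g, ' ');
}

function matchLiveTweet(recentTweets, content, windowStartMs, windowEndMs) {
  const wanted = normalizeText(content);
  const match = recentTweets.find(
    (t) =>
      normalizeText(t.text) === wanted &&
      t.createdAtMs >= windowStartMs &&
      t.createdAtMs <= windowEndMs
  );
  return match ?? null; // null means "no evidence that it published"
}

// Usage: look for the post in a window around its scheduled time.
const recent = [
  { id: '101', text: 'Older post', createdAtMs: 1000 },
  { id: '102', text: 'Launch day  is here', createdAtMs: 5000 },
];
const found = matchLiveTweet(recent, 'Launch day is here', 4000, 6000);
```

If the platform rewrites the text on publish (link shortening, for example), exact matching may miss a live tweet, so a real matcher would likely need to be looser than this one.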

Why we don't retry inline

This deserves emphasis because it's counterintuitive. Most retry logic retries immediately on timeout. For idempotency, that's wrong.

Consider the failure: a publish request times out after 30 seconds. In reality (state 3 above), the tweet published at second 28, but the response was slow. If we retry at second 31, we send a second publish request — and there's nothing on X's side to stop it, because the second request looks brand-new. Double-post.

The only safe behavior under ambiguous timeout is: stop, record the ambiguity, and let a process that can observe reality decide. The reconciliation job observes reality (is the tweet there?) and acts accordingly. It's slower than inline retry, but it's correct, and correctness is the whole point.

The rule: retry only when the outcome is unambiguous. A definitive 403 is unambiguous (failed, safe to handle). A timeout is ambiguous (don't retry; reconcile).
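
The publish flow earlier calls isAmbiguousError without showing it. One conservative way to write it is sketched below, assuming the HTTP client attaches a numeric status to the error whenever a response was actually received. That error shape is an assumption about the client library, not something X's API or any particular SDK guarantees.

```javascript
// Hypothetical classifier. Assumes `err.status` holds the HTTP status code
// when a response arrived, and is absent when no response was received.
function isAmbiguousError(err) {
  // No status: timeout, connection reset, or anything else where we never
  // saw a response. The request may or may not have been processed.
  if (err.status === undefined || err.status === null) return true;

  // 5xx: the server answered, but it may have failed after doing the write.
  if (err.status >= 500) return true;

  // Everything else (4xx such as 403) is treated as a definitive rejection.
  return false;
}

// Usage with three representative errors.
const timeoutError = Object.assign(new Error('socket timeout'), { code: 'ETIMEDOUT' });
const serverError = Object.assign(new Error('bad gateway'), { status: 502 });
const forbidden = Object.assign(new Error('forbidden'), { status: 403 });
const verdicts = [timeoutError, serverError, forbidden].map(isAmbiguousError);
```

The sketch defaults to "ambiguous" whenever it is unsure. Misclassifying a definitive failure as ambiguous only costs a reconciliation pass; misclassifying an ambiguous one as definitive is how a retry turns into a double-post.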

The claim in detail

The claim mechanism is a database row with a status, updated atomically:

CREATE TABLE post_jobs (
  slot_id    TEXT NOT NULL,
  post_id    TEXT NOT NULL,
  status     TEXT NOT NULL,   -- 'pending' | 'claimed' | 'sent' | 'ambiguous' | 'failed'
  claimed_at INTEGER,
  tweet_id   TEXT,            -- set on success
  PRIMARY KEY (slot_id, post_id)
);

The claim is a conditional update:

UPDATE post_jobs
SET status = 'claimed', claimed_at = ?
WHERE slot_id = ? AND post_id = ? AND status = 'pending';

If this affects 0 rows, someone else already claimed it (or it's already done). The status = 'pending' condition in the WHERE clause is the atomic guard that makes the claim race-proof. SQLite and Postgres each execute this as a single atomic statement, so two concurrent claims can't both succeed.
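
Wrapped in code, the claim might look roughly like this. The execute(sql, params) helper is hypothetical: it stands for whatever your database driver offers to run one statement and report the number of affected rows. The sketch also assumes claimed_at is stored as epoch milliseconds and leaves out the priorResult lookup used in the publish flow.

```javascript
// Sketch of a claim function built on the conditional UPDATE.
// `execute(sql, params)` is a hypothetical helper that runs one statement
// and resolves to the number of rows it changed.
async function claimPost(execute, slotId, postId, nowMs = Date.now()) {
  const affected = await execute(
    `UPDATE post_jobs
     SET status = 'claimed', claimed_at = ?
     WHERE slot_id = ? AND post_id = ? AND status = 'pending'`,
    [nowMs, slotId, postId]
  );
  // Exactly one row changed: we own the post. Zero rows: someone else does,
  // or the job is already past 'pending'.
  return { acquired: affected === 1 };
}
```

The caller never reads the status first and then writes it. The check and the write happen in the same statement, which is what closes the race between two runs.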

Edge cases worth handling

The claim that never releases. A run claims a post, then crashes before recording success or failure. The post is stuck in claimed forever. We handle this with a claim TTL — a claimed post older than N minutes is treated as abandoned and reset to pending (eligible for a fresh claim). This trades a small risk of late double-publish (if the original run is actually still in flight) for avoiding permanent stalls. The TTL is set generously (several minutes) so the risk is negligible.

Reconciliation finds a tweet we didn't record. Sometimes a post publishes but our success-recording fails. The reconciliation job's findLiveTweet will discover it and record the tweet ID retroactively. This is the system self-healing — exactly the property we want.

The same content scheduled twice by the operator. Two distinct scheduled posts with identical text are different jobs (different post_id) and will both publish. That's correct behavior — the operator scheduled two posts. Idempotency prevents accidental duplicates (timeouts, retries), not intentional ones. We don't second-guess the operator.

Scheduled time has passed by the time we publish. A post scheduled for 9:00 AM that gets an ambiguous timeout at 9:01 might reconcile at 9:05. Publishing at 9:05 instead of 9:00 is a minor miss, but it is not a double-post. Reconciliation respects a grace window: if we're still within a few minutes of the scheduled time, we publish; if we're well past it, we fail the job rather than post stale content.
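
Two of the edge cases above, the claim TTL and the grace window, come down to simple time comparisons. A sketch under stated assumptions: timestamps are epoch milliseconds, the field names claimedAtMs and scheduledAtMs are made up for the example, and the 10-minute and 5-minute values are placeholders rather than the values we run in production.

```javascript
// Placeholder durations; tune these for your own system.
const CLAIM_TTL_MS = 10 * 60 * 1000;
const GRACE_WINDOW_MS = 5 * 60 * 1000;

// A claim is treated as abandoned once it has been held longer than the TTL.
function isClaimAbandoned(job, nowMs, ttlMs = CLAIM_TTL_MS) {
  return job.status === 'claimed' && nowMs - job.claimedAtMs > ttlMs;
}

// A post may still be published while we are within the grace window
// after its scheduled time.
function withinGraceWindow(post, nowMs, graceMs = GRACE_WINDOW_MS) {
  return nowMs <= post.scheduledAtMs + graceMs;
}

// Usage: a job claimed 12 minutes ago, and a post scheduled 3 minutes ago.
const now = Date.UTC(2024, 0, 1, 9, 12, 0);
const staleJob = { status: 'claimed', claimedAtMs: now - 12 * 60 * 1000 };
const latePost = { scheduledAtMs: now - 3 * 60 * 1000 };
const abandoned = isClaimAbandoned(staleJob, now);
const canStillPublish = withinGraceWindow(latePost, now);
```

Passing nowMs in explicitly, instead of calling Date.now() inside each function, keeps both checks easy to test against fixed timestamps.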

What we learned

1. The idempotency boundary is the claim, not the request. Decide durably that you're going to act before the network call. The network is unreliable; your database isn't.

2. Never retry ambiguous failures inline. Timeouts are ambiguous. Inline retry on timeout is how double-posts happen. Reconcile instead.

3. Reconcile against observed reality, not assumed state. "Did the tweet actually appear?" is a question you can answer by looking. "Did the request succeed?" is a question you sometimes can't. Ask the answerable one.

4. The claim needs a TTL. Processes crash. A claim that never releases stalls the job forever. A generous TTL turns permanent stalls into eventual recovery.

5. Idempotency is a system property, not a function. It's not enough to add an idempotency key to one call. The claim, the non-retry-on-ambiguous, the reconciliation, and the TTL together form the guarantee. Remove any piece and double-posts return.

This pattern — durable claim before action, no inline retry on ambiguous failure, reconciliation against reality — is how you get effectively-once execution over an unreliable network without server-side idempotency support. It's more work than retry(3), but retry(3) is exactly the code that produces double-posts. The work is the point.

HelperX ships scheduled posts, replies, and DMs with client-side idempotency that never double-executes under timeout — claim before action, reconcile against reality. Free 30-day trial.
