After adding a Bluesky summary image pipeline to my article publish workflow, every Bluesky post came out as plain text with no thumbnail. Dev.to and Hashnode looked fine. The problem turned out to be a timing gap between when GitHub Actions commits an image to the repo and when Cloudflare Pages actually serves it — a window that can be 2–4 minutes wide. This post is the exact fix and why it works.
The pipeline that created the problem
My publish-articles.yml workflow runs in order:
1. python scripts/generate-summary.py — renders a 1080×1350 PNG via Playwright Chromium and writes the output path into each article's summary_image frontmatter
2. python scripts/generate-og.py — same pattern, 1200×630 for Dev.to/Hashnode cover images
3. git add + git commit + git push — the generated PNGs go into apps/ai-tools/public/og/summary/
4. pnpm post:article — publishes pending articles to Dev.to, Hashnode, and Bluesky
Step 3's push triggers a Cloudflare Pages deploy. Step 4 runs immediately after the push returns. There's no wait. When Bluesky needs the image bytes, it fetches from https://aiappdex.com/og/summary/<slug>.png — a URL that returns 404 for the next few minutes while Cloudflare Pages is still deploying.
The Bluesky AT Protocol (com.atproto.repo.uploadBlob) requires raw image bytes — you cannot pass it a URL. So the publish code was fetching the PNG from the public URL and uploading those bytes. The fetch was failing silently because I had wrapped it in a try/catch that logged a warning and continued. The post went out; the thumbnail did not.
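The failing version was, in shape, something like this — the helper name is illustrative, not the actual code:

try {
  // This URL 404s for a few minutes after the push, while Cloudflare Pages is still deploying
  const res = await fetch(article.frontmatter.summary_image);
  if (!res.ok) throw new Error(`summary image fetch failed: ${res.status}`);
  const bytes = await res.arrayBuffer();
  await uploadThumbToBluesky(bytes); // hypothetical stand-in for the blob upload
} catch (err) {
  // The bug: log a warning and keep going, so the post ships without a thumbnail
  console.warn(`  bluesky thumb skipped: ${err}`);
}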
I noticed the issue the same day I shipped the summary pipeline. Checking Bluesky an hour after publish, the posts had no image. This is a different failure mode from the Cloudflare Pages 500 bug I hit last week — that was a build error, this was a deployment timing gap.
Why fetching from the public URL doesn't work here
The fix seems obvious in hindsight: the PNG already exists on the filesystem when step 4 runs. generate-summary.py ran in step 1 and wrote the PNG to apps/ai-tools/public/og/summary/<slug>.png inside the CI runner's working directory. That file is right there. I was fetching it from a URL that hadn't propagated yet when I could have just read it from disk.
The reason I fetched from the URL in the first place: the publish package (packages/publish/) is designed to work standalone — you should be able to run pnpm post:article content/articles/some-file.md from any checkout, including ones where the PNG doesn't exist locally. The summary_image frontmatter field provides the URL for that case. The local-disk path was the piece I hadn't thought through.
The fix: read from disk first
import { readFile } from "node:fs/promises";
import { basename, dirname, join } from "node:path";

async function readLocalSummaryImage(
  articlePath: string,
): Promise<{ data: ArrayBuffer; mime: string } | null> {
  try {
    // content/articles/<slug>.md → <slug>
    const slug = basename(articlePath, ".md");
    // Walk up three levels: articles → content → repo root
    const repoRoot = dirname(dirname(dirname(articlePath)));
    const local = join(
      repoRoot,
      "apps/ai-tools/public/og/summary",
      `${slug}.png`,
    );
    const buf = await readFile(local);
    return {
      // Slice out exactly this file's bytes; buf.buffer may be a larger shared pool
      data: buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength),
      mime: "image/png",
    };
  } catch {
    // File not on disk (e.g. publishing from a fresh checkout) — caller falls back to the URL
    return null;
  }
}
articlePath is something like /home/runner/work/seo-farm/seo-farm/content/articles/15-2026-05-10-slug.md. Three dirname calls walk up to the repo root:
- dirname once: content/articles
- dirname twice: content
- dirname three times: repo root
Then join(repoRoot, "apps/ai-tools/public/og/summary", `${slug}.png`) resolves to the correct path. This assumes articles always live exactly two directory levels deep from the root — which is enforced by the naming convention.
One detail worth calling out: buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength). Node.js readFile returns a Buffer, which is a Uint8Array subclass. The .buffer property is the underlying ArrayBuffer, but Node.js pools small buffer allocations, so byteOffset may not be 0. If you pass the raw buf.buffer to uploadBlob, you might upload the surrounding pool memory, not just your file. The .slice(byteOffset, byteOffset + byteLength) extracts exactly the file's bytes.
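A standalone illustration of the pitfall, using subarray to simulate a view into a larger pool:

const pool = Buffer.alloc(16);       // stand-in for Node's internal allocation pool
const view = pool.subarray(4, 12);   // a Buffer viewing bytes 4–11 of the pool
console.log(view.byteLength);        // 8  — the bytes we actually want
console.log(view.buffer.byteLength); // 16 — the whole underlying ArrayBuffer
const exact = view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
console.log(exact.byteLength);       // 8  — only the wanted bytes, copied out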
The complete fallback chain
async function uploadBlob(
  session: BlueskySession,
  article: Article,
): Promise<BlobRef | null> {
  // 1. Prefer the PNG already on disk in the CI checkout
  const local = await readLocalSummaryImage(article.filePath);
  // 2. Fall back to the public URL from frontmatter (fresh checkout, no local PNG)
  const image =
    local ??
    (article.frontmatter.summary_image
      ? await fetchRemoteImage(article.frontmatter.summary_image)
      : null);
  // 3. No image at all — caller publishes a text-only post
  if (!image) return null;
  if (image.data.byteLength > 950_000) {
    console.warn(` bluesky thumb too large (${image.data.byteLength} B), skipping`);
    return null;
  }
  // ... upload to com.atproto.repo.uploadBlob
}
Three levels:
- Local disk — fast, no network, works in CI right after the PNG is generated
- Remote URL — used when running publish outside of the CI context, e.g. re-publishing an old article from a fresh checkout where the local PNG doesn't exist (the fetchRemoteImage helper is sketched just below)
- No image — if both fail, uploadBlob returns null. The caller composes and publishes a text-only post rather than blocking
The third outcome is intentional. Dev.to and Hashnode are the canonical publication targets for this project, as I laid out in the original architecture post. Bluesky is an additive distribution channel. A failed thumbnail should not block the article from going live.
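fetchRemoteImage isn't shown in this post; in shape it's just a fetch that hands back the bytes and content type, or null on any failure, so it slots into the same fallback chain:

async function fetchRemoteImage(
  url: string,
): Promise<{ data: ArrayBuffer; mime: string } | null> {
  try {
    const res = await fetch(url);
    if (!res.ok) return null; // e.g. 404 while Cloudflare Pages is still deploying
    return {
      data: await res.arrayBuffer(),
      mime: res.headers.get("content-type") ?? "image/png",
    };
  } catch {
    return null; // network error — caller falls through to a text-only post
  }
}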
The 950 KB size cap
Bluesky's uploadBlob endpoint rejects payloads that are too large — it returns an error body that my earlier code wasn't surfacing, so the failure looked silent. The actual limit is around 1 MB. My generate-summary.py produces PNGs in the 500–600 KB range for the 1080×1350 format, but Playwright's PNG output varies based on font rendering and image complexity. I set the guard at 950,000 bytes to leave headroom.
If a PNG exceeds that cap — which hasn't happened in practice — the function returns null and the post goes out without a thumbnail. A warning is logged so it shows up in the CI run output.
Why Playwright for image generation
The ffmpeg CI pipeline post covers the OG image generation context. The short version: Playwright Chromium was already installed in the workflow for OG images, and the summary image template needed CSS Grid, custom fonts via Google Fonts CDN, and brand colors. Canvas-based approaches and simpler HTML-to-image libraries don't handle that reliably. Playwright screenshots match what a browser renders.
The cost is Chromium installation (~150 MB downloaded on each CI run). That's baked into the workflow budget already. The full publish workflow — including both Python image generation scripts, git operations, and article publish — stays under 10 minutes.
If I were starting from scratch and didn't need CSS Grid or external fonts, I'd look at Satori (JSX to SVG, no Chromium required). But migrating now would mean porting two HTML templates to JSX and adding a React dependency. Not worth it for the current scale.
What the summary_image frontmatter field still does
After the fix, summary_image in the article frontmatter still serves two purposes:
- The URL that fetchRemoteImage uses when running publish from a fresh checkout (local PNG not present)
- The value passed to Hashnode's coverImage field for that platform's API
The generate-summary.py script writes the field into the frontmatter after generating the PNG: summary_image: https://aiappdex.com/og/summary/<slug>.png. That URL becomes valid once Cloudflare Pages finishes deploying, so Hashnode's own image hosting picks it up correctly. Bluesky is the only platform where timing matters because it's the only one that needs the bytes uploaded in the same CI run.
What I'd do differently
Explicit deploy wait is the cleanest long-term solution. Cloudflare Pages exposes a deployment status API. Adding a polling loop (30-second intervals, 5-minute timeout) before the Bluesky publish step would let me fetch from the public URL and remove the local path dependency entirely. I skipped it because the disk approach was faster to implement and required no additional API tokens. If the Cloudflare API key is already present in the workflow for other reasons, the polling loop is probably worth adding.
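In sketch form — the endpoint path and the latest_stage.status field are the Cloudflare Pages deployments API as I understand it, so verify against the current docs before wiring this in:

async function waitForPagesDeploy(
  accountId: string,
  project: string,
  apiToken: string,
  { intervalMs = 30_000, timeoutMs = 300_000 } = {},
): Promise<boolean> {
  const url = `https://api.cloudflare.com/client/v4/accounts/${accountId}/pages/projects/${project}/deployments`;
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await fetch(url, { headers: { Authorization: `Bearer ${apiToken}` } });
    const body = (await res.json()) as any;
    const latest = body?.result?.[0]; // most recent deployment first
    if (latest?.latest_stage?.status === "success") return true;
    await new Promise((r) => setTimeout(r, intervalMs)); // 30-second polling interval
  }
  return false; // timed out after 5 minutes — caller falls back to a text-only post
}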
Pre-upload images as a separate step would decouple image availability from article timing. Upload all PNGs to Bluesky's blob store in one step, collect the CIDs, then reference them when composing posts. The tradeoff: an extra round-trip per image before articles are ready, and you'd need to store the CID somewhere accessible to the publish step. For my publish frequency, the added complexity isn't worth it.
Absolute path from an environment variable would be more robust than three dirname calls. Something like REPO_ROOT=${GITHUB_WORKSPACE} passed to the publish command, then join(process.env.REPO_ROOT, "apps/ai-tools/public/og/summary", `${slug}.png`). The current dirname chain works as long as articles stay at exactly the same depth — a constraint that's currently enforced but could silently break if someone moves the articles directory.
I'll likely switch to the REPO_ROOT approach before adding a second Astro app, since the path to the summary image directory is app-specific and the current code hardcodes apps/ai-tools.
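In sketch form, with REPO_ROOT read from the environment and the dirname chain kept as a fallback (REPO_ROOT is my own variable name; GITHUB_WORKSPACE is what GitHub Actions sets for the checkout):

import { basename, dirname, join } from "node:path";

// In the workflow step: REPO_ROOT=${GITHUB_WORKSPACE} pnpm post:article ...
function resolveSummaryImagePath(articlePath: string): string {
  const slug = basename(articlePath, ".md");
  const repoRoot =
    process.env.REPO_ROOT ?? dirname(dirname(dirname(articlePath)));
  return join(repoRoot, "apps/ai-tools/public/og/summary", `${slug}.png`);
}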
FAQ
Why does Bluesky require uploading raw bytes instead of accepting a URL?
The AT Protocol's app.bsky.embed.external type has a thumb field for link card thumbnails, but thumb must be a blob reference (a CID of content uploaded to the network). There's no provision for referencing an external URL directly. This is by design — the protocol aims to be host-agnostic, so content is stored in the network rather than linked from third-party servers.
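Concretely, with the official @atproto/api client (this post doesn't show the publish package's actual Bluesky wiring, so names, paths, and credentials here are illustrative):

import { readFile } from "node:fs/promises";
import { AtpAgent } from "@atproto/api";

const agent = new AtpAgent({ service: "https://bsky.social" });
await agent.login({ identifier: "example.bsky.social", password: "app-password" });

// Upload the raw bytes first; the response carries the blob ref the embed needs
const bytes = await readFile("apps/ai-tools/public/og/summary/example-slug.png");
const { data } = await agent.uploadBlob(bytes, { encoding: "image/png" });

// The link card references the uploaded blob — there's no field for an external image URL
await agent.post({
  text: "New article: example title",
  embed: {
    $type: "app.bsky.embed.external",
    external: {
      uri: "https://aiappdex.com/articles/example-slug",
      title: "Example article title",
      description: "One-line summary",
      thumb: data.blob,
    },
  },
});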
Does the same race affect Dev.to and Hashnode cover images?
Dev.to and Hashnode both accept cover_image as a URL in their APIs. The OG image URL could theoretically get a 404 on the first request if Cloudflare Pages hasn't deployed. In practice this hasn't been a problem: both platforms retry image fetches asynchronously after article creation, so a 404 on initial publish is usually recovered. Bluesky uploads the bytes synchronously at post time — there's no retry.
How does the publish script know which articles are unpublished?
The packages/publish/ code reads published_urls from each article's frontmatter. If published_urls.bluesky is missing, the article gets published to Bluesky. If present, it's skipped. The Claude Haiku client post touches on a similar pattern for ETL idempotency.
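In sketch form — the exact frontmatter types aren't shown in this post, so the shape here is inferred:

type PublishedUrls = { devto?: string; hashnode?: string; bluesky?: string };

function needsBlueskyPublish(frontmatter: { published_urls?: PublishedUrls }): boolean {
  // Publish only when no Bluesky URL has been recorded yet
  return !frontmatter.published_urls?.bluesky;
}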
What happens if the CI run is retried after a partial failure?
If the publish step fails halfway through — say, Dev.to succeeds but Bluesky fails — the CI re-run starts with published_urls.devto already set and published_urls.bluesky still missing. The script re-attempts only Bluesky. The local PNG is still present from the earlier run (the repo is checked out fresh, and generate-summary.py runs again at the top of the retry). So the disk-read path works correctly on retries too.
Why not use the Bluesky queue for retries instead?
I do have a content/bluesky-queue.jsonl file for deferred posts (used for the bluesky-queue.yml workflow). But that queue is for scheduled future posts, not for CI retries. Running a separate queue flush after a failed image upload would work but adds complexity. The simpler fix — just read from disk — handles the timing problem directly.
The publish workflow for this article's own images will use this fix. If you're running a similar Playwright-generate-then-social-publish pipeline and seeing bare posts on Bluesky, the local disk path is probably the fastest resolution. The summary_image URL still matters for other platforms — you just don't need to wait for it to be live before Bluesky gets its bytes.
Related reading:
- Why I reused a single CI pipeline for two YouTube channels and three SEO sites
- IndexNow, libSQL, and three other tools I reached for this week
Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.