The Bluesky image upload race I fixed a few weeks ago was the last painful incident in an otherwise simple posting pipeline. Here's how the queue system works — the design is different from every social-scheduling SaaS I looked at, and that difference matters on GitHub Actions.
The queue: a flat JSONL file
The entire post schedule lives in content/bluesky-queue.jsonl. Each line is a self-contained JSON object:
{"text": "New article: What I learned about JSON-LD audits in CI. #webdev #tutorial https://aiappdex.com/articles/jsonld-audit-post-deploy-ci"}
{"text": "TIL: Turso vs Cloudflare D1 for Astro monorepos — the practical difference. #opensource #astro https://ossfind.com/articles/turso-libsql"}
{"posted_at": "2026-05-20T09:02:15Z", "post_uri": "at://did:plc:abc123/app.bsky.feed.post/3xyz", "text": "..."}
Unposted entries have only text. After a post succeeds, the script rewrites that line in-place with posted_at and post_uri added. The queue drains from top to bottom; the script picks the first line without a posted_at field and exits after posting one entry.
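Here is a minimal sketch of that pick-and-mark logic. It assumes the script reads the whole file into a string and writes it back afterwards. pickNext and markPosted are hypothetical names, not the real script's, and the Bluesky call and file I/O are left out.

```javascript
// Hypothetical sketch: find the first JSONL line without posted_at.
// Returns { index, entry } for that line, or null when the queue is drained.
function pickNext(jsonl) {
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue; // tolerate blank lines, e.g. the trailing newline
    const entry = JSON.parse(lines[i]);
    if (!("posted_at" in entry)) return { index: i, entry };
  }
  return null;
}

// Rewrite line `index` in place with posted_at and post_uri added; other lines are untouched.
function markPosted(jsonl, index, postUri, now = new Date()) {
  const lines = jsonl.split("\n");
  const entry = JSON.parse(lines[index]);
  // Second-precision UTC timestamp, matching the example entry's format.
  const postedAt = now.toISOString().replace(/\.\d{3}Z$/, "Z");
  lines[index] = JSON.stringify({ posted_at: postedAt, post_uri: postUri, ...entry });
  return lines.join("\n");
}
```

Because markPosted only replaces one line and rejoins the rest verbatim, the git diff for each run is a single changed line.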
This format is a deliberate trade-off. It's not a real database. You can't query it, you can't easily filter by tag, and editing it by hand means being careful about JSON syntax on every line. What it gives you: a single file that's diff-friendly in git history, trivially readable, and appended to by any CI job that generates content — the article-publish workflow appends a Bluesky promotion line to the queue after each successful publish.
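Two small guards can make the hand-edit and CI-append paths safer. This is a sketch, not part of the pipeline described here, and queueLine and lintQueue are hypothetical helpers. Building each line with JSON.stringify escapes quotes and newlines in the post text, so an appended entry always occupies exactly one line. The lint pass catches a broken hand edit before the posting job runs into it.

```javascript
// Hypothetical append helper: serialize one queue entry as a single JSONL line.
// JSON.stringify escapes quotes and newlines, so the entry never spans two lines.
function queueLine(text) {
  return JSON.stringify({ text }) + "\n";
}

// Hypothetical lint step: report every line that isn't a JSON object with a string `text`.
// Returns human-readable problems; an empty array means the file is valid.
function lintQueue(jsonl) {
  const problems = [];
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return; // blank lines (e.g. the trailing newline) are ignored
    let entry;
    try {
      entry = JSON.parse(line);
    } catch (err) {
      problems.push(`line ${i + 1}: invalid JSON (${err.message})`);
      return;
    }
    if (entry === null || typeof entry !== "object" || Array.isArray(entry)) {
      problems.push(`line ${i + 1}: expected a JSON object`);
    } else if (typeof entry.text !== "string") {
      problems.push(`line ${i + 1}: missing string "text" field`);
    }
  });
  return problems;
}
```

A CI step could run lintQueue over content/bluesky-queue.jsonl on every push that touches it, and fail the build if the returned array is non-empty.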
The post script: richtext facets for hashtags and URLs
Bluesky's API expects richtext facets — byte-range annotations that tell the client which parts of the text are links or hashtags. These aren't inferred; you have to compute them and include them in the post record. The post script builds them from the text string using regex:
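function buildFacets(text) {
  const facets = [];
  const enc = new TextEncoder();
  // Hashtags: '#' must start the text or follow whitespace/punctuation.
  for (const m of text.matchAll(/(?:^|[\s,.;:!?])(#[a-zA-Z][a-zA-Z0-9_]*)/g)) {
    const tagWithHash = m[1];
    // m[0] may include the leading separator, so locate the '#' from the end of the match.
    const offset = (m.index ?? 0) + m[0].length - tagWithHash.length;
    // Facet indices are UTF-8 byte offsets, not JavaScript (UTF-16) string indices.
    const byteStart = enc.encode(text.slice(0, offset)).length;
    const byteEnd = byteStart + enc.encode(tagWithHash).length;
    facets.push({
      index: { byteStart, byteEnd },
      features: [{ $type: "app.bsky.richtext.facet#tag", tag: tagWithHash.slice(1) }],
    });
  }
  // Links: run until whitespace or ')', then drop trailing sentence punctuation
  // so "see https://example.com." doesn't link the final period.
  for (const m of text.matchAll(/https?:\/\/[^\s)]+/g)) {
    const uri = m[0].replace(/[.,;:!?]+$/, "");
    const byteStart = enc.encode(text.slice(0, m.index ?? 0)).length;
    const byteEnd = byteStart + enc.encode(uri).length;
    facets.push({
      index: { byteStart, byteEnd },
      features: [{ $type: "app.bsky.richtext.facet#link", uri }],
    });
  }
  return facets;
}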
The byte offset calculation is the non-obvious part. Bluesky facet ranges are UTF-8 byte positions, while JavaScript string indices count UTF-16 code units. Any non-ASCII text before a hashtag or URL makes the two diverge: an emoji such as 🚀 is two code units but four UTF-8 bytes, and an accented letter like é is one code unit but two bytes. Encoding text.slice(0, offset) with TextEncoder and taking its length gives the correct UTF-8 byte position regardless of what precedes the match.
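A small check makes the difference concrete. It assumes Node.js, where TextEncoder and TextDecoder are globals. The utf8Offset helper is hypothetical, but it performs the same measurement buildFacets does.

```javascript
// Convert a JavaScript string index (UTF-16 code units) to a UTF-8 byte offset,
// the unit Bluesky facet indices use.
function utf8Offset(text, index) {
  return new TextEncoder().encode(text.slice(0, index)).length;
}

const text = "🚀 New post #webdev";
const index = text.indexOf("#"); // 12: the emoji occupies two UTF-16 code units
const byteStart = utf8Offset(text, index); // 14: the emoji is four bytes in UTF-8
const byteEnd = byteStart + new TextEncoder().encode("#webdev").length; // 21

const bytes = new TextEncoder().encode(text);
// Decoding the byte range recovers exactly the hashtag.
console.log(new TextDecoder().decode(bytes.slice(byteStart, byteEnd))); // prints: #webdev
// Using the string index as a byte offset lands two bytes early.
console.log(new TextDecoder().decode(bytes.slice(index, index + 7))); // prints: t #webd
```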
Off-minute cron scheduling
The workflow fires three times daily:
schedule:
  - cron: "37 23 * * *"  # 23:37 UTC = 08:37 JST, lands ~09:00 JST
  - cron: "37 7 * * *"   # 07:37 UTC = 16:37 JST, lands ~17:00 JST
  - cron: "37 13 * * *"  # 13:37 UTC = 22:37 JST, lands ~23:00 JST
The :37 offset is intentional. Top-of-hour schedules on GitHub Actions (0 * * * *, 0 0 * * *) are heavily contended globally. I measured actual delays of 3–4 hours on a 0 0 * * * slot before moving to :37. The off-minute timing doesn't eliminate the delay, but it reduces it significantly: real-world delivery now lands within 15–20 minutes of the intended JST time.
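The slot comments can be sanity-checked with a quick conversion. This assumes cron expressions are evaluated in UTC, as the comments above do, and that JST is a fixed UTC+9 with no daylight saving time. cronToJst is a hypothetical helper that only handles fixed minute and hour fields.

```javascript
// Convert the minute/hour fields of a daily cron expression from UTC to JST (UTC+9).
// Only fixed numeric minute/hour fields are supported; '*' or ranges are not handled.
function cronToJst(expr) {
  const [minute, hour] = expr.trim().split(/\s+/).map(Number);
  const jstHour = (hour + 9) % 24; // wraps past midnight, e.g. 23 UTC -> 08 JST
  const pad = (n) => String(n).padStart(2, "0");
  return `${pad(jstHour)}:${pad(minute)} JST`;
}

for (const expr of ["37 23 * * *", "37 7 * * *", "37 13 * * *"]) {
  console.log(expr, "→", cronToJst(expr));
}
// 37 23 * * * → 08:37 JST
// 37 7 * * * → 16:37 JST
// 37 13 * * * → 22:37 JST
```

Each trigger fires about 23 minutes before its target. Adding the observed 15–20 minute queueing delay and the 0–5 minute random sleep from the next step puts the post at roughly :52 to :02 JST.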
Inside the job, there's a random additional delay before posting:
- name: Random start delay (0-5 min) to avoid bot-pattern timing
  run: |
    DELAY=$(( RANDOM % 300 ))
    echo "Sleeping ${DELAY}s before posting"
    sleep $DELAY
This spreads the actual post time across a 5-minute window. The assumption is that Bluesky's feed algorithms tend to de-emphasize accounts that post at machine-exact times, and the random delay makes the pattern look more organic. I don't have data to prove it works, but the cost is negligible: at most five minutes of runner time per run.
Self-trigger prevention
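After posting, the script rewrites content/bluesky-queue.jsonl and commits the change back to the repo. Without a guard, that commit would trigger the workflow again immediately, draining the queue faster than intended. (This applies when the push is authenticated with a personal access token or deploy key; pushes made with the default GITHUB_TOKEN don't start new workflow runs.)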
The guard is a commit message convention:
- name: Commit queue update
  run: |
    git add content/bluesky-queue.jsonl
    git commit -m "chore(bluesky): mark queued post as posted [skip bluesky-queue]"
    git push

jobs:
  post:
    if: "!contains(github.event.head_commit.message, '[skip bluesky-queue]')"
The workflow skips if the triggering commit contains [skip bluesky-queue]. It's the same skip-token pattern I use across every self-committing workflow in the shared CI pipeline — article ETL, OG image regeneration, the sitemap rebuild — each with a distinct [skip <name>] token so workflows don't accidentally skip each other.
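Restating the guard as a function makes it easier to reason about which commits skip which workflow. This is an illustrative sketch, not code from the pipeline, and og-images is a hypothetical workflow name. GitHub's contains() compares strings case-insensitively, so the sketch lowercases both sides. On a scheduled run there is no head commit, so the message is missing and the job runs.

```javascript
// Mirror of the job-level `if:` guard. Returns true when the commit message
// carries this workflow's skip token. Case-insensitive, like GitHub's contains().
// A missing message (scheduled runs have no head commit) never skips.
function shouldSkip(commitMessage, workflowName) {
  const token = `[skip ${workflowName}]`;
  return (commitMessage ?? "").toLowerCase().includes(token.toLowerCase());
}
```

The closing bracket keeps distinct tokens from colliding. [skip og] is not a substring of [skip og-images], so a short workflow name can't accidentally skip a longer one.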
What I'd do differently
The JSONL format breaks down if you want to schedule posts for a specific future date rather than "next in queue." A scheduled_after ISO timestamp field would fix this without changing the format much; the picker logic shifts from "first line without posted_at" to "first line where scheduled_after <= now and posted_at is absent."
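Here is a sketch of that picker. It assumes scheduled_after holds an ISO 8601 timestamp as proposed, and pickNextDue is a hypothetical name. Entries without scheduled_after are eligible immediately, so the existing queue keeps working unchanged. A future-dated entry also doesn't block the plain entries behind it.

```javascript
// Proposed picker: first line where posted_at is absent and scheduled_after
// (optional, ISO 8601) is at or before `now`.
function pickNextDue(jsonl, now = new Date()) {
  const lines = jsonl.split("\n");
  for (let i = 0; i < lines.length; i++) {
    if (lines[i].trim() === "") continue;
    const entry = JSON.parse(lines[i]);
    if ("posted_at" in entry) continue;
    if (entry.scheduled_after !== undefined) {
      const due = Date.parse(entry.scheduled_after);
      // Conservative: an unparseable date is treated as "not due" rather than posted early.
      if (Number.isNaN(due) || due > now.getTime()) continue;
    }
    return { index: i, entry };
  }
  return null;
}
```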
The other gap: no retry backoff on failed posts. If the Bluesky API returns an error, the script exits and the next scheduled run retries the same entry. That's correct behavior, but with no backoff or attempt limit, a transient 500 costs a whole posting slot. An entry that keeps failing is retried at every slot, three times a day, and blocks everything queued behind it. So far that hasn't caused duplicate posts, but it's a latent issue worth fixing before the account grows.
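One way to close that gap is to retry within a single run before falling back to the next slot. The sketch below uses capped exponential backoff with full jitter. withBackoff and isTransient are hypothetical helpers, and the err.status field is an assumption about how the HTTP client reports failures.

```javascript
// Retry an async call with capped exponential backoff and full jitter.
// `isRetryable` decides which failures get another attempt; `sleep` is injectable for tests.
async function withBackoff(fn, { retries = 3, baseMs = 2000, maxMs = 30000, isRetryable = () => true, sleep } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      if (attempt >= retries || !isRetryable(err)) throw err;
      // Ceiling doubles each attempt (baseMs, 2*baseMs, 4*baseMs, ...) up to maxMs.
      const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
      // Full jitter: wait a random duration in [0, ceiling).
      await wait(Math.floor(Math.random() * ceiling));
    }
  }
}

// Assumption: the HTTP client exposes the response status as `err.status`.
// Retry 5xx responses and errors with no status (network failures); fail fast on 4xx.
function isTransient(err) {
  const status = err?.status;
  return status === undefined || (status >= 500 && status <= 599);
}
```

Retrying a post is not idempotent. If a request times out after Bluesky has already created the record, a retry publishes it a second time. isTransient above retries status-less network errors, which include that timeout case, so drop that branch if a duplicate is worse than a delayed post.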
Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.