Azamat Safarov

Posted on May 25

Building an Autoposting Pipeline with Hermes Agent: Why Waterfall Beats Parallel, and the Edge Cases Nobody Talks About

#python #automation #architecture #hermes

I write every day. Distributing to 8 platforms used to take 50 minutes. Now it takes 90 seconds. Here's what I learned building the pipeline, why waterfall beats parallel, and every API trap that almost made me quit.

The Problem

I write every day. For two years, every article ended the same way: open eight tabs, copy the markdown, strip formatting for Telegram, compress images for Mastodon, rewrite the hook for Bluesky's 300-character limit, paste URLs into each platform, log what went where. Fifty minutes of mechanical work after the creative part was done.

The real cost isn't time. It's context switching. After four hours of writing, switching to "platform adaptation mode" feels like starting a second job. You think "I'll distribute tomorrow" and tomorrow you don't remember what the article was about.

I tried Buffer. Tried Zapier. Both fail on the same two fronts: they don't handle markdown-to-plaintext adaptation (Telegram and VK show **bold** literally, not bold), and they don't support Bluesky AT Protocol or Paragraph GraphQL. For niche platforms, custom publishers are the only path.

So I built one. Eight platforms, four pipeline stages, one command.

Architecture: Why Waterfall, Not Parallel

First attempt was naive: blast all platforms simultaneously. Failed immediately.

The issue is dependency chains. Bluesky teasers need the WordPress URL to build link cards — you can't post a teaser before the canonical article exists. Dev.to and Paragraph have rate limits that, when hit in parallel, cascade into 429 storms across all publishers. One platform choking kills the entire run.

Solution: waterfall with selective concurrency. Four stages, each producing artifacts the next stage consumes.

┌─────────────────────────────────────────────────────────────────────┐
│  PIPELINE: WATERFALL WITH SELECTIVE CONCURRENCY                     │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Stage 1: Content Adaptation (sequential)                           │
│    → Parse markdown source                                          │
│    → Generate platform-specific variants via Jinja2 templates       │
│    → Compress images per platform limits (FFmpeg)                   │
│                                                                     │
│  Stage 2: Primary Hub (sequential, blocking)                        │
│    → WordPress: publish full article → get canonical URL            │
│    → Dev.to: publish markdown variant → get dev.to URL              │
│                                                                     │
│  Stage 3: Social Teasers (parallel, 4 threads)                      │
│    → Bluesky: 300 chars + compressed image + link                   │
│    → Mastodon: 500 chars + media upload + link                      │
│    → Tumblr: photo post + caption + link                            │
│    → Paragraph: markdown + external image URLs + link               │
│                                                                     │
│  Stage 4: Archive (sequential)                                      │
│    → Git commit with all published URLs                             │
│    → Update LLM-Wiki index                                          │
│    → Write execution log for debugging                              │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Full architecture: LLM-Wiki vault → adaptation → 20+ platforms. Hermes Agent orchestrates each stage.

Why waterfall? The WordPress URL becomes the canonical reference for every social platform. Without it, you're posting orphaned content that disappears in 48 hours.
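Without the platform details, the waterfall comes down to a small control-flow pattern. Hub stages block and halt on failure, and the teaser fan-out contains each failure to its own platform. The code below is a minimal sketch, not the pipeline's actual code. It assumes each publisher is a callable that takes the artifacts produced so far and returns a dict with a `success` key. `run_waterfall` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical publisher shape: artifacts produced so far in, result dict out.
Publisher = Callable[[Dict[str, dict]], dict]


def run_waterfall(hub: Dict[str, Publisher],
                  teasers: Dict[str, Publisher],
                  max_workers: int = 4) -> Dict[str, dict]:
    """Run hub publishers one by one, then teaser publishers concurrently."""
    artifacts: Dict[str, dict] = {}

    # Hub stage: blocking, in order. Each result becomes an artifact.
    for name, publish in hub.items():
        result = publish(dict(artifacts))
        artifacts[name] = result
        if not result.get("success"):
            # No canonical URL means nothing to link to: halt the run.
            raise RuntimeError(f"hub stage failed at {name!r}: {result}")

    # Teaser stage: independent calls fan out on a small thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(publish, dict(artifacts))
                   for name, publish in teasers.items()}
        for name, future in futures.items():
            try:
                artifacts[name] = future.result()
            except Exception as exc:  # contain the failure to one platform
                artifacts[name] = {"success": False, "error": str(exc)}
    return artifacts
```

Each teaser receives a copy of the hub artifacts, so `artifacts["wordpress"]["url"]` is the canonical link it should point to. The "selective" part is that only the independent teasers share the pool.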

Stage 1: Content Adaptation

This is where most pipelines die silently. You can't post the same markdown to Dev.to, Telegram, and Bluesky. Dev.to renders ## headers natively. Telegram shows raw # symbols. Bluesky doesn't render markdown at all, so every * and # appears exactly as typed. Each platform needs its own variant.

I built an adaptation layer with Jinja2 templates — one per platform. The source is always the same markdown file in the LLM-Wiki vault. The adapter reads it, applies platform rules, and generates the variant.

# scripts/adaptation.py
import re
from pathlib import Path

import markdown  # Python-Markdown; assumed here as the markdown→HTML converter
import regex     # third-party module; supports \X (grapheme clusters)
from jinja2 import Environment, FileSystemLoader

TEMPLATES_DIR = Path(__file__).parent.parent / "templates"
env = Environment(loader=FileSystemLoader(TEMPLATES_DIR))


def render(platform: str, **context) -> str:
    """Render templates/<platform>.j2, loaded only for platforms that use one."""
    return env.get_template(f"{platform}.j2").render(**context)


def markdown_to_html(md: str) -> str:
    """Markdown → HTML for the WordPress REST API (Python-Markdown as one option)."""
    return markdown.markdown(md, extensions=["fenced_code", "tables"])


def truncate_graphemes(text: str, limit: int) -> str:
    """Cut to at most `limit` graphemes, adding "..." only if something was cut."""
    graphemes = regex.findall(r'\X', text)
    if len(graphemes) <= limit:
        return text
    return "".join(graphemes[:limit - 3]).rstrip() + "..."


def adapt_for_platform(source_md: str, platform: str) -> str:
    """Generate platform-specific variant from canonical markdown."""
    if platform in ("telegram", "vk"):
        # Strip all markdown: these platforms don't render it.
        # Images go first, otherwise the link regex turns ![alt](url) into "!url".
        text = re.sub(r'!\[.*?\]\(.*?\)', '', source_md)             # remove images
        text = re.sub(r'\[(.*?)\]\((.*?)\)', r'\2', text)           # links → bare URL
        text = re.sub(r'\*\*(.*?)\*\*', r'\1', text)                # bold
        text = re.sub(r'__(.*?)__', r'\1', text)                    # bold, underscore form
        text = re.sub(r'\*(.*?)\*', r'\1', text)                    # italic
        text = re.sub(r'~~(.*?)~~', r'\1', text)                    # strikethrough
        text = re.sub(r'^#{1,6}\s+', '', text, flags=re.MULTILINE)  # headers, line start only
        return render(platform, content=text, has_images=False)

    elif platform == "bluesky":
        # 300 graphemes hard limit: hook + detail, leaving 30 for " → " + URL
        text = re.sub(r'^#{1,6}\s+', '', source_md, flags=re.MULTILINE)
        text = re.sub(r'\*\*(.*?)\*\*', r'\1', text)
        text = re.sub(r'\s*\n+\s*', ' ', text).strip()
        return truncate_graphemes(text, 270)

    elif platform == "devto":
        # Markdown-native but SVG doesn't render
        text = source_md.replace(".svg)", ".png)")  # rough SVG→PNG swap; the PNGs must exist
        return render(platform, content=text, tags=extract_tags(source_md))

    elif platform == "wordpress":
        # Full HTML conversion for WordPress REST API
        return render(platform, content=markdown_to_html(source_md))

    elif platform == "mastodon":
        # 500 chars, no markdown; leave room for the link (Mastodon counts any URL as 23 chars)
        text = re.sub(r'\*\*(.*?)\*\*', r'\1', source_md)
        text = re.sub(r'\s*\n+\s*', ' ', text).strip()
        return text if len(text) <= 470 else text[:467].rstrip() + "..."

    elif platform == "paragraph":
        # Markdown accepted, but images must be external URLs
        return render(platform, content=source_md, images_are_external=True)

    return source_md  # fallback for platforms without special handling (e.g. tumblr)


def extract_tags(md: str) -> list:
    """Extract tags from YAML frontmatter, limit to 4 for Dev.to."""
    match = re.search(r'^tags:\s*(.+)', md, re.MULTILINE)
    if match:
        tags = [t.strip() for t in match.group(1).split(",")]
        return tags[:4]  # Dev.to hard limit
    return ["python", "automation"]
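One ordering detail in the Telegram/VK branch matters more than it looks. The link pattern also matches image syntax, so images have to be removed before links are rewritten. Here is a standalone illustration with the standard `re` module. The sample string is made up.

```python
import re

IMAGE = r'!\[.*?\]\(.*?\)'
LINK = r'\[(.*?)\]\((.*?)\)'


def strip_links_then_images(md: str) -> str:
    """Wrong order: the link regex consumes the image syntax first."""
    text = re.sub(LINK, r'\2', md)
    return re.sub(IMAGE, '', text)


def strip_images_then_links(md: str) -> str:
    """Right order: drop images, then turn remaining links into bare URLs."""
    text = re.sub(IMAGE, '', md)
    return re.sub(LINK, r'\2', text)


sample = "See ![diagram](img/flow.png) and [the docs](https://example.com)."
print(strip_links_then_images(sample))
print(strip_images_then_links(sample))
```

The first version leaves a stray `!img/flow.png` in the plain-text variant. The second removes the image and keeps the link as a bare URL.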

Platform rules (hard-won through failure):

Platform	Markdown	Images	Length Limit	Special Rules
Telegram / VK	Strip all	External URL previews only	~4000 chars	Bare URLs for previews, no markdown
Bluesky	Strip all	Blob upload (2MB max)	300 graphemes	Must count graphemes, not bytes
Mastodon	Strip all	Two-step media upload	500 chars	`POST /api/v1/media` → `id` → `POST /api/v1/statuses`
Dev.to	Native	Must be PNG (SVG breaks)	None	4 tags max, draft mode available
WordPress	HTML	External URLs only (free plan)	None	`posts` scope only, no `media`
Paragraph	Native	External URLs only	None	GitHub raw URLs hotlink-blocked
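If you encode the table as data, the adaptation stage can reject an oversized or still-marked-up variant before any API call. This is a minimal sketch that assumes the limits from the table above. `PlatformRules`, `RULES`, and `check_variant` are hypothetical names. WordPress is left out because its variant is HTML. Run the check on the final text, including the appended link.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class PlatformRules:
    max_length: Optional[int]   # None = no practical limit
    renders_markdown: bool
    max_tags: Optional[int] = None


# Hypothetical encoding of the platform-rules table.
RULES: Dict[str, PlatformRules] = {
    "telegram": PlatformRules(max_length=4000, renders_markdown=False),
    "vk": PlatformRules(max_length=4000, renders_markdown=False),
    "bluesky": PlatformRules(max_length=300, renders_markdown=False),
    "mastodon": PlatformRules(max_length=500, renders_markdown=False),
    "devto": PlatformRules(max_length=None, renders_markdown=True, max_tags=4),
    "paragraph": PlatformRules(max_length=None, renders_markdown=True),
}

LEFTOVER_MARKDOWN = re.compile(r'\*\*|^#{1,6}\s', re.MULTILINE)


def check_variant(platform: str, text: str, tags: Sequence[str] = ()) -> List[str]:
    """Return rule violations for one variant; an empty list means OK to publish."""
    rules = RULES[platform]
    problems = []
    # len() counts code points. A grapheme is one or more code points, so for
    # Bluesky this can over-count but never under-count.
    if rules.max_length is not None and len(text) > rules.max_length:
        problems.append(f"{platform}: {len(text)} > {rules.max_length} chars")
    if not rules.renders_markdown and LEFTOVER_MARKDOWN.search(text):
        problems.append(f"{platform}: markdown left in a plain-text variant")
    if rules.max_tags is not None and len(tags) > rules.max_tags:
        problems.append(f"{platform}: {len(tags)} tags > {rules.max_tags}")
    return problems
```

Because it counts code points, the Bluesky check errs on the strict side. Swap in `regex.findall(r'\X', text)` if you need the exact grapheme count.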

The adaptation stage takes ~30 seconds — reads source, applies all rules, writes 8 variants. Each variant is git-versioned.

Stage 2: Primary Hub — WordPress & Dev.to

Two platforms publish first. Both produce URLs that downstream platforms need.

WordPress is the SEO anchor. Every social teaser links back to it as the canonical source. A WordPress post gets indexed by Google like any other web page; short-lived social posts largely don't. Without it, you're posting orphaned content.

# api/publishers/wordpress.py
import os
import requests

ACCESS_TOKEN = os.environ.get('WORDPRESS_ACCESS_TOKEN')
BLOG_ID = os.environ.get('WORDPRESS_BLOG_ID')

def publish_post(title: str, content: str, featured_image_url: str = None,
                 categories: list = None, tags: str = "") -> dict:
    url = f"https://public-api.wordpress.com/rest/v1.2/sites/{BLOG_ID}/posts/new"
    headers = {
        "Authorization": f"Bearer {ACCESS_TOKEN}",
        "Content-Type": "application/json",
    }
    payload = {
        "title": title,
        "content": content,
        "status": "publish",
    }
    if categories:
        payload["categories"] = categories
    if tags:
        payload["tags"] = tags
    if featured_image_url:
        payload["featured_image"] = featured_image_url

    r = requests.post(url, headers=headers, json=payload, timeout=30)
    data = r.json()

    return {
        "success": "ID" in data,
        "url": data.get("URL"),
        "id": data.get("ID")
    }

Critical limitation: free WordPress.com plans only grant the `posts` OAuth scope. The `media` scope requires a paid plan, so you cannot upload images via the API on the free tier. Workaround: host images externally and reference them by URL.
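In practice, hosting images externally means rewriting local image references before the HTML conversion. This minimal sketch assumes the images were already uploaded to a host under their original file names. `IMAGE_BASE_URL` and `externalize_images` are hypothetical. The edge cases listed later note that Paragraph hotlink-blocks GitHub raw URLs, so the choice of host matters.

```python
import re
from posixpath import basename

# Hypothetical image host; replace with wherever your images actually live.
IMAGE_BASE_URL = "https://images.example.com/articles"

# Matches ![alt](target) or ![alt](target "title"); group 1 = alt, group 2 = target.
IMAGE_REF = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)')


def externalize_images(md: str, base_url: str = IMAGE_BASE_URL) -> str:
    """Point local image paths at an external host; leave http(s) URLs alone."""
    def swap(match):
        alt, target = match.group(1), match.group(2)
        if target.startswith(("http://", "https://")):
            return match.group(0)  # already external
        return f"![{alt}]({base_url.rstrip('/')}/{basename(target)})"
    return IMAGE_REF.sub(swap, md)
```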

Dev.to is the technical hub. Markdown-native, code blocks work, frontmatter tags auto-categorize. The Dev.to URL becomes the "also published on" link.

# api/publishers/devto.py
import os
import requests

API_KEY = os.environ.get('DEVTO_API_KEY')

def publish_article(title: str, body: str, tags: list, published: bool = False) -> dict:
    url = "https://dev.to/api/articles"
    headers = {
        "api-key": API_KEY,
        "Content-Type": "application/json"
    }

    payload = {
        "article": {
            "title": title,
            "body_markdown": body,
            "published": published,
            "tags": tags[:4]  # Dev.to hard limit: 4 tags max
        }
    }

    r = requests.post(url, headers=headers, json=payload, timeout=30)
    data = r.json()

    if r.status_code == 201:
        return {"success": True, "url": data.get("url"), "id": data.get("id")}
    return {"success": False, "error": data.get("error", f"HTTP {r.status_code}")}

Invisible limits discovered:

4 tags maximum. Sending 5 returns 422 "Tag list exceed the maximum of 4 tags".
SVG images don't render. Dev.to's CDN doesn't process SVG. Convert to PNG via FFmpeg before upload.
Draft mode: published: false creates a draft visible in your dashboard. You review, then hit "Publish" manually.
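The tag limit can be enforced before the request. WordPress publishes first, so its URL can also go along as the `canonical_url` field of the Dev.to article payload, which points search engines at the WordPress copy. Below is a sketch under those assumptions. `build_devto_payload` and `normalize_tags` are hypothetical helpers. The tag cleanup (lowercase, alphanumerics only, de-duplicated) is a conservative guess at what Dev.to accepts, not a documented rule.

```python
import re
from typing import Iterable, List, Optional

DEVTO_MAX_TAGS = 4  # a fifth tag gets a 422


def normalize_tags(tags: Iterable[str], limit: int = DEVTO_MAX_TAGS) -> List[str]:
    """Lowercase, keep alphanumerics only, de-duplicate, cap at `limit`."""
    cleaned: List[str] = []
    for tag in tags:
        clean = re.sub(r'[^a-z0-9]', '', tag.lower())
        if clean and clean not in cleaned:
            cleaned.append(clean)
    return cleaned[:limit]


def build_devto_payload(title: str, body: str, tags: Iterable[str],
                        canonical_url: Optional[str] = None,
                        published: bool = False) -> dict:
    """Request body for POST https://dev.to/api/articles."""
    article = {
        "title": title,
        "body_markdown": body,
        "published": published,  # False = draft for manual review
        "tags": normalize_tags(tags),
    }
    if canonical_url:
        article["canonical_url"] = canonical_url  # credit the WordPress original
    return {"article": article}
```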

Both must complete before Stage 3 starts. If WordPress fails, the pipeline halts.
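A single 429 from Dev.to shouldn't be what halts the run. The rate-limit storms described in the architecture section are better absorbed inside each call. This is a minimal retry sketch. It assumes `send` is any zero-argument callable that returns a requests-style response with `.status_code` and `.headers`. It honors a numeric `Retry-After` header when one is present and otherwise backs off exponentially with jitter. The helper name and defaults are hypothetical.

```python
import random
import time
from typing import Callable


def with_backoff(send: Callable[[], object], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response (max_attempts >= 1)."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None and retry_after.isdigit():
            delay = float(retry_after)  # server told us how long to wait
        else:
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        if attempt < max_attempts - 1:
            sleep(delay)
    return response  # still 429 after all attempts; caller decides what to do
```

Usage: `with_backoff(lambda: requests.post(url, headers=headers, json=payload, timeout=30))`.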

Stage 3: Social Teasers — Parallel

Once WordPress and Dev.to return URLs, four social platforms fire in parallel.

Bluesky: AT Protocol Is Not REST

Bluesky doesn't expose a conventional REST API. It uses AT Protocol: XRPC methods called over HTTP by name (com.atproto.repo.createRecord, for example), raw binary blob uploads, and a 2MB blob size limit.

# api/publishers/bluesky.py
import os
import subprocess
from datetime import datetime, timezone

import requests

HANDLE = os.environ.get('BLUESKY_HANDLE')
APP_PASSWORD = os.environ.get('BLUESKY_APP_PASSWORD')
BASE_URL = "https://bsky.social/xrpc"

def create_session() -> dict:
    r = requests.post(
        f"{BASE_URL}/com.atproto.server.createSession",
        json={"identifier": HANDLE, "password": APP_PASSWORD},
        timeout=30
    )
    data = r.json()
    return {
        "success": "accessJwt" in data,
        "accessJwt": data.get("accessJwt"),
        "did": data.get("did")
    }

def upload_blob(image_path: str, session: dict) -> dict:
    ext = image_path.lower().split(".")[-1]
    mime = {"jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png", "gif": "image/gif"}.get(ext, "image/png")

    with open(image_path, "rb") as f:
        data = f.read()

    r = requests.post(
        f"{BASE_URL}/com.atproto.repo.uploadBlob",
        headers={
            "Authorization": f"Bearer {session['accessJwt']}",
            "Content-Type": mime
        },
        data=data,
        timeout=120  # 1.5MB uploads need time
    )

    blob = r.json().get("blob")
    return {"success": bool(blob), "blob": blob}

def post(text: str, image_path: str = None) -> dict:
    session = create_session()
    if not session["success"]:
        return session

    now = datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

    record = {
        "$type": "app.bsky.feed.post",
        "text": text,
        "createdAt": now,
    }

    if image_path and os.path.exists(image_path):
        if os.path.getsize(image_path) > 2_000_000:
            # Always write a new file, so ffmpeg never overwrites its own input
            # (the old .replace(".png", ...) did nothing for .jpg inputs).
            # An argument list instead of a shell string handles spaces in paths.
            # -q:v 2 is high quality; raise the number if the result is still too big.
            compressed = os.path.splitext(image_path)[0] + "-compressed.jpg"
            subprocess.run(["ffmpeg", "-y", "-i", image_path, "-q:v", "2", compressed], check=True)
            image_path = compressed

        blob = upload_blob(image_path, session)
        if blob["success"]:
            record["embed"] = {
                "$type": "app.bsky.embed.images",
                "images": [{"alt": "", "image": blob["blob"]}]
            }

    r = requests.post(
        f"{BASE_URL}/com.atproto.repo.createRecord",
        headers={"Authorization": f"Bearer {session['accessJwt']}"},
        json={
            "repo": session["did"],
            "collection": "app.bsky.feed.post",
            "record": record
        },
        timeout=30
    )

    data = r.json()
    if "uri" in data:
        post_id = data["uri"].split("/")[-1]
        return {
            "success": True,
            "url": f"https://bsky.app/profile/{HANDLE}/post/{post_id}"
        }
    return {"success": False, "error": data}

Three traps in this code:

uploadBlob expects raw bytes with Content-Type header — not multipart files={...}. requests.post(..., data=bytes, headers={"Content-Type": "image/png"}) — not files={"image": open(...)}.
A 30-second timeout is too short for 1.5MB uploads. I raised it to 120 seconds after three consecutive failures.
300-grapheme limit is hard. len(text) counts codepoints, but Bluesky counts graphemes: a flag or a skin-toned emoji is one grapheme built from several codepoints, so len() and Bluesky disagree. Use regex.findall(r'\X', text) (the third-party regex module) for accurate counting.
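The code above also leaves one thing out: Bluesky does not auto-link URLs in `text`. A link only becomes clickable through a `facets` entry. Its `byteStart`/`byteEnd` are UTF-8 byte offsets, which are a third unit alongside graphemes and codepoints. The sketch below builds link facets for every URL in the teaser, and the result would be attached as `record["facets"]` before `createRecord`. `link_facets` is a hypothetical helper, and its URL regex is deliberately naive (trailing punctuation would be included).

```python
import re
from typing import List

URL_RE = re.compile(r'https?://\S+')


def link_facets(text: str) -> List[dict]:
    """Build app.bsky.richtext.facet entries for every http(s) URL in text."""
    facets = []
    for match in URL_RE.finditer(text):
        # Offsets are UTF-8 bytes, not str indices: a "→" before the URL is
        # one character but three bytes.
        start = len(text[:match.start()].encode("utf-8"))
        end = start + len(match.group(0).encode("utf-8"))
        facets.append({
            "index": {"byteStart": start, "byteEnd": end},
            "features": [{
                "$type": "app.bsky.richtext.facet#link",
                "uri": match.group(0),
            }],
        })
    return facets
```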

Mastodon: Two-Step Media

Mastodon requires separate media upload before status creation.

# api/publishers/mastodon.py
import os
import requests

BASE_URL = os.environ.get('MASTODON_URL')
ACCESS_TOKEN = os.environ.get('MASTODON_ACCESS_TOKEN')

def upload_media(image_path: str) -> dict:
    url = f"{BASE_URL}/api/v1/media"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}

    with open(image_path, "rb") as f:
        files = {"file": (os.path.basename(image_path), f)}
        r = requests.post(url, headers=headers, files=files, timeout=60)

    data = r.json()
    return {
        "success": "id" in data,
        "id": data.get("id"),
        "url": data.get("url")
    }

def post_status(text: str, media_ids: list = None) -> dict:
    url = f"{BASE_URL}/api/v1/statuses"
    headers = {"Authorization": f"Bearer {ACCESS_TOKEN}"}
    payload = {"status": text}
    if media_ids:
        payload["media_ids[]"] = media_ids

    r = requests.post(url, headers=headers, data=payload, timeout=30)
    data = r.json()

    return {
        "success": "id" in data,
        "url": data.get("url")
    }

Single-call status creation with both text and file does NOT work. Must be two requests.
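Larger files add a wrinkle. Mastodon's newer `POST /api/v2/media` endpoint may process media asynchronously and return the attachment with `url` still `null`. The client is then expected to poll `GET /api/v1/media/:id` until the URL appears before attaching the ID to a status. This is a minimal polling sketch. `get_media` is a hypothetical callable you would back with that GET request, and the timeout and interval are arbitrary defaults.

```python
import time
from typing import Callable, Optional


def wait_for_media(media_id: str,
                   get_media: Callable[[str], dict],
                   timeout: float = 60.0,
                   interval: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Poll until the attachment has a non-null `url`; None if it times out."""
    deadline = time.monotonic() + timeout
    while True:
        media = get_media(media_id)
        if media.get("url"):
            return media  # processed: safe to pass media["id"] to /api/v1/statuses
        if time.monotonic() >= deadline:
            return None
        sleep(interval)
```

With `requests`, `get_media` could be `lambda mid: requests.get(f"{BASE_URL}/api/v1/media/{mid}", headers=headers, timeout=30).json()`, reusing `BASE_URL` and the auth headers from the module above.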

Tumblr and Paragraph follow similar patterns — media upload first, then post creation with media IDs or external URLs.

Stage 4: Archive & The Orchestrator

Hermes Agent schedules tasks but never autopublishes without explicit approval.

# scripts/orchestrator.py
import sys
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor

sys.path.insert(0, str(Path(__file__).parent.parent))

from api.publishers.bluesky import post as publish_bluesky
from api.publishers.wordpress import publish_post as publish_wordpress
from api.publishers.devto import publish_article as publish_devto
from api.publishers.mastodon import post_status as publish_mastodon, upload_media as upload_mastodon_media
from api.publishers.paragraph import publish_post as publish_paragraph
from scripts.adaptation import adapt_for_platform
# Not shown in this post; these module/function names are placeholders.
from api.publishers.tumblr import publish_post as publish_tumblr
from scripts.archive import log_publish_results, git_commit_with_urls, update_llm_wiki_index

def run_pipeline(article_path: str, image_path: str, dry_run: bool = True):
    with open(article_path, encoding="utf-8") as f:
        full_text = f.read()

    # Stage 1: Adaptation
    platforms = ("wordpress", "devto", "bluesky", "mastodon", "tumblr", "paragraph")
    variants = {p: adapt_for_platform(full_text, p) for p in platforms}

    if dry_run:
        print("[DRY RUN] Platform variants generated:")
        for platform, text in variants.items():
            print(f"  {platform}: {len(text)} chars")
        return

    # Stage 2: Primary Hub (sequential, blocking)
    # Keyword arguments matching publish_post(title, content, featured_image_url, categories, tags)
    wp = publish_wordpress(
        title="Article Title",
        content=variants["wordpress"],
        categories=["Productivity", "Tools"],
        tags="automation, python, publishing, hermes",
    )
    if not wp["success"]:
        # No canonical URL, no teasers: halt instead of posting orphaned content
        raise RuntimeError(f"WordPress publish failed: {wp}")
    canonical_url = wp["url"]

    dev = publish_devto(
        title="Article Title",
        body=variants["devto"],
        tags=["python", "automation", "publishing", "hermes"],
        published=False,  # Draft for review
    )

    # Stage 3: Social Teasers (parallel, 4 threads)
    def publish_bluesky_teaser():
        text = f"{variants['bluesky']} → {canonical_url}"
        return publish_bluesky(text, image_path)

    def publish_mastodon_teaser():
        media = upload_mastodon_media(image_path)
        # Dev.to is still a draft at this point, so link the public WordPress post
        text = f"{variants['mastodon']} → {canonical_url}"
        return publish_mastodon(text, [media["id"]] if media["success"] else [])

    def publish_tumblr_post():
        # Tumblr API: photo post with caption
        return publish_tumblr(image_path, variants["tumblr"], canonical_url)

    def publish_paragraph_article():
        return publish_paragraph("Article Title", variants["paragraph"])

    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = {
            "bluesky": executor.submit(publish_bluesky_teaser),
            "mastodon": executor.submit(publish_mastodon_teaser),
            "tumblr": executor.submit(publish_tumblr_post),
            "paragraph": executor.submit(publish_paragraph_article),
        }
        social_results = {}
        for name, future in futures.items():
            try:
                social_results[name] = future.result()
            except Exception as exc:  # one platform failing must not sink the rest
                social_results[name] = {"success": False, "error": str(exc)}

    # Stage 4: Archive
    log_publish_results(wp, dev, social_results)
    git_commit_with_urls(wp, dev, social_results)
    update_llm_wiki_index(wp, dev, social_results)

Human-in-the-loop design:

User says "publish this article"
Agent generates all variants, shows previews
User reviews Dev.to draft (most complex variant)
User says "confirmed" — only then API calls execute

This prevents the "oh no, I just live-posted a draft" panic. run_pipeline also defaults to dry_run=True, so a bare call only prints the generated variants.
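The approval step is simple to enforce in code: build previews, ask, and call publishers only on an explicit yes. This minimal sketch assumes a `confirm` callable (a wrapper around `input()` here, or a chat reply relayed by the agent) and publishers passed in as plain functions. None of these names come from Hermes Agent itself.

```python
from typing import Callable, Dict


def publish_with_approval(variants: Dict[str, str],
                          publishers: Dict[str, Callable[[str], dict]],
                          confirm: Callable[[str], bool]) -> Dict[str, dict]:
    """Show every variant; call publishers only after an explicit yes."""
    preview = "\n".join(f"[{name}] {len(text)} chars: {text[:80]!r}"
                        for name, text in variants.items())
    if not confirm(preview):
        return {}  # declined: no API call was made
    return {name: publishers[name](text)
            for name, text in variants.items() if name in publishers}


def console_confirm(preview: str) -> bool:
    """Terminal stand-in for the chat approval step."""
    print(preview)
    return input("Type 'confirmed' to publish: ").strip().lower() == "confirmed"
```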

Failure handling: if Bluesky fails, the pipeline logs the error, continues with the remaining platforms, and reports partial success:

Bluesky: FAILED (blob > 2MB)
WordPress: SUCCESS — https://azamatsafarov.wordpress.com/...
Dev.to: SUCCESS (draft) — https://dev.to/...
Mastodon: SUCCESS — https://mastodon.social/@...

Edge Cases: The Real Curriculum

Edge Case	How I Discovered It	The Fix
Bluesky blob > 2MB	Three consecutive failures Thursday evening, all 400 with no error body	FFmpeg compression: `ffmpeg -y -i input.png -q:v 2 output.jpg`
WordPress free plan blocks media upload	403 on every image upload, docs say "check token scope"	Host images externally, reference by URL
Dev.to SVG broken	Pipeline diagram showed empty box for a week	`ffmpeg -y -i diagram.svg diagram.png` in adaptation stage
Dev.to 5th tag rejected	Got 422, read error, realized the limit	Hard truncate to 4 tags. No negotiation
Mastodon two-step media	Posted text without image three times	Two separate requests: `POST /api/v1/media` → `id` → `POST /api/v1/statuses`
Bluesky 300 graphemes	Posted 305-char teaser, got `"record too large"` with no details	Grapheme counting via `regex.findall(r'\X', text)` before posting
Paragraph hotlink blocking	GitHub raw URLs showed as broken images	VK CDN URLs from `photos.getById` after VK upload
LinkedIn token expires	Posts failed with 401 after 60 days	Manual refresh. Not worth automating for monthly posting
Markdown divergence	Telegram showed `**bold**` literally, VK showed raw brackets	Strip all markdown for Telegram/VK in adaptation stage
NotebookLM cookies expire	Cron job ran at 11:00, failed silently at auth	Two-step cron: 10:30 reminder → user pastes cookies → 11:00 publish
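Several of these failures come from credentials with predictable lifetimes, such as LinkedIn's 60-day token. A preflight check can warn before the run instead of failing inside it. This minimal sketch assumes you record when each credential was issued and how long it lives. `CREDENTIALS` and its sample date are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple

# Hypothetical inventory: credential name -> (issued at, lifetime).
Credentials = Dict[str, Tuple[datetime, timedelta]]

CREDENTIALS: Credentials = {
    "linkedin": (datetime(2025, 4, 1, tzinfo=timezone.utc), timedelta(days=60)),
}


def expiring_soon(creds: Credentials,
                  warn_within: timedelta = timedelta(days=7),
                  now: Optional[datetime] = None) -> List[str]:
    """Warn about credentials that are expired or expire within `warn_within`."""
    now = now or datetime.now(timezone.utc)
    warnings = []
    for name in sorted(creds):
        issued, lifetime = creds[name]
        remaining = issued + lifetime - now
        if remaining <= timedelta(0):
            warnings.append(f"{name}: expired {(-remaining).days} days ago")
        elif remaining <= warn_within:
            warnings.append(f"{name}: expires in {remaining.days} days")
    return warnings
```

Run it at the start of the pipeline, or from the 10:30 reminder job, and refuse to publish while it returns anything.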

Results: What Actually Changed

Metric	Before Pipeline	After Pipeline
Time per article	50 minutes (8 platforms × ~6 min each)	90 seconds (automation) + 5 min review
Platforms covered	3–4, depending on energy	8 automated + drafts for 10 more
Failed uploads	~30% (forgot image, wrong format, expired token)	~5% (token expiration only)
SEO indexing	None — social content ephemeral	WordPress as canonical, indexed by Google
Context switching	8 tabs, 8 different UIs, 8 copy-paste operations	One command, one preview, one approval
Article versioning	None — distributed copies diverged	Git tracks every variant, platform, URL

The biggest change isn't speed. It's consistency. Before the pipeline, I distributed to 3–4 platforms depending on how tired I was. The pipeline makes 8 platforms the default. Energy-independent.