Hopkins Jesse

Building a Personal Content Pipeline: From Draft to Published Post

I write about infrastructure, scripting, and deployment patterns. Keeping a weekly publishing schedule meant I either stayed up late formatting markdown files or let half-finished posts accumulate in a local folder. I didn’t need an assistant that wrote my thoughts for me. I needed a system that handled the repetitive parts of my workflow: expanding outlines into consistent structure, running style checks, and pushing to production only when everything passed validation.

The result is a file-based pipeline that runs on a schedule, logs each step, and only publishes when a draft clears automated review. It treats content generation as one discrete stage among several, not as a replacement for editorial judgment.

The architecture is deliberately linear and state-driven. Every stage communicates through files in a single repository. A topics/ directory holds JSON manifests with titles, target word counts, and technical constraints. A drafts/ directory stores generated markdown. A quarantine/ directory catches anything that fails validation. GitHub Actions handles orchestration, and a static site generator renders the final output. Keeping everything file-backed means I can audit exactly what changed between runs, roll back a bad post, and run stages manually when needed.
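A manifest in topics/ might look like the sketch below. The field names here are illustrative, not the author's exact schema, but they cover the metadata the pipeline reads: title, target length, required structure, and style constraints.

```json
{
  "title": "Zero-Downtime Deploys with systemd Socket Activation",
  "target_word_count": 1200,
  "required_sections": ["Background", "Implementation", "Tradeoffs"],
  "prohibited_phrases": ["game-changer", "seamlessly", "in today's fast-paced world"],
  "target_reading_level": "grade-10"
}
```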

Stage one covers generation. I maintain a queue of outlines rather than vague prompts. Each outline contains a title, three required sections, a list of prohibited phrases, and a target reading level. A Python script reads the queue, formats a strict system prompt, and sends it to an OpenRouter endpoint. I route requests through a single wrapper function that handles rate limiting, retries on 5xx errors, and writes the raw API response to disk before parsing. The script strips trailing whitespace, ensures heading hierarchy matches my site’s template, and injects YAML front matter. The output lands in drafts/YYYY-MM-DD-slug.md. If the API call fails or returns malformed markdown, the script logs the error, skips the entry, and leaves the queue intact for the next run.
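The post-processing half of that stage is simple enough to sketch. Assuming a hypothetical `finalize_draft` helper (names and front-matter fields are illustrative), the cleanup step strips trailing whitespace, prepends YAML front matter, and derives the drafts/YYYY-MM-DD-slug.md path:

```python
"""Sketch of the draft cleanup step; function name and fields are illustrative."""
import datetime
import re


def finalize_draft(markdown: str, title: str, date: datetime.date) -> tuple[str, str]:
    """Return the target file path and the finalized draft text."""
    # Strip trailing whitespace from every line, drop leading/trailing blanks.
    body = "\n".join(line.rstrip() for line in markdown.splitlines()).strip() + "\n"
    # Derive a URL-safe slug from the title.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    front_matter = (
        f'---\ntitle: "{title}"\ndate: {date.isoformat()}\ndraft: false\n---\n\n'
    )
    path = f"drafts/{date.isoformat()}-{slug}.md"
    return path, front_matter + body
```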

Stage two is where the pipeline earns its keep. Generation without validation just automates inconsistency. Every new draft runs through two parallel checks. The first uses markdownlint-cli2 with a custom .markdownlint.json that enforces strict rules: no nested lists deeper than two levels, no inline HTML, mandatory alt text for images, and consistent heading capitalization. The second check runs a Python validator that parses the draft with python-markdown, verifies code blocks include a language identifier, runs pyspellchecker against a custom dictionary of infrastructure terms, and calculates a Flesch-Kincaid score. If any rule triggers, the draft moves to quarantine/ and a GitHub Issue opens with file paths, line numbers, and the specific violations. I review those manually, adjust the outline or prompt, and re-run the pipeline. No draft reaches the public branch without passing both checks.
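One of those validator rules, checking that every fenced code block declares a language, can be sketched in a few lines. The function name and return shape are my own; the post doesn't show the validator's internals.

```python
"""Sketch of one validator rule: fenced code blocks must declare a language."""
import re

# Matches an opening or closing fence line; group(1) captures the language id.
FENCE = re.compile(r"^```(\S*)\s*$")


def check_code_fences(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for fences with no language id."""
    violations = []
    in_block = False
    for num, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        if in_block:
            in_block = False  # closing fence; no language expected here
        else:
            in_block = True
            if not match.group(1):
                violations.append((num, "code block missing language identifier"))
    return violations
```

Violations like these are what end up in the GitHub Issue alongside the file path, so the fix is usually a one-line edit to the outline or prompt.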

Here’s the configuration that ties the stages together:

```yaml
# pipeline.yaml
generation:
  model: "anthropic/claude-3-haiku-20240307"
  max_tokens: 4000
  temperature: 0.3
  system_prompt: "system_prompts/tech_writing.md"

validation:
  markdownlint_config: ".markdownlint.json"
  spellcheck_dict: "tech_terms.dic"
  min_readability_score: 55
  quarantine_dir: "quarantine"

publishing:
  target_branch: "main"
  site_generator: "hugo"
  deploy_target: "cloudflare_pages"
  project_name: "personal-blog"
```

Stage three handles deployment. Once validation passes, the pipeline moves the draft into a staged/ directory, commits it with a message containing the generation timestamp and model used, and pushes to a release/candidate branch. A GitHub Actions workflow triggers on that branch, runs a Hugo build, and deploys to Cloudflare Pages using wrangler. The workflow waits for the deploy status, then calls a lightweight webhook to update a private spreadsheet with publish time, word count, and validation metrics. If the build fails, the workflow reverts the commit, deletes the candidate branch, and sends a Slack notification with the build logs. This rollback behavior prevents broken posts from ever reaching the live site.
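The core of that workflow could look roughly like this. This is a minimal sketch, not the author's actual file: the action versions, secret names, and project name are assumptions, and the webhook and rollback steps are omitted.

```yaml
# .github/workflows/publish-candidate.yml (illustrative sketch)
name: publish-candidate
on:
  push:
    branches: [release/candidate]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: peaceiris/actions-hugo@v2
        with:
          hugo-version: "latest"
      - run: hugo --minify
      - name: Deploy to Cloudflare Pages
        run: npx wrangler pages deploy public --project-name personal-blog
        env:
          CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }}
```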

I deliberately avoid chaining API calls where possible. The pipeline downloads the draft, runs local checks, and only makes outbound requests for deployment and logging. This reduces dependency on external services, makes debugging straightforward, and keeps costs predictable. I also cap generation to one post per week. The queue structure means I can batch outlines during free time, but the cron schedule forces a steady pace rather than a content dump.

Maintenance is mostly about updating the validation rules. When I changed my code block formatting standard, I updated the Python validator and ran it against the last twenty drafts to verify backward compatibility. When the LLM provider rotated their endpoint, I swapped the base URL in the generation config without touching the rest of the pipeline. The file-based state model keeps changes isolated and testable.

This setup doesn’t remove writing from my workflow. It removes the friction of formatting, spell-checking, linting, and manual deployment. I still choose the topics, adjust the structure, and decide when a draft is ready to publish. The pipeline just handles the mechanical steps that used to eat into my evenings. If you spend more time managing markdown files than actually writing them, treating your blog like a CI/CD process is worth the initial setup.


