Blog Ops #2: I Analyzed 30 Days of My Content Pipeline — Here's What the Data Actually Said
30 days in. Real numbers. Some surprises, some embarrassments.
When I launched my automated content pipeline last month (Blog Ops #1), I had exactly zero data to back my choices. I was running on gut feeling and Stack Overflow. Now I have 30 days of real pipeline logs, traffic data, and — crucially — a few face-palm moments I want to share.
Spoiler: The thing I spent the most time building barely moved the needle. The thing I almost skipped became my highest-traffic driver.
Let me show you exactly what happened.
The Baseline: What My Pipeline Looked Like on Day 1
Quick recap for context. My setup:
- Jekyll blog on GitHub Pages
- Python scripts to cross-post to Dev.to
- Cron jobs to schedule and automate publishing
- GitHub Actions for build/deploy
Target: 10+ posts/week across blog + Dev.to. Actual output on Day 1: 2 posts published, 3 stuck in draft hell.
The Data: 30 Days of Pipeline Logs
I instrumented my deploy script from Day 1 to log every publish attempt. Here's what came out:
- Total publish attempts: 143
- Successful on first try: 89 (62%)
- Retried once: 31 (22%)
- Retried 2+ times: 9 (6%)
- Failed completely: 14 (10%)
A 10% complete failure rate sounds terrible. But when I dug into why, it was nearly all the same root cause.
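For the record, the instrumentation behind these numbers doesn't need to be fancy. Here's a sketch of the approach: one JSON line per post, with field names that are my own choices rather than anything prescribed:

```python
# publish_log.py — append one JSON line per post, then bucket the results
import json
from pathlib import Path

LOG_FILE = Path("publish_attempts.jsonl")

def log_publish(slug: str, attempts: int, success: bool) -> None:
    """Record how many tries a post took and whether it landed."""
    record = {"slug": slug, "attempts": attempts, "success": success}
    with LOG_FILE.open("a") as f:
        f.write(json.dumps(record) + "\n")

def summarize(records) -> dict:
    """Bucket posts the same way as the stats above."""
    buckets = {"first_try": 0, "retried_once": 0, "retried_more": 0, "failed": 0}
    for r in records:
        if not r["success"]:
            buckets["failed"] += 1
        elif r["attempts"] == 1:
            buckets["first_try"] += 1
        elif r["attempts"] == 2:
            buckets["retried_once"] += 1
        else:
            buckets["retried_more"] += 1
    return buckets

# Demo with two made-up records
posts = [
    {"slug": "why-i-built-x", "attempts": 1, "success": True},
    {"slug": "top-5-tools", "attempts": 3, "success": False},
]
print(summarize(posts))
# → {'first_try': 1, 'retried_once': 0, 'retried_more': 0, 'failed': 1}
```

The JSON-lines format means the log survives crashes mid-run and stays trivially greppable.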
The #1 Killer: Front Matter Drift
Of the 14 failed posts:
- 11 were front matter validation errors (missing description, wrong date format, tags over the 4-item limit)
- 2 were rate-limit hits (I got impatient and hammered the Dev.to API)
- 1 was a genuine network timeout
I had been treating front matter as boilerplate. Turns out it's the thing that kills your pipeline reliability most often.
The fix was embarrassingly simple:
```python
# validate_frontmatter.py — add this before every deploy
import sys

import yaml

REQUIRED_FIELDS = ['title', 'date', 'description', 'tags', 'published']
MAX_TAGS = 4

def validate(filepath):
    with open(filepath, 'r') as f:
        content = f.read()

    # Extract front matter between --- markers
    if not content.startswith('---'):
        return False, "No front matter found"
    parts = content.split('---', 2)
    if len(parts) < 3:
        return False, "Malformed front matter"

    try:
        meta = yaml.safe_load(parts[1])
    except yaml.YAMLError as e:
        return False, f"YAML parse error: {e}"
    if not isinstance(meta, dict):
        return False, "Empty front matter"

    # Check required fields
    for field in REQUIRED_FIELDS:
        if field not in meta:
            return False, f"Missing field: {field}"

    # Check tag limit
    if isinstance(meta.get('tags'), list) and len(meta['tags']) > MAX_TAGS:
        return False, f"Too many tags: {len(meta['tags'])} (max {MAX_TAGS})"

    # Check description length
    desc = str(meta.get('description', ''))
    if len(desc) > 160:
        return False, f"Description too long: {len(desc)} chars (max 160)"

    return True, "OK"

if __name__ == '__main__':
    ok, msg = validate(sys.argv[1])
    print(f"{'✅' if ok else '❌'} {msg}")
    sys.exit(0 if ok else 1)
```
After adding this as a pre-deploy check, my success-on-first-try rate jumped from 62% → 91% in the following week. One script, 40 lines, instant impact.
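One way to wire the check into the existing GitHub Actions workflow is a step like this; the step name and the `_posts/` path are assumptions about your repo layout:

```yaml
- name: Validate front matter
  run: |
    for f in _posts/*.md; do
      python validate_frontmatter.py "$f" || exit 1
    done
```

Failing the build here means a bad post never reaches the publish step at all.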
Traffic: The Surprise Rankings
I expected my "big concept" posts to dominate. I was wrong.
Here's my actual traffic breakdown for the month (normalized to hide absolute numbers — the ratios are what matter):
| Post Type | Relative Traffic | Time to First 100 Readers |
|---|---|---|
| "I Built X" stories | 3.2x baseline | 4 hours |
| Step-by-step tutorials | 1.8x baseline | 18 hours |
| Opinion/hot take | 1.4x baseline | 2 hours |
| Concept explanations | 1.0x (baseline) | 36 hours |
| "Top N Tools" lists | 0.7x baseline | 72+ hours |
The "Top N Tools" format — which I initially planned as my bread and butter — was my worst performer by a significant margin. Meanwhile, my "I built X" posts consistently hit early traction.
This tracks with what the Reddit programming communities showed me: people are drowning in tool lists. They want to see what you actually made.
The Cron Schedule Problem (And How I Fixed It)
My original cron schedule was optimized for... nothing. I picked times that felt reasonable:
```
0 10 * * *   # 10 AM KST post
0 22 * * *   # 10 PM KST post
```
After 2 weeks, I looked at when my posts actually got read. Peak engagement windows for my audience (mostly US/EU devs):
- Morning peak: 8-9 AM EST (1-2 PM UTC)
- Evening peak: 7-9 PM EST (12-2 AM UTC)
My 10 AM Korean Standard Time post was hitting Dev.to at 1 AM UTC — exactly when my target audience is asleep. The 10 PM KST post landed at 1 PM UTC, which is actually decent.
So I rebuilt my schedule around reader timezones, not my timezone:
```python
# schedule_optimizer.py
from datetime import datetime, timedelta

import pytz

# Target audiences and their peak hours (local time)
AUDIENCE_PEAKS = {
    'us_east': {'tz': 'America/New_York', 'peak_hours': [8, 9, 19, 20]},
    'eu_central': {'tz': 'Europe/Berlin', 'peak_hours': [9, 10, 18, 19]},
    'us_west': {'tz': 'America/Los_Angeles', 'peak_hours': [8, 9, 19, 20]},
}

def optimal_publish_time(primary_audience='us_east'):
    """Return the best publish time in UTC for the given audience."""
    config = AUDIENCE_PEAKS[primary_audience]
    tz = pytz.timezone(config['tz'])
    now_local = datetime.now(tz)

    # Pick the earliest peak hour that's still in the future today.
    # Note: replace() keeps the current UTC offset, so this can be off
    # by an hour right at a DST transition — fine for blog scheduling.
    for hour in config['peak_hours']:
        target = now_local.replace(hour=hour, minute=0, second=0, microsecond=0)
        if target > now_local:
            return target.astimezone(pytz.utc)

    # All of today's windows have passed — schedule for tomorrow's first peak
    first_peak = now_local.replace(hour=config['peak_hours'][0],
                                   minute=0, second=0, microsecond=0)
    return (first_peak + timedelta(days=1)).astimezone(pytz.utc)

# Use it
publish_at = optimal_publish_time('us_east')
print(f"Schedule publish at: {publish_at.strftime('%Y-%m-%d %H:%M UTC')}")
```
Result: Average time to 50 readers dropped from 14 hours → 3.5 hours after shifting my schedule. Same content, better timing.
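Since publishing is cron-driven, the computed UTC time maps directly onto a crontab entry. Here's a small helper for that step; it's my own addition rather than part of the script above:

```python
# cron_line.py — turn a UTC datetime into a daily crontab entry
from datetime import datetime, timezone

def to_cron(publish_at: datetime) -> str:
    """Render 'minute hour * * *' for a daily job at the given UTC time."""
    utc_time = publish_at.astimezone(timezone.utc)
    return f"{utc_time.minute} {utc_time.hour} * * *"

# 1 PM UTC maps to the morning peak for US East readers
print(to_cron(datetime(2025, 1, 15, 13, 0, tzinfo=timezone.utc)))
# → 0 13 * * *
```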
The Retry Logic I Should Have Built First
When a Dev.to API call fails, you have two bad options: give up or blindly retry. Both are wrong.
Here's the retry logic I built after getting rate-limited on Day 8:
```python
# devto_deploy.py (excerpt — the part that actually matters)
import time
from typing import Optional

import requests

def publish_to_devto(
    api_key: str,
    title: str,
    body_markdown: str,
    tags: list[str],
    canonical_url: Optional[str] = None,
    max_retries: int = 3,
) -> dict:
    """
    Publish article to Dev.to with exponential backoff retry.
    Returns article data dict on success, raises on final failure.
    """
    url = "https://dev.to/api/articles"
    headers = {
        "api-key": api_key,
        "Content-Type": "application/json",
    }
    payload = {
        "article": {
            "title": title,
            "body_markdown": body_markdown,
            "published": True,
            "tags": tags[:4],  # Dev.to hard limit
        }
    }
    if canonical_url:
        payload["article"]["canonical_url"] = canonical_url

    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=30)
            if resp.status_code == 201:
                data = resp.json()
                print(f"✅ Published: {data['url']}")
                return data
            elif resp.status_code == 429:
                # Rate limited — back off hard
                wait = 60 * (2 ** attempt)  # 60s, 120s, 240s
                print(f"⏳ Rate limited. Waiting {wait}s (attempt {attempt+1}/{max_retries})")
                time.sleep(wait)
            elif resp.status_code in (422, 400):
                # Validation error — no point retrying
                error = resp.json().get('error', resp.text)
                raise ValueError(f"Validation failed: {error}")
            else:
                wait = 10 * (2 ** attempt)
                print(f"⚠️ HTTP {resp.status_code}. Retrying in {wait}s...")
                time.sleep(wait)
        except requests.Timeout:
            wait = 15 * (2 ** attempt)
            print(f"⏱️ Timeout on attempt {attempt+1}. Waiting {wait}s...")
            time.sleep(wait)

    raise RuntimeError(f"Failed to publish after {max_retries} attempts")
```
The key insight: rate limit errors need aggressive backoff (60s+), while server errors need gentler backoff (10-15s). Treating them the same was why I kept hitting the same 429s in a loop.
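That split distills into a pure function. This is a refactor of my own rather than an excerpt from the deploy script, but it makes the two curves easy to see and test:

```python
# backoff.py — separate backoff curves for rate limits vs. transient errors
def backoff_delay(status_code: int, attempt: int) -> int:
    """Seconds to wait before retry `attempt` (0-indexed) for a given HTTP status."""
    if status_code == 429:
        return 60 * (2 ** attempt)  # rate limited: back off hard
    return 10 * (2 ** attempt)      # server hiccup: gentler curve

print([backoff_delay(429, a) for a in range(3)])  # → [60, 120, 240]
print([backoff_delay(500, a) for a in range(3)])  # → [10, 20, 40]
```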
What I Discovered About Jekyll + GitHub Actions
One thing that genuinely surprised me: my Jekyll build times were eating into my pipeline. Some posts were taking 4-6 minutes to deploy, which matters when you're doing 10+ posts/week.
The culprit: I was regenerating the entire site on every push. Fix:
```yaml
# .github/workflows/jekyll.yml
- name: Build Jekyll
  run: |
    # --incremental: only rebuild changed files
    # --profile:     log what's slow
    # --future:      include future-dated posts
    bundle exec jekyll build --incremental --profile --future
  env:
    JEKYLL_ENV: production
```

(Heads up: inline `# comments` after a trailing `\` break shell line continuation, which is exactly the mistake my first version of this workflow made. Keep the comments on their own lines.)
--incremental alone cut my build time from 4.2 minutes → 47 seconds on average. That's a 5.4x speedup for one flag.
Even better — here's how to cache your gems between runs:
```yaml
- name: Cache gems
  uses: actions/cache@v4
  with:
    path: vendor/bundle
    key: ${{ runner.os }}-gems-${{ hashFiles('**/Gemfile.lock') }}
    restore-keys: |
      ${{ runner.os }}-gems-
```
Combined effect: average deployment time went from 4.2 min → 38 seconds. At 10+ posts/week, that's roughly 2.5 hours of recovered pipeline time per month from clean deploys alone, and more once you count retries and edits triggering extra builds.
The Metric I Should Have Tracked From Day 1
I obsessed over page views. Wrong metric.
The metric that actually matters: which posts generate Gumroad clicks.
After adding UTM parameters to every CTA and setting up a dead-simple tracking script:
```python
# utm_generator.py — generate tracked CTA links
from urllib.parse import urlencode

def make_cta_link(gumroad_product_slug: str, source_post_slug: str) -> str:
    """Generate a UTM-tagged Gumroad CTA link."""
    base = f"https://jacksonai.gumroad.com/{gumroad_product_slug}"
    params = {
        'utm_source': 'devto',
        'utm_medium': 'blog',
        'utm_campaign': 'blog-ops-series',
        'utm_content': source_post_slug,
    }
    # urlencode handles any characters that need escaping
    return f"{base}?{urlencode(params)}"

# Usage
link = make_cta_link('content-pipeline-kit', 'blog-ops-2-30day-data')
# → https://jacksonai.gumroad.com/content-pipeline-kit?utm_source=devto&...
```
What I found: "I Built X" posts convert to Gumroad clicks at 3-4x the rate of tutorial posts, even with less total traffic. Fewer readers, more buyers. That's the metric that pays rent.
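Ranking posts by clicks-per-reader instead of raw views takes only a few lines once both numbers sit side by side. A sketch with made-up figures; the real inputs would come from Gumroad's UTM report and your analytics export:

```python
# cta_rate.py — rank posts by Gumroad clicks per reader, not raw traffic
def cta_rates(posts):
    """posts: list of {'slug', 'views', 'gumroad_clicks'} dicts, ranked by click rate."""
    ranked = sorted(posts, key=lambda p: p['gumroad_clicks'] / p['views'], reverse=True)
    return [(p['slug'], round(p['gumroad_clicks'] / p['views'], 4)) for p in ranked]

# Made-up illustration: the "I built X" post wins despite fewer views
posts = [
    {'slug': 'i-built-a-pipeline', 'views': 800, 'gumroad_clicks': 24},
    {'slug': 'jekyll-tutorial', 'views': 2000, 'gumroad_clicks': 16},
]
print(cta_rates(posts))
# → [('i-built-a-pipeline', 0.03), ('jekyll-tutorial', 0.008)]
```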
30-Day Summary: The Real Numbers
Let me put it all in one place:
| Metric | Day 1 | Day 30 | Change |
|---|---|---|---|
| Pipeline success rate | 62% | 91% | +29pp |
| Avg deploy time | 4.2 min | 38 sec | -85% |
| Time to 50 readers | 14 hrs | 3.5 hrs | -75% |
| Gumroad CTR from posts | baseline | 3.2x baseline | +220% |
None of these wins came from writing better content. They all came from tuning the pipeline around data.
What's Next
The biggest remaining bottleneck: I'm still manually writing post drafts. The pipeline handles publishing, scheduling, and cross-posting — but the keyboard-hours are still high.
In Blog Ops #3, I'm going to document my experiment with using structured templates + AI-assisted drafting to cut draft time from 3 hours → under 45 minutes. I have 2 weeks of data on this already. Some of it is really good, some of it is a cautionary tale.
Grab the Full Pipeline Toolkit
Everything in this post — the validation script, retry logic, schedule optimizer, UTM generator — packaged into a ready-to-deploy kit.
If setting this up from scratch sounds painful, I wrapped it all into the Content Pipeline Starter Kit on Gumroad ($9). Copy the scripts, update your API key, deploy in 20 minutes.
🎁 Free: Python Debugging Cheat Sheet
Speaking of automation: the biggest time sink when things break is debugging. I benchmarked 3 approaches on the same bug. The print() marathon took 47 minutes; the right tool took 3.
Download the Python Debugging Cheat Sheet (Free) — 7 techniques including breakpoint(), icecream, hunter, and faulthandler. PDF, copy-paste ready, just enter your email.
Built by Jackson Studio — shipping real tools for developers who build in public.
Next up: Blog Ops #3 — Cutting Draft Time by 75% (30 Days of Data)