Three different AI bugs in three hours
Until 18:00 this evening, everything was flowing nicely. My hourly content-generate cron had locked into its 24-posts/day rhythm. Then four crons in a row failed. The pipeline-health monitor hadn't sent an email yet because the 4-hour threshold hadn't been crossed, but the GitHub Actions panel was turning red.
I dug in and found three different reasons for the failures, all of them small formatting quirks in Gemini's output. None was a case of "the AI gave a wrong answer"; each was a tiny YAML or character anomaly that the Astro Zod validator rejects without mercy.
This post is the story of how I hunted down the three bugs and ended up fixing all of them with a single auto-fixer.
Bug #1: Slash inside a tag, breaking the /tags route
The first failure message:
[ImageNotFound] route /tags/CI%2FCD g%C3%BCvenli%C4%9Fi failed
TypeError: Cannot read properties of undefined (reading 'split')
Astro has a /tags/[tag]/index.astro route. The tag string comes through as a URL parameter. The AI had produced "CI/CD güvenliği" in its tag list. Because / is interpreted as a separator in the URL, it splits into two segments — CI and CD güvenliği — the route can't be found, and the build blows up.
My first instinct was "tell the AI not to do that". Would adding "don't use slashes in tags" to the prompt fix it? No, because the AI doesn't always follow prompt directives: one day it remembers, the next it forgets. You have to think of it like insurance: even if the door is left open, the security system is a second layer.
The fix went into the enforceFrontmatterLimits function:
const sanitizeItem = (s: string): string => {
  const m = s.match(/^(['"])([\s\S]*)\1$/);
  if (!m) return s;
  const cleaned = m[2]
    .replace(/[/\\?#]/g, '-') // slash, backslash, ?, # → '-'
    .replace(/-{2,}/g, '-')   // collapse runs of dashes into one
    .trim();
  return `${m[1]}${cleaned}${m[1]}`;
};
/, \, ?, # are all URL-breaking characters. 'CI/CD güvenliği' becomes 'CI-CD güvenliği'. Build passes, URL is clean, all that's left is a small visual difference.
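To show the effect end to end, here is sanitizeItem wired into a small helper that runs it over a whole flow-style tags line. The wiring (sanitizeTagsLine) is my own illustration; only sanitizeItem itself comes from the fix above:

```typescript
// sanitizeItem from the fix, unchanged: cleans URL-breaking
// characters inside a quoted YAML list item.
const sanitizeItem = (s: string): string => {
  const m = s.match(/^(['"])([\s\S]*)\1$/);
  if (!m) return s;
  const cleaned = m[2]
    .replace(/[/\\?#]/g, '-') // slash, backslash, ?, # → '-'
    .replace(/-{2,}/g, '-')   // collapse runs of dashes into one
    .trim();
  return `${m[1]}${cleaned}${m[1]}`;
};

// Hypothetical wiring: apply sanitizeItem to every quoted item in a
// flow-style tags line such as  tags: ['CI/CD güvenliği', 'devops']
const sanitizeTagsLine = (line: string): string =>
  line.replace(/(['"])[^'"]*\1/g, (item) => sanitizeItem(item));

sanitizeTagsLine(`tags: ['CI/CD güvenliği', 'devops']`);
// → tags: ['CI-CD güvenliği', 'devops']
```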
Bug #2: publishDate as a quoted string
An hour later, the second blow. The run showed up as "success", but on the content side the post was incomplete: only the fallback PNG had been committed; the .mdx had never been written.
Generate log:
Post generation failed: Error: Frontmatter validation failed:
publishDate invalid ""2026-05-01""
The reason for the double-quote-inside-double-quote is that the error message itself adds quotes. The actual value was "2026-05-01" — meaning the AI had wrapped the date in quotes.
From a YAML standpoint, both publishDate: 2026-05-01 and publishDate: "2026-05-01" are valid. Astro's Zod schema accepts either with z.coerce.date(). But my local validator does not. Validator regex:
if (!/^\d{4}-\d{2}-\d{2}/.test(publishDate))
  errors.push(`publishDate invalid "${publishDate}"`);
Run against "2026-05-01", the regex sees a quote as the first character, so \d{4} doesn't match and validation fails. The validator rejecting the file before it gets anywhere near git is the right behavior; but a quoted date shouldn't be rejected, it should be cleaned up.
A new step in the same enforceFrontmatterLimits:
out = out.replace(/^(\s*publishDate:\s*)(.+)$/m, (_match, prefix, value: string) => {
  let cleaned = value.trim().replace(/^["'](.*?)["']\s*$/, '$1').trim();
  if (!/^\d{4}-\d{2}-\d{2}/.test(cleaned)) {
    if (opts.today) cleaned = opts.today; // fallback today
  }
  return `${prefix}${cleaned}`;
});
If it's quoted, I strip the quotes. If it's broken, I override with today's date. Whatever the AI does, the frontmatter comes out valid.
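Extracted into a standalone helper (fixPublishDate is my name for it; the today parameter stands in for opts.today), the step behaves like this:

```typescript
// The publishDate step above, isolated so it can be tested on its own.
const fixPublishDate = (out: string, today: string): string =>
  out.replace(/^(\s*publishDate:\s*)(.+)$/m, (_match, prefix, value: string) => {
    let cleaned = value.trim().replace(/^["'](.*?)["']\s*$/, '$1').trim();
    if (!/^\d{4}-\d{2}-\d{2}/.test(cleaned)) cleaned = today; // broken → today
    return `${prefix}${cleaned}`;
  });

fixPublishDate('publishDate: "2026-05-01"', '2026-02-10');
// → publishDate: 2026-05-01   (quotes stripped)
fixPublishDate('publishDate: next week', '2026-02-10');
// → publishDate: 2026-02-10   (invalid value, today fallback)
```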
Bug #3: Dotted-i in the filename, path mismatch
This was the sneakiest one. The validate step exploded with:
[ImageNotFound] Could not find requested image
../../../assets/blog/technology/uretımda-bir-sunucu-kmesinin-thundering-herd-krizine-mudahalesi.png
Look closely at this path: uretımda contains a Turkish ı (dotless i). But the actual file on disk is named uretimda, with an ASCII i. My slugify() function had normalized the title to ASCII and written the file correctly; when the AI wrote the coverImage path into the frontmatter, however, it used the original Turkish character.
Two different paths:
Disk: uretimda-bir-sunucu-kmesinin-...png (ASCII)
Frontmatter: uretımda-bir-sunucu-kmesinin-...png (Turkish ı)
Astro build says "no such file" — and it's right.
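For reference, here is a minimal sketch of what a slugify() like mine does. The actual implementation isn't shown in this post (and evidently treats some characters differently, since ü was dropped entirely in the real filename), so take this as illustrative only:

```typescript
// Illustrative slugify: map Turkish letters to ASCII, strip other
// diacritics, then collapse everything non-alphanumeric into dashes.
const TURKISH_MAP: Record<string, string> = {
  ı: 'i', ğ: 'g', ü: 'u', ş: 's', ö: 'o', ç: 'c',
};

const slugify = (title: string): string =>
  title
    .toLowerCase()
    .replace(/[ığüşöç]/g, (c) => TURKISH_MAP[c] ?? c)
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '') // strip remaining combining marks
    .replace(/[^a-z0-9]+/g, '-')     // non-alphanumeric runs → single dash
    .replace(/^-+|-+$/g, '');        // trim leading/trailing dashes

slugify('Üretımda bir sunucu kümesinin');
// → uretimda-bir-sunucu-kumesinin
```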
The fix is simple but important: force the frontmatter to sync with what's on disk. generate-content.ts already knew the actual file name (relativeImagePath). I pass that value into enforceFrontmatterLimits as an option and have it override whatever the AI wrote:
if (opts.expectedCoverImage) {
  const expected = opts.expectedCoverImage;
  out = out.replace(
    /^(\s*coverImage:\s*)(["'])([\s\S]*?)\2/m,
    (_match, prefix, quote, value: string) => {
      if (value !== expected) {
        console.log(`  [sanitize] coverImage "${value}" → "${expected}"`);
      }
      return `${prefix}${quote}${expected}${quote}`;
    }
  );
}
Whatever the AI writes, it gets replaced with the actual file name on disk. Insurance pattern.
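As a runnable snippet (with paths shortened for illustration), the override does this:

```typescript
// The coverImage override above, as a standalone helper.
const forceCoverImage = (out: string, expected: string): string =>
  out.replace(
    /^(\s*coverImage:\s*)(["'])([\s\S]*?)\2/m,
    (_match, prefix, quote) => `${prefix}${quote}${expected}${quote}`
  );

forceCoverImage(
  `coverImage: '../assets/uretımda-sunucu.png'`, // AI wrote the Turkish ı
  '../assets/uretimda-sunucu.png'                // actual file on disk
);
// → coverImage: '../assets/uretimda-sunucu.png'
```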
A single consolidated normalizer
All three bugs share a pattern: the AI generates content that's 95% in the shape my schema wants, but a tiny detail slips through. The classic "strict validator, flexible generator" tension.
I changed my approach. Old way:
- AI writes
- Validator checks
- If there are errors, reject — don't write the file
New way:
- AI writes
- Auto-fix first (correct known quirk patterns)
- Validator checks (now only truly unfixable things remain)
- If no errors, write the file
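The new order can be sketched with stub implementations standing in for the real normalizer and validator; to keep the example small, the stubs below only handle the quoted-date quirk:

```typescript
// Stub normalizer: only strips quotes around publishDate here,
// as a miniature stand-in for enforceFrontmatterLimits.
const enforceFrontmatterLimits = (content: string): string =>
  content.replace(/^(publishDate:\s*)["'](.+?)["']\s*$/m, '$1$2');

// Stub validator: returns a list of errors; empty means the file is writable.
const validateFrontmatter = (content: string): string[] => {
  const errors: string[] = [];
  const m = content.match(/^publishDate:\s*(.+)$/m);
  if (!m || !/^\d{4}-\d{2}-\d{2}/.test(m[1]))
    errors.push(`publishDate invalid "${m ? m[1] : ''}"`);
  return errors;
};

const raw = 'publishDate: "2026-05-01"';
validateFrontmatter(raw);                            // old way: 1 error, hard fail
validateFrontmatter(enforceFrontmatterLimits(raw));  // new way: fix first, 0 errors
```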
The logic of auto-fix: "the AI didn't do this on purpose, it was a small slip, I'm making a small correction and moving on". Soft normalization instead of hard fail.
One function, three quirks:
function enforceFrontmatterLimits(content: string, opts: EnforceOptions = {}): string {
  // Fix #1: URL-breaking characters inside tags
  // Fix #2: publishDate quotes + invalid format → today fallback
  // Fix #3: coverImage path → force expected path
  // ...and title/description char limits
}
💡 The normalizer pattern is gold for AI content
Don't assume the AI output fits your schema perfectly. Collect known quirks in a single normalizer. Choosing fix over reject keeps the flow smooth, and when three separate bugs show up in a single evening, they all live in one file that's easy to navigate.
When I woke up the next day
In the morning there was no pipeline-health "DEGRADED" mail. So yesterday's fixes had held — 24 posts had flowed through the pipe overnight. I checked the run history: 20 successes, 0 failures.
The real reason I'm writing this post: a week from now when another AI quirk shows up, my reflex should be "add a new quirk pattern", not "write a new try-catch". I'm sure everyone who builds AI-generated content sooner or later learns this, but I wanted to share how cheap that learning can be.
If I trust the AI, I have to trust insurance against the AI too.