<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rami Mamar</title>
    <description>The latest articles on DEV Community by Rami Mamar (@rami_mamar).</description>
    <link>https://dev.to/rami_mamar</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3944698%2F371ecb4b-1270-4194-9a0b-3ec7d3d5501d.png</url>
      <title>DEV Community: Rami Mamar</title>
      <link>https://dev.to/rami_mamar</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rami_mamar"/>
    <language>en</language>
    <item>
      <title>How a 14-character regex destroyed every comparison table in my AI content pipeline</title>
      <dc:creator>Rami Mamar</dc:creator>
      <pubDate>Thu, 21 May 2026 18:40:05 +0000</pubDate>
      <link>https://dev.to/rami_mamar/how-a-14-character-regex-destroyed-every-comparison-table-in-my-ai-content-pipeline-3280</link>
      <guid>https://dev.to/rami_mamar/how-a-14-character-regex-destroyed-every-comparison-table-in-my-ai-content-pipeline-3280</guid>
      <description>&lt;p&gt;I run an AI content pipeline that publishes 30 SEO articles a month. It has six phases: outline, write, fact-check, tone-QA, internal-link rewire, validate-and-repair. Every published article gets schema, FAQ, citations, the works.&lt;/p&gt;

&lt;p&gt;Last week I shipped 14 articles in one batch. Two of them had GFM markdown tables (the &lt;code&gt;| Tool | Pricing | Trade-off |&lt;/code&gt; kind). When the articles landed on the live site, the tables rendered as visual rubble:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| Perplexity   | Real-time research | Free + Pro     |
|, - |, - |, - |
| ChatGPT      | Long-form synthesis | Free + Plus    |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Look at the second row. That's supposed to be a GFM separator: &lt;code&gt;| --- | --- | --- |&lt;/code&gt;. Instead it's &lt;code&gt;|, - |, - |, - |&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The bug had been silently corrupting every comparison-table article for as long as the pipeline had been running. I just hadn't noticed because most articles don't emit tables.&lt;/p&gt;

&lt;p&gt;The fix is one line. The lesson is bigger.&lt;/p&gt;

&lt;h2&gt;
  
  
  What actually happened
&lt;/h2&gt;

&lt;p&gt;My pipeline has a phase called "tone-QA." It runs a deterministic regex scrubber against the draft before the LLM rewrite stage. The scrubber strips classic AI tells: em-dashes used as commas, "in today's landscape," "leverage," "robust," "delve into."&lt;/p&gt;

&lt;p&gt;The em-dash rule looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;–—&lt;/span&gt;&lt;span class="se"&gt;][&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;em/en-dash → comma&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That one is fine. It only matches actual em-dashes (&lt;code&gt;—&lt;/code&gt;, U+2014) or en-dashes (&lt;code&gt;–&lt;/code&gt;, U+2013).&lt;/p&gt;

&lt;p&gt;The bug is in the next rule:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*--&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;double-hyphen → comma&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Double-hyphen (&lt;code&gt;--&lt;/code&gt;) is a common ASCII em-dash stand-in. The regex matches &lt;code&gt;--&lt;/code&gt; with optional spaces or tabs on either side and replaces the whole match with a comma and a space.&lt;/p&gt;

&lt;p&gt;Now consider the GFM table separator row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;| --- | --- | --- |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;---&lt;/code&gt; is three hyphens. The regex never checks whether the &lt;code&gt;--&lt;/code&gt; it found is followed by another &lt;code&gt;-&lt;/code&gt;; it matches the first two hyphens it reaches and leaves the third behind.&lt;/p&gt;

&lt;p&gt;Trace it on &lt;code&gt;| --- |&lt;/code&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Position 0: &lt;code&gt;|&lt;/code&gt;. &lt;code&gt;[ \t]*&lt;/code&gt; matches nothing and &lt;code&gt;--&lt;/code&gt; fails against &lt;code&gt;|&lt;/code&gt;, so no match starts here.&lt;/li&gt;
&lt;li&gt;Position 1: space. &lt;code&gt;[ \t]*&lt;/code&gt; consumes it (1 char). Then &lt;code&gt;--&lt;/code&gt; matches positions 2-3. Then &lt;code&gt;[ \t]*&lt;/code&gt; tries position 4, which is &lt;code&gt;-&lt;/code&gt;, not whitespace, so it matches zero chars. Total match span: positions 1-3, which is &lt;code&gt;" --"&lt;/code&gt;. Replace with &lt;code&gt;", "&lt;/code&gt;. Result: &lt;code&gt;|, - |&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The trailing dash from &lt;code&gt;---&lt;/code&gt; stays; the leading two dashes get eaten. Repeat that for every cell of a five-column separator row and you get:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;|, - |, - |, - |, - |, - |
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Markdown parsers see no separator. The whole table renders as broken paragraph text.&lt;/p&gt;
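
&lt;p&gt;To check the trace mechanically, here is a minimal TypeScript sketch (run under Node.js) that applies the pre-fix double-hyphen rule, exactly as quoted above, to the separator and header rows of a comparison table. &lt;code&gt;scrubWithTrace&lt;/code&gt; is a hypothetical helper, not part of the pipeline; it only records where each match starts so the offsets can be compared with the hand trace.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Pre-fix double-hyphen rule, copied from the tone-QA scrubber quoted in this post.
const DOUBLE_HYPHEN = /[ \t]*--[ \t]*/g;

// Hypothetical helper: applies the rule and records "offset:match" for every hit.
function scrubWithTrace(text: string): { result: string; matches: string[] } {
  const matches: string[] = [];
  const result = text.replace(DOUBLE_HYPHEN, (match: string, offset: number) => {
    matches.push(`${offset}:${JSON.stringify(match)}`);
    return ", ";
  });
  return { result, matches };
}

const separator = scrubWithTrace("| --- | --- | --- |");
console.log(separator.result); // |, - |, - |, - |
console.log(separator.matches); // [ '1:" --"', '7:" --"', '13:" --"' ]

// The header row has no "--", so it passes through untouched.
const header = scrubWithTrace("| Tool | Pricing | Trade-off |");
console.log(header.result === "| Tool | Pricing | Trade-off |"); // true
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every match starts on the space before a &lt;code&gt;---&lt;/code&gt; and stops after two dashes, so each cell keeps exactly one stray &lt;code&gt;-&lt;/code&gt;.&lt;/p&gt;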

&lt;h2&gt;
  
  
  How I caught it (eventually)
&lt;/h2&gt;

&lt;p&gt;I caught it via a manual-LLM testing mode I'd added to the pipeline a session earlier. The idea: don't actually call any LLM API; instead write each phase's prompt to a JSON file, hand-author the response, drop it into the cache, and re-run. Lets you debug the full pipeline without burning tokens.&lt;/p&gt;

&lt;p&gt;When I ran one article through this manual mode, I saw the tone-QA phase's INPUT had a clean table separator (&lt;code&gt;| --- | --- |&lt;/code&gt;) and its OUTPUT had the corrupted one (&lt;code&gt;|, - |, - |&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Between input and output there's only one thing: the deterministic regex scrubber. The LLM gets called after, but the corruption was already done.&lt;/p&gt;

&lt;p&gt;One diff between the two cached files. Five seconds to find the line.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix
&lt;/h2&gt;

&lt;p&gt;Negative lookbehind + lookahead. Don't match &lt;code&gt;--&lt;/code&gt; if it's part of a &lt;code&gt;---&lt;/code&gt; run.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Was&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nl"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*--&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;double-hyphen → comma&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;// Now&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;(?&amp;lt;&lt;/span&gt;&lt;span class="sr"&gt;!-&lt;/span&gt;&lt;span class="se"&gt;)[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*--&lt;/span&gt;&lt;span class="se"&gt;(?!&lt;/span&gt;&lt;span class="sr"&gt;-&lt;/span&gt;&lt;span class="se"&gt;)[&lt;/span&gt;&lt;span class="sr"&gt; &lt;/span&gt;&lt;span class="se"&gt;\t]&lt;/span&gt;&lt;span class="sr"&gt;*/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;replacement&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;, &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;flag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;double-hyphen → comma&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verification cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Input&lt;/th&gt;
&lt;th&gt;Old output&lt;/th&gt;
&lt;th&gt;New output&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;| --- | --- |&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;|, - |, - |&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;| --- | --- |&lt;/code&gt; ✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;before---after&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;before, -after&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;before---after&lt;/code&gt; ✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;|----|&lt;/code&gt; (4 dashes)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;|, , |&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;|----|&lt;/code&gt; ✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;word--word&lt;/code&gt; (real em-dash use)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;word, word&lt;/code&gt; ✓&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;word, word&lt;/code&gt; ✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;word -- word&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;word, word&lt;/code&gt; ✓&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;word, word&lt;/code&gt; ✓&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The lookbehind says "don't match if the previous character is a dash." The lookahead says "don't match if the next character is a dash." So &lt;code&gt;--&lt;/code&gt; only fires when it's standalone (or surrounded by whitespace), never inside a longer hyphen run.&lt;/p&gt;

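&lt;p&gt;Tables survive. Real em-dash stand-ins still get scrubbed. Separators built from three or more dashes (&lt;code&gt;---&lt;/code&gt;, &lt;code&gt;----&lt;/code&gt;, and longer runs) are now safe. One gap remains: GFM also accepts a two-dash delimiter cell like &lt;code&gt;| -- |&lt;/code&gt;, and the fixed rule still rewrites that to &lt;code&gt;|, |&lt;/code&gt;, so the fix relies on tables using at least three dashes per cell.&lt;/p&gt;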

&lt;h2&gt;
  
  
  Three lessons I'm taking forward
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Markdown is not a string
&lt;/h3&gt;

&lt;p&gt;Markdown contains structural delimiters that look like prose. My scrubber treated &lt;code&gt;---&lt;/code&gt; as ASCII punctuation when it was actually a table separator. The fix isn't just my one regex; it's a category. Anywhere the pipeline regex-mutates markdown, I now mentally check: could this match a delimiter token?&lt;/p&gt;

&lt;p&gt;Similar landmines I added test cases for after this bug:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;**bold**&lt;/code&gt; (asterisks used as emphasis)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;`code`&lt;/code&gt; (backticks)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[link](url)&lt;/code&gt; (the URL might contain characters my scrubber strips)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;gt; blockquote&lt;/code&gt; (the leading &lt;code&gt;&amp;gt;&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;# heading&lt;/code&gt; (the leading &lt;code&gt;#&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The principle: regex on markdown is regex on a parser input, not on prose.&lt;/p&gt;
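
&lt;p&gt;One way to turn that list into actual test cases is a small fixture check: run every scrubber rule over a list of markdown delimiter samples and report any rule that changes one. This is a minimal sketch assuming rules shaped like the &lt;code&gt;{ pattern, replacement, flag }&lt;/code&gt; objects quoted earlier; &lt;code&gt;ScrubRule&lt;/code&gt;, &lt;code&gt;delimiterFixtures&lt;/code&gt; and &lt;code&gt;delimiterViolations&lt;/code&gt; are hypothetical names, and the em/en-dash rule is written with &lt;code&gt;\u2013&lt;/code&gt;/&lt;code&gt;\u2014&lt;/code&gt; escapes instead of literal dashes.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Hypothetical rule shape, mirroring the { pattern, replacement, flag } objects quoted above.
type ScrubRule = { pattern: RegExp; replacement: string; flag: string };

const rules: ScrubRule[] = [
  // Em/en-dash rule from the post, with the dashes written as \u escapes.
  { pattern: /[ \t]*[\u2013\u2014][ \t]*/g, replacement: ", ", flag: "em/en-dash → comma" },
  // The original (pre-fix) double-hyphen rule.
  { pattern: /[ \t]*--[ \t]*/g, replacement: ", ", flag: "double-hyphen → comma" },
];

// Markdown delimiter samples that no prose scrubber should ever change.
const delimiterFixtures: string[] = [
  "| --- | :---: | ---: |",
  "**bold**",
  "`code`",
  "[link](https://example.com/some--slug)",
  "> blockquote",
  "# heading",
];

// Returns "flag :: fixture" for every rule that alters a fixture.
function delimiterViolations(ruleSet: ScrubRule[], fixtures: string[]): string[] {
  const violations: string[] = [];
  for (const rule of ruleSet) {
    for (const fixture of fixtures) {
      if (fixture.replace(rule.pattern, rule.replacement) !== fixture) {
        violations.push(`${rule.flag} :: ${fixture}`);
      }
    }
  }
  return violations;
}

console.log(delimiterViolations(rules, delimiterFixtures));
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the pre-fix double-hyphen rule, this flags two fixtures: the table separator and the link whose URL contains &lt;code&gt;--&lt;/code&gt;. The lookaround fix only protects runs of three or more dashes, so it would still rewrite a URL like &lt;code&gt;some--slug&lt;/code&gt;; that is the kind of case the link fixture exists to catch.&lt;/p&gt;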

&lt;h3&gt;
  
  
  2. Multi-step pipelines need step-level diffs
&lt;/h3&gt;

&lt;p&gt;I had been using the pipeline for months. The bug existed for months. I caught it in 90 seconds when I could inspect the input and output of each phase as separate files.&lt;/p&gt;

&lt;p&gt;Black-box pipelines hide their failures. The fix for that is mechanical: dump every step's I/O to disk when running in debug mode. The whole "manual LLM cache" pattern I'd built for cost reasons turned out to be a debugging multiplier I hadn't expected. A regex scrubbing bug inside the tone-QA phase was invisible from the published output but completely obvious in the on-disk diff.&lt;/p&gt;

&lt;p&gt;If you have any AI pipeline with more than two phases, build the I/O snapshot now. You'll find a bug.&lt;/p&gt;
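
&lt;p&gt;Here is a minimal sketch of that snapshot habit, assuming each phase can be modeled as a named text-to-text function. The &lt;code&gt;Phase&lt;/code&gt; and &lt;code&gt;Snapshot&lt;/code&gt; types, the three helpers, and the &lt;code&gt;PIPELINE_DEBUG_DIR&lt;/code&gt; environment variable are all hypothetical rather than the pipeline's real code: &lt;code&gt;runWithSnapshots&lt;/code&gt; records each phase's input and output, &lt;code&gt;writeSnapshots&lt;/code&gt; dumps them to disk with Node's &lt;code&gt;fs&lt;/code&gt; module, and &lt;code&gt;firstPhaseDropping&lt;/code&gt; names the first phase whose output lost a string its input contained.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical shapes: a phase is a named text-to-text transform.
type Phase = { name: string; run: (input: string) => string };
type Snapshot = { phase: string; input: string; output: string };

// Runs the phases in order and keeps every phase's input and output.
function runWithSnapshots(phases: Phase[], draft: string): Snapshot[] {
  const snapshots: Snapshot[] = [];
  let current = draft;
  for (const phase of phases) {
    const output = phase.run(current);
    snapshots.push({ phase: phase.name, input: current, output });
    current = output;
  }
  return snapshots;
}

// Debug mode: write NN-phase.input.md / NN-phase.output.md pairs so any two can be diffed.
function writeSnapshots(dir: string, snapshots: Snapshot[]): void {
  mkdirSync(dir, { recursive: true });
  snapshots.forEach((snap, i) => {
    const prefix = `${String(i + 1).padStart(2, "0")}-${snap.phase}`;
    writeFileSync(join(dir, `${prefix}.input.md`), snap.input);
    writeFileSync(join(dir, `${prefix}.output.md`), snap.output);
  });
}

// Names the first phase whose output lost a string that its input still contained.
function firstPhaseDropping(snapshots: Snapshot[], probe: string): string | undefined {
  const culprit = snapshots
    .filter((snap) => snap.input.includes(probe))
    .find((snap) => !snap.output.includes(probe));
  return culprit?.phase;
}

// Toy phases standing in for the real pipeline (hypothetical).
const phases: Phase[] = [
  { name: "write", run: (s) => s },
  { name: "tone-qa", run: (s) => s.replace(/[ \t]*--[ \t]*/g, ", ") },
  { name: "link-rewire", run: (s) => s },
];

const snapshots = runWithSnapshots(phases, "| Tool | Pricing |\n| --- | --- |");
const debugDir = process.env.PIPELINE_DEBUG_DIR; // hypothetical debug switch
if (debugDir) writeSnapshots(debugDir, snapshots);
console.log(firstPhaseDropping(snapshots, "| --- |")); // tone-qa
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The probe check is just an automated version of reading the on-disk diff: the first phase whose input still has &lt;code&gt;| --- |&lt;/code&gt; but whose output does not is the one to open.&lt;/p&gt;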

&lt;h3&gt;
  
  
  3. Visual output requires visual checking
&lt;/h3&gt;

&lt;p&gt;Every other phase in the pipeline (fact-check, tone-QA structural rules, citability scoring) was producing telemetry numbers. Word counts. Citation match counts. Score breakdowns. The table corruption produced no numeric signal: the article still hit its word count, still had headings, still had a "key takeaways" block. Nothing flagged.&lt;/p&gt;

&lt;p&gt;Now my pipeline includes a structural-presence check: if a brief's &lt;code&gt;tableSpec&lt;/code&gt; was set, the final markdown is checked for a GFM table (&lt;code&gt;| ... |&lt;/code&gt; line followed by &lt;code&gt;| --- |&lt;/code&gt; separator). The validator surfaces &lt;code&gt;missingRequiredTable: true&lt;/code&gt; if it's absent.&lt;/p&gt;
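
&lt;p&gt;A minimal sketch of that check, assuming the brief exposes an optional &lt;code&gt;tableSpec&lt;/code&gt; and the validator returns a flat report object. Only &lt;code&gt;tableSpec&lt;/code&gt; and &lt;code&gt;missingRequiredTable&lt;/code&gt; come from the description above; &lt;code&gt;Brief&lt;/code&gt;, &lt;code&gt;hasGfmTable&lt;/code&gt; and &lt;code&gt;checkStructure&lt;/code&gt; are hypothetical. The delimiter-row pattern follows GFM (hyphen cells with optional alignment colons), and the check requires a pipe on that row so a bare &lt;code&gt;---&lt;/code&gt; line does not count.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Assumed shapes: only `tableSpec` and `missingRequiredTable` come from the post.
interface Brief {
  tableSpec?: unknown;
}

interface StructuralReport {
  missingRequiredTable: boolean;
}

// Header row: starts and ends with a pipe, e.g. "| Tool | Pricing |".
const TABLE_ROW = /^\s*\|.*\|\s*$/;
// GFM delimiter row: hyphen cells with optional alignment colons, e.g. "| --- | :---: |".
const DELIMITER_ROW = /^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$/;

// True if some pipe row is immediately followed by a delimiter row.
function hasGfmTable(markdown: string): boolean {
  const lines = markdown.split("\n");
  return lines.some((line, i) => {
    const next = lines[i + 1] ?? "";
    if (!TABLE_ROW.test(line)) return false;
    // Require a pipe so a bare "---" (thematic break or setext underline) never counts.
    if (!next.includes("|")) return false;
    return DELIMITER_ROW.test(next);
  });
}

// Flags the article only when the brief asked for a table and none survived.
function checkStructure(brief: Brief, markdown: string): StructuralReport {
  const tableRequired = brief.tableSpec !== undefined;
  return { missingRequiredTable: tableRequired ? !hasGfmTable(markdown) : false };
}

// Hypothetical brief: any defined tableSpec means "this article needs a table".
const brief: Brief = { tableSpec: {} };
console.log(checkStructure(brief, "| Tool | Pricing |\n| --- | --- |")); // { missingRequiredTable: false }
console.log(checkStructure(brief, "| Tool | Pricing |\n|, - |, - |")); // { missingRequiredTable: true }
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The corrupted row &lt;code&gt;|, - |, - |&lt;/code&gt; fails the delimiter pattern, so the exact damage from this bug raises the flag.&lt;/p&gt;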

&lt;p&gt;Three months of writing tables, none of them rendering as actual tables in the published HTML, and zero alerts. Visual outputs need visual checks.&lt;/p&gt;

&lt;h2&gt;
  
  
  The takeaway
&lt;/h2&gt;

&lt;p&gt;The bug is a one-liner. The lesson is that every regex I touch from now on lives next to a test case for the delimiters it could accidentally match.&lt;/p&gt;

&lt;p&gt;If you run an AI content pipeline of any kind, search your codebase for &lt;code&gt;--&lt;/code&gt;, the literal em-dash &lt;code&gt;—&lt;/code&gt;, and the &lt;code&gt;&amp;amp;mdash;&lt;/code&gt; entity. Make sure none of those scrubbers can match a markdown table separator. You'll find one. Promise.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I run &lt;a href="https://seohive.io" rel="noopener noreferrer"&gt;SeoHive&lt;/a&gt;, a productized AI SEO platform. The bug was real, the articles are live with the fix in place, and the pipeline now publishes 30 articles a month without eating its own tables.&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>seo</category>
      <category>nextjs</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
