
Aribu js

Originally published at shcho-i-yak.pp.ua

LLM Content Engineering: How to Write for AI Search in 2026

BLUF - Bottom Line Up Front (Part 2 of the GEO/SEO 2026 series)

  • Problem: Your robots.txt is open and AI bots are crawling your site, but ChatGPT and Perplexity rarely cite content padded with old-school SEO filler.
  • Root cause: RAG pipelines rank sources by information density - the concentration of verified facts, numbers, and metrics per unit of text.
  • Solution: Three content engineering techniques: boost fact density → pack data into HTML tables → apply BLUF structure to every H2 section.
  • In this article: before/after examples, ready-to-use HTML table code with mobile responsiveness, and a pre-publish checklist.
  • Part 1 of the series: Technical GEO Architecture: robots.txt, JSON-LD, and Semantic HTML

Why an open robots.txt is no longer enough

In Part 1 of this series, we covered the technical foundation: opening AI crawlers, implementing JSON-LD schema, and switching to semantic HTML. But there's a problem that no amount of technical configuration can solve.

Imagine two competitors. Both are open to GPTBot and PerplexityBot. Both have Article and FAQPage schema. Yet Perplexity only cites one of them. Why?

The answer lies in the content itself. A RAG (retrieval-augmented generation) pipeline fetches the page, extracts its text, and scores passages by how useful they are for answering the user's query. Only the top-scoring passages make it into the model's context window. Passages with concrete facts, numbers, and verified metrics score high. Filler scores low or zero.
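To make "scoring" concrete, here is a minimal sketch of passage ranking. Real engines use embedding models or BM25, and their exact scoring isn't public. This version assumes plain query-term overlap as the relevance signal. `score_passage` and `rank_passages` are hypothetical names.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase runs of letters and digits; deliberately crude."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score_passage(query: str, passage: str) -> float:
    """Fraction of distinct query terms that also occur in the passage (0.0 to 1.0)."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(passage))) / len(query_terms)


def rank_passages(query: str, passages: list[str]) -> list[tuple[float, str]]:
    """Return (score, passage) pairs, best match first."""
    scored = [(score_passage(query, p), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


passages = [
    "In today's digital world, it is critically important to improve website performance.",
    "A 100ms increase in TTFB reduces conversion by 7%.",
]
# The sentence that names TTFB and conversion outranks the filler sentence
for score, text in rank_passages("how does TTFB affect conversion", passages):
    print(f"{score:.2f}  {text}")
```

Lexical overlap is the simplest possible signal. The point is that retrieval ranks each passage on its own, so a passage without query-relevant specifics has nothing to score on.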


1. Information Density vs. SEO Filler: The Math of Getting Cited

Information density is the number of verified facts, concrete figures, and metrics per 100 words of text. LLMs are optimized to minimize hallucinations - they favor sources where every sentence carries maximum informational payload.

Before/After: The Same Idea, Two Different Densities

❌ Old-school SEO writing (low density):
"In today's digital world, it is critically important to improve
website performance, as page load speed can significantly impact
visitor behavior and their overall experience on the resource..."

✅ LLM-ready writing (high density):
"A 100ms increase in TTFB reduces conversion by 7% (Deloitte Digital, 2024).
RAG parsers skip domains with TTFB > 500ms due to context timeout -
a static Eleventy site achieves TTFB < 50ms and stays out of the discard pile."

Both run roughly 30-40 words. The first contains 0 verified facts. The second contains 3 concrete metrics with a source. For LLM scoring, that difference is enormous.
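You can approximate this difference mechanically. The sketch below is a rough proxy, not any engine's metric. It assumes that every whitespace-separated token containing a digit (7%, 100ms, 2024) counts as one "fact marker", and it cannot tell a verified number from an invented one. `fact_density` is a hypothetical helper name.

```python
def fact_density(text: str) -> float:
    """Digit-bearing tokens per 100 words: a crude stand-in for 'verified facts per 100 words'."""
    words = text.split()
    if not words:
        return 0.0
    markers = sum(any(ch.isdigit() for ch in word) for word in words)
    return 100 * markers / len(words)


low = "In today's digital world, it is critically important to improve website performance."
high = "A 100ms increase in TTFB reduces conversion by 7% (Deloitte Digital, 2024)."
print(fact_density(low))   # 0.0
print(fact_density(high))  # 25.0
```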

The One-Fact-Per-200-Words Rule

A practical rule of thumb for LLM-oriented content engineering: every 200 words should contain at least one verified element:

| Element type | Example | LLM weight |
|---|---|---|
| Specific number with year | "47% more often, 2025" | ✅ High |
| Research reference | "(Surfer SEO, 2025)" | ✅ High |
| Technical metric | "TTFB < 50ms" | ✅ High |
| Version or release date | "Firebase SDK 10.12.0" | 🟡 Medium |
| Unsourced general claim | "sites got faster" | ❌ Low |
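The rule is easy to lint before publishing. This minimal sketch assumes consecutive, non-overlapping 200-word windows. Like the density check, it treats any digit-bearing token as the "verified element". It flags sparse stretches but cannot check whether a source actually exists. `sparse_windows` is a hypothetical name.

```python
def sparse_windows(text: str, window: int = 200) -> list[int]:
    """Word offsets of consecutive `window`-word chunks that contain no digit-bearing token."""
    words = text.split()
    flagged = []
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        if not any(any(ch.isdigit() for ch in word) for word in chunk):
            flagged.append(start)
    return flagged


draft = "filler " * 400 + "TTFB stays under 50ms on the static build."
print(sparse_windows(draft))  # [0, 200]
```

Each returned offset marks a stretch that still needs a number, a version, or a cited source.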

2. Table Dominance: HTML vs. Markdown and Why It Matters

According to Surfer SEO research (2025), pages with HTML tables are cited in Perplexity 47% more often than pages with equivalent text-only content. The reason is technical - an LLM reads a <table> matrix as a token array with clear cell boundaries, making it far easier to extract specific values into a response.
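Perplexity's parser isn't public, and real pipelines may flatten tables to text before the model sees them. Still, the benefit of explicit cell boundaries is easy to demonstrate with Python's standard-library `html.parser`: once the cells are delimited, pulling out one value is a dictionary lookup. This is a minimal sketch. `TableExtractor` is a hypothetical class.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <th>/<td> cell, one list per <tr>."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._row = None   # cells of the <tr> being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


table_html = """<table>
  <thead><tr><th>Platform</th><th>TTFB</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>WordPress (shared)</td><td>800+ ms</td><td>$9/mo</td></tr>
    <tr><td>Eleventy + NVMe VPS</td><td>&lt;50ms</td><td>$5–10/mo</td></tr>
  </tbody>
</table>"""

parser = TableExtractor()
parser.feed(table_html)
parser.close()
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records[1]["TTFB"])  # <50ms
```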

Markdown Table vs. HTML: The Risk

❌ Markdown table - rendering depends on the SSG parser:
| Platform  | TTFB   | Price  |
|-----------|--------|--------|
| WordPress | 800ms  | $9/mo  |

If Eleventy (or another SSG) converts this to <table> - great.
If it stays as plain text - the LLM just sees a mess of pipes and dashes.

✅ HTML table - guaranteed parsing for any crawler:
<table>
  <thead><tr><th>Platform</th><th>TTFB</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>WordPress (shared)</td><td>800+ ms</td><td>$9/mo</td></tr>
    <tr><td>Eleventy + NVMe VPS</td><td>&lt;50ms</td><td>$5–10/mo</td></tr>
  </tbody>
</table>
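If you'd rather not depend on the SSG's Markdown settings at all, you can convert pipe tables to HTML yourself before they reach the template. This minimal sketch covers the simple case only. It assumes a header row, a |---| separator row, no escaped pipes, and no inline Markdown inside cells. `markdown_table_to_html` is a hypothetical helper, not part of Eleventy.

```python
import html


def _cells(line: str) -> list[str]:
    """Split one pipe-table row into trimmed cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def markdown_table_to_html(md: str) -> str:
    """Convert a simple pipe table (header, separator, body rows) into an HTML <table>."""
    lines = [line for line in md.strip().splitlines() if line.strip()]
    header, body = _cells(lines[0]), [_cells(line) for line in lines[2:]]  # lines[1] is |---|
    head = "".join(f"<th>{html.escape(c)}</th>" for c in header)
    out = ["<table>", f"  <thead><tr>{head}</tr></thead>", "  <tbody>"]
    for row in body:
        out.append("    <tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>")
    out += ["  </tbody>", "</table>"]
    return "\n".join(out)


md_table = """
| Platform  | TTFB   | Price  |
|-----------|--------|--------|
| WordPress | 800ms  | $9/mo  |
"""
print(markdown_table_to_html(md_table))
```

`html.escape` turns a value like `<50ms` into `&lt;50ms`, the same escaping the hand-written table above uses.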

Mobile Responsiveness Without Breaking Parsability

The core dilemma: a full HTML table is ideal for LLMs, but on a 375px mobile screen it's a disaster. The solution is a CSS card transformation using data-label attributes:

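/* Mobile: table → cards */
@media (max-width: 600px) {
  /* Let the table and its body flow as plain blocks so each row card can span the full width */
  .cmp-table, .cmp-table tbody { display: block; }
  .cmp-table thead { display: none; }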

  .cmp-table tr {
    display: block;
    border: 1px solid #30363d;
    border-radius: 6px;
    margin-bottom: 12px;
  }

  .cmp-table td {
    display: flex;
    justify-content: space-between;
    border-bottom: 1px solid #21262d;
    padding: 6px 10px;
  }

  /* Show column name before the value */
  .cmp-table td::before {
    content: attr(data-label);
    font-weight: 600;
    color: #8b949e;
    margin-right: 8px;
  }
}
<!-- Row inside <table class="cmp-table">: data-label feeds the mobile CSS, crawlers still see a normal <td> -->
<tr>
  <td data-label="Platform">Eleventy + NVMe VPS</td>
  <td data-label="TTFB">&lt;50ms</td>
  <td data-label="Price">$5–10/mo</td>
</tr>

The result: on desktop, a classic table that an LLM parses cleanly; on mobile, cards where every value is labeled. The media query only changes how the table is displayed, so the semantic <table> markup a crawler downloads is identical on every device.
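Typing data-label by hand on every cell invites drift between a label and its <th>. One way to avoid that is to generate the markup so each label is copied from the header. This sketch assumes your table data starts as plain lists. `cmp_table` is a hypothetical helper, and the `cmp-table` class matches the CSS above.

```python
from html import escape


def cmp_table(headers: list[str], rows: list[list[str]]) -> str:
    """Render a <table class="cmp-table"> whose <td> cells carry data-label copied from the headers."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headers)
    lines = ['<table class="cmp-table">', f"  <thead><tr>{head}</tr></thead>", "  <tbody>"]
    for row in rows:
        # A short or long row would shift every label after it, so fail loudly instead
        if len(row) != len(headers):
            raise ValueError(f"expected {len(headers)} cells, got {len(row)}: {row!r}")
        cells = "".join(
            f'<td data-label="{escape(label)}">{escape(value)}</td>'
            for label, value in zip(headers, row)
        )
        lines.append(f"    <tr>{cells}</tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


print(cmp_table(
    ["Platform", "TTFB", "Price"],
    [["WordPress (shared)", "800+ ms", "$9/mo"],
     ["Eleventy + NVMe VPS", "<50ms", "$5–10/mo"]],
))
```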


3. Deep BLUF: Every H2 Section as a Self-Contained Semantic Node

In Part 1 we talked about BLUF at the article level. But a RAG pipeline often retrieves only the relevant fragment rather than the full page: the H2 section that most closely matches the user's query. That means every section needs to work as a self-sufficient semantic node.
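How each engine chunks a page isn't documented, but splitting at headings is a common strategy. This minimal sketch shows what a heading-based retriever would get: the heading plus that section's text, with nothing from neighboring sections. It assumes well-formed HTML from your own templates, because regex is not a general HTML parser. `h2_chunks` is a hypothetical name.

```python
import html
import re

H2 = re.compile(r"<h2(?:\s[^>]*)?>(.*?)</h2>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Strip tags, decode entities, and collapse whitespace."""
    return " ".join(html.unescape(TAG.sub(" ", fragment)).split())


def h2_chunks(page_html: str) -> list[tuple[str, str]]:
    """Return (heading, section text) for every <h2>; content before the first <h2> is ignored."""
    matches = list(H2.finditer(page_html))
    chunks = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(page_html)
        chunks.append((_text(match.group(1)), _text(page_html[match.end():end])))
    return chunks


page = (
    "<h1>LLM Content Engineering</h1><p>Intro...</p>"
    "<section><h2>Why HTML Tables Get Cited</h2>"
    "<p><strong>Short answer:</strong> cells have explicit boundaries.</p></section>"
    "<section><h2>Mobile Tables</h2><p>Use data-label attributes.</p></section>"
)
for heading, text in h2_chunks(page):
    print(f"{heading!r}: {text!r}")
```

The second chunk carries no context from the first. If its opening sentence doesn't answer the query on its own, nothing else in the chunk will.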

Anatomy of a GEO-Optimized H2 Section

<section>

  <!-- 1. Heading - a natural long-tail query -->
  <h2>Why HTML Tables Get 47% More Citations in Perplexity</h2>

  <!-- 2. BLUF sentence - a ready answer to the query intent -->
  <p>
    <strong>Short answer:</strong> LLMs read a &lt;table&gt; matrix as a
    structured token array - extracting a specific value from a cell
    is far more efficient than parsing the same fact from running text.
  </p>

  <!-- 3. Technical depth for the human reader -->
  <p>Detailed explanation, context, nuances...</p>

  <!-- 4. Structured data - gold for the parser -->
  <table>...</table>

  <!-- 5. Code or specific commands if applicable -->
  <pre><code>...</code></pre>

</section>

Comparing Structures: Bad vs. Good H2 Section

| Element | ❌ Old SEO approach | ✅ GEO approach |
|---|---|---|
| H2 heading | "Benefits of HTML Tables" | "Why HTML Tables Get 47% More Citations" |
| First paragraph | "In this section we will explore..." | One sentence with a ready answer + a number |
| Main content | Wall of keyword-stuffed text | Text + table or code block |
| Section end | Smooth transition to the next section | Micro-conclusion or a specific next step |
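The GEO column can be linted. This minimal sketch assumes the <strong>Short answer:</strong> convention from the anatomy above, and it treats "contains a digit" as the test for "+ a number". The regexes are fine for your own templates but not for arbitrary HTML. `bluf_report` is a hypothetical name.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
H2_BLOCK = re.compile(r"<h2(?:\s[^>]*)?>(.*?)</h2>(.*?)(?=<h2[\s>]|\Z)", re.IGNORECASE | re.DOTALL)
FIRST_P = re.compile(r"<p(?:\s[^>]*)?>(.*?)</p>", re.IGNORECASE | re.DOTALL)  # does not match <pre>
TAG = re.compile(r"<[^>]+>")


def _text(fragment: str) -> str:
    """Strip tags and collapse whitespace."""
    return " ".join(TAG.sub(" ", fragment).split())


def bluf_report(page_html: str) -> dict[str, list[str]]:
    """Map each <h2> heading to the BLUF checks its first paragraph fails (empty list = pass)."""
    report = {}
    # Drop HTML comments first so they can't sit between a heading and its paragraph
    for block in H2_BLOCK.finditer(COMMENT.sub("", page_html)):
        heading, body = _text(block.group(1)), block.group(2)
        first_p = FIRST_P.search(body)
        if first_p is None:
            report[heading] = ["no paragraph under the heading"]
            continue
        text = _text(first_p.group(1))
        problems = []
        if not text.lower().startswith("short answer:"):
            problems.append("does not open with 'Short answer:'")
        if not any(ch.isdigit() for ch in text):
            problems.append("no number in the first paragraph")
        report[heading] = problems
    return report


page = """
<section><h2>Why HTML Tables Get 47% More Citations</h2>
<!-- BLUF sentence -->
<p><strong>Short answer:</strong> a cell holds 1 value with explicit boundaries.</p></section>
<section><h2>Benefits of HTML Tables</h2><p>In this section we will explore...</p></section>
"""
for heading, problems in bluf_report(page).items():
    print(heading, "->", problems or "OK")
```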

4. Hard E-E-A-T Signals That AI Cannot Fake

ChatGPT and Claude can rewrite dry theory in seconds. But there's a class of content they cannot produce on their own: live operational experience from a specific person in a specific environment.

Three Uniqueness Anchors for GEO

1. Real terminal output

# Live TTFB measurement - this result only exists on your server
$ curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" \
  https://your-site.com/posts/your-article/
TTFB: 0.048s

# For comparison - a typical WordPress on shared hosting
$ curl -o /dev/null -s -w "TTFB: %{time_starttransfer}s\n" \
  https://example-wp-blog.com/
TTFB: 0.923s

An LLM cannot generate this stdout - it's unique and tied to your infrastructure at a specific point in time.
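You may want to record this number from a script, for example in a scheduled job that logs TTFB over time. Python's standard library can approximate it. This is a minimal sketch with stated assumptions. The timer covers DNS, TCP/TLS connect, and the wait for the status line and headers, so it lands close to curl's %{time_starttransfer} but not exactly on it. `measure_ttfb` is a hypothetical name.

```python
import http.client
import time
from urllib.parse import urlsplit


def measure_ttfb(url: str, timeout: float = 10.0) -> float:
    """Seconds from opening the connection until the response status line and headers arrive."""
    parts = urlsplit(url)
    conn_cls = http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    try:
        start = time.perf_counter()
        conn.request("GET", path)      # connects lazily: DNS + TCP (+ TLS) happen inside this call
        response = conn.getresponse()  # blocks until the status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                # drain the body before closing
        return elapsed
    finally:
        conn.close()


# Usage (placeholder URL from the curl example):
# print(f"TTFB: {measure_ttfb('https://your-site.com/posts/your-article/'):.3f}s")
```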

2. Specific bugs and environment quirks

Generic statements like "Eleventy is fast" can be rewritten by AI in seconds. But "when using eleventy-img with formats: ["avif", "webp"] and Sharp 0.33+, a memory leak occurs on Node.js 22 when processing > 200 images in parallel" - that's a specific bug in a specific environment. That level of detail is a primary-source signal for search algorithms.

3. Live operational metrics

Firebase Realtime Database - actual load from a personal blog
(May 2026, Spark Plan):

Monthly read operations:  47,832 / 400,000 limit (12%)
Storage:                  0.003 GB / 1 GB limit  (0.3%)
Peak traffic (comments):  8 concurrent connections / 100 limit
Monthly cost:             $0.00

These numbers are hard to fabricate convincingly: they're tied to a real project and verifiable in the Firebase Console. This is exactly the kind of first-hand evidence Google's E-E-A-T guidelines describe as "Experience", and the kind AI search cites as a primary source.
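Numbers like these are worth a quick sanity check, because the ratios show which free-tier limit you would hit first as traffic grows. This small sketch uses only the figures above and assumes all three metrics grow together with traffic, which is only roughly true: storage grows with content, not visits. `QUOTAS`, `usage_pct`, and `headroom` are illustrative names.

```python
# Figures copied from the Firebase report above: (used, limit)
QUOTAS = {
    "Monthly read operations": (47_832, 400_000),
    "Storage (GB)": (0.003, 1.0),
    "Peak concurrent connections": (8, 100),
}


def usage_pct(used: float, limit: float) -> float:
    """Share of a quota already consumed, in percent."""
    return 100 * used / limit


def headroom(used: float, limit: float) -> float:
    """How many times current usage could grow before reaching the limit."""
    return limit / used


for name, (used, limit) in QUOTAS.items():
    print(f"{name:30} {usage_pct(used, limit):5.1f}% used, {headroom(used, limit):6.1f}x headroom")

# Under uniform growth, the quota with the least headroom is the one you hit first
first_limit = min(QUOTAS, key=lambda name: headroom(*QUOTAS[name]))
print("First limit reached:", first_limit)
```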


5. Pre-Publish Checklist

Before hitting publish, verify each item:

  • [ ] The introductory filler paragraph has been replaced with a BLUF sentence containing a concrete fact
  • [ ] Every 200 words contain at least one verified number or metric
  • [ ] Every comparison or feature set is formatted as an HTML <table>
  • [ ] Tables have data-label attributes for mobile CSS transformation
  • [ ] Every H2 section starts with a BLUF sentence (<strong>Short answer:</strong>)
  • [ ] At least one live experience element is present: terminal output, a versioned bug, a real metric
  • [ ] Links to primary sources (research, documentation, GitHub) are included
  • [ ] A FAQ section at the end with at least 3 naturally-phrased questions
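Four of these items can be checked mechanically. This minimal sketch uses Python's `html.parser` with stated assumptions. Any absolute http(s) link counts as a "source link". FAQ questions are the h2/h3 headings ending in "?" that follow a heading starting with "FAQ". `ChecklistScanner` and `checklist` are hypothetical names. The remaining items still need a human read.

```python
from html.parser import HTMLParser


class ChecklistScanner(HTMLParser):
    """Collects the checklist signals that can be read straight from the HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.tables = 0
        self.cells = 0
        self.labeled_cells = 0
        self.absolute_links = 0
        self.headings: list[str] = []  # text of every <h2>/<h3>, in document order
        self._heading_buf = None       # text fragments of the heading being read, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            self.tables += 1
        elif tag == "td":
            self.cells += 1
            if attrs.get("data-label"):
                self.labeled_cells += 1
        elif tag == "a" and (attrs.get("href") or "").startswith(("http://", "https://")):
            self.absolute_links += 1
        elif tag in ("h2", "h3"):
            self._heading_buf = []

    def handle_endtag(self, tag):
        if tag in ("h2", "h3") and self._heading_buf is not None:
            self.headings.append("".join(self._heading_buf).strip())
            self._heading_buf = None

    def handle_data(self, data):
        if self._heading_buf is not None:
            self._heading_buf.append(data)


def checklist(page_html: str) -> dict[str, bool]:
    """Return pass/fail for the mechanically checkable checklist items."""
    scanner = ChecklistScanner()
    scanner.feed(page_html)
    scanner.close()
    faq_at = next((i for i, h in enumerate(scanner.headings) if h.lower().startswith("faq")), None)
    faq_questions = 0 if faq_at is None else sum(h.endswith("?") for h in scanner.headings[faq_at + 1:])
    return {
        "html_table": scanner.tables > 0,
        "data_labels": scanner.cells > 0 and scanner.labeled_cells == scanner.cells,
        "source_links": scanner.absolute_links > 0,
        "faq_3_questions": faq_questions >= 3,
    }
```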

FAQ

Do texts written for LLMs become dry and boring for human readers?

No - and here's why: humans don't read filler either. People scan text looking for concrete answers. The BLUF + facts + tables structure improves UX for humans just as much as for AI parsers. Storytelling and narrative remain - they're simply separated from technical blocks with clean semantic tags.

Should I ditch keyword optimization entirely in favor of fact density?

No, it's not either/or. Keywords are still needed for classic SEO and for AI parsers to find the page for a given query in the first place. But in 2026, having a keyword present is no longer enough to get cited - you also need high density of verified facts surrounding it.

How many tables is optimal for one article?

A good rule of thumb: one table per comparison or feature set - typically 2-4 tables in a full technical article. More tables than text signals to Google that the content is "thin." Fewer than one per 1,500 words is a missed opportunity for structured citation.

How do I check whether my article is being cited in ChatGPT or Perplexity?

The simplest method: ask Perplexity a question your article should answer and see if your domain shows up under "Sources." For ChatGPT Search - the same thing via the web interface with search enabled. Systematic tracking and automation of this process will be covered in detail in Part 4 of the series.


What's Next in the GEO/SEO 2026 Series

Part 3 - GEO Automation in Eleventy: how to auto-generate JSON-LD schema, BLUF blocks, and mobile tables through Nunjucks templates - so you never write micro-markup by hand for each new article.

Part 4 - Measuring the GEO Effect: how to track how often your site is cited in ChatGPT, Perplexity, and Gemini; which metrics replace classic CTR; and how to build a monitoring dashboard without paid tools.
