DEV Community

Ievgenii Gryshkun

Originally published at angeo.dev

The Magento product description that AI can't read


You installed robots.txt rules for GPTBot and OAI-SearchBot. You generated llms.txt. You added Product schema. The AEO (answer engine optimization) audit is mostly green.

And ChatGPT still recommends your competitor.

There is a good chance the problem is not in any of those signals. It is in your product page itself — specifically in how Magento renders product content relative to what AI extraction systems actually process.


How your product page travels through an AI recommendation system

Magento HTML response
         ↓
AI crawler fetches raw HTML
(limited or no JavaScript rendering)
         ↓
Extraction layer parses text, headings, structured data
         ↓
Content chunking — splits page into retrievable units
         ↓
Relevance scoring — ranks chunks against query intent
         ↓
LLM answer synthesis — selects candidates for the response
         ↓
Citation / recommendation

The extraction layer is where most Magento stores lose. If your product description is collapsed behind a tab, or rendered only after JS executes — your product may never become a strong candidate for prompts like:

  • "best trail running shoes under €150"
  • "comfortable sneakers for all-day walking"
  • "running shoes with visible air cushioning"

This is not an indexing problem. It is a retrieval problem.
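The retrieval framing can be made concrete with a deliberately simplified Python sketch of the extraction, chunking and relevance-scoring stages in the diagram above. Real AI systems use their own parsers and embedding-based ranking; the word-overlap score, the 200-character chunk size and every function name here are assumptions for illustration only. What it does show: text that only exists after JavaScript runs never reaches the chunker, so no query can match it.

```python
import re
from typing import List


def extract_text(raw_html: str) -> str:
    """Approximate a non-rendering crawler: drop script/style bodies, strip tags."""
    no_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", raw_html)
    text = re.sub(r"<[^>]+>", " ", no_code)
    return re.sub(r"\s+", " ", text).strip()


def chunk(text: str, size: int = 200) -> List[str]:
    """Split text into word-aligned chunks of roughly `size` characters (assumed size)."""
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks


def score(chunk_text: str, query: str) -> float:
    """Toy relevance: fraction of distinct query words present in the chunk."""
    wanted = set(re.findall(r"\w+", query.lower()))
    present = set(re.findall(r"\w+", chunk_text.lower()))
    return len(wanted & present) / len(wanted) if wanted else 0.0


def best_score(raw_html: str, query: str) -> float:
    """Score of the best-matching chunk; 0.0 if no text was extracted."""
    return max((score(c, query) for c in chunk(extract_text(raw_html))), default=0.0)


if __name__ == "__main__":
    query = "running shoes with visible air cushioning"
    server_rendered = "<h1>Air Max 90</h1><p>Running shoes with visible Air cushioning.</p>"
    # Same product, but the description is written into the page by JavaScript
    js_rendered = ("<h1>Air Max 90</h1><div id='desc'></div>"
                   "<script>document.getElementById('desc').textContent = "
                   "'Running shoes with visible Air cushioning';</script>")
    print(best_score(server_rendered, query))  # 1.0
    print(best_score(js_rendered, query))      # only "air" from the title matches
```

The JavaScript version scores low not because its wording is worse but because the description never becomes extractable text.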


Content visibility is not binary

For AI extraction systems, content falls into at least four levels:

Level | How the content exists | Extraction reliability
1 | Server-rendered, early in HTML | High
2 | Server-rendered but hidden or collapsed | Reduced (parser-dependent)
3 | Present in DOM only after JS execution | Low
4 | Loaded async after user interaction | Very low

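Googlebot renders pages with a headless Chromium pipeline, so Levels 2 and 3 are usually "indexable enough" for SEO; content that appears only after a click (Level 4) generally is not, because Googlebot does not interact with the page. Most AI crawlers do not run a rendering pipeline like this. Even where partial rendering exists, JS-dependent content is significantly less reliable for AI extraction than server-rendered HTML.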
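One way to place a specific piece of product text on this scale is to compare the raw HTML (what curl returns) with the rendered DOM (for example, the output of headless Chrome's --dump-dom option). The Python sketch below does that with plain string checks. The function names, the 3000-character "early" threshold, and the use of position as a stand-in for "hidden or collapsed" are all assumptions: it cannot see CSS, and Level 4 content will not appear in either input unless something clicked first.

```python
import re


def visible_text(html: str) -> str:
    """Drop script/style bodies and tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    html = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", html).strip()


def estimate_level(snippet: str, raw_html: str, rendered_html: str,
                   early_chars: int = 3000) -> str:
    """Heuristically place `snippet` on the four-level visibility scale."""
    needle = re.sub(r"\s+", " ", snippet).strip().lower()
    raw_pos = visible_text(raw_html).lower().find(needle)
    if raw_pos != -1:
        # Server-rendered. Position stands in for "early" vs "buried";
        # CSS hiding (the other half of Level 2) is not detected here.
        return "1" if raw_pos < early_chars else "2"
    if needle in visible_text(rendered_html).lower():
        return "3"  # present only after JavaScript has run
    return "4 or missing"  # needs user interaction, or not on the page


if __name__ == "__main__":
    raw = "<nav>Menu</nav><p>Visible Air cushioning in the heel.</p>"
    rendered = raw + "<div>Sizes EU 38-47</div>"
    print(estimate_level("visible air cushioning", raw, rendered))  # 1
    print(estimate_level("Sizes EU 38-47", raw, rendered))          # 3
```

Using position as a proxy for Level 2 is the weakest part of this heuristic: a collapsed tab near the top of a short page would still report Level 1.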


What actually happens on a default Magento product page

Magento 2's standard layout wraps descriptions inside a tabs widget (data-mage-init='{"tabs": {...}}'). In most Luma configs:

  • The description is in the server-rendered HTML — but marked inactive, collapsed, and positioned deep in the document
  • It appears after hundreds of characters of navigation markup, widget JSON, form keys, and boilerplate

Run this to see what AI actually sees:

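curl -sL "https://yourstore.com/your-product-url.html" \
  | python3 -c "
import sys, re, html
raw = sys.stdin.read()
# Crude text extraction with no JavaScript: strip tags, decode entities,
# collapse whitespace. Inline <script> contents are not removed, so
# widget JSON stays in the output as noise.
text = re.sub(r'<[^>]+>', ' ', raw)
text = re.sub(r'\s+', ' ', html.unescape(text)).strip()
print(text[:3000])
"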

What you want: product description text appearing clearly in the first 3000 characters.

What commonly appears: product name, price, SKU, breadcrumb, button text, widget boilerplate — and very little substantive content.
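The manual check can be turned into a reusable function for running across many product URLs. This Python sketch (standard library only; extracted_text and find_description are hypothetical names) reports where a known sentence from the description first appears in the extracted text and whether that falls inside the first 3000 characters. Unlike the one-liner, it drops script and style bodies, on the assumption that an extractor skips them, so its offsets will be smaller than the curl output suggests.

```python
import re
from html.parser import HTMLParser
from typing import List, Tuple


class TextExtractor(HTMLParser):
    """Collect visible text from raw HTML, skipping script and style bodies."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp; etc. in text
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extracted_text(raw_html: str) -> str:
    parser = TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def find_description(raw_html: str, sentence: str, limit: int = 3000) -> Tuple[int, bool]:
    """Return (offset of `sentence` in extracted text, True if offset < limit).

    The offset is -1 when the sentence is not in the server-rendered HTML at all.
    """
    text = extracted_text(raw_html).lower()
    needle = re.sub(r"\s+", " ", sentence).strip().lower()
    offset = text.find(needle)
    return offset, 0 <= offset < limit


if __name__ == "__main__":
    page = ("<h1>Air Max 90</h1><script>var cfg = {};</script>"
            "<p>Visible Air cushioning in the heel.</p>")
    print(find_description(page, "visible air cushioning"))  # (11, True)
```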


Three approaches to fix it

Approach 1 — Make the description tab active by default

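Lowest effort. Sets the description tab as active on load so the content is not hidden at the CSS level. Whether this argument takes effect depends on your theme: the template rendering product.info.details has to pass it through to the tabs widget, so check the rendered page after deploying.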

<!-- app/design/frontend/YourVendor/YourTheme/Magento_Catalog/layout/catalog_product_view.xml -->
<referenceBlock name="product.info.details">
    <arguments>
        <argument name="config" xsi:type="array">
            <item name="settings" xsi:type="array">
                <item name="active" xsi:type="boolean">true</item>
            </item>
        </argument>
    </arguments>
</referenceBlock>

Keeps the tab UI intact for visitors. The description goes from collapsed to visible on load, but it still sits inside the tabs markup deep in the document, so it remains Level 2 content.
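Because Luma-style tabs are configured through a data-mage-init attribute, you can check which tabs config a deployed page actually ships by parsing that attribute out of the raw HTML. A minimal Python sketch follows; tabs_configs is a hypothetical helper, and it reports only what the server sent, not the state the widget reaches after JavaScript runs.

```python
import json
from html.parser import HTMLParser
from typing import List


class MageInitCollector(HTMLParser):
    """Collect the "tabs" entry of every data-mage-init attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.tabs: List[dict] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "data-mage-init" or not value:
                continue
            try:
                # HTMLParser has already decoded entities such as &quot; here
                config = json.loads(value)
            except json.JSONDecodeError:
                continue
            if isinstance(config, dict) and "tabs" in config:
                self.tabs.append(config["tabs"])


def tabs_configs(raw_html: str) -> List[dict]:
    collector = MageInitCollector()
    collector.feed(raw_html)
    collector.close()
    return collector.tabs
```

Feed it the curl output for a product page and compare the result before and after the layout change.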


Approach 2 — Render description directly above tabs

Moves description to Level 1 — early in the HTML, clearly associated with the product, before any tabs infrastructure.

<!-- catalog_product_view.xml override in your theme -->
<referenceContainer name="product.info.main">
    <block class="Magento\Catalog\Block\Product\View\Description"
           name="product.description.above.tabs"
           template="YourVendor_Theme::product/description-visible.phtml"
           after="product.info.price"/>
</referenceContainer>
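<?php
// view/frontend/templates/product/description-visible.phtml
/** @var \Magento\Catalog\Block\Product\View\Description $block */
$product = $block->getProduct();
$description = $product->getDescription();
if (!$description) {
    return;
}
// Process CMS directives ({{media}}, {{widget}}) as Magento's default
// attribute template does, instead of printing the raw attribute value
$outputHelper = $this->helper(\Magento\Catalog\Helper\Output::class);
?>
<section class="product-description-aeo" itemprop="description">
    <?= /* @noEscape */ $outputHelper->productAttribute($product, $description, 'description') ?>
</section>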

Keep the tabs further down for human navigation — description appears in both places, but the extraction-friendly version is unconditional.
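After deploying the override, you can confirm that the new section really precedes the tabs widget in the raw HTML. This Python sketch assumes the product-description-aeo class from the template above and treats the first element whose data-mage-init mentions "tabs" as the start of the widget; both markers are assumptions to adjust for your theme, and description_before_tabs is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import Optional


class OrderFinder(HTMLParser):
    """Record the start-tag index at which each marker element first appears."""

    def __init__(self, description_class: str) -> None:
        super().__init__()
        self.description_class = description_class
        self.counter = 0
        self.description_at: Optional[int] = None
        self.tabs_at: Optional[int] = None

    def handle_starttag(self, tag, attrs):
        self.counter += 1
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.description_at is None and self.description_class in classes:
            self.description_at = self.counter
        if self.tabs_at is None and '"tabs"' in (attr_map.get("data-mage-init") or ""):
            self.tabs_at = self.counter


def description_before_tabs(raw_html: str,
                            description_class: str = "product-description-aeo") -> bool:
    finder = OrderFinder(description_class)
    finder.feed(raw_html)
    finder.close()
    if finder.description_at is None:
        return False  # the section is missing from the server response
    return finder.tabs_at is None or finder.description_at < finder.tabs_at
```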


Approach 3 — Render all tab content into the initial HTML

The most comprehensive fix. All tab content is in the initial HTML response; the tabs widget hides non-active tabs with CSS only.

<!-- Override of Magento_Catalog::product/view/details.phtml -->
<div class="product-tabs" data-mage-init='{"tabs":{"active":0,"openedState":"active"}}'>
    <?php foreach ($block->getGroupChildNames('detailed_info') as $name): ?>
    <?php
    // getGroupChildNames() returns layout names; the short alias is used for anchor ids
    $childBlock = $block->getLayout()->getBlock($name);
    $html = $childBlock ? $childBlock->toHtml() : '';
    if (!trim($html)) {
        continue; // skip tabs that render nothing
    }
    $alias = $block->getLayout()->getElementAlias($name);
    ?>

    <div class="data item title" data-role="collapsible">
        <a data-toggle="trigger" href="#tab-<?= $block->escapeHtmlAttr($alias) ?>">
            <?= $block->escapeHtml($childBlock->getTitle()) ?>
        </a>
    </div>

    <div class="data item content"
         id="tab-<?= $block->escapeHtmlAttr($alias) ?>"
         data-role="content">
        <?= /* @noEscape */ $html ?>
    </div>
    <?php endforeach; ?>
</div>

AI parsers reading raw HTML see everything. Visitors see only the active tab.
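A quick way to confirm this for a given product is to collect the text inside every data-role="content" panel straight from the raw HTML: a panel that comes back empty is still being filled in by JavaScript. The Python sketch assumes panels are marked with data-role="content" and an id, as in the template above, and that the markup closes its tags explicitly; panel_texts and the void-element handling are assumptions of this sketch, not Magento APIs.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Elements that never get a closing tag, so they must not affect nesting depth
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PanelText(HTMLParser):
    """Collect the visible text inside each element marked data-role="content"."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.panels: Dict[str, List[str]] = {}
        self._current: Optional[str] = None
        self._depth = 0
        self._in_code = False  # inside <script>/<style> within a panel

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._current is not None:
            self._depth += 1  # nested element inside the current panel
            if tag in ("script", "style"):
                self._in_code = True
            return
        attr_map = dict(attrs)
        if attr_map.get("data-role") == "content":
            self._current = attr_map.get("id") or f"panel-{len(self.panels)}"
            self.panels[self._current] = []
            self._depth = 1

    def handle_endtag(self, tag):
        if self._current is None or tag in VOID:
            return
        if tag in ("script", "style"):
            self._in_code = False
        self._depth -= 1
        if self._depth == 0:
            self._current = None  # left the panel

    def handle_data(self, data):
        if self._current is not None and not self._in_code:
            self.panels[self._current].append(data)


def panel_texts(raw_html: str) -> Dict[str, str]:
    """Map each panel id to its whitespace-normalised text."""
    parser = PanelText()
    parser.feed(raw_html)
    parser.close()
    return {pid: re.sub(r"\s+", " ", " ".join(parts)).strip()
            for pid, parts in parser.panels.items()}
```

Script bodies are ignored on purpose: a reviews tab that only contains a loader script has no text for a parser to extract, and this sketch reports it as empty.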


Hyva Theme

Hyva uses Alpine.js x-show directives, which toggle visibility in the browser. The description is typically present in the server HTML, but the Alpine markup and structural noise around it reduce extraction clarity.

The most reliable fix is to render the description once, outside the Alpine component, as clean server-side HTML early in the product layout. The template can follow the same pattern as the one in Approach 2.

<!-- In your Hyva theme's catalog_product_view.xml -->
<referenceContainer name="product.info.main">
    <block class="Magento\Catalog\Block\Product\View\Description"
           name="product.description.retrieval"
           template="YourVendor_Theme::product/description-server.phtml"
           after="product.info.price"/>
</referenceContainer>

Server-side JSON-LD — same rule applies

If your JSON-LD is injected via GTM or hydrated client-side, AI extraction systems often don't see it.

Schema delivery | Availability to AI parsers
Server-rendered in <head> | Reliable
Inline in <body> before tabs | Reliable
Injected by GTM after page load | Often unavailable
Hydrated by JS framework on mount | Often unavailable
Injected by a script loaded with async/defer | Inconsistent

Verify:

curl -sL "https://yourstore.com/your-product.html" \
  | grep -oE '"@type"[[:space:]]*:[[:space:]]*"Product"'

No output means there is no Product JSON-LD in the server-rendered HTML: it is either added client-side or missing entirely.
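A grep only confirms that a string exists. A sturdier check parses every application/ld+json block in the raw HTML and looks for a Product node, including nodes nested under @graph or declared with a list of types. The Python sketch below uses only the standard library; product_nodes and the other names are hypothetical, and fetching the page (for example with curl) is left out.

```python
import json
from html.parser import HTMLParser
from typing import Any, Iterator, List


class JsonLdCollector(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # script text may arrive in several pieces


def walk(node: Any) -> Iterator[dict]:
    """Yield every object in a JSON-LD tree (covers @graph and nested lists)."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def product_nodes(raw_html: str) -> List[dict]:
    collector = JsonLdCollector()
    collector.feed(raw_html)
    collector.close()
    found = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped
        for node in walk(data):
            types = node.get("@type", [])
            types = [types] if isinstance(types, str) else types
            if "Product" in types:
                found.append(node)
    return found
```

An empty list for a product page points to the same two causes as an empty grep: the schema is added client-side, or it is not there at all.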


What the difference looks like

Before — default Magento, description at Level 2, deep in document:

[~900 chars: navigation, breadcrumbs, price markup, button attributes,
 data-mage-init JSON, widget config, form keys, UI boilerplate]

 "description": "The Nike Air Max 90 features visible Air cushioning..."

[content buried, collapsed, structurally ambiguous]

After — description at Level 1, early in document:

[~120 chars: minimal structural markup]

The Nike Air Max 90 is built for all-day comfort with visible Air
cushioning in the heel. Upper: leather and synthetic mesh. Weight: 310g.
Outsole: rubber, minimum 20% recycled content. Sizes: EU 38–47.

Key features:
- Visible Air unit in heel
- Foam midsole
- Waffle-pattern rubber outsole

The second version is what an extraction system reads when deciding if your product is a strong candidate for "best running shoes with visible air cushioning."


Summary

Most AEO guides focus on what to add — llms.txt, schema, AI bot permissions. This problem is about how content is exposed: its position in the document, whether it is server-rendered, and whether AI extraction can reach it without JavaScript.

The fix in most cases takes under an hour. Layout XML overrides, no new modules, no API keys.

Run the curl test on your top-selling product now. If the description text is not readable in the first few thousand characters of the source — you know where the gap is.


The angeo/module-aeo-audit checks rendering visibility alongside robots.txt, llms.txt, schema, and six other AEO signals in one CLI command.

Full article with schema delivery comparison table and Hyva deep-dive: angeo.dev →
