Engineered AI

Originally published at engineeredai.net

How Search Engines Discriminate Against AI Content (With Data)

As developers, we understand controlled experiments. What I accidentally created with my 5-blog setup exposed systematic bias in search algorithms that affects all of us building AI-related projects.

I documented 9 months of search engine bias against AI content with real GSC data, server logs, and analytics. The findings reveal why your AI projects might be getting buried regardless of quality.

The Accidental Controlled Experiment

I didn't set out to run a controlled experiment on search engine bias. But I accidentally created the perfect test case:

Five blogs. Same owner. Same infrastructure. Same editorial process.

  • QAJourney.net - Quality assurance methodologies
  • RemoteWorkHaven.net - Remote work strategies
  • HealthyForge.com - Health and wellness
  • MomentumPath.net - Productivity and mindset
  • EngineeredAI.net - AI tools and techniques

All five blogs use:

  • Identical WordPress setups
  • Same hosting infrastructure
  • Manual schema optimization
  • AI-assisted content creation with human editing
  • Clean vault structure and canonical links
  • Professional content focused on actionable insights

The only major differences? Domain name and topic focus.

The Data: Systematic Discrimination Revealed

You can verify this yourself right now. Try these searches:

site:engineeredai.net (AI domain) = 81 indexed pages
site:qajourney.net (non-AI domain) = normal indexing  
site:remoteworkhaven.net (non-AI domain) = normal indexing
site:healthyforge.com (non-AI domain) = normal indexing
site:momentumpath.net (non-AI domain) = normal indexing

Google Search Console Data (EngineeredAI.net):

  • 381 pages not indexed vs 81 pages indexed (82% rejection rate)
  • 192 pages stuck in "Discovered - currently not indexed"
  • 172 pages "Crawled - currently not indexed" by Google systems
  • Repeated "page indexing issues" messages despite technical compliance
  • Performance: 2 total clicks, 190 impressions over 9 months
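The rejection rate follows directly from the two Search Console counts. A quick check, using only the figures listed above (the variable names are mine):

```python
# Counts reported by Google Search Console for EngineeredAI.net
not_indexed = 381
indexed = 81

# Share of known pages that Google declined to index
rejection_rate = not_indexed / (not_indexed + indexed)

# Click-through rate over the 9-month window
clicks, impressions = 2, 190
ctr = clicks / impressions

print(f"rejection rate: {rejection_rate:.1%}")  # rejection rate: 82.5%
print(f"CTR: {ctr:.2%}")                        # CTR: 1.05%
```

The two named "not indexed" buckets (192 + 172 = 364) account for most of the 381; the remainder sits in other Search Console categories.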

Google Analytics Traffic Sources:

  • Organic Search: 6 sessions (1.39% of total traffic)
  • Direct: 300 sessions (69.28%)
  • Organic Social: 73 sessions (16.86%)

Social media is outperforming Google search by 12:1. Users are finding the content everywhere except Google.
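The 12:1 figure can be reproduced from the session counts. This sketch also backs out the total session count implied by the 1.39% share, which the post does not state directly, so treat that number as an inference:

```python
# Sessions by channel, as reported by Google Analytics
sessions = {"Organic Search": 6, "Direct": 300, "Organic Social": 73}

# Organic Search is reported as 1.39% of traffic, which implies the total
implied_total = sessions["Organic Search"] / 0.0139

# Ratio of social sessions to organic search sessions
social_to_search = sessions["Organic Social"] / sessions["Organic Search"]

print(f"implied total sessions: {implied_total:.0f}")
print(f"social : search = {social_to_search:.1f} : 1")  # social : search = 12.2 : 1
```

The three listed channels sum to 379 sessions, so the implied total of roughly 430 leaves about 50 sessions in channels not shown here.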

Bing's Nuclear Response

Bing completely deindexed EngineeredAI.net while keeping all four other blogs visible. Same content quality. Same technical setup. Same editorial standards.

The only major differences? The word "AI" in the domain and the topic it covers.

Meanwhile, AI Systems Reward the Same Content

While traditional search engines discriminated against my AI-focused content, actual AI systems had the opposite response:

Server Log Analysis

Historical Data (8 months via AWStats)

  • GPTBot crawls: 847 requests
  • ClaudeBot crawls: 623 requests
  • Perplexity crawls: 391 requests
  • Google organic traffic: 412 visits

Recent Cloudflare Data (Last 30 days)

  • 102.23k total requests through Cloudflare
  • 72k uncached requests (served by the origin server rather than from Cloudflare's cache)
  • 34 different bot/crawler types actively scanning (up from 26 in January)
  • GoogleBot present but no longer dominant in crawler activity
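Crawler counts like these can be pulled from raw access logs without AWStats. The following is a minimal sketch, assuming logs in the common "combined" format where the user agent is the last quoted field. GPTBot and ClaudeBot are named in the lists above; the `PerplexityBot` and `Googlebot` tokens, the function name, and the sample lines are my assumptions for illustration.

```python
import re
from collections import Counter

# User-agent substrings to look for (assumed tokens, adjust to your logs)
BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Googlebot"]

# In the combined log format the user agent is the last quoted field
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def count_bot_requests(lines):
    """Return a Counter of requests per bot token found in the log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end in a quoted field
        user_agent = match.group(1).lower()
        for token in BOT_TOKENS:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once
    return counts

# Made-up sample lines with simplified user-agent strings
sample = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:00:00:02 +0000] "GET /b HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:03 +0000] "GET /c HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.9 - - [01/Jan/2025:00:00:04 +0000] "GET /a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/120.0"',
]

bot_counts = count_bot_requests(sample)
print(bot_counts)
```

On the sample this counts two GPTBot requests and one ClaudeBot request and ignores the browser hit. For a real log, pass an open file object instead of the list. User-agent strings can be spoofed, so counts from this kind of matching are an upper bound on genuine crawler traffic.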

The irony: AI systems recognize valuable AI content better than traditional search engines designed by humans.

Technical Solution: LLM-First Optimization

Since traditional SEO is failing for AI content, I've pivoted to LLM-first optimization. Here's what actually works:

Strategies That Work for AI Discovery

  • GitHub Gist mirrors with canonical links back to original content
  • Clean markdown structure (headers, bullets, semantic formatting)
  • Manual schema injection via functions.php
  • Syndication to AI-accessible platforms (Dev.to, Hashnode, LinkedIn)
  • Internal link mesh connecting related technical content
  • Static page architecture instead of heavy category systems

WordPress Implementation Example

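// Manual schema injection: print JSON-LD into <head> on single posts only
function insert_article_schema() {
  if (is_single()) {
    echo '<script type="application/ld+json"> ... </script>';
  }
}
add_action('wp_head', 'insert_article_schema');

// Allow AI bots. Crawlers follow robots.txt, not CORS headers, so the rules
// are appended to WordPress's virtual robots.txt through the robots_txt filter
// (this has no effect if a physical robots.txt file exists). The user-agent
// tokens below are examples; check each vendor's documentation.
function allow_ai_bots($output, $public) {
  foreach (array('GPTBot', 'ClaudeBot', 'PerplexityBot') as $bot) {
    $output .= "\nUser-agent: {$bot}\nAllow: /\n";
  }
  return $output;
}
add_filter('robots_txt', 'allow_ai_bots', 10, 2);

// Clean output (remove emoji + oEmbed bloat)
remove_action('wp_head', 'print_emoji_detection_script', 7);
remove_action('wp_print_styles', 'print_emoji_styles');
remove_action('wp_head', 'wp_oembed_add_discovery_links');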
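The schema payload in the WordPress snippet is elided, and the post does not say which fields it uses. As an illustration only, here is one way an Article JSON-LD tag could be assembled with standard schema.org properties. The function name and all example values are hypothetical, not the site's actual schema.

```python
import json

def article_schema(headline, url, date_published, author_name):
    """Build a schema.org Article object wrapped in a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        "author": {"@type": "Organization", "name": author_name},
        "mainEntityOfPage": {"@type": "WebPage", "@id": url},
    }
    # Escape "</" so a value can never close the surrounding <script> tag
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'

# Hypothetical example values
tag = article_schema(
    headline="Example post title",
    url="https://example.com/example-post",
    date_published="2025-01-01",
    author_name="Example Author",
)
print(tag)
```

Generating the JSON with a serializer rather than hand-writing the string avoids quoting mistakes, which are a common cause of structured data being silently ignored.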

GitHub Gist Template for LLM Visibility

# [Post Title]
> Published on [EngineeredAI.net](https://engineeredai.net/[slug])

---

## Summary
High-signal, stripped-down version of the original blog post.
No fluff. Just clarity and structure.

---

## Key Takeaways
- ✅ Point 1
- ✅ Point 2
- ✅ Point 3

---

## Canonical Source
[Read the full post →](https://engineeredai.net/[slug])

---

## Tags
`#LLMSEO` `#PromptEngineering` `#StructuredContent`
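Filling this template by hand for every post gets tedious. A small generator can do it; the sketch below is my own helper, not part of the original workflow. It puts blank lines around each `---` so Markdown renders it as a horizontal rule rather than treating it as a setext heading underline for the paragraph above it.

```python
def render_gist(title, slug, summary, takeaways, tags,
                base_url="https://engineeredai.net"):
    """Fill the Gist template for one post and return Markdown text."""
    url = f"{base_url}/{slug}"
    sections = [
        f"# {title}\n\n> Published on [EngineeredAI.net]({url})",
        "## Summary\n\n" + summary,
        "## Key Takeaways\n\n" + "\n".join(f"- ✅ {point}" for point in takeaways),
        f"## Canonical Source\n\n[Read the full post →]({url})",
        "## Tags\n\n" + " ".join(f"`#{tag}`" for tag in tags),
    ]
    # Blank lines on both sides keep each separator a standalone rule
    return "\n\n---\n\n".join(sections) + "\n"

# Hypothetical post data
markdown = render_gist(
    title="Example Post",
    slug="example-post",
    summary="One-paragraph summary of the post.",
    takeaways=["First point", "Second point"],
    tags=["LLMSEO", "StructuredContent"],
)
print(markdown)
```

The canonical URL is built once and reused in both links, so the header and the "Canonical Source" section cannot drift apart.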

Results of LLM-First Approach

This strategy delivered:

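  • AI crawler requests exceeding Google organic visits in the AWStats logs (1,861 combined GPTBot, ClaudeBot, and Perplexity requests vs 412 visits, roughly 4.5 to 1)
  • Citations in LLM responses even though the posts were published after the models' training cutoffs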
  • Inbound inquiries from people who found content via AI chat
  • Growing engagement from developers who discover content through AI recommendations

The Broader Impact on Developers

This affects more than just content creators. If you're building:

  • AI tools and documenting them
  • Open source AI projects
  • Technical tutorials about machine learning
  • Developer resources for LLM integration

Your documentation might be systematically buried by traditional search engines while being actively crawled and cited by the AI systems your users actually consult.

What This Means for Your Projects

If You're Building AI-Related Content

  1. Don't put "AI" in your domain name if you want traditional SEO
  2. Structure content for LLM crawling - Use clean markdown, proper headers, semantic HTML
  3. Diversify discovery channels - GitHub repos, Dev.to posts, Stack Overflow answers
  4. Document with data - Track bot traffic in your server logs, not just Google Analytics
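Before tracking bot traffic, it is worth confirming that your robots.txt lets those crawlers in at all. Here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler tokens, the helper name, and the example rules are assumptions for illustration; substitute your own robots.txt content.

```python
from urllib.robotparser import RobotFileParser

# Assumed user-agent tokens; check each vendor's documentation
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot")

def crawler_access(robots_txt, url, agents=AI_CRAWLERS):
    """Map each crawler token to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}

# Example rules that block one AI crawler and allow everything else
example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

access = crawler_access(example_robots, "https://example.com/post")
print(access)
# {'GPTBot': False, 'ClaudeBot': True, 'PerplexityBot': True}
```

A crawler that is blocked here will never show up in your logs as a successful fetch, so this check separates "not interested" from "not allowed".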

The Technical Reality

We're witnessing the biggest shift in content discovery since Google displaced web directories. AI-powered search is becoming more useful than traditional search for finding technical content.

Even when AI systems make basic logic errors, they're still superior at content discovery compared to search engines that systematically exclude quality content based on topic keywords.

Industry Implications

For developers building search systems: There's a massive opportunity gap. Users are getting better AI content recommendations from ChatGPT than from Google searches.

For content creators: Multi-platform strategy is essential. Traditional SEO is one channel, not the only channel.

For AI companies: Consider revenue sharing with creators whose content you surface. If LLMs cite content, creators should benefit.

The Bottom Line

Traditional search engines are systematically discriminating against AI-related content, regardless of quality.

AI-powered search systems are providing better discovery for the same content.

This represents the biggest shift in content discovery since Google displaced directories in the early 2000s.

The question isn't whether this shift will happen - it's already happening. The question is whether developers and content creators will adapt fast enough to benefit from it.


Verification: All findings can be verified by comparing site:engineeredai.net results with the control domains mentioned above.

Data Sources:

  • Google Search Console (9 months): 381 pages not indexed, 2 total clicks
  • Google Analytics (9 months): 6 organic search sessions vs 73 social sessions
  • Cloudflare Analytics (30-day): 102k requests, 34 active bot types
  • Server logs from AWStats (8 months historical data)

This investigation continues at EngineeredAI.net - where AI systems get debugged, not worshipped.
