Engineered AI

Posted on Sep 18 • Originally published at engineeredai.net

How Search Engines Discriminate Against AI Content (With Data)

#webdev #seo #ai #searchengine

As developers, we understand controlled experiments. What I accidentally created with my 5-blog setup exposed systematic bias in search algorithms that affects all of us building AI-related projects.

I documented 9 months of search engine bias against AI content with real GSC data, server logs, and analytics. The findings reveal why your AI projects might be getting buried regardless of quality.

The Accidental Controlled Experiment

I didn't set out to run a controlled experiment on search engine bias. But I accidentally created the perfect test case:

Five blogs. Same owner. Same infrastructure. Same editorial process.

QAJourney.net - Quality assurance methodologies
RemoteWorkHaven.net - Remote work strategies
HealthyForge.com - Health and wellness
MomentumPath.net - Productivity and mindset
EngineeredAI.net - AI tools and techniques

All five blogs use:

Identical WordPress setups
Same hosting infrastructure
Manual schema optimization
AI-assisted content creation with human editing
Clean vault structure and canonical links
Professional content focused on actionable insights

The only major differences? Domain name and topic focus.

The Data: Systematic Discrimination Revealed

You can verify this yourself right now. Try these searches:

site:engineeredai.net (AI domain) = 81 indexed pages
site:qajourney.net (non-AI domain) = normal indexing  
site:remoteworkhaven.net (non-AI domain) = normal indexing
site:healthyforge.com (non-AI domain) = normal indexing
site:momentumpath.net (non-AI domain) = normal indexing

Google Search Console Data (EngineeredAI.net):

381 pages not indexed vs 81 pages indexed (82% rejection rate)
192 pages stuck in "Discovered - currently not indexed"
172 pages in "Crawled - currently not indexed"
Repeated "page indexing issues" messages despite technical compliance
Performance: 2 total clicks, 190 impressions over 9 months
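
The 82% figure follows directly from the two page counts above. A minimal Python sketch of that arithmetic, plus the click-through rate implied by the clicks and impressions. The variable names are illustrative; the numbers are copied from the Search Console list above.

```python
# Figures copied from the Search Console summary above.
not_indexed = 381
indexed = 81
clicks = 2
impressions = 190

total_known = not_indexed + indexed          # pages Google reports on
rejection_rate = not_indexed / total_known   # share of those pages left out of the index
ctr = clicks / impressions                   # click-through rate over the 9 months

print(f"Pages reported: {total_known}")
print(f"Not indexed: {rejection_rate:.0%}")
print(f"CTR: {ctr:.2%}")
```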

Google Analytics Traffic Sources:

Organic Search: 6 sessions (1.39% of total traffic)
Direct: 300 sessions (69.28%)
Organic Social: 73 sessions (16.86%)

Organic social sessions outnumber organic search sessions roughly 12 to 1 (73 vs. 6). Users are finding the content everywhere except Google.
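
The session counts and percentages can be cross-checked with a few lines of Python. This is only a sketch: the total of roughly 433 sessions is not stated in the report. It is inferred here by dividing each channel's sessions by its percentage share.

```python
# Sessions and percentage shares copied from the Analytics list above.
channels = {
    "Organic Search": (6, 1.39),
    "Direct": (300, 69.28),
    "Organic Social": (73, 16.86),
}

# Each channel implies a total session count: sessions / (share / 100).
implied_totals = {
    name: sessions / (share / 100) for name, (sessions, share) in channels.items()
}
for name, total in implied_totals.items():
    print(f"{name}: implies about {total:.0f} total sessions")

# Ratio of organic social sessions to organic search sessions.
social_to_search = channels["Organic Social"][0] / channels["Organic Search"][0]
print(f"Social vs. organic search: {social_to_search:.1f}:1")
```

If the three implied totals disagreed by more than rounding error, that would point to a transcription mistake in one of the rows.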

Bing's Nuclear Response

Bing completely deindexed EngineeredAI.net while keeping all four other blogs visible. Same content quality. Same technical setup. Same editorial standards.

The only difference? The word "AI" in the domain.

Meanwhile, AI Systems Reward the Same Content

While traditional search engines discriminated against my AI-focused content, actual AI systems had the opposite response:

Server Log Analysis

Historical Data (8 months via AWStats)

GPTBot crawls: 847 requests
ClaudeBot crawls: 623 requests
Perplexity crawls: 391 requests
Google organic traffic: 412 visits
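
Counts like these can also be pulled straight from raw access logs rather than from an AWStats report. The sketch below assumes logs in the common Apache/Nginx "combined" format, where the user agent is the last quoted field. The sample lines and the `count_bot_requests` helper are hypothetical, and `PerplexityBot` is assumed to be the token Perplexity's crawler sends.

```python
import re
from collections import Counter

# Substrings to look for in the User-Agent field (assumed crawler tokens).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# In the combined log format the user agent is the final double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_requests(lines):
    """Return a Counter of requests per AI bot found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1)
        for bot in AI_BOTS:
            if bot in user_agent:
                counts[bot] += 1
                break
    return counts


# Hypothetical, simplified sample lines for illustration only.
sample = [
    '203.0.113.5 - - [01/Jan/2025:10:00:00 +0000] "GET /post-a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Jan/2025:10:00:05 +0000] "GET /post-b HTTP/1.1" 200 734 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.7 - - [01/Jan/2025:10:00:09 +0000] "GET /post-a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [01/Jan/2025:10:01:00 +0000] "GET /post-c HTTP/1.1" 200 901 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

print(count_bot_requests(sample))
```

Matching on substrings keeps the sketch short, but user agents can be spoofed, so treat the counts as an upper bound unless you also verify source IP ranges.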

Recent Cloudflare Data (Last 30 days)

102.23k total requests through Cloudflare
72k uncached requests (served by the origin server rather than from Cloudflare's cache)
34 different bot/crawler types actively scanning (up from 26 in January)
GoogleBot present but no longer dominant in crawler activity
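
Two ratios fall out of the Cloudflare numbers above: the share of requests that missed the cache, and the growth in distinct crawler types since January. A small Python sketch using the figures as reported:

```python
total_requests = 102_230      # "102.23k total requests"
uncached_requests = 72_000    # "72k uncached requests"
bot_types_january = 26
bot_types_now = 34

# Share of all requests that were not served from cache.
uncached_share = uncached_requests / total_requests
# Relative increase in the number of distinct crawler types.
bot_type_growth = (bot_types_now - bot_types_january) / bot_types_january

print(f"Uncached share: {uncached_share:.1%}")
print(f"Growth in crawler types: {bot_type_growth:.1%}")
```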

The irony: AI systems recognize valuable AI content better than traditional search engines designed by humans.

Technical Solution: LLM-First Optimization

Since traditional SEO is failing for AI content, I've pivoted to LLM-first optimization. Here's what actually works:

Strategies That Work for AI Discovery

GitHub Gist mirrors with canonical links back to original content
Clean markdown structure (headers, bullets, semantic formatting)
Manual schema injection via functions.php
Syndication to AI-accessible platforms (Dev.to, Hashnode, LinkedIn)
Internal link mesh connecting related technical content
Static page architecture instead of heavy category systems

WordPress Implementation Example

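// Manual schema injection: print JSON-LD in <head> on single posts only
function insert_article_schema() {
  if (is_single()) {
    echo '<script type="application/ld+json"> ... </script>';
  }
}
add_action('wp_head', 'insert_article_schema');

// Allow AI bots: crawler access is governed by robots.txt, not CORS headers,
// so append explicit Allow rules to WordPress's virtual robots.txt.
// (This filter has no effect if a physical robots.txt file exists.)
function allow_ai_bots($output, $public) {
  foreach (array('GPTBot', 'ClaudeBot', 'PerplexityBot') as $bot) {
    $output .= "\nUser-agent: {$bot}\nAllow: /\n";
  }
  return $output;
}
add_filter('robots_txt', 'allow_ai_bots', 10, 2);

// Clean output (remove emoji + oEmbed bloat)
remove_action('wp_head', 'print_emoji_detection_script', 7);
remove_action('wp_print_styles', 'print_emoji_styles');
remove_action('wp_head', 'wp_oembed_add_discovery_links');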
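
The JSON-LD payload is elided in the snippet above. As a rough illustration of what an Article schema object can look like, here is a Python sketch that builds one and wraps it in the script tag. The `build_article_schema` helper, the field values, and the example URL slug are hypothetical; only `@context`, `@type`, and the property names come from schema.org's Article type.

```python
import json


def build_article_schema(headline, url, date_published, author_name):
    """Return a <script> tag containing schema.org Article JSON-LD."""
    schema = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": date_published,  # ISO 8601 date string
        "author": {"@type": "Person", "name": author_name},
        "mainEntityOfPage": url,
    }
    # Escape "</" so a value can never close the surrounding <script> tag early.
    payload = json.dumps(schema, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical values for illustration.
tag = build_article_schema(
    headline="Example Post Title",
    url="https://engineeredai.net/example-post",
    date_published="2025-01-01",
    author_name="Example Author",
)
print(tag)
```

Serializing a dictionary rather than concatenating strings avoids quoting mistakes in titles that contain apostrophes or double quotes.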

GitHub Gist Template for LLM Visibility

# [Post Title]
> Published on [EngineeredAI.net](https://engineeredai.net/[slug])

---
## Summary
High-signal, stripped-down version of the original blog post.
No fluff. Just clarity and structure.

---
## Key Takeaways
- ✅ Point 1
- ✅ Point 2
- ✅ Point 3

---
## Canonical Source
[Read the full post →](https://engineeredai.net/[slug])

---
## Tags
`#LLMSEO` `#PromptEngineering` `#StructuredContent`
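
Filling this template by hand for every post gets tedious. One way to generate it, sketched in Python: the `render_gist` function and the sample post details are hypothetical, and the output puts a blank line before each `---` so Markdown renders it as a horizontal rule rather than as a heading underline for the paragraph above.

```python
def render_gist(title, slug, summary, takeaways, tags):
    """Fill in the Gist template for one post and return the markdown text."""
    url = f"https://engineeredai.net/{slug}"
    bullets = "\n".join(f"- ✅ {point}" for point in takeaways)
    tag_line = " ".join(f"`#{tag}`" for tag in tags)
    sections = [
        f"# {title}\n> Published on [EngineeredAI.net]({url})",
        f"## Summary\n{summary}",
        f"## Key Takeaways\n{bullets}",
        f"## Canonical Source\n[Read the full post →]({url})",
        f"## Tags\n{tag_line}",
    ]
    # A blank line before each '---' keeps Markdown from reading the rule
    # as a setext heading underline for the paragraph above it.
    return "\n\n---\n\n".join(sections) + "\n"


# Hypothetical post details for illustration.
print(render_gist(
    title="Example Post",
    slug="example-post",
    summary="One-paragraph summary of the post.",
    takeaways=["Point 1", "Point 2", "Point 3"],
    tags=["LLMSEO", "StructuredContent"],
))
```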

Results of LLM-First Approach

This strategy delivered:

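AI crawler requests well ahead of Google organic visits (1,861 combined GPTBot, ClaudeBot, and Perplexity requests vs. 412 visits in the AWStats data)
Citations in LLM responses, even for posts published after the models' training cutoffs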
Inbound inquiries from people who found content via AI chat
Growing engagement from developers who discover content through AI recommendations

The Broader Impact on Developers

This affects more than just content creators. If you're building:

AI tools and documenting them
Open source AI projects
Technical tutorials about machine learning
Developer resources for LLM integration

Your documentation might be systematically buried by traditional search engines while being actively crawled and cited by the AI systems your users actually consult.

What This Means for Your Projects

If You're Building AI-Related Content

Don't put "AI" in your domain name if you depend on traditional search traffic
Structure content for LLM crawling - Use clean markdown, proper headers, semantic HTML
Diversify discovery channels - GitHub repos, Dev.to posts, Stack Overflow answers
Document with data - Track bot traffic in your server logs, not just Google Analytics
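
Before reading too much into bot traffic numbers, it can help to confirm that your robots.txt actually admits the crawlers you care about. A minimal sketch using Python's standard `urllib.robotparser`, assuming the crawlers honor robots.txt. The bot tokens, the `bots_allowed` helper, and the example rules are illustrative rather than taken from any of the sites above.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens to check (assumed user-agent names).
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def bots_allowed(robots_txt, url, bots=AI_BOTS):
    """Map each bot name to whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


# Hypothetical robots.txt that blocks one crawler and admits the rest.
example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(bots_allowed(example_robots, "https://example.com/docs/intro"))
```

To check a live site, paste its robots.txt contents into `example_robots`, or swap the `parse` call for the parser's `set_url` and `read` methods.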

The Technical Reality

We're witnessing the biggest shift in content discovery since Google displaced web directories. AI-powered search is becoming more useful than traditional search for finding technical content.

Even when AI systems make basic logic errors, they're still superior at content discovery compared to search engines that systematically exclude quality content based on topic keywords.

Industry Implications

For developers building search systems: There's a massive opportunity gap. Users are getting better AI content recommendations from ChatGPT than from Google searches.

For content creators: Multi-platform strategy is essential. Traditional SEO is one channel, not the only channel.

For AI companies: Consider revenue sharing with creators whose content you surface. If LLMs cite content, creators should benefit.

The Bottom Line

Traditional search engines are systematically discriminating against AI-related content, regardless of quality.

AI-powered search systems are providing better discovery for the same content.

This represents the biggest shift in content discovery since Google displaced directories in the early 2000s.

The question isn't whether this shift will happen - it's already happening. The question is whether developers and content creators will adapt fast enough to benefit from it.

Verification: All findings can be verified by comparing site:engineeredai.net results with the control domains mentioned above.

Data Sources:

Google Search Console (9 months): 381 pages not indexed, 2 total clicks
Google Analytics (9 months): 6 organic search sessions vs 73 social sessions
Cloudflare Analytics (30-day): 102k requests, 34 active bot types
Server logs from AWStats (8 months historical data)

This investigation continues at EngineeredAI.net - where AI systems get debugged, not worshipped.
