Originally published at EngineeredAI.net
How Search Engines Discriminate Against AI Content (With Data)
As developers, we understand controlled experiments. What I accidentally created with my 5-blog setup exposed systematic bias in search algorithms that affects all of us building AI-related projects.
I documented 9 months of search engine bias against AI content with real Google Search Console (GSC) data, server logs, and analytics. The findings reveal why your AI projects might be getting buried regardless of quality.
The Accidental Controlled Experiment
I didn't set out to run a controlled experiment on search engine bias. But I accidentally created the perfect test case:
Five blogs. Same owner. Same infrastructure. Same editorial process.
- QAJourney.net - Quality assurance methodologies
- RemoteWorkHaven.net - Remote work strategies
- HealthyForge.com - Health and wellness
- MomentumPath.net - Productivity and mindset
- EngineeredAI.net - AI tools and techniques
All five blogs use:
- Identical WordPress setups
- Same hosting infrastructure
- Manual schema optimization
- AI-assisted content creation with human editing
- Clean vault structure and canonical links
- Professional content focused on actionable insights
The only major differences? Domain name and topic focus.
The Data: Systematic Discrimination Revealed
You can verify this yourself right now. Try these searches:
- `site:engineeredai.net` (AI domain) = 81 indexed pages
- `site:qajourney.net` (non-AI domain) = normal indexing
- `site:remoteworkhaven.net` (non-AI domain) = normal indexing
- `site:healthyforge.com` (non-AI domain) = normal indexing
- `site:momentumpath.net` (non-AI domain) = normal indexing
Google Search Console Data (EngineeredAI.net):
- 381 pages not indexed vs 81 pages indexed (82% rejection rate)
- 192 pages stuck in "Discovered - currently not indexed"
- 172 pages stuck in "Crawled - currently not indexed"
- Repeated "page indexing issues" messages despite technical compliance
- Performance: 2 total clicks, 190 impressions over 9 months
Google Analytics Traffic Sources:
- Organic Search: 6 sessions (1.39% of total traffic)
- Direct: 300 sessions (69.28%)
- Organic Social: 73 sessions (16.86%)
Social media is outperforming Google search by 12:1. Users are finding the content everywhere except Google.
Bing's Nuclear Response
Bing completely deindexed EngineeredAI.net while keeping all four other blogs visible. Same content quality. Same technical setup. Same editorial standards.
The only difference? The word "AI" in the domain.
Meanwhile, AI Systems Reward the Same Content
While traditional search engines discriminated against my AI-focused content, actual AI systems had the opposite response:
Server Log Analysis
Historical Data (8 months via AWStats)
- GPTBot crawls: 847 requests
- ClaudeBot crawls: 623 requests
- Perplexity crawls: 391 requests
- Google organic traffic: 412 visits
Recent Cloudflare Data (Last 30 days)
- 102.23k total requests through Cloudflare
- 72k uncached requests (real traffic, not cached pages)
- 34 different bot/crawler types actively scanning (up from 26 in January)
- GoogleBot present but no longer dominant in crawler activity
The irony: AI systems recognize valuable AI content better than the human-designed search engines built specifically for content discovery.
Technical Solution: LLM-First Optimization
Since traditional SEO is failing for AI content, I've pivoted to LLM-first optimization. Here's what actually works:
Strategies That Work for AI Discovery
- GitHub Gist mirrors with canonical links back to original content
- Clean markdown structure (headers, bullets, semantic formatting)
- Manual schema injection via functions.php
- Syndication to AI-accessible platforms (Dev.to, Hashnode, LinkedIn)
- Internal link mesh connecting related technical content
- Static page architecture instead of heavy category systems
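All of these assume AI crawlers can reach the site in the first place. Here is a minimal robots.txt sketch that explicitly welcomes the three crawlers from my server logs; the user-agent tokens are the publicly documented ones, and the blanket-allow policy is a choice you should adjust to your own comfort level:

# Explicitly allow documented AI crawlers
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

# Everyone else
User-agent: *
Allow: /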
WordPress Implementation Example
// Manual Schema Injection (a full example payload is sketched after this block)
function insert_article_schema() {
    if ( is_single() ) {
        echo '<script type="application/ld+json"> ... </script>';
    }
}
add_action( 'wp_head', 'insert_article_schema' );

// Allow AI Bots: permissive CORS header so browser-based AI tools can fetch content cross-origin
// (server-side crawlers like GPTBot ignore CORS; this only affects in-browser fetches)
function allow_ai_bots() {
    header( 'Access-Control-Allow-Origin: *' );
}
add_action( 'send_headers', 'allow_ai_bots' ); // send_headers is the idiomatic hook for response headers

// Clean Output (Remove Emoji + oEmbed Bloat)
remove_action( 'wp_head', 'print_emoji_detection_script', 7 );
remove_action( 'wp_print_styles', 'print_emoji_styles' );
remove_action( 'wp_head', 'wp_oembed_add_discovery_links' );
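For reference, here is a minimal sketch of what that schema function might emit, built entirely from standard WordPress template tags. The field selection is illustrative, not the exact payload running on EngineeredAI.net:

// Sketch: assemble an Article JSON-LD payload from the current post.
function build_article_schema() {
    $schema = array(
        '@context'         => 'https://schema.org',
        '@type'            => 'Article',
        'headline'         => get_the_title(),
        'datePublished'    => get_the_date( 'c' ),
        'dateModified'     => get_the_modified_date( 'c' ),
        'mainEntityOfPage' => get_permalink(),
    );
    return '<script type="application/ld+json">' . wp_json_encode( $schema ) . '</script>';
}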
GitHub Gist Template for LLM Visibility
# [Post Title]
> Published on [EngineeredAI.net](https://engineeredai.net/[slug])
---
## Summary
High-signal, stripped-down version of the original blog post.
No fluff. Just clarity and structure.
---
## Key Takeaways
- ✅ Point 1
- ✅ Point 2
- ✅ Point 3
---
## Canonical Source
[Read the full post →](https://engineeredai.net/[slug])
---
## Tags
`#LLMSEO` `#PromptEngineering` `#StructuredContent`
Results of LLM-First Approach
This strategy delivered:
- AI crawler requests outnumbering Google organic visits by more than 4 to 1 (1,861 requests vs 412 visits over the same 8 months)
- Citations in LLM responses, even though the posts were published after those models' training cutoffs, which points to live retrieval rather than memorized training data
- Inbound inquiries from people who found content via AI chat
- Growing engagement from developers who discover content through AI recommendations
The Broader Impact on Developers
This affects more than just content creators. If you're building:
- AI tools and documenting them
- Open source AI projects
- Technical tutorials about machine learning
- Developer resources for LLM integration
Your documentation might be systematically buried by traditional search engines while being actively crawled and cited by the AI systems your users actually consult.
What This Means for Your Projects
If You're Building AI-Related Content
- Don't put "AI" in your domain name if you want traditional SEO
- Structure content for LLM crawling - Use clean markdown, proper headers, semantic HTML
- Diversify discovery channels - GitHub repos, Dev.to posts, Stack Overflow answers
- Document with data - Track bot traffic in your server logs, not just Google Analytics (a minimal sketch follows this list)
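For that last point, here is a quick sketch of what tracking bot traffic yourself can look like: a small PHP script that tallies crawler hits straight from the access log. The log path and bot tokens are assumptions; swap in your own.

<?php
// Sketch: count AI crawler requests in a standard Nginx/Apache access log.
// Path and user-agent tokens are assumptions; adjust for your server.
$bots   = array( 'GPTBot', 'ClaudeBot', 'PerplexityBot', 'Googlebot' );
$counts = array_fill_keys( $bots, 0 );

foreach ( new SplFileObject( '/var/log/nginx/access.log' ) as $line ) {
    foreach ( $bots as $bot ) {
        if ( is_string( $line ) && strpos( $line, $bot ) !== false ) {
            $counts[ $bot ]++;
        }
    }
}

print_r( $counts );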
The Technical Reality
We're witnessing the biggest shift in content discovery since Google displaced web directories. AI-powered search is becoming more useful than traditional search for finding technical content.
Even when AI systems make basic logic errors, they're still superior at content discovery compared to search engines that systematically exclude quality content based on topic keywords.
Industry Implications
For developers building search systems: There's a massive opportunity gap. Users are getting better AI content recommendations from ChatGPT than from Google searches.
For content creators: Multi-platform strategy is essential. Traditional SEO is one channel, not the only channel.
For AI companies: Consider revenue sharing with creators whose content you surface. If LLMs cite content, creators should benefit.
The Bottom Line
Traditional search engines are systematically discriminating against AI-related content, regardless of quality.
AI-powered search systems are providing better discovery for the same content.
This represents the biggest shift in content discovery since Google displaced directories in the early 2000s.
The question isn't whether this shift will happen - it's already happening. The question is whether developers and content creators will adapt fast enough to benefit from it.
Verification: All findings can be verified by comparing site:engineeredai.net results with the control domains mentioned above.
Data Sources:
- Google Search Console (9 months): 381 pages not indexed, 2 total clicks
- Google Analytics (9 months): 6 organic search sessions vs 73 social sessions
- Cloudflare Analytics (30-day): 102k requests, 34 active bot types
- Server logs from AWStats (8 months historical data)
This investigation continues at EngineeredAI.net - where AI systems get debugged, not worshipped.