I launched a set of developer tool APIs yesterday. No ads, no social media campaign, no Product Hunt launch. Just a VPS, a sitemap, and an IndexNow ping.
Within 48 hours, six different web crawlers had found my site and were methodically working through my pages. None of them were Google — but one of them was OpenAI.
Here's what showed up, in order — and what each one was actually doing.
1. YandexBot — The Fastest Indexer
First appearance: Within 30 seconds of my IndexNow submission.
YandexBot is Yandex's search engine crawler, and it is fast. Every time I submitted a new page via IndexNow, YandexBot crawled it within 30 seconds. Not minutes. Seconds.
It crawled my tool pages, my OpenAPI specification files, my RSS feed, even my new API documentation page. YandexBot is the most thorough and responsive crawler I've observed.
What it was looking for: Everything. It follows sitemaps, respects robots.txt, and indexes aggressively. If you support IndexNow, YandexBot is your most reliable consumer.
Lesson: If you think IndexNow isn't worth implementing because "nobody uses Yandex" — you're wrong. YandexBot's speed means you get instant validation that your sitemap, structured data, and page rendering are working correctly. It's the best free QA tool for your SEO setup.
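For the curious: an IndexNow ping is just one HTTP request, so there's very little to implement. A minimal sketch in Python's standard library — the host, key, and URLs here are placeholders, and the key file must actually be served at the `keyLocation` URL for the ping to be accepted:

```python
import json
from urllib import request

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_payload(host, key, urls):
    """Build the JSON body for a bulk IndexNow submission."""
    return {
        "host": host,
        "key": key,
        # The search engine verifies ownership by fetching this file.
        "keyLocation": f"https://{host}/{key}.txt",
        "urlList": list(urls),
    }

def submit(host, key, urls):
    """POST the payload; a 200 or 202 response means the ping was accepted."""
    body = json.dumps(build_indexnow_payload(host, key, urls)).encode("utf-8")
    req = request.Request(
        INDEXNOW_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
    )
    with request.urlopen(req) as resp:
        return resp.status
```

One ping fans out to every participating engine (Yandex, Bing, Seznam, and others), which is why a single submission is enough to trigger the 30-second YandexBot visits described above.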
2. toolhub-bot — The Tool Directory Builder
First appearance: Early in the first day, 8 requests total.
This crawler comes from WorkTitans, a UK-based company. It crawled my tool pages selectively — not the homepage, not the blog, just the interactive tool pages.
What it was looking for: Developer tools. Someone is building a directory of web-based developer tools, and my pages showed up on their radar. The selective crawling pattern (tools only, not content pages) confirms this.
Lesson: Structured data matters. My tool pages use JSON-LD WebApplication schema. It's likely that toolhub-bot found me through my sitemap or structured data markup — not through a link from another site.
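For context, the kind of JSON-LD block I mean looks roughly like this — the values below are illustrative, not my actual markup:

```json
{
  "@context": "https://schema.org",
  "@type": "WebApplication",
  "name": "Dead Link Checker",
  "url": "https://example.com/tools/dead-link-checker",
  "applicationCategory": "DeveloperApplication",
  "operatingSystem": "Any",
  "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD" }
}
```

A crawler building a tool directory can parse this without rendering the page at all, which would explain the selective tools-only crawl pattern.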
3. GCP Crawler — The Silent Evaluator
First appearance: Mid-morning, from Google Cloud Platform IP 34.68.255.45.
This one is interesting. It did targeted HEAD requests to specific tool pages — just checking if they exist and return 200, not downloading the full content. The IP belongs to Google Cloud Platform.
What it was looking for: Availability validation. Someone (or something) on GCP infrastructure was checking whether my tool pages are real, live endpoints. This could be a monitoring service, a search quality evaluator, or an aggregator.
Lesson: Not all crawlers download your page. Some just check if you exist. HEAD request support matters.
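A HEAD check like this is trivial to reproduce yourself. A sketch with Python's standard library — the point is that only headers come back, never the body:

```python
from urllib import request, error

def head_check(url, timeout=5):
    """Send a HEAD request and return the status code without
    downloading the response body — what the GCP crawler appeared to do."""
    req = request.Request(url, method="HEAD")
    try:
        with request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except error.HTTPError as e:
        # 4xx/5xx still tells the caller whether the endpoint exists.
        return e.code
```

If your server (or framework route) doesn't answer HEAD correctly, these evaluators see your pages as broken even though GET works fine.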
4. AWS/curb Crawlers — The Google-Adjacent Quality Check
First appearance: 6 different AWS IP addresses, all within a 3-second window, each hitting exactly one tool page, all with ref=https://www.google.com as the referrer.
This was the most intriguing pattern. Multiple IPs from Amazon Web Services infrastructure, each requesting a different tool page, with Google as the referrer. This looks like a distributed quality evaluation system — possibly part of Google's search quality pipeline running on AWS, or a third-party service that evaluates pages appearing in Google's index.
What they were looking for: Page quality signals. The coordinated multi-IP pattern with Google referrers suggests automated evaluation, not organic browsing.
Lesson: Your pages might be evaluated by Google's ecosystem long before Googlebot itself shows up.
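You can hunt for this pattern in your own access logs. A rough sketch, assuming the common Apache/Nginx "combined" log format — the three-IP and three-second thresholds are arbitrary choices, not anything official:

```python
import re
from collections import defaultdict
from datetime import datetime

# Combined log format: IP, timestamp, request line, status, size, referrer, UA.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<ref>[^"]*)" "(?P<ua>[^"]*)"'
)

def coordinated_bursts(lines, window=3.0, min_ips=3):
    """Group hits by referrer and flag groups where several distinct IPs
    arrive within `window` seconds — the AWS pattern described above."""
    by_ref = defaultdict(list)
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        by_ref[m["ref"]].append((ts, m["ip"]))
    bursts = {}
    for ref, hits in by_ref.items():
        hits.sort()
        ips = {ip for _, ip in hits}
        span = (hits[-1][0] - hits[0][0]).total_seconds()
        if len(ips) >= min_ips and span <= window:
            bursts[ref] = sorted(ips)
    return bursts
```

Run against a day of logs, this surfaces any referrer that shows up from many IPs nearly simultaneously — organic visitors almost never do that.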
5. krowl — The Indie Developer Crawler
First appearance: About 20 hours in.
The newest arrival: krowl/1.0 from a DigitalOcean IP, built by an indie developer (open source on GitHub). It followed the textbook polite-crawler pattern: robots.txt first, then sitemap.xml, then selective page crawling.
What it was looking for: General web indexing. It's a personal project crawler — the kind of thing developers build to index a slice of the web for research or side projects.
Lesson: Your sitemap.xml is your API for crawlers. Every crawler that found my tool pages did so through the sitemap, not through link discovery.
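The polite-crawler pattern is easy to replicate (or honor) with Python's standard library. A sketch using `urllib.robotparser` against an inline robots.txt body — the rules and URLs here are made up for illustration:

```python
from urllib import robotparser

def polite_check(agent, url, robots_txt):
    """Parse a robots.txt body and return (may_fetch, sitemap_urls) —
    the first two steps of the robots.txt -> sitemap -> pages pattern."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # site_maps() returns the Sitemap: lines, i.e. where to crawl next.
    return rp.can_fetch(agent, url), rp.site_maps() or []
```

A crawler that does this before touching any page gets both permission and a crawl frontier from two cheap requests — which is exactly why a clean sitemap is your API for crawlers.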
6. OAI-SearchBot — The AI Search Engine
First appearance: About 36 hours in.
OpenAI's search crawler (OAI-SearchBot/1.3) arrived from IP 74.7.175.190, starting with robots.txt. This is the crawler that feeds ChatGPT's search functionality — when users ask ChatGPT to find information, OAI-SearchBot is what retrieves it.
What it was looking for: Content to index for AI-powered search. OpenAI is building a search index to compete with Google, and my site is now in their crawl queue.
Lesson: AI search engines are the new discovery channel. If your robots.txt blocks OAI-SearchBot or GPTBot, you're opting out of being found by millions of ChatGPT users. For a new site with zero Google presence, AI search might surface you faster than traditional search.
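If you want to opt in explicitly rather than rely on the default, a robots.txt along these lines does it — the hostname is a placeholder, and the user-agent strings are the ones OpenAI documents for its crawlers:

```text
# Explicitly allow AI search crawlers
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Allow: /

# Everyone else
User-agent: *
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Note the distinction: OAI-SearchBot feeds ChatGPT's search results, while GPTBot collects training data — you can allow one and block the other.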
What Didn't Show Up
Googlebot: Not yet. Google is famously slow to crawl new sites. The AWS crawlers with Google referrers suggest my URLs are in Google's pipeline, but actual Googlebot hasn't appeared.
Bingbot: Appeared briefly but hasn't done a deep crawl.
Social media crawlers: No Twitter/X card fetchers, no Facebook Open Graph crawlers — because nobody has shared my URLs on social media yet. Distribution requires distribution.
The Crawlers-as-Signal Framework
Here's what I've learned: the crawlers that find your site in the first 24 hours tell you exactly where you stand in the discovery pipeline:
| Crawler Type | What It Means |
|---|---|
| Search engine bots (YandexBot) | Your technical SEO works |
| Tool directories (toolhub-bot) | Your structured data is readable |
| Quality evaluators (GCP, AWS) | You're being considered for inclusion |
| Indie crawlers (krowl) | Your sitemap is discoverable |
| AI search bots (OAI-SearchBot) | You may appear in AI-powered search |
| Social media crawlers | People are sharing your URLs |
| No crawlers at all | Check your robots.txt and sitemap |
I'm at stage 5 of 7. The machinery of discovery is working — and AI search is the newest channel. Now I wait for humans.
The APIs these crawlers found:
- Dead Link Checker — find broken links on any webpage
- SEO Audit — check title, meta, headings, images, links
- Website Screenshot — capture any URL as PNG
All three have free tiers on RapidAPI.