I Built My Own Site Crawler

Carlos Arias — Sun, 05 Apr 2026 16:24:25 +0000

I lost my job last year.

So I did what most engineers do when they don’t have a safety net. I started taking on clients. Mostly in the legal space, doing SEO and digital marketing.

Very quickly, I noticed something stupid.

Every project had the same workflow:
Open the client’s website → copy content → paste into GPT → try to “train” it on their business. Over and over again.

No structure. No consistency. Just manual work pretending to be “AI-powered.”

So I looked for tools that could automate this. I found things like Firecrawl. On paper, it solves the problem.

In reality:
It gets expensive fast if you're doing this at scale.
It’s not always reliable.
And it’s still not really built for how people actually use LLMs day-to-day.

Most of these tools feel like they were built for demos, not production workflows.

So I built my own crawler.

Not a “vibe coded” wrapper. An actual tool designed for one job:
Extract clean, structured content from websites so it can be used directly with LLMs.

No fluff. No unnecessary features. Just something that works.

This is the part most people don’t want to hear:
You can’t shortcut this with prompts and duct tape.

If you actually rely on LLMs in real workflows, you need proper data pipelines. Crawling, cleaning, structuring—that’s the real work. Everything else is just UI.
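To make "crawling, cleaning, structuring" concrete, here is a minimal Python sketch of the cleaning and structuring steps. It is not the code behind my crawler, just an illustration using only the standard library. The names `ContentExtractor`, `extract`, `SKIP_TAGS`, and `BLOCK_TAGS` are made up for this example, the sample HTML is invented, and the choice of which tags count as page chrome is an assumption you would tune per site.

```python
from html.parser import HTMLParser

# Assumption: these tags hold page chrome or code, not content worth keeping.
SKIP_TAGS = {"script", "style", "nav", "footer", "noscript"}
# Assumption: these tags carry the text an LLM should actually see.
BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}


class ContentExtractor(HTMLParser):
    """Collects the page title and visible text blocks, dropping page chrome."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []       # list of {"tag": ..., "text": ...}
        self._skip_depth = 0   # greater than 0 while inside a skipped element
        self._current = None   # tag of the block currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in BLOCK_TAGS | {"title"}:
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._current:
            # Collapse runs of whitespace so the text is clean for prompts.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.blocks.append({"tag": tag, "text": text})
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current:
            self._buffer.append(data)


def extract(html):
    """Turn raw HTML into a small structured record."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "blocks": parser.blocks}


# Invented sample page, only for demonstration.
sample = """
<html><head><title>Acme Law</title><style>p{color:red}</style></head>
<body>
<nav><li>Home</li></nav>
<h1>Personal Injury   Attorneys</h1>
<p>We handle <b>car accident</b> claims.</p>
<script>track()</script>
<footer><p>Copyright</p></footer>
</body></html>
"""

page = extract(sample)
for block in page["blocks"]:
    print(block["tag"], "|", block["text"])
```

The navigation, footer, script, and style content are dropped, and what remains is a list of tagged text blocks you can serialize to JSON or Markdown before handing it to a model. A real crawler also has to fetch pages, follow links, and cope with JavaScript-rendered sites, which this sketch deliberately leaves out.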

Anyway, I made it public.

It’s free for now. If I turn it into a SaaS, it won’t be another overpriced tool trying to charge you per page like you're running a data center.

If you’re doing anything with RAG, AI agents, or content pipelines, you’ll get it immediately.

My Project:
Carlos Arias - Site Crawler

DEV Community: Carlos Arias
