Alfon

Posted on May 28 • Edited on May 31

Building an SEO crawler in TypeScript: what I learned

#programming #typescript #node #opensource

I have been working on a project called SEOCore, which is an SEO crawler and audit CLI built with TypeScript.

This is actually an older project that I first built last year for my website, qlear.app.

Recently, I came back to it because I wanted to keep building it for my new website, and also make it useful for other developers who need a free and solid SEO analyzer.

It is also my first public repository, so this project means a lot to me.

Building it has been a mix of learning in public, solving real problems, making mistakes, and slowly improving things over time.

I chose TypeScript for a simple reason: it is the language I am most familiar with.

Since I already spend most of my time working with TypeScript, it felt like the right choice. I wanted to focus on building the crawler and the audit logic, not on learning a new language at the same time.

What started as a small idea turned into a much bigger project than I expected.

At first, I only wanted a tool that could crawl pages, check a few SEO basics, and show useful results in the terminal. But while building it, I kept finding more things I wanted to add.

That is how the project slowly grew into something that can do much more than a basic crawl.

Why I started building it

There are already many SEO tools out there, but I wanted to build something that felt more natural for developers.

I wanted a tool that could:

run from the command line
fit into a normal Node.js workflow
be easier to extend
help debug technical SEO problems
be useful for automation, not just manual checking

I also liked the idea of understanding how these tools work instead of only using them from the outside.

Why TypeScript worked well

Even though I picked TypeScript because I know it well, it also turned out to be a good fit for this kind of project.

SEO audits deal with a lot of different kinds of data at once:

HTML
headers
metadata
links
redirects
structured data
performance signals
crawl rules

That can get messy very quickly.

TypeScript helped me keep the code more organized. It also made it easier to catch mistakes early and split the project into smaller parts as it grew.

So the choice started from familiarity, but it ended up being practical too.
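To show what "more organized" can look like in practice, here is a minimal sketch of how page data and audit checks might be typed. It is not SEOCore's actual data model: `PageSnapshot`, `Issue`, `Check`, and `missingTitle` are hypothetical names chosen for illustration.

```typescript
// Hypothetical types for illustration; not SEOCore's actual data model.
type Severity = "error" | "warning" | "info";

interface RedirectHop {
  from: string;
  to: string;
  status: number; // e.g. 301, 302, 307, 308
}

interface PageSnapshot {
  url: string;
  status: number;
  headers: Record<string, string>;
  title: string | null;
  metaDescription: string | null;
  canonical: string | null;
  h1: string[];
  links: string[];
  redirects: RedirectHop[];
  jsonLd: unknown[]; // parsed JSON-LD blocks; the shape varies by schema type
}

interface Issue {
  rule: string;
  severity: Severity;
  url: string;
  message: string;
}

// A check is just a function from one page snapshot to zero or more issues.
type Check = (page: PageSnapshot) => Issue[];

// Example check: flag pages whose <title> is absent or only whitespace.
const missingTitle: Check = (page) =>
  page.title === null || page.title.trim() === ""
    ? [{ rule: "missing-title", severity: "error", url: page.url, message: "Page has no <title>." }]
    : [];
```

Because every check shares one signature, adding a new check means writing another function, and the compiler complains if a check reads a field the snapshot does not have.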

The crawler was only one part of the job

One thing I learned early is that crawling pages is only the beginning.

Fetching a page and following links is not the hardest part. The harder part is deciding what to do with the data after that.
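The "fetch a page and follow links" part can be sketched as a breadth-first crawl that stays on one origin. This is a simplified illustration, not SEOCore's crawler: `crawl`, `normalize`, and `FetchPage` are hypothetical names, and page fetching is passed in as a function so the crawl logic runs without a network. A real `fetchPage` might call the built-in `fetch()` (Node 18+) and pull `href` values out of the HTML with a parser.

```typescript
// Hypothetical breadth-first crawler sketch; fetching is injected so it can be tested offline.
type FetchResult = { status: number; links: string[] };
type FetchPage = (url: string) => Promise<FetchResult>;

// Resolve relative links against the current page, drop the #fragment,
// and ignore non-HTTP schemes such as mailto: or javascript:.
function normalize(href: string, base: string): string | null {
  try {
    const u = new URL(href, base);
    u.hash = "";
    return u.protocol === "http:" || u.protocol === "https:" ? u.toString() : null;
  } catch {
    return null; // malformed href
  }
}

async function crawl(start: string, fetchPage: FetchPage, maxPages = 100): Promise<Map<string, number>> {
  const origin = new URL(start).origin;
  const seen = new Set<string>(); // every URL ever queued, so nothing is queued twice
  const queue: string[] = [];
  const statuses = new Map<string, number>(); // URL -> HTTP status

  const first = normalize(start, start);
  if (first) {
    queue.push(first);
    seen.add(first);
  }

  while (queue.length > 0 && statuses.size < maxPages) {
    const url = queue.shift()!;
    const { status, links } = await fetchPage(url);
    statuses.set(url, status);
    for (const href of links) {
      const next = normalize(href, url);
      // Only follow internal (same-origin) links that have not been queued yet.
      if (next && new URL(next).origin === origin && !seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return statuses;
}
```

The `seen` set is what keeps the crawl from looping between pages that link to each other, and `maxPages` puts a hard cap on how far it can wander.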

A useful audit tool needs to understand more than just status codes.

It needs to look at things like:

canonical tags
headings
meta titles and descriptions
internal links
redirects
schema markup
image issues
page structure
JavaScript-rendered content

That changed how I thought about the whole project.

It stopped feeling like "just a crawler" and started feeling more like a small analysis engine.
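As a small example of the analysis side, here is a sketch of a few on-page checks. The input shape and the 60/160 character limits are assumptions for illustration (they are common rules of thumb, not official limits), and `OnPage` and `auditOnPage` are hypothetical names, not SEOCore's API.

```typescript
// Hypothetical input shape; a real tool would fill this from parsed HTML.
interface OnPage {
  url: string;
  title: string | null;
  metaDescription: string | null;
  canonical: string | null;
  h1Count: number;
}

// Returns human-readable findings. Thresholds are assumed rules of thumb.
function auditOnPage(p: OnPage): string[] {
  const findings: string[] = [];

  if (!p.title) findings.push("missing <title>");
  else if (p.title.length > 60) findings.push(`title is ${p.title.length} chars (may be truncated)`);

  if (!p.metaDescription) findings.push("missing meta description");
  else if (p.metaDescription.length > 160) findings.push("meta description longer than 160 chars");

  if (p.h1Count === 0) findings.push("no <h1>");
  else if (p.h1Count > 1) findings.push(`${p.h1Count} <h1> elements`);

  if (p.canonical) {
    // Resolve a relative canonical against the page URL before comparing.
    const canonical = new URL(p.canonical, p.url).toString();
    // Not necessarily an error, but worth surfacing: this page points elsewhere.
    if (canonical !== new URL(p.url).toString()) findings.push(`canonicalizes to ${canonical}`);
  } else {
    findings.push("missing canonical tag");
  }

  return findings;
}
```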

Keeping the CLI simple mattered a lot

Another thing I learned is that even a useful tool becomes hard to use if the interface feels confusing.

So I tried to keep the commands simple.

For example, a basic audit can look like this:

seocore audit https://example.com

And if I want to check how JavaScript changes the page for SEO, I can run:

seocore js-impact https://example.com

That may seem small, but clear commands make a huge difference.

It also made me think more carefully about naming, output, and what people actually need when they use a CLI tool.
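For the routing itself, a small subcommand table is often enough. This is a hypothetical sketch: only the `audit` and `js-impact` command names come from the commands above, the handlers just return placeholder strings, and SEOCore's real argument parsing may look nothing like this.

```typescript
// Hypothetical subcommand routing; handlers are placeholders, not real audit logic.
type Handler = (url: string) => string;

// A Map avoids accidentally matching inherited object keys like "toString".
const commands = new Map<string, Handler>([
  ["audit", (url) => `running audit on ${url}`],
  ["js-impact", (url) => `comparing raw vs rendered HTML for ${url}`],
]);

function run(argv: string[]): { code: number; output: string } {
  const [name, url] = argv;
  const handler = commands.get(name ?? "");
  if (!handler || !url) {
    const usage = `usage: seocore <${Array.from(commands.keys()).join("|")}> <url>`;
    return { code: 2, output: usage };
  }
  try {
    new URL(url); // validate early so the error message is clear
  } catch {
    return { code: 2, output: `invalid URL: ${url}` };
  }
  return { code: 0, output: handler(url) };
}
```

In a real entry point you would call `run(process.argv.slice(2))`, print `output`, and set `process.exitCode = code`, so scripts and CI jobs can tell a usage error apart from a successful run.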

SEO data gets noisy very fast

This was probably one of the biggest lessons for me.

It is easy to collect data.

It is much harder to turn that data into something useful.

A crawler can quickly generate too much output:

repeated warnings
weak signals
low-confidence guesses
too many things that are technically true but not actually helpful

That made me spend more time thinking about structure, scoring, filtering, and how to present the results in a clearer way.

I think that became one of the most important parts of the project.

In the end, better output is often more useful than more output.
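One way to cut that noise is to drop low-confidence findings, collapse the same rule firing on many pages into one grouped line, and sort by severity. The sketch below assumes each finding carries a `confidence` score between 0 and 1; that field, the 0.5 cutoff, and the names `Finding` and `summarize` are hypothetical, not taken from SEOCore.

```typescript
// Hypothetical finding shape; `confidence` is an assumed 0..1 score.
type Level = "error" | "warning" | "info";

interface Finding {
  rule: string;
  level: Level;
  url: string;
  confidence: number;
}

interface Summary {
  rule: string;
  level: Level;
  count: number; // number of distinct affected URLs
  examples: string[]; // a few URLs instead of one line per page
}

const rank: Record<Level, number> = { error: 0, warning: 1, info: 2 };

function summarize(findings: Finding[], minConfidence = 0.5, maxExamples = 3): Summary[] {
  const groups = new Map<string, { rule: string; level: Level; urls: Set<string> }>();
  for (const f of findings) {
    if (f.confidence < minConfidence) continue; // drop low-confidence guesses
    const key = `${f.level}:${f.rule}`;
    const g = groups.get(key) ?? { rule: f.rule, level: f.level, urls: new Set<string>() };
    g.urls.add(f.url); // the same rule firing twice on one URL counts once
    groups.set(key, g);
  }
  return Array.from(groups.values())
    .map((g) => ({ rule: g.rule, level: g.level, count: g.urls.size, examples: Array.from(g.urls).slice(0, maxExamples) }))
    // Most severe first, then most widespread.
    .sort((a, b) => rank[a.level] - rank[b.level] || b.count - a.count);
}
```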

JavaScript made things more interesting

Modern websites made this project more challenging.

A simple HTML check is still useful, but many pages now depend heavily on JavaScript. Sometimes the page that loads first is very different from what appears after rendering.

Because of that, I added Playwright-based checks for deeper analysis.

That made it possible to compare:

raw HTML
rendered DOM
metadata before and after rendering
links that only appear after JavaScript
structured data added on the client side

This ended up being one of the parts I found most interesting, because it helps explain why a page may look fine in the browser but still have SEO problems.
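The comparison step can be sketched as a diff between two snapshots of the same page: one extracted from the raw HTML response and one from the rendered DOM. The snapshot shape and the names `SeoSnapshot` and `diffRendering` are assumptions for illustration. Getting the inputs is left out so the example runs without a browser; with Playwright you could, for example, launch Chromium, call `page.goto(url)`, and read the rendered markup with `page.content()`, while the raw HTML comes from a plain `fetch()`.

```typescript
// Hypothetical snapshot of SEO-relevant fields; extraction is omitted.
interface SeoSnapshot {
  title: string | null;
  description: string | null;
  canonical: string | null;
  links: string[];
  jsonLdTypes: string[]; // e.g. ["Article", "BreadcrumbList"]
}

interface RenderDiff {
  changed: string[]; // scalar fields whose value differs after rendering
  jsOnlyLinks: string[]; // links that exist only in the rendered DOM
  jsOnlySchema: string[]; // structured data types added on the client side
}

function diffRendering(raw: SeoSnapshot, rendered: SeoSnapshot): RenderDiff {
  const fields = ["title", "description", "canonical"] as const;
  const changed = fields.filter((f) => raw[f] !== rendered[f]);
  const rawLinks = new Set(raw.links);
  const rawSchema = new Set(raw.jsonLdTypes);
  return {
    changed,
    jsOnlyLinks: rendered.links.filter((l) => !rawLinks.has(l)),
    jsOnlySchema: rendered.jsonLdTypes.filter((t) => !rawSchema.has(t)),
  };
}
```

A title that changes from a loading placeholder to the real product name, or a link that shows up only in `jsOnlyLinks`, is the kind of difference that makes a page look fine in the browser but weaker in the HTML a non-rendering crawler sees.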

Building in public taught me a lot

Since this is my first public repository, I also learned things that are not only about code.

Publishing something in public feels different from building something only for yourself.

You think more about:

project structure
naming
documentation
how other people might use it
how to keep improving it without making it too messy

I am still learning that part, but I think it has already helped me become more careful and more practical as a developer.

A small note about AI

I also want to be open about this: I used AI to help write some parts of the code in this project.

I used AI mostly to speed up some repetitive parts, explore ideas faster, and help me move through certain implementation details. But I still review the code, test things, clean things up, and decide what stays in the project.

Since this is my first public repo, I think it is better to be honest about that.

For me, AI was a tool in the process, not a replacement for understanding the project.

There is more in the repo than I covered here

I kept this first post simple on purpose.

There are a lot of commands and features in the repo that I did not cover in this post. If you want to see more, feel free to visit the project and check the README:

https://github.com/codepurse/SEOCORE

If the project looks interesting or useful, I would be very grateful for a star or a fork.

And if you have an idea for a new feature, see something that can be improved, or want to help fix a bug, feel free to open an issue or create a PR. I would really appreciate that too.

Final thoughts

Building this project taught me that making a crawler is not only about collecting pages.

It is really about turning messy website data into something clear enough that people can use.

TypeScript was the right choice for me because it is what I know best.

And making this project public taught me just as much as the code itself.

If you have built anything similar, or if you work on technical SEO tools, I would love to hear how you think about it.
