DEV Community: Alfon

I Built an SEO Tool That Lied to Me. So I Rebuilt It.

Alfon — Mon, 01 Jun 2026 02:37:59 +0000

This is a submission for the GitHub Finish-Up-A-Thon Challenge.

What I Built

SEOCORE is the project I came back to finish for this challenge.

It started as a small SEO crawler I built for myself because I wanted a tool I could understand end to end. The first version looked more complete than it really was: it printed scores, generated reports, and even had a few useful ideas like crawl graph analysis and snapshot diffs.

But I eventually realised I could not trust its output.

This rebuild added a lot of new capabilities, but the most important change was that I finally made the tool trustworthy. I turned that abandoned prototype into a TypeScript CLI that can crawl real sites, handle redirects and robots.txt, render JavaScript-heavy pages when needed, and report issues with context and suggested fixes instead of hiding everything behind one misleading score.

The Before

The old script was roughly 700 lines. It had classes, interfaces, config objects, sitemap parsing, structured output, an HTML report, and a snapshot diff system.

Running it on example.com looked like this:

Starting SEO audit of https://example.com

https://example.com
Score: 80/100
Grade: B
Title: Example Domain
Meta description: missing
Canonical: missing

Audit complete. Report saved to ./audit-output/

That output was the problem.

A page with no meta description, no canonical tag, and no robots.txt should not feel "basically fine". But the tool wrapped weak logic in clean output, so I trusted it.

Two parts of the first version were actually useful:

a crawl graph that mapped internal links and orphan pages
a snapshot diff system that compared audits over time

Those good parts are exactly why I missed the bad foundation for so long.
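To make the orphan-page idea concrete, here is a minimal sketch of how a crawl graph can surface pages that nothing links to. The `LinkGraph` shape and the `findOrphanPages` name are assumptions made for illustration; they are not taken from the SEOCORE codebase.

```typescript
// Hypothetical shape: each crawled URL mapped to the internal URLs it links to.
type LinkGraph = Record<string, string[]>;

/**
 * Returns crawled pages that no other crawled page links to.
 * The start URL is excluded because it is the crawl entry point.
 */
function findOrphanPages(graph: LinkGraph, startUrl: string): string[] {
  const linkedTo = new Set<string>();
  for (const source of Object.keys(graph)) {
    for (const target of graph[source]) {
      // A self-link does not make a page discoverable from elsewhere.
      if (target !== source) linkedTo.add(target);
    }
  }
  return Object.keys(graph)
    .filter((url) => url !== startUrl && !linkedTo.has(url))
    .sort();
}

// Example: /old-landing is known (say, from the sitemap) but nothing links to it.
const exampleGraph: LinkGraph = {
  "/": ["/pricing", "/blog"],
  "/pricing": ["/"],
  "/blog": ["/", "/blog"],
  "/old-landing": ["/"],
};
const orphans = findOrphanPages(exampleGraph, "/");
```

In this example `orphans` contains only `/old-landing`. A page only shows up as an orphan if the crawler learned about it from somewhere other than a link, which is why sitemap parsing and the crawl graph work well together.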

Where v1 was wrong

The bugs were not dramatic. The script did not crash. It just produced plausible-looking answers.

1. Redirects were treated like failures

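// v1 config: redirects were never followed
const CONFIG = {
  followRedirects: false,
};

// Any 3xx response was silently dropped, so the redirect target was never crawled
if (res.status >= 300 && res.status < 400) {
  return;
}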

That meant a normal 301 or 302 could stop the crawl: the response was dropped and the redirect target was never fetched.
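For contrast, a redirect-aware crawl step can be sketched roughly as follows. This is a minimal illustration, not SEOCORE's implementation: `HopResponse`, `traceRedirects`, and the injected `lookup` function are hypothetical names, and `lookup` stands in for an HTTP request made with automatic redirect following turned off.

```typescript
// Hypothetical response shape for a single request (no automatic redirects).
interface HopResponse {
  status: number;
  location?: string; // value of the Location header, if any
}

interface RedirectTrace {
  chain: string[];     // every URL in the chain, in order
  finalStatus: number; // status of the last response seen
  loop: boolean;       // true when the chain points back at an earlier URL
  tooLong: boolean;    // true when maxHops redirects were already followed
}

function traceRedirects(
  startUrl: string,
  lookup: (url: string) => HopResponse,
  maxHops = 10,
): RedirectTrace {
  const chain: string[] = [startUrl];
  const seen = new Set<string>([startUrl]);
  let current = startUrl;

  for (;;) {
    const res = lookup(current);
    const location = res.location;
    // Anything that is not a 3xx with a Location header ends the chain.
    if (res.status < 300 || res.status >= 400 || !location) {
      return { chain, finalStatus: res.status, loop: false, tooLong: false };
    }
    // Location may be relative, so resolve it against the current URL.
    const next = new URL(location, current).toString();
    if (seen.has(next)) {
      return { chain: [...chain, next], finalStatus: res.status, loop: true, tooLong: false };
    }
    const hopsSoFar = chain.length - 1;
    if (hopsSoFar >= maxHops) {
      return { chain, finalStatus: res.status, loop: false, tooLong: true };
    }
    chain.push(next);
    seen.add(next);
    current = next;
  }
}
```

Passing the lookup in as a function keeps the chain logic testable without a network, and the recorded chain can then be reported as evidence instead of being discarded.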

2. Meta description extraction accepted invalid patterns

const metaDescription =
  $('meta[name="description"]').attr('content') ||
  // The two fallbacks below are the bug: neither is a valid meta description
  $('meta[property="description"]').attr('content') ||
  $('meta[name="og:description"]').attr('content') ||
  undefined;

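Those fallback selectors are wrong. The standard meta description is the meta tag with name="description", and Open Graph declares its description with property="og:description", which is a separate signal anyway. A page could fail the real check but still look fine in my report.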
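A stricter extractor keeps the two signals apart instead of letting one stand in for the other. The sketch below assumes the meta tags have already been parsed into plain attribute records by an HTML parser; `MetaTag`, `extractDescriptions`, and the "first non-empty tag wins" rule are illustrative assumptions, not SEOCORE's actual code.

```typescript
// Hypothetical input: attributes of each <meta> tag, already parsed from the HTML.
interface MetaTag {
  name?: string;
  property?: string;
  content?: string;
}

interface DescriptionResult {
  metaDescription?: string; // from <meta name="description">
  ogDescription?: string;   // from <meta property="og:description">
}

function extractDescriptions(tags: MetaTag[]): DescriptionResult {
  const result: DescriptionResult = {};
  for (const tag of tags) {
    const content = (tag.content || "").trim();
    if (!content) continue; // empty content counts as missing
    const name = (tag.name || "").toLowerCase();
    const property = (tag.property || "").toLowerCase();
    // Assumption: the first non-empty occurrence wins for each signal.
    if (name === "description" && result.metaDescription === undefined) {
      result.metaDescription = content;
    } else if (property === "og:description" && result.ogDescription === undefined) {
      result.ogDescription = content;
    }
  }
  return result;
}
```

With this shape, a page that only has an Open Graph description still reports `metaDescription` as missing, so the audit cannot quietly pass it.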

3. Scoring was just arbitrary deductions

let score = 100;
if (!page.title) score -= 15;
if (!page.metaDescription) score -= 10;
if (!page.h1) score -= 15;

No real severity model. No category breakdown. No evidence. Just a number that looked authoritative.

That was the worst part of the old tool: not that it was incomplete, but that it was confidently wrong.

Why I stopped working on it

I did not abandon the first version because I got bored with SEO. I abandoned it because I lost confidence in the code.

Every time I tried to improve it, I ran into another blocker:

redirects broke assumptions in the crawl flow
site restrictions and real-world variance made naive checks unreliable
some ideas I wanted, like better JavaScript rendering, stronger rules, and more trustworthy scoring, felt hard or impossible inside that codebase
fixing one weak part usually exposed two more

At some point the project stopped feeling like "one more weekend and it is done" and started feeling like a pile of compromises I no longer trusted.

That killed my motivation more than the size of the code ever did.

Finishing it meant rebuilding it

For this challenge, I did not "polish the old script". I kept the useful ideas, then rebuilt the project around one rule:

correctness before polish

Here is the real before/after:

Area	Before	After
Crawl handling	Naive `fetch()` flow	Rate limiting, retries, redirect-chain handling
Extraction	Fragile selectors	Validated extractors and cross-checks
Link checking	False positives	Better status handling and concurrency control
Scoring	One magic number	Category-based scoring with severities
Output	Score first	Findings first, with fixes
Good ideas kept	Crawl graph, snapshot diff	Both retained and expanded
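The Scoring row is the easiest one to illustrate. A minimal sketch of category-based scoring with severities might look like the following; the severity names, the penalty weights, and the category names in the usage note are all assumptions picked for the example, not SEOCORE's real model.

```typescript
type Severity = "critical" | "warning" | "info";

// Hypothetical finding shape; a real tool would also carry evidence and a fix.
interface Finding {
  rule: string;
  category: string;
  severity: Severity;
}

// Illustrative weights only. Choosing them is still a judgment call.
const PENALTY: Record<Severity, number> = { critical: 40, warning: 15, info: 5 };

interface CategoryScore {
  score: number;       // 0 to 100 for this category
  findings: Finding[]; // the findings that explain the score
}

function scoreByCategory(
  findings: Finding[],
  categories: string[],
): Record<string, CategoryScore> {
  const result: Record<string, CategoryScore> = {};
  for (const category of categories) {
    result[category] = { score: 100, findings: [] };
  }
  for (const finding of findings) {
    const bucket = result[finding.category];
    if (!bucket) continue; // ignore findings from categories that were not requested
    bucket.findings.push(finding);
    bucket.score = Math.max(0, bucket.score - PENALTY[finding.severity]);
  }
  return result;
}
```

Calling it with one critical finding in an "indexability" category and two warnings in a "content" category gives 60 and 70 respectively, and each number stays attached to the findings that produced it, which is the difference from a single global score.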

What I actually accomplished in the rebuild

The rebuilt version became SEOCORE, a TypeScript CLI I can actually use on real sites instead of just demoing locally.

What shipped was much bigger than a simple script. SEOCORE now spans 20+ commands and feature areas across crawling, analysis, reporting, and workflow tooling.

Tech Stack

Runtime: Node.js (v20+) & TypeScript
Monorepo Manager: Nx
Crawler: Custom HTTP engine powered by Bottleneck (rate-limiting) & p-queue (concurrency)
Headless Browser: Playwright (optional, for client-side JavaScript rendering)
HTML Parser: Cheerio (fast server-side DOM selection)
Validation & CLI: Zod (configuration schema enforcement) & Commander.js
Test Runner: Vitest

These are the core pillars of the rebuild:

⚙️ 1. High-Performance Crawl Engine

Concurrent Crawler: Built-in rate-limiting (Bottleneck) and queue control (p-queue) to handle large sites safely.
Execution Tier System: Four distinct tiers (Fast, Standard, Deep, Enterprise) that dynamically adjust crawl budgets, rule sets, and scoring behavior.
JS Rendering (Playwright): Full headless browser execution to audit single-page apps (SPAs) and client-side hydration.
Redirect Loop Tracer: Intercepts 3xx responses, maps complete redirect chains, and flags circular loops.
Compliance Guard: Automatic robots.txt parsing, sitemap.xml URL extraction, and path filtering (wildcard inclusions/exclusions).
Visual Screenshot Capture: Automatically captures full-page and multi-breakpoint (mobile, tablet, desktop) screenshots using Playwright device descriptors.

🧠 2. Deep SEO & Entity Analyzers

Structured Data Graph: Extracts Schema.org (JSON-LD, Microdata, RDFa), stitches nodes into an Entity Graph, resolves deep referencing pointers, and exports interactive Mermaid diagrams.
E-E-A-T & Quality Scorer: Analyzes content readability (Flesch-Kincaid), internal link density, keyword stuffing, and authoritativeness.
AI Visibility Auditor: Validates llms.txt rules and crawler directives for GPTBot, ClaudeBot, and PerplexityBot.
Mobile & CWV Scorer: Audits viewport meta, tap targets, and scores mobile performance using throttled LCP/CLS metrics.
Hreflang Validator: Deep-crawls and validates bidirectional hreflang links, x-default configurations, and language code formats.
Outbound Authority & Rank Checker: Extracts backlink domain metrics and checks Google Top 10 organic rankings for target keywords.

🔍 3. Advanced Diagnostic & Strategy Tools

JS SEO Impact Report: Compares raw source HTML against rendered DOM to flag metadata, link, or content parity issues caused by client-side JS.
Dedicated Image Auditor: Audits images for weight, alt text, responsive srcset, lazy-loading, and CLS risk. Decodes dimensions with sharp.
Tech Stack Detector: Evidence-based framework, CDN, and CMS detection using deterministic confidence weights.
Business Directory Auditor: Checks local business listings (NAP consistency) across directories using a resilient search cascade.
Internal Link Planner: Generates actionable internal linking recommendations, identifying orphan pages and suggesting source/target pairs with anchor text themes.
Search Opportunities Analyzer: Combines crawl findings with optional GSC/CrUX data to prioritize page-level opportunities by business impact and ease of fix.
Competitive Site Comparer: Compares health metrics, performance budgets, metadata, and link structures across two different URLs or exported JSON audits.

💼 4. Workflows & CI/CD Integration

Snapshots & Diff System: Saves audit snapshots automatically and compares them over time.
CI Regression Mode: Fails build pipelines only on SEO regressions (--diff --ci).
Multi-Format Reports: Real-time terminal logs, structured JSON, interactive HTML, SARIF, and Mermaid diagrams.
Dry-Run & Explain UX: Preview config without crawling (--dry-run) or explain rules/tiers in detail.
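The snapshot diff and CI regression ideas fit together naturally. As a rough sketch, assuming a snapshot is just a list of findings keyed by rule and URL, a regression check could look like this; `SnapshotFinding`, `diffSnapshots`, `ciExitCode`, and the "info does not fail the build" policy are assumptions for illustration rather than the behavior of `--diff --ci`.

```typescript
// Hypothetical snapshot entry: one finding on one URL.
interface SnapshotFinding {
  rule: string;
  url: string;
  severity: "critical" | "warning" | "info";
}

interface SnapshotDiff {
  introduced: SnapshotFinding[]; // present now, absent before
  resolved: SnapshotFinding[];   // present before, absent now
}

const keyOf = (f: SnapshotFinding): string => `${f.rule}::${f.url}`;

function diffSnapshots(
  previous: SnapshotFinding[],
  current: SnapshotFinding[],
): SnapshotDiff {
  const before = new Set(previous.map(keyOf));
  const after = new Set(current.map(keyOf));
  return {
    introduced: current.filter((f) => !before.has(keyOf(f))),
    resolved: previous.filter((f) => !after.has(keyOf(f))),
  };
}

// Assumed policy: fail only when a new finding at warning level or above appears.
function ciExitCode(diff: SnapshotDiff): 0 | 1 {
  const regressions = diff.introduced.filter((f) => f.severity !== "info");
  return regressions.length > 0 ? 1 : 0;
}
```

The point of diffing instead of checking absolute counts is that a site with existing, known issues can still adopt the check: the build only breaks when a change makes things worse.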

More importantly, the output became more usable: instead of one vague score, the tool now surfaces findings by category, severity, and suggested fix.

That does not mean every result is perfect on every site. Real websites are messy, edge cases are real, and some findings still need human verification. But the difference now is that the tool is designed to surface evidence and uncertainty more honestly instead of hiding weak logic behind a confident-looking score.

That was my definition of "finished": not flawless, but broad enough and trustworthy enough to use on real sites.

How GitHub Copilot helped

Copilot helped most when I already understood the target shape and needed a fast first draft.

1. Parser and matcher scaffolding

The robots.txt matcher and related parsing logic had a lot of repetitive branching. Copilot was useful for drafting the first pass, especially around wildcard and suffix handling. It did not get every edge case right, but it gave me something concrete to test and refine.
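To show the kind of branching involved, here is a minimal sketch of robots.txt path matching with `*` wildcards and a trailing `$` anchor, using the longest-match rule with allow winning ties as described in RFC 9309. `RobotsRule`, `patternToRegExp`, and `isAllowed` are hypothetical names, and the sketch skips user-agent grouping and percent-encoding normalization, both of which a real matcher needs.

```typescript
interface RobotsRule {
  type: "allow" | "disallow";
  pattern: string;
}

// Translate a robots.txt path pattern into a RegExp anchored at the path start.
function patternToRegExp(pattern: string): RegExp {
  const anchoredEnd = pattern.endsWith("$");
  const body = anchoredEnd ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    // Escape regex metacharacters so everything except * is matched literally.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + escaped + (anchoredEnd ? "$" : ""));
}

function isAllowed(path: string, rules: RobotsRule[]): boolean {
  let best: RobotsRule | undefined;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty pattern matches nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    const longer = !best || rule.pattern.length > best.pattern.length;
    const tieGoesToAllow =
      !!best && rule.pattern.length === best.pattern.length && rule.type === "allow";
    if (longer || tieGoesToAllow) best = rule;
  }
  // No matching rule means the path is allowed.
  return best ? best.type === "allow" : true;
}
```

With the rules `Disallow: /private`, `Allow: /private/public`, and `Disallow: /*.pdf$`, this treats `/private/x` and `/docs/file.pdf` as blocked, while `/private/public/page` and `/docs/file.pdf?x=1` stay allowed. Cases like these make good regression tests because a slightly wrong matcher fails quietly.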

2. Type and interface scaffolding

During the rebuild, I split logic across packages and needed shared types like Finding, CrawlResult, and rule context objects. Copilot was good at generating the boring first version quickly. I still had to simplify and correct those types, but it removed a lot of mechanical typing.

3. Test skeletons

Copilot also helped generate initial Vitest test scaffolding. That saved time, especially for regression tests based on bugs from v1. The generated tests were not enough on their own, but they were a useful starting point.

What still required human judgment

Copilot helped me move much faster on parsers, types, and initial test cases. SEO correctness was a different matter, because it depended on validation and on deciding how findings should be weighted and classified.

That part stayed human:

compare against real tools
read specs
test edge cases
decide what should count as severe

Copilot accelerated implementation. Trust still had to be earned through validation.

Demo

These screenshots show sample output from the audit command at the Standard tier. SEOCORE also includes 20+ commands and feature areas beyond audit.

Project: github.com/codepurse/SEOCORE
Package: npmjs.com/package/seocore

Final thoughts

The lesson from this project was simple:

a polished tool that gives wrong answers is worse than a rough tool that tells the truth

I stopped working on the first crawler when every fix revealed another bad assumption. Finishing this project did not mean forcing that codebase a little further. It meant admitting the foundation was wrong, keeping the useful ideas, and rebuilding the rest so the output could be trusted.

That is why this challenge fit so well. I did not just reopen an abandoned project. I finally finished the hard part: making it honest enough to use.

Building an SEO crawler in TypeScript: what I learned

Alfon — Thu, 28 May 2026 08:32:17 +0000

I have been working on a project called SEOCORE, an SEO crawler and audit CLI built with TypeScript.

This is actually an older project that I first built last year for my website, qlear.app.

Recently, I came back to it because I wanted to keep building it for my new website and to make it useful for other developers who need a free and solid SEO analyzer.

It is also my first public repository, so this project means a lot to me.

Building it has been a mix of learning in public, solving real problems, making mistakes, and slowly improving things over time.

I chose TypeScript for a simple reason: it is the language I am most familiar with.

Since I already spend most of my time working with TypeScript, it felt like the right choice. I wanted to focus on building the crawler and the audit logic, not on learning a new language at the same time.

What started as a small idea turned into a much bigger project than I expected.

At first, I only wanted a tool that could crawl pages, check a few SEO basics, and show useful results in the terminal. But while building it, I kept finding more things I wanted to add.

That is how the project slowly grew into something that can do much more than a basic crawl.

Why I started building it

There are already many SEO tools out there, but I wanted to build something that felt more natural for developers.

I wanted a tool that could:

run from the command line
fit into a normal Node.js workflow
be easier to extend
help debug technical SEO problems
be useful for automation, not just manual checking

I also liked the idea of understanding how these tools work instead of only using them from the outside.

Why TypeScript worked well

Even though I picked TypeScript because I know it well, it also turned out to be a good fit for this kind of project.

SEO audits deal with a lot of different kinds of data at once:

HTML
headers
metadata
links
redirects
structured data
performance signals
crawl rules

That can get messy very quickly.

TypeScript helped me keep the code more organized. It also made it easier to catch mistakes early and split the project into smaller parts as it grew.

So the choice started from familiarity, but it ended up being practical too.

The crawler was only one part of the job

One thing I learned early is that crawling pages is only the beginning.

Fetching a page and following links is not the hardest part. The harder part is deciding what to do with the data after that.

A useful audit tool needs to understand more than just status codes.

It needs to look at things like:

canonical tags
headings
meta titles and descriptions
internal links
redirects
schema markup
image issues
page structure
JavaScript-rendered content

That changed how I thought about the whole project.

It stopped feeling like "just a crawler" and started feeling more like a small analysis engine.

Keeping the CLI simple mattered a lot

Another thing I learned is that even a useful tool becomes hard to use if the interface feels confusing.

So I tried to keep the commands simple.

For example, a basic audit can look like this:

seocore audit https://example.com

And if I want to check how JavaScript changes the page for SEO, I can run:

seocore js-impact https://example.com

That may seem small, but clear commands make a huge difference.

It also made me think more carefully about naming, output, and what people actually need when they use a CLI tool.

SEO data gets noisy very fast

This was probably one of the biggest lessons for me.

It is easy to collect data.

It is much harder to turn that data into something useful.

A crawler can quickly generate too much output:

repeated warnings
weak signals
low-confidence guesses
too many things that are technically true but not actually helpful

That made me spend more time thinking about structure, scoring, filtering, and how to present the results in a clearer way.

I think that became one of the most important parts of the project.

Because in the end, better output is often more useful than more output.
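One simple way to cut repeated warnings is to group them by rule and keep only a count plus a few sample URLs. The sketch below is an illustration under assumed names (`RawFinding`, `GroupedFinding`, `groupFindings`), not the filtering SEOCORE actually ships.

```typescript
// Hypothetical raw finding: the same rule can fire on many URLs.
interface RawFinding {
  rule: string;
  url: string;
  message: string;
}

interface GroupedFinding {
  rule: string;
  message: string;
  count: number;        // how many URLs triggered the rule
  sampleUrls: string[]; // a few examples instead of the full list
}

function groupFindings(findings: RawFinding[], maxSamples = 3): GroupedFinding[] {
  const groups = new Map<string, GroupedFinding>();
  for (const f of findings) {
    let group = groups.get(f.rule);
    if (!group) {
      group = { rule: f.rule, message: f.message, count: 0, sampleUrls: [] };
      groups.set(f.rule, group);
    }
    group.count += 1;
    if (group.sampleUrls.length < maxSamples) group.sampleUrls.push(f.url);
  }
  // Most widespread problems first.
  return Array.from(groups.values()).sort((a, b) => b.count - a.count);
}
```

Two hundred "image missing alt text" lines become one entry with a count of 200 and three example URLs, which is usually enough to find and fix the template that causes them.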

JavaScript made things more interesting

Modern websites made this project more challenging.

A simple HTML check is still useful, but many pages now depend heavily on JavaScript. Sometimes the page that loads first is very different from what appears after rendering.

Because of that, I added Playwright-based checks for deeper analysis.

That made it possible to compare:

raw HTML
rendered DOM
metadata before and after rendering
links that only appear after JavaScript
structured data added on the client side

This ended up being one of the parts I found most interesting, because it helps explain why a page may look fine in the browser but still have SEO problems.
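The comparison itself can be quite small once both versions of the page have been extracted. The sketch below assumes the same extractor has been run twice, once on the raw HTML response and once on the DOM after rendering in a headless browser; `PageSignals`, `compareSignals`, and the chosen fields are hypothetical and only meant to show the shape of the check.

```typescript
// Hypothetical extraction result for one page.
interface PageSignals {
  title?: string;
  metaDescription?: string;
  canonical?: string;
  links: string[];
}

interface ParityIssue {
  field: string;
  raw: string | undefined;
  rendered: string | undefined;
}

interface JsImpact {
  changedFields: ParityIssue[];   // metadata that differs after rendering
  linksOnlyAfterRender: string[]; // links a non-rendering crawler would miss
}

function compareSignals(raw: PageSignals, rendered: PageSignals): JsImpact {
  const fields: Array<"title" | "metaDescription" | "canonical"> = [
    "title",
    "metaDescription",
    "canonical",
  ];
  const changedFields: ParityIssue[] = [];
  for (const field of fields) {
    if (raw[field] !== rendered[field]) {
      changedFields.push({ field, raw: raw[field], rendered: rendered[field] });
    }
  }
  const rawLinks = new Set(raw.links);
  const linksOnlyAfterRender = rendered.links.filter((href) => !rawLinks.has(href));
  return { changedFields, linksOnlyAfterRender };
}
```

If the raw HTML has no canonical and only the rendered DOM does, the result lists `canonical` as a changed field, which is exactly the "looks fine in the browser" case described above.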

Building in public taught me a lot

Since this is my first public repository, I also learned things that are not only about code.

Publishing something in public feels different from building something only for yourself.

You think more about:

project structure
naming
documentation
how other people might use it
how to keep improving it without making it too messy

I am still learning that part, but I think it has already helped me become more careful and more practical as a developer.

A small note about AI

I also want to be open about this: I used AI to help write some parts of the code in this project.

I used AI mostly to speed up some repetitive parts, explore ideas faster, and help me move through certain implementation details. But I still review the code, test things, clean things up, and decide what stays in the project.

Since this is my first public repo, I think it is better to be honest about that.

For me, AI was a tool in the process, not a replacement for understanding the project.

There is more in the repo than I covered here

I kept this first post simple on purpose.

There are a lot of commands and features in the repo that I did not cover in this post. If you want to see more, feel free to visit the project and check the README:

https://github.com/codepurse/SEOCORE

If the project looks interesting or useful, I would be very grateful for a star or a fork.

And if you have an idea for a new feature, see something that can be improved, or want to help fix a bug, feel free to open an issue or create a PR. I would really appreciate that too.

Final thoughts

Building this project taught me that making a crawler is not only about collecting pages.

It is really about turning messy website data into something clear enough that people can use.

TypeScript was the right choice for me because it is what I know best.

And making this project public taught me just as much as the code itself.

If you have built anything similar, or if you work on technical SEO tools, I would love to hear how you think about it.