<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alfon</title>
    <description>The latest articles on DEV Community by Alfon (@codepurse).</description>
    <link>https://dev.to/codepurse</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F188074%2Fa7193e5c-156b-4267-8490-27bd84756332.jpeg</url>
      <title>DEV Community: Alfon</title>
      <link>https://dev.to/codepurse</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codepurse"/>
    <language>en</language>
    <item>
      <title>I Built an SEO Tool That Lied to Me. So I Rebuilt It.</title>
      <dc:creator>Alfon</dc:creator>
      <pubDate>Mon, 01 Jun 2026 02:37:59 +0000</pubDate>
      <link>https://dev.to/codepurse/i-built-an-seo-tool-that-lied-to-me-so-i-rebuilt-it-1217</link>
      <guid>https://dev.to/codepurse/i-built-an-seo-tool-that-lied-to-me-so-i-rebuilt-it-1217</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;SEOCORE&lt;/a&gt; is the project I came back to finish for this challenge.&lt;/p&gt;

&lt;p&gt;It started as a small SEO crawler I built for myself because I wanted a tool I could understand end to end. The first version looked more complete than it really was: it printed scores, generated reports, and even had a few useful ideas like crawl graph analysis and snapshot diffs.&lt;/p&gt;

&lt;p&gt;But I eventually realised I could not trust its output.&lt;/p&gt;

&lt;p&gt;This rebuild added a lot of new capabilities, but the most important change was that I finally made the tool trustworthy. I turned that abandoned prototype into a TypeScript CLI that can crawl real sites, handle redirects and &lt;code&gt;robots.txt&lt;/code&gt;, render JavaScript-heavy pages when needed, and report issues with context and suggested fixes instead of hiding everything behind one misleading score.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Before
&lt;/h2&gt;

&lt;p&gt;The old script was roughly 700 lines. It had classes, interfaces, config objects, sitemap parsing, structured output, an HTML report, and a snapshot diff system.&lt;/p&gt;

&lt;p&gt;Running it on &lt;code&gt;example.com&lt;/code&gt; looked like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Starting SEO audit of https://example.com

https://example.com
Score: 80/100
Grade: B
Title: Example Domain
Meta description: missing
Canonical: missing

Audit complete. Report saved to ./audit-output/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That output was the problem.&lt;/p&gt;

&lt;p&gt;A page with no meta description, no canonical tag, and no &lt;code&gt;robots.txt&lt;/code&gt; should not feel "basically fine". But the tool wrapped weak logic in clean output, so I trusted it.&lt;/p&gt;

&lt;p&gt;Two parts of the first version were actually useful:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a crawl graph that mapped internal links and orphan pages&lt;/li&gt;
&lt;li&gt;a snapshot diff system that compared audits over time&lt;/li&gt;
&lt;/ul&gt;
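&lt;p&gt;Orphan-page detection is simple to express in code. Here is a minimal sketch. It assumes the crawler has already produced a list of internal link edges and a list of known URLs (for example from &lt;code&gt;sitemap.xml&lt;/code&gt;). The &lt;code&gt;LinkEdge&lt;/code&gt; shape and the &lt;code&gt;findOrphans&lt;/code&gt; helper are hypothetical names used for illustration, not SEOCORE's API.&lt;/p&gt;

```typescript
// Hypothetical edge shape: one internal link found while crawling `from`.
interface LinkEdge {
  from: string;
  to: string;
}

// An orphan is a known URL (for example, one listed in sitemap.xml) that no
// other crawled page links to. The start URL is skipped because the crawl
// reaches it directly rather than through a link.
function findOrphans(knownUrls: string[], edges: LinkEdge[], startUrl: string): string[] {
  // Self-links do not make a page discoverable, so they are ignored.
  const linkedTo = new Set(
    edges.filter((edge) => edge.from !== edge.to).map((edge) => edge.to),
  );
  return knownUrls.filter((url) => {
    if (url === startUrl) return false;
    return !linkedTo.has(url);
  });
}

const orphans = findOrphans(
  ["https://example.com/", "https://example.com/about", "https://example.com/old-promo"],
  [{ from: "https://example.com/", to: "https://example.com/about" }],
  "https://example.com/",
);
console.log(orphans); // only the /old-promo URL is reported
```

&lt;p&gt;A real crawl would also normalize URLs (trailing slashes, query strings, fragments) before comparing them. That step is left out to keep the sketch short.&lt;/p&gt;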

&lt;p&gt;Those good parts are exactly why I missed the bad foundation for so long.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where v1 was wrong
&lt;/h2&gt;

&lt;p&gt;The bugs were not dramatic. The script did not crash. It just produced plausible-looking answers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Redirects were treated like failures&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;CONFIG&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;followRedirects&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;status&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



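&lt;p&gt;That meant an ordinary 301 or 302 was treated like a failure. Processing of that URL simply stopped and the redirect target was never fetched. If the start URL itself redirected, the whole crawl could end there.&lt;/p&gt;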
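&lt;p&gt;The fix is to follow 3xx responses deliberately instead of returning early. Here is a minimal sketch. It assumes Node 20's built-in &lt;code&gt;fetch&lt;/code&gt;, where &lt;code&gt;redirect: "manual"&lt;/code&gt; should return the raw 3xx response along with its &lt;code&gt;Location&lt;/code&gt; header. The function names and the 10-hop limit are illustrative choices, not SEOCORE's actual implementation. The fetcher is a parameter, so the logic can be tested without the network.&lt;/p&gt;

```typescript
// One request that does NOT follow redirects, so the 3xx and its Location
// header can be inspected directly.
async function fetchWithoutFollowing(url: string) {
  const res = await fetch(url, { redirect: "manual" });
  return { status: res.status, location: res.headers.get("location") };
}

interface Hop {
  url: string;
  status: number;
}

const REDIRECT_STATUSES = [301, 302, 303, 307, 308];

// Follow redirects hop by hop, recording the chain and flagging loops.
async function traceRedirects(startUrl: string, fetchOnce = fetchWithoutFollowing, maxHops = 10) {
  const hops: Hop[] = [];
  const seen = new Set([] as string[]);
  let url = startUrl;

  while (maxHops > hops.length) {
    if (seen.has(url)) {
      // We have been here before: the chain is circular.
      return { hops, finalUrl: null, loop: true };
    }
    seen.add(url);

    const { status, location } = await fetchOnce(url);
    hops.push({ url, status });

    if (!REDIRECT_STATUSES.includes(status) || location === null) {
      // Not a redirect (or a redirect without a target): this is where the chain ends.
      return { hops, finalUrl: url, loop: false };
    }
    // Location may be relative, so resolve it against the current URL.
    url = new URL(location, url).toString();
  }
  // Hop limit reached without landing on a final page.
  return { hops, finalUrl: null, loop: false };
}
```

&lt;p&gt;Recording every hop, not just the final URL, is what makes it possible to report long chains and loops instead of silently dropping the page.&lt;/p&gt;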

&lt;p&gt;&lt;strong&gt;2. Meta description extraction accepted invalid patterns&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;metaDescription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;meta[name="description"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;meta[property="description"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
  &lt;span class="nf"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;meta[name="og:description"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;attr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;content&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt;
  &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



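&lt;p&gt;Those fallback selectors are wrong. The standard tag is &lt;code&gt;meta name="description"&lt;/code&gt;. &lt;code&gt;property="description"&lt;/code&gt; is not a standard pattern, and Open Graph uses &lt;code&gt;property="og:description"&lt;/code&gt;, not &lt;code&gt;name&lt;/code&gt;. Because the code fell back to them, a page missing its real meta description could still look fine in my report.&lt;/p&gt;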
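&lt;p&gt;A stricter version keeps the real description and the Open Graph description apart. This minimal sketch works on meta tag attributes that have already been parsed (in practice, Cheerio could supply them). The &lt;code&gt;MetaTag&lt;/code&gt; shape and the &lt;code&gt;extractDescriptions&lt;/code&gt; name are assumptions made for illustration.&lt;/p&gt;

```typescript
// Attributes of one meta tag, as an HTML parser would expose them.
interface MetaTag {
  name?: string;
  property?: string;
  content?: string;
}

interface DescriptionResult {
  description: string | null; // only from name="description"
  ogDescription: string | null; // from property="og:description", reported separately
  descriptionTagCount: number; // more than one usually deserves its own finding
}

function extractDescriptions(tags: MetaTag[]): DescriptionResult {
  // HTML compares meta name values case-insensitively.
  const described = tags.filter((t) => (t.name ?? "").toLowerCase() === "description");
  const og = tags.find((t) => t.property === "og:description");
  const first = described[0];
  return {
    // An empty or whitespace-only content attribute counts as missing.
    description: first?.content?.trim() || null,
    ogDescription: og?.content?.trim() || null,
    descriptionTagCount: described.length,
  };
}
```

&lt;p&gt;Reporting &lt;code&gt;ogDescription&lt;/code&gt; separately means a page that only has Open Graph metadata gets flagged as missing a meta description instead of passing silently.&lt;/p&gt;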

&lt;p&gt;&lt;strong&gt;3. Scoring was just arbitrary deductions&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;metaDescription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;h1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No real severity model. No category breakdown. No evidence. Just a number that looked authoritative.&lt;/p&gt;

&lt;p&gt;That was the worst part of the old tool: not that it was incomplete, but that it was confidently wrong.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I stopped working on it
&lt;/h2&gt;

&lt;p&gt;I did not abandon the first version because I got bored with SEO. I abandoned it because I lost confidence in the code.&lt;/p&gt;

&lt;p&gt;Every time I tried to improve it, I ran into another blocker:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;redirects broke assumptions in the crawl flow&lt;/li&gt;
&lt;li&gt;site restrictions and real-world variance made naive checks unreliable&lt;/li&gt;
&lt;li&gt;some ideas I wanted, like better JavaScript rendering, stronger rules, and more trustworthy scoring, felt hard or impossible inside that codebase&lt;/li&gt;
&lt;li&gt;fixing one weak part usually exposed two more&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At some point the project stopped feeling like "one more weekend and it is done" and started feeling like a pile of compromises I no longer trusted.&lt;/p&gt;

&lt;p&gt;That killed my motivation more than the size of the code ever did.&lt;/p&gt;

&lt;h2&gt;
  
  
  Finishing it meant rebuilding it
&lt;/h2&gt;

&lt;p&gt;For this challenge, I did not "polish the old script". I kept the useful ideas, then rebuilt the project around one rule:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;correctness before polish&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Here is the real before/after:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Area&lt;/th&gt;
&lt;th&gt;Before&lt;/th&gt;
&lt;th&gt;After&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Crawl handling&lt;/td&gt;
&lt;td&gt;Naive &lt;code&gt;fetch()&lt;/code&gt; flow&lt;/td&gt;
&lt;td&gt;Rate limiting, retries, redirect-chain handling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extraction&lt;/td&gt;
&lt;td&gt;Fragile selectors&lt;/td&gt;
&lt;td&gt;Validated extractors and cross-checks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Link checking&lt;/td&gt;
&lt;td&gt;False positives&lt;/td&gt;
&lt;td&gt;Better status handling and concurrency control&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scoring&lt;/td&gt;
&lt;td&gt;One magic number&lt;/td&gt;
&lt;td&gt;Category-based scoring with severities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output&lt;/td&gt;
&lt;td&gt;Score first&lt;/td&gt;
&lt;td&gt;Findings first, with fixes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Good ideas kept&lt;/td&gt;
&lt;td&gt;Crawl graph, snapshot diff&lt;/td&gt;
&lt;td&gt;Both retained and expanded&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
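&lt;p&gt;To make the "retries" row concrete, here is a minimal sketch of retrying with exponential backoff. It assumes that only network errors, HTTP 429, and 5xx responses are worth retrying. The attempt count and delays are arbitrary example values, not SEOCORE's settings. Rate limiting itself (handled by Bottleneck in SEOCORE) is not reproduced here.&lt;/p&gt;

```typescript
// Statuses worth retrying: rate limiting and transient server errors.
function isRetryable(status: number): boolean {
  return status === 429 || status >= 500;
}

function sleep(ms: number) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Default request: one plain fetch that returns only the status code.
async function fetchStatus(url: string) {
  const res = await fetch(url);
  return res.status;
}

// Try up to `attempts` times, doubling the delay after each retryable failure.
// Resolves to the last status, or null if every attempt failed at the network level.
async function fetchWithRetries(url: string, request = fetchStatus, attempts = 3, baseDelayMs = 500) {
  const maxAttempts = Math.max(1, attempts);
  let delay = baseDelayMs;
  for (let attempt = 1; ; attempt++) {
    let status: number | null = null;
    try {
      status = await request(url);
    } catch {
      // DNS failure, connection reset, etc.: treated as retryable.
      status = null;
    }
    const retryable = status === null || isRetryable(status);
    if (!retryable || attempt === maxAttempts) return status;
    await sleep(delay);
    delay *= 2;
  }
}
```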

&lt;h2&gt;
  
  
  What I actually accomplished in the rebuild
&lt;/h2&gt;

&lt;p&gt;The rebuilt version became &lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;SEOCORE&lt;/a&gt;, a TypeScript CLI I can actually use on real sites instead of only demoing locally.&lt;/p&gt;

&lt;p&gt;What shipped was much bigger than a simple script. SEOCORE now spans 20+ commands and feature areas across crawling, analysis, reporting, and workflow tooling.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tech Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt;: &lt;a href="https://nodejs.org/" rel="noopener noreferrer"&gt;Node.js (v20+)&lt;/a&gt; &amp;amp; &lt;a href="https://www.typescriptlang.org/" rel="noopener noreferrer"&gt;TypeScript&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monorepo Manager&lt;/strong&gt;: &lt;a href="https://nx.dev/" rel="noopener noreferrer"&gt;Nx Monorepo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Crawler&lt;/strong&gt;: Custom HTTP engine powered by &lt;a href="https://www.npmjs.com/package/bottleneck" rel="noopener noreferrer"&gt;Bottleneck&lt;/a&gt; (rate-limiting) &amp;amp; &lt;a href="https://www.npmjs.com/package/p-queue" rel="noopener noreferrer"&gt;p-queue&lt;/a&gt; (concurrency)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Headless Browser&lt;/strong&gt;: &lt;a href="https://playwright.dev/" rel="noopener noreferrer"&gt;Playwright&lt;/a&gt; (optional, for client-side JavaScript rendering)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTML Parser&lt;/strong&gt;: &lt;a href="https://cheerio.js.org/" rel="noopener noreferrer"&gt;Cheerio&lt;/a&gt; (fast server-side DOM selection)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation &amp;amp; CLI&lt;/strong&gt;: &lt;a href="https://zod.dev/" rel="noopener noreferrer"&gt;Zod&lt;/a&gt; (configuration schema enforcement) &amp;amp; &lt;a href="https://www.npmjs.com/package/commander" rel="noopener noreferrer"&gt;Commander.js&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test Runner&lt;/strong&gt;: &lt;a href="https://vitest.dev/" rel="noopener noreferrer"&gt;Vitest&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the core pillars of the rebuild:&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚙️ 1. High-Performance Crawl Engine
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Concurrent Crawler&lt;/strong&gt;: Built-in rate-limiting (Bottleneck) and queue control (p-queue) to handle large sites safely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Execution Tier System&lt;/strong&gt;: Four distinct tiers (Fast, Standard, Deep, Enterprise) that dynamically adjust crawl budgets, rule sets, and scoring behavior.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;JS Rendering (Playwright)&lt;/strong&gt;: Full headless browser execution to audit single-page apps (SPAs) and client-side hydration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Redirect Loop Tracer&lt;/strong&gt;: Intercepts 3xx responses, maps complete redirect chains, and flags circular loops.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compliance Guard&lt;/strong&gt;: Automatic robots.txt parsing, sitemap.xml URL extraction, and path filtering (wildcard inclusions/exclusions).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visual Screenshot Capture&lt;/strong&gt;: Automatically captures full-page and multi-breakpoint (mobile, tablet, desktop) screenshots using Playwright device descriptors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 2. Deep SEO &amp;amp; Entity Analyzers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured Data Graph&lt;/strong&gt;: Extracts Schema.org (JSON-LD, Microdata, RDFa), stitches nodes into an Entity Graph, resolves deep referencing pointers, and exports interactive Mermaid diagrams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;E-E-A-T &amp;amp; Quality Scorer&lt;/strong&gt;: Analyzes content readability (Flesch-Kincaid), internal link density, keyword stuffing, and authoritativeness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Visibility Auditor&lt;/strong&gt;: Validates llms.txt rules and crawler directives for GPTBot, ClaudeBot, and PerplexityBot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mobile &amp;amp; CWV Scorer&lt;/strong&gt;: Audits viewport meta, tap targets, and scores mobile performance using throttled LCP/CLS metrics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hreflang Validator&lt;/strong&gt;: Deep-crawls and validates bidirectional hreflang links, x-default configurations, and language code formats.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Outbound Authority &amp;amp; Rank Checker&lt;/strong&gt;: Extracts backlink domain metrics and checks Google Top 10 organic rankings for target keywords.&lt;/li&gt;
&lt;/ul&gt;
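&lt;p&gt;The bidirectional hreflang check is a good example of a rule that is easy to describe and easy to get wrong. Here is a minimal sketch. It assumes the crawler has already collected each page's declared alternates as a map from language code to URL. The &lt;code&gt;HreflangMap&lt;/code&gt; shape and the &lt;code&gt;findMissingReturnLinks&lt;/code&gt; helper are hypothetical. The sketch flags any alternate that does not link back, since search engines such as Google may ignore hreflang pairs that are not reciprocal.&lt;/p&gt;

```typescript
// For each page URL: the hreflang alternates it declares, keyed by language code.
type HreflangMap = { [pageUrl: string]: { [lang: string]: string } };

interface MissingReturnLink {
  from: string; // page that declares the alternate
  to: string; // alternate that does not link back
  lang: string; // language code used on `from`
}

function findMissingReturnLinks(pages: HreflangMap): MissingReturnLink[] {
  const problems: MissingReturnLink[] = [];
  for (const [from, alternates] of Object.entries(pages)) {
    for (const [lang, to] of Object.entries(alternates)) {
      // A self-reference is expected and is not a return link.
      if (to === from) continue;
      const back = pages[to];
      // A return link exists if the target lists `from` under any language code.
      const linksBack = back !== undefined ? Object.values(back).includes(from) : false;
      if (!linksBack) problems.push({ from, to, lang });
    }
  }
  return problems;
}
```

&lt;p&gt;Alternates that were never crawled are also reported as missing return links here. A fuller tool would probably list those separately.&lt;/p&gt;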

&lt;h3&gt;
  
  
  🔍 3. Advanced Diagnostic &amp;amp; Strategy Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;JS SEO Impact Report&lt;/strong&gt;: Compares raw source HTML against rendered DOM to flag metadata, link, or content parity issues caused by client-side JS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated Image Auditor&lt;/strong&gt;: Audits images for weight, alt text, responsive srcset, lazy-loading, and CLS risk. Decodes dimensions with &lt;code&gt;sharp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tech Stack Detector&lt;/strong&gt;: Evidence-based framework, CDN, and CMS detection using deterministic confidence weights.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Business Directory Auditor&lt;/strong&gt;: Checks local business listings (NAP consistency) across directories using a resilient search cascade.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Internal Link Planner&lt;/strong&gt;: Generates actionable internal linking recommendations, identifying orphan pages and suggesting source/target pairs with anchor text themes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search Opportunities Analyzer&lt;/strong&gt;: Combines crawl findings with optional GSC/CrUX data to prioritize page-level opportunities by business impact and ease of fix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Competitive Site Comparer&lt;/strong&gt;: Compares health metrics, performance budgets, metadata, and link structures across two different URLs or exported JSON audits.&lt;/li&gt;
&lt;/ul&gt;
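&lt;p&gt;Once both versions of a page have been reduced to the same fields, the parity check itself is a plain comparison. Here is a minimal sketch. It assumes the raw HTML (from a plain HTTP fetch) and the rendered DOM (from Playwright) have already been extracted into a &lt;code&gt;SeoSnapshot&lt;/code&gt;. That shape and the &lt;code&gt;compareRenderParity&lt;/code&gt; function are illustrative names, not SEOCORE's API.&lt;/p&gt;

```typescript
// SEO-relevant fields extracted from one version of a page.
interface SeoSnapshot {
  title: string | null;
  metaDescription: string | null;
  canonical: string | null;
  robots: string | null; // content of the robots meta tag
}

interface ParityIssue {
  field: keyof SeoSnapshot;
  raw: string | null;
  rendered: string | null;
}

// Report every field whose value differs between the raw HTML and the rendered DOM.
function compareRenderParity(raw: SeoSnapshot, rendered: SeoSnapshot): ParityIssue[] {
  const fields: (keyof SeoSnapshot)[] = ["title", "metaDescription", "canonical", "robots"];
  return fields
    .filter((field) => raw[field] !== rendered[field])
    .map((field) => ({ field, raw: raw[field], rendered: rendered[field] }));
}
```

&lt;p&gt;A title that is &lt;code&gt;null&lt;/code&gt; in the raw snapshot but present after rendering is the classic case: a crawler that does not execute JavaScript sees a page without a title.&lt;/p&gt;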

&lt;h3&gt;
  
  
  💼 4. Workflows &amp;amp; CI/CD Integration
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Snapshots &amp;amp; Diff System&lt;/strong&gt;: Saves audit snapshots automatically and compares them over time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI Regression Mode&lt;/strong&gt;: Fails build pipelines only on SEO regressions (&lt;code&gt;--diff --ci&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Format Reports&lt;/strong&gt;: Real-time terminal logs, structured JSON, interactive HTML, SARIF, and Mermaid diagrams.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dry-Run &amp;amp; Explain UX&lt;/strong&gt;: Preview config without crawling (&lt;code&gt;--dry-run&lt;/code&gt;) or explain rules/tiers in detail.&lt;/li&gt;
&lt;/ul&gt;
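&lt;p&gt;"Fail only on regressions" can be sketched as a set comparison between two snapshots. This minimal version makes two assumptions, both for illustration only: each finding is identified by its rule id plus URL, and only newly introduced &lt;code&gt;error&lt;/code&gt; findings should break a build. SEOCORE's actual snapshot format may differ.&lt;/p&gt;

```typescript
type Severity = "error" | "warning" | "info";

interface SnapshotFinding {
  ruleId: string;
  url: string;
  severity: Severity;
}

// A stable identity, so the same issue matches across two snapshots.
function findingKey(f: SnapshotFinding): string {
  return `${f.ruleId}|${f.url}`;
}

// introduced = present now but not before; fixed = present before but gone now.
function diffSnapshots(previous: SnapshotFinding[], current: SnapshotFinding[]) {
  const before = new Set(previous.map(findingKey));
  const after = new Set(current.map(findingKey));
  return {
    introduced: current.filter((f) => !before.has(findingKey(f))),
    fixed: previous.filter((f) => !after.has(findingKey(f))),
  };
}

// In CI mode, only newly introduced errors fail the pipeline.
function ciExitCode(previous: SnapshotFinding[], current: SnapshotFinding[]): number {
  const { introduced } = diffSnapshots(previous, current);
  return introduced.some((f) => f.severity === "error") ? 1 : 0;
}
```

&lt;p&gt;A CI script could then assign the result to &lt;code&gt;process.exitCode&lt;/code&gt;. Existing issues are tolerated, and only new ones fail the pipeline.&lt;/p&gt;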

&lt;p&gt;More importantly, the output became more usable: instead of one vague score, the tool now surfaces findings by category, severity, and suggested fix.&lt;/p&gt;
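&lt;p&gt;To illustrate what "findings first" means, here is a minimal sketch of a finding record and a per-category score. The severity weights (error 10, warning 3, info 0) and the category names are made up for the example; they are not SEOCORE's scoring model. The point is that every deduction is tied to a finding that carries evidence and a fix.&lt;/p&gt;

```typescript
type Severity = "error" | "warning" | "info";

interface Finding {
  ruleId: string;
  category: string; // e.g. "metadata", "links", "images"
  severity: Severity;
  url: string;
  evidence: string; // what was observed
  fix: string; // suggested remediation
}

// Illustrative weights: points deducted per finding of each severity.
const WEIGHTS = { error: 10, warning: 3, info: 0 };

// Score each category independently from 100, never going below 0, so one
// noisy category cannot hide how the others are doing. Categories with no
// findings are absent from the map; a report would show them at 100.
function scoreByCategory(findings: Finding[]) {
  const scores = new Map([] as [string, number][]);
  for (const f of findings) {
    const current = scores.get(f.category) ?? 100;
    scores.set(f.category, Math.max(0, current - WEIGHTS[f.severity]));
  }
  return scores;
}
```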

&lt;p&gt;That does not mean every result is perfect on every site. Real websites are messy, edge cases are real, and some findings still need human verification. But the difference now is that the tool is designed to surface evidence and uncertainty more honestly instead of hiding weak logic behind a confident-looking score.&lt;/p&gt;

&lt;p&gt;That was my definition of "finished": not flawless, but broad enough and trustworthy enough to use on real sites.&lt;/p&gt;

&lt;h2&gt;
  
  
  How GitHub Copilot helped
&lt;/h2&gt;

&lt;p&gt;Copilot helped most when I already understood the target shape and needed a fast first draft.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Parser and matcher scaffolding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;robots.txt&lt;/code&gt; matcher and related parsing logic had a lot of repetitive branching. Copilot was useful for drafting the first pass, especially around wildcard and suffix handling. It did not get every edge case right, but it gave me something concrete to test and refine.&lt;/p&gt;
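&lt;p&gt;For reference, the core matching rules fit in a few lines. This minimal sketch assumes the rules from RFC 9309 (the Robots Exclusion Protocol): &lt;code&gt;*&lt;/code&gt; matches any sequence of characters, a trailing &lt;code&gt;$&lt;/code&gt; anchors the end of the path, the longest matching rule wins, and on a tie &lt;code&gt;Allow&lt;/code&gt; wins. It handles a single user-agent group and skips percent-encoding normalization. The names are illustrative.&lt;/p&gt;

```typescript
interface RobotsRule {
  type: "allow" | "disallow";
  pattern: string; // path pattern from robots.txt, e.g. "/private/*.pdf$"
}

// Convert a robots.txt path pattern into a regular expression anchored at the start.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    // Escape regex metacharacters in the literal parts, then rejoin with ".*".
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, (ch) => "\\" + ch))
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// Decide whether `path` (including any query string) may be crawled under one group's rules.
function isAllowed(path: string, rules: RobotsRule[]): boolean {
  let best: RobotsRule | null = null;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty Disallow matches nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    if (best === null || rule.pattern.length > best.pattern.length) {
      best = rule; // the longer (more specific) match wins
    } else if (rule.pattern.length === best.pattern.length) {
      // On a tie, the less restrictive rule (Allow) wins.
      if (rule.type === "allow") best = rule;
    }
  }
  // No matching rule means the path is allowed.
  return best === null ? true : best.type === "allow";
}
```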

&lt;p&gt;&lt;strong&gt;2. Type and interface scaffolding&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;During the rebuild, I split logic across packages and needed shared types like &lt;code&gt;Finding&lt;/code&gt;, &lt;code&gt;CrawlResult&lt;/code&gt;, and rule context objects. Copilot was good at generating the boring first version quickly. I still had to simplify and correct those types, but it removed a lot of mechanical typing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Test skeletons&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Copilot also helped generate initial Vitest test scaffolding. That saved time, especially for regression tests based on bugs from v1. The generated tests were not enough on their own, but they were a useful starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  What still required human judgment
&lt;/h2&gt;

&lt;p&gt;GitHub Copilot helped me move much faster, especially for drafting parsers, scaffolding types, and generating initial test cases.&lt;/p&gt;

&lt;p&gt;But SEO correctness still depended on validation, and that part stayed human:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;compare against real tools&lt;/li&gt;
&lt;li&gt;read specs&lt;/li&gt;
&lt;li&gt;test edge cases&lt;/li&gt;
&lt;li&gt;decide what should count as severe&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Copilot accelerated implementation. Trust still had to be earned through validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;These screenshots show sample output from the &lt;code&gt;audit&lt;/code&gt; command at the Standard tier. SEOCORE also includes 20+ commands and feature areas beyond &lt;code&gt;audit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dcuw5v0a2fa2fdrfrd7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5dcuw5v0a2fa2fdrfrd7.png" alt="SEOCORE audit" width="800" height="713"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqnpl9xo0xy5892fobta.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftqnpl9xo0xy5892fobta.png" alt="SEOCORE audit HTML report showing overall audit summary, score breakdown, and key technical SEO findings" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bp2uaikz8nhhlewmhm6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1bp2uaikz8nhhlewmhm6.png" alt="SEOCORE audit report section showing categorized issues, severity labels, and recommended fixes" width="800" height="628"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Project: &lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;github.com/codepurse/SEOCORE&lt;/a&gt;&lt;br&gt;
Package: &lt;a href="https://www.npmjs.com/package/seocore" rel="noopener noreferrer"&gt;npmjs.com/package/seocore&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;The lesson from this project was simple:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;a polished tool that gives wrong answers is worse than a rough tool that tells the truth&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I stopped working on the first crawler when every fix revealed another bad assumption. Finishing this project did not mean forcing that codebase a little further. It meant admitting the foundation was wrong, keeping the useful ideas, and rebuilding the rest so the output could be trusted.&lt;/p&gt;

&lt;p&gt;That is why this challenge fit so well. I did not just reopen an abandoned project. I finally finished the hard part: making it honest enough to use.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
    </item>
    <item>
      <title>Building an SEO crawler in TypeScript: what I learned</title>
      <dc:creator>Alfon</dc:creator>
      <pubDate>Thu, 28 May 2026 08:32:17 +0000</pubDate>
      <link>https://dev.to/codepurse/building-an-seo-crawler-in-typescript-what-i-learned-1doo</link>
      <guid>https://dev.to/codepurse/building-an-seo-crawler-in-typescript-what-i-learned-1doo</guid>
      <description>&lt;p&gt;I have been working on a project called &lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;SEOCore&lt;/a&gt;, which is an SEO crawler and audit CLI built with TypeScript.&lt;/p&gt;

&lt;p&gt;This is actually an older project that I first built last year for my website, &lt;a href="https://qlear.app" rel="noopener noreferrer"&gt;qlear.app&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Recently, I came back to it because I wanted to keep building it for my new website, and also make it useful for other developers who need a free and solid SEO analyzer.&lt;/p&gt;

&lt;p&gt;It is also my first public repository, so this project means a lot to me.&lt;/p&gt;

&lt;p&gt;Building it has been a mix of learning in public, solving real problems, making mistakes, and slowly improving things over time.&lt;/p&gt;

&lt;p&gt;I chose TypeScript for a simple reason: it is the language I am most familiar with.&lt;/p&gt;

&lt;p&gt;Since I already spend most of my time working with TypeScript, it felt like the right choice. I wanted to focus on building the crawler and the audit logic, not on learning a new language at the same time.&lt;/p&gt;

&lt;p&gt;What started as a small idea turned into a much bigger project than I expected.&lt;/p&gt;

&lt;p&gt;At first, I only wanted a tool that could crawl pages, check a few SEO basics, and show useful results in the terminal. But while building it, I kept finding more things I wanted to add.&lt;/p&gt;

&lt;p&gt;That is how the project slowly grew into something that can do much more than a basic crawl.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I started building it
&lt;/h2&gt;

&lt;p&gt;There are already many SEO tools out there, but I wanted to build something that felt more natural for developers.&lt;/p&gt;

&lt;p&gt;I wanted a tool that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run from the command line&lt;/li&gt;
&lt;li&gt;fit into a normal Node.js workflow&lt;/li&gt;
&lt;li&gt;be easier to extend&lt;/li&gt;
&lt;li&gt;help debug technical SEO problems&lt;/li&gt;
&lt;li&gt;be useful for automation, not just manual checking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also liked the idea of understanding how these tools work instead of only using them from the outside.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why TypeScript worked well
&lt;/h2&gt;

&lt;p&gt;Even though I picked TypeScript because I know it well, it also turned out to be a good fit for this kind of project.&lt;/p&gt;

&lt;p&gt;SEO audits deal with a lot of different kinds of data at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTML&lt;/li&gt;
&lt;li&gt;headers&lt;/li&gt;
&lt;li&gt;metadata&lt;/li&gt;
&lt;li&gt;links&lt;/li&gt;
&lt;li&gt;redirects&lt;/li&gt;
&lt;li&gt;structured data&lt;/li&gt;
&lt;li&gt;performance signals&lt;/li&gt;
&lt;li&gt;crawl rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can get messy very quickly.&lt;/p&gt;

&lt;p&gt;TypeScript helped me keep the code more organized. It also made it easier to catch mistakes early and split the project into smaller parts as it grew.&lt;/p&gt;

&lt;p&gt;So the choice started from familiarity, but it ended up being practical too.&lt;/p&gt;
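&lt;p&gt;One concrete way TypeScript catches mistakes here is a discriminated union for fetch results. With it, the compiler insists that redirects and errors are handled explicitly. The &lt;code&gt;FetchOutcome&lt;/code&gt; type below is a hypothetical illustration, not taken from the SEOCore source.&lt;/p&gt;

```typescript
// Every fetch ends in exactly one of these states; `kind` tells them apart.
type FetchOutcome =
  | { kind: "ok"; url: string; status: number; html: string }
  | { kind: "redirect"; url: string; status: number; location: string }
  | { kind: "error"; url: string; message: string };

function describe(outcome: FetchOutcome): string {
  switch (outcome.kind) {
    case "ok":
      return `${outcome.url} returned ${outcome.status} (${outcome.html.length} characters of HTML)`;
    case "redirect":
      return `${outcome.url} redirects (${outcome.status}) to ${outcome.location}`;
    case "error":
      return `${outcome.url} failed: ${outcome.message}`;
    default: {
      // If a new `kind` is added and not handled above, this no longer type-checks.
      const unhandled: never = outcome;
      return unhandled;
    }
  }
}
```

&lt;p&gt;If a fourth &lt;code&gt;kind&lt;/code&gt; is added later and the switch is not updated, the &lt;code&gt;never&lt;/code&gt; assignment stops compiling. That is exactly the kind of early warning that is easy to miss in plain JavaScript.&lt;/p&gt;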

&lt;h2&gt;
  
  
  The crawler was only one part of the job
&lt;/h2&gt;

&lt;p&gt;One thing I learned early is that crawling pages is only the beginning.&lt;/p&gt;

&lt;p&gt;Fetching a page and following links is not the hardest part. The harder part is deciding what to do with the data after that.&lt;/p&gt;

&lt;p&gt;A useful audit tool needs to understand more than just status codes.&lt;/p&gt;

&lt;p&gt;It needs to look at things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;canonical tags&lt;/li&gt;
&lt;li&gt;headings&lt;/li&gt;
&lt;li&gt;meta titles and descriptions&lt;/li&gt;
&lt;li&gt;internal links&lt;/li&gt;
&lt;li&gt;redirects&lt;/li&gt;
&lt;li&gt;schema markup&lt;/li&gt;
&lt;li&gt;image issues&lt;/li&gt;
&lt;li&gt;page structure&lt;/li&gt;
&lt;li&gt;JavaScript-rendered content&lt;/li&gt;
&lt;/ul&gt;
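&lt;p&gt;Canonical tags are a good example of a check that needs more than a status code. Here is a minimal sketch. It assumes the page URL and the raw &lt;code&gt;href&lt;/code&gt; of its canonical link have already been extracted. &lt;code&gt;checkCanonical&lt;/code&gt; and its result labels are hypothetical. The important detail is resolving a relative canonical against the page URL before comparing.&lt;/p&gt;

```typescript
type CanonicalStatus = "missing" | "invalid" | "self" | "points-elsewhere";

// Resolve `href` against `base` the way a browser would; null if it cannot be parsed.
function parseUrl(href: string, base: string): URL | null {
  try {
    return new URL(href, base);
  } catch {
    return null;
  }
}

function checkCanonical(pageUrl: string, canonicalHref: string | null): CanonicalStatus {
  if (canonicalHref === null || canonicalHref.trim() === "") return "missing";
  const canonical = parseUrl(canonicalHref.trim(), pageUrl);
  if (canonical === null) return "invalid";
  const page = new URL(pageUrl);
  // Fragments do not identify a separate document, so ignore them on both sides.
  canonical.hash = "";
  page.hash = "";
  return canonical.href === page.href ? "self" : "points-elsewhere";
}
```

&lt;p&gt;&lt;code&gt;points-elsewhere&lt;/code&gt; is not automatically a problem. For a URL with tracking parameters, pointing at the clean URL is usually the intended result.&lt;/p&gt;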

&lt;p&gt;That changed how I thought about the whole project.&lt;/p&gt;

&lt;p&gt;It stopped feeling like "just a crawler" and started feeling more like a small analysis engine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Keeping the CLI simple mattered a lot
&lt;/h2&gt;

&lt;p&gt;Another thing I learned is that even a useful tool becomes hard to use if the interface feels confusing.&lt;/p&gt;

&lt;p&gt;So I tried to keep the commands simple.&lt;/p&gt;

&lt;p&gt;For example, a basic audit can look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;seocore audit https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if I want to check how JavaScript changes the page for SEO, I can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;seocore js-impact https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That may seem small, but clear commands make a huge difference.&lt;/p&gt;

&lt;p&gt;It also made me think more carefully about naming, output, and what people actually need when they use a CLI tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  SEO data gets noisy very fast
&lt;/h2&gt;

&lt;p&gt;This was probably one of the biggest lessons for me.&lt;/p&gt;

&lt;p&gt;It is easy to collect data.&lt;/p&gt;

&lt;p&gt;It is much harder to turn that data into something useful.&lt;/p&gt;

&lt;p&gt;A crawler can quickly generate too much output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated warnings&lt;/li&gt;
&lt;li&gt;weak signals&lt;/li&gt;
&lt;li&gt;low-confidence guesses&lt;/li&gt;
&lt;li&gt;too many things that are technically true but not actually helpful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That made me spend more time thinking about structure, scoring, filtering, and how to present the results in a clearer way.&lt;/p&gt;

&lt;p&gt;I think that became one of the most important parts of the project.&lt;/p&gt;

&lt;p&gt;In the end, better output is often more useful than more output.&lt;/p&gt;
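&lt;p&gt;One simple filtering step is to collapse repeated findings: report each rule once, together with the pages it affects, instead of one warning per page. Here is a minimal sketch. It assumes findings carry a rule id, URL, and message (an illustrative shape, not SEOCore's).&lt;/p&gt;

```typescript
interface RawFinding {
  ruleId: string;
  url: string;
  message: string;
}

interface GroupedFinding {
  ruleId: string;
  message: string; // the first message seen for this rule
  urls: string[]; // each affected URL listed once
}

// One entry per rule, sorted so the most widespread issues come first.
function groupFindings(findings: RawFinding[]): GroupedFinding[] {
  const groups = new Map([] as [string, GroupedFinding][]);
  for (const f of findings) {
    const group = groups.get(f.ruleId);
    if (group === undefined) {
      groups.set(f.ruleId, { ruleId: f.ruleId, message: f.message, urls: [f.url] });
    } else if (!group.urls.includes(f.url)) {
      group.urls.push(f.url);
    }
  }
  return Array.from(groups.values()).sort((a, b) => b.urls.length - a.urls.length);
}
```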

&lt;h2&gt;
  
  
  JavaScript made things more interesting
&lt;/h2&gt;

&lt;p&gt;Modern websites made this project more challenging.&lt;/p&gt;

&lt;p&gt;A simple HTML check is still useful, but many pages now depend heavily on JavaScript. Sometimes the page that loads first is very different from what appears after rendering.&lt;/p&gt;

&lt;p&gt;Because of that, I added Playwright-based checks for deeper analysis.&lt;/p&gt;

&lt;p&gt;That made it possible to compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw HTML&lt;/li&gt;
&lt;li&gt;rendered DOM&lt;/li&gt;
&lt;li&gt;metadata before and after rendering&lt;/li&gt;
&lt;li&gt;links that only appear after JavaScript&lt;/li&gt;
&lt;li&gt;structured data added on the client side&lt;/li&gt;
&lt;/ul&gt;
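&lt;p&gt;The "links that only appear after JavaScript" part of that comparison is a set difference. Here is a minimal sketch. It assumes both link lists have already been extracted and resolved to absolute URLs. &lt;code&gt;compareLinks&lt;/code&gt; is an illustrative name.&lt;/p&gt;

```typescript
// Split links into those that only exist after rendering and those removed by client-side code.
function compareLinks(rawLinks: string[], renderedLinks: string[]) {
  const raw = new Set(rawLinks);
  const rendered = new Set(renderedLinks);
  return {
    // Only discoverable by crawlers that execute JavaScript.
    jsOnly: Array.from(rendered).filter((href) => !raw.has(href)),
    // Present in the source HTML but gone after rendering.
    removedByJs: Array.from(raw).filter((href) => !rendered.has(href)),
  };
}
```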

&lt;p&gt;This ended up being one of the parts I found most interesting, because it helps explain why a page may look fine in the browser but still have SEO problems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Building in public taught me a lot
&lt;/h2&gt;

&lt;p&gt;Since this is my first public repository, I also learned things that are not only about code.&lt;/p&gt;

&lt;p&gt;Publishing something in public feels different from building something only for yourself.&lt;/p&gt;

&lt;p&gt;You think more about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project structure&lt;/li&gt;
&lt;li&gt;naming&lt;/li&gt;
&lt;li&gt;documentation&lt;/li&gt;
&lt;li&gt;how other people might use it&lt;/li&gt;
&lt;li&gt;how to keep improving it without making it too messy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am still learning that part, but I think it has already helped me become more careful and more practical as a developer.&lt;/p&gt;

&lt;h2&gt;
  
  
  A small note about AI
&lt;/h2&gt;

&lt;p&gt;I also want to be open about this: I used AI to help write some parts of the code in this project.&lt;/p&gt;

&lt;p&gt;I used AI mostly to speed up some repetitive parts, explore ideas faster, and help me move through certain implementation details. But I still review the code, test things, clean things up, and decide what stays in the project.&lt;/p&gt;

&lt;p&gt;Since this is my first public repo, I think it is better to be honest about that.&lt;/p&gt;

&lt;p&gt;For me, AI was a tool in the process, not a replacement for understanding the project.&lt;/p&gt;

&lt;h2&gt;
  
  
  There is more in the repo than I covered here
&lt;/h2&gt;

&lt;p&gt;I kept this first post simple on purpose.&lt;/p&gt;

&lt;p&gt;There are a lot of commands and features in the repo that I did not cover in this post. If you want to see more, feel free to visit the project and check the README:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;https://github.com/codepurse/SEOCORE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the project looks interesting or useful, I would be very grateful for a star or a fork.&lt;/p&gt;

&lt;p&gt;And if you have an idea for a new feature, see something that can be improved, or want to help fix a bug, feel free to open an issue or create a PR. I would really appreciate that too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;Building this project taught me that making a crawler is not only about collecting pages.&lt;/p&gt;

&lt;p&gt;It is really about turning messy website data into something clear enough that people can use.&lt;/p&gt;

&lt;p&gt;TypeScript was the right choice for me because it is what I know best.&lt;/p&gt;

&lt;p&gt;And making this project public taught me just as much as the code itself.&lt;/p&gt;

&lt;p&gt;If you have built anything similar, or if you work on technical SEO tools, I would love to hear how you think about it.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>typescript</category>
      <category>node</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
