<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Alfon</title>
    <description>The latest articles on DEV Community by Alfon (@codepurse).</description>
    <link>https://dev.to/codepurse</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F188074%2Fa7193e5c-156b-4267-8490-27bd84756332.jpeg</url>
      <title>DEV Community: Alfon</title>
      <link>https://dev.to/codepurse</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/codepurse"/>
    <language>en</language>
    <item>
      <title>Building an SEO crawler in TypeScript: what I learned</title>
      <dc:creator>Alfon</dc:creator>
      <pubDate>Thu, 28 May 2026 08:32:17 +0000</pubDate>
      <link>https://dev.to/codepurse/building-an-seo-crawler-in-typescript-what-i-learned-1doo</link>
      <guid>https://dev.to/codepurse/building-an-seo-crawler-in-typescript-what-i-learned-1doo</guid>
      <description>&lt;p&gt;I have been working on a project called &lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;SEOCore&lt;/a&gt;, which is an SEO crawler and audit CLI built with TypeScript.&lt;/p&gt;

&lt;p&gt;It is also my first public repository, so this project means a lot to me.&lt;/p&gt;

&lt;p&gt;Building it has been a mix of learning in public, solving real problems, making mistakes, and slowly improving things over time.&lt;/p&gt;

&lt;p&gt;I chose TypeScript for a simple reason: it is the language I know best and already use most of the time.&lt;/p&gt;

&lt;p&gt;I wanted to focus on building the crawler and the audit logic, not on learning a new language at the same time.&lt;/p&gt;

&lt;p&gt;What started as a small idea turned into a much bigger project than I expected.&lt;/p&gt;

&lt;p&gt;At first, I only wanted a tool that could crawl pages, check a few SEO basics, and show useful results in the terminal. But while building it, I kept finding more things I wanted to add.&lt;/p&gt;

&lt;p&gt;That is how the project slowly grew into something that can do much more than a basic crawl.&lt;/p&gt;

&lt;h2&gt;Why I started building it&lt;/h2&gt;

&lt;p&gt;There are already many SEO tools out there, but I wanted to build something that felt more natural for developers.&lt;/p&gt;

&lt;p&gt;I wanted a tool that could:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;run from the command line&lt;/li&gt;
&lt;li&gt;fit into a normal Node.js workflow&lt;/li&gt;
&lt;li&gt;be easier to extend&lt;/li&gt;
&lt;li&gt;help debug technical SEO problems&lt;/li&gt;
&lt;li&gt;be useful for automation, not just manual checking&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also liked the idea of understanding how these tools work instead of only using them from the outside.&lt;/p&gt;

&lt;h2&gt;Why TypeScript worked well&lt;/h2&gt;

&lt;p&gt;Even though I picked TypeScript because I know it well, it also turned out to be a good fit for this kind of project.&lt;/p&gt;

&lt;p&gt;SEO audits deal with a lot of different kinds of data at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTML&lt;/li&gt;
&lt;li&gt;headers&lt;/li&gt;
&lt;li&gt;metadata&lt;/li&gt;
&lt;li&gt;links&lt;/li&gt;
&lt;li&gt;redirects&lt;/li&gt;
&lt;li&gt;structured data&lt;/li&gt;
&lt;li&gt;performance signals&lt;/li&gt;
&lt;li&gt;crawl rules&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That can get messy very quickly.&lt;/p&gt;

&lt;p&gt;TypeScript helped me keep the code more organized. It also made it easier to catch mistakes early and split the project into smaller parts as it grew.&lt;/p&gt;

&lt;p&gt;So the choice started from familiarity, but it ended up being practical too.&lt;/p&gt;
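
&lt;p&gt;As a rough illustration of what "catching mistakes early" can look like, here is a minimal sketch that models a few of the data kinds above as a discriminated union. The &lt;code&gt;Signal&lt;/code&gt; type, its fields, and &lt;code&gt;describeSignal&lt;/code&gt; are hypothetical names made up for this example, not SEOCore's actual code.&lt;/p&gt;

```typescript
// Hypothetical signal shapes for illustration; SEOCore's real types may differ.
type Signal =
  | { kind: "redirect"; from: string; to: string; status: number }
  | { kind: "meta"; name: string; content: string }
  | { kind: "link"; href: string; internal: boolean }
  | { kind: "structured-data"; schemaType: string };

// Turn any signal into one line of terminal output.
function describeSignal(signal: Signal): string {
  switch (signal.kind) {
    case "redirect":
      return `${signal.status} ${signal.from} -> ${signal.to}`;
    case "meta":
      return `meta ${signal.name}: ${signal.content}`;
    case "link":
      return `${signal.internal ? "internal" : "external"} link ${signal.href}`;
    case "structured-data":
      return `schema ${signal.schemaType}`;
    default: {
      // If a new kind is added to Signal but not handled above,
      // this assignment stops compiling.
      const unhandled: never = signal;
      return unhandled;
    }
  }
}

// prints "301 /old -> /new"
console.log(describeSignal({ kind: "redirect", from: "/old", to: "/new", status: 301 }));
```

&lt;p&gt;The point of the &lt;code&gt;never&lt;/code&gt; check is that adding a new kind of data turns a forgotten case into a compile error instead of a silent gap in the report.&lt;/p&gt;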

&lt;h2&gt;The crawler was only one part of the job&lt;/h2&gt;

&lt;p&gt;One thing I learned early is that crawling pages is only the beginning.&lt;/p&gt;

&lt;p&gt;Fetching a page and following links is not the hardest part. The harder part is deciding what to do with the data after that.&lt;/p&gt;

&lt;p&gt;A useful audit tool needs to understand more than just status codes.&lt;/p&gt;

&lt;p&gt;It needs to look at things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;canonical tags&lt;/li&gt;
&lt;li&gt;headings&lt;/li&gt;
&lt;li&gt;meta titles and descriptions&lt;/li&gt;
&lt;li&gt;internal links&lt;/li&gt;
&lt;li&gt;redirects&lt;/li&gt;
&lt;li&gt;schema markup&lt;/li&gt;
&lt;li&gt;image issues&lt;/li&gt;
&lt;li&gt;page structure&lt;/li&gt;
&lt;li&gt;JavaScript-rendered content&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That changed how I thought about the whole project.&lt;/p&gt;

&lt;p&gt;It stopped feeling like "just a crawler" and started feeling more like a small analysis engine.&lt;/p&gt;
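
&lt;p&gt;To make the "analysis engine" idea concrete, here is a small sketch of a few page-level checks written as one pure function. The &lt;code&gt;PageData&lt;/code&gt; shape, the rule names, and which conditions count as issues are all assumptions for this example and may not match what SEOCore reports.&lt;/p&gt;

```typescript
// Hypothetical page shape; field names are assumptions for this sketch.
interface PageData {
  url: string;
  title: string | null;
  metaDescription: string | null;
  canonical: string | null;
  h1Count: number;
}

interface Issue {
  rule: string;
  message: string;
}

// True when a value is missing or contains only whitespace.
function isBlank(value: string | null): boolean {
  return value === null || value.trim() === "";
}

// Run a handful of basic checks against one already-parsed page.
function auditPage(page: PageData): Issue[] {
  const issues: Issue[] = [];
  if (isBlank(page.title)) {
    issues.push({ rule: "missing-title", message: "Page has no title" });
  }
  if (isBlank(page.metaDescription)) {
    issues.push({ rule: "missing-description", message: "Page has no meta description" });
  }
  if (page.canonical === null) {
    issues.push({ rule: "missing-canonical", message: "Page has no canonical tag" });
  } else if (page.canonical !== page.url) {
    // Not always a problem, but worth surfacing for review.
    issues.push({ rule: "canonical-points-elsewhere", message: `Canonical is ${page.canonical}` });
  }
  if (page.h1Count !== 1) {
    issues.push({ rule: "h1-count", message: `Expected one h1, found ${page.h1Count}` });
  }
  return issues;
}
```

&lt;p&gt;Keeping the checks separate from fetching means the same function can run on data from a plain HTTP request or from a rendered page.&lt;/p&gt;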

&lt;h2&gt;Keeping the CLI simple mattered a lot&lt;/h2&gt;

&lt;p&gt;Another thing I learned is that even a useful tool becomes hard to use if the interface feels confusing.&lt;/p&gt;

&lt;p&gt;So I tried to keep the commands simple.&lt;/p&gt;

&lt;p&gt;For example, a basic audit can look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;seocore audit https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And if I want to check how JavaScript changes the page for SEO, I can run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;seocore js-impact https://example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That may seem small, but clear commands make a huge difference.&lt;/p&gt;

&lt;p&gt;It also made me think more carefully about naming, output, and what people actually need when they use a CLI tool.&lt;/p&gt;

&lt;h2&gt;SEO data gets noisy very fast&lt;/h2&gt;

&lt;p&gt;This was probably one of the biggest lessons for me.&lt;/p&gt;

&lt;p&gt;It is easy to collect data.&lt;/p&gt;

&lt;p&gt;It is much harder to turn that data into something useful.&lt;/p&gt;

&lt;p&gt;A crawler can quickly generate too much output:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated warnings&lt;/li&gt;
&lt;li&gt;weak signals&lt;/li&gt;
&lt;li&gt;low-confidence guesses&lt;/li&gt;
&lt;li&gt;too many things that are technically true but not actually helpful&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That pushed me to spend more time on structure, scoring, filtering, and presenting the results more clearly.&lt;/p&gt;

&lt;p&gt;I think that became one of the most important parts of the project.&lt;/p&gt;

&lt;p&gt;In the end, better output is often more useful than more output.&lt;/p&gt;
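
&lt;p&gt;One simple way to cut this kind of noise is to drop low-confidence findings and collapse repeated warnings into one entry per rule. The sketch below assumes each finding carries a &lt;code&gt;confidence&lt;/code&gt; score between 0 and 1. That score, the &lt;code&gt;Finding&lt;/code&gt; shape, and &lt;code&gt;groupFindings&lt;/code&gt; are hypothetical and only illustrate the idea, not how SEOCore scores results.&lt;/p&gt;

```typescript
// Hypothetical finding shape; the confidence score is an assumption.
interface Finding {
  rule: string;
  url: string;
  confidence: number; // 0 to 1
}

interface FindingGroup {
  rule: string;
  urls: string[];
}

// Drop weak findings, then merge repeats so each rule appears once.
function groupFindings(findings: Finding[], minConfidence: number): FindingGroup[] {
  // Null-prototype object so rule names never collide with built-in keys.
  const byRule: { [rule: string]: FindingGroup } = Object.create(null);
  for (const finding of findings) {
    if (finding.confidence >= minConfidence) {
      if (byRule[finding.rule] === undefined) {
        byRule[finding.rule] = { rule: finding.rule, urls: [] };
      }
      const group = byRule[finding.rule];
      // Record each affected URL only once per rule.
      if (group.urls.indexOf(finding.url) === -1) {
        group.urls.push(finding.url);
      }
    }
  }
  // Rules that affect the most pages come first.
  return Object.keys(byRule)
    .map((rule) => byRule[rule])
    .sort((a, b) => b.urls.length - a.urls.length);
}
```

&lt;p&gt;With a shape like this, the terminal can show one line per rule with a page count instead of the same warning repeated for every URL.&lt;/p&gt;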

&lt;h2&gt;JavaScript made things more interesting&lt;/h2&gt;

&lt;p&gt;Modern websites made this project more challenging.&lt;/p&gt;

&lt;p&gt;A simple HTML check is still useful, but many pages now depend heavily on JavaScript. Sometimes the HTML the server sends first is very different from the DOM that exists after rendering.&lt;/p&gt;

&lt;p&gt;Because of that, I added Playwright-based checks for deeper analysis.&lt;/p&gt;

&lt;p&gt;That made it possible to compare:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;raw HTML&lt;/li&gt;
&lt;li&gt;rendered DOM&lt;/li&gt;
&lt;li&gt;metadata before and after rendering&lt;/li&gt;
&lt;li&gt;links that only appear after JavaScript&lt;/li&gt;
&lt;li&gt;structured data added on the client side&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ended up being one of the parts I found most interesting, because it helps explain why a page may look fine in the browser but still have SEO problems.&lt;/p&gt;
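
&lt;p&gt;The comparison itself can be sketched without a browser. The example below assumes two snapshots already exist, one extracted from the raw HTML and one from the rendered DOM, and only shows the diff step. The &lt;code&gt;Snapshot&lt;/code&gt; and &lt;code&gt;JsImpact&lt;/code&gt; shapes and &lt;code&gt;compareSnapshots&lt;/code&gt; are hypothetical names for this post, not the implementation behind &lt;code&gt;seocore js-impact&lt;/code&gt;.&lt;/p&gt;

```typescript
// Hypothetical snapshot of the SEO-relevant parts of one page.
interface Snapshot {
  title: string | null;
  links: string[];
  schemaTypes: string[];
}

interface JsImpact {
  titleChanged: boolean;
  linksOnlyAfterRender: string[];
  linksLostAfterRender: string[];
  schemaOnlyAfterRender: string[];
}

// Items present in `after` but missing from `before`.
function addedItems(before: string[], after: string[]): string[] {
  return after.filter((item) => before.indexOf(item) === -1);
}

// Compare what the server sent with what exists after JavaScript ran.
function compareSnapshots(raw: Snapshot, rendered: Snapshot): JsImpact {
  return {
    titleChanged: raw.title !== rendered.title,
    linksOnlyAfterRender: addedItems(raw.links, rendered.links),
    linksLostAfterRender: addedItems(rendered.links, raw.links),
    schemaOnlyAfterRender: addedItems(raw.schemaTypes, rendered.schemaTypes),
  };
}
```

&lt;p&gt;Links or structured data that show up only in the rendered snapshot are the ones a crawler that does not run JavaScript would never see.&lt;/p&gt;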

&lt;h2&gt;Building in public taught me a lot&lt;/h2&gt;

&lt;p&gt;Since this is my first public repository, I also learned things that are not only about code.&lt;/p&gt;

&lt;p&gt;Publishing something in public feels different from building something only for yourself.&lt;/p&gt;

&lt;p&gt;You think more about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;project structure&lt;/li&gt;
&lt;li&gt;naming&lt;/li&gt;
&lt;li&gt;documentation&lt;/li&gt;
&lt;li&gt;how other people might use it&lt;/li&gt;
&lt;li&gt;how to keep improving it without making it too messy&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I am still learning that part, but I think it has already helped me become more careful and more practical as a developer.&lt;/p&gt;

&lt;h2&gt;A small note about AI&lt;/h2&gt;

&lt;p&gt;I also want to be open about this: I used AI to help write some parts of the code in this project.&lt;/p&gt;

&lt;p&gt;I used AI mostly to speed up some repetitive parts, explore ideas faster, and help me move through certain implementation details. But I still review the code, test things, clean things up, and decide what stays in the project.&lt;/p&gt;

&lt;p&gt;Since this is my first public repo, I think it is better to be honest about that.&lt;/p&gt;

&lt;p&gt;For me, AI was a tool in the process, not a replacement for understanding the project.&lt;/p&gt;

&lt;h2&gt;There is more in the repo than I covered here&lt;/h2&gt;

&lt;p&gt;I kept this first post simple on purpose.&lt;/p&gt;

&lt;p&gt;There are a lot of commands and features in the repo that I did not cover in this post. If you want to see more, feel free to visit the project and check the README:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/codepurse/SEOCORE" rel="noopener noreferrer"&gt;https://github.com/codepurse/SEOCORE&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the project looks interesting or useful, I would be very grateful for a star or a fork.&lt;/p&gt;

&lt;p&gt;And if you have an idea for a new feature, see something that can be improved, or want to help fix a bug, feel free to open an issue or create a PR. I would really appreciate that too.&lt;/p&gt;

&lt;h2&gt;Final thoughts&lt;/h2&gt;

&lt;p&gt;Building this project taught me that making a crawler is not only about collecting pages.&lt;/p&gt;

&lt;p&gt;It is really about turning messy website data into something clear enough that people can use.&lt;/p&gt;

&lt;p&gt;TypeScript was the right choice for me because it is what I know best, and it kept the code organized as the project grew.&lt;/p&gt;

&lt;p&gt;And making this project public taught me just as much as the code itself.&lt;/p&gt;

&lt;p&gt;If you have built anything similar, or if you work on technical SEO tools, I would love to hear how you think about it.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>typescript</category>
      <category>node</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
