<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Benoît Eveillard</title>
    <description>The latest articles on DEV Community by Benoît Eveillard (@benoiteveillard).</description>
    <link>https://dev.to/benoiteveillard</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1126312%2F07cfcf42-d204-4832-8594-bd20a3a7b917.png</url>
      <title>DEV Community: Benoît Eveillard</title>
      <link>https://dev.to/benoiteveillard</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/benoiteveillard"/>
    <language>en</language>
    <item>
      <title>Stop letting AI bots crawl your site blindly: Build an llms.txt in seconds</title>
      <dc:creator>Benoît Eveillard</dc:creator>
      <pubDate>Mon, 19 Jan 2026 09:19:28 +0000</pubDate>
      <link>https://dev.to/benoiteveillard/stop-letting-ai-bots-crawl-your-site-blindly-build-an-llmstxt-in-seconds-3ega</link>
      <guid>https://dev.to/benoiteveillard/stop-letting-ai-bots-crawl-your-site-blindly-build-an-llmstxt-in-seconds-3ega</guid>
      <description>&lt;p&gt;The robots.txt file was the hero of the 2000s. It told Google and Bing where to go. But in 2025, we have a new challenge: LLMs and AI agents.&lt;/p&gt;

&lt;p&gt;AI tools like ChatGPT, Claude, and specialized coding agents are constantly trying to understand our websites. If they have to scrape every single HTML page to find information, they waste your bandwidth, "hallucinate" structure, and get lost in the noise of your footer and navigation tags.&lt;/p&gt;

&lt;p&gt;That’s why the llms.txt standard (proposed by the folks at Answer.ai) is becoming a must-have.&lt;/p&gt;

&lt;p&gt;I decided to build the fastest way to generate one: an Apify Actor that turns your sitemap into a clean, LLM-ready roadmap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What I built&lt;/strong&gt;&lt;br&gt;
I created an automated llms.txt Generator. Instead of manually writing your site map for AI, this tool does the heavy lifting:&lt;/p&gt;

&lt;p&gt;Sitemap Deep-Dive: It doesn't just read one file; it recursively follows sitemap indexes.&lt;/p&gt;

&lt;p&gt;Smart Metadata Extraction: It pulls page titles and meta descriptions to give the AI context for every link.&lt;/p&gt;

&lt;p&gt;Glob Filtering: You can easily exclude /tags/*, /admin/*, or legal pages that just clutter the AI's context window.&lt;/p&gt;

&lt;p&gt;Polite &amp;amp; Ethical: It respects robots.txt by default and allows you to set concurrency limits so you don't stress your server.&lt;/p&gt;
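
&lt;p&gt;To make the sitemap deep-dive concrete, here is a minimal sketch of recursive sitemap traversal. It is illustrative, not the Actor's actual source, and the regex-based parseSitemap helper is a stand-in for a proper XML parser:&lt;/p&gt;

```javascript
// Sketch of recursive sitemap traversal (illustrative, not the Actor's source).
// A <sitemapindex> lists child sitemaps to recurse into; a <urlset> lists pages.
function parseSitemap(xml) {
  const grab = (tag) =>
    [...xml.matchAll(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`, 'g'))].map((m) => m[1]);
  const isIndex = /<sitemapindex[\s>]/.test(xml);
  const locs = grab('loc').map((s) => s.trim());
  return { isIndex, locs }; // caller recurses when isIndex is true
}

async function collectUrls(sitemapUrl, seen = new Set()) {
  if (seen.has(sitemapUrl)) return []; // cycle guard
  seen.add(sitemapUrl);
  const { isIndex, locs } = parseSitemap(await (await fetch(sitemapUrl)).text());
  if (!isIndex) return locs; // plain urlset: these are page URLs
  const nested = await Promise.all(locs.map((u) => collectUrls(u, seen)));
  return nested.flat();
}
```

&lt;p&gt;A real crawler would also check robots.txt and cap concurrency before fetching, as the Actor does by default.&lt;/p&gt;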

&lt;p&gt;&lt;strong&gt;Why you need this&lt;/strong&gt;&lt;br&gt;
If you have a documentation site, a blog, or a product landing page, an llms.txt file at your site root (e.g., mysite.com/llms.txt) allows AI agents to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Understand your site structure in milliseconds&lt;/li&gt;
&lt;li&gt;Avoid scraping unnecessary pages&lt;/li&gt;
&lt;li&gt;Provide better answers to users asking questions about your content&lt;/li&gt;
&lt;/ul&gt;
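
&lt;p&gt;For reference, the llms.txt proposal is plain Markdown: an H1 with the site name, an optional blockquote summary, then H2 sections of links with short descriptions. A hypothetical generated file (the site name and URLs here are placeholders) might look like:&lt;/p&gt;

```markdown
# MySite

> Developer documentation and blog for MySite.

## Docs

- [Getting Started](https://mysite.com/docs/start): Install and first steps
- [API Reference](https://mysite.com/docs/api): Endpoints and authentication

## Blog

- [Launch post](https://mysite.com/blog/launch): Why we built MySite
```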

&lt;p&gt;&lt;strong&gt;Performance &amp;amp; Cost&lt;/strong&gt;&lt;br&gt;
Since I built this using Crawlee and Cheerio (no heavy headless browsers needed), it’s incredibly fast and cheap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How to use it&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Head over to the Apify llms.txt Generator&lt;/li&gt;
&lt;li&gt;Paste your sitemap URL&lt;/li&gt;
&lt;li&gt;Run it&lt;/li&gt;
&lt;li&gt;Get your direct download link and host it on your site!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The project is live on the Apify Store. I'd love to hear what you think!&lt;br&gt;
Check out the tool here: &lt;a href="https://apify.com/justa/llms-txt-file-generator" rel="noopener noreferrer"&gt;https://apify.com/justa/llms-txt-file-generator&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
    </item>
    <item>
      <title>How I Built a BuiltWith Alternative with 7,000+ Technology Signatures</title>
      <dc:creator>Benoît Eveillard</dc:creator>
      <pubDate>Tue, 13 Jan 2026 13:25:31 +0000</pubDate>
      <link>https://dev.to/benoiteveillard/how-i-built-a-builtwith-alternative-with-7000-technology-signatures-9lg</link>
      <guid>https://dev.to/benoiteveillard/how-i-built-a-builtwith-alternative-with-7000-technology-signatures-9lg</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;I needed to detect what technologies websites use. The existing options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;BuiltWith&lt;/strong&gt;: Comprehensive but $295+/month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wappalyzer&lt;/strong&gt;: Solid, but its API and signature set are limited&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual inspection&lt;/strong&gt;: Time-consuming and incomplete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So I built my own.&lt;/p&gt;

&lt;h2&gt;The Solution: 8-Tier Deep Inspection&lt;/h2&gt;

&lt;p&gt;Most detection tools only look at HTML. I went deeper with 8 tiers:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;What It Checks&lt;/th&gt;
&lt;th&gt;Example Detections&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;HTTP Headers&lt;/td&gt;
&lt;td&gt;Cloudflare, nginx, security headers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;DOM/HTML&lt;/td&gt;
&lt;td&gt;Meta tags, script sources, CSS classes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;JavaScript&lt;/td&gt;
&lt;td&gt;Global variables (React, Vue, jQuery)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;XHR/Fetch requests to analytics, CDNs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;localStorage keys (Redux, auth tokens)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;PWA&lt;/td&gt;
&lt;td&gt;Service workers, manifest.json&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;WebSocket&lt;/td&gt;
&lt;td&gt;Real-time connections (Socket.io)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;Cookies&lt;/td&gt;
&lt;td&gt;Cookie names (tracking, auth)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
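
&lt;p&gt;As an example of tier 3, here is a simplified sketch of probing well-known globals in the page context via Playwright's page.evaluate. The signature list is illustrative, not my real database:&lt;/p&gt;

```javascript
// Simplified tier-3 check: probe well-known globals inside the page context.
// The probes run in the browser, so they can reference `window` directly.
const JS_SIGNATURES = [
  { name: 'React',  probe: () => !!(window.React || window.__REACT_DEVTOOLS_GLOBAL_HOOK__) },
  { name: 'Vue',    probe: () => !!(window.Vue || window.__VUE__) },
  { name: 'jQuery', probe: () => typeof window.jQuery === 'function' },
];

async function detectJsGlobals(page) {
  // Each probe function is serialized and evaluated in the page context.
  const results = await Promise.all(
    JS_SIGNATURES.map(async ({ name, probe }) => ({
      name,
      hit: await page.evaluate(probe),
    }))
  );
  return results.filter((r) => r.hit).map((r) => r.name);
}
```

&lt;p&gt;The other tiers follow the same pattern: a signature table on the Node side, a cheap probe on the page side.&lt;/p&gt;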

&lt;h2&gt;Tech Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Runtime&lt;/strong&gt;: Node.js&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Browser&lt;/strong&gt;: Playwright + Camoufox (stealth mode)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signatures&lt;/strong&gt;: 7,000+ patterns (Wappalyzer-compatible + custom)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hosting&lt;/strong&gt;: Apify platform&lt;/li&gt;
&lt;/ul&gt;
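
&lt;p&gt;Tier 1 works like Wappalyzer-style matching: regex signatures tested against response headers. A minimal, illustrative sketch (these two signatures are plausible but not taken from my real database):&lt;/p&gt;

```javascript
// Sketch of tier-1 matching: header-name -> regex signatures per technology.
const HEADER_SIGNATURES = {
  Cloudflare: { server: /cloudflare/i, 'cf-ray': /./ },
  nginx:      { server: /nginx/i },
};

function matchHeaders(headers) {
  // Normalize header names to lowercase before matching.
  const lower = Object.fromEntries(
    Object.entries(headers).map(([k, v]) => [k.toLowerCase(), v])
  );
  return Object.entries(HEADER_SIGNATURES)
    .filter(([, sig]) =>
      Object.entries(sig).some(([h, re]) => lower[h] !== undefined && re.test(lower[h]))
    )
    .map(([name]) => name);
}
```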

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;Scanning a typical site detects 10-15 technologies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CMS/Builder (Webflow, WordPress)&lt;/li&gt;
&lt;li&gt;Analytics (GA4, Plausible)&lt;/li&gt;
&lt;li&gt;Marketing (GTM, HubSpot)&lt;/li&gt;
&lt;li&gt;Infrastructure (Cloudflare, AWS)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Try It&lt;/h2&gt;

&lt;p&gt;Live on Apify: &lt;a href="https://apify.com/justa/technology-profiling-engine" rel="noopener noreferrer"&gt;https://apify.com/justa/technology-profiling-engine&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Pricing: $0.005 per URL (~$5 per 1,000 URLs)&lt;/p&gt;

&lt;p&gt;What technologies do you wish were easier to detect? Let me know in the comments!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>showdev</category>
      <category>tooling</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
