<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matteo Perino</title>
    <description>The latest articles on DEV Community by Matteo Perino (@matte97p).</description>
    <link>https://dev.to/matte97p</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1303945%2Ff86797b7-c27c-433f-8bed-4aa703210ad9.jpeg</url>
      <title>DEV Community: Matteo Perino</title>
      <link>https://dev.to/matte97p</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matte97p"/>
    <language>en</language>
    <item>
      <title>Four small CLIs to make your site visible to AI engines</title>
      <dc:creator>Matteo Perino</dc:creator>
      <pubDate>Mon, 11 May 2026 17:58:23 +0000</pubDate>
      <link>https://dev.to/matte97p/four-small-clis-to-make-your-site-visible-to-ai-engines-eao</link>
      <guid>https://dev.to/matte97p/four-small-clis-to-make-your-site-visible-to-ai-engines-eao</guid>
      <description>&lt;p&gt;Most of the GEO/SEO tooling on the market right now reads like it was written to sell a course, not to solve a problem.&lt;/p&gt;

&lt;p&gt;So I wrote four tools instead.&lt;/p&gt;

&lt;p&gt;Four Node CLIs, zero runtime dependencies, MIT, each one does &lt;strong&gt;one thing&lt;/strong&gt;. They all live under the &lt;a href="https://www.npmjs.com/org/geosuite" rel="noopener noreferrer"&gt;&lt;code&gt;@geosuite&lt;/code&gt;&lt;/a&gt; scope on npm, and the source is at &lt;a href="https://github.com/TryGeoSuite" rel="noopener noreferrer"&gt;github.com/TryGeoSuite&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here's what they do, and the design call behind each one.&lt;/p&gt;




&lt;h2&gt;1. &lt;code&gt;@geosuite/ai-crawler-bots&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; tells you whether GPTBot, ClaudeBot, PerplexityBot, and ~20 other AI crawlers can actually reach your site, and where the block is coming from when they can't.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @geosuite/ai-crawler-bots robots https://your-site.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The non-obvious part: when a request comes back &lt;code&gt;403&lt;/code&gt;, the result distinguishes between an &lt;strong&gt;edge&lt;/strong&gt; block (Cloudflare / CloudFront / Vercel / Akamai / Fastly / Netlify fingerprint in the response) and an &lt;strong&gt;origin&lt;/strong&gt; block (no such fingerprint — your application or web server). The remediation is different in each case: edge means flip a toggle in your CDN dashboard, origin means update a config.&lt;/p&gt;
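&lt;p&gt;You can spot-check the same fingerprint by hand. A plain &lt;code&gt;curl&lt;/code&gt; with an AI crawler's UA token surfaces the signals the tool looks for; Cloudflare, for example, stamps responses with a &lt;code&gt;server: cloudflare&lt;/code&gt; header and a &lt;code&gt;cf-ray&lt;/code&gt; ID. (The tool matches a wider header set; this is just the manual version of the idea.)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Manual spot-check: request as an AI crawler, inspect the response headers.
# Real crawlers send a longer UA string; the token is what most rules match on.
curl -sI -A "GPTBot" https://your-site.com | grep -iE "HTTP/|server:|cf-ray"
# 403 plus "server: cloudflare" / "cf-ray: ...": edge block
# 403 with no CDN fingerprint in the headers:  origin block
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;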

&lt;p&gt;It also parses &lt;code&gt;robots.txt&lt;/code&gt; with line-level provenance, so when a bot is &lt;code&gt;Disallow&lt;/code&gt;ed it tells you &lt;em&gt;which line in which group&lt;/em&gt; is responsible. And it detects the &lt;code&gt;# BEGIN Cloudflare Managed content&lt;/code&gt; … &lt;code&gt;# END Cloudflare Managed Content&lt;/code&gt; markers Cloudflare injects when "Block AI Bots" is enabled — if your own rules would have allowed the bot but the managed block disallows it, the report says so.&lt;/p&gt;
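&lt;p&gt;Concretely, a &lt;code&gt;robots.txt&lt;/code&gt; in that state looks something like this (illustrative, not literal Cloudflare output): your own group allows the bot, the injected managed group disallows it, and the report points at the exact &lt;code&gt;Disallow&lt;/code&gt; line inside the managed group.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Your own rules: would allow GPTBot
User-agent: GPTBot
Allow: /

# BEGIN Cloudflare Managed content
User-agent: GPTBot
Disallow: /
# END Cloudflare Managed Content
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;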

&lt;p&gt;UA strings come from operator docs, not third-party SEO blogs that copy each other. We don't accept entries without a docs link.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqun29nyiuz3i6spcj12.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqun29nyiuz3i6spcj12.png" alt="ai-crawler-bots" width="800" height="459"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;2. &lt;code&gt;@geosuite/schema-templates&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; ships 23 copy-paste-ready schema.org JSON-LD templates plus an offline structural validator.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @geosuite/schema-templates list
npx @geosuite/schema-templates show Product
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JSON-LD is the cheapest, least ambiguous signal you can give an AI assistant about what your page is. It will not on its own make ChatGPT cite you — authority and freshness still matter — but it removes a class of avoidable failures. The AI no longer has to guess your prices, your author, or whether a number on the page is a benchmark or a typo.&lt;/p&gt;
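&lt;p&gt;For reference, a minimal &lt;code&gt;Product&lt;/code&gt; document in that spirit. This is a hand-written illustration of the standard schema.org shape, not the package's literal template:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Acme Widget",
  "description": "A short, factual description of the product.",
  "offers": {
    "@type": "Offer",
    "price": "19.00",
    "priceCurrency": "EUR",
    "availability": "https://schema.org/InStock"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;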

&lt;p&gt;I deliberately excluded fields that aren't truly recommended for each type. Padding templates with every optional schema.org property dilutes the signal. If you need a field that's not there, schema.org is the source of truth — add it yourself.&lt;/p&gt;

&lt;p&gt;There's also &lt;code&gt;geosuite-schema fill &amp;lt;Type&amp;gt; --url &amp;lt;url&amp;gt; --ai&lt;/code&gt; if you want the LLM to populate placeholders from a real page, but the deterministic side (templates + validator) does not need a network or an API key.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i6rbry75kcw2o3widfg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i6rbry75kcw2o3widfg.png" alt="schema-templates" width="800" height="738"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;3. &lt;code&gt;@geosuite/llms-txt-generator&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; turns a &lt;code&gt;sitemap.xml&lt;/code&gt; into an &lt;code&gt;llms.txt&lt;/code&gt; file per the proposed standard at &lt;a href="https://llmstxt.org/" rel="noopener noreferrer"&gt;llmstxt.org&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @geosuite/llms-txt-generator https://your-site.com/sitemap.xml &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Your Site"&lt;/span&gt; &lt;span class="nt"&gt;--enrich&lt;/span&gt; &lt;span class="nt"&gt;--out&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;public/llms.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;llms.txt&lt;/code&gt; is intended to be the LLM-shaped equivalent of a sitemap: a curated, sectioned, markdown index of your most important pages. The format is small enough to be parsed by classical tooling (regex) and also legible to a model — that's the point.&lt;/p&gt;
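&lt;p&gt;If you haven't seen one, the shape per the proposal is roughly this (a hand-written example, not the generator's literal output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Your Site

&amp;gt; One-sentence summary of what the site is and who it's for.

## Docs
- [Quickstart](https://your-site.com/docs/quickstart): Install and first run
- [API reference](https://your-site.com/docs/api): Endpoints and auth

## Optional
- [Changelog](https://your-site.com/changelog)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;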

&lt;p&gt;The generator is deterministic. With &lt;code&gt;--enrich&lt;/code&gt; it fetches each URL once and pulls &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; + &lt;code&gt;&amp;lt;meta name="description"&amp;gt;&lt;/code&gt; via regex. No headless browser, no LLM dependency in the default path. (&lt;code&gt;--ai&lt;/code&gt; is opt-in if you want the LLM to rewrite descriptions; we send only URL + title + meta, never the page body.)&lt;/p&gt;

&lt;p&gt;Sitemap-index files are flattened automatically; pass one exactly as you would a flat sitemap.&lt;/p&gt;
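&lt;p&gt;That is, a standard sitemaps.org index file like the one below is expanded into its child sitemaps before anything else runs:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&amp;gt;
  &amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;https://your-site.com/sitemap-posts.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;
  &amp;lt;sitemap&amp;gt;&amp;lt;loc&amp;gt;https://your-site.com/sitemap-pages.xml&amp;lt;/loc&amp;gt;&amp;lt;/sitemap&amp;gt;
&amp;lt;/sitemapindex&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;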

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fr81zlomq7gs5lftpaf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4fr81zlomq7gs5lftpaf.png" alt="llms-txt-generator" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;4. &lt;code&gt;@geosuite/sitemap-builder&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What it does:&lt;/strong&gt; crawls a site and emits a valid &lt;code&gt;sitemap.xml&lt;/code&gt;. For sites that ship without one (more common than you'd think on custom builds).&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx @geosuite/sitemap-builder https://your-site.com &lt;span class="nt"&gt;--output&lt;/span&gt; sitemap.xml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;BFS, same-origin only, with three caps stacked: page count, depth, and wall-clock budget; whichever fires first wins. It drops obvious non-HTML extensions and fragment-only links. Output is &lt;code&gt;sitemaps.org&lt;/code&gt;-compliant — &lt;code&gt;&amp;lt;loc&amp;gt;&lt;/code&gt; plus optional &lt;code&gt;&amp;lt;lastmod&amp;gt;&lt;/code&gt;, no &lt;code&gt;&amp;lt;changefreq&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;priority&amp;gt;&lt;/code&gt; (both deprecated and ignored by every major engine).&lt;/p&gt;
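&lt;p&gt;Per the sitemaps.org protocol, the emitted file is about as small as a valid sitemap gets (illustrative output):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight xml"&gt;&lt;code&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"&amp;gt;
  &amp;lt;url&amp;gt;
    &amp;lt;loc&amp;gt;https://your-site.com/&amp;lt;/loc&amp;gt;
    &amp;lt;lastmod&amp;gt;2026-05-01&amp;lt;/lastmod&amp;gt;
  &amp;lt;/url&amp;gt;
  &amp;lt;url&amp;gt;
    &amp;lt;loc&amp;gt;https://your-site.com/pricing&amp;lt;/loc&amp;gt;
  &amp;lt;/url&amp;gt;
&amp;lt;/urlset&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;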

&lt;p&gt;Whole tool is around 250 lines of vanilla Node. No &lt;code&gt;puppeteer&lt;/code&gt;, no &lt;code&gt;cheerio&lt;/code&gt;, no &lt;code&gt;axios&lt;/code&gt;. Just &lt;code&gt;node:http&lt;/code&gt;, &lt;code&gt;node:https&lt;/code&gt;, and a few regexes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2jn0yj0tl3p5zoupkro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe2jn0yj0tl3p5zoupkro.png" alt="sitemap-builder" width="800" height="872"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;The design choices, all in one place&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Zero runtime dependencies.&lt;/strong&gt; The four packages combined add essentially no install footprint to your project. The one exception is &lt;code&gt;llms-txt-generator&lt;/code&gt;, which pulls in &lt;code&gt;fast-xml-parser&lt;/code&gt; for the sitemap-index path, because writing your own XML parser is a footgun.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI mode is opt-in.&lt;/strong&gt; Every CLI has a &lt;code&gt;--ai&lt;/code&gt; flag. Without it, behaviour is fully deterministic. With it, payloads are minimal and structured (verdicts, titles, depths) — never raw HTML or page bodies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One tool, one job.&lt;/strong&gt; Composable via stdout/JSON. If you want to chain &lt;code&gt;sitemap-builder&lt;/code&gt; into &lt;code&gt;llms-txt-generator&lt;/code&gt;, that's a single pipe (a sketch follows this list).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Boring code.&lt;/strong&gt; No clever metaprogramming. The whole stack is meant to be readable in an afternoon. If it isn't, that's a bug, not a feature.&lt;/li&gt;
&lt;/ul&gt;
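&lt;p&gt;A conservative way to do that chaining, using only the flags shown earlier in this post (whether the two also compose over a literal stdin/stdout pipe depends on flags I haven't covered here):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1. Crawl the site and write a sitemap
npx @geosuite/sitemap-builder https://your-site.com --output public/sitemap.xml

# 2. Deploy it, then point the generator at the hosted sitemap
npx @geosuite/llms-txt-generator https://your-site.com/sitemap.xml \
  --name="Your Site" --out=public/llms.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;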




&lt;h2&gt;Why open-source the building blocks&lt;/h2&gt;

&lt;p&gt;The same checks power &lt;a href="https://trygeosuite.it" rel="noopener noreferrer"&gt;GeoSuite&lt;/a&gt;, the hosted product I'm building (history, alerts, dashboards, integrations into your content pipeline). But the building blocks belong in the open: I find it dishonest to sell a black box that does things any developer could verify.&lt;/p&gt;

&lt;p&gt;If you find a bot UA missing — or worse, a wrong one — the place to send it is &lt;code&gt;bots.json&lt;/code&gt; in &lt;a href="https://github.com/TryGeoSuite/ai-crawler-bots" rel="noopener noreferrer"&gt;&lt;code&gt;ai-crawler-bots&lt;/code&gt;&lt;/a&gt;, with a link to the operator's docs. UA strings drift a couple of times per year per operator, and that file ages faster than anything else in the suite.&lt;/p&gt;
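&lt;p&gt;A useful submission carries three things: the bot's name, the documented UA token, and the operator docs URL that proves it. Something like the sketch below; the field names here are my shorthand, the actual schema in the repo's &lt;code&gt;bots.json&lt;/code&gt; is authoritative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;{
  "name": "ExampleBot",
  "operator": "Example AI, Inc.",
  "userAgentToken": "ExampleBot",
  "docs": "https://example.com/docs/crawlers"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;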

&lt;p&gt;PRs and issues welcome. Especially the ones that prove me wrong.&lt;/p&gt;

&lt;p&gt;→ &lt;a href="https://github.com/TryGeoSuite" rel="noopener noreferrer"&gt;github.com/TryGeoSuite&lt;/a&gt;&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>ai</category>
      <category>geo</category>
      <category>generativeengineoptimization</category>
    </item>
  </channel>
</rss>
