Made Büro

CLI for AI Search Optimization in Node.js projects

If you've been following the AI search space, you've probably heard about llms.txt – a simple Markdown file that tells AI crawlers (ChatGPT, Claude, Perplexity, Gemini, DeepSeek, etc.) what your site is about and what content they're allowed to use. Think of it as robots.txt, but designed for the LLM era.

In this post I'll walk through how I built a CLI tool – geo-ai-cli – that generates and validates these files for any Node.js project, and share some of the interesting design decisions along the way.

What is llms.txt?

The llms.txt proposal defines a standard for exposing structured site content to AI systems. Two files:

  • llms.txt – a compact summary: site name, description, and a list of key pages with titles and URLs
  • llms-full.txt – the same, but with the full content of each page (a rough sketch follows the example below)

A minimal llms.txt looks like this:

# My SaaS Product

> A brief description for AI crawlers.

## Pages

- [Home](https://example.com/): Welcome page
- [Pricing](https://example.com/pricing): Plans and pricing

## Blog

- [Getting Started](https://example.com/blog/start): First steps guide

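For comparison, llms-full.txt follows the same outline but inlines the full content of each page under its heading. The exact layout is up to the generator, so treat this as an illustrative sketch rather than the precise output:

# My SaaS Product

> A brief description for AI crawlers.

## Pages

### Home
URL: https://example.com/

Full Markdown content of the welcome page goes here...

### Pricing
URL: https://example.com/pricing

Full Markdown content of the pricing page goes here...
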
AI search engines like Perplexity, Claude and ChatGPT browse these files to understand your site's content and include it in answers. It's essentially GEO – Generative Engine Optimization.

The CLI

npm install --save-dev geo-ai-cli

Four commands cover the full workflow:

npx geo-ai init      # scaffold a config file
npx geo-ai generate  # generate llms.txt + llms-full.txt
npx geo-ai validate  # check output files are valid
npx geo-ai inspect   # preview config and crawler rules

init
Scaffolds a geo-ai.config.ts in the current directory. If a config already exists, it exits without overwriting – safe to run multiple times.

// geo-ai.config.ts
import type { GeoAIConfig } from 'geo-ai-core';

export default {
  siteName: 'My Site',
  siteUrl: 'https://example.com',
  siteDescription: 'A brief description for AI crawlers.',
  crawlers: 'all',
  provider: {
    Pages: [
      { title: 'Home', url: 'https://example.com/', description: 'Welcome page' },
    ],
    Blog: [
      { title: 'Getting Started', url: 'https://example.com/blog/start', description: 'First steps' },
    ],
  },
} satisfies GeoAIConfig;

The provider field is a plain object where each key becomes a section in llms.txt. You can also pass a ContentProvider instance if you want to pull data from a CMS or API dynamically.

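For example, you can build the provider object from a headless CMS at config-evaluation time. This is a minimal sketch that assumes a hypothetical JSON endpoint at cms.example.com; the real ContentProvider interface in geo-ai-core may give you a cleaner hook for this, so check the docs before copying it:

// geo-ai.config.ts – building provider sections from a CMS (illustrative sketch)
import type { GeoAIConfig } from 'geo-ai-core';

// Hypothetical CMS endpoint and response shape – adjust to your backend.
const posts: { title: string; slug: string; excerpt: string }[] = await fetch(
  'https://cms.example.com/api/posts'
).then((res) => res.json());

export default {
  siteName: 'My Site',
  siteUrl: 'https://example.com',
  siteDescription: 'A brief description for AI crawlers.',
  crawlers: 'all',
  provider: {
    Blog: posts.map((post) => ({
      title: post.title,
      url: `https://example.com/blog/${post.slug}`,
      description: post.excerpt,
    })),
  },
} satisfies GeoAIConfig;
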
generate
Loads your config, calls createGeoAI(config) from geo-ai-core, and writes both files to ./public (or wherever you point --out):

npx geo-ai generate --out ./dist/public

Config discovery follows a priority order: geo-ai.config.ts → geo-ai.config.js → geo-ai.config.json. TypeScript configs are loaded via dynamic import(), so you get full type safety without a separate compilation step.

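Conceptually, the discovery step is just: walk the candidates in order and import the first one that exists. A simplified sketch of the idea (not the CLI's actual source):

// Simplified sketch of config discovery – not the actual geo-ai-cli source.
import { existsSync, readFileSync } from 'node:fs';
import { resolve } from 'node:path';
import { pathToFileURL } from 'node:url';

const CANDIDATES = ['geo-ai.config.ts', 'geo-ai.config.js', 'geo-ai.config.json'];

async function loadConfig(cwd = process.cwd()) {
  for (const name of CANDIDATES) {
    const file = resolve(cwd, name);
    if (!existsSync(file)) continue;
    if (file.endsWith('.json')) return JSON.parse(readFileSync(file, 'utf8'));
    // .ts configs rely on a runtime that can import TypeScript directly (e.g. tsx).
    const mod = await import(pathToFileURL(file).href);
    return mod.default ?? mod;
  }
  throw new Error('No geo-ai config found – run `npx geo-ai init` first.');
}
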
validate
Checks that the generated files are present and structurally valid. Works on local files or a live URL:

# local
npx geo-ai validate --path ./public

# remote
npx geo-ai validate --url https://example.com

Validation rules are intentionally simple (see the sketch after this list):

  • file missing: not_found
  • content < 50 chars: warn
  • doesn't start with #: fail
  • starts with #: pass

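A simplified per-file check along those lines might look like this – illustrative only, not the CLI's actual implementation:

// Simplified sketch of the per-file validation rule – illustrative only.
import { existsSync, readFileSync } from 'node:fs';

type Status = 'pass' | 'warn' | 'fail' | 'not_found';

function checkFile(path: string): Status {
  if (!existsSync(path)) return 'not_found';
  const content = readFileSync(path, 'utf8').trim();
  if (!content.startsWith('#')) return 'fail'; // must open with a Markdown H1
  if (content.length < 50) return 'warn';      // present, but suspiciously short
  return 'pass';
}
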
Exit code 1 on any fail or not_found, so it integrates cleanly into CI:

# .github/workflows/deploy.yml
- run: npx geo-ai generate
- run: npx geo-ai validate

inspect
Prints a human-readable summary of your config: site info, which AI bots are allowed/disallowed, and how many items are in each section. Useful for debugging before you generate.

npx geo-ai inspect

# Site: My Site
# URL:  https://example.com
# Out:  ./public
# Crawlers:
#   GPTBot: allow
#   ClaudeBot: allow
#   PerplexityBot: allow
#   ...
# Sections:
#   Pages: 3 item(s)
#   Blog: 12 item(s)

You can also point it at a live URL to fetch and display the remote files:

npx geo-ai inspect --url https://example.com

Adding it to your build pipeline

The most useful place for geo-ai generate is right before your deployment step:

// package.json
{
  "scripts": {
    "build": "next build && geo-ai generate",
    "postbuild": "geo-ai validate"
  }
}

Or in CI:

- name: Build
  run: npm run build

- name: Generate llms.txt
  run: npx geo-ai generate

- name: Validate
  run: npx geo-ai validate --url ${{ env.SITE_URL }}

Controlling which AI bots can crawl

By default, crawlers: 'all' allows all 16+ known AI bots. You can get granular:

export default {
  siteName: 'My Site',
  siteUrl: 'https://example.com',
  crawlers: {
    GPTBot: 'allow',
    ClaudeBot: 'allow',
    PerplexityBot: 'allow',
    'Google-Extended': 'disallow', // opt out of Gemini training
    Bytespider: 'disallow',        // TikTok crawler
  },
  provider: { /* ... */ },
} satisfies GeoAIConfig;

The geo-ai-core engine uses this to generate a corresponding robots.txt block and per-bot allow/disallow rules in the llms files.

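With the config above, the emitted robots.txt rules would look roughly like this (illustrative – the exact output format may differ between versions):

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
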
What's next

The CLI is part of a larger ecosystem:

  • geo-ai-core — the zero-dependency engine (llms.txt generation, bot rules, crawl tracking, SEO signals, AI description generation via Claude/OpenAI)
  • geo-ai-next — a thin Next.js wrapper with middleware and App Router handler
  • WordPress and Shopify integrations

The geo-ai-core/ai entry point lets you bulk-generate AI descriptions for your content using Claude or OpenAI, with a built-in rate limiter and batching – but that's a topic for another post.

If you're building a public-facing Node.js project, adding llms.txt takes about 5 minutes and meaningfully improves how AI search engines understand your content. Give it a try:

npx geo-ai init

Links
Website: https://www.geoai.run
CLI page: https://www.geoai.run/cli
CLI docs: https://www.geoai.run/docs/integrations/cli
GitHub: https://github.com/madeburo/GEO-AI

Top comments (5)

William Wang

Really nice tool! The fact that it generates both llms.txt and llms-full.txt from a single config is exactly the kind of developer experience that will drive adoption.

One thing I've noticed working in the AI search visibility space: the sites that benefit most from llms.txt aren't just the ones that have the file — it's the ones where the content within it is well-structured with clear entity relationships. Your CLI generating these automatically from project config removes a huge manual bottleneck.

Question: have you considered adding a validate --url option that also checks whether AI search engines are actually crawling and respecting the file? That feedback loop between "I generated it" and "it's actually being used" is the gap most tools miss.

Made Büro • Edited

Thanks @williamwangai,
validate --url already exists! You can run:

npx geo-ai validate --url https://example.com

It checks whether llms.txt and llms-full.txt are present, accessible, and contain valid content at the remote URL.

On the tracking side, the geo-ai-core engine also logs bot visits (GPTBot, ClaudeBot, PerplexityBot, etc.) with GDPR-compliant IP anonymization via SHA-256.

What's missing is combining these into a single "health check" view – something like geo-ai status that shows both file validity AND recent crawler activity. That's a great idea for the roadmap.

Your point about entity relationships in the content is spot on too. The ContentProvider interface is designed to support that – you structure your data into typed sections (products, articles, pages) with relationships, and the generator outputs them accordingly.

Appreciate the thoughtful feedback!

Rafał Groń

The llms.txt approach is something I've been maintaining manually for my WordPress plugin. The validate --url command is exactly the missing piece — knowing the file is there is one thing, knowing it's actually being parsed is another. One thing I'd add to the roadmap: WordPress-specific content providers that pull from WP REST API automatically. Most WP developers won't touch a config file, but a plugin that generates and updates llms.txt on post publish would get massive adoption.

Made Büro

Thanks, appreciate your message.

Manual llms.txt maintenance is not a scalable path for most developers or site owners. That's actually why I added a WordPress / WooCommerce module to GEO AI:
GitHub: github.com/madeburo/GEO-AI-Woo

The goal is to make this much more automatic for WP-based projects and move closer to the kind of publish-driven workflow you described.

Rafał Groń

Nice — that's exactly the gap. A plugin that auto-generates llms.txt on post publish is the missing piece for most WP developers. Going to check out the WooCommerce module.