Made Büro

CLI for AI Search Optimization in Node.js projects

If you've been following the AI search space, you've probably heard about llms.txt – a simple Markdown file that tells AI crawlers (ChatGPT, Claude, Perplexity, Gemini, DeepSeek, etc.) what your site is about and what content they're allowed to use. Think of it as robots.txt, but designed for the LLM era.

In this post I'll walk through a CLI tool I built, geo-ai-cli, that generates and validates these files for any Node.js project, and share some of the interesting design decisions along the way.

What is llms.txt?

The llms.txt proposal defines a standard for exposing structured site content to AI systems. Two files:

  • llms.txt – a compact summary: site name, description, and a list of key pages with titles and URLs
  • llms-full.txt – the same, but with full content for each page

A minimal llms.txt looks like this:

# My SaaS Product

> A brief description for AI crawlers.

## Pages

- [Home](https://example.com/): Welcome page
- [Pricing](https://example.com/pricing): Plans and pricing

## Blog

- [Getting Started](https://example.com/blog/start): First steps guide

AI search engines like Perplexity, Claude and ChatGPT can read these files to understand your site's content and draw on it in their answers. This is essentially GEO: Generative Engine Optimization.
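
To make the file layout concrete, here is a minimal sketch of how the compact llms.txt could be rendered from structured data. The `PageEntry` and `SiteSummary` types and the `renderLlmsTxt` function are hypothetical names used for illustration only; this is not how geo-ai-core implements it.

```typescript
// Hypothetical types for illustration; not the geo-ai-core API.
interface PageEntry {
  title: string;
  url: string;
  description?: string;
}

interface SiteSummary {
  siteName: string;
  siteDescription: string;
  sections: Record<string, PageEntry[]>;
}

// Render the compact layout: H1 title, blockquote summary,
// then one H2 section per key with a Markdown link list.
function renderLlmsTxt(site: SiteSummary): string {
  const lines: string[] = [`# ${site.siteName}`, "", `> ${site.siteDescription}`];
  for (const [section, pages] of Object.entries(site.sections)) {
    lines.push("", `## ${section}`, "");
    for (const page of pages) {
      const suffix = page.description ? `: ${page.description}` : "";
      lines.push(`- [${page.title}](${page.url})${suffix}`);
    }
  }
  return lines.join("\n") + "\n";
}

const example = renderLlmsTxt({
  siteName: "My SaaS Product",
  siteDescription: "A brief description for AI crawlers.",
  sections: {
    Pages: [{ title: "Home", url: "https://example.com/", description: "Welcome page" }],
  },
});
console.log(example);
```

Each key of `sections` becomes a `##` heading, which mirrors the Pages and Blog sections in the sample file above.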

The CLI

npm install --save-dev geo-ai-cli

Four commands cover the full workflow:

npx geo-ai init      # scaffold a config file
npx geo-ai generate  # generate llms.txt + llms-full.txt
npx geo-ai validate  # check output files are valid
npx geo-ai inspect   # preview config and crawler rules

init
Scaffolds a geo-ai.config.ts in the current directory. If a config already exists, it exits without overwriting – safe to run multiple times.

// geo-ai.config.ts
import type { GeoAIConfig } from 'geo-ai-core';

export default {
  siteName: 'My Site',
  siteUrl: 'https://example.com',
  siteDescription: 'A brief description for AI crawlers.',
  crawlers: 'all',
  provider: {
    Pages: [
      { title: 'Home', url: 'https://example.com/', description: 'Welcome page' },
    ],
    Blog: [
      { title: 'Getting Started', url: 'https://example.com/blog/start', description: 'First steps' },
    ],
  },
} satisfies GeoAIConfig;

The provider field is a plain object where each key becomes a section in llms.txt. You can also pass a ContentProvider instance if you want to pull data from a CMS or API dynamically.
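
If your content lives in a CMS, one simple option is to fetch it yourself and group it into that plain-object shape before exporting the config. The sketch below assumes a hypothetical `CmsRecord` shape with a `kind` field that names the section; `toProvider` is an illustrative helper, not part of geo-ai-core, and it does not use the ContentProvider interface.

```typescript
// Hypothetical CMS record; only title/url/description mirror the config above.
interface CmsRecord {
  kind: string; // e.g. "Pages" or "Blog"; becomes the section name
  title: string;
  url: string;
  description: string;
}

type ProviderItem = { title: string; url: string; description: string };

// Group flat records into { SectionName: [items...] }, preserving input order.
function toProvider(records: CmsRecord[]): Record<string, ProviderItem[]> {
  const provider: Record<string, ProviderItem[]> = {};
  for (const { kind, title, url, description } of records) {
    (provider[kind] ??= []).push({ title, url, description });
  }
  return provider;
}

const grouped = toProvider([
  { kind: "Pages", title: "Home", url: "https://example.com/", description: "Welcome page" },
  { kind: "Blog", title: "Getting Started", url: "https://example.com/blog/start", description: "First steps" },
  { kind: "Pages", title: "Pricing", url: "https://example.com/pricing", description: "Plans and pricing" },
]);
console.log(Object.keys(grouped));
```

The result can be assigned to the `provider` field directly, since it has the same shape as the hand-written object in the config.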

generate
Loads your config, calls createGeoAI(config) from geo-ai-core, and writes both files to ./public (or wherever you point --out):

npx geo-ai generate --out ./dist/public

Config discovery follows a priority order: geo-ai.config.ts → geo-ai.config.js → geo-ai.config.json. TypeScript configs are loaded via dynamic import(), so there is no separate compilation step, and the satisfies GeoAIConfig annotation in the config gives you type checking in your editor.
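
The priority order can be sketched as a first-match lookup. `CONFIG_PRIORITY` and `pickConfig` are hypothetical names for illustration, and the function takes a list of file names rather than touching the file system, so it shows the selection rule only and not the CLI's actual loader.

```typescript
// Candidate config file names, highest priority first (order taken from the text above).
const CONFIG_PRIORITY = ["geo-ai.config.ts", "geo-ai.config.js", "geo-ai.config.json"];

// Return the highest-priority config name present in the directory listing, or null.
function pickConfig(filesInDir: readonly string[]): string | null {
  const present = new Set(filesInDir);
  return CONFIG_PRIORITY.find((name) => present.has(name)) ?? null;
}

console.log(pickConfig(["package.json", "geo-ai.config.json", "geo-ai.config.js"]));
```

With both a .js and a .json config present, the .js file wins because it appears earlier in the list.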

validate
Checks that the generated files are present and structurally valid. Works on local files or a live URL:

# local
npx geo-ai validate --path ./public

# remote
npx geo-ai validate --url https://example.com

Validation rules are intentionally simple:

  • file missing: not_found
  • content < 50 chars: warn
  • doesn't start with #: fail
  • starts with #: pass

Exit code 1 on any fail or not_found, so it integrates cleanly into CI:

# .github/workflows/deploy.yml
- run: npx geo-ai generate
- run: npx geo-ai validate
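
The validation rules and exit code described above can be sketched in a few lines. This assumes the rules are checked in the order listed (so a very short file yields warn before the leading # is checked) and that length is measured on the raw string; `classify` and `exitCode` are hypothetical names, not the CLI's internals.

```typescript
type ValidationStatus = "not_found" | "warn" | "fail" | "pass";

// content is null when the file is missing.
// Assumption: rules apply in the order they are listed in the post.
function classify(content: string | null): ValidationStatus {
  if (content === null) return "not_found";
  if (content.length < 50) return "warn";
  if (!content.startsWith("#")) return "fail";
  return "pass";
}

// Exit code 1 if any file failed or was missing, otherwise 0 (warn does not fail the run).
function exitCode(statuses: ValidationStatus[]): number {
  return statuses.some((s) => s === "fail" || s === "not_found") ? 1 : 0;
}

const results = [classify("# My Site\n\n> " + "x".repeat(60)), classify(null)];
console.log(results, exitCode(results));
```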

inspect
Prints a human-readable summary of your config: site info, which AI bots are allowed/disallowed, and how many items are in each section. Useful for debugging before you generate.

npx geo-ai inspect

# Site: My Site
# URL:  https://example.com
# Out:  ./public
# Crawlers:
#   GPTBot: allow
#   ClaudeBot: allow
#   PerplexityBot: allow
#   ...
# Sections:
#   Pages: 3 item(s)
#   Blog: 12 item(s)
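
The Sections part of that summary is just a count per key of the provider object. A minimal sketch, with `summarizeSections` as a hypothetical helper name rather than the CLI's own code:

```typescript
// Produce one "Name: N item(s)" line per section of a plain-object provider.
function summarizeSections(provider: Record<string, unknown[]>): string[] {
  return Object.entries(provider).map(([name, items]) => `${name}: ${items.length} item(s)`);
}

const summary = summarizeSections({
  Pages: [{ title: "Home" }, { title: "Pricing" }, { title: "About" }],
  Blog: [{ title: "Getting Started" }],
});
console.log(summary.join("\n"));
```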

You can also point it at a live URL to fetch and display the remote files:

npx geo-ai inspect --url https://example.com

Adding it to your build pipeline

The most useful place for geo-ai generate is right before your deployment step:

// package.json
{
  "scripts": {
    "build": "next build && geo-ai generate",
    "postbuild": "geo-ai validate"
  }
}

Or in CI:

- name: Build
  run: npm run build

- name: Generate llms.txt
  run: npx geo-ai generate

- name: Validate
  run: npx geo-ai validate --url ${{ env.SITE_URL }}

Controlling which AI bots can crawl

By default, crawlers: 'all' allows all 16+ known AI bots. You can get granular:

export default {
  siteName: 'My Site',
  siteUrl: 'https://example.com',
  crawlers: {
    GPTBot: 'allow',
    ClaudeBot: 'allow',
    PerplexityBot: 'allow',
    'Google-Extended': 'disallow', // opt out of Gemini training
    Bytespider: 'disallow',        // ByteDance (TikTok) crawler
  },
  provider: { /* ... */ },
} satisfies GeoAIConfig;


The geo-ai-core engine uses this to generate a corresponding robots.txt block and per-bot allow/disallow rules in the llms files.
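
As a rough illustration of what a per-bot robots.txt block might look like, the sketch below maps each crawler entry to a standard User-agent group. The exact output of geo-ai-core may differ; `CrawlerRule` and `renderRobotsBlock` are hypothetical names, and applying each rule to the whole site ("/") is an assumption.

```typescript
type CrawlerRule = "allow" | "disallow";

// Emit one robots.txt group per bot, covering the whole site ("/").
function renderRobotsBlock(crawlers: Record<string, CrawlerRule>): string {
  const groups = Object.entries(crawlers).map(
    ([bot, rule]) => `User-agent: ${bot}\n${rule === "allow" ? "Allow" : "Disallow"}: /`,
  );
  return groups.join("\n\n") + "\n";
}

const robots = renderRobotsBlock({
  GPTBot: "allow",
  "Google-Extended": "disallow",
  Bytespider: "disallow",
});
console.log(robots);
```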

What's next

The CLI is part of a larger ecosystem:

  • geo-ai-core – the zero-dependency engine (llms.txt generation, bot rules, crawl tracking, SEO signals, AI description generation via Claude/OpenAI)
  • geo-ai-next – a thin Next.js wrapper with middleware and App Router handler
  • WordPress and Shopify integrations

The geo-ai-core/ai entry point lets you bulk-generate AI descriptions for your content using Claude or OpenAI, with a built-in rate limiter and batching – but that's a topic for another post.

If you're building a public-facing Node.js project, adding llms.txt takes about 5 minutes and can improve how AI search engines understand your content. Give it a try:

npx geo-ai init

Links
Website: https://www.geoai.run
CLI page: https://www.geoai.run/cli
CLI docs: https://www.geoai.run/docs/integrations/cli
GitHub: https://github.com/madeburo/GEO-AI
