DEV Community

Wilson Xu

Adding AI Superpowers to Your CLI Tools with LLM APIs

The command line has always been where developers do their most focused work. It is fast, scriptable, and composable. Now, with large language model APIs becoming cheap and accessible, we can inject intelligence directly into our terminal workflows. Instead of context-switching to a chat window, imagine your CLI tools understanding your code, generating commit messages, explaining errors in plain English, and translating natural language into shell commands — all without leaving your terminal.

This article walks through practical patterns for building AI-powered CLI tools. We will cover API selection, real implementation code, cost management, privacy considerations, and offline fallbacks. Every example is production-ready and drawn from real tools.

Why CLI + AI Is the Next Frontier

Graphical AI assistants are useful, but they break flow. You leave your terminal, paste code into a browser, wait for a response, then copy it back. That round trip kills momentum.

CLI-native AI tools eliminate that friction. They plug directly into existing Unix pipelines. You can pipe git diff into an AI summarizer, chain an error explainer into your test runner, or wrap any command with an AI layer that adds context. The composability of the Unix philosophy — small tools that do one thing well — maps perfectly onto LLM integration. Each tool handles one task, and the LLM provides the reasoning layer.

There is also a deployment advantage. CLI tools ship as single binaries or npm packages. There is no frontend to build, no server to host, no OAuth flow to implement. Your users install with npm install -g or brew install and start using it immediately.

Choosing an LLM API

Before writing code, you need to pick a provider. Here is a practical comparison for CLI tool developers.

OpenAI (GPT-4o, GPT-4o-mini) remains the most widely used. The API is stable, documentation is excellent, and GPT-4o-mini is extremely cost-effective at $0.15 per million input tokens. For most CLI tools, GPT-4o-mini provides more than enough reasoning capability.

Anthropic Claude (Claude 3.5 Sonnet, Claude 3 Haiku) excels at code understanding and following precise instructions. Claude 3 Haiku is the speed champion — fast responses matter when your CLI tool runs on every commit or every file save. The API supports system prompts natively, which helps maintain consistent output formatting.

Google Gemini offers a generous free tier and supports extremely long context windows (up to 1 million tokens). If your CLI tool needs to process entire codebases at once, Gemini's context length is a significant advantage.

Ollama (local models) runs models like Llama 3, Mistral, and CodeLlama entirely on your machine. Zero cost, zero latency to an external server, and complete privacy. The trade-off is that local models require decent hardware (8GB+ RAM for 7B parameter models) and produce lower-quality outputs than cloud APIs. But for simple tasks like commit message generation, local models work surprisingly well.

A practical strategy is to default to a cloud API and fall back to Ollama when the cloud is unreachable. We will implement this pattern later.

Building an AI-Powered Commit Message Generator

Let us start with the most immediately useful tool: generating commit messages from staged diffs. This is a perfect first project because the input (a diff) and output (a short message) are both well-defined.

#!/usr/bin/env node
import { execSync } from 'child_process';
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function generateCommitMessage() {
  const diff = execSync('git diff --cached', { encoding: 'utf-8' });

  if (!diff.trim()) {
    console.error('No staged changes found. Stage files with git add first.');
    process.exit(1);
  }

  // Truncate very large diffs to stay within token limits
  const truncatedDiff = diff.length > 8000
    ? diff.slice(0, 8000) + '\n... (diff truncated)'
    : diff;

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `You are a commit message generator. Given a git diff, write a concise
conventional commit message. Use the format: type(scope): description
Types: feat, fix, refactor, docs, test, chore, style, perf
Keep the first line under 72 characters. Add a blank line and bullet points
for complex changes. Output ONLY the commit message, nothing else.`
      },
      {
        role: 'user',
        content: truncatedDiff
      }
    ],
    temperature: 0.3,
    max_tokens: 200
  });

  const message = response.choices[0].message.content.trim();
  console.log(message);
}

generateCommitMessage().catch((err) => {
  console.error(`Failed to generate commit message: ${err.message}`);
  process.exit(1);
});

The key design decisions here matter. We set temperature to 0.3 because commit messages should be deterministic, not creative. We cap max_tokens at 200 because commit messages should be short. We truncate the diff at 8000 characters to avoid blowing through token limits on large changes.

To use this as a git hook, save it and wire it into .git/hooks/prepare-commit-msg:

#!/bin/sh
MSG=$(node ~/tools/ai-commit-msg.js 2>/dev/null)
if [ -n "$MSG" ]; then
  echo "$MSG" > "$1"
fi

Now every commit automatically gets an AI-generated message that you can edit before confirming.

Smart Code Review CLI: Pipe Diffs to an LLM

A code review tool extends the commit message pattern. Instead of generating a short summary, we ask the LLM to analyze code quality, spot bugs, and suggest improvements.

import Anthropic from '@anthropic-ai/sdk';
import { readFileSync } from 'fs';

const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });

async function reviewCode(input) {
  // Read the diff from the first CLI argument, or from stdin when piped
  const diff = input || readFileSync('/dev/stdin', 'utf-8');

  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1500,
    system: `You are a senior code reviewer. Analyze the provided diff and provide:
1. **Critical Issues** — bugs, security vulnerabilities, data loss risks
2. **Suggestions** — performance improvements, better patterns, readability
3. **Verdict** — APPROVE, REQUEST_CHANGES, or NEEDS_DISCUSSION

Be concise. Focus on issues that matter. Skip style nitpicks unless they
affect readability significantly. Format your response in markdown.`,
    messages: [
      { role: 'user', content: diff }
    ]
  });

  console.log(response.content[0].text);
}

reviewCode(process.argv[2]);

Usage is clean and composable:

# Review staged changes
git diff --cached | ai-review

# Review a specific PR
gh pr diff 42 | ai-review

# Review changes since last release
git diff v1.2.0..HEAD | ai-review

We use Claude here because it follows formatting instructions reliably. The structured output (Critical Issues, Suggestions, Verdict) makes the review scannable. A developer can glance at the verdict and dive into details only when needed.

AI-Powered Error Explainer

Stack traces are information-dense but rarely human-friendly. An error explainer tool takes raw error output and returns a plain-English explanation with actionable fix suggestions.

// Reuses the `openai` client created in the commit message example
async function explainError(errorText) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `You are a debugging assistant. Given an error message or stack trace:
1. Explain what went wrong in one sentence a junior developer would understand
2. Identify the root cause
3. Provide a specific fix with a code snippet if applicable
4. Mention any common mistakes that cause this error

Keep your response under 200 words. Be direct and practical.`
      },
      { role: 'user', content: errorText }
    ],
    temperature: 0.2,
    max_tokens: 500
  });

  return response.choices[0].message.content;
}

The magic is in the integration. Wire this into your test runner:

# Run tests, pipe failures to the explainer
npm test 2>&1 | ai-explain-errors

# Or use it interactively
ai-explain "TypeError: Cannot read properties of undefined (reading 'map')"

For the interactive mode, you can even capture the last command's stderr automatically:

# Add to .zshrc. Caveat: this re-runs the previous command to capture
# its stderr, so avoid it after commands with side effects.
explain() {
  local last_error=$(fc -ln -1 | xargs -I{} bash -c '{} 2>&1 1>/dev/null')
  echo "$last_error" | ai-explain-errors
}

Natural Language to Shell Commands

This is the tool that feels most like magic. Type what you want in English, and get the shell command back.

// Assumes the `openai` client and `execSync` import from the earlier examples
async function nlToShell(description) {
  const os = process.platform;
  const shell = process.env.SHELL || '/bin/bash';

  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Convert natural language to a shell command.
OS: ${os}, Shell: ${shell}
Rules:
- Output ONLY the command, no explanation
- Use common Unix tools (find, grep, awk, sed, curl, jq)
- Prefer simple, readable commands over clever one-liners
- If destructive (rm, mv), add a confirmation prompt
- Never use sudo unless explicitly requested`
      },
      { role: 'user', content: description }
    ],
    temperature: 0.1,
    max_tokens: 150
  });

  const command = response.choices[0].message.content.trim();
  console.log(`\x1b[36m$ ${command}\x1b[0m`);

  // Safety: ask before executing
  const readline = await import('readline');
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  rl.question('Execute? [Y/n] ', (answer) => {
    if (!answer.trim().toLowerCase().startsWith('n')) {
      execSync(command, { stdio: 'inherit' });
    }
    rl.close();
  });
}

Example usage:

$ ai-sh "find all TypeScript files modified in the last 24 hours larger than 10KB"
$ find . -name "*.ts" -mtime -1 -size +10k
Execute? [Y/n]

The safety confirmation is critical. Never auto-execute AI-generated shell commands. The confirmation prompt gives the developer a chance to verify the command before it runs.
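Beyond the confirmation prompt, a hard denylist can refuse obviously destructive commands outright. A minimal sketch (the `isDangerous` helper and its patterns are illustrative, not exhaustive):

```javascript
// Illustrative denylist: reject clearly destructive commands even
// before showing the confirmation prompt.
const DANGEROUS_PATTERNS = [
  /\brm\s+-rf\s+[\/~]/,   // rm -rf targeting root or home
  /\bmkfs\b/,             // formatting a filesystem
  /\bdd\b.*\bof=\/dev\//  // writing directly to a raw device
];

function isDangerous(command) {
  return DANGEROUS_PATTERNS.some((pattern) => pattern.test(command));
}
```

Call `isDangerous(command)` before prompting, and abort with an explanatory message when it returns true.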

Adding AI Autocomplete to Any CLI Prompt

For interactive CLI tools built with libraries like inquirer or prompts, you can add AI-powered autocomplete that suggests completions based on context.

import inquirer from 'inquirer';

async function aiAutocomplete(partialInput, context) {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
      {
        role: 'system',
        content: `Suggest 5 completions for a CLI input. Context: ${context}
Return a JSON array of strings. Nothing else.`
      },
      { role: 'user', content: partialInput }
    ],
    temperature: 0.5,
    max_tokens: 200
  });

  try {
    return JSON.parse(response.choices[0].message.content);
  } catch {
    return [];
  }
}

// Use with inquirer's autocomplete prompt (import, not require, in ESM)
import autocompletePrompt from 'inquirer-autocomplete-prompt';
inquirer.registerPrompt('autocomplete', autocompletePrompt);

const answer = await inquirer.prompt([{
  type: 'autocomplete',
  name: 'command',
  message: 'What do you want to do?',
  source: async (_, input) => {
    if (!input || input.length < 3) return [];
    const suggestions = await aiAutocomplete(input, 'git operations');
    return suggestions;
  }
}]);

The key to making this feel responsive is debouncing. Do not fire an API request on every keystroke. Wait until the user pauses typing for 300 milliseconds, and only trigger when the input is at least 3 characters long.
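A hand-rolled version of that debounce for the `source` callback might look like this (a minimal sketch; libraries like lodash offer the same thing):

```javascript
// Promise-returning debounce: only the last call within the delay
// window actually fires; superseded calls resolve to [].
function debounce(fn, delay = 300) {
  let timer = null;
  let cancelPrevious = null;
  return (...args) =>
    new Promise((resolve) => {
      if (timer) {
        clearTimeout(timer);
        cancelPrevious(); // resolve the superseded call with []
      }
      cancelPrevious = () => resolve([]);
      timer = setTimeout(async () => {
        timer = null;
        resolve(await fn(...args));
      }, delay);
    });
}

// Wrap the autocomplete helper once, then pass it as the `source`:
// const debouncedAutocomplete = debounce(aiAutocomplete, 300);
```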

Cost Management: Token Counting, Caching, and Model Selection

AI API calls cost money. For CLI tools that run frequently (on every commit, every test run, every file save), costs add up fast. Here are the patterns that keep bills under control.

Token counting before sending. Use the tiktoken library to estimate token counts before making API calls. If a diff is too large, summarize it first or split it into chunks.

import { encoding_for_model } from 'tiktoken';

function estimateCost(text, model = 'gpt-4o-mini') {
  const enc = encoding_for_model(model);
  const tokens = enc.encode(text).length;
  enc.free();

  // Rates in USD per 1,000 tokens
  const costs = {
    'gpt-4o-mini': { input: 0.00015, output: 0.0006 },
    'gpt-4o': { input: 0.005, output: 0.015 },
    'claude-3-haiku': { input: 0.00025, output: 0.00125 }
  };

  const rate = costs[model] || costs['gpt-4o-mini'];
  return {
    tokens,
    estimatedCost: (tokens / 1000) * rate.input
  };
}
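When a diff does exceed the budget, splitting it into chunks is straightforward. A character-based sketch (rough, since characters are not tokens; roughly 4 characters per token is a common rule of thumb):

```javascript
// Split oversized text into fixed-size chunks that can be summarized
// separately; 8000 chars is roughly 2000 tokens at ~4 chars/token.
function chunkText(text, maxChars = 8000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```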

Response caching. Hash the input and cache responses locally. If the same diff is reviewed twice, return the cached result instead of making another API call.

import { createHash } from 'crypto';
import { existsSync, readFileSync, writeFileSync, mkdirSync } from 'fs';
import { join } from 'path';
import { homedir } from 'os';

// homedir() works on every platform; process.env.HOME does not exist on Windows
const CACHE_DIR = join(homedir(), '.cache', 'ai-cli');

function getCached(input) {
  const hash = createHash('sha256').update(input).digest('hex');
  const cachePath = join(CACHE_DIR, `${hash}.json`);

  if (existsSync(cachePath)) {
    const cached = JSON.parse(readFileSync(cachePath, 'utf-8'));
    const age = Date.now() - cached.timestamp;
    // Cache valid for 24 hours
    if (age < 86400000) {
      return cached.response;
    }
  }
  return null;
}

function setCache(input, response) {
  mkdirSync(CACHE_DIR, { recursive: true });
  const hash = createHash('sha256').update(input).digest('hex');
  const cachePath = join(CACHE_DIR, `${hash}.json`);
  writeFileSync(cachePath, JSON.stringify({
    timestamp: Date.now(),
    response
  }));
}
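Wiring the cache in front of an API call then looks like this (a sketch built on the `getCached`/`setCache` helpers above; `queryFn` stands in for any of the query functions in this article):

```javascript
// Check the cache first; only hit the API on a miss, then store the result.
async function cachedQuery(prompt, systemPrompt, queryFn) {
  const key = `${systemPrompt}\n---\n${prompt}`;
  const cached = getCached(key);
  if (cached !== null) return cached; // cache hit: zero cost, zero latency
  const response = await queryFn(prompt, systemPrompt);
  setCache(key, response);
  return response;
}
```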

Model tiering. Use cheap models for simple tasks and expensive models only when quality matters. Commit messages work fine with GPT-4o-mini. Complex code reviews benefit from Claude 3.5 Sonnet or GPT-4o. Let users configure the model via environment variables:

const model = process.env.AI_CLI_MODEL || 'gpt-4o-mini';

A typical developer running AI commit messages on every commit might make 20 API calls per day. At GPT-4o-mini rates with average diffs, that costs roughly $0.01 per day — effectively free.
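That back-of-the-envelope figure checks out (assumed averages, using GPT-4o-mini rates of $0.15 per million input tokens and $0.60 per million output tokens):

```javascript
// Rough daily cost for AI commit messages at assumed usage levels
const callsPerDay = 20;
const avgInputTokens = 2000;   // staged diff plus system prompt
const avgOutputTokens = 100;   // a short commit message
const inputRate = 0.15 / 1e6;  // USD per input token
const outputRate = 0.60 / 1e6; // USD per output token

const dailyCost =
  callsPerDay * (avgInputTokens * inputRate + avgOutputTokens * outputRate);
console.log(`$${dailyCost.toFixed(4)} per day`); // about $0.0072
```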

Offline Fallbacks When the API Is Unavailable

Cloud APIs go down. Networks fail. Airplanes exist. A good CLI tool handles these gracefully.

The best fallback strategy uses Ollama as a local backup:

async function queryLLM(prompt, systemPrompt) {
  // Try cloud API first
  try {
    const response = await openai.chat.completions.create({
      model: process.env.AI_CLI_MODEL || 'gpt-4o-mini',
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: prompt }
      ]
    }, { timeout: 10000 }); // request options like timeout go in the second argument
    return response.choices[0].message.content;
  } catch (cloudError) {
    console.error('Cloud API unavailable, falling back to local model...');
  }

  // Fall back to Ollama
  try {
    const response = await fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: 'llama3.2',
        messages: [
          { role: 'system', content: systemPrompt },
          { role: 'user', content: prompt }
        ],
        stream: false
      })
    });

    const data = await response.json();
    return data.message.content;
  } catch (localError) {
    console.error('No AI available. Falling back to template.');
    return null;
  }
}

The final fallback — returning null — lets the calling code use a non-AI template. For commit messages, that might mean opening the editor with a standard format. For code reviews, it might mean skipping the review. The tool degrades gracefully instead of crashing.
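At the call site, that degradation might look like the following (a sketch; `queryLLM` is the fallback helper above, and the template text is just an example):

```javascript
// If queryLLM returns null (no cloud, no local model), fall back to a
// plain template the developer can edit by hand.
async function getCommitMessage(diff, systemPrompt) {
  const aiMessage = await queryLLM(diff, systemPrompt);
  if (aiMessage !== null) return aiMessage;
  return 'chore: <describe your change>\n\n# AI unavailable: edit this message manually';
}
```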

Privacy: When to Use Local Models vs Cloud APIs

Sending code to cloud APIs raises legitimate privacy concerns. Here is a practical decision framework.

Use cloud APIs when:

  • The code is open source
  • The diff contains no secrets, credentials, or PII
  • You need high-quality output (complex reviews, nuanced explanations)
  • Your organization's security policy permits it

Use local models (Ollama) when:

  • Working with proprietary code under NDA
  • The diff might contain API keys or credentials
  • Compliance requirements restrict data transmission (HIPAA, SOC2)
  • You want zero external dependencies

Implement a pre-send filter that scans for sensitive patterns:

function containsSensitiveData(text) {
  const patterns = [
    /(?:api[_-]?key|secret|password|token)\s*[:=]\s*['"][^'"]+['"]/gi,
    /-----BEGIN (?:RSA |EC )?PRIVATE KEY-----/,
    /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z]{2,}\b/g, // emails
    /\b\d{3}-\d{2}-\d{4}\b/, // SSN pattern
    /sk-[a-zA-Z0-9]{20,}/, // OpenAI keys
    /ghp_[a-zA-Z0-9]{36}/, // GitHub tokens
  ];

  return patterns.some(pattern => pattern.test(text));
}

async function safeLLMQuery(prompt, systemPrompt) {
  if (containsSensitiveData(prompt)) {
    console.warn('Sensitive data detected. Using local model.');
    return queryLocalOnly(prompt, systemPrompt);
  }
  return queryLLM(prompt, systemPrompt);
}
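The `queryLocalOnly` helper referenced above can be a thin Ollama-only wrapper: the same call as the fallback path earlier, minus the cloud attempt. A sketch:

```javascript
// Local-only path: talk to Ollama on localhost and never touch the cloud.
async function queryLocalOnly(prompt, systemPrompt) {
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'llama3.2',
      messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: prompt }
      ],
      stream: false
    })
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed: ${response.status}`);
  }
  const data = await response.json();
  return data.message.content;
}
```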

This filter is not foolproof, but it catches the most common cases. For maximum security, configure the tool to always use local models via an environment variable:

export AI_CLI_LOCAL_ONLY=true

Real Implementation Patterns from depcheck-ai

Our depcheck-ai tool analyzes project dependencies and flags unused, outdated, or vulnerable packages. Here are patterns we refined through real usage.

Streaming responses for long operations. When analyzing dozens of dependencies, waiting for a complete response feels slow. Stream the output instead:

const stream = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages,
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}

Structured output with JSON mode. When your CLI tool needs to parse the LLM's response programmatically (not just display it), use JSON mode to guarantee parseable output:

const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  response_format: { type: 'json_object' },
  messages: [
    {
      role: 'system',
      content: `Analyze these dependencies and return a JSON object with:
{
  "unused": ["package names that appear unused"],
  "outdated": [{"name": "pkg", "current": "1.0", "latest": "2.0"}],
  "risky": [{"name": "pkg", "reason": "why it is risky"}]
}`
    },
    { role: 'user', content: dependencyData }
  ]
});

const analysis = JSON.parse(response.choices[0].message.content);

Progress indicators. AI calls take 1-5 seconds. Always show a spinner or progress bar. Silence during that window makes users think the tool is broken:

import ora from 'ora';

const spinner = ora('Analyzing dependencies...').start();
try {
  const result = await queryLLM(prompt, systemPrompt);
  spinner.succeed('Analysis complete');
  console.log(result);
} catch (error) {
  spinner.fail('Analysis failed');
  console.error(error.message);
}

Retry with exponential backoff. Rate limits and transient errors are common with LLM APIs. Implement retry logic:

async function withRetry(fn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries - 1) throw error;
      if (error.status === 429) {
        const delay = Math.pow(2, attempt) * 1000;
        await new Promise(r => setTimeout(r, delay));
      } else {
        throw error;
      }
    }
  }
}

Putting It All Together

The most effective AI-powered CLI tools share these characteristics:

  1. They solve a specific problem. A commit message generator does one thing. An error explainer does one thing. Resist the urge to build an "AI that does everything" CLI. Small, focused tools compose better.

  2. They are fast. Use the cheapest model that produces acceptable output. Cache aggressively. Stream long responses. Show progress indicators.

  3. They degrade gracefully. Cloud API down? Fall back to local. Local model not installed? Fall back to templates. No fallback available? Exit cleanly with a helpful message.

  4. They respect privacy. Scan for secrets before sending data to cloud APIs. Offer a local-only mode. Be transparent about what data leaves the machine.

  5. They integrate with existing workflows. The best AI CLI tools do not require changing how developers work. They plug into git hooks, pipe into existing commands, and respect Unix conventions.

The barrier to building these tools is remarkably low. A useful AI-powered CLI tool can be built in under 100 lines of JavaScript. The LLM handles the hard part — understanding code and generating human-readable output. Your job is to build the plumbing: get the input, format the prompt, display the output, handle errors.

Start with the commit message generator. Once you see AI-generated messages appearing in your git log, you will want to add AI to everything in your terminal. And with the patterns in this article, you can.
