DEV Community

Simon

CrawlForge v4.2.2: New CLI + 3 Tools for Local AI Scraping

Today we are shipping CrawlForge v4.2.2, our biggest release since launch. It brings three new tools, a standalone command-line interface, and a quiet shift in how we think about web scraping for AI: most of it should run locally, on your own machine, without API keys.

This post is the umbrella for everything in 4.2.2. Three deep-dive guides follow in the next nine days.


TL;DR: v4.2.2 adds @crawlforge/cli (a standalone CLI -- no MCP client needed), extract_with_llm (local LLM extraction via Ollama, no OpenAI/Anthropic key required), scrape_template (one-line scrapers for 10 popular sites), and list_ollama_models (free model discovery). Tool count goes from 20 to 23. Free tier still includes 1,000 credits. Install: npm install -g @crawlforge/cli.



What Shipped

v4.2.2 adds four things:

  1. @crawlforge/cli -- a standalone command-line tool exposing all 23 CrawlForge tools to your shell. No MCP client required.
  2. extract_with_llm -- LLM-powered structured extraction that defaults to local Ollama. No external API key needed.
  3. scrape_template -- pre-built scrapers for Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X.
  4. list_ollama_models -- a free discovery tool that lists models on your local Ollama instance.

Tool count goes from 20 to 23. The CLI is brand new -- it is not a tool, it is a delivery channel.

+----------------+       +-------------------+       +----------------+
|   Your Shell   | <-->  |  @crawlforge/cli  | <-->  |  CrawlForge    |
|   (cron, CI)   |       |  (JSON in/out)    |       |   API + Tools  |
+----------------+       +-------------------+       +----------------+
                                  ^
                          No MCP handshake.
                          Just HTTPS + stdout.

The New CrawlForge CLI

The CLI is the shortest path from intent to scraped data. You install it once, set an environment variable, and every CrawlForge tool becomes a command:

npm install -g @crawlforge/cli
export CRAWLFORGE_API_KEY="cf_live_your_key_here"

crawlforge scrape https://example.com
crawlforge search "best MCP servers 2026"
crawlforge research "AI agent frameworks" --depth 3

Why does this matter? Because MCP is great for AI agents, but a lot of scraping work is not an AI agent task. It is a cron job. A CI step. A one-off pull from your terminal. For that, you want JSON on stdout that pipes into jq, not a JSON-RPC handshake.
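
If the cron job or CI step is written in Python rather than shell, the same idea can be sketched as a thin wrapper around the CLI. This is a minimal sketch, assuming the CLI prints one JSON document on stdout and exits non-zero on failure. The helper names (`build_command`, `parse_stdout`, `run_tool`) are hypothetical and not part of CrawlForge.

```python
import json
import subprocess


def build_command(subcommand, *args):
    """Return the argv list for a crawlforge invocation (no shell quoting needed)."""
    return ["crawlforge", subcommand, *args]


def parse_stdout(text):
    """Parse CLI output, assuming it is a single JSON document."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"crawlforge did not print valid JSON: {exc}") from exc


def run_tool(subcommand, *args, timeout=120):
    """Run the CLI and return its parsed JSON output.

    Raises subprocess.CalledProcessError on a non-zero exit status,
    which is assumed here to signal failure.
    """
    completed = subprocess.run(
        build_command(subcommand, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return parse_stdout(completed.stdout)
```

With the CLI on your PATH and CRAWLFORGE_API_KEY exported, `run_tool("scrape", "https://example.com")` would return the parsed result. Passing arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.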

Why have a CLI when MCP already exists?

MCP is optimized for AI agents picking tools dynamically. The CLI is optimized for humans typing commands and scripts piping JSON. Different shapes for different jobs:

| Workflow            | Best fit |
| ------------------- | -------- |
| Claude/Cursor agent | MCP      |
| Cron job            | CLI      |
| GitHub Actions step | CLI      |
| One-off terminal    | CLI      |
| Server in a loop    | Raw API  |

All three paths hit the same backend, share the same credit balance, and use the same API key.

Read the complete CrawlForge CLI guide for the full command reference and real-world workflows.


Extract With LLM: Local AI Extraction

extract_with_llm is structured extraction powered by a language model. You hand it a URL and a schema, it gives you back JSON. The new part is that it defaults to local Ollama rather than calling OpenAI or Anthropic.

{
  "url": "https://news.ycombinator.com/item?id=123456",
  "schema": {
    "type": "object",
    "properties": {
      "title":    { "type": "string" },
      "points":   { "type": "number" },
      "comments": { "type": "number" }
    }
  },
  "provider": "ollama",
  "model": "llama3.1:8b"
}

Three things follow from the local-first default:

  • No third-party API costs. The LLM is free. You only pay 3 CrawlForge credits per extraction.
  • No data leaving your machine. Scraped content stays on localhost.
  • No new API key to manage. If Ollama is installed, you are done.
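
Small local models can return a string where the schema asks for a number, so it is worth checking the result before it goes further down a pipeline. The sketch below builds a request shaped like the example above and checks a result against it. The field names come from that example; the helpers `build_request` and `type_errors` are hypothetical, and the check only covers flat schemas with primitive types.

```python
# Maps JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def build_request(url, properties, provider="ollama", model="llama3.1:8b"):
    """Assemble a request body shaped like the extract_with_llm example."""
    return {
        "url": url,
        "schema": {"type": "object", "properties": properties},
        "provider": provider,
        "model": model,
    }


def type_errors(result, schema):
    """Return a list of fields that are missing or have the wrong type.

    Assumption: every listed property is treated as required, which is
    stricter than JSON Schema's default behavior.
    """
    errors = []
    for name, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if name not in result:
            errors.append(f"{name}: missing")
        elif expected is not None:
            value = result[name]
            # bool is a subclass of int in Python, so reject it for "number".
            is_bool_as_number = spec["type"] == "number" and isinstance(value, bool)
            if is_bool_as_number or not isinstance(value, expected):
                errors.append(f"{name}: expected {spec['type']}")
    return errors
```

An empty list from `type_errors` means every field is present with the expected type. Anything else is a signal to retry, possibly with a larger model.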

When to still use OpenAI or Anthropic

Local models are great for predictable schemas (titles, prices, counts, ratings). For long-form reasoning -- summarizing a 10,000-word article, classifying nuanced sentiment, extracting fields that require world knowledge -- a frontier model still wins.

Switch providers with one parameter:

crawlforge extract https://example.com \
  --provider anthropic \
  --model claude-sonnet-4-6

You pay the provider's per-token cost plus 3 CrawlForge credits. Same schema, same output shape.


Detailed guide: extract data with local LLMs.


Scrape Template: Ten Sites, One Call

scrape_template is for the long tail of scraping requests that all look the same: "get me product data from Amazon", "get me a GitHub repo's metadata", "get me the top posts on Hacker News today". You should not need to write CSS selectors for these. We did it once, we maintain it, you call it.

crawlforge template amazon --url "https://www.amazon.com/dp/B0CHX1W1XY"
crawlforge template github --url "https://github.com/anthropics/anthropic-sdk-python"
crawlforge template hackernews --top 10

Ten templates ship in this release:

| Template      | What it returns                               | Credits |
| ------------- | --------------------------------------------- | ------- |
| amazon        | Product title, price, rating, reviews, images | 1       |
| linkedin      | Profile name, headline, experience, skills    | 1       |
| github        | Repo metadata, stars, languages, README       | 1       |
| youtube       | Video title, views, channel, transcript       | 1       |
| reddit        | Post title, score, comments, top replies      | 1       |
| hackernews    | Story title, points, URL, comments            | 1       |
| stackoverflow | Question, answers, accepted, vote counts      | 1       |
| npm           | Package metadata, weekly downloads, versions  | 1       |
| producthunt   | Product name, tagline, upvotes, makers        | 1       |
| tweet         | Tweet text, author, engagement, replies       | 1       |

Full walkthrough with code: scrape Amazon, LinkedIn, and GitHub with one tool.
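
When URLs arrive from a queue or a spreadsheet, you may want to route each one to a template automatically. Here is a minimal sketch of that routing. The template names are the ten listed above; the host-to-template mapping and the `pick_template` helper are assumptions for illustration, not something the CLI provides.

```python
from urllib.parse import urlsplit

# Template names come from the table above; the host names are assumptions.
TEMPLATE_HOSTS = {
    "amazon.com": "amazon",
    "linkedin.com": "linkedin",
    "github.com": "github",
    "youtube.com": "youtube",
    "reddit.com": "reddit",
    "news.ycombinator.com": "hackernews",
    "stackoverflow.com": "stackoverflow",
    "npmjs.com": "npm",
    "producthunt.com": "producthunt",
    "twitter.com": "tweet",
    "x.com": "tweet",
}


def pick_template(url):
    """Return the template name for a URL, or None if no template matches."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, template in TEMPLATE_HOSTS.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return template
    return None
```

URLs that return `None` can fall back to a plain scrape. Regional domains such as amazon.co.uk are not covered by this mapping and would need their own entries.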


list_ollama_models: Free Model Discovery

list_ollama_models is most useful as a sanity check before running extract_with_llm. It lists every model on your local Ollama instance with its name, size, and modified date.

crawlforge extract --list-ollama-models

Costs zero credits. It does no scraping, no LLM call -- it just queries Ollama's local API on 127.0.0.1:11434 and returns the result. If you have ever wondered which model you actually have installed, this is the answer.
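
For a sense of what such a query involves, Ollama's HTTP API lists installed models at `/api/tags`. Whether list_ollama_models calls that exact endpoint is an assumption. The sketch below fetches the listing directly and reduces it to the three fields mentioned above; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"


def fetch_models(url=OLLAMA_TAGS_URL, timeout=5):
    """Fetch the raw model listing from a local Ollama instance."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize(payload):
    """Reduce the listing to name, size in GB, and modified date per model."""
    rows = []
    for model in payload.get("models", []):
        rows.append({
            "name": model["name"],
            "size_gb": round(model["size"] / 1e9, 1),
            "modified": model["modified_at"][:10],  # keep the YYYY-MM-DD part
        })
    return rows


def has_model(payload, name):
    """Return True if a model with this exact name is installed."""
    return any(model["name"] == name for model in payload.get("models", []))
```

Calling `has_model(fetch_models(), "llama3.1:8b")` before an extraction run tells you whether the model has been pulled. If Ollama is not running, `fetch_models` raises `urllib.error.URLError`.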


Old Workflow vs v4.2.2 Workflow

| Task                             | Pre-4.2.2                                                                   | v4.2.2                                  |
| -------------------------------- | --------------------------------------------------------------------------- | --------------------------------------- |
| Scrape from your terminal        | curl + custom parser, or boot a Node REPL                                   | crawlforge scrape <url>                 |
| Extract structured data with LLM | extract_structured (CSS selectors) or roll your own with Puppeteer + OpenAI | extract_with_llm (Ollama default)       |
| Scrape Amazon, LinkedIn, GitHub  | scrape_structured with hand-maintained selectors                            | scrape_template (we maintain selectors) |
| Run scraping in CI/cron          | curl with API key in headers                                                | crawlforge <cmd> with env var           |

Credit Costs

The three new tools follow our existing credit-cost model. No surprises:

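| Tool               | Credits | Why                               |
| ------------------ | ------- | --------------------------------- |
| list_ollama_models | 0       | Free discovery helper             |
| scrape_template    | 1       | Single page, pre-built schema     |
| extract_with_llm   | 3       | LLM inference (provider-agnostic) |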

The CLI itself is free. It uses your existing API key and bills against your normal credit balance.
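
To estimate how far a credit balance goes, the costs above reduce to simple arithmetic. A minimal sketch, using only the three per-call costs from this post; the helper names are hypothetical.

```python
# Per-call credit costs for the three new tools, as stated in this post.
CREDIT_COST = {
    "list_ollama_models": 0,
    "scrape_template": 1,
    "extract_with_llm": 3,
}


def credits_needed(calls):
    """Total credits for a mapping of tool name -> number of calls."""
    return sum(CREDIT_COST[tool] * count for tool, count in calls.items())


def max_calls(tool, balance):
    """How many calls of one tool a balance covers; None if the tool is free."""
    cost = CREDIT_COST[tool]
    return None if cost == 0 else balance // cost
```

On the 1,000-credit free tier, `max_calls("extract_with_llm", 1000)` gives 333 extractions and `max_calls("scrape_template", 1000)` gives 1,000 template calls. Provider per-token charges for OpenAI or Anthropic are billed separately and are not modeled here.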


How to Upgrade

Existing users do not need to do anything. The new tools are live on all plans -- Free, Hobby, Professional, and Business -- and show up automatically in your MCP client.

Install the CLI
npm install -g @crawlforge/cli
export CRAWLFORGE_API_KEY="cf_live_..."
crawlforge --help

Add the export line to your shell profile (~/.zshrc, ~/.bashrc) so it persists. For CI, set CRAWLFORGE_API_KEY as a repository secret.
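
In CI, a missing secret tends to surface as a confusing failure halfway through a job. A small preflight check at the top of the script can catch it early. This is a sketch under two assumptions: that the key is read from CRAWLFORGE_API_KEY as shown above, and that the binary is named `crawlforge`. The `preflight` helper itself is hypothetical.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return a list of problems that would stop the CLI from running.

    env and which are injectable so the check can be tested without
    touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("CRAWLFORGE_API_KEY"):
        problems.append("CRAWLFORGE_API_KEY is not set")
    if which("crawlforge") is None:
        problems.append("crawlforge is not on PATH")
    return problems
```

An empty list means the job can proceed. Otherwise, print the problems and exit non-zero so the CI step fails with a clear message.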

Try Ollama-powered extraction
# 1. Install Ollama (one-time)
curl -fsSL https://ollama.com/install.sh | sh

# 2. Pull a model (llama3.1:8b is a good start)
ollama pull llama3.1:8b

# 3. Run extraction through CrawlForge
crawlforge extract https://example.com \
  --provider ollama \
  --model llama3.1:8b

The model pull in step 2 downloads about 5 GB. After that, the LLM step of every extraction runs locally with no per-token cost; each extraction still costs 3 CrawlForge credits.


What Is Next

We are working on three things for 4.3:

  • More templates -- Etsy, eBay, TikTok, Instagram, Google Maps. Send us requests on Discord.
  • Webhook delivery for batch_scrape -- get results pushed to your endpoint when long-running jobs complete.
  • CLI watch mode -- crawlforge track --watch for live diffs on monitored pages.

Ready to try the new tools? The free tier still includes 1,000 credits and requires no credit card.
