DEV Community

Cover image for How to Use Web Scraping Templates the Right Way (2026)
Simon
Simon

Posted on

How to Use Web Scraping Templates the Right Way (2026)

Most web scraping projects are not unique snowflakes. Track competitor prices. Enrich a list of leads. Audit a site for SEO. Pull training data for a model. It is the same handful of recipes, over and over.

A web scraping template is one of those recipes, pre-wired: a ready-to-use JSON config that chains the right tools in the right order, so you copy it, point it at your targets, and run. CrawlForge ships 24 of them in the templates gallery. This guide is about using them well — not just copy-paste, but read, adapt, and cost them out before you scale.


TL;DR: A CrawlForge template is a copy-paste JSON config that chains multiple MCP tools into one workflow (price monitoring, lead enrichment, SEO audits, market research, AI training data). There are 24 across 9 categories, each costing 3–19 credits per run. Run them from Claude/Cursor, the crawlforge CLI, or the REST API. Free tier = 1,000 credits, no credit card.

Table of Contents


What Is a Web Scraping Template?

A template is a saved configuration that orchestrates two or three CrawlForge tools into one workflow with a business outcome attached. Instead of wiring search_web then scrape_structured then analyze_content yourself — and guessing every parameter — you copy a config that already does it.

Each template in the gallery carries:

  • A category — E-commerce, Research, Data Collection, Monitoring, AI & LLM, Sales, SEO, Content, or Advanced Scraping (nine in total).
  • A difficulty — beginner, intermediate, or advanced.
  • The tool chain it runs and a fixed credit cost per run (3–19 credits).
  • A copy-paste JSON config with sensible default parameters.

You run that config from any MCP client (Claude, Cursor, Windsurf), the crawlforge CLI, or the REST API. Same config, same shape of result.

Templates Gallery vs the scrape_template Tool

This trips people up, so let's be precise. CrawlForge has two different things with "template" in the name:

Templates gallery scrape_template tool
What it is A library of multi-tool config chains A single tool with 10 site schemas
Scope Any workflow (pricing, SEO, research…) 10 specific sites (Amazon, LinkedIn, GitHub…)
Output Whatever the chained tools return Structured JSON for that one site
Cost 3–19 credits/run (sum of its tools) 1 credit/call
Use when You want a whole workflow, ready-made You want data from one popular site

If your target is one of the ten supported sites, reach for the tool — covered in depth in Scrape Amazon, LinkedIn & 8 More Sites With One Tool. For everything else — a full pricing-monitoring or lead-enrichment pipeline — you want a gallery template. This guide is about the gallery.

How to Use a Template the Right Way

Copy-paste is step one. Using a template well is six.

1. Pick by outcome, not by tool. Start from the job ("monitor competitor prices") and filter the gallery by category and difficulty. New to this? Start with the cheap, two-tool, beginner templates before reaching for a 19-credit research pipeline.

2. Read the config before you run it. Look at the tool order, the parameters, and whether it carries a schedule (hourly, daily, or weekly). Order matters: search_web finds the URLs, then scrape_structured extracts from them.

3. Swap the placeholders. Every config ships with example values — https://competitor-a.com/pricing, {company_name}, "product name", a default schema. Replace those with your real targets and the exact fields you want back. The schema is your output contract; trim it to what you'll actually use.

4. Do the credit math before you scale. A template's cost is just the sum of its tools' costs. Multiply by frequency: a 7-credit template run hourly is 168 credits/day. Here is the per-tool table the configs draw from:

Credits Tools
1 fetch_url, extract_text, extract_links, extract_metadata, scrape_template
2 scrape_structured, extract_content, map_site, process_document, localization
3 analyze_content, track_changes, extract_structured, extract_with_llm
4 summarize_content, crawl_deep
5 stealth_mode, scrape_with_actions, batch_scrape, search_web, generate_llms_txt
10 deep_research

5. Run it from your stack of choice. In an MCP client, paste the goal and let the agent call the tools. From a terminal or cron job, use the crawlforge CLI. In an app, hit the REST API. All three share one API key and one credit balance.

6. Schedule and monitor. Templates built for monitoring carry a schedule. Pair them with track_changes so you act on diffs, not on every identical run.

8 Templates Worth Copying First

The gallery has 24. These eight cover the highest-demand jobs and span beginner to advanced. Expand each for the copy-paste config.

1. Competitor Pricing Monitor

E-commerce · intermediate · 7 credits/run · batch_scrape + scrape_structured

Scrape a set of competitor pricing pages on a schedule and normalize them into a clean plan / price / features structure.

Config + how to adapt
{
  "tools": [
    {
      "name": "batch_scrape",
      "params": {
        "urls": [
          "https://competitor-a.com/pricing",
          "https://competitor-b.com/pricing"
        ],
        "selectors": { "price": ".price", "name": "h1" }
      }
    },
    {
      "name": "scrape_structured",
      "params": {
        "schema": {
          "plans": [{ "name": "string", "price": "string", "features": ["string"] }]
        }
      }
    }
  ],
  "schedule": "daily"
}
Enter fullscreen mode Exit fullscreen mode

Replace urls with your competitors' pricing pages, then tune selectors and schema to the fields you track. Keep schedule at daily for most pricing work. Full walkthrough: build an AI price-monitoring system.

2. Contact Enrichment Pipeline

Sales · intermediate · 7 credits/run · search_web + extract_metadata + extract_links

Turn a bare company name into an enriched record — official site, social handles, and key links.

Config + how to adapt
{
  "tools": [
    {
      "name": "search_web",
      "params": { "query": "{company_name} official website" }
    },
    {
      "name": "extract_metadata",
      "params": { "include": ["og:title", "og:description", "twitter:site"] }
    },
    {
      "name": "extract_links",
      "params": { "filter": ["linkedin.com", "twitter.com", "github.com"] }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Drive {company_name} from your CRM export, and widen the extract_links filter to the domains you care about. Run it per row to enrich a whole list. Full walkthrough: build a lead-enrichment engine.

3. SEO Site Audit

SEO · beginner · 6 credits/run · map_site + extract_metadata + analyze_content

Crawl a site, pull every page's metadata, and score content quality — a fast, repeatable audit.

Config + how to adapt
{
  "tools": [
    {
      "name": "map_site",
      "params": { "url": "https://your-site.com", "max_depth": 3 }
    },
    {
      "name": "extract_metadata",
      "params": {}
    },
    {
      "name": "analyze_content",
      "params": { "metrics": ["readability", "topics", "sentiment"] }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Point url at your domain and raise or lower max_depth to control crawl breadth (and cost). One of the cheapest templates to run regularly. Full walkthrough: automating SEO audits with CrawlForge.

4. AI Training Data Collector

AI & LLM · intermediate · 7 credits/run · batch_scrape + extract_content

Collect and clean web pages at scale into model-ready text — no navigation, no boilerplate.

Config + how to adapt
{
  "tools": [
    {
      "name": "batch_scrape",
      "params": {
        "urls": ["https://docs.example.com/page-1", "https://docs.example.com/page-2"],
        "format": "markdown"
      }
    },
    {
      "name": "extract_content",
      "params": { "format": "text", "remove_navigation": true }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Feed urls from a sitemap or CSV, and keep remove_navigation on so menus and footers don't pollute your dataset. Full walkthrough: web scraping for AI training data pipelines.

5. Market Intelligence Dashboard

Research · advanced · 19 credits/run · deep_research + batch_scrape + summarize_content

The flagship. Run multi-source research, scrape the key industry sources, and summarize it all into a daily briefing.

Config + how to adapt
{
  "tools": [
    {
      "name": "deep_research",
      "params": {
        "query": "SaaS market trends and funding rounds",
        "sources": 10,
        "conflict_detection": true
      }
    },
    {
      "name": "batch_scrape",
      "params": {
        "urls": ["https://techcrunch.com", "https://www.saastr.com"],
        "format": "markdown"
      }
    },
    {
      "name": "summarize_content",
      "params": { "max_length": 300, "format": "bullet_points" }
    }
  ],
  "schedule": "daily"
}
Enter fullscreen mode Exit fullscreen mode

Change the query to your market and swap urls for your trusted sources. At 19 credits/run it is the most expensive template here — run it daily, not hourly. Related: competitive intelligence with AI agents.

6. Review Sentiment Analyzer

E-commerce · intermediate · 10 credits/run · search_web + scrape_structured + analyze_content

Find reviews across platforms, structure them, and score sentiment and topics.

Config + how to adapt
{
  "tools": [
    {
      "name": "search_web",
      "params": { "query": "\"product name\" reviews", "max_results": 10 }
    },
    {
      "name": "scrape_structured",
      "params": {
        "schema": {
          "reviewer": "string",
          "rating": "number",
          "text": "string",
          "date": "string"
        }
      }
    },
    {
      "name": "analyze_content",
      "params": { "metrics": ["sentiment", "topics"] }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Put your product in the query, raise max_results for more coverage, and keep the schema tight so sentiment scoring stays clean. Related: e-commerce product data extraction at scale.

7. Job Listings Scraper

Data Collection · intermediate · 7 credits/run · search_web + scrape_structured

Search job boards and pull listings into a structured feed — title, company, location, salary, date.

Config + how to adapt
{
  "tools": [
    {
      "name": "search_web",
      "params": { "query": "software engineer remote jobs 2026", "max_results": 20 }
    },
    {
      "name": "scrape_structured",
      "params": {
        "schema": {
          "title": "string",
          "company": "string",
          "location": "string",
          "salary": "string",
          "posted_date": "string"
        }
      }
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Change the query to your role and region, and add schema fields (remote flag, seniority) as needed. See it on the Job Listings Scraper template page.

8. Website Change Detector

Monitoring · beginner · 6 credits/run · fetch_url + extract_content + analyze_content

Watch a single page and surface when its content shifts — pricing, terms, or announcements.

Config + how to adapt
{
  "tools": [
    {
      "name": "fetch_url",
      "params": { "url": "https://example.com/page-to-monitor" }
    },
    {
      "name": "extract_content",
      "params": { "format": "text" }
    },
    {
      "name": "analyze_content",
      "params": { "metrics": ["topics"] }
    }
  ],
  "schedule": "hourly"
}
Enter fullscreen mode Exit fullscreen mode

Set url to the page you care about and dial schedule to your tolerance for staleness — hourly for fast-moving pages, daily for the rest. Related: build a competitive-intelligence agent.

The Other 16 Templates

The remaining gallery entries, grouped by category — each is a copy-paste config on the templates page:

  • Research: News Aggregation Pipeline (11cr), Multi-Source Research Agent (12cr), Academic Paper Research (14cr).
  • Data Collection: Real Estate Listings Tracker (7cr), PDF Document Processor (6cr), Government Data Extractor (5cr).
  • Monitoring: Compliance Monitoring (9cr), Social Media Monitoring (12cr).
  • E-commerce: E-commerce Product Extraction (3cr).
  • AI & LLM: Documentation Knowledge Base (10cr).
  • Sales: Tech Stack Detector (3cr).
  • SEO: Link Building Prospector (7cr).
  • Content: Content Migration Tool (7cr), Localization Content Audit (7cr).
  • Advanced Scraping: Dynamic SPA Scraper (7cr), Stealth Data Extraction (7cr).

Customizing or Building Your Own

No template is a perfect fit out of the box — that is the point of step three. When a config gets you 80% there, swap the parameters and schema and you are done. When nothing fits:

  • Start from the closest template and rewrite its schema and parameters.
  • Compose tools yourself. Use scrape_structured when you know stable CSS selectors, or extract_with_llm when the layout shifts and you want schema-driven, layout-resilient extraction.
  • Request a template. If you want a recipe we don't ship yet, ask on Discord — popular requests get added to the gallery.

FAQ

What is a web scraping template?

A ready-to-use JSON config that chains multiple CrawlForge tools into one workflow with a specific outcome — price monitoring, lead enrichment, SEO auditing, and so on. Copy the config, swap in your URLs and schema, and run it from an MCP client, the crawlforge CLI, or the REST API. CrawlForge ships 24 templates across 9 categories.

Templates gallery vs the scrape_template tool?

The gallery is a library of multi-tool config chains for complete workflows (3–19 credits/run). The scrape_template tool is a single tool with pre-built schemas for 10 popular sites (Amazon, LinkedIn, GitHub…) at 1 credit/call. Use a gallery template for a whole workflow; use scrape_template for data from one of the ten supported sites.

How many credits does a template cost?

A template costs the sum of its tools per run, from 3 credits (E-commerce Product Extraction, Tech Stack Detector) to 19 (Market Intelligence Dashboard). A Competitor Pricing Monitor is batch_scrape (5) + scrape_structured (2) = 7. Multiply by frequency to budget: 7 credits hourly is 168/day.

Can I customize a template or change its schema?

Yes — that is the intended workflow. Every template ships with placeholders (example URLs, a default schema, sample queries) you replace with your real targets. The schema is your output contract, so trim or extend it. If nothing fits, start from the closest template or compose tools with scrape_structured / extract_with_llm.

How do I run a template?

Three ways, sharing one API key and credit balance: paste the goal into an MCP client like Claude, Cursor, or Windsurf; run it from a terminal or cron job with the crawlforge CLI; or call the REST API from an app. The same config produces the same result across all three.


Ready to run your first template? Free tier includes 1,000 credits and no credit card.

Start Free — 1,000 Credits Included

Or keep reading:

Top comments (0)