DEV Community

Cover image for Reduce LLM Token Costs with Clean Markdown Output from AI‑Powered Web Scraping
AlterLab
AlterLab

Posted on • Originally published at alterlab.io

Reduce LLM Token Costs with Clean Markdown Output from AI‑Powered Web Scraping

TL;DR

Request Markdown‑formatted output from AlterLab’s scraping API to strip HTML noise before feeding data to LLMs. This cuts token usage, lowers cost, and simplifies parsing in AI‑driven pipelines.

Why HTML Inflates LLM Costs

Large language models charge per token. Raw HTML from a typical page includes tags, attributes, whitespace, and scripts that add little semantic value but increase token count dramatically. For example, a product listing page might deliver 12 KB of HTML, which translates to roughly 3 000 tokens—most of it noise. When you chain multiple pages or run retrieval‑augmented generation (RAG) workflows, these extra tokens multiply quickly, raising both latency and expense.

The Markdown Alternative

AlterLab’s API supports an optional formats parameter. Setting formats=['markdown'] returns the page’s main content converted to clean Markdown. Headings become #, lists become -, and tables retain a simple pipe‑delimited structure. The resulting text is typically 60‑80 % smaller than the raw HTML equivalent, directly reducing the token count sent to your LLM.

This is platform, because Alters, the API request code showing

```python title="fetch.html snippet and then
showdown by using a dscrape using
-ing to
optional formats param
respo
we try a request to Markdown
and the extra
0 example

We'll a a like:


python title="scrape_markdown.py" {2-5}

client = alterlab.Client("YOUR_API_KEY")   # API key from dashboard
# Request Markdown formatted output
response = client.scrape(
    url="https://example.com/articles/latest",
    formats=["markdown"]                     # highlighted: ask for Markdown
)
# The cleaned Markdown is ready for LLM consumption
print(response.text[:500])                 # preview first 500 characters


Enter fullscreen mode Exit fullscreen mode

bash title="Terminal"
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/articles/latest", "formats": ["markdown"]}'


Enter fullscreen mode Exit fullscreen mode

Integrating with LLM Pipelines

Once you have the Markdown string, you can feed it directly into your LLM call. Because the text is already structured, you often need less prompting to extract insights. For retrieval‑augmented generation, store the Markdown in your vector database; the reduced size means more chunks fit within your index’s token limits, improving recall without increasing storage costs.

Consider a simple summarization flow:

  1. Scrape target page with formats=["markdown"].
  2. Pass the Markdown to a summarization model (e.g., gpt-4o-mini).
  3. Use the summary downstream—no extra HTML stripping step required.

This eliminates a custom HTML‑to‑text preprocessing step, reducing both code complexity and potential bugs.

Combining Markdown Output with Cortex AI Extraction

AlterLab’s Cortex AI can extract structured fields (prices, dates, SKUs) from raw HTML. When you first request Markdown, you strip noise, then let Cortex work on the cleaner text. This two‑step approach can lower the token count sent to Cortex as well, because the model sees less irrelevant markup.


python title="cortex_markdown.py" {3-7}

client = alterlab.Client("YOUR_API_KEY")
response = client.scrape(
    url="https://example.com/products/listing",
    formats=["markdown"],                # get clean Markdown first
    extract={"model": "cortex-v1"}       # then run AI extraction on that Markdown
)
print(response.json)                     # structured data, minimal token overhead


Enter fullscreen mode Exit fullscreen mode

Cost Impact Example

Assume you scrape 10 000 product pages per month. Average raw HTML size: 12 KB (~3 000 tokens). Average Markdown size: 4 KB (~1 000 tokens).

  • HTML route: 10 000 × 3 000 = 30 M tokens → at $0.000015 per token ≈ $450/month.
  • Markdown route: 10 000 × 1 000 Markdown route: 10 000 × 1 000 = 10 M tokens → ≈ $150/month.

Savings of roughly $300/month, plus reduced egress bandwidth and faster LLM inference.

Best Practices

  • Always request the minimal format you need: formats=["markdown"] or formats=["json"] when downstream code expects structured data.
  • Combine formats with extract to let AlterLab perform both cleaning and AI extraction in one request.
  • Monitor your token usage via your LLM provider’s dashboard; you should see a noticeable drop after switching to Markdown.
  • If you need the original HTML for archival, keep a separate request without the formats flag, but use it sparingly.

Internal Resources

For a full list of supported output formats, see the API documentation. To get started quickly, follow the quickstart guide. For pricing details on our pay‑as‑you‑go model, visit the pricing page.

Takeaway

Asking AlterLab for Markdown‑formatted scraped data is a simple, effective way to reduce LLM token consumption and lower operating costs. The cleaned output removes HTML noise, speeds up downstream processing, and works seamlessly with AlterLab’s AI extraction features. Start using the formats parameter today and see immediate savings on your AI‑driven scraping pipelines.

Top comments (0)