AlterLab

Posted on • Originally published at alterlab.io

Reducing LLM Token Consumption in RAG Pipelines with Clean JSON Output from Web Scraping APIs

TL;DR

Using clean JSON output from a web scraping API dramatically reduces the token count fed into LLMs in Retrieval-Augmented Generation (RAG) pipelines. This lowers costs, speeds up responses, and improves answer quality by removing unnecessary HTML, scripts, and styling.

Why Token Count Matters in RAG

RAG workflows retrieve external documents, inject them into a prompt, and ask an LLM to generate an answer. The retrieved text often arrives as raw HTML full of tags, inline CSS, JavaScript, and navigation menus, none of which help the model answer the user’s question. Every extra character adds tokens, which increases API latency and cost. For example, a typical product page might be 150 KB of HTML but only 12 KB of useful text after stripping markup: a 92% reduction in size and, roughly, in token load.
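To make the size difference concrete, here is a minimal sketch that strips markup from an HTML string with Python's standard-library `html.parser` and compares rough token estimates. The `TextExtractor` class, the `estimate_tokens` helper, the sample HTML, and the 4-characters-per-token ratio are all assumptions for illustration; a real tokenizer will give different counts.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return round(len(text) / chars_per_token)


sample_html = (
    "<html><head><style>body{margin:0}</style>"
    "<script>var tracking = 'x';</script></head>"
    "<body><nav><a href='/'>Home</a></nav>"
    "<h1>Wireless Headphones</h1><p>Price: $89.99</p></body></html>"
)

text = visible_text(sample_html)
print(text)
print(estimate_tokens(sample_html), "->", estimate_tokens(text))
```

Note that tag stripping alone still keeps navigation text such as "Home"; structured extraction is what removes that kind of noise as well.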

How Clean JSON Helps

Scraping APIs like AlterLab can return data in structured formats (JSON, Markdown, plain text) instead of raw HTML. By specifying formats=["json"], you receive only the fields you need—title, price, description—already stripped of markup. This pre‑filtering happens at the edge, saving bandwidth and compute before the data even reaches your RAG module.

Example: Requesting JSON Output

```python title="rag_scraper.py" {2-4}
import alterlab

client = alterlab.Client("YOUR_API_KEY")  # Initialize with your key

# Ask for JSON output; the API handles rendering and anti-bot measures
response = client.scrape(
    url="https://example.com/product",
    formats=["json"],  # <-- get structured JSON, not HTML
)

# response.json is a dict ready for your RAG retriever
print(response.json)
```

```bash title="Terminal" {2-4}
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "formats": ["json"]
  }'
```

The returned JSON might look like:

{
  "title": "Wireless Headphones",
  "price": 89.99,
  "description": "Noise‑cancelling over‑ear headphones with 30h battery life."
}

Feeding this three‑field object into your prompt uses far fewer tokens than dumping the entire HTML page.
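As a rough sketch of that step, the snippet below keeps only the fields a question needs, serializes them compactly, and places them in a prompt. The `build_prompt` helper, the field selection, and the prompt wording are illustrative assumptions, not part of the AlterLab API.

```python
import json


def build_prompt(question: str, documents: list[dict], fields: tuple[str, ...]) -> str:
    """Builds a prompt whose context contains only the requested fields."""
    context_lines = []
    for doc in documents:
        trimmed = {key: doc[key] for key in fields if key in doc}
        # Compact separators avoid spending tokens on whitespace.
        context_lines.append(
            json.dumps(trimmed, ensure_ascii=False, separators=(",", ":"))
        )
    context = "\n".join(context_lines)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


product = {
    "title": "Wireless Headphones",
    "price": 89.99,
    "description": "Noise-cancelling over-ear headphones with 30h battery life.",
}

prompt = build_prompt(
    "How long does the battery last?", [product], ("title", "description")
)
print(prompt)
```

Dropping `price` here is just an example of trimming fields that a given question does not need.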

Infographic: RAG Pipeline with Clean JSON

Practical Impact: Token Savings

Consider a RAG system that retrieves the top‑3 pages per query. Using raw HTML:

  • Average page size: 130 KB → ~32 k tokens per page (assuming 4 bytes/token)
  • 3 pages → ~96 k tokens prompt

Using clean JSON (≈10 % of HTML size):

  • Average JSON size: 13 KB → ~3.2 k tokens per page
  • 3 pages → ~9.6 k tokens prompt

That’s a 90% reduction in input tokens, which cuts input-token costs proportionally and shortens prompt processing time. Lower token usage also reduces the chance of hitting model context limits, allowing you to include more relevant sources per query.
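The arithmetic above can be reproduced in a few lines. This sketch uses the article's own figures (130 KB vs. 13 KB per page, 3 pages, 4 bytes per token) and assumes 1 KB = 1,000 bytes; `prompt_tokens` is a hypothetical helper name. The unrounded totals come out slightly higher than the bullet figures because the bullets round the per-page counts first.

```python
BYTES_PER_TOKEN = 4  # the article's rough assumption
PAGES_PER_QUERY = 3


def prompt_tokens(page_size_bytes: int, pages: int = PAGES_PER_QUERY) -> float:
    """Estimated prompt tokens for `pages` documents of the given size."""
    return page_size_bytes / BYTES_PER_TOKEN * pages


html_tokens = prompt_tokens(130_000)  # 130 KB of raw HTML per page
json_tokens = prompt_tokens(13_000)   # 13 KB of clean JSON per page
reduction = 1 - json_tokens / html_tokens

print(f"HTML prompt: ~{html_tokens / 1000:.1f}k tokens")
print(f"JSON prompt: ~{json_tokens / 1000:.1f}k tokens")
print(f"Reduction:   {reduction:.0%}")
```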

Best Practices for Integration

  1. Specify only needed fields – Use the API’s select or post‑process to keep the payload minimal.
  2. Cache responses – When scraped content changes infrequently, store JSON blobs to avoid repeated API calls.
  3. Handle errors gracefully – Check the HTTP status and fall back to retries; the API already manages retries for transient network issues.
  4. Respect rate limits – Even with an API, follow the provider’s guidelines to maintain fair access.
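Points 2 and 3 can be combined in a small wrapper. The sketch below is an illustration under stated assumptions: `fetch` stands for any callable that takes a URL and returns a dict (for example, a function wrapping the scrape call), while `CachedFetcher`, the TTL, and the backoff values are hypothetical choices, not AlterLab features.

```python
import time
from typing import Callable


class CachedFetcher:
    """Caches fetched JSON per URL for ttl_seconds and retries failed fetches."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 3600.0,
                 max_attempts: int = 3, base_delay: float = 0.5):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._cache: dict[str, tuple[float, dict]] = {}

    def get(self, url: str) -> dict:
        cached = self._cache.get(url)
        if cached is not None and time.monotonic() - cached[0] < self._ttl:
            return cached[1]  # still fresh: no API call

        for attempt in range(1, self._max_attempts + 1):
            try:
                data = self._fetch(url)
                break
            except Exception:  # narrow this to your client's real error types
                if attempt == self._max_attempts:
                    raise
                # Exponential backoff: base_delay, then 2x, 4x, ...
                time.sleep(self._base_delay * 2 ** (attempt - 1))

        self._cache[url] = (time.monotonic(), data)
        return data


# Usage with a stand-in fetch function that fails once, then succeeds.
calls = {"count": 0}


def flaky_fetch(url: str) -> dict:
    calls["count"] += 1
    if calls["count"] == 1:
        raise ConnectionError("simulated transient failure")
    return {"title": "Wireless Headphones", "price": 89.99}


fetcher = CachedFetcher(flaky_fetch, base_delay=0.01)
first = fetcher.get("https://example.com/product")   # one retry, then success
second = fetcher.get("https://example.com/product")  # served from the cache
print(first, calls["count"])
```

An in-memory dict keeps the example short; a shared store would be the more likely choice when several workers serve the same pipeline.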

Internal Resources

For a quick start with the official Python client, see the Python scraping API. To understand how the service handles anti‑bot measures without violating any terms, review the anti‑bot solution. Full details on request parameters and response formats are in the API documentation.

Takeaway

Clean JSON output from a scraping API is a simple, high‑leverage optimization for any RAG pipeline. By stripping irrelevant markup at the source, you cut token usage, lower costs, and improve the relevance and speed of LLM‑generated answers—without writing fragile parsers or skirting terms of service. Start by adding formats=["json"] to your next scrape request and measure the token savings immediately.