AlterLab

Posted on Jun 7 • Originally published at alterlab.io

Connect Ollama to Live Web Data Using Markdown Extraction

#dataextraction #python #api #aiagents

TL;DR

Connecting Ollama to live web data requires fetching JavaScript-rendered pages and converting the raw HTML into token-efficient Markdown. Using a managed scraping environment handles the browser execution, while Markdown conversion reduces context window usage by up to 90%. This architecture enables local LLMs to process live data effectively without overwhelming their token limits.

The Context Window Problem

Local LLMs like Llama 3 or Mistral typically operate with an 8k to 32k token context window. Raw HTML is hostile to LLMs. A standard e-commerce product page or financial dashboard can easily exceed 150,000 characters of raw source code.

The DOM is packed with structural noise: tracking scripts, inline CSS, SVG paths, base64 images, and deep <div> nesting. Feeding raw HTML into a prompt dilutes the model's attention. The model wastes computation parsing layout tags instead of reasoning about the actual text.

Markdown solves this. Converting the rendered DOM to Markdown strips the layout markup while preserving the semantic hierarchy: headers, lists, links, and text formatting. A 100k-token HTML document typically reduces to a dense 500-token Markdown string. This keeps inference fast, stays well within local context limits, and drastically improves extraction accuracy.

The Data Pipeline

Fetching modern web data requires three phases: executing JavaScript to render the single-page application, extracting the rendered DOM, and cleaning the output for the LLM.

Handling Browser Fingerprinting

Using standard HTTP libraries like requests or plain curl fails on modern sites. Single-page applications return empty shell HTML until JavaScript executes. You need a browser.

Basic headless browsers (like standard Playwright or Puppeteer) leak technical signals. Default user agents, missing plugins, exposed navigator.webdriver flags, and specific WebGL rendering signatures flag the session as automated. Web Application Firewalls (WAFs) detect these anomalies and block the connection before the DOM even loads.

Instead of continuously patching Playwright stealth plugins and managing residential proxy pools manually, you can outsource the execution layer. Using a managed bot detection handling solution ensures the page renders correctly, bypassing interstitials and CAPTCHAs, allowing you to focus purely on the LLM integration.

Requesting Markdown Data

We need to instruct our scraping layer to return Markdown natively. This avoids running heavy DOM parsing libraries locally. Here is how to request pre-converted Markdown using AlterLab.

cURL Implementation

This terminal command requests the target URL and specifically asks the API to format the output as Markdown.

```bash title="Terminal" {3}
curl -X POST https://api.alterlab.io/v1/scrape \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"url": "https://example.com/data", "formats": ["markdown"]}'




### Python SDK Implementation

For integration into a Python application, the [Python SDK](https://alterlab.io/web-scraping-api-python) handles the request formatting and provides typed responses.



```python title="fetch_data.py" {5-6}

client = alterlab.Client("YOUR_API_KEY")

# Request Markdown format directly
response = client.scrape(
    url="https://example.com/data",
    formats=["markdown"]
)

markdown_content = response.markdown
print(f"Retrieved {len(markdown_content)} characters of Markdown.")

Connecting the Pipeline to Ollama

With clean Markdown ready, the final step is piping it into Ollama. Ollama runs the model locally, ensuring your prompts and extracted data remain private.

You need the ollama Python package installed (pip install ollama). Ensure the Ollama daemon is running locally and you have pulled a model, for example: ollama run llama3.

The integration script combines the scraping fetch with the LLM query.

```python title="ollama_pipeline.py" {18-24}

def analyze_web_page(url: str, query: str) -> str:
# 1. Fetch live data
client = alterlab.Client("YOUR_API_KEY")
scrape_response = client.scrape(
url=url,
formats=["markdown"]
)

context = scrape_response.markdown

system_prompt = (
    "You are a data extraction assistant. "
    "Answer the user's query using ONLY the provided Markdown context."
)

# 2. Query Ollama locally
llm_response = ollama.chat(model='llama3', messages=[
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': f"Context:\n{context}\n\nQuery: {query}"}
])

return llm_response['message']['content']

Execute the pipeline

if name == "main":
target = "https://example.com/financial-report"
question = "Extract the Q3 revenue figures and list the risk factors."

answer = analyze_web_page(target, question)
print("LLM Analysis:")
print(answer)




## Prompt Architecture

The success of your extraction depends heavily on how you instruct the model. Local models benefit from strict bounding instructions. 

Structure your prompt to clearly separate the system instructions, the raw data context, and the actual user query. Notice in the code block above how the context is injected directly into the user message, preceded by the system prompt enforcing strict adherence to the provided text.

If you need structured data out of Ollama, append schema instructions to the prompt:



```python title="prompt_design.py" {3-7}
format_instructions = """
Format your response as a valid JSON object matching this schema:
{
  "revenue_q3": "string",
  "risk_factors": ["string"]
}
Do not include markdown code blocks or conversational text.
"""

Scaling the Architecture

This architecture scales horizontally. Because Ollama runs locally, your only external dependency is the scraping layer. You can queue thousands of URLs, fetch them asynchronously, and process the resulting Markdown through your local GPU hardware with zero additional API inference costs.

By shifting the burden of DOM rendering and bot evasion to an external service, and shifting the burden of LLM inference to your local machine, you achieve a highly resilient, cost-effective data pipeline.

For advanced configuration options on scheduling these fetches or handling specific HTTP methods, review the documentation to fine-tune the ingestion layer.

DEV Community