# How to Give Your AI Agent Access to Trustpilot Data

#api #aiagents #llm #rag

Disclaimer: This guide covers accessing publicly available data. Always review a site's robots.txt and Terms of Service before automated access.

## TL;DR

To give your AI agent access to Trustpilot data, connect it to an extraction API that handles headless browsing and anti-bot systems automatically. By defining a strict JSON schema, you convert unstructured review pages into clean data arrays ready for direct insertion into your LLM context window. This reduces token waste and avoids pipeline failures caused by bot blocks and rate limits.

## Why AI Agents Need Trustpilot Data

Agents require live context to make accurate decisions. Connecting them to public review platforms unlocks several core autonomous use cases.

### Reputation Monitoring
Autonomous agents track brand sentiment continuously. They pull the latest reviews, classify the core complaints, and alert human engineering teams when technical issues arise in production.

### Competitor Tracking
Retrieval-Augmented Generation (RAG) pipelines ingest competitor feedback. Product managers can query their internal knowledge base to discover exactly what features users dislike about competing tools.

### Automated Support Triage
Agents read incoming reviews as they arrive. They cross-reference the stated problems with internal documentation and draft personalized, context-aware responses for your support team to approve.

## Why Raw HTTP Requests Fail for Agents

Giving an LLM internet access through standard HTTP libraries breaks down quickly in practice. Websites deploy heavy countermeasures against automated access.

Plain `requests.get()` calls are frequently blocked. Many sites reject unrecognized user agents, and even with spoofed headers, datacenter IP addresses often trigger CAPTCHA challenges. Your agent then receives an HTML page containing a security challenge instead of the requested data.

Token waste presents a larger architectural problem. A Trustpilot page can contain megabytes of DOM elements, inline CSS, and tracking scripts. Feeding raw HTML into an LLM context window burns through the token budget and sharply limits how many reviews the model can analyze in one pass. Dense, unparsed HTML also tends to increase hallucination rates, because the model struggles to separate the actual review text from the surrounding noise.

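To get a feel for how much of a page is noise, you can measure what fraction of its characters are visible text. The sketch below uses only Python's standard `html.parser`; the `VisibleTextCollector` class, the `visible_text_ratio` helper, and the sample page are hypothetical illustrations, not Trustpilot's real markup, and character counts are only a rough proxy for tokens.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collects text nodes while skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore whitespace-only nodes and anything inside script/style
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text_ratio(html: str) -> float:
    """Fraction of the raw HTML characters that are human-readable text."""
    parser = VisibleTextCollector()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return len(text) / max(1, len(html))


# Hypothetical, heavily simplified page used only for illustration
sample_page = (
    "<html><head><style>.card{margin:0;padding:8px}</style>"
    "<script>window.analytics={track:function(){}};</script></head>"
    "<body><div class='card'><p>Great support, fast refund.</p></div></body></html>"
)

print(f"{visible_text_ratio(sample_page):.0%} of the characters are review text")
```

Everything outside that ratio is markup the model would still pay for if you passed the page through unparsed.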
## Connecting Your Agent to Trustpilot via AlterLab

You need a middleware layer that translates unstructured web pages into strict JSON. AlterLab provides this layer. Read our Getting started guide for initial environment setup.

For LLM workflows, the Extract API docs describe the recommended approach. Instead of returning raw HTML, the API renders the page in a headless browser, solves any bot challenges, and extracts exactly the data defined in your JSON schema.

Here is how to implement the extraction tool in Python.

```python title="agent_trustpilot_tool.py"
import json

import alterlab

client = alterlab.Client("YOUR_API_KEY")


def get_trustpilot_reviews(url: str) -> str:
    """Tool for the agent to fetch structured review data."""
    schema = {
        "company_name": "string",
        "overall_rating": "number",
        "reviews": [{
            "author": "string",
            "rating": "number",
            "date": "string",
            "text": "string",
            "helpful_votes": "number",
        }],
    }

    result = client.extract(
        url=url,
        schema=schema,
        min_tier=3,  # Force JS rendering for dynamic review loading
    )

    # Return a compact JSON string to save the agent's token budget
    return json.dumps(result.data, separators=(",", ":"))


if __name__ == "__main__":
    # Example usage by the agent
    extracted_data = get_trustpilot_reviews("https://www.trustpilot.com/review/example.com")
    print(extracted_data)
```

You can test this pipeline directly from your terminal to verify the structured output format before integrating it into your agent's tool registry.

```bash title="Terminal"
curl -X POST https://api.alterlab.io/api/v1/extract \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.trustpilot.com/review/example.com",
    "min_tier": 3,
    "schema": {
      "company_name": "string",
      "reviews": [{"rating": "number", "text": "string"}]
    }
  }'
```

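Even with a schema, it can pay to check the tool output before it enters the context window, so a malformed response becomes a short retry message instead of confusing the model. The sketch below is a strict local check that assumes the shorthand semantics the schemas in this guide imply (`"string"`, `"number"`, and a one-element list meaning "array of this shape"); AlterLab's own schema rules, for example how missing fields are returned, may differ. `matches_schema` and `REVIEW_SCHEMA` are hypothetical names.

```python
import json
from typing import Any


def matches_schema(value: Any, schema: Any) -> bool:
    """Check extracted data against the shorthand schema used in this guide."""
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, list):
        # A one-element list means "a list whose items all match this template"
        return isinstance(value, list) and all(matches_schema(item, schema[0]) for item in value)
    if isinstance(schema, dict):
        # Every declared key must be present and match its sub-schema
        return isinstance(value, dict) and all(
            key in value and matches_schema(value[key], sub) for key, sub in schema.items()
        )
    raise ValueError(f"Unsupported schema node: {schema!r}")


REVIEW_SCHEMA = {
    "company_name": "string",
    "reviews": [{"rating": "number", "text": "string"}],
}

# Example tool output in the compact format returned by get_trustpilot_reviews
tool_output = '{"company_name":"Example","reviews":[{"rating":4,"text":"Solid product."}]}'
data = json.loads(tool_output)
if matches_schema(data, REVIEW_SCHEMA):
    print("ok")
else:
    print("schema mismatch: ask the agent to retry")
```

If your extractions legitimately return `null` for absent fields, relax the check for those keys rather than rejecting the whole payload.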
## Using the Search API for Trustpilot Queries

Agents rarely know the exact Trustpilot URL for a given company. A robust agentic workflow requires a two-step process. First, the agent searches for the company profile. Second, the agent extracts the reviews from the located profile.

The Search API handles the discovery phase. It executes a query on the target site and returns a structured list of results. Your agent can evaluate the results, select the correct URL, and proceed with extraction.

```python title="search_tool.py"
import json

import alterlab


def find_trustpilot_profile(company_name: str) -> str:
    """Tool for the agent to locate a company's Trustpilot URL."""
    client = alterlab.Client("YOUR_API_KEY")

    # Restrict results to trustpilot.com pages
    query = f"site:trustpilot.com {company_name}"

    result = client.search(
        query=query,
        num_results=3,
    )

    # Return only the fields the agent needs to choose a URL
    return json.dumps([
        {"title": r.title, "url": r.url}
        for r in result.results
    ])
```

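Search results can include blog posts or category pages alongside the company profile. One option is to pre-filter deterministically before the model chooses. The sketch below is a hypothetical `pick_profile_url` helper that takes the JSON string returned by `find_trustpilot_profile` and keeps the first URL matching the `/review/<domain>` pattern seen in this guide's example URL; it assumes that pattern identifies profile pages and accepts Trustpilot subdomains such as `www.`.

```python
import json
from typing import Optional
from urllib.parse import urlparse


def pick_profile_url(search_results_json: str) -> Optional[str]:
    """Return the first result that looks like a Trustpilot company profile."""
    for result in json.loads(search_results_json):
        parsed = urlparse(result["url"])
        host = parsed.hostname or ""
        # Accept trustpilot.com and its subdomains, but not look-alike domains
        on_trustpilot = host == "trustpilot.com" or host.endswith(".trustpilot.com")
        # Profile pages follow the /review/<domain> pattern, e.g. /review/example.com
        if on_trustpilot and parsed.path.startswith("/review/"):
            return result["url"]
    return None
```

Returning `None` gives the agent a clear signal to refine its query instead of extracting from the wrong page.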
## MCP Integration

Building custom tools requires writing boilerplate code for every new LLM framework. The Model Context Protocol (MCP) standardizes how agents interact with external tools. 

Instead of writing wrapper functions, you can connect your agent directly to the web using our official MCP server. This allows AI assistants like Claude, Cursor, or custom LangChain agents to natively call extraction commands. Read the complete setup instructions in the [AlterLab for AI Agents](https://alterlab.io/docs/tutorials/ai-agent) documentation.

<div data-infographic="steps">
  <div data-step data-number="1" data-title="Agent executes tool call" data-description="LLM decides it needs review data and calls the Extract MCP tool with the target URL"></div>
  <div data-step data-number="2" data-title="System fetches data" data-description="Platform handles headless rendering, bypasses anti-bot systems, and extracts JSON"></div>
  <div data-step data-number="3" data-title="Context window updated" data-description="Clean structured data feeds directly back into the agent context for analysis"></div>
</div>

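In step 1, the MCP client (Claude, Cursor, or your framework) sends the request, not your own code. For reference, MCP carries tool invocations as JSON-RPC 2.0 `tools/call` messages whose `params` hold the tool `name` and its `arguments`. The sketch below only builds such a message to show its shape; the tool name `extract` and its `url` argument are hypothetical placeholders, so use whatever names the server advertises in its `tools/list` response.

```python
import itertools
import json

# JSON-RPC requests need an id that is unique per connection
_request_ids = itertools.count(1)


def mcp_tool_call(tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# "extract" and "url" are hypothetical; check the server's tools/list output
print(mcp_tool_call("extract", {"url": "https://www.trustpilot.com/review/example.com"}))
```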
## Building a Reputation Monitoring Pipeline

Let us assemble a complete pipeline. This example shows how an OpenAI-powered agent uses the tools defined above to monitor reputation autonomously, handling discovery, extraction, and synthesis.

We define two tools for the LLM. The first locates the target URL. The second performs the heavy extraction. The system prompt instructs the agent on how to sequence these tools.

```python title="reputation_pipeline.py"
import json

from openai import OpenAI

from agent_trustpilot_tool import get_trustpilot_reviews
from search_tool import find_trustpilot_profile

client = OpenAI()  # Reads OPENAI_API_KEY from the environment

# Map the tool names the model may call to the Python functions that implement them
TOOL_FUNCTIONS = {
    "find_trustpilot_profile": find_trustpilot_profile,
    "get_trustpilot_reviews": get_trustpilot_reviews,
}

tools = [
    {
        "type": "function",
        "function": {
            "name": "find_trustpilot_profile",
            "description": "Finds the Trustpilot URL for a given company name.",
            "parameters": {
                "type": "object",
                "properties": {
                    "company_name": {"type": "string"}
                },
                "required": ["company_name"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_trustpilot_reviews",
            "description": "Extracts recent reviews from a specific Trustpilot URL.",
            "parameters": {
                "type": "object",
                "properties": {
                    "url": {"type": "string"}
                },
                "required": ["url"]
            }
        }
    }
]


def analyze_competitor(company_name: str, max_turns: int = 5) -> str:
    messages = [
        {"role": "system", "content": "You are a competitive intelligence agent. First, find the target company's Trustpilot URL. Then, extract their reviews. Finally, write a brief technical summary of their users' most common complaints."},
        {"role": "user", "content": f"Analyze recent feedback for {company_name}."}
    ]

    for _ in range(max_turns):
        # Ask the model for its next action: a tool call or the final report
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=messages,
            tools=tools,
        )
        message = response.choices[0].message

        # No tool calls means the model has produced its final report
        if not message.tool_calls:
            return message.content

        # Keep the assistant's tool-call turn in the history, then answer each call
        messages.append(message)
        for call in message.tool_calls:
            args = json.loads(call.function.arguments)
            output = TOOL_FUNCTIONS[call.function.name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": output,
            })

    raise RuntimeError(f"Agent did not finish within {max_turns} turns")


if __name__ == "__main__":
    # Execute the pipeline
    print(analyze_competitor("Acme Corp"))
```

This architecture ensures the language model only operates on highly condensed, relevant information. By the time the LLM performs its final synthesis step, all HTML boilerplate and navigation logic has been stripped away. The model focuses purely on semantic analysis of the actual review text.

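When a profile has many reviews, you may still want a hard cap on what reaches the final synthesis step. A minimal sketch, assuming the compact JSON produced by `get_trustpilot_reviews`: the hypothetical `trim_reviews` helper keeps only each review's rating and text and stops adding reviews once the serialized payload would exceed a character budget, which stands in here for a token budget.

```python
import json


def trim_reviews(tool_output: str, max_chars: int = 4000) -> str:
    """Keep rating and text only, adding reviews until the JSON would exceed max_chars."""
    data = json.loads(tool_output)
    company = data.get("company_name")
    kept: list[dict] = []
    for review in data.get("reviews", []):
        candidate = kept + [{"rating": review.get("rating"), "text": review.get("text")}]
        payload = json.dumps({"company_name": company, "reviews": candidate}, separators=(",", ":"))
        # Stop at the first review that would push the payload over budget
        if len(payload) > max_chars:
            break
        kept = candidate
    return json.dumps({"company_name": company, "reviews": kept}, separators=(",", ":"))
```

Because reviews are added in page order, the budget keeps whichever reviews the page lists first; sort beforehand if you want, say, the most helpful ones instead.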
## Scaling and Cost

Agentic workflows run frequently. If a scheduled job checks twenty competitors every hour, your infrastructure needs to handle that volume without unpredictable cost spikes. Review AlterLab pricing to calculate exact usage limits for your pipeline. You pay only for successful extractions, which keeps the architecture scalable and the budget predictable.

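Before looking at prices, it helps to count requests. A back-of-the-envelope sketch for the scheduled job described above; the 30-day month is an assumption, `monthly_extractions` is a hypothetical helper, and no per-request price is assumed, so multiply the result by the rate on the pricing page.

```python
def monthly_extractions(competitors: int, runs_per_day: int, days: int = 30) -> int:
    """Extract API calls per period when every run re-extracts every competitor."""
    return competitors * runs_per_day * days


# Twenty competitors checked every hour over a 30-day month
volume = monthly_extractions(competitors=20, runs_per_day=24)
print(volume)  # 14400
```

Caching each competitor's profile URL after its first search keeps Search API calls out of the hourly loop, so only extractions scale with run frequency.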
## Key Takeaways

Giving your AI agent access to Trustpilot data requires robust infrastructure. Raw HTTP calls often fail against modern bot protection, and sending raw HTML wastes context-window tokens.

By using an extraction API built for AI workloads, you bypass these limitations. You define strict JSON schemas. The infrastructure handles the browser rendering and challenge solving. Your agent receives dense, structured data blocks. This creates reliable, automated pipelines for reputation monitoring, competitor analysis, and automated support operations.