AlterLab

Posted on • Originally published at alterlab.io

How to Give Your AI Agent Access to G2 Data

Disclaimer: This guide covers accessing publicly available data. Always review a site's robots.txt and Terms of Service before automated access.

TL;DR

To give an AI agent access to G2 data, route its tool calls through AlterLab's Extract API. This provides structured JSON directly to the LLM context window, bypassing the need for manual HTML parsing while handling browser rendering and rate limits automatically.

Why AI Agents Need G2 Data

AI agents building software comparison RAG pipelines require real-world user feedback. G2 hosts millions of public reviews, feature ratings, and market categorizations. Accessing this data enables agents to perform specific tasks:

  1. Software Comparison Research: Agents can pull feature matrices and user sentiment to compare tools dynamically, producing recommendations grounded in user-reported data.
  2. Competitor Intelligence: Pipelines can monitor a competitor's page for new negative reviews and alert product teams to the specific missing features those reviews mention.
  3. Category Monitoring: Agents can track entire software categories to identify emerging tools and shifts in market position.

Why Raw HTTP Requests Fail for Agents

Giving an LLM a standard HTTP client tool usually leads to pipeline failure. Target sites like G2 employ sophisticated rate limiting and browser fingerprinting. A plain GET request does not execute client-side JavaScript, so the client never behaves like a real browser and tends to trip bot detection quickly.

When this happens, the agent receives an HTML challenge page instead of data. This pollutes the context window. It wastes token budgets on retries. Often, the LLM hallucinates answers based on incomplete security page text. Agents need structured data, not raw DOM elements and CAPTCHA challenges.
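
One way to keep challenge pages out of the context window is to check a response before handing it to the model. The sketch below is a heuristic only: the function names `looks_like_challenge_page` and `tool_result_for_llm`, the status codes, and the marker strings are illustrative assumptions, not signatures of any particular site's bot protection.

```python
# Heuristic guard: decide whether a fetched body is worth passing to an LLM.
# The marker strings are illustrative assumptions, not an exhaustive list.
CHALLENGE_MARKERS = (
    "captcha",
    "verify you are human",
    "access denied",
    "enable javascript",
)


def looks_like_challenge_page(status_code: int, body: str) -> bool:
    """Return True when a response looks like a bot challenge, not content."""
    if status_code in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def tool_result_for_llm(status_code: int, body: str) -> str:
    """Return the body, or a short error string instead of challenge HTML."""
    if looks_like_challenge_page(status_code, body):
        return "ERROR: fetch blocked; no page content available."
    return body
```

Returning a one-line error instead of the challenge markup keeps the agent from retrying on, or reasoning about, security page text. A legitimate page that happens to mention one of the markers would be rejected too, so treat the list as a starting point.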

Connecting Your Agent to G2 via AlterLab

The solution is an intermediary tool that handles the transport layer and returns clean JSON. AlterLab provides this infrastructure. Before implementing the tool, follow our getting started guide to configure your environment and API keys.

You have two primary approaches: the Extract API for structured data and the Scrape API for raw HTML.

The Extract API Approach

The Extract API is designed specifically for AI agents. You define a schema, and the API returns a JSON object matching that schema. This minimizes context window usage. Review the full Extract API docs for advanced schema configurations.

```python title="agent_extract.py" {3-9}
import alterlab

client = alterlab.Client("YOUR_API_KEY")

# Structured extraction gets clean data without parsing HTML
result = client.extract(
    url="https://g2.com/categories/marketing-automation",
    schema={
        "products": ["string"],
        "top_features": ["string"],
        "average_rating": "number"
    }
)

print(result.data)  # Clean structured dict, ready for your LLM
```

The same endpoint can be called with curl:

```bash title="Terminal"
curl -X POST https://api.alterlab.io/api/v1/extract \
  -H "X-API-Key: YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://g2.com/categories/marketing-automation",
    "schema": {"products": ["string"]}
  }'
```
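
Before passing extracted data to the model, it can help to confirm the result has the shape you asked for. The following is a minimal sketch, assuming the schema notation used in this article (`"string"`, `"number"`, one-element lists, and nested dicts). `matches_schema` is a hypothetical local helper, not part of the AlterLab client.

```python
def matches_schema(value, schema) -> bool:
    """Check a parsed JSON value against a schema written in this article's style."""
    if schema == "string":
        return isinstance(value, str)
    if schema == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if isinstance(schema, list) and len(schema) == 1:
        # A one-element list such as ["string"] describes a homogeneous array.
        return isinstance(value, list) and all(
            matches_schema(item, schema[0]) for item in value
        )
    if isinstance(schema, dict):
        # Every key named in the schema must be present and well-typed.
        return isinstance(value, dict) and all(
            key in value and matches_schema(value[key], sub_schema)
            for key, sub_schema in schema.items()
        )
    return False  # Unknown schema notation: fail closed.
```

If the check fails, the tool can return a short error to the agent instead of partially filled data.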

The Scrape API Approach

If your agent operates in a Python environment and prefers to use tools like BeautifulSoup locally, you can use the Scrape API. This returns the raw HTML after full JavaScript rendering.

```python title="agent_scrape.py" {3-4}
import alterlab

client = alterlab.Client("YOUR_API_KEY")
html_content = client.scrape(url="https://g2.com/categories/crm")

# Agent can now parse the full DOM locally
```
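
Local parsing of the rendered HTML might look like the sketch below. It uses Python's standard-library `html.parser` so it runs without extra dependencies; BeautifulSoup would serve the same purpose. The `HeadingCollector` class, the choice of `<h2>` elements, and the sample markup are assumptions for illustration and do not reflect G2's actual page structure.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text content of every <h2> element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <b>) still arrives here.
        if self._in_h2:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self.headings.append("".join(self._buffer).strip())
            self._in_h2 = False


def extract_h2_text(html):
    """Return the stripped text of each <h2> in document order."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical markup, not G2's real page structure.
sample_html = "<body><h2>Product One</h2><p>...</p><h2> Product <b>Two</b></h2></body>"
print(extract_h2_text(sample_html))
```

In an agent, `extract_h2_text` would receive the `html_content` returned by the Scrape API, and only the short list of strings would go into the prompt.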




## Using the Search API for G2 Queries

Agents rarely know exact URLs in advance. A user might prompt the agent with "Compare the top CRM tools on G2." The agent must first search to find the correct pages. 

The AlterLab Search API allows agents to execute queries and retrieve organic results, which they can then feed into the Extract API.



```python title="agent_search.py" {3-7}
import alterlab

client = alterlab.Client("YOUR_API_KEY")

search_results = client.search(
    query="site:g2.com best crm software 2026",
    num_results=3
)

for result in search_results.data:
    print(result.url)
    # Agent iterates over URLs to extract reviews
```
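
Search results can include duplicates or off-site links, so it may be worth filtering them before spending Extract calls. The sketch below is one possible filter under stated assumptions: `is_g2_url` and `select_extract_targets` are hypothetical helpers, and the rule "https only, host is g2.com or a subdomain" is a design choice rather than anything the Search API requires.

```python
from urllib.parse import urlparse


def is_g2_url(url):
    """True for https URLs whose host is g2.com or a subdomain of it."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "https" and (host == "g2.com" or host.endswith(".g2.com"))


def select_extract_targets(urls, limit=3):
    """Deduplicate URLs, keep only G2 pages, and cap how many get extracted."""
    seen = set()
    targets = []
    for url in urls:
        if url in seen or not is_g2_url(url):
            continue
        seen.add(url)
        targets.append(url)
        if len(targets) == limit:
            break
    return targets
```

With the search example above, the input would be `(result.url for result in search_results.data)`, and each returned URL would then be passed to the Extract API.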

MCP Integration

If you use Claude, Cursor, or an MCP-compatible framework, you do not need to write custom Python tools. You can use the AlterLab MCP server. It exposes the Extract, Scrape, and Search endpoints directly to the model as native tool calls.

To configure this environment, read the AlterLab for AI Agents tutorial. Once connected, Claude can autonomously search G2, extract schemas, and synthesize answers without additional wrapper code.

Building a Software Comparison Research Pipeline

Let us build a function-calling pipeline. This example shows the logical flow of an agent receiving a user query and being offered a tool that fetches G2 data; the tool-call handling and final report generation are left as the last step.

```python title="comparison_pipeline.py" {16-25}
import json

import alterlab
import openai

alterlab_client = alterlab.Client("YOUR_ALTERLAB_KEY")
llm_client = openai.Client(api_key="YOUR_OPENAI_KEY")


def get_g2_product_data(url: str) -> str:
    """Tool provided to the LLM to fetch G2 data."""
    result = alterlab_client.extract(
        url=url,
        schema={
            "product_name": "string",
            "overall_rating": "number",
            "recent_reviews": [{"pros": "string", "cons": "string"}]
        }
    )
    return json.dumps(result.data)


tools = [{
    "type": "function",
    "function": {
        "name": "get_g2_product_data",
        "description": "Extracts structured product data and reviews from a G2 URL.",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The G2 product URL"}
            },
            "required": ["url"]
        }
    }
}]

# Agent execution loop
messages = [{
    "role": "user",
    "content": (
        "Compare the recent pros and cons of Product A vs Product B based on "
        "their G2 pages. Product A: https://g2.com/products/a/reviews. "
        "Product B: https://g2.com/products/b/reviews."
    ),
}]

response = llm_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

# In a complete application, you handle the tool_calls,
# append the JSON results to messages, and call the LLM again.
```
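
The remaining step, handling the tool calls and appending their results, can be sketched as follows. This is a minimal illustration, not a full agent loop: `run_tool_calls` and `registry` are hypothetical names, and the stand-in objects only mimic the `id`, `function.name`, and `function.arguments` fields found on the tool call objects in a chat completion response, so the sketch runs without network access.

```python
import json
from types import SimpleNamespace


def run_tool_calls(tool_calls, registry):
    """Execute each requested tool and build the matching 'tool' messages."""
    tool_messages = []
    for call in tool_calls:
        func = registry.get(call.function.name)
        if func is None:
            content = json.dumps({"error": f"unknown tool: {call.function.name}"})
        else:
            try:
                # The model sends arguments as a JSON-encoded string.
                arguments = json.loads(call.function.arguments)
                content = func(**arguments)
            except (json.JSONDecodeError, TypeError) as exc:
                # Report bad arguments back to the model instead of crashing.
                content = json.dumps({"error": str(exc)})
        tool_messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": content}
        )
    return tool_messages


# Stand-in shaped like an SDK tool call object (assumption for offline use).
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(
        name="get_g2_product_data",
        arguments='{"url": "https://g2.com/products/a/reviews"}',
    ),
)
# Stub tool; a real registry would map the name to get_g2_product_data.
registry = {"get_g2_product_data": lambda url: json.dumps({"fetched": url})}
print(run_tool_calls([fake_call], registry))
```

In the pipeline above, you would append `response.choices[0].message` to `messages`, extend `messages` with the output of `run_tool_calls` for that message's `tool_calls`, and then call `chat.completions.create` again so the model can write the comparison.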




When scaling this pipeline across thousands of products, check [AlterLab pricing](/pricing) to model your API usage costs. The Extract API significantly reduces LLM token costs by dropping heavy HTML markup before the data reaches your context window.


## Key Takeaways

1.  **Skip the DOM**: Giving your agent raw HTML wastes tokens and increases latency. Always use structured extraction endpoints.
2.  **Automate Transport**: Offload browser rendering and rate limiting to AlterLab so your agent focuses entirely on reasoning and synthesis.
3.  **Use MCP for Zero-Code Tools**: Connect Claude or Cursor directly to AlterLab via MCP to grant instant web data access without writing custom Python wrappers.