Custodia-Admin

Posted on • Originally published at pagebolt.dev

How to Use PageBolt MCP Tools in a LangChain or LlamaIndex Agent

MCP (Model Context Protocol) isn't just for IDE assistants like Claude Desktop and Cursor. You can connect any MCP server to a Python agent framework programmatically — and get every tool it exposes automatically, with no HTTP wrappers to write.

Here's how to give a LangChain or LlamaIndex agent access to PageBolt's 8 browser tools via MCP.

Why MCP over direct API calls

When you wrap the PageBolt API manually as LangChain tools, you define one function per endpoint and maintain the schemas yourself. With MCP:

  • All tools are discovered automatically from the server — add a new MCP tool and your agent gets it without a code change
  • Tool descriptions, parameter schemas, and validation come from the server
  • Same config pattern as Claude Desktop/Cursor — one server, many clients
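
That last bullet is literal: the same server definition drops into a Claude Desktop or Cursor config unchanged. A sketch of the standard `mcpServers` entry (put your real key in `env`):

```json
{
  "mcpServers": {
    "pagebolt": {
      "command": "npx",
      "args": ["-y", "pagebolt-mcp"],
      "env": { "PAGEBOLT_API_KEY": "your-api-key" }
    }
  }
}
```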

LangChain: using langchain-mcp-adapters

The langchain-mcp-adapters package bridges MCP servers and LangChain tool interfaces.

pip install langchain-mcp-adapters langchain-openai langgraph

Connect and load tools

import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

async def run_agent_with_mcp():
    # Start the PageBolt MCP server as a subprocess
    server_params = StdioServerParameters(
        command="npx",
        args=["-y", "pagebolt-mcp"],
        env={"PAGEBOLT_API_KEY": os.environ["PAGEBOLT_API_KEY"]},
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Load all PageBolt tools as LangChain-compatible tools
            tools = await load_mcp_tools(session)
            print(f"Loaded {len(tools)} tools: {[t.name for t in tools]}")
            # → ['take_screenshot', 'generate_pdf', 'create_og_image',
            #    'run_sequence', 'record_video', 'inspect_page',
            #    'list_devices', 'check_usage']

            # Build a ReAct agent with the tools
            model = ChatOpenAI(model="gpt-4o", temperature=0)
            agent = create_react_agent(model, tools)

            # Run a task
            result = await agent.ainvoke({
                "messages": [
                    {
                        "role": "user",
                        "content": (
                            "Screenshot https://news.ycombinator.com "
                            "and tell me the top 5 story titles"
                        ),
                    }
                ]
            })
            return result["messages"][-1].content


if __name__ == "__main__":
    print(asyncio.run(run_agent_with_mcp()))
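
One practical note: if PAGEBOLT_API_KEY is unset, the code above dies with an opaque `KeyError` or subprocess failure. A tiny fail-fast check (my own helper, not part of any SDK) surfaces the problem up front:

```python
import os

def require_env(name: str) -> str:
    """Return the value of a required env var, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the MCP server")
    return value

# Usage in StdioServerParameters:
#   env={"PAGEBOLT_API_KEY": require_env("PAGEBOLT_API_KEY")}
```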

Multi-task agent loop

import asyncio
import os
from contextlib import AsyncExitStack
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

async def build_agent(stack: AsyncExitStack):
    """Build a persistent agent with PageBolt MCP tools."""
    server_params = StdioServerParameters(
        command="npx",
        args=["-y", "pagebolt-mcp"],
        env={"PAGEBOLT_API_KEY": os.environ["PAGEBOLT_API_KEY"]},
    )
    # Keep the transport and session open for the agent's lifetime by
    # registering them on the caller's AsyncExitStack (calling __aenter__
    # by hand would leak the context managers and skip cleanup on exit)
    read, write = await stack.enter_async_context(stdio_client(server_params))
    session = await stack.enter_async_context(ClientSession(read, write))
    await session.initialize()

    tools = await load_mcp_tools(session)
    model = ChatOpenAI(model="gpt-4o", temperature=0)
    agent = create_react_agent(model, tools)
    return agent


async def main():
    async with AsyncExitStack() as stack:
        agent = await build_agent(stack)

        tasks = [
            "Take a full-page screenshot of https://example.com and describe the layout",
            "Inspect https://example.com — list all buttons and links with their selectors",
            "Generate a PDF of https://example.com",
            "What devices are available for mobile screenshots?",
            "How many API requests have I used this month?",
        ]

        for task in tasks:
            print(f"\nTask: {task}")
            result = await agent.ainvoke({"messages": [{"role": "user", "content": task}]})
            print(f"Result: {result['messages'][-1].content}")


asyncio.run(main())
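
The loop above runs tasks one at a time. Since each ainvoke call is a coroutine, independent prompts can also be fanned out with asyncio.gather. A sketch of the pattern (my helper; note that whether the calls actually overlap depends on the server, since a single stdio MCP session may serialize tool calls):

```python
import asyncio

async def run_tasks_concurrently(agent, tasks: list[str]) -> list[str]:
    """Dispatch independent prompts concurrently and collect each final message."""
    async def run_one(task: str) -> str:
        result = await agent.ainvoke(
            {"messages": [{"role": "user", "content": task}]}
        )
        return result["messages"][-1].content

    # gather preserves input order in its result list
    return await asyncio.gather(*(run_one(t) for t in tasks))
```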

LlamaIndex: native MCP tool support

LlamaIndex has a built-in BasicMCPClient that connects to MCP servers directly.

pip install llama-index llama-index-tools-mcp llama-index-llms-openai

import asyncio
import os
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

async def run_llamaindex_agent():
    # Connect to the PageBolt MCP server
    mcp_client = BasicMCPClient(
        "npx",  # first positional is command_or_url
        args=["-y", "pagebolt-mcp"],
        env={"PAGEBOLT_API_KEY": os.environ["PAGEBOLT_API_KEY"]},
    )

    # Wrap as LlamaIndex tools
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()
    print(f"Loaded {len(tools)} tools: {[t.metadata.name for t in tools]}")

    # Build a ReAct agent
    llm = OpenAI(model="gpt-4o", temperature=0)
    agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

    # Run tasks (use achat, since we're inside an async function)
    response = await agent.achat(
        "Screenshot https://pagebolt.dev on an iPhone 14 Pro and "
        "describe the mobile layout"
    )
    print(response)

    response = await agent.achat(
        "Inspect https://example.com/login and find the CSS selectors "
        "for the username and password fields"
    )
    print(response)


asyncio.run(run_llamaindex_agent())
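
Tool discovery is dynamic, so a quick sanity check that the expected PageBolt tools actually loaded can save debugging time later. The name set mirrors the tool list above; the helper itself is mine:

```python
EXPECTED_PAGEBOLT_TOOLS = {
    "take_screenshot", "generate_pdf", "create_og_image", "run_sequence",
    "record_video", "inspect_page", "list_devices", "check_usage",
}

def missing_tools(loaded_names, expected=EXPECTED_PAGEBOLT_TOOLS):
    """Return the expected tool names absent from what the server reported."""
    return sorted(expected - set(loaded_names))

# After loading:
#   missing_tools(t.metadata.name for t in tools)  # should be []
```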

LlamaIndex: workflow with sequential tool calls

import asyncio
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI

async def visual_qa_workflow(url: str) -> str:
    """Inspect a page, then screenshot it, then report findings."""
    mcp_client = BasicMCPClient(
        "npx",  # first positional is command_or_url
        args=["-y", "pagebolt-mcp"],
        env={"PAGEBOLT_API_KEY": os.environ["PAGEBOLT_API_KEY"]},
    )
    mcp_tool_spec = McpToolSpec(client=mcp_client)
    tools = await mcp_tool_spec.to_tool_list_async()

    llm = OpenAI(model="gpt-4o", temperature=0)
    agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)

    result = await agent.achat(f"""
    Perform a quick audit of {url}:
    1. Use inspect_page to map all interactive elements
    2. Use take_screenshot to capture the full page
    3. Report: how many forms, buttons, and links are on the page?
       Does the visual layout look complete?
    """)
    return str(result)


result = asyncio.run(visual_qa_workflow("https://example.com"))
print(result)

Using Claude as the LLM (Anthropic + LangChain)

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from langchain_mcp_adapters.tools import load_mcp_tools
from langchain_anthropic import ChatAnthropic
from langgraph.prebuilt import create_react_agent

async def run_with_claude():
    server_params = StdioServerParameters(
        command="npx",
        args=["-y", "pagebolt-mcp"],
        env={"PAGEBOLT_API_KEY": os.environ["PAGEBOLT_API_KEY"]},
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await load_mcp_tools(session)

            # Claude is natively multimodal — it can see the screenshots it takes
            model = ChatAnthropic(model="claude-sonnet-4-6", temperature=0)
            agent = create_react_agent(model, tools)

            result = await agent.ainvoke({
                "messages": [{
                    "role": "user",
                    "content": (
                        "Screenshot https://example.com in dark mode and "
                        "describe the color scheme and typography"
                    ),
                }]
            })
            return result["messages"][-1].content


print(asyncio.run(run_with_claude()))
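
One caveat with Anthropic models: message content may come back as a list of content blocks rather than a plain string, so `result["messages"][-1].content` is not always directly printable. A small normalizer (mine, not LangChain's) covers both cases:

```python
def final_text(result: dict) -> str:
    """Extract the last message's text, handling string and block-list content."""
    content = result["messages"][-1].content
    if isinstance(content, str):
        return content
    # Anthropic-style list of blocks: keep only the text pieces
    return "".join(
        block.get("text", "") for block in content if isinstance(block, dict)
    )
```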

What tools you get

Once connected, your agent has access to all 8 PageBolt MCP tools:

| Tool | What it does |
| --- | --- |
| take_screenshot | Screenshot any URL, HTML, or Markdown — 30+ params including device, dark mode, ad blocking |
| generate_pdf | Convert any URL or HTML to PDF |
| create_og_image | Generate social card images from templates or custom HTML |
| run_sequence | Multi-step browser automation (navigate, click, fill, screenshot) |
| record_video | Record browser automation as MP4 with cursor effects and AI voice narration |
| inspect_page | Get all interactive elements with CSS selectors — use before run_sequence |
| list_devices | List 25+ device presets (iPhone, iPad, MacBook, Galaxy, etc.) |
| check_usage | Check current API quota |
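
You don't have to hand the agent all eight. Trimming the tool list narrows the agent's action space, which often improves tool selection. A plain-Python filter, nothing framework-specific (LangChain tools expose `.name`; for LlamaIndex use `t.metadata.name` instead):

```python
def filter_tools(tools, allowed):
    """Keep only the tools whose name is in the allowed set."""
    return [t for t in tools if t.name in allowed]

# e.g. a screenshot-only agent:
#   agent = create_react_agent(model, filter_tools(tools, {"take_screenshot", "list_devices"}))
```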

MCP vs direct HTTP: when to use each

|  | MCP approach | Direct HTTP approach |
| --- | --- | --- |
| Setup | Install MCP client package, start server subprocess | Write tool wrappers manually |
| Tools available | All 8 automatically | Only what you implement |
| Schema maintenance | Server-managed | Manual |
| Best for | Agent frameworks, reusable pipelines | Lightweight scripts, specific endpoints |
| Requires Node.js | Yes (for npx pagebolt-mcp) | No |

The MCP approach is better for agent frameworks where you want all tools available and don't want to maintain HTTP wrapper code. Direct HTTP (article 104) is better for targeted scripts or environments without Node.js.


Try it free — 100 requests/month, no credit card. → Get started in 2 minutes
