Emil Anker
Starter Guide: Browser Agents

Overview

A browser agent receives an instruction (e.g., “Open example.com and get the page title”) and then performs browser actions—click, type, navigate, scrape—in a remote or headless browser. The steps below let you:

  1. Use Vercel AI SDK + Anthropic or OpenAI to interpret the user’s instruction.

  2. Call “Browser Use (Open)” or Multi-On to automate a browser session in the cloud, without installing heavy local dependencies.

  3. Return the browser’s output (like scraped text) to the user in a Next.js front end.

We’ll also highlight a Vercel Template to help you get the UI running fast, plus an example using Llama-2-7b-chat on Replicate if you prefer an open-source model.


Ingredients (Est. 15 min)

  1. Next.js (13+ recommended)
  2. Vercel AI SDK
  3. An LLM provider (Anthropic, OpenAI, or an open-source model like Llama-2-7b-chat on Replicate)
  4. A browser-automation option (Browser Use (Open) or Multi-On)

(Est. 15 min) to sign up for an LLM service, select a Next.js template on Vercel, and install dependencies.


Step 1: Set Up Your Next.js Project (Est. 15 min)

Option A: Create from scratch

npx create-next-app browser-agent
cd browser-agent

Option B: Use a Vercel Template

  1. Visit vercel.com/templates.
  2. Select a Next.js starter (e.g. “Next.js 13 Minimal Starter”).
  3. Click “Deploy,” then clone it locally (or directly code in the Vercel environment).

Install the Vercel AI SDK

npm install ai @ai-sdk/anthropic

(The Vercel AI SDK is published on npm as ai; install @ai-sdk/openai instead if you prefer OpenAI.)

Choose Your Browser Framework

  • Browser Use (Open): npm install browser-use (Browser Use is primarily a Python project, so check its docs for the current JavaScript or cloud integration before relying on this package name.)
  • Or the Multi-On Cookbook (replace the browser-automation calls accordingly in the code below).

Add Environment Variables

Create .env.local:

ANTHROPIC_API_KEY=your_anthropic_key
# OR
OPENAI_API_KEY=your_openai_key

# If using Llama-2-7b-chat on Replicate:
REPLICATE_API_TOKEN=...
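Server code reads these values from process.env at runtime. Here's a minimal sketch of a guard that fails fast when no key is configured; the lib/env.js path and getLLMConfig name are hypothetical, not part of any template:

```javascript
// lib/env.js (hypothetical helper): fail fast if no LLM credential is set,
// instead of getting a confusing 401 deep inside the API route.
function getLLMConfig(env = process.env) {
  const { ANTHROPIC_API_KEY, OPENAI_API_KEY, REPLICATE_API_TOKEN } = env;
  if (!ANTHROPIC_API_KEY && !OPENAI_API_KEY && !REPLICATE_API_TOKEN) {
    throw new Error(
      'Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or REPLICATE_API_TOKEN in .env.local'
    );
  }
  return { ANTHROPIC_API_KEY, OPENAI_API_KEY, REPLICATE_API_TOKEN };
}
```

Call it once at the top of your route handler so misconfiguration surfaces as a clear error message.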

(Est. 15 min) for setting up the template, installing deps, and adding environment variables.


Step 2: Browser Agent API Route (Est. 30 min)

Create a route in Next.js 13 at app/api/agent/route.js (or pages/api/agent.js if you’re using the Pages Router). If you used a Vercel Template, just add this file:

// app/api/agent/route.js
import { NextResponse } from 'next/server';
// NOTE: this `launch` helper is illustrative. Browser Use is primarily a
// Python project; the page API below follows Playwright-style conventions,
// so adapt it to whichever automation library or cloud service you chose.
import { launch } from 'browser-use'; 

//-------------------------------------------------------------------
// 1. CHOOSE YOUR LLM OPTION
//-------------------------------------------------------------------

// Option A: Anthropic (or OpenAI) via the Vercel AI SDK
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Option B: Llama-2-7b-chat via Replicate (custom fetch)
async function fetchLlama2Replicate(userPrompt) {
  const REPLICATE_URL = 'https://api.replicate.com/v1/predictions';
  const response = await fetch(REPLICATE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Token ${process.env.REPLICATE_API_TOKEN}`,
    },
    body: JSON.stringify({
      version: 'replicate/llama-2-7b-chat:<version-hash>', // Replicate pins versions by hash (':latest' is not valid); copy the hash from the model page
      input: {
        prompt: userPrompt,
      },
    }),
  });
  const replicateData = await response.json();
  // Replicate predictions are asynchronous: the first response usually has
  // status "starting" and no output yet, so poll replicateData.urls.get
  // until status is "succeeded". The output is typically an array of strings.
  const text = replicateData?.output ?? 'No response yet';
  return text;
}

//-------------------------------------------------------------------
// 2. MAIN HANDLER
//-------------------------------------------------------------------

export async function POST(request) {
  try {
    const { userPrompt, useReplicate } = await request.json();

    // If user wants to use Llama-2-7b-chat on Replicate:
    if (useReplicate) {
      const replicateResponse = await fetchLlama2Replicate(`
        The user says: "${userPrompt}".
        Please output a JSON with:
        {"url":"https://...","action":"scrapeText","selector":"..."}
      `);
      // Parse the model output (Replicate typically returns an array of strings)
      let instructions;
      try {
        instructions = JSON.parse(
          (Array.isArray(replicateResponse)
            ? replicateResponse.join('')
            : String(replicateResponse)
          ).trim()
        );
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    } else {
      // Otherwise, Anthropic via the Vercel AI SDK (swap in @ai-sdk/openai for OpenAI)
      const { text: llmResponse } = await generateText({
        model: anthropic('claude-3-haiku-20240307'), // reads ANTHROPIC_API_KEY from the environment
        prompt: `
          The user says: "${userPrompt}".
          Please output a JSON with:
          {
            "url": "https://...",
            "action": "scrapeText" or "click" or "fillForm",
            "selector": "...",
            "inputText": "..."
          }
          If not sure, set "action":"none".
        `,
      });

      // parse LLM response
      let instructions;
      try {
        instructions = JSON.parse(llmResponse.trim());
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    }
  } catch (error) {
    console.error(error);
    return NextResponse.json({ success: false, error: error.message }, { status: 500 });
  }
}

//-------------------------------------------------------------------
// 3. HELPER to launch browser & perform actions
//-------------------------------------------------------------------
async function handleBrowserActions(instructions) {
  const { page, browser } = await launch();
  await page.goto(instructions.url || 'https://example.com');

  let result;
  switch (instructions.action) {
    case 'scrapeText':
      result = await page.textContent(instructions.selector || 'h1');
      break;
    case 'click':
      await page.click(instructions.selector || 'body');
      result = 'Clicked the element!';
      break;
    case 'fillForm':
      if (instructions.inputText) {
        await page.fill(instructions.selector, instructions.inputText);
        result = `Filled form with: ${instructions.inputText}`;
      } else {
        result = `No inputText provided.`;
      }
      break;
    default:
      result = 'No recognized action or action was "none". Did nothing.';
      break;
  }

  await browser.close();
  return NextResponse.json({ success: true, instructions, result });
}
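Since Replicate predictions are asynchronous, the initial POST usually returns before any output exists. Here is a hedged sketch of a polling helper; the waitForPrediction name is ours, and it assumes the standard prediction object Replicate returns (status, urls.get, output). The fetchStatus callback is injected so you can wire in the authenticated GET request:

```javascript
// Hypothetical helper: poll a Replicate prediction until it finishes.
// A fresh prediction typically has status "starting"; the output only
// appears once status becomes "succeeded".
async function waitForPrediction(
  prediction,
  fetchStatus, // e.g. GET prediction.urls.get with the `Authorization: Token ...` header
  { intervalMs = 1000, maxAttempts = 60 } = {}
) {
  let current = prediction;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (current.status === 'succeeded') return current.output;
    if (current.status === 'failed' || current.status === 'canceled') {
      throw new Error(`Prediction ${current.status}: ${current.error ?? 'unknown error'}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    current = await fetchStatus(current);
  }
  throw new Error('Timed out waiting for Replicate prediction');
}
```

In fetchLlama2Replicate you would call this right after the initial POST, before trying to read the output.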

Notes

  • useReplicate: We added this field so you can toggle between a Replicate Llama-2 call or the default Anthropic/OpenAI approach with minimal code changes.
  • If you only want Llama-2-7b-chat on Replicate, remove the Anthropic/OpenAI code and rely on fetchLlama2Replicate.
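Both branches fall back to action:"none" when JSON parsing fails. If you want stricter handling, a small normalizer (a hypothetical helper, not part of any SDK) can whitelist actions so handleBrowserActions never receives an unexpected command:

```javascript
// Hypothetical validator for the instruction JSON the LLM returns.
// Unknown actions are normalized to "none" and missing fields default
// to empty strings, matching the fallback shape used in the route.
const ALLOWED_ACTIONS = ['scrapeText', 'click', 'fillForm', 'none'];

function normalizeInstructions(raw) {
  let parsed;
  try {
    parsed = typeof raw === 'string' ? JSON.parse(raw) : raw;
  } catch {
    parsed = {};
  }
  return {
    url: typeof parsed.url === 'string' ? parsed.url : '',
    action: ALLOWED_ACTIONS.includes(parsed.action) ? parsed.action : 'none',
    selector: typeof parsed.selector === 'string' ? parsed.selector : '',
    inputText: typeof parsed.inputText === 'string' ? parsed.inputText : '',
  };
}
```

You would call normalizeInstructions on the parsed LLM output in both branches before invoking handleBrowserActions.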

(Est. 30 min) to write, test, and debug this route.


Step 3: Frontend UI (Est. 15 min)

If you used a Vercel Template with a default homepage, you can replace or edit that page. For Next.js 13:

"use client";
import { useState } from "react";

export default function Home() {
  const [userPrompt, setUserPrompt] = useState("");
  const [agentOutput, setAgentOutput] = useState("");
  const [useReplicate, setUseReplicate] = useState(false);

  async function handleSubmit() {
    setAgentOutput("Loading...");
    const res = await fetch("/api/agent", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userPrompt, useReplicate }),
    });
    const data = await res.json();
    if (data.success) {
      setAgentOutput(
        `Action done!\n\nInstructions: ${JSON.stringify(
          data.instructions
        )}\nResult: ${data.result}`
      );
    } else {
      setAgentOutput(`Error: ${data.error}`);
    }
  }

  return (
    <main style={{ padding: 20 }}>
      <h1>Browser Agent Demo</h1>
      <p>Try: "Go to https://example.com and scrapeText at 'h1'."</p>
      <input
        style={{ width: 300 }}
        value={userPrompt}
        onChange={(e) => setUserPrompt(e.target.value)}
      />
      <div style={{ marginTop: 10 }}>
        <label>
          <input
            type="checkbox"
            checked={useReplicate}
            onChange={(e) => setUseReplicate(e.target.checked)}
          />
          Use Llama-2-7b-chat on Replicate
        </label>
      </div>
      <button style={{ marginTop: 10 }} onClick={handleSubmit}>Run</button>
      <pre style={{ marginTop: 20 }}>{agentOutput}</pre>
    </main>
  );
}

(Est. 15 min) to build a simple input form, fetch the API, and display results.


Step 4: Deploy to Vercel (Est. 15 min)

  1. Push to GitHub (or GitLab).
  2. Create a New Project on Vercel (if you didn’t do so at the template stage).
  3. In Project Settings → Environment Variables, add:
    • ANTHROPIC_API_KEY or OPENAI_API_KEY
    • If using Llama-2-7b-chat on Replicate, add REPLICATE_API_TOKEN.
  4. Deploy.

You’ll get a production URL like https://browser-agent.vercel.app.

(Est. 15 min) to finalize environment variables and deployment.


Step 5: Example Use Case — Get Headline from Hacker News

Now that your Browser Agent is running, let’s try a real-world example. We’ll scrape the latest headline from the front page of Hacker News.

  1. Go to Your Deployed Site

    • e.g., https://browser-agent.vercel.app
  2. Enter a Prompt:

   Go to https://news.ycombinator.com and scrapeText at '.titleline a'

This instructs the LLM to generate JSON instructions like:

   {
     "url": "https://news.ycombinator.com",
     "action": "scrapeText",
     "selector": ".titleline a"
   }
  3. Agent Executes

    • The server route interprets the instructions, launches a remote browser, navigates to Hacker News, and scrapes .titleline a.
  4. Response

    • The JSON returned might look like:
     {
       "success": true,
       "instructions": {
         "url": "https://news.ycombinator.com",
         "action": "scrapeText",
         "selector": ".titleline a"
       },
       "result": "Example Headline from HN"
     }
    
  • Your UI shows “Action done!” plus the scraped headline.

More Potential Browser Agent Use Cases

Once you have a Browser Agent, you can easily add new tasks. For example:

  1. Auto-Form Filling

    “Go to example.com/login, fillForm with username: myUser, password: myPass, then click on ‘#submitBtn’.”

  2. Price Comparison

    Scrape prices from multiple e-commerce sites to find the best deal, then combine the results.

  3. Content Extraction

    Collect blog post titles, meta descriptions, or images across a list of websites.

  4. Email Reading (in Webmail)

    Navigate to Gmail/Outlook web UI, log in, parse unread messages, maybe even respond.

  5. Automated Testing

    Provide a test scenario, e.g., “Go to my staging URL, fill the form with test data, confirm success message.”
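Most of these use cases chain several browser actions together. One way to sketch that is a sequential runner; runSequence and the performAction callback are hypothetical names, and in the route above performAction would be a variant of handleBrowserActions that reuses a single browser session instead of launching a new one per step:

```javascript
// Hypothetical multi-step runner: execute instruction objects in order,
// collecting each step's result, so e.g. price comparison can visit
// several sites and combine what it finds.
async function runSequence(instructionsList, performAction) {
  const results = [];
  for (const instructions of instructionsList) {
    // Steps run strictly in order, since later actions (like clicking
    // a submit button) often depend on earlier ones (like filling a form).
    results.push(await performAction(instructions));
  }
  return results;
}
```

The LLM prompt would then ask for an array of instruction objects rather than a single one.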


Time Summary & Final Notes

  • Initial Setup: (Est. 15 min)
  • API Route: (Est. 30 min)
  • Front End: (Est. 15 min)
  • Deployment: (Est. 15 min)

Total: ~1.5 hours for a basic functional version!

Congrats! You now have a Browser Agent that can interpret user prompts, run tasks in a headless browser, and return results—all from a Next.js front end deployed on Vercel. You’ve even tested scraping headlines from Hacker News and seen how easy it is to integrate Llama-2-7b-chat on Replicate. Next, explore multi-step flows, advanced UI, or a different use case to take your agent even further.
