Emil Anker
Starter Guide: Browser Agents

Overview

A browser agent receives an instruction (e.g., “Open example.com and get the page title”) and then performs browser actions—click, type, navigate, scrape—in a remote or headless browser. The steps below let you:

  1. Use Vercel AI SDK + Anthropic or OpenAI to interpret the user’s instruction.

  2. Call “Browser Use (Open)” or Multi-On to automate a browser session in the cloud, without installing heavy local dependencies.

  3. Return the browser’s output (like scraped text) to the user in a Next.js front end.

We’ll also highlight a Vercel Template to help you get the UI running fast, plus an example using Llama-2-7b-chat on Replicate if you prefer an open-source model.


Ingredients (Est. 15 min)

  1. Next.js (13+ recommended)
  2. Vercel AI SDK
  3. An LLM provider (Anthropic, OpenAI, or an open-source model like Llama-2-7b-chat on Replicate)
  4. A browser-automation option (Browser Use (Open) or Multi-On)

(Est. 15 min) to sign up for an LLM service, select a Next.js template on Vercel, and install dependencies.


Step 1: Set Up Your Next.js Project (Est. 15 min)

Option A: Create from scratch

npx create-next-app browser-agent
cd browser-agent

Option B: Use a Vercel Template

  1. Visit vercel.com/templates.
  2. Select a Next.js starter (e.g. “Next.js 13 Minimal Starter”).
  3. Click “Deploy,” then clone it locally (or directly code in the Vercel environment).

Install the Vercel AI SDK

npm install ai @ai-sdk/anthropic

(The Vercel AI SDK is published on npm as ai; install @ai-sdk/openai instead if you prefer OpenAI.)

Choose Your Browser Framework

  • Browser Use (Open): npm install browser-use (Browser Use is primarily a Python project, so check its docs for the current JavaScript or cloud integration before relying on this package name.)
  • Or the Multi-On Cookbook (replace the browser-automation calls accordingly in the code below).

Add Environment Variables

Create .env.local:

ANTHROPIC_API_KEY=your_anthropic_key
# OR
OPENAI_API_KEY=your_openai_key

# If using Llama-2-7b-chat on Replicate:
REPLICATE_API_TOKEN=...
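Server code reads these values from process.env at runtime. Here's a minimal sketch of a guard that fails fast when no key is configured; the lib/env.js path and getLLMConfig name are hypothetical, not part of any template:

```javascript
// lib/env.js (hypothetical helper): fail fast if no LLM credential is set,
// instead of getting a confusing 401 deep inside the API route.
function getLLMConfig(env = process.env) {
  const { ANTHROPIC_API_KEY, OPENAI_API_KEY, REPLICATE_API_TOKEN } = env;
  if (!ANTHROPIC_API_KEY && !OPENAI_API_KEY && !REPLICATE_API_TOKEN) {
    throw new Error(
      'Set ANTHROPIC_API_KEY, OPENAI_API_KEY, or REPLICATE_API_TOKEN in .env.local'
    );
  }
  return { ANTHROPIC_API_KEY, OPENAI_API_KEY, REPLICATE_API_TOKEN };
}
```

Call it once at the top of your route handler so misconfiguration surfaces as a clear error message.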

(Est. 15 min) for setting up the template, installing deps, and adding environment variables.


Step 2: Browser Agent API Route (Est. 30 min)

Create a route in Next.js 13 at app/api/agent/route.js (or pages/api/agent.js if you’re using the Pages Router). If you used a Vercel Template, just add this file:

// app/api/agent/route.js
import { NextResponse } from 'next/server';
// NOTE: this `launch` helper is illustrative. Browser Use is primarily a
// Python project; the page API below follows Playwright-style conventions,
// so adapt it to whichever automation library or cloud service you chose.
import { launch } from 'browser-use'; 

//-------------------------------------------------------------------
// 1. CHOOSE YOUR LLM OPTION
//-------------------------------------------------------------------

// Option A: Anthropic (or OpenAI) via the Vercel AI SDK
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';

// Option B: Llama-2-7b-chat via Replicate (custom fetch)
async function fetchLlama2Replicate(userPrompt) {
  const REPLICATE_URL = 'https://api.replicate.com/v1/predictions';
  const response = await fetch(REPLICATE_URL, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Token ${process.env.REPLICATE_API_TOKEN}`,
    },
    body: JSON.stringify({
      version: 'replicate/llama-2-7b-chat:<version-hash>', // Replicate pins versions by hash (':latest' is not valid); copy the hash from the model page
      input: {
        prompt: userPrompt,
      },
    }),
  });
  const replicateData = await response.json();
  // Replicate predictions are asynchronous: the first response usually has
  // status "starting" and no output yet, so poll replicateData.urls.get
  // until status is "succeeded". The output is typically an array of strings.
  const text = replicateData?.output ?? 'No response yet';
  return text;
}

//-------------------------------------------------------------------
// 2. MAIN HANDLER
//-------------------------------------------------------------------

export async function POST(request) {
  try {
    const { userPrompt, useReplicate } = await request.json();

    // If user wants to use Llama-2-7b-chat on Replicate:
    if (useReplicate) {
      const replicateResponse = await fetchLlama2Replicate(`
        The user says: "${userPrompt}".
        Please output a JSON with:
        {"url":"https://...","action":"scrapeText","selector":"..."}
      `);
      // Parse the model output (Replicate typically returns an array of strings)
      let instructions;
      try {
        instructions = JSON.parse(
          (Array.isArray(replicateResponse)
            ? replicateResponse.join('')
            : String(replicateResponse)
          ).trim()
        );
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    } else {
      // Otherwise, Anthropic via the Vercel AI SDK (swap in @ai-sdk/openai for OpenAI)
      const { text: llmResponse } = await generateText({
        model: anthropic('claude-3-haiku-20240307'), // reads ANTHROPIC_API_KEY from the environment
        prompt: `
          The user says: "${userPrompt}".
          Please output a JSON with:
          {
            "url": "https://...",
            "action": "scrapeText" or "click" or "fillForm",
            "selector": "...",
            "inputText": "..."
          }
          If not sure, set "action":"none".
        `,
      });

      // parse LLM response
      let instructions;
      try {
        instructions = JSON.parse(llmResponse.trim());
      } catch {
        instructions = { url: "", action: "none", selector: "", inputText: "" };
      }

      return await handleBrowserActions(instructions);
    }
  } catch (error) {
    console.error(error);
    return NextResponse.json({ success: false, error: error.message }, { status: 500 });
  }
}

//-------------------------------------------------------------------
// 3. HELPER to launch browser & perform actions
//-------------------------------------------------------------------
async function handleBrowserActions(instructions) {
  const { page, browser } = await launch();
  await page.goto(instructions.url || 'https://example.com');

  let result;
  switch (instructions.action) {
    case 'scrapeText':
      result = await page.textContent(instructions.selector || 'h1');
      break;
    case 'click':
      await page.click(instructions.selector || 'body');
      result = 'Clicked the element!';
      break;
    case 'fillForm':
      if (instructions.inputText) {
        await page.fill(instructions.selector, instructions.inputText);
        result = `Filled form with: ${instructions.inputText}`;
      } else {
        result = `No inputText provided.`;
      }
      break;
    default:
      result = 'No recognized action or action was "none". Did nothing.';
      break;
  }

  await browser.close();
  return NextResponse.json({ success: true, instructions, result });
}
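Since Replicate predictions are asynchronous, the initial POST usually returns before any output exists. Here is a hedged sketch of a polling helper; the waitForPrediction name is ours, and it assumes the standard prediction object Replicate returns (status, urls.get, output). The fetchStatus callback is injected so you can wire in the authenticated GET request:

```javascript
// Hypothetical helper: poll a Replicate prediction until it finishes.
// A fresh prediction typically has status "starting"; the output only
// appears once status becomes "succeeded".
async function waitForPrediction(
  prediction,
  fetchStatus, // e.g. GET prediction.urls.get with the `Authorization: Token ...` header
  { intervalMs = 1000, maxAttempts = 60 } = {}
) {
  let current = prediction;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (current.status === 'succeeded') return current.output;
    if (current.status === 'failed' || current.status === 'canceled') {
      throw new Error(`Prediction ${current.status}: ${current.error ?? 'unknown error'}`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    current = await fetchStatus(current);
  }
  throw new Error('Timed out waiting for Replicate prediction');
}
```

In fetchLlama2Replicate you would call this right after the initial POST, before trying to read the output.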

Notes

  • useReplicate: We added this field so you can toggle between a Replicate Llama-2 call or the default Anthropic/OpenAI approach with minimal code changes.
  • If you only want Llama-2-7b-chat on Replicate, remove the Anthropic/OpenAI code and rely on fetchLlama2Replicate.
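Both branches fall back to action:"none" when JSON parsing fails. If you want stricter handling, a small normalizer (a hypothetical helper, not part of any SDK) can whitelist actions so handleBrowserActions never receives an unexpected command:

```javascript
// Hypothetical validator for the instruction JSON the LLM returns.
// Unknown actions are normalized to "none" and missing fields default
// to empty strings, matching the fallback shape used in the route.
const ALLOWED_ACTIONS = ['scrapeText', 'click', 'fillForm', 'none'];

function normalizeInstructions(raw) {
  let parsed;
  try {
    parsed = typeof raw === 'string' ? JSON.parse(raw) : raw;
  } catch {
    parsed = {};
  }
  return {
    url: typeof parsed.url === 'string' ? parsed.url : '',
    action: ALLOWED_ACTIONS.includes(parsed.action) ? parsed.action : 'none',
    selector: typeof parsed.selector === 'string' ? parsed.selector : '',
    inputText: typeof parsed.inputText === 'string' ? parsed.inputText : '',
  };
}
```

You would call normalizeInstructions on the parsed LLM output in both branches before invoking handleBrowserActions.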

(Est. 30 min) to write, test, and debug this route.


Step 3: Frontend UI (Est. 15 min)

If you used a Vercel Template with a default homepage, you can replace or edit that page. For Next.js 13:

"use client";
import { useState } from "react";

export default function Home() {
  const [userPrompt, setUserPrompt] = useState("");
  const [agentOutput, setAgentOutput] = useState("");
  const [useReplicate, setUseReplicate] = useState(false);

  async function handleSubmit() {
    setAgentOutput("Loading...");
    const res = await fetch("/api/agent", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ userPrompt, useReplicate }),
    });
    const data = await res.json();
    if (data.success) {
      setAgentOutput(
        `Action done!\n\nInstructions: ${JSON.stringify(
          data.instructions
        )}\nResult: ${data.result}`
      );
    } else {
      setAgentOutput(`Error: ${data.error}`);
    }
  }

  return (
    <main style={{ padding: 20 }}>
      <h1>Browser Agent Demo</h1>
      <p>Try: "Go to https://example.com and scrapeText at 'h1'."</p>
      <input
        style={{ width: 300 }}
        value={userPrompt}
        onChange={(e) => setUserPrompt(e.target.value)}
      />
      <div style={{ marginTop: 10 }}>
        <label>
          <input
            type="checkbox"
            checked={useReplicate}
            onChange={(e) => setUseReplicate(e.target.checked)}
          />
          Use Llama-2-7b-chat on Replicate
        </label>
      </div>
      <button style={{ marginTop: 10 }} onClick={handleSubmit}>Run</button>
      <pre style={{ marginTop: 20 }}>{agentOutput}</pre>
    </main>
  );
}

(Est. 15 min) to build a simple input form, fetch the API, and display results.


Step 4: Deploy to Vercel (Est. 15 min)

  1. Push to GitHub (or GitLab).
  2. Create a New Project on Vercel (if you didn’t do so at the template stage).
  3. In Project Settings → Environment Variables, add:
    • ANTHROPIC_API_KEY or OPENAI_API_KEY
    • If using Llama-2-7b-chat on Replicate, add REPLICATE_API_TOKEN.
  4. Deploy.

You’ll get a production URL like https://browser-agent.vercel.app.

(Est. 15 min) to finalize environment variables and deployment.


Step 5: Example Use Case — Get Headline from Hacker News

Now that your Browser Agent is running, let’s try a real-world example. We’ll scrape the latest headline from the front page of Hacker News.

  1. Go to Your Deployed Site

    • e.g., https://browser-agent.vercel.app
  2. Enter a Prompt:

   Go to https://news.ycombinator.com and scrapeText at '.titleline a'

This instructs the LLM to generate JSON instructions like:

   {
     "url": "https://news.ycombinator.com",
     "action": "scrapeText",
     "selector": ".titleline a"
   }
  3. Agent Executes

    • The server route interprets the instructions, launches a remote browser, navigates to Hacker News, and scrapes .titleline a.
  4. Response

    • The JSON returned might look like:
     {
       "success": true,
       "instructions": {
         "url": "https://news.ycombinator.com",
         "action": "scrapeText",
         "selector": ".titleline a"
       },
       "result": "Example Headline from HN"
     }
    
  • Your UI shows “Action done!” plus the scraped headline.

More Potential Browser Agent Use Cases

Once you have a Browser Agent, you can easily add new tasks. For example:

  1. Auto-Form Filling

    “Go to example.com/login, fillForm with username: myUser, password: myPass, then click on ‘#submitBtn’.”

  2. Price Comparison

    Scrape prices from multiple e-commerce sites to find the best deal, then combine the results.

  3. Content Extraction

    Collect blog post titles, meta descriptions, or images across a list of websites.

  4. Email Reading (in Webmail)

    Navigate to Gmail/Outlook web UI, log in, parse unread messages, maybe even respond.

  5. Automated Testing

    Provide a test scenario, e.g., “Go to my staging URL, fill the form with test data, confirm success message.”
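Most of these use cases chain several browser actions together. One way to sketch that is a sequential runner; runSequence and the performAction callback are hypothetical names, and in the route above performAction would be a variant of handleBrowserActions that reuses a single browser session instead of launching a new one per step:

```javascript
// Hypothetical multi-step runner: execute instruction objects in order,
// collecting each step's result, so e.g. price comparison can visit
// several sites and combine what it finds.
async function runSequence(instructionsList, performAction) {
  const results = [];
  for (const instructions of instructionsList) {
    // Steps run strictly in order, since later actions (like clicking
    // a submit button) often depend on earlier ones (like filling a form).
    results.push(await performAction(instructions));
  }
  return results;
}
```

The LLM prompt would then ask for an array of instruction objects rather than a single one.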


Time Summary & Final Notes

  • Initial Setup: (Est. 15 min)
  • API Route: (Est. 30 min)
  • Front End: (Est. 15 min)
  • Deployment: (Est. 15 min)

Total: ~1.5 hours for a basic functional version!

Congrats! You now have a Browser Agent that can interpret user prompts, run tasks in a headless browser, and return results—all from a Next.js front end deployed on Vercel. You’ve even tested scraping headlines from Hacker News and seen how easy it is to integrate Llama-2-7b-chat on Replicate. Next, explore multi-step flows, advanced UI, or a different use case to take your agent even further.
