DEV Community

Cover image for Natural Language Browser Automation with Amazon Bedrock & Playwright
Naoki Ishihara
Naoki Ishihara

Posted on

Natural Language Browser Automation with Amazon Bedrock & Playwright

Nova-click lets you control your browser with plain text commands. It's built with Amazon Nova (via Amazon Bedrock) and Playwright. Check it out on GitHub: nova-click.

TL;DR ⏩

Aspect Key Takeaway
What can it do? Convert plain text commands into browser actions.
Dependencies? Only boto3 and Playwright required.
Model? Amazon Nova (via Amazon Bedrock)
Where to try? See the source on GitHub: https://github.com/Naoki0513/nova-click
Who is it for? Ideal for developers and researchers interested in browser automation.

Demo 🎬

Image description

Prompt: "Search for the most popular waterproof Bluetooth speaker under $50 on Amazon and add it to the cart.

View the high-quality version here.

Quick Start 🚀

# 1. Clone the repository
git clone https://github.com/Naoki0513/nova-click.git
cd nova-click

# 2. Install dependencies
pip install boto3==1.38.13 playwright==1.40.0
python -m playwright install chromium

# 3. Set up AWS credentials
mkdir credentials
vim credentials/aws_credentials.json   # Save with the format below
Enter fullscreen mode Exit fullscreen mode
{
  "aws_access_key_id": "AKIA…",
  "aws_secret_access_key": "xxxxxxxx",
  "region_name": "us-west-2"
}
Enter fullscreen mode Exit fullscreen mode
# 4. Launch
python main.py
Enter fullscreen mode Exit fullscreen mode

Modify the constants in main.py to test different prompts and models.

Motivation 💡

"Imagine AI managing all your browser tasks—that would change the game."

Automating repetitive browser tasks lets users focus on creativity and impact—this idea inspired nova-click.

How It Works ⚙️

nova-click loops through three steps

  • Page ARIA tree snapshot as JSON
  • Amazon Bedrock decision (click or type)
  • Playwright action execution

Repeat until your task is complete.

Why Amazon Nova? 💎

Amazon Nova, running on Amazon Bedrock, delivers cost-efficiency, speed, and large context support. Here's how it compares:

Model Input Cost ($/M tokens) Output Cost ($/M tokens) Context Window Speed
Amazon Nova Pro 0.80 3.20 Up to 300K tokens
GPT-4o 2.50 10.00 Up to 128K tokens
Claude 3.7 Sonnet 3.00 15.00 Up to 200K tokens*

FAQ ❓

Q1. How does nova-click differ from Nova Act?

Nova Act (https://nova.amazon.com/act) is Amazon's dedicated browser-automation model, while nova-click is a lightweight framework that works with any general-purpose LLM.

Q2. How reliable is it?

nova-click is an early research prototype and can be flaky—expect occasional missteps on complex pages, so review and tweak its actions as needed.

Q3. Does it use Playwright MCP?

No—nova-click uses plain Playwright in Python, though it draws inspiration from MCP.

Get Started 🎉

  1. Clone and run python main.py.
  2. Describe your workflow in plain English.
  3. Stars, issues, and PRs welcome.

Join us in exploring LLM-driven browser automation with nova-click.

Top comments (0)