Mohammed Anes

Posted on Mar 22

Amazon Nova Act Deep Dive — Perceive, Act, Deploy: How AWS Built a 90%+ Reliable Browser Agent

#aws #ai #automation #python

Raise your hand if this has happened to you:

You write a Selenium script. It works on Friday. On Monday, the site changed a button class, and it's broken.

You switch to Playwright. Better. But the moment a cookie banner pops up at the wrong time, your script halts, completely lost.

This is the core problem with browser automation: it's rule-based. You're telling it exactly what to click — not what you want to accomplish.

AI agents were supposed to fix this. But the first generation of LLM-powered browser bots had a different problem: give a general LLM one big instruction like "book me the cheapest flight to Delhi", and it would hallucinate steps, lose context midway, or confidently click the wrong thing with zero awareness of failure.

Benchmarks showed state-of-the-art models hitting only 30–60% accuracy on real browser tasks.

Amazon Nova Act was built specifically to close this gap — and it reports over 90% reliability at scale.

Here's the full architecture breakdown.

What Is Amazon Nova Act?

Nova Act is an AWS service for building and managing fleets of reliable AI agents that automate browser-based UI workflows. Introduced by Amazon AGI Labs in early 2025 as a research preview, it moved to general availability with full AWS integrations: IAM, S3, CloudWatch, and Bedrock AgentCore.

The one-line summary: Nova Act is what you get when you train a model specifically and exclusively for browser automation instead of bolting an LLM onto existing tools.

What makes it different:

Feature	Traditional Automation	Nova Act
Element targeting	CSS selectors / XPath	Natural language + vision
Breaks on UI change?	Yes, constantly	Rarely — it sees the page
Reliability	Brittle scripts; earlier LLM agents reached 30–60% on complex tasks	90%+ (reported)
Infrastructure	You manage it	AgentCore handles it
Human escalation	Manual alert setup	Built-in, first-class

The key insight behind the reliability number: vertical integration. Most agentic frameworks take a general model and attach browser tools. With Nova Act, the model, orchestrator, and browser actuator were co-trained end-to-end. That's what moves the needle from roughly 50% to 90%.

The Core Loop: Perceive → Reason → Act

Every Nova Act workflow runs on one repeating cycle:

📸 Screenshot  →  🧠 Nova 2 Lite Model  →  ⚡ Action  →  📸 Screenshot again

Step 1 — Perceive: A screenshot of the current browser state is taken and passed to the model along with your natural language instruction. No DOM parsing. No HTML inspection. It sees the page visually, the same way a human does.

Step 2 — Reason: Amazon Nova 2 Lite (the custom foundation model powering Nova Act) produces a low-level action plan: click at (x, y), type "Chennai", scroll down 400px, press Enter. The model is trained with reinforcement learning on in-domain browser data, so it predicts the next correct browser action, not just the next token.

Step 3 — Act: The action is executed via Playwright under the hood. Nova Act sits on top of Playwright, translating natural language to precise Playwright commands automatically.

Then a new screenshot is taken and the loop repeats — until the task is done, or until the agent decides it needs a human.
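To make the cycle concrete, here is a minimal sketch of the loop in plain Python. It is not Nova Act's internal code: the `Action` type, the `run_loop` name, and the three callables (`perceive`, `reason`, `act`) are hypothetical stand-ins for the screenshot step, the model, and the browser actuator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    kind: str      # hypothetical kinds: "click", "type", "scroll", "press", "done"
    payload: dict  # arguments for the action, e.g. {"x": 120, "y": 340}


def run_loop(
    perceive: Callable[[], bytes],
    reason: Callable[[bytes, str], Action],
    act: Callable[[Action], None],
    instruction: str,
    max_steps: int = 20,
) -> int:
    """Repeat perceive -> reason -> act until the model reports 'done'.

    Returns the number of actions executed. Raises RuntimeError when the
    step budget runs out, so the caller can hand over to a human.
    """
    for step in range(max_steps):
        screenshot = perceive()                   # 1. look at the page
        action = reason(screenshot, instruction)  # 2. decide the next action
        if action.kind == "done":
            return step
        act(action)                               # 3. execute it, then look again
    raise RuntimeError(f"no 'done' after {max_steps} steps")
```

The step budget matters: without it, a confused model could loop forever instead of failing in a way you can catch.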

The Most Important Pattern: Atomic `act()` Calls

This is the single design decision that separates Nova Act from everything else. And it's deceptively simple:

Don't give one big instruction. Give many small, precise ones.

❌ This approach has ~50% success:

nova.act("book me the cheapest flight from Chennai to Delhi next Friday")

✅ This approach gets 90%+:

nova.act("go to makemytrip.com")
nova.act("click on 'Flights'")
nova.act("set origin to Chennai")
nova.act("set destination to Delhi")
nova.act("set date to next Friday")
nova.act("click Search")
nova.act("sort results by price")
nova.act("click the cheapest result")

Each act() call:

Takes a fresh screenshot of the current state
Executes exactly one clearly scoped action
Returns a result object you can inspect, assert, and branch on in Python

Because you're writing Python around these calls, you get conditionals, loops, retries, and error handling for free. It's not a black box — it's a library.

from nova_act import NovaAct

with NovaAct(starting_page="https://www.amazon.in") as nova:
    nova.act("search for a noise cancelling headphone under 5000 rupees")
    nova.act("select the first result")

    # Ask a question and constrain the answer to a JSON schema.
    result = nova.act(
        "what is the price shown on this page?",
        schema={"type": "object", "properties": {"price": {"type": "string"}}}
    )
    print(result.parsed_response)  # e.g. {"price": "₹3,499"}

    # parsed_response can be empty if the model could not answer, so guard before indexing.
    if result.parsed_response and result.parsed_response.get("price"):
        nova.act("click Add to Cart")

Notice the schema parameter — that's how you extract structured data from any page. No CSS selectors. No XPath. Just natural language + a JSON schema, and Nova Act fills it in from what it sees on screen.
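Since every step is an ordinary Python call, the retries mentioned earlier take only a few lines. The sketch below is one way to do it, not part of the SDK: `run_steps` is a hypothetical helper, and `act` is any callable that takes an instruction string (for example `nova.act`).

```python
import time
from typing import Callable, Iterable


def run_steps(
    act: Callable[[str], object],
    steps: Iterable[str],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
) -> list:
    """Run each instruction in order, retrying a failed step a few times."""
    results = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                results.append(act(step))
                break  # step succeeded, move on to the next one
            except Exception:  # in real code, narrow this to the SDK's error type
                if attempt == max_attempts:
                    raise  # out of attempts: surface the failure to the caller
                time.sleep(backoff_seconds * attempt)  # linear backoff
    return results
```

You would call it as `run_steps(nova.act, ["click on 'Flights'", "set origin to Chennai"])`. Retrying one small step is cheap; retrying one giant instruction means starting the whole task over.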

Structured Outputs: Turn Any Website Into an API

This feature deserves its own callout because it's genuinely powerful.

result = nova.act(
    "what are the top 3 products on this page?",
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name":   {"type": "string"},
                        "price":  {"type": "string"},
                        "rating": {"type": "string"}
                    }
                }
            }
        }
    }
)

products = result.parsed_response["products"]
# [{"name": "...", "price": "...", "rating": "..."}, ...]

Any website — even one with zero public API — becomes a structured data source. Combine with ThreadPoolExecutor and you're running 10 concurrent extractions at once.
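Extracted values arrive as strings, so you will usually want a small normalisation layer before treating the result like API data. Here is a sketch under my own assumptions: `Product`, `parse_price`, and `to_products` are hypothetical names, and the input is the dict shape requested by the schema above.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Product:
    name: str
    price: Optional[Decimal]  # None when no number could be read
    rating: str


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Pull the first number out of a string such as '₹3,499' or '$12.50'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text or "")
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def to_products(parsed_response: Optional[dict]) -> list[Product]:
    """Convert the raw dict into typed records, skipping malformed rows."""
    rows = (parsed_response or {}).get("products", [])
    products = []
    for row in rows:
        if not isinstance(row, dict) or "name" not in row:
            continue  # the model returned something unusable for this row
        products.append(Product(
            name=row["name"],
            price=parse_price(row.get("price")),
            rating=row.get("rating", ""),
        ))
    return products
```

With this in place, `to_products(result.parsed_response)` gives you records you can sort and compare, and a missing or odd field degrades to `None` instead of raising halfway through a batch.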

Running Workflows in Parallel

Nova Act is designed for fleet-scale. Here's the concurrency pattern:

from concurrent.futures import ThreadPoolExecutor, as_completed
from nova_act import NovaAct, ActError

def process_vendor(vendor_url: str):
    with NovaAct(starting_page=vendor_url) as nova:
        nova.act("log in using saved credentials")
        nova.act("navigate to pending invoices")
        return nova.act(
            "extract all invoices",
            schema=invoice_schema
        ).parsed_response

vendor_urls = [
    "https://vendor1.com",
    "https://vendor2.com",
    "https://vendor3.com",
]

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = {executor.submit(process_vendor, url): url for url in vendor_urls}
    for future in as_completed(futures):
        try:
            data = future.result()
            save_to_s3(data)
        except ActError as e:
            print(f"Failed: {futures[future]} → {e}")

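Up to ten browser sessions run in parallel (max_workers=10; this example submits three), each isolated in its own NovaAct context and each logged. Note that invoice_schema and save_to_s3 are placeholders you define yourself.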
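The snippet above leaves `invoice_schema` and `save_to_s3` undefined. Here is one possible version of each, as a sketch: the schema field names, the bucket name, and the key layout are all made up for illustration. The only real API used is boto3's `put_object` on an S3 client.

```python
import json
import uuid

# Hypothetical schema: these field names are placeholders, not from a real portal.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoices": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "amount": {"type": "string"},
                    "due_date": {"type": "string"},
                },
                "required": ["invoice_number"],
            },
        }
    },
    "required": ["invoices"],
}


def save_to_s3(data, bucket="example-invoice-bucket", prefix="invoices", s3_client=None):
    """Write one extraction result to S3 as JSON and return the object key."""
    if s3_client is None:
        import boto3  # imported lazily so this module loads without AWS set up
        s3_client = boto3.client("s3")
    key = f"{prefix}/{uuid.uuid4().hex}.json"  # random key avoids overwrites
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(data, ensure_ascii=False).encode("utf-8"),
        ContentType="application/json",
    )
    return key
```

Passing `s3_client` explicitly is a small design choice that pays off in tests: you can hand in a fake client and check what would have been uploaded without touching AWS.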

The Full AWS Stack

Nova Act doesn't run in a silo. Here's how it slots into AWS:

┌─────────────────────────────────────────────────┐
│              Developer Tools                    │
│  Web Playground · IDE Extension · Python SDK   │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│           Nova Act Engine                       │
│  Amazon Nova 2 Lite · Orchestrator · Playwright │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│       Amazon Bedrock AgentCore                  │
│  Runtime · Browser Tool · Session Isolation     │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│            AWS Infrastructure                   │
│    ECR · IAM · S3 · CloudWatch · CloudTrail     │
└─────────────────────────────────────────────────┘

AgentCore Browser Tool

The AgentCore Browser Tool provides a fully managed, cloud-based browser runtime:

Session isolation — every workflow gets its own containerized browser
Parallel execution at scale — no infrastructure to manage
Live viewing + session replay — debug in real time
CloudTrail logging — full audit trail for every browser action
Ephemeral containers — browser environment destroyed after each session (security by default)

Connecting to AgentCore Browser from code:

from bedrock_agentcore.tools.browser_client import browser_session
from nova_act import NovaAct

def run_cloud_agent(prompt: str, starting_page: str, nova_act_key: str):
    with browser_session(region="us-east-1") as client:
        ws_url, headers = client.generate_ws_headers()
        with NovaAct(
            cdp_endpoint_url=ws_url,
            cdp_headers=headers,
            nova_act_api_key=nova_act_key,
            starting_page=starting_page,
        ) as nova:
            return nova.act(prompt)

Your agent now runs in a sandboxed cloud browser — not on your laptop.

Deployment in One Command

The VS Code extension handles the entire deployment pipeline automatically:

Packages your workflow → Docker container
Pushes image → Amazon ECR
Creates → IAM roles + S3 buckets
Deploys → AgentCore Runtime

Zero manual infrastructure. One click in the IDE.

Cost Effectiveness: The Real Story

Traditional browser automation has hidden costs that never show up in your AWS bill:

Developer time maintaining brittle selectors (breaks every sprint)
Failed workflows causing downstream data errors
Re-running failed jobs, debugging silent failures

Nova Act's natural language instructions are resilient to UI changes. "Click the submit button" works whether it's labelled Submit, Send, or Confirm — blue or orange — left side or right side.

On infrastructure costs, AgentCore's pricing model is genuinely aligned with how agentic workloads behave:

✅ Consumption-based — pay per second of actual CPU/memory use
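✅ No CPU charge during I/O wait — agentic workloads spend 30–70% of their time waiting for pages to load or for LLM responses, and CPU is not billed while the agent sits idle in that wait. Memory is billed separately, so check the pricing page for the exact rules.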
✅ No reserved instances — no upfront commitment
✅ Free Tier — up to $200 credit for new AWS customers

Practical example: for a workflow that spends 60% of its time waiting on page loads and API calls, you pay for CPU during only 40% of wall-clock time. At scale, that compounds significantly.
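The arithmetic behind that example is easy to check. This sketch assumes only active CPU time is billed, ignores memory charges entirely, and uses a placeholder rate (`HYPOTHETICAL_RATE` is not a real AWS price).

```python
def billed_cpu_seconds(wall_clock_seconds: float, io_wait_fraction: float) -> float:
    """CPU seconds you pay for when time spent in I/O wait is not billed."""
    if not 0.0 <= io_wait_fraction <= 1.0:
        raise ValueError("io_wait_fraction must be between 0 and 1")
    return wall_clock_seconds * (1.0 - io_wait_fraction)


def cpu_cost(wall_clock_seconds: float, io_wait_fraction: float,
             vcpus: float, rate_per_vcpu_second: float) -> float:
    """CPU-only cost of one session; memory billing is deliberately left out."""
    active = billed_cpu_seconds(wall_clock_seconds, io_wait_fraction)
    return active * vcpus * rate_per_vcpu_second


# A 10-minute session on 1 vCPU that waits 60% of the time.
HYPOTHETICAL_RATE = 0.00002  # placeholder price per vCPU-second, not a real rate
active = billed_cpu_seconds(600, 0.6)
always_billed = cpu_cost(600, 0.0, 1, HYPOTHETICAL_RATE)
wait_is_free = cpu_cost(600, 0.6, 1, HYPOTHETICAL_RATE)
print(f"billed CPU seconds: {active:.0f} of 600")
print(f"CPU cost vs. always-billed: {wait_is_free / always_billed:.0%}")
```

The ratio is independent of the rate you plug in, which is why the 40% figure holds whatever the actual price is.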

Real-World Use Cases

🧪 QA and Automated Testing

Your test cases can now read like user stories:

with NovaAct(starting_page="https://yourapp.com") as nova:
    nova.act("log in as a standard user")
    nova.act("add one item to cart and go to checkout")
    nova.act("complete checkout using test card 4111111111111111")

    result = nova.act(
        "is an order confirmation visible?",
        schema={
            "type": "object",
            "properties": {
                "confirmed": {"type": "boolean"},
                "order_id":  {"type": "string"}
            }
        }
    )
    assert result.parsed_response["confirmed"] is True

No selector maintenance. No brittle IDs. Tyler Technologies (public sector software) converted manual test plans to automated suites in minutes using Nova Act — without a single CSS selector.

📋 ERP and Legacy System Automation

Many enterprise systems — legacy CRMs, ERP portals, government platforms — have no API. Nova Act handles them:

for record in crm_records:
    with NovaAct(starting_page="https://legacy-erp.company.com") as nova:
        nova.act("click New Contact")
        nova.act(f"enter '{record['name']}' in the Name field")
        nova.act(f"enter '{record['email']}' in the Email field")
        nova.act(f"select '{record['region']}' from the Region dropdown")
        nova.act("click Save")
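Because record values are interpolated straight into the instructions, it helps to validate them before opening a browser. This is a sketch of one approach: `contact_steps` and `REQUIRED_FIELDS` are hypothetical names, and the field list simply mirrors the loop above.

```python
REQUIRED_FIELDS = ("name", "email", "region")


def _clean(value) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(str(value).split()) if value is not None else ""


def contact_steps(record: dict) -> list[str]:
    """Build the instruction list for one record, or fail before any clicking."""
    values = {field: _clean(record.get(field)) for field in REQUIRED_FIELDS}
    missing = [field for field, value in values.items() if not value]
    if missing:
        raise ValueError(f"record is missing: {', '.join(missing)}")
    return [
        "click New Contact",
        f"enter '{values['name']}' in the Name field",
        f"enter '{values['email']}' in the Email field",
        f"select '{values['region']}' from the Region dropdown",
        "click Save",
    ]
```

Inside the loop you would then run `for step in contact_steps(record): nova.act(step)`. A bad record fails fast in Python instead of leaving a half-filled form in the ERP.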

🔍 Competitive Intelligence

Monitor competitor pricing or product listings without an API:

with NovaAct(starting_page="https://competitor.com/pricing") as nova:
    result = nova.act(
        "extract all pricing plans and their features",
        schema=pricing_schema
    )
    save_to_s3(result.parsed_response)
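Monitoring only becomes useful once you compare snapshots. Below is a sketch of a possible `pricing_schema` plus a diff helper. Both are assumptions of mine: the field names are invented, and `price_changes` is a hypothetical function that expects two results shaped like that schema.

```python
# Hypothetical schema: plan fields are illustrative, adjust to the page you track.
pricing_schema = {
    "type": "object",
    "properties": {
        "plans": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["name"],
            },
        }
    },
}


def price_changes(previous: dict, current: dict) -> dict:
    """Map plan name -> (old price, new price) for added, removed, or changed plans."""
    old = {plan["name"]: plan.get("price") for plan in previous.get("plans", [])}
    new = {plan["name"]: plan.get("price") for plan in current.get("plans", [])}
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        if old.get(name) != new.get(name):
            changes[name] = (old.get(name), new.get(name))  # None marks "absent"
    return changes
```

Run the extraction on a schedule, load yesterday's snapshot, and alert only when `price_changes` returns a non-empty dict.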

🏢 Vendor Portal Processing

When invoice processing and status checks are spread across dozens of vendor portals with no API integration, you can run them all concurrently, with automatic human escalation on exceptions.

Human-in-the-Loop: Built for Production Reality

Production agentic systems will hit edge cases. CAPTCHAs. Unexpected modals. Two-factor prompts. Nova Act handles this without you building custom alerting:

1. The workflow hits an ambiguous state.
2. Nova Act pauses and sends an SNS notification to a designated supervisor.
3. The supervisor receives a devtools URL, which lets them inspect and interact with the live browser state.
4. The supervisor takes corrective action.
5. The workflow resumes automatically.

This isn't bolted on — it's a core architectural feature. The design acknowledges that 90% reliability means 10% still needs a human. That 10% is handled gracefully.

The Developer Journey: Playground → Code → Cloud

nova.amazon.com/act     →    pip install nova-act    →    VS Code Extension    →    Deploy
   Explore in browser         Write Python workflow       Debug step-by-step      One click to AWS

Quick start:

pip install nova-act
export NOVA_ACT_API_KEY="your_api_key"

from nova_act import NovaAct

with NovaAct(starting_page="https://news.ycombinator.com") as nova:
    result = nova.act(
        "what are the top 5 post titles on this page?",
        schema={
            "type": "object",
            "properties": {
                "posts": {"type": "array", "items": {"type": "string"}}
            }
        }
    )
    print(result.parsed_response)

That's all it takes to extract structured data from any webpage.

For Builders in Restricted Regions

Nova Act is currently only available in US East (N. Virginia). If you're in India, Southeast Asia, or any other region without access — here's what to do right now:

1. Learn the atomic act() pattern today. This architecture pattern is what matters. When access opens up, your mental model is already in place.

2. Approximate it with Bedrock + Playwright. Use Amazon Bedrock's Claude models with tool use + Playwright for browser control. The perceive→reason→act loop is buildable today. You lose the purpose-built model advantage, but you learn the architecture.

3. Read the public GitHub. The aws/nova-act repo is open. The amazon-agi-labs/nova-act-samples repo has CDK examples for Lambda, ECS, and AgentCore. Reading these is free and region-independent.
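For option 2, the "act" half of the loop is the easy part. Here is a minimal sketch of a dispatcher that turns a model-chosen action into Playwright calls. The action dict shape (`kind`, `x`, `y`, `text`, `dy`, `key`) is my own assumption; the `page.mouse` and `page.keyboard` methods are from Playwright's Python sync API. The "reason" half, sending a screenshot to a Bedrock model and parsing its reply, is left out.

```python
def execute_action(page, action: dict) -> None:
    """Apply one action to a Playwright Page (sync API).

    Expected shapes (hypothetical, define your own contract with the model):
      {"kind": "click", "x": 120, "y": 340}
      {"kind": "type", "text": "Chennai"}
      {"kind": "scroll", "dy": 400}
      {"kind": "press", "key": "Enter"}
    """
    kind = action["kind"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])  # click at pixel coordinates
    elif kind == "type":
        page.keyboard.type(action["text"])          # type into the focused element
    elif kind == "scroll":
        page.mouse.wheel(0, action["dy"])           # vertical scroll in pixels
    elif kind == "press":
        page.keyboard.press(action["key"])          # single key, e.g. "Enter"
    else:
        raise ValueError(f"unsupported action: {kind!r}")
```

Pair it with `page.screenshot()` for the "perceive" step and you have the skeleton of the loop. Rejecting unknown action kinds keeps a hallucinated action from silently doing nothing.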

What's Coming

Amazon has been explicit about the roadmap:

MCP integration — Nova Act workflows calling external tools and APIs mid-session via Model Context Protocol
Strands Agents — Amazon's open-source multi-agent SDK already supports Nova Act as a sub-agent in larger pipelines
Beyond the browser — the same RL training methodology is being extended to more complex real-world task environments
More regions — no official dates, but global expansion is the clear direction

Browser automation is the beachhead. The broader target is any workflow that requires perception, reasoning, and action in a UI environment.

Key Takeaways

Six things to remember from this article:

Vertical integration is why it's reliable — model + orchestrator + actuator trained together

Atomic act() calls are the pattern — small precise steps beat one big instruction

Vision > DOM selectors — screenshots don't break when UI classes change

AgentCore pricing — you don't pay for CPU during the 30–70% of time your agent spends waiting

Human-in-the-loop is first-class — production agents hit edge cases; it's designed for that

One-command deployment — Docker, ECR, IAM, AgentCore Runtime handled automatically

Resources

🏠 Product page → aws.amazon.com/nova/act
📖 Official docs → docs.aws.amazon.com/nova-act/latest/userguide
💻 GitHub SDK → github.com/aws/nova-act
🧪 Sample code + CDK → github.com/amazon-agi-labs/nova-act-samples
💰 AgentCore pricing → aws.amazon.com/bedrock/agentcore/pricing
🎮 Web playground → nova.amazon.com/act

Found this useful? Drop a ❤️, share with your AWS friends, and follow for more deep dives on agentic AI on AWS. I'll be posting the "Build it yourself with Bedrock + Playwright" version next — a practical guide for restricted-region builders who want to implement the same perceive→act loop today.