Raise your hand if this has happened to you:
You write a Selenium script. It works on Friday. On Monday, the site changed a button class, and it's broken.
You switch to Playwright. Better. But the moment a cookie banner pops up at the wrong time, your script halts, completely lost.
This is the core problem with browser automation: it's rule-based. You're telling it exactly what to click — not what you want to accomplish.
AI agents were supposed to fix this. But the first generation of LLM-powered browser bots had a different problem: give a general LLM one big instruction like "book me the cheapest flight to Delhi", and it would hallucinate steps, lose context midway, or confidently click the wrong thing with zero awareness of failure.
Benchmarks showed state-of-the-art models hitting only 30–60% accuracy on real browser tasks.
Amazon Nova Act was built specifically to close this gap — and it reports over 90% reliability at scale.
Here's the full architecture breakdown.
What Is Amazon Nova Act?
Nova Act is an AWS service for building and managing fleets of reliable AI agents that automate browser-based UI workflows. Introduced by Amazon AGI Labs in early 2025 as a research preview, it moved to general availability with full AWS integrations: IAM, S3, CloudWatch, and Bedrock AgentCore.
The one-line summary: Nova Act is what you get when you train a model specifically and exclusively for browser automation — not bolt an LLM onto existing tools.
What makes it different:
| Feature | Traditional Automation | Nova Act |
|---|---|---|
| Element targeting | CSS selectors / XPath | Natural language + vision |
| Breaks on UI change? | Yes, constantly | Rarely — it sees the page |
| Reliability (complex tasks) | Brittle; general LLM agents score 30–60% | 90%+ (reported) |
| Infrastructure | You manage it | AgentCore handles it |
| Human escalation | Manual alert setup | Built-in, first-class |
The key insight behind the reliability number is vertical integration. Most agentic frameworks take a general model and attach browser tools. Nova Act's model, orchestrator, and browser actuator were trained together end-to-end. That is what moves the needle from roughly 50% to over 90%.
The Core Loop: Perceive → Reason → Act
Every Nova Act workflow runs on one repeating cycle:
📸 Screenshot → 🧠 Nova 2 Lite Model → ⚡ Action → 📸 Screenshot again
Step 1 — Perceive: A screenshot of the current browser state is taken and passed to the model along with your natural language instruction. No DOM parsing. No HTML inspection. It sees the page visually, the same way a human does.
Step 2 — Reason: Amazon Nova 2 Lite (the custom foundation model powering Nova Act) produces a low-level action plan: click at (x, y), type "Chennai", scroll down 400px, press Enter. Because it was trained with reinforcement learning on in-domain browser data, it predicts the next correct browser action, not just the next token.
Step 3 — Act: The action is executed via Playwright under the hood. Nova Act sits on top of Playwright, translating natural language to precise Playwright commands automatically.
Then a new screenshot is taken and the loop repeats — until the task is done, or until the agent decides it needs a human.
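The cycle above can be sketched in a few lines of plain Python. This is only an illustration of the control flow, not Nova Act's internals: `take_screenshot`, `decide`, and `execute` are hypothetical callables standing in for the browser, the model, and the actuator, and the `Action` shape is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    """One low-level browser action (hypothetical shape, for illustration only)."""
    kind: str       # e.g. "click", "type", "scroll", "done", "ask_human"
    payload: dict


def run_loop(
    instruction: str,
    take_screenshot: Callable[[], bytes],
    decide: Callable[[str, bytes], Action],
    execute: Callable[[Action], None],
    max_steps: int = 20,
) -> str:
    """Repeat perceive -> reason -> act until the model stops or steps run out."""
    for _ in range(max_steps):
        screenshot = take_screenshot()              # perceive
        action = decide(instruction, screenshot)    # reason
        if action.kind in ("done", "ask_human"):
            return action.kind                      # finished, or needs a person
        execute(action)                             # act
    return "max_steps_exceeded"
```

The `max_steps` cap is a design choice worth keeping in any loop like this: it guarantees the agent stops even if the model never reports completion.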
The Most Important Pattern: Atomic act() Calls
This is the single design decision that separates Nova Act from everything else. And it's deceptively simple:
Don't give one big instruction. Give many small, precise ones.
❌ This approach has ~50% success:
nova.act("book me the cheapest flight from Chennai to Delhi next Friday")
✅ This approach gets 90%+:
nova.act("go to makemytrip.com")
nova.act("click on 'Flights'")
nova.act("set origin to Chennai")
nova.act("set destination to Delhi")
nova.act("set date to next Friday")
nova.act("click Search")
nova.act("sort results by price")
nova.act("click the cheapest result")
Each act() call:
- Takes a fresh screenshot of the current state
- Executes exactly one clearly scoped action
- Returns a result object you can inspect, assert, and branch on in Python
Because you're writing Python around these calls, you get conditionals, loops, retries, and error handling for free. It's not a black box — it's a library.
from nova_act import NovaAct

with NovaAct(starting_page="https://www.amazon.com") as nova:
    nova.act("search for a noise cancelling headphone under 5000 rupees")
    nova.act("select the first result")
    result = nova.act(
        "what is the price shown on this page?",
        schema={"type": "object", "properties": {"price": {"type": "string"}}}
    )
    print(result.parsed_response)  # {"price": "₹3,499"}
    if result.parsed_response["price"]:
        nova.act("click Add to Cart")
Notice the schema parameter — that's how you extract structured data from any page. No CSS selectors. No XPath. Just natural language + a JSON schema, and Nova Act fills it in from what it sees on screen.
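Since each step is an ordinary Python call, retries are a small wrapper away. The sketch below is one possible shape, not part of the SDK: `act_with_retry` and `run_steps` are hypothetical helper names, and `act` is any callable that takes a prompt (in real use you would pass `nova.act`).

```python
import time
from typing import Any, Callable, Sequence, Tuple, Type


def act_with_retry(
    act: Callable[[str], Any],
    prompt: str,
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Call act(prompt), retrying up to `attempts` times on the given exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return act(prompt)
        except retry_on:
            if attempt == attempts:
                raise                       # out of retries: surface the error
            time.sleep(delay_seconds)       # brief pause before trying again


def run_steps(act: Callable[[str], Any], steps: Sequence[str], **retry_kwargs: Any) -> list:
    """Run atomic steps in order; a step that keeps failing stops the whole run."""
    return [act_with_retry(act, step, **retry_kwargs) for step in steps]
```

With the SDK, a call might look like `run_steps(nova.act, ["click on 'Flights'", "set origin to Chennai"], retry_on=(ActError,))`, which narrows retries to agent failures instead of every exception.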
Structured Outputs: Turn Any Website Into an API
This feature deserves its own callout because it's genuinely powerful.
result = nova.act(
    "what are the top 3 products on this page?",
    schema={
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"},
                        "rating": {"type": "string"}
                    }
                }
            }
        }
    }
)
products = result.parsed_response["products"]
# [{"name": "...", "price": "...", "rating": "..."}, ...]
Any website — even one with zero public API — becomes a structured data source. Combine with ThreadPoolExecutor and you're running 10 concurrent extractions at once.
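Extracted values arrive as strings read off the screen, so it is usually worth checking their shape before passing them downstream. Here is a minimal standard-library sketch of that step. `parse_price` and `clean_products` are hypothetical helpers, and the price formats they handle (for example "₹3,499") are an assumption based on the example output earlier in this article.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: str) -> Optional[Decimal]:
    """Pull the first number out of a price string such as '₹3,499' or '$12.50'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def clean_products(parsed: dict) -> list:
    """Keep only well-formed product entries and attach a numeric price."""
    cleaned = []
    for item in parsed.get("products", []):
        if not isinstance(item, dict) or not item.get("name"):
            continue                                   # skip malformed rows
        price = parse_price(str(item.get("price", "")))
        cleaned.append({**item, "price_value": price})
    return cleaned
```

Calling `clean_products(result.parsed_response)` keeps the original string fields and adds `price_value`, which is `None` whenever no number could be read.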
Running Workflows in Parallel
Nova Act is designed for fleet-scale. Here's the concurrency pattern:
from concurrent.futures import ThreadPoolExecutor, as_completed
from nova_act import NovaAct, ActError

# invoice_schema (a JSON schema dict) and save_to_s3 are assumed to be defined elsewhere.

def process_vendor(vendor_url: str):
    # Each call opens its own browser session, so workers stay isolated.
    with NovaAct(starting_page=vendor_url) as nova:
        nova.act("log in using saved credentials")
        nova.act("navigate to pending invoices")
        return nova.act(
            "extract all invoices",
            schema=invoice_schema
        ).parsed_response

vendor_urls = [
    "https://vendor1.com",
    "https://vendor2.com",
    "https://vendor3.com",
]

with ThreadPoolExecutor(max_workers=10) as executor:
    # Map each future back to its URL so failures can be reported by vendor.
    futures = {executor.submit(process_vendor, url): url for url in vendor_urls}
    for future in as_completed(futures):
        try:
            data = future.result()
            save_to_s3(data)
        except ActError as e:
            print(f"Failed: {futures[future]} → {e}")
Up to ten browser sessions run in parallel, each isolated and each logged, all managed through AWS.
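If you want successes and failures kept apart per vendor rather than printed as they happen, the same pattern can be wrapped in a small helper. This is a sketch under the assumption that the worker is any function taking a URL, such as `process_vendor` above; `run_all` is a hypothetical name, not an SDK function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def run_all(
    worker: Callable[[str], Any],
    urls: Iterable[str],
    max_workers: int = 10,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run worker(url) concurrently; return (results by url, errors by url)."""
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:     # one failed session must not stop the rest
                errors[url] = exc
    return results, errors
```

The `errors` dictionary then gives you a ready-made list of vendors to retry or hand to a person.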
The Full AWS Stack
Nova Act doesn't run in a silo. Here's how it slots into AWS:
┌─────────────────────────────────────────────────┐
│ Developer Tools │
│ Web Playground · IDE Extension · Python SDK │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Nova Act Engine │
│ Amazon Nova 2 Lite · Orchestrator · Playwright │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ Amazon Bedrock AgentCore │
│ Runtime · Browser Tool · Session Isolation │
└────────────────────┬────────────────────────────┘
│
┌────────────────────▼────────────────────────────┐
│ AWS Infrastructure │
│ ECR · IAM · S3 · CloudWatch · CloudTrail │
└─────────────────────────────────────────────────┘
AgentCore Browser Tool
The AgentCore Browser Tool provides a fully managed, cloud-based browser runtime:
- Session isolation — every workflow gets its own containerized browser
- Parallel execution at scale — no infrastructure to manage
- Live viewing + session replay — debug in real time
- CloudTrail logging — full audit trail for every browser action
- Ephemeral containers — browser environment destroyed after each session (security by default)
Connecting to AgentCore Browser from code:
from bedrock_agentcore.tools.browser_client import browser_session
from nova_act import NovaAct

def run_cloud_agent(prompt: str, starting_page: str, nova_act_key: str):
    # Start a managed browser session in AgentCore and fetch its CDP endpoint.
    with browser_session(region="us-east-1") as client:
        ws_url, headers = client.generate_ws_headers()
        # Point Nova Act at the remote browser instead of launching a local one.
        with NovaAct(
            cdp_endpoint_url=ws_url,
            cdp_headers=headers,
            nova_act_api_key=nova_act_key,
            starting_page=starting_page,
        ) as nova:
            return nova.act(prompt)
Your agent now runs in a sandboxed cloud browser — not on your laptop.
Deployment in one click
The VS Code extension handles the entire deployment pipeline automatically:
- Packages your workflow → Docker container
- Pushes image → Amazon ECR
- Creates → IAM roles + S3 buckets
- Deploys → AgentCore Runtime
Zero manual infrastructure. One click in the IDE.
Cost Effectiveness: The Real Story
Traditional browser automation has hidden costs that never show up in your AWS bill:
- Developer time maintaining brittle selectors (breaks every sprint)
- Failed workflows causing downstream data errors
- Re-running failed jobs, debugging silent failures
Nova Act's natural language instructions are resilient to UI changes. "Click the submit button" works whether it's labelled Submit, Send, or Confirm — blue or orange — left side or right side.
On infrastructure costs, AgentCore's pricing model is genuinely aligned with how agentic workloads behave:
- ✅ Consumption-based — pay per second of actual CPU/memory use
- ✅ No CPU charge during I/O wait — agentic workloads spend 30–70% of their time waiting for pages to load or for LLM responses, and CPU is not billed while the session sits idle.
- ✅ No reserved instances — no upfront commitment
- ✅ Free Tier — up to $200 credit for new AWS customers
Practical example: for a workflow that spends 60% of its time waiting (page loads, API calls), CPU is billed for only about 40% of wall-clock time. Memory is metered separately, so check the pricing page for the full picture. At scale, the saving compounds significantly.
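The arithmetic behind that example is simple enough to write down. The sketch below assumes CPU time is billed only while the session is actively working. It deliberately leaves out actual rates and memory charges, which should come from the AgentCore pricing page, and `billable_cpu_seconds` is a hypothetical helper name.

```python
def billable_cpu_seconds(wall_clock_seconds: float, wait_fraction: float) -> float:
    """Seconds of CPU time billed, assuming I/O wait is not charged."""
    if not 0.0 <= wait_fraction <= 1.0:
        raise ValueError("wait_fraction must be between 0 and 1")
    return wall_clock_seconds * (1.0 - wait_fraction)


# Example: a 10-minute session that waits on page loads 60% of the time.
active = billable_cpu_seconds(600, 0.60)
print(f"billed CPU time: {active:.0f}s of 600s")
```

For the 10-minute session in the example, that works out to about 240 billed seconds, or 40% of wall-clock time.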
Real-World Use Cases
🧪 QA and Automated Testing
Your test cases can now read like user stories:
with NovaAct(starting_page="https://yourapp.com") as nova:
    nova.act("log in as a standard user")
    nova.act("add one item to cart and go to checkout")
    nova.act("complete checkout using test card 4111111111111111")
    result = nova.act(
        "is an order confirmation visible?",
        schema={
            "type": "object",
            "properties": {
                "confirmed": {"type": "boolean"},
                "order_id": {"type": "string"}
            }
        }
    )
    assert result.parsed_response["confirmed"] is True
No selector maintenance. No brittle IDs. Tyler Technologies (public sector software) converted manual test plans to automated suites in minutes using Nova Act — without a single CSS selector.
📋 ERP and Legacy System Automation
Many enterprise systems — legacy CRMs, ERP portals, government platforms — have no API. Nova Act handles them:
for record in crm_records:
    with NovaAct(starting_page="https://legacy-erp.company.com") as nova:
        nova.act("click New Contact")
        nova.act(f"enter '{record['name']}' in the Name field")
        nova.act(f"enter '{record['email']}' in the Email field")
        nova.act(f"select '{record['region']}' from the Region dropdown")
        nova.act("click Save")
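One detail to watch in the loop above: record values are pasted into prompts between single quotes, so a name like O'Brien breaks the quoting. A possible workaround, sketched below, is to quote values with `json.dumps` and build the step list in one place. `quote` and `contact_steps` are hypothetical helpers, and whether escaped quoting reads better to the model than plain quoting is an assumption you should test on your own pages.

```python
import json
from typing import Dict, List


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping any double quotes it contains."""
    return json.dumps(value, ensure_ascii=False)


def contact_steps(record: Dict[str, str]) -> List[str]:
    """Build the atomic prompts for one record (fields mirror the loop above)."""
    return [
        "click New Contact",
        f"enter {quote(record['name'])} in the Name field",
        f"enter {quote(record['email'])} in the Email field",
        f"select {quote(record['region'])} from the Region dropdown",
        "click Save",
    ]
```

Inside the `with NovaAct(...)` block, the body then reduces to `for step in contact_steps(record): nova.act(step)`.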
🔍 Competitive Intelligence
Monitor competitor pricing or product listings without an API:
# pricing_schema (a JSON schema dict) and save_to_s3 are assumed to be defined elsewhere.
with NovaAct(starting_page="https://competitor.com/pricing") as nova:
    result = nova.act(
        "extract all pricing plans and their features",
        schema=pricing_schema
    )
    save_to_s3(result.parsed_response)
🏢 Vendor Portal Processing
When invoice processing and status checks are spread across dozens of vendor portals with no API integration, run them all concurrently, with automatic human escalation on exceptions.
Human-in-the-Loop: Built for Production Reality
Production agentic systems will hit edge cases. CAPTCHAs. Unexpected modals. Two-factor prompts. Nova Act handles this without you building custom alerting:
- Workflow hits an ambiguous state
- Nova Act pauses and sends an SNS notification to a designated supervisor
- Supervisor receives a devtools URL — they can inspect and interact with the live browser state
- Supervisor takes corrective action
- Workflow resumes automatically
This isn't bolted on — it's a core architectural feature. The design acknowledges that 90% reliability means 10% still needs a human. That 10% is handled gracefully.
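To make the control flow concrete, here is a sketch of the same pause, notify, resume pattern in plain Python. It is not Nova Act's built-in mechanism: `act_or_escalate` is a hypothetical helper, `notify` stands in for whatever alert channel you use (an SNS publish, for instance), and `wait_for_human` is an assumed callable that blocks until the supervisor reports whether the page was fixed.

```python
from typing import Any, Callable


def act_or_escalate(
    act: Callable[[str], Any],
    prompt: str,
    notify: Callable[[str], None],
    wait_for_human: Callable[[], bool],
) -> Any:
    """Try a step; on failure, alert a supervisor and retry once they confirm a fix."""
    try:
        return act(prompt)
    except Exception as exc:
        notify(f"Step failed: {prompt!r} ({exc})")   # e.g. publish to an SNS topic
        if not wait_for_human():                      # blocks until the supervisor responds
            raise                                     # no fix available: surface the error
        return act(prompt)                            # resume from the corrected state
```

The step is retried only once after the human intervenes, so a second failure surfaces immediately instead of looping on the supervisor.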
The Developer Journey: Playground → Code → Cloud
nova.amazon.com/act (explore in the browser) → pip install nova-act (write a Python workflow) → VS Code Extension (debug step-by-step) → Deploy (one click to AWS)
Quick start:
pip install nova-act
export NOVA_ACT_API_KEY="your_api_key"

from nova_act import NovaAct

with NovaAct(starting_page="https://news.ycombinator.com") as nova:
    result = nova.act(
        "what are the top 5 post titles on this page?",
        schema={
            "type": "object",
            "properties": {
                "posts": {"type": "array", "items": {"type": "string"}}
            }
        }
    )
    print(result.parsed_response)
That's all it takes to extract structured data from any webpage.
For Builders in Restricted Regions
Nova Act is currently only available in US East (N. Virginia). If you're in India, Southeast Asia, or any other region without access — here's what to do right now:
1. Learn the atomic act() pattern today. This architecture pattern is what matters. When access opens up, your mental model is already in place.
2. Approximate it with Bedrock + Playwright. Use Amazon Bedrock's Claude models with tool use + Playwright for browser control. The perceive→reason→act loop is buildable today. You lose the purpose-built model advantage, but you learn the architecture.
3. Read the public GitHub. The aws/nova-act repo is open. The amazon-agi-labs/nova-act-samples repo has CDK examples for Lambda, ECS, and AgentCore. Reading these is free and region-independent.
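For option 2, the part you have to write yourself is the translation from a model-chosen action to a Playwright call. A minimal sketch follows, assuming the model returns action dictionaries in a shape you define. The `kind`, `x`, `y`, `text`, `key`, and `delta_y` fields are hypothetical, while `page.mouse.click`, `page.mouse.wheel`, `page.keyboard.type`, and `page.keyboard.press` are methods of Playwright's synchronous Python API.

```python
from typing import Any, Dict


def dispatch(page: Any, action: Dict[str, Any]) -> None:
    """Execute one model-chosen action on a Playwright sync Page."""
    kind = action["kind"]
    if kind == "click":
        page.mouse.click(action["x"], action["y"])
    elif kind == "type":
        page.keyboard.type(action["text"])
    elif kind == "press":
        page.keyboard.press(action["key"])         # e.g. "Enter"
    elif kind == "scroll":
        page.mouse.wheel(0, action["delta_y"])     # positive values scroll down
    else:
        # Fail loudly on anything unexpected instead of guessing.
        raise ValueError(f"unsupported action kind: {kind!r}")
```

Rejecting unknown action kinds is the important design choice here: a general model will occasionally invent an action, and a hard error is easier to debug than a silent no-op.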
What's Coming
Amazon has been explicit about the roadmap:
- MCP integration — Nova Act workflows calling external tools and APIs mid-session via Model Context Protocol
- Strands Agents — Amazon's open-source multi-agent SDK already supports Nova Act as a sub-agent in larger pipelines
- Beyond the browser — the same RL training methodology is being extended to more complex real-world task environments
- More regions — no official dates, but global expansion is the clear direction
Browser automation is the beachhead. The broader target is any workflow that requires perception, reasoning, and action in a UI environment.
Key Takeaways
Six things to remember from this article:
- Vertical integration is why it's reliable — model + orchestrator + actuator trained together
- Atomic act() calls are the pattern — small precise steps beat one big instruction
- Vision > DOM selectors — screenshots don't break when UI classes change
- AgentCore pricing — you don't pay CPU charges for the 30–70% of time your agent spends waiting
- Human-in-the-loop is first-class — production agents hit edge cases; it's designed for that
- One-click deployment — Docker, ECR, IAM, AgentCore Runtime handled automatically
Resources
- 🏠 Product page → aws.amazon.com/nova/act
- 📖 Official docs → docs.aws.amazon.com/nova-act/latest/userguide
- 💻 GitHub SDK → github.com/aws/nova-act
- 🧪 Sample code + CDK → github.com/amazon-agi-labs/nova-act-samples
- 💰 AgentCore pricing → aws.amazon.com/bedrock/agentcore/pricing
- 🎮 Web playground → nova.amazon.com/act
Found this useful? Drop a ❤️, share with your AWS friends, and follow for more deep dives on agentic AI on AWS. I'll be posting the "Build it yourself with Bedrock + Playwright" version next — a practical guide for restricted-region builders who want to implement the same perceive→act loop today.