DEV Community

Danilo Poccia for AWS

Posted on Mar 31

Amazon AGI announces research preview of Amazon Nova Act: Build agents that take action in web browsers

#ai #python #programming #web

During our daily activities, we interact with many websites, from filling out complex forms and extracting data across multiple websites to reading emails and booking appointments. It would be great if we could automate some of those tasks, but automating browser-based workflows brings up significant challenges. These tasks typically require either tedious manual intervention or brittle automation scripts that break when the websites they rely on are updated. Current solutions often demand specialized knowledge of website structures and frequent maintenance as sites evolve.

Today, I'm excited to share Amazon Nova Act, a research preview from Amazon Artificial General Intelligence (AGI). Amazon Nova Act is a new AI model trained to perform actions within a web browser that you can use with the Amazon Nova Act SDK. Instead of relying on backend integrations, the SDK navigates websites like a user, clicking buttons, filling out forms, and extracting data dynamically. Amazon Nova Act is currently available in the US.

The SDK can automate real-world workflows across any website, even those without structured programmatic access. It combines natural language, Python scripting, and Playwright automation in a single interface, making it easy to build, test, and refine website automation. It can also run multiple workflows in parallel, eliminating wait times and speeding up repetitive tasks beyond human capabilities.

This approach can simplify many use cases such as gathering data from multiple sources for on-call engineering tasks, automating leave requests across multiple systems, streamlining the creation of marketing campaigns, or implementing quality assurance (QA) testing of web applications.

Building on Amazon Nova models' strong performance in multimodal intelligence and agentic workflows, Amazon Nova Act has been further trained on planning and running multistep actions in a browser. It is optimized for high reliability on atomic actions, such as searching for an item in a catalog or a list, with best-in-class performance on perception benchmarks, including ScreenSpot and GroundUI Web.

The Amazon Nova Act SDK allows you to build browser action automation commands with both natural language instructions and code. This hybrid approach makes it easier to break down complex sequences into reliably repeatable steps and fall back to conventional browser automation when needed, all within a unified programming interface. Let's see it in action.

Getting your Amazon Nova Act API key

Amazon Nova Act is a research preview from Amazon AGI and is not included in the AWS SDKs. The Amazon Nova Act SDK uses different credentials from AWS accounts. To get early release access, you need an API key. Using your Amazon account, you can sign into nova.amazon.com, a new website that you can use to experience the capabilities of the Amazon Nova foundation models (FMs). There, you can choose Act in the Labs section of the navigation pane.

You might need to join a waitlist to get access. In that case, you can come back to the Labs section when you receive a confirmation email to generate your API key.

Using the Amazon Nova Act SDK

Let's see how Amazon Nova Act works with a practical example. In this scenario, imagine I'm looking for a new place to live. I commute by bike and train, so I'd like to know how long it takes to cycle to the train station from each place I'm considering.

To do that manually, I'd first have to go to a service like Zumper to find a list of properties that satisfy my needs. Then, I'd have to use a website like Google Maps to find the distance by bike for all the addresses in that list. This time, I'll automate all of that with the following script!

from concurrent.futures import ThreadPoolExecutor, as_completed

import fire
import pandas as pd
from pydantic import BaseModel

from nova_act import NovaAct


class Apartment(BaseModel):
    address: str
    price: str
    beds: str
    baths: str


class ApartmentList(BaseModel):
    apartments: list[Apartment]


class CaltrainBiking(BaseModel):
    biking_time_hours: int
    biking_time_minutes: int
    biking_distance_miles: float


def add_biking_distance(apartment: Apartment, caltrain_city: str, headless: bool) -> CaltrainBiking | None:
    with NovaAct(
        starting_page="https://maps.google.com/",
        headless=headless,
    ) as client:
        client.act(
            f"Search for {caltrain_city} Caltrain station and press enter. "
            "Click Directions. "
            f"Enter '{apartment.address}' into the starting point field and press enter. "
            "Click the bicycle icon for cycling directions."
        )
        result = client.act(
            "Return the shortest time and distance for biking", schema=CaltrainBiking.model_json_schema()
        )
        if not result.matches_schema:
            print(f"Invalid JSON {result=}")
            return None
        time_distance = CaltrainBiking.model_validate(result.parsed_response)
        return time_distance


def main(
    caltrain_city: str = "Redwood City",
    bedrooms: int = 2,
    baths: int = 1,
    headless: bool = False,
    min_apartments_to_find: int = 5,
):
    all_apartments: list[Apartment] = []

    with NovaAct(
        starting_page="https://zumper.com/",
        headless=headless,
    ) as client:

        client.act(
            "Close any cookie banners. "
            f"Search for apartments near {caltrain_city}, CA, "
            f"then filter for {bedrooms} bedrooms and {baths} bathrooms. "
            "If you see a dialog about saving a search, close it. "
            "If results mode is 'Split', switch to 'List'. "
        )

        for _ in range(5):  # Scroll down a max of 5 times.
            result = client.act(
                "Return the currently visible list of apartments", schema=ApartmentList.model_json_schema()
            )
            if not result.matches_schema:
                print(f"Invalid JSON {result=}")
                break
            apartment_list = ApartmentList.model_validate(result.parsed_response)
            all_apartments.extend(apartment_list.apartments)
            if len(all_apartments) >= min_apartments_to_find:
                break
            client.act("Scroll down once")

        print(f"Found apartments: {all_apartments}")

    apartments_with_biking = []
    with ThreadPoolExecutor() as executor:
        future_to_apartment = {
            executor.submit(add_biking_distance, apartment, caltrain_city, headless): apartment
            for apartment in all_apartments
        }
        for future in as_completed(future_to_apartment.keys()):
            apartment = future_to_apartment[future]
            caltrain_biking = future.result()
            if caltrain_biking is not None:
                apartments_with_biking.append(apartment.model_dump() | caltrain_biking.model_dump())
            else:
                apartments_with_biking.append(apartment.model_dump())

    apartments_df = pd.DataFrame(apartments_with_biking)
    closest_apartment_data = apartments_df.sort_values(
        by=["biking_time_hours", "biking_time_minutes", "biking_distance_miles"]
    )

    print()
    print("Biking time and distance:")
    print(closest_apartment_data.to_string())


if __name__ == "__main__":
    fire.Fire(main)

When I initialize the Amazon Nova Act SDK client, I pass a starting page and whether to launch the browser in headless mode. Then, I use the act() method to pass instructions to the agent using natural language that can include variables. For example:

client.act("Close any cookie banners.")

or

client.act(f"Search for apartments near {location}")

To run the script, I install the Amazon Nova Act SDK:

pip install nova-act

I set the Amazon Nova Act API key in the NOVA_ACT_API_KEY environment variable:

export NOVA_ACT_API_KEY=<YOUR_API_KEY>
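A missing key is easier to diagnose before any browser window opens. The following is a minimal pre-flight sketch, assuming only what is stated above: that the key is read from the NOVA_ACT_API_KEY environment variable. The helper name require_api_key is hypothetical and not part of the SDK.

```python
import os
from typing import Mapping, Optional


def require_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "NOVA_ACT_API_KEY",
) -> str:
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper, not part of the Amazon Nova Act SDK.
    """
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    # Treat the unedited placeholder from the export command as "not set".
    if not key or key == "<YOUR_API_KEY>":
        raise RuntimeError(f"{name} is not set; export it before running the script.")
    return key
```

Calling require_api_key() at the top of main() would stop the run early with a readable message instead of failing partway through a workflow.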

The script also uses the pandas Python module to process the data extracted from the websites, Pydantic to gather data from the SDK in the correct format, and Fire to expose the main() parameters as command-line options:

pip install pandas pydantic fire

Now, I run the script and go grab some coffee. When I'm back, the script has produced a nicely formatted table, completing in minutes what would typically take much longer to gather manually.

Here's a recording of what happened on my screen while the script was running. At the bottom of the screen, you can see the output to the terminal from the Amazon Nova Act SDK, including the thinking process, the actions, and the results extracted from the web pages. After some properties have been selected, multiple browser windows are used in parallel to find the distance by bike to the train station.

At the end of the video, I see the final table that includes information from multiple websites and sorts results based on my needs. The video has been sped up to make it easier to follow.
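The sorting step can be sketched without pandas. In the script, a failed biking lookup leaves an apartment without the biking fields, and if every lookup fails, the sort columns do not exist at all and sort_values would raise a KeyError. The sketch below is one possible way to handle that case: it ranks by total minutes and places apartments with missing data last. The helper names and the sample rows are made up for illustration.

```python
def total_minutes(row: dict) -> float:
    """Biking time in minutes, or infinity when the lookup failed."""
    if "biking_time_hours" not in row or "biking_time_minutes" not in row:
        return float("inf")
    return row["biking_time_hours"] * 60 + row["biking_time_minutes"]


def sort_by_biking_time(rows: list) -> list:
    """Sort by biking time, then distance; rows without data go last."""
    return sorted(
        rows,
        key=lambda r: (total_minutes(r), r.get("biking_distance_miles", float("inf"))),
    )


# Made-up sample rows, for illustration only.
rows = [
    {"address": "A", "biking_time_hours": 0, "biking_time_minutes": 25, "biking_distance_miles": 4.1},
    {"address": "B"},  # biking lookup failed, so the fields are missing
    {"address": "C", "biking_time_hours": 0, "biking_time_minutes": 9, "biking_distance_miles": 1.6},
]

for row in sort_by_biking_time(rows):
    print(row["address"], total_minutes(row))
```

Sorting on a single total in minutes also avoids relying on the hours and minutes columns staying in the right order.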

If I look at the code, the script demonstrates several key capabilities of Amazon Nova Act:

Natural language commands – The act() method accepts straightforward natural language instructions like "Search for apartments near..." that Amazon Nova Act translates into precise browser actions.

Structured data extraction – Amazon Nova Act can extract specific information from web pages and return it in structured formats like JSON.

Parallelization – Multiple Amazon Nova Act clients can run simultaneously in separate threads, drastically reducing the time needed to collect data from multiple sources.

Hybrid programming model – The example combines the power of natural language instructions with traditional Python code for maximum flexibility and control.
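The parallelization pattern from the script can be isolated as follows. This is a sketch, not the SDK's own mechanism: slow_lookup is a hypothetical stand-in for a function that opens a browser session, and run_in_parallel is an illustrative helper. Unlike the script, it catches exceptions raised inside a worker, so one failed lookup does not discard the results from the others.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_lookup(address: str) -> int:
    """Stand-in for a browser session; hypothetical, not part of the SDK."""
    if not address:
        raise ValueError("empty address")
    return len(address)


def run_in_parallel(addresses, worker, max_workers=4):
    """Run worker(address) in threads and map each address to its result.

    A worker that raises is recorded as None instead of aborting the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(worker, a): a for a in addresses}
        for future in as_completed(futures):
            address = futures[future]
            try:
                results[address] = future.result()
            except Exception as exc:
                results[address] = None
                print(f"Lookup failed for {address!r}: {exc}")
    return results
```

Capping max_workers keeps the number of simultaneous browser windows bounded; the value 4 here is an arbitrary choice for the sketch.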

Things to Know

Amazon Nova Act is available in the US as a research preview from Amazon AGI. At this time, there's no cost when using the Amazon Nova Act SDK.

The Amazon Nova Act SDK supports macOS and Ubuntu operating systems, and is compatible with Python 3.10 or later. You can use the SDK interactively with the Python interpreter for experimenting and step-by-step debugging with a visible browser window, or you can prepare a script for automation and asynchronous execution using headless mode.

Amazon Nova Act works best when you break up actions into multiple act() calls that typically result in 3 to 5 browser actions each (for example, click, type, scroll). Rather than asking for a complex workflow in a single command, divide it into logical steps, similar to how you'd instruct a person on how to complete a task for you. You can further enhance results with proper error handling in your code.
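One way to combine small steps with error handling is a thin wrapper that runs each instruction as its own act() call and retries a step that fails. The sketch below assumes only that the client exposes act(prompt) and that failures surface as Python exceptions; run_steps and FakeClient are hypothetical names, and FakeClient is a test double, not the real NovaAct client.

```python
import time


def run_steps(client, steps, retries=2, delay_seconds=0.0):
    """Run each instruction with client.act(), retrying a failed step."""
    for step in steps:
        for attempt in range(1, retries + 2):
            try:
                client.act(step)
                break  # step succeeded, move on to the next one
            except Exception as exc:
                if attempt > retries:
                    raise RuntimeError(
                        f"Step failed after {attempt} attempts: {step!r}"
                    ) from exc
                time.sleep(delay_seconds)


class FakeClient:
    """Test double standing in for a real client; fails on the first call."""

    def __init__(self):
        self.calls = []
        self._failed_once = False

    def act(self, prompt):
        self.calls.append(prompt)
        if not self._failed_once:
            self._failed_once = True
            raise RuntimeError("simulated transient failure")


client = FakeClient()
run_steps(client, ["Close any cookie banners.", "Scroll down once"])
print(client.calls)
```

Retrying is only safe for steps that can be repeated without side effects. A step such as submitting a form may need a check of the page state before it is attempted again.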

To automate tasks on websites that require authentication, you can configure the Amazon Nova Act SDK to use the Chrome browser installed on your machine with your existing login sessions, rather than the browser managed by the SDK.

The Amazon Nova Act SDK is an experimental early release. When using it, please consider that it might make mistakes.

Ready to start building with the Amazon Nova Act SDK? Whether you’re automating tedious tasks or optimizing large-scale workflows, the SDK gives you the power of generative AI to automate the web in a fast and reliable way, without the need to onboard to specific website structures or APIs.

You can follow the instructions in this repo where you can find examples and a full onboarding guide.

I can't wait to see what you'll automate with Amazon Nova Act!

Top comments (8)

Who Am I • Apr 1

I'm outside of the US, so is there any way that I can get the API key for testing the Nova SDK?

Danilo Poccia • Apr 1 • Edited

Sorry, at this time, this is a research preview only available in the US. We're just getting started and are excited to share more information in the future.

Farmer Sneed • Apr 6

It's so funny how AI has been working so hard to make development easier, but never testing. This is finally what we've been waiting for with test automation. Wdio and Playwright need to be updated so frequently and it's so time consuming when you could just have an AI look at a page and do simple interactions to determine if it's working or not.

Dinesh Kumar • Apr 2

If we generate an API key in the US, can we use that API key outside of the US? Please confirm.

Danilo Poccia • Apr 3

You must be in the US to use Nova Act.

Vishal Singh • Apr 2

Hey, did you get the answer or not?

Dinesh Kumar • Apr 2

Not yet. Still waiting for an update.

JoEy0ll0X • Apr 3

I wonder how this would work with something like indeed.com