Kazuya

Posted on Dec 6, 2025

AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

In this video, Amazon's AGI lab introduces Nova Act, a frontier-class AI model for browser automation that achieves over 90% reliability on enterprise workflows. The team explains how Nova Act uses reinforcement learning on web simulations and advanced element understanding to interact with browsers like humans do, overcoming limitations of legacy code-based solutions. Key features include human-in-the-loop capabilities, tool use beyond browsers, and full AWS integration with SDK, IDE extension, and CLI. Design partners demonstrate real-world applications: 1Password uses Nova Act to build universal sign-on by gathering website-specific intelligence at scale, Amazon Leo automated QA testing across web and mobile platforms achieving 60 dev days saved, and Sola powers enterprise process automation with near 100% reliability for clients like R1 RCM. The platform offers complete developer experience from online playground prototyping to production deployment with observability through AWS console.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introducing Amazon Nova Act: From Answers to Actions in Browser Automation

Welcome everyone. We're going to get started. Thank you all for joining us for our breakout session on Amazon Nova Act. My name is Kelsey, and I am here from our Amazon AGI lab based in San Francisco. We are a team within the Amazon Artificial General Intelligence organization that is specifically focused on pursuing long-term research bets. The focus of our lab over the course of the last year or so has been specifically on agents. We are really excited about this paradigm shift that we're seeing in the industry from models that give us answers to models that take actions. I know this is not the first time you've heard about agents this week. They've certainly gotten a lot of talk over the course of re:Invent.

We started with the browser because this is as close as we can get to a near universal action space in the digital world. We've been working on this problem of training models and providing solutions that allow for human-like performance when using a computer. What's been interesting about this problem is that it has existed for a long time. People have wanted to automate computer tasks basically as long as computers have existed, and the solutions have also existed for a long time. Some of you in this room may have used more legacy browser automation solutions that are code-based and involve writing a considerable amount of logic to specify exactly what you want an automation to do to use a computer and perform a task.

Here's an example for grabbing the weather from Weather.com. These solutions were a really great starting point to this problem, but they came with certain challenges. What we hear from customers is challenge one: these solutions took many months to get up and running in many cases. They were great for one static website or one static workflow, but as soon as a website changed, they would break. There was a pretty large maintenance burden involved with this approach. The biggest thing is there was really limited generalizability because as a developer you had to specify every step of the workflow.
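To make the maintenance burden concrete, here is a minimal sketch of a rule-based scraper, using only the Python standard library. The HTML snippets, the class names `temp-value` and `current-temp`, and the `TempScraper` helper are all hypothetical; none of them come from Weather.com or from the slide shown in the talk.

```python
from html.parser import HTMLParser


class TempScraper(HTMLParser):
    """Collects text inside any tag whose class attribute equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._capturing = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == self.target_class:
            self._capturing = True

    def handle_endtag(self, tag):
        self._capturing = False

    def handle_data(self, data):
        if self._capturing and data.strip():
            self.values.append(data.strip())


def scrape_temperature(html, css_class="temp-value"):
    scraper = TempScraper(css_class)
    scraper.feed(html)
    return scraper.values[0] if scraper.values else None


# Hypothetical markup before and after a site redesign.
OLD_PAGE = '<div><span class="temp-value">72F</span></div>'
NEW_PAGE = '<div><span class="current-temp">72F</span></div>'

print(scrape_temperature(OLD_PAGE))  # the hard-coded selector matches
print(scrape_temperature(NEW_PAGE))  # same content, renamed class: no match
```

The page still shows the same temperature to a person, but the hard-coded class name no longer matches, which is the kind of breakage described above.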

If you created one workflow for one geography or one SKU, but all of a sudden you had to scale to the full scale of your organization and your business, and generalize to fifty different states or hundreds of different insurance companies, all of a sudden that problem became untenable. The technology was not really working with you to get to the scale that you need. What's interesting about this problem though is that as people, we do not run into these same constraints when we face problems related to computer use. Regardless of which email client you use and love, if I show you any email client and ask you to write an email, I'm fully confident that everyone in this room could figure it out.

The way that you would know how to do this is actually kind of hard to encode in a rules-based way because you probably look at this screenshot and you're looking for a whole combination of things. You're looking for maybe a pen icon or a button that says compose, maybe it says draft, maybe it says new. It's probably towards the top of the page because writing an email is a key piece of functionality for an email client. So maybe it's separated from the other functions in some way. All of this is intuition that we've built through millions and millions of examples of using UIs and recognizing these common patterns. But software doesn't have that inherently.

What we've been really excited to do as part of the lab is build models that treat computer use more like how humans do. That's what we've built with Nova Act. The goal here is to train models that interact with a browser like a person. They look at a screen, they take a task, they understand what's going on, and they determine what to do next. Once they've performed an action, they repeat that same loop. They take another screenshot, they understand what's going on in light of the previous action, and they take another action to continue on the journey just like we do as people.
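The observe, decide, act loop described above can be sketched in a few lines of Python. This is only an illustration of the control flow: `FakeBrowser` and `toy_policy` are hypothetical stand-ins for a real browser and for the model, and nothing here is the Nova Act API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser: a tiny state machine of screens."""
    screen: str = "inbox"
    history: list = field(default_factory=list)

    def screenshot(self):
        return self.screen

    def click(self, target):
        transitions = {("inbox", "compose"): "draft", ("draft", "send"): "sent"}
        self.screen = transitions.get((self.screen, target), self.screen)
        self.history.append(target)


def toy_policy(task, observation):
    """Placeholder for the model: maps what is on screen to the next action."""
    if observation == "inbox":
        return "compose"
    if observation == "draft":
        return "send"
    return None  # nothing left to do


def run_agent(task, browser, policy, max_steps=10):
    for _ in range(max_steps):
        observation = browser.screenshot()   # look at the screen
        action = policy(task, observation)   # decide what to do next
        if action is None:                   # task judged complete
            return True
        browser.click(action)                # act, then loop again
    return False


browser = FakeBrowser()
print(run_agent("write an email", browser, toy_policy), browser.history)
```

The `max_steps` cap is a design choice in this sketch so that a confused policy cannot loop forever.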

These systems are way more robust. They don't fail due to a small change. You can get up and running much more quickly through natural language, and you can scale just in the same way that we know how to perform that flow across any email client. These models can also generalize across different environments. But if you've been following the agent space, that's not the first demo you've seen of agents or of computer use in particular. The biggest bottleneck we hear from customers is reliability. We hear from a lot of customers that they're super excited about agents, they believe in the future, and they're very quick to build a prototype or a proof of concept, and then they get stuck.

Achieving Reliability Through Element Understanding, Reinforcement Learning, and Human-in-the-Loop

While these models hold lots of promise and these solutions are very exciting, they don't actually work in a repeatable, scalable way that you need when you're trying to solve real business problems. This has been a core focus of Nova Act. We've approached reliability as the P-zero as we've been solving this problem, and we've done so in a few different ways. The first is really zeroing in on element understanding. For a model to work end to end in a reliable way, it needs to really understand web elements in the way that humans do. There are a number of culprits that typically stump agents, including date pickers, drop downs, and certain types of filters. These components are different on every site, and often agents don't know exactly how to handle them.

We spent a lot of time collecting training data specifically on these components and evaluating our models in these areas to make sure we achieve end-to-end reliability, including handling the correct amount of time to load after you type in a zip code and other such idiosyncrasies.

The second approach is using reinforcement learning specifically on web simulations or gyms. We've built hundreds of examples of mock websites that have similar components and types of UI patterns that you see across the web in the workflows that our customers are performing every day. We tell the model to go complete a task, and we don't specify the workflow or the steps that the model should take. We just validate that the end state is successful. This allows our model to do a large amount of exploration of these different platforms to really understand what's possible, what happens when I do A, what happens after that, and to understand different patterns for success. This has been a really key part of achieving reliability.
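As a rough illustration of "validate only the end state", the sketch below scores rollouts on a toy mock site. It shows just the reward signal, not a learning algorithm, and it is not Amazon's training setup: `ZipFormGym`, its action names, and the random policy are all assumptions made for the example.

```python
import random


class ZipFormGym:
    """Hypothetical mock site: succeed by typing a zip code, then submitting."""
    ACTIONS = ["type_zip", "submit", "open_menu"]

    def __init__(self):
        self.zip_entered = False
        self.submitted = False

    def step(self, action):
        if action == "type_zip":
            self.zip_entered = True
        elif action == "submit" and self.zip_entered:
            self.submitted = True
        # "open_menu", or "submit" before a zip is typed, changes nothing

    def end_state_ok(self):
        # Only the final state is checked; the path taken is not prescribed.
        return self.submitted


def rollout(policy, max_steps=5):
    gym = ZipFormGym()
    trajectory = []
    for _ in range(max_steps):
        action = policy(gym)
        trajectory.append(action)
        gym.step(action)
        if gym.end_state_ok():
            break
    reward = 1.0 if gym.end_state_ok() else 0.0
    return trajectory, reward


def random_policy(rng):
    return lambda gym: rng.choice(ZipFormGym.ACTIONS)


rng = random.Random(0)
results = [rollout(random_policy(rng)) for _ in range(200)]
successes = [traj for traj, reward in results if reward == 1.0]
print(f"{len(successes)} of {len(results)} random rollouts reached the goal")
```

Because the reward depends only on `end_state_ok()`, a trajectory that wanders through the menu first is scored the same as the shortest path, which leaves room for exploration.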

Lastly, we've kept real world evaluation as our focus. You'll see some exciting benchmark results with the latest release of Nova Act. We're really excited to see that our model is outperforming models of similar size like Haiku and even much larger ones like Sonnet. Ultimately, the metrics we care most about are related to customer success. What we're seeing with Nova Act is that early customers are seeing upwards of 90% reliability on the workflows that they're deploying in production. This is really important to us because we believe that an agent that is 50% reliable is 0% useful. Customers need agents that actually work in production, and this is what we're seeing today that we're really excited about.

Even the most reliable agents need help and need human oversight sometimes. With this release at re:Invent this year, we are excited to launch human-in-the-loop capabilities with Nova Act. This allows you as a developer to configure the ability for the agent to call on a human either to take over on a task or to review a task before the agent continues. You can do this through platforms like Slack or through custom integrations, such as a custom UI, and facilitate this human supervision of fleets of agents.
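The two modes mentioned above, review before continuing and full takeover, can be sketched as a gate around each step. This is a minimal sketch under stated assumptions: `Step`, `run_with_oversight`, and the `ask_human` callback are hypothetical names, not the Nova Act human-in-the-loop API, and the callback stands in for whatever channel (a chat tool, a custom UI) carries the request.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    description: str
    needs_review: bool = False    # pause for approval before executing
    needs_takeover: bool = False  # hand the step to a person entirely


def run_with_oversight(steps, execute: Callable, ask_human: Callable):
    """Runs steps in order, routing flagged ones through a human channel.

    ask_human(kind, step) is a hypothetical callback; in practice it could
    post to a chat tool or a custom UI and wait for the reply.
    """
    log = []
    for step in steps:
        if step.needs_takeover:
            result = ask_human("takeover", step)  # human performs the step
            log.append((step.description, "human", result))
        elif step.needs_review:
            if ask_human("review", step) == "approve":
                log.append((step.description, "agent", execute(step)))
            else:
                log.append((step.description, "rejected", None))
                break  # stop the run on rejection
        else:
            log.append((step.description, "agent", execute(step)))
    return log


steps = [
    Step("fill in the order form"),
    Step("enter the one-time code", needs_takeover=True),
    Step("submit the payment", needs_review=True),
    Step("download the receipt"),
]
scripted_human = lambda kind, step: "approve" if kind == "review" else "entered"
for entry in run_with_oversight(steps, lambda s: "done", scripted_human):
    print(entry)
```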

We're also expanding beyond the browser. As I mentioned, the browser is the place that we started with Nova Act. What we've heard from customers is that folks are excited to extend that same level of reliability across their full workflows even beyond the browser. We're getting started on this in preview with Nova Act now, and we see customers doing things like reading their QA tests from Jira and then implementing them using Nova Act in the browser, or taking form fill inputs from Excel and then filling out that form in the browser. We're just getting started on this journey, but we're really excited to extend beyond what the browser itself is capable of.

All of this is very exciting, but what we've also learned is that having a really reliable model is necessary but not sufficient for building great agents at scale. Customers run into these questions: How do I debug? How do I measure success and deploy and scale? These are equally important questions in the journey to developing agents that really have true business impact. So now I'll have my colleague come along and talk about how we've tackled this problem as well.

Building an End-to-End AWS Platform: From Prototype to Production Deployment

Thanks, Kelsey. My name is Ian. I'm a product manager and a member of technical staff at the AGI Lab, and it's great to see everybody here. Let's go back a little bit to how we started on this journey. In March of 2025, we released a research preview, and since then, we've had a bunch of people using our product and system, and they gave us a lot of feedback. Some of the feedback was about reliability, about the 90% reliability that Kelsey mentioned, which is absolutely critical. But we also got feedback that in order to release an agent into production in an enterprise, we need security. It needs to have all of the AWS security capabilities that everybody here has come to know and love.

In July, we did our first integration with AWS, and our first users were able to use AWS authentication and S3 for saving the logs and things like that. That was our very first attempt at integrating into AWS. This week, we're happy to announce that we've launched as a fully generally available end-to-end AWS service, fully integrated with an AWS console and basically everything that you expect from a proper AWS service. I'll walk you through them now. Our new AWS service includes the following. First of all, it includes a frontier-class, state-of-the-art model.

As Kelsey mentioned, this is a state-of-the-art model that is as good as anything on the market, if not better. We have a new AWS service and console, which I'll walk you through in a few minutes. We also have a new playground where you can all go to our online playground and try out the product without having to download an SDK or write any code. You can see if it works for you online, and if it does, then we have the SDK, which you can use for coding in Python.

We have an updated version of our existing SDK that uses our new model and is able to work with the new capabilities like human-in-the-loop. We also have a new version of our IDE extension, which I'll show you in a few minutes, designed to use the new model and the new capabilities. We've built a CLI which makes it really easy once you've built your agent to package everything up into an image and deploy it. We have these new capabilities like human-in-the-loop and tool use, and we're very excited about this end-to-end platform.

We feel that this platform provides several benefits to you. First, frontier-class accuracy. Second, and more importantly, real-world reliability: benchmarks matter, but reliability in production matters much more to our users. We've spent a lot of time training our model to make sure that it can achieve over 90% reliability on typical day-to-day enterprise use cases. Third, cost-effectiveness. We've priced it very aggressively, and we feel that it is the most cost-effective solution on the market for similar products. Lastly, time to value. We're really proud of the work we've done in building a terrific developer experience. Simply put, we feel that Nova Act is the best service for creating AI browser-based agents and creating ones that you want to actually use in production in the enterprise.

Let me walk you through that developer journey. First, you can prototype on the online playground, build with the SDK and our IDE extension, deploy to AWS, and get observability via the AWS console. Now I can show you some examples. This is what our online playground looks like. You can see on the left that you can enter natural language prompts. On the right, we've got an embedded browser which is currently running a web gym, a simulation of a travel site. We mocked up a travel site for buying tickets to other planets. On the bottom, you can see the agent's thinking steps, so you can try out your use case and see if it works here. When you're ready, you can click download, and it'll download the agent you've developed as a Python script, which you can then continue developing on your laptop.

On your laptop, you can use the Nova Act SDK, and we highly recommend you try out our VS Code IDE extension. It works on any VS Code compatible IDE and has a really terrific UI that simplifies things. On the left, you can see a notebook-style UI where all of the agent steps can be broken out into separate cells. This gives you the ability to iterate and keep tweaking an individual step until you've got it exactly right. Without this, you have to keep rerunning your agent from the beginning, and it's a real pain waiting for it to go through 30 steps until it gets to the one that you've tweaked. Our initial users have said that it can save them up to 80% of the time. On the right, we've embedded a browser right in your IDE, so we've got everything that you need in one space.

When you're ready, we have a deploy tab, and this makes it super easy to insert your AWS credentials. With one click, it'll build the agent image that you need to deploy. It'll use your Python script and the SDK. It'll also take care of allocating the resources for you, the ECR repository, S3 bucket, set up your IAM execution roles and things like that, and deploy everything to AWS for you. So it's super simple. Once you've deployed your agent to AWS, you can use our console and see all of the agents that you've deployed. For each agent, you can see a record of every single agent run. You can scroll down and see all of the screenshots, all of the thinking statements, the prompts, and all of the actions that the agent took. This is terrific for troubleshooting and for customer support.

You can also use Nova Act as part of a multi-agent framework. Here's a demo of something that we built that uses Nova Act in conjunction with two other Nova models: a Nova Chat LLM model and Nova Sonic, which is a voice model. I'll play this video for you to see.

Hey Nova, I'd like to plan a trip to the closest exoplanet. Maybe something warm with a beach. The closest exoplanet we know of is Proxima Centauri B orbiting the star Proxima Centauri, which is just 4.24 light years away. Proxima Centauri B has abundant green oceans and beautiful beaches of fine orange sand. The average temperature is 86 degrees Fahrenheit. What date would you like to travel? In one month. So January 3rd. Okay, great. Would you like me to book a flight to Proxima Centauri B? Let's do it. Got it. Let me generate instructions. Okay, first, I need to find a ticket from San Francisco to Proxima Centauri B for January 3rd, 2026.

So I think you get the idea: you can build really cool experiences by using multiple agents together in the same agentic solution.

Four Key Use Cases: Web QA Testing, Data Entry, Data Extraction, and Checkout Flows

As we've been working with design partners and with the initial users over the last year, we've seen a lot of different use cases appear. In fact, if you think of Nova Act, it's a very powerful and very low level primitive that can be used for innumerable use cases, literally thousands and thousands of use cases. But as we saw what people are doing, we've seen that four typical clusters of use cases appear most often, and these are use cases that we feel are the lowest hanging fruit and will allow you to really get a lot of value in the near term. The first one is web QA testing. So today, if you want to build a regression test on a web application, you need an engineer who needs to write code using something like Selenium or Playwright, and that code is brittle. If a button moves on the website, then suddenly that code doesn't work anymore. With Nova Act, you can use natural language prompts, and it'll understand what the site is supposed to do, understand if the design has changed, and it'll just work.

The next example is data entry. Every company has many workflows that involve manual transactions with websites. For example, salespeople, after a meeting, have to come in and enter a bunch of information into a CRM system. People have to file taxes, and people have to file or apply for licenses on different governmental websites. So there are tons of these undifferentiated manual tasks that people have to do that don't really add that much value. Wouldn't it be great if we could help you automate those so people can do what they really like to do at work, which is be strategic and creative and do what their real job is?

Similarly for data extraction, there are many industries like healthcare or logistics and many others where there are thousands of fragmented businesses and websites, and none of them are going to have APIs in the near future. So having a system that can automatically reach out to these different systems and collect data from them is of huge value and saves a ton of time. Checkout flows for e-commerce, travel, and the like are another very popular use case we've seen, and people are automating thousands of these at scale.

1Password's Universal Sign-On: Scaling Website-Specific Intelligence with Nova Act

So now what I'd love to do is introduce you to some of our design partners. They're going to walk you through a little bit about their company and how they've been able to innovate using Nova Act. So the first one, I'd like to invite Floris to the stage from 1Password. Thanks. Hey, everyone.

Hi, I'm Floris from the engineering team at 1Password, and today I'm going to show you how 1Password is using agentic AI to improve our own product in a way that just wouldn't be possible in the pre-AI era. The star of the show here is really Nova Act. A little bit about 1Password: we secure 1.3 billion credentials for 180,000 businesses and millions of users who use 1Password every day to log into their favorite websites. We don't just store logins. For the developers in the room, we also store SSH keys and sensitive .env files, and these come with native integrations in the desktop apps.

Let's talk about autofill. We have a browser extension that adds a small 1Password icon next to your login forms. If you click on that icon, you can choose a credential that you want to use to log in, and 1Password does the tedious work of filling out the form and getting you logged into your website. This has been out for a while and it's been working great, but it's time now for the next generation of autofill. This is what we're calling universal sign-on.

With universal sign-on, we want to take the experience from "hey 1Password, fill in this form for me" to a more high-level approach where you say "hey 1Password, just log me in." Just do whatever it takes to log me in regardless of the login method, whether that's a username and password, TOTP MFA, enterprise SSO, a passkey, or signing in with GitHub or Google, where you have probably forgotten which one you used with which website. Here's a preview of what this looks like. Now you can just click on a website and it'll immediately navigate to the login page and immediately log you in with a password and MFA token, and you're just logged in like that.

Now let's talk about how this works. Unfortunately, there's not a standard protocol for logging into websites. It's basically a free-for-all of HTML, and there's a lot of ambiguity out there. Every website does it slightly differently. The classic autofill algorithm solved the ambiguity with a one-size-fits-most algorithm based on heuristics, and it's been working quite well. It's being used millions of times every single day. However, with the vision that we have for universal sign-on, we're running into the limits of the heuristics that we can articulate in our code.

Right now, the browser extension doesn't just need to know how to fill in a form, but also how to navigate to the form and how to navigate through the form, which is just a lot more complex. To make this a success, we're going to need website-specific logic and website-specific instructions on how to complete the login. As you can guess, this doesn't scale if you need to do this by hand because there are millions of websites out there that offer a login. Even if we were to undertake this massive effort, it would be very brittle because we would see breaking changes on a daily basis, which is really not acceptable.

This is where Nova Act comes in. What we've done is built an AI agent that uses Nova Act and goes out and browses all these websites. It collects the necessary information about the specific oddities of each website. Then we have a second agent that validates this intelligence that we've gathered and passes it on to what we call the site intelligence engine. The site intelligence engine makes it available to our browser extension, and the browser extension runs on the user's device. The nice thing about this is that all the information gathering and validation can run on our infrastructure out of band, and the browser extension login flow will remain blazingly fast and also deterministic, which is really important. The validation can also run on a periodic basis to see if the intelligence that we've gathered is still accurate and correct, and if not, we invalidate it.
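The gather, validate, serve, and periodically invalidate flow described above can be sketched as a small store. Everything here is hypothetical: `SiteRecord`, `SiteIntelligenceStore`, and the validator are illustrative names, and 1Password's actual site intelligence engine is not described in enough detail in the talk to reproduce.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    domain: str
    login_steps: list  # e.g. ["click 'I already have an account'"]
    valid: bool = False


class SiteIntelligenceStore:
    """Hypothetical store standing in for the 'site intelligence engine'."""

    def __init__(self):
        self._records = {}

    def submit(self, record, validator):
        # A second, independent check must pass before a record is served.
        record.valid = validator(record)
        self._records[record.domain] = record

    def revalidate_all(self, validator):
        # Intended to run periodically, out of band from user logins.
        for record in self._records.values():
            record.valid = validator(record)

    def lookup(self, domain):
        # The extension only ever sees validated records, so its login flow
        # stays a plain, deterministic lookup with no model call.
        record = self._records.get(domain)
        return record.login_steps if record and record.valid else None


live_sites = {"example.com": ["click 'Log in'"]}  # hypothetical ground truth


def validator(record):
    return live_sites.get(record.domain) == record.login_steps


store = SiteIntelligenceStore()
store.submit(SiteRecord("example.com", ["click 'Log in'"]), validator)
print(store.lookup("example.com"))

live_sites["example.com"] = ["click 'Sign in'"]  # the site changes
store.revalidate_all(validator)
print(store.lookup("example.com"))  # stale record is no longer served
```

Keeping the agent work behind `submit` and `revalidate_all` is what lets the lookup path stay fast and deterministic in this sketch.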

Let's look at an example of a Nova Act agent in practice. Here it's going to navigate to the AWS re:Invent website, and this is actually a pretty simple example because its job here is to get to the login form. As you can see on the top right, it has a big login icon, so this is a pretty simple one. Let's see if it's able to find it. There we go. Now it found the login form and it knows that it completed its task. Now let's look at a slightly more complex example.

This one is a bit trickier. This is Duolingo, and it doesn't have a traditional login button at the top right. Instead, it has a button that says "I already have an account." To make it more complicated, it doesn't have an href attribute. It has a JavaScript handler, and this would be a bit more tricky to build with a heuristics-based algorithm. However, for humans it's super easy because it's just "I already have an account." Because Nova Act takes the same human approach, it's able to get to the login form just as easily.

Now let's look at the logs here. Along the way, Nova Act will log the steps that it takes as part of the evaluation loop. Here you can see it really thinks like a human. It knows what it needs to do and it says it found the button that says "I already have an account." Then it figures out that it should click it, does the actual click, and then evaluates the result again. It knows that it found the login form and that it completed its task and needs to return now.

To recap, website-specific intelligence can meaningfully improve the 1Password products, and Nova Act is really the thing that enables us to do it at this scale in a way that we just couldn't have done in the pre-AI era. We still have a long way to go here. We're just scratching the surface, but you can already try out the new universal sign-on UX in the latest beta version of the 1Password browser extension if you're interested.

Amazon Leo's QA Automation: From Prototype to 200 Live Scenarios in Five Weeks

I'll pass it on now. Perfect. Thank you, Floris. That was super interesting. Now I'd like to invite Matthew from Amazon Leo. Hey folks, I'm Matthew. I'm from Amazon Leo. Amazon Leo is the next generation of satellite Internet connectivity that Amazon is building. We currently have 158 satellites in space right now, and we're always launching more. Space always elicits a little sense of wonder and whimsy, right? As we get closer to launching our beta product, we had a really big task because we set a goal of zero critical customer bugs reported across our web and mobile browsers. That's really aggressive, right? We have hundreds and hundreds of test cases we need to perform.

We have an aggressive timeline because we're always building and always shipping, and we have weeks to do this, not months. I'm going to talk a little bit about how we leveraged Nova Act to go from a prototype to a production-grade QA automation system. Traditional automation has really complex code. You have to work with multiple frameworks, and each of those is really its own unique specialty, right? You need people who understand Appium selectors for mobile if you want to serve Android and iOS. You need somebody who really understands Selenium or Playwright and knows how to handle page jitter and how long you should wait. You've got to get it just right.

Nova Act likes to invert that, right? I don't need to know that anymore. I don't need that specialty. Instead, what I can do is turn something else into it. What we've done is taken a more opinionated approach to natural language. We had hundreds and hundreds of these Gherkin test cases, you know, given-when-then, for our customers. We built an agentic framework around it. On one side, we take this given-when-then statement, we use a Strands agent like we saw earlier today, and we convert that into the Nova Act command on the fly. Then we determine what type of test this is. Is this a web test? Is this a mobile test?

If it's web, we use the Playwright actuator that comes baked in with Nova Act. For mobile though, Nova Act doesn't support it, but they do have a really extensible SDK framework that allowed us to write our own Appium actuator to go in and start performing these actions in our mobile app. So we have one SDK, one unified interface, with no platform considerations anymore. You just run the test, and it figures everything else out for you, right? One of the other things we talked a lot about today is the 90%, right? When you're running QA automation and you're trying to verify the perfect customer experience, you need that to get much closer to 100%, right?
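The "one interface, two backends" idea can be sketched as follows. This is an assumption-heavy illustration: the `Actuator` base class, its two methods, and both subclasses are hypothetical, and the real Nova Act actuator interface may look quite different. The backends here only return strings instead of driving Playwright or Appium.

```python
from abc import ABC, abstractmethod


class Actuator(ABC):
    """Hypothetical minimal interface; the real SDK's actuator API may differ."""

    @abstractmethod
    def click(self, target: str) -> str: ...

    @abstractmethod
    def type_text(self, target: str, text: str) -> str: ...


class WebActuator(Actuator):
    def click(self, target):
        return f"[web] click {target}"

    def type_text(self, target, text):
        return f"[web] type {text!r} into {target}"


class MobileActuator(Actuator):
    def click(self, target):
        return f"[mobile] tap {target}"

    def type_text(self, target, text):
        return f"[mobile] type {text!r} into {target}"


ACTUATORS = {"web": WebActuator, "mobile": MobileActuator}


def run_test(platform, actions):
    """Runs the same action list on whichever backend the platform needs."""
    actuator = ACTUATORS[platform]()
    out = []
    for name, *args in actions:
        out.append(getattr(actuator, name)(*args))
    return out


actions = [("click", "join button"), ("type_text", "email field", "a@b.c")]
print(run_test("web", actions))
print(run_test("mobile", actions))
```

The test author writes `actions` once; only the `platform` key decides which backend carries them out.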

One of the other things that Nova Act provides us is the ability to export a trajectory. What that is, is every piece of information that Nova Act performed during the test, it saves. It saves a picture of the page, the DOM, where it clicked, and how it thought about the problem.

We can then replay that deterministically for our next run. We've built a self-healing replay engine as well. As our page changes and our customer experience changes, our core mission of the test case hasn't changed, but the page has moved around a little bit. When we fail our test case because we couldn't match it in the deterministic way anymore, we rerun it intelligently and then save that. The next time through, it passes. We're running three times faster with no more non-deterministic behavior that we worry about repeatedly. We get really good confidence that our experience is shipping the way we want it to.
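A self-healing replay can be sketched like this: try the recorded, deterministic path first, and fall back to an intent-driven lookup only for steps that no longer match. The trajectory format, the page model (a dict from selector to visible label), and `toy_agent_locate` are all assumptions for illustration; the talk does not describe Amazon Leo's actual engine at this level.

```python
def replay(recorded_steps, page):
    """Deterministic replay: each step must find its recorded selector."""
    for step in recorded_steps:
        if step["selector"] not in page:
            return False
    return True


def toy_agent_locate(intent, page):
    """Placeholder for the model: finds an element by its visible label."""
    for selector, label in page.items():
        if label == intent:
            return selector
    return None


def run_with_self_healing(recorded_steps, page, locate=toy_agent_locate):
    if replay(recorded_steps, page):
        return recorded_steps, "replayed"
    # Fall back to the slower, intent-driven path for the steps that broke,
    # then return the repaired trajectory so it can be saved for next time.
    healed = []
    for step in recorded_steps:
        if step["selector"] in page:
            healed.append(step)
            continue
        selector = locate(step["intent"], page)
        if selector is None:
            return recorded_steps, "failed"
        healed.append({"intent": step["intent"], "selector": selector})
    return healed, "healed"


trajectory = [{"intent": "Join the list", "selector": "#join-btn"}]
old_page = {"#join-btn": "Join the list"}
new_page = {"#signup-cta": "Join the list"}  # same button, new selector

print(run_with_self_healing(trajectory, old_page)[1])
healed, status = run_with_self_healing(trajectory, new_page)
print(status, healed)
print(run_with_self_healing(healed, new_page)[1])
```

In this sketch the intent of the test ("Join the list") is what persists across page changes, while the selector is treated as a cache that can be rebuilt.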

Let's take a look at what this looks like in action.

Step one: given a user successfully navigates to Leo.amazon.com.
Step two: when the user clicks on the join the list button in the header.
Step three: the user enters leo.user@amazon.com into the email input field.
Step four: the user enters 98052 into the postal code input field.
Step five: the user selects United States from the country dropdown.
Step six: the user clicks the submit button.
Step seven: then the user should see a confirmation message indicating successful submission.
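The scenario read out above can be written as Gherkin text and split into action prompts and a final check. A few assumptions to flag: the `And` keywords on steps three to six are filled in here because the spoken version omits them, `parse_scenario` and `to_prompts` are hypothetical helpers, and the talk uses a Strands agent (an LLM) for the conversion, whereas this sketch swaps in a plain string template.

```python
GHERKIN = """\
Given a user successfully navigates to leo.amazon.com
When the user clicks on the join the list button in the header
And the user enters leo.user@amazon.com into the email input field
And the user enters 98052 into the postal code input field
And the user selects United States from the country dropdown
And the user clicks the submit button
Then the user should see a confirmation message indicating successful submission
"""

KEYWORDS = ("Given", "When", "Then", "And")


def parse_scenario(text):
    """Returns (kind, body) pairs; 'And' inherits the previous keyword."""
    steps, previous = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        keyword, _, body = line.partition(" ")
        if keyword not in KEYWORDS:
            raise ValueError(f"not a Gherkin step: {line!r}")
        kind = previous if keyword == "And" else keyword
        if kind is None:
            raise ValueError("scenario cannot start with 'And'")
        steps.append((kind, body))
        previous = kind
    return steps


def to_prompts(steps):
    """Given/When become actions to perform; Then becomes a check to verify."""
    prompts = []
    for kind, body in steps:
        prefix = "Verify that" if kind == "Then" else "Do this:"
        prompts.append(f"{prefix} {body}")
    return prompts


for prompt in to_prompts(parse_scenario(GHERKIN)):
    print(prompt)
```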

What you got to see is that we transformed everything on the fly and we're running this in real time. We get to see how our systems are thinking about it. It's very easy for us to debug and it gives us really high confidence in both when we see a failure, understanding why we're failing, and when it's passing, knowing exactly why. We can see the type of customer behavior that's going to happen in real time.

We went from prototype to production in five weeks. The first one or two weeks were really setting the groundwork, making sure our agents could handle throttling and all those little things when you're engineering. Weeks three and four, we got to production ready, moving it into accounts that can hold and manage all the data we're running through. Week five, we're now running 200 live scenarios with 3,600 validation points across web, iOS, and Android. We estimate that we've saved about 60 dev days to date, and we're saving another 30 every month.

Sola's Agentic Process Automation: Powering Enterprise Workflows with Near 100% Reliability

Thank you and I will bring up Neil. Cool, thank you, Matthew. Now I'd like to introduce Neil from Sola. Thank you, everyone. I'm Neil, co-founder and CTO of Sola. As we know, enterprise work today happens across more systems, teams, and tools than ever before, and process automation remains a massive challenge with teams struggling to get value. So what do we need in this next generation of process automation tooling to automate meaningful core operations of businesses? We need the tool to understand what people do. That means observing their work, figuring out their process, and capturing logic and context effectively. We need it to generalize across all these systems on browsers, on desktops and beyond. We need it to handle challenging and dynamic digital work. And of course, we need the solutions we build to scale to enterprise volume.

This is where Sola comes in. Sola is an agentic process automation platform. Some of the largest enterprises in the world, Fortune 500s and the largest private enterprises across verticals and industries use Sola to power their businesses, building intelligent, flexible automations that do everything from medical data entry to financial compliance to legal back office and much more. Sola automations sit on top of systems and interact directly with digital applications. Underneath the hood, this is powered largely by computer use agents. They're the systems that everyone's been talking about today. They are systems that can see, understand, and operate applications just like you and me.

In this video, we can see a Solobot running a generalizable track and trace workflow. The Solobots can understand how a workflow is done by watching someone do it, converting their process into a visual diagram that users can modify. They can then run these workflows live, adapting to dynamic interfaces and automatically updating the diagram based on the scenarios they encounter. They do all this while providing observability and learning from human intervention when needed, becoming better and better over time. These bots help handle some of the most complicated manual workflows for businesses at scale.

A lot of what powers this is Claude models, and we use a variety of them under the hood. Nova Act in particular fills an important niche for what Sola does. It's a powerful workhorse for our computer use needs. Nova Act is steerable, adheres to complex instructions reliably, and allows us to enforce strict guardrails to guarantee the reliability that enterprises need. It's able to handle complex interfaces with state-of-the-art intelligence while operating in real time.

Here's a simplified diagram of how a Solobot can use Nova Act, and generally it's a reliable framework for using Claude models. If there's an instruction that's been scoped to Nova Act, the Solobot will hand off the task to an orchestrator agent. This agent has those instructions along with context about the workflow, the current execution, previous executions, business context and logic, and more. Given all that information, the agent will break down the task into subtasks. In this case, we have action A, action B, and so on. The actions will be handed off to Nova Act sub-agents, which will go off and complete those tasks.

Then given all the context described before, along with the output of the Nova Act sub-agent, which has transparent reasoning and action traces, the orchestrator can validate that action and plan and update subsequent tasks accordingly. Nova Act is specially built for this kind of UI automation. The piece I mentioned before is just one part of our agent harness. We have many places we use Claude models. The Nova Act SDK makes it straightforward to integrate across our entire platform while also automating and supporting the observability that we need for monitoring. Its extensibility allows us to set up custom tools to complement the rest of our harness.
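The orchestrator loop described above (break an instruction into subtasks, hand each to a sub-agent, validate the returned trace, then decide whether to continue) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sola's implementation: `plan_subtasks`, `stub_sub_agent`, `SubAgentResult`, and `orchestrate` are hypothetical names, and the stub stands in for a real Nova Act sub-agent so the example runs without a browser or any SDK.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubAgentResult:
    """Outcome of one sub-agent run, including its reasoning/action trace."""
    action: str
    success: bool
    trace: List[str]


def plan_subtasks(instruction: str) -> List[str]:
    # Hypothetical planner: a real orchestrator would use workflow context
    # and business logic; here we simply split on semicolons.
    return [part.strip() for part in instruction.split(";") if part.strip()]


def stub_sub_agent(action: str) -> SubAgentResult:
    # Hypothetical stand-in for a Nova Act sub-agent. Any action whose text
    # contains "fail" is treated as an unsuccessful run.
    ok = "fail" not in action
    trace = [f"think: {action}", "act: done" if ok else "act: error"]
    return SubAgentResult(action, ok, trace)


def orchestrate(
    instruction: str,
    sub_agent: Callable[[str], SubAgentResult] = stub_sub_agent,
) -> List[SubAgentResult]:
    """Run subtasks in order, validating each result before continuing."""
    pending = plan_subtasks(instruction)
    completed: List[SubAgentResult] = []
    while pending:
        action = pending.pop(0)
        result = sub_agent(action)
        completed.append(result)
        if not result.success:
            # Validation failed: stop here so the remaining subtasks can be
            # replanned or escalated to a human instead of run blindly.
            break
    return completed


if __name__ == "__main__":
    for r in orchestrate("log in to portal; open patient record; update patient field"):
        print(r.action, r.success, r.trace)
```

In this sketch the validation step is just the `success` flag; a fuller orchestrator would inspect the trace itself and rewrite the pending list rather than simply stopping.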

As a workhorse model, Nova Act is fast and reliable, keeping workflows moving in real time. It automatically handles edge cases like complex error states and conditional logic, and it calls for a human in the loop when needed. Here's a more advanced fleet orchestration pattern that the Solobot can deploy, which I also think is a good framework example. Similar to before, the orchestrator can break down actions into tasks, but here these are handled by Nova Act sub-agents in parallel. The thinking and action traces and the results of each of these Nova Act sub-agents are combined by an aggregation agent, and the combined result is passed back to the orchestrator to plan and conduct future tasks. With this kind of system, we can achieve near 100% reliability on these core enterprise workflows.
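The parallel fleet pattern can be illustrated with a small fan-out and aggregate sketch using Python's standard `concurrent.futures`. As an assumption for illustration only, `stub_sub_agent`, `aggregate`, and `run_fleet` are hypothetical names, the stub replaces a real Nova Act sub-agent, and the `needs_human` flag is one possible way to signal a human-in-the-loop escalation, not a documented behavior of Sola or Nova Act.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubAgentResult:
    """Outcome of one sub-agent run, including its reasoning/action trace."""
    action: str
    success: bool
    trace: List[str]


def stub_sub_agent(action: str) -> SubAgentResult:
    # Hypothetical stand-in for a Nova Act sub-agent. Any action whose text
    # contains "fail" is treated as an unsuccessful run.
    ok = "fail" not in action
    return SubAgentResult(action, ok, [f"think: {action}", "done" if ok else "error"])


def aggregate(results: List[SubAgentResult]) -> Dict[str, object]:
    """Combine sub-agent results into one summary for the orchestrator."""
    failed = [r.action for r in results if not r.success]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed_actions": failed,
        # Assumption: any failure is flagged for human review.
        "needs_human": bool(failed),
    }


def run_fleet(
    actions: List[str],
    sub_agent: Callable[[str], SubAgentResult] = stub_sub_agent,
    max_workers: int = 4,
) -> Dict[str, object]:
    """Run independent subtasks in parallel, then aggregate their results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        results = list(pool.map(sub_agent, actions))
    return aggregate(results)


if __name__ == "__main__":
    print(run_fleet(["check payer A", "check payer B", "fail on payer C"]))
```

Threads suit this sketch because each sub-agent would mostly wait on a browser session; the orchestrator only sees the aggregated summary and can plan follow-up tasks from it.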

Here we can zoom in on a specific case. On the right we can see a representation of the visual diagrams on the Sola platform. On the bottom left we can see an example of a portion of this workflow where it's logging into a medical portal, navigating to the patient, and updating the patient field. On the upper left we can see the agent traces. This is the model performing that update-patient-field step. In this case, it's non-trivial. It's not just updating one specific field. The model needs to understand that it needs to click on a button to add an entry, and then it needs to look over the entire form, figure out exactly where in the form that update needs to happen, and then put the relevant information in very reliably.

So Sola is a customer of Nova Act, but downstream of that, companies like R1 RCM, one of the largest revenue cycle management platforms in the US with tens of thousands of employees, use Sola to tackle back office work. For context, RCM stands for revenue cycle management. It's basically how your doctors get paid. Because Sola workflows are adaptable, they can handle the hundreds of different payment platforms that a portal like R1 needs to interact with on a regular basis.

Nova Act has been integral to the Sola platform, allowing us to push the boundaries of what computer use models are capable of to conduct real-world work for enterprises. With partners like AWS, we're able to support automating the most core and critical operations of businesses today.

; This article is entirely auto-generated using Amazon Bedrock.
