Kazuya

AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Beyond web browsers: HITL and tool integration for Nova Agents (AIM3334)

In this video, Amazon's AGI lab introduces Nova Act, an AI agent for browser automation that achieves over 90% reliability in production workflows. The team explains how Nova Act uses reinforcement learning on web simulations and advanced element understanding to overcome limitations of traditional code-based automation. Key features include human-in-the-loop capabilities, AWS integration as a fully managed service, and an end-to-end developer platform with playground, SDK, IDE extension, and CLI. Design partners demonstrate real-world applications: 1Password uses Nova Act to power Universal Sign-On across millions of websites, Amazon Leo automated 200 QA scenarios saving 60 dev days, and Sola built an enterprise process automation platform handling complex medical and financial workflows. The session emphasizes Nova Act's cost-effectiveness, benchmark performance exceeding models like Haiku and Sonnet, and extensibility for multi-agent frameworks.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Thumbnail 20

Thumbnail 40

From Static Automation to Human-Like Computer Use: Introducing Amazon Nova Act

All right, welcome everyone. We're going to go ahead and get started. Thanks all for joining us for our breakout session on Amazon Nova Act. My name is Kelsey. I am here from our Amazon AGI lab based in San Francisco. We are a team within the Amazon Artificial General Intelligence organization that is specifically focused on pursuing long-term research bets, and the focus of our lab over the course of the last year or so has been specifically on agents. We are really excited about this paradigm shift that we're seeing in the industry from models that give us answers to models that take actions, and I know this is not the first time you've heard about agents this week. They've certainly gotten a lot of talk over the course of re:Invent.

Thumbnail 50

Thumbnail 60

We've started with the browser because this is as close as we can get to a near universal action space in the digital world, and we've been working on this problem of training models and providing solutions that allow for human-like performance when using a computer. What's been interesting about this problem is that it has existed for a long time. You've had people wanting to automate computer tasks basically as long as computers have existed, and the solutions have also existed for a long time. So some of you in this room may have used more legacy browser automation solutions that are code-based and involve basically writing a considerable amount of logic to specify exactly what you want an automation to do, to go use a computer and perform a task.

Thumbnail 100

So here's an example for grabbing the weather from Weather.com. These solutions were a really great starting point for this problem, but they came with certain challenges. What we hear from customers is, challenge one: these solutions often took many months to get up and running. They were great for one static website or one static workflow, but as soon as a website changed, they would break, so there was a pretty large maintenance burden involved with this approach. And then the biggest thing is there was really limited generalizability, because as a developer you had to specify every step of the workflow. If you created one workflow for one geography or one SKU, but all of a sudden you had to scale to your full organization and generalize to 50 different states or hundreds of different insurance companies, that problem became untenable, and the technology was not really working with you to get to the scale that you need.
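To make the contrast concrete, here is a minimal sketch of the kind of selector-driven script the speaker is describing, assuming Playwright as the automation library. The Weather.com selectors are hypothetical; the point is that every hard-coded step is an assumption about today's page layout, and any change breaks it.

```python
# A minimal sketch of the legacy, selector-driven approach described above.
# The CSS selectors below are hypothetical: every step is hard-coded, so any
# layout change on the site breaks the script.
from playwright.sync_api import sync_playwright

def get_weather(zip_code: str) -> str:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("https://weather.com")
        # Each selector encodes one assumption about today's page structure.
        page.fill("input#LocationSearch_input", zip_code)
        page.click("button[data-testid='ctaButton']")
        page.wait_for_selector("span[data-testid='TemperatureValue']")
        temperature = page.inner_text("span[data-testid='TemperatureValue']")
        browser.close()
        return temperature

if __name__ == "__main__":
    print(get_weather("89109"))
```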

Thumbnail 150

What's interesting about this problem though is that as people, we do not run into these same constraints when we face problems related to computer use. Regardless of which email client you use and love, if I show you any email client and ask you to write an email, I'm fully confident that everyone in this room could figure it out. And the way that you would know how to do this is actually kind of hard to encode in a rules-based way because you probably look at this screenshot and you're looking for a whole combination of things. You're looking for maybe a pen icon or a button that says compose, maybe it says draft, maybe it says new. It's probably towards the top of the page because writing an email is a key piece of functionality for an email client, so maybe it's separated from the other functions in some way.

Thumbnail 200

Thumbnail 210

Thumbnail 220

All of this is intuition that we've built through millions and millions of examples of using UIs and recognizing these common patterns, but software doesn't have that inherently. So what we've been really excited to do as part of the lab is build models that treat computer use more like how humans do. And that's what we've built with Nova Act. So the goal here again is to train models that interact with a browser like a person. They look at a screen, they take a task, they understand what's going on, and they determine what to do next. And then once they've performed an action, they repeat that same loop. They take another screenshot, they understand what's going on in light of the task, and they take another action to continue on the journey just like we do as people.
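As a concrete illustration of driving that screenshot, reason, act loop, here is a minimal sketch using the Nova Act Python SDK (the `NovaAct` context manager and `act()` call from the public `nova_act` package). The email site and prompts are illustrative assumptions, not from the session.

```python
# A minimal sketch of the loop described above, expressed through the
# Nova Act Python SDK. Each act() call is one natural-language step: the
# model looks at the current screen, decides what to do, acts, and repeats
# until that step is done.
from nova_act import NovaAct

with NovaAct(starting_page="https://mail.example.com") as nova:
    nova.act("open a new draft email")
    nova.act("address it to alice@example.com with the subject 'Quarterly review'")
    nova.act("type a short body asking to schedule a review meeting, then send it")
```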

Thumbnail 230

Thumbnail 240

So these systems are way more robust. They don't fail due to a small change. You can get up and running much more quickly through natural language, and you can scale just in the same way that we know how to perform that flow across any email client. These models can also generalize across different environments. But if you've been following the agent space, that's not the first demo you've seen of agents or of computer use in particular. The biggest bottleneck we hear from customers is reliability. We hear from a lot of customers that they're super excited about agents, they believe in the future, and they're very quick to build a prototype or a proof of concept, and then they get stuck because while these models hold lots of promise and these solutions are very exciting, they don't actually work in a repeatable, scalable way that you need when you're trying to solve real business problems.

Thumbnail 270

Thumbnail 280

Achieving 90% Reliability Through Element Understanding and Reinforcement Learning

And this has been a core focus of Nova Act. We've approached reliability as the P0 as we've been solving this problem, and we've done so in a few different ways. The first is really zeroing in on element understanding. What we've seen is that for a model to work end to end in a reliable way, it needs to really understand web elements in the way that humans do. There are a number of culprits that typically stump agents: date pickers, dropdowns, certain types of filters. These components are different on every site, and often agents don't know exactly how to handle things like

Thumbnail 320

waiting the correct amount of time for results to load after you type in a zip code, these sorts of idiosyncrasies. So we spent a lot of time collecting training data specifically on these components and evaluating our models on these areas to make sure we get to that end-to-end reliability.

The second is using reinforcement learning specifically on web simulations, or gyms. The way this works is we've built hundreds of mock websites that have the same kinds of components and UI patterns you see across the web in the workflows our customers perform every day. Then we tell the model to go complete a task, and we don't specify the workflow or the steps the model should take; we just validate that the end state is successful. This allows our model to do a large amount of exploration of these different platforms to really understand what's possible, what happens when I do A, what happens after that, and to really understand different patterns for success. So this has been a really key part of achieving reliability.
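As a toy illustration of that end-state-only reward signal (not the team's actual training setup), here is a sketch where the "gym" checks only whether the goal was reached, regardless of which action sequence the agent explored to get there.

```python
# A conceptual sketch of the sparse, end-state-only reward described above.
# The gym does not prescribe a trajectory; it only scores the final outcome.
from dataclasses import dataclass, field

@dataclass
class CheckoutGym:
    """Toy web-gym: success means the right item ends up ordered."""
    cart: list = field(default_factory=list)
    ordered: bool = False

    def step(self, action: str) -> None:
        # The agent may reach the goal through many different action sequences.
        if action.startswith("add:"):
            self.cart.append(action.split(":", 1)[1])
        elif action == "checkout" and self.cart:
            self.ordered = True

    def reward(self, goal_item: str) -> float:
        # 1.0 only if the task outcome is correct, no credit for individual steps.
        return 1.0 if self.ordered and goal_item in self.cart else 0.0

# One rollout from a hypothetical policy; only the final reward is scored.
gym = CheckoutGym()
for action in ["add:blue kettle", "checkout"]:
    gym.step(action)
print(gym.reward("blue kettle"))  # -> 1.0
```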

Thumbnail 370

And then lastly, we've really kept real-world evaluation as our focus. You'll see some exciting benchmark results with the latest release of Nova Act. We're really excited to see that our model is outperforming models of similar size like Haiku and even much larger ones like Sonnet. But ultimately, the metrics we care most about are related to customer success. And what we're seeing with Nova Act is that early customers are seeing upwards of 90% reliability on the workflows that they're deploying in production. And this is really important to us because we believe that an agent that is 50% reliable is 0% useful. Customers need agents that actually work in production, and this is what we're seeing today that we're really excited about.

Thumbnail 410

Thumbnail 420

Thumbnail 430

Thumbnail 440

Even the most reliable agents need help and need human oversight sometimes. And so with this release at re:Invent this year, we are excited to launch human-in-the-loop capabilities with Nova Act. So this allows you as a developer to configure the ability for the agent to call on a human either to take over on a task or to review a task before the agent continues. You can do this through platforms like Slack, through custom integrations, in this case a custom UI, and facilitate this human supervision of fleets of agents.
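The configuration surface for this launch isn't shown in the session, so as an illustration only, here is a hedged sketch of the review-before-continue pattern in plain Python. `notify_reviewer()` is a hypothetical stand-in for a Slack message or a custom UI, not a Nova Act API.

```python
# A hedged sketch of the human-in-the-loop pattern described above: the agent
# pauses for review before a sensitive step. The actual Nova Act configuration
# API may differ; notify_reviewer() is a hypothetical hook.
def notify_reviewer(summary: str) -> bool:
    """Hypothetical hook: post to Slack or a custom UI and block for a decision."""
    answer = input(f"[REVIEW NEEDED] {summary} -- approve? (y/n) ")
    return answer.strip().lower() == "y"

def run_step(agent, instruction: str, needs_review: bool = False):
    if needs_review and not notify_reviewer(f"Agent wants to: {instruction}"):
        raise RuntimeError("Reviewer rejected the step; handing control to a human.")
    return agent.act(instruction)

# Usage: routine steps run autonomously, the risky one waits for a person.
# run_step(nova, "fill in the patient's address")
# run_step(nova, "submit the insurance claim", needs_review=True)
```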

Thumbnail 450

And then we're also expanding beyond the browser. As I mentioned, browser is the place that we started off with Nova Act. What we've heard from customers is folks are excited to extend that same level of reliability across their full workflows even beyond the browser. And so we're getting started on this in preview with Nova Act now. We see customers doing things like reading their QA tests from Jira and then implementing them using Nova Act in the browser, or taking form fill inputs from Excel and then filling out that form in the browser. So we're just getting started on this journey but really excited to extend beyond what the browser itself is capable of.
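As one illustration of that kind of tool integration, here is a hedged sketch that reads rows from a spreadsheet with openpyxl and hands each one to the browser agent. The workbook name, columns, and target form are assumptions for the example.

```python
# A hedged sketch of the Excel-to-browser form-fill pattern mentioned above,
# combining openpyxl with the Nova Act SDK. Workbook name, column layout, and
# the target site are illustrative assumptions.
from openpyxl import load_workbook
from nova_act import NovaAct

rows = load_workbook("form_inputs.xlsx").active.iter_rows(min_row=2, values_only=True)

with NovaAct(starting_page="https://forms.example.com/intake") as nova:
    for name, email, postal_code in rows:
        nova.act(f"fill in the name field with '{name}'")
        nova.act(f"fill in the email field with '{email}' and the postal code '{postal_code}'")
        nova.act("submit the form and wait for the confirmation message")
```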

Thumbnail 480

All of this is very exciting, but what we've also learned is that having a really reliable model is necessary but not sufficient for building great agents at scale. Customers run into these questions: How do I debug? How do I measure success and deploy and scale? And these are equally important questions in the journey to developing agents that really have true business impact. So now I'll have my colleague Ian talk about how we've tackled this problem as well.

Thumbnail 510

Thumbnail 520

Building an End-to-End AWS Service with Enterprise-Grade Security

Okay, thanks, Kelsey. My name is Ian. I'm a product manager and a member of technical staff at the AGI Lab, and it's great to see everybody here. So let's go back a little bit to how we started on this journey. So in March of 2025, we released a research preview, and since then we've had a bunch of people using our product, using our system, and they gave us a lot of feedback. Some of the feedback was about reliability, about the 90% reliability that Kelsey mentioned, which is absolutely critical. But we also got feedback that in order to release an agent into production in an enterprise, we need security. It needs to have all of the AWS security capabilities that everybody here has come to know and love.

So in July, we did our first integration with AWS, and our first users were able to use AWS authentication and S3 for saving the logs and things like that. So that was our very first attempt at integrating into AWS. This week, we're happy to announce that we've launched as a fully generally available end-to-end AWS service, fully integrated. We have an AWS console, we have basically everything that you expect from a proper AWS service, and I'll walk you through them now.

So our new AWS service includes the following. First of all, it includes a frontier class, state-of-the-art model. As Kelsey mentioned, it's as good as anything on the market, if not better.

We have a new AWS service and console, which I'll walk you through in a few minutes. We have a new playground where you can all go online and try out the product without having to download an SDK or write any code; you can see whether it works for you, and if it does, then we have the SDK, which you can use for coding in Python. We've had an existing SDK, so this is an updated version that uses our new model and works with the new capabilities like human in the loop. And we have a new version of our IDE extension, which I'll also show you in a few minutes, again using the new model and the new capabilities.

Thumbnail 650

Thumbnail 660

Thumbnail 670

Thumbnail 680

We've built a CLI which makes it really easy once you've built your agent to package everything up into an image and deploy it. And we have these new capabilities like human in the loop and tool use. So we're very excited about this. It's our end-to-end platform. We feel that it provides a few benefits to you. First of all, frontier class accuracy. But more importantly, while benchmarks are important, real-world reliability is actually much more important to our users. So we've spent a lot of time training our model to make sure that it can achieve over 90% reliability on the sort of typical day-to-day enterprise use cases.

Thumbnail 700

Thumbnail 710

Thumbnail 720

Thirdly, cost effectiveness. We've priced it very aggressively. We feel that it is the most cost-effective solution on the market for similar products. And lastly, time to value. We're really proud of the work we've done in building a terrific developer experience, and all of these together, simply put, we feel that Nova Act is the best service for creating AI browser-based agents and creating ones that you want to actually use in production in the enterprise.

Thumbnail 730

Thumbnail 750

Thumbnail 760

Developer Experience: From Playground Prototyping to Production Deployment

So let me walk you through that developer journey. First of all, you can prototype in the online playground, build with the SDK and our IDE extension, deploy to AWS, and get observability via the AWS console. And now I can show you some examples. This is what our online playground looks like. On the left you can enter natural language prompts. On the right, we've got an embedded browser which is currently running a web gym, a simulation of a travel site we mocked up for buying tickets to other planets. And on the bottom, you can see the agent's thinking steps. So you can try out your use case and see if it works here.

Thumbnail 780

Thumbnail 790

When you're ready, you can click download, and it'll download the agent you've developed as a Python script, which you can then continue developing on your laptop. On your laptop, you can use the Nova Act SDK, and we highly recommend you try out our VS Code IDE extension. It works on any VS Code compatible IDE, and it's got a really terrific UI that really simplifies things. On the left, you can see a notebook-style UI where all of the agent steps can be broken out into separate cells.

Thumbnail 810

Thumbnail 830

Thumbnail 840

What this gives you is the ability to iterate and keep tweaking an individual step until you've got it exactly right. Without this, you have to keep rerunning your agent from the beginning, and it's a real pain waiting for it to go through 30 steps until it gets to the one that you've tweaked. Our initial users have said this can make them up to 5x faster, saving up to 80% of their time. And on the right, you can see we've embedded a browser right in the IDE. So again, we've got everything that you need in one space.

Thumbnail 850

Again, when you're ready, we have a deploy tab, and this makes it super easy to insert your AWS credentials. And with one click, it'll build the agent image that you need to deploy. So it'll use your Python script, the SDK. It'll also take care of allocating the resources for you, the ECR repository, S3 bucket, set up your IAM execution roles and things like that, and deploy everything to AWS for you. So again, super simple.

Thumbnail 880

Thumbnail 900

And then once you've deployed your agent to AWS, you can use our console and see all of the agents that you've deployed, and for each agent you can see a record of every single agent run. So this is an example. You can scroll down and you can see all of the screenshots, all of the thinking statements, the prompts, and all of the actions that the agent took. So this is terrific for troubleshooting and for customer support.

Thumbnail 910

Beyond that, you can also use Nova Act as part of a multi-agent framework. Here's a demo of something we built that uses Nova Act in conjunction with two other Nova models: a Nova chat LLM and Nova Sonic, which is a voice model. I'll just play this video for you to see.

Thumbnail 930

Thumbnail 940

Thumbnail 970

Thumbnail 990

Thumbnail 1000

Hey Nova, I'd like to plan a trip to the closest exoplanet. Maybe something warm with a beach. The closest exoplanet we know of is Proxima Centauri B orbiting the star Proxima Centauri, which is just 4.24 light years away. Proxima Centauri B has abundant green oceans and beautiful beaches of fine orange sand. The average temperature is 86 degrees Fahrenheit. What date would you like to travel? In one month. So, January 3rd. Okay, great. Would you like me to book a flight to Proxima Centauri B? Let's do it. Got it. Let me generate instructions. Okay, first, I need to find a ticket from San Francisco to Proxima Centauri B for January 3rd, 2026. I think you guys get the idea.

Thumbnail 1010

Thumbnail 1030

Thumbnail 1050

Thumbnail 1060

Basically, the idea here is that you can build really cool experiences by using multiple agents together in the same agentic solution. As we've been working with design partners and with the initial users over the last year, we've seen a lot of different use cases appear. In fact, if you think of Nova Act, it's a very powerful and very low-level primitive that can be used for innumerable use cases, literally thousands and thousands of use cases. But as we saw what people are doing, we've seen that four typical clusters of use cases appear most often, and these are use cases that we feel are the lowest hanging fruit and will allow you to really get a lot of value in the near term.

The first one is web QA testing. Today, if you want to build a regression test on a web application, you need an engineer who needs to write code using something like Selenium or Playwright, and that code is brittle. If a button moves in the website, then suddenly that code doesn't work anymore. With Nova Act, you can use natural language prompts, and it'll understand what the site is supposed to do, understand if the design has changed, and it'll just work.
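As a sketch of what such a natural-language regression test could look like, here is a pytest-style example on top of the Nova Act SDK, assuming the SDK's BOOL_SCHEMA-style verification; the site and steps are made up.

```python
# A hedged sketch of a natural-language regression test of the kind described
# above, written with pytest on top of the Nova Act SDK. The site and steps
# are illustrative; the assertion checks the outcome, not any selector.
from nova_act import NovaAct, BOOL_SCHEMA

def test_newsletter_signup_flow():
    with NovaAct(starting_page="https://example.com") as nova:
        nova.act("open the newsletter signup form from the header")
        nova.act("enter test@example.com as the email and submit the form")
        result = nova.act("Is a confirmation message visible?", schema=BOOL_SCHEMA)
        # The test asserts the end state, so a moved button or redesigned page
        # does not by itself break it.
        assert result.matches_schema and result.parsed_response is True
```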

Thumbnail 1090

The next example is data entry. Every company has many workflows that involve manual transactions with websites. For example, after a meeting, salespeople have to come in and enter a bunch of information into a CRM system. People have to file taxes, file licenses, or apply for licenses on different government websites. There are tons and tons of these, what we call undifferentiated manual tasks, that people have to do and that don't really add that much value. Wouldn't it be great if we could help you automate those, so people can do what they really like to do at work, which is to be strategic and creative and do their real job?

Thumbnail 1140

Thumbnail 1170

Similarly for data extraction: there are many industries, like healthcare and logistics, where there are thousands of fragmented businesses and websites, and none of them are going to have APIs in the near future. Having a system that can automatically reach out to these different systems and collect data from them is of huge value and saves a ton of time. And checkout flows, for e-commerce, travel, and things like that, have also turned out to be a very popular use case, with people automating thousands of these at scale.
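For the data-extraction case, a hedged sketch along these lines assumes the SDK's schema-based extraction with a pydantic model; the carrier portal, tracking number, and fields are illustrative.

```python
# A hedged sketch of structured data extraction from a site without an API,
# the pattern described above. The portal, tracking number, and field names
# are illustrative assumptions.
from pydantic import BaseModel
from nova_act import NovaAct

class Shipment(BaseModel):
    tracking_number: str
    status: str
    eta: str

with NovaAct(starting_page="https://carrier.example.com/track") as nova:
    nova.act("search for tracking number 1Z999AA10123456784")
    result = nova.act(
        "Return the tracking number, current status, and estimated delivery date",
        schema=Shipment.model_json_schema(),
    )
    if result.matches_schema:
        shipment = Shipment.model_validate(result.parsed_response)
        print(shipment.status, shipment.eta)
```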

Thumbnail 1180

Thumbnail 1200

1Password's Universal Sign-On: Scaling Website-Specific Intelligence with Nova Act

Now what I'd love to do is introduce you to some of our design partners. They're going to walk you through a little bit about their companies and how they've been able to innovate using Nova Act. For the first one, I'd like to invite Floris from 1Password to the stage. Thanks. Hey, everyone.

Thumbnail 1210

Hi, I'm Floris van der Grinten from the engineering team at 1Password, and today I'm going to show you how 1Password is using agentic AI to improve our own product in a way that just wouldn't be possible in a pre-AI era. The star of the show here is really Nova Act. So a little bit about 1Password. We secure 1.3 billion credentials for 180,000 businesses and millions of users who use 1Password every day to log into their favorite websites. We don't just store logins. For the developers in the room, we also store SSH keys and sensitive .env files, and these also come with native integrations in the desktop apps.

Thumbnail 1250

Thumbnail 1280

So let's talk about autofill. We have a browser extension, and this browser extension will add a small 1Password icon next to your login forms. If you click on that icon, you can choose a credential that you want to use to log in, and 1Password does the tedious work of filling out the form and getting you logged into your website. This has been out for a while and it's been working great, but it's time now for the next generation of autofill, and this is what we're calling Universal Sign-On.

Thumbnail 1320

With Universal Sign-On, we want to take the experience from "Hey 1Password, fill in this form for me" to a more high-level approach where you say "Hey 1Password, just log me in. Just do whatever it takes to log me in regardless of the login method," whether that's a username and password, a TOTP MFA code, enterprise SSO, a passkey, or signing in with GitHub or Google, which you've probably forgotten which one you used with which website again. So here's a preview of what this looks like. Now you can just click on a website, and it'll immediately navigate to the login page, log you in with a password and an MFA token, and you're just logged in like that.

Thumbnail 1330

Thumbnail 1340

Now let's talk a little bit about how this works. Unfortunately, there's no standard protocol for logging into websites. It's basically free-for-all HTML, and there's a lot of ambiguity out there; every website does it slightly differently. The classic autofill algorithm solved that ambiguity with a one-size-fits-most approach based on heuristics, and it's been working quite well. It's used millions of times every single day. But with the vision we have for Universal Sign-On, we're running into the limits of the heuristics we can articulate in our code, because now the browser extension doesn't just need to know how to fill in a form, but also how to navigate to the form and how to navigate through the form, which is just a lot more complex.

Thumbnail 1390

So to make this a success, we're going to need website-specific logic, website-specific instructions on how to complete the login. But as you can guess, this doesn't scale if you need to do this by hand because there are millions of websites out there that offer a login. Even if we were to do this massive undertaking, it would be very, very brittle because we would see breaking changes on a daily basis, which is really not acceptable.

Thumbnail 1420

So this is where Nova Act comes in. What we've done is build an AI agent that uses Nova Act and goes out and browses all these websites. It collects the necessary information about the specific oddities of each website, and then a second agent validates the intelligence we've gathered. Then it passes it on to what we call the Site Intelligence Engine, which makes it available to our browser extension running on the user's device. The nice thing about this is that all the information gathering and validation can run on our infrastructure out of band, while the browser extension's login flow remains blazingly fast and also deterministic, which is really important. We can also run the validation on a periodic basis to see if the intelligence we've gathered is still accurate and correct, and if not, invalidate it.
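A highly simplified, hypothetical sketch of that gather, validate, publish pipeline might look like the following. None of this is 1Password's actual code; the recorded steps and the Site Intelligence Engine call are placeholders.

```python
# A hedged, simplified sketch of the out-of-band pipeline described above:
# one Nova Act agent gathers site-specific login intelligence, a second agent
# validates it, and only validated intelligence is published for the
# deterministic browser extension. All names are illustrative.
from nova_act import NovaAct, BOOL_SCHEMA

def gather_login_intel(site: str) -> dict:
    with NovaAct(starting_page=site) as nova:
        nova.act("navigate to the login form for this site")
        # In practice the agent records the navigation it performed; here we
        # capture only a placeholder description of those steps.
        return {"site": site, "steps": ["click the 'Log in' control", "wait for the login form"]}

def validate_intel(intel: dict) -> bool:
    with NovaAct(starting_page=intel["site"]) as nova:
        question = "Can the login form be reached by doing the following: " + "; ".join(intel["steps"]) + "?"
        result = nova.act(question, schema=BOOL_SCHEMA)
        return bool(result.matches_schema and result.parsed_response)

def publish_to_site_intelligence_engine(intel: dict) -> None:
    # Stand-in for handing validated intelligence to the real Site Intelligence Engine.
    print(f"publishing validated login intel for {intel['site']}")

intel = gather_login_intel("https://www.example.com")
if validate_intel(intel):
    publish_to_site_intelligence_engine(intel)
```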

Thumbnail 1480

Thumbnail 1500

So let's look at an example of a Nova Act agent in practice. Here it's going to navigate to the AWS re:Invent website, and this is actually a pretty simple example because its job here is to get to the login form. As you can see on the top right, it has a big login icon, so this is a pretty simple one. Let's see if it's able to find it. There we go. Now it found the login form and it knows that it completed its task. Now let's look at a slightly more complex example.

Thumbnail 1510

Thumbnail 1530

This is Duolingo, and this one doesn't have a traditional login button at the top right; instead it has a button that says "I already have an account." To make it more complicated, that button doesn't have an href attribute, it has a JavaScript handler, so this would be a bit more tricky to build with a heuristics-based algorithm. But for humans it's super easy, because it just says "I already have an account," and because Nova Act takes the same human approach, it's able to get to the login form here just as easily.

Thumbnail 1540

Now let's have a look at the logs here. Along the way, Nova Act will log the steps that it takes as part of the evaluation loop. So here you can see it really thinks like a human. It knows what it needs to do and it says it found the button that says "I already have an account." Then it figures out that it should click it, then it does the actual click, and then it evaluates the result again and it knows that it found the login form and it knows that it completed its task and that it needs to return now.

Thumbnail 1580

So to recap, the website-specific intelligence can meaningfully improve the 1Password products, and Nova Act is really the thing that enables us to do it at this scale and in a way that we just could not have done in the pre-AI era. We still have a long way to go here. We're just scratching the surface, but you can already try out the new universal sign-on experience in the latest beta version of the 1Password browser extension if you're interested. I'll pass it on.

Amazon Leo's QA Automation: From Prototype to Production in Five Weeks

Perfect. Thank you, Floris. Super interesting. And now I'd like to invite Matthew from Amazon Leo.

Thumbnail 1630

Hey folks, I'm Matthew from Amazon Leo. Amazon Leo is the next generation of satellite internet connectivity that Amazon is building. We currently have 158 satellites in space, and we're always launching more. Space always elicits a little sense of wonder and whimsy, right? As we get closer to launching our beta product, we had this really big task, because we set a goal of zero critical customer bugs reported across our web and mobile experiences. And that's really aggressive, right? We have hundreds and hundreds of test cases we need to perform, and we have an aggressive timeline because we're always building and always shipping: we have weeks to do this, not months. So I'm going to talk a little bit about how we leveraged Nova Act to go from a prototype to a production-grade QA automation system.

Thumbnail 1680

Traditionally, this kind of automation involves really complex code. You have to work with multiple frameworks, and each of those is really its own specialty, right? You need people who understand Appium selectors for mobile if you want to serve Android and iOS. You need somebody who really understands Selenium or Playwright and knows how to handle page jitter and how long you should wait; you've got to get it just right. Nova Act inverts that. I don't need to know that anymore. I don't need that specialty.

Thumbnail 1710

And so what we can do instead is start from something we already have. We've taken a slightly more opinionated approach to natural language. We had hundreds and hundreds of these Gherkin test cases, you know, given-when-then, for our customers, and we built an agentic framework around them. On one side, we take the given-when-then statement, use a Strands agent like we saw earlier today, and convert it into the Nova Act command on the fly. Then we determine what type of test this is: is it a web test or a mobile test? If it's web, we use the Playwright actuator that comes baked in with Nova Act. Nova Act doesn't support mobile out of the box, but it does have a really extensible SDK framework that allowed us to write our own Appium actuator to go in and perform these actions in our mobile app. So we have one SDK, one unified interface, with no platform considerations anymore. You just run the test and it figures everything else out for you.
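A structural sketch of that routing might look like the following. The actuator-extension interface is paraphrased rather than the SDK's exact API, and the Gherkin-to-command translation (done by a Strands agent in Leo's pipeline) is reduced here to a trivial string rewrite for illustration.

```python
# A hedged, structural sketch of the dispatch described above: a Gherkin step
# becomes a Nova Act command, then routes to a browser-backed agent for web
# tests or a hypothetical custom Appium-backed agent for mobile.
from dataclasses import dataclass

@dataclass
class TestCase:
    gherkin: str          # e.g. "When the user clicks the 'Join the list' button"
    platform: str         # "web" | "ios" | "android"

def to_nova_command(gherkin: str) -> str:
    # In the real pipeline a Strands agent does this translation on the fly;
    # here it is a trivial rewrite purely for illustration.
    return gherkin.replace("When the user ", "").replace("Then the user ", "verify that the user ")

def run(test: TestCase, web_agent, mobile_agent):
    command = to_nova_command(test.gherkin)
    if test.platform == "web":
        return web_agent.act(command)      # default browser-backed actuator
    return mobile_agent.act(command)       # hypothetical custom Appium actuator

# One unified interface: the caller no longer cares which platform it is.
# run(TestCase("When the user clicks the 'Join the list' button", "web"), nova_web, nova_mobile)
```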

Thumbnail 1780

One of the other things we talked a lot about today is that 90% number, right? You heard 90% again and again. When you're running QA automation and you're trying to validate the perfect customer experience, you need to get much closer to 100%. So another thing Nova Act provides us is the ability to export a trajectory. That is every piece of information from the test run that Nova Act saves: it saves a picture of the page, it saves the DOM, it saves where it clicked, it saves how it thought about the problem.

What we can do then is replay that deterministically for our next run. We've also built a self-healing replay engine. As our page and our customer experience change, the core mission of the test case hasn't changed, but the page has moved around a little bit. In that case our deterministic replay can't match anymore and the test case fails, so we rerun it generically with the agent, save the new trajectory, and the next time through it'll pass. So we're running three times faster, there's no more non-deterministic behavior that we're worried about over and over again, and we get really good confidence that our experience is shipping the way we want it to.
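As an illustration of that replay-then-fall-back idea, here is a hedged sketch; the trajectory format is made up for the example rather than being the SDK's export schema.

```python
# A hedged sketch of the self-healing replay idea described above: replay a
# saved trajectory deterministically, and when a recorded step no longer
# matches the drifted page, fall back to the agent and refresh the trajectory.
import json

def replay_step(page, step: dict) -> bool:
    """Deterministic replay of one recorded click; False if the target moved."""
    locator = page.locator(step["selector"])
    if locator.count() == 0:
        return False
    locator.first.click()
    return True

def run_with_self_healing(page, agent, trajectory_path: str) -> None:
    steps = json.load(open(trajectory_path))
    for step in steps:
        if not replay_step(page, step):
            # Page drifted: re-run this intent agentically so a fresh
            # trajectory can be exported for the next deterministic run.
            agent.act(step["intent"])
    # (Exporting the refreshed trajectory is elided here.)
```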

Thumbnail 1860

Thumbnail 1870

Thumbnail 1880

Thumbnail 1890

Thumbnail 1900

Let's take a look at what this looks like in action. You can see step one: given a user successfully navigates to leo.amazon.com. Step two: when the user clicks on the "Join the list" button in the header. Step three: and the user enters leouser@amazon.com into the email input field. Step four: and the user enters 98052 into the postal code input field. Step five: and the user selects United States from the country dropdown. Step six: and the user clicks the submit button. Step seven: then the user should see a confirmation message indicating successful submission.

What you just saw is that we transform everything on the fly and run it in real time. We get to see how our systems are thinking about it. It's very easy for us to debug, and it gives us really high confidence: when we do see a failure, we know why we're failing, and when it's passing, we know exactly why, what it was thinking, and how it got there. We can see that this is the type of customer behavior that's going to happen in real time.

Thumbnail 1940

Again, prototype to production in five weeks. The first one or two weeks were really about setting the groundwork, making sure our agents handle throttling and all those fun little things you deal with when you're engineering. Weeks three and four, we got it production ready, moving it into accounts that can hold and manage all the data we're running through. By week five, we were running 200 live scenarios and 3,600 validation points across web, iOS, and Android. We estimate we've saved about 60 dev days to date, and we're saving another 30 every month.

Thumbnail 2000

Thumbnail 2020

Sola's Agentic Process Automation: Powering Enterprise Workflows at Scale

Thank you, and I will bring up Neil. Cool, thank you, Matthew. And now I'd like to introduce Neil from Sola. Thank you. Hey everyone, I'm Neil, co-founder and CTO of Sola. As we know, enterprise work today happens across more systems, teams, and tools than ever before, and process automation remains a massive challenge, with teams struggling to get value. So what do we need in this next generation of process automation tooling to automate meaningful core operations of businesses? We need the tool to understand what people do. That means observing their work, figuring out their process, and capturing logic and context effectively. We need it to generalize across all these systems, on browsers, on desktops, and beyond. We need it to handle challenging and dynamic digital work. And of course, we need the solutions we build to scale to enterprise volume.

Thumbnail 2060

This is where Sola comes in. Sola is an agentic process automation platform. Some of the largest enterprises in the world, Fortune 500s and the largest private enterprises across verticals and industries use Sola to power their businesses, building intelligent, flexible automations that do everything from medical data entry to financial compliance to legal back office and much, much more. Sola automations sit on top of systems and interact directly with digital applications, and underneath the hood, this is powered largely by computer agents. They're the systems that everyone's been talking about today. They are systems that can see, understand, and operate applications just like you and me.

Thumbnail 2110

In this video here, we can see a Solobot running a generalizable track-and-trace workflow. Solobots can understand how a workflow is done by watching someone do it, converting their process into a visual diagram that users can modify. Then they can run real executions of these workflows, adapting to dynamic interfaces and automatically updating the diagram based on the scenarios they encounter. They do all this while providing observability and learning from human intervention when needed, becoming better and better over time. These bots handle some of the most complicated manual workflows for businesses at scale.

Thumbnail 2150

As mentioned, a lot of what powers this is Claude models, and we use a variety of them under the hood. Nova Act in particular though, fills an important niche for what Sola does. It's a powerful workhorse for our computer use needs. It's steerable, it adheres to complex instructions reliably, it allows us to enforce strict guardrails to guarantee the reliability that enterprises need, and it's able to handle complex interfaces with state of the art intelligence while operating in real time.

Thumbnail 2190

Here's a simplified diagram of how a Solobot can use Nova Act, and generally it's a reliable framework for using Claude. If there's an instruction that's been scoped for Nova, the Solobot will hand off the task to an orchestrator agent. This agent has those instructions along with context about the workflow, the current execution, previous executions, business context and logic, and more. Given all that information, the agent will break down the task into subtasks. In this case, we have action A, action B, and so on. The actions will be handed off to Nova Act sub-agents, which will go off and complete those tasks. Then given all the context described before, along with the output of the Nova Act sub-agent, which has transparent reasoning and action traces, the orchestrator can validate that action and plan and update subsequent tasks accordingly.
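A hedged sketch of that orchestrate, delegate, validate loop could look like the following; the function names are illustrative and this is not Sola's actual code.

```python
# A hedged sketch of the orchestration pattern in the diagram above: an
# orchestrator breaks a scoped instruction into actions, hands each to a
# Nova Act sub-agent, and validates the sub-agent's reasoning and action
# trace before planning the next step. All names are illustrative.
from collections import deque

def orchestrate(instruction, context, plan_fn, new_subagent, validate_fn, max_replans=3):
    queue = deque(plan_fn(instruction, context))      # e.g. ["action A", "action B", ...]
    completed, replans = [], 0
    while queue:
        action = queue.popleft()
        sub = new_subagent()                          # a fresh Nova Act sub-agent
        result = sub.act(action)                      # returns reasoning and action traces
        if validate_fn(action, result, context):
            completed.append((action, result))
        else:
            replans += 1
            if replans > max_replans:
                raise RuntimeError(f"could not complete '{action}' after re-planning")
            # Use what the trace revealed to re-plan the remaining work.
            queue = deque(plan_fn(instruction, {**context, "failed_action": action}))
    return completed
```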

Thumbnail 2240

Nova Act is specially built for this kind of UI automation. The piece I mentioned before is just one part of our agent harness; we have a ton of places where we use Claude models. The Nova Act SDK makes it really straightforward to integrate across our entire platform while also supporting the observability we need for monitoring. Its extensibility allows us to set up custom tools to complement the rest of our harness. As a workhorse model, it's fast and reliable, keeping workflows moving in real time while automatically handling edge cases like complex error states and conditional logic, and effectively calling a human in the loop when needed.

Thumbnail 2280

Here's a more advanced fleet orchestration pattern that the Solobot can deploy, which I also think is a good framework example. Similar to before, the orchestrator can break down actions into tasks, but here, these are handled by Nova Act sub-agents in parallel. The thinking and action traces and the results of each of these Nova Act sub-agents are aggregated via an aggregation agent that's then passed back to the orchestrator to plan and conduct future tasks. With this kind of system, we can achieve near 100% reliability on these core enterprise workflows.
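And a similarly hedged sketch of that parallel fan-out with an aggregation step, again with illustrative names only:

```python
# A hedged sketch of the fleet pattern in the diagram above: the orchestrator
# fans actions out to Nova Act sub-agents in parallel, and an aggregation step
# condenses their traces and results before the next round of planning.
from concurrent.futures import ThreadPoolExecutor

def run_subagent(new_subagent, action: str) -> dict:
    # Each sub-agent gets its own browser session; its reasoning and action
    # traces come back alongside the result.
    with new_subagent() as sub:
        return {"action": action, "result": sub.act(action)}

def fan_out(actions, new_subagent, aggregate_fn, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        traces = list(pool.map(lambda a: run_subagent(new_subagent, a), actions))
    # The aggregation agent condenses all traces for the orchestrator's next plan.
    return aggregate_fn(traces)
```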

Thumbnail 2320

Here we can zoom in on a specific case. Here on the right we can see a representation of the visual diagrams on the Sola platform. On the bottom left we can see an example of a portion of this workflow where it's logging into a medical portal, it's navigating to the patient, updating the patient field, and on the upper left we can see the agent traces. So this is the model doing that update patient field. In this case, it's non-trivial. It's not just updating one specific field. The model needs to understand that it needs to click on a button to add an entry, and then it needs to look over the entire form, figure out exactly where in the form that update needs to happen, and then put the relevant information in very reliably.

Thumbnail 2360

So Sola is a customer of Nova Act, but downstream of that, companies like R1 RCM, one of the largest revenue cycle management platforms in the US with tens of thousands of employees, use Sola to tackle back office work. For definitions, RCM stands for revenue cycle management; it's basically how your doctors get paid. Because Sola workflows are adaptable, they can handle the hundreds of different payer platforms that a company like R1 needs to interact with on a regular basis.

Thumbnail 2390

Nova Act has been integral to the Sola platform, allowing us to push the boundaries of what computer use models are capable of, to conduct real world work for enterprises. With partners like AWS we're able to support automating the most core and critical operations of businesses today. Thanks, I'll hand this off.


This article is entirely auto-generated using Amazon Bedrock.
