🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Build AI your way with Amazon Nova customization (AIM382)
In this video, AWS solutions architect Veda Raman, Amazon product manager Dan Sinnreich, and Terra Security CTO Gal Malachi discuss customizing Amazon Nova models for security and content moderation. The session introduces the new Nova 2 family (Lite, Pro, Omni, Sonic) and explores four customization techniques: RAG, supervised fine-tuning, alignment, and continued pre-training. Dan demonstrates how AWS customized Nova models using LoRA adapters to enable sensitive content moderation for cybersecurity, law enforcement, and media use cases while maintaining core safety. Gal shares how Terra Security built AI agents for penetration testing on Nova Pro and fine-tuned a smaller Nova model to solve the "Guardrail Paradox," creating a custom Guardrail Checker that improved true positive detection from 80% to 92%. The presentation emphasizes that customization with proprietary data creates durable competitive advantages beyond generic model capabilities.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Customizing Nova Models for Security and Content Moderation
Welcome everybody. Generic models fail in specific ways. Let's fix that. Thank you for joining us today for this session where we talk about Nova customization, specifically for security and content moderation. My name is Veda Raman, and I'm a solutions architect for Gen AI at AWS, and I'm joined by my wonderful co-speakers Dan and Gal. Would you like to introduce yourselves?
Yes, hi, I'm Dan Sinnreich. I'm a product manager with Amazon AGI working with responsible AI controls. Thanks. Yeah, and hi everyone. My name is Gal Malachi. I'm the co-founder and CTO of Terra Security, and we're building an agentic penetration testing platform. Awesome. Thank you.
So before we dive deeper into Nova and customization, quick show of hands, how many of you have used Amazon Nova models? Okay, quite a few. Another question for you: how many of you have customized Nova models or any model for that matter? Okay, not many. So I hope by the end of this session, we can convince you to customize and use Nova models.
Over the next hour we're going to talk about the Nova models, introduce the new Nova models, and talk about how to customize Nova models and why customization is really necessary. Then Dan will talk about how we customized Nova models to enable sensitive content moderation use cases, and finally Gal is going to talk about how Terra Security customized Nova models to enable agentic penetration testing.
Introducing the Amazon Nova 2 Family of Models
Before I dive deeper into customization, I wanted to take a quick moment to introduce you to the new Nova 2 family of models that we launched yesterday. Starting with Amazon Nova 2 Lite, which is our most price performant hybrid reasoning model with excellent performance on agentic and tool calling use cases. Amazon Nova 2 Lite is generally available.
Amazon Nova 2 Pro is our highly capable multimodal model with increased performance on complex tasks such as coding and agentic use cases, which is also a hybrid reasoning model. And then we also launched Amazon Nova 2 Omni, which is our multimodal reasoning model which can accept in addition to text, image, and video, it can also accept audio and speech as inputs, and it's capable of producing text and image as output. And finally we also launched Amazon Nova 2 Sonic, which is our speech to speech model.
Here's a deeper look at the capabilities of each of these models. As you can see, Amazon Nova 2 Omni accepts audio and speech inputs in addition to text, image, and video, and produces text and image as outputs. All of these models have a 1 million token context window.
Why Customization Matters: From Generic AI to Business-Specific Solutions
All right, so with that introduction to the Nova family of models, let's dive deeper into why we need to customize them. Gartner predicts that by 2027, more than half of the Gen AI models used by enterprises will be domain specific. The one size fits all approach of these general purpose models will not be sufficient for your specialized needs. As your enterprise integrates more and more AI into your business, there's a growing demand for the models to understand your business context and your data.
And accuracy alone isn't competitive. Every organization has access to good models now. What sets you apart is how you use it. And customization becomes your bridge between the generic AI and your specific business reality. So how can customization really help? Customization is how you can capture your unique IP and how tasks are done. You have your unique workflows, your unique business processes, and customization helps you embed these into the model. Customization also allows you to align the responses to your brand voice. A generic model doesn't sound like you.
Customization is what brings that consistency across every interaction. Customization helps you ground the model in your proprietary knowledge. There are no more generic answers when you customize the model. It helps improve accuracy and safety in domain-specific scenarios, and Dan is going to dive deeper into how we use customization to enable sensitive content moderation use cases. Lastly, it's also how you gain durable differentiation beyond what the generic models can offer. Competitors can always catch up with generic models, but they can't replicate your customization.
Now that you understand why customization is necessary, let's look at how you can actually do customization. Amazon Nova offers these four types of customizations, each serving a different purpose. Starting on the left, RAG, which is short for retrieval augmented generation, is how you use the in-context learning capabilities of the model and customize the model to ground the responses in your own knowledge. It's the simplest and easiest way to get started because you're not really changing the model parameters or the model weights when using RAG.
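To make the RAG option concrete, here is a minimal sketch of the pattern using the Bedrock Converse API. The `retrieve` helper, the model ID, and the prompt wording are placeholder assumptions for illustration; in practice the retriever would be your own vector store or a managed knowledge base.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def retrieve(query: str) -> list[str]:
    # Placeholder retriever: in practice this would query your vector store
    # or a managed knowledge base for the most relevant passages.
    return ["<your proprietary document snippets would go here>"]

def answer_with_rag(question: str) -> str:
    # Ground the model in retrieved proprietary context; no model weights change.
    context = "\n\n".join(retrieve(question))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    response = bedrock_runtime.converse(
        modelId="us.amazon.nova-lite-v1:0",  # example model ID; check the current Nova IDs
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```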
The next option is supervised fine-tuning, which is a little more involved and complex because you're actually changing the weights of the model. Supervised fine-tuning trains the model with your specialized knowledge for specific tasks. If you have a task where you want the model to be really good at summarization or really good at Q&A, you can use examples from your workflows and train the model with those input and output datasets.
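A supervised fine-tuning dataset is typically a JSONL file of input/output pairs drawn from your workflows. The record layout below is only illustrative; the exact schema for Nova customization is defined in the service documentation.

```python
import json

# Illustrative records only — check the Nova customization docs for the exact schema.
examples = [
    {
        "system": [{"text": "You are a support assistant for Example Corp."}],
        "messages": [
            {"role": "user", "content": [{"text": "Summarize this ticket: ..."}]},
            {"role": "assistant", "content": [{"text": "The customer reports ..."}]},
        ],
    },
]

with open("sft_train.jsonl", "w") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```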
The third type of customization is alignment. Alignment is when you want the model to sound like your brand and have a specific tone. You would have heard about things like reinforcement learning, and these are all alignment techniques. When you want to align the model, you use feedback either from human preferences or from reward models and tune the model to be your own brand voice.
Finally, the last technique is continued pre-training, and this is really useful when you have niche domain data that the model might not have seen before and you want the model to gain deep domain expertise. You continue the pre-training process and use your unstructured data to train the model. The model not only absorbs the general knowledge but also your niche domain knowledge.
How to Customize Amazon Nova: Bedrock, SageMaker AI, and Nova Forge
Here's a deeper look at what customization techniques are available for Amazon Nova models and all the options available for you. You can customize on Amazon Bedrock or Amazon SageMaker AI or use the newly launched Amazon Nova Forge to customize Amazon Nova models as well. On the left here you see all the different customization techniques. For supervised fine-tuning you have parameter-efficient fine-tuning and full fine-tuning techniques. In terms of alignment you can either do direct preference optimization or proximal policy optimization, or the newly launched reinforcement fine-tuning as well.
If you want to do knowledge distillation, that's supported on Amazon Bedrock or Amazon SageMaker AI or Amazon Nova Forge as well. In knowledge distillation, you're taking a larger model as a teacher model and training a smaller model, which is a student model. The student model, which is a smaller model, will be more cost efficient but as intelligent as the teacher model.
Let's take a deeper look at each of the options and how you can use them to customize, starting with Amazon Bedrock. Amazon Bedrock provides you a managed way to customize your Amazon Nova models. Bedrock provides you Bedrock console access or an API method to customize the models, so you can get started in three easy steps. You select the source model that you want to customize, you specify the hyperparameters and the input data that you want to customize on, and then using the API or the console you set up the customization job and Bedrock takes care of customizing the model for you.
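As a rough sketch of the API route, a customization job can be started with boto3 roughly like this. The job name, model names, role, and S3 paths are placeholders, and hyperparameter names and values vary by model, so treat this as a starting point rather than a recipe.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# All names below (job, model, role, buckets) are placeholders.
response = bedrock.create_model_customization_job(
    jobName="nova-sft-demo-job",
    customModelName="nova-lite-support-assistant",
    roleArn="arn:aws:iam::123456789012:role/BedrockCustomizationRole",
    baseModelIdentifier="amazon.nova-lite-v1:0",          # example base model ID
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/sft_train.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/custom-model-output/"},
    hyperParameters={"epochCount": "2"},                  # names and values vary by model
)
print(response["jobArn"])  # track the job until it completes, then use the custom model
```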
If you want to do full customization, you can use SageMaker AI. SageMaker AI has pre-built recipes for both fine-tuning and continued pre-training. These recipes make customization easier by taking away a lot of heavy lifting. For example, you would otherwise need to figure out the right instance type and the number of instances to use for customization. All of that is taken care of for you with recommendations, so you don't have to experiment and run multiple different iterations of customization. With SageMaker, you can also easily switch between multiple different accelerators for your training.
With SageMaker AI, you get started with three easy steps. You specify the training and validation data directories, select the recipe from either SageMaker Hyperpod or SageMaker AI, and run the recipe to get the customized model as a result. Once you customize your model, you want to do inference with those models. You can bring those customized models to Bedrock to do inference. Bedrock offers two different ways to do this. You can either do on-demand inference or provision capacity for inference. With on-demand inference, you get instant access to the model via the API and pay based on per-token pricing.
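For on-demand inference, the call looks the same as invoking a base model; you point the Converse API at your customized model instead. The ARN below is a placeholder, and depending on your setup you may reference the custom model through a deployment or provisioned throughput ARN.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Placeholder ARN for the customized model you brought back to Bedrock.
CUSTOM_MODEL_ARN = "arn:aws:bedrock:us-east-1:123456789012:custom-model/example"

response = bedrock_runtime.converse(
    modelId=CUSTOM_MODEL_ARN,   # same API as a base model, different identifier
    messages=[{"role": "user", "content": [{"text": "Summarize this incident report: ..."}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```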
When you want to do provisioned inference, which is mostly for production use cases, you get dedicated performance with fixed pricing based on the number of model units that you provision. Finally, I'm going to talk about Amazon Nova Forge. This was recently launched, and Nova Forge is a program you can use for deeper customization of Nova models. With Nova Forge, you get access to multiple different checkpoints of the Nova models, whether pre-training, mid-training, or post-training checkpoints. You can bring your own reward function and use the newly launched reinforcement-based fine-tuning to do alignment.
You can plug your real-world proprietary environments into Nova Forge. You can also do knowledge distillation, distilling from a larger teacher model to a student model. You also get access to the responsible AI toolkit. With Nova Forge, you get early access to newer models in preview, including the Amazon Nova 2 Pro model and Amazon Nova 2 Omni. With that, I'll turn it over to Dan to talk about content moderation and how customization can help you there.
Customizing Content Moderation: Enabling Sensitive Use Cases with Alignment and LoRA Adapters
Thank you very much, Veda. I appreciate it. I'm going to do what Veda did and maybe ask for a show of hands. How many of you have run into safety guardrails either using Nova or using another large language model, or maybe you anticipate that you might run into safety guardrails because you have sensitive content that you're processing? I see a few hands. Thank you for the feedback. Whether you've encountered these guardrails or you're anticipating that you might run into them, don't worry. You'll see that there are very good reasons for running into them, and you're in a safe space in this room.
We'll show you how we can customize those for you. The good news is that we've built solutions to allow you to customize these guardrails. Before we dive deep into the solutions, I think it's worth highlighting a little bit about Nova's responsible AI architecture. First, I want to highlight that we are responsible by design. We ground ourselves in the eight core dimensions of responsible AI. There are science papers about this. The most important things to remember are these dimensions around safety, privacy and security, fairness, explainability, and so on.
I'll show you how we can customize these later on, but keep these in mind. This is really how we design our models. We also participate in various industry-leading collaborations. For example, we work with organizations like the Frontier Model Forum, the Partnership on AI, and various other government forums. Earlier this year, we published our Frontier Model Safety Framework, which supported the Korea Frontier AI safety commitments.
We also partner with third-party evaluators. We find those to be very good, especially for red teaming. They have special skills and capabilities that we want to take advantage of as we build and design our models. We also work very closely with academia. For example, earlier this year we hosted our first Amazon Nova AI challenge. We had 10 elite university teams that competed in a head-to-head tournament. Half of them, so 5, tried to build jailbreak bots. The other half built the safety guardrails and they tried to harden the models. We had a lot of innovations and ideas that came out of that. We've also published on that. If you're interested, let me know. I'm happy to point you to it. We've announced our second one, focused on trusted software agents next year. So if you're affiliated with a university and want to participate, look that up or let me know after the talk.
We also work very closely with customers to understand their needs. Everything that Veda spoke about and everything that I will discuss is grounded in customer feedback. One of the pieces of customer feedback we received is that some of the content moderation guardrails don't work out of the box for all customers. Let me explain and give you some examples of that. For example, let's say you are an internet or cybersecurity firm, or you're building security tools and you're looking to use a large language model. Those use cases can include generating test malware code, simulating cyberattacks, and developing various security testing scenarios. That malicious code and pen testing may get deflected by content moderation guardrails, so we want to customize those.
The same thing applies in law enforcement and media and entertainment. Honestly, sometimes you'll see very similar content around crimes, drugs, violence, illegal substances, and violent content with mature themes. They sort of go together in these two industries oftentimes, and they're valid use cases why you may want to use a large language model to understand that content. Similarly, for online platforms, they need to moderate their content, and there's all sorts of things out there that typically a large language model would deflect even if you're using it for valid purposes. You can see that there's a real diversity of use cases here. They're all valid business needs that require customized content moderation settings.
So how do we enable these use cases? We did so using a lot of the concepts that Veda talked about, and I'll touch on alignment and fine-tuning. Before we do that, let's look at some of the core components of our Nova models and our content moderation tools around the Nova models. There are really three components. There's alignment, and notice it's the same word that Veda used when she talked about the capabilities of customization. This refers to the fact that the models are trained to respond in a certain way. We use supervised fine-tuning, or SFT, which is one of the methods Veda talked about, and RLHF, reinforcement learning from human feedback, to align the models and to make sure that they're designed in such a way that they respond in ways that are consistent with those eight dimensions that I mentioned.
For example, if you ask them to generate foul language, even if you're asking the model to summarize a web page and tell me if there's foul language, it may not do so because it's designed not to generate that. The same thing applies with dangerous weapons and other adult content. That's the alignment piece of the model. Guardrails are the first and last line of defense around the model. We have input moderation guardrails and output moderation guardrails that help us quickly and robustly respond to any gaps that the model might have. Maybe the model does, on occasion, generate content that it shouldn't, since it's a stochastic process. We have a guardrail around that.
We also have extensive safety evals. We have lots of internal benchmarks that we use to test our models before they're released. We have over 300 distinct red-teaming techniques, and this is also where we use and collaborate with external firms, especially in the area of chemical and biological risks where there are firms out there that have really good expertise in that. Let me show you how we put these into practice and how they work when you use Nova.
When you want it to give you a response, here is our RAI framework as it's applied at what we call runtime. Runtime is when you actually say, "Here's a question, here's a request," and the model provides a response. You can see that the user provides an input. That input is first moderated by input moderation guardrails. We won't touch too much on these and we don't customize these too much, but if that request is not deflected, the prompt passes on to the model itself.
The model is designed around RAI dimensions during training, so it generates aligned content. It will deflect unaligned content on topics like weapons, mature language and themes, and malicious code. Let's say the prompt goes through, the model processes it, and then we have output moderation guardrails that filter out sensitive content that the model could occasionally generate. At the end, you get a system output. This is what happens at runtime every time you do a request. It goes through these very quickly and generates or deflects if the content is not aligned.
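Conceptually, that runtime flow can be pictured as the sketch below. The moderation helpers here are hypothetical keyword stand-ins, not Nova's actual guardrail components, and the aligned model call is stubbed out; the point is only the order of the checks.

```python
BLOCKED_TERMS = {"<example blocked term>"}  # placeholder policy, not a real Nova policy

def input_moderation_blocks(text: str) -> bool:
    # Hypothetical first line of defense: a simple screen standing in for a classifier.
    return any(term in text.lower() for term in BLOCKED_TERMS)

def aligned_model_generate(prompt: str) -> str:
    # Stand-in for the aligned model call (e.g., the Bedrock Converse API).
    return f"<model response to: {prompt}>"

def output_moderation_blocks(text: str) -> bool:
    # Hypothetical last line of defense applied to whatever the model generated.
    return any(term in text.lower() for term in BLOCKED_TERMS)

def handle_request(user_input: str) -> str:
    if input_moderation_blocks(user_input):
        return "Request deflected by input moderation."
    draft = aligned_model_generate(user_input)
    if output_moderation_blocks(draft):
        return "Response deflected by output moderation."
    return draft  # the system output the user sees
```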
Now that you see how the framework works, let me show you some examples of where we wanted to provide additional flexibility to solve the use cases I mentioned earlier. What we've done is customize for specific types of content using some of the same concepts that Veda spoke about. For example, safety allows you to generate content in areas including dangerous weapons and controlled substances. Sensitive content includes profanity, bullying, nudity, and other mature themes. Fairness has to do with bias and culture considerations, such as stereotypes against various groups. Security covers content such as malware, phishing emails, and malicious code that you might find.
What's interesting is that security tends to stand alone. There's definitely a Venn diagram where this overlaps, but security tends to be used in cybersecurity use cases where safety, sensitive content, and fairness tend to appear together. Think of a TV drama or a movie script. If you're going to see weapons and controlled substances, you're likely going to see profanity and nudity and things like that, not always, but generally. So those are the four dimensions that we customize. Let me show you the technical details and get down to exactly how we do that using custom models.
These are the three components that work together to enable customization. We use LoRA adapters for the core model, content classification for the output moderation, and all this is available using Amazon Bedrock. Let me take them one at a time. For the core model, we train using SFT, supervised fine-tuning, exactly what Veda spoke about earlier. We use a LoRA adapter. The way the LoRA adapter works is that it unlearns specific RAI dimensions while maintaining core safety. There are a couple of references at the bottom left with some really nice research that we published around how to do that.
Basically, with a LoRA adapter, the original model weights stay the same, but you're adding small additive modifications in select layers of the model specific to one of the content areas I spoke about earlier. If you want to allow list, for example, security, we can make changes just in the layers that have to do with security, safety, or sensitive content, while the other things remain unchanged. Those are LoRA adapters, and that helps us to unlearn the alignment of the model. For the guardrail output moderation, our output moderation will classify the different types of content. If a certain type of content is allow listed for the core model, we will also allow list it in the guardrail content classification. If a customer says they need to use security, we say great and let security go through.
That's the content classification on the output. What's nice is the third component. This is all available on Amazon Bedrock as a custom on-demand model, so you do custom on-demand inference. What's nice is that it's the same style and method as if you're using an out-of-the-box model.
Your code and API stay the same. You're just using a custom model ARN instead of the base model ARN. So that's very nice. The pricing is the same for Bedrock inference for a custom model and for an out-of-the-box base model. So it's a very nice and elegant way to do this.
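To make the LoRA mechanics described above concrete, here is a minimal sketch using the open-source peft library with a small stand-in model. This is not the internal implementation behind the Nova adapters; it only shows the general idea of freezing the base weights and training small additive updates in selected layers.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in model

# The base weights stay frozen; LoRA adds low-rank updates only in the target
# modules, so behavior can shift in selected layers while everything else is untouched.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["c_attn"],  # which layers receive the additive updates
    fan_in_fan_out=True,        # gpt2's attention uses Conv1D, so this flag applies
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```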
I'll finish with just a couple of examples here. The first one is in media and entertainment. I'm using Bedrock playgrounds here, where you can actually test all this very quickly. You can see here this first example is a hypothetical TV drama script. I've asked it to give me an idea for a TV drama that is targeted at adult audiences and should have mentions of dangerous weapons, violence, and things like that. On the right-hand side, it's blocked by the core model because it includes dangerous weapons and profanity. But on the left-hand side with the adapter and the allow listing I was showing, you actually get to see and generate ideas for the script. That's the first example.
The second example before I turn it over to Gal is in security. This is a security example used to analyze a terminal session where a malicious actor got root access to the machine and started executing ping commands to various malicious IPs. This is actually a very common use case. When you ask the base model to explain why this is malicious and what happened in this code, it will deflect because it's been trained not to produce additional malicious code. But on the left-hand side when we add the adapter, it actually will analyze that script, explain what the issues are, and correctly identify the security risks. This is how you can use that customization to address these use cases.
Terra Security's Challenge: The Guardrail Paradox in Agentic Penetration Testing
So that's my part. I'll turn it over to Gal, who will go over how Terra Security is using custom models for their award-winning agentic AI-powered penetration testing product. Thank you. Hi everyone, and thank you, Dan and Veda, for this beautiful technology and introduction to what we've just seen here. My name is Gal, and I'm the co-founder and CTO of Terra Security. Just before I show you how we leverage everything that we've just seen here, allow me to quickly introduce Terra and what we do.
Just last month, as you probably heard, Anthropic reported that it blocked the first known AI-orchestrated cyberattack. This isn't science fiction anymore. The attackers actually managed to infiltrate multiple organizations by weaponizing Claude code. This highlights something that every security team and every organization is now dealing with: attackers can scale faster than defenders. This is exactly why we founded Terra Security. If attackers are using AI for offensive operations, then defenders need AI-driven offensive capabilities of their own.
Before we dive in, let's take a quick step back and talk about what penetration testing or pen testing is all about. Pen testing is a practice where ethical hackers, the good guys, simulate real-world attacks and try to discover real vulnerabilities in live systems. Pen testing is not just about finding vulnerabilities; it's also about exposing the blast radius. In 2025, almost 2026, in a world where everything else is automated, pen testing is still 90 percent manual. Web applications specifically are dynamic creatures. They change all the time and they don't have a unified structure or any standard, and this makes it impossible to automate or to hardcode complex attacks. Every attempt so far has failed, and it also makes the process slow and expensive, and it just doesn't scale.
So at Terra, our take is simple. The future of pen testing is agentic.
For the first time, there's technology that allows us to reason in real time, just like a human would. At Terra, we haven't removed the human from the loop completely. Instead of replacing, we are augmenting. So how do we do this? We teach AI agents to hack responsibly. Our agents are trained to do every part of the penetration testing process, from discovering the assets, generating test cases, and ultimately executing real payloads to discover real vulnerabilities. This last step, execution, is where value and risk live. So we test live systems and therefore we have two non-negotiables. First, reliability: we must find everything and not miss any threat. Second, safety: don't harm the system or the users. We do this by using guardrails. You'll see there is a built-in contradiction here. We need to attack and make sure that we don't miss anything, but also do it in a safe and balanced way, and this is very hard to do.
At Terra, we have a name for that. We call this trade-off the Guardrail Paradox. How do we still protect our users and customers from destructive operations, but also how do we not block the system completely from doing its job? Let me explain. Does anyone feel a little bit overwhelmed by this SQL injection payload? A little bit, okay, me too. This is a SQL injection payload. It's very simple, but it's also very destructive. It's a great example of something that we should never run. Never. And why never? Because even in a dev environment, if we execute this query, we will delete all the users from the database, including the user that is used for testing, and then the testing won't be able to proceed. So we cannot run it.
Any guesses if you ask an AI agent to generate a SQL injection payload, what would be the first choice? Well, you guessed right: it's DROP TABLE users. And what about this one? This one is not an obviously destructive payload. Here we changed the role of a user to admin, but in some environments, in some cases, it might be something you don't want to run in production, for instance. So I think you get the idea. We need to choose carefully what to run, and the difficult part about it is that everything is determined at runtime based on the context that exists at the moment of execution. Context matters. And to add more complexity to a fairly complex process, context is dynamic and changing. We operate on different environments, different customers with different tolerance of risk, and we have this saying at Terra: what's safe at noon in staging might be unsafe at 2:00 p.m. in production. So our guardrails have to adapt in real time.
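Written out, the two payloads Gal describes look roughly like the textbook examples below. They are shown only to make the point that destructiveness depends on context, and should obviously never be run against a system you don't own; the email address is an invented placeholder.

```python
# Classic textbook payloads, illustrative only.
ALWAYS_BLOCK = "'; DROP TABLE users; --"  # destructive everywhere, even in a dev environment

CONTEXT_DEPENDENT = (
    "'; UPDATE users SET role = 'admin' WHERE email = 'test@example.com'; --"
)  # not obviously destructive, but likely unacceptable in production
```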
Building a Multi-Layer Defense: From Custom Adapters to Guardrail Checker Agents
We came up with a system of guardrails. First, we chose Amazon Nova Pro as our base model thanks to its balance between cost and performance. Out of the box, the model comes with guardrails by the model provider—things that the model provider won't allow you to do. For our use case, it was a little bit problematic because out of the box, the model won't allow us to generate offensive payloads, as Dan mentioned earlier. So we worked with the Nova team, who collaborated with us, and we got a custom model with content moderation settings that are aligned with our needs. On top of that, we have Terra guardrails—things that we won't allow the models to do, like never drop any database table regardless of the environment that you are testing against. And lastly, we give our customers the ability to provide their own guardrails, and they can say whatever they want, like when you hit a MongoDB database, stop, don't do anything. Each one of these layers is a must.
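A minimal sketch of how the Terra and customer rule layers might be composed is shown below; the rule sets are invented placeholders. The model-provider layer is handled separately by the custom content moderation settings on the Nova side, so it doesn't appear here.

```python
import re

# Hypothetical rule sets standing in for the guardrail layers described above.
TERRA_RULES = [r"\bdrop\s+table\b", r"\btruncate\s+table\b"]  # never allowed, in any environment
CUSTOMER_RULES = {"example-customer": [r"\bmongodb\b"]}       # per-customer additions

def blocked_by_rules(payload: str, customer: str) -> bool:
    text = payload.lower()
    rules = TERRA_RULES + CUSTOMER_RULES.get(customer, [])
    return any(re.search(rule, text) for rule in rules)

print(blocked_by_rules("'; DROP TABLE users; --", "example-customer"))  # True
```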
So let's see what we have so far. This is a very simplified version of our agent. We have the payload generator, which is based on the model from the Nova team, the Nova Pro Security adapter, and we have our layers of guardrails and the custom guardrails on top of that. This agent has attached to it a tool that will allow it to execute the payloads against the target system. When we see that, we think we should be safe, right? Well, actually not quite.
We still see destructive payloads, like DROP TABLE users. Although it doesn't happen too much, we can never allow any of this, as we explained before. So why are we still seeing them? Let's use an example. We all know how large language models love to use the em dash. Have you ever tried to ask them to stop using the em dash? If you did, you know that they will listen to you, but just some of the time. This is exactly the problem that we are having here.
The problem is large language models have seen so many em dashes in the training data that when they need to print a dash, there is some probability that it will be an em dash. The same goes for our agent. When it needs to print a SQL injection payload, there is some probability it will be DROP TABLE users. If guardrails cannot guard everything, we need a second line of defense. If payload generation is offense, then we need dedicated defense. Like every problem, we solve it with a new agent. This time we have a guardrail checker. Unlike the other agent, the payload generator, which has two jobs, generating payloads and making sure they are safe, this agent has one and only one job: block malicious payloads. And it does the job very well. The problem is, it does it a little too well.
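In spirit, the Guardrail Checker is an LLM asked to give a single verdict on each payload, given the runtime context. The sketch below is an assumption of what such a call could look like via the Bedrock Converse API; the model ID and prompt are placeholders, and Terra's real checker is the fine-tuned model described later in the talk.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def guardrail_check(payload: str, environment: str, risk_profile: str) -> bool:
    """Return True if the payload should be blocked. Illustrative prompt and model ID only."""
    prompt = (
        "You are a guardrail checker for a penetration-testing agent.\n"
        f"Environment: {environment}. Risk profile: {risk_profile}.\n"
        f"Payload:\n{payload}\n\n"
        "Answer with exactly one word: BLOCK or ALLOW."
    )
    response = bedrock_runtime.converse(
        modelId="us.amazon.nova-lite-v1:0",  # placeholder; Terra swaps in its fine-tuned model
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 5, "temperature": 0.0},
    )
    verdict = response["output"]["message"]["content"][0]["text"].strip().upper()
    return verdict.startswith("BLOCK")
```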
Here is another example. This is an example of a payload that we do not want to block. Here we are just changing the last login of a user to now, and there is nothing destructive about it. The problem is that since we are changing a sensitive entity in the system, a user, our agents sometimes might think that this is something that needs to get blocked, and this is not good because it keeps us from running all the tests that we need to run. So here is the thing: not all blocked payloads are malicious.
When we are actually thinking about it, we have converted one problem into another. Now we have a battle between two agents: the offense, who tries to generate payloads for offensive testing, and the defense, who tries to block them. When you give an AI agent a job to do and there is missing context or doubt, they will usually default to do the job that they were trained to do. In our case, when there is missing context or doubt, the defender will mostly default to block, or in other words, it will default to do what it already knows. So we are stuck. How do we proceed from here?
We need to teach the large language models new behaviors. When our agents need to decide whether to block or allow a payload, it is based on our set of rules. We have seen that most of the problems we are having are due to the default behaviors of the large language models. To do this, we utilize our own custom data, but first, obviously we need to collect this data. At Terra, we did not remove the human from the loop completely, as I mentioned earlier. Our researchers work with our agents, providing assistance and guidance. Sometimes they will even do the last step on a complex attack.
As our researchers work with the system, they rank the performance of our AI agents. This creates our initial dataset. Each result, together with the human feedback, is then saved to a bucket. So let's assume that we have our data and the data is ready. What are our options?
We've seen that we already tried prompt engineering, which really doesn't cut it. RAG or hybrid RAG might help us get more relevant context for the problem that we are trying to solve, but these are still methods of in-context learning and they won't change the default behavior of the LLM. So it's either fine-tuning or continued pre-training. For our use case, since we already have high-quality labeled data from our researchers, fine-tuning was a very easy choice for us.
Fine-Tuning with Human Feedback: Achieving 92% Accuracy Through Model Distillation
You might think that creating a custom model is a very hard job, right? Well, actually not anymore. With Amazon SageMaker and Amazon Bedrock, it is super easy to create your own custom model and make it available. But before we get there, let's talk about data curation.
We have our human in the loop collecting the examples. We set this data into a bucket and then in SageMaker we designed a very simple pipeline. First, we anonymize the data using an LLM because we want to remove biases. We don't want to get affected by specific customer data or specific applications. Then we normalize the data and split it into a training dataset and a test dataset. The training dataset will be used for the fine-tuning job and the test dataset for the evaluation of our newly created model.
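The curation pipeline can be sketched in a few lines. The field names and the anonymization rule below are invented for illustration; in Terra's case the anonymization step is itself LLM-based and the whole thing runs as a SageMaker pipeline over data in S3.

```python
import random

def anonymize(record: dict) -> dict:
    # Stand-in for the LLM-based anonymization that strips customer- and
    # application-specific details to avoid biasing the model.
    cleaned = dict(record)
    cleaned["payload"] = cleaned["payload"].replace("app.example-customer.com", "<target>")
    return cleaned

def normalize(record: dict) -> dict:
    # Keep a consistent schema: the payload, the human verdict, and context fields.
    return {
        "payload": record["payload"].strip(),
        "verdict": record["verdict"],              # "block" or "allow", from the researcher
        "environment": record.get("environment", "unknown"),
    }

def split(records: list[dict], test_ratio: float = 0.2):
    random.shuffle(records)
    cut = int(len(records) * (1 - test_ratio))
    return records[:cut], records[cut:]           # training set, test set

raw_records = [  # in practice, loaded from the S3 bucket of saved human feedback
    {"payload": "'; UPDATE users SET last_login = NOW(); --",
     "verdict": "allow", "environment": "staging"},
]
train_set, test_set = split([normalize(anonymize(r)) for r in raw_records])
```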
Since we are already creating our own custom model, we have the opportunity to make it smaller, faster, and cheaper. The idea is simple. We don't need all the extra knowledge that the larger model has, right? But we do want to extract or transfer the knowledge that is relevant to us from the bigger model to our smaller model. To do that, we use a method called model distillation. Let's see the complete flow.
Again, we are starting with the collected data—good and bad examples. Good examples are where our researchers reached an agreement with the agent. Since we removed a lot of data during the data cleansing process through anonymization and normalization, we want to bring back some of the context. For instance, like the activated guardrails, the risk profile, and the environment that this example belongs to, and so on. Once we have this data, we use the teacher model—the bigger model—and actually add additional insights to this data. We are enriching the data with additional insights by the teacher model. This creates our final dataset for the fine-tuning job.
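One plausible shape for that enrichment step is to ask the larger teacher model to explain each human verdict and keep the explanation as part of the student's fine-tuning target. Everything below, the model ID, the prompt, and the field names, is an assumption for illustration.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def enrich_with_teacher(record: dict) -> dict:
    # Ask the larger "teacher" model for a rationale that the smaller "student"
    # model will later learn to reproduce during fine-tuning.
    prompt = (
        f"Payload: {record['payload']}\n"
        f"Environment: {record['environment']}\n"
        f"Human verdict: {record['verdict']}\n"
        "In two sentences, explain why this verdict is correct."
    )
    response = bedrock_runtime.converse(
        modelId="us.amazon.nova-pro-v1:0",  # example teacher model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    record["rationale"] = response["output"]["message"]["content"][0]["text"]
    return record
```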
So finally, we are ready to fine-tune our model. In Amazon SageMaker, we start by selecting the base model. As I said earlier, we wanted a smaller model, so we chose Amazon Nova Lite for the fine-tuning job. We pull our anonymized and normalized training data and we push it into a fine-tuning job. Then we use the test data that we put aside earlier to evaluate our newly created model. If it passes our threshold, we are done. We can just push it into Amazon Bedrock and make it accessible on demand. Super simple, super powerful.
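The evaluation gate Gal mentions could look something like the sketch below: score the candidate model on the held-out test set and only promote it to Bedrock if the rate of correct blocks clears a threshold. The metric definition and threshold here are assumptions, not Terra's actual evaluation code.

```python
def correct_block_rate(model_verdicts: list[str], human_verdicts: list[str]) -> float:
    # One plausible reading of "true positives": payloads the researchers said to
    # block that the model also blocked.
    should_block = [i for i, v in enumerate(human_verdicts) if v == "block"]
    if not should_block:
        return 1.0
    hits = sum(1 for i in should_block if model_verdicts[i] == "block")
    return hits / len(should_block)

THRESHOLD = 0.90  # placeholder promotion threshold
model_verdicts = ["block", "allow", "block"]   # placeholder predictions on the test set
human_verdicts = ["block", "allow", "block"]   # placeholder researcher labels
if correct_block_rate(model_verdicts, human_verdicts) >= THRESHOLD:
    print("Promote the fine-tuned model to Bedrock for on-demand inference.")
```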
So to recap, we started with a very simplified agent that is based on the Nova security adapter, which is our payload generator, and it has our guardrails on top of it. It did a great job, but not a good enough one, because we still saw payloads that we should have blocked.
Then we introduced the Guardrail Checker Agent, whose whole purpose is to decide whether we need to block or allow payloads, and it did a very good job. The problem was it was too good. We used our researchers and a human-in-the-loop approach to collect examples of both bad and good cases where we reached agreement or disagreement with the model's choice. We put this data aside, enriched it, and eventually used it to fine-tune a new LLM that replaced the brain of our Guardrail Checker.
When we think about it, we actually injected Terra's business context into the flow. The nice thing about it is we don't plan to stop here. We already have the data and the pipelines, and everything is ready, so we keep iterating, collecting examples, building datasets, and improving our models continuously.
So what does this actually mean? We go from 80% true positives, which are correct blocks, to 92% after the first iteration. This is a huge improvement after just one iteration, and it shows how by injecting your business context into the process, you can really unlock this shift from average to great. As I said before, we plan on keeping improving until there's nothing else to improve.
To summarize, we've seen that attackers already use AI, but now defenders finally have an AI-powered advantage that is safe, predictable, and always on their side. Thank you for listening, and thanks to AWS for the platform to make this happen. Dan, Veda, and I will be here off stage to take questions. Thank you so much for listening.
This article is entirely auto-generated using Amazon Bedrock.