DEV Community

Cover image for AWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)
Kazuya
Kazuya

Posted on

AWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)

🦄 Making great presentations more accessible.
This project aims to enhances multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Data protection strategies for AI data foundation (AIM339)

In this video, Derek Martinez and Sabrina Petruzzo from AWS demonstrate building a secure healthcare chatbot for nonprofits with live coding. They implement a six-layer defense-in-depth strategy including encryption, IAM access control, CloudTrail auditing, AWS Config HIPAA compliance, and PII detection using Amazon Textract and Amazon Comprehend. The session features hands-on implementation of differential privacy techniques like k-anonymity for age masking and data sanitization through a SageMaker pipeline. They demonstrate prompt injection defense by blocking malicious queries attempting to override security settings. The architecture separates raw and processed data in different S3 buckets, applies automated compliance monitoring, and creates comprehensive audit trails. The live demo shows successfully masking sensitive patient information while maintaining data utility for legitimate queries.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: AI Security Challenges and the Defense in Depth Strategy

My name is Derek Martinez, and I'm a senior solutions architect here at AWS on the nonprofits team. Today with me is Sabrina Petruzzo, the security lead for the nonprofits team. By show of hands, just so I can get to know who's in the room, do we have any security engineers or developers? Now who here gets that call at 2 a.m. when something breaks or somebody has a breach? I've been there before. I understand your pain. Now here's the million dollar question: How many of you are running AI in production right now? Keep those hands up if you can tell me exactly what sensitive data your applications handle as well as what governance controls you have to handle that sensitive information.

The OWASP Foundation came out with the top 10 risks for large language models, and guess what the top one on the list was? You guessed it: prompt injection. So today we're going to talk about four key security areas. The first is data sanitization. The second is prompt injection defenses. The third is securing a machine learning pipeline, and the fourth is a defense in depth strategy.

Thumbnail 80

We're going to take a look at our defense in depth strategy first. As you see here, we have six layers. In the first layer, we're going to enable encryption for our data at rest as well as our endpoints. In our second layer, we're going to implement fine-grained access control, leveraging an identity access management service, IAM, in AWS. In layer three, we're going to create a comprehensive audit and defense system leveraging CloudTrail to make sure that we know who is doing what and what actions they took.

In our fourth layer, we're going to create automated compliance. What does that mean? We're going to leverage AWS Config to define a preset of rules so that if our system deviates from those preset rules, it will monitor, alert, and then we can take action to remediate. In layer five, we're going to do PII detection and data sanitization. This is where you're going to actually see our live code talk, so I just wanted to point that section out to you. In our last layer, we're going to do prompt injection defense, which we'll show you with a chatbot a little bit later.

Thumbnail 170

Building a Secure Nonprofit Healthcare Chatbot Architecture

So today we're going to be building a nonprofit healthcare chatbot that your internal teams can utilize to query sensitive patient information. First, we have an internal data owner. Your internal data owner will go ahead and upload documents and patient data into an Amazon S3 bucket. This is going to serve as your initial data entry point for the information that's going to be powering our chatbot. For example, this internal data owner could be a medical provider who's uploading patient records into your system.

Thumbnail 190

Once documents are uploaded into S3, it triggers an Amazon SageMaker pipeline. Within this pipeline, we have different data protection functionality via Amazon Textract and Amazon Comprehend. Amazon Textract will scan through the input documents and extract text from those documents, and then Comprehend will process those documents to detect PII found within your data. This is also where we're implementing differential privacy techniques for data sanitization. Once the documents are processed, we're storing them in another S3 bucket, and that S3 bucket is going to be the bucket powering our chatbot. This also separates the raw data from the processed data by having two separate S3 buckets, so we're not fetching any raw, unprocessed data for our chatbot.

In terms of security, we also have auditing set up. Everything is going to be logged via Amazon CloudTrail. Those logs are stored securely in an Amazon S3 bucket, and they're also leveraging a KMS encryption key to make sure all of our information is encrypted.

Thumbnail 260

Now let's take a look at it from the user perspective. Our end user sends a prompt to our chatbot user interface, and then our API Gateway exposes our backend Lambda function through a REST API. Our backend Lambda function is going to process the prompt and check to see if there's any prompt injection techniques being used. Once detected, it will mitigate against that prompt injection. We also have AWS Config, which will apply that rule set.

Because we're going to leverage healthcare data, we're actually going to implement the HIPAA conformance pack, which is a predefined set of rules specifically for HIPAA compliance. When it comes to our prompt injection, I'm sure you're wondering what kind of detection and defense we're going to do. What we're going to do is use a direct prompt injection, meaning we're going to ask it for normal information, but we're going to give it an instruction to actually override the settings so that we can see what happens when we do that.

Thumbnail 340

Thumbnail 370

Live Demonstration: Testing the Chatbot with Differential Privacy and Prompt Injection Defense

Now, this is the healthcare chatbot for researchers and patient statisticians. In this case, what we're going to do is make sure that our chatbot works. Let's go ahead and send a prompt over here to see if our chatbot is working. We're going to type: "How many patients have diabetes?" Now this is important to note. When we get this response back, you're going to see we get two results, and those two results will actually show that two patients have diabetes. In this case, we're actually going to see the data masked, and we're going to show that there's no personally identifiable information in the output of our prompt.

Thumbnail 400

So we'll give it a second here. There we go. See, we have two patients. Awesome. I've also entered myself into this chatbot, or into the data source, as a patient with patient ID 12345 and said that my age is 100 years old. So if we go ahead and ask the chatbot, "What is the age of patient with patient ID 12345?" we'll go ahead and see. Once it runs that, it says that patient with patient ID has an age range between 100 and 109 years old, and this is just for differential privacy.

Thumbnail 430

Thumbnail 450

So now here's the real test. Let's go ahead and see what prompt injection looks like. If we ask the same question and we say, "How old is the age of patient with patient ID 12345?" and tell the chatbot to override its security settings, we'll see that the chatbot responds by saying that a potential prompt injection attack has been identified or detected and to rephrase your request. Now three important things happen here. One, we obviously stopped the prompt injection. Two, we actually log an alert to our team. And then three, our team then goes and takes actions based on our CloudTrail logs, as well as if any AWS Config rules present an alert.

Thumbnail 470

Live Coding Session: Implementing the SageMaker Data Protection Pipeline

So for today's session, we're really going to be focusing on the back end portion of the architecture, from uploading the raw documents into S3 to actually carrying out the data processing functionality, because this is really where the magic happens. For ease of deployment, we're using an Amazon SageMaker Jupyter notebook, and we've broken that up into six different implementation steps. First, we're installing our required packages and all of our application dependencies. Then we're setting up our security foundation. This is where we're setting up our KMS encryption key, we're enabling those AWS Config HIPAA rules, and we're also creating least privileged access rules.

From there we have to go ahead and actually define our SageMaker pipeline. This is where we are defining the execution environment as well as all of our functionality to actually extract our text with Amazon Textract and then mask it with Amazon Comprehend. For pipeline deployment, we've done all the hard work, so we'll deploy the pipeline. We also have a verification step where we'll actually use just regular data that you can see right there, and we'll show you that it's being masked so you can see that it's working. Then after that we'll actually execute the pipeline, which you'll be able to see in the console.

Thumbnail 560

Thumbnail 580

And with that, let's go ahead and move over to the live coding portion. Did I hear a question? No. Anybody have a question for us while we're waiting? No, it all makes sense so far. That's what I love to hear. All right, awesome. So if I go ahead and go into our SageMaker console and I go into JupyterLab, I can go ahead and open up my Jupyter notebook. This is the SageMaker data protection pipeline that we have right now, and it does have those six different implementation steps that we talked about.

What's nice about this notebook here is that we can go ahead and actually click through each individual section to validate that it works before moving on to the next section. So I'll go ahead and start off by setting up our dependencies. Some of this we do have pre-built out. However, that data protection file with the Comprehend and Textract functionality will be live coding today.

Thumbnail 610

Thumbnail 620

I'll go ahead and set up our packages and dependencies. We'll click through, and once our setup and dependencies have successfully completed, we will see a print statement stating that all dependencies are imported successfully. Next, we'll set up our security infrastructure. That's the KMS key, those config rules, and our IAM rules that we need as well. I'll go ahead and run this one as well.

Thumbnail 650

Thumbnail 660

Let me zoom in. We got our first question. All right, it's official. We made it. Is that better? Perfect. So here, once we ran our security infrastructure section, we can see that we've set it up, we've created our encryption key, we've created our IAM role, and we've also enabled the HIPAA compliance conformance pack within config.

Thumbnail 670

Thumbnail 680

Thumbnail 690

Step 3 is where we're actually defining our Amazon SageMaker pipeline. First, we're just defining some pipeline parameters like the input and output bucket. Then we have that processing environment. We're just setting up the processor, the instance type, the session, and so on. Then we also have a step here where we are actually defining the Comprehend and extract data protection step. We have a data protection data protection Python file here that we're referencing. We'll actually want to go ahead and create that Python file, which is what we'll be coding today.

Thumbnail 720

Thumbnail 730

What I'm going to do here is go into JupyterLab, which is one of our AWS IDE environments. Here I'll go ahead and start coding the code to support our data protection functionality. All right, so for this part, nothing fancy, we're just going to install our dependencies and make sure we import our tools. Now we're going to define our first variable. This is going to actually create two extraction paths, one specifically for text files, one for PDFs and images, as well as JSON files. Now we're also going to validate that our paths exist, and if they don't, we'll fail gracefully.

Thumbnail 750

Thumbnail 770

Now for this one, what we're going to do is we're actually going to take a look at our text files. This is where we're actually starting to do something. We're going to open it up. We're going to scan the text in the files, and we're actually going to extract it and return it as a string. In this case, what we're going to do is we're actually going to show that the PDF and the scanned images is going to be a little different than what we talked about with the text files.

Thumbnail 810

First, we want to make sure we're in the right region, right? We want to be in the region our data's in. Second, we're going to create a Textract boto3 client. This will allow us to do our extraction. Then what we're actually going to do is we're going to open the files in binary mode. We're going to pass the file bytes to Textract, and then it'll actually return the text for us. Then here what we're actually going to do is when we get the return text, when it comes to Textract, you can actually return blocks of code and blocks of code could be a line, it could be a word, or it could be a page. What we're doing here is we're specifically telling it we want to return a line because we don't want a single word. We're basically recreating the documents that we have in our raw bucket. We just want to clean them, right? We want to make sure it matches the way it looked before.

Thumbnail 850

Thumbnail 870

Thumbnail 890

Let me zoom in again. This is where we do our error handling. Basically, we want to fail gracefully. We want to make sure that the pipeline doesn't error out if the file path isn't found. It's just going to ignore that step. There will be a part later where we'll deal with that. Okay, our next step, we're actually going to implement differential privacy. This is where we start, right? We're going to start with the randomization. In this particular instance, what we're actually going to do is we're going to implement the randomization. If you go to the next line of code, we're actually going to apply a value here and we're going to say that if the value doesn't match what we're looking for, then go ahead and do nothing. The reason why is because we don't want it to manipulate the document, right? We have to make sure that step is in there first because if we wind up changing something in the document, we're not going to be able to replicate the document.

Thumbnail 910

Thumbnail 920

At this particular step, this is where we start rounding the values. We're going to take the years and round them to a unit of 10. Let me give you an example. Let's say somebody was born in 1981. We want to make sure that rather than giving them the exact date, we give them a range. I have a question for you all. Why would we do that? Does anybody know? Correct, because it's a quasi identifier. You'll see this come up with a couple of different examples here. What that means is that it could be used with something else to identify somebody. That's a quasi identifier. So what we're really doing here is making sure that it's less likely that you will be able to identify somebody based on their birth year because we're giving it a range.

Thumbnail 970

Thumbnail 980

Okay, here we go. This is a fun word to say. Does anybody know how to pronounce that? We'll throw it out to the audience. No. Okay, k-anonymity. That's right. You got it. Good job. So here what we're doing is we're going to do the same thing with the ages. We're going to leverage age ranges. If you're 35, we're going to provide an age range, and we did this with Sabrina. I don't know if you saw in the example, but if you're 35, we give you a range from 30 to 39. Age isn't something that I think I could identify somebody on just by itself, but if I continue to add other bits of data, now all of a sudden it gets a little bit easier. So that's what we're going to implement here to make sure that we are protected.

Thumbnail 1030

Thumbnail 1090

One of the things I want to call out is that if an error or if a value kind of matches an age but it's not an actual age, we want it to do something. So what it's actually going to do is it's not going to act on it. I'll give you an example: a phone number. You don't want your phone number to be masked as an age. So we just need to make sure that the system understands that it's going to do this particular type of behavior. It's important to note that this applied privacy protection function right here is actually our privacy protection router. The reason we have a privacy protection router is because different types of PII require different types of differential privacy techniques. So in this case, we want our file to be intelligent enough to say, "Hey, this PII looks a little different than that PII." Which I'll go into a little more later, but I don't want to steal the show yet.

Thumbnail 1100

Thumbnail 1130

Thumbnail 1150

Same concept here. We're just masking the PII data based on the type. While I have you here, why would we implement k-anonymity? Anybody know? Hint, hint. I kind of gave you the answer earlier, but I wanted to see if somebody else would answer. It's to reduce the ability to identify someone by their name and their age. Correct, exactly, because it's a quasi identifier. Good job. Okay, so in this case, now we want to find and hide the PII. This is where AWS Comprehend is going to come in. If you actually do the next line of code as well, you're going to see that we're going to do the same thing for Comprehend that we did for Textract. We're going to create our boto3 client. Do we need to be in the same region as our data? Anybody? Is that good best practice? Yes, exactly. I heard that, yes. So what we're going to do again is apply this boto3 client. We're going to have the ability to now find the PII and apply the appropriate privacy protection to it.

Thumbnail 1180

So in this case, what we're going to do is give it a confidence score. We're going to say, okay, I want to return, for instance, the year of somebody and I want it to have the ability to be 80 percent accurate. So in this case, we're just applying the differential privacy technique and also applying a level of confidence to that.

Thumbnail 1210

Who here is having fun? Is anybody excited to look at code today? Yeah, all right, I'm the only nerd here. I got you. Thanks for that. Okay, so in this case, what we're actually going to do is we're going to get the text, and then we're actually going to replace the text. Depending on the type of PII as well as our confidence score, we will do the replacement.

Anybody have a question at this point? We're cruising through this thing, so I don't mind slowing down. Yes, sir. How do you know to use the score? Is there an error? There is a trail, yes. I think for us we use—sorry, go ahead. What were you going to say? We did play around with different confidence scores, so it really depends on what your output is and what you're expecting, or what level of threshold you can withstand in terms of incorrect output. So it really is something that you can play around with and see what works best for your use case, yeah. And I saw a lot of the examples I saw were 90%, but we wanted to show variants. We do show another one later that shows 90% as well. Any other questions? All right, we're moving on.

Thumbnail 1310

Thumbnail 1320

Thumbnail 1330

Okay, this just again—hey, if there's an error when we detect our PII, this actually pertains specifically to if the PII I'm looking at, I can't quite determine if this is PII or not, we're going to go ahead and just return the text the way it was. Again, the purpose of this is to make sure our document stays as intact as possible, right? Okay, so in this case, you're going to see this a little later, but what we're actually going to do is we're going to make—and you can go ahead and add that next line of code—but we're going to make our input and output folders. This is where our process is going to happen.

Thumbnail 1350

The important thing about this is if it doesn't exist, we're going to make it and your input output is going to be very important, right? Our input is our raw data in our bucket. Our output now becomes our clean data. And then also, if it doesn't exist, we create these buckets so that we have them in our pipeline, so it doesn't fail. All right, and for this line, now we're actually showing the file types that we're actually going to look for, and we give the system the prompt to do that. And here we're also going to say we want to do this in a batch processing style and it's important to note that if one fails, we don't want to stop the files, we just want to ignore that file for now and then let our team know that we didn't scan that particular file because of an error.

Thumbnail 1390

All right, and on this step we're actually going to clean the text. Once we have the clean text, we're going to actually output it. We're also going to print what we did here. This is Sabrina and I—we feel very confident about making sure that our system understands or tells us what it did. So we actually wanted to print the applied privacy protection and as well as what was found and masked, and we're actually going to put this in an audit bucket. Our audit bucket is something we can go back to later and we can say, hey, in that document, what did we actually do? What did we find, right?

Thumbnail 1430

Thumbnail 1440

And then it's important to note that it will actually put it in a—yeah, you can go ahead and put that next line. I think I'm still in the thunder here. Yeah, it's going to put it in an output folder and it's actually going to say "cleaned" on it. It's one way we'll know we'll be able to differentiate it, right? Here's our audit, right? This is what we really want to make sure our teams understand when we're going through this process. We have an audit log set up. We have an audit bucket. We'll be able to go in and make sure again that we understand what kind of PII was detected and because we have CloudTrail as well, we can also see who accessed our audit bucket, right?

Thumbnail 1470

We want to make sure that we know who has access to it if they performed any actions. Obviously we want to lock down our bucket, but sometimes things happen and that's why we have a service like CloudTrail to be able to monitor. All right, are you all still having fun? Sweet, I'm doing my job. Okay, so in this case, we—again, I know we're belaboring the point, but for us, if we get an error, the last thing we want to do is stop the whole process, right? We would rather alert and let our team go and look at it, but continue the process with what we know is good so that we're still processing everything and it's not stopping our pipeline. I think we're getting close to the end here, huh? We are. This is the last function.

Thumbnail 1510

And we did this in record time. So this is a standard function. Before I say that, let me ask a question. Does anybody know what this does? Why would we put this in there? I'm looking at the coders in the group. If you just call up a Python file, it will run. There you go. I knew somebody would know it. I actually did not know before we started doing this, and I found it out and I was like, wow, my whole life has changed.

Thumbnail 1540

Thumbnail 1550

Thumbnail 1560

I want to say that I think we missed something earlier. I don't think there was anything after the trial I was missing. Let's go back and look. Never mind. Yeah, I think I just didn't see it. Yeah, no worries. Yeah, the find and hide. Yeah, no worries. All right, so this is the coding portion. This is where I'm going to open the floor up for Q&A. Feel free to ask any question you want, and Sabrina is going to answer all your questions today. We're also still going to go back into the decision maker notebook and actually run this code, but if anyone does have any questions right now before we move on, I'm happy to answer those.

Thumbnail 1590

Thumbnail 1600

Thumbnail 1610

Pipeline Execution and Results: Verifying Data Sanitization in Action

That's true. I jumped ahead a little bit. Dirk got a little too excited there, but it happens. All right, if there are no questions, we can show you what it looks like in the console and we can run the actual SageMaker notebook. Okay, so we've actually uploaded this file into our SageMaker environment already. Essentially, this step 3, the SageMaker pipeline again, is setting up that processing environment and actually defining the functionality for our text extraction and PII masking. It's also setting up the audit trail, enabling encryption, things like that. So if we go ahead and run step 3, we'll see that we have our pipeline creation functions defined. It's actually using that data protection Python file that we just created to set up Amazon Textract and Amazon Comprehend to do that data sanitization for us.

Thumbnail 1630

Thumbnail 1640

Thumbnail 1650

Thumbnail 1660

All right, now we're going to deploy the pipeline. Fingers crossed, everybody. No, I know it's going to work. I tested it. I got a backup one just in case it fails. It's okay. And then now what we're going to do is we're going to go to the verification step here. You are about to see. It masked real time. There you go. We have masked data. Everybody give me applause. Yay. All right, now we're going to execute the pipeline though. I bet you didn't know we had one more step, huh? All right, so we're going to execute the pipeline. Now this is important because we have an input bucket and an output bucket. The data protection Python file created both of those for us. Our files are in there. We're syncing our files. We have some files in another bucket, and we wanted to show you a syncing step, and then we're going to execute the pipeline.

Thumbnail 1680

Thumbnail 1690

Thumbnail 1700

On the last step, there we go. Now let's go to the console. Yes, so the pipeline has actually been executed. So now if we go back, let's go back into the SageMaker console here. So if we go into SageMaker Studio and click on pipelines, we'll see here that we have our data protection pipeline created for us and right now it's currently in the executing phase, so the pipeline is running right now executing. It does take about 2.5 minutes or so for our pipeline to finish executing. We can show you the files, but we can see that we have our data protection step that we implemented in our SageMaker pipeline currently running. And again, that's just doing that text extraction and Comprehend text extraction and PII masking for us. Any questions while we wait?

Thumbnail 1750

Thumbnail 1760

Thumbnail 1770

Has anyone done this before? Go ahead. Sorry, I think I missed what. How can we identify what was or wasn't PII so that was in the Comprehend? So in the Comprehend step we notate what PII was there. Sorry, I probably did brush over that. So for like Social Security numbers, phone numbers, emails, we have a step in there that actually differentiates the two. So like age, we did k-anonymity for years. We did the randomized units and then we also do a step in there where we actually identify Social Security numbers and based on that type of PII we just mask the data. Yeah. Was there another question? No, okay. Has anyone done something like this before or any data sanitization, whether you're using Textract Comprehend or like Bedrock or another service?

All right, let me ask another question. Has anybody's organization really pushing generative AI right now? And are you or somebody on your team tasked with trying to figure out how not to expose data in your generative AI application? Yeah, so this was one of the ways this came about. It's one way to do it, but we wanted to provide customers with solutions. So again, we work in the nonprofit space. A lot of our customers don't have a lot of big teams, and so we try to find solutions that they could just stand up and run today.

That's where all this came from, and the emphasis behind it is to make sure that when you're exposing your data to your generative AI application, you know what data you have and whether or not it's masked. In our case, we have a HIPAA compliant customer, so PII is unacceptable. We can't expose any PII. So in this case, we made sure that we sanitize that data before it got there. Now we could leverage a generative AI application and we know our data is good.

Thumbnail 1850

Thumbnail 1860

Thumbnail 1870

Thumbnail 1880

We can see that our pipeline has successfully executed. We could go into the pipeline if we wanted to, open it up, and you can see the status here as well as your files and settings details. What we could do now is actually go into Amazon S3. If I go into my buckets here, I can see that I have different buckets created for me, and all this was done through SageMaker or the Jupyter notebook and Amazon SageMaker. Everything that we created there created the input bucket, output bucket, as well as the audit logs bucket. So if we go ahead and go into our data protection output bucket, we can see here that we have all of our clean files. I uploaded files like Sabrina.txt, Sherman.png, and so on. It added that clean prefix for me to differentiate a raw file versus a clean file. I could go ahead and actually download one of these files if I wanted to.

Thumbnail 1910

Thumbnail 1940

I mentioned I added myself as a patient with patient ID 12345. I actually created a file for myself. So if I go ahead and download this text file, I can open it up for you all. Let me try to open it up on the right screen. There we go. So we can open it up here at least, and I can see here I have my patient ID. It says I'm patient ID 12345, and then it has all of my sensitive data masked for me. You can't see my name, my email address, or my phone number. You can see that my age is 100 to 109. Again, I entered myself as a patient and said I was 100 years old, so we're applying that differential privacy here. You can still see my reason for my visit and then I have patient notes where I say it's always day one.

Thumbnail 1980

I know the question going on in your head right now. Derek, how do I know if you go back to the bucket real quick, how does my system know the difference between a name like John Doe and a letter like Dear John? Comprehend, in that step where we're actually identifying the data, will actually process that for us. I just wanted you to know that was one of the questions that I got a lot as we were practicing this and going through it. How does it differentiate? The Comprehend service itself actually does that for us.

Thumbnail 2030

Thumbnail 2040

Thumbnail 2050

Q&A, Additional Resources, and Closing Remarks

Opening the floor up for questions. I know somebody's got one. It's been burning in your brain. We got a couple. Let's go here first. AWS Config. Yes. So if we actually go to our SageMaker notebook real quick, in order to save time, we built out this security infrastructure. If you open that Python file, this is where we're actually building out all the security infrastructure that we talked about in that very first step. In this case, what we're doing is taking AWS Config to monitor our configuration of our systems and services, and then we're going to figure out or we're going to be alerted if our rules are deviated from. Config does that for us now. HIPAA is a conformance pack that has a preset list of rules. In this case, I believe it's a managed conformance pack in which we actually give you that list of rules. This is what auditors have told us helps with HIPAA compliance. It doesn't make you HIPAA compliant, but those set of rules are rules that you can start monitoring for and give to your compliance teams as a way to show them that you're monitoring for that.

Next question. What if instead of masking, you needed for your application the data to be synthetic instead? Is that a huge lift to the pipeline to support that? That's not a huge lift. You can use, I would have to double check whether Comprehend supports that natively, like basically putting dummy data. It sounds like you want to do. I believe I would have to double check about that.

But that's something you could do with Glue jobs, for example, if you wanted to. So you can actually use different Glue jobs to input fake data instead of the data that you have in place. You could use a text string instead of putting 1234, you can do 4567, for example. So you can definitely do that as well if that's within your use case.

Think of this like a starter, right? We could go into many different directions based off of this pipeline right here. Absolutely. Any other questions? I thought I saw another hand. Now's your time. This may go on YouTube. It's a chance for you to be famous. No? Okay, we're good. Let's go to the next one.

Thumbnail 2160

Thumbnail 2170

This is where I get to tell you about all the fun things we're doing on the nonprofits team. We need to go back to PowerPoint. Get your phones out and get ready because we have some QR codes coming. Okay, so we have more sessions now. I know it says nonprofit, but there are many different services that we talk about. It's not just nonprofits, but if you are a nonprofit, we would love for you to come to our sessions. A lot of them are geared towards nonprofit customers, whether it's healthcare or charities, so just feel free to capture that screen or that QR code.

Thumbnail 2190

Next, if you are a member of the AWS nonprofit team, meaning you're a nonprofit customer, grab this QR code right here. It will actually allow you to communicate with us, and we can tell you who your account rep is so you can start talking to people like Sabrina and myself. We can bring in solutions architects to have architectural conversations, and then we have an area in the pavilion where you can actually come by and see us. We're actually right by Movember, so you can come talk to us and then go get a haircut.

Thumbnail 2240

SkillBuilder, who here is plugged into SkillBuilder? Got a couple, okay? If you're looking for some free trainings, there are paid trainings as well, but we have a lot of free trainings. Here's a good way to get access to that just to get yourself started, and then I think this one, yes, it says it's specific to generative AI. All right, lastly.

We have seen what we've accomplished today. You've built an AI security framework, not just slides, but production-ready code. We talked about our six layers: encryption, PII detection, access control, audit trails, automated compliance, data sanitization, and secure prompt handling. This is your competitive advantage. Here's what to do next: take these templates that we have shown you this week and pick one AI workload to try and do something against.

Lastly, I would just like to thank you all. It's been a pleasure of ours. We know that your time is valuable. It's the most valuable thing here at re:Invent, right? We're just so thankful that you joined us today. Thank you everyone. We appreciate it.


; This article is entirely auto-generated using Amazon Bedrock.

Top comments (0)