🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Supercharge security investigations with custom detection & analytics (SEC350)
In this video, Amazon GuardDuty Senior Product Manager Sujay Doshi and Principal Engineer Peter Ferrie present the service's malware protection capabilities, including newly launched features for on-demand API scanning and fully managed AWS Backup scanning. Peter provides a deep dive into GuardDuty's multi-engine detection system, including hash-based, pattern matching, machine learning, and third-party engines, explaining how they collectively detect threats such as cryptocurrency miners (45% of detections) and zero-day malware. The session highlights GuardDuty's ability to process 9 trillion events per hour during Prime Day and its native threat intelligence systems such as Mithra and MadPot. Nubank Senior Software Engineer Mikaelly Felicio shares their implementation across three countries, achieving PCI compliance and complete cloud coverage in under 12 months while maintaining a cost to serve of less than one dollar per client. The presentation emphasizes the importance of scanning both live workloads and backups to detect dormant malware that slips past traditional EDR tools, with practical guidance on incremental versus full scanning strategies for cost-effective ransomware protection.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction to Amazon GuardDuty and Its Threat Detection Capabilities
Hello. Good morning and thank you for joining this session. I hope you are all having a lovely day and are already getting plenty of steps walking through the conference, but I hope you continue to have fun and a lot of learning for the rest of the week. My name is Sujay Doshi, and I'm a Senior Product Manager with Amazon GuardDuty. I'll be completing four years this February. I'm joined today by Peter and Mikaelly. Hi, I'm Peter Ferrie. I've been with Amazon for 5.5 years, and this year marks my 40th year in anti-malware research. Thank you. I'm Mikaelly Felicio. I've been at Nubank for almost three years and I'm working as a Senior Software Engineer.
Right, thank you. So, a quick agenda. We'll start with a short overview of what the service is and a show of hands on how many of you use GuardDuty today, and more specifically, one of the malware protection offerings. Next, we'll talk about the new features we launched a few weeks ago: one is an on-demand scan API, and the other is a fully managed offering to scan your AWS backups for malware. Then we'll draw on Peter's extensive experience as, for the first time, he dives deep into the inner workings of the GuardDuty malware scan engine and how we keep up with the ever-evolving malware landscape. Finally, Mikaelly will share how Nubank uses a couple of the malware protection offerings from GuardDuty to drive their security outcomes.
Amazon GuardDuty, for those who do not know it, is a native threat detection service from AWS trusted by 95% of our top 2,000 AWS customers. By consuming logs and data across tens of thousands of customers, GuardDuty monitors hundreds of millions of EC2 instances and millions of S3 buckets to give visibility into account- and resource-level compromise. Its sophistication comes from access to native threat intelligence collected from various internal threat intelligence groups, combined with the different detection algorithms we use to detect sophisticated cloud threats.
One such example is a system called Mithra that we run internally. It's a domain reputation service, a graph neural network that monitors DNS traffic across Amazon over the last seven days and curates a list of malicious and benign domains, often days, weeks, or months sooner than a traditional threat intelligence provider. We are able to do this because of the scale of telemetry we have access to when building this intelligence. Because GuardDuty is a fully managed, first-party service, these indicators of compromise, in the form of IPs, domains, and file hashes, are automatically consumed into the threat intelligence platform that is then used to monitor customer workloads and network traffic to detect those threats.
The other example is a system called MadPot, a globally distributed network of threat sensors and honeypots that tracks attacker behavior, tactics, techniques, and procedures. It provides us indicators of compromise and behavioral insight into how attackers target the AWS cloud as a whole, which we can then apply to specific customer workloads and use to generate findings. On elasticity and auto-scaling, one anecdote: during Prime Day this year, which I hope many of you were able to enjoy, GuardDuty processed close to nine trillion events in an hour. That's a 49% increase year over year and shows how the service scales automatically with the data it consumes.
All of this intelligence ultimately relies on the telemetry and the sources we collect data from. Once an account has enabled GuardDuty, we automatically monitor the CloudTrail management events, the VPC flow logs, and the DNS query logs from your EC2 instances. Those are what we call the foundational sources.
Everything else that we have, such as S3 data plane events, Lambda monitoring where we watch Lambda function executions for network threats, and the runtime agent, comprises what we call opt-in protection plans. As a delegated administrator, you can decide to enable them on the specific workloads where you expect threat activity to manifest and want protection.
Once we collect all of this data, a key differentiator of being fully managed is that, as customers, you do not need to do any source-side configuration for us to collect and process the data. You don't need to collect the data specifically for GuardDuty. You may do so for investigation or compliance purposes, but GuardDuty does not rely on customer-vended logs. If you enable one or more of these features, we automatically pull the data from a backend stream so that you do not carry the cost of collecting the data and shipping it to GuardDuty for processing. That's a key value proposition of being a first-party, fully managed service.
All of this data goes to the analytics engine, which comprises threat intelligence and heuristics-based TTP detection algorithms. We also learn normal behavior from these logs to generate ML-based anomaly detection findings: is this normal for the organization, is this normal for this particular workload, is the communication coming from an ASN with a bad reputation, and so forth. The major focus of this session is malware scanning and malware detection.
Malware Protection for EC2 and S3: Agentless Solutions for Cloud Workloads
Historically, from GuardDuty's launch in 2017 up until 2022, all of these protection plans and features relied on telemetry data or events. In 2022, we expanded for the first time to scan a particular resource, specifically to detect malware. We'll talk about some of the malware protection offerings we have. Starting in 2022, as I said, we launched malware protection for EC2, which delivers an agentless solution to detect malware on your EC2 and container workloads that have EBS volumes attached.
As with other features, ease of use and holistic coverage across the organization remain paramount: with a single click, you can get malware protection coverage for all your accounts and workloads. Centralized monitoring allows your delegated administrator and security teams to review the findings with contextual data around the malware family and the source within the scan engine that we used; we'll talk about that later in the session.
A very big value proposition here is that there are no agents, so you don't have to install, update, or manage anything in order to detect malware. Container awareness brings key protection to your container deployments. Prior to this feature, if malware was installed, customers had to take a snapshot of their volumes, run it through whatever malware scanner was at their disposal, and then try to stitch the results back into a narrative around what they saw in GuardDuty.
For example, if a network anomaly indicated some communication to a C2 server, what would a customer want to do? They would take a snapshot, check whether it's malware or whether malware was installed as a result of this network communication. All of that with this feature was automated with a single click. Customers who have enabled this feature automatically have their EBS volumes scanned when we see a preceding network anomaly. Or you can even use scan on demand as part of your investigation or forensics workflow. There's no performance impact to your live workloads because there are no agents that are installed.
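As a rough illustration of the on-demand workflow, a boto3 call along the following lines can trigger a scan of an instance's attached EBS volumes. This is a minimal sketch: the region, account ID, and instance ARN are placeholders, and you should verify the exact API shape against the current GuardDuty documentation for your SDK version.

```python
import boto3

guardduty = boto3.client("guardduty", region_name="us-east-1")

# Kick off an on-demand malware scan of the EBS volumes attached to an EC2
# instance. The instance ARN below is a placeholder for illustration only.
response = guardduty.start_malware_scan(
    ResourceArn="arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0"
)

# The returned scan ID can be used to track the scan; results surface as
# GuardDuty findings once the scan completes.
print("Scan started:", response["ScanId"])
```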
Next, we expanded the offering to provide a fully managed solution to secure the data pipelines you build on S3, specifically for untrusted upload use cases. You might have external vendors or internal users uploading data to S3. In the first scenario, you do not want to be the receiver of something malicious, so you want to establish trustworthiness in the data uploaded to your S3. In the second, you do not want your internal users, through some supply chain compromise, uploading something like an image file that's actually malware, making you a distributor of malware.
In both cases, there's a strong need to ensure the trustworthiness and provenance of the data being sent. With this feature, you configure the malware protection plan on the buckets and we automatically monitor for any new objects that are uploaded. We scan them and raise a finding if malware is detected. Scan notifications are published through EventBridge, which you can consume in the delegated administrator account, the bucket owner account, or the member accounts.
Once we complete the scan, we attach a tag to the object itself so that you can implement quarantine and protection use cases using tag-based access control. In the bucket policy, you can ensure that no user or application can access a particular object until it carries a tag indicating it is clean. The scan engine is automatically updated with the latest signatures and detection definitions, so as a customer you don't need to manage infrastructure or update the scan engine to keep up with the latest threats. All of it is managed on your behalf, with seamless setup and minimal configuration on the buckets you want to protect from untrusted uploads.
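A hedged sketch of that quarantine pattern follows. It denies GetObject unless the object carries a "clean" scan tag, while exempting a security operations role. The bucket name, account ID, role name, and the tag key and value are assumptions for illustration; confirm the exact tag that GuardDuty Malware Protection for S3 applies in your account before relying on it.

```python
import json
import boto3

s3 = boto3.client("s3")

bucket = "example-untrusted-uploads"  # placeholder bucket name

# Hypothetical quarantine policy: deny GetObject unless GuardDuty has tagged
# the object as clean. Tag key/value below are assumptions, not guaranteed names.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUntilScannedClean",
            "Effect": "Deny",
            "NotPrincipal": {"AWS": "arn:aws:iam::111122223333:role/SecurityOpsRole"},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:ExistingObjectTag/GuardDutyMalwareScanStatus": "NO_THREATS_FOUND"
                }
            },
        }
    ],
}

s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```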
New Features: On-Demand Scanning and Fully Managed Malware Protection for AWS Backups
A few weeks ago, we expanded the same offering. Everything remains the same from a scan infrastructure perspective—you do not need to manage infrastructure or scan engine updates. However, we added another pattern to ensure that you not only scan any new object uploads for malware but can also decide to scan existing objects within an S3 bucket to detect malware. You can seamlessly integrate this into your workflows using the API to establish a security baseline for buckets that are not newly set up.
A hybrid strategy, combining continuous monitoring of new object uploads with retroactive scanning of objects already present in your S3 bucket, ensures that you have an up-to-date security baseline for buckets receiving untrusted uploads. The orchestration layer lets you drive the API according to your operating procedures and workflow requirements. The major focus for the remaining part of my talk is the fully managed offering we added to GuardDuty's malware protection suite: scanning the backups taken by AWS Backup for malware.
We launched this a few weeks ago. From the AWS Backup console, as part of your backup plan, you can ensure continuous monitoring for malware once a backup job completes. We'll scan your EBS backups or snapshots, your EC2 AMIs, and your S3 recovery points automatically for malware once you set up that configuration as part of your backup plan. You can also use on-demand scanning to investigate your past backups, which is particularly useful for safe restoration workflows.
In addition, two scan modes are available: full and incremental. Incremental scanning provides cost efficiency because we only scan the data that changed between yesterday's backup and the backup you're scanning today, so you don't pay for the entire scan every time. A full scan provides more security assurance: it's what you want before a safe restoration, because it verifies everything against the latest malware detection definitions.
The idea is that this does not pull you into the backup process. Once the backup jobs complete, the scan initiates, with minimal configuration required on either GuardDuty or Backup to use this feature. This eliminates manual scan-and-restore workflows, which are time-consuming and delay business recovery in events like ransomware, and it provides a seamless integration between these two native AWS services.
Understanding Malware Progression in Backups and the Importance of Continuous Scanning
But you might ask why you would scan backup resources for malware if you're already scanning the live workloads either through traditional EDR tools or using one of the offerings from GuardDuty that we talked about for EBS volume scanning and S3 object scanning. To understand this, we'll look at a scenario on how malware progresses through a backup. This timeline represents a typical 14-day retention timeline where each block represents a daily backup.
As you can see, in the first phase, the green phase from day 1 to 7, the attacker already has a foothold in the environment. The malware was planted but is dormant and not yet activated. We see no system changes, no impacted files, no encryption, and no malicious artifacts introduced during days 1 to 7. Because the malware is dormant, no persistence is created; the malware files are present but not yet acting on anything.
The reason we call these backups clean backups is that in a typical restoration workflow, you always want to restore in a clean and isolated environment. What that does is actually negate the effect of the malware, which is environment dependent. It also blocks the C2 communication pathways. All in all, this represents the safe recovery period and a point in time before malicious behavior surfaced in the environment.
The second phase is the yellow phase, days 8 to 10, where we start seeing early-stage artifacts appear in the environment. These might be small preparatory changes, still low noise. EDR tools or live workload scanning solutions will not generate noticeable indicators because of the low noise and the still-dormant malware, yet all of these artifacts slip into the backups. You can still make surgical fixes at this stage and restore just the critical business data, but you need the item-level recovery capability that AWS Backup inherently provides to ensure you restore only the clean portion of these backups.
Days 11 to 14 represent the ransomware event: the entire production system, and even the new backups being taken, are compromised, and it's unsafe to restore from them. In a real environment, this timeline spans multiple weeks. For example, if the initial compromise happened 3 weeks ago, the last clean backup you know of was taken 2 weeks ago, and the activation happens now. By the time the team identifies that there's a breach in the environment, it might be too late: because of the retention window you typically configure for your backups, you start losing the clean backups.
If you identify the breach one week from now, you've essentially lost the entire safe recovery window, the backups identified and tagged as clean. The key takeaway here is proactive preservation of clean backups and, more importantly, knowing whether a given backup is clean or infected, so that in an event like ransomware you can always recover from a clean copy.
This progression highlights a natural blind spot: early-stage threats slip into the backups but are not detected by live workload scanning solutions like GuardDuty EBS scanning or S3 scanning, or even your traditional EDR tools, because those are not designed to detect dormant behavior. They monitor running workloads in the environment and, in many cases, rely on observed behavior to determine that something malicious has happened.
If the threat is dormant, as in our example, it will slip into the backup. Automatic backup scanning addresses this by continuously scanning the backup data at rest in the backup repositories, abstracting the vault-level constructs so that once a backup job completes, the scan initiates automatically. It continuously tracks the changes to your backups and identifies when a backup moves from clean to unsafe to compromised, so you can define the last known clean backup as the one to restore from in the event of ransomware.
Traditional recovery solutions rely on manual restore-and-scan, which, as we noted, is time-consuming, adds cost and overhead, and delays business recovery. Automatic scanning through this feature removes those inefficiencies and continuously provides assurance about the presence of malware as backups are taken. Bringing this together, backup scanning provides a second layer of defense on top of traditional malware scanning for live workloads. They are complementary rather than competing solutions, two essential security layers in the data protection lifecycle.
Periodic Full Scans, Implementation Workflows, and Best Practices for Backup Security
We'll look at the importance of periodic full scans. While cost efficiency is a major benefit of incremental scanning, periodic full scans matter, especially for safe restoration. Imagine the first full scan happens on day one when the backup runs. From day one to day four, after the first backup, there are incremental scans, and we continue to scan just the data that changed between those days. Say a zero-day malware was planted on day four. Because the detection definitions did not yet have information about that threat, the incremental scan tags the backup as no threats found, even though the threat exists in it.
When we scanned that data, we didn't have the detection definitions, so we tagged it as no threats. Now say that after day five the malware engine is updated and we have information about the zero-day malware. If you continue to run only incremental scans, which scan only the data that differs from the previous day, we will keep attaching the wrong disposition to subsequent backups and keep tagging them as no threats found. This is where periodic full scans come into play. Say that on day seven you run a full scan. Because the engine now has the latest detection definitions, including the one for the zero-day malware, we attach the right disposition: backup seven has threats, and we also maintain the lineage back to the parent backups.
Then we continue to take incremental backups. Once you rebase the baseline through a periodic full scan, the incremental scan on day eight carries the status forward even though no new malicious file was added. So if tomorrow you decide to restore from the day-eight backup, you will know that a backup in its lineage, or that backup itself, has malware. Once you remove the malicious file from the backup source, the subsequent incremental scan automatically rebases the status.
In other words, incremental scans maintain the history of which files were detected as malware and whether they are still present. If they are, we carry the status forward; if the malicious file is gone and everything is clean, we update the status.
This lets customers restore from subsequent backups or from the last known clean backup. That's the importance of periodic full scanning: it provides assurance every time you want to restore.
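To make the carry-forward behavior concrete, here is a small conceptual sketch in plain Python, not an AWS API, of how a disposition might propagate along a backup lineage under the rules described above. The status strings and day numbers mirror the example in the talk.

```python
# Conceptual model: an incremental scan inherits the parent backup's status
# unless the newly scanned delta changes it; a full scan rebases the status
# from scratch using the latest detection definitions.
def scan_disposition(parent_status, delta_threats_found, full_scan, full_threats_found=False):
    if full_scan:
        # Full scan re-evaluates everything with the latest definitions.
        return "THREATS_FOUND" if full_threats_found else "NO_THREATS_FOUND"
    # Incremental scan: carry forward any threat found earlier in the lineage.
    if parent_status == "THREATS_FOUND" or delta_threats_found:
        return "THREATS_FOUND"
    return "NO_THREATS_FOUND"

# Day 4: zero-day planted, but definitions don't know it yet -> no threats found.
day4 = scan_disposition("NO_THREATS_FOUND", delta_threats_found=False, full_scan=False)
# Day 7: periodic full scan with updated definitions flags the backup.
day7 = scan_disposition(day4, delta_threats_found=False, full_scan=True, full_threats_found=True)
# Day 8: incremental scan carries the infected status forward along the lineage.
day8 = scan_disposition(day7, delta_threats_found=False, full_scan=False)
print(day4, day7, day8)  # NO_THREATS_FOUND THREATS_FOUND THREATS_FOUND
```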
A sample implementation workflow: GuardDuty produces an outcome in the form of findings, delivered as EventBridge notifications. You can consume those in a custom implementation, for example a Lambda function that tags the backups, and use an SCP to ensure you never restore from a backup tagged as infected. Clean backups, on the other hand, can be safely moved to an isolated environment, logically air-gapped vaults, so that the backups you restore from are known to be safe.
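One way to sketch that automation, assuming an EventBridge rule routes GuardDuty malware findings to Lambda: the handler below tags the affected recovery point so a policy can block restores from infected backups. The event field paths, finding type check, and tag key are illustrative assumptions, not the documented schema; inspect a real finding to confirm where the recovery point ARN and scan result actually live.

```python
import boto3

backup = boto3.client("backup")

def handler(event, context):
    """Hypothetical Lambda handler for a GuardDuty malware finding event."""
    detail = event.get("detail", {})

    # Assumed field locations; adjust to the real finding structure.
    recovery_point_arn = (
        detail.get("resource", {}).get("backupDetails", {}).get("recoveryPointArn")
    )
    threats_found = "Malware" in detail.get("type", "")

    if recovery_point_arn:
        # Tag the recovery point so downstream policy can act on it.
        backup.tag_resource(
            ResourceArn=recovery_point_arn,
            Tags={"MalwareScanStatus": "INFECTED" if threats_found else "CLEAN"},
        )
    # An SCP (or vault access policy) can then deny backup:StartRestoreJob for
    # recovery points tagged MalwareScanStatus=INFECTED.
```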
How it looks on the console: you simply provide a scan rule as part of your backup plan, define which resources you want backup scanning to cover, and then select the mode against a cadence of hourly, daily, weekly, or monthly. On-demand scanning works the same way: for investigation or forensics, you point it at the backup, choose a full or incremental scan, and we scan it, provided the role has the right permissions.
In GuardDuty, you can monitor the results again. Contextualization, like any other finding in GuardDuty, is paramount. We'll give the malware source information, the scan engine information, and so on and so forth. Again, if you want to restore as part of your backup processes, you go to the backup console and then you look at the backups that are clean.
You can also choose to run an on-demand scan. In fact, we recommend always running an on-demand scan at the time you restore, so the backup is verified against the latest signature set and gives you the assurance you need.
Pricing is straightforward. For all the resource types, that is EBS, S3, and EC2, it's 5 cents per GB. If you opt for incremental scanning, the first scan will still be a full scan so that on the next backup run we can determine what has changed; subsequent scans are then incremental, depending on your configuration. As an example, a 100 GB volume costs $5 for the first scan, and if only 2 GB change afterward, you pay only for those 2 GB, which is cost efficient while giving you essentially the same outcomes as full scans.
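A quick back-of-the-envelope check of that example, using the per-GB rate quoted in the session:

```python
price_per_gb = 0.05  # USD per GB scanned, as quoted for EBS, S3, and EC2 backups

first_full_scan = 100 * price_per_gb   # 100 GB volume -> $5.00
daily_incremental = 2 * price_per_gb   # 2 GB changed   -> $0.10

print(f"First full scan: ${first_full_scan:.2f}, incremental: ${daily_incremental:.2f}")
```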
Some general best practices: automated incremental scanning for cost efficiency, periodic full scans for rebaselining and assurance, and, as I said, on-demand scans at restore time during security incidents. You don't need to enable foundational GuardDuty, but we still recommend it. Think of the other findings we offer, especially for resiliency, recovery, and ransomware detection, which is not always file-based: there is fileless behavior visible in S3 data plane events, such as large-scale encryption with customer-provided keys or anomalous spikes in GetObject and ListObject calls. All of those signals and findings are available in GuardDuty through other features.
So even though foundational GuardDuty is not required, we recommend it so you can combine these signals into higher-fidelity findings around ransomware events. With the information available in the findings and in EventBridge, both the backup persona and security personnel get the relevant information about the malware. I'd now like to invite Peter to dive deep into the scan engine.
Deep Dive into GuardDuty's Multi-Engine Malware Detection Architecture
GuardDuty has multiple engines that it uses to handle different detection scenarios. The first engine is hash-based. We use it primarily to suppress false positives, so we can exclude exactly one file and no others.
We also use this engine to respond quickly. For example, if there's an outbreak, we can add detection within minutes instead of hours.
The engine itself is quite fast; we've found ways to accelerate our hashing operation, and it's resilient to minor changes in the file. What I mean by that is that typically we consider a file to be a solid block of bytes, and we hash it from end to end to get a single hash value. However, there can be cases where a file is not a solid block of bytes. Instead, it might have a squishy bit right in the middle, such as a URL or a username/password combination. In that case, our engine differs from the traditional implementation because it can be told to hash around the variable part.
Similarly, if a file has variable content at the end, perhaps an appended configuration, our engine can be told to hash only up to the point where the variable content starts. This means you can add any number of additional bytes and change any values within the variable section, and we will still compute a constant hash. For bad actors whose sites keep getting taken down and who keep cycling to new ones, we will still have detection.
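A minimal sketch of the idea, not GuardDuty's actual implementation: hash only the constant prefix of a file so that a variable appended configuration, of any length, does not change the digest. The byte strings and the prefix length are invented for illustration.

```python
import hashlib

def hash_prefix(data: bytes, constant_length: int) -> str:
    """Hash only the constant prefix, ignoring whatever is appended after it."""
    return hashlib.sha256(data[:constant_length]).hexdigest()

body = b"\x7fELF...constant dropper code..."            # the invariant part
variant_a = body + b'{"c2": "198.51.100.7"}'              # appended config, variant A
variant_b = body + b'{"c2": "203.0.113.99", "retry": 5}'  # longer config, variant B

# Both variants yield the same digest, so one hash entry covers them all.
assert hash_prefix(variant_a, len(body)) == hash_prefix(variant_b, len(body))
```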
The next engine we have is a pattern matching engine. In the event that hashing is not suitable because there might be multiple changes throughout the file, or if a file is very large but the really interesting part is right at the front, then we use the pattern matching engine for that. The engine is very flexible and supports traditional signature-based scanning with straight byte comparisons, but we can do more than that. We can perform conditional evaluations up to a certain point in the file and perform frequency analysis of the bytes we have seen up to a certain point. We can even extract bit-level characteristics of the file for very fine-grained pattern matching.
The engine is also very expressive, so the intent of a detection is clear to the next engineer who has to extend an existing detection or review it. We use this engine primarily for specific detections where the code is constant across replicants. One example is code that executes arbitrary commands received remotely without any user interaction; for us this is quite a simple case with a single detection.
However, there can also be cases where we want to use the engine for generic detections where multiple variants of the same sample differ only slightly but are otherwise functionally equivalent. Here is an example of actual malware, just a snippet of the code. This malware uses a polymorphic engine to change its appearance from replicant to replicant. The engine is able to insert random characters in the decrypted body and also change the variable names to random strings. Here is another example where you can see the constant portions are becoming smaller. Here is the extreme example where the constant portions have been reduced to the smallest possible length and all the variable names have been replaced with random strings. Despite the difference between the first example and the last one, there is still a single detection for us. It gives a sense of just how powerful this engine is.
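As a rough conceptual sketch of such a generic detection, anchored on the small constant fragments while tolerating the variable identifiers and spacing in between: the "malware" snippets below are invented stand-ins for the slide examples, and a simple regular expression stands in for the much richer pattern engine (which also supports conditional evaluation, frequency analysis, and bit-level characteristics).

```python
import re

# Generic rule: the constant anchors are "eval(" and "decrypt(", while the
# variable names between them may be any short identifier and may change
# from replicant to replicant.
rule = re.compile(
    rb"eval\(\s*decrypt\(\s*[A-Za-z_][A-Za-z0-9_]{0,32}\s*,\s*[A-Za-z_][A-Za-z0-9_]{0,32}\s*\)\s*\)"
)

variant_1 = b"key=...; body=...; eval(decrypt(key, body))"
variant_2 = b"qZx=...; Jw9=...; eval( decrypt( qZx , Jw9 ) )"

# A single rule covers both variants despite the renamed variables and junk.
assert rule.search(variant_1) and rule.search(variant_2)
```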
The next engine is a machine learning engine. Our focus with machine learning is cryptocurrency miners, because within our environment cryptocurrency miners are a big problem. If we look at this table, we can see how quickly a cryptocurrency miner can be installed: we went from a misconfigured state to a compromised state in less than 4.5 days, which means the exposed instance was found in its pristine state almost immediately and the miner was installed very quickly after that. From a dynamic analysis point of view, over 60 percent of compromised workloads involve cryptocurrency mining as part of the post-exploitation.
In fact, cryptocurrency mining is the most common post-compromise activity that we see in dynamic detection scenarios. Coming back to static analysis, I took one week's worth of scan results that had detections. Of those scan results, I took 8,000 samples entirely randomly. Of those 8,000 detections we had, 45% of them were cryptocurrency miners.
Not all of those were unique, but even if 90% of them were identical, that still leaves over 300 unique samples that someone would have to add detection for manually from this subset week over week if we were to do this one at a time. That's a huge problem. But fortunately for us, we have a machine learning model. I would love to say it's so awesome that it detects all of these, but it's not, at least not yet. Still, it's doing some very heavy lifting for us, and for the ones that it doesn't detect, we have our pattern matching engine and the hashing engine to clean up the rest.
The next engine we have is a third-party engine. We received this engine as an SDK from the vendor. It runs as part of our regular scan flow, but we run this engine in an entirely isolated instance. It's not able to connect with the cloud provider, and it's not able to connect to the internet at all. The only thing we get out of the instance is the finding results, and we are very careful about what we'll accept from the finding results to minimize the chance of data exfiltration.
We use this engine primarily for historical detections because the vendor has seen many samples that we might never see. We don't automatically collect samples from customers, and we don't share these samples with the vendor. So in the event that a customer is not willing to share a sample with us, then we might never see the sample at all. The SDK also offers heuristic detections, and they use a machine learning model for this as well. We are working to develop our own heuristics specific to our environment, but at least for now, these existing heuristics are a very powerful baseline for us.
Runtime Detection and Network-Level Scanning for Comprehensive Threat Protection
We also get parsers in this SDK for common file formats such as archives, images, and documents. Again, we're working on our own parsers specific to our environment to get better scanning performance. But we also have file formats that the vendor might never see because we don't share the samples, so we're working on specific parsers for those as well.
That covers our static detection for EC2 and S3; Mikaelly, my co-presenter, will talk more about a specific use case for S3. But we have more than just static scanning. We also have a runtime service, which runs on the instance and has system-wide visibility into everything that's happening as it happens. We get file system and network events, and it's container aware, so we can see inside EKS and ECS. We collect events from the system into signals, and a collection of signals gets correlated into a finding.
Here's an example of a finding using our latest extended threat detection, which was presented at an earlier talk this morning. You can see the level of detail that we have here. We have a detection for a newly downloaded file. There was a file system event when the file was downloaded that triggered a hash calculation and a hash lookup. The hash was found on our denial list and resulted in this finding.
Here's what the findings used to look like, and you can see why we changed it. If you squint in the top right-hand corner, you can see a reverse shell was created on the system. We have file system events when the file was created. We had system events when the process was started, and we had network events when the process connected to the website and when it redirected the standard IO to point to a particular port on the system. That collection of events resulted in this finding.
This finding shows another detection, for a process that tried to connect to a cryptocurrency website. The network event intercepted when the process attempted to connect to the site is what produced this finding. However, for customers who don't want to run a runtime service on their instances, we have network-level scanning as well. It doesn't rely on explicit scanning; it simply monitors the activity as it happens.
The network scanning is able to watch for network connections to sites with poor reputation. It's able to detect credential use in anomalous configurations. And it's also looking for data access that is not within the typical behavior patterns on the system. Here's an example of a finding where an instance tried to connect to a cryptocurrency website. Here's an example of a finding where an instance tried to connect to a command and control website. And finally, here is a finding where an IP address that's on our denial list attempted to call an API that is commonly used just prior to an exploitation attempt.
Nubank's Implementation: Achieving Global Compliance and Seamless Malware Detection
This shows how all of our engines work together to protect customers, and that's how we keep up with the threat landscape. Now I'll pass to Mikaelly Felicio, who can describe a specific use case for S3. I'm from Nubank's security team, and I'm going to present a use case for EC2 and S3. But first, let me introduce us. Nubank was born from the discomfort and feeling of powerlessness that come with bureaucracy in financial services. We started in 2013 in Brazil with the goal of simplifying people's lives and empowering them financially.
From 2013 to today, we went from a credit card issuer to the most valuable brand in the country. Seeing how successful we were, we began to look at other markets. Because Latin America shares similar problems, we started there, and now we have a presence in three countries: Brazil, Mexico, and Colombia. We surpassed our goal of 100 million clients while keeping a customer-centric philosophy and a cost to serve of less than one dollar per client.
The challenge we're presenting here is that as Nubank scaled, adding more financial products and more countries, we started running into that growth and the different regulators in each country. This year, we had the opportunity to build a single process for PCI, because it's essentially the same in each country. The main PCI point I'm covering here is threat detection. Beyond the regulatory requirements we had to meet, we also had the company's own requirement: our security team must not impact or add friction to growth.
So we were searching for a tool that would be effortless to integrate into our systems. GuardDuty was the smoothest option and the most integrated with our environment. To begin with the implementation strategy:
The main point to highlight about the Nubank implementation is that we treat each country as a separate organization in AWS. If we configured GuardDuty through AWS Organizations, we would have an administration point per country. As a global team, it's simpler to go with an invitation-based implementation, which is shown in step 3, while in practice the accounts with the resources stay in their original organizations. Moving on to the monitoring of findings.
For monitoring the findings, we have a dual path. As a cloud security team, we have to enable the other security teams and keep this monitoring constant. The main, official path for the security teams' incident response goes through a bucket and feeds RCM, where they can act on it. The other path provides visibility and allows us to stay promptly aware, so we can assist the security teams and the production teams involved in an incident.
Beyond implementing the detection, we also prepared automated response. For S3, we automated the whole process with a first response using a tag-based pipeline: once tags are applied to an S3 object, a restrictive policy ensures that only SOC teams can interact with the file and perform the analysis. If they determine it's a false positive, they also update access to the object in storage.
For EC2, we took a more conservative approach, because an automated first response could lead to an outage. So we made isolation available in the playbook, but it's only applied once the team is engaged in the incident. To wrap up our use of GuardDuty: we started with a regulatory request and ended up revising and improving our malware analysis process at Nubank. Because GuardDuty was already integrated with our environment, we caused no disruption to production teams; we didn't break anything, and it was very seamless.
We also achieved global compliance and can roll out to any country we intend to. This completely changed our coverage in the cloud, because we didn't have this before. By enabling it, we were able to give production teams a new capability: they can autonomously enable document triage for third-party files and be more secure. The whole process was designed and enabled in less than 12 months, and we still kept the cost to serve below one dollar per client.
Key Takeaways: Fully Managed Protection and Enhanced Security Operations
Some key takeaways from this session: the fully managed aspect reduces your cost of ownership for detecting malware threats in your workloads and, with the recently launched feature, specifically aids in ransomware scenarios. You can now combine live workload scanning with backup scanning, using a single scan engine that learns across all of these offerings to collectively provide higher efficacy and higher fidelity against malware-based threats.
It complements, as Peter walked us through, the other offerings from GuardDuty, combining and correlating signals to provide visibility into the full breach scenario: pre-exploitation, compromise, and post-exploitation. This improves security operations with contextual information about the resources, the malware family, the scan engine that contributed, and the attacker. Thank you for attending this session. I'd like to invite my fellow speakers to join us.
; This article is entirely auto-generated using Amazon Bedrock.