AWS re:Invent 2025 - Reimagining SIEM architecture using AWS S3 Buckets (SEC346)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.


Overview


In this video, Vega introduces their federated SIEM architecture that reimagines security operations using AWS S3 buckets. Eli and Eran explain how their Security Analytics Mesh (SAM) technology enables organizations to index and query security logs directly in S3 without centralizing data, eliminating traditional SIEM ingestion costs. They demonstrate how customers achieved 60-80% cost reduction by avoiding AWS egress fees and SIEM taxes while maintaining full detection, threat hunting, and incident response capabilities. The platform connects to multiple data repositories including Splunk, Microsoft Sentinel, Databricks, and Snowflake, providing cross-correlation through a unified KQL interface. Real-world examples include an e-commerce platform achieving 70% cost savings and a Fortune 500 pharmaceutical company monitoring 500+ AWS accounts without data movement.

This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Introducing Vega: Challenging the Monolithic SIEM Architecture

Hello everyone. We are Vega. We're a startup founded in January 2024, and we've raised $65 million from Cyberstarts, Accel, Wavepoint, and others. We're here to completely change how you do security operations and security analytics. Today we're going to talk about how you can reimagine your SIEM architecture using AWS S3 buckets, something we all already have, and how you can increase your visibility and get much more out of your data without breaking the bank.

Just a bit about us. I'm Eli, co-founder and CTO at Vega. I've been building Vega with my team for almost two years, since January 2024. I'm based in New York, originally from Tel Aviv. I've been in cybersecurity for more than 15 years, starting on the offensive side and now on the defensive side. I've done a lot of low-level engineering, which will show up in today's talk. And this is Eran. Hi everyone. I'm Eran, head of threat detection and research at Vega. For the past 10 years I've worked in incident response and threat hunting, and I've also managed a SOC. I led incident response engagements at a global IR firm, and I've been at Vega for about a year and a half.

When we started Vega, we spoke with hundreds of enterprises about the challenges they face with their SIEM and their security operations programs. We identified three problems common to almost every customer we spoke with. The first is growing security telemetry volume. With migration to the cloud and multi-cloud environments, the types and volume of security telemetry grow significantly year over year. You have cloud logs, VPC flow logs, and many other logs you want to generate security insights from, but doing so is challenging, often because of the cost constraints that come with the traditional SIEM architecture.

The traditional monolithic architecture requires you to send all these logs to your SIEM and pay for every byte you send, so you are constantly working within those cost constraints. Another common problem we've seen in many organizations is that, as a result of growing data volume and cost constraints, they keep their security telemetry in multiple data repositories. Maybe they have some in each cloud environment, or a number of different SIEMs. We've seen organizations with nine different SIEMs plus several data lakes. Imagine trying to run security operations across all of that.

All of these problems stem from the outdated monolithic architecture of the SIEM, which forces organizations, engineers, and security operations teams to work under certain assumptions about what they can and should do with their security telemetry. The most important assumption we are challenging at Vega is security telemetry centralization. Why do you need to centralize and copy every byte of data you want to monitor? Why can't you query data in multiple data repositories instead of sending it all over the internet to one central location?

Another question is why storage must be coupled with detection engineering and threat hunting capabilities in the same monolithic system. Data technologies have already decoupled storage from compute, so why do security operations teams still rely on a single monolithic product that ties their storage, and its fees, to the security processes and outcomes they want from that data?

We're also asking why you can't use multiple data repositories and storage options, maybe even your S3 buckets, to achieve the outcomes you'd expect from a traditional SIEM by querying and correlating the data where it is. What happens when you have multiple data repositories? Maybe that's due to M&A: you've acquired a company and it runs a different SIEM. Maybe data residency requirements force you to keep data in multiple repositories. You still want to cross-correlate data across all of them, build detections on top of it, hunt for threats, and respond to real incidents, which means you need to be able to query all of it. Fundamentally, the outdated assumptions built into the monolithic SIEM architecture make these problems very hard for both the threat detection team and the architecture team to solve with today's traditional tools.

Federated Architecture in Action: Cost Reduction Through S3-Based Security Operations

Let's talk about how this affects real AWS customers today. Take AWS as an example of how to use a federated architecture for security operations. Consider an organization that wants to apply detections, threat hunting, and incident response across its AWS environment. First, it needs to enable the relevant logs: CloudTrail, VPC flow logs, and GuardDuty. Enabling those services comes with AWS fees. The services generate logs and audit records, and those logs need to be retained in AWS storage such as CloudWatch and S3 buckets. So the organization has services producing logs and storage holding them, but the team cannot really use those logs for security purposes while they simply sit in that storage.

In the traditional approach, the logs have to go to the SIEM. The team starts shipping all the logs out of AWS and into the SIEM, which incurs AWS egress costs for the data movement. Then come the most significant costs: SIEM ingestion, and SIEM storage because the data needs to be retained for a long time. Most SIEM solutions today also require additional security modules, which cost extra. So if a security team wants security operations capabilities over its AWS environment, it has to pay in six different places (AWS service fees, AWS storage, egress, SIEM ingestion, SIEM storage, and SIEM security modules) just to do this one simple thing.

Now imagine an organization with 200, 400, or even 750 AWS accounts, and consider how much it costs just to get the data to a place where it can be used. This is one example of what we set out to fix with our federated approach. Let's see how it looks when the logs are not shipped to the SIEM. You still enable the services and store their logs in AWS storage, but that's it. What we do is run an indexer that indexes those logs exactly where they already live. We use indexing compute in the same cloud and the same region to read the logs from the S3 buckets and write an index for them into the same storage, so the index is built without shipping any logs outside of AWS.
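The talk does not describe how Vega's indexer stores its index, so the following is only a minimal sketch of the general idea of indexing logs in place, assuming newline-delimited JSON log objects. A Python dict stands in for the S3 bucket, and the indexed field names, object keys, and the `build_index`/`lookup` helpers are hypothetical. The index maps `field=value` terms to byte ranges and is written back into the same bucket; a lookup then slices out only those ranges, which is the role S3 ranged GET requests could play in a real deployment.

```python
import json
from collections import defaultdict

# A dict stands in for an S3 bucket (object key -> bytes). In practice the
# indexer would run in the bucket's own region; that part is not modeled.
Bucket = dict[str, bytes]

# Hypothetical choice of CloudTrail-style fields to index.
INDEXED_FIELDS = ("eventName", "sourceIPAddress", "userName")


def build_index(bucket: Bucket, log_key: str, index_key: str) -> None:
    """Index an NDJSON log object and store the index in the same bucket."""
    data = bucket[log_key]
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    offset = 0
    for line in data.splitlines(keepends=True):
        record = json.loads(line)
        for field in INDEXED_FIELDS:
            if field in record:
                # Each term points at the byte range of the record containing it.
                index[f"{field}={record[field]}"].append((offset, offset + len(line)))
        offset += len(line)
    bucket[index_key] = json.dumps(index).encode()


def lookup(bucket: Bucket, log_key: str, index_key: str, term: str) -> list[dict]:
    """Slice out only the indexed byte ranges; against real S3, this is
    where ranged GETs would fetch just those bytes."""
    index = json.loads(bucket[index_key])
    data = bucket[log_key]
    return [json.loads(data[start:end]) for start, end in index.get(term, [])]


bucket: Bucket = {
    "logs/cloudtrail.ndjson": (
        b'{"eventName": "ConsoleLogin", "sourceIPAddress": "203.0.113.7", "userName": "alice"}\n'
        b'{"eventName": "GetObject", "sourceIPAddress": "198.51.100.2", "userName": "bob"}\n'
        b'{"eventName": "ConsoleLogin", "sourceIPAddress": "198.51.100.2", "userName": "bob"}\n'
    )
}
build_index(bucket, "logs/cloudtrail.ndjson", "index/cloudtrail.json")
hits = lookup(bucket, "logs/cloudtrail.ndjson", "index/cloudtrail.json", "eventName=ConsoleLogin")
print([h["userName"] for h in hits])  # ['alice', 'bob']
```

Storing byte offsets rather than copies of records keeps the index small relative to the logs, and the raw data never leaves the bucket.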

The second part of the process is running queries, threat hunting, incident response, and detections. Here Vega's console, which runs externally, reaches into that storage and its index to run queries and federated detections, support incident response, and essentially do everything a security team needs for detection and response over AWS.

All that remains are the AWS service fees, storage in the S3 buckets, and some compute, a small fraction of what it costs when the data has to be shipped. In this scenario, an organization with 750 or more AWS accounts can achieve these security operations capabilities with a 60 to 80% cost reduction. There's no SIEM tax.
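The cost centers described above can be compared with the federated setup using a small cost model. This is a hypothetical sketch: the `MonthlyCosts` fields mirror the cost categories named in the talk, but the dollar amounts in the example are placeholders rather than customer figures, and the federated platform's own license fee is deliberately not modeled.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCosts:
    """Hypothetical monthly costs in USD; placeholders, not figures from the talk."""
    aws_service_fees: float       # enabling CloudTrail, VPC flow logs, GuardDuty
    aws_storage: float            # CloudWatch / S3 retention
    egress: float                 # shipping logs out of AWS
    siem_ingestion: float
    siem_storage: float
    siem_security_modules: float
    indexing_compute: float       # in-region indexing (federated model only)


def centralized_total(c: MonthlyCosts) -> float:
    """The six places money is spent when logs are shipped to a SIEM."""
    return (c.aws_service_fees + c.aws_storage + c.egress
            + c.siem_ingestion + c.siem_storage + c.siem_security_modules)


def federated_total(c: MonthlyCosts) -> float:
    """Logs stay in S3: service fees, storage, and in-region compute.
    The federated platform's own license fee is not modeled here."""
    return c.aws_service_fees + c.aws_storage + c.indexing_compute


def reduction(c: MonthlyCosts) -> float:
    """Fractional saving of the federated model versus the centralized one."""
    return 1 - federated_total(c) / centralized_total(c)


example = MonthlyCosts(
    aws_service_fees=10.0, aws_storage=10.0, egress=20.0,
    siem_ingestion=40.0, siem_storage=15.0, siem_security_modules=5.0,
    indexing_compute=5.0,
)
print(centralized_total(example), federated_total(example), reduction(example))
```

With these placeholder inputs, centralized spend is 100.0 and federated spend is 25.0, a 0.75 reduction; real results depend entirely on actual volumes and pricing, plus whatever the federated tooling itself costs.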

Let me share two use cases as examples. Both are organizations we're working with that use S3 buckets with the federated architecture. The first is a major e-commerce platform. It had a SIEM that it used for everything, but it also needed a longer retention period, so it was shipping everything to S3 buckets as a backup. When we started talking with them, we pointed out that all their logs were already in very low-cost, high-performance storage, yet they were also sending the same logs to the SIEM, which cost them a great deal of money.

So we started indexing all of their logs in S3 (firewall, EDR, Office 365, every data source they had there) exactly where the logs lived. This made their SIEM entirely redundant, because they could now run detections, threat hunting, and incident response directly on top of S3. They removed the SIEM entirely, and the immediate result was a 70% reduction in total cost. They kept the same security capabilities and gained more: with the savings, they can afford longer retention periods and can onboard data they could not fit in their SIEM before, such as load balancer logs, WAF logs, or EDR event telemetry. The extra budget, combined with the ability to operationalize data in low-cost, high-performance storage, opened up new opportunities for them.

The second use case is a Fortune 500 pharmaceutical company with a very large SIEM ingesting dozens of terabytes per day. At some point they realized they could not afford to send all their AWS logs to the SIEM because of the enormous cost, so they were looking for a way to run security operations over AWS without shipping the data to the SIEM. We came in and started indexing all of their data across more than 500 AWS accounts: CloudTrail, VPC flow logs, GuardDuty, CloudWatch, RDS logs, and every other source they needed for day-to-day security.

They gained full visibility and full detection capabilities without shipping a single log out of AWS. Everything stayed in place, with no egress cost and no SIEM tax. You might point out that this creates a new problem: the security team now has some logs in S3 and others in the SIEM. That is data fragmentation, and it forces the team to work in two consoles, which is exactly what we wanted to avoid. So Vega can also connect to an existing SIEM, an existing data lake, and object storage, and cross-correlate information that does not live in the same place.
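Cross-correlating information that lives in different places can be pictured as a join performed after each source is queried where it lives. The sketch below is hypothetical: it pairs CloudTrail-style events (as they might come back from an S3 index) with SIEM alerts that share a source IP and fall within a time window. The SIEM-side field names `src_ip`, `timestamp`, and `rule`, the 30-minute window, and the sample records are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def correlate(s3_events: list[dict], siem_alerts: list[dict],
              window: timedelta = timedelta(minutes=30)) -> list[tuple[str, str, str]]:
    """Pair S3-resident events with SIEM alerts that share a source IP
    and occur within `window` of each other."""
    # Group alerts by IP so each event only checks alerts for its own IP.
    alerts_by_ip: dict[str, list[dict]] = {}
    for alert in siem_alerts:
        alerts_by_ip.setdefault(alert["src_ip"], []).append(alert)

    matches = []
    for event in s3_events:
        event_time = datetime.fromisoformat(event["eventTime"])
        for alert in alerts_by_ip.get(event["sourceIPAddress"], []):
            alert_time = datetime.fromisoformat(alert["timestamp"])
            if abs(event_time - alert_time) <= window:
                matches.append((event["eventName"], alert["rule"], event["sourceIPAddress"]))
    return matches


# CloudTrail writes timestamps with a trailing 'Z'; datetime.fromisoformat
# accepts that only on Python 3.11+, so these samples use +00:00 instead.
s3_events = [
    {"eventTime": "2025-12-01T10:05:00+00:00", "eventName": "ConsoleLogin",
     "sourceIPAddress": "203.0.113.7"},
    {"eventTime": "2025-12-01T15:00:00+00:00", "eventName": "GetObject",
     "sourceIPAddress": "203.0.113.7"},
]
siem_alerts = [
    {"timestamp": "2025-12-01T10:20:00+00:00", "rule": "Suspicious PowerShell",
     "src_ip": "203.0.113.7"},
]
print(correlate(s3_events, siem_alerts))
```

Only the login 15 minutes before the alert is paired; the later `GetObject` call from the same IP falls outside the window.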

Security Analytics Mesh (SAM): Unified Security Operations Across Multiple Data Repositories

We focus on three core capabilities. The first is correlation across storage. If you have logs in S3, in your SIEM, in Elastic, in Snowflake, and in other cloud platforms, you can still run one query that pulls those logs together and cross-correlates them, so fragmentation no longer matters. The second is hybrid data lakes: you can have one data lake on S3, another on Elastic, and your SIEM, each built on a different technology with a different query language, and everything still works seamlessly.
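One way to picture querying data lakes with different schemas as if they were one is a shared field vocabulary. The sketch below is a hypothetical illustration, not Vega's schema: it renames source-specific fields to common names so one filter can apply to records from any repository. `sourceIPAddress`, `eventTime`, and `userIdentity.userName` are CloudTrail fields, and `IPAddress`, `TimeGenerated`, and `UserPrincipalName` are columns in Microsoft Sentinel's `SigninLogs` table, but the target names (`src_ip`, `timestamp`, `user`) and the mappings themselves are assumptions.

```python
# Hypothetical mappings from source-specific fields to a shared vocabulary.
FIELD_MAPS: dict[str, dict[str, str]] = {
    "cloudtrail": {
        "sourceIPAddress": "src_ip",
        "eventTime": "timestamp",
        "userIdentity.userName": "user",
    },
    "sentinel_signinlogs": {
        "IPAddress": "src_ip",
        "TimeGenerated": "timestamp",
        "UserPrincipalName": "user",
    },
}


def get_path(record: dict, path: str):
    """Follow a dotted path such as 'userIdentity.userName'; None if absent."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def normalize(source: str, record: dict) -> dict:
    """Rename a record's fields to the shared vocabulary, tagging its source."""
    out = {"source": source}
    for src_path, common in FIELD_MAPS[source].items():
        value = get_path(record, src_path)
        if value is not None:
            out[common] = value
    return out


rows = [
    normalize("cloudtrail", {"sourceIPAddress": "203.0.113.7",
                             "eventTime": "2025-12-01T10:05:00Z",
                             "userIdentity": {"userName": "alice"}}),
    normalize("sentinel_signinlogs", {"IPAddress": "203.0.113.7",
                                      "TimeGenerated": "2025-12-01T10:07:00Z",
                                      "UserPrincipalName": "alice@example.com"}),
]
# One filter now works across both sources.
print([r["source"] for r in rows if r.get("src_ip") == "203.0.113.7"])
```

Keeping the mapping as data rather than code means a new source only needs a new `FIELD_MAPS` entry.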

The third is a unified security view: one platform, a true single pane of glass, even across different data lakes. It's one console, one query language, and one interface for the team to work in. You may be wondering how this works behind the scenes.

This is what we've been building for the last two years, and we call it SAM, the Security Analytics Mesh. Its architecture gives you the flexibility to connect to your data wherever it makes sense to keep it. You might think you now have to rip and replace your existing SIEM, but you don't. As mentioned earlier, a core principle at Vega is recognizing that you have already invested heavily in your EDR platform, your existing SIEM, and perhaps a data lake where you ship some of your logs, possibly using a pipeline tool like Kafka to push them to S3 buckets. Vega can instantly operationalize all of these data repositories without moving the data.

We can connect to any SIEM: Splunk, Microsoft Sentinel, Google Chronicle, or Elastic. We can also connect to next-gen SIEM and XDR platforms such as the SentinelOne data lake or CrowdStrike Next-Gen SIEM. All the telemetry you already have there can stay there. Over time, if it makes sense to move it to S3 buckets or a data lake, that's up to you. You now have the flexibility to choose where you store your data.
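Connecting to many repositories without moving data suggests a connector per backend plus a layer that fans a query out and merges the answers. The sketch below shows that pattern under simple assumptions: `Connector`, `InMemoryConnector`, and `federated_search` are hypothetical names, and the in-memory connectors stand in for real Splunk, Sentinel, data lake, or S3-index clients, whose APIs are not modeled here.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class Connector(ABC):
    """Hypothetical interface each backend adapter would implement."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def search(self, field: str, value: str) -> list[dict]:
        """Run an equality filter in the backend and return matching rows."""


class InMemoryConnector(Connector):
    """Stand-in for a real SIEM, data lake, or S3-index client."""

    def __init__(self, name: str, records: list[dict]):
        super().__init__(name)
        self._records = records

    def search(self, field: str, value: str) -> list[dict]:
        return [r for r in self._records if r.get(field) == value]


def federated_search(connectors: list[Connector], field: str, value: str) -> list[dict]:
    """Query every backend in parallel and tag each row with its origin."""
    def run(conn: Connector) -> list[dict]:
        return [{**row, "_source": conn.name} for row in conn.search(field, value)]

    with ThreadPoolExecutor() as pool:
        per_backend = list(pool.map(run, connectors))  # preserves connector order
    return [row for rows in per_backend for row in rows]


splunk = InMemoryConnector("splunk", [{"user": "bob", "event": "vpn_login"}])
s3_index = InMemoryConnector("s3-index", [
    {"user": "bob", "event": "ConsoleLogin"},
    {"user": "alice", "event": "GetObject"},
])
for row in federated_search([splunk, s3_index], "user", "bob"):
    print(row)
```

Each backend runs the filter itself and returns only matches, so the merge step handles result rows rather than raw logs.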

The real power of Vega comes with data lakes and object storage. We can connect to data lakes such as Databricks or Snowflake. If your CIO decides you're going all in on a specific data lake, that's fine; it should not change how the security team works or how it builds and manages detections. Most importantly for AWS customers, Vega shines when you already have a lot of data in object storage.

Most, if not all, telemetry born in the cloud is naturally born in object storage. Vega lets you keep your cloud logs and VPC flow logs in those object storage buckets. We have our own indexing technology, so you don't need to ship the data to any data lake or SIEM: we connect to the data and index it within the same cloud and the same region. This architecture makes far more sense, because there is no need to send data all over the internet.

Think about a multi-cloud environment. We have a customer using all three major clouds across tens of regions, with thousands of cloud accounts in each provider. For them, we can operationalize all of their data locally and save a lot of money, potentially millions or even tens of millions of dollars, while still achieving all the security outcomes they want and gaining visibility into all of their security telemetry.
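Keeping index compute next to the data can be expressed as a planning step that groups log locations by cloud and region and schedules one local indexing job per group, so no logs cross a regional boundary. This is a hypothetical sketch: the bucket names and the `plan_local_indexing` helper are illustrative, and a real scheduler would also have to handle accounts, credentials, and capacity.

```python
from collections import defaultdict
from typing import NamedTuple


class LogLocation(NamedTuple):
    cloud: str
    region: str
    bucket: str


def plan_local_indexing(locations: list[LogLocation]) -> dict[tuple[str, str], list[str]]:
    """Group buckets by (cloud, region) so each group gets its own
    indexing job running in that same cloud and region."""
    plan: dict[tuple[str, str], list[str]] = defaultdict(list)
    for loc in locations:
        plan[(loc.cloud, loc.region)].append(loc.bucket)
    return dict(plan)


locations = [
    LogLocation("aws", "us-east-1", "cloudtrail-logs"),
    LogLocation("aws", "us-east-1", "vpc-flow-logs"),
    LogLocation("aws", "eu-west-1", "cloudtrail-logs-eu"),
    LogLocation("gcp", "europe-west1", "audit-logs"),
]
for (cloud, region), buckets in plan_local_indexing(locations).items():
    print(f"indexer in {cloud}/{region}: {buckets}")
```

Grouping by both cloud and region means even two regions of the same provider each get their own local job.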

You also gain more control because you own your indexes: the index buckets can stay in your own AWS account. That gives you more control, more flexibility over where your data lives, and lower cost. All of these data repositories are accessible through a single KQL query interface. Our user interface and user experience are very good, and people love them, especially analysts who already know piped query languages such as Splunk's SPL or Microsoft Sentinel's KQL. Because we use the same KQL, it's easy to get started instantly and get your SOC familiar with the system.
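How Vega compiles KQL for each backend is not described in the talk. As a hypothetical sketch of the general technique, the code below parses a tiny subset of KQL (a table name followed by `where field == "value"` clauses joined with `and`) and emits an equivalent SQL query, as might be sent to a data lake, and an SPL search, as might be sent to Splunk. Treating the KQL table name as a Splunk index name is an assumption, and anything outside the subset is rejected rather than guessed.

```python
import re

# Matches one condition of the form: field == "value"
CONDITION = re.compile(r'^\s*(\w+)\s*==\s*"([^"]*)"\s*$')


def parse_kql(query: str) -> tuple[str, list[tuple[str, str]]]:
    """Parse 'Table | where a == "x" and b == "y"' into a table and filters.
    Values containing '|' or ' and ' are not supported by this sketch."""
    table, *stages = [part.strip() for part in query.split("|")]
    filters = []
    for stage in stages:
        if not stage.startswith("where "):
            raise ValueError(f"unsupported stage: {stage!r}")
        for cond in stage[len("where "):].split(" and "):
            match = CONDITION.match(cond)
            if match is None:
                raise ValueError(f"unsupported condition: {cond!r}")
            filters.append((match.group(1), match.group(2)))
    return table, filters


def sql_literal(value: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def to_sql(query: str) -> str:
    table, filters = parse_kql(query)
    where = " AND ".join(f"{field} = {sql_literal(value)}" for field, value in filters)
    return f"SELECT * FROM {table}" + (f" WHERE {where}" if where else "")


def to_spl(query: str) -> str:
    table, filters = parse_kql(query)
    terms = " ".join(f'{field}="{value}"' for field, value in filters)
    return f"index={table}" + (f" {terms}" if terms else "")


query = 'cloudtrail | where eventName == "ConsoleLogin" and sourceIPAddress == "203.0.113.7"'
print(to_sql(query))
print(to_spl(query))
```

A production translator would build a proper syntax tree and use parameterized queries; the point here is only that one query can be compiled into each backend's dialect.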

We also use AI and LLM capabilities within the product to make asking simple questions really easy. Most importantly, this is the foundation for AI-native security operations. Once you have a single semantic data layer across all your security telemetry in every data repository, you can achieve the AI outcomes you want from your data. AI triage, AI detections, and AI-driven response are only as good as the data the AI and LLMs can access. We're almost out of time, but if you'd like to chat more, we'll be at booth 1928. We're also giving out hot dogs, so I'd love to see you there and talk about your SIEM challenges and how Vega can help. Thank you.

This article is entirely auto-generated using Amazon Bedrock.