🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Fast-track to insights: AWS-SAP data strategy (ANT333)
In this video, Satish and Bjoern explore AWS and SAP's 17-year partnership for enterprise data integration. Satish demonstrates AWS Glue Zero-ETL, a fully managed, no-code service that replicates SAP data to S3, S3 Tables, or Redshift in near real-time (under 10 seconds), supporting sources like SAP OData, Oracle, MySQL, and Salesforce. Bjoern introduces SAP Business Data Cloud, launched on AWS in February 2025, which creates data products combining SAP and non-SAP data using a lakehouse architecture with medallion layers (bronze, silver, gold). The solution leverages the Open Resource Discovery protocol and Delta Sharing for zero-copy consumption, enabling AI agents to understand data semantics through knowledge graphs. Both solutions aim to unify scattered data across SAP HANA, DataSphere, BW, and AWS native services like Redshift, QuickSight, and SageMaker AI for comprehensive analytics and generative AI workloads.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
AWS and SAP Partnership: Leveraging Zero-ETL for Unified Data Integration
Hello everyone. Thanks for joining today's session. Does anyone know how long the partnership between AWS and SAP has been? It's 17 years. In today's session, we will be talking about how you can leverage this 17-year deep partnership to build some of your enterprise use cases. I'm Satish, and I work as a Principal Specialist SA with AWS Analytics. Bjoern, SAP's CTO for Business Data Cloud, will be joining me as well. Together we will walk you through some of the AWS and SAP features that will help you fast-track insights wherever your data is, whether it's in SAP or non-SAP systems.
Let's get started. Before we dive into the specific features, let's understand why customers are choosing AWS for SAP workloads. AWS offers the broadest set of SAP data platforms, including SAP HANA, DataSphere, and BW. You have all kinds of options available to you in AWS. Beyond the SAP products, you also have AWS native services for security, governance, and for building your data strategy, data pipelines, and generative AI workloads. You have over 200 services in AWS, giving you comprehensive capabilities to build your use cases.
Customers appreciate the comprehensiveness of this partnership. We have thousands of customers across different industries, including healthcare, automotive, technology, and retail. For example, Moderna, one of the first COVID-19 vaccine makers, which delivered millions of doses around the world in a very short time, runs its supply chain optimization on SAP using the AWS Cloud. Similarly, we have Toyota and Volkswagen from automotive and Adidas from retail, all running mission-critical workloads on AWS.
We talked about comprehensiveness. Let's understand some of the choices you have to begin with. In SAP, we have been supporting SAP DataSphere, BW, and SAP HANA. All those products have been supported for quite some time. But this year, SAP Business Data Cloud was first launched on AWS in February 2025. With Business Data Cloud, you will be able to build data products to derive insights from your data very quickly.
Then the next set of options you have are AWS Native Stack for data processing. You have AWS Glue, EMR, and for data warehousing you have Redshift. For visualization you have QuickSight. If you would like to build AI/ML or generative AI workloads, you can use SageMaker AI. That's the comprehensive set you have at hand to build any kind of use cases.
The choice doesn't end here. You have SAP options and AWS native options, and you also have options with providers such as Qlik, Collibra, Snowflake, and Databricks. That makes the total technology stack comprehensive enough to solve virtually any use case you have.
Choice is great, but what you do with that choice really matters. Whether you are deriving any value and insight out of those choices makes a big difference. Data is your unique asset, and your data might be spread across different sources, some in SAP and some in non-SAP. What we always tell customers is don't leave your data scattered. Onboard that data into a unified lakehouse and build insights out of it.
Customers often come back and tell us: that's great, we would like to bring all our data together and derive insights, but onboarding that data is super complex because of the different technologies, skill sets, and people involved. That's where AWS Glue offers native connectors across all these sources, so you can quickly build pipelines and bring the data into a lakehouse, whether it's S3, S3 Tables, or Redshift. In today's session, I will be talking about a new feature called Zero-ETL, a fully managed service that helps you replicate data from a variety of sources into a unified lakehouse.
Let's understand how Zero-ETL works. When you set up Zero-ETL for the first time, it creates a snapshot of your source data and seeds it into your target, whether it's S3, S3 Tables, or Redshift. That is the first thing that happens when you set up the Zero-ETL integration.

From there on, whenever a change happens at the source, there is continuous, incremental replication from the source to the target. This replication often happens in less than 10 seconds, providing near real-time replication. You also have the choice of picking the specific data objects you would like to replicate. We particularly recommend this approach for SAP: select only the specific data objects you want to unify rather than replicating all your data. Finally, you can choose the refresh interval based on your data refresh requirements, whether you need near real-time replication or a daily refresh.
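For readers who prefer to script the setup rather than click through the console, here is a minimal, hedged sketch of what creating a zero-ETL integration could look like with boto3. It assumes the AWS Glue CreateIntegration API; the connection and database ARNs, names, and region are illustrative placeholders rather than values from the session.

```python
# Hedged sketch: creating a Glue zero-ETL integration programmatically.
# Assumes the Glue CreateIntegration API; all ARNs and names below are
# illustrative placeholders, not values from the demo.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

response = glue.create_integration(
    IntegrationName="sap-purchase-orders",  # hypothetical integration name
    # Glue connection to the SAP OData source (placeholder ARN)
    SourceArn="arn:aws:glue:us-east-1:123456789012:connection/sap-odata",
    # Glue database backed by S3 as the target (placeholder ARN)
    TargetArn="arn:aws:glue:us-east-1:123456789012:database/sap_lakehouse",
    Description="Replicate SAP purchase orders into the lakehouse",
)

# The integration first seeds a snapshot, then switches to incremental replication.
print(response.get("Status"))
```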
Let me show you a quick demo. This is the AWS Glue management console. On the left-hand side, you have the option to select zero-ETL integrations. You start by creating a zero-ETL integration, and for SAP, the connection is through SAP OData. I have the connection pre-created, so I'm selecting that connection along with the IAM role that has access to the SAP instance. Now I'm choosing only the specific object I'm interested in. In this case, that's purchase orders, so I'm selecting purchase orders, and I can preview the data before bringing it over from SAP. You can see the preview here. Now you select the target. For this demo, I'm selecting S3 as the target, cataloged in a Glue database. That's the Glue database I selected, and I'm specifying the IAM role that has access to write into the Glue database. The rest of the details are auto-populated, but you can choose the specific target table name. You can either keep the source table name as is or rename it; in this case, I'm renaming it to purchase orders so it shows up as purchase orders in my target table.
Now you select the refresh interval. The default is one hour, but you can choose 15 minutes. I'm editing it to 15 minutes and giving the integration a name of your choice. That's it. You've selected the source, target, and refresh interval, and you click next. You can review all the details of what you selected, and then click on create zero-ETL integration. As I mentioned, it will first create the seed from source to target, and as soon as that seed is complete, the status becomes active and the data is ready for querying. Everything is through the UI with no code required. It's straightforward. Once the status is active, you can click on the target and see the list of tables. You see purchase orders and also a metadata table which contains the status of the replication and how many records got replicated. If I click on the table data, it will open Amazon Athena. There you can see the data source and the different tables that got replicated. You can see here that the replication completed for purchase orders, and this is the volume of data that got replicated. You can also query the actual purchase orders table by right-clicking and viewing the preview. You will see the different manufacturers and the plant IDs, the entire dataset that got replicated into the Glue database.
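The same query shown in the Athena console can also be run from code once the integration is active. Below is a minimal sketch using boto3; the database name, table name, and results bucket are hypothetical placeholders.

```python
# Minimal sketch: querying the replicated table with Athena from Python.
# Database, table, and output bucket names are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT * FROM purchase_orders LIMIT 10",
    QueryExecutionContext={"Database": "sap_lakehouse"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:
        print([field.get("VarCharValue") for field in row["Data"]])
```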
The point we would like to drive home through this demo is that the replication is straightforward, without any code involved, and it's a fully managed, no-code service for you. Before I hand over, I would like to give you a comprehensive overview of the different sources and targets available for zero-ETL replication. You can replicate from AWS native databases or from third-party applications such as Salesforce. At this year's re:Invent, we announced support for self-managed databases like Oracle, MySQL, SQL Server, and PostgreSQL, whether they are hosted on EC2 or on-premises. You can use zero-ETL to replicate data seamlessly from these sources to the target of your choice.
SAP Business Data Cloud: Solving Data Integration Through Data Products
With that, I hand over to Bjoern to talk us through SAP Business Data Cloud. Thank you. Thank you, Satish, and thanks for having me. I'm really excited to be here today. My name is Bjoern Friedmann. I'm the CTO for Business Data Cloud, and today I would like to talk about how you can integrate SAP data with non-SAP data. As Satish was showing, there's quite an easy way to use zero-ETL to replicate a subset of the data from SAP into AWS. However, if we look at SAP data and how it's typically being used, we very often see that there's a wide variety of different downstream consumption, so you have different user personas, you have application users, you have analysts, you could have developers or data modelers, and typically they used to put all the data into various buckets.
However, the problem with SAP data is it's so specific in terms of semantics and so complex that in order to really make use of it in a reliable way, you have to recreate the semantics over and over again. Now with the introduction of Business Data Cloud, we actually solved this data integration problem for all data holistically. I want to introduce how we did that.
In principle, what we want to do with Business Data Cloud is make it super easy for SAP customers to use SAP data, either on its own or in combination with non-SAP data. We also provide capabilities to seamlessly use that data with other providers like AWS, for instance. The goal is to unblock AI, and with that agents, so they can make use of this data and derive meaningful actions by fully understanding the semantics of the data.
On the bottom layer you see the SAP applications. This could be S/4HANA, it could be Ariba, SuccessFactors, all of these things. In Business Data Cloud we then basically create so-called data products, which are a 360-degree overview of all the source data coming from various systems. Ultimately we then feed the data plus the semantics of the data into a knowledge graph, which then can be hydrated by agents in order to really take action on top of this data by fully understanding the semantics of the data.
So what is Business Data Cloud? Business Data Cloud consists of a lakehouse layer. On the bottom part of the slide you see data products. We manage these data products in an object store, and I will come to this in a second when we look at the architecture. On top of that, we have the so-called business data fabric. Business data fabric comes with various options for you as customers to consume this data.
This could be, for instance, our cloud warehouse, which is called DataSphere. You can then put analytics on top with SAP Analytics Cloud, but we also provide a path for BW modernization into the cloud. Ultimately, we also offer SAP data products as built-in, native citizens of the AI and ML solution in our stack, in order to enable all these AI and agentic use cases. In addition, we're working on enabling other partners like AWS to make use of this data.
Ultimately on top what we have is a comprehensive application layer we call these intelligent apps. Intelligent apps could either be industry solutions or it could be any kind of application built on top of our Business Technology Platform, but it can also be third-party applications consuming or producing data products.
So what are data products? You can think of data products as more than just the relational datasets you would typically extract out of SAP. We ship these data products as a fully managed service, so it's basically a SaaS offering. We also enrich these data products with the semantics of the source system. The metadata format we use for this is open source: it's called Open Resource Discovery, and you can look it up on the internet. It's a fully open protocol invented by SAP, and we use it as the foundation for integrations with partners like Databricks or AWS.
In addition to this, we also offer a variety of APIs to access these data products. For instance, you can access data products via Delta Sharing. That allows for zero-copy consumption in various downstream services, be that apps, warehouses, or other applications. We will also allow SQL-based access on top of data products.
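As a rough illustration of what zero-copy consumption over Delta Sharing can look like from the consumer side, here is a sketch using the open-source delta-sharing Python client. The profile file and the share, schema, and table names are hypothetical placeholders, not actual Business Data Cloud identifiers.

```python
# Hedged sketch: consuming a shared data product via the Delta Sharing protocol.
# Requires the open-source client: pip install delta-sharing
# The profile file and share/schema/table names below are placeholders.
import delta_sharing

# Credentials/profile file issued by the data product provider (hypothetical name)
profile = "bdc_share_profile.share"

# Discover what the share exposes
client = delta_sharing.SharingClient(profile)
for table in client.list_all_tables():
    print(table)

# Load one shared table into pandas without first copying it into a warehouse
df = delta_sharing.load_as_pandas(f"{profile}#sales_share.gold.sales_orders")
print(df.head())
```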
Now what we achieve with this is basically we established what we call a data product economy. If we look on the upper part of the slide on the left side, you see the SAP-managed data products. Those are data products which we ship out of the box as a managed service for all SAP line of business solutions which are native cloud citizens. So that means SAP S/4HANA Cloud, private cloud edition is one of the sources, SuccessFactors, Ariba, Concur, but also all the other line of business solutions which are running in public cloud. These public cloud sources can be hydrated and can be used as a source for fully managed SAP data products. In addition, we also allow customers to create what we call custom managed data products.
These could come from on-premises sources, such as S/4HANA on-premises, but it is also possible to integrate BW-based data, and ultimately you can also integrate third-party data. By enabling the combination of these various data products in a zero-copy fashion, we also enable partners to create data products and share them back through the Open Resource Discovery protocol I mentioned, and through Delta Sharing we enable bidirectional sharing.
Architecture and Implementation: The Medallion Approach in Business Data Cloud
Now let's look at the architecture and how we set this up. As I was mentioning, Business Data Cloud is built around the lakehouse solution. That means the heart and center of data products is an object store where we actually manage these data products. It was very important for us to decouple the ingestion layer from the consumption layer. The consumption layer can be applications, warehouses, or partner solutions, and for this it was very important that we establish a stable interface that is completely distinct from the source layer.
In order to manage this, here's what happens: a consumer, which could be an application or an administrator using the so-called BDC cockpit, which is our administration interface, triggers the installation of a data product. That calls over into the BDC layer, where we have an orchestration layer which then manages the whole data integration chain. It starts with calling an API in the source system, in this case S/4HANA. Let's say a customer wants to install a sales order data product. We would tell S/4HANA to start collecting all the data for the sales orders. Sales orders consist of various tables in S/4HANA, so S/4HANA would start collecting the data and pushing it into the object store.
The data lands in the so-called bronze layer, which is the raw data or ingestion layer, and on top of this we apply the medallion architecture. For those of you who are not familiar with it, the medallion architecture allows you to decouple the ingestion layer from the consumption layer. What happens technically is that we take all the raw data, which could come in as CSV files, Parquet files, or completely unstructured data, and run a transformation as part of a Spark pipeline. This transformation is part of the data product definition, which tells the machinery, in our case Spark, how to transform the raw data into a data product.
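To make the bronze-to-silver step more concrete, here is an illustrative PySpark sketch of the kind of transformation described above. The paths, column names, and cleanup rules are assumptions for illustration only, not Business Data Cloud internals.

```python
# Illustrative sketch of a bronze-to-silver transformation in a medallion layout.
# Paths and columns are hypothetical; the real pipeline is defined by the
# data product definition and run by BDC's managed Spark machinery.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Bronze: raw sales order extracts as pushed by the source system
bronze = spark.read.parquet("s3://lakehouse/bronze/sales_orders/")

# Silver: cleaned, typed, deduplicated - a stable representation of the source
silver = (
    bronze
    .filter(F.col("sales_order_id").isNotNull())
    .dropDuplicates(["sales_order_id"])
    .withColumn("order_date", F.to_date("order_date"))
)

silver.write.mode("overwrite").parquet("s3://lakehouse/silver/sales_orders/")
```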
Ultimately, in the silver layer this is where you find the so-called primary data product. A primary data product is more or less a one-to-one representation of the source data, but with a stable API for consumption. Then ultimately in the gold layer you could find so-called derived data products. What's a derived data product? Let's say you want to combine sales orders coming from S/4HANA with employee data coming from SuccessFactors or with third-party data coming from whatever sources you might think of. If you want to combine these data, that's possible because you can point to a primary data product as an input for a derived data product. You can then run transformations and with that you would create a derived data product which you find in the gold layer.
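Continuing the same hypothetical example, a derived data product in the gold layer could be built by joining two primary data products, for instance sales orders with employee data. The table paths and join keys below are placeholders.

```python
# Sketch of a derived (gold) data product: joining two primary data products.
# All paths and join keys are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gold-derived-data-product").getOrCreate()

sales_orders = spark.read.parquet("s3://lakehouse/silver/sales_orders/")
employees = spark.read.parquet("s3://lakehouse/silver/employees/")

# Derived data product: sales orders enriched with the responsible employee
sales_by_employee = sales_orders.join(
    employees,
    sales_orders.created_by == employees.employee_id,
    how="left",
)

sales_by_employee.write.mode("overwrite").parquet(
    "s3://lakehouse/gold/sales_orders_by_employee/"
)
```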
After this transformation has been done, we then expose the data product with its semantics in a central catalog service which is called Unified Customer Landscape, and there we register the metadata as part of this document. This is now also the service which allows consumption in other downstream applications or partner applications. With this, let me end the session for today with a call to action. Here you'll find a couple of links if you're interested in either how Business Data Cloud works or what zero-ETL integration means and how it looks. This is where you can find these links. With that, let me thank you and remind you that you can give feedback for the session in the normal channels. Thank you so much.
; This article is entirely auto-generated using Amazon Bedrock.