🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Modernize your data warehouse by moving to Amazon Redshift (ANT317)
In this video, Manan Goel and Satesh from the Amazon Redshift team, together with Yannick Misteli from Roche, discuss modernizing analytics data warehouses with Amazon Redshift. The session covers key trends from a Harvard Business Review survey showing 83% of data leaders prioritize generative AI and agentic AI, while the data foundation remains a challenge. Demonstrations include Zero-ETL integration for simplifying data pipelines from 23 supported sources including Salesforce, DynamoDB, and PostgreSQL, plus the Redshift MCP Server enabling natural language queries through Amazon Bedrock. Charter Communications' migration delivered a 35% cost reduction and 18% SLA improvements, and Roche shares their modernization journey consolidating 300 data sources, decommissioning five legacy platforms, and processing three million Redshift queries daily. The presentation emphasizes ELT over ETL architecture, Lambda UDFs for AWS Translate integration, and Redshift Spectrum for bridging data lakes with warehouses.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Modernizing Analytics with Amazon Redshift
Welcome to the session titled Modernize your Analytics Data Warehouse with Amazon Redshift. Over the next hour or so, we'll talk about how Amazon Redshift as a modern data warehouse helps you to break down data silos, bring data together, analyze it in different formats, and deliver transformational business value.
Just a quick introduction of the speakers today. My name is Manan Goel. I'm a Principal Product Manager with the Amazon Redshift team. I've been with Redshift for about seven years now. We also have a couple of other presenters, so I'm really excited. Satesh will join us. He's a Principal Solutions Architect with Redshift, and he'll talk about some of the capabilities of Redshift. Finally, Yannick from Roche is also going to join us. So let's kick it off.
First, just a quick lay of the land in terms of the agenda for today. We'll start by talking about some of the key trends that we are hearing from data and analytics leaders when we talk to them about their data analytics needs. What are we hearing from customers? Then we'll talk about how Redshift is evolving and adding new capabilities and features to deliver on the trends that we are hearing from customers. Third, you'll get an opportunity to see Redshift in action. We know agentic AI is top of mind for everyone, so we'll show you some of the use cases of Redshift with live demos in terms of how you can use Redshift for agentic AI use cases using Redshift's MCP server and things like that.
Finally, we'll turn it over to Yannick to take us through Roche Pharmaceuticals' journey around data warehouse migration and modernization. They have done some really phenomenal work in terms of consolidating multiple data warehouse environments across multiple countries, so we'll get a chance to hear from him about some of the best practices around migration and modernization. Finally, we'll close it up with the next steps and how to get started with your modernization and migration journey with Redshift.
Key Trends in Data Analytics: AI Adoption and Data Foundation Challenges
All right, so let's start with the trends. I wanted to start with a survey that Harvard Business Review recently did with over 600 data and analytics leaders like yourselves, and there were two key trends that prominently emerged in that survey. First of all, 83%, or more than eight out of ten data analytics leaders in that survey, said generative AI and agentic AI are key strategic initiatives for their organization. I'm sure you are also seeing these trends in your organization where everybody is looking at these technologies as foundational technologies of our times in terms of getting more insights from the data and really delivering transformational customer experiences.
But interestingly enough, in the same survey, Harvard Business Review also found out that for more than five out of ten customers, their data foundation remains a stumbling block in terms of getting value out of data analytics. So there's a lot of work that needs to be done in terms of getting your data foundation ready so you can take advantage of these foundational technologies like generative AI and agentic AI.
Now when Harvard Business Review dug deeper into the key things that these data analytics leaders are working on as far as getting their data foundation ready, four key things jumped out. They said they really want highly curated, high quality data because agentic AI and generative AI are basically based on the value of the data. The better data that you have, the better insights you're going to get. So data quality and curation remains a key initiative that these data leaders are working on.
Other things like breaking down data silos, removing data fragmentation, and getting data together in one place, whether you move it or provide federated access to it, remains a key area that leaders are focusing on. The third thing is data security and governance. One of the things that you're doing with generative AI and agentic AI is you are exposing it to a lot more consumers, whether they are human consumers or machine consumers within your organization or outside your organization. So in that context, governance and security is really important.
And finally, again with agentic AI and generative AI, what we are seeing is the order of magnitude of the queries is increasing quite dramatically.
So being able to deliver these capabilities at scale while being cost effective is also very important, and that's where Redshift as a modern data warehouse comes into play. Redshift out of the box provides you with a number of capabilities in each one of these areas to help you build a very strong data foundation.
Amazon Redshift Architecture: Building a Modern Multi-Warehouse Lake House
Just looking at the Redshift architecture, starting in the center, I wanted to point out a few things that are foundational capabilities of Redshift, which makes it really easy for you to build a strong data foundation. First of all, Redshift gives you the ability to build this multi-warehouse lake house architecture. You can move away from single monolithic data warehouse environments and more into this modern distributed data warehouse architecture.
We give you capabilities such as separation of storage and compute so you can scale one independent of the other and get the best price performance depending on your workloads. We give you a lake house architecture, which means that you can both store and process structured as well as unstructured data, either in a data warehouse or in data lakes in open formats like Iceberg with S3 Tables and general-purpose S3 buckets and things like that.
Also, a foundational capability of Redshift is this multi-warehouse architecture where we see a lot of customers move away from single monolithic clusters to this multi-cluster architecture which is much more scalable, reliable, and cost efficient. You can use provisioned as well as serverless compute depending upon your use cases. We also give you features like data sharing where you can have a single copy of the data yet have multiple computes go against it with workload isolation so you don't have the noisy neighbor problems or resource contention in terms of dashboarding workloads being impacted by data science workloads and things like that.
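As a concrete sketch of that single-copy, multi-compute pattern, the data sharing setup boils down to a few SQL statements, shown here submitted through the Redshift Data API; the workgroup names, share name, and namespace GUIDs are placeholders rather than values from the session.
```python
import boto3

redshift_data = boto3.client("redshift-data")

# On the producer warehouse: publish a schema as a datashare and grant it to
# the consumer warehouse's namespace (the GUID below is a placeholder).
redshift_data.batch_execute_statement(
    WorkgroupName="etl-producer",
    Database="dev",
    Sqls=[
        "CREATE DATASHARE sales_share",
        "ALTER DATASHARE sales_share ADD SCHEMA public",
        "ALTER DATASHARE sales_share ADD ALL TABLES IN SCHEMA public",
        "GRANT USAGE ON DATASHARE sales_share TO NAMESPACE 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'",
    ],
)

# On the consumer warehouse (e.g. a dashboarding workgroup): expose the share
# as a local database, without making a copy of the data.
redshift_data.execute_statement(
    WorkgroupName="bi-consumer",
    Database="dev",
    Sql=(
        "CREATE DATABASE sales_from_share FROM DATASHARE sales_share "
        "OF NAMESPACE '11111111-2222-3333-4444-555555555555'"  # producer namespace GUID
    ),
)
```
Queries on the consumer side then reference sales_from_share like any other database, which is what gives you workload isolation without maintaining extra copies.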
Beyond that, we also give you the ability to do different kinds of analytics, so SQL analytics, Spark analytics, AI/ML, generative AI analytics, and the ability to bring data together from multiple sources in your organization, whether they are relational sources or streaming sources with capabilities like Zero-ETL and streaming ingestion. Finally, from an infrastructure perspective, we use AI and ML capabilities extensively in the data warehouse to automatically scale the data warehouse up and down depending upon your requirement.
So all in all, Redshift gives you a strong platform in terms of the key needs that you have around building a modern data warehouse architecture and modernizing your analytics. But of course, being AWS, we're not stopping with what we have. We continue to work back from your requirements and continue to bring out new capabilities to further improve the data warehouse and further improve the analytics.
Recent Innovations: 60+ New Features Across Performance, Lake House, and AI/ML
Just in the last year, the team has been actively working and we launched over 60 different features in these four main areas to give you additional capabilities to further simplify and make it easy for you to build a modern analytics architecture. Some of these things I just wanted to highlight. If you're building a multi-warehouse architecture, we're giving you performance parity across this distributed architecture. On the serverless side, we're improving the performance of serverless and expanding the RPU ranges.
Lake house analytics remains a key investment area for us, so treating Iceberg as a first class citizen as far as the data warehouse is concerned and giving you both read and write capabilities on Iceberg. On the near real-time analytics side, we're continuing to expand the sources of databases that we support, making it easy for you to move data with one click. We're adding support for on-premises as well as EC2-based databases like Oracle or SQL Server, supporting them as sources, and supporting business applications like Salesforce and ServiceNow as sources and seamlessly moving data from them into the analytics environment so you can quickly do analytics.
Finally, on the AI/ML and generative AI side, we're also delivering capabilities like MCP Server and integration with Amazon Bedrock so you can run LLMs within the data warehouse or you can build your agentic applications easily. Of course, the proof is in the pudding. The product continues to evolve based on your requirements. I also wanted to share a customer success story with you in terms of how one of our customers modernized their analytics infrastructure using Redshift and what results they are seeing in their environment.
Charter Communications Success Story: 35% Cost Reduction and 18% SLA Improvement
The example that I have is from Charter Communications. For those who are not familiar with Charter Communications, it's a United States-based large telecommunication and mass media company. They provide internet, phone, and cable TV services here in the US to millions of customers. You can imagine for them, scalability, reliability, and being able to do these kinds of things in a matter of minutes or seconds is really important. Think about when a new Netflix show comes up and becomes extremely popular. Everybody wants to watch it, and you want to be able to scale up so you don't have any streaming issues or things like that when people are watching it.
This customer went through their migration and modernization journey, and some of the results that you're seeing here include a 35% reduction in cost, 18% SLA improvements while moving 600 terabytes of data from their on-premises system, over 500,000 queries, 40,000 objects, and things like that. It's pretty phenomenal in terms of what they were able to deliver. I just wanted to give a quick background around what their environment looked like before and after they went through the migration and what results they were able to see.
This is what their architecture looked like in the past when they were running an on-premises data warehouse. You can see a lot of duplication here, a lot of inefficiencies in terms of multiple pipelines for batch processing versus near real-time processing, and a lot of issues that you see in a monolithic data warehouse environment. They had shared resources, multiple workloads competing for similar resources, missing SLAs as a result of that, not having the elasticity and scalability that their business requires when events like the Super Bowl or a new Netflix show comes out, and then poor disaster recovery options, high operating costs, and things like that.
This is their final architecture when they completed their migration to Amazon Redshift. What you'll notice is a much simpler, cleaner architecture. The duplication across those multiple pipelines for batch and real-time is gone. There's a single pipeline which is doing both batch as well as real-time processing. They have this multi-cluster architecture, a hub and spoke architecture with a centralized data lake cluster which is ingesting all the data. On the spoke side, they have these serverless data warehouse environments which give them workload isolation, purpose-built environments giving them better performance and even saving costs. For example, when they're using a serverless data warehouse for dashboarding, if nobody runs the dashboards on the weekend, the compute goes into a sleep state and they don't pay anything for compute. So it's pretty phenomenal in terms of business results.
These are some of the results that they saw: 18% improvement in SLA. The elasticity and scalability, which used to take days or months in the past, now happens in minutes or seconds. The disaster recovery has improved quite a lot with dramatic improvements in RTO and RPO. They are able to bring new products to the market a lot faster and can give new capabilities to their customers much more quickly. For example, we launched Iceberg materialized views this week, and they're able to quickly take these kinds of capabilities and roll them out into their environment so their customers can benefit from it. Finally, they're reducing operating costs by over 35%. So it's pretty phenomenal in terms of the results that this customer saw by moving to Amazon Redshift and then also modernizing with Redshift and using it for AI, ML, and generative AI use cases.
Business Use Case: Optimizing Marketing Campaigns for a Sporting Goods Company
All right, so what I'm going to do next is turn it over to Satesh to actually walk us through some product demonstrations so you can see some of these capabilities live in action. Thank you, Manan. All right, so we'll start with the business use case. Let's assume you own a sporting goods company. You sell tennis rackets, basketballs, cricket bats, everything anybody could play with. You're doing pretty well. Your sales are good. You're a billion-dollar company, but you have a new CEO who joined, and she wants to increase sales further. Who doesn't want more money, right? Everybody wants more money, so she asked your sales head and IT head to come up with a plan on how you can improve the sales revenue.
They reviewed your business processes and IT processes and identified two gaps which can maximize revenue for your organization. One is
the time it takes for your campaigns to reach your customers is not timed well. To give an example, if a game event is going on today and your personalized ad campaigns reach at the end of the game or the next day, you will not be able to benefit from maximum revenue out of it, right? So that is one gap. The second gap is your marketing teams are not able to quickly put together personalized promotions for your customers. If you could optimize and solve these two problems, your revenue will increase. So that's the outcome from the sales head.
Then your CEO asks your IT head, why is this? How can we solve this problem? Fundamentally, the main reason for the slow and mistimed marketing campaigns is long, complex pipelines which pull data from a variety of channels in your organization, so that needs to be optimized. The second thing is your marketing analysts rely heavily on your IT teams to run queries, run data insights, and give them the data and feed them the data because they do not have enough skill sets to access all the data sources that are available in your organization. So these are the findings.
Let's see how Amazon Redshift can help solve these problems for your organization. This is how your ETL pipeline currently looks. Your customer data is in structured data sources. You have your web applications and social data feeding into your game analytics, and you have telemetry coming from different applications, a variety of sources, pretty classic, right? Then you have four different ETL technologies which process this data, different technologies, different skill sets, data landing in between, and eventually reaching your Redshift data warehouse. So the time it takes to make the data available to your end marketing team is so long because of these kinds of pipelines.
Zero-ETL Integration Demo: Simplifying Data Pipelines from Multiple Sources
What if you have a magic bullet which can solve this problem for you, which is nothing but Zero-ETL integration? It is a fully managed service offered by AWS to replicate data from a wide variety of sources into Amazon Redshift. As we speak today, we support around 23 sources from which you can replicate data into Redshift. Those sources are AWS native databases, third-party sources like Salesforce, SAP, and ServiceNow, and also some on-premises data sources like Oracle and SQL Server. That's a new announcement that happened at this re:Invent. So there's enough choice for you to simplify your ETL pipelines.
Let's see how this happens in action. What you are seeing is an AWS Glue console, and on the left-hand side you see Zero-ETL integration. If you click on that, it will give me the screen to create Zero-ETL integration. I click on create Zero-ETL integration and select Salesforce as the source. So we are going to get the campaign data from Salesforce. I select the Salesforce connection and the IAM role which has access to the instance.
Now you don't need to replicate the entire data. You only replicate the data that is specific to your use case. Here I'm selecting accounts, contacts, opportunities, and campaigns, so I can pick and choose the specific objects that are required for me. While I'm selecting the objects, I can also do a quick preview to see what data I'm getting. You can click on preview so that you will have a firsthand look at what you are replicating and you can do quick sanity checks.
All right, so you've selected the source. Now you need to select the target. Our target for this use case is Amazon Redshift. You can also choose S3 or S3 Tables. So I'm selecting the Redshift data warehouse in this account, and it shows the sources. Then you click next. You can choose your own encryption key or leave the defaults and refresh frequencies. You can go as low as one second, and give a name to the integration. Here I'm naming it Salesforce integration, though you can give it any name, and click next. That's it. You're pretty much done with setting up the ETL pipeline to pull the data from your source and create the Zero-ETL integration.
It will take around 10 to 12 minutes initially, and once it is active, it is ready for consumption. All you need to do is create a database and you can do it right on the screen to query the data that is replicated from Salesforce. You can give again any name of your choice. I'm giving Salesforce DB and create the database.
So this completes the setup. Now we'll go to Amazon Redshift and start querying this data. Click on Zero-ETL integration. It will show the integration that we created. It is active and the database is also active. Click on the integration, query the data. It will open the query editor, the Redshift navigator. You will see the tables which got replicated from Salesforce. Here we selected account, campaign, contact, and opportunity. You can just right-click on one of the sources and see the data.
It's not just about replicating the data. You also get observability on this integration. If you click on the Zero-ETL integration, it will show the number of tables that got replicated and the volume of data that you pulled from Salesforce. You can monitor from this console itself, or you can monitor using system tables or CloudWatch. So you have observability baked into this feature.
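If you prefer SQL over the console, the same monitoring can be done from the warehouse itself; a minimal sketch, assuming the zero-ETL system views documented for Redshift (SVV_INTEGRATION for integration state, SYS_INTEGRATION_ACTIVITY for refresh activity) and placeholder connection details:
```python
import redshift_connector

# Placeholder connection details; IAM or Secrets Manager auth would be typical in practice.
conn = redshift_connector.connect(
    host="analytics-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# One row per zero-ETL integration, including its current state.
cur.execute("SELECT * FROM svv_integration")
print(cur.fetchall())

# Recent replication activity for each integration.
cur.execute("SELECT * FROM sys_integration_activity LIMIT 20")
print(cur.fetchall())
```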
Now we saw how you can get the data from Salesforce, but what if you have multiple sources? That's where we will show how you can bring additional data, be it from DynamoDB or your structured data source like PostgreSQL. You can get data from all those sources as well. Let's have a quick look at how you can get that data. I pre-created the Zero-ETL integrations in similar fashion. If you see here, your channel data is coming from DynamoDB, structured data from PostgreSQL. You can click on one of them and click Query Data.
What happens when you do the Zero-ETL replication from all these sources is that they end up as separate databases inside the cluster. Each source shows up as its own database, and here you can see the data from DynamoDB, the channel tables. If you expand the next one, the customer database, you will see all the customers and orders which you pulled from Aurora PostgreSQL, and then Salesforce. You saw a detailed demo on how we pull the data. The point here is you can get the data from multiple sources and run a single query across all these databases from your Amazon Redshift cluster. You don't need to go to the individual sources and do a lot of ETL work around that.
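To make that concrete, here is a sketch of such a single query using Redshift's three-part database.schema.table names; the database, table, and column names are invented for illustration rather than taken from the demo.
```python
import redshift_connector

conn = redshift_connector.connect(
    host="analytics-wg.123456789012.us-east-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# Cross-database queries use three-part names (database.schema.table), so one
# statement can join the databases created by each zero-ETL integration.
cur.execute("""
    SELECT cu.customer_name,
           ev.channel,
           ca.name          AS campaign,
           SUM(o.amount)    AS total_spend
    FROM   customer_db.public.customers      cu
    JOIN   customer_db.public.orders         o   ON o.customer_id  = cu.customer_id
    JOIN   channel_db.public.channel_events  ev  ON ev.customer_id = cu.customer_id
    JOIN   salesforce_db.public.campaign     ca  ON ca.id          = ev.campaign_id
    GROUP  BY cu.customer_name, ev.channel, ca.name
    ORDER  BY total_spend DESC
    LIMIT  10
""")
for row in cur.fetchall():
    print(row)
```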
Empowering Marketing Analysts with Natural Language Queries
Coming back to your use case, this solves the first problem. Now you have a simplified ETL process which can pull the data from different sources and load it into your centralized data warehouse. But your problem is not solved yet. You only solved the first part. The second part is that your marketing analysts should be able to operate on this data and quickly generate insights. You are able to get the data faster, but can they consume this data? The answer is no.
The reason is right now, the process they follow is they reach out to your team asking, "Hey, I need data for these customers who are attending this game. I want to generate a personalized promotion plan for them. Give me the data." Then your team will go ahead and run a bunch of SQL queries across these data sources and get the data out to them, and then they'll be able to work on their promotion plans. Is this real time? The answer is no. You still have humans in the loop, and they process this data and equip your marketing team with all the information.
So how can you solve this problem? What if a marketing analyst asks the question in natural language, and then your data warehouse understands that question, automatically runs the queries behind the scenes, and generates the output for them in natural language? Then they don't need to go through your team or your IT teams to do all that querying, and you can also eliminate the delay that is occurring in your promotion process. Let's see how this can be achieved using Amazon Redshift Serverless. How many of you know what MCP is? Just a show of hands. All right, so MCP is the buzzword now. Everybody understands, but very quickly, it is a standardized protocol for applications to communicate with LLMs.
Amazon Redshift launched the Redshift MCP Server in June 2025, and we have seen significant adoption since that time. What happens behind the scenes? Let's say a marketing analyst asks the question in natural language from a front-end tool or any client, let's say Amazon Q, Claude, Visual Studio Code, or Claude Desktop, any of the clients. Then the natural language prompt goes to the LLM on Amazon Bedrock, and it says, "Hey, to answer this question, I need X, Y, Z tools, and I need to run them in a specific order to address this question." The LLM is the brain behind identifying the tools to solve your problem and orchestrating the tools.
Then it responds back to your client saying, "Hey, this is the set of tools you need to use, and this is the sequence you need to run them in, and that will help you solve the problem." Then the client uses the Redshift MCP Server's API and makes calls to your data warehouse, your data warehouse runs those tool calls and returns the data back to your end client, and your end client shows it to your marketing analyst. All these things happen
behind the scenes. You as a marketing analyst don't need to worry about it. The client, Amazon Bedrock, Redshift, and the Redshift data warehouse work as a unit to address your question in the natural language prompt. So this is how you can simplify it. We'll do a bit more detailed demo on how you can see this in action.
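As a rough sketch of that loop, the example below wires a Bedrock model to an execute_query tool directly with the Converse and Redshift Data APIs rather than the actual MCP wire protocol, which clients like Amazon Q or Claude Desktop implement for you; the workgroup name, database, model ID, and tool schema are all assumptions for illustration.
```python
import time
import boto3

bedrock = boto3.client("bedrock-runtime")
redshift_data = boto3.client("redshift-data")

# One illustrative tool, analogous to the MCP server's execute_query tool.
TOOL_CONFIG = {
    "tools": [{
        "toolSpec": {
            "name": "execute_query",
            "description": "Run a SQL statement on the Redshift serverless warehouse.",
            "inputSchema": {"json": {
                "type": "object",
                "properties": {"sql": {"type": "string"}},
                "required": ["sql"],
            }},
        }
    }]
}

def execute_query(sql):
    """Run SQL via the Redshift Data API and return the raw records."""
    stmt = redshift_data.execute_statement(
        WorkgroupName="analytics-wg", Database="dev", Sql=sql)  # placeholder names
    while True:
        desc = redshift_data.describe_statement(Id=stmt["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(0.5)
    if desc["Status"] != "FINISHED":
        return {"error": desc.get("Error", "query failed")}
    return {"rows": redshift_data.get_statement_result(Id=stmt["Id"])["Records"]}

messages = [{"role": "user",
             "content": [{"text": "Who are our top ten customers and how often do they buy?"}]}]

while True:
    response = bedrock.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",  # any tool-capable Bedrock model
        messages=messages,
        toolConfig=TOOL_CONFIG,
    )
    assistant_message = response["output"]["message"]
    messages.append(assistant_message)
    if response["stopReason"] != "tool_use":
        # Natural-language answer for the marketing analyst.
        print(assistant_message["content"][0]["text"])
        break
    # The model asked for tools: run each request and feed the results back.
    tool_results = []
    for block in assistant_message["content"]:
        if "toolUse" in block:
            call = block["toolUse"]
            tool_results.append({"toolResult": {
                "toolUseId": call["toolUseId"],
                "content": [{"json": execute_query(call["input"]["sql"])}],
            }})
    messages.append({"role": "user", "content": tool_results})
```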
Redshift MCP Server in Action: From Natural Language to Automated Insights
Because we are all technologists, we need to understand a bit more than just the final output. So I'm in Amazon Q. You can use any of the MCP clients. So let me go ahead and first ask what tools I have. I can just ask for tools, and it gave me the results. I'll pause so that you can take a good look and grab the information. First you see the built-in tools, and below you see the tools that are specific to Redshift MCP. You have execute query, and you have list databases, schemas, and tables, so a variety of tools is available. While I'm running this demo, pay attention to which tools it recommends so that you can guess along and see if your guess is right or wrong; it will be a little fun activity.
So next I want to identify what clusters I have in my account. So I asked, show me all available Redshift clusters, and if you see, the LLM said list clusters is the right tool for this, and it asked the agent to run list clusters. It ran the list clusters tool and gave us an output that you have two clusters. One is an analytics cluster and the second one is a marketing cluster, and it told us that both are serverless, along with their status, endpoints, whether they are publicly accessible, and encryption details, all that information as a snapshot.
Great, I know my clusters now. I need to identify what databases, tables, and schemas I have. So let's go ahead and ask that question. So here I'm going to ask what are the databases and tables available in the analytics cluster. That's my prompt again. The LLM figures out the user is asking for databases, so it will call list databases and then list schemas, list tables in that order, and sometimes multiple times depending on the number of schemas. That's the orchestration that goes behind. And it pulls back all the list of databases, schemas, and tables that are available in your analytics cluster. So if you look at the output, you see there's a dev database and there's a public schema, and then there are customer and orders tables inside that schema.
Okay, great. Now I would like to understand what the different elements in the customers and orders tables are. Again, you can ask the question in natural language. Can anybody guess what tool it will call? It's list columns, right? If you ask, show me the structure of the customers and orders tables, you can observe that it translates that into the list columns tool, runs it twice, once for the customers table and once for the orders table, and gives you all the metadata of both those tables. Great. So far you have identified your cluster, databases, tables, and columns.
Now put your marketing analyst cap on. You don't need to know about any of these things. All you care about is who your top ten customers are and their purchasing patterns. So let's ask that question in natural language and see what it will do behind the scenes. Again, pay attention to the tools it is calling and how many times it is calling them. So I'm asking it to analyze customer purchase patterns and give me the top ten customers and their buying frequency. It calls execute query twice, once to get the top ten customers and once to get the purchasing patterns. It gets the results back to you and also gives you insights in a summary. So as a user, you don't need to run any kind of query and you don't need to rely on your IT teams to do the work for you, and everything happens in real time so you can put the promotional content out right when the game is happening.
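For reference, the SQL behind those execute query calls would look something like the statement below; the table and column names are invented for illustration.
```python
import boto3

# The kind of statement the model generates and the execute_query tool submits.
TOP_CUSTOMERS_SQL = """
    SELECT c.customer_id,
           c.customer_name,
           COUNT(o.order_id)  AS order_count,
           SUM(o.order_total) AS total_spend
    FROM   public.customers c
    JOIN   public.orders    o ON o.customer_id = c.customer_id
    GROUP  BY c.customer_id, c.customer_name
    ORDER  BY total_spend DESC
    LIMIT  10
"""

boto3.client("redshift-data").execute_statement(
    WorkgroupName="analytics-wg",   # placeholder for the demo's analytics warehouse
    Database="dev",
    Sql=TOP_CUSTOMERS_SQL,
)
```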
Okay, bringing it all together, your first problem is simplifying your ETL pipelines, so we use Zero-ETL to simplify it, so that problem is solved, and your second problem is to make your marketing team self-sufficient. That we achieved using Redshift MCP. So this solved the problem which we started with. Your CEO is happy, you're happy, and we are happy too.
So on that happy note, I'm going to hand it over to Yannick to talk about how Redshift helped Roche to solve their analytic needs. Thank you.
Roche Pharmaceuticals' Modernization Journey: People, Process, and Technology Transformation
You may need to use this. Good morning, good morning everybody. My name is Yannick Misteli. I'm Head of Engineering in Global Pharma Strategy at Roche. For those who are not familiar, Roche is a global leader in pharmaceuticals. We operate in over 80 countries, with headquarters in Switzerland, which is also where I'm from, and we have a legacy of over 130 years of innovation.
At Roche, I'm leading a team of roughly 150 engineers, and they're distributed in LATAM, EMEA, and APAC. We support the go-to-market domain of Roche. In pharma, go-to-market is the engine that connects our science and products with the real world. My teams work very closely together with the sales, marketing, and digital teams, and those teams interact with doctors, hospitals, and the broader ecosystem.
Now, five years ago when we started, we set out to modernize this engine. We had three big ambitions, and of course there were also some hurdles that came with that. The first one was around deep customer insights. We really wanted to understand and have a 360-degree view of our customers, but the reality was all the data was pretty fragmented everywhere and we could not get this view.
Second, we really wanted to increase the speed and also accelerate the time to value. But we were so caught up in legacy infrastructure, it was very difficult to spin up new things, so it was impossible for us to reach the speed that we wanted. Lastly, we wanted to enable a global innovation system where we can take local successes and scale them globally, but that was not possible because the technology was so fragmented it was impossible to scale. Also, the mindset was not in place and we kept reinventing the wheel everywhere.
To understand why we couldn't scale, I think this is a good picture of the technology landscape that we were facing five years ago. You can see that we had a huge Oracle cluster in EMEA. We had a big Hadoop cluster in LATAM, and a patchwork of SQL Server and MySQL instances everywhere. We were trying to stitch it together with legacy tools like Informatica, Talend, Alteryx, and whatnot. Maybe the biggest challenge though is in the center of this: because we had so many disconnected systems, the business users reverted to probably the most popular distributed database in the world, Excel.
So we realized that to solve our data legacy problem, it cannot be solved by technology alone. That's why we enabled a framework that looks into three dimensions: people, process, and technology. On the people side, we leaned in heavily on Conway's Law. Conway's Law states that your technical architecture is a mirror of your organizational structure. When your teams are disconnected, you will end up in data silos. The second aspect, we wanted to foster a more global mindset. I wanted my teams to think globally but act locally.
On the process side, we really needed to come up with global standards, but of course keeping the local agility. It's not really about rigid central control. It's more about establishing a common language that enables us to scale. On the technology side, of course this is the enabler. Switching and centralizing in AWS Cloud radically simplified our tech stack, and of course the more you centralize, the more scalability you also need, and that's exactly where AWS is shining.
When I go back to the people, if you think of 150 engineers and how we operated five years ago, it reminded me very much of kids' soccer. What I mean by that is everybody is chasing the ball. Everyone is going to where the ball is, leaving the goal wide open, right?
We needed a more professional structure with defined positions. That's how we organized ourselves. We had our data engineers as the defenders building the stability. We had our analytics engineers distributing the ball, and we had our data analysts as the strikers scoring the goals and unlocking the business value.
Talent alone doesn't win you a game. You need a team, so that's why we established cross-functional teams that are linked to regions. This forms a matrix structure, and this matrix structure helps us keep our teams highly aligned but loosely coupled. Highly aligned because within the capabilities we make sure that we have strong standards and a technology stack that is common. But loosely coupled because we give the flexibility to these cross-functional teams to decide on what needs to be built.
On the process side, we also needed to enable these teams with common ways of working. These common ways of working are built on three pillars. DevOps is one of the most important ones and also a huge mindset shift. I realized that when we started off and my management came back and asked me, Yannick, when are you going to hand over to the operations team? I said there is no handover. We build it, we run it. That is what DevOps is, development and operations.
If you think about what is the main advantage of DevOps, it's very simple. If you know you need to operate this, you think twice about how you build it. That's for me the big benefit of DevOps. Of course, we followed a lot of automation. Everything is code, and that helped us to have the speed but also the quality that is needed.
The second pillar is we switched to Agile. We unified on two-weekly sprints, and that was also a huge mindset shift for the business. I remember the first sprint when the business came back to me mid-sprint and wanted to change all the sprint goals. I said to them, no, we're not going to change all the sprint goals. They looked at me surprised, saying, well, but I thought you're Agile. We had to educate them that Agile is not chaos. It's actually being very disciplined on your delivery, and you need to have a good plan.
The last pillar is the transparency that is needed if you're leading a big organization. We really unified the project management. We centralized the tracking and the project management in a single source of truth, because otherwise you cannot operate on a portfolio of that size. I also like the quote here: if you want to build great products, you need great people, and if you want to attract and keep great people, you need great principles.
From a technology perspective, the biggest shift actually was going from ETL to ELT. In the old legacy world of on-premises, storage is extremely expensive and compute is scarce. The way we did it was we extracted the data, we transformed it, and then we loaded it into our Oracle system. That was the way, but AWS and the cloud completely changed those economics. S3 is cheap, and Redshift has massive compute available to you. So we switched the principle from ETL to ELT, staging all our data and then processing it with Redshift.
What are the big benefits? There are two. First of all, the speed. If we get new business requests, chances are very high that we have this data already staged in our data lake, so we can start to deliver on the insights and the analytics right away. The second one is the quality. We can really separate the concerns here, EL and T. The data engineers focus on the extract and load part, and the analytics engineers focus on the transform part.
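A minimal sketch of that split, assuming placeholder bucket, role, and table names: the load step stages raw files from S3 as-is, and the transform step runs inside Redshift (at Roche this step is orchestrated with dbt, as described below).
```python
import redshift_connector

# Placeholder connection details; IAM-based auth would be typical in practice.
conn = redshift_connector.connect(
    host="go-to-market-wg.123456789012.eu-central-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
cur = conn.cursor()

# EL: land the raw files from the data lake into a staging table as-is.
cur.execute("""
    COPY staging.crm_interactions
    FROM 's3://go-to-market-lake/raw/crm/2025/12/01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET
""")

# T: transform inside the warehouse, close to the compute.
cur.execute("""
    CREATE TABLE analytics.crm_interactions_clean AS
    SELECT interaction_id,
           customer_id,
           LOWER(TRIM(channel)) AS channel,
           interaction_ts
    FROM   staging.crm_interactions
    WHERE  interaction_id IS NOT NULL
""")
conn.commit()
```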
That connects again to the people aspect and how you organize your teams. We are able to parallelize that,
and the people focusing on certain areas can become very proficient in the technology they use and in the work they do, which of course is bringing you the agility and the quality that you want.
Now this is our architecture. It's a bit of an eye chart, but this is how we manage over 300 sources that we bring into our data platform. It follows a very simple principle of ingest, process, and then serve. On the ingestion part, we are a big customer of AWS AppFlow. AppFlow is great for SaaS applications because it's a managed service from AWS, so you just connect it to your SaaS application and it takes care of everything. Now for the data sources that are not SaaS, like DataSUS or PubMed, we needed approaches we could customize a little more. For that we use Lambda and Glue, so we have built a nice framework around those two services to bring in the data.
We also have AWS Transfer Family because we receive many CSV files from providers, and AWS Transfer Family is the perfect solution there. It connects to S3, the Glue crawler crawls new files and registers them in the Glue Data Catalog, so once the files get delivered, we can query them directly as well. Of course, the best data onboarding is the one that you don't need to do, which is why we also use Redshift data sharing with some partners, and we are switching to zero-ETL wherever possible. For our internal Aurora databases, we have already switched to zero-ETL as well.
On the process part, as I said, everything is done in Redshift. We use dbt to orchestrate all those workloads, and dbt helps us version-control all of our business transformations, so it's very transparent what is happening. On the consume side, we also try to pick the right tool for the right use case, so we have many curated dashboards in Tableau, and it's really great for that. But we also have ThoughtSpot for our self-service BI offering. The natural language interface that ThoughtSpot offers also helped us a lot to scale, and we are very happy with that because there's huge adoption of ThoughtSpot on the business side.
Ultimately we also take it one step further, because what we also do is write insights back to Salesforce. We really want to make the data actionable and meet the business where they are, so we write back to the Salesforce instance for the business processes it supports. Now, I was talking about ELT, and in that ELT world, one of the most important services actually is Redshift Spectrum. We also call it the bridge because it bridges the data lake world with the data warehouse world.
There are three aspects that are crucial for us with Redshift Spectrum. The first one is that you can query the data in the data lake in place. As soon as new data arrives, you can query it; Spectrum enables that. The second one is cost optimization. You can store huge amounts of data in S3 that you never need to bring into Redshift, so you can decide what should stay in Redshift and what should stay in the data lake, and find a good balance between hot and cold data.
And last but not least, the seamless experience for the analysts. They can combine data in Redshift with data in the data lake, and I think this will only get more capable now with standards like Iceberg, where in theory you can query data sitting in Snowflake Iceberg or Databricks Iceberg and combine it all together.
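A minimal sketch of that bridge, with placeholder catalog, role, and table names: register the Glue Data Catalog as an external schema once, then join lake data in place with warehouse tables.
```python
import redshift_connector

conn = redshift_connector.connect(
    host="go-to-market-wg.123456789012.eu-central-1.redshift-serverless.amazonaws.com",
    database="dev",
    user="awsuser",
    password="...",
)
conn.autocommit = True  # external DDL is simplest to run outside an explicit transaction
cur = conn.cursor()

# One-time setup: expose the Glue Data Catalog database (populated by the crawler)
# to Redshift as an external schema.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS lake
    FROM DATA CATALOG
    DATABASE 'go_to_market_lake'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
""")

# Query S3 data in place and join it with a table in Redshift managed storage.
cur.execute("""
    SELECT a.account_id,
           a.segment,
           COUNT(*) AS web_events
    FROM   lake.web_events     e   -- stays in the data lake, scanned by Spectrum
    JOIN   analytics.accounts  a   -- lives in the warehouse
           ON a.account_id = e.account_id
    GROUP  BY a.account_id, a.segment
""")
print(cur.fetchall())
```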
We are not using Redshift only for number crunching. We also process a lot of semi-structured data, with a lot of free text flowing in from our CRM system. Now, we're big fans of Lambda UDFs. What we have built here is a Lambda UDF that uses AWS Translate, and what that enables us to do is centrally translate that free text, which arrives from the CRM in many different languages, into English. We can run millions of translations in the database with SQL, without the need to build complex data pipelines. So Lambda UDFs really give you superpowers. The other thing that we also like is the Redshift Bedrock LLM integration, and these are actually connected use cases, because we run that translated text through the Bedrock LLM in Redshift as well to try to detect adverse events that are put into the CRM system but should not be, so we try to flag them. Basically, you run an LLM at scale over millions of records with a very easy integration.
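As a sketch of the Translate piece, a Lambda function behind such a UDF follows Redshift's Lambda UDF request/response contract: it receives batched rows and returns a JSON results array. The function name, size limit, and registration statement in the comments are illustrative assumptions, not Roche's actual implementation.
```python
# lambda_function.py -- scalar Lambda UDF that translates CRM free text to English.
import json
import boto3

translate = boto3.client("translate")

def lambda_handler(event, context):
    # Redshift batches rows: event["arguments"] is a list of rows, each row a list
    # with one value per UDF argument (here a single varchar column of free text).
    try:
        results = []
        for row in event["arguments"]:
            text = row[0]
            if not text:
                results.append(None)
                continue
            response = translate.translate_text(
                Text=text[:5000],             # stay under Translate's request size limit
                SourceLanguageCode="auto",    # let Translate detect the source language
                TargetLanguageCode="en",
            )
            results.append(response["TranslatedText"])
        return json.dumps({"success": True, "results": results})
    except Exception as exc:
        return json.dumps({"success": False, "error_msg": str(exc)})

# Registered and used from SQL roughly like this (names are illustrative):
#   CREATE EXTERNAL FUNCTION translate_to_english(varchar) RETURNS varchar STABLE
#   LAMBDA 'translate-to-english' IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-lambda-role';
#   SELECT translate_to_english(note_text) FROM crm.notes;
```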
Then there's also the battle that we realized we cannot win, which is the spreadsheet battle. Since we could not win that fight, we embraced it. What we have done is build a Google Sheets add-on that connects to Redshift. We did that with a service called the Redshift Data API, which is really great because you don't need a driver and you don't need a complicated setup; it's basically an HTTPS interface to Redshift. So we built a JavaScript add-on in Google Sheets to solve two problems. First, the business stakeholders can now load the latest data, and this is all governed, so they can have the latest data in the spreadsheet where they feel comfortable. Second, they can also write back to Redshift, so we can make this local knowledge available globally. This is great because this data can be reference data, can be whatever, and can then directly be used in dashboards or downstream data pipelines.
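Sketched in Python rather than the add-on's JavaScript, the Data API calls behind "load the latest data" and "write back" look roughly like this; the workgroup, database, tables, and parameters are placeholders.
```python
import time
import boto3

data_api = boto3.client("redshift-data")

def run_sql(sql, parameters=None):
    """Submit a statement over HTTPS, wait for it, and return the result set if any."""
    kwargs = {"WorkgroupName": "go-to-market-wg", "Database": "dev", "Sql": sql}
    if parameters:
        kwargs["Parameters"] = parameters
    stmt = data_api.execute_statement(**kwargs)
    while True:
        desc = data_api.describe_statement(Id=stmt["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(0.5)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", "statement failed"))
    return data_api.get_statement_result(Id=stmt["Id"]) if desc["HasResultSet"] else None

# "Load latest data": pull governed reference data into the sheet.
result = run_sql("SELECT country, segment, target FROM reference.sales_targets")
rows = [[list(field.values())[0] for field in record] for record in result["Records"]]

# "Write back": push locally maintained knowledge from the sheet into Redshift.
run_sql(
    "INSERT INTO reference.local_adjustments (country, adjustment) VALUES (:country, :adjustment)",
    parameters=[{"name": "country", "value": "CH"}, {"name": "adjustment", "value": "0.05"}],
)
```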
Some of my takeaways from the last five years of modernization. The first one is reimagine, don't re-platform. Don't just do a lift and shift; really rethink the approach, as we did when switching from ETL to ELT. I think if you want to get the most out of the cloud and the most out of AWS, it's very important to really rethink. The second one is lead with business value and not technology. We tried to work backwards from our strategic goals of having a customer 360 and faster time to value, and all the technological decisions should follow from there. Also, treat modernization as a cultural shift, not just a project. I was talking about people, process, and technology, and I think it's very important to work in all three of these dimensions.
Also, building frameworks is important. They should not be a cage, but you need standards, and finding the right standards that can be followed is difficult but important. Invest in your ways of working as much as in your tech stack, because the greatest and best technology is worthless if you don't know how to use it. So it's truly important to invest heavily in the upskilling part as well. And last but not least, obsess over the developer experience, because ultimately you want to have a great customer experience, and for that you need good products. In order to build good products, you need happy developers, so for me the developer experience is something that truly matters.
Now, what did this bring for us? I would say we operate now at a massive scale. As I said, 300 different data sources. We could decommission five legacy platforms, and the fact that we run three million Redshift queries per day also shows the scale that we have reached now. What's in it from a business perspective? I think, again, we finally achieved what we were looking for: very fast time to insight. Now we can build new data solutions within days and not months. At the same time, we could finally get the deeper customer understanding that we were looking for, because we also connect these 300 data sources. So we don't only bring them in, we can also connect them.
We also reorganized the 150 engineers, and for me importantly, we built a data platform they actually love to use. So we have a solid foundation here for future innovation in order to do now what patients need next. Thank you very much.
Getting Started: Migration Resources and Next Steps with Amazon Redshift
Thanks, Yannick. That was pretty awesome. Yannick has been a great partner for us, particularly for the service team, and helped us improve the product quite a lot. So it's pretty phenomenal what Roche has been able to achieve with Redshift and some of the related technologies. I think between Satesh and Yannick, hopefully we have given you enough data points to get started with your journey around data warehouse or analytics migration and modernization.
The next question you're most likely all wondering about is how to get started. We all know migration projects are not easy. They take time, they take effort, they take resources. So I have some good news there. At this point, AWS has already helped migrate over 1.5 million databases to AWS databases. So we've pretty much figured out the process in terms of how to help you migrate from your existing legacy infrastructure to the cloud and to Redshift and related technologies.
If you want to get started, we can help you with the resources in three main areas. Starting with tools and technologies, we have migration tools like AWS Database Migration Service or AWS Schema Conversion Tool that can automate a lot of these conversions. When you saw Yannick talk about converting these data objects into Redshift or moving a lot of these data objects into Redshift, we can automate a lot of these things using these tools, Schema Conversion Tool and Database Migration Service.
Also, from a people perspective, we provide a lot of resources. If you want to start with a proof of concept or a migration pilot, you can work with our professional services teams or our partners, and they can help scope the projects for you. They can help you get started on this journey. And finally, we also offer a number of migration programs. For example, Migration Acceleration Program, which provides you with incentives and credits to offset the cost of running some of these systems together as you migrate from a source to target system. So there are a variety of tools and resources available for you to get started.
I also wanted to leave you with a number of resources around Redshift. If you're interested in learning more about new Redshift features and capabilities, I invite you to visit the Redshift website; we have the link over there. There are a lot of customer success stories other than Roche, across a wide variety of industries including financial services, healthcare, gaming, software, and internet, available for you to look at. There are a lot of blogs and tutorials. We are really big into self-service and hands-on-the-keyboard type enablement, so there are a number of blogs and demos available as well, and some QR codes are there, as well as books you can buy to learn more about the technology.
Also, there's a LinkedIn group you can sign up to be updated on new data analytics related announcements. So a variety of resources for you to learn about Redshift and get started with Redshift and some of our other analytics services.
In closing, I wanted to leave you with four key thoughts. First of all, the fuel that makes generative AI or agentic AI sing is really your data. You really need to have a strong data foundation to get the value out of AI and generative AI. What we have seen is Redshift provides the best capabilities in the space, enterprise-grade capabilities around reliability, scalability, availability, some of the capabilities you saw around Model Context Protocol integration, data integration, scaling of the platform, ease of use with serverless, and all for you to get started with your journey.
At this point, tens of thousands of customers like Roche are already using Redshift to modernize their data platforms and take advantage of these AI and generative AI capabilities, and I invite you to get started on your own journey with the tools and the resources that we have shared with you. So thank you so much for joining us today. If you wanted to talk to any of the speakers, we have left their contact information here, and some of us will also be available after the session here. So if you have questions, please feel free to stop by and ask the questions. With that, thank you so much for joining the session and enjoy the rest of the conference. Thank you.
; This article is entirely auto-generated using Amazon Bedrock.