
Kazuya


AWS re:Invent 2025 - Streamline AI model development lifecycle with Amazon SageMaker AI (AIM364)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Streamline AI model development lifecycle with Amazon SageMaker AI (AIM364)

In this video, Khushboo Srivastava, Bruno Pistone, and Manikandan Paramasivan from KOHO demonstrate how Amazon SageMaker Studio streamlines AI model development, from data preparation to deployment. The session covers end-to-end workflows including fine-tuning large language models using SageMaker HyperPod with EKS orchestration, MLflow experiment tracking, and deployment options. Key features highlighted include IDE flexibility (JupyterLab, Code Editor, VS Code remote access), Trusted Identity Propagation for security, Amazon Nova model customization, and the new SageMaker Spaces add-on for running IDEs on HyperPod clusters with GPU sharing. KOHO's case study reveals impressive results: a 98% cost reduction (from $1.5M to $26K annually) and 15ms average latency for fraud detection across more than 1 million daily transactions, demonstrating how SageMaker Studio enables enterprise-scale ML solutions with startup agility across traditional ML and GenAI applications.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Introduction: Streamlining AI Model Development with Amazon SageMaker AI

Today we will be talking about how you can streamline the AI model development lifecycle, as well as monitor and manage your AI workflows, using Amazon SageMaker AI. My name is Khushboo Srivastava. I'm a Senior Product Manager Technical at Amazon, and I manage the Amazon SageMaker Studio product. I've been with AWS for more than six years. Joining me are Bruno, a Senior Worldwide Specialist Solutions Architect, and Manikandan Paramasivan, one of our esteemed customers from KOHO, who will each briefly introduce themselves.

Hello everyone. Thank you for joining this session today. My name is Bruno Pistone, from Milan, Italy, as you can probably hear from the accent. I'm a Senior Worldwide Specialist Solutions Architect at AWS, where I've worked for almost five and a half years, focusing on model training and now model customization for large language models.

Hi everyone. My name is Manikandan Paramasivan. I'm a Senior Staff Architect for Data, ML and AI at KOHO. I manage and lead the architecture, infrastructure and operations for our data and ML platforms. Excited to be here to share our story about how we use SageMaker at KOHO.

Thumbnail 90

Awesome. Innovations in generative AI are quickly transforming the business landscape. IDC predicts global spending on generative AI will reach $202 billion by 2028, representing around 32% of overall AI spending, with a compound annual growth rate of 29%. Goldman Sachs predicts that generative AI can increase global GDP by as much as 7%, or almost $7 trillion, and lift productivity growth by 1.5 percentage points over the next 10 years.

Thumbnail 130

We're examining the growth trajectory of generative AI. As you can see from our slide, we are witnessing unprecedented growth and adoption rates across multiple fronts. 89% of enterprises are advancing generative AI initiatives right now, and 92% plan to increase investments by 2027. 78% of organizations now use AI in at least one of their business functions. And 77% of organizations choose AI models that are 3 billion parameters or smaller, which suggests a preference for customizable and cost-effective models.

Thumbnail 200

Achieving these benefits is not without challenges. Let's talk a little bit about the challenges our enterprise customers face. Our enterprise customers constantly share a few major challenges in ML development. First, disparate and disconnected ML tools significantly increase time to market; teams spend more time managing tools than developing solutions. Second, isolation between team members is a real killer of productivity and collaboration. Data scientists, AI developers, and business teams often work in silos, leading to duplicated efforts and missed opportunities.

Thumbnail 260

Third, governing AI and ML projects efficiently becomes exponentially more complex as you scale. Without the right framework, security and compliance can become major bottlenecks. And finally, availability and management of infrastructure is key to training and fine-tuning machine learning models and large language models. This slide talks a little bit about the similarities between a classic, traditional machine learning workflow and a generative AI workflow. We always start from data preparation; in a GenAI project, that means formatting the data into the prompt template or structure of the chosen LLM. Second, we select the right foundation model that we want to fine-tune for our use case and with our own data. Third, we run the workload on the required compute cluster. And finally, we evaluate and deploy the model.

Amazon SageMaker Studio: A Purpose-Built End-to-End ML Development Platform

Amazon SageMaker Studio

Thumbnail 310

We launched Amazon SageMaker Studio a couple of years ago to address these challenges and help our customers build end-to-end ML workflows. SageMaker Studio provides a purpose-built, end-to-end ML development platform where data scientists can not only build, fine-tune, and deploy their models, but also manage and monitor their AI workflows. Data scientists can select an IDE of their preference within Studio, choosing JupyterLab or Code Editor, which is built on open source VS Code.

Data scientists can use SageMaker Studio notebooks for data preparation. They can write scripts for data generation and preparation, or they can use the built-in EMR connections to run large-scale Spark workloads for their data preparation jobs at scale. Finally, they can select from a hub of foundation models: choose one of the pre-built foundation models already available and a fine-tuning or reinforcement learning technique of their choice, or bring in their own model and train it from scratch.

Finally, they can deploy it in production. SageMaker Studio provides a visual interface for deploying your endpoints in production as well. You can manage and monitor all your endpoints, all your models, everything under a single pane of glass. The key thing to note is that SageMaker Studio provides multiple SageMaker AI service offerings within a single platform that you can use for each step of your AI development workflow.

Thumbnail 460

You can also run experiments using MLflow and monitor your experiments. You can also build your pipelines using SageMaker Pipelines within Studio. It could be as simple as drag and drop to build a pipeline, or you could write your own code in the SageMaker Studio notebooks. The flexibility is really your choice. Today, there are tens of thousands of customers using Amazon SageMaker AI. To name a few, we have 3M, Coinbase, Intuit, Domino's, and many others that are not listed on this slide.

Thumbnail 510

Thumbnail 520

Live Demo: From Data Preparation to Model Deployment Using SageMaker Studio and HyperPod

Before we jump into the details of each offering involved in an end-to-end generative AI project, covering building, fine-tuning, and deploying LLMs, we would like to see in practice how SageMaker AI can help developers with these fundamental activities. For this, I will hand over the mic to my colleague Bruno. Thank you, Khushboo. In this demo, we would like to give you a tangible example of the main activities involved in a machine learning as well as a generative AI project. In particular, we are going to start from data preparation, fine-tune a large language model, and then deploy this model by using different types of SageMaker AI services.

Thumbnail 540

Thumbnail 580

In particular, I prepared this architecture by impersonating two main personas. As a platform administrator, I prepared the environment for development, training, and deployment. In particular, I created a private networking environment and deployed the cluster where I'm going to submit these jobs, using HyperPod with EKS orchestration, and I use the same cluster for deployment. For the development part, I deployed SageMaker Studio, where data scientists and engineers can connect, prepare data, and link Studio with HyperPod through a shared file system on Amazon FSx for Lustre.

Thumbnail 590

Thumbnail 600

So as an engineer, data scientist, or developer, I can connect to Studio. I can select the IDE of choice that can be, for example, JupyterLab or Code Editor. I can start prototyping my code and save the data in the shared FSx in order to have this accessible from the cluster itself. And then once I'm ready, I can submit jobs for training and deployment

Thumbnail 610

Thumbnail 640

Thumbnail 660

by using the HyperPod CLI as well as the kubectl CLI. Once the job is running, I can monitor everything that is happening, as well as training metrics and system metrics, directly in SageMaker Studio by using task governance capabilities as well as managed MLflow. Now we would like to jump directly into the demo. This is the SageMaker Studio user interface. I already prepared a JupyterLab space with all the necessary components installed. I can connect to the JupyterLab space; it takes a few moments to load, and here I have my user interface.

Thumbnail 670

Thumbnail 680

Thumbnail 700

Thumbnail 710

I can click on custom file system and select the FSx for Lustre file system, where I already prepared the entire content. The first step is data preparation, but initially we want to install all the necessary Python modules. I can install them from a requirements.txt file that contains all my Python modules, and I can do this interactively. Once I'm ready, I can restart the kernel and start the data preparation piece. I selected the model Qwen/Qwen3-4B, and the task I want to improve is its ability to reason and invoke tools, which is pretty important for agentic AI. Here, I can define all the functions for preparing this data directly in the notebook, such as extracting the tool content and the thinking content and validating the messages. After that, I can prepare the dataset with a dedicated function.
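To make this step concrete, here is a minimal sketch of that kind of preparation, assuming a local JSONL file of tool-use conversations and using the model's own chat template. The file names, FSx mount path, and validation rule are illustrative assumptions, not the exact code from the demo.

```python
# Minimal sketch: validate tool-use conversations and apply the model's chat
# template so each example matches the prompt format Qwen3 expects.
from datasets import load_dataset
from transformers import AutoTokenizer

MODEL_ID = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def is_valid(example):
    # Keep only conversations that end with an assistant turn.
    roles = [m["role"] for m in example["messages"]]
    return len(roles) >= 2 and roles[-1] == "assistant"

def to_prompt(example):
    # apply_chat_template injects the model-specific tags (system prompt,
    # tool-call markup, <think> blocks) so we don't hand-craft them.
    example["text"] = tokenizer.apply_chat_template(
        example["messages"], tokenize=False, add_generation_prompt=False
    )
    return example

dataset = load_dataset("json", data_files="tool_calls.jsonl", split="train")
dataset = dataset.filter(is_valid).map(to_prompt)
# Assumed Studio-side mount point for the shared FSx for Lustre volume.
dataset.to_json("/home/sagemaker-user/custom-file-systems/fsx/train.jsonl")
```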

Thumbnail 720

Thumbnail 730

Thumbnail 740

Thumbnail 750

Thumbnail 760

What happens is that this function formats the entire dataset into the proper prompt style accepted by the model, as we can see here, with all the specific tags highlighting the main components: the system prompt, how to invoke tools, the reasoning, and how to generate the answer. Once I'm ready, I can upload the data directly to Amazon S3, as well as save it in the shared file system. Here, for example, I can even access this data directly from SageMaker Studio. Now it's time to start the training activity. The first thing is to define the parameters for my training workload, so I specify where the data is stored on the FSx for Lustre volume, and parameters such as learning rate and epochs.
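As a rough illustration, staging the data and collecting the training parameters could look like the sketch below. The bucket, paths, and hyperparameter values are assumptions for illustration only.

```python
# Minimal sketch: copy the prepared dataset to S3 and gather the training
# hyperparameters that the job manifest will reference.
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    "/home/sagemaker-user/custom-file-systems/fsx/train.jsonl",
    "my-training-bucket",                       # assumed bucket name
    "qwen3-4b/data/train.jsonl",
)

training_params = {
    "model_id": "Qwen/Qwen3-4B",
    "train_data_path": "/fsx/qwen3-4b/data/train.jsonl",  # path as seen from the HyperPod pods
    "learning_rate": 2e-5,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
}
```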

Thumbnail 770

Thumbnail 790

Thumbnail 800

Thumbnail 820

Thumbnail 830

Since I'm operating with HyperPod with EKS, I define a manifest file which contains all the information describing the PyTorch job, such as the number of instances, the GPU type, and the Docker image that I want to use. Once I'm ready, directly from JupyterLab, I can open the terminal and start interacting with HyperPod. In this case, I use the kubectl CLI. I deployed the workload, and here I can see that there are two new pods in my cluster. I can investigate the logs of the first one, and we will see that the first step is to align the pod environment with the environment available on SageMaker Studio, by installing the same libraries to give continuity to my workload. I can clear the terminal, and since this is a distributed workload, I can identify the main node by checking which address is acting as the master in the cluster; it is the same pod. Now I can connect to it in order to see what is happening with the workload.
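For a sense of what such a manifest can look like, here is a minimal sketch of a Kubeflow PyTorchJob rendered from Python and applied with kubectl. The image URI, namespace, job name, and resource counts are assumptions; the exact schema used in the demo may differ.

```python
# Minimal sketch: build a PyTorchJob manifest for HyperPod with EKS
# orchestration and submit it with kubectl from the Studio terminal.
import subprocess
import yaml

manifest = {
    "apiVersion": "kubeflow.org/v1",
    "kind": "PyTorchJob",
    "metadata": {"name": "qwen3-4b-sft", "namespace": "kubeflow"},
    "spec": {
        "pytorchReplicaSpecs": {
            "Master": {
                "replicas": 1,
                "template": {"spec": {"containers": [{
                    "name": "pytorch",
                    "image": "<account>.dkr.ecr.<region>.amazonaws.com/sft:latest",
                    "command": ["torchrun", "train.py"],
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]}},
            },
            "Worker": {
                "replicas": 1,
                "template": {"spec": {"containers": [{
                    "name": "pytorch",
                    "image": "<account>.dkr.ecr.<region>.amazonaws.com/sft:latest",
                    "command": ["torchrun", "train.py"],
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]}},
            },
        }
    },
}

with open("pytorchjob.yaml", "w") as f:
    yaml.safe_dump(manifest, f)

subprocess.run(["kubectl", "apply", "-f", "pytorchjob.yaml"], check=True)
subprocess.run(["kubectl", "get", "pods", "-n", "kubeflow"], check=True)
```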

Thumbnail 840

Thumbnail 850

Thumbnail 870

Thumbnail 890

So here the model is downloading. Then the dataset is prepared, and after a while the job connects to MLflow, because we are going to track all these metrics so they are directly accessible from SageMaker Studio. Once the job starts, we will see a new log showing the evolution of the epochs. Now we want to move into the Studio UX and see in MLflow what is happening. I already created a managed MLflow tracking server, and I can open it. We will see that there is a new experiment, and I can analyze in the graphs the metrics that I want to collect, both training metrics and system metrics. In this case, for example, for the first node I'm also analyzing the GPU utilization and how it evolves over time. If you want more details about what is happening on the HyperPod cluster directly in SageMaker Studio, you can access the cluster information: under Compute, HyperPod clusters, I can select the available cluster and see the tasks that are ongoing.
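The training-side logging could follow the usual MLflow pattern, as in the minimal sketch below. It assumes the `mlflow` and `sagemaker-mlflow` packages are installed so the tracking server ARN can be used as the tracking URI; the ARN, experiment name, and metric values are placeholders.

```python
# Minimal sketch: report training and system metrics to the managed MLflow
# tracking server so they surface in SageMaker Studio.
import mlflow

mlflow.set_tracking_uri(
    "arn:aws:sagemaker:<region>:<account-id>:mlflow-tracking-server/<server-name>"
)
mlflow.set_experiment("qwen3-4b-sft")

with mlflow.start_run():
    mlflow.log_params({"learning_rate": 2e-5, "epochs": 2})
    # Illustrative values; a real script would log these each step/epoch.
    for step, (loss, gpu_util) in enumerate([(1.92, 71.0), (1.41, 88.0), (1.07, 90.0)]):
        mlflow.log_metric("train_loss", loss, step=step)
        mlflow.log_metric("gpu_utilization_node0", gpu_util, step=step)
```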

Thumbnail 910

Thumbnail 920

Thumbnail 930

Thumbnail 940

Thumbnail 950

For example, here there is a new task that is running. Under metrics, I can actually see the evolution of utilization, for example for the GPU and CPU, the total number of GPUs available, as well as the evolution of utilization over time. Now I'm waiting for the completion of the job. Once it's finished, I can see that the pods are completed. The same thing is actually available also on SageMaker Studio in the console, so the job and task have actually succeeded.

Thumbnail 960

Thumbnail 970

Thumbnail 990

The model is available directly in Studio, so I can even use it for offline evaluation, which is pretty important since there is this shared FSx for Lustre file system. Now I want to deploy the model. The first thing is to copy this model into Amazon S3 in order to make sure the model is accessible to the deployment component on HyperPod. This is the manifest file which describes the deployment: it says where the model is located, as well as what instance type I want to use and the container image. In this case, I want to use a pre-built one, the LMI (Large Model Inference) container offered by SageMaker.

Thumbnail 1000

Thumbnail 1020

Thumbnail 1030

In a similar manner, I can apply this deployment from SageMaker Studio. I clear the terminal and apply this new deployment YAML file. This deployment creates different components, such as the inference endpoint configuration, which describes what we saw in the manifest file. It is really important because, in case you have multiple endpoints and you want to understand what is happening, this is the description of the configuration for the specific endpoint.

Thumbnail 1040

Thumbnail 1050

Thumbnail 1060

Thumbnail 1070

As we can see, it contains the same kind of information. Now I can look at the pods that are available, and there is a new pod that is under deployment. I can describe it, and what is happening is that multiple containers are created in order to have, for example, the web server up and running to expose this endpoint. I can go in Studio under Deployments, Endpoints, and I will see this new endpoint that is under creation. I can refresh the page, and once it is in service, I can start querying my model.

Thumbnail 1090

Thumbnail 1110

Here I prepared a notebook that uses the SageMaker Python SDK directly. I define the system prompt in order to ask the model to place the reasoning part within these tags. Just to repeat, this was not a reasoning model originally; it was an older version. Once I'm in the notebook, and the prompt is "Say hello to re:Invent 2025," we will see that the model has improved, because it now follows what I want it to do: there are the think tags, and the generated answer reflects my specific task.
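A minimal sketch of that invocation with the SageMaker Python SDK is shown below. The endpoint name and the exact request schema expected by the serving container are assumptions.

```python
# Minimal sketch: query the fine-tuned model through its endpoint.
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name="qwen3-4b-sft-endpoint",   # assumed endpoint name
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

system_prompt = (
    "You are a helpful assistant. Reason step by step inside <think>...</think> "
    "tags before giving the final answer."
)

response = predictor.predict({
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Say hello to re:Invent 2025"},
    ],
    "parameters": {"max_new_tokens": 256, "temperature": 0.7},
})
print(response)
```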

Thumbnail 1120

Data Preparation Options: Interactive Notebooks to Distributed Processing

So as you probably understood, there are different phases that are really important for a machine learning as well as a generative AI project. The first step is the data preparation piece, and there are multiple ways to prepare our data for machine learning. We saw in the previous demo that we used an interactive approach directly in the JupyterLab notebook. As a user, I can connect to the IDE of my choice in SageMaker Studio, JupyterLab or Code Editor, install whatever libraries I want, and start the preparation, with the possibility to select different instance types if, for example, I need more computational power.

But if I have a lot of data and need to distribute the data preparation workload, I can connect JupyterLab notebooks to EMR clusters, either self-managed Amazon EMR clusters that you operate or Amazon EMR Serverless, and prepare this data in a distributed manner directly in the notebook by using the Spark framework. And if you want to operate in a more programmatic, asynchronous manner, so that this step becomes part of a proper machine learning pipeline, you can use a SageMaker AI service, Amazon SageMaker Processing, to create a job that can use whatever framework you want, with custom images as well as pre-built containers, for example for Spark, and make this step part of your pipeline.
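For reference, running the same preparation as a SageMaker Processing job with the pre-built Spark container could look roughly like this. The role ARN, bucket names, script name, and sizing are assumptions.

```python
# Minimal sketch: a Spark-based SageMaker Processing job that can be wired
# into a pipeline instead of running interactively in a notebook.
from sagemaker.spark.processing import PySparkProcessor

processor = PySparkProcessor(
    base_job_name="prep-tool-calls",
    framework_version="3.3",
    role="arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    instance_type="ml.m5.xlarge",
    instance_count=2,
)

processor.run(
    submit_app="prepare_data.py",                  # your Spark script
    arguments=[
        "--input", "s3://my-training-bucket/raw/",
        "--output", "s3://my-training-bucket/qwen3-4b/data/",
    ],
)
```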

Thumbnail 1210

Thumbnail 1220

Model Training and Customization Techniques for Generative AI

The second step, after data preparation, is training a machine learning or generative AI model. Specifically for generative AI, when we talk with leaders about generative AI projects, new challenges and new questions come up. For example, which model should I use? You probably ask yourself this question every day; the open source world is releasing new models constantly, so which is the right one for you? And how can you access these types of models?

Thumbnail 1270

The second question is, how do I customize my model? There are different techniques that you may want to apply based on your business goals. And of course, how can I optimize my training performance? In the next slides, we will try to answer all these questions. From a technical standpoint, the technique to apply depends on your business goal.

If I want to adapt a model to a specific industry, what does that mean? It means having a model that knows the specific terminology of the industry you are operating in, such as healthcare and life sciences or financial services. In this case, we use continued pre-training. What does that mean in terms of data? We take text that can be, for example, a PowerPoint or a Word document, translate it into a machine-readable format such as a TXT file, and feed the entire document into the model itself. The task for the model is to predict the next token, which is basically a word or a piece of a word, based on the entire context passed to the model.

In case we want to teach the model a new task or improve an existing capability, as in the previous demo, we use supervised fine-tuning techniques. With supervised fine-tuning, we use a labeled dataset: an array of messages alternating between user and assistant. The user message is the prompt you want to feed in, the user question, and the assistant message is the expected answer the model should generate. We are instructing the model that, given a user prompt, the expected answer should be that one.

Then there are post-training techniques that extend into preference alignment. Preference alignment is also related to reinforcement learning: we want to align a model to be more human-like, in the sense that we want it to generate answers more similar to what a human would generate. For this we apply different types of reinforcement learning techniques. This is an example of a dataset where we provide the user prompt as input and give the model an example of a good answer, but we also give it an example of a bad answer. In this way, the model learns to distinguish between the two.
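As a concrete illustration, hypothetical records in the two shapes just described might look like this; the content is invented purely to show the structure.

```python
# Illustrative records: a labeled (supervised fine-tuning) example and a
# preference pair used for alignment.
sft_example = {
    "messages": [
        {"role": "user", "content": "Summarize this transaction dispute in one sentence."},
        {"role": "assistant", "content": "The customer disputes a duplicate $42 charge from 2025-03-14."},
    ]
}

preference_example = {
    "prompt": "Explain what a chargeback is to a new customer.",
    "chosen": "A chargeback is when your bank reverses a card payment after you dispute it...",
    "rejected": "Chargeback = reversal. Read the terms of service.",
}
```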

Thumbnail 1420

Training Infrastructure: SageMaker HyperPod and Training Jobs with HyperPod Recipes

How do we do this on SageMaker AI? In the previous demo, we used a self-managed cluster with Amazon SageMaker HyperPod. Amazon SageMaker HyperPod is purpose-built, resilient infrastructure with self-managed orchestration for maximized resource control. In this case, we have full control of the cluster, which can be orchestrated with Amazon EKS as well as with other tools, and we have advanced capabilities for task governance, for example to organize the execution of tasks based on the policies that you define.

But if you want to focus more on the machine learning part and do not want to manage a cluster, you can use SageMaker training jobs, a fully managed, resilient infrastructure for large-scale and cost-effective training. With training jobs, we prototype our code, and when we want to submit the job, we invoke an API specifying the instance type and the number of instances, and SageMaker AI takes care of everything: it spins up the infrastructure, executes the job, and once the job is finished, the infrastructure is torn down. In this on-demand approach you pay only for the time the job is running. Of course, if your workloads are more predictable, so you know when and how many instances you will use, there are options such as flexible training plans or spot instances where you can reserve some capacity upfront.
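A minimal sketch of that fully managed path with the SageMaker Python SDK follows. The role, framework versions, instance sizing, and data locations are assumptions.

```python
# Minimal sketch: a SageMaker training job that spins infrastructure up and
# down around the training script.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    role="arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    framework_version="2.4",
    py_version="py311",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    hyperparameters={"learning_rate": 2e-5, "epochs": 2},
)

# You pay only while the job runs; SageMaker tears the cluster down afterwards.
estimator.fit({"train": "s3://my-training-bucket/qwen3-4b/data/"})
```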

Thumbnail 1510

The observability part is actually very important, as we saw in the previous demo. In SageMaker, and in particular in SageMaker Studio, we have many options to accomplish this. For example, with SageMaker HyperPod we can use task governance capabilities within Studio to analyze which tasks are under execution. We can also define custom policies to prioritize specific tasks based on the team or the priority that we want to give to the workload.

Thumbnail 1570

In a similar manner, we can do the same thing with training jobs. We can analyze the metrics related to the cluster used for the training job itself and orchestrate the execution of these jobs by using, for example, the connection between training jobs and AWS Batch. Regarding which model to use, last year during re:Invent 2024 we released a capability named Amazon SageMaker HyperPod recipes, a curated, ready-to-use set of parameters for open source models such as DeepSeek, Meta Llama, and Mistral, as well as first-party models. Starting from the AWS Summit New York this year, we also have the possibility to customize Amazon Nova models with SageMaker AI.

Thumbnail 1610

A recipe is a collection of preconfigured parameters related to the model itself. We just specify the recipe that we want to use, without writing any code, and SageMaker AI takes care of executing the workload. Recipes are available for both SageMaker training jobs and SageMaker HyperPod. For training jobs, we can use the SageMaker Python SDK to specify the recipe we want to use; in a similar manner, with HyperPod we can use the HyperPod CLI to submit those jobs.
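As a rough sketch of the training-job path, launching a recipe-based job from the Python SDK could look like this. The recipe identifier, overrides, role, and instance sizing are placeholders; check the published HyperPod recipes repository and SDK documentation for the exact names, as the `training_recipe` parameter shown here is an assumption about the current SDK surface.

```python
# Minimal sketch: launch a recipe-based fine-tuning job via the SageMaker
# Python SDK instead of writing training code by hand.
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    role="arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    instance_type="ml.p5.48xlarge",
    instance_count=2,
    training_recipe="fine-tuning/llama/hf_llama3_8b_seq8k_gpu_fine_tuning",  # assumed recipe id
    recipe_overrides={"trainer": {"max_steps": 100}},
)
estimator.fit({"train": "s3://my-training-bucket/data/"})
```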

Thumbnail 1640

Deployment Options: Managed Inference and HyperPod Cluster Deployment

Regarding the deployment part, we have two options to deploy these models. The first one is SageMaker managed inference. If we want to stay with a fully managed approach, SageMaker managed inference offers fully managed real-time endpoints with automatic scaling. We can define auto scaling policies to scale up and down based on the request spikes the model receives during the day.
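For reference, attaching such an auto scaling policy to a managed endpoint uses Application Auto Scaling, roughly as sketched below. The endpoint and variant names, capacity range, and target value are assumptions.

```python
# Minimal sketch: target-tracking auto scaling for a managed real-time endpoint.
import boto3

autoscaling = boto3.client("application-autoscaling")
resource_id = "endpoint/qwen3-4b-sft-endpoint/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 100.0,  # invocations per instance per minute (illustrative)
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
    },
)
```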

Thumbnail 1690

If we want to maximize the utilization of the cluster we used for the training workload, we can reuse the same HyperPod cluster, as we did in the demo, and deploy the models there. In particular, to accelerate the deployment of open models, since the AWS Summit New York this year it is also possible to deploy open source models in literally a few clicks directly from the Studio interface, by selecting a HyperPod cluster that is already up and running within your account.

Thumbnail 1740

Recent SageMaker Studio Launches: Remote IDE Access, Trusted Identity Propagation, and Spaces on HyperPod

Now I'm calling Khushboo back to discuss our recent launches. Thank you, Bruno. All right, so let's talk about some of the recent launches in Amazon SageMaker Studio. Before we start, I just wanted to reiterate that Amazon SageMaker Studio provides a suite of IDEs, and AI developers can pick the IDE of their choice. These are all fully managed IDEs, and the three key IDEs we provide are JupyterLab, Code Editor, and RStudio. JupyterLab is a web-based IDE for notebooks, code, and data with a flexible and extensible interface that allows you to easily configure ML workflows.

Thumbnail 1790

Thumbnail 1800

Code Editor is based on open source Code OSS and boosts productivity with its familiar shortcuts, terminal, debugger, and refactoring tools. RStudio is a fully managed IDE for R with a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management. Remote access from a local IDE is available for AI developers who prefer to develop code in their local IDE, such as their local Visual Studio Code, but at the same time want to benefit from the compute infrastructure defined for SageMaker Studio.

We recently released the capability to connect your local IDE to your SageMaker Studio Spaces. With a remote IDE connection, you can leverage SageMaker AI's powerful compute resources and your data to analyze and process data and develop AI models, while maintaining all the existing SageMaker Studio security controls and permissions, with no additional configuration required.

Thumbnail 1850

Thumbnail 1870

With AWS Toolkit integration, your Visual Studio Code IDE can show all your SageMaker Studio Spaces in the left navigation. As you see in the GIF here, all you need to do is enable remote access, open it in Visual Studio, and there you go, you have your spaces right there.

Trusted identity propagation is a feature of AWS IAM Identity Center that lets you propagate the end user's identity, the human user identity, across workflows that span AWS services. Let's say a user logs in through AWS IAM Identity Center into Amazon SageMaker Studio and opens SageMaker Studio notebooks to connect to downstream AWS services such as Redshift, EMR, or Lake Formation, or to ML services such as SageMaker Training, SageMaker Processing, or SageMaker Inference. Their human user identity, the corporate identity defined in Identity Center, gets propagated through all these workflows and is logged in CloudTrail.

As an admin, you can now audit who accessed which resources. Not only that, if you want to apply fine-grained access controls using S3 Access Grants, Lake Formation grants, or the Redshift Data API, you can do so. You can say that a user should have access to only these S3 buckets, and SageMaker Studio will honor that thanks to the TIP integration.
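On the audit side, a minimal sketch of querying CloudTrail for a given user's activity is shown below. Whether the propagated corporate identity surfaces under the Username lookup attribute depends on the service and event type, so treat that as an assumption; the user name and time window are placeholders.

```python
# Minimal sketch: look up recent CloudTrail events attributed to a user.
import boto3
from datetime import datetime, timedelta

cloudtrail = boto3.client("cloudtrail")
events = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "jane.doe"}],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
)
for event in events["Events"]:
    print(event["EventTime"], event["EventName"], event.get("Resources", []))
```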

Thumbnail 1970

Thumbnail 1980

Earlier this year, we released the capability for users to customize Amazon Nova models on SageMaker AI. Entirely from SageMaker Studio, users can explore multiple Amazon Nova models such as Nova Micro, Lite, and Pro. They can select their preferred customization technique for their use case, such as supervised fine-tuning or reinforcement learning, and open a pre-built notebook to start customizing their workloads.

Thumbnail 2010

Thumbnail 2030

These customization workloads run on SageMaker training jobs or SageMaker HyperPod. As you see, this is an example where you open a sample notebook and it drops you into a notebook where you can do the customization.

You can now maximize your CPU and GPU investments across each stage of the ML lifecycle by running IDEs and notebooks also on Amazon SageMaker HyperPod clusters. Last week we launched a new capability which lets you install a new add-on on your HyperPod EKS clusters. The name of the add-on is Amazon SageMaker Spaces. A space can be thought of as a self-contained entity where you can define all the configurations such as image, compute resources, local storage, any persistent volumes, and even more configurations that are required by AI developers to run their IDEs.

For AI developers, this means accelerating GenAI development. AI developers can now launch their JupyterLab and Code Editor IDEs in web browsers, or connect their local IDEs, such as a local VS Code IDE, to run notebooks on the HyperPod compute cluster. This means they can run their IDEs on the same persistent clusters where they have their training and inference workloads, using familiar tools such as the HyperPod CLI, and share data across interactive AI workloads and training jobs using already mounted file systems such as FSx or EFS. It provides faster IDE startup latencies with image caching and supports idle shutdown.

Thumbnail 2180

Thumbnail 2210

AI developers can now maximize cluster utilization. We support MIG (Multi-Instance GPU) profiles for GPU sharing, so AI developers can bin pack multiple spaces onto a single instance. Not only that, they can also bin pack multiple spaces onto a single GPU, which means they can run IDEs on a fractional GPU as well. For the admin persona, this means unified governance and observability. Administrators can leverage SageMaker HyperPod task governance to efficiently utilize GPU investments across diverse workloads; they can view memory, CPU, and GPU usage across those workloads and reprioritize them using HyperPod task governance as well as HyperPod observability.

Thumbnail 2240

Detailed Demo: Installing and Using Amazon SageMaker Spaces Add-on on HyperPod Clusters

Now we're going to see a brief demo of the Amazon SageMaker Spaces add-on. Before we get into the demo, I wanted to quickly call out a few things that we will be seeing today. First, we will see how you can install the Amazon SageMaker Spaces add-on on an existing HyperPod cluster; you can also install it on a new HyperPod cluster. There are two ways to install it: quick install comes with automatic defaults, and custom install also comes with automatic defaults, but you can override those defaults with your own configurations.

We already discussed what a space is: a self-contained entity where you define the specs and configurations used to run your IDE on HyperPod. I also want to briefly talk about templates. Templates are a mechanism that you as an admin can use to define default configurations for your Amazon SageMaker Spaces. For example, if you want teams to use a specific set of custom images or to bring in their own images, if you want to define already mounted file systems, local storage and its allowed range from minimum to maximum, or the compute resources and defaults required to run a space, everything can be done using templates.

We provide two default templates, one for JupyterLab and one for Code Editor, but you can go ahead and create your own template and mark it as the default. The last thing I wanted to call out is that your AI developers can access their spaces through a visual interface as well. How can they do it? There are three ways. One is using a web browser: the admins provide a custom DNS, and all the AI developers have to do is run a HyperPod CLI command to generate a web UI URL. If they click on it, it opens their plain vanilla JupyterLab IDE in their web browser on the custom DNS provided by the admins.

The second way to access their spaces is through their local VS Code IDE. Again, as an admin you can choose to enable remote connection. If you do, then all your AI developers have to do is run a HyperPod CLI command to get a URL or connection string that they can click, and it directly prompts them to open their VS Code IDE with their space inside it. The third way is the local port forwarding feature, which will open up the plain JupyterLab IDE in your web browser. So let's get into the demo.

Thumbnail 2420

Thumbnail 2430

Thumbnail 2440

Thumbnail 2450

Thumbnail 2460

So we click on the custom install. Here you see options to create a remote access configuration, which enables connecting from a local IDE, as well as a web browser access configuration, where you provide a custom DNS and a valid certificate. You provide your KMS key and hit install. Awesome. So your add-on is now installed on the ML training cluster.

Thumbnail 2470

Thumbnail 2490

Thumbnail 2500

Now you can see the two default templates that we provide. We will now create another template for your end users. You can choose the application type, and if you have task governance enabled for this cluster, you can select a pre-existing task governance priority label to assign to all spaces created with this template.

Thumbnail 2510

Thumbnail 2520

Thumbnail 2530

Thumbnail 2540

Thumbnail 2560

Here is where you provide your images. Please remember that you can use SageMaker Distribution (SMD) images as well as provide ECR repos, so your data scientists can bring in their own images. This is where we define the storage; the storage is built using local EBS. You define your compute, GPU, CPU, as well as memory per space. And finally, you define a lifecycle configuration script to run with your spaces, so you can install any custom packages that you need. Now your template is ready, and you can go ahead and make this new template your default template.

Thumbnail 2580

Thumbnail 2590

Thumbnail 2600

Thumbnail 2610

Thumbnail 2620

Now let's see how you can set up namespaces as well. Here we will see how to create a new namespace or use an existing namespace; for existing namespaces, we will also show you task governance namespaces. You can then go ahead and create EKS Pod Identity associations, which define the service accounts your spaces will run with. Once you create the service accounts, you can assign a runtime role to each of them as well.

Thumbnail 2630

Thumbnail 2640

Thumbnail 2650

Now that you have defined some service accounts and their runtime roles, you can go ahead and create users. You can create users and groups. When you do so, you can assign them one or more service accounts, which means when a user is running a space in a given namespace, they will be able to use a set of service accounts to run their pods or spaces in this scenario.

Thumbnail 2670

Thumbnail 2680

Awesome. You can also see a list of the spaces that users are running on the HyperPod cluster. As an admin, you have the controls to manage these spaces. Let's say you want to stop a space or restart a space, or a user leaves the company and you want to delete their space so it stops consuming resources: you can take those actions directly from the console.

Thumbnail 2700

Now we will go ahead and see how the data scientist persona will create spaces. What we see here is the data scientist persona running the help command. And finally, creating a space with the name data-science-space and using default configurations. If you list,

Thumbnail 2710

Thumbnail 2720

Thumbnail 2730

Thumbnail 2740

Thumbnail 2750

Thumbnail 2760

Thumbnail 2770

Thumbnail 2780

Thumbnail 2790

Thumbnail 2810

Thumbnail 2820

you see the space is running and the Available status is true. If you describe the space, you can see all the configurations or specs of the space. The key thing to note is that data scientists can use the default configurations, but they can also override or customize the space's spec. If you stop the space and then list it, you will see that the Available status has changed from true to false. All right, let's start this space again, and we see the status changing back to true. As we talked about, data scientists can update the space too; in this scenario, we're updating the memory, compute resources, as well as the display name. Once the update is done and you run a describe command, you'll see that these specs of the space have been updated. You can get logs for your space as well, if you want to deep dive into a specific scenario or troubleshoot. Finally, you can access the space, as we talked about, using a web UI URL: if you click on this URL, it directly opens up your space as a plain vanilla IDE in your local web browser. You can run a delete command as well, and if you then list, you will see that the space no longer exists.

Thumbnail 2830

All right, with this we conclude the demo. One thing we haven't covered here is how to generate the web UI URL or the VS Code connection URL; if you click on those URLs, they will prompt you to open your VS Code IDE or open up your IDE space in the local web browser. Awesome. With that, we will talk about how SageMaker AI has helped KOHO accelerate their ML journey, and I'll give the mic to Manikandan. Thank you.

Thumbnail 2890

KOHO Case Study: Achieving 98% Cost Reduction and Enterprise-Scale ML with SageMaker Studio

Hi everyone, can you hear me okay? Alright, cool. Thank you, Khushboo, and thank you, Bruno. A quick intro about me: Manikandan Paramasivan. I'm a Senior Staff Architect at KOHO. I lead the architecture, infrastructure engineering, and operations for our data and ML platform. All that, plus I'm also a busy dad to a six-year-old daughter. I'm excited to share how we use SageMaker Studio at KOHO.

Thumbnail 2930

First, some quick context about KOHO. KOHO is a Canadian fintech founded in 2014, with a mission to provide better financial solutions for all Canadians. Today, we have over 2 million customers using our products across banking, credit, and lifestyle. We have products ranging from checking and savings accounts to a line of credit, buy now pay later, and insurance; we even have an eSIM for international travelers. We are about 230 employees in total. That's a startup-sized team, but we are growing at enterprise scale and building enterprise-scale solutions. That's exactly where SageMaker Studio became critical for us: that friction between startup resources and enterprise demands.

Thumbnail 2970

Let me start with the challenges we faced. First, vendor costs. Like many startups, we started with different vendors for different machine learning use cases. We used to spend about 1.5 million dollars just for our real-time fraud detection use case alone, because vendor-based solutions typically charge based on API calls. As we grew, as our customer base increased, and as our transaction volume increased, our costs exploded. Next, performance requirements. In real-time use cases like fraud detection, we cannot compromise on speed.

We need to make the fraud decision and return the response within 50 milliseconds as our customers are waiting at the checkout using their card. Next, security and data access. We are a regulated financial institution, so we wanted to make sure our customers' data is safe. We wanted to make sure the data stays within our VPC. At the same time, we wanted to give secured access to all this data in our warehouse and lake to all of our data scientists for experimenting and building models.

Finally, we needed an end-to-end ML platform. We are a lean data science team who cannot afford to spend time stitching together different tools to do our job, so we needed a single IDE to access data, engineer features, train models, and deploy and operate in production. Those are the problems and situations we were in.

Thumbnail 3070

I'll be showing you how we use SageMaker Studio in our development environment. With vendor cost as a primary problem, we started our in-house model development journey back in 2023 with SageMaker Studio as a core foundation. Today we have over 20 data scientists and ML engineers using our domains across four different teams: Fraud Risk, Credit Risk, Marketing, and our own Platform team.

JupyterLab is the primary IDE most of our engineers use, but some prefer Code Editor as well. Studio gives both options. When our data scientists log in to our internal SSO portal, they get redirected to the team-specific Studio domain where their own individual JupyterLab environment is provisioned and everything is ready for them. Everything is already connected, and they get to access the data in the lake and warehouse securely.

They get access to Glue, EMR, and SageMaker Processing jobs for data processing and model development. Every team has dedicated S3 buckets for storing intermediate data, and access to all of this is managed through IAM. From a platform perspective, we manage and provision all of this through infrastructure as code using Terraform. When we want to onboard a new data scientist, all we have to do is submit a single PR to create a user profile, and they're ready to go.

Thumbnail 3180

That's how we set up our development environment. Now let me show you how we deploy models in production. This is a complete end-to-end ML platform architecture. I know there is a lot here, but I wanted to show how this all connects end to end. It all starts with SageMaker Studio as a model development environment. This is where our data scientists explore the data, engineer features, experiment with different models, and finally, when they're ready to move this to production, we use AWS managed Airflow as our MLOps pipeline orchestration engine.

These pipelines trigger the SageMaker processing job for feature engineering at scale and trigger training jobs for model training. Everything gets versioned and registered in the model registry. It's the same pipelines that we built for model evaluation and model monitoring, our scoring pipelines, and rollout pipelines for different models. When we want to deploy these models for real-time serving, we use SageMaker endpoints, which is a key component in meeting our latency requirement of 50 milliseconds.
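Before moving on to serving, here is a minimal sketch of what one of those orchestration tasks could look like: an Airflow (MWAA) task that triggers a SageMaker training job. This is not KOHO's actual pipeline code; the bucket names, image URI, role, and sizing are assumptions for illustration.

```python
# Minimal sketch: an Airflow DAG with a task that launches a SageMaker
# training job, in the spirit of the MLOps pipelines described above.
from datetime import datetime
from airflow import DAG
from airflow.providers.amazon.aws.operators.sagemaker import SageMakerTrainingOperator

training_config = {
    "TrainingJobName": "fraud-model-{{ ds_nodash }}",
    "AlgorithmSpecification": {
        "TrainingImage": "<account>.dkr.ecr.<region>.amazonaws.com/fraud-train:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::<account-id>:role/SageMakerExecutionRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-ml-features/train/",   # assumed bucket
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://example-ml-models/"},
    "ResourceConfig": {"InstanceType": "ml.m5.2xlarge", "InstanceCount": 1, "VolumeSizeInGB": 50},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

with DAG("fraud_model_training", start_date=datetime(2025, 1, 1),
         schedule="@daily", catchup=False) as dag:
    train = SageMakerTrainingOperator(task_id="train_model", config=training_config)
```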

The entire ML platform is built on top of our core data platform, which is S3 for raw data storage, Redshift for warehousing, and we use SageMaker Feature Store for storing and serving ML features both in online and offline mode. SageMaker Studio is not only giving us the IDE for model development, but it's also giving us visibility of everything you see here in this diagram. From the same IDE, our engineers can explore all the existing features in the Feature Store, look at all the training jobs and their history, their experiments and results, all the models that are registered, and all the endpoints that are deployed, all from one single IDE.
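To make the real-time path concrete, here is a minimal sketch of fetching online features and scoring them against an endpoint while measuring latency. The feature group, record identifier, endpoint name, and payload format are assumptions, not KOHO's actual implementation.

```python
# Minimal sketch: online feature lookup plus real-time endpoint invocation.
import json
import time
import boto3

featurestore = boto3.client("sagemaker-featurestore-runtime")
runtime = boto3.client("sagemaker-runtime")

record = featurestore.get_record(
    FeatureGroupName="card-transactions",          # assumed feature group
    RecordIdentifierValueAsString="txn-12345",
)
features = {f["FeatureName"]: f["ValueAsString"] for f in record.get("Record", [])}

start = time.perf_counter()
response = runtime.invoke_endpoint(
    EndpointName="fraud-detection-prod",           # assumed endpoint name
    ContentType="application/json",
    Body=json.dumps(features),
)
latency_ms = (time.perf_counter() - start) * 1000
print(json.loads(response["Body"].read()), f"{latency_ms:.1f} ms")
```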

They don't have to jump between different consoles to gain visibility into their ML lifecycle. Everything is deployed within our VPC, and the data never leaves our boundary. That's our complete architecture: how we build models, how we deploy in production, and how we monitor and operate.

Thumbnail 3320

Now let's talk about results and cost, which was our primary problem. We went from spending $1.5 million per year with our vendor to just $26,000 per year with SageMaker. That's a 98% cost reduction, with over $1.47 million in annual savings. This is huge for a startup, or any company for that matter.

We are processing over 1 million transactions per day with just 15 milliseconds average latency, well below our budget of 50 milliseconds. Because we build the models on top of our own data from our data platform and our own custom ML platform, we can improve model accuracy and reduce false positives in our fraud detection use case. That means fewer dispute claims and happier customers.

We have scaled the same solution across many different use cases. We're building models for underwriting our loan products, for predicting user churn, and for marketing use cases like LTV and more. Now we are moving into GenAI and LLM-based applications and solutions as well.

The way SageMaker Studio helps us is that our data scientists and ML engineers can access foundation models through JumpStart. That's where we access these foundation models for prototyping: testing our prompts, comparing prompt results, comparing different models, model distillation, and fine-tuning as needed for that particular prototype. Once prototyping is complete and we have proven the value, we take those solutions and deploy them on purpose-built tools and infrastructure such as EKS, or in some cases directly with API calls to Bedrock.
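For reference, pulling a foundation model from JumpStart for this kind of prompt prototyping could look roughly like this; the model_id, prompt, and payload format are placeholders.

```python
# Minimal sketch: deploy a JumpStart foundation model for quick prompt testing,
# then tear the prototype endpoint down to avoid idle cost.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-3-8b-instruct")  # assumed id
predictor = model.deploy(accept_eula=True)

print(predictor.predict({
    "inputs": "Categorize this merchant: 'STARBUCKS #1234 TORONTO'",
    "parameters": {"max_new_tokens": 64},
}))

predictor.delete_endpoint()
```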

Thumbnail 3460

With that SageMaker Studio-first approach, we have many use cases in production today. Our compliance team is generating anti-money laundering reports faster. We have enhanced merchant categorization that drives the accuracy of our customers' cashback percentage. We are analyzing our customers' feedback faster and prioritizing our product requests and roadmap based on those insights.

The key takeaway is that SageMaker Studio is a single IDE that helps both with building, deploying, and operating traditional machine learning models and with experimenting, prototyping, and building GenAI and LLM applications. SageMaker Studio gave KOHO the power to build enterprise-scale, cost-effective solutions with our startup agility.

Thumbnail 3540

Conclusion and Resources for Getting Started with Amazon SageMaker AI

Before we conclude the session, we would like to share some material related to the official Amazon SageMaker AI website, as well as customer quotes and references on how SageMaker AI helped across multiple types of use cases. We also have some links related to workshops that you can start with for understanding how SageMaker AI works and test all the different capabilities.

Thumbnail 3570

Before we complete the session, I kindly ask you to rate the session on the app. This is really important for us to improve the content of future sessions. I would like to thank you again for attending this one and hope you can enjoy the rest of the day and the rest of re:Invent.


This article is entirely auto-generated using Amazon Bedrock.
