🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - New Era of Platform Engineering – Agentic AI-Powered Self-Service (AIM359)
In this video, Ruslan Kusov, Cloud COE Director at SoftServe and AWS Ambassador, presents how platform engineering evolves with agentic AI integration. He explains SoftServe's adaptive modernization framework built on AWS services like EKS, ECS, and Lambda, emphasizing customizable integration interfaces over one-size-fits-all solutions. The presentation highlights ADQ (AI-driven enhanced engineering with Amazon Q), which combines self-service platforms with AI agents, Amazon Bedrock, and MCP servers to automate end-to-end SDLC processes. A live demo showcases migrating on-premises Java applications to AWS using containerization and managed services, demonstrating how AI-powered self-service enables re-platforming and re-architecture strategies at lift-and-shift costs, achieving 2-3x faster migration times with minimal cloud knowledge required.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Platform Engineering Fundamentals: From DevOps to the Adaptive Modernization Framework
Hi everyone. My name is Ruslan Kusov. I'm Cloud COE Director at SoftServe, an AWS Ambassador, and currently I'm losing my voice. I have one more presentation today, but I hope everything will be fine, at least for this presentation. Today I'm going to tell you about a new era of platform engineering, particularly platform engineering powered by agentic AI and self-service with agentic AI.
Let me start with the concept of platform engineering itself. I believe most of you may be familiar with this, but if not, it all started with DevOps. Remember DevOps: tools, people, and processes, and how we connect those things together to build something useful for our developers. It evolved at some point in time into platform engineering.
People started building what they call an internal developer platform, a self-service platform. The idea is that if you have a large development team or multiple teams, like ten different teams, you would like to introduce some set of standards. Otherwise, you will end up in a situation where one team develops their application using one tool, another team uses a different tool, one team uses Datadog, another team uses CloudWatch, and so on. This introduces additional operational overhead and costs. To avoid that, you want to standardize and build an abstraction layer, a platform that can be used as a self-service, so you just follow the standard and write your code. This is pretty good for developers, and that's why the concept became popular.
I really mean it, because we researched different sources, worked with multiple customers, and observed this firsthand as a system integrator. We compared our data with data from Red Hat, and as you can see on the slide, 85% of customers are moving towards platform engineering; organizations are either discovering it or already using it. 18% have advanced practices, 14% are just exploring, and the rest, at 27% and 41%, are somewhere in between.
The process itself, from our experience, takes about two years. So if you're right now at the exploring stage, you need about two years to introduce mature practices of platform engineering in your organization. The problem is that there is no one-size-fits-all approach. It's a complex product that depends on your organizational culture, the way you organize processes within your organization, and what tools you're already using. As I mentioned previously, it's about setting standards without introducing unnecessary new tools.
How can we approach this? At SoftServe, we decided that it makes sense to build a framework. Instead of telling you that you should buy a platform engineering solution, we believe you should build a platform engineering solution that fits all your needs. We introduced our own framework based on our experience, our interactions with clients, the feedback we received, and our collaboration with AWS.
This ended up as a concept where we have core compute components: EKS, ECS, and Lambda, because we're talking about microservices architectures. We have all the other components that complement this microservices architecture, like the microservices runtime, observability components, security components, and CI/CD, because we need to build containers, test them, and deploy them. We also have storage, databases, and third-party services. And of course we have the Internal Developer Portal, the IDP, the tool that serves as the developer's interface to interact with this platform.
The idea is that all you need to do is design integration interfaces. Understand that if today you're using HashiCorp Vault and tomorrow you switch to Secrets Manager, you don't have to change anything in your code. You just update this integration interface, the driver, and you get the new tool securely connected to your environment and your applications. It's a pretty cool idea. We call it the adaptive modernization platform, but it's not a platform, it's a framework. I know the naming sometimes confuses people, but we're trying to follow AWS in how we name our services.
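As a rough illustration of that idea (not taken from SoftServe's framework), the sketch below shows what such a swappable integration interface could look like in Python; the class names, the secret path, and the assumption that the secret value lives under a "value" key are all hypothetical.

```python
# Hypothetical sketch of a pluggable "integration interface" for secrets.
# The platform codes against SecretsProvider; swapping backends means
# swapping the driver, not the application code.
from abc import ABC, abstractmethod

import boto3   # AWS SDK, used by the Secrets Manager driver
import hvac    # HashiCorp Vault client, used by the Vault driver


class SecretsProvider(ABC):
    @abstractmethod
    def get_secret(self, name: str) -> str: ...


class VaultProvider(SecretsProvider):
    def __init__(self, url: str, token: str):
        self._client = hvac.Client(url=url, token=token)

    def get_secret(self, name: str) -> str:
        resp = self._client.secrets.kv.v2.read_secret_version(path=name)
        # Assumes the secret was written with a "value" field.
        return resp["data"]["data"]["value"]


class SecretsManagerProvider(SecretsProvider):
    def __init__(self, region: str):
        self._client = boto3.client("secretsmanager", region_name=region)

    def get_secret(self, name: str) -> str:
        return self._client.get_secret_value(SecretId=name)["SecretString"]


# Application code depends only on the interface, never on the backend:
def connect_to_database(secrets: SecretsProvider) -> None:
    password = secrets.get_secret("app/db-password")  # hypothetical secret name
    print("Connecting with a password of length", len(password))
```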
We started with Blueprints. As you can see, it's a Kubernetes reference architecture based on the EKS Blueprints released a couple of years ago. We have multiple Availability Zones. We have cluster components split into critical components and worker components, and those environments are isolated. We have best practices like Karpenter instead of Cluster Autoscaler to better manage the efficiency and scaling capabilities of the cluster.
If we dive deeper, we also introduce some default components because, remember, not everything can be solved with Kubernetes alone.
You should think about observability, security, CI/CD, and other things. As you can see, we built this framework considering those other components, but they are built in a way that lets them be easily substituted. If, for example, you don't want to use ArgoCD for deployments because you're currently using Jenkins and you're absolutely fine with Jenkins, you can proceed with Jenkins. We'll give you a framework that tells you what gaps you have in platform engineering and how you can fill those gaps with our default solutions, and that shows you how to apply these platform engineering processes and practices in your organization.
The AI Evolution: Connecting Generative AI with Platform Engineering Through ADQ
It worked really well until 2023, the year when people started experimenting with AI. The interesting thing is that in 2023, a lot of people did not realize that AI was not something new. AI was born in 1956 at a small workshop. It evolved over the years, and 2023 was the year of experiments and the year of generative AI. I believe everyone experimented with ChatGPT, with chatbots, and so on; those are common use cases. However, that hype distracted people a lot, and it distracted our platform engineering practices as well.
The next year was the year after the hype, when people realized they had created a lot of chatbots and run a lot of experiments with AI, but now had to figure out how to move that to production. In 2025, we're still not in production, and you can join my second session where I will show you some specific numbers about that. But 2025 is the year when we're trying to identify the business value of AI and generative AI, and we are trying to build a business case for our customers in order to move them into this AI era. Suddenly we figured out that it's very well connected with the platform engineering use cases.
To be honest, I was not surprised, but it was an honor for me this morning to participate in the keynote from Matt Garman, and he shared the same idea: the SDLC is an end-to-end process. You cannot just take an agent for testing, an agent for business planning, and an agent for UI design. If you take just a single agent and don't care about the end-to-end SDLC process, you will fail and introduce yet another bottleneck.
Let me explain how it works. Imagine that you have a lifecycle or SDLC cycle every two weeks with a new release every two weeks. You introduce testing agents, so now you don't need two weeks to test all your code. You can test it within hours. But your developers cannot develop the code within hours; they still need two weeks to develop new code. So you'll be waiting for the code that should be tested. Deployment environments cannot be prepared in hours or days. We still need two weeks to prepare those environments. So instead of improving the process, you will introduce yet another bottleneck to that process.
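To make that bottleneck arithmetic concrete, here is a tiny back-of-the-envelope sketch with illustrative stage durations (the numbers are not from the talk): accelerating only the testing stage barely changes the end-to-end lead time.

```python
# Illustrative days spent per SDLC stage in one release cycle.
stages_before = {"develop": 10, "prepare_env": 10, "test": 10, "deploy": 1}
stages_after  = {"develop": 10, "prepare_env": 10, "test": 0.2, "deploy": 1}  # testing agent added

lead_time_before = sum(stages_before.values())  # 31 days
lead_time_after = sum(stages_after.values())    # 21.2 days

print(f"before: {lead_time_before} days, after: {lead_time_after} days")
# The testing agent now sits idle waiting for code and environments:
# the cycle is still dominated by the stages that were not automated.
```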
In order to avoid that, you need an end-to-end solution. So we connected this with platform engineering: what if we take this, one of the best business cases for organizations to step into the era of AI and generative AI, and connect it with platform engineering, one of the most mature practices we already had? We decided to extend the idea of the self-service portal with AI. That's how we released ADQ, or AI-driven enhanced engineering with Amazon Q. Don't pay too much attention to the name; we are following AWS best practices in naming our services.
We released this idea of self-service powered by AI, and as you can see, it's all based on components that are available as first-party agents from Amazon, like Amazon Q, Amazon Bedrock agents, and AWS Transform. Pay attention to the agents that AWS released today, especially the security agent and the DevOps agent. That's something we tested in preview mode, and it worked really well as an extension to this. We also have MCP servers, custom agents, reusable prompts, and LLMs. With LLMs, it's the same situation as with engineers. You have two different engineers with their own backgrounds and experience, and if you ask them to do the same thing, one engineer may spend one hour and the other two hours, and you can get different results. Both will work, but one result will be more cost-effective and scalable, and the other less secure, for example. It's the same with LLMs. That's why it's important to choose the right LLM and provide this power of choice to developers. This is yet another iteration of platform engineering.
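As a rough illustration of that power of choice (not ADQ's actual code), the sketch below sends the same prompt to two different models through the Amazon Bedrock Converse API; the model IDs and the prompt are just examples, and availability varies by account and region.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Generate a multi-stage Dockerfile for a Maven-based Java 17 service."

# Example model IDs; some models may require a cross-region inference profile
# prefix in your region. A self-service platform would let the developer (or a
# sensible default) pick the model that balances cost, speed, and quality.
model_ids = [
    "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "anthropic.claude-3-haiku-20240307-v1:0",
]

for model_id in model_ids:
    response = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    text = response["output"]["message"]["content"][0]["text"]
    print(f"--- {model_id} ---\n{text[:200]}...\n")
```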
Real-World Applications: AI-Orchestrated CI/CD and Kubernetes Deployment Validation
We just introduced an enhancement to the self-service that brings AI capabilities to developers. The idea of self-service is to introduce a nail gun. If you're a builder constructing a new house, you can build it with just nails and a hammer, but that will take months. Alternatively, you can use a nail gun for the same purpose and build the house within a week or less. The same principle applies to platform engineering.
Let me share a couple of real cases before I jump to the live demo. The first case is one we implemented for one of our customers, where we orchestrated the deployment CI/CD flow with the help of AI and MCP servers. It's a pretty simple case where a developer needs to know only about their programming language. For example, I know Java, so I create my application in Java. After that, I commit to the repository, and everything else is done by AI. The AI detects the programming language, creates the Dockerfile, builds a container from that Dockerfile, and deploys to Amazon using a manifest generated by another agent. It returns the deployment status. If everything is okay, I get a notification in Slack that the deployment was successful. If something goes wrong, I get a notification with analysis from MCP servers for CloudWatch and information about the potential root cause of the deployment problem.
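The customer's agents and MCP servers are not public, so the following is only a simplified sketch of the first steps of such a flow: naive language detection, a Dockerfile generated through Amazon Bedrock, and a Slack notification. The webhook URL, model ID, and detection rules are placeholders.

```python
import json
import urllib.request
from pathlib import Path

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
SLACK_WEBHOOK = "https://hooks.slack.com/services/PLACEHOLDER"  # hypothetical webhook


def detect_language(repo: Path) -> str:
    """Very naive language detection based on marker files."""
    if (repo / "pom.xml").exists() or (repo / "build.gradle").exists():
        return "java"
    if (repo / "package.json").exists():
        return "nodejs"
    return "unknown"


def generate_dockerfile(repo: Path, language: str) -> str:
    """Ask the model for a Dockerfile for the detected stack."""
    response = bedrock.converse(
        modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",  # example model ID
        messages=[{
            "role": "user",
            "content": [{"text": f"Write a production-ready Dockerfile for a {language} "
                                  f"application in the repository '{repo.name}'. "
                                  "Return only the Dockerfile."}],
        }],
    )
    return response["output"]["message"]["content"][0]["text"]


def notify_slack(message: str) -> None:
    req = urllib.request.Request(
        SLACK_WEBHOOK,
        data=json.dumps({"text": message}).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)


repo = Path(".")
language = detect_language(repo)
dockerfile = generate_dockerfile(repo, language)
(repo / "Dockerfile").write_text(dockerfile)
notify_slack(f"Dockerfile generated for {language} app; build and deploy steps follow.")
```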
The second use case is pretty similar. Here you can see an example of a model that we use. We use Claude 3.7 for that. The idea is that we built a tool for validation of deployments to Kubernetes. It includes validation of application components and validation of cluster components. By the end of that process, the customer receives a validated Kubernetes deployment. After a production release, the customer can be sure that everything was fine and no issues happened during deployment. In case of any issues, there is a rollback process with notification of the root cause for that failure. It works really well in production environments, and we saved about three months of time for the customer by implementing that instead of rewriting the tool they built originally using Go.
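The validation tool itself is internal, but a minimal sketch of the kind of health check it could perform, using the official Kubernetes Python client, might look like this; the deployment name and namespace are placeholders, and the real rollback and root-cause analysis are only hinted at in comments.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside the cluster
apps = client.AppsV1Api()


def validate_deployment(name: str, namespace: str = "default") -> bool:
    """Return True if the rollout is healthy: all desired replicas are ready."""
    dep = apps.read_namespaced_deployment_status(name, namespace)
    desired = dep.spec.replicas or 0
    ready = dep.status.ready_replicas or 0
    print(f"{name}: {ready}/{desired} replicas ready")
    return ready == desired


if not validate_deployment("todo-app"):  # placeholder deployment name
    # In the real tool this is where the rollback and the root-cause
    # notification (driven by the LLM plus CloudWatch and cluster data) would run.
    print("Deployment unhealthy: triggering rollback and root-cause analysis")
```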
To be clear, we're not going to build a product, because that doesn't make sense here. We built a framework. We don't have any proprietary software or intellectual property here, but we do have a framework that helps accelerate deployment, application development, application modernization, and migration, and increases development productivity. This framework follows best practices for security, reliability, scalability, and cost efficiency, so it's very well aligned with the Well-Architected Framework.
Live Demo: Modernizing Legacy Applications with AWS Transform and Agentic AI Self-Service
Now it's time for the demo. This is a live demo, so I apologize if it doesn't work, but I'll do my best. This demo is about migration as an example of self-service. Let's imagine that I'm running my data center, and as you can see, it's connected through a VPN; I have my internal IP address for that. I can create some items here, and they are added to my to-do list, so it's a well-functioning application. The architecture is pretty simple: I have a virtual machine with a load balancer, two virtual machines with the application, and one virtual machine with the PostgreSQL database. I'm going to move it to AWS Cloud. I'm going to use the benefits of AWS Transform, a service that allows me to run assessments, get dependency mapping, qualify my services, and run right-sizing exercises. But Transform works well for lift and shift.
Unfortunately, lift and shift is a strategy for lazy migration. It's the best strategy to introduce technical debt, but not the best strategy to run in the long term. With that, customers prefer to find a way to modernize, to introduce re-platforming and rearchitecture at the beginning. So the question is: is it possible to perform re-platforming and rearchitecture with the cost of lift and shift? With this demo, I'll show you that yes, it's possible, and that's because of the self-service that we provided to our developers.
On the left-hand side is the architecture I have right now. On the right-hand side is what I'm going to build. Instead of virtual machines and a VM-based load balancer, I'm going to use managed services: RDS with a primary and replica setup in a Multi-AZ deployment, and my application converted to containers.
I containerized it and deployed it to EKS. The application itself is a Java application, and what we built is an agent with a fancy web UI that I added for the purpose of this demo. This agent can be integrated into the IDE of your choice, whether that's Visual Studio Code or whatever else you're going to use. It can be easily integrated.
I'm going to show you that I have my servers, and that they have been discovered by AWS Transform. I saved this file earlier, so I'll probably skip the part with the AWS console, but this is a file generated by the AWS Transform agents. With AWS Transform, I used its discovery tool to get this CSV file, which identifies the instances, their IP addresses, and their roles, including client, database, and load balancer. To show you the visualization built from that information I would need to log in to my console again, so I'm not going to show it here, but we can chat offline after this presentation.
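The exact columns of the Transform export are not shown in the talk, so the snippet below only illustrates how such a discovery CSV (instance, IP address, role) could be grouped by role before it is handed to a modernization agent; the file and column names are assumptions.

```python
import csv
from collections import defaultdict

# Hypothetical file and column names; adjust to the actual AWS Transform export.
servers_by_role = defaultdict(list)
with open("transform-discovery.csv", newline="") as f:
    for row in csv.DictReader(f):
        servers_by_role[row["role"]].append(
            {"instance": row["instance"], "ip": row["ip_address"]}
        )

for role, servers in servers_by_role.items():
    print(role, "->", [s["ip"] for s in servers])
# Expected roles in this demo: load balancer, application (client), database.
```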
Let me go back to this tool that we built, this modernization agent. All we need to do is upload our files: the CSV file and the artifacts for this Java application go to an S3 bucket. That's pretty much it. Everything else is just pushing this button to start the analysis. Under the hood, that's an implementation of self-service with a dedicated Java containerization agent. The use case is a developer who operates on-premises and is going to migrate to the cloud with zero knowledge of the cloud itself and zero knowledge of cloud services. I'm going to migrate my application to a reference architecture with containers, a load balancer, and the RDS managed service, and I'm going to use AI for that. I'm going to use these self-service capabilities connected with AWS Transform.
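Behind the web UI, the upload step boils down to putting the artifacts where the agent can read them; here is a minimal sketch with a hypothetical bucket name and object keys.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "adq-modernization-input"  # hypothetical bucket name

# Upload the Transform discovery export and the Java build artifact.
s3.upload_file("transform-discovery.csv", BUCKET, "discovery/transform-discovery.csv")
s3.upload_file("target/todo-app.jar", BUCKET, "artifacts/todo-app.jar")

# The "start analysis" button then kicks off the containerization agent,
# which reads these objects and calls Bedrock with the predefined prompts.
```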
We're going to wait a couple of seconds for it to be fully generated, and then I will show you what this tool actually creates and how it can be used for the next steps. Meanwhile, just a reminder that we are going from this legacy lift-and-shift setup to this modernized environment. I actually like this aspect of a live demo, because it shows you in real time what is needed to perform this conversion: generate all the files, all the manifests, the deployment, the Dockerfile, and end up with something that can be easily deployed to a live environment.
Technically, within this live demo you're watching the migration process for this simple application. To understand what this means for you and your cases, if you have a large migration and are planning to migrate 1,000 VMs, all you need to do is multiply this by 100 or 1,000 to understand how much time you need for that modernization. We're generating recommendations now; I hope they will be here soon. Meanwhile, unfortunately I cannot show you what's going on, but AWS Transform has an integration with Migration Hub. That service will be deprecated; however, right now, through Migration Hub, you can see the list of servers, the dependencies, the network dependency map, and this file exported from the AWS Transform discovery tool. That's pretty much what you can use, and you can plan your migration in Migration Hub. But again, a reminder that AWS Transform supports just lift-and-shift migration, nothing else.
The live demo is running a bit late. People are probably using Bedrock heavily right now, because under the hood we're using Bedrock: we're using the API, the connection to Bedrock, and predefined prompts. We defined the LLM, we built these custom agents and custom MCP servers, and we predefined the prompts that are used to run this request and produce results that will help me modernize my application.
By the end of this demo (and I hope we can chat offline once I'm done with my presentation), you will have a generated Dockerfile that will help you build and deploy the application, a YAML file that will help you apply and deploy this application to a Kubernetes cluster, and step-by-step instructions on how to move your database, including a snapshot that should be taken first. Finally, here it is; it took a bit more time. These are the results of the analysis with the whole deployment roadmap: a preparation step, application containerization, Kubernetes deployment, load balancer configuration, database migration, validation testing, and details. As you can see, there's a Dockerfile for your application, build commands for that Dockerfile, a Kubernetes deployment, Kubernetes services, a config map, variables, and configuration for other resources, including CDK infrastructure-as-code for the load balancer and CDK for the database service.
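To give a feel for the kind of CDK output described here, this is a hand-written sketch (not the generated code) of an RDS PostgreSQL primary with Multi-AZ and a read replica in CDK for Python; the construct names, engine version, and instance sizes are illustrative.

```python
from aws_cdk import App, Stack
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_rds as rds
from constructs import Construct


class DatabaseStack(Stack):
    """Illustrative stack: RDS PostgreSQL primary (Multi-AZ) plus a read replica."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "AppVpc", max_azs=2)

        primary = rds.DatabaseInstance(
            self, "TodoDbPrimary",
            engine=rds.DatabaseInstanceEngine.postgres(
                version=rds.PostgresEngineVersion.VER_15),
            vpc=vpc,
            multi_az=True,  # standby in a second Availability Zone
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
        )

        rds.DatabaseInstanceReadReplica(
            self, "TodoDbReplica",
            source_database_instance=primary,
            vpc=vpc,
            instance_type=ec2.InstanceType.of(
                ec2.InstanceClass.T3, ec2.InstanceSize.MEDIUM),
        )


app = App()
DatabaseStack(app, "TodoDatabaseStack")
app.synth()
```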
Coming back to my presentation, that was an example of the self-service that we introduced for our customers and our developers. It significantly accelerated our migration and modernization practices. On average, we see a 2 to 3 times improvement in the time needed for migration and modernization. Last but not least, this is a way to do migration with a re-platforming and re-architecture strategy instead of lift and shift, without introducing technical debt, and with almost zero cloud knowledge required. This is the proper use case for self-service. Thank you very much.
; This article is entirely auto-generated using Amazon Bedrock.