🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - From Vision to Value: Scaling Gen AI with Speed to Reduce TCO with AWS (COP215)
In this video, Shreenivas Chetlapalli from Tech Mahindra discusses how their partnership with AWS helps reduce total cost of ownership in generative AI implementations. He highlights that 95% of AI proofs of concept fail to reach production and presents five key principles for production-grade POCs. The session showcases Tech Mahindra Orion, a platform built with NVIDIA and hosted on AWS for model fine-tuning, RAG repositories, and agent automation. Two case studies are presented: a US engineering company that automated 50% of customer support calls generating $50,000 in additional revenue, and a Philippines BFSI company where FraudSentinel reduced fraud resolution time from seven days to under 24 hours, improving customer satisfaction by 10%. Chetlapalli emphasizes focusing on business outcomes over technology capabilities, considering cost economics including token costs, and measuring results. He also mentions Tech Mahindra's Indus model with 1.2 billion parameters and their work with AWS on metaverse applications and quantum computing using Braket.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
The 95% Failure Rate Challenge: Tech Mahindra and AWS's Approach to Production-Grade AI
Good evening. I hope you can hear me. I'm Shreenivas Chetlapalli, and I lead the business for innovation and emerging technologies at Tech Mahindra, one of the largest systems integration IT companies from India. We partner with AWS in multiple areas, including AI, metaverse, and quantum computing. In today's session, we would like to cover how Tech Mahindra and AWS have helped companies reduce their total cost of ownership through joint offerings in the generative AI space.
Let me share some important statistics with you. Many IT companies have been conducting numerous proofs of concept in the generative AI space, and everyone talks about the work they do with agents. But do we know how many of these proofs of concept actually get converted to production? According to a recent MIT study, the failure rate is 95 percent. Let me give you three additional statistics. In any given company we work with, employees spend approximately a quarter of their time searching for the data they need for their daily work, or trying to find the right person who can provide it. This happens across all organizations. Similarly, companies are investing to make their associates more productive with co-pilots and other tools, and this is consuming up to 7 percent of company revenue in service costs. But the most surprising statistic is that a lot of investment in technology is simply not used. According to Gartner, one third of what companies invest in areas like GPUs, technologies, and software is underutilized or completely wasted every single year.
So where do we go from here? Tech Mahindra and AWS have decided to define parameters for making proofs of concept production-grade. These are the principles we follow. First, when we do a proof of concept, we typically look at it only as a proof of concept. But can we look at it as a full-term project? Can we consider the deliverables we want to provide, the data we need to examine, and the latency we need to address, and whether that makes a difference? The second principle is the most discussed topic in the AI space: how do we provide security and governance, and are we incorporating them from the very first stage, while we are still doing the proof of concept?
The third principle is that AI improves over time. So when we are thinking of production, are we staggering our rollout so that each phase incorporates the learnings of the last? The fourth principle addresses a big misconception: that agentic AI will not need human beings. Agents are there to augment what we do. We should plan so that agents handle the bulk of the work and the human presence required becomes minimal over time, but that has to be planned deliberately. The last and most important principle for anything we do is evaluation: are we benchmarking the proof of concept, and are we clear on the results we want to get out of it? These are the key things to consider when creating a proof of concept.
At Tech Mahindra, we defined a strategy for how we deliver AI. Any company that comes to us asking for AI is looking at one of four things: productivity, transformation, innovation, or governance. So our strategy asks: can we ensure that we deliver AI right in the first instance? To do that, Tech Mahindra has come up with a product called Tech Mahindra Orion. What is Tech Mahindra Orion? Tech Mahindra, along with NVIDIA as an infrastructure partner, has built a platform that gives you access to models for fine-tuning, helps you create RAG repositories, and helps you launch agents to automate your tasks.
So what will that give us? Speed, interoperability, security, and governance. Now, this is the entire architecture of Orion. I will not go into detail, but I would request all of you to come to our booth, which is 1284, where we are talking about what Orion is, what are the key use cases that we have developed on Orion, how we launch agents at breakneck speed, what repositories we create, and how AWS is part of the entire Orion ecosystem. When you come to our booth, we'll discuss these topics, but this is the general architecture of Orion.
So what are we doing with AWS? First, we are partnered with them on Orion, which is hosted on AWS. Second, on Amazon Bedrock AgentCore, we are developing agents for some of our customers, typically in BFSI and manufacturing. Tech Mahindra also has a large pool of AWS-certified people across certification levels, and we are doing joint research with AWS in some of the core areas.
Real-World Success Stories: From Customer Support Automation to FraudSentinel Implementation
Between Tech Mahindra and AWS, there are multiple case studies we have done with customers, but I'll highlight two. The first is a US-based engineering services company that has been in operation since the 1950s, giving them 50 to 60 years in business. They approached us with a typical problem: they had 75 human agents working on customer support. These agents were getting bogged down with multiple calls, which resulted in long queues for people trying to reach the customer center and hurt productivity.
So what did Tech Mahindra do? Tech Mahindra and AWS delivered a joint solution: a generative AI-based chatbot to resolve these queries. How did we do that? First, we powered it with Amazon Lex, a service that uses natural language understanding. Whenever a customer submits a query, Lex identifies the intent behind it and passes that information to AWS Lambda, which orchestrates all the information available, whether it's a repository, inventory, or pricing.
Lambda fetches this information and gives it to Claude Sonnet. On AWS, Bedrock offers multiple models from AWS and from Anthropic; Claude Sonnet is one of the Anthropic models and is used for solving complex use cases. Claude Sonnet takes the information from Lambda and formulates it in a way human beings can read, and that response goes back through Lex to the end user. By doing this, we automated 50% of the calls they were getting and improved the customer's productivity. With the same 75 agents, they have been able to spend more time actually selling, generating $50,000 in additional revenue. This is the first use case, done for an engineering company.
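To make this flow concrete, here is a minimal sketch of what such a Lex fulfillment Lambda could look like. The model ID, the `lookup_context` helper, and the specific event fields are illustrative assumptions based on the public Lex V2 and Bedrock Converse interfaces, not the actual implementation from this engagement:

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Assumed model identifier; check Bedrock's model list for the exact ID.
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"

def lookup_context(intent_name, slots):
    """Placeholder for the orchestration step: a real handler would query
    inventory, pricing, or repository systems based on the intent."""
    return "Example context: part X is in stock at $42."

def lambda_handler(event, context):
    # Lex V2 delivers the recognized intent and the raw utterance in the event.
    intent = event["sessionState"]["intent"]
    user_query = event.get("inputTranscript", "")
    business_context = lookup_context(intent["name"], intent.get("slots", {}))

    # Ask Claude Sonnet on Bedrock to turn the fetched data into a readable answer.
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{
            "role": "user",
            "content": [{"text": f"Context: {business_context}\n\nCustomer question: {user_query}"}],
        }],
    )
    answer = response["output"]["message"]["content"][0]["text"]

    # Hand the generated answer back to Lex, closing this conversation turn.
    return {
        "sessionState": {
            "dialogAction": {"type": "Close"},
            "intent": {"name": intent["name"], "state": "Fulfilled"},
        },
        "messages": [{"contentType": "PlainText", "content": answer}],
    }
```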
Now, the second use case was for a company in the Philippines, a BFSI company primarily in mobile-based remittances. The company has been doing well for the past few years, and by 2024 it had 94 million consumers and a market capitalization of $5 billion. The Philippines as a country has a strong focus on remittances, and remittances are projected to grow at 13.3% every year through 2030. But this customer hit a roadblock.
Account takeover fraud occurs when someone impersonates you, takes over your account, and starts doing malicious things with it. When I discover something wrong has happened with my account, I immediately contact customer support. That is exactly what was happening with this customer, but their support team typically took seven days to resolve a case. In the finance business, taking seven days to resolve fraud cases means a loss of credibility. The company approached us, and we created a platform for them called FraudSentinel.
We did two key things to build this platform. First, we automated the entire fraud detection process using AWS Step Functions. What used to happen manually now runs as a defined process flow, making fraud detection much simpler. Second, we created a generative AI-based chatbot that fraud analysts can use to consult experts in normal business language. By implementing these solutions, we brought the resolution time for account takeover cases down from about seven days to less than 24 hours.
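As a rough illustration of that process flow, the sketch below defines and starts a toy fraud-triage state machine through the Step Functions API. The state names, Lambda ARNs, and IAM role are placeholders; the real FraudSentinel workflow is not public:

```python
import json
import boto3

sfn = boto3.client("stepfunctions", region_name="ap-southeast-1")

# A minimal triage flow: freeze the account, gather evidence, notify an analyst.
definition = {
    "StartAt": "FreezeAccount",
    "States": {
        "FreezeAccount": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:ap-southeast-1:123456789012:function:freeze-account",
            "Next": "GatherEvidence",
        },
        "GatherEvidence": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:ap-southeast-1:123456789012:function:gather-evidence",
            "Next": "NotifyAnalyst",
        },
        "NotifyAnalyst": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:ap-southeast-1:123456789012:function:notify-analyst",
            "End": True,
        },
    },
}

machine = sfn.create_state_machine(
    name="fraud-triage",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/fraud-triage-role",  # placeholder role
)

# Each reported account-takeover case becomes one execution of the flow.
execution = sfn.start_execution(
    stateMachineArn=machine["stateMachineArn"],
    input=json.dumps({"accountId": "acct-001", "reportChannel": "mobile-app"}),
)
print(execution["executionArn"])
```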
Not only did we reduce the resolution time, but we also brought the service level agreement to one day. For a company that had been losing credibility, this improvement helped increase its customer satisfaction index by 10%. Now they're back in business, still leading in the Philippines market, and expanding further. These use cases illustrate an important principle: when we focus on business outcomes, what should we actually be looking at?
What generally happens is that when technologists meet customers, we are happy to discuss technology or tell them what the latest technology can do. However, the focus should be on what business outcomes the customer is seeking and how we can solve for those. Could a simple automation solution accomplish the customer's goals, rather than us proposing a large language model with a RAG repository? We need to ask ourselves whether we are focusing on the business outcomes the customer wants or pushing technology for its own sake.
The second consideration is cost. From the moment we start looking at a solution, we should be examining the cost economics. Do we actually need these things? Often we reach for ChatGPT or other models whose tokens are costly. I am not saying these tools are good or bad, but we need to take token costs into account when we do the economics; a toy calculation follows below. The third consideration is whether we are properly planning proofs of concept to scale into production-grade solutions.
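To revisit the cost point with a back-of-the-envelope calculation: the snippet below compares the monthly bill for the same support workload under two hypothetical per-token price points. The prices and token counts are placeholders, chosen only to show how quickly token costs dominate the economics:

```python
def monthly_cost(queries_per_day, tokens_in, tokens_out,
                 price_in_per_1k, price_out_per_1k):
    """Estimated monthly spend for a chatbot at a given query volume."""
    per_query = (tokens_in / 1000) * price_in_per_1k + (tokens_out / 1000) * price_out_per_1k
    return per_query * queries_per_day * 30

# Same 10,000-queries-per-day workload on a large frontier model vs. a
# smaller fine-tuned model; prices here are placeholders, not vendor rates.
large = monthly_cost(10_000, tokens_in=1_500, tokens_out=400,
                     price_in_per_1k=0.003, price_out_per_1k=0.015)
small = monthly_cost(10_000, tokens_in=1_500, tokens_out=400,
                     price_in_per_1k=0.0003, price_out_per_1k=0.0006)
print(f"large model: ${large:,.0f}/month, small model: ${small:,.0f}/month")
```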
These are some of the key things we are doing. The last and most important thing is to measure all the work we undertake: measure the proofs of concept and the projects, and be clear about the output we want from them. That is the most important thing to focus on.
The way we have been orchestrating things is by discussing with the customer to understand what the problem is and how it actually impacts them. In the first case, the company was losing business because customer service executives who were supposed to sell or cross-sell were busy handling queries and taking more time to resolve them. In the second case, the customer was gaining business but losing credibility, which in the long term could affect growth in the remittance market. The focus was on how we look at business outcomes, then we come back to the drawing board and ask whether we should use AI, generative AI, or a RAG repository. What solution do we actually want to derive?
Overall, the focus has been on how we derive business outcomes from our technology implementations. We need to ensure that our technology decisions are driven by business value rather than technological capability alone.
For instance, just as agents are not a panacea for all our problems, large language models are not a panacea either. Tech Mahindra has created a model called Indus with 1.2 billion parameters, and when independent bodies benchmarked it against GPT and others, they found our tokenization ratio was much better. This tells us that we should not evaluate large language models solely on parameter count, but rather on how they have been fine-tuned and how they have been built.
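To illustrate what a tokenization ratio comparison looks like in practice, the sketch below measures tokens per word for the same sentence under two publicly available tokenizers. Indus's tokenizer is not public, so these models are stand-ins; a lower tokens-per-word ratio means fewer tokens billed and processed for the same text:

```python
from transformers import AutoTokenizer

sample = "Banks in the Philippines process millions of remittances every day."

# Publicly available tokenizers used as stand-ins for this illustration.
for model_name in ["gpt2", "google/mt5-small"]:
    tok = AutoTokenizer.from_pretrained(model_name)
    n_tokens = len(tok.encode(sample))
    n_words = len(sample.split())
    print(f"{model_name}: {n_tokens} tokens, {n_tokens / n_words:.2f} tokens/word")
```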
Coming back to the same premise, we focus on cost and business outcomes, and we try to conduct our pilots as production-grade implementations. I believe this is what we learned from our journey with both customers. The next point I wanted to make is that we would like to continue this discussion when you visit our booth at 1284, to understand what problems you are facing in your areas and in the AI space. We want to learn what challenges your customers are experiencing and how we can work together, involving AWS wherever required.
The third point is that as a company, we also look beyond AI. I have my metaverse expert here, and we are exploring how we can use immersive experiences combined with AI to simulate many business processes. This could include digital twins for a mining company, digital twins for how fiber needs to be laid out for a telecom company, or digital twins for how a financial process gets solved. These are things we are trying to do, but they represent a culmination of both AI and immersive experience, or metaverse as we call it.
The last area we are focusing on, again with AWS, is bringing quantum computing into practice. We are exploring how to use quantum computing for cases like fraud detection and route optimization. We are working on this with Amazon Braket, AWS's quantum computing service (a minimal example of the Braket SDK follows below). That is where I would like to end my session, and I would be happy to take any questions.
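As referenced above, here is a minimal Amazon Braket sketch: a two-qubit Bell circuit run on the SDK's free local simulator, just to show the shape of the API. A real fraud-detection or route-optimization workload would involve far larger problem formulations on managed simulators or QPUs:

```python
from braket.circuits import Circuit
from braket.devices import LocalSimulator

# Entangle two qubits with a Hadamard gate followed by a CNOT.
bell = Circuit().h(0).cnot(0, 1)

device = LocalSimulator()
result = device.run(bell, shots=1000).result()
print(result.measurement_counts)  # expect roughly half '00' and half '11'
```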
This article is entirely auto-generated using Amazon Bedrock.