🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - BCC: Hybrid architecture for generative AI to meet regulatory needs (HMC324)
In this video, Maxim Geng from Bank CenterCredit and Aleksandr Bernadskii from AWS present the bank's hybrid architecture for generative AI, built on AWS Outposts to meet regulatory requirements. They explain how the bank deployed a multi-cloud strategy with AWS KMS External Key Store (XKS) for encryption using local keys, enabling compliance with Kazakhstan banking regulations. Two use cases are showcased: fine-tuning an ASR model for mixed Kazakh-Russian speech, which achieved a 23% accuracy improvement and saves four million tenge monthly, and an HR chatbot using hybrid RAG with the Claude Sonnet model on AWS Bedrock that now handles 70% of HR requests. The architecture leverages cloud GPUs for model training while keeping sensitive data on premises through mel spectrogram conversion and Named Entity Recognition for data anonymization.
; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Building a Hybrid Multi-Cloud Architecture with AWS Outposts to Meet Banking Regulatory Requirements
Hello, my name is Maxim Geng and I'm head of R&D for DevOps Technologies and Cloud Solutions at Bank CenterCredit. The topic of our presentation is hybrid architecture for generative AI to meet regulatory needs. Before we begin, let me share a few words about Bank CenterCredit. The bank is one of the leaders of the banking system in Kazakhstan. We have 20 branches and more than 150 offices. We serve over three million clients and provide a wide range of services for both private individuals and business clients. We also subscribed to two Outposts racks in Q1 2024.
Let's start with why we decided to use cloud and what our goals were. All of our IT infrastructure was hosted locally in our own data centers, but with growing business needs and limitations of local solutions, we decided to scale our resources using cloud. Since different providers have their own advantages, we chose a multi-cloud strategy. What goals did we see in using cloud? First, flexibility and scalability. Second, innovation and competitiveness. Third, reliability and fault tolerance. And fourth, economic efficiency.
But at the same time, as a bank, we need to ensure compliance with regulatory requirements. First requirement is that data transmission and storage must be encrypted, and the encryption key must be stored in the bank. Second requirement is to ensure anonymization of customers' data during collection and processing. Based on these requirements, we started our journey to cloud with AWS Outposts. AWS Outposts is a local private AWS cloud that stores customer data on premises. We were one of the first in Kazakhstan to deploy this solution, placing racks in our own data centers, thereby meeting all necessary requirements.
We started exploring and deploying basic managed AWS services in this solution, such as managed virtual machines, Kubernetes clusters, and S3 storage. The core layer of the bank's application architecture consists of Kubernetes-based microservices. These workloads execute across more than 20 clusters, and we wanted to explore how we could migrate them to the cloud. We started by exploring and deploying managed Kubernetes on AWS Outposts and later integrated it with our internal bank services. We then validated the same deployment approach with AWS and other clouds.
As a result, we got the following diagram. This slide shows a multi-cloud microservice architecture for Kubernetes clusters located in our data centers in Outposts, as well as in AWS cloud and Azure cloud. All of this is combined with centralized tools for deployment management, logging, and load balancing. Returning to the requirements for cloud usage, let's discuss more about encryption implementation.
For encryption, we used the AWS KMS encryption service. It has a feature called External Key Store, or XKS, which allowed us to use local keys for encryption in the cloud. With XKS, AWS KMS can encrypt data in cloud services using root keys that remain in the bank. An XKS proxy connects AWS KMS to our local key store, forwarding cryptographic requests so that the keys themselves never leave our premises. In our case, the XKS proxy is located on our AWS Outposts.
This setup is based on the principle of double encryption. In the cloud, the AWS KMS encryption service uses a data key to encrypt data in AWS services. That data key is in turn protected by a root key located on premises; because the root key is an external key under the bank's control, this provides an additional layer of security.
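The double-encryption principle can be illustrated with a small envelope-encryption sketch. This is a toy, stdlib-only illustration of the concept, not the real AWS KMS/XKS API: the XOR keystream cipher and all function names here are invented for the example, and a production system would use a vetted AEAD cipher.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256-derived keystream.
    Illustrative only -- real systems must use a vetted AEAD cipher."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(x ^ y for x, y in zip(data, out[: len(data)]))

def encrypt_envelope(root_key: bytes, plaintext: bytes) -> dict:
    # 1. Generate a fresh data key (AWS KMS does this in the real setup).
    data_key = secrets.token_bytes(32)
    # 2. Encrypt the payload with the data key.
    ciphertext = keystream_xor(data_key, plaintext)
    # 3. Wrap the data key with the root key, which stays in the bank.
    wrapped_key = keystream_xor(root_key, data_key)
    return {"ciphertext": ciphertext, "wrapped_key": wrapped_key}

def decrypt_envelope(root_key: bytes, envelope: dict) -> bytes:
    # Unwrap the data key with the on-premises root key, then decrypt.
    data_key = keystream_xor(root_key, envelope["wrapped_key"])
    return keystream_xor(data_key, envelope["ciphertext"])
```

Only the wrapped (encrypted) data key travels with the data; anyone without the root key cannot recover it, which is why keeping the root key on premises satisfies the regulator.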
The bank was interested in AWS managed AI and ML services and GPU workloads for our projects. Building on this encryption implementation, we continued the integration and designed an architecture for AI model training. This slide shows the training architecture for our ASR models. To do this, we set up synchronization between S3 on Outposts and S3 in the cloud, which you can see in step one. After that, models can be trained on this data in the cloud using EC2 GPU instances or SageMaker, as shown in step two. The processed data can then be transferred back from S3 in the cloud to S3 on Outposts in step three.
Finally, in step four, our internal services and services in EKS can interact directly with the processed data. Throughout all stages, we use our local keys to encrypt data during processing, transfer, and storage. After this, we continued integration with another machine learning service, AWS Bedrock. In this case, our local services and services in EKS on Outposts can interact directly with the Bedrock API agent. A private endpoint provides a secure connection for real-time use. This solution allowed us to use all AI models available in Bedrock and to scale our workloads to any size.
Fine-Tuning an ASR Model for Mixed Kazakh-Russian Language Using Hybrid Cloud Infrastructure
Throughout all stages of data transferring, preparing, and storage, we use our local keys for encryption. As a result of implementing this architecture, integration, and encryption mechanism, we have created a technological foundation for our projects using AWS cloud services. That's all from me. Thank you. Aleksandr will tell you more about data anonymization and the implementation of real cases in the bank.

Hi there. My name is Aleksandr Bernadskii. I'm a Solutions Architect at AWS. In the next ten minutes, I will tell you about two generative AI use cases that Bank CenterCredit, or BCC, built on top of the hybrid architecture that Maxim just showed you. Before I begin, I must admit that I was just an advisor for this project and most of this brilliant work was done by the data science team of the bank. Unfortunately, the guys couldn't join us today in full numbers, but we have Daria from this team. Daria, kudos to you and to your amazing colleagues. Thanks for being with us.
I will tell you today about two cases. The first one is fine-tuning of an ASR model. ASR stands for automatic speech recognition, and the second case is the implementation of an internal HR chatbot to assist BCC employees. We will start with the fine-tuning. First, I'll explain the reasons why the bank decided to create a custom model. Then I will show you the schema and describe the process of how fine-tuning was actually performed. After that, I will show you the results and some quantitative metrics. There are generally two reasons why the bank decided to create a custom model.
The majority of the population in Kazakhstan speaks the Kazakh language. There is also a significant part of the population that speaks Russian, but there is a third option as well, the so-called mixed language, when a person speaks Kazakh and Russian at the same time, like two words in Kazakh and one word in Russian. You can imagine that there are models on the market, ASR models, which can process the Russian language quite well. Some of them can also work with the Kazakh language. However, there is no model on the market that can work with the mixed language. The first reason for the bank was to create this model specifically for the mixed language.
BCC stores real call center recordings at eight kilohertz, which is quite good if you want to save storage. However, most ASR models on the market, whether commercial or open source, are trained on datasets recorded at sixteen kilohertz. You can imagine that a model trained at sixteen kilohertz does not perform well on an eight kilohertz dataset.
Furthermore, Kazakh is a low-resource language, meaning that if you search the web for Kazakh language training datasets for ASR models, you will find just two datasets, both at 16 kilohertz. They are not particularly rich in terms of vocabulary and features. So Bank CenterCredit decided to create its own custom dataset with 8 kilohertz and enriched vocabulary based on real call center data recordings. That's the second reason for this approach.
Let me guide you through the process. Initially, all call center recordings were stored in a Hadoop cluster on premise in the BCC data center. Each call center recording file was split into two channels—one for the agent and one for the customer. Then each channel was processed by voice activity detection, which removed all noise and silence from the recordings. The fragments containing only voice were split into chunks, each with variable duration ranging from two seconds to twenty seconds, depending on the actual phrase duration.
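The split-and-chunk step described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the talk does not say which voice activity detection the bank used, so a plain energy threshold stands in for a real VAD model, and the function name and parameters are hypothetical.

```python
import numpy as np

def voiced_chunks(samples, sr=8000, frame_ms=30, energy_thresh=0.01,
                  min_s=2.0, max_s=20.0):
    """Split a mono channel into voiced chunks between min_s and max_s
    seconds. An energy gate stands in for a real VAD model here."""
    frame = int(sr * frame_ms / 1000)
    n_frames = len(samples) // frame
    frames = np.asarray(samples)[: n_frames * frame].reshape(n_frames, frame)
    # Per-frame mean energy; frames above the threshold count as voice.
    voiced = (frames ** 2).mean(axis=1) > energy_thresh

    chunks, start = [], None
    for i, v in enumerate(voiced):
        if v and start is None:
            start = i                      # a voiced run begins
        if (not v or i == n_frames - 1) and start is not None:
            end = i + 1 if v else i        # the run ends here
            dur = (end - start) * frame / sr
            if min_s <= dur <= max_s:      # keep 2 s - 20 s phrases only
                chunks.append(samples[start * frame : end * frame])
            start = None
    return chunks
```

With a real recording, noise and silence fall below the threshold and are dropped, leaving variable-length phrase chunks ready for transcription.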
Each chunk was then processed by a speech-to-text model, the previous version that the bank had at that time. This provided a text representation of the audio file. So now we had both the audio recording and the text. Both data formats were transferred from the Hadoop cluster to the Outposts. An important question is why the bank didn't run the entire pipeline on the Outposts. The reason is that the BCC Outposts configuration doesn't have GPUs. While we can provide Outposts with GPUs, that wasn't the case for BCC's Outposts. Speech-to-text models require GPUs, and the bank has a certain number of GPUs that is sufficient for medium-size models and inference, but not enough for fine-tuning.
So the bank used GPU power to run the previous generation speech-to-text model to generate text, then transferred everything to the Outposts—both audio and text. The text files were then processed with a Named Entity Recognition model to identify all personal information, sensitive data, and confidential data. This was a hard requirement from internal security to ensure that none of this data would leave the security perimeter of the bank. All chunks and files where such data were identified were removed from the training dataset, including both text files and audio files.
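The anonymization filter described above can be sketched like this. The talk does not name the NER model the bank used, so `find_sensitive_entities` below is a hypothetical regex placeholder; a real deployment would call a trained Named Entity Recognition model instead.

```python
import re

# Placeholder for the bank's NER model: flags anything that looks like an
# ID number, card number, or phone number. These patterns are illustrative
# assumptions, not the bank's actual detection rules.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{12}\b"),           # 12-digit national-ID-style numbers
    re.compile(r"\b(?:\d[ -]?){16}\b"),  # card-number-style sequences
    re.compile(r"\+?\d[\d -]{9,}\d"),    # phone-number-style sequences
]

def find_sensitive_entities(text: str) -> list[str]:
    return [m.group() for p in SENSITIVE_PATTERNS for m in p.finditer(text)]

def filter_training_pairs(pairs):
    """Drop any (audio_path, transcript) pair whose transcript contains
    sensitive entities, so such data never leaves the security perimeter."""
    return [(audio, text) for audio, text in pairs
            if not find_sensitive_entities(text)]
```

Note that the whole pair is removed, audio and text together, matching the bank's approach of discarding flagged chunks entirely rather than redacting them.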
Now the bank had audio files and text files that were both encrypted using XKS internal keys and cleaned of all personal and sensitive data. These were converted into a special format called mel spectrogram. If you've never heard this term before, don't worry—I'll explain it on the next slide. The mel spectrograms were transferred from S3 on the Outposts to S3 in the region, again encrypted with external keys, without sensitive data, and fully compliant with all regulations. Mel spectrograms are not audio and not text—they're something very specific that I'll explain.
These mel spectrograms served as the training dataset for the fine-tuning job in Amazon SageMaker. This fine-tuning job produced a certain number of model artifacts that were returned back on premise and then hosted for inference using local GPU power. So the bank used cloud GPUs for fine-tuning and then took the model back on premise to host it for inference on premise. That's the process.
A mel spectrogram is literally an image of the audio. Data scientists can generate this format from an audio file by applying a Fourier transformation and then applying a mel scale on top of the Fourier transformations. This gives data scientists a two-dimensional representation of the audio with the x-axis representing time, the y-axis representing frequency, and color representing amplitude. This is a best practice for how data scientists prepare datasets for ASR models, and Bank CenterCredit followed these best practices.
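Under those definitions, a mel spectrogram can be computed with a compact numpy sketch: a short-time Fourier transform, a power spectrum, and a triangular mel filterbank. Production pipelines typically use librosa or torchaudio instead; the FFT size, hop length, and filter count below are illustrative choices, not the bank's actual parameters.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):                 # rising slope
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling slope
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(y, sr=8000, n_fft=512, hop=160, n_mels=40):
    """STFT -> power spectrum -> mel filterbank. The result is the 'image':
    an (n_mels, n_frames) array with time on x and mel frequency on y."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power.T
```

Because the output is a plain numeric array rather than audio or text, it can leave the security perimeter without exposing recoverable speech content.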
As you can see, the results are quite good for all three languages.
Bank CenterCredit achieved significant accuracy improvement across all three languages. For Russian, the improvement was 9%, and for Kazakh, it was 15%. Most importantly, the biggest gain came for the mixed language, the major goal for developing a custom model: a 23% accuracy improvement. Furthermore, this model proved to be more cost efficient than the previous commercial ASR solution that the bank used, allowing the bank to save four million tenge every month. Tenge is the local Kazakh currency. Now this model works as part of the post-call analytics pipeline, and the bank considers this project successful and plans to scale it further in the next year.
Implementing an HR Chatbot with Hybrid RAG and Key Takeaways from Bank CenterCredit's GenAI Journey
Moving to the next case study, the HR-bot. Bank CenterCredit decided to implement an HR-bot for several reasons. The first reason was to improve the quality and velocity of HR responses. The second reason was to stimulate a self-service culture within the company. The third reason was to refocus HR employees toward more strategic initiatives rather than routine duties. Of course, this chatbot needed to be compliant with internal and external regulations, which meant that no bank confidential data should leave the on-premises security perimeter. That is why the bank used a technique called hybrid RAG; RAG stands for retrieval-augmented generation. By using RAG, we supply additional context to the model, providing knowledge that the model does not have out of the box. This knowledge consists of the internal HR policies of the bank.
As you can see from this schema, the HR knowledge base was initially processed by an embedding model using local GPU power, and the resulting vectors were stored in a PostgreSQL database on Outposts, since PostgreSQL can function as a vector database. The same process is applied to user prompts. The chatbot has a user interface, and when a user types in a prompt, it goes to the embedding model, which converts it to a vector. A vector search in PostgreSQL then retrieves additional context from the knowledge base. This context is added to the initial user prompt, and the combined prompt is sent from Outposts to AWS Bedrock in the cloud region. In Bedrock, the Claude Sonnet model processes the prompt with its additional context and returns a response to the chatbot on premises.
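The retrieval flow above can be sketched in miniature. This is an in-memory illustration only: `embed` is a deterministic hash-based stand-in for the bank's local embedding model, a cosine-similarity search stands in for the pgvector search in PostgreSQL, and the resulting augmented prompt is what would be sent to Claude Sonnet via AWS Bedrock.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Stand-in for the bank's local embedding model: a deterministic
    hash-based bag-of-words vector. Real systems use a trained encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        h = int(hashlib.sha256(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_k_context(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Cosine-similarity search, standing in for pgvector in PostgreSQL."""
    q = embed(query)
    scores = [float(q @ embed(d)) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return [docs[i] for i in order[:k]]

def build_augmented_prompt(query: str, docs: list[str]) -> str:
    """Assemble prompt + retrieved context; this is what goes to Bedrock."""
    context = "\n".join(f"- {d}" for d in top_k_context(query, docs))
    return f"Context from HR policies:\n{context}\n\nQuestion: {query}"
```

The key property of the hybrid split is visible here: the knowledge base, the embeddings, and the search all stay on premises; only the assembled prompt crosses to the cloud model.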
Regarding the results for the chatbot, according to the bank's assessment, roughly 70% of their HR requests now go through the HR-bot, which is quite good and definitely allows the bank to reduce the workload of the HR team. Even more importantly, the bank collected feedback from employees about their satisfaction with the chatbot, and the feedback has been mostly positive.
Now we would like to give you the key takeaways from our session. First, regarding the platform, we built a secure hybrid multi-cloud platform that complies with legal requirements and is tailored for the bank's business needs. Second, we implemented a flexible, scalable, and cost-effective architecture that enables an innovative approach and high competitiveness. Regarding the GenAI process, the bank implemented a fine-tuning process for the generative AI model, specifically for the ASR model. This process relies on the hybrid architecture with AWS Outposts as a key component and is fully compliant with internal and external regulations. The bank plans to reuse the same approach for every other GenAI task, leveraging cloud GPU resources for fine-tuning and then using the actual model on premises.
Regarding the HR-bot, the bank created a successful example demonstrating that GenAI can automate routine tasks. It has set a template for similar projects, and the bank plans to implement the same approach in different functional areas, such as IT support or procurement. That concludes our presentation for today.
; This article is entirely auto-generated using Amazon Bedrock.