Kazuya

AWS re:Invent 2025 - Build resilient and low-latency hybrid telecom infrastructure at scale (HMC328)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - Build resilient and low-latency hybrid telecom infrastructure at scale (HMC328)

In this video, Jose Flores and Juan Buitrago from Liberty Puerto Rico explain how they built a resilient, low-latency hybrid telecom infrastructure using AWS Outposts. They discuss migrating 2 million mobile subscribers from AT&T's legacy systems after acquiring the mobile operation in 2019, implementing a cloud-native charging system with Matrixx that maintains 15-millisecond latency, and achieving high availability through active-passive architecture between Puerto Rico and Miami Local Zone. The solution leverages Liberty's proprietary subsea fiber cables (Arcos and Americas 2), AWS Direct Connect, Transit Gateway, and containerized services on EKS, enabling seamless failover during hurricanes and creating a replicable template for future markets.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Liberty Puerto Rico's Mobile Network Acquisition: Challenges and Cloud-Native BSS Transformation

Good afternoon, everyone. My name is Jose Flores, and I'm the Director of IT Operations at Liberty Puerto Rico. Today we are going to talk about how we build resilient, low-latency hybrid telecom infrastructure at scale. Together with my colleague Juan, we are going to go through how we built this infrastructure in Puerto Rico and cover some of the main challenges that we faced on this journey.

Today we are going to start by introducing Liberty in Latin America, who we are, and where we operate to set the foundation of this presentation. Later we will talk about the challenges we faced after the acquisition of the mobile network in Puerto Rico, both from the business standpoint and at the infrastructure level. After that, we will move to the main topic of this presentation, which is the solution. We will cover the infrastructure that we built, how we achieved high availability and high resiliency in Puerto Rico using AWS Outposts, and how we implemented charging solutions in the Puerto Rico market. Finally, we will talk about the lessons learned and what is next for Puerto Rico.

Liberty Latin America is one of the leading operators in Latin America and the Caribbean. We have a presence in more than 20 consumer markets and more than 30 B2B markets. We have an extensive submarine fiber network connecting over 40 markets with more than 50,000 kilometers of fiber optic cables. We have revenue of 4.4 billion dollars and almost 10,000 employees.

We operate in different countries in Latin America under different brands. In Costa Rica we operate under the name Liberty, and in Puerto Rico as Liberty as well. In the Bahamas we operate as BTC, and in other parts of the Caribbean as Flow. In Panama we operate as Más Móvil. For the B2B market, we use the brand Liberty Business, and Liberty Networks covers all the network fiber across the Caribbean. Liberty Latin America is part of the Liberty Group but became an independent company in 2018, at which point we focused specifically on the Latin American market. In this presentation, we are going to focus specifically on Puerto Rico, our newest mobile operator, and the challenges that we faced there.

In 2019, we acquired the AT&T mobile operation in Puerto Rico. At that time, we were only a fixed operator in Puerto Rico, providing cable and fixed services. We were not a mobile operator. After the acquisition, we operated under a TSA, or Transition Services Agreement, with AT&T. This TSA allowed us to use the AT&T business support systems (BSS) stack to continue operating the mobile platform, but it came at a high cost and had a specific end date, so we needed to move fast. We needed to migrate more than 2 million subscribers from the AT&T IT stack to our own IT stack. However, at that time we only had a BSS stack that was meant for fixed services, not for mobile customers.

That stack was difficult to scale. It was on-premises-based and not cloud native, which presented significant challenges for us when attempting to migrate customers onto it. The system had many customizations, and the time to market was extremely slow. It was not possible to launch new products or services quickly because each required extensive customization. Ultimately, that platform was not the right fit for us because we needed to compete with other carriers in Puerto Rico like AT&T and Claro.

At the same time, we faced a new challenge. We were transitioning from being solely a fixed operator to entering the mobile segment, which required us to create a new charging system. This new system needed to keep latency at no more than 15 milliseconds to connect all calls. For those unfamiliar with the term, charging is the platform that authorizes transactions in real time for data, voice, and SMS. We were looking for a solution that would connect our charging system directly to the EPC (Evolved Packet Core) or 5G core and deliver a better experience for our customers.
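To make the term concrete, here is a minimal Python sketch of a real-time charging decision: a request asks for some units of a service, and the system grants only what the balance covers. Everything in it (the `Account` type, the `PRICE_PER_UNIT_CENTS` table and its prices, the `authorize` function) is hypothetical and chosen for illustration; it does not describe how Matrixx or Liberty's system actually works.

```python
from dataclasses import dataclass

# Hypothetical prices in cents per unit; not taken from the talk.
PRICE_PER_UNIT_CENTS = {"voice_min": 10, "sms": 5, "data_mb": 1}


@dataclass
class Account:
    balance_cents: int


def authorize(account: Account, service: str, requested_units: int) -> int:
    """Grant as many units as the balance covers and deduct their cost."""
    price = PRICE_PER_UNIT_CENTS[service]
    affordable_units = account.balance_cents // price
    granted = min(requested_units, affordable_units)
    account.balance_cents -= granted * price
    return granted


account = Account(balance_cents=100)
sms_granted = authorize(account, "sms", 3)          # fully covered by the balance
data_granted = authorize(account, "data_mb", 200)   # only partly covered
```

In a real network this decision sits in the call or session setup path, which is why the latency target matters: every millisecond spent here is added to the time the subscriber waits.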

Puerto Rico also presented unique challenges. The region has faced many heavy storms and hurricanes. After Hurricane Maria, the island was without power for many months. This required us to think carefully about building a highly resilient system. We used AWS Outposts to implement high availability with two charging systems. The main system is located in Puerto Rico, close to the EPC, and the other is in Miami, where we use an AWS Local Zone. With this architecture, during any catastrophic event like a hurricane, we can maintain high availability and low latency, allowing us to continue operating in Puerto Rico.

Here you can see our full architecture. We start with Salesforce, which is our product catalog where we create all product configurations. These configurations are distributed throughout the entire stack. We also use Evertec as a payment gateway for order provisioning. We use Salesforce as our CRM, where customer care agents can interact with customers, sell services, and resolve network problems. For billing, we use Aria, a cloud platform where we build monthly charges for customers, send invoices, and manage billing. On top of that, we have our OCS (online charging system), which runs Matrixx on AWS Outposts. The OCS is the application that controls interaction with the network. We perform real-time rating of all events, which are then pushed to Aria for correct customer billing. We are leveraging the capabilities available in AWS, such as Apache Kafka, to push transactions.

For example, we use Apache Kafka to push transactions towards Aria as a type of CDR, and we do the same for transactional CDRs, sending all of them to the data lake. We use Athena to query all that information. Basically, this is what we built in Puerto Rico to support all the mobile operations. We migrated all the subscribers from AT&T to the new IT stack, which is fully cloud-based. With that, I will hand over to my colleague, Juan Buitrago.
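The fan-out described here, where one rated event goes both to billing and to the data lake, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the topic names, the event fields, and the `InMemoryProducer` class are all hypothetical, and the producer is a stand-in that only records messages instead of talking to a real Kafka cluster.

```python
import json

BILLING_TOPIC = "billing-cdr"      # hypothetical topic name
DATA_LAKE_TOPIC = "datalake-cdr"   # hypothetical topic name


class InMemoryProducer:
    """Stand-in for a Kafka producer; it only records what would be sent."""

    def __init__(self) -> None:
        self.sent = []  # list of (topic, payload) tuples

    def send(self, topic: str, value: bytes) -> None:
        self.sent.append((topic, value))


def publish_rated_event(producer: InMemoryProducer, event: dict) -> None:
    """Serialize one rated event and fan it out to both consumers."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    producer.send(BILLING_TOPIC, payload)
    producer.send(DATA_LAKE_TOPIC, payload)


producer = InMemoryProducer()
publish_rated_event(
    producer,
    # Made-up event fields, for illustration only.
    {"subscriber_id": "example-001", "service": "data_mb", "units": 85, "charge_cents": 85},
)
```

Publishing the same payload to two topics keeps billing and analytics decoupled: each consumer reads at its own pace, and a slow data lake load does not hold up invoicing.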

Building a Hybrid Telecom Infrastructure with AWS Outposts and Subsea Fiber Networks

Hello everyone. My name is Juan Buitrago, and I'm the Cloud Governance and Engineering Manager for Liberty Latin America. Let's start with the foundation of this high-performing solution. This is a hybrid solution, so the foundation is not the servers, the compute, or other public cloud services. For us, the foundation of the solution is under the sea. We are Liberty Networks, an operating company of Liberty Latin America, and we operate an extensive low-latency and redundant fabric interconnect across the region.

Our network includes the most critical subsea cables in the region that connect the Caribbean with the Americas. This physical infrastructure is the key enabler for our resources with AWS. The core challenge is linking our mobile operations in Puerto Rico with the AWS CloudHub and the Miami Local Zone with the highest possible performance. To do that, we had to treat this connection as an extension of our local network to the AWS facility in Miami.

The whole solution leverages our own subsea cables. For this particular solution, we are using fiber capacity on the Arcos and Americas 2 subsea cables to create a fully redundant solution. In that way, we can guarantee the shortest, lowest-jitter path between on-premises in Puerto Rico and the AWS site in Miami. This proprietary network advantage allows us to implement the first pillar of this hybrid solution, which is AWS Direct Connect.

Let's go deeper into the hybrid architecture. The first component is on-premises. What we have on-premises in Puerto Rico are different data centers in different locations. At each location, we also have core components of the mobile network, including the DRA, the Diameter Routing Agent, and also the STP, the Signal Transfer Point. These core components need to talk with our charging system. For that, we decided to implement our charging system, Matrixx, in AWS Outposts also located in Puerto Rico. This is the active instance of the charging system. This solution is a containerized solution, so we are using EKS, networking services, and storage services. We also decided to include another partner, F5, to manage all the signaling traffic, and MongoDB for the database.

The connection between the core mobile and the charging system uses the local gateway from AWS. From there, we use our local network in Puerto Rico, which gives us around 10 milliseconds of network latency between both components. This is very good for transactions, and the reason is straightforward: this is our own network in Puerto Rico, so that's the easy part.
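A quick back-of-the-envelope check shows why the local network path matters. The sketch below assumes, and the talk does not state this, that the 15 millisecond charging target is end to end and that the roughly 10 milliseconds of network latency counts against it once. The function names and the sample measurements are made up for illustration.

```python
BUDGET_MS = 15.0           # charging latency target quoted in the talk
NETWORK_LATENCY_MS = 10.0  # approximate network latency quoted in the talk


def processing_headroom_ms(budget_ms: float, network_ms: float) -> float:
    """Time left for the charging logic after the network takes its share."""
    return budget_ms - network_ms


def share_within_budget(samples_ms: list, budget_ms: float) -> float:
    """Fraction of measured latencies that meet the budget."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples_ms if s <= budget_ms) / len(samples_ms)


headroom = processing_headroom_ms(BUDGET_MS, NETWORK_LATENCY_MS)
# Made-up end-to-end samples in milliseconds, for illustration only.
share = share_within_budget([12.1, 13.4, 14.9, 16.2], BUDGET_MS)
```

Under those assumptions only about 5 milliseconds remain for the charging logic itself, which is consistent with placing the active instance on Outposts next to the mobile core instead of across a subsea link.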

We have a different situation with the disaster recovery for the application. We decided to put the DR in the Miami Local Zone and implemented essentially the same solution with the same AWS services, a containerized solution, and the same ecosystem of partners. From there, we need to establish replication between the active and passive instances using network connections. For the network connection, we rely on our own subsea cable. On top of that, we have the transport network and also MPLS. Finally, we go through Direct Connect with two 10-gigabit connections to provide bandwidth and a fully redundant path.
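The active-passive arrangement can be pictured as a simple selection rule: serve from Puerto Rico whenever it is healthy, otherwise serve from Miami. The following Python sketch is illustrative only. The `Site` type, the site names, and `select_serving_site` are hypothetical, and a real failover also depends on health checks, data replication state, and traffic steering that the talk does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    healthy: bool


def select_serving_site(active: Site, passive: Site) -> Site:
    """Prefer the active site; fall back to the passive one if it is down."""
    if active.healthy:
        return active
    if passive.healthy:
        return passive
    raise RuntimeError("no healthy charging site available")


miami = Site("miami-local-zone", healthy=True)

# Normal operation: Puerto Rico serves traffic.
normal = select_serving_site(Site("puerto-rico-outposts", healthy=True), miami)
# Hurricane scenario: Puerto Rico is down, so Miami takes over (failover).
during_outage = select_serving_site(Site("puerto-rico-outposts", healthy=False), miami)
# Puerto Rico recovers: traffic returns to it (failback).
after_recovery = select_serving_site(Site("puerto-rico-outposts", healthy=True), miami)
```

Because the rule always prefers the active site, failback needs no separate logic in this sketch: once Puerto Rico reports healthy again, the same function routes traffic back to it.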

The final tier is the Region. At the top, we see the Region, where we consume software-as-a-service solutions from different vendors like Salesforce. We use our payment gateway, BSS components, and CRM. In parallel, we also have internal workloads running in the Region, including the OSS layer, mediation, and other IT applications for the whole ecosystem. We have Tier 1, the active instance of the charging system; Tier 2, the passive instance of the charging system; Tier 3, the Region; and finally connections with on-premises. To connect all of these components, we rely on AWS Transit Gateway, which is the brain that manages all the complexity in routing traffic. We also centralize traffic inspection using AWS Network Firewall.

What we learned from this hybrid architecture is that this infrastructure enables us to operate as a cloud-native telco. We now have a closed stack that serves as a template, so we can replicate this architecture in different markets, new acquisitions, other operations, or other mobile companies in the region. These hybrid services (Outposts, Local Zones, and Regions, together with the full ecosystem of partners) solve the constraints related to low latency, data resiliency, and solution resiliency. During daily operations, we have seamless failover and failback of the application.

One of the main components, as I mentioned, is our charging system. For us, the important thing was to have it close to the network, close to the EPC, and close to the 5G core. We chose cloud-native vendors for the rest of the solution, not only for the OCS but also for our BSS, our CRM, and our payment gateway. This follows a best practice that allows us to go to market within weeks, not months. We selected these vendors because we learned from the past that with legacy systems, heavy customization is not good for our business. It makes us move slowly. For that reason, we selected vendors that follow a cloud-native approach.


This article is entirely auto-generated using Amazon Bedrock.
