Kazuya


AWS re:Invent 2025 - LSEG: Transforming market intelligence at massive scale (IND3305)

🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - LSEG: Transforming market intelligence at massive scale (IND3305)

In this video, LSEG and AWS present their transformative journey in evolving market intelligence at massive scale. Jason West and Ed Johnson detail how LSEG processes 274 billion messages daily from 575 venues across 170 countries, managing 90 million+ market instruments and 75 petabytes of historical tick data. The session focuses on the Matrix Signals Platform, a serverless solution that delivers a 5x cost reduction and reduces signal detection time from 3 days to 30-45 minutes. Ed explains the architecture using AWS Glue, Athena, OpenSearch, and Bedrock to transform trillions of ticks into actionable insights with zero data copy. The platform enables real-time hotspot detection, correlates market volatility with Reuters news using Gen AI, and supports scaling to an anticipated 50 million messages per second by 2030. Key achievements include 97% storage cost reduction, 82% infrastructure cost savings, and API-based access enabling faster customer onboarding and capacity management across their global distribution network.


; This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

Thumbnail 0

Thumbnail 30

Introduction: Capital Markets Ecosystem and the Challenge of Processing 274 Billion Daily Messages

Good afternoon everyone. My name is Rohit Singh. I'm a principal solutions architect for financial services at AWS. I'm delighted to be joined by Jason West, Group Director Real Time, and Ed Johnson, Real-Time Architect, and we'll be talking about LSEG's transformative journey and how they are evolving market intelligence at massive scale. By massive, I mean they're working with some really big numbers. We'll start the agenda with a quick look at the capital markets ecosystem and the challenges it faces: a system that is always running, where we will soon see exchanges operating 24/7, which will further increase the volume of market data produced.

Then Jason will take the stage to share LSEG's partnership journey with AWS: the key business drivers, strategic decisions, and impressive results they've achieved working with AWS. After that, Ed will take the stage to dive deep into the Matrix Signals Platform, the technical backbone that enables sophisticated market intelligence at scale. Finally, we'll wrap up with the key outcomes and takeaways, things that you can apply within your own organizations. It's going to be a really exciting journey through innovation at scale.

Thumbnail 100

By the end of the session, you will learn how LSEG has designed a serverless solution that operates at scale. We're talking about some really big numbers: 274 billion messages processed and delivered globally, 75 petabytes of historical tick data, and 90 million plus market instruments. The focus of the session is the Matrix Signals Platform. Ed will walk us through how LSEG designed the serverless solution to work at extreme scales, with no more headaches from traditional infrastructure or from managing its scaling.

Thumbnail 160

Thumbnail 170

This solution scales seamlessly, and it cuts infrastructure costs by up to 5 times. It's a solution that also offers flexibility as market conditions change, as we're seeing right now, and it leverages Gen AI through Amazon Bedrock to provide market insights. We are dealing with hundreds of exchanges, each producing massive amounts of data around the clock, and then we have aggregators like London Stock Exchange Group. They have the challenging task of getting all this data, processing it, and distributing it around the globe. And then we have the consumers, the banks and hedge funds who need this data to make trading decisions.

Thumbnail 210

Thumbnail 220

Thumbnail 230

Just think about the scale: 575 distinct sources, each with their own format, protocol, and timing requirements. In the grand scheme of things, this represents trillions of dollars of market trading every day. When the system runs smoothly, the global markets operate efficiently. Any delays or issues with the data can result in lost opportunities worth millions. So the stakes are absolutely high, and the performance requirements are very stringent. Each layer represents its own set of challenges. Producers need ultra-low latency and high throughput while managing geographical constraints and regulatory requirements, especially across regions.

Thumbnail 250

Then we have aggregators like LSEG. They have to collect all this diverse data, process it, normalize it, and distribute it globally to clients who rely on it for making critical trading decisions. Consumers require secure access and the ability to process and analyze data at petabyte scale, all delivered in a cost-effective manner. Plus, there's also a need for more flexibility, as we've seen that market conditions can always change. This challenge grows significantly as market data volumes increase year on year, and traditional infrastructure just can't keep up.

Thumbnail 280

So, diving deeper into the challenges faced by market data producers, the complexity scales rapidly and the challenges multiply exponentially. They must cope with market volatility, which can spike market data volumes by 10 to 50 times, especially during major market events.

Thumbnail 320

Thumbnail 340

Everything has to be cost efficient and reliably delivered to all their clients, and AWS services like Amazon S3 and AWS PrivateLink can help with that. S3 provides petabyte-scale storage and integrates with other AWS services, so you can perform analytics while the data sits in S3. PrivateLink offers high-performance, secure connectivity for your clients globally. Now that I've given you an overview of the incredible complexity of the capital markets ecosystem, I'll invite Jason to share the key business challenges that drove this transformation and explain why LSEG chose AWS as their technology partner. The business context Jason will provide is crucial, because this was not just about technology modernization. It was about providing new business capabilities, improving client experience, and positioning LSEG for continued leadership in the market aggregation and distribution business. Over to you, Jason.

Thumbnail 390

LSEG's Global Reach: 300 Years of Innovation Meeting Modern Market Volatility

Thank you, Rohit, I appreciate it. I'm Jason West, and I look after the real-time business at the London Stock Exchange Group. For those of you who don't know, we're one of the leading global financial market infrastructure companies in the world, delivering data to 4.4 million clients around the world in 170 countries. We have 26,000 employees around the world and 300 years of innovation. Our best innovation was in 1850 when we sent the first message from the London Stock Exchange by carrier pigeon. Now we're all about Gen AI and cool stuff, so we're progressing as our clients are progressing and as our technology partners are progressing.

Thumbnail 420

This is the makeup of the London Stock Exchange Group today. There are multiple divisions. Starting at the top left, LSEG Data & Analytics is where I sit and where the real-time business sits. To the right of that, you have LSEG FX, which provides FX markets and trading direct to brokers and via our workspace platform. Then there's Post Trade, which is all about our clearing business. Risk Intelligence keeps us safe and ensures that our clients can work within proper regulatory frameworks with good data. Our FTSE Russell business is all about indexes, a huge and well-supported business around the world with 44,000 clients today. And ultimately, we're a stock exchange as well, covering everything from ingestion and execution to trade input and clearing.

Thumbnail 490

If we drill down to the real-time business in numbers, we're present in 170 countries today. We transmit 274 billion messages a day around the world, ingesting from 575 venues and exchanges around the world, some cloud-based and some traditional venues and exchanges. We ingest that data in 4 milliseconds, normalize it, and put it onto our network so that we can place the strike price, the high price, the low price, whatever the venue, in exactly the same place for our clients. Since 1996, we've been storing every trade and every tick from every single venue. We're the only company to do that, and it's currently at 75 petabytes. A few years ago, we made the decision that this data needs to be hosted in the cloud, so that data now sits within AWS off our premises, and we're writing 5 to 10 terabytes of data to it every single day, so we need the elasticity to grow that.

Thumbnail 590

As Rohit mentioned, 90 million-plus single instruments from all these venues are transmitted around the world daily. We're taking on or upgrading 150 to 200 new venues around the world as part of our aggregation within Data & Analytics. We have 34 collection and distribution points of presence around the world where we collect our data from the venues and exchanges, normalize it, and then distribute it onto our network. Over the last couple of years, we've seen some pretty big trends in the unprecedented scale of modern financial markets. COVID was one, and the markets sort of flatlined a bit. The Ukraine and Russia conflict saw a 40 percent increase in data rates that did not go back down, and still hasn't.

If we look at US legislation and tariff wars, that pushed us up to nearly 20 million messages a second on our backbone. We have the capacity for it, with 20 to 25% headroom within our environment where we collect and distribute that data. But this is just getting bigger. We are constantly accessing our Tick History store and our PCAP store, plus we have to get these messages around the world to our clients with low latency, high throughput, no jitter, and delivered on time. We are being relied on to do this. We expect this to grow as more and more venues come online and more and more of our clients work with GenAI, with data scientists and data engineers analyzing and extracting this data, trying to find alpha, trying to trade smarter, trying to beat other people into the markets.

Thumbnail 690

So we had to come up with a plan to deal with this. We have scaled our core network, infrastructure, and backbone, plus we are storing all this data. The reason we store it is that our clients do not want to. If you look at infrastructure costs today, for those in that game, you do not want to throw horsepower at this problem. You need to be able to scale, both up and down.

Business Challenges Driving Transformation: From Legacy Infrastructure to Cloud-Native Solutions

Addressing the business challenge: legacy on-premise infrastructure is a problem for most people. You either have to replace it every 3 to 5 years when it is amortized, or you have to ship more of it into your data centers, and your colos cost you more money. At LSEG, we took the decision to move our historical data out and shut down that environment, which is great for us. That moves us from a CapEx model to an OpEx model, which is easier to maintain.

Exponential data growth is something we see daily. It is just growing and growing, and we do not know where it is going to end up. We are at 20 million messages a second today, and we are predicting that by 2029 that is going to be 50 million messages a second. That is going to be hard to cope with. Customer demand for real-time insights is another challenge. In a previous role, I worked at a tier one investment bank. Our head of the FX desk came to me one day and said, "I want to be in this venue." I said, "Great, give me half a million dollars and 9 months and I will build it." That is not good enough these days. You have to be able to get your clients, your partners, and your desks into the market as fast as possible.

Cost pressures in competitive markets are significant. Everyone is looking at margins. I will be honest, I think our data is the best data in the world, but I do face competitive pressure on it. So I have to make it cost effective for me and for our clients. We also need innovation velocity. We have developers that code and execute during the day, releasing 5 to 10 times a day. We have to make those releases right for those markets.

For the transformation journey, we started with historical data migration. Tick History PCAP originally stored all this data as CSV files, huge amounts of data. We have now changed that to Parquet format, which is easier to interrogate and takes up far less storage, helping us with our costs and allowing our clients to get to that data quicker. Real-time platform development is critical. In our 34 locations globally, we have to be able to scale when the markets move, and we do. We have a very good capacity forecasting team with 20 to 25% headroom in all of these places. But even as the person that runs real time today, I have to look at the next generation and figure out how I am going to be able to scale 30, 40, 50% when it is required.

Thumbnail 850

We are also working on the Matrix Signals Platform and large scale ETL. This is something that Ed will be covering shortly.

Thumbnail 880

Strategic Business Outcomes: Geographic Expansion and Accelerated Innovation Cycles

If we look at cloud native data feeds across the latency spectrum, the Data & Feeds business within LSEG delivers this front to back, from ultra-low latency all the way up to quant data solutions and price and reference data sets, with historical data at one end and low latency at the other. As you can see along the bottom, these are some of the AWS products we are using today to help facilitate our move as we grow into the markets.

For the business outcome, geographic expansion is huge for us. Our clients expect us to meet them wherever they are. The push for us now is to deliver in some countries around the world, especially in regulated countries where the data has to live and reside in that country and has to be operated in that country.

We work with multiple partners including regulators, central banks, and cloud providers. Faster onboarding is one of our key benefits. For some of our products, it used to take us weeks to onboard clients. Now we can get a client up and running in 2 minutes with an email for our Real-Time Optimized product that comes out of AWS, with 3 updates a second. Trade-safe data is another feature. People doing low-latency trading won't particularly use that, but wealth funds and hedge funds are happy to use that type of data.

On-demand capacity for scale is critical for us. When the markets moved in April this year, we had lots of clients phone us up and say, can you please conflate this data? Our applications are choking. We can't deal with it. So within our distribution PoPs, for each one of those clients, we reduced the update rate from full tick to maybe 4 or 5 updates a second. When the market stabilized, we switched it back up to full tick. That gives us the ability to do that within our public and private clouds.

Thumbnail 990

Accelerated innovation cycles are essential as well. We've all seen the craze of GenAI, faster chips, Nvidia. Everyone's striving to get into these markets quicker and do smarter things with data. We've got to be smart as well to help our clients do what they need to do. So we work tirelessly with our technical teams, our architects, and our developers to make sure we're ahead of the market.

Thumbnail 1030

Technical Deep Dive: Understanding LSEG's Real-Time Platform as a Global Message Bus

So what I'd like to do now for the main event is hand over to Ed Johnson, one of our real-time architects. Thanks, Jason. I'm Ed. I'm a real-time architect. Today I'm going to be taking everyone in the room through a deep dive of the real-time Matrix Signals Platform, both from an architecture perspective and from some of the technology lessons learned that we've faced. That's hopefully going to help others learn some lessons and use our data at scale.

Thumbnail 1060

As an initial start, it helps us to understand a little bit about what LSEG's real-time platform is from a technical perspective before we get into the specifics around our Matrix Signals environment. For the non-market data experts in the room, and there will be a few of you I'm sure, LSEG Real-Time is a huge globally distributed, stateful, first in, first out message bus.

That message bus sits not only across one stream of data but across every single market instrument that we transmit in the world. That's not just one stream, but 90 million of them. There's an example on the slide there. You can see that's our RIC, our Reuters Instrument Code for Nvidia, which as many of you will know is one of the most liquid and most traded instruments in the United States market. That represents one of our streams of data. But we have a first in, first out, low latency queue that delivers that data from our collections estate through our core infrastructure and out to our distribution locations to our customers in the fastest way possible.

As part of that service, we also add value-add and analytics data into the stream to make that data available natively to customers. We're also normalizing that data: whichever venue in the world has provided that price, we'll put it into our LSEG format, so it's then the same API and the same data structure no matter where in the world you're receiving that data from. The challenge for us today, and what we're going to deep dive into, is being able to understand how these 90 million streams overlap, detect hotspots at scale, work out where in our infrastructure they overlap, and respond to that in the market in a much faster, more optimized way.

As part of this transformation, we've also delivered more than a 5 times reduction in the cost of ownership of detecting these signals in the market. A brief analogy for everybody, just to take it out of the technical realm: it's quite straightforward if you think of LSEG Real-Time and our feed as a river of data. Imagine each stream sits within that river. All 575 global stock exchanges, trading venues, and other contributors feed almost as tributaries into that river of data. Our customers sit alongside on the banks, observing that stream as it comes past.

At the bottom of the river, and Jason mentioned it earlier, we have basically what you could think of as a lake, which is our Tick History product, and that provides the captured version of that stream of data right back to 1996 with 75 petabytes plus of data in that S3.

This data is stored specifically in Parquet format, which might be useful for some in the room, and we'll come back to that analogy as we discuss. LSEG Real-Time is the river, and our Tick History product is the lake at the end of it. I mentioned earlier that this estate was globally distributed.

Thumbnail 1230

This is an overview of our collections, core, and distribution infrastructure, both in LSEG's private cloud environments that sit in our data centers around the world, and also in some AWS distribution locations where we serve the real-time optimized product from. Our global estate spans many sites, and all of that is connected via our world-leading software-defined network. This collections and distribution infrastructure serves content from pretty much any source to pretty much any destination. You can consume data from the United States, in Asia, and vice versa.

We used to have a global hardware footprint that we used to detect signals, hotspots, and insights across this environment, which allowed us to respond to market volatility. The cost of that signals detection infrastructure, which was designed for capacity management and other types of hotspot detection, was ballooning and has continued to increase with the rising data rates that Jason was talking about earlier. We needed to find a way to futureproof that solution, leverage some more modern tooling, and transform our product offering as part of that.

The Scale Problem: Managing 90 Million Instrument Streams with First-In, First-Out Precision

This hotspot detection is so important for us as an organization. Delivering data in a low-latency, low-jitter, accurate, timely fashion is our business; it's what we do. It's really important that we detect any hotspot before it becomes a problem. Returning to the streams of data I was describing, it's helpful to distill the problem statements slightly so we can deep dive into the architecture in a moment.

Thumbnail 1330

Thumbnail 1340

First, we're going to be forecasting and optimizing for capacity management. That means detecting hotspots and implementing performance optimizations where components of this environment receive more of these data streams. Each bit of our infrastructure serves some of them, so working out where the hotspots are is super important.

Thumbnail 1370

Thumbnail 1380

We wanted to mitigate increasing infrastructure costs. Reducing the cost of ownership of our on-premise packet capture-based signal detection systems was a key deliverable as part of this. We wanted to enable our operational teams and our leaders to accelerate the speed at which they could make decisions. As we saw the markets scale very quickly, our signals detection environment was slower to respond in terms of signals, and we needed a way to speed that up and provide the insights to our operational teams and our leadership teams, enabling us to respond quicker in times of extreme market volatility.

Thumbnail 1410

Let's take one of those streams. Nvidia is the example here; we're carrying that one on from earlier. There's an example here of what that data actually looks like in our normalized data format. You can see there are seven messages on the screen, just mocked-up examples, and each of these is what we would call a tick in the parlance. Each tick here contains an update to a stock price; that's fundamentally what the data is showing. There are some trades as well, so trade prices and volumes that we send as part of that data stream.

This is just a one-second period for this instrument, and this is just a mocked-up example. We'll have many instruments in our environment that send these ticks over 10,000 times in a single second. If you scale that across the 90 million instruments we have, you can immediately see where the problem of scale comes in and the amount of work we have to do to make sure that this environment is super performant and super optimized.

In addition to the data you see on the screen, there's also value-add data and analytics that we will package up with the feed, and it's all sitting in the same format no matter the trading venue, as I mentioned earlier. One thing to note from a market data perspective is that this is only meaningful if the data is ordered; it has to arrive in the right order, otherwise the stock price is incorrect. That's where the first-in, first-out requirement comes from.
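To make the shape of one of these streams a little more concrete, here is a minimal Python mock of a tick and a one-second slice of a stream. The field names (ric, msg_type, and so on) are hypothetical stand-ins for illustration, not LSEG's actual normalized field identifiers.

```python
# Illustrative only: a minimal mock of what one normalized "tick" might carry,
# using hypothetical field names rather than LSEG's real data dictionary.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class Tick:
    ric: str                      # instrument identifier, e.g. the Reuters Instrument Code
    timestamp: datetime           # exchange or capture timestamp
    msg_type: str                 # "QUOTE" for a price update, "TRADE" for an executed trade
    price: float                  # quoted or traded price
    volume: Optional[int] = None  # traded volume, present only on trade messages

# A one-second slice for a single instrument is simply an ordered list of ticks;
# ordering matters, which is why the feed has to be first-in, first-out.
second_of_ticks = [
    Tick("NVDA.O", datetime(2025, 12, 1, 14, 30, 0, 120_000, tzinfo=timezone.utc), "QUOTE", 181.42),
    Tick("NVDA.O", datetime(2025, 12, 1, 14, 30, 0, 480_000, tzinfo=timezone.utc), "TRADE", 181.43, 250),
]
```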

Thumbnail 1510

Matrix Signals Platform Architecture: Zero-Copy Analysis of 75 Petabytes in Amazon S3

Let me dive into the architecture. You can see our LSEG Real-Time collections, core, and distribution infrastructure, which forms our low-latency on-premise environment for full tick data. The LSEG Tick History product is hosted in Amazon S3 with 75 petabytes of data. The change we're making as part of this architecture is turning the data we see in the LSEG Real-Time feed into the insights we have in Tick History, performing the analysis on our Tick History optimized data set in the cloud environment, and then reflecting that back into our on-premise environments and across our global distribution infrastructure so we can respond quickly to hotspots.

Thumbnail 1580

The solution takes data captures from Tick History and aggregates that data into a highly optimized time series format. We then extract the parameters and insights we need to detect hotspots in the environment. We started this project without the Glue components and without the OpenSearch components, to demonstrate what we could achieve as part of this build. We started by looking at the data with Amazon Athena, and we chose that as our first base because it's a super low-cost, light-touch solution where we could inspect the data that sits in the Tick History data set natively in Parquet with completely zero copy.

Getting this approach rolling and demonstrating that the insight was there in this dataset and that we could reflect the signals straight back into the environment quickly enough was super valuable. We did that in a few months as part of a proof of concept with no deployment of infrastructure. From the proof of concept with Athena, we then added the Glue layer on top. That Glue ETL layer is a super optimized pre-aggregation step that generates all of the time series we can derive along with some other parameters and reference information that sits on the back of the tick-by-tick view in the S3 bucket.
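To make the zero-copy idea more concrete, here is a minimal sketch of how such a proof of concept could query the Parquet tick data in place with Amazon Athena via boto3. The database, table, bucket, and column names are hypothetical placeholders, not LSEG's actual catalog.

```python
# A minimal sketch: querying Parquet tick data in place with Amazon Athena,
# so nothing is copied out of S3. All names here are illustrative placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

SQL = """
SELECT ric,
       date_trunc('second', event_time) AS second_bucket,
       count(*)                         AS messages
FROM tick_history.ticks            -- hypothetical Glue Data Catalog table over the Parquet files
WHERE trade_date = DATE '2025-12-01'
GROUP BY 1, 2
ORDER BY messages DESC
LIMIT 100
"""

def run_query(sql: str) -> str:
    """Start an Athena query and block until it finishes, returning the execution id."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "tick_history"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )["QueryExecutionId"]
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return qid
        time.sleep(2)

run_query(SQL)
```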

We then index and embed that data as part of our Amazon OpenSearch pipelines, which makes it available to internal use cases and other parts of the organization. This is all tied together with a series of Lambda functions, EventBridge, and other components that many in the room will be familiar with. The signals from this come back into our environment, and we use them to respond quickly to hotspots.

Thumbnail 1680

The tick-by-tick data sits in the Amazon S3 LSEG Tick History product set. We're querying that natively with Amazon S3 access points, so we're not needing to copy that data. This isn't just stuff we do internally. We provide this data in a zero-copy fashion via S3 access points with our S3 Direct product from Tick History.

We're taking that tick-by-tick view and then generating that optimized time series version of the data. Each of the instruments we have in our environment is aggregated into a one-second time series with 100-millisecond burst detection across every single market instrument. We're pre-aggregating into that data set. From that point on, we're able to use a flexible modeling layer with Amazon Bedrock and Athena, which can be multi-tenanted, so we can turn those billions of data points in that optimized data set into insights and signals that we can model very flexibly with that end query layer.
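The following is a simplified sketch of the kind of pre-aggregation described here, written as a plain PySpark job of the sort that could run on AWS Glue. The input path, column names, and output location are assumptions for illustration, not the production ETL.

```python
# Sketch of a pre-aggregation step: bucket ticks into 1-second windows per instrument
# and record the busiest 100 ms sub-window, assuming a simplified (ric, event_time) schema.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tick-preaggregation-sketch").getOrCreate()

ticks = spark.read.parquet("s3://example-tick-history/2025/12/01/")  # placeholder path

bucketed = (
    ticks
    .withColumn("second_bucket", F.date_trunc("second", F.col("event_time")))
    # index 0-9 of the 100 ms sub-window within the second
    .withColumn("ms_window", F.floor(F.col("event_time").cast("double") * 10) % 10)
)

# Per instrument and second: total messages plus the size of the largest 100 ms burst.
per_window = bucketed.groupBy("ric", "second_bucket", "ms_window").count()
time_series = per_window.groupBy("ric", "second_bucket").agg(
    F.sum("count").alias("total_messages"),
    F.max("count").alias("max_100ms_burst"),
)

time_series.write.mode("overwrite").parquet("s3://example-signals/optimized-timeseries/")
```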

Thumbnail 1710

Thumbnail 1780

Three Architectural Dimensions: From Trillions of Ticks to Actionable Signals Using AWS Glue and Athena

To walk through an example, there are seven ticks mocked up at the top. Those seven ticks feed through into the total messages parameter for that second, and then we might calculate an instrument rank as part of a model that says, find me the top instruments being consumed in this part of Asia, for example. I'm going to take us through three specific dimensions of this architecture that bring out more detail and hopefully provide some lessons learned. The first dimension is transforming trillions of ticks, the tick-by-tick store you see in the Tick History data set, into billions of optimized data points. We're using S3 access points and Glue services to do that. This is zero copy, and the reason we introduced Glue slightly later was so we had more dials to performance-tune the ETL phase and get the most value out of it.

We have flexible scaling with AWS Glue, so we scale up and down depending on whether it's the weekend, daytime, or a busy part of the market like US market open. This helps us with cost optimization. We're also joining this information at this stage with reference information and other parameters that help us support and aggregate against other parts of the environment. As part of this, we've delivered zero additional storage cost of ownership, because we're no longer maintaining any extra storage as part of the solution.
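As an illustration of that kind of time-of-day scaling, here is a hedged boto3 sketch that picks a Glue worker count based on the clock before starting a run. The job name, worker type, and numbers are placeholders rather than LSEG's actual configuration.

```python
# Hedged sketch: vary Glue capacity by time of day, e.g. scale up around
# US market open and down at weekends. All values are illustrative.
import boto3
from datetime import datetime, timezone

glue = boto3.client("glue", region_name="us-east-1")

def workers_for(now: datetime) -> int:
    if now.weekday() >= 5:        # weekend: minimal capacity
        return 2
    if 13 <= now.hour <= 15:      # roughly around US market open in UTC: scale up
        return 40
    return 10                     # default daytime capacity

glue.start_job_run(
    JobName="tick-preaggregation",   # hypothetical job name
    WorkerType="G.1X",
    NumberOfWorkers=workers_for(datetime.now(timezone.utc)),
)
```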

Thumbnail 1850

The second dimension involves creating models from those billions of optimized data points to very specific signals that respond to the questions we want to ask of the data. We've retained the flexible backbone that we saw initially as part of the proof of concept because we didn't know in advance the set of questions we'd want to ask of this dataset. It's a neat way to futureproof the solution and provide capability to future use cases as well as the ones we know about today.

The Athena layer automatically detects when new data is available and runs all of the models against that new dataset. We can ask any question of the data with this layer of aggregation. Some examples include pinpointing all 100 millisecond hotspots in the US region at market open, finding the most liquid instruments for today compared to yesterday, or analyzing compute requirements and performance requirements for a given instrument watchlist for one of our tier one bank customers.
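As an example of what one of those flexible models might look like, here is a hypothetical Athena SQL query (shown as a Python string) that ranks today's most liquid instruments against yesterday. The schema, table, and column names are illustrative assumptions built on the pre-aggregated time series sketched earlier.

```python
# One illustrative "model" in the flexible Athena layer: rank today's most active
# instruments and compare against yesterday. Table and column names are hypothetical.
LIQUIDITY_RANK_SQL = """
WITH daily AS (
    SELECT ric,
           trade_date,
           sum(total_messages) AS messages
    FROM signals.optimized_timeseries          -- hypothetical pre-aggregated table
    WHERE trade_date IN (current_date, date_add('day', -1, current_date))
    GROUP BY ric, trade_date
)
SELECT t.ric,
       t.messages AS today_messages,
       y.messages AS yesterday_messages,
       rank() OVER (ORDER BY t.messages DESC) AS today_rank
FROM daily t
LEFT JOIN daily y
  ON t.ric = y.ric
 AND y.trade_date = date_add('day', -1, current_date)
WHERE t.trade_date = current_date
ORDER BY today_rank
LIMIT 50
"""
```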

We structure this with Athena SQL queries, but all of those are multi-tenanted and owned by the end users of this capacity and hotspot detection data. We're not building a very specific pipeline for each individual use case. This keeps costs super low, and we're not paying for dedicated infrastructure for this environment either.

Thumbnail 1940

The third dimension is something you might not have seen linked to Athena before. We're taking the outputs of Athena in Parquet format and auto-indexing them straight away into OpenSearch. This is beneficial for two very specific reasons. It gives us API-based access to that data and means we're able to aggregate in a very specific way and then provide that data as an API natively no matter the use case. We're indexing that data so we can provide quick insights and serve that via an API.
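A rough sketch of that auto-indexing step might look like the following: reading an Athena output written as Parquet and bulk-indexing it into OpenSearch with opensearch-py. The domain endpoint, index name, and S3 path are placeholders, and a production setup would more likely use the managed OpenSearch ingestion pipelines mentioned earlier.

```python
# Rough sketch: read an Athena result set written as Parquet and bulk-index it
# into OpenSearch so it is queryable behind an API. All names are placeholders.
import pandas as pd
from opensearchpy import OpenSearch, helpers

client = OpenSearch(
    hosts=[{"host": "search-example-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
    # authentication (e.g. SigV4 or basic auth) omitted for brevity
)

# Reading directly from S3 requires s3fs/pyarrow to be installed.
hotspots = pd.read_parquet("s3://example-signals/hotspots/dt=2025-12-01/")

actions = (
    {"_index": "hotspots-2025-12-01", "_source": row._asdict()}
    for row in hotspots.itertuples(index=False)
)
helpers.bulk(client, actions)
```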

In addition, we're making that data available in Amazon Bedrock for knowledge-based use cases so we can get the embedding results of the data and detect hotspots but also correlate those hotspots in a GenAI-native way that's futureproofed for the technologies available in Bedrock. By making this available for any use case, we've improved the dataset available for capacity across the organization. It's now API-based integration or GenAI-based integrations, and we've made those hotspots and volatility detection available to a much wider audience within the firm. This means we can start to do FinOps on it and do other things as well.

As a final explanation, you saw on the first slides around volatility that unexpected news in the market drives volatility for us. When we see that, having a quick and easy way to correlate that volatility information with market-leading news that we provide for financial services customers from Reuters is super valuable. We've added a news correlation layer as part of this solution. We're not only saying there was a hotspot, but we're able to also explain why there was a hotspot. That's something only available from a GenAI type summarization and RAG perspective.
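To illustrate the idea of explaining a hotspot with news, here is a hedged boto3 sketch that queries a Bedrock knowledge base. The knowledge base ID, the model ARN, and the assumption that Reuters news content has been ingested into that knowledge base are all placeholders for illustration, not LSEG's actual setup.

```python
# Hedged sketch: ask a Bedrock knowledge base (assumed to hold news content)
# to explain a detected hotspot. IDs and ARNs below are placeholders.
import boto3

bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent.retrieve_and_generate(
    input={"text": "Why did NVDA.O message rates spike at 14:30 UTC on 2025-12-01?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "EXAMPLEKBID",  # hypothetical knowledge base id
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)
print(response["output"]["text"])  # generated explanation grounded in retrieved news
```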

Finally, on multi-tenancy and being able to do this at scale: you might wonder how we're doing that with Athena. It's very straightforward. We're putting it behind Fargate and making those authenticated APIs available in a safe, structured way to different users within the organization. That helps us with that part as well.

Thumbnail 2090

Transformative Results: 97% Storage Cost Reduction and 30-Minute Signal Detection

Let's take signal detection from end to end. We've got that tick-by-tick data, which ultimately flows into our serverless AWS Glue aggregation job. It's pre-aggregated, and we've got that data available for every market instrument, so we can ask any question we want of it. There's an example instrument there that might have 5,000 messages in a given second.

Thumbnail 2120

4,000 of those will have appeared in a 100-millisecond burst within that second. From there, the job will then auto-run our hotspot detection model as part of the Amazon Athena SQL approach. We'll detect unusual bursts in the data flow, and in this case we're looking for hotspots where the proportion of burst messages is significant relative to the total.
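A hypothetical version of that burst-ratio model, expressed as the kind of Athena SQL described here, might look like the following. The table, columns, and the 0.6 threshold are illustrative assumptions; in the 5,000-message example above, 4,000 burst messages give a ratio of 0.8 and would be flagged.

```python
# Sketch of the burst-based hotspot model: flag instrument-seconds where a large
# share of the second's messages landed in a single 100 ms window. Illustrative only.
HOTSPOT_SQL = """
SELECT ric,
       second_bucket,
       total_messages,
       max_100ms_burst,
       CAST(max_100ms_burst AS double) / total_messages AS burst_ratio
FROM signals.optimized_timeseries
WHERE trade_date = current_date
  AND total_messages > 1000                                      -- ignore quiet instruments
  AND CAST(max_100ms_burst AS double) / total_messages > 0.6     -- assumed hotspot threshold
ORDER BY burst_ratio DESC
"""
```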

Thumbnail 2140

Thumbnail 2160

From that point, we'll auto-index the output of all of that: find all the hotspots, then auto-index them to trigger burst-based monitoring via an API. This is a very quick model of detection on new data, followed by auto-indexing, which is straightforward as well. That indexing is then made available via an API-based view, which we then use to take infrastructure action, specific monitoring actions, or burst remediation within that environment.

From end to end, we're able to aggregate this data from a tick-by-tick view in a really efficient, super cost-effective way without copying any data around our environment. We're doing this at a scale where we're touching all 90 million instruments in the environment, which is something we would never have been able to do in the past if we had to ask this question of every single instrument individually.

Thumbnail 2200

Returning to the end-to-end architecture, we've got the ability to take the tick-by-tick data and aggregate and analyze that data with Glue. There's a cost to that, but it's managed. We then model that data through Amazon Athena as part of a flexible querying layer, and then we'll index and embed that as part of our Amazon Bedrock and Fargate delivery mechanisms.

Thumbnail 2230

The purpose of this has been to deliver real-time pattern and anomaly detection at scale. We're doing this as soon as the data's available for us in that S3 bucket. The scale we're talking about is trillions of messages turned into key insights. I'm going to dive slightly deeper into the storage, infrastructure cost, and time-to-signal perspectives.

We were maintaining on-premise hardware for packet capture across this signals environment, which was very expensive to maintain and support. We've reduced the storage costs over that estate to basically negligible, effectively a 97% reduction, by using the data that's already available in S3. The cost of infrastructure has been reduced too: using scalable Glue jobs and some of the enterprise capabilities that exist in our AWS environment, we've delivered about an 82% reduction in infrastructure costs as well.

On top of that, the time to signal we've reduced from 3 days in the worst case, if we're looking for specific data points, right down to a 30 to 45 minute window. This is completely revolutionary for how quickly we can respond to hotspots for our customers. This is all a serverless architecture, so we're not having to maintain OS images and patching and all those other things that are expensive operationally.

Thumbnail 2340

From an operational cost perspective, by leveraging the cloud data approach here, we've been able to reduce our operational expenses too. We're very optimistic that this can scale as we see the number of messages in the market approach 50 million messages per second by 2030.

Thumbnail 2370

We've been enabled by this AWS technology to address a wider range of organizational challenges than just capacity management. We would have been able to address capacity management by investing in the existing approach. What we've actually done in this case, though, is add a load of other benefits by just changing the architecture and using some serverless tooling. We've been able to make the most of the many innovations that have happened in the big data space over the last 10 years, which previously have not been accessible to us from a low-latency platform perspective.

We've improved capacity management, but I'm going to touch on some specifics around the other things we've been able to do. As I mentioned earlier, we're leveraging Bedrock to enrich the hotspot detection with specific market news and insights from Reuters news data.

We've ensured open API-based access to data so that we're not having specific domain experts guarding capacity and hotspot detection information. We're making that open across the organization and enabling many different use cases to respond to hotspots across the environment. We've been able to plot the end-to-end data path for the first time consistently, and the reason that's been hard is we have 90 million distinct streams of data. Working out where those are flowing from is a very large big data problem for us. Being able to plot that path from top to bottom is super helpful, and we've been able to do that across every market instrument for the first time.

Enabling FinTechs to consume market data is really important too. Knowing how to better optimize our environment and put our biggest infrastructure in the locations with the most volatility is super valuable. We've been able to reduce the time for new customer onboarding, which is very important as part of our product experience, because we're able to deliver more accurate, more targeted estimations of bandwidth, capacity requirements, and last-mile connectivity in a few minutes, versus the multiple days it used to take us to get that data out of the old signals systems.

Thumbnail 2520

It's been a good journey, and we've learned a lot. The proof of concept infrastructure and proving this to the organization meant that Jason and others had the confidence to fund it to delivery. We're now in a position where we can scale across the environment to detect signals and hotspots into the future as we see market data volumes increase. Thanks for listening to the deep dive, and I'm going to hand back to Jason who's going to conclude and pass on some next steps.

Thumbnail 2540

Conclusions and Future Vision: Serverless Architecture Enabling Gen AI-Powered Market Intelligence

Cheers, Ed, thank you very much. I appreciate it. As Ed mentioned there, if he'd asked me to do the POC on real live infrastructure in a data center, it would've cost me a fortune. I think it cost me 20 quid, so it was great. I'm joking, it wasn't quite that cheap.

Thumbnail 2550

So conclusions and next steps. LSEG's AWS transformation journey: our tick history data, 75 petabytes and growing by terabytes a day, now sits in a scalable environment that's not on-premise, doesn't cost me CapEx, and we can scale when we need it. It also gives our clients a direct access point to that data. They don't have to come into my environment; they can use their own S3 and connect to it. So that gives all of our client base a single point of access to basically all of our data without them having to store it.

Thumbnail 2580

Real-time platform development: Real-Time Optimized is one of our products that comes out of AWS. It's trade-safe, with three updates per second. We have about 500 clients using that today because they don't need full tick. They're comfortable running their workloads there, or they're taking that data directly into their premises. RTMDS, which is our managed service offering, started being deployed in the public cloud as well last year. So we're managing the infrastructure and the service for our clients while they look after their apps and their workloads.

Thumbnail 2630

With AWS's release of PrivateLink cross-Region connectivity last year, I have six Regions globally. I can now interconnect to any AWS Region using their backbone and infrastructure, without my clients or me having to get involved in procuring telco lines. So that helps us get to market a lot quicker.

Thumbnail 2650

The Matrix Signals Platform is a great invention by Ed and some other smart people in our organization. It allows us to focus on capacity management and scaling. We know where the hotspots are, we know where we need to invest, and we know where we need to scale up. So hopefully we'll continue to build on it and it'll be around for a long time to come.

And then advanced optimizations. Like I alluded to, we're a data company, not a telco provider. Wherever I can get my data to my clients quickly, fairly cheaply, and easily without having to do it myself, I'm going to take that option. So the release of cross-Region PrivateLink last year across the 34 AWS Regions is great for me and my clients on AWS. It works for us and enables us to get our data out quicker and faster. Everyone's happy. So these are some of the transformations we've done in partnership with AWS, and we'll continue to work with these guys and see how we get on. I think I'm handing over to Rohit now, who'll give us the last summary.

Thumbnail 2720

Thank you, Jason and Ed. What an incredible journey, and the technical architecture that Ed took us through is truly impressive, especially the fact that it involves zero data copy and is completely serverless and cloud native by design. This is one of those examples where you build a solution early on, like the Tick History solution, and then keep building on top of it, coming up with new use cases. I think the innovation will carry on.

Coming to the Gen AI side of the equation, the Matrix Signals Platform demonstrates Gen AI's potential for financial services. One example is having an AI system that can correlate market events with Reuters news. Gen AI's ability to relate market movements to when news events happen, and we're talking about major events, is not something a business analyst can spot at this pace and at this scale of data.

Thumbnail 2750

Thumbnail 2770

Additionally, there's the ability to recognize patterns across 90 million plus instruments simultaneously, and to find connections, hotspots, and trends that are not possible for a human to identify. The Matrix Signals Platform is providing actionable operational intelligence to the team so they can prepare themselves much more quickly when those events happen. This is one of those problems that is truly solved by Gen AI: transforming a very large amount of data into actionable intelligence, at a scale and speed that would have been unimaginable just a few years ago with any other system.

Thumbnail 2840

The LSEG team is using Gen AI, and it's not just a nice-to-have. They're actually using it to fundamentally transform the way market intelligence works. These are some of the AWS services and features that will be relevant to our capital markets customers. Some of the things the LSEG team is using include low-latency Outposts and the nanosecond time-stamping capability on EC2, and these solutions continue to come out. We also included a couple of links to the LSEG solutions featured in today's session. I'll pause here for a second if you want to take a picture of those QR codes.

With that, I would like to thank you for attending this session, and to thank LSEG for sharing their incredible journey and what they've managed to achieve at this massive scale. This is an example of visionary leadership, the right cloud platform, and a collaborative partnership approach. The AWS services the LSEG team is using are available and ready for customers to try and to build the next generation of financial solutions. Organizations that embrace these services and innovations will be the market leaders of the future. We'll be around later to answer questions or discuss any specific challenge or opportunity you want to talk about. With that, I'll thank you for your time, and please do provide feedback for the session. Thank you very much.


; This article is entirely auto-generated using Amazon Bedrock.
