🦄 Making great presentations more accessible.
This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.
Overview
📖 AWS re:Invent 2025 - Ripple: Building an intelligent, multi-agent system for 24/7 operations-IND3301
In this video, Ripple's platform team presents how they built an AI-powered operations platform to transform XRPL blockchain monitoring using AWS. The session covers the challenges of analyzing massive log volumes (30-35GB per node) from 900+ decentralized nodes and eliminating dependency on C++ experts. The solution architecture features a multi-agent system using Amazon Bedrock, including orchestrator, code analysis, log analysis, and query generator agents powered by Claude 4.5 Sonnet. Key innovations include GraphRAG with Amazon Neptune Analytics for code-log correlation, Cohere Rerank for improved retrieval, and Model Context Protocol for CloudWatch integration. The demo shows how the system reduced troubleshooting time from 2-3 days to minutes by automatically correlating code and logs to answer complex operational queries about validator proposals and consensus rounds.
This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.
Main Part
Introduction: Ripple's AI-Powered Operations Platform at AWS re:Invent
Hello everyone. Welcome to re:Invent. Thank you for joining our session today. We know that you had a lot of great options out there, so we truly appreciate you being here. My name is Aman Tiwari. I'm a Solutions Architect at AWS. In today's session, we are going to talk about Ripple and how they have built an AI-powered operations platform to transform their platform operations using AWS.
Now this is a 300 level breakout, which means that we are assuming you folks are familiar with core AWS services and concepts. That being said, don't worry. We will take a moment to introduce each AWS service. Now we have a packed agenda, so let's jump right in. Here are the things that we will cover in today's session.
We will start off by talking about what is Ripple, who is Ripple, their journey, their story, and XRP to set the context. From there, we're going to talk about the challenges faced by the team. Next, we will dive deep into the solution architecture, and the Ripple team is also going to share the lessons learned and the journey they took to build it and scale it on AWS.
We also have a demonstration for you folks, so I'm very excited about the real implementation of the workload. And finally, we will close it out by talking about what's next for Ripple. Now with that, I will hand it over to Vijay, who's going to get us started.
Clarifying the Distinction: XRP vs. Ripple and the Company's Enterprise Solutions
Thank you so much, Aman, for the introduction. Aman has been an excellent partner from the AWS side throughout this initiative. I'm really excited to be here to walk you through the journey we've been on at Ripple, building an AI solution for the XRPL operations platform. Coming straight out of Thanksgiving week, particularly when you talk with friends and family, I always get cornered by folks, and the first questions they end up asking are: where should we invest on the crypto side? How high is Ripple going to go? Why is Ripple spiking 5X since Thanksgiving?
I always take a moment to pause, and then I tell them, are you talking about XRP? I think there is always this confusion about Ripple and XRP, where people misunderstand what is Ripple and what is XRP. So I just want to probably set the stage with what is XRP, right? XRP is the native digital asset for the XRPL blockchain, just like BTC for Bitcoin, ETH for Ethereum, and SOL for Solana.
Ripple, on the other hand, is a fintech company. We build enterprise-grade solutions leveraging blockchain to move value faster, cheaper, and more efficiently, and to solve problems for businesses like financial institutions and banks. Our applications range from a cross-border payments solution, Ripple Payments, to Ripple Custody, which is primarily used for storing digital assets in a secure and compliant way. Recently we also launched a stablecoin backed by US dollars, which has attained a billion-dollar market cap.
We use RLUSD and XRP as bridge currencies in some of the solutions that we offer. As you see, we've been operating since 2012. We serve customers in over 90 countries. We have almost 1,100 employees. It's a well-established institution building enterprise-grade solutions for businesses.
XRPL Blockchain: A Decentralized, Energy-Efficient Network Built for Business
Moving on to the next slide, I want to give a bit of an intro about XRPL. XRPL is a layer one blockchain built for business. What do I mean by built for business? I don't know how many of you have seen the 2008 Bitcoin white paper. Its abstract describes Bitcoin as a decentralized peer-to-peer system that solves the double-spend problem, right? It was probably the first of its kind, a decentralized ecosystem that helps solve the double-spend problem.
One of the drawbacks of Bitcoin was that it is very energy-intensive. I think the architects of XRPL figured out that this may not be a scalable approach, so they came up with a far more energy-efficient, proof-of-association decentralized blockchain, XRPL, which has deterministic finality of 3 to 5 seconds. It solves the double-spend problem and can be used for payments and other business use cases. As you see, XRPL is also one of the OG blockchains. It's been around since 2012, and it has its own native DEX, the first of its kind.
XRPL has all the features you need for financial institution use cases like payments and escrow. XRPL has all these capabilities built in as native primitives on the L1 blockchain. Looking at some of the statistics, XRPL is a decentralized ecosystem with almost 900 nodes. This means anybody can become part of the network. All you need is a machine with 32 gigabytes of RAM. You can spin up an instance, download the binary, and you can be part of the XRPL network. These 900 nodes are located throughout the world, starting from Japan and extending to San Francisco, and there are different types of nodes available.
Validators are the core nodes primarily responsible for the consensus layer of the protocol. Looking at some of the metrics, we recently closed 100 million ledgers in the last 13 years, which also demonstrates 3 to 5 seconds finality. One of the biggest advantages of using XRPL is the transaction fees. They are incredibly cheap. It costs 0.0004 cents to send a transaction from one place to another. XRPL is a battle-tested blockchain built for business and is making waves in the digital assets era.
On the previous slides we talked much about the XRPL blockchain, but I want to give you a miniature version of the network. This is a peer-to-peer network, which means there is no central server, no headquarters, and no single machine controlling anything. Instead, it is made up of many different types of nodes. We have validators, which I already mentioned are the brain of the ecosystem and part of the consensus layer. We have hubs, which are primarily responsible for connecting these peers. We have relayers, which can send messages from one machine to another. We have almost 180 independent validators. The scale of the network is robust and resilient, and it is definitely one of the best networks if you want to do any cross-border payments or institutional-grade solutions.
The Challenge: Monitoring a Decentralized Network and the Need for AI-Driven Log Analysis
Being in the Ripple platform team, monitoring the resiliency of the network is one of the challenging problems. For example, in a centralized ecosystem, it is very easy to figure out what is going on when you launch a new feature. You can just go to a database and execute a query before and after the change to easily determine what is happening in the network. However, when you talk about a decentralized ecosystem with 1000 nodes run by many independent operators like universities, blockchain institutions, wallet providers, and financial institutions, it is very hard to identify how the network is behaving.
We in the Ripple platform team are in the trenches all the time making sure XRPL is secure, resilient, and robust. In solving these problems, one of the things we always end up doing is relying too heavily on C++ experts. The whole protocol was written in 2012 and is a huge codebase, all in C++. When an issue happens or when we want to monitor the network, we generally get a huge volume of logs and try to make sense of them by looking at the C++ code. It is so complex that we have to rely on C++ experts to understand the pattern of the issues.
For example, four weeks ago there was a Red Sea cable cut, and some of the node operators in the Asia Pacific were having reliability issues. When we got all those issues raised to us, we asked them to give us the logs. We received all these logs, which are huge in volume, almost 30 to 35 gigabytes for a single node. We took all those huge volumes of logs and did a cross comparison. All these logs are debug logs, and in a peer-to-peer network, you get a lot of signals including cryptographic details, proposals, peer connections, and making a meaningful summary of what is going on is one of the most complicated processes, particularly when you are solving production problems.
It could take almost two days just to understand what is going on. Even in those cases, we have to rely on a C++ expert like the Core Ledger team, an engineering team that does all the protocol development, to get involved and make us a meaningful summary out of it. Logs are the gold mine of information and play a very critical role. We have been doing this forever. One of the things we wanted to change was how AI can solve this problem for us. Why do we have to rely on C++ experts when the whole world is heading towards English as a programming language? We started looking into AI as a solution where we can correlate between the logs and the code and it can give us a meaningful summary out of it.
That's when we started looking at the whole journey of log and code correlation of the solution. This is probably the most lightweight high-level design you're going to see. My partners who are wearing the suits will come with the big guns of a whole deep dive architecture. If you are engineers, you're going to be in for a treat. There is a lot of good stuff that's going to be talked about, but on a very high level, I want to call out these three boxes.
The first is the middle box, the multi-agent platform. This is the brain of this orchestration. This is a foundational layer that we built leveraging AWS stacks. It contains two agents at this point, and we're also working on three more agents, which Hari will walk through in more detail. This is primarily the orchestration layer where the agents work when a user sends a request via the chat interface. They make a code-path decision about whether the request should go to the code analysis pipeline or the log analysis pipeline.
The second half is where the bulk of the data preparation happens. Since the code is an open source repository, we have a periodic code retrieval that runs every day against the GitHub repository. Every commit that goes into the develop branch gets prepared and stored in a graph database so we can make a meaningful summary out of it. Then we have the log processing pipeline. We're talking about huge volumes of logs, almost 35 to 50 gigabytes for a single node, and Ripple operates 40-plus machines on mainnet, which amounts to around 2 to 2.5 terabytes of data. To have all this data ready and prepared, the log processing pipeline runs every day to prepare it for the whole agentic framework to make a meaningful summary out of it.
Technical Deep Dive: Log Processing, Code Analysis, and GraphRAG Architecture
I'm going to call on Hari next to come and walk you all through the detailed architecture. In the forthcoming slides, we will talk about the log processing pipeline, the code processing pipeline, and the graph database capabilities that we have built. First, let's talk about the log processing pipeline which brings the nodes' logs to CloudWatch. Raw logs from validators, hubs, and client handlers are first brought into S3 using a separate workflow which we have orchestrated through GitHub workflows using SSM.
Once the data reaches S3, an S3 event trigger fires a Lambda function, which looks into the file and gets the byte start and end of each chunk, respecting the log line boundaries and the configured chunk size. Once these messages are derived, they are put into SQS for further distributed processing. These messages are then read by the log processor Lambda function. It retrieves only the relevant chunks from S3 based on the chunk metadata that it read, parses the log lines, extracts the metadata, and puts the log lines and metadata into CloudWatch.
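The chunking step above can be sketched roughly as follows. This is an illustrative outline only, not Ripple's actual code; the bucket, queue URL, and chunk size are assumptions, and the processor Lambda is omitted.

```python
# Sketch of an S3-triggered chunker Lambda: compute byte ranges aligned to log-line
# boundaries and enqueue one SQS message per chunk for distributed processing.
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

CHUNK_SIZE = 64 * 1024 * 1024  # assumed 64 MB target chunk size
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/log-chunks"  # placeholder

def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"]["size"]

        start = 0
        while start < size:
            end = min(start + CHUNK_SIZE, size) - 1
            if end < size - 1:
                # Peek past the tentative boundary and extend it to the next newline,
                # so no log line is split across two chunks.
                probe = s3.get_object(
                    Bucket=bucket, Key=key,
                    Range=f"bytes={end}-{min(end + 65536, size - 1)}",
                )["Body"].read()
                newline = probe.find(b"\n")
                if newline != -1:
                    end += newline
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps(
                    {"bucket": bucket, "key": key, "byte_start": start, "byte_end": end}
                ),
            )
            start = end + 1
```

Downstream, the log processor Lambda would read each message, issue a ranged S3 GET for just that byte window, and push the parsed lines to CloudWatch Logs.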
Now let's move on to the code analysis pipeline. Here we have two main repositories: the rippled repository and the standards repository. The rippled repository holds the server software for the XRPL ecosystem, and the standards repository describes the standards and specifications relating to the XRPL ecosystem to ensure interoperability between the XRPL and other applications that are built on top of it. Both repositories feed into our pipeline using Amazon EventBridge, which is a serverless event bus. Using EventBridge scheduler within it, we automatically trigger repository syncs on a cadence.
Once the sync event arrives, the Git repository processor pulls the latest changes from GitHub, including the code and the documentation, versions them, and stores them in S3. Then comes the Knowledge Base ingestion job, the sync job, which pulls this data and puts it into the Amazon Neptune Analytics vector store. The Knowledge Base, by default in the backend, handles the chunking and embeddings and acts as the storage as well. Now let's move on to the next slide to dive a little deeper into the GraphRAG capabilities.
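Before diving deeper, here is a hedged sketch of how such a Knowledge Base sync might be kicked off programmatically through the Bedrock Agents control-plane API; the knowledge base and data source IDs are placeholders, not the real ones.

```python
# Trigger a Knowledge Base ingestion (sync) job after new repository content lands in S3.
import boto3

bedrock_agent = boto3.client("bedrock-agent")

response = bedrock_agent.start_ingestion_job(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    dataSourceId="DS_ID_PLACEHOLDER",
    description="Nightly sync of rippled and standards repositories from S3",
)
print(response["ingestionJob"]["status"])
```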
We are not using the OpenSearch vector store here. We are using GraphRAG because it is really good at storing relationships across the codebase. The files are now stored in S3, and the process starts by reading them: the Knowledge Base sync jobs read these files in and convert them into smaller, manageable chunks. Bedrock allows multiple chunking strategies, including fixed-size chunking and other LLM-based approaches. In our use case, we use fixed-size chunking because it works well with structured content like code and documentation.
Once these chunks are derived, they are sent to the Titan Text Embeddings V2 model, which generates the embeddings that represent the actual semantic meaning of the text. There is also an entity extraction step that happens underneath. It looks into the chunks and derives related entities such as function names, code calls, classes, modules, and other domain-specific identifiers. The Claude model gets all this information, which is then used to build a lexical graph. This lexical graph uncovers the relationships across our codebase, which makes retrieval very fast and very efficient for us while using very little of the LLM's context window.
Once these chunks, embeddings, and entities are retrieved, they are sent to the Neptune Analytics graph store, where the graph is built. The lexical graph is nothing but nodes and edges which have links to the entities and source chunks that belong to source documents, and the cross-document relationships are all stored here. This Neptune graph database becomes a powerful retrieval layer after all the processing that happens in the backend, which is managed by Bedrock.
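As a rough illustration of that retrieval layer, a query against the knowledge base can be issued through the Bedrock runtime Retrieve API; the knowledge base ID, query text, and result count here are assumptions.

```python
# Query the Neptune-backed Bedrock Knowledge Base and print the top matching chunks.
import boto3

runtime = boto3.client("bedrock-agent-runtime")

result = runtime.retrieve(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    retrievalQuery={"text": "Where are consensus proposal log messages emitted?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 25}},
)
for item in result["retrievalResults"]:
    print(item["content"]["text"][:120])
```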
When a query comes in, the code analysis agent using an LLM hosted in Bedrock first gets context from the knowledge base along with the entities, then it is sent to the LLM for generating responses which are grounded not only in semantics but also in code relationships. We have also applied another significant improvement. We have a re-ranking layer built on top of this GraphRAG workflow. Re-ranking is a powerful enhancement to RAG. It adds a second and more intelligent pass over the retrieved documents. While vector search gives us the top matches based on embeddings, re-rankers like Cohere Rerank evaluate the relevance dynamically at runtime. This means the model reads both the query and each candidate document together, allowing it to understand nuance and give better results for the LLM.
Here is where Ripple applies it. We first retrieve a broad set of chunks from the knowledge base. We then give those to the rerank model along with the user query. The reranker calculates a relevance score from zero to one, with one being the highest, reorders the results, and returns the top ten back to the LLM. The LLM now gets fewer results, but they are all very high quality, so its context window is not bloated and you still have more tokens available.
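A minimal sketch of that two-pass flow is shown below. The `call_cohere_rerank` helper is a hypothetical stand-in for the Cohere Rerank call made through Bedrock; its scoring here is a placeholder, not the real model.

```python
# Two-pass retrieval: broad candidate set -> rerank against the query -> keep the top ten.
from typing import List, Tuple

def call_cohere_rerank(query: str, documents: List[str]) -> List[float]:
    # Placeholder: in the real system each (query, document) pair is scored by
    # Cohere Rerank, which returns a relevance score between 0 and 1.
    return [0.0 for _ in documents]

def retrieve_and_rerank(query: str, candidates: List[str], top_n: int = 10) -> List[Tuple[str, float]]:
    scores = call_cohere_rerank(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]  # fewer, higher-quality chunks go to the LLM
```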
You can see this question here in the screenshot. I asked what log messages are defined inside a function. This actually forms the premise of the entire presentation: we need to find the right log messages to query in our log files. You can see that the source chunk shown is number one, but originally, without the rerank model, it was actually number four. Because the reranker read the user query together with the retrieved chunks, it was able to surface the most relevant result first.
Now I'll pass on to Aman for the multi-agent architecture.
Multi-Agent Platform: Orchestration, Prompt Engineering, and Model Context Protocol
Hello. So now let's talk about the multi-agent platform that the Ripple Team has created. Our multi-agent platform consists of four AI agents. Let's talk about them one by one. The orchestrator agent is responsible for taking queries from platform engineers using a web UI. Think of the web UI as a clean chat interface. Every request goes to Amazon API Gateway, which is a managed service that allows you to create, publish, secure, and maintain API endpoints at scale. The orchestrator agent receives this query and classifies it through intent classification. Based on the query, it then decides whether to invoke a single specialist agent or coordinate multiple agents in a sequential or thorough execution format. Once these downstream agents complete their tasks, the orchestrator agent is also responsible for synthesizing their outputs into a single coherent response.
The orchestrator agent uses DynamoDB as its state management backbone. If you're familiar with Amazon API Gateway, you'll know it has a soft 29-second integration timeout. This means that if a user asks a question and the entire multi-agent workflow takes more than 29 seconds, it will time out. That is why, as soon as the orchestrator agent receives a query, it immediately creates a task entry in DynamoDB and continuously updates progress messages asynchronously. Once the results are back, it writes the final answer to DynamoDB.
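A hedged sketch of that asynchronous task pattern is below; the table name and attribute names are assumptions rather than Ripple's actual schema. The API call returns a task ID immediately, and the web UI polls for status until the answer is written back.

```python
# Record task state in DynamoDB so the API Gateway call can return well under 29 seconds.
import time
import uuid
import boto3

table = boto3.resource("dynamodb").Table("agent-tasks")  # placeholder table name

def create_task(query: str) -> str:
    task_id = str(uuid.uuid4())
    table.put_item(Item={
        "task_id": task_id,
        "status": "IN_PROGRESS",
        "progress": "Classifying intent...",
        "query": query,
        "created_at": int(time.time()),
    })
    return task_id  # returned to the UI, which polls for this task's status

def complete_task(task_id: str, answer: str) -> None:
    table.update_item(
        Key={"task_id": task_id},
        UpdateExpression="SET #s = :s, answer = :a",
        ExpressionAttributeNames={"#s": "status"},
        ExpressionAttributeValues={":s": "COMPLETE", ":a": answer},
    )
```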
The code analysis agent uses Knowledge Base as a tool. Hari went into deep details about how we developed a graph RAG application using Bedrock Knowledge Base. The code analysis agent uses this tool to derive code insights. The log analysis agent is responsible for performing operational analytics on top of CloudWatch log groups. The process involves generating a CloudWatch Logs Insights query, and in order to generate a syntactically accurate query, the log analysis agent works closely with the query generator agent, whose sole responsibility is to come up with an accurate query. You can see that there is a tool, a static JSON file, that is provided to the query generator agent, which it reviews and hands off the query to the log analysis agent. You can see Amazon Cognito, which handles the required API Gateway authentication.
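As a rough idea of what that static JSON tool might look like, here is a sketch with assumed bucket and key names; the real file format is not shown in the session.

```python
# Load the curated XRPL log patterns (and rough expected counts) that guide query generation.
import json
import boto3

s3 = boto3.client("s3")

def get_log_patterns() -> dict:
    """Return the static log pattern catalog used by the query generator agent."""
    obj = s3.get_object(Bucket="xrpl-agent-config", Key="log_patterns.json")  # placeholders
    return json.loads(obj["Body"].read())
```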
Before talking about the other components, you might notice that two agents are hosted on AWS Lambda, whereas the other two are hosted on something different. You might recognize the symbol. This is Amazon Bedrock AgentCore, which enables you to deploy and operate AI agents at production scale with any model or any framework. The Amazon Bedrock AgentCore runtime is a purpose-built serverless environment for these AI agents. We made this design choice because at the time, Lambda satisfied all our constraints and AgentCore had just entered preview. Now that AgentCore is generally available, one of the next steps for the Ripple team will be to migrate the Lambda-based agents onto the Bedrock AgentCore runtime so that they benefit from this purpose-built infrastructure for AI agents. Secrets Manager stores the sensitive credentials required for AgentCore communication, while the SSM Parameter Store stores the configuration parameters for the services.
Now let's dive deep and talk about certain aspects of AI agents. Prompt engineering and system prompts are fundamental to how these agents operate. A system prompt gives the agent its identity and responsibilities, and more importantly, defines what the agent should not do. Ripple has practiced strict prompt hygiene, where every prompt is structured to provide the agent its role and task, along with strong, explicit guardrails on what the agent is not allowed to do.
We have literally copied the system prompts onto this deck from the codebase. You can see how the system prompts vary across the four AI agents. The orchestrator agent's system prompt is geared towards task delegation. The log analysis agent's system prompt tells the agent that it is an expert in analyzing XRPL logs stored in Amazon CloudWatch. The code analysis agent is powered by the GraphRAG application, and we tell it that its responsibility is to understand the code dependencies, the XRPL codebase, and Git commit relationships. The CloudWatch query generator prompt tells that agent that it must precisely generate queries that can be used by the log analysis agent.
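The exact prompts live on the deck and in the codebase; purely as a paraphrased illustration of the structure described above (role, task, explicit guardrails), they might look something like this:

```python
# Paraphrased illustrations only -- not the actual prompts from the deck or codebase.
ORCHESTRATOR_SYSTEM_PROMPT = (
    "You are an orchestrator for XRPL operations queries. Classify the user's intent and "
    "delegate to the code analysis or log analysis agent. Do not answer domain questions yourself."
)
LOG_ANALYSIS_SYSTEM_PROMPT = (
    "You are an expert in analyzing XRPL logs stored in Amazon CloudWatch. Only run queries "
    "produced by the query generator agent. Never fabricate log lines or counts."
)
```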
System prompts are important to highlight because prompt engineering is the foundation of how these AI agents behave. Now let's talk about what makes these AI agents so powerful and able to execute tasks on our behalf: the tools and the framework they are using, and of course, the model. The foundation of this architecture is the Strands SDK. Strands is an open source SDK built by AWS for multi-agent coordination. You can see that all these agents use Strands as the multi-agent framework.
Let's dive deep into three of these agents. The orchestration agent is using Claude 4.5 Sonnet via Amazon Bedrock and uses two Strands tools. The call code analysis agent tool calls the code analysis agent via a Lambda invocation, whereas the call log analysis agent tool talks to the log analysis agent over HTTP with JSON Web Token authentication. If you look at the code analysis agent, the first tool, query XRPL code, is powered by the XRPL knowledge base, whereas get recent commits and get commit details are Git-based sync actions. The log analysis agent needs access to CloudWatch APIs such as execute query, describe, and retrieve.
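A minimal sketch, assuming the open source Strands Agents SDK, of how the orchestrator and its two delegation tools might be wired together; the model ID, function names, and payloads are illustrative, not Ripple's production code.

```python
# Orchestrator agent with two delegation tools, built on the Strands Agents SDK.
import json

import boto3
from strands import Agent, tool

lambda_client = boto3.client("lambda")

@tool
def call_code_analysis_agent(question: str) -> str:
    """Invoke the code analysis agent Lambda and return its answer."""
    resp = lambda_client.invoke(
        FunctionName="code-analysis-agent",  # placeholder function name
        Payload=json.dumps({"question": question}),
    )
    return resp["Payload"].read().decode("utf-8")

@tool
def call_log_analysis_agent(question: str) -> str:
    """Call the log analysis agent (in production, over HTTPS with JWT auth)."""
    return "placeholder response from the log analysis agent"

orchestrator = Agent(
    model="anthropic.claude-sonnet-4-5",  # assumed Bedrock model identifier
    system_prompt="Classify the query and delegate to the right specialist agent.",
    tools=[call_code_analysis_agent, call_log_analysis_agent],
)

result = orchestrator("How many proposals did this validator see from other peers in the last hour?")
```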
In summary, Strands handles the communication layer, it handles the message passing, context management, and multi-agent coordination, whereas the orchestration agent takes the decisions, and the specialist AI agents do the deep domain analysis work and communicate back to the orchestrator agent. Now let me walk you through an end-to-end query flow. The user uses a Web UI to ask a question. It hits the API Gateway. The API Gateway sends the question to an orchestrator agent, which performs intent classification. In this case, the orchestrator agent decides to talk to the code analysis agent. The code analysis agent then invokes its tools to get the exact relevant code lines. Once this information is retrieved, it then passes along that question to the log analysis agent.
The log analysis agent uses the CloudWatch query generator agent to obtain an accurate CloudWatch query that it can execute on top of these CloudWatch log groups. This is where Model Context Protocol comes into the picture—an open standard developed by Anthropic that helps these AI agents talk to external systems, such as Amazon CloudWatch, over standardized interfaces. As you can see, the log analysis agent uses two MCP tools. The first tool provides the ability to execute a query on top of these log groups, and we get a query ID back. We then take the query ID and use the second tool to get the actual log results.
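Underneath those two MCP tools sit the asynchronous CloudWatch Logs Insights APIs: you start a query, get a query ID back, and poll for results. A hedged sketch of that flow is below; the log group name and the log message pattern in the query are assumptions.

```python
# Start a Logs Insights query, then poll until it completes and print the results.
import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/xrpl/validators",  # placeholder log group
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString="filter @message like /Proposal/ | stats count(*) as proposals by bin(1h)",
)["queryId"]

while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
        break
    time.sleep(1)

print(result["results"])
```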
Once the log analysis agent has these results, it assimilates the information from the code analysis agent, performs the code-log correlation, and sends the results back to the orchestrator agent, which finally synthesizes the output and sends it back to the user. All the AI agents use large language models. In this case, we have been using Claude 4.5 Sonnet, but with Amazon Bedrock you get model flexibility and can choose any model or mix and match. The orchestrator agent can use a lighter, smaller model for speed, whereas the heavier agents can use models that are better at code generation, like Claude 4.5 Sonnet.
Live Demonstration: Code-Log Correlation in Action Through the Chat Interface
This is the entire current-state architecture at Ripple, showing the AI processing, the log processing pipeline, and the code analysis pipeline. Now with that, I will hand it over to Hari, who will walk us through a demonstration of the solution. Because this demo is text-heavy, we will take a few brief pauses so that you can observe the log lines and get a better understanding of them. Let me play the demo now.
This is a chat UI. The question we have asked here is: for the given time range, how many proposals has a validator seen from other peers? A proposal is a validator's suggested view of what the next ledger should be. We have asked the code analysis agent to get the log lines from the code and pass these log lines to the log analysis agent, which can generate the response and give it back to us.
These are the orchestrator agent logs. You can see here that it has received the user query and will be calling the code analysis agent to get the log lines from the code. This is the code analysis agent. You can see here that the prompt was received from the orchestrator, and the chat history, if it were available, would have been sent here too. It uses the query XRPL code tool repeatedly until it gets all the log lines related to the question. You can see that the primary log line has been retrieved. This is the line in the Consensus.h file. Along with it, you get other log lines as well for the log analysis agent to derive further analysis.
These are the log analysis agent logs that you are seeing right now. It has received the prompt and would receive the chat history. You can also see the code analysis agent's response, as shown here. This helps it understand what the previous agent's response was and give better results based on it. The CloudWatch query generator agent is now being called to get the CloudWatch queries. You can see that there is a pattern tool which gets called now. We get the static patterns from S3, which contain the log patterns as well as the estimated counts we have seen for them. This gives us a better idea of how to set the limits in the queries for the CloudWatch query generator agent.
Now we can see that the pattern tool has responded and the CloudWatch query generator has generated the queries. There are four important aspects to look for here: the CloudWatch query itself, what the query does, an estimated count of these log patterns, and the instructions from the CloudWatch query generator back to the log analysis agent.
The instructions cover how to execute the queries, whether they can be executed in parallel, what time ranges need to be passed, and other details. Here you can see the MCP execute query tool gets called, and get results gets called multiple times based on the instructions from the CloudWatch query generator agent. The results are displayed shortly. Now we have them: you can see that a total of 267,282 proposals were received from other peers. Along with that, you get the hourly distribution as well, and also all the peer node IDs which sent proposals. So not only the answer to the question we asked, but other related information is returned as well.
This is the response from the orchestrator agent. It's a summary response that combines both the code analysis agent response and the log analysis agent response. This gives a nice summary view, but we can also get a detailed view by clicking on the view agent analysis button. There, the code analysis agent response as well as the log analysis agent response are displayed in detail. The user can now ask more questions on top of this and get further results based on it as well.
This is the AgentCore observability, which was really helpful for us in improving our agents' performance. It shows the total number of sessions, latency, duration, token usage, as well as any errors present in the AgentCore containers.
This is the second question we are asking. Here, multiple log lines need to be collated to answer the question. In XRPL we have something called consensus rounds, which happen every three to four seconds, and there is a sequence of events that happens in each phase. Without understanding this, the question cannot be answered. What I mean is that in the consensus accepted phase, a canonical transaction set message would be formed, and as you can see, you can find that log line here. Then, after several log lines, the built ledger message is shown, which contains the ledger hash that we asked for. These two things need to be collated, and because the system understands consensus rounds, it is able to identify the correct log lines to look for.
It's asking for the canonical transaction set message, the individual transactions under it, and the built ledger message. This is sent from the code analysis agent to the log analysis agent, which then executes the queries and gives us the ledger hash, along with all the events you saw earlier between those two messages, nicely summarized here. You can see when the first message was seen and when the second message was seen, a few seconds apart actually, which aligns well with how XRPL behaves.
Lessons Learned, Evolution Journey, and Future Roadmap for XRPL Operations
The lessons learned are quite significant. One thing that we undervalued, or probably didn't pay much attention to initially, was context. Context engineering is absolutely key. The LLM has a lot of capabilities, but if the context is not right, that's where it can hallucinate a lot. So context engineering is something we have learned is pretty important, particularly when you design any AI-related solution. Also important is capturing the decisions, reasoning, and tool calls, so you have a clear, auditable workflow.
When the year started and the platform team was doing all these things manually, we decided we were going to use AI. But if you look at the final architecture, it is not something we envisioned when we started the whole design. So I want to give you a walkthrough of how this architecture evolved. The first quarter is when we started writing the vision document for how this platform team should operate and how we should use the latest technological capabilities like AI to solve some of these problems, rather than doing everything in a mundane, manual way. So the whole first quarter we were thinking about the vision and the strategy for how this platform team should operate. Initially we were thinking, okay, let's go with a machine learning system: we can train it, we have all that data in hand, and we can run all these inferences.
However, we noticed that industry-wide standards started shifting completely when agentic AI came into the picture. That's when we realized we should involve more subject matter experts and that we may not be the right team to solve this problem on our own. So we started having discussions with the AWS PACE team, which is a prototyping initiative team. They quickly came in, and we walked them through the use case and problem scenarios.
One thing that gave us more confidence was that once the PACE team got involved, they were able to build this prototype very quickly. This prototype was one of the key stepping stones for us to conceptualize the solution and have the confidence that this was possible. All the complexities that Hari and Aman walked through in the backend are handled by the foundational layer from that PACE team engagement. We're very grateful to the PACE team for helping us out.
In Q3, we noticed that AWS had also evolved and started introducing AgentCore. We were probably among the first working with its preview features. We noticed that AgentCore takes care of a lot of infrastructure capabilities that we don't want to deal with ourselves. We want to focus on the actual business logic and the problem we want to solve. The infrastructure aspects are handled well by AgentCore, so we don't have to solve all those problems ourselves.
In Q4, we're now working with the AWS Professional Services team to productionize this solution. What do we mean by productionize? We're putting in all the guardrails, VPCs, and making it secure and compliant so we can serve this to the XRPL open source community.
You can definitely see the benefits, particularly in the demo that Hari showcased in the UI chatbot. What used to take at least two to three days for a platform engineer working with a core C++ engineer to produce a meaningful summary can now be done much faster. It's not just direct log lines that give you the answers. If you notice, the logs are very noisy because it's a peer-to-peer network. You get too many messages in the logs, and they can be jumpy and out of sequence. This is where AI definitely helped us create meaningful summaries out of these log lines.
One of the biggest benefits I noticed was the removal of the bidirectional dependency between the platform team and the C++ expert. The platform team always relied on C++ experts to help understand these logs and determine if something was an issue or not. C++ experts always relied on the platform team to provide good logs because storing them in Grafana and making them searchable was also an initiative where the platform team became a bottleneck. Having this one chat interface reduces the dependency between platform and core engineers so they can automatically search for any information and logs we've already pumped in.
That said, we're also noticing that core engineers now want to develop features as fast as possible. So we started pumping devnet logs and testnet logs so that any engineer wanting to build a feature can directly look at their logs, get a meaningful summary, and even compare their current logs to mainnet to make sure what they're developing makes sense and won't introduce any problems. They can quickly understand what's going on in the development cycle.
Time reduction is one of the biggest wins, and enhanced troubleshooting and data processing at the scale we're talking about means this AI chatbot is definitely helping us in everyday operations. For example, next week we have a standalone release going on for XRP Mainnet. I have platform engineers looking at the logs every day right now via the chatbot to make sure that each day's logs look good, and they're giving me a thumbs up that today's logs are good. We have four to five more days of testnet soaking that we want to do. The whole operational efficiency and excellence we're talking about is definitely helped a lot by this AI chatbot solution.
We've talked a lot about this architecture, and there are some capabilities that we could definitely use next. One of the things we're looking at is AgentCore Memory. The reason is that we don't want every new session to lack memory, where people using the chat interface end up re-ingesting the context and asking the same questions again and again. We definitely want to take advantage of AgentCore Memory, which solves some of these problems. We're also looking into AgentCore Identity, which comes out of the box with AgentCore, where we can put in all the role-based access. A secure layer like that is definitely going to help us.
We've solved the problem of log analysis and code analysis. What else we can use this chatbot for is something we've heard from the XRPL community, particularly around cryptocurrency forensics and account forensics. For example, if you're an XRPL account user, sometimes you might end up losing money because of scams.
This chatbot can help quickly investigate the transaction layer. If a user comes in and puts in their wallet address, the chatbot can quickly go through the RPC calls across different accounts and then give a meaningful summary of where the source of the funds is and where they finally end up at the destination. This means that anybody who lost funds can quickly understand where the currency ultimately ended up being stored, and they can involve the relevant law enforcement to quickly pause or freeze the funds before they reach the malicious actor.
The second thing we are thinking about is network level monitoring. One of the things we noticed is that since it is so cheap to use XRPL, people end up sending a lot of dust transactions and spam transactions that could be quite operationally heavy for the UNL validators. We can quickly identify these dust transactions and spam transactions by looking at the overall network, and we can involve the UNL community so they can quickly be aware of who the problematic accounts are and what those accounts' intentions are. We can quickly identify how we can solve some of these network level transaction issues.
This has been a wonderful journey. I definitely want to call out some of the folks who helped us build the solution, particularly the PACE team: Shri, Jim, and Ishan. They came in and changed the entire architecture in just six weeks. We never expected this architecture would be this robust and resilient in terms of how we want to take it forward for the next steps. Aman has been a wonderful partner from the AWS side, working with us hand in hand every day to solve this problem, and a huge shout-out to Parisa for stitching us all together and involving all the right folks to solve this problem.
The Ripple engineering leadership was very supportive when we said we were going to solve this problem with AI. They were always encouraging us to pioneer and solve this zero-to-one problem. Without all that support, we would not have been able to get to this stage. I hope you can learn from our lessons so you don't have to reinvent the wheel in your own technological choices, particularly if you want to build a multi-agent system. You can definitely reuse some of the technological choices that we made. If you have more questions, we are happy to answer them.
This article is entirely auto-generated using Amazon Bedrock.