Kazuya

AWS re:Invent 2025 - Driving Resilience with Assurance and Visibility from Edge to Cloud (COP101)

🦄 Making great presentations more accessible.
This project enhances multilingual accessibility and discoverability while preserving the original content. Detailed transcriptions and keyframes capture the nuances and technical insights that convey the full value of each session.

Note: A comprehensive list of re:Invent 2025 transcribed articles is available in this Spreadsheet!

Overview

📖 AWS re:Invent 2025 - Driving Resilience with Assurance and Visibility from Edge to Cloud (COP101)

In this video, the speaker discusses driving resilience with assurance and visibility from edge to cloud, focusing on the integration between Cisco ThousandEyes and Splunk Observability. The presentation addresses challenges in today's complex digital landscape, including the shift from reactive troubleshooting to AIOps-driven proactive incident detection, the need for unified end-to-end visibility across operational silos, and collaborative workflows. Three integration solutions are demonstrated: the Cisco ThousandEyes app for Splunk, ThousandEyes integration with Splunk Observability Cloud, and Splunk ITSI bidirectional integration. Live demos show how network telemetry from ThousandEyes combines with Splunk's application and infrastructure visibility to enable seamless troubleshooting across the digital supply chain, from identifying packet loss and latency issues to tracing application errors and HTTP status codes for rapid resolution.


This article is entirely auto-generated while preserving the original presentation content as much as possible. Please note that there may be typos or inaccuracies.

Main Part

The New Normal: Addressing Complexity and Silos in Digital Experience Management

Hello everybody. Well, thank you. I really appreciate that. I've got to actually start this talk with a really short story, a kind of funny story. Maybe you'll get a kick out of it. Just before I came out, they said, "Well, hey, can you stand over to the side because we're going to introduce you, then we want you to walk out." And I said, "Oh, I get to make like a big sort of rock star entry, right? Like I can come out." And she said, "Yeah, we're going to play like Metallica, Enter Sandman as you're entering the stage." And I said, "Oh, that means I'm going to put them to sleep, right?" Anyway, I just thought that was really funny, and I thought that I would share that with you.

So anyway, welcome to today's talk, Driving Resilience with Assurance and Visibility from Edge to Cloud. Let me just start by asking you guys a couple of questions by a show of hands. How many of you work in a more application-focused setting like DevOps, SREs, that type of thing? Nobody? Okay, that's a good start. How many of you are more focused on the network side? How many of you work in network operations or manage infrastructure, those types of things? Okay, so that's great.

So what we're going to actually talk about here today is assurance. Let me just get started here by talking about some of the challenges that we have in today's world. Today's world is way different than the way things used to be. So there's a new normal in place, and what we've seen really is three different departures. The first is a departure from reactive troubleshooting to an approach that's really centered on AIOps, proactive incident detection, and then proactive remediation. And so that's the mode, that's the posture, that we really need to be in today.

The second thing is basically unified end-to-end visibility. It used to be that you worked within your operational silos (this is why I asked the question). From a network standpoint, you were used to using network tooling like network management or NPM-type tools. If you worked in application design or application architecture, you used Application Performance Monitoring or full stack observability, that type of thing. And what the new normal really demands now in your environments is something that's a lot more unified and a lot more seamless for your end users.

And then finally, we're really talking about a departure from traditional operational models to one that's much more collaborative and one that's really designed to implement simplified and streamlined workflows into your operations so that you can achieve better collaboration, so that you can reduce the amount of finger pointing that you deal with on a day-to-day basis, and that you can basically address your end users' concerns and assure their experiences in a much more proactive and effective way.

And really what's driving a lot of this is the increased complexity that we see across the digital landscape in today's world. And it really starts with your end users. Your end users, it used to be that they were all located in offices. Well, now they're working at home, or they're working in offices that can be in campus locations. They can be located in coffee shops, and maybe they're not even human. Maybe they're IoT devices. But nonetheless, these users are trying to gain access to those applications, and those applications are now no longer hosted in enterprise or premises-specific environments.

They're hosted in hybrid cloud environments. They're hosted in AWS EC2 instances. They leverage services like global load balancers or CDNs. They leverage cloud-based services like DNS, but they still can be located in enterprise data centers, or they can be SaaS applications. And SaaS applications pose a significant challenge because you, as an applications person or someone supporting your end user community, don't have any visibility into or control over SaaS environments.

But the real challenge lies in this midsection here, which is the digital supply chain, and that consists of your own private networks. It consists of internet transit. It consists of cloud provider networks and all the adjacent services, as well as all the underlying technologies and dependencies that make that work, like BGP, like the internet routing system, like DNS and things like that. So you really need to have complete end-to-end visibility in order to attain the proactive operational stance that you really want to achieve here.

So now, the reality is that most teams and most IT organizations today operate in somewhat of a siloed model, meaning that some of you are responsible for applications, whether you're DevOps, whether you're application architects, whether you're developers, whether you're SREs. Some of you are responsible for managing or maintaining infrastructure, and that can be your on-premises locations or it can be your cloud environments. It could be that you're an architect focused specifically on Kubernetes or on distributed container-based applications, which also depend on connectivity. Or you can be working in a network silo.

And in the network world, basically, you're really concerned purely with the connectivity and the performance and the availability of your networks across this entire digital supply chain.

But the problem is, what do you do for those environments that are outside of your control? Because again, you don't control SaaS applications, you don't control ISP networks, and you have very little control even over your cloud environments. Yet you are still responsible for delivering high-quality user experiences to your end users.

So the problem now is that slow response times and degraded performance lead to degraded experiences. When your users have a degraded experience, from their perspective that's the same as an outage. Why? Because it's stopping them from being productive. It's preventing them from reaching your website. It's preventing them from executing transactions or payments on your websites. And so if there's a slowdown or a bottleneck or some kind of degradation across that digital delivery chain, they see that as the same as an outage. Whether it's a network issue, whether it's an application issue, they don't really care. But guess what? You do. You have to solve that issue somehow, some way, wherever it happens to reside.

Unified Assurance: Integrating Cisco ThousandEyes with Splunk Observability

And so what I'm going to talk about are a couple of different solutions from Cisco, one being Splunk Observability, and the other being Cisco ThousandEyes, which is essentially assurance for digital experiences. We talked earlier about one of the problems inherent in this, which is that we've gotten into a sort of siloed environment, right? Typically Splunk is used within application-centric environments. It's used by ops folks, by infrastructure managers, even by security teams, because Splunk really excels in security and all those use cases, but it's more of an application and infrastructure-focused solution. ThousandEyes is more of a network-focused solution. We are really dedicated to assuring the connectivity across the digital supply chain to ensure that your users can reach all those applications, no matter where they are, no matter who owns them, et cetera.

And so what we've done at Cisco is we've sort of joined forces between these two solutions, between Splunk Observability and ThousandEyes assurance, and we've basically integrated those. And the reason we've done that is to make it easier and more seamless for you as an IT organization to assure resilient networks, resilient experiences, and high quality end user experiences. And so that's what we're going to show you today. We've come up with three different solutions that show this, and I'm going to demonstrate them in just a couple of moments.

The first is the Cisco ThousandEyes app for Splunk. If you're familiar with Splunk, you know that Splunk maintains a vast library of applications, which is really built on a system called Splunkbase. Splunk has intelligence and tooling that allows you to take the data that's collected within Splunk and build custom applications. So what we've done is build a custom application called the Cisco ThousandEyes app for Splunk. It's located in Splunkbase, so you can download it and start using it right away if you're a Splunk customer. And I'm going to show you what that looks like. Basically, it takes performance metrics, primarily network telemetry, and exports them via API and OpenTelemetry into the Splunk dashboard.
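The talk says only that the data is exported "via API and OpenTelemetry" and does not show a wire format. As a rough illustration, here is a minimal sketch that assumes the measurements arrive as JSON events in the envelope used by Splunk's HTTP Event Collector (`time`, `source`, `sourcetype`, `event`). The function name, the sample field names, and the `source` and `sourcetype` values are all hypothetical, and nothing is actually sent anywhere.

```python
import json


def to_hec_event(sample: dict, sourcetype: str = "example:network:metrics") -> str:
    """Wrap one network sample in an HEC-style JSON envelope (nothing is sent)."""
    envelope = {
        "time": sample["epoch_seconds"],       # event time as epoch seconds
        "source": "network-assurance-export",  # hypothetical source label
        "sourcetype": sourcetype,              # hypothetical sourcetype
        "event": {                             # the measurement itself
            "test_name": sample["test_name"],
            "latency_ms": sample["latency_ms"],
            "loss_pct": sample["loss_pct"],
            "jitter_ms": sample["jitter_ms"],
        },
    }
    return json.dumps(envelope, sort_keys=True)


# Hypothetical sample; a real export would carry many more attributes.
sample = {
    "epoch_seconds": 1764583200,
    "test_name": "checkout-api",
    "latency_ms": 182.0,
    "loss_pct": 2.5,
    "jitter_ms": 11.0,
}
payload = to_hec_event(sample)
print(payload)
```

Keeping the measurement under a single `event` key means the receiving side can index latency, loss, and jitter as separate fields without parsing free text.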

The second solution that we have is an integration between ThousandEyes and Splunk Observability Cloud. So again, what we do is export our data into Splunk. Splunk's got something called the Common Information Model, or the CIM, where we basically map our telemetry and all those parameters into their database so that you can now display ThousandEyes telemetry within Splunk and vice versa. So these are bi-directional integrations, meaning if you're a Splunk customer, you can view ThousandEyes telemetry in Splunk. If you're a ThousandEyes customer, you can view Splunk metrics, events, traces, and logs within ThousandEyes.
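The mapping step described here can be pictured as a simple field rename from one vendor's names to a shared schema. The sketch below is only an illustration of that idea: every source and target field name in `FIELD_MAP` is made up for the example and is not taken from the actual Common Information Model or from the ThousandEyes API.

```python
# Hypothetical mapping from source field names to shared-schema names.
FIELD_MAP = {
    "testName": "test_name",
    "avgLatency": "latency_ms",
    "loss": "packet_loss_pct",
    "jitter": "jitter_ms",
}


def normalize(record: dict, field_map: dict = FIELD_MAP) -> tuple[dict, dict]:
    """Rename known fields; return (normalized, unmapped) so nothing is silently lost."""
    normalized = {}
    unmapped = {}
    for key, value in record.items():
        if key in field_map:
            normalized[field_map[key]] = value
        else:
            unmapped[key] = value
    return normalized, unmapped


raw = {"testName": "checkout-api", "avgLatency": 120, "loss": 1.5, "agentId": 7}
normalized, unmapped = normalize(raw)
print(normalized)
print(unmapped)
```

Returning the unmapped fields separately makes gaps in the mapping visible instead of dropping data on the floor.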

And then finally, we have something called the Splunk ITSI bidirectional integration. IT Service Intelligence from Splunk is essentially a machine learning tool, really an AIOps tool, that does automatic, preventative, proactive correlation of all the data that's collected within Splunk and then shows it in a dashboard. So, again, what we're doing is exporting ThousandEyes network telemetry into Splunk ITSI so that you can view incidents there. We've also got links from ThousandEyes directly into Splunk ITSI. So if you identify an issue with an application and you've basically eliminated the network as the culprit, you can then go directly into Splunk ITSI and see recommendations for exactly how to rectify that problem. So let's go ahead and show these demos.
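To make "correlation" a little more concrete, here is a deliberately simple sketch that groups alerts on the same service into one incident when they arrive close together in time. This is not how Splunk ITSI works internally (the talk describes it as machine learning driven); the `Alert` type, the `correlate` function, and the five-minute window are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    service: str        # service the alert is about
    source: str         # "network" or "application" (hypothetical labels)
    epoch_seconds: int  # when the alert fired
    message: str


def correlate(alerts: list[Alert], window_seconds: int = 300) -> list[list[Alert]]:
    """Group alerts per service when each follows the previous one within the window."""
    incidents: list[list[Alert]] = []
    open_by_service: dict[str, list[Alert]] = {}
    for alert in sorted(alerts, key=lambda a: a.epoch_seconds):
        current = open_by_service.get(alert.service)
        if current and alert.epoch_seconds - current[-1].epoch_seconds <= window_seconds:
            current.append(alert)  # close enough in time: same incident
        else:
            current = [alert]      # otherwise start a new incident
            open_by_service[alert.service] = current
            incidents.append(current)
    return incidents


alerts = [
    Alert("checkout", "network", 0, "packet loss on transit link"),
    Alert("search", "application", 60, "slow queries"),
    Alert("checkout", "application", 120, "HTTP 500 from API"),
    Alert("checkout", "network", 1000, "latency spike"),
]
incidents = correlate(alerts)
for incident in incidents:
    print([(a.service, a.source, a.epoch_seconds) for a in incident])
```

In this toy data the network alert and the application alert for `checkout` land in the same incident, which is the kind of cross-silo view the integration is aiming for.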

So this is actually the ThousandEyes application for Splunk. This combines deep visibility with AI-driven insights.

What you're basically seeing here is that you're actually in Splunk right now, in Splunk Enterprise, and you're seeing tests that ThousandEyes has run. You can see you've got network latency, you've got network loss, network jitter, and we actually have the ability to display that by application. So we can identify the application and we can see here that there's some consistent packet loss, some consistent latency within the network.
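A per-application view like this boils down to grouping test rounds by application and checking whether loss or latency stays above a threshold in every round. The sketch below assumes hypothetical sample fields (`application`, `latency_ms`, `loss_pct`, `jitter_ms`) and arbitrary thresholds; it is not the query the dashboard actually runs.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_application(samples, loss_threshold_pct=1.0, latency_threshold_ms=150.0):
    """Aggregate test rounds per application and flag consistently bad metrics."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample["application"]].append(sample)

    summary = {}
    for app, rows in grouped.items():
        summary[app] = {
            "avg_latency_ms": mean(r["latency_ms"] for r in rows),
            "avg_loss_pct": mean(r["loss_pct"] for r in rows),
            "max_jitter_ms": max(r["jitter_ms"] for r in rows),
            # "Consistent" here means every round exceeded the threshold.
            "consistent_loss": all(r["loss_pct"] > loss_threshold_pct for r in rows),
            "consistent_latency": all(r["latency_ms"] > latency_threshold_ms for r in rows),
        }
    return summary


# Hypothetical test rounds for two applications.
samples = [
    {"application": "checkout", "latency_ms": 180.0, "loss_pct": 2.0, "jitter_ms": 12.0},
    {"application": "checkout", "latency_ms": 200.0, "loss_pct": 3.0, "jitter_ms": 15.0},
    {"application": "checkout", "latency_ms": 190.0, "loss_pct": 2.5, "jitter_ms": 9.0},
    {"application": "wiki", "latency_ms": 40.0, "loss_pct": 0.0, "jitter_ms": 2.0},
    {"application": "wiki", "latency_ms": 45.0, "loss_pct": 1.5, "jitter_ms": 3.0},
]
summary = summarize_by_application(samples)
for app, stats in summary.items():
    print(app, stats)
```

Requiring every round to breach the threshold separates a persistent problem from a single noisy measurement.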

So from there, we can link directly into ThousandEyes, and this is a ThousandEyes test view where we can show that availability dropped. We drop into a ThousandEyes path visualization, and we can identify very quickly that there's high latency on this network transit link, and there's also packet loss happening in the network, both at the intermediate node and also at the node that's in the destination network. Right away, we can take what we saw in the Splunk application, jump from Splunk into ThousandEyes, and quickly move to diagnose and resolve the issue.
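The reasoning behind reading a path view can be sketched as a walk over the hops: the latency a link adds is the jump in round-trip time from the previous hop, and loss is checked per node. The hop names, numbers, and thresholds below are invented for illustration, and the sketch assumes round-trip time only grows along the path, which real traces do not always do.

```python
def find_problem_hops(hops, latency_threshold_ms=100.0, loss_threshold_pct=0.0):
    """Return (hop name, reasons) for hops that add high latency or drop packets."""
    problems = []
    previous_rtt = 0.0
    for hop in hops:
        added = hop["rtt_ms"] - previous_rtt  # latency contributed by this link
        reasons = []
        if added > latency_threshold_ms:
            reasons.append(f"link adds {added:.0f} ms")
        if hop["loss_pct"] > loss_threshold_pct:
            reasons.append(f"{hop['loss_pct']:.1f}% loss")
        if reasons:
            problems.append((hop["name"], reasons))
        previous_rtt = hop["rtt_ms"]
    return problems


# Hypothetical path from a test agent to an application server.
hops = [
    {"name": "agent-gateway", "rtt_ms": 2.0, "loss_pct": 0.0},
    {"name": "isp-edge", "rtt_ms": 12.0, "loss_pct": 0.0},
    {"name": "transit-core", "rtt_ms": 180.0, "loss_pct": 4.0},
    {"name": "dest-edge", "rtt_ms": 185.0, "loss_pct": 0.0},
    {"name": "app-server", "rtt_ms": 186.0, "loss_pct": 6.0},
]
problems = find_problem_hops(hops)
for name, reasons in problems:
    print(name, reasons)
```

With this sample path the output points at the transit hop for both latency and loss, and at the destination node for loss only, mirroring the situation described in the demo.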

Now, this is actually the second one. This is the Cisco ThousandEyes integration with Splunk Observability Cloud. What this is showing is we're actually in the ThousandEyes platform, and on this particular test, we see that server application availability has dropped precipitously. Now we can go into our connect phases. We can see that DNS resolved, we've got connection, we've got send and receive, but we've got HTTP errors, meaning the application, the page really can't load for that end user, and they're probably going to log a call and complain about that.
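The connect phases are checked in order, so the first phase that fails tells you which layer to look at. Here is a minimal sketch of that triage, assuming a hypothetical result dictionary with one boolean per transport phase plus an `http_status` value; the real test view exposes far more detail than this.

```python
# Phases in the order a request passes through them.
TRANSPORT_PHASES = ("dns", "connect", "send", "receive")


def first_failing_phase(result: dict):
    """Return the name of the first phase that failed, or None if all succeeded."""
    for phase in TRANSPORT_PHASES:
        if not result.get(phase, False):
            return phase
    status = result.get("http_status")
    # A missing status or any 4xx/5xx response counts as an HTTP-phase failure.
    if status is None or status >= 400:
        return "http"
    return None


# The situation from the demo: every transport phase is fine, HTTP returns an error.
demo_result = {"dns": True, "connect": True, "send": True, "receive": True, "http_status": 500}
print(first_failing_phase(demo_result))
```

When the answer is `http`, the network has done its job and the investigation moves to the application, which is exactly the handoff shown next in the demo.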

So what we do from here is we actually can bring up within ThousandEyes a service map, and we can basically see high latency across this transit, and we can see a lot of the metrics that are associated with that. We're able to jump directly from this particular service map and look at the trace. We can see that availability has dropped to zero, we've got high latency on that link, basically two seconds, and we can see the status code. This is all, again, built into ThousandEyes. Then from there, we basically open up the trace within Splunk. We can show that within Splunk and then move very quickly towards a resolution.

So now we've seamlessly moved into Splunk. We can basically see the traces here, and we can look at what's happening on the front end. Then in the API list, we have 500 errors pretty much all along the way. We can then take that and we can get more information. We can actually look at the stack trace, and we can identify that there's an issue with the JSON code within that trace. Then from that, we can actually give that to our developers, they can fix the issue and move forward from there.
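When 500 errors show up "all along the way", the useful question is where they start. One simple heuristic is to look for a failing span that has no failing child, since errors tend to propagate upward to callers. The sketch below uses a hypothetical span shape (`span_id`, `parent_id`, `service`, `http_status`) and made-up service names; it is not the trace format Splunk uses.

```python
from collections import Counter


def is_server_error(span: dict) -> bool:
    status = span.get("http_status")
    return status is not None and 500 <= status <= 599


def server_errors_by_service(spans) -> Counter:
    """Count 5xx spans per service."""
    return Counter(s["service"] for s in spans if is_server_error(s))


def likely_origin(spans) -> list:
    """Failing spans with no failing child: candidates for where the error began."""
    failing_ids = {s["span_id"] for s in spans if is_server_error(s)}
    parents_of_failing = {s["parent_id"] for s in spans if s["span_id"] in failing_ids}
    return [
        s for s in spans
        if s["span_id"] in failing_ids and s["span_id"] not in parents_of_failing
    ]


# Hypothetical trace: the error surfaces at every level above cart-service.
spans = [
    {"span_id": "a", "parent_id": None, "service": "frontend", "http_status": 500},
    {"span_id": "b", "parent_id": "a", "service": "api-gateway", "http_status": 500},
    {"span_id": "c", "parent_id": "b", "service": "cart-service", "http_status": 500},
    {"span_id": "d", "parent_id": "b", "service": "inventory", "http_status": 200},
]
print(server_errors_by_service(spans))
print([s["service"] for s in likely_origin(spans)])
```

The span this picks out is the one whose stack trace is worth handing to the developers first.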

So again, this is really showing a seamless integration between all the network telemetry that ThousandEyes delivers along with the visibility into applications and infrastructure that Splunk delivers, all done in a very seamless way. The great thing about this is you can take snapshots of these tests and of these results and basically share them with your stakeholders. You can share them with third-party service providers or your cloud providers as a way to move towards, again, a rapid and proactive remediation.

Getting Started: Free Trials, Expert Consultations, and Moving Toward AI-Driven Operations

So what's next for you guys? Because my talk is pretty much coming to a close now, even though I still do have some time left. But basically what we want you to do, and by the way, I'm happy to take questions from any of you afterwards. I'm going to be in the Splunk Cisco booth right over there. I'm pointing at it. We have a demo pod over there. You can come and I can show any of this to you there, happy to answer all your questions, happy to show you demos. But you can also, if you visit our booth, if you attend one of our theater presentations there, if you get some demos, we'll give you these really cool collector pins. We've got some really cool collector pins, I'll tell you. You get two of those, and you can take those up to our desk and our booth staff will basically give you a really cool and pithy t-shirt, kind of like the one I'm wearing here.

The second thing you can do at our booth is basically talk with an expert. Now, I'm actually with Cisco ThousandEyes, so I can talk fairly intelligently about ThousandEyes, but I'm a lot less intelligent when it comes to Splunk and all the great things that they do. But we have subject matter experts, not only with Splunk and ThousandEyes, but we've got experts in collaboration, security, and all the other great things that Cisco does across the board. So we really encourage you to visit our booth and learn more from our experts.

And then lastly, from a ThousandEyes standpoint, we do offer a free trial of our platform. By default, it's a 15-day free trial, but depending on the service that you're actually onboarding, we can extend that as well. So sign up for a free trial. Just don't tell the bean counters that we're letting you do that. We might get some nasty stink eye looks from the accounting department.

Anyway, so I still have five minutes left, but that's all for my presentation today. I hope this has been informative on some level. We're really looking towards this new normal, right, where the way that we used to operate, the way that we used to manage, the tools that we used to use are all basically going away. We're moving towards a much more AI-driven, machine learning-driven, predictive kind of approach. That's the goal here. And I hope that you are all able to attain that and give your end users the best possible experiences. So thank you very much and hope to see you over at the booth.


This article is entirely auto-generated using Amazon Bedrock.
