DEV Community

Maame Afua A. P. Fordjour

I Got Lost in Canary Wharf for 30 Minutes, But I Found the Future of SRE

If you've ever been to Canary Wharf, you know it's less of a business district and more of a high-stakes escape room designed by someone who really dislikes Google Maps.

I arrived for SRE Day London 2026 at the Everyman Cinema feeling prepared. I had my bag, my notes, and a general idea of where I was going. Fast forward 30 minutes, and I was still wandering around Crossrail Place like a lost protagonist in a sci-fi movie. Between the multiple levels and the hidden "Level -2" entrance, the frustration was real. I spent nearly half an hour pacing back and forth, trying to figure out how to actually get into the venue. Honestly, I probably hit 2k of my daily steps just from walking around in circles looking for the exact location. For some weird reason, Google Maps and Apple Maps don't work very well in Canary Wharf, so I did some research. Apparently it is due to something called the urban canyon effect: tall buildings block and reflect GPS signals, which throws off your reported position. If you are interested in reading more about how that affects GPS signals, you can read this article: urban canyon

But honestly? The second I stepped inside, the frustration evaporated.


The Swag Haul

First things first: the swag. As a student, you quickly learn that the quality of an event is often proportional to the stickers on the table. SRE Day did not disappoint. I loaded up on fridge magnets and stickers, but the highlight was definitely the SRE Day t-shirts. I managed to cop one, and let's just say it's going straight into my weekly rotation.


The Knowledge Drop: Morning to Midday


The talks kicked off at 09:00, and sitting on those comfortable Everyman sofas made it feel more like a movie premiere than a tech conference. Here is a breakdown of what I learned before 15:00:

Peter Marshall (Imply): He opened with a keynote on Decoupled Observability. The big takeaway here was that we are often limited by "tightly coupled" architectures where data is stuck to specific tools. By decoupling the data layer, we can scale our detection and investigation without the costs spiralling out of control.

Dewan Ahmed (Harness): He talked about Secure by Default in AI-driven delivery. As we move faster with AI, we run the risk of "automating insecurity at scale." He challenged us to look beyond just scanners and build confidence directly into the pipeline.

Matt Henderson (Phoebe): This was fascinating: he compared software reliability to the human immune system. Instead of just reacting to alerts and scrambling when things break, we need systems that can predict and prevent failures, just like our bodies handle threats before we even feel sick.

Tyler Hannan (ClickHouse): He asked a controversial question: Do Metrics Matter? While metrics are the fastest way to see if a system is unhealthy, Tyler pointed out that as systems get more complex, metrics alone aren't enough to understand the "why" behind unpredictable failures.
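
To make that point concrete, here is a minimal Python sketch. It is my own illustration and not something shown in the talk, and the event fields (`tenant`, `latency_ms`, `ok`) are made up for the example. The aggregated metrics say that something is wrong, while the raw events let you slice by a dimension and start asking why.

```python
from statistics import mean

# Hypothetical per-request events; the field names are illustrative only.
events = [
    {"tenant": "a", "latency_ms": 40, "ok": True},
    {"tenant": "a", "latency_ms": 45, "ok": True},
    {"tenant": "b", "latency_ms": 50, "ok": True},
    {"tenant": "c", "latency_ms": 42, "ok": False},
    {"tenant": "c", "latency_ms": 43, "ok": False},
]

# The metrics: single aggregated numbers. They show the system is unhealthy,
# but not where the problem is.
error_rate = sum(not e["ok"] for e in events) / len(events)
avg_latency = mean(e["latency_ms"] for e in events)

# The events: group by a dimension to see which tenant is actually failing.
failing_tenants = sorted({e["tenant"] for e in events if not e["ok"]})

print(f"error rate: {error_rate:.0%}, average latency: {avg_latency} ms")
print(f"failing tenants: {failing_tenants}")
```

In this toy data the average latency looks perfectly normal, so the error rate is the only hint that something is off, and only the per-event view points at a single tenant.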

Birol Yildiz (ilert): He showed AI SRE in action. We're moving toward a world where AI agents can diagnose and remediate outages autonomously. Imagine an incident fixing itself without anyone getting paged at 3 AM!


Lunch and the Afternoon Sprint

After a much-needed pizza break (shoutout to incident.io for powering that!), we dove into the afternoon sessions:

Deniz Yalcin & William Ravensbergen (ING): They reminded us that reliability starts with Customer Data. In banking, if upstream data is malformed or delayed, everything downstream fails, from fraud prevention to user trust. It's an underrated SRE dependency.

Adriana Villela (Dynatrace): She emphasized that Observability is a Team Sport! We often fall into the trap of creating "Observability Silos" just like we did with DevOps. To succeed, observability needs to be integrated across the whole organization, not just a single team.

Heather Thacker (Gatling): She broke down the Performance Testing Arsenal. It's one thing for your app to work on a laptop, but another to handle 10x traffic during a marketing campaign. She covered load, stress, and soak testing—essential tools for any SRE.
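
As a rough way to tell the three test types apart, here is a small Python sketch that describes each one as a traffic plan. This is my own simplification and not Gatling code or anything from the talk: the `Stage` class, the function names, and the specific multipliers and durations are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step of a test plan: hold `rps` requests per second for `minutes`."""
    rps: int
    minutes: int


def load_test(baseline_rps: int) -> list[Stage]:
    # Load test: ramp to the expected peak (assumed to be 10x baseline) and hold.
    peak = baseline_rps * 10
    return [Stage(peak // 2, 5), Stage(peak, 30)]


def stress_test(baseline_rps: int) -> list[Stage]:
    # Stress test: keep stepping past the expected peak to find the breaking point.
    peak = baseline_rps * 10
    return [Stage(peak * step, 10) for step in (1, 2, 3)]


def soak_test(baseline_rps: int) -> list[Stage]:
    # Soak test: moderate load for hours to surface leaks and slow degradation.
    return [Stage(baseline_rps * 2, 8 * 60)]


def total_requests(plan: list[Stage]) -> int:
    # Total requests a plan would send across all of its stages.
    return sum(stage.rps * stage.minutes * 60 for stage in plan)


if __name__ == "__main__":
    for name, plan in [("load", load_test(100)),
                       ("stress", stress_test(100)),
                       ("soak", soak_test(100))]:
        print(name, plan, total_requests(plan))
```

The shapes are what matter here: a load test holds the traffic you expect, a stress test deliberately goes beyond it, and a soak test trades intensity for duration.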

Tasmia Niazi: This session was super relatable for me. She shared her journey from learner to leader, explaining that SRE is a mindset and a culture, not just a job title. Adopting that state of mind is what builds resilient teams.


Getting Involved: Tracer Cloud and Open Source

During the networking and sponsor crawl at 14:30, I made the coolest discovery: Tracer Cloud's Open SRE Agent. It's a tool focused on cloud-native alert investigation, using AI to figure out root causes before humans even have to step in.

Because I've been looking for ways to get more "hands-on" experience, I officially signed up to be a contributor! If you want to jump in and contribute to the repo as well, you can find it here:


The open-source SRE agent that investigates and debugs your data pipelines.

Slack · Getting Started · Tracer Agent · Docs · Security


Quick Start

git clone https://github.com/Tracer-Cloud/open-sre-agent
cd open-sre-agent
make dev

Documentation → /docs


The Problem

Production data incidents often involve multiple interconnected systems.

Resolving them requires correlating operational signals — logs, metrics, traces, configuration state, and recent changes — across orchestration frameworks, compute engines, and infrastructure.

This investigation process is typically manual and tool-fragmented.


How Tracer Works


Investigation Workflow

When an alert fires, Tracer:

  1. Ingests the alert from monitoring or incident systems
  2. Assembles context from logs, metrics, configs, and dependencies
  3. Frames potential failure modes
  4. Executes investigation queries across connected systems
  5. Evaluates hypotheses based on collected evidence
  6. Delivers a root cause report and recommended next actions
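
The six steps above can be sketched as a toy pipeline in Python. To be clear, this is not Tracer's implementation: every name here (`Alert`, `Hypothesis`, `assemble_context`, `frame_hypotheses`, `investigate`) is hypothetical, and the context is canned data standing in for real log, metric, and config queries.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical; none are taken from the Tracer codebase.


@dataclass
class Alert:
    service: str
    message: str


@dataclass
class Hypothesis:
    name: str
    check: Callable[[dict], bool]  # returns True if the evidence supports it


def assemble_context(alert: Alert) -> dict:
    # Step 2: a real agent would query logs, metrics, and configs here.
    # This stub returns canned evidence so the sketch stays runnable.
    return {"recent_deploy": True, "disk_full": False, "upstream_errors": 0}


def frame_hypotheses(alert: Alert) -> list[Hypothesis]:
    # Step 3: candidate failure modes to test against the evidence.
    return [
        Hypothesis("bad deploy", lambda ctx: ctx["recent_deploy"]),
        Hypothesis("disk full", lambda ctx: ctx["disk_full"]),
        Hypothesis("upstream outage", lambda ctx: ctx["upstream_errors"] > 0),
    ]


def investigate(alert: Alert) -> dict:
    # Step 1 is the caller handing us the alert.
    context = assemble_context(alert)       # step 2
    hypotheses = frame_hypotheses(alert)    # step 3
    with ThreadPoolExecutor() as pool:      # steps 4 and 5, run in parallel
        verdicts = list(pool.map(lambda h: h.check(context), hypotheses))
    supported = [h.name for h, ok in zip(hypotheses, verdicts) if ok]
    return {"service": alert.service, "likely_causes": supported}  # step 6


report = investigate(Alert("orders-pipeline", "row count dropped"))
print(report)
```

In a real agent the checks would be slow network queries, which is why running the hypotheses in parallel is worth the extra machinery.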

Capabilities

  • Structured incident investigation
  • Parallel hypothesis execution
  • Cross-system failure correlation
  • Evidence-backed root cause analysis
  • Alert triage and MTTR reduction

Designed for production data engineering…




Contributing to open-source is one of the best ways to move from theory to reality.


What's Next?

Despite the 30-minute maze-running session at the start, SRE Day was a massive win. I came for the t-shirt, but I left with a contributor invite and a much clearer picture of where the industry is heading.

If you are curious about cloud or have a deep interest in reliability, it doesn't matter if you're not in London! They host events all around Europe. You should definitely subscribe to their page on Luma to check out their upcoming events.

Also, if you're looking to give back, they have an option to help out as a Community Hero! It's a great way to support the ecosystem while growing your own network.

Now, if someone could just build an agent to help me navigate Canary Wharf next time... that would be great because honestly, I became Maame 'the explorer' in those 30 minutes😂.

Top comments (4)

Victor Okefie

The urban canyon effect broke your GPS, but it also broke your assumption that directions are reliable. That's the SRE lesson wrapped in a metaphor: you don't know your system's failure modes until the signal drops. The lost 30 minutes taught you more about Canary Wharf than a map ever could.

Maame Afua A. P. Fordjour

I honestly knew nothing about the urban canyon effect until today; it is quite interesting how it can affect GPS. And I definitely will not depend 100% on directions from today😂. Tbh, in situations like this I think a physical map would be more helpful.

Harsh

Great write-up! The urban canyon effect is real: Canary Wharf is basically a GPS black hole. Next time use the indoor maps in the Crossrail Place app, they actually work underground.

The sessions you covered are gold. The decoupled observability point from Peter Marshall is crucial: we've been fighting tightly coupled telemetry pipelines at work and it's a nightmare. Also, Birol's point about AI agents fixing incidents without paging anyone? That's the dream. 😅

Quick question: any demos from the Tracer Cloud agent at the event? Curious how it handles multi-cloud scenarios.

Maame Afua A. P. Fordjour

With the Tracer Cloud agent, I only just forked the repo today when I got home from the event, so I will play around with it a bit to see how efficient it is. But in the debrief I got from one of the founders, he did mention that it's an AI agent that investigates data pipeline incidents automatically: it pulls in logs, metrics, traces, and configs from tools like Airflow, Kafka, Grafana, and Datadog, then generates a root cause report. I think the entire concept is a smart on-call engineer that does the detective work for you. As for how it handles multi-cloud scenarios, I am not so sure, since I personally haven't tried it. But as time goes on I will give you feedback on that, Harsh.