I lead a multi-tenant booking platform where venues sell activities. There's a customer-facing website where users book them. On the main page we show activities available today on initial load. Venues on our platform can be in different time zones.
Our team noticed that for one venue on one of our tenants, today's availability was broken: when you clicked an activity card to see available times, the whole list of activities disappeared. I couldn't reproduce it when I looked at that venue myself, which immediately gave me the feeling that something might be wrong with the date itself. Claude, with read access to the DB, pointed out that the venue was in Australia — and since it knew the app is mostly US-based, it guessed that could be the problem. It turned out the issue was the time zone and the logic we had for past-day availability. (While revising this post I thought: why would activities even show up on first load then? We have two requests for getting activities, and they can override each other — which increases the chance of UI discrepancy.)
After reviewing the code, Claude found that our logic explicitly excluded all activities from requests for past days in the venue's time zone. The reporting team member was in the USA — their today was the venue's yesterday. The constraint: customers had to see past days exactly as before. I had to fix the glitch inside that frozen behavior, so after the update a US-based user simply saw no available times instead of a disappearing list.
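To make the mismatch concrete, here is a minimal sketch of how the same instant lands on two different calendar days. The function names, the fixed UTC offsets, and the example date are all assumptions made for illustration, not our production code; a real implementation would more likely use IANA zone names.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo

# Fixed offsets keep the sketch dependency-free. These two zones are
# assumptions; real code would more likely use IANA names via zoneinfo.
US_EASTERN = timezone(timedelta(hours=-5))  # assumed viewer zone
AU_EASTERN = timezone(timedelta(hours=10))  # assumed venue zone


def local_date(instant: datetime, tz: tzinfo) -> date:
    """Calendar date of an absolute instant as seen in a given time zone."""
    return instant.astimezone(tz).date()


def is_past_day_for_venue(requested_day: date, now: datetime, venue_tz: tzinfo) -> bool:
    """True when the requested day is before 'today' in the venue's zone."""
    return requested_day < local_date(now, venue_tz)


# 20:00 on 9 March for the viewer is already 11:00 on 10 March at the venue.
now = datetime(2024, 3, 9, 20, 0, tzinfo=US_EASTERN)
viewer_today = local_date(now, US_EASTERN)

# The viewer asks for "today", which the venue already counts as a past day.
viewer_today_is_past_at_venue = is_past_day_for_venue(viewer_today, now, AU_EASTERN)
```

With these inputs the viewer's "today" is 9 March while the venue is already on 10 March, so a request for the viewer's "today" hits the past-day branch.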
Claude's first attempt was to just remove the past-days condition. But that would have required checking how the logic behaves for past days — and that stopped me: it was a blast radius I couldn't cheaply verify. So I went for a very specific fix. For past days I still send all activities, but only with empty time slots. Availability stays exactly as it was for past days, and the UI glitch is gone. I told Claude to reproduce the issue with a red test and confirm the fix with a green one — I always prefer pinning backend bugs with unit tests that reproduce the behavior. One caveat: I couldn't verify against real production data. Our subdomain setup makes running the customer site locally painful, and we have no easy way to fake a time zone in code.
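A rough sketch of the before and after, together with the red/green check, might look like the following. Every name here (`ActivityAvailability`, `availability_before_fix`, `availability_after_fix`, `fake_slots`) is hypothetical and the response shape is simplified; it only illustrates the idea of keeping every activity for past days while returning no slots.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, List

SlotLoader = Callable[[int, date], List[str]]


@dataclass
class ActivityAvailability:
    activity_id: int
    slots: List[str] = field(default_factory=list)


def availability_before_fix(activity_ids: List[int], requested_day: date,
                            venue_today: date, load_slots: SlotLoader) -> List[ActivityAvailability]:
    """Old behavior as described above: past days dropped every activity."""
    if requested_day < venue_today:
        return []
    return [ActivityAvailability(a, load_slots(a, requested_day)) for a in activity_ids]


def availability_after_fix(activity_ids: List[int], requested_day: date,
                           venue_today: date, load_slots: SlotLoader) -> List[ActivityAvailability]:
    """Narrow fix: past days keep every activity, but with empty slots."""
    if requested_day < venue_today:
        return [ActivityAvailability(a) for a in activity_ids]
    return [ActivityAvailability(a, load_slots(a, requested_day)) for a in activity_ids]


def fake_slots(activity_id: int, day: date) -> List[str]:
    """Stand-in for the real (untouched) availability logic."""
    return ["10:00", "14:00"]


def past_day_keeps_every_activity(handler) -> bool:
    """The pinned behavior: a past-day request lists all activities, no slots."""
    venue_today = date(2024, 3, 10)
    viewer_today = date(2024, 3, 9)  # the venue's yesterday
    result = handler([1, 2], viewer_today, venue_today, fake_slots)
    ids_kept = [r.activity_id for r in result] == [1, 2]
    return ids_kept and all(r.slots == [] for r in result)


red = past_day_keeps_every_activity(availability_before_fix)   # fails before the fix
green = past_day_keeps_every_activity(availability_after_fix)  # passes after it
```

Passing `venue_today` in as an argument is a design choice of this sketch: it lets a test pin the venue's date directly, without having to fake a time zone.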
The main outcome: I'm confident that availability, the most complex part of the application, is isolated from this change and unchanged. The fix only touches day-level behavior (what a request for a past day returns); the time-of-day slot logic is untouched. That said, while Claude correctly identified the cause of this specific issue, it never stepped back to look at the problem from another angle, so it missed that the architecture itself was the reason the bug could exist. AI needs explicit constraints and rules for issues that span multiple layers. I'm planning to build infrastructure that catches this class of issue.
Defining the blast radius and isolating problems are the pills that let developers sleep well. You can never be confident in a change if you haven't first drawn the smallest blast radius you can fully reason about. And you can never sleep happy if you let AI work blindly — without understanding the problems inside your code yourself, and without teaching AI about them.
P.S. While writing this post I realized that the logic of how we get activity availability looks like shit. Why would I fetch all activities if I just need availability for one? Something is wrong here: better to isolate the activities request in one go, and the availability/details of a single activity in a second one. That's the right architectural decision, but it's a complex customer-facing task. Does it truly need to happen right now? My rule: file it with a named trigger, and refactor when the next bug from the same archetype appears, or when product work touches this flow anyway. Refactoring on disgust alone is how you cause your own outages.