The Reality of Debugging: How I Deal with "That One Bug"
Ah, debugging. It's the unglamorous, often infuriating, yet utterly essential part of a developer's life. As a Fullstack Web Developer and the founder of NovexiQ, my new web development agency, I can tell you, a huge chunk of my time isn't spent building fancy new features. Nope. It's spent painstakingly tracking down bugs. And then there's "that one bug." You know the type, right? The one that just defies logic, pops up intermittently, and honestly, makes you question every single line of code you've ever written. Today, I want to share my raw experience with just such a bug – and my systematic approach to tackling it. Maybe it'll help you too!
The Scene of the Crime: A Data Inconsistency Nightmare
This particular saga began on a recent project at NovexiQ – a custom e-commerce platform we were building. Our stack was pretty standard for NovexiQ: Next.js 14 on the frontend, a Node.js/Express backend API, and Prisma handling database interactions. All deployed on Vercel. Honestly, things were humming along beautifully in development, and even during initial UAT (User Acceptance Testing), everything seemed perfect.
Then came the email from the client: "Rhythm, sometimes after a purchase, the product quantity in the cart or on the product page just doesn't update correctly. It's inconsistent."
Intermittent issues? Oh, they're truly the worst. If something's always broken, you can at least track it down quickly. But "sometimes"? My friends, that's when it feels like it's just you versus the machine, and the machine is definitely winning.
Initial Reconnaissance: Where to Begin?
My first instinct, as always, was to verify. Could I reproduce this ghost? For a good hour, I tried every variation imaginable: purchasing products, adding/removing from the cart, fast clicks, slow clicks, multiple tabs open. Absolutely nothing. It worked perfectly. Every. Single. Time.
This immediately screamed 'race condition' or a very specific edge case I just wasn't hitting during my tests. So, my usual go-to debugging steps commenced. Here's what I did first:
- Checking Logs: I dove into Vercel's serverless function logs and my custom backend logs. Everything looked surprisingly normal. Purchases were recorded, stock updates were logged as successful. Hmmm. Confusing.
- Database Inspection: I used Prisma Studio to directly inspect the database tables. Post-purchase, the stock count in the database was always correct. This was genuinely confusing. If the database was always correct, why on earth was the UI wrong, even intermittently?
- API Responses: Using my trusty browser developer tools, I monitored network requests. And there it was: the API responses for fetching product details or cart contents *sometimes* showed outdated stock, even though the database already had the correct, updated value.
This led me to a strong hypothesis: the issue wasn't the database update itself, but rather how that updated data was being consumed or cached by the frontend or an intermediate layer.
The Deep Dive: Tracing the Data Flow
Since the database always ended up with the correct values, I knew the problem wasn't the write itself. So, I started meticulously tracing the data flow, right from the moment a purchase was confirmed:
- Frontend (Next.js):
  - How was product data fetched? Were we using `getServerSideProps`, `getStaticProps` (with `revalidate`), or just plain old client-side fetching?
  - Was there any client-side state management (like Zustand or Redux, which I often use) holding onto stale data?
  - Any pesky local storage or session storage caches at play?
- Backend (Node.js/Express):
  - When a purchase was made, was the stock update truly atomic?
  - Could any concurrent requests possibly lead to a read *before* the write was fully committed?
  - Was there an internal caching layer on the API server? (Thankfully, there wasn't, which ruled out one big headache!)
  - How was Prisma handling transactions for the stock update?
- Deployment Environment (Vercel):
  - Ah, Vercel's caching mechanisms for serverless functions and CDN. Was it possible an older version of a serverless function response was being cached at the edge, *before* the database had fully propagated, or even before a subsequent data fetch?
The Clues Start Piling Up
So, I started adding even more detailed logs. And I mean *detailed*:
- Timestamped logs for every API call related to product fetching and stock updates.
- Logs showing the exact stock value before and after the database write.
- Logs on the Next.js frontend to show when data was fetched and what value was received.
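As a flavor of what that logging looked like, here's a minimal sketch. The `logStock` helper and the label names are illustrative stand-ins, not the actual project code; the key idea is the ISO timestamp, which lets you line up frontend and backend events afterwards.

```javascript
// Minimal sketch of timestamped stock logging (illustrative, not the real code).
function logStock(label, productId, stock) {
  // ISO timestamps make it easy to correlate frontend and backend events later
  const entry = `${new Date().toISOString()} [${label}] product=${productId} stock=${stock}`;
  console.log(entry);
  return entry;
}

// Example: bracket the database write with before/after entries
logStock('before-write', 'prod_42', 10);
logStock('after-write', 'prod_42', 9);
```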
After several more testing sessions, with timestamps illuminating everything, one crucial pattern finally emerged:
The issue almost *always* occurred when a user rapidly made a purchase, then immediately navigated to the product page or cart. It wasn't about multiple users, but rather a single user's rapid-fire actions. Bingo! That was a huge clue.
This strongly hinted at a race condition or, equally likely, a caching problem. On the frontend, my Next.js app was using SWR (stale-while-revalidate) for client-side data fetching, and for product pages, it sometimes leveraged `getStaticProps` with a short `revalidate` period.
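For context, the product pages followed roughly this pattern. This is a hedged sketch: `fetchProduct` and the 10-second window are illustrative stand-ins, not the project's actual code or settings, and in a real Next.js page the function would be exported from the page module.

```javascript
// Illustrative stub for the real data-access call (not the project's code)
const fetchProduct = async (id) => ({ id, stock: 10 });

// Sketch of the getStaticProps-with-revalidate pattern the product pages used
async function getStaticProps({ params }) {
  const product = await fetchProduct(params.id);
  return {
    props: { product },
    // Next.js serves the statically generated page and rebuilds it at most
    // once per `revalidate` seconds. Between rebuilds, visitors can see
    // stale stock even though the database is already correct.
    revalidate: 10,
  };
}
```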
The "Aha!" Moment: The Race Condition Strikes
After what felt like endless hours of staring at logs, meticulously going through my Node.js API code line by line (thank goodness for VS Code's debugger!), and scrutinizing network Waterfall charts in Chrome DevTools, the 'aha!' moment finally hit me.
The core issue was a subtle race condition, cleverly combined with Vercel's serverless function execution model and SWR's caching behavior. What a delightful combination, right?
So, here's the insidious chain of events that was happening:
- A user completes a purchase, and the frontend sends a request to our `/api/checkout` endpoint.
- The `/api/checkout` serverless function on Vercel processes the order. Crucially, it updates the product stock using a Prisma transaction. Now, this transaction is fast, but it's not instantaneous.
- Almost immediately after that checkout request finishes, the frontend navigates to the order confirmation page (which might fetch product details itself), or the user quickly clicks back to a product page.
- This subsequent fetch for updated product data hits the `/api/products/:id` endpoint.

The nasty bit? Sometimes, the `/api/products/:id` serverless function, when invoked *very* quickly after the `/api/checkout` function, would run on a different "cold start" instance. Or it would simply fetch data *before* the database update from the `/api/checkout` transaction was fully propagated or visible to all database connections. It was like two separate processes racing to read the latest data, and occasionally, the "read" just won the race against the "write's" full commitment and propagation. Sneaky, right?
On top of that, SWR on the frontend was, well, aggressively caching. While it *does* revalidate, if the previous fetch returned stale data because of that backend race, SWR would still display that stale data until its revalidation successfully completed. It was a perfect storm.
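To make that caching behavior concrete, here's a tiny stale-while-revalidate cache in plain JavaScript. It's an illustration of the general pattern, not the actual `swr` library: a cached value is returned immediately while a background refetch updates the cache.

```javascript
// Tiny stale-while-revalidate cache (illustration only, not the swr library)
const cache = new Map();

async function swrGet(key, fetcher) {
  const cached = cache.get(key);
  // Always kick off a revalidation in the background
  const revalidation = fetcher().then((fresh) => {
    cache.set(key, fresh);
    return fresh;
  });
  if (cached !== undefined) {
    // The cached (possibly stale) value is shown immediately and stays on
    // screen until revalidation completes, which is exactly the window
    // where a stale backend response could linger in our UI.
    return { value: cached, revalidation };
  }
  // Cache miss: wait for the first fetch
  return { value: await revalidation, revalidation };
}
```

For example, if the backend stock changes between two reads, the second read still serves the old value instantly and only a later read reflects the update.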
The Fix: Ensuring Data Freshness and Consistency
My solution, after much deliberation, involved a few strategic changes. It's all about ensuring data freshness and consistency:
- Atomic Updates with Prisma: First, I double-checked and reinforced the use of Prisma's interactive transactions. This was crucial to ensure the entire stock update process (deduction, saving order, etc.) was a single, atomic unit. Most of this was already there, but re-evaluating it really solidified my understanding of its importance in such scenarios.
- Explicit Revalidation (Frontend): Second, and this was a big one for the frontend: explicit revalidation with SWR. After a successful checkout, instead of solely relying on SWR's automatic revalidation or page navigation, I added a specific SWR `mutate` call for the relevant product data keys on the client-side. This explicitly tells SWR to refetch the data and update its cache *immediately* after a critical action like a purchase.

  ```javascript
  // Inside the checkout success handler
  mutate('/api/products/' + productId, undefined, { revalidate: true });
  // Also, invalidate any cart-related caches if applicable
  mutate('/api/cart', undefined, { revalidate: true });
  ```
- Backend "Read-After-Write" Assurance: Lastly, a "read-after-write" assurance on the backend. For critical reads immediately following a write (like fetching updated stock on an order confirmation page), I considered whether a short delay or a retry mechanism for fetching the data was necessary, or whether the API call itself should be designed to fetch the absolute latest committed state. In *this* particular case, the `mutate` on the frontend was sufficient – it forced the client to re-request the data, significantly reducing the chance of hitting a stale serverless instance or getting a pre-propagation read. Phew!
After implementing these changes and rigorously testing, the intermittent bug *vanished*! The product quantities were consistently accurate post-purchase, even with rapid navigation. You have no idea what a relief that was!
Lessons Learned from the Trenches
This experience, like so many before it in my journey of building NovexiQ, really reinforced several crucial lessons about debugging. I hope these help you too:
- Trust Your Logs (But Always Verify): Logs are absolutely invaluable, but here's the catch: they only tell you what you told them to log. When you're debugging complex issues, don't hesitate to add even more granular logs, especially timestamps. Trust me, they're your best friends.
- Systematic Isolation (Be a Detective): Don't just randomly poke at code, hoping for a miracle. Break down the system: frontend, backend, database, deployment environment. Systematically isolate where the inconsistency truly begins. It's exactly like being a detective!
- Reproducibility is King: If you can't reliably reproduce a bug, you simply can't reliably fix it. Spend the time on this step, even when it's utterly frustrating. It pays off, I promise you.
- Hypothesize and Test: Formulate clear theories about the bug's cause, and then design targeted tests to either prove or disprove them. Treat it like a proper scientific experiment.
- Understand Your Stack Deeply: This particular bug really highlighted the importance of understanding your entire stack deeply. Knowing Vercel's serverless function lifecycle, Prisma's transaction isolation, and SWR's caching strategies was key. The more you truly know about your tools, the better equipped you are to debug them effectively.
- Take Breaks: Staring at the same code for hours will absolutely lead to tunnel vision. Step away, clear your head, and come back with fresh eyes. Sometimes, the solution just "clicks" after a break. Trust me on this one!
- Talk it Out (Rubber Duck Debugging): Explain the problem out loud to an inanimate object (yes, a rubber duck works perfectly!) or, if you're lucky, a patient colleague. The simple act of vocalizing often helps you uncover the logical flaw you've been missing. Give it a shot!
Wrapping Up
Debugging is, without a doubt, a core skill for any developer. And "that one bug"? It'll always find its way into your projects, no matter how careful you are. But instead of dreading it, try to embrace it as a profound learning opportunity. Every single bug you solve makes you a better, more resilient developer. That 'aha!' moment is what makes it all worth it.
As I continue to build NovexiQ and create modern web applications here in Santipur, I know these debugging battles are making my solutions more robust and reliable for every client. They're just part of the journey!
So, what's your go-to strategy for dealing with "that one bug" that just won't quit? I'd love to hear your tips and war stories in the comments below! Happy coding, folks!