DEV Community

darkbranchcore

How We Eliminated Duplicate Orders at Scale: A Python Post-Mortem from EyecareWell

Introduction — A Production Incident

EyecareWell operates a Shopify-backed ecommerce platform alongside a Windows blue-light filter application. As daily order volume increased, we hit a production incident: Shopify webhooks were being executed more than once, creating duplicate orders and causing inventory to drift between systems.

This article documents the incident, root cause, and the engineering fixes applied using Python.

Incident Summary

  • Duplicate order creation
  • Inventory mismatch across systems
  • Repeated webhook retries from Shopify

Root Cause Analysis
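The webhook endpoint was synchronous and non-idempotent: it completed all order processing before responding, and it had no way to recognize an event it had already seen. Shopify retries a delivery when the response exceeds its timeout threshold, so a slow request could still be in flight when the retry for the same event arrived. The two executions then raced each other, and both wrote to the database.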
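
The failure mode can be reproduced in miniature. The sketch below is illustrative only: the sender, the `orders` list, and the timing values are all assumptions standing in for Shopify, the database, and the real timeout. It shows a handler that takes longer than the sender's timeout, so the same event gets written twice.

```python
import threading
import time

orders = []  # hypothetical stand-in for the orders table
orders_lock = threading.Lock()


def handle_webhook(event_id: str, processing_seconds: float) -> None:
    """Non-idempotent handler: every delivery writes a row."""
    time.sleep(processing_seconds)  # slow synchronous work before responding
    with orders_lock:
        orders.append(event_id)


def deliver_with_retry(event_id: str, timeout_seconds: float,
                       processing_seconds: float) -> None:
    """Hypothetical sender that redelivers when no response arrives in time."""
    first = threading.Thread(target=handle_webhook,
                             args=(event_id, processing_seconds))
    first.start()
    first.join(timeout=timeout_seconds)
    if first.is_alive():
        # No response within the timeout: the sender retries the same event
        # while the first execution is still running.
        retry = threading.Thread(target=handle_webhook,
                                 args=(event_id, processing_seconds))
        retry.start()
        retry.join()
    first.join()


deliver_with_retry("evt-1", timeout_seconds=0.05, processing_seconds=0.2)
print(orders)  # the same event id appears twice
```

Nothing in the handler is wrong in isolation. The duplicate comes from the combination of a slow response and the absence of any check on the event ID.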

Architecture Before

  • Flask-based webhook endpoint
  • Direct DB writes
  • No background processing
  • No deduplication

Architecture After

  • FastAPI async endpoint
  • Redis-backed idempotency
  • Celery background workers
  • Immediate HTTP acknowledgment
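
The acknowledge-then-process shape of the new architecture can be sketched with the standard library alone. This is not the production code: an `asyncio.Queue` stands in for the Celery broker, `worker` stands in for a Celery worker, and `receive_webhook` stands in for the body of the FastAPI endpoint. All three names are hypothetical.

```python
import asyncio


async def worker(queue: asyncio.Queue, processed: list) -> None:
    """Background consumer; plays the role of a Celery worker in this sketch."""
    while True:
        event = await queue.get()
        await asyncio.sleep(0.05)  # stand-in for slow order processing
        processed.append(event["id"])
        queue.task_done()


async def receive_webhook(queue: asyncio.Queue, event: dict) -> dict:
    """Endpoint body: enqueue the event and acknowledge without waiting."""
    queue.put_nowait(event)
    return {"status": "accepted"}


async def demo() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    task = asyncio.create_task(worker(queue, processed))

    ack = await receive_webhook(queue, {"id": "evt-1"})
    processed_at_ack = list(processed)  # still empty: the work has not run yet

    await queue.join()  # wait for the background work to finish
    task.cancel()
    return ack, processed_at_ack, processed


ack, processed_at_ack, processed = asyncio.run(demo())
```

The response no longer depends on how long processing takes, which is what keeps the sender from timing out and retrying. One caveat about the stand-in: an in-process queue loses pending work if the process dies, which is one reason to use a broker-backed task queue in a real deployment.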

Key Python Implementation

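# SET with nx=True is atomic: only the first delivery of an event claims the key.
# A separate exists() check followed by set() leaves a window in which two
# concurrent retries can both pass the check.
if not redis.set(event_id, "processing", nx=True, ex=3600):
    return {"status": "duplicate"}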
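
To see why the claim on the event ID needs to be a single atomic step, here is a minimal sketch that runs without a Redis server. `InMemoryStore` is a hypothetical stand-in that mimics only the `nx` and `ex` behavior of `SET`, and the `webhook:` key prefix and `claim_event` helper are assumptions for illustration, not part of the original implementation.

```python
import threading
import time


class InMemoryStore:
    """Tiny stand-in for the subset of Redis used here: SET with NX and EX."""

    def __init__(self) -> None:
        self._data = {}
        self._lock = threading.Lock()

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        with self._lock:  # check and write happen as one step
            entry = self._data.get(key)
            live = entry is not None and (entry[1] is None or entry[1] > now)
            if nx and live:
                return None  # key already claimed and not yet expired
            expires_at = None if ex is None else now + ex
            self._data[key] = (value, expires_at)
            return True


def claim_event(store, event_id: str, ttl_seconds: int = 3600) -> bool:
    """Return True only for the first caller that claims this event id."""
    key = f"webhook:{event_id}"  # hypothetical key naming scheme
    return bool(store.set(key, "processing", nx=True, ex=ttl_seconds))


def race_for_claim(store, event_id: str, attempts: int = 8) -> list:
    """Simulate several concurrent deliveries of the same event."""
    results = []

    def attempt() -> None:
        results.append(claim_event(store, event_id))

    threads = [threading.Thread(target=attempt) for _ in range(attempts)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = race_for_claim(InMemoryStore(), "evt-1")
```

However the threads interleave, exactly one attempt wins the claim and the rest are treated as duplicates. The expiry bounds how long a key lives, so the deduplication window in this sketch is one hour, matching the `ex=3600` used above.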

Observability Improvements

  • Structured logging
  • Retry metrics
  • Error alerting
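
As one possible shape for the first two items, the sketch below emits one JSON log line per delivery and keeps a count of outcomes, so that a rising share of duplicates is visible as a retry metric. The field names, the `JsonFormatter` class, and the `record_delivery` helper are assumptions for illustration. A real deployment would more likely export the counter to a metrics system than keep it in process.

```python
import json
import logging
from collections import Counter


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Fields passed through `extra=` become attributes on the record.
            "event_id": getattr(record, "event_id", None),
            "outcome": getattr(record, "outcome", None),
        }
        return json.dumps(payload)


# In-process tally of delivery outcomes; duplicates approximate retries.
webhook_outcomes = Counter()


def record_delivery(logger: logging.Logger, event_id: str,
                    duplicate: bool) -> str:
    """Log one webhook delivery and update the outcome counter."""
    outcome = "duplicate" if duplicate else "accepted"
    webhook_outcomes[outcome] += 1
    logger.info("webhook received",
                extra={"event_id": event_id, "outcome": outcome})
    return outcome


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("webhooks")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

record_delivery(logger, "evt-1", duplicate=False)
record_delivery(logger, "evt-1", duplicate=True)
```

Because every line carries the event ID, all deliveries of one event can be pulled up with a single search when investigating an incident.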

Results

  • 0 duplicate orders post-fix
  • 70% reduction in webhook latency
  • Stable inventory reconciliation

Lessons Learned

  • Always assume webhooks will retry
  • Idempotency is non-optional
  • Async processing improves reliability

Why This Matters
This architecture allows EyecareWell to scale safely while maintaining trust for health-focused products.

Conclusion
This case study demonstrates how disciplined Python backend design solves real production issues in ecommerce systems.

Author
Richard Zhong
support@eyecarewell.com
telegram 13854573452

Top comments (11)

Art light

This is a great post—clear, practical, and grounded in a real production incident, which makes the lessons very easy to trust. I really like the move to async processing with Redis-backed idempotency; it feels like the right long-term solution rather than a quick patch, especially at scale with Shopify webhooks. I’d be very interested to see how this setup behaves under even higher load or with additional event types, as this pattern seems broadly reusable.

darkbranchcore

Thank you for the thoughtful feedback—I really appreciate you taking the time to dig into both the incident and the solution. Your point about reusability is encouraging, and exploring this pattern under higher load and more event types is definitely an exciting next step.

refinedlogic

Really solid post—this is a clear, honest breakdown of a real production issue and the fix feels both pragmatic and scalable. The shift to idempotency plus async processing is exactly what I’d expect for Shopify webhooks at this scale, and the measurable results make the solution feel very credible. I’d love to see a follow-up diving deeper into the Redis key strategy or how you handle edge cases during partial failures.

darkbranchcore

Thank you so much—really appreciate the thoughtful feedback and encouragement. I’m glad the approach resonated, and a deeper dive into the Redis key strategy and failure edge cases sounds like a great idea for a follow-up.

henrikdev12

This is an excellent example of treating webhook delivery as an unreliable system. The idempotency approach with Redis is clean and production-ready.

darkbranchcore

Absolutely! I really appreciate how you tackled this—your Redis-based idempotency solution is both elegant and practical for real-world production. Truly inspiring work!

Ronny Kojima

Solid FastAPI + Celery architecture. Simple, understandable, and scalable — no unnecessary abstractions.

AI developer

Really appreciate the clear separation between acknowledgment and background processing. This is exactly how webhook consumers should be designed.

darkbranchcore

Thanks for the great insight—your approach to separating acknowledgment from background processing is spot on and shows strong engineering judgment. It’s a clean, scalable pattern and a solid example others can confidently follow.

Agent-Dev-Well

The post-mortem style makes this incredibly practical. Not enough articles show real failures and fixes like this.

darkbranchcore

Totally agree — this kind of honest post-mortem is incredibly valuable. Sharing real failures and concrete fixes like this helps the whole community learn and improve.