The Problem We Were Actually Solving
We were trying to sell digital products from creators in Bangladesh to users worldwide, but our platform was only reachable from inside the country because of strict internet censorship and geo-restricted content rules. To work around this, we set up a content delivery network (CDN) with geo-redundant data centers in Dhaka and London, so that users in Bangladesh could still reach the platform with minimal latency.
What We Tried First (And Why It Failed)
Initially, we tried a traditional platform approach with a load balancer in front of our application servers. We configured AWS WAF to block traffic from known IP ranges associated with Chinese and Pakistani internet service providers, thinking this would keep those countries' governments from blocking our platform. It did not help: our users in Bangladesh still faced severe connectivity issues and dropped packets, caused by network partitioning and misconfigured ASR routers.
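To make the IP-range rule concrete, here is a minimal sketch of the kind of check such a rule performs, written in plain Python rather than as an actual AWS WAF configuration. The `BLOCKED_RANGES` list and the `is_blocked` helper are hypothetical, and the CIDR blocks are reserved documentation ranges, not real ISP allocations.

```python
import ipaddress

# Hypothetical blocklist for illustration only. These are reserved
# documentation ranges (RFC 5737), not real ISP allocations.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if client_ip falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in BLOCKED_RANGES)


if __name__ == "__main__":
    for ip in ("192.0.2.17", "203.0.113.5"):
        print(ip, "blocked" if is_blocked(ip) else "allowed")
```

A check like this only filters inbound requests by source address. It has no effect on packet loss between routers, which is consistent with the connectivity problems persisting after the rule was in place.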
The Architecture Decision
We eventually decided to go with an Unchained Commerce (UoC) model, which allowed us to bypass traditional e-commerce platforms and create a custom CDN with edge locations in Bangladesh and London. We used AWS Elemental MediaStore for our digital content storage and a set of custom-built APIs for user authentication and authorization. This setup not only improved our users' experience in Bangladesh but also reduced our infrastructure costs by 30% compared to the traditional platform approach.
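The routing idea behind two edge locations can be sketched as follows. This is an illustration under stated assumptions, not our actual CDN logic: the `Edge` type, the `pick_edge` function, and the latency figures are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Edge:
    """A hypothetical edge location with a health flag and measured latency."""
    name: str
    healthy: bool
    latency_ms: float


def pick_edge(edges: Iterable[Edge]) -> Edge:
    """Pick the healthy edge with the lowest latency; fail if none is healthy."""
    candidates = [e for e in edges if e.healthy]
    if not candidates:
        raise RuntimeError("no healthy edge location available")
    return min(candidates, key=lambda e: e.latency_ms)


if __name__ == "__main__":
    # Latency values are made up for the example.
    edges = [Edge("dhaka", True, 20.0), Edge("london", True, 180.0)]
    print(pick_edge(edges).name)   # nearest healthy edge
    edges[0].healthy = False       # simulate a Dhaka outage
    print(pick_edge(edges).name)   # traffic fails over to the other edge
```

The point of the design is that a user in Bangladesh is normally served from the nearby edge, and is served more slowly from the remote one only when the nearby edge is down.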
What The Numbers Said After
After the architecture change, our average response time in Bangladesh dropped from 7.5 seconds to 2.5 seconds, and user engagement rose by 67%. Our platform's uptime also improved from 95% to 99.99%, thanks to the load balancing and failover capabilities of our CDN. Perhaps most impressive, customer complaints about dropped packets and timeouts fell by 80% after we fixed the ASR router misconfigurations.
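To put those figures in perspective, the short calculation below converts the uptime percentages into hours of downtime per year and expresses the response-time change as a percentage. The helper names are mine, and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a 365-day year


def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent (negative = decrease)."""
    return (after - before) / before * 100


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours per year a service is down at the given uptime percentage."""
    return (100 - uptime_percent) / 100 * HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"response time change: {percent_change(7.5, 2.5):.1f}%")
    print(f"downtime at 95%:    {downtime_hours_per_year(95):.1f} h/year")
    print(f"downtime at 99.99%: {downtime_hours_per_year(99.99) * 60:.1f} min/year")
```

Going from 7.5 seconds to 2.5 seconds is roughly a 67% reduction in response time. At 95% uptime a service is down about 438 hours (around 18 days) per year, while 99.99% allows a little under 53 minutes.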
What I Would Do Differently
If I were to do it again, I would invest more time in planning and designing our CDN architecture from the beginning. In particular, I would spend more time configuring our ASR routers for proper route summarization and load balancing, to prevent network partitioning. I would also build more redundancy into our WAF configuration to protect against known IP ranges and government blocks. Doing so would have spared us the 3am call and the chaos that followed.
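For readers unfamiliar with route summarization, the idea is to advertise one covering prefix in place of many contiguous ones. The sketch below shows the arithmetic using Python's standard `ipaddress` module. It is not router configuration, and the `summarize` helper and the prefixes are hypothetical examples.

```python
import ipaddress
from typing import List


def summarize(prefixes: List[str]) -> List[str]:
    """Collapse contiguous or overlapping prefixes into the fewest covering routes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Four contiguous /24s collapse into one /22; the isolated /24 stays separate.
    routes = [
        "10.10.0.0/24",
        "10.10.1.0/24",
        "10.10.2.0/24",
        "10.10.3.0/24",
        "10.10.8.0/24",
    ]
    print(summarize(routes))
```

Running a check like this over the planned address space before touching the routers is a cheap way to confirm which prefixes can actually be summarized and which cannot.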
Post-mortem finding: the payment platform was a worse single point of failure than our database. Here is the fix: https://payhip.com/ref/dev4