The Performance Problem
Our e-commerce platform served 50,000 daily users across Bangladesh. Infrastructure was hosted in the AWS Asia Pacific (Singapore) region. Users reported slow page loads. Checkout abandonment rate: 28%. Customer support tickets about "website lag" increased 300% month over month.
Initial Performance Metrics (October 2024):
- Average page load time: 3.8 seconds
- Time to First Byte (TTFB): 1,200ms
- Database query latency: 450ms
- API response time: 890ms
- Bounce rate: 42%
- Conversion rate: 2.1%
Network analysis revealed root cause: Geographic distance. Singapore to Dhaka: 2,600 km physical distance. Minimum theoretical latency: 13ms one way. Real world measurements: 85-120ms round trip time.
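The 13ms figure follows from signal speed in optical fiber. A minimal sketch, assuming light travels at roughly 200,000 km/s in fiber (about two thirds of its speed in vacuum) and that the cable follows the 2,600 km distance quoted above, which real routes do not:

```python
# Assumed propagation speed of light in optical fiber (~2/3 of c in vacuum).
FIBER_SPEED_KM_PER_S = 200_000


def one_way_latency_ms(distance_km: float) -> float:
    """Lower bound on one-way latency for a fiber path of the given length."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000


singapore_dhaka_km = 2_600
one_way = one_way_latency_ms(singapore_dhaka_km)
print(f"one way: {one_way:.0f} ms, round trip: {2 * one_way:.0f} ms")
```

The measured 85-120ms round trip sits well above this bound; the gap typically comes from indirect cable routes, router hops, and queuing.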
Every HTTP request sequence:
- DNS resolution: 45ms
- TCP handshake (1.5 RTT): 128ms
- TLS negotiation (1 RTT): 95ms
- HTTP request/response: 890ms application processing + 85ms network
- Total: 1,243ms before first byte
Multiply across resources: 45 HTTP requests per page × 100ms average = 4,500ms of additional latency if the requests run one after another. Against that budget, pages loading in 3.8 seconds counted as acceptable, but they were slow, not fast.
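The arithmetic above can be checked with a short script. This is only a sketch: the dictionary keys and function names are illustrative, and the serial figure is a worst case that ignores any parallel fetching by the browser.

```python
# Per-step timings in milliseconds, taken from the request sequence above.
FIRST_BYTE_STEPS_MS = {
    "dns_resolution": 45,
    "tcp_handshake": 128,
    "tls_negotiation": 95,
    "app_processing": 890,
    "network_transfer": 85,
}


def time_to_first_byte_ms(steps: dict[str, int]) -> int:
    """Sum the sequential steps; each must finish before the next starts."""
    return sum(steps.values())


def serial_request_overhead_ms(requests: int, avg_latency_ms: int) -> int:
    """Worst case: every request waits for the previous one to complete."""
    return requests * avg_latency_ms


ttfb = time_to_first_byte_ms(FIRST_BYTE_STEPS_MS)
overhead = serial_request_overhead_ms(45, 100)
print(f"first byte after {ttfb} ms, serial overhead {overhead} ms")
```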
Infrastructure Audit and Measurement
Application Architecture (Before):
User (Dhaka) → 85ms → Singapore AWS
↓
Application Load Balancer (Singapore)
↓
EC2 t3.medium × 4 (Web tier)
↓ 2ms local network
EC2 m5.large × 3 (Application tier)
↓ 1ms local network
RDS PostgreSQL db.m5.xlarge (Database)
↓
ElastiCache Redis cluster (Cache)
Cost Structure (Monthly):
- EC2 web tier: 4 × $30.37 = $121.48
- EC2 app tier: 3 × $69.98 = $209.94
- RDS database: $208.80
- ElastiCache Redis: $45.36
- Application Load Balancer: $23.76 + $12 LCU = $35.76
- NAT Gateway: $32.85 + $45 data processing = $77.85
- Data transfer: 8 TB × $0.09/GB = $737.28
- EBS storage: 500 GB × $0.10 = $50
- Total: $1,441.11 monthly
Latency Breakdown by Component:
Measured using New Relic APM and custom instrumentation:
DNS Resolution (Route 53):
- Dhaka → Singapore authoritative NS: 42-58ms
- Cache miss penalty: 45ms average
TCP Connection Establishment:
- SYN → SYN-ACK → ACK: 1.5 × 85ms RTT = 127.5ms
- Connection pool reuse: 0ms (when available)
- Connection pool exhaustion: Forces new connections during traffic spikes
TLS Handshake:
- TLS 1.2 (2-RTT): 170ms
- TLS 1.3 (1-RTT): 85ms
- Session resumption: 0ms (when ticket valid)
- Session resumption rate: 62% (38% full handshake)
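The resumption rate determines the average handshake cost per connection. A rough sketch, taking the figures above at face value (resumed sessions cost 0ms, full handshakes cost the listed amount); the function name is illustrative:

```python
def expected_handshake_ms(full_handshake_ms: float, resumption_rate: float) -> float:
    """Average TLS cost when resumed sessions cost 0 ms, as in the list above."""
    return (1 - resumption_rate) * full_handshake_ms


RESUMPTION_RATE = 0.62  # 38% of connections pay for a full handshake
tls13 = expected_handshake_ms(85, RESUMPTION_RATE)
tls12 = expected_handshake_ms(170, RESUMPTION_RATE)
print(f"TLS 1.3: {tls13:.1f} ms, TLS 1.2: {tls12:.1f} ms")
```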
Application Response Time:
- Simple page (homepage): 245ms processing + 85ms network = 330ms
- Product listing (database query): 680ms + 85ms = 765ms
- Checkout page (multiple API calls): 1,450ms + 85ms = 1,535ms
Database Query Latency:
- SELECT queries: 180-350ms (includes 85ms network RTT)
- Same query on local network: 12-28ms actual execution
- Network penalty: 168-322ms wasted on geographic distance
Regional Migration Strategy
Target Infrastructure Decision:
Evaluated options:
- AWS Singapore → AWS Mumbai: 40ms latency improvement, still roughly 1,900 km from Dhaka
- AWS Mumbai → Dedicated servers Bangladesh: 80ms latency improvement, operational complexity
- Regional cloud provider (Tenbyte): 85ms+ latency improvement, managed services included
Selected Tenbyte Cloud for:
- Dhaka data center: <15km from 65% of user base
- Transparent pricing: No hidden NAT gateway, data transfer fees
- Similar managed services: Load balancing, VPC networking included
- Local ISP peering: BDIX connectivity with Grameenphone, Robi, Banglalink
New Architecture (Tenbyte Cloud - https://www.tenbyte.io/cloud-vm):
User (Dhaka) → 15ms → Dhaka Data Center
↓
Load Balancer (included)
↓
VM 4 vCPU, 8 GB × 4 (Web tier)
↓ 0.2ms local network
VM 8 vCPU, 16 GB × 3 (Application tier)
↓ 0.2ms local network
VM 8 vCPU, 32 GB × 1 (PostgreSQL Primary)
VM 8 vCPU, 32 GB × 1 (PostgreSQL Replica)
↓
VM 4 vCPU, 16 GB × 1 (Redis)
New Cost Structure (Monthly):
- Web tier: 4 × $58.40 = $233.60
- App tier: 3 × $87.60 = $262.80
- Database primary: $160.60
- Database replica: $160.60
- Redis: $87.60
- Load balancer: Included ($0)
- NAT gateway: Included ($0)
- Data transfer: 8 TB included ($0)
- Storage: 500 GB × $0.08 = $40
- Total: $945.20 monthly
Cost savings: $1,441.11 - $945.20 = $495.91 monthly (34% reduction)
Annual savings: $5,950.92
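The savings figures can be reproduced from the line items. A minimal sketch using Python's decimal module to avoid floating-point rounding on currency; the dictionary keys are illustrative, and the AWS figure is the stated monthly total rather than a recomputed sum:

```python
from decimal import Decimal

# Monthly line items from the new cost structure (USD).
NEW_MONTHLY_COSTS = {
    "web_tier": 4 * Decimal("58.40"),
    "app_tier": 3 * Decimal("87.60"),
    "db_primary": Decimal("160.60"),
    "db_replica": Decimal("160.60"),
    "redis": Decimal("87.60"),
    "storage": 500 * Decimal("0.08"),
}
AWS_MONTHLY_TOTAL = Decimal("1441.11")  # stated total of the previous setup

new_total = sum(NEW_MONTHLY_COSTS.values(), Decimal("0"))
monthly_savings = AWS_MONTHLY_TOTAL - new_total
savings_pct = monthly_savings / AWS_MONTHLY_TOTAL * 100
annual_savings = monthly_savings * 12

print(f"new total ${new_total}, savings ${monthly_savings} ({savings_pct:.0f}%)")
print(f"annual savings ${annual_savings}")
```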
Migration Execution Timeline
Week 1 - Infrastructure Provisioning:
Day 1-2: Account setup, VPC design
- Created VPC: 10.0.0.0/16 CIDR block
- Subnets: Public (10.0.1.0/24), App (10.0.10.0/24), Database (10.0.20.0/24)
- Security groups: Web (80, 443 inbound), App (8080 from web), DB (5432 from app)
Day 3-4: VM deployment via Terraform
resource "tenbyte_vm" "web" {
  count              = 4
  name               = "web-${count.index + 1}"
  plan               = "medium" # 4 vCPU, 8 GB RAM
  image              = "ubuntu-22.04-lts"
  vpc_id             = tenbyte_vpc.main.id
  subnet_id          = tenbyte_subnet.public.id
  security_group_ids = [tenbyte_security_group.web.id]
}

resource "tenbyte_vm" "database" {
  name      = "db-primary"
  plan      = "xlarge" # 8 vCPU, 32 GB RAM
  image     = "ubuntu-22.04-lts"
  vpc_id    = tenbyte_vpc.main.id
  subnet_id = tenbyte_subnet.database.id

  volume {
    size = 200 # GB SSD
    type = "ssd"
  }
}
Day 5: Database setup
- PostgreSQL 14 installation
- Replication configuration from Singapore RDS
- pg_dump initial data transfer: 45 GB database = 6 hours transfer time
- Streaming replication established: <2 second lag
Week 2 - Application Deployment:
Day 1-2: Application server configuration
- Node.js 18 runtime environment
- PM2 process manager for application clustering
- Environment variables: Database connection strings, Redis endpoints, API keys
- Health check endpoint: GET /health returns 200 OK
Day 3-4: Testing and validation
- Load testing: JMeter 1,000 concurrent users
- Database connection pooling: 20 connections per app server
- Redis cache warming: Pre-populate product catalog (2.3 GB data)
- Application response time: 85% requests <200ms
Day 5: DNS preparation
- Reduced TTL on www.example.com from 3600s to 300s (enables quick rollback)
- Created CNAME record: www-new.example.com → Tenbyte load balancer
- Smoke testing via new hostname
Week 3 - Gradual Migration:
Monday: 10% traffic shift
- Updated DNS: 10% weight to Tenbyte, 90% to AWS
- Monitoring: New Relic, CloudWatch, Tenbyte dashboard
- Error rate: Stable at 0.2%
- Response time: 10% traffic seeing 1,100ms average (down from 1,850ms)
Wednesday: 25% traffic
- Increased DNS weight to 25/75 split
- Response time: 25% cohort averaging 950ms
- Database replication lag: <1 second
- No customer complaints
Friday: 50% traffic
- DNS weight: 50/50 split
- Cost monitoring: AWS data transfer fees decreasing proportionally
- Performance: 50% users experiencing 880ms average page load
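Weighted DNS routing amounts to a weighted random choice per lookup. The sketch below simulates Monday's 10/90 split; the backend names and the fixed seed are assumptions for illustration, and real traffic only approximates the configured weights because resolvers cache answers for the TTL.

```python
import random


def pick_backend(weights: dict[str, int], rng: random.Random) -> str:
    """Choose a backend with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
monday = {"tenbyte": 10, "aws": 90}
sample = [pick_backend(monday, rng) for _ in range(10_000)]
share = sample.count("tenbyte") / len(sample)
print(f"tenbyte share: {share:.1%}")
```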
Week 4 - Full Cutover:
Monday: 75% traffic to Tenbyte
- DNS weight: 75/25
- AWS traffic decreasing, RDS connections dropping
- Application logs: No errors related to migration
Wednesday: 100% traffic migration
- DNS weight: 100/0, full cutover to Tenbyte infrastructure
- AWS infrastructure: Left running 24 hours for rollback capability
- Monitoring: All metrics green
Friday: AWS decommission
- Database final export for archival backup
- EC2 instances terminated
- RDS database deleted (final snapshot retained)
- EBS volumes deleted
- Migration complete
Performance Results
Latency Measurements (After Migration):
DNS Resolution:
- Before: 45ms (Singapore authoritative nameservers)
- After: 8ms (local DNS resolvers, Tenbyte nameservers in region)
- Improvement: 82%
TCP Connection:
- Before: 127.5ms (1.5 RTT × 85ms)
- After: 22ms (1.5 RTT × 14.7ms local latency)
- Improvement: 83%
TLS Handshake:
- Before: 85-170ms (TLS 1.3/1.2)
- After: 15-30ms
- Improvement: 82%
Application Response:
- Before: 890ms average
- After: 245ms average (includes database query, cache lookup, rendering)
- Improvement: 72%
Database Query:
- Before: 450ms (includes 85ms network RTT)
- After: 18ms (local network 0.2ms, actual query execution time)
- Improvement: 96%
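Each improvement percentage is the relative reduction from the before value. A small helper, with illustrative names, that recomputes several of the figures above:

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


# (before, after) in milliseconds, copied from the measurements above.
measurements_ms = {
    "dns": (45, 8),
    "tcp": (127.5, 22),
    "app_response": (890, 245),
    "db_query": (450, 18),
}
for name, (before, after) in measurements_ms.items():
    print(f"{name}: {improvement_pct(before, after):.0f}%")
```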
Page Load Time Comparison:
Homepage:
- Before: 2.1 seconds
- After: 0.8 seconds
- Improvement: 62%
Product Listing:
- Before: 3.8 seconds
- After: 1.4 seconds
- Improvement: 63%
Checkout Page:
- Before: 5.2 seconds
- After: 2.1 seconds
- Improvement: 60%
Overall Platform Metrics (30 days post migration):
Average Page Load Time:
- Before: 3.8 seconds
- After: 1.6 seconds
- Improvement: 58%
Time to First Byte:
- Before: 1,200ms
- After: 185ms
- Improvement: 85%
API Response Time (p95):
- Before: 1,450ms
- After: 380ms
- Improvement: 74%
Bounce Rate:
- Before: 42%
- After: 28%
- Improvement: 33% reduction
Conversion Rate:
- Before: 2.1%
- After: 3.4%
- Improvement: 62% increase
Customer Complaints:
- Before: 180 tickets/month about slowness
- After: 12 tickets/month
- Improvement: 93% reduction
Business Impact Analysis
Revenue Impact:
Conversion rate improvement: 2.1% → 3.4% (+1.3 percentage points)
Monthly transactions:
- Before: 50,000 visitors × 2.1% = 1,050 transactions
- After: 50,000 visitors × 3.4% = 1,700 transactions
- Additional: 650 transactions monthly
Average order value: BDT 2,500 (approximately $23 USD)
Additional monthly revenue: 650 × BDT 2,500 = BDT 1,625,000 ($15,000 USD)
Annual revenue increase: BDT 19,500,000 ($180,000 USD)
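The revenue figures can be reproduced as follows. This is a sketch using the article's inputs; the USD amount uses the approximate $23 per order, so it lands slightly below the rounded $15,000.

```python
from decimal import Decimal

MONTHLY_VISITORS = 50_000
AVG_ORDER_BDT = 2_500
USD_PER_ORDER = 23  # approximate conversion quoted above


def monthly_transactions(visitors: int, conversion_rate: Decimal) -> int:
    """Visitors multiplied by the conversion rate, as a whole number."""
    return int(visitors * conversion_rate)


before = monthly_transactions(MONTHLY_VISITORS, Decimal("0.021"))
after = monthly_transactions(MONTHLY_VISITORS, Decimal("0.034"))
additional = after - before
extra_revenue_bdt = additional * AVG_ORDER_BDT
extra_revenue_usd = additional * USD_PER_ORDER
print(f"{additional} extra orders: BDT {extra_revenue_bdt:,} (~${extra_revenue_usd:,})")
```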
Cost-Benefit Analysis:
Infrastructure cost reduction: $495.91 monthly savings
Revenue increase: $15,000 monthly additional revenue
Total monthly benefit: $15,495.91
Migration costs:
- Engineer time: 160 hours × $50/hour = $8,000
- Testing and validation: $2,000
- Overlap period (running both): $1,441.11 × 0.5 months = $720.56
- Total migration cost: $10,720.56
ROI calculation:
- Payback period: $10,720.56 / $15,495.91 = 0.69 months
- First year benefit: ($15,495.91 × 12) - $10,720.56 = $175,230.36
- ROI: approximately 1,635% first year return
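The payback and return calculation can be expressed in a few lines. A minimal sketch with illustrative variable names, using decimal arithmetic and rounding the overlap cost to the nearest cent:

```python
from decimal import Decimal, ROUND_HALF_UP

MONTHLY_BENEFIT = Decimal("495.91") + Decimal("15000")  # savings + revenue
OVERLAP_COST = (Decimal("1441.11") * Decimal("0.5")).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)  # half a month of running both environments
MIGRATION_COST = (
    160 * Decimal("50")  # engineer time
    + Decimal("2000")    # testing and validation
    + OVERLAP_COST
)

payback_months = MIGRATION_COST / MONTHLY_BENEFIT
first_year_benefit = MONTHLY_BENEFIT * 12 - MIGRATION_COST
roi_pct = first_year_benefit / MIGRATION_COST * 100

print(f"migration cost ${MIGRATION_COST}, payback {payback_months:.2f} months")
print(f"first year benefit ${first_year_benefit}, ROI {roi_pct:.0f}%")
```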
Operational Improvements:
Deployment speed:
- Before: 45 minutes average (Singapore region, slower network)
- After: 8 minutes average (local network, faster instance provisioning)
- Improvement: 82% faster deployments
Developer productivity:
- Local development mirrors production latency characteristics
- Faster testing cycles (no 85ms penalty on each API call)
- Improved debugging (network timeout issues eliminated)
Customer support:
- 93% reduction in slowness related tickets
- Support team redirected to higher value activities
- Customer satisfaction score: 6.8 → 8.4 (out of 10)
Technical Lessons Learned
DNS TTL Management:
Critical for safe migration. Reduced TTL from 3600s to 300s one week before cutover. Enabled rapid rollback capability if issues emerged. Post-migration, gradually increased back to 1800s for caching efficiency while maintaining rollback window.
Database Replication Strategy:
PostgreSQL streaming replication from Singapore to Dhaka worked reliably. Lag remained <2 seconds throughout migration. Post-cutover, converted Singapore database to read replica for disaster recovery. Cross region replication cost: Minimal compared to benefits.
Application Connection Pooling:
Essential for database performance. Configured PgBouncer with:
- Pool mode: Transaction (releases connection after each transaction)
- Max client connections: 200 per app server
- Database pool size: 20 connections
- Pool timeout: 30 seconds
Without proper pooling: Database connection exhaustion occurred during load testing.
Load Balancer Health Checks:
Configuration:
- Protocol: HTTP
- Path: /health
- Interval: 10 seconds
- Timeout: 5 seconds
- Healthy threshold: 2 consecutive successes
- Unhealthy threshold: 3 consecutive failures
Mistake: Initially configured a 30-second interval. Instance failures took 90 seconds to detect (3 × 30s). Reducing the interval to 10 seconds cut failure detection to 30 seconds (3 × 10s).
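Detection time is the check interval multiplied by the unhealthy threshold. A simplified sketch that ignores the per-check timeout and any delay before the first failed check:

```python
def detection_time_s(interval_s: int, unhealthy_threshold: int) -> int:
    """Seconds of consecutive failed checks before an instance is removed."""
    return interval_s * unhealthy_threshold


initial = detection_time_s(interval_s=30, unhealthy_threshold=3)
tuned = detection_time_s(interval_s=10, unhealthy_threshold=3)
print(f"initial: {initial}s, tuned: {tuned}s")
```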
Monitoring and Alerting:
Implemented comprehensive monitoring:
- Application metrics: Response time, error rate, throughput (requests/sec)
- Infrastructure metrics: CPU utilization, memory usage, disk I/O
- Database metrics: Connection count, query time, replication lag
- Network metrics: Bandwidth usage, packet loss, latency
Critical alerts:
- Response time p95 >500ms: Page alert
- Error rate >1%: Page alert
- Database replication lag >5 seconds: Email alert
- CPU >85% for 5 minutes: Email alert
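The alert thresholds above map naturally onto a small rule table. The sketch below is not tied to any monitoring product; the metric key names (`p95_ms`, `error_rate_pct`, and so on) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AlertRule:
    name: str
    channel: str  # "page" or "email"
    breached: Callable[[dict], bool]


RULES = [
    AlertRule("p95 response time", "page", lambda m: m["p95_ms"] > 500),
    AlertRule("error rate", "page", lambda m: m["error_rate_pct"] > 1),
    AlertRule("replication lag", "email", lambda m: m["replication_lag_s"] > 5),
    AlertRule("sustained CPU", "email",
              lambda m: m["cpu_pct"] > 85 and m["cpu_high_minutes"] >= 5),
]


def evaluate(metrics: dict) -> list[tuple[str, str]]:
    """Return (channel, rule name) for every breached rule."""
    return [(rule.channel, rule.name) for rule in RULES if rule.breached(metrics)]


sample = {"p95_ms": 620, "error_rate_pct": 0.2, "replication_lag_s": 1,
          "cpu_pct": 90, "cpu_high_minutes": 2}
alerts = evaluate(sample)
print(alerts)
```

In this sample only the response-time rule fires: CPU is above 85% but has not stayed there for 5 minutes.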
Cost Optimization Strategies:
Rightsize VMs based on actual utilization:
- Initial: Matched AWS instance types exactly
- After 30 days: Analyzed CPU/memory utilization
- Result: Reduced app tier from 8 vCPU to 4 vCPU on 2 instances (underutilized)
- Additional monthly savings: $58.40
Storage optimization:
- Enabled automated snapshots: Daily at 2 AM, 7-day retention
- Cost: 500 GB volume × 20% change rate × 7 days × $0.04/GB-month = $28 monthly
- Versus: AWS snapshot costs were $0.05/GB = $175 monthly for similar retention
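The $28 snapshot estimate can be written as a function of volume size, change rate, and retention. A sketch that assumes each retained snapshot stores only the blocks changed that day; the function and parameter names are illustrative:

```python
from decimal import Decimal


def snapshot_cost(volume_gb: int, daily_change_rate: Decimal,
                  retention_days: int, price_per_gb_month: Decimal) -> Decimal:
    """Monthly cost when each retained snapshot stores only changed blocks."""
    stored_gb = volume_gb * daily_change_rate * retention_days
    return stored_gb * price_per_gb_month


monthly = snapshot_cost(500, Decimal("0.20"), 7, Decimal("0.04"))
print(f"snapshot storage: ${monthly:.2f} per month")
```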
Geographic Performance Distribution
User Location Analysis (Google Analytics):
Dhaka users (65% of traffic):
- Before: 3.9 seconds average page load
- After: 1.2 seconds average
- Improvement: 69%
Chittagong users (18% of traffic):
- Before: 4.1 seconds average
- After: 1.8 seconds average (network path Chittagong → Dhaka 25ms)
- Improvement: 56%
Sylhet users (8% of traffic):
- Before: 4.3 seconds average
- After: 2.1 seconds average (longer fiber route)
- Improvement: 51%
International users (9% of traffic):
- Before: 2.8 seconds average (already closer to Singapore)
- After: 3.2 seconds average (now farther from Dhaka)
- Degradation: 14% slower
Trade-off analysis: 91% users experienced dramatic improvement. 9% international users experienced minor degradation. Business decision: Optimize for primary market (Bangladesh users).
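The regional figures can be combined into a traffic-weighted average to check the trade-off. A short sketch with illustrative names:

```python
# (traffic share, seconds before, seconds after) per region, from the list above.
REGIONS = {
    "dhaka": (0.65, 3.9, 1.2),
    "chittagong": (0.18, 4.1, 1.8),
    "sylhet": (0.08, 4.3, 2.1),
    "international": (0.09, 2.8, 3.2),
}


def weighted_average(column: int) -> float:
    """Traffic-weighted mean of the given column (1 = before, 2 = after)."""
    return sum(row[0] * row[column] for row in REGIONS.values())


before = weighted_average(1)
after = weighted_average(2)
improved_share = sum(share for share, b, a in REGIONS.values() if a < b)
print(f"before {before:.2f}s, after {after:.2f}s, improved for {improved_share:.0%}")
```

The weighted result of about 1.56 seconds after migration is consistent with the 1.6-second platform average reported in the overall metrics.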
Future consideration: CDN implementation for static assets would serve international users from nearby edge locations while maintaining database in Dhaka for primary market.
Recommendations for Similar Migrations
Identify Geographic User Concentration:
Use analytics to determine user distribution:
- Google Analytics: Audience → Geo → Location
- Cloudflare Analytics: Traffic distribution by country
- Application logs: Parse IP addresses, geolocate via MaxMind database
If >70% users concentrated in specific region distant from current infrastructure: Regional migration likely beneficial.
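Once requests are geolocated, checking the concentration threshold takes a few lines. The sketch below assumes region labels have already been resolved (for example with a GeoIP database); the sample data is hypothetical.

```python
from collections import Counter


def regional_concentration(regions: list[str]) -> tuple[str, float]:
    """Return the most common region and its share of all requests."""
    counts = Counter(regions)
    region, hits = counts.most_common(1)[0]
    return region, hits / len(regions)


# Hypothetical sample: one region label per logged request.
sample = ["BD"] * 91 + ["IN"] * 4 + ["SG"] * 3 + ["US"] * 2
region, share = regional_concentration(sample)
migration_candidate = share > 0.70
print(f"{region}: {share:.0%} of requests, candidate: {migration_candidate}")
```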
Measure Current Latency Baseline:
Tools for measurement:
- Pingdom: Multi-location synthetic monitoring
- WebPageTest: Waterfall analysis from target locations
- New Relic: Real User Monitoring from actual user devices
- Custom:
curl -w "@curl-format.txt" -o /dev/null -s https://example.com
Calculate network latency component:
- Total response time - Application processing time = Network overhead
- If network overhead >50% of total time: Geographic distance likely culprit
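The overhead rule is simple to apply to measured timings. A minimal sketch with illustrative function names, using the homepage figures from the latency breakdown as an example:

```python
def network_overhead(total_ms: float, processing_ms: float) -> tuple[float, float]:
    """Return (overhead in ms, overhead as a fraction of total time)."""
    overhead = total_ms - processing_ms
    return overhead, overhead / total_ms


def distance_likely_culprit(total_ms: float, processing_ms: float) -> bool:
    """Apply the 50% rule of thumb from the list above."""
    return network_overhead(total_ms, processing_ms)[1] > 0.5


overhead_ms, fraction = network_overhead(total_ms=330, processing_ms=245)
print(f"overhead {overhead_ms} ms ({fraction:.0%} of total)")
```

By this rule the homepage (85ms of 330ms) is dominated by processing, while a 350ms query that executes in 28ms locally is dominated by the network.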
Evaluate Provider Options:
Criteria for regional cloud selection:
- Data center location: <100km from user concentration = <10ms latency
- ISP peering: Direct connections with major local carriers
- Pricing transparency: All inclusive versus hidden fee model
- Compliance: Data residency requirements for regulated industries
- Support: Local timezone coverage, language, technical expertise
Tenbyte advantages for Bangladesh/Malaysia:
- Data centers in Dhaka, Chittagong, Kuala Lumpur, Cyberjaya
- BDIX, MyIX peering with Grameenphone, Robi, Banglalink, Time, Maxis
- Transparent pricing: https://www.tenbyte.io/
- 24/7 support in regional timezones
Plan Gradual Migration:
Never attempt a big-bang cutover for production systems. Use a phased approach:
- Provision parallel infrastructure (Week 1-2)
- Establish data replication (Week 2-3)
- Deploy application, validate functionality (Week 3)
- Gradual DNS traffic shift: 10% → 25% → 50% → 75% → 100% (Week 4)
- Monitor intensively during each phase
- Maintain rollback capability until 100% stable
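The phased shift can be driven by a simple gate that advances only while metrics stay healthy. In this sketch the thresholds reuse the alert limits from the monitoring section, and rolling back to 0% on any breach is an assumed policy, not one the steps above prescribe.

```python
ROLLOUT_STEPS = [10, 25, 50, 75, 100]  # percent of traffic on the new site


def next_weight(current: int, error_rate_pct: float, p95_ms: float,
                max_error_pct: float = 1.0, max_p95_ms: float = 500.0) -> int:
    """Advance one step when healthy; otherwise roll back to 0%."""
    if error_rate_pct > max_error_pct or p95_ms > max_p95_ms:
        return 0
    later = [step for step in ROLLOUT_STEPS if step > current]
    return later[0] if later else current


print(next_weight(10, error_rate_pct=0.2, p95_ms=380))
print(next_weight(50, error_rate_pct=2.5, p95_ms=380))
```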
Optimize Post Migration:
The migration is a one-time project. Optimization never stops:
- Monitor resource utilization weekly
- Rightsize VMs based on actual usage
- Implement caching aggressively (Redis, CDN)
- Database query optimization (indexes, query plans)
- Enable compression (gzip, Brotli for text content)
- Image optimization (WebP, responsive sizing, lazy loading)
Each 10% performance improvement compounds business impact.