Tenbyte Cloud

Posted on Mar 17

How I Reduced Latency by 40% Using Regional Cloud Setup

#cloudcomputing #ecommerce #webdev #cloudinfrustructure

The Performance Problem

Our e-commerce platform served 50,000 daily users across Bangladesh from infrastructure hosted in the AWS Asia Pacific (Singapore) region. Users reported slow page loads, the checkout abandonment rate was 28%, and customer support tickets about "website lag" increased 300% month over month.

Initial Performance Metrics (October 2024):

Average page load time: 3.8 seconds
Time to First Byte (TTFB): 1,200ms
Database query latency: 450ms
API response time: 890ms
Bounce rate: 42%
Conversion rate: 2.1%

Network analysis revealed the root cause: geographic distance. Singapore to Dhaka is 2,600 km of physical distance, which sets a minimum theoretical latency of about 13ms one way (light in fiber travels at roughly 200,000 km/s). Real-world measurements showed 85-120ms round-trip time.
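The 13ms floor is just distance divided by signal speed. A minimal sketch, assuming light in fiber travels at roughly 200,000 km/s and that the cable follows the straight-line distance (real routes do not); the function name is illustrative:

```python
# Rough lower bound on network latency from distance alone.
# Assumption: signal speed in optical fiber is ~200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000


def one_way_latency_ms(distance_km: float) -> float:
    """Propagation delay in milliseconds over a straight fiber run."""
    return distance_km / FIBER_SPEED_KM_PER_S * 1000


singapore_dhaka_km = 2_600  # distance quoted in the article
one_way = one_way_latency_ms(singapore_dhaka_km)
print(f"one way: {one_way:.0f} ms, round trip: {2 * one_way:.0f} ms")
```

The measured 85-120ms round trip is several times this floor, since real paths are longer than the straight line and add routing and queuing delay.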

Every HTTP request sequence:

DNS resolution: 45ms
TCP handshake (1.5 RTT): 128ms
TLS negotiation (1 RTT): 95ms
HTTP request/response: 890ms application processing + 85ms network
Total: 1,243ms before first byte
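The budget above can be re-added in a few lines, which also shows what share of the time each step takes. This is only a sketch using the numbers as listed; the dictionary keys are illustrative names:

```python
# Per-request time budget before the first byte, in milliseconds.
# Values are the ones listed above.
budget_ms = {
    "dns_resolution": 45,
    "tcp_handshake": 128,
    "tls_negotiation": 95,
    "app_processing": 890,
    "network_transit": 85,
}

total_ms = sum(budget_ms.values())

for step, ms in budget_ms.items():
    print(f"{step:<16} {ms:>5} ms  ({ms / total_ms:.0%})")
print(f"{'total':<16} {total_ms:>5} ms")
```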

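Multiply across resources: 45 HTTP requests per page × 100ms average = 4,500ms of cumulative network latency. Browsers fetch many of these requests in parallel, so the observed load time was lower, but at 3.8 seconds our pages had become acceptably slow, not fast.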

Infrastructure Audit and Measurement

Application Architecture (Before):

User (Dhaka) → 85ms → Singapore AWS
  ↓
Application Load Balancer (Singapore)
  ↓
EC2 t3.medium × 4 (Web tier)
  ↓ 2ms local network
EC2 m5.large × 3 (Application tier)
  ↓ 1ms local network
RDS PostgreSQL db.m5.xlarge (Database)
  ↓
ElastiCache Redis cluster (Cache)

Cost Structure (Monthly):

EC2 web tier: 4 × $30.37 = $121.48
EC2 app tier: 3 × $69.98 = $209.94
RDS database: $208.80
ElastiCache Redis: $45.36
Application Load Balancer: $23.76 + $12 LCU = $35.76
NAT Gateway: $32.85 + $45 data processing = $77.85
Data transfer: 8 TB × $0.09/GB = $737.28
EBS storage: 500 GB × $0.10 = $50
Total: $1,441.11 monthly
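A quick cross-check of the line items, as a sketch using Python's `decimal` module to avoid float rounding. It assumes 1 TB = 1,024 GB, which is what makes the data transfer line come out to $737.28. Summed as listed, the items come to $1,486.47, which is $45.36 (the ElastiCache line) more than the stated total, so it is worth re-deriving totals like this before relying on them:

```python
from decimal import Decimal

# Monthly AWS line items as listed above (USD).
aws_monthly = {
    "ec2_web": Decimal("30.37") * 4,
    "ec2_app": Decimal("69.98") * 3,
    "rds": Decimal("208.80"),
    "elasticache": Decimal("45.36"),
    "alb": Decimal("23.76") + Decimal("12"),
    "nat_gateway": Decimal("32.85") + Decimal("45"),
    "data_transfer": Decimal("0.09") * 8 * 1024,  # assumes 1 TB = 1,024 GB
    "ebs": Decimal("0.10") * 500,
}

line_item_sum = sum(aws_monthly.values(), Decimal("0"))
stated_total = Decimal("1441.11")

print(f"sum of line items: ${line_item_sum}")
print(f"stated total:      ${stated_total}")
print(f"difference:        ${line_item_sum - stated_total}")
```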

Latency Breakdown by Component:

Measured using New Relic APM and custom instrumentation:

DNS Resolution (Route 53):

Dhaka → Singapore authoritative NS: 42-58ms
Cache miss penalty: 45ms average

TCP Connection Establishment:

SYN → SYN-ACK → ACK: 1.5 × 85ms RTT = 127.5ms
Connection pool reuse: 0ms (when available)
Connection pool exhaustion: Forces new connections during traffic spikes

TLS Handshake:

TLS 1.2 (2-RTT): 170ms
TLS 1.3 (1-RTT): 85ms
Session resumption: 0ms (when ticket valid)
Session resumption rate: 62% (38% full handshake)
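With 62% of connections resuming a session, the average handshake cost per new connection is lower than the full-handshake figures. A small sketch of that expected value, assuming (as the list above does) that a resumed session adds 0ms:

```python
# Expected TLS handshake delay per new connection, given the share of
# connections that resume a session (treated as 0 ms, per the list above).
RTT_MS = 85
RESUMPTION_RATE = 0.62


def expected_handshake_ms(full_handshake_rtts: int) -> float:
    """Average handshake delay when resumed sessions cost nothing."""
    full_cost = full_handshake_rtts * RTT_MS
    return (1 - RESUMPTION_RATE) * full_cost


tls12 = expected_handshake_ms(2)  # TLS 1.2 full handshake: 2 RTT
tls13 = expected_handshake_ms(1)  # TLS 1.3 full handshake: 1 RTT
print(f"TLS 1.2: {tls12:.1f} ms, TLS 1.3: {tls13:.1f} ms")
```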

Application Response Time:

Simple page (homepage): 245ms processing + 85ms network = 330ms
Product listing (database query): 680ms + 85ms = 765ms
Checkout page (multiple API calls): 1,450ms + 85ms = 1,535ms

Database Query Latency:

SELECT queries: 180-350ms (includes 85ms network RTT)
Same query on local network: 12-28ms actual execution
Network penalty: 168-322ms wasted on geographic distance

Regional Migration Strategy

Target Infrastructure Decision:

Evaluated options:

Move to AWS Mumbai: 40ms latency improvement, but still roughly 1,900 km from Dhaka
Dedicated servers in Bangladesh: 80ms latency improvement, operational complexity
Regional cloud provider (Tenbyte): 85ms+ latency improvement, managed services included

Selected Tenbyte Cloud for:

Dhaka data center: <15km from 65% of user base
Transparent pricing: No separate NAT gateway or data transfer fees
Similar managed services: Load balancing, VPC networking included
Local ISP peering: BDIX connectivity with Grameenphone, Robi, Banglalink

New Architecture (Tenbyte Cloud - https://www.tenbyte.io/cloud-vm):

User (Dhaka) → 15ms → Dhaka Data Center
  ↓
Load Balancer (included)
  ↓
VM 4 vCPU, 8 GB × 4 (Web tier)
  ↓ 0.2ms local network
VM 8 vCPU, 16 GB × 3 (Application tier)
  ↓ 0.2ms local network
VM 8 vCPU, 32 GB × 1 (PostgreSQL Primary)
VM 8 vCPU, 32 GB × 1 (PostgreSQL Replica)
  ↓
VM 4 vCPU, 16 GB × 1 (Redis)

New Cost Structure (Monthly):

Web tier: 4 × $58.40 = $233.60
App tier: 3 × $87.60 = $262.80
Database primary: $160.60
Database replica: $160.60
Redis: $87.60
Load balancer: Included ($0)
NAT gateway: Included ($0)
Data transfer: 8 TB included ($0)
Storage: 500 GB × $0.08 = $40
Total: $945.20 monthly

Cost savings: $1,441.11 - $945.20 = $495.91 monthly (34% reduction)
Annual savings: $5,950.92

Migration Execution Timeline

Week 1 - Infrastructure Provisioning:

Day 1-2: Account setup, VPC design

Created VPC: 10.0.0.0/16 CIDR block
Subnets: Public (10.0.1.0/24), App (10.0.10.0/24), Database (10.0.20.0/24)
Security groups: Web (80, 443 inbound), App (8080 from web), DB (5432 from app)
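Before provisioning, the address plan can be sanity-checked with Python's standard `ipaddress` module. This is a sketch, not part of the original setup; it only verifies that each subnet sits inside the VPC block and that none overlap:

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = {
    "public": ipaddress.ip_network("10.0.1.0/24"),
    "app": ipaddress.ip_network("10.0.10.0/24"),
    "database": ipaddress.ip_network("10.0.20.0/24"),
}

# Every subnet must sit inside the VPC block.
for name, net in subnets.items():
    assert net.subnet_of(vpc), f"{name} subnet {net} is outside {vpc}"

# No two subnets may overlap.
names = list(subnets)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        assert not subnets[a].overlaps(subnets[b]), f"{a} overlaps {b}"

# Network and broadcast addresses excluded; a provider may reserve more.
usable_hosts = {name: net.num_addresses - 2 for name, net in subnets.items()}
print(usable_hosts)
```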

Day 3-4: VM deployment via Terraform

resource "tenbyte_vm" "web" {
  count              = 4
  name               = "web-${count.index + 1}"
  plan               = "medium" # 4 vCPU, 8 GB RAM
  image              = "ubuntu-22.04-lts"
  vpc_id             = tenbyte_vpc.main.id
  subnet_id          = tenbyte_subnet.public.id
  security_group_ids = [tenbyte_security_group.web.id]
}

resource "tenbyte_vm" "database" {
  name      = "db-primary"
  plan      = "xlarge" # 8 vCPU, 32 GB RAM
  image     = "ubuntu-22.04-lts"
  vpc_id    = tenbyte_vpc.main.id
  subnet_id = tenbyte_subnet.database.id

  volume {
    size = 200 # GB
    type = "ssd"
  }
}

Day 5: Database setup

PostgreSQL 14 installation
Replication configuration from Singapore RDS
pg_dump initial data transfer: 45 GB database = 6 hours transfer time
Streaming replication established: <2 second lag

Week 2 - Application Deployment:

Day 1-2: Application server configuration

Node.js 18 runtime environment
PM2 process manager for application clustering
Environment variables: Database connection strings, Redis endpoints, API keys
Health check endpoint: GET /health returns 200 OK

Day 3-4: Testing and validation

Load testing: JMeter 1,000 concurrent users
Database connection pooling: 20 connections per app server
Redis cache warming: Pre-populate product catalog (2.3 GB data)
Application response time: 85% requests <200ms

Day 5: DNS preparation

Reduced TTL on www.example.com from 3600s to 300s (enables quick rollback)
Created CNAME record: www-new.example.com → Tenbyte load balancer
Smoke testing via new hostname

Week 3 - Gradual Migration:

Monday: 10% traffic shift

Updated DNS: 10% weight to Tenbyte, 90% to AWS
Monitoring: New Relic, CloudWatch, Tenbyte dashboard
Error rate: Stable at 0.2%
Response time: 10% traffic seeing 1,100ms average (down from 1,850ms)

Wednesday: 25% traffic

Increased DNS weight to 25/75 split
Response time: 25% cohort averaging 950ms
Database replication lag: <1 second
No customer complaints

Friday: 50% traffic

DNS weight: 50/50 split
Cost monitoring: AWS data transfer fees decreasing proportionally
Performance: 50% users experiencing 880ms average page load

Week 4 - Full Cutover:

Monday: 75% traffic to Tenbyte

DNS weight: 75/25
AWS traffic decreasing, RDS connections dropping
Application logs: No errors related to migration

Wednesday: 100% traffic migration

DNS weight: 100/0 (full cutover to Tenbyte infrastructure)
AWS infrastructure: Left running 24 hours for rollback capability
Monitoring: All metrics green

Friday: AWS decommission

Database final export for archival backup
EC2 instances terminated
RDS database deleted (final snapshot retained)
EBS volumes deleted
Migration complete

Performance Results

Latency Measurements (After Migration):

DNS Resolution:

Before: 45ms (Singapore authoritative nameservers)
After: 8ms (local DNS resolvers, Tenbyte nameservers in region)
Improvement: 82%

TCP Connection:

Before: 127.5ms (1.5 RTT × 85ms)
After: 22ms (1.5 RTT × 14.7ms local latency)
Improvement: 83%

TLS Handshake:

Before: 85-170ms (TLS 1.3/1.2)
After: 15-30ms
Improvement: 82%

Application Response:

Before: 890ms average
After: 245ms average (includes database query, cache lookup, rendering)
Improvement: 72%

Database Query:

Before: 450ms (includes the 85ms network round trip)
After: 18ms (local network 0.2ms, actual query execution time)
Improvement: 96%
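The improvement percentages are all the same calculation: the reduction divided by the starting value. A short sketch for re-deriving them from the before/after pairs listed above (the helper name is illustrative):

```python
def improvement_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


# (before, after) pairs in milliseconds, taken from the lists above.
components = {
    "dns": (45, 8),
    "tcp": (127.5, 22),
    "app_response": (890, 245),
    "db_query": (450, 18),
}

for name, (before, after) in components.items():
    print(f"{name:<13} {improvement_pct(before, after):.1f}%")
```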

Page Load Time Comparison:

Homepage:

Before: 2.1 seconds
After: 0.8 seconds
Improvement: 62%

Product Listing:

Before: 3.8 seconds
After: 1.4 seconds
Improvement: 63%

Checkout Page:

Before: 5.2 seconds
After: 2.1 seconds
Improvement: 60%

Overall Platform Metrics (30 days post migration):

Average Page Load Time:

Before: 3.8 seconds
After: 1.6 seconds
Improvement: 58%

Time to First Byte:

Before: 1,200ms
After: 185ms
Improvement: 85%

API Response Time (p95):

Before: 1,450ms
After: 380ms
Improvement: 74%

Bounce Rate:

Before: 42%
After: 28%
Improvement: 33% reduction

Conversion Rate:

Before: 2.1%
After: 3.4%
Improvement: 62% increase

Customer Complaints:

Before: 180 tickets/month about slowness
After: 12 tickets/month
Improvement: 93% reduction

Business Impact Analysis

Revenue Impact:

Conversion rate improvement: 2.1% → 3.4% (+1.3 percentage points)

Monthly transactions:

Before: 50,000 visitors × 2.1% = 1,050 transactions
After: 50,000 visitors × 3.4% = 1,700 transactions
Additional: 650 transactions monthly

Average order value: BDT 2,500 (approximately $23 USD)
Additional monthly revenue: 650 × BDT 2,500 = BDT 1,625,000 ($15,000 USD)
Annual revenue increase: BDT 19,500,000 ($180,000 USD)

Cost-Benefit Analysis:

Infrastructure cost reduction: $495.91 monthly savings
Revenue increase: $15,000 monthly additional revenue
Total monthly benefit: $15,495.91

Migration costs:

Engineer time: 160 hours × $50/hour = $8,000
Testing and validation: $2,000
Overlap period (running both): $1,441.11 × 0.5 months = $720.56
Total migration cost: $10,720.56

ROI calculation:

Payback period: $10,720.56 / $15,495.91 = 0.69 months
First year benefit: ($15,495.91 × 12) - $10,720.56 = $175,230.36
ROI: 1,635% first year return
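The same arithmetic as a sketch, recomputed from the inputs stated in this section. It uses `decimal` so the cents round predictably; the helper `usd` and the variable names are illustrative, and the revenue figure is the rounded $15,000 used above:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def usd(amount: Decimal) -> Decimal:
    """Round a dollar amount to whole cents."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


# Monthly benefit: infrastructure savings plus additional revenue.
monthly_benefit = Decimal("495.91") + Decimal("15000")

# One-off migration cost: engineering, testing, half a month of overlap.
overlap = usd(Decimal("1441.11") * Decimal("0.5"))
migration_cost = Decimal(160) * Decimal(50) + Decimal("2000") + overlap

payback_months = migration_cost / monthly_benefit
first_year_net = monthly_benefit * 12 - migration_cost
roi_pct = first_year_net / migration_cost * 100

print(f"migration cost:  ${migration_cost:,.2f}")
print(f"payback period:  {payback_months:.2f} months")
print(f"first-year net:  ${first_year_net:,.2f}")
print(f"first-year ROI:  {roi_pct:,.0f}%")
```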

Operational Improvements:

Deployment speed:

Before: 45 minutes average (Singapore region, slower network)
After: 8 minutes average (local network, faster instance provisioning)
Improvement: 82% faster deployments

Developer productivity:

Local development mirrors production latency characteristics
Faster testing cycles (no 85ms penalty on each API call)
Improved debugging (network timeout issues eliminated)

Customer support:

93% reduction in slowness related tickets
Support team redirected to higher value activities
Customer satisfaction score: 6.8 → 8.4 (out of 10)

Technical Lessons Learned

DNS TTL Management:

Critical for safe migration. Reduced TTL from 3600s to 300s one week before cutover. Enabled rapid rollback capability if issues emerged. Post-migration, gradually increased back to 1800s for caching efficiency while maintaining rollback window.

Database Replication Strategy:

PostgreSQL streaming replication from Singapore to Dhaka worked reliably. Lag remained <2 seconds throughout migration. Post-cutover, converted Singapore database to read replica for disaster recovery. Cross region replication cost: Minimal compared to benefits.

Application Connection Pooling:

Essential for database performance. Configured PgBouncer with:

Pool mode: Transaction (releases connection after each transaction)
Max client connections: 200 per app server
Database pool size: 20 connections
Pool timeout: 30 seconds

Without proper pooling: Database connection exhaustion occurred during load testing.
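A rough sizing check for these settings, as a sketch. It assumes one PgBouncer instance per app server (three in total) and compares the result against PostgreSQL's default `max_connections` of 100; the actual limit on the database VM is not stated in this article:

```python
APP_SERVERS = 3            # application tier size from the architecture
POOL_SIZE = 20             # server connections per PgBouncer instance
MAX_CLIENT_CONN = 200      # client connections accepted per app server
PG_MAX_CONNECTIONS = 100   # assumption: PostgreSQL's default max_connections

server_connections = APP_SERVERS * POOL_SIZE       # what PostgreSQL sees
client_capacity = APP_SERVERS * MAX_CLIENT_CONN    # what the app can open
multiplexing_ratio = MAX_CLIENT_CONN / POOL_SIZE   # clients per server conn

assert server_connections <= PG_MAX_CONNECTIONS, "pools would exhaust PostgreSQL"

print(f"PostgreSQL connections used: {server_connections}/{PG_MAX_CONNECTIONS}")
print(f"client connections accepted: {client_capacity}")
print(f"multiplexing ratio:          {multiplexing_ratio:.0f}:1")
```

Transaction pooling is what makes the 10:1 ratio workable: a client only holds a server connection for the duration of a transaction.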

Load Balancer Health Checks:

Configuration:

Protocol: HTTP
Path: /health
Interval: 10 seconds
Timeout: 5 seconds
Healthy threshold: 2 consecutive successes
Unhealthy threshold: 3 consecutive failures

Mistake: We initially configured a 30-second interval, so instance failures took 90 seconds to detect (3 × 30s). Reducing the interval to 10 seconds improved failover detection to 30 seconds (3 × 10s).
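Detection time is roughly the check interval multiplied by the unhealthy threshold. A minimal sketch of that rule of thumb; it ignores the per-check timeout, so real detection can take slightly longer:

```python
def detection_time_s(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate seconds until an instance is marked unhealthy."""
    return interval_s * unhealthy_threshold


initial = detection_time_s(interval_s=30, unhealthy_threshold=3)
tuned = detection_time_s(interval_s=10, unhealthy_threshold=3)
print(f"30s interval: {initial}s to detect, 10s interval: {tuned}s to detect")
```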

Monitoring and Alerting:

Implemented comprehensive monitoring:

Application metrics: Response time, error rate, throughput (requests/sec)
Infrastructure metrics: CPU utilization, memory usage, disk I/O
Database metrics: Connection count, query time, replication lag
Network metrics: Bandwidth usage, packet loss, latency

Critical alerts:

Response time p95 >500ms: Page alert
Error rate >1%: Page alert
Database replication lag >5 seconds: Email alert
CPU >85% for 5 minutes: Email alert
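The alert thresholds can be expressed as data and evaluated against a metrics sample. This is an illustrative sketch, not the configuration of any particular monitoring tool; the metric names are hypothetical, and the "for 5 minutes" CPU condition is modelled as a 5-minute average:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    metric: str       # hypothetical metric name
    threshold: float  # alert fires when the value exceeds this
    channel: str      # "page" or "email"


RULES = [
    AlertRule("response_time_p95_ms", 500, "page"),
    AlertRule("error_rate_pct", 1, "page"),
    AlertRule("replication_lag_s", 5, "email"),
    AlertRule("cpu_pct_5min_avg", 85, "email"),
]


def fired_alerts(sample: dict) -> list:
    """Return (channel, metric) for every rule whose threshold is exceeded."""
    return [
        (rule.channel, rule.metric)
        for rule in RULES
        if sample.get(rule.metric, 0) > rule.threshold
    ]


# Hypothetical sample: everything healthy except replication lag.
sample = {
    "response_time_p95_ms": 380,
    "error_rate_pct": 0.2,
    "replication_lag_s": 6.5,
    "cpu_pct_5min_avg": 71,
}
print(fired_alerts(sample))
```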

Cost Optimization Strategies:

Rightsize VMs based on actual utilization:

Initial: Matched AWS instance types exactly
After 30 days: Analyzed CPU/memory utilization
Result: Reduced app tier from 8 vCPU to 4 vCPU on 2 instances (underutilized)
Additional monthly savings: $58.40

Storage optimization:

Enabled automated snapshots: Daily at 2 AM, 7-day retention
Cost: 500 GB volume × 20% change rate × 7 days × $0.04/GB-month = $28 monthly
Versus AWS: snapshot costs at $0.05/GB-month came to $175 monthly for similar retention
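The $28 estimate follows from an incremental-snapshot model: only changed data is stored for each retained day. A sketch of that formula with the stated inputs (the function and parameter names are illustrative):

```python
def snapshot_cost_usd(
    volume_gb: float,
    daily_change_rate: float,
    retention_days: int,
    price_per_gb_month: float,
) -> float:
    """Monthly cost when each retained snapshot stores only changed data."""
    stored_gb = volume_gb * daily_change_rate * retention_days
    return stored_gb * price_per_gb_month


cost = snapshot_cost_usd(
    volume_gb=500,
    daily_change_rate=0.20,
    retention_days=7,
    price_per_gb_month=0.04,
)
print(f"estimated snapshot cost: ${cost:.2f}/month")
```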

Geographic Performance Distribution

User Location Analysis (Google Analytics):

Dhaka users (65% of traffic):

Before: 3.9 seconds average page load
After: 1.2 seconds average
Improvement: 69%

Chittagong users (18% of traffic):

Before: 4.1 seconds average
After: 1.8 seconds average (network path Chittagong → Dhaka 25ms)
Improvement: 56%

Sylhet users (8% of traffic):

Before: 4.3 seconds average
After: 2.1 seconds average (longer fiber route)
Improvement: 51%

International users (9% of traffic):

Before: 2.8 seconds average (already closer to Singapore)
After: 3.2 seconds average (now farther from Dhaka)
Degradation: 14% slower

Trade-off analysis: 91% of users experienced a dramatic improvement, while the 9% of users outside Bangladesh experienced minor degradation. Business decision: Optimize for the primary market (Bangladesh users).
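Weighting each region's page load time by its traffic share gives a single blended figure for the trade-off. A sketch using the regional numbers above; the result (about 3.87s before and 1.56s after) is in line with the platform-wide averages reported earlier:

```python
# (traffic share, seconds before, seconds after), from the breakdown above.
regions = {
    "dhaka": (0.65, 3.9, 1.2),
    "chittagong": (0.18, 4.1, 1.8),
    "sylhet": (0.08, 4.3, 2.1),
    "international": (0.09, 2.8, 3.2),
}

# Shares should cover all traffic.
assert abs(sum(share for share, _, _ in regions.values()) - 1.0) < 1e-9

weighted_before = sum(share * before for share, before, _ in regions.values())
weighted_after = sum(share * after for share, _, after in regions.values())

print(f"traffic-weighted load time: {weighted_before:.2f}s -> {weighted_after:.2f}s")
```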

Future consideration: CDN implementation for static assets would serve international users from nearby edge locations while maintaining database in Dhaka for primary market.

Recommendations for Similar Migrations

Identify Geographic User Concentration:

Use analytics to determine user distribution:

Google Analytics: Audience → Geo → Location
Cloudflare Analytics: Traffic distribution by country
Application logs: Parse IP addresses, geolocate via MaxMind database

If >70% of users are concentrated in a specific region distant from current infrastructure: Regional migration likely beneficial.

Measure Current Latency Baseline:

Tools for measurement:

Pingdom: Multi-location synthetic monitoring
WebPageTest: Waterfall analysis from target locations
New Relic: Real User Monitoring from actual user devices
Custom: curl -w "@curl-format.txt" -o /dev/null -s https://example.com

Calculate network latency component:

Total response time - Application processing time = Network overhead
If network overhead >50% of total time: Geographic distance likely culprit
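The rule of thumb above as a small sketch. The sample numbers are hypothetical and only illustrate the check; in practice the inputs would come from tools like the ones listed earlier:

```python
def network_overhead(total_ms: float, app_processing_ms: float) -> tuple:
    """Return (overhead in ms, overhead as a fraction of total time)."""
    overhead_ms = total_ms - app_processing_ms
    return overhead_ms, overhead_ms / total_ms


# Hypothetical measurements for one endpoint.
overhead_ms, share = network_overhead(total_ms=600, app_processing_ms=240)
distance_suspected = share > 0.5

print(f"network overhead: {overhead_ms:.0f} ms ({share:.0%} of total)")
print(f"geographic distance likely culprit: {distance_suspected}")
```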

Evaluate Provider Options:

Criteria for regional cloud selection:

Data center location: <100km from user concentration = <10ms latency
ISP peering: Direct connections with major local carriers
Pricing transparency: All inclusive versus hidden fee model
Compliance: Data residency requirements for regulated industries
Support: Local timezone coverage, language, technical expertise

Tenbyte advantages for Bangladesh/Malaysia:

Data centers in Dhaka, Chittagong, Kuala Lumpur, Cyberjaya
BDIX, MyIX peering with Grameenphone, Robi, Banglalink, Time, Maxis
Transparent pricing: https://www.tenbyte.io/
24/7 support in regional timezones

Plan Gradual Migration:

Never do a big-bang cutover for production systems. Use a phased approach:

Provision parallel infrastructure (Week 1-2)
Establish data replication (Week 2-3)
Deploy application, validate functionality (Week 3)
Gradual DNS traffic shift: 10% → 25% → 50% → 75% → 100% (Week 4)
Monitor intensively during each phase
Maintain rollback capability until 100% stable
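The phased shift can be driven by a simple gate: advance to the next traffic weight only while the new environment stays healthy. This is a sketch of one possible policy, not the procedure used in this migration; the thresholds reuse the paging limits mentioned earlier (error rate 1%, p95 of 500ms), and rolling back to 0% on any breach is an assumption:

```python
STAGES = [10, 25, 50, 75, 100]  # percent of traffic on the new infrastructure


def next_weight(
    current: int,
    error_rate_pct: float,
    p95_ms: float,
    max_error_rate_pct: float = 1.0,
    max_p95_ms: float = 500.0,
) -> int:
    """Advance one stage when healthy; roll back to 0% otherwise."""
    healthy = error_rate_pct <= max_error_rate_pct and p95_ms <= max_p95_ms
    if not healthy:
        return 0
    later_stages = [stage for stage in STAGES if stage > current]
    return later_stages[0] if later_stages else current


print(next_weight(10, error_rate_pct=0.2, p95_ms=380))   # healthy: advance
print(next_weight(50, error_rate_pct=2.5, p95_ms=380))   # errors: roll back
print(next_weight(100, error_rate_pct=0.2, p95_ms=380))  # already fully shifted
```

Keeping DNS TTLs short during this period is what makes the rollback branch take effect quickly.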

Optimize Post Migration:

The migration has an end date. Optimization does not:

Monitor resource utilization weekly
Rightsize VMs based on actual usage
Implement caching aggressively (Redis, CDN)
Database query optimization (indexes, query plans)
Enable compression (gzip, Brotli for text content)
Image optimization (WebP, responsive sizing, lazy loading)

Each 10% performance improvement compounds business impact.

DEV Community