image2url

Building a scalable multi-bucket load balancer for Cloudflare R2

Building a Multi-Bucket Load Balancer for Cloudflare R2: Production-Ready Architecture

Introduction: The Challenge of Scalable Image Storage

Image storage is a core dependency for many web applications. Whether you're running a social media platform, an e-commerce site, or a content management system, you need to store and serve images reliably as volume grows.

The Problem

  • Single bucket dependencies create single points of failure
  • Per-bucket monthly storage limits complicate capacity planning
  • Manual bucket switching is error-prone and inefficient
  • No visibility into storage usage across buckets

Our Solution

We'll build a production-ready multi-bucket load balancing system for Cloudflare R2 that:

  • Automatically distributes uploads across multiple buckets
  • Provides real-time usage tracking and capacity management
  • Implements intelligent failover and health monitoring
  • Supports multiple load balancing strategies
  • Includes Redis persistence for distributed deployments

System Architecture

Core Components

  1. Load Balancer Engine (r2-load-balancer-v2.ts)

    • Multiple selection strategies (priority, least-used, weighted-round-robin)
    • Health monitoring with automatic cooldown
    • Capacity reservation system
  2. Persistence Layer (r2-persistence-redis.ts)

    • Redis-based usage tracking with atomic operations
    • Idempotent upload recording
    • Distributed consistency for multi-instance deployments
  3. Upload Helper (r2-upload-enhanced.ts)

    • Automatic retry with exponential backoff
    • Intelligent bucket switching on failures
    • Comprehensive error handling and reporting
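
The retry behaviour listed for the upload helper can be sketched as follows. This is a minimal illustration, not the code in r2-upload-enhanced.ts: the names backoffDelayMs and retryWithBackoff, and the default base delay, cap, and attempt count, are assumptions chosen for the example.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based),
// doubling on each attempt and capped at maxMs.
export function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical retry wrapper: runs `task` up to `maxAttempts` times and
// sleeps with exponential backoff between failed attempts.
export async function retryWithBackoff<T>(
  task: (attempt: number) => Promise<T>,
  maxAttempts = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Only wait if another attempt is coming.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

The `attempt` argument lets the caller pick a different bucket on each retry, which is one way to combine backoff with bucket switching.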

Data Flow

Upload Request → Load Balancer → Bucket Selection → Capacity Reservation → Upload → Usage Tracking → Success Response

Implementation Guide

Step 1: Setting Up Multiple R2 Buckets

First, let's configure multiple Cloudflare R2 buckets:

# Example bucket configuration
R2_BUCKETS=bucket1,bucket2,bucket3

R2_BUCKET1_NAME=image-storage-primary
R2_BUCKET1_ACCOUNT_ID=your_account_id_1
R2_BUCKET1_ACCESS_KEY_ID=your_access_key_1
R2_BUCKET1_SECRET_ACCESS_KEY=your_secret_1
R2_BUCKET1_PUBLIC_URL=https://primary.your-domain.r2.dev
# Monthly limit; the unit is not stated (GB is assumed here, matching totalBytesGB used later)
R2_BUCKET1_MONTHLY_LIMIT=10
R2_BUCKET1_PRIORITY=1

# Repeat for bucket2 and bucket3 with different priorities
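
One way to turn these variables into typed configuration objects is sketched below. The BucketEnvConfig shape and the parseBucketConfigs function are hypothetical names introduced for illustration; only the environment variable names come from the example above.

```typescript
// Hypothetical shape; field names mirror the environment variables above.
export interface BucketEnvConfig {
  id: string;
  name: string;
  accountId: string;
  accessKeyId: string;
  secretAccessKey: string;
  publicUrl: string;
  monthlyLimit: number;
  priority: number;
}

export function parseBucketConfigs(
  env: Record<string, string | undefined>,
): BucketEnvConfig[] {
  const ids = (env['R2_BUCKETS'] || '')
    .split(',')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  return ids.map((id) => {
    const prefix = `R2_${id.toUpperCase()}_`;
    // Fail fast on a missing variable instead of uploading with bad config.
    const required = (key: string): string => {
      const value = env[prefix + key];
      if (!value) throw new Error(`Missing environment variable ${prefix}${key}`);
      return value;
    };
    const numeric = (key: string): number => {
      const n = Number(required(key));
      if (!isFinite(n)) throw new Error(`${prefix}${key} must be a number`);
      return n;
    };
    return {
      id,
      name: required('NAME'),
      accountId: required('ACCOUNT_ID'),
      accessKeyId: required('ACCESS_KEY_ID'),
      secretAccessKey: required('SECRET_ACCESS_KEY'),
      publicUrl: required('PUBLIC_URL'),
      monthlyLimit: numeric('MONTHLY_LIMIT'),
      priority: numeric('PRIORITY'),
    };
  });
}
```

In a Node.js service this would typically be called once at startup as parseBucketConfigs(process.env).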

Step 2: Redis Configuration

For distributed usage tracking:

const redisPersistence = new R2RedisPersistence({
  redisUrl: process.env.REDIS_URL,
  keyPrefix: 'r2-load-balancer:',
  ttl: 30 * 24 * 60 * 60 // 30 days
});

Step 3: Load Balancer Implementation

The core load balancer supports three strategies:

Priority-Based Selection

// Buckets with lower priority numbers are selected first
const strategy = { strategy: 'priority-first' };
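
As a rough sketch of what priority-first selection does, assuming each bucket carries a numeric priority and an availability flag (the PriorityBucket type and selectByPriority function are hypothetical, not the article's API):

```typescript
interface PriorityBucket {
  id: string;
  priority: number; // lower number = preferred
  available: boolean; // false while unhealthy or full
}

// Returns the available bucket with the lowest priority number,
// or undefined when no bucket is available.
export function selectByPriority(buckets: PriorityBucket[]): PriorityBucket | undefined {
  let best: PriorityBucket | undefined;
  for (const bucket of buckets) {
    if (!bucket.available) continue;
    if (best === undefined || bucket.priority < best.priority) {
      best = bucket;
    }
  }
  return best;
}
```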

Least-Used Selection

// Selects bucket with lowest usage percentage
const strategy = { strategy: 'least-used' };
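
A least-used selector might look like the sketch below. It assumes usage and limits are tracked in GB per bucket; the UsageBucket type and selectLeastUsed function are hypothetical names for illustration.

```typescript
interface UsageBucket {
  id: string;
  usedGB: number;
  limitGB: number;
}

// Picks the bucket with the lowest usage percentage among those that
// can still fit the file; returns undefined if none can.
export function selectLeastUsed(
  buckets: UsageBucket[],
  fileSizeGB: number,
): UsageBucket | undefined {
  let best: UsageBucket | undefined;
  for (const bucket of buckets) {
    // Skip buckets the upload would push over their limit.
    if (bucket.usedGB + fileSizeGB > bucket.limitGB) continue;
    if (
      best === undefined ||
      bucket.usedGB / bucket.limitGB < best.usedGB / best.limitGB
    ) {
      best = bucket;
    }
  }
  return best;
}
```

Comparing percentages rather than absolute usage keeps the choice fair when buckets have different limits.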

Weighted Round Robin

// Distributes load based on configured weights
const strategy = {
  strategy: 'weighted-round-robin',
  weights: { bucket1: 3, bucket2: 2, bucket3: 1 }
};
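
The article does not show how the weights are applied. One common approach is the "smooth" weighted round-robin variant, sketched here as an assumption rather than as the load balancer's actual algorithm; the WeightedRoundRobin class is a hypothetical name.

```typescript
// Smooth weighted round-robin: on every pick each bucket's running score
// grows by its weight, the highest score wins, and the winner's score is
// reduced by the sum of all weights. This interleaves picks instead of
// sending several consecutive uploads to the heaviest bucket.
export class WeightedRoundRobin {
  private readonly ids: string[];
  private readonly total: number;
  private readonly current: Record<string, number> = {};

  constructor(private readonly weights: Record<string, number>) {
    this.ids = Object.keys(weights);
    this.total = this.ids.reduce((sum, id) => sum + weights[id], 0);
    for (const id of this.ids) this.current[id] = 0;
  }

  next(): string {
    if (this.ids.length === 0) throw new Error('No buckets configured');
    let bestId = this.ids[0];
    let bestScore = -Infinity;
    for (const id of this.ids) {
      this.current[id] += this.weights[id];
      if (this.current[id] > bestScore) {
        bestScore = this.current[id];
        bestId = id;
      }
    }
    this.current[bestId] -= this.total;
    return bestId;
  }
}
```

With the weights from the example above (3, 2, 1), six consecutive calls to next() return bucket1 three times, bucket2 twice, and bucket3 once.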

Production Deployment Considerations

Environment Setup

Required Environment Variables:

# Multi-bucket configuration
R2_BUCKETS=bucket1,bucket2,bucket3
R2_BUCKET1_NAME=your_bucket_name
R2_BUCKET1_ACCOUNT_ID=your_account_id
# ... more bucket configs

# Redis for persistence
REDIS_URL=redis://localhost:6379
R2_USAGE_TTL=2592000

# Admin API keys
ADMIN_API_KEY=your_secure_admin_key

Docker Deployment

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["npm", "start"]

Monitoring and Observability

Health Check Endpoint:

curl -H "Authorization: Bearer $ADMIN_API_KEY" https://your-domain.com/api/health

Usage Statistics:

curl -H "Authorization: Bearer $ADMIN_API_KEY" https://your-domain.com/api/r2-status

Performance Metrics:

  • Upload success rate
  • Bucket health scores
  • Capacity utilization
  • Response times

Advanced Features and Best Practices

Capacity Reservation System

To prevent concurrent uploads from exceeding bucket limits:

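import { v4 as uuidv4 } from 'uuid';

// Reserve capacity before upload so concurrent uploads cannot overshoot the bucket limit
const reservationId = uuidv4();
const reserved = await loadBalancer.reserveCapacity(bucketId, fileSize, reservationId);
if (!reserved) {
  // Reservation refused: try the next bucket or return an error
}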
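
To make the idea concrete, here is an in-memory model of a reservation ledger for a single bucket. It is a sketch under assumptions: the CapacityLedger class and its reserve, commit, and release methods are hypothetical, and a multi-instance deployment would need to keep this state in Redis rather than in process memory.

```typescript
// In-memory model of capacity reservations for one bucket (illustration only).
export class CapacityLedger {
  private committedGB = 0;
  private readonly reservations = new Map<string, number>();

  constructor(private readonly limitGB: number) {}

  private reservedGB(): number {
    let sum = 0;
    this.reservations.forEach((gb) => {
      sum += gb;
    });
    return sum;
  }

  // Holds capacity for an upload; returns false if it would exceed the limit.
  reserve(reservationId: string, sizeGB: number): boolean {
    if (this.reservations.has(reservationId)) return true; // already held
    if (this.committedGB + this.reservedGB() + sizeGB > this.limitGB) return false;
    this.reservations.set(reservationId, sizeGB);
    return true;
  }

  // Converts a reservation into committed usage after a successful upload.
  commit(reservationId: string): void {
    const sizeGB = this.reservations.get(reservationId);
    if (sizeGB === undefined) return;
    this.reservations.delete(reservationId);
    this.committedGB += sizeGB;
  }

  // Frees the held capacity after a failed upload.
  release(reservationId: string): void {
    this.reservations.delete(reservationId);
  }
}
```

Counting pending reservations against the limit is what stops two concurrent uploads from both passing a check that only looks at committed usage.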

Failure Recovery and Health Monitoring

The system automatically detects and handles failures:

// Automatic cooldown for consecutive failures
if (bucket.consecutiveFailures >= 3) {
  const cooldownMinutes = Math.min(30, bucket.consecutiveFailures * 5); // Max 30 minutes
  bucket.cooldownUntil = new Date(Date.now() + cooldownMinutes * 60 * 1000);
  console.warn(`Bucket ${bucketId} entered cooldown for ${cooldownMinutes} minutes due to ${bucket.consecutiveFailures} failures`);
}
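
The cooldown rule above is easier to test when expressed as pure functions. The following sketch restates it; cooldownMinutes and isInCooldown are hypothetical helper names, and the thresholds are the ones shown in the snippet.

```typescript
// Mirrors the rule above: no cooldown below 3 consecutive failures,
// then 5 minutes per failure, capped at 30 minutes.
export function cooldownMinutes(consecutiveFailures: number): number {
  if (consecutiveFailures < 3) return 0;
  return Math.min(30, consecutiveFailures * 5);
}

// Hypothetical availability check: a bucket stays excluded from selection
// until its cooldown timestamp has passed.
export function isInCooldown(
  cooldownUntil: Date | undefined,
  now: Date = new Date(),
): boolean {
  return cooldownUntil !== undefined && now.getTime() < cooldownUntil.getTime();
}
```

Under this rule, three failures give a 15-minute cooldown and six or more failures hit the 30-minute cap.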

Usage Analytics

Comprehensive tracking with Redis:

interface UsageAnalytics {
  totalUploads: number;
  totalBytesGB: number;
  averageResponseTime: number;
  bucketHealthScores: Record<string, number>;
  monthlyResetSchedule: Date;
}

Performance Optimization Techniques

Connection Pooling

S3 clients are cached and reused:

import { S3Client } from '@aws-sdk/client-s3';

export class R2ClientFactory {
  private static clientCache: Map<string, S3Client> = new Map();

  public static getClient(bucketConfig: R2BucketConfig): S3Client {
    const cacheKey = `${bucketConfig.id}-${bucketConfig.accountId}`;

    let client = this.clientCache.get(cacheKey);
    if (!client) {
      client = new S3Client({
        region: 'auto',
        endpoint: `https://${bucketConfig.accountId}.r2.cloudflarestorage.com`,
        credentials: {
          accessKeyId: bucketConfig.accessKeyId,
          secretAccessKey: bucketConfig.secretAccessKey,
        },
      });
      this.clientCache.set(cacheKey, client);
    }
    return client;
  }
}

Atomic Operations

Redis Lua scripts ensure atomicity:

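-- Upload recording script
-- KEYS[1] = per-upload idempotency key, KEYS[2] = usage hash
-- ARGV[1] = upload size in GB, ARGV[2] = TTL of the idempotency key in seconds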
if redis.call('EXISTS', KEYS[1]) == 1 then
  return {1, redis.call('HGET', KEYS[2], 'totalBytesGB')}
end

local total = redis.call('HINCRBYFLOAT', KEYS[2], 'totalBytesGB', ARGV[1])
redis.call('SETEX', KEYS[1], ARGV[2], '1')
return {0, total}
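
The script's behaviour can be modelled in plain TypeScript to show why recording the same upload twice does not double-count. This is an in-memory illustration only: the UsageRecorder class is hypothetical, it ignores the key TTL, and it returns the total as a number where Redis would return a string.

```typescript
// In-memory model of the script's semantics (not a replacement for the
// atomic Redis script in a multi-instance deployment).
export class UsageRecorder {
  private readonly seenUploads = new Set<string>();
  private totalBytesGB = 0;

  // Returns [alreadyRecorded, totalBytesGB], like the script's {flag, total} reply.
  record(uploadId: string, sizeGB: number): [0 | 1, number] {
    if (this.seenUploads.has(uploadId)) {
      // Duplicate call: report the current total without incrementing.
      return [1, this.totalBytesGB];
    }
    this.totalBytesGB += sizeGB;
    this.seenUploads.add(uploadId);
    return [0, this.totalBytesGB];
  }
}
```

A retried request that reuses its upload ID therefore leaves the usage counter unchanged, which is what makes the recording idempotent.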

Response Time Optimization

  • Parallel bucket health checks
  • Cached usage statistics
  • Pre-computed routing decisions
  • Minimal Redis round trips
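
As an illustration of the cached usage statistics item, a small time-based cache can keep repeated reads off Redis. The TtlCache class below is a hypothetical sketch; it is synchronous for brevity, whereas a real loader reading from Redis would be asynchronous.

```typescript
// Hypothetical TTL cache; `now` is injectable so expiry can be tested
// without waiting.
export class TtlCache<T> {
  private entry: { value: T; expiresAt: number } | undefined;

  constructor(
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Returns the cached value, calling `load` only when the cache is
  // empty or the previous value has expired.
  get(load: () => T): T {
    const t = this.now();
    let entry = this.entry;
    if (entry === undefined || t >= entry.expiresAt) {
      entry = { value: load(), expiresAt: t + this.ttlMs };
      this.entry = entry;
    }
    return entry.value;
  }
}
```

The trade-off is staleness: a longer TTL saves round trips but lets the load balancer act on older usage figures, so the capacity reservation check should still read authoritative state.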

Testing and Validation

Unit Tests

describe('R2LoadBalancer', () => {
  it('should select bucket with lowest usage', async () => {
    const loadBalancer = createEnhancedR2LoadBalancer({ strategy: 'least-used' });

    const bucket = await loadBalancer.selectOptimalBucket(1024 * 1024);
    expect(bucket?.id).toBe('least_used_bucket');
  });

  it('should handle bucket failures gracefully', async () => {
    const loadBalancer = createEnhancedR2LoadBalancer();

    // Simulate bucket failure
    await loadBalancer.recordFailedUpload('bucket1', new Error('Connection failed'));

    const bucket = await loadBalancer.selectOptimalBucket(1024 * 1024);
    expect(bucket?.id).not.toBe('bucket1');
  });
});

Load Testing

// Concurrent upload testing
const concurrentUploads = Array.from({length: 100}, (_, i) =>
  loadBalancer.upload(file, `test-${i}.jpg`)
);

await Promise.all(concurrentUploads);

Integration Tests

describe('API Integration', () => {
  it('should upload with load balancing', async () => {
    const response = await fetch('/api/upload', {
      method: 'POST',
      body: formData,
    });

    expect(response.status).toBe(200);
    const result = await response.json();
    expect(result.bucketId).toBeDefined();
  });
});

Monitoring and Alerting

Key Metrics

Business Metrics:

  • Upload success rate: > 99.5%
  • Average response time: < 2 seconds
  • Bucket utilization: < 90%
  • Monthly reset success rate: 100%

Technical Metrics:

  • Redis connection health
  • S3 API error rates
  • Memory usage patterns
  • Request throughput

Alerting Rules

# Example alerting rules
alerts:
  - name: "High bucket utilization"
    condition: "usage_percentage > 90"
    action: "scale_up_or_add_bucket"

  - name: "Bucket health degradation"
    condition: "health_score < 50"
    action: "investigate_and_rotate"

  - name: "Redis connection failure"
    condition: "redis_connection_status != 'connected'"
    action: "emergency_failover"
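
The three conditions above can also be evaluated in application code, for example inside a status endpoint. The sketch below assumes a BucketMetrics shape and a firingAlerts function, both hypothetical; the thresholds are the ones in the rules.

```typescript
interface BucketMetrics {
  usagePercentage: number;
  healthScore: number;
  redisConnected: boolean;
}

// Returns the names of the rules above that fire for the given metrics.
export function firingAlerts(metrics: BucketMetrics): string[] {
  const alerts: string[] = [];
  if (metrics.usagePercentage > 90) alerts.push('High bucket utilization');
  if (metrics.healthScore < 50) alerts.push('Bucket health degradation');
  if (!metrics.redisConnected) alerts.push('Redis connection failure');
  return alerts;
}
```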

Dashboard Configuration

// Grafana dashboard panels
const dashboardConfig = {
  panels: [
    {
      title: "Upload Success Rate",
      type: "stat",
      // rate() needs a range vector; the 5m window is an illustrative choice
      query: "sum(rate(uploads_total[5m])) / sum(rate(uploads_attempts[5m]))"
    },
    {
      title: "Bucket Utilization",
      type: "graph",
      query: "bucket_usage_percentage"
    },
    {
      title: "Response Times",
      type: "histogram",
      query: "upload_response_time_seconds"
    }
  ]
};

Security Best Practices

Credential Management

  • Store R2 credentials in environment variables
  • Use R2 API tokens scoped to the minimum required permissions (for example, object read and write on specific buckets only)
  • Implement credential rotation policies
  • Monitor for suspicious API usage

Input Validation

// File size and type validation
if (file.size > MAX_FILE_SIZE) {
  return { error: 'File too large' };
}

if (!ALLOWED_MIME_TYPES.includes(file.type)) {
  return { error: 'Unsupported file type' };
}
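
A self-contained version of these checks might look like the sketch below. The article does not give the real values of MAX_FILE_SIZE or ALLOWED_MIME_TYPES, so the constants here are illustrative assumptions, as are the UploadCandidate type and validateUpload function.

```typescript
// Illustrative values; the real limits are not stated in the article.
const MAX_FILE_SIZE = 10 * 1024 * 1024; // 10 MiB (assumption)
const ALLOWED_MIME_TYPES = ['image/jpeg', 'image/png', 'image/webp', 'image/gif']; // assumption

interface UploadCandidate {
  size: number; // bytes
  type: string; // MIME type as declared by the client
}

type ValidationResult = { ok: true } | { ok: false; error: string };

export function validateUpload(file: UploadCandidate): ValidationResult {
  if (file.size <= 0) return { ok: false, error: 'Empty file' };
  if (file.size > MAX_FILE_SIZE) return { ok: false, error: 'File too large' };
  if (ALLOWED_MIME_TYPES.indexOf(file.type) === -1) {
    return { ok: false, error: 'Unsupported file type' };
  }
  return { ok: true };
}
```

The declared MIME type comes from the client and can be spoofed, so a stricter service would also inspect the file contents before accepting the upload.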

Rate Limiting

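// IP-based rate limiting (in-memory, so each server instance counts separately)
const rateLimit = new Map<string, { count: number; resetTime: number }>();

function checkRateLimit(ip: string, limit: number, windowMs: number): boolean {
  const now = Date.now();
  const record = rateLimit.get(ip);

  // First request from this IP, or the previous window has expired: start a new window
  if (!record || now > record.resetTime) {
    rateLimit.set(ip, { count: 1, resetTime: now + windowMs });
    return true;
  }

  // Reject once the limit is reached; otherwise count this request
  if (record.count >= limit) {
    return false;
  }
  record.count += 1;
  return true;
}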

Conclusion

Building a scalable multi-bucket load balancer for Cloudflare R2 provides several key benefits:

  • High Availability: Automatic failover keeps uploads working when a bucket fails, with a target of 99.9%+ uptime
  • Scalability: Horizontal scaling through bucket addition
  • Cost Optimization: Efficient capacity utilization reduces costs
  • Monitoring: Real-time visibility into system health
  • Reliability: Atomic operations prevent data inconsistency

The implementation we've covered here is production-ready and can handle enterprise-scale workloads. The modular design allows for easy extension and customization based on specific requirements.

Next Steps

  1. Deploy to Production: Follow the deployment guide for your environment
  2. Monitor Performance: Set up monitoring and alerting
  3. Scale as Needed: Add more buckets as traffic increases
  4. Optimize Continuously: Fine-tune parameters based on usage patterns

The complete source code is available in our GitHub repository, and we welcome contributions and feedback from the community.

About the Author

This implementation was developed for the Image2URL project, a free image hosting and conversion service that handles millions of uploads monthly. The system has been battle-tested in production and serves as the foundation for our global image infrastructure.

License

This article and the associated code are released under the MIT License. Feel free to use, modify, and distribute in your own projects.
