Building a Multi-Bucket Load Balancer for Cloudflare R2: Production-Ready Architecture
Introduction: The Challenge of Scalable Image Storage
In today's digital landscape, image storage has become a critical component for web applications. Whether you're running a social media platform, an e-commerce site, or a content management system, the ability to reliably store and serve images at scale is paramount.
The Problem
- Single bucket dependencies create single points of failure
- Monthly storage limits force capacity planning challenges
- Manual bucket switching is error-prone and inefficient
- No visibility into storage usage across buckets
Our Solution
We'll build a production-ready multi-bucket load balancing system for Cloudflare R2 that:
- Automatically distributes uploads across multiple buckets
- Provides real-time usage tracking and capacity management
- Implements intelligent failover and health monitoring
- Supports multiple load balancing strategies
- Includes Redis persistence for distributed deployments
System Architecture
Core Components
- Load Balancer Engine (r2-load-balancer-v2.ts)
  - Multiple selection strategies (priority, least-used, weighted-round-robin)
  - Health monitoring with automatic cooldown
  - Capacity reservation system
- Persistence Layer (r2-persistence-redis.ts)
  - Redis-based usage tracking with atomic operations
  - Idempotent upload recording
  - Distributed consistency for multi-instance deployments
- Upload Helper (r2-upload-enhanced.ts)
  - Automatic retry with exponential backoff
  - Intelligent bucket switching on failures
  - Comprehensive error handling and reporting
Data Flow
Upload Request → Load Balancer → Bucket Selection → Capacity Reservation → Upload → Usage Tracking → Success Response
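The flow above can be sketched as a small orchestration function. The interfaces here are illustrative stand-ins for the load balancer and upload helper modules described below, not their exact APIs:

```typescript
// Sketch of the upload pipeline: select → reserve → upload → record.
interface Bucket {
  id: string;
}

interface LoadBalancer {
  selectOptimalBucket(sizeBytes: number): Promise<Bucket | null>;
  reserveCapacity(bucketId: string, sizeBytes: number, reservationId: string): Promise<boolean>;
  recordSuccessfulUpload(bucketId: string, sizeBytes: number): Promise<void>;
}

interface Uploader {
  put(bucketId: string, key: string, body: Uint8Array): Promise<void>;
}

export async function handleUpload(
  lb: LoadBalancer,
  uploader: Uploader,
  key: string,
  body: Uint8Array,
): Promise<{ bucketId: string }> {
  // 1. Bucket selection
  const bucket = await lb.selectOptimalBucket(body.byteLength);
  if (!bucket) throw new Error("No bucket with available capacity");

  // 2. Capacity reservation (stand-in id generator; production code uses uuidv4)
  const reservationId = Math.random().toString(36).slice(2);
  const reserved = await lb.reserveCapacity(bucket.id, body.byteLength, reservationId);
  if (!reserved) throw new Error(`Reservation failed for ${bucket.id}`);

  // 3. Upload to R2, then 4. record usage for tracking
  await uploader.put(bucket.id, key, body);
  await lb.recordSuccessfulUpload(bucket.id, body.byteLength);
  return { bucketId: bucket.id };
}
```

Injecting the dependencies keeps each stage independently testable, as the unit tests later in the article assume.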
Implementation Guide
Step 1: Setting Up Multiple R2 Buckets
First, let's configure multiple Cloudflare R2 buckets:
# Example bucket configuration
R2_BUCKETS=bucket1,bucket2,bucket3
R2_BUCKET1_NAME=image-storage-primary
R2_BUCKET1_ACCOUNT_ID=your_account_id_1
R2_BUCKET1_ACCESS_KEY_ID=your_access_key_1
R2_BUCKET1_SECRET_ACCESS_KEY=your_secret_1
R2_BUCKET1_PUBLIC_URL=https://primary.your-domain.r2.dev
R2_BUCKET1_MONTHLY_LIMIT=10   # monthly capacity in GB
R2_BUCKET1_PRIORITY=1
# Repeat for bucket2 and bucket3 with different priorities
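A hypothetical loader can turn the `R2_BUCKET<n>_*` variables above into typed config objects; the variable names follow the example, so adapt the prefixes to your own scheme:

```typescript
// Illustrative config loader for the env-var layout shown above.
export interface R2BucketConfig {
  id: string;
  name: string;
  accountId: string;
  accessKeyId: string;
  secretAccessKey: string;
  publicUrl: string;
  monthlyLimitGB: number;
  priority: number;
}

export function loadBucketConfigs(env: Record<string, string | undefined>): R2BucketConfig[] {
  const ids = (env.R2_BUCKETS ?? "").split(",").map(s => s.trim()).filter(Boolean);
  return ids.map((id, i) => {
    const prefix = `R2_BUCKET${i + 1}_`; // e.g. R2_BUCKET1_NAME
    const get = (key: string): string => {
      const value = env[prefix + key];
      if (!value) throw new Error(`Missing env var ${prefix + key}`);
      return value;
    };
    return {
      id,
      name: get("NAME"),
      accountId: get("ACCOUNT_ID"),
      accessKeyId: get("ACCESS_KEY_ID"),
      secretAccessKey: get("SECRET_ACCESS_KEY"),
      publicUrl: get("PUBLIC_URL"),
      monthlyLimitGB: Number(get("MONTHLY_LIMIT")),
      priority: Number(get("PRIORITY")),
    };
  });
}
```

Failing fast on a missing variable surfaces misconfiguration at startup rather than mid-upload.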
Step 2: Redis Configuration
For distributed usage tracking:
const redisPersistence = new R2RedisPersistence({
redisUrl: process.env.REDIS_URL,
keyPrefix: 'r2-load-balancer:',
ttl: 30 * 24 * 60 * 60 // 30 days
});
Step 3: Load Balancer Implementation
The core load balancer supports three strategies:
Priority-Based Selection
// Buckets with lower priority numbers are selected first
const strategy = { strategy: 'priority-first' };
Least-Used Selection
// Selects bucket with lowest usage percentage
const strategy = { strategy: 'least-used' };
Weighted Round Robin
// Distributes load based on configured weights
const strategy = {
strategy: 'weighted-round-robin',
weights: { bucket1: 3, bucket2: 2, bucket3: 1 }
};
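To make the third strategy concrete, here is a minimal self-contained sketch of smooth weighted round-robin selection (the nginx-style algorithm); it is not the article's `r2-load-balancer-v2.ts` implementation, but over any window of calls it yields selections proportional to the weights, e.g. 3:2:1 as configured above:

```typescript
// Smooth weighted round-robin: each call returns the next bucket id.
export function makeWeightedRoundRobin(weights: Record<string, number>) {
  const current: Record<string, number> = {};
  for (const id of Object.keys(weights)) current[id] = 0;
  const total = Object.values(weights).reduce((a, b) => a + b, 0);

  return function next(): string {
    let best: string | null = null;
    for (const id of Object.keys(weights)) {
      current[id] += weights[id];               // accumulate weight
      if (best === null || current[id] > current[best]) best = id;
    }
    current[best!] -= total;                     // penalize the winner
    return best!;
  };
}
```

Unlike naive round robin, the smooth variant interleaves buckets (e.g. `bucket1, bucket2, bucket1, bucket3, …`) instead of sending bursts to the heaviest bucket.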
Production Deployment Considerations
Environment Setup
Required Environment Variables:
# Multi-bucket configuration
R2_BUCKETS=bucket1,bucket2,bucket3
R2_BUCKET1_NAME=your_bucket_name
R2_BUCKET1_ACCOUNT_ID=your_account_id
# ... more bucket configs
# Redis for persistence
REDIS_URL=redis://localhost:6379
R2_USAGE_TTL=2592000
# Admin API keys
ADMIN_API_KEY=your_secure_admin_key
Docker Deployment
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
EXPOSE 3000
CMD ["npm", "start"]
Monitoring and Observability
Health Check Endpoint:
curl -H "Authorization: Bearer $ADMIN_API_KEY" https://your-domain.com/api/health
Usage Statistics:
curl -H "Authorization: Bearer $ADMIN_API_KEY" https://your-domain.com/api/r2-status
Performance Metrics:
- Upload success rate
- Bucket health scores
- Capacity utilization
- Response times
Advanced Features and Best Practices
Capacity Reservation System
To prevent concurrent uploads from exceeding bucket limits:
// Reserve capacity before upload
const reservationId = uuidv4();
const reserved = await loadBalancer.reserveCapacity(bucketId, fileSize, reservationId);
if (!reserved) {
// Try next bucket or return error
}
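A single-instance, in-memory version of that reservation logic might look like the sketch below. It is illustrative only; the production path performs the same check-and-reserve step in Redis so it stays atomic across instances:

```typescript
const GB = 1024 ** 3;

interface BucketState {
  usedBytes: number;       // confirmed monthly usage
  reservedBytes: number;   // in-flight reservations
  monthlyLimitGB: number;
}

// In-memory capacity reservation sketch.
export class CapacityReserver {
  private reservations = new Map<string, { bucketId: string; bytes: number }>();
  constructor(private buckets: Map<string, BucketState>) {}

  reserve(bucketId: string, bytes: number, reservationId: string): boolean {
    const b = this.buckets.get(bucketId);
    if (!b) return false;
    const limit = b.monthlyLimitGB * GB;
    // Count in-flight reservations so concurrent uploads cannot overshoot.
    if (b.usedBytes + b.reservedBytes + bytes > limit) return false;
    b.reservedBytes += bytes;
    this.reservations.set(reservationId, { bucketId, bytes });
    return true;
  }

  // Upload succeeded: convert the reservation into confirmed usage.
  commit(reservationId: string): void {
    const r = this.reservations.get(reservationId);
    if (!r) return;
    const b = this.buckets.get(r.bucketId)!;
    b.reservedBytes -= r.bytes;
    b.usedBytes += r.bytes;
    this.reservations.delete(reservationId);
  }

  // Upload failed or timed out: release the reserved capacity.
  release(reservationId: string): void {
    const r = this.reservations.get(reservationId);
    if (!r) return;
    this.buckets.get(r.bucketId)!.reservedBytes -= r.bytes;
    this.reservations.delete(reservationId);
  }
}
```

The commit/release pair is the important part: every reservation must end in exactly one of the two, or capacity leaks.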
Failure Recovery and Health Monitoring
The system automatically detects and handles failures:
// Automatic cooldown for consecutive failures
if (bucket.consecutiveFailures >= 3) {
const cooldownMinutes = Math.min(30, bucket.consecutiveFailures * 5); // Max 30 minutes
bucket.cooldownUntil = new Date(Date.now() + cooldownMinutes * 60 * 1000);
console.warn(`Bucket ${bucketId} entered cooldown for ${cooldownMinutes} minutes due to ${bucket.consecutiveFailures} failures`);
}
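The complementary half of health monitoring, sketched here under the same field names as the snippet above, clears the failure streak on success and keeps cooled-down buckets out of selection:

```typescript
interface BucketHealth {
  consecutiveFailures: number;
  cooldownUntil?: Date;
}

// A successful upload resets the failure streak and ends any cooldown early.
export function recordSuccess(b: BucketHealth): void {
  b.consecutiveFailures = 0;
  b.cooldownUntil = undefined;
}

// Selection should skip buckets whose cooldown window is still open.
export function isAvailable(b: BucketHealth, now = new Date()): boolean {
  return !b.cooldownUntil || now >= b.cooldownUntil;
}
```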
Usage Analytics
Comprehensive tracking with Redis:
interface UsageAnalytics {
totalUploads: number;
totalBytesGB: number;
averageResponseTime: number;
bucketHealthScores: Record<string, number>;
monthlyResetSchedule: Date;
}
Performance Optimization Techniques
Connection Pooling
S3 clients are cached and reused:
export class R2ClientFactory {
private static clientCache: Map<string, S3Client> = new Map();
public static getClient(bucketConfig: R2BucketConfig): S3Client {
const cacheKey = `${bucketConfig.id}-${bucketConfig.accountId}`;
let client = this.clientCache.get(cacheKey);
if (!client) {
client = new S3Client({
region: 'auto',
endpoint: `https://${bucketConfig.accountId}.r2.cloudflarestorage.com`,
credentials: {
accessKeyId: bucketConfig.accessKeyId,
secretAccessKey: bucketConfig.secretAccessKey,
},
});
this.clientCache.set(cacheKey, client);
}
return client;
}
}
Atomic Operations
Redis Lua scripts ensure atomicity:
-- Upload recording script
-- KEYS[1] = idempotency key for this upload, KEYS[2] = bucket usage hash
-- ARGV[1] = upload size in GB, ARGV[2] = idempotency key TTL in seconds
-- If this upload was already recorded, return the existing total unchanged
if redis.call('EXISTS', KEYS[1]) == 1 then
return {1, redis.call('HGET', KEYS[2], 'totalBytesGB')}
end
local total = redis.call('HINCRBYFLOAT', KEYS[2], 'totalBytesGB', ARGV[1])
redis.call('SETEX', KEYS[1], ARGV[2], '1')
return {0, total}
Response Time Optimization
- Parallel bucket health checks
- Cached usage statistics
- Pre-computed routing decisions
- Minimal Redis round trips
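The second and fourth points go together: a tiny TTL cache in front of the usage reads means routing decisions hit Redis at most once per interval. A minimal sketch (the wrapper name and TTL are illustrative):

```typescript
// Memoize an async fetch for `ttlMs` to avoid a Redis round trip per request.
export function cached<T>(fn: () => Promise<T>, ttlMs: number): () => Promise<T> {
  let value: T | undefined;
  let expires = 0;
  return async (): Promise<T> => {
    const now = Date.now();
    if (value === undefined || now >= expires) {
      value = await fn();        // refresh from Redis
      expires = now + ttlMs;
    }
    return value;                // serve the cached snapshot
  };
}
```

A short TTL (a few seconds) keeps routing decisions fresh enough while cutting Redis traffic by orders of magnitude under load.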
Testing and Validation
Unit Tests
describe('R2LoadBalancer', () => {
it('should select bucket with lowest usage', async () => {
const loadBalancer = createEnhancedR2LoadBalancer({ strategy: 'least-used' });
const bucket = await loadBalancer.selectOptimalBucket(1024 * 1024);
expect(bucket?.id).toBe('least_used_bucket');
});
it('should handle bucket failures gracefully', async () => {
const loadBalancer = createEnhancedR2LoadBalancer();
// Simulate bucket failure
await loadBalancer.recordFailedUpload('bucket1', new Error('Connection failed'));
const bucket = await loadBalancer.selectOptimalBucket(1024 * 1024);
expect(bucket?.id).not.toBe('bucket1');
});
});
Load Testing
// Concurrent upload testing
const concurrentUploads = Array.from({length: 100}, (_, i) =>
loadBalancer.upload(file, `test-${i}.jpg`)
);
await Promise.all(concurrentUploads);
Integration Tests
describe('API Integration', () => {
it('should upload with load balancing', async () => {
const response = await fetch('/api/upload', {
method: 'POST',
body: formData,
});
expect(response.status).toBe(200);
const result = await response.json();
expect(result.bucketId).toBeDefined();
});
});
Monitoring and Alerting
Key Metrics
Business Metrics:
- Upload success rate: > 99.5%
- Average response time: < 2 seconds
- Bucket utilization: < 90%
- Monthly reset success rate: 100%
Technical Metrics:
- Redis connection health
- S3 API error rates
- Memory usage patterns
- Request throughput
Alerting Rules
# Example alerting rules
alerts:
- name: "High bucket utilization"
condition: "usage_percentage > 90"
action: "scale_up_or_add_bucket"
- name: "Bucket health degradation"
condition: "health_score < 50"
action: "investigate_and_rotate"
- name: "Redis connection failure"
condition: "redis_connection_status != 'connected'"
action: "emergency_failover"
Dashboard Configuration
// Grafana dashboard panels
const dashboardConfig = {
panels: [
{
title: "Upload Success Rate",
type: "stat",
query: "sum(rate(uploads_total[5m])) / sum(rate(uploads_attempts[5m]))"
},
{
title: "Bucket Utilization",
type: "graph",
query: "bucket_usage_percentage"
},
{
title: "Response Times",
type: "histogram",
query: "upload_response_time_seconds"
}
]
};
Security Best Practices
Credential Management
- Store R2 credentials in environment variables
- Use IAM roles with minimum required permissions
- Implement credential rotation policies
- Monitor for suspicious API usage
Input Validation
// File size and type validation
if (file.size > MAX_FILE_SIZE) {
return { error: 'File too large' };
}
if (!ALLOWED_MIME_TYPES.includes(file.type)) {
return { error: 'Unsupported file type' };
}
Rate Limiting
// IP-based rate limiting (in-memory; use Redis for multi-instance deployments)
const rateLimit = new Map<string, { count: number; resetTime: number }>();
function checkRateLimit(ip: string, limit: number, windowMs: number): boolean {
const now = Date.now();
const record = rateLimit.get(ip);
if (!record || now > record.resetTime) {
rateLimit.set(ip, { count: 1, resetTime: now + windowMs });
return true;
}
if (record.count >= limit) {
return false;
}
record.count++; // count this request against the window
return true;
}
Conclusion
Building a scalable multi-bucket load balancer for Cloudflare R2 provides several key benefits:
✅ High Availability: Automatic failover ensures 99.9%+ uptime
✅ Scalability: Horizontal scaling through bucket addition
✅ Cost Optimization: Efficient capacity utilization reduces costs
✅ Monitoring: Real-time visibility into system health
✅ Reliability: Atomic operations prevent data inconsistency
The implementation we've covered here is production-ready and can handle enterprise-scale workloads. The modular design allows for easy extension and customization based on specific requirements.
Next Steps
- Deploy to Production: Follow the deployment guide for your environment
- Monitor Performance: Set up monitoring and alerting
- Scale as Needed: Add more buckets as traffic increases
- Optimize Continuously: Fine-tune parameters based on usage patterns
The complete source code is available in our GitHub repository, and we welcome contributions and feedback from the community.
About the Author
This implementation was developed for the Image2URL project, a free image hosting and conversion service that handles millions of uploads monthly. The system has been battle-tested in production and serves as the foundation for our global image infrastructure.
License
This article and the associated code are released under the MIT License. Feel free to use, modify, and distribute in your own projects.