<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vitaly Bicov</title>
    <description>The latest articles on DEV Community by Vitaly Bicov (@vitaly_bykov_dd10957baace).</description>
    <link>https://dev.to/vitaly_bykov_dd10957baace</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2851512%2F2cd0a395-62cb-4e43-9194-72f1c8707e21.png</url>
      <title>DEV Community: Vitaly Bicov</title>
      <link>https://dev.to/vitaly_bykov_dd10957baace</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vitaly_bykov_dd10957baace"/>
    <language>en</language>
    <item>
      <title>Supercharge Your CDN with Cloudflare Workers</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 18 Aug 2025 09:13:45 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/supercharge-your-cdn-with-cloudflare-workers-3h43</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/supercharge-your-cdn-with-cloudflare-workers-3h43</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F169mgk3guoqjl3g6fsc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F169mgk3guoqjl3g6fsc3.png" alt="Supercharge Your CDN with Cloudflare Workers" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern web applications demand instant content delivery, seamless personalization, and global reliability. Yet ask any engineer managing a popular site: when a product launch triggers a traffic surge, even the best CDN sometimes buckles. One major retailer’s Black Friday campaign saw its origin servers grind to a halt, not because the CDN failed, but because cache misses skyrocketed for personalized content. The result? Lost sales and a lesson in the evolving needs of web delivery.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how Cloudflare Workers and edge computing can transform your CDN from a blunt instrument into a scalpel: precise, programmable, and highly efficient. Whether you’re a DevOps engineer, web architect, or performance-focused developer, you’ll learn actionable strategies for cache optimization, dynamic content, personalization, cost control, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: The Evolving Demands on CDNs
&lt;/h2&gt;

&lt;p&gt;Content Delivery Networks (CDNs) have long been the backbone of web performance, pushing static files closer to users worldwide. But today’s web requires more than just static acceleration. Personalized content, user-specific routing, and real-time transformations are now table stakes for user experience.&lt;/p&gt;

&lt;p&gt;As web applications become more dynamic and distributed, so do the challenges of balancing speed, reliability, and cost. That’s where edge computing, and specifically Cloudflare Workers, delivers new tools for the modern engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common CDN Challenges: Cache Efficiency, Dynamic Content, Personalization
&lt;/h2&gt;

&lt;p&gt;When scaling web applications, traditional CDNs often hit roadblocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cache Efficiency: CDNs excel at delivering cacheable static assets (images, CSS, JS). However, dynamic or user-personalized pages often bypass the cache, forcing repeated origin fetches.&lt;/li&gt;
&lt;li&gt;  Dynamic Content: API endpoints, A/B testing, and localization generate unique responses, limiting cache opportunities.&lt;/li&gt;
&lt;li&gt;  Personalization: Cookie-based logic, authentication, and geo-targeted experiences further fragment cacheability.&lt;/li&gt;
&lt;li&gt;  Cost: Increased origin traffic means higher bandwidth bills and potential latency spikes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key pain point:&lt;/strong&gt;  How do you keep performance high and costs low, even as content gets more dynamic and personalized?&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Computing and Cloudflare Workers: A Primer
&lt;/h2&gt;

&lt;p&gt;Edge computing shifts computation from centralized servers to geographically distributed nodes (the “edge”), close to the end user. Cloudflare Workers is a serverless platform that runs lightweight JavaScript, TypeScript, or WASM code directly on Cloudflare’s global edge network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Workers?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Programmability: Inspect, modify, or generate responses at the edge.&lt;/li&gt;
&lt;li&gt;  Performance: Minimal latency, as logic runs close to users.&lt;/li&gt;
&lt;li&gt;  Scalability: No server management; automatic scaling.&lt;/li&gt;
&lt;li&gt;  Security: Mitigate attacks before requests reach your infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Request ──► Cloudflare Edge Node (Worker) ──► Origin (if needed)   
                             │ [Custom Logic]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Request and Response Modification at the Edge
&lt;/h2&gt;

&lt;p&gt;With Workers, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Rewrite requests (change URLs, headers, cookies)&lt;/li&gt;
&lt;li&gt;  Implement custom cache keys&lt;/li&gt;
&lt;li&gt;  Filter or block malicious traffic&lt;/li&gt;
&lt;li&gt;  Modify responses (inject headers, rewrite HTML)&lt;/li&gt;
&lt;/ul&gt;
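&lt;p&gt;The first bullet above, request rewriting, can be sketched as a small helper. This is an illustrative sketch, not Cloudflare's API surface beyond the standard &lt;code&gt;URL&lt;/code&gt; and &lt;code&gt;Request&lt;/code&gt; classes; the &lt;code&gt;/old-api/&lt;/code&gt; path prefix is hypothetical.&lt;/p&gt;

```javascript
// Sketch: compute a rewritten URL before forwarding to the origin.
// The /old-api/ -> /api/v2/ mapping is purely illustrative.
function rewritePath(originalUrl) {
  const url = new URL(originalUrl);
  url.pathname = url.pathname.replace(/^\/old-api\//, '/api/v2/');
  return url.toString();
}

// Inside a Worker you would forward the rewritten request:
//   const rewritten = new Request(rewritePath(request.url), request);
//   return fetch(rewritten);
```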
&lt;h3&gt;
  
  
  Example: Add a Cache-Control Header to API Responses
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;addEventListener('fetch', event =&amp;gt; {  
  event.respondWith(handleRequest(event.request));  
});  

async function handleRequest(request) {  
  let response = await fetch(request);  
  // Clone response so we can modify headers  
  response = new Response(response.body, response);  

  // Add Cache-Control for better CDN caching  
  response.headers.set('Cache-Control', 'public, max-age=60');  
  return response;  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;  Many APIs lack cache directives. By controlling headers at the edge, you unlock CDN caching for previously uncacheable content.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing CDN Optimization with Worker Scripts
&lt;/h3&gt;

&lt;p&gt;Let’s walk through a practical example:  &lt;strong&gt;Dynamic cache key customization based on cookies, geography, or device.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario: Personalizing Cache Keys
&lt;/h3&gt;

&lt;p&gt;Suppose you run an e-commerce site with localized pricing, shown based on user country. By default, your CDN may treat all requests to  &lt;code&gt;/shop&lt;/code&gt;  as the same, resulting in cache collisions or misses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worker script:&lt;/strong&gt;  Customize the cache key using the  &lt;code&gt;CF-Connecting-IP&lt;/code&gt;  or  &lt;code&gt;cf-ipcountry&lt;/code&gt;  header.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;addEventListener('fetch', event =&amp;gt; {  
  event.respondWith(handleRequest(event));  
});  

async function handleRequest(event) {  
  const request = event.request;  
  // Use country from header to personalize cache  
  const country = request.headers.get('cf-ipcountry') || 'US';  
  const url = new URL(request.url);  
  url.searchParams.set('country', country);  

  // Create a custom cache key  
  const cacheKey = new Request(url.toString(), request);  

  // Try to find in cache  
  const cache = caches.default;  
  let response = await cache.match(cacheKey);  
  if (!response) {  
    // Not in cache, fetch from origin and cache the result  
    response = await fetch(request);  
    // Set a short TTL for dynamic personalization  
    response = new Response(response.body, response);  
    response.headers.set('Cache-Control', 'public, max-age=120');  
    event.waitUntil(cache.put(cacheKey, response.clone()));  
  }  
  return response;  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cache is segmented per country, reducing origin hits for localized content.&lt;/li&gt;
&lt;li&gt;  The TTL is tuned for freshness vs. cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Caching Strategies for Dynamic and Personalized Content
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Stale-While-Revalidate
&lt;/h3&gt;

&lt;p&gt;Serve slightly outdated content instantly, while refreshing cache in the background.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response.headers.set('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt;  News headlines, product listings.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Edge-Side Includes (ESI) Simulation
&lt;/h3&gt;

&lt;p&gt;Combine static and dynamic fragments at the edge.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Fetch static shell from cache, dynamic data from API  
const [shell, data] = await Promise.all([  
  cache.match(shellRequest),  
  fetch(dynamicDataRequest)  
]);  
// Merge and respond  
return new Response(await combine(shell, data), { headers });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Device and Language Detection
&lt;/h3&gt;


&lt;p&gt;Customize cache key or response based on User-Agent and Accept-Language.&lt;/p&gt;
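&lt;p&gt;One hedged sketch of this idea: bucket the &lt;code&gt;User-Agent&lt;/code&gt; into coarse device classes and keep only the primary language, so the cache splits into a handful of variants rather than thousands. The bucket names below are assumptions, not a standard.&lt;/p&gt;

```javascript
// Sketch: derive a coarse cache-key suffix from User-Agent and Accept-Language.
// Coarse buckets (mobile/desktop, primary language only) limit cache fragmentation.
function cacheKeySuffix(headers) {
  const ua = headers.get('User-Agent') || '';
  const lang = (headers.get('Accept-Language') || 'en')
    .split(',')[0]   // first preference, e.g. "de-DE"
    .split('-')[0];  // language only, e.g. "de"
  const device = /Mobile|Android|iPhone/i.test(ua) ? 'mobile' : 'desktop';
  return `${device}:${lang}`;
}
```

&lt;p&gt;As in the country example earlier, the suffix can be appended to the URL as a query parameter when constructing the custom cache key.&lt;/p&gt;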

&lt;h2&gt;
  
  
  Real-World Use Cases: A/B Testing, Geolocation Routing, and Bot Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A/B Testing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;  Running an experiment by variant assignment in the browser breaks cache efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;  Assign variant at the edge, cache per variant.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const cookie = request.headers.get('Cookie');  
let variant = getVariantFromCookie(cookie) || assignAndSetCookie(event);  
// Partition cache by variant  
url.searchParams.set('ab_variant', variant);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Geolocation Routing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt;  Redirect users to region-specific domains or serve localized assets.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const country = request.headers.get('cf-ipcountry');  
if (country === 'DE') {  
  return Response.redirect('https://de.example.com' + url.pathname, 302);  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Bot Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;At the edge:&lt;/strong&gt;  Block or challenge suspicious bots before they reach your origin.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (isLikelyBot(request)) {  
  return new Response('Access denied', { status: 403 });  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Monitoring and Measuring Success: CDN Metrics, Cache Hit Rates, and Latency
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What to Track
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cache Hit Ratio: % of requests served from edge cache&lt;/li&gt;
&lt;li&gt;  Origin Bandwidth: Volume of traffic reaching backend servers&lt;/li&gt;
&lt;li&gt;  Latency: Time to first byte (TTFB) from user perspective&lt;/li&gt;
&lt;li&gt;  Error Rates: Monitor for false positives in bot management or misrouted requests&lt;/li&gt;
&lt;/ul&gt;
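&lt;p&gt;A lightweight way to make the cache hit ratio observable is to tag each response at the edge; the &lt;code&gt;X-Edge-Cache&lt;/code&gt; header name below is an assumption, not a Cloudflare standard.&lt;/p&gt;

```javascript
// Sketch: annotate responses so downstream analytics can compute hit ratio.
function tagCacheStatus(response, hit) {
  // Clone so headers are mutable
  const tagged = new Response(response.body, response);
  tagged.headers.set('X-Edge-Cache', hit ? 'HIT' : 'MISS');
  return tagged;
}
```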

&lt;h3&gt;
  
  
  Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cloudflare Analytics: Built-in dashboard for traffic, cache, and performance&lt;/li&gt;
&lt;li&gt;  Logpush/Logpull: Stream edge logs to your SIEM or analytics platform&lt;/li&gt;
&lt;li&gt;  Custom Metrics: Send data to Datadog, Prometheus, or Cloudflare Workers Analytics Engine&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Optimization: Reducing Bandwidth and Origin Load
&lt;/h3&gt;

&lt;p&gt;By enabling edge-side processing and advanced caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Lower Origin Costs: Fewer requests and bytes sent to your infrastructure&lt;/li&gt;
&lt;li&gt;  Reduced Egress Fees: Especially critical for cloud-hosted origins (e.g., AWS, GCP)&lt;/li&gt;
&lt;li&gt;  Faster User Experiences: Less round-trip time to origin, better conversion rates&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Origin Shielding with Workers
&lt;/h3&gt;

&lt;p&gt;Workers can act as a shield, absorbing unnecessary origin requests during traffic surges.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let response = await cache.match(cacheKey);  
if (response) {  
  return response; // Served from edge, skip origin cost  
}  
// Otherwise fetch from origin and cache as above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Best Practices, Pitfalls, and Future Trends
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Keep logic minimal: Edge compute is powerful but should be fast and stateless.&lt;/li&gt;
&lt;li&gt;  Monitor for Edge Cache Fragmentation: Too many cache keys can lower hit rates.&lt;/li&gt;
&lt;li&gt;  Leverage Feature Flags: Gradually roll out worker logic.&lt;/li&gt;
&lt;li&gt;  Test in Staging: Always validate behavior in a non-production environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfalls &amp;amp; Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cold Start Latency: Initial worker startup may add milliseconds, but is usually negligible.&lt;/li&gt;
&lt;li&gt;  Execution Timeouts: Workers enforce strict CPU-time limits (roughly 10 ms per invocation on the Free plan, higher on paid plans).&lt;/li&gt;
&lt;li&gt;  Complex State: Workers are stateless; use KV or Durable Objects for persistent data.&lt;/li&gt;
&lt;/ul&gt;
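&lt;p&gt;As a hedged sketch of the last point: persistent data lives outside the Worker, for example in Workers KV. The &lt;code&gt;MY_KV&lt;/code&gt; binding name and the flag-key scheme below are hypothetical; real bindings are declared in &lt;code&gt;wrangler.toml&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch: read small persistent state from a KV binding.
// KV is eventually consistent and globally replicated.
async function getFeatureFlag(env, name) {
  const value = await env.MY_KV.get(`flag:${name}`);
  return value === 'on';
}
```

&lt;p&gt;With module syntax, &lt;code&gt;env&lt;/code&gt; is the second argument of the &lt;code&gt;fetch&lt;/code&gt; handler; with the older &lt;code&gt;addEventListener&lt;/code&gt; syntax used in this article, KV bindings appear as globals instead.&lt;/p&gt;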

&lt;h3&gt;
  
  
  Future Trends
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Deeper Personalization: Edge AI/ML for content adaptation&lt;/li&gt;
&lt;li&gt;  Edge Data Stores: Real-time, globally distributed state&lt;/li&gt;
&lt;li&gt;  Integrated Observability: Native metrics, traces, and logs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Future of CDN Optimization at the Edge
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers, and edge computing in general, are redefining what’s possible in CDN performance and flexibility. By bringing code execution and caching closer to your users, you can finally deliver personalized, dynamic, and blazingly fast experiences without ballooning costs or complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Edge compute unlocks advanced CDN strategies (personalization, A/B testing, bot defense)&lt;/li&gt;
&lt;li&gt;  Programmable logic at the edge means fewer origin hits, lower costs, and happier users&lt;/li&gt;
&lt;li&gt;  Careful monitoring, cache key management, and simplicity are crucial for success&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to level up? Dive deeper into edge patterns, try Cloudflare Workers in your stack, and start measuring the difference. The next generation of web performance is running at the edge.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/supercharge-your-cdn-with-cloudflare-workers" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on August 18, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Predictive Auto-Scaling for Stateful Apps</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 11 Aug 2025 09:13:22 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/predictive-auto-scaling-for-stateful-apps-4cp</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/predictive-auto-scaling-for-stateful-apps-4cp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction: The Challenge of Stateful Scaling
&lt;/h1&gt;

&lt;p&gt;Picture this: On Black Friday, a global e-commerce giant’s order-processing system is humming along, scaling web servers seamlessly as customer traffic surges. Yet, deep in the backend, the payment database cluster struggles, unable to keep up with demand spikes. Transactions queue up. Latency grows. Revenue — literally — slips away.&lt;/p&gt;

&lt;p&gt;Auto-scaling stateless services is a solved problem. But getting stateful apps like databases, message queues, and cache clusters to scale predictively and reliably? That’s where the real pain starts for DevOps teams.&lt;/p&gt;

&lt;p&gt;This article is for cloud engineers, SREs, DevOps leads, and architects who are tasked with making stateful applications as elastic, resilient, and cost-efficient as their stateless counterparts. You’ll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Why stateful services are hard to scale&lt;/li&gt;
&lt;li&gt;  How predictive algorithms (time series &amp;amp; ML) can help&lt;/li&gt;
&lt;li&gt;  Practical implementation strategies: custom metrics, scaling policies, data management&lt;/li&gt;
&lt;li&gt;  Real-world examples, pitfalls, and best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;

&lt;h1&gt;
  
  
  Stateful vs. Stateless: Key Differences in Scaling
&lt;/h1&gt;

&lt;p&gt;Before we tackle solutions, let’s clarify what’s at stake:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Stateless apps (e.g., web frontends, API gateways) store no client/session data locally. Instances can be created or destroyed at will.&lt;/li&gt;
&lt;li&gt;  Stateful apps (e.g., databases, message brokers, cache servers) hold critical data that must persist and synchronize across nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scaling stateless workloads:&lt;/strong&gt;&lt;br&gt;
Easy: just add or remove instances based on CPU, memory, or latency metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling stateful workloads:&lt;/strong&gt;&lt;br&gt;
Hard: because you must also ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Data integrity&lt;/li&gt;
&lt;li&gt;  Consistent state across replicas&lt;/li&gt;
&lt;li&gt;  Reliable data persistence and recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Barriers to Scaling Stateful Apps
&lt;/h1&gt;

&lt;p&gt;Let’s break down the main hurdles:&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Consistency and Integrity
&lt;/h1&gt;

&lt;p&gt;Scaling out a stateful app means adding nodes that must sync with existing data-without risking loss or corruption.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Distributed databases (like MongoDB or Cassandra) need strict consistency protocols.&lt;/li&gt;
&lt;li&gt;  Sharding and replication must be coordinated to avoid split-brain scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Startup Time and Synchronization
&lt;/h1&gt;

&lt;p&gt;Bringing new stateful nodes online isn’t instant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Nodes must fetch data snapshots or stream state from peers.&lt;/li&gt;
&lt;li&gt;  Full sync can take minutes or more, especially under heavy load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Resource Allocation Complexities
&lt;/h1&gt;

&lt;p&gt;It’s not just about CPU/RAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Persistent storage:&lt;/strong&gt;  Each instance requires unique, durable storage (PersistentVolumes in Kubernetes, for example).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Network:&lt;/strong&gt;  Data replication and synchronization add network overhead.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Affinity/anti-affinity:&lt;/strong&gt;  Pods/nodes must be scheduled to minimize risk of data loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Predictive Algorithms for Scaling
&lt;/h1&gt;

&lt;p&gt;Reactive scaling (e.g., “add node if CPU &amp;gt; 80%”) is too little, too late for stateful apps. Predictive approaches let you scale  &lt;strong&gt;ahead of demand spikes&lt;/strong&gt;, ensuring new nodes are ready in time.&lt;/p&gt;

&lt;h1&gt;
  
  
  Time Series Analysis: Forecasting Demand
&lt;/h1&gt;

&lt;p&gt;Classic statistical methods (ARIMA, Holt-Winters, Prophet) can predict future load based on historical metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample: Using Prophet to Forecast Cassandra Query Load&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fbprophet import Prophet  
import pandas as pd  

# Load historical QPS data  
df = pd.read_csv('cassandra_qps_history.csv')  
df.columns = ['ds', 'y']  # Prophet expects 'ds' (timestamp), 'y' (value)  

model = Prophet()  
model.fit(df)  
future = model.make_future_dataframe(periods=24, freq='H')  
forecast = model.predict(future)  

# Print prediction for next 6 hours  
print(forecast[['ds', 'yhat']].tail(6))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deploy these forecasts into your scaling logic to trigger scale-ups before the rush hits.&lt;/p&gt;
&lt;h1&gt;
  
  
  Machine Learning Models: Beyond Simple Thresholds
&lt;/h1&gt;

&lt;p&gt;ML models (regression, LSTM, XGBoost) can learn complex patterns-seasonality, sudden bursts, multi-metric correlations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Feature engineering: Include business events (e.g., marketing promotions), user signups, or external signals.&lt;/li&gt;
&lt;li&gt;  Model deployment: Serve predictions via REST APIs or batch pipelines integrated with your scaling controllers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: ML-based Scaling Trigger&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2  
kind: HorizontalPodAutoscaler  
metadata:  
  name: stateful-db-autoscaler  
spec:  
  scaleTargetRef:  
    apiVersion: apps/v1  
    kind: StatefulSet  
    name: my-db  
  minReplicas: 3  
  maxReplicas: 10  
  metrics:  
    - type: External  
      external:  
        metric:  
          name: predicted_write_throughput  
        target:  
          type: Value  
          value: "5000"  # Predicted QPS threshold from ML model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Here, the target metric (predicted_write_throughput) is supplied by a custom ML service.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Designing Custom Metrics and Scaling Policies
&lt;/h1&gt;

&lt;p&gt;Relying on CPU or memory is rarely enough. Build richer signals.&lt;/p&gt;

&lt;h1&gt;
  
  
  Identifying Relevant Signals
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Request rates (QPS, TPS)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Queue length/lag (Kafka, RabbitMQ)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Replication lag&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Disk IOPS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Business events (campaign launches, news cycles)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Tip:&lt;/em&gt;  Use Prometheus exporters or custom sidecars to surface these metrics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Integrating Predictions into Auto-Scaling Workflows
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt; Train and deploy your forecasting/ML model.&lt;/li&gt;
&lt;li&gt; Expose predictions as a metrics endpoint (&lt;code&gt;/metrics&lt;/code&gt;  or push to Prometheus).&lt;/li&gt;
&lt;li&gt; Configure your orchestration platform (Kubernetes HPA/VPA, custom controller) to act on these predictions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Prometheus Adapter Example:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1  
kind: Service  
metadata:  
  name: prediction-metrics  
spec:  
  selector:  
    app: ml-predictor  
  ports:  
    - protocol: TCP  
      port: 8080  
      targetPort: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Your HPA can now use these custom metrics for scaling triggers.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Ensuring Application Readiness During Scaling Events
&lt;/h1&gt;

&lt;p&gt;Scaling a stateful app means parts of your system will be unavailable or degraded during transitions. Minimize risk:&lt;/p&gt;

&lt;h1&gt;
  
  
  Health Checks and Readiness Probes
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Liveness probe: Restart unresponsive pods.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Readiness probe: Only send traffic to nodes that are fully initialized and synced.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readinessProbe:  
  exec:  
    command: ["/bin/check_db_synced.sh"]  
  initialDelaySeconds: 30  
  periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Graceful Startup and Shutdown
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Delay taking new traffic until state sync is complete.&lt;/li&gt;
&lt;li&gt;  On scale-down,  &lt;strong&gt;drain connections&lt;/strong&gt;  and  &lt;strong&gt;move or flush data&lt;/strong&gt;  safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt;  Abrupt pod deletion can cause data loss or split-brain. Always use preStop hooks and finalizers.&lt;/p&gt;
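&lt;p&gt;A minimal sketch of such a preStop hook (the &lt;code&gt;/scripts/drain.sh&lt;/code&gt; path is a placeholder for your database’s drain or flush command; pair it with a generous &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt; so the drain can finish):&lt;/p&gt;

```yaml
# Container-level lifecycle hook: runs before SIGTERM is delivered.
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "/scripts/drain.sh"]
```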

&lt;h1&gt;
  
  
  Managing Data Persistence and Volume Lifecycle
&lt;/h1&gt;

&lt;p&gt;Scaling up or down means handling persistent storage with care.&lt;/p&gt;

&lt;h1&gt;
  
  
  Persistent Volume Strategies
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Dynamic provisioning: Use StorageClasses to automate volume creation per replica.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Retain policy: Avoid deleting volumes until you know data is migrated.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1  
kind: StorageClass  
metadata:  
  name: fast-ssd  
provisioner: kubernetes.io/gce-pd  
reclaimPolicy: Retain
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Backups and Migration During Scaling
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Snapshot before scaling: Ensure you can roll back if sync fails.&lt;/li&gt;
&lt;li&gt;  Automate backups: Integrate with Velero, Stash, or native cloud snapshots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Pre-Scale Backup Job&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: my-backup-tool
        command: ["/backup.sh"]
      restartPolicy: OnFailure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Monitoring and Observability
&lt;/h1&gt;

&lt;p&gt;You can’t improve what you can’t see.&lt;/p&gt;

&lt;h1&gt;
  
  
  Tracking Scaling Events and Performance
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Dashboards: Grafana panels for historical scaling actions, node health, replication lag, failovers.&lt;/li&gt;
&lt;li&gt;  Alerts: Notify on abnormal scaling frequency, pod crashes, or sync errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Cost Optimization Analysis
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Correlate resource usage and spend: Are you over-provisioning to “play it safe”?&lt;/li&gt;
&lt;li&gt;  Post-mortems: Analyze scale-up/scale-down timing versus actual demand to fine-tune predictive models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Case Studies
&lt;/h1&gt;

&lt;p&gt;Let’s look at how real-world teams solve these challenges.&lt;/p&gt;

&lt;h1&gt;
  
  
  Auto-Scaling Database Clusters (MongoDB &amp;amp; Cassandra)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Slow to scale due to data copy/sync; risk of inconsistent reads.&lt;/li&gt;
&lt;li&gt;  Solution: Predict spikes using ARIMA; start new nodes 20 minutes ahead. Use readiness probes to ensure only fully-synced nodes receive traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Scaling Message Queue Systems (Kafka)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Consumer lag spikes during flash sales; adding brokers mid-event is too slow.&lt;/li&gt;
&lt;li&gt;  Solution: ML model predicts high-lag events from website traffic and product launches. Brokers pre-provisioned, partitions rebalanced gradually.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Caching Layer Elasticity (Redis, Memcached)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Traffic bursts cause cache misses and backend overload.&lt;/li&gt;
&lt;li&gt;  Solution: Time-series forecasting triggers cache node warmups, pre-populating popular keys before peak hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Best Practices and Lessons Learned
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Don’t rely solely on resource metrics.&lt;/strong&gt;  Use business-aware, custom signals.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Always bake in time for sync and warmup.&lt;/strong&gt;  Predictive scaling is about  &lt;em&gt;when&lt;/em&gt;  not just  &lt;em&gt;how much&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Automate backups and test recovery.&lt;/strong&gt;  Assume node loss and plan for graceful failover.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Monitor everything.&lt;/strong&gt;  Invest in end-to-end observability and cost analytics.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Iterate.&lt;/strong&gt;  Initial models will be wrong-learn and refine with real production data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion: The Future of Predictive Scaling for Stateful Workloads
&lt;/h1&gt;

&lt;p&gt;Stateful auto-scaling isn’t just a technical feat-it’s an operational imperative for modern, cost-effective cloud-native systems. By combining predictive analytics with robust engineering practices around data, orchestration, and observability, you can make your stateful apps as agile as the cloud promises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Predictive scaling bridges the gap between slow, risky stateful scale and fast-changing business demand.&lt;/li&gt;
&lt;li&gt;  Custom metrics and readiness checks are non-negotiable.&lt;/li&gt;
&lt;li&gt;  Invest in automation, monitoring, and continuous improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Explore serverless databases, operator patterns for complex stateful services, and advanced ML for even smarter scaling. The future is predictive; get ahead of the curve.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/predictive-auto-scaling-for-stateful-apps" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on August 11, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mastering Redis Clusters: Sharding &amp; Monitoring</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 28 Jul 2025 08:24:10 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/mastering-redis-clusters-sharding-monitoring-6ni</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/mastering-redis-clusters-sharding-monitoring-6ni</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi7yq1p1yojkzwyr8na7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi7yq1p1yojkzwyr8na7.png" alt="# Mastering Redis Clusters: Sharding &amp;amp; Monitoring" width="700" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction: Why Redis Clustering Matters
&lt;/h1&gt;

&lt;p&gt;Imagine you’re managing a high-traffic e-commerce platform. Black Friday hits, and suddenly millions of shoppers are racing through your checkout. Your monolithic Redis instance, once sufficient, now buckles under the load, causing timeouts and lost sales. Sound familiar?&lt;/p&gt;

&lt;p&gt;As organizations scale, single-node Redis deployments eventually become bottlenecks. Redis Clustering offers a resilient, horizontally-scalable architecture with automated sharding and failover. But configuring, operating, and monitoring Redis clusters for production is non-trivial: sharding can be opaque, and cluster health issues can escalate rapidly if not caught early.&lt;/p&gt;

&lt;p&gt;This guide is for DevOps engineers, SREs, and backend developers who want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Confidently deploy and operate Redis clusters at scale&lt;/li&gt;
&lt;li&gt;  Understand how data is distributed via sharding&lt;/li&gt;
&lt;li&gt;  Monitor, maintain, and scale clusters for high availability and performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in and master Redis Clusters from the ground up.&lt;/p&gt;

&lt;h1&gt;
  
  
  Redis Cluster Fundamentals
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Data Distribution and Hash Slots
&lt;/h2&gt;

&lt;p&gt;Redis Cluster distributes data using a concept called  &lt;strong&gt;hash slots&lt;/strong&gt;. There are 16,384 hash slots, and each key maps to one slot. Cluster nodes own subsets of these slots, forming the basis of automatic sharding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Sharding&lt;/strong&gt;: Each node manages a subset of the keyspace.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Even Distribution&lt;/strong&gt;: Reduces risk of hot spots and balances load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Replication and Consistency Models
&lt;/h2&gt;

&lt;p&gt;Redis Cluster offers  &lt;em&gt;asynchronous&lt;/em&gt;  replication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Master nodes&lt;/strong&gt;: Store the data and handle writes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Replica nodes&lt;/strong&gt;: Maintain copies of master data for failover and read scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consistency is  &lt;strong&gt;eventual&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Writes are acknowledged after being committed to the master.&lt;/li&gt;
&lt;li&gt;  Reads from replicas may be stale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Automatic Failover
&lt;/h2&gt;

&lt;p&gt;If a master node fails, the cluster  &lt;em&gt;automatically&lt;/em&gt;  promotes one of its replicas to master, minimizing downtime and removing manual intervention from the critical path.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up a Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Node Configuration and Key Settings
&lt;/h2&gt;

&lt;p&gt;Let’s create a basic 6-node cluster (3 masters, 3 replicas) using Docker for local testing:&lt;/p&gt;

&lt;p&gt;Create a Docker network for the cluster nodes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker network create redis-cluster-net
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Start the six Redis nodes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for port in 7000 7001 7002 7003 7004 7005; do   
  docker run -d --name redis-$port --net redis-cluster-net \  
    -p $port:6379 \  
    -v $(pwd)/redis-$port.conf:/usr/local/etc/redis/redis.conf \  
    redis:7.2-alpine redis-server /usr/local/etc/redis/redis.conf  
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A minimal  &lt;code&gt;redis-7000.conf&lt;/code&gt;  would include:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;port 6379   
cluster-enabled yes   
cluster-config-file nodes.conf   
cluster-node-timeout 5000   
appendonly yes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Gotcha:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;Make sure&lt;/em&gt; &lt;code&gt;cluster-enabled&lt;/code&gt; &lt;em&gt;is&lt;/em&gt; &lt;code&gt;yes&lt;/code&gt; &lt;em&gt;and each node has a unique config file.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Slot Allocation and Replication Topology
&lt;/h2&gt;

&lt;p&gt;Join the nodes into a cluster using  &lt;code&gt;redis-cli&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it --rm --net redis-cluster-net redis:7.2-alpine \  
  redis-cli --cluster create \  
    172.18.0.2:6379 172.18.0.3:6379 172.18.0.4:6379 \  
    172.18.0.5:6379 172.18.0.6:6379 172.18.0.7:6379 \  
    --cluster-replicas 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Allocates hash slots evenly across the 3 masters.&lt;/li&gt;
&lt;li&gt;  Assigns 1 replica per master (the other 3 nodes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify status:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Automated Sharding in Redis
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Understanding Hash Slots
&lt;/h2&gt;

&lt;p&gt;Every key is assigned to a slot via CRC16(key) mod 16384. The cluster maps slots to nodes. For example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster keyslot mykey123  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Output: a slot number (e.g., 15365).&lt;/p&gt;

&lt;p&gt;This determines which node stores  &lt;code&gt;mykey123&lt;/code&gt;.&lt;/p&gt;
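&lt;p&gt;If you want to see the slot mapping without a running cluster, the CRC16 variant Redis Cluster uses (XMODEM, polynomial 0x1021) is easy to reproduce. The sketch below, including the {hash tag} rule that forces related keys into the same slot, uses plain arithmetic in place of bit operators:&lt;/p&gt;

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc = crc ^ byte * 256         # mix the next byte into the high 8 bits
        for _ in range(8):
            high_bit = crc // 32768    # top bit before shifting
            crc = (crc * 2) % 65536    # shift left one bit, keep 16 bits
            if high_bit:
                crc = crc ^ 4129       # XOR with the polynomial 0x1021
    return crc

def keyslot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:            # non-empty tag: hash only its contents
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(keyslot("123456789"))  # 12739 (CRC16 = 0x31C3, the cluster spec's reference value)
print(keyslot("{user:100}.cart") == keyslot("{user:100}.orders"))  # True: same tag, same slot
```

&lt;p&gt;The hash-tag rule is what lets multi-key operations work in a cluster: keys sharing a tag land on the same node.&lt;/p&gt;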
&lt;h2&gt;
  
  
  Automated Resharding
&lt;/h2&gt;

&lt;p&gt;Need to redistribute data as you add nodes? Use  &lt;code&gt;redis-cli --cluster reshard&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli --cluster reshard 127.0.0.1:7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Interactive prompts will let you move slots between nodes, with the cluster handling key migration.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Tip:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;For zero-downtime resharding, do it during off-peak hours and monitor latency!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Scaling the Cluster Dynamically
&lt;/h2&gt;

&lt;p&gt;Adding a new node:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Start the new Redis node (e.g., on port 7006).&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add it to the cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Monitoring Your Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cluster Health and Node Status
&lt;/h2&gt;

&lt;p&gt;Monitor with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;cluster_state:ok&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cluster_slots_assigned:16384&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cluster_known_nodes&lt;/code&gt;: number of nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automated health checks can poll this endpoint and alert if state is not  &lt;code&gt;ok&lt;/code&gt;.&lt;/p&gt;
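&lt;p&gt;A minimal health-check helper might parse that output directly (a hypothetical sketch; in practice you would fetch the text through your Redis client and wire the result into alerting):&lt;/p&gt;

```python
def parse_cluster_info(raw: str) -> dict:
    """Parse the key:value lines returned by CLUSTER INFO into a dict."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if ":" in line:
            key, _, value = line.partition(":")
            info[key] = value
    return info

def cluster_healthy(info: dict) -> bool:
    """Healthy means state is ok and all 16384 slots are assigned."""
    return info.get("cluster_state") == "ok" and info.get("cluster_slots_assigned") == "16384"

sample = "cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_known_nodes:6"
print(cluster_healthy(parse_cluster_info(sample)))  # True
```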

&lt;h2&gt;
  
  
  Tracking Memory Usage and Key Distribution
&lt;/h2&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 info memory   
redis-cli -c -p 7000 cluster nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;  Ensure memory usage is balanced.&lt;/li&gt;
&lt;li&gt;  Check for slot imbalances or node failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple script to check slot distribution:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster nodes | grep master | awk '{print $2, $9}'&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
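&lt;p&gt;If you prefer structured output over awk, the same check can be done in Python. This sketch parses CLUSTER NODES lines (slot assignments start at the ninth whitespace-separated field) and totals the slots each master owns:&lt;/p&gt;

```python
def slots_per_master(cluster_nodes_output: str) -> dict:
    """Total the hash slots each master owns, from CLUSTER NODES output."""
    counts = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) >= 8 and "master" in fields[2]:
            total = 0
            for token in fields[8:]:
                if token.startswith("["):
                    continue                        # slot in migration, skip
                if "-" in token:
                    start, _, end = token.partition("-")
                    total += int(end) - int(start) + 1
                else:
                    total += 1
            counts[fields[1]] = total
    return counts

# Hypothetical two-master, one-replica snippet of CLUSTER NODES output
sample = (
    "nodeA 172.18.0.2:6379@16379 myself,master - 0 0 1 connected 0-5460\n"
    "nodeB 172.18.0.3:6379@16379 master - 0 0 2 connected 5461-10922\n"
    "nodeC 172.18.0.4:6379@16379 slave nodeB 0 0 2 connected"
)
print(slots_per_master(sample))  # slot count per master address (5461 and 5462 here)
```

&lt;p&gt;A healthy, balanced cluster shows roughly equal counts; a large skew is a cue to reshard.&lt;/p&gt;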
&lt;h2&gt;
  
  
  Alerting and Visualization Tools
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prometheus + Redis Exporter&lt;/strong&gt;: Collect metrics from all nodes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Grafana Dashboards&lt;/strong&gt;: Visualize memory, command rates, slot distribution.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Alertmanager&lt;/strong&gt;: Notification on health or performance anomalies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:  &lt;a href="https://github.com/oliver006/redis_exporter" rel="noopener noreferrer"&gt;oliver006/redis_exporter&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Operational Playbook
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Node Maintenance and Upgrades
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Rolling Upgrades&lt;/strong&gt;: Upgrade replica nodes first, then promote and upgrade masters one by one.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Graceful Failover&lt;/strong&gt;: Use  &lt;code&gt;CLUSTER FAILOVER&lt;/code&gt;  to promote a replica before taking a master down for maintenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example sequence:&lt;/p&gt;

&lt;p&gt;On a replica node:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7003 cluster failover
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Failure Recovery Procedures
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Automatic Failover&lt;/strong&gt;: The cluster promotes replicas automatically.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Manual Intervention&lt;/strong&gt;: If all replicas are lost, restore from backups.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rejoining&lt;/strong&gt;: Use  &lt;code&gt;CLUSTER MEET&lt;/code&gt;  to re-add recovered nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Common Mistake:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;Not maintaining up-to-date replicas; if a master and all its replicas fail, data loss is possible.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Performance Tuning Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Enable  &lt;code&gt;appendonly yes&lt;/code&gt;  for durability.&lt;/li&gt;
&lt;li&gt;  Tune  &lt;code&gt;cluster-node-timeout&lt;/code&gt;  (default 15s) for your network latency.&lt;/li&gt;
&lt;li&gt;  Monitor for large keys or hot slots; consider client-side sharding if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Integrating Clients with Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Connection Handling and Discovery
&lt;/h2&gt;

&lt;p&gt;Use Redis Cluster-aware clients (e.g.,  &lt;a href="https://github.com/redis/jedis" rel="noopener noreferrer"&gt;Jedis&lt;/a&gt;,  &lt;a href="https://lettuce.io/" rel="noopener noreferrer"&gt;lettuce&lt;/a&gt;,  &lt;a href="https://github.com/Grokzen/redis-py-cluster" rel="noopener noreferrer"&gt;redis-py-cluster&lt;/a&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Clients discover the cluster topology at startup.&lt;/li&gt;
&lt;li&gt;  On  &lt;code&gt;MOVED&lt;/code&gt;  or  &lt;code&gt;ASK&lt;/code&gt;  responses, they reroute requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example (Python):&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from rediscluster import RedisCluster

rc = RedisCluster(startup_nodes=[{"host": "127.0.0.1", "port": "7000"}], decode_responses=True)
rc.set("user:100", "alice")
print(rc.get("user:100"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Error Recovery Strategies
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;MOVED/ASK errors&lt;/strong&gt;: Cluster-aware clients handle these and re-route transparently.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Retries&lt;/strong&gt;: Implement retry logic for transient failures.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Backoff strategies&lt;/strong&gt;: For network partitions or failover, use exponential backoff.&lt;/li&gt;
&lt;/ul&gt;
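&lt;p&gt;The retry-with-backoff idea can be sketched in a few lines (hypothetical names; a real client would scope this to the specific transient exceptions your Redis library raises):&lt;/p&gt;

```python
import random
import time

def with_backoff(operation, max_attempts=5, base_delay=0.05):
    """Retry a flaky operation with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Hypothetical command that fails twice while a failover completes
state = {"calls": 0}
def flaky_get():
    state["calls"] += 1
    if state["calls"] > 2:
        return "alice"
    raise ConnectionError("node failing over")

print(with_backoff(flaky_get))  # alice, after two backoff retries
```

&lt;p&gt;Capping attempts and adding jitter prevents retry storms when many clients hit the same failing node at once.&lt;/p&gt;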

&lt;h1&gt;
  
  
  Load Balancing Approaches
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Let the client connect to any node; the cluster will direct requests.&lt;/li&gt;
&lt;li&gt;  For maximum resiliency, provide multiple startup nodes.&lt;/li&gt;
&lt;li&gt;  Avoid single-node proxies, which can become bottlenecks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion: Best Practices and Takeaways
&lt;/h1&gt;

&lt;p&gt;Redis Clustering unlocks massive scalability and high availability for demanding workloads. Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sharding is handled automatically via hash slots-understand slot allocation for effective troubleshooting.&lt;/li&gt;
&lt;li&gt;  Monitor cluster state, memory, and slot balance proactively; integrate with tools like Prometheus and Grafana.&lt;/li&gt;
&lt;li&gt;  Master operational playbooks for upgrades, resharding, and failure recovery.&lt;/li&gt;
&lt;li&gt;  Use cluster-aware clients and implement robust error recovery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What’s next?&lt;/strong&gt;  Explore advanced topics like multi-datacenter clusters, tuning persistence for your durability needs, and integrating Redis cluster with cloud orchestration (Kubernetes, managed Redis services).&lt;/p&gt;

&lt;p&gt;By mastering Redis Cluster internals and monitoring, you’ll confidently scale your data layer-no matter how high the traffic spikes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/mastering-redis-clusters-sharding-monitoring" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on July 28, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Microsoft’s Majorana-1 Quantum Chip: The Future Is Closer Than You Think</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 11:45:47 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/microsofts-majorana-1-quantum-chip-the-future-is-closer-than-you-think-b2</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/microsofts-majorana-1-quantum-chip-the-future-is-closer-than-you-think-b2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kat34hubuig64x7gpt8.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kat34hubuig64x7gpt8.jpeg" alt="Microsoft’s Majorana-1 Quantum Chip: The Future Is Closer Than You Think" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction: Quantum Computing Becomes Reality&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In February 2025, Microsoft sent shockwaves through the tech world by revealing Majorana-1, its first quantum computing chip. And this is no ordinary quantum chip: it’s a completely new paradigm. Built on a topological state of matter, Majorana-1 is claimed to bring stability and scalability to quantum computing, and for good reason. We’re talking about a chip aimed at solving problems today’s top supercomputers cannot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Majorana-1 Works: Topological Magic and Majorana Particles&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
So, what’s so special about this chip? The magic lies in topological quantum computing. Conventional qubits are fragile and error-prone. Topological qubits, by contrast, are much sturdier. That sturdiness comes from anyons: quirky quasiparticles that arise in two-dimensional systems. The real magic appears when anyons are braided; the pattern in which they are braided encodes quantum computations and is inherently resistant to errors.&lt;/p&gt;

&lt;p&gt;But the hero of the piece? Majorana particles. These are particles that are their own antiparticles, something that sounds like it came straight out of  &lt;em&gt;Star Trek&lt;/em&gt;  but is, in reality, serious business. Majorana particles yield topologically stable qubits, resistant to the noise that otherwise interferes with quantum computers.&lt;/p&gt;

&lt;p&gt;The chip combines aluminum (a superconductor) with indium arsenide (a semiconductor) in what is described as a  &lt;em&gt;topoconductor&lt;/em&gt;. For now, it contains only eight qubits, but Microsoft aims to scale the design to a jaw-dropping one million qubits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current Status: A Glimpse of the Future&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
At this stage, Majorana-1 is a prototype. With only eight qubits onboard, it’s not quite ready for prime time: you won’t be spinning it up on Azure today. Don’t let the low figure fool you, though. Microsoft says a fault-tolerant quantum computer is years, not decades, away. And this is no empty promise: the work falls under DARPA’s Underexplored Systems for Utility-Scale Quantum Computing (US2QC) program.&lt;/p&gt;

&lt;p&gt;The chip’s reported error rate of around 1% is remarkable: a huge improvement over existing hardware, and something that might bring large-scale quantum computing sooner than anybody hoped.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb03gqiqsqhfqy6hy364.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb03gqiqsqhfqy6hy364.jpeg" alt="Quantum simulation of interactions between molecules" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters: Real-World Significance and Game-Changing Implications&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If Microsoft succeeds in reaching its one-million-qubit goal, the implications are huge. Here’s how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Drug Discovery:&lt;/strong&gt;  Quantum simulation of molecular interactions could revolutionize how drugs are developed, speeding up timelines and cutting costs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Materials Science:&lt;/strong&gt;  Understanding advanced materials at the quantum level could yield breakthroughs in energy storage and nanotechnology.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optimization Problems:&lt;/strong&gt;  From logistics to supply chains, quantum computers may solve problems that would take classical computers centuries.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI and Machine Learning:&lt;/strong&gt;  Quantum AI could produce faster, better algorithms than any in use today.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cryptography:&lt;/strong&gt;  Quantum computers could break today’s cryptographic schemes, but in doing so would push the world toward new, quantum-resistant cryptography.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft’s plan to integrate Majorana-1 into Azure cloud services could bring quantum computing to a much larger audience, bringing down to earth a technology that long seemed out of reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges: The Road to Progress Is Not Smooth&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Of course, it’s not all rainbows and unicorns. There are serious barriers to overcome. The very existence of Majorana particles is contested in some corners of the scientific world, and earlier claims have been walked back. Material defects and practical operating issues cast a shadow as well.&lt;/p&gt;

&lt;p&gt;However, Microsoft is undeterred. Recent results reported in  &lt;em&gt;Nature&lt;/em&gt;  suggest the team has achieved low-error milestones, lending weight to its ambitious plans. Still, until a scalable, fault-tolerant quantum computer is running live applications, doubts will remain.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3887fu3xrhf3qd4sav5u.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3887fu3xrhf3qd4sav5u.jpeg" alt="Microsoft's chip" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Bigger Picture: How Microsoft Sizes Up&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft isn’t alone in this race. Competitors such as Google, IBM, IonQ, and Rigetti have roadmaps of their own. Google’s Willow chip and IBM’s 1,121-qubit Condor processor are formidable in their own right. Nonetheless, Microsoft’s topological approach might have an edge: stability and scalability that everyone else may end up chasing.&lt;/p&gt;

&lt;p&gt;While some industry leaders, including Nvidia CEO Jensen Huang, believe practical quantum computing is decades away, Microsoft is betting otherwise. If they’re right, quantum computing is only a few years away from redefining everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion: Quantum Computing, Nearer Than You Realize&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft’s Majorana-1 is more than a chip: it’s a solid push toward making quantum computing real, reliable, and scalable. With plans to scale to a million qubits and integrate seamlessly with Azure, Microsoft is positioned to lead a technology revolution.&lt;/p&gt;

&lt;p&gt;Sure, there are barriers. But if Majorana-1 succeeds, we’re looking at an age where quantum computing is no longer confined to laboratory testing but becomes a game-changer in industry, medicine, and AI. The quantum age is no longer just on the horizon; it may be closer than ever.&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
    <item>
      <title>CPU in Linux. Load Average</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 10:24:12 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/cpu-in-linux-load-average-1ci</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/cpu-in-linux-load-average-1ci</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27u9qa956qpgrlwa8cmi.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27u9qa956qpgrlwa8cmi.webp" alt="CPU in Linux. Load Average" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Load Average is an important metric in Linux for assessing average system load. It represents the average number of processes that are running or waiting in the run queue, over 1-, 5-, and 15-minute intervals. Compared to CPU utilization alone, Load Average gives system administrators a deeper understanding of the current load.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Evolution of Load Average&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;This metric wasn’t always so versatile. Prior to 1993 it reflected only the CPU load average (as on other Unix systems of the time) and didn’t account for other resource demands. Everything changed with a patch released on Friday, October 29, 1993, whose author stated:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The kernel only counts ‘runnable’ processes when computing the load average. I don’t like that; the problem is that processes which are swapping or waiting on ‘fast,’ i.e. noninterruptible, I/O, also consume resources. It seems somewhat nonintuitive that the load average goes down when you replace your fast swap disk with a slow swap disk… Anyway, the following patch seems to make the load average much more consistent WRT the subjective speed of the system. And, most important, the load is still zero when nobody is doing anything.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Big Takeaway&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;While the exact evolution of the code after that patch hasn’t been fully explored here, the crucial point remains: from that moment on, people began thinking of Load Average not merely as CPU load but as an indicator of overall system load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u243flam635eo0ar682.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u243flam635eo0ar682.jpeg" alt="Load Average in Ubuntu" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;How Load Average is Calculated&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;From the quotation above, or from various websites, “average” (in the context of Load Average) might appear to be a simple arithmetic average of values over a given period. In reality, Load Average in Linux is calculated using an  &lt;strong&gt;exponential moving average (EMA)&lt;/strong&gt;  rather than an arithmetic one.&lt;/p&gt;

&lt;p&gt;This approach gives recent system load changes greater priority compared to historical data. As a result, the value remains both sensitive and stable, making it especially useful for monitoring system performance.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Load Average Formula&lt;/strong&gt;
&lt;/h1&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1w7fr3koo78aaponwl68.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1w7fr3koo78aaponwl68.jpeg" alt="The Load Average Formula" width="800" height="61"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In text form, the update is load(t) = load(t-1) * e^(-Δt/τ) + n(t) * (1 - e^(-Δt/τ)), where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  load(t) is the new Load Average value.&lt;/li&gt;
&lt;li&gt;  load(t-1) is the previous Load Average value.&lt;/li&gt;
&lt;li&gt;  n(t) is the current number of processes in the run queue (running + waiting).&lt;/li&gt;
&lt;li&gt;  Δt is the time elapsed since the last update.&lt;/li&gt;
&lt;li&gt;  τ (tau) is a time constant, different for the three averages: τ = 60 seconds (1 minute), τ = 300 seconds (5 minutes), and τ = 900 seconds (15 minutes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every few seconds (the Linux kernel uses a roughly 5-second interval), the kernel updates the Load Average with this smoothing formula. Each of the three metrics (1, 5, and 15 minutes) has its own decay coefficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decay Factor for Updates&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvv4kb7gk9wvoz0v7w1r.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvv4kb7gk9wvoz0v7w1r.jpeg" alt="Decay Factor Formula" width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This formula smooths out sudden changes in system load. For instance, when you start a heavy workload, the Load Average does not jump straight to its peak but rises gradually instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Calculation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s assume the current 1-minute LA starts at some value, and at a given moment four processes appear in the run queue.&lt;/p&gt;

&lt;p&gt;The new Load Average is calculated as follows:&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2ae1jju2139kytzmvzr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2ae1jju2139kytzmvzr.jpeg" alt="Load Average calculation example 1" width="800" height="38"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8p08op23mq1wozyb469.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8p08op23mq1wozyb469.jpeg" alt="Load Average calculation example 2" width="800" height="36"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3cjvtsuog5ygvntz6ff.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3cjvtsuog5ygvntz6ff.jpeg" alt="Load Average calculation example 3" width="800" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;…and so on.&lt;/p&gt;

&lt;p&gt;The new Load Average does not jump straight to 4; it rises gradually, which lets the metric reflect the current system state more faithfully. In mathematical terms, all three values (1, 5, and 15 minutes) are exponentially damped averages of the total system load since boot; they decay at different rates, tuned for 1, 5, and 15 minutes. As a result, the 1-minute average draws about 63% of its weight from the last minute and 37% from everything before it, and the same 63%/37% split holds for the 5- and 15-minute averages over their respective windows. So it is not strictly accurate to say the 1-minute average covers only the last 60 seconds (about 37% of it comes from the more distant past), but it is fair to say it mainly reflects the last minute.&lt;/p&gt;
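&lt;p&gt;The damped update is easy to verify numerically. A minimal shell sketch (the starting LA of 1.0 is an assumed example value, the queue length of 4 matches the example above, and the kernel recomputes the average roughly every 5 seconds):&lt;/p&gt;

```shell
# Exponentially damped 1-minute Load Average, iterated a few samples:
#   load(t) = load(t-1) * e^(-5/60) + n * (1 - e^(-5/60))
awk 'BEGIN {
  la = 1.0                  # assumed current 1-minute LA
  n  = 4                    # runnable processes in the queue
  d  = exp(-5 / 60)         # decay factor for one 5-second sample
  for (i = 1; i != 4; i++) {
    la = la * d + n * (1 - d)
    printf "sample %d: LA = %.2f\n", i, la
  }
  # weight that the last full minute carries in the 1-minute average
  printf "last-minute weight: %.1f%%\n", (1 - exp(-1)) * 100
}'
```

&lt;p&gt;The average creeps toward 4 instead of jumping there, and the last-minute weight comes out to roughly 63.2%, matching the 63%/37% split described above.&lt;/p&gt;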


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wes5560qz81xas9tzgn.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wes5560qz81xas9tzgn.jpeg" alt="Htop in Linux" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Practical Use&lt;/strong&gt;
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;If Load Average &amp;lt; Number of cores, the system is running normally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If Load Average ≈ Number of cores, the CPU is at 100% utilization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If Load Average &amp;gt; Number of cores, processes are waiting for CPU time, indicating potential performance problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, on an 8-core server:&lt;/p&gt;

&lt;p&gt;• LA(5) = 4 → CPU is about 50% utilized.&lt;/p&gt;

&lt;p&gt;• LA(5) = 12 → CPU is overloaded; processes are waiting.&lt;/p&gt;
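&lt;p&gt;The same arithmetic can be scripted. A small sketch (8 cores and LA(5) = 4 are the example values from above; on a live host they could come from nproc and /proc/loadavg instead):&lt;/p&gt;

```shell
# Rough CPU utilization implied by the 5-minute Load Average.
# On a live Linux host: cores=$(nproc); la5=$(awk '{print $2}' /proc/loadavg)
cores=8
la5=4
awk -v la="$la5" -v c="$cores" 'BEGIN {
  printf "LA(5) = %.1f on %d cores: about %d%% of CPU capacity\n", la, c, la / c * 100
}'
```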

&lt;h1&gt;
  
  
  Why Is Load Average Better Than CPU Utilization Percentage?
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Processes in the Queue. Load Average counts both the processes running on the CPU and those waiting in the run queue. CPU utilization percentage shows only current CPU activity and ignores queued processes, which can lead to underestimating the real load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overall Load Measure. Load Average covers all CPU cores, so on systems with multiple processors or hyperthreading it represents total load better than a single utilization figure does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I/O Factors. Load Average includes processes in “uninterruptible sleep” (for example, waiting for I/O). These can hurt performance badly even while using no CPU time, which plain CPU utilization misses entirely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyzing Trends. Load Average reports three timeframes (1, 5, and 15 minutes), making it easy to observe changes over time and spot emerging problems, whereas CPU utilization percentage offers only a momentary snapshot.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbhtwu7cx5qlsbu318lp.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbhtwu7cx5qlsbu318lp.jpeg" alt="Atop in Linux" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;About Hyperthreading&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;With hyperthreading, each physical core runs two instruction streams, and Linux treats each stream as a separate logical core. A system with 4 physical cores and hyperthreading therefore appears to have 8 logical cores. A Load Average of 8.0 on such a system means every logical core is busy, but it does not mean twice the throughput: as Load Average rises, performance degrades faster on hyperthreaded systems than on systems without it. Our next article will cover how hyperthreading works under the hood, along with its benefits, drawbacks, and overall impact.&lt;/p&gt;
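&lt;p&gt;One way to see the logical/physical split on a real machine (nproc comes from coreutils and lscpu from util-linux; exact lscpu field labels can vary between versions):&lt;/p&gt;

```shell
# Logical CPUs as Linux schedules them (hyperthreads count separately)
nproc
# How many hardware threads share each physical core
lscpu | awk -F: '/Thread\(s\) per core/ { gsub(/ /, "", $2); print "threads per core: " $2 }'
```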

&lt;h1&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Load Average gives richer insight into system performance than a plain CPU utilization percentage: it reflects the current CPU workload, the processes queued for execution, and processes blocked on I/O. That makes it an essential monitoring tool for Linux systems under heavy load, especially on servers with many cores and hyperthreading.&lt;br&gt;&lt;br&gt;
For a thorough historical analysis of this metric, see Brendan Gregg’s article and its earlier translation published on Habr. And if you want to see exactly how the metric works, you are welcome to explore the kernel code directly!&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes CronJob + Sidecar: A Love Story Gone Wrong (And How to Fix It)</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 10:05:59 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/kubernetes-cronjob-sidecar-a-love-story-gone-wrong-and-how-to-fix-it-4e8i</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/kubernetes-cronjob-sidecar-a-love-story-gone-wrong-and-how-to-fix-it-4e8i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd80qq28934rxs89izsh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd80qq28934rxs89izsh.webp" alt="Kubernetes CronJob + Sidecar: A Love Story Gone Wrong (And How to Fix It)" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I work at a large product company with a sprawling Kubernetes infrastructure. We run thousands of workloads, process massive amounts of data, and rely on automation to keep things running smoothly. So when we needed to execute a scheduled task in Kubernetes, using a CronJob seemed like a no-brainer.&lt;/p&gt;

&lt;p&gt;At first, everything worked perfectly. Our CronJob fired up a Job, the task ran, completed, and exited cleanly.&lt;/p&gt;

&lt;p&gt;But then, as always, the requirements changed:&lt;/p&gt;

&lt;p&gt;• The script was opening too many database connections, so we added an SQL proxy to optimize connection pooling.&lt;/p&gt;

&lt;p&gt;• The task became mission-critical, meaning we needed real-time monitoring to ensure failures wouldn’t go unnoticed.&lt;/p&gt;

&lt;p&gt;• We added sidecar containers for these enhancements… and that’s when everything broke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: CronJob Stopped Running&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes CronJobs work by creating Jobs, which spin up Pods to execute the actual work. A Job is considered complete only when all containers in the pod reach the Succeeded state.&lt;/p&gt;

&lt;p&gt;Our main container was completing successfully, transitioning to Succeeded.&lt;/p&gt;

&lt;p&gt;But our sidecar containers – SQL Proxy and Monitoring – were running indefinitely.&lt;/p&gt;

&lt;p&gt;Since they never exited, the Job never finished, and the CronJob never scheduled the next execution.&lt;/p&gt;

&lt;p&gt;Oops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why We Needed These Sidecars in the First Place&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;SQL Proxy: Our script was making hundreds of direct DB connections, overwhelming the database. Adding a SQL proxy helped pool connections, reducing the load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring: The job wasn’t just some background task – it was mission-critical. If it failed silently, key business processes would break. We needed real-time logs and metrics to ensure it was running correctly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So removing the sidecars wasn’t an option. Instead, we needed to teach them when to exit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Graceful Shutdown via File Signaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We needed a way to tell the sidecars:&lt;/p&gt;

&lt;p&gt;“Hey, the main job is done. Time to shut down.”&lt;/p&gt;

&lt;p&gt;Here’s the new strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The main container traps its own exit and writes a completion file (/pod/terminated on success, /pod/error on failure) to a shared volume.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The sidecars poll the shared volume for that file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the file appears, each sidecar stops its main process with SIGTERM and exits with the matching status code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Job completes, and the CronJob can schedule the next run.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
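&lt;p&gt;The pattern can be sketched locally in plain shell before wiring it into Kubernetes (the temp directory stands in for the shared volume; the file name follows the /pod/terminated convention used in the implementation):&lt;/p&gt;

```shell
# Local sketch of the file-signaling pattern; a temp dir stands in for the
# pod's shared volume.
dir=$(mktemp -d)

# "Main job": trap EXIT so the completion file is written no matter how we finish.
sh -c "trap 'touch $dir/terminated' EXIT; echo 'main job: working'; sleep 1"

# "Sidecar": poll for the completion file, then shut down.
while [ ! -f "$dir/terminated" ]; do
  sleep 1
done
echo "sidecar: main job finished, exiting"

rm -rf "$dir"
```

&lt;p&gt;Inside the pod the two roles run in separate containers, and an emptyDir volume plays the role of the temp directory.&lt;/p&gt;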

&lt;p&gt;&lt;strong&gt;This Problem Isn’t New&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This issue has been around for a while, and various workarounds have been proposed. There are even specialized projects like K8S Job Sidecar Terminator, which help manage sidecar shutdown for Kubernetes Jobs.&lt;/p&gt;

&lt;p&gt;However, our approach is much simpler and doesn’t require any additional components – just a shared volume and a simple script inside the containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation: The Helm Chart&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared Volume&lt;/strong&gt;&lt;br&gt;
We’ll use an emptyDir volume so all containers in the pod can access the same file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;volumes:
  - name: shared-data
    emptyDir: {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Main Job Container&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our main job script will:&lt;/p&gt;

&lt;p&gt;• Execute the actual task.&lt;/p&gt;

&lt;p&gt;• Create /pod/terminated (or /pod/error) when it has finished.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containers:
  - name: main-job
    image: my-job-image:latest
    command:
      - /bin/sh
      - -c
      - |
        trap '[ $? -eq 0 ] &amp;amp;&amp;amp; touch /pod/terminated || touch /pod/error' EXIT;
        while [ ! -S /tmp/proxysql.sock ]; do sleep 1; done;  # wait for the ProxySQL socket (the path must be on a volume shared with the sidecar)
        ./run-my-task.sh
    volumeMounts:
      - name: shared-data
        mountPath: /pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Sidecar Containers (SQL Proxy or Monitoring)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll use the same graceful shutdown approach for both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: proxysql-sidecar
    image: proxysql/proxysql:latest
    command:
      - /bin/sh
      - -c
      - |
        proxysql -f -c "$CONFIG_PATH" &amp;amp; CHILD_PID=$!
        (while true; do
          if [ -f "/pod/terminated" ] || [ -f "/pod/error" ]; then
            kill $CHILD_PID
            echo "Sent SIGTERM to $CHILD_PID because the main container terminated."
            break
          fi
          sleep 1
        done) &amp;amp;
        wait $CHILD_PID
        if [ -f "/pod/error" ]; then
          echo "Job completed with error. Exiting...";
          exit 1;
        elif [ -f "/pod/terminated" ]; then
          echo "Job completed. Exiting...";
          exit 0;
        fi
    volumeMounts:
      - name: shared-data
        mountPath: /pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We just need to deploy two instances of this container, one as SQL Proxy and the other as Monitoring, using different images or configurations if needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works Now&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The main job starts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each sidecar starts its main process plus a background watcher loop, then waits on the main process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The main job finishes, creates /pod/terminated (or /pod/error), and exits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The sidecars detect the termination file, catch the signal, and exit cleanly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Job completes, and the CronJob schedules the next run.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No more stuck Jobs, no more missing CronJob executions.&lt;/p&gt;

&lt;p&gt;Mission complete!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding sidecars to Jobs and CronJobs can be tricky, but with a bit of clever process signaling, it’s totally manageable.&lt;/p&gt;

&lt;p&gt;If your CronJob mysteriously stops running, check if your sidecars are stuck in Running state. If they are, they’re the problem.&lt;/p&gt;

&lt;p&gt;This approach – file signaling + SIGTERM traps – is a simple, reliable fix.&lt;/p&gt;

&lt;p&gt;For alternative solutions and further discussion, check out these resources:&lt;/p&gt;

&lt;p&gt;• Kubernetes GitHub Issue #25908&lt;/p&gt;

&lt;p&gt;Hope this helps! Now go forth and deploy with confidence. 🚀&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
  </channel>
</rss>
