<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Vitaly Bicov</title>
    <description>The latest articles on DEV Community by Vitaly Bicov (@vitaly_bykov_dd10957baace).</description>
    <link>https://dev.to/vitaly_bykov_dd10957baace</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2851512%2F2cd0a395-62cb-4e43-9194-72f1c8707e21.png</url>
      <title>DEV Community: Vitaly Bicov</title>
      <link>https://dev.to/vitaly_bykov_dd10957baace</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vitaly_bykov_dd10957baace"/>
    <language>en</language>
    <item>
      <title>Supercharge Your CDN with Cloudflare Workers</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 18 Aug 2025 09:13:45 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/supercharge-your-cdn-with-cloudflare-workers-3h43</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/supercharge-your-cdn-with-cloudflare-workers-3h43</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F169mgk3guoqjl3g6fsc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F169mgk3guoqjl3g6fsc3.png" alt="Supercharge Your CDN with Cloudflare Workers" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Modern web applications demand instant content delivery, seamless personalization, and global reliability. Yet ask any engineer managing a popular site: when a product launch triggers a traffic surge, even the best CDN sometimes buckles. One major retailer’s Black Friday campaign saw its origin servers grind to a halt, not because the CDN failed, but because cache misses skyrocketed for personalized content. The result? Lost sales and a lesson in the evolving needs of web delivery.&lt;/p&gt;

&lt;p&gt;In this article, we’ll explore how Cloudflare Workers and edge computing can transform your CDN from a blunt instrument into a scalpel: precise, programmable, and highly efficient. Whether you’re a DevOps engineer, web architect, or performance-focused developer, you’ll learn actionable strategies for cache optimization, dynamic content, personalization, cost control, and more.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction: The Evolving Demands on CDNs
&lt;/h2&gt;

&lt;p&gt;Content Delivery Networks (CDNs) have long been the backbone of web performance, pushing static files closer to users worldwide. But today’s web requires more than just static acceleration. Personalized content, user-specific routing, and real-time transformations are now table stakes for user experience.&lt;/p&gt;

&lt;p&gt;As web applications become more dynamic and distributed, so do the challenges of balancing speed, reliability, and cost. That’s where edge computing, and specifically Cloudflare Workers, delivers new tools for the modern engineer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common CDN Challenges: Cache Efficiency, Dynamic Content, Personalization
&lt;/h2&gt;

&lt;p&gt;When scaling web applications, traditional CDNs often hit roadblocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cache Efficiency: CDNs excel at delivering cacheable static assets (images, CSS, JS). However, dynamic or user-personalized pages often bypass the cache, forcing repeated origin fetches.&lt;/li&gt;
&lt;li&gt;  Dynamic Content: API endpoints, A/B testing, and localization generate unique responses, limiting cache opportunities.&lt;/li&gt;
&lt;li&gt;  Personalization: Cookie-based logic, authentication, and geo-targeted experiences further fragment cacheability.&lt;/li&gt;
&lt;li&gt;  Cost: Increased origin traffic means higher bandwidth bills and potential latency spikes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key pain point:&lt;/strong&gt;  How do you keep performance high and costs low, even as content gets more dynamic and personalized?&lt;/p&gt;

&lt;h2&gt;
  
  
  Edge Computing and Cloudflare Workers: A Primer
&lt;/h2&gt;

&lt;p&gt;Edge computing shifts computation from centralized servers to geographically distributed nodes (the “edge”), close to the end user. Cloudflare Workers is a serverless platform that runs lightweight JavaScript, TypeScript, or WASM code directly on Cloudflare’s global edge network.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why Workers?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Programmability: Inspect, modify, or generate responses at the edge.&lt;/li&gt;
&lt;li&gt;  Performance: Minimal latency, as logic runs close to users.&lt;/li&gt;
&lt;li&gt;  Scalability: No server management; automatic scaling.&lt;/li&gt;
&lt;li&gt;  Security: Mitigate attacks before requests reach your infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Architecture Overview:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User Request ──► Cloudflare Edge Node (Worker) ──► Origin (if needed)   
                             │ [Custom Logic]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Request and Response Modification at the Edge
&lt;/h2&gt;

&lt;p&gt;With Workers, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Rewrite requests (change URLs, headers, cookies)&lt;/li&gt;
&lt;li&gt;  Implement custom cache keys&lt;/li&gt;
&lt;li&gt;  Filter or block malicious traffic&lt;/li&gt;
&lt;li&gt;  Modify responses (inject headers, rewrite HTML)&lt;/li&gt;
&lt;/ul&gt;
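&lt;p&gt;The first bullet above, request rewriting, can be sketched as a small helper. This is an illustrative sketch, not Cloudflare's API surface beyond the standard &lt;code&gt;URL&lt;/code&gt; and &lt;code&gt;Request&lt;/code&gt; classes; the &lt;code&gt;/old-api/&lt;/code&gt; path prefix is hypothetical.&lt;/p&gt;

```javascript
// Sketch: compute a rewritten URL before forwarding to the origin.
// The /old-api/ -> /api/v2/ mapping is purely illustrative.
function rewritePath(originalUrl) {
  const url = new URL(originalUrl);
  url.pathname = url.pathname.replace(/^\/old-api\//, '/api/v2/');
  return url.toString();
}

// Inside a Worker you would forward the rewritten request:
//   const rewritten = new Request(rewritePath(request.url), request);
//   return fetch(rewritten);
```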
&lt;h3&gt;
  
  
  Example: Add a Cache-Control Header to API Responses
&lt;/h3&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;addEventListener('fetch', event =&amp;gt; {  
  event.respondWith(handleRequest(event.request));  
});  

async function handleRequest(request) {  
  let response = await fetch(request);  
  // Clone response so we can modify headers  
  response = new Response(response.body, response);  

  // Add Cache-Control for better CDN caching  
  response.headers.set('Cache-Control', 'public, max-age=60');  
  return response;  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why this matters:&lt;/strong&gt;  Many APIs lack cache directives. By controlling headers at the edge, you unlock CDN caching for previously uncacheable content.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implementing CDN Optimization with Worker Scripts
&lt;/h3&gt;

&lt;p&gt;Let’s walk through a practical example:  &lt;strong&gt;Dynamic cache key customization based on cookies, geography, or device.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario: Personalizing Cache Keys
&lt;/h3&gt;

&lt;p&gt;Suppose you run an e-commerce site with localized pricing, shown based on user country. By default, your CDN may treat all requests to  &lt;code&gt;/shop&lt;/code&gt;  as the same, resulting in cache collisions or misses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Worker script:&lt;/strong&gt;  Customize the cache key using the  &lt;code&gt;CF-Connecting-IP&lt;/code&gt;  or  &lt;code&gt;cf-ipcountry&lt;/code&gt;  header.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;addEventListener('fetch', event =&amp;gt; {  
  event.respondWith(handleRequest(event));  
});  

async function handleRequest(event) {  
  const request = event.request;  
  // Use country from header to personalize cache  
  const country = request.headers.get('cf-ipcountry') || 'US';  
  const url = new URL(request.url);  
  url.searchParams.set('country', country);  

  // Create a custom cache key  
  const cacheKey = new Request(url.toString(), request);  

  // Try to find in cache  
  const cache = caches.default;  
  let response = await cache.match(cacheKey);  
  if (!response) {  
    // Not in cache, fetch from origin and cache the result  
    response = await fetch(request);  
    // Set a short TTL for dynamic personalization  
    response = new Response(response.body, response);  
    response.headers.set('Cache-Control', 'public, max-age=120');  
    event.waitUntil(cache.put(cacheKey, response.clone()));  
  }  
  return response;  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Cache is segmented per country, reducing origin hits for localized content.&lt;/li&gt;
&lt;li&gt;  The TTL is tuned for freshness vs. cost.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Advanced Caching Strategies for Dynamic and Personalized Content
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Stale-While-Revalidate
&lt;/h3&gt;

&lt;p&gt;Serve slightly outdated content instantly, while refreshing cache in the background.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;response.headers.set('Cache-Control', 'public, max-age=60, stale-while-revalidate=300');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt;  News headlines, product listings.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Edge-Side Includes (ESI) Simulation
&lt;/h3&gt;

&lt;p&gt;Combine static and dynamic fragments at the edge.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Fetch static shell from cache, dynamic data from API  
const [shell, data] = await Promise.all([  
  cache.match(shellRequest),  
  fetch(dynamicDataRequest)  
]);  
// Merge and respond  
return new Response(await combine(shell, data), { headers });
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  3. Device and Language Detection
&lt;/h3&gt;


&lt;p&gt;Customize cache key or response based on User-Agent and Accept-Language.&lt;/p&gt;
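&lt;p&gt;One hedged sketch of this idea: bucket the &lt;code&gt;User-Agent&lt;/code&gt; into coarse device classes and keep only the primary language, so the cache splits into a handful of variants rather than thousands. The bucket names below are assumptions, not a standard.&lt;/p&gt;

```javascript
// Sketch: derive a coarse cache-key suffix from User-Agent and Accept-Language.
// Coarse buckets (mobile/desktop, primary language only) limit cache fragmentation.
function cacheKeySuffix(headers) {
  const ua = headers.get('User-Agent') || '';
  const lang = (headers.get('Accept-Language') || 'en')
    .split(',')[0]   // first preference, e.g. "de-DE"
    .split('-')[0];  // language only, e.g. "de"
  const device = /Mobile|Android|iPhone/i.test(ua) ? 'mobile' : 'desktop';
  return `${device}:${lang}`;
}
```

&lt;p&gt;As in the country example earlier, the suffix can be appended to the URL as a query parameter when constructing the custom cache key.&lt;/p&gt;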

&lt;h2&gt;
  
  
  Real-World Use Cases: A/B Testing, Geolocation Routing, and Bot Management
&lt;/h2&gt;

&lt;h3&gt;
  
  
  A/B Testing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Problem:&lt;/strong&gt;  Running an experiment by variant assignment in the browser breaks cache efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt;  Assign variant at the edge, cache per variant.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const cookie = request.headers.get('Cookie');  
let variant = getVariantFromCookie(cookie) || assignAndSetCookie(event);  
// Partition cache by variant  
url.searchParams.set('ab_variant', variant);
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Geolocation Routing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Use case:&lt;/strong&gt;  Redirect users to region-specific domains or serve localized assets.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const country = request.headers.get('cf-ipcountry');  
if (country === 'DE') {  
  return Response.redirect('https://de.example.com' + url.pathname, 302);  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  Bot Management
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;At the edge:&lt;/strong&gt;  Block or challenge suspicious bots before they reach your origin.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (isLikelyBot(request)) {  
  return new Response('Access denied', { status: 403 });  
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Monitoring and Measuring Success: CDN Metrics, Cache Hit Rates, and Latency
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What to Track
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cache Hit Ratio: % of requests served from edge cache&lt;/li&gt;
&lt;li&gt;  Origin Bandwidth: Volume of traffic reaching backend servers&lt;/li&gt;
&lt;li&gt;  Latency: Time to first byte (TTFB) from user perspective&lt;/li&gt;
&lt;li&gt;  Error Rates: Monitor for false positives in bot management or misrouted requests&lt;/li&gt;
&lt;/ul&gt;
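&lt;p&gt;A lightweight way to make the cache hit ratio observable is to tag each response at the edge; the &lt;code&gt;X-Edge-Cache&lt;/code&gt; header name below is an assumption, not a Cloudflare standard.&lt;/p&gt;

```javascript
// Sketch: annotate responses so downstream analytics can compute hit ratio.
function tagCacheStatus(response, hit) {
  // Clone so headers are mutable
  const tagged = new Response(response.body, response);
  tagged.headers.set('X-Edge-Cache', hit ? 'HIT' : 'MISS');
  return tagged;
}
```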

&lt;h3&gt;
  
  
  Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cloudflare Analytics: Built-in dashboard for traffic, cache, and performance&lt;/li&gt;
&lt;li&gt;  Logpush/Logpull: Stream edge logs to your SIEM or analytics platform&lt;/li&gt;
&lt;li&gt;  Custom Metrics: Send data to Datadog, Prometheus, or Cloudflare Workers Analytics Engine&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Cost Optimization: Reducing Bandwidth and Origin Load
&lt;/h3&gt;

&lt;p&gt;By enabling edge-side processing and advanced caching:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Lower Origin Costs: Fewer requests and bytes sent to your infrastructure&lt;/li&gt;
&lt;li&gt;  Reduced Egress Fees: Especially critical for cloud-hosted origins (e.g., AWS, GCP)&lt;/li&gt;
&lt;li&gt;  Faster User Experiences: Less round-trip time to origin, better conversion rates&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Example: Origin Shielding with Workers
&lt;/h3&gt;

&lt;p&gt;Workers can act as a shield, absorbing unnecessary origin requests during traffic surges.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;let response = await cache.match(cacheKey);  
if (response) {  
  return response; // Served from edge, skip origin cost  
}  
// Otherwise fetch from origin and cache as above
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Best Practices, Pitfalls, and Future Trends
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Best Practices
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Keep logic minimal: Edge compute is powerful but should be fast and stateless.&lt;/li&gt;
&lt;li&gt;  Monitor for Edge Cache Fragmentation: Too many cache keys can lower hit rates.&lt;/li&gt;
&lt;li&gt;  Leverage Feature Flags: Gradually roll out worker logic.&lt;/li&gt;
&lt;li&gt;  Test in Staging: Always validate behavior in a non-production environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfalls &amp;amp; Limitations
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Cold Start Latency: Initial worker startup may add milliseconds, but is usually negligible.&lt;/li&gt;
&lt;li&gt;  Execution Timeouts: Workers enforce strict CPU-time limits (roughly 10 ms per invocation on the Free plan, higher on paid plans).&lt;/li&gt;
&lt;li&gt;  Complex State: Workers are stateless; use KV or Durable Objects for persistent data.&lt;/li&gt;
&lt;/ul&gt;
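&lt;p&gt;As a hedged sketch of the last point: persistent data lives outside the Worker, for example in Workers KV. The &lt;code&gt;MY_KV&lt;/code&gt; binding name and the flag-key scheme below are hypothetical; real bindings are declared in &lt;code&gt;wrangler.toml&lt;/code&gt;.&lt;/p&gt;

```javascript
// Sketch: read small persistent state from a KV binding.
// KV is eventually consistent and globally replicated.
async function getFeatureFlag(env, name) {
  const value = await env.MY_KV.get(`flag:${name}`);
  return value === 'on';
}
```

&lt;p&gt;With module syntax, &lt;code&gt;env&lt;/code&gt; is the second argument of the &lt;code&gt;fetch&lt;/code&gt; handler; with the older &lt;code&gt;addEventListener&lt;/code&gt; syntax used in this article, KV bindings appear as globals instead.&lt;/p&gt;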

&lt;h3&gt;
  
  
  Future Trends
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  Deeper Personalization: Edge AI/ML for content adaptation&lt;/li&gt;
&lt;li&gt;  Edge Data Stores: Real-time, globally distributed state&lt;/li&gt;
&lt;li&gt;  Integrated Observability: Native metrics, traces, and logs&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion: The Future of CDN Optimization at the Edge
&lt;/h2&gt;

&lt;p&gt;Cloudflare Workers, and edge computing in general, are redefining what’s possible in CDN performance and flexibility. By bringing code execution and caching closer to your users, you can finally deliver personalized, dynamic, and blazingly fast experiences without ballooning costs or complexity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Edge compute unlocks advanced CDN strategies (personalization, A/B testing, bot defense)&lt;/li&gt;
&lt;li&gt;  Programmable logic at the edge means fewer origin hits, lower costs, and happier users&lt;/li&gt;
&lt;li&gt;  Careful monitoring, cache key management, and simplicity are crucial for success&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ready to level up? Dive deeper into edge patterns, try Cloudflare Workers in your stack, and start measuring the difference. The next generation of web performance is running at the edge.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/supercharge-your-cdn-with-cloudflare-workers" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on August 18, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Predictive Auto-Scaling for Stateful Apps</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 11 Aug 2025 09:13:22 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/predictive-auto-scaling-for-stateful-apps-4cp</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/predictive-auto-scaling-for-stateful-apps-4cp</guid>
      <description>&lt;h1&gt;
  
  
  Introduction: The Challenge of Stateful Scaling
&lt;/h1&gt;

&lt;p&gt;Picture this: On Black Friday, a global e-commerce giant’s order-processing system is humming along, scaling web servers seamlessly as customer traffic surges. Yet, deep in the backend, the payment database cluster struggles, unable to keep up with demand spikes. Transactions queue up. Latency grows. Revenue — literally — slips away.&lt;/p&gt;

&lt;p&gt;Auto-scaling stateless services is a solved problem. But getting stateful apps like databases, message queues, and cache clusters to scale predictively and reliably? That’s where the real pain starts for DevOps teams.&lt;/p&gt;

&lt;p&gt;This article is for cloud engineers, SREs, DevOps leads, and architects who are tasked with making stateful applications as elastic, resilient, and cost-efficient as their stateless counterparts. You’ll learn:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Why stateful services are hard to scale&lt;/li&gt;
&lt;li&gt;  How predictive algorithms (time series &amp;amp; ML) can help&lt;/li&gt;
&lt;li&gt;  Practical implementation strategies: custom metrics, scaling policies, data management&lt;/li&gt;
&lt;li&gt;  Real-world examples, pitfalls, and best practices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in.&lt;/p&gt;

&lt;h1&gt;
  
  
  Stateful vs. Stateless: Key Differences in Scaling
&lt;/h1&gt;

&lt;p&gt;Before we tackle solutions, let’s clarify what’s at stake:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Stateless apps (e.g., web frontends, API gateways) store no client/session data locally. Instances can be created or destroyed at will.&lt;/li&gt;
&lt;li&gt;  Stateful apps (e.g., databases, message brokers, cache servers) hold critical data that must persist and synchronize across nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Scaling stateless workloads:&lt;/strong&gt;&lt;br&gt;
Easy: just add or remove instances based on CPU, memory, or latency metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scaling stateful workloads:&lt;/strong&gt;&lt;br&gt;
Hard: because you must also ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Data integrity&lt;/li&gt;
&lt;li&gt;  Consistent state across replicas&lt;/li&gt;
&lt;li&gt;  Reliable data persistence and recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Barriers to Scaling Stateful Apps
&lt;/h1&gt;

&lt;p&gt;Let’s break down the main hurdles:&lt;/p&gt;

&lt;h1&gt;
  
  
  Data Consistency and Integrity
&lt;/h1&gt;

&lt;p&gt;Scaling out a stateful app means adding nodes that must sync with existing data-without risking loss or corruption.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Distributed databases (like MongoDB or Cassandra) need strict consistency protocols.&lt;/li&gt;
&lt;li&gt;  Sharding and replication must be coordinated to avoid split-brain scenarios.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Startup Time and Synchronization
&lt;/h1&gt;

&lt;p&gt;Bringing new stateful nodes online isn’t instant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Nodes must fetch data snapshots or stream state from peers.&lt;/li&gt;
&lt;li&gt;  Full sync can take minutes or more, especially under heavy load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Resource Allocation Complexities
&lt;/h1&gt;

&lt;p&gt;It’s not just about CPU/RAM:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Persistent storage:&lt;/strong&gt;  Each instance requires unique, durable storage (PersistentVolumes in Kubernetes, for example).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Network:&lt;/strong&gt;  Data replication and synchronization add network overhead.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Affinity/anti-affinity:&lt;/strong&gt;  Pods/nodes must be scheduled to minimize risk of data loss.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Predictive Algorithms for Scaling
&lt;/h1&gt;

&lt;p&gt;Reactive scaling (e.g., “add node if CPU &amp;gt; 80%”) is too little, too late for stateful apps. Predictive approaches let you scale  &lt;strong&gt;ahead of demand spikes&lt;/strong&gt;, ensuring new nodes are ready in time.&lt;/p&gt;

&lt;h1&gt;
  
  
  Time Series Analysis: Forecasting Demand
&lt;/h1&gt;

&lt;p&gt;Classic statistical methods (ARIMA, Holt-Winters, Prophet) can predict future load based on historical metrics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sample: Using Prophet to Forecast Cassandra Query Load&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from fbprophet import Prophet  
import pandas as pd  

# Load historical QPS data  
df = pd.read_csv('cassandra_qps_history.csv')  
df.columns = ['ds', 'y']  # Prophet expects 'ds' (timestamp), 'y' (value)  

model = Prophet()  
model.fit(df)  
future = model.make_future_dataframe(periods=24, freq='H')  
forecast = model.predict(future)  

# Print prediction for next 6 hours  
print(forecast[['ds', 'yhat']].tail(6))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deploy these forecasts into your scaling logic to trigger scale-ups before the rush hits.&lt;/p&gt;
&lt;h1&gt;
  
  
  Machine Learning Models: Beyond Simple Thresholds
&lt;/h1&gt;

&lt;p&gt;ML models (regression, LSTM, XGBoost) can learn complex patterns-seasonality, sudden bursts, multi-metric correlations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Feature engineering: Include business events (e.g., marketing promotions), user signups, or external signals.&lt;/li&gt;
&lt;li&gt;  Model deployment: Serve predictions via REST APIs or batch pipelines integrated with your scaling controllers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: ML-based Scaling Trigger&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: autoscaling/v2  
kind: HorizontalPodAutoscaler  
metadata:  
  name: stateful-db-autoscaler  
spec:  
  scaleTargetRef:  
    apiVersion: apps/v1  
    kind: StatefulSet  
    name: my-db  
  minReplicas: 3  
  maxReplicas: 10  
  metrics:  
    - type: External  
      external:  
        metric:  
          name: predicted_write_throughput  
        target:  
          type: Value  
          value: "5000"  # Predicted QPS threshold from ML model
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Here, the target metric (predicted_write_throughput) is supplied by a custom ML service.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Designing Custom Metrics and Scaling Policies
&lt;/h1&gt;

&lt;p&gt;Relying on CPU or memory is rarely enough. Build richer signals.&lt;/p&gt;

&lt;h1&gt;
  
  
  Identifying Relevant Signals
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Request rates (QPS, TPS)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Queue length/lag (Kafka, RabbitMQ)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Replication lag&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Disk IOPS&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Business events (campaign launches, news cycles)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Tip:&lt;/em&gt;  Use Prometheus exporters or custom sidecars to surface these metrics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Integrating Predictions into Auto-Scaling Workflows
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt; Train and deploy your forecasting/ML model.&lt;/li&gt;
&lt;li&gt; Expose predictions as a metrics endpoint (&lt;code&gt;/metrics&lt;/code&gt;  or push to Prometheus).&lt;/li&gt;
&lt;li&gt; Configure your orchestration platform (Kubernetes HPA/VPA, custom controller) to act on these predictions.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Prometheus Adapter Example:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1  
kind: Service  
metadata:  
  name: prediction-metrics  
spec:  
  selector:  
    app: ml-predictor  
  ports:  
    - protocol: TCP  
      port: 8080  
      targetPort: 8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Your HPA can now use these custom metrics for scaling triggers.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Ensuring Application Readiness During Scaling Events
&lt;/h1&gt;

&lt;p&gt;Scaling a stateful app means parts of your system will be unavailable or degraded during transitions. Minimize risk:&lt;/p&gt;

&lt;h1&gt;
  
  
  Health Checks and Readiness Probes
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Liveness probe: Restart unresponsive pods.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Readiness probe: Only send traffic to nodes that are fully initialized and synced.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;readinessProbe:  
  exec:  
    command: ["/bin/check_db_synced.sh"]  
  initialDelaySeconds: 30  
  periodSeconds: 10
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Graceful Startup and Shutdown
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Delay taking new traffic until state sync is complete.&lt;/li&gt;
&lt;li&gt;  On scale-down,  &lt;strong&gt;drain connections&lt;/strong&gt;  and  &lt;strong&gt;move or flush data&lt;/strong&gt;  safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Gotcha:&lt;/strong&gt;  Abrupt pod deletion can cause data loss or split-brain. Always use preStop hooks and finalizers.&lt;/p&gt;
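&lt;p&gt;A minimal sketch of such a preStop hook (the &lt;code&gt;/scripts/drain.sh&lt;/code&gt; path is a placeholder for your database’s drain or flush command; pair it with a generous &lt;code&gt;terminationGracePeriodSeconds&lt;/code&gt; so the drain can finish):&lt;/p&gt;

```yaml
# Container-level lifecycle hook: runs before SIGTERM is delivered.
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "/scripts/drain.sh"]
```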

&lt;h1&gt;
  
  
  Managing Data Persistence and Volume Lifecycle
&lt;/h1&gt;

&lt;p&gt;Scaling up or down means handling persistent storage with care.&lt;/p&gt;

&lt;h1&gt;
  
  
  Persistent Volume Strategies
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Dynamic provisioning: Use StorageClasses to automate volume creation per replica.&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Retain policy: Avoid deleting volumes until you know data is migrated.&lt;/p&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1  
kind: StorageClass  
metadata:  
  name: fast-ssd  
provisioner: kubernetes.io/gce-pd  
reclaimPolicy: Retain
&lt;/code&gt;&lt;/pre&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Backups and Migration During Scaling
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Snapshot before scaling: Ensure you can roll back if sync fails.&lt;/li&gt;
&lt;li&gt;  Automate backups: Integrate with Velero, Stash, or native cloud snapshots.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Example: Pre-Scale Backup Job&lt;/strong&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: batch/v1
kind: Job
metadata:
  name: db-backup
spec:
  template:
    spec:
      containers:
      - name: backup
        image: my-backup-tool
        command: ["/backup.sh"]
      restartPolicy: OnFailure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Monitoring and Observability
&lt;/h1&gt;

&lt;p&gt;You can’t improve what you can’t see.&lt;/p&gt;

&lt;h1&gt;
  
  
  Tracking Scaling Events and Performance
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Dashboards: Grafana panels for historical scaling actions, node health, replication lag, failovers.&lt;/li&gt;
&lt;li&gt;  Alerts: Notify on abnormal scaling frequency, pod crashes, or sync errors.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Cost Optimization Analysis
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Correlate resource usage and spend: Are you over-provisioning to “play it safe”?&lt;/li&gt;
&lt;li&gt;  Post-mortems: Analyze scale-up/scale-down timing versus actual demand to fine-tune predictive models.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Case Studies
&lt;/h1&gt;

&lt;p&gt;Let’s look at how real-world teams solve these challenges.&lt;/p&gt;

&lt;h1&gt;
  
  
  Auto-Scaling Database Clusters (MongoDB &amp;amp; Cassandra)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Slow to scale due to data copy/sync; risk of inconsistent reads.&lt;/li&gt;
&lt;li&gt;  Solution: Predict spikes using ARIMA; start new nodes 20 minutes ahead. Use readiness probes to ensure only fully-synced nodes receive traffic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Scaling Message Queue Systems (Kafka)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Consumer lag spikes during flash sales; adding brokers mid-event is too slow.&lt;/li&gt;
&lt;li&gt;  Solution: ML model predicts high-lag events from website traffic and product launches. Brokers pre-provisioned, partitions rebalanced gradually.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Caching Layer Elasticity (Redis, Memcached)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Problem: Traffic bursts cause cache misses and backend overload.&lt;/li&gt;
&lt;li&gt;  Solution: Time-series forecasting triggers cache node warmups, pre-populating popular keys before peak hours.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Best Practices and Lessons Learned
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Don’t rely solely on resource metrics.&lt;/strong&gt;  Use business-aware, custom signals.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Always bake in time for sync and warmup.&lt;/strong&gt;  Predictive scaling is about  &lt;em&gt;when&lt;/em&gt;  not just  &lt;em&gt;how much&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Automate backups and test recovery.&lt;/strong&gt;  Assume node loss and plan for graceful failover.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Monitor everything.&lt;/strong&gt;  Invest in end-to-end observability and cost analytics.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Iterate.&lt;/strong&gt;  Initial models will be wrong-learn and refine with real production data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion: The Future of Predictive Scaling for Stateful Workloads
&lt;/h1&gt;

&lt;p&gt;Stateful auto-scaling isn’t just a technical feat-it’s an operational imperative for modern, cost-effective cloud-native systems. By combining predictive analytics with robust engineering practices around data, orchestration, and observability, you can make your stateful apps as agile as the cloud promises.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key takeaways:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Predictive scaling bridges the gap between slow, risky stateful scale and fast-changing business demand.&lt;/li&gt;
&lt;li&gt;  Custom metrics and readiness checks are non-negotiable.&lt;/li&gt;
&lt;li&gt;  Invest in automation, monitoring, and continuous improvement.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Next steps:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Explore serverless databases, operator patterns for complex stateful services, and advanced ML for even smarter scaling. The future is predictive; get ahead of the curve.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/predictive-auto-scaling-for-stateful-apps" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on August 11, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>webdev</category>
      <category>programming</category>
    </item>
    <item>
      <title>Mastering Redis Clusters: Sharding &amp; Monitoring</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Mon, 28 Jul 2025 08:24:10 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/mastering-redis-clusters-sharding-monitoring-6ni</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/mastering-redis-clusters-sharding-monitoring-6ni</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi7yq1p1yojkzwyr8na7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi7yq1p1yojkzwyr8na7.png" alt="# Mastering Redis Clusters: Sharding &amp;amp; Monitoring" width="700" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction: Why Redis Clustering Matters
&lt;/h1&gt;

&lt;p&gt;Imagine you’re managing a high-traffic e-commerce platform. Black Friday hits, and suddenly millions of shoppers are racing through your checkout. Your monolithic Redis instance, once sufficient, now buckles under the load, causing timeouts and lost sales. Sound familiar?&lt;/p&gt;

&lt;p&gt;As organizations scale, single-node Redis deployments eventually become bottlenecks. Redis Clustering offers a resilient, horizontally-scalable architecture with automated sharding and failover. But configuring, operating, and monitoring Redis clusters for production is non-trivial: sharding can be opaque, and cluster health issues can escalate rapidly if not caught early.&lt;/p&gt;

&lt;p&gt;This guide is for DevOps engineers, SREs, and backend developers who want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Confidently deploy and operate Redis clusters at scale&lt;/li&gt;
&lt;li&gt;  Understand how data is distributed via sharding&lt;/li&gt;
&lt;li&gt;  Monitor, maintain, and scale clusters for high availability and performance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let’s dive in and master Redis Clusters from the ground up.&lt;/p&gt;

&lt;h1&gt;
  
  
  Redis Cluster Fundamentals
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Data Distribution and Hash Slots
&lt;/h2&gt;

&lt;p&gt;Redis Cluster distributes data using a concept called  &lt;strong&gt;hash slots&lt;/strong&gt;. There are 16,384 hash slots, and each key maps to one slot. Cluster nodes own subsets of these slots, forming the basis of automatic sharding.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Sharding&lt;/strong&gt;: Each node manages a subset of the keyspace.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Even Distribution&lt;/strong&gt;: Reduces risk of hot spots and balances load.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Replication and Consistency Models
&lt;/h2&gt;

&lt;p&gt;Redis Cluster offers  &lt;em&gt;asynchronous&lt;/em&gt;  replication:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Master nodes&lt;/strong&gt;: Store the data and handle writes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Replica nodes&lt;/strong&gt;: Maintain copies of master data for failover and read scalability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Consistency is  &lt;strong&gt;eventual&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Writes are acknowledged after being committed to the master.&lt;/li&gt;
&lt;li&gt;  Reads from replicas may be stale.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Automatic Failover
&lt;/h2&gt;

&lt;p&gt;If a master node fails, the cluster  &lt;em&gt;automatically&lt;/em&gt;  promotes one of its replicas to master, minimizing downtime and removing manual intervention from the critical path.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setting Up a Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Node Configuration and Key Settings
&lt;/h2&gt;

&lt;p&gt;Let’s create a basic 6-node cluster (3 masters, 3 replicas) using Docker for local testing:&lt;/p&gt;

&lt;p&gt;Create a Docker network for the cluster nodes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker network create redis-cluster-net
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Start the six Redis nodes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;for port in 7000 7001 7002 7003 7004 7005; do   
  docker run -d --name redis-$port --net redis-cluster-net \  
    -p $port:6379 \  
    -v $(pwd)/redis-$port.conf:/usr/local/etc/redis/redis.conf \  
    redis:7.2-alpine redis-server /usr/local/etc/redis/redis.conf  
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A minimal  &lt;code&gt;redis-7000.conf&lt;/code&gt;  would include:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;port 6379   
cluster-enabled yes   
cluster-config-file nodes.conf   
cluster-node-timeout 5000   
appendonly yes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Gotcha:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;Make sure&lt;/em&gt; &lt;code&gt;cluster-enabled&lt;/code&gt; &lt;em&gt;is&lt;/em&gt; &lt;code&gt;yes&lt;/code&gt; &lt;em&gt;and each node has a unique config file.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Slot Allocation and Replication Topology
&lt;/h2&gt;

&lt;p&gt;Join the nodes into a cluster using  &lt;code&gt;redis-cli&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -it --rm --net redis-cluster-net redis:7.2-alpine \  
  redis-cli --cluster create \  
    172.18.0.2:6379 172.18.0.3:6379 172.18.0.4:6379 \  
    172.18.0.5:6379 172.18.0.6:6379 172.18.0.7:6379 \  
    --cluster-replicas 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This command:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Allocates hash slots evenly across the 3 masters.&lt;/li&gt;
&lt;li&gt;  Assigns 1 replica per master (the other 3 nodes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Verify status:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Automated Sharding in Redis
&lt;/h1&gt;
&lt;h2&gt;
  
  
  Understanding Hash Slots
&lt;/h2&gt;

&lt;p&gt;Every key is assigned to a slot via CRC16(key) mod 16384. The cluster maps slots to nodes. For example:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster keyslot mykey123  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;Output: a slot number (e.g., 15365).&lt;/p&gt;

&lt;p&gt;This determines which node stores  &lt;code&gt;mykey123&lt;/code&gt;.&lt;/p&gt;
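&lt;p&gt;If you want to see the slot mapping without a running cluster, the CRC16 variant Redis Cluster uses (XMODEM, polynomial 0x1021) is easy to reproduce. The sketch below, including the {hash tag} rule that forces related keys into the same slot, uses plain arithmetic in place of bit operators:&lt;/p&gt;

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key hashing."""
    crc = 0
    for byte in data:
        crc = crc ^ byte * 256         # mix the next byte into the high 8 bits
        for _ in range(8):
            high_bit = crc // 32768    # top bit before shifting
            crc = (crc * 2) % 65536    # shift left one bit, keep 16 bits
            if high_bit:
                crc = crc ^ 4129       # XOR with the polynomial 0x1021
    return crc

def keyslot(key: str) -> int:
    """Map a key to one of the 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:            # non-empty tag: hash only its contents
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(keyslot("123456789"))  # 12739 (CRC16 = 0x31C3, the cluster spec's reference value)
print(keyslot("{user:100}.cart") == keyslot("{user:100}.orders"))  # True: same tag, same slot
```

&lt;p&gt;The hash-tag rule is what lets multi-key operations work in a cluster: keys sharing a tag land on the same node.&lt;/p&gt;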
&lt;h2&gt;
  
  
  Automated Resharding
&lt;/h2&gt;

&lt;p&gt;Need to redistribute data as you add nodes? Use  &lt;code&gt;redis-cli --cluster reshard&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli --cluster reshard 127.0.0.1:7000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Interactive prompts will let you move slots between nodes, with the cluster handling key migration.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Tip:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;For zero-downtime resharding, do it during off-peak hours and monitor latency!&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Scaling the Cluster Dynamically
&lt;/h2&gt;

&lt;p&gt;Adding a new node:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Start the new Redis node (e.g., on port 7006).&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Add it to the cluster:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Monitoring Your Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Cluster Health and Node Status
&lt;/h2&gt;

&lt;p&gt;Monitor with:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster info
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Look for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;cluster_state:ok&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cluster_slots_assigned:16384&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;cluster_known_nodes&lt;/code&gt;: number of nodes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Automated health checks can poll this endpoint and alert if state is not  &lt;code&gt;ok&lt;/code&gt;.&lt;/p&gt;
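&lt;p&gt;A minimal health-check helper might parse that output directly (a hypothetical sketch; in practice you would fetch the text through your Redis client and wire the result into alerting):&lt;/p&gt;

```python
def parse_cluster_info(raw: str) -> dict:
    """Parse the key:value lines returned by CLUSTER INFO into a dict."""
    info = {}
    for line in raw.splitlines():
        line = line.strip()
        if ":" in line:
            key, _, value = line.partition(":")
            info[key] = value
    return info

def cluster_healthy(info: dict) -> bool:
    """Healthy means state is ok and all 16384 slots are assigned."""
    return info.get("cluster_state") == "ok" and info.get("cluster_slots_assigned") == "16384"

sample = "cluster_state:ok\r\ncluster_slots_assigned:16384\r\ncluster_known_nodes:6"
print(cluster_healthy(parse_cluster_info(sample)))  # True
```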

&lt;h2&gt;
  
  
  Tracking Memory Usage and Key Distribution
&lt;/h2&gt;

&lt;p&gt;Use:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 info memory   
redis-cli -c -p 7000 cluster nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;  Ensure memory usage is balanced.&lt;/li&gt;
&lt;li&gt;  Check for slot imbalances or node failures.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A simple script to check slot distribution:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7000 cluster nodes | grep master | awk '{print $2, $9}'&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
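&lt;p&gt;If you prefer structured output over awk, the same check can be done in Python. This sketch parses CLUSTER NODES lines (slot assignments start at the ninth whitespace-separated field) and totals the slots each master owns:&lt;/p&gt;

```python
def slots_per_master(cluster_nodes_output: str) -> dict:
    """Total the hash slots each master owns, from CLUSTER NODES output."""
    counts = {}
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) >= 8 and "master" in fields[2]:
            total = 0
            for token in fields[8:]:
                if token.startswith("["):
                    continue                        # slot in migration, skip
                if "-" in token:
                    start, _, end = token.partition("-")
                    total += int(end) - int(start) + 1
                else:
                    total += 1
            counts[fields[1]] = total
    return counts

# Hypothetical two-master, one-replica snippet of CLUSTER NODES output
sample = (
    "nodeA 172.18.0.2:6379@16379 myself,master - 0 0 1 connected 0-5460\n"
    "nodeB 172.18.0.3:6379@16379 master - 0 0 2 connected 5461-10922\n"
    "nodeC 172.18.0.4:6379@16379 slave nodeB 0 0 2 connected"
)
print(slots_per_master(sample))  # slot count per master address (5461 and 5462 here)
```

&lt;p&gt;A healthy, balanced cluster shows roughly equal counts; a large skew is a cue to reshard.&lt;/p&gt;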
&lt;h2&gt;
  
  
  Alerting and Visualization Tools
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Prometheus + Redis Exporter&lt;/strong&gt;: Collect metrics from all nodes.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Grafana Dashboards&lt;/strong&gt;: Visualize memory, command rates, slot distribution.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Alertmanager&lt;/strong&gt;: Notification on health or performance anomalies.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example:  &lt;a href="https://github.com/oliver006/redis_exporter" rel="noopener noreferrer"&gt;oliver006/redis_exporter&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Operational Playbook
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Node Maintenance and Upgrades
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Rolling Upgrades&lt;/strong&gt;: Upgrade replica nodes first, then promote and upgrade masters one by one.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Graceful Failover&lt;/strong&gt;: Use  &lt;code&gt;CLUSTER FAILOVER&lt;/code&gt;  to promote a replica before taking a master down for maintenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example sequence:&lt;/p&gt;

&lt;p&gt;On a replica node:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;redis-cli -c -p 7003 cluster failover
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Failure Recovery Procedures
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Automatic Failover&lt;/strong&gt;: The cluster promotes replicas automatically.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Manual Intervention&lt;/strong&gt;: If all replicas are lost, restore from backups.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rejoining&lt;/strong&gt;: Use  &lt;code&gt;CLUSTER MEET&lt;/code&gt;  to re-add recovered nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;&lt;em&gt;Common Mistake:&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;Not maintaining up-to-date replicas; if a master and all its replicas fail, data loss is possible.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Performance Tuning Best Practices
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Enable  &lt;code&gt;appendonly yes&lt;/code&gt;  for durability.&lt;/li&gt;
&lt;li&gt;  Tune  &lt;code&gt;cluster-node-timeout&lt;/code&gt;  (default 15s) for your network latency.&lt;/li&gt;
&lt;li&gt;  Monitor for large keys or hot slots; consider client-side sharding if needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Integrating Clients with Redis Cluster
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Connection Handling and Discovery
&lt;/h2&gt;

&lt;p&gt;Use Redis Cluster-aware clients (e.g.,  &lt;a href="https://github.com/redis/jedis" rel="noopener noreferrer"&gt;Jedis&lt;/a&gt;,  &lt;a href="https://lettuce.io/" rel="noopener noreferrer"&gt;lettuce&lt;/a&gt;,  &lt;a href="https://github.com/Grokzen/redis-py-cluster" rel="noopener noreferrer"&gt;redis-py-cluster&lt;/a&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Clients discover the cluster topology at startup.&lt;/li&gt;
&lt;li&gt;  On  &lt;code&gt;MOVED&lt;/code&gt;  or  &lt;code&gt;ASK&lt;/code&gt;  responses, they reroute requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Example (Python):&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from rediscluster import RedisCluster

rc = RedisCluster(startup_nodes=[{"host": "127.0.0.1", "port": "7000"}], decode_responses=True)
rc.set("user:100", "alice")
print(rc.get("user:100"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h1&gt;
  
  
  Error Recovery Strategies
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;MOVED/ASK errors&lt;/strong&gt;: Cluster-aware clients handle these and re-route transparently.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Retries&lt;/strong&gt;: Implement retry logic for transient failures.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Backoff strategies&lt;/strong&gt;: For network partitions or failover, use exponential backoff.&lt;/li&gt;
&lt;/ul&gt;
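&lt;p&gt;The retry-with-backoff idea can be sketched in a few lines (hypothetical names; a real client would scope this to the specific transient exceptions your Redis library raises):&lt;/p&gt;

```python
import random
import time

def with_backoff(operation, max_attempts=5, base_delay=0.05):
    """Retry a flaky operation with exponential backoff plus a little jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                  # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)

# Hypothetical command that fails twice while a failover completes
state = {"calls": 0}
def flaky_get():
    state["calls"] += 1
    if state["calls"] > 2:
        return "alice"
    raise ConnectionError("node failing over")

print(with_backoff(flaky_get))  # alice, after two backoff retries
```

&lt;p&gt;Capping attempts and adding jitter prevents retry storms when many clients hit the same failing node at once.&lt;/p&gt;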

&lt;h1&gt;
  
  
  Load Balancing Approaches
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;  Let the client connect to any node; the cluster will direct requests.&lt;/li&gt;
&lt;li&gt;  For maximum resiliency, provide multiple startup nodes.&lt;/li&gt;
&lt;li&gt;  Avoid single-node proxies, which can become bottlenecks.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Conclusion: Best Practices and Takeaways
&lt;/h1&gt;

&lt;p&gt;Redis Clustering unlocks massive scalability and high availability for demanding workloads. Key takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Sharding is handled automatically via hash slots-understand slot allocation for effective troubleshooting.&lt;/li&gt;
&lt;li&gt;  Monitor cluster state, memory, and slot balance proactively; integrate with tools like Prometheus and Grafana.&lt;/li&gt;
&lt;li&gt;  Master operational playbooks for upgrades, resharding, and failure recovery.&lt;/li&gt;
&lt;li&gt;  Use cluster-aware clients and implement robust error recovery.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What’s next?&lt;/strong&gt;  Explore advanced topics like multi-datacenter clusters, tuning persistence for your durability needs, and integrating Redis cluster with cloud orchestration (Kubernetes, managed Redis services).&lt;/p&gt;

&lt;p&gt;By mastering Redis Cluster internals and monitoring, you’ll confidently scale your data layer-no matter how high the traffic spikes.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at&lt;/em&gt; &lt;a href="https://bicov.pro/blog/mastering-redis-clusters-sharding-monitoring" rel="noopener noreferrer"&gt;&lt;em&gt;https://bicov.pro&lt;/em&gt;&lt;/a&gt; &lt;em&gt;on July 28, 2025.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Microsoft’s Majorana-1 Quantum Chip: The Future Is Closer Than You Think</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 11:45:47 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/microsofts-majorana-1-quantum-chip-the-future-is-closer-than-you-think-b2</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/microsofts-majorana-1-quantum-chip-the-future-is-closer-than-you-think-b2</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kat34hubuig64x7gpt8.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kat34hubuig64x7gpt8.jpeg" alt="Microsoft’s Majorana-1 Quantum Chip: The Future Is Closer Than You Think" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Introduction: Quantum Computing Becomes Reality&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
In February 2025, Microsoft sent shockwaves through the tech world by revealing Majorana-1, its first quantum computing chip. And this is no ordinary quantum chip: it’s a completely new paradigm. Built on a topological state of matter, Majorana-1 is claimed to bring stability and scalability to quantum computing, and for good reason. We’re talking about a chip aimed at solving problems today’s top supercomputers cannot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How Majorana-1 Works: Topological Magic and Majorana Particles&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
So, what’s so special about this chip? The magic lies in topological quantum computing. Conventional qubits are fragile and error-prone. Topological qubits, by contrast, are much sturdier. That sturdiness comes from anyons: quirky quasiparticles that arise in two-dimensional systems. The real magic appears when anyons are braided; the pattern in which they are braided encodes quantum computations and is inherently resistant to errors.&lt;/p&gt;

&lt;p&gt;But the hero of the piece? Majorana particles. These are particles that are their own antiparticles, something that sounds like it came straight out of  &lt;em&gt;Star Trek&lt;/em&gt;  but is, in reality, serious business. Majorana particles yield topologically stable qubits, resistant to the noise that otherwise interferes with quantum computers.&lt;/p&gt;

&lt;p&gt;The chip combines aluminum (a superconductor) with indium arsenide (a semiconductor) in what is described as a  &lt;em&gt;topoconductor&lt;/em&gt;. For now, it contains only eight qubits, but Microsoft aims to scale the design to a jaw-dropping one million qubits.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Current Status: A Glimpse of the Future&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
At this stage, Majorana-1 is a prototype. With only eight qubits onboard, it’s not quite ready for prime time: you won’t be spinning it up on Azure today. Don’t let the low figure fool you, though. Microsoft says a fault-tolerant quantum computer is years, not decades, away. And this is no empty promise: the work falls under DARPA’s Underexplored Systems for Utility-Scale Quantum Computing (US2QC) program.&lt;/p&gt;

&lt;p&gt;The chip’s reported error rate of around 1% is remarkable: a huge improvement over existing hardware, and something that might bring large-scale quantum computing sooner than anybody hoped.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb03gqiqsqhfqy6hy364.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhb03gqiqsqhfqy6hy364.jpeg" alt="Quantum simulation of interactions between molecules" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Matters: Real-World Significance and Game-Changing Implications&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If Microsoft succeeds in reaching its one-million-qubit goal, the implications are huge. Here’s how:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Drug Discovery:&lt;/strong&gt;  Quantum simulation of molecular interactions could revolutionize how drugs are developed, speeding up timelines and cutting costs.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Materials Science:&lt;/strong&gt;  Understanding advanced materials at the quantum level could yield breakthroughs in energy storage and nanotechnology.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Optimization Problems:&lt;/strong&gt;  From logistics to supply chains, quantum computers may solve problems that would take classical computers centuries.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI and Machine Learning:&lt;/strong&gt;  Quantum AI could produce faster, better algorithms than any in use today.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cryptography:&lt;/strong&gt;  Quantum computers could break today’s cryptographic schemes, but in doing so would push the world toward new, quantum-resistant cryptography.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft’s plan to integrate Majorana-1 into Azure cloud services could bring quantum computing to a much larger audience, bringing down to earth a technology that long seemed out of reach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenges: The Road to Progress Is Not Smooth&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Of course, it’s not all rainbows and unicorns. There are serious barriers to overcome. The very existence of Majorana particles is contested in some corners of the scientific world, and earlier claims have been walked back. Material defects and practical operating issues cast a shadow as well.&lt;/p&gt;

&lt;p&gt;However, Microsoft is undeterred. Recent results reported in  &lt;em&gt;Nature&lt;/em&gt;  suggest the team has achieved low-error milestones, lending weight to its ambitious plans. Still, until a scalable, fault-tolerant quantum computer is running live applications, doubts will remain.&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3887fu3xrhf3qd4sav5u.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3887fu3xrhf3qd4sav5u.jpeg" alt="Microsoft's chip" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Bigger Picture: How Microsoft Sizes Up&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft isn’t alone in this race. Competitors such as Google, IBM, IonQ, and Rigetti have roadmaps of their own. Google’s Willow chip and IBM’s 1,121-qubit Condor processor are formidable in their own right. Nonetheless, Microsoft’s topological approach might have an edge: stability and scalability that everyone else may end up chasing.&lt;/p&gt;

&lt;p&gt;While some industry leaders, including Nvidia CEO Jensen Huang, believe practical quantum computing is decades away, Microsoft is betting otherwise. If they’re right, quantum computing is only a few years away from redefining everything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conclusion: Quantum Computing, Nearer Than You Realize&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Microsoft’s Majorana-1 is more than a chip: it’s a solid push toward making quantum computing real, reliable, and scalable. With plans to scale to a million qubits and integrate seamlessly with Azure, Microsoft is positioned to lead a technology revolution.&lt;/p&gt;

&lt;p&gt;Sure, there are barriers. But if Majorana-1 succeeds, we’re looking at an age where quantum computing is no longer confined to laboratory testing but becomes a game-changer in industry, medicine, and AI. The quantum age is no longer just on the horizon; it may be closer than ever.&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
    <item>
      <title>CPU in Linux. Load Average</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 10:24:12 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/cpu-in-linux-load-average-1ci</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/cpu-in-linux-load-average-1ci</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27u9qa956qpgrlwa8cmi.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F27u9qa956qpgrlwa8cmi.webp" alt="CPU in Linux. Load Average" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Load Average is an important metric in Linux for assessing average system load. It represents the average number of processes that are running or waiting in the run queue, over 1-, 5-, and 15-minute intervals. Compared to CPU utilization alone, Load Average gives system administrators a deeper understanding of the current load.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Evolution of Load Average&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;This metric wasn’t always so versatile. Prior to 1993 it reflected only the CPU load average (as on other Unix systems of the time) and didn’t account for other resource demands. Everything changed with a patch released on Friday, October 29, 1993, whose author stated:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The kernel only counts ‘runnable’ processes when computing the load average. I don’t like that; the problem is that processes which are swapping or waiting on ‘fast,’ i.e. noninterruptible, I/O, also consume resources. It seems somewhat nonintuitive that the load average goes down when you replace your fast swap disk with a slow swap disk… Anyway, the following patch seems to make the load average much more consistent WRT the subjective speed of the system. And, most important, the load is still zero when nobody is doing anything.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Big Takeaway&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;While the exact evolution of the code after that patch hasn’t been fully explored here, the crucial point remains: from that moment on, people began thinking of Load Average not merely as CPU load but as an indicator of overall system load.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u243flam635eo0ar682.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4u243flam635eo0ar682.jpeg" alt="Load Average in Ubuntu" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;How Load Average is Calculated&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;From the quotation above, or from various websites, “average” (in the context of Load Average) might appear to be a simple arithmetic average of values over a given period. In reality, Load Average in Linux is calculated using an  &lt;strong&gt;exponential moving average (EMA)&lt;/strong&gt;  rather than an arithmetic one.&lt;/p&gt;

&lt;p&gt;This approach gives recent system load changes greater priority compared to historical data. As a result, the value remains both sensitive and stable, making it especially useful for monitoring system performance.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;The Load Average Formula&lt;/strong&gt;
&lt;/h1&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1w7fr3koo78aaponwl68.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1w7fr3koo78aaponwl68.jpeg" alt="The Load Average Formula" width="800" height="61"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In text form, the update is load(t) = load(t-1) * e^(-Δt/τ) + n(t) * (1 - e^(-Δt/τ)), where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  load(t) is the new Load Average value.&lt;/li&gt;
&lt;li&gt;  load(t-1) is the previous Load Average value.&lt;/li&gt;
&lt;li&gt;  n(t) is the current number of processes in the run queue (running + waiting).&lt;/li&gt;
&lt;li&gt;  Δt is the time elapsed since the last update.&lt;/li&gt;
&lt;li&gt;  τ (tau) is a time constant, different for the three averages: τ = 60 seconds (1 minute), τ = 300 seconds (5 minutes), and τ = 900 seconds (15 minutes).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Every few seconds (the Linux kernel uses a roughly 5-second interval), the kernel updates the Load Average with this smoothing formula. Each of the three metrics (1, 5, and 15 minutes) has its own decay coefficient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decay Factor for Updates&lt;/strong&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvv4kb7gk9wvoz0v7w1r.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvv4kb7gk9wvoz0v7w1r.jpeg" alt="Decay Factor Formula" width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This formula smooths out sudden changes in system load. For instance, when you start a heavy workload, the Load Average does not jump straight to its peak but rises gradually instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Calculation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let’s assume the current 1-minute LA starts at some value, and at a given moment four processes appear in the run queue.&lt;/p&gt;

&lt;p&gt;The new Load Average is calculated as follows:&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2ae1jju2139kytzmvzr.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff2ae1jju2139kytzmvzr.jpeg" alt="Load Average calculation example 1" width="800" height="38"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8p08op23mq1wozyb469.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv8p08op23mq1wozyb469.jpeg" alt="Load Average calculation example 2" width="800" height="36"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3cjvtsuog5ygvntz6ff.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd3cjvtsuog5ygvntz6ff.jpeg" alt="Load Average calculation example 3" width="800" height="36"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;…and so on.&lt;/p&gt;

&lt;p&gt;The new Load Average does not jump straight to 4; it rises gradually, which lets the metric reflect the current system state more faithfully. In mathematical terms, all three values (1, 5, and 15 minutes) are exponentially damped averages of the total system load since boot; they decay at different rates, tuned for 1, 5, and 15 minutes. As a result, the 1-minute average draws about 63% of its weight from the last minute and 37% from everything before it, and the same 63%/37% split holds for the 5- and 15-minute averages over their respective windows. So it is not strictly accurate to say the 1-minute average covers only the last 60 seconds (about 37% of it comes from the more distant past), but it is fair to say it mainly reflects the last minute.&lt;/p&gt;
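&lt;p&gt;The damped update is easy to verify numerically. A minimal shell sketch (the starting LA of 1.0 is an assumed example value, the queue length of 4 matches the example above, and the kernel recomputes the average roughly every 5 seconds):&lt;/p&gt;

```shell
# Exponentially damped 1-minute Load Average, iterated a few samples:
#   load(t) = load(t-1) * e^(-5/60) + n * (1 - e^(-5/60))
awk 'BEGIN {
  la = 1.0                  # assumed current 1-minute LA
  n  = 4                    # runnable processes in the queue
  d  = exp(-5 / 60)         # decay factor for one 5-second sample
  for (i = 1; i != 4; i++) {
    la = la * d + n * (1 - d)
    printf "sample %d: LA = %.2f\n", i, la
  }
  # weight that the last full minute carries in the 1-minute average
  printf "last-minute weight: %.1f%%\n", (1 - exp(-1)) * 100
}'
```

&lt;p&gt;The average creeps toward 4 instead of jumping there, and the last-minute weight comes out to roughly 63.2%, matching the 63%/37% split described above.&lt;/p&gt;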


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wes5560qz81xas9tzgn.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0wes5560qz81xas9tzgn.jpeg" alt="Htop in Linux" width="800" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;Practical Use&lt;/strong&gt;
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;If Load Average &amp;lt; Number of cores, the system is running normally.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If Load Average ≈ Number of cores, the CPU is at 100% utilization.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If Load Average &amp;gt; Number of cores, processes are waiting for CPU time, indicating potential performance problems.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For example, on an 8-core server:&lt;/p&gt;

&lt;p&gt;• LA(5) = 4 → CPU is about 50% utilized.&lt;/p&gt;

&lt;p&gt;• LA(5) = 12 → CPU is overloaded; processes are waiting.&lt;/p&gt;
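&lt;p&gt;The same arithmetic can be scripted. A small sketch (8 cores and LA(5) = 4 are the example values from above; on a live host they could come from nproc and /proc/loadavg instead):&lt;/p&gt;

```shell
# Rough CPU utilization implied by the 5-minute Load Average.
# On a live Linux host: cores=$(nproc); la5=$(awk '{print $2}' /proc/loadavg)
cores=8
la5=4
awk -v la="$la5" -v c="$cores" 'BEGIN {
  printf "LA(5) = %.1f on %d cores: about %d%% of CPU capacity\n", la, c, la / c * 100
}'
```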

&lt;h1&gt;
  
  
  Why Is Load Average Better Than CPU Utilization Percentage?
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Processes in the Queue. Load Average counts both the processes running on the CPU and those waiting in the run queue. CPU utilization percentage shows only current CPU activity and ignores queued processes, which can lead to underestimating the real load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Overall Load Measure. Load Average covers all CPU cores, so on systems with multiple processors or hyperthreading it represents total load better than a single utilization figure does.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;I/O Factors. Load Average includes processes in “uninterruptible sleep” (for example, waiting for I/O). These can hurt performance badly even while using no CPU time, which plain CPU utilization misses entirely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Analyzing Trends. Load Average reports three timeframes (1, 5, and 15 minutes), making it easy to observe changes over time and spot emerging problems, whereas CPU utilization percentage offers only a momentary snapshot.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbhtwu7cx5qlsbu318lp.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvbhtwu7cx5qlsbu318lp.jpeg" alt="Atop in Linux" width="800" height="186"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;strong&gt;About Hyperthreading&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;With hyperthreading, each physical core runs two instruction streams, and Linux treats each stream as a separate logical core. A system with 4 physical cores and hyperthreading therefore appears to have 8 logical cores. A Load Average of 8.0 on such a system means every logical core is busy, but it does not mean twice the throughput: as Load Average rises, performance degrades faster on hyperthreaded systems than on systems without it. Our next article will cover how hyperthreading works under the hood, along with its benefits, drawbacks, and overall impact.&lt;/p&gt;
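&lt;p&gt;One way to see the logical/physical split on a real machine (nproc comes from coreutils and lscpu from util-linux; exact lscpu field labels can vary between versions):&lt;/p&gt;

```shell
# Logical CPUs as Linux schedules them (hyperthreads count separately)
nproc
# How many hardware threads share each physical core
lscpu | awk -F: '/Thread\(s\) per core/ { gsub(/ /, "", $2); print "threads per core: " $2 }'
```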

&lt;h1&gt;
  
  
  &lt;strong&gt;Conclusion&lt;/strong&gt;
&lt;/h1&gt;

&lt;p&gt;Load Average gives richer insight into system performance than a plain CPU utilization percentage: it reflects the current CPU workload, the processes queued for execution, and processes blocked on I/O. That makes it an essential monitoring tool for Linux systems under heavy load, especially on servers with many cores and hyperthreading.&lt;br&gt;&lt;br&gt;
For a thorough historical analysis of this metric, see Brendan Gregg’s article and its earlier translation published on Habr. And if you want to see exactly how the metric works, you are welcome to explore the kernel code directly!&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
    <item>
      <title>Kubernetes CronJob + Sidecar: A Love Story Gone Wrong (And How to Fix It)</title>
      <dc:creator>Vitaly Bicov</dc:creator>
      <pubDate>Sun, 27 Jul 2025 10:05:59 +0000</pubDate>
      <link>https://dev.to/vitaly_bykov_dd10957baace/kubernetes-cronjob-sidecar-a-love-story-gone-wrong-and-how-to-fix-it-4e8i</link>
      <guid>https://dev.to/vitaly_bykov_dd10957baace/kubernetes-cronjob-sidecar-a-love-story-gone-wrong-and-how-to-fix-it-4e8i</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd80qq28934rxs89izsh.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyd80qq28934rxs89izsh.webp" alt="Kubernetes CronJob + Sidecar: A Love Story Gone Wrong (And How to Fix It)" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I work at a large product company with a sprawling Kubernetes infrastructure. We run thousands of workloads, process massive amounts of data, and rely on automation to keep things running smoothly. So when we needed to execute a scheduled task in Kubernetes, using a CronJob seemed like a no-brainer.&lt;/p&gt;

&lt;p&gt;At first, everything worked perfectly. Our CronJob fired up a Job, the task ran, completed, and exited cleanly.&lt;/p&gt;

&lt;p&gt;But then, as always, the requirements changed:&lt;/p&gt;

&lt;p&gt;• The script was opening too many database connections, so we added an SQL proxy to optimize connection pooling.&lt;/p&gt;

&lt;p&gt;• The task became mission-critical, meaning we needed real-time monitoring to ensure failures wouldn’t go unnoticed.&lt;/p&gt;

&lt;p&gt;• We added sidecar containers for these enhancements… and that’s when everything broke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem: CronJob Stopped Running&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes CronJobs work by creating Jobs, which spin up Pods to execute the actual work. A Job is considered complete only when all containers in the pod reach the Succeeded state.&lt;/p&gt;

&lt;p&gt;Our main container was completing successfully, transitioning to Succeeded.&lt;/p&gt;

&lt;p&gt;But our sidecar containers – SQL Proxy and Monitoring – were running indefinitely.&lt;/p&gt;

&lt;p&gt;Since they never exited, the Job never finished, and the CronJob never scheduled the next execution.&lt;/p&gt;

&lt;p&gt;Oops.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why We Needed These Sidecars in the First Place&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;SQL Proxy: Our script was making hundreds of direct DB connections, overwhelming the database. Adding a SQL proxy helped pool connections, reducing the load.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Monitoring: The job wasn’t just some background task – it was mission-critical. If it failed silently, key business processes would break. We needed real-time logs and metrics to ensure it was running correctly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So removing the sidecars wasn’t an option. Instead, we needed to teach them when to exit.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Fix: Graceful Shutdown via File Signaling&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We needed a way to tell the sidecars:&lt;/p&gt;

&lt;p&gt;“Hey, the main job is done. Time to shut down.”&lt;/p&gt;

&lt;p&gt;Here’s the new strategy:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The main container traps its own exit and writes a completion file (/pod/terminated on success, /pod/error on failure) to a shared volume.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The sidecars poll the shared volume for that file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;When the file appears, each sidecar stops its main process with SIGTERM and exits with the matching status code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Job completes, and the CronJob can schedule the next run.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
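&lt;p&gt;The pattern can be sketched locally in plain shell before wiring it into Kubernetes (the temp directory stands in for the shared volume; the file name follows the /pod/terminated convention used in the implementation):&lt;/p&gt;

```shell
# Local sketch of the file-signaling pattern; a temp dir stands in for the
# pod's shared volume.
dir=$(mktemp -d)

# "Main job": trap EXIT so the completion file is written no matter how we finish.
sh -c "trap 'touch $dir/terminated' EXIT; echo 'main job: working'; sleep 1"

# "Sidecar": poll for the completion file, then shut down.
while [ ! -f "$dir/terminated" ]; do
  sleep 1
done
echo "sidecar: main job finished, exiting"

rm -rf "$dir"
```

&lt;p&gt;Inside the pod the two roles run in separate containers, and an emptyDir volume plays the role of the temp directory.&lt;/p&gt;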

&lt;p&gt;&lt;strong&gt;This Problem Isn’t New&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This issue has been around for a while, and various workarounds have been proposed. There are even specialized projects like K8S Job Sidecar Terminator, which help manage sidecar shutdown for Kubernetes Jobs.&lt;/p&gt;

&lt;p&gt;However, our approach is much simpler and doesn’t require any additional components – just a shared volume and a simple script inside the containers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementation: The Helm Chart&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Shared Volume&lt;/strong&gt;&lt;br&gt;
We’ll use an emptyDir volume so all containers in the pod can access the same file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;volumes:
  - name: shared-data
    emptyDir: {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Main Job Container&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Our main job script will:&lt;/p&gt;

&lt;p&gt;• Execute the actual task.&lt;/p&gt;

&lt;p&gt;• Create /pod/terminated (or /pod/error) when it has finished.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;containers:
  - name: main-job
    image: my-job-image:latest
    command:
      - /bin/sh
      - -c
      - |
        trap '[ $? -eq 0 ] &amp;amp;&amp;amp; touch /pod/terminated || touch /pod/error' EXIT;
        while [ ! -S /tmp/proxysql.sock ]; do sleep 1; done;  # wait for the ProxySQL socket (the path must be on a volume shared with the sidecar)
        ./run-my-task.sh
    volumeMounts:
      - name: shared-data
        mountPath: /pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Sidecar Containers (SQL Proxy or Monitoring)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;We’ll use the same graceful shutdown approach for both.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: proxysql-sidecar
    image: proxysql/proxysql:latest
    command:
      - /bin/sh
      - -c
      - |
        proxysql -f -c "$CONFIG_PATH" &amp;amp; CHILD_PID=$!
        (while true; do
          if [ -f "/pod/terminated" ] || [ -f "/pod/error" ]; then
            kill $CHILD_PID
            echo "Sent SIGTERM to $CHILD_PID because the main container terminated."
            break
          fi
          sleep 1
        done) &amp;amp;
        wait $CHILD_PID
        if [ -f "/pod/error" ]; then
          echo "Job completed with error. Exiting...";
          exit 1;
        elif [ -f "/pod/terminated" ]; then
          echo "Job completed. Exiting...";
          exit 0;
        fi
    volumeMounts:
      - name: shared-data
        mountPath: /pod
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We just need to deploy two instances of this container, one as SQL Proxy and the other as Monitoring, using different images or configurations if needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How It Works Now&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The main job starts.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each sidecar starts its main process plus a background watcher loop, then waits on the main process.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The main job finishes, creates /pod/terminated (or /pod/error), and exits.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The sidecars detect the termination file, catch the signal, and exit cleanly.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Job completes, and the CronJob schedules the next run.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No more stuck Jobs, no more missing CronJob executions.&lt;/p&gt;

&lt;p&gt;Mission complete!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Final Thoughts&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Adding sidecars to Jobs and CronJobs can be tricky, but with a bit of clever process signaling, it’s totally manageable.&lt;/p&gt;

&lt;p&gt;If your CronJob mysteriously stops running, check if your sidecars are stuck in Running state. If they are, they’re the problem.&lt;/p&gt;

&lt;p&gt;This approach – file signaling + SIGTERM traps – is a simple, reliable fix.&lt;/p&gt;

&lt;p&gt;For alternative solutions and further discussion, check out these resources:&lt;/p&gt;

&lt;p&gt;• Kubernetes GitHub Issue #25908&lt;/p&gt;

&lt;p&gt;Hope this helps! Now go forth and deploy with confidence. 🚀&lt;/p&gt;

</description>
      <category>devops</category>
    </item>
  </channel>
</rss>
