<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tibo Beijen</title>
    <description>The latest articles on DEV Community by Tibo Beijen (@tbeijen).</description>
    <link>https://dev.to/tbeijen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F92094%2F0b9a7cf5-a3d8-477d-818a-66e18d9ecf9f.jpg</url>
      <title>DEV Community: Tibo Beijen</title>
      <link>https://dev.to/tbeijen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tbeijen"/>
    <language>en</language>
    <item>
      <title>Introducing the Zen of DevOps</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Sun, 01 Mar 2026 08:31:00 +0000</pubDate>
      <link>https://dev.to/tbeijen/introducing-the-zen-of-devops-3khm</link>
      <guid>https://dev.to/tbeijen/introducing-the-zen-of-devops-3khm</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Over the past ten years or so, my role has gradually shifted from software to platforms. More towards the 'ops' side of things, but coming from a background that values APIs, automation, artifacts and guardrails in the form of automated tests.&lt;/p&gt;

&lt;p&gt;And I found out that a lot of best practices from software engineering can be adapted and applied to modern ops as well.&lt;/p&gt;

&lt;p&gt;DevOps in a nutshell really: Bridging the gap between Dev and Ops. &lt;/p&gt;

&lt;p&gt;One of the most impactful pieces of guidance I have encountered is the &lt;a href="https://peps.python.org/pep-0020/" rel="noopener noreferrer"&gt;Zen of Python&lt;/a&gt;, which largely applies to modern DevOps as well.&lt;/p&gt;

&lt;p&gt;So, I have created a variant: The &lt;a href="https://www.zenofdevops.org/" rel="noopener noreferrer"&gt;Zen of DevOps&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Zen of Python
&lt;/h2&gt;

&lt;p&gt;It must have been around 2013, while working at &lt;a href="https://www.nu.nl/" rel="noopener noreferrer"&gt;NU.nl&lt;/a&gt;, that we phased out PHP in favor of Python. And that was an interesting mental exercise!&lt;/p&gt;

&lt;p&gt;Now I like my share of abstractions. When working on my graduation project, my favorite part was using OOP concepts in Macromedia Director, even though the demo app was just a small part of the project's scope. And working with PHP I went through my '&lt;a href="https://en.wikipedia.org/wiki/Design_Patterns" rel="noopener noreferrer"&gt;Gang of Four&lt;/a&gt;' phase and built a fair share of overengineered bloat. &lt;a href="https://framework.zend.com/manual/2.4/en/index.html" rel="noopener noreferrer"&gt;Zend Framework&lt;/a&gt; was my tool of choice, satisfying every design pattern craving I had.&lt;/p&gt;

&lt;p&gt;Then came Python. And with that came &lt;a href="https://www.djangoproject.com/" rel="noopener noreferrer"&gt;Django&lt;/a&gt;, an &lt;em&gt;opinionated&lt;/em&gt; framework. Really the opposite of Zend Framework (which is just a grab-bag of tools with a consistent interface). And it was not just Django that had opinions; Python itself did as well. A vision of the core values of the language: The &lt;a href="https://peps.python.org/pep-0020/" rel="noopener noreferrer"&gt;Zen of Python&lt;/a&gt;.&lt;/p&gt;
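
&lt;p&gt;Fittingly, the Zen of Python ships with the language itself: importing the standard-library &lt;code&gt;this&lt;/code&gt; module prints it.&lt;/p&gt;

```python
import codecs
import this  # importing `this` prints the Zen of Python as a side effect

# The module also exposes the text, ROT13-encoded, in `this.s`
zen = codecs.decode(this.s, "rot13")
assert "Explicit is better than implicit" in zen
```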

&lt;p&gt;After a transition period, shoving some PHP-isms into Python, I came to appreciate the nature of Python. At its core it's simple. But, when you need it, it gives you all the OOP you want, as well as powerful concepts such as &lt;a href="https://realpython.com/primer-on-python-decorators/" rel="noopener noreferrer"&gt;decorators&lt;/a&gt;, &lt;a href="https://realpython.com/python-with-statement/" rel="noopener noreferrer"&gt;context managers&lt;/a&gt; and &lt;a href="https://realpython.com/python-exceptions/" rel="noopener noreferrer"&gt;exceptions&lt;/a&gt; that are cheap and very useful.&lt;/p&gt;

&lt;p&gt;Simple when possible. Complex when needed.&lt;/p&gt;
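
&lt;p&gt;A minimal sketch of what that looks like in practice. The names here are illustrative (&lt;code&gt;ignored&lt;/code&gt; mimics the standard library's &lt;code&gt;contextlib.suppress&lt;/code&gt;): a decorator adds timing without touching the function, and a context manager makes silencing an error an explicit act.&lt;/p&gt;

```python
import time
from contextlib import contextmanager
from functools import wraps

def timed(func):
    """Decorator: report how long a call takes, without changing the function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            print(f"{func.__name__} took {time.perf_counter() - start:.4f}s")
    return wrapper

@contextmanager
def ignored(*exceptions):
    """Context manager: silence only the listed exception types, explicitly."""
    try:
        yield
    except exceptions:
        pass

@timed
def parse(value: str) -> int:
    return int(value)

# "Errors should never pass silently. Unless explicitly silenced."
with ignored(ValueError):
    parse("not-a-number")
```

Simple call sites, with the complexity opted into only where needed.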

&lt;h2&gt;
  
  
  Adapting to DevOps
&lt;/h2&gt;

&lt;p&gt;The Zen of DevOps combines personal experience with conversations and observations from the past many years: Setups I have found easy to maintain. Setups that, despite all the modern tools, were complex and brittle. Countless conference talks attended and articles read. Many hallway tracks, discussing practices with peers.&lt;/p&gt;

&lt;p&gt;The resulting guidelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Be able to break non-production systems&lt;/li&gt;
&lt;li&gt;Be able to break non-production systems only&lt;/li&gt;
&lt;li&gt;Design for more than one&lt;/li&gt;
&lt;li&gt;Design for more than once&lt;/li&gt;
&lt;li&gt;Favor changes that make you faster over those that slow you down&lt;/li&gt;
&lt;li&gt;Beautiful is better than ugly&lt;/li&gt;
&lt;li&gt;Explicit is better than implicit&lt;/li&gt;
&lt;li&gt;Simple is better than complex&lt;/li&gt;
&lt;li&gt;Complex is better than complicated&lt;/li&gt;
&lt;li&gt;Errors should never pass silently&lt;/li&gt;
&lt;li&gt;Unless explicitly silenced&lt;/li&gt;
&lt;li&gt;In the face of ambiguity, refuse the temptation to guess&lt;/li&gt;
&lt;li&gt;There should be one - and preferably only one - obvious way to do it&lt;/li&gt;
&lt;li&gt;If the implementation is hard to explain, it's a bad idea&lt;/li&gt;
&lt;li&gt;If the implementation is easy to explain, it may be a good idea&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Removals
&lt;/h3&gt;

&lt;p&gt;Some elements of the Zen of Python, I have left out of the Zen of DevOps:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Flat is better than nested / Sparse is better than dense / Readability counts&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In DevOps we have to make do with the languages and formats that are common: Mostly &lt;a href="https://go.dev/" rel="noopener noreferrer"&gt;Go&lt;/a&gt;, which has its own opinionated &lt;code&gt;fmt&lt;/code&gt;. Furthermore, schema design of YAML and JSON should be guided more by API design guidelines than by readability. Although readability of course is a good thing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Special cases aren't special enough to break the rules / Although practicality beats purity&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Although valid points in their own right, I felt that, because the scope of DevOps is so much wider than a single programming language, these guidelines are a bit too restrictive. The reality of DevOps in large organizations is often a messy variety of practices at different levels of maturity. These guidelines just get in the way.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Now is better than never / Although never is often better than &lt;em&gt;right&lt;/em&gt; now&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I replaced that with "Favor changes that make you faster over those that slow you down", putting a bit more emphasis on modern Agile and scrum practices, which sometimes favor external stakeholder requests over internal team effectiveness&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Although that way may not be obvious at first unless you're Dutch.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It's a joke. I like jokes, and even though I'm Dutch as well: This one makes no sense and distracts more than it adds.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Namespaces are one honking great idea -- let's do more of those!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The ubiquitous 'naming things'. A bit off-key here: DevOps is a lot about 'moving parts' and 'orchestration', not just software, where namespaces indeed are useful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Additions
&lt;/h3&gt;

&lt;p&gt;Some new guidelines have been added (see the &lt;a href="https://www.zenofdevops.org/" rel="noopener noreferrer"&gt;Zen of DevOps&lt;/a&gt; for a more elaborate explanation of each):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Be able to break non-production systems / Be able to break non-production systems only&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These two guidelines emphasize differentiating non-production from production. That is more an 'ops' thing than a 'dev' thing, and was not really conveyed in the Zen of Python.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Design for more than one / Design for more than once&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;These guidelines focus on the automation and codifying practices of modern infra. And really, it's not that new: Using &lt;a href="https://www.ibm.com/docs/en/tpmfod/7.1.1.16?topic=sysprep-windows-xp-windows-2003-operating-systems" rel="noopener noreferrer"&gt;sysprep&lt;/a&gt; in the Windows XP era, to stamp out many desktops, is not entirely different from preparing USB sticks for air-gapped Kubernetes edge deployments. And that is not unlike immutable infrastructure, never modifying a server in-place, just stamping out new ones.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Favor changes that make you faster over those that slow you down&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As stated above, this guideline emphasizes the need to stay ahead of the maintenance curve. The scope and complexity of what teams can, and need to, manage is ever-growing. That means changes that simplify, reduce friction and improve efficiency are more important than ever.&lt;/p&gt;

&lt;h2&gt;
  
  
  Universal and timeless
&lt;/h2&gt;

&lt;p&gt;Time will tell if the Zen of DevOps will be as timeless as the Zen of Python. I hope so!&lt;/p&gt;

&lt;p&gt;The range of practices that can be observed in the field of DevOps is increasingly wide: Front runners have already adopted agentic workflows. At the same time there are organizations where requesting a server, a cluster, a DNS change, or a firewall change can take many days&lt;sup id="fnref2"&gt;2&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;AI is changing many fields of work in impactful ways&lt;sup id="fnref3"&gt;3&lt;/sup&gt;. At the same time, engineering principles are quite foundational. If you design a plane, you build it to last, you design for maintenance and upgrades, add redundancy&lt;sup id="fnref4"&gt;4&lt;/sup&gt;, add safety margins. Whether the design is created on paper using rulers, using a computer, or mostly by AI: Those principles still exist, and should be supervised. &lt;/p&gt;

&lt;p&gt;Software is no different: Security, observability, maintainability, auditability and computational efficiency are all foundational engineering concerns, also known as 'non-functional requirements'.&lt;/p&gt;

&lt;p&gt;We will see if the Zen of DevOps will hold strong in these times of AI. If it doesn't, we have probably ended up with a lot of incomprehensible junk. But I have good hopes.&lt;/p&gt;

&lt;p&gt;Take for example:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There should be one - and preferably only one - obvious way to do it&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This will translate directly into better agent performance. Likewise, when experimenting with agent integrations, it's really important you can do that on non-prod. And if first experiments mess things up, it's really helpful you can rebuild your setup.&lt;/p&gt;

&lt;p&gt;Unlike the Zen of Python, which focused on a single language, the Zen of DevOps aims to be more universal. &lt;/p&gt;

&lt;p&gt;Our industry is full of 'strong preferences' or previous choices we have become invested in beyond the point of no return. The Zen of DevOps aims to guide at a higher level, so it is not about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Serverless vs. Kubernetes&lt;/li&gt;
&lt;li&gt;Public cloud vs. on-premise&lt;/li&gt;
&lt;li&gt;AWS vs. Azure vs. GCP&lt;/li&gt;
&lt;li&gt;Terraform vs. CDK vs. Pulumi vs. Crossplane&lt;/li&gt;
&lt;li&gt;GitOps vs. Pipelines&lt;/li&gt;
&lt;li&gt;Agile vs. Kanban vs. Waterfall&lt;/li&gt;
&lt;li&gt;Strong vs. Weak typing&lt;/li&gt;
&lt;li&gt;Imperative vs. Declarative&lt;/li&gt;
&lt;li&gt;Windows vs. Linux&lt;/li&gt;
&lt;li&gt;Pets vs. Cattle&lt;/li&gt;
&lt;li&gt;DevOps vs. SRE vs. Platform Engineering&lt;/li&gt;
&lt;li&gt;Rust vs. ... &amp;lt;every other language&amp;gt;&lt;/li&gt;
&lt;li&gt;YAML vs. JSON vs. TOML vs. KYAML vs JSON5&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Means, not goals
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Don’t let guidelines distract you from your goals!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some of the guidelines can be interpreted in several ways. And not every guideline might be feasible or applicable in every environment. And that's ok!&lt;/p&gt;

&lt;p&gt;Consider 'explicit'. To some it might mean: Make everything very visible. No abstractions. Everything is 'out there'. To others, including me, it means: Make conscious choices in what to expose, and what to hide, making the parts that can be considered 'the interface', explicit.&lt;/p&gt;

&lt;p&gt;The main takeaway is: Be deliberate about such practices, and keep evaluating how they affect a project and collaboration within and between teams.&lt;/p&gt;

&lt;p&gt;One does not complete or fail the Zen of DevOps.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;It has been interesting to try to distill years of experience and observations into a small set of principles. I hope it gives teams and individuals some new perspectives. Even when disagreeing, unpacking why can yield insights, and has value.&lt;/p&gt;

&lt;p&gt;In the coming period I might dive deeper into certain topics. If so, they will be tagged &lt;a href="https://www.tibobeijen.nl/tags/zenofdevops/" rel="noopener noreferrer"&gt;zenofdevops&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Got any thoughts or feedback? By all means, reach out on &lt;a href="https://www.linkedin.com/in/tibobeijen/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Zen to all...&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;I have yet to see an OKR stating 'good team mental health' as a key result. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;I am aware there are people, upon reading, wishing it were mere days. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;Curious what happens when we find out that we have replaced all simple deterministic processes by awesome 'magic', and now are strategically dependent on the new oil: Datacenter capacity, energy, GPUs, memory. Sold by just a few big companies. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn4"&gt;
&lt;p&gt;The price of not doing so &lt;a href="https://risktec.tuv.com/knowledge-bank/the-price-of-single-point-failure/" rel="noopener noreferrer"&gt;can be unacceptably high&lt;/a&gt;. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>devops</category>
      <category>sre</category>
      <category>platformengineering</category>
      <category>python</category>
    </item>
    <item>
      <title>East, west, north, south: How to fix your local cluster routes</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Fri, 04 Apr 2025 13:52:16 +0000</pubDate>
      <link>https://dev.to/tbeijen/east-west-north-south-how-to-fix-your-local-cluster-routes-1n6b</link>
      <guid>https://dev.to/tbeijen/east-west-north-south-how-to-fix-your-local-cluster-routes-1n6b</guid>
      <description>

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Recently I needed to test a &lt;a href="https://www.keycloak.org/" rel="noopener noreferrer"&gt;Keycloak&lt;/a&gt; upgrade. This required me to deploy both the new Keycloak version and a sample OIDC application on my local Kubernetes setup. And it pointed me to a thing I kept postponing:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Improve my local development DNS, routing and TLS setup&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  The challenge
&lt;/h2&gt;

&lt;p&gt;Until recently, I used URLs like &lt;code&gt;keycloak.127.0.0.1.nip.io:8443&lt;/code&gt;. This points to &lt;code&gt;127.0.0.1&lt;/code&gt; port &lt;code&gt;8443&lt;/code&gt;, which forwards to a local &lt;a href="https://k3d.io/" rel="noopener noreferrer"&gt;k3d&lt;/a&gt; cluster. At the same time it provides a unique hostname that can be used for configuring ingress. &lt;/p&gt;

&lt;p&gt;Nice. But not without flaws.&lt;/p&gt;

&lt;p&gt;For starters, this works for routing traffic &lt;em&gt;to&lt;/em&gt; the k3d cluster (we could call this 'north-south'), but not for routing traffic &lt;em&gt;within&lt;/em&gt; the cluster (east-west). This becomes apparent when trying to set up an OIDC sample application&lt;sup id="fnref1"&gt;1&lt;/sup&gt;, such as &lt;a href="https://github.com/dexidp/dex/pkgs/container/example-app" rel="noopener noreferrer"&gt;the one shipped with DEX&lt;/a&gt;. The domain pointing to Keycloak is used in two places: By the browser of the user logging in, so in this case from the host OS, &lt;em&gt;and&lt;/em&gt; directly from the backend, so within the cluster. &lt;/p&gt;

&lt;p&gt;This puts us in a catch-22: &lt;code&gt;nip.io&lt;/code&gt;, or an entry in &lt;code&gt;/etc/hosts&lt;/code&gt; only works for north-south. &lt;code&gt;svc.cluster.local&lt;/code&gt; only works for east-west. &lt;/p&gt;

&lt;p&gt;Another problem is that the default certificates issued by &lt;a href="https://github.com/traefik/traefik" rel="noopener noreferrer"&gt;Traefik&lt;/a&gt; are not trusted by other systems or browsers. So we frequently need to bypass security warnings, which by itself is indicative of a problem and encourages bad habits. Furthermore, even if we manage to configure our setup to use the ingress service from within the cluster, it depends on the backend application whether TLS host checking can be bypassed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82bkochnf9ujeqfvbhdp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F82bkochnf9ujeqfvbhdp.png" alt="Problem routing both north-south and east-west traffic" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To improve this, we need to address some things. In this article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Introduction&lt;/li&gt;
&lt;li&gt;The challenge&lt;/li&gt;
&lt;li&gt;The plan&lt;/li&gt;
&lt;li&gt;Improvement 1: TLS certificates and trust&lt;/li&gt;
&lt;li&gt;Improvement 2: Fixing north-south routing&lt;/li&gt;
&lt;li&gt;Improvement 3: Fixing east-west routing&lt;/li&gt;
&lt;li&gt;Combining and automating&lt;/li&gt;
&lt;li&gt;Next steps and wrapping it up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;☞ Don't fancy reading? Head straight to the &lt;a href="https://github.com/TBeijen/dev-cluster-config" rel="noopener noreferrer"&gt;github repo containing taskfile automation&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  The plan
&lt;/h2&gt;

&lt;p&gt;So, let's identify and configure the components needed to create a smooth local Kubernetes setup, providing trusted TLS and predictable endpoints. &lt;/p&gt;

&lt;p&gt;This will result in applications being accessible via the following pattern:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;k3d cluster&lt;/th&gt;
&lt;th&gt;Hostnames&lt;/th&gt;
&lt;th&gt;HTTP port&lt;/th&gt;
&lt;th&gt;HTTPS port&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;cl0&lt;/td&gt;
&lt;td&gt;*.cl0.k3d.local&lt;/td&gt;
&lt;td&gt;10080&lt;/td&gt;
&lt;td&gt;10443&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cl1&lt;/td&gt;
&lt;td&gt;*.cl1.k3d.local&lt;/td&gt;
&lt;td&gt;11080&lt;/td&gt;
&lt;td&gt;11443&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cl2&lt;/td&gt;
&lt;td&gt;*.cl2.k3d.local&lt;/td&gt;
&lt;td&gt;12080&lt;/td&gt;
&lt;td&gt;12443&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;etc, etc...&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;East, west, north, south. The components used to fix the routes. Source: Wikimedia &amp;amp; Open Source projects&lt;/em&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Improvement 1: TLS certificates and trust
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Create CA certificate and key
&lt;/h3&gt;

&lt;p&gt;The ingress configurations in the cluster need to serve a certificate that is trusted by browsers and systems. One way could be registering a public (sub)domain for internal use and using &lt;a href="https://letsencrypt.org/" rel="noopener noreferrer"&gt;Let's Encrypt&lt;/a&gt; certificates, with the &lt;a href="https://letsencrypt.org/docs/challenge-types/#dns-01-challenge" rel="noopener noreferrer"&gt;DNS-01 challenge&lt;/a&gt; for verification.&lt;/p&gt;

&lt;p&gt;Another way is to create a self-signed Certificate Authority (CA), use that to issue TLS certificates, and ensure the CA is trusted by the relevant systems. I chose this approach since it's mostly a setup-once affair, and doesn't require me to deal with API tokens of my DNS provider. &lt;/p&gt;

&lt;p&gt;One important thing to be aware of is that adding a CA to a trust bundle means any certificate signed by it will be trusted. So, if the CA signs a certificate for e.g. &lt;code&gt;myaccount.google.com&lt;/code&gt;, your browser will trust it. This can be mitigated by adding &lt;a href="https://wiki.mozilla.org/CA:NameConstraints" rel="noopener noreferrer"&gt;NameConstraints&lt;/a&gt;, which reduces the risk of adding self-signed CAs to your trust bundles.&lt;/p&gt;

&lt;p&gt;Create a &lt;code&gt;ca.ini&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ req ]
default_bits       = 4096
distinguished_name = req_distinguished_name
req_extensions     = v3_req
prompt             = no

[ req_distinguished_name ]
CN = Development Setup .local CA
O = LocalDev
C = NL

[ v3_req ]
basicConstraints = critical, CA:TRUE
keyUsage = critical, keyCertSign, cRLSign
nameConstraints = critical, permitted;DNS:.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create key, certificate signing request (csr) and signed certificate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;openssl ecparam -name prime256v1 -genkey -noout -out ca.key
openssl req -new -key ca.key -out ca.csr -config ca.ini
openssl x509 -req -in ca.csr -signkey ca.key -out ca.crt -days 3650 -extfile ca.ini -extensions v3_req
# Show the certificate
openssl x509 -noout -text -in ca.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note the name constraints section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;X509v3 extensions:
    X509v3 Name Constraints: critical
        Permitted:
          DNS:.local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's test whether the name constraint works by issuing a certificate that does not match it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Create a certificate for example.com
openssl req -x509 -newkey rsa:4096 -sha256 -days 3650 \
  -CA ca.crt -CAkey ca.key \
  -nodes -keyout example.com.key -out example.com.crt -subj "/CN=example.com" \
  -addext "subjectAltName=DNS:example.com,DNS:*.example.com,IP:10.0.0.1"
# Ok, so we can create a certificate outside the name constraints
# Let's verify it
openssl verify -verbose -CAfile ca.crt example.com.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CN=example.com
error 47 at 0 depth lookup: permitted subtree violation
error example.com.crt: verification failed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Good! Now add the CA to the macOS Keychain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain ca.crt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Setup Kubernetes cluster issuer and trust bundle
&lt;/h3&gt;

&lt;p&gt;We can now install &lt;a href="https://cert-manager.io/docs/" rel="noopener noreferrer"&gt;cert-manager&lt;/a&gt; to issue TLS certificates and &lt;a href="https://cert-manager.io/docs/trust/trust-manager/" rel="noopener noreferrer"&gt;trust-manager&lt;/a&gt; to distribute trust bundles:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add jetstack https://charts.jetstack.io --force-update

# cert-manager
# Note the NameConstraints feature gates!
helm upgrade --install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true \
  --set webhook.featureGates="NameConstraints=true" \
  --set featureGates="NameConstraints=true"

# trust-manager
helm upgrade --install \
  trust-manager jetstack/trust-manager \
  --namespace cert-manager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Add the CA certificate and create a &lt;code&gt;ClusterIssuer&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl -n cert-manager create secret tls root-ca --cert=ca.crt --key=ca.key

cat &amp;lt;&amp;lt;EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: root-ca
spec:
  ca:
    secretName: root-ca
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a trust &lt;code&gt;Bundle&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;EOF | kubectl apply -f -
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: default-ca-bundle
spec:
  sources:
  - useDefaultCAs: true
  - secret:
      name: "root-ca"
      key: "tls.crt"
  target:
    configMap:
      key: "bundle.pem"
    namespaceSelector:
      matchLabels:
        trust: enabled
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now the bits of configuration we need to remember when setting up applications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The cluster issuer name is &lt;code&gt;root-ca&lt;/code&gt;, so Ingress objects need the annotation &lt;code&gt;cert-manager.io/cluster-issuer: root-ca&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;The CA bundle, including our issuer CA, can be found in a ConfigMap &lt;code&gt;default-ca-bundle&lt;/code&gt; under key &lt;code&gt;bundle.pem&lt;/code&gt;, &lt;em&gt;if&lt;/em&gt; the namespace is labeled &lt;code&gt;trust: enabled&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Since I consider dev clusters ephemeral and short-lived, topics like safely rotating issuer certificates don't need attention. When setting up trust manager in production environments, be sure to consider &lt;a href="https://cert-manager.io/docs/trust/trust-manager/installation/#trust-namespace" rel="noopener noreferrer"&gt;what namespace to install&lt;/a&gt; in and &lt;a href="https://cert-manager.io/docs/trust/trust-manager/#cert-manager-integration-intentionally-copying-ca-certificates" rel="noopener noreferrer"&gt;prepare for issuer certificate rotation&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Improvement 2: Fixing north-south routing
&lt;/h2&gt;

&lt;p&gt;As mentioned in the introduction, wildcard DNS services like &lt;code&gt;nip.io&lt;/code&gt; are helpful for routing from host to development cluster, but will not work within the cluster: the name resolves to &lt;code&gt;127.0.0.1&lt;/code&gt; and the target service won't be there.&lt;/p&gt;

&lt;p&gt;One way to handle this is to install &lt;a href="https://thekelleys.org.uk/dnsmasq/doc.html" rel="noopener noreferrer"&gt;dnsmasq&lt;/a&gt;, and configure the host to send DNS queries for &lt;code&gt;.local&lt;/code&gt; to &lt;code&gt;127.0.0.1&lt;/code&gt;, where dnsmasq will answer with &lt;code&gt;127.0.0.1&lt;/code&gt;. Using the port matching the k3d cluster then ensures the correct cluster receives the traffic.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install dnsmasq
# Ensure service can bind to port 53 and starts at reboot
sudo brew services start dnsmasq
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Configure dnsmasq, using &lt;code&gt;brew --prefix&lt;/code&gt; to determine where the config is located. On an Apple silicon Mac that will be &lt;code&gt;/opt/homebrew&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Ensure the following line in &lt;code&gt;/opt/homebrew/etc/dnsmasq.conf&lt;/code&gt; is uncommented:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;conf-dir=/opt/homebrew/etc/dnsmasq.d/,*.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we can configure dnsmasq to resolve &lt;code&gt;.local&lt;/code&gt; to &lt;code&gt;127.0.0.1&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo "address=/local/127.0.0.1" &amp;gt; $(brew --prefix)/etc/dnsmasq.d/local.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we tell macOS to use dnsmasq at &lt;code&gt;127.0.0.1&lt;/code&gt; to resolve DNS queries for &lt;code&gt;.local&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo sh -c "echo 'nameserver 127.0.0.1' &amp;gt; /etc/resolver/local"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Be aware that tools like &lt;code&gt;dig&lt;/code&gt; and &lt;code&gt;nslookup&lt;/code&gt; behave a bit differently from most programs on macOS, so they are not the best way to test&lt;sup id="fnref2"&gt;2&lt;/sup&gt;. If we have set up a k3d cluster, &lt;a href="https://k3d.io/v5.3.0/usage/commands/k3d_cluster_create/#options" rel="noopener noreferrer"&gt;mapping host ports&lt;/a&gt; to the http and https ports using &lt;code&gt;-p&lt;/code&gt;, we could try to reach an application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Assuming we have set up keycloak and port 10443 is forwarded to k3d https port
# Using -k since curl does not use the system trust bundle, so is not aware of our CA
curl -k https://keycloak.cl0.k3d.local:10443/ -I
HTTP/2 302
location: https://keycloak.cl0.k3d.local:10443/admin/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Application is there. Good. Let's move on.&lt;/p&gt;


&lt;h2&gt;
  
  
  Improvement 3: Fixing east-west routing
&lt;/h2&gt;

&lt;p&gt;Our dnsmasq setup works from the host, via ingress, to a service. But it won't work when accessing another service within the cluster using that same domain. &lt;/p&gt;

&lt;p&gt;Of course, we can access services the usual way via &lt;code&gt;service-name.namespace.svc.cluster.local&lt;/code&gt;. But this means within the cluster we need to use a different domain than from the outside. Confusing at best, and in some cases not possible. One example being Keycloak client applications, as outlined in the introduction, where only one configuration item for the Keycloak domain exists.&lt;/p&gt;
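
&lt;p&gt;That in-cluster naming follows a fixed pattern. A tiny illustrative helper (hypothetical, not part of any tooling) shows the shape of the name:&lt;/p&gt;

```python
def cluster_local_name(service: str, namespace: str = "default") -> str:
    """Build the standard in-cluster DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.cluster.local"

# Resolvable inside the cluster only; from the host this name means nothing,
# which is exactly the split this article sets out to fix.
print(cluster_local_name("keycloak", "auth"))  # keycloak.auth.svc.cluster.local
```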

&lt;p&gt;If, from &lt;em&gt;within&lt;/em&gt; our cluster, we try to resolve &lt;code&gt;keycloak.cl0.k3d.local&lt;/code&gt; from a pod, the following happens:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CoreDNS knows nothing about this name: it's not a pod, it's not a service&lt;/li&gt;
&lt;li&gt;CoreDNS forwards the query to the host&lt;/li&gt;
&lt;li&gt;The host (our MacBook) recognizes &lt;code&gt;.local&lt;/code&gt; and directs the query to &lt;code&gt;127.0.0.1:53&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Inside the pod, nothing is listening on &lt;code&gt;127.0.0.1:53&lt;/code&gt;, so DNS resolution fails&lt;/li&gt;
&lt;/ul&gt;
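
&lt;p&gt;We can observe this failure from a throwaway pod (assuming &lt;code&gt;kubectl&lt;/code&gt; points at the k3d cluster; pod name and image are arbitrary):&lt;br&gt;
&lt;/p&gt;

```
# Start a short-lived busybox pod and try to resolve the ingress domain.
# With the setup so far, this lookup fails.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 \
  -- nslookup keycloak.cl0.k3d.local
```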

&lt;p&gt;To fix this, we make two adjustments.&lt;/p&gt;

&lt;p&gt;First, we copy the existing &lt;code&gt;traefik&lt;/code&gt; service to &lt;code&gt;traefik-internal&lt;/code&gt;, changing the type from &lt;code&gt;LoadBalancer&lt;/code&gt; to &lt;code&gt;ClusterIP&lt;/code&gt; and adjusting the ports to align with the ports mapped to the host. The resulting service looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: Service
metadata:
  name: traefik-internal
  namespace: kube-system
spec:
  ports:
  - name: webext
    port: 10080
    protocol: TCP
    targetPort: web
  - name: websecureext
    port: 10443
    protocol: TCP
    targetPort: websecure
  selector:
    app.kubernetes.io/instance: traefik-kube-system
    app.kubernetes.io/name: traefik
  type: ClusterIP
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to ensure that connecting to e.g. &lt;code&gt;keycloak.cl0.k3d.local&lt;/code&gt; from &lt;em&gt;within&lt;/em&gt; our cluster, will end up at the &lt;code&gt;traefik-internal&lt;/code&gt; service. &lt;/p&gt;

&lt;p&gt;For this we can add a &lt;a href="https://coredns.io/2017/05/08/custom-dns-entries-for-kubernetes/" rel="noopener noreferrer"&gt;custom DNS entry to CoreDNS&lt;/a&gt;. We can do so by adding a configmap:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coredns-custom&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;k3d.local.server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;k3d.local:53 {&lt;/span&gt;
        &lt;span class="s"&gt;errors&lt;/span&gt;
        &lt;span class="s"&gt;cache 30&lt;/span&gt;
        &lt;span class="s"&gt;rewrite name regex (.*)\.cl0\.k3d\.local traefik-internal.kube-system.svc.cluster.local&lt;/span&gt;
        &lt;span class="s"&gt;# We need to rewrite everything coming into this configuration block to avoid infinite loops&lt;/span&gt;
        &lt;span class="s"&gt;rewrite name regex (.*)\.k3d\.local host.k3d.internal&lt;/span&gt;
        &lt;span class="s"&gt;forward . 127.0.0.1:53&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; As the comment says, we need to ensure we rewrite &lt;em&gt;everything&lt;/em&gt;, since we feed the rewritten domain back to CoreDNS. This is why using &lt;code&gt;k3d.local&lt;/code&gt; as the domain to put clusters under works fine, whereas &lt;code&gt;k3d.internal&lt;/code&gt; does &lt;em&gt;not&lt;/em&gt;: with the latter, we would rewrite to a new FQDN that re-enters the custom config, resulting in an infinite loop and a CoreDNS crash.&lt;/p&gt;

&lt;p&gt;With the three improvements in place, we now have a setup that works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xgavrk3kcbdl1paa29t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xgavrk3kcbdl1paa29t.png" alt="Consistent DNS and trusted certificates" width="800" height="734"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  Combining and automating
&lt;/h2&gt;

&lt;p&gt;Although we now have a configuration that works, it is not particularly easy to set up. So, what do we do? We automate.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://taskfile.dev/" rel="noopener noreferrer"&gt;Taskfile&lt;/a&gt; is single-binary &lt;code&gt;Make&lt;/code&gt; alternative that provides all the templating and configurability needed, to easily spin up K3D clusters configured as described in this article.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/TBeijen/dev-cluster-config" rel="noopener noreferrer"&gt;dev-cluster-config&lt;/a&gt; repository. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# review .default.env
cat .default.env

# If needing to adjust config 
cp .default.env .env
vim .env

# Once: Setup CA certificate
task cert

# Once: Setup dnsmasq
task dnsmasq_brew

# Setup &amp;amp; configure clusters
task k3d-cluster-setup-0
task k3d-cluster-setup-1

# Optionally: Add example applications (nginx/curl)
task k3d-cluster-examples-0
task k3d-cluster-examples-1

# Use k3d to remove a cluster
k3d cluster delete cl0
k3d cluster delete cl1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h2&gt;
  
  
  Next steps and wrapping it up
&lt;/h2&gt;

&lt;p&gt;Optionally, we could also set up a load balancer like &lt;a href="https://www.haproxy.org/" rel="noopener noreferrer"&gt;haproxy&lt;/a&gt; on the macOS host that listens on the default http and https ports 80 and 443. It would serve the trusted certificate and, based on the requested host, forward to the proper k3d cluster. This would remove the need to use custom ports.&lt;/p&gt;
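
&lt;p&gt;As a hypothetical sketch of such an haproxy configuration (the certificate path and the cl1 port are illustrative):&lt;br&gt;
&lt;/p&gt;

```
# Hypothetical sketch: terminate TLS on 443, route on the Host header
frontend https-in
    mode http
    bind *:443 ssl crt /usr/local/etc/haproxy/k3d-local.pem
    use_backend cl0 if { hdr(host) -m end .cl0.k3d.local }
    use_backend cl1 if { hdr(host) -m end .cl1.k3d.local }

backend cl0
    mode http
    server k3d-cl0 127.0.0.1:10443 ssl verify none

backend cl1
    mode http
    server k3d-cl1 127.0.0.1:11443 ssl verify none
```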

&lt;p&gt;If, on the other hand, one does not want to set up dnsmasq, one could address clusters like &lt;code&gt;myapp.cl1.k3d.127.0.0.1.nip.io&lt;/code&gt;, and update the CoreDNS configuration accordingly, to intercept DNS lookups for &lt;code&gt;*.k3d.127.0.0.1.nip.io&lt;/code&gt; and return the IP address of &lt;code&gt;host.k3d.internal&lt;/code&gt;.&lt;/p&gt;
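
&lt;p&gt;The custom CoreDNS server block from improvement 3 could then, hypothetically, look something like:&lt;br&gt;
&lt;/p&gt;

```
# Hypothetical variant of the coredns-custom entry for nip.io style names
k3d.127.0.0.1.nip.io:53 {
    errors
    cache 30
    # everything is rewritten, so the forwarded query cannot re-enter this block
    rewrite name regex (.*)\.k3d\.127\.0\.0\.1\.nip\.io host.k3d.internal
    forward . 127.0.0.1:53
}
```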

&lt;p&gt;The setup described in this article consists of several discrete parts. It is not a one-stop integrated solution. However, as illustrated above, it can easily be extended and adjusted, which can be considered an advantage. If you prefer to run &lt;a href="https://kind.sigs.k8s.io/" rel="noopener noreferrer"&gt;Kind&lt;/a&gt;, &lt;a href="https://minikube.sigs.k8s.io/docs/" rel="noopener noreferrer"&gt;Minikube&lt;/a&gt;, &lt;a href="https://docs.rancherdesktop.io/" rel="noopener noreferrer"&gt;Rancher Desktop&lt;/a&gt; or &lt;a href="https://github.com/abiosoft/colima" rel="noopener noreferrer"&gt;Colima&lt;/a&gt;, a similar approach will work.&lt;/p&gt;

&lt;p&gt;Now, local development setups, like OS and editor choices, are typically something engineers are very opinionated about. And that's fine!!&lt;sup id="fnref3"&gt;3&lt;/sup&gt; So, if you are wondering "why are you doing all this and not doing this other thing instead?", by all means reach out on &lt;a href="https://www.linkedin.com/in/tibobeijen/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; or &lt;a href="https://bsky.app/profile/tibobeijen.nl" rel="noopener noreferrer"&gt;BlueSky&lt;/a&gt;. I'm curious!&lt;/p&gt;

&lt;p&gt;Regardless, I hope the above provides some guidance on getting the most out of your local development clusters.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;Yes, we are mixing Keycloak and DEX. The beauty of standards such as OIDC. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;It's... complicated. &lt;a href="https://rakhesh.com/infrastructure/macos-vpn-doesnt-use-the-vpn-dns/" rel="noopener noreferrer"&gt;This article&lt;/a&gt; about configuring DNS and VPN gives some insights. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;Although there is often a balance to strike between 'own improvements' and 'team standards'. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>kubernetes</category>
      <category>k3d</category>
      <category>certmanager</category>
      <category>macos</category>
    </item>
    <item>
      <title>12 Factor: 13 years later</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Sat, 27 Apr 2024 04:33:00 +0000</pubDate>
      <link>https://dev.to/tbeijen/12-factor-13-years-later-4k3o</link>
      <guid>https://dev.to/tbeijen/12-factor-13-years-later-4k3o</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In a presentation about CI/CD I gave recently, I briefly mentioned the &lt;a href="https://12factor.net/" rel="noopener noreferrer"&gt;12 factor methodology&lt;/a&gt;. Something along the lines of "You might find some good practices there", summarizing it as:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;artifact
configuration +
---------------
deployment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the talk, a colleague from way back came to me and said: "You were way too mild in &lt;em&gt;suggesting&lt;/em&gt; it. It's mandatory, people &lt;em&gt;should&lt;/em&gt; follow those practices."&lt;sup id="fnref1"&gt;1&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;And yes, he was right. There are a lot of good practices to get from the 12 factor methodology. But do &lt;em&gt;all&lt;/em&gt; parts still hold up? Or might following it to the letter be actually counter-productive in some cases?&lt;/p&gt;

&lt;p&gt;In the past, I have onboarded quite a number of applications into Kubernetes that were already built with 12 factor in mind. That process usually was fairly smooth, so you start to take things for granted. Until you bump into applications that are tough to operate, that is.&lt;/p&gt;

&lt;p&gt;Upon closer inspection, such applications are usually found to violate some of the 12 factor principles.&lt;/p&gt;

&lt;p&gt;The 12 factor methodology &lt;a href="https://github.com/heroku/12factor/commit/2b06e7deabb64bb759f9fc6f4d9b6fcc546921bb" rel="noopener noreferrer"&gt;was initiated almost 13 years ago&lt;/a&gt; at Heroku, a company that was 'cloud native', focused on developer experience and ease of operation. So, it's no surprise it still &lt;em&gt;is&lt;/em&gt; relevant.&lt;/p&gt;

&lt;p&gt;So, let's glance over the 12 factors, and put them in the context of modern cloud native applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 12 factors
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Codebase
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;One &lt;a href="https://12factor.net/codebase" rel="noopener noreferrer"&gt;codebase&lt;/a&gt; tracked in revision control, many deploys&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Looking at the image, these days we would add an artifact between codebase and deploys. The artifact being a container, or perhaps a zip file (serverless).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;code         -&amp;gt; artifact       -&amp;gt; deploy
- versioned     - container       - prod
                - zip             - staging
                                  - local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's worth noting that for local development, depending on the setup, some form of live-reload usually comes in place of creating an actual artifact.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Dependencies
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Explicitly declare and isolate &lt;a href="https://12factor.net/dependencies" rel="noopener noreferrer"&gt;dependencies&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is something that has become more natural in containerized applications. &lt;/p&gt;

&lt;p&gt;One part of the description is a bit dated though: "Twelve-factor apps also do not rely on the implicit existence of any system tools. Examples include shelling out to ImageMagick or curl."&lt;/p&gt;

&lt;p&gt;In containerized applications, the boundary &lt;em&gt;is&lt;/em&gt; the container, and its contents are well-defined. So an application shelling out to &lt;code&gt;curl&lt;/code&gt; is not a problem, since &lt;code&gt;curl&lt;/code&gt; now comes with the artifact, instead of it being assumed to exist.&lt;/p&gt;

&lt;p&gt;Similarly, in serverless setups like AWS Lambda, the execution environment is so well-defined that any dependency it provides, can be safely used.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Config
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Store &lt;a href="https://12factor.net/config" rel="noopener noreferrer"&gt;config&lt;/a&gt; in the environment&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This point is perhaps overly specific about the exact solution. The main takeaways are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuration not in application code&lt;/li&gt;
&lt;li&gt;Artifact + configuration = deployment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Confusingly, and especially with the rise of GitOps, the configuration &lt;em&gt;is&lt;/em&gt; in a codebase, but detached from the application code.&lt;/p&gt;

&lt;p&gt;As long as the above concept is followed, using environment variables or config files is mostly an implementation detail.&lt;/p&gt;
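
&lt;p&gt;The core idea, sketched in shell (hostnames are made up):&lt;br&gt;
&lt;/p&gt;

```shell
# One 'artifact' (here: a function), two deployments that differ only in
# configuration passed via the environment
start_app() {
    # the application reads all of its configuration from the environment
    echo "connecting to ${DB_HOST}:${DB_PORT:-5432}"
}

DB_HOST=db.staging.internal start_app   # prints: connecting to db.staging.internal:5432
DB_HOST=db.prod.internal start_app      # prints: connecting to db.prod.internal:5432
```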

&lt;p&gt;Using Kubernetes, depending on security requirements, there might be considerations to use files instead of environment variables, optionally combined with envelope encryption. On this topic, I can recommend:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;KubeCon EU 2023: &lt;a href="https://kccnceu2023.sched.com/event/1HyVr/a-confidential-story-of-well-kept-secrets-lukonde-mwila-aws" rel="noopener noreferrer"&gt;A Confidential Story of Well-Kept Secrets - Lukonde Mwila, AWS&lt;/a&gt; &lt;a href="https://youtu.be/-I1JjJxy-rU?t=302" rel="noopener noreferrer"&gt;(video)&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  4. Backing services
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Treat &lt;a href="https://12factor.net/backing-services" rel="noopener noreferrer"&gt;backing services&lt;/a&gt; as attached resources&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This has become common practice. In Kubernetes, it's usually easy to configure either a local single-pod (non-prod) Redis or Postgres, or a remote cloud-managed variant like RDS or Elasticache.&lt;/p&gt;

&lt;p&gt;There can be reasons to use local file system or memory, for example performance, or simplicity. This is fine, as long as the data is completely ephemeral, and the implementation doesn't negatively affect any of the other factors.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Build, release, run
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Strictly &lt;a href="https://12factor.net/build-release-run" rel="noopener noreferrer"&gt;separate&lt;/a&gt; build and run stages&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;From Kubernetes to AWS Lambda: It will be hard these days to violate this principle. Enhancing the aforementioned summary:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build   -&amp;gt; artifact
Release -&amp;gt; configuration +
--------------------------
Run     -&amp;gt; deployment
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  6. Processes
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Execute the app as one or more &lt;a href="https://12factor.net/processes" rel="noopener noreferrer"&gt;stateless processes&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the full text, there is a line that better summarizes the point:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Twelve-factor processes are stateless and share-nothing&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Some takeaways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One container, one process, one service.&lt;/li&gt;
&lt;li&gt;No sticky-sessions. Store sessions externally, e.g. in Redis. See also factor 4.&lt;/li&gt;
&lt;li&gt;Simplify the process by considering &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/init-containers/" rel="noopener noreferrer"&gt;init containers&lt;/a&gt; or &lt;a href="https://helm.sh/docs/topics/charts_hooks/" rel="noopener noreferrer"&gt;Helm chart hooks&lt;/a&gt;. See also factor 12.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Somewhat overlapping with factor 4, this factor implies using external services where possible. For example: Use external Redis instead of embedded Infinispan.&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Port binding
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://12factor.net/port-binding" rel="noopener noreferrer"&gt;Export services&lt;/a&gt; via port binding&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This holds up for TCP-based applications. But it is no longer applicable for event-driven systems such as AWS Lambda or WASM on Kubernetes using &lt;a href="https://www.spinkube.dev/" rel="noopener noreferrer"&gt;SpinKube&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  8. Concurrency
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;a href="https://12factor.net/concurrency" rel="noopener noreferrer"&gt;Scale out&lt;/a&gt; via the process model&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Make your application horizontally scalable. This is somewhat related to factor 4, which results in share-nothing application processes.&lt;/p&gt;

&lt;p&gt;Furthermore, the application should leave process management to the operating system or orchestrator.&lt;/p&gt;

&lt;h3&gt;
  
  
  9. Disposability
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Maximize robustness with &lt;a href="https://12factor.net/disposability" rel="noopener noreferrer"&gt;fast startup and graceful shutdown&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In a way this can be seen as complementing the previous factor: Just as it should be easy to horizontally scale out, it should be easy to remove or replace processes.&lt;/p&gt;

&lt;p&gt;Specific to Kubernetes, this boils down to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Obey &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination" rel="noopener noreferrer"&gt;termination signals&lt;/a&gt;. The application should gracefully shut down. Either handle the &lt;code&gt;SIGTERM&lt;/code&gt; signal in the application, or set up a &lt;a href="https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks" rel="noopener noreferrer"&gt;PreStop&lt;/a&gt; hook (&lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace" rel="noopener noreferrer"&gt;more info&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Set up probes. Probes should only return &lt;code&gt;OK&lt;/code&gt; when the application is actually ready to receive traffic.&lt;/li&gt;
&lt;li&gt;Set up &lt;code&gt;maxSurge&lt;/code&gt; (&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment" rel="noopener noreferrer"&gt;rolling updates&lt;/a&gt;) and a &lt;code&gt;PodDisruptionBudget&lt;/code&gt; (&lt;a href="https://kubernetes.io/docs/tasks/run-application/configure-pdb/" rel="noopener noreferrer"&gt;scheduling&lt;/a&gt;).&lt;/li&gt;
&lt;li&gt;Nodes are cattle, so it always should be possible to reschedule pods: The share-nothing concept.&lt;/li&gt;
&lt;/ul&gt;
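
&lt;p&gt;The first point can be sketched with a plain POSIX shell trap; the same pattern applies to signal handlers in the application's own language:&lt;br&gt;
&lt;/p&gt;

```shell
# Graceful shutdown in miniature: trap SIGTERM, drain, exit 0
sh -c 'trap "echo draining; exit 0" TERM; while true; do sleep 1; done' &
app_pid=$!
sleep 1                    # let the loop start
kill -TERM "$app_pid"      # what the orchestrator sends on pod termination
wait "$app_pid"
status=$?
echo "exit status: $status"
```

An application that ignores &lt;code&gt;SIGTERM&lt;/code&gt; would instead be force-killed after the grace period, dropping in-flight requests.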

&lt;h3&gt;
  
  
  10. Dev/prod parity
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Keep development, staging, and production &lt;a href="https://12factor.net/dev-prod-parity" rel="noopener noreferrer"&gt;as similar as possible&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is a broad topic and as relevant as ever. At a high level it boils down to 'Shift left': Validate changes as reliably and quickly as possible.&lt;/p&gt;

&lt;p&gt;Solutions are many, and could include &lt;a href="https://docs.docker.com/compose/" rel="noopener noreferrer"&gt;Docker Compose&lt;/a&gt;, &lt;a href="https://code.visualstudio.com/docs/devcontainers/containers" rel="noopener noreferrer"&gt;VS Code dev containers&lt;/a&gt;, &lt;a href="https://www.telepresence.io/" rel="noopener noreferrer"&gt;Telepresence&lt;/a&gt;, &lt;a href="https://www.localstack.cloud/" rel="noopener noreferrer"&gt;Localstack&lt;/a&gt; or setting up temporary AWS accounts as a development environment for serverless applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  11. Logs
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Treat logs as &lt;a href="https://12factor.net/logs" rel="noopener noreferrer"&gt;event streams&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Don't store logs in files. Don't 'ship' logs in the application.&lt;/p&gt;

&lt;p&gt;The operating system or orchestrator should capture the output stream and route it to the logging storage of choice.&lt;/p&gt;

&lt;p&gt;Where the 12 factor methodology shows its age a bit, is that there is no mention of metrics and traces, which together with logs are often referred to as "the three pillars of observability".&lt;/p&gt;

&lt;p&gt;Extrapolating the approach to logging, consider systems that 'wrap' an application instead of requiring a detailed implementation. &lt;a href="https://opentelemetry.io/docs/concepts/instrumentation/zero-code/" rel="noopener noreferrer"&gt;OpenTelemetry zero-code instrumentation&lt;/a&gt; could be a good starting point. APM agents of observability SaaS platforms such as New Relic or Datadog can be applied similarly.&lt;/p&gt;

&lt;h3&gt;
  
  
  12. Admin processes
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;Run admin/management tasks as one-off processes&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This fragment in the full description might summarize it better: "Admin code should ship with the application code".&lt;/p&gt;

&lt;p&gt;This is about tasks like changing database schema, or uploading asset bundles to a centralized storage location. &lt;/p&gt;

&lt;p&gt;The goal is to rule out any synchronization issues. Keywords are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Identical environment&lt;/li&gt;
&lt;li&gt;Same codebase&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summarizing the 12 factors
&lt;/h2&gt;

&lt;p&gt;As long as we try to grasp the idea behind the factors instead of following every detail, I would say most of the factors hold up quite well.&lt;/p&gt;

&lt;p&gt;Some recommendations have become more or less common practice over the years. Some other recommendations have a bit of overlap. For example: Externalizing state (factor 4) makes concurrency (factor 8) and disposability (factor 9) easier to accomplish.&lt;/p&gt;

&lt;h2&gt;
  
  
  Factor 13: Forward and backward compatibility
&lt;/h2&gt;

&lt;p&gt;There is a point not addressed in the 12 factor methodology that in my experience has always made an application easier to operate: Backward and forward compatibility.&lt;/p&gt;

&lt;p&gt;These days we expect application deployments to be frequent and without any downtime. That implies either rolling updates or blue/green deployments. Even blue/green deployments, in large distributed platforms, are hardly ever truly atomic. And deployment patterns like canary deployments, imply being able to roll back.&lt;/p&gt;

&lt;p&gt;So, getting this right opens up the path to frequent, friction-less deploys.&lt;/p&gt;

&lt;p&gt;This is about databases, cached data and API contracts. We need to consider:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How does our application handle data while version &lt;code&gt;N&lt;/code&gt; and &lt;code&gt;N+1&lt;/code&gt; are running simultaneously?&lt;/li&gt;
&lt;li&gt;What happens if we need to roll back from &lt;code&gt;N+1&lt;/code&gt; to &lt;code&gt;N&lt;/code&gt;?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some pointers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When changing the database schema, first &lt;em&gt;add&lt;/em&gt; columns. Only remove the columns in a subsequent release once the data has been migrated.&lt;/li&gt;
&lt;li&gt;First add a field to an API or event schema, only then update consumers to actually expect the new field.&lt;/li&gt;
&lt;li&gt;Consider compatibility of cached objects. Prefixing cache-keys with something unique to the application version can help here.&lt;/li&gt;
&lt;/ul&gt;
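
&lt;p&gt;The cache-key suggestion, sketched in shell (the version string is made up):&lt;br&gt;
&lt;/p&gt;

```shell
# Scope cache keys to the application version, so version N and N+1
# never deserialize each other's cached objects
APP_VERSION="2024.04.1"    # e.g. injected at build/release time
cache_key() { printf '%s:%s\n' "$APP_VERSION" "$1"; }

cache_key "user:123:profile"   # prints: 2024.04.1:user:123:profile
```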

&lt;p&gt;What will happen with data in the transition period? Store the data in old &lt;em&gt;and&lt;/em&gt; new format? Do we need to store version information with the data and support multiple versions?&lt;/p&gt;

&lt;p&gt;This can be complicated for applications provided for others to operate, unlike applications operated and released via CI/CD by the developing team itself. External users often don't follow every minor release, making it more likely that backward compatibility breaks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Some of the above recommendations might take additional effort. However, in my experience that is worth it, and will be paid back (with interest) by ease of operations, peace of mind and a reduced need for coordinating releases. &lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;&lt;a href="https://en.wikipedia.org/wiki/XS4ALL" rel="noopener noreferrer"&gt;XS4All&lt;/a&gt; had a great culture, showing its roots: Passionate, knowledgeable and vocal. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>cloudnative</category>
      <category>kubernetes</category>
      <category>operations</category>
      <category>development</category>
    </item>
    <item>
      <title>EKS and the quest for IP addresses: Secondary CIDR ranges and private NAT gateways</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Thu, 10 Feb 2022 11:59:19 +0000</pubDate>
      <link>https://dev.to/tbeijen/eks-and-the-quest-for-ip-addresses-secondary-cidr-ranges-and-private-nat-gateways-109o</link>
      <guid>https://dev.to/tbeijen/eks-and-the-quest-for-ip-addresses-secondary-cidr-ranges-and-private-nat-gateways-109o</guid>
      <description>&lt;h2&gt;
  
  
  EKS and its hunger for IP addresses
&lt;/h2&gt;

&lt;p&gt;Kubernetes allows running highly diverse workloads with similar effort. From a user perspective there's little difference between running 2 pods on a node, each consuming 2 vCPU, and running tens of pods each consuming 0.05 vCPU. Looking at the network however, there is a big difference: Each pod needs to have a unique IP address. In most Kubernetes implementations there is a CNI plugin that allocates each pod an IP address in an IP space that is &lt;em&gt;internal&lt;/em&gt; to the cluster. &lt;/p&gt;

&lt;p&gt;EKS, the managed Kubernetes offering by AWS, &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/pod-networking.html" rel="noopener noreferrer"&gt;by default&lt;/a&gt; uses the &lt;a href="https://github.com/aws/amazon-vpc-cni-k8s" rel="noopener noreferrer"&gt;Amazon VPC CNI plugin for Kubernetes&lt;/a&gt;. Different to most networking implementations, this assigns each pod a dedicated IP address in the VPC, the network the nodes reside in.&lt;/p&gt;

&lt;p&gt;What the VPC CNI plugin does &lt;sup id="fnref1"&gt;1&lt;/sup&gt; boils down to this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It keeps a number of network interfaces (ENIs) and IP addresses 'warm' on each node, to be able to quickly assign IP addresses to new pods.&lt;/li&gt;
&lt;li&gt;By default it keeps an entire spare ENI warm.&lt;/li&gt;
&lt;li&gt;This means that any node effectively claims &lt;code&gt;2 ENIs * ips-per-ENI&lt;/code&gt;, since there will always be at least one daemonset claiming an IP address of the first ENI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now if we look at the &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#AvailableIpPerENI" rel="noopener noreferrer"&gt;list of available IP addresses per ENI&lt;/a&gt; and calculate an example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;EC2 type &lt;code&gt;m5.xlarge&lt;/code&gt;, 15 IP addresses per ENI. 30 IP addresses at minimum per node.&lt;/li&gt;
&lt;li&gt;Say, we have 50 nodes running. That's 1500 private addresses taken. (For perspective: That's ~$7000/month worth of on-demand EC2 compute).&lt;/li&gt;
&lt;li&gt;Say, we have a &lt;code&gt;/21&lt;/code&gt; VPC, providing 3 &lt;code&gt;/23&lt;/code&gt; private subnets. That's &lt;code&gt;3 x 512 = 1536&lt;/code&gt; available IP addresses.&lt;/li&gt;
&lt;li&gt;Managed services also need IP addresses...&lt;/li&gt;
&lt;/ul&gt;
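
&lt;p&gt;Spelled out as quick arithmetic:&lt;br&gt;
&lt;/p&gt;

```shell
# Back-of-the-envelope IP math for the example above
ips_per_eni=15                   # m5.xlarge
per_node=$((2 * ips_per_eni))    # warm spare ENI included: 30 IPs per node
nodes=50
claimed=$((per_node * nodes))
available=$((3 * 512))           # three /23 private subnets
echo "claimed=${claimed} available=${available}"   # claimed=1500 available=1536
```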

&lt;p&gt;We can see where this is going. So, creating &lt;code&gt;/16&lt;/code&gt; VPCs it is then? Probably not.&lt;/p&gt;

&lt;h2&gt;
  
  
  Multiple VPCs
&lt;/h2&gt;

&lt;p&gt;In a lot of organizations there is not just one VPC. The networking landscape might be a combination of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multiple AWS accounts and VPCs in one or more regions&lt;/li&gt;
&lt;li&gt;Data centers&lt;/li&gt;
&lt;li&gt;Office networks&lt;/li&gt;
&lt;li&gt;Peered services, like DBaaS from providers other than AWS&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/aws-vpc-connectivity-options/welcome.html" rel="noopener noreferrer"&gt;many ways&lt;/a&gt; to connect VPCs and other networks. The larger the CIDR range is that needs to be routable from outside the VPC, the more likely it becomes that there is overlap.&lt;/p&gt;

&lt;p&gt;As a result, in larger organizations, individual AWS accounts are typically provided a VPC with a relatively small CIDR range that fits in the larger networking plan. To still have 'lots of IPs', AWS VPCs &lt;a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Subnets.html#VPC_Sizing" rel="noopener noreferrer"&gt;can be configured&lt;/a&gt; with secondary CIDR ranges.&lt;/p&gt;
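
&lt;p&gt;For example, associating a secondary range from the CG-NAT space (&lt;code&gt;100.64.0.0/10&lt;/code&gt;); the VPC id below is a placeholder:&lt;br&gt;
&lt;/p&gt;

```
# Attach a secondary CIDR range to an existing VPC (placeholder vpc-id)
aws ec2 associate-vpc-cidr-block \
    --vpc-id vpc-0123456789abcdef0 \
    --cidr-block 100.64.0.0/16
```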

&lt;p&gt;This solves the IP space problem, but does not by itself solve the routing problem. The secondary CIDR range would still need to be unique in the total networking landscape to be routable from outside the VPC. This might not be an actual problem if workloads in the secondary CIDR &lt;em&gt;only&lt;/em&gt; need to connect to resources within the VPC, but that very often is not the case.&lt;/p&gt;

&lt;p&gt;Quite recently AWS introduced &lt;a href="https://aws.amazon.com/blogs/networking-and-content-delivery/how-to-solve-private-ip-exhaustion-with-private-nat-solution/" rel="noopener noreferrer"&gt;Private NAT gateways&lt;/a&gt; which, together with custom networking, are options to facilitate routable EKS pods in secondary CIDR ranges.&lt;/p&gt;

&lt;h2&gt;
  
  
  VPC setups
&lt;/h2&gt;

&lt;p&gt;Let's go over some VPC setups to illustrate the problem and see how we can run EKS.&lt;/p&gt;

&lt;h3&gt;
  
  
  Basic
&lt;/h3&gt;

&lt;p&gt;A basic VPC consists of a single CIDR range, some private and public subnets, a NAT gateway and an Internet Gateway. Depending on the primary CIDR range size this might be sufficient, but in the scope of larger organizations let's assume a relatively small CIDR range.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8swh6j7x25k9a8bas8yy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8swh6j7x25k9a8bas8yy.png" alt="Basic VPC" width="580" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Simple&lt;/li&gt;
&lt;li&gt;Con: Private IP exhaustion &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secondary CIDR range
&lt;/h3&gt;

&lt;p&gt;Next step: Adding a secondary CIDR range, placing nodes and pods in the secondary subnets. This &lt;em&gt;could&lt;/em&gt; work if workloads never need to connect to resources in private networks outside the VPC, which is unlikely. Theoretically pods would be able to send packets to other VPCs but there is no route back.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2eeys4g6z7ad2t0z7q5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2eeys4g6z7ad2t0z7q5h.png" alt="Secondary CIDR range" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Simple&lt;/li&gt;
&lt;li&gt;Con: No route between pods and private resources outside the VPC&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secondary CIDR range + custom networking
&lt;/h3&gt;

&lt;p&gt;To remedy the routing problem, custom networking can be enabled in the VPC CNI plugin. This allows placing nodes and pods in different subnets: nodes go into the primary private subnets, pods go into the secondary private subnets. This solves the routing problem because, by default, for traffic to external networks the CNI plugin translates the pod's IP address to the primary IP address of the node (SNAT). In this setup those nodes are in routable subnets.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ytxbgjicpgoya6wy0r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F15ytxbgjicpgoya6wy0r.png" alt="Secondary CIDR range + Custom networking" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Setting up secondary CIDR ranges and custom networking is described in the &lt;a href="https://aws.amazon.com/premiumsupport/knowledge-center/eks-multiple-cidr-ranges/" rel="noopener noreferrer"&gt;AWS knowledge center&lt;/a&gt; and also in the &lt;a href="https://www.eksworkshop.com/beginner/160_advanced-networking/secondary_cidr/" rel="noopener noreferrer"&gt;Amazon EKS Workshop&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Be aware that Source Network Address Translation &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html" rel="noopener noreferrer"&gt;is disabled when using security groups for pods&lt;/a&gt;&lt;sup id="fnref2"&gt;2&lt;/sup&gt;: &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Source NAT is disabled for outbound traffic from pods with assigned security groups so that outbound security group rules are applied. To access the internet, pods with assigned security groups must be launched on nodes that are deployed in a private subnet configured with a NAT gateway or instance. Pods with assigned security groups deployed to public subnets are not able to access the internet.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;Pro: No additional NAT gateway needed&lt;/li&gt;
&lt;li&gt;Con: Complex VPC CNI network configuration&lt;/li&gt;
&lt;li&gt;Con: Not compatible with security groups for pods&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Secondary CIDR range + private NAT gateway
&lt;/h3&gt;

&lt;p&gt;Instead of configuring custom networking, it is also possible to solve the routing problem by using a private NAT gateway. Unlike a public NAT gateway, it is placed in a private subnet and is not linked to an internet gateway.&lt;/p&gt;

&lt;p&gt;This way nodes &lt;em&gt;and&lt;/em&gt; pods can run in the secondary CIDR range, and the routing problem is solved outside of EKS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqwbkc1bcftjjqm2tnj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzqwbkc1bcftjjqm2tnj5.png" alt="Secondary CIDR range + Private NAT gateway" width="800" height="351"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pro: Straightforward default VPC CNI network configuration&lt;/li&gt;
&lt;li&gt;Pro: Can be used with security groups for pods&lt;/li&gt;
&lt;li&gt;Con: NAT gateway incurs cost&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Routing and controlling cost
&lt;/h2&gt;

&lt;h3&gt;
  
  
  One NAT gateway is enough
&lt;/h3&gt;

&lt;p&gt;Let's take a look at the most basic route table one can set up for the secondary private subnet:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10.150.40.0/21  local   
100.64.0.0/16   local   
0.0.0.0/0       nat-&amp;lt;private-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This puts any non-VPC traffic into the primary private subnet, and lets the route table that is configured there do the rest. Simple, but there is a catch&lt;sup id="fnref3"&gt;3&lt;/sup&gt; which we can observe when testing internet connectivity from a node.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[ec2-user@ip-100-64-43-196 ~]$ ping www.google.com
PING www.google.com (74.125.193.147) 56(84) bytes of data.
64 bytes from ig-in-f147.1e100.net (74.125.193.147): icmp_seq=1 ttl=49 time=2.31 ms
^C

[ec2-user@ip-100-64-43-196 ~]$ tracepath -p 443 74.125.193.147
 1?: [LOCALHOST]                                         pmtu 9001
 1:  ip-10-150-42-36.eu-west-1.compute.internal            0.168ms
 1:  ip-10-150-42-36.eu-west-1.compute.internal            1.016ms
 2:  ip-10-150-40-116.eu-west-1.compute.internal           0.739ms
 3:  ip-10-150-40-1.eu-west-1.compute.internal             1.510ms pmtu 1500
 3:  no reply
^C
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looking at the trace, and at the NAT gateways that exist, we can see that traffic passes the private &lt;em&gt;and&lt;/em&gt; the public NAT gateway.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbpxewxzgm77td75lusr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcbpxewxzgm77td75lusr.png" alt="NAT gateways that exist in the VPC" width="800" height="50"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A careful observer might have noticed the green line in the traffic diagram bypassing the private NAT gateway. To accomplish this one needs to adjust the routing table by &lt;em&gt;only&lt;/em&gt; directing private network traffic to the private NAT gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;10.150.40.0/21  local   
100.64.0.0/16   local   
10.0.0.0/8      nat-&amp;lt;private-id&amp;gt;
0.0.0.0/0       nat-&amp;lt;public-id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Halving the amount of traffic passing through NAT gateways halves the data processing cost (ignoring the fixed hourly fee of a NAT gateway).&lt;/p&gt;
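&lt;p&gt;The difference between the two route tables comes down to longest-prefix matching. A minimal sketch of that selection (the gateway names are placeholders, and a real VPC route table does considerably more than this):&lt;/p&gt;

```python
import ipaddress

def select_route(routes, destination):
    """Pick the most specific (longest-prefix) matching route, as a route table does."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, target) for net, target in routes if dest in net]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

net = ipaddress.ip_network

# First attempt: everything non-local leaves via the private NAT gateway
naive = [
    (net("10.150.40.0/21"), "local"),
    (net("100.64.0.0/16"), "local"),
    (net("0.0.0.0/0"), "nat-private"),
]
# Internet-bound traffic needlessly passes the private NAT gateway first
assert select_route(naive, "74.125.193.147") == "nat-private"

# Fix: only private network traffic goes to the private NAT gateway
fixed = [
    (net("10.150.40.0/21"), "local"),
    (net("100.64.0.0/16"), "local"),
    (net("10.0.0.0/8"), "nat-private"),
    (net("0.0.0.0/0"), "nat-public"),
]
assert select_route(fixed, "74.125.193.147") == "nat-public"  # straight to public NAT
assert select_route(fixed, "10.20.30.40") == "nat-private"    # other private networks
assert select_route(fixed, "10.150.42.36") == "local"         # in-VPC stays local
```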

&lt;h3&gt;
  
  
  VPC endpoints and peering connections
&lt;/h3&gt;

&lt;p&gt;The above illustrates that it is important to replicate route table entries for VPC endpoints and peering connections that exist in the primary private subnets, to avoid traffic unnecessarily passing through the private NAT gateway. It will (probably) work, but it brings unneeded cost.&lt;/p&gt;

&lt;p&gt;A reminder: since the planets that are DNS, routing and security groups need to align, be sure to grant the secondary CIDR range access to any VPC endpoint of the type 'Interface' that exists in the VPC. Not doing so will have DNS return a VPC-local IP address, for which traffic will &lt;em&gt;not&lt;/em&gt; go through the private NAT gateway and will hence be blocked by the security group on the VPC endpoint.&lt;/p&gt;

&lt;h2&gt;
  
  
  Concluding
&lt;/h2&gt;

&lt;p&gt;Private NAT gateways can be an alternative to custom networking when running EKS pods in secondary CIDR ranges. As always, there are trade-offs that need to be considered, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amount of network traffic going over the Transit Gateway, and thereby the private NAT gateway&lt;/li&gt;
&lt;li&gt;Ability to use security groups for pods&lt;/li&gt;
&lt;li&gt;Complexity of set-up&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The above should give some insight into the world of EKS networking and hopefully provides pointers on what to investigate more deeply and which pitfalls to avoid. As always, feel free to &lt;a href="https://twitter.com/TBeijen" rel="noopener noreferrer"&gt;reach out on Twitter&lt;/a&gt; to discuss!&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;This is described in great detail in this blog post: &lt;a href="https://betterprogramming.pub/amazon-eks-is-eating-my-ips-e18ea057e045" rel="noopener noreferrer"&gt;https://betterprogramming.pub/amazon-eks-is-eating-my-ips-e18ea057e045&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;Disclaimer: We haven't yet enabled security groups for pods so this is theoretical. However, following the described logic of 'No NAT = no route to the internet', we can assume similar restrictions to apply to external private networks. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn3"&gt;
&lt;p&gt;Using more NAT gateways than needed can be a &lt;a href="https://twitter.com/QuinnyPig/status/1433949394915639300" rel="noopener noreferrer"&gt;serious waste of money&lt;/a&gt; and be subject to snark. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>eks</category>
      <category>vpc</category>
    </item>
    <item>
      <title>Terraform: Good plan = good apply</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Sat, 15 Jan 2022 08:04:01 +0000</pubDate>
      <link>https://dev.to/tbeijen/terraform-good-plan-good-apply-2e22</link>
      <guid>https://dev.to/tbeijen/terraform-good-plan-good-apply-2e22</guid>
      <description>&lt;p&gt;Recently I worked on some infrastructure changes that resulted in &lt;code&gt;terraform plan&lt;/code&gt; showing more, and more impactful, changes than expected. Diving deeper, it appeared that a lot of the planned changes could be avoided by some preparations, resulting in a &lt;code&gt;terraform apply&lt;/code&gt; with no impact at all.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ordered lists and state manipulation
&lt;/h2&gt;

&lt;p&gt;First of all, Terraform has for quite some time supported &lt;code&gt;for_each&lt;/code&gt;, which &lt;a href="https://www.terraform.io/language/meta-arguments/count#when-to-use-for_each-instead-of-count" rel="noopener noreferrer"&gt;is a more robust way to create multiple resources&lt;/a&gt;. That said, there can be various reasons why resources have been created by iterating over a list. The most obvious one being code that originates from before &lt;code&gt;for_each&lt;/code&gt; was common, where converting from &lt;code&gt;count&lt;/code&gt; to &lt;code&gt;for_each&lt;/code&gt; would require module users to do complex migrations. (The &lt;a href="https://learn.hashicorp.com/tutorials/terraform/move-config" rel="noopener noreferrer"&gt;new 'moved' block&lt;/a&gt; will certainly help in these scenarios, although it requires users to upgrade to Terraform v1.1 first.)&lt;/p&gt;

&lt;p&gt;In this particular case we added a secondary cidr to a VPC 'A' that was peered to another VPC 'B'. The VPC peering connection was managed by CloudPosse's &lt;a href="https://github.com/cloudposse/terraform-aws-vpc-peering" rel="noopener noreferrer"&gt;terraform-aws-vpc-peering&lt;/a&gt; module.&lt;/p&gt;

&lt;p&gt;In the vocabulary of the module, VPC A is the accepter, VPC B is the requester. Before the CIDR addition the peering module managed 5 route tables in VPC B: 1x VPC default, 1x public subnets, and 3x private subnets (one per availability zone). In each of those route tables the module manages a route that sends traffic for the VPC A CIDR over the VPC peering connection, which is also managed by the module.&lt;/p&gt;

&lt;p&gt;Now with the introduction of a second cidr in VPC A, the module adds an additional route to each of the 5 route tables, resulting in a desired state like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgodm8j5zu26sfcrb0m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fymgodm8j5zu26sfcrb0m.png" alt="Route table example showing routes to VPC peering" width="800" height="108"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It &lt;a href="https://github.com/cloudposse/terraform-aws-vpc-peering/blob/master/main.tf#L52" rel="noopener noreferrer"&gt;does so&lt;/a&gt; by looping over a combination of the accepter VPC &lt;code&gt;cidr_block_associations&lt;/code&gt; attributes and requester VPC route tables. This results in a &lt;code&gt;terraform plan&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# module.vpc_peering.aws_route.requestor[1] must be replaced
-/+ resource "aws_route" "requestor" {
      ~ destination_cidr_block     = "10.123.32.0/21" -&amp;gt; "100.64.0.0/16" # forces replacement
      # truncated for readability
      ~ route_table_id             = "rtb-some-id" -&amp;gt; "rtb-other-id" # forces replacement
      ~ state                      = "active" -&amp;gt; (known after apply)
        vpc_peering_connection_id  = "pcx-peering-to-vpc-a"
    }

  # module.vpc_peering.aws_route.requestor[2] must be replaced
-/+ resource "aws_route" "requestor" {
        destination_cidr_block     = "10.123.32.0/21"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[3] must be replaced
-/+ resource "aws_route" "requestor" {
      ~ destination_cidr_block     = "10.123.32.0/21" -&amp;gt; "100.64.0.0/16" # forces replacement
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[4] must be replaced
-/+ resource "aws_route" "requestor" {
        destination_cidr_block     = "10.123.32.0/21"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[5] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[6] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "10.123.32.0/21"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[7] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[8] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "10.123.32.0/21"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[9] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I tried applying a change like this to a non-critical development environment and it applies quite fast. However, for production, 'quite fast' is not good enough. Furthermore, if for some odd reason it partially fails, you and your visitors are having a very bad day.&lt;/p&gt;

&lt;p&gt;Looking closer, and helped by &lt;a href="https://github.com/cloudposse/terraform-aws-vpc-peering/blob/master/main.tf#L52" rel="noopener noreferrer"&gt;looking at the module source&lt;/a&gt;, one can distinguish an alternating pattern in the destination cidr block: Primary, secondary, primary, secondary, repeat.&lt;/p&gt;
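&lt;p&gt;A toy model of that loop (not the module's actual code; route table names are made up) shows why the indices shift, and recovers the old-to-new index mapping needed for the state moves:&lt;/p&gt;

```python
from itertools import product

# 5 requester route tables, as in the situation described above
route_tables = ["rtb-default", "rtb-public", "rtb-priv-a", "rtb-priv-b", "rtb-priv-c"]
primary, secondary = "10.123.32.0/21", "100.64.0.0/16"

# Before: one accepter cidr per route table; after: the flattened product
before = list(product(route_tables, [primary]))            # indices 0..4
after = list(product(route_tables, [primary, secondary]))  # primary, secondary, repeat

# Each existing (route table, cidr) pair keeps its identity but moves to a new index
moves = {i: after.index(pair) for i, pair in enumerate(before)}
assert moves == {0: 0, 1: 2, 2: 4, 3: 6, 4: 8}
```

&lt;p&gt;Note that such moves have to be applied highest index first, so that a destination address is always free before a resource is moved into it.&lt;/p&gt;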

&lt;p&gt;This is validated by comparing some of the planned &lt;em&gt;additions&lt;/em&gt; to what's currently at different locations in state, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Same destination cidr and route table as planned addition [6]
terraform state show module.vpc_peering.aws_route.requestor[3]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Long story short, moving the existing resources in Terraform state to where the module expects them to be in this new situation results in a much more straightforward plan.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform state module.vpc_peering.aws_route.requestor[4] module.vpc_peering.aws_route.requestor[8]
terraform state module.vpc_peering.aws_route.requestor[3] module.vpc_peering.aws_route.requestor[6]
terraform state module.vpc_peering.aws_route.requestor[1] module.vpc_peering.aws_route.requestor[2]
terraform state module.vpc_peering.aws_route.requestor[2] module.vpc_peering.aws_route.requestor[4]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The resulting plan:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  # module.vpc_peering.aws_route.requestor[1] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[3] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[5] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[7] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }

  # module.vpc_peering.aws_route.requestor[9] will be created
  + resource "aws_route" "requestor" {
      + destination_cidr_block     = "100.64.0.0/16"
      # truncated for readability
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Much better and can be applied with zero impact. Another case where a simple change would result in an unexpected amount of &lt;code&gt;terraform plan&lt;/code&gt; output was the following:&lt;/p&gt;

&lt;h2&gt;
  
  
  Using --target to prevent computed values side effects
&lt;/h2&gt;

&lt;p&gt;We manage some EKS clusters, having managed node groups, using the &lt;a href="https://github.com/terraform-aws-modules/terraform-aws-eks" rel="noopener noreferrer"&gt;terraform-aws-eks&lt;/a&gt; module.&lt;/p&gt;

&lt;p&gt;Changing a property of a cluster, the &lt;code&gt;public_access_cidrs&lt;/code&gt;, resulted in quite some planned changes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The cidr addition, as expected.&lt;/li&gt;
&lt;li&gt;Apparently the mere fact that the EKS cluster itself is changed, causes a computed value change that introduces a new launch template version.&lt;/li&gt;
&lt;li&gt;The new launch template version causes a managed node group update, causing EKS to replace all nodes. Not a problem per se, since workloads should be able to handle this, but it takes a considerable amount of time.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# All resources truncated for readability

  # module.eks_cluster_a.module.eks_cluster.aws_eks_cluster.this[0] will be updated in-place
  ~ resource "aws_eks_cluster" "this" {
      ~ vpc_config {
          ~ public_access_cidrs       = [
              # Yes, we fully trust Cloudflare DNS to not hack into our cluster
              + "1.1.1.1/32",
            ]
        }
    }

  # module.eks_cluster_a.module.eks_cluster.module.node_groups.aws_eks_node_group.workers["ng_a"] will be updated in-place
  ~ resource "aws_eks_node_group" "workers" {
      ~ launch_template {
            id      = "lt-someid"
            name    = "cluster_a-ng_a20211222060030102500000001"
          ~ version = "5" -&amp;gt; (known after apply)
        }
    }

  # module.eks_cluster_a.module.eks_cluster.module.node_groups.aws_launch_template.workers["ng_a"] will be updated in-place
  ~ resource "aws_launch_template" "workers" {
      ~ default_version         = 5 -&amp;gt; (known after apply)
      ~ latest_version          = 5 -&amp;gt; (known after apply)
      ~ user_data               = "base64-gibberish" -&amp;gt; (known after apply)
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This seemed a bit over-the-top for a cidr addition. And indeed it can be avoided. &lt;/p&gt;
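&lt;p&gt;One way to picture the chain: any value that is '(known after apply)' makes every value derived from it unknown as well. A toy model of that propagation (an illustration, not how Terraform is actually implemented):&lt;/p&gt;

```python
UNKNOWN = "(known after apply)"

def computed(*inputs):
    """A derived value is only known at plan time if all of its inputs are."""
    return UNKNOWN if UNKNOWN in inputs else "known"

# The cluster is updated in-place, so some attributes are unknown during plan
cluster_attribute = UNKNOWN                # e.g. data rendered into user_data
user_data = computed(cluster_attribute)    # launch template user_data
lt_version = computed(user_data)           # new launch template version
node_group = computed(lt_version)          # node group references that version

# ...which is why a single cidr change cascades into a node group update
assert node_group == UNKNOWN
```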

&lt;p&gt;Targeting only the cluster obviously shows only a change to the cluster (and the removal of several outputs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;terraform plan --target=module.eks_cluster_a.module.eks_cluster.aws_eks_cluster.this[0]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Applying this, and then running &lt;code&gt;terraform plan&lt;/code&gt; on the entire project results in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Truncated for readability: Some read data sources that change

Plan: 0 to add, 0 to change, 0 to destroy.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once again, much better!&lt;/p&gt;

&lt;h2&gt;
  
  
  Concluding
&lt;/h2&gt;

&lt;p&gt;When confronted with impactful planned changes, there might be more options than hoping for the best, scheduling the apply at night, or sitting it out.&lt;/p&gt;

&lt;p&gt;It's safe to do a &lt;code&gt;terraform plan&lt;/code&gt;, so when suspecting a chain of dependencies, experimenting with &lt;code&gt;--target&lt;/code&gt; can help.&lt;/p&gt;

&lt;p&gt;Modifying state is more tricky. What works for me is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepare &lt;em&gt;all&lt;/em&gt; of the state mv changes in a txt file first before applying them.&lt;/li&gt;
&lt;li&gt;Make sure to have the current state backed up (e.g. Copy &lt;code&gt;terraform state show&lt;/code&gt; output to a file).&lt;/li&gt;
&lt;li&gt;Know how to revert the moves if needed.&lt;/li&gt;
&lt;li&gt;Test the pattern on a non-prod environment first.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hopefully the above helps anyone use Terraform with confidence without breaking (important) things. If you have feedback or comments, be sure to leave a comment or &lt;a href="https://twitter.com/TBeijen" rel="noopener noreferrer"&gt;reach out on Twitter&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>devops</category>
      <category>terraform</category>
      <category>iac</category>
      <category>aws</category>
    </item>
    <item>
      <title>Shifting Akamai to the left using Terraform</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Fri, 03 Dec 2021 12:52:00 +0000</pubDate>
      <link>https://dev.to/tbeijen/shifting-akamai-to-the-left-using-terraform-353m</link>
      <guid>https://dev.to/tbeijen/shifting-akamai-to-the-left-using-terraform-353m</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was originally written for &lt;a href="https://www.tibobeijen.nl/2021/12/03/shift-left-akamai-terraform/" rel="noopener noreferrer"&gt;my personal blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Recently &lt;a href="https://www.nu.nl" rel="noopener noreferrer"&gt;we&lt;/a&gt; migrated our CDNs from Cloudfront to Akamai. We use Terraform for infrastructure as code (IaC) and luckily it supports Akamai as well. Since we had Cloudfront distributions for pretty much every environment, it served as a good moment to reflect on what we've taken for granted in the past years, especially since Akamai has the concept of a 'staging network' which doesn't naturally seem to fit in a test-early, test-often approach (Spoiler alert: We don't use the staging network).&lt;/p&gt;

&lt;h2&gt;
  
  
  Shift Left testing
&lt;/h2&gt;

&lt;p&gt;"Shift left" is popular in the contemporary agile and DevOps IT-landscape, and for good reasons. This article &lt;a href="https://www.bmc.com/blogs/what-is-shift-left-shift-left-testing-explained/" rel="noopener noreferrer"&gt;by BMC&lt;/a&gt; summarizes it nicely:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Shift Left is a practice intended to find and prevent defects early in the software delivery process. The idea is to improve quality by moving tasks to the left as early in the lifecycle as possible. Shift Left testing means testing earlier in the software development process.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Shift left testing illustrated:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ucpw3r03ztbv3wywtnb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2ucpw3r03ztbv3wywtnb.png" alt="Shift Left testing (image by Launchable, Inc.)" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Quoting yet another source, &lt;a href="https://devops.com/devops-shift-left-avoid-failure/" rel="noopener noreferrer"&gt;devops.com&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Shifting left requires two key DevOps practices: continuous testing and continuous deployment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Another way to reduce the failure rate is to make all environments in the pipeline look as much like production as possible.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So, how does a CDN (any CDN, be it Cloudfront, Akamai, Fastly, you name it) fit into this shift left approach? Very well actually, as long as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The CDN isn't limited to production only but is present in every environment as early as possible in the development lifecycle. &lt;sup id="fnref1"&gt;1&lt;/sup&gt;
&lt;/li&gt;
&lt;li&gt;Setting up and updating the CDN should be no different than any other code or infra change. As &lt;a href="https://martinfowler.com/bliki/FrequencyReducesDifficulty.html" rel="noopener noreferrer"&gt;Martin Fowler says&lt;/a&gt;: "If it hurts, do it more often".&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Akamai concepts
&lt;/h2&gt;

&lt;p&gt;Compared to Cloudfront, Akamai has some advanced concepts that need to be fitted into an IaC workflow in some way.&lt;/p&gt;

&lt;h3&gt;
  
  
  Activation
&lt;/h3&gt;

&lt;p&gt;An Akamai 'property' (what in Cloudfront is called a 'distribution') has versions, of which one is active at any moment, typically the most recent one. Modifying a configuration whose latest version is active results in a new property version, which can be activated when ready. &lt;/p&gt;

&lt;h3&gt;
  
  
  The staging network
&lt;/h3&gt;

&lt;p&gt;Akamai provides two networks: Production and staging. Property versions can be activated on the staging and production networks independently. The staging network is feature-complete but doesn't provide the performance of the production network. If the production network would use &lt;code&gt;mysite.com.edgekey.net&lt;/code&gt; then the staging network would be accessible using &lt;code&gt;mysite.com.edgekey-staging.net&lt;/code&gt;. This &lt;a href="https://learn.akamai.com/en-us/webhelp/ion/web-performance-getting-started-for-http-properties/GUID-094B3C1E-0205-4104-A091-36FD4E28362D.html" rel="noopener noreferrer"&gt;can be used by modifying&lt;/a&gt; the &lt;code&gt;/etc/hosts&lt;/code&gt; file, to allow testing before activating the version on the production network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adapting to IaC
&lt;/h3&gt;

&lt;p&gt;One can observe that both of the above concepts seem to originate from a more traditional acceptance testing practice happening late in the development lifecycle. In an IaC practice they lose some of their relevance and can even cause ambiguity that can be considered undesirable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Configuration versions are already present by having configuration in source control. The active version is determined by the branching model that is used (commonly 'latest master'), combined with any automation that exists.&lt;/li&gt;
&lt;li&gt;The Akamai staging network can be used to test a property version, but it's not really a staging environment since it uses the &lt;em&gt;production&lt;/em&gt; origins. &lt;sup id="fnref2"&gt;2&lt;/sup&gt; To illustrate: one could only test the integration of an application and a CDN change &lt;em&gt;after&lt;/em&gt; deploying the application to production. This limits the scope of what can be tested using the staging network. So for a test environment, let alone multiple (feature) test environments, more than one property is needed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What we found works well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a property for all environments: test (one or multiple), staging and production.&lt;/li&gt;
&lt;li&gt;Always activate the latest version.&lt;/li&gt;
&lt;li&gt;Test on test, which is fully representative, using any automation one has, for example &lt;a href="https://www.cypress.io/" rel="noopener noreferrer"&gt;cypress&lt;/a&gt; e2e tests.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way the delivery of Akamai config changes is identical to that of application changes.&lt;/p&gt;

&lt;p&gt;Note that it still allows shit-hits-the-fan rollbacks: The first hour after activating a production property version, there's a quick fallback option. This can be activated (stop the bleeding), after which the active version defined in IaC can be aligned with the actual active version and a fix can be worked on (proper surgery). &lt;/p&gt;

&lt;h2&gt;
  
  
  Terraform
&lt;/h2&gt;

&lt;p&gt;Overall the Terraform module does a fine job in translating declarative Terraform config into Akamai API actions. There are however some things to consider:&lt;/p&gt;

&lt;h3&gt;
  
  
  Version to be activated
&lt;/h3&gt;

&lt;p&gt;An activation is a &lt;a href="https://registry.terraform.io/providers/akamai/akamai/latest/docs/resources/property_activation" rel="noopener noreferrer"&gt;separate Terraform resource&lt;/a&gt;. What happens under the hood is that if the version changes it will use Akamai's Property API (PAPI) to &lt;a href="https://developer.akamai.com/api/core_features/property_manager/v1.html#postpropertyactivations" rel="noopener noreferrer"&gt;create a new activation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://registry.terraform.io/providers/akamai/akamai/latest/docs/resources/property" rel="noopener noreferrer"&gt;Terraform property resource&lt;/a&gt; has 3 attributes related to versions: &lt;code&gt;latest_version&lt;/code&gt;, &lt;code&gt;production_version&lt;/code&gt; and &lt;code&gt;staging_version&lt;/code&gt;. These are determined &lt;em&gt;after&lt;/em&gt; the property has been updated, but &lt;em&gt;before&lt;/em&gt; any activation has finished. &lt;/p&gt;

&lt;p&gt;We take 'always activating latest' as a starting point. However, scenarios can exist where you want to pin a version. One possible way to accomplish this is setting a &lt;code&gt;local&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;locals {
  production_version_to_activate = (var.production_activate_latest == true ? 
    akamai_property.property.latest_version : 
    (var.production_pinned_version &amp;gt; 0 ? var.production_pinned_version : akamai_property.property.production_version))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Having variable defaults:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Note: Similar variables would exist for staging network
variable "production_pinned_version" {
  description = "Pin PRODUCTION network activation to this version. Set to 0 to always use previous property version on production (don't activate any property changes)."
  type        = number
  default     = 0
}

variable "production_activate_latest" {
  description = "Apply latest version to production. This supersedes any pinned version, so disable it if you want to stay at a specific version."
  type        = bool
  default     = true
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, &lt;code&gt;tfvars&lt;/code&gt; can be set for various scenarios, as in the examples below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Directly activate latest property version (default)
production_activate_latest   = true

# Stick to previously active version (update the property, activate later, or via GUI)
production_activate_latest   = false

# Activate specific version (e.g. reverting to known to work version)
production_pinned_version    = 7
production_activate_latest   = false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Slow activations
&lt;/h3&gt;

&lt;p&gt;Activating the staging network takes about 2 to 3 minutes. Activating production typically takes between 9 and 11 minutes. To shorten the feedback loop, one can configure DNS for the test environment to use Akamai's staging network, avoiding activation on the production network altogether. Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;test.mysite.com CNAME test.mysite.com.edgekey-staging.net
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Given low traffic, the cache-hit ratio on a test environment usually can't be compared to production anyway, so missing production-level performance is normally not an issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implicit edge hostnames
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://registry.terraform.io/providers/akamai/akamai/latest/docs/resources/edge_hostname" rel="noopener noreferrer"&gt;edge hostname&lt;/a&gt; resource requires to set a certificate enrollment ID when using enhanced TLS (edge hostnames ending in &lt;code&gt;edgekey.net&lt;/code&gt;). However, if you're a 'Secure by default' customer, you &lt;em&gt;can&lt;/em&gt; (not: must) &lt;a href="https://learn.akamai.com/en-us/learn_akamai/getting_started_with_akamai_developers/core_features/create_edgehostnames.html#cpsprerequisitefortls" rel="noopener noreferrer"&gt;use default certificates&lt;/a&gt;. In that case the edge hostname will be &lt;a href="https://developer.akamai.com/api/core_features/property_manager/v1.html#postedgehostnames" rel="noopener noreferrer"&gt;created implicitly by the property manager API&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;As a result, the edge hostname that is created is not managed via Terraform. Most of the &lt;a href="https://registry.terraform.io/providers/akamai/akamai/latest/docs/resources/edge_hostname#argument-reference" rel="noopener noreferrer"&gt;edge hostname attributes&lt;/a&gt; hardly ever need to be changed, but for &lt;a href="https://registry.terraform.io/providers/akamai/akamai/latest/docs/resources/edge_hostname#ip_behavior" rel="noopener noreferrer"&gt;ip_behavior&lt;/a&gt; this can be a problem (&lt;a href="https://github.com/akamai/terraform-provider-akamai/issues/268" rel="noopener noreferrer"&gt;GitHub issue&lt;/a&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Final thoughts
&lt;/h2&gt;

&lt;p&gt;The main takeaway is: treat a CDN like any other cloud resource, making sure to have representative environments as early as possible in the development lifecycle, whether via Terraform, the &lt;a href="https://developer.akamai.com/cli" rel="noopener noreferrer"&gt;Akamai CLI&lt;/a&gt; or another tool of choice.&lt;/p&gt;

&lt;p&gt;Shift-left in the context of Akamai means achieving confidence in provisioning &lt;code&gt;1...n&lt;/code&gt; near-identical CDN properties, reducing the need for Akamai's staging network and ultimately speeding up the delivery process.&lt;/p&gt;

&lt;p&gt;Worth noting is that end-to-end tests in a caching setup can be challenging to keep fast due to cache TTLs. This can be mitigated via cachebusters, reduced &lt;code&gt;max-age&lt;/code&gt; values in response headers or other constructs.&lt;/p&gt;
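&lt;p&gt;A cachebuster can be as simple as a unique query string per test run, ensuring requests bypass objects cached by earlier runs. A sketch (the path and parameter name are arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical cachebuster: a unique query string per test run
curl -s "https://test.mysite.com/some/page?cb=$(date +%s)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;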

&lt;p&gt;A representative test environment with carefully considered exceptions still beats shifting right.&lt;/p&gt;

&lt;p&gt;Thanks for reading! Please leave any feedback or comments below, or &lt;a href="https://twitter.com/TBeijen" rel="noopener noreferrer"&gt;find me on Twitter&lt;/a&gt;.&lt;/p&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;For a CDN, representative &lt;em&gt;local&lt;/em&gt; development seems a bit far-fetched, but once you deploy, having a representative environment should be the goal. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn2"&gt;
&lt;p&gt;One could attempt to mitigate this by selecting a staging origin based on the request host, but this is a bad idea for a variety of reasons, the most obvious one being that it adds complexity that can easily backfire (production traffic ending up on staging origin), while still being limited to just production and staging. No test. No shifting left. ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>devops</category>
      <category>akamai</category>
      <category>terraform</category>
      <category>testing</category>
    </item>
    <item>
      <title>Maximize learnings from a Kubernetes cluster failure</title>
      <dc:creator>Tibo Beijen</dc:creator>
      <pubDate>Fri, 01 Feb 2019 20:16:29 +0000</pubDate>
      <link>https://dev.to/tbeijen/maximize-learnings-from-a-kubernetes-cluster-failure-3p53</link>
      <guid>https://dev.to/tbeijen/maximize-learnings-from-a-kubernetes-cluster-failure-3p53</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This article was originally written for &lt;a href="https://www.tibobeijen.nl/2019/02/01/learning-from-kubernetes-cluster-failure/" rel="noopener noreferrer"&gt;my personal blog&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;For a number of months now, we (the &lt;a href="https://www.nu.nl" rel="noopener noreferrer"&gt;NU.nl&lt;/a&gt; development team) have been operating a small number of Kubernetes clusters. We see the potential of Kubernetes: how it can increase our productivity and improve our CI/CD practices. Currently we run part of our logging and build toolset on Kubernetes, plus some small (internal) customer-facing workloads, with the plan to move more applications there once we have built up knowledge and confidence.&lt;/p&gt;

&lt;p&gt;Recently our team faced some problems on one of the clusters. Not as severe as to bring down the cluster completely, but definitely affecting the user experience of some internally used tools and dashboards.&lt;/p&gt;

&lt;p&gt;Coincidentally, around the same time I visited DevOpsCon 2018 in Munich, where the opening keynote &lt;a href="https://devopsconference.de/business-company-culture/staying-alive-patterns-for-failure-management-from-the-bottom-of-the-ocean/" rel="noopener noreferrer"&gt;"Staying Alive: Patterns for Failure Management from the Bottom of the Ocean"&lt;/a&gt; related very well to this incident.&lt;/p&gt;

&lt;p&gt;The talk (by &lt;a href="https://twitter.com/rondoftw" rel="noopener noreferrer"&gt;Ronnie Chen&lt;/a&gt;, engineering manager at Twitter) focused on various ways to make DevOps teams more effective in preventing and handling failures. One of the topics addressed was how catastrophes are usually caused by a cascade of failures, resulting in this quote:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A post-mortem that blames an incident only on the root cause, might only cover ~15% of the issues that led up to the incident.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As can be seen in &lt;a href="https://github.com/dastergon/postmortem-templates" rel="noopener noreferrer"&gt;this list of postmortem templates&lt;/a&gt;, quite a lot of them contain 'root cause(s)' (plural). Nevertheless the chain of events can be easily overlooked, especially as in a lot of situations, removing or fixing the root cause makes the problem go away.&lt;/p&gt;

&lt;p&gt;So, let's see what cascade of failures led to our incident and maximize our learnings.&lt;/p&gt;

&lt;h2&gt;
  
  
  The incident
&lt;/h2&gt;

&lt;p&gt;Our team received reports of a number of services showing erratic behavior: Occasional error pages, slow responses and time-outs.&lt;/p&gt;

&lt;p&gt;Attempting to investigate via Grafana, we experienced similar behavior affecting Grafana and Prometheus. Examining the cluster from the console resulted in:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$: kubectl get nodes
NAME                                          STATUS     ROLES     AGE       VERSION
ip-10-150-34-78.eu-west-1.compute.internal    Ready      master    43d       v1.10.6
ip-10-150-35-189.eu-west-1.compute.internal   Ready      node      2h        v1.10.6
ip-10-150-36-156.eu-west-1.compute.internal   Ready      node      2h        v1.10.6
ip-10-150-37-179.eu-west-1.compute.internal   NotReady   node      2h        v1.10.6
ip-10-150-37-37.eu-west-1.compute.internal    Ready      master    43d       v1.10.6
ip-10-150-38-190.eu-west-1.compute.internal   Ready      node      4h        v1.10.6
ip-10-150-39-21.eu-west-1.compute.internal    NotReady   node      2h        v1.10.6
ip-10-150-39-64.eu-west-1.compute.internal    Ready      master    43d       v1.10.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Nodes &lt;code&gt;NotReady&lt;/code&gt;, not good. Describing various nodes (not just the unhealthy ones) showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$: kubectl describe node ip-10-150-36-156.eu-west-1.compute.internal

&amp;lt;truncated&amp;gt;

Events:
  Type     Reason                   Age                From                                                     Message
  ----     ------                   ----               ----                                                     -------
  Normal   Starting                 36m                kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Starting kubelet.
  Normal   NodeHasSufficientDisk    36m (x2 over 36m)  kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeHasSufficientDisk
  Normal   NodeHasSufficientMemory  36m (x2 over 36m)  kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    36m (x2 over 36m)  kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     36m                kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeHasSufficientPID
  Normal   NodeNotReady             36m                kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeNotReady
  Warning  SystemOOM                36m (x4 over 36m)  kubelet, ip-10-150-36-156.eu-west-1.compute.internal     System OOM encountered
  Normal   NodeAllocatableEnforced  36m                kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Updated Node Allocatable limit across pods
  Normal   Starting                 36m                kube-proxy, ip-10-150-36-156.eu-west-1.compute.internal  Starting kube-proxy.
  Normal   NodeReady                36m                kubelet, ip-10-150-36-156.eu-west-1.compute.internal     Node ip-10-150-36-156.eu-west-1.compute.internal status is now: NodeReady
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It looked like the node's operating system was killing processes before the &lt;code&gt;kubelet&lt;/code&gt; was able to reclaim memory, as &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior" rel="noopener noreferrer"&gt;described in the Kubernetes docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The nodes in our cluster are part of an auto-scaling group. So, considering we had intermittent outages and at that time had problems reaching Grafana, we decided to terminate the &lt;code&gt;NotReady&lt;/code&gt; nodes one by one to see if new nodes would remain stable. This was not the case: new nodes appeared correctly, but soon after, existing or new nodes would again end up in status &lt;code&gt;NotReady&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It &lt;em&gt;did&lt;/em&gt;, however, result in Prometheus and Grafana being scheduled on a node that remained stable, so at least we had more data to analyze, and the root cause quickly became apparent...&lt;/p&gt;

&lt;h2&gt;
  
  
  Root cause
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://grafana.com/dashboards/315" rel="noopener noreferrer"&gt;One of the dashboards&lt;/a&gt; in our Grafana setup shows cluster-wide totals as well as a graphs for pod memory and cpu usage. This quickly showed the source of our problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tibobeijen.nl%2Fimg%2Flearning_from_k8s_failure_grafana_pod_memory.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tibobeijen.nl%2Fimg%2Flearning_from_k8s_failure_grafana_pod_memory.png" alt="Pods memory usage, during and after incident"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Those lines going up into nowhere are all pods running &lt;a href="https://github.com/Yelp/elastalert" rel="noopener noreferrer"&gt;ElastAlert&lt;/a&gt;. For logs we have an Elasticsearch cluster running, and recently we had been experimenting with ElastAlert to trigger alerts based on logs. One of the alerts introduced shortly before the incident would fire if our &lt;code&gt;Cloudfront-*&lt;/code&gt; indexes did not receive new documents for a certain period. As the throughput of that Cloudfront distribution is a couple of million requests per hour, this apparently caused an enormous ramp-up in memory usage. In hindsight, &lt;a href="https://elastalert.readthedocs.io/en/latest/ruletypes.html#max-query-size" rel="noopener noreferrer"&gt;digging deeper into the documentation&lt;/a&gt;, we should have used &lt;code&gt;use_count_query&lt;/code&gt; and/or &lt;code&gt;max_query_size&lt;/code&gt;.&lt;/p&gt;
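&lt;p&gt;As a sketch of what such a rule could have looked like with count queries enabled (the rule name, index pattern and thresholds below are made up for illustration):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical ElastAlert flatline rule. use_count_query makes ElastAlert
# use the count API instead of fetching documents, keeping memory usage
# flat even on high-volume indexes.
name: cloudfront-no-new-documents
type: flatline
index: cloudfront-*
threshold: 1
timeframe:
  minutes: 15
use_count_query: true
doc_type: doc  # required by some ElastAlert versions when using count queries
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;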

&lt;h2&gt;
  
  
  Cascade of failures
&lt;/h2&gt;

&lt;p&gt;So, root cause identified, investigated and fixed. Incident closed, right? Keeping in mind the quote from before, there's still ~85% of learnings to be found, so let's dive in:&lt;/p&gt;

&lt;h3&gt;
  
  
  No alerts fired
&lt;/h3&gt;

&lt;p&gt;Obviously alerting was already on our minds, as the root cause itself was related to ElastAlert. Some data to act on is (currently) only available in Elasticsearch, like log messages (occurrence of keywords) or data from systems outside of the Kubernetes cluster. Prometheus also has an Alertmanager, which we still need to set up. Besides those two sources we use New Relic for APM. Regardless of the sources, and a probable need to converge on fewer of them, it starts with at least &lt;em&gt;defining&lt;/em&gt; alert rules.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Define alerts related to resource usage, like CPU, Memory and disk space.&lt;/li&gt;
&lt;li&gt;Continue research on alerting strategy that effectively combines possibly multiple sources.&lt;/li&gt;
&lt;/ul&gt;
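&lt;p&gt;As a sketch of the first point, a Prometheus alert rule on node memory could look roughly like this (a hedged example; the metric names assume node_exporter, and the threshold and labels are arbitrary):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: node-resources
    rules:
      - alert: NodeLowMemory
        # Recent node_exporter versions expose node_memory_MemAvailable_bytes;
        # older versions use node_memory_MemAvailable instead.
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes &amp;lt; 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% memory available on {{ $labels.instance }}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;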

&lt;h3&gt;
  
  
  Grafana dashboard affected by cluster problems
&lt;/h3&gt;

&lt;p&gt;Prometheus and Grafana are very easy to set up in a Kubernetes cluster (install a few Helm charts and you're pretty much up and running). However, if you can't reach your cluster, you're blind.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consider exporting metrics outside of the cluster and moving Grafana out of the cluster as well. This is not without downsides, as very well explained &lt;a href="https://www.robustperception.io/federation-what-is-it-good-for" rel="noopener noreferrer"&gt;in this article at robustperception.io&lt;/a&gt;. An advantage might be having a single go-to point for dashboards covering multiple clusters. To my knowledge, &lt;a href="https://kublr.com/" rel="noopener noreferrer"&gt;Kublr&lt;/a&gt; uses a similar set-up to monitor multiple clusters.&lt;/li&gt;
&lt;li&gt;Out-of-cluster location could be EC2 but also a separate Kubernetes cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Not fully benefiting from our ELK stack
&lt;/h3&gt;

&lt;p&gt;We are running an EC2-based ELK stack that ingests a big volume of Cloudfront logs, but also logs and metrics from the Kubernetes clusters, exported by filebeat and metricbeat daemonsets. So, the data we couldn't access via the in-cluster Grafana existed in the ELK stack as well, but either wasn't visualized properly or was overlooked.&lt;/p&gt;

&lt;p&gt;This in general is a somewhat tricky subject: on the one hand, Elasticsearch is likely to be needed anyway for centralized logging and can handle metrics as well, so it &lt;em&gt;could&lt;/em&gt; be the one-stop solution. However, at scale it's quite a beast to operate, and onboarding could really benefit from more example dashboards (imo).&lt;/p&gt;

&lt;p&gt;On the other hand, Prometheus is simple to set up, seems to be the default technology in the Kubernetes eco-system and, paired with Grafana's available dashboards, is very easy to get into.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Either visualize important metrics in ELK or improve Prometheus/Grafana availability.&lt;/li&gt;
&lt;li&gt;Improve metrics strategy.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  No CPU &amp;amp; memory limits on ElastAlert pod
&lt;/h3&gt;

&lt;p&gt;The Helm chart used to install ElastAlert &lt;a href="https://github.com/helm/charts/blob/d3fd3f11578ebf74749b1b2c994c51d5199b8599/stable/elastalert/values.yaml#L33" rel="noopener noreferrer"&gt;allows specifying resource requests and limits&lt;/a&gt;; however, these have no default values (not uncommon) and were overlooked by us.&lt;/p&gt;

&lt;p&gt;In order to enforce configuring resource limits we could have &lt;a href="https://kubernetes.io/docs/tasks/administer-cluster/manage-resources/memory-default-namespace/#create-a-limitrange-and-a-pod" rel="noopener noreferrer"&gt;configured default and limit memory requests for our namespace&lt;/a&gt;. &lt;/p&gt;
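&lt;p&gt;A minimal sketch of such a &lt;code&gt;LimitRange&lt;/code&gt; (the namespace name and the memory values are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: v1
kind: LimitRange
metadata:
  name: mem-defaults
  namespace: tools       # hypothetical namespace
spec:
  limits:
    - type: Container
      defaultRequest:    # request applied when a container specifies none
        memory: 256Mi
      default:           # limit applied when a container specifies none
        memory: 512Mi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;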

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specify resource limits via Helm values.&lt;/li&gt;
&lt;li&gt;Configure namespaces to have defaults and limits for memory requests.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Ops services affecting customer facing workloads
&lt;/h3&gt;

&lt;p&gt;Customer workloads and monitoring/logging tools sharing the same set of resources has the risk of an amplifying effect consuming all resources. Increased traffic, causing increased CPU/memory pressure, causing more logging/metric volume, causing even more CPU/memory pressure, etc.&lt;/p&gt;

&lt;p&gt;We were already planning to move all logging, monitoring and CI/CD tooling to a dedicated node group within the production cluster. Depending on our experience with that, having a dedicated 'tools' cluster is also an option. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution (was already planned):&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Isolate customer facing workloads from build, logging &amp;amp; monitoring workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  No team-wide awareness of the ElastAlert change that was deployed
&lt;/h3&gt;

&lt;p&gt;Although the new alert passed code review, the fact that it was merged and deployed was not known to everybody. More importantly, as it was installed via the command line by one of the team members, there was no immediate source of information showing which applications in the cluster might have been updated.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy everything via automation (e.g. Jenkins pipelines)&lt;/li&gt;
&lt;li&gt;Consider a &lt;a href="https://thenewstack.io/gitops-git-push-all-the-things/" rel="noopener noreferrer"&gt;GitOps&lt;/a&gt; approach for deploying new application versions: 'state to be' and history of changes in code, using a tool well known by developers.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  No smoke tests
&lt;/h3&gt;

&lt;p&gt;If we had deployed the ElastAlert update using a pipeline, we could have added a 'smoke test' step &lt;em&gt;after&lt;/em&gt; the deploy. This could have signaled excessive memory usage, or pod restarts due to the pod exceeding configured memory limits.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution:&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deploy via a pipeline that includes a smoke test step.&lt;/li&gt;
&lt;/ul&gt;
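&lt;p&gt;Such a smoke test step could start out as simple as the following sketch (the deployment name, namespace and label are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Hypothetical post-deploy smoke test: fail the pipeline if the rollout
# doesn't stabilize in time, then inspect container restart counts.
kubectl rollout status deployment/elastalert --namespace tools --timeout=2m
kubectl get pods --namespace tools -l app=elastalert \
  -o jsonpath='{.items[*].status.containerStatuses[*].restartCount}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;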

&lt;h3&gt;
  
  
  Knowledge of operating Kubernetes limited to part of team
&lt;/h3&gt;

&lt;p&gt;Our team (like most teams) consists of people with various levels of expertise on different topics. Some have more cloud &amp;amp; DevOps experience, some are front-end or Django experts, etc. As Kubernetes is a fairly new technology, certainly for our team, knowledge was not as widespread as is desirable. As with all technologies practiced by agile teams: DevOps should not be limited to (part of) a single team. Luckily, experienced team members were available to assist the on-call team member who had little infrastructure experience.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Resolution (was already planned):&lt;/em&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Ensuring Kubernetes-related work (cloud infrastructure-related in general actually) is part of team sprints and is picked up by all team members, pairing with more experienced members.&lt;/li&gt;
&lt;li&gt;Workshops deep-diving into certain topics.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrap up
&lt;/h2&gt;

&lt;p&gt;As becomes quite apparent, fixing the ElastAlert problem itself was just the tip of the iceberg. There was a lot more to learn from this seemingly simple incident. Most points listed in this article were already on our radar in one way or another, but the incident emphasized their importance.&lt;/p&gt;

&lt;p&gt;Turning these learnings into scrum (or kanban) items will allow us to improve our platform and practices in a focused way and measure our progress. &lt;/p&gt;

&lt;p&gt;Learning and improving as a team requires a company culture that allows 'blameless post mortems' and does not merely focus on 'number of incidents' or 'time to resolve'. To finish with a quote &lt;a href="https://devopsconference.de/continuous-delivery/i-deploy-on-fridays-and-maybe-you-should-too/" rel="noopener noreferrer"&gt;heard at a DevOps conference&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Success consists of going from failure to failure without loss of enthusiasm - Winston Churchill&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>failuremanagement</category>
      <category>postmortem</category>
    </item>
  </channel>
</rss>
