<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Arthur Azrieli</title>
    <description>The latest articles on DEV Community by Arthur Azrieli (@arthurazr_46).</description>
    <link>https://dev.to/arthurazr_46</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1241887%2F8782dafd-4419-42a0-b889-09e0d3e553cf.jpg</url>
      <title>DEV Community: Arthur Azrieli</title>
      <link>https://dev.to/arthurazr_46</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/arthurazr_46"/>
    <language>en</language>
    <item>
      <title>How to Handle Alert Fatigue</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Fri, 28 Mar 2025 13:42:51 +0000</pubDate>
      <link>https://dev.to/meteorops/how-to-handle-alert-fatigue-fao</link>
      <guid>https://dev.to/meteorops/how-to-handle-alert-fatigue-fao</guid>
      <description>&lt;p&gt;A very important aspect for many developers and DevOps is handling alerts. An even more important and often overlooked aspect is alert fatigue. Alert fatigue is caused by a high volume of alerts of which many are false positives, related alerts, or duplicates. Alert handlers become so used to alerts that they disregard and miss important ones. It takes only one or two missed alerts to bring a system to a halt. However, alert fatigue is nothing but a symptom of the underlying issues that need to be addressed.‍&lt;/p&gt;

&lt;h2&gt;
  
  
  What We'll Cover: Alert Fatigue and Alertmanager
&lt;/h2&gt;

&lt;p&gt;In this article we discuss the factors that can lead to alert fatigue, how to identify them, and how to handle them. In addition, we will introduce &lt;a href="https://www.meteorops.com/blog/how-to-handle-alert-fatigue" rel="noopener noreferrer"&gt;Prometheus Alertmanager&lt;/a&gt; and show how to use it to handle the very scenarios that lead to alert fatigue.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before diving in, there are a few prerequisites if you want to follow along with the article:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://artifacthub.io/packages/helm/prometheus-community/alertmanager" rel="noopener noreferrer"&gt;Prometheus installed&lt;/a&gt; via its Helm chart.&lt;/li&gt;
&lt;li&gt;Alertmanager installed - it is included by default in the Prometheus Helm chart.&lt;/li&gt;
&lt;li&gt;Working knowledge of YAML.&lt;/li&gt;
&lt;/ul&gt;
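&lt;p&gt;For reference, a typical Helm installation looks like this (the release name "prometheus" is just an example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# add the community chart repository and install Prometheus (Alertmanager included)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;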
&lt;h2&gt;
  
  
  Too Many Alerts
&lt;/h2&gt;

&lt;p&gt;Developers and DevOps are busy people. When a significant part of their time is spent on looking into or muting alerts that don't matter, we are looking at alert fatigue. To handle this situation we need to ask ourselves why there are so many alerts and why so many of them are false positives, duplicates, or simply of no value.&lt;/p&gt;
&lt;h3&gt;
  
  
  Easy Trigger Alert
&lt;/h3&gt;

&lt;p&gt;There's a tendency to overdo alert configuration, mainly as a precaution. For example, we would like to know if a service is using a lot of memory or if its CPU is spiking so that we can react in time in case it develops into a service disruption. In an attempt to have a preemptive, encompassing view of the system, we sometimes configure too many alerts whose thresholds are unjustified. To justify an alert, we need to ask whether the CPU or memory spike is really a concern. It's not uncommon for services to work a little harder at times. If the historical data shows spikes that resolve themselves and don't correspond to incidents, the alert should not be configured at all. Besides, if we are concerned with service disruption due to resource shortage, we should consider automatic scaling, not setting more alerts.&lt;/p&gt;
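&lt;p&gt;One simple guard against easy-trigger alerts is the &lt;code&gt;for&lt;/code&gt; clause in Prometheus alerting rules, which requires a condition to hold for a sustained period before the alert fires. A minimal sketch (the job name and threshold are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: resource-alerts
    rules:
      - alert: SustainedHighCPU
        # fire only if CPU usage stays above 90% for 15 minutes,
        # so short self-resolving spikes never page anyone
        expr: rate(process_cpu_seconds_total{job="aggregator"}[5m]) &gt; 0.9
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "CPU of {{ $labels.job }} above 90% for 15 minutes"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;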
&lt;h3&gt;
  
  
  Alert Configuration Cleanup
&lt;/h3&gt;

&lt;p&gt;If prevention is the best medicine, then not setting more alerts is one half of it; dealing with the alerts that are already set is the other. Existing alerts should be examined through historical and operational lenses:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Have alerts been triggered many times before?&lt;/li&gt;
&lt;li&gt;Have they self-resolved?&lt;/li&gt;
&lt;li&gt;Have they really indicated some system or service disruption?&lt;/li&gt;
&lt;li&gt;Are they perhaps related to other more significant alerts?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These questions will help us discover alerts that can be removed to prevent alert fatigue.&lt;/p&gt;
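&lt;p&gt;Prometheus can help answer the first two questions with data: it exposes a built-in &lt;code&gt;ALERTS&lt;/code&gt; series that records firing and pending alerts. For example, to see how often each alert fired over the last 30 days (assuming Prometheus is reachable on localhost:9090):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -G 'http://localhost:9090/api/v1/query' \
     --data-urlencode 'query=count_over_time(ALERTS{alertstate="firing"}[30d])'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Alerts that fired many times but never corresponded to an actual incident are prime candidates for removal.&lt;/p&gt;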

&lt;p&gt;However, for mature and perhaps more complicated systems with many services and moving parts, going over hundreds of configured alerts and determining whether they are important or redundant can prove quite difficult and time consuming. There's always the question of ROI when it comes to cleaning up alerts. If we spend a sprint on cleaning alerts, it might benefit us down the line by reducing alert fatigue, but we just missed a sprint where we could have delivered features and bug fixes. So the trade-off is always there. We'd still argue that alert cleanup should take place, even if in small increments. While we are cleaning them up bit by bit, we can introduce an alert manager whose purpose is to provide additional protection against alert fatigue.&lt;/p&gt;
&lt;h2&gt;
  
  
  Alert Manager to Handle and Prevent Alert Fatigue
&lt;/h2&gt;

&lt;p&gt;An alert manager like Prometheus Alertmanager provides a robust way to manage alerts. In the context of handling alert fatigue, the most significant aspects of an alert manager are its abilities to group and inhibit alerts.&lt;/p&gt;
&lt;h3&gt;
  
  
  Correlating and grouping alerts
&lt;/h3&gt;

&lt;p&gt;Let's look at an example of correlating and grouping alerts. Imagine a scenario in which a datastore such as a database, search engine, or queue manager is reporting high CPU and memory consumption. Services that depend on this datastore might experience difficulties communicating with it, and they too might trigger alerts indicating that they cannot reach the datastore. With Prometheus Alertmanager, it is possible to configure routing so that the alert for the datastore's resource consumption and all related alerts are grouped together and sent to the same receiver. This way we can see both the underlying cause and which services are affected.&lt;/p&gt;

&lt;p&gt;Given that alerts are properly labeled and configured to include the service name, team, cluster, region and any other attribute of significance, we can configure an alert as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;alertmanager.yaml:
global: {}
receivers:
  - name: 'default-receiver'
    webhook_configs:
      - url: 'http://example.com/webhook'
  - name: 'data-dev-receiver'
    webhook_configs:
      - url: 'http://example.com/webhook-data-dev'
route:
  group_interval: 5m
  group_wait: 10s
  receiver: default-receiver
  repeat_interval: 3h

  routes:
  - matchers:
      - team="data-dev"
    group_by: ['cluster', 'database']
    receiver: 'data-dev-receiver'
    continue: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In the routes directive, we are saying that any alert meant for the data-dev team should be grouped under the data-dev-receiver by cluster and database. To simulate, we can run these commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -H 'Content-Type: application/json' \
     -d '[{
           "labels": {
             "alertname": "DatabaseUnreachable",
             "team": "data-dev",
             "service": "aggregator",
             "database": "analytics",
             "cluster": "prod",
             "severity": "critical"
           }
         }]' \
     http://localhost:9093/api/v2/alerts

curl -H 'Content-Type: application/json' \
     -d '[{
           "labels": {
             "alertname": "DatabaseResourceConsumptionHigh",
             "team": "data-dev",
             "database": "analytics",
             "cluster": "prod",
             "severity": "critical"
           }
         }]' \
     http://localhost:9093/api/v2/alerts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is what it would look like in Alertmanager:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy9m3yu9679jszlz0chk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foy9m3yu9679jszlz0chk.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Inhibiting and Suppressing Alerts
&lt;/h3&gt;

&lt;p&gt;In other cases we want to suppress related alerts rather than group them with the underlying alert. For example, a database has the following alerts configured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Consuming too many resources&lt;/li&gt;
&lt;li&gt;Unreachable&lt;/li&gt;
&lt;li&gt;Connections about to max out&lt;/li&gt;
&lt;li&gt;File descriptors limit is about to be reached&lt;/li&gt;
&lt;li&gt;Queries take too long to execute&lt;/li&gt;
&lt;li&gt;Many failed queries&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Any of these alerts can be followed by the others. If an alert triggers for resource consumption, then an unreachable alert might also trigger. If queries take too long to execute, an alert for resource consumption might trigger as well.&lt;/p&gt;

&lt;p&gt;So any alert might be followed by other alerts, creating a cascade of incoming alerts when all that's needed is just the one. Instead of having them all trigger one after another, Prometheus Alertmanager can be configured to suppress a subset of alerts while a specific alert is active. To handle this situation, we can create the following inhibit rules:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;inhibit_rules:

  - source_matchers:
      - alertname="DatabaseResourceConsumptionHigh"
    target_matchers:
      - alertname="DatabaseUnreachable"
    equal: ['database']

  - source_matchers:
      - alertname="DatabaseResourceConsumptionHigh"
    target_matchers:
      - alertname="DatabaseSlowQueries"
    equal: ['database']

  - source_matchers:
      - alertname="DatabaseResourceConsumptionHigh"
    target_matchers:
      - alertname="DatabaseFailedQueries"
    equal: ['database']
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're saying that if there's an active DatabaseResourceConsumptionHigh alert for a specific database, any DatabaseUnreachable, DatabaseSlowQueries, or DatabaseFailedQueries alert for the same database will be inhibited. Inhibited means that the alert will not be sent out, but it will still be visible on demand. The idea is that if there's an alert on resource consumption, we already know something is going on with the database. We don't need to be paged for all the other issues, as they are probably related, yet we still have them at hand for investigation.&lt;/p&gt;

&lt;p&gt;This is what it would look like in Alertmanager:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inhibited alerts don't appear when Inhibited is not selected&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd0oi5qqi8aw2ejbe0fj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsd0oi5qqi8aw2ejbe0fj.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Inhibited alerts appear when Inhibited is selected&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F166cnvp6oogx7gijmlmi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F166cnvp6oogx7gijmlmi.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Personally Contributing to Prevent Alert Fatigue
&lt;/h2&gt;

&lt;p&gt;The ability to suppress alerts using Prometheus Alertmanager can also be used by individual contributors to help prevent alert fatigue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Taking Ownership of Alerts
&lt;/h3&gt;

&lt;p&gt;An engineer who is making changes to the system and knows that alerts might trigger should use Alertmanager's ability to silence and inhibit alerts. It goes without saying how disruptive it is to get a high number of alerts in the middle of a workday, especially when these alerts are false positives and could have been silenced in advance.&lt;/p&gt;
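&lt;p&gt;As a sketch, a silence can be created ahead of a planned change with &lt;code&gt;amtool&lt;/code&gt;, Alertmanager's CLI (the matchers, URL, duration, and comment below are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# silence slow-query alerts on the analytics database for the duration of a planned change
amtool silence add alertname="DatabaseSlowQueries" database="analytics" \
  --alertmanager.url="http://localhost:9093" \
  --duration="2h" \
  --author="arthur" \
  --comment="Planned migration on the analytics database"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The silence expires on its own, so there's no risk of forgetting to re-enable the alert afterwards.&lt;/p&gt;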

&lt;p&gt;If the alerts cannot be silenced because they are required as indicators during operational changes, the engineer should assume the PD or DOD shift. It might sound trivial, but from personal experience we can testify that this is a common problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Handling Alert Fatigue is a Continuous Process
&lt;/h2&gt;

&lt;p&gt;Using an alert manager and personally assuming responsibility over alerts are the two determining factors in handling alert fatigue. Together they greatly reduce the number of alerts and pages, mitigating and preventing alert fatigue. However, they are only tools and methods for achieving the goal, and they have to be applied continuously.&lt;/p&gt;

&lt;p&gt;It's important to remember that the effort to manage alerts and avoid alert fatigue is an endless process. As systems evolve, their monitoring stacks evolve as well. Alert handling rules that are currently configured might have been set up with oversights and should be revisited. Think of it as monitoring for your monitoring: we have alerts and alerting rules in place, but we always have to ask ourselves whether they are properly set and whether conditions have changed in ways that merit revising them.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>alertmanager</category>
      <category>prometheus</category>
      <category>monitoring</category>
    </item>
    <item>
      <title>From gatekeepers to service providers: reimagining DevOps relationship with developers</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Fri, 28 Feb 2025 19:27:44 +0000</pubDate>
      <link>https://dev.to/meteorops/from-gatekeepers-to-service-providers-reimagining-devops-relationship-with-developers-2k51</link>
      <guid>https://dev.to/meteorops/from-gatekeepers-to-service-providers-reimagining-devops-relationship-with-developers-2k51</guid>
      <description>&lt;p&gt;Most of us, if not all of us, are service providers. It doesn’t matter what position we are in. Each person in an organisation renders service onto the organisation, even the CEO and board of directors. Perhaps the only ones who are not service providers are the investors but perhaps they too render their service onto someone or something else. We begin with the notion of service providers because it is a crucial factor in the relationship between DevOps and developers. &lt;/p&gt;




&lt;h2&gt;
  
  
  The Relationship between DevOps and Developers
&lt;/h2&gt;

&lt;p&gt;The relationship between &lt;a href="https://www.meteorops.com/glossary/devops" rel="noopener noreferrer"&gt;DevOps&lt;/a&gt; and developers is as delicate and complicated as it is crucial for the whole organisation. Delicate because any friction causes setbacks in development. Complicated because it spans across many domains of knowledge and various requirements. And crucial for the whole organisation because if it’s not optimal, developers don’t deliver quality releases on time.&lt;/p&gt;

&lt;p&gt;Luckily, the core of the relationship between DevOps and developers is technological, so handling challenges and adaptations is technological in nature. There are ways, purely technological ones, to reduce the pressure DevOps face on the one hand, and on the other hand remove obstacles from the developers’ way and let them become independent in their work.&lt;/p&gt;




&lt;h2&gt;
  
  
  DevOps Are Service Providers but on a Different Scale
&lt;/h2&gt;

&lt;p&gt;DevOps is short for development and operations, and it’s easy to see from this terminology that DevOps are the primary service providers for developers and should gear themselves accordingly. However, acting as service providers doesn’t come easy to DevOps, owing to the scale and complexity of what’s required of them. As a result, a lot of friction and dissonance exists between DevOps and developers in many organisations. To resolve it, we need to understand what type of service providers DevOps are, what the scale of their work is, and how they can improve communication and operations to make their work, and the developers’ work, faster, easier, and more efficient.&lt;/p&gt;




&lt;h2&gt;
  
  
  Understanding the Role of DevOps as Service Providers
&lt;/h2&gt;

&lt;p&gt;Most service providers usually provide their services within well-defined scopes. Developers develop, analysts analyze, QA verify, and so on. It’s true that these can also have their own internal customers like team members that require assistance. However, in such cases, the service or assistance is still within the scope and on a relatively small scale.&lt;/p&gt;

&lt;p&gt;DevOps is a different story. Not only do DevOps have their own work cut out for them, they also support a lot of internal customers, and do so on a much larger scale and within a wider scope. DevOps assist R&amp;amp;D people (and others such as analysts and sales engineers) in a wide variety of contexts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Assisting in setting up local environments.&lt;/li&gt;
&lt;li&gt;Granting roles and permissions to internal and external systems.&lt;/li&gt;
&lt;li&gt;Troubleshooting issues with databases, microservices, and deployments.&lt;/li&gt;
&lt;li&gt;Provisioning resources on demand.&lt;/li&gt;
&lt;li&gt;Consulting on scale and security.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem here is that DevOps are often unprepared and unequipped for this role, both in terms of understanding who the customer is, what they need, and how to deliver it with resources and restrictions in mind, and in terms of how to do it at scale.&lt;/p&gt;

&lt;p&gt;DevOps usually come into the job with a different mindset. A DevOps engineer probably sees themselves responsible for the development and maintenance of mainly the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Infra - &lt;a href="https://www.meteorops.com/glossary/cloud" rel="noopener noreferrer"&gt;Cloud&lt;/a&gt;, different environments.&lt;/li&gt;
&lt;li&gt;Monitoring, logging, and alerting systems.&lt;/li&gt;
&lt;li&gt;CI/CD infra.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most DevOps will agree that it’s okay to add to this list ongoing support for developers and other stakeholders over a wide variety of contexts, systems, frameworks, and platforms.&lt;/p&gt;

&lt;p&gt;However, it’s this very same wide variety of domains that DevOps support that prevents them from providing good service. When DevOps are unable to provide this service, it creates a lot of friction and dissonance between DevOps and developers.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Dissonance Between DevOps and Developers
&lt;/h2&gt;

&lt;p&gt;There’s no shortage of dissonance and conflict between DevOps and developers. Let’s look at some real-life examples.&lt;/p&gt;

&lt;p&gt;What DevOps might say:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developers don’t do the bare minimum to solve issues themselves before turning to DevOps.&lt;/li&gt;
&lt;li&gt;Developers always ask DevOps for the path of least resistance.&lt;/li&gt;
&lt;li&gt;Developers don’t develop with security and scale in mind.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the other hand, developers may want to counter-argue:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DevOps are not responsive enough, neither in terms of time to resolution nor deliverables.&lt;/li&gt;
&lt;li&gt;DevOps impede development and velocity through requirements and bureaucracy.&lt;/li&gt;
&lt;li&gt;DevOps don’t develop with developers in mind to assist them and facilitate their work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s easy to see the dissonance when these complaints are put side by side and one after another. Putting this conflict into structure reveals the disconnect and distance between what one side needs and what the other side is capable of providing. &lt;/p&gt;

&lt;p&gt;Since DevOps have their own duties to fulfill, extensive support on top of them adds pressure. This pressure can make DevOps compromise on the service they give because they are short on time and resources, so they expect developers to be self-sufficient and efficient when asking for support. But what DevOps see as inefficiency often stems from the fact that developers are the most pressured entity in the organisation, because they develop the product. When developers come across issues that prevent them from working, they too are short on time and resources and need someone to assist them in a timely manner.&lt;/p&gt;

&lt;p&gt;This is reality nowadays in many organisations. Both developers and DevOps are short on time and resources, and when the former approaches the latter, the latter needs to halt their work and assist, or development will suffer setbacks. The goal is to change this reality and realign DevOps and developers towards better collaboration.&lt;/p&gt;




&lt;h2&gt;
  
  
  Realigning DevOps with Developers
&lt;/h2&gt;

&lt;p&gt;Realigning DevOps with developers is not an easy task. The dissonance is not rooted in disagreements and differences; it persists because this reality is genuinely hard to change. It’s not enough to tell DevOps that they are service providers who should do what they can to support developers in their day-to-day operations, and it’s not enough to tell developers to approach DevOps like customers requiring a service. &lt;/p&gt;

&lt;p&gt;What’s lacking here is not verbal agreement but a well-defined, well-implemented framework of methodologies to help the two sides communicate and collaborate clearly and efficiently. &lt;/p&gt;

&lt;p&gt;Moreover, the DevOps mindset must incorporate the idea that to provide services on a large scale, you need tools and you need to know what the customer needs, how and when. For DevOps, to provide services is to alleviate the pressure coming from developer needs and improve delivery. Let’s explore some ideas for improvement. &lt;/p&gt;




&lt;h2&gt;
  
  
  Communication Tools
&lt;/h2&gt;

&lt;p&gt;Most companies use some form of ChatOps such as Slack or Teams. A dedicated channel for requests from developers is the first step. However, if there’s no structured way to submit requests for support or resources, it can become unmanageable and unwieldy really quickly. Many requests can come in at once and each of them might be related to something different. &lt;br&gt;
To tackle this issue, it’s possible to install a request bot or request form in the dedicated channel. The request bot or form allows developers to submit requests in an orderly manner. It also allows DevOps to manage requests by queue and with more info and context to begin with. The form or bot should gather the following from the requester:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The nature of the problem - is it a request for support, a general question, or a request for resources?&lt;/li&gt;
&lt;li&gt;What’s the environment in question - is it local, dev, or production?&lt;/li&gt;
&lt;li&gt;The request itself - does the developer need to set up a new service, are they having issues with a service, do they need more permissions for internal and external systems?&lt;/li&gt;
&lt;li&gt;If it’s a service - what is the name of the service and its dependencies (storage, docker repos, git repos, databases)?&lt;/li&gt;
&lt;li&gt;If it’s a request for resources - quantitative measures such as CPU and memory and justification for adding resources.&lt;/li&gt;
&lt;li&gt;If it’s more permissions - justification for permissions.&lt;/li&gt;
&lt;li&gt;What steps were taken to try and troubleshoot - where applicable.&lt;/li&gt;
&lt;li&gt;Any additional information that might be relevant - logs, metrics, documentation.&lt;/li&gt;
&lt;/ul&gt;
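&lt;p&gt;As an illustrative sketch only, the fields above could be captured in a form definition like the following (the field names and options are hypothetical, not the schema of any particular bot):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# request-form.yaml (hypothetical schema for a ChatOps request form)
fields:
  - name: request_type
    options: [support, question, resources, permissions]
  - name: environment
    options: [local, dev, production]
  - name: service_name          # plus dependencies: storage, docker/git repos, databases
    required_when: { request_type: support }
  - name: justification         # quantities for resources, reasons for permissions
    required_when: { request_type: [resources, permissions] }
  - name: troubleshooting_steps # what was tried before asking, where applicable
  - name: additional_info       # logs, metrics, documentation links
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;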




&lt;p&gt;With the above in mind, now it’s time to see what can be automated or made self-serve to facilitate developer work by way of delegating and enabling.&lt;/p&gt;

&lt;h2&gt;
  
  
  Facilitating Developer Work
&lt;/h2&gt;

&lt;p&gt;Most customers would prefer to do things themselves - especially developers, who work in fast-paced environments and within time constraints. We’ve started with communication tools and now want to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Gather info and analyze.&lt;/li&gt;
&lt;li&gt;Discover areas that are constant pain points for developers.&lt;/li&gt;
&lt;li&gt;Automate or delegate where possible.&lt;/li&gt;
&lt;li&gt;Rinse and repeat.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This way DevOps can automate repetitive tasks such as granting permissions or provisioning resources. DevOps must also think about protecting their own customers and not be too permissive. Clear boundaries need to be defined when permissions or resources are requested because, as we said, DevOps are usually adamant when it comes to security and scale.&lt;/p&gt;

&lt;p&gt;Beyond turning repetitive tasks to self-serve, DevOps should also strive towards making developers’ work as easy as possible. For a developer, an easier way to work could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Work - do everything on their own without needing anyone’s help.&lt;/li&gt;
&lt;li&gt;Develop - disposable dev environments that are quick to set up.&lt;/li&gt;
&lt;li&gt;CI/CD - easy to configure, easy to deploy, easy to revert.&lt;/li&gt;
&lt;li&gt;Panic time - clean, well-scoped logs and metrics that are easy to search.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At this point we are almost three quarters of the way in. Now all that’s left is to make sure that customers are well-aware of what’s at their disposal. DevOps can and should plan for building a body of knowledge to assist and educate developers on how to make the best of what’s offered to them by DevOps.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summarize, Refine, Document, Educate
&lt;/h2&gt;

&lt;p&gt;Once proper communication and procedures are in place to facilitate developer work, the body of knowledge should be assembled. Everything that doesn’t fall under automated requests and better troubleshooting and debugging tools should go into the body of knowledge. &lt;/p&gt;

&lt;p&gt;The body of knowledge consists of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Documentation&lt;/li&gt;
&lt;li&gt;How-Tos&lt;/li&gt;
&lt;li&gt;FAQs&lt;/li&gt;
&lt;li&gt;Workshops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Composing and maintaining a body of knowledge is probably one of the most challenging things DevOps can do. It’s easy for DevOps to configure automations, but most don’t know how to write in a clear, concise manner. In addition, most customers don’t really bother to read the docs, and if they do, they just skim through. Even when developers do make use of documentation and workshops, knowledge changes fast and the body of knowledge has to be adjusted and updated accordingly. To tackle this challenge, DevOps can encourage developers to take an active part in maintaining the body of knowledge; an engaged customer is most likely a self-sufficient one. Perhaps the most important medium for imparting knowledge is workshops. Not only is it easier to learn and understand by doing rather than reading, it also brings DevOps and developers closer together and strengthens their relationship. &lt;/p&gt;




&lt;h2&gt;
  
  
  DevOps as a Service: Maintaining Relationships at Scale
&lt;/h2&gt;

&lt;p&gt;The first step towards improving the collaboration between DevOps and developers is understanding its scale. The scale is massive enough to put strain on the relationship, and like most scale issues, systems and procedures can be put in place and iteratively revised and improved to handle it.&lt;/p&gt;

&lt;p&gt;DevOps must acknowledge that they are service providers at scale. Being service providers at scale, they must understand their resources, capabilities, and limitations in assisting developers, and put a system or procedure in place to handle it for them. DevOps must empower developers and delegate more responsibilities to them, therefore relieving pressure off of themselves while giving developers more agency.&lt;/p&gt;

&lt;p&gt;Only when DevOps follow this set of principles will they be able to finally emerge as what they were always meant to be: service providers at scale.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>automation</category>
      <category>developer</category>
      <category>operations</category>
    </item>
    <item>
      <title>Proper mindset for handling data and databases: between scaling and failing</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Fri, 28 Feb 2025 17:19:11 +0000</pubDate>
      <link>https://dev.to/meteorops/proper-mindset-for-handling-data-and-databases-between-scaling-and-failing-che</link>
      <guid>https://dev.to/meteorops/proper-mindset-for-handling-data-and-databases-between-scaling-and-failing-che</guid>
      <description>&lt;p&gt;Startups and software companies put a lot of effort into what languages to use, what tech stacks to employ, and what cloud to deploy the app to but fail to put the same focus on their data and databases&lt;/p&gt;

&lt;p&gt;Data is the core of the app and product from which everything is derived and upon which everything depends. Startups and software houses should put more thought into how they plan to gather, store, analyse, and use their data. Doing so could mark the difference between success and failure.&lt;/p&gt;




&lt;h2&gt;
  
  
  Avoid Common Data Architecture Regrets
&lt;/h2&gt;

&lt;p&gt;Data is the raw material from which you mold your application. Anything else is just tools. If you treat your raw ingredients with proper care, the final recipe is bound to succeed.&lt;/p&gt;

&lt;p&gt;The tried and true methods of optimizing data are abundant. It’s well known that databases can be tweaked to better perform through various means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Indexing&lt;/strong&gt; - speed up lookups on frequently queried columns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Normalization&lt;/strong&gt; - eliminate redundancy by keeping data atomic and separate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query optimization&lt;/strong&gt; - select only what you need, when you need it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partitioning&lt;/strong&gt; - divide large tables into smaller, more manageable ones.&lt;/li&gt;
&lt;/ul&gt;
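&lt;p&gt;To make a few of these concrete, here is what they look like in PostgreSQL (the table and column names are purely illustrative, and range partitioning requires the parent table to be declared partitioned):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-- illustrative PostgreSQL examples; table and column names are assumptions
CREATE INDEX idx_orders_customer ON orders (customer_id);   -- indexing

SELECT id, total                                            -- query optimization:
FROM orders                                                 -- fetch only the needed columns
WHERE customer_id = 42;

CREATE TABLE orders_2025 PARTITION OF orders                -- range partitioning
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;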

&lt;p&gt;So if it’s all tried and true and has been established, why do we highlight it as an overlooked part of many applications? That’s because there’s a long way to go from theory to practice. The list above only represents what can be done to optimize performance, not how and when to do so. Moreover, in a fast-paced environment of software development and especially in startups, proper planning for data is sometimes pushed aside in favor of rapid growth.&lt;/p&gt;




&lt;h2&gt;
  
  
  Plan Ahead for Your Data
&lt;/h2&gt;

&lt;p&gt;We’ve mentioned several ways to optimize database performance, but what we should really focus on is planning for the data: what sort of data it will be, what format it will take, and what sort of manipulations it will undergo. Perhaps even more rudimentary than the type and usage of the data is the database itself and how it fits your application. &lt;/p&gt;




&lt;h3&gt;
  
  
  Get to Know the Database
&lt;/h3&gt;

&lt;p&gt;There are two main types of databases these days: relational (SQL) and non-relational (NoSQL), the most common of which are document-based. &lt;/p&gt;

&lt;h4&gt;
  
  
  NoSQL:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;If your application needs to handle self-contained yet flexible documents.&lt;/li&gt;
&lt;li&gt;If you predict large amounts of data that might be distributed and sharded.&lt;/li&gt;
&lt;li&gt;If you expect a lot of unstructured data.&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  SQL:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;If your application requires rigid, well-defined schemas and relations.&lt;/li&gt;
&lt;li&gt;If your application requires consistency across the entire dataset. &lt;/li&gt;
&lt;li&gt;If you intend to ingest columnar data using big data tools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you’ve chosen the database to work with, ask yourself again what your use case is. Learn from what others experienced working with MySQL, MariaDB, PostgreSQL, or MongoDB, to name a few. Find the setbacks that others faced and see if at any point in the future you might face something similar.&lt;/p&gt;




&lt;h3&gt;
  
  
  Get to Know the Data and its Characteristics
&lt;/h3&gt;

&lt;p&gt;The way you design your data now will impact you in the future. It’s a hard task, but force yourself to think of what other functionality you have in store and plan to implement. See if the current data scheme and models allow easy integration of such functionality.&lt;/p&gt;

&lt;p&gt;Functionality implies data moving around and being updated constantly. Consider the behavior of your data:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you do a lot of writes but fewer reads, opt for throughput.&lt;/li&gt;
&lt;li&gt;If you do many reads but fewer writes, optimize for read I/O and use caching.&lt;/li&gt;
&lt;/ul&gt;
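As a rough sketch of the read-heavy case, the snippet below layers a cache over a lookup; it uses Python's sqlite3 and functools.lru_cache, and the table and key names are illustrative only. In a real deployment the cache would more likely be Redis or memcached, but the principle is the same:

```python
import functools
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
con.execute("INSERT INTO settings VALUES ('theme', 'dark')")

db_reads = {"count": 0}  # track how often we actually hit the database

@functools.lru_cache(maxsize=1024)
def get_setting(key):
    # Read-through cache: only a cache miss reaches the database.
    db_reads["count"] += 1
    row = con.execute("SELECT value FROM settings WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None

for _ in range(100):
    get_setting("theme")

print(db_reads["count"])  # 1 -- ninety-nine reads were served from the cache
```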




&lt;h2&gt;
  
  
  Load and Stress the Data
&lt;/h2&gt;

&lt;p&gt;Data-related performance issues mostly hit you when you least expect them. The smooth functionality that you are used to is not attributed to the choice of database or data scheme. It’s mostly attributed to the lack of load and stress on the database. That load and stress is exactly what you should seek out yourself, before production finds it for you.&lt;/p&gt;

&lt;p&gt;Again, this is a hard task: it requires you to accept that tens of thousands of requests per minute can easily become millions. It’s easy to list and discuss ways to optimize data manipulation and retrieval, but no one gains experience and knowledge without trying.&lt;/p&gt;

&lt;p&gt;If you want to know if your data is well-structured and well-retrieved:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Try to write and read more than you imagine would ever be possible.&lt;/li&gt;
&lt;li&gt;Only when it stops working, look under the hood to find and fix the problem.&lt;/li&gt;
&lt;/ul&gt;
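A minimal load-test sketch, again using Python's sqlite3 as a stand-in; the row count and schema are arbitrary. The point is to push volumes well beyond what you expect and then verify the data survived:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

N = 50_000  # push well past what you expect in production
start = time.perf_counter()
con.executemany("INSERT INTO events (payload) VALUES (?)",
                ((f"event-{i}",) for i in range(N)))
write_s = time.perf_counter() - start

start = time.perf_counter()
total = con.execute("SELECT count(*) FROM events").fetchone()[0]
read_s = time.perf_counter() - start

print(f"{N} writes in {write_s:.3f}s, full count read in {read_s:.3f}s")
```

If the numbers degrade as N grows, that is the cue to look under the hood at the schema, the indexes, and the queries.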

&lt;p&gt;Like we said earlier, anything else is just tools. The data is the heart and core of the application and should be built like one: consistent, resilient, scalable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Protect Your Data
&lt;/h2&gt;

&lt;p&gt;With the efforts of choosing a database, data models, and optimizing their usage behind you, you should think about protecting your data. &lt;/p&gt;

&lt;p&gt;Protecting your data from bad actors goes without saying. It’s the internal actors that you need to shield the data from. Internal actors can be services and humans, and since humans make mistakes, so do services.&lt;/p&gt;

&lt;p&gt;Consider the following means of mitigating accidental service disruption or, worse yet, data loss or corruption:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Back up the data and plan for deploying from a snapshot.&lt;/li&gt;
&lt;li&gt;Limit access from the get-go.&lt;/li&gt;
&lt;li&gt;Reads go through replicas; no human ever writes directly to the data.&lt;/li&gt;
&lt;li&gt;If a service is the owner or main user of a table or database, other services request data through internal APIs.&lt;/li&gt;
&lt;li&gt;Monitor the database CPU and memory and plan ahead in case you need to scale.&lt;/li&gt;
&lt;li&gt;Find and kill long-running queries, then trace them back to their source to detect misbehaving services.&lt;/li&gt;
&lt;/ul&gt;
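The long-running-query point can be sketched as pure logic. In PostgreSQL the raw data would come from the pg_stat_activity view; the snapshot rows and the 5-minute threshold below are made up for illustration:

```python
# Hypothetical snapshot of running queries -- in PostgreSQL this would come
# from the pg_stat_activity view; the rows below are made up for illustration.
running = [
    {"pid": 101, "duration_s": 2.3,    "query": "SELECT ... FROM orders"},
    {"pid": 102, "duration_s": 1843.0, "query": "SELECT ... FROM events"},
    {"pid": 103, "duration_s": 0.4,    "query": "UPDATE ... SET ..."},
]

THRESHOLD_S = 300  # anything running longer than 5 minutes deserves a look

suspects = [q for q in running if q["duration_s"] >= THRESHOLD_S]
for q in suspects:
    # Before killing, record the query so you can trace the misbehaving service.
    print(f"pid={q['pid']} running {q['duration_s']:.0f}s: {q['query']}")
```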




&lt;h2&gt;
  
  
  Keep The Data Clean
&lt;/h2&gt;

&lt;p&gt;It’s not enough to protect your data. You also have to keep it nice and tidy. A lot of data accumulates through the product lifecycle and more often than not becomes stale. &lt;/p&gt;

&lt;p&gt;Modern hard drives are fast, reliable, and reach terabytes in volume, but that doesn’t mean you should fill them with data.&lt;/p&gt;

&lt;p&gt;Too much data puts strain on the disk and memory, not to mention that more data means longer queries, even with indexing.&lt;/p&gt;

&lt;p&gt;Consider the following as ways to keep your data clean:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Don’t do soft deletes.&lt;/li&gt;
&lt;li&gt;Scan for the least-retrieved data and archive it.&lt;/li&gt;
&lt;li&gt;If you’re using PostgreSQL, run &lt;code&gt;VACUUM&lt;/code&gt;. If you use MySQL, run &lt;code&gt;OPTIMIZE TABLE&lt;/code&gt;. Do the same for any other database you use.&lt;/li&gt;
&lt;li&gt;Be wary of making changes – don’t add tables that duplicate data.&lt;/li&gt;
&lt;/ul&gt;
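The archive-then-delete idea can be sketched with sqlite3; the table names, the last_read_day column, and the 90-day cutoff are all hypothetical. Note that the rows are hard-deleted from the hot table rather than flagged with a soft delete:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, last_read_day INTEGER)")
# Empty copy of the table, used as the archive destination.
con.execute("CREATE TABLE reports_archive AS SELECT * FROM reports WHERE 0")
con.executemany("INSERT INTO reports VALUES (?, ?)",
                [(1, 400), (2, 12), (3, 380), (4, 3)])

CUTOFF = 90  # days since last retrieval; illustrative threshold

# Archive rows that have not been read recently, then hard-delete them --
# no soft-delete flag left behind to bloat the hot table.
con.execute("INSERT INTO reports_archive SELECT * FROM reports WHERE last_read_day > ?",
            (CUTOFF,))
con.execute("DELETE FROM reports WHERE last_read_day > ?", (CUTOFF,))

hot = con.execute("SELECT count(*) FROM reports").fetchone()[0]
cold = con.execute("SELECT count(*) FROM reports_archive").fetchone()[0]
print(hot, cold)  # the hot table stays small; stale rows live in the archive
```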




&lt;h2&gt;
  
  
  Keep Your Data in Mind
&lt;/h2&gt;

&lt;p&gt;Out of all the aspects and methods we discussed, there’s one conclusion to be drawn:&lt;br&gt;&lt;br&gt;
Data is the most important, most overlooked aspect of software development.&lt;/p&gt;

&lt;p&gt;To keep your data in mind means to consider all the pros and cons of choosing a database.&lt;br&gt;&lt;br&gt;
To keep your data in mind means you always check how data retrieval affects performance.&lt;br&gt;&lt;br&gt;
To keep your data in mind is to follow these principles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the right database for the workload.&lt;/li&gt;
&lt;li&gt;Create indices and optimize queries.&lt;/li&gt;
&lt;li&gt;Optimize I/O through hardware adjustments and caching.&lt;/li&gt;
&lt;li&gt;Get rid of unnecessary data – no soft deletes.&lt;/li&gt;
&lt;li&gt;Check and check again that high volume doesn’t create bottlenecks.&lt;/li&gt;
&lt;li&gt;Back up your data and limit access.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Strive to apply the principles listed above because no matter what you do with your app or product, it’s almost always related to data.  &lt;/p&gt;

&lt;p&gt;Always keep in mind that data is the foundation. When it’s well-maintained, the whole system benefits.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>architecture</category>
      <category>productivity</category>
      <category>database</category>
    </item>
    <item>
      <title>Practical Tips for Kubernetes Upgrades for Startups</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Mon, 10 Feb 2025 18:58:35 +0000</pubDate>
      <link>https://dev.to/meteorops/practical-tips-for-kubernetes-upgrades-for-startups-22df</link>
      <guid>https://dev.to/meteorops/practical-tips-for-kubernetes-upgrades-for-startups-22df</guid>
      <description>&lt;p&gt;&lt;em&gt;Upgrade Kubernetes with confidence: A step-by-step guide to ensure seamless updates, maintain stability, and avoid breaking changes.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  The all-too-popular Kubernetes upgrade-storm
&lt;/h1&gt;

&lt;p&gt;There comes a day when you get a notification that the Kubernetes version you are running is reaching its end of life. Best case scenario, you open a ticket, knowing full well that it will either be pushed down the list of priorities or be forgotten completely. After all, you have other priorities, such as releases and bug fixing. You are a fast-running startup that needs to bring in new business in order to grow. Upgrading Kubernetes is the least of your concerns right now.&lt;/p&gt;




&lt;h2&gt;
  
  
  You will have to upgrade eventually
&lt;/h2&gt;

&lt;p&gt;But the day finally comes when you need to upgrade and one of the following could be the trigger:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your Kubernetes version has actually reached its end of life.&lt;/li&gt;
&lt;li&gt;R&amp;amp;D management finally decided it was time to upgrade.&lt;/li&gt;
&lt;li&gt;You need to upgrade regardless of end of life because some critical components in your cluster need upgrading for a bug fix or a feature that you need.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You look up the Helm chart or operator that is running in your cluster and realize that you cannot upgrade to the newer versions because they are incompatible with your current Kubernetes version. So you need to upgrade the cluster in order to upgrade the Helm charts. And to top it all off, it has been decided to upgrade Kubernetes all the way to the latest version, so you find yourself needing to upgrade four versions forward.&lt;/p&gt;
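Since control planes are upgraded one minor version at a time, a four-version jump really means four sequential upgrades. A small sketch of computing that path (the version numbers are illustrative):

```python
# Kubernetes control planes are upgraded one minor version at a time, so a
# four-version jump is really four upgrades in sequence.
def upgrade_path(current, target):
    cur_major, cur_minor = (int(x) for x in current.split("."))
    tgt_major, tgt_minor = (int(x) for x in target.split("."))
    assert cur_major == tgt_major and tgt_minor >= cur_minor
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]

print(upgrade_path("1.27", "1.31"))  # ['1.28', '1.29', '1.30', '1.31']
```

Each entry in that list is a full plan-test-execute cycle of its own, which is why the effort estimate grows so quickly.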

&lt;p&gt;Every Kubernetes upgrade has the potential to introduce breaking changes. In most cases it’s deprecated APIs or APIs that moved from one API group to another. This will affect anything in your cluster that relies on these Kubernetes APIs. Now take this and do it four times. You need to carefully plan how to approach and execute the upgrade:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Scope and price the process in terms of effort and time to completion.&lt;/li&gt;
&lt;li&gt;Create an upgrade plan and iterate over it by testing on lower environments.&lt;/li&gt;
&lt;li&gt;Set a maintenance window.&lt;/li&gt;
&lt;li&gt;Declare code freeze.&lt;/li&gt;
&lt;li&gt;Upgrade and verify.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a challenging process that will exhaust you. It is labor intensive and very error prone. If you upgrade a library in some microservice, you can test it locally; the scope is almost always isolated to that specific microservice, and in any case the blast radius is relatively small. But when you upgrade a Kubernetes cluster, you are upgrading the entire system, and anything going wrong could have serious consequences.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to approach the upgrade
&lt;/h2&gt;

&lt;p&gt;Before discussing ways to approach an upcoming upgrade, we need to address the elephant in the room. By the time you need to upgrade, you’re probably short on time, short on resources, and need to upgrade several versions forward while making sure that the app stack itself and everything else that runs on your Kubernetes cluster remains functional. That is not the way to go.&lt;/p&gt;

&lt;p&gt;What we derive from this situation is the first principle of how to approach the upgrade:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Upgrade small, upgrade continuous.&lt;/strong&gt;  &lt;/p&gt;

&lt;p&gt;Once we realize and implement this principle, we can move on to what you need to do in order to successfully upgrade your Kubernetes cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  Upgrade small, upgrade continuous
&lt;/h2&gt;

&lt;p&gt;Remember the day when you got the notification that your Kubernetes cluster version reached its end of life? Well, this is the day you waste no time and put the task in the sprint.&lt;/p&gt;

&lt;p&gt;Opponents of this approach might say that a startup cannot afford to jump on every upgrade because there’s more pressing business to conduct. But a startup also cannot afford system instability. The longer the wait, the more unstable the system might become, and the upgrade will get harder and harder, especially if more than one version is in question. So upgrade small, upgrade continuous. This goes for components in the cluster as much as for the cluster itself.&lt;/p&gt;

&lt;p&gt;Keeping your Helm charts, operators, and controllers up to date will almost always guarantee that you will not have to upgrade them when upgrading your Kubernetes cluster. There is no doubt that a startup should think first about how to bring in money. If the Kubernetes cluster upgrade competes with a feature that will bring in new business, the feature will almost always win.&lt;/p&gt;

&lt;p&gt;However, by insisting on upgrading small and keeping up to date, you provide yourself with breathing room for when a feature or a bug fix is really critical. You can allow yourself to skip the upgrade and focus on business, because the end of life of your Kubernetes cluster version is farther down the road.&lt;/p&gt;




&lt;h2&gt;
  
  
  Test and verify on lower environments
&lt;/h2&gt;

&lt;p&gt;Another principle worth discussing is testing and verifying the upgrade on lower environments. The term lower environments obviously means dev and stg, but in many cases dev and stg environments represent the app stack and less so the infrastructure.&lt;/p&gt;

&lt;p&gt;This means that the lower environments you test on have to be identical to the production cluster that you are about to upgrade. In identical, non-critical environments you can allow yourself to make mistakes, which are the best way to learn.&lt;/p&gt;

&lt;p&gt;Upgrading Kubernetes is a difficult task. Having the ability to try it out without fear of service disruption is liberating and will allow you to experiment more, thereby better preparing yourself for upgrade day. &lt;/p&gt;

&lt;p&gt;When you eventually test on lower environments, don’t just upgrade and settle for a working cluster. Remember that lower environments are meant to represent the architecture and app stack of higher environments. Consider the following when upgrading:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Monitor the environments through metrics and logs to check for anything suspicious or out of the ordinary:

&lt;ul&gt;
&lt;li&gt;A critical component fails to be scheduled - pods in CrashLoopBackOff, pods failing to satisfy liveness and readiness probes, nodes that don’t scale when the cluster is loaded, etc.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-proxy&lt;/code&gt; is not alive and well and services cannot talk to each other.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kube-dns&lt;/code&gt; is not alive and well and services fail to resolve host names.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Run tests on your app stack. Verifying that it functions as it should in an upgraded environment will give you extra confidence that the upgrade is going well:

&lt;ul&gt;
&lt;li&gt;Run e2e tests.&lt;/li&gt;
&lt;li&gt;Run integration tests.&lt;/li&gt;
&lt;li&gt;Run load tests to verify that deployments scale accordingly.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
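The monitoring checks above can be partially automated. The sketch below flags suspicious pods from a snapshot shaped loosely like kubectl get pods output; the pod names, field names, and statuses are made up for illustration:

```python
# Hypothetical pod snapshot, shaped loosely like `kubectl get pods -o json`
# output; the field names are simplified for illustration.
pods = [
    {"name": "api-7f9c", "phase": "Running", "waiting_reason": None, "ready": True},
    {"name": "worker-2b1d", "phase": "Running", "waiting_reason": "CrashLoopBackOff", "ready": False},
    {"name": "kube-dns-x1", "phase": "Running", "waiting_reason": None, "ready": False},
]

def suspicious(pod):
    # Flag crash-looping containers and pods failing their readiness probes.
    return pod["waiting_reason"] == "CrashLoopBackOff" or not pod["ready"]

flagged = sorted(p["name"] for p in pods if suspicious(p))
print(flagged)  # ['kube-dns-x1', 'worker-2b1d']
```

Running a check like this after every upgrade on a lower environment turns "look for anything suspicious" into a repeatable step.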

&lt;p&gt;There’s a caveat in this approach, though, that we need to address. You have to provision and maintain these environments, which means more resources allocated to the upgrade process even when no upgrade is in the pipeline. But there’s no better approach. It’s “measure twice, cut once” and “invest money, not time” rolled into one.&lt;/p&gt;

&lt;p&gt;Try out the upgrade first on preallocated environments, and only then execute it for real. The stability you will achieve will contribute to the overall health of the system and the organization itself. No amount of money can compensate for dev teams overworked by system instability. Instability could also prove a source of churn, driving away clients and even prospects.&lt;/p&gt;




&lt;h2&gt;
  
  
  Now it’s time to upgrade
&lt;/h2&gt;

&lt;p&gt;Let’s assume that we are in an ideal world where you have your lower environments ready and well-maintained and you have allocated time and resources for continuous upgrades. How do you prepare for an upgrade? There are several things you need to do.&lt;/p&gt;

&lt;p&gt;First of all, you have to thoroughly read the release notes. That doesn’t mean scrolling through them, but reading them line by line. It’s a time-consuming task, but it follows the principle of “measure twice, cut once”. A lot of what you read won’t be relevant. A lot of what you read will be invaluable. Dedicate time and patience to this task. Tutorials and guides are obviously welcome, but remember that not all environments are alike. &lt;/p&gt;

&lt;p&gt;Now that you have a sense of the research effort heading your way, you could use an automated process to give you a head start. You can find exactly that in &lt;a href="https://github.com/kubepug/kubepug" rel="noopener noreferrer"&gt;kubepug, an open-source Kubernetes pre-upgrade checker&lt;/a&gt;. What you can and should do with kubepug:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Run kubepug against your current Kubernetes cluster version to get the following:

&lt;ul&gt;
&lt;li&gt;A list of deprecated APIs.&lt;/li&gt;
&lt;li&gt;Any objects affected by API changes.&lt;/li&gt;
&lt;li&gt;What APIs should be used instead of deprecated ones.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;If all goes well, and we wish you that it will, run kubepug once more after the upgrade, because it is also capable of verifying the current version.&lt;/li&gt;

&lt;li&gt;Trust kubepug, but verify that everything it drew your attention to was indeed upgraded or replaced and that it’s consistent with the release notes.&lt;/li&gt;

&lt;/ul&gt;
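To illustrate the kind of check kubepug automates, here is a toy deprecated-API scan. The two mappings are real removals (extensions/v1beta1 Deployments moved to apps/v1, and networking.k8s.io/v1beta1 Ingresses to networking.k8s.io/v1), while the manifests and helper logic are hypothetical:

```python
# A toy version of the check kubepug automates: map deprecated apiVersions to
# their replacements. The two entries are real removals; the manifests below
# are made-up examples.
REPLACEMENTS = {
    ("extensions/v1beta1", "Deployment"): "apps/v1",
    ("networking.k8s.io/v1beta1", "Ingress"): "networking.k8s.io/v1",
}

manifests = [
    {"apiVersion": "extensions/v1beta1", "kind": "Deployment", "name": "api"},
    {"apiVersion": "apps/v1", "kind": "Deployment", "name": "worker"},
]

findings = []
for m in manifests:
    new_api = REPLACEMENTS.get((m["apiVersion"], m["kind"]))
    if new_api:
        findings.append(f'{m["name"]}: move {m["kind"]} to {new_api}')

print(findings)  # ['api: move Deployment to apps/v1']
```

The real tool walks the live cluster and the official deprecation data, but the output shape — object, deprecated API, replacement — is the same.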

&lt;p&gt;Once you gather the information and mode of operation from release notes and guides, go look at your Kubernetes cluster and find everything that both exists in your cluster and is referenced in the information you gathered. The match between the two is the basis for your upgrade plan.&lt;/p&gt;




&lt;h2&gt;
  
  
  Automate and summarize
&lt;/h2&gt;

&lt;p&gt;We’ve mentioned a few times that the upgrade process, especially everything that precedes it, is very arduous and time consuming. This is where you automate the discovery and summary process.&lt;/p&gt;

&lt;p&gt;Use LLMs to summarize and highlight the information you gathered, and other tools to scan, analyze, and inform you of changes between versions. Another aspect of the upgrade process is to compile, document, and implement the upgrade process itself. Laying down the foundations of upgrading small and continuously is perhaps the most important aspect, even more than the upgrade itself.&lt;/p&gt;

&lt;p&gt;It’s true that the goal is the eventual Kubernetes cluster upgrade, but how it’s carried out will determine the measure of peace of mind that you will have when approaching this important task.&lt;/p&gt;




&lt;h2&gt;
  
  
  Yours is a startup and should start well
&lt;/h2&gt;

&lt;p&gt;Kubernetes is one of the best things to have happened to the tech industry. By using it, your startup avoids the pain of having to build and run its own orchestrator. Consider that the time you save by using Kubernetes rather than maintaining your own solution is time worth investing back into Kubernetes itself.&lt;/p&gt;

&lt;p&gt;And to do that, you need to consider that for Kubernetes to continue serving you, it needs to be up to date and well-maintained. Then and only then will it guarantee the highest level of stability. And stable infra for a startup is priceless, as it allows you to grow and rarely holds you back. &lt;/p&gt;

&lt;p&gt;So for all your successful upgrades to come, adopt the mindset that we are trying to convey. &lt;/p&gt;

&lt;p&gt;Give the Kubernetes upgrade its place in the development pipeline. Like we highlighted, an upgraded, well-maintained cluster is an invaluable resource. Small to medium effort every now and then is better than an out-of-the-blue urgent upgrade.&lt;/p&gt;

&lt;p&gt;Be ahead of the upgrade. Don’t wait for it to come to you. Seek it out proactively and open a ticket with a due date. Add a calendar reminder. Allow yourself time to educate and prepare. Yes, there are automated tools like the kubepug we mentioned, but you need to know how to use these tools and rely on them only to the extent that they don’t have the final say.&lt;/p&gt;

&lt;p&gt;Test on lower environments and verify and validate by looking at metrics and logs. Validate further by making sure that the app stack functions as it should.&lt;/p&gt;

&lt;p&gt;These principles don't guarantee smooth upgrades, as the unexpected is almost always bound to happen. However, they do guarantee successful upgrades that will instill in you greater confidence for the current upgrade as well as future ones. You’re a startup, and adopting these principles and this mindset will prove itself not only when upgrading your cluster, but in anything you set out for your startup to become and achieve.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>cloudcomputing</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Deploy a Kubernetes App &amp; AWS Resources using Crossplane on Kubernetes: Part 2</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Thu, 31 Oct 2024 12:19:56 +0000</pubDate>
      <link>https://dev.to/meteorops/deploy-a-kubernetes-app-aws-resources-using-crossplane-on-kubernetes-part-2-2m51</link>
      <guid>https://dev.to/meteorops/deploy-a-kubernetes-app-aws-resources-using-crossplane-on-kubernetes-part-2-2m51</guid>
      <description>&lt;h2&gt;
  
  
  To properly enjoy this article
&lt;/h2&gt;

&lt;p&gt;This tutorial assumes you already followed the steps in part 1: &lt;a href="https://www.meteorops.com/blog/deploy-aws-resources-using-crossplane-on-kubernetes" rel="noopener noreferrer"&gt;&lt;em&gt;Deploy AWS Resources using Crossplane on Kubernetes&lt;/em&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, this is the GitHub repository we’ll be using:&lt;br&gt;
&lt;a href="https://github.com/MeteorOps/crossplane-aws-provider-bootstrap" rel="noopener noreferrer"&gt;https://github.com/MeteorOps/crossplane-aws-provider-bootstrap&lt;/a&gt;&lt;/p&gt;


&lt;h2&gt;
  
  
  What’s this article about?
&lt;/h2&gt;

&lt;p&gt;In this article, we’ll cover a use-case that can benefit from Crossplane: full environment deployment.&lt;/p&gt;

&lt;p&gt;This is a step-by-step guide with an example and a Git repository, so by the end of it, you should be able to deploy a sample env.&lt;/p&gt;

&lt;p&gt;You can technically walk through the entire thing by copy-pasting and everything should work. But diving into the explanations for an extra 5–10 minutes will leave you with longer-term value.&lt;/p&gt;

&lt;p&gt;Hope you enjoy!&lt;/p&gt;


&lt;h2&gt;
  
  
  What to expect from this article?
&lt;/h2&gt;

&lt;p&gt;By the end of it, you’ll understand:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;How Crossplane can be used for full environment deployment&lt;/li&gt;
&lt;li&gt;How to deploy a sample app with AWS resources&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  What not to expect?
&lt;/h3&gt;

&lt;p&gt;This article guides you through a simple application deployment, and not a full set of apps.&lt;/p&gt;

&lt;p&gt;It also doesn’t go into using Crossplane in conjunction with Helm, but does cover important principles regarding it.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Crossplane for the Full Environment Use-Case?
&lt;/h2&gt;
&lt;h3&gt;
  
  
  When you want to deploy a full environment, it usually involves 3 layers:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Resources the application needs to run well&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Application&lt;/strong&gt;: The programs built by the company to serve users&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data&lt;/strong&gt;: The data the application uses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But you already know that.&lt;/p&gt;
&lt;h3&gt;
  
  
  The thing is, a tradition developed, and Crossplane sort of broke it.
&lt;/h3&gt;

&lt;p&gt;The tradition was this process: &lt;em&gt;Build infrastructure, Deploy application on top.&lt;/em&gt;&lt;br&gt;
How did Crossplane break this tradition?&lt;br&gt;
The application deployment can now provision infrastructure required by the application.&lt;/p&gt;
&lt;h3&gt;
  
  
  Pull-Request Environments are also easier
&lt;/h3&gt;

&lt;p&gt;By creating a namespace with all of the apps and the AWS resources required with Crossplane, the use-case of &lt;a href="https://www.meteorops.com/blog/the-cto-devops-handbook-simple-principles-and-examples#bonus-an-example-setup-for-a-cto-approaching-production" rel="noopener noreferrer"&gt;creating a full environment per Pull-Request&lt;/a&gt; as part of the CI becomes much easier.&lt;/p&gt;

&lt;p&gt;That's a nice benefit of such a setup for companies using the feature-branch or Gitflow approach.&lt;/p&gt;


&lt;h2&gt;
  
  
  A Traditional Full Env Example
&lt;/h2&gt;

&lt;p&gt;To provision and &lt;a href="https://www.meteorops.com/blog/one-click-environment-the-ultimate-devops-goal" rel="noopener noreferrer"&gt;deploy a full environment&lt;/a&gt; in the past, the process would generally look something like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provision VPC+EKS+... using Terraform&lt;/li&gt;
&lt;li&gt;Use Terraform to bootstrap the cluster with a CD tool (e.g., ArgoCD)&lt;/li&gt;
&lt;li&gt;ArgoCD looks at a repo that deploys all apps from there&lt;/li&gt;
&lt;li&gt;An application needs a new S3 Bucket, so the developer writes Terraform code for it&lt;/li&gt;
&lt;li&gt;The application gets removed after a while (but the bucket stays)&lt;/li&gt;
&lt;li&gt;Someone needs to remember that bucket was owned by that app and remove it from Terraform&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  A Crossplane Full Env Example
&lt;/h2&gt;

&lt;p&gt;To provision and deploy a full environment with Crossplane the process is similar (we still need a Kubernetes Cluster to start with for the initial environment):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Provision VPC+EKS+... using Terraform&lt;/li&gt;
&lt;li&gt;Deploy Crossplane’s prerequisites to the cluster with Terraform&lt;/li&gt;
&lt;li&gt;Add Crossplane resources to application Helm Charts (so they get their required infra upon deployment)&lt;/li&gt;
&lt;li&gt;Create a Crossplane manifest to deploy the Helm Charts + Some shared infra required by all apps&lt;/li&gt;
&lt;li&gt;When an application is removed, its AWS resources are gone with it&lt;/li&gt;
&lt;li&gt;When an entire environment is terminated, its AWS resources are gone with it&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Crossplane in Helm vs. Helm in Crossplane
&lt;/h2&gt;

&lt;p&gt;When using Crossplane alongside Helm, the question arises:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Should Helm apply the Crossplane code? Or, should Crossplane apply the Helm Charts?&lt;br&gt;
I'm glad you asked - the answer is both, depending on the case.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h3&gt;
  
  
  Reasons for Crossplane in Helm:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create or modify app-specific resources when that app is deployed&lt;/li&gt;
&lt;li&gt;Delete app-specific resources when that app is deleted&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Reasons for Helm in Crossplane:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Manage dependencies between resources and applications using Crossplane&lt;/li&gt;
&lt;li&gt;Create shared resources that are not owned by a single application&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  The Step-by-Step Guide
&lt;/h2&gt;

&lt;p&gt;Deploy the simple application alongside an S3 bucket using a Crossplane Composite Application.&lt;/p&gt;
&lt;h3&gt;
  
  
  Before proceeding
&lt;/h3&gt;

&lt;p&gt;Make sure you follow the steps in the 1st article (it takes 3 minutes to copy-paste the code snippets into your terminal and run the entire thing).&lt;/p&gt;


&lt;h3&gt;
  
  
  Deploy the Crossplane Kubernetes Provider
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Prepare the AWS Credentials for the Application to be able to use AWS&lt;/strong&gt;&lt;br&gt;
Run the following one-liner to create the Secret containing the AWS credentials in the format required by the Application (the application will simply run &lt;code&gt;aws s3 ls&lt;/code&gt; to show the bucket):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create secret generic aws-creds &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;aws_access_key_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; aws_access_key_id creds | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' = '&lt;/span&gt; &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="nt"&gt;--from-literal&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;aws_secret_access_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; aws_secret_access_key creds | &lt;span class="nb"&gt;awk&lt;/span&gt; &lt;span class="nt"&gt;-F&lt;/span&gt;&lt;span class="s1"&gt;' = '&lt;/span&gt; &lt;span class="s1"&gt;'{print $2}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure it was created as expected by fetching the secret:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hxj0r5ipwxqsvxpbotb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hxj0r5ipwxqsvxpbotb.png" alt="Image description" width="742" height="55"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Deploy the Crossplane Kubernetes Provider resources using the k8s-provider-bootstrap.yaml file&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; k8s-provider-bootstrap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Make sure the provider was created and is ready before proceeding to the next steps:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get providers provider-kubernetes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkwy376ar5myo1tqaxmz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frkwy376ar5myo1tqaxmz.png" alt="Image description" width="800" height="51"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Deploy the Crossplane Kubernetes Provider Configuration using the &lt;code&gt;k8s-provider-conf.yaml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; k8s-provider-conf.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is done separately, as it needs to happen after the Provider resources have been created.&lt;/p&gt;

&lt;p&gt;This is where we tell the Crossplane Kubernetes Provider in which Kubernetes cluster it should operate when it’s creating resources.&lt;/p&gt;




&lt;h3&gt;
  
  
  Create a deployable unit of an App &amp;amp; AWS Resources using Crossplane
&lt;/h3&gt;

&lt;p&gt;Here we do 3 things with 3 files:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;composite-app-xrd&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;
Contains the CompositeResourceDefinition (XRD) for the K8sApplication, which uses the Composition of a K8s Deployment and an S3 Bucket (described below)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;composite-app-composition&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;
Contains the Composition definition, which creates both the Kubernetes Deployment and the S3 Bucket&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;composite-app-example&lt;/code&gt; file:&lt;/strong&gt;&lt;br&gt;
Claims the CompositeResource defined by the &lt;code&gt;composite-app-xrd&lt;/code&gt; file&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h3&gt;
  
  
  Crossplane Resources Files Breakdown &amp;amp; Creation
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;composite-app-xrd.yaml&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
&lt;strong&gt;~ K8sApplication CompositeResourceDefinition&lt;/strong&gt;&lt;br&gt;
This defines a composite resource for a Kubernetes application, with &lt;code&gt;bucketName&lt;/code&gt; and &lt;code&gt;bucketRegion&lt;/code&gt; fields in the spec. Users can claim this resource as K8sApplication.&lt;br&gt;&lt;br&gt;
The K8sApplication CompositeResource (XRD) accepts the &lt;code&gt;bucketName&lt;/code&gt; &amp;amp; &lt;code&gt;bucketRegion&lt;/code&gt; fields and uses them to create an S3 Bucket, and to create a K8s Deployment of a mock “service” that simply runs &lt;code&gt;aws s3 ls&lt;/code&gt; to see the bucket.&lt;/p&gt;
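&lt;p&gt;To make the XRD shape concrete, here is a minimal sketch of what such a definition could look like. The API group &lt;code&gt;example.org&lt;/code&gt; and the composite kind name are illustrative placeholders; the actual file in the repository may differ:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xk8sapplications.example.org   # illustrative group
spec:
  group: example.org
  names:
    kind: XK8sApplication
    plural: xk8sapplications
  claimNames:
    kind: K8sApplication               # the kind users claim
    plural: k8sapplications
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                bucketName:
                  type: string
                bucketRegion:
                  type: string
              required:
                - bucketName
                - bucketRegion
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;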

&lt;p&gt;&lt;strong&gt;~ Deploy the CompositeResourceDefinition (XRD)&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; composite-app-xrd.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;composite-app-composition.yaml&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Defines a Composition of resources that can be created by a CompositeResource.&lt;/p&gt;

&lt;p&gt;This is where we define the Composition that creates a combo of a Kubernetes Deployment with the mock “service” that runs &lt;code&gt;aws s3 ls&lt;/code&gt; as well as the S3 bucket — The CompositeResource simply calls this resource.&lt;/p&gt;
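&lt;p&gt;As a rough sketch of the idea (resource names and field paths here are illustrative, not the exact contents of the repository file), a Composition pairs the composite type with the resources it should create, patching claim fields into them:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: k8sapplication-composition     # illustrative
spec:
  compositeTypeRef:
    apiVersion: example.org/v1alpha1   # must match the XRD
    kind: XK8sApplication
  resources:
    - name: s3-bucket
      base:
        apiVersion: s3.aws.upbound.io/v1beta1
        kind: Bucket
      patches:
        - fromFieldPath: spec.bucketRegion
          toFieldPath: spec.forProvider.region
    - name: app-deployment
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object       # provider-kubernetes can wrap any K8s manifest
        spec:
          forProvider:
            manifest: {}   # the mock-service Deployment manifest goes here
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the real file, &lt;code&gt;bucketName&lt;/code&gt; and &lt;code&gt;bucketRegion&lt;/code&gt; are also patched into the Deployment’s environment variables.&lt;/p&gt;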

&lt;p&gt;&lt;strong&gt;~ Deploy the Composition&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; composite-app-composition.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;composite-app-example.yaml&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Deploys the actual K8sApplication CompositeResource and passes the name of the bucket and the region in which it should be created (both are also passed to the Kubernetes Deployment as environment variables that help it access the same bucket).&lt;/p&gt;
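&lt;p&gt;For orientation, a claim like this one is all a user has to apply (the API group and values are illustrative placeholders):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: example.org/v1alpha1   # must match the group defined in the XRD
kind: K8sApplication
metadata:
  name: my-app
spec:
  bucketName: my-app-bucket
  bucketRegion: us-east-1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;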

&lt;p&gt;As mentioned above, the CompositeResource calls the Composition which creates the resources using the Crossplane providers.&lt;/p&gt;

&lt;p&gt;Deploy the app by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; composite-app-example.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Look at your pretty Application
&lt;/h3&gt;

&lt;p&gt;Fetch the K8sApplication resource you’ve just created by running the command below obsessively until it’s marked as &lt;code&gt;Healthy&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get K8sApplication
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazrulo1bvz0xoss2uezx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fazrulo1bvz0xoss2uezx.png" alt="Image description" width="687" height="98"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Print the logs of the application and see it fetching the AWS S3 Bucket:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl logs &lt;span class="nt"&gt;-l&lt;/span&gt; &lt;span class="nv"&gt;app&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;awscli
&lt;span class="c"&gt;# 2024-10-17 16:00:31 my-app-bucket-nqzhx-xzjcq&lt;/span&gt;
&lt;span class="c"&gt;# 2024-10-17 16:00:50 my-app-bucket-nqzhx-xzjcq&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Cleanup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl delete &lt;span class="nt"&gt;-f&lt;/span&gt; composite-app-example.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Recap
&lt;/h3&gt;

&lt;p&gt;To briefly recap what you did here:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Prepared Crossplane for deploying a mix of Kubernetes and AWS resources&lt;/li&gt;
&lt;li&gt;Defined the manifests required to deploy an app built of a Deployment and an S3 Bucket&lt;/li&gt;
&lt;li&gt;Sharpened your grasp on some Crossplane concepts&lt;/li&gt;
&lt;li&gt;Discussed some use-cases for which it’s useful&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hope you enjoyed this article, and if you are interested in another article about something related (or unrelated), please convince Michael it’s a good idea at &lt;a href="mailto:michael@meteorops.com"&gt;michael@meteorops.com&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;strong&gt;Disclaimer:&lt;/strong&gt; In actual environments or production, it’s essential to fine-tune the permissions in the different manifests. Instead of using access keys and secret keys directly, consider implementing IAM Roles for Service Accounts (IRSA) to manage permissions more securely.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>devops</category>
      <category>crossplane</category>
    </item>
    <item>
      <title>Deploy AWS Resources using Crossplane on Kubernetes</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Wed, 18 Sep 2024 23:50:11 +0000</pubDate>
      <link>https://dev.to/meteorops/deploy-aws-resources-using-crossplane-on-kubernetes-39i1</link>
      <guid>https://dev.to/meteorops/deploy-aws-resources-using-crossplane-on-kubernetes-39i1</guid>
<description>&lt;p&gt;In this article we’ll look at Crossplane, an Infrastructure as Code (IaC) tool that runs on Kubernetes: why you should use it, how to configure the AWS provider to start creating resources, and a full step-by-step example so you can create your first resource with Crossplane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Who is this article for?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DevOps engineers interested in learning another IaC tool&lt;/li&gt;
&lt;li&gt;Developers who want to take on more Ops responsibility and provision their own infrastructure&lt;/li&gt;
&lt;li&gt;Engineering managers looking to implement an IaC tool in their company/startup&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why am I writing this?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I’ve had discussions with engineers who had trouble getting started with Crossplane. It can be a little less straightforward than a well-established tool like Terraform, some of the documentation isn’t precise for different use cases and providers, and even ChatGPT’s code doesn’t always work. So here I am, saving the day and making your life easier with a step-by-step guide where you install and configure everything and deploy your first AWS resource using Crossplane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why should you even use Crossplane then?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There are certain use cases where Crossplane provides very powerful capabilities, since it can create both applications and cloud resources. Examples include ephemeral environments, or a SaaS company offering full environments that a tenant can create on their own. Those environments can be created by simply applying a Kubernetes manifest, which is much simpler than running traditional IaC plan and apply commands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How this article works&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;We prepared a &lt;a href="https://github.com/MeteorOps/crossplane-aws-provider-bootstrap" rel="noopener noreferrer"&gt;repository with resources to deploy everything&lt;/a&gt; needed.&lt;/li&gt;
&lt;li&gt;We explain what each resource does.&lt;/li&gt;
&lt;li&gt;We walk you through how to deploy Crossplane.&lt;/li&gt;
&lt;li&gt;We deploy an S3 Bucket to make sure everything works.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Clone the repository and step into it&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/MeteorOps/crossplane-aws-provider-bootstrap.git cd crossplane-aws-provider-bootstrap
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. Make sure you have the required CLIs:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html" rel="noopener noreferrer"&gt;Install the AWS CLI &amp;amp; Authenticate it with your AWS Account&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://pwittrock.github.io/docs/tasks/tools/install-kubectl/" rel="noopener noreferrer"&gt;Install the Kubectl CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://helm.sh/docs/intro/install/" rel="noopener noreferrer"&gt;Install the Helm CLI&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;An existing Kubernetes cluster (&lt;a href="https://kind.sigs.k8s.io/docs/user/quick-start/" rel="noopener noreferrer"&gt;we’ll be using &lt;em&gt;kind&lt;/em&gt;&lt;/a&gt;)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Optional: Start a local kind cluster&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;brew install kind

kind create cluster

open /Applications/Docker.app

kubectl cluster-info --context kind-kind
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Repository Overview
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Link to the Github Repository: &lt;a href="https://github.com/MeteorOps/crossplane-aws-provider-bootstrap.git" rel="noopener noreferrer"&gt;https://github.com/MeteorOps/crossplane-aws-provider-bootstrap.git&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;creds&lt;/code&gt; file:
AWS credentials - should be filled with your own AWS credentials&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;crossplane-provider-conf&lt;/code&gt; file:
Uses the creds file to create a Crossplane ProviderConfig (separated into a different file because it takes time for this resource to be ready)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;crossplane-provider-bootstrap&lt;/code&gt; file:
Creates the Crossplane AWS Provider, which enables creating AWS resources using Crossplane, along with its dependencies: ServiceAccount, DeploymentRuntimeConfig, Provider, ClusterRole &amp;amp; ClusterRoleBindings&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bucket-definitions&lt;/code&gt; &amp;amp; &lt;code&gt;bucket-crd&lt;/code&gt; files:
The Kubernetes Crossplane manifests that create a CompositeResourceDefinition and a Composition resource, which together define how to create an S3 Bucket (like a Terraform module would).
Note: the Composition resource relies on the CompositeResourceDefinition.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bucket-example&lt;/code&gt; file:
The Kubernetes Crossplane manifest we’ll apply at the end to create an S3 bucket using Crossplane&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Deploy Crossplane
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Fill the creds file with your AWS access keys&lt;/strong&gt;&lt;br&gt;
Get the access keys of your AWS IAM User (not an SSO user, as that requires a token to work) and fill them into the creds file&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;NOTE&lt;/strong&gt;: for production usage, please create a Crossplane IAM user and use its access keys, or preferably use something like IRSA&lt;/li&gt;
&lt;/ul&gt;
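&lt;p&gt;For reference, the creds file typically follows the standard AWS credentials-file format; the values below are placeholders you replace with your own keys:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[default]
aws_access_key_id = &amp;lt;your-access-key-id&amp;gt;
aws_secret_access_key = &amp;lt;your-secret-access-key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;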

&lt;p&gt;&lt;strong&gt;2. Deploy the Crossplane Helm Chart&lt;/strong&gt;&lt;br&gt;
Add the Helm repository from which the Crossplane Helm Charts will be fetched&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add crossplane-stable https://charts.crossplane.io/stable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Deploy Crossplane on your Kubernetes cluster in a new namespace named &lt;code&gt;crossplane-system&lt;/code&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install crossplane crossplane-stable/crossplane --namespace crossplane-system --create-namespace
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. Examine your Crossplane Deployment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to get Crossplane's pods:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n crossplane-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should then see 2 pods: &lt;code&gt;crossplane&lt;/code&gt; &amp;amp; &lt;code&gt;crossplane-rbac-manager&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxqutjua7awsf5f05pe5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faxqutjua7awsf5f05pe5.png" alt="Image description" width="512" height="54"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Provide Crossplane AWS access by creating a Kubernetes Secret&lt;/strong&gt;&lt;br&gt;
Insert your AWS credentials to the creds file and run the following from the same folder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create secret generic aws-credentials -n crossplane-system --from-file=creds=./creds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Make sure the secret was created as expected:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get secret aws-credentials -n crossplane-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should see the aws-credentials secret:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0fmmxtlfwbfnuhbedw1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg0fmmxtlfwbfnuhbedw1.png" alt="Image description" width="284" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Deploy the Crossplane AWS Provider&lt;/strong&gt;&lt;br&gt;
Creating a Crossplane AWS Provider requires creating a bunch of resources: ServiceAccount, DeploymentRuntimeConfig, Provider, ClusterRole &amp;amp; ClusterRoleBindings, and ProviderConfig.&lt;br&gt;
We divided the resource creation into 2 phases:&lt;/p&gt;

&lt;p&gt;1 - &lt;code&gt;crossplane-provider-bootstrap.yaml&lt;/code&gt;:&lt;br&gt;
ServiceAccount, DeploymentRuntimeConfig, Provider, ClusterRole &amp;amp; ClusterRoleBindings&lt;/p&gt;

&lt;p&gt;2 - &lt;code&gt;crossplane-provider-conf.yaml&lt;/code&gt;:&lt;br&gt;
ProviderConfig&lt;/p&gt;

&lt;p&gt;The reason for dividing it into 2 phases is that creating the ProviderConfig fails if we attempt it before the first set of Provider resources and dependencies is ready.&lt;/p&gt;
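&lt;p&gt;For context, a ProviderConfig is essentially a pointer from the provider to the credentials secret we created earlier; a sketch might look like this (the &lt;code&gt;apiVersion&lt;/code&gt; depends on the AWS provider package you installed):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: aws.upbound.io/v1beta1   # depends on the installed provider package
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-credentials
      key: creds
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;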

&lt;p&gt;&lt;strong&gt;Create the Provider Kubernetes resources using the bootstrap YAML file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f crossplane-provider-bootstrap.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Validate that the Provider was created &amp;amp; wait for it to become ready:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command to see the AWS Provider resource:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get provider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You should see something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0bkbbbz0lj0fdz3rp84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff0bkbbbz0lj0fdz3rp84.png" alt="Image description" width="284" height="35"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It might take 1-2 minutes to become Healthy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the ProviderConfig resource &amp;amp; Validate its creation:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f crossplane-provider-conf.yaml &amp;amp;&amp;amp; kubectl get providerconfig
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f7xtqpzju4148lkpk93.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0f7xtqpzju4148lkpk93.png" alt="Image description" width="135" height="38"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Create an S3 Bucket using Crossplane
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Create the CompositeResourceDefinition to define an S3 Bucket:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f bucket-definitions.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve1toh0l0zuhuodmh73q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve1toh0l0zuhuodmh73q.png" alt="Image description" width="800" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the Composition to define an S3 Bucket:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Run the following command:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f bucket-crd.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yzsqfsbk6uutpziqte4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5yzsqfsbk6uutpziqte4.png" alt="Image description" width="800" height="80"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Create the S3 Bucket Crossplane resource in Kubernetes:&lt;/strong&gt; &lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f bucket-example.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When we installed the AWS Provider, it also installed some of the provider’s Crossplane CRDs.&lt;br&gt;
One of those CRDs is &lt;code&gt;Bucket&lt;/code&gt;.&lt;/p&gt;
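&lt;p&gt;A managed &lt;code&gt;Bucket&lt;/code&gt; resource is what ends up being created under the hood; as a rough sketch (the &lt;code&gt;apiVersion&lt;/code&gt; and bucket name below are illustrative and depend on the installed provider package):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: s3.aws.upbound.io/v1beta1   # depends on the installed provider package
kind: Bucket
metadata:
  name: my-example-bucket
spec:
  forProvider:
    region: us-east-1
  providerConfigRef:
    name: default   # the ProviderConfig created earlier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;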

&lt;p&gt;Now we can check whether the bucket was created by running &lt;code&gt;kubectl get bucket&lt;/code&gt; against our Kubernetes cluster:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyltq64j5y2gy8pmhp33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyltq64j5y2gy8pmhp33.png" alt="Image description" width="346" height="37"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Check if the S3 Bucket was created in AWS:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;List your AWS S3 buckets and search for the newly created one:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws s3 ls
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5beu0dr209l46wnopggm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5beu0dr209l46wnopggm.png" alt="Image description" width="315" height="24"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Teardown &amp;amp; Cleanup
&lt;/h2&gt;

&lt;p&gt;We’ll start by deleting the S3 Bucket Crossplane resource in Kubernetes, which will end up deleting the resource in AWS.&lt;br&gt;
Then, if we used &lt;code&gt;kind&lt;/code&gt; to spin up a local Kubernetes cluster, we’ll delete the cluster to keep our workstation nice and clean.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Delete the S3 Bucket resource:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl delete -f bucket-example.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you used kind, delete the cluster:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kind delete cluster
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Useful Debugging Commands
&lt;/h2&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get provider


kubectl logs -n crossplane-system deploy/crossplane -c crossplane


kubectl logs -n crossplane-system -l pkg.crossplane.io/provider=provider-aws
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>crossplane</category>
      <category>aws</category>
      <category>kubernetes</category>
      <category>devops</category>
    </item>
    <item>
      <title>Terraform Starter Boilerplate for GCP using Terragrunt</title>
      <dc:creator>Arthur Azrieli</dc:creator>
      <pubDate>Fri, 26 Apr 2024 11:16:24 +0000</pubDate>
      <link>https://dev.to/meteorops/terraform-starter-boilerplate-for-gcp-using-terragrunt-5efg</link>
      <guid>https://dev.to/meteorops/terraform-starter-boilerplate-for-gcp-using-terragrunt-5efg</guid>
      <description>&lt;h2&gt;
  
  
  The Boilerplate Github Repositories (stars are welcome ⭐)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;What you deploy: &lt;a href="https://github.com/MeteorOps/terragrunt-gcp-projects"&gt;https://github.com/MeteorOps/terragrunt-gcp-projects&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Modules you can use: &lt;a href="https://github.com/MeteorOps/terraform-gcp-modules"&gt;https://github.com/MeteorOps/terraform-gcp-modules&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Terraform mistakes that made me build this boilerplate
&lt;/h2&gt;

&lt;p&gt;I built this boilerplate for a reason.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I saw what companies regret with how they implemented Terraform:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing all of the Terraform code in one &lt;code&gt;main.tf&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;Copy-pasting resources manually&lt;/li&gt;
&lt;li&gt;Copy-pasting configuration throughout the codebase&lt;/li&gt;
&lt;li&gt;No state separation and environment awareness&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This is why they regretted the above:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One &lt;code&gt;terraform apply&lt;/code&gt; could ruin an entire environment&lt;/li&gt;
&lt;li&gt;Resource modifications required changes in multiple locations&lt;/li&gt;
&lt;li&gt;Configuration modifications required changes in multiple locations&lt;/li&gt;
&lt;li&gt;Accidentally deploying the wrong resources to the wrong environment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And so, I built this boilerplate for our clients (and you) to minimize regrets.&lt;/p&gt;

&lt;p&gt;The focus of this boilerplate is managing GCP resources.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is this boilerplate for you?
&lt;/h2&gt;

&lt;p&gt;If you're a CTO, a DevOps lead embarking on a new project on GCP, or simply in search of a template to organize your Terraform repositories, this project is for you. Finding a well-structured example for deploying GCP resources can be challenging. My own search for such a template was unfruitful, leading me to develop a solution that I've decided to share openly and discuss in this article.&lt;/p&gt;

&lt;h2&gt;
  
  
  What should you expect from this guide?
&lt;/h2&gt;

&lt;p&gt;By the conclusion of this guide, you'll have a thorough understanding of how to establish Terraform repositories using a best-practice folder structure for provisioning GCP resources. You'll also be equipped to execute a straightforward demo, witnessing an end-to-end workflow in action.&lt;/p&gt;

&lt;h2&gt;
  
  
  What shouldn't you expect from this guide?
&lt;/h2&gt;

&lt;p&gt;An exhaustive library of modules for every resource in GCP. We kept the boilerplate minimal so that you can adapt it to your needs.&lt;br&gt;
You can fairly easily plug existing modules you created or found into the boilerplate.&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;To begin, clone the essential repositories:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Primary Repository&lt;/strong&gt;&lt;br&gt;
Clone the terragrunt-gcp-projects repository to get started.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone git@github.com:MeteorOps/terragrunt-gcp-projects.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Modules Repository (Optional)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For the modules used, clone the terraform-gcp-modules repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone git@github.com:MeteorOps/terraform-gcp-modules.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Repository Structure Explained
&lt;/h2&gt;

&lt;p&gt;The code organization follows a logical hierarchy to facilitate multiple projects, regions, or environments. This structure gives you a number of benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical configuration:&lt;/strong&gt; The configuration at each level cascades through the folders under it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;State separation:&lt;/strong&gt; The terraform state is saved per folder in a different path in a bucket, limiting the impact radius of changes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic-level of deployment:&lt;/strong&gt; The deeper into the folder you go, the more specific resources you affect with one deployment
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project
└ _global
└ region
   └ _global
   └ environment
      └ resource
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Creating and using root (project) level variables
&lt;/h3&gt;

&lt;p&gt;When dealing with multiple GCP projects or regions, passing common variables to modules can become repetitive. To avoid duplicating variables across each &lt;code&gt;terragrunt.hcl&lt;/code&gt; file, leverage the root &lt;code&gt;terragrunt.hcl&lt;/code&gt; inputs to inherit variables seamlessly across regions and environments.&lt;/p&gt;
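&lt;p&gt;The pattern can be sketched as follows: the root &lt;code&gt;terragrunt.hcl&lt;/code&gt; declares shared inputs, and every module-level &lt;code&gt;terragrunt.hcl&lt;/code&gt; includes it (the variable names and module path below are illustrative, not the exact repository contents):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# root terragrunt.hcl
inputs = {
  project_id = "my-gcp-project"   # illustrative value
  region     = "us-central1"
}

# module-level terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules//vpc"   # illustrative module path
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;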

&lt;h2&gt;
  
  
  Deploy Using Terragrunt
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;Install Terraform version 0.12.6 or newer and Terragrunt version v0.25.1 or newer.&lt;br&gt;
Fill in your GCP Project ID in &lt;code&gt;my-project/project.hcl&lt;/code&gt;.&lt;br&gt;
Make sure the gcloud CLI is installed and you are authenticated; otherwise, run &lt;code&gt;gcloud auth login&lt;/code&gt;.&lt;/p&gt;
&lt;h3&gt;
  
  
  Module Deployment
&lt;/h3&gt;

&lt;p&gt;To deploy a single module:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;cd&lt;/code&gt; into the module's folder (e.g. &lt;code&gt;cd my-project/us-central1/rnd-1/vpc&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terragrunt plan&lt;/code&gt; to see the changes you are about to apply.&lt;/li&gt;
&lt;li&gt;If the plan looks good, run &lt;code&gt;terragrunt apply&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Environment Deployment
&lt;/h3&gt;

&lt;p&gt;To deploy all modules within an environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;cd&lt;/code&gt; into the environment folder (e.g. &lt;code&gt;cd my-project/us-central1/rnd-1&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terragrunt run-all plan&lt;/code&gt; to see all the changes you are about to apply.&lt;/li&gt;
&lt;li&gt;If the plan looks good, run &lt;code&gt;terragrunt run-all apply&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;
  
  
  Testing Deployed Infrastructure
&lt;/h3&gt;

&lt;p&gt;Post-deployment, modules will output relevant information. For instance the IP of a deployed application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Outputs:

ip = "35.240.219.84"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minute or two after the deployment finishes, you should be able to test the &lt;code&gt;ip&lt;/code&gt; output in your browser or with &lt;code&gt;curl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl 35.240.219.84

# Output: Let MeteorOps know if this boilerplate needs any improvement!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Clean-Up Process
&lt;/h3&gt;

&lt;p&gt;To remove all deployed modules within an environment:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;cd&lt;/code&gt; into the environment folder (e.g. &lt;code&gt;cd my-project/us-central1/rnd-1&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;terragrunt run-all plan -destroy&lt;/code&gt; to see all the destroy changes you're about to apply.&lt;/li&gt;
&lt;li&gt;If the plan looks good, run &lt;code&gt;terragrunt run-all destroy&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This guide walks you through leveraging best practices for setting up and managing Terraform repositories for GCP with Terragrunt. These methodologies are designed to be straightforward, efficient, and easily adaptable to future projects or company needs.&lt;/p&gt;

&lt;h4&gt;
  
  
  P.S.
&lt;/h4&gt;

&lt;p&gt;Feel free to &lt;a href="https://meteorops.beehiiv.com/subscribe"&gt;subscribe to our Newsletter&lt;/a&gt; and learn about other insights and resources we release 👈🏼&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
      <category>googlecloud</category>
      <category>terragrunt</category>
    </item>
  </channel>
</rss>
