<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Polar Squad</title>
    <description>The latest articles on DEV Community by Polar Squad (@polarsquad).</description>
    <link>https://dev.to/polarsquad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F7487%2F38f1d0ac-1e68-4375-8fdf-b8b5244ab0f6.jpg</url>
      <title>DEV Community: Polar Squad</title>
      <link>https://dev.to/polarsquad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/polarsquad"/>
    <language>en</language>
    <item>
      <title>When LLMs struggle: Architecture, context, and hidden complexity</title>
      <dc:creator>Jari Haikonen</dc:creator>
      <pubDate>Tue, 31 Mar 2026 06:44:35 +0000</pubDate>
      <link>https://dev.to/polarsquad/when-llms-struggle-architecture-context-and-hidden-complexity-54be</link>
      <guid>https://dev.to/polarsquad/when-llms-struggle-architecture-context-and-hidden-complexity-54be</guid>
      <description>&lt;p&gt;The obvious LLM failures are easy to catch. Syntax errors, broken configs, a pipeline that refuses to run. You see the problem immediately and fix it. Those are not the ones that should worry you.&lt;/p&gt;

&lt;p&gt;The ones that should worry you are the ones that look completely fine. The code runs. The config is valid. The output looks reasonable. And yet, when someone with more experience takes a look, the problems become obvious immediately. They were just invisible to you.&lt;/p&gt;




&lt;h2&gt;The knowledge mirror&lt;/h2&gt;

&lt;p&gt;There is a pattern I noticed pretty quickly when working on tasks outside my main area.&lt;/p&gt;

&lt;p&gt;When I work in areas I know well, like Terraform or CI/CD pipelines, I can evaluate the model's output almost automatically. I know what good looks like, I know the common failure patterns, and I catch mistakes fast. The feedback loop is tight.&lt;/p&gt;

&lt;p&gt;But when I work on something I know less about, that feedback loop breaks. And the problem is that the model does not help you here at all. It does not become more cautious in unfamiliar territory. It does not tell you when it is guessing. It produces the same confident, well-formatted output regardless of whether it is right or wrong.&lt;/p&gt;

&lt;p&gt;Anthropic's engineering team documented the same behavior when building multi-agent coding harnesses. Their &lt;a href="https://www.anthropic.com/engineering/harness-design-long-running-apps" rel="noopener noreferrer"&gt;published observation&lt;/a&gt;: when asked to evaluate their own output, agents reliably respond by confidently praising it, even when the quality is obviously mediocre to a human observer. The confidence is not correlated with the actual quality of the work.&lt;/p&gt;

&lt;p&gt;So what you end up with is a mirror: the model reflects your level of knowledge back at you. If you know the domain, you catch the mistakes. If you do not, you miss them. And the less you know, the more you are at the mercy of output you cannot properly evaluate.&lt;/p&gt;

&lt;p&gt;In practice, for me this showed up most clearly in development tasks. There are usually several valid ways to implement the same thing in code, and the right choice depends on context, team conventions, performance requirements, and a lot of other things the model cannot know. If you are not familiar enough with those trade-offs yourself, the model will just pick one. And if you do not notice, it gets built on.&lt;/p&gt;




&lt;h2&gt;The problem with architectural decisions&lt;/h2&gt;

&lt;p&gt;LLMs are actually quite good at implementing things. Give the model a clear approach and it will execute it well. The problem is when you ask it to choose the approach.&lt;/p&gt;

&lt;p&gt;Architectural decisions involve context the model simply does not have: your team's skill level, how much complexity the team can realistically own and maintain at once, which parts of your system are already overengineered, the operational cost of what it is about to build, your future plans. Without that, it defaults to what it has seen most in training data, which is usually the textbook approach or the most complex one, not necessarily the most appropriate one for your situation.&lt;/p&gt;

&lt;p&gt;In DevOps this matters more than it might seem, because infrastructure decisions have long tails. A bad pattern in a Terraform module layout, a poorly thought-out pipeline structure, a dependency that should not be there, these things propagate. Fixing them later is far more expensive than catching them early.&lt;/p&gt;

&lt;p&gt;There is also the consistency problem. The model has absorbed a lot of old documentation and outdated best practices, and it has no sense of what is current or what fits your context. So it might solve the same kind of problem in two different places in your codebase using two completely different approaches. Both technically valid, but inconsistent in ways that make things harder to maintain over time.&lt;/p&gt;

&lt;p&gt;The practical answer is documentation: write down your decisions and conventions and feed them to the model. This genuinely helps. But the model does not always follow the rules you set for it, especially on longer tasks. You still need to review what it produces. Documentation reduces the drift, it does not eliminate it.&lt;/p&gt;

&lt;p&gt;A concrete example: a colleague was working on changes that touched multiple Terragrunt module levels and wanted to structure it as a single PR. The constraint is straightforward: you cannot use outputs from one level in another before applying the first one, so the rule is one PR per level. The fix: use &lt;code&gt;git checkout origin/main -- path_to_file&lt;/code&gt; to revert the level-two file back to main in your current branch, open the first PR, merge and apply it, then create a second PR with the level-two changes. She had already asked Copilot. It had given her something much longer and more complicated.&lt;/p&gt;

&lt;p&gt;The one thing to watch: if the file you are reverting has uncommitted changes you still need, commit or stash them before running the checkout, because it overwrites the file in your working tree and unsaved work is gone.&lt;/p&gt;
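
&lt;p&gt;The whole flow is easy to rehearse in a throwaway repository before doing it on real infrastructure code. A minimal sketch, with illustrative file names and a local &lt;code&gt;main&lt;/code&gt; standing in for &lt;code&gt;origin/main&lt;/code&gt;:&lt;/p&gt;

```shell
# Rehearsal of the one-PR-per-level flow in a throwaway repo.
# File names are illustrative; in the real case you revert
# against origin/main rather than the local main branch.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main repo
cd repo
git config user.email "demo@example.com"
git config user.name "demo"
echo "level1 v1" > level1.hcl
echo "level2 v1" > level2.hcl
git add .
git commit -qm "baseline"

git switch -qc feature
echo "level1 v2" > level1.hcl
echo "level2 v2" > level2.hcl
git commit -qam "changes at both levels"

# Revert the level-two file so PR 1 only touches level one.
# The v2 content survives in the previous commit; uncommitted
# changes would NOT survive this step -- stash them first.
git checkout main -- level2.hcl
git commit -qm "PR 1: level one only"
grep "level2 v1" level2.hcl   # prints "level2 v1"
```

&lt;p&gt;After PR 1 is merged and applied, the level-two version is still sitting in the earlier commit, ready to become PR 2.&lt;/p&gt;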

&lt;p&gt;A model that produces something technically correct but architecturally wrong is in some ways more dangerous than one that produces something broken, because at least broken things announce themselves.&lt;/p&gt;




&lt;h2&gt;When the model starts looping&lt;/h2&gt;

&lt;p&gt;The other failure mode that shows up regularly is looping. You give the model a problem, it gives you an answer, the answer is wrong, you tell it so, it gives you a variation, that is also wrong, and so on. Anthropic's engineering team describes the same failure in the same terms for longer agentic tasks: on complex work, the agent tends to go off the rails over time, producing increasingly elaborate answers that are no more correct than the first one.&lt;/p&gt;

&lt;p&gt;A good example of this from my own work: I was building a GitLab CI pipeline and wanted to keep it DRY. The &lt;code&gt;changes:&lt;/code&gt; blocks that control when jobs run were being repeated across every rule, so I asked the model to clean it up using YAML anchors. It produced something that looked completely reasonable, valid YAML, clean structure. The pipeline failed. I fed the error back. It adjusted. Still failed. A few iterations in, the suggestions were getting more elaborate but the pipeline kept breaking in the same way.&lt;/p&gt;
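
&lt;p&gt;For readers who have not tried this: the shape I was asking for is plain YAML anchors, nothing exotic. A minimal sketch with illustrative job names and paths, not the actual pipeline:&lt;/p&gt;

```yaml
# Define the shared changes list once, reuse it per job
# (hypothetical jobs and paths, not the real pipeline).
.infra-changes: &amp;infra-changes
  - terraform/**/*.tf
  - .gitlab-ci.yml

plan:
  stage: test
  script:
    - terraform plan
  rules:
    - changes: *infra-changes

apply:
  stage: deploy
  script:
    - terraform apply -auto-approve
  rules:
    - changes: *infra-changes
      when: manual
```

&lt;p&gt;Any YAML validator accepts this kind of structure, which is exactly why the model kept producing confident variations of it.&lt;/p&gt;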

&lt;p&gt;The next article in this series gets into exactly why this happens with GitLab specifically, and what the pattern looks like in practice. The signs are pretty recognizable once you have seen it a few times: the answers are getting longer but not more correct, the error is not actually changing between iterations, and you are spending more time explaining the problem than it would take to just fix it.&lt;/p&gt;

&lt;p&gt;The right move at that point is to stop, step back, figure out the root cause yourself, and either fix it directly or come back to the model with a much more specific prompt that contains the missing context. What does not work is adding more context and hoping the next attempt will break the pattern. Usually it does not.&lt;/p&gt;




&lt;h2&gt;Senior knowledge is more important, not less&lt;/h2&gt;

&lt;p&gt;In practice I have found the opposite of what the "AI replaces engineers" conversation suggests.&lt;/p&gt;

&lt;p&gt;With LLMs, experienced engineers spend less time writing code and config and more time reviewing output, catching bad patterns and making architectural decisions. The volume of output goes up significantly, which means the demand for quality review goes up with it. And here is the uncomfortable part: a junior engineer can now generate code faster than a senior engineer can critically audit it. The rate-limiting factor that used to keep review meaningful has been removed.&lt;/p&gt;

&lt;p&gt;If you do not have the experience to evaluate what the model produces, you are not doing less work. You are just accumulating a gap between how much exists in your codebase and how much anyone genuinely understands. Addy Osmani calls this &lt;a href="https://addyosmani.com/blog/comprehension-debt/" rel="noopener noreferrer"&gt;comprehension debt&lt;/a&gt;, and the pattern he describes maps closely to what I have been seeing in practice.&lt;/p&gt;

&lt;p&gt;The value of experience has not disappeared. It has moved to a different place in the workflow.&lt;/p&gt;




&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;LLMs struggle most where human expertise matters most: architectural decisions, trade-off reasoning, domain-specific behavior. That is not a reason to avoid using them in those areas. It is a reason to stay engaged as the expert when you do.&lt;/p&gt;

&lt;p&gt;One of the most concrete examples of this I have seen is GitLab pipelines, where the model is technically correct about YAML and completely wrong about GitLab's implementation at the same time. If that sounds familiar, that is exactly what the next article in this series is about.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>architecture</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>LLMs in DevOps: Why They Work Best as a "Very Fast Junior Engineer"</title>
      <dc:creator>Jari Haikonen</dc:creator>
      <pubDate>Thu, 26 Mar 2026 11:37:46 +0000</pubDate>
      <link>https://dev.to/polarsquad/llms-in-devops-why-they-work-best-as-a-very-fast-junior-engineer-59oh</link>
      <guid>https://dev.to/polarsquad/llms-in-devops-why-they-work-best-as-a-very-fast-junior-engineer-59oh</guid>
      <description>&lt;p&gt;I was staring at roughly 10,000 lines of network rules spread across a live cloud environment. Two environments, dev and prod, two regions each, all handled by their own separate configuration files. The task was to cross-check what had already been imported into Terraform and what hadn't, and then split the rules correctly across all those files. That kind of task could easily take weeks to do carefully by hand. With an LLM doing the heavy lifting, I was done in three hours.&lt;/p&gt;

&lt;p&gt;That was the moment the mental model clicked for me.&lt;/p&gt;

&lt;p&gt;And yes, these were network security rules. But here is the thing: in a Terraform import workflow, the tooling itself is the safety net. The goal is a 1:1 match between your IaC and the actual state of the environment. If the AI-generated configuration has any drift from reality, Terraform tells you immediately when you run the import. You are not trusting the AI blindly, you are using it to do the repetitive work and then letting Terraform verify the result. That is a very different risk profile from asking an LLM to design your network security from scratch.&lt;/p&gt;
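
&lt;p&gt;To make that verification loop concrete: one way to wire it up on Terraform 1.5+ is with declarative &lt;code&gt;import&lt;/code&gt; blocks. The resource type, address and ID below are hypothetical, not from the actual project:&lt;/p&gt;

```hcl
# Adopt one existing firewall rule into state (illustrative names).
import {
  to = google_compute_firewall.allow_ssh
  id = "projects/my-project/global/firewalls/allow-ssh"
}

resource "google_compute_firewall" "allow_ssh" {
  # LLM-generated attributes land here; "terraform plan" then diffs
  # this block against the live resource and reports any mismatch.
  name    = "allow-ssh"
  network = "default"
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}
```

&lt;p&gt;A clean &lt;code&gt;terraform plan&lt;/code&gt; means the generated configuration matches reality; any diff means the model drifted, and you fix that before trusting the result.&lt;/p&gt;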

&lt;p&gt;I have been using AI tools as a regular part of my DevOps work for about a year now, not just occasionally but daily, across hobby projects, volunteer work and professional infrastructure and development work. I come at this as a lead DevOps consultant with over 20 years in IT, so I have a pretty good baseline for what good looks like and what bad looks like.&lt;/p&gt;

&lt;p&gt;After a year of this, I have some clear opinions about what these tools are actually good for, where they fall apart, and what way of thinking about them actually helps in practice.&lt;/p&gt;

&lt;p&gt;This is not a model comparison and not a benchmark. Those exist already. This is just what I have noticed from using LLMs in real DevOps work.&lt;/p&gt;




&lt;h2&gt;A year of real use&lt;/h2&gt;

&lt;p&gt;The work has covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;coding in JavaScript, TypeScript, Golang and Java&lt;/li&gt;
&lt;li&gt;IaC with Terraform and Terragrunt&lt;/li&gt;
&lt;li&gt;configuration management with Ansible&lt;/li&gt;
&lt;li&gt;CI/CD on GitLab and GitHub, plus packaging and deployment with Docker Compose and Helm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I have tried several models, including Claude (Sonnet, Opus, Haiku), ChatGPT, Gemini and Grok, and different IDEs like VS Code and Cursor. The models do have different strengths and there are clear gaps between them, but I am not going to get into that here. What I want to talk about is what using all of them has taught me about AI-assisted DevOps work in general.&lt;/p&gt;




&lt;h2&gt;The pattern that kept repeating&lt;/h2&gt;

&lt;p&gt;Across all that work, one pattern kept showing up.&lt;/p&gt;

&lt;p&gt;When I gave the model clear context, a well-scoped task and some constraints to work within, the output was fast and impressively good. When I gave it an open-ended problem or let things run without much correction, the quality dropped quickly. Dead code started accumulating, inconsistent patterns appeared and the model started looping through variations of the same wrong answer.&lt;/p&gt;

&lt;p&gt;The difference was not which model I was using. The difference was how much structure I brought to the interaction.&lt;/p&gt;

&lt;p&gt;And that structure comes directly from your own maturity and experience in the domain. The more you know, the more precisely you can specify what you want, and the better the output gets. This is probably the most underappreciated factor in how well LLMs actually perform in practice.&lt;/p&gt;

&lt;p&gt;Compare these two prompts for the same task:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Create pipeline that deploys my nodejs app"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;versus:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Create CI/CD pipelines for pull requests and deploying on main branch. Add quality gates to the PR pipeline: format, lint, security, build and docker build. In the main pipeline do docker builds and use the registry for cached images to make builds faster. On the Dockerfiles use multi-stage builds where possible to keep the final image small, and make sure we are not running as root. Make the pipelines DRY on the sections that overlap"&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The second prompt does not just describe what to build. It reflects years of experience with CI/CD, Docker best practices and security thinking. Someone without that background would not even know to ask for those things. The model cannot supply that knowledge from its own side, it can only work with what you give it.&lt;/p&gt;
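
&lt;p&gt;As a sketch of the kind of Dockerfile the second prompt is asking for, here is the multi-stage, non-root shape for a hypothetical Node.js app (stage names and paths are illustrative):&lt;/p&gt;

```dockerfile
# Stage 1: build with the full toolchain
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: slim runtime image without build tooling
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
# The official node images ship a non-root "node" user
USER node
CMD ["node", "dist/index.js"]
```

&lt;p&gt;The point is not this specific file. It is that "multi-stage" and "not running as root" only end up in the output because the prompt asked for them.&lt;/p&gt;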

&lt;p&gt;It is not that the model is bad. It just has no stake in the outcome and no experience to fall back on. It will produce output either way. The quality of that output depends almost entirely on the quality of the guidance behind it.&lt;/p&gt;




&lt;h2&gt;A very fast junior engineer&lt;/h2&gt;

&lt;p&gt;The mental model that finally made this click for me: an LLM behaves like a very fast junior engineer.&lt;/p&gt;

&lt;p&gt;A good junior can produce a lot of work quickly and they follow clear instructions well. But they struggle with architectural decisions, tend to go with the most obvious approach rather than the most appropriate one, and need supervision.&lt;/p&gt;

&lt;p&gt;They act this way not because they are useless but because they lack the context and experience to make the right call on their own. Leave them unsupervised long enough and small decisions start to compound into bigger problems.&lt;/p&gt;

&lt;p&gt;LLMs behave exactly like this, just at roughly ten times the speed. The speed is real and genuinely useful, but it does not change the underlying dynamic.&lt;/p&gt;

&lt;p&gt;There is an important flip side to this that is worth saying directly: the analogy only works if you actually are the senior. If you jump into a domain you know nothing about, the dynamic inverts. The model becomes the one with more apparent knowledge and you have no real basis to supervise it. You cannot catch the bad architectural decisions because you do not recognise them. That is when you get the worst outcomes: confident-sounding output that is quietly wrong in ways that take a long time to find and fix.&lt;/p&gt;

&lt;p&gt;When you accept this framing, a few things shift in how you work:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your job becomes that of an architect instead of a typist. You define the structure, the constraints, the approach. The model handles the execution.&lt;/li&gt;
&lt;li&gt;Structuring the problem well matters more than prompting well. A well-defined task with clear context will beat a cleverly worded prompt for an undefined problem every time.&lt;/li&gt;
&lt;li&gt;You still need to know your domain. The better you understand the area you are working in, the better you can guide the model and catch its mistakes. Domain expertise is not optional, it is what makes the supervision possible in the first place.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;What this looks like in practice&lt;/h2&gt;

&lt;p&gt;One thing that has helped quite a lot is writing documentation and conventions that both humans and the model can use. Not AI-specific memory tools or special prompting tricks, but actual documentation that would exist for your team anyway. Things like guidelines in a Terraform modules folder, pipeline conventions, naming rules.&lt;/p&gt;

&lt;p&gt;When that structure exists and you give the model access to it, it follows the established patterns instead of inventing new ones. The corrections get smaller and the output actually fits the system you are building.&lt;/p&gt;

&lt;p&gt;The other thing is knowing when to stop iterating with the model and just fix something yourself. Sometimes two or three rounds of back and forth are not making progress and the model is just looping. At that point the fastest path forward is usually to step in, fix the specific issue yourself, and re-engage the model for the work around it.&lt;/p&gt;




&lt;h2&gt;The takeaway&lt;/h2&gt;

&lt;p&gt;AI is already genuinely useful in DevOps workflows. The value you get out of it scales with the quality of the supervision and structure you bring as the engineer. The model is the junior. You are the senior. That dynamic does not disappear as the tools get faster or more capable.&lt;/p&gt;

&lt;p&gt;The rest of this series goes into the details. The next piece looks at where LLMs consistently struggle and why the failures are harder to catch than they look. After that, a concrete GitLab pipeline example that most DevOps engineers will recognise. And then the positive story: why importing existing infrastructure into Terraform is one of the best use cases for LLMs I have found.&lt;/p&gt;

&lt;p&gt;If you have been using LLMs in DevOps or platform engineering work, I am curious what mental model you have settled on. Does the junior engineer analogy match your experience or have you found a better way to think about it?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ai</category>
      <category>terraform</category>
      <category>platformengineering</category>
    </item>
    <item>
      <title>AI/ML Platforms: Pros and Cons</title>
      <dc:creator>Jani Ranta</dc:creator>
      <pubDate>Mon, 28 Oct 2024 11:33:47 +0000</pubDate>
      <link>https://dev.to/polarsquad/aiml-platforms-pros-and-cons-28ka</link>
      <guid>https://dev.to/polarsquad/aiml-platforms-pros-and-cons-28ka</guid>
      <description>&lt;p&gt;When choosing between AI/ML platforms, each provider has its strengths and trade-offs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Azure Machine Learning&lt;/strong&gt; excels in seamless integration with Microsoft services, strong security features, and ease of use through AutoML tools, but can become costly with complex pricing and requires expertise within the Azure ecosystem.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AWS SageMaker&lt;/strong&gt; offers extensive flexibility, scalability, and strong AWS service integration, making it ideal for large-scale applications, though it can be challenging for beginners due to its steep learning curve and potentially high costs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Google Vertex AI&lt;/strong&gt; is known for its user-friendly interface and superior AutoML capabilities, particularly for data-heavy operations, but has fewer pre-built models compared to AWS and can be expensive with larger datasets.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;On the other hand, &lt;strong&gt;open-source solutions&lt;/strong&gt; like Hugging Face offer cost savings and high customization, but demand significant technical expertise and manual setup, making them harder to scale without the right resources.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Machine Learning and AI Model Development&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Category&lt;/th&gt;
        &lt;th&gt;Microsoft Azure&lt;/th&gt;
        &lt;th&gt;Amazon Web Services&lt;/th&gt;
        &lt;th&gt;Google Cloud Platform&lt;/th&gt;
        &lt;th&gt;Open Source&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Machine Learning and AI Model Development&lt;/td&gt;
        &lt;td&gt;Azure Machine Learning&lt;/td&gt;
        &lt;td&gt;Amazon SageMaker&lt;/td&gt;
        &lt;td&gt;Vertex AI&lt;/td&gt;
        &lt;td&gt;MLflow, Kubeflow, Ray Serve&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Image Recognition and Computer Vision&lt;/td&gt;
        &lt;td&gt;Computer Vision&lt;/td&gt;
        &lt;td&gt;Amazon Rekognition&lt;/td&gt;
        &lt;td&gt;Vision AI&lt;/td&gt;
        &lt;td&gt;DeepStack AI Server&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Natural Language Processing (NLP)&lt;/td&gt;
        &lt;td&gt;Text Analytics, Language Understanding (LUIS)&lt;/td&gt;
        &lt;td&gt;Amazon Comprehend&lt;/td&gt;
        &lt;td&gt;Natural Language API&lt;/td&gt;
        &lt;td&gt;Haystack, Rasa, spaCy REST API, Hugging Face&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Speech Recognition and Text-to-Speech&lt;/td&gt;
        &lt;td&gt;Azure AI Speech Service&lt;/td&gt;
        &lt;td&gt;Amazon Transcribe, Amazon Polly&lt;/td&gt;
        &lt;td&gt;Speech-to-Text, Text-to-Speech&lt;/td&gt;
        &lt;td&gt;Vosk (for Speech Recognition), Coqui TTS (for Text-to-Speech)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Chatbots and Interactive Applications&lt;/td&gt;
        &lt;td&gt;Azure Bot Services, Microsoft Copilot Studio&lt;/td&gt;
        &lt;td&gt;Amazon Lex&lt;/td&gt;
        &lt;td&gt;Dialogflow&lt;/td&gt;
        &lt;td&gt;Rasa, Botpress&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Automated Text Processing and Analysis&lt;/td&gt;
        &lt;td&gt;Azure AI Document Intelligence&lt;/td&gt;
        &lt;td&gt;Amazon Comprehend&lt;/td&gt;
        &lt;td&gt;Document AI&lt;/td&gt;
        &lt;td&gt;Tesseract OCR, Apache Tika&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Generative Large Language Models (LLMs)&lt;/td&gt;
        &lt;td&gt;Azure OpenAI Service&lt;/td&gt;
        &lt;td&gt;Amazon Bedrock&lt;/td&gt;
        &lt;td&gt;Vertex AI, Gemini&lt;/td&gt;
        &lt;td&gt;DeepSpeed, Haystack, Hugging Face Transformers, GPT-J/NeoX Playground, Ollama&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Azure Machine Learning&lt;/strong&gt;: A cloud-based service designed for the entire machine learning lifecycle, enabling data scientists and engineers to build, train, and deploy models at scale. It supports various frameworks and offers tools for MLOps, data preparation, and model management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon SageMaker&lt;/strong&gt;: A fully managed service that provides tools for building, training, and deploying machine learning models quickly. It includes features like built-in algorithms, Jupyter notebooks, and model monitoring capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex AI&lt;/strong&gt;: A unified platform that simplifies the machine learning workflow by integrating various tools and services for data preparation, model training, and deployment. It supports AutoML and custom training with TensorFlow and PyTorch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MLflow&lt;/strong&gt;: An open-source platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment. It provides a tracking server, projects, and a model registry.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubeflow&lt;/strong&gt;: An open-source machine learning toolkit for Kubernetes, designed to facilitate the deployment, orchestration, and management of machine learning workflows on Kubernetes clusters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ray Serve&lt;/strong&gt;: A scalable model serving library that allows users to deploy machine learning models in production with minimal latency. It integrates seamlessly with Ray, a distributed computing framework.&lt;/p&gt;

&lt;h2&gt;Image Recognition and Computer Vision&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Computer Vision&lt;/strong&gt;: A service that provides algorithms for image analysis, including object detection, image classification, and optical character recognition (OCR) capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Rekognition&lt;/strong&gt;: A service that makes it easy to add image and video analysis to applications, offering features like facial recognition and object detection.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vision AI&lt;/strong&gt;: A Google Cloud service that provides powerful image analysis capabilities through pre-trained models and custom model training.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepStack AI Server&lt;/strong&gt;: An open-source platform for implementing AI capabilities in applications, including image recognition and face detection.&lt;/p&gt;

&lt;h2&gt;Natural Language Processing (NLP)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Text Analytics, Language Understanding (LUIS)&lt;/strong&gt;: Azure services that provide capabilities for sentiment analysis, key phrase extraction, and language understanding for building conversational applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Comprehend&lt;/strong&gt;: A natural language processing service that uses machine learning to find insights and relationships in text, such as sentiment and entity recognition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Natural Language API&lt;/strong&gt;: A Google Cloud service that allows developers to analyze and understand text through features like entity recognition and sentiment analysis.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Haystack, Rasa, spaCy REST API, Hugging Face&lt;/strong&gt;: Open-source frameworks and libraries for building NLP applications, offering capabilities for intent recognition, dialogue management, and text processing.&lt;/p&gt;

&lt;h2&gt;Speech Recognition and Text-to-Speech&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure AI Speech Service&lt;/strong&gt;: A service that provides speech recognition and text-to-speech capabilities, enabling applications to convert speech to text and vice versa.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Transcribe, Amazon Polly&lt;/strong&gt;: Services for automatic speech recognition and text-to-speech, allowing developers to add voice capabilities to applications easily.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Speech-to-Text, Text-to-Speech&lt;/strong&gt;: Google Cloud services that enable audio transcription and speech synthesis, providing high-quality voice outputs for applications.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vosk, Coqui TTS&lt;/strong&gt;: Open-source tools for speech recognition and text-to-speech, allowing developers to integrate voice capabilities into their applications without relying on cloud services.&lt;/p&gt;

&lt;h2&gt;Chatbots and Interactive Applications&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure Bot Services&lt;/strong&gt;: A platform for building and deploying intelligent chatbots that can interact with users across various channels. Paired with Microsoft Copilot Studio, it also supports low-code authoring of custom copilots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Lex&lt;/strong&gt;: A service for building conversational interfaces using voice and text, powered by the same technology as Alexa.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dialogflow&lt;/strong&gt;: A Google Cloud service for building conversational agents, offering natural language understanding and integration with various messaging platforms.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rasa, Botpress&lt;/strong&gt;: Open-source frameworks for developing conversational AI applications, providing tools for building, training, and deploying chatbots.&lt;/p&gt;

&lt;h2&gt;Automated Text Processing and Analysis&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure AI Document Intelligence&lt;/strong&gt;: A service that helps automate the extraction of information from documents, enhancing data processing workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Comprehend&lt;/strong&gt;: Also mentioned under NLP, it provides capabilities for analyzing text and extracting insights from documents.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Document AI&lt;/strong&gt;: A Google Cloud service that automates the extraction of structured data from unstructured documents, improving data processing efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tesseract OCR, Apache Tika&lt;/strong&gt;: Open-source tools for optical character recognition and document parsing, enabling automated text extraction from images and documents.&lt;/p&gt;

&lt;h2&gt;Last but not least, Generative Large Language Models (LLMs)&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Azure OpenAI Service&lt;/strong&gt;: A service that provides access to OpenAI's powerful language models, enabling developers to build applications that require natural language understanding and generation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Amazon Bedrock&lt;/strong&gt;: A managed service that allows developers to build and scale generative AI applications using foundation models from various providers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vertex AI, Gemini&lt;/strong&gt;: Google Cloud services that facilitate the development of generative AI applications, offering access to advanced language models for various use cases.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSpeed, Haystack, Hugging Face Transformers, GPT-J/NeoX Playground, Ollama&lt;/strong&gt;: Open-source tools and frameworks for building and deploying generative AI applications, providing capabilities for training and fine-tuning large language models.&lt;/p&gt;
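&lt;p&gt;As a taste of how lightweight the open-source route can be, here is a hedged Python sketch that builds a request for Ollama's local REST API (the endpoint and payload shape follow Ollama's documented &lt;code&gt;/api/generate&lt;/code&gt; route; the model name is only an example, and the actual network call is left commented out so the sketch stays self-contained):&lt;/p&gt;

```python
import json
from urllib import request

# Ollama's default local endpoint (assumes a locally running Ollama server).
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

payload = build_generate_request("llama3", "Summarize what a CRL is in one sentence.")
body = json.dumps(payload).encode()

# Uncomment to actually call a locally running Ollama server:
# req = request.Request(OLLAMA_URL, data=body,
#                       headers={"Content-Type": "application/json"})
# with request.urlopen(req) as resp:
#     print(json.load(resp)["response"])

print(sorted(payload))
```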

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
    &lt;tr&gt;
        &lt;th&gt;Feature/Capability&lt;/th&gt;
        &lt;th&gt;Azure Machine Learning&lt;/th&gt;
        &lt;th&gt;AWS SageMaker&lt;/th&gt;
        &lt;th&gt;Google Vertex AI&lt;/th&gt;
        &lt;th&gt;Open Source Solutions&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Management Type&lt;/td&gt;
        &lt;td&gt;Fully managed&lt;/td&gt;
        &lt;td&gt;Fully managed&lt;/td&gt;
        &lt;td&gt;Fully managed&lt;/td&gt;
        &lt;td&gt;Self-managed&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Ease of Use&lt;/td&gt;
        &lt;td&gt;High, with visual tools and AutoML&lt;/td&gt;
        &lt;td&gt;Moderate, with no-code options available&lt;/td&gt;
        &lt;td&gt;High, with integrated tools&lt;/td&gt;
        &lt;td&gt;Varies by tool&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Model Training&lt;/td&gt;
        &lt;td&gt;Supports various frameworks, AutoML&lt;/td&gt;
        &lt;td&gt;Supports various frameworks, AutoML&lt;/td&gt;
        &lt;td&gt;Supports various frameworks, AutoML&lt;/td&gt;
        &lt;td&gt;MLflow, Kubeflow, Ray Serve&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Model Deployment&lt;/td&gt;
        &lt;td&gt;Easy endpoint configuration&lt;/td&gt;
        &lt;td&gt;Easy endpoint configuration&lt;/td&gt;
        &lt;td&gt;Easy endpoint configuration&lt;/td&gt;
        &lt;td&gt;Requires manual setup&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Pre-built Models&lt;/td&gt;
        &lt;td&gt;Yes, through the Azure model catalog&lt;/td&gt;
        &lt;td&gt;Yes, through SageMaker JumpStart&lt;/td&gt;
        &lt;td&gt;Yes, through Model Garden&lt;/td&gt;
        &lt;td&gt;Depends on the specific tool&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Integration with Other Services&lt;/td&gt;
        &lt;td&gt;Strong integration with Azure services&lt;/td&gt;
        &lt;td&gt;Strong integration with AWS services&lt;/td&gt;
        &lt;td&gt;Strong integration with Google services&lt;/td&gt;
        &lt;td&gt;Varies by tool&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Generative AI Support&lt;/td&gt;
        &lt;td&gt;Yes, through Azure OpenAI Service&lt;/td&gt;
        &lt;td&gt;Yes, through SageMaker&lt;/td&gt;
        &lt;td&gt;Yes, through Vertex AI and Gemini&lt;/td&gt;
        &lt;td&gt;DeepSpeed, Hugging Face Transformers&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;NLP Capabilities&lt;/td&gt;
        &lt;td&gt;Comprehensive (Text Analytics, LUIS)&lt;/td&gt;
        &lt;td&gt;Comprehensive (Comprehend)&lt;/td&gt;
        &lt;td&gt;Comprehensive (Natural Language API)&lt;/td&gt;
        &lt;td&gt;Haystack, Rasa, spaCy REST API&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Computer Vision Capabilities&lt;/td&gt;
        &lt;td&gt;Yes, through Computer Vision&lt;/td&gt;
        &lt;td&gt;Yes, through Amazon Rekognition&lt;/td&gt;
        &lt;td&gt;Yes, through Vision AI&lt;/td&gt;
        &lt;td&gt;DeepStack AI Server&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Speech Recognition&lt;/td&gt;
        &lt;td&gt;Yes, through Azure AI Speech Service&lt;/td&gt;
        &lt;td&gt;Yes, through Amazon Transcribe&lt;/td&gt;
        &lt;td&gt;Yes, through Speech-to-Text&lt;/td&gt;
        &lt;td&gt;Vosk&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Text-to-Speech&lt;/td&gt;
        &lt;td&gt;Yes, through Azure AI Speech Service&lt;/td&gt;
        &lt;td&gt;Yes, through Amazon Polly&lt;/td&gt;
        &lt;td&gt;Yes, through Text-to-Speech&lt;/td&gt;
        &lt;td&gt;Coqui TTS&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Collaboration Tools&lt;/td&gt;
        &lt;td&gt;Azure ML Workspaces&lt;/td&gt;
        &lt;td&gt;SageMaker Studio for team collaboration&lt;/td&gt;
        &lt;td&gt;Vertex AI Workbench&lt;/td&gt;
        &lt;td&gt;Varies by tool&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Cost Structure&lt;/td&gt;
        &lt;td&gt;Pay-as-you-go, pricing varies by usage&lt;/td&gt;
        &lt;td&gt;Pay-as-you-go, pricing varies by usage&lt;/td&gt;
        &lt;td&gt;Pay-as-you-go, pricing varies by usage&lt;/td&gt;
        &lt;td&gt;Free, but requires infrastructure&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
        &lt;td&gt;Customization&lt;/td&gt;
        &lt;td&gt;High customization options available&lt;/td&gt;
        &lt;td&gt;High customization options available&lt;/td&gt;
        &lt;td&gt;High customization options available&lt;/td&gt;
        &lt;td&gt;High, depending on the framework&lt;/td&gt;
    &lt;/tr&gt;
&lt;/table&gt;&lt;/div&gt;

</description>
    </item>
    <item>
      <title>How to use AWS Roles Anywhere</title>
      <dc:creator>Janne Pohjolainen</dc:creator>
      <pubDate>Wed, 21 Feb 2024 09:27:20 +0000</pubDate>
      <link>https://dev.to/polarsquad/how-to-use-aws-roles-anywhere-484p</link>
      <guid>https://dev.to/polarsquad/how-to-use-aws-roles-anywhere-484p</guid>
      <description>&lt;h2&gt;
  
  
  What is AWS Roles Anywhere?
&lt;/h2&gt;

&lt;p&gt;AWS Roles Anywhere enables workloads running outside AWS, such as servers, containers, and applications, to use IAM policies and IAM roles without having to create long-term credentials.&lt;/p&gt;

&lt;p&gt;AWS Roles Anywhere uses X.509 certificates to authenticate workloads: access is controlled through IAM Roles, and the workload receives short-lived session credentials.&lt;/p&gt;

&lt;h2&gt;
  
  
  How does AWS Roles Anywhere work?
&lt;/h2&gt;

&lt;p&gt;A workload needs an X.509 certificate (and a private key) signed by a Certificate Authority that is configured in AWS Roles Anywhere as a Trust Anchor. You can configure an external CA or use an AWS Private CA.&lt;/p&gt;

&lt;p&gt;AWS Roles Anywhere Profiles define which IAM Roles a workload may assume. The workload is then issued short-lived session credentials that grant access to AWS services according to the role's policies.&lt;/p&gt;

&lt;p&gt;This is more secure than using long-term user credentials, which usually carry much broader access than a single role. It's also possible to attach session policies to the roles added to the Profile that further restrict the roles' permissions.&lt;/p&gt;
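&lt;p&gt;Under the hood, the IAM role being assumed must trust the Roles Anywhere service principal. A minimal trust policy looks roughly like this (a sketch following the AWS documentation; production policies should also add condition keys that pin the specific trust anchor):&lt;/p&gt;

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "rolesanywhere.amazonaws.com" },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession",
        "sts:SetSourceIdentity"
      ]
    }
  ]
}
```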

&lt;h2&gt;
  
  
  What use-cases does AWS Roles Anywhere have?
&lt;/h2&gt;

&lt;p&gt;These roles can be used by any kind of workload running outside AWS that needs access to resources in the cloud. For example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;On-premises database servers could connect to S3 to fetch data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you use Kubernetes in a private cloud, you can create a role that gives the application container running in the cluster permissions to specific AWS resources only.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you already have an existing CA and PKI (Public-Key Infrastructure) system, you can use AWS Roles Anywhere to grant access to AWS services with the certificates you already issue.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's dive in deeper...&lt;/p&gt;

&lt;h2&gt;
  
  
  Some infrastructure is needed first
&lt;/h2&gt;

&lt;p&gt;First, a Certificate Authority (CA) is needed. For this example, we will create an AWS Private CA and then create an AWS Roles Anywhere trust anchor for it.&lt;br&gt;
After that, we will create an IAM Role with S3 read-only access and a Roles Anywhere Profile that links the IAM role to the trust anchor and CA.&lt;/p&gt;

&lt;p&gt;We will not dive deep into these and concentrate more on the workload side. Terraform code used to create CA, trust anchor, profile, role and an S3 bucket for this example can be found in &lt;a href="https://github.com/jpohjolainen/aws_roles_anywhere"&gt;https://github.com/jpohjolainen/aws_roles_anywhere&lt;/a&gt;&lt;/p&gt;
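&lt;p&gt;For orientation, the core of that Terraform setup can be sketched roughly like this (a simplified, illustrative sketch based on the AWS provider's &lt;code&gt;aws_rolesanywhere_trust_anchor&lt;/code&gt; and &lt;code&gt;aws_rolesanywhere_profile&lt;/code&gt; resources; resource names and references here are made up):&lt;/p&gt;

```hcl
# Trust anchor backed by an AWS Private CA.
resource "aws_rolesanywhere_trust_anchor" "this" {
  name    = "app-trust-anchor"
  enabled = true

  source {
    source_type = "AWS_ACM_PCA"
    source_data {
      acm_pca_arn = aws_acmpca_certificate_authority.this.arn
    }
  }
}

# Profile that maps the IAM role workloads may assume.
resource "aws_rolesanywhere_profile" "this" {
  name      = "app-profile"
  enabled   = true
  role_arns = [aws_iam_role.roles_anywhere.arn]
}
```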

&lt;p&gt;Once run, Terraform will output the ARNs of the newly created CA, trust anchor, role and profile, and also an S3 bucket name.&lt;/p&gt;

&lt;p&gt;These will be needed later on for creating a certificate for signing in to AWS and when revoking a certificate.&lt;/p&gt;

&lt;p&gt;Speaking of certificates, let's create a new certificate request and get a certificate from the CA.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Only your first private CA is free, and only for 30 days. If you create one, delete it, and create another, you will pay for the new CA immediately. A private CA costs around 300€ ($400) a month. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note 2&lt;/strong&gt;: It is also possible to create your own CA with an OpenSSL command and configure it as an external CA. Here is how to do that: &lt;a href="https://aws.amazon.com/blogs/security/iam-roles-anywhere-with-an-external-certificate-authority/"&gt;https://aws.amazon.com/blogs/security/iam-roles-anywhere-with-an-external-certificate-authority/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;
  
  
  Creating a certificate and private key for our application
&lt;/h2&gt;
&lt;h3&gt;
  
  
  Create Certificate Signing Request (CSR) and a new Private Key.
&lt;/h3&gt;

&lt;p&gt;Create the CSR &lt;code&gt;app-cert.csr&lt;/code&gt; and private key &lt;code&gt;app-private.key&lt;/code&gt;. Change the Subject to your liking: C=Country, ST=State, OU=Organizational Unit, O=Organization and CN=Common Name (name of the app, or hostname/domain).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openssl req &lt;span class="nt"&gt;-new&lt;/span&gt; &lt;span class="nt"&gt;-newkey&lt;/span&gt; rsa:2048 &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-out&lt;/span&gt; &lt;span class="s2"&gt;"app-cert.csr"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-keyout&lt;/span&gt; &lt;span class="s2"&gt;"app-private.key"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;-subj&lt;/span&gt; &lt;span class="s2"&gt;"/C=DE/ST=Berlin/OU=DevOps/CN=app1"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The previous command prompts for a passphrase for the private key, but we need to remove it, since an application will use the key non-interactively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openssl rsa &lt;span class="nt"&gt;-in&lt;/span&gt; &lt;span class="s2"&gt;"app-private.key"&lt;/span&gt; &lt;span class="nt"&gt;-out&lt;/span&gt; &lt;span class="s2"&gt;"app-private-nopass.key"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Request a certificate from the CA based on the CSR.
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws acm-pca issue-certificate &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--certificate-authority&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:acm-pca:eu-west-1:xxxxxx:certificate-authority/zzzzzzzzz"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--csr&lt;/span&gt; &lt;span class="s2"&gt;"fileb://app-cert.csr"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--signing-algorithm&lt;/span&gt; &lt;span class="s2"&gt;"SHA256WITHRSA"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--validity&lt;/span&gt; &lt;span class="nv"&gt;Value&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;365,Type&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"DAYS"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;--csr&lt;/code&gt; option really does use &lt;code&gt;fileb://&lt;/code&gt; instead of &lt;code&gt;file://&lt;/code&gt;; the AWS CLI uses this scheme to read files in binary format.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The following signing algorithms are available: SHA256WITHECDSA, SHA384WITHECDSA, SHA512WITHECDSA, SHA256WITHRSA, SHA384WITHRSA and SHA512WITHRSA.&lt;br&gt;
The specified signing algorithm family (RSA or ECDSA) must match the algorithm family of the CA's private key.&lt;/p&gt;

&lt;p&gt;The validity type can be YEARS, MONTHS, DAYS and the number as the value, END_DATE with value of YYYYMMDDHHMMSS or ABSOLUTE with a unix timestamp as the value.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Get the &lt;code&gt;CertificateArn&lt;/code&gt; from the reply. It is used in the next section.&lt;/p&gt;

&lt;h3&gt;
  
  
  Download the issued certificate.
&lt;/h3&gt;

&lt;p&gt;Use the AWS CLI to download the certificate from the private CA&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws acm-pca get-certificate &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--certificate-arn&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:acm-pca:eu-west-1:xxxxxx:certificate-authority/zzzzzzzzz/certificate/af7d3bf5c562a7d91f9310da8ae6ea8d"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--certificate-authority-arn&lt;/span&gt; &lt;span class="s2"&gt;"arn:aws:acm-pca:eu-west-1:xxxxxx:certificate-authority/zzzzzzzzz"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="se"&gt;\&lt;/span&gt;
 |jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.Certificate, .CertificateChain'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; app-cert.pem
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This uses &lt;code&gt;jq&lt;/code&gt; to get the certificate in JSON format. It's possible to use &lt;code&gt;--output text&lt;/code&gt; and direct that to a file, but then you need to edit the file to move the second &lt;code&gt;-----BEGIN CERTIFICATE-----&lt;/code&gt; to its own line.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Show the certificate
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openssl x509 &lt;span class="nt"&gt;-in&lt;/span&gt; app-cert.pem &lt;span class="nt"&gt;-noout&lt;/span&gt; &lt;span class="nt"&gt;-text&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Creating a container with an application
&lt;/h2&gt;

&lt;p&gt;Once the certificate and the private key are created, we need a configuration for AWS.&lt;br&gt;
&lt;code&gt;$HOME/.aws/config&lt;/code&gt; is used by the AWS SDKs and CLI to get access to AWS by signing in with the certificate and assuming the role.&lt;/p&gt;

&lt;p&gt;AWS provides a tool called the AWS Signing Helper to handle the sign-in. In this example, it is downloaded inside the Dockerfile when building the image.&lt;br&gt;
&lt;a href="https://docs.aws.amazon.com/rolesanywhere/latest/userguide/credential-helper.html"&gt;https://docs.aws.amazon.com/rolesanywhere/latest/userguide/credential-helper.html&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  .aws/config
&lt;/h3&gt;

&lt;p&gt;The helper tool can then be used in the file &lt;code&gt;$HOME/.aws/config&lt;/code&gt; to log in to AWS whenever the SDK or CLI is used. Here we need the ARNs that the Terraform code above returns. Save this to a file called &lt;code&gt;aws-config&lt;/code&gt;; the Dockerfile expects this name:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[default]&lt;/span&gt;
    &lt;span class="py"&gt;credential_process&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;/usr/local/bin/aws_signing_helper credential-process --certificate /app/app-cert.pem --private-key /app/app-private-nopass.key --trust-anchor-arn arn:aws:rolesanywhere:eu-west-1:xxxxxx:trust-anchor/yyyyyyyy --profile-arn arn:aws:rolesanywhere:eu-west-1:xxxxxx:profile/ccccccc --role-arn arn:aws:iam::xxxxxx:role/RolesAnywhere&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Change the &lt;code&gt;--trust-anchor-arn&lt;/code&gt;, &lt;code&gt;--profile-arn&lt;/code&gt; and &lt;code&gt;--role-arn&lt;/code&gt; to the values output by the Terraform code.&lt;/p&gt;
&lt;/blockquote&gt;
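&lt;p&gt;Whatever tool produces the credentials, the AWS SDKs expect any &lt;code&gt;credential_process&lt;/code&gt; command to print JSON in a fixed shape: &lt;code&gt;Version&lt;/code&gt;, &lt;code&gt;AccessKeyId&lt;/code&gt;, &lt;code&gt;SecretAccessKey&lt;/code&gt;, &lt;code&gt;SessionToken&lt;/code&gt; and &lt;code&gt;Expiration&lt;/code&gt;. As a quick sanity check, here is a minimal Python sketch (not part of the original setup; the sample payload is fabricated) that validates such output:&lt;/p&gt;

```python
import json
from datetime import datetime, timezone

# Keys the AWS credential_process contract requires in the output JSON.
REQUIRED_KEYS = {"Version", "AccessKeyId", "SecretAccessKey",
                 "SessionToken", "Expiration"}

def validate_credential_output(raw: str) -> dict:
    creds = json.loads(raw)
    missing = REQUIRED_KEYS - creds.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if creds["Version"] != 1:
        raise ValueError("credential_process output must use Version 1")
    # Expiration is an ISO 8601 timestamp; parse it to catch format errors.
    expires = datetime.fromisoformat(creds["Expiration"].replace("Z", "+00:00"))
    if expires <= datetime.now(timezone.utc):
        raise ValueError("credentials are already expired")
    return creds

# Fabricated example payload; real values come from aws_signing_helper.
sample = json.dumps({
    "Version": 1,
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "secret",
    "SessionToken": "token",
    "Expiration": "2099-01-01T00:00:00Z",
})
print(validate_credential_output(sample)["AccessKeyId"])
```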

&lt;h3&gt;
  
  
  Small application
&lt;/h3&gt;

&lt;p&gt;Here is a small Python program that lists your S3 buckets. Save it to a file called &lt;code&gt;gets3buckets.py&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;hello_s3&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;s3_resource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, Amazon S3! Let&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s list your buckets:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;s3_resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;buckets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\t&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;hello_s3&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Copy the certificate, private key, aws-config and the above Python code into a directory.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dockerfile
&lt;/h3&gt;

&lt;p&gt;Create a &lt;code&gt;Dockerfile&lt;/code&gt; in the same directory as the certificates and the other files previously created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; debian:stable-slim&lt;/span&gt;

&lt;span class="k"&gt;ARG&lt;/span&gt;&lt;span class="s"&gt; homedir=/app&lt;/span&gt;

&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;DEBIAN_FRONTEND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;noninteractive apt-get update &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get upgrade &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;        awscli &lt;span class="se"&gt;\
&lt;/span&gt;        curl &lt;span class="se"&gt;\
&lt;/span&gt;        python3-boto3 &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;rm&lt;/span&gt; &lt;span class="nt"&gt;-rf&lt;/span&gt; /var/lib/apt/lists/&lt;span class="k"&gt;*&lt;/span&gt;

&lt;span class="c"&gt;# Download AWS Signing Helper&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; /usr/local/bin &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; curl &lt;span class="nt"&gt;-LO&lt;/span&gt; https://rolesanywhere.amazonaws.com/releases/1.1.1/X86_64/Linux/aws_signing_helper &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;chmod &lt;/span&gt;0755 aws_signing_helper

&lt;span class="c"&gt;# Create user to run the app&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;adduser &lt;span class="nt"&gt;--system&lt;/span&gt; &lt;span class="nt"&gt;--home&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$homedir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--no-create-home&lt;/span&gt; &lt;span class="nt"&gt;--shell&lt;/span&gt; /bin/false userapp
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$homedir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;chown &lt;/span&gt;userapp &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$homedir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# After this everything is run under the user&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; userapp&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=userapp --chmod=0600 ./app-cert.pem "$homedir"&lt;/span&gt;
&lt;span class="c"&gt;# This should never be copied inside the image. It should be mounted from outside&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=userapp --chmod=0600 ./app-private-nopass.key "$homedir"&lt;/span&gt;


&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$homedir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/.aws &lt;span class="se"&gt;\
&lt;/span&gt;    &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;chmod &lt;/span&gt;0700 &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$homedir&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;/.aws

&lt;span class="c"&gt;# Copy the aws-config &lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=userapp --chmod=0644 ./aws-config "$homedir"/.aws/config&lt;/span&gt;

&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --chown=userapp --chmod=0755 gets3buckets.py "$homedir"&lt;/span&gt;

&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; "$homedir"&lt;/span&gt;

&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; python3 /app/gets3buckets.py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: The private key should never be baked into the container except when testing locally. The private key should be in a secret store like AWS Secrets Manager and then mounted from there when using inside Kubernetes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Build the container
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker build &lt;span class="nt"&gt;-t&lt;/span&gt; s3rolesanywhere:test &lt;span class="nb"&gt;.&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the container runs, it uses the certificate and the private key with the aws_signing_helper tool to assume the role and obtain short-lived session credentials. It then prints the S3 buckets every 5 seconds.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-ti&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; s3rolesanywhere:test 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output should be something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;Hello, Amazon S3! Let&lt;span class="s1"&gt;'s list your buckets:
    private-ca-crl-xxxxxxxx
    ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Revoking certificate
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Get certificate serial number
&lt;/h3&gt;

&lt;p&gt;To revoke a certificate, you need its serial number. You can get it from the certificate with the &lt;code&gt;openssl&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openssl x509 &lt;span class="nt"&gt;-in&lt;/span&gt; app-cert.pem &lt;span class="nt"&gt;-noout&lt;/span&gt; &lt;span class="nt"&gt;-serial&lt;/span&gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Revoke certificate in AWS Private CA
&lt;/h3&gt;

&lt;p&gt;Then, you can revoke the certificate in AWS Private CA with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws acm-pca revoke-certificate &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--certificate-authority-arn&lt;/span&gt; &amp;lt;ARN of CA&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--certificate-serial&lt;/span&gt; &amp;lt;Serial of the cert to revoke&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--revocation-reason&lt;/span&gt; &lt;span class="s2"&gt;"&amp;lt;Reason for revoking&amp;gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: Only these are valid values for &lt;code&gt;--revocation-reason&lt;/code&gt;: AFFILIATION_CHANGED, CESSATION_OF_OPERATION, A_A_COMPROMISE, PRIVILEGE_WITHDRAWN, SUPERSEDED, UNSPECIFIED, KEY_COMPROMISE, CERTIFICATE_AUTHORITY_COMPROMISE&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It may take some time for the CRL file to appear.&lt;/p&gt;

&lt;h3&gt;
  
  
  Download AWS Private CA CRL file
&lt;/h3&gt;

&lt;p&gt;You can then download the CRL from the S3 bucket configured for CRL in the AWS Private CA. &lt;/p&gt;

&lt;p&gt;List CRL files in the Private CA S3 Bucket (&lt;code&gt;private_ca_s3_bucket&lt;/code&gt; is the output from Terraform)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;ls &lt;/span&gt;s3://&amp;lt;private_ca_s3_bucket&amp;gt;/crl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get the CRL file from the S3 bucket&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws s3 &lt;span class="nb"&gt;cp &lt;/span&gt;s3://&amp;lt;private_ca_s3_bucket&amp;gt;/crl/xxxxxxxxx.crl &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This file is in DER format, and it needs to be converted to PEM format for AWS Roles Anywhere to accept it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;openssl crl &lt;span class="nt"&gt;-inform&lt;/span&gt; DER &lt;span class="nt"&gt;-in&lt;/span&gt; xxxxxxxx.crl &lt;span class="nt"&gt;-outform&lt;/span&gt; PEM &lt;span class="nt"&gt;-out&lt;/span&gt; privateca.crl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Import CRL to AWS Roles Anywhere
&lt;/h3&gt;

&lt;p&gt;The CRL file in PEM format then needs to be imported into AWS Roles Anywhere for it to deny access with the revoked certificate:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aws rolesanywhere import-crl &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--crl-data&lt;/span&gt; privateca.crl &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--name&lt;/span&gt; &amp;lt;give some name&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
 &lt;span class="nt"&gt;--trust-anchor-arn&lt;/span&gt; &amp;lt;ARN of the Turst Anchor&amp;gt; &lt;span class="nt"&gt;--enabled&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: You can only import 2 CRLs. Importing a CRL with the same name doesn't overwrite the existing one, so you need to remove old CRLs manually. Check the &lt;code&gt;aws rolesanywhere list-crls&lt;/code&gt; and &lt;code&gt;aws rolesanywhere delete-crl&lt;/code&gt; AWS CLI commands.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After some time, access with that certificate should stop working. The error will look something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;botocore.exceptions.CredentialRetrievalError: Error when 
retrieving credentials from custom-process: 
2024/02/16 11:11:34 AccessDeniedException: Certificate revoked
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;It is a pretty neat way of giving AWS access to applications and servers outside of AWS, and more secure than creating normal, long-term user credentials for applications. If a session is hijacked in transit, it is only valid for a short while instead of potentially being live for months or years. And if the private key is compromised, it only grants access to the services allowed by the role's policies.&lt;/p&gt;

&lt;p&gt;But revoking certificates is not all that straightforward. The CA maintains a list called the Certificate Revocation List (CRL), which needs to be imported into AWS Roles Anywhere separately. AWS Roles Anywhere doesn’t have any automated CRL update mechanism, even though AWS Private CA can push the CRL to S3. The CRL can only be imported through the AWS CLI or API; Terraform, for example, does not yet have that capability.&lt;/p&gt;

&lt;p&gt;I feel that unless a company is already heavily invested in certificates and PKI systems, AWS Roles Anywhere brings unneeded complications on top of all the other user management, especially as Terraform doesn't support &lt;code&gt;import-crl&lt;/code&gt; yet, and working with AWS Private CA or OpenSSL requires more manual work than I think is necessary.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>iam</category>
      <category>rolesanywhere</category>
    </item>
    <item>
      <title>Crossplane: How do providers work</title>
      <dc:creator>Joonas Venäläinen</dc:creator>
      <pubDate>Tue, 17 Oct 2023 07:16:35 +0000</pubDate>
      <link>https://dev.to/polarsquad/crossplane-how-do-providers-work-2fda</link>
      <guid>https://dev.to/polarsquad/crossplane-how-do-providers-work-2fda</guid>
      <description>&lt;p&gt;Providers are the meat around Crossplane’s bones, and they are used to extend the capabilities of Crossplane. When Crossplane is installed, it doesn't have any capabilities to interact with external systems. A core Crossplane pod will only watch the following resources.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;compositeresourcedefinitions.apiextensions.crossplane.io
compositionrevisions.apiextensions.crossplane.io         
compositions.apiextensions.crossplane.io                 
configurationrevisions.pkg.crossplane.io                 
configurations.pkg.crossplane.io                         
controllerconfigs.pkg.crossplane.io                      
locks.pkg.crossplane.io                                  
providerrevisions.pkg.crossplane.io                      
providers.pkg.crossplane.io                              
storeconfigs.secrets.crossplane.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you install a provider, a new pod is created in Crossplane's installation namespace. This pod is a Kubernetes controller that watches the CRDs installed as part of the provider package.&lt;/p&gt;
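&lt;p&gt;Once a provider is installed, you can see the CRDs it registered by filtering on the provider's API group, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# List the CRDs brought in by the GCP provider packages
kubectl get crds | grep gcp.upbound.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;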

&lt;p&gt;To find out what different kinds of providers are available, you can check the &lt;a href="https://marketplace.upbound.io/" rel="noopener noreferrer"&gt;Upbound Marketplace&lt;/a&gt; and &lt;a href="https://github.com/crossplane-contrib" rel="noopener noreferrer"&gt;crossplane-contrib&lt;/a&gt; repository. For this series, we are going to work with the following providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://marketplace.upbound.io/providers/upbound/provider-gcp-storage/v0.37.0" rel="noopener noreferrer"&gt;provider-gcp-storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://marketplace.upbound.io/providers/upbound/provider-gcp-cloudplatform/v0.37.0" rel="noopener noreferrer"&gt;provider-gcp-cloudplatform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://marketplace.upbound.io/providers/upbound/provider-terraform/v0.10.0" rel="noopener noreferrer"&gt;provider-terraform&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Those GCP providers are installed from the &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-family-gcp/v0.37.0/docs" rel="noopener noreferrer"&gt;provider-family-gcp&lt;/a&gt; package. These provider-family packages are special packages that let you install only the provider packages you need instead of everything; installing the full &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-gcp/v0.37.0/docs" rel="noopener noreferrer"&gt;provider-gcp&lt;/a&gt; package would mean 343 CRDs. Crossplane also states:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;On average, 30 CRDs are used from Provider packages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Looking at the average number, you would still have ~313 CRDs in the cluster that aren't used 🤯.&lt;/p&gt;

&lt;p&gt;Install the providers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt;EOF | kubectl apply --filename=-&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pkg.crossplane.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;provider-gcp-storage&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xpkg.upbound.io/upbound/provider-gcp-storage:v0.36.0&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pkg.crossplane.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;provider-gcp-cloudplatform&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xpkg.upbound.io/upbound/provider-gcp-cloudplatform:v0.36.0&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pkg.crossplane.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;provider-terraform&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xpkg.upbound.io/upbound/provider-terraform:v0.10.0&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a little while, you should see the providers installed and in a healthy state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get provider
---
NAME                          INSTALLED   HEALTHY   PACKAGE                                                      AGE
provider-gcp-cloudplatform    True        True      xpkg.upbound.io/upbound/provider-gcp-cloudplatform:v0.36.0   116s
provider-gcp-storage          True        True      xpkg.upbound.io/upbound/provider-gcp-storage:v0.36.0         116s
provider-terraform            True        True      xpkg.upbound.io/upbound/provider-terraform:v0.10.0           116s
upbound-provider-family-gcp   True        True      xpkg.upbound.io/upbound/provider-family-gcp:v0.37.0          107s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that the providers are installed and ready, we need to set up a ProviderConfig, which configures the credentials the provider uses to interact with external systems, in this case Google Cloud. You can have multiple ProviderConfigs and reference them from managed resources using &lt;code&gt;providerConfigRef&lt;/code&gt;. ProviderConfigs are cluster-scoped resources.&lt;/p&gt;

&lt;p&gt;You can set up a ProviderConfig per tenant when you have a multi-tenant cluster. When creating compositions, you could patch the value of &lt;code&gt;providerConfigRef&lt;/code&gt; in managed resources with the value of &lt;code&gt;spec.claimRef.namespace&lt;/code&gt;, which points to the namespace where the XRC was created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi6k0b4x3rps2ju1hxub.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdi6k0b4x3rps2ju1hxub.png" alt="Multi-tenant providerconfig"&gt;&lt;/a&gt;&lt;/p&gt;
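&lt;p&gt;As a sketch, a managed resource selects its ProviderConfig through &lt;code&gt;providerConfigRef&lt;/code&gt;; the &lt;code&gt;tenant-a&lt;/code&gt; config name below is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: storage.gcp.upbound.io/v1beta1
kind: Bucket
metadata:
  name: tenant-a-bucket
spec:
  forProvider:
    location: US
  # Use the credentials from the ProviderConfig named "tenant-a"
  providerConfigRef:
    name: tenant-a
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;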

&lt;p&gt;Every provider has its own individual ProviderConfig settings. For the GCP provider, you can find all the available configuration options &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-family-gcp/v0.37.0/resources/gcp.upbound.io/ProviderConfig/v1beta1" rel="noopener noreferrer"&gt;here&lt;/a&gt; and for the Terraform provider &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-terraform/v0.10.0/resources/tf.upbound.io/ProviderConfig/v1beta1" rel="noopener noreferrer"&gt;here&lt;/a&gt;. If you need to override controller-related settings, e.g. the ServiceAccount, you can use &lt;a href="https://doc.crds.dev/github.com/crossplane/crossplane/pkg.crossplane.io/ControllerConfig/v1alpha1" rel="noopener noreferrer"&gt;ControllerConfig&lt;/a&gt; for that.&lt;/p&gt;

&lt;p&gt;In upcoming chapters, we will create resources in Google Cloud: a bucket, a service account, an IAM binding, and a service account key. Use the following to configure a new service account with the needed permissions in GCP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# GCP Project ID
PROJECT_ID=""

gcloud iam service-accounts create crossplane-sa-demo --display-name "Crossplane Service Account Demo"

gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:crossplane-sa-demo@$PROJECT_ID.iam.gserviceaccount.com --role roles/storage.admin
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:crossplane-sa-demo@$PROJECT_ID.iam.gserviceaccount.com --role roles/iam.serviceAccountAdmin
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:crossplane-sa-demo@$PROJECT_ID.iam.gserviceaccount.com --role roles/iam.serviceAccountKeyAdmin
gcloud projects add-iam-policy-binding $PROJECT_ID --member serviceAccount:crossplane-sa-demo@$PROJECT_ID.iam.gserviceaccount.com --role roles/storage.iamMember
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a service account key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud iam service-accounts keys create credentials.json --iam-account=crossplane-sa-demo@$PROJECT_ID.iam.gserviceaccount.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create a Kubernetes secret in &lt;code&gt;crossplane-system&lt;/code&gt; namespace that contains the previously created credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create secret generic gcp-creds --from-file=creds=./credentials.json -n crossplane-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Create ProviderConfig that uses these credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt;EOF | kubectl apply --filename=-&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp.upbound.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ProviderConfig&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;projectID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$PROJECT_ID&lt;/span&gt;
  &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Secret&lt;/span&gt;
    &lt;span class="na"&gt;secretRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-creds&lt;/span&gt;
    &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;crossplane-system&lt;/span&gt;
    &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;creds&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you run this inside GKE, using Workload Identity for authentication is much better. You can find detailed instructions for it &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-family-gcp/v0.37.0/docs/configuration" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;You can also read the secret from the filesystem using &lt;code&gt;fs&lt;/code&gt;. This can come in handy when you are leveraging, for example, HashiCorp Vault with the Vault Agent sidecar to inject secrets into pods. Here is a quick example of how you would configure it, without going into too much detail about working with the Vault Agent Injector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pkg.crossplane.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ControllerConfig&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-config&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;vault.hashicorp.com/agent-inject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;vault.hashicorp.com/role&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;crossplane-providers"&lt;/span&gt;
    &lt;span class="s"&gt;...&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pkg.crossplane.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provider&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;provider-gcp-storage&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;package&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;xpkg.upbound.io/upbound/provider-gcp-storage:v0.36.0&lt;/span&gt;
  &lt;span class="na"&gt;controllerConfigRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp-config&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcp.upbound.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ProviderConfig&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;projectID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$PROJECT_ID&lt;/span&gt;
  &lt;span class="na"&gt;credentials&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Filesystem&lt;/span&gt;
    &lt;span class="na"&gt;fs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/vault/secrets/gcp-creds&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can quickly test that everything is working by creating a &lt;a href="https://marketplace.upbound.io/providers/upbound/provider-gcp-storage/v0.37.0/resources/storage.gcp.upbound.io/Bucket/v1beta1" rel="noopener noreferrer"&gt;Bucket&lt;/a&gt; resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;cat &amp;lt;&amp;lt;EOF | kubectl apply --filename=-&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage.gcp.upbound.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Bucket&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ps-bucket-${RANDOM}&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;forProvider&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;location&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;US&lt;/span&gt;
&lt;span class="s"&gt;EOF&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a little while, you should see the bucket resource ready and synced:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get bucket
---
NAME                            READY   SYNCED   EXTERNAL-NAME                  AGE
ps-bucket-30855                 True    True     ps-bucket-30855                15m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, we are ready to start working with GCP using Crossplane. I will go through setting up the Terraform provider configs later in the series when it's time to start working with it.&lt;/p&gt;

&lt;p&gt;Remember to delete the test bucket resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl delete bucket &amp;lt;bucket_name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next chapter quickly reviews available configuration options for managed resources.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>iac</category>
      <category>crossplane</category>
    </item>
    <item>
      <title>Crossplane: Streamline your infrastructure provisioning &amp; management</title>
      <dc:creator>Joonas Venäläinen</dc:creator>
      <pubDate>Tue, 17 Oct 2023 07:16:15 +0000</pubDate>
      <link>https://dev.to/polarsquad/crossplane-streamline-your-infrastructure-provisioning-management-3hni</link>
      <guid>https://dev.to/polarsquad/crossplane-streamline-your-infrastructure-provisioning-management-3hni</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnus3iufx5kdrnl5k902.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcnus3iufx5kdrnl5k902.png" alt="Crossplane architecture"&gt;&lt;/a&gt;&lt;br&gt;
Crossplane is an extension to Kubernetes that transforms your Kubernetes cluster into a universal control plane. It allows you to manage anything that has an API available, with the help of provider packages. It's also fully extensible, so you can build your own providers to support your own APIs. Everybody likes 🍕, so here is a post, &lt;a href="https://blog.crossplane.io/providers-101-ordering-pizza-with-kubernetes-and-crossplane" rel="noopener noreferrer"&gt;providers-101-ordering-pizza-with-kubernetes-and-crossplane&lt;/a&gt;, which walks through at a high level how to build a provider capable of ordering pizza from inside a Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;Crossplane is often used to interact with cloud providers like Azure, AWS, and GCP. By using Crossplane, you can bring your infrastructure management into Kubernetes. Another significant benefit is that Crossplane acts as a &lt;a href="https://kubernetes.io/docs/concepts/architecture/controller/" rel="noopener noreferrer"&gt;Kubernetes Controller&lt;/a&gt;, constantly monitoring the state of external resources. If someone modifies or deletes a resource outside of Kubernetes, Crossplane reconciles it back to the desired state.&lt;/p&gt;

&lt;p&gt;By connecting Kubernetes and the cloud provider APIs with Kubernetes Custom Resource Definitions (CRDs), Crossplane enables a Kubernetes-native approach to managing cloud resources. We can, for example, define a Database custom resource that provisions a database in the cloud provider we have hooked Crossplane into. This makes it easier for developers to start consuming infrastructure resources by hiding the complexity behind Crossplane compositions. The Ops team can create and maintain these compositions and the &lt;a href="https://docs.crossplane.io/v1.13/concepts/composite-resource-definitions/" rel="noopener noreferrer"&gt;XRDs&lt;/a&gt;, which define what the resource looks like to its consumers. Without going into too much detail, we could have a custom resource &lt;code&gt;Database&lt;/code&gt; that developers can consume.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage.polarsquad.com/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Database&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ps-demo-db-mysql&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ps-demo-db-mysql&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;small"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we define the size as &lt;code&gt;small&lt;/code&gt;. The Ops team could set small, medium, and large options for the size. Then, in compositions, patch these values to specific instance types depending on the cloud platform. This also enhances the experience for the developers as they don't have to know the specific instance types.&lt;/p&gt;
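&lt;p&gt;To illustrate the idea (this is a sketch, not a composition from this series; the field paths and tier names are assumptions), a composition patch could map the size to a concrete instance type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Fragment of a composition resource entry (hypothetical field paths and tiers)
patches:
  - type: FromCompositeFieldPath
    fromFieldPath: spec.parameters.size
    toFieldPath: spec.forProvider.settings[0].tier
    transforms:
      - type: map
        map:
          small: db-f1-micro
          medium: db-custom-2-7680
          large: db-custom-4-15360
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;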

&lt;p&gt;As the resources are native Kubernetes manifests, you can bundle them with the other manifests you use to deploy the application to the cluster. When the resources are created, Crossplane will create the connection secrets in the application namespace for pods to consume.&lt;/p&gt;
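&lt;p&gt;For illustration, a claim can ask Crossplane to write those connection details to a named secret in its own namespace; the secret name below is hypothetical:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;apiVersion: storage.polarsquad.com/v1alpha1
kind: Database
metadata:
  name: ps-demo-db-mysql
  namespace: my-app
spec:
  # Crossplane writes host, port, and credentials to this secret
  # in the claim's namespace
  writeConnectionSecretToRef:
    name: ps-demo-db-mysql-conn
  parameters:
    name: ps-demo-db-mysql
    size: "small"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;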

&lt;p&gt;Overall, Crossplane allows you to form your own cloud platform inside Kubernetes. With the help of compositions, you can build a self-service platform where developers can easily create resources on demand when they need them, without having to jump through hoops to get additional backing services.&lt;/p&gt;

&lt;p&gt;Throughout the series, I will be using the terms &lt;code&gt;XRD&lt;/code&gt;, &lt;code&gt;XR&lt;/code&gt;, and &lt;code&gt;XRC&lt;/code&gt;. Here is a quick overview of what they stand for.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;XRD&lt;/code&gt; - Composite Resource Definition&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;XR&lt;/code&gt; - Composite Resource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;XRC&lt;/code&gt; - Composite Resource Claim&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Prerequisites:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Access to Google Cloud and &lt;a href="https://cloud.google.com/sdk/docs/install" rel="noopener noreferrer"&gt;gcloud cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kubernetes cluster eg. &lt;a href="https://minikube.sigs.k8s.io/docs/start/" rel="noopener noreferrer"&gt;minikube&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;&lt;a href="https://helm.sh/docs/intro/install/" rel="noopener noreferrer"&gt;Helm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Access to &lt;a href="https://aiven.io/" rel="noopener noreferrer"&gt;Aiven&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Aiven offers a free tier so you can create an account and use it to get through the tutorial.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Install Crossplane
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add crossplane-stable https://charts.crossplane.io/stable
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install crossplane --namespace crossplane-system --create-namespace crossplane-stable/crossplane
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
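&lt;p&gt;As a quick sanity check (pod names will vary), verify that the Crossplane pods come up in the &lt;code&gt;crossplane-system&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods --namespace crossplane-system
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;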



&lt;p&gt;At this point, Crossplane is ready, and the next step is to install the needed &lt;a href="https://docs.crossplane.io/latest/concepts/providers/" rel="noopener noreferrer"&gt;Providers&lt;/a&gt; that give Crossplane the capability to provision managed resources in external systems.&lt;/p&gt;

&lt;p&gt;In this series, we will provision resources to Google Cloud using the Google Cloud providers and, later in the series, leverage the Terraform provider to manage resources that don't yet have a native Crossplane provider available.&lt;/p&gt;

&lt;p&gt;Links to each part of this series:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/crossplane-how-do-providers-work-2fda"&gt;Crossplane: How do providers work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>iac</category>
      <category>crossplane</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Grafana</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:28:01 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3</guid>
      <description>&lt;p&gt;Grafana is the industry standard open-source product for visualising metrics stored in a TSDB format, or a variety of other data sources. With Grafana, we can create dashboards, queries, and alerts from the data that we have. With all our metrics in long-term storage, we can use a single data source to access all the metrics from all our infrastructure that uses the metrics platform. This enables easily creating dashboards that aggregate data from multiple different Kubernetes clusters, and enable drilling down to a single resource easily.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;Next, we will set up a Grafana instance into our minikube and use Promxy as the default data source. This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;base64&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we add the Grafana Helm chart repository; we will then install the chart into the &lt;code&gt;grafana&lt;/code&gt; namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add grafana https://grafana.github.io/helm-charts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we define Promxy as the data source. In the Helm values file, we need the following block to do this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;datasources.yaml:
  apiVersion: 1
  datasources:
  - name: Promxy
    type: prometheus
    url: "http://promxy.promxy.svc.cluster.local:8082"
    isDefault: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are using the &lt;code&gt;svc.cluster.local&lt;/code&gt; address for the Promxy service, because all our services are inside the cluster.&lt;/p&gt;

&lt;p&gt;I have converted the above into JSON so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install grafana grafana/grafana --create-namespace --namespace grafana --set-json 'datasources={"datasources.yaml":{"apiVersion":1,"datasources":[{"name":"Promxy","type":"prometheus","url":"http://promxy.promxy.svc.cluster.local:8082","isDefault":true}]}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to get the password for the admin user:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can port-forward the Grafana service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n grafana services/grafana 9090:80
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; to access the web UI and log in with the username &lt;code&gt;admin&lt;/code&gt; and the password acquired in the previous step. From there you can verify that Promxy is set up and acting as the default data source by navigating to Administration -&amp;gt; Data sources -&amp;gt; Promxy and clicking the Test button at the bottom of the page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdsik76cty2tbpisxi2lj.png" alt="Grafana UI" width="639" height="329"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdtnmxyfaxxyblg43oerc.png" alt="Data source test" width="276" height="140"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Assuming the test was successful, we can then navigate to the “Explore” item in the menu&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvcog4aklr3ykw4kblf0y.png" alt="Grafana Explore tab" width="258" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and check that we have metrics available in the “metrics explorer” section. Alternatively, we can use the following query to check that metrics are available:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;sum(kube_pod_container_status_restarts_total) by (namespace, container)&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;N.B. You might have to widen the time range of the query to get results:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf3d89qret3ve8h0sqgt.png" alt="Metrics in Grafana" width="777" height="404"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Grafana as the metrics visualisation tool for our metrics platform. This enables us to create dashboards and Grafana alerts for metrics from all sources sending metrics to our long-term storage cluster (or clusters, if we have multiple regions), queried through Promxy.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>grafana</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Application metrics</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:31 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024</guid>
<description>&lt;p&gt;When writing our own applications, we need a metrics library to define the metrics and then increment them inside our application functions. With Go, for example, we can use the Prometheus client library, which exposes the metrics on the /metrics endpoint. If our application runs in a Kubernetes cluster with a prometheus-operator, we can use a ServiceMonitor to scrape its metrics. If we don’t have that possibility, we can instead have the application push metrics straight to our long-term storage solution. For VictoriaMetrics, we can use the &lt;a href="https://github.com/VictoriaMetrics/metrics"&gt;github.com/VictoriaMetrics/metrics&lt;/a&gt; library to push the metrics to VictoriaMetrics. Remember to add authentication logic to the code pushing the metrics to the long-term storage, if necessary.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's set up a hello-world Golang application in our cluster, and use ServiceMonitor to send its metrics to Prometheus.&lt;/p&gt;

&lt;p&gt;First, we need to update our kube-prometheus-stack Helm deployment to pick up ServiceMonitor resources with a certain label attached. We need to pass the following value to our Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;prometheus:
  prometheusSpec:
    serviceMonitorSelector:
      matchExpressions:
      - key: app
        operator: Exists
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted that into json so that it can be passed to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set-json 'prometheus.prometheusSpec.serviceMonitorSelector={"matchExpressions":[{"key":"app","operator":"Exists"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will update our kube-prometheus-stack to pick up ServiceMonitor resources from any namespace, as long as they have an &lt;code&gt;app&lt;/code&gt; label attached.&lt;/p&gt;

&lt;p&gt;Next, we are going to create a namespace for our hello-world application which is a simple Golang application exposing metrics via the &lt;a href="https://github.com/prometheus/client_golang/tree/main/prometheus"&gt;Prometheus module&lt;/a&gt;. We will borrow &lt;a href="https://github.com/okteto/go-prometheus-monitoring/blob/master/main.go"&gt;this&lt;/a&gt; already-made application, which has logic defined to increment a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; each time the page is loaded. &lt;/p&gt;

&lt;p&gt;To create a namespace and a pod, we use the following commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl create namespace hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run hello-world --namespace=hello-world --image='okteto/hello-world:golang-metrics' --labels app=hello-world
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we need to create a service for the new pod:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: v1
kind: Service
metadata:
  labels:
    app: hello-world
  name: hello-world
  namespace: hello-world
spec:
  ports:
  - name: http
    port: 8080
  selector:
    app: hello-world
  type: ClusterIP
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can test that our application is working by port-forwarding it. We can also check what the &lt;code&gt;hello_processed_total&lt;/code&gt; metric looks like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n hello-world services/hello-world 9090:8080
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt; and &lt;a href="http://localhost:9090/metrics"&gt;http://localhost:9090/metrics&lt;/a&gt;. You should see a metric called &lt;code&gt;hello_processed_total&lt;/code&gt; with a number attached. Each reload of the page will increment this number.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzbv5q9gyxtv62j2klkuy.png" alt="Metrics" width="616" height="59"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we need to set up a ServiceMonitor to send these metrics to Prometheus:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt;'EOF' | kubectl create -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hello-world
  namespace: hello-world
  labels:
    app: hello-world
spec:
  selector:
    matchLabels:
      app: hello-world
  endpoints:
    - port: http
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This ServiceMonitor will target services matching the label selector (&lt;code&gt;app=hello-world&lt;/code&gt;) and will scrape the port called “http”.&lt;/p&gt;

&lt;p&gt;Now, if we port-forward our Prometheus service, we should see a new service in the service discovery section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n prometheus services/kube-prometheus-stack-prometheus 9090:9090
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to &lt;a href="http://localhost:9090/service-discovery"&gt;http://localhost:9090/service-discovery&lt;/a&gt; and you should see that there is a new service discovered with the name &lt;code&gt;serviceMonitor/hello-world/hello-world/0&lt;/code&gt; and it should show 1/1 active targets.&lt;/p&gt;

&lt;p&gt;We can now query the &lt;code&gt;hello_processed_total&lt;/code&gt; metric:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd1yo9jctopwjomapxjl8.png" alt="Metrics in Prometheus" width="781" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up our custom app, running in its own namespace, to send its metrics to Prometheus.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-grafana-40d3"&gt;Prometheus Observability Platform: Grafana&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>prometheus</category>
      <category>metrics</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Handling multiple regions</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:27:13 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib</guid>
<description>&lt;p&gt;When we have multiple regions, such as the EU and the US, we need a long-term storage solution running in both of them. If we want to combine the results into a single query, we need a query layer that can query both endpoints. One such component is Promxy.&lt;/p&gt;

&lt;p&gt;Promxy uses the same PromQL syntax as Prometheus and we can define server groups with multiple endpoints. In our case, we would define our EU and US long-term storage endpoints under one server group. We can then use the single Promxy endpoint to query both the EU and the US.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can use the Helm chart offered in the &lt;a href="https://github.com/jacksontj/promxy/tree/master"&gt;Promxy repository&lt;/a&gt; to deploy a proxy to our Kubernetes cluster.&lt;/p&gt;

&lt;p&gt;First we clone the repository, because the Helm chart is not published to a public registry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/jacksontj/promxy.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we navigate to the folder containing the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cd promxy/deploy/k8s/helm-charts/promxy/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We set up Promxy with the following &lt;code&gt;server_groups:&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server_groups:
  - static_configs:
    - targets:
      - vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481
      labels:
        region: eu
    scheme: http
    path_prefix: /select/0/prometheus
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted this to json so we can pass it to the Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To install Promxy from the local Helm chart we use the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install promxy . --create-namespace --namespace promxy --set 'image.tag=latest' --set-json 'config.promxy={"server_groups":[{"static_configs":[{"targets":["vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481"],"labels":{"region":"eu"}}],"scheme":"http","path_prefix":"/select/0/prometheus"}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the Promxy service and access the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n promxy services/promxy 9090:8082
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we can run the same query for the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric to verify that Promxy is able to reach the VictoriaMetrics data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd4nhxmxnnfh5mcl0hnku.png" alt="Promxy" width="777" height="438"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If we have more regions than just the EU, we can add them under &lt;code&gt;server_groups&lt;/code&gt; and query multiple VictoriaMetrics instances from a single Promxy source.&lt;/p&gt;
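A second region is just another entry under `server_groups`. A sketch of a two-region configuration follows; the EU target is the one from this demo, while the US address is a hypothetical placeholder:

```yaml
server_groups:
  # EU long-term storage, as configured earlier in this demo
  - static_configs:
      - targets:
          - vmcluster-victoria-metrics-cluster-vmselect.victoriametrics.svc.cluster.local:8481
        labels:
          region: eu
    scheme: http
    path_prefix: /select/0/prometheus
  # Hypothetical US cluster; replace the target with your real endpoint
  - static_configs:
      - targets:
          - vmselect.victoriametrics-us.example.internal:8481
        labels:
          region: us
    scheme: http
    path_prefix: /select/0/prometheus
```

Promxy merges results across the groups, and the `region` label lets us tell the series apart in queries.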

&lt;p&gt;We have now set up Promxy with &lt;code&gt;server_groups&lt;/code&gt; for querying VictoriaMetrics instances.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-application-metrics-2024"&gt;Prometheus Observability Platform: Application metrics&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>promxy</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alert routing</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:26:37 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o</guid>
      <description>&lt;p&gt;Alertmanager is a component usually bundled with Prometheus to handle routing the alerts to receivers such as Slack, e-mail, and PagerDuty. It uses a routing tree to send alerts to one or multiple receivers.&lt;/p&gt;

&lt;p&gt;Routes define which receivers each alert is sent to, and you can define rules for the routes. The rules are evaluated from top to bottom, and alerts are sent to the matching receivers. Usually, the match block is used to match a label name and value for a certain receiver. Notification integrations are configured per receiver; there are multiple options available, such as &lt;code&gt;email_configs&lt;/code&gt;, &lt;code&gt;slack_configs&lt;/code&gt;, and &lt;code&gt;webhook_configs&lt;/code&gt;.&lt;/p&gt;
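As a sketch of how these pieces fit together, a receiver with a notification integration and a route matching on a label could look like this (the receiver names, the `team` label value, and the Slack webhook URL are placeholders):

```yaml
receivers:
  - name: default-receiver
  - name: team-a-slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/placeholder  # placeholder webhook
        channel: '#team-a-alerts'

route:
  receiver: default-receiver   # used when no sub-route matches
  routes:
    - receiver: team-a-slack
      match:
        team: team-a           # alerts labelled team=team-a go to Slack
```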

&lt;p&gt;Alertmanager has a web UI that can be used to view current alerts and silence them if needed.&lt;/p&gt;

&lt;p&gt;With a platform setup, we usually don’t want multiple Alertmanagers, so we disable the provisioning of additional Alertmanagers in Prometheus deployments that would include them automatically. Instead, we run one centralised Alertmanager, for example inside the Kubernetes cluster dedicated to monitoring the platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Amtool (&lt;a href="https://github.com/prometheus/alertmanager#install-1"&gt;https://github.com/prometheus/alertmanager#install-1&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now that we have an alert defined and deployed to vmalert, we can add Alertmanager to our platform. Because we are creating this with a platform aspect in mind, we will install Alertmanager as a separate resource, not as part of the kube-prometheus-stack. We will use a tool called amtool, which is bundled with Alertmanager, to test our alert routing.&lt;/p&gt;

&lt;p&gt;We can install the Alertmanager with the following Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install alertmanager prometheus-community/alertmanager --create-namespace --namespace alertmanager
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now port-forward the alertmanager service and access the Alertmanager web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To trigger a test alert, we can use the following command from another terminal tab while keeping the port-forwarding on:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -H "Content-Type: application/json" -d '[{"labels":{"alertname":"TestAlert"}}]' localhost:9090/api/v1/alerts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to list the currently firing alerts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool alert query --alertmanager.url=http://localhost:9090
---
Alertname   Starts At                Summary  State   
TestAlert   2023-07-07 07:23:55 UTC           active 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's add a test receiver and routing for it. Below is an example of the configuration we want to pass to Alertmanager in Helm values format.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config:
  receivers:
    - name: default-receiver
    - name: test-team-receiver

  route:
    receiver: 'default-receiver'
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 4h
    routes:
      - receiver: 'test-team-receiver'
        matchers:
        - team="test-team"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I have converted the above into a json one-liner so we can pass it into Helm without having to create an intermediate file.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade alertmanager prometheus-community/alertmanager --namespace alertmanager --set-json 'config.receivers=[{"name":"default-receiver"},{"name":"test-team-receiver"}]' --set-json 'config.route={"receiver":"default-receiver","group_wait":"30s","group_interval":"5m","repeat_interval":"4h","routes":[{"receiver":"test-team-receiver","matchers":["team=\"test-team\""]}]}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can now use amtool to test that an alert that has the label &lt;code&gt;team=test-team&lt;/code&gt; gets routed to the test-team-receiver:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test-team
---
test-team-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;amtool config routes test --alertmanager.url=http://localhost:9090 team=test     
---
default-receiver
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We have now set up an Alertmanager which can route alerts depending on the &lt;code&gt;team&lt;/code&gt; label value.&lt;/p&gt;

&lt;p&gt;Next, we need to update vmalert to route alerts to the Alertmanager using the cluster-local address of the alertmanager service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade vmalert vm/victoria-metrics-alert --namespace victoriametrics --reuse-values --set server.notifier.alertmanager.url="http://alertmanager.alertmanager.svc.cluster.local:9093"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we can run a pod that keeps crashing, to increment the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; metric, by creating a pod with a typo in its sleep command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl run crashpod --image busybox:latest --command -- slep 1d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we port-forward the alertmanager service again. We should see the alert when we navigate to &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n alertmanager services/alertmanager 9090:9093
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr3vu2jngob5p7evtrpeq.png" alt="alertmanager" width="778" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up Alertmanager as our tool for routing alerts from the vmalert component.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-handling-multiple-regions-25ib"&gt;Prometheus Observability Platform: Handling multiple regions&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alertmanager</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Alerts</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:36 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb</guid>
<description>&lt;p&gt;With Prometheus, we can use PromQL to write alert rules and evaluate them at the configured evaluation intervals. Alerts have a pending period (the &lt;code&gt;for&lt;/code&gt; field): if the alert condition stays active for that duration, the alert fires. Prometheus is usually bundled with a component called Alertmanager, which routes alerts to different receivers such as Slack and email. Once an alert fires, it is sent to Alertmanager, which uses a routing table to determine whether the alert should be sent to a receiver, and how to route it.&lt;/p&gt;

&lt;p&gt;Prometheus alerts are evaluated against the local storage. With VictoriaMetrics, we can use the vmalert component to evaluate alert rules against the VictoriaMetrics long-term storage using the same PromQL syntax as with Prometheus. It is tempting to write all the alerting rules in VictoriaMetrics, but depending on the size of the infrastructure we might want to evaluate some rules on the Prometheus servers where the data originates from, to avoid overloading VictoriaMetrics.&lt;/p&gt;

&lt;p&gt;Alert rules can be very complex, and it is best to validate them before deploying them to Prometheus. Promtool can be used to validate Prometheus alerting rules and run unit tests on them. You can implement these simple validation and unit testing steps in your continuous integration (CI) system.&lt;/p&gt;
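As a sketch of such a CI step (assuming a GitLab-style pipeline, rule and test files named as in the demo, and the prom/prometheus image, which ships promtool; all of these names are assumptions, not part of this setup):

```yaml
# Hypothetical CI job; image and file names are assumptions.
validate-alerts:
  image:
    name: prom/prometheus:latest
    entrypoint: [""]   # bypass the Prometheus entrypoint so we can run promtool
  script:
    - promtool check rules kube-alert.rules.yml
    - promtool test rules kube-alert.test.yml
```

Failing either command fails the pipeline, so broken rules never reach the server.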

&lt;p&gt;A good monitoring platform enables teams to write their own alerts against the metrics stored in the long-term storage. We can do this in a mono-repository or multi-repository fashion. With a mono-repository, we have all the infrastructure and the alerting defined in the same repository and pipelines delivering them to servers. A multi-repository approach would set up a separate repository for the alerts, where we define the alerting rules using PromQL, and add validation and unit tests.&lt;/p&gt;

&lt;p&gt;The main benefit of the multi-repository approach is reduced cognitive load. The contributors do not see, or need to be aware of, anything other than the alert rules. This also eliminates the possibility of introducing bugs into the underlying infrastructure. The downside of this approach is tying the separated alerting configuration back to the Prometheus server.&lt;/p&gt;

&lt;p&gt;Terraform can be used to set up the alerting repository as a remote module, pulling the alerting rules into the server when deploying it. With a mono-repository, we can more easily tie the alerts to the Prometheus server, but if we are using Terraform, we need to either split the alerts into their own state or accept that contributors might affect more resources than just the alerts, which might also cause more anxiety for the contributors.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;This example assumes that you have completed the following steps, as the components from those are needed:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj"&gt;Prometheus Observability Platform: Long-term storage&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Promtool (&lt;a href="https://github.com/prometheus/prometheus/tree/main#building-from-source"&gt;https://github.com/prometheus/prometheus/tree/main#building-from-source&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;yq (optional)&lt;/li&gt;
&lt;li&gt;jq (optional)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Figuring out suitable metrics for alerts can be hard. The &lt;a href="https://samber.github.io/awesome-prometheus-alerts/"&gt;awesome-prometheus-alerts&lt;/a&gt; website is an excellent source for inspiration for this. It has a collection of pre-made alerts using the PromQL syntax. For example, we can set up an alert for crash-looping Kubernetes pods, with the alert named &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Below is an example unit test for the &lt;code&gt;KubernetesPodCrashLooping&lt;/code&gt; alert. First, we want to simplify the alert a little and add some blocks that promtool requires to validate the rule. This file is saved as &lt;code&gt;kube-alert.rules.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;groups:
  - name: kube-alerts
    rules:
    - alert: KubernetesPodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can use the command &lt;code&gt;promtool check rules kube-alert.rules.yml&lt;/code&gt; to validate the rule. If everything is OK, the response looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool check rules kube-alert.rules.yml
---
Checking kube-alert.rules.yml
  SUCCESS: 1 rules found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To write a unit test for this alert, we create a file called &lt;code&gt;kube-alert.test.yml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rule_files:
  - kube-alert.rules.yml

evaluation_interval: 1m

tests:
  - interval: 1m

    input_series:
      - series: kube_pod_container_status_restarts_total{namespace="test-namespace",pod="test-pod"}
        values: '1+2x15'

    alert_rule_test:
      - alertname: KubernetesPodCrashLooping
        eval_time: 15m
        exp_alerts:
          - exp_labels:
              severity: warning
              namespace: test-namespace
              pod: test-pod
            exp_annotations:
              summary: Pod test-namespace/test-pod is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we are expecting an increase of more than 2 in the &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; time series within 5 minutes, with that condition staying active for at least 10 minutes. In the alert, we expect to receive namespace and pod labels, and a severity label with the value “warning”.&lt;/p&gt;

&lt;p&gt;To write a test for this rule, we need to create an input series that can trigger the rule and carries the labels needed for the summary field. Because our evaluation time is 15 minutes, we need at least 15 samples in our series. The syntax &lt;code&gt;‘1+2x15’&lt;/code&gt; starts at 1 and adds 2 to the previous value 15 times, producing the series 1 3 5 … 31. We also pass the required namespace and pod labels and write the expected summary field response.&lt;/p&gt;

&lt;p&gt;To run the unit test we use the command &lt;code&gt;promtool test rules kube-alert.test.yml&lt;/code&gt;, which will return the following response if all went well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;promtool test rules kube-alert.test.yml
---
Unit Testing:  kube-alert.test.yml
  SUCCESS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we need to deploy vmalert so that we can evaluate alert rules against the data in the long-term storage.&lt;/p&gt;

&lt;p&gt;First we have to convert our alert rule into a format that works with Helm. The problem is that the &lt;code&gt;groups:&lt;/code&gt; key is expected both by promtool and by the Helm chart's values, so passing the file to Helm as-is would nest &lt;code&gt;groups:&lt;/code&gt; twice; but if we remove it from the file, promtool no longer works. There are multiple ways to handle this, for example the Terraform &lt;code&gt;trimprefix()&lt;/code&gt; function, which can strip the &lt;code&gt;groups:&lt;/code&gt; prefix from the alert rules. For this use case we are going to use a monstrous one-liner that removes the &lt;code&gt;groups:&lt;/code&gt; key, converts the output into JSON, and compacts it into a single line so we can pass it to Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat kube-alert.rules.yml | sed '/groups:/d' | yq -o=json | jq -c
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This gives us the following single-line JSON string:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
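&lt;p&gt;As an aside, in Terraform one could also extract the groups list with &lt;code&gt;yamldecode()&lt;/code&gt; instead of string-trimming. A hypothetical sketch (the local names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Sketch: parse the rules file and pull out the list under "groups:",
# so the original file keeps working with promtool unchanged.
locals {
  rules_yaml   = file("${path.module}/kube-alert.rules.yml")
  rules_groups = yamldecode(local.rules_yaml)["groups"]
}

# e.g. pass jsonencode(local.rules_groups) as the chart value
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;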



&lt;p&gt;Now we can deploy the vmalert Helm chart:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmalert vm/victoria-metrics-alert --namespace victoriametrics --set 'server.notifier.alertmanager.url=http://localhost:9093' --set 'server.datasource.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set 'server.remote.write.url=http://vmcluster-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus' --set 'server.remote.read.url=http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus' --set-json 'server.config.alerts.groups=[{"name":"kube-alerts","rules":[{"alert":"KubernetesPodCrashLooping","expr":"increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2","for":"10m","labels":{"severity":"warning"},"annotations":{"summary":"Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping"}}]}]'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;server.notifier.alertmanager.url:&lt;/code&gt; A placeholder value for now, as the chart cannot be installed without providing one&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.datasource.url:&lt;/code&gt; Prometheus HTTP API compatible datasource&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.write.url:&lt;/code&gt; Remote write URL for storing rule results and alert states&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;server.remote.read.url:&lt;/code&gt; URL to restore the alert states from&lt;/li&gt;
&lt;/ul&gt;
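&lt;p&gt;The same configuration can be kept in a values file instead of &lt;code&gt;--set&lt;/code&gt; flags (a sketch; save it as e.g. &lt;code&gt;vmalert-values.yaml&lt;/code&gt; and pass it with &lt;code&gt;helm install -f vmalert-values.yaml&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;server:
  notifier:
    alertmanager:
      url: http://localhost:9093
  datasource:
    url: http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus
  remote:
    write:
      url: http://vmcluster-victoria-metrics-cluster-vminsert:8480/insert/0/prometheus
    read:
      url: http://vmcluster-victoria-metrics-cluster-vmselect:8481/select/0/prometheus
  config:
    alerts:
      groups:
        - name: kube-alerts
          rules:
            - alert: KubernetesPodCrashLooping
              expr: increase(kube_pod_container_status_restarts_total[5m]) &amp;gt; 2
              for: 10m
              labels:
                severity: warning
              annotations:
                summary: Pod {{$labels.namespace}}/{{$labels.pod}} is crash looping
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;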

&lt;p&gt;We can now port-forward the vmalert service and navigate to the web UI at &lt;a href="http://localhost:9090"&gt;http://localhost:9090&lt;/a&gt;&lt;/p&gt;
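&lt;p&gt;The service name below is what the chart generates by default in this setup; adjust it if yours differs. vmalert listens on port 8880 by default:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmalert-victoria-metrics-alert-server 9090:8880
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;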

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1atpvxvxhmrfy192ikqj.png" alt="alert" width="783" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now created an alert rule, written a unit test for it, and deployed vmalert with the rule in place.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alert-routing-139o"&gt;Prometheus Observability Platform: Alert routing&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kubernetes</category>
      <category>alerting</category>
      <category>observability</category>
    </item>
    <item>
      <title>Prometheus Observability Platform: Long-term storage</title>
      <dc:creator>Aleksi Waldén</dc:creator>
      <pubDate>Thu, 14 Sep 2023 10:25:05 +0000</pubDate>
      <link>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</link>
      <guid>https://dev.to/polarsquad/prometheus-observability-platform-long-term-storage-4cbj</guid>
      <description>&lt;p&gt;As Prometheus is not so well designed for persisting data, a long-term storage solution is called for. Multiple different products can handle long-term storage for Prometheus metrics for example VictoriaMetrics, Grafana Mimir, Thanos, and M3.&lt;/p&gt;

&lt;p&gt;With some of these options, we get the capability to store the data in object storage, which is ideal for modern workloads running in Kubernetes, as we don’t want to keep any persistent data inside our cluster. Object storage can be, for example, Azure Blob Storage or AWS S3. This option, however, comes with a performance penalty compared to block storage, so if you have high performance requirements, you may have to look into block storage options.&lt;/p&gt;

&lt;p&gt;In this document, we will be focusing on VictoriaMetrics. It was chosen because it is open-source, highly performant, and all its crucial components are free. VictoriaMetrics only supports block storage, but it is also very fast thanks to a simple architecture designed for local storage. It can be run in single-node or cluster mode. The central part of the architecture consists of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The vmstorage component, which stores the time series data;&lt;/li&gt;
&lt;li&gt;vmselect, used to fetch and merge data from vmstorage; and&lt;/li&gt;
&lt;li&gt;vminsert, which inserts the data into vmstorage nodes.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the clustered version, data is distributed evenly across the vmstorage nodes by the vminsert component, and the distributed data is then fetched and merged by the vmselect component. In Kubernetes, each of these components will have its own pod and the vmselect and vminsert components will have a service to load balance the traffic. All the vmstorage endpoints (pods) will be connected to the vminsert and vmselect pods.&lt;/p&gt;

&lt;p&gt;VictoriaMetrics also has multiple additional features, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the vmalert component, which can be used for alerting on the data;&lt;/li&gt;
&lt;li&gt;vmagent, which can be used as a data ingestion point and for filtering and re-labelling metrics; and&lt;/li&gt;
&lt;li&gt;the vmauth component for simple authentication, which uses credentials from the Authorization header. (You can also put some other component, such as oauth2-proxy, in front of vminsert or vmagent to handle authentication.)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fju9mnto04ijmfyqba1fs.png" alt="Basic architecture" width="708" height="740"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can set up Prometheus to write the data it receives into the long-term storage using the &lt;code&gt;remote_write&lt;/code&gt; block in the configuration. If authentication is set up, it also needs to be defined in the &lt;code&gt;remote_write&lt;/code&gt; block.&lt;/p&gt;
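&lt;p&gt;In raw Prometheus configuration this looks roughly like the following sketch (the URL matches the demo below; the &lt;code&gt;basic_auth&lt;/code&gt; block and its credentials are illustrative and only needed if something like vmauth sits in front of vminsert):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remote_write:
  - url: http://vmcluster-victoria-metrics-cluster-vminsert.victoriametrics.svc.cluster.local:8480/insert/0/prometheus/
    # Only needed when authentication is enabled in front of vminsert:
    basic_auth:
      username: example-user
      password: example-password
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;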

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;We will now set up the following architecture with minikube, Prometheus, and VictoriaMetrics. This example assumes that you have completed the steps from &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-prometheus-1019"&gt;Prometheus Observability Platform: Prometheus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbi988mm3bbk13219fo2r.png" alt="Demo architecture" width="523" height="752"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First we add the VictoriaMetrics Helm chart repository and install the cluster chart into the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm repo add vm https://victoriametrics.github.io/helm-charts/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm install vmcluster vm/victoria-metrics-cluster --create-namespace --namespace victoriametrics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We should now see six pods running in the victoriametrics namespace:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get pods -n victoriametrics
---
NAME                                                           READY   STATUS    RESTARTS   AGE
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-gqx25    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vminsert-f8d48695c-t8kcn    1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-42wjs   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmselect-77465fb479-t2jhp   1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-0                 1/1     Running   0          58s
vmcluster-victoria-metrics-cluster-vmstorage-1                 1/1     Running   0          58s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To access the vmselect web UI we need to port forward the vmselect service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then navigate to &lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt; to access the vmselect VMUI. The URL follows the clustered URL format, where the 0 represents the accountID of the tenant.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwa28olty3pma1bdbrspz.png" alt="VMUI" width="780" height="322"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To set up remote writing from Prometheus into VictoriaMetrics, we need to upgrade our kube-prometheus-stack deployment with the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack --namespace prometheus --reuse-values --set 'prometheus.prometheusSpec.remoteWrite[0].url=http://vmcluster-victoria-metrics-cluster-vminsert.victoriametrics.svc.cluster.local:8480/insert/0/prometheus/'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s break down the URL provided above. First, we have the service name for vminsert, which you can find with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get svc -n victoriametrics 
---
NAME                                           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
vmcluster-victoria-metrics-cluster-vminsert    ClusterIP   10.100.63.152   &amp;lt;none&amp;gt;        8480/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmselect    ClusterIP   10.99.12.151    &amp;lt;none&amp;gt;        8481/TCP                     3h32m
vmcluster-victoria-metrics-cluster-vmstorage   ClusterIP   None            &amp;lt;none&amp;gt;        8482/TCP,8401/TCP,8400/TCP   3h32m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next comes the namespace, victoriametrics: since Prometheus and VictoriaMetrics run in different namespaces, we use the fully qualified service name ending in &lt;code&gt;svc.cluster.local&lt;/code&gt;. This is followed by the port number of the vminsert service and, finally, the Prometheus-compatible write endpoint, which contains the accountID 0 because we are using the clustered version of VictoriaMetrics.&lt;/p&gt;
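&lt;p&gt;Putting the pieces of the remote write URL together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;vmcluster-victoria-metrics-cluster-vminsert   # vminsert service name
.victoriametrics.svc.cluster.local            # namespace + in-cluster DNS suffix
:8480                                         # vminsert service port
/insert/0/prometheus/                         # write endpoint, accountID 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;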

&lt;p&gt;We can now verify on the VMUI (&lt;a href="http://localhost:9090/select/0/prometheus/vmui"&gt;http://localhost:9090/select/0/prometheus/vmui&lt;/a&gt;) that metrics are arriving from Prometheus. If the earlier port-forward is no longer running, start it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl port-forward -n victoriametrics services/vmcluster-victoria-metrics-cluster-vmselect 9090:8481
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Navigate to the Query section and insert &lt;code&gt;kube_pod_container_status_restarts_total&lt;/code&gt; into the Query field. You should now see approximately the same output as you did from Prometheus.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbozct054mgkx3ww10if.png" alt="Metrics" width="774" height="806"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have now set up a simple Prometheus and VictoriaMetrics integration.&lt;/p&gt;

&lt;p&gt;Next part: &lt;a href="https://dev.to/polarsquad/prometheus-observability-platform-alerts-4dbb"&gt;Prometheus Observability Platform: Alerts&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
