<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: andygolubev</title>
    <description>The latest articles on DEV Community by andygolubev (@andygolubev).</description>
    <link>https://dev.to/andygolubev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1084373%2F211580fc-f491-4115-95d4-6d3cea5106d3.png</url>
      <title>DEV Community: andygolubev</title>
      <link>https://dev.to/andygolubev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/andygolubev"/>
    <language>en</language>
    <item>
      <title>Building an Offline AI Platform with K3s, Ansible, Argo CD, vLLM, and NVIDIA GPU</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Sun, 28 Jun 2026 12:03:56 +0000</pubDate>
      <link>https://dev.to/andygolubev/building-an-offline-ai-platform-with-k3s-ansible-argo-cd-vllm-and-nvidia-gpu-2cbo</link>
      <guid>https://dev.to/andygolubev/building-an-offline-ai-platform-with-k3s-ansible-argo-cd-vllm-and-nvidia-gpu-2cbo</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Running Kubernetes in the cloud is straightforward when every node can reach package repositories, container registries, GitHub, and model hubs. The task becomes much more interesting when the target server has no internet connection at all.&lt;/p&gt;

&lt;p&gt;For this proof of concept, I wanted to build a self-contained AI platform on a single Ubuntu server. The final environment had to include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;K3s&lt;/li&gt;
&lt;li&gt;NVIDIA GPU support&lt;/li&gt;
&lt;li&gt;vLLM with a locally stored Qwen model&lt;/li&gt;
&lt;li&gt;Argo CD and a local GitOps repository&lt;/li&gt;
&lt;li&gt;A FastAPI and LangChain chatbot&lt;/li&gt;
&lt;li&gt;Prometheus, Grafana, Loki, Tempo, and OpenTelemetry&lt;/li&gt;
&lt;li&gt;k9s for local operations&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The target machine is an Ubuntu 26.04 AMD64 server with an NVIDIA A10G GPU. I used an EC2 &lt;code&gt;g5.2xlarge&lt;/code&gt; instance for validation, but AWS is only a test harness. The installer itself does not provision infrastructure and does not depend on AWS.&lt;/p&gt;

&lt;p&gt;The important requirement was simple: after copying the bundle to the server, the installation must not make any external network request.&lt;/p&gt;

&lt;h2&gt;
  
  
  The two-environment design
&lt;/h2&gt;

&lt;p&gt;The solution separates preparation from installation. A connected machine downloads and packages everything. The isolated machine only consumes local files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdbojykavmugz2h82w8z0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdbojykavmugz2h82w8z0.png" alt="Two-environment offline platform design" width="800" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This boundary is the main architectural decision. Instead of teaching every component how to tolerate a disconnected network, I move all network-dependent work to a controlled preparation phase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing the payload
&lt;/h2&gt;

&lt;p&gt;The repository provides one aggregate command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;offline-bundle
./scripts/download-all-artifacts.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The script requires Docker and at least 50 GB of free space. On macOS and non-AMD64 systems, it uses an Ubuntu 26.04 AMD64 container so that downloaded packages match the target architecture.&lt;/p&gt;

&lt;p&gt;Each artifact group is handled by a dedicated script. Each completed step is recorded with a fingerprint of its content and environment, so an interrupted download can resume without rebuilding everything. A &lt;code&gt;--clean&lt;/code&gt; option is available when a completely fresh payload is required.&lt;/p&gt;
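
&lt;p&gt;The resume behavior can be sketched with a small marker-file helper. This is a minimal illustration, not the repository's code. The &lt;code&gt;run_step&lt;/code&gt; function and the &lt;code&gt;payload/.steps&lt;/code&gt; marker directory are hypothetical, and the fingerprint here covers only the step name, its command line, and the host architecture.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch of resumable preparation steps (hypothetical helper, not the repo's script).
# A step is skipped when its recorded fingerprint matches the current one.
set -euo pipefail

STATE_DIR="${STATE_DIR:-payload/.steps}"   # hypothetical marker directory

# Usage: run_step NAME COMMAND [ARGS...]
run_step() {
  local name=$1; shift
  local marker="$STATE_DIR/$name.done" fp
  # Fingerprint the step definition and the environment it runs in.
  fp=$(printf '%s\n' "$name" "$*" "$(uname -m)" | sha256sum | cut -d' ' -f1)
  if [ "$(cat "$marker" 2>/dev/null || true)" = "$fp" ]; then
    echo "skip: $name (fingerprint unchanged)"
    return 0
  fi
  echo "run:  $name"
  "$@"                                   # with set -e, a failing step stops here
  mkdir -p "$STATE_DIR"
  printf '%s\n' "$fp" > "$marker"         # record success only after the step ran
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this model, a &lt;code&gt;--clean&lt;/code&gt;-style reset amounts to deleting the marker directory so that every step runs again.&lt;/p&gt;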

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ftx0sa5bfb7e3gk86jsbp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ftx0sa5bfb7e3gk86jsbp.png" alt="Offline payload preparation pipeline" width="799" height="176"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The generated &lt;code&gt;payload/&lt;/code&gt; directory contains binaries, &lt;code&gt;.deb&lt;/code&gt; packages, OCI image archives, Kubernetes manifests, tools, and model weights. It is generated locally and intentionally ignored by Git because it is large and reproducible.&lt;/p&gt;

&lt;p&gt;The model and vLLM image are the largest parts of the bundle. In this setup, the vLLM image is around 8 GB compressed and the Qwen2.5 7B model requires roughly 14–15 GB. Disk planning is not optional; the target installer checks for at least 80 GB of free space before it starts.&lt;/p&gt;
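
&lt;p&gt;A free-space pre-flight check like this fits in a few lines of shell. The sketch below shows one way such a check could look, not the installer's actual implementation. &lt;code&gt;require_free_gb&lt;/code&gt; is a hypothetical helper, and it treats GB as GiB.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch of a pre-flight free-space check (hypothetical helper, not the repo's code).
set -euo pipefail

# Usage: require_free_gb PATH MIN_GB
# Succeeds when the filesystem holding PATH has at least MIN_GB GiB available.
require_free_gb() {
  local path=$1 min_gb=$2 avail_kb need_kb
  # POSIX output format (-P) in 1024-byte units (-k); column 4 is "Available".
  avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
  need_kb=$((min_gb * 1024 * 1024))
  if [ "$avail_kb" -ge "$need_kb" ]; then
    echo "ok: $((avail_kb / 1024 / 1024)) GiB free on $path"
  else
    echo "error: need $min_gb GiB free on $path, found $((avail_kb / 1024 / 1024)) GiB" >/dev/stderr
    return 1
  fi
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;require_free_gb&lt;/code&gt; on the filesystem that will hold the images and model, with a threshold of 80, stops the run before any image import starts instead of partway through it.&lt;/p&gt;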

&lt;h2&gt;
  
  
  One command on the isolated server
&lt;/h2&gt;

&lt;p&gt;After preparing the payload, I copy the complete &lt;code&gt;offline-bundle/&lt;/code&gt; directory to the target and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;offline-bundle
./install.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The installer elevates with &lt;code&gt;sudo&lt;/code&gt;, verifies the Ubuntu version and CPU architecture, checks free space, and validates the SHA256 checksum of every artifact. It then installs Ansible from local &lt;code&gt;.deb&lt;/code&gt; files and runs a localhost playbook.&lt;/p&gt;
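
&lt;p&gt;The checksum step maps naturally onto GNU &lt;code&gt;sha256sum&lt;/code&gt;. In the sketch below, the preparation host writes a manifest and the target verifies it before installing anything. The manifest name &lt;code&gt;SHA256SUMS&lt;/code&gt; and both helper functions are assumptions for illustration and may not match the repository's layout.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: write and verify a SHA256 manifest for a bundle directory.
# The manifest name SHA256SUMS is an assumption, not necessarily the repo's layout.
set -euo pipefail

# Preparation host: record a checksum for every file under DIR.
write_manifest() {
  local dir=$1
  (
    cd "$dir"
    find . -type f ! -name SHA256SUMS -print0 | sort -z | xargs -0 -r sha256sum > SHA256SUMS
  )
}

# Target: fail if any listed file is missing, corrupted, or the manifest is malformed.
verify_manifest() {
  local dir=$1
  ( cd "$dir"; sha256sum --check --quiet --strict SHA256SUMS )
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;code&gt;--strict&lt;/code&gt; rejects malformed manifest lines. A file listed in the manifest but missing from the copied bundle also makes the check fail, which catches incomplete transfers.&lt;/p&gt;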

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ff8vzu4ikjeuv1ivh3pv1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Ff8vzu4ikjeuv1ivh3pv1.png" alt="Offline installation sequence" width="800" height="364"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The playbook uses &lt;code&gt;ansible_connection=local&lt;/code&gt;; SSH is not part of the installation design. The roles run in a deliberate order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Install K3s from its binary and air-gap image archive.&lt;/li&gt;
&lt;li&gt;Install the NVIDIA driver and container toolkit, then expose the GPU to Kubernetes.&lt;/li&gt;
&lt;li&gt;Start the local registry and Git mirror, then install Argo CD.&lt;/li&gt;
&lt;li&gt;Install k9s.&lt;/li&gt;
&lt;li&gt;Deploy the observability stack.&lt;/li&gt;
&lt;li&gt;Load the model and start vLLM.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The installer and roles are idempotent. If a validation fails, I can correct the problem and run the same command again.&lt;/p&gt;

&lt;h2&gt;
  
  
  What runs inside the server
&lt;/h2&gt;

&lt;p&gt;The result is a complete platform on one machine. Some supporting services, such as the Git daemon, run on the host. Application and platform workloads run inside K3s.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fio50o1b5cqnkd5s193ut.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fio50o1b5cqnkd5s193ut.png" alt="Single-node K3s runtime architecture" width="799" height="306"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;K3s imports its standard air-gap archive directly. Additional images are imported into containerd and pushed to a registry listening on &lt;code&gt;localhost:5000&lt;/code&gt;. Workloads use local image references, and the vLLM deployment uses &lt;code&gt;imagePullPolicy: Never&lt;/code&gt; as an extra guard against accidental pulls.&lt;/p&gt;
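
&lt;p&gt;Pointing workloads at the local registry means rewriting upstream image references. The helper below is a hypothetical sketch of that mapping. The commented &lt;code&gt;k3s ctr&lt;/code&gt; lines show roughly how an imported image could be retagged and pushed to &lt;code&gt;localhost:5000&lt;/code&gt;, with &lt;code&gt;TAG&lt;/code&gt; and the archive name standing in for whatever the bundle actually pins.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: map an upstream image reference to the local registry (hypothetical helper).
set -euo pipefail

# Usage: local_image_ref IMAGE [REGISTRY]
local_image_ref() {
  local ref=$1 registry=${2:-localhost:5000} first rest
  first=${ref%%/*}
  if [ "$first" = "$ref" ]; then
    rest="library/$ref"                      # bare Docker Hub name, e.g. busybox:1.36
  else
    case $first in
      *.*|*:*|localhost) rest=${ref#*/} ;;   # drop an explicit registry host
      *) rest=$ref ;;                        # Docker Hub "org/name" form
    esac
  fi
  printf '%s/%s\n' "$registry" "$rest"
}

# Roughly how an image could be loaded and pushed on the target (not run here):
#   k3s ctr images import vllm-openai.tar
#   k3s ctr images tag docker.io/vllm/vllm-openai:TAG "$(local_image_ref vllm/vllm-openai:TAG)"
#   k3s ctr images push --plain-http "$(local_image_ref vllm/vllm-openai:TAG)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;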

&lt;p&gt;The Qwen2.5-7B-Instruct snapshot is copied to &lt;code&gt;/opt/models&lt;/code&gt; and mounted into the vLLM pod. vLLM exposes an OpenAI-compatible endpoint on port 8000 and gets the single &lt;code&gt;nvidia.com/gpu&lt;/code&gt; resource advertised by the NVIDIA device plugin.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitOps without GitHub
&lt;/h2&gt;

&lt;p&gt;Argo CD usually pulls desired state from an external Git provider. That is impossible in an isolated network, so the bundle creates bare repositories on the target and serves them with a read-only Git daemon.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fg8080n6enavfr9mnfqxl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fg8080n6enavfr9mnfqxl.png" alt="Offline GitOps reconciliation flow" width="800" height="442"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;An in-cluster Service, backed by a manually defined Endpoints object, exposes the host daemon as &lt;code&gt;git://git-mirror.gitops.svc.cluster.local&lt;/code&gt;. Argo CD reads an app-of-apps repository from this address, discovers the agent application, and deploys its Helm chart.&lt;/p&gt;

&lt;p&gt;This keeps the GitOps reconciliation model even when the platform cannot reach GitHub. The offline bundle is the delivery mechanism; the local Git mirror becomes the runtime source of truth.&lt;/p&gt;
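
&lt;p&gt;Seeding such a mirror and serving it read-only needs only stock Git. In this sketch the repository name, paths, and helper functions are hypothetical. By default, &lt;code&gt;git daemon&lt;/code&gt; exports only repositories that contain a &lt;code&gt;git-daemon-export-ok&lt;/code&gt; file (unless &lt;code&gt;--export-all&lt;/code&gt; is given) and keeps pushes (&lt;code&gt;receive-pack&lt;/code&gt;) disabled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: seed a bare repository that a read-only git daemon can serve.
# Paths and names are hypothetical; the repository's Ansible role may differ.
set -euo pipefail

# Usage: seed_bare_repo SOURCE_WORKTREE BARE_REPO_DIR
seed_bare_repo() {
  local src=$1 bare=$2
  git init --quiet --bare "$bare"
  git -C "$bare" symbolic-ref HEAD refs/heads/main     # clients check out "main"
  git -C "$src" push --quiet "$bare" HEAD:refs/heads/main
  touch "$bare/git-daemon-export-ok"                    # opt this repo into git daemon
}

# Usage: serve_repos BASE_DIR
# Serves repositories under BASE_DIR on git://HOST:9418/NAME in the foreground.
# Only fetch/clone (upload-pack) is enabled by default, so pushes are refused.
serve_repos() {
  local base=$1
  exec git daemon --reuseaddr --base-path="$base" "$base"
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run under a service manager, &lt;code&gt;serve_repos /srv/git&lt;/code&gt; would make a repository at &lt;code&gt;/srv/git/apps.git&lt;/code&gt; reachable as &lt;code&gt;git://HOST/apps.git&lt;/code&gt; on Git's default port 9418. The in-cluster Service would then point at that port.&lt;/p&gt;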

&lt;h2&gt;
  
  
  The local AI application
&lt;/h2&gt;

&lt;p&gt;To verify that the stack works as a platform rather than a collection of pods, I included a small chatbot. It uses FastAPI, LangChain, and &lt;code&gt;ChatOpenAI&lt;/code&gt;, but points the client to the internal vLLM service instead of a public API.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgdnqtjl97ktv79y3624n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fgdnqtjl97ktv79y3624n.png" alt="Local chatbot inference request sequence" width="800" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The application supports a system prompt and conversation history. It also adds an OpenTelemetry span around every model invocation. Optional Langfuse integration can be enabled when a reachable Langfuse instance and credentials are provided.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fe19nqhn716bdy86fb26s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fe19nqhn716bdy86fb26s.png" alt="Chatbot running against the local model" width="799" height="643"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also tested the OpenAI-compatible endpoint with OpenCode. Pointing existing OpenAI clients at the local service is one of the useful properties of vLLM: applications do not need a custom inference protocol.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fainwcbse75ffjm1b36vu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fainwcbse75ffjm1b36vu.png" alt="OpenCode using the local vLLM endpoint" width="800" height="735"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Observing the model and the GPU
&lt;/h2&gt;

&lt;p&gt;An offline platform still needs normal operational visibility. The bundle includes a deliberately compact but complete stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus for metrics&lt;/li&gt;
&lt;li&gt;Grafana for dashboards&lt;/li&gt;
&lt;li&gt;Loki and Promtail for logs&lt;/li&gt;
&lt;li&gt;Tempo for traces&lt;/li&gt;
&lt;li&gt;OpenTelemetry Collector for OTLP ingestion&lt;/li&gt;
&lt;li&gt;kube-state-metrics and node-exporter for Kubernetes and host metrics&lt;/li&gt;
&lt;li&gt;NVIDIA DCGM exporter for GPU metrics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prometheus scrapes vLLM metrics, such as the number of running requests, as well as GPU utilization from the DCGM exporter. Grafana provisions Prometheus, Loki, and Tempo as data sources and automatically loads a bundled vLLM/GPU dashboard.&lt;/p&gt;
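
&lt;p&gt;When a dashboard panel looks wrong, reading the raw scrape directly can help. The hypothetical helper below sums one metric across all of its label sets in Prometheus text format. It is a sketch that assumes no timestamps and no spaces inside label values. Metric names such as &lt;code&gt;vllm:num_requests_running&lt;/code&gt; and &lt;code&gt;DCGM_FI_DEV_GPU_UTIL&lt;/code&gt; should be checked against the exporter versions actually deployed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: sum one metric from a Prometheus text-format scrape (hypothetical helper).
# Assumes no sample timestamps and no spaces inside label values.
set -euo pipefail

# Usage: some_scrape_command | metric_sum METRIC_NAME
# Prints the sum over all label sets; exits 1 if the metric is absent.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                               # skip HELP and TYPE lines
    $1 == name || index($1, name "{") == 1 {    # bare name or name{labels}
      sum += $NF; found = 1
    }
    END { if (found) print sum; else exit 1 }
  '
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, piping &lt;code&gt;curl -s http://VLLM_HOST:8000/metrics&lt;/code&gt; into &lt;code&gt;metric_sum vllm:num_requests_running&lt;/code&gt; should agree with the running-requests panel. &lt;code&gt;VLLM_HOST&lt;/code&gt; is a placeholder for the vLLM Service address.&lt;/p&gt;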

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frxrao83egabpnau36ztn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Frxrao83egabpnau36ztn.png" alt="Grafana dashboard with vLLM and GPU metrics" width="800" height="273"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following Mermaid diagram shows how a single chat request produces all three telemetry signals:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F51o8hhz8nzto0h8gqs8c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F51o8hhz8nzto0h8gqs8c.png" alt="Metrics, logs, and traces flow" width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Validation matters more in an air gap
&lt;/h2&gt;

&lt;p&gt;In a connected environment, a missing image or package may be downloaded later. In an isolated environment, a missing transitive dependency can stop the entire installation. For that reason, validation is part of the design rather than a final checklist.&lt;/p&gt;

&lt;p&gt;Before transfer, the preparation scripts verify the payload structure and checksums. During installation, the Ansible roles validate each layer before continuing. The acceptance checks include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The K3s node reaches &lt;code&gt;Ready&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The host reports the NVIDIA A10G with &lt;code&gt;nvidia-smi&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Kubernetes reports &lt;code&gt;nvidia.com/gpu: 1&lt;/code&gt; as allocatable.&lt;/li&gt;
&lt;li&gt;The vLLM pod starts without an external image pull.&lt;/li&gt;
&lt;li&gt;The model loads from &lt;code&gt;/opt/models/Qwen2.5-7B-Instruct&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;/v1/models&lt;/code&gt; and chat completion APIs respond.&lt;/li&gt;
&lt;li&gt;Argo CD applications synchronize from the local Git mirror.&lt;/li&gt;
&lt;li&gt;Prometheus targets are reachable.&lt;/li&gt;
&lt;li&gt;Grafana contains the provisioned datasources and dashboard.&lt;/li&gt;
&lt;li&gt;Loki returns logs and Tempo returns the chatbot trace.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The repository includes the exact commands in &lt;code&gt;offline-bundle/VALIDATION.md&lt;/code&gt;, making the test procedure reproducible instead of relying on visual inspection alone.&lt;/p&gt;
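
&lt;p&gt;Layered checks like these can be driven by a small runner that records every failure instead of stopping at the first one. This is a hypothetical sketch, not the contents of &lt;code&gt;VALIDATION.md&lt;/code&gt;. The example commands are assumptions, and the vLLM check assumes the Service has been port-forwarded to &lt;code&gt;localhost:8000&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch of a layered acceptance-check runner (hypothetical, not VALIDATION.md itself).
set -uo pipefail

failures=0

# Usage: check "DESCRIPTION" COMMAND [ARGS...]
check() {
  local desc=$1; shift
  if "$@" >/dev/null 2>/dev/null; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    failures=$((failures + 1))
  fi
}

# Illustrative checks for a few items in the list above; exact commands are assumptions.
run_acceptance_checks() {
  check "K3s node Ready" kubectl wait --for=condition=Ready node --all --timeout=120s
  check "GPU visible on host" nvidia-smi -L
  check "nvidia.com/gpu allocatable" sh -c \
    "kubectl get nodes -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}' | grep -qx 1"
  check "vLLM /v1/models responds" curl -fsS http://localhost:8000/v1/models
  [ "$failures" -eq 0 ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;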

&lt;h2&gt;
  
  
  Lessons learned
&lt;/h2&gt;

&lt;p&gt;The most difficult part of an offline Kubernetes installation is not K3s itself. It is the complete dependency graph around it: OS packages, container images, GPU kernel modules, tools, model files, manifests, and the runtime services that normally assume internet access.&lt;/p&gt;

&lt;p&gt;A few choices made the setup manageable:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use one explicit network boundary.&lt;/strong&gt; All downloads happen on the preparation host; installation uses local files only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin and record artifacts.&lt;/strong&gt; Image manifests and version files make the bundle understandable and reproducible.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify before transfer.&lt;/strong&gt; Checksums catch incomplete copies and corrupted large files before Ansible makes changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep the runtime local.&lt;/strong&gt; The registry, Git mirror, model storage, inference API, and telemetry backends all live on the target.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate each layer.&lt;/strong&gt; A running pod is not sufficient proof that the GPU, model API, GitOps reconciliation, and observability pipeline work together.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a proof of concept and intentionally uses a single node. A production design would need decisions about high availability, storage redundancy, backup, security hardening, model lifecycle, and how signed bundle updates cross the air gap. Still, the project demonstrates that the same cloud-native workflows can operate in an isolated environment when artifact preparation is treated as a first-class part of the architecture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With a prepared payload and a local Ansible playbook, I can turn an isolated Ubuntu GPU server into a small AI platform using one installation command. K3s provides the runtime, Argo CD preserves the GitOps workflow, vLLM serves a local model, and the observability stack makes the result operable.&lt;/p&gt;

&lt;p&gt;The code is available in my GitHub repository: &lt;a href="https://github.com/andygolubev/ansible-k3s-on-prem" rel="noopener noreferrer"&gt;github.com/andygolubev/ansible-k3s-on-prem&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>ansible</category>
      <category>ai</category>
      <category>devops</category>
    </item>
    <item>
      <title>How llm-d Prefix-Cache Routing Made Qwen 7B on EKS 2.3x Faster</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Sat, 27 Jun 2026 08:34:17 +0000</pubDate>
      <link>https://dev.to/andygolubev/how-llm-d-prefix-cache-routing-made-qwen-7b-on-eks-23x-faster-2h8j</link>
      <guid>https://dev.to/andygolubev/how-llm-d-prefix-cache-routing-made-qwen-7b-on-eks-23x-faster-2h8j</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I wanted to benchmark how much the routing layer matters for LLM inference when the workload has repeated long prefixes.&lt;/p&gt;

&lt;p&gt;The setup was intentionally simple: Qwen2.5-7B-Instruct, vLLM, AWS EKS, FSx for Lustre, and eight &lt;code&gt;g5.xlarge&lt;/code&gt; GPU nodes. Each node had one NVIDIA A10G GPU and ran one vLLM decode replica. The interesting part was the comparison in front of those same eight pods.&lt;/p&gt;

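&lt;p&gt;One path used a plain Kubernetes ClusterIP Service. Its load balancing knows nothing about cache contents: kube-proxy's default iptables mode picks a backend at random for each new connection, which behaves like round-robin distribution on average. The other path used llm-d with the precise prefix-cache-aware endpoint picker.&lt;/p&gt;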

&lt;p&gt;The difference was large. With the same hardware and the same vLLM pods, llm-d finished the 512-concurrency benchmark in &lt;strong&gt;358.7 seconds&lt;/strong&gt; instead of &lt;strong&gt;840.2 seconds&lt;/strong&gt;. Output throughput went from &lt;strong&gt;2,742 tok/s&lt;/strong&gt; to &lt;strong&gt;6,423 tok/s&lt;/strong&gt;, and mean time to first token dropped from &lt;strong&gt;19.0 seconds&lt;/strong&gt; to &lt;strong&gt;0.86 seconds&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;vLLM keeps a KV cache on each GPU. When many requests share the same long prefix, the best case is to reuse the cached prefix blocks instead of recomputing the same prefill for every request.&lt;/p&gt;

&lt;p&gt;But there is a catch: each vLLM replica has its own KV cache.&lt;/p&gt;

&lt;p&gt;With plain round-robin routing, repeated-prefix requests are scattered across replicas. A request may land on a pod that has never seen that prefix, even though another pod already holds the right KV blocks. As a result, the cluster burns GPU time on repeated prefill work, fills the KV cache, and eventually starts queueing requests.&lt;/p&gt;

&lt;p&gt;llm-d solves this specific problem by making routing aware of prefix-cache locality. In this benchmark, the llm-d endpoint picker routed prompts to the replica that was most likely to already hold the matching prefix blocks.&lt;/p&gt;
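
&lt;p&gt;A toy model makes the duplication visible. The sketch below is not llm-d. It replaces the precise scorer with a simple rule that sends each prefix to a fixed replica, cycles requests through the prefixes in order, and treats each replica's cache as unbounded. With the benchmark's shape (9,000 requests, 150 prefixes, 8 replicas) and this ordering, round-robin prefills every prefix on 4 different replicas, for 600 cold prefills in total. The affinity rule prefills each prefix once, for 150 in total.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Toy model, not llm-d: count cold prefills of shared prefixes under two routing rules.
# Requests cycle through the prefixes in order; every replica cache is unbounded.
set -euo pipefail

# Usage: prefill_copies REQUESTS PREFIXES REPLICAS
prefill_copies() {
  seq 0 $(($1 - 1)) | awk -v prefixes="$2" -v replicas="$3" '
    {
      p = $1 % prefixes        # shared prefix used by this request
      rr = $1 % replicas       # round-robin: next replica in turn
      aff = p % replicas       # affinity stand-in: same prefix, same replica
      if (!((p, rr) in rr_seen))   { rr_seen[p, rr] = 1;   rr_cold++ }
      if (!((p, aff) in aff_seen)) { aff_seen[p, aff] = 1; aff_cold++ }
    }
    END {
      printf "round-robin: %d cold prefills, %.1f%% prefix hits\n", rr_cold, 100 * (NR - rr_cold) / NR
      printf "affinity:    %d cold prefills, %.1f%% prefix hits\n", aff_cold, 100 * (NR - aff_cold) / NR
    }'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With an unbounded cache, even round-robin eventually gets hits. The cost shows up as memory instead: 600 resident copies of a 2,048-token prefix rather than 150. On real A10Gs, that extra footprint plausibly explains why KV cache usage approaches saturation and forces evictions under round-robin, which fits the much lower hit rate measured in the results below.&lt;/p&gt;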

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdyff55r3dug83rf5g3vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fdyff55r3dug83rf5g3vs.png" alt="Round-robin Grafana dashboard" width="800" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwnq9ug7pzh7iqaft7315.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fwnq9ug7pzh7iqaft7315.png" alt="Architecture diagram" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The benchmark cluster was built on AWS EKS.&lt;/p&gt;

&lt;p&gt;The main components were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;8 x &lt;code&gt;g5.xlarge&lt;/code&gt; GPU nodes, each with 1 x NVIDIA A10G 24 GB.&lt;/li&gt;
&lt;li&gt;1 x &lt;code&gt;m6i.4xlarge&lt;/code&gt; system node for support workloads.&lt;/li&gt;
&lt;li&gt;8 vLLM decode pods, one per GPU node.&lt;/li&gt;
&lt;li&gt;Qwen2.5-7B-Instruct weights mounted from FSx for Lustre.&lt;/li&gt;
&lt;li&gt;NVIDIA GPU Operator for device plugin, DCGM exporter, validators, and GPU discovery.&lt;/li&gt;
&lt;li&gt;kube-prometheus-stack for Prometheus and Grafana.&lt;/li&gt;
&lt;li&gt;llm-d EPP router with precise prefix-cache routing.&lt;/li&gt;
&lt;li&gt;A baseline Kubernetes Service named &lt;code&gt;vllm-roundrobin&lt;/code&gt; that selected the same decode pods.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The important detail is that both paths used the same eight vLLM decode pods. The only meaningful difference in the A/B test was the routing layer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F63c6mjkmh4t0dnqqr5xr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F63c6mjkmh4t0dnqqr5xr.png" alt="Solution schema" width="800" height="128"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Realization
&lt;/h2&gt;

&lt;p&gt;The infrastructure was created with Terraform, then the cluster dependencies were installed with Helm and scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./scripts/tf-init.sh
./scripts/tf-apply.sh
./scripts/update-kubeconfig.sh
./scripts/install-gpu-operator.sh
./scripts/install-fsx-csi-driver.sh
./scripts/install-monitoring.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model weights were not downloaded by the inference pod at runtime. They were already available on FSx for Lustre and mounted into the pods. This made pod restarts much faster and avoided pushing large model downloads into the benchmark path.&lt;/p&gt;

&lt;p&gt;For the llm-d test, I installed the precise prefix-cache routing stack and applied the repository's customizations under &lt;code&gt;deploy/llm-d/&lt;/code&gt;. The important pieces were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;patch-vllm.yaml&lt;/code&gt;: configured one GPU per replica, the Qwen model path on FSx, GPU scheduling, KV events over ZMQ, and &lt;code&gt;--block-size=64&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;router.values.yaml&lt;/code&gt;: configured the EPP router and precise prefix-cache scorer.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;fsx-pvc.yaml&lt;/code&gt;: added a static FSx PV/PVC for the llm-d namespace.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;baseline-rr-service.yaml&lt;/code&gt;: created the plain Kubernetes Service for the round-robin baseline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were also a few practical gotchas:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The baseline vLLM deployment had to be scaled to zero while running the llm-d demo, otherwise it occupied one GPU and the eighth decode pod could not schedule.&lt;/li&gt;
&lt;li&gt;The llm-d EPP pod needed a larger system node, because its containers' CPU and memory requests do not fit on a small one.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enableServiceLinks: false&lt;/code&gt; was important for vLLM pods, because Kubernetes service environment variables can collide with vLLM's own &lt;code&gt;VLLM_PORT&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The vLLM &lt;code&gt;--block-size&lt;/code&gt; and the router scorer block size had to match.&lt;/li&gt;
&lt;/ul&gt;
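
&lt;p&gt;A toy version of block-level prefix hashing shows why the block sizes must match. The sketch is illustrative only. vLLM and llm-d use their own hash functions over real token IDs, whereas here tokens are plain integers and the hash is a truncated SHA-256. The idea still carries over: only full blocks are hashed, and each block's hash includes its parent's. As a result, two requests share block hashes only up to the point where their tokens diverge, and only when both sides chunk with the same block size.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Toy illustration of chained block hashing; not vLLM's or llm-d's real hash scheme.
set -euo pipefail

# Usage: block_hashes BLOCK_SIZE TOKEN_ID...
# Prints one chained hash per full block; a trailing partial block is skipped.
block_hashes() {
  local bs=$1; shift
  local parent="root" chunk="" count=0 tok
  for tok in "$@"; do
    chunk="$chunk $tok"
    count=$((count + 1))
    if [ "$count" -eq "$bs" ]; then
      # Hashing the parent too means block k matches only if blocks 1..k-1 matched.
      parent=$(printf '%s|%s' "$parent" "$chunk" | sha256sum | cut -c1-16)
      echo "$parent"
      chunk=""
      count=0
    fi
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With &lt;code&gt;--block-size=64&lt;/code&gt;, the benchmark's 2,048-token shared prefix spans 32 full blocks. A router that chunked the same tokens with a different size would compute different hashes starting at the very first block and would find no cached prefix anywhere.&lt;/p&gt;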

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl9o3ozt4jv67perbu4lb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fl9o3ozt4jv67perbu4lb.png" alt="8 node Grafana overview" width="800" height="531"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmark Setup
&lt;/h2&gt;

&lt;p&gt;The main benchmark used &lt;code&gt;vllm bench serve&lt;/code&gt; with a repeated-prefix dataset:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dataset: prefix_repetition
prefixes: 150
prefix length: 2048 tokens
suffix length: 128 tokens
output length: 256 tokens
request rate: inf
max concurrency: 512
prompts: 9000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The benchmark was collected on 15 June 2026.&lt;/p&gt;

&lt;p&gt;I also ran a smaller rate ladder at requested rates of 20, 40, and 60 requests per second. That helped show where the round-robin path started saturating and where llm-d still had useful headroom.&lt;/p&gt;
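
&lt;p&gt;These runs can be scripted so that both endpoints receive identical parameters. This is a sketch built on assumptions. The &lt;code&gt;prefix_repetition&lt;/code&gt; flag names reflect recent vLLM releases as I understand them and should be confirmed with &lt;code&gt;vllm bench serve --help&lt;/code&gt;. &lt;code&gt;MODEL&lt;/code&gt;, &lt;code&gt;RR_URL&lt;/code&gt;, and &lt;code&gt;LLMD_URL&lt;/code&gt; are hypothetical placeholders. The ladder's prompt count is left as a parameter because the text does not state it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Sketch: run the prefix-repetition benchmark against one endpoint.
# Set DRY_RUN=1 to print the command instead of executing it.
set -euo pipefail

MODEL="${MODEL:-Qwen/Qwen2.5-7B-Instruct}"        # hypothetical served model name
RR_URL="${RR_URL:-http://vllm-roundrobin:8000}"   # hypothetical baseline URL
LLMD_URL="${LLMD_URL:-http://llm-d-gateway:80}"   # hypothetical llm-d gateway URL

# Usage: bench BASE_URL REQUEST_RATE NUM_PROMPTS [MAX_CONCURRENCY]
bench() {
  local base_url=$1 rate=$2 prompts=$3 concurrency=${4:-}
  local -a args=(
    vllm bench serve
    --base-url "$base_url"
    --model "$MODEL"
    --dataset-name prefix_repetition
    --prefix-repetition-num-prefixes 150
    --prefix-repetition-prefix-len 2048
    --prefix-repetition-suffix-len 128
    --prefix-repetition-output-len 256
    --request-rate "$rate"
    --num-prompts "$prompts"
  )
  if [ -n "$concurrency" ]; then
    args+=(--max-concurrency "$concurrency")
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${args[*]}"
  else
    "${args[@]}"
  fi
}

# Usage: rate_ladder NUM_PROMPTS   (20, 40, and 60 req/s against both endpoints)
rate_ladder() {
  local url rate
  for url in "$RR_URL" "$LLMD_URL"; do
    for rate in 20 40 60; do
      bench "$url" "$rate" "$1"
    done
  done
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The headline comparison then becomes &lt;code&gt;bench $RR_URL inf 9000 512&lt;/code&gt; followed by the same call with &lt;code&gt;$LLMD_URL&lt;/code&gt;, which keeps everything except the routing layer constant.&lt;/p&gt;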

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;p&gt;Here is the 512-concurrency result:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Round-robin&lt;/th&gt;
&lt;th&gt;llm-d&lt;/th&gt;
&lt;th&gt;llm-d advantage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Successful / failed requests&lt;/td&gt;
&lt;td&gt;9000 / 0&lt;/td&gt;
&lt;td&gt;9000 / 0&lt;/td&gt;
&lt;td&gt;Same&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Benchmark duration&lt;/td&gt;
&lt;td&gt;840.2 s&lt;/td&gt;
&lt;td&gt;358.7 s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;2.3x faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Request throughput&lt;/td&gt;
&lt;td&gt;10.71 req/s&lt;/td&gt;
&lt;td&gt;25.09 req/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+134%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output token throughput&lt;/td&gt;
&lt;td&gt;2,742 tok/s&lt;/td&gt;
&lt;td&gt;6,423 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+134%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Total token throughput&lt;/td&gt;
&lt;td&gt;26,362 tok/s&lt;/td&gt;
&lt;td&gt;61,748 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+134%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean TTFT&lt;/td&gt;
&lt;td&gt;19,029 ms&lt;/td&gt;
&lt;td&gt;863 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-95%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median TTFT&lt;/td&gt;
&lt;td&gt;18,458 ms&lt;/td&gt;
&lt;td&gt;340 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-98%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 TTFT&lt;/td&gt;
&lt;td&gt;36,739 ms&lt;/td&gt;
&lt;td&gt;12,544 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-66%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mean TPOT&lt;/td&gt;
&lt;td&gt;109.2 ms&lt;/td&gt;
&lt;td&gt;75.3 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-31%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P99 TPOT&lt;/td&gt;
&lt;td&gt;157.4 ms&lt;/td&gt;
&lt;td&gt;111.0 ms&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;-29%&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Prefix cache hit rate&lt;/td&gt;
&lt;td&gt;about 11%&lt;/td&gt;
&lt;td&gt;about 93%&lt;/td&gt;
&lt;td&gt;Much higher&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GPU KV cache usage&lt;/td&gt;
&lt;td&gt;about 98-99%&lt;/td&gt;
&lt;td&gt;about 64-71%&lt;/td&gt;
&lt;td&gt;Avoided saturation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Waiting requests&lt;/td&gt;
&lt;td&gt;about 180&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Queue removed&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
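
&lt;p&gt;The advantage column can be recomputed from the raw values with two small helpers. The function names are my own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Recompute the comparison column from the raw table values.
set -euo pipefail

# How many times faster: BEFORE / AFTER (for durations).
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2fx\n", before / after }'
}

# Signed change from BEFORE to AFTER, rounded to a whole percentage.
pct_change() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%+.0f%%\n", (after - before) / before * 100 }'
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Both runs completed the same 9,000 requests, so the 2.3x duration ratio and the +134% throughput rows are the same measurement seen two ways: 9000 / 840.2 s is about 10.71 req/s, and 9000 / 358.7 s is about 25.09 req/s.&lt;/p&gt;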

&lt;p&gt;The rate ladder showed the same shape:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Requested rate&lt;/th&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Achieved req/s&lt;/th&gt;
&lt;th&gt;Output tok/s&lt;/th&gt;
&lt;th&gt;Mean TTFT&lt;/th&gt;
&lt;th&gt;P99 TTFT&lt;/th&gt;
&lt;th&gt;Mean TPOT&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;20 req/s&lt;/td&gt;
&lt;td&gt;Round-robin&lt;/td&gt;
&lt;td&gt;7.05&lt;/td&gt;
&lt;td&gt;1,805.9&lt;/td&gt;
&lt;td&gt;3,338.5 ms&lt;/td&gt;
&lt;td&gt;17,075.5 ms&lt;/td&gt;
&lt;td&gt;78.5 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20 req/s&lt;/td&gt;
&lt;td&gt;llm-d&lt;/td&gt;
&lt;td&gt;11.85&lt;/td&gt;
&lt;td&gt;3,034.4&lt;/td&gt;
&lt;td&gt;514.0 ms&lt;/td&gt;
&lt;td&gt;1,142.9 ms&lt;/td&gt;
&lt;td&gt;52.8 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40 req/s&lt;/td&gt;
&lt;td&gt;Round-robin&lt;/td&gt;
&lt;td&gt;8.57&lt;/td&gt;
&lt;td&gt;2,192.7&lt;/td&gt;
&lt;td&gt;22,055.0 ms&lt;/td&gt;
&lt;td&gt;56,710.0 ms&lt;/td&gt;
&lt;td&gt;99.9 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40 req/s&lt;/td&gt;
&lt;td&gt;llm-d&lt;/td&gt;
&lt;td&gt;22.64&lt;/td&gt;
&lt;td&gt;5,795.0&lt;/td&gt;
&lt;td&gt;1,901.0 ms&lt;/td&gt;
&lt;td&gt;5,585.9 ms&lt;/td&gt;
&lt;td&gt;76.5 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;60 req/s&lt;/td&gt;
&lt;td&gt;Round-robin&lt;/td&gt;
&lt;td&gt;8.90&lt;/td&gt;
&lt;td&gt;2,278.1&lt;/td&gt;
&lt;td&gt;41,661.2 ms&lt;/td&gt;
&lt;td&gt;90,767.7 ms&lt;/td&gt;
&lt;td&gt;104.3 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;60 req/s&lt;/td&gt;
&lt;td&gt;llm-d&lt;/td&gt;
&lt;td&gt;21.52&lt;/td&gt;
&lt;td&gt;5,507.9&lt;/td&gt;
&lt;td&gt;3,496.8 ms&lt;/td&gt;
&lt;td&gt;8,605.3 ms&lt;/td&gt;
&lt;td&gt;122.3 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Round-robin saturated very early. Even when I requested 40 or 60 req/s, it delivered only about 8 to 9 req/s, and mean TTFT ballooned to tens of seconds (about 22 s at 40 req/s and about 42 s at 60 req/s).&lt;/p&gt;

&lt;p&gt;llm-d did not make the GPUs infinitely fast, of course: eight A10Gs still have a real ceiling. But it raised the useful ceiling considerably, to roughly 22 req/s instead of about 9, because it avoided a large amount of repeated prefill work.&lt;/p&gt;
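&lt;p&gt;Dividing the achieved rates from the rate-ladder table makes the two ceilings easier to compare. The short Python sketch below only reuses the numbers from that table. The helper names are illustrative.&lt;/p&gt;

```python
# Achieved request rates copied from the rate-ladder table:
# requested req/s -> (round-robin achieved req/s, llm-d achieved req/s)
LADDER = {
    20: (7.05, 11.85),
    40: (8.57, 22.64),
    60: (8.90, 21.52),
}


def served_fraction(achieved: float, requested: float) -> float:
    """Share of the requested arrival rate that the endpoint actually completed."""
    return achieved / requested


def speedup(baseline: float, candidate: float) -> float:
    """How many times more requests per second the candidate completed than the baseline."""
    return candidate / baseline


for rate, (rr, llmd) in LADDER.items():
    print(
        f"{rate} req/s: round-robin served {served_fraction(rr, rate):.0%}, "
        f"llm-d served {served_fraction(llmd, rate):.0%}, "
        f"llm-d vs round-robin {speedup(rr, llmd):.2f}x"
    )
```

&lt;p&gt;Run as-is, it shows llm-d completing roughly 1.7x, 2.6x, and 2.4x as many requests per second as round-robin at 20, 40, and 60 req/s. At 60 req/s, round-robin served only about 15% of the requested rate.&lt;/p&gt;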

&lt;h2&gt;
  
  
  Why llm-d Won
&lt;/h2&gt;

&lt;p&gt;The workload had 150 repeated long prefixes. That is exactly the kind of traffic where cache locality matters.&lt;/p&gt;

&lt;p&gt;Round-robin distributed requests without knowing which replica held which prefix in its KV cache. As a result, requests kept triggering prefills on replicas that would not have needed to do that work if the traffic had been routed differently.&lt;/p&gt;

&lt;p&gt;With llm-d, vLLM emitted KV events and the router used those events to build a prefix-cache-aware view of the replicas. When the next request arrived, the endpoint picker could prefer the replica that already had the relevant prefix blocks.&lt;/p&gt;

&lt;p&gt;The result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prefix cache hit rate increased from about 11% to about 93%.&lt;/li&gt;
&lt;li&gt;Waiting requests dropped from about 180 to zero.&lt;/li&gt;
&lt;li&gt;KV cache stayed around 64-71% instead of pinning near 99%.&lt;/li&gt;
&lt;li&gt;Output throughput more than doubled.&lt;/li&gt;
&lt;li&gt;Mean TTFT dropped by about 95%.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most interesting part is that this was not a model change, GPU change, or replica-count change. It was the routing layer.&lt;/p&gt;
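&lt;p&gt;To make the mechanism concrete, here is a toy Python model of the idea. It is not llm-d's actual scorer. Several parts are simplifying assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;16-token blocks with chained block hashes&lt;/li&gt;
&lt;li&gt;caches that never evict&lt;/li&gt;
&lt;li&gt;a least-loaded tie-break&lt;/li&gt;
&lt;li&gt;the workload itself: 20 shared 128-token prefixes, unique 32-token suffixes, and 8 replicas&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The hit rates it prints are therefore not comparable to the benchmark numbers.&lt;/p&gt;

```python
"""Toy model of prefix-cache-aware routing vs round-robin (a sketch, not llm-d's scorer).

Assumptions: prompts are token-ID lists split into 16-token KV blocks; a block's key is a
chained hash of every token up to the end of that block, so a block only counts as cached
if its whole prefix matches; replica caches never evict.
"""
import random
from collections.abc import Sequence
from hashlib import sha256
from itertools import cycle

BLOCK = 16  # tokens per KV block (illustrative value)


def block_keys(tokens: Sequence[int]) -> list[str]:
    """Chained hashes of full blocks: key i depends on blocks 0..i."""
    keys, running = [], sha256()
    for start in range(0, len(tokens) // BLOCK * BLOCK, BLOCK):
        running.update(repr(list(tokens[start:start + BLOCK])).encode())
        keys.append(running.copy().hexdigest())
    return keys


def cached_blocks(cache: set[str], keys: list[str]) -> int:
    """Number of leading blocks of this prompt already present in a replica's cache."""
    count = 0
    for key in keys:
        if key not in cache:
            break
        count += 1
    return count


def simulate(prompts: list[list[int]], replicas: int, prefix_aware: bool) -> float:
    """Route every prompt and return the prefix-cache hit rate (cached blocks / total blocks)."""
    caches = [set() for _ in range(replicas)]
    load = [0] * replicas
    round_robin = cycle(range(replicas))
    hits = total = 0
    for tokens in prompts:
        keys = block_keys(tokens)
        if prefix_aware:
            # Prefer the longest cached prefix; break ties with the least-loaded replica.
            target = max(range(replicas),
                         key=lambda r: (cached_blocks(caches[r], keys), -load[r]))
        else:
            target = next(round_robin)
        hits += cached_blocks(caches[target], keys)
        total += len(keys)
        caches[target].update(keys)  # the prefill leaves every block of this prompt cached
        load[target] += 1
    return hits / total


def make_workload(seed: int = 0) -> list[list[int]]:
    """Hypothetical traffic: 20 shared 128-token prefixes, each request adds a unique 32-token suffix."""
    rng = random.Random(seed)
    prefixes = [[rng.randrange(50_000) for _ in range(128)] for _ in range(20)]
    return [rng.choice(prefixes) + [rng.randrange(50_000) for _ in range(32)]
            for _ in range(400)]


if __name__ == "__main__":
    prompts = make_workload()
    print(f"round-robin hit rate:  {simulate(prompts, 8, prefix_aware=False):.0%}")
    print(f"prefix-aware hit rate: {simulate(prompts, 8, prefix_aware=True):.0%}")
```

&lt;p&gt;In this toy version, a repeated prefix is prefilled once for the whole fleet rather than once per replica. That is a simplified picture of the effect behind the measured jump in prefix cache hit rate.&lt;/p&gt;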

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbc8uf97u3sqiqi5zd8xi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fbc8uf97u3sqiqi5zd8xi.png" alt="NVIDIA SMI during load" width="800" height="350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Notes From The Run
&lt;/h2&gt;

&lt;p&gt;The vLLM logs showed the llm-d path running with no waiting queue while the prefix hit rate warmed up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Running: 64 reqs, Waiting: 0 reqs, GPU KV cache usage: 62.1%, Prefix cache hit rate: 63.5%
Running: 68 reqs, Waiting: 0 reqs, GPU KV cache usage: 68.9%, Prefix cache hit rate: 66.4%
Running: 76 reqs, Waiting: 0 reqs, GPU KV cache usage: 70.1%, Prefix cache hit rate: 72.7%
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
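&lt;p&gt;To track the warm-up from raw logs instead of reading them by eye, a small Python parser along these lines could extract the numbers. It is a sketch that assumes the exact stats wording quoted above. Real vLLM log lines may carry a timestamp and logger prefix, hence &lt;code&gt;re.search&lt;/code&gt;, and the format can change between vLLM versions.&lt;/p&gt;

```python
import re

# Matches the stats fragment quoted above; the wording may differ between vLLM versions.
STATS = re.compile(
    r"Running: (\d+) reqs, Waiting: (\d+) reqs, "
    r"GPU KV cache usage: ([\d.]+)%, Prefix cache hit rate: ([\d.]+)%"
)


def parse_stats(lines):
    """Yield (running, waiting, kv_usage_pct, prefix_hit_pct) for each matching log line."""
    for line in lines:
        match = STATS.search(line)  # search, not match: tolerate timestamp/logger prefixes
        if match:
            running, waiting, kv, hit = match.groups()
            yield int(running), int(waiting), float(kv), float(hit)


def queue_free(samples) -> bool:
    """True if no sample reported any waiting requests."""
    return all(waiting == 0 for _, waiting, _, _ in samples)


LOG = """\
Running: 64 reqs, Waiting: 0 reqs, GPU KV cache usage: 62.1%, Prefix cache hit rate: 63.5%
Running: 68 reqs, Waiting: 0 reqs, GPU KV cache usage: 68.9%, Prefix cache hit rate: 66.4%
Running: 76 reqs, Waiting: 0 reqs, GPU KV cache usage: 70.1%, Prefix cache hit rate: 72.7%
"""

samples = list(parse_stats(LOG.splitlines()))

if __name__ == "__main__":
    print("queue-free:", queue_free(samples))
    print("peak KV cache usage:", max(kv for _, _, kv, _ in samples))
```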



&lt;p&gt;The final aggregate showed a wider P99 TTFT than the steady-state Grafana view, because the beginning of the run included cold-cache ramp-up. After the cache warmed, the median TTFT was 340 ms and the steady-state dashboard showed the system serving 512-concurrency traffic without queue buildup.&lt;/p&gt;

&lt;p&gt;There was also an FSx CSI controller warning about a missing &lt;code&gt;DescribeFileSystems&lt;/code&gt; permission. In this setup it was not blocking, because I used a static FSx PV/PVC configuration: the file system identity and mount details were already known, so dynamic FSx discovery was not part of the benchmark path.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This benchmark was a good reminder that LLM inference performance is not only about the GPU count.&lt;/p&gt;

&lt;p&gt;For repeated-prefix workloads, the routing layer can decide whether the cluster reuses KV cache or recomputes the same long prefixes again and again. In this run, llm-d precise prefix-cache routing made the same 8 x A10G fleet finish the workload &lt;strong&gt;2.3x faster&lt;/strong&gt;, while cutting mean TTFT by &lt;strong&gt;95%&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If your traffic has shared system prompts, long common instructions, retrieval templates, chat prefixes, or agent scaffolding, round-robin routing can quietly waste a lot of GPU time. Prefix-cache-aware routing is one of those changes that looks small in the architecture diagram but very large in the benchmark results.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/aws-eks-inference-llmd-vllm-benchmark-qwen-7b" rel="noopener noreferrer"&gt;https://github.com/andygolubev/aws-eks-inference-llmd-vllm-benchmark-qwen-7b&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>aws</category>
      <category>eks</category>
      <category>ai</category>
    </item>
    <item>
      <title>AWS Cloud Formation doing crazy</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Thu, 27 Nov 2025 07:01:12 +0000</pubDate>
      <link>https://dev.to/andygolubev/aws-cloud-formation-doing-crazy-38gp</link>
      <guid>https://dev.to/andygolubev/aws-cloud-formation-doing-crazy-38gp</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;I decided to write this article after a year and a half of actively using AWS CloudFormation across two separate products. Because it’s less popular than Terraform, finding solutions to some problems often meant piecing together hints from different sources. Here I’ll share my experience in the hope that it helps someone else solve their CloudFormation challenges. &lt;/p&gt;

&lt;p&gt;A large part of this article is code. It’s mainly a note for myself in the future, so I can remember how I used AWS CloudFormation if I need to work with it again.&lt;/p&gt;

&lt;p&gt;When you work with CloudFormation, there are some key differences from Terraform. For example: there’s no automatic drift remediation, deployments are all-or-nothing (no partial apply), you can’t deploy to multiple regions in one go, and stack policies have their own quirks you need to understand.&lt;/p&gt;

&lt;p&gt;Below I will show how to overcome these challenges to deploy this example architecture. The code is available on GitHub.&lt;/p&gt;

&lt;h2&gt;
  
  
  CloudFormation Demo Stack for this Article
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Purpose:&lt;/strong&gt; &lt;br&gt;
End-to-end AWS reference environment that bootstraps networking, security, compute, data, and edge delivery through layered CloudFormation templates orchestrated by cfn-stacks/10-main-stack.yaml.&lt;/p&gt;

&lt;p&gt;Github: &lt;a href="https://github.com/andygolubev/article-cfn-pain-points" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73uifvdxhtj1c97gxsxo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F73uifvdxhtj1c97gxsxo.png" alt="Solution" width="800" height="432"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A high-resolution version of the diagram is here: &lt;a href="https://raw.githubusercontent.com/andygolubev/article-cfn-pain-points/cec0fbbbc884efe83831c6f75ba365fc887580c9/solution_diagram.png" rel="noopener noreferrer"&gt;https://raw.githubusercontent.com/andygolubev/article-cfn-pain-points/cec0fbbbc884efe83831c6f75ba365fc887580c9/solution_diagram.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CloudFormation blueprints (cfn-stacks/):&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/cfn-stacks" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/cfn-stacks&lt;/a&gt;&lt;br&gt;
Modular stacks for shared artifacts and ECR registries, VPC and NAT topology, Route 53 hosted zone, Aurora/PostgreSQL, ElastiCache Redis, Fargate-based ECS, API Gateway fronting the internal NLB, EventBridge wiring, WAF protection, Lambda resources, and a us-east-1 global stack providing ACM/CloudFront distribution with DNS aliases. Parameter sets live in parameters-.json, while main-stack-policy.json locks down updates in stage/prod.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda layer (lambda-layer/):&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-layer" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-layer&lt;/a&gt;&lt;br&gt;
Dockerfile-driven build that packages shared Python helpers like common_service.get_hello_world() into a reusable layer zip (lambda_layer.zip) for multiple functions; build commands are documented in the folder README.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda functions (lambda-functions/demo_lambda/):&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-functions" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-functions&lt;/a&gt;&lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-antivirus-scanner" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-antivirus-scanner&lt;/a&gt;&lt;br&gt;
Sample Python handler that imports the shared layer artifact to return a greeting and request metadata, demonstrating code reuse across functions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ECS service (ecr-repo-services/):&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-backend-service" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-backend-service&lt;/a&gt;&lt;br&gt;
Two example workloads with ready-to-push Dockerfiles—demo-backend-service (Go HTTP service for ECS Fargate) and demo-antivirus-scanner (Python ARM64 Lambda image)—each with snippets for authenticating to ECR, creating repositories, and pushing images.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Frontend sample (cloudfront-frontend-code/):&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/cloudfront-frontend-code" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/cloudfront-frontend-code&lt;/a&gt;&lt;br&gt;
Minimal static site that represents the S3-hosted SPA/front-end assets later served through CloudFront.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Automation scripts (scripts/):&lt;/strong&gt; &lt;br&gt;
&lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main/scripts" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main/scripts&lt;/a&gt;&lt;br&gt;
01-deploy-cfn.sh orchestrates regional stack deployments, parameter wiring, and layer uploads; 02-deploy-cfn-global.sh handles the us-east-1 global stack, reading outputs from the regional deployment.&lt;/p&gt;

&lt;p&gt;The codebase is deployable and operational; I’ve verified it in my AWS account =)&lt;/p&gt;
&lt;h2&gt;
  
  
  Deploying to Multiple Regions
&lt;/h2&gt;

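&lt;p&gt;CloudFormation isn’t designed for comfortable multi-region deployments. Each stack is deployed to exactly one region, so a single &lt;code&gt;aws cloudformation deploy&lt;/code&gt; cannot span several regions. But there are workarounds.&lt;/p&gt;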

&lt;p&gt;Here’s what I’ve used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A Bash wrapper that deploys different resources to different regions and passes parameters between them.&lt;/li&gt;
&lt;li&gt;StackSets to push the needed resources into another region, plus Secrets Manager replication to bring the final value back into the original region.&lt;/li&gt;
&lt;/ul&gt;

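&lt;p&gt;With the Bash approach, you call &lt;code&gt;aws cloudformation deploy&lt;/code&gt; several times with different parameters. To fetch values for later steps, read exported values with &lt;code&gt;aws cloudformation list-exports&lt;/code&gt; and stack outputs with &lt;code&gt;aws cloudformation describe-stacks&lt;/code&gt;.&lt;/p&gt;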

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;WEGO_HOSTED_ZONE_ID&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudformation list-exports &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".Exports[] | select(.Name == &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;demo-hosted-zone-id&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;) | .Value"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;WEGO_HOSTED_ZONE_DOMAIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudformation list-exports &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; | jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s2"&gt;".Exports[] | select(.Name == &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;demo-hosted-zone-domain-name&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;) | .Value"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;DEMO_CLOUDFRONT_CERTIFICATE_DOMAIN_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;jq &lt;span class="nt"&gt;-r&lt;/span&gt; &lt;span class="s1"&gt;'.[] | select(.ParameterKey == "DemoCloudFrontCertificateDomainNameParam") | .ParameterValue'&lt;/span&gt; &lt;span class="s2"&gt;"parameters-&lt;/span&gt;&lt;span class="nv"&gt;$4&lt;/span&gt;&lt;span class="s2"&gt;.json"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;S3_DEMO_BUCKET_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudformation describe-stacks &lt;span class="nt"&gt;--stack-name&lt;/span&gt; demo-s3-stack &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Stacks[0].Outputs[?OutputKey=='DemoFrontendBucketName'].OutputValue"&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nv"&gt;S3_DEMO_BUCKET_OAI&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudformation describe-stacks &lt;span class="nt"&gt;--stack-name&lt;/span&gt; demo-s3-stack  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"Stacks[0].Outputs[?OutputKey=='DemoFrontendCloudFrontOAI'].OutputValue"&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
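&lt;p&gt;The jq filter picks the &lt;code&gt;Value&lt;/code&gt; of the export whose &lt;code&gt;Name&lt;/code&gt; matches. Here is the same lookup as a Python sketch. It assumes you already captured the JSON printed by &lt;code&gt;aws cloudformation list-exports&lt;/code&gt;, and the sample export values are made up. Unlike the jq version, which silently yields an empty variable when an export is missing, it fails loudly:&lt;/p&gt;

```python
import json


def export_value(list_exports_json: str, name: str) -> str:
    """Return the Value of the CloudFormation export called `name`.

    `list_exports_json` is the JSON printed by `aws cloudformation list-exports`,
    shaped like {"Exports": [{"ExportingStackId": ..., "Name": ..., "Value": ...}]}.
    Mirrors the jq filter `.Exports[] | select(.Name == NAME) | .Value`, but raises
    instead of producing an empty string when nothing matches.
    """
    exports = json.loads(list_exports_json).get("Exports", [])
    for export in exports:
        if export.get("Name") == name:
            return export["Value"]
    raise KeyError(f"CloudFormation export {name!r} not found")


# Made-up sample output for illustration only.
SAMPLE_OUTPUT = json.dumps({
    "Exports": [
        {
            "ExportingStackId": "arn:aws:cloudformation:eu-central-1:123456789012:stack/demo-dns-stack/1a2b3c4d",
            "Name": "demo-hosted-zone-id",
            "Value": "Z0123456789EXAMPLE",
        },
        {
            "ExportingStackId": "arn:aws:cloudformation:eu-central-1:123456789012:stack/demo-dns-stack/1a2b3c4d",
            "Name": "demo-hosted-zone-domain-name",
            "Value": "demo.example.com",
        },
    ]
})

if __name__ == "__main__":
    print(export_value(SAMPLE_OUTPUT, "demo-hosted-zone-id"))
```

&lt;p&gt;If you would rather stay in the shell, the CLI’s own JMESPath filter does the same thing without jq: &lt;code&gt;aws cloudformation list-exports --query "Exports[?Name=='demo-hosted-zone-id'].Value" --output text&lt;/code&gt;.&lt;/p&gt;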



&lt;p&gt;When you use StackSets, you need some extra plumbing: the StackSet resource itself, plus the IAM administration and execution roles that self-managed StackSets require. The template for the target region has to be embedded inside the StackSet. It’s not pretty, and linters can’t check the embedded template, but for one-off cases it’s good enough.&lt;/p&gt;
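&lt;p&gt;As a rough illustration of that embedding, the Python sketch below wraps a hypothetical inner template (a single ACM certificate) in an &lt;code&gt;AWS::CloudFormation::StackSet&lt;/code&gt; resource. The stack set name, domain, hosted zone ID, and target region are placeholders. The role names are the defaults used for self-managed permissions. The inner template ends up as an opaque &lt;code&gt;TemplateBody&lt;/code&gt; string, which is exactly what linters cannot look into.&lt;/p&gt;

```python
import json

# Hypothetical inner template: a resource that must live in another region
# (for example an ACM certificate for CloudFront, which has to be in us-east-1).
INNER_TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "DemoCertificate": {
            "Type": "AWS::CertificateManager::Certificate",
            "Properties": {
                "DomainName": "demo.example.com",
                "ValidationMethod": "DNS",
                # Placeholder zone ID so ACM can create the DNS validation record itself.
                "DomainValidationOptions": [
                    {"DomainName": "demo.example.com", "HostedZoneId": "Z0123456789EXAMPLE"}
                ],
            },
        }
    },
}


def stackset_resource(inner_template: dict, region: str) -> dict:
    """Wrap `inner_template` in an AWS::CloudFormation::StackSet targeting one region of this account."""
    return {
        "Type": "AWS::CloudFormation::StackSet",
        "Properties": {
            "StackSetName": "demo-cross-region-resources",
            "PermissionModel": "SELF_MANAGED",
            # Self-managed StackSets rely on these two IAM roles existing beforehand.
            "AdministrationRoleARN": {
                "Fn::Sub": "arn:aws:iam::${AWS::AccountId}:role/AWSCloudFormationStackSetAdministrationRole"
            },
            "ExecutionRoleName": "AWSCloudFormationStackSetExecutionRole",
            "StackInstancesGroup": [
                {
                    "DeploymentTargets": {"Accounts": [{"Ref": "AWS::AccountId"}]},
                    "Regions": [region],
                }
            ],
            # The inner template is embedded as a plain string, so linters only see text here.
            "TemplateBody": json.dumps(inner_template),
        },
    }


if __name__ == "__main__":
    template = {"Resources": {"DemoGlobalStackSet": stackset_resource(INNER_TEMPLATE, "us-east-1")}}
    print(json.dumps(template, indent=2))
```

&lt;p&gt;The printed JSON is a template fragment that could be deployed from the original region once the two roles exist. Bringing values back, for example through Secrets Manager replication, is a separate step not covered here.&lt;/p&gt;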

&lt;h2&gt;
  
  
  Deploying big stacks
&lt;/h2&gt;

&lt;p&gt;To pass parameters between stacks you have a few options:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Nested stacks&lt;/li&gt;
&lt;li&gt;Exports and imports&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first glance, exports/imports look cleaner. In practice, they can lock you in. CloudFormation won’t let you modify or delete an export while another stack imports it, so to change the value you first have to remove the import from every consuming stack. The good news: &lt;code&gt;aws cloudformation list-imports --export-name demo-hosted-zone-id&lt;/code&gt; shows exactly which stacks use a given export.&lt;/p&gt;

&lt;p&gt;Because of this, I usually prefer nested stacks with parameter passing. When a value changes in the root stack, CloudFormation passes it down to the nested stacks in the same update and modifies or replaces the dependent resources as needed. This keeps the dependency chain explicit and the updates predictable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying stack policy to nested stacks
&lt;/h2&gt;

&lt;p&gt;When you apply a stack policy to the root stack, it doesn’t automatically cover the nested stacks. Each nested stack is a stack in its own right, with its own policy. So I set the policy separately on every nested stack, usually with a small script that iterates over the child stacks and applies the policy to each one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;NESTED_STACK_ARNS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;aws cloudformation describe-stack-resources &lt;span class="nt"&gt;--stack-name&lt;/span&gt; demo-main-stack  &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"StackResources[?ResourceType=='AWS::CloudFormation::Stack'].PhysicalResourceId"&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; text&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Setting stack policy for demo main stack: demo-main-stack"&lt;/span&gt;
aws cloudformation set-stack-policy &lt;span class="nt"&gt;--stack-name&lt;/span&gt; demo-main-stack &lt;span class="nt"&gt;--stack-policy-body&lt;/span&gt; file://main-stack-policy.json &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Error setting stack policy to demo main stack. Exiting..."&lt;/span&gt;
&lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# Apply stack policy to each nested stack&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;STACK &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$NESTED_STACK_ARNS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Setting stack policy for nested stack: &lt;/span&gt;&lt;span class="nv"&gt;$STACK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    aws cloudformation set-stack-policy &lt;span class="nt"&gt;--stack-name&lt;/span&gt; &lt;span class="nv"&gt;$STACK&lt;/span&gt; &lt;span class="nt"&gt;--stack-policy-body&lt;/span&gt; file://./main-stack-policy.json &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Error setting stack policy to nested stack: &lt;/span&gt;&lt;span class="nv"&gt;$STACK&lt;/span&gt;&lt;span class="s2"&gt;. Exiting..."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
done

for &lt;/span&gt;STACK &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="nv"&gt;$NESTED_STACK_ARNS&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Get stack policy for nested stack: &lt;/span&gt;&lt;span class="nv"&gt;$STACK&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    aws cloudformation get-stack-policy &lt;span class="nt"&gt;--stack-name&lt;/span&gt; &lt;span class="nv"&gt;$STACK&lt;/span&gt; &lt;span class="nt"&gt;--region&lt;/span&gt; &lt;span class="nv"&gt;$REGION&lt;/span&gt; &lt;span class="nt"&gt;--output&lt;/span&gt; json &lt;span class="nt"&gt;--no-cli-pager&lt;/span&gt; | jq &lt;span class="s1"&gt;'.StackPolicyBody | fromjson'&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="nv"&gt;$?&lt;/span&gt; &lt;span class="nt"&gt;-ne&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Error getting stack policy from nested stack: &lt;/span&gt;&lt;span class="nv"&gt;$STACK&lt;/span&gt;&lt;span class="s2"&gt;. Exiting..."&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
    &lt;span class="k"&gt;fi
done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
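&lt;p&gt;The loop above covers only the direct children of the root stack. If your nested stacks have nested stacks of their own, you need a recursive walk. Below is a hedged Python sketch of that idea. It expects a boto3 CloudFormation client, or anything with the same &lt;code&gt;set_stack_policy&lt;/code&gt; and &lt;code&gt;describe_stack_resources&lt;/code&gt; methods. Because the article does not show &lt;code&gt;main-stack-policy.json&lt;/code&gt;, the policy here is a made-up example that blocks replacing or deleting Aurora clusters.&lt;/p&gt;

```python
import json
from collections import deque

# Made-up example policy (the article's main-stack-policy.json is not shown):
# allow all updates, but deny replacing or deleting Aurora DB clusters.
POLICY = {
    "Statement": [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"},
        {
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": "*",
            "Condition": {"StringEquals": {"ResourceType": ["AWS::RDS::DBCluster"]}},
        },
    ]
}


def apply_policy_recursively(cfn, root_stack: str, policy: dict) -> list[str]:
    """Set `policy` on `root_stack` and on every nested stack below it, at any depth.

    `cfn` should behave like boto3.client("cloudformation"). Stacks are visited
    breadth-first, and the stack names/ARNs touched are returned in that order.
    """
    body = json.dumps(policy)
    touched = []
    pending = deque([root_stack])
    while pending:
        stack = pending.popleft()
        cfn.set_stack_policy(StackName=stack, StackPolicyBody=body)
        touched.append(stack)
        # Nested stacks appear as AWS::CloudFormation::Stack resources of their parent.
        resources = cfn.describe_stack_resources(StackName=stack)["StackResources"]
        pending.extend(
            r["PhysicalResourceId"]
            for r in resources
            if r["ResourceType"] == "AWS::CloudFormation::Stack"
        )
    return touched
```

&lt;p&gt;With boto3, this would be called as &lt;code&gt;apply_policy_recursively(boto3.client("cloudformation", region_name=REGION), "demo-main-stack", POLICY)&lt;/code&gt;. Note that &lt;code&gt;describe_stack_resources&lt;/code&gt; returns at most 100 resources. For larger stacks, a &lt;code&gt;list_stack_resources&lt;/code&gt; paginator is the safer choice.&lt;/p&gt;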



&lt;h2&gt;
  
  
  How to deploy this stack
&lt;/h2&gt;

&lt;p&gt;You can deploy this stack with the AWS CLI (&lt;code&gt;aws&lt;/code&gt;). You also need &lt;code&gt;jq&lt;/code&gt; installed.&lt;/p&gt;

&lt;p&gt;Each environment has its own parameters-env.json file in the cfn-stacks/ folder.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;./scripts/01-deploy-cfn.sh --region eu-central-1 --env dev
./scripts/02-deploy-cfn-global.sh --region eu-central-1 --env dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;CloudFormation has its rough edges: stacks are regional, exports lock their consumers in, and stack policies don’t propagate to nested stacks. None of these is a blocker, though. Most of the gaps are covered by four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a thin layer of Bash and jq&lt;/li&gt;
&lt;li&gt;nested stacks for passing parameters&lt;/li&gt;
&lt;li&gt;StackSets for the occasional cross-region resource&lt;/li&gt;
&lt;li&gt;a loop that applies the stack policy to every nested stack&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all my articles on: &lt;a href="https://andygolubev.com/" rel="noopener noreferrer"&gt;https://andygolubev.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/article-cfn-pain-points/tree/main" rel="noopener noreferrer"&gt;https://github.com/andygolubev/article-cfn-pain-points/tree/main&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloudformation</category>
      <category>infrastructureascode</category>
      <category>devops</category>
    </item>
    <item>
      <title>Monitoring multiple k8s clusters on Digital Ocean with Prometheus and Grafana deployed using Terraform and Ansible role</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Mon, 26 Aug 2024 21:09:30 +0000</pubDate>
      <link>https://dev.to/andygolubev/monitoring-multiple-k8s-clusters-with-prometheus-and-grafana-deployed-using-terraform-and-ansible-role-57o2</link>
      <guid>https://dev.to/andygolubev/monitoring-multiple-k8s-clusters-with-prometheus-and-grafana-deployed-using-terraform-and-ansible-role-57o2</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;The internet is full of ready-made solutions for every taste, but problems arise when they don't fit your needs. That's when it's time to come up with something custom.&lt;/p&gt;

&lt;p&gt;This time, the challenge was to collect metrics from two K8S clusters located in different VPCs. &lt;/p&gt;

&lt;p&gt;It seemed like a simple task:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up a dedicated server for Prometheus.&lt;/li&gt;
&lt;li&gt;Create a VPC Peering connection.&lt;/li&gt;
&lt;li&gt;Deploy Prometheus in each cluster.&lt;/li&gt;
&lt;li&gt;Set up federation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, DigitalOcean didn't support VPC peering at the time of writing (see the &lt;a href="https://docs.digitalocean.com/reference/terraform/reference/resources/vpc_peering/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;). All metrics would therefore have left the cloud, traveled over the internet, and come back, incurring unnecessary traffic costs.&lt;/p&gt;

&lt;p&gt;To avoid this, we had to come up with alternative solutions that would work for a small startup while avoiding extra expenses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;So, here's the solution I implemented.&lt;/p&gt;

&lt;p&gt;There are three VPCs. Two of them host the clusters, and the third one contains the supporting tools, including a server with Grafana.&lt;/p&gt;

&lt;p&gt;Grafana connects to each cluster and pulls data for the dashboard. This setup ensures that traffic only flows when the dashboard is being viewed. Authentication is handled at the ingress.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq5gopwl40fh7dqlbrin.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwq5gopwl40fh7dqlbrin.jpg" alt="Solution" width="800" height="815"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tools
&lt;/h2&gt;

&lt;p&gt;For the implementation, I chose the following tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prometheus&lt;/li&gt;
&lt;li&gt;Grafana&lt;/li&gt;
&lt;li&gt;Loki&lt;/li&gt;
&lt;li&gt;Terraform&lt;/li&gt;
&lt;li&gt;Ansible role&lt;/li&gt;
&lt;li&gt;Nginx&lt;/li&gt;
&lt;li&gt;Docker Compose&lt;/li&gt;
&lt;li&gt;Ubuntu server on Droplet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I use Terraform to provision the server for the subsequent Grafana installation, as well as to create DNS records.&lt;/p&gt;

&lt;p&gt;I use an Ansible role to configure the server, including the installation and launch of all necessary services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Certbot&lt;/li&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;Nginx&lt;/li&gt;
&lt;li&gt;Grafana dashboards&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Prometheus and Grafana Loki are installed in the K8S clusters, from where the metrics and logs are collected.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Code
&lt;/h2&gt;

&lt;p&gt;I have a standard Ansible role written with tasks and templates to automate the setup and configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  monitoring-prometheus git:&lt;span class="o"&gt;(&lt;/span&gt;main&lt;span class="o"&gt;)&lt;/span&gt; tree monitoring_role 
monitoring_role
├── README.md
├── defaults
│&amp;nbsp;&amp;nbsp; └── main.yml
├── files
│&amp;nbsp;&amp;nbsp; ├── grafana-k8s-cluster-dashboard.json
│&amp;nbsp;&amp;nbsp; ├── grafana-k8s-logs-dashboard.json
│&amp;nbsp;&amp;nbsp; └── grafana-k8s-volumes-dashboard.json
├── handlers
│&amp;nbsp;&amp;nbsp; └── main.yml
├── meta
│&amp;nbsp;&amp;nbsp; └── main.yml
├── tasks
│&amp;nbsp;&amp;nbsp; ├── 01_wait_for_initialization.yml
│&amp;nbsp;&amp;nbsp; ├── 02_install_certbot_and_configure_nginx.yml
│&amp;nbsp;&amp;nbsp; ├── 03_install_docker.yml
│&amp;nbsp;&amp;nbsp; ├── 04_add_monitoring_user.yml
│&amp;nbsp;&amp;nbsp; ├── 05_copy_configuration_files.yml
│&amp;nbsp;&amp;nbsp; ├── 06_run_containers.yml
│&amp;nbsp;&amp;nbsp; ├── 07_enable_ufw.yml
│&amp;nbsp;&amp;nbsp; └── main.yml
├── templates
│&amp;nbsp;&amp;nbsp; ├── dashboards.yaml.j2
│&amp;nbsp;&amp;nbsp; ├── datasources.yaml.j2
│&amp;nbsp;&amp;nbsp; ├── default.j2
│&amp;nbsp;&amp;nbsp; └── docker-compose.yml.j2
├── tests
│&amp;nbsp;&amp;nbsp; ├── inventory
│&amp;nbsp;&amp;nbsp; └── test.yml
└── vars
    └── main.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Ansible role issues TLS certificates with Certbot, configures Nginx, installs Docker, copies the Grafana configuration and dashboards, starts the containers with Docker Compose, and enables UFW.&lt;/p&gt;
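&lt;p&gt;The role renders &lt;code&gt;templates/datasources.yaml.j2&lt;/code&gt;, which isn't shown here. As a hypothetical sketch of what such a file could contain, this Python snippet builds a Grafana datasource provisioning document with one Prometheus and one Loki source per cluster. The hostnames, URL paths, and the use of basic auth at the ingress are assumptions, not the repository's actual values.&lt;/p&gt;

```python
import json

# Hypothetical cluster endpoints: each cluster exposes Prometheus and Loki behind an
# authenticated ingress. Hostnames and paths are placeholders.
CLUSTERS = {
    "cluster-a": "a.monitoring.example.com",
    "cluster-b": "b.monitoring.example.com",
}


def datasources(clusters: dict[str, str], user: str, password: str) -> dict:
    """Build a Grafana datasource provisioning document: one Prometheus and one Loki source per cluster."""
    sources = []
    for name, host in clusters.items():
        for kind in ("prometheus", "loki"):
            sources.append({
                "name": f"{name}-{kind}",
                "type": kind,
                # proxy: the Grafana server itself sends the queries to the cluster ingress
                "access": "proxy",
                "url": f"https://{host}/{kind}",
                "basicAuth": True,
                "basicAuthUser": user,
                "secureJsonData": {"basicAuthPassword": password},
            })
    return {"apiVersion": 1, "datasources": sources}


if __name__ == "__main__":
    # YAML parsers generally accept JSON, so this output can serve as a provisioning file.
    print(json.dumps(datasources(CLUSTERS, "grafana", "change-me"), indent=2))
```

&lt;p&gt;With &lt;code&gt;access: proxy&lt;/code&gt;, queries travel from the Grafana server to the clusters only when a panel is loaded or an alert rule runs. That matches the pull-on-view design described above.&lt;/p&gt;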

&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;Here is an example of how the final dashboard looks:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fta0b41qmptebbudiy1u2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fta0b41qmptebbudiy1u2.jpg" alt="Dashboard" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;You don’t always need to rely on out-of-the-box solutions, especially when they don’t fit your needs. With a bit of creativity and the right open-source tools, you can build a custom solution that’s both effective and cost-efficient. In this case, combining Prometheus, Grafana, Loki, and a few other tools, I managed to set up a reliable monitoring system that works perfectly for a small startup without breaking the bank.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/monitoring-prometheus" rel="noopener noreferrer"&gt;https://github.com/andygolubev/monitoring-prometheus&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ansible</category>
      <category>prometheus</category>
      <category>grafana</category>
      <category>terraform</category>
    </item>
    <item>
      <title>Backup tool using AWS Batch, ECS and Fargate for backuping objects from other clouds</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Tue, 13 Feb 2024 10:11:49 +0000</pubDate>
      <link>https://dev.to/andygolubev/backup-tool-using-aws-batch-ecs-and-fargate-for-backuping-objects-from-other-clouds-3bj8</link>
      <guid>https://dev.to/andygolubev/backup-tool-using-aws-batch-ecs-and-fargate-for-backuping-objects-from-other-clouds-3bj8</guid>
      <description>&lt;h2&gt;
  
  
  Intro
&lt;/h2&gt;

&lt;p&gt;Sometimes we have to do dull tasks like setting up a backup system, and we want to do them in the easiest and cheapest way possible. Fortunately, AWS offers plenty of services that make this smooth.&lt;br&gt;
Because my compute resources run on DigitalOcean, I couldn't use AWS Backup. Instead, I looked for services that suit infrequent workloads and cost nothing while idle.&lt;br&gt;
For this setup, I picked AWS Batch with ECS on Fargate, and I use Amazon EventBridge Scheduler to start the jobs.&lt;/p&gt;
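&lt;p&gt;Here is why "cost nothing while idle" holds. With Fargate you pay only for the vCPU and memory a task uses while it runs: Linux tasks are billed per second with a one-minute minimum, and AWS Batch itself adds no extra charge. The following is a rough sketch for estimating the compute cost of a single run. The function name &lt;code&gt;fargate_run_cost&lt;/code&gt; is hypothetical. The rates in the example are made-up round numbers, so substitute the current Fargate prices for your region. The sketch ignores the one-minute minimum, storage, and data transfer.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Rough compute-cost estimate for one Fargate run.
# Rates are parameters on purpose: look up the current per-vCPU-hour and
# per-GB-hour Fargate prices for your region instead of trusting defaults.
# Ignores the one-minute billing minimum, storage, and data transfer.
fargate_run_cost() {
  local vcpu=$1 mem_gb=$2 minutes=$3 vcpu_rate=$4 gb_rate=$5
  awk -v v="$vcpu" -v m="$mem_gb" -v t="$minutes" \
      -v pv="$vcpu_rate" -v pm="$gb_rate" \
      'BEGIN { printf "%.4f\n", (t / 60) * (v * pv + m * pm) }'
}

# Example: 0.5 vCPU and 1 GB for 30 minutes, with placeholder rates.
fargate_run_cost 0.5 1 30 0.04 0.004
```

&lt;p&gt;At those placeholder rates, a 30-minute run costs about a cent, and nothing is billed for compute between runs.&lt;/p&gt;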
&lt;h2&gt;
  
  
  Solution
&lt;/h2&gt;

&lt;p&gt;Here is a high-level diagram of my solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futn9mt923z6u56hj3gts.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futn9mt923z6u56hj3gts.jpg" alt="solution diagram" width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I provision everything with AWS CloudFormation. Here is an excerpt from the template:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
  &lt;span class="na"&gt;BackuperComputeEnvironment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Batch::ComputeEnvironment&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MANAGED&lt;/span&gt;
      &lt;span class="na"&gt;ComputeEnvironmentName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;backuper-environment&lt;/span&gt;
      &lt;span class="na"&gt;ComputeResources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;MaxvCpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
        &lt;span class="na"&gt;SecurityGroupIds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BackuperSecurityGroup&lt;/span&gt;
        &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FARGATE&lt;/span&gt;
        &lt;span class="na"&gt;Subnets&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BackuperSubnet&lt;/span&gt;
      &lt;span class="na"&gt;Tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BackuperComputeEnvironment"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CreatedBy"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CloudFormationStack"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Backuper"&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;
      &lt;span class="na"&gt;State&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ENABLED&lt;/span&gt;

  &lt;span class="na"&gt;BackuperJobDefinition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS::Batch::JobDefinition&lt;/span&gt;
    &lt;span class="na"&gt;Properties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;container&lt;/span&gt;
      &lt;span class="na"&gt;JobDefinitionName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BackuperJobDefinition&lt;/span&gt;
      &lt;span class="na"&gt;PlatformCapabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;FARGATE&lt;/span&gt;
      &lt;span class="na"&gt;ContainerProperties&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;Image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;registry.hub.docker.com/andygolubev/backuper:latest&lt;/span&gt;
        &lt;span class="na"&gt;Environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;AWS_BACKUP_DESTINATION_BUCKET&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;awsBackupDestinationBucketName&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_PG_USER&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doPgUser&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_KEY&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doKey&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_PG_DBNAME&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doPgDbname&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_PG_HOST&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doPgHost&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_SECRET&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doSecret&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_REGION_ENDPOINT&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doRegionEndpoint&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_PG_PORT&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doPgPort&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;DO_PG_PASSWORD&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;doPgPassword&lt;/span&gt;
        &lt;span class="na"&gt;Command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/bin/bash&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/backuper/s3_backup_script.sh &amp;amp;&amp;amp; /bin/bash -c /backuper/postgre_backup_script.sh&lt;/span&gt;
        &lt;span class="na"&gt;Privileged&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;JobRoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt;  &lt;span class="s"&gt;BackuperAmazonECSTaskExecutionRole.Arn&lt;/span&gt;
        &lt;span class="na"&gt;ExecutionRoleArn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!GetAtt&lt;/span&gt; &lt;span class="s"&gt;BackuperAmazonECSTaskExecutionRole.Arn&lt;/span&gt;
        &lt;span class="na"&gt;ReadonlyRootFilesystem&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
        &lt;span class="na"&gt;NetworkConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;AssignPublicIp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ENABLED&lt;/span&gt;
        &lt;span class="na"&gt;ResourceRequirements&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MEMORY&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1024&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;Type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;VCPU&lt;/span&gt;
            &lt;span class="na"&gt;Value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;
        &lt;span class="na"&gt;LogConfiguration&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;LogDriver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;awslogs&lt;/span&gt;
          &lt;span class="na"&gt;Options&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;awslogs-group"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;BackuperLogGroup&lt;/span&gt;
            &lt;span class="s"&gt;"awslogs-stream-prefix"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prefix"&lt;/span&gt;
      &lt;span class="na"&gt;Tags&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Name"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BackuperJobDefinition"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CreatedBy"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CloudFormationStack"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;App"&lt;/span&gt; &lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Backuper"&lt;/span&gt;&lt;span class="pi"&gt;}&lt;/span&gt;

&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
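&lt;p&gt;The job definition requests 0.5 vCPU and 1024 MiB of memory. Fargate accepts only specific vCPU/memory pairs, so it helps to check a pair before changing these values. The following is a hypothetical helper, not part of the repository. Its table covers the 0.25 to 4 vCPU sizes as listed in the Fargate documentation, so verify it against the current docs before relying on it.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: is this vCPU/memory (MiB) pair a valid Fargate size?
# Covers 0.25 to 4 vCPU only; the 8 and 16 vCPU sizes are omitted.
fargate_size_ok() {
  local vcpu=$1 mem=$2 min max
  case "$vcpu" in
    0.25)
      # 0.25 vCPU allows exactly three memory sizes.
      case "$mem" in 512|1024|2048) return 0 ;; *) return 1 ;; esac ;;
    0.5) min=1024 max=4096 ;;
    1)   min=2048 max=8192 ;;
    2)   min=4096 max=16384 ;;
    4)   min=8192 max=30720 ;;
    *)   return 1 ;;
  esac
  # Other sizes: memory must be in range and a multiple of 1024 MiB.
  (( mem >= min )) || return 1
  (( max >= mem )) || return 1
  (( mem % 1024 == 0 ))
}

# $pair is left unquoted on purpose so it splits into two arguments.
for pair in "0.5 1024" "0.5 512" "1 3072"; do
  if fargate_size_ok $pair; then echo "$pair: valid"; else echo "$pair: invalid"; fi
done
```

&lt;p&gt;The pair used in the template (0.5 vCPU, 1024 MiB) is the smallest memory setting allowed for 0.5 vCPU.&lt;/p&gt;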



&lt;p&gt;The job runs my own Docker image, which has the backup bash scripts built in.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ubuntu:22.04&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /tmp&lt;/span&gt;

&lt;span class="c"&gt;# install tools&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nt"&gt;-y&lt;/span&gt; upgrade &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nt"&gt;--no-install-suggests&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;wget unzip curl tree git jq gettext zip ca-certificates

&lt;span class="c"&gt;# install aws cli v2&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;curl &lt;span class="s2"&gt;"https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="s2"&gt;"awscliv2.zip"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    unzip awscliv2.zip &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    ./aws/install &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    aws &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;span class="c"&gt;# install the latest PostgreSQL client tools&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;lsb-release gnupg2 &lt;span class="nt"&gt;--no-install-suggests&lt;/span&gt; &lt;span class="nt"&gt;--no-install-recommends&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    sh &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" &amp;gt; /etc/apt/sources.list.d/pgdg.list'&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    wget &lt;span class="nt"&gt;--quiet&lt;/span&gt; &lt;span class="nt"&gt;-O&lt;/span&gt; - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt &lt;span class="nt"&gt;-y&lt;/span&gt; &lt;span class="nb"&gt;install &lt;/span&gt;postgresql-client

&lt;span class="c"&gt;# make working folders&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; /backuper&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;mkdir&lt;/span&gt; /backup&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;mkdir&lt;/span&gt; /backup_db
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /backuper&lt;/span&gt;

&lt;span class="c"&gt;# declare variables for s3_backup_script.sh&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_KEY=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_SECRET=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_REGION_ENDPOINT=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; AWS_BACKUP_DESTINATION_BUCKET=NOT_DEFINED&lt;/span&gt;

&lt;span class="c"&gt;# declare variables for postgre_backup_script.sh&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_PG_HOST=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_PG_PORT=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_PG_USER=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_PG_PASSWORD=NOT_DEFINED&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; DO_PG_DBNAME=NOT_DEFINED&lt;/span&gt;

&lt;span class="c"&gt;# create the bucket backup script inside the Docker image&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"BUCKETS_ALL=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;(AWS_ACCESS_KEY_ID=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;DO_KEY AWS_SECRET_ACCESS_KEY=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;DO_SECRET aws s3 ls --endpoint=&lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;DO_REGION_ENDPOINT  | awk '{print &lt;/span&gt;&lt;span class="se"&gt;\$&lt;/span&gt;&lt;span class="s2"&gt;3}')"&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /backuper/s3_backup_script.sh
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;echo "These buckets will be processed: $BUCKETS_ALL" \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;for BUCKET in $BUCKETS_ALL \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;do \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  echo "Processing bucket -&amp;gt; $BUCKET" \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  mkdir -p /backup/$BUCKET/ \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  AWS_ACCESS_KEY_ID=$DO_KEY AWS_SECRET_ACCESS_KEY=$DO_SECRET aws s3 cp --quiet --recursive --endpoint=$DO_REGION_ENDPOINT s3://$BUCKET /backup/$BUCKET/ \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  ZIP_FILE_DATE_TIME=$(date +%Y-%m-%d--%H-%M) \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  zip --recurse-paths --quiet /backup/$ZIP_FILE_DATE_TIME-UTC-$BUCKET-bucket_backup.zip /backup/$BUCKET/ \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  aws s3 cp --storage-class GLACIER_IR /backup/$ZIP_FILE_DATE_TIME-UTC-$BUCKET-bucket_backup.zip  s3://$AWS_BACKUP_DESTINATION_BUCKET \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;  echo "Successfully Processed -&amp;gt; $BUCKET" \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;done \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;echo "Bucket backup is COMPLETED" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /backuper/s3_backup_script.sh

&lt;span class="c"&gt;# create the PostgreSQL backup script inside the Docker image&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;echo "Making dump for PostgreSQL Database --&amp;gt; $DO_PG_DBNAME" \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;mkdir -p /backup_db/$DO_PG_DBNAME/ \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;PGPASSWORD=$DO_PG_PASSWORD pg_dump -U $DO_PG_USER -h $DO_PG_HOST -p $DO_PG_PORT -Fc $DO_PG_DBNAME &amp;gt; /backup_db/$DO_PG_DBNAME/$DO_PG_DBNAME.dump \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;PGPASSWORD=$DO_PG_PASSWORD pg_dump -U $DO_PG_USER -h $DO_PG_HOST -p $DO_PG_PORT $DO_PG_DBNAME &amp;gt; /backup_db/$DO_PG_DBNAME/$DO_PG_DBNAME.sql \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;ZIP_FILE_DATE_TIME=$(date +%Y-%m-%d--%H-%M) \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;zip --recurse-paths --quiet /backup_db/$ZIP_FILE_DATE_TIME-UTC-$DO_PG_DBNAME-postgre_backup.zip /backup_db/$DO_PG_DBNAME/ \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;aws s3 cp --storage-class GLACIER_IR /backup_db/$ZIP_FILE_DATE_TIME-UTC-$DO_PG_DBNAME-postgre_backup.zip  s3://$AWS_BACKUP_DESTINATION_BUCKET \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;echo "Successfully Processed -&amp;gt; $DO_PG_DBNAME" \n&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;echo "PostgreSQL backup is COMPLETED" &lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s1"&gt;'&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; /backuper/postgre_backup_script.sh

&lt;span class="c"&gt;# make the scripts executable&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /backuper/s3_backup_script.sh
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x /backuper/postgre_backup_script.sh

&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/bin/bash", "-c",  "/backuper/s3_backup_script.sh &amp;amp;&amp;amp; /bin/bash -c /backuper/postgre_backup_script.sh"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
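&lt;p&gt;Because the Dockerfile assembles its scripts with &lt;code&gt;echo&lt;/code&gt;, the logic is hard to read in place. The following sketch pulls two steps of &lt;code&gt;s3_backup_script.sh&lt;/code&gt; into standalone bash functions. The first extracts bucket names from &lt;code&gt;aws s3 ls&lt;/code&gt; output with &lt;code&gt;awk&lt;/code&gt;, and the second builds the timestamped archive name. The function names are hypothetical, and the example feeds canned listing output instead of calling S3. The sketch uses &lt;code&gt;date -u&lt;/code&gt; so the &lt;code&gt;-UTC-&lt;/code&gt; label stays accurate even if the container's time zone is changed.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring two steps of s3_backup_script.sh.

# Without a path, `aws s3 ls` prints one bucket per line, e.g.
#   2024-02-13 10:11:49 my-bucket
# so the third whitespace-separated field is the bucket name.
bucket_names() {
  awk '{ print $3 }'
}

# Same naming scheme as the script; `date -u` makes the UTC label explicit.
archive_name() {
  local bucket=$1
  printf '%s-UTC-%s-bucket_backup.zip\n' "$(date -u +%Y-%m-%d--%H-%M)" "$bucket"
}

# Canned listing output stands in for a live S3 call.
printf '2024-02-13 10:11:49 photos\n2024-02-13 10:12:01 invoices\n' | bucket_names
archive_name photos
```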



&lt;p&gt;Once the CloudFormation stack is deployed successfully, we can see all the resources that have been created.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgcid1y45dmjflflbk5o.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgcid1y45dmjflflbk5o.jpg" alt="cloud formation resources" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, the AWS Batch environment is fully set up and waiting for a trigger event.&lt;br&gt;
The schedule is configured as shown below, and its cron expression triggers the job at 8 AM UTC.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiu7yqoehutewl9t0zso.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqiu7yqoehutewl9t0zso.jpg" alt="scheduler" width="800" height="477"&gt;&lt;/a&gt;&lt;/p&gt;
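&lt;p&gt;EventBridge Scheduler cron expressions have six fields: minutes, hours, day-of-month, month, day-of-week, and year. Either day-of-month or day-of-week must be &lt;code&gt;?&lt;/code&gt;. The following is a minimal sketch that builds a daily expression. &lt;code&gt;daily_cron&lt;/code&gt; is a hypothetical helper, and its output is not necessarily the exact expression configured in the stack.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical helper: EventBridge Scheduler cron expression for a daily run.
# Fields: minutes hours day-of-month month day-of-week year.
# Day-of-month is "*", so day-of-week must be "?".
# Pass plain integers (e.g. 8, not 08; printf would read 08 as octal).
daily_cron() {
  local hour=$1 minute=${2:-0}
  printf 'cron(%d %d * * ? *)\n' "$minute" "$hour"
}

daily_cron 8        # 08:00 in the schedule's time zone
```

&lt;p&gt;The resulting string is what you would pass to &lt;code&gt;aws scheduler create-schedule&lt;/code&gt; via &lt;code&gt;--schedule-expression&lt;/code&gt;. The time zone is set separately with &lt;code&gt;--schedule-expression-timezone&lt;/code&gt; and defaults to UTC.&lt;/p&gt;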

&lt;p&gt;An EventBridge rule filters AWS Batch job events, strips out sensitive data with an input transformer, and sends the result to the SNS topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47y7dxszdljzwxbz7dg5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F47y7dxszdljzwxbz7dg5.jpg" alt="rule" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i61bl65cgnv242b2mbh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2i61bl65cgnv242b2mbh.jpg" alt="transformer" width="800" height="464"&gt;&lt;/a&gt;&lt;/p&gt;
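&lt;p&gt;The following is a hypothetical version of such a rule, kept in shell variables. AWS Batch emits &lt;code&gt;Batch Job State Change&lt;/code&gt; events from the &lt;code&gt;aws.batch&lt;/code&gt; source. Their &lt;code&gt;detail&lt;/code&gt; mirrors the job description, including the container environment, which here contains values such as &lt;code&gt;DO_PG_PASSWORD&lt;/code&gt;. An input transformer that maps only a few safe fields keeps those values out of the email. The matched statuses and the rule name are assumptions, not the exact configuration in the screenshots.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical rule settings; statuses and names are assumptions.

# Match only terminal states of AWS Batch jobs.
EVENT_PATTERN='{
  "source": ["aws.batch"],
  "detail-type": ["Batch Job State Change"],
  "detail": { "status": ["SUCCEEDED", "FAILED"] }
}'

# Input transformer paths: pick a few safe fields only. Nothing under
# $.detail.container (which holds the environment variables) is mapped.
INPUT_PATHS_MAP='{
  "job": "$.detail.jobName",
  "status": "$.detail.status",
  "reason": "$.detail.statusReason"
}'

printf '%s\n' "$EVENT_PATTERN" "$INPUT_PATHS_MAP"

# Creating the rule needs AWS credentials, so it is only shown here:
#   aws events put-rule --name backuper-job-state --event-pattern "$EVENT_PATTERN"
```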

&lt;p&gt;After the job runs (either manually or on schedule), it appears on the dashboard, and we receive an email with the job's status.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujnezbxntvbhs555xwru.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fujnezbxntvbhs555xwru.jpg" alt="dashboard" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result notification looks like this.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc955uxusvrf4pgd0jajv.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc955uxusvrf4pgd0jajv.jpg" alt="email" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This setup shows how to handle occasional workloads efficiently with AWS Batch, ECS, and Fargate while keeping costs down. Give it a shot!&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/aws-backup-with-batch-and-fargate" rel="noopener noreferrer"&gt;https://github.com/andygolubev/aws-backup-with-batch-and-fargate&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>ecs</category>
      <category>cloudformation</category>
      <category>awsbatch</category>
    </item>
    <item>
      <title>Kubernetes The Hard Way on AWS with Packer and Terraform</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Tue, 17 Oct 2023 09:58:28 +0000</pubDate>
      <link>https://dev.to/andygolubev/kubernetes-the-hard-way-on-aws-with-packer-and-terraform-56bh</link>
      <guid>https://dev.to/andygolubev/kubernetes-the-hard-way-on-aws-with-packer-and-terraform-56bh</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Kubernetes has undoubtedly become the de facto standard for container orchestration, offering a powerful and flexible platform for deploying, managing, and scaling containerized applications. As organizations increasingly adopt cloud-native architectures, mastering Kubernetes has become a critical skill for both developers and operations teams. While there are numerous managed Kubernetes services available in the cloud, there's immense value in understanding the intricacies of Kubernetes by building it from scratch, often referred to as "the hard way."&lt;br&gt;
I wouldn't run this kind of cluster day to day because it is hard to maintain, but it is a valuable tool for learning.&lt;/p&gt;

&lt;p&gt;I built this cluster following the A Cloud Guru course "Kubernetes the Hard Way." It was quite a challenge because the course used an older version of Kubernetes and an outdated DNS plugin, so I had to modify many scripts and troubleshoot extensively. Despite the difficulties, it turned out to be an enjoyable experience. I also had to design a multi-Availability Zone (AZ) network for my EC2 instances and set up all the necessary network components, because the original course placed all the hosts in the same security group.&lt;/p&gt;
&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;My solution spans three Availability Zones (AZs) in the same region, and a bastion host provides access to the cluster. All EC2 instances use custom images that I build in the previous stage of my pipeline.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc284nhsd7rdfmxu1zdwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc284nhsd7rdfmxu1zdwg.png" alt="Architecture" width="800" height="415"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c284nhsd7rdfmxu1zdwg.png" rel="noopener noreferrer"&gt;High-resolution image&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;I start by creating a Terraform state bucket and a DynamoDB lock table with the AWS CLI. This is a common step in my pipelines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight hcl"&gt;&lt;code&gt;&lt;span class="err"&gt;...&lt;/span&gt;
&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# prefixes must be the same as in the 00-provider.tf&lt;/span&gt;
  &lt;span class="nx"&gt;AWS_BUCKET_NAME_PREFIX&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"terraform-state-for-kubernetes-the-hard-way-packer"&lt;/span&gt; 
  &lt;span class="nx"&gt;AWS_DYNAMO_DB_TABLE_NAME_PREFIX&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"terraform-state-for-terraform-state-for-kubernetes-the-hard-way-packer"&lt;/span&gt;

  &lt;span class="nx"&gt;AWS_REGION&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;$&lt;/span&gt;&lt;span class="p"&gt;{{&lt;/span&gt; &lt;span class="nx"&gt;vars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;AWS_REGION&lt;/span&gt; &lt;span class="p"&gt;}}&lt;/span&gt;

&lt;span class="err"&gt;...&lt;/span&gt;
    &lt;span class="nx"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;bucket&lt;/span&gt;
      &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt;
        &lt;span class="nx"&gt;if&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"${{ env.AWS_REGION }}"&lt;/span&gt; &lt;span class="err"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"us-east-1"&lt;/span&gt; &lt;span class="p"&gt;]]&lt;/span&gt;&lt;span class="err"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;then&lt;/span&gt;
          &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;s3api&lt;/span&gt; &lt;span class="nx"&gt;create-bucket&lt;/span&gt; &lt;span class="nx"&gt;--bucket&lt;/span&gt; &lt;span class="nx"&gt;$AWS_BUCKET_NAME_PREFIX-$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--region&lt;/span&gt; &lt;span class="nx"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--no-cli-pager&lt;/span&gt;
        &lt;span class="nx"&gt;else&lt;/span&gt;
          &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;s3api&lt;/span&gt; &lt;span class="nx"&gt;create-bucket&lt;/span&gt; &lt;span class="nx"&gt;--bucket&lt;/span&gt; &lt;span class="nx"&gt;$AWS_BUCKET_NAME_PREFIX-$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--region&lt;/span&gt; &lt;span class="nx"&gt;$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--no-cli-pager&lt;/span&gt; &lt;span class="nx"&gt;--create-bucket-configuration&lt;/span&gt; &lt;span class="nx"&gt;LocationConstraint&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;$AWS_REGION&lt;/span&gt;
        &lt;span class="nx"&gt;fi&lt;/span&gt;

        &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;s3api&lt;/span&gt; &lt;span class="nx"&gt;put-bucket-versioning&lt;/span&gt; &lt;span class="nx"&gt;--bucket&lt;/span&gt; &lt;span class="nx"&gt;$AWS_BUCKET_NAME_PREFIX-$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--versioning-configuration&lt;/span&gt; &lt;span class="nx"&gt;Status&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;Enabled&lt;/span&gt;
        &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;s3api&lt;/span&gt; &lt;span class="nx"&gt;put-bucket-encryption&lt;/span&gt; &lt;span class="nx"&gt;--bucket&lt;/span&gt; &lt;span class="nx"&gt;$AWS_BUCKET_NAME_PREFIX-$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--server-side-encryption-configuration&lt;/span&gt; &lt;span class="s1"&gt;'{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'&lt;/span&gt;

    &lt;span class="nx"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;DynamoDB&lt;/span&gt; &lt;span class="nx"&gt;table&lt;/span&gt;
      &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt;
        &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;dynamodb&lt;/span&gt; &lt;span class="nx"&gt;create-table&lt;/span&gt; &lt;span class="nx"&gt;--table-name&lt;/span&gt; &lt;span class="nx"&gt;$AWS_DYNAMO_DB_TABLE_NAME_PREFIX-$AWS_REGION&lt;/span&gt; &lt;span class="nx"&gt;--attribute-definitions&lt;/span&gt; &lt;span class="nx"&gt;AttributeName&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;LockID&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;AttributeType&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;S&lt;/span&gt; &lt;span class="nx"&gt;--key-schema&lt;/span&gt; &lt;span class="nx"&gt;AttributeName&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;LockID&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;KeyType&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;HASH&lt;/span&gt; &lt;span class="nx"&gt;--billing-mode&lt;/span&gt; &lt;span class="nx"&gt;PAY_PER_REQUEST&lt;/span&gt; &lt;span class="nx"&gt;--tags&lt;/span&gt; &lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;Name&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"terraform state dynamo table"&lt;/span&gt; &lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;CreatedBy&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"AWS CLI"&lt;/span&gt; &lt;span class="nx"&gt;Key&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;Region&lt;/span&gt;&lt;span class="err"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;Value&lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="nx"&gt;$AWS_REGION&lt;/span&gt; 

    &lt;span class="nx"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Create&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt; &lt;span class="nx"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;VPC&lt;/span&gt; &lt;span class="nx"&gt;in&lt;/span&gt; &lt;span class="nx"&gt;the&lt;/span&gt; &lt;span class="nx"&gt;region&lt;/span&gt;
      &lt;span class="nx"&gt;run&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="err"&gt;|&lt;/span&gt;
        &lt;span class="nx"&gt;aws&lt;/span&gt; &lt;span class="nx"&gt;ec2&lt;/span&gt; &lt;span class="nx"&gt;create-default-vpc&lt;/span&gt; &lt;span class="err"&gt;||&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;    &lt;span class="c1"&gt;# create a default VPC if it does not exist; it is required for building the AMIs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I use bash scripts within the pipeline to create certificates and configurations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  scripts-for-certs-and-configs git:&lt;span class="o"&gt;(&lt;/span&gt;main&lt;span class="o"&gt;)&lt;/span&gt; tree 
&lt;span class="nb"&gt;.&lt;/span&gt;
├── 00-k8s-network.sh
├── 01-certs-ca.sh
├── 02-certs-components.sh
├── 03-certs-api-server.sh
├── 04-certs-service-account.sh
├── 05-kubeconfig.sh
├── 06-generate-encryption-config.sh
├── 07-generate-etcd-service.sh
├── 08-generate-control-plane-configs.sh
├── 09-generate-cluster-role.sh
├── 10-generate-ngix-config.sh
├── 11-generate-containerd-config.sh
├── 12-generate-kubelet-config.sh
├── 13-generate-kube-proxy-config.sh
├── 14-bastion-key.sh
├── 15-generate-wavenet-manifest.sh
└── 16-generate-coredns-manifest.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, I use Packer to build all the Amazon Machine Images (AMIs) and copy the necessary files into them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;==&amp;gt; Builds finished. The artifacts of successful builds are:
--&amp;gt; k8s-control-plane-2.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-2: AMIs were created:
us-west-2: ami-0314729fef4933bdc

--&amp;gt; k8s-control-plane-0.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-0: AMIs were created:
us-west-2: ami-088c138db1acf379f

--&amp;gt; k8s-control-plane-1.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-1: AMIs were created:
us-west-2: ami-0d348cd361f433388

--&amp;gt; k8s-load-balancer-internal.amazon-ebs.ubuntu-kubernetes-the-hard-way-load-balancer-internal: AMIs were created:
us-west-2: ami-07cbdbe6027e64882

--&amp;gt; k8s-bastion-host.amazon-ebs.ubuntu-kubernetes-the-hard-way-bastion-host: AMIs were created:
us-west-2: ami-0f0e571a26d6ac08a

--&amp;gt; k8s-working-node-1.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-1: AMIs were created:
us-west-2: ami-015f7e349eb6ec7ac

--&amp;gt; k8s-working-node-2.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-2: AMIs were created:
us-west-2: ami-03feac7b952f2ce5c

--&amp;gt; k8s-working-node-0.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-0: AMIs were created:
us-west-2: ami-089c713f758a50a63
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
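
&lt;p&gt;If the AMI IDs need to be handed over to another step, the summary that Packer prints at the end can be parsed. The following is a minimal Python sketch that assumes exactly the output format shown above; the function name &lt;code&gt;parse_packer_amis&lt;/code&gt; is hypothetical and not part of the repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re

# Build line: an arrow token, then the build name followed by ".amazon-ebs."
BUILD_RE = re.compile(r"^\S+ ([\w-]+)\.amazon-ebs\.")
# Artifact line, e.g. "us-west-2: ami-0314729fef4933bdc"
AMI_RE = re.compile(r"^([a-z]{2}-[a-z]+-\d+): (ami-[0-9a-f]+)$")


def parse_packer_amis(log_text):
    """Return {build_name: {region: ami_id}} from Packer's final summary."""
    result = {}
    current = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        build = BUILD_RE.match(line)
        if build:
            # Remember which build the following artifact lines belong to.
            current = build.group(1)
            result.setdefault(current, {})
            continue
        ami = AMI_RE.match(line)
        if ami and current is not None:
            region, ami_id = ami.groups()
            result[current][region] = ami_id
    return result
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Applied to the output above, the result maps, for example, k8s-bastion-host to {"us-west-2": "ami-0f0e571a26d6ac08a"}.&lt;/p&gt;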



&lt;p&gt;Finally, I provision the entire infrastructure from my custom AMIs with Terraform.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;➜  terraform git:&lt;span class="o"&gt;(&lt;/span&gt;main&lt;span class="o"&gt;)&lt;/span&gt; tree
&lt;span class="nb"&gt;.&lt;/span&gt;
├── 00-provider.tf
├── 01-vpc.tf
├── 02-subnets.tf
├── 03-security-groups.tf
├── 04-route-tables.tf
├── 05-nat-gateway.tf
├── 06-ssh-key.tf
├── 07-ec2-control-plane.tf
├── 08-ec2-load-balancer.tf
├── 09-ec2-working-node.tf
├── 10-ec2-bastion.tf
└── 99-variables.tf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can watch a time-lapse video of the build and provisioning process:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dropbox.com/scl/fi/qj87ngxnysquoux8u5k79/article-kubernetes-the-hard-way-Andy-Golubev.mp4?rlkey=5kumm1comvrbax6ei6ei2b6jd&amp;amp;dl=0" rel="noopener noreferrer"&gt;Build and provision with GitHub Actions&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  This is what we observe in the AWS console
&lt;/h2&gt;

&lt;p&gt;Here are the results as I see them in the AWS console:&lt;/p&gt;

&lt;p&gt;VPC&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf05vy3hdzsqulv73j5w.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnf05vy3hdzsqulv73j5w.jpg" alt="VPC" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AMIs&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ou9m09058mw2ogcrvx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy5ou9m09058mw2ogcrvx.jpg" alt="AMI" width="800" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instances&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87mt43a2wkaj3n361dmc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F87mt43a2wkaj3n361dmc.jpg" alt="Instances" width="800" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes objects&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75eacauyclvo65vvs9w4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F75eacauyclvo65vvs9w4.jpg" alt="Kubernetes objects" width="800" height="261"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;To summarize, my journey of building Kubernetes on AWS using Terraform and Packer was very educational. Although it was not easy, it was a unique opportunity to learn Kubernetes architecture and how it works in depth.&lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/kubernetes-the-hard-way-aws" rel="noopener noreferrer"&gt;https://github.com/andygolubev/kubernetes-the-hard-way-aws&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>packer</category>
      <category>terraform</category>
      <category>aws</category>
    </item>
    <item>
      <title>AWS Serverless image recognition Telegram bot using Terraform</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Tue, 27 Jun 2023 11:45:19 +0000</pubDate>
      <link>https://dev.to/andygolubev/aws-serverless-image-recognition-telegram-bot-using-terraform-1oih</link>
      <guid>https://dev.to/andygolubev/aws-serverless-image-recognition-telegram-bot-using-terraform-1oih</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Technology keeps evolving, and so does the need for efficient and scalable solutions. Serverless architecture has become popular because it runs workloads without requiring you to manage infrastructure. In this article, we will explore how to build an AWS serverless image recognition Telegram bot using Terraform.&lt;/p&gt;

&lt;p&gt;With a pay-as-you-go pricing model, you pay only when functions are executed. In addition, the AWS Free Tier covers many services for small workloads, so a project like this can cost little or nothing to run.&lt;/p&gt;

&lt;p&gt;The bot uses Telegram webhooks: instead of polling for updates, it lets Telegram send each new update to the bot's HTTPS endpoint. The bot therefore stays idle until an event arrives and then reacts to it immediately.&lt;/p&gt;
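
&lt;p&gt;Registering a webhook is a single Bot API call, &lt;code&gt;setWebhook&lt;/code&gt;, with the endpoint passed in the &lt;code&gt;url&lt;/code&gt; parameter. Below is a minimal sketch that only builds the request URL; the function name and the example callback URL are hypothetical, and this is not necessarily how the project's webhook function does it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from urllib.parse import quote, urlencode

TELEGRAM_API = "https://api.telegram.org"


def build_set_webhook_url(bot_token, callback_url):
    """Build the Bot API setWebhook request URL for the given HTTPS endpoint."""
    # Bot API methods live at https://api.telegram.org/bot{TOKEN}/{METHOD}
    query = urlencode({"url": callback_url})
    return f"{TELEGRAM_API}/bot{quote(bot_token, safe=':')}/setWebhook?{query}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Requesting this URL once (for example with &lt;code&gt;urllib.request.urlopen&lt;/code&gt;) tells Telegram to POST every new update to the given endpoint, such as an API Gateway route.&lt;/p&gt;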

&lt;h2&gt;
  
  
  Solution diagram
&lt;/h2&gt;

&lt;p&gt;The diagram below illustrates the complete solution and the AWS services it uses.&lt;br&gt;
At the heart of the architecture is the Lambda function, which executes the bot's operations. Its dependencies are packaged into a Lambda layer, which keeps the function package small and makes deployments and development iterations faster.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw1gnmu670owyukqhgwf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftw1gnmu670owyukqhgwf.png" alt="Solution scheme" width="800" height="508"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To keep sensitive information such as the bot token secure, I use Secrets Manager. All Lambda functions read the token from it directly, so it never has to be stored in environment variables.&lt;br&gt;
User-sent images are stored in an S3 bucket. This lets AWS Rekognition retrieve an image simply from its bucket name and path within the bucket.&lt;/p&gt;
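
&lt;p&gt;Because Rekognition reads the image straight from S3, a DetectLabels request only needs the bucket and object key. The sketch below builds those request parameters; the key layout in &lt;code&gt;image_key&lt;/code&gt; and the &lt;code&gt;MaxLabels&lt;/code&gt;/&lt;code&gt;MinConfidence&lt;/code&gt; values are hypothetical choices, not taken from the repository.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def image_key(chat_id, file_unique_id):
    """Hypothetical object-key layout: one prefix per chat."""
    return f"images/{chat_id}/{file_unique_id}.jpg"


def build_detect_labels_request(bucket, key, max_labels=10, min_confidence=80.0):
    """Keyword arguments for Rekognition DetectLabels on an image stored in S3."""
    # Only the S3 location is sent; Rekognition fetches the object itself.
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": min_confidence,
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With boto3, the result would be passed as &lt;code&gt;rekognition.detect_labels(**params)&lt;/code&gt;.&lt;/p&gt;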

&lt;p&gt;API Gateway acts as a proxy in front of the Lambda function: Telegram posts each update to the API Gateway endpoint, which invokes the function. API Gateway also leaves room for later additions, such as traffic routing and separate stages for development environments.&lt;/p&gt;
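
&lt;p&gt;With a Lambda proxy integration, the Telegram update arrives as a JSON string in the event's &lt;code&gt;body&lt;/code&gt;, possibly base64-encoded. A minimal sketch of unpacking it follows; the handler name &lt;code&gt;lambda_handler&lt;/code&gt; is hypothetical, since the real one is set in the project's Terraform code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64
import json


def parse_telegram_update(event):
    """Extract the Telegram update from an API Gateway proxy integration event."""
    body = event.get("body") or "{}"
    # API Gateway sets isBase64Encoded when it has base64-encoded the body.
    if event.get("isBase64Encoded"):
        body = base64.b64decode(body).decode("utf-8")
    return json.loads(body)


def lambda_handler(event, context):
    """Hypothetical entry point for the bot function."""
    update = parse_telegram_update(event)
    # Process the update here (text, image or statistics request).
    # A 200 response tells Telegram the update was delivered; errors cause retries.
    payload = {"ok": True, "update_id": update.get("update_id")}
    return {"statusCode": 200, "body": json.dumps(payload)}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;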

&lt;p&gt;DynamoDB stores the simple usage statistics. For this small amount of data, it provides reliable storage and retrieval without adding unnecessary complexity.&lt;/p&gt;
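
&lt;p&gt;A common way to keep such counters is an atomic &lt;code&gt;UpdateItem&lt;/code&gt; with an &lt;code&gt;ADD&lt;/code&gt; expression, which avoids a read-modify-write cycle. The table schema is not shown in the article, so the key name &lt;code&gt;user_id&lt;/code&gt; and the counter name &lt;code&gt;images_processed&lt;/code&gt; below are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_stats_increment(table_name, user_id, counter="images_processed"):
    """Low-level DynamoDB UpdateItem parameters that atomically bump a per-user counter."""
    return {
        "TableName": table_name,
        # Hypothetical key schema: a numeric partition key named user_id.
        "Key": {"user_id": {"N": str(user_id)}},
        # ADD treats a missing number attribute as 0, so the first call stores 1.
        "UpdateExpression": "ADD #c :inc",
        "ExpressionAttributeNames": {"#c": counter},
        "ExpressionAttributeValues": {":inc": {"N": "1"}},
        "ReturnValues": "UPDATED_NEW",
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With boto3, these parameters would go to &lt;code&gt;dynamodb_client.update_item(**params)&lt;/code&gt;.&lt;/p&gt;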

&lt;p&gt;Lastly, AWS Rekognition detects labels in the pictures. Because of development time constraints, the bot uses only this most basic capability of the service, as a demonstration. Rekognition offers much more powerful image analysis that can be explored in future iterations.&lt;/p&gt;
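
&lt;p&gt;DetectLabels returns a &lt;code&gt;Labels&lt;/code&gt; list in which each entry has a &lt;code&gt;Name&lt;/code&gt; and a &lt;code&gt;Confidence&lt;/code&gt; percentage. The sketch below turns such a response into a short chat reply; the reply format is a hypothetical choice, not the bot's actual output.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def format_labels(response, limit=5):
    """Turn a Rekognition DetectLabels response into a short chat reply."""
    labels = sorted(
        response.get("Labels", []),
        key=lambda item: item["Confidence"],
        reverse=True,  # most confident labels first
    )
    lines = [f"{label['Name']}: {label['Confidence']:.1f}%" for label in labels[:limit]]
    return "\n".join(lines) if lines else "No labels detected."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;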
&lt;h2&gt;
  
  
  Bot functionality
&lt;/h2&gt;

&lt;p&gt;The bot's functionality revolves around three simple features:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Text processing&lt;/li&gt;
&lt;li&gt;Image recognition&lt;/li&gt;
&lt;li&gt;Statistics request&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To keep the initial implementation simple, I put all three features into a single Lambda function. As the logic grows more complex, I would like to adopt a more modular approach and split them into separate Lambda functions.&lt;/p&gt;
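
&lt;p&gt;Inside a single function, the three features can be told apart by inspecting the incoming update. A minimal routing sketch follows; the &lt;code&gt;/stats&lt;/code&gt; command name is hypothetical and may differ from the bot's actual command.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def classify_update(update):
    """Pick which of the three bot features should handle a Telegram update."""
    message = update.get("message") or {}
    if message.get("photo"):
        # Photos arrive as a list of PhotoSize objects (several resolutions).
        return "image"
    text = (message.get("text") or "").strip()
    # In groups, commands may carry the bot name, e.g. /stats@SomeBot.
    command = text.split(" ")[0].split("@")[0]
    if command == "/stats":  # hypothetical command name
        return "stats"
    if text:
        return "text"
    return "ignore"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Splitting the bot into separate functions later would mostly mean moving each branch into its own handler.&lt;/p&gt;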
&lt;h2&gt;
  
  
  Bot Setup
&lt;/h2&gt;

&lt;p&gt;Setting up a new bot is a straightforward process that requires just three simple steps, all of which can be accomplished with the help of the BotFather. Let's dive into the process:&lt;/p&gt;

&lt;p&gt;Step 1: Request a New Bot&lt;br&gt;
Step 2: Choose a Bot Name&lt;br&gt;
Step 3: Choose a Username for the Bot Account (Telegram requires it to end in "bot")&lt;/p&gt;

&lt;p&gt;Optional Step: Set Bot Avatar Image&lt;/p&gt;

&lt;p&gt;The screenshots below show these steps:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77e2c6kdcnfkrjeaotvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77e2c6kdcnfkrjeaotvh.png" alt="scr01" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd8m4jpxvhohvj47f22a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjd8m4jpxvhohvj47f22a.png" alt="scr02" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Terraform project
&lt;/h2&gt;

&lt;p&gt;For a smooth and repeatable setup, I use Terraform, a widely used Infrastructure as Code (IaC) tool, to provision the entire infrastructure stack the project requires.&lt;/p&gt;

&lt;p&gt;Here is the entire repository:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@ubuntu:~/aws-telegram-bot-serverless$ tree
.
├── LICENSE
├── lambda
│&amp;nbsp;&amp;nbsp; ├── bot-dependencies-layer
│&amp;nbsp;&amp;nbsp; │&amp;nbsp;&amp;nbsp; └── requirements.txt
│&amp;nbsp;&amp;nbsp; ├── bot-function
│&amp;nbsp;&amp;nbsp; │&amp;nbsp;&amp;nbsp; └── bot.py
│&amp;nbsp;&amp;nbsp; └── webhook-function
│&amp;nbsp;&amp;nbsp;     └── webhook.py
└── terraform
    ├── 00-provider.tf
    ├── 01-roles.tf
    ├── 02-secrets.tf
    ├── 03-lambda-layer-with-dependencies.tf
    ├── 04-lambda-bot.tf
    ├── 05-lambda-webhook.tf
    ├── 06-api-gateway.tf
    ├── 07-bucket-images.tf
    ├── 08-dynamodb-stats.tf
    ├── 98-output.tf
    ├── 99-data.tf
    ├── create_bucket.sh
    ├── terraform.tfvars
    └── variables.tf

5 directories, 18 files
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the complete list of resources that Terraform provisions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;user@ubuntu:~/aws-telegram-bot-serverless/terraform$ terraform state list
data.archive_file.lambda-bot-zip-file
data.archive_file.layer-zip-file
data.archive_file.webhook-function-zip-file
data.aws_caller_identity.current-account
data.aws_lambda_invocation.webhook-lambda-invocation
aws_apigatewayv2_api.call-back-api
aws_apigatewayv2_integration.api-gw-to-lambda
aws_apigatewayv2_route.post-callback-route
aws_apigatewayv2_stage.prod
aws_cloudwatch_log_group.call-back-api-gw
aws_cloudwatch_log_group.lambda-log-bot
aws_cloudwatch_log_group.lambda-log-webhook
aws_dynamodb_table.aws-telegram-bot-statistics
aws_iam_policy.custom-policy
aws_iam_role.lambdaRole
aws_iam_role_policy_attachment.custom-policy-attachment
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AWSLambdaExecute"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AmazonRekognitionReadOnlyAccess"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/service-role/AmazonS3ObjectLambdaExecutionRolePolicy"]
aws_lambda_function.bot-lambda
aws_lambda_function.webhook-lambda
aws_lambda_layer_version.lambda-layer-for-packages
aws_lambda_permission.api_gw
aws_s3_bucket.images-bucket
aws_s3_bucket_lifecycle_configuration.images-bucket-name-lifecycle_configuration
aws_secretsmanager_secret.bot-token-secret
aws_secretsmanager_secret_version.sversion
null_resource.pip-install
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;While provisioning the infrastructure with Terraform, I ran into a dependency on the local environment: the Terraform code relies on a local provisioner that runs a Python pip install command and stores the results in the /tmp folder, so it only works where Python and pip are available. To keep the setup consistent across environments, I run the deployment on GitHub, as described below.&lt;/p&gt;
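
&lt;p&gt;Judging by the resource names (&lt;code&gt;null_resource.pip-install&lt;/code&gt; and &lt;code&gt;data.archive_file.layer-zip-file&lt;/code&gt;), the project installs the packages and then zips them. For Python runtimes, a layer's packages must sit under a &lt;code&gt;python/&lt;/code&gt; directory in the archive, because layers are extracted to /opt and /opt/python is on the import path. The sketch below reproduces only that packaging step in plain Python; the function name and the input directory (for example the target of a hypothetical &lt;code&gt;pip install -r requirements.txt -t /tmp/layer&lt;/code&gt;) are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import zipfile


def build_layer_zip(site_packages_dir, zip_path):
    """Zip installed packages under the python/ prefix that Lambda layers expect."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(site_packages_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, site_packages_dir)
                # Store e.g. requests/__init__.py as python/requests/__init__.py
                archive.write(full_path, os.path.join("python", rel_path))
    return zip_path
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;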

&lt;h2&gt;
  
  
  GitHub Actions
&lt;/h2&gt;

&lt;p&gt;For convenience and flexibility, the bot can be deployed in two ways: with GitHub Actions or from a local setup. You can choose whichever approach best suits your preferences and requirements.&lt;/p&gt;

&lt;p&gt;To support both methods, I adjusted how the Terraform backend is configured. Since I do not use Terragrunt in this project, my pipeline runs a &lt;strong&gt;sed&lt;/strong&gt; command that rewrites 00-provider.tf, replacing every occurrence of us-east-1 with the target AWS region so that the backend points to the correct region. It may not be the most elegant solution, but it reliably produces the correct backend configuration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;terraform&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Deploy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;bot&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;on&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;AWS'&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;

&lt;span class="nn"&gt;...&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Replace the Region in the Provider section of Terraform&lt;/span&gt;  
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;sed -i 's/us-east-1/${{ env.AWS_REGION }}/g' $TERRAFORM_PATH/00-provider.tf&lt;/span&gt;

    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Terraform Init&lt;/span&gt;
      &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;terraform -chdir=$TERRAFORM_PATH init&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
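
&lt;p&gt;As an alternative to sed, Terraform supports partial backend configuration: settings left out of the backend block can be passed to &lt;code&gt;terraform init&lt;/code&gt; with &lt;code&gt;-backend-config=KEY=VALUE&lt;/code&gt;. The sketch below builds those arguments using the PREFIX-REGION naming from the bootstrap workflow. It assumes the provider file declares an empty &lt;code&gt;backend "s3" {}&lt;/code&gt; block; the function name and the state key &lt;code&gt;terraform.tfstate&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def backend_config_args(bucket_prefix, table_prefix, region, state_key="terraform.tfstate"):
    """Build `terraform init` arguments for an S3 backend in the given region."""
    # Same naming convention as the bootstrap workflow: PREFIX-REGION.
    settings = {
        "bucket": f"{bucket_prefix}-{region}",
        "key": state_key,
        "region": region,
        "dynamodb_table": f"{table_prefix}-{region}",
    }
    return ["init"] + [f"-backend-config={k}={v}" for k, v in settings.items()]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The list could then be run with &lt;code&gt;subprocess.run(["terraform", "-chdir=terraform", *args], check=True)&lt;/code&gt;, since &lt;code&gt;-chdir&lt;/code&gt; must come before the subcommand.&lt;/p&gt;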



&lt;p&gt;I store a few secrets and one variable in the GitHub repository settings. If you want to use my code, you need to create them as well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx8yt9xxro5pm9pkg4p4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffx8yt9xxro5pm9pkg4p4.png" alt="scr03" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6yk00cjd3rppwpcwtc3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6yk00cjd3rppwpcwtc3.png" alt="scr04" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I also provide three workflows that simplify deploying and managing the bot:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1 - Create Terraform State Bucket and DynamoDB Table:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This workflow creates the Terraform state bucket and a DynamoDB table. Note that Terraform does not manage these objects, so you have to remove them manually when they are no longer needed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2 - Provisioning of Essential Bot Services:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The second workflow automates the provisioning of all the services required by the bot.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3 - Destruction of Managed Infrastructure:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The final workflow destroys the infrastructure managed by Terraform, cleaning up the resources when they are no longer needed.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frih6r2dtd36xl21u57gv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frih6r2dtd36xl21u57gv.png" alt="scr05" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Be careful when running workflow 1: S3 bucket names share a global namespace, so the name you choose may already be taken by someone else. If that happens, change the bucket name prefix to make the name unique.&lt;/p&gt;
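
&lt;p&gt;A quick pre-check can catch invalid names before the workflow runs, and a random suffix lowers the chance of a clash. The sketch below implements a simplified subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit; no consecutive dots); it does not cover every rule, and the helper names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
import secrets

# Simplified S3 general-purpose bucket naming rules (not exhaustive).
BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name):
    """Check a bucket name against the simplified rules above."""
    return BUCKET_NAME_RE.fullmatch(name) is not None and ".." not in name


def unique_prefix(base):
    """Append a short random hex suffix to lower the chance of a global-name clash."""
    return f"{base}-{secrets.token_hex(3)}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;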

&lt;h2&gt;
  
  
  Demo Time
&lt;/h2&gt;

&lt;p&gt;To show how the bot works, here are a few screenshots of it in action.&lt;/p&gt;

&lt;p&gt;Image Recognition:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uio1fop0vz5sq1zusyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7uio1fop0vz5sq1zusyt.png" alt="scr06" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexx1rjfycn4yee5vp0hu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fexx1rjfycn4yee5vp0hu.png" alt="scr07" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuor8mhoqsavvxuxb3znf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuor8mhoqsavvxuxb3znf.png" alt="scr08" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsbkrpu3tr2mca1dxogz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjsbkrpu3tr2mca1dxogz.png" alt="scr09" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Statistics Request:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ww8x9hs126qmtu7wha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52ww8x9hs126qmtu7wha.png" alt="scr10" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you have a Telegram account, you are welcome to try out the bot and explore its capabilities yourself. I will keep it running &lt;strong&gt;until the end of July 2023&lt;/strong&gt;; after that, the service is not guaranteed, because I do not want to exceed the AWS Free Tier.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The AWS serverless stack offers you a powerful and scalable solution out of the box. This project demonstrates the potential of serverless architectures and highlights the ease of development using Terraform. &lt;/p&gt;

&lt;p&gt;I hope you enjoyed this article.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository:&amp;nbsp;&lt;a href="https://github.com/andygolubev/aws-telegram-bot-serverless" rel="noopener noreferrer"&gt;https://github.com/andygolubev/aws-telegram-bot-serverless&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can try this bot yourself: &lt;a href="http://t.me/AWS_Image_Rekognition_Bot" rel="noopener noreferrer"&gt;http://t.me/AWS_Image_Rekognition_Bot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn:&amp;nbsp;&lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>terraform</category>
      <category>python</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Automate Publishing Markdown Files from GitHub to Confluence with github-to-confluence-publisher tool</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Tue, 23 May 2023 09:39:12 +0000</pubDate>
      <link>https://dev.to/andygolubev/automate-publishing-markdown-files-from-github-to-confluence-with-github-to-confluence-publisher-tool-eh4</link>
      <guid>https://dev.to/andygolubev/automate-publishing-markdown-files-from-github-to-confluence-with-github-to-confluence-publisher-tool-eh4</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Managing documentation across different platforms can be time-consuming, especially when files have to be converted and uploaded manually. The github-to-confluence-publisher tool automates publishing Markdown files from GitHub to Confluence: it converts the Markdown files into Confluence markup and uploads them to your Confluence space. In this article, we will explore the setup, configuration, and functionality of the tool.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/github-to-confluence-publisher" rel="noopener noreferrer"&gt;https://github.com/andygolubev/github-to-confluence-publisher&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up github-to-confluence-publisher
&lt;/h2&gt;

&lt;p&gt;To get started, follow these steps to set up the github-to-confluence-publisher tool.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a new domain in Confluence Cloud:
&lt;/h3&gt;

&lt;p&gt;Before using the tool, you need to create a new domain in Confluence Cloud. You can do this by visiting the Atlassian website and accessing the domain creation page.&lt;/p&gt;

&lt;h3&gt;
  
  
  Create a new space, parent page, and API token:
&lt;/h3&gt;

&lt;p&gt;Within your Confluence Cloud domain, create a new space where you want to publish your documentation. Take note of the space's name and the parent page's ID, as you will need them for configuration later. Additionally, generate an API token for authentication purposes.&lt;/p&gt;
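
&lt;p&gt;Confluence Cloud REST APIs accept basic authentication with your Atlassian account email as the user name and the API token as the password. The sketch below shows how such a header is typically built; it is not necessarily how the publisher itself authenticates, and the function name is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64


def confluence_auth_header(email, api_token):
    """HTTP Basic auth header for the Confluence Cloud REST API (email + API token)."""
    credentials = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;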

&lt;h3&gt;
  
  
  Configure the publisher:
&lt;/h3&gt;

&lt;p&gt;Open the config.yaml file in the publisher/config directory and update the following values with your specific information:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;confluence_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The URL of your Confluence REST API.&lt;/span&gt;
&lt;span class="na"&gt;confluence_space&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The name of the space you created earlier.&lt;/span&gt;
&lt;span class="na"&gt;confluence_parent_page_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The ID of the parent page within your space.&lt;/span&gt;
&lt;span class="na"&gt;confluence_search_pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A search pattern used to identify autogenerated pages. It is recommended to use a random value to ensure proper deletion of autogenerated pages.&lt;/span&gt;
&lt;span class="na"&gt;github_folder_with_md_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The path to the folder containing your Markdown files on GitHub.&lt;/span&gt;
&lt;span class="na"&gt;github_folder_with_image_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;The path to the folder containing your image files on GitHub.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;My example config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;confluence_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://test-publisher.atlassian.net/wiki/rest/api/&lt;/span&gt;

&lt;span class="na"&gt;confluence_space&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Documentat&lt;/span&gt; 
&lt;span class="na"&gt;confluence_parent_page_id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;262359&lt;/span&gt;
&lt;span class="na"&gt;confluence_search_pattern&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;(this page is autogenerated)&lt;/span&gt;
&lt;span class="na"&gt;github_folder_with_md_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./data&lt;/span&gt;
&lt;span class="na"&gt;github_folder_with_image_files&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;./data_images&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
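
&lt;p&gt;A misspelled key is easy to miss in a YAML file, so it can help to validate the config before a run. The snippet below is a minimal sketch and is not part of the publisher. It assumes that the required keys are exactly the six listed above and that the YAML has already been parsed into a dictionary, for example with PyYAML's yaml.safe_load.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: validate a parsed config.yaml before running the publisher.
# Assumption: the required keys are the six documented in this article;
# check the tool's README if its key names differ.
REQUIRED_KEYS = {
    "confluence_url",
    "confluence_space",
    "confluence_parent_page_id",
    "confluence_search_pattern",
    "github_folder_with_md_files",
    "github_folder_with_image_files",
}


def check_config(config):
    """Return (missing_keys, unexpected_keys) as sorted lists."""
    keys = set(config)
    return sorted(REQUIRED_KEYS - keys), sorted(keys - REQUIRED_KEYS)


if __name__ == "__main__":
    # In practice the dict would come from yaml.safe_load() (PyYAML).
    example = {
        "confluence_url": "https://test-publisher.atlassian.net/wiki/rest/api/",
        "confluence_space": "Documentat",
        "confluence_parrent_page_id": 262359,  # deliberate typo
        "confluence_search_pattern": "(this page is autogenerated)",
        "github_folder_with_md_files": "./data",
        "github_folder_with_image_files": "./data_images",
    }
    missing, unexpected = check_config(example)
    print("missing:", missing)
    print("unexpected:", unexpected)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running it reports confluence_parent_page_id as missing and confluence_parrent_page_id as unexpected. Checking in both directions catches typos that a config loader might otherwise ignore without warning.&lt;/p&gt;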



&lt;p&gt;Once you have completed the setup process, you can start using the github-to-confluence-publisher tool. &lt;/p&gt;

&lt;h2&gt;
  
  
  How it works:
&lt;/h2&gt;

&lt;p&gt;Initially, your Confluence space is empty, as shown in the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y6kz2v1u2dwbhp1wqj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7y6kz2v1u2dwbhp1wqj0.png" alt="image1" width="800" height="335"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Running the publisher: Run the publisher script either locally or with GitHub Actions. The script first searches your Confluence space for pages that match the confluence_search_pattern and deletes them. It then publishes the current content of the repository again.&lt;/p&gt;
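
&lt;p&gt;The selection step of that cleanup can be sketched in a few lines. This is an illustration under assumptions, not the tool's code. Each page is modeled as a dictionary shaped like a Confluence REST API content item fetched with expand=body.storage. A page counts as autogenerated when its body contains the search pattern. The function name pages_to_delete is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: pick the autogenerated pages that a run would delete.
# Assumption: pages are dicts shaped like Confluence REST API content
# items fetched with expand=body.storage. Not the publisher's own code.
def pages_to_delete(pages, search_pattern, parent_page_id):
    """Return IDs of pages whose storage body contains search_pattern.

    The parent page is never selected, so the generated pages can always
    be recreated under it.
    """
    selected = []
    for page in pages:
        if str(page["id"]) == str(parent_page_id):
            continue
        body = page.get("body", {}).get("storage", {}).get("value", "")
        if search_pattern in body:
            selected.append(str(page["id"]))
    return selected


if __name__ == "__main__":
    pattern = "(this page is autogenerated)"
    pages = [
        {"id": "262359", "body": {"storage": {"value": "Parent page"}}},
        {"id": "300001", "body": {"storage": {"value": "Intro " + pattern}}},
        {"id": "300002", "body": {"storage": {"value": "Hand-written notes"}}},
    ]
    print(pages_to_delete(pages, pattern, 262359))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The example prints ['300001']. It also shows why a random pattern is recommended. Any hand-written page whose text happens to contain the pattern would be selected for deletion too.&lt;/p&gt;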

&lt;p&gt;Local run: If you run the script locally, you will see output similar to the "Local run" screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7419knlp80wiw94jkqc1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7419knlp80wiw94jkqc1.png" alt="local run" width="800" height="428"&gt;&lt;/a&gt;&lt;br&gt;
GitHub Actions: If you prefer GitHub Actions, the workflow runs the script automatically, as shown in the "GitHub Actions" screenshot.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnnwrzvc2afb5n6528b4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxnnwrzvc2afb5n6528b4.png" alt="github actions" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Populating the space: After the publisher runs, the parent page contains child pages with your content, organized according to the folder structure of your GitHub repository. Pages that represent folders display the "Children Display" macro for easy navigation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrfetzoosn2u5ma2vljl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkrfetzoosn2u5ma2vljl.png" alt="structure" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbtpaev0ivowzpi0uiph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsbtpaev0ivowzpi0uiph.png" alt="widget" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;
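
&lt;p&gt;The mapping from folders to pages can be sketched as follows. This is a hypothetical helper, not the publisher's implementation. It assumes paths are relative to the Markdown folder (./data in the example config). It turns every folder into an intermediate page and places each Markdown file under the page for its folder.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: derive a page hierarchy from the Markdown folder layout.
# Every folder becomes a "folder" page and every .md file becomes a child
# page of its folder. The function name and output format are hypothetical.
from pathlib import PurePosixPath


def build_page_tree(md_paths):
    """Return {page_path: parent_path}; "" stands for the configured parent page."""
    parents = {}
    for raw in md_paths:
        path = PurePosixPath(raw)
        parent = ""
        # Walk the folders from the top down, creating each folder page once.
        for depth in range(1, len(path.parts)):
            folder = "/".join(path.parts[:depth])
            parents.setdefault(folder, parent)
            parent = folder
        # The Markdown file itself is a leaf page (extension dropped).
        parents[str(path.with_suffix(""))] = parent
    return parents


if __name__ == "__main__":
    files = ["intro.md", "guides/setup.md", "guides/advanced/tuning.md"]
    for page, parent in build_page_tree(files).items():
        print(page, "under", parent or "(parent page)")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real implementation also has to choose page titles carefully, because Confluence requires page titles to be unique within a space. Without that care, two folders named "images" in different places would collide.&lt;/p&gt;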

&lt;p&gt;Attachment management: The tool automatically attaches images from your GitHub repository to the respective Confluence pages, ensuring all the necessary visuals are included in the documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4aw90981oc62y2y2kz4v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4aw90981oc62y2y2kz4v.png" alt="attachment" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;
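
&lt;p&gt;Finding which images belong to which page can be sketched with a regular expression over the Markdown source. The sketch assumes images are referenced with the standard ![alt](path) syntax. The helper name referenced_images is hypothetical. Remote URLs are skipped because they are not attachments.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: list the local images a Markdown page references, so they
# can be uploaded as attachments of the matching Confluence page.
# Assumption: images use the standard ![alt](path) syntax.
import re
from pathlib import PurePosixPath

# Captures the link target up to the closing parenthesis or a space
# (a space would start an optional title).
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def referenced_images(markdown_text):
    """Return unique image file names in order of first appearance."""
    names = []
    for target in IMAGE_REF.findall(markdown_text):
        if "://" in target:
            continue  # remote images are not attachments
        name = PurePosixPath(target).name
        if name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    page = "See ![arch](../data_images/arch.png) and ![](data_images/flow.png)."
    print(referenced_images(page))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The example prints ['arch.png', 'flow.png']. Each name can then be looked up in the image folder (./data_images in the example config) and uploaded to the page. One way to upload is the Confluence REST API endpoint POST /wiki/rest/api/content/{id}/child/attachment.&lt;/p&gt;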

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The GitHub to Confluence Publisher tool simplifies the process of publishing Markdown files from GitHub to Confluence. By automating the conversion and uploading tasks, it saves time and effort, allowing you to focus on creating high-quality documentation. &lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/github-to-confluence-publisher" rel="noopener noreferrer"&gt;https://github.com/andygolubev/github-to-confluence-publisher&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>github</category>
      <category>confluence</category>
      <category>md</category>
      <category>python</category>
    </item>
    <item>
      <title>Terraform and DigitalOcean: Automating Infrastructure and Catching the Hidden Load Balancer</title>
      <dc:creator>andygolubev</dc:creator>
      <pubDate>Wed, 17 May 2023 19:15:40 +0000</pubDate>
      <link>https://dev.to/andygolubev/terraform-and-digitalocean-automating-infrastructure-and-catching-the-hidden-load-balancer-1oji</link>
      <guid>https://dev.to/andygolubev/terraform-and-digitalocean-automating-infrastructure-and-catching-the-hidden-load-balancer-1oji</guid>
      <description>&lt;h2&gt;
  
  
  Introduction:
&lt;/h2&gt;

&lt;p&gt;In this article, I will demonstrate how to provision several infrastructure components: Projects, Virtual Private Clouds (VPCs), Kubernetes clusters, a PostgreSQL cluster, Load Balancers, and DNS records. I will also outline the steps to configure Kubernetes with ingress and cert-manager, all within a single pipeline.&lt;br&gt;
I chose DigitalOcean as the cloud provider because it is more cost-effective than leading providers such as AWS, GCP, and Azure. I use Terraform as the infrastructure as code tool. I did not use Terragrunt in this setup, because I ran into difficulties configuring it with a DigitalOcean Spaces bucket.&lt;/p&gt;

&lt;h2&gt;
  
  
  Project structure:
&lt;/h2&gt;

&lt;p&gt;The project consists of a live infrastructure definition and a set of reusable modules. The live definition is split into two stages, shown in the diagram below.&lt;br&gt;
You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/terraform-digital-ocean" rel="noopener noreferrer"&gt;https://github.com/andygolubev/terraform-digital-ocean&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Initially, I tried to handle both stages in a single Terraform configuration. However, I ran into a Terraform limitation. I could not create the cluster with the DigitalOcean provider and then configure it with the HashiCorp Kubernetes provider in the same configuration. The Kubernetes provider needs the cluster's endpoint and credentials, and these do not exist until the cluster has been created.&lt;/p&gt;

&lt;p&gt;To address this challenge and ensure a smooth deployment process, I decided to split the infrastructure definition into two stages:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stage 1:&lt;/strong&gt; Infrastructure provisioning (VPC, Kubernetes, PostgreSQL, and installation of the NGINX Ingress Controller and cert-manager)&lt;br&gt;
&lt;strong&gt;Stage 2:&lt;/strong&gt; Kubernetes configuration and DNS records setup&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyo2vkanvbs29l7gdyxr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhyo2vkanvbs29l7gdyxr.png" alt="Terraform stages" width="761" height="711"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Splitting the definition into these two stages works around the limitation described above. By the time Stage 2 runs, the cluster already exists, so the Kubernetes provider can be configured with its credentials.&lt;/p&gt;

&lt;p&gt;Each stage is also a separate root module. That means it can be version-controlled, planned, applied, and troubleshot independently, and the same modules can be reused for other environments.&lt;/p&gt;

&lt;p&gt;This is my folder structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;➜  terraform-digital-ocean git:(main) tree .
.
├── Infrastructure
│   └── digitalocean
│       ├── infrastructure-live
│       │   └── test-v1
│       │       ├── stage1
│       │       │   ├── main-stage1.tf
│       │       │   └── outputs.tf
│       │       └── stage2
│       │           ├── main-stage2.tf
│       │           └── outputs.tf
│       └── infrastructure-modules
│           ├── kubernetes-config
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-save-kubeconfig.tf
│           │       ├── 2-cluster-issuer.tf
│           │       ├── 3-ingress-demo.tf
│           │       ├── 4-services-good-afternoon.tf
│           │       ├── 4-services-good-evening.tf
│           │       ├── 4-services-good-morning.tf
│           │       ├── 5-service-pagenotfound.tf
│           │       ├── 6-load-balancer.tf
│           │       ├── 7-records.tf
│           │       ├── 8-variables.tf
│           │       └── 9-outputs.tf
│           ├── kubernetes-provision
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-kubernetes.tf
│           │       ├── 2-save-kubeconfig.tf
│           │       ├── 3-ingress-and-cert-manager.tf
│           │       ├── 4-registry-access.tf
│           │       ├── 5-variables.tf
│           │       └── 6-outputs.tf
│           ├── postgresql
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-postgres.tf
│           │       ├── 2-variables.tf
│           │       └── 3-outputs.tf
│           └── vpc
│               └── v1.0
│                   ├── 0-versions.tf
│                   ├── 1-vpc.tf
│                   ├── 2-variables.tf
│                   └── 3-outputs.tf
├── LICENSE
└── README.md
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Terraform backend setup:
&lt;/h2&gt;

&lt;p&gt;Before running the pipeline, I created a private DigitalOcean Spaces bucket to store the Terraform state files.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbxghiczlkaovgpjpba0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjbxghiczlkaovgpjpba0.png" alt="bucket" width="800" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Stage 1. Provision the infrastructure
&lt;/h2&gt;

&lt;p&gt;In this stage, the "main-stage1.tf" file declares the input values for each module and the dependencies between them. The Kubernetes and PostgreSQL modules depend on the VPC module and the DigitalOcean project, so they are created only after those exist.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="err"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"digitalocean_project"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;        &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"infra-demo-v1"&lt;/span&gt; 
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"vpc"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../../infrastructure-modules/vpc/v1.0"&lt;/span&gt;
  &lt;span class="nx"&gt;vpc_name&lt;/span&gt;             &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"vpc-test"&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes-provision"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../../infrastructure-modules/kubernetes-provision/v1.0"&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="nx"&gt;k8s_cluster_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"demo-cluster-test-v1"&lt;/span&gt; &lt;span class="c1"&gt;#Edit&lt;/span&gt;
    &lt;span class="nx"&gt;vpc_id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc_id&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="nx"&gt;k8s_embedded_pool_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s-4vcpu-8gb"&lt;/span&gt;
    &lt;span class="nx"&gt;k8s_embedded_pool_nodes_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; 

    &lt;span class="c1"&gt;# Type "true" if you want this pool of nodes&lt;/span&gt;
    &lt;span class="nx"&gt;pool_1_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;k8s_pool_1_size&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"s-4vcpu-8gb"&lt;/span&gt;
    &lt;span class="nx"&gt;k8s_pool_1_nodes_count&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;digitalocean_project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"postgresql"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../../infrastructure-modules/postgresql/v1.0"&lt;/span&gt;

&lt;span class="p"&gt;...&lt;/span&gt;
    &lt;span class="nx"&gt;postgre_enabled&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="nx"&gt;posgre_cluster_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"postgresql-demo-test-v1"&lt;/span&gt;
&lt;span class="p"&gt;...&lt;/span&gt;

    &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="k"&gt;module&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;vpc&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;digitalocean_project&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;,]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete content of the file: &lt;a href="https://github.com/andygolubev/terraform-digital-ocean/blob/main/Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/main-stage1.tf" rel="noopener noreferrer"&gt;main-stage1.tf&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The output of Stage 1 is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Outputs:

k8s_cluster_id = "ba4854df-c6de-4deb-8385-77014b491454"
k8s_cluster_name = "demo-cluster-test-v1"
k8s_cluster_urn = "do:kubernetes:ba4854df-c6de-4deb-8385-77014b491454"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my pipeline, I pass the "k8s_cluster_name" output to Stage 2 as an input variable. You can find the details in the pipeline listing at the end of this article.&lt;/p&gt;
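
&lt;p&gt;Here is a minimal sketch of the same hand-off, driving the Terraform CLI from Python. It is not the author's pipeline, and it relies on several assumptions. It needs Terraform 0.14 or later (for -chdir and output -raw) and backends that are already configured. It must run from the repository root. Other Stage 2 inputs, such as digital_ocean_api_token, are expected to come from TF_VAR_ environment variables.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only, not the author's pipeline: chain the two stages with the
# Terraform CLI. Assumes Terraform 0.14+ (for -chdir and "output -raw"),
# configured backends, and the repository root as the working directory.
# Other Stage 2 inputs (e.g. the API token) are expected in TF_VAR_* env vars.
import subprocess

LIVE = "Infrastructure/digitalocean/infrastructure-live/test-v1"


def stage_commands(stage, extra_vars=None):
    """Build the init and apply commands for one stage directory."""
    chdir = f"-chdir={LIVE}/{stage}"
    apply = ["terraform", chdir, "apply", "-auto-approve", "-input=false"]
    for name, value in (extra_vars or {}).items():
        apply += ["-var", f"{name}={value}"]
    return [["terraform", chdir, "init", "-input=false"], apply]


def run_pipeline():
    """Apply Stage 1, read k8s_cluster_name, then apply Stage 2 with it."""
    for cmd in stage_commands("stage1"):
        subprocess.run(cmd, check=True)
    name = subprocess.run(
        ["terraform", f"-chdir={LIVE}/stage1", "output", "-raw", "k8s_cluster_name"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    for cmd in stage_commands("stage2", {"k8s_cluster_name": name}):
        subprocess.run(cmd, check=True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling run_pipeline() applies Stage 1, reads the cluster name (demo-cluster-test-v1 in the output above), and passes it to Stage 2 as -var k8s_cluster_name=NAME. Because stage_commands() only builds the commands, you can inspect them before anything is applied.&lt;/p&gt;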

&lt;h2&gt;
  
  
  Stage 2. Configure the Kubernetes cluster and DNS
&lt;/h2&gt;

&lt;p&gt;In Stage 2, I combine Kubernetes manifests, local command execution, and DigitalOcean resources to configure the cluster and DNS. All of this lives in my "kubernetes-config" Terraform module: &lt;a href="https://github.com/andygolubev/terraform-digital-ocean/tree/main/Infrastructure/digitalocean/infrastructure-modules/kubernetes-config/v1.0" rel="noopener noreferrer"&gt;kubernetes-config&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The "main-stage2.tf" file includes all the essential configurations for Stage 2.&lt;br&gt;
The complete content of the file: &lt;a href="https://github.com/andygolubev/terraform-digital-ocean/blob/main/Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/main-stage2.tf" rel="noopener noreferrer"&gt;main-stage2.tf&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="err"&gt;...&lt;/span&gt;
&lt;span class="k"&gt;module&lt;/span&gt; &lt;span class="s2"&gt;"kubernetes-config"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;source&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"../../../infrastructure-modules/kubernetes-config/v1.0"&lt;/span&gt;

  &lt;span class="nx"&gt;digital_ocean_api_token_for_k8s_config&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;digital_ocean_api_token&lt;/span&gt;
  &lt;span class="nx"&gt;k8s_config_cluster_name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_cluster_name&lt;/span&gt;


  &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"kuber.work"&lt;/span&gt;
  &lt;span class="nx"&gt;service1-subdomain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"service-1-test-morning"&lt;/span&gt;
  &lt;span class="nx"&gt;service2-subdomain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"service-2-test-afternoon"&lt;/span&gt;
  &lt;span class="nx"&gt;service3-subdomain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"service-3-test-evening"&lt;/span&gt;
  &lt;span class="nx"&gt;lb-workaround-subdomain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"lb-workaround-test"&lt;/span&gt;
  &lt;span class="nx"&gt;service1-service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"goodmorning"&lt;/span&gt; 
  &lt;span class="nx"&gt;service2-service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"goodafternoon"&lt;/span&gt; 
  &lt;span class="nx"&gt;service3-service&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"goodevening"&lt;/span&gt; 
  &lt;span class="nx"&gt;cluster-issuer&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"letsencrypt-prod"&lt;/span&gt; &lt;span class="c1"&gt;# letsencrypt-prod or letsencrypt-staging&lt;/span&gt;
  &lt;span class="nx"&gt;ssl-redirect&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"false"&lt;/span&gt; &lt;span class="c1"&gt;# To accommodate the requirement for the service to respond on HTTP, a temporary value is assigned for certificate issuing.&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To demonstrate the setup, I deployed three simple services, each with its own endpoint:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://service-1-test-morning.kuber.work" rel="noopener noreferrer"&gt;https://service-1-test-morning.kuber.work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://service-2-test-afternoon.kuber.work" rel="noopener noreferrer"&gt;https://service-2-test-afternoon.kuber.work&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://service-3-test-evening.kuber.work" rel="noopener noreferrer"&gt;https://service-3-test-evening.kuber.work&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stage 2 performs these steps in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up the cluster issuers for cert-manager, which are required for TLS certificate management.&lt;/li&gt;
&lt;li&gt;Provision the ingress and apply the workarounds described in the DigitalOcean tutorial &lt;a href="https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes" rel="noopener noreferrer"&gt;How to Set Up an Nginx Ingress with Cert-Manager on DigitalOcean Kubernetes&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Deploy the demo "hello" services.&lt;/li&gt;
&lt;li&gt;Look up the load balancer's IP address and create the DNS records.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"local_file"&lt;/span&gt; &lt;span class="s2"&gt;"get_load_balancer_script"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;content&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;-&lt;/span&gt;&lt;span class="no"&gt;EOF&lt;/span&gt;&lt;span class="sh"&gt;
  #!/bin/bash
  doctl kubernetes cluster list-associated-resources "$1" -o json | jq '{ load_balancer_id: .load_balancers[0].id, load_balancer_name: .load_balancers[0].name }'
&lt;/span&gt;&lt;span class="no"&gt;  EOF

&lt;/span&gt;  &lt;span class="nx"&gt;filename&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"/tmp/get_load_balancer_id.sh"&lt;/span&gt;

  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="nx"&gt;kubernetes_ingress_v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;demo-ingress&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;kubernetes_service_v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ingress-nginx-controller&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"external"&lt;/span&gt; &lt;span class="s2"&gt;"load_balancer_details"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;program&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;local_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get_load_balancer_script&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;filename&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;k8s_config_cluster_name&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="nx"&gt;local_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get_load_balancer_script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



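&lt;p&gt;This code writes a small shell script and runs it through Terraform's "external" data source. The script uses "doctl" to list the resources associated with the cluster and returns the ID and name of the first load balancer. The lookup is needed because the load balancer is not a Terraform resource. DigitalOcean creates it when the ingress-nginx controller Service of type LoadBalancer is deployed, so Terraform has no direct reference to it.&lt;/p&gt;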
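
&lt;p&gt;For comparison, here is a sketch of the same lookup written as a Python program for the external data source. The data source expects a JSON object with string values on stdout. Like the jq filter, the sketch assumes that doctl's JSON contains a "load_balancers" list whose items have "id" and "name" fields. The program and its file name are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: a Python equivalent of the jq-based script, usable as the
# program of Terraform's "external" data source, which expects a JSON
# object with string values on stdout.
# Assumption: doctl's JSON has a "load_balancers" list of objects with
# "id" and "name", as the jq filter implies.
import json
import subprocess
import sys


def first_load_balancer(associated):
    """Return the first load balancer's id and name as strings."""
    balancers = associated.get("load_balancers") or []
    if not balancers:
        raise LookupError("no load balancer is associated with the cluster yet")
    lb = balancers[0]
    return {"load_balancer_id": str(lb["id"]), "load_balancer_name": str(lb["name"])}


def main(cluster_name):
    raw = subprocess.run(
        ["doctl", "kubernetes", "cluster", "list-associated-resources",
         cluster_name, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    try:
        print(json.dumps(first_load_balancer(json.loads(raw))))
    except LookupError as err:
        # Exit non-zero with a message on stderr so terraform apply fails clearly.
        print(err, file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;You could wire it in with program = ["python3", "get_load_balancer.py", var.k8s_config_cluster_name], where the file name is illustrative. An explicit error when the cluster has no load balancer yet is easier to diagnose than the null values the jq filter would emit. The external data source may also reject those nulls, because it expects strings.&lt;/p&gt;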

&lt;p&gt;Finally, I look up the load balancer by that ID to get its IP address, and I use that IP for the DNS A records:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight terraform"&gt;&lt;code&gt;&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"digitalocean_loadbalancer"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;external&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load_balancer_details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;load_balancer_id&lt;/span&gt;
  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="nx"&gt;local_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get_load_balancer_script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;data&lt;/span&gt; &lt;span class="s2"&gt;"digitalocean_domain"&lt;/span&gt; &lt;span class="s2"&gt;"this"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;domain&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;resource&lt;/span&gt; &lt;span class="s2"&gt;"digitalocean_record"&lt;/span&gt; &lt;span class="s2"&gt;"service1"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;domain&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;digitalocean_domain&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;
  &lt;span class="nx"&gt;type&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"A"&lt;/span&gt;
  &lt;span class="nx"&gt;name&lt;/span&gt;   &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kd"&gt;var&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;service1-subdomain&lt;/span&gt;
  &lt;span class="nx"&gt;value&lt;/span&gt;  &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="k"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;digitalocean_loadbalancer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ip&lt;/span&gt;

  &lt;span class="nx"&gt;depends_on&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt; &lt;span class="nx"&gt;local_file&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;get_load_balancer_script&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="err"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
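
&lt;p&gt;Once the records exist, a quick check that each service name resolves to the load balancer's IP can help avoid a failed certificate request. This is a minimal sketch. The subdomains are the three service subdomains from main-stage2.tf. The resolver is injected, so the check can use real DNS (socket.gethostbyname) or a stub in tests.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: compare the expected A records with what DNS returns.
# Names come from main-stage2.tf; the resolver is injectable for testing.
import socket

DOMAIN = "kuber.work"
SUBDOMAINS = [
    "service-1-test-morning",
    "service-2-test-afternoon",
    "service-3-test-evening",
]


def expected_records(domain, subdomains, lb_ip):
    """Map each fully qualified name to the load balancer IP it should resolve to."""
    return {f"{name}.{domain}": lb_ip for name in subdomains}


def mismatched_records(expected, resolve=socket.gethostbyname):
    """Return {fqdn: actual} for names that do not resolve to the expected IP."""
    problems = {}
    for fqdn, ip in expected.items():
        try:
            actual = resolve(fqdn)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            actual = f"error: {err}"
        if actual != ip:
            problems[fqdn] = actual
    return problems
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, mismatched_records(expected_records(DOMAIN, SUBDOMAINS, "146.190.178.50")) checks the names against the load balancer IP seen in the curl output below. An empty dictionary means every name resolves as expected. Newly created records can take a few minutes to propagate.&lt;/p&gt;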



&lt;p&gt;At this point, all the required resources are provisioned:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy5qizrs63jxj2pzf748.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffy5qizrs63jxj2pzf748.png" alt="resources" width="800" height="345"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The DNS records are in place as well:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2nturk2x89j9893nkp3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr2nturk2x89j9893nkp3.png" alt="DNS records" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now it's time to test our service:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;➜  ~ curl https://service-1-test-morning.kuber.work -v
*   Trying 146.190.178.50:443...
* Connected to service-1-test-morning.kuber.work 
...
* Server certificate:
*  subject: CN=service-1-test-morning.kuber.work
*  start date: May 17 10:20:20 2023 GMT
*  expire date: Aug 15 10:20:19 2023 GMT
*  subjectAltName: host "service-1-test-morning.kuber.work" matched cert's "service-1-test-morning.kuber.work"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
...
Good Morning!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
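
&lt;p&gt;The verbose &lt;code&gt;curl&lt;/code&gt; output shows that the TLS handshake succeeds, the certificate subject and SAN match the hostname, and the certificate was issued by Let's Encrypt. As a minimal sketch, the same checks can be scripted with Python's standard &lt;code&gt;ssl&lt;/code&gt; and &lt;code&gt;socket&lt;/code&gt; modules, for example as a post-deployment smoke test. The function names are illustrative and not part of the project repository.&lt;/p&gt;

```python
import socket
import ssl
import sys
import time


def fetch_peer_cert(hostname: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Open a verified TLS connection and return the server certificate as a dict."""
    # create_default_context() checks the certificate chain and the hostname,
    # similar to what curl does by default.
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def summarize_cert(cert: dict, now: float) -> dict:
    """Pick out the fields that curl -v prints and compute the remaining validity."""
    # subject/issuer are tuples of RDNs; each RDN is a tuple of (key, value) pairs.
    # Only the first attribute of each RDN is kept, which is enough for CN lookups.
    subject = dict(rdn[0] for rdn in cert["subject"])
    issuer = dict(rdn[0] for rdn in cert["issuer"])
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return {
        "subject_cn": subject.get("commonName"),
        "issuer_cn": issuer.get("commonName"),
        "san_dns": [v for kind, v in cert.get("subjectAltName", ()) if kind == "DNS"],
        "lifetime_days": round((not_after - not_before) / 86400),
        "days_left": (not_after - now) / 86400,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    info = summarize_cert(fetch_peer_cert(sys.argv[1]), time.time())
    for key, value in info.items():
        print(f"{key}: {value}")
```

&lt;p&gt;Pass the hostname as the only argument, e.g. &lt;code&gt;python check_cert.py service-1-test-morning.kuber.work&lt;/code&gt; (the file name is hypothetical). If the certificate is invalid or does not match the hostname, &lt;code&gt;wrap_socket&lt;/code&gt; raises &lt;code&gt;ssl.SSLCertVerificationError&lt;/code&gt;, much as &lt;code&gt;curl&lt;/code&gt; refuses the connection.&lt;/p&gt;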



&lt;h2&gt;
  
  
  Automate the process using GitHub Actions
&lt;/h2&gt;

&lt;p&gt;For the automation, I use two separate workflows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provision the infrastructure (runs on every push and pull request to &lt;code&gt;main&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Destroy the infrastructure (runs only when triggered manually)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Workflow for provisioning the infrastructure:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1-Provision-infrastructure&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;push&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;pull_request&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;branches&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main"&lt;/span&gt; &lt;span class="pi"&gt;]&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;PATH_STAGE_1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"&lt;/span&gt;
  &lt;span class="na"&gt;PATH_STAGE_2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/"&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;provision&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provision the infrastructure in Digital Ocean and configure Kubernetes&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install doctl&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;digitalocean/action-doctl@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_API_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Provision the infrastructure&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;DO_API_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_API_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;DO_BUCKET_ACCESS_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_BUCKET_ACCESS_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;DO_BUCKET_SECRET_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_BUCKET_SECRET_KEY }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_1 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_1 plan -var="digital_ocean_api_token=$DO_API_TOKEN"&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_1 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_2 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_2 plan -var="digital_ocean_api_token=$DO_API_TOKEN" -var="k8s_cluster_name=$(cd $PATH_STAGE_1 &amp;amp;&amp;amp; terraform output -raw k8s_cluster_name)"&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_2 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 &amp;amp;&amp;amp; terraform output -raw k8s_cluster_name)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here I use an output of Stage 1 as an input for Stage 2:&lt;br&gt;
&lt;code&gt;terraform -chdir=$PATH_STAGE_2 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 &amp;amp;&amp;amp; terraform output -raw k8s_cluster_name)"&lt;/code&gt;&lt;br&gt;
The nested command returns just the cluster name as a plain, unquoted string: &lt;br&gt;
&lt;code&gt;terraform output -raw k8s_cluster_name&lt;/code&gt;&lt;/p&gt;
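
&lt;p&gt;The same hand-off can be expressed outside the YAML. The following minimal Python sketch assumes Terraform is on the &lt;code&gt;PATH&lt;/code&gt;, both stages are already initialized, and the API token is in the &lt;code&gt;DO_API_TOKEN&lt;/code&gt; environment variable, as in the workflow. Instead of &lt;code&gt;-raw&lt;/code&gt;, it reads all Stage 1 outputs at once with &lt;code&gt;terraform output -json&lt;/code&gt;, which returns each output as an object with &lt;code&gt;value&lt;/code&gt;, &lt;code&gt;type&lt;/code&gt;, and &lt;code&gt;sensitive&lt;/code&gt; fields. The helper names are hypothetical.&lt;/p&gt;

```python
import json
import os
import subprocess

# Paths copied from the workflow's env section.
STAGE_1 = "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"
STAGE_2 = "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/"


def parse_output(raw_json: str, name: str) -> str:
    """Return one value from the JSON printed by `terraform output -json`."""
    outputs = json.loads(raw_json)  # {"name": {"sensitive": ..., "type": ..., "value": ...}}
    if name not in outputs:
        raise KeyError(f"output {name!r} not found, available: {sorted(outputs)}")
    return outputs[name]["value"]


def stage2_apply_cmd(token: str, cluster_name: str) -> list:
    """Build the Stage 2 apply command, passing the Stage 1 output as a variable."""
    return [
        "terraform", f"-chdir={STAGE_2}", "apply", "-auto-approve",
        f"-var=digital_ocean_api_token={token}",
        f"-var=k8s_cluster_name={cluster_name}",
    ]


def run(cmd: list, capture: bool = False) -> str:
    # check=True raises on a non-zero exit code, so the sequence stops at the
    # first failure, much like the bash -e shell used by GitHub Actions.
    result = subprocess.run(cmd, check=True, capture_output=capture, text=True)
    return result.stdout if capture else ""


if __name__ == "__main__" and "DO_API_TOKEN" in os.environ:
    raw = run(["terraform", f"-chdir={STAGE_1}", "output", "-json"], capture=True)
    cluster = parse_output(raw, "k8s_cluster_name")
    run(stage2_apply_cmd(os.environ["DO_API_TOKEN"], cluster))
```

&lt;p&gt;Building the argument list in Python avoids the nested command substitution, and a missing output fails with an explicit &lt;code&gt;KeyError&lt;/code&gt; rather than possibly reaching Stage 2 as an empty cluster name.&lt;/p&gt;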

&lt;h3&gt;
  
  
  Workflow for destroying the infrastructure:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2-Destroy-infrastructure&lt;/span&gt;

&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;workflow_dispatch&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;PATH_STAGE_1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"&lt;/span&gt;
  &lt;span class="na"&gt;PATH_STAGE_2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/"&lt;/span&gt;

&lt;span class="na"&gt;jobs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;destroy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Destroy the infrastructure in Digital Ocean&lt;/span&gt;
    &lt;span class="na"&gt;runs-on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ubuntu-latest&lt;/span&gt;
    &lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Install doctl&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;digitalocean/action-doctl@v2&lt;/span&gt;
        &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_API_TOKEN }}&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Checkout&lt;/span&gt;
        &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;actions/checkout@v3&lt;/span&gt;

      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Destroy the infrastructure&lt;/span&gt;
        &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;DO_API_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_API_TOKEN }}&lt;/span&gt;
          &lt;span class="na"&gt;DO_BUCKET_ACCESS_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_BUCKET_ACCESS_KEY }}&lt;/span&gt;
          &lt;span class="na"&gt;DO_BUCKET_SECRET_KEY&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.DO_BUCKET_SECRET_KEY }}&lt;/span&gt;
        &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_1 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"&lt;/span&gt;
          &lt;span class="s"&gt;doctl auth init --access-token $DO_API_TOKEN&lt;/span&gt;
          &lt;span class="s"&gt;doctl kubernetes cluster kubeconfig save $(terraform -chdir=$PATH_STAGE_1 output -raw k8s_cluster_name) || true&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_2 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY" || true&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_2 apply -destroy -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 &amp;amp;&amp;amp; terraform output -raw k8s_cluster_name)" || true&lt;/span&gt;
          &lt;span class="s"&gt;terraform -chdir=$PATH_STAGE_1 apply -destroy -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this workflow, I first destroy the Stage 2 resources and then destroy the remaining Stage 1 resources. Stage 2 receives the cluster name from Stage 1, so it has to be removed while the cluster still exists.&lt;/p&gt;

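&lt;p&gt;To be able to run the pipelines repeatedly, the destroy workflow must remove the Stage 1 resources as well as the main ones.&lt;br&gt;
Because GitHub Actions runs a &lt;code&gt;run:&lt;/code&gt; script with &lt;code&gt;bash -e&lt;/code&gt;, a failing command would stop the step before Stage 1 is destroyed. To ignore such failures in the commands that precede the Stage 1 destruction, I append the &lt;code&gt;|| true&lt;/code&gt; expression to them:&lt;/p&gt;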

&lt;p&gt;&lt;code&gt;terraform -chdir=$PATH_STAGE_2 apply -destroy --auto-approve || true&lt;/code&gt;&lt;/p&gt;
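
&lt;p&gt;The destroy ordering and the selective &lt;code&gt;|| true&lt;/code&gt; can also be modeled explicitly. In this minimal Python sketch, each step carries a flag that says whether its failure is tolerated. The &lt;code&gt;-var&lt;/code&gt; arguments are omitted as in the shortened command above, so the sketch assumes the token is supplied through Terraform's &lt;code&gt;TF_VAR_digital_ocean_api_token&lt;/code&gt; environment variable. &lt;code&gt;Step&lt;/code&gt;, &lt;code&gt;destroy_plan&lt;/code&gt;, and &lt;code&gt;run_steps&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Step:
    cmd: list
    tolerate_failure: bool  # True plays the role of "|| true" in the workflow


def destroy_plan(stage1: str, stage2: str) -> list:
    """Stage 2 is destroyed first and may fail; Stage 1 must always run afterwards."""
    # -var arguments are omitted; the API token is assumed to come from
    # the TF_VAR_digital_ocean_api_token environment variable.
    return [
        Step(["terraform", f"-chdir={stage2}", "apply", "-destroy", "-auto-approve"], True),
        Step(["terraform", f"-chdir={stage1}", "apply", "-destroy", "-auto-approve"], False),
    ]


def run_steps(steps: list, runner=subprocess.run) -> list:
    """Run steps in order and return their exit codes.

    A failure in a tolerated step is recorded and ignored; a failure in any
    other step raises immediately.
    """
    codes = []
    for step in steps:
        code = runner(step.cmd).returncode
        codes.append(code)
        if code != 0 and not step.tolerate_failure:
            raise RuntimeError(f"{' '.join(step.cmd)} failed with exit code {code}")
    return codes
```

&lt;p&gt;Keeping the tolerance as an explicit per-step flag makes the intent visible, whereas a trailing &lt;code&gt;|| true&lt;/code&gt; is easy to overlook in a long &lt;code&gt;run:&lt;/code&gt; block. The injectable &lt;code&gt;runner&lt;/code&gt; parameter also lets the sequence be tested without calling Terraform.&lt;/p&gt;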

&lt;p&gt;While testing, I ran into an issue with deleting the VPC and the Project in DigitalOcean. Although the console showed no resources associated with the VPC, DigitalOcean reported that the VPC still contained resources, so Terraform failed with an error when it tried to delete it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecew6ywmebarzncyd7ts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fecew6ywmebarzncyd7ts.png" alt="destruction error" width="800" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When I rerun the pipeline, the same step completes without errors:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tyulonbtz5i027225fa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0tyulonbtz5i027225fa.png" alt="destruction success" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;
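
&lt;p&gt;Because the rerun succeeds, the Stage 1 destroy could be retried automatically instead of rerunning the whole workflow. A minimal Python sketch, assuming the VPC error is transient (the root cause is unknown) and that the API token is supplied through Terraform's &lt;code&gt;TF_VAR_digital_ocean_api_token&lt;/code&gt; environment variable; the function names, the three attempts, and the 60-second delay are illustrative choices:&lt;/p&gt;

```python
import os
import subprocess
import time


def retry(action, attempts: int = 3, delay_seconds: float = 60.0, sleep=time.sleep) -> int:
    """Call action() until it returns True; return the number of the successful attempt."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        if attempt != attempts:
            # Assumes the "VPC still has resources" error is transient,
            # as the successful rerun suggests; the delay is arbitrary.
            sleep(delay_seconds)
    raise RuntimeError(f"still failing after {attempts} attempts")


def destroy_stage1(stage1_path: str) -> bool:
    """Run the Stage 1 destroy once; the token is expected in TF_VAR_digital_ocean_api_token."""
    cmd = ["terraform", f"-chdir={stage1_path}", "apply", "-destroy", "-auto-approve"]
    return subprocess.run(cmd).returncode == 0


if __name__ == "__main__" and "TF_VAR_digital_ocean_api_token" in os.environ:
    path = "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"
    retry(lambda: destroy_stage1(path))
```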

&lt;p&gt;If you have run into this issue, please contact me or leave a comment.&lt;/p&gt;

&lt;p&gt;Below, you will find a list of pipeline workflows:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilje74wqu5iak8klb3s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpilje74wqu5iak8klb3s.png" alt="workflows" width="800" height="311"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion:
&lt;/h2&gt;

&lt;p&gt;In this article, I demonstrated how to provision infrastructure components on DigitalOcean with Terraform and how to automate provisioning and destruction with GitHub Actions. By splitting the configuration into two stages, I worked around the limitations of Terraform and the DigitalOcean provider.&lt;/p&gt;

&lt;p&gt;You can find all of my code in my GitHub repository: &lt;a href="https://github.com/andygolubev/terraform-digital-ocean" rel="noopener noreferrer"&gt;https://github.com/andygolubev/terraform-digital-ocean&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Feel free to connect with me on LinkedIn: &lt;a href="https://www.linkedin.com/in/andy-golubev/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/andy-golubev/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>digitalocean</category>
      <category>kubernetes</category>
      <category>githubactions</category>
    </item>
  </channel>
</rss>
