DEV Community: andygolubev

AWS Cloud Formation doing crazy

andygolubev — Thu, 27 Nov 2025 07:01:12 +0000

Intro

I decided to write this article after a year and a half of actively using AWS CloudFormation across two separate products. Because it’s less popular than Terraform, finding solutions to some problems often meant piecing together hints from different sources. Here I’ll share my experience in the hope that it helps someone else solve their CloudFormation challenges.

A large part of this article is code. It’s mainly a note for myself in the future, so I can remember how I used AWS CloudFormation if I need to work with it again.

When you work with CloudFormation, there are some key differences from Terraform. For example: there’s no automatic drift remediation, deployments are all-or-nothing (no partial apply), you can’t deploy to multiple regions in one go, and stack policies have their own quirks you need to understand.

Below I will show how to overcome these challenges to deploy this example architecture. Code is available on GitHub.

CloudFormation Demo Stack for this Article

Purpose:
End-to-end AWS reference environment that bootstraps networking, security, compute, data, and edge delivery through layered CloudFormation templates orchestrated by cfn-stacks/10-main-stack.yaml.

GitHub: https://github.com/andygolubev/article-cfn-pain-points

High resolution image is here: https://raw.githubusercontent.com/andygolubev/article-cfn-pain-points/cec0fbbbc884efe83831c6f75ba365fc887580c9/solution_diagram.png

CloudFormation blueprints (cfn-stacks/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/cfn-stacks
Modular stacks for shared artifacts and ECR registries, VPC and NAT topology, Route 53 hosted zone, Aurora/PostgreSQL, ElastiCache Redis, Fargate-based ECS, API Gateway fronting the internal NLB, EventBridge wiring, WAF protection, Lambda resources, and a us-east-1 global stack providing ACM/CloudFront distribution with DNS aliases. Parameter sets live in parameters-.json, while main-stack-policy.json locks down updates in stage/prod.

Lambda layer (lambda-layer/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-layer
Dockerfile-driven build that packages shared Python helpers like common_service.get_hello_world() into a reusable layer zip (lambda_layer.zip) for multiple functions; build commands are documented in the folder README.

Lambda functions (lambda-functions/demo_lambda/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/lambda-functions
https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-antivirus-scanner
Sample Python handler that imports the shared layer artifact to return a greeting and request metadata, demonstrating code reuse across functions.

ECS service (ecr-repo-services/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/ecr-repo-services/demo-backend-service
Two example workloads with ready-to-push Dockerfiles—demo-backend-service (Go HTTP service for ECS Fargate) and demo-antivirus-scanner (Python ARM64 Lambda image)—each with snippets for authenticating to ECR, creating repositories, and pushing images.

Frontend sample (cloudfront-frontend-code/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/cloudfront-frontend-code
Minimal static site that represents the S3-hosted SPA/front-end assets later served through CloudFront.

Automation scripts (scripts/):
https://github.com/andygolubev/article-cfn-pain-points/tree/main/scripts
01-deploy-cfn.sh orchestrates regional stack deployments, parameter wiring, and layer uploads; 02-deploy-cfn-global.sh handles the us-east-1 global stack, reading outputs from the regional deployment.

The codebase is deployable and operational; I’ve verified it in my AWS account =)

Deploying to Multiple Regions

CloudFormation wasn’t really designed for comfortable multi-region deployments. I don’t know why. But there are workarounds.

Here’s what I’ve used:

A Bash wrapper that deploys different resources to different regions and passes parameters between them.
StackSets to push the needed resources into another region, plus Secrets Manager replication to bring the final value back into the original region.

With the Bash approach, you can call aws cloudformation deploy multiple times with different parameters. To fetch values for later steps, use aws cloudformation list-exports.

Example:

WEGO_HOSTED_ZONE_ID=$(aws cloudformation list-exports --region $REGION | jq -r ".Exports[] | select(.Name == \"demo-hosted-zone-id\") | .Value")

WEGO_HOSTED_ZONE_DOMAIN=$(aws cloudformation list-exports --region $REGION | jq -r ".Exports[] | select(.Name == \"demo-hosted-zone-domain-name\") | .Value")

DEMO_CLOUDFRONT_CERTIFICATE_DOMAIN_NAME=$(jq -r '.[] | select(.ParameterKey == "DemoCloudFrontCertificateDomainNameParam") | .ParameterValue' "parameters-$4.json")

S3_DEMO_BUCKET_NAME=$(aws cloudformation describe-stacks --stack-name demo-s3-stack --region $REGION --query "Stacks[0].Outputs[?OutputKey=='DemoFrontendBucketName'].OutputValue" --output text)

S3_DEMO_BUCKET_OAI=$(aws cloudformation describe-stacks --stack-name demo-s3-stack  --region $REGION --query "Stacks[0].Outputs[?OutputKey=='DemoFrontendCloudFrontOAI'].OutputValue" --output text)
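
One thing to watch with the jq filters above: if an export name is misspelled, jq prints nothing and the variable ends up empty without any error. As a minimal sketch (in Python, assuming the AWS CLI is installed and configured), the same lookup can fail loudly instead. The helper names find_export and list_exports are mine, not part of the repository.

```python
import json
import subprocess


def find_export(exports_doc: dict, name: str) -> str:
    """Return the value of the export called `name`, or raise if it is missing."""
    for export in exports_doc.get("Exports", []):
        if export.get("Name") == name:
            return export["Value"]
    raise KeyError(f"CloudFormation export not found: {name}")


def list_exports(region: str) -> dict:
    """Run `aws cloudformation list-exports` and parse its JSON output."""
    result = subprocess.run(
        ["aws", "cloudformation", "list-exports",
         "--region", region, "--output", "json"],
        check=True,           # raise if the CLI call itself fails
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Usage would look like find_export(list_exports("eu-central-1"), "demo-hosted-zone-id"), which raises a KeyError instead of silently returning an empty string.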

When you use StackSets, you need to add a few roles and some shared plumbing (the StackSet itself). The final template for the deployment has to be embedded inside the StackSet. It’s not pretty—linters won’t parse this setup—but for one-off cases it’s good enough.

Deploying big stacks

To pass parameters between stacks you have a few options:

Nested stacks
Exports and imports

At first glance, exports/imports look cleaner. In practice, they can lock you in. Once you export a value and other stacks start importing it, you can’t change that value freely. To update it, you have to touch every stack that consumes the export. The good news: it’s easy to see which stacks are using your export.
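
To check who consumes an export before changing it, the AWS CLI has a list-imports command. Here is a small sketch in Python that wraps it, assuming the AWS CLI is installed and configured; the function names are hypothetical helpers, not something from the repository.

```python
import json
import subprocess


def list_imports_command(export_name: str, region: str) -> list:
    """Build the AWS CLI call that lists the stacks importing one export."""
    return [
        "aws", "cloudformation", "list-imports",
        "--export-name", export_name,
        "--region", region,
        "--output", "json",
    ]


def importing_stacks(export_name: str, region: str) -> list:
    """Return the names of the stacks that import `export_name`."""
    result = subprocess.run(
        list_imports_command(export_name, region),
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout).get("Imports", [])
```

Every stack name returned here is a stack you would have to update before the exported value can change.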

Because of this, I usually prefer nested stacks with parameter passing. When the root stack changes, CloudFormation updates all dependent resources automatically—either by applying changes or recreating what’s needed. It keeps the dependency chain explicit and the updates predictable.

Applying stack policy to nested stacks

When you apply a stack policy to the root stack, it doesn’t automatically cover the nested stacks. Each nested stack is its own stack with its own policy. Because of that, I set the policy separately for every nested stack—usually in a small loop/script that iterates over child stacks and applies the policy to each one.

NESTED_STACK_ARNS=$(aws cloudformation describe-stack-resources --stack-name demo-main-stack --region "$REGION" --query "StackResources[?ResourceType=='AWS::CloudFormation::Stack'].PhysicalResourceId" --output text)

echo "Setting stack policy for demo main stack: demo-main-stack"
if ! aws cloudformation set-stack-policy --stack-name demo-main-stack --stack-policy-body file://main-stack-policy.json --region "$REGION"; then
    echo "Error setting stack policy to demo main stack. Exiting..."
    exit 1
fi

# Apply stack policy to each nested stack
for STACK in $NESTED_STACK_ARNS; do
    echo "Setting stack policy for nested stack: $STACK"
    if ! aws cloudformation set-stack-policy --stack-name "$STACK" --stack-policy-body file://./main-stack-policy.json --region "$REGION"; then
        echo "Error setting stack policy to nested stack: $STACK. Exiting..."
        exit 1
    fi
done

# Print the policy of each nested stack to verify it was applied
for STACK in $NESTED_STACK_ARNS; do
    echo "Get stack policy for nested stack: $STACK"
    # Check the aws call itself; after a pipeline, $? would only reflect jq
    if ! POLICY=$(aws cloudformation get-stack-policy --stack-name "$STACK" --region "$REGION" --output json --no-cli-pager); then
        echo "Error getting stack policy from nested stack: $STACK. Exiting..."
        exit 1
    fi
    echo "$POLICY" | jq '.StackPolicyBody | fromjson'
done
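
The contents of main-stack-policy.json are not shown here, so the following is only an assumed example of what such a policy could look like: allow all updates, then explicitly deny replacement and deletion of every resource. It is written as a small Python sketch that produces the JSON document; the function name and the choice of denied actions are my assumptions.

```python
import json


def build_stack_policy(denied_actions=("Update:Replace", "Update:Delete")) -> str:
    """Return a stack policy document as a JSON string.

    In a stack policy an explicit Deny overrides an Allow, so the second
    statement blocks the listed actions even though the first allows all updates.
    """
    policy = {
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "Update:*",
                "Principal": "*",
                "Resource": "*",
            },
            {
                "Effect": "Deny",
                "Action": list(denied_actions),
                "Principal": "*",
                "Resource": "*",
            },
        ]
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(build_stack_policy())
```

The printed document can be saved to a file and passed to set-stack-policy through --stack-policy-body file://..., the same way the script above does.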

How to deploy this stack

You can deploy this stack with the AWS CLI. You also need jq installed.

Each environment has its own parameters-env.json file in the cfn-stacks/ folder.

Example:

./scripts/01-deploy-cfn.sh --region eu-central-1 --env dev
./scripts/02-deploy-cfn-global.sh --region eu-central-1 --env dev
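
The parameter files use the ParameterKey/ParameterValue list format (the same one the jq filter earlier reads). As a rough sketch of what a deploy wrapper has to do with such a file, here is a Python version that turns it into Key=Value overrides for aws cloudformation deploy. The file name parameters-dev.json and the function names are assumptions for illustration; the real scripts in the repository are Bash.

```python
import json


def to_parameter_overrides(entries: list) -> list:
    """Convert [{'ParameterKey': k, 'ParameterValue': v}, ...] to ['k=v', ...]."""
    return [f"{e['ParameterKey']}={e['ParameterValue']}" for e in entries]


def load_parameter_overrides(path: str) -> list:
    """Read a parameter file and return its Key=Value overrides."""
    with open(path, encoding="utf-8") as handle:
        return to_parameter_overrides(json.load(handle))


def deploy_command(stack_name: str, template_file: str, region: str,
                   overrides: list) -> list:
    """Build the `aws cloudformation deploy` call for one stack."""
    command = [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack_name,
        "--template-file", template_file,
        "--region", region,
    ]
    if overrides:  # the flag needs at least one value
        command += ["--parameter-overrides", *overrides]
    return command
```

For example, deploy_command("demo-main-stack", "cfn-stacks/10-main-stack.yaml", "eu-central-1", load_parameter_overrides("cfn-stacks/parameters-dev.json")) returns an argument list that can be passed to subprocess.run.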

Conclusion

You don’t always need to rely on out-of-the-box solutions, especially when they don’t fit your needs. With a bit of creativity and a few small tools, you can work around most of CloudFormation’s rough edges. In this case, a Bash wrapper around the AWS CLI and jq, nested stacks with parameter passing, and a per-stack policy loop were enough to deploy the whole environment across regions.

I hope you enjoyed this article.

You can find all my articles on: https://andygolubev.com/

You can find all of my code in my GitHub repository: https://github.com/andygolubev/article-cfn-pain-points/tree/main

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

Monitoring multiple k8s clusters on Digital Ocean with Prometheus and Grafana deployed using Terraform and Ansible role

andygolubev — Mon, 26 Aug 2024 21:09:30 +0000

Intro

The internet is full of ready-made solutions for every taste, but problems arise when they don't fit your needs. That's when it's time to come up with something custom.

This time, the challenge was to collect metrics from two K8S clusters located in different VPCs.

It seemed like a simple task:

Set up a dedicated server for Prometheus.
Create a VPC Peering connection.
Deploy Prometheus in each cluster.
Set up federation.

However, the issue is that Digital Ocean doesn't support VPC Peering as of this writing, meaning all metrics would leave the cloud, go out to the internet, and then come back, causing unnecessary traffic costs.

To avoid this, we had to come up with alternative solutions that would work for a small startup while avoiding extra expenses.

Solution

So, here's the solution I implemented.

There are three VPCs. Two of them host the clusters, and the third one contains the supporting tools, including a server with Grafana.

Grafana connects to each cluster and pulls data for the dashboard. This setup ensures that traffic only flows when the dashboard is being viewed. Authentication is handled at the ingress.

Tools

For the implementation, I chose the following tools:

Prometheus
Grafana
Loki
Terraform
Ansible role
Nginx
Docker Compose
Ubuntu server on Droplet

I use Terraform to provision the server for the subsequent Grafana installation, as well as to create DNS records.

I use an Ansible role to configure the server, including the installation and launch of all necessary services:

Certbot
Docker
Nginx
Grafana dashboards

Prometheus and Grafana Loki are installed in the K8S clusters, from where the metrics and logs are collected.

The Code

I have a standard Ansible role written with tasks and templates to automate the setup and configuration.

➜  monitoring-prometheus git:(main) tree monitoring_role 
monitoring_role
├── README.md
├── defaults
│   └── main.yml
├── files
│   ├── grafana-k8s-cluster-dashboard.json
│   ├── grafana-k8s-logs-dashboard.json
│   └── grafana-k8s-volumes-dashboard.json
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   ├── 01_wait_for_initialization.yml
│   ├── 02_install_certbot_and_configure_nginx.yml
│   ├── 03_install_docker.yml
│   ├── 04_add_monitoring_user.yml
│   ├── 05_copy_configuration_files.yml
│   ├── 06_run_containers.yml
│   ├── 07_enable_ufw.yml
│   └── main.yml
├── templates
│   ├── dashboards.yaml.j2
│   ├── datasources.yaml.j2
│   ├── default.j2
│   └── docker-compose.yml.j2
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml

The Ansible role handles the issuance of certificates, installs Nginx, sets up Grafana with dashboards, and starts Docker Compose.

Result

Here is an example of how the final dashboard looks:

Conclusion

I hope you enjoyed this article.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/monitoring-prometheus

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

Backup tool using AWS Batch, ECS and Fargate for backing up objects from other clouds

andygolubev — Tue, 13 Feb 2024 10:11:49 +0000

Intro

Sometimes we have to do dull tasks like setting up a backup system, and we want to do it in the easiest and cheapest way possible. AWS offers plenty of services that make this straightforward.
Because my compute runs on Digital Ocean, I couldn't use AWS Backup. So I looked for services that suit infrequent workloads and cost nothing when they're idle.
For this plan, I picked AWS Batch with ECS on Fargate, and I use EventBridge Scheduler to start the jobs.

Solution

This is a general picture of my solution.

I'm setting up everything using AWS CloudFormation.

...
  BackuperComputeEnvironment:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      ComputeEnvironmentName: backuper-environment
      ComputeResources:
        MaxvCpus: 4
        SecurityGroupIds:
          - !Ref BackuperSecurityGroup
        Type: FARGATE
        Subnets:
          - !Ref BackuperSubnet
      Tags: {"Name" : "BackuperComputeEnvironment", "CreatedBy" : "CloudFormationStack", "App" : "Backuper"}
      State: ENABLED

  BackuperJobDefinition:
    Type: AWS::Batch::JobDefinition
    Properties:
      Type: container
      JobDefinitionName: BackuperJobDefinition
      PlatformCapabilities:
        - FARGATE
      ContainerProperties:
        Image: registry.hub.docker.com/andygolubev/backuper:latest
        Environment:
          - Name: AWS_BACKUP_DESTINATION_BUCKET
            Value: !Ref awsBackupDestinationBucketName
          - Name: DO_PG_USER
            Value: !Ref doPgUser
          - Name: DO_KEY
            Value: !Ref doKey
          - Name: DO_PG_DBNAME
            Value: !Ref doPgDbname
          - Name: DO_PG_HOST
            Value: !Ref doPgHost
          - Name: DO_SECRET
            Value: !Ref doSecret
          - Name: DO_REGION_ENDPOINT
            Value: !Ref doRegionEndpoint
          - Name: DO_PG_PORT
            Value: !Ref doPgPort
          - Name: DO_PG_PASSWORD
            Value: !Ref doPgPassword
        Command:
          - /bin/bash
          - -c
          - /backuper/s3_backup_script.sh && /bin/bash -c /backuper/postgre_backup_script.sh
        Privileged: false
        JobRoleArn: !GetAtt  BackuperAmazonECSTaskExecutionRole.Arn
        ExecutionRoleArn: !GetAtt BackuperAmazonECSTaskExecutionRole.Arn
        ReadonlyRootFilesystem: false
        NetworkConfiguration:
          AssignPublicIp: ENABLED
        ResourceRequirements:
          - Type: MEMORY
            Value: 1024
          - Type: VCPU
            Value: 0.5
        LogConfiguration:
          LogDriver: awslogs
          Options:
            "awslogs-group": !Ref BackuperLogGroup
            "awslogs-stream-prefix": "prefix"
      Tags: {"Name" : "BackuperJobDefinition", "CreatedBy" : "CloudFormationStack", "App" : "Backuper"}

...

I use my Docker Image with built-in bash scripts.

FROM ubuntu:22.04
WORKDIR /tmp

# install tools
RUN apt update && apt -y upgrade && apt -y --no-install-suggests --no-install-recommends install wget unzip curl tree git jq gettext zip ca-certificates

# install aws cli v2
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install && \
    aws --version

# install latest postgre tools
RUN apt -y install lsb-release gnupg2 --no-install-suggests --no-install-recommends && \
    sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list' && \
    wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add - && \
    apt update && \
    apt -y install postgresql-client

# make working folders
RUN mkdir /backuper; mkdir /backup; mkdir /backup_db
WORKDIR /backuper

# declare variables for s3_backup_script.sh
ENV DO_KEY=NOT_DEFINED
ENV DO_SECRET=NOT_DEFINED
ENV DO_REGION_ENDPOINT=NOT_DEFINED
ENV AWS_BACKUP_DESTINATION_BUCKET=NOT_DEFINED

# declare variables for postgre_backup_script.sh
ENV DO_PG_HOST=NOT_DEFINED
ENV DO_PG_PORT=NOT_DEFINED
ENV DO_PG_USER=NOT_DEFINED
ENV DO_PG_PASSWORD=NOT_DEFINED
ENV DO_PG_DBNAME=NOT_DEFINED

# create a buckets backup script inside the docker image
RUN echo "BUCKETS_ALL=\$(AWS_ACCESS_KEY_ID=\$DO_KEY AWS_SECRET_ACCESS_KEY=\$DO_SECRET aws s3 ls --endpoint=\$DO_REGION_ENDPOINT  | awk '{print \$3}')" >> /backuper/s3_backup_script.sh
RUN echo '\
echo "These buckets will be processed: $BUCKETS_ALL" \n\
for BUCKET in $BUCKETS_ALL \n\
do \n\
  echo "Processing bucket -> $BUCKET" \n\
  mkdir -p /backup/$BUCKET/ \n\
  AWS_ACCESS_KEY_ID=$DO_KEY AWS_SECRET_ACCESS_KEY=$DO_SECRET aws s3 cp --quiet --recursive --endpoint=$DO_REGION_ENDPOINT s3://$BUCKET /backup/$BUCKET/ \n\
  ZIP_FILE_DATE_TIME=$(date +%Y-%m-%d--%H-%M) \n\
  zip --recurse-paths --quiet /backup/$ZIP_FILE_DATE_TIME-UTC-$BUCKET-bucket_backup.zip /backup/$BUCKET/ \n\
  aws s3 cp --storage-class GLACIER_IR /backup/$ZIP_FILE_DATE_TIME-UTC-$BUCKET-bucket_backup.zip  s3://$AWS_BACKUP_DESTINATION_BUCKET \n\
  echo "Successfully Processed -> $BUCKET" \n\
done \n\
echo "Bucket backup is COMPLETED" \
' >> /backuper/s3_backup_script.sh

# create a postgre backup script inside the docker image
RUN echo '\
echo "Making dump for PostgreSQL Database --> $DO_PG_DBNAME" \n\
mkdir -p /backup_db/$DO_PG_DBNAME/ \n\
PGPASSWORD=$DO_PG_PASSWORD pg_dump -U $DO_PG_USER -h $DO_PG_HOST -p $DO_PG_PORT -Fc $DO_PG_DBNAME > /backup_db/$DO_PG_DBNAME/$DO_PG_DBNAME.dump \n\
PGPASSWORD=$DO_PG_PASSWORD pg_dump -U $DO_PG_USER -h $DO_PG_HOST -p $DO_PG_PORT $DO_PG_DBNAME > /backup_db/$DO_PG_DBNAME/$DO_PG_DBNAME.sql \n\
ZIP_FILE_DATE_TIME=$(date +%Y-%m-%d--%H-%M) \n\
zip --recurse-paths --quiet /backup_db/$ZIP_FILE_DATE_TIME-UTC-$DO_PG_DBNAME-postgre_backup.zip /backup_db/$DO_PG_DBNAME/ \n\
aws s3 cp --storage-class GLACIER_IR /backup_db/$ZIP_FILE_DATE_TIME-UTC-$DO_PG_DBNAME-postgre_backup.zip  s3://$AWS_BACKUP_DESTINATION_BUCKET \n\
echo "Successfully Processed -> $DO_PG_DBNAME" \n\
echo "Postgre backup is COMPLETED" \
' >> /backuper/postgre_backup_script.sh

# make the script runnable
RUN chmod +x /backuper/s3_backup_script.sh
RUN chmod +x /backuper/postgre_backup_script.sh

ENTRYPOINT ["/bin/bash", "-c",  "/backuper/s3_backup_script.sh && /bin/bash -c /backuper/postgre_backup_script.sh"]

Once the CloudFormation stack is deployed successfully, we can see all the resources that have been created.

And as you can see, our AWS Batch service is fully set up and waiting for a trigger event.
The scheduler is set with these configurations, and the cron will initiate an event at 8 AM UTC.
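
For reference, a daily 8 AM UTC schedule in EventBridge's six-field cron syntax (minutes, hours, day-of-month, month, day-of-week, year) can be built like this. This is a small Python sketch with a hypothetical helper name; the exact expression used in my stack is not shown here.

```python
def daily_cron_utc(hour: int, minute: int = 0) -> str:
    """Return an EventBridge cron expression that fires once a day.

    The day-of-week field is '?' because EventBridge does not allow both
    day-of-month and day-of-week to be specified at the same time.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    if not 0 <= minute <= 59:
        raise ValueError(f"minute out of range: {minute}")
    return f"cron({minute} {hour} * * ? *)"


if __name__ == "__main__":
    print(daily_cron_utc(8))  # cron(0 8 * * ? *)
```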

We have an EventBridge rule that filters Batch job events, keeps sensitive data out, and then sends them to the SNS topic.
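
To make the filtering idea concrete, here is a hedged Python sketch. Batch publishes "Batch Job State Change" events whose detail includes the container definition, environment variables included, which is exactly what should not end up in an email. The pattern below and the list of fields kept are my assumptions, not a copy of the rule in the stack.

```python
# Assumed event pattern: only finished Batch jobs, successful or failed.
BATCH_JOB_EVENT_PATTERN = {
    "source": ["aws.batch"],
    "detail-type": ["Batch Job State Change"],
    "detail": {"status": ["SUCCEEDED", "FAILED"]},
}

# Fields considered safe to forward; everything else (for example
# detail.container.environment) is dropped.
SAFE_DETAIL_FIELDS = ("jobName", "jobId", "jobQueue", "status")


def summarize_job_event(event: dict) -> dict:
    """Keep only the non-sensitive fields of a Batch job state change event."""
    detail = event.get("detail", {})
    return {field: detail.get(field) for field in SAFE_DETAIL_FIELDS}
```

In EventBridge itself the same effect is usually achieved with an input transformer on the rule target, so only the selected fields reach the SNS topic.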

After running our job (either manually or by the scheduler), we can see it on our dashboard and receive emails with the job's status.

The result notification looks like this.

Conclusion

Here's an example of how you can handle occasional workloads efficiently with AWS Batch + ECS + Fargate, all while keeping costs down. Give it a shot!

I hope you enjoy this article.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/aws-backup-with-batch-and-fargate

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

Kubernetes The Hard Way on AWS with Packer and Terraform

andygolubev — Tue, 17 Oct 2023 09:58:28 +0000

Introduction

Kubernetes has undoubtedly become the de facto standard for container orchestration, offering a powerful and flexible platform for deploying, managing, and scaling containerized applications. As organizations increasingly adopt cloud-native architectures, mastering Kubernetes has become a critical skill for both developers and operations teams. While there are numerous managed Kubernetes services available in the cloud, there's immense value in understanding the intricacies of Kubernetes by building it from scratch, often referred to as "the hard way."
I don't consider myself a regular user of this type of Kubernetes cluster because it can be challenging to maintain. However, it does serve as a valuable tool for educational purposes.

I created this cluster with guidance from an ACloudGuru course called "Kubernetes the hard way." It was quite a challenge because the course utilized an older version of Kubernetes and an outdated DNS plugin. As a result, I had to modify many scripts and troubleshoot extensively. However, despite the difficulties, it turned out to be an enjoyable experience. Additionally, I had to design a multi-Availability Zone (AZ) network for my EC2 instances and set up all the necessary network components. This was necessary because the original course had initially placed all the hosts in the same security group.

Architecture

My solution utilizes three Availability Zones (AZs) within the same region. Additionally, I employ a bastion host for communication with my cluster. All of the EC2 instances use custom images that I constructed during the previous stage of my pipeline.

hi res image

Implementation

I start by creating a Terraform state bucket and a DynamoDB table using the AWS CLI. This is a fairly common block in my pipelines.

...
env:
  # prefixes must be the same as in the 00-provider.tf
  AWS_BUCKET_NAME_PREFIX: "terraform-state-for-kubernetes-the-hard-way-packer" 
  AWS_DYNAMO_DB_TABLE_NAME_PREFIX: "terraform-state-for-terraform-state-for-kubernetes-the-hard-way-packer"

  AWS_REGION: ${{ vars.AWS_REGION }}

...
    - name: Create a bucket
      run: |
        if [[ "${{ env.AWS_REGION }}" == "us-east-1" ]]; then
          aws s3api create-bucket --bucket $AWS_BUCKET_NAME_PREFIX-$AWS_REGION --region $AWS_REGION --no-cli-pager
        else
          aws s3api create-bucket --bucket $AWS_BUCKET_NAME_PREFIX-$AWS_REGION --region $AWS_REGION --no-cli-pager --create-bucket-configuration LocationConstraint=$AWS_REGION
        fi

        aws s3api put-bucket-versioning --bucket $AWS_BUCKET_NAME_PREFIX-$AWS_REGION --versioning-configuration Status=Enabled
        aws s3api put-bucket-encryption --bucket $AWS_BUCKET_NAME_PREFIX-$AWS_REGION --server-side-encryption-configuration '{"Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]}'

    - name: Create a DynamoDB table
      run: |
        aws dynamodb create-table --table-name $AWS_DYNAMO_DB_TABLE_NAME_PREFIX-$AWS_REGION --attribute-definitions AttributeName=LockID,AttributeType=S --key-schema AttributeName=LockID,KeyType=HASH --billing-mode PAY_PER_REQUEST --tags Key=Name,Value="terraform state dynamo table" Key=CreatedBy,Value="AWS CLI" Key=Region,Value=$AWS_REGION 

    - name: Create a default VPC in the region
      run: |
        aws ec2 create-default-vpc || true    # create default VPC if not exist. It is required for AMI building
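
The if/else in the bucket step exists because us-east-1 is the one region where create-bucket must be called without a LocationConstraint. The same branching, sketched in Python with a hypothetical helper name, looks like this:

```python
def create_bucket_args(bucket: str, region: str) -> list:
    """Build the `aws s3api create-bucket` call for a given region.

    us-east-1 is the default location and rejects an explicit
    LocationConstraint, so the flag is only added for other regions.
    """
    args = [
        "aws", "s3api", "create-bucket",
        "--bucket", bucket,
        "--region", region,
        "--no-cli-pager",
    ]
    if region != "us-east-1":
        args += ["--create-bucket-configuration", f"LocationConstraint={region}"]
    return args
```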

Then I use bash scripts within the pipeline to create certificates and configurations.

➜  scripts-for-certs-and-configs git:(main) tree 
.
├── 00-k8s-network.sh
├── 01-certs-ca.sh
├── 02-certs-components.sh
├── 03-certs-api-server.sh
├── 04-certs-service-account.sh
├── 05-kubeconfig.sh
├── 06-generate-encryption-config.sh
├── 07-generate-etcd-service.sh
├── 08-generate-control-plane-configs.sh
├── 09-generate-cluster-role.sh
├── 10-generate-ngix-config.sh
├── 11-generate-containerd-config.sh
├── 12-generate-kubelet-config.sh
├── 13-generate-kube-proxy-config.sh
├── 14-bastion-key.sh
├── 15-generate-wavenet-manifest.sh
└── 16-generate-coredns-manifest.sh

After that, I utilize Packer to build all the Amazon Machine Images (AMIs) and copy necessary files.

==> Builds finished. The artifacts of successful builds are:
--> k8s-control-plane-2.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-2: AMIs were created:
us-west-2: ami-0314729fef4933bdc

--> k8s-control-plane-0.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-0: AMIs were created:
us-west-2: ami-088c138db1acf379f

--> k8s-control-plane-1.amazon-ebs.ubuntu-kubernetes-the-hard-way-control-plane-1: AMIs were created:
us-west-2: ami-0d348cd361f433388

--> k8s-load-balancer-internal.amazon-ebs.ubuntu-kubernetes-the-hard-way-load-balancer-internal: AMIs were created:
us-west-2: ami-07cbdbe6027e64882

--> k8s-bastion-host.amazon-ebs.ubuntu-kubernetes-the-hard-way-bastion-host: AMIs were created:
us-west-2: ami-0f0e571a26d6ac08a

--> k8s-working-node-1.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-1: AMIs were created:
us-west-2: ami-015f7e349eb6ec7ac

--> k8s-working-node-2.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-2: AMIs were created:
us-west-2: ami-03feac7b952f2ce5c

--> k8s-working-node-0.amazon-ebs.ubuntu-kubernetes-the-hard-way-working-node-0: AMIs were created:
us-west-2: ami-089c713f758a50a63

Finally, I provision the entire infrastructure using my custom AMIs through Terraform.

➜  terraform git:(main) tree
.
├── 00-provider.tf
├── 01-vpc.tf
├── 02-subnets.tf
├── 03-security-groups.tf
├── 04-route-tables.tf
├── 05-nat-gateway.tf
├── 06-ssh-key.tf
├── 07-ec2-control-plane.tf
├── 08-ec2-load-balancer.tf
├── 09-ec2-working-node.tf
├── 10-ec2-bastion.tf
└── 99-variables.tf

You can watch a time-lapsed video.

Build and provision with GitHub Actions

This is what we observe in the AWS console

Here are the results as I see them in the AWS console:

VPC

AMIs

Instances

Kubernetes objects

Conclusion

To summarize, my journey of building Kubernetes on AWS using Terraform and Packer was very educational. Although it was not easy, it was a unique opportunity to learn Kubernetes architecture and how it works in depth.

I hope you enjoy this article.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/kubernetes-the-hard-way-aws

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

AWS Serverless image recognition Telegram bot using Terraform

andygolubev — Tue, 27 Jun 2023 11:45:19 +0000

Introduction

The world of technology is constantly evolving, and with it comes the need for efficient and scalable solutions. Serverless architecture has gained significant popularity due to its ability to handle workloads without the need for infrastructure management. In this article, we will explore the process of building an AWS Serverless image recognition Telegram bot using Terraform.

With a pay-as-you-go pricing model, you only incur costs when functions are executed, ensuring cost effectiveness. Additionally, the availability of the AWS Free Tier means that for small workloads, you pay nothing.

This bot utilizes webhooks from Telegram, enabling it to operate in a reactive manner, responding promptly to specific events. By leveraging webhooks, the bot remains idle until triggered.
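
Registering a webhook comes down to one call to the Telegram Bot API's setWebhook method, passing the public URL Telegram should post updates to. A minimal Python sketch of building that request URL is below; the function name and the example callback URL are placeholders, and the actual webhook.py in the repository may do this differently.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def set_webhook_url(bot_token: str, callback_url: str) -> str:
    """Return the setWebhook request URL for a bot token and callback URL."""
    query = urlencode({"url": callback_url})  # percent-encodes the callback URL
    return f"{TELEGRAM_API}/bot{bot_token}/setWebhook?{query}"
```

Calling set_webhook_url("<token>", "https://example.com/callback") gives a URL that can be requested once, for instance with urllib.request.urlopen, after the API Gateway endpoint exists.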

Solution diagram

The diagram below illustrates the complete solution, employing a range of AWS services to ensure its seamless functionality.
At the heart of the architecture lies the Lambda function, playing a pivotal role in executing the desired operations. To optimize the efficiency of Lambda deployments and accelerate initialization, Lambda layers are employed. These layers contain all the necessary dependencies, streamlining the deployment process and facilitating faster development iterations.

To ensure the security of sensitive information, such as the BOT Token, I utilize Secrets Manager. This secure vault enables all lambdas to access the token directly, eliminating the need to store it in environment variables.
User-sent images are stored in an S3 bucket, so AWS Rekognition can read an image directly from the bucket when given its path.

API Gateway acts as a proxy for lambda function calls, providing a seamless communication channel. Beyond its immediate role, API Gateway offers potential future benefits, such as traffic routing and the ability to create development environments for APIs. This versatility positions the product for future scalability and easy integration with evolving requirements.

For simple statistics storage, DynamoDB serves as an effective solution. By leveraging DynamoDB, the solution efficiently stores and retrieves statistical data, ensuring reliable data management without unnecessary complexity.

Lastly, AWS Rekognition is utilized to detect labels on the pictures. While the implementation utilizes the smallest capability of the service due to development time constraints, it serves as a demonstration of its functionality. AWS Rekognition offers powerful image analysis capabilities, which can be further explored and enhanced in future iterations.
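
As an illustration of how small the label-detection call is, here is a Python sketch of the DetectLabels request for an image that already sits in S3, plus a helper that pulls label names out of the response. The default limits and helper names are assumptions; the bot's real code may use different values.

```python
def detect_labels_request(bucket: str, key: str,
                          max_labels: int = 10,
                          min_confidence: float = 80.0) -> dict:
    """Build the DetectLabels parameters for an image stored in S3."""
    return {
        "Image": {"S3Object": {"Bucket": bucket, "Name": key}},
        "MaxLabels": max_labels,
        "MinConfidence": min_confidence,
    }


def label_names(response: dict) -> list:
    """Extract the label names from a DetectLabels response."""
    return [label["Name"] for label in response.get("Labels", [])]


# With boto3 available (it is in the Lambda Python runtime), the call would be:
#   import boto3
#   client = boto3.client("rekognition")
#   response = client.detect_labels(**detect_labels_request("my-bucket", "photo.jpg"))
#   print(label_names(response))
```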

Bot functionality

The bot's functionality revolves around three simple entities:

Text processing
Image recognition
Statistics request

Initially, to streamline the implementation, I consolidated these functionalities within a single Lambda function. However, as the logic grows more complex, I am inclined to adopt a modular approach by separating these functionalities into individual Lambda functions.
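
Whether the logic lives in one function or several, the first step is deciding which of the three entities an incoming update belongs to. A simplified Python sketch of that routing is below. It relies on the Telegram update format, where a message carries either a text field or a photo list; the /stats command name is a hypothetical choice for the statistics request.

```python
def classify_update(update: dict) -> str:
    """Classify a Telegram update as 'image', 'stats', 'text' or 'ignored'."""
    message = update.get("message", {})
    if "photo" in message:
        return "image"          # photo messages go to image recognition
    text = message.get("text", "")
    if text.startswith("/stats"):
        return "stats"          # assumed command for the statistics request
    if text:
        return "text"           # any other text goes to text processing
    return "ignored"            # stickers, edits, and other update types
```

With separate Lambda functions, the returned value could select which function to invoke instead of which branch to run.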

Bot Setup

Setting up a new bot is a straightforward process that requires just three simple steps, all of which can be accomplished with the help of the BotFather. Let's dive into the process:

Step 1: Request a New Bot
Step 2: Choose a Bot Name
Step 3: Assign an Account Name

Optional Step: Set Bot Avatar Image

You can see it in the screenshots:

Terraform project

To ensure a smooth and efficient setup process, I employ Terraform, an industry-leading Infrastructure as Code (IaC) tool. With Terraform, I can easily provision the entire infrastructure stack required for the project.

Here is the entire repository:

user@ubuntu:~/aws-telegram-bot-serverless$ tree
.
├── LICENSE
├── lambda
│   ├── bot-dependencies-layer
│   │   └── requirements.txt
│   ├── bot-function
│   │   └── bot.py
│   └── webhook-function
│       └── webhook.py
└── terraform
    ├── 00-provider.tf
    ├── 01-roles.tf
    ├── 02-secrets.tf
    ├── 03-lambda-layer-with-dependencies.tf
    ├── 04-lambda-bot.tf
    ├── 05-lambda-webhook.tf
    ├── 06-api-gateway.tf
    ├── 07-bucket-images.tf
    ├── 08-dynamodb-stats.tf
    ├── 98-output.tf
    ├── 99-data.tf
    ├── create_bucket.sh
    ├── terraform.tfvars
    └── variables.tf

5 directories, 18 files

Let's take a closer look at the comprehensive list of resources provisioned:

user@ubuntu:~/aws-telegram-bot-serverless/terraform$ terraform state list
data.archive_file.lambda-bot-zip-file
data.archive_file.layer-zip-file
data.archive_file.webhook-function-zip-file
data.aws_caller_identity.current-account
data.aws_lambda_invocation.webhook-lambda-invocation
aws_apigatewayv2_api.call-back-api
aws_apigatewayv2_integration.api-gw-to-lambda
aws_apigatewayv2_route.post-callback-route
aws_apigatewayv2_stage.prod
aws_cloudwatch_log_group.call-back-api-gw
aws_cloudwatch_log_group.lambda-log-bot
aws_cloudwatch_log_group.lambda-log-webhook
aws_dynamodb_table.aws-telegram-bot-statistics
aws_iam_policy.custom-policy
aws_iam_role.lambdaRole
aws_iam_role_policy_attachment.custom-policy-attachment
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AWSLambdaExecute"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AWSXrayWriteOnlyAccess"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/AmazonRekognitionReadOnlyAccess"]
aws_iam_role_policy_attachment.policy-attachment["arn:aws:iam::aws:policy/service-role/AmazonS3ObjectLambdaExecutionRolePolicy"]
aws_lambda_function.bot-lambda
aws_lambda_function.webhook-lambda
aws_lambda_layer_version.lambda-layer-for-packages
aws_lambda_permission.api_gw
aws_s3_bucket.images-bucket
aws_s3_bucket_lifecycle_configuration.images-bucket-name-lifecycle_configuration
aws_secretsmanager_secret.bot-token-secret
aws_secretsmanager_secret_version.sversion
null_resource.pip-install

While provisioning my infrastructure with Terraform, I ran into a problem with environment dependencies. My Terraform code relies on a local provisioner that runs pip install and stores the result in the /tmp folder. To get a consistent setup across environments, I implemented the following solution with GitHub Actions.

GitHub Actions

To ensure convenience and flexibility, I have designed the bot to be deployable using two different methods: GitHub Action and local setup. This allows you to choose the approach that best suits your preferences and requirements.

To facilitate this deployment flexibility, I have made certain modifications to the Terraform backend section. Since I do not use Terragrunt in this project, I have incorporated a sed command in my pipeline. This allows me to dynamically rewrite the Terraform backend section, specifying the appropriate AWS region. Although it may not be the most elegant solution, it effectively ensures the correct configuration for the backend.

...
jobs:
  terraform:
    name: 'Deploy the bot on AWS'
    runs-on: ubuntu-latest

...

    - name: Replace the Region in the Provider section of Terraform  
      run: sed -i 's/us-east-1/${{ env.AWS_REGION }}/g' $TERRAFORM_PATH/00-provider.tf

    - name: Terraform Init
      run: terraform -chdir=$TERRAFORM_PATH init
...
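
As a rough Python equivalent of the sed step above, the sketch below replaces every occurrence of a default region in a Terraform source string. The function name replace_region and the sample provider text are illustrative assumptions, not the contents of the repository's 00-provider.tf.

```python
def replace_region(tf_source: str, new_region: str, default_region: str = "us-east-1") -> str:
    """Return tf_source with every occurrence of default_region replaced."""
    return tf_source.replace(default_region, new_region)


# Hypothetical file content, used only to show the effect of the replacement.
sample = """
terraform {
  backend "s3" {
    region = "us-east-1"
  }
}

provider "aws" {
  region = "us-east-1"
}
"""

updated = replace_region(sample, "eu-central-1")
print(updated)
```

Like the sed command, this is a plain text substitution: it would also rewrite the default region inside comments or unrelated strings, which is why it works best on a small, dedicated provider file.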

I store a few secrets and one variable securely on GitHub. You should do the same if you want to use my code.

And I provide three workflows that simplify the deployment and management process of the bot:

1 - Create Terraform State Bucket and DynamoDB Table:

This workflow enables the creation of a Terraform state bucket and a DynamoDB table. It's important to note that these objects will not be managed by Terraform itself and will require manual removal when necessary.

2 - Provisioning of Essential Bot Services:

The second workflow automates the provisioning of all the services required by the bot.

3 - Destruction of Managed Infrastructure:

The final workflow focuses on the controlled destruction of the infrastructure managed by Terraform. This process ensures the efficient cleanup of resources when they are no longer needed.

Be careful with the first workflow. S3 bucket names share a global namespace, so the name you pick may already be taken. If it is, change the bucket name prefix to make the name unique.
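
One way to reduce the chance of a naming conflict is to build the bucket name from a prefix plus something specific to your account. The sketch below is only an illustration: the function name, the "terraform-state" middle part, and the sample account ID are assumptions and do not come from the repository's create_bucket.sh.

```python
import re

# S3 bucket names are 3-63 characters long, use lowercase letters, digits,
# dots and hyphens, and must begin and end with a letter or digit.
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def state_bucket_name(prefix: str, account_id: str) -> str:
    """Build a deterministic state bucket name and validate its shape."""
    name = f"{prefix}-terraform-state-{account_id}".lower()
    if not BUCKET_NAME_RE.match(name):
        raise ValueError(f"invalid S3 bucket name: {name!r}")
    return name


# Hypothetical prefix and account ID.
print(state_bucket_name("my-team", "123456789012"))
# my-team-terraform-state-123456789012
```

A deterministic name matters here because the Terraform backend configuration has to point at the same bucket on every run.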

Demo Time

To provide a clearer understanding of the bot's operation, I present a few screens showcasing its functionality in action.

Image Recognition:

Statistics Request:

If you have a Telegram account, I invite you to try out the bot and explore its capabilities firsthand. I will keep the bot running until the end of July 2023. Service after that is not guaranteed, because I don't want to go beyond the AWS Free Tier.

Conclusion

The AWS serverless stack offers you a powerful and scalable solution out of the box. This project demonstrates the potential of serverless architectures and highlights the ease of development using Terraform.

I hope you enjoy this article.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/aws-telegram-bot-serverless

You can try this bot yourself: http://t.me/AWS_Image_Rekognition_Bot

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

Automate Publishing Markdown Files from GitHub to Confluence with github-to-confluence-publisher tool

andygolubev — Tue, 23 May 2023 09:39:12 +0000

Introduction

Managing documentation across different platforms can be a time-consuming task, especially when it involves converting and uploading files manually. However, with the GitHub to Confluence Publisher tool, you can automate the process of publishing Markdown files from GitHub to Confluence effortlessly. This script simplifies the conversion of Markdown files into Confluence markup and streamlines the uploading process to your Confluence space. In this article, we will explore the setup, configuration, and functionality of the github-to-confluence-publisher tool.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/github-to-confluence-publisher

Setting Up github-to-confluence-publisher

To get started, follow these steps to set up the github-to-confluence-publisher tool.

Create a new domain in Confluence Cloud:

Before using the tool, you need to create a new domain in Confluence Cloud. You can do this by visiting the Atlassian website and accessing the domain creation page.

Create a new space, parent page, and API token:

Within your Confluence Cloud domain, create a new space where you want to publish your documentation. Take note of the space's name and the parent page's ID, as you will need them for configuration later. Additionally, generate an API token for authentication purposes.

Configure the publisher:

Open the config.yaml file in the publisher/config directory and update the following values with your specific information:

confluence_url: The URL of your Confluence REST API.
confluence_space: The name of the space you created earlier.
confluence_parent_page_id: The ID of the parent page within your space.
confluence_search_pattern: A search pattern used to identify autogenerated pages. It is recommended to use a random value to ensure proper deletion of autogenerated pages.
github_folder_with_md_files: The path to the folder containing your Markdown files on GitHub.
github_folder_with_image_files: The path to the folder containing your image files on GitHub.

My example config:

confluence_url: https://test-publisher.atlassian.net/wiki/rest/api/

confluence_space: Documentat 
counfluence_parent_page_id: 262359
confluence_search_pattern: (this page is autogenerated)
github_folder_with_md_files: ./data
github_folder_with_image_files: ./data_images

Once you have completed the setup process, you can start using the github-to-confluence-publisher tool.

How it works:

Initially, your Confluence space will be empty, as shown in the provided screenshot.

Running the publisher: Run the publisher script either locally or using GitHub Actions. The script will search for pages in your Confluence space that match the confluence_search_pattern and delete them.

Local run: If you choose to run the script locally, you will see a similar output to the "Local run" screenshot provided.

GitHub Actions: If you prefer using GitHub Actions, the process will be executed automatically, as shown in the "GitHub Actions" screenshot.

Populating the space: After running the publisher, the parent page will contain child pages with your content, as organized in your GitHub repository's folder structure. The child pages imitate folders and display the "Children Display" widget for easy navigation.

Attachment management: The tool automatically attaches images from your GitHub repository to the respective Confluence pages, ensuring all the necessary visuals are included in the documentation.
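
The cleanup step described above, where pages matching confluence_search_pattern are deleted before new ones are published, can be sketched as a simple filter. This is a simplified illustration under stated assumptions: the page dictionaries, the pages_to_delete helper, and matching on the page body are all hypothetical, and the real tool talks to the Confluence REST API rather than to an in-memory list.

```python
def pages_to_delete(pages, search_pattern):
    """Return the IDs of pages whose body contains the autogenerated marker."""
    return [page["id"] for page in pages if search_pattern in page["body"]]


# Hypothetical pages in a space; only two carry the marker.
pages = [
    {"id": "101", "title": "Architecture", "body": "Overview (this page is autogenerated)"},
    {"id": "102", "title": "Team notes", "body": "Written by hand"},
    {"id": "103", "title": "Runbook", "body": "Steps (this page is autogenerated)"},
]

stale_ids = pages_to_delete(pages, "(this page is autogenerated)")
print(stale_ids)  # ['101', '103']
```

The sketch also shows why a distinctive, random-looking pattern is recommended: a common phrase would match hand-written pages and mark them for deletion as well.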

Conclusion

The GitHub to Confluence Publisher tool simplifies the process of publishing Markdown files from GitHub to Confluence. By automating the conversion and uploading tasks, it saves time and effort, allowing you to focus on creating high-quality documentation.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/github-to-confluence-publisher

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/

Terraform and DigitalOcean: Automating Infrastructure and Catching the Hidden Load Balancer

andygolubev — Wed, 17 May 2023 19:15:40 +0000

Introduction:

In this article, I will demonstrate the process of provisioning various components of infrastructure, including Projects, Virtual Private Clouds (VPCs), Kubernetes clusters, Load Balancers, and DNS Records. Additionally, I will outline the steps to configure Kubernetes with ingress and cert-manager, all within a single pipeline.
For this purpose, I have chosen DigitalOcean as the cloud provider due to its cost-effectiveness in comparison to leading providers like AWS, GCP, and Azure. To facilitate the infrastructure provisioning, I will be utilizing Terraform, a powerful infrastructure as code tool. Unfortunately, I won't be able to incorporate Terragrunt into this setup, as I encountered difficulties in configuring it with DigitalOcean Bucket (Space).

Project structure:

The core of my project involves defining a live infrastructure along with several modules. The infrastructure definition comprises two distinct stages, which can be visualized through the provided diagram.
You can find all of my code in my GitHub repository: https://github.com/andygolubev/terraform-digital-ocean

Initially, I attempted to handle both stages within a single script. However, during the implementation, I encountered a limitation with Terraform. Specifically, I discovered that Terraform's integration with the DigitalOcean provider did not allow for the creation and configuration of Kubernetes using the HashiCorp Kubernetes provider within the same script.

To address this challenge and ensure a smooth deployment process, I decided to split the infrastructure definition into two stages:

Stage 1: Infrastructure Provisioning (VPC, Kubernetes, PostgreSQL, Install Nginx and Cert manager)
Stage 2: Kubernetes Configuration, DNS records setup

By separating the infrastructure provisioning and Kubernetes configuration into distinct stages, we can overcome the limitations imposed by the integration challenges mentioned earlier. This approach allows for granular control and flexibility when deploying and managing infrastructure and Kubernetes resources.

Furthermore, this division of stages enables better modularization and reusability, as each stage can be version-controlled, tested, and deployed independently. This not only simplifies the maintenance and troubleshooting process but also promotes scalability and agility when making changes or expanding the infrastructure in the future.

Overall, by navigating around the constraints and adopting a two-stage approach, we can effectively define and deploy our live infrastructure while integrating Kubernetes seamlessly into the process, ensuring a reliable and scalable environment for our applications.

This is my folder structure:

➜  terraform-digital-ocean git:(main) tree .
.
├── Infrastructure
│   └── digitalocean
│       ├── infrastructure-live
│       │   └── test-v1
│       │       ├── stage1
│       │       │   ├── main-stage1.tf
│       │       │   └── outputs.tf
│       │       └── stage2
│       │           ├── main-stage2.tf
│       │           └── outputs.tf
│       └── infrastructure-modules
│           ├── kubernetes-config
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-save-kubeconfig.tf
│           │       ├── 2-cluster-issuer.tf
│           │       ├── 3-ingress-demo.tf
│           │       ├── 4-services-good-afternoon.tf
│           │       ├── 4-services-good-evening.tf
│           │       ├── 4-services-good-morning.tf
│           │       ├── 5-service-pagenotfound.tf
│           │       ├── 6-load-balancer.tf
│           │       ├── 7-records.tf
│           │       ├── 8-variables.tf
│           │       └── 9-outputs.tf
│           ├── kubernetes-provision
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-kubernetes.tf
│           │       ├── 2-save-kubeconfig.tf
│           │       ├── 3-ingress-and-cert-manager.tf
│           │       ├── 4-registry-access.tf
│           │       ├── 5-variables.tf
│           │       └── 6-outputs.tf
│           ├── postgresql
│           │   └── v1.0
│           │       ├── 0-versions.tf
│           │       ├── 1-postgres.tf
│           │       ├── 2-variables.tf
│           │       └── 3-outputs.tf
│           └── vpc
│               └── v1.0
│                   ├── 0-versions.tf
│                   ├── 1-vpc.tf
│                   ├── 2-variables.tf
│                   └── 3-outputs.tf
├── LICENSE
└── README.md

Terraform backend setup:

Prior to executing my pipeline, I have created a private bucket in DigitalOcean for storing terraform states.

Stage 1. Provision the infrastructure

During this stage, I utilize my "main-stage1.tf" file to declare the necessary values for infrastructure provisioning. Additionally, I ensure proper management of dependencies between modules to guarantee a smooth and coherent deployment process.

...
resource "digitalocean_project" "this" {
  name        = "infra-demo-v1" 
...
}

module "vpc" {
  source = "../../../infrastructure-modules/vpc/v1.0"
  vpc_name             = "vpc-test"
...
}

module "kubernetes-provision" {
    source = "../../../infrastructure-modules/kubernetes-provision/v1.0"
...
    k8s_cluster_name = "demo-cluster-test-v1" #Edit
    vpc_id = module.vpc.vpc_id

...
    k8s_embedded_pool_size = "s-4vcpu-8gb"
    k8s_embedded_pool_nodes_count = 1 

    # Type "true" if you want this pool of nodes
    pool_1_enabled = true
    k8s_pool_1_size = "s-4vcpu-8gb"
    k8s_pool_1_nodes_count = 1

...
    depends_on = [ module.vpc, digitalocean_project.this,]
}

module "postgresql" {
    source = "../../../infrastructure-modules/postgresql/v1.0"

...
    postgre_enabled = true
    posgre_cluster_name = "postgresql-demo-test-v1"
...

    depends_on = [ module.vpc, digitalocean_project.this,]
}

The complete content of the file: main-stage1.tf

The output of Stage 1 is:

Outputs:

k8s_cluster_id = "ba4854df-c6de-4deb-8385-77014b491454"
k8s_cluster_name = "demo-cluster-test-v1"
k8s_cluster_urn = "do:kubernetes:ba4854df-c6de-4deb-8385-77014b491454"

In my pipeline, I utilize "k8s_cluster_name" output as an input for Stage 2. You can find details in the pipeline listing provided at the end of this article.

Stage 2. Configure the Kubernetes cluster and DNS

In Stage 2 of my pipeline, I use a combination of Kubernetes manifests, local command execution, and the creation of DigitalOcean resources to achieve the desired configuration and setup. You can see it in my "kubernetes-config" terraform module: kubernetes-config

The "main-stage2.tf" file includes all the essential configurations for Stage 2.
The complete content of the file: main-stage2.tf

...
module "kubernetes-config" {
  source = "../../../infrastructure-modules/kubernetes-config/v1.0"

  digital_ocean_api_token_for_k8s_config = var.digital_ocean_api_token
  k8s_config_cluster_name = var.k8s_cluster_name


  domain = "kuber.work"
  service1-subdomain = "service-1-test-morning"
  service2-subdomain = "service-2-test-afternoon"
  service3-subdomain = "service-3-test-evening"
  lb-workaround-subdomain = "lb-workaround-test"
  service1-service = "goodmorning" 
  service2-service = "goodafternoon" 
  service3-service = "goodevening" 
  cluster-issuer = "letsencrypt-prod" # letsencrypt-prod or letsencrypt-staging
  ssl-redirect = "false" # To accommodate the requirement for the service to respond on HTTP, a temporary value is assigned for certificate issuing.

}

To illustrate and demonstrate the functionality of the system, I have incorporated three distinct services, each offering unique endpoints:

I do the job in this order:

Set up cluster issuers for cert-manager, which are required for TLS certificate management.
Provision the ingress and apply the workarounds prescribed in the DigitalOcean documentation (How to Set Up an Nginx Ingress with Cert-Manager on DigitalOcean Kubernetes).
Set up the hello services.
Catch the load balancer IP and provision DNS records.

resource "local_file" "get_load_balancer_script" {
  content  = <<-EOF
  #!/bin/bash
  doctl kubernetes cluster list-associated-resources $1 -o json | jq '{ load_balancer_id: .load_balancers[0].id, load_balancer_name: .load_balancers[0].name }'
  EOF

  filename = "/tmp/get_load_balancer_id.sh"

  depends_on = [ kubernetes_ingress_v1.demo-ingress, kubernetes_service_v1.ingress-nginx-controller, ]
}

data "external" "load_balancer_details" {
  program = ["${local_file.get_load_balancer_script.filename}", "${var.k8s_config_cluster_name}"]
  depends_on = [ local_file.get_load_balancer_script, ]
}

In this section, I create and execute a local script that uses the "doctl" tool and jq to retrieve the ID and name of the load balancer associated with the cluster.
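
To make the data flow easier to follow, here is a Python sketch of what the jq filter in the script does: it takes the first entry of load_balancers and returns a flat object. Terraform's external data source expects the program to print a JSON object whose values are all strings, so the sketch fails loudly when no load balancer exists yet instead of returning null. The function name and the sample IDs are hypothetical.

```python
import json


def extract_load_balancer(resources_json: str) -> dict:
    """Pick the first load balancer from the associated-resources JSON."""
    resources = json.loads(resources_json)
    balancers = resources.get("load_balancers") or []
    if not balancers:
        raise ValueError("no load balancer is associated with the cluster yet")
    first = balancers[0]
    # The external data source only accepts string values in the result object.
    return {
        "load_balancer_id": str(first["id"]),
        "load_balancer_name": str(first["name"]),
    }


# Hypothetical input shaped after the fields the jq filter reads.
sample = json.dumps({"load_balancers": [{"id": "lb-123", "name": "demo-lb"}]})
result = extract_load_balancer(sample)
print(json.dumps(result))
```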

Finally, I look up the load balancer by its ID and use its IP address to provision the DNS records:

data "digitalocean_loadbalancer" "this" {
  id = data.external.load_balancer_details.result.load_balancer_id
  depends_on = [ local_file.get_load_balancer_script, ]
}

data "digitalocean_domain" "this" {
  name = var.domain
}

resource "digitalocean_record" "service1" {
  domain = data.digitalocean_domain.this.id
  type   = "A"
  name   = var.service1-subdomain
  value  = data.digitalocean_loadbalancer.this.ip

  depends_on = [ local_file.get_load_balancer_script, ]
}

...

At this point, I have successfully provisioned the required resources and ...

... have DNS records.

Now it's time to test our service:

➜  ~ curl https://service-1-test-morning.kuber.work -v
*   Trying 146.190.178.50:443...
* Connected to service-1-test-morning.kuber.work 
...
* Server certificate:
*  subject: CN=service-1-test-morning.kuber.work
*  start date: May 17 10:20:20 2023 GMT
*  expire date: Aug 15 10:20:19 2023 GMT
*  subjectAltName: host "service-1-test-morning.kuber.work" matched cert's "service-1-test-morning.kuber.work"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
...
Good Morning!

Automate the process using GitHub actions

For the automation I use two separate workflows:

Provision the infrastructure
Destroy the infrastructure (Manual run)

Workflow for the infrastructure provision:

name: 1-Provision-infrastructure

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

env:
  PATH_STAGE_1: "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"
  PATH_STAGE_2: "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/"

jobs:
  provision:
    name: Provision the infrastructure in Digital Ocean and configure Kubernetes
    runs-on: ubuntu-latest
    steps:
      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DO_API_TOKEN }}

      - name: Checkout
        uses: actions/checkout@v3

      - name: Provision the infrastructure
        env:
          DO_API_TOKEN: ${{secrets.DO_API_TOKEN }}
          DO_BUCKET_ACCESS_KEY: ${{ secrets.DO_BUCKET_ACCESS_KEY }}
          DO_BUCKET_SECRET_KEY: ${{ secrets.DO_BUCKET_SECRET_KEY }}
        run: |
          terraform -chdir=$PATH_STAGE_1 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"
          terraform -chdir=$PATH_STAGE_1 plan -var="digital_ocean_api_token=$DO_API_TOKEN"
          terraform -chdir=$PATH_STAGE_1 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve
          terraform -chdir=$PATH_STAGE_2 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"
          terraform -chdir=$PATH_STAGE_2 plan -var="digital_ocean_api_token=$DO_API_TOKEN" -var="k8s_cluster_name=$(cd $PATH_STAGE_1 && terraform output -raw k8s_cluster_name)"
          terraform -chdir=$PATH_STAGE_2 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 && terraform output -raw k8s_cluster_name)"

Here I use the output of Stage 1 as the input for Stage 2:
terraform -chdir=$PATH_STAGE_2 apply -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 && terraform output -raw k8s_cluster_name)"
The nested command returns just the cluster name:
terraform output -raw k8s_cluster_name
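
The same hand-off can be sketched in Python, which may be useful if the pipeline logic ever moves out of a shell one-liner. The helper names terraform_output and apply_command, the short stage paths, and the fake runner are assumptions made for illustration. The fake runner stands in for a real Terraform call so the example can run without any infrastructure.

```python
import subprocess
from types import SimpleNamespace


def terraform_output(stage_path, name, run=subprocess.run):
    """Read one raw output value from the state of the given stage."""
    result = run(
        ["terraform", f"-chdir={stage_path}", "output", "-raw", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def apply_command(stage_path, variables):
    """Build the apply command for a stage, passing each variable with -var."""
    command = ["terraform", f"-chdir={stage_path}", "apply", "-auto-approve"]
    command += [f"-var={key}={value}" for key, value in variables.items()]
    return command


def fake_run(command, **kwargs):
    # Stand-in for subprocess.run: pretends Stage 1 printed the cluster name.
    return SimpleNamespace(stdout="demo-cluster-test-v1")


cluster_name = terraform_output("stage1", "k8s_cluster_name", run=fake_run)
print(apply_command("stage2", {"k8s_cluster_name": cluster_name}))
```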

Workflow for the infrastructure destruction:

name: 2-Destroy-infrastructure

on:
  workflow_dispatch:

env:
  PATH_STAGE_1: "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage1/"
  PATH_STAGE_2: "./Infrastructure/digitalocean/infrastructure-live/test-v1/stage2/"

jobs:
  destroy:
    name: Destroy the infrastructure in Digital Ocean
    runs-on: ubuntu-latest
    steps:
      - name: Install doctl
        uses: digitalocean/action-doctl@v2
        with:
          token: ${{ secrets.DO_API_TOKEN }}

      - name: Checkout
        uses: actions/checkout@v3

      - name: Destroy the infrastructure
        env:
          DO_API_TOKEN: ${{ secrets.DO_API_TOKEN }}
          DO_BUCKET_ACCESS_KEY: ${{ secrets.DO_BUCKET_ACCESS_KEY }}
          DO_BUCKET_SECRET_KEY: ${{ secrets.DO_BUCKET_SECRET_KEY }}
        run: |
          terraform -chdir=$PATH_STAGE_1 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY"
          doctl auth init --access-token $DO_API_TOKEN
          doctl kubernetes cluster kubeconfig save $(terraform -chdir=$PATH_STAGE_1  output -raw  k8s_cluster_name) || true
          terraform -chdir=$PATH_STAGE_2 init -var="digital_ocean_api_token=$DO_API_TOKEN" -backend-config="access_key=$DO_BUCKET_ACCESS_KEY" -backend-config="secret_key=$DO_BUCKET_SECRET_KEY" || true
          terraform -chdir=$PATH_STAGE_2 apply -destroy -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve -var="k8s_cluster_name=$(cd $PATH_STAGE_1 && terraform output -raw k8s_cluster_name)" || true
          terraform -chdir=$PATH_STAGE_1 apply -destroy -var="digital_ocean_api_token=$DO_API_TOKEN" --auto-approve

In this stage, I begin by destroying the resources in Stage 2, followed by the destruction of all remaining resources.

To allow the pipeline to be run repeatedly, Stage 1 must be destroyed even if the Stage 2 steps fail.
To ignore failures of the commands that run before the Stage 1 destruction, I append the "|| true" expression:

terraform -chdir=$PATH_STAGE_2 apply -destroy --auto-approve || true
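
For readers less familiar with shell idioms: "|| true" lets the script carry on after a failing command. A comparable pattern in Python, shown here only as an illustration, is to run the command without check=True and inspect the exit code yourself. The helper name and the deliberately failing command are hypothetical.

```python
import subprocess
import sys


def run_ignoring_failure(command):
    """Run command and return its exit code without raising on failure."""
    completed = subprocess.run(command, check=False)
    return completed.returncode


# A stand-in command that always fails with exit code 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
code = run_ignoring_failure(failing)
print(f"step exited with {code}, continuing")
```

Unlike "|| true", this keeps the exit code available, so the failure can still be logged even though it does not stop the run.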

During my exploration, I encountered an issue with DigitalOcean regarding VPC and Project deletion. Despite not observing any associated resources in the console, DigitalOcean indicates that the VPC still possesses resources. Consequently, when attempting to remove the VPC using Terraform, it raises an error due to the inconsistency.

When I rerun the pipeline, this specific step executes without encountering any errors.
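
Since a second run succeeds, one possible mitigation is to retry the failing step automatically. The sketch below is an assumption on my part and is not part of the published workflow. The retry helper and the flaky_destroy stand-in are hypothetical names.

```python
import time


def retry(action, attempts=2, delay_seconds=0.0):
    """Call action until it succeeds or the attempts are used up."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


calls = {"count": 0}


def flaky_destroy():
    # Stand-in for the destroy step: fails once, then succeeds.
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("VPC still has resources")
    return "destroyed"


result = retry(flaky_destroy, attempts=2)
print(result, "after", calls["count"], "attempts")
```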

If you have experienced this issue, please contact me or leave a comment.

Below, you will find a list of pipeline workflows:

Conclusion:

In this article, I have demonstrated the process of provisioning various components of infrastructure using Terraform on DigitalOcean. By adopting a two-stage approach, I worked around the limitation that prevented me from creating and configuring the Kubernetes cluster in a single Terraform script.

You can find all of my code in my GitHub repository: https://github.com/andygolubev/terraform-digital-ocean

Feel free to connect with me on LinkedIn: https://www.linkedin.com/in/andy-golubev/