A complete setup of secure and scalable serverless GitLab runners on AWS Fargate via Terraform IaC
As GitLab becomes widely used in my organization, we need a secure and scalable way to manage our CI/CD runner fleet. A serverless setup on AWS Fargate is attractive because there are no servers to manage and capacity scales with demand.
In this post, I will share a setup for a fleet of serverless GitLab runners on AWS Fargate managed via Terraform for ease of reproducibility in any environment.
Background
GitLab has a guide on Autoscaling GitLab CI on AWS Fargate, but its runner manager is hosted on an EC2 instance, so the setup is not completely serverless. Others have shared full ECS on Fargate setups for both managers and workers, such as Serverless GitLab CI/CD on AWS Fargate by Daniel Coutinho de Miranda and A serverless approach for GitLab integration on AWS by Damiano Giorgi.
However, these examples rely heavily on manual configuration and support only one manager profile per EC2 instance or Fargate task.
The main motivations for the setup described in this post are:
Use Terraform Infra-as-Code for a more secure, reproducible and configurable setup.
Support multiple runner manager profiles in one Fargate instance.
Architecture Overview
Key elements:
An ECS Service on AWS Fargate hosts the manager container. This service can register multiple runner managers.
Each runner manager registers itself with GitLab whenever a new ECS Task is created under this ECS Service.
When a job is triggered in GitLab, GitLab assigns it to an appropriate runner manager. That manager then starts a new worker ECS task in a specific subnet and security group to run the job.
Each worker task has a predefined role and container image.
Deployment Process
Step 1: Build and publish container image for managers
Code: https://github.com/GovTechSG/fargate-gitlab-runner
- Set the `IMAGE_NAME`, `IMAGE_TAG`, `AWS_ACCOUNT_ID`, and `AWS_REGION` environment variables as desired. `IMAGE_NAME` should start with the ECR domain `${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com` so the image can be published to ECR later.
- Use `docker-compose build` to build the image. Alternatively, run `docker build -t ${IMAGE_NAME}:${IMAGE_TAG} .`
- Log in to ECR: `aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com`
- Publish to ECR using `docker-compose push` or `docker push ${IMAGE_NAME}:${IMAGE_TAG}`.
Note that this single image spawns multiple managers using profiles from the `MANAGERS_CONFIGS` environment variable.
If you use `podman` instead of `docker`, install podman-compose as well.
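The Step 1 commands can be wrapped in a small script. This is a sketch and is not part of the repository. The function names `ecr_registry`, `image_name_targets_ecr`, and `build_and_push` are hypothetical. The script assumes `docker` and a recent AWS CLI are installed, and it checks the ECR-prefix rule for `IMAGE_NAME` before anything is pushed.

```shell
#!/bin/sh
# Hypothetical helpers for Step 1. Expects IMAGE_NAME, IMAGE_TAG,
# AWS_ACCOUNT_ID and AWS_REGION to be set as described above.

# Print the ECR registry domain for an account ID ($1) and region ($2).
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Succeed only if IMAGE_NAME starts with "<registry>/".
image_name_targets_ecr() {
  case "$IMAGE_NAME" in
    "$(ecr_registry "$AWS_ACCOUNT_ID" "$AWS_REGION")"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Build the image, log in to ECR and push it (needs docker and the AWS CLI).
build_and_push() {
  image_name_targets_ecr || { echo "IMAGE_NAME must start with the ECR registry domain" >&2; return 1; }
  docker build -t "${IMAGE_NAME}:${IMAGE_TAG}" . &&
    aws ecr get-login-password --region "$AWS_REGION" |
      docker login --username AWS --password-stdin "$(ecr_registry "$AWS_ACCOUNT_ID" "$AWS_REGION")" &&
    docker push "${IMAGE_NAME}:${IMAGE_TAG}"
}
```

To use it, source the script from the folder containing the manager `Dockerfile`, then call `build_and_push`.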
Step 2: Build and publish container image for worker tasks
Sample images are available at https://github.com/GovTechSG/fargate-gitlab-runner-worker.
- Update the `Dockerfile` to install new tools as necessary.
- Build and publish the image to ECR using the same steps as for the managers image above.
Step 3: Set up additional resources required by the ECS Service
The following resources must be set up, either manually or via a separate set of Terraform code (recommended):
Secret for the GitLab token: obtain a runner token from GitLab, then store it as a new secret in either AWS Secrets Manager or Systems Manager Parameter Store. Take note of the KMS key used.
VPC, subnets, and security groups for both managers and workers. Note that SSH traffic on port 22 must be allowed between the managers' and workers' network ACLs and security groups.
ECS Task roles for the workers, with policies that let the worker tasks access the AWS services they need.
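Two of these resources can be sketched with the AWS CLI. The token is stored as an SSM `SecureString` parameter, and an ingress rule lets the managers' security group reach the workers on port 22. The function names, parameter name, KMS key, and security group IDs are placeholders. Setting `DRY_RUN=1` prints the commands instead of running them.

This sketch covers only the worker-side security group rule. Network ACL entries, and any restricted manager egress rules, still need the same treatment.

```shell
#!/bin/sh
# Hypothetical Step 3 helpers. DRY_RUN=1 prints the commands instead of running them.

# Store the GitLab runner token ($3) as an SSM SecureString named $1,
# encrypted with the KMS key $2.
store_runner_token() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Never echo the token itself.
    printf 'aws ssm put-parameter --name %s --type SecureString --key-id %s --value ****\n' "$1" "$2"
  else
    aws ssm put-parameter --name "$1" --type SecureString --key-id "$2" --value "$3"
  fi
}

# Allow SSH (TCP 22) into the worker security group $1 from the manager security group $2.
allow_ssh_from_managers() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol tcp --port 22 --source-group %s\n' "$1" "$2"
  else
    aws ec2 authorize-security-group-ingress --group-id "$1" --protocol tcp --port 22 --source-group "$2"
  fi
}
```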
Step 4: Deploy ECS Service using Terragrunt
Code: https://github.com/GovTechSG/fargate-gitlab-runner-terraform
- Copy `environments/sample-dev` to a new folder `environments/<env>`, replacing `<env>` with your desired environment name.
- Update your environment settings in `environments/<env>/env_inputs.hcl`.
- If there is no existing ECS Cluster for the managers or the workers, create one following the samples at `environments/<env>/ecs-cluster-for-managers` or `environments/<env>/ecs-cluster-for-workers`.
- Update the variable values in `environments/<env>/ecs-fargate-gitlab-runner-service/terragrunt.hcl` to match your environment.
- Run `cd environments/<env>/ecs-fargate-gitlab-runner-service && terragrunt apply` to deploy the ECS Service.
- After a few minutes, review the ECS Service logs and check GitLab to verify that the new runner(s) have been registered. The number of runners should equal `manager_instance_count * length(keys(managers_configs))`.
- Test the new GitLab runner(s) with your own job or a simple script like this:
```yaml
test:
  tags:
    # These should match all the tags set in the manager configs, or a subset
    # (a subset may let other non-Fargate runners pick up the job, depending on your setup).
    - dev
    - tool1
  script:
    - echo "It works!"
    - for i in $(seq 1 30); do echo "."; sleep 1; done
```
If all is successful, the job log should show `It works!` followed by 30 lines containing a single dot, printed one per second, and the job should pass.
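The Step 4 workflow can be scripted roughly as follows. This is a sketch that assumes it runs from the root of a checkout of fargate-gitlab-runner-terraform. The function names are hypothetical. `expected_runner_count` simply evaluates the `manager_instance_count * length(keys(managers_configs))` rule, given the number of keys in `managers_configs`.

```shell
#!/bin/sh
# Hypothetical helpers for Step 4, run from the repository root.

# Copy the sample environment to environments/<env>, refusing to overwrite.
new_env() {
  if [ -e "environments/$1" ]; then
    echo "environments/$1 already exists" >&2
    return 1
  fi
  cp -R environments/sample-dev "environments/$1"
}

# Deploy the runner service for environment $1. The subshell keeps the
# caller's working directory unchanged.
deploy_env() {
  ( cd "environments/$1/ecs-fargate-gitlab-runner-service" && terragrunt apply )
}

# Runners expected in GitLab: manager_instance_count ($1) times the
# number of manager profiles in managers_configs ($2).
expected_runner_count() {
  echo $(( $1 * $2 ))
}
```

For example, two manager instances with three profiles each should show `expected_runner_count 2 3`, that is six runners, in GitLab.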
If you prefer to use Terraform instead of Terragrunt for a quick test:
- Create a `terraform.tfvars` file in the folder `terraform_modules/ecs-fargate-gitlab-runner-service` with all the variable values found in `environments/<env>/ecs-fargate-gitlab-runner-service/terragrunt.hcl`.
- Copy the content of `versions.tf` and `provider.tf` defined in `environments/terragrunt.hcl` into the respective files in `terraform_modules/ecs-fargate-gitlab-runner-service`.
- Finally, run `terraform init` followed by `terraform apply` in that folder.
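The Terraform-only quick test can be guarded by a check for the three files it needs. This is a hypothetical sketch run from the repository root; `missing_quick_test_files` and `quick_test` are illustrative names. `terraform init` is included because a fresh module folder must be initialised before `terraform apply` will run.

```shell
#!/bin/sh
# Hypothetical helpers for the Terraform-only quick test, run from the repository root.
MODULE_DIR=terraform_modules/ecs-fargate-gitlab-runner-service

# Print each required file that is missing from directory $1.
missing_quick_test_files() {
  for f in terraform.tfvars versions.tf provider.tf; do
    [ -f "$1/$f" ] || printf '%s\n' "$f"
  done
}

# Initialise and apply the module directly with Terraform.
quick_test() {
  missing=$(missing_quick_test_files "$MODULE_DIR")
  if [ -n "$missing" ]; then
    printf 'missing in %s:\n%s\n' "$MODULE_DIR" "$missing" >&2
    return 1
  fi
  ( cd "$MODULE_DIR" && terraform init && terraform apply )
}
```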
Resources Created by Terraform
These are the main resources created by the Terraform module:
an ECS Service for the manager container
its container definition with required environment variables
its IAM role that allows running and stopping the new ECS Tasks using the worker ECS Task definition(s)
worker ECS Task definition(s), one for each manager profile in managers_configs
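After deployment, the main resources can be inspected with standard AWS CLI calls. In this sketch, the cluster name, service name, and worker task-definition family prefix are placeholders, and the function names are hypothetical. `DRY_RUN=1` prints the commands instead of calling AWS.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# $1 = managers' cluster, $2 = manager ECS Service,
# $3 = family prefix of the worker task definitions.
show_runner_resources() {
  run aws ecs describe-services --cluster "$1" --services "$2" &&
    run aws ecs list-task-definitions --family-prefix "$3"
}
```

The second call should list one worker task definition per manager profile in `managers_configs`.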
What happened?
During Registration:
Once the ECS Service is deployed by Terraform, it creates a task to host the manager container image.
When this container is up, it reads its environment variables, most notably `MANAGERS_CONFIGS`, and registers each manager as a runner with GitLab. It then generates the full GitLab runner `config.toml` file, as well as each runner's `fargate_worker.toml` file containing the worker's security and network settings and its task definition.
You should see the result in GitLab's list of runners: one runner for each key in `MANAGERS_CONFIGS` (assuming `manager_instance_count` is 1).
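One way to cross-check registration from inside a manager container is to count the `[[runners]]` sections in the generated `config.toml`. The count should match the number of keys in `MANAGERS_CONFIGS`. This is a sketch: the function name is hypothetical, and the location of `config.toml` inside the manager image is an assumption, so pass the real path.

```shell
#!/bin/sh
# Count the [[runners]] sections in a gitlab-runner config.toml ($1).
# grep -c prints 0 but exits non-zero when nothing matches; "|| true" keeps the status 0.
count_registered_runners() {
  grep -c '^[[:space:]]*\[\[runners]]' "$1" || true
}
```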
Running a job:
Once a job whose tags match a subset of those runners' tags is triggered, GitLab sends the job to the appropriate runner manager.
The runner manager then starts a new worker ECS task using the task definition passed in `MANAGERS_CONFIGS`, adding its `SSH_PUBLIC_KEY` as an environment variable in the process.
The worker ECS task starts the `sshd` process with the manager's `SSH_PUBLIC_KEY` added to its `authorized_keys`.
Finally, the manager runs the job in the worker ECS task via `ssh`, and the job output and status are reported back to GitLab as usual.
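The worker side of this handshake can be sketched as a small entrypoint. It is a minimal sketch, not the sample worker image's actual script. It assumes the manager's key arrives in `SSH_PUBLIC_KEY`, that jobs run as the user whose `HOME` is set for this process, and that OpenSSH is installed at `/usr/sbin/sshd`. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical worker entrypoint helpers.

# Add a public key ($1) to ~/.ssh/authorized_keys once, with the
# permissions sshd expects.
install_authorized_key() {
  if [ -z "$1" ]; then
    echo "no public key given" >&2
    return 1
  fi
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
  touch "$HOME/.ssh/authorized_keys" && chmod 600 "$HOME/.ssh/authorized_keys" || return 1
  # Append only if the exact key line is not already present.
  grep -qxF -- "$1" "$HOME/.ssh/authorized_keys" ||
    printf '%s\n' "$1" >> "$HOME/.ssh/authorized_keys"
}

# Install the manager's key, generate any missing host keys,
# then replace this shell with sshd running in the foreground.
start_worker() {
  install_authorized_key "$SSH_PUBLIC_KEY" || exit 1
  ssh-keygen -A
  exec /usr/sbin/sshd -D -e
}
```

An entrypoint script would end with a call to `start_worker`. The `-D` flag keeps `sshd` in the foreground, so the task keeps running while the manager connects.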
Troubleshooting
The following are some issues that can occur:
Unable to start Fargate task due to the "No Container Instances were found in your cluster" error
- Check your ECS Cluster for workers and make sure its `Default capacity provider strategy` is set to `FARGATE`.
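The default strategy can also be set with the AWS CLI. In this sketch, the function name and cluster name are hypothetical. `put-cluster-capacity-providers` replaces the cluster's whole capacity provider list, so every provider the cluster should keep must be listed; keeping `FARGATE_SPOT` here is an assumption. `DRY_RUN=1` prints the command instead of running it.

```shell
#!/bin/sh
# Make FARGATE the default capacity provider strategy of the workers' cluster ($1).
set_fargate_default() {
  set -- aws ecs put-cluster-capacity-providers --cluster "$1" \
    --capacity-providers FARGATE FARGATE_SPOT \
    --default-capacity-provider-strategy capacityProvider=FARGATE,weight=1
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```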
Manager unable to connect to ECS to start a task
- If the managers are hosted in private subnets, create VPC endpoints for ECS and ECR and make sure the managers can access them.
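The endpoint service names follow a regular pattern, sketched below with a hypothetical helper. It covers the ECS API plus the two ECR endpoints and the S3 gateway endpoint. ECR serves image layers from S3, so the S3 gateway endpoint is needed as well. Workers in private subnets need the ECR and S3 endpoints too, in order to pull their images. Depending on your setup, you may also need endpoints for CloudWatch Logs and for Secrets Manager or SSM.

```shell
#!/bin/sh
# Print the VPC endpoint service names needed to call ECS and pull from
# ECR in region $1 (add logs, secretsmanager or ssm if your setup uses them).
manager_endpoint_services() {
  for svc in ecs ecr.api ecr.dkr s3; do
    printf 'com.amazonaws.%s.%s\n' "$1" "$svc"
  done
}
```

Each non-S3 name becomes an interface endpoint, created with `aws ec2 create-vpc-endpoint --vpc-endpoint-type Interface --vpc-id ... --service-name ... --subnet-ids ... --security-group-ids ...`. The S3 name becomes a gateway endpoint attached to the subnets' route tables.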
Manager unable to connect to worker ECS task via ssh
- Make sure your worker container image has OpenSSH installed and that `SSH_PUBLIC_KEY` is added to the right user's `~/.ssh/authorized_keys`.
- Check that the subnets and security groups of both managers and workers allow traffic on port 22. Use the VPC Reachability Analyzer to confirm.
- If the error is `signature algorithm ssh-rsa not in PubkeyAcceptedAlgorithms`, enable `ssh-rsa` by adding this to the worker container image: `RUN echo "PubkeyAcceptedKeyTypes +ssh-rsa" >> /etc/ssh/sshd_config`
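If the image might already contain the line (for example, from a base image), an idempotent variant avoids adding it twice. This is a hypothetical helper that defaults to the usual `/etc/ssh/sshd_config` path. One caveat: a line appended after a trailing `Match` block applies only inside that block, so check the config's layout first. `PubkeyAcceptedKeyTypes` is kept as in the original; newer OpenSSH releases also accept it as the older name of `PubkeyAcceptedAlgorithms`.

```shell
#!/bin/sh
# Append the ssh-rsa directive to an sshd_config ($1, default
# /etc/ssh/sshd_config) unless that exact line is already present.
enable_ssh_rsa() {
  sshd_cfg=${1:-/etc/ssh/sshd_config}
  grep -qxF 'PubkeyAcceptedKeyTypes +ssh-rsa' "$sshd_cfg" 2>/dev/null ||
    printf '%s\n' 'PubkeyAcceptedKeyTypes +ssh-rsa' >> "$sshd_cfg"
}
```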
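Worker ECS task has no credentials to access AWS
- ECS exposes the task role credentials to the container through the `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` environment variable, but `sshd` does not pass its own environment on to the SSH sessions it starts. Share the variable with the SSH session by adding this option when running `sshd`: `-o "SetEnv=AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=\"$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI\""`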
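In a worker entrypoint, the option can be added only when the variable is actually set, so the same image still starts outside ECS. This is a hypothetical sketch that assumes `/usr/sbin/sshd`. `SetEnv` in sshd's configuration requires a fairly recent OpenSSH (8.7 or later, as far as I know).

```shell
#!/bin/sh
# Print the sshd -o value that forwards the ECS credentials variable,
# or nothing if the variable is unset.
credentials_setenv_option() {
  if [ -n "${AWS_CONTAINER_CREDENTIALS_RELATIVE_URI:-}" ]; then
    printf 'SetEnv=AWS_CONTAINER_CREDENTIALS_RELATIVE_URI="%s"\n' "$AWS_CONTAINER_CREDENTIALS_RELATIVE_URI"
  fi
}

# Run sshd in the foreground, forwarding the credentials variable when present.
start_sshd_with_credentials() {
  opt=$(credentials_setenv_option)
  if [ -n "$opt" ]; then
    exec /usr/sbin/sshd -D -e -o "$opt"
  else
    exec /usr/sbin/sshd -D -e
  fi
}
```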
Known Limitation:
- The Fargate driver doesn’t support ECS Exec yet. For more info: https://gitlab.com/gitlab-org/ci-cd/custom-executor-drivers/fargate/-/issues/49
Conclusion
We have set up a complete set of serverless runners for GitLab using AWS Fargate. With Terraform, the setup can be configured easily for different environments.
In addition, we have full freedom to create runners for a wide variety of needs: an array of worker images with different tools installed, multiple worker ECS clusters using different spot strategies to optimize costs, custom worker roles to access cross-account resources, and more.
The possibilities are boundless. It’s up to you to configure and experiment.
Thank you for reading. Do comment below to share your thoughts.
The main project for this article is hosted on GitHub.
This post was originally published at Medium