Anderson Leite

Self-Hosting Codecov with GitLab Using Terraform: A Practical Deployment Guide

Motivation

Since 2023, Codecov has provided an official self-hosted repository with Docker Compose examples and some basic guidance. It's a reasonable starting point, but it falls short in a few important ways:

  • Documentation gaps: The official docs cover the happy path (GitHub SaaS + Docker Compose on a single VM) but leave a lot of blanks for anything beyond that.
  • No GitLab self-hosted coverage: Our entire engineering workflow runs on a self-hosted GitLab instance. Getting Codecov to integrate with it (OAuth app creation, the right environment variables, the correct redirect URLs, and so on) required piecing together information from GitHub issues, the Codecov community forum, the Codecov Python code itself (to figure out how the env vars should be named), and a lot of trial and error. None of it is documented in one place.
  • Everything as code: At our company, we don't click through cloud consoles or run one-off scripts to provision infrastructure. Every resource (DNS records, OAuth applications, IAM roles, database instances, and so on) is managed through Terraform and reviewed like any other code change. The official self-hosted guide doesn't reflect this approach at all.

This article documents what we actually built: a production-grade, fully Terraform-managed Codecov deployment on AWS ECS Fargate, integrated with a self-hosted GitLab instance, with no manual steps after terraform apply.


Table of Contents

  1. Architecture Overview
  2. Prerequisites
  3. Repository Structure
  4. providers.tf
  5. variables.tf
  6. locals.tf
  7. data.tf
  8. Networking: nat-gateway.tf
  9. Networking: vpc-endpoints.tf
  10. Networking: service-discovery.tf
  11. security-groups.tf
  12. ecs-cluster.tf
  13. iam.tf
  14. Data Layer: rds.tf
  15. Data Layer: elasticache.tf
  16. Data Layer: efs-codecov-timescale.tf
  17. Data Layer: s3-codecov.tf
  18. secrets.tf
  19. GitLab OAuth: gitlab-oauth-codecov-app.tf
  20. ECS Services
  21. autoscaling.tf
  22. alb.tf
  23. acm.tf
  24. dns.tf
  25. outputs.tf
  26. CI/CD Pipeline
  27. Deploying
  28. Day-2 Operations

Architecture Overview

The deployment uses a private-first network model. All Codecov services run in private subnets with no public IP assignment. External traffic enters through a public Application Load Balancer (ALB) with HTTPS termination, and internal services communicate via AWS Cloud Map service discovery.

Internet
   │
   ▼
Cloudflare (DNS)
   │
   ▼
Application Load Balancer (public subnets, HTTPS 443)
   │
   ▼  (private subnets)
┌──────────────────────────────────────────────────────────────┐
│  ECS Fargate Cluster                                         │
│                                                              │
│  ┌─────────┐  ┌──────────┐  ┌─────┐  ┌────────┐  ┌────┐      │
│  │ Gateway │→ │ Frontend │  │ API │  │ Worker │  │ IA │      │
│  └─────────┘  └──────────┘  └──┬──┘  └────────┘  └────┘      │
│                                │                             │
│              Internal DNS: mycompany-tooling.local           │
│              (AWS Cloud Map)                                 │
└──────────────────────────────────────────────────────────────┘
   │          │            │          │
   ▼          ▼            ▼          ▼
  RDS      Redis       TimescaleDB   S3
(PostgreSQL) (Cache)   (ECS+EFS)  (Coverage data)

Key design decisions:

  • No public IPs on tasks; all egress goes through multi-AZ NAT gateways
  • VPC endpoints for ECR, S3, Secrets Manager, CloudWatch Logs, and KMS keep traffic off the public internet and reduce NAT costs
  • AWS Cloud Map for internal service discovery instead of hardcoded IPs or environment variable injection
  • Secrets Manager for sensitive runtime values; SSM Parameter Store for config values
  • Deployment circuit breaker with rollback on every ECS service
  • GitLab OAuth application provisioned by Terraform itself: no manual setup in the GitLab UI
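As a concrete example of the circuit breaker decision, it takes only a few lines on each `aws_ecs_service` resource. A minimal sketch (the resource name here is illustrative, not one of the actual services defined later):

```hcl
# Illustrative fragment: the circuit breaker pattern applied to every
# ECS service in this deployment.
resource "aws_ecs_service" "example" {
  name            = "example"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.example.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  # Stop a failing deployment early and roll back to the last healthy task set.
  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }
}
```

With `rollback = true`, a deployment whose tasks repeatedly fail health checks is aborted and the previous task definition is restored automatically, instead of ECS retrying forever.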

Prerequisites

  • Terraform >= 1.10
  • AWS CLI configured with sufficient permissions
  • Cloudflare API token with DNS edit rights on your zone
  • GitLab token with admin-level scope (for managing OAuth applications via the GitLab Terraform provider)
  • An existing VPC with subnets tagged with *-private-* and *-public-* name patterns
  • An S3 bucket for Terraform remote state

Repository Structure

terraform/
├── providers.tf                       # Backend + provider versions
├── variables.tf                       # All input variables
├── locals.tf                          # Computed values, tags, image versions
├── data.tf                            # Data sources (VPC, subnets, AZs)
├── outputs.tf                         # Useful post-apply outputs
│
├── nat-gateway.tf                     # Multi-AZ NAT gateways + route tables
├── vpc-endpoints.tf                   # Interface and Gateway VPC endpoints
├── service-discovery.tf               # AWS Cloud Map namespace + services
├── security-groups.tf                 # SGs for ECS, RDS, Redis, EFS
│
├── ecs-cluster.tf                     # Fargate cluster + CloudWatch log group
├── iam.tf                             # Task execution + task roles
│
├── rds.tf                             # PostgreSQL 17 (Multi-AZ)
├── elasticache.tf                     # Redis 7.x
├── efs-codecov-timescale.tf           # EFS for TimescaleDB persistence
├── s3-codecov.tf                      # Coverage data bucket
│
├── secrets.tf                         # Secrets Manager + SSM Parameters
├── gitlab-oauth-codecov-app.tf        # GitLab OAuth application
│
├── ecs-service-codecov-timescale.tf   # TimescaleDB on Fargate
├── ecs-service-codecov-gateway.tf     # Reverse proxy
├── ecs-service-codecov-frontend.tf    # Web UI
├── ecs-service-codecov-api.tf         # Backend API
├── ecs-service-codecov-worker.tf      # Background worker
├── ecs-service-codecov-ai.tf          # AI service
│
├── autoscaling.tf                     # CPU-based auto-scaling for API + Worker
├── alb.tf                             # ALB (HTTP→HTTPS redirect)
├── acm.tf                             # ACM certificate + Cloudflare DNS validation
└── dns.tf                             # Cloudflare CNAME → ALB

providers.tf

terraform {
  required_version = ">= 1.10.0"

  backend "s3" {
    bucket       = "mycompany-tf-state"
    key          = "tooling/terraform.tfstate"
    region       = "eu-central-1"
    use_lockfile = true
    encrypt      = true
  }

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 6.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = ">= 5.0"
    }
    random = {
      source  = "hashicorp/random"
      version = ">= 3.0"
    }
    gitlab = {
      source  = "gitlabhq/gitlab"
      version = ">= 18.0"
    }
  }
}

provider "aws" {
  region = var.region
}

provider "cloudflare" {
  alias     = "main"
  api_token = var.cloudflare_api_token
}

provider "gitlab" {
  base_url = "${var.gitlab_url}/api/v4"
  token    = var.gitlab_token
}

variables.tf

################################################################################
# General
################################################################################

variable "region" {
  description = "AWS region"
  type        = string
  default     = "eu-central-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "tooling"
}

variable "vpc_name" {
  description = "Name of the existing VPC to look up via data source"
  type        = string
  default     = "mycompany-vpc-tooling"
}

################################################################################
# Cloudflare
################################################################################

variable "cloudflare_api_token" {
  description = "Cloudflare API token"
  type        = string
  sensitive   = true
}

variable "cloudflare_zone_id" {
  description = "Cloudflare zone ID for your domain"
  type        = string
}

################################################################################
# Codecov Configuration
################################################################################

variable "codecov_domain" {
  description = "Domain name for Codecov"
  type        = string
  default     = "codecov.example.com"
}

variable "codecov_enterprise_license" {
  description = "Codecov enterprise license key (default 50-user community license is included in the image)"
  type        = string
  default     = ""
  sensitive   = true
}

variable "codecov_admins" {
  description = "List of GitLab usernames to designate as Codecov install admins"
  type        = list(string)
  default     = ["alice", "bob", "carol"]
}

variable "codecov_upload_token" {
  description = "Global upload token for Codecov CI integration (auto-generated if empty)"
  type        = string
  default     = ""
  sensitive   = true
}

################################################################################
# GitLab
################################################################################

variable "gitlab_url" {
  description = "GitLab instance URL"
  type        = string
  default     = "https://git.example.com"
}

variable "gitlab_token" {
  description = "GitLab API token with admin scope for managing OAuth applications"
  type        = string
  sensitive   = true
}

################################################################################
# RDS
################################################################################

variable "rds_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.small"
}

variable "rds_allocated_storage" {
  description = "Initial allocated storage in GB"
  type        = number
  default     = 20
}

variable "rds_max_allocated_storage" {
  description = "Maximum allocated storage in GB for autoscaling"
  type        = number
  default     = 100
}

################################################################################
# TimescaleDB
################################################################################

variable "timescale_image" {
  description = "TimescaleDB docker image (pin this!)"
  type        = string
  default     = "timescale/timescaledb:2.25.1-pg17"
}

variable "timescale_db_username" {
  description = "TimescaleDB username"
  type        = string
  default     = "codecov"
}

variable "timescale_db_name" {
  description = "TimescaleDB database name"
  type        = string
  default     = "codecov_timeseries"
}

################################################################################
# ElastiCache
################################################################################

variable "redis_node_type" {
  description = "ElastiCache Redis node type"
  type        = string
  default     = "cache.t3.micro"
}

locals.tf

locals {
  name_prefix = "mycompany-tooling"

  tags = {
    Project     = "mycompany-tooling"
    Environment = var.environment
    ManagedBy   = "terraform"
    Service     = "codecov"
  }

  # Codecov container images pin API and Worker to a specific calver release.
  # Gateway, Frontend, and IA don't yet have 26.2.2 tags, so latest-calver is used.
  codecov_version          = "26.2.2"
  codecov_gateway_version  = "latest-calver"
  codecov_frontend_version = "latest-calver"
  codecov_ia_version       = "latest-calver"

  codecov_images = {
    gateway  = "codecov/self-hosted-gateway:${local.codecov_gateway_version}"
    frontend = "codecov/self-hosted-frontend:${local.codecov_frontend_version}"
    ia       = "codecov/self-hosted-api-umbrella:${local.codecov_ia_version}"
    api      = "codecov/self-hosted-api:${local.codecov_version}"
    worker   = "codecov/self-hosted-worker:${local.codecov_version}"
  }

  # Default 50-user community license bundled in the self-hosted image.
  # Override by setting var.codecov_enterprise_license.
  codecov_default_license = "F5O0Fu5ASFTPtWXM51BK8YQlq7IM2s+8TBGULrf9Um7wHjfPwI+Z3E4PfF/dPs6Uc5A+MLti+2etHq5dnFEfZgoiIVCLZ8x+0BVmUSWwPS42vJXnf1veY9Bglang4mDIhmfWfp5l6AT6cxmAVFpGrwobiK6OcN9pjWx4iWabazmsOiF9LM++v0WtuHNvhgzRcKmnJPgqahEB7qqF6KQ1hg=="
  codecov_license         = var.codecov_enterprise_license != "" ? var.codecov_enterprise_license : local.codecov_default_license
}

data.tf

################################################################################
# General Data Sources
################################################################################

data "aws_region" "current" {}
data "aws_caller_identity" "current" {}
data "aws_availability_zones" "available" {}

################################################################################
# VPC Data Sources (from existing VPC)
################################################################################

data "aws_vpc" "main" {
  filter {
    name   = "tag:Name"
    values = [var.vpc_name]
  }
}

# Subnets are tagged with Name pattern: {vpc_name}-{az}-private-{env}
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  filter {
    name   = "tag:Name"
    values = ["*-private-*"]
  }
}

# Subnets are tagged with Name pattern: {vpc_name}-{az}-public-{env}
data "aws_subnets" "public" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }

  filter {
    name   = "tag:Name"
    values = ["*-public-*"]
  }
}

data "aws_internet_gateway" "this" {
  filter {
    name   = "attachment.vpc-id"
    values = [data.aws_vpc.main.id]
  }
}

# Fetch subnet details to map AZ -> subnet ID
data "aws_subnet" "public" {
  for_each = toset(data.aws_subnets.public.ids)
  id       = each.value
}

data "aws_subnet" "private" {
  for_each = toset(data.aws_subnets.private.ids)
  id       = each.value
}

Networking: nat-gateway.tf

One NAT gateway per availability zone ensures that a single AZ failure doesn't break outbound connectivity for the rest of the cluster. Each private subnet gets its own route table pointing at the NAT gateway in the same AZ. IPv6 egress is handled by an egress-only internet gateway, which is needed for pulling images from Docker Hub over IPv6.

################################################################################
# Multi-AZ NAT Gateway (one per AZ)
################################################################################

locals {
  # Map: AZ -> public subnet id (one per AZ)
  public_subnet_by_az = {
    for subnet_id, s in data.aws_subnet.public :
    s.availability_zone => subnet_id
  }

  # Map: private subnet id -> AZ
  private_az_by_subnet = {
    for subnet_id, s in data.aws_subnet.private :
    subnet_id => s.availability_zone
  }
}

# One EIP per AZ
resource "aws_eip" "nat" {
  for_each = local.public_subnet_by_az
  domain   = "vpc"

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-nat-eip-${each.key}"
  })
}

# One NAT GW per AZ, in the public subnet for that AZ
resource "aws_nat_gateway" "this" {
  for_each      = local.public_subnet_by_az
  allocation_id = aws_eip.nat[each.key].id
  subnet_id     = each.value

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-nat-${each.key}"
  })

  depends_on = [data.aws_internet_gateway.this]
}

# A private route table per AZ
resource "aws_route_table" "private" {
  for_each = local.public_subnet_by_az
  vpc_id   = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-private-rt-${each.key}"
  })
}

# Default IPv4 route per AZ -> NAT GW in that AZ
resource "aws_route" "private_default_v4" {
  for_each               = local.public_subnet_by_az
  route_table_id         = aws_route_table.private[each.key].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.this[each.key].id
}

# Route S3 traffic through the Gateway endpoint (keeps S3 traffic private)
resource "aws_vpc_endpoint_route_table_association" "s3_private" {
  for_each        = local.public_subnet_by_az
  vpc_endpoint_id = aws_vpc_endpoint.s3.id
  route_table_id  = aws_route_table.private[each.key].id
}

# Associate each private subnet to the private route table for its AZ
resource "aws_route_table_association" "private" {
  for_each = local.private_az_by_subnet

  subnet_id      = each.key
  route_table_id = aws_route_table.private[each.value].id
}

# Egress-only IGW for IPv6
resource "aws_egress_only_internet_gateway" "this" {
  vpc_id = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-eigw"
  })
}

resource "aws_route" "private_default_ipv6" {
  for_each = aws_route_table.private

  route_table_id              = each.value.id
  destination_ipv6_cidr_block = "::/0"
  egress_only_gateway_id      = aws_egress_only_internet_gateway.this.id
}

# Allow IPv6 egress from ECS tasks (needed for Docker Hub pulls over IPv6)
resource "aws_security_group_rule" "ecs_ipv6_egress_all" {
  type              = "egress"
  security_group_id = aws_security_group.ecs_services.id

  from_port        = 0
  to_port          = 0
  protocol         = "-1"
  ipv6_cidr_blocks = ["::/0"]
  description      = "Allow IPv6 egress (needed for Docker Hub pulls over IPv6)"
}

Networking: vpc-endpoints.tf

VPC endpoints cut NAT gateway data processing costs for high-volume traffic (ECR image pulls, CloudWatch log shipping, Secrets Manager calls) and keep that traffic inside the AWS network.

Note: This file does not include ssm or ssmmessages VPC endpoints. If you don't already have them in your VPC from another deployment, add them here; they're required for ECS Exec and SSM parameter access from tasks running in private subnets.
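If you need to add them, a sketch following the same pattern as the other interface endpoints in this file (the locals are assumed from the surrounding code):

```hcl
# SSM + SSM Messages endpoints, required for ECS Exec and SSM parameter
# access from tasks in private subnets.
resource "aws_vpc_endpoint" "ssm" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ssm"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-ssm" })
}

resource "aws_vpc_endpoint" "ssmmessages" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ssmmessages"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-ssmmessages" })
}
```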

################################################################################
# VPC Endpoints so Fargate in private subnets can reach AWS APIs
################################################################################

# SG for Interface Endpoints (allow HTTPS from ECS tasks)
resource "aws_security_group" "vpc_endpoints" {
  name        = "${local.name_prefix}-vpc-endpoints"
  description = "Allow ECS tasks to reach VPC interface endpoints"
  vpc_id      = data.aws_vpc.main.id

  ingress {
    description = "HTTPS from ECS services SG"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    security_groups = [
      aws_security_group.ecs_services.id,
      aws_security_group.timescale.id,
    ]
  }

  egress {
    description = "All egress"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = merge(local.tags, { Component = "vpc-endpoints" })
}

locals {
  vpce_subnets = data.aws_subnets.private.ids
  vpce_sg      = [aws_security_group.vpc_endpoints.id]
}

# CloudWatch Logs for ECS task log shipping
resource "aws_vpc_endpoint" "logs" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.logs"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-logs" })
}

# Secrets Manager for pulling secrets at task startup
resource "aws_vpc_endpoint" "secretsmanager" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.secretsmanager"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-secrets" })
}

# ECR API for image manifest lookups
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-ecr-api" })
}

# ECR DKR for actual image layer pulls
resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-ecr-dkr" })
}

# KMS required because Secrets Manager uses KMS for encryption
resource "aws_vpc_endpoint" "kms" {
  vpc_id              = data.aws_vpc.main.id
  service_name        = "com.amazonaws.${var.region}.kms"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = local.vpce_subnets
  security_group_ids  = local.vpce_sg
  private_dns_enabled = true
  tags                = merge(local.tags, { Name = "${local.name_prefix}-vpce-kms" })
}

# S3 Gateway for ECR image layers (stored in S3) + Codecov S3 bucket
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = data.aws_vpc.main.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [for rt in aws_route_table.private : rt.id]
  tags              = merge(local.tags, { Name = "${local.name_prefix}-vpce-s3" })
}

Networking: service-discovery.tf

AWS Cloud Map provides internal DNS for service-to-service communication. Each Codecov component registers under a shared private namespace, resolving to <service>.mycompany-tooling.local.

################################################################################
# AWS Cloud Map - Service Discovery Namespace
################################################################################

resource "aws_service_discovery_private_dns_namespace" "this" {
  name        = "${local.name_prefix}.local"
  description = "Service discovery namespace for ${local.name_prefix} ECS services"
  vpc         = data.aws_vpc.main.id

  tags = local.tags
}

resource "aws_service_discovery_service" "api" {
  name = "api"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.this.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  tags = local.tags
}

resource "aws_service_discovery_service" "frontend" {
  name = "frontend"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.this.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  tags = local.tags
}

resource "aws_service_discovery_service" "timescale" {
  name = "timescale"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.this.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  tags = local.tags
}

resource "aws_service_discovery_service" "ia" {
  name = "ia"

  dns_config {
    namespace_id   = aws_service_discovery_private_dns_namespace.this.id
    routing_policy = "MULTIVALUE"

    dns_records {
      ttl  = 10
      type = "A"
    }
  }

  tags = merge(local.tags, { Component = "ia" })
}
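Each ECS service attaches to its Cloud Map entry through a `service_registries` block on the service resource. An illustrative fragment (the full service definitions come later; the resource name here is a placeholder):

```hcl
# Illustrative fragment: registering the API service in Cloud Map so other
# tasks can reach it at api.mycompany-tooling.local.
resource "aws_ecs_service" "api_example" {
  # ... cluster, task_definition, network_configuration, etc.

  service_registries {
    registry_arn = aws_service_discovery_service.api.arn
  }
}
```

ECS then keeps the A records in sync with task lifecycle: when a task starts or stops, its private IP is added to or removed from the Cloud Map service, and the 10-second TTL keeps stale answers short-lived.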

security-groups.tf

################################################################################
# Security Group - ECS Services
################################################################################

resource "aws_security_group" "ecs_services" {
  name        = "${local.name_prefix}-ecs-services"
  description = "Security group for ECS Fargate services"
  vpc_id      = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-ecs-services"
  })
}

resource "aws_vpc_security_group_ingress_rule" "ecs_from_alb" {
  security_group_id            = aws_security_group.ecs_services.id
  description                  = "Allow traffic from ALB"
  from_port                    = 0
  to_port                      = 65535
  ip_protocol                  = "tcp"
  referenced_security_group_id = module.alb.security_group_id
}

# Allow ECS services to communicate with each other (Cloud Map service discovery)
resource "aws_vpc_security_group_ingress_rule" "ecs_from_ecs" {
  security_group_id            = aws_security_group.ecs_services.id
  description                  = "Allow inter-service communication"
  from_port                    = 0
  to_port                      = 65535
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.ecs_services.id
}

resource "aws_vpc_security_group_egress_rule" "ecs_to_internet" {
  security_group_id = aws_security_group.ecs_services.id
  description       = "Allow outbound for S3, ECR, Secrets Manager, and container image pulls"
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

################################################################################
# Security Group - RDS
################################################################################

resource "aws_security_group" "rds" {
  name        = "${local.name_prefix}-rds"
  description = "Security group for RDS PostgreSQL"
  vpc_id      = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-rds"
  })
}

resource "aws_vpc_security_group_ingress_rule" "rds_from_ecs" {
  security_group_id            = aws_security_group.rds.id
  description                  = "PostgreSQL from ECS services"
  from_port                    = 5432
  to_port                      = 5432
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.ecs_services.id
}

################################################################################
# Security Group - Redis
################################################################################

resource "aws_security_group" "redis" {
  name        = "${local.name_prefix}-redis"
  description = "Security group for ElastiCache Redis"
  vpc_id      = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-redis"
  })
}

resource "aws_vpc_security_group_ingress_rule" "redis_from_ecs" {
  security_group_id            = aws_security_group.redis.id
  description                  = "Redis from ECS services"
  from_port                    = 6379
  to_port                      = 6379
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.ecs_services.id
}

################################################################################
# Security Group - TimescaleDB (ECS)
################################################################################

resource "aws_security_group" "timescale" {
  name        = "${local.name_prefix}-timescale"
  description = "Security group for TimescaleDB ECS service"
  vpc_id      = data.aws_vpc.main.id

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-timescale"
  })
}

resource "aws_vpc_security_group_ingress_rule" "timescale_from_ecs" {
  security_group_id            = aws_security_group.timescale.id
  description                  = "PostgreSQL from ECS services"
  from_port                    = 5432
  to_port                      = 5432
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.ecs_services.id
}

resource "aws_vpc_security_group_egress_rule" "timescale_to_any" {
  security_group_id = aws_security_group.timescale.id
  description       = "Egress"
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

################################################################################
# Security Group - EFS (TimescaleDB persistence)
################################################################################

resource "aws_security_group" "efs_timescale" {
  name        = "${local.name_prefix}-efs-timescale"
  description = "EFS SG for TimescaleDB"
  vpc_id      = data.aws_vpc.main.id

  tags = merge(local.tags, { Name = "${local.name_prefix}-efs-timescale" })
}

resource "aws_vpc_security_group_ingress_rule" "efs_timescale_from_ecs" {
  security_group_id            = aws_security_group.efs_timescale.id
  description                  = "NFS from ECS tasks"
  from_port                    = 2049
  to_port                      = 2049
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.ecs_services.id
}

resource "aws_vpc_security_group_ingress_rule" "efs_timescale_from_timescale" {
  security_group_id            = aws_security_group.efs_timescale.id
  description                  = "NFS from Timescale task SG"
  from_port                    = 2049
  to_port                      = 2049
  ip_protocol                  = "tcp"
  referenced_security_group_id = aws_security_group.timescale.id
}

resource "aws_vpc_security_group_egress_rule" "efs_timescale_to_any" {
  security_group_id = aws_security_group.efs_timescale.id
  description       = "Egress"
  ip_protocol       = "-1"
  cidr_ipv4         = "0.0.0.0/0"
}

ecs-cluster.tf

################################################################################
# ECS Cluster
################################################################################

resource "aws_ecs_cluster" "this" {
  name = "${local.name_prefix}-ecs"

  configuration {
    execute_command_configuration {
      logging = "OVERRIDE"

      log_configuration {
        cloud_watch_log_group_name = aws_cloudwatch_log_group.ecs_cluster.name
      }
    }
  }

  setting {
    name  = "containerInsights"
    value = "enabled"
  }

  tags = local.tags
}

resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name       = aws_ecs_cluster.this.name
  capacity_providers = ["FARGATE"]

  default_capacity_provider_strategy {
    base              = 1
    weight            = 100
    capacity_provider = "FARGATE"
  }
}

resource "aws_cloudwatch_log_group" "ecs_cluster" {
  name              = "/aws/ecs/${local.name_prefix}-ecs"
  retention_in_days = 90

  tags = merge(local.tags, {
    Name = "/aws/ecs/${local.name_prefix}-ecs"
  })
}
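The `execute_command_configuration` block above enables ECS Exec with audited sessions: every command session is logged to the cluster's CloudWatch log group. With the ssm/ssmmessages endpoints in place, you can open a shell in a running task. An illustrative invocation (the task ID is a placeholder, and the container name assumes the API service):

```shell
# Open an interactive shell inside a running task; sessions are logged
# to the cluster's CloudWatch log group via the OVERRIDE configuration.
aws ecs execute-command \
  --cluster mycompany-tooling-ecs \
  --task <task-id> \
  --container api \
  --interactive \
  --command "/bin/sh"
```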

iam.tf

Two roles are required:

  • Task Execution Role: used by the ECS agent to pull images, read secrets from Secrets Manager and SSM, and write logs to CloudWatch.
  • Task Role: used by the running application containers; in this case, to read/write the S3 coverage bucket using IAM auth (no static credentials needed).

################################################################################
# ECS Task Execution Role
# Used by ECS agent to pull images, access secrets, write logs
################################################################################

resource "aws_iam_role" "ecs_task_execution" {
  name = "${local.name_prefix}-ecsTaskExecutionRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "sts:AssumeRole"
        Effect    = "Allow"
        Principal = { Service = "ecs-tasks.amazonaws.com" }
      }
    ]
  })

  tags = merge(local.tags, { Purpose = "ECS Task Execution" })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_managed" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

resource "aws_iam_role_policy" "ecs_task_execution_secrets" {
  name = "${local.name_prefix}-ecs-task-execution-secrets"
  role = aws_iam_role.ecs_task_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "SecretsManagerAccess"
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        Resource = [
          "arn:aws:secretsmanager:${var.region}:${data.aws_caller_identity.current.account_id}:secret:${local.name_prefix}/codecov/*"
        ]
      },
      {
        Sid    = "SSMParameterAccess"
        Effect = "Allow"
        Action = ["ssm:GetParameters"]
        Resource = [
          "arn:aws:ssm:${var.region}:${data.aws_caller_identity.current.account_id}:parameter/${local.name_prefix}/codecov/*"
        ]
      },
      {
        Sid    = "CloudWatchLogs"
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = [
          "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:/aws/ecs/${local.name_prefix}-*",
          "arn:aws:logs:${var.region}:${data.aws_caller_identity.current.account_id}:log-group:/aws/ecs/${local.name_prefix}-*:log-stream:*"
        ]
      }
    ]
  })
}

resource "aws_iam_role_policy" "ecs_task_execution_efs" {
  name = "${local.name_prefix}-ecs-task-execution-efs"
  role = aws_iam_role.ecs_task_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "EfsMount"
        Effect = "Allow"
        Action = [
          "elasticfilesystem:ClientMount",
          "elasticfilesystem:ClientWrite",
          "elasticfilesystem:ClientRootAccess"
        ]
        Resource = [
          aws_efs_file_system.timescale.arn,
          aws_efs_access_point.timescale.arn
        ]
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_task_execution_ssm" {
  role       = aws_iam_role.ecs_task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

################################################################################
# ECS Task Role (Application-level permissions)
# Used by Codecov containers to access S3
################################################################################

resource "aws_iam_role" "ecs_task_codecov" {
  name = "${local.name_prefix}-codecov-taskRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action    = "sts:AssumeRole"
        Effect    = "Allow"
        Principal = { Service = "ecs-tasks.amazonaws.com" }
      }
    ]
  })

  tags = merge(local.tags, { Purpose = "ECS Task - Codecov Application" })
}

resource "aws_iam_role_policy" "ecs_task_codecov_s3" {
  name = "${local.name_prefix}-codecov-s3-access"
  role = aws_iam_role.ecs_task_codecov.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "S3CodecovStorageAccess"
        Effect = "Allow"
        Action = [
          "s3:GetObject",
          "s3:PutObject",
          "s3:DeleteObject",
          "s3:ListBucket",
          "s3:AbortMultipartUpload",
          "s3:ListMultipartUploadParts",
          "s3:GetBucketLocation",
          "s3:HeadBucket",
          "s3:ListBucketVersions"
        ]
        Resource = [
          aws_s3_bucket.codecov_storage.arn,
          "${aws_s3_bucket.codecov_storage.arn}/*"
        ]
      }
    ]
  })
}

Data Layer: rds.tf

################################################################################
# RDS PostgreSQL 17 for Codecov
################################################################################

resource "aws_db_subnet_group" "codecov" {
  name       = "${local.name_prefix}-postgres"
  subnet_ids = data.aws_subnets.private.ids

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-postgres"
  })
}

resource "random_password" "rds_password" {
  length  = 32
  special = false # Avoid URL-encoding issues in connection strings
}

resource "aws_db_instance" "codecov" {
  identifier = "${local.name_prefix}-postgres"

  engine         = "postgres"
  engine_version = "17"
  instance_class = var.rds_instance_class

  allocated_storage     = var.rds_allocated_storage
  max_allocated_storage = var.rds_max_allocated_storage
  storage_type          = "gp3"
  storage_encrypted     = true

  db_name  = "codecov"
  username = "db_admin"
  password = random_password.rds_password.result

  multi_az            = true
  publicly_accessible = false

  db_subnet_group_name   = aws_db_subnet_group.codecov.name
  vpc_security_group_ids = [aws_security_group.rds.id]

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  skip_final_snapshot       = false
  final_snapshot_identifier = "${local.name_prefix}-postgres-final"
  deletion_protection       = true

  performance_insights_enabled = true

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-postgres"
  })
}

Data Layer: elasticache.tf

################################################################################
# ElastiCache Redis for Codecov
################################################################################

resource "aws_elasticache_subnet_group" "codecov" {
  name       = "${local.name_prefix}-codecov"
  subnet_ids = data.aws_subnets.private.ids

  tags = local.tags
}

resource "aws_elasticache_cluster" "codecov" {
  cluster_id = "${local.name_prefix}-codecov-redis"

  engine               = "redis"
  engine_version       = "7.1"
  node_type            = var.redis_node_type
  num_cache_nodes      = 1
  parameter_group_name = "default.redis7"
  port                 = 6379

  subnet_group_name  = aws_elasticache_subnet_group.codecov.name
  security_group_ids = [aws_security_group.redis.id]

  snapshot_retention_limit = 7

  tags = merge(local.tags, {
    Name = "${local.name_prefix}-codecov-redis"
  })
}

Data Layer: efs-codecov-timescale.tf

Codecov uses TimescaleDB for time-series metrics (coverage trends over time). Instead of running a separate managed database, we deploy it on ECS Fargate with EFS-backed persistent storage. The EFS access point enforces UID/GID 999, which matches the postgres user inside the TimescaleDB container.

resource "aws_efs_file_system" "timescale" {
  encrypted       = true
  throughput_mode = "bursting"

  tags = merge(local.tags, {
    Name      = "${local.name_prefix}-timescale-efs"
    Component = "timescale"
  })
}

resource "aws_efs_mount_target" "timescale" {
  for_each        = toset(data.aws_subnets.private.ids)
  file_system_id  = aws_efs_file_system.timescale.id
  subnet_id       = each.value
  security_groups = [aws_security_group.efs_timescale.id]
}

resource "aws_efs_access_point" "timescale" {
  file_system_id = aws_efs_file_system.timescale.id

  posix_user {
    uid = 999
    gid = 999
  }

  root_directory {
    path = "/pgdata"
    creation_info {
      owner_uid   = 999
      owner_gid   = 999
      permissions = "0750"
    }
  }

  tags = merge(local.tags, {
    Name      = "${local.name_prefix}-timescale-ap"
    Component = "timescale"
  })
}

Data Layer: s3-codecov.tf

################################################################################
# S3 Bucket for Codecov Coverage Data
################################################################################

resource "aws_s3_bucket" "codecov_storage" {
  bucket = "mycompany-codecov-storage"

  tags = merge(local.tags, {
    Name = "mycompany-codecov-storage"
  })
}

resource "aws_s3_bucket_versioning" "codecov_storage" {
  bucket = aws_s3_bucket.codecov_storage.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "codecov_storage" {
  bucket = aws_s3_bucket.codecov_storage.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "codecov_storage" {
  bucket = aws_s3_bucket.codecov_storage.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_policy" "codecov_storage_https_only" {
  bucket = aws_s3_bucket.codecov_storage.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "EnforceHTTPS"
        Effect    = "Deny"
        Principal = "*"
        Action    = "s3:*"
        Resource = [
          aws_s3_bucket.codecov_storage.arn,
          "${aws_s3_bucket.codecov_storage.arn}/*"
        ]
        Condition = {
          Bool = { "aws:SecureTransport" = "false" }
        }
      }
    ]
  })
}

secrets.tf

Secrets Manager stores all sensitive runtime values. SSM Parameter Store holds configuration that needs central management (license key, upload token, OAuth client ID); the sensitive parameters use the SecureString type.

Note on ignore_changes: The cookie_secret and upload_token use lifecycle { ignore_changes = [...] }. This prevents Terraform from rotating these values on every apply. To rotate manually, force recreation with terraform apply -replace=aws_secretsmanager_secret_version.codecov_cookie_secret (terraform taint also works on older Terraform versions, though it has been deprecated in favor of -replace).

################################################################################
# Random password generation
################################################################################

resource "random_password" "cookie_secret" {
  length  = 64
  special = false
}

resource "random_password" "upload_token" {
  length  = 40
  special = false
}

resource "random_password" "timescale_password" {
  length  = 32
  special = false
}

################################################################################
# Secrets Manager - Database URLs
################################################################################

resource "aws_secretsmanager_secret" "codecov_database_url" {
  name = "${local.name_prefix}/codecov/database-url"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_database_url" {
  secret_id = aws_secretsmanager_secret.codecov_database_url.id
  secret_string = format(
    "postgres://%s:%s@%s:%s/%s",
    aws_db_instance.codecov.username,
    random_password.rds_password.result,
    aws_db_instance.codecov.address,
    aws_db_instance.codecov.port,
    aws_db_instance.codecov.db_name
  )
}

resource "aws_secretsmanager_secret" "codecov_timeseries_database_url" {
  name = "${local.name_prefix}/codecov/timeseries-database-url"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_timeseries_database_url" {
  secret_id = aws_secretsmanager_secret.codecov_timeseries_database_url.id

  secret_string = format(
    "postgres://%s:%s@%s:%s/%s",
    var.timescale_db_username,
    random_password.timescale_password.result,
    "timescale.${local.name_prefix}.local",
    "5432",
    var.timescale_db_name
  )
}

################################################################################
# Secrets Manager - Redis URL
################################################################################

resource "aws_secretsmanager_secret" "codecov_redis_url" {
  name = "${local.name_prefix}/codecov/redis-url"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_redis_url" {
  secret_id     = aws_secretsmanager_secret.codecov_redis_url.id
  secret_string = "redis://${aws_elasticache_cluster.codecov.cache_nodes[0].address}:${aws_elasticache_cluster.codecov.cache_nodes[0].port}"
}

################################################################################
# Secrets Manager - GitLab OAuth
################################################################################

resource "aws_secretsmanager_secret" "codecov_gitlab_oauth" {
  name = "${local.name_prefix}/codecov/gitlab-oauth-secret"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_gitlab_oauth" {
  secret_id     = aws_secretsmanager_secret.codecov_gitlab_oauth.id
  secret_string = gitlab_application.codecov.secret
}

################################################################################
# Secrets Manager - GitLab Bot Token
# Used by the Worker to post PR comments and status checks back to GitLab
################################################################################

resource "aws_secretsmanager_secret" "codecov_gitlab_bot_token" {
  name = "${local.name_prefix}/codecov/gitlab-bot-token"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_gitlab_bot_token" {
  secret_id     = aws_secretsmanager_secret.codecov_gitlab_bot_token.id
  secret_string = var.gitlab_token
}

################################################################################
# Secrets Manager - Cookie Secret
################################################################################

resource "aws_secretsmanager_secret" "codecov_cookie_secret" {
  name = "${local.name_prefix}/codecov/cookie-secret"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "codecov_cookie_secret" {
  secret_id     = aws_secretsmanager_secret.codecov_cookie_secret.id
  secret_string = random_password.cookie_secret.result

  lifecycle {
    ignore_changes = [secret_string]
  }
}

################################################################################
# Secrets Manager - TimescaleDB Password
################################################################################

resource "aws_secretsmanager_secret" "timescale_password" {
  name = "${local.name_prefix}/codecov/timescale-password"
  tags = local.tags
}

resource "aws_secretsmanager_secret_version" "timescale_password" {
  secret_id     = aws_secretsmanager_secret.timescale_password.id
  secret_string = random_password.timescale_password.result
}

################################################################################
# SSM Parameters - Non-sensitive configuration
################################################################################

resource "aws_ssm_parameter" "codecov_license" {
  name  = "/${local.name_prefix}/codecov/enterprise-license"
  type  = "SecureString"
  value = local.codecov_license
  tags  = local.tags
}

resource "aws_ssm_parameter" "codecov_gitlab_client_id" {
  name  = "/${local.name_prefix}/codecov/gitlab-oauth-client-id"
  type  = "String"
  value = gitlab_application.codecov.application_id
  tags  = local.tags
}

resource "aws_ssm_parameter" "codecov_upload_token" {
  name  = "/${local.name_prefix}/codecov/upload-token"
  type  = "SecureString"
  value = var.codecov_upload_token != "" ? var.codecov_upload_token : random_password.upload_token.result

  lifecycle {
    ignore_changes = [value]
  }

  tags = local.tags
}

GitLab OAuth: gitlab-oauth-codecov-app.tf

Terraform provisions the GitLab OAuth application directly. The generated credentials are stored back into SSM/Secrets Manager for the ECS services to consume.

resource "gitlab_application" "codecov" {
  name         = "Codecov"
  redirect_url = "https://${var.codecov_domain}/login/gle"
  scopes       = ["api"]
  confidential = true
}

The credentials from gitlab_application.codecov are written to AWS secrets in secrets.tf (see codecov_gitlab_oauth and codecov_gitlab_client_id above).


ECS Services

Component Overview

| Service | CPU | Memory | Tasks | Role |
| --- | --- | --- | --- | --- |
| TimescaleDB | 1024 | 2 GB | 1 (fixed) | Time-series database with EFS persistence |
| Gateway | 256 | 512 MB | 2 (fixed) | Reverse proxy routing to API/Frontend/IA |
| Frontend | 256 | 512 MB | 2 (fixed) | Web UI |
| API | 512 | 1 GB | 2–6 (auto-scaling) | Backend REST/GraphQL API |
| Worker | 512 | 1 GB | 1–4 (auto-scaling) | Background job processor |
| IA | 512 | 1 GB | 1 (fixed) | AI-assisted code review service |

Each service creates its own CloudWatch log group to keep logs isolated and independently configurable for retention.
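The API and Worker task counts in the table are managed by Application Auto Scaling rather than through desired_count, which is why those services set lifecycle { ignore_changes = [desired_count] }. As a minimal sketch of that wiring for the API's 2–6 range (the resource names and CPU threshold here are illustrative assumptions, not part of the original configuration):

```hcl
# Sketch only: registers the API service with Application Auto Scaling and
# scales between 2 and 6 tasks based on average CPU utilization.
resource "aws_appautoscaling_target" "codecov_api" {
  service_namespace  = "ecs"
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.codecov_api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  min_capacity       = 2
  max_capacity       = 6
}

resource "aws_appautoscaling_policy" "codecov_api_cpu" {
  name               = "${local.name_prefix}-codecov-api-cpu"
  policy_type        = "TargetTrackingScaling"
  service_namespace  = aws_appautoscaling_target.codecov_api.service_namespace
  resource_id        = aws_appautoscaling_target.codecov_api.resource_id
  scalable_dimension = aws_appautoscaling_target.codecov_api.scalable_dimension

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 60
  }
}
```

Target tracking adjusts DesiredCount to keep average CPU near the target; a matching pair of resources would cover the Worker's 1–4 range.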


ecs-service-codecov-timescale.tf

TimescaleDB runs as a Fargate task with an EFS volume for data persistence. An init sidecar waits for PostgreSQL to be ready and then runs CREATE EXTENSION IF NOT EXISTS timescaledb (idempotent: safe to run on every restart).

Important: The PGDATA env var is set to a subdirectory of the mount point (/var/lib/postgresql/data/data). Without this, the PostgreSQL entrypoint tries to chown the EFS mount root, which fails because EFS access points enforce ownership.

resource "aws_cloudwatch_log_group" "timescale" {
  name              = "/aws/ecs/${local.name_prefix}-timescale"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "timescale" })
}

resource "aws_ecs_task_definition" "timescale" {
  family                   = "${local.name_prefix}-timescale"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 1024
  memory                   = 2048
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_execution.arn

  volume {
    name = "timescale-data"
    efs_volume_configuration {
      file_system_id     = aws_efs_file_system.timescale.id
      transit_encryption = "ENABLED"

      authorization_config {
        access_point_id = aws_efs_access_point.timescale.id
        iam             = "ENABLED"
      }
    }
  }

  container_definitions = jsonencode([
    {
      name      = "timescale"
      image     = var.timescale_image
      essential = true

      # IMPORTANT: run as postgres UID/GID matching the EFS access point
      user = "999:999"

      portMappings = [
        { containerPort = 5432, hostPort = 5432, protocol = "tcp" }
      ]

      environment = [
        { name = "POSTGRES_USER", value = var.timescale_db_username },
        { name = "POSTGRES_DB",   value = var.timescale_db_name },

        # IMPORTANT: use a subdirectory so the entrypoint doesn't try to chown the EFS mount root
        { name = "PGDATA", value = "/var/lib/postgresql/data/data" }
      ]

      secrets = [
        { name = "POSTGRES_PASSWORD", valueFrom = aws_secretsmanager_secret.timescale_password.arn }
      ]

      mountPoints = [
        { sourceVolume = "timescale-data", containerPath = "/var/lib/postgresql/data", readOnly = false }
      ]

      healthCheck = {
        command  = ["CMD-SHELL", "pg_isready -h 127.0.0.1 -p 5432 -U ${var.timescale_db_username} -d ${var.timescale_db_name} || exit 1"]
        interval = 10
        timeout  = 5
        retries  = 6
      }

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.timescale.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "timescale"
        }
      }
    },

    # One-shot init sidecar: enables the timescaledb extension (idempotent)
    {
      name      = "timescale-init"
      image     = "postgres:17"
      essential = false

      dependsOn = [
        { containerName = "timescale", condition = "START" }
      ]

      environment = [
        { name = "PGHOST",     value = "127.0.0.1" },
        { name = "PGPORT",     value = "5432" },
        { name = "PGUSER",     value = var.timescale_db_username },
        { name = "PGDATABASE", value = var.timescale_db_name }
      ]

      secrets = [
        { name = "PGPASSWORD", valueFrom = aws_secretsmanager_secret.timescale_password.arn }
      ]

      command = [
        "bash", "-lc",
        "for i in $(seq 1 60); do pg_isready -h 127.0.0.1 -p 5432 && break; sleep 2; done; pg_isready -h 127.0.0.1 -p 5432 || exit 1; psql -v ON_ERROR_STOP=1 -c \"CREATE EXTENSION IF NOT EXISTS timescaledb;\""
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.timescale.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "timescale-init"
        }
      }
    }
  ])

  tags = merge(local.tags, { Component = "timescale" })
}

resource "aws_ecs_service" "timescale" {
  name                   = "${local.name_prefix}-timescale"
  cluster                = aws_ecs_cluster.this.id
  task_definition        = aws_ecs_task_definition.timescale.arn
  desired_count          = 1
  launch_type            = "FARGATE"
  enable_execute_command = true

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.timescale.id]
    assign_public_ip = false
  }

  service_registries {
    registry_arn = aws_service_discovery_service.timescale.arn
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "timescale" })
}

ecs-service-codecov-gateway.tf

The gateway is the only Codecov service registered with the ALB. It is a Traefik-based reverse proxy that routes traffic to the frontend, API, and IA services based on path prefix.

Important env var naming: The gateway uses CODECOV_DEFAULT_HOST (not CODECOV_FRONTEND_HOST) to route to the frontend. This is not obvious from the docs.

resource "aws_cloudwatch_log_group" "codecov_gateway" {
  name              = "/aws/ecs/${local.name_prefix}-codecov-gateway"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "gateway" })
}

resource "aws_ecs_task_definition" "codecov_gateway" {
  family                   = "${local.name_prefix}-codecov-gateway"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_codecov.arn

  container_definitions = jsonencode([
    {
      name      = "gateway"
      image     = local.codecov_images.gateway
      essential = true

      portMappings = [
        { containerPort = 8080, protocol = "tcp" }
      ]

      environment = [
        # Disable the MinIO sidecar; we use S3 directly
        { name = "CODECOV_GATEWAY_MINIO_ENABLED", value = "false" },

        # API upstream (internal Cloud Map name)
        { name = "CODECOV_API_HOST", value = "api.${local.name_prefix}.local" },
        { name = "CODECOV_API_PORT", value = "8000" },

        # Frontend upstream (NOTE: DEFAULT_*, not FRONTEND_*)
        { name = "CODECOV_DEFAULT_HOST", value = "frontend.${local.name_prefix}.local" },
        { name = "CODECOV_DEFAULT_PORT", value = "8080" },

        # IA service
        { name = "CODECOV_IA_HOST", value = "ia.${local.name_prefix}.local" },
        { name = "CODECOV_IA_PORT", value = "8000" },
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.codecov_gateway.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "gateway"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "bash -c '</dev/tcp/127.0.0.1/8080' || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 300
      }
    }
  ])

  tags = merge(local.tags, { Component = "gateway" })
}

resource "aws_ecs_service" "codecov_gateway" {
  name                   = "${local.name_prefix}-codecov-gateway"
  cluster                = aws_ecs_cluster.this.id
  task_definition        = aws_ecs_task_definition.codecov_gateway.arn
  desired_count          = 2
  launch_type            = "FARGATE"
  enable_execute_command = true

  lifecycle {
    ignore_changes = [desired_count]
  }

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.ecs_services.id]
    assign_public_ip = false
  }

  load_balancer {
    target_group_arn = module.alb.target_groups["codecov_gateway"].arn
    container_name   = "gateway"
    container_port   = 8080
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "gateway" })

  depends_on = [module.alb]
}

ecs-service-codecov-frontend.tf

The frontend needs to know the GitLab instance URL to render the login button correctly. CODECOV_GLE_CLIENT_ID and GITLAB_ENTERPRISE_CLIENT_ID are both set to the same value because different parts of the frontend code read different names.

resource "aws_cloudwatch_log_group" "codecov_frontend" {
  name              = "/aws/ecs/${local.name_prefix}-codecov-frontend"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "frontend" })
}

resource "aws_ecs_task_definition" "codecov_frontend" {
  family                   = "${local.name_prefix}-codecov-frontend"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 256
  memory                   = 512
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_codecov.arn

  container_definitions = jsonencode([
    {
      name      = "frontend"
      image     = local.codecov_images.frontend
      essential = true

      portMappings = [
        { containerPort = 8080, protocol = "tcp" }
      ]

      environment = [
        { name = "CODECOV_BASE_HOST", value = var.codecov_domain },
        { name = "CODECOV_API_HOST",  value = var.codecov_domain },
        { name = "CODECOV_SCHEME",    value = "https" },

        # Required to show the GitLab Enterprise login button in the UI
        { name = "CODECOV_GLE_HOST", value = var.gitlab_url },
      ]

      secrets = [
        # Both names are read by different parts of the frontend code
        {
          name      = "GITLAB_ENTERPRISE_CLIENT_ID"
          valueFrom = aws_ssm_parameter.codecov_gitlab_client_id.arn
        },
        {
          name      = "CODECOV_GLE_CLIENT_ID"
          valueFrom = aws_ssm_parameter.codecov_gitlab_client_id.arn
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.codecov_frontend.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "frontend"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "wget --no-verbose --tries=1 --spider http://localhost:8080/ || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 60
      }
    }
  ])

  tags = merge(local.tags, { Component = "frontend" })
}

resource "aws_ecs_service" "codecov_frontend" {
  name            = "${local.name_prefix}-codecov-frontend"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.codecov_frontend.arn
  desired_count   = 2
  launch_type     = "FARGATE"

  lifecycle {
    ignore_changes = [desired_count]
  }

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.ecs_services.id]
    assign_public_ip = false
  }

  service_registries {
    registry_arn = aws_service_discovery_service.frontend.arn
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "frontend" })
}

ecs-service-codecov-api.tf

This is the most configuration-heavy service. A few things worth calling out:

  1. Dual env var paths for GitLab: Each GitLab setting is set twice: once under SETUP__GITLAB_ENTERPRISE__* (feeds the runtime YAML config used by internal config loading) and once under GITLAB_ENTERPRISE__* (the direct config path, required by the GraphQL loginProviders resolver and the OAuth callback handler). We discovered this by reading the Codecov Python source.

  2. S3 via SERVICES__MINIO__*: Despite the name, these env vars configure S3, not MinIO. Setting IAM_AUTH = true makes Codecov use the task's IAM role for S3 authentication; no static credentials are needed.

  3. JSONCONFIG___ prefix: Variables with this prefix are parsed as JSON by Codecov's config loader. It's the only way to set list-valued config (like admins) through environment variables.

resource "aws_cloudwatch_log_group" "codecov_api" {
  name              = "/aws/ecs/${local.name_prefix}-codecov-api"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "api" })
}

resource "aws_ecs_task_definition" "codecov_api" {
  family                   = "${local.name_prefix}-codecov-api"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_codecov.arn

  container_definitions = jsonencode([
    {
      name      = "api"
      image     = local.codecov_images.api
      essential = true

      portMappings = [{ containerPort = 8000, protocol = "tcp" }]

      environment = [
        { name = "RUN_ENV", value = "ENTERPRISE" },

        # External URLs
        { name = "CODECOV_URL",            value = "https://${var.codecov_domain}" },
        { name = "SETUP__CODECOV_URL",     value = "https://${var.codecov_domain}" },
        { name = "SETUP__CODECOV_API_URL", value = "https://${var.codecov_domain}/api" },

        # Enable TimescaleDB time-series features
        { name = "SETUP__TIMESERIES__ENABLED",    value = "true" },
        { name = "SETUP__TA_TIMESERIES__ENABLED", value = "true" },

        # GitLab Enterprise SETUP__* path (runtime YAML config)
        { name = "SETUP__GITLAB_ENTERPRISE__ENABLED",                   value = "true" },
        { name = "SETUP__GITLAB_ENTERPRISE__URL",                       value = var.gitlab_url },
        { name = "SETUP__GITLAB_ENTERPRISE__CLIENT_ID",                 value = tostring(gitlab_application.codecov.application_id) },
        { name = "SETUP__GITLAB_ENTERPRISE__GLOBAL_UPLOAD_TOKEN_ENABLED", value = "true" },

        # GitLab Enterprise direct path (required for loginProviders GraphQL resolver and OAuth callback)
        # get_config("gitlab_enterprise", "url") reads GITLAB_ENTERPRISE__URL
        { name = "GITLAB_ENTERPRISE__URL",          value = var.gitlab_url },
        { name = "GITLAB_ENTERPRISE__CLIENT_ID",    value = tostring(gitlab_application.codecov.application_id) },
        { name = "GITLAB_ENTERPRISE__REDIRECT_URI", value = "https://${var.codecov_domain}/login/gle" },
        { name = "GITLAB_ENTERPRISE__API_URL",      value = "${var.gitlab_url}/api/v4" },

        # S3 storage uses MINIO-style env vars regardless of the actual provider
        { name = "SERVICES__MINIO__HOST",       value = "s3.${var.region}.amazonaws.com" },
        { name = "SERVICES__MINIO__BUCKET",     value = aws_s3_bucket.codecov_storage.id },
        { name = "SERVICES__MINIO__REGION",     value = var.region },
        { name = "SERVICES__MINIO__VERIFY_SSL", value = "true" },
        { name = "SERVICES__MINIO__IAM_AUTH",   value = "true" },  # Use task IAM role, no credentials needed

        # Admins: the JSONCONFIG___ prefix triggers json.loads() in the config loader
        {
          name = "JSONCONFIG___SETUP__ADMINS"
          value = jsonencode([
            for username in var.codecov_admins : {
              service  = "gitlab_enterprise"
              username = username
            }
          ])
        },

        # Disable guest/anonymous access
        { name = "SETUP__GUEST_ACCESS", value = "off" },
      ]

      secrets = [
        { name = "SERVICES__DATABASE_URL",               valueFrom = aws_secretsmanager_secret.codecov_database_url.arn },
        { name = "SERVICES__TIMESERIES_DATABASE_URL",    valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__TA_TIMESERIES_DATABASE_URL", valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__REDIS_URL",                  valueFrom = aws_secretsmanager_secret.codecov_redis_url.arn },
        { name = "SETUP__HTTP__COOKIE_SECRET",           valueFrom = aws_secretsmanager_secret.codecov_cookie_secret.arn },
        { name = "SETUP__ENTERPRISE_LICENSE",            valueFrom = aws_ssm_parameter.codecov_license.arn },

        # GitLab OAuth client secret set on both paths
        { name = "SETUP__GITLAB_ENTERPRISE__CLIENT_SECRET", valueFrom = aws_secretsmanager_secret.codecov_gitlab_oauth.arn },
        { name = "GITLAB_ENTERPRISE__CLIENT_SECRET",        valueFrom = aws_secretsmanager_secret.codecov_gitlab_oauth.arn },

        # Bot token for posting status checks back to GitLab
        { name = "GITLAB_ENTERPRISE__BOT__KEY", valueFrom = aws_secretsmanager_secret.codecov_gitlab_bot_token.arn },

        # Global upload token
        { name = "SETUP__GITLAB_ENTERPRISE__GLOBAL_UPLOAD_TOKEN", valueFrom = aws_ssm_parameter.codecov_upload_token.arn },
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.codecov_api.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "api"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "bash -c '</dev/tcp/127.0.0.1/8000' || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 120
      }
    }
  ])

  tags = merge(local.tags, { Component = "api" })
}

resource "aws_ecs_service" "codecov_api" {
  name                   = "${local.name_prefix}-codecov-api"
  cluster                = aws_ecs_cluster.this.id
  task_definition        = aws_ecs_task_definition.codecov_api.arn
  desired_count          = 2
  launch_type            = "FARGATE"
  enable_execute_command = true

  lifecycle {
    ignore_changes = [desired_count]
  }

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.ecs_services.id]
    assign_public_ip = false
  }

  service_registries {
    registry_arn = aws_service_discovery_service.api.arn
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "api" })
}
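The `JSONCONFIG___` prefix used in the admins block above is worth a closer look, since it's one of the things we had to dig out of the Codecov source. A minimal Python sketch of how such a loader could behave — the function name and details here are hypothetical, illustrating the convention rather than reproducing Codecov's actual implementation:

```python
import json

JSON_PREFIX = "JSONCONFIG___"

def load_env_config(environ):
    """Parse env vars into a nested config dict.

    Double underscores in a name become path separators; values whose
    names carry the JSONCONFIG___ prefix are decoded with json.loads().
    """
    config = {}
    for key, raw in environ.items():
        if key.startswith(JSON_PREFIX):
            key, value = key[len(JSON_PREFIX):], json.loads(raw)
        else:
            value = raw
        node = config
        *path, leaf = key.lower().split("__")
        for part in path:
            node = node.setdefault(part, {})
        node[leaf] = value
    return config

cfg = load_env_config({
    "SETUP__GUEST_ACCESS": "off",
    "JSONCONFIG___SETUP__ADMINS": '[{"service": "gitlab_enterprise", "username": "alice"}]',
})
```

This is why the admins list must be `jsonencode`-d in Terraform: without the prefix, the value would land in the config tree as a raw string instead of a list of objects.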

ecs-service-codecov-worker.tf

The worker exposes no inbound ports; it pulls jobs from Redis queues. It needs the same GitLab and S3 configuration as the API, since it processes coverage uploads and posts results back to GitLab.

resource "aws_cloudwatch_log_group" "codecov_worker" {
  name              = "/aws/ecs/${local.name_prefix}-codecov-worker"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "worker" })
}

resource "aws_ecs_task_definition" "codecov_worker" {
  family                   = "${local.name_prefix}-codecov-worker"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_codecov.arn

  container_definitions = jsonencode([
    {
      name      = "worker"
      image     = local.codecov_images.worker
      essential = true

      # Worker has no inbound ports; it pulls work from Redis queues

      environment = [
        { name = "RUN_ENV",                          value = "ENTERPRISE" },
        { name = "CODECOV_URL",                      value = "https://${var.codecov_domain}" },
        { name = "SETUP__CODECOV_URL",               value = "https://${var.codecov_domain}" },
        { name = "SETUP__TIMESERIES__ENABLED",       value = "true" },
        { name = "SETUP__TA_TIMESERIES__ENABLED",    value = "true" },

        # GitLab Enterprise
        { name = "GITLAB_ENTERPRISE__URL",     value = var.gitlab_url },
        { name = "GITLAB_ENTERPRISE__API_URL", value = "${var.gitlab_url}/api/v4" },

        # S3 storage
        { name = "SERVICES__MINIO__HOST",       value = "s3.${var.region}.amazonaws.com" },
        { name = "SERVICES__MINIO__BUCKET",     value = aws_s3_bucket.codecov_storage.id },
        { name = "SERVICES__MINIO__REGION",     value = var.region },
        { name = "SERVICES__MINIO__VERIFY_SSL", value = "true" },
        { name = "SERVICES__MINIO__IAM_AUTH",   value = "true" },
      ]

      secrets = [
        { name = "SERVICES__DATABASE_URL",               valueFrom = aws_secretsmanager_secret.codecov_database_url.arn },
        { name = "SERVICES__TIMESERIES_DATABASE_URL",    valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__TA_TIMESERIES_DATABASE_URL", valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__REDIS_URL",                  valueFrom = aws_secretsmanager_secret.codecov_redis_url.arn },
        { name = "SETUP__ENTERPRISE_LICENSE",            valueFrom = aws_ssm_parameter.codecov_license.arn },
        { name = "GITLAB_ENTERPRISE__CLIENT_ID",         valueFrom = aws_ssm_parameter.codecov_gitlab_client_id.arn },
        { name = "GITLAB_ENTERPRISE__CLIENT_SECRET",     valueFrom = aws_secretsmanager_secret.codecov_gitlab_oauth.arn },
        { name = "GITLAB_ENTERPRISE__BOT__KEY",          valueFrom = aws_secretsmanager_secret.codecov_gitlab_bot_token.arn },
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.codecov_worker.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "worker"
        }
      }
    }
  ])

  tags = merge(local.tags, { Component = "worker" })
}

resource "aws_ecs_service" "codecov_worker" {
  name            = "${local.name_prefix}-codecov-worker"
  cluster         = aws_ecs_cluster.this.id
  task_definition = aws_ecs_task_definition.codecov_worker.arn
  desired_count   = 1
  launch_type     = "FARGATE"

  lifecycle {
    ignore_changes = [desired_count]
  }

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.ecs_services.id]
    assign_public_ip = false
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "worker" })
}

ecs-service-codecov-ia.tf

The IA (Ingestion/API umbrella) service is required by newer gateway releases. It listens on port 8000 and registers with Cloud Map.

resource "aws_cloudwatch_log_group" "codecov_ia" {
  name              = "/aws/ecs/${local.name_prefix}-codecov-ia"
  retention_in_days = 30
  tags              = merge(local.tags, { Component = "ia" })
}

resource "aws_ecs_task_definition" "codecov_ia" {
  family                   = "${local.name_prefix}-codecov-ia"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution.arn
  task_role_arn            = aws_iam_role.ecs_task_codecov.arn

  container_definitions = jsonencode([
    {
      name      = "ia"
      image     = local.codecov_images.ia
      essential = true

      portMappings = [
        { containerPort = 8000, protocol = "tcp" }
      ]

      environment = [
        { name = "RUN_ENV",                          value = "ENTERPRISE" },
        { name = "CODECOV_URL",                      value = "https://${var.codecov_domain}" },
        { name = "SETUP__CODECOV_URL",               value = "https://${var.codecov_domain}" },
        { name = "SETUP__TIMESERIES__ENABLED",       value = "true" },
        { name = "SETUP__TA_TIMESERIES__ENABLED",    value = "true" },
        { name = "SERVICES__MINIO__HOST",            value = "s3.${var.region}.amazonaws.com" },
        { name = "SERVICES__MINIO__BUCKET",          value = aws_s3_bucket.codecov_storage.id },
        { name = "SERVICES__MINIO__REGION",          value = var.region },
        { name = "SERVICES__MINIO__VERIFY_SSL",      value = "true" },
        { name = "SERVICES__MINIO__IAM_AUTH",        value = "true" },
      ]

      secrets = [
        { name = "SERVICES__DATABASE_URL",               valueFrom = aws_secretsmanager_secret.codecov_database_url.arn },
        { name = "SERVICES__TIMESERIES_DATABASE_URL",    valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__TA_TIMESERIES_DATABASE_URL", valueFrom = aws_secretsmanager_secret.codecov_timeseries_database_url.arn },
        { name = "SERVICES__REDIS_URL",                  valueFrom = aws_secretsmanager_secret.codecov_redis_url.arn },
        { name = "SETUP__ENTERPRISE_LICENSE",            valueFrom = aws_ssm_parameter.codecov_license.arn },
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = aws_cloudwatch_log_group.codecov_ia.name
          "awslogs-region"        = var.region
          "awslogs-stream-prefix" = "ia"
        }
      }

      healthCheck = {
        command     = ["CMD-SHELL", "bash -c '</dev/tcp/127.0.0.1/8000' || exit 1"]
        interval    = 30
        timeout     = 5
        retries     = 3
        startPeriod = 120
      }
    }
  ])

  tags = merge(local.tags, { Component = "ia" })
}

resource "aws_ecs_service" "codecov_ia" {
  name                   = "${local.name_prefix}-codecov-ia"
  cluster                = aws_ecs_cluster.this.id
  task_definition        = aws_ecs_task_definition.codecov_ia.arn
  desired_count          = 1
  launch_type            = "FARGATE"
  enable_execute_command = true

  lifecycle { ignore_changes = [desired_count] }

  network_configuration {
    subnets          = data.aws_subnets.private.ids
    security_groups  = [aws_security_group.ecs_services.id]
    assign_public_ip = false
  }

  service_registries {
    registry_arn = aws_service_discovery_service.ia.arn
  }

  deployment_circuit_breaker {
    enable   = true
    rollback = true
  }

  tags = merge(local.tags, { Component = "ia" })
}

autoscaling.tf

CPU-based auto-scaling keeps the cluster right-sized. The API scales between 2 and 6 tasks; the Worker scales between 1 and 4. Both trigger at 70% average CPU utilization.

################################################################################
# ECS Auto Scaling - API
################################################################################

resource "aws_appautoscaling_target" "api" {
  max_capacity       = 6
  min_capacity       = 2
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.codecov_api.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "api_cpu" {
  name               = "${local.name_prefix}-codecov-api-cpu"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.api.resource_id
  scalable_dimension = aws_appautoscaling_target.api.scalable_dimension
  service_namespace  = aws_appautoscaling_target.api.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70
  }
}

################################################################################
# ECS Auto Scaling - Worker
################################################################################

resource "aws_appautoscaling_target" "worker" {
  max_capacity       = 4
  min_capacity       = 1
  resource_id        = "service/${aws_ecs_cluster.this.name}/${aws_ecs_service.codecov_worker.name}"
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "worker_cpu" {
  name               = "${local.name_prefix}-codecov-worker-cpu"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.worker.resource_id
  scalable_dimension = aws_appautoscaling_target.worker.scalable_dimension
  service_namespace  = aws_appautoscaling_target.worker.service_namespace

  target_tracking_scaling_policy_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ECSServiceAverageCPUUtilization"
    }
    target_value = 70
  }
}
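Target-tracking policies adjust capacity roughly in proportion to the ratio of actual to target utilization. A simplified model of the scaling math — an approximation for intuition only; the real CloudWatch-driven algorithm also involves alarm evaluation periods and scale-in cooldowns:

```python
import math

def desired_tasks(current, actual_cpu, target_cpu, min_cap, max_cap):
    """Approximate target tracking: scale proportionally, clamp to bounds."""
    proposed = math.ceil(current * actual_cpu / target_cpu)
    return max(min_cap, min(max_cap, proposed))

# API under load: 2 tasks at 95% average CPU against the 70% target
desired_tasks(2, 95.0, 70.0, min_cap=2, max_cap=6)   # scales out to 3

# Idle worker: the floor keeps at least one task running
desired_tasks(1, 10.0, 70.0, min_cap=1, max_cap=4)   # stays at 1
```

The takeaway for sizing: with a 70% target, the fleet carries roughly 30% headroom at steady state, which is what absorbs bursts while new tasks start.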

alb.tf

################################################################################
# Application Load Balancer
################################################################################

module "alb" {
  source  = "terraform-aws-modules/alb/aws"
  version = "~> 10.0"

  name               = "${local.name_prefix}-alb"
  load_balancer_type = "application"

  vpc_id  = data.aws_vpc.main.id
  subnets = data.aws_subnets.public.ids

  enable_deletion_protection = true

  security_group_ingress_rules = {
    all_http = {
      from_port   = 80
      to_port     = 80
      ip_protocol = "tcp"
      cidr_ipv4   = "0.0.0.0/0"
    }
    all_https = {
      from_port   = 443
      to_port     = 443
      ip_protocol = "tcp"
      cidr_ipv4   = "0.0.0.0/0"
    }
  }

  security_group_egress_rules = {
    vpc_outbound = {
      description = "Allow outbound to VPC for ECS health checks and container communication"
      ip_protocol = "-1"
      cidr_ipv4   = data.aws_vpc.main.cidr_block
    }
  }

  listeners = {
    # HTTP listener redirects all traffic to HTTPS
    http = {
      port     = 80
      protocol = "HTTP"

      redirect = {
        port        = "443"
        protocol    = "HTTPS"
        status_code = "HTTP_301"
      }
    }

    # HTTPS listener routes to Gateway target group
    https = {
      port            = 443
      protocol        = "HTTPS"
      ssl_policy      = "ELBSecurityPolicy-TLS13-1-2-Res-PQ-2025-09"
      certificate_arn = aws_acm_certificate_validation.codecov.certificate_arn

      forward = {
        target_group_key = "codecov_gateway"
      }
    }
  }

  target_groups = {
    codecov_gateway = {
      backend_protocol                  = "HTTP"
      backend_port                      = 8080
      target_type                       = "ip"
      deregistration_delay              = 30
      load_balancing_cross_zone_enabled = true

      health_check = {
        enabled             = true
        healthy_threshold   = 3
        interval            = 30
        matcher             = "200-399"
        path                = "/"
        port                = "traffic-port"
        protocol            = "HTTP"
        timeout             = 5
        unhealthy_threshold = 3
      }

      create_attachment = false
    }
  }

  tags = local.tags
}

acm.tf

ACM certificates require DNS validation. The Cloudflare provider creates the validation records, and the aws_acm_certificate_validation resource blocks until the certificate is issued, so Terraform waits before provisioning anything that depends on it.

In our case an internal module handles all of these steps; here I include every resource needed to achieve the same result.

################################################################################
# ACM Certificate with Cloudflare DNS validation
################################################################################

resource "aws_acm_certificate" "codecov" {
  domain_name       = var.codecov_domain
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }

  tags = local.tags
}

resource "cloudflare_dns_record" "acm_validation" {
  provider = cloudflare.main

  for_each = {
    for dvo in aws_acm_certificate.codecov.domain_validation_options : dvo.domain_name => {
      name  = dvo.resource_record_name
      type  = dvo.resource_record_type
      value = dvo.resource_record_value
    }
  }

  zone_id = var.cloudflare_zone_id
  name    = each.value.name
  type    = each.value.type
  content = each.value.value
  ttl     = 60
}

resource "aws_acm_certificate_validation" "codecov" {
  certificate_arn         = aws_acm_certificate.codecov.arn
  validation_record_fqdns = [for r in cloudflare_dns_record.acm_validation : r.hostname]
}

dns.tf

################################################################################
# Cloudflare DNS Record codecov.example.com -> ALB
################################################################################

resource "cloudflare_dns_record" "codecov" {
  provider = cloudflare.main
  zone_id  = var.cloudflare_zone_id
  name     = var.codecov_domain
  type     = "CNAME"
  content  = module.alb.dns_name
  ttl      = 60
  proxied  = false
}

outputs.tf

################################################################################
# ECS Cluster
################################################################################

output "ecs_cluster_name" {
  description = "Name of the ECS cluster"
  value       = aws_ecs_cluster.this.name
}

output "ecs_cluster_arn" {
  description = "ARN of the ECS cluster"
  value       = aws_ecs_cluster.this.arn
}

################################################################################
# Networking
################################################################################

output "alb_dns_name" {
  description = "DNS name of the Application Load Balancer"
  value       = module.alb.dns_name
}

output "codecov_url" {
  description = "URL to access Codecov"
  value       = "https://${var.codecov_domain}"
}

output "vpc_id" {
  description = "VPC ID used by the ECS cluster"
  value       = data.aws_vpc.main.id
}

################################################################################
# Data Layer
################################################################################

output "rds_endpoint" {
  description = "RDS PostgreSQL endpoint"
  value       = aws_db_instance.codecov.endpoint
  sensitive   = true
}

output "redis_endpoint" {
  description = "ElastiCache Redis endpoint"
  value       = aws_elasticache_cluster.codecov.cache_nodes[0].address
  sensitive   = true
}

output "codecov_storage_bucket" {
  description = "S3 bucket name for Codecov storage"
  value       = aws_s3_bucket.codecov_storage.id
}

################################################################################
# IAM
################################################################################

output "ecs_task_execution_role_arn" {
  description = "ARN of the ECS task execution role"
  value       = aws_iam_role.ecs_task_execution.arn
}

output "ecs_task_role_arn" {
  description = "ARN of the Codecov ECS task role"
  value       = aws_iam_role.ecs_task_codecov.arn
}

################################################################################
# GitLab Application
################################################################################

output "codecov_gitlab_client_id" {
  value     = gitlab_application.codecov.application_id
  sensitive = true
}

output "codecov_gitlab_client_secret" {
  value     = gitlab_application.codecov.secret
  sensitive = true
}

CI/CD Pipeline

The GitLab CI pipeline enforces a safe plan-before-apply workflow. Every merge request gets a visible terraform plan diff as a pipeline artifact, and production applies require a manual button click in the GitLab UI.

# .gitlab-ci.yml

stages:
  - test
  - secret-detection
  - plan
  - apply

variables:
  TF_ROOT: terraform
  TF_STATE_NAME: tooling
  TF_VAR_gitlab_token: $DEVOPS_SA_GITLAB_TOKEN

include:
  - template: Jobs/SAST.gitlab-ci.yml
  - template: Jobs/Secret-Detection.gitlab-ci.yml

terraform_plan:
  stage: plan
  script:
    - cd $TF_ROOT
    - terraform init
    - terraform plan -out=tfplan -no-color | tee plan.txt
  artifacts:
    paths:
      - $TF_ROOT/tfplan
      - $TF_ROOT/plan.txt
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

terraform_apply:
  stage: apply
  script:
    - cd $TF_ROOT
    - terraform init
    - terraform apply -auto-approve tfplan
  dependencies:
    - terraform_plan
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual  # Requires explicit button click in GitLab UI

Deploying

First-Time Setup

  1. Configure variables: create a terraform.tfvars (keep it out of git; it's in .gitignore):
region                     = "eu-central-1"
environment                = "tooling"
vpc_name                   = "mycompany-vpc-tooling"
codecov_domain             = "codecov.example.com"
cloudflare_zone_id         = "your-cloudflare-zone-id"
cloudflare_api_token       = "..."   # preferably via TF_VAR_cloudflare_api_token env var
codecov_enterprise_license = ""      # leave empty to use the bundled 50-user community license
gitlab_token               = "..."   # GitLab admin token
codecov_admins             = ["alice", "bob", "carol"]
gitlab_url                 = "https://git.example.com"
rds_instance_class         = "db.t3.small"
rds_allocated_storage      = 20
redis_node_type            = "cache.t3.micro"
  2. Initialize Terraform:
terraform init
  3. Review the plan:
terraform plan -out=tfplan
  4. Apply:
terraform apply tfplan

The first apply takes 10–20 minutes, mostly waiting for RDS Multi-AZ provisioning and ACM certificate DNS validation.

Pre-Commit Hooks

Install the hooks to catch formatting and secret issues before they reach CI:

pip install pre-commit
pre-commit install

A minimal .pre-commit-config.yaml:

repos:
  - repo: https://github.com/antonbabenko/pre-commit-terraform
    rev: v1.105.0
    hooks:
      - id: terraform_fmt
      - id: terraform_validate

  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.30.0
    hooks:
      - id: gitleaks

Day-2 Operations

Viewing Logs

All ECS services log to CloudWatch with the /aws/ecs/{name_prefix}-codecov-{service} naming pattern:

aws logs tail /aws/ecs/mycompany-tooling-codecov-api    --follow
aws logs tail /aws/ecs/mycompany-tooling-codecov-worker --follow
aws logs tail /aws/ecs/mycompany-tooling-timescale      --follow

Restarting a Service

Force a new deployment (replaces all running tasks with new ones):

aws ecs update-service \
  --cluster mycompany-tooling-ecs \
  --service mycompany-tooling-codecov-api \
  --force-new-deployment

Upgrading Codecov

Update the version in locals.tf:

codecov_version = "26.3.0"

Run terraform plan to confirm only task definitions change, then apply. ECS performs a rolling deployment; the circuit breaker rolls back automatically if health checks fail.

Scaling Manually

Auto-scaling handles normal load. To override the desired count:

aws ecs update-service \
  --cluster mycompany-tooling-ecs \
  --service mycompany-tooling-codecov-worker \
  --desired-count 3

Note: lifecycle { ignore_changes = [desired_count] } in the service resources prevents Terraform from reverting manual scaling on the next apply.

Connecting GitLab CI to Codecov

After deployment, get the upload token:

aws ssm get-parameter \
  --name /mycompany-tooling/codecov/upload-token \
  --with-decryption \
  --query Parameter.Value \
  --output text

Add this to each project's .gitlab-ci.yml:

upload_coverage:
  stage: test
  image: python:3.12-slim
  script:
    - pip install codecov-cli
    - codecovcli upload-process
        --token $CODECOV_TOKEN
        --codecov-yaml-path .codecov.yml
  coverage: '/TOTAL.*\s+(\d+%)$/'
  variables:
    CODECOV_URL: "https://codecov.example.com"
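The `coverage:` regex above targets the `TOTAL` line of a pytest-cov style report. A quick local sanity check of the pattern against sample output (the log text here is fabricated for illustration):

```python
import re

# GitLab applies the project's coverage regex against the job log;
# re.MULTILINE mimics the per-line anchoring of $.
pattern = re.compile(r"TOTAL.*\s+(\d+%)$", re.MULTILINE)

sample_log = """\
Name            Stmts   Miss  Cover
-----------------------------------
app/main.py       120     12    90%
-----------------------------------
TOTAL             120     12    90%
"""

match = pattern.search(sample_log)
```

If your test runner prints a differently shaped summary, adjust the regex until a check like this captures the final percentage and nothing else.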

Set CODECOV_TOKEN as a CI/CD variable in the project or group settings.


Summary

This setup gives you a production-ready Codecov deployment with:

  • High availability: Multi-AZ RDS, multi-AZ NAT gateways, 2+ replicas on gateway/frontend, circuit-breaker rollback on all services
  • Security: No public IPs on workloads, VPC endpoints, Secrets Manager for all sensitive values, S3 HTTPS-only policy, IAM-based S3 auth (no static credentials), least-privilege roles
  • Cost awareness: cache.t3.micro Redis, db.t3.small RDS, VPC endpoints reducing NAT data transfer costs, Fargate pay-per-use
  • GitLab integration fully automated: OAuth application, credentials, and env vars all provisioned by Terraform, no manual steps in the GitLab UI
  • Operational simplicity: GitOps via Terraform, centralized secrets, CloudWatch logging per service, CPU-based auto-scaling

The full Terraform state covers ~60 resources. A terraform destroy cleans up everything except the S3 state bucket, the RDS final snapshot, and any secrets with ignore_changes, all of which are intentional protections against accidental data loss.
