<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ederson Brilhante</title>
    <description>The latest articles on DEV Community by Ederson Brilhante (@edersonbrilhante).</description>
    <link>https://dev.to/edersonbrilhante</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F606500%2F67cc9470-d75a-4b86-bb9f-07329fb2558a.jpeg</url>
      <title>DEV Community: Ederson Brilhante</title>
      <link>https://dev.to/edersonbrilhante</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/edersonbrilhante"/>
    <language>en</language>
    <item>
      <title>ForgeMT: GitHub Actions at Scale with Security and Multi-Tenancy on AWS</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Tue, 12 Aug 2025 14:51:47 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/forgemt-github-actions-at-scale-with-security-and-multi-tenancy-on-aws-3no9</link>
      <guid>https://dev.to/edersonbrilhante/forgemt-github-actions-at-scale-with-security-and-multi-tenancy-on-aws-3no9</guid>
      <description>&lt;h1&gt;Introduction&lt;/h1&gt;

&lt;p&gt;GitHub Actions is the go-to CI/CD tool for many teams. But when your organization runs thousands of pipelines daily, the default setup breaks down. You hit limits on scale, security, and governance — plus skyrocketing costs.&lt;/p&gt;

&lt;p&gt;GitHub-hosted runners are easy to adopt but expensive at scale, and they don’t meet strict compliance needs. Existing self-hosted options like Actions Runner Controller (ARC) or Terraform EC2 modules don’t fully address multi-tenant isolation, automation, or centralized control.&lt;/p&gt;

&lt;p&gt;ForgeMT, built inside Cisco’s Security Business Group, fills that gap. It’s an open-source AWS-native platform that manages ephemeral runners with strong tenant isolation, full automation, and enterprise-grade governance.&lt;/p&gt;

&lt;p&gt;This article explains why ForgeMT matters and how it works — providing a practical look at building scalable, secure GitHub Actions runner platforms.&lt;/p&gt;




&lt;h1&gt;Why Enterprise CI/CD Runners Fail at Scale&lt;/h1&gt;

&lt;p&gt;At large organizations, scaling GitHub Actions runners encounters four key bottlenecks:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fragmented Infrastructure:&lt;/strong&gt;&lt;br&gt;
Teams independently choose their CI/CD tools (Jenkins, Travis CI, CircleCI, or self-hosted runners), which accelerates local delivery but creates duplicated effort, configuration drift, and fragmented monitoring. Without a unified platform, scalability, security, and reliability all degrade.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weak Tenant Isolation:&lt;/strong&gt;&lt;br&gt;
Runners run untrusted code across teams. Without strong isolation, one compromised job can leak credentials or escalate attacks across tenants. Poor audit trails slow breach detection and hinder compliance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability Limits:&lt;/strong&gt;&lt;br&gt;
Static IP pools cause IPv4 exhaustion, and manual provisioning delays runner startup. Without elastic scaling, resources are wasted or pipelines queue up, killing developer velocity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Maintenance and Governance Overhead:&lt;/strong&gt;&lt;br&gt;
Uneven patching weakens security, infrastructure drift complicates troubleshooting, and audits become expensive and error-prone. Secure scaling demands centralized governance, consistent policy enforcement, and automation.&lt;/p&gt;

&lt;p&gt;In short, enterprises fail to scale GitHub Actions runners without a platform that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Centralizes multi-tenancy&lt;/li&gt;
&lt;li&gt;Automates lifecycle management&lt;/li&gt;
&lt;li&gt;Provides enterprise-grade observability and governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;But beware—over-centralization can kill flexibility and introduce new challenges.&lt;/p&gt;


&lt;h1&gt;Why GitHub Actions — And Why It’s Not Enough at Enterprise Scale&lt;/h1&gt;

&lt;p&gt;GitHub Actions is popular because it offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Deep GitHub integration:&lt;/strong&gt; triggers on PRs, branches, and tags with no extra logins, plus automatic secret and artifact handling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Extensible ecosystem:&lt;/strong&gt; thousands of marketplace actions simplify workflow creation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flexible runners:&lt;/strong&gt; GitHub-hosted runners for convenience, or self-hosted for control, cost savings, and compliance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Granular security:&lt;/strong&gt; native GitHub Apps, OIDC tokens, and fine-grained permissions enforce least privilege.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rapid scale:&lt;/strong&gt; pipelines at repo or org level enable smooth CI/CD growth.&lt;/li&gt;
&lt;/ul&gt;
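
&lt;p&gt;As a sketch of the OIDC-based least-privilege model from the list above, a workflow can exchange a GitHub OIDC token for temporary AWS credentials instead of storing static keys. The role ARN and region below are placeholders, not real values:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative workflow; the role ARN is a placeholder.
permissions:
  id-token: write   # allow the job to request a GitHub OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-ci-role
          aws-region: eu-west-1
      - run: aws sts get-caller-identity   # verify the temporary credentials
&lt;/code&gt;&lt;/pre&gt;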

&lt;p&gt;However, GitHub Actions alone can’t meet enterprise-scale demands. Enterprises require:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong tenant isolation and centralized governance across thousands of pipelines.&lt;/li&gt;
&lt;li&gt;A unified platform to avoid fragmented infrastructure and scaling bottlenecks.&lt;/li&gt;
&lt;li&gt;Fine-grained identity, network controls, and compliance enforcement.&lt;/li&gt;
&lt;li&gt;Automation for onboarding, patching, and auditing to reduce operational overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud providers like AWS supply the identity, networking, and automation building blocks (IAM/OIDC, VPC segmentation, EC2, EKS) needed to build secure, scalable, multi-tenant CI/CD platforms.&lt;/p&gt;


&lt;h1&gt;Existing Solutions and Why They Fall Short&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Actions Runner Controller (ARC)&lt;/strong&gt; runs ephemeral Kubernetes pods as GitHub runners, scaling dynamically with declarative config and Kubernetes-native integration. But:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kubernetes namespaces alone don’t provide strong security isolation.&lt;/li&gt;
&lt;li&gt;No native AWS IAM/OIDC integration.&lt;/li&gt;
&lt;li&gt;Lacks onboarding, governance, and audit automation.&lt;/li&gt;
&lt;li&gt;Network policy management is manual, increasing operational overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Terraform AWS GitHub Runner Module&lt;/strong&gt; provisions EC2 self-hosted runners with customizable AMIs, integrating well with IaC pipelines. However:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Typically deployed per team, causing fragmentation.&lt;/li&gt;
&lt;li&gt;No native multi-tenant isolation.&lt;/li&gt;
&lt;li&gt;Requires manual IAM and account setup.&lt;/li&gt;
&lt;li&gt;No onboarding or patching automation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Commercial Runner-as-a-Service&lt;/strong&gt; options offer simple UX, automatic scaling, and vendor-managed maintenance with SLAs, but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High costs at scale.&lt;/li&gt;
&lt;li&gt;Vendor lock-in risks.&lt;/li&gt;
&lt;li&gt;Limited multi-tenant isolation.&lt;/li&gt;
&lt;li&gt;Often don’t meet strict compliance requirements.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;Where ForgeMT Fits In&lt;/h1&gt;

&lt;p&gt;ForgeMT combines the best of these approaches to deliver an enterprise-ready platform:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Orchestrates ephemeral runners seamlessly.&lt;/li&gt;
&lt;li&gt;Uses AWS-native identity and network isolation (IAM/OIDC).&lt;/li&gt;
&lt;li&gt;Built-in governance with full lifecycle automation.&lt;/li&gt;
&lt;li&gt;Designed for large, security-focused organizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;ForgeMT doesn’t reinvent ARC or EC2 modules but extends them with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strict multi-tenant isolation:&lt;/strong&gt; Each team runs in a separate AWS account to contain blast radius. IAM/OIDC enforces least privilege. Calico CNI manages Kubernetes network segmentation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Full automation:&lt;/strong&gt; Tenant onboarding, runner patching, centralized monitoring, and drift remediation happen automatically, cutting manual toil and errors.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized control plane:&lt;/strong&gt; One dashboard securely manages all tenants with governance, audit logs, and compliance-ready traceability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost optimization:&lt;/strong&gt; Spot instances, warm pools, and autoscaling based on real-time metrics and spot prices reduce costs without sacrificing availability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-source transparency:&lt;/strong&gt; 100% open source—no vendor lock-in, no license fees, full customization freedom.&lt;/li&gt;
&lt;/ul&gt;
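
&lt;p&gt;The Kubernetes side of the segmentation above can be sketched with a standard NetworkPolicy, which Calico enforces. The namespace name is illustrative:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Default-deny ingress for one tenant namespace, so pods from
# other tenants cannot reach it. Namespace name is a placeholder.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: tenant-a-runners
spec:
  podSelector: {}      # applies to every pod in the namespace
  policyTypes:
    - Ingress          # no ingress rules listed, so all ingress is denied
&lt;/code&gt;&lt;/pre&gt;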

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe53c06e4uen18938cyez.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe53c06e4uen18938cyez.jpg" alt="10k ft view" width="800" height="635"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;Architecture Overview&lt;/h1&gt;

&lt;p&gt;At its core, ForgeMT is a centralized control plane that orchestrates ephemeral runner provisioning and lifecycle management across multiple tenants running on both EC2 and Kubernetes.&lt;/p&gt;
&lt;h2&gt;Key Components&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/github-aws-runners/terraform-aws-github-runner" rel="noopener noreferrer"&gt;Terraform module for EC2 runners&lt;/a&gt; — provisions ephemeral EC2 runners with autoscaling, spot/on-demand, and ephemeral lifecycle.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/actions/actions-runner-controller" rel="noopener noreferrer"&gt;Actions Runner Controller (ARC)&lt;/a&gt; — manages EKS-based runners as Kubernetes pods with tenant namespace isolation.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://opentofu.org/" rel="noopener noreferrer"&gt;OpenTofu + Terragrunt&lt;/a&gt; — Infrastructure as Code managing tenant/account/region deployments declaratively.&lt;/li&gt;
&lt;li&gt;IAM Trust Policies — secure runner access with ephemeral credentials via role assumption.&lt;/li&gt;
&lt;li&gt;Splunk &amp;amp; Observability — centralized logs and metrics per tenant.&lt;/li&gt;
&lt;li&gt;Teleport — secure SSH access to ephemeral runners for auditing and debugging.&lt;/li&gt;
&lt;li&gt;EKS + Calico CNI — scalable pod networking with strong tenant segmentation and minimal IP usage.&lt;/li&gt;
&lt;li&gt;EKS + Karpenter — demand-driven node autoscaling with spot and on-demand instances, plus warm pools.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf5xdxe5o5vt3f12lx7k.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxf5xdxe5o5vt3f12lx7k.jpg" alt="10k ft view multi tenants" width="800" height="701"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h1&gt;ForgeMT Control Plane&lt;/h1&gt;

&lt;p&gt;The control plane is the platform’s brain — managing runner provisioning, lifecycle, security, scaling, and observability.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Orchestration:&lt;/strong&gt; Decides when and where to spin up ephemeral runners (EC2 or Kubernetes pods).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-Tenant Isolation:&lt;/strong&gt; Isolates each tenant via dedicated AWS accounts or Kubernetes namespaces, IAM roles, and network policies.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Enforcement:&lt;/strong&gt; Applies hardened runner configurations, automates ephemeral credential rotation, and enforces least privilege.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scaling &amp;amp; Optimization:&lt;/strong&gt; Integrates with Karpenter and EC2 autoscaling to scale runners up/down with demand and cost awareness.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability &amp;amp; Governance:&lt;/strong&gt; Streams logs and metrics to Splunk; provides audit trails and compliance dashboards.&lt;/li&gt;
&lt;/ol&gt;


&lt;h1&gt;Runner Types and Usage&lt;/h1&gt;
&lt;h2&gt;Tenant Isolation&lt;/h2&gt;

&lt;p&gt;Each ForgeMT deployment is single-tenant and region-specific. IAM roles, policies, VPCs, and services are scoped exclusively to that tenant-region pair. This hard boundary prevents cross-tenant access, simplifies compliance, and minimizes blast radius.&lt;/p&gt;
&lt;h2&gt;EC2 Runners&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Ephemeral VMs booted from Forge-provided or tenant-custom AMIs.&lt;/li&gt;
&lt;li&gt;Jobs run directly on VMs or inside containers.&lt;/li&gt;
&lt;li&gt;IAM role assumption replaces static credentials.&lt;/li&gt;
&lt;li&gt;Terminated after each job to avoid drift or leaks.&lt;/li&gt;
&lt;/ul&gt;
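
&lt;p&gt;From a workflow’s point of view, a job targets one of these pools through &lt;code&gt;runs-on&lt;/code&gt; labels. The label names below are illustrative, not ForgeMT’s actual ones:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  build:
    # Hypothetical labels selecting a self-hosted ForgeMT EC2 pool
    runs-on: [self-hosted, forge-ec2-small]
    steps:
      - uses: actions/checkout@v4
      - run: make build
&lt;/code&gt;&lt;/pre&gt;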

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcbtubvwhxfmlrl3fiai.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frcbtubvwhxfmlrl3fiai.jpg" alt="EC2 runner" width="800" height="713"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;EKS Runners&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Managed by ARC as Kubernetes pods in tenant namespaces.&lt;/li&gt;
&lt;li&gt;Images pulled from Forge or tenant ECR repositories.&lt;/li&gt;
&lt;li&gt;Scales dynamically for burst workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9pkf6g69uz87ovz0vrx.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq9pkf6g69uz87ovz0vrx.jpg" alt="EKS runner" width="800" height="703"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;Warm Pools and Limits&lt;/h2&gt;

&lt;p&gt;ForgeMT supports warm pools of pre-initialized runners to minimize cold start latency—especially beneficial for EC2 runners with slower boot times.&lt;/p&gt;

&lt;p&gt;Per-tenant limits enforce:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Max concurrent runners&lt;/li&gt;
&lt;li&gt;Warm pool size&lt;/li&gt;
&lt;li&gt;Runner lifetime (auto-termination after jobs)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These controls prevent resource abuse and keep costs predictable.&lt;/p&gt;
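
&lt;p&gt;In the tenant config shown in the onboarding example below, &lt;code&gt;pool_config&lt;/code&gt; is where a warm pool would be declared. A sketch following the upstream terraform-aws-github-runner pool schema, with illustrative values:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative warm pool: keep 2 pre-initialized runners during
# business hours. Sizes and schedules are placeholders.
pool_config:
  - size: 2
    schedule_expression: cron(0 8 ? * MON-FRI *)   # scale the pool up
  - size: 0
    schedule_expression: cron(0 20 ? * MON-FRI *)  # drain after hours
&lt;/code&gt;&lt;/pre&gt;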


&lt;h1&gt;Tenant Onboarding&lt;/h1&gt;

&lt;p&gt;Deploying a new tenant is straightforward and fully automated via a single declarative config file, for example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;gh_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ghes_url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;
  &lt;span class="na"&gt;ghes_org&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cisco-open&lt;/span&gt;
&lt;span class="na"&gt;tenant&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;iam_roles_to_assume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;arn:aws:iam::123456789012:role/role_for_forge_runners&lt;/span&gt;
  &lt;span class="na"&gt;ecr_registries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;123456789012.dkr.ecr.eu-west-1.amazonaws.com&lt;/span&gt;
&lt;span class="na"&gt;ec2_runner_specs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;small&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ami_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;forge-gh-runner-v*&lt;/span&gt;
    &lt;span class="na"&gt;ami_owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;123456789012'&lt;/span&gt;
    &lt;span class="na"&gt;ami_kms_key_arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;
    &lt;span class="na"&gt;max_instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;instance_types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t2.small&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t2.medium&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t2.large&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t3.small&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t3.medium&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;t3.large&lt;/span&gt;
    &lt;span class="na"&gt;pool_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
    &lt;span class="na"&gt;volume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
      &lt;span class="na"&gt;iops&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
      &lt;span class="na"&gt;throughput&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;125&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gp3&lt;/span&gt;
  &lt;span class="na"&gt;large&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;ami_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;forge-gh-runner-v*&lt;/span&gt;
    &lt;span class="na"&gt;ami_owner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;123456789012'&lt;/span&gt;
    &lt;span class="na"&gt;ami_kms_key_arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;'&lt;/span&gt;
    &lt;span class="na"&gt;max_instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;instance_types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;c6i.8xlarge&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;c5.9xlarge&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;c5.12xlarge&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;c6i.12xlarge&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;c6i.16xlarge&lt;/span&gt;
    &lt;span class="na"&gt;pool_config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[]&lt;/span&gt;
    &lt;span class="na"&gt;volume&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
      &lt;span class="na"&gt;iops&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3000&lt;/span&gt;
      &lt;span class="na"&gt;throughput&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;125&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gp3&lt;/span&gt;
&lt;span class="na"&gt;arc_runner_specs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;dind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runner_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;max_runners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
      &lt;span class="na"&gt;min_runners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;scale_set_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dependabot&lt;/span&gt;
    &lt;span class="na"&gt;scale_set_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dind&lt;/span&gt;
    &lt;span class="na"&gt;container_actions_runner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;123456789012.dkr.ecr.eu-west-1.amazonaws.com/actions-runner:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_requests_cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;500m&lt;/span&gt;
    &lt;span class="na"&gt;container_requests_memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
    &lt;span class="na"&gt;container_limits_cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
    &lt;span class="na"&gt;container_limits_memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
    &lt;span class="na"&gt;volume_requests_storage_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gp2&lt;/span&gt;
    &lt;span class="na"&gt;volume_requests_storage_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
  &lt;span class="na"&gt;k8s&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;runner_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;max_runners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
      &lt;span class="na"&gt;min_runners&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;scale_set_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s&lt;/span&gt;
    &lt;span class="na"&gt;scale_set_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8s&lt;/span&gt;
    &lt;span class="na"&gt;container_actions_runner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;123456789012.dkr.ecr.eu-west-1.amazonaws.com/actions-runner:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_requests_cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;500m&lt;/span&gt;
    &lt;span class="na"&gt;container_requests_memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
    &lt;span class="na"&gt;container_limits_cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1'&lt;/span&gt;
    &lt;span class="na"&gt;container_limits_memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
    &lt;span class="na"&gt;volume_requests_storage_type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gp2&lt;/span&gt;
    &lt;span class="na"&gt;volume_requests_storage_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ForgeMT platform uses this config to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Provision tenant-specific AWS accounts and resources.&lt;/li&gt;
&lt;li&gt;Set IAM roles with least privilege trust policies.&lt;/li&gt;
&lt;li&gt;Configure GitHub integration and runner specs.&lt;/li&gt;
&lt;li&gt;Enforce tenant limits and runner types.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This automation enables &lt;strong&gt;zero-touch onboarding&lt;/strong&gt; with no manual AWS or GitHub setup required by the tenant.&lt;/p&gt;




&lt;h1&gt;Extensibility&lt;/h1&gt;

&lt;p&gt;ForgeMT lets tenants customize their environments and control runner access:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Custom AMIs&lt;/strong&gt; for EC2 runners with tenant-specific tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Private ECR repositories&lt;/strong&gt; to host container images for VMs or Kubernetes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant IAM roles&lt;/strong&gt; with trust policies so ForgeMT runners assume them securely without static keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced access patterns&lt;/strong&gt; like chained role assumptions or resource-based policies for complex needs.&lt;/li&gt;
&lt;/ul&gt;
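
&lt;p&gt;A tenant-side trust policy for the third bullet can be sketched in CloudFormation-style YAML. The principal ARN is a placeholder for the ForgeMT runner’s execution role, not a real value:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Hypothetical tenant role that ForgeMT runners may assume.
TenantRunnerRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            AWS: arn:aws:iam::123456789012:role/role_for_forge_runners
          Action: sts:AssumeRole
&lt;/code&gt;&lt;/pre&gt;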

&lt;p&gt;This lets each team tune cost, security, and performance independently without affecting core platform stability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas07rit1wmg6ale1zos1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas07rit1wmg6ale1zos1.jpg" alt=" " width="800" height="323"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;Security Model&lt;/h1&gt;

&lt;p&gt;ForgeMT’s foundation is strong isolation and ephemeral execution to reduce risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dedicated IAM roles, namespaces, and AWS accounts&lt;/strong&gt; per tenant.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No cross-tenant visibility or access.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral runners&lt;/strong&gt; destroyed immediately after job completion to prevent credential or data leakage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporary credentials via IAM role assumption&lt;/strong&gt; replace static AWS keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained access control&lt;/strong&gt; configurable by tenants for resource permissions.&lt;/li&gt;
&lt;li&gt;Full audit trail of provisioning, execution, and shutdown logged via CloudWatch → Splunk.&lt;/li&gt;
&lt;li&gt;Meets CIS Benchmarks and internal security policies.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;Debugging in a Secure, Ephemeral World&lt;/h1&gt;

&lt;p&gt;Ephemeral runners mean persistent debugging isn’t possible by design, but ForgeMT offers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Live debugging with Teleport:&lt;/strong&gt; Keep runners alive temporarily via workflow tweaks to enable SSH into running jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reproducible reruns:&lt;/strong&gt; Failed jobs can be rerun identically from GitHub UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Log-based troubleshooting:&lt;/strong&gt; Access runner telemetry, syslogs, and job logs centrally without infrastructure exposure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes support:&lt;/strong&gt; Same debugging mechanisms apply to EKS runners, preserving isolation and auditability.&lt;/li&gt;
&lt;/ul&gt;
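
&lt;p&gt;The “workflow tweak” for live debugging is typically a conditional step that holds the ephemeral runner open after a failure. A minimal sketch; the label and sleep duration are arbitrary:&lt;/p&gt;

&lt;pre class="highlight yaml"&gt;&lt;code&gt;jobs:
  build:
    runs-on: [self-hosted]   # illustrative label
    steps:
      - run: make test
      - name: Hold runner for SSH debugging
        if: failure()        # only when a previous step failed
        run: sleep 1800      # keep the runner alive for 30 minutes
&lt;/code&gt;&lt;/pre&gt;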




&lt;h1&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;ForgeMT is likely overkill for small teams. Start simple with ephemeral runners (EC2 or ARC), GitHub Actions, and Terraform automation. Only scale up when you hit real pain points. ForgeMT shines in multi-team environments where tenant isolation, governance, and platform automation are mission-critical. For solo teams, it just adds unnecessary complexity.&lt;/p&gt;

&lt;p&gt;ForgeMT addresses the major enterprise challenges of running GitHub Actions runners at scale by delivering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Strong multi-tenant isolation&lt;/li&gt;
&lt;li&gt;Fully automated lifecycle management and governance&lt;/li&gt;
&lt;li&gt;Flexible runner types with cost-aware autoscaling and warm pools&lt;/li&gt;
&lt;li&gt;Secure, ephemeral environments that meet compliance needs&lt;/li&gt;
&lt;li&gt;An open-source, extensible platform for customization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For organizations struggling to scale self-hosted runners securely and efficiently on AWS, ForgeMT provides a battle-tested, transparent platform that combines AWS best practices with developer-friendly automation.&lt;/p&gt;




&lt;h2&gt;Dive Into the ForgeMT Project&lt;/h2&gt;

&lt;p&gt;Ideas are cheap — execution is what counts. ForgeMT’s source code is public — check it out:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/cisco-open/forge/" rel="noopener noreferrer"&gt;https://github.com/cisco-open/forge/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐️ If you find it useful, don’t forget to drop a star!&lt;/p&gt;




&lt;h2&gt;🤝 Connect&lt;/h2&gt;

&lt;p&gt;Let’s connect on &lt;a href="https://www.linkedin.com/in/edersonbrilhante/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://github.com/edersonbrilhante" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>githubactions</category>
      <category>devops</category>
      <category>cicd</category>
      <category>aws</category>
    </item>
    <item>
      <title>Learn how ForgeMT simplifies multi-tenant GitHub Actions runners with security, scalability, and automation. Read the full case study to see how it can streamline your CI/CD pipelines:</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Sat, 17 May 2025 10:14:18 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/learn-how-forgemt-simplifies-multi-tenant-github-actions-runners-with-security-scalability-and-215a</link>
      <guid>https://dev.to/edersonbrilhante/learn-how-forgemt-simplifies-multi-tenant-github-actions-runners-with-security-scalability-and-215a</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735" class="crayons-story__hidden-navigation-link"&gt;ForgeMT: A Scalable, Secure Multi-Tenant GitHub Runner Platform at Cisco&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/edersonbrilhante" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F606500%2F67cc9470-d75a-4b86-bb9f-07329fb2558a.jpeg" alt="edersonbrilhante profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/edersonbrilhante" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Ederson Brilhante
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Ederson Brilhante
                
              
              &lt;div id="story-author-preview-content-2493311" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/edersonbrilhante" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F606500%2F67cc9470-d75a-4b86-bb9f-07329fb2558a.jpeg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Ederson Brilhante&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;May 16 '25&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735" id="article-link-2493311"&gt;
          ForgeMT: A Scalable, Secure Multi-Tenant GitHub Runner Platform at Cisco
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/terraform"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;terraform&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/devops"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;devops&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/platformengineering"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;platformengineering&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/githubactions"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;githubactions&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;10&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              1&lt;span class="hidden s:inline"&gt; comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            14 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>terraform</category>
      <category>devops</category>
      <category>platformengineering</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>ForgeMT: A Scalable, Secure Multi-Tenant GitHub Runner Platform at Cisco</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Fri, 16 May 2025 08:55:41 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735</link>
      <guid>https://dev.to/edersonbrilhante/forgemt-a-scalable-secure-multi-tenant-github-runner-platform-at-cisco-735</guid>
      <description>&lt;h2&gt;
  
  
  🧭 Why ForgeMT Exists
&lt;/h2&gt;

&lt;p&gt;ForgeMT is a centralized platform that enables engineering teams to run GitHub Actions securely and efficiently — without building or managing their own CI infrastructure.&lt;/p&gt;

&lt;p&gt;It provides ephemeral runners (EC2 or Kubernetes), strict tenant isolation, and full automation behind a hardened, shared control plane.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before ForgeMT&lt;/strong&gt;, every team in Cisco’s Security Business Group had to build and maintain their own CI setup — leading to duplicated effort, inconsistent security, slow onboarding, and rising operational overhead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ForgeMT&lt;/strong&gt; replaced this fragmented approach with a secure, scalable, multi-tenant platform — saving time, reducing risk, and accelerating adoption.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚡ Fast Facts (ForgeMT Impact)
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;⏱️ 80+ engineering hours saved/month per team&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;📦 40,000+ GitHub Actions jobs/month&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;✅ 99.9% success rate across tenants&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;This post explains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;🚀 Why ForgeMT was needed&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;💼 What impact it had:&lt;/strong&gt; from reliability to cost savings and security compliance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧱 How it works:&lt;/strong&gt; a deep dive into the architecture&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Jump to what matters most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;💼 Business Impact&lt;/strong&gt; – For leadership and stakeholders&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🏗️ Architecture&lt;/strong&gt; – For platform engineers and DevOps&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧠 Or keep reading&lt;/strong&gt; for full technical context and background&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  🚨 From Fragmented CI to Scalable, Secure Solutions: The Journey to ForgeMT
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;Credit: prototype by Matthew Giassa (MASc, EIT), who championed the Philips Labs GitHub Runner module across multiple teams.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Before ForgeMT, each team used its own CI stack—Jenkins, Travis, or Concourse. While these tools met local needs, they created long-term issues: inconsistent patching, security gaps, and poor scalability.&lt;/p&gt;

&lt;p&gt;Matthew built a promising PoC, but it was a siloed setup with manual AWS, GitHub, and Terraform steps. Rigid subnetting caused IPv4 exhaustion, and teams copy-pasting Terraform modules led to high maintenance overhead and config drift.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s3dw0mqyuv1kpi2g553.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2s3dw0mqyuv1kpi2g553.jpeg" alt="Before: Siloed Team Runners" width="800" height="653"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;To address this complexity, I drove the end‑to‑end technical design and implementation of ForgeMT&lt;/strong&gt;—a centralized, multi‑tenant GitHub Actions runner service on AWS—while coordinating with infrastructure, security, and platform stakeholders to ensure a smooth production launch.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69k0jhzeo620078h7ita.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69k0jhzeo620078h7ita.png" alt="After: 10k feet view" width="800" height="782"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At scale, teams were running thousands of Actions jobs across dozens of isolated environments—each with its own patch cadence, network quirks, and IAM policies. ForgeMT unifies these into a single control plane, delivering consistent security, predictable performance, and dramatically simplified operations.&lt;/p&gt;

&lt;p&gt;For detailed business impact metrics (time saved, reliability gains, cost optimization), see the Business Impact section.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It builds on proven ephemeral EC2 and EKS/ARC runner modules, adding:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;IAM/OIDC-based tenant isolation&lt;/li&gt;
&lt;li&gt;Built-in observability (metrics, logs, dashboards)&lt;/li&gt;
&lt;li&gt;Automation for patching, Terraform drift, repo onboarding, and global Actions locks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By consolidating infrastructure into a hardened control plane, ForgeMT kept security and compliance at the forefront while enabling rapid onboarding, eliminating manual patching, and solving IPv4 exhaustion. Pod-based runners scale via EKS + Calico CNI, with tenant isolation enforced through IAM roles and security groups (SGs). The result preserves the flexibility of the original prototype while delivering a secure, compliant, and scalable platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71lzhc49vpvee1lutwr5.jpeg" alt="After: Centralized ForgeMT Control Plane" width="800" height="782"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  📊 Business Impact
&lt;/h2&gt;

&lt;p&gt;ForgeMT has not only met the demands of various teams but also helped scale securely under Cisco’s guidance, optimizing cloud spend and increasing reliability across all stakeholders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dramatic time savings (80+ hours/month per team):&lt;/strong&gt; By automating every aspect of the runner lifecycle—OS patching, Terraform module updates, ephemeral provisioning, and even repository registration—teams were freed from manual CI maintenance and could refocus on shipping features.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized cloud spend:&lt;/strong&gt; A mix of Spot and On-Demand Instances, right-sized instance selection per job type, and EKS + Calico’s IP-efficient networking cut infrastructure costs without slowing builds. ForgeMT also supports warm instance pools for high-frequency jobs, avoiding cold starts when speed is critical and striking a smart balance between performance and cost.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rock‑solid reliability (99.9% success over 40K+ jobs/month):&lt;/strong&gt; Centralizing infrastructure eliminated snowflake environments and drift, reducing job failures caused by misconfiguration or stale runners to near zero.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise‑grade security &amp;amp; compliance:&lt;/strong&gt; IAM/OIDC per‑tenant isolation, CIS‑benchmarked AMIs, and end‑to‑end logging into Splunk ensured every action was auditable, vault‑grade credentials were never exposed, and internal audits passed with zero findings.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;True multi‑tenancy at scale:&lt;/strong&gt; Teams retain autonomy over AMIs, ECRs, and workflow definitions while ForgeMT transparently handles networking, isolation, and autoscaling—supporting dozens of teams without additional IP consumption or operational overhead.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS account isolation per tenant:&lt;/strong&gt; Each tenant can have one or more individual AWS accounts, with full control over their own network setup. This includes the flexibility to configure internal or public subnets within their AWS accounts, ensuring strong security boundaries and independent resource management without ForgeMT managing their network.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together, these outcomes turned a fractured, high‑toil CI landscape into a self‑service platform that scales securely, reduces costs, and accelerates delivery.&lt;/p&gt;




&lt;h2&gt;
  
  
  ⚙️ ForgeMT Architecture Overview
&lt;/h2&gt;

&lt;h3&gt;
  
  
  📦 Core Components &amp;amp; Technical Foundations
&lt;/h3&gt;

&lt;p&gt;These results are enabled by the following technical components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/github-aws-runners/terraform-aws-github-runner" rel="noopener noreferrer"&gt;Terraform module for EC2 runners:&lt;/a&gt; Utilized as a Terraform module to provision ephemeral EC2-based GitHub Actions runners, supporting auto-scaling and cost optimization by using AWS spot and on-demand instances. This setup ensures that runners are created on-demand and terminated after use, aligning with the ephemeral nature of ForgeMT's infrastructure. &lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/actions/actions-runner-controller" rel="noopener noreferrer"&gt;ARC (Actions Runner Controller)&lt;/a&gt;: Employed to manage EKS-based GitHub Actions runners, enabling containerized, isolated job execution via Kubernetes. This approach leverages Kubernetes' orchestration capabilities for efficient scaling and management of CI/CD workloads.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://opentofu.org/" rel="noopener noreferrer"&gt;OpenTofu&lt;/a&gt; + &lt;a href="https://terragrunt.gruntwork.io/" rel="noopener noreferrer"&gt;Terragrunt&lt;/a&gt;: Implemented for Infrastructure as Code (IaC), ensuring region-, account-, and tenant-specific infrastructure deployments with DRY (Don't Repeat Yourself) principles. This methodology facilitates consistent and repeatable infrastructure provisioning across multiple environments.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/blogs/security/how-to-use-trust-policies-with-iam-roles/" rel="noopener noreferrer"&gt;IAM Trust Policies&lt;/a&gt;: Adopted to secure runner access using short-lived credentials via IAM roles and trust relationships, eliminating the need for static credentials and enhancing security.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://www.splunk.com/en_us/download/o11y-cloud-free-trial.html" rel="noopener noreferrer"&gt;Splunk Cloud&lt;/a&gt; &amp;amp; &lt;a href="https://www.splunk.com/en_us/download/o11y-cloud-free-trial.html" rel="noopener noreferrer"&gt;O11y(Observability)&lt;/a&gt;: Integrated for centralized logging and metrics aggregation, providing real-time observability across ForgeMT components. This setup enables detailed telemetry, including per-tenant dashboards for monitoring resource usage and optimization insights.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://goteleport.com/" rel="noopener noreferrer"&gt;Teleport&lt;/a&gt;: Utilized to provide secure, auditable SSH access to EC2 runners and Kubernetes pods, enhancing compliance, access control, and auditing capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/eks/" rel="noopener noreferrer"&gt;EKS&lt;/a&gt; + &lt;a href="https://docs.tigera.io/calico/latest/about/" rel="noopener noreferrer"&gt;Calico CNI&lt;/a&gt;: Leveraged to scale pod provisioning without consuming additional VPC IPs, utilizing Calico's efficient networking. This setup ensures tenant isolation and optimizes network resource usage within limited VPC subnets.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://aws.amazon.com/eks/" rel="noopener noreferrer"&gt;EKS&lt;/a&gt; + &lt;a href="https://karpenter.sh/" rel="noopener noreferrer"&gt;Karpenter&lt;/a&gt;: Enables dynamic, demand-driven autoscaling of Kubernetes worker nodes. Automatically provisions the most suitable and cost-effective EC2 instance types based on real-time pod requirements. Supports spot and on-demand capacity, prioritizing efficiency and performance. Warm pools can be configured to reduce cold start latency while maintaining cost control—ideal for high-churn CI/CD workloads.&lt;/li&gt;
&lt;/ul&gt;
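&lt;p&gt;To make the trust-policy pattern above concrete, here is a minimal sketch of the JSON document a tenant role might carry so the runner role can assume it with short-lived STS credentials instead of static access keys. The ARN, external ID, and helper name are illustrative assumptions, not ForgeMT's actual configuration.&lt;/p&gt;

```python
import json

def runner_trust_policy(forge_runner_role_arn: str, external_id: str) -> dict:
    """Trust policy letting the ForgeMT runner role assume this tenant role
    via sts:AssumeRole, gated by an external ID (hypothetical example)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": forge_runner_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }

policy = runner_trust_policy(
    "arn:aws:iam::111111111111:role/forge-runner", "tenant-a"
)
print(json.dumps(policy, indent=2))
```

&lt;p&gt;Because the runner authenticates by assuming the role, rotating or revoking access is a policy change, not a credential hunt.&lt;/p&gt;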

&lt;p&gt;These technologies form the backbone of ForgeMT, enabling its robust performance and scalability.&lt;/p&gt;




&lt;h3&gt;
  
  
  🧠 ForgeMT Control Plane (Managed by Forge Team)
&lt;/h3&gt;

&lt;p&gt;The ForgeMT control plane hosts shared infrastructure and reusable IaC modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;ForgeMT GitHub App:&lt;/strong&gt; Installed on tenant repositories to listen for GitHub workflow events and dynamically register ephemeral runners.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ForgeMT AMIs &amp;amp; Forge ECR:&lt;/strong&gt; Default base images for runners (VMs and containers).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform Modules:&lt;/strong&gt; Each tenant-region pair deploys an isolated ForgeMT instance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API Gateway + Lambda:&lt;/strong&gt; Processes GitHub webhook jobs to trigger runner provisioning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Logging:&lt;/strong&gt; Runner logs are forwarded to CloudWatch, then into Splunk Cloud Platform.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Observability:&lt;/strong&gt; All AWS metrics are sent to Splunk O11y Cloud.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Teleport:&lt;/strong&gt; Secure, role-based SSH access to VM runners (if needed), with session logging.&lt;/li&gt;
&lt;/ul&gt;
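&lt;p&gt;The webhook path above can be pictured as a small Lambda-style handler: verify GitHub's HMAC signature, then act only on queued workflow_job events. This is a hedged sketch under assumed handler and response shapes, not the actual ForgeMT Lambda.&lt;/p&gt;

```python
import hashlib
import hmac
import json

def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    # GitHub sends "sha256=" plus a hex HMAC digest in X-Hub-Signature-256.
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def handle_webhook(secret: bytes, headers: dict, body: bytes) -> dict:
    """Accept only authenticated, queued workflow_job events."""
    if not verify_signature(secret, body, headers.get("X-Hub-Signature-256", "")):
        return {"statusCode": 401, "body": "bad signature"}
    event = json.loads(body)
    if headers.get("X-GitHub-Event") == "workflow_job" and event.get("action") == "queued":
        # Here the control plane would trigger ephemeral runner provisioning
        # for the tenant that owns event["repository"].
        return {"statusCode": 200, "body": "provisioning requested"}
    return {"statusCode": 200, "body": "ignored"}
```

&lt;p&gt;Rejecting unsigned payloads before parsing keeps the provisioning path closed to anyone who merely discovers the API Gateway URL.&lt;/p&gt;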




&lt;h3&gt;
  
  
  🏗️ Tenant Isolation
&lt;/h3&gt;

&lt;p&gt;Each ForgeMT deployment is dedicated to a single tenant, ensuring full isolation within a specific AWS region. This approach guarantees that IAM roles, policies, services, and AWS resources are scoped uniquely for each tenant-region pair, enforcing strict security, compliance, and minimizing the blast radius.&lt;/p&gt;




&lt;h3&gt;
  
  
  💻 Runner Types
&lt;/h3&gt;

&lt;h4&gt;
  
  
  🧱 AWS EC2-Based Runners (VM and Metal)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral Runner Provisioning:&lt;/strong&gt; EC2 runners are provisioned using Forge-provided AMIs or tenant-specific custom AMIs. These instances are pre-configured with the necessary tools to execute CI/CD jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Workload Execution:&lt;/strong&gt; Jobs can be executed directly on the EC2 instance or via containers, using &lt;code&gt;container:&lt;/code&gt; blocks in GitHub workflows.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security:&lt;/strong&gt; Authentication to tenant AWS resources is handled through IAM roles and trust policies, eliminating the need for static credentials and ensuring dynamic, secure access control.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ephemeral Nature:&lt;/strong&gt; Once a job is completed, the EC2 instance is terminated, maintaining a completely stateless environment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11dsb73l5s74zu4jgidq.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F11dsb73l5s74zu4jgidq.jpeg" alt="Forge Control Plane for AWS EC2 Runners&amp;lt;br&amp;gt;
" width="800" height="849"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  ☸️ EKS-Based Runners (Kubernetes)
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-Orchestrated Actions:&lt;/strong&gt; Using the Actions Runner Controller (ARC), EKS runners are provisioned as pods within an Amazon EKS cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resource Isolation:&lt;/strong&gt; Each tenant is assigned a dedicated namespace, service account, and IAM role, ensuring strict isolation of resources and permissions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Container Images:&lt;/strong&gt; Runners can pull container images from either the Forge ECR or the tenant’s own ECR, depending on the configuration.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scalability:&lt;/strong&gt; EKS is ideal for high-scale operations, leveraging Kubernetes' orchestration capabilities to manage the lifecycle of runners efficiently.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy234v7u4zcqkhut34yn.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpy234v7u4zcqkhut34yn.jpeg" alt="Forge Control Plane for K8S Pod Runners" width="800" height="674"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  🔁 Warm Pool
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reducing Startup Latency:&lt;/strong&gt; An optional warm pool can be configured for both EC2 and EKS runners, pre-initializing instances or pods to reduce waiting times during high demand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Less critical for EKS:&lt;/strong&gt; Kubernetes already provides rapid scaling and efficient pod initialization, so EKS runners rarely need a warm pool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Usage in EC2:&lt;/strong&gt; The warm pool helps minimize the initialization time for EC2 instances, resulting in faster job execution times for critical tasks.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  💻 Examples of Runner Types in ForgeMT
&lt;/h3&gt;

&lt;p&gt;ForgeMT offers flexibility for tenants to configure multiple runner types simultaneously, adapting to their workload needs. Each tenant can define as many runners as needed, with a parallelism limit set per tenant and runner type. Here are some typical runner examples and their use cases:&lt;/p&gt;
&lt;h4&gt;
  
  
  🧱 EC2 Runners
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Small:&lt;/strong&gt; Lightweight instances for tasks with minimal resource usage, such as quick tests or linting.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standard:&lt;/strong&gt; Instances for balanced workloads, ideal for code compilation or integration tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Large:&lt;/strong&gt; High-performance instances for tasks requiring more processing power, such as complex builds or load tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bare Metal:&lt;/strong&gt; Bare-metal instances for applications that need full control over the hardware, such as simulations or intensive processing tasks.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  ☸️ Kubernetes Runners
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dependabot:&lt;/strong&gt; Used for automated dependency update jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Light (k8s):&lt;/strong&gt; Runners for simple tasks that don't require Docker, like linting or unit test execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker-in-Docker (DinD):&lt;/strong&gt; Used for jobs that require Docker inside Kubernetes, such as image building or integration tests involving containers.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔄 Configurable Parallelism per Tenant and Runner Type
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Each &lt;strong&gt;tenant&lt;/strong&gt; can configure their own set of runners and use different EC2 instance types or Kubernetes pods simultaneously.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;parallelism limit&lt;/strong&gt; can be configured per runner type and tenant, ensuring that running multiple jobs does not overload resources.&lt;/li&gt;
&lt;li&gt;This allows each team to run jobs in parallel based on their needs without impacting the performance of other tenants or jobs.&lt;/li&gt;
&lt;/ul&gt;
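&lt;p&gt;The parallelism model above can be pictured as a simple admission check keyed by tenant and runner type; the class, tenant names, and limits below are made up for illustration.&lt;/p&gt;

```python
from collections import defaultdict

class ParallelismGate:
    """Tracks active jobs per (tenant, runner type) and enforces a cap,
    so one tenant's burst cannot starve other tenants or runner types."""

    def __init__(self, limits: dict):
        self.limits = limits              # (tenant, runner_type) to max parallel jobs
        self.active = defaultdict(int)

    def try_acquire(self, tenant: str, runner_type: str) -> bool:
        key = (tenant, runner_type)
        if self.active[key] >= self.limits.get(key, 0):
            return False                  # at capacity: the job stays queued
        self.active[key] += 1
        return True

    def release(self, tenant: str, runner_type: str):
        self.active[(tenant, runner_type)] -= 1

gate = ParallelismGate({("team-a", "ec2-large"): 2})
print([gate.try_acquire("team-a", "ec2-large") for _ in range(3)])  # [True, True, False]
```

&lt;p&gt;A queued job simply retries acquisition later, which is why a burst from one team degrades only its own queue depth.&lt;/p&gt;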
&lt;h4&gt;
  
  
  Considerations
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Choosing the Right Runner:&lt;/strong&gt; Depending on workload complexity and job requirements, you may choose EC2 or EKS runners. EKS is generally preferred for lightweight, scalable workloads, while EC2 may be necessary for jobs with specific hardware or memory requirements.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  ⚙️ GitHub Integration
&lt;/h3&gt;

&lt;p&gt;GitHub events trigger ForgeMT through a webhook via API Gateway, dynamically registering runners into the appropriate GitHub Runner Groups associated with the tenant. The runner lifecycle is designed to be ephemeral: runners are registered just-in-time for job execution and are destroyed once the job is completed. When a new repository is installed, it is automatically registered with the correct GitHub Runner Group, ensuring seamless integration with the right tenant's runners.&lt;/p&gt;
&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1nhvzmwcst3cngxv9h4.jpeg" alt="Github Integration Forge Control Plane" width="800" height="1314"&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🔌 Extensibility
&lt;/h3&gt;

&lt;p&gt;Each tenant account can optionally manage the following resources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tenant AMIs (for AWS EC2 runners):&lt;/strong&gt; Custom-built images with pre-installed tooling tailored to the tenant's specific requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant ECR:&lt;/strong&gt; Houses custom container images used for VM-based container jobs, GitHub composite actions, or full pod images in EKS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant IAM Role:&lt;/strong&gt; Configured with trust relationships to allow ForgeMT runners to securely assume roles without the need for AWS access keys.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Custom image ownership:&lt;/strong&gt; ForgeMT provides base images as a starting point, but tenants are responsible for building and maintaining any custom AMI or container image they require. Once a custom image is ready, it is shared with the ForgeMT accounts and integrated into the platform.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  🔄 Optional Configurations
&lt;/h4&gt;

&lt;p&gt;Tenants can choose to configure the following based on their specific needs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Accessing AWS Resources via Runners:&lt;/strong&gt; To enable runners to interact with AWS services within the tenant's account, an IAM role must be established with a trust relationship permitting ForgeMT to assume it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pulling Images from Tenant ECR:&lt;/strong&gt; If runners need to pull images from the tenant's ECR—be it for container jobs, composite actions, or Kubernetes pods—the tenant must configure appropriate repository policies and IAM permissions to allow these operations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accessing Additional Tenant Resources:&lt;/strong&gt; For runners to access other AWS resources within the tenant's account, the IAM role assumed by ForgeMT must have policies granting the necessary permissions. This might involve setting up a chain of role assumptions or defining specific resource-based policies.&lt;/li&gt;
&lt;/ul&gt;
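&lt;p&gt;For the chain of role assumptions mentioned above, one way to reason about it is as an ordered list of STS AssumeRole calls, where each hop runs with the temporary credentials returned by the previous one. The sketch below only assembles the call parameters (the ARNs and session naming are invented); the real platform would feed each parameter set to STS in turn.&lt;/p&gt;

```python
def assume_role_chain(role_arns: list, session_name: str) -> list:
    """Ordered AssumeRole parameter sets for a chained assumption:
    hop 0 is entered with the runner's own role, each later hop with
    the temporary credentials returned by the hop before it."""
    calls = []
    for hop, arn in enumerate(role_arns):
        calls.append({
            "RoleArn": arn,
            "RoleSessionName": f"{session_name}-hop{hop}",
            "DurationSeconds": 3600,  # short-lived by design
        })
    return calls

chain = assume_role_chain(
    ["arn:aws:iam::111111111111:role/tenant-entry",
     "arn:aws:iam::222222222222:role/data-access"],
    "forge-job-42",
)
```

&lt;p&gt;Keeping the session name stable across hops makes the whole chain easy to trace in CloudTrail.&lt;/p&gt;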
&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8i4l0u09vehborjbdtdk.jpeg" alt="Tenant's AWS Account Integration with ForgeMT Control Plane" width="800" height="368"&gt;
&lt;/h2&gt;
&lt;h2&gt;
  
  
  📊 Observability: Splunk Cloud &amp;amp; O11y
&lt;/h2&gt;

&lt;p&gt;ForgeMT delivers full-stack observability with centralized logging and per-tenant metrics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Centralized logging:&lt;/strong&gt; All relevant logs — syslog, AWS EC2 user data, GitHub runner job logs, worker logs, and agent logs — are sent to CloudWatch Logs and forwarded to Splunk Cloud for full visibility and auditability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metrics via Splunk O11y:&lt;/strong&gt; Captures detailed telemetry.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-tenant dashboards:&lt;/strong&gt; Each team gets dedicated dashboards showing cost breakdowns, resource usage, and optimization insights (e.g., high-memory job detection).&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdll4w5y6ze3qonrl0k3w.jpeg" alt="ForgeMT Control Plane integration with Splunk Cloud" width="800" height="613"&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🔐 Security Model
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Strong tenant isolation:&lt;/strong&gt; Every tenant has its own IAM roles, namespaces, and resources.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM Role Assumption:&lt;/strong&gt; Eliminates use of long-lived AWS credentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No cross-tenant visibility:&lt;/strong&gt; Runners cannot access other tenant workloads or secrets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fine-grained access control:&lt;/strong&gt; Each tenant defines what their runners can access by configuring the IAM role being assumed—this can include direct resource access or chained role assumptions for more advanced patterns.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔒 Ephemeral Isolation:&lt;/strong&gt; ForgeMT runners are automatically destroyed after every job — success or failure. This guarantees a clean slate every time, eliminates environment drift, blocks credential persistence, and prevents resource leaks by default.&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  🛡️ Compliance &amp;amp; Observability
&lt;/h3&gt;

&lt;p&gt;ForgeMT ensures strict compliance and security throughout the lifecycle of its ephemeral runners, from provisioning to execution and shutdown.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Full Audit Trail:&lt;/strong&gt; Every runner lifecycle event — including provisioning, execution, and shutdown — is logged, ensuring complete visibility and traceability for compliance audits. This audit trail is vital for maintaining transparency in high-security environments.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CloudWatch → Splunk Integration:&lt;/strong&gt; Logs from the runners are forwarded from CloudWatch to Splunk, enabling teams to perform real-time queries on logs. This integration supports compliance audits by providing detailed, queryable logs that can be easily reviewed and accessed for regulatory requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;IAM Integration:&lt;/strong&gt; By using IAM (Identity and Access Management), ForgeMT eliminates the use of hardcoded credentials or AWS long-term access keys. This significantly reduces the risk of unauthorized access and enhances security by enforcing role-based access and temporary credentials that follow the principle of least privilege.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Standards Compliance:&lt;/strong&gt; ForgeMT meets internal security standards, which are aligned with industry best practices such as CIS Benchmarks. This ensures that the platform adheres to rigorous security controls and provides a secure environment for multi-tenant workloads.&lt;/li&gt;
&lt;/ul&gt;
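&lt;p&gt;As a hedged illustration of the kind of query this integration enables (the index, sourcetype, and field names are hypothetical and depend on how log forwarding is configured), a team could summarise runner lifecycle events per tenant directly in Splunk:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;index=forge_runners sourcetype=runner_lifecycle
| stats count by tenant, lifecycle_event
| sort - count
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;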
&lt;h3&gt;
  
  
  🔍 Debugging Securely and Effectively
&lt;/h3&gt;

&lt;p&gt;ForgeMT offers teams the option to choose between EC2 Spot Instances and On-Demand Instances, allowing for flexibility in cost optimization. While Spot Instances can provide significant cost savings, they come with the inherent risk that AWS may reclaim the instance at any time. Teams are responsible for evaluating this risk and determining whether to use Spot or On-Demand Instances based on the criticality of their workloads.&lt;/p&gt;

&lt;p&gt;Given ForgeMT's design of ephemeral runners, which are terminated immediately after each job to prevent state persistence and credential leakage, debugging presents unique challenges. The platform addresses them in a few ways.&lt;/p&gt;

&lt;p&gt;For real-time debugging, developers can access running jobs via &lt;strong&gt;Teleport&lt;/strong&gt;. By including a sleep step in the workflow or using a custom wrapper, the runner can be kept alive temporarily. This allows for manual inspection and troubleshooting while the job is still running.&lt;/p&gt;
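&lt;p&gt;A minimal sketch of the sleep-step approach (the runner label is a placeholder and the timeout is arbitrary; adjust both to your setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jobs:
  build:
    runs-on: [self-hosted, forge-tenant-a]
    steps:
      - uses: actions/checkout@v4
      - name: Build
        run: make build
      - name: Keep runner alive for debugging
        if: failure()
        run: sleep 1800  # 30 minutes to attach via Teleport before teardown
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;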

&lt;p&gt;Additionally, even without live access, ForgeMT maintains comprehensive observability. Teams can rely on syslogs, GitHub Actions job logs, and runner-level telemetry to understand job behavior. Every job runs in a fully reproducible environment, meaning developers can simply rerun failed jobs through the GitHub UI, replicating the exact conditions without side effects while maintaining full auditability.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;Kubernetes-based runners&lt;/strong&gt;, the same debugging approach applies: &lt;strong&gt;Teleport&lt;/strong&gt; can be used for live access to running jobs. The integration with Kubernetes allows teams to extend the same debugging capabilities while leveraging the scalability and flexibility of the containerized environment.&lt;/p&gt;


&lt;h3&gt;
  
  
  🚀 ForgeMT: Powering Tenants with Flexibility and Control
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;💥 Ephemeral by design —&lt;/strong&gt; Runners are created per job and disappear afterward. No drift. No patching. No residual garbage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🛠️ Infra-as-Code from top to bottom —&lt;/strong&gt; Fully automated. Declarative. Version-controlled. No snowflakes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🔐 Strong isolation baked in —&lt;/strong&gt; IAM, OIDC, and security group segmentation per tenant. No cross-tenant blast radius.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📦 Run anything, per tenant —&lt;/strong&gt; EC2 or EKS. k8s, dind, or metal. Each tenant defines their own mix.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🚦 Control usage at scale —&lt;/strong&gt; Enforce parallelism limits per tenant/type. No surprises. No abuse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🕹️ Custom policies, zero effort —&lt;/strong&gt; Tenants define autoscaling, labels, and configurations via GitHub — no AWS skills required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧘 No infra for tenants to manage —&lt;/strong&gt; No patching, no VPCs, no accounts. Just push code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🕵️ Observability without ownership —&lt;/strong&gt; Logs, metrics, and traces exposed per tenant. No nodes to babysit.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;⚡ Fast time-to-first-run —&lt;/strong&gt; Cold starts optimized. Most runners boot in &amp;lt;20s, even for large jobs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🌎 Network-aware provisioning —&lt;/strong&gt; Runners automatically deploy into the correct subnet, zone, or region.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;📊 Usage-aware scaling —&lt;/strong&gt; Instance types are selected based on cost/performance tradeoffs — no more overprovisioning by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🧩 GitHub-native workflows —&lt;/strong&gt; No toolchain rewrites required. Just drop in the runs-on labels and go.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;🚫 No global queues —&lt;/strong&gt; Each tenant is scoped, isolated, and throttled independently.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgmluht56f3mhucb6goz.jpeg" alt="ForgeMT Control Plane and all the integrations" width="800" height="782"&gt;
&lt;/h2&gt;
&lt;h3&gt;
  
  
  🛠️ Implementation &amp;amp; Adoption
&lt;/h3&gt;

&lt;p&gt;It took about 2 months to evolve from a single-tenant, EC2-only setup into a fully multi-tenant platform. Highlights:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🔹 &lt;strong&gt;Kubernetes support&lt;/strong&gt; — with Calico CNI + Karpenter&lt;/li&gt;
&lt;li&gt;🔹 &lt;strong&gt;Tenant isolation&lt;/strong&gt; by design&lt;/li&gt;
&lt;li&gt;🔹 &lt;strong&gt;Per-tenant automation&lt;/strong&gt; &amp;amp; base images&lt;/li&gt;
&lt;li&gt;🔹 &lt;strong&gt;EKS pod identity&lt;/strong&gt; for secure access&lt;/li&gt;
&lt;li&gt;🔹 Integrated with &lt;strong&gt;Teleport, Splunk&lt;/strong&gt;, and full observability&lt;/li&gt;
&lt;li&gt;🔹 Custom dashboards with enriched telemetry&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  🚀 Frictionless Adoption
&lt;/h3&gt;

&lt;p&gt;Onboarding was dead simple.&lt;/p&gt;

&lt;p&gt;For most tenants, switching to ForgeMT meant updating &lt;strong&gt;just the&lt;/strong&gt; &lt;code&gt;runs-on&lt;/code&gt; label in their GitHub Actions workflows — ⚡ No rewrites. No migrations. No downtime.&lt;/p&gt;
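&lt;p&gt;For example (the labels shown are illustrative; actual labels depend on each tenant's configuration), the change is a one-line diff in the workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;jobs:
  test:
    # Before: runs-on: ubuntu-latest
    runs-on: [self-hosted, forge-tenant-a, linux-x64]
    steps:
      - uses: actions/checkout@v4
      - run: make test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;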

&lt;p&gt;For teams that required deeper isolation, assuming their own IAM role was just as straightforward:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- name: Configure AWS Credentials
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::&amp;lt;tenant-account&amp;gt;:role/&amp;lt;role-name&amp;gt;
    aws-region: &amp;lt;aws-region&amp;gt;
    role-duration-seconds: 900

- name: Example
  run: aws cloudformation list-stacks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;💡 This approach made adoption &lt;strong&gt;fast, safe, and low-friction&lt;/strong&gt; — even for teams skeptical of platform changes.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚫 Overkill Warning: When ForgeMT Is Too Much
&lt;/h2&gt;

&lt;p&gt;If you're a small team, ForgeMT might be overkill. Start with the basics: ephemeral runners (EC2 or ARC), GitHub Actions, and Terraform automation. Scale up only when you hit real pain. ForgeMT shines in multi-team setups where governance, tenant isolation, and platform automation matter. For solo teams, it may just add complexity you don’t need.&lt;/p&gt;




&lt;h2&gt;
  
  
  🔭 What’s Next
&lt;/h2&gt;

&lt;p&gt;I’m currently focused on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cost-aware scheduling&lt;/strong&gt; — Prioritizing jobs based on real-time pricing and instance efficiency, optimizing for performance while reducing costs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dynamic autoscaling&lt;/strong&gt; — Moving from static warm pool rules to a more responsive, metrics-driven approach that adapts to the bursty nature of GitHub Actions workloads.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deeper observability&lt;/strong&gt; — Integrating GitHub metrics for actionable insights that drive optimized runner performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI-driven scaling optimization&lt;/strong&gt; — Leveraging historical data to predict workload demands, optimize resource allocation, and automate scaling decisions based on both performance and cost metrics.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re tackling similar problems — or looking to adopt, extend, or contribute to ForgeMT — let’s talk. I’m always open to collaborating with engineers building serious DevSecOps infrastructure.&lt;/p&gt;




&lt;h2&gt;
  
  
  🧪 Dive Into the ForgeMT Project
&lt;/h2&gt;

&lt;p&gt;Ideas are cheap — execution is everything. The ForgeMT source code is now publicly available — check it out:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://github.com/cisco-open/forge/" rel="noopener noreferrer"&gt;https://github.com/cisco-open/forge/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⭐️ Don’t forget to give it a star ;)!&lt;/p&gt;




&lt;h2&gt;
  
  
  ✍️ In Short
&lt;/h2&gt;

&lt;p&gt;ForgeMT emerged from real-world CI pain at enterprise scale. What began as a prototype to fix local inefficiencies has grown into a secure, multi-tenant, production-grade runner platform. I’m sharing this so others can skip the trial-and-error and build smarter from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  🤝 Connect
&lt;/h2&gt;

&lt;p&gt;Let’s connect on &lt;a href="https://www.linkedin.com/in/edersonbrilhante/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt; and &lt;a href="https://github.com/edersonbrilhante" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;Always happy to trade notes with like-minded builders.&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.linkedin.com/pulse/forge-scalable-secure-multi-tenant-github-runner-brilhante--fyxbf" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>terraform</category>
      <category>devops</category>
      <category>platformengineering</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Decoding the Myth of 'Junior' in DevOps and SRE: Navigating Challenges and Cultivating Expertise</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Tue, 16 Jan 2024 15:38:01 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/decoding-the-myth-of-junior-in-devops-and-sre-navigating-challenges-and-cultivating-expertise-4bmk</link>
      <guid>https://dev.to/edersonbrilhante/decoding-the-myth-of-junior-in-devops-and-sre-navigating-challenges-and-cultivating-expertise-4bmk</guid>
      <description>&lt;p&gt;In my view, assigning roles such as &lt;strong&gt;'Junior DevOps'&lt;/strong&gt; and &lt;strong&gt;'Junior SRE (Site Reliability Engineer)'&lt;/strong&gt; seems impractical, reminiscent of labeling someone an &lt;strong&gt;'Entry-Level Software Architect.'&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Navigating the intricate landscape
&lt;/h2&gt;

&lt;p&gt;Navigating the intricate landscape of DevOps and SRE demands proficiency in &lt;strong&gt;coding, networking, cloud technologies, security, and system administration.&lt;/strong&gt; Envisioning someone with limited experience adeptly maneuvering through this multifaceted skill set poses a significant challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Software Architect analogy
&lt;/h2&gt;

&lt;p&gt;Similarly, giving the title &lt;strong&gt;"Software Architect"&lt;/strong&gt; to beginners doesn't align with the intricate demands of the role. Crafting sophisticated software solutions requires years of practical experience, involving intricate system design and understanding. Expecting a junior engineer to architect and implement a secure, scalable microservices architecture without in-depth knowledge and experience in the design principles of distributed systems is unrealistic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantity vs. Experience fallacy
&lt;/h2&gt;

&lt;p&gt;Furthermore, the belief that numerous junior roles collectively can achieve the same level of effectiveness as a seasoned professional echoes the fallacy of favoring &lt;strong&gt;quantity over experience.&lt;/strong&gt; While each junior role contributes to the team's growth, the efficiency and strategic thinking of an experienced architect often outpace the combined efforts of multiple entry-level professionals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pressure on companies
&lt;/h2&gt;

&lt;p&gt;In addition, the pressure on companies to leverage the benefits of DevOps and SRE roles within their organization often stems from the growing need for seamless integration between development and operations. Individuals in these positions are expected to possess a profound understanding of both coding and operations, creating a unique blend of skills. Unfortunately, finding professionals who embody this multidisciplinary expertise is a formidable challenge. Those who can seamlessly bridge the gap between traditional sysadmins and developers are not only rare but also come at a premium, given the scarcity of individuals with such comprehensive skills in the overall job market.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scarcity leading to desperation
&lt;/h2&gt;

&lt;p&gt;This scarcity sometimes leads companies to consider entry-level candidates, hoping to quickly train them to fill the void. However, the complex nature of the disciplines touched upon by DevOps and SRE roles means that becoming proficient in each area takes &lt;strong&gt;years of hands-on experience.&lt;/strong&gt; The high demand and limited supply of individuals with these multifaceted skills contribute to the desperation companies feel in recruiting for these roles.&lt;/p&gt;

&lt;h2&gt;
  
  
  Acknowledging the shortage
&lt;/h2&gt;

&lt;p&gt;Acknowledging this shortage is crucial, especially as it extends beyond DevOps and SRE roles to other senior positions. Over the past two decades, the industry has witnessed a trend of companies poaching professionals from one another rather than investing in training new talents. This cycle has created a snowball effect, further exacerbating the shortage of skilled individuals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solution: Attracting seasoned developers
&lt;/h2&gt;

&lt;p&gt;A potential solution lies in attracting seasoned developers with a penchant for infrastructure and operations to transition into roles in DevOps and SRE. These individuals often bring a wealth of experience, having naturally acquired knowledge in areas beyond coding, such as security, infrastructure, databases, and operations. Their diverse skill set aligns with the demands of contemporary senior developers who are expected to possess expertise beyond language-specific coding skills. By encouraging such transitions, companies can tap into a pool of experienced professionals and mitigate the challenges associated with the scarcity of multidisciplinary talent in the market.&lt;/p&gt;

&lt;h2&gt;
  
  
  Recommended pathway for aspiring professionals
&lt;/h2&gt;

&lt;p&gt;For aspiring professionals entering the tech industry, a recommended pathway involves starting as a developer before venturing into the multifaceted realms of DevOps and SRE. Beginning as a developer allows individuals to hone their coding skills and gain a solid foundation in software engineering principles. As they accumulate experience and familiarity with the development lifecycle, they can then gradually navigate towards operations, infrastructure, and other related disciplines. This gradual journey not only provides a comprehensive understanding of the intricacies of both coding and operations but also allows individuals to develop a deeper appreciation for the challenges addressed by DevOps and SRE roles. This approach acknowledges the value of hands-on experience and ensures that individuals entering these dynamic fields are well-equipped to contribute meaningfully to the integration of development and operations within an organization.&lt;/p&gt;

</description>
      <category>beginners</category>
      <category>devops</category>
      <category>discuss</category>
      <category>career</category>
    </item>
    <item>
      <title>Combining Packer, QEMU, Ubuntu Cloud Images, and Ansible</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Wed, 14 Jun 2023 20:50:24 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/combining-packer-qemu-ubuntu-cloud-images-and-ansible-47eg</link>
      <guid>https://dev.to/edersonbrilhante/combining-packer-qemu-ubuntu-cloud-images-and-ansible-47eg</guid>
      <description>&lt;p&gt;Hello everyone! I want to share a current use case at my company where I have the opportunity to work with &lt;a href="https://www.packer.io/"&gt;Packer&lt;/a&gt;, &lt;a href="https://www.qemu.org/"&gt;QEMU&lt;/a&gt;, &lt;a href="https://www.ansible.com/"&gt;Ansible&lt;/a&gt; and &lt;a href="https://cloud-images.ubuntu.com/"&gt;Ubuntu Cloud Images&lt;/a&gt; leveraging the concept of Infrastructure as Code (IaC).&lt;/p&gt;

&lt;p&gt;Infrastructure as Code (IaC) is a software engineering practice that enables the management and provisioning of infrastructure resources through code. Instead of manually configuring servers and infrastructure components, IaC allows you to define your desired infrastructure state using declarative or imperative code. It brings automation, version control, and consistency to infrastructure management.&lt;/p&gt;

&lt;p&gt;In our case, we utilize Packer, which is a powerful tool falling under the umbrella of IaC. Packer enables the creation of identical machine images for multiple platforms, such as virtual machines, containers, or cloud instances. With Packer, we define the configuration of our desired machine image, including the operating system, software stack, and customizations, all through code. Packer then automates the process of building these machine images, ensuring consistency and reproducibility.&lt;/p&gt;

&lt;p&gt;To further enhance our image-building process, we integrate Ansible as the provisioner for Packer. Ansible is an open-source automation tool that enables the configuration and management of systems through simple, human-readable YAML files. With Ansible, we define the desired state of our machine image, including the installation of packages, configuration files, and any other necessary setup steps. Ansible seamlessly integrates with Packer, allowing us to provision our machine image with ease.&lt;/p&gt;

&lt;p&gt;In our deployments, we rely on Ubuntu images to meet our diverse cloud computing needs. Ubuntu offers three types of images: live, server, and cloud. Live images provide a fully functional Ubuntu desktop environment that can be run directly from a USB drive or DVD without the need for installation, while server images are optimized for server deployments. However, for our specific use case, we have different requirements depending on the deployment environment. For our deployments in public cloud environments, we leverage the official Ubuntu images provided by the cloud provider, which are tailored and certified for their specific platform. Similarly, in our private on-premises cloud, we utilize the cloud version of Ubuntu images. These cloud images are specifically designed and pre-configured for cloud computing platforms, offering optimized performance and scalability. They enable us to efficiently deploy and manage Ubuntu instances in both our public and private cloud environments.&lt;/p&gt;

&lt;p&gt;Now, let's delve into our challenge. The process of building VM images for the public cloud involves the use of appropriate plugins for Packer. However, when it comes to our on-premises cloud, we encountered an obstacle. Our existing process relied on a deprecated plugin relying on QEMU as its underlying technology. QEMU, an open-source virtualization tool, empowers us to operate and manage virtual machines in various formats, including qcow2. To overcome this hurdle, our aim was to leverage QEMU using an official and updated plugin for Packer. This integration would seamlessly incorporate QEMU into our image-building process, delivering enhanced efficiency and reliability.&lt;/p&gt;

&lt;p&gt;While I had prior experience with Packer, my familiarity with QEMU was limited, especially when it came to using Packer with QEMU. To address this knowledge gap, I referred to the official documentation of Packer. However, I encountered a challenge: the documentation provided an example using a server version of CentOS, which wasn't suitable for my requirements. I needed a cloud version of Ubuntu, which does not come with default user and password credentials. To overcome this hurdle, I created a seed image that included user-data and meta-data. This seed image allows us to "emulate" the cloud-init functionality. By combining this seed image with the Ubuntu image, Packer can establish an SSH connection to the virtual machine successfully.&lt;/p&gt;

&lt;p&gt;In the seed image, we create a user with the necessary credentials for the initial build process. It's important to note that the user created in the seed image is only intended for the build phase and is not present in the final image. This approach ensures that the final image does not contain any unnecessary or insecure credentials, maintaining a clean and secure environment.&lt;/p&gt;

&lt;p&gt;Here's the &lt;code&gt;packer_qemu.seed.pkr.hcl&lt;/code&gt; script, which generates the seed image:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;source "file" "user_data" {
  content = &amp;lt;&amp;lt;EOF
#cloud-config
ssh_pwauth: True
users:
  - name: user
    plain_text_passwd: packer
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
EOF
  target  = "user-data"
}

source "file" "meta_data" {
  content = &amp;lt;&amp;lt;EOF
{"instance-id":"packer-worker.tenant-local","local-hostname":"packer-worker"}
EOF
  target  = "meta-data"
}

build {
  sources = ["source.file.user_data", "source.file.meta_data"]

  provisioner "shell-local" {
    inline = ["genisoimage -output cidata.iso -input-charset utf-8 -volid cidata -joliet -r user-data meta-data"]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;In the provided code, the genisoimage command-line tool plays a crucial role in generating the necessary configuration files for cloud-init. Specifically, it is used to create the cidata.iso file, which encapsulates the user_data and meta_data files. These files contain important cloud-init configuration data, such as user credentials and metadata information for the instance.&lt;br&gt;
By utilizing genisoimage, we can create a bootable ISO image that incorporates the required configuration data. This ISO image is then seamlessly integrated into the image-building process by Packer.&lt;br&gt;
To gain a better understanding of the genisoimage command-line options and functionality, you can refer to the official documentation at &lt;a href="https://manpages.debian.org/buster/genisoimage/genisoimage.1.en.html"&gt;Genisoimage Documentation&lt;/a&gt;. The documentation provides detailed explanations and examples to help you effectively utilize genisoimage in your image-building workflow.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Here's the &lt;code&gt;packer_qemu.qcow2.pkr.hcl&lt;/code&gt; script, which uses the seed image and the cloud image to build a new image and then runs an Ansible playbook to configure it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;packer {
  required_plugins {
    qemu = {
      version = "1.0.9"
      source  = "github.com/hashicorp/qemu"
    }
    ansible = {
      version = "1.0.4"
      source  = "github.com/hashicorp/ansible"
    }
  }
}

source "qemu" "ubuntu" {
  format           = "qcow2"
  disk_image       = true
  disk_size        = "10G"
  headless         = true
  iso_checksum     = "file:https://cloud-images.ubuntu.com/focal/current/SHA256SUMS"
  iso_url          = "https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img"
  qemuargs         = [["-m", "12G"], ["-smp", "8"], ["-cdrom", "cidata.iso"], ["-serial", "mon:stdio"]]
  shutdown_command = "echo 'packer' | sudo -S shutdown -P now"
  ssh_password     = "packer"
  ssh_username     = "user"
  vm_name          = "build.qcow2"
  output_directory = "output"
}

build {
  sources = ["source.qemu.ubuntu"]

  provisioner "ansible" {
    playbook_file = "ansible/qemu.yml"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This code snippet showcases the Packer configuration language to orchestrate the build process. It begins with the packer block, which outlines the essential plugins required for Packer, including qemu and ansible. The source block focuses on configuring the QEMU source, encompassing various settings such as the format, disk size, ISO checksum and URL, QEMU arguments, SSH credentials, and more. Within the build block, the QEMU source is designated for the build process. &lt;br&gt;
Additionally, the provisioner section incorporates the ansible provisioner, specifying the Ansible playbook (ansible/qemu.yml) to execute for further customization of the newly created image. For a comprehensive understanding of the packer plugin arguments pertaining to qemu, you can refer to the official documentation at &lt;a href="https://developer.hashicorp.com/packer/plugins/builders/qemu"&gt;Packer QEMU Plugin&lt;/a&gt;. The documentation offers detailed insights into the various plugin options and configurations available for QEMU integration within Packer.&lt;/p&gt;
&lt;/blockquote&gt;
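&lt;p&gt;Tying the two scripts together, the end-to-end build is just a few commands, run from the directory containing both templates (so the generated cidata.iso is available to the second build):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Install the plugins declared in required_plugins (one-time)
packer init packer_qemu.qcow2.pkr.hcl

# Step 1: build the cidata.iso seed image from user-data and meta-data
packer build packer_qemu.seed.pkr.hcl

# Step 2: build the qcow2 image, booting the Ubuntu cloud image with the seed ISO attached
packer build packer_qemu.qcow2.pkr.hcl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;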

&lt;p&gt;By combining Packer, QEMU, Ubuntu Cloud Images, and Ansible, we are able to automate the process of building consistent and reproducible machine images for our on-premises Data Center. This streamlined approach saves time, ensures consistency across our environments, and maintains a secure image without unnecessary credentials.&lt;/p&gt;

&lt;p&gt;I hope sharing our experience and providing these code snippets will help the community facing similar challenges in the future. Let's continue building and automating together! If you have any questions or suggestions, feel free to reach out.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>ubuntu</category>
      <category>iac</category>
      <category>automation</category>
    </item>
    <item>
      <title>Building labs using component-based architecture with Terraform and Ansible</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Thu, 14 Apr 2022 15:48:35 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/building-labs-using-component-based-architecture-with-terraform-and-ansible-15dm</link>
      <guid>https://dev.to/edersonbrilhante/building-labs-using-component-based-architecture-with-terraform-and-ansible-15dm</guid>
      <description>&lt;p&gt;Currently, I am a Site Reliability Engineer(SRE) in the observability team at Splunk. But when I worked in this solution I was part of the GDI(Get Data In) organisation at Splunk.&lt;/p&gt;

&lt;p&gt;Now, let's talk about the problem. &lt;br&gt;
Part of the engineers' job in GDI is building add-ons for Splunk. Add-ons, in a nutshell, are plugins that connect third-party data sources to the Splunk platform.&lt;/p&gt;

&lt;p&gt;Every time we work on a new add-on version for a specific third party, we need to set up two labs: one for development purposes and another with QA specifications.&lt;/p&gt;

&lt;p&gt;The GDI organisation owns many add-ons, so we use a rotation strategy to decide which team, and who on that team, will work on each new version.&lt;br&gt;
This is good for spreading knowledge, but we had problems keeping labs reliable and consistent across the dev cycle and the teams.&lt;/p&gt;

&lt;p&gt;A big fraction of people's time went into manual work to set up the labs (manual configuration or writing new Bash/PowerShell scripts). On top of the time spent in the development process itself, this manual work created a great deal of headache for the developers.&lt;/p&gt;

&lt;p&gt;The teams agreed we needed automation to reduce the pain of creating labs and to avoid duplication and rework.&lt;/p&gt;

&lt;p&gt;We came up with the idea of using infrastructure as code (IaC), which was nothing special that other companies weren't already doing.&lt;/p&gt;

&lt;p&gt;Because the teams are small and focused on add-on development, we needed an approach where teams could have customised labs without necessarily writing IaC scripts themselves.&lt;/p&gt;

&lt;p&gt;Based on the design principles of React components, we came up with the idea of creating components that can be reused and plugged into other components. Each component would be a Terraform module, an Ansible playbook, or an Ansible role.&lt;/p&gt;

&lt;p&gt;To illustrate, let's use this example: build a lab with 3 different environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Environment A will have 4 Windows instances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 as Domain Controller.&lt;/li&gt;
&lt;li&gt;3 Windows Servers as member servers:

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Collector:

&lt;ul&gt;
&lt;li&gt;Splunk Universal Forwarder&lt;/li&gt;
&lt;li&gt;Collecting only Sysmon events from nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Forwarding&lt;/li&gt;
&lt;li&gt;1 Windows Server 2019 with Windows Event Forwarding&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Environment B will have 7 Windows instances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 as Domain Controller.&lt;/li&gt;
&lt;li&gt;6 Windows Servers as member servers:

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Collector (WEC A):

&lt;ul&gt;
&lt;li&gt;Splunk Universal Forwarder&lt;/li&gt;
&lt;li&gt;Collecting only Application events from nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Forwarding, sending logs to WEC A&lt;/li&gt;
&lt;li&gt;1 Windows Server 2019 with Windows Event Forwarding, sending logs to WEC A&lt;/li&gt;
&lt;li&gt;1 Windows Server 2019 with Windows Event Collector (WEC B):

&lt;ul&gt;
&lt;li&gt;Splunk Universal Forwarder&lt;/li&gt;
&lt;li&gt;Collecting only Security events from nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Forwarding, sending logs to WEC B&lt;/li&gt;
&lt;li&gt;1 Windows Server 2019 with Windows Event Forwarding, sending logs to WEC B&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Environment C will have 3 Windows instances:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 as Domain Controller with Windows Event Collector:

&lt;ul&gt;
&lt;li&gt;Splunk Universal Forwarder&lt;/li&gt;
&lt;li&gt;Collecting only Security and Sysmon events from nodes&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;2 Windows Servers as member servers:

&lt;ul&gt;
&lt;li&gt;1 Windows Server 2016 with Windows Event Forwarding&lt;/li&gt;
&lt;li&gt;1 Windows Server 2019 with Windows Event Forwarding&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Normally, using Terraform modules and Ansible playbooks, we could reproduce these environments. &lt;br&gt;
But we would need to create specific playbooks and Terraform configs for each environment. &lt;br&gt;
And here lies the problem: time spent coding permutations of very similar configurations. &lt;/p&gt;

&lt;p&gt;To avoid that, with our component-based architecture we only have to write a single config file describing which modules each lab needs, without touching any Terraform script or Ansible playbook.&lt;/p&gt;
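&lt;p&gt;As a sketch of what such a config file can look like (the schema and most field names below are illustrative rather than our exact format; only &lt;code&gt;windows_splunk_universal_forward&lt;/code&gt; is a real role mentioned later), a lab is described purely by composing existing components:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;lab: addon-dev
environments:
  - name: environment-a
    instances:
      - distro: windows-2016
        count: 1
        roles: [windows_domain_controller]
      - distro: windows-2016
        count: 1
        roles: [windows_event_collector, windows_splunk_universal_forward]
      - distro: windows-2019
        count: 1
        roles: [windows_event_forwarding]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;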
&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehhewqqr3phz7t9831np.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fehhewqqr3phz7t9831np.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The solution we built is compatible with many kinds of lab configurations deployed in AWS.&lt;/p&gt;

&lt;p&gt;Terraform scripts deploy the infrastructure, spinning up EC2 instances and other AWS resources. To provision software and system configuration inside each EC2 instance, Terraform calls the proper Ansible playbooks.&lt;/p&gt;

&lt;p&gt;Playbooks are a group of roles. A role implements a specific piece of configuration in an independent way.&lt;/p&gt;

&lt;p&gt;Take the role &lt;code&gt;windows_splunk_universal_forward&lt;/code&gt; as an example. This role downloads, installs, and configures a Splunk Universal Forwarder instance on Windows, and it is written to work with any Windows version.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Repo Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   ├── ansible
   │   ├── all_roles
   │   │   └── distros
   │   │       ├── linux
   │   │       │   └── roles
   │   │       │       └── &amp;lt;new-linux-role&amp;gt;
   │   │       └── windows
   │   │           └── roles
   │   │               └── &amp;lt;new-windows-role&amp;gt;
   │   └── playbooks
   │       └── &amp;lt;new-playbook&amp;gt;
   └── terraform
       └── modules
           ├── distros
           │   └── &amp;lt;new-distro-type&amp;gt;
           └── environments
               └── &amp;lt;new-environment-type&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Terraform
&lt;/h3&gt;

&lt;p&gt;Terraform is an open-source infrastructure as code software tool that provides a consistent CLI workflow to manage hundreds of cloud services. Terraform codifies cloud APIs into declarative configuration files.&lt;/p&gt;

&lt;p&gt;For more info check on &lt;a href="https://www.terraform.io/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Terraform Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;   terraform/
   ├── modules
   │   ├── constants
   │   ├── core
   │   ├── distros
   │   │   └── &amp;lt;distro-type&amp;gt;
   │   └── environments
   │       └── &amp;lt;environment-type&amp;gt;
   └── wire
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is an environment?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;An environment is a predefined set of relations between nodes.&lt;br&gt;
Each &lt;code&gt;environment&lt;/code&gt; module is found under &lt;code&gt;terraform/modules/environments&lt;/code&gt; and uses the modules in &lt;code&gt;terraform/modules/distros&lt;/code&gt; to build the proper relations.&lt;/p&gt;

&lt;p&gt;For elucidation, take this case as an example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1 Windows Domain Controller.&lt;/li&gt;
&lt;li&gt;X number of Member Servers.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We have this hierarchy because the DC needs to be created first, so its data, such as its IP, can be passed to the member servers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
# file: terraform/modules/environments/linux-standalone/main.tf

module "windows-domain-controller" {
    source = "../../distros/windows-server"
    ...
}

module "windows-server-member" {
    source = "../../distros/windows-server"
    ...
    windows_domain_controller = module.windows-domain-controller
    ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is a distro?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A distro is a predefined kind of AMI with a specific setup and/or provisioning.&lt;br&gt;
Each &lt;code&gt;distro&lt;/code&gt; module is found under &lt;code&gt;terraform/modules/distros&lt;/code&gt; and has a proper Ansible playbook to execute the provisioning.&lt;/p&gt;

&lt;p&gt;For elucidation, take these cases as examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Linux&lt;/li&gt;
&lt;li&gt;Windows&lt;/li&gt;
&lt;li&gt;Splunk&lt;/li&gt;
&lt;li&gt;FreeBSD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Terraform example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```

locals {
...
provisioning_command     = "ansible-playbook -i $PUBLIC_IP /opt/automation/tools/ansible/playbooks/windows.yml --extra-vars='${local.extra_vars}'"
}

...

resource "aws_instance" "windows_server" {
...
}

resource "null_resource" "ansible" {

triggers = {
    command = replace(local.provisioning_command, "$PUBLIC_IP", "'${aws_instance.windows_server.public_ip},'")
}

provisioner "local-exec" {
    command = replace(local.provisioning_command, "$PUBLIC_IP", "'${aws_instance.windows_server.public_ip},'")
}
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Ansible
&lt;/h3&gt;

&lt;p&gt;Ansible is an open-source software provisioning, configuration management, and application-deployment tool enabling infrastructure as code. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows.&lt;/p&gt;

&lt;p&gt;For more info check on &lt;a href="https://www.ansible.com/" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Ansible Structure:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;```
ansible/
├── all_roles
│   ├── distros
│   │   └── &amp;lt;distro-type&amp;gt;
│   │       └── roles
│   │           └── &amp;lt;distro-role&amp;gt;
└── playbooks
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What is a distro type?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A distro type is a folder that centralizes all Ansible roles that can be executed on a specific distro.&lt;/p&gt;

&lt;p&gt;Take windows as an example: &lt;code&gt;ansible/all_roles/distros/windows&lt;/code&gt;. This folder centralizes all Ansible roles that can be executed on Windows machines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is a distro role?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A distro role is a group of Ansible tasks implementing related configurations that represent a piece of functionality.&lt;/p&gt;

&lt;p&gt;For elucidation, take the list of tasks from the Splunk UF role:&lt;br&gt;
    - Downloads Splunk UF&lt;br&gt;
    - Installs the downloaded file&lt;br&gt;
    - Sets default configuration&lt;br&gt;
    - Starts Splunk UF&lt;/p&gt;
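&lt;p&gt;The steps above can be sketched as an Ansible task file like this (a minimal illustration; the download URL variable, paths, and module arguments are assumptions, not the project's actual role code):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# file: roles/windows_splunk_universal_forward/tasks/main.yml (illustrative sketch)
- name: Download Splunk UF installer
  win_get_url:
    url: "{{ splunk_uf_download_url }}"   # assumed variable
    dest: C:\temp\splunkforwarder.msi

- name: Install the downloaded file
  win_package:
    path: C:\temp\splunkforwarder.msi
    state: present

- name: Set default configuration
  win_copy:
    src: outputs.conf
    dest: C:\Program Files\SplunkUniversalForwarder\etc\system\local\outputs.conf

- name: Start Splunk UF
  win_service:
    name: SplunkForwarder
    state: started
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;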
&lt;h3&gt;
  
  
  Explaining the config file
&lt;/h3&gt;

&lt;p&gt;Here you can find a complete config example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;config = {
  "myenv01" = {
    "type" = "windows_standalone"
    "nodes" = {
      "myvm01" = {
        "type" = "windows"
        "enabled_roles" = {
          "windows_functionality01" = true
          "windows_functionality02" = true
        }
        "os" = {
          "size"    = "t2.medium"
          "distro"  = "windows"
          "type"    = "windows"
          "version" = "2016"
        }
      }
    }
  }
  "myenv02" = {
    "type" = "linux_standalone"
    "nodes" = {
      "mylinux01" = {
        "type" = "linux"
        "enabled_roles" = {
          "linux_functionality01" = true
          "linux_functionality02" = true
        }
        "os" = {
          "size"    = "t2.medium"
          "distro"  = "ubuntu"
          "type"    = "linux"
          "version" = "20"
        }
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Under the hood, this configuration creates 2 EC2 instances in AWS, and each instance runs playbooks with specific roles.&lt;/p&gt;
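&lt;p&gt;As an illustration of what that translation can look like (a sketch under assumed variable and argument names; the real wiring lives in &lt;code&gt;terraform/wire&lt;/code&gt;), the config map can be fanned out with a module &lt;code&gt;for_each&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# file: terraform/wire/main.tf (illustrative sketch, not the actual project code)
module "windows_standalone" {
  source   = "../modules/environments/windows_standalone"
  for_each = { for name, env in var.config : name =&amp;gt; env if env.type == "windows_standalone" }

  name  = each.key     # assumed module arguments
  nodes = each.value.nodes
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;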

&lt;p&gt;&lt;strong&gt;Block explanation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Each block &lt;code&gt;myenv0x&lt;/code&gt; represents how the environment will be deployed. The type represents which predefined environment will be used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Each block &lt;code&gt;myvm0x&lt;/code&gt; represents a VM that will be created. The type represents which predefined distro will be used.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The block &lt;code&gt;os&lt;/code&gt; has 4 properties used to create the proper EC2 instance:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;size&lt;/code&gt;: the AWS instance type&lt;/li&gt;
&lt;li&gt;&lt;code&gt;type&lt;/code&gt;: the type of distro (windows, linux, etc.)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;distro&lt;/code&gt;: the OS distro (ubuntu, debian, suse, windows, etc.)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;version&lt;/code&gt;: the version of the OS distro&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;With this info, Terraform knows which AWS AMI to use to spin up the EC2 instance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The block &lt;code&gt;enabled_roles&lt;/code&gt; represents the list of Ansible roles to execute on each instance&lt;/li&gt;
&lt;/ul&gt;
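&lt;p&gt;One way this wiring can work (an illustrative sketch; the variable names and conditional style are assumptions, not necessarily the project's exact implementation) is to pass &lt;code&gt;enabled_roles&lt;/code&gt; to Ansible via &lt;code&gt;--extra-vars&lt;/code&gt; and include each role conditionally in the playbook:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# file: ansible/playbooks/windows.yml (illustrative sketch)
- hosts: all
  tasks:
    - include_role:
        name: windows_functionality01
      when: enabled_roles.windows_functionality01 | default(false)

    - include_role:
        name: windows_functionality02
      when: enabled_roles.windows_functionality02 | default(false)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;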

&lt;p&gt;For more details about the code and implementation, check the fully functional &lt;a href="https://github.com/edersonbrilhante/lab-builder-demo" rel="noopener noreferrer"&gt;code demo&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>terraform</category>
      <category>ansible</category>
      <category>architecture</category>
    </item>
    <item>
      <title>A serverless full-stack application using only git, google drive, and public ci/cd runners</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Fri, 16 Apr 2021 13:40:49 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/a-serverless-full-stack-application-using-only-git-google-drive-and-public-ci-cd-runners-262l</link>
      <guid>https://dev.to/edersonbrilhante/a-serverless-full-stack-application-using-only-git-google-drive-and-public-ci-cd-runners-262l</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR; How I built the Vilicus Service, a serverless full-stack application with backend workers and database only using git and ci/cd runners.&lt;/strong&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  What is Vilicus?
&lt;/h4&gt;

&lt;p&gt;&lt;a href="https://dev.to/edersonbrilhante/vilicus-a-overseer-for-security-scanning-of-container-images-eji"&gt;Vilicus&lt;/a&gt; is an open-source tool that orchestrates security scans of container images(Docker/OCI) and centralizes all results into a database for further analysis and metrics.&lt;/p&gt;

&lt;p&gt;Vilicus can be used in several ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/edersonbrilhante/vilicus"&gt;Own Installation&lt;/a&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/edersonbrilhante/vilicus-github-action"&gt;GitHub Action&lt;/a&gt; in your GitHub workflows;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/edersonbrilhante/vilicus-gitlab"&gt;Template CI&lt;/a&gt; in your GitLab CI/CD pipelines;&lt;/li&gt;
&lt;li&gt;
&lt;a href="http://vilicus.edersonbrilhante.com.br/"&gt;Free Online Service&lt;/a&gt;;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article explains how it was possible to build the Free Online Service without using a traditional deployment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--MaLtB1bF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i9eolikd4iicgb3r37js.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--MaLtB1bF--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/i9eolikd4iicgb3r37js.png" alt="Architecture" width="701" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Frontend is hosted in GitHub Pages. This frontend is a landing page with a free service to scan or display the vulnerabilities in container images. &lt;/p&gt;

&lt;p&gt;The results of container image scans are stored in a GitLab Repository.&lt;/p&gt;

&lt;p&gt;When the user asks to see the results for an image, the frontend consumes the GitLab API to retrieve the file with the vulnerabilities for that image. If the image has not been scanned yet, the user has the option to schedule a scan using a Google Form.&lt;/p&gt;

&lt;p&gt;When this form is filled, the data is sent to a Google Spreadsheet.&lt;/p&gt;

&lt;p&gt;A GitHub Workflow runs every 5 minutes to check if there are new answers in this Spreadsheet. For each new image in the Spreadsheet, this workflow triggers another Workflow to scan the image and save the result in the GitLab Repository.&lt;/p&gt;

&lt;h4&gt;
  
  
  Why store in GitLab?
&lt;/h4&gt;

&lt;p&gt;GitLab provides higher limits. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's a summary of differences in offering on public cloud and free tier:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Free users&lt;/th&gt;
&lt;th&gt;Max repo size (GB)&lt;/th&gt;
&lt;th&gt;Max file size (MB)&lt;/th&gt;
&lt;th&gt;Max API calls per hour (per client)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;5000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BitBucket&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Unlimited (up to repo size)&lt;/td&gt;
&lt;td&gt;5000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitLab&lt;/td&gt;
&lt;td&gt;Unlimited&lt;/td&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;Unlimited (up to repo size)&lt;/td&gt;
&lt;td&gt;36000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  Google Drive
&lt;/h3&gt;

&lt;p&gt;This choice was a "quick win". In a usual deployment, the backend could call an API passing secrets without the clients knowing them. &lt;/p&gt;

&lt;p&gt;But because I am using GitHub Pages, I cannot do that. (Well, I could do it in the JavaScript, but anyone using the browser inspector would see the secrets. So let's not do that 😉)&lt;/p&gt;

&lt;p&gt;This makes the Google Spreadsheet perform as a queue.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Google Form:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--pyUQWnG5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nyxtv3xkmk90k2i6ljhg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--pyUQWnG5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nyxtv3xkmk90k2i6ljhg.png" alt="Form" width="800" height="664"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Google Spreadsheet with answers:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OY2FJpFH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezihbao6fboqhhjkrg4l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OY2FJpFH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezihbao6fboqhhjkrg4l.png" alt="Answers" width="800" height="110"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h3&gt;
  
  
  GitHub Workflows
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;Schedule Workflow&lt;/code&gt; runs every 5 minutes at most. This workflow executes a Python script that checks whether there are new rows in the &lt;code&gt;Google Spreadsheet&lt;/code&gt;; for each new row, an HTTP request is made to trigger the &lt;code&gt;repository_dispatch&lt;/code&gt; event.&lt;/p&gt;

&lt;p&gt;This makes the workflows perform as backend workers.&lt;/p&gt;
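&lt;p&gt;The trigger itself is a plain HTTP request against the GitHub REST API (a hedged sketch of that call; &lt;code&gt;OWNER/REPO&lt;/code&gt;, the event type, and the payload are illustrative, and a personal access token is assumed in &lt;code&gt;$GITHUB_TOKEN&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Illustrative: fires the repository_dispatch event that starts the Report workflow
curl -X POST \
  -H "Accept: application/vnd.github.v3+json" \
  -H "Authorization: token $GITHUB_TOKEN" \
  https://api.github.com/repos/OWNER/REPO/dispatches \
  -d '{"event_type": "scan-image", "client_payload": {"image": "alpine:latest"}}'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;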

&lt;p&gt;&lt;em&gt;Schedule in workflow:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Schedule&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;schedule&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cron&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;*/5&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;*'&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Event &lt;code&gt;repository_dispatch&lt;/code&gt; in WorkFlow:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Report&lt;/span&gt;
&lt;span class="na"&gt;on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;repository_dispatch&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="nn"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Screenshots:
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Schedule History:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--52iyEBwk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vx0omaxnghu518cjqfqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--52iyEBwk--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vx0omaxnghu518cjqfqf.png" alt="Schedules" width="800" height="417"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Schedule WorkFlow:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Csmo2bfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0zn81m0ai0je3puhr56v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Csmo2bfX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/0zn81m0ai0je3puhr56v.png" alt="Schedule" width="800" height="240"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Scans History:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1U2jDv5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pzhsml4624pfb2edj4u2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1U2jDv5N--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pzhsml4624pfb2edj4u2.png" alt="Scans" width="800" height="363"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Report Workflow:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_pAnFAc5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6p60gvam145vyajlfmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_pAnFAc5--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/c6p60gvam145vyajlfmu.png" alt="Report" width="800" height="415"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Scan Report stored in GitLab:&lt;/em&gt;&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--xhlDDfsA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5qg7eshm699k7f22t5p1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--xhlDDfsA--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5qg7eshm699k7f22t5p1.png" alt="Report File Example" width="800" height="419"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Source Code:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-report/blob/main/.github/workflows/schedule-jobs.yml"&gt;Schedule Workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-report/blob/main/.github/workflows/report.yml"&gt;Report Workflow&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-report/blob/main/commit.py"&gt;Script to upload the report file to Gitlab&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-report/blob/main/schedule-jobs.py"&gt;Script to iterate the answers and trigger new scans&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://gitlab.com/edersonbrilhante/vilicus-reports-db"&gt;GitLab Repo with report files&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;
  
  
  Do you want to know more about GitHub Actions?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/actions/learn-github-actions"&gt;Learn GitHub Actions&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions"&gt;Workflow syntax&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;h3&gt;
  
  
  Github Pages
&lt;/h3&gt;

&lt;p&gt;The Frontend is running in GitHub Pages. &lt;/p&gt;

&lt;p&gt;By default an application running in GH Pages is hosted as &lt;code&gt;http://&amp;lt;github-user&amp;gt;.github.io/&amp;lt;repository&amp;gt;&lt;/code&gt;. &lt;/p&gt;

&lt;p&gt;But GitHub allows you to customize the domain, which is why it's possible to access Vilicus at &lt;code&gt;https://vilicus.edersonbrilhante.com.br&lt;/code&gt; instead of &lt;code&gt;http://edersonbrilhante.github.io/vilicus&lt;/code&gt;.&lt;/p&gt;
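&lt;p&gt;Concretely, a custom subdomain on GitHub Pages needs two pieces of configuration (a sketch following GitHub's documented setup, using this site's names):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# CNAME file published at the root of the gh-pages branch
vilicus.edersonbrilhante.com.br

# DNS record at the domain provider
vilicus.edersonbrilhante.com.br.  CNAME  edersonbrilhante.github.io.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;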
&lt;h4&gt;
  
  
  GitHub Workflow to build the application and deploy it in GH Pages
&lt;/h4&gt;

&lt;p&gt;&lt;em&gt;Building the source code:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Build&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;cd website&lt;/span&gt;
    &lt;span class="s"&gt;npm install&lt;/span&gt;
    &lt;span class="s"&gt;npm run-script build&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;REACT_APP_GA_CODE&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.REACT_APP_GA_CODE }}&lt;/span&gt;
    &lt;span class="na"&gt;REACT_APP_FORM_SCAN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.REACT_APP_FORM_SCAN }}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;em&gt;Deploying the build:&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deploy&lt;/span&gt;
  &lt;span class="na"&gt;uses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;JamesIves/github-pages-deploy-action@releases/v3&lt;/span&gt;
  &lt;span class="na"&gt;with&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;GITHUB_TOKEN&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;${{ secrets.GITHUB_TOKEN }}&lt;/span&gt;
    &lt;span class="na"&gt;BRANCH&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gh-pages&lt;/span&gt;
    &lt;span class="na"&gt;FOLDER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;website/build&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Source Code:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/.github/workflows/build-gh-pages.yml"&gt;Workflow to deploy code in GitHub Pages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/tree/main/website"&gt;Application Source Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/tree/gh-pages"&gt;Deployed code in GH Pages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Do you want to know more about GitHub Pages?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/pages/getting-started-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site"&gt;Configuring a publishing source&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/pages/configuring-a-custom-domain-for-your-github-pages-site"&gt;Configuring a custom domain&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  That’s it!
&lt;/h2&gt;

&lt;p&gt;In case you have any questions, please leave a comment here or ping me on &lt;a href="https://www.linkedin.com/in/edersonbrilhante"&gt;LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>showdev</category>
      <category>devops</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Fast startup application with database stored in container images</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Thu, 01 Apr 2021 15:15:54 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/fast-startup-application-with-database-stored-in-container-images-1k04</link>
      <guid>https://dev.to/edersonbrilhante/fast-startup-application-with-database-stored-in-container-images-1k04</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR;&lt;/strong&gt; This article shows which strategy I implemented to allow an application to be ready to use in a few minutes rather than many hours.&lt;/p&gt;

&lt;p&gt;In this article, I will talk about the strategy I used in the project Vilicus to have big databases synced in new setups. For those who don't know Vilicus yet, I recommend reading &lt;a href="https://dev.to/edersonbrilhante/vilicus-a-overseer-for-security-scanning-of-container-images-eji"&gt;my article&lt;/a&gt; about it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why does the application take so long to start?
&lt;/h2&gt;

&lt;p&gt;At this moment the project &lt;a href="https://github.com/edersonbrilhante/vilicus"&gt;Vilicus&lt;/a&gt; uses &lt;a href="https://github.com/anchore/anchore-engine"&gt;Anchore&lt;/a&gt;, &lt;a href="https://github.com/quay/clair"&gt;Clair&lt;/a&gt;, and &lt;a href="https://github.com/aquasecurity/trivy"&gt;Trivy&lt;/a&gt; as vendors to run security scans in container images. Each vendor has its own programming language, database, internal dependencies and can use different vulnerabilities databases.&lt;/p&gt;

&lt;p&gt;Vilicus itself starts in milliseconds, but to be ready to use it's necessary to wait for the vendors to sync their vulnerability databases with the latest changes. These syncs can consume a lot of time.&lt;/p&gt;

&lt;p&gt;See, for example, Anchore, the most time-consuming one to complete the sync:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;There is no exact time frame for the initial sync to complete as it depends heavily on environmental factors, such as the host's memory/cpu allocation, disk space, and network bandwidth. Generally, the initial sync should complete within 8 hours but may take longer. Subsequent feed updates are much faster as only deltas are updated.&lt;br&gt;
&lt;a href="https://docs.anchore.com/current/docs/faq/"&gt;https://docs.anchore.com/current/docs/faq/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Clair takes roughly 20 minutes, and Trivy is ready in a few seconds.&lt;/p&gt;

&lt;p&gt;If you run everything from scratch, it will take almost a day to sync all the vulnerability databases; after this major sync, the subsequent syncs are faster.&lt;/p&gt;

&lt;p&gt;This is a problem if you would like to run an ephemeral instance in your CI/CD: waiting hours for the sync to complete before you can run the first scan is not viable. Thinking about how to fix this, I came up with a solution: save updated database snapshots in container images every day.&lt;/p&gt;

&lt;p&gt;Now you must be thinking that this is not a good practice, and normally I would agree. But I believe there are exceptions in specific cases, where fixing the problem is more important than conventions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Saving the database in a container image
&lt;/h2&gt;

&lt;p&gt;I'll show you in detail how I made Anchore work; Clair and Trivy are not much different.&lt;/p&gt;

&lt;h3&gt;
  
  
  Anchore
&lt;/h3&gt;

&lt;p&gt;First I have a compressed SQL dump, with the database already synced except for roughly the last 6 months, stored in a container image: &lt;a href="https://hub.docker.com/layers/vilicus/anchoredb/dumpsql/images/sha256-d9fffded216ee40bf31580467af80eeadab0988cbefe5cd82acfde410f683370?context=explore"&gt;vilicus/anchoredb:dumpsql&lt;/a&gt;. So we don't need to wait many hours; we just update the delta.&lt;/p&gt;

&lt;p&gt;I used this image as a base to create a local image (&lt;code&gt;vilicus/anchoredb:files&lt;/code&gt;) with a script that restores the database when this image runs as a container.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/blob/v0.0.3/deployments/dockerfiles/anchore/db/files/Dockerfile"&gt;Dockerfile content&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM vilicus/anchoredb:dumpsql as dumpsql

FROM postgres:9.6.21-alpine
LABEL vilicus.app.version=9.6.21-alpine

COPY --chown=postgres:postgres --from=dumpsql /opt/vilicus/data/anchore_db.tar.gz /opt/vilicus/data/anchore_db.tar.gz
COPY deployments/dockerfiles/anchore/db/files/restore-dbs.sh /docker-entrypoint-initdb.d/01.restore-dbs.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/blob/e700a01a08483e188aad9129d0bda9c12067a6cc/scripts/build-anchore-image.sh#L57-L61"&gt;Building the container image&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -f deployments/dockerfiles/anchore/db/files/Dockerfile -t vilicus/anchoredb:files .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The image &lt;code&gt;vilicus/anchoredb:files&lt;/code&gt; is referenced in &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/v0.0.3/deployments/docker-compose.updater.yml"&gt;deployments/docker-compose.updater.yml&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we start the &lt;code&gt;anchore&lt;/code&gt; and &lt;code&gt;anchoredb&lt;/code&gt; containers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose -f deployments/docker-compose.updater.yml up \
    --build -d --force \
    --remove-orphans \
    --renew-anon-volumes anchore
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, we run this command to restore the database.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec anchoredb sh -c 'docker-entrypoint.sh postgres' &amp;amp;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then we wait for the restore to finish and the database to be ready to accept connections.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run --network container:anchore vilicus/vilicus:latest \
    sh -c "dockerize -wait http://anchore:8228/health -wait-retry-interval 10s -timeout 1000s echo done"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With the Anchore Engine and the DB ready, the feed sync starts automatically; we wait for it to finish.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec anchore sh -c 'anchore-cli system wait'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the sync finishes, we stop anchore and gracefully stop the Postgres process in anchoredb.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker stop anchore
docker exec -u postgres anchoredb sh -c 'pg_ctl stop -m smart'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We commit the container, with the changes made by the sync, into a new container image &lt;code&gt;vilicus/anchoredb:local-update&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CID=$(docker inspect --format="{{.Id}}" anchoredb)
docker commit $CID vilicus/anchoredb:local-update
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we build the container image that goes to Docker Hub by copying the Postgres data from the image &lt;code&gt;vilicus/anchoredb:local-update&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://raw.githubusercontent.com/edersonbrilhante/vilicus/v0.0.3/deployments/dockerfiles/anchore/db/Dockerfile"&gt;Dockerfile content&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FROM as db
FROM postgres:9.6.21-alpine
COPY --chown=postgres:postgres --from=db /data/ /data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus/blob/e700a01a08483e188aad9129d0bda9c12067a6cc/scripts/build-anchore-image.sh#L30"&gt;Building the container image&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -f deployments/dockerfiles/anchore/db/Dockerfile -t vilicus/anchoredb:latest .
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check the complete script &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/v0.0.3/scripts/build-anchore-image.sh"&gt;here&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Clair and Trivy
&lt;/h3&gt;

&lt;p&gt;For Clair check &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/scripts/build-clair-image.sh"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For Trivy check &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/scripts/push-trivy-image.sh"&gt;here&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Updating the images every day
&lt;/h2&gt;

&lt;p&gt;To keep the databases up to date with the latest changes, I have a GitHub workflow that runs a job every day, building the images and pushing them to Docker Hub.&lt;/p&gt;

&lt;p&gt;Check the &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/.github/workflows/build-images.yml"&gt;workflow&lt;/a&gt;&lt;/p&gt;
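&lt;p&gt;In essence, the linked workflow is a scheduled job; a minimal sketch (hypothetical cron expression and step names, not the actual file) looks like this:&lt;/p&gt;

```yaml
# Hypothetical sketch of a daily image-build workflow
name: Build Images
on:
  schedule:
    - cron: '0 3 * * *'   # once a day
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build and push the database images
        run: |
          ./scripts/build-anchore-image.sh
          docker push vilicus/anchoredb:latest
```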

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IGZwEcKB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/grpk74axc68q1fbzwwo2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IGZwEcKB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/grpk74axc68q1fbzwwo2.png" alt="Complete workflow" width="800" height="480"&gt;&lt;/a&gt;Complete workflow&lt;/p&gt;




&lt;h2&gt;
  
  
  That's it!
&lt;/h2&gt;

&lt;p&gt;In case you have any questions, please leave a comment here or ping me on &lt;a href="https://www.linkedin.com/in/edersonbrilhante"&gt;🔗 LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>docker</category>
      <category>devops</category>
      <category>database</category>
      <category>programming</category>
    </item>
    <item>
      <title>GitLab Runners as a Service with Github Action</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Thu, 01 Apr 2021 14:52:43 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/gitlab-runners-as-a-service-with-github-action-149n</link>
      <guid>https://dev.to/edersonbrilhante/gitlab-runners-as-a-service-with-github-action-149n</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR;&lt;/strong&gt; This article will show how to implement the action "Gitlab Runner Service Action" in a "GitHub Workflow" that is triggered by a "GitLab-CI job", and this way having temporary GitLab Runners hosted by GitHub.&lt;/p&gt;

&lt;p&gt;For more info about &lt;code&gt;GitHub workflow&lt;/code&gt;, check the official &lt;a href="https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For more info about &lt;code&gt;GitLab-CI&lt;/code&gt;, check the official &lt;a href="https://docs.gitlab.com/ee/ci/yaml/gitlab_ci_yaml.html" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1
&lt;/h3&gt;

&lt;p&gt;Create a new GitHub repository with the following GitHub Workflow. File location: &lt;code&gt;.github/workflows/gitlab-runner.yaml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Gitlab Runner Service
on: [repository_dispatch]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Maximize Build Space
        uses: easimon/maximize-build-space@master
        with:
          root-reserve-mb: 512
          swap-size-mb: 1024
          remove-dotnet: 'true'
          remove-android: 'true'
          remove-haskell: 'true'

      - name: Gitlab Runner
        uses: edersonbrilhante/gitlab-runner-action@main
        with:
          registration-token: "${{ github.event.client_payload.registration_token }}"
          docker-image: "docker:19.03.12"
          name: ${{ github.run_id }}
          tag-list: "crosscicd"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What does this workflow do?
&lt;/h4&gt;

&lt;p&gt;This workflow runs only when the repository_dispatch event is triggered. The first step increases the free disk space by removing packages that are useless for our GitLab runner. The second step runs the action that registers a new GitLab Runner with the tag &lt;code&gt;crosscicd&lt;/code&gt;, starts it, and unregisters it after a GitLab-CI job completes, whether with success or failure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2
&lt;/h3&gt;

&lt;p&gt;Create a new GitLab repository with the following GitLab-CI config. File location: &lt;code&gt;.gitlab-ci.yml&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;start-crosscicd:
  image: alpine
  before_script:
    - apk add --update curl &amp;amp;&amp;amp; rm -rf /var/cache/apk/*
  script: |
    curl -H "Authorization: token ${GITHUB_TOKEN}" \
    -H 'Accept: application/vnd.github.everest-preview+json' \
    "https://api.github.com/repos/${GITHUB_REPO}/dispatches" \
    -d '{"event_type": "gitlab_trigger_'${CI_PIPELINE_ID}'", "client_payload": {"registration_token": "'${GITLAB_REGISTRATION_TOKEN}'"}}'

github:
  image: docker:latest
  services:
    - name: docker:dind
      alias: thedockerhost
  variables:
    DOCKER_HOST: tcp://thedockerhost:2375/
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
  script:
    - df -h
    - docker run --privileged ubuntu df -h
  tags:
    - crosscicd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What does this gitlab-ci config do?
&lt;/h4&gt;

&lt;p&gt;The job &lt;strong&gt;&lt;em&gt;start-crosscicd&lt;/em&gt;&lt;/strong&gt; triggers the GitHub workflow, creating the GitLab runner with the tag &lt;code&gt;crosscicd&lt;/code&gt;. The job &lt;code&gt;github&lt;/code&gt; then waits for a runner with the tag &lt;code&gt;crosscicd&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 3
&lt;/h3&gt;

&lt;p&gt;Set the EnvVars in the new GitLab Repo&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    GITHUB_REPO:&amp;lt;username&amp;gt;/&amp;lt;github-repo&amp;gt;
    GITHUB_TOKEN:&amp;lt;GitHub Access Token&amp;gt;
    GITLAB_REGISTRATION_TOKEN:&amp;lt;GitLab Registration Token&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
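&lt;p&gt;If you prefer not to click through the UI, the same variables can be created through the GitLab API (&lt;code&gt;POST /projects/:id/variables&lt;/code&gt;). A minimal sketch, assuming a personal access token with API scope in &lt;code&gt;GITLAB_API_TOKEN&lt;/code&gt; and the numeric project id (both hypothetical names):&lt;/p&gt;

```shell
# Hypothetical helper: create one CI/CD variable via the GitLab API.
# DRY_RUN=1 prints the request instead of sending it.
set_gitlab_var() {
  project=$1; key=$2; value=$3
  url="https://gitlab.com/api/v4/projects/${project}/variables"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "POST $url key=$key"
  else
    curl --request POST \
      --header "PRIVATE-TOKEN: ${GITLAB_API_TOKEN}" \
      --form "key=${key}" --form "value=${value}" \
      "$url"
  fi
}

# e.g. set_gitlab_var 12345 GITHUB_REPO "USERNAME/GITHUB-REPO"
```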



&lt;h4&gt;
  
  
  How to create a new GitHub Access Token:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Go to &lt;a href="https://github.com/settings/tokens/new" rel="noopener noreferrer"&gt;https://github.com/settings/tokens/new&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check the &lt;code&gt;workflow&lt;/code&gt; scope and click Generate token&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t41gibobma1uyfa7tnz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t41gibobma1uyfa7tnz.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  How to get Registration Token:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Go to &lt;code&gt;https://gitlab.com/&amp;lt;username&amp;gt;/&amp;lt;repo&amp;gt;/-/settings/ci_cd&lt;/code&gt; and expand the &lt;code&gt;Runners&lt;/code&gt; section&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Copy the Registration Token&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5m0s0al49w76jiqjyjsq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5m0s0al49w76jiqjyjsq.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Where to store the EnvVars?
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Go to &lt;code&gt;https://gitlab.com/&amp;lt;username&amp;gt;/&amp;lt;repo&amp;gt;/-/settings/ci_cd&lt;/code&gt; and expand the &lt;code&gt;Variables&lt;/code&gt; section&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Click Add Variable and save an entry for each EnvVar&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8kfcu8v4p37m12z1e1w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm8kfcu8v4p37m12z1e1w.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvncw3tiqqdle5lpk1yd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmvncw3tiqqdle5lpk1yd.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4
&lt;/h3&gt;

&lt;p&gt;Now your pipeline is ready to run GitLab Runners on GitHub, triggered by a GitLab-CI job :)&lt;/p&gt;




&lt;h2&gt;
  
  
  Example
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Video Demo
&lt;/h3&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/jI5U1iMboOs"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Screenshots
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs0l7kh49uiyb7z71w4p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frs0l7kh49uiyb7z71w4p.png" alt="Job start-crosscicd trigger Github Workflow"&gt;&lt;/a&gt;Job start-crosscicd trigger Github Workflow&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd97dpb7iu5zabbrn1ab3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd97dpb7iu5zabbrn1ab3.png" alt="Workflow triggered by Gitlab-CI job"&gt;&lt;/a&gt;Workflow triggered by Gitlab-CI job&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2rzlaf1yxq3uzwyaa3f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq2rzlaf1yxq3uzwyaa3f.png" alt="There is 17GB free by default"&gt;&lt;/a&gt;There is 17GB free by default&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycjolujvyhkgsjutgwdd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fycjolujvyhkgsjutgwdd.png" alt="After Maximize we have 54GB free to use"&gt;&lt;/a&gt;After Maximize we have 54GB free to use&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzcfrksmy46gss18m4db.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzzcfrksmy46gss18m4db.png" alt="Register a Runner, Start it, and Unregister after the job in GitLab is completed"&gt;&lt;/a&gt;Register a Runner, Start it, and Unregister after the job in GitLab is completed&lt;/p&gt;

&lt;h3&gt;
  
  
  Code
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/gitlab-runner-service-example" rel="noopener noreferrer"&gt;Repo with example code&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/gitlab-runner-action" rel="noopener noreferrer"&gt;GitHub Action&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  That’s it!
&lt;/h2&gt;

&lt;p&gt;In case you have any questions, please leave a comment here or ping me on &lt;a href="https://www.linkedin.com/in/edersonbrilhante" rel="noopener noreferrer"&gt;🔗 LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>gitlabci</category>
      <category>cicd</category>
      <category>devops</category>
      <category>githubactions</category>
    </item>
    <item>
      <title>Vilicus — An overseer for security scanning of container images</title>
      <dc:creator>Ederson Brilhante</dc:creator>
      <pubDate>Wed, 31 Mar 2021 20:19:18 +0000</pubDate>
      <link>https://dev.to/edersonbrilhante/vilicus-a-overseer-for-security-scanning-of-container-images-eji</link>
      <guid>https://dev.to/edersonbrilhante/vilicus-a-overseer-for-security-scanning-of-container-images-eji</guid>
      <description>&lt;p&gt;Vilicus is an open-source tool that orchestrates security scans of container images(Docker/OCI) and centralizes all results into a database for further analysis and metrics.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why scan for vulnerabilities?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;A recent &lt;a href="https://blog.prevasio.com/2020/12/operation-red-kangaroo-industrys-first.html" rel="noopener noreferrer"&gt;analysis&lt;/a&gt; of around 4 million Docker Hub images by cyber security firm Prevasio found that 51% of the images had exploitable vulnerabilities. A large number of these were cryptocurrency miners, both open and hidden, and 6,432 of the images had malware.&lt;br&gt;
&lt;a href="https://www.infoq.com/news/2020/12/dockerhub-image-vulnerabilities/" rel="noopener noreferrer"&gt;https://www.infoq.com/news/2020/12/dockerhub-image-vulnerabilities/&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F971e3mglk3o8dkls3h03.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F971e3mglk3o8dkls3h03.png" alt="Image from https://prevasio.com/static/web/viewer.html?file=/static/Red_Kangaroo.pdf"&gt;&lt;/a&gt;Image from &lt;a href="https://prevasio.com/static/web/viewer.html?file=/static/Red_Kangaroo.pdf" rel="noopener noreferrer"&gt;https://prevasio.com/static/web/viewer.html?file=/static/Red_Kangaroo.pdf&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Docker image security scanning is a process for finding security vulnerabilities within your Docker image files.&lt;br&gt;
Typically, image scanning works by parsing through the packages or other dependencies that are defined in a container image file, then checking to see whether there are any known vulnerabilities in those packages or dependencies.&lt;br&gt;
&lt;a href="https://resources.whitesourcesoftware.com/blog-whitesource/docker-image-security-scanning" rel="noopener noreferrer"&gt;https://resources.whitesourcesoftware.com/blog-whitesource/docker-image-security-scanning&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  How does it work?
&lt;/h2&gt;

&lt;p&gt;There are many tools to scan container images for vulnerabilities, such as &lt;a href="https://github.com/anchore/anchore-engine" rel="noopener noreferrer"&gt;Anchore&lt;/a&gt;, &lt;a href="https://github.com/quay/clair" rel="noopener noreferrer"&gt;Clair&lt;/a&gt;, and &lt;a href="https://github.com/aquasecurity/trivy" rel="noopener noreferrer"&gt;Trivy&lt;/a&gt;. But the results for the same image can differ between tools. This project helps developers improve the quality of their container images by finding vulnerabilities, and thus addressing them, with a vendor-agnostic view.&lt;/p&gt;

&lt;p&gt;Some articles comparing the scanning tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://boxboat.com/2020/04/24/image-scanning-tech-compared/" rel="noopener noreferrer"&gt;Open Source CVE Scanner Round-Up: Clair vs Anchore vs Trivy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://opensource.com/article/18/8/tools-container-security" rel="noopener noreferrer"&gt;5 open source tools for container security&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://www.a10o.net/devsecops/docker-image-security-static-analysis-tool-comparison-anchore-engine-vs-clair-vs-trivy/" rel="noopener noreferrer"&gt;Docker Image Security: Static Analysis Tool Comparison — Anchore Engine vs Clair vs Trivy&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
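&lt;p&gt;The aggregation idea itself is simple. As an illustration (a hypothetical sketch, not Vilicus code), merging the CVE lists reported by several scanners into one deduplicated list could look like this, assuming each scanner writes one CVE id per line:&lt;/p&gt;

```shell
# Hypothetical sketch: combine per-scanner CVE lists and deduplicate.
merge_cves() {
  # each argument is a file with one CVE id per line
  cat "$@" | sort -u
}
```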




&lt;h2&gt;
  
  
  Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobow7zp9z7oirw5r6l2n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobow7zp9z7oirw5r6l2n.png"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Cached Database
&lt;/h2&gt;

&lt;p&gt;Vilicus updates the vendor vulnerability databases daily with the latest changes.&lt;/p&gt;

&lt;p&gt;By storing the database data in layers of Docker images, the whole platform is ready to use in minutes instead of hours: syncing the vulnerability feeds from scratch can take at least 6 hours.&lt;/p&gt;

&lt;p&gt;Check the strategy used in &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/scripts/build-anchore-image.sh" rel="noopener noreferrer"&gt;Anchore&lt;/a&gt;, &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/scripts/build-clair-image.sh" rel="noopener noreferrer"&gt;Clair&lt;/a&gt; and &lt;a href="https://github.com/edersonbrilhante/vilicus/blob/main/scripts/build-trivy-image.sh" rel="noopener noreferrer"&gt;Trivy&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Local Registry
&lt;/h2&gt;

&lt;p&gt;Vilicus provides a local registry, so you can build a local image and scan it without pushing it to a remote repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker build -t localhost:5000/local-image:my-tag .

curl -o docker-compose.yml https://raw.githubusercontent.com/edersonbrilhante/vilicus/main/deployments/docker-compose.yml

docker-compose up -d

IMAGE=localregistry.vilicus.svc:5000/local-image:my-tag

docker run -v ${PWD}/artifacts:/artifacts \
  --network container:vilicus \
  vilicus/vilicus:latest \
  sh -c "dockerize -wait http://vilicus:8080/healthz -wait-retry-interval 60s -timeout 2000s vilicus-client -p /opt/vilicus/configs/conf.yaml -i ${IMAGE}  -t /opt/vilicus/contrib/sarif.tpl -o /artifacts/results.sarif"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  GitHub Action
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want.&lt;br&gt;
&lt;a href="https://github.com/features/actions" rel="noopener noreferrer"&gt;https://github.com/features/actions&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Vilicus provides a &lt;a href="https://github.com/marketplace/actions/vilicus-scan" rel="noopener noreferrer"&gt;GitHub action&lt;/a&gt; to help you scan container images in your CI/CD.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container scanning&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A scan can be done using either a remote image or a local image. Using a remote repository such as docker.io, the image will be &lt;strong&gt;&lt;em&gt;&lt;code&gt;docker.io/your-organization/image:tag&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: Scan image
    uses: edersonbrilhante/vilicus-github-action@main
    with:
      image: "docker.io/myorganization/myimage:tag"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And to use a local image, it needs to be tagged as &lt;strong&gt;&lt;em&gt;&lt;code&gt;localhost:5000/image:tag&lt;/code&gt;&lt;/em&gt;&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  - name: Scan image
    uses: edersonbrilhante/vilicus-github-action@main
    with:
      image: "localhost:5000/myimage:tag"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Full example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Complete example with steps for cleaning space, building a local image, scanning with Vilicus, and uploading results to GitHub Security.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Container Image CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Maximize Build Space
        uses: easimon/maximize-build-space@master
        with:
          root-reserve-mb: 512
          swap-size-mb: 1024
          remove-dotnet: 'true'
          remove-android: 'true'
          remove-haskell: 'true'
      - name: Checkout branch
        uses: actions/checkout@v2
      - name: Build the Container image
        run: docker build -t localhost:5000/local-image:${GITHUB_SHA} .
      - name: Vilicus Scan
        uses: edersonbrilhante/vilicus-github-action@main
        with:
          image: localhost:5000/local-image:${{ github.sha }}
      - name: Upload results to github security
        uses: github/codeql-action/upload-sarif@v1
        with:
          sarif_file: artifacts/results.sarif
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Results in GitHub Security
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-scan-examples" rel="noopener noreferrer"&gt;Check an example&lt;/a&gt; using Vilicus GitHub Action&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtmicxlyd96ikk9a0xqu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtmicxlyd96ikk9a0xqu.png" alt="Pipeline example"&gt;&lt;/a&gt;Pipeline example&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcq9ox1c1zifdusp9k6a.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcq9ox1c1zifdusp9k6a.jpeg" alt="List with all vulns found"&gt;&lt;/a&gt;List with all vulns found&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx8g581t9k2gqexyusms.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx8g581t9k2gqexyusms.jpeg" alt="Vuln details"&gt;&lt;/a&gt;Vuln details&lt;/p&gt;




&lt;h2&gt;
  
  
  Source Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus-github-action" rel="noopener noreferrer"&gt;VIlicus GitHub Action&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/edersonbrilhante/vilicus" rel="noopener noreferrer"&gt;Vilicus&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  That’s it!
&lt;/h2&gt;

&lt;p&gt;In case you have any questions, please leave a comment here or ping me on &lt;a href="https://www.linkedin.com/in/edersonbrilhante" rel="noopener noreferrer"&gt;🔗 LinkedIn&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>security</category>
      <category>docker</category>
      <category>devops</category>
      <category>tooling</category>
    </item>
  </channel>
</rss>
