DEV Community

Foster Kojo Luh

Building Production-Ready Kubernetes Infrastructure with Pulumi, AWS, and TypeScript

How to create a cost-optimized, secure, and scalable EKS cluster that's ready for production workloads

"Infrastructure as Code is like having a recipe for your restaurant. You don't want your chef to wing it every night - unless you're running a 'Surprise Kitchen' and your customers are into that kind of chaos."


Introduction

Kubernetes has become the de facto standard for container orchestration, but setting up a production-ready cluster on AWS can be complex and time-consuming. It's like trying to assemble IKEA furniture without the instructions - you'll get there eventually, but there will be tears, broken parts, and probably a divorce.

In this guide, we'll walk through building a complete Kubernetes infrastructure using Pulumi and TypeScript that includes cost optimization, security best practices, comprehensive monitoring, and CI/CD automation. Think of it as the "IKEA instructions" for your cloud infrastructure.

By the end of this post, you'll have a production-ready EKS cluster that can handle real-world workloads while maintaining security, cost efficiency, and operational excellence. And hopefully, your marriage will still be intact.


Why This Approach?

The Challenge

Traditional infrastructure setup often involves:

  • Manual AWS console configuration (clicking buttons like you're playing a slot machine)
  • Inconsistent deployments across environments (because who needs consistency, right?)
  • Security gaps and compliance issues (it's not a bug, it's a feature!)
  • High costs without optimization (money is just a social construct anyway)
  • Limited monitoring and observability (ignorance is bliss, until it's not)

Our Solution

We'll use Infrastructure as Code (IaC) with Pulumi to create:

  • Cost-optimized EKS cluster with Spot instances (because we're not made of money)
  • Enhanced security with WAF, CloudTrail, and GuardDuty (paranoia is just good planning)
  • Comprehensive monitoring with CloudWatch dashboards (we're watching you, infrastructure)
  • Automated CI/CD with GitHub Actions (because manual deployments are so 2010)
  • Modular architecture for easy maintenance (like LEGO, but for grown-ups)

Prerequisites

Before we begin, ensure you have:

# Node.js 20.11.0+ (because we're not savages using Node 14)
nvm install 20.11.0 && nvm use 20.11.0

# Pulumi CLI (the magic wand for infrastructure)
# The CLI is not an npm package; @pulumi/pulumi is the SDK, installed per project in Step 1
brew install pulumi/tap/pulumi        # macOS; on Linux: curl -fsSL https://get.pulumi.com | sh

# AWS CLI configured (because clicking buttons is so passé)
aws configure

# kubectl for cluster management (the Swiss Army knife of Kubernetes)
brew install kubectl

Pro tip: If you're still using AWS CLI v1, you're basically driving a horse and buggy on the information superhighway.


Step 1: Project Structure and Setup

Let's start by creating a well-organized project structure. Because chaos is so last season.

kubernetes-infrastructure/
├── modules/
│   ├── eks.ts          # EKS cluster configuration (the main event)
│   ├── network.ts      # VPC and networking (the plumbing)
│   ├── security.ts     # Security components (the bouncer)
│   └── monitoring.ts   # CloudWatch dashboards (the surveillance system)
├── __tests__/          # Comprehensive test suite (because we're not gamblers)
├── scripts/            # Deployment automation (laziness is a virtue)
├── .github/workflows/  # CI/CD pipeline (the assembly line)
├── config.ts           # Centralized configuration (the control center)
├── index.ts            # Pulumi entry point that wires the modules together
└── Pulumi.yaml         # Project file the Pulumi CLI looks for
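The tree lists a `config.ts` without showing it. Here is one minimal sketch of what it could hold, assuming the Pulumi stack name (`dev`, `staging`, `prod`) picks the environment and reusing the dev/non-dev sizing from Step 5. `loadConfig`, `InfraConfig`, and the values (including the placeholder CIDR) are illustrative, not part of the original project.

```typescript
// config.ts (sketch): one place for environment-specific settings.
// Assumption: the Pulumi stack name selects the environment; sizing mirrors Step 5.
const ENVIRONMENTS = ["dev", "staging", "prod"] as const;
export type Environment = (typeof ENVIRONMENTS)[number];

export interface InfraConfig {
    environment: Environment;
    vpcCidr: string;
    clusterVersion: string;
    nodeGroupInstanceType: string;
    nodeGroupDesiredCapacity: number;
    nodeGroupMinSize: number;
    nodeGroupMaxSize: number;
    tags: Record<string, string>;
}

// Type guard so the rest of the function works with the narrowed Environment type
function isEnvironment(name: string): name is Environment {
    return (ENVIRONMENTS as readonly string[]).includes(name);
}

export function loadConfig(stackName: string): InfraConfig {
    if (!isEnvironment(stackName)) {
        throw new Error(`Unknown stack "${stackName}"; expected one of ${ENVIRONMENTS.join(", ")}`);
    }
    const isDev = stackName === "dev";
    return {
        environment: stackName,
        vpcCidr: "10.0.0.0/16", // placeholder; pick a range that doesn't overlap your other networks
        clusterVersion: "1.29",
        nodeGroupInstanceType: isDev ? "t3.small" : "t3.medium",
        nodeGroupDesiredCapacity: isDev ? 1 : 3,
        nodeGroupMinSize: isDev ? 1 : 2,
        nodeGroupMaxSize: isDev ? 3 : 5,
        tags: { Environment: stackName, ManagedBy: "pulumi" },
    };
}
```

In `index.ts` you would call something like `loadConfig(pulumi.getStack())` and hand the fields to the modules. Failing fast on an unknown stack name is cheaper than discovering a typo after `pulumi up` has started creating resources.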

Initialize the Project

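mkdir kubernetes-infrastructure
cd kubernetes-infrastructure
# Generates Pulumi.yaml, index.ts, tsconfig.json and package.json,
# and installs @pulumi/pulumi and @pulumi/aws
pulumi new aws-typescript --name kubernetes-infrastructure --stack dev
npm install @pulumi/awsx @pulumi/kubernetes
npm install --save-dev jest ts-jest @types/jest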

This is like setting up your kitchen before you start cooking. You don't want to be running around looking for a spatula when your code is on fire.


Step 2: Core Infrastructure Components

Network Layer (VPC with AWSX)

We'll use AWSX for simplified VPC creation with best practices. It's like having a sous chef who knows exactly what you need before you ask.

// modules/network.ts
import * as pulumi from "@pulumi/pulumi";
import * as awsx from "@pulumi/awsx";

export interface NetworkStackArgs {
    vpcCidr: string;
    tags?: Record<string, string>;
}

export class NetworkStack extends pulumi.ComponentResource {
    public readonly vpc: awsx.ec2.Vpc;
    public readonly vpcId: pulumi.Output<string>;
    public readonly privateSubnetIds: pulumi.Output<string[]>;
    public readonly publicSubnetIds: pulumi.Output<string[]>;

    constructor(name: string, args: NetworkStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:network:NetworkStack", name, {}, opts);

        // Create VPC with AWSX best practices (awsx v1+ takes subnetSpecs, not subnets)
        this.vpc = new awsx.ec2.Vpc(`${name}-vpc`, {
            cidrBlock: args.vpcCidr,
            numberOfAvailabilityZones: 3, // Because two is company, three is a party
            subnetSpecs: [
                // The front door; the tag lets AWS load balancer tooling find public subnets
                { type: awsx.ec2.SubnetType.Public, tags: { "kubernetes.io/role/elb": "1" } },
                // The back room, where the worker nodes live
                { type: awsx.ec2.SubnetType.Private, tags: { "kubernetes.io/role/internal-elb": "1" } }
            ],
            // One NAT gateway instead of one per AZ: cheaper, but a single point of failure
            natGateways: { strategy: awsx.ec2.NatGatewayStrategy.Single },
            tags: {
                ...args.tags,
                Name: `${name}-vpc`,
                Component: "Network"
            }
        }, { parent: this });

        // Export values for other modules
        // Like leaving breadcrumbs for Hansel and Gretel
        this.vpcId = this.vpc.vpcId;
        this.privateSubnetIds = this.vpc.privateSubnetIds;
        this.publicSubnetIds = this.vpc.publicSubnetIds;

        this.registerOutputs({
            vpcId: this.vpcId,
            privateSubnetIds: this.privateSubnetIds,
            publicSubnetIds: this.publicSubnetIds
        });
    }
}
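Three AZs with a public and a private subnet in each means the VPC range gets carved into six subnets. awsx picks the actual sizes for you (and its default layout may differ), but the arithmetic is worth seeing once. `splitCidr` below is a hypothetical helper, not part of awsx, that splits a CIDR into equal power-of-two blocks:

```typescript
// Hypothetical helper: carve an IPv4 CIDR into `count` equal, power-of-two-sized subnets.
// Illustrative only; awsx's own allocation may size and order subnets differently.
function ipToInt(ip: string): number {
    const parts = ip.split(".").map(Number);
    if (parts.length !== 4 || parts.some(p => !Number.isInteger(p) || p < 0 || p > 255)) {
        throw new Error(`Invalid IPv4 address: ${ip}`);
    }
    // Multiply rather than bit-shift so addresses above 2^31 stay positive
    return parts.reduce((acc, p) => acc * 256 + p, 0);
}

function intToIp(n: number): string {
    return [24, 16, 8, 0].map(shift => Math.floor(n / 2 ** shift) % 256).join(".");
}

function splitCidr(cidr: string, count: number): string[] {
    const [base, prefixText] = cidr.split("/");
    const prefix = Number(prefixText);
    if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
        throw new Error(`Invalid CIDR: ${cidr}`);
    }
    // Extra prefix bits needed for at least `count` blocks (6 subnets -> 3 bits -> 8 blocks)
    const newPrefix = prefix + Math.ceil(Math.log2(count));
    if (newPrefix > 28) {
        throw new Error("Subnets would be smaller than /28, the smallest size AWS allows");
    }
    const blockSize = 2 ** (32 - newPrefix);
    // Align to the start of the original network in case a host address was passed
    const start = ipToInt(base) - (ipToInt(base) % 2 ** (32 - prefix));
    return Array.from({ length: count }, (_, i) => `${intToIp(start + i * blockSize)}/${newPrefix}`);
}

// 3 AZs x (public + private) = 6 subnets
console.log(splitCidr("10.0.0.0/16", 6)); // 10.0.0.0/19, 10.0.32.0/19, ..., 10.0.160.0/19
```

Six subnets round up to eight `/19` blocks of a `/16`, so two blocks stay free for later additions such as isolated database subnets.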

EKS Cluster with Cost Optimization

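// modules/eks.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

export interface EksClusterArgs {
    environment: string;
    clusterVersion?: string;
    clusterRoleArn: pulumi.Input<string>;   // IAM role the EKS control plane assumes
    nodeRoleArn: pulumi.Input<string>;      // IAM role the worker nodes assume
    clusterSecurityGroupId: pulumi.Input<string>;
    privateSubnetIds: pulumi.Input<pulumi.Input<string>[]>;
    nodeGroupInstanceType: string;
    nodeGroupDesiredCapacity: number;
    nodeGroupMinSize: number;
    nodeGroupMaxSize: number;
    tags?: Record<string, string>;
}

export class EksCluster extends pulumi.ComponentResource {
    public readonly cluster: aws.eks.Cluster;
    public readonly nodeGroup: aws.eks.NodeGroup;
    public readonly clusterName: pulumi.Output<string>;
    public readonly clusterEndpoint: pulumi.Output<string>;

    constructor(name: string, args: EksClusterArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:eks:EksCluster", name, {}, opts);

        // Create EKS cluster
        // This is like building a restaurant - you need the kitchen (control plane) first
        this.cluster = new aws.eks.Cluster(`${name}-cluster`, {
            name: `${name}-cluster-${args.environment}`,
            // Because 1.28 is so last year. Stay on a version in standard support:
            // versions in extended support cost more per cluster-hour.
            version: args.clusterVersion || "1.29",
            // aws.eks.Cluster takes the control-plane role here. `roleMappings` belongs to the
            // higher-level @pulumi/eks package, not this resource; grant human admins access
            // with EKS access entries (aws.eks.AccessEntry) or the aws-auth ConfigMap.
            roleArn: args.clusterRoleArn,
            vpcConfig: {
                subnetIds: args.privateSubnetIds,
                securityGroupIds: [args.clusterSecurityGroupId],
                endpointPrivateAccess: true,
                endpointPublicAccess: true // We're not hermits (publicAccessCidrs can narrow this)
            },
            tags: {
                ...args.tags,
                Name: `${name}-cluster`,
                Component: "EKS"
            }
        }, { parent: this });

        // Create cost-optimized node group
        // Using Spot instances because we're not made of money
        // It's like buying day-old bread - still good, just cheaper
        this.nodeGroup = new aws.eks.NodeGroup(`${name}-nodegroup`, {
            clusterName: this.cluster.name,
            nodeGroupName: `${name}-nodegroup-${args.environment}`,
            nodeRoleArn: args.nodeRoleArn,
            subnetIds: args.privateSubnetIds,
            // For Spot, listing several similar instance types improves availability
            instanceTypes: [args.nodeGroupInstanceType],
            capacityType: "SPOT", // The budget-friendly option
            scalingConfig: {
                desiredSize: args.nodeGroupDesiredCapacity,
                maxSize: args.nodeGroupMaxSize,
                minSize: args.nodeGroupMinSize
            },
            tags: {
                ...args.tags,
                Name: `${name}-nodegroup`,
                Component: "EKS"
            }
        }, { parent: this });

        // Expose the values other modules (and `pulumi stack output`) need
        this.clusterName = this.cluster.name;
        this.clusterEndpoint = this.cluster.endpoint;
        this.registerOutputs({
            clusterName: this.clusterName,
            clusterEndpoint: this.clusterEndpoint
        });
    }
}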
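EKS rejects a node group whose sizes are inconsistent (desired must sit between min and max), but only when `pulumi up` reaches the AWS API. A hypothetical guard like `validateScaling` below, run before constructing the `NodeGroup`, surfaces the mistake during `pulumi preview` or in unit tests:

```typescript
// Hypothetical pre-flight check for node group scaling settings.
interface ScalingConfig {
    minSize: number;
    desiredSize: number;
    maxSize: number;
}

// Returns a list of problems; an empty list means the settings are consistent
function validateScaling(s: ScalingConfig): string[] {
    const errors: string[] = [];
    for (const [key, value] of Object.entries(s)) {
        if (!Number.isInteger(value) || value < 0) {
            errors.push(`${key} must be a non-negative integer (got ${value})`);
        }
    }
    if (s.maxSize < 1) errors.push("maxSize must be at least 1");
    if (s.minSize > s.maxSize) errors.push("minSize cannot exceed maxSize");
    if (s.desiredSize < s.minSize || s.desiredSize > s.maxSize) {
        errors.push("desiredSize must be between minSize and maxSize");
    }
    return errors;
}

// Usage: fail before any resources are created
const devScaling: ScalingConfig = { minSize: 1, desiredSize: 1, maxSize: 3 };
const problems = validateScaling(devScaling);
if (problems.length > 0) {
    throw new Error(`Invalid node group scaling: ${problems.join("; ")}`);
}
```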

Step 3: Security Implementation

Multi-Layer Security Approach

"Security is like wearing a condom - it might not feel as good, but you'll thank yourself later."

// modules/security.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

export interface SecurityStackArgs {
    environment: string;
    // Existing S3 bucket whose policy allows cloudtrail.amazonaws.com to write logs
    cloudTrailBucketName: pulumi.Input<string>;
    tags?: Record<string, string>;
}

export class SecurityStack extends pulumi.ComponentResource {
    public readonly wafWebAcl: aws.wafv2.WebAcl;
    public readonly cloudTrail: aws.cloudtrail.Trail;
    public readonly guardDutyDetector: aws.guardduty.Detector;

    constructor(name: string, args: SecurityStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:security:SecurityStack", name, {}, opts);

        // WAF for HTTP(S) protection
        // It's like having a bouncer at your API door. The web ACL only takes effect once it is
        // associated (aws.wafv2.WebAclAssociation) with an ALB, API Gateway stage or similar;
        // it cannot be attached to the EKS control-plane endpoint.
        this.wafWebAcl = new aws.wafv2.WebAcl(`${name}-web-acl`, {
            name: `${name}-web-acl-${args.environment}`,
            scope: "REGIONAL",
            defaultAction: { allow: {} }, // Innocent until proven guilty
            rules: [{
                name: "RateLimitRule",
                priority: 1,
                action: { block: {} }, // You're cut off!
                statement: {
                    rateBasedStatement: {
                        limit: 2000, // 2000 requests per 5 minutes
                        aggregateKeyType: "IP" // By IP address, not by feelings
                    }
                },
                visibilityConfig: {
                    cloudwatchMetricsEnabled: true,
                    metricName: "RateLimitRule",
                    sampledRequestsEnabled: true
                }
            }],
            // Required on the web ACL itself, not just on each rule
            visibilityConfig: {
                cloudwatchMetricsEnabled: true,
                metricName: `${name}-web-acl`,
                sampledRequestsEnabled: true
            },
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        // CloudTrail for audit logging
        // Because Big Brother is watching, and in this case, that's a good thing
        this.cloudTrail = new aws.cloudtrail.Trail(`${name}-cloudtrail`, {
            name: `${name}-cloudtrail-${args.environment}`,
            s3BucketName: args.cloudTrailBucketName,
            includeGlobalServiceEvents: true,
            isMultiRegionTrail: true, // We're watching everywhere
            enableLogFileValidation: true,
            eventSelectors: [{
                readWriteType: "All",
                includeManagementEvents: true
            }],
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        // GuardDuty for threat detection
        // Like having a security guard who never sleeps.
        // Only one detector is allowed per account and region; import an existing one instead
        this.guardDutyDetector = new aws.guardduty.Detector(`${name}-guardduty`, {
            enable: true,
            findingPublishingFrequency: "FIFTEEN_MINUTES", // Because threats don't wait
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        this.registerOutputs({
            webAclArn: this.wafWebAcl.arn,
            trailArn: this.cloudTrail.arn,
            detectorId: this.guardDutyDetector.id
        });
    }
}
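The rate-based rule blocks an IP once it exceeds 2,000 requests in a 5-minute window, which averages out to roughly 6.7 requests per second. AWS does the actual counting (and its counts are approximate); the `SlidingWindowLimiter` class below is only a simplified, hypothetical in-memory model of a per-IP sliding window, handy for reasoning about whether a legitimate client, say a busy office behind one NAT address, would trip the limit:

```typescript
// Simplified per-IP sliding-window counter: a mental model of a rate-based rule,
// not AWS WAF's implementation.
class SlidingWindowLimiter {
    private readonly hits = new Map<string, number[]>();
    private readonly limit: number;
    private readonly windowMs: number;

    constructor(limit: number, windowMs: number) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    // Records a request from `ip` at `nowMs`; returns true if the IP is over the limit
    shouldBlock(ip: string, nowMs: number): boolean {
        const windowStart = nowMs - this.windowMs;
        // Drop timestamps that fell out of the window, then count this request
        const recent = (this.hits.get(ip) ?? []).filter(t => t > windowStart);
        recent.push(nowMs);
        this.hits.set(ip, recent);
        return recent.length > this.limit;
    }
}

// The rule from security.ts: 2,000 requests per 5 minutes per IP
const wafLikeLimiter = new SlidingWindowLimiter(2000, 5 * 60 * 1000);

// A client sending 7 requests per second for 300 seconds (2,100 requests)
let blocked = 0;
for (let i = 0; i < 2100; i++) {
    if (wafLikeLimiter.shouldBlock("203.0.113.7", Math.floor((i * 1000) / 7))) blocked++;
}
console.log(blocked); // 100
```

Sustaining just 7 requests per second is enough to lose the last 100 requests of the window, which is worth knowing before you put many users behind a single egress IP.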

Step 4: Monitoring and Observability

Comprehensive CloudWatch Dashboards

"Monitoring is like having a dashboard in your car - you don't need to look at it all the time, but when something goes wrong, you'll be glad it's there."

// modules/monitoring.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

export interface MonitoringStackArgs {
    environment: string;
    clusterName: pulumi.Input<string>;
    region: string;                         // Region the dashboard widgets query
    alarmTopicArn?: pulumi.Input<string>;   // Optional SNS topic for alarm notifications
    tags?: Record<string, string>;
}

export class MonitoringStack extends pulumi.ComponentResource {
    public readonly dashboard: aws.cloudwatch.Dashboard;

    constructor(name: string, args: MonitoringStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:monitoring:MonitoringStack", name, {}, opts);

        // Create comprehensive dashboard
        // It's like having a control room for your infrastructure.
        // These metrics live in the ContainerInsights namespace (not AWS/EKS) and only exist once
        // Container Insights is enabled, e.g. via the Amazon CloudWatch Observability add-on.
        this.dashboard = new aws.cloudwatch.Dashboard(`${name}-dashboard`, {
            dashboardName: `${name}-dashboard-${args.environment}`,
            // JSON.stringify avoids hand-escaping a JSON template string
            dashboardBody: pulumi.output(args.clusterName).apply(clusterName => JSON.stringify({
                widgets: [
                    {
                        type: "metric",
                        properties: {
                            metrics: [
                                ["ContainerInsights", "node_cpu_utilization", "ClusterName", clusterName],
                                [".", "node_memory_utilization", ".", "."]
                            ],
                            period: 300,
                            stat: "Average",
                            region: args.region,
                            title: "Cluster Resource Utilization"
                        }
                    },
                    {
                        type: "metric",
                        properties: {
                            metrics: [
                                ["ContainerInsights", "cluster_failed_node_count", "ClusterName", clusterName],
                                [".", "cluster_node_count", ".", "."]
                            ],
                            period: 300,
                            stat: "Average",
                            region: args.region,
                            title: "Node Health Status"
                        }
                    }
                ]
            }))
            // Note: CloudWatch dashboards don't accept tags, so none are set here
        }, { parent: this });

        // Create alarms for critical metrics
        // Because sometimes you need a wake-up call
        this.createAlarms(name, args);

        this.registerOutputs({ dashboardName: this.dashboard.dashboardName });
    }

    private createAlarms(name: string, args: MonitoringStackArgs): void {
        const alarms = [
            {
                name: "HighCPUUtilization",
                metricName: "node_cpu_utilization",
                threshold: 80,
                description: "High CPU utilization detected - your cluster is having a moment"
            },
            {
                name: "HighMemoryUtilization",
                metricName: "node_memory_utilization",
                threshold: 80,
                description: "High memory utilization detected - time for some spring cleaning"
            }
        ];

        alarms.forEach(alarm => {
            new aws.cloudwatch.MetricAlarm(`${name}-${alarm.name}`, {
                name: `${name}-${alarm.name}-${args.environment}`,
                comparisonOperator: "GreaterThanThreshold",
                evaluationPeriods: 2, // Two strikes and you're out
                metricName: alarm.metricName,
                namespace: "ContainerInsights",
                // Without the ClusterName dimension the alarm watches a metric that never gets data
                dimensions: { ClusterName: args.clusterName },
                period: 300,
                statistic: "Average",
                threshold: alarm.threshold,
                alarmDescription: alarm.description,
                // An alarm without actions changes state silently; SNS delivers the wake-up call
                alarmActions: args.alarmTopicArn ? [args.alarmTopicArn] : [],
                tags: { ...args.tags, Component: "Monitoring" }
            }, { parent: this });
        });
    }
}
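With `evaluationPeriods: 2` and `period: 300`, an alarm only fires when two consecutive 5-minute averages exceed 80%, so a single spike doesn't wake anyone up. The `evaluateAlarm` function below is a simplified, hypothetical model of that rule in isolation; it ignores CloudWatch details such as `datapointsToAlarm` ("M out of N") and missing-data treatment:

```typescript
// Simplified model of a GreaterThanThreshold alarm evaluated over N periods.
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

function evaluateAlarm(periodAverages: number[], threshold: number, evaluationPeriods: number): AlarmState {
    if (periodAverages.length < evaluationPeriods) return "INSUFFICIENT_DATA";
    // Only the most recent `evaluationPeriods` datapoints matter
    const recent = periodAverages.slice(-evaluationPeriods);
    // GreaterThanThreshold is strict: exactly 80 does not breach an 80 threshold
    return recent.every(v => v > threshold) ? "ALARM" : "OK";
}

// Five-minute CPU averages
console.log(evaluateAlarm([55, 92, 60], 80, 2)); // OK: one spike isn't enough
console.log(evaluateAlarm([55, 85, 91], 80, 2)); // ALARM: two breaching periods in a row
```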

Step 5: Cost Optimization Strategies

Spot Instances and Auto Scaling

"Cost optimization is like being frugal at a buffet - you want to get your money's worth, but you don't want to be that person who takes all the shrimp."

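// Cost optimization in EKS configuration
import * as pulumi from "@pulumi/pulumi";

// Treat the "dev" stack as the development environment
const isDev = pulumi.getStack() === "dev";

const costOptimizedConfig = {
    // Use Spot instances for up to 90% cost savings
    // It's like buying airline tickets on the day of the flight: cheap, but AWS can reclaim
    // the seat with a two-minute warning, so run interruption-tolerant workloads here
    capacityType: "SPOT",

    // Smaller instance types for dev environments
    // Because your dev environment doesn't need to be a muscle car.
    // For Spot, listing several similar types (e.g. t3.medium and t3a.medium) lowers the
    // chance of running out of capacity
    instanceTypes: isDev ? ["t3.small"] : ["t3.medium"],

    // Reduced node counts for dev
    // One server is enough for development, unless you're testing how fast you can break things
    scalingConfig: {
        desiredSize: isDev ? 1 : 3,
        maxSize: isDev ? 3 : 5,
        minSize: isDev ? 1 : 2
    }
};

// Single NAT Gateway to reduce costs
// Because one gateway is enough, unless you're running a toll booth business.
// This is a VPC setting, not a node group one: with awsx, pass
// natGateways: { strategy: awsx.ec2.NatGatewayStrategy.Single } to the Vpc.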
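"Up to 90%" is the best case AWS advertises; real Spot discounts vary by instance type, Availability Zone, and time. The hypothetical `estimateNodeCost` below compares monthly node cost using 730 hours per month and $0.0208/hour, the us-east-1 On-Demand Linux rate for t3.small. The $0.007/hour Spot price is an assumed input for illustration, not a quote; check current Spot pricing before relying on the result.

```typescript
const HOURS_PER_MONTH = 730; // the usual convention for AWS monthly estimates

interface NodeCostEstimate {
    onDemandMonthly: number;
    spotMonthly: number;
    savingsPercent: number;
}

// Compares On-Demand and Spot cost for `nodes` instances running all month
function estimateNodeCost(onDemandHourly: number, spotHourly: number, nodes: number): NodeCostEstimate {
    const round2 = (x: number) => Math.round(x * 100) / 100;
    return {
        onDemandMonthly: round2(onDemandHourly * HOURS_PER_MONTH * nodes),
        spotMonthly: round2(spotHourly * HOURS_PER_MONTH * nodes),
        savingsPercent: round2((1 - spotHourly / onDemandHourly) * 100),
    };
}

// 3 x t3.small; the Spot price here is an assumption
const estimate = estimateNodeCost(0.0208, 0.007, 3);
console.log(estimate); // { onDemandMonthly: 45.55, spotMonthly: 15.33, savingsPercent: 66.35 }
```

Three On-Demand t3.small nodes come to about $45.55 a month, which is roughly the worker-node line in the cost table later in this post.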

Resource Management

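import * as pulumi from "@pulumi/pulumi";
import * as k8s from "@pulumi/kubernetes";

// Provider pointing at the EKS cluster. Assumption: the kubeconfig is stored as a secret
// stack config value named "kubeconfig" (pulumi config set --secret kubeconfig ...)
const k8sProvider = new k8s.Provider("eks-k8s", {
    kubeconfig: new pulumi.Config().requireSecret("kubeconfig")
});

const appLabels = { app: "my-app" };

// Pod disruption budgets for controlled scaling
// It's like having a "Do Not Disturb" sign for your pods
const podDisruptionBudget = new k8s.policy.v1.PodDisruptionBudget("app-pdb", {
    spec: {
        minAvailable: 1, // At least one pod must survive voluntary disruptions (drains, scale-in)
        selector: { matchLabels: appLabels }
    }
}, { provider: k8sProvider });

// Resource limits to prevent waste
// Because unlimited resources are like unlimited breadsticks - sounds great, ends badly
const deployment = new k8s.apps.v1.Deployment("app", {
    spec: {
        replicas: 2, // With minAvailable: 1, a single replica would block node drains
        selector: { matchLabels: appLabels }, // Required for apps/v1 Deployments
        template: {
            metadata: { labels: appLabels },
            spec: {
                containers: [{
                    name: "app",
                    image: "nginx:1.25", // Placeholder image; use your application's image
                    resources: {
                        requests: { cpu: "100m", memory: "128Mi" }, // What the scheduler reserves
                        limits: { cpu: "500m", memory: "512Mi" }     // The hard ceiling
                    }
                }]
            }
        }
    }
}, { provider: k8sProvider });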
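The scheduler places pods by their requests, not their limits, so requests decide how many replicas fit on a node. Below is a back-of-the-envelope sketch (`podsPerNode` and its helpers are hypothetical), assuming a t3.small with 2 vCPU and 2 GiB and ignoring the capacity EKS reserves for the kubelet and system pods, so real numbers are lower. The other ceiling is the per-node pod limit from the Amazon VPC CNI: 11 pods on a t3.small without prefix delegation, and system DaemonSets such as `aws-node` and `kube-proxy` use some of those slots.

```typescript
// Parse Kubernetes CPU quantities ("100m", "0.5", "2") into millicores
function cpuToMillicores(q: string): number {
    return q.endsWith("m") ? Number(q.slice(0, -1)) : Number(q) * 1000;
}

// Parse binary memory quantities ("128Mi", "2Gi") into MiB; other suffixes are out of scope
function memoryToMiB(q: string): number {
    if (q.endsWith("Gi")) return Number(q.slice(0, -2)) * 1024;
    if (q.endsWith("Mi")) return Number(q.slice(0, -2));
    throw new Error(`Unsupported memory quantity: ${q}`);
}

interface NodeShape { cpu: string; memory: string; maxPods: number; }
interface PodRequests { cpu: string; memory: string; }

// How many identical pods fit on one node: the tightest of CPU, memory and pod count
function podsPerNode(node: NodeShape, pod: PodRequests): number {
    const byCpu = Math.floor(cpuToMillicores(node.cpu) / cpuToMillicores(pod.cpu));
    const byMemory = Math.floor(memoryToMiB(node.memory) / memoryToMiB(pod.memory));
    return Math.min(byCpu, byMemory, node.maxPods);
}

const t3Small: NodeShape = { cpu: "2", memory: "2Gi", maxPods: 11 };
console.log(podsPerNode(t3Small, { cpu: "100m", memory: "128Mi" })); // 11 (capped by maxPods)
console.log(podsPerNode(t3Small, { cpu: "500m", memory: "512Mi" })); // 4
```

With the requests from the Deployment above, the pod limit rather than CPU or memory is what fills a t3.small first.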

Step 6: CI/CD Pipeline

GitHub Actions Workflow

"CI/CD is like having a personal assistant who never calls in sick and doesn't need coffee breaks."

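# .github/workflows/ci-cd.yml
name: Kubernetes Infrastructure CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  AWS_REGION: us-east-1

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
          cache: npm

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Security scan
        run: npm audit --audit-level=high

  preview:
    needs: test
    runs-on: ubuntu-latest
    env:
      PULUMI_CONFIG_PASSPHRASE: ${{ secrets.PULUMI_CONFIG_PASSPHRASE }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
          cache: npm

      # The Pulumi program is TypeScript, so it needs node_modules to run
      - name: Install dependencies
        run: npm ci

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      # With no command given, this only installs the Pulumi CLI
      - name: Install Pulumi
        uses: pulumi/actions@v5

      # Assumes the runner can reach your state backend (e.g. PULUMI_BACKEND_URL or pulumi login)
      - name: Preview changes
        run: pulumi preview --stack dev

  deploy-dev:
    needs: preview
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    env:
      PULUMI_CONFIG_PASSPHRASE: ${{ secrets.PULUMI_CONFIG_PASSPHRASE }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
          cache: npm

      - name: Install dependencies
        run: npm ci

      # Deploying needs the same credentials and CLI as the preview job
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Install Pulumi
        uses: pulumi/actions@v5

      - name: Deploy to dev
        run: pulumi up --yes --stack dev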

Step 7: Deployment and Testing

Deploy the Infrastructure

"Deployment is like cooking - you can follow the recipe perfectly, but you still need to taste it to make sure it's good."

# Set up environment (needed when stack secrets use the passphrase provider)
export PULUMI_CONFIG_PASSPHRASE='your-secure-passphrase'

# Deploy infrastructure
pulumi up --yes

# Configure kubectl (assumes index.ts exports `clusterName`)
aws eks update-kubeconfig --name "$(pulumi stack output clusterName)" --region us-east-1

# Verify deployment
kubectl get nodes
kubectl get pods --all-namespaces
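`kubectl get nodes` is fine for eyeballing; in a script you usually want a pass/fail answer. The hypothetical `notReadyNodes` below takes the parsed output of `kubectl get nodes -o json` and lists nodes whose `Ready` condition isn't `True`. Fetching the JSON (for example with `execFileSync("kubectl", ["get", "nodes", "-o", "json"], { encoding: "utf8" })` from `node:child_process`) is left out so the check stays testable on its own.

```typescript
// The minimal slice of `kubectl get nodes -o json` that this check relies on
interface NodeCondition { type: string; status: string; }
interface NodeList {
    items: { metadata: { name: string }; status: { conditions?: NodeCondition[] } }[];
}

// Names of nodes that are not Ready; a missing Ready condition also counts as not Ready
function notReadyNodes(list: NodeList): string[] {
    return list.items
        .filter(node => !(node.status.conditions ?? []).some(c => c.type === "Ready" && c.status === "True"))
        .map(node => node.metadata.name);
}
```

A deployment script can then exit non-zero whenever the returned list is non-empty, instead of relying on someone reading the table.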

Run Tests

# Unit tests
npm test

# Integration tests (define a "test:integration" script in package.json first)
npm run test:integration

# Security tests (define a "test:security" script in package.json first)
npm run test:security
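The tests themselves aren't shown. One cheap, high-value check is that every resource carries the tags your cost reports depend on. Below is a self-contained sketch of that policy over plain objects; in a real suite you would collect resources through Pulumi's mocks (`pulumi.runtime.setMocks`) and assert inside Jest, which is omitted here. `findTagViolations` is hypothetical, and the required tag names are an assumption based on the `Name`/`Component` tags the modules set plus an `Environment` tag expected to arrive via `args.tags`.

```typescript
// Hypothetical tag policy: every resource must carry these keys
const REQUIRED_TAGS = ["Name", "Component", "Environment"] as const;

interface TaggedResource {
    urn: string;                     // used only to identify the resource in messages
    tags?: Record<string, string>;
}

// Returns one message per resource that is missing any required tag
function findTagViolations(resources: TaggedResource[]): string[] {
    const violations: string[] = [];
    for (const r of resources) {
        const missing = REQUIRED_TAGS.filter(key => !r.tags?.[key]);
        if (missing.length > 0) {
            violations.push(`${r.urn} is missing tags: ${missing.join(", ")}`);
        }
    }
    return violations;
}
```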

Testing is like proofreading your text messages before sending them - it might seem unnecessary, but it saves you from a lot of embarrassment.


Step 8: Production Considerations

Security Hardening

  1. Network Policies: Implement strict network policies (because trust no one)
  2. RBAC: Configure proper role-based access control (not everyone needs admin access)
  3. Pod Security Standards: Enable pod security admission (because pods can be sneaky)
  4. Secrets Management: Use AWS Secrets Manager (because hardcoding secrets is like writing your password on a billboard)
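For network policies, the usual starting point is a default-deny policy per namespace, after which you allow specific traffic explicitly. Below is that manifest as a typed plain object; the interfaces are a hand-written subset of the NetworkPolicy schema, not the `@pulumi/kubernetes` types, and `defaultDenyPolicy` is a hypothetical helper. With Pulumi you would pass the same `metadata` and `spec` to `k8s.networking.v1.NetworkPolicy`. EKS only enforces NetworkPolicy when a policy engine is active, such as the Amazon VPC CNI with network policy support enabled, or Calico.

```typescript
// Hand-written subset of the NetworkPolicy schema (networking.k8s.io/v1)
interface NetworkPolicyManifest {
    apiVersion: "networking.k8s.io/v1";
    kind: "NetworkPolicy";
    metadata: { name: string; namespace: string };
    spec: {
        podSelector: { matchLabels?: Record<string, string> };
        policyTypes: ("Ingress" | "Egress")[];
    };
}

// An empty podSelector matches every pod in the namespace; listing a policy type
// without any rules for it denies all traffic of that type
function defaultDenyPolicy(namespace: string, denyEgress = false): NetworkPolicyManifest {
    return {
        apiVersion: "networking.k8s.io/v1",
        kind: "NetworkPolicy",
        metadata: { name: "default-deny", namespace },
        spec: {
            podSelector: {},
            policyTypes: denyEgress ? ["Ingress", "Egress"] : ["Ingress"],
        },
    };
}
```

Denying egress as well also blocks DNS lookups, so pair `denyEgress` with a policy that allows traffic to CoreDNS before rolling it out.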

Monitoring and Alerting

  1. Custom Metrics: Implement application-specific metrics (because generic metrics are like generic compliments - nice but not meaningful)
  2. Log Aggregation: Centralize logs with CloudWatch (because logs are like receipts - boring but important)
  3. Alert Escalation: Set up proper alert routing (because waking up the wrong person at 3 AM is bad for business)
  4. Dashboard Access: Provide team access to monitoring (because knowledge is power, and power is responsibility)

Backup and Disaster Recovery

  1. Cluster State Backups: EKS manages etcd for you and doesn't expose it, so back up your Kubernetes objects instead, for example with Velero (because hope is not a strategy)
  2. Application Data: Persistent volume backups (because data is like money - you don't realize how much you have until you lose it)
  3. Multi-Region: Consider cross-region deployment (because putting all your eggs in one region is like putting all your money in one stock)
  4. Recovery Testing: Regular disaster recovery drills (because practice makes perfect, and perfect is expensive)

Cost Analysis and Optimization

Monthly Cost Breakdown (Dev Environment)

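| Component | Cost (USD) |
|---|---|
| EKS Control Plane | $0.10/hour ≈ $73/month |
| Worker Nodes (3x t3.small) | ~$45/month at On-Demand rates; Spot is usually lower |
| NAT Gateway | $0.045/hour ≈ $32/month, plus per-GB data processing |
| CloudWatch Logs | ~$10/month |
| Total | ~$160/month |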

That's roughly the price of ten Netflix subscriptions, but instead of watching movies, you're watching your infrastructure not break.
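The monthly figures come from multiplying hourly rates by about 730 hours. The hypothetical `monthlyTotal` below reproduces the table's arithmetic so you can plug in your own prices; the inputs are the table's figures, and usage-based charges (NAT data processing, log ingestion) are folded into fixed estimates rather than modelled.

```typescript
const HOURS_PER_MONTH = 730;

interface CostLine {
    component: string;
    hourlyUsd?: number;   // billed per hour
    monthlyUsd?: number;  // already a monthly estimate
}

// Converts every line to a monthly figure and sums them, rounding to cents
function monthlyTotal(lines: CostLine[]): { perLine: Record<string, number>; total: number } {
    const perLine: Record<string, number> = {};
    let total = 0;
    for (const line of lines) {
        const monthly = line.hourlyUsd !== undefined ? line.hourlyUsd * HOURS_PER_MONTH : (line.monthlyUsd ?? 0);
        perLine[line.component] = Math.round(monthly * 100) / 100;
        total += monthly;
    }
    return { perLine, total: Math.round(total * 100) / 100 };
}

// Figures from the dev-environment table; replace them with current prices for your region
const devCost = monthlyTotal([
    { component: "EKS control plane", hourlyUsd: 0.10 },
    { component: "Worker nodes", monthlyUsd: 45 },
    { component: "NAT Gateway", hourlyUsd: 0.045 },
    { component: "CloudWatch Logs", monthlyUsd: 10 },
]);
console.log(devCost.total); // 160.85
```

The table rounds the NAT line down to $32; unrounded it is $32.85, which is why the exact sum lands a little above $160.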

Production Cost Optimization

  • Reserved Instances: 1-3 year commitments for 30-60% savings (like buying in bulk at Costco)
  • Spot Instances: Up to 90% savings on worker nodes (the budget-friendly option)
  • Auto Scaling: Scale down during off-hours (because servers don't need to work overtime)
  • Resource Limits: Prevent resource waste (because unlimited anything is usually a bad idea)
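"Scale down during off-hours" usually means a schedule that lowers the node group's desired size at night and on weekends, for example through scheduled actions on the Auto Scaling group that backs a managed node group. The hypothetical `desiredNodesAt` below only shows the decision logic: given a time and a working-hours window (both assumptions you would tune), it returns the desired node count that whatever mechanism you choose would apply.

```typescript
interface OffHoursSchedule {
    workdayStartHourUtc: number; // inclusive
    workdayEndHourUtc: number;   // exclusive
    businessHoursNodes: number;
    offHoursNodes: number;
}

// Full size on weekdays during working hours, reduced size otherwise
function desiredNodesAt(date: Date, s: OffHoursSchedule): number {
    const day = date.getUTCDay();   // 0 = Sunday, 6 = Saturday
    const hour = date.getUTCHours();
    const isWeekday = day >= 1 && day <= 5;
    const inWorkingHours = hour >= s.workdayStartHourUtc && hour < s.workdayEndHourUtc;
    return isWeekday && inWorkingHours ? s.businessHoursNodes : s.offHoursNodes;
}

const schedule: OffHoursSchedule = { workdayStartHourUtc: 8, workdayEndHourUtc: 18, businessHoursNodes: 3, offHoursNodes: 1 };
console.log(desiredNodesAt(new Date("2024-03-13T10:00:00Z"), schedule)); // 3 (Wednesday morning)
console.log(desiredNodesAt(new Date("2024-03-16T10:00:00Z"), schedule)); // 1 (Saturday)
```

Keep the node group's `minSize` at or below `offHoursNodes`; otherwise the scale-down request is rejected.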

Best Practices and Lessons Learned

Infrastructure as Code Benefits

  1. Reproducibility: Identical environments every time (like having a recipe that actually works)
  2. Version Control: Track infrastructure changes (because "it was working yesterday" is not a debugging strategy)
  3. Collaboration: Team can review and contribute (because two heads are better than one, unless they're arguing)
  4. Testing: Validate changes before deployment (because testing in production is like learning to swim in the deep end)

Security First Approach

  1. Least Privilege: Minimal required permissions (because giving everyone admin access is like giving everyone keys to your house)
  2. Network Segmentation: Private subnets for workers (because not everything needs to be on the internet)
  3. Audit Logging: Complete activity tracking (because if you're not logging it, it didn't happen)
  4. Threat Detection: Automated security monitoring (because security through obscurity is like hiding your keys under the doormat)

Operational Excellence

  1. Monitoring: Comprehensive observability (because you can't fix what you can't see)
  2. Alerting: Proactive issue detection (because being reactive is expensive)
  3. Documentation: Clear runbooks and procedures (because tribal knowledge is like a game of telephone)
  4. Automation: Reduce manual operations (because humans are great at creativity, not repetition)

Conclusion

Building a production-ready Kubernetes infrastructure doesn't have to be overwhelming. By using Pulumi with TypeScript, we've created a solution that is:

  • Cost-effective with Spot instances and optimization (because money doesn't grow on trees)
  • Secure with multi-layer protection (because paranoia is just good planning)
  • Observable with comprehensive monitoring (because ignorance is not bliss when it comes to infrastructure)
  • Automated with CI/CD pipelines (because manual deployments are so 2010)
  • Maintainable with modular architecture (because technical debt is like credit card debt - it compounds)

This infrastructure can scale from development to production while maintaining security, cost efficiency, and operational excellence. The modular approach makes it easy to customize for your specific needs while following AWS and Kubernetes best practices.

Next Steps

  1. Customize the configuration for your specific requirements (because one size doesn't fit all)
  2. Test thoroughly in a staging environment (because staging is like a dress rehearsal - it's not the real thing, but it's close enough)
  3. Monitor costs and performance closely (because what gets measured gets managed)
  4. Iterate based on real-world usage patterns (because the best plan is the one that adapts)

Remember, infrastructure is never "done" - it's a continuous journey of improvement and optimization. Start with this foundation and build upon it as your needs evolve. Because in the world of infrastructure, the only constant is change, and the only certainty is that something will break when you least expect it.


Ready to build your own production-ready Kubernetes infrastructure? The complete code and documentation are available in the GitHub repository.

And remember: "The best time to plant a tree was 20 years ago. The second best time is now." The same applies to infrastructure - the best time to set up proper infrastructure was yesterday, but today is a close second.


"Infrastructure as Code is like having a backup plan for your backup plan. It's not paranoia if they're really out to get your servers."
