Building Production-Ready Kubernetes Infrastructure with Pulumi and TypeScript
How to create a cost-optimized, secure, and scalable EKS cluster that's ready for production workloads
"Infrastructure as Code is like having a recipe for your restaurant. You don't want your chef to wing it every night - unless you're running a 'Surprise Kitchen' and your customers are into that kind of chaos."
Introduction
Kubernetes has become the de facto standard for container orchestration, but setting up a production-ready cluster on AWS can be complex and time-consuming. It's like trying to assemble IKEA furniture without the instructions - you'll get there eventually, but there will be tears, broken parts, and probably a divorce.
In this guide, we'll walk through building a complete Kubernetes infrastructure using Pulumi and TypeScript that includes cost optimization, security best practices, comprehensive monitoring, and CI/CD automation. Think of it as the "IKEA instructions" for your cloud infrastructure.
By the end of this post, you'll have a production-ready EKS cluster that can handle real-world workloads while maintaining security, cost efficiency, and operational excellence. And hopefully, your marriage will still be intact.
Why This Approach?
The Challenge
Traditional infrastructure setup often involves:
- Manual AWS console configuration (clicking buttons like you're playing a slot machine)
- Inconsistent deployments across environments (because who needs consistency, right?)
- Security gaps and compliance issues (it's not a bug, it's a feature!)
- High costs without optimization (money is just a social construct anyway)
- Limited monitoring and observability (ignorance is bliss, until it's not)
Our Solution
We'll use Infrastructure as Code (IaC) with Pulumi to create:
- Cost-optimized EKS cluster with Spot instances (because we're not made of money)
- Enhanced security with WAF, CloudTrail, and GuardDuty (paranoia is just good planning)
- Comprehensive monitoring with CloudWatch dashboards (we're watching you, infrastructure)
- Automated CI/CD with GitHub Actions (because manual deployments are so 2010)
- Modular architecture for easy maintenance (like LEGO, but for grown-ups)
Prerequisites
Before we begin, ensure you have:
# Node.js 20.11.0+ (because we're not savages using Node 14)
nvm use 20.11.0
# Pulumi CLI (the magic wand for infrastructure)
# Note: the npm package @pulumi/pulumi is the Node.js SDK, not the CLI
brew install pulumi/tap/pulumi
# AWS CLI configured (because clicking buttons is so passé)
aws configure
# kubectl for cluster management (the Swiss Army knife of Kubernetes)
brew install kubectl
Pro tip: If you're still using AWS CLI v1, you're basically driving a horse and buggy on the information superhighway.
Step 1: Project Structure and Setup
Let's start by creating a well-organized project structure. Because chaos is so last season.
kubernetes-infrastructure/
├── modules/
│ ├── eks.ts # EKS cluster configuration (the main event)
│ ├── network.ts # VPC and networking (the plumbing)
│ ├── security.ts # Security components (the bouncer)
│ └── monitoring.ts # CloudWatch dashboards (the surveillance system)
├── __tests__/ # Comprehensive test suite (because we're not gamblers)
├── scripts/ # Deployment automation (laziness is a virtue)
├── .github/workflows/ # CI/CD pipeline (the assembly line)
└── config.ts # Centralized configuration (the control center)
Initialize the Project
mkdir kubernetes-infrastructure
cd kubernetes-infrastructure
npm init -y
npm install @pulumi/pulumi @pulumi/aws @pulumi/awsx @pulumi/kubernetes
npm install --save-dev typescript @types/node jest ts-jest @types/jest
This is like setting up your kitchen before you start cooking. You don't want to be running around looking for a spatula when your code is on fire.
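The project layout lists a centralized config.ts without showing it. A minimal sketch of what such a file could contain is below. The type names, the CIDR, and the tag values are assumptions made for illustration; the instance types and node counts mirror the ones used in Step 5 of this guide.

```typescript
// config.ts (sketch): one typed place for per-environment settings.
// Names and values are illustrative assumptions, not the repository's actual file.

type Environment = "dev" | "staging" | "prod";

interface ClusterSettings {
  environment: Environment;
  vpcCidr: string;
  nodeInstanceType: string;
  nodeMinSize: number;
  nodeDesiredSize: number;
  nodeMaxSize: number;
  tags: Record<string, string>;
}

function getClusterSettings(environment: Environment): ClusterSettings {
  const isDev = environment === "dev";

  const settings: ClusterSettings = {
    environment,
    vpcCidr: "10.0.0.0/16", // hypothetical CIDR range
    nodeInstanceType: isDev ? "t3.small" : "t3.medium",
    nodeMinSize: isDev ? 1 : 2,
    nodeDesiredSize: isDev ? 1 : 3,
    nodeMaxSize: isDev ? 3 : 5,
    tags: { Project: "kubernetes-infrastructure", Environment: environment },
  };

  // Fail fast on impossible scaling bounds instead of at deploy time.
  if (
    settings.nodeMinSize > settings.nodeDesiredSize ||
    settings.nodeDesiredSize > settings.nodeMaxSize
  ) {
    throw new Error(`Invalid node group sizes for ${environment}`);
  }

  return settings;
}
```

Keeping the validation next to the values means a typo in a node count fails at startup instead of halfway through a deployment.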
Step 2: Core Infrastructure Components
Network Layer (VPC with AWSX)
We'll use AWSX for simplified VPC creation with best practices. It's like having a sous chef who knows exactly what you need before you ask.
// modules/network.ts
import * as pulumi from "@pulumi/pulumi";
import * as awsx from "@pulumi/awsx";

// Arguments inferred from how they are used below
export interface NetworkStackArgs {
    vpcCidr: string;
    tags?: Record<string, string>;
}

export class NetworkStack extends pulumi.ComponentResource {
    public readonly vpc: awsx.ec2.Vpc;
    public readonly vpcId: pulumi.Output<string>;
    public readonly privateSubnetIds: pulumi.Output<string[]>;
    public readonly publicSubnetIds: pulumi.Output<string[]>;

    constructor(name: string, args: NetworkStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:network:NetworkStack", name, {}, opts);

        // Create VPC with AWSX best practices
        this.vpc = new awsx.ec2.Vpc(`${name}-vpc`, {
            cidrBlock: args.vpcCidr,
            numberOfAvailabilityZones: 3, // Because two is company, three is a party
            // awsx 1.x and later take `subnetSpecs`; `mapPublicIpOnLaunch` is not a subnet spec option
            subnetSpecs: [
                { type: awsx.ec2.SubnetType.Public },  // The front door
                { type: awsx.ec2.SubnetType.Private }  // The back room
            ],
            tags: {
                ...args.tags,
                Name: `${name}-vpc`,
                Component: "Network"
            }
        }, { parent: this });

        // Export values for other modules
        // Like leaving breadcrumbs for Hansel and Gretel
        this.vpcId = this.vpc.vpcId;
        this.privateSubnetIds = this.vpc.privateSubnetIds;
        this.publicSubnetIds = this.vpc.publicSubnetIds;

        this.registerOutputs({
            vpcId: this.vpcId,
            privateSubnetIds: this.privateSubnetIds,
            publicSubnetIds: this.publicSubnetIds
        });
    }
}
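To get a feel for how one VPC CIDR turns into public and private subnets across three availability zones, here is a small dependency-free sketch that splits a block into equal subnets. It is only an illustration of the arithmetic: awsx applies its own allocation rules, and the 10.0.0.0/16 range and /19 subnet size are assumptions chosen for the example.

```typescript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip: string): number {
  return ip
    .split(".")
    .reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// Convert an unsigned 32-bit integer back to dotted-quad notation.
function intToIp(value: number): string {
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");
}

// Split `cidr` into every possible subnet of size /newPrefix.
// Assumes `cidr` is already aligned to its own prefix (e.g. 10.0.0.0/16).
function splitCidr(cidr: string, newPrefix: number): string[] {
  const [base = "", prefixText = ""] = cidr.split("/");
  const prefix = Number(prefixText);
  if (!Number.isInteger(prefix) || newPrefix < prefix || newPrefix > 32) {
    throw new Error(`Cannot split ${cidr} into /${newPrefix} blocks`);
  }
  const count = 2 ** (newPrefix - prefix);     // how many subnets fit
  const blockSize = 2 ** (32 - newPrefix);     // addresses per subnet
  const start = ipToInt(base);
  return Array.from(
    { length: count },
    (_, i) => `${intToIp(start + i * blockSize)}/${newPrefix}`,
  );
}

// Three AZs x (public + private) = six subnets; a /16 holds eight /19 blocks.
const subnetBlocks = splitCidr("10.0.0.0/16", 19);
const subnetPlan = subnetBlocks.slice(0, 6).map((cidr, i) => ({
  type: i < 3 ? "public" : "private",
  azIndex: i % 3,
  cidr,
}));
```

Two of the eight blocks stay unused in this plan, which leaves room for an extra subnet tier later without re-addressing the VPC.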
EKS Cluster with Cost Optimization
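// modules/eks.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Arguments inferred from how they are used below
export interface EksClusterArgs {
    environment: string;
    clusterVersion?: string;
    clusterRoleArn: pulumi.Input<string>;
    clusterSecurityGroupId: pulumi.Input<string>;
    nodeRoleArn: pulumi.Input<string>;
    privateSubnetIds: pulumi.Input<pulumi.Input<string>[]>;
    nodeGroupInstanceType: string;
    nodeGroupDesiredCapacity: number;
    nodeGroupMaxSize: number;
    nodeGroupMinSize: number;
    tags?: Record<string, string>;
}

export class EksCluster extends pulumi.ComponentResource {
    public readonly cluster: aws.eks.Cluster;
    public readonly nodeGroup: aws.eks.NodeGroup;
    public readonly clusterName: pulumi.Output<string>;
    public readonly clusterEndpoint: pulumi.Output<string>;

    constructor(name: string, args: EksClusterArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:eks:EksCluster", name, {}, opts);

        // Create EKS cluster
        // This is like building a restaurant - you need the kitchen (control plane) first
        this.cluster = new aws.eks.Cluster(`${name}-cluster`, {
            name: `${name}-cluster-${args.environment}`,
            version: args.clusterVersion || "1.29", // Because 1.28 is so last year
            // aws.eks.Cluster requires the cluster IAM role here; `roleMappings`
            // is an option of the higher-level @pulumi/eks package, not of this resource
            roleArn: args.clusterRoleArn,
            vpcConfig: {
                subnetIds: args.privateSubnetIds,
                securityGroupIds: [args.clusterSecurityGroupId],
                endpointPrivateAccess: true,
                // We're not hermits, but consider restricting publicAccessCidrs in production
                endpointPublicAccess: true
            },
            tags: {
                ...args.tags,
                Name: `${name}-cluster`,
                Component: "EKS"
            }
        }, { parent: this });

        // Create cost-optimized node group
        // Using Spot instances because we're not made of money
        // It's like buying day-old bread - still good, just cheaper
        this.nodeGroup = new aws.eks.NodeGroup(`${name}-nodegroup`, {
            clusterName: this.cluster.name,
            nodeGroupName: `${name}-nodegroup-${args.environment}`,
            nodeRoleArn: args.nodeRoleArn,
            subnetIds: args.privateSubnetIds,
            instanceTypes: [args.nodeGroupInstanceType],
            capacityType: "SPOT", // The budget-friendly option
            scalingConfig: {
                desiredSize: args.nodeGroupDesiredCapacity,
                maxSize: args.nodeGroupMaxSize,
                minSize: args.nodeGroupMinSize
            },
            tags: {
                ...args.tags,
                Name: `${name}-nodegroup`,
                Component: "EKS"
            }
        }, { parent: this });

        // Populate the declared outputs so other modules can consume them
        this.clusterName = this.cluster.name;
        this.clusterEndpoint = this.cluster.endpoint;
        this.registerOutputs({
            clusterName: this.clusterName,
            clusterEndpoint: this.clusterEndpoint
        });
    }
}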
Step 3: Security Implementation
Multi-Layer Security Approach
"Security is like wearing a condom - it might not feel as good, but you'll thank yourself later."
// modules/security.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Arguments inferred from how they are used below
export interface SecurityStackArgs {
    environment: string;
    cloudTrailBucketName: pulumi.Input<string>;
    tags?: Record<string, string>;
}

export class SecurityStack extends pulumi.ComponentResource {
    public readonly wafWebAcl: aws.wafv2.WebAcl;
    public readonly cloudTrail: aws.cloudtrail.Trail;
    public readonly guardDutyDetector: aws.guardduty.Detector;

    constructor(name: string, args: SecurityStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:security:SecurityStack", name, {}, opts);

        // WAF for API protection
        // It's like having a bouncer at your API door
        // (attach it to a load balancer with aws.wafv2.WebAclAssociation)
        this.wafWebAcl = new aws.wafv2.WebAcl(`${name}-web-acl`, {
            name: `${name}-web-acl-${args.environment}`,
            scope: "REGIONAL",
            defaultAction: { allow: {} }, // Innocent until proven guilty
            rules: [{
                name: "RateLimitRule",
                priority: 1,
                action: { block: {} }, // You're cut off!
                statement: {
                    rateBasedStatement: {
                        limit: 2000, // 2000 requests per 5 minutes
                        aggregateKeyType: "IP" // By IP address, not by feelings
                    }
                },
                visibilityConfig: {
                    cloudwatchMetricsEnabled: true,
                    metricName: "RateLimitRule",
                    sampledRequestsEnabled: true
                }
            }],
            // The web ACL itself also requires a visibilityConfig, not just each rule
            visibilityConfig: {
                cloudwatchMetricsEnabled: true,
                metricName: `${name}-web-acl`,
                sampledRequestsEnabled: true
            },
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        // CloudTrail for audit logging
        // Because Big Brother is watching, and in this case, that's a good thing
        this.cloudTrail = new aws.cloudtrail.Trail(`${name}-cloudtrail`, {
            name: `${name}-cloudtrail-${args.environment}`,
            s3BucketName: args.cloudTrailBucketName,
            includeGlobalServiceEvents: true,
            isMultiRegionTrail: true, // We're watching everywhere
            enableLogFileValidation: true,
            eventSelectors: [{
                readWriteType: "All",
                includeManagementEvents: true
            }],
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        // GuardDuty for threat detection
        // Like having a security guard who never sleeps
        this.guardDutyDetector = new aws.guardduty.Detector(`${name}-guardduty`, {
            enable: true,
            findingPublishingFrequency: "FIFTEEN_MINUTES", // Because threats don't wait
            tags: { ...args.tags, Component: "Security" }
        }, { parent: this });

        this.registerOutputs({});
    }
}
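The rate-based rule above blocks an IP once it exceeds 2000 requests in a five-minute window. The sketch below models that idea with a simple in-memory sliding window. It is a conceptual illustration, not how AWS WAF is implemented (WAF evaluates rates periodically, so real blocking is approximate), and the class name is made up for this example.

```typescript
// Conceptual model of a per-IP rate limit over a trailing time window.
class SlidingWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly hitsByIp = new Map<string, number[]>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Records a request at `nowMs` and returns true if it is within the limit.
  // Rejected requests still count toward the window, so a client that keeps
  // hammering the endpoint stays blocked.
  allow(ip: string, nowMs: number): boolean {
    const cutoff = nowMs - this.windowMs;
    const recent = (this.hitsByIp.get(ip) ?? []).filter((t) => t > cutoff);
    recent.push(nowMs);
    this.hitsByIp.set(ip, recent);
    return recent.length <= this.limit;
  }
}

// Same numbers as the WAF rule: 2000 requests per 5 minutes, keyed by IP.
const wafLikeLimiter = new SlidingWindowRateLimiter(2000, 5 * 60 * 1000);
const firstRequestAllowed = wafLikeLimiter.allow("203.0.113.7", Date.now());
```

Each IP gets its own window, which is the behavior `aggregateKeyType: "IP"` asks for: one noisy client does not use up the budget of everyone else.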
Step 4: Monitoring and Observability
Comprehensive CloudWatch Dashboards
"Monitoring is like having a dashboard in your car - you don't need to look at it all the time, but when something goes wrong, you'll be glad it's there."
// modules/monitoring.ts
import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

// Arguments inferred from how they are used below
export interface MonitoringStackArgs {
    environment: string;
    clusterName: pulumi.Input<string>;
    region?: string; // defaults to us-east-1, the region used elsewhere in this guide
    tags?: Record<string, string>;
}

// Node and cluster metrics come from CloudWatch Container Insights, which must be
// enabled on the cluster; EKS does not publish them under an "AWS/EKS" namespace.
const METRIC_NAMESPACE = "ContainerInsights";

export class MonitoringStack extends pulumi.ComponentResource {
    public readonly dashboard: aws.cloudwatch.Dashboard;

    constructor(name: string, args: MonitoringStackArgs, opts?: pulumi.ComponentResourceOptions) {
        super("kubernetes:monitoring:MonitoringStack", name, {}, opts);

        const region = args.region ?? "us-east-1";

        // Create comprehensive dashboard
        // It's like having a control room for your infrastructure
        // (aws.cloudwatch.Dashboard does not accept tags)
        this.dashboard = new aws.cloudwatch.Dashboard(`${name}-dashboard`, {
            dashboardName: `${name}-dashboard-${args.environment}`,
            dashboardBody: pulumi.interpolate`{
                "widgets": [
                    {
                        "type": "metric",
                        "properties": {
                            "metrics": [
                                ["${METRIC_NAMESPACE}", "node_cpu_utilization", "ClusterName", "${args.clusterName}"],
                                [".", "node_memory_utilization", ".", "."]
                            ],
                            "region": "${region}",
                            "period": 300,
                            "title": "Cluster Resource Utilization"
                        }
                    },
                    {
                        "type": "metric",
                        "properties": {
                            "metrics": [
                                ["${METRIC_NAMESPACE}", "cluster_failed_node_count", "ClusterName", "${args.clusterName}"],
                                [".", "cluster_node_count", ".", "."]
                            ],
                            "region": "${region}",
                            "period": 300,
                            "title": "Node Health Status"
                        }
                    }
                ]
            }`
        }, { parent: this });

        // Create alarms for critical metrics
        // Because sometimes you need a wake-up call
        this.createAlarms(name, args);

        this.registerOutputs({});
    }

    private createAlarms(name: string, args: MonitoringStackArgs): void {
        const alarms = [
            {
                name: "HighCPUUtilization",
                metricName: "node_cpu_utilization",
                threshold: 80,
                description: "High CPU utilization detected - your cluster is having a moment"
            },
            {
                name: "HighMemoryUtilization",
                metricName: "node_memory_utilization",
                threshold: 80,
                description: "High memory utilization detected - time for some spring cleaning"
            }
        ];

        alarms.forEach(alarm => {
            new aws.cloudwatch.MetricAlarm(`${name}-${alarm.name}`, {
                name: `${name}-${alarm.name}-${args.environment}`,
                comparisonOperator: "GreaterThanThreshold",
                evaluationPeriods: 2, // Two strikes and you're out
                metricName: alarm.metricName,
                namespace: METRIC_NAMESPACE,
                // Without the dimension the alarm matches no data and stays in INSUFFICIENT_DATA
                dimensions: { ClusterName: args.clusterName },
                period: 300,
                statistic: "Average",
                threshold: alarm.threshold,
                alarmDescription: alarm.description,
                tags: { ...args.tags, Component: "Monitoring" }
            }, { parent: this });
        });
    }
}
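The alarms above use `evaluationPeriods: 2` with `GreaterThanThreshold`, meaning the metric has to exceed the threshold for two consecutive 5-minute periods before the alarm fires. A simplified model of that rule is sketched below. It deliberately ignores CloudWatch options such as `datapointsToAlarm` and missing-data handling, and the function name is hypothetical.

```typescript
type AlarmState = "OK" | "ALARM" | "INSUFFICIENT_DATA";

// Simplified model: ALARM only if each of the most recent `evaluationPeriods`
// datapoints is strictly greater than the threshold.
function evaluateAlarm(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
): AlarmState {
  if (datapoints.length < evaluationPeriods) {
    return "INSUFFICIENT_DATA";
  }
  const recent = datapoints.slice(-evaluationPeriods);
  return recent.every((value) => value > threshold) ? "ALARM" : "OK";
}

// One datapoint per 5-minute period, oldest first (CPU utilization in percent).
const sustainedLoad = evaluateAlarm([70, 85, 90], 80, 2); // both recent periods are above 80
const briefSpike = evaluateAlarm([85, 90, 75], 80, 2);    // latest period dropped back below 80
```

Requiring two periods trades roughly five extra minutes of detection delay for fewer pages caused by short spikes.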
Step 5: Cost Optimization Strategies
Spot Instances and Auto Scaling
"Cost optimization is like being frugal at a buffet - you want to get your money's worth, but you don't want to be that person who takes all the shrimp."
// Cost optimization in EKS configuration
import * as pulumi from "@pulumi/pulumi";

// Assumption: the development stack is literally named "dev"
const isDev = pulumi.getStack() === "dev";

const costOptimizedConfig = {
    // Use Spot instances for up to 90% cost savings
    // It's like buying airline tickets on the day of the flight
    // (Spot capacity can be reclaimed with two minutes' notice, so keep workloads interruption-tolerant)
    capacityType: "SPOT",

    // Smaller instance types for dev environments
    // Because your dev environment doesn't need to be a muscle car
    instanceTypes: isDev ? ["t3.small"] : ["t3.medium"],

    // Reduced node counts for dev
    // One server is enough for development, unless you're testing how fast you can break things
    scalingConfig: {
        desiredSize: isDev ? 1 : 3,
        maxSize: isDev ? 3 : 5,
        minSize: isDev ? 1 : 2
    },

    // Single NAT Gateway to reduce costs
    // Because one gateway is enough, unless you're running a toll booth business
    // (on the awsx VPC this is set as natGateways: { strategy: "Single" })
    natGatewayStrategy: "single"
};
Resource Management
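import * as k8s from "@pulumi/kubernetes";

// Assumes `k8sProvider` is a k8s.Provider created from the cluster's kubeconfig
const appLabels = { app: "my-app" };

// Pod disruption budgets for controlled scaling
// It's like having a "Do Not Disturb" sign for your pods
const podDisruptionBudget = new k8s.policy.v1.PodDisruptionBudget("app-pdb", {
    spec: {
        minAvailable: 1, // At least one pod must survive
        selector: {
            matchLabels: appLabels
        }
    }
}, { provider: k8sProvider });

// Resource limits to prevent waste
// Because unlimited resources are like unlimited breadsticks - sounds great, ends badly
const deployment = new k8s.apps.v1.Deployment("app", {
    spec: {
        replicas: 2, // More than minAvailable, so node drains can still evict a pod
        selector: { matchLabels: appLabels }, // Required, and must match the pod labels
        template: {
            metadata: { labels: appLabels },
            spec: {
                containers: [{
                    name: "app",
                    image: "nginx:1.27", // Placeholder image; substitute your own
                    resources: {
                        requests: { cpu: "100m", memory: "128Mi" }, // The minimum
                        limits: { cpu: "500m", memory: "512Mi" } // The maximum
                    }
                }]
            }
        }
    }
}, { provider: k8sProvider });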
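A PodDisruptionBudget with `minAvailable: 1` only permits voluntary evictions while more than one matching pod is healthy. The arithmetic is small enough to sketch directly. This mirrors the basic calculation Kubernetes reports as allowed disruptions, with hypothetical function names and without the percentage-based forms of `minAvailable`.

```typescript
// Number of pods that may be voluntarily evicted right now.
function allowedDisruptions(healthyPods: number, minAvailable: number): number {
  return Math.max(0, healthyPods - minAvailable);
}

// A node drain can evict a pod only if at least one disruption is allowed.
function canEvictOne(healthyPods: number, minAvailable: number): boolean {
  return allowedDisruptions(healthyPods, minAvailable) >= 1;
}

const singleReplica = canEvictOne(1, 1); // false: the drain waits on this pod
const threeReplicas = canEvictOne(3, 1); // true: two pods may be evicted
```

This is why a single-replica deployment combined with `minAvailable: 1` can stall node drains, which matters on Spot capacity where nodes are replaced regularly.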
Step 6: CI/CD Pipeline
GitHub Actions Workflow
"CI/CD is like having a personal assistant who never calls in sick and doesn't need coffee breaks."
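# .github/workflows/ci-cd.yml
name: Kubernetes Infrastructure CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

# Pulumi also needs a state backend: set PULUMI_ACCESS_TOKEN for Pulumi Cloud,
# or PULUMI_BACKEND_URL for a self-managed backend.

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Security scan
        run: npm audit

  preview:
    needs: test
    runs-on: ubuntu-latest
    env:
      PULUMI_CONFIG_PASSPHRASE: ${{ secrets.PULUMI_CONFIG_PASSPHRASE }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
      - name: Install dependencies
        run: npm ci
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Install Pulumi
        uses: pulumi/actions@v5
      - name: Preview changes
        run: pulumi preview --stack dev

  deploy-dev:
    needs: preview
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    env:
      PULUMI_CONFIG_PASSPHRASE: ${{ secrets.PULUMI_CONFIG_PASSPHRASE }}
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.11.0'
      - name: Install dependencies
        run: npm ci
      # Each job runs on a fresh runner, so credentials and the CLI are set up again
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Install Pulumi
        uses: pulumi/actions@v5
      - name: Deploy to dev
        run: pulumi up --yes --stack dev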
Step 7: Deployment and Testing
Deploy the Infrastructure
"Deployment is like cooking - you can follow the recipe perfectly, but you still need to taste it to make sure it's good."
# Set up environment
export PULUMI_CONFIG_PASSPHRASE='your-secure-passphrase'
# Deploy infrastructure
pulumi up --yes
# Configure kubectl
aws eks update-kubeconfig --name $(pulumi stack output clusterName) --region us-east-1
# Verify deployment
kubectl get nodes
kubectl get pods --all-namespaces
Run Tests
# Unit tests
npm test
# Integration tests
npm run test:integration
# Security tests
npm run test:security
Testing is like proofreading your text messages before sending them - it might seem unnecessary, but it saves you from a lot of embarrassment.
Step 8: Production Considerations
Security Hardening
- Network Policies: Implement strict network policies (because trust no one)
- RBAC: Configure proper role-based access control (not everyone needs admin access)
- Pod Security Standards: Enable pod security admission (because pods can be sneaky)
- Secrets Management: Use AWS Secrets Manager (because hardcoding secrets is like writing your password on a billboard)
Monitoring and Alerting
- Custom Metrics: Implement application-specific metrics (because generic metrics are like generic compliments - nice but not meaningful)
- Log Aggregation: Centralize logs with CloudWatch (because logs are like receipts - boring but important)
- Alert Escalation: Set up proper alert routing (because waking up the wrong person at 3 AM is bad for business)
- Dashboard Access: Provide team access to monitoring (because knowledge is power, and power is responsibility)
Backup and Disaster Recovery
- Cluster State Backups: EKS manages etcd for you and does not expose it, so back up Kubernetes resources with a tool such as Velero (because hope is not a strategy)
- Application Data: Persistent volume backups (because data is like money - you don't realize how much you have until you lose it)
- Multi-Region: Consider cross-region deployment (because putting all your eggs in one region is like putting all your money in one stock)
- Recovery Testing: Regular disaster recovery drills (because practice makes perfect, and perfect is expensive)
Cost Analysis and Optimization
Monthly Cost Breakdown (Dev Environment)
Component | Approx. monthly cost (USD) |
---|---|
EKS Control Plane | $0.10/hour ≈ $73 |
Worker Nodes (up to 3x t3.small Spot) | ~$45 |
NAT Gateway (single, excluding data processing) | $0.045/hour ≈ $33 |
CloudWatch Logs | ~$10 |
Total | ~$160 |
That's a good deal more than a Netflix subscription, but instead of watching movies, you're watching your infrastructure not break.
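The monthly figures in the table come from multiplying hourly rates by the hours in a month. A rough sketch of that arithmetic follows, assuming 730 hours per month (8760 / 12) and taking the worker node and log costs as the flat estimates from the table. Actual AWS prices vary by region and over time, so treat the numbers as placeholders.

```typescript
const HOURS_PER_MONTH = 730; // 8760 hours per year / 12

interface CostLine {
  component: string;
  hourlyUsd?: number;  // priced per hour
  monthlyUsd?: number; // flat monthly estimate
}

// Monthly cost of one line: the flat estimate if given, else hourly rate x hours.
function monthlyCost(line: CostLine): number {
  if (line.monthlyUsd !== undefined) {
    return line.monthlyUsd;
  }
  return (line.hourlyUsd ?? 0) * HOURS_PER_MONTH;
}

const devCosts: CostLine[] = [
  { component: "EKS Control Plane", hourlyUsd: 0.10 },
  { component: "Worker Nodes (Spot)", monthlyUsd: 45 },
  { component: "NAT Gateway", hourlyUsd: 0.045 },
  { component: "CloudWatch Logs", monthlyUsd: 10 },
];

const devTotalUsd = devCosts.reduce((sum, line) => sum + monthlyCost(line), 0);
console.log(`Estimated dev total: $${devTotalUsd.toFixed(2)}/month`);
```

The fixed items (control plane and NAT Gateway) make up roughly two thirds of the dev bill, so shrinking worker nodes further saves less than sharing a cluster between environments would.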
Production Cost Optimization
- Reserved Instances: 1-3 year commitments for 30-60% savings (like buying in bulk at Costco)
- Spot Instances: Up to 90% savings on worker nodes (the budget-friendly option)
- Auto Scaling: Scale down during off-hours (because servers don't need to work overtime)
- Resource Limits: Prevent resource waste (because unlimited anything is usually a bad idea)
Best Practices and Lessons Learned
Infrastructure as Code Benefits
- Reproducibility: Identical environments every time (like having a recipe that actually works)
- Version Control: Track infrastructure changes (because "it was working yesterday" is not a debugging strategy)
- Collaboration: Team can review and contribute (because two heads are better than one, unless they're arguing)
- Testing: Validate changes before deployment (because testing in production is like learning to swim in the deep end)
Security First Approach
- Least Privilege: Minimal required permissions (because giving everyone admin access is like giving everyone keys to your house)
- Network Segmentation: Private subnets for workers (because not everything needs to be on the internet)
- Audit Logging: Complete activity tracking (because if you're not logging it, it didn't happen)
- Threat Detection: Automated security monitoring (because security through obscurity is like hiding your keys under the doormat)
Operational Excellence
- Monitoring: Comprehensive observability (because you can't fix what you can't see)
- Alerting: Proactive issue detection (because being reactive is expensive)
- Documentation: Clear runbooks and procedures (because tribal knowledge is like a game of telephone)
- Automation: Reduce manual operations (because humans are great at creativity, not repetition)
Conclusion
Building a production-ready Kubernetes infrastructure doesn't have to be overwhelming. By using Pulumi with TypeScript, we've created a solution that is:
- Cost-effective with Spot instances and optimization (because money doesn't grow on trees)
- Secure with multi-layer protection (because paranoia is just good planning)
- Observable with comprehensive monitoring (because ignorance is not bliss when it comes to infrastructure)
- Automated with CI/CD pipelines (because manual deployments are so 2010)
- Maintainable with modular architecture (because technical debt is like credit card debt - it compounds)
This infrastructure can scale from development to production while maintaining security, cost efficiency, and operational excellence. The modular approach makes it easy to customize for your specific needs while following AWS and Kubernetes best practices.
Next Steps
- Customize the configuration for your specific requirements (because one size doesn't fit all)
- Test thoroughly in a staging environment (because staging is like a dress rehearsal - it's not the real thing, but it's close enough)
- Monitor costs and performance closely (because what gets measured gets managed)
- Iterate based on real-world usage patterns (because the best plan is the one that adapts)
Remember, infrastructure is never "done" - it's a continuous journey of improvement and optimization. Start with this foundation and build upon it as your needs evolve. Because in the world of infrastructure, the only constant is change, and the only certainty is that something will break when you least expect it.
Ready to build your own production-ready Kubernetes infrastructure? The complete code and documentation are available in the GitHub repository.
And remember: "The best time to plant a tree was 20 years ago. The second best time is now." The same applies to infrastructure - the best time to set up proper infrastructure was yesterday, but today is a close second.
"Infrastructure as Code is like having a backup plan for your backup plan. It's not paranoia if they're really out to get your servers."