<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: hugolesta</title>
    <description>The latest articles on DEV Community by hugolesta (@hugolesta).</description>
    <link>https://dev.to/hugolesta</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F199721%2Fb7aa0261-c82c-4a48-9701-2c03e14d3102.jpeg</url>
      <title>DEV Community: hugolesta</title>
      <link>https://dev.to/hugolesta</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/hugolesta"/>
    <language>en</language>
    <item>
      <title>How to Upgrade EKS 1.32: Making the Switch from bootstrap.sh to nodeadm</title>
      <dc:creator>hugolesta</dc:creator>
      <pubDate>Tue, 28 Oct 2025 17:10:23 +0000</pubDate>
      <link>https://dev.to/hugolesta/how-to-upgrade-eks-132-making-the-switch-from-bootstrapsh-to-nodeadm-9lf</link>
      <guid>https://dev.to/hugolesta/how-to-upgrade-eks-132-making-the-switch-from-bootstrapsh-to-nodeadm-9lf</guid>
      <description>&lt;p&gt;Did you know staying on a deprecated version of Kubernetes in EKS can cost you six times more? At $0.60 per hour instead of the standard $0.10 per hour, you could end up paying nearly $500 per month just for running outdated clusters.&lt;/p&gt;

&lt;p&gt;However, the EKS upgrade from version 1.31 to 1.32 isn't just about avoiding extended support fees—it introduces one of the biggest under-the-hood changes in recent EKS history. Specifically, the traditional bootstrap.sh script used for years to configure worker nodes is now replaced by a new tool called &lt;code&gt;nodeadm&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This architectural shift coincides with another critical change: after November 26, 2025, Amazon EKS will no longer publish EKS-optimized Amazon Linux 2 (AL2) AMIs. Furthermore, Kubernetes 1.32 will be the final version with AL2 AMI support; from version 1.33 onwards, only Amazon Linux 2023 (AL2023) and Bottlerocket-based AMIs will be supported.&lt;/p&gt;

&lt;p&gt;If your clusters currently run EKS 1.31 or earlier on Amazon Linux 2 AMIs, upgrading to 1.32 will break your node initialization unless you adapt your user-data scripts and switch to AL2023. Throughout this article, we'll walk through exactly what changes between versions 1.31 and 1.32, why &lt;code&gt;nodeadm&lt;/code&gt; is now required, and how to rewrite your user-data and Terraform templates to ensure a smooth transition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why EKS 1.32 Requires a New Bootstrap Approach
&lt;/h2&gt;

&lt;p&gt;Amazon's evolution of EKS introduces significant architectural changes with version 1.32. The most critical change affects how worker nodes bootstrap and join your clusters, requiring careful attention during upgrades.&lt;/p&gt;

&lt;h2&gt;
  
  
  End of Support for bootstrap.sh in AL2023
&lt;/h2&gt;

&lt;p&gt;EKS 1.32 marks a pivotal shift as it's the &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions-standard.html" rel="noopener noreferrer"&gt;final version for which Amazon will release Amazon Linux 2 (AL2) AMIs&lt;/a&gt;. AL2023 uses a completely different node initialization process that abandons the traditional &lt;code&gt;/etc/eks/bootstrap.sh&lt;/code&gt; script. That script has been the foundation of EKS node bootstrapping since the service launched, but it is absent entirely from the AL2023 operating system, along with the bash-based bootstrap approach that many DevOps teams have built automation around.&lt;/p&gt;

&lt;h2&gt;
  
  
  nodeadm as the New Default Bootstrap Tool
&lt;/h2&gt;

&lt;p&gt;AL2023 replaces bootstrap.sh with &lt;code&gt;nodeadm&lt;/code&gt;, a tool that uses a YAML configuration schema. Unlike the previous approach where metadata was discovered automatically through the Amazon EKS &lt;code&gt;DescribeCluster&lt;/code&gt; API call, &lt;code&gt;nodeadm&lt;/code&gt; requires explicit provision of cluster information. This fundamental change means you must now specify three critical parameters that were previously auto-discovered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;apiServerEndpoint&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;certificateAuthority&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;the service CIDR (&lt;code&gt;cidr&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, the way kubelet parameters are applied has changed. What was previously passed via &lt;code&gt;--kubelet-extra-args&lt;/code&gt; is now declared in the &lt;code&gt;kubelet&lt;/code&gt; section of the &lt;code&gt;NodeConfig&lt;/code&gt; spec. Because each node no longer calls the EKS API to discover its cluster metadata, this shift also reduces the risk of API throttling during large-scale node deployments.&lt;/p&gt;
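
&lt;p&gt;As a minimal sketch (every value below is a placeholder you'd replace with your own cluster's data), the resulting &lt;code&gt;nodeadm&lt;/code&gt; configuration looks like this:&lt;/p&gt;

```yaml
# Hypothetical minimal NodeConfig; all values are placeholders.
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://EXAMPLE1234567890.gr7.us-east-1.eks.amazonaws.com
    certificateAuthority: LS0tLS1CRUdJTi...   # base64-encoded CA bundle
    cidr: 172.20.0.0/16
  kubelet:
    flags:
      - --node-labels=team=platform   # replaces --kubelet-extra-args
```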

&lt;h2&gt;
  
  
  Impact on Existing AL2-Based Clusters
&lt;/h2&gt;

&lt;p&gt;The elimination of bootstrap.sh creates immediate backward compatibility issues. When upgrading an EKS cluster to version 1.32 while still using AL2-based node groups, &lt;a href="https://medium.com/@kalidindi-naveen/eks-ami-upgradation-journey-from-amazon-linux-2-to-amazon-linux-2023-385c1d958b27" rel="noopener noreferrer"&gt;nodes will fail to join the cluster&lt;/a&gt;. Any automation depending on bootstrap.sh will break, as files like &lt;code&gt;/etc/eks/bootstrap.sh&lt;/code&gt; and &lt;code&gt;/etc/eks/eni-max-pods.txt&lt;/code&gt; no longer exist.&lt;/p&gt;

&lt;p&gt;For organizations with self-managed node groups or custom AMI configurations, this requires extracting and explicitly providing cluster metadata that was formerly obtained automatically. Consequently, deployment scripts, Terraform modules, and CloudFormation templates must be rewritten to align with the new declarative approach before successfully migrating to EKS 1.32.&lt;/p&gt;
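
&lt;p&gt;As a sketch of what that rewrite can look like in Terraform (the cluster name and resource names are illustrative), the metadata &lt;code&gt;nodeadm&lt;/code&gt; needs can be read from a data source and rendered into the launch template's user data:&lt;/p&gt;

```hcl
# Hypothetical sketch: read the cluster metadata that nodeadm needs explicitly,
# then render it as a NodeConfig document for the launch template user data.
data "aws_eks_cluster" "this" {
  name = "my-cluster" # illustrative name
}

locals {
  node_config = yamlencode({
    apiVersion = "node.eks.aws/v1alpha1"
    kind       = "NodeConfig"
    spec = {
      cluster = {
        name                 = data.aws_eks_cluster.this.name
        apiServerEndpoint    = data.aws_eks_cluster.this.endpoint
        certificateAuthority = data.aws_eks_cluster.this.certificate_authority[0].data
        cidr                 = data.aws_eks_cluster.this.kubernetes_network_config[0].service_ipv4_cidr
      }
    }
  })
}

resource "aws_launch_template" "al2023_nodes" {
  name_prefix = "eks-al2023-"
  user_data   = base64encode(local.node_config)
}
```

&lt;p&gt;Depending on how the node group consumes the AMI, the NodeConfig may need to be wrapped in a MIME multipart user-data document; verify the expected format for your node group type before rolling it out.&lt;/p&gt;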

&lt;h2&gt;
  
  
  Preparing for the Migration to nodeadm
&lt;/h2&gt;

&lt;p&gt;Before migrating to EKS 1.32 with &lt;code&gt;nodeadm&lt;/code&gt;, careful preparation is essential to ensure a smooth transition from the traditional bootstrap approach to the new paradigm.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identifying Affected Clusters Running AL2
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-deprecation-faqs.html" rel="noopener noreferrer"&gt;After November 26, 2025&lt;/a&gt;, AWS will end support for EKS AL2-optimized AMIs. Kubernetes version 1.32 represents the final release where Amazon EKS will provide AL2 AMIs. This deadline necessitates prompt action, especially for organizations with multiple clusters.&lt;/p&gt;

&lt;p&gt;To identify affected clusters:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Check the AMI type for each node group using:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aws eks describe-nodegroup --cluster-name &amp;lt;cluster-name&amp;gt; --nodegroup-name &amp;lt;nodegroup-name&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt;Examine existing user-data scripts that reference &lt;code&gt;/etc/eks/bootstrap.sh&lt;/code&gt;, which won't exist in AL2023.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Choosing Between AL2023 and Bottlerocket AMIs
&lt;/h2&gt;

&lt;p&gt;Upon identifying clusters requiring migration, you must decide between AL2023 and Bottlerocket as your future node operating system.&lt;/p&gt;

&lt;p&gt;AL2023 offers several advantages over AL2:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Secure-by-default approach with preconfigured security policies&lt;/li&gt;
&lt;li&gt;SELinux in permissive mode&lt;/li&gt;
&lt;li&gt;IMDSv2-only mode enabled by default&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/eks-ami-deprecation-faqs.html" rel="noopener noreferrer"&gt;Optimized boot times and improved package management&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Alternatively, Bottlerocket provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Purpose-built container-optimized design with minimal attack surface&lt;/li&gt;
&lt;li&gt;Enhanced security with read-only file systems&lt;/li&gt;
&lt;li&gt;Automatic updates&lt;/li&gt;
&lt;li&gt;Improved compliance with security standards like CIS benchmarks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Choose AL2023 when you need significant customizations with direct OS-level access. Opt for Bottlerocket if you prefer a container-native approach with minimal node customization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scanning for Deprecated APIs with kubent or pluto
&lt;/h2&gt;

&lt;p&gt;Prior to upgrading, scan for deprecated APIs that might break during the transition. Kubernetes frequently removes beta APIs with each new version, potentially disrupting your workloads.&lt;/p&gt;

&lt;p&gt;The kube-no-trouble tool (kubent) efficiently identifies resources using deprecated APIs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This scans all accessible namespaces and lists APIs that will be deprecated compared to your current Kubernetes version. For clusters with hundreds of applications across multiple namespaces, this tool proves invaluable in detecting potential upgrade issues beforehand.&lt;/p&gt;

&lt;h2&gt;
  
  
  Backing Up Cluster State and Node Configurations
&lt;/h2&gt;

&lt;p&gt;A comprehensive migration plan should always include thorough backups. Given that you cannot downgrade an EKS cluster after upgrading, backups become crucial.&lt;/p&gt;

&lt;p&gt;Recommended backup steps include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Backing up Kubernetes objects and persistent volumes with tools like Velero (EKS manages etcd for you, so direct etcd snapshots aren't available)&lt;/li&gt;
&lt;li&gt;Documenting node-specific configurations, especially custom user-data scripts&lt;/li&gt;
&lt;li&gt;Preserving IAM role configurations that will need migration to the new nodeadm format&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, establish a documented rollback procedure with well-defined testing protocols before proceeding with the upgrade.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing nodeadm and AL2023
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuelrvrgw68lne8v0iqea.WEBP" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuelrvrgw68lne8v0iqea.WEBP" alt="How EKS cluster works" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Image Source: AWS Documentation&lt;/p&gt;
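
&lt;p&gt;For self-managed node groups, the user data handed to AL2023 nodes is typically a MIME multipart document whose &lt;code&gt;application/node.eks.aws&lt;/code&gt; part carries the NodeConfig. A sketch, with placeholder values throughout:&lt;/p&gt;

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUNDARY"

--BOUNDARY
Content-Type: application/node.eks.aws

---
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: my-cluster
    apiServerEndpoint: https://EXAMPLE1234567890.gr7.us-east-1.eks.amazonaws.com
    certificateAuthority: LS0tLS1CRUdJTi...
    cidr: 172.20.0.0/16

--BOUNDARY--
```

&lt;p&gt;&lt;code&gt;nodeadm&lt;/code&gt; reads this at boot and performs the initialization that &lt;code&gt;bootstrap.sh&lt;/code&gt; used to handle.&lt;/p&gt;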

&lt;h2&gt;
  
  
  Debugging and Validating the Upgrade
&lt;/h2&gt;

&lt;p&gt;After implementing &lt;code&gt;nodeadm&lt;/code&gt; and upgrading to EKS 1.32, troubleshooting becomes essential as new components may not function perfectly on the first try. The migration introduces different debugging approaches that we need to master.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common nodeadm Errors and How to Fix Them
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;nodeadm&lt;/code&gt; debug command serves as our first line of defense when troubleshooting unhealthy or misconfigured nodes. It validates critical requirements including network access to AWS APIs, credentials for the IAM role, connectivity to the EKS Kubernetes API endpoint, and authentication with the EKS cluster.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nodeadm debug -c file://nodeConfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For configuration validation before implementation, the &lt;code&gt;nodeadm config check&lt;/code&gt; command proves invaluable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nodeadm config check -c file://nodeConfig.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Most permission-related issues arise from the Hybrid Nodes IAM role &lt;a href="https://docs.aws.amazon.com/eks/latest/userguide/hybrid-nodes-troubleshooting.html" rel="noopener noreferrer"&gt;missing the necessary eks:DescribeCluster action&lt;/a&gt;. Other common errors include network connectivity problems, incorrect node IP configuration, and timeout issues which can be remedied by extending timeouts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;nodeadm install K8S_VERSION --credential-provider CREDS_PROVIDER --timeout 20m0s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Verifying kubelet and containerd Startup Logs
&lt;/h2&gt;

&lt;p&gt;Examining kubelet logs offers visibility into node initialization problems. For AL2023 nodes, we can use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;systemctl status kubelet
journalctl -u kubelet -o cat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Moreover, checking the status helps verify successful restarts after upgrades. For deeper troubleshooting, we can connect to the node using SSH or &lt;code&gt;kubectl debug&lt;/code&gt; and inspect the logs. My personal preference is &lt;a href="https://docs.aws.amazon.com/systems-manager/latest/userguide/session-manager.html" rel="noopener noreferrer"&gt;AWS Systems Manager Session Manager&lt;/a&gt;, which avoids opening SSH ports entirely. When using &lt;code&gt;kubectl debug node&lt;/code&gt;, the host filesystem is mounted at &lt;code&gt;/host&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chroot /host journalctl -u kubelet -o cat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Checking Node Readiness and Taints
&lt;/h2&gt;

&lt;p&gt;To verify node status after migration, we use standard kubectl commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get nodes -o wide
kubectl describe node NODE_NAME
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The STATUS column should display "Ready" for all nodes, with the updated version number visible. Nodes should also carry no unexpected taints; otherwise workloads won't schedule onto them. Furthermore, checking node-related events can reveal issues with node registration or initialization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Validating Add-on Compatibility Post-Upgrade
&lt;/h2&gt;

&lt;p&gt;Post-upgrade add-on validation requires checking deployment versions and ensuring pods are running correctly. For example, to verify the CoreDNS version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl describe deployment coredns -n kube-system | grep Image | cut -d ':' -f 3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To inspect add-on logs for errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl logs -n kube-system -l k8s-app=kube-dns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For networking add-ons like Amazon VPC CNI, we should create test pods to validate IP assignment. Additionally, testing CoreDNS functionality using tools like nslookup ensures proper DNS resolution. Finally, checking if the number of replicas equals the number of nodes for daemonset add-ons like vpc-cni and kube-proxy confirms proper deployment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Migrating to EKS 1.32 represents a significant paradigm shift for Kubernetes operations on AWS. The transition from &lt;code&gt;bootstrap.sh&lt;/code&gt; to &lt;code&gt;nodeadm&lt;/code&gt; fundamentally changes how worker nodes join your clusters, requiring careful planning and execution. Additionally, the impending deprecation of Amazon Linux 2 AMIs after November 26, 2025, creates urgency for organizations to adapt their infrastructure.&lt;/p&gt;

&lt;p&gt;Throughout this upgrade journey, you must remember that &lt;code&gt;nodeadm&lt;/code&gt; demands explicit configuration through YAML rather than command-line arguments. This declarative approach actually offers better consistency and reproducibility for node configurations once implemented correctly. Undoubtedly, the initial migration might seem daunting, especially when rewriting existing automation scripts or Terraform modules.&lt;/p&gt;

&lt;p&gt;The choice between AL2023 and Bottlerocket AMIs depends largely on your specific operational requirements. AL2023 provides a familiar environment with improved security features, whereas Bottlerocket offers a container-optimized approach with minimal attack surface. Regardless of your choice, both options eliminate the traditional bootstrap files and require adaptation to the new &lt;code&gt;nodeadm&lt;/code&gt; paradigm.&lt;/p&gt;

&lt;p&gt;Before initiating any upgrade, thorough preparation becomes essential. First, identify affected clusters running AL2, then scan for deprecated APIs, and finally back up your cluster state. After implementing the necessary changes, debugging tools like &lt;code&gt;nodeadm debug&lt;/code&gt; and &lt;code&gt;nodeadm config check&lt;/code&gt; help troubleshoot any issues that arise during the migration process.&lt;/p&gt;

&lt;p&gt;The EKS 1.32 upgrade, therefore, presents both challenges and opportunities. While it requires significant changes to existing workflows, it also aligns your infrastructure with AWS's future direction. Consequently, organizations that proactively adapt their node initialization processes will avoid extended support fees and benefit from improved security and performance features of newer operating systems.&lt;/p&gt;

&lt;p&gt;Ultimately, staying current with EKS versions not only saves operational costs but also ensures compatibility with the evolving Kubernetes ecosystem. Though this particular upgrade demands more effort than typical version bumps, the long-term benefits of embracing &lt;code&gt;nodeadm&lt;/code&gt; and AL2023 far outweigh the initial investment required for migration.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;p&gt;EKS 1.32 introduces the most significant architectural change in recent history, requiring organizations to abandon the traditional bootstrap.sh script in favor of nodeadm for worker node initialization.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Critical deadline approaching:&lt;/strong&gt; Amazon ends AL2 AMI support on November 26, 2025, making EKS 1.32 the final version supporting Amazon Linux 2.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Bootstrap method completely changes:&lt;/strong&gt; nodeadm replaces bootstrap.sh and requires explicit YAML configuration instead of automatic cluster metadata discovery.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Migration requires careful preparation:&lt;/strong&gt; Identify AL2-based clusters, scan for deprecated APIs with kubent, and backup cluster state before upgrading.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Choose your AMI strategy:&lt;/strong&gt; Select between AL2023 for familiar environments with enhanced security or Bottlerocket for container-optimized minimal attack surface.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Debug with new tools:&lt;/strong&gt; Use nodeadm debug and nodeadm config check commands to troubleshoot configuration issues and validate node readiness.&lt;/p&gt;

&lt;p&gt;• &lt;strong&gt;Avoid costly extended support:&lt;/strong&gt; Staying on deprecated Kubernetes versions costs 6x more ($0.60/hour vs $0.10/hour), potentially adding $500+ monthly per cluster.&lt;/p&gt;

&lt;p&gt;The shift to nodeadm represents AWS's commitment to more secure, declarative infrastructure management. While the initial migration requires significant effort in rewriting automation scripts and Terraform modules, organizations that proactively adapt will benefit from improved security, performance, and long-term cost savings.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Q1. What is the main change in EKS 1.32 that affects node initialization?&lt;/strong&gt; EKS 1.32 replaces the traditional bootstrap.sh script with a new tool called nodeadm for worker node initialization. This change requires explicit YAML configuration instead of automatic cluster metadata discovery.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q2. When will Amazon stop supporting Amazon Linux 2 (AL2) AMIs for EKS?&lt;/strong&gt; Amazon will end support for EKS AL2-optimized AMIs after November 26, 2025. Kubernetes version 1.32 is the final release where Amazon EKS will provide AL2 AMIs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q3. What are the alternatives to Amazon Linux 2 for EKS nodes?&lt;/strong&gt; The main alternatives are Amazon Linux 2023 (AL2023) and Bottlerocket AMIs. AL2023 offers improved security features and a familiar environment, while Bottlerocket provides a container-optimized design with a minimal attack surface.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q4. How can I identify and fix common nodeadm errors?&lt;/strong&gt; You can use the 'nodeadm debug' command to validate critical requirements and the 'nodeadm config check' command to validate configurations. Common issues include missing IAM permissions, network connectivity problems, and incorrect node IP configurations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Q5. What are the cost implications of staying on a deprecated version of EKS?&lt;/strong&gt; Running a deprecated version of EKS can cost six times more than the standard rate. Instead of $0.10 per hour, you could end up paying $0.60 per hour, potentially adding over $500 per month for each outdated cluster.&lt;/p&gt;

</description>
      <category>aws</category>
      <category>kubernetes</category>
      <category>terraform</category>
      <category>devops</category>
    </item>
    <item>
      <title>The $10,000 Label: How We Used Go, Clean Architecture, and AWS to Build a FinOps-Driven Cloud Tagging Engine 🏷️</title>
      <dc:creator>hugolesta</dc:creator>
      <pubDate>Mon, 29 Sep 2025 16:01:54 +0000</pubDate>
      <link>https://dev.to/hugolesta/the-10000-label-how-we-used-go-clean-architecture-and-aws-to-build-a-finops-driven-cloud-2988</link>
      <guid>https://dev.to/hugolesta/the-10000-label-how-we-used-go-clean-architecture-and-aws-to-build-a-finops-driven-cloud-2988</guid>
      <description>&lt;p&gt;&lt;strong&gt;Why Consistent Tagging is Your Company’s Most Underrated FinOps Tool:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;The Business Problem:&lt;/strong&gt; Imagine your cloud bill is a massive corporate expense report. Without proper tagging—simple key-value labels like &lt;code&gt;project: crm-migration&lt;/code&gt; or &lt;code&gt;owner: finance-team&lt;/code&gt;—you're paying thousands every month for line items labeled simply “Server.” This isn't just an accounting headache; it's a direct threat to cost control and security.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Bloat:&lt;/strong&gt; Orphaned or forgotten AWS resources (Shadow IT) continue to generate costs because no one is accountable for terminating them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Billing Disputes:&lt;/strong&gt; Finance teams struggle to attribute costs accurately, leading to friction and delayed chargebacks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security Risks:&lt;/strong&gt; Unmanaged resources often fall outside compliance or patch cycles.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We decided to solve this with &lt;strong&gt;sys-tag-manager&lt;/strong&gt;, a powerful, automated system built in Golang that acts as our centralized "Cloud Label Printer," ensuring every AWS resource is correctly accounted for, compliant, and cost-trackable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What Is a Tagging Strategy?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A tagging strategy is a &lt;strong&gt;structured approach to applying metadata (tags)&lt;/strong&gt; to cloud resources. Tags are simple key–value pairs like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;owner: finance-team
project: crm-migration
environment: production

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On their own, tags look trivial. But when applied consistently across an entire cloud estate, they form the backbone of organization, governance, and cost management.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A tagging strategy defines:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Which tags are required&lt;/strong&gt; (e.g. owner, project, environment).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How tags should be formatted&lt;/strong&gt; (naming conventions, lowercase vs camelCase, separators).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When tags should be applied&lt;/strong&gt; (at creation time vs automated correction).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Who is responsible&lt;/strong&gt; for maintaining them.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;For further guidance, consult the AWS documentation: &lt;a href="https://docs.aws.amazon.com/whitepapers/latest/tagging-best-practices/tagging-best-practices.html" rel="noopener noreferrer"&gt;Best Practices for Tagging AWS Resources&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Foundation: Go, Clean Architecture, and AWS Cost Savings&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wzxfdb1qzcfncwiokda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3wzxfdb1qzcfncwiokda.png" alt="Pretty gopher doing Thumb-up" width="800" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Go Advantage: Performance Meets FinOps:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While prototyping in Python is quick, a core, mission-critical tool demands performance and reliability. We chose Golang for &lt;strong&gt;sys-tag-manager&lt;/strong&gt; because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost-Efficient Execution:&lt;/strong&gt; Go's minimal memory footprint and extremely fast startup time are critical when running as AWS Lambda functions or Kubernetes CronJobs. This translates directly into lower AWS compute costs (less time billed for execution) compared to resource-heavier languages.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reliability:&lt;/strong&gt; Static typing and robust concurrency let the system handle a rapidly growing number of AWS API calls without failure, keeping compliance coverage at 100%.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0m618venq5pdoydk5g1g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0m618venq5pdoydk5g1g.png" alt="The clean architecture, Domain, Adapter and User Case" width="800" height="800"&gt;&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Clean Architecture: Decoupling Logic from the AWS SDK&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To ensure the system remains maintainable as our cloud estate scales, we invested in &lt;strong&gt;Clean Architecture&lt;/strong&gt;. This strategic separation is key to our long-term technical debt reduction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Domain (Core Business Logic):&lt;/strong&gt; Pure, independent rules (e.g., "A resource is compliant if it has the required tags: &lt;code&gt;owner&lt;/code&gt; and &lt;code&gt;project&lt;/code&gt;"). This is highly &lt;strong&gt;testable&lt;/strong&gt; and knows nothing about AWS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use Cases (Application Logic):&lt;/strong&gt; Defines the "what" (e.g., "Check compliance and apply tags").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adapters (AWS Implementation):&lt;/strong&gt; Isolated logic that interacts directly with specific AWS SDK services (&lt;code&gt;EC2Tagger&lt;/code&gt;, &lt;code&gt;RDSTagger&lt;/code&gt;). This prevents vendor lock-in and allows us to add new services (S3, Lambda) without touching the core business rules.&lt;/li&gt;
&lt;/ol&gt;
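
&lt;p&gt;A minimal sketch of that Domain layer, as pure Go with no AWS SDK dependency (the names &lt;code&gt;RequiredTags&lt;/code&gt;, &lt;code&gt;MissingTags&lt;/code&gt; and &lt;code&gt;IsCompliant&lt;/code&gt; are illustrative, not sys-tag-manager's actual API):&lt;/p&gt;

```go
// Hypothetical sketch of the Domain layer: a pure compliance rule that is
// trivially testable and knows nothing about AWS.
package main

import "fmt"

// RequiredTags lists the keys every resource must carry.
var RequiredTags = []string{"owner", "project"}

// MissingTags returns the required keys absent from a resource's tag set.
func MissingTags(tags map[string]string) []string {
	missing := []string{}
	for _, key := range RequiredTags {
		if _, ok := tags[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

// IsCompliant reports whether a resource satisfies the tagging rule.
func IsCompliant(tags map[string]string) bool {
	return len(MissingTags(tags)) == 0
}

func main() {
	tags := map[string]string{"owner": "finance-team"}
	fmt.Println("compliant:", IsCompliant(tags))
	fmt.Println("missing:", MissingTags(tags))
}
```

&lt;p&gt;Because the rule is a pure function, an Adapter such as &lt;code&gt;EC2Tagger&lt;/code&gt; only has to fetch tags and act on the result, which keeps the AWS SDK out of the core logic.&lt;/p&gt;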

&lt;p&gt;&lt;strong&gt;The Compliance Engine: Terraform, SSM, and Metadata Management&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dmb67cnjqdn7m87s5jz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dmb67cnjqdn7m87s5jz.png" alt="AWS Resource explorer is the discovery layer" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Discovery Layer: Leveraging AWS Resource Explorer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before &lt;strong&gt;sys-tag-manager&lt;/strong&gt; can fix untagged resources, it must efficiently find them across accounts and Regions. We achieved this by using &lt;strong&gt;AWS Resource Explorer&lt;/strong&gt; as our primary discovery and inventory layer.&lt;/p&gt;

&lt;p&gt;Instead of writing complex API calls to list every resource type across every Region, &lt;strong&gt;sys-tag-manager&lt;/strong&gt; utilizes Resource Explorer's unified search capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Workflow:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discovery:&lt;/strong&gt; &lt;strong&gt;sys-tag-manager&lt;/strong&gt; uses the Resource Explorer API to query the entire cloud estate for resources that are missing required tags (e.g., &lt;code&gt;tag:owner is absent&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validation:&lt;/strong&gt; For each untagged resource found, &lt;strong&gt;sys-tag-manager&lt;/strong&gt; checks its metadata against the centralized, correct rules stored in &lt;strong&gt;AWS SSM Parameter Store&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correction:&lt;/strong&gt; The system then applies the right tags, assigning the resource to the correct owner or project, ensuring immediate compliance.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This design significantly streamlined the core tagging loop: we are efficient not just in applying tags (Golang) but also in finding the resources that need them (Resource Explorer), saving API call costs and latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Harnessing Shared Infrastructure: The Fallback Mechanism&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;While our primary goal is to enforce resource-specific tagging, we recognized that some resources are &lt;strong&gt;"Shared Infrastructure"&lt;/strong&gt; (e.g., core networking components, centralized security groups) that don't belong to a single owner. Addressing this was a critical design challenge.&lt;/p&gt;

&lt;p&gt;Our solution was a &lt;strong&gt;smart fallback mechanism&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The tagging engine first checks for the required resource-specific tags.&lt;/li&gt;
&lt;li&gt;If the tags are missing, it then checks a predefined list of &lt;strong&gt;AWS ARNs&lt;/strong&gt; (Amazon Resource Names) that are designated as shared infrastructure.&lt;/li&gt;
&lt;li&gt;If an ARN matches, the system &lt;strong&gt;applies a generic, shared set of tags&lt;/strong&gt; (e.g., &lt;code&gt;owner: platform-team&lt;/code&gt;, &lt;code&gt;charge-code: shared-infra&lt;/code&gt;) instead of flagging it as non-compliant. This prevents false positives and ensures accurate cost attribution for common resources.&lt;/li&gt;
&lt;/ol&gt;
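&lt;p&gt;The fallback decision can be sketched as a single function. The shared ARN set and the generic tag values below are illustrative examples, not the production list.&lt;/p&gt;

```python
# Sketch of the shared-infrastructure fallback decision. The shared ARN
# set and the generic tag values are illustrative, not production data.

SHARED_ARNS = {
    "arn:aws:ec2:eu-west-1:111122223333:transit-gateway/tgw-0abc",
}
SHARED_TAGS = {"owner": "platform-team", "charge-code": "shared-infra"}
NON_COMPLIANT = "non-compliant"

def resolve_tags(arn, current_tags, required_keys):
    """Return None if compliant, the generic shared tag set if the ARN is
    designated shared infrastructure, or a non-compliant marker so the
    normal correction path can take over."""
    # 1. Resource-specific tags already present: nothing to do.
    if all(k in current_tags for k in required_keys):
        return None
    # 2. Designated shared infrastructure: apply the generic tags
    #    instead of flagging a false positive.
    if arn in SHARED_ARNS:
        return dict(SHARED_TAGS)
    # 3. Otherwise, hand off to the regular correction flow.
    return NON_COMPLIANT
```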

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7mnmtvltdw9tm2yeymp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw7mnmtvltdw9tm2yeymp.png" alt="HashiCorp and AWS hugging" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Terraform + SSM Parameter Store Synergy&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The true power of &lt;strong&gt;sys-tag-manager&lt;/strong&gt; lies in its ability to dynamically enforce tagging rules based on centralized, auditable metadata.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Centralized Rule Source:&lt;/strong&gt; We leverage &lt;strong&gt;AWS Systems Manager (SSM) Parameter Store&lt;/strong&gt; to store the required tag keys, values, and compliance rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Terraform as the Single Source of Truth:&lt;/strong&gt; The compliance rules in SSM are managed exclusively by Terraform. This means:

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Immutability:&lt;/strong&gt; Every rule change is tracked, reviewed, and deployed via a GitOps workflow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Automation:&lt;/strong&gt; When a new project is created in Terraform, the required tag values for that project are automatically pushed to SSM, immediately making those tags valid in &lt;strong&gt;sys-tag-manager&lt;/strong&gt;'s checks.&lt;/li&gt;
&lt;/ol&gt;


&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;This integration ensures that the FinOps rules are always aligned with the deployed infrastructure definitions, creating a clean, traceable metadata loop.&lt;/p&gt;
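&lt;p&gt;As a rough illustration, the rule document Terraform publishes to SSM might look like the JSON below, with &lt;strong&gt;sys-tag-manager&lt;/strong&gt; validating tag sets against it. The parameter schema and field names here are hypothetical, not our production format.&lt;/p&gt;

```python
import json

# Hypothetical shape of a compliance-rule document as Terraform might
# publish it to SSM Parameter Store. The schema and the field names are
# illustrative, not the production format.

rule_document = json.loads("""
{
  "project": "checkout",
  "required_tags": {
    "owner": ["team-a", "team-b"],
    "charge-code": ["cc-1001", "cc-1002"]
  }
}
""")

def is_compliant(tags, rules):
    """A tag set is compliant when every required key is present and its
    value is one of the allowed values for that key."""
    for key, allowed in rules["required_tags"].items():
        if tags.get(key) not in allowed:
            return False
    return True
```

&lt;p&gt;Because Terraform owns the document, adding an allowed value is a reviewed pull request rather than an out-of-band edit, which is what keeps the rules auditable.&lt;/p&gt;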




&lt;p&gt;&lt;strong&gt;Business Impact: Quantifiable Results for FinOps&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw9s6sh98t6w6is5l7z6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjw9s6sh98t6w6is5l7z6.png" alt="Descriptive pie charts" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Before sys-tag-manager&lt;/th&gt;
&lt;th&gt;After sys-tag-manager&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Value Proposition&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compliance Time&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Weeks (Manual Audits)&lt;/td&gt;
&lt;td&gt;Minutes (Automated Correction)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Faster cost allocation &amp;amp; reduced risk.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Orphaned Resources&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~12% (Estimate)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt;1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Direct savings on wasted AWS spend.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;FinOps Accuracy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High Friction/Disputes&lt;/td&gt;
&lt;td&gt;High Trust/Automated Showback&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Enables accurate, automated chargeback.&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;By investing in this system, we have fundamentally shifted from reactive tag auditing to &lt;strong&gt;proactive, automated compliance enforcement&lt;/strong&gt;. This not only saves engineering hours but directly enables our finance team to confidently utilize &lt;strong&gt;AWS Cost and Usage Reports (CUR)&lt;/strong&gt; for accurate showback and chargeback, making our entire cloud operation more accountable and &lt;strong&gt;financially efficient&lt;/strong&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Wrapping Up: Compliance as Code, Savings as the Result&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;sys-tag-manager&lt;/strong&gt; is more than just an automation script; it's the enforcement layer for our FinOps and security policies, ensuring that our cloud environment is self-healing and financially accountable.&lt;/p&gt;

&lt;p&gt;By embracing &lt;strong&gt;Golang&lt;/strong&gt; for performance, &lt;strong&gt;Clean Architecture&lt;/strong&gt; for maintainability, and the &lt;strong&gt;Terraform + SSM&lt;/strong&gt; synergy for centralized metadata management, we've transformed tagging from a manual burden into an automated, cost-saving asset. This shift has given us the confidence that every dollar spent on AWS is trackable, auditable, and directly attributed to a business owner or project.&lt;/p&gt;

&lt;p&gt;The result is a culture of &lt;strong&gt;Compliance as Code&lt;/strong&gt; where engineers can focus on feature delivery, knowing that the foundational governance—the tagging—is handled automatically and efficiently by &lt;strong&gt;sys-tag-manager&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Let's Keep the Conversation Going&lt;/strong&gt; 🗣️&lt;br&gt;
We've focused on the technical core of &lt;strong&gt;sys-tag-manager&lt;/strong&gt;, but the true organizational victory was how we scaled this system across dozens of teams without friction.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Would you be interested in learning more about how we automated the communication of compliance status and tagging fixes to developers, FinOps, and management?&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;I'd love to hear your thoughts!&lt;/strong&gt; What's the biggest tagging challenge you face in your organization? &lt;strong&gt;Share your experiences and suggestions in the comments below!&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Feel free to connect with me on LinkedIn to discuss our approach to automated communication and team onboarding:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LinkedIn:&lt;/strong&gt; &lt;a href="https://www.linkedin.com/in/hugo-lesta-5a058138/" rel="noopener noreferrer"&gt;Hugo Lesta&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/hugolesta" rel="noopener noreferrer"&gt;Hugo Lesta's GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>finops</category>
      <category>terraform</category>
      <category>go</category>
    </item>
  </channel>
</rss>
