hugolesta

Posted on Sep 29

The $10,000 Label: How We Used Go, Clean Architecture, and AWS to Build a FinOps-Driven Cloud Tagging Engine 🏷️

#aws #finops #terraform #go

Why Consistent Tagging Is Your Company’s Most Underrated FinOps Tool

The Business Problem: Imagine your cloud bill is a massive corporate expense report. Without proper tagging (simple key-value labels like project: crm-migration or owner: finance-team), you're paying thousands every month for line items labeled simply “Server.” This isn't just an accounting headache; it's a direct threat to cost control and security.
Cost Bloat: Orphaned or forgotten AWS resources, including Shadow IT that nobody tracks, keep generating costs because no one is accountable for terminating them.
Billing Disputes: Finance teams struggle to attribute costs accurately, leading to friction and delayed chargebacks.
Security Risks: Unmanaged resources often fall outside compliance or patch cycles.

We decided to solve this with sys-tag-manager, a powerful, automated system built in Golang that acts as our centralized "Cloud Label Printer," ensuring every AWS resource is correctly accounted for, compliant, and cost-trackable.

What Is a Tagging Strategy?

A tagging strategy is a structured approach to applying metadata (tags) to cloud resources. Tags are simple key–value pairs like:

owner: finance-team
project: crm-migration
environment: production

On their own, tags look trivial. But when applied consistently across an entire cloud estate, they form the backbone of organization, governance, and cost management.

A tagging strategy defines:

Which tags are required (e.g. owner, project, environment).
How tags should be formatted (naming conventions, lowercase vs camelCase, separators).
When tags should be applied (at creation time vs automated correction).
Who is responsible for maintaining them.
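
To make the four points above concrete, here is a minimal sketch of how a tagging policy could be expressed in Go. The `TagPolicy` type, the `Violations` method, and the lowercase-hyphenated naming convention are illustrative assumptions, not the actual sys-tag-manager code.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// TagPolicy is a hypothetical representation of a tagging strategy:
// which keys are required and how their values must be formatted.
type TagPolicy struct {
	RequiredKeys []string
	ValueFormat  *regexp.Regexp // naming convention applied to required values
}

// Violations returns a sorted list of human-readable problems found in
// the given tag set. An empty result means the tags satisfy the policy.
func (p TagPolicy) Violations(tags map[string]string) []string {
	var out []string
	for _, key := range p.RequiredKeys {
		value, ok := tags[key]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("missing required tag %q", key))
		case !p.ValueFormat.MatchString(value):
			out = append(out, fmt.Sprintf("tag %q has badly formatted value %q", key, value))
		}
	}
	sort.Strings(out)
	return out
}

// examplePolicy assumes lowercase, hyphen-separated values such as
// "finance-team". The convention itself is an assumption for this sketch.
func examplePolicy() TagPolicy {
	return TagPolicy{
		RequiredKeys: []string{"owner", "project", "environment"},
		ValueFormat:  regexp.MustCompile(`^[a-z0-9]+(-[a-z0-9]+)*$`),
	}
}

func main() {
	tags := map[string]string{
		"owner":   "finance-team",
		"project": "CRM_Migration", // breaks the assumed naming convention
	}
	for _, v := range examplePolicy().Violations(tags) {
		fmt.Println(v)
	}
}
```

The "when" and "who" parts of a strategy are organizational decisions, so the sketch only covers the parts that can be checked mechanically.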

For further guidance, consult the AWS documentation: Best Practices for Tagging AWS Resources

Technical Foundation: Go, Clean Architecture, and AWS Cost Savings

The Go Advantage: Performance Meets FinOps

While prototyping in Python is quick, a core, mission-critical tool demands performance and reliability. We chose Golang for sys-tag-manager because:

Cost-Efficient Execution: Go's minimal memory footprint and extremely fast startup time are critical when running as AWS Lambda functions or Kubernetes CronJobs. This translates directly into lower AWS compute costs (less time billed for execution) compared to resource-heavier languages.
Reliability: Static typing catches many mistakes at compile time, and Go's concurrency model lets the system fan out a rapidly growing number of AWS API calls efficiently, which helps us keep compliance coverage across the whole estate.
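
As a rough illustration of the concurrency point, the sketch below fans tagging calls out over goroutines while capping how many run at once, which is one common way to stay under API rate limits. The `tagAll` helper, the `fakeTag` stand-in, and the ARNs are hypothetical and only serve the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// brokenARN is a made-up ARN that fakeTag rejects, so the error path is visible.
const brokenARN = "arn:aws:ec2:us-east-1:123456789012:instance/i-broken"

// tagAll calls tagFn for every ARN with at most `workers` calls in flight
// and returns the errors keyed by ARN. An empty map means every call succeeded.
func tagAll(arns []string, workers int, tagFn func(arn string) error) map[string]error {
	if workers < 1 {
		workers = 1
	}
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		failed = make(map[string]error)
		sem    = make(chan struct{}, workers) // counting semaphore
	)
	for _, arn := range arns {
		sem <- struct{}{} // blocks while `workers` calls are already running
		wg.Add(1)
		go func(arn string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := tagFn(arn); err != nil {
				mu.Lock()
				failed[arn] = err
				mu.Unlock()
			}
		}(arn)
	}
	wg.Wait()
	return failed
}

// fakeTag is a hypothetical stand-in for a real AWS tagging call.
func fakeTag(arn string) error {
	if arn == brokenARN {
		return errors.New("simulated tagging failure")
	}
	return nil
}

func main() {
	arns := []string{
		"arn:aws:ec2:us-east-1:123456789012:instance/i-1",
		brokenARN,
		"arn:aws:ec2:us-east-1:123456789012:instance/i-2",
	}
	failed := tagAll(arns, 2, fakeTag)
	fmt.Printf("%d of %d calls failed\n", len(failed), len(arns))
}
```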

Clean Architecture: Decoupling Logic from the AWS SDK

To ensure the system remains maintainable as our cloud estate scales, we invested in Clean Architecture. This strategic separation is key to our long-term technical debt reduction:

Domain (Core Business Logic): Pure, independent rules (e.g., "A resource is compliant if it has the required tags: owner and project"). This is highly testable and knows nothing about AWS.
Use Cases (Application Logic): Defines the "what" (e.g., "Check compliance and apply tags").
Adapters (AWS Implementation): Isolated logic that interacts directly with specific AWS SDK services (EC2Tagger, RDSTagger). This confines AWS-specific code to the outer layer, limits vendor lock-in, and allows us to add new services (S3, Lambda) without touching the core business rules.
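
The three layers above can be sketched in a few lines of Go. This is a simplified illustration under stated assumptions: the `Resource`, `Tagger`, `EnforceCompliance`, and `memoryTagger` names are hypothetical, and the in-memory adapter stands in for a real one such as EC2Tagger.

```go
package main

import "fmt"

// Domain: pure rules with no AWS imports.

type Resource struct {
	ARN  string
	Tags map[string]string
}

var requiredTags = []string{"owner", "project"}

// MissingTags lists the required keys the resource does not carry.
func MissingTags(r Resource) []string {
	var missing []string
	for _, k := range requiredTags {
		if _, ok := r.Tags[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func IsCompliant(r Resource) bool { return len(MissingTags(r)) == 0 }

// Port: the only thing the use case knows about the outside world.
type Tagger interface {
	ApplyTags(arn string, tags map[string]string) error
}

// Use case: check compliance and apply tags.
type EnforceCompliance struct {
	Tagger   Tagger
	Defaults map[string]string // assumed source of correct values
}

// Run applies the missing tags and returns what was applied.
func (uc EnforceCompliance) Run(r Resource) (map[string]string, error) {
	missing := MissingTags(r)
	if len(missing) == 0 {
		return nil, nil
	}
	applied := make(map[string]string)
	for _, k := range missing {
		v, ok := uc.Defaults[k]
		if !ok {
			return nil, fmt.Errorf("no value known for tag %q on %s", k, r.ARN)
		}
		applied[k] = v
	}
	if err := uc.Tagger.ApplyTags(r.ARN, applied); err != nil {
		return nil, err
	}
	return applied, nil
}

// Adapter: an in-memory stand-in for an AWS-backed implementation.
type memoryTagger struct{ calls map[string]map[string]string }

func (m *memoryTagger) ApplyTags(arn string, tags map[string]string) error {
	m.calls[arn] = tags
	return nil
}

func main() {
	uc := EnforceCompliance{
		Tagger:   &memoryTagger{calls: map[string]map[string]string{}},
		Defaults: map[string]string{"owner": "platform-team", "project": "crm-migration"},
	}
	r := Resource{
		ARN:  "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc",
		Tags: map[string]string{"owner": "finance-team"},
	}
	applied, err := uc.Run(r)
	fmt.Println(applied, err)
}
```

Because the use case depends only on the `Tagger` interface, a new service adapter can be added without changing `MissingTags` or `Run`, and the domain rules can be unit tested without AWS credentials.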

The Compliance Engine: Terraform, SSM, and Metadata Management

The Discovery Layer: Leveraging AWS Resource Explorer

Before sys-tag-manager can fix untagged resources, it must efficiently find them across accounts and Regions. We achieved this by using AWS Resource Explorer as our primary discovery and inventory layer.

Instead of writing complex API calls to list every resource type across every Region, sys-tag-manager utilizes Resource Explorer's unified search capabilities.

The Workflow:

Discovery: sys-tag-manager uses the Resource Explorer API to query the entire cloud estate for resources that are missing required tags (e.g., tag:owner is absent).
Validation: For each untagged resource found, sys-tag-manager checks its metadata against the centralized, correct rules stored in AWS SSM Parameter Store.
Correction: The system then applies the right tags, assigning the resource to the correct owner or project, ensuring immediate compliance.
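
The three workflow steps can be sketched as a single reconcile loop. Everything here is an assumption made for illustration: the `Discoverer`, `RuleStore`, and `Tagger` interfaces, the in-memory implementations standing in for Resource Explorer and SSM Parameter Store, and the idea that rules map a project to its owner.

```go
package main

import "fmt"

type Resource struct {
	ARN  string
	Tags map[string]string
}

// Discoverer stands in for a Resource Explorer search.
type Discoverer interface {
	FindMissing(key string) ([]Resource, error)
}

// RuleStore stands in for rules loaded from SSM Parameter Store.
type RuleStore interface {
	TagsFor(r Resource) (map[string]string, bool)
}

// Tagger stands in for the AWS call that writes tags.
type Tagger interface {
	Apply(arn string, tags map[string]string) error
}

type Report struct{ Corrected, Skipped []string }

// Reconcile runs discovery, validation, and correction for one required key.
func Reconcile(d Discoverer, rules RuleStore, t Tagger, requiredKey string) (Report, error) {
	var rep Report
	found, err := d.FindMissing(requiredKey) // 1. discovery
	if err != nil {
		return rep, fmt.Errorf("discovery: %w", err)
	}
	for _, r := range found {
		tags, ok := rules.TagsFor(r) // 2. validation against central rules
		if !ok {
			rep.Skipped = append(rep.Skipped, r.ARN)
			continue
		}
		if err := t.Apply(r.ARN, tags); err != nil { // 3. correction
			return rep, fmt.Errorf("tagging %s: %w", r.ARN, err)
		}
		rep.Corrected = append(rep.Corrected, r.ARN)
	}
	return rep, nil
}

// In-memory stand-ins used only for this sketch.

type staticDiscoverer []Resource

func (s staticDiscoverer) FindMissing(key string) ([]Resource, error) {
	var out []Resource
	for _, r := range s {
		if _, ok := r.Tags[key]; !ok {
			out = append(out, r)
		}
	}
	return out, nil
}

// projectOwners maps a project name to its owner (hypothetical rule shape).
type projectOwners map[string]string

func (p projectOwners) TagsFor(r Resource) (map[string]string, bool) {
	owner, ok := p[r.Tags["project"]]
	if !ok {
		return nil, false
	}
	return map[string]string{"owner": owner}, true
}

type recordingTagger map[string]map[string]string

func (rt recordingTagger) Apply(arn string, tags map[string]string) error {
	rt[arn] = tags
	return nil
}

func main() {
	estate := staticDiscoverer{
		{ARN: "arn:a", Tags: map[string]string{"owner": "finance-team", "project": "crm-migration"}},
		{ARN: "arn:b", Tags: map[string]string{"project": "crm-migration"}},
		{ARN: "arn:c", Tags: map[string]string{}},
	}
	rep, err := Reconcile(estate, projectOwners{"crm-migration": "finance-team"}, recordingTagger{}, "owner")
	fmt.Println(rep.Corrected, rep.Skipped, err)
}
```

Resources that no rule can resolve are reported as skipped rather than guessed at, which keeps the loop from writing wrong ownership data.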

This design significantly streamlined the core tagging loop: we are efficient not only in applying tags (Golang) but also in finding the resources that need them (Resource Explorer), which cuts API call volume and latency.

Harnessing Shared Infrastructure: The Fallback Mechanism

While our primary goal is to enforce resource-specific tagging, we recognized that some resources are "Shared Infrastructure" (e.g., core networking components, centralized security groups) that don't belong to a single owner. Addressing this was a critical design challenge.

Our solution was a smart fallback mechanism:

The tagging engine first checks for the required resource-specific tags.
If the tags are missing, it then checks a predefined list of AWS ARNs (Amazon Resource Names) that are designated as shared infrastructure.
If an ARN matches, the system applies a generic, shared set of tags (e.g., owner: platform-team, charge-code: shared-infra) instead of flagging it as non-compliant. This prevents false positives and ensures accurate cost attribution for common resources.
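
The fallback order described above can be sketched as a small decision function. The `Decide` function, the `Decision` values, and the example ARNs are hypothetical; the shared tag values are the ones mentioned in the text.

```go
package main

import "fmt"

type Decision int

const (
	Compliant      Decision = iota // resource-specific tags are present
	SharedFallback                 // listed as shared infrastructure
	NonCompliant                   // neither tagged nor listed
)

var requiredKeys = []string{"owner", "project"}

// sharedTags is the generic tag set applied to shared infrastructure.
var sharedTags = map[string]string{
	"owner":       "platform-team",
	"charge-code": "shared-infra",
}

// Decide returns the outcome for one resource and, for the fallback case,
// a copy of the tags that should be applied.
func Decide(arn string, tags map[string]string, sharedARNs map[string]bool) (Decision, map[string]string) {
	// Step 1: check the resource-specific tags first.
	hasAll := true
	for _, k := range requiredKeys {
		if _, ok := tags[k]; !ok {
			hasAll = false
			break
		}
	}
	if hasAll {
		return Compliant, nil
	}
	// Step 2: fall back to the predefined shared-infrastructure list.
	if sharedARNs[arn] {
		out := make(map[string]string, len(sharedTags))
		for k, v := range sharedTags {
			out[k] = v
		}
		return SharedFallback, out
	}
	// Step 3: nothing matched, so the resource is flagged.
	return NonCompliant, nil
}

func main() {
	shared := map[string]bool{
		"arn:aws:ec2:us-east-1:123456789012:vpc/vpc-core": true,
	}
	d, tags := Decide("arn:aws:ec2:us-east-1:123456789012:vpc/vpc-core", nil, shared)
	fmt.Println(d == SharedFallback, tags)
}
```

Checking the resource-specific tags first means a shared resource that someone has tagged explicitly keeps its own tags instead of being overwritten by the generic set.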

The Terraform + SSM Parameter Store Synergy

The true power of sys-tag-manager lies in its ability to dynamically enforce tagging rules based on centralized, auditable metadata.

Centralized Rule Source: We leverage AWS Systems Manager (SSM) Parameter Store to store the required tag keys, values, and compliance rules.
Terraform as the Single Source of Truth: The compliance rules in SSM are managed exclusively by Terraform. This means:
1. Auditability: Every rule change is tracked, reviewed, and deployed via a GitOps workflow.
2. Automation: When a new project is created in Terraform, the required tag values for that project are automatically pushed to SSM, immediately making those tags valid for the tag manager checks.

This integration ensures that the FinOps rules are always aligned with the deployed infrastructure definitions, creating a clean, traceable metadata loop.
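
On the consuming side, the tag manager has to turn a parameter value into rules it can check against. The sketch below assumes the rules are stored as one JSON document; the `TagRules` schema, its field names, and the sample value are hypothetical, and the actual fetch from Parameter Store (for example via the SSM `GetParameter` API) is left out so the example stays self-contained.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TagRules is a hypothetical schema for the JSON stored in one SSM parameter.
type TagRules struct {
	RequiredKeys  []string            `json:"required_keys"`
	AllowedValues map[string][]string `json:"allowed_values"`
}

// ParseRules decodes a parameter value and rejects documents without required keys.
func ParseRules(raw string) (TagRules, error) {
	var rules TagRules
	if err := json.Unmarshal([]byte(raw), &rules); err != nil {
		return TagRules{}, fmt.Errorf("parse tag rules: %w", err)
	}
	if len(rules.RequiredKeys) == 0 {
		return TagRules{}, fmt.Errorf("parse tag rules: required_keys is empty")
	}
	return rules, nil
}

// Allows reports whether value is valid for key. Keys without an
// allowed_values entry accept any value.
func (r TagRules) Allows(key, value string) bool {
	allowed, restricted := r.AllowedValues[key]
	if !restricted {
		return true
	}
	for _, v := range allowed {
		if v == value {
			return true
		}
	}
	return false
}

// sampleParameter mimics what Terraform might push when a project is created.
const sampleParameter = `{
  "required_keys": ["owner", "project"],
  "allowed_values": {"project": ["crm-migration", "data-platform"]}
}`

func main() {
	rules, err := ParseRules(sampleParameter)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(rules.Allows("project", "crm-migration"))
	fmt.Println(rules.Allows("project", "unknown-project"))
}
```

With a shape like this, adding a project in Terraform only changes the stored document, and the tag manager picks up the new allowed value the next time it loads the parameter.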

Business Impact: Quantifiable Results for FinOps

Metric	Before sys-tag-manager	After sys-tag-manager	Value Proposition
Compliance Time	Weeks (Manual Audits)	Minutes (Automated Correction)	Faster cost allocation & reduced risk.
Orphaned Resources	~12% (Estimate)	<1%	Direct savings on wasted AWS spend.
FinOps Accuracy	High Friction/Disputes	High Trust/Automated Showback	Enables accurate, automated chargeback.

By investing in this system, we have fundamentally shifted from reactive tag auditing to proactive, automated compliance enforcement. This not only saves engineering hours but directly enables our finance team to confidently utilize AWS Cost and Usage Reports (CUR) for accurate showback and chargeback, making our entire cloud operation more accountable and financially efficient.

Wrapping Up: Compliance as Code, Savings as the Result

sys-tag-manager is more than just an automation script; it's the enforcement layer for our FinOps and security policies, ensuring that our cloud environment is self-healing and financially accountable.

By embracing Golang for performance, Clean Architecture for maintainability, and the Terraform + SSM synergy for centralized metadata management, we've transformed tagging from a manual burden into an automated, cost-saving asset. This shift has given us the confidence that every dollar spent on AWS is trackable, auditable, and directly attributed to a business owner or project.

The result is a culture of Compliance as Code where engineers can focus on feature delivery, knowing that the foundational governance—the tagging—is handled automatically and efficiently by sys-tag-manager.

Let's Keep the Conversation Going 🗣️

We've focused on the technical core of sys-tag-manager, but the true organizational victory was how we scaled this system across dozens of teams without friction.

Would you be interested in learning more about how we automated the communication of compliance status and tagging fixes to developers, FinOps, and management?

I'd love to hear your thoughts! What's the biggest tagging challenge you face in your organization? Share your experiences and suggestions in the comments below!

Feel free to connect with me on LinkedIn to discuss our approach to automated communication and team onboarding:

LinkedIn: Hugo Lesta

GitHub: Hugo Lesta's GitHub