Road To Compliance: Will Your Internal Users Hate Your Platform Team?

Picture this: it's Thursday, 10:20 AM. You're the Engineering Manager of the Platform Team, and you've just called a "DevOps & Compliance Meeting." You announce new mandatory guidelines: they must be implemented by next week, deployments will break without them, and no, this wasn't discussed with the Product Managers. The room fills with frustrated voices: "Will this break our deployments?" "We already planned our sprint!" "Was this discussed with our PMs?!"

[Image: compliance weekly meeting - cloud police]

Sound familiar? This scenario plays out in engineering organizations everywhere (where security and platform teams often play the "cloud police"). And it's exactly where we found ourselves.

When I joined my current company to rebuild and grow the platform team, we were a cloud-native organization that had outgrown its single-account AWS setup. With 15 teams and ~80 engineers, we faced classic scaling challenges:

Operational Overhead: When you scale your organization from a handful of developers working on a monolith to multiple autonomous teams, operational and organizational overhead multiplies—monitoring, roadmap alignment, interfaces and contracts, backward/forward compatibility across services and APIs, security, and deployment pipelines all become more complex.

Compliance Chaos: How do you monitor compliance, enforce best practices, and attribute ownership and costs across a growing infrastructure?

Developer Friction: Every compliance initiative feels like extra work, new constraints, and broken workflows.

The previous platform team had burned out trying to build too much with too few resources. They were ambitious, dreaming of an Internal Developer Platform, a Service Catalog, a Cloud Center of Excellence. But fear of vendor lock-in led to an over-reliance on self-hosted solutions. (I recommend this great post by Gregor Hohpe about lock-in.)

From the foundational choice of running our workloads on Kubernetes (at least it was AWS EKS) to basically everything else (image registries, access management, security scanning), a plethora of self-hosted tools (Keycloak, Rancher, Longhorn, Harbor) had to be learned, configured, hosted, and maintained on our Kubernetes cluster. All of this burdened the small platform team, which eventually collapsed under the weight of its own ambitions.

I already have strong opinions about Kubernetes; that's the topic of another talk I co-hosted at several conferences last year: Serverless vs Kubernetes: the final showdown.

Building the Foundation

We needed a fundamentally different approach. Our first step was establishing a proper multi-account landscape using AWS Organizations. This wasn't just about separation—it was about creating the foundation for everything that followed:

  • 🗄 Isolation: Data, infrastructure, and access boundaries
  • 🔒 Security Controls: Tailored policies for different applications
  • 📊 Quota Allocation: Individual quotas for projects
  • 💰 Cost Allocation: Separate billing and budgets by domain/team

To reduce operational overhead, we replaced self-hosted tools with AWS native services. This facilitated security and compliance while dramatically reducing the burden on our team.

[Image: self-hosted vs AWS native services]

The Human Challenge

Despite the obvious benefits to us and the security team, developers saw only:

  • Extra work
  • New setups
  • More constraints
  • Broken routines
  • Forced learning of new skills

Although frustration was building on our side as well (why did everything seem so difficult for the developers, despite our goodwill and efforts in supporting them?), we couldn't ignore the fact that all these changes were genuinely difficult. We were the ones responsible for "engineering enablement", so we couldn't just tell people: "We made this decision, here's the documentation, make those changes, live with it", especially when our documentation was overly detailed, extremely boring, and, worse, often outdated.

We knew how to solve the technical challenges, but how were we supposed to support our initiative without solving the cultural ones?

Tagging: A Deceptively Simple Problem

We decided to revamp an old initiative that hadn't been particularly successful: tagging AWS resources. Per se, this is a trivial task: just add some key-value pairs to your resources. But tagging is crucial for several reasons:

  • Ownership Attribution: Who owns this resource? Who should we contact when issues arise?
  • Cost Allocation: Which team or project should be billed for this resource?
  • Access Management: IAM and resource policies can be scoped by owner tags
  • Incident Response: Quickly identify affected teams and applications
  • Compliance Reporting: Track resources by environment, sensitivity, and regulatory requirements

Early on, we made the mistake of treating tagging as "just metadata," assuming teams would adopt it once the rules were defined. That didn't work. Tagging only becomes effective when clearly connected to outcomes developers actually care about, like cost visibility, ownership, and faster incident resolution.

We started small by defining a minimal, opinionated set of required tags and making them easy to apply through infrastructure-as-code templates and platform defaults. Instead of enforcing everything upfront, we focused on visibility first: showing teams where tags were missing, what that meant in terms of cost and accountability, and how to fix it with minimal effort.
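To make "easy to apply through infrastructure-as-code templates and platform defaults" concrete, here is a minimal sketch of how required tags can be applied once at the stack level with the AWS CDK in Python, so every resource inherits them. The stack name, tag keys, and values are illustrative examples, not our actual schema:

```python
from aws_cdk import App, Stack, Tags
from aws_cdk import aws_s3 as s3
from constructs import Construct


class PaymentsServiceStack(Stack):
    """Example product-team stack; the name is purely illustrative."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Resources created here inherit the tags applied below.
        s3.Bucket(self, "InvoiceArchive")


app = App()
stack = PaymentsServiceStack(app, "payments-service")

# Platform defaults: apply the mandatory tags once at the stack level
# so individual resources don't need to repeat them.
Tags.of(stack).add("team", "payments")
Tags.of(stack).add("environment", "production")
Tags.of(stack).add("cost-center", "cc-1234")

app.synth()
```

Baking the defaults into shared stack templates like this means teams get compliant tags without thinking about them, which is exactly the low-friction starting point we were after.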

Over time, we gradually introduced stronger guardrails using Tag Policies and Service Control Policies, but only after teams understood the value and had the right tooling in place. The result was much better adoption, fewer exceptions, and a shared understanding that tagging wasn't "platform bureaucracy," but an enabler for governance and FinOps.

Let me dive into the technical details that made this possible. Understanding the differences between these AWS services is crucial for implementing effective compliance.

Tag Policies vs. Service Control Policies: Understanding the Difference

AWS Organizations Tag Policies: The Standardization Layer

Tag Policies help you standardize tags across your organization by defining allowed keys and values:

[Image: Tag Policies]

What Tag Policies do:

  • Define standardized tag keys across the organization
  • Specify allowed values for each tag
  • Validate tag values during resource creation
  • Provide compliance reporting on tag usage

Critical limitation:

  • They validate values but don't prevent resource creation without tags
  • Resources can still be created with missing tags
  • They're about standardization, not enforcement
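To make this concrete, here is a minimal sketch of what such a Tag Policy could look like and how it could be created and attached with boto3. The policy content, names, and OU ID are illustrative placeholders, not our actual setup:

```python
import json
import boto3

# Hypothetical Tag Policy: standardize the "Environment" key and its allowed
# values. "enforced_for" makes non-compliant *values* fail for the listed
# resource types; it does NOT require the tag to be present at all.
TAG_POLICY = {
    "tags": {
        "Environment": {
            "tag_key": {"@@assign": "Environment"},
            "tag_value": {"@@assign": ["production", "staging", "development"]},
            "enforced_for": {"@@assign": ["ec2:instance"]},
        }
    }
}

organizations = boto3.client("organizations")

# Create the policy in the management account, then attach it to an OU or
# account (the target ID below is a placeholder).
policy = organizations.create_policy(
    Name="standard-environment-tag",
    Description="Allowed values for the Environment tag",
    Type="TAG_POLICY",
    Content=json.dumps(TAG_POLICY),
)
organizations.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-xxxxxxxx",  # placeholder OU ID
)
```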

Service Control Policies (SCPs): The Prevention Layer

SCPs provide hard enforcement by preventing actions that don't meet your requirements:

[Image: Service Control Policies]

What SCPs do:

  • Prevent resource creation without required tags
  • Apply at account-level or OU-level
  • Cannot be overridden by any user or role in the account
  • Provide true enforcement of compliance policies

Critical challenge:

  • Can break deployments if not carefully implemented
  • Requires thorough testing before activation
  • Should only be enabled after teams are informed and prepared
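As an illustration, a tag-enforcing SCP typically looks like the sketch below: a Deny statement that blocks instance creation whenever the request carries no Environment tag. This is the common published pattern, simplified; the names and the OU ID are placeholders rather than our production policy:

```python
import json
import boto3

# Hypothetical SCP: deny launching EC2 instances unless the request includes
# an "Environment" tag. The Null condition evaluates to true when the tag key
# is absent from the request.
SCP_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyRunInstancesWithoutEnvironmentTag",
            "Effect": "Deny",
            "Action": "ec2:RunInstances",
            "Resource": "arn:aws:ec2:*:*:instance/*",
            "Condition": {"Null": {"aws:RequestTag/Environment": "true"}},
        }
    ],
}

organizations = boto3.client("organizations")
policy = organizations.create_policy(
    Name="require-environment-tag-on-ec2",
    Description="Block EC2 instance creation without an Environment tag",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(SCP_DOCUMENT),
)
# Attach only to a production OU at first (placeholder ID), in line with the
# phased rollout described later in this post.
organizations.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-prod-xxxxxxxx",  # placeholder OU ID
)
```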

The Key Difference: Validation vs. Prevention

Here's the crucial distinction:

Tag Policies say: "If you're going to use the 'Environment' tag, it must be one of these values: production, staging, development (not 'prod', not 'live', not 'test')."

Service Control Policies say: "You cannot create an EC2 instance unless it has an 'Environment' tag."

This was all well and good on paper. But if we had simply rolled out those policies, informing teams a couple of sprints ahead and warning them that deployments would break without proper tags, we knew chaos would break loose. Teams were already stretched thin with product deadlines and feature commitments. They had little time, and even less interest, in making changes that didn't directly contribute to their goals. The tagging requirement would inevitably be forgotten, postponed, and only remembered when deployments failed at the worst possible moment.

Avoiding this scenario was the main cultural and collaborative change we wanted to introduce (compared to the previous team's approach).

Inform Before You Enforce

So instead of jumping straight to enforcement, we needed a different approach: detect untagged resources first, then inform teams about what needed to change and why.

The good news? AWS provides separate tools for informing and enforcing compliance. We could use Security Hub Resource Tagging Standard to detect issues and give teams visibility, then only later enforce the rules through policies once everyone was on board.

AWS Security Hub Resource Tagging Standard: The Detection Layer

The AWS Security Hub Resource Tagging Standard is a foundational security standard that became our compliance visibility engine. Here's what it actually does:

Core Capabilities:

  • Automatically discovers and evaluates resources across all AWS accounts
  • Checks for the presence of required tags on resources
  • Generates findings for non-compliant resources
  • Provides compliance scoring and trending over time

What it doesn't do:

  • It doesn't prevent resource creation
  • It doesn't validate tag values (only presence)
  • It doesn't enforce anything—it only detects and reports

[Image: Security Hub Tagging Standard]

This was perfect for our "inform before enforce" strategy. We could see the compliance landscape without breaking anyone's deployments.

Security Hub Tagging Standard says: "Here are all the resources that don't have the required tags."
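For reference, the sketch below shows one way to pull those tagging findings programmatically with boto3. The GeneratorId prefix used to narrow the results to the Resource Tagging Standard is an assumption about how the findings are labeled; adjust it to whatever your findings actually report:

```python
import boto3

securityhub = boto3.client("securityhub")

# Pull active, failed findings, narrowed to the tagging standard (assumed
# GeneratorId prefix), across the accounts aggregated into Security Hub.
paginator = securityhub.get_paginator("get_findings")
pages = paginator.paginate(
    Filters={
        "GeneratorId": [
            {"Value": "aws-resource-tagging-standard", "Comparison": "PREFIX"}
        ],
        "ComplianceStatus": [{"Value": "FAILED", "Comparison": "EQUALS"}],
        "RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
    }
)

untagged = []
for page in pages:
    for finding in page["Findings"]:
        for resource in finding["Resources"]:
            untagged.append(
                {
                    "resource_id": resource["Id"],
                    "resource_type": resource["Type"],
                    "account": finding["AwsAccountId"],
                    "title": finding["Title"],
                }
            )

print(f"{len(untagged)} resources are missing required tags")
```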

Understanding the differences between the Tagging Standard, Tag Policies, and SCPs is essential for implementing a phased compliance approach.

Our Three-Layer Governance Strategy

We implemented a comprehensive tagging governance strategy using these three complementary services:

Layer 1: Tag Policies (Standardization)

  • Purpose: Prevent tag chaos and inconsistency
  • Define allowed tag keys and values
  • Ensure consistency across the organization
  • Validate values when tags are present

Layer 2: Security Hub + Config (Detection & Reporting)

  • Purpose: Provide visibility and inform teams
  • Continuously monitor compliance
  • Generate findings for non-compliant resources
  • Track compliance trends over time

Layer 3: Service Control Policies (Prevention)

  • Purpose: Ensure compliance for critical resources
  • Prevent resource creation without required tags
  • Hard enforcement at the account/OU level

We activated these layers in sequence, not all at once.

The Soft Enforcement Process: Detection Without Disruption

Following the principle "don't block deployments; detect and inform instead", we ended up not just using Tag Policies, SCPs, and Security Hub out of the box, but building a custom serverless solution that leveraged these services along with AWS Config, Lambda, EventBridge, SQS, and Chatbot to create two complementary enforcement mechanisms.

Solution 1: Real-Time Detection and Notification

Instead of blocking the deployment of untagged resources with SCPs, we allow them to be created but detect them immediately:

How it works:

  1. AWS Config monitors all resource creation events in real-time
  2. EventBridge filters for "ConfigurationItemChanged" events with "ResourceDiscovered" status, checking for missing mandatory tags
  3. Lambda processes each finding to identify the owner by correlating:
    • CloudTrail events
    • CreatedBy tags
    • Resource naming patterns
    • CI/CD infrastructure metadata (GitHub Actions runs, etc.)
  4. SQS buffers the findings to aggregate multiple resources from the same deployment (avoiding Slack spam)
  5. AWS Chatbot sends a friendly but firm Slack notification to the responsible team: "Hey, we saw what you just deployed! It's missing tags, but we're letting it through for now. Here's how to fix it..."

[Image: there are rules]

The processing interval is carefully tuned — fast enough to provide timely feedback, but slow enough to batch related resources from the same CI/CD run.
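To make the flow above more tangible, here is a trimmed-down sketch of what the detection Lambda could look like: it receives the Config event from EventBridge, checks the mandatory tags, and queues a finding for later notification. The tag keys, the queue URL environment variable, and the stubbed owner-attribution logic are illustrative assumptions, not our production code:

```python
import json
import os

import boto3

sqs = boto3.client("sqs")

# Mandatory tag keys; in practice these come from configuration.
REQUIRED_TAGS = {"team", "environment", "cost-center"}
QUEUE_URL = os.environ["FINDINGS_QUEUE_URL"]  # illustrative env var


def handler(event, context):
    """Triggered by EventBridge for Config 'ResourceDiscovered' items."""
    item = event["detail"]["configurationItem"]
    tags = item.get("tags") or {}
    missing = sorted(REQUIRED_TAGS - set(tags))
    if not missing:
        return {"compliant": True}

    # Owner attribution is where the real work happens (CloudTrail lookups,
    # CreatedBy tags, naming patterns, CI/CD metadata); stubbed here.
    owner = tags.get("team", "unknown")

    finding = {
        "resource_id": item["resourceId"],
        "resource_type": item["resourceType"],
        "account": item["awsAccountId"],
        "missing_tags": missing,
        "owner": owner,
    }
    # SQS buffers findings so related resources from one deployment can be
    # batched into a single Slack notification later.
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(finding))
    return {"compliant": False, "missing_tags": missing}
```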

Solution 2: Weekly Progress Reporting

For existing infrastructure and tracking overall progress, we built a separate reporting pipeline:

How it works:

  1. EventBridge triggers a Lambda on a weekly schedule
  2. Lambda fetches all Security Hub findings related to tagging compliance
  3. Noise filtering removes:
    • AWS-managed resources
    • Temporary resources (spot instances, auto-scaling test instances)
    • Resources in exempted accounts (sandbox, development)
  4. Aggregation groups findings by team and calculates compliance metrics
  5. SQS buffers the aggregated reports
  6. Report generator creates:
    • Jira tickets for teams to plan into their sprints (well... actually, we're not there yet!)
    • Slack messages celebrating progress and thanking collaborative teams
    • Gentle pokes for teams falling behind

Decoupling real-time monitoring from weekly reports gives teams a grace period. They can quickly add tags after deployment but before the weekly report runs.

This is soft enforcement — it doesn't slow down development, but it makes non-compliance visible and trackable. Teams can work "eventually compliant" by planning the automatically generated tickets into their regular sprints.
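Below is a simplified sketch of the weekly aggregation step, assuming findings have already been fetched and noise-filtered as described above. The team attribution via an "owner" field and the compliance-score formula are illustrative choices, not necessarily what we shipped:

```python
from collections import defaultdict


def build_weekly_report(findings, resource_counts_by_team):
    """Group tagging findings by team and compute a simple compliance score.

    `findings` is a list of dicts like the ones produced by the detection
    pipeline; `resource_counts_by_team` maps team name to its total number
    of monitored resources.
    """
    violations = defaultdict(list)
    for finding in findings:
        violations[finding.get("owner", "unknown")].append(finding)

    report = {}
    for team, total in resource_counts_by_team.items():
        failed = len(violations.get(team, []))
        score = 100.0 if total == 0 else 100.0 * (total - failed) / total
        report[team] = {
            "non_compliant_resources": failed,
            "compliance_score": round(score, 1),
        }
    return report


# Example: feed the report into Slack messages or Jira tickets downstream.
if __name__ == "__main__":
    sample = [{"owner": "payments", "resource_id": "i-0abc"}]
    print(build_weekly_report(sample, {"payments": 40, "search": 25}))
```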

Lessons Learned

Why Previous Efforts Failed

  • Loss of focus on either goals or people
  • Over-engineering compliance controls without considering team experience
  • Not listening enough and imposing too much

What Works Instead

  • Focus on urgent and high-impact items first
  • Define clear purpose and iterate based on feedback
  • Combine automation, communication, and empathy
  • Mix soft and hard enforcement strategically
  • Make compliance visible and measurable without breaking workflows

The Phased Rollout That Actually Worked

Phase 1: Monitor Only (Security Hub)

  • Collect baseline compliance data
  • Identify patterns and common violations
  • Build team awareness through reporting
  • Start with a minimal set of required tags (3-4 tags max)
  • Gradually iterate and expand Tag Policies based on feedback

Phase 2: Soft Enforcement (Notifications)

  • Deploy real-time Slack notifications for new violations
  • Launch weekly progress reports and Jira ticket creation
  • Refine Tag Policies to add more allowed values as teams requested them
  • Iterate on tag requirements based on actual usage patterns
  • Continue to adjust and improve the tag schema

Phase 3: Selective Hard Enforcement (SCPs)

  • Activate SCPs only for critical resources in production accounts
  • Keep development/sandbox accounts with soft enforcement only
  • Gradual expansion based on compliance rates
  • Final iteration on Tag Policies to lock in the standardized schema

Key Insights from Our Journey

Our tagging compliance initiative taught us valuable lessons that apply to any internal compliance effort—not just tagging. Here's what we discovered:

Our approach has shifted compliance from a source of friction toward a more collaborative effort:

  • Proactive Communication: Teams now receive actionable, timely feedback when resources are created without proper tags
  • Visible Progress: Weekly reports show improvement over time, celebrating wins and identifying areas needing attention
  • Reduced Resistance: Informing before enforcing and empowering team champions has helped build trust
  • Ongoing Journey: Compliance is an ongoing process, not a one-time project—we're still iterating and improving

The technical implementation was the easy part. The hard part? The human factors. Here's what our experience highlighted:

1. Empower Team Champions

Instead of imposing rules from the top, empower champions within product teams to carry ownership of compliance topics. This scales your influence and reduces resistance.
(We're still learning how to identify and support these champions effectively. It's not a one-and-done activity—it requires ongoing investment and relationship building.)

2. Communication and Trust

Share your goals openly and adjust your approach as priorities shift. Make your communication style collaborative, not prescriptive. This means regular check-ins, transparent roadmaps, and being willing to say "we got this wrong" when feedback reveals issues.

3. Quick Wins and Momentum

Start with quick wins, build momentum, and iterate with patience and empathy. As Robin Sharma says:

"Change is hard at first, messy in the middle, and gorgeous at the end."

(We're still in the "messy middle" for many initiatives. Progress isn't linear, and that's okay.)

4. Tour of Duty Model

Involve representatives from engineering teams early through temporary rotations. Engineers join the platform team for a fixed period (or vice-versa), providing direct feedback while spreading platform knowledge back to their home teams.
This creates empathy in both directions — platform engineers understand product pressures, and product engineers understand platform constraints.

A Framework You Can Use

Our biggest realisation was that the human factors mattered far more than the technical implementation. The AWS services worked as expected (within a matter of days of development). The hard part was, and continues to be, building trust, communicating effectively, and maintaining empathy when teams push back.

Our Minimum Viable Governance framework can be applied to any internal compliance initiative—security policies, cost controls, access management, or anything else:

  1. Define clear purpose and communicate it openly
  2. Assess urgency vs. impact to prioritize efforts
  3. Inform before you enforce using automated systems
  4. Measure and communicate progress regularly
  5. Iterate based on feedback from your internal users
  6. Empower team champions to drive adoption

This isn't just about tagging. It's about how platform teams can drive any compliance initiative without becoming "the cloud police."

Conclusion: Compliance as a Collaborative Journey

The question isn't whether your internal users will hate your Platform Team — it's whether you'll give them reasons to.
The real work (for Platform Teams, DevEx teams, or even Security teams) is understanding that compliance is fundamentally a people problem, not (only) a technology problem.

Your internal users are your customers too. Treat them with the same care you'd give external customers, and they'll become your biggest advocates rather than your biggest obstacles.


About This Post

This blog post is a written adaptation of my conference talk "Road to Compliance: Will Your Internal Users Hate Your Platform Team?" which I presented at several conferences throughout 2024 and 2025.

I wanted to document both our compliance journey and the content of the talk in written form, making it easier to reference and share with others facing similar challenges. While the live presentations included interactive discussions and Q&A sessions that brought additional insights, this post captures the core narrative, technical implementation details, and lessons learned from our experience.

If you prefer video format, you can watch recordings of the talk here:

Feel free to reach out if you'd like to discuss platform engineering, compliance strategies, or building better developer experiences!


Resources and Further Reading
