Prasad Rane

Starting Enterprise-wide Kafka Governance

Technical leadership isn't always about having the right title; it’s about having the right influence. When you are operating without direct authority, setting standards across autonomous teams requires a shift from "enforcing mandates" to "solving shared pain."

The Mess: Visualizing the Chaos

It began on a Tuesday in a conference room, facing a whiteboard filled with chaotic arrows representing event flows. The AWS MSK dashboard displayed a patchwork of inconsistent topic names.

loan-processing.loancreated
LoanService.Events.Created
mortgage.v2.loan.originated

These topics often performed the same functions but were impossible to correlate during system failures. Same intent. Same cluster. Completely different conventions.

When something broke, tracing ownership or flow was guesswork.

But naming inconsistencies were just the symptom.

  • IAM policies ranged from overly permissive to painfully restrictive
  • Schema evolution had no guardrails
  • Ownership was unclear
  • Every team used Kafka differently

The real issue was simple:

Everyone depended on Kafka. Nobody owned its governance.

That’s how technical debt quietly compounds until it becomes operational risk.

The Realization: Governance is a Missing System, Not a Missing Rule

As I sat down, one thought became clear:

We’re building technical debt faster than we can document it.

Without intervention, cross-service debugging would become unsustainable within a year. But there was a constraint:

  • No central authority
  • No mandate power
  • Fully autonomous teams

This wasn’t a tooling problem. It was a coordination problem.

Meeting Invite: Start with Pain, Not Policy

Instead of proposing a solution, I invited collaboration by framing it as a way to fix shared frustrations.

That framing mattered. No one wants another rulebook. But everyone wants their pain heard.

Five engineers showed up:

  • 2 from platform engineering
  • 3 from application teams

That was enough to start.

The Meeting: Shifting from Skepticism to Engagement

In the meeting, I led with pain points rather than solutions, showing real examples of redundant and confusing topics on the MSK dashboard.
Then someone said:

“We got paged last week because a producer couldn’t write to a topic. Nobody knew how to fix it.”

That changed the room. We moved from:

“Why are we here?”

to

“We need to fix this.”

That shift from abstract discussion to shared pain is where influence begins.

Drafting Standards: The Technical Blueprint

Over the next two weeks, I drafted the first version of a governance framework. It focused on three pillars:

1. Topic Naming Convention

Convention: {domain}.{service}.{event-type}.{version}
Example: loan.loanservice.loan-created.v1

Goal: Make topic names discoverable, searchable, self-descriptive, and future-proof through versioning.
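
To make the convention concrete, here is a minimal Python sketch of a parser for it. The character rules (lowercase letters, digits, and hyphens within a segment, and a `v<number>` suffix) are my assumptions inferred from the single example above; the names `TopicName` and `parse_topic` are hypothetical and not part of the original tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed segment rule: starts with a lowercase letter, may contain
# digits and single hyphens between alphanumeric runs.
_SEGMENT = r"[a-z][a-z0-9]*(?:-[a-z0-9]+)*"

# {domain}.{service}.{event-type}.{version}
TOPIC_PATTERN = re.compile(
    rf"(?P<domain>{_SEGMENT})\.(?P<service>{_SEGMENT})\."
    rf"(?P<event_type>{_SEGMENT})\.v(?P<version>[1-9][0-9]*)"
)


class TopicName(NamedTuple):
    domain: str
    service: str
    event_type: str
    version: int


def parse_topic(name: str) -> Optional[TopicName]:
    """Return the parsed components, or None if the name breaks the convention."""
    match = TOPIC_PATTERN.fullmatch(name)
    if match is None:
        return None
    return TopicName(
        domain=match["domain"],
        service=match["service"],
        event_type=match["event_type"],
        version=int(match["version"]),
    )
```

Under these assumed rules, `loan.loanservice.loan-created.v1` parses cleanly, while all three legacy names from the dashboard are rejected: one has too few segments, one uses uppercase, and one puts the version in the wrong position.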

2. IAM Policies (Least Privilege)

We created mandatory least-privilege policy templates tied to service roles, eliminating wildcard permissions.

Goal: Make the secure path the default path.
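
As an illustration of what such a template might generate, the sketch below builds a producer-only policy document in Python. It assumes the cluster uses MSK IAM access control; the `kafka-cluster:*` action names and ARN layouts reflect my understanding of that feature and should be verified against the AWS documentation. The function name and all parameter values are hypothetical, and the real templates may differ.

```python
from typing import Any, Dict, List


def producer_policy(
    region: str,
    account_id: str,
    cluster_name: str,
    cluster_uuid: str,
    topics: List[str],
) -> Dict[str, Any]:
    """Build an IAM policy that lets a service write only to the topics it owns."""
    for topic in topics:
        # The point of the template: no wildcards, ever.
        if "*" in topic:
            raise ValueError(f"wildcard not allowed in topic name: {topic!r}")

    prefix = f"arn:aws:kafka:{region}:{account_id}"
    cluster_arn = f"{prefix}:cluster/{cluster_name}/{cluster_uuid}"
    topic_arns = [
        f"{prefix}:topic/{cluster_name}/{cluster_uuid}/{topic}" for topic in topics
    ]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ConnectToCluster",
                "Effect": "Allow",
                "Action": ["kafka-cluster:Connect"],
                "Resource": [cluster_arn],
            },
            {
                "Sid": "WriteToOwnedTopics",
                "Effect": "Allow",
                "Action": ["kafka-cluster:DescribeTopic", "kafka-cluster:WriteData"],
                "Resource": topic_arns,
            },
        ],
    }
```

Generating the document from an explicit topic list, rather than hand-editing JSON, is one way to keep the secure option the easiest one: a team asks for the topics it needs and never writes a resource string by hand.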

3. Schema Evolution (Avro + Compatibility)

We enforced Avro schemas with BACKWARD compatibility, registered in the schema registry and checked before deployment.

Goal: Prevent changes from silently breaking downstream consumers.
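
To show the kind of rule a BACKWARD check applies, here is a deliberately simplified Python sketch for flat Avro record schemas. It is not the schema registry's algorithm: it only covers the two most common cases (a new field without a default, and a changed field type), and it conservatively flags every type change even though Avro permits some promotions. The function name is hypothetical.

```python
from typing import Any, Dict, List


def backward_violations(
    old_schema: Dict[str, Any], new_schema: Dict[str, Any]
) -> List[str]:
    """List reasons the new schema could fail to read data written with the old one."""
    old_fields = {field["name"]: field for field in old_schema["fields"]}
    problems: List[str] = []

    for field in new_schema["fields"]:
        name = field["name"]
        if name not in old_fields:
            # Old records lack this field, so the reader needs a default.
            if "default" not in field:
                problems.append(f"new field '{name}' has no default")
        elif field["type"] != old_fields[name]["type"]:
            # Simplification: real Avro allows promotions such as int -> long.
            problems.append(f"field '{name}' changed type")

    # Removing a field is fine under BACKWARD: the new reader simply ignores it.
    return problems
```

For example, adding `{"name": "amount", "type": "double"}` to a record is reported as a violation, while adding the same field with `"default": 0.0` passes.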

Automation: If It’s Not Enforced, It’s Optional

Documentation alone doesn’t scale. So we embedded the standards into tooling.

  • Terraform Modules: Topic provisioning templates now included regex validation for naming and pre-configured IAM defaults.
  • CI/CD Linters: We built a linter to flag violations in service configuration files before deployment.
      Example failure: FAILED: topic name 'test_topic' does not match pattern
      Example success: PASS: topic 'loan.loanservice.loan-created.v1'
  • Gate: Automation turned guidance into a "gate," ensuring that the right way was also the only way to deploy.

That’s when governance becomes real:

Not a suggestion. A system constraint.
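
A CI linter of the kind described above could look roughly like the following Python sketch. The config shape (a JSON file with a top-level `topics` list), the function names, and the exact regex are assumptions for illustration; only the PASS/FAILED message formats come from the examples above.

```python
import json
import re
from typing import Any, Dict, List, Tuple

# Assumed pattern for {domain}.{service}.{event-type}.{version}
TOPIC_RE = re.compile(
    r"[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*\.v[1-9][0-9]*"
)


def lint_topics(config: Dict[str, Any]) -> Tuple[List[str], int]:
    """Check every topic in a service config; return report lines and an exit code."""
    lines: List[str] = []
    failures = 0
    for topic in config.get("topics", []):
        if TOPIC_RE.fullmatch(topic):
            lines.append(f"PASS: topic '{topic}'")
        else:
            lines.append(f"FAILED: topic name '{topic}' does not match pattern")
            failures += 1
    # A non-zero exit code is what lets the CI pipeline block the deploy.
    return lines, (1 if failures else 0)


def main(path: str) -> int:
    """Lint the JSON config at `path` (hypothetical layout) and print the report."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    lines, exit_code = lint_topics(config)
    print("\n".join(lines))
    return exit_code
```

In a pipeline step, the return value of `main` would be passed to `sys.exit`, so a single failing topic name stops the deployment.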

The Tipping Point: Adoption Without Intervention

Three months later, something subtle happened. I was reviewing a PR from a team I’d never worked with.

Everything passed:

  • Naming ✔
  • IAM ✔
  • Schema ✔

No feedback needed. That’s when I knew:

The system didn’t need me anymore.

And that’s the goal of good governance.

Key Takeaway

This experience reshaped how I think about technical leadership:

Standards don’t succeed because they’re correct.
They succeed because people believe in them.

What Actually Worked

  • Start with pain, not policy
  • Involve stakeholders early
  • Facilitate, don’t dictate
  • Bake standards into tooling

Final Thought

If you're working on platform or architecture problems in a multi-team environment, remember:

You don’t need authority to create alignment.

You need:

  • clarity
  • empathy
  • and systems that reinforce good decisions

Standards stick when they make people’s lives better, not just when they add gates.
