If you run Kafka as shared infrastructure, you've probably faced this question at some point: who is responsible for this topic, and what does it cost us?
This is the core problem that Kafka FinOps tries to solve. In this post I'll explain what chargeback reporting means in a Kafka context, why it's hard, and how we implemented it in PartitionPilot.
What Is Chargeback Reporting?
Chargeback is a practice borrowed from cloud FinOps: instead of treating infrastructure costs as a single shared line item, you break them down by team, service, or product — and charge each one for what they actually use.
In AWS or GCP this is relatively straightforward. Cloud providers give you cost allocation tags. But Kafka has no native cost model. It doesn't know about teams, budgets, or ownership.
That's where chargeback reporting for Kafka comes in.
The Two Cost Drivers in Kafka
Before you can do chargeback, you need to understand what actually costs money in Kafka:
Storage: every message written to a topic stays on disk until it expires under the topic's retention settings. A high-throughput topic with 7-day retention can consume hundreds of gigabytes.
Traffic: every byte written to a topic (bytes-in) and read from it (bytes-out) generates network traffic. On managed services such as AWS MSK or Confluent Cloud, data transfer typically appears on the bill, though the exact pricing model varies by provider and plan.
Both can be measured from Kafka's JMX metrics, typically scraped into Prometheus through a JMX exporter:
- kafka.log:type=Log,name=Size → storage per topic-partition
- kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec → inbound traffic per topic
- kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec → outbound traffic per topic
The Missing Piece: Ownership
Metrics alone aren't enough for chargeback. You also need to know who owns each topic.
In most Kafka deployments, topic ownership is tribal knowledge. It lives in someone's head, in a Confluence page that's three years out of date, or nowhere at all.
For chargeback to work, you need a system that:
- Tracks which team or person owns each topic
- Links cost metrics to that ownership
- Produces a report that finance or engineering management can actually use
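To make the second requirement concrete, here is a minimal Python sketch of linking usage to ownership. The `OWNERS` registry, the `USAGE` figures, and the `usage_by_owner` helper are all hypothetical names invented for illustration; this is not how PartitionPilot stores or joins its data.

```python
from collections import defaultdict

# Hypothetical ownership registry: topic name -> owning team.
OWNERS = {
    "orders.v2": "Team A",
    "user-events": "Team B",
}

# Hypothetical per-topic usage for one reporting period, in GB.
USAGE = {
    "orders.v2": {"storage_gb": 12.4, "in_gb": 45.2, "out_gb": 180.8},
    "user-events": {"storage_gb": 8.1, "in_gb": 120.3, "out_gb": 360.9},
    "tmp-debug": {"storage_gb": 3.0, "in_gb": 1.0, "out_gb": 0.0},
}

UNOWNED = "(unassigned)"


def usage_by_owner(owners, usage):
    """Sum each usage field per owner; topics without an owner go to UNOWNED."""
    totals = defaultdict(lambda: defaultdict(float))
    for topic, fields in usage.items():
        owner = owners.get(topic, UNOWNED)
        for name, value in fields.items():
            totals[owner][name] += value
    return {owner: dict(fields) for owner, fields in totals.items()}


report = usage_by_owner(OWNERS, USAGE)
for owner, fields in sorted(report.items()):
    print(owner, fields)
```

Routing unowned topics into an explicit bucket, rather than dropping them, keeps the gap in ownership data visible in the report.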
How PartitionPilot Implements This
PartitionPilot connects to your Prometheus endpoint and takes periodic cost snapshots. Each snapshot records storage and traffic per topic, along with the time it was taken.
On top of that, it lets you assign an owner to each topic and consumer group. Ownership is stored in a PostgreSQL database alongside the cost data.
The result: a chargeback report in CSV format that looks like this:
Owner  | Topic         | Storage (GB) | Traffic In (GB) | Traffic Out (GB) | Estimated Cost
-------+---------------+--------------+-----------------+------------------+---------------
Team A | orders.v2     | 12.4         | 45.2            | 180.8            | CHF 23.40
Team B | user-events   | 8.1          | 120.3           | 360.9            | CHF 41.20
Team C | analytics.raw | 95.2         | 890.1           | 2670.3           | CHF 312.80
This report can be exported and shared with engineering managers or finance teams on a monthly basis.
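For readers who want to prototype a similar export, the sketch below writes report rows to CSV with Python's standard `csv` module. The unit prices in `RATES` are invented placeholders, so the resulting costs will not match the figures in the table above, and the column names are assumptions rather than PartitionPilot's actual export format.

```python
import csv
import io

# Hypothetical unit prices in CHF per GB; real rates depend on your platform.
RATES = {"storage_gb": 0.10, "in_gb": 0.05, "out_gb": 0.05}


def estimated_cost(row, rates=RATES):
    """Linear cost model: usage in GB times an assumed price per GB."""
    return round(sum(row[field] * rate for field, rate in rates.items()), 2)


def chargeback_csv(rows):
    """Render report rows as CSV text, one line per topic."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(
        ["owner", "topic", "storage_gb", "in_gb", "out_gb", "estimated_cost_chf"]
    )
    for row in rows:
        writer.writerow([
            row["owner"],
            row["topic"],
            row["storage_gb"],
            row["in_gb"],
            row["out_gb"],
            f"{estimated_cost(row):.2f}",
        ])
    return buf.getvalue()


rows = [
    {"owner": "Team A", "topic": "orders.v2",
     "storage_gb": 12.4, "in_gb": 45.2, "out_gb": 180.8},
]
print(chargeback_csv(rows))
```

A linear per-GB model is the simplest option; tiered pricing or a fixed platform fee would need a different `estimated_cost`.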
Why This Is Harder Than It Sounds
A few things make Kafka chargeback tricky in practice:
Topics are shared. A single topic can be written to by one team and consumed by three others. Who pays for the outbound traffic — the producer or the consumers? There's no universal answer. PartitionPilot lets you assign separate ownership for producer and consumer sides.
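One possible policy, sketched here in Python, charges the producer for storage and inbound traffic and splits outbound traffic among consumers in proportion to what each one read. This is only an illustration of one allocation rule, not PartitionPilot's logic. The per-consumer read volumes in `consumer_out_bytes` are an assumed input: the broker-level BytesOutPerSec metric is per topic, so a per-consumer breakdown would have to come from another source.

```python
def split_topic_cost(storage_cost, inbound_cost, outbound_cost,
                     producer, consumer_out_bytes):
    """Producer pays storage + inbound; consumers share outbound by bytes read.

    consumer_out_bytes maps each consuming owner to the bytes it read
    (hypothetical input). If nobody read anything, the producer absorbs
    the outbound cost.
    """
    charges = {producer: storage_cost + inbound_cost}
    total_out = sum(consumer_out_bytes.values())
    if total_out == 0:
        charges[producer] += outbound_cost
        return charges
    for owner, read in consumer_out_bytes.items():
        share = outbound_cost * read / total_out
        charges[owner] = charges.get(owner, 0.0) + share
    return charges


charges = split_topic_cost(
    storage_cost=2.0,
    inbound_cost=3.0,
    outbound_cost=12.0,
    producer="Team A",
    consumer_out_bytes={"Team B": 300, "Team C": 100},
)
print(charges)
```

With these inputs Team A is charged 5.0, Team B 9.0, and Team C 3.0, so the three charges add up to the topic's total cost of 17.0.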
Retention makes storage non-obvious. The cost of a topic depends not just on throughput, but on retention settings. A low-traffic topic with 30-day retention can cost more than a high-traffic topic with 1-hour retention.
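A rough back-of-the-envelope estimate shows why. The Python sketch below assumes purely time-based retention, a constant write rate, no log compaction, and a replication factor of 3; the function name and the example rates are made up for illustration.

```python
def steady_state_storage_gb(bytes_in_per_sec, retention_hours, replication_factor=3):
    """Approximate disk usage once the topic has been running for a full
    retention window: write rate x retention time x number of replicas."""
    retained_bytes = bytes_in_per_sec * retention_hours * 3600 * replication_factor
    return retained_bytes / 1e9


# Low traffic, long retention: 100 KB/s kept for 30 days.
low_traffic = steady_state_storage_gb(100_000, retention_hours=30 * 24)

# High traffic, short retention: 50 MB/s kept for 1 hour.
high_traffic = steady_state_storage_gb(50_000_000, retention_hours=1)

print(f"low traffic, 30 days: {low_traffic:.1f} GB")
print(f"high traffic, 1 hour: {high_traffic:.1f} GB")
```

Under these assumptions the 100 KB/s topic holds about 777.6 GB while the 50 MB/s topic holds about 540 GB, even though the second one writes 500 times faster.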
Metrics need aggregation. Raw Prometheus metrics are per-broker, per-partition. You need to aggregate them per topic across all brokers to get meaningful numbers.
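As a minimal sketch of that aggregation, the Python below sums per-broker, per-partition log sizes into one number per topic. The sample tuples are invented. In PromQL the same idea would be a `sum by (topic)` over the exported log-size metric, whose exact name depends on your JMX exporter rules.

```python
from collections import defaultdict

# Hypothetical samples of the log-size metric: (broker, topic, partition, bytes).
# Partition 0 of orders.v2 appears twice because two brokers hold a replica.
samples = [
    ("broker-1", "orders.v2", 0, 4_000_000_000),
    ("broker-2", "orders.v2", 0, 4_000_000_000),
    ("broker-1", "orders.v2", 1, 2_000_000_000),
    ("broker-3", "user-events", 0, 1_500_000_000),
]


def size_per_topic(samples):
    """Sum log size over all brokers and partitions for each topic."""
    totals = defaultdict(int)
    for _broker, topic, _partition, size_bytes in samples:
        totals[topic] += size_bytes
    return dict(totals)


for topic, size_bytes in sorted(size_per_topic(samples).items()):
    print(f"{topic}: {size_bytes / 1e9:.1f} GB")
```

Summing across brokers counts every replica, which reflects the physical disk actually consumed. To report logical data size instead, you would count each partition only once.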
Getting Started
PartitionPilot is self-hosted via Docker Compose. You can start a free 30-day trial at partitionpilot.com — no credit card required.
If your team is running Kafka as shared infrastructure and you want to start doing proper cost allocation, give it a try.
Pascal Clément — founder of PartitionPilot