Muskan

Posted on May 15 • Originally published at zop.dev

Chargeback Without Spreadsheets: The 4-Field Schema That Replaced Our 200-Tag Mess


The tag taxonomy starts at 30 keys and climbs from there. Year one, every team agrees on env, team, cost-center, service. Year two, finance asks for customer, feature, data-classification, pci-scope. Year three, the security team adds compliance-tier, the platform team adds iac-tool, and an exec asks for revenue-stream so they can correlate cost with bookings. By year four the spreadsheet of "official tags" runs to 200 keys, three of which are required and the rest of which are aspirational.

The numbers from this taxonomy are wrong, and everyone knows it. Per-team chargeback accuracy hovers around 70 to 80 percent. The misattribution is consistent enough that finance has stopped pushing back: the platform team always overpays by 8 percent, the recommendation team always underpays by 6 percent, the unattributed bucket holds 12 percent of total spend. Every month the same conversation happens. Every quarter someone proposes "tag governance" again. The taxonomy keeps growing.

The structural problem isn't engineer discipline. It's that tags require a human to set them at resource creation time, and the taxonomy has no lifecycle. Stale tags don't generate errors; they generate misattribution. Adding a new tag means backfilling thousands of resources. Renaming a tag is functionally impossible. The system is set up to grow but never to shrink, and the people whose costs depend on the data have no leverage to change the resource-creation behavior of the engineers writing the IaC.

The fix is to stop trying to tag perfectly and start absorbing tag variance at the cost-record stage. Four fields per cost record, populated by a lookup table that the FinOps team owns. Engineers tag what they remember; the lookup absorbs what they forget. Per-team chargeback accuracy goes from 70-80 percent to 92-96 percent over six weeks. The 200-tag spreadsheet stops being the source of truth and becomes one input among several.

This pattern composes with the work on chargeback vs showback, which covers the foundational shape of team-level accountability.

Why the 200-tag taxonomy is a structural problem, not a hygiene problem

Year one of cloud chargeback, every org's tag taxonomy looks the same: about 30 keys, with three to six marked required. Engineers comply because the list is short. Finance gets reasonable numbers. Everyone agrees the system works.

Year	Tag count	% of resources with all required tags	Engineer comment
1	30	85	"Yeah, tagging is fine"
2	80	65	"Why do I need to set 'compliance-tier' on a Lambda?"
3	150	45	"What does 'revenue-stream' even mean for a CI job?"
4	220	30	"I just copied the tags from the last service"

The compliance ratio drops because the marginal tag added in year two has no obvious meaning at the resource level. An engineer creating an EC2 instance for a CI runner doesn't know what revenue-stream should be. They guess, copy from somewhere, or leave it blank. The cost system attributes the resource to whatever the guess produced, or to the unattributed bucket if blank. Either way the chargeback number for that team is wrong by some amount the team can't measure and finance can't audit.
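The "% of resources with all required tags" column can be measured rather than estimated. Here is a minimal sketch, assuming each resource's tags are already exported as a dict; the required set and the sample resources are illustrative, not taken from a real inventory.

```python
# Assumed required set for illustration; substitute your org's actual list.
REQUIRED_TAGS = {"env", "team", "cost-center", "service"}


def compliance_ratio(resources, required=REQUIRED_TAGS):
    """Fraction of resources that carry every required tag with a non-empty value."""
    if not resources:
        return 0.0
    compliant = sum(
        1 for tags in resources
        if all(tags.get(key) for key in required)
    )
    return compliant / len(resources)


# Hypothetical sample: two fully tagged resources, two with gaps.
resources = [
    {"env": "prod", "team": "platform", "cost-center": "ENG-PLATFORM-001", "service": "platform-api"},
    {"env": "dev", "team": "platform", "service": "ci-runner"},  # cost-center missing
    {"env": "prod", "team": "rec-eng", "cost-center": "", "service": "rec-pipeline"},  # blank value
    {"env": "staging", "team": "rec-eng", "cost-center": "ENG-RECOMMENDATIONS-002", "service": "rec-pipeline"},
]
ratio = compliance_ratio(resources)
```

Note what this check cannot see: a tag that is present but wrong (copied from the last service) counts as compliant here, which is exactly the failure mode that makes the chargeback number unauditable.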

The standard response is "tag governance": automated checks that block resource creation if required tags are missing. This works for new resources but doesn't fix the historical drift, doesn't help with tags that are present-but-wrong, and creates friction that engineering pushes back on. The CI runner case is the typical pushback example: forcing the engineer to figure out revenue-stream for a build worker is friction with no operational value.

The deeper problem is that tags are the wrong abstraction for chargeback. Tags are properties of a resource. Chargeback is a mapping from resources to chargeable units (teams, cost centers, services). A taxonomy of 200 properties of resources is too detailed for the mapping (most tags don't matter for chargeback) and not flexible enough for the mapping (changing the mapping requires changing tags on resources). The mapping needs to live somewhere else.

The 4 fields that actually matter for chargeback

Strip the chargeback schema down to what every consumer actually needs.

Field	Cardinality	Consumer	What it answers
`cost_center`	30-100 per org	Finance	Which department's budget pays for this
`service`	50-300 per org	Engineering leadership	Which product surface is this resource for
`env`	3-5 per org	On-call, security, audit	Production / staging / dev / experiment / sandbox
`owner_email`	one per service	Anyone debugging cost	Who do I ask about this spend

The cost_center field is the only one finance cares about for the actual chargeback. Everything else is operational. The service field drives the per-team breakdown engineering uses to decide where to focus optimization. The env field separates prod from non-prod so production overspend isn't hidden in dev experimentation. The owner_email field closes the loop on overspend questions: if a number looks weird, there's a human to ask.

Why four and not five: every additional field doubles the maintenance work on the lookup table while adding marginal value to the chargeback report. Customer attribution sounds important but is volatile (customers churn, services get re-targeted) and high-cardinality (cost records explode). Compliance tier is real for security audits but not for chargeback; it lives in a separate system that joins on resource_id when needed. Feature flag attribution is interesting for product analytics but not for cost allocation.

Why four and not three: dropping owner_email seems tempting (cost_center owner is in HR, look it up). In practice, looking up the cost_center owner during a 2 AM cost spike is friction nobody pays. Having owner_email on the cost record means anyone reading the dashboard can email the right person without a directory lookup.

The four-field shape also gives finance the structure they actually use. The monthly chargeback rollup is `SUM(cost) GROUP BY cost_center`: one query, no joins. The per-team breakdown is `SUM(cost) GROUP BY service`, filtered to the cost_center. The "what's running in prod" question is `WHERE env = 'prod'`. Three queries, four fields, no spreadsheet.
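To make the three queries concrete, here is a small sketch using Python's built-in sqlite3 module. The table name `cost_records`, the `cost` column, and the dollar amounts are assumptions for illustration; the four field names come from the schema above.

```python
import sqlite3

# In-memory database standing in for the cost-record store (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cost_records ("
    " cost REAL, cost_center TEXT, service TEXT, env TEXT, owner_email TEXT)"
)
conn.executemany(
    "INSERT INTO cost_records VALUES (?, ?, ?, ?, ?)",
    [
        (1200.0, "ENG-PLATFORM-001", "platform-api", "prod", "platform-leads@example.com"),
        (300.0, "ENG-PLATFORM-001", "platform-api", "staging", "platform-leads@example.com"),
        (800.0, "ENG-RECOMMENDATIONS-002", "rec-pipeline", "prod", "rec-leads@example.com"),
    ],
)

# 1. Monthly chargeback rollup: one query, no joins.
rollup = conn.execute(
    "SELECT cost_center, SUM(cost) FROM cost_records"
    " GROUP BY cost_center ORDER BY cost_center"
).fetchall()

# 2. Per-team breakdown: services within one cost center.
by_service = conn.execute(
    "SELECT service, SUM(cost) FROM cost_records"
    " WHERE cost_center = ? GROUP BY service ORDER BY service",
    ("ENG-PLATFORM-001",),
).fetchall()

# 3. What's running in prod.
prod_total = conn.execute(
    "SELECT SUM(cost) FROM cost_records WHERE env = 'prod'"
).fetchone()[0]
```

Because the four fields sit directly on each cost record, none of these queries needs a join against a tag table.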

The lookup table absorbs tag variance

Engineers don't stop tagging. They keep doing what they do. The cost-record pipeline has a lookup-table step that converts whatever tag soup the resource has into the four canonical fields.

The lookup table is owned by the FinOps team. It's a YAML file checked into a Git repo, reviewed via pull requests, and deployed alongside the cost-record pipeline. Typical entries, shown here in table form:

Match condition	cost_center	service	owner_email
`team:platform` AND `env:prod`	ENG-PLATFORM-001	platform-api	platform-leads@example.com
`team:platform` AND `env:staging`	ENG-PLATFORM-001	platform-api	platform-leads@example.com
`team:rec-eng`	ENG-RECOMMENDATIONS-002	rec-pipeline	rec-leads@example.com
`account:acct-12345` (no team tag)	ENG-PLATFORM-001	platform-api	platform-leads@example.com
`account:acct-67890` AND `service:billing`	FIN-BILLING-PROD-005	billing-api	billing-eng@example.com
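One way the rows above could be evaluated is as ordered first-match rules. The sketch below is an assumption about the matching semantics, not the actual pipeline: the rule format, the `resolve` function, and the convention that `None` means "tag must be absent" are all hypothetical. A YAML parser would load the real file into a similar list of dicts.

```python
# Hypothetical in-memory form of the lookup table, one dict per table row.
LOOKUP_RULES = [
    {"match": {"team": "platform", "env": "prod"},
     "cost_center": "ENG-PLATFORM-001", "service": "platform-api",
     "owner_email": "platform-leads@example.com"},
    {"match": {"team": "platform", "env": "staging"},
     "cost_center": "ENG-PLATFORM-001", "service": "platform-api",
     "owner_email": "platform-leads@example.com"},
    {"match": {"team": "rec-eng"},
     "cost_center": "ENG-RECOMMENDATIONS-002", "service": "rec-pipeline",
     "owner_email": "rec-leads@example.com"},
    # A value of None means "this key must be absent" (the "no team tag" row).
    {"match": {"account": "acct-12345", "team": None},
     "cost_center": "ENG-PLATFORM-001", "service": "platform-api",
     "owner_email": "platform-leads@example.com"},
    {"match": {"account": "acct-67890", "service": "billing"},
     "cost_center": "FIN-BILLING-PROD-005", "service": "billing-api",
     "owner_email": "billing-eng@example.com"},
]


def resolve(attrs, rules=LOOKUP_RULES):
    """Return the canonical fields from the first rule whose conditions all match.

    `attrs` is a flat dict of the resource's tags plus its account id.
    Returns None when no rule matches, so the caller can apply fallbacks.
    """
    for rule in rules:
        if all(attrs.get(key) == value for key, value in rule["match"].items()):
            return {k: v for k, v in rule.items() if k != "match"}
    return None


billing = resolve({"account": "acct-67890", "service": "billing"})
untagged = resolve({"account": "acct-12345"})
unknown = resolve({"account": "acct-99999"})
```

First-match ordering keeps the file easy to review in a pull request: moving a rule up or down is the whole diff, and the effect is predictable.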

When the tag taxonomy changes, only the lookup table changes. Engineers don't get tag-rename PRs. The historical cost records keep their attribution because they were written through the lookup snapshot in effect at that time. The cost of changing chargeback policy collapses from "rewrite tags on N thousand resources" to "edit one YAML file."

The lookup table is the only piece the finance side, through the FinOps team, owns end-to-end. The tag taxonomy is owned by engineering (because tagging happens at resource creation). The cost-record pipeline is owned by data engineering. The reports are consumed by finance. Putting the policy in the middle, in a place finance owns, is what makes the chargeback numbers actually trustworthy.

Inference rules for tag absence

Tags will be missing. In any given month, roughly half the resources are missing at least one of the tags that feed the four fields. The lookup table needs explicit fallback rules for absence.

Field missing	Fallback derivation	Example
`team` tag	Use the account-to-cost-center map	account 12345 → cost_center=ENG-PLATFORM-001
`service` tag	Use the most-tagged service in the account	70% of resources tagged service=platform-api → default to platform-api
`env` tag	Use account tier (prod accounts → prod, sandbox accounts → dev)	account named `prod-us-east` → env=prod
`owner_email`	Look up cost_center owner from the cost_center map	cost_center=ENG-PLATFORM-001 → owner_email=platform-leads@
Everything missing	Route to "unattributed" bucket; finance reviews monthly	Any resource that survives all fallbacks

Each fallback is explicit and auditable. The cost record carries a derivation field showing which rules fired ("team-tag-missing → account-fallback → ENG-PLATFORM-001"). When finance asks "why is this charged to platform," the answer is in the record, not in someone's head.
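The fallback table and the derivation field could be sketched as follows. Everything here is a hypothetical shape: the map names, the `attribute` function, and the derivation string format are assumptions, and the "most-tagged service per account" is treated as a precomputed map rather than calculated inline. Unknown team values are treated the same as a missing tag.

```python
# Hypothetical maps maintained alongside the lookup table.
TEAM_COST_CENTER = {"platform": "ENG-PLATFORM-001"}
ACCOUNT_COST_CENTER = {"acct-12345": "ENG-PLATFORM-001"}
ACCOUNT_DEFAULT_SERVICE = {"acct-12345": "platform-api"}  # most-tagged service, precomputed
ACCOUNT_ENV = {"acct-12345": "prod"}  # derived from account tier
COST_CENTER_OWNER = {"ENG-PLATFORM-001": "platform-leads@example.com"}


def attribute(tags, account):
    """Fill the four fields and record which rule produced each value."""
    record, derivation = {}, []

    def pick(field, label, tag_value, fallback_value, fallback_name):
        if tag_value:
            record[field] = tag_value
            derivation.append(f"{label}-tag")
        elif fallback_value:
            record[field] = fallback_value
            derivation.append(f"{label}-tag-missing -> {fallback_name} -> {fallback_value}")
        else:
            record[field] = "unattributed"
            derivation.append(f"{label}-tag-missing -> no-fallback -> unattributed")

    pick("cost_center", "team", TEAM_COST_CENTER.get(tags.get("team")),
         ACCOUNT_COST_CENTER.get(account), "account-fallback")
    pick("service", "service", tags.get("service"),
         ACCOUNT_DEFAULT_SERVICE.get(account), "account-majority-service")
    pick("env", "env", tags.get("env"),
         ACCOUNT_ENV.get(account), "account-tier")
    pick("owner_email", "owner", tags.get("owner_email"),
         COST_CENTER_OWNER.get(record["cost_center"]), "cost-center-owner")

    record["derivation"] = derivation
    return record


no_tags = attribute({}, "acct-12345")
orphan = attribute({}, "acct-unknown")
```

Storing the derivation on the record, rather than recomputing it later, means the explanation still matches the number even after the rules change.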

The unattributed bucket is small (typically under 5 percent of monthly spend after the lookup is set up). It's the safety valve: anything the rules can't attribute lands here, finance reviews monthly, and either a new rule gets added (if it's a recurring case) or the bucket is allocated proportionally (if it's truly one-off).
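For the one-off case, "allocated proportionally" can be read as spreading the bucket across cost centers in proportion to their already-attributed spend. That reading is an assumption, and the function name and dollar figures below are illustrative.

```python
def allocate_unattributed(attributed, unattributed):
    """Spread the unattributed amount across cost centers by share of attributed spend."""
    total = sum(attributed.values())
    if total == 0:
        raise ValueError("no attributed spend to allocate against")
    return {
        cost_center: amount + unattributed * amount / total
        for cost_center, amount in attributed.items()
    }


# Hypothetical month: $100k attributed, $5k left in the bucket.
attributed = {"ENG-PLATFORM-001": 60000.0, "ENG-RECOMMENDATIONS-002": 40000.0}
final = allocate_unattributed(attributed, 5000.0)
# ENG-PLATFORM-001 holds 60% of attributed spend, so it absorbs 60% of the bucket.
```

The totals still reconcile: the sum of the final charges equals attributed plus unattributed spend, which is the property finance checks first.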

The trick to keeping the unattributed bucket small is that the fallback rules cascade. A resource with no tags but in a known account gets attributed by the account rule. A resource in an unknown account but with a service tag gets attributed by the service rule. Each rule pulls some percentage of resources out of "unattributed" without requiring the engineer to add a tag. After three months of tuning, the cascading rules cover 95 percent of resources.

Migration: 6 weeks for a 200-engineer org

The migration is shorter than people expect because engineers don't have to do anything. The work is concentrated in the FinOps team writing lookup tables and validating the new attribution.

Week	Work	Deliverable
1	Inventory existing tags + their actual usage	Spreadsheet of tag → resource count → uniqueness
2	Write the initial lookup table from the inventory	yaml lookup table covering 80% of resources
3	Run the new pipeline alongside the old; collect both attributions	Side-by-side report of old vs new chargeback per cost_center
4	Tune the lookup table where the two diverged; add cascading rules	Lookup table v2 + cascading inference rules
5	Cut over reports to use the new attribution; old pipeline still runs for audit	New chargeback reports go live
6	Retire stale tags; communicate to engineering what the source of truth is now	Stale tag list deleted from the canonical taxonomy doc

The shadow-run weeks (3 and 4) are where the work converges. The two pipelines produce different numbers; the FinOps team investigates each delta over $5,000/month and adjusts the lookup table. Most deltas are explained by inference-rule edge cases (account boundaries, service renames, env-tag misuse). A few are real bugs in the old pipeline that the new attribution exposed.
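The side-by-side comparison reduces to a diff of two per-cost-center totals. A minimal sketch, assuming the delta is measured per cost center per month; the function name and the sample figures are hypothetical, and only the $5,000 threshold comes from the text.

```python
THRESHOLD = 5000.0  # dollars per month


def deltas_to_investigate(old, new, threshold=THRESHOLD):
    """Return (cost_center, old, new, delta) rows over the threshold, largest delta first."""
    rows = []
    for cost_center in sorted(set(old) | set(new)):
        old_amount = old.get(cost_center, 0.0)
        new_amount = new.get(cost_center, 0.0)
        delta = new_amount - old_amount
        if abs(delta) > threshold:
            rows.append((cost_center, old_amount, new_amount, delta))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


# Hypothetical monthly totals from the old (tag-only) and new (lookup) pipelines.
old = {"ENG-PLATFORM-001": 108000.0, "ENG-RECOMMENDATIONS-002": 47000.0, "unattributed": 12000.0}
new = {"ENG-PLATFORM-001": 100000.0, "ENG-RECOMMENDATIONS-002": 50000.0,
       "FIN-BILLING-PROD-005": 14000.0, "unattributed": 3000.0}
worklist = deltas_to_investigate(old, new)
```

In this sample the recommendations delta ($3,000) stays below the threshold and drops off the worklist, while a cost center that only exists in the new attribution shows up at the top.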

Engineering involvement is minimal. One sync at week 1 (we're collecting tag data, no action needed). One sync at week 5 (the chargeback report you see is now generated this way; here's the doc). Engineers don't change their tagging behavior because the lookup absorbs the variance. The friction cost on engineering is roughly two hours of meetings.

The retirement of stale tags at week 6 is optional but worth doing. The taxonomy doc gets pruned to the small set of tags that the lookup actually reads. Engineers stop seeing the 200-tag wishlist; they see the 20 tags that matter. New tag proposals go through the FinOps team because adding a tag now means adding lookup-table logic, not just adding a row to the wishlist.

Lookup-table version history for time-travel attribution

The lookup table will change. Cost centers split when teams reorganize. Services rename when products rebrand. The chargeback system needs to handle change without rewriting historical records.

The cost-record pipeline writes a lookup_version field on each record. The lookup table is versioned (a git tag or a row in a snapshot table). Reports query through the version that was current when the cost record was written.

The historical 2025 Q4 chargeback report keeps showing ENG-PLATFORM-001 even after the cost center splits. Comparing 2026 Q2 to 2025 Q4 produces a footnote ("ENG-PLATFORM-001 split into PLATFORM-API-009 and PLATFORM-DATA-010 in 2026 Q1") rather than a misattribution. Finance gets honest historical comparisons; engineering gets the current cost-center structure.
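Here is a sketch of how version stamping might work at write time. The snapshot list, the effective dates, the `platform-data` service name, and both function names are assumptions for illustration; only the cost-center codes and the `lookup_version` field come from the text.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical snapshot history: (effective_from, version, service -> cost_center).
LOOKUP_VERSIONS = [
    (date(2025, 1, 1), "v1", {"platform-api": "ENG-PLATFORM-001",
                              "platform-data": "ENG-PLATFORM-001"}),
    (date(2026, 1, 1), "v2", {"platform-api": "PLATFORM-API-009",
                              "platform-data": "PLATFORM-DATA-010"}),
]


def version_for(day):
    """Return the snapshot in effect on `day`: the latest one effective on or before it."""
    idx = bisect_right([entry[0] for entry in LOOKUP_VERSIONS], day) - 1
    if idx < 0:
        raise LookupError(f"no lookup version in effect on {day}")
    return LOOKUP_VERSIONS[idx]


def write_record(day, service, cost):
    """Attribute a cost through the snapshot current on `day` and stamp its version."""
    _, version, mapping = version_for(day)
    return {
        "date": day,
        "cost": cost,
        "cost_center": mapping.get(service, "unattributed"),
        "lookup_version": version,
    }


q4_2025 = write_record(date(2025, 11, 15), "platform-api", 100.0)
q2_2026 = write_record(date(2026, 5, 1), "platform-api", 120.0)
```

The same service lands in ENG-PLATFORM-001 for the 2025 record and PLATFORM-API-009 for the 2026 record, and neither record is ever rewritten: the split shows up as a version boundary.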

The version history also lets the FinOps team experiment safely. A proposed lookup change can be shadow-run on historical data to see how it would have changed the attribution. If the change would have produced large unexpected swings, the team investigates before going live.

Accuracy goes from 70-80 percent to 92-96 percent

The improvement isn't from cleaner tag data. The data is the same. The improvement is from absorbing tag variance at the attribution layer instead of demanding tag perfection at the resource layer.

Dimension	Tag-only chargeback accuracy	4-field schema accuracy
Per-team rollup	70-80%	92-96%
Per-service breakdown	60-75%	88-94%
Per-env split (prod vs non-prod)	80-88%	95-98%
Unattributed bucket	8-15% of spend	2-5% of spend

Per-team accuracy improves sharply because the lookup table gives every resource a deterministic team mapping, even when the tag is missing. Under tag-only chargeback, a missing team tag meant the resource sat in unattributed; under the lookup-table model, the account-fallback rule covers it. The team that owns the account gets charged, even without the tag.

Per-service breakdown starts from the lowest baseline and gains the most percentage points, because services are the most volatile dimension. New services get created, old services get renamed, services get split or merged. The tag taxonomy can't keep up; the lookup table absorbs the changes in one YAML edit.

Per-env split is the easiest to fix because env is high-signal at the account level. Most orgs have separate AWS accounts per env, so even when the env tag is missing on a resource, the account tells the lookup what env it's in.

The unattributed bucket dropping from 8-15 percent to 2-5 percent is what makes finance trust the numbers again. A monthly chargeback report where 12 percent of the spend is "we don't know" is hard to act on. A report where 3 percent is unattributed (with a list of specific resources finance can investigate) is operational.

The 200-tag taxonomy isn't the problem. Demanding that 200 tags be set correctly on every resource is the problem. Move the policy to a lookup table the FinOps team owns, accept that tags are noisy inputs, and the chargeback numbers stop being a quarterly argument.