Modeling statutory payroll across 100+ countries without hardcoding a nightmare

If you have ever tried to compute net pay in more than one country, you know the trap. The first country goes fine. You write a function, it works, you ship it. The second country breaks half your assumptions. By the fifth you have a tangle of if country == "X" branches and a payroll engine nobody wants to touch.

This is a write-up of how we model statutory payroll rules across a large set of countries as data rather than code, so that adding a corridor is a config change, not a deploy.

The core mistake: rules as code

The naive version looks like this, and it is fine until it is not.

def net_pay(gross, country):
    if country == "IN":
        pf = min(gross * 0.12, 1800)
        esi = gross * 0.0075 if gross <= 21000 else 0
        return gross - pf - esi
    elif country == "PH":
        sss = ...  # different brackets entirely
        philhealth = ...
        return gross - sss - philhealth
    # ... and so on, forever

Every new country is a code change, a review, a deploy, and a fresh chance to break the countries you already support. Statutory rates also change mid-year, so you are redeploying for what should be a data update. This does not scale past a handful of corridors.

The shift: rules as a declarative schema

The better model treats every statutory deduction or contribution as a typed rule record. A rule has a country, an effective date range, a base it applies to, a calculation method, and bounds. The engine stays generic and just interprets rules.

{
  "country": "IN",
  "component": "provident_fund_employee",
  "effective_from": "2024-04-01",
  "effective_to": null,
  "base": "basic_plus_da",
  "method": "percentage",
  "rate": 0.12,
  "cap_base": 15000,
  "rounding": "nearest_rupee"
}

The engine reads the applicable rules for a country and a pay date, sorts by dependency, and applies them. Adding a country becomes "write its rule records." Changing a rate mid-year becomes "close the old record with an effective_to, open a new one." No deploy.
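As a rough illustration of selecting rules by pay date, here is a minimal sketch. The in-memory `RULES` list, the standalone `applicable` function, the country code "XX", the component name, and both rates are made up for the example, and it assumes `effective_to` is inclusive. A real rule store would likely sit behind a database.

```python
from datetime import date

# Illustrative records for a made-up country "XX"; the rates are not real
# statutory values. The mid-year change is modelled by closing the first
# record with an effective_to and opening a second one.
RULES = [
    {"country": "XX", "component": "example_levy",
     "effective_from": date(2024, 1, 1), "effective_to": date(2024, 6, 30),
     "rate": 0.10},
    {"country": "XX", "component": "example_levy",
     "effective_from": date(2024, 7, 1), "effective_to": None,
     "rate": 0.11},
]


def applicable(rules, country, pay_date):
    """Return the rules for `country` whose validity window contains `pay_date`."""
    selected = []
    for rule in rules:
        if rule["country"] != country:
            continue
        started = rule["effective_from"] <= pay_date
        # Assumption: effective_to is inclusive; None means the record is still open.
        not_ended = rule["effective_to"] is None or pay_date <= rule["effective_to"]
        if started and not_ended:
            selected.append(rule)
    return selected


# A March correction and an August run pick different records from the same store.
march = applicable(RULES, "XX", date(2024, 3, 31))
august = applicable(RULES, "XX", date(2024, 8, 31))
```

Because selection depends only on the pay date, re-running March later in the year still finds the record that was valid in March.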

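def compute(gross, components, country, pay_date):
    # Select rules by pay date, not "latest", so corrections use the right rates.
    rules = rule_store.applicable(country, pay_date)
    # Amounts a rule's base can refer to: gross plus the named pay components.
    ctx = {"gross": gross, **components}
    results = {}
    # Dependencies first, so a rule can read amounts computed by earlier rules.
    for rule in topological_order(rules):
        base = resolve_base(rule["base"], ctx, results)
        results[rule["component"]] = apply_method(rule, base)
    return results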
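The two helpers the loop leans on could look something like the sketch below. This is one possible shape, not our exact implementation: the `BASE_DEFINITIONS` registry, the component names `basic` and `dearness_allowance`, the sample amounts, and the reading of `nearest_rupee` as round-half-up are all assumptions made for the example. Only the `percentage` method is handled.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical registry: which pay components each named base is the sum of.
BASE_DEFINITIONS = {
    "gross": ["gross"],
    "basic_plus_da": ["basic", "dearness_allowance"],
}


def resolve_base(base_name, ctx, results):
    """Sum the amounts that make up a named base.

    Each part is looked up in the pay inputs first, then in earlier results.
    """
    total = Decimal("0")
    for part in BASE_DEFINITIONS[base_name]:
        source = ctx if part in ctx else results
        total += Decimal(str(source[part]))
    return total


def apply_method(rule, base):
    """Apply a 'percentage' rule, honouring an optional cap on the base."""
    if rule["method"] != "percentage":
        raise ValueError(f"unsupported method: {rule['method']}")
    cap = rule.get("cap_base")
    if cap is not None:
        base = min(base, Decimal(str(cap)))
    amount = base * Decimal(str(rule["rate"]))
    if rule.get("rounding") == "nearest_rupee":
        # Assumption: "nearest_rupee" means round half up to a whole unit.
        amount = amount.quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return amount


# The provident fund record from above, applied to made-up pay amounts.
rule = {"component": "provident_fund_employee", "base": "basic_plus_da",
        "method": "percentage", "rate": 0.12, "cap_base": 15000,
        "rounding": "nearest_rupee"}
ctx = {"gross": 30000, "basic": 18000, "dearness_allowance": 2000}
pf = apply_method(rule, resolve_base(rule["base"], ctx, {}))
```

With these inputs the base resolves to 20000, the cap trims it to 15000, and 12% of that is 1800. The cap lives on the base rather than on the result, which is what lets the same rule shape cover capped and uncapped contributions.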

The three things that actually bite you

The schema is the easy part. Here is where real-world payroll fights back.

The "base" is never just gross. Provident fund in India applies to basic pay plus dearness allowance, not the total gross. Some contributions apply to a capped base, some to an uncapped one. If your rule cannot express "what amount does this percentage apply to," you will hardcode exceptions within a week. The base field has to be a first-class, composable concept, not an afterthought.
Effective dating is not optional. Statutory rates change, and they change on government timelines, not yours. You will run payroll for May with May's rates and re-run a correction for March with March's rates in the same week. Every rule needs a validity window, and the engine must select rules by pay date, never "latest."
Ordering and dependency. Some deductions are computed on the amount after another deduction. Tax often depends on the contributions already taken. Treat rules as a dependency graph and resolve in topological order. A flat loop over rules will silently compute tax on the wrong base in exactly the countries where it matters most.
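For the ordering point, a minimal sketch using the standard library's `graphlib` (Python 3.9+). The `depends_on` field is a hypothetical addition to the rule record for this example, and the two component names are made up. Dependencies could equally be derived from what each rule's base refers to.

```python
from graphlib import TopologicalSorter


def topological_order(rules):
    """Order rules so each one comes after the components it depends on."""
    by_component = {rule["component"]: rule for rule in rules}
    # Map each component to its predecessors, ignoring names that have no
    # rule in this set (for example, plain pay inputs such as gross).
    graph = {
        rule["component"]: [
            dep for dep in rule.get("depends_on", []) if dep in by_component
        ]
        for rule in rules
    }
    # static_order() yields predecessors first and raises graphlib.CycleError
    # if the rules depend on each other in a loop.
    return [by_component[name] for name in TopologicalSorter(graph).static_order()]


# Made-up components, deliberately listed in the wrong order: the tax rule
# depends on a contribution that appears after it.
rules = [
    {"component": "income_tax", "depends_on": ["social_contribution"]},
    {"component": "social_contribution", "depends_on": []},
]
ordered = [rule["component"] for rule in topological_order(rules)]
```

Here `ordered` puts `social_contribution` ahead of `income_tax`, regardless of how the records were stored. A cycle fails loudly instead of producing a quietly wrong number.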

Why we publish the human version too

The engine is internal, but the statutory logic behind it is exactly what a company hiring in a new country needs to understand before they commit. So the same rule data that drives the calculation also drives the country guides we publish. When the provident fund cap changes, the engine and the public page update from the same source, which means the documentation cannot drift away from the system.
If you want to see the human-readable side of this, the country pages on the global employer of record hub lay out the statutory components per country, and the India corridor in particular goes deep on the contribution mechanics.
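The single-source idea can be sketched as a small renderer that turns the same rule record into guide text. The `describe` function and the wording it produces are hypothetical, for illustration only. It assumes a `percentage` rule shaped like the record shown earlier.

```python
def describe(rule):
    """Render one percentage rule record as a plain-language sentence."""
    text = f"{rule['component']}: {rule['rate'] * 100:g}% of {rule['base']}"
    if rule.get("cap_base") is not None:
        text += f", with the base capped at {rule['cap_base']}"
    text += f", effective from {rule['effective_from']}"
    if rule.get("effective_to"):
        text += f" to {rule['effective_to']}"
    return text + "."


# The same record the engine interprets, reused as documentation input.
rule = {
    "country": "IN",
    "component": "provident_fund_employee",
    "effective_from": "2024-04-01",
    "effective_to": None,
    "base": "basic_plus_da",
    "method": "percentage",
    "rate": 0.12,
    "cap_base": 15000,
    "rounding": "nearest_rupee",
}
summary = describe(rule)
```

Since the page text is derived from the record, changing `cap_base` in the rule store changes the calculation and the published sentence together.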

Takeaways if you are building this

Model statutory rules as effective-dated data records, not branches in code.
Make "base of calculation" a first-class field. It is the thing that breaks first.
Select rules by pay date, never "most recent," or your corrections will be wrong.
Resolve dependencies in topological order so downstream deductions read the right base.

The payoff is that onboarding a new country stops being an engineering project and becomes a data task a compliance analyst can own. That is the difference between covering five countries and covering a hundred.