Berkan Sesen

Originally published at sesen.ai

Custom Likelihoods in PyMC: One-Inflated Beta Regression for Loan Repayment

When a borrower takes out a personal loan, they might repay every penny, default entirely, or land anywhere in between. The interesting variable is the fraction eventually recovered: a number between 0 and 1 for each loan in the portfolio. Plot the distribution across thousands of loans and it looks like a smooth Beta curve with a tall spike bolted on at the right edge — a mass of borrowers who repaid in full.

That spike is good news for the lender, but a headache for the modeller. Standard Beta regression handles continuous outcomes on (0, 1), but it cannot produce a point mass at the boundary. Logistic regression predicts a binary paid-or-not label, throwing away the partial repayment information. Neither tool fits the data you actually have.

In our first PyMC post, we built hierarchical models using built-in distributions. In the second, we handled non-standard likelihoods with pm.Potential for right-censored survival data.

This post takes the final step: writing a piecewise log-likelihood from scratch for a mixture of continuous and discrete components. By the end, you will construct a One-Inflated Beta (OIB) regression in PyMC, hand-code the Beta log-density, and infer how borrower characteristics drive both the probability of full repayment and the expected partial repayment fraction.

Let's Build It

Click the badge below to open the full interactive notebook:

Open In Colab

We will generate synthetic loan data for 2,000 borrowers, fit an OIB regression model, and recover the true data-generating parameters.

Figure: Two-panel animation building up as MCMC draws accumulate. Left panel shows the predicted proportion of fully-repaid loans converging to the observed 60.7%; right panel shows the posterior predictive Beta component gradually matching the observed partial repayment histogram.

import numpy as np
import pymc as pm
import pytensor.tensor as pt
import arviz as az
import matplotlib.pyplot as plt

np.random.seed(42)

# --- Generate synthetic loan data ---
N = 2000
credit_score = np.random.normal(0, 1, N)
loan_to_value = np.random.normal(0, 1, N)
interest_rate = np.random.normal(0, 1, N)
income_ratio = np.random.normal(0, 1, N)
X = np.column_stack([credit_score, loan_to_value, interest_rate, income_ratio])
feature_names = ['credit_score', 'loan_to_value', 'interest_rate', 'income_ratio']

# True parameters
true_psi = np.array([0.5, 0.8, -0.6, -0.4, 0.3])    # pi coefficients
true_delta = np.array([0.3, 0.5, -0.3, -0.2, 0.2])   # theta coefficients
true_phi = 5.0                                          # Beta precision

# Per-loan probability of full repayment (logistic link)
logit_pi = true_psi[0] + X @ true_psi[1:]
pi_true = 1 / (1 + np.exp(-logit_pi))

# Per-loan mean partial repayment (logistic link)
logit_theta = true_delta[0] + X @ true_delta[1:]
theta_true = 1 / (1 + np.exp(-logit_theta))

# Beta shape parameters from mean-precision
alpha_true = theta_true * true_phi
beta_true = (1 - theta_true) * true_phi

# Sample from the OIB mixture
u = np.random.uniform(size=N)
repayment = np.where(u < pi_true, 1.0, np.random.beta(alpha_true, beta_true))

n_full = (repayment == 1.0).sum()
print(f"Fully repaid: {n_full}/{N} ({n_full/N:.1%})")
print(f"Partial repayment: {N - n_full}/{N} ({(N - n_full)/N:.1%})")

Figure: Histogram of loan repayment fractions, with a tall spike at 1.0 for the 1,214 fully repaid loans and a smooth Beta-shaped distribution for the 786 partial repayments between 0 and 1.

Of 2,000 loans, 1,214 (60.7%) are fully repaid and 786 (39.3%) show partial repayment. The histogram immediately reveals the two populations: a tall spike at 1.0 and a continuous spread below it. No single standard distribution can capture both. Now let's build the OIB model.

# Split observations by type
full_idx = np.where(repayment == 1.0)[0]
partial_idx = np.where(repayment < 1.0)[0]
partial_values = repayment[partial_idx]

with pm.Model() as oib_model:
    # Pi sub-model: probability of full repayment (logistic link)
    psi_intercept = pm.Normal('psi_intercept', mu=0, sigma=5)
    psi_coeffs = pm.Normal('psi_coeffs', mu=0, sigma=1, shape=4)
    logit_pi = psi_intercept + pt.dot(X, psi_coeffs)
    pi = pm.Deterministic('pi', pm.math.invlogit(logit_pi))

    # Theta sub-model: mean of partial repayment Beta (logistic link)
    delta_intercept = pm.Normal('delta_intercept', mu=0, sigma=5)
    delta_coeffs = pm.Normal('delta_coeffs', mu=0, sigma=1, shape=4)
    logit_theta = delta_intercept + pt.dot(X, delta_coeffs)
    theta = pm.Deterministic('theta', pm.math.invlogit(logit_theta))

    # Phi: Beta precision (shared across all loans)
    phi = pm.Gamma('phi', alpha=2, beta=0.5)

    # Convert mean-precision to standard Beta parameters
    a = theta * phi
    b = (1 - theta) * phi

    # Expected repayment: E[Y] = pi + (1 - pi) * theta
    E_f = pm.Deterministic('E_f', pi + (1 - pi) * theta)

    # --- Piecewise log-likelihood via pm.Potential ---
    # Fully repaid loans: log(pi_i)
    pm.Potential('ll_full', pt.sum(pt.log(pi[full_idx])))

    # Partial repayments: log(1 - pi_i) + log Beta(y_i | a_i, b_i)
    pa, pb = a[partial_idx], b[partial_idx]
    beta_logp = (pt.gammaln(pa + pb) - pt.gammaln(pa) - pt.gammaln(pb)
                 + (pa - 1) * pt.log(partial_values)
                 + (pb - 1) * pt.log(1 - partial_values))
    pm.Potential('ll_partial', pt.sum(pt.log(1 - pi[partial_idx]) + beta_logp))

with oib_model:
    trace = pm.sample(draws=1000, tune=3000, chains=4,
                      target_accept=0.95, random_seed=42,
                      init='jitter+adapt_diag')

print(az.summary(trace, var_names=['psi_intercept', 'psi_coeffs',
                                    'delta_intercept', 'delta_coeffs', 'phi']))

Figure: Trace plots for the OIB model, showing posterior distributions and MCMC chains for psi_intercept, psi_coeffs, delta_intercept, delta_coeffs, and phi, all exhibiting good mixing and convergence with zero divergences.

The trace plots show healthy chains: zero divergences, good mixing across all four chains, and unimodal posteriors centred near the true parameter values. Drawing 1,000 posterior samples per chain (4,000 in total) after 3,000 tuning steps took about 6 seconds with the Potential-based likelihood.

You just fitted a custom Bayesian mixture model with 11 free parameters. Now let's understand how each piece works.

What Just Happened?

Two populations, one model

Our data contains two distinct groups. Some borrowers repay their loan in full (repayment fraction = 1.0), and others repay partially (0 < fraction < 1). The OIB model treats this as a mixture: with probability $\pi_i$ the outcome is exactly 1, and with probability $1 - \pi_i$ it follows a Beta distribution.

Both $\pi_i$ and the Beta mean $\theta_i$ vary across borrowers. A high credit score might increase both the chance of full repayment and the expected partial repayment. The model captures these relationships through separate linear predictors with logistic links, ensuring both quantities stay between 0 and 1.

The piecewise log-likelihood

The OIB density is a mixture of a point mass and a continuous distribution. For observation $y_i$:

$$f(y_i \mid \pi_i, \alpha_i, \beta_i) = \begin{cases} \pi_i & \text{if } y_i = 1 \\ (1 - \pi_i)\,\text{Beta}(y_i \mid \alpha_i, \beta_i) & \text{if } 0 < y_i < 1 \end{cases}$$

Taking logs:

$$\log f(y_i) = \begin{cases} \log \pi_i & \text{if } y_i = 1 \\ \log(1 - \pi_i) + \log \text{Beta}(y_i \mid \alpha_i, \beta_i) & \text{if } 0 < y_i < 1 \end{cases}$$

The addition in the second branch is critical: it corresponds to multiplying the mixing weight $(1 - \pi_i)$ by the Beta density in probability space. A common mistake is to write multiplication of two log quantities (i.e. log(1-pi) * log(Beta(...))) instead of addition. That would have no probabilistic interpretation.

We implement this by splitting observations into two groups and adding each group's log-likelihood as a separate pm.Potential:

# Fully repaid: sum of log(pi_i) over fully-repaid loans
pm.Potential('ll_full', pt.sum(pt.log(pi[full_idx])))

# Partial: sum of log(1 - pi_i) + Beta_logpdf(y_i) over partial loans
pm.Potential('ll_partial', pt.sum(pt.log(1 - pi[partial_idx]) + beta_logp))

This pattern should feel familiar from the survival analysis post, where we used pm.Potential to handle right-censored observations. The principle is the same: when your likelihood has distinct branches for different observation types, split them into separate Potential terms.

Hand-coding the Beta log-density

Rather than relying on pm.logp(pm.Beta.dist(...), value), we compute the Beta log-density directly using the gamma function:

beta_logp = (pt.gammaln(a + b) - pt.gammaln(a) - pt.gammaln(b)
             + (a - 1) * pt.log(y) + (b - 1) * pt.log(1 - y))

This follows from the Beta density formula $f(y \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} y^{\alpha-1}(1-y)^{\beta-1}$. Writing it out explicitly has two advantages: you can see exactly what the sampler is differentiating through, and you avoid potential issues with PyMC's internal distribution objects when used inside Potential expressions.
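
If you want to convince yourself the expression is correct, a quick comparison against SciPy's reference implementation (a standalone check, not part of the model) takes a few lines:

import numpy as np
from scipy.special import gammaln
from scipy.stats import beta as beta_dist

# Evaluate the hand-coded Beta log-density at an arbitrary point and compare
y, a, b = 0.73, 2.8, 2.1
manual = (gammaln(a + b) - gammaln(a) - gammaln(b)
          + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))
print(np.isclose(manual, beta_dist.logpdf(y, a, b)))  # True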

The model structure

The model has three sub-components connected by link functions:

Pi sub-model controls which mixture component generates each observation. A logistic link maps the linear predictor $\psi_0 + \psi_1 x_{\text{credit}} + \psi_2 x_{\text{ltv}} + \psi_3 x_{\text{rate}} + \psi_4 x_{\text{income}}$ to a probability. Positive $\psi_1$ means higher credit scores increase the chance of full repayment.

Theta sub-model sets the mean of the Beta distribution for partial repayments, also through a logistic link with its own coefficients $\delta_0, \ldots, \delta_4$. This captures a subtlety that pure classification misses: among borrowers who do not fully repay, some covariates still push the partial fraction higher.

Phi is a single shared precision parameter for the Beta component. Higher phi means less variance in partial repayments. It uses a $\text{Gamma}(2, 0.5)$ prior with mean 4, which favours moderate precision values.
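
If the implied scale of that prior is unclear, a quick SciPy check helps (illustrative only; PyMC's Gamma takes beta as a rate, so the equivalent scale is 1 / beta):

from scipy import stats

# Gamma(alpha=2, beta=0.5) in PyMC's rate parameterisation => scale = 1 / 0.5 = 2
phi_prior = stats.gamma(a=2, scale=1 / 0.5)
print(phi_prior.mean(), phi_prior.std())  # 4.0 and roughly 2.83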

Checking the fit

Let's compare the estimated coefficients to the true values we used to generate the data.

summary = az.summary(trace, var_names=['psi_intercept', 'psi_coeffs',
                                        'delta_intercept', 'delta_coeffs', 'phi'])

true_vals = np.concatenate([true_psi, true_delta, [true_phi]])
param_names = (['psi_intercept'] + [f'psi_coeffs[{i}]' for i in range(4)] +
               ['delta_intercept'] + [f'delta_coeffs[{i}]' for i in range(4)] + ['phi'])

fig, ax = plt.subplots(figsize=(8, 6))
y_pos = np.arange(len(param_names))
means = summary['mean'].values
hdi_low = summary['hdi_3%'].values
hdi_high = summary['hdi_97%'].values

ax.errorbar(means, y_pos, xerr=[means - hdi_low, hdi_high - means],
            fmt='o', color='steelblue', capsize=4, label='Posterior (94% HDI)')
ax.scatter(true_vals, y_pos, marker='x', color='crimson', s=80,
           zorder=5, label='True value')
ax.set_yticks(y_pos)
ax.set_yticklabels(param_names)
ax.axvline(0, color='gray', linestyle='--', alpha=0.5)
ax.set_xlabel('Value')
ax.legend(loc='lower right')
ax.set_title('Parameter Recovery: Posterior vs True Values')
plt.tight_layout()

Figure: Forest plot of posterior means with 94% HDI intervals for all 11 model parameters alongside the true values used for data generation, demonstrating accurate parameter recovery across both the pi and theta sub-models.

Every true value falls within its 94% highest density interval. The model correctly identifies that credit score has the strongest positive effect on full repayment (psi_coeffs[0] = 0.85, true: 0.8), while loan-to-value ratio is the strongest negative predictor (psi_coeffs[1] = -0.58, true: -0.6). The precision parameter phi is recovered at 5.47 (true: 5.0), and the effective sample sizes all exceed 2,500.

Posterior predictive check

The ultimate test: can the model reproduce the observed data distribution, including the spike at 1.0? Since we used pm.Potential rather than an observed distribution, we generate predictive samples manually from the posterior:

# Extract posterior samples
psi_int_post = trace.posterior['psi_intercept'].values.flatten()
psi_coeff_post = trace.posterior['psi_coeffs'].values.reshape(-1, 4)
delta_int_post = trace.posterior['delta_intercept'].values.flatten()
delta_coeff_post = trace.posterior['delta_coeffs'].values.reshape(-1, 4)
phi_post = trace.posterior['phi'].values.flatten()

rng = np.random.default_rng(42)
n_draws = 500
ppc_samples = np.zeros((n_draws, N))

for i in range(n_draws):
    lp = psi_int_post[i] + X @ psi_coeff_post[i]
    pi_i = 1 / (1 + np.exp(-lp))
    lt = delta_int_post[i] + X @ delta_coeff_post[i]
    theta_i = 1 / (1 + np.exp(-lt))
    a_i, b_i = theta_i * phi_post[i], (1 - theta_i) * phi_post[i]
    u_i = rng.uniform(size=N)
    ppc_samples[i] = np.where(u_i < pi_i, 1.0, rng.beta(a_i, b_i))

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(repayment, bins=50, density=True, alpha=0.5, color='steelblue',
        label='Observed', edgecolor='white')
ax.hist(ppc_samples.flatten(), bins=50, density=True, alpha=0.4, color='coral',
        label='Posterior predictive', edgecolor='white')
ax.set_xlabel('Repayment Fraction')
ax.set_ylabel('Density')
ax.legend()
ax.set_title('Posterior Predictive Check')
plt.tight_layout()

Figure: Two-panel posterior predictive check. Left: observed vs predicted proportion of fully-repaid loans (both around 60.7%). Right: observed vs predicted density of partial repayments, showing that the Beta component accurately captures the continuous distribution shape.

The posterior predictive distribution matches both the spike at 1.0 and the shape of the partial repayment component. This is something neither pure Beta regression nor logistic regression can achieve.

Going Deeper

The mean-precision parameterisation

The standard Beta distribution uses shape parameters $\alpha$ and $\beta$, but these are difficult to interpret. A borrower with $\alpha = 2.8$ and $\beta = 2.1$ tells you almost nothing at a glance. The mean-precision reparameterisation solves this:

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \phi = \alpha + \beta$$

Now $\mu$ is the mean of the distribution (the expected partial repayment fraction) and $\phi$ is the precision (higher means less spread). The inverse mapping recovers the standard parameters:

$$\alpha = \mu\,\phi, \qquad \beta = (1 - \mu)\,\phi$$

In our model, $\mu$ is called $\theta$ and depends on covariates through a logistic link. The precision $\phi$ is shared across all observations, so each borrower's partial repayment has variance $\theta_i (1 - \theta_i) / (1 + \phi)$ with a single common $\phi$. This is a simplification; a fully heteroscedastic model would give $\phi$ its own linear predictor.
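
A small numerical check (illustrative, not part of the model) confirms that $\text{Beta}(\mu\phi, (1-\mu)\phi)$ has mean $\mu$ and variance $\mu(1-\mu)/(1+\phi)$:

from scipy import stats

mu, phi = 0.6, 5.0
a, b = mu * phi, (1 - mu) * phi
d = stats.beta(a, b)
print(d.mean())                             # 0.6
print(d.var(), mu * (1 - mu) / (1 + phi))   # both 0.04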

Why logistic links?

Both $\pi$ (probability of full repayment) and $\theta$ (mean of the Beta) must live in (0, 1). The logistic function $\sigma(x) = 1 / (1 + e^{-x})$ maps any real-valued linear predictor to this interval. This is the same link function used in logistic regression and Bayesian classification.

The priors reflect the link: $\text{Normal}(0, 5)$ on the intercepts allows the baseline probability to range widely, while $\text{Normal}(0, 1)$ on the slope coefficients gently regularises each covariate's effect. On the logistic scale, a coefficient of 1.0 multiplies the odds by $e \approx 2.7$ per unit increase in the covariate (a coefficient of roughly 0.7 doubles them), so a $\text{Normal}(0, 1)$ prior is mildly informative.
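
To make that prior scale concrete, here is a small illustration (not part of the model) of how slope values translate into odds multipliers under the logistic link:

import numpy as np

# Each coefficient value corresponds to a multiplicative change in the odds
for coef in [0.0, 0.3, 0.7, 1.0, 2.0]:
    print(f"slope {coef:+.1f} -> odds multiplied by {np.exp(coef):.2f} per unit increase")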

The expected value formula

The overall expected repayment for borrower $i$ combines both components:

$$\mathbb{E}[y_i] = \pi_i + (1 - \pi_i)\,\theta_i$$

This is the E_f deterministic in our model. It allows you to rank borrowers by expected repayment even when their risk profiles differ in how they fail: one borrower might have a high chance of full repayment but low partial repayment if they default, while another has a moderate chance of full repayment but high partial recovery.
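
Because E_f is stored as a pm.Deterministic, ranking borrowers by expected repayment takes only a couple of lines (this snippet assumes the trace from the fit above):

# Posterior mean of E[Y] per borrower, then the ten highest expected repayments
E_f_mean = trace.posterior['E_f'].mean(dim=('chain', 'draw')).values
top10 = np.argsort(E_f_mean)[::-1][:10]
print(top10, E_f_mean[top10].round(3))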

Why pm.Potential and not pm.CustomDist?

PyMC offers two ways to implement custom likelihoods. pm.CustomDist lets you define a distribution from its logp function, which would look like this for OIB:

def oib_logp(value, pi, a, b):
    return pt.switch(
        pt.eq(value, 1.0),
        pt.log(pi),
        pt.log(1 - pi) + pm.logp(pm.Beta.dist(alpha=a, beta=b), value)
    )

This is elegant but fragile. The pt.switch operator evaluates both branches for every observation during automatic differentiation.

When value = 1.0, the Beta branch computes pm.logp(Beta, 1.0), which returns negative infinity because $y = 1$ lies outside the Beta distribution's open support. Even though the switch selects the other branch, the infinite value propagates through the gradient and corrupts the NUTS sampler. The result: a 100% divergence rate.
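
If you do want to keep the pt.switch formulation, a common workaround (shown here as a sketch, not what we use in this post) is to hand the Beta branch a clipped copy of the value so its log-density, and hence its gradient, stays finite even for fully repaid loans:

def oib_logp_safe(value, pi, a, b):
    eps = 1e-6
    # The Beta branch never sees y = 1; the switch still returns log(pi)
    # for fully repaid loans, but the unselected branch now has a finite gradient.
    y_safe = pt.clip(value, eps, 1 - eps)
    return pt.switch(
        pt.eq(value, 1.0),
        pt.log(pi),
        pt.log(1 - pi) + pm.logp(pm.Beta.dist(alpha=a, beta=b), y_safe)
    )

We prefer the pm.Potential split because it avoids the problem by construction rather than by numerical patching.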

Figure: Diagram of the One-Inflated Beta model, showing covariates feeding into two parallel sub-models, a logistic regression for pi (full repayment probability) and a logistic regression for theta (partial repayment mean), which combine through a piecewise likelihood with a shared precision parameter phi.

The pm.Potential approach avoids this entirely. By pre-splitting observations into fully-repaid and partial groups, the Beta density is never evaluated at the boundary. This is the same pattern we used for censored data in survival analysis: separate the observation types, compute each group's log-likelihood independently, and add them as Potential terms.

The trade-off is that pm.Potential does not enable pm.sample_posterior_predictive out of the box (you need to write manual prediction code, as we did). For many production workflows, that is a minor inconvenience compared to the reliability gain.

Sampling considerations

Our sampling configuration follows the original code that inspired this tutorial:

  • 3,000 tuning steps with 1,000 posterior draws per chain. The long warm-up helps the NUTS sampler adapt its step size to the geometry of the piecewise likelihood.
  • 4 chains for convergence diagnostics. With $\hat{R}$ and effective sample size, four chains provide reliable evidence that the sampler has explored the full posterior.
  • target_accept=0.95 raises the acceptance threshold from the default 0.8, which reduces divergences in models with sharp likelihood boundaries.
  • init='jitter+adapt_diag' initialises each chain near the prior mean with small random perturbations. A practical note from the original code: if covariates have very different scales (e.g., one ranges from 0 to 1 while another ranges from 0 to 200), the default jitter of roughly $\pm 1$ can push initial coefficient values far from reasonable territory. Standardising covariates beforehand, as we did, avoids this (see the one-liner after this list).
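
The standardisation itself is a one-liner; here X_raw stands in for whatever raw covariate matrix you have (our synthetic X was already drawn on a standard-normal scale):

# Centre and scale each covariate column before building the design matrix
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)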

When to use something else

The OIB model assumes that exactly-one observations arise from a fundamentally different process than partial observations. If instead you have data with a spike at zero (e.g., insurance claims where most customers file nothing), you want a zero-inflated model. If you have spikes at both boundaries, you need a zero-and-one-inflated Beta (ZOIB).

For data with no boundary spikes at all, standard Beta regression (via MLE or Bayesian inference) is simpler and sufficient. The extra complexity of the OIB mixture is only justified when the data genuinely contains a discrete mass at the boundary.
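
To illustrate how little the Potential pattern changes for ZOIB, here is a sketch (hypothetical names: p0 and p1 are per-loan probabilities of exactly-zero and exactly-one outcomes, obtained for example from a softmax over three linear predictors so that p0 + p1 < 1, and beta_logp_mid is the Beta log-density of the interior observations):

# Split observations into three groups instead of two
zero_idx = np.where(repayment == 0.0)[0]
one_idx = np.where(repayment == 1.0)[0]
mid_idx = np.where((repayment > 0.0) & (repayment < 1.0))[0]

# One Potential per branch of the three-part piecewise likelihood
# (p0, p1, beta_logp_mid are hypothetical and would be defined as in the OIB model above)
pm.Potential('ll_zero', pt.sum(pt.log(p0[zero_idx])))
pm.Potential('ll_one', pt.sum(pt.log(p1[one_idx])))
pm.Potential('ll_mid', pt.sum(pt.log(1 - p0[mid_idx] - p1[mid_idx]) + beta_logp_mid))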

Where This Comes From

The OIB model sits at the intersection of two lines of research: Beta regression for bounded continuous data, and inflated distributions for boundary spikes.

Beta regression: Ferrari and Cribari-Neto (2004)

The foundation is the Beta regression model introduced by Silvia Ferrari and Francisco Cribari-Neto in their 2004 paper "Beta Regression for Modelling Rates and Proportions" (Journal of Applied Statistics, 31(7), 799-815). They observed that rates, proportions, and fractions appear everywhere in applied statistics, yet researchers typically transform them (logit, arcsine) and apply linear regression. This is problematic because the transformation distorts the error structure and complicates interpretation.

Their key insight was to model the response directly as Beta-distributed, using the mean-precision parameterisation we adopted:

$$f(y \mid \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1-\mu)\phi)}\, y^{\mu\phi - 1} (1 - y)^{(1-\mu)\phi - 1}$$

where $0 < y < 1$, $0 < \mu < 1$ is the mean, and $\phi > 0$ is the precision. Parameterised this way, the model supports a regression structure analogous to generalised linear models, and Ferrari and Cribari-Neto proposed a logit link for the mean:

"The proposed model is useful for situations where the variable of interest is continuous and restricted to the interval (0, 1). [...] A convenient parameterisation of the beta density in terms of the mean and a precision parameter is used."

Their framework supports maximum likelihood estimation, but the Bayesian extension (which we use) adds uncertainty quantification and regularisation through priors. The connection to MLE is direct: the posterior mode of our model under flat priors coincides with the corresponding maximum likelihood estimate.

Inflated models: Ospina and Ferrari (2010)

The standard Beta has support on the open interval (0, 1), so it cannot assign positive probability to the boundaries 0 or 1. Raydonal Ospina and Silvia Ferrari addressed this in "Inflated Beta Distributions" (Statistical Papers, 51(1), 111-126, 2010). They defined a class of mixed continuous-discrete distributions:

$$f(y \mid \pi, \mu, \phi) = \begin{cases} \pi & \text{if } y = c \\ (1 - \pi)\, f_{\text{Beta}}(y \mid \mu, \phi) & \text{if } 0 < y < 1 \end{cases}, \qquad c \in \{0, 1\}$$

This is exactly the piecewise density we implemented with pm.Potential. The parameter $\pi$ controls the inflation: the probability of observing the boundary value. Ospina and Ferrari also developed zero-inflated and zero-and-one-inflated variants for different boundary patterns.

"In many practical situations, the variable of interest is continuous in the open standard unit interval but may also assume the extreme values zero and/or one with positive probabilities. [...] We introduce a class of inflated beta distributions."

Their work established the theoretical properties (moments, maximum likelihood estimation, score functions) that underpin our Bayesian implementation.

From MLE to MCMC

The original MLE approach estimates $\pi$, $\mu$, and $\phi$ by maximising the log-likelihood. The Bayesian version replaces optimisation with MCMC sampling, yielding full posterior distributions rather than point estimates. This is particularly valuable for the OIB model because the piecewise likelihood creates a posterior geometry that point estimates cannot capture: the uncertainty in $\pi$ and $\theta$ is correlated, and the posterior for $\phi$ is often skewed.

Where Ferrari and Cribari-Neto derived score functions by hand, we supply the log-density components to PyMC and let the NUTS sampler handle the rest. The automatic differentiation in PyTensor computes gradients through the gammaln and log operations, enabling efficient Hamiltonian Monte Carlo.

Algorithm summary

The complete OIB regression procedure:

  1. For each observation $i$, compute $\pi_i = \sigma(\psi_0 + \mathbf{x}_i^\top \boldsymbol{\psi})$ (full repayment probability)
  2. Compute $\theta_i = \sigma(\delta_0 + \mathbf{x}_i^\top \boldsymbol{\delta})$ (partial repayment mean)
  3. Compute Beta shape parameters: $\alpha_i = \theta_i \phi$, $\beta_i = (1 - \theta_i) \phi$
  4. Evaluate the piecewise log-likelihood: $\log \pi_i$ if $y_i = 1$, else $\log(1 - \pi_i) + \log \text{Beta}(y_i \mid \alpha_i, \beta_i)$
  5. Sum across all observations and sample the posterior via NUTS
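
For readers who want the classical counterpart, steps 1 to 4 condense into a compact NumPy negative log-likelihood that could be handed to scipy.optimize.minimize; this is a sketch with hypothetical names, not code from the notebook:

import numpy as np
from scipy.special import gammaln, expit
from scipy.optimize import minimize

def oib_negloglik(params, X, y):
    k = X.shape[1]
    psi, delta, log_phi = params[:k + 1], params[k + 1:2 * k + 2], params[-1]
    phi = np.exp(log_phi)                      # keep the precision positive
    pi = expit(psi[0] + X @ psi[1:])           # step 1: full-repayment probability
    theta = expit(delta[0] + X @ delta[1:])    # step 2: partial-repayment mean
    a, b = theta * phi, (1 - theta) * phi      # step 3: Beta shape parameters
    full = y == 1.0                            # step 4: piecewise log-likelihood
    ll = np.log(pi[full]).sum()
    yp, ap, bp = y[~full], a[~full], b[~full]
    ll += (np.log(1 - pi[~full]) + gammaln(ap + bp) - gammaln(ap) - gammaln(bp)
           + (ap - 1) * np.log(yp) + (bp - 1) * np.log(1 - yp)).sum()
    return -ll

# Example: mle = minimize(oib_negloglik, np.zeros(11), args=(X, repayment), method='L-BFGS-B')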

Further reading

  • The Beta regression paper: Ferrari, S. and Cribari-Neto, F. (2004). "Beta Regression for Modelling Rates and Proportions." Journal of Applied Statistics, 31(7), 799-815.
  • Inflated distributions: Ospina, R. and Ferrari, S. (2010). "Inflated Beta Distributions." Statistical Papers, 51(1), 111-126.
  • The PyMC CustomDist guide: PyMC documentation on custom distributions
  • Previous in this series: Start with Hierarchical Bayesian Regression, then Bayesian Survival Analysis for the progression from built-in distributions to custom likelihoods.

Frequently Asked Questions

When should I use a One-Inflated Beta model instead of logistic regression?

Use OIB when your outcome is a fraction between 0 and 1 with a spike at the boundary value of 1. Logistic regression discards the partial repayment information by collapsing everything into a binary label. OIB preserves both the probability of full repayment and the distribution of partial repayments, giving you richer predictions.

Why use pm.Potential instead of pm.CustomDist for the likelihood?

The pm.CustomDist approach evaluates both branches of the piecewise likelihood for every observation during automatic differentiation. When the Beta density is evaluated at the boundary value of 1.0, it returns negative infinity, which corrupts the NUTS sampler gradients and causes 100% divergences. Splitting observations with pm.Potential avoids evaluating the Beta density at the boundary entirely.

What is the mean-precision parameterisation of the Beta distribution?

Instead of the standard shape parameters alpha and beta, the mean-precision form uses mu (the mean, between 0 and 1) and phi (the precision, controlling spread). This is more interpretable: mu directly tells you the expected partial repayment fraction, while phi tells you how concentrated the distribution is around that mean. The standard parameters are recovered as alpha = mu * phi and beta = (1 - mu) * phi.

How do I check whether the OIB model fits my data well?

Generate posterior predictive samples by drawing from the fitted model and comparing the resulting distribution to the observed data. The key check is whether the model reproduces both the spike at 1.0 (the proportion of fully repaid loans) and the shape of the continuous partial repayment distribution. If either component is mismatched, the model needs adjustment.

Can this model handle spikes at both 0 and 1?

Yes, but you would need a Zero-and-One-Inflated Beta (ZOIB) model. This adds a third mixture component for the spike at zero, with its own probability parameter. The piecewise likelihood gains a third branch, but the pm.Potential implementation pattern remains the same: split observations into three groups and add each group's log-likelihood separately.
