DEV Community

Cover image for Simulated Annealing for Continuous Distribution Parameter Estimation in R
Jarrett Meyer
Jarrett Meyer

Posted on

3 3

Simulated Annealing for Continuous Distribution Parameter Estimation in R

This was a fun one!

I've been trying to find a distribution that described HbA1c in the general population. After a few hours of digging, the best I could find was this research from NIH. In this article, all we are given is a few quantiles.

5% 10% 25% 50% 75% 90% 95%
4.7 4.8 5.1 5.4 5.7 6.1 6.8

I want to try to fit a continuous distribution to these points. Simulated annealing lets us search the problem space for a global optimum. I know from the text of the research that the data is skewed (mean = 5.6, mode = 5.3). We need a distribution that allows for skewness. Let’s go with the gamma distribution.

The first thing we need to do is create a data frame of our known data set.

df <- data.frame(quantile = c(0.05, 0.1, 0.25, 0.50, 0.75, 0.9, 0.95),
                 value = c(4.7, 4.8, 5.1, 5.4, 5.7, 6.1, 6.8))
Enter fullscreen mode Exit fullscreen mode

Next, we need to write a function that we want our SA algorithm to minimize.

fun <- function (x) {
    shape <- x[1]
    scale <- x[2]
    df$est <- qgamma(df$quantile, shape, scale)
    df$sq_err <- (df$est - df$value) ^ 2
    error <- sum(df$sq_err)
    error
}
Enter fullscreen mode Exit fullscreen mode

We will use the GenSA package for the simulated annealing.

library(GenSA)

lower_bound <- 1e-6
upper_bound <- 100

fit <- GenSA(par = c(1, 1), 
             fn = fun, 
             lower = c(lower_bound, lower_bound), 
             upper = c(upper_bound, upper_bound))
Enter fullscreen mode Exit fullscreen mode

This gives us gamma distribution parameters of shape = 89.4837 and scale = 16.2514. The SSE for this fit is 0.1747.

Lets see how this looks when plotted against the original data.

library(ggplot2)

df2 <- data.frame(x = seq(from = 3.0, to = 10.0, by = 0.01))
df2$y = pgamma(df2$x, fit$par[1], fit$par[2])

ggplot() +
    geom_line(aes(x, y), df2, color = "red") +
    geom_point(aes(value, quantile), df, size = 2, color = "blue")
Enter fullscreen mode Exit fullscreen mode

Simulated data vs fit

There you go! If you only have a few data points and their quantiles, you can make a guess about their distribution. You can try this with multiple distributions and compare their resulting SSEs.

AWS GenAI LIVE image

Real challenges. Real solutions. Real talk.

From technical discussions to philosophical debates, AWS and AWS Partners examine the impact and evolution of gen AI.

Learn more

Top comments (0)

Billboard image

Create up to 10 Postgres Databases on Neon's free plan.

If you're starting a new project, Neon has got your databases covered. No credit cards. No trials. No getting in your way.

Try Neon for Free →

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay