This was a fun one!
I've been trying to find a distribution that described HbA1c in the general population. After a few hours of digging, the best I could find was this research from NIH. In this article, all we are given is a few quantiles.
| 5% | 10% | 25% | 50% | 75% | 90% | 95% | 
|---|---|---|---|---|---|---|
| 4.7 | 4.8 | 5.1 | 5.4 | 5.7 | 6.1 | 6.8 | 
I want to try to fit a continuous distribution to these points. Simulated annealing lets us search the problem space for a global optimum. I know from the text of the research that the data is skewed (mean = 5.6, mode = 5.3). We need a distribution that allows for skewness. Let’s go with the gamma distribution.
The first thing we need to do is create a data frame of our known data set.
df <- data.frame(quantile = c(0.05, 0.1, 0.25, 0.50, 0.75, 0.9, 0.95),
                 value = c(4.7, 4.8, 5.1, 5.4, 5.7, 6.1, 6.8))
Next, we need to write a function that we want our SA algorithm to minimize.
fun <- function (x) {
    shape <- x[1]
    scale <- x[2]
    df$est <- qgamma(df$quantile, shape, scale)
    df$sq_err <- (df$est - df$value) ^ 2
    error <- sum(df$sq_err)
    error
}
We will use the GenSA package for the simulated annealing.
library(GenSA)
lower_bound <- 1e-6
upper_bound <- 100
fit <- GenSA(par = c(1, 1), 
             fn = fun, 
             lower = c(lower_bound, lower_bound), 
             upper = c(upper_bound, upper_bound))
This gives us gamma distribution parameters of shape = 89.4837 and scale = 16.2514. The SSE for this fit is 0.1747.
Lets see how this looks when plotted against the original data.
library(ggplot2)
df2 <- data.frame(x = seq(from = 3.0, to = 10.0, by = 0.01))
df2$y = pgamma(df2$x, fit$par[1], fit$par[2])
ggplot() +
    geom_line(aes(x, y), df2, color = "red") +
    geom_point(aes(value, quantile), df, size = 2, color = "blue")
There you go! If you only have a few data points and their quantiles, you can make a guess about their distribution. You can try this with multiple distributions and compare their resulting SSEs.
              
    
Top comments (0)