Momchil Atanasov

Posted on Mar 15

Implementing a Dynamic Compressor

#algorithms #gamedev #go #programming

Background

As part of my hobby game engine "lacking", I am implementing an audio mixer in Go for native builds. It is based on miniaudio but uses its own node-based mixing logic. In many ways, I am trying to keep the API and behavior as close as possible to the Web Audio API, since my engine supports WebAssembly builds where Web Audio API is already internally used.

While most node types have proven intuitive to implement, or are well covered by online resources, the Dynamic Compressor node turned out to be elusive. Unfortunately, the Web Audio API specification provides limited information on the matter: https://webaudio.github.io/web-audio-api/#DynamicsCompressorOptions-processing

The goal of this article is to share what I have learned in the process and hopefully help someone else that might be stumbling with the same stuff. That said, keep in mind that I’m a complete beginner in the audio processing space, so take this article with a grain of salt.

Side note: Using AI can get you 90% along the way, or maybe even 100% if you know the subject and are able to guide it along the way. In fact, I did use the help of AI to figure out a lot of the stuff here. Regardless, I prefer to understand what I am implementing as much as possible (except for Quaternions - still can't create a visual understanding in my head), hence this article.

Algorithm

The main goal of the Dynamic Compressor is to reduce the output volume if the input signal consistently passes a certain threshold.

It appears that there are a few ways to implement a compressor, but the general flow is as follows:

1. Consume audio frames (PCM frames)
2. Determine input power (in dB units)
3. Determine target compression amount (in dB units)
4. Calculate desired audio gain (in ratio units)
5. Gradually move current gain towards desired gain
6. Apply gain to input frame and pass to output

(Step 1) Consume audio frames

In my particular case, there is nothing fancy about this. Input PCM frames arrive in chunks, and there is a for loop in the handler function that processes each frame in turn.

func (n *CompressorNode) Process(ctx ProcessContext, inputFrames, outputFrames FrameList) {
  for i, frame := range inputFrames {
    // Steps 2 to 6 happen here.
  }
}

(Step 2) Determine input power

Here we need to determine how loud the input is. Some implementations use RMS (root mean square) for this which, from what I understand, requires tracking multiple consecutive samples.
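For comparison, a windowed RMS detector could be sketched roughly as follows. This is not part of the compressor built in this article; the rmsDetector type, its Push method and the window size are hypothetical names chosen only for illustration.

```go
package main

import (
    "fmt"
    "math"
)

// rmsDetector keeps a running sum of squared samples over a fixed window.
// Hypothetical helper, not taken from the article's engine.
type rmsDetector struct {
    window []float64 // squared samples, used as a ring buffer
    index  int       // position of the oldest entry
    sum    float64   // running sum of the values in window
}

func newRMSDetector(size int) *rmsDetector {
    return &rmsDetector{window: make([]float64, size)}
}

// Push adds a sample and returns the RMS over the last len(window) samples.
func (d *rmsDetector) Push(sample float64) float64 {
    sq := sample * sample
    d.sum += sq - d.window[d.index] // replace the oldest squared sample
    d.window[d.index] = sq
    d.index = (d.index + 1) % len(d.window)
    // Guard against tiny negative sums caused by floating-point drift.
    return math.Sqrt(math.Max(d.sum, 0.0) / float64(len(d.window)))
}

func main() {
    d := newRMSDetector(4)
    var rms float64
    for _, s := range []float64{0.5, -0.5, 0.5, -0.5} {
        rms = d.Push(s)
    }
    fmt.Printf("rms = %.3f\n", rms) // prints "rms = 0.500"
}
```

The extra state (a buffer and a running sum) is the price for a loudness estimate that does not jump with every individual sample.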

For this initial implementation, I am just basing it on the value of the current input frame.

peak := max(sprec.Abs(frame.Left), sprec.Abs(frame.Right))
peakDB := audio.GainToDB(max(1.0e-8, peak)) // avoid log of zero

The equation for converting audio amplitude to dB is as follows:

db(x) = 20\log_{10}(x)

This is exactly what happens in the audio.GainToDB function.
Since log is not defined for 0, a max function with a small lower boundary is used to prevent -Inf values.
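A minimal sketch of such a conversion is shown below. The gainToDB function is a stand-in written for this example; the actual audio.GainToDB source is not shown in this article, so treat its exact shape as an assumption.

```go
package main

import (
    "fmt"
    "math"
)

// gainToDB converts a linear amplitude to decibels: 20 * log10(x).
// Stand-in for audio.GainToDB (assumed to behave the same way).
func gainToDB(gain float64) float64 {
    return 20.0 * math.Log10(gain)
}

func main() {
    fmt.Printf("%.2f dB\n", gainToDB(1.0)) // full scale maps to 0 dB
    fmt.Printf("%.2f dB\n", gainToDB(0.5)) // half amplitude is roughly -6 dB
    // Silence is clamped to 1.0e-8 first, which corresponds to -160 dB.
    fmt.Printf("%.2f dB\n", gainToDB(math.Max(1.0e-8, 0.0)))
}
```

With the clamp in place, the lowest level the detector can ever report is -160 dB, which is far below any sensible threshold.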

(Step 3) Determine target compression amount

This was the most problematic step for me. I could not figure out what compression function should be used and how this function relates to gain.

The reduction function

A compression function is often depicted like in the following image (it shows a hard compressor).

The X-axis represents the input dB, and the Y-axis represents the output dB that should ideally be produced in return. However, what we actually care about is how many dB the input signal loses due to compression. This is obtained by subtracting what would otherwise have been the original (uncompressed) output value from the function value, which yields zero or a negative number.

The following picture should make that clearer.

We care about the vertical stripes. To put that into mathematical terms, having our compression function f(x), we actually care about the reduction r(x) it produces.

r(x) = f(x) - i(x)

Here i(x) stands for the identity function. This is just the diagonal line that represents no compression (output equals input; Y equals X) and has the following form.

i(x) = x

All of this is very important. Once we figure out what the f(x) function is, we will actually need the r(x) function to use it in code, since it gives us the compression dB amount. And in many cases, the r(x) function actually turns out to be simpler and more pleasant to read.

The compression function

For now, we will stick with the hard compressor function. Once we get a grasp on things and cover all steps, we will rework it into a soft-compressor (at the end of the article).

As a reminder, here is what the function f(x) looks like for a hard compressor.

We can clearly see that initially the output signal equals the input signal. This is basically the identity function. After a certain input signal strength, the output signal is compressed.

The point at which this transition happens is called the threshold and is a configurable parameter on a Dynamic Compressor. Input frames that cross that threshold are expected to be compressed.

Since this function cannot be represented by a single math expression, we will need to split it into two parts (regions), determined by input ranges.

Non-compression region

This is the region before the threshold (denoted as t) is reached and can be expressed as:

x <= t

As mentioned above and as can be seen in the diagram, this region does not perform any compression and the f(x) function is just the identity function.

f(x) = i(x) = x

However, we actually care about the reduction in dB.

r(x) = f(x) - i(x) = i(x) - i(x) = 0

This makes sense: we expect 0 dB of reduction below the threshold (a change of 0 dB corresponds to a gain of 1.0, i.e. the signal passes through at its original volume).

Hard-compression region

This is the region after the threshold (denoted as t) and can be expressed as:

t < x

Here we need to perform compression, which is done through a scaling factor (q). This can be described with the following equation:

\frac{x - t}{f(x)-i(t)}=q

This might seem intimidating at first, but it actually represents the rate of ascent of the compressed output line:

The q constant determines how much the input moves along the x axis for a unit increase in the y axis. This ratio is normally configurable in a Dynamic Compressor.

If we consider that $i(t) = t$, we have all we need to derive f(x) by transforming the equation above.

q (f(x) - t) = x - t

q f(x) = x - t + q t

f(x) = \frac{x}{q} - \frac{t}{q} + t

f(x) = \frac{x}{q} - (\frac{t}{q} - t)

f(x) = \frac{x}{q} - (\frac{1}{q} - 1) t

We now have the compression function for the hard compression region. Just like with the non-compression region, we actually need to calculate the reduction function, since this is what we care about and what will be used in the code.

r(x) = f(x) - i(x) = \frac{x}{q} - (\frac{1}{q} - 1) t - x

r(x) = \frac{x}{q} - x - (\frac{1}{q} - 1) t

r(x) = (\frac{1}{q} - 1) x - (\frac{1}{q} - 1) t

r(x) = (\frac{1}{q} - 1) (x - t)

Using the reduction functions in code

Now that we have the reduction functions for both compression regions, we can put that into code.

var reductionDB float32
if peakDB <= threshold {
    reductionDB = 0.0
} else {
    reductionDB = ((1.0 / ratio) - 1.0) * (peakDB - threshold)
}
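To make the numbers more tangible, here is a small self-contained sketch that evaluates the hard-knee reduction for a few input levels. The function name hardReductionDB and the values of -24 dB for the threshold and 4 for the ratio are picked only for illustration.

```go
package main

import "fmt"

// hardReductionDB returns r(x) for a hard-knee compressor:
// 0 at or below the threshold, (1/q - 1)(x - t) above it.
func hardReductionDB(inputDB, thresholdDB, ratio float64) float64 {
    if inputDB <= thresholdDB {
        return 0.0
    }
    return (1.0/ratio - 1.0) * (inputDB - thresholdDB)
}

func main() {
    const thresholdDB, ratio = -24.0, 4.0 // illustrative values
    for _, x := range []float64{-30.0, -24.0, -12.0, 0.0} {
        r := hardReductionDB(x, thresholdDB, ratio)
        fmt.Printf("in %6.1f dB  reduction %6.1f dB  out %6.1f dB\n", x, r, x+r)
    }
}
```

For example, an input of -12 dB sits 12 dB above the threshold, so it is reduced by 0.75 * 12 = 9 dB and comes out at -21 dB, which is exactly 12 / 4 = 3 dB above the threshold.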

(Step 4) Calculate desired audio gain

This is fairly trivial. We need to convert the reductionDB value to an amplitude fraction value.

Keep in mind that, as long as the ratio is at least 1, our r(x) functions always produce values in the range 0.0 to -inf, which is ideal since this maps to the gain value range of 1.0 to 0.0, respectively.

The required transformation is the opposite to the one we used previously.

gain(x) = 10^{\frac{x}{20}}

Which can be put into code.

targetGain := audio.DBToGain(reductionDB)

(where DBToGain holds the implementation for the gain-from-dB function)
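A possible shape for that function is sketched below. The dbToGain name is a stand-in for audio.DBToGain, whose source is not shown here, so the exact implementation is an assumption.

```go
package main

import (
    "fmt"
    "math"
)

// dbToGain converts decibels to a linear gain factor: 10^(x/20).
// Stand-in for audio.DBToGain (assumed to behave the same way).
func dbToGain(db float64) float64 {
    return math.Pow(10.0, db/20.0)
}

func main() {
    fmt.Printf("%.3f\n", dbToGain(0.0))   // no reduction keeps the gain at 1
    fmt.Printf("%.3f\n", dbToGain(-6.0))  // roughly half the amplitude
    fmt.Printf("%.3f\n", dbToGain(-20.0)) // one tenth of the amplitude
}
```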

(Step 5) Gradually move current gain towards desired gain

So far, we have determined the input power, calculated the desired dB reduction and transformed that into a fractional value (i.e. gain).

Should the input signal pass the threshold, the gain value would start oscillating up and down at the same rate as the input signal. We don't want that. Instead, we want the gain value to smoothly transition towards the desired value.

To be even more precise, we'd like the actual gain value to ignore high frequency changes to the desired gain value and instead follow the low frequency changes - in essence a low-pass filter.

We achieve that by using the equation for a first-order low-pass filter.

g_{new} = (1 - \beta) g_{target} + \beta g

More information can be found on the Wikipedia page for low-pass filters.

However, we don't calculate $\beta$ in terms of a cut-off frequency, as would normally be done. Instead, we explore the low-pass filter as an exponential smoothing function and configure it in such a way as to ensure ~63% transition to the target value after $\tau$ duration.

As such, the equation for $\beta$ is:

\beta = e^{-\frac{1}{\tau f_s}}

Here $f_s$ is the discrete sampling frequency. Usually this is 44100 Hz, though other standards are available as well. Regardless, this is information that is internally available to the processor node.

How exponential smoothing works is explained well in the Exponential Smoothing article.

I would be lying if I said that I fully understand why this is used in real life, and it seems that not all analog Dynamic Compressors follow that approach, as is explained in https://www.audiotechnology.com/tutorials/understanding-compression-2.

That said, this is the approach we are taking. In code, there is a very elegant way to implement the first equation through the usage of the mix function (popular with graphics shader developers).

n.currentGain = sprec.Mix(targetGain, n.currentGain, coeff)

Here, coeff represents $\beta$ in the above equations and controls the rate of convergence.
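The ~63% claim can be checked numerically with a short sketch. The mix helper below is written for this example and assumes that sprec.Mix(a, b, amount) computes a*(1-amount) + b*amount; the other names are hypothetical as well.

```go
package main

import (
    "fmt"
    "math"
)

// mix linearly interpolates from a to b (assumed semantics of sprec.Mix).
func mix(a, b, amount float64) float64 {
    return a*(1.0-amount) + b*amount
}

// smoothingCoeff returns beta for a time constant tau (seconds)
// at the given sample rate (Hz).
func smoothingCoeff(tau, sampleRate float64) float64 {
    return math.Exp(-1.0 / (tau * sampleRate))
}

// stepProgress returns which fraction of a step from 1.0 down to 0.0
// has been covered after tau seconds worth of samples.
func stepProgress(tau, sampleRate float64) float64 {
    beta := smoothingCoeff(tau, sampleRate)
    gain, target := 1.0, 0.0
    steps := int(math.Round(tau * sampleRate))
    for i := 0; i < steps; i++ {
        gain = mix(target, gain, beta)
    }
    return 1.0 - gain
}

func main() {
    // Expect a value close to 1 - 1/e, i.e. about 0.63.
    fmt.Printf("progress after tau: %.3f\n", stepProgress(0.25, 44100.0))
}
```

Each iteration multiplies the remaining distance to the target by $\beta$, so after $\tau f_s$ samples the remaining distance is $e^{-1}$ of the original step.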

We should still keep in mind that this filter isn't being applied to the input signal here. It is being applied to the gain value, which controls the amount of compression. It is common to want the compressor to be quicker to act when the signal requires compression, but more relaxed when backing off.

These are the so-called attack and release parameters. And this is also why, depending on whether the gain is increasing or decreasing, the coeff argument is calculated differently.

In code, this looks as follows:

const ( // normally these would be configurable
    attack  = 0.005
    release = 0.25
    sampleRate = 44100.0
)

var coeff float32
if targetGain < n.currentGain {
    coeff = float32(math.Exp(-1.0 / (sampleRate * attack)))
} else {
    coeff = float32(math.Exp(-1.0 / (sampleRate * release)))
}

n.currentGain = sprec.Mix(targetGain, n.currentGain, coeff)
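Since the two coefficients depend only on the attack, release and sample rate values, one option is to compute them once instead of per frame. The following sketch does that and also shows the asymmetry between attack and release; the gainSmoother type and its methods are hypothetical and not part of the engine.

```go
package main

import (
    "fmt"
    "math"
)

// gainSmoother caches both smoothing coefficients and the current gain.
// Hypothetical helper written for this example.
type gainSmoother struct {
    attackCoeff  float64
    releaseCoeff float64
    current      float64
}

func newGainSmoother(attack, release, sampleRate float64) *gainSmoother {
    return &gainSmoother{
        attackCoeff:  math.Exp(-1.0 / (sampleRate * attack)),
        releaseCoeff: math.Exp(-1.0 / (sampleRate * release)),
        current:      1.0, // start without any compression
    }
}

// Next moves the current gain one sample closer to the target gain.
func (s *gainSmoother) Next(target float64) float64 {
    coeff := s.releaseCoeff
    if target < s.current { // gain must drop: use the faster attack
        coeff = s.attackCoeff
    }
    s.current = (1.0-coeff)*target + coeff*s.current
    return s.current
}

func main() {
    s := newGainSmoother(0.005, 0.25, 44100.0)
    for i := 0; i < 441; i++ { // 10 ms of a loud signal
        s.Next(0.5)
    }
    afterAttack := s.current
    for i := 0; i < 441; i++ { // 10 ms of a quiet signal
        s.Next(1.0)
    }
    fmt.Printf("after attack: %.3f, after release: %.3f\n", afterAttack, s.current)
}
```

In 10 ms the gain covers most of the way down to 0.5, while the same 10 ms of release recovers only a small fraction of the way back to 1.0.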

(Step 6) Apply gain to input frame and pass to output

This is the most trivial step. We take the input values and multiply them by the gain value. Then we pass them as output.

outputFrames[i] = Frame{
    Left:  frame.Left * n.currentGain,
    Right: frame.Right * n.currentGain,
}

If we piece together all the code, we get the following implementation:

type CompressorNode struct {
    currentGain float32 // this should be initialized to 1.0
}

func (n *CompressorNode) Process(ctx ProcessContext, inputFrames, outputFrames FrameList) {
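    const ( // these should be configurable
        sampleRate = 44100.0
        attack  = 0.005
        release = 0.25
    )
    // threshold (in dB) and ratio are assumed to be declared elsewhere,
    // for example as configurable fields of the node.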

    for i, frame := range inputFrames {
        peak := max(sprec.Abs(frame.Left), sprec.Abs(frame.Right))
        peakDB := audio.GainToDB(max(1.0e-8, peak)) // avoid log of zero

        var reductionDB float32
        if peakDB <= threshold {
            reductionDB = 0.0
        } else {
            reductionDB = ((1.0 / ratio) - 1.0) * (peakDB - threshold)
        }
        targetGain := audio.DBToGain(reductionDB)

        var coeff float32
        if targetGain < n.currentGain {
            coeff = float32(math.Exp(-1.0 / (sampleRate * attack)))
        } else {
            coeff = float32(math.Exp(-1.0 / (sampleRate * release)))
        }
        n.currentGain = sprec.Mix(targetGain, n.currentGain, coeff)

        outputFrames[i] = Frame{
            Left:  frame.Left * n.currentGain,
            Right: frame.Right * n.currentGain,
        }
    }
}

Soft Dynamic Compressor

While the hard compressor is easy to implement, it has a sharp edge when transitioning from the non-compression region into the hard-compression region. This can become noticeable.

To mitigate that, a soft-compression region is introduced, so that it smooths the transition between the two regions.

The compression function looks as follows:

The region between the blue vertical lines is called the knee and it spans $\frac{k}{2}$ units to the left of the threshold and $\frac{k}{2}$ units to the right of the threshold.

What is important to note is that the non-compression and hard-compression regions still have the same functions; only their ranges have changed, as depicted in the following image.

In this case, the whole compression function should be split into three parts - non-compression region, a soft-compression region and a hard-compression region. The tricky part is figuring out the soft-compression region, since it needs to bridge the two neighbouring regions in a smooth fashion.

Non-compression region

The range for this region has now changed to:

x <= t - \frac{k}{2}

As stated, the compression function remains the same:

f(x) = x

The reduction function is preserved as well:

r(x) = 0

Hard-compression region

The range for this region has now changed to:

t + \frac{k}{2} <= x

As stated, the function remains the same:

f(x) = \frac{x}{q} - (\frac{1}{q} - 1) t

And the reduction is preserved as well:

r(x) = (\frac{1}{q} - 1) (x - t)

Soft-compression region

The range for this region is as follows:

t - \frac{k}{2} < x < t + \frac{k}{2}

However, we don't have the equation for the function. We need a function that is smooth, monotonically increasing and that connects to the non-compression and hard-compression regions seamlessly (i.e. has the same value and first-order derivative at the boundary).

It is clear that using a linear function would not work here, since it will create two sharp edges at the boundary. Instead, it turns out that a quadratic polynomial fits perfectly.

First things first, let's simplify our problem. If we translate our function so that the soft-compression region starts at the origin, the math becomes much simpler.

So let us instead explore a function g(x) that represents this translated version as depicted here:

This way, $g(0) = 0$, which makes things much simpler.

To get back to the f function from this new g function (that we are yet to derive), we can use the following equation.

f(x) = g(x - (t - \frac{k}{2})) + i(t - \frac{k}{2})

It might look complicated at first, but this adjustment just takes into account the amount by which the input and output of g are shifted. We can input the lower boundary value to verify that it holds true.

f(t - \frac{k}{2}) = g((t - \frac{k}{2}) - (t - \frac{k}{2})) + i(t - \frac{k}{2})

f(t - \frac{k}{2}) = g(0) + i(t - \frac{k}{2})

f(t - \frac{k}{2}) = i(t - \frac{k}{2})

This is exactly what we expect, so our translation works.

Lower boundary

We have translated g to start at the origin, hence we know the following:

g(0) = 0

Furthermore, the derivative must match that of the identity function, which is always 1 (easy to derive, not shown here).

g'(0) = 1

Upper boundary

In addition, the end of the region must match the start of the hard-compression region.

g(k) = f_{hard}(t + \frac{k}{2}) - f_{non}(t - \frac{k}{2})

This just checks that the elevation at the upper boundary of the g function matches the height difference between the start of the hard-compression region and the end of the non-compression region.

The above thus leads to the following.

g(k) = \frac{t + \frac{k}{2}}{q} - (\frac{1}{q} - 1) t - (t - \frac{k}{2})

g(k) = \frac{t}{q} + \frac{\frac{k}{2}}{q} - \frac{t}{q} + t - t + \frac{k}{2}

g(k) = \frac{\frac{k}{2}}{q} + \frac{k}{2}

g(k) = (\frac{1}{q} + 1) \frac{k}{2}

Lastly, we want the derivative at the end of g to match the derivative at the start of $f_{hard}$. But the derivative of $f_{hard}$ is also always a constant equal to $\frac{1}{q}$ (easy to derive, not shown here).

Hence, we have:

g'(k) = \frac{1}{q}

The quadratic polynomial

Now we try to find the parameters of a quadratic polynomial that satisfy the above constraints.

g(x) = a x^2 + b x + c

We need to find out what the values of a, b and c are. Let's use the constraints to figure that out.

We know that $g(0) = 0$, hence we have:

g(0) = a 0^2 + b 0 + c = c

Thus, for this to work, c needs to be 0 and we can ignore it from now on. So we are looking for the parameters a and b.

g(x) = a x^2 + b x

The derivative of g is as follows:

g'(x) = 2 a x + b

Knowing that $g'(0) = 1$, we have the following:

g'(0) = 2 a 0 + b = b = 1

So we can determine that $b = 1$. All we are left to calculate is a. We will use the upper-range constraint to do that.

g(k) = (\frac{1}{q} + 1) \frac{k}{2}

a k^2 + k = (\frac{1}{q} + 1) \frac{k}{2}

a k + 1 = (\frac{1}{q} + 1) \frac{1}{2}

2 a k + 2 = \frac{1}{q} + 1

2 a k + 1 = \frac{1}{q}

2 a k = \frac{1}{q} - 1

a = \frac{\frac{1}{q} - 1}{2 k}

Excellent, we now have the value for a. The last thing we need to do is verify that the derivative matches. Recall that:

g'(x) = 2ax + 1

But we know what a is, so it becomes:

g'(x) = (\frac{\frac{1}{q} - 1}{k})x + 1

Now we plug in the value k and see if that matches the derivative for the hard-compression range.

g'(k) = (\frac{\frac{1}{q} - 1}{k})k + 1

g'(k) = \frac{1}{q} - 1 + 1

g'(k) = \frac{1}{q}

Which is exactly what our derivative constraint for the upper bound required. We now have a complete solution for the g function.

g(x) = (\frac{\frac{1}{q} - 1}{2k})x^2 + x
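As a sanity check, the four boundary constraints can also be verified numerically. The sketch below uses a ratio of 4 and a knee of 10 dB, both chosen only for illustration; the function names are hypothetical.

```go
package main

import "fmt"

// kneeA returns the quadratic coefficient a = (1/q - 1) / (2k).
func kneeA(q, k float64) float64 {
    return (1.0/q - 1.0) / (2.0 * k)
}

// g evaluates the translated soft-knee curve g(x) = a x^2 + x.
func g(x, q, k float64) float64 {
    return kneeA(q, k)*x*x + x
}

// gPrime evaluates its derivative g'(x) = 2 a x + 1.
func gPrime(x, q, k float64) float64 {
    return 2.0*kneeA(q, k)*x + 1.0
}

func main() {
    const q, k = 4.0, 10.0 // illustrative ratio and knee width
    fmt.Printf("g(0)  = %.4f (want 0)\n", g(0, q, k))
    fmt.Printf("g'(0) = %.4f (want 1)\n", gPrime(0, q, k))
    fmt.Printf("g(k)  = %.4f (want %.4f)\n", g(k, q, k), (1.0/q+1.0)*k/2.0)
    fmt.Printf("g'(k) = %.4f (want %.4f)\n", gPrime(k, q, k), 1.0/q)
}
```

With these values a = -0.0375, so g(k) = -3.75 + 10 = 6.25 and g'(k) = -0.75 + 1 = 0.25, matching both upper-boundary constraints.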

Having the g function, we need to transform it in order to get the compression function f.

f(x) = g(x - (t - \frac{k}{2})) + (t - \frac{k}{2})

f(x) = (\frac{\frac{1}{q} - 1}{2k})[x - (t - \frac{k}{2})]^2 + [x - (t - \frac{k}{2})] + (t - \frac{k}{2})

f(x) = (\frac{\frac{1}{q} - 1}{2k})[x - (t - \frac{k}{2})]^2 + x

The reduction function becomes:

r(x) = f(x) - i(x) = (\frac{\frac{1}{q} - 1}{2k})[x - (t - \frac{k}{2})]^2 + x - x

r(x) = (\frac{\frac{1}{q} - 1}{2k})[x - (t - \frac{k}{2})]^2

All of this results in the following code:

var reductionDB float32
switch {
case peakDB <= (threshold - knee / 2.0): // non-compression
    reductionDB = 0.0
case peakDB >= (threshold + knee / 2.0): // hard-compression
    reductionDB = ((1.0 / ratio) - 1.0) * (peakDB - threshold)
default: // soft-compression
    z := peakDB - (threshold - knee / 2.0)
    reductionDB = ((1.0 / ratio) - 1.0) * z * z / (2.0 * knee)
}
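To double-check that the three pieces connect without a jump, the reduction can be wrapped in a function and evaluated right at the knee boundaries. This is a sketch with illustrative values (threshold -24 dB, ratio 4, knee 10 dB); the softReductionDB name is hypothetical.

```go
package main

import "fmt"

// softReductionDB returns r(x) for a knee of width k centred on threshold t.
func softReductionDB(x, t, q, k float64) float64 {
    slope := 1.0/q - 1.0
    switch {
    case x <= t-k/2.0: // non-compression
        return 0.0
    case x >= t+k/2.0: // hard-compression
        return slope * (x - t)
    default: // soft-compression
        z := x - (t - k/2.0)
        return slope * z * z / (2.0 * k)
    }
}

func main() {
    const t, q, k = -24.0, 4.0, 10.0 // illustrative values
    const eps = 1.0e-9               // tiny offset to land inside the knee
    lower, upper := t-k/2.0, t+k/2.0
    fmt.Printf("lower: %.6f vs %.6f\n", softReductionDB(lower, t, q, k), softReductionDB(lower+eps, t, q, k))
    fmt.Printf("upper: %.6f vs %.6f\n", softReductionDB(upper-eps, t, q, k), softReductionDB(upper, t, q, k))
}
```

At the upper boundary both branches evaluate to (1/q - 1) k / 2, which is -3.75 dB for these values, so the curve is continuous there.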

Threshold translation

Before we can call it a day, we need to take into account one last thing. As I mentioned at the beginning of the article, I need my implementation to match what is described in the WebAudio API specification.

In the specification, unlike our design so far, the soft-compression region starts at threshold and not at threshold - knee/2. Furthermore, the hard compression starts at threshold + knee and not at threshold + knee/2. So the code needs to be adjusted to shift the threshold by knee/2 to the right, in order to match the specification.

Non-compression region

The range is adjusted to be as follows:

x <= t

The function is preserved:

f(x) = x

And the reduction is preserved as well:

r(x) = 0

Hard-compression region

The range is adjusted to be as follows:

t + k <= x

The function is adjusted by shifting the threshold:

f(x) = \frac{x}{q} - (\frac{1}{q} - 1)(t + \frac{k}{2})

And the reduction is adjusted as well:

r(x) = (\frac{1}{q} - 1)[x - (t + \frac{k}{2})]

Soft-compression region

The range is adjusted to be as follows:

t < x < t + k

The function is adjusted by shifting the threshold:

f(x) = (\frac{\frac{1}{q} - 1}{2k})(x - t)^2 + x

The reduction function is also adjusted and becomes:

r(x) = (\frac{\frac{1}{q} - 1}{2k})(x - t)^2

Code change

The code becomes as follows:

var reductionDB float32
switch {
case peakDB <= threshold: // non-compression
    reductionDB = 0.0
case peakDB >= threshold + knee: // hard-compression
    reductionDB = ((1.0 / ratio) - 1.0) * (peakDB - (threshold + knee / 2.0))
default: // soft-compression
    z := peakDB - threshold
    reductionDB = ((1.0 / ratio) - 1.0) * z * z / (2.0 * knee)
}
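One way to convince ourselves that this is the same curve, just moved knee/2 to the right, is to compare it against the centred version evaluated with a threshold of t + k/2. The sketch below does that over a sweep of input levels. The function names are hypothetical, and the values -24 dB, 12 and 30 dB are the default threshold, ratio and knee of the Web Audio DynamicsCompressorNode.

```go
package main

import (
    "fmt"
    "math"
)

// centredReductionDB places the knee symmetrically around threshold t.
func centredReductionDB(x, t, q, k float64) float64 {
    slope := 1.0/q - 1.0
    switch {
    case x <= t-k/2.0:
        return 0.0
    case x >= t+k/2.0:
        return slope * (x - t)
    default:
        z := x - (t - k/2.0)
        return slope * z * z / (2.0 * k)
    }
}

// shiftedReductionDB starts the knee at threshold t and ends it at t + k.
func shiftedReductionDB(x, t, q, k float64) float64 {
    slope := 1.0/q - 1.0
    switch {
    case x <= t:
        return 0.0
    case x >= t+k:
        return slope * (x - (t + k/2.0))
    default:
        z := x - t
        return slope * z * z / (2.0 * k)
    }
}

func main() {
    const t, q, k = -24.0, 12.0, 30.0
    maxDiff := 0.0
    for x := -40.0; x <= 20.0; x += 0.5 {
        diff := math.Abs(shiftedReductionDB(x, t, q, k) - centredReductionDB(x, t+k/2.0, q, k))
        maxDiff = math.Max(maxDiff, diff)
    }
    fmt.Printf("largest difference: %g dB\n", maxDiff)
    fmt.Printf("reduction at 0 dB input: %.2f dB\n", shiftedReductionDB(0.0, t, q, k))
}
```

With these defaults a full-scale input (0 dB) still falls inside the knee, which spans -24 dB to 6 dB, and receives about 8.8 dB of reduction.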

Final Code

The following is the final state of the source code.

type CompressorNode struct {
    currentGain float32 // this should be initialized to 1.0
}

func (n *CompressorNode) Process(ctx ProcessContext, inputFrames, outputFrames FrameList) {
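    const ( // these should be configurable
        sampleRate = 44100.0
        attack  = 0.005
        release = 0.25
    )
    // threshold (in dB), knee (in dB) and ratio are assumed to be declared
    // elsewhere, for example as configurable fields of the node.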

    for i, frame := range inputFrames {
        peak := max(sprec.Abs(frame.Left), sprec.Abs(frame.Right))
        peakDB := audio.GainToDB(max(1.0e-8, peak)) // avoid log of zero

        var reductionDB float32
        switch {
        case peakDB <= threshold: // non-compression
            reductionDB = 0.0
        case peakDB >= threshold + knee: // hard-compression
            reductionDB = ((1.0 / ratio) - 1.0) * (peakDB - (threshold + knee / 2.0))
        default: // soft-compression
            z := peakDB - threshold
            reductionDB = ((1.0 / ratio) - 1.0) * z * z / (2.0 * knee)
        }
        targetGain := audio.DBToGain(reductionDB)

        var coeff float32
        if targetGain < n.currentGain {
            coeff = float32(math.Exp(-1.0 / (sampleRate * attack)))
        } else {
            coeff = float32(math.Exp(-1.0 / (sampleRate * release)))
        }
        n.currentGain = sprec.Mix(targetGain, n.currentGain, coeff)

        outputFrames[i] = Frame{
            Left:  frame.Left * n.currentGain,
            Right: frame.Right * n.currentGain,
        }
    }
}

There you have it. This is how to implement a basic Dynamic Compressor with a soft transition that matches the threshold requirements of the WebAudio API specification.
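For anyone who wants to experiment outside of the engine, the same logic can be condensed into a standalone program. This is only a sketch: the Frame and Compressor types, the constructor and its parameters are hypothetical stand-ins, the math is done in float64 using the standard library instead of sprec and audio, and the parameter values in main are chosen for illustration.

```go
package main

import (
    "fmt"
    "math"
)

// Frame is a stereo PCM frame (hypothetical stand-in for the engine type).
type Frame struct{ Left, Right float32 }

// Compressor holds the configuration and the smoothed gain state.
type Compressor struct {
    threshold, knee, ratio    float64 // dB, dB, input/output ratio
    attackCoeff, releaseCoeff float64 // precomputed smoothing coefficients
    gain                      float64 // current (smoothed) gain
}

func NewCompressor(threshold, knee, ratio, attack, release, sampleRate float64) *Compressor {
    return &Compressor{
        threshold:    threshold,
        knee:         knee,
        ratio:        ratio,
        attackCoeff:  math.Exp(-1.0 / (sampleRate * attack)),
        releaseCoeff: math.Exp(-1.0 / (sampleRate * release)),
        gain:         1.0,
    }
}

// reductionDB evaluates r(x) with the knee spanning threshold to threshold+knee.
func (c *Compressor) reductionDB(x float64) float64 {
    slope := 1.0/c.ratio - 1.0
    switch {
    case x <= c.threshold:
        return 0.0
    case x >= c.threshold+c.knee:
        return slope * (x - (c.threshold + c.knee/2.0))
    default:
        z := x - c.threshold
        return slope * z * z / (2.0 * c.knee)
    }
}

// Process compresses in into out; both slices must have the same length.
func (c *Compressor) Process(in, out []Frame) {
    for i, f := range in {
        peak := math.Max(math.Abs(float64(f.Left)), math.Abs(float64(f.Right)))
        peakDB := 20.0 * math.Log10(math.Max(1.0e-8, peak)) // avoid log of zero
        target := math.Pow(10.0, c.reductionDB(peakDB)/20.0)
        coeff := c.releaseCoeff
        if target < c.gain {
            coeff = c.attackCoeff
        }
        c.gain = (1.0-coeff)*target + coeff*c.gain
        out[i] = Frame{Left: f.Left * float32(c.gain), Right: f.Right * float32(c.gain)}
    }
}

func main() {
    c := NewCompressor(-24.0, 30.0, 12.0, 0.005, 0.25, 44100.0)
    in := make([]Frame, 44100) // one second of a constant full-scale signal
    for i := range in {
        in[i] = Frame{Left: 1.0, Right: 1.0}
    }
    out := make([]Frame, len(in))
    c.Process(in, out)
    fmt.Printf("first: %.3f, last: %.3f\n", out[0].Left, out[len(out)-1].Left)
}
```

The first output sample is almost untouched, while the last one has settled at a gain of roughly 0.36, which corresponds to the 8.8 dB of reduction that the knee prescribes for a 0 dB input.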

Two things can still be improved: the calculation of the input power could be more sophisticated, and most constants in the code should be parameterized.

While I have tried to derive as much as possible myself and present it above (with the help of the Desmos tool), it is possible that I might have missed something or miscalculated. Feel free to comment below with tips and tricks on the matter.
