Ali Sherief

Posted on Jan 27, 2020 • Edited on Mar 1, 2020

Python audio processing at lightspeed ⚡ Part 1: zignal

#python #audio

I really, really want to find out what audio signals are made of. I like audio processing because it combines applied math and signals with cool musical tones, and this post series scratches my developer itch. I would love to gather every Python audio library, figure each one out and describe it here, but there are far too many, so I will cover as many as I can. Along the way, you will learn quite a bit about sound theory! I expect to cover one library, or most of one, per post.

I will start with zignal, a simple signal processing module designed for readable source code, and cover its features. We will then work our way up to more advanced sound libraries, and hopefully by the end of this series you will know more about the different Python audio modules than before.

A short backstory of finding zignal

This post was originally going to be about the audioop Python module, but I was disappointed by its lack of equalization functions and of nearly everything else related to audio. So I looked for Python libraries that could do equalization, and zignal came up on the first page of results (spoiler alert: it can't, but it's still interesting). I might still cover audioop in another post, but only because I'm curious.

By the way, scipy.signal has many, many more audio manipulation functions than audioop, but it will not be covered in this post either.

Different audio waveforms

It makes sense to start by explaining the composition of the simplest types of waveforms. You might be wondering what kind of waveforms album songs are made of, but that is complicated territory, since many instruments and post-processing effects are at play. For now, I will show you the following:

Sine wave

This wave is based on the sine function. I won't go into detail about the mathematical properties of the sine function, but its two most important parameters are the frequency, which is how many complete cycles the wave goes through per second (higher frequencies have shorter cycles), and the amplitude, which is how tall the wave is. Each sample in the sound is the wave's amplitude (height) at one instant in time.

In zignal:

>>> import zignal
>>>
>>> x = zignal.Sinetone(fs=44100, f0=997, duration=0.1, gaindb=-20)
>>> print(x)
=======================================
classname        : Sinetone
sample rate      : 44100.0 [Hz]
channels         : 1
duration         : 0.100 [s]
datatype         : float64
samples per ch   : 4410
data size        : 0.034 [Mb]
has comment      : no
peak             : [0.1]
RMS              : [0.0707]
crestfactor      : [1.4147]
-----------------:---------------------
frequency        : 997.0 [Hz]
phase            : 0.0 [deg]
-----------------:---------------------

Let's review the keyword arguments used here:

fs: The sampling frequency, aka sample rate, which is the number of samples (data points) in the waveform per second. For comparison, CD audio has a sample rate of 44100 Hz and DVD audio has a sample rate of 96000 Hz or 192000 Hz.
f0: The frequency of the tone, in Hz.
duration: The length of the sound in seconds. Since this is a sine wave, it keeps oscillating for that long.
gaindb: The gain in decibels (dB). The gain is how loud the samples are (how big their amplitude is) before any post-processing sound effects are applied, as opposed to volume, which is the level of the final sound. Here gaindb=-20 gives a peak amplitude of 0.1, as the output above shows.
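To see how these arguments fit together, here is a minimal numpy sketch of a sine tone generator. It is not zignal's implementation. `sine_tone` is a hypothetical helper, and it assumes gaindb is measured relative to a full-scale amplitude of 1.0. That assumption is consistent with the peak of 0.1 shown above for gaindb=-20.

```python
import numpy as np

def sine_tone(fs=44100, f0=997.0, duration=0.1, gaindb=-20.0):
    """Hypothetical helper: mono sine tone as float64 samples (assumes 0 dB = amplitude 1.0)."""
    n = int(round(fs * duration))        # samples per channel: 44100 * 0.1 = 4410
    t = np.arange(n) / fs                # time of each sample, in seconds
    peak = 10 ** (gaindb / 20)           # -20 dB -> linear amplitude 0.1
    return peak * np.sin(2 * np.pi * f0 * t)

x = sine_tone()
print(len(x), x.dtype)                   # 4410 float64, matching zignal's summary
print(np.max(np.abs(x)))                 # very close to 0.1
print(np.sqrt(np.mean(x ** 2)))          # close to 0.1 / sqrt(2), about 0.0707
```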

Don't worry about the output properties just yet; I'll cover them in due time. Also, decibels are not simply a measure of amplitude; I describe them further below.

Square wave

This waveform is square-shaped: it jumps between a high value and a low value at fixed intervals. It is recognizable by its flat horizontal segments, where the amplitude stays constant.

>>> import zignal
>>> x = zignal.SquareWave(fs=44100, f0=997, duration=0.1, gaindb=-20)
>>> print(x)
=======================================
classname        : SquareWave
sample rate      : 44100.0 [Hz]
channels         : 1
duration         : 0.100 [s]
datatype         : float64
samples per ch   : 4410
data size        : 0.034 [Mb]
has comment      : no
peak             : [0.1]
RMS              : [0.1]
crestfactor      : [1.]
-----------------:---------------------
frequency        : 997.0 [Hz]
phase            : 0.0 [deg]
duty cycle       : 0.500 (50.0%)
-----------------:---------------------
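The output shows the square wave's RMS equal to its peak (crest factor 1). That follows from its shape: every sample sits at either +peak or -peak. Below is a minimal numpy sketch of a pulse wave with an adjustable duty cycle, the parameter shown in the last line of the output. At a duty cycle of 0.5 it is an ordinary square wave. `pulse_wave` is a hypothetical helper, not part of zignal.

```python
import numpy as np

def pulse_wave(fs=44100, f0=997.0, duration=0.1, gaindb=-20.0, duty=0.5):
    """Hypothetical helper: +peak for the first `duty` fraction of each period, -peak for the rest."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    phase = (f0 * t) % 1.0               # position within the current period, in [0, 1)
    peak = 10 ** (gaindb / 20)
    return np.where(phase < duty, peak, -peak)

sq = pulse_wave()                        # duty = 0.5: a square wave
rms = np.sqrt(np.mean(sq ** 2))
print(np.max(np.abs(sq)), rms)           # peak and RMS are both about 0.1, so the crest factor is 1
print(np.mean(pulse_wave(duty=0.25) > 0))  # fraction of time "on": close to 0.25
```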

Triangle and Sawtooth waves

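Triangle waves ramp linearly up and down between their minimum and maximum values. Compared with a square wave of the same peak, they spend less time near the peak, so their average (RMS) amplitude is lower. A sawtooth wave rises from a minimum value to a maximum value in a straight line and then drops straight back down, hence the jagged shape.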

zignal doesn't have classes for these. They are simple waveforms, so implementing them would be a useful TODO for someone.
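Until zignal grows classes for them, both waveforms are easy to generate directly from the phase within each period. Here is a minimal numpy sketch. The function names are hypothetical, and `peak` is a linear amplitude rather than a gain in dB.

```python
import numpy as np

def _phase(fs, f0, duration):
    """Position within the current period, in [0, 1), for each sample."""
    t = np.arange(int(round(fs * duration))) / fs
    return (f0 * t) % 1.0

def sawtooth_wave(fs, f0, duration, peak=1.0):
    """Rises in a straight line from -peak to +peak, then drops straight back down."""
    return peak * (2.0 * _phase(fs, f0, duration) - 1.0)

def triangle_wave(fs, f0, duration, peak=1.0):
    """Ramps linearly from -peak up to +peak and back down once per period."""
    return peak * (1.0 - 4.0 * np.abs(_phase(fs, f0, duration) - 0.5))

saw = sawtooth_wave(44100, 100.0, 1.0)
tri = triangle_wave(44100, 100.0, 1.0)
# Both have an RMS of peak / sqrt(3), about 0.577, versus 1.0 for a square wave of the same peak
print(np.sqrt(np.mean(saw ** 2)), np.sqrt(np.mean(tri ** 2)))
```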

How waves pulsate

Periodic waveforms can be built from a sum of sine waves at different frequencies and amplitudes. These component sine waves are called harmonics. The lowest one is the fundamental, and its frequency (the fundamental frequency) determines the pitch we hear. Every other harmonic has a frequency that is an integer multiple of the fundamental frequency.

The other waveforms I talked about above can be described in harmonics too. The square wave and the triangle wave contain only odd harmonics, but the triangle wave's harmonics roll off (get weaker as the harmonic number grows) faster than the square wave's. Their amplitudes fall as 1/n² instead of 1/n, where n is the harmonic number. Sawtooth waves contain all integer harmonics.
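The harmonic description can be checked numerically by adding up sine waves (additive synthesis). The sketch below uses two standard Fourier series, each summed over odd n:

- Unit-peak square wave: (4/pi) * sum of sin(2*pi*n*f0*t)/n.
- Unit-peak triangle wave: (8/pi^2) * sum of (-1)^m * sin(2*pi*n*f0*t)/n^2, where n = 2m + 1.

`additive_wave` is a hypothetical helper. With more terms, both sums approach +1 a quarter period in, and the triangle converges faster because its harmonics roll off faster.

```python
import numpy as np

def additive_wave(t, f0, n_terms, kind="square"):
    """Sum the first n_terms odd harmonics of a unit-peak square or triangle wave."""
    y = np.zeros_like(t, dtype=float)
    for m in range(n_terms):
        n = 2 * m + 1                                   # odd harmonic numbers only: 1, 3, 5, ...
        if kind == "square":
            y += 4 / np.pi * np.sin(2 * np.pi * n * f0 * t) / n                     # amplitude ~ 1/n
        else:
            y += 8 / np.pi**2 * (-1) ** m * np.sin(2 * np.pi * n * f0 * t) / n**2   # amplitude ~ 1/n^2
    return y

f0 = 100.0
t = np.array([1 / (4 * f0)])        # a quarter period in, where both ideal waves reach +1
for terms in (1, 5, 50):
    print(terms, additive_wave(t, f0, terms, "square"), additive_wave(t, f0, terms, "triangle"))
```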

If this section is confusing, this infographic should help (frequency on the horizontal axis, amplitude on the vertical axis):

Rolling off means that the amplitudes of the components steadily decrease as the frequency gets higher (or, for a low-frequency roll-off, lower).

Plotting

Yes, you can plot sounds in zignal! This is a useful feature for any signal processing library, because a frequency graph alone does not reveal all of the properties of a waveform. Since zignal uses matplotlib, you also get the option to save the plot. This is a zignal plot of a sine wave with a duration of 10 milliseconds, a frequency of 997 Hz and a sample rate of 44100 Hz:

>>> import zignal
>>> x = zignal.Sinetone(fs=44100, f0=997, duration=0.01, gaindb=-20)
>>> x.plot()

This is the FFT plot of the same waveform. The FFT (Fast Fourier Transform) is an efficient algorithm for computing the discrete Fourier transform, which converts a signal from the time domain into the frequency domain:

>>> import zignal
>>> x = zignal.Sinetone(fs=44100, f0=997, duration=0.01, gaindb=-20)
>>> x.plot_fft()
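Under the hood, an FFT plot like this boils down to a few numpy calls. This sketch is not zignal's plot_fft() code. It assumes a one-sided spectrum, scaled so that a tone falling exactly on an FFT bin reads as its peak amplitude, then converted to dB relative to 1.0. It uses 0.1 s of audio, so the bins are fs/N = 10 Hz apart.

```python
import numpy as np

fs, f0 = 44100, 997.0
n = int(round(fs * 0.1))                        # 4410 samples
t = np.arange(n) / fs
x = 0.1 * np.sin(2 * np.pi * f0 * t)            # same tone as gaindb=-20

spectrum = np.fft.rfft(x)                       # one-sided FFT of a real-valued signal
freqs = np.fft.rfftfreq(n, d=1 / fs)            # frequency of each bin, in Hz
magnitude = 2 * np.abs(spectrum) / n            # scale so an on-bin sine shows its peak amplitude
magnitude_db = 20 * np.log10(np.maximum(magnitude, 1e-12))  # avoid log10(0)

peak_bin = int(np.argmax(magnitude_db))
# Bin nearest 997 Hz; the level reads somewhat below -20 dB because 997 Hz falls between bins
print(freqs[peak_bin], magnitude_db[peak_bin])
```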

What are decibels

As you can see, the second plot displays frequency against magnitude. This magnitude is almost the amplitude, but not quite: it is measured in decibels, which are the logarithm of the ratio between an amplitude and a reference amplitude. The level in decibels is 20 * log10(amp / amp_ref), where amp_ref is the reference amplitude that gives the decibel scale its zero point. Its exact value depends on the convention used (in digital audio it is commonly full scale, an amplitude of 1.0). What matters is that the levels you compare share the same reference.

If we just used the amplitude as a measure of loudness, it could take any value, because it isn't tethered to any "base" or "origin" value. Dividing by a reference amplitude solves this problem: the ratio is 1 when the amplitude equals the reference, which corresponds to 0 dB. The logarithm is then taken because amplitudes span an enormous range, and very small amplitudes cannot be usefully compared with normal ones on a linear scale.
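The formula above translates directly into a pair of conversion functions. In this sketch the names are hypothetical and the default reference amplitude of 1.0 is an assumption (the full-scale convention). The examples show how the logarithm compresses a thousandfold change in amplitude into 60 dB.

```python
import math

def amplitude_to_db(amp, amp_ref=1.0):
    """Level in dB of an amplitude relative to a reference amplitude: 20 * log10(amp / amp_ref)."""
    return 20 * math.log10(amp / amp_ref)

def db_to_amplitude(db, amp_ref=1.0):
    """Inverse conversion: the amplitude that corresponds to a level in dB."""
    return amp_ref * 10 ** (db / 20)

print(amplitude_to_db(1.0))     # 0 dB: equal to the reference
print(amplitude_to_db(0.1))     # about -20 dB: the gaindb used in the zignal examples
print(amplitude_to_db(0.001))   # about -60 dB: a thousand times smaller
print(db_to_amplitude(-6))      # about 0.5: every -6 dB roughly halves the amplitude
```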

Frequencies have a similar logarithmic unit called a decade (dec), which is a tenfold change in frequency.

More importantly, the second plot demonstrates that a waveform contains different frequencies at different levels. Here, decibels measure how strong each frequency component of the waveform is.

You may also have noticed that this waveform has one channel. Zignal and many other audio processing libraries support multiple channels in a sound. A channel is a single waveform within the sound; a stereo sound, for example, has a left and a right channel.

But what about the display output?

We now know enough terminology to inspect the meaning of each displayed output parameter.

Displayed output	Meaning
classname	The name of the python class
sample rate	Number of samples in the waveform per second, measured in Hz
channels	Number of channels in the sound
duration	How long the sound lasts in seconds
datatype	The numpy datatype used for the samples
samples per ch	How many samples are in each channel. Each channel should have the same number of samples.
data size	How large the sound is based on the duration, sample rate and number of channels (megabytes)
has comment	Is there a user string comment in the sound object?
peak	Loudest sample in the waveform (per channel)
RMS	Root mean square of the waveform (`sqrt(mean(each_amplitude^2))`). This value is related to the power of the signal, and hence to how loud it sounds.
crestfactor	How extreme the peaks in the waveform are compared with its RMS (`abs(peak)/rms`)
frequency	The frequency of the waveform. Zignal uses a default of 997 Hz, but you will usually want a different frequency.
phase	Offset of the waveform within its cycle. It's possible to "push" a waveform backward and forward in time.
duty cycle	How "active" a waveform is. It's the ratio of the pulse width (how long the waveform is "on" in a single period) to the period length.⚑ It's usually expressed as a percentage. Zignal sounds have a default duty cycle of 0.5 (50%).

⚑This parameter wasn't very clear to me either. I hope its meaning and usefulness become clearer as I finish this blog series, but if you know what it means, please let me know in the comments; it would be very useful to hear.
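Most of the displayed values can be recomputed with a few lines of numpy, which is a handy way to check your understanding of the table. `describe` below is a hypothetical helper, not zignal's code. It assumes the data size is reported in units of 2^20 bytes, which matches the 0.034 shown for 4410 float64 samples.

```python
import numpy as np

def describe(samples, fs):
    """Hypothetical helper: recompute some of the summary values for a mono float signal."""
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))            # root of the MEAN of the squares
    return {
        "samples per ch": len(samples),
        "duration [s]": len(samples) / fs,
        "data size [Mb]": samples.nbytes / 2**20,   # float64 uses 8 bytes per sample
        "peak": peak,
        "RMS": rms,
        "crestfactor": peak / rms,
    }

fs = 44100
t = np.arange(4410) / fs
sine = 0.1 * np.sin(2 * np.pi * 997 * t)
square = np.where(sine >= 0, 0.1, -0.1)             # same period, flat tops
for name, sig in (("sine", sine), ("square", square)):
    print(name, {k: round(float(v), 4) for k, v in describe(sig, fs).items()})
```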

One important parameter that is not listed in the table (but should be) is bit depth, the number of bits used to store each sample. It is not to be confused with bit rate, which is the number of bits per second. Bit depth determines how many different amplitudes a sample can take on: number_of_levels = 2^(bit_depth), so 16-bit audio has 65536 levels. The higher the bit depth, the more fine-grained the amplitudes a sample can take on. Samples can't have just any amplitude, because they are stored digitally, unlike an analog signal. This is why all digital samples have a bit depth.
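To make bit depth concrete, here is a minimal sketch that rounds float samples to signed integers of a given bit depth. `quantize` is a hypothetical helper. Scaling by 2^(bit_depth-1) - 1 is one common convention and an assumption here, not necessarily what zignal does.

```python
import numpy as np

def quantize(x, bit_depth):
    """Hypothetical helper: round float samples in [-1, 1] to signed integers with bit_depth bits."""
    max_int = 2 ** (bit_depth - 1) - 1        # 32767 for 16-bit, 7 for 4-bit
    return np.round(np.clip(x, -1.0, 1.0) * max_int).astype(np.int64)

x = np.array([0.0, 0.1, -0.5, 0.3333])
for bits in (16, 4):
    q = quantize(x, bits)
    max_int = 2 ** (bits - 1) - 1
    # Levels available: 2**bits; the worst-case rounding error is half a step
    print(bits, 2 ** bits, q, np.max(np.abs(q / max_int - x)))
```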

Audio effects

Zignal's selection of filters and transformations is quite modest, but you at least get basic effects like fade-in, fade-out and delay. Let's see how these effects work.

Zignal has fade_in(millisec) and fade_out(millisec) methods. They gradually raise the loudness from silence at the beginning and lower it to silence at the end, respectively, over the given number of milliseconds. The fade applies to all channels.

Attention: older zignal releases on PyPI fail when calling these functions with the error: TypeError: 'numpy.float64' object cannot be interpreted as an integer. The bugfix has since been merged, so please update to zignal 0.6.0.
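A fade is just a gain ramp multiplied into the samples. This sketch assumes linear ramps and a (samples, channels) array layout; zignal's own fade shape may differ. `apply_fades` is a hypothetical helper. Note the explicit int() conversion of the sample counts, since array slice bounds must be integers.

```python
import numpy as np

def apply_fades(samples, fs, fade_in_ms=0.0, fade_out_ms=0.0):
    """Hypothetical helper: linear fade-in/fade-out on a (n_samples, n_channels) array, returned as a copy."""
    out = samples.astype(np.float64)              # astype returns a new array
    n_in = int(round(fs * fade_in_ms / 1000))     # milliseconds -> whole number of samples
    n_out = int(round(fs * fade_out_ms / 1000))
    if n_in > 0:
        out[:n_in] *= np.linspace(0.0, 1.0, n_in)[:, None]     # ramp up from silence, every channel
    if n_out > 0:
        out[-n_out:] *= np.linspace(1.0, 0.0, n_out)[:, None]  # ramp down to silence, every channel
    return out

stereo = np.ones((44100, 2))                      # 1 s of constant full-scale "audio", 2 channels
faded = apply_fades(stereo, 44100, fade_in_ms=10, fade_out_ms=500)
print(faded[0], faded[441], faded[22050], faded[-1])
```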

delay(n, channel) shifts all the samples in a channel n positions to the right (later in time). The first n samples are then filled with zeros, i.e. silence. Channel numbers start at 1.
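Here is a minimal numpy sketch of that behaviour, assuming a (samples, channels) layout. It also assumes the total length stays the same, so the last n samples of the channel fall off the end. The text above only says the samples shift right and the start is zero-filled, so that length choice is an assumption. `delay_channel` is a hypothetical helper.

```python
import numpy as np

def delay_channel(samples, n, channel):
    """Hypothetical helper: shift one channel n samples later, zero-filling the start (channels start at 1)."""
    out = samples.copy()
    col = channel - 1                          # 1-based channel number -> 0-based column
    if n > 0:
        out[n:, col] = samples[:-n, col]       # everything moves n samples to the right
        out[:n, col] = 0                       # the first n samples become silence
    return out

x = np.column_stack([np.arange(1, 6), np.arange(1, 6)]).astype(float)
print(delay_channel(x, 2, channel=1))          # column 1 becomes 0, 0, 1, 2, 3; column 2 is unchanged
```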

Zignal can also check whether a waveform consists entirely of zero samples (is_empty()) and whether a waveform's peak is below a specified level in decibels (is_probably_empty(limit)).
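Both checks are short in numpy. In this sketch the level is measured in dB relative to a full-scale amplitude of 1.0. The -96 dB default is an arbitrary assumption, chosen to sit near the commonly quoted dynamic range of 16-bit audio. The function names mirror zignal's, but these are hypothetical stand-ins, not its code.

```python
import numpy as np

def is_empty(samples):
    """True if every sample is exactly zero."""
    return not np.any(samples)

def is_probably_empty(samples, limit_db=-96.0):
    """True if the peak level, in dB relative to 1.0, is below limit_db."""
    peak = np.max(np.abs(samples))
    if peak == 0:
        return True                            # log10(0) is undefined, but silence is below any limit
    return bool(20 * np.log10(peak) < limit_db)

silence = np.zeros(1000)
hiss = np.full(1000, 1e-6)                     # -120 dB: not empty, but probably empty
print(is_empty(silence), is_empty(hiss), is_probably_empty(hiss), is_probably_empty(hiss, limit_db=-130))
```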

It is also possible to convert the sound to floating-point samples, typically in the range -1.0 to 1.0 (convert_to_float(targetbits)), and to integer samples of a given bit depth (convert_to_integer(targetbits)). Converting between different integer bit depths is not supported yet, and neither is 24-bit integer audio; these would also be good TODOs. The sound can also be resampled (resample(targetrate)) and normalized (normalise()). You can also speed up or slow down the audio without resampling by changing its sample rate (set_sample_rate()), which shifts its pitch as well.
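Normalizing and relabelling the sample rate are both short operations in principle. This sketch assumes that normalizing means scaling the whole sound so its peak reaches 1.0 (full scale); zignal's normalise() may use a different target. `normalise_peak` is a hypothetical name.

```python
import numpy as np

def normalise_peak(samples, target_peak=1.0):
    """Hypothetical helper: scale all samples by one factor so the loudest one reaches target_peak."""
    peak = np.max(np.abs(samples))
    return samples.copy() if peak == 0 else samples * (target_peak / peak)

quiet = 0.1 * np.sin(2 * np.pi * np.arange(4410) * 997 / 44100)
loud = normalise_peak(quiet)
print(np.max(np.abs(loud)))          # 1.0, up to floating-point rounding

# Changing only the sample rate label plays the same samples faster or slower,
# so every frequency scales by new_fs / old_fs: a 997 Hz tone recorded at 44100 Hz
# and played back at 48000 Hz sounds at about 1085 Hz.
old_fs, new_fs, f0 = 44100, 48000, 997.0
print(f0 * new_fs / old_fs)
```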

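It's also possible to dither the sound. Dithering is used when reducing a sound to fewer bits: a small amount of random noise is added before the samples are rounded. This slightly raises the noise floor compared with plain truncation or rounding. In exchange, it decorrelates the rounding error from the signal and so removes "harmonic quantization distortion": rounding error that follows the signal and shows up as extra, unwanted harmonics. Constant low-level noise is much less objectionable to the ear than that distortion.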

In zignal, the dithering method is supposed to be dither(bits), but it has not been implemented yet: calling it raises NotImplementedError.
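Since zignal's dither() is not available yet, here is a minimal sketch of one common technique, TPDF (triangular probability density function) dither. The noise added before rounding is the sum of two independent uniform random values, each spanning one quantization step. This is an illustration under that assumption, not necessarily what zignal will implement. `requantize` is a hypothetical helper. In the example, a sine wave quieter than half a step is rounded away entirely without dither, but survives, buried in noise, with it.

```python
import numpy as np

def requantize(x, bit_depth, dither=True, rng=None):
    """Hypothetical helper: reduce float samples in [-1, 1] to bit_depth bits, optionally with TPDF dither."""
    rng = np.random.default_rng() if rng is None else rng
    max_int = 2 ** (bit_depth - 1) - 1
    scaled = x * max_int                                 # one quantization step is now 1.0
    if dither:
        # TPDF dither: sum of two independent uniform values, each in [-0.5, 0.5) of a step
        scaled = scaled + rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
    return np.clip(np.round(scaled), -max_int - 1, max_int) / max_int

fs, bits = 44100, 8
ref = np.sin(2 * np.pi * 997 * np.arange(fs) / fs)
x = 0.3 / (2 ** (bits - 1) - 1) * ref                   # peak of only 0.3 of a step
plain = requantize(x, bits, dither=False)
dithered = requantize(x, bits, dither=True, rng=np.random.default_rng(0))
print(np.count_nonzero(plain))                           # 0: the tone was rounded away
print(np.dot(dithered, ref) / np.dot(x, ref))            # near 1: the tone is still there on average
```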

to_mono() mixes all the channels down to a single "mono" channel and returns a new sound object. Finally, you can export the sound to a WAV file with write_wav_file(filename).
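For comparison, here is a minimal sketch of both operations with numpy and Python's built-in wave module. It assumes that mixing down means averaging the channels (summing is another option, at the risk of clipping), and it writes 16-bit PCM. `to_mono` and `write_wav_16bit` are hypothetical stand-ins, not zignal's code. Writing to an in-memory buffer keeps the example self-contained; a filename works the same way.

```python
import io
import wave
import numpy as np

def to_mono(samples):
    """Average the channels of a (n_samples, n_channels) array into a 1-D mono signal."""
    return samples.mean(axis=1)

def write_wav_16bit(dest, samples, fs):
    """Write float samples in [-1, 1] (1-D mono or (n, channels)) as 16-bit PCM WAV."""
    data = samples if samples.ndim == 2 else samples[:, None]
    pcm = np.round(np.clip(data, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(data.shape[1])
        wf.setsampwidth(2)                  # 2 bytes = 16 bits per sample
        wf.setframerate(fs)
        wf.writeframes(pcm.tobytes())       # row-major bytes interleave the channels frame by frame

fs = 44100
t = np.arange(fs // 10) / fs
stereo = np.column_stack([0.5 * np.sin(2 * np.pi * 440 * t), 0.5 * np.sin(2 * np.pi * 660 * t)])
mono = to_mono(stereo)

buf = io.BytesIO()                          # a filename such as "out.wav" also works
write_wav_16bit(buf, mono, fs)
buf.seek(0)
with wave.open(buf, "rb") as rf:
    print(rf.getnchannels(), rf.getframerate(), rf.getnframes())   # 1 44100 4410
```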

Under the hood

Zignal uses scipy to generate the actual samples and to export the WAV file. Signal processing is heavily numeric, so numpy is used for the vast majority of the arrays and datatypes, as well as for many other numeric functions used to alter the signal. The plots are made with matplotlib, and in fact you can pass any keyword argument of matplotlib.pyplot.plot() to the plot() method of zignal sounds.

For sample rate conversion (the parts that have been implemented, at least), zignal uses the samplerate package.

Closing words

As you can see, I'm no audio expert 👓, but I tried my best to understand what I wrote.

Zignal is a good signal processing library... for learning how signal processing works. Quite a few things need to be implemented before it can be used in production. Hopefully I will find other, production-ready libraries and write about them in future posts.

Got any python audio libraries you want me to talk about? Let me know in the comments so I can write about them too.

And it goes without saying, if you see any errors in this post, let me know so I can correct them.

Icons made by Freepik from www.flaticon.com

Image by PublicDomainPictures from Pixabay