Before we start
Hello, this article is part of a series on signal processing and is an opportunity for me to document the different concepts I had to grasp to code my audio encoder.
If this subject interests you and you want to learn more, I invite you to follow me or visit this post that I update as I write new articles.
I am open to any remarks or suggestions for improvement; feel free to share your feedback in the comments to contribute to the enrichment of this series of articles.
Happy reading.
Introduction
What I propose today is a small workshop: creating a sound from a few basic mathematical formulas, building on the notions we have seen previously plus a few new ones that will complete our understanding of analogue signals and digitisation.
The goal of this article is not to code a complex program but to illustrate concepts.
I will only give you the necessary algorithms to do it yourself, and at the end, I will put a sandbox where you can directly launch the program that I coded on my side in the language of my choice.
There will be no copy-paste in this article, so you will need to be at least comfortable with the language of your choice.
Settle in comfortably, grab something to drink, and if that's all good, let's go!
Creating a Curve and Retrieving Points
We saw that the shape of an analogue signal is fundamentally based on sinusoidal curves.
We will therefore need our sine function, sin(x), for this.
Then, once we have our signal curve, we will need to digitise it. Usually, to go from an analogue to a digital signal, we already have small electronic components that do it for us: ADCs (Analogue-to-Digital Converters).
A small illustration of an ADC that one can find on the internet.
However, for simplicity, we will not go through all the necessary steps to digitise an analogue signal, because for a single curve, it would be a bit overkill.
We will detail the digitization process in another article.
To play a sound, we need values that we will calculate from our curve; to obtain them, we will proceed with what is called sampling.
Sampling
Sampling is the first step in analog-to-digital conversion (digitization).
It consists of taking "pictures" or samples of the analogue signal at regular intervals to be able to retrieve the original signal from key points.
This step is very important since the rest of the digitization will be based on these data!
The frequency at which these samples are taken is called the sampling frequency.
But wait, I've already heard of frequency; is this the same thing?
No, and that's why I will clarify the difference between the two right away and explain their impact on the sound we perceive, to avoid any confusion.
Frequency of an Analog Signal
The frequency of an analogue signal indicates how many cycles occur in one second.
For example, a frequency of 1 Hz indicates that the wave takes one second to complete a cycle.
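Frequency and period (the time one cycle takes) are simply inverses of one another: T = 1 / f. For example, a 440 Hz wave completes one cycle every 1/440 ≈ 2.27 ms.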
Changing the frequency produces sounds of different pitches: the higher the frequency, the higher the perceived pitch (loudness, on the other hand, depends on the amplitude).
You can find examples here: examples of the impact of frequency on sound.
We will see in another article the relationship between frequency and musical notes.
Sampling Frequency
The sampling frequency indicates the number of samples we take per second to represent the analogue signal digitally.
The higher the sampling frequency, the more precise the digital representation will be.
Frequency of an Analog Signal and Sampling Frequency
Understanding the Correlation Between Wave Frequency and Sampling
- Frequency of a signal: It determines the number of zigzags or cycles the signal has over a certain distance.
- Sampling frequency: It determines how frequently we place points along these zigzags to capture the signal's information.
If you increase the signal frequency (more zigzags) but keep the sampling frequency constant (the markers are always spaced the same distance apart), then yes, you will cross more zigzags, but you will still cross the same number of markers over a given distance.
In terms of audio, this means that if you have a constant sampling frequency (say 44.1 kHz <=> 44,100 samples per second), but you increase the frequency of the wave you generate, you will have more wave cycles (more "high-pitched notes") in the same period, but each note will still be represented by the same number of samples.
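To put numbers on it: at 44.1 kHz, one second of signal always contains 44,100 samples, but a 440 Hz wave spreads them over 440 cycles (44,100 / 440 ≈ 100 samples per cycle), while an 880 Hz wave spreads them over 880 cycles (only ≈ 50 samples per cycle).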
According to the Nyquist-Shannon theorem, the sampling frequency must be at least twice as high as the highest frequency present in the analogue signal to avoid aliasing (signal distortion caused by higher frequencies).
However, the more we increase the number of samples per second, the larger our file size will be because we need to store much more information.
Illustration of the Nyquist-Shannon Principle
The main idea of the theorem is that we must be able to place at least 2 points per cycle on a signal to identify its curve.
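Written as a formula, this gives f_s ≥ 2 × f_max, where f_max is the highest frequency contained in the signal. For example, to capture a 5 kHz tone faithfully, you need at least 10,000 samples per second.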
If we take an example where the sampling frequency is too low compared to the signal frequency, we might end up with the following result:
Some cycles end up with no more than one point, which might complicate retrieving the original signal.
This time, let's see what happens with a sampling frequency four times higher than the signal frequency:
A very important thing to observe here is that the phase of the signal is zero (this will matter later).
The samples, represented by our red crosses, are taken at a regular interval, the sampling period, which is simply the inverse of the sampling frequency: T = 1 / f_s (a value easily converted to milliseconds).
But wait, we said that twice the signal frequency was enough, so why did we take four times the wave frequency here?
If we had taken a sampling frequency of exactly twice the wave frequency, here's what it would have looked like:
Strange, right? Why are all our points at zero here?
Because of our phase value!
Yes, it can skew our sampling and make us believe that our original signal is flat.
This situation is a form of undersampling and illustrates why choosing the sampling frequency, and a sampling phase (not to be confused with the signal's own phase value), can be important.
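We can check this with a quick calculation: with a signal phase of zero and a sampling frequency of exactly 2f, the k-th sample is taken at time k / (2f), so its value is sin(2π · f · k / (2f)) = sin(π · k) = 0 for every integer k. Every sample falls exactly on a zero crossing of the sine.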
Sampling Phase
Just as the signal's phase indicates where our signal starts, the sampling phase tells us at which point of the curve (and therefore at which amplitude) we take our first sample. We introduce a small offset so as not to be impacted by the signal's phase value.
If I now define a small offset for our sampling phase, I should start at a moment where our curve is not at an amplitude of 0.
Great! We can now manage to observe when our curve seems to rise and fall.
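To convince yourself of the effect of this offset, here is a small sketch in TypeScript (the language used in the sandbox at the end of the article); the frequency value is purely illustrative:

```typescript
// Sampling a sine at exactly twice its frequency, with and without a sampling phase offset.
const signalFrequency = 100;                    // Hz, purely illustrative
const samplingFrequency = 2 * signalFrequency;  // exactly twice the signal frequency
const samplePeriod = 1 / samplingFrequency;     // time between two samples, in seconds

function sampleSine(offset: number, count: number): number[] {
  const samples: number[] = [];
  for (let k = 0; k < count; k++) {
    const t = k * samplePeriod + offset;        // sampling instant, shifted by the offset
    samples.push(Math.sin(2 * Math.PI * signalFrequency * t));
  }
  return samples;
}

console.log(sampleSine(0, 8));                 // all ~0: every sample lands on a zero crossing
console.log(sampleSine(samplePeriod / 4, 8));  // alternating ~±0.71: the wave becomes visible again
```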
OK, I understand why we need a high sampling frequency, but I see the value 44.1 kHz everywhere; why do we use this particular sampling frequency?
The Human Ear and Science
Humans are capable of hearing sounds whose frequencies vary roughly between 20 Hz and 20 kHz (20,000 Hz).
As we saw earlier, the Nyquist-Shannon theorem indicates that we need a sampling frequency at least twice as high as the highest frequency of the wave.
So 2 × 20,000 Hz (because 20 kHz is the highest frequency that the human ear can perceive) gives us 40,000 Hz.
We're not far from our 44,100 Hz, but why do we have 4,100 Hz extra?
There are several reasons for this, but the main one today is the following:
- Margin for anti-aliasing filters:
- We keep a small margin to allow the design of filters that eliminate frequencies beyond the audible bandwidth (wave frequencies higher than 20 kHz).
We will discuss the concept of psychoacoustics a bit later, when we talk about quantization and signal filtering.
Hey! I wanted to create a sound, not read a lecture on signal processing!
Now, we should have almost all the elements to determine our values on our curve and listen to the sound generated by it!
Sampling Our Signal
We will need the following values:
- The frequency of our signal.
- The sampling frequency.
- The desired duration for our signal.
Today, we will play the pitch A440 (standard concert pitch), which corresponds to a signal frequency of 440 Hz, and we will sample it at a frequency of 44,100 Hz, because we can afford it, right!
Lastly, I decided that our signal would have a duration of 4 seconds.
To summarize:
- Signal frequency: 440 Hz
- Sampling frequency: 44,100 Hz (44.1 kHz)
- Duration: 4 seconds
If we make a graph of this thing, zooming in on the first ten milliseconds (otherwise it would be unreadable), we will have:
All this is cool, but now let's move on to our program for calculating points on these curves.
Do you remember in the first article when I represented my curve over a 2π-radian interval and explained that it was practical to express it in radians because 2π corresponds to one period of sin?
In fact, 2π is much more useful than that; in trigonometry, it lets the sine and cosine functions go around a circle and define angles all along it.
We can express these angles in radians (rad) or in degrees (°).
Uh, OK, but why are you talking about this again?
Actually, we need a way to know how we will move along our curve to find our points.
We need a kind of compass that will indicate to us, considering all the samples and the duration of our signal, in which direction we will move on our curve according to the curve's frequency.
Our compass will therefore be an angle, and to have an angle, we can use our 2π.
We need a constant angle between two consecutive samples, one that takes into account how many samples we capture every second, so that over the whole duration of our signal (four seconds here) the curve oscillates at the right frequency.
Reminder: the sampling frequency is the number of samples per second that we take.
So, calculating the total number of samples over a four-second interval simply consists of multiplying the sampling frequency by the number of seconds: nsamps = 44,100 × 4 = 176,400 samples.
We also need to calculate the constant angle for a single sample, a full turn of 2π spread over one second's worth of samples: angle = 2π / SampleRate.
We will proceed with the following steps:
1. Calculate the total number of samples to be taken over the four seconds: nsamps = SampleRate × Duration.
2. Calculate the constant angle for one sample: angle = 2π / SampleRate.
3. For each sample i (from 0 to nsamps − 1), calculate its value: sample = sin(angle × Frequency × i).
In algorithm:
```
Pre-define: Duration, SampleRate, Frequency

Start:
    nsamps <- Duration * SampleRate
    angle  <- (2 * π) / SampleRate

    foreach i of nsamps
        sample <- sin(angle * Frequency * i)
End.
```
I invite you to first code this rather simple algorithm, but which already does everything we wanted.
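If you want something to compare your own attempt against, here is one possible sketch of this algorithm in TypeScript (the language used in the sandbox at the end); for now it only computes the samples in memory:

```typescript
// Generate the samples of a 440 Hz sine lasting 4 seconds at 44.1 kHz.
const DURATION = 4;          // seconds
const SAMPLE_RATE = 44_100;  // samples per second
const FREQUENCY = 440;       // Hz (our A440)

const nsamps = DURATION * SAMPLE_RATE;       // 176,400 samples in total
const angle = (2 * Math.PI) / SAMPLE_RATE;   // constant angle for one sample

const samples = new Float32Array(nsamps);
for (let i = 0; i < nsamps; i++) {
  samples[i] = Math.sin(angle * FREQUENCY * i);  // value of the i-th sample
}

console.log(samples.subarray(0, 5));  // peek at the first few values of our curve
```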
Storing Our Values
I don't know if you remember, but in the first article I briefly introduced digital signals and told you that they are a discrete representation (in binary form) of a signal.
To store our sample values in the form of a digital signal, we need to transform these floating-point values into binary!
This is a very important process that usually occurs in the last stages of digitization to be able to store the processed information on our computers.
However, the more computer-savvy among you may know that, depending on your machine's architecture, the computer does not read and store the bytes (groups of 8 bits) that make up a value in the same order.
Little-Endian and Big-Endian
There are thus two possible byte orders, depending on your architecture:
- Big-Endian: the most significant byte comes first (at the lowest memory address) and the least significant byte last (at the highest address).
- Little-Endian: the least significant byte comes first (at the lowest memory address) and the most significant byte last (at the highest address).
See it a bit like the difference between reading a novel and reading a manga:
- We read a novel from left to right where the most important information will be at the end of the book. (Little-Endian)
- A manga, on the other hand, is read from right to left. (Big-Endian)
If you are coding the program on your side and are on Linux, you can type the following command to find out if your machine is Little-Endian or Big-Endian.

```bash
lscpu | grep "Byte Order"
```
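If you prefer to check from code rather than with lscpu, here is a small TypeScript sketch: typed arrays use your machine's native byte order, while a DataView lets you force one order or the other.

```typescript
// Detect the native byte order of the machine running this code.
const probe = new Uint8Array(new Uint32Array([0x11223344]).buffer);
console.log(probe[0] === 0x44 ? "Little-Endian" : "Big-Endian");

// Force both orders explicitly with a DataView and compare the resulting bytes.
const buffer = new ArrayBuffer(4);
const view = new DataView(buffer);
const bytes = new Uint8Array(buffer);

view.setUint32(0, 0x11223344, true);                 // little-endian
console.log([...bytes].map((b) => b.toString(16)));  // [ "44", "33", "22", "11" ]

view.setUint32(0, 0x11223344, false);                // big-endian
console.log([...bytes].map((b) => b.toString(16)));  // [ "11", "22", "33", "44" ]
```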
We will therefore complete our algorithm so that it writes the sample values computed in our loop to a file:
```
Pre-define: Duration, SampleRate, Frequency

Start:
    --- New ---
    fileName <- "out.bin"
    fd <- os.Create("path/to/write/file" + fileName)
    --- EndNew ---

    nsamps <- Duration * SampleRate
    angle  <- (2 * π) / SampleRate

    foreach i of nsamps
        sample <- sin(angle * Frequency * i)

        --- New ---
        bufByte <- LittleEndian(sample)
        bytesWritten <- fd.write(bufByte)
        show("Wrote " + sample + " in " + bytesWritten + " bytes")
        --- EndNew ---
End.
```
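As a point of comparison, here is a minimal sketch of the complete algorithm in TypeScript with Bun (the runtime used in the sandbox below); the output file name is just an example:

```typescript
// Generate a 440 Hz sine (4 s at 44.1 kHz) and write it as little-endian 32-bit floats.
const DURATION = 4;          // seconds
const SAMPLE_RATE = 44_100;  // samples per second
const FREQUENCY = 440;       // Hz (A440)

const nsamps = DURATION * SAMPLE_RATE;
const angle = (2 * Math.PI) / SAMPLE_RATE;

// One 32-bit float (4 bytes) per sample.
const buffer = new ArrayBuffer(nsamps * 4);
const view = new DataView(buffer);

for (let i = 0; i < nsamps; i++) {
  const sample = Math.sin(angle * FREQUENCY * i);
  view.setFloat32(i * 4, sample, true);  // true = write the bytes in little-endian order
}

await Bun.write("out.bin", new Uint8Array(buffer));  // with Node, fs.writeFileSync would do the same
console.log(`Wrote ${nsamps} samples to out.bin`);
```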
If you have managed to code this algorithm, you should now have a file in which there is a binary representation of your signal.
To listen to it, you can use software like Audacity and open the file as a "raw audio file".
You just need to switch to a single (mono) channel and select the correct encoding (32-bit float, little-endian). Otherwise, you can execute the following command, but you will need to install FFmpeg on your machine:
```bash
ffplay -f f32le -ar 44100 -showmode 1 out.bin
```
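A quick word on these flags: -f f32le tells ffplay that the file contains raw samples encoded as 32-bit floats in little-endian order, -ar 44100 gives it the sampling frequency we used, and -showmode 1 displays the waveform while the sound plays.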
Demo
Sandbox: JavaScript + Bun
To Conclude
First of all, congratulations on making it this far. I know it must not have been easy, but I hope you learned a few things and had a little fun.
If you are interested in this series, you can follow me so as not to miss new articles and leave a little comment if you like it.
Otherwise, I'll see you next time for the next article, which will be a bit softer on math, frequency, and music.