I want to make it so that if a sound happens in my front room, that triggers something else to happen.
So my first step is detecting a sound.
How can I do that? Well, I have a webcam with a mic, and a Raspberry Pi. Maybe I can combine them in some way to do that. I'm a programmer, right? So it can't be that hard...
Step 1. Get sound from a mic.
There are plenty of ways to do this, for example PyAudio, but I know all I really want is a stream of bytes. arecord exists on my Raspberry Pi by default, so let's see if I can get a stream to pipe out from there.
The first conundrum is: what is my device name? Luckily arecord has an option to tell me: arecord -L (short for --list-pcms) lists the PCM devices it knows about.
Well, this is a long list, but I see something called "sysdefault" which is a promising name:
sysdefault:CARD=C920
    HD Pro Webcam C920, USB Audio
    Default Audio Device
OK, so, what else? Well, let's test this by only recording for a second, and recording to a file.
arecord --device="sysdefault:CARD=C920" --duration=1 file.wav
Recording WAVE 'file.wav' : Unsigned 8 bit, rate 8000 Hz, Mono
And I have a one second long wave file I can listen to, if I want.
So, if I want to pipe this into something, I just need to change file.wav to -, which is the common command-line shorthand for stdout. (If you ran the command substituting file.wav with - then the stream would be written to your terminal, which would be quite ugly.)
Step 2. What is a sound?
Signal processing is a rich field of study that has produced many useful tools for working on analogue and digital data to determine things about it. It's where we get data compression, for example, as well as those nifty visualisations on music players.
There are plenty of methods for determining if something is a signal or just noise. For example, I could take a Fast Fourier Transform of the data and see if the peaks in frequency are what I want to hear.
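To give a feel for the frequency-peak idea, here is a minimal sketch that finds the strongest frequency in a buffer. It uses a naive discrete Fourier transform written with the standard library rather than a real FFT, so it is only suitable for short buffers. The function name `dominant_frequency` and the 440 Hz test tone are made up for illustration.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the strongest non-DC component.

    Naive O(n^2) DFT for illustration only; a real FFT library would be
    the sensible choice for anything longer than a short buffer.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]  # remove the DC offset
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centred))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

# A synthetic 440 Hz tone in unsigned 8-bit mono at 8000 Hz (0.1 seconds).
tone = bytes(int(128 + 50 * math.sin(2 * math.pi * 440 * i / 8000))
             for i in range(800))
peak = dominant_frequency(tone, 8000)
print(peak)  # 440.0
```

Deciding whether a peak is "what I want to hear" would still need a rule on top of this, such as a target frequency range.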
Except that seems like a lot of work. All I want to know is if any sound has happened, and the easiest tool for the job is the trusty root mean square of the error (aka RMS of the error).
So, the first step here is to determine what "no sound" means. To do that I can calculate the RMS of the signal in a quiet place and save that value.
import math

# buffer is a bytes object holding unsigned 8-bit mono samples
sumsq = 0
count = 0
for b in buffer:
    count += 1
    sumsq += b*b
mean = sumsq/count
rms = math.sqrt(mean)
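Because the sound stream's samples are 8 bits, and the stream is in mono configuration, rather than stereo, each byte is exactly one sample. In Python 3, iterating over a bytes object yields each byte as an integer from 0 to 255, so it's easy to calculate the RMS of the signal from individual bytes. A different setup would require some changes to the calculation.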
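As an illustration of the kind of change a different setup would need, here is a rough sketch for interleaved, signed 16-bit little-endian stereo (the format arecord calls S16_LE). This is not what the rest of this post uses, and the function name `rms_16bit_stereo` is hypothetical.

```python
import math
import struct

def rms_16bit_stereo(raw):
    """Return (left_rms, right_rms) for interleaved signed 16-bit LE stereo.

    Each frame is 4 bytes: 2 for the left sample, 2 for the right.
    """
    usable = len(raw) - len(raw) % 4  # drop any trailing partial frame
    left_sumsq = 0
    right_sumsq = 0
    frames = 0
    for left, right in struct.iter_unpack('<hh', raw[:usable]):
        left_sumsq += left * left
        right_sumsq += right * right
        frames += 1
    if frames == 0:
        return (0.0, 0.0)
    return (math.sqrt(left_sumsq / frames), math.sqrt(right_sumsq / frames))

# Two frames of made-up data: left is always 3, right swings between -4 and 4.
raw = struct.pack('<hhhh', 3, -4, 3, 4)
print(rms_16bit_stereo(raw))  # (3.0, 4.0)
```

Signed samples are already centred on zero, so silence sits near 0 rather than in the middle of the 0 to 255 range.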
So, I can take that RMS value and save it to a file, and use it later to calculate the RMS of the error to see if there is a signal.
import math

def rmsError(buffer, quiet_value):
    # RMS of the difference between each sample and the quiet value
    sumsqdiff = 0
    count = 0
    for b in buffer:
        count += 1
        sumsqdiff += (b-quiet_value)**2
    mean = sumsqdiff / count
    rms = math.sqrt(mean)
    return rms
Just like that.
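To make the numbers concrete, here is a small worked sketch of the calibrate-then-compare idea using made-up buffers instead of real recordings. The helper names `rms`, `rms_error` and `save_calibration` are illustrative restatements of the calculations above, and the calibration file is written to a temporary directory so nothing real gets overwritten.

```python
import math
import os
import tempfile

def rms(buffer):
    """RMS of unsigned 8-bit mono samples (one sample per byte)."""
    return math.sqrt(sum(b * b for b in buffer) / len(buffer))

def rms_error(buffer, quiet_value):
    """RMS of the difference between each sample and the quiet value."""
    return math.sqrt(sum((b - quiet_value) ** 2 for b in buffer) / len(buffer))

def save_calibration(buffer, path):
    """Write the RMS of a quiet recording to a text file and return it."""
    value = rms(buffer)
    with open(path, 'w') as califile:
        califile.write('%f\n' % value)
    return value

# Made-up data standing in for real recordings.
quiet_room = bytes([128] * 8)   # a perfectly flat "silent" buffer
clap = bytes([126, 130] * 4)    # samples swinging 2 either side of 128

calibration_path = os.path.join(tempfile.mkdtemp(), 'calibration.txt')
quiet_value = save_calibration(quiet_room, calibration_path)

print(quiet_value)                         # 128.0
print(rms_error(quiet_room, quiet_value))  # 0.0
print(rms_error(clap, quiet_value))        # 2.0
```

A real quiet room will never be perfectly flat, so its RMS error will sit a little above zero. That is why the threshold for "sound" has to be found by experiment.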
Step 3. Plumbing the sound pipes
So, I know how to generate the stream of sound, and I know how I am going to process it, but how do I get it into python to do so?
For this I shall use the subprocess module, specifically the Popen class. This is the line:
proc = subprocess.Popen(['arecord', '--device=sysdefault:CARD=C920', '-M', '--rate=32000', '-'], stdout=subprocess.PIPE)
The '-' at the end of the list is the file name which, if you recall, is the shorthand for stdout. I've also upped the sampling rate from 8000 to 32000, just because I can, and set the -M flag, which means arecord will memory-map the stream, which seems pretty fancy.
The first thing to note is that I have omitted a duration, which means the mic will be listened to until the process is stopped by some method (see below if you can't wait to find out).
Next, note that the --device argument is system dependent. You should use the method I detailed above to determine your own mic's PCM name.
The other, important, thing to note is that the stdout parameter is assigned subprocess.PIPE, which allows me to read the output arecord is generating.
To get the buffer that will be analysed, I just need to read a certain number of bytes from the stdout stream:
buffer = proc.stdout.read(32000)
Thanks to the stream being 8-bit and mono, the number of bytes per second is the same as the sampling frequency! Isn't that just dandy.
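The arithmetic behind that is simple enough to sketch. The helper name `bytes_per_second` is made up for illustration; the general rule is sample rate times channels times bytes per sample.

```python
def bytes_per_second(rate, channels=1, bits_per_sample=8):
    """Bytes of raw PCM audio produced per second of recording."""
    return rate * channels * (bits_per_sample // 8)

print(bytes_per_second(32000))         # 32000: 8-bit mono, as used here
print(bytes_per_second(44100, 2, 16))  # 176400: CD-style 16-bit stereo
```

So reading 32000 bytes from this particular stream gives one second of audio, but the same read on a 16-bit stereo stream would cover only a quarter of a second.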
So, pulling it all together:
import subprocess
import math
import os
import signal as killsig

# rmsError is the function defined in Step 2

def detect():
    quiet_val = 127.0
    with open('calibration.txt', 'r') as califile:
        quiet_val_str = califile.readline()
        quiet_val = float(quiet_val_str)

    proc = subprocess.Popen(['arecord', '--device=sysdefault:CARD=C920', '-M', '--rate=32000', '-'], stdout=subprocess.PIPE)
    proc.stdout.read(44)  # skip the file header

    signal = 0
    while signal < 2:
        buffer = proc.stdout.read(32000)
        signal = rmsError(buffer, quiet_val)
        print('signal: %f' % signal)

    # the signal module is imported as killsig because the local
    # variable named signal holds the RMS error
    os.kill(proc.pid, killsig.SIGINT)
So, as noted in the code, I want to skip the file header, so it's not considered part of the signal. I figured out the length by looking at file.wav. I'm sure it's in an RFC or other specification somewhere, but who has the time to check?
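One way to sanity-check the 44 without reading a specification is to let Python's standard wave module write an empty PCM file and count the bytes. This is only a sketch: it shows the size of the plain PCM header that the wave module produces, and it assumes arecord's header has the same layout, which matches what file.wav showed. The function name `make_wav_header` is made up.

```python
import io
import struct
import wave

def make_wav_header(rate=8000, channels=1, sample_width=1):
    """Return the bytes of a PCM WAV file that contains no audio frames."""
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # 1 byte per sample = 8-bit audio
        w.setframerate(rate)
        w.writeframes(b'')
    return buf.getvalue()

header = make_wav_header()
print(len(header))  # 44

# A few of the fields packed into those 44 bytes (all little-endian).
riff_id, wave_id = header[0:4], header[8:12]
format_tag, channels, rate = struct.unpack('<HHI', header[20:28])
print(riff_id, wave_id, format_tag, channels, rate)
```

WAV files can carry extra chunks that make the header longer, so a fixed skip of 44 bytes is a shortcut that happens to fit this stream rather than a general rule.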
I've decided that "sound" is anything with an RMS error of 2 or above. I got that from some experimenting. I could set it higher if I want a less sensitive sound detector.
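I decided to use os.kill(proc.pid, killsig.SIGINT) (which is the equivalent of pressing CTRL+C in the terminal) over proc.terminate() because it seems gentler. Note that the signal module has to be referred to as killsig here, because signal is already the name of my variable holding the RMS error. Methods like proc.terminate() are fine, I'm just fussy.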
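For anyone who wants the gentle approach with a safety net, here is a hedged sketch that sends SIGINT through Popen's own send_signal method, waits a moment, and only falls back to terminate() if the process is still running. The function name `stop_gently` and the two-second timeout are arbitrary choices, and a sleeping Python child stands in for arecord so the example runs without a microphone (on Linux or another Unix-like system).

```python
import signal
import subprocess
import sys

def stop_gently(proc, timeout=2.0):
    """Send SIGINT, wait for exit, and fall back to terminate() if needed."""
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.terminate()
        return proc.wait()

# A stand-in child process that would otherwise run for a minute.
child = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(60)'])
exit_code = stop_gently(child)
print('child exited with', exit_code)
```

Waiting on the process after signalling it also avoids leaving a zombie process behind.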
This is a quick and kinda dirty, but not very dirty, way to detect sound from a mic.
I have defined sound to be any signal that has an RMS error of 2 or above from the quietness value I calculated.
There is no error detection in this code, so if anything goes wrong, e.g. the calibration file isn't present, or the device name is wrong, the script will crash inelegantly. Something for the readers to look into 😉.
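As a starting point for that, here is a minimal sketch that handles just one of those failures: a missing or unreadable calibration file. It falls back to the same 127.0 default the script starts with. The function name `load_quiet_value` is made up, and a wrong device name would need separate handling, for example by checking whether arecord exited early.

```python
import os
import tempfile

def load_quiet_value(path='calibration.txt', default=127.0):
    """Read the calibrated quiet value, or return the default if that fails."""
    try:
        with open(path, 'r') as califile:
            return float(califile.readline())
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file,
        # ValueError covers an empty or non-numeric first line.
        return default

# Usage with a throwaway directory so no real calibration file is touched.
tmp_dir = tempfile.mkdtemp()
missing = load_quiet_value(os.path.join(tmp_dir, 'nope.txt'))

good_path = os.path.join(tmp_dir, 'calibration.txt')
with open(good_path, 'w') as f:
    f.write('128.5\n')
loaded = load_quiet_value(good_path)

print(missing, loaded)  # 127.0 128.5
```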
Now I have to figure out what to do once I have detected a sound...
Thanks for reading and let me know in the comments if you found this helpful, any tips to improve it, or anything else!