Matt Ellen

Detecting sound via a webcam mic in Linux

I want to make it so that if a sound happens in my front room, that triggers something else to happen.

So my first step is detecting a sound.

How can I do that? Well, I have a webcam with a mic, and a Raspberry Pi. Maybe I can combine them in some way to do that. I'm a programmer, right? So it can't be that hard...

Step 1. Get sound from a mic.

There are plenty of ways to do this, for example PyAudio, but I know all I really want is a stream of bytes. arecord exists on my Raspberry Pi by default, so let's see if I can get a stream to pipe out from there.

The first conundrum is: what is my device name? Luckily arecord has an option to tell me.

arecord --list-pcms

Well, this is a long list, but I see something called "sysdefault" which is a promising name:

    HD Pro Webcam C920, USB Audio
    Default Audio Device

OK, so, what else? Well, let's test this by only recording for a second, and recording to a file.

arecord --device="sysdefault:CARD=C920" --duration=1 file.wav

Looks good!

Recording WAVE 'file.wav' : Unsigned 8 bit, rate 8000 Hz, Mono

And I have a one second long wave file I can listen to, if I want.

So, if I want to pipe this into something, I just need to change file.wav to -, which arecord (like many command-line tools) treats as shorthand for stdout. (If you ran the command with - in place of file.wav straight from a terminal, the raw stream would be written to your terminal, which would be quite ugly.)

Step 2. What is a sound?

Signal processing is a rich field of study that has produced many useful tools for working on analogue and digital data to determine things about it. It's where we get data compression, for example, as well as those nifty visualisations on music players.

There are plenty of methods for determining if something is a signal or just noise. For example, I could take a Fast Fourier Transform of the data and see if the peaks in frequency are what I want to hear.

Except that seems like a lot of work. All I want to know is if any sound has happened, and the easiest tool for the job is the trusty root mean square of the error (aka RMS of the error).

So, the first step here is to determine what "no sound" means. To do that I can calculate the RMS of the signal in a quiet place and save that value.

import math

# buffer holds unsigned 8-bit mono samples, one byte each
sumsq = 0
count = 0

for b in buffer:
  count += 1
  sumsq += b*b

mean = sumsq/count
rms = math.sqrt(mean)

Because each sample in the stream is 8 bits wide, and the stream is mono rather than stereo, every byte is exactly one sample, so it's easy to calculate the RMS of the signal from individual bytes. A different setup would require some changes to the calculation.
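As a rough illustration of what those changes might look like, here is a minimal sketch for one different setup: signed 16-bit little-endian mono (the format arecord calls S16_LE). The helper name `rms_s16le` is made up for this example and is not part of the script in this post. Each sample now spans two bytes, so the bytes have to be unpacked into integers before squaring.

```python
import math
import struct

def rms_s16le(buffer):
  """RMS of signed 16-bit little-endian mono samples (hypothetical helper)."""
  # Ignore a trailing odd byte so the buffer splits cleanly into 2-byte samples
  usable = len(buffer) - (len(buffer) % 2)
  if usable == 0:
    return 0.0

  sumsq = 0
  count = 0

  # '<h' decodes one little-endian signed 16-bit integer per step
  for (sample,) in struct.iter_unpack('<h', buffer[:usable]):
    count += 1
    sumsq += sample * sample

  return math.sqrt(sumsq / count)

if __name__ == '__main__':
  # Synthetic samples, not a real recording
  demo = struct.pack('<4h', 1000, -1000, 1000, -1000)
  print(rms_s16le(demo))
```

With a signed format, silence sits near 0 rather than near the middle of the byte range, so the quiet value would be small instead of around 128.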

So, I can take that RMS value and save it to a file, and use it later to calculate the RMS of the error to see if there is a signal.
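A minimal sketch of that calibration step could look like this. The function names `rms` and `calibrate` are invented for this example; the file name calibration.txt matches the one the detection script opens. It assumes the buffer is a bytes object of unsigned 8-bit mono samples recorded while the room is quiet, and here a synthetic buffer stands in for a real recording.

```python
import math
import os
import tempfile

def rms(buffer):
  """RMS of unsigned 8-bit samples, measured from zero."""
  sumsq = sum(b * b for b in buffer)
  return math.sqrt(sumsq / len(buffer))

def calibrate(buffer, path='calibration.txt'):
  """Write the RMS of a quiet recording to a one-line text file."""
  quiet_value = rms(buffer)
  with open(path, 'w') as califile:
    califile.write('%f\n' % quiet_value)
  return quiet_value

if __name__ == '__main__':
  # Stand-in for a quiet recording: every sample at the 8-bit midpoint
  silence = bytes([128]) * 8000
  with tempfile.TemporaryDirectory() as tmp:
    print(calibrate(silence, os.path.join(tmp, 'calibration.txt')))
```

Unsigned 8-bit audio rests at the midpoint of its range, so a truly silent recording comes out close to 128 rather than 0.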

def rmsError(buffer, quiet_value):
  sumsqdiff = 0
  count = 0

  for b in buffer:
    count += 1
    sumsqdiff += (b-quiet_value)**2

  mean = sumsqdiff / count
  rms = math.sqrt(mean)

  return rms

Just like that.
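To get a feel for the numbers, here is a small made-up example. The buffers are synthetic, not real recordings, and the quiet value of 128 is assumed purely for illustration. The function is repeated under the name `rms_error` so the snippet runs on its own.

```python
import math

def rms_error(buffer, quiet_value):
  """RMS of the difference between each sample and the quiet value."""
  sumsqdiff = sum((b - quiet_value) ** 2 for b in buffer)
  return math.sqrt(sumsqdiff / len(buffer))

# Synthetic "silence": every sample sits exactly on the assumed quiet value
silent = bytes([128]) * 1000

# Synthetic "sound": samples alternate 10 below and 10 above the quiet value
noisy = bytes([118, 138]) * 500

print(rms_error(silent, 128.0))  # 0.0
print(rms_error(noisy, 128.0))   # 10.0
```

So a buffer that swings 10 steps either side of the quiet value gives an RMS error of 10, and a perfectly quiet one gives 0.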

Step 3. Plumbing the sound pipes

So, I know how to generate the stream of sound, and I know how I am going to process it, but how do I get it into Python to do so?

For this I shall use the subprocess module, specifically its Popen class.

This is the line:

proc = subprocess.Popen(['arecord', '--device=sysdefault:CARD=C920', '-M', '--rate=32000', '-'], stdout=subprocess.PIPE)

The '-' at the end of the list is the file name which, if you recall, is the shorthand for stdout. I've also upped the sampling rate from 8000 to 32000, just because I can, and set the -M flag, which means arecord will memory map the stream, which seems pretty fancy.

The first thing to note is that I have omitted a duration, which means arecord will keep recording from the mic until the process is stopped by some method (see below if you can't wait to find out).

Next, note that the --device argument is system dependent. You should use the method I detailed above to determine your own mic's pcm name.

The other, important, thing to note is that the stdout parameter is assigned subprocess.PIPE, which allows me to read the output arecord is generating.

To get the buffer that will be analysed, I just need to read a certain number of bytes from the stdout stream:

buffer =

Thanks to the stream being 8-bit mono, the number of bytes per second is the same as the sampling frequency! Isn't that just dandy?
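Here is a small sketch of what such a read can look like. Since arecord and a webcam won't be available everywhere, it uses a stand-in child process that writes 8-bit "silence" to its stdout. The stand-in, the `RATE` constant and the `read_seconds` helper are assumptions made for this example, not part of the original script.

```python
import subprocess
import sys

RATE = 32000  # samples per second; for 8-bit mono this is also bytes per second

def read_seconds(stream, seconds, rate=RATE):
  """Read roughly `seconds` worth of 8-bit mono audio from a binary stream."""
  return stream.read(int(rate * seconds))

def demo():
  # Stand-in for arecord: a child Python process that emits two seconds of bytes
  child_code = 'import sys; sys.stdout.buffer.write(bytes([128]) * 64000)'
  with subprocess.Popen([sys.executable, '-c', child_code],
                        stdout=subprocess.PIPE) as proc:
    buffer = read_seconds(proc.stdout, 1)
    proc.stdout.read()  # drain the rest so the child can exit cleanly
  return len(buffer)

if __name__ == '__main__':
  print(demo())
```

With the default buffering, a read on the pipe blocks until the requested number of bytes has arrived or the stream ends, so one call yields one second of samples.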

So, pulling it all together:

import subprocess
import math
import os
import signal as killsig

def detect():
  quiet_val = 127.0
  with open('calibration.txt', 'r') as califile:
    quiet_val_str = califile.readline()
    quiet_val = float(quiet_val_str)

  proc = subprocess.Popen(['arecord', '--device=sysdefault:CARD=C920', '-M', '--rate=32000', '-'], stdout=subprocess.PIPE) #skip the file header

  signal = 0

  while signal < 2:
    buffer =
    signal = rmsError(buffer, quiet_val)

  print('signal: %f'%signal)

  os.kill(, killsig.SIGINT)

So, as noted in the code, I want to skip the file header, so it's not considered part of the signal. I figured out the length by looking at file.wav. I'm sure it's in an RFC or other specification somewhere, but who has the time to check?
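For what it's worth, a canonical PCM WAV header is 44 bytes long. The sketch below doesn't rely on that number: it walks the RIFF chunks and reports where the sample data starts. It builds a test file in memory with Python's wave module, and the helper names `data_offset` and `make_test_wav` are invented for this example. Whether the header arecord streams has exactly this layout is an assumption worth checking against your own file.wav.

```python
import io
import struct
import wave

def data_offset(raw):
  """Return the byte offset where sample data starts in a RIFF/WAVE blob."""
  if raw[0:4] != b'RIFF' or raw[8:12] != b'WAVE':
    raise ValueError('not a RIFF/WAVE stream')
  pos = 12
  while pos + 8 <= len(raw):
    # Each chunk starts with a 4-byte id and a little-endian 32-bit size
    chunk_id, chunk_size = struct.unpack('<4sI', raw[pos:pos + 8])
    if chunk_id == b'data':
      return pos + 8
    # Chunks are padded to an even number of bytes
    pos += 8 + chunk_size + (chunk_size & 1)
  raise ValueError('no data chunk found')

def make_test_wav(samples, rate=8000):
  """Build an 8-bit mono WAV file in memory (test data only)."""
  out = io.BytesIO()
  with wave.open(out, 'wb') as w:
    w.setnchannels(1)
    w.setsampwidth(1)
    w.setframerate(rate)
    w.writeframes(bytes(samples))
  return out.getvalue()

if __name__ == '__main__':
  print(data_offset(make_test_wav([128] * 10)))  # 44
```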

I've decided that "sound" is anything with an RMS error of 2 or above. I got that from some experimenting. I could set it higher if I want a less sensitive sound detector.

I decided to use os.kill(, signal.SIGINT) (which is the equivalent of pressing CTRL+C in the terminal) over proc.kill() or proc.terminate() because it seems gentler. Those other methods are fine, I'm just fussy.
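Putting the pieces together, here is one way the whole thing could look as a self-contained sketch. It is not the original script: the 44-byte header length, the one-second chunk size and the use of `proc.pid` in `os.kill` are all assumptions filled in for illustration, and the function names are made up. The detection loop takes any binary stream, so it can be tried on fake data without a microphone.

```python
import io
import math
import os
import signal
import subprocess

HEADER_BYTES = 44   # assumed length of the WAV header to skip
RATE = 32000        # samples (and, for 8-bit mono, bytes) per second
THRESHOLD = 2       # RMS error at or above this counts as sound

def rms_error(buffer, quiet_value):
  sumsqdiff = sum((b - quiet_value) ** 2 for b in buffer)
  return math.sqrt(sumsqdiff / len(buffer))

def detect_in_stream(stream, quiet_value, chunk=RATE):
  """Return the first RMS error >= THRESHOLD, or None if the stream ends."""
  stream.read(HEADER_BYTES)  # skip the file header
  while True:
    buffer = stream.read(chunk)
    if not buffer:
      return None  # stream ended without any sound
    level = rms_error(buffer, quiet_value)
    if level >= THRESHOLD:
      return level

def detect_from_mic(quiet_value, device='sysdefault:CARD=C920'):
  """Listen via arecord until a sound is detected (needs arecord and a mic)."""
  proc = subprocess.Popen(
    ['arecord', '--device=' + device, '-M', '--rate=%d' % RATE, '-'],
    stdout=subprocess.PIPE)
  try:
    return detect_in_stream(proc.stdout, quiet_value)
  finally:
    os.kill(proc.pid, signal.SIGINT)  # assumed: signal the arecord child
    proc.wait()

if __name__ == '__main__':
  # Fake stream: header, one quiet second, then one second of alternating samples
  fake = bytes(HEADER_BYTES) + bytes([128]) * RATE + bytes([118, 138]) * (RATE // 2)
  print(detect_in_stream(io.BytesIO(fake), 128.0))  # 10.0
```

Keeping the loop separate from the arecord call is a design choice for this sketch only; it makes the threshold logic easy to test with made-up bytes.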


This is a quick and kinda dirty, but not very dirty, way to detect sound from a mic.

I have defined sound to be any signal that has an RMS error of 2 or above from the quietness value I calculated.

There is no error detection in this code, so if anything goes wrong, e.g. the calibration file isn't present, or the device name is wrong, the script will crash inelegantly. Something for the readers to look into 😉.
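As a possible starting point, here is a hedged sketch of a more forgiving way to load the calibration value. The helper name `load_quiet_value` and the choice to fall back to 127.0 (the default already used in the script) are assumptions for this example, and a wrong device name would still need its own check.

```python
import os
import tempfile

def load_quiet_value(path='calibration.txt', default=127.0):
  """Read the quiet value from a file, falling back to a default on failure."""
  try:
    with open(path, 'r') as califile:
      return float(califile.readline())
  except (OSError, ValueError) as err:
    # OSError covers a missing or unreadable file; ValueError covers bad contents
    print('could not load %s (%s), using %f' % (path, err, default))
    return default

if __name__ == '__main__':
  with tempfile.TemporaryDirectory() as tmp:
    # No calibration file exists in the fresh directory, so the default is used
    print(load_quiet_value(os.path.join(tmp, 'calibration.txt')))
```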

Now I have to figure out what to do once I have detected a sound...

Thanks for reading and let me know in the comments if you found this helpful, any tips to improve it, or anything else!
