DEV Community

Jochem Stoel


[Discuss] Using machine learning to process audio files?

I use Audacity and various VST plugins (which require a DAW or other host that supports them) to manually remove background noise from spoken recordings. By background noise I mean the subtle static that you pretty much always get with affordable microphones and phones. Selecting a range of "silence" in the file (nobody talking, just noise) as a so-called noise profile improves the results. I repeat the same procedure until the background noise is entirely gone, sometimes at the cost of the voice. The quality of the result depends mostly on the quality of the file and the amount of noise.

Noise in this context does not mean random interruptions from the environment like traffic; it always refers to the static.
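To make the "noise profile" idea concrete, here is a minimal sketch of spectral subtraction in Python with NumPy. It is not what Audacity or any particular plugin does internally; it only illustrates the general technique under my own assumptions. The function names and the `strength` and `floor` parameters are hypothetical, and the input is assumed to be a mono signal already loaded as an array of floats.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    if len(x) < n_fft:
        raise ValueError("input must be at least n_fft samples long")
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=512, hop=128):
    """Inverse of stft() using weighted overlap-add."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + n_fft)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)

def reduce_noise(signal, noise_clip, strength=1.5, floor=0.05):
    """Attenuate each frequency bin according to a noise profile.

    noise_clip is a stretch of the recording with no speech, only static.
    strength and floor are illustrative knobs, not values from any tool.
    """
    signal = np.asarray(signal, dtype=float)
    noise_clip = np.asarray(noise_clip, dtype=float)
    # Noise profile: average magnitude of the static per frequency bin.
    noise_mag = np.abs(stft(noise_clip)).mean(axis=0)
    spec = stft(signal)
    mag = np.abs(spec)
    # Spectral subtraction expressed as a gain, clamped to a small floor
    # so bins are attenuated rather than zeroed.
    gain = np.maximum(1.0 - strength * noise_mag / np.maximum(mag, 1e-12),
                      floor)
    return istft(spec * gain)
```

Raising `strength` removes more static but starts eating into the voice, which is the same trade-off as repeating the manual procedure too many times.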

The results are often unusually good; it is amazing what these tools can do when you apply them correctly. Correctly, however, also means manually. The parameters are a little different for every file, so to process audio in bulk I need to automate it somehow. I have been looking for command line utilities or libraries to do that. There are some tools out there, but none of them delivers good results. I even tried loading VST plugins in a headless VST host with predefined parameters, but that is hacky and crap.
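One part of the manual work that seems easy to automate is picking the "silence" range for each file. A rough sketch, assuming the quietest half second of a recording contains only static (which will not hold for every file): slide a window over the signal and keep the one with the lowest energy. The function name and the default durations are made up for illustration.

```python
import numpy as np

def find_quietest_segment(signal, sample_rate, seconds=0.5, hop_seconds=0.05):
    """Return (start, end) sample indices of the lowest-energy window.

    Assumption: the quietest window holds only background static.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(seconds * sample_rate)
    hop = max(1, int(hop_seconds * sample_rate))
    if win < 1 or len(signal) < win:
        raise ValueError("signal is shorter than the requested window")
    # A running sum of squares gives every window's energy cheaply.
    csum = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    starts = np.arange(0, len(signal) - win + 1, hop)
    energy = csum[starts + win] - csum[starts]
    best = int(starts[np.argmin(energy)])
    return best, best + win
```

The returned range could then be used as the noise profile for that file in whatever noise reduction step follows, instead of selecting it by hand.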


PhonicMind advertises itself as an "Online AI Vocal Extractor", an online tool that extracts the vocal track from an audio file using "artificial intelligence" to improve results. It is a commercial product (you have to pay to use it). There is also a repository that separates singing voice from music using deep neural networks in TensorFlow. I did not look into whether the two are related, and I do not care.


That gave me an idea: machine learning for background noise removal? I suspect (pretty much assume) that detecting and removing static from audio in bulk like that is much simpler than what PhonicMind attempts to do. Perhaps simple enough to do it myself.

How would I go about prototyping a solution? I have a general understanding of machine learning and know about TensorFlow, but I can't say I have ever used it for anything. If at all possible I prefer a solution that requires the least amount of studying, so if you happen to know of a command line utility or framework that I did not find, that is probably even better.


Top comments (3)

Darkø Tasevski

Spotify uses machine learning and algorithms to analyze music, and the way they do it is fascinating. Maybe not what you're asking, but this is an interesting read anyway:

Jochem Stoel

You are right, not what I am looking for.

lorna shore

It has been some years since this post, but I am working on exactly this problem. removes background noise of spoken audio. It does this automatically for you. You can try it for free!