DEV Community

Jochem Stoel

[Discuss] Using machine learning to process audio files?

I use Audacity and various VST plugins (which need a DAW or another plugin host) to manually remove background noise from audio files of spoken recordings. By background noise I mean the subtle static that you almost always get with affordable microphones and phones. Selecting a stretch of "silence" in the file (nobody talking, just noise) as a so-called noise profile improves the results. I repeat the procedure until the background noise is gone entirely, sometimes at the cost of the voice. The quality of the result depends mostly on the quality of the file and the amount of noise.

Noise in this context does not mean random interruptions from the environment, like traffic; it always refers to the static.
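The noise-profile procedure can be approximated in code with spectral gating: measure the average magnitude of each frequency bin in the "silent" stretch, then attenuate every bin of the full recording that does not rise clearly above that level. The sketch below is a generic NumPy version under stated assumptions: mono float samples, 1024-sample frames, a 6 dB margin and 24 dB of attenuation, all hypothetical choices. It is not Audacity's actual algorithm, which among other things smooths its gains across neighbouring frequency bands; a bare gate like this one can leave "musical noise" artifacts.

```python
import numpy as np


def stft(x, frame=1024, hop=256):
    """Hann-windowed short-time Fourier transform; one row per frame."""
    window = np.hanning(frame)
    # Zero-pad the end so every input sample falls inside a full frame.
    n_frames = 1 + int(np.ceil(max(len(x) - frame, 0) / hop))
    padded = np.zeros((n_frames - 1) * hop + frame)
    padded[:len(x)] = x
    frames = np.stack([padded[i * hop:i * hop + frame] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, frame=1024, hop=256, length=None):
    """Inverse of stft() via windowed overlap-add."""
    window = np.hanning(frame)
    frames = np.fft.irfft(spec, n=frame, axis=1) * window
    total = (len(frames) - 1) * hop + frame
    out = np.zeros(total)
    norm = np.zeros(total)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame] += f
        norm[i * hop:i * hop + frame] += window ** 2
    out /= np.maximum(norm, 1e-8)  # undo analysis + synthesis windowing
    return out if length is None else out[:length]


def reduce_noise(signal, noise_clip, frame=1024, hop=256,
                 threshold_db=6.0, reduction_db=24.0):
    """Spectral gate: attenuate bins that do not rise above the noise profile."""
    spec = stft(signal, frame, hop)
    noise_mag = np.abs(stft(noise_clip, frame, hop))
    # Noise profile: average magnitude per frequency bin, plus a safety margin.
    profile = noise_mag.mean(axis=0) * 10 ** (threshold_db / 20)
    keep = np.abs(spec) > profile  # bins clearly above the static
    gain = np.where(keep, 1.0, 10 ** (-reduction_db / 20))
    return istft(spec * gain, frame, hop, length=len(signal))
```

For example, `cleaned = reduce_noise(samples, samples[start:end])`, where `samples[start:end]` is a stretch with nobody talking. Running it again on its own output roughly corresponds to repeating the procedure by hand, with the same risk of eating into the voice.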

The results are often unusually good; it is amazing what these tools can do when you apply them correctly. Correctly, however, also means manually. The processing parameters are a little different for each file, so to process audio in bulk I need to automate this somehow. I have been looking for command line utilities or libraries of some sort to do that. There are some tools out there, but none of them delivers usable results. I even tried loading VST plugins in a headless VST host with predefined parameters, but that is hacky and crap.
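The manual step that varies per file is choosing the silent stretch. One possible way to automate it, assuming every recording contains at least half a second without speech, is to slide a window over the file and take the window with the lowest RMS energy as the noise sample. The function name, window length and hop below are hypothetical; a recording without any pause would fool this heuristic.

```python
import numpy as np


def find_noise_clip(signal, sr, clip_seconds=0.5, hop_seconds=0.05):
    """Return the lowest-energy stretch of `signal`, assumed to be pure static."""
    clip = int(clip_seconds * sr)
    hop = max(1, int(hop_seconds * sr))
    if len(signal) <= clip:
        return signal.copy()
    starts = np.arange(0, len(signal) - clip + 1, hop)
    # RMS energy of each candidate window; the quietest one is taken as "silence".
    energies = [np.sqrt(np.mean(signal[s:s + clip] ** 2)) for s in starts]
    best = starts[int(np.argmin(energies))]
    return signal[best:best + clip]
```

In a batch script, this would be called once per file and its result passed to the noise reduction step in place of a hand-picked range.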


PhonicMind

PhonicMind advertises itself as an "Online AI Vocal Extractor", an online tool that extracts the vocal track from an audio file, using "artificial intelligence" to improve the results. It is a commercial product (you have to pay to use it). There is also a https://github.com/andabi/music-source-separation repository that separates singing voice from music using deep neural networks in TensorFlow. I have not checked whether the two are related, and it does not matter to me.


Lightbulb

So, a lightbulb moment: machine learning for background noise removal? I suspect (pretty much assume) that detecting and removing static in bulk audio like this is much simpler than what PhonicMind attempts to do, perhaps simple enough to do myself.

How would I go about prototyping a solution? I have a general understanding of machine learning and know about TensorFlow, but I have never actually used it. If at all possible, I would prefer a solution that requires as little study of the subject as possible, so if you know of a command line utility or framework that I did not find, that is probably even better.
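One common way to prototype a learned denoiser is supervised training on synthetic pairs: take clean speech, add recorded static at a range of signal-to-noise ratios (SNRs), and train a model to recover the clean version. The sketch below covers only that data-preparation step. The function names, the 0 to 20 dB SNR range and the idea of reusing silent stretches from your own files as the noise source are assumptions; the model itself (for example a small TensorFlow network predicting a per-frequency gain) is not shown.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db, rng=None):
    """Add `noise` to `clean` so the mixture has the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    # Repeat the noise if it is too short, then cut a random segment of matching length.
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    start = rng.integers(0, len(noise) - len(clean) + 1)
    noise = noise[start:start + len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale so that 10 * log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise


def make_pairs(clean_clips, noise, snr_range=(0.0, 20.0), seed=0):
    """Build (noisy input, clean target) training pairs at random SNRs."""
    rng = np.random.default_rng(seed)
    pairs = []
    for clean in clean_clips:
        snr = rng.uniform(*snr_range)
        pairs.append((mix_at_snr(clean, noise, snr, rng), clean))
    return pairs
```

Training on a range of SNRs rather than a single level is meant to let one model handle files with different amounts of static, which is exactly where fixed plugin parameters fall short.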

Thanks.

Top comments (3)

Darkø Tasevski

Spotify uses machine learning and other algorithms to analyze music, and the way they do it is fascinating. It may not be what you're asking about, but it is an interesting read anyway:

medium.com/s/story/spotifys-discov...

Jochem Stoel

You are right, not what I am looking for.

lorna shore

It has been a few years since this post, but I am working on exactly this problem. nonoisy.com removes background noise from spoken audio. It does this automatically for you. You can try it for free!
