<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: N D</title>
    <description>The latest articles on DEV Community by N D (@n_d_87a2e0ec24f0923d167cb).</description>
    <link>https://dev.to/n_d_87a2e0ec24f0923d167cb</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2697676%2F43f824ec-3da7-4762-8464-93d2cf510ae0.png</url>
      <title>DEV Community: N D</title>
      <link>https://dev.to/n_d_87a2e0ec24f0923d167cb</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/n_d_87a2e0ec24f0923d167cb"/>
    <language>en</language>
    <item>
      <title>Building an AI Profanity Filter with Vocal Separation</title>
      <dc:creator>N D</dc:creator>
      <pubDate>Fri, 27 Mar 2026 14:39:37 +0000</pubDate>
      <link>https://dev.to/n_d_87a2e0ec24f0923d167cb/building-an-ai-profanity-filter-with-vocal-separation-3m24</link>
      <guid>https://dev.to/n_d_87a2e0ec24f0923d167cb/building-an-ai-profanity-filter-with-vocal-separation-3m24</guid>
      <description>&lt;p&gt;I built an online tool that automatically detects and bleeps profanity in video and audio files. Here's the high-level architecture.&lt;/p&gt;

&lt;h2&gt;The problem&lt;/h2&gt;

&lt;p&gt;Censoring profanity by hand takes 45+ minutes for a 10-minute video. You listen through the whole clip, find each word, cut the audio around it with the razor tool, and drop in a beep effect. For songs it's nearly impossible, because muting a word also mutes the music under it.&lt;/p&gt;

&lt;h2&gt;The solution&lt;/h2&gt;

&lt;p&gt;Combine AI speech recognition, which returns a timestamp for every word, with neural vocal separation, which keeps the music intact in songs.&lt;/p&gt;

&lt;h2&gt;How it works&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;The user uploads a file or pastes a YouTube URL&lt;/li&gt;
&lt;li&gt;FFmpeg extracts the audio track&lt;/li&gt;
&lt;li&gt;For songs, Demucs first separates the vocals from the instruments, and the next steps run on the vocal track only&lt;/li&gt;
&lt;li&gt;AI speech-to-text (AssemblyAI / Deepgram) transcribes the audio with a timestamp for each word&lt;/li&gt;
&lt;li&gt;Profanity is detected with morphological analysis (lemmatization), so inflected forms of a word match its dictionary base form&lt;/li&gt;
&lt;li&gt;FFmpeg replaces each flagged word with a beep, silence, or a custom sound&lt;/li&gt;
&lt;/ol&gt;
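
&lt;p&gt;The detection and replacement steps above can be sketched in Python. This is a minimal illustration, not the tool's actual code: the &lt;code&gt;Word&lt;/code&gt; record, the tiny &lt;code&gt;LEMMAS&lt;/code&gt; table (a stand-in for a real morphological analyzer), the placeholder word list, and the beep-over-mute filter graph are all assumptions. It reduces each transcribed word to its base form, collects the timestamps of blocked words, and builds an FFmpeg filter graph that silences the speech inside those intervals while letting a 1 kHz sine tone through.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;
```python
import re
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its timing in seconds (hypothetical shape;
    speech-to-text APIs return word timestamps in their own formats)."""
    text: str
    start: float
    end: float


# Toy stand-ins (assumptions): a real system would use a proper
# morphological analyzer and a much larger word list.
BLOCKLIST = {"darn", "heck"}
LEMMAS = {"darned": "darn", "darns": "darn", "darning": "darn", "hecking": "heck"}


def lemmatize(token: str) -> str:
    """Lowercase, strip punctuation, then map an inflected form to its base form."""
    word = re.sub(r"[^\w']", "", token.lower())
    return LEMMAS.get(word, word)


def find_profanity(words: list[Word]) -> list[tuple[float, float]]:
    """Return (start, end) intervals for every word whose lemma is blocked."""
    return [(w.start, w.end) for w in words if lemmatize(w.text) in BLOCKLIST]


def beep_filter(intervals: list[tuple[float, float]], freq: int = 1000) -> str:
    """Build an FFmpeg -filter_complex graph that mutes the input audio inside
    each interval and lets a sine-wave beep through only inside those intervals."""
    if not intervals:
        return "[0:a]anull[out]"  # nothing to censor: pass the audio through
    # between(t,a,b) is 1 while t lies in [a, b]; summing the terms ORs the intervals.
    inside = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in intervals)
    return (
        f"[0:a]volume=0:enable='{inside}'[muted];"
        f"sine=frequency={freq},volume=0:enable='not({inside})'[beep];"
        # amix scales its two inputs down by default; volume=2 restores the level.
        # duration=first stops the endless sine source when the original audio ends.
        "[muted][beep]amix=inputs=2:duration=first,volume=2[out]"
    )


if __name__ == "__main__":
    words = [Word("What", 0.0, 0.3), Word("the", 0.3, 0.45), Word("heck!", 0.45, 0.9)]
    intervals = find_profanity(words)
    print(intervals)  # [(0.45, 0.9)]
    print(beep_filter(intervals))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The returned graph would be passed as &lt;code&gt;ffmpeg -i input.wav -filter_complex GRAPH -map "[out]" output.wav&lt;/code&gt;. FFmpeg evaluates &lt;code&gt;enable&lt;/code&gt; once per audio frame, so cut points are only as precise as the frame size; padding each interval slightly would hide partial syllables. Silence mode is the same graph without the sine branch.&lt;/p&gt;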

&lt;h2&gt;Song mode: the hard part&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/facebookresearch/demucs" rel="noopener noreferrer"&gt;Demucs&lt;/a&gt; by Meta AI does the heavy lifting: it splits the audio into a vocal track and an instrumental track. Profanity detection runs only on the vocal track, and the censored vocals are then mixed back with the original instruments, so the music stays untouched.&lt;/p&gt;
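
&lt;p&gt;A minimal sketch of the song-mode plumbing, assuming the Demucs v4 command-line interface with its default &lt;code&gt;htdemucs&lt;/code&gt; model and an FFmpeg remix step. The function names are hypothetical, and the censoring of the vocal stem in between is left out. The &lt;code&gt;--two-stems=vocals&lt;/code&gt; option asks Demucs for just two outputs, &lt;code&gt;vocals.wav&lt;/code&gt; and &lt;code&gt;no_vocals.wav&lt;/code&gt;, which matches the vocal/instrumental split described above.&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-python"&gt;
```python
from pathlib import Path

# Assumption: Demucs v4 CLI with its default model. With --two-stems=vocals it
# writes OUT_DIR/htdemucs/TRACK/vocals.wav and OUT_DIR/htdemucs/TRACK/no_vocals.wav,
# where TRACK is the input file name without its extension.
MODEL = "htdemucs"


def demucs_command(song: Path, out_dir: Path) -> list[str]:
    """Command that splits a song into a vocal stem and an instrumental stem."""
    return ["demucs", "--two-stems=vocals", "-n", MODEL, "-o", str(out_dir), str(song)]


def stem_paths(song: Path, out_dir: Path) -> tuple[Path, Path]:
    """Expected locations of the (vocals, instrumental) stems after separation."""
    track_dir = out_dir / MODEL / song.stem
    return track_dir / "vocals.wav", track_dir / "no_vocals.wav"


def remix_command(censored_vocals: Path, instrumental: Path, output: Path) -> list[str]:
    """Command that lays the censored vocals back over the untouched instrumental.

    amix scales each of its two inputs down by default, so volume=2 brings the
    sum back to the level of the original mix.
    """
    graph = "[0:a][1:a]amix=inputs=2:duration=longest,volume=2[out]"
    return [
        "ffmpeg", "-y",
        "-i", str(censored_vocals),
        "-i", str(instrumental),
        "-filter_complex", graph,
        "-map", "[out]",
        str(output),
    ]


if __name__ == "__main__":
    song, out = Path("song.mp3"), Path("separated")
    print(demucs_command(song, out))
    vocals, instrumental = stem_paths(song, out)
    # ...transcribe and bleep `vocals` here, writing vocals_censored.wav...
    print(remix_command(Path("vocals_censored.wav"), instrumental, Path("song_clean.mp3")))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Only the vocal stem goes through transcription and bleeping; the instrumental stem is never modified. Because the two stems add back up to roughly the original track, the remix is a plain sum rather than a new mixdown.&lt;/p&gt;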

&lt;h2&gt;Stack&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Frontend&lt;/strong&gt;: Next.js (React)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Backend&lt;/strong&gt;: NestJS (Node.js), BullMQ queues&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Audio processing&lt;/strong&gt;: Python (FastAPI), Demucs, FFmpeg&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Infrastructure&lt;/strong&gt;: Docker Compose, PostgreSQL, Redis&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;The tool has processed more than 12,000 files so far. It offers three processing modes: standard for clean speech, precise for noisy audio, and enhanced for songs, which adds vocal separation.&lt;/p&gt;

&lt;p&gt;It's free for up to 15 minutes per month at &lt;a href="https://videocensor.net" rel="noopener noreferrer"&gt;videocensor.net&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Would love to hear your thoughts!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>showdev</category>
      <category>python</category>
    </item>
  </channel>
</rss>
