Most of our day-to-day computer use involves sound, so I thought it would be nice to connect the default audio output to voice recognition, so that every word is recognized no matter which software produces it: Teams, YouTube, TikTok, Twitter, Edge, VLC, … you name it (unfortunately Windows only ☹). And how far can we push it? Subtitles for cable TV, for example.
Deepgram sound-to-text converter for all sounds emitted in Windows
What can you find in this repository?
- How to get started with Deepgram in Windows Forms
- A sample custom label control with borders in Windows Forms
- How to capture the system-wide default audio output
- How to record the captured audio as MP3
- How to save and load system settings
Additional Resources / Info
To get system-wide voice recognition you basically need two components: a voice recognition service such as Deepgram, and some way to eavesdrop on the sound the PC generates. I chose C# as the language because I connect directly to the Windows audio system. To loop back (the technical term for capturing the sound system's own output) on Windows you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
And here are some of the results.
While the application also has settings, transparency, … properties, the most critical part is capturing the audio and recognizing it.
private async void ConvertAndTranscript()
{
    // Enter credentials for Deepgram
    var credentials = new Credentials(textBoxApiKey.Text);

    // Create the export folder for the recorded sound and the CSV file
    var outputFolder = CreateRecordingFolder();

    // File settings
    var dateTimeNow = DateTime.Now;
    var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
    var soundFileName = $"{fileName}.mp3";
    var csvFileName = $"{fileName}.csv";
    var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
    var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);

    // Init Deepgram
    var deepgramClient = new DeepgramClient(credentials);

    // Init the loopback interface
    _WasapiLoopbackCapture = new WasapiLoopbackCapture();

    // Create the memory stream and the Deepgram live client
    using (var memoryStream = new MemoryStream())
    using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
    {
        // The format we send to Deepgram is 24 kHz, 16 bit, 2 channels
        var waveFormat = new WaveFormat(24000, 16, 2);
        var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);

        // MP3 writer, if we want to save the audio
        LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked
            ? new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST)
            : null;

        // File writer, if we want to save the transcript as CSV
        StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked ? File.CreateText(outputCSVFilePath) : null;

        // Deepgram options
        var options = new LiveTranscriptionOptions()
        {
            Punctuate = true,
            Diarize = true,
            Encoding = Deepgram.Common.AudioEncoding.Linear16,
            ProfanityFilter = checkBoxProfinityAllowed.Checked,
            Language = _SelectedLanguage.LanguageCode,
            Model = _SelectedModel.ModelCode,
        };
        await deepgramLive.StartConnectionAsync(options);

        // When we receive data from Deepgram; this is mostly taken from their samples
        deepgramLive.TranscriptReceived += (s, e) =>
        {
            try
            {
                if (e.Transcript.IsFinal &&
                    e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                {
                    var transcript = e.Transcript;
                    var text = transcript.Channel.Alternatives.First().Transcript;
                    _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                    {
                        // Double any embedded quotes so the CSV field stays valid
                        var timestamp = DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss 'GMT'zzz");
                        var escaped = text.Replace("\"", "\"\"");
                        csvWriter?.WriteLine($"{timestamp},\"{escaped}\"");
                        _CaptionForm.captionLabel.Text = text;
                    }));
                }
            }
            catch (Exception ex)
            {
                // Error handling is not shown in this listing
            }
        };

        deepgramLive.ConnectionError += (s, e) =>
        {
            // Error handling is not shown in this listing
        };

        // Windows tells us when sound data is ready to be processed,
        // which is better than polling
        _WasapiLoopbackCapture.DataAvailable += (s, a) =>
        {
            mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
            var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
            deepgramWriter.Write(buffer, 0, buffer.Length);
            memoryStream.Position = 0;
            // The step that sends the converted audio to Deepgram is not shown in this listing
        };

        // When recording has stopped, flush and release all file handles
        _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
        {
            if (mp3Writer != null)
            {
                mp3Writer.Dispose();
                mp3Writer = null;
            }
            if (csvWriter != null)
            {
                csvWriter.Dispose();
                csvWriter = null;
            }
        };

        // Assumed: start capturing; this call is not shown in this listing
        _WasapiLoopbackCapture.StartRecording();

        while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
        {
            if (_CancellationTokenSource?.IsCancellationRequested == true)
            {
                // Assumed: stop capturing once cancellation is requested
                _WasapiLoopbackCapture.StopRecording();
                _CancellationTokenSource = null;
            }
            // Assumed: yield to the UI thread between checks; the interval is a guess
            await Task.Delay(500);
        }
    }
}
The rest of the code gets things ready to execute: showing and hiding forms, etc.
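ToPCM16 is called in the listing above but its body is not shown. As a rough sketch of what such a helper might do, assuming the loopback device delivers 32-bit IEEE float samples (which is typical for WASAPI loopback capture) on a little-endian machine, the sample conversion could look like the code below. The names PcmConverter and FloatToPcm16 are hypothetical, and the sketch neither resamples nor changes the channel count, which the real helper may well do.

```csharp
using System;

// Hypothetical helper, not the repository's ToPCM16: converts 32-bit IEEE float
// samples to 16-bit signed little-endian PCM. No resampling, no channel mixing.
public static class PcmConverter
{
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / 4;      // 4 bytes per float sample
        var output = new byte[sampleCount * 2];   // 2 bytes per 16-bit sample

        for (int i = 0; i < sampleCount; i++)
        {
            float sample = BitConverter.ToSingle(input, i * 4);

            // Clamp to [-1, 1] so out-of-range samples do not wrap around
            sample = Math.Clamp(sample, -1f, 1f);

            short pcm = (short)(sample * short.MaxValue);
            output[i * 2] = (byte)(pcm & 0xFF);            // low byte first
            output[i * 2 + 1] = (byte)((pcm >> 8) & 0xFF); // then high byte
        }

        return output;
    }
}
```

Inside the DataAvailable handler it would be called as PcmConverter.FloatToPcm16(a.Buffer, a.BytesRecorded). Only the first BytesRecorded bytes are read, because the capture buffer is usually larger than the data it currently holds.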
So, after all this, how can you have subtitles on TV? You need to get the TV signal into the PC somehow; I use a USB capture card for this. To process the capture card input I use OBS. Once I have the audio signal, the source makes no difference to me, since I process everything that goes to the sound output. Then I use the computer's HDMI output to send the signal to the TV. It makes no difference to the TV or the cable box.
P.S. If you experience lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
