DEV Community

Mustafa Unal

Add Speech Recognition to Your PC, and Even to Your TV

Overview of My Submission

Most of our day-to-day computer use involves the computer as a sound device, so I thought it would be nice to connect the default audio output to voice recognition. That way, no matter which software you use, every spoken word gets recognized: Teams, YouTube, TikTok, Twitter, Edge, VLC, … you name it (unfortunately for Windows only ☹). And how far can we push it? As far as subtitles for cable TV 😊

Submission Category:

Accessibility Advocates

Link to Code on GitHub

bleakview / deepgramwinsys

Deepgram sound-to-text converter for all sounds emitted in Windows

What can you find in this repository?

  • How to get started with Deepgram in Windows Forms
  • A sample custom label control with borders in Windows Forms
  • How to get and capture the system-wide default audio output
  • How to record the captured audio as MP3
  • How to save and load system settings



Additional Resources / Info

To get system-wide voice recognition you basically need two components: a voice recognition service such as Deepgram, and some way to eavesdrop on the sounds the PC generates. I chose C# as the language because I connect directly to the system. To loop back (the technical term for capturing the system's own sound output) on Windows, you use WASAPI, the Windows Audio Session API. I chose the NAudio library to access it.
Here are some of the results.

It works on Teams

It works in the browser (YouTube video)

While the app also has settings, transparency, … properties, the most critical part is capturing the audio and recognizing it.

private async void ConvertAndTranscript()
{
    //enter credentials for deepgram
    var credentials = new Credentials(textBoxApiKey.Text);
    //Create our export folder to record sound and CSV file
    var outputFolder = CreateRecordingFolder();
    //File settings
    var dateTimeNow = DateTime.Now;
    //use Second as the last component (the original repeated Minute)
    var fileName = $"{dateTimeNow.Year}_{dateTimeNow.Month}_{dateTimeNow.Day}#{dateTimeNow.Hour}_{dateTimeNow.Minute}_{dateTimeNow.Second}_record";
    var soundFileName = $"{fileName}.mp3";
    var csvFileName = $"{fileName}.csv";
    var outputSoundFilePath = Path.Combine(outputFolder, soundFileName);
    var outputCSVFilePath = Path.Combine(outputFolder, csvFileName);
    //init deepgram
    var deepgramClient = new DeepgramClient(credentials);
    //init loopback interface
    _WasapiLoopbackCapture = new WasapiLoopbackCapture();
    //generate memory stream and deepgram client
    using (var memoryStream = new MemoryStream())
    using (var deepgramLive = deepgramClient.CreateLiveTranscriptionClient())
    {
        //the format we send to deepgram: 24 kHz, 16 bit, 2 channels
        var waveFormat = new WaveFormat(24000, 16, 2);
        var deepgramWriter = new WaveFileWriter(memoryStream, waveFormat);
        //mp3 writer if we wanted to save audio
        LameMP3FileWriter? mp3Writer = checkBoxSaveMP3.Checked ?
            new LameMP3FileWriter(outputSoundFilePath, _WasapiLoopbackCapture.WaveFormat, LAMEPreset.STANDARD_FAST) : null;

        //file writer if we wanted to save as csv
        StreamWriter? csvWriter = checkBoxSaveAsCSV.Checked ? File.CreateText(outputCSVFilePath) : null;
        //deepgram options
        var options = new LiveTranscriptionOptions()
        {
            Punctuate = true,
            Diarize = true,
            Encoding = Deepgram.Common.AudioEncoding.Linear16,
            ProfanityFilter = checkBoxProfinityAllowed.Checked,
            Language = _SelectedLanguage.LanguageCode,
            Model = _SelectedModel.ModelCode,
        };
        //connect 
        await deepgramLive.StartConnectionAsync(options);
        //when we receive data from deepgram this is mostly taken from their samples
        deepgramLive.TranscriptReceived += (s, e) =>
        {
            try
            {
                if (e.Transcript.IsFinal &&
                   e.Transcript.Channel.Alternatives.First().Transcript.Length > 0)
                {
                    var transcript = e.Transcript;
                    var text = $"{transcript.Channel.Alternatives.First().Transcript}";
                    _CaptionForm?.captionLabel.BeginInvoke((Action)(() =>
                    {
                        csvWriter?.WriteLine($@"{DateTime.Now.ToString("yyyy-MM-dd HH:mm:ss \"GMT\"zzz")},""{text}""");
                        _CaptionForm.captionLabel.Text = text;
                        _CaptionForm?.captionLabel.Refresh();
                    }));
                }
            }
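            catch (Exception ex)
            {
                //do not swallow errors silently, at least log them for debugging
                System.Diagnostics.Debug.WriteLine(ex);
            }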
        };
        deepgramLive.ConnectionError += (s, e) =>
        {

        };
        //when windows tell us that there is sound data ready to be processed
        //better than polling
        _WasapiLoopbackCapture.DataAvailable += (s, a) =>
        {
            mp3Writer?.Write(a.Buffer, 0, a.BytesRecorded);
            var buffer = ToPCM16(a.Buffer, a.BytesRecorded, _WasapiLoopbackCapture.WaveFormat);
            deepgramWriter.Write(buffer, 0, buffer.Length);
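            deepgramLive.SendData(memoryStream.ToArray());
            //ToArray() returns everything up to Length, so rewinding Position alone
            //would resend stale bytes; truncate the stream for the next chunk
            memoryStream.Position = 0;
            memoryStream.SetLength(0);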
        };
        //when recording stopped release and flush all file pointers 
        _WasapiLoopbackCapture.RecordingStopped += (s, a) =>
        {
            if (mp3Writer != null)
            {
                mp3Writer.Dispose();
                mp3Writer = null;
            }
            if (csvWriter != null)
            {
                csvWriter.Dispose();
                csvWriter = null;
            }
            _WasapiLoopbackCapture.Dispose();
        };
        _WasapiLoopbackCapture.StartRecording();
        while (_WasapiLoopbackCapture.CaptureState != NAudio.CoreAudioApi.CaptureState.Stopped)
        {
            if (_CancellationTokenSource?.IsCancellationRequested == true)
            {
                _CancellationTokenSource?.Dispose();
                _CancellationTokenSource = null;
                return;
            }
            //wait without blocking the calling thread
            await Task.Delay(500);
        }
    }
}
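
The listing calls a `ToPCM16` helper that is not shown in this post. As a rough idea of what such a conversion involves, here is a minimal sketch that assumes the loopback capture delivers 32-bit IEEE float samples. The class and method names (`PcmConverter`, `FloatToPcm16`) are hypothetical and this is not the repository's implementation. It only changes the sample format; it does not resample or change the channel count.

```csharp
using System;

// Hypothetical helper for illustration only, not the repository's ToPCM16.
// Assumption: the input buffer holds 32-bit IEEE float samples in [-1, 1].
public static class PcmConverter
{
    // Converts the first bytesRecorded bytes of float samples
    // into 16-bit little-endian PCM (Linear16).
    public static byte[] FloatToPcm16(byte[] input, int bytesRecorded)
    {
        int sampleCount = bytesRecorded / sizeof(float);
        var output = new byte[sampleCount * sizeof(short)];

        for (int i = 0; i < sampleCount; i++)
        {
            float sample = BitConverter.ToSingle(input, i * sizeof(float));

            // Clamp so out-of-range samples do not wrap around
            sample = Math.Clamp(sample, -1f, 1f);

            // Scale to the 16-bit range and write low byte first
            short pcm = (short)(sample * short.MaxValue);
            output[i * 2] = (byte)(pcm & 0xFF);
            output[i * 2 + 1] = (byte)((pcm >> 8) & 0xFF);
        }

        return output;
    }
}
```

Inside a `DataAvailable` handler, a call would look like `PcmConverter.FloatToPcm16(a.Buffer, a.BytesRecorded)`. The sample rate and channel count of the result are still those of the capture device, so they need to match what is announced to the recognition service.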

The rest of the code gets everything ready to execute: showing and hiding forms, etc.

So, after all this, how can you get subtitles on a TV? You need to feed the TV signal into the PC somehow. I use a USB capture card for this and process the capture card input with OBS. Once the audio plays through the PC it makes no difference to the app, since it processes all output sound. Then I use the computer's HDMI output to send the signal to the TV. Neither the TV nor the cable box notices any difference.

PS: If you experience lag, check your network connection. There also seems to be a problem with the memory stream, which is not happy with my hacky solution. Any PR is welcome.
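
One possible contributor to the memory stream problem, offered as a guess rather than a confirmed diagnosis: `MemoryStream.ToArray()` copies everything up to `Length` and ignores `Position`. If a stream is reused by rewinding `Position` only, a shorter chunk written afterwards still leaves the old tail bytes in the array that gets sent. The sketch below shows a small helper that empties the stream after each read. The names `ChunkBuffer` and `Drain` are hypothetical and do not exist in the repository.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class ChunkBuffer
{
    // Returns the bytes written since the last drain
    // and empties the stream for the next chunk.
    public static byte[] Drain(MemoryStream stream)
    {
        // ToArray() copies Length bytes, regardless of Position
        byte[] chunk = stream.ToArray();

        // Rewind and truncate so no stale bytes survive into the next chunk
        stream.Position = 0;
        stream.SetLength(0);

        return chunk;
    }
}
```

With this approach, writing 8 bytes, draining, and then writing 4 bytes yields a 4-byte chunk on the second drain. With a rewind of `Position` alone, the second `ToArray()` call would still return 8 bytes.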
