DEV Community

Robert Gehring

3 Accurate Speech-to-Text APIs You Don't Know About

Speech recognition is at the forefront of many machine learning efforts, and developers routinely build their services on the most popular and advanced speech-to-text APIs. Many large companies, such as Google, Microsoft, and Amazon, provide cloud APIs for speech processing. Below are three speech-to-text APIs you may not have heard of that compete successfully with the popular vendors and claim best-in-class accuracy.

1. Rev.ai

The Rev.ai API provides speech-to-text recognition services that can make audio and video content searchable and accessible. Rev.ai offers global recognition models supporting all major English accents for use in speech-to-text transcription. The speech recognition engine was trained on 50,000+ hours of human-transcribed content from a wide range of topics, industries, and accents. You can easily check how Rev.ai compares with Google.

2. SpeechText.AI

The SpeechText.AI API automatically transcribes speech to text and summarizes audio data with high accuracy in multiple languages. SpeechText.AI uses a combination of speech recognition and natural language processing models to auto-summarize your recordings and highlight key moments in a discussion. Its domain-specific speech recognition technology lets users improve the accuracy of automatic transcription for industries such as finance, healthcare, legal, IT, HR, and others. The service can recognize multiple speakers and add word-by-word timestamps, punctuation, and casing to transcription results. SpeechText.AI supports almost all common media file formats and can transcribe audio/video files stored on your hard drive or files accessible over public URLs (HTTP, FTP, Google Drive, Dropbox, etc.). SpeechText.AI's speech recognition algorithm achieves a word error rate of 3.8% on the open-source LibriSpeech dataset (roughly 96% word accuracy).
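Word error rate (WER), the metric quoted above, is the word-level edit distance between a reference transcript and the engine's output, divided by the number of words in the reference. The sketch below is a minimal, vendor-independent way to compute it yourself when comparing engines. The function name `word_error_rate` and the simple lowercase-and-split normalization are assumptions made for illustration; published benchmarks may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper: normalization here is just lowercasing and
    whitespace splitting, which is an assumption, not a benchmark standard.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A WER of 3.8% means about 4 word-level errors per 100 reference words, which is where the roughly 96% accuracy figure comes from.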

3. AssemblyAI

AssemblyAI offers speech recognition and transcription capabilities that help developers build voice-powered applications. The API supports customizable transcription that recognizes industry-specific phrases unique to a product. It is more than a speech-to-text service: for example, it can automatically detect and remove sensitive information, such as credit card numbers and SSNs, from audio files.
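To make the idea of removing sensitive information concrete, here is a rough sketch that masks SSN-like and card-like digit sequences in a transcript string. It is not AssemblyAI's method or API: the function `redact_transcript`, the regular expressions, and the `[SSN]` and `[CARD]` placeholders are all assumptions chosen for illustration, and a production service would likely rely on far more robust detection.

```python
import re

# Assumed patterns for illustration only; real detectors handle many more formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def redact_transcript(text: str) -> str:
    """Replace SSN-like and card-like numbers in a transcript with placeholders."""
    text = SSN_PATTERN.sub("[SSN]", text)
    text = CARD_PATTERN.sub("[CARD]", text)
    return text


if __name__ == "__main__":
    transcript = "My SSN is 123-45-6789 and my card is 4111 1111 1111 1111."
    print(redact_transcript(transcript))
```

A text-only pass like this works on the transcript after recognition; redacting the audio itself would additionally need the word timestamps to locate the spans to mute.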

Do you have another speech recognition engine you use? Tell me about it in the comments.
