Jackson for HMS Core


ASR of ML Kit: Make Your App Recognize Speech

Our lives are now packed with advanced devices, such as mobile gadgets, wearables, smart home appliances, and telematics devices.

Of all the features that make them advanced, the major one is the ability to understand user speech. Speaking into a device and telling it to do something is naturally easier and more satisfying than using input devices (like a keyboard and mouse) for the same purpose.

To help devices understand human speech and create a smoother human-machine interaction experience, HMS Core ML Kit introduced the automatic speech recognition (ASR) service.

Service Introduction

ASR can recognize speech no longer than 60 seconds and convert it into text in real time, using industry-leading deep learning technologies. Thanks to regularly updated algorithms and data, the service currently delivers a recognition accuracy of over 95%. The supported languages are: Mandarin Chinese (including Chinese-English bilingual speech), English, French, German, Spanish, Italian, Arabic, Russian, Thai, Malay, Filipino, and Turkish.


Use Cases

ASR covers many scenarios in daily life and work. It enhances voice search for products, movies, TV series, and music, and also supports navigation services. When a user searches for a product in a shopping app by speaking, the service recognizes the product name or feature in the speech and converts it into text for the search.

Similarly, when a user searches in a music app by voice, the service recognizes the spoken song name or singer and converts it into text to search for the song.

On top of these, ASR can even contribute to driving safety. While driving, when users are not supposed to handle their phones to, for example, search for a place, ASR lets them simply say where they want to go. It converts the speech into text for the navigation app, which then presents the search results to the user.

Features

  • Real-time result output

  • Available options: with and without speech pickup UI (the no-UI mode is sketched after this list)

  • Endpoint detection: Start and end points of speech can be accurately located.

  • Silence detection: No voice packet is sent for silent parts.

  • Intelligent conversion of number formats: For example, when the speech is "year two thousand twenty-two", the text output by ASR will be "2022".
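
To illustrate the no-UI mode and real-time result output, below is a minimal sketch that starts recognition with MLAsrRecognizer and receives partial and final results through an MLAsrListener. It assumes the ML Kit ASR SDK is already integrated and the RECORD_AUDIO permission has been granted; the exact constant names and callback signatures should be verified against the official ML Kit documentation.

```java
import android.content.Context;
import android.content.Intent;
import android.os.Bundle;
import android.util.Log;

import com.huawei.hms.mlsdk.asr.MLAsrConstants;
import com.huawei.hms.mlsdk.asr.MLAsrListener;
import com.huawei.hms.mlsdk.asr.MLAsrRecognizer;

// Rough sketch only: constant and callback names follow my reading of the ML Kit docs.
public class AsrNoUiExample {
    private MLAsrRecognizer speechRecognizer;

    public void startRecognizing(Context context) {
        // Create the recognizer and register a listener for real-time results.
        speechRecognizer = MLAsrRecognizer.createAsrRecognizer(context);
        speechRecognizer.setAsrListener(new MLAsrListener() {
            @Override
            public void onRecognizingResults(Bundle partialResults) {
                // Partial (real-time) text while the user is still speaking.
                Log.d("ASR", "Partial: " + partialResults.getString(MLAsrRecognizer.RESULTS_RECOGNIZING));
            }

            @Override
            public void onResults(Bundle results) {
                // Final text after the end point of the speech is detected.
                Log.d("ASR", "Final: " + results.getString(MLAsrRecognizer.RESULTS_RECOGNIZED));
            }

            @Override
            public void onError(int error, String errorMessage) {
                Log.e("ASR", "Error " + error + ": " + errorMessage);
            }

            @Override
            public void onStartListening() { }

            @Override
            public void onStartingOfSpeech() { }

            @Override
            public void onVoiceDataReceived(byte[] data, float energy, Bundle bundle) { }

            @Override
            public void onState(int state, Bundle params) { }
        });

        // Start recognition: English, with word-by-word (real-time) output.
        Intent intent = new Intent(MLAsrConstants.ACTION_HMS_ASR_SPEECH)
                .putExtra(MLAsrConstants.LANGUAGE, "en-US")
                .putExtra(MLAsrConstants.FEATURE, MLAsrConstants.FEATURE_WORDFLUX);
        speechRecognizer.startRecognizing(intent);
    }

    public void stop() {
        if (speechRecognizer != null) {
            speechRecognizer.destroy();
        }
    }
}
```

In this sketch, FEATURE_WORDFLUX requests word-by-word output while the user is speaking, whereas FEATURE_ALLINONE would return only the final result after the speech ends.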

How to Integrate ML Kit?

For guidance on ML Kit integration, please refer to its official document. You are also welcome to visit the HUAWEI Developers website, where you can find other resources for reference.
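
For reference, here is a rough sketch of the other option mentioned above: recognition with the built-in speech pickup UI, using MLAsrCaptureActivity from the ASR plugin. The dependency names (for example, com.huawei.hms:ml-computer-voice-asr and com.huawei.hms:ml-computer-voice-asr-plugin), the constants, and the result-handling flow reflect my reading of the ML Kit documentation and should be verified there before use.

```java
import android.content.Intent;
import android.os.Bundle;
import android.util.Log;

import androidx.appcompat.app.AppCompatActivity;

import com.huawei.hms.mlplugin.asr.MLAsrCaptureActivity;
import com.huawei.hms.mlplugin.asr.MLAsrCaptureConstants;

// Rough sketch only: launches the plugin's pickup UI and reads back the recognized text.
public class AsrWithUiActivity extends AppCompatActivity {
    private static final int ASR_REQUEST_CODE = 100;

    private void startAsrWithUi() {
        // Launch the speech pickup UI provided by the ASR plugin.
        Intent intent = new Intent(this, MLAsrCaptureActivity.class)
                .putExtra(MLAsrCaptureConstants.LANGUAGE, "en-US")
                .putExtra(MLAsrCaptureConstants.FEATURE, MLAsrCaptureConstants.FEATURE_WORDFLUX);
        startActivityForResult(intent, ASR_REQUEST_CODE);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == ASR_REQUEST_CODE
                && resultCode == MLAsrCaptureConstants.ASR_SUCCESS
                && data != null) {
            Bundle bundle = data.getExtras();
            if (bundle != null && bundle.containsKey(MLAsrCaptureConstants.ASR_RESULT)) {
                // The recognized text returned by the pickup UI.
                String text = bundle.getString(MLAsrCaptureConstants.ASR_RESULT);
                Log.d("ASR", "Recognized: " + text);
            }
        }
    }
}
```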
