Bhārat Bhāṣā Stack will catalyze Voice Assistant and Conversational AI innovations for vernacular Indic languages, just as India Stack did for FinTech.
A decade ago, nobody could have imagined how digital payments work in India today. Even street vendors accept payments as small as 50 rupees (less than a dollar) on a mobile phone. Neither the seller nor the buyer has to pay any transaction fee. Street vendors don't have deep pockets to build payment gateways either.
It became possible due to the Unified Payments Interface (UPI) of India Stack. India Stack is the digital infrastructure for authentication, payment, and authorization. Every bank implements UPI, so all mobile wallets are interoperable and free of cost.
India speaks several languages, and a large part of the population is neither tech-savvy nor English-literate. In-app Voice Assistants are the natural choice to take the benefits of the internet to the masses.
But building these assistants takes a huge investment that small companies cannot afford. Just as India Stack offered digital payment infrastructure, we need an affordable Indic Language Stack for conversational AI.
The Indic Language Stack for Conversational AI consists of voice and language technologies:
- Listen: Convert speech audio to text. It is called Automatic Speech Recognition (ASR) or Speech-to-Text (STT).
- Understand: Understand meaning or intent in the text, and extract important entities. It is called Natural Language Understanding (NLU).
- Speak: Convert text to speech audio, so the assistant can ask questions to clarify, confirm, or seek needed information from the user. It is called Speech Synthesis or Text-to-Speech (TTS).
- Translate: Humans speak different languages. Applications may need to translate text from one language to another. It is called Machine Translation (MT).
- Phonetically Translate: Many people type Indic languages using phonetic spellings on Roman keyboards. Computers may need to convert such text into the scripts of Indic languages. It is called Transliteration.
- See/Read: Recognize handwritten or printed characters in images. It is called Optical Character Recognition (OCR).
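As a rough illustration of how the Listen, Understand, and Speak components might chain together in an in-app assistant, here is a minimal Python sketch. Every name in it (`VoiceAssistant`, `SpeechToText`, `Understander`, `TextToSpeech`, `Intent`, and the three stub classes) is hypothetical and not part of any existing stack. The stubs stand in for real ASR, NLU, and TTS models, so the example only shows the shape of the interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Intent:
    """What the user wants, plus any extracted entities (hypothetical type)."""
    name: str
    entities: dict[str, str] = field(default_factory=dict)


class SpeechToText(Protocol):      # Listen (ASR / STT)
    def transcribe(self, audio: bytes, lang: str) -> str: ...


class Understander(Protocol):      # Understand (NLU)
    def parse(self, text: str, lang: str) -> Intent: ...


class TextToSpeech(Protocol):      # Speak (TTS)
    def synthesize(self, text: str, lang: str) -> bytes: ...


@dataclass
class VoiceAssistant:
    """Chains the three components for one turn of a conversation."""
    stt: SpeechToText
    nlu: Understander
    tts: TextToSpeech
    lang: str = "hi"

    def handle(self, audio: bytes) -> tuple[Intent, bytes]:
        text = self.stt.transcribe(audio, self.lang)
        intent = self.nlu.parse(text, self.lang)
        if intent.name == "unknown":
            prompt = "Sorry, could you repeat that?"
        else:
            prompt = f"Did you want to {intent.name}?"
        return intent, self.tts.synthesize(prompt, self.lang)


# Stub components: assumptions for illustration only, not real models.
class EchoSTT:
    def transcribe(self, audio: bytes, lang: str) -> str:
        return audio.decode("utf-8")   # pretends the audio is already text


class KeywordNLU:
    def parse(self, text: str, lang: str) -> Intent:
        words = text.split()
        if "pay" in words:
            amounts = [w for w in words if w.isdigit()]
            return Intent("pay", {"amount": amounts[0]} if amounts else {})
        return Intent("unknown")


class BytesTTS:
    def synthesize(self, text: str, lang: str) -> bytes:
        return text.encode("utf-8")    # pretends the text is audio


assistant = VoiceAssistant(EchoSTT(), KeywordNLU(), BytesTTS())
intent, reply = assistant.handle(b"pay 50 rupees")
```

Because the assistant depends only on the three interfaces, an application could swap in a different ASR, NLU, or TTS implementation per language without changing `handle`.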
The stack has layers that offer different entry points based on the needs and maturity of an organisation:
- Script: India has several scripts, almost one per language.
- Data: Training data is the biggest and most expensive barrier.
- Models: Even when data is available, training deep learning models is unaffordable for small organisations.
- Software as a Service (SaaS): SaaS frees developers from hosting the model and managing the service infrastructure. It makes it easier to start building applications.
- Software Development Kit (SDK): SDKs in popular programming languages and OS platforms form the final layer. SDKs can use the models or SaaS.
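The idea that an SDK can sit on top of either the Models layer or the SaaS layer can be sketched as one client interface with swappable backends. The Python below is only an illustration under stated assumptions: `IndicSDK`, `Backend`, `LocalModelBackend`, and `SaaSBackend` are hypothetical names, the lookup table stands in for a real transliteration model, and `fake_service` stands in for a hosted API. `"Deva"` is the ISO 15924 code for the Devanagari script.

```python
from __future__ import annotations

from typing import Callable, Protocol


class Backend(Protocol):
    def transliterate(self, text: str, target_script: str) -> str: ...


class LocalModelBackend:
    """Models layer: runs in-process. A lookup table stands in for a model."""

    def __init__(self, table: dict[tuple[str, str], str]):
        self._table = table

    def transliterate(self, text: str, target_script: str) -> str:
        words = text.split()
        # Unknown words are passed through unchanged.
        return " ".join(
            self._table.get((w.lower(), target_script), w) for w in words
        )


class SaaSBackend:
    """SaaS layer: delegates to a hosted service via an injected transport."""

    def __init__(self, send: Callable[[dict], dict]):
        self._send = send   # e.g. a function that performs an HTTP request

    def transliterate(self, text: str, target_script: str) -> str:
        response = self._send({"text": text, "target_script": target_script})
        return response["text"]


class IndicSDK:
    """SDK layer: one developer-facing API, whichever backend is plugged in."""

    def __init__(self, backend: Backend):
        self._backend = backend

    def transliterate(self, text: str, target_script: str = "Deva") -> str:
        return self._backend.transliterate(text, target_script)


# Illustrative data only: a one-word table and a fake hosted service.
table = {("namaste", "Deva"): "नमस्ते"}


def fake_service(request: dict) -> dict:
    key = (request["text"].lower(), request["target_script"])
    return {"text": table.get(key, request["text"])}


local_sdk = IndicSDK(LocalModelBackend(table))
hosted_sdk = IndicSDK(SaaSBackend(fake_service))
```

With this shape, an organisation could start on the hosted backend and later move to its own models without changing application code that calls `IndicSDK.transliterate`.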
It will take systematic and sustained collaboration to design and build the Bhārat Bhāṣā Stack:
- Academia sharing research papers with code on Conversational AI problems relevant to India.
- Industry building voice-enabled products and services for the common man.
- Government playing a role like the one in building India Stack.
- Industry Bodies speeding up collaboration through conferences and consortiums.