
aimodels-fyi

Originally published at aimodels.fyi

A beginner's guide to the Parakeet-Rnnt-1.1b model by Nvidia on Replicate

This is a simplified guide to an AI model called Parakeet-Rnnt-1.1b, maintained by Nvidia. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.

Model overview

The parakeet-rnnt-1.1b is an Automatic Speech Recognition (ASR) model developed jointly by NVIDIA NeMo and Suno.ai. It transcribes English speech with high accuracy and outperforms the popular OpenAI Whisper model on several benchmark datasets. The model uses the FastConformer architecture, an optimized version of the Conformer model, and is trained in a multitask setup with a Transducer decoder (RNNT) loss.
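A transducer (RNNT) decoder produces text frame by frame: at each encoder frame it keeps emitting tokens until the joint network predicts a special "blank" symbol, then moves on to the next frame. The toy sketch below shows greedy decoding in that style. It is not NeMo's implementation: `joint` is a hypothetical stand-in for the prediction and joint networks, and the blank id of 0 and the per-frame symbol cap are assumptions chosen for illustration.

```python
from typing import Any, Callable, List, Sequence, Tuple

BLANK = 0  # hypothetical id of the blank symbol

# joint(frame, history) -> most likely token id. Stands in for the
# prediction network + joint network (hypothetical interface).
JointFn = Callable[[Any, Tuple[int, ...]], int]


def greedy_transducer_decode(
    encoder_frames: Sequence[Any],
    joint: JointFn,
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Toy greedy decoding in the style of an RNN-Transducer."""
    hypothesis: List[int] = []
    for frame in encoder_frames:
        # Keep emitting on the same frame until blank (or the cap) is hit.
        for _ in range(max_symbols_per_frame):
            token = joint(frame, tuple(hypothesis))
            if token == BLANK:
                break  # blank: advance to the next encoder frame
            hypothesis.append(token)  # non-blank: extend the label history
    return hypothesis


def toy_joint(frame: int, history: Tuple[int, ...]) -> int:
    """Toy joint: emit the frame's token once, then predict blank."""
    if frame == BLANK or (history and history[-1] == frame):
        return BLANK
    return frame


print(greedy_transducer_decode([3, 3, 0, 5, 7, 7], toy_joint))  # [3, 5, 7]
```

The cap on symbols per frame keeps a misbehaving joint from looping forever on one frame. A real decoder would also carry the prediction network's hidden state forward instead of re-reading the whole label history at every step.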

Model inputs and outputs

Inputs

  • audio_file: A 16,000 Hz mono-channel audio file (WAV format) to be transcribed by the ASR model.
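Because the model expects 16,000 Hz mono WAV input, it can help to check a file locally before sending it. The sketch below uses Python's standard-library `wave` module. The function names are hypothetical, the check only reads the WAV header (sample rate and channel count), and it does not convert files; resampling or downmixing a file that fails the check would need a separate tool.

```python
import wave
from typing import Tuple

EXPECTED_RATE = 16_000  # Hz, per the model's input description
EXPECTED_CHANNELS = 1   # mono


def check_wav_input(path: str) -> Tuple[int, int]:
    """Raise ValueError unless `path` is a 16 kHz mono WAV; return (rate, channels)."""
    with wave.open(path, "rb") as wf:  # raises wave.Error if not a WAV file
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != EXPECTED_RATE:
        raise ValueError(f"expected {EXPECTED_RATE} Hz, got {rate} Hz")
    if channels != EXPECTED_CHANNELS:
        raise ValueError(f"expected mono audio, got {channels} channels")
    return rate, channels


def write_silence_wav(path: str, seconds: float = 1.0,
                      rate: int = 16_000, channels: int = 1) -> None:
    """Write a silent 16-bit PCM WAV file, handy for trying out the check."""
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * channels * n_frames)
```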

Outputs

  • Output: The transcribed speech as lowercase English text.
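Since the output is plain lowercase text, comparing it with a reference transcript usually means normalizing case and punctuation first. The sketch below computes word error rate (WER), the metric commonly reported on ASR benchmarks; the original guide does not say which metric underlies the Whisper comparison, and the normalization rules here (lowercasing, keeping only letters, digits, and apostrophes) are assumptions rather than any official evaluation procedure.

```python
import re
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase (matching the model's output) and split into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1] / len(ref)


print(round(word_error_rate("The cat sat on the mat.", "the cat sat on mat"), 3))  # 0.167
```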

Capabilities

The parakeet-rnnt-1.1b model has demo...

The full guide to Parakeet-Rnnt-1.1b is available on aimodels.fyi.
