Architect Alick

Posted on Nov 13

Setting Up NVIDIA Parakeet TDT 0.6B v3 for Speech Recognition on AWS EC2 Ubuntu

#ubuntu #aws #tutorial #deeplearning

Introduction

NVIDIA's Parakeet TDT 0.6B v3 is a state-of-the-art automatic speech recognition (ASR) model. Unlike the English-only v2 release, v3 is multilingual and covers 25 European languages, English included. With 600 million parameters, it combines the FastConformer encoder architecture with a Token-and-Duration Transducer (TDT) decoder to provide:

Automatic punctuation and capitalization
Word-level timestamp predictions
Processing of audio segments up to 24 minutes in a single pass
Impressive speed: RTFx of 3380 on the HF-Open-ASR leaderboard
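To put the RTFx figure in perspective: RTFx is the inverse real-time factor, meaning seconds of audio processed per second of compute. The sketch below is illustrative arithmetic only. The function name is made up for this example, and the leaderboard number was measured on NVIDIA's benchmark setup, so an L4 or T4 will land somewhere else.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate compute time for a clip, given RTFx (audio seconds per compute second)."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


if __name__ == "__main__":
    clip_seconds = 24 * 60  # a 24-minute clip
    # At the leaderboard RTFx of 3380 this prints 0.43 s
    print(f"{processing_seconds(clip_seconds, 3380):.2f} s")
```

Treat the result as a best-case bound rather than a prediction for your instance.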

This guide walks you through setting up Parakeet TDT 0.6B v3 on an AWS EC2 Ubuntu instance. The workflow resembles a Whisper deployment, but it uses NVIDIA's NeMo toolkit to load and run the model.

Choosing the Right EC2 Instance

For running Parakeet TDT 0.6B v3, you need an EC2 instance with NVIDIA GPU support. Here are your options:

Recommended Instance Types

g6.2xlarge (Recommended)

GPU: 1x NVIDIA L4 with 24 GB memory
vCPUs: 8
RAM: 32 GiB
Performance: up to 2x higher deep learning inference performance than g4dn instances, according to AWS
Cost: ~$0.98/hour (us-east-1)
Best for: Production workloads with modern GPU architecture

g4dn.xlarge (Budget Option)

GPU: 1x NVIDIA T4 with 16 GB memory
vCPUs: 4
RAM: 16 GiB
Cost: ~$0.53/hour (us-east-1), roughly half the g6.2xlarge rate
Best for: Development and testing

Hardware Requirements:

Minimum 2GB RAM for model loading
Supports NVIDIA Volta, Ampere, Hopper, and Blackwell architectures
At least 30-40 GB disk space
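To see what an hourly rate means over a month, here is a small back-of-the-envelope sketch. The function name and the usage pattern (8 hours a day, 22 days) are assumptions for illustration, and the figure ignores EBS storage and data transfer charges.

```python
def estimated_cost(hourly_usd: float, hours_per_day: float, days: int) -> float:
    """Rough on-demand compute cost; excludes storage and data transfer."""
    return round(hourly_usd * hours_per_day * days, 2)


if __name__ == "__main__":
    # g6.2xlarge at the ~$0.98/hour rate quoted above
    print(estimated_cost(0.98, hours_per_day=8, days=22))   # working hours only
    print(estimated_cost(0.98, hours_per_day=24, days=30))  # left running all month
```

The gap between the two numbers is the main reason to stop the instance when you are not using it.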

Step 1: Launch EC2 Instance on AWS

1.1 Create the EC2 Instance

The g6.2xlarge instance is one of the most cost-effective options for running speech recognition models on AWS. Follow these steps to launch your instance:

Step-by-step launch process:

Open the AWS Console and navigate to the EC2 Dashboard
Click the "Launch instance" button in the top section
Enter an instance name (e.g., "parakeet-asr-instance")
In the Application and OS Images (Amazon Machine Image) section:
- Search for "Ubuntu"
- Select Ubuntu Server 22.04 LTS (or Ubuntu 24.04 LTS for newer releases)
- Note that a "Free tier eligible" label applies to the AMI only; GPU instance types such as g6.2xlarge are billed at the normal hourly rate
In the Instance type section:
- Search for or select g6.2xlarge
- This instance provides 1x NVIDIA L4 GPU with 24GB memory, 8 vCPUs, and 32 GiB RAM
In the Key pair (login) section:
- Select an existing key pair or create a new one
- Important: Download and securely save the .pem file if creating a new key pair
- This key is required for SSH access
In the Network settings section:
- Leave default VPC settings
- Allow SSH traffic from your IP address (or 0.0.0.0/0 for development, but restrict in production)
In the Storage (Root volume) section:
- Expand the storage configuration
- Increase the EBS volume size to 100 GB
- Keep the volume type as gp3 (General Purpose SSD)
Leave all other settings at their default values
Review your configuration and click "Launch instance"
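If you prefer scripting over clicking, the console steps above map roughly onto a single boto3 `run_instances` call. This is a sketch under assumptions, not a drop-in script: the AMI ID and key pair name are placeholders you must supply, `/dev/sda1` is assumed to be the root device name of your Ubuntu AMI, and the default VPC and security group are used.

```python
def build_launch_params(ami_id: str, key_name: str,
                        name: str = "parakeet-asr-instance") -> dict:
    """Build keyword arguments for EC2 run_instances mirroring the console steps."""
    return {
        "ImageId": ami_id,            # placeholder: look up the Ubuntu AMI for your region
        "InstanceType": "g6.2xlarge",
        "KeyName": key_name,          # placeholder: an existing key pair name
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            # Assumed root device name; 100 GB gp3 volume as in the storage step
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"}}
        ],
        "TagSpecifications": [
            {"ResourceType": "instance", "Tags": [{"Key": "Name", "Value": name}]}
        ],
    }


def launch(params: dict) -> str:
    """Launch the instance and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so build_launch_params works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Keeping the parameter dictionary separate from the API call lets you inspect or version the configuration before anything is billed.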

1.2 Monitor Instance Launch

After clicking "Launch instance":

You'll see a confirmation page with your Instance ID
Click on your instance ID to view the instance details page
Wait 1-2 minutes for the instance to reach the "Running" state
Once running, note the Public IPv4 address displayed on the instance page
This auto-assigned public IP is your connection address. It is not an Elastic IP, so it changes whenever the instance is stopped and started again

1.3 Connect to Your Instance via SSH

Once the instance is running, connect using SSH:

# Use the following command (replace the path and DNS/IP accordingly)
ssh -i /path/to/your-key.pem ubuntu@your-instance-public-dns

Example:

ssh -i ~/Downloads/my-parakeet-key.pem ubuntu@ec2-54-123-45-67.compute-1.amazonaws.com

Or using the public IPv4 address:

ssh -i ~/Downloads/my-parakeet-key.pem ubuntu@54.123.45.67

Expected output on first connection:

The authenticity of host '...' can't be established.
ECDSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Type yes and press Enter to add the host to your known hosts.

Success indicator:

ubuntu@ip-xxx-xxx-xxx-xxx:~$

If you see this prompt, your SSH connection is successful! Your EC2 instance is ready for software installation.

Step 2: Assign IAM Role to EC2 Instance

To allow your EC2 instance to access AWS S3 buckets (for storing audio files and transcription results), you need to assign an IAM role. This is more secure than using hardcoded AWS credentials.

2.1 Create an IAM Role

Create the role:

Open the AWS IAM Console (https://console.aws.amazon.com/iam/)
In the left sidebar, click "Roles"
Click the "Create role" button
In the "Trusted entity type" section:
- Select "AWS Service"
In the "Service or use case" section:
- Search for and select "EC2" from the list
- This allows EC2 instances to use this role
Click "Next" to proceed to permissions
In the "Permissions policies" section:
- Search for "S3"
- Select "AmazonS3FullAccess"
- Note: For production environments, create a custom policy that restricts access to specific S3 buckets and operations instead of granting full S3 access
Click "Next" to review
In the "Role name" field, enter a descriptive name:
- Example: asr-ec2-role or parakeet-s3-access-role
Optionally add a description: "Role for EC2 instance to access S3 for ASR audio files"
Click "Create role"
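As a starting point for the custom production policy mentioned in the note above, the sketch below generates a policy document limited to one bucket. The helper name is hypothetical and the action list (list, read, write) is an assumption about what an ASR pipeline needs, so adjust it to your own workflow before using it.

```python
import json


def bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document that only allows list/read/write on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("my-asr-audio-files"), indent=2))
```

You can paste the printed JSON into the IAM policy editor and attach the resulting policy to the role instead of AmazonS3FullAccess.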

2.2 Attach the Role to Your EC2 Instance

Now attach this role to your running EC2 instance:

Go back to the EC2 Dashboard
Click "Instances" in the left sidebar
Find and click on your instance (the g6.2xlarge instance you just created)
You'll see the instance details page
Click the "Actions" button (top-right corner)
Hover over "Security" in the dropdown menu
Click "Modify IAM role"
In the dropdown menu that appears:
- Select the role you just created (e.g., asr-ec2-role)
Click "Update IAM role"

Verification:

Refresh the instance details page
Scroll to the "Details" tab
Look for the "IAM instance profile" field
You should see your role name displayed there

Your EC2 instance can now access S3 without requiring explicit AWS credentials!

2.3 Verify IAM Role Access (Optional)

To confirm the role is working correctly, you can test S3 access from your instance:

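# Connect to your instance via SSH (if not already connected)
ssh -i your-key.pem ubuntu@your-instance-public-dns

# The AWS CLI is not preinstalled on stock Ubuntu AMIs; install it first if needed
sudo snap install aws-cli --classic

# Test listing S3 buckets
aws s3 ls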

# Example output (your actual buckets will be listed):
# 2025-11-02 12:34:56 my-asr-audio-files
# 2025-11-02 12:35:12 my-transcription-results

If you see your S3 buckets listed, the IAM role is properly configured!
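The same role also works from Python: boto3 picks up the instance profile credentials automatically, so no keys appear in code. Below is a minimal sketch of how a transcript could later be written back to S3. The function names, the `transcripts/` prefix, and the JSON layout are all assumptions for illustration, and boto3 must be installed separately (`pip install boto3`).

```python
import json
from pathlib import PurePosixPath


def transcript_key(audio_key: str, prefix: str = "transcripts") -> str:
    """Map an audio object key to the key used for its transcript JSON."""
    stem = PurePosixPath(audio_key).stem
    return f"{prefix}/{stem}.json"


def upload_transcript(bucket: str, audio_key: str, text: str) -> str:
    """Store a transcript next to its source key. Requires boto3 and the IAM role."""
    import boto3  # imported lazily; credentials come from the instance profile

    key = transcript_key(audio_key)
    body = json.dumps({"source": audio_key, "text": text}).encode("utf-8")
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=body, ContentType="application/json"
    )
    return key
```

For example, `transcript_key("audio/call-001.wav")` returns `transcripts/call-001.json`.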

Step 3: System Update and Basic Dependencies

Once connected to your EC2 instance via SSH, start with system updates:

# Update package lists
sudo apt update

# Install Python pip and basic tools
sudo apt install python3-pip -y

Step 4: Install NVIDIA Drivers and CUDA Toolkit

4.1 Install NVIDIA Driver

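For Ubuntu, install an NVIDIA driver from the distribution repositories. The L4 GPU on g6 instances needs driver 525 or later. If nvidia-driver-525 is not available on your Ubuntu release, run apt-cache search nvidia-driver to find a newer package and substitute it below: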

# Install NVIDIA driver (version 525 or later)
sudo apt install nvidia-driver-525 -y

# Reboot the system to load the driver
sudo reboot

After reboot, reconnect to your instance and verify the driver installation:

nvidia-smi

You should see output showing your GPU (NVIDIA L4 for g6 or T4 for g4dn instances).
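If you want to check the driver from a script rather than by eye, nvidia-smi can emit CSV. The sketch below wraps that query; the helper names are made up for this example, and the sample line in the comment is illustrative rather than captured output.

```python
import shutil
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,driver_version",
    "--format=csv,noheader",
]


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line such as 'NVIDIA L4, 23034 MiB, 535.183.01' (illustrative)."""
    name, memory, driver = [part.strip() for part in line.split(",")]
    return {"name": name, "memory": memory, "driver": driver}


def query_gpus() -> list[dict]:
    """Return one dict per GPU, or an empty list if the driver is missing or failing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_gpus()
    print(gpus if gpus else "No GPU reported; check the driver installation and reboot")
```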

4.2 Install CUDA Toolkit

# Install NVIDIA CUDA toolkit (optional for inference: the PyTorch wheels in Step 7 bundle their own CUDA runtime)
sudo apt install nvidia-cuda-toolkit -y

4.3 Install FFmpeg

FFmpeg is required for audio processing:

sudo apt install ffmpeg -y

Verify installation:

ffmpeg -version
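A common use of FFmpeg in this setup is converting arbitrary recordings to 16 kHz mono WAV, the input format NVIDIA describes for Parakeet models. The sketch below builds and runs that command from Python. The function names and file names are assumptions, and NeMo may handle some resampling on its own, so treat this as optional preprocessing.

```python
import subprocess


def ffmpeg_resample_cmd(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that converts src to mono audio at sample_rate."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst,
    ]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_resample_cmd(src, dst), check=True)
```

Calling `convert("meeting.mp3", "meeting_16k.wav")` would produce a file you can pass straight to the transcription step later in this guide.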

Step 5: Set Up Python Environment with Miniconda

Using Conda helps manage dependencies and avoid conflicts.

5.1 Download and Install Miniconda

# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh

Follow the prompts:

Press ENTER to review the license
Type yes to accept
Press ENTER to confirm the installation location
Type yes when asked to initialize Miniconda3

5.2 Initialize Conda

# Initialize conda for bash
/home/ubuntu/miniconda3/bin/conda init

# Reload your shell configuration
source ~/.bashrc

5.3 Accept Conda Terms of Service

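# Accept the Terms of Service for Anaconda's default channels
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Add conda-forge and enforce strict channel priority
conda config --add channels conda-forge
conda config --set channel_priority strict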

Step 6: Create and Configure Conda Environment

6.1 Create New Environment

# Create conda environment with Python 3.11
conda create -n parakeet_env python=3.11 -y

# Activate the environment
conda activate parakeet_env

6.2 Install GCC Libraries (Important)

Some dependencies require updated GCC libraries:

# Install GCC libraries via conda
conda install -c conda-forge libgcc-ng libstdcxx-ng -y

Step 7: Install PyTorch and NeMo Toolkit

7.1 Install PyTorch

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

7.2 Install Core Dependencies

# Install numpy and other core dependencies
pip install numpy packaging Cython

7.3 Install NeMo Toolkit with ASR Support

# Install NeMo toolkit with ASR (Automatic Speech Recognition) support
pip install "nemo_toolkit[asr]"

This installation includes:

NeMo core framework
ASR-specific modules
FastConformer and TDT decoder components
All necessary dependencies for Parakeet models

Step 8: Verify Installation

8.1 Check GPU Access

Create a test script to verify PyTorch can access the GPU:

python3 << EOF
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
EOF

Expected output (exact version numbers and memory figures will vary):

PyTorch version: 2.x.x
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA L4 (or Tesla T4)
GPU memory: ~22 GB (or ~15 GB on a T4)

8.2 Verify NeMo Installation

python3 << EOF
import nemo
import nemo.collections.asr as nemo_asr
print(f"NeMo version: {nemo.__version__}")
print("NeMo ASR module loaded successfully!")
EOF
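Importing NeMo can take a while, so for a quick pre-flight check it can help to read package versions from metadata without importing anything. This is a small sketch using only the standard library; the package list is an assumption based on what this guide installs.

```python
from importlib import metadata


def installed_versions(packages: list[str]) -> dict:
    """Return {package: version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "nemo_toolkit", "numpy"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated parakeet_env environment; a missing entry usually means pip installed into a different Python.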

Step 9: Load and Test Parakeet TDT 0.6B v3

9.1 Create Inference Script

Create a file called parakeet_inference.py:

import os
import urllib.request

import nemo.collections.asr as nemo_asr
import torch

SAMPLE_URL = "https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav"
SAMPLE_PATH = "sample_audio.wav"

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

# Load the Parakeet TDT 0.6B v3 model (downloaded from Hugging Face on first run)
print("Loading Parakeet TDT 0.6B v3 model...")
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Download a sample audio file if it is not already present
if not os.path.exists(SAMPLE_PATH):
    urllib.request.urlretrieve(SAMPLE_URL, SAMPLE_PATH)

# Transcribe the audio
print("\nTranscribing audio...")
transcription = asr_model.transcribe([SAMPLE_PATH])

# Print results; recent NeMo releases return Hypothesis objects, so read the .text attribute
print("\nTranscription result:")
print(transcription[0].text)

9.2 Run the Inference Script

python3 parakeet_inference.py

The first run will:

Download the model from Hugging Face (~2.5 GB)
Load it into GPU memory
Download a sample audio file
Transcribe the audio

Expected output:

CUDA available: True
Using GPU: NVIDIA L4
Loading Parakeet TDT 0.6B v3 model...
[NeMo I ...] Instantiating model from pre-trained checkpoint

Transcribing audio...

Transcription result:
he tells us that at this festive season of the year with...
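The introduction mentions timestamp predictions, which the script above does not use. NVIDIA's model card shows passing `timestamps=True` to `transcribe` and reading `output[0].timestamp["segment"]`, a list of dicts with `start`, `end`, and `segment` keys. Assuming that structure, the sketch below turns segments into SRT subtitles. The helper names are made up for this example, and the NeMo call is left as a comment because it needs the loaded model.

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Convert dicts with 'start', 'end', 'segment' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['segment']}\n")
    return "\n".join(blocks)


# Assumed usage with the model loaded in parakeet_inference.py:
# output = asr_model.transcribe(["sample_audio.wav"], timestamps=True)
# print(segments_to_srt(output[0].timestamp["segment"]))
```

If your NeMo version exposes timestamps differently, only the two commented lines need to change; the formatting helpers work on plain dictionaries.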

Troubleshooting

Issue 1: SSH Connection Refused

Possible causes and solutions:

Instance is still starting up (wait 1-2 minutes)
Security group doesn't allow SSH on port 22
Incorrect key permissions:

  chmod 400 /path/to/your-key.pem

Wrong username (should be ubuntu for Ubuntu AMIs)
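To check the key permissions before connecting, here is a small sketch you can run on your local machine (macOS or Linux). The function name is hypothetical; it simply tests whether group or other users have any access to the key file, which is the condition SSH objects to.

```python
import os
import stat


def key_permissions_ok(path: str) -> bool:
    """Return True if only the owner has access to the file (e.g. mode 400 or 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0  # no bits set for group or others


if __name__ == "__main__":
    key_path = os.path.expanduser("~/Downloads/my-parakeet-key.pem")
    if os.path.exists(key_path):
        print("OK" if key_permissions_ok(key_path) else f"Run: chmod 400 {key_path}")
```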

Issue 2: IAM Role Not Working

Solution: Verify role attachment:

# From within the EC2 instance
aws sts get-caller-identity
# Should show the role ARN
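When the role is attached, the ARN returned by that command has the form arn:aws:sts::ACCOUNT:assumed-role/ROLE_NAME/INSTANCE_ID. The sketch below extracts the role name so a script can confirm it is running under the expected role. The helper names are assumptions, the account ID in the example is a placeholder, and `current_role` needs boto3 installed.

```python
from typing import Optional


def role_from_arn(arn: str) -> Optional[str]:
    """Return the role name from an STS assumed-role ARN, or None for other ARNs."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "sts":
        return None
    resource = parts[5].split("/")
    if len(resource) == 3 and resource[0] == "assumed-role":
        return resource[1]
    return None


def current_role() -> Optional[str]:
    """Ask STS who we are and return the role name, if any. Requires boto3."""
    import boto3  # imported lazily so role_from_arn works without boto3

    arn = boto3.client("sts").get_caller_identity()["Arn"]
    return role_from_arn(arn)
```

For example, `role_from_arn("arn:aws:sts::123456789012:assumed-role/asr-ec2-role/i-0123456789abcdef0")` returns `asr-ec2-role`, while an IAM user ARN returns `None`.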

Next Step

We will explore different deployment options once we have confirmed which open-source model we are going to use.

Conclusion

You now have a fully functional Parakeet TDT 0.6B v3 ASR system running on AWS EC2 with S3 integration.

Happy transcribing! 🎙️