DEV Community

Architect Alick
Setting Up NVIDIA Parakeet TDT 0.6B v3 for Speech Recognition on AWS EC2 Ubuntu

Introduction

NVIDIA's Parakeet TDT 0.6B v3 is an automatic speech recognition (ASR) model with 600 million parameters. According to its model card, v3 extends the English-only v2 release to 25 European languages with automatic language detection. It combines the FastConformer encoder with the Token-and-Duration Transducer (TDT) decoder to provide:

  • Automatic punctuation and capitalization
  • Word-level timestamp predictions
  • Processing of audio segments up to 24 minutes in a single pass
  • Impressive speed: RTFx of 3380 on the HF-Open-ASR leaderboard
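To put the last two bullets in concrete terms: RTFx is the ratio of audio duration to processing time, so a higher value means faster transcription. The sketch below is plain arithmetic, not a benchmark. The leaderboard figure comes from NVIDIA's benchmark setup, so throughput on an L4 or T4 may differ, and the function names are illustrative.

```python
import math

def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Time needed to transcribe audio at a given inverse real-time factor (RTFx)."""
    return audio_seconds / rtfx

def passes_needed(audio_minutes: float, max_minutes_per_pass: float = 24.0) -> int:
    """Number of single-pass chunks needed for a recording of the given length."""
    return max(1, math.ceil(audio_minutes / max_minutes_per_pass))

one_hour = 60 * 60
print(f"1 hour of audio at RTFx 3380: {processing_seconds(one_hour, 3380):.2f} s")
print(f"A 90-minute recording needs {passes_needed(90)} passes of up to 24 minutes")
```

At the quoted RTFx, an hour of audio takes roughly one second of compute; a 90-minute recording would need to be split into four chunks to stay within the 24-minute single-pass limit.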

This guide walks you through setting up Parakeet TDT 0.6B v3 on an AWS EC2 Ubuntu instance, similar to how you would deploy Whisper, but optimized for NVIDIA's cutting-edge ASR technology.

Choosing the Right EC2 Instance

For running Parakeet TDT 0.6B v3, you need an EC2 instance with NVIDIA GPU support. Here are your options:

Recommended Instance Types

g6.2xlarge (Recommended)

  • GPU: 1x NVIDIA L4 with 24 GB memory
  • vCPUs: 8
  • RAM: 32 GiB
  • Performance: Up to 2x higher deep learning inference performance than g4dn instances, according to AWS
  • Cost: ~$0.98/hour (us-east-1)
  • Best for: Production workloads with modern GPU architecture

g4dn.xlarge (Budget Option)

  • GPU: 1x NVIDIA T4 with 16 GB memory
  • vCPUs: 4
  • RAM: 16 GiB
  • Cost: ~$0.53/hour (us-east-1), roughly half the price of g6.2xlarge
  • Best for: Development and testing

Hardware Requirements:

  • Minimum 2GB RAM for model loading
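  • The model card lists NVIDIA Volta, Ampere, Hopper, and Blackwell as supported architectures; this guide runs the model on the L4 (Ada Lovelace) and T4 (Turing) GPUs above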
  • At least 30-40 GB disk space
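A quick way to compare instance choices is to estimate the running cost for your expected schedule. The sketch below uses the g6.2xlarge on-demand price quoted above; it ignores EBS storage and data transfer charges, and the schedule values are made-up examples.

```python
# On-demand price quoted above for g6.2xlarge in us-east-1; check current AWS pricing.
G6_2XLARGE_HOURLY_USD = 0.98

def running_cost(hourly_usd: float, hours_per_day: float, days: int) -> float:
    """Cost of keeping an on-demand instance running for the given schedule."""
    return round(hourly_usd * hours_per_day * days, 2)

always_on = running_cost(G6_2XLARGE_HOURLY_USD, 24, 30)
workday_only = running_cost(G6_2XLARGE_HOURLY_USD, 8, 22)
print(f"24/7 for 30 days: ${always_on:.2f}")
print(f"8 h/day for 22 days: ${workday_only:.2f}")
```

Stopping the instance outside working hours cuts the compute bill substantially, although the EBS volume is still billed while the instance is stopped.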

Step 1: Launch EC2 Instance on AWS

1.1 Create the EC2 Instance

The g6.2xlarge instance is one of the most cost-effective options for running speech recognition models on AWS. Follow these steps to launch your instance:

Step-by-step launch process:

  1. Open the AWS Console and navigate to the EC2 Dashboard
  2. Click the "Launch instance" button in the top section
  3. Enter an instance name (e.g., "parakeet-asr-instance")
  4. In the Application and OS Images (Amazon Machine Image) section:
    • Search for "Ubuntu"
    • Select Ubuntu Server 22.04 LTS (or Ubuntu 24.04 LTS for newer releases)
    • Note: a "Free tier eligible" label applies to the AMI only; GPU instance types such as g6 and g4dn are billed at on-demand rates
  5. In the Instance type section:
    • Search for or select g6.2xlarge
    • This instance provides 1x NVIDIA L4 GPU with 24GB memory, 8 vCPUs, and 32 GiB RAM
  6. In the Key pair (login) section:
    • Select an existing key pair or create a new one
    • Important: Download and securely save the .pem file if creating a new key pair
    • This key is required for SSH access
  7. In the Network settings section:
    • Leave default VPC settings
    • Allow SSH traffic from your IP address (or 0.0.0.0/0 for development, but restrict in production)
  8. In the Storage (Root volume) section:
    • Expand the storage configuration
    • Increase the EBS volume size to 100 GB
    • Keep the volume type as gp3 (General Purpose SSD)
  9. Leave all other settings at their default values
  10. Review your configuration and click "Launch instance"
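If you prefer to script the launch, the console steps above map onto a single EC2 RunInstances call. The following is a minimal sketch using boto3, assuming boto3 is installed and AWS credentials are configured on the machine you run it from. The AMI ID and key pair name are placeholders, and the root device name is an assumption to verify against your AMI.

```python
def build_launch_params(ami_id: str, key_name: str, name: str = "parakeet-asr-instance") -> dict:
    """Keyword arguments for the EC2 RunInstances call, mirroring the console steps."""
    return {
        "ImageId": ami_id,              # Ubuntu Server 22.04 LTS AMI ID for your region
        "InstanceType": "g6.2xlarge",
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [{
            "DeviceName": "/dev/sda1",  # assumed root device name; verify for your AMI
            "Ebs": {"VolumeSize": 100, "VolumeType": "gp3"},
        }],
        "TagSpecifications": [{
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name}],
        }],
    }

def launch(ami_id: str, key_name: str, region: str = "us-east-1") -> str:
    """Launch the instance and return its ID. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the parameter builder works without it
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_params(ami_id, key_name))
    return response["Instances"][0]["InstanceId"]

params = build_launch_params("ami-EXAMPLE", "my-parakeet-key")  # placeholder values
print(params["InstanceType"], params["BlockDeviceMappings"][0]["Ebs"])
```

Unlike the console wizard, this call does not create a security group for you, so you would also need to pass one that allows SSH from your IP address.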

1.2 Monitor Instance Launch

After clicking "Launch instance":

  1. You'll see a confirmation page with your Instance ID
  2. Click on your instance ID to view the instance details page
  3. Wait 1-2 minutes for the instance to reach the "Running" state
  4. Once running, note the Public IPv4 address displayed on the instance page
  5. This auto-assigned public IP is your connection address. It is not an Elastic IP and will change if you stop and start the instance; allocate an Elastic IP if you need a fixed address
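The instance state and public address can also be read programmatically. This is a small sketch built on the boto3 DescribeInstances call, assuming boto3 and credentials are available; the sample response is trimmed and made up, and the helper names are illustrative.

```python
def connection_info(response: dict) -> tuple:
    """Extract (state, public IPv4 or None) from a DescribeInstances response."""
    instance = response["Reservations"][0]["Instances"][0]
    return instance["State"]["Name"], instance.get("PublicIpAddress")

def fetch_connection_info(instance_id: str, region: str = "us-east-1") -> tuple:
    """Query EC2 for one instance. Requires boto3 and AWS credentials."""
    import boto3
    ec2 = boto3.client("ec2", region_name=region)
    return connection_info(ec2.describe_instances(InstanceIds=[instance_id]))

# Trimmed, made-up response in the shape boto3 returns
sample = {"Reservations": [{"Instances": [
    {"State": {"Name": "running"}, "PublicIpAddress": "54.123.45.67"}
]}]}
print(connection_info(sample))
```

An instance that is still pending has no public address yet, which is why the helper returns None for the IP in that case.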

1.3 Connect to Your Instance via SSH

Once the instance is running, connect using SSH:

# Use the following command (replace the path and DNS/IP accordingly)
ssh -i /path/to/your-key.pem ubuntu@your-instance-public-dns

Example:

ssh -i ~/Downloads/my-parakeet-key.pem ubuntu@ec2-54-123-45-67.compute-1.amazonaws.com

Or using the public IPv4 address:

ssh -i ~/Downloads/my-parakeet-key.pem ubuntu@54.123.45.67

Expected output on first connection:

The authenticity of host '...' can't be established.
ECDSA key fingerprint is ...
Are you sure you want to continue connecting (yes/no/[fingerprint])?

Type yes and press Enter to add the host to your known hosts.

Success indicator:

ubuntu@ip-xxx-xxx-xxx-xxx:~$

If you see this prompt, your SSH connection is successful! Your EC2 instance is ready for software installation.


Step 2: Assign IAM Role to EC2 Instance

To allow your EC2 instance to access AWS S3 buckets (for storing audio files and transcription results), you need to assign an IAM role. This is more secure than using hardcoded AWS credentials.

2.1 Create an IAM Role

Create the role:

  1. Open the AWS IAM Console (https://console.aws.amazon.com/iam/)
  2. In the left sidebar, click "Roles"
  3. Click the "Create role" button
  4. In the "Trusted entity type" section:
    • Select "AWS Service"
  5. In the "Service or use case" section:
    • Search for and select "EC2" from the list
    • This allows EC2 instances to use this role
  6. Click "Next" to proceed to permissions
  7. In the "Permissions policies" section:
    • Search for "S3"
    • Select "AmazonS3FullAccess"
    • Note: For production environments, create a custom policy that restricts access to specific S3 buckets and operations instead of granting full S3 access
  8. Click "Next" to review
  9. In the "Role name" field, enter a descriptive name:
    • Example: asr-ec2-role or parakeet-s3-access-role
  10. Optionally add a description: "Role for EC2 instance to access S3 for ASR audio files"
  11. Click "Create role"
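For the production case mentioned in step 7, a custom policy can be limited to one bucket. The sketch below builds such a policy document in Python; the bucket name is a placeholder and the set of actions is an assumption about what an ASR pipeline needs (list, read, write), so adjust it to your workflow.

```python
import json

def build_s3_policy(bucket: str) -> dict:
    """IAM policy document allowing list, read, and write on a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # listing applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # object reads and writes apply to keys inside the bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }

policy = build_s3_policy("my-asr-audio-files")  # placeholder bucket name
print(json.dumps(policy, indent=2))
```

With a policy like this, listing all buckets is denied because s3:ListAllMyBuckets is not granted, while listing the contents of the named bucket still works.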

2.2 Attach the Role to Your EC2 Instance

Now attach this role to your running EC2 instance:

  1. Go back to the EC2 Dashboard
  2. Click "Instances" in the left sidebar
  3. Find and click on your instance (the g6.2xlarge instance you just created)
  4. You'll see the instance details page
  5. Click the "Actions" button (top-right corner)
  6. Hover over "Security" in the dropdown menu
  7. Click "Modify IAM role"
  8. In the dropdown menu that appears:
    • Select the role you just created (e.g., asr-ec2-role)
  9. Click "Update IAM role"

Verification:

  1. Refresh the instance details page
  2. Scroll to the "Details" tab
  3. Look for the "IAM instance profile" field
  4. You should see your role name displayed there

Your EC2 instance can now access S3 without requiring explicit AWS credentials!

2.3 Verify IAM Role Access (Optional)

To confirm the role is working correctly, you can test S3 access from your instance:

# Connect to your instance via SSH (if not already connected)
ssh -i your-key.pem ubuntu@your-instance-public-dns

# Test listing S3 buckets
# (if `aws` is not found, install the AWS CLI first; stock Ubuntu AMIs may not include it)
aws s3 ls

# Example output (your actual buckets will be listed):
# 2025-11-02 12:34:56 my-asr-audio-files
# 2025-11-02 12:35:12 my-transcription-results

If you see your S3 buckets listed, the IAM role is properly configured!
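Once the role works, Python code on the instance can move audio and transcripts through S3 without any credentials in the script. The following is a rough sketch assuming boto3 is installed (`pip install boto3`); the key layout (a `transcripts/` prefix and `.txt` suffix) and the helper names are hypothetical choices, not part of this guide's setup.

```python
from pathlib import PurePosixPath

def transcript_key_for(audio_key: str, prefix: str = "transcripts") -> str:
    """Map an audio object key to the key used for its transcript."""
    return f"{prefix}/{PurePosixPath(audio_key).stem}.txt"

def download_audio(bucket: str, key: str, local_path: str) -> None:
    """Fetch an audio file from S3; boto3 picks up the instance role automatically."""
    import boto3
    boto3.client("s3").download_file(bucket, key, local_path)

def upload_transcript(bucket: str, audio_key: str, text: str) -> str:
    """Store the transcript next to a predictable key and return that key."""
    import boto3
    key = transcript_key_for(audio_key)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return key

print(transcript_key_for("incoming/meeting-01.wav"))
```

Deriving the transcript key from the audio key keeps the two objects easy to match up later.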


Step 3: System Update and Basic Dependencies

Once connected to your EC2 instance via SSH, start with system updates:

# Update package lists
sudo apt update

# Install Python pip and basic tools
sudo apt install python3-pip -y

Step 4: Install NVIDIA Drivers and CUDA Toolkit

4.1 Install NVIDIA Driver

For Ubuntu, install the appropriate NVIDIA driver version:

# Install NVIDIA driver (version 525 or later)
sudo apt install nvidia-driver-525 -y

# Reboot the system to load the driver
sudo reboot

After reboot, reconnect to your instance and verify the driver installation:

nvidia-smi

You should see output showing your GPU (NVIDIA L4 for g6 or T4 for g4dn instances).
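If you want to check the driver from a script rather than by eye, `nvidia-smi` can emit CSV that is easy to parse. This is a minimal sketch using the `--query-gpu` and `--format` options; the sample row is illustrative only and the values on your instance will differ.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,memory.total,driver_version", "--format=csv,noheader"]

def parse_gpu_lines(output: str) -> list:
    """Turn nvidia-smi CSV rows into dictionaries, one per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        name, memory, driver = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "memory": memory, "driver": driver})
    return gpus

def query_gpus() -> list:
    """Run nvidia-smi on this machine; raises if the driver is not installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_lines(result.stdout)

# Illustrative row only; the values on your instance will differ
print(parse_gpu_lines("NVIDIA L4, 23034 MiB, 535.183.01"))
```

Calling `query_gpus()` on the instance after the reboot should return one entry for the L4 or T4.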

4.2 Install CUDA Toolkit

# Install NVIDIA CUDA toolkit
sudo apt install nvidia-cuda-toolkit -y

4.3 Install FFmpeg

FFmpeg is required for audio processing:

sudo apt install ffmpeg -y

Verify installation:

ffmpeg -version
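A typical use of FFmpeg in this setup is converting arbitrary recordings to the input format listed on the Parakeet model card: 16 kHz, single-channel WAV or FLAC. The sketch below wraps that conversion in Python; the file names are placeholders and the helper names are illustrative.

```python
import subprocess

def ffmpeg_to_wav_command(src: str, dst: str, sample_rate: int = 16000) -> list:
    """Build an ffmpeg command that converts any input to mono WAV at the given rate."""
    # -y overwrites dst, -ac 1 downmixes to mono, -ar sets the sample rate
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]

def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(ffmpeg_to_wav_command(src, dst), check=True)

print(" ".join(ffmpeg_to_wav_command("call.mp3", "call.wav")))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces.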

Step 5: Set Up Python Environment with Miniconda

Using Conda helps manage dependencies and avoid conflicts.

5.1 Download and Install Miniconda

# Download Miniconda installer
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run the installer
bash Miniconda3-latest-Linux-x86_64.sh

Follow the prompts:

  • Press ENTER to review the license
  • Type yes to accept
  • Press ENTER to confirm the installation location
  • Type yes when asked to initialize Miniconda3

5.2 Initialize Conda

# Initialize conda for bash
/home/ubuntu/miniconda3/bin/conda init

# Reload your shell configuration
source ~/.bashrc

5.3 Accept Conda Terms of Service

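# Accept the TOS for Anaconda's default channels (required by recent Miniconda releases)
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
# Use strict channel priority and add conda-forge
conda config --set channel_priority strict
conda config --add channels conda-forge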

Step 6: Create and Configure Conda Environment

6.1 Create New Environment

# Create conda environment with Python 3.11
conda create -n parakeet_env python=3.11 -y

# Activate the environment
conda activate parakeet_env

6.2 Install GCC Libraries (Important)

Some dependencies require updated GCC libraries:

# Install GCC libraries via conda
conda install -c conda-forge libgcc-ng libstdcxx-ng -y

Step 7: Install PyTorch and NeMo Toolkit

7.1 Install PyTorch

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

7.2 Install Core Dependencies

# Install numpy and other core dependencies
pip install numpy packaging Cython

7.3 Install NeMo Toolkit with ASR Support

# Install NeMo toolkit with ASR (Automatic Speech Recognition) support
# (quote the requirement so the shell does not treat the brackets as a glob pattern)
pip install "nemo_toolkit[asr]"

This installation includes:

  • NeMo core framework
  • ASR-specific modules
  • FastConformer and TDT decoder components
  • All necessary dependencies for Parakeet models

Step 8: Verify Installation

8.1 Check GPU Access

Create a test script to verify PyTorch can access the GPU:

python3 << EOF
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA version: {torch.version.cuda}")
if torch.cuda.is_available():
    print(f"GPU device: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
EOF

Expected output:

PyTorch version: 2.x.x
CUDA available: True
CUDA version: 11.8
GPU device: NVIDIA L4 (or NVIDIA T4)
GPU memory: 22.35 GB (or 16.00 GB)

8.2 Verify NeMo Installation

python3 << EOF
import nemo
import nemo.collections.asr as nemo_asr
print(f"NeMo version: {nemo.__version__}")
print("NeMo ASR module loaded successfully!")
EOF
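When an import fails, it helps to see which packages actually landed in the active environment. The sketch below uses the standard library's `importlib.metadata` to report installed versions; the package list is an assumption based on the install steps in this guide.

```python
from importlib import metadata

def installed_versions(packages: list) -> dict:
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(["torch", "nemo_toolkit", "numpy"]).items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```

Because this only reads package metadata, it runs quickly and does not load PyTorch or NeMo.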

Step 9: Load and Test Parakeet TDT 0.6B v3

9.1 Create Inference Script

Create a file called parakeet_inference.py:

import nemo.collections.asr as nemo_asr
import torch

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")

# Load the Parakeet TDT 0.6B v3 model
print("Loading Parakeet TDT 0.6B v3 model...")
asr_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-0.6b-v3")

# Test with a sample audio file
# Download a sample audio file
import os
if not os.path.exists("sample_audio.wav"):
    os.system("wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav -O sample_audio.wav")

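# Transcribe the audio
print("\nTranscribing audio...")
transcription = asr_model.transcribe(["sample_audio.wav"])

# Print results: recent NeMo releases return Hypothesis objects with the text in
# .text, while older releases return plain strings, so handle both
print("\nTranscription result:")
result = transcription[0]
print(getattr(result, "text", result))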

9.2 Run the Inference Script

python3 parakeet_inference.py

The first run will:

  1. Download the model from Hugging Face (~2.5 GB)
  2. Load it into GPU memory
  3. Download a sample audio file
  4. Transcribe the audio

Expected output:

CUDA available: True
Using GPU: NVIDIA L4
Loading Parakeet TDT 0.6B v3 model...
[NeMo I ...] Instantiating model from pre-trained checkpoint

Transcribing audio...

Transcription result:
he tells us that at this festive season of the year with...
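The timestamp support mentioned in the introduction is enabled with `timestamps=True` on `transcribe`, as shown on the model card. The sketch below formats segment-level timestamps; it assumes a loaded `asr_model` and a recent NeMo release where each result exposes a `timestamp` dictionary. The demo segments are made up so the formatter can run without a GPU.

```python
def format_segments(segments: list) -> list:
    """Render segment timestamp dictionaries as 'start - end : text' lines."""
    return [f"{s['start']:.2f}s - {s['end']:.2f}s : {s['segment']}" for s in segments]

def transcribe_with_timestamps(asr_model, path: str) -> list:
    """Transcribe one file and return formatted segment lines (needs NeMo and a loaded model)."""
    output = asr_model.transcribe([path], timestamps=True)
    return format_segments(output[0].timestamp["segment"])

# Made-up segments in the shape described by the model card, for illustration only
demo = [
    {"start": 0.0, "end": 2.4, "segment": "Hello there."},
    {"start": 2.4, "end": 5.12, "segment": "This is a test."},
]
print("\n".join(format_segments(demo)))
```

Word-level entries are available under the `"word"` key of the same dictionary if you need finer granularity.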

Troubleshooting

Issue 1: SSH Connection Refused

Possible causes and solutions:

  • Instance is still starting up (wait 1-2 minutes)
  • Security group doesn't allow SSH on port 22
  • Incorrect key permissions; restrict the key file so that only you can read it:
    chmod 400 /path/to/your-key.pem
  • Wrong username (should be ubuntu for Ubuntu AMIs)
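Two of these causes can be checked from your local machine before retrying. The following is a small diagnostic sketch using only the standard library; the function names are illustrative, and you would pass your own key path and instance address.

```python
import os
import socket
import stat

def key_permissions_ok(path: str) -> bool:
    """True if the key file grants no access to group or others (as chmod 400 ensures)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0

def port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_open("54.123.45.67")` returning False while the instance is running usually points at the security group rather than the key.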

Issue 2: IAM Role Not Working

Solution: Verify role attachment:

# From within the EC2 instance
aws sts get-caller-identity
# Should show the role ARN
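The same check can be done from Python, which is handy inside a startup script. This sketch assumes boto3 is installed on the instance; the ARN in the example is made up, in the format STS returns for an assumed instance role.

```python
def role_name_from_arn(arn: str):
    """Return the role name from an assumed-role ARN, or None for other identities."""
    resource = arn.split(":", 5)[-1]  # e.g. "assumed-role/asr-ec2-role/i-0123..."
    parts = resource.split("/")
    if len(parts) >= 2 and parts[0] == "assumed-role":
        return parts[1]
    return None

def current_role():
    """Ask STS who we are. Requires boto3 and credentials (the instance role)."""
    import boto3
    return role_name_from_arn(boto3.client("sts").get_caller_identity()["Arn"])

# Made-up ARN for illustration
print(role_name_from_arn("arn:aws:sts::123456789012:assumed-role/asr-ec2-role/i-0123456789abcdef0"))
```

If `current_role()` returns None or raises a credentials error, the role is probably not attached to the instance.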

Next Step

  • We will explore different deployment options once we have confirmed which open-source model we are going to use.

Conclusion

You now have a fully functional Parakeet TDT 0.6B v3 ASR system running on AWS EC2 with S3 integration.

Happy transcribing! 🎙️
