Dehya Qalbi

Posted on Nov 25, 2024 • Edited on Nov 27, 2024

Textor-AI: A powerful Speech-to-Text built with Django REST Framework, Next.js and AssemblyAI

#devchallenge #assemblyaichallenge #ai #api

This is a submission for the AssemblyAI Challenge: Sophisticated Speech-to-Text.

What I Built

Textor-AI is a modern web application that converts audio recordings into accurate text transcriptions using advanced speech recognition technology. Built with Next.js 14, TypeScript, and Tailwind CSS, it offers a seamless and intuitive user experience for audio transcription needs.

A powerful Speech-to-Text API built with Django REST Framework and AssemblyAI. Textor-AI provides enterprise-grade transcription capabilities with advanced features like multi-language support, real-time status tracking, and comprehensive transcription management.

Transform your voice and audio into text effortlessly using AssemblyAI's powerful speech recognition. Record directly or upload audio files, and let our AI handle the rest. Supports multiple formats and provides instant transcriptions.

Demo Links

Live Demo: https://textor-ai-dehyabi.vercel.app/
Frontend Repository: https://github.com/dehyabi/fe-textor-ai
Backend Repository: https://github.com/dehyabi/textor-ai

Screenshots

The Journey

In the ever-evolving landscape of web development, I recently embarked on an exciting project to build Textor-AI, a modern audio transcription application powered by AssemblyAI. This journey not only enhanced my technical skills but also provided valuable insights into the intersection of AI technology and user experience design.

Project Overview

Textor-AI was born from the desire to create a seamless, user-friendly solution for converting speech to text. The application features a modern Next.js frontend and a robust Django backend, allowing users to upload audio files and receive accurate transcriptions. The tech stack was carefully chosen to ensure scalability, performance, and production readiness.

Technical Stack

Frontend

Next.js 14: Latest React framework for production-grade applications
TypeScript: For type-safe code and better development experience
TailwindCSS: For responsive and beautiful styling
Framer Motion: For fluid animations and transitions
Web Audio API: For robust audio processing
Axios: For handling HTTP requests
Vercel: Cloud hosting platform for frontend deployment

Backend

Django: Python web framework for building robust backend services
Django REST Framework: For creating RESTful APIs
AssemblyAI Integration: Implemented in Django views for audio processing
MySQL: Database for storing transcription data
PythonAnywhere: Cloud hosting platform for Django backend deployment

Implementation Details

Audio Processing and Recording

Implemented browser's native MediaRecorder API
Built custom audio visualization system
Handled cross-browser compatibility
Managed microphone permissions effectively
Created format conversion system using Web Audio API

API Integration

AssemblyAI Integration
- Implemented AssemblyAI API calls in Django views
- Handled file uploads and transcription requests
- Managed API responses and error handling
- Stored transcription results in MySQL database
API Endpoints
- /api/transcribe/upload/: Handles audio file uploads
- /api/transcribe/{transcript_id}/: Checks transcription status for specific ID
- /api/transcribe/: Retrieves transcription history

User Interface

Minimalist, futuristic interface design
Dark mode implementation
Smooth animations with Framer Motion
Real-time transcription progress tracking
Responsive design for all devices

Deployment Architecture

Frontend deployed on Vercel
Backend hosted on PythonAnywhere cloud platform
SSL/HTTPS enabled for secure communication
Automated backup and monitoring
Load balancing configuration

Technical Challenges and Solutions

Audio Processing
- Challenge: Browser compatibility with different formats
- Solution: Sophisticated format conversion system using Web Audio API
- Implemented efficient error management
- Optimized audio quality maintenance
File Management
- Challenge: Large audio files affecting performance
- Solution: Client-side file size validation (max 5MB)
- Implemented compression using Next.js API routes
- Added proper error handling and retry mechanisms
Real-time Updates
- Challenge: Managing async transcription processes
- Solution: Custom progress tracking system
- Implemented proper status tracking
- Created engaging user feedback system

Performance Optimization

Implemented lazy loading for components
Utilized Next.js server-side rendering
Optimized image and audio processing
Implemented code splitting
Designed efficient state management
Optimized API calls with caching
Added retry logic for failed requests

Learning Outcomes

Deep understanding of audio processing in web browsers
Expertise in Next.js 14 and TypeScript
Experience with Django backend development
Knowledge of AssemblyAI API integration
Skills in modern UI/UX design
Proficiency in performance optimization

Future Improvements

Features
- Speaker diarization
- Real-time transcription
- Multiple language support
- Progressive Web App capabilities
- Batch processing
- Export functionality
Technical
- WebSocket implementation
- Enhanced caching system
- Further performance optimization
- Advanced security features
- Load balancing improvements

Conclusion

Building Textor-AI with Next.js 14, Django, and AssemblyAI has been a valuable learning experience. The combination of these technologies provides a robust foundation for building scalable speech-to-text applications. The Django backend, deployed on PythonAnywhere, proved particularly effective for handling the AssemblyAI integration, while Next.js delivered a smooth and responsive frontend experience with improved performance through server-side rendering.

DEV Community