This is a submission for the AssemblyAI Voice Agents Challenge
What I Built
Inspired by the need for effortless audio transcription and content creation, I built Swift-Pad, a powerful audio transcription and transformation tool leveraging AssemblyAI's Universal-Streaming technology.
Swift-Pad allows users to easily record or upload audio, transcribe it with remarkable accuracy, and instantly transform those transcriptions into summaries, emails, blogs, quick notes, and more. The goal was simple: remove the hassle from turning audio conversations into actionable content.
This submission addresses the Business Automation Voice Agent prompt with:
- Automated Audio Transcription: Real-time, high-accuracy speech-to-text transcription for business meetings, calls, and interviews.
- AI-driven Content Transformation: Converts transcriptions into summaries, emails, blog posts, lists, and notes instantly.
- Speaker Diarization and Highlight Extraction: Automatically identifies speakers and key moments, enhancing readability and usability.
- Seamless File Management: Securely manages audio files with automatic storage and retrieval through Supabase integration.
- Scalable and Secure User Authentication: Provides robust user authentication and data security through Clerk and Prisma ORM integration.
- Usage Tracking & Rate Limiting: Ensures fair and controlled usage through Upstash Redis integration.
Core Problem Addressed
Professionals, creators, and businesses frequently face the time-consuming task of manually transcribing audio recordings and converting conversations into actionable formats. Swift-Pad automates this workflow, cutting hours of manual effort down to minutes so users can quickly capture, manage, and reuse insights from audio content.
Demo
Live Demo: https://swift-pad.vercel.app/
Screenshots:
- All Notes Page
- Recording & Uploading Interface
- Transcription Display & Transformation Options
- Generated Content from Transcription
GitHub Repository
Swift Pad
Swift Pad is a web application for audio transcription and transformation. Record or upload audio files, get accurate transcriptions, and transform them into various formats like summaries, blog posts, emails, and more.
Features
- 🎙️ Audio Recording: Record audio directly in your browser
- 📤 File Upload: Upload existing audio files (MP3, WAV, M4A, etc.)
- 🔤 Accurate Transcription: Powered by AssemblyAI's state-of-the-art speech recognition
- 🌐 Multi-language Support: Transcribe audio in multiple languages
- 🔄 Text Transformations: Convert transcriptions into summaries, blog posts, emails, and more
- 👥 Speaker Diarization: Identify different speakers in conversations
- 📊 Auto Highlights: Automatically extract key moments from transcriptions
- 🔒 Secure Storage: Files stored securely with Supabase Storage
Technology Stack
- Frontend: Next.js 15, React 19, TailwindCSS
- Backend: Next.js API routes, tRPC
- Database: PostgreSQL via Prisma ORM
- Authentication: Clerk Auth
- Storage: Supabase Storage
- AI Services:
- …
Technical Implementation & AssemblyAI Integration
1. Flexible Dual-SDK Architecture
Swift Pad uses a dual-SDK architecture to maximize flexibility with AssemblyAI integration, enabling both direct SDK access and AI SDK compatibility:
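// assemblyaiClient.ts
import { createAssemblyAI } from '@ai-sdk/assemblyai';
import { AssemblyAI } from 'assemblyai';

// Assumption: the default key is read from an environment variable;
// the original snippet does not show where it is defined.
const defaultApiKey = process.env.ASSEMBLYAI_API_KEY ?? '';

// AI SDK integration
export const assemblyaiClient = createAssemblyAI({ apiKey: defaultApiKey });

// Direct AssemblyAI SDK; a user-provided key (BYOK) overrides the default
export function getAssemblyAIClient(apiKey?: string) {
  return new AssemblyAI({ apiKey: apiKey || defaultApiKey });
}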
This setup supports advanced features, user-provided keys (BYOK), and easy integration with multiple AI providers.
2. Robust Asynchronous Transcription with Intelligent Polling
To handle AssemblyAI’s asynchronous transcription efficiently, Swift Pad implements intelligent polling with error handling and timeout protection:
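// whisper.ts
// `client` is the AssemblyAI SDK instance (e.g. from getAssemblyAIClient()).
async function pollForCompletion(transcriptId: string) {
  const maxAttempts = 200; // 200 checks x 3 s of waiting = about 10 minutes
  for (let attempts = 0; attempts < maxAttempts; attempts++) {
    const transcript = await client.transcripts.get(transcriptId);
    if (transcript.status === 'completed') return transcript;
    if (transcript.status === 'error') {
      throw new Error(transcript.error ?? 'Transcription failed');
    }
    await new Promise(res => setTimeout(res, 3000)); // wait 3 s between checks
  }
  throw new Error('Transcription timeout');
}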
With 200 checks spaced 3 seconds apart, the loop waits roughly 10 minutes before giving up, and it surfaces AssemblyAI's own error message when a job fails.
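The same idea can be written as a reusable helper. The following is a minimal sketch, not the project's actual code: `pollUntilSettled`, `PollableJob`, and `PollOptions` are hypothetical names, and the status fetcher is injected so the loop can be exercised without calling the API.

```typescript
// Hypothetical shape: only the fields the polling loop needs.
interface PollableJob {
  status: "queued" | "processing" | "completed" | "error";
  error?: string;
}

interface PollOptions {
  intervalMs: number;  // delay between status checks
  maxAttempts: number; // upper bound on status checks before giving up
}

// Poll `fetchJob` until the job completes, fails, or the attempt budget runs out.
async function pollUntilSettled<T extends PollableJob>(
  fetchJob: () => Promise<T>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await fetchJob();
    if (job.status === "completed") return job;
    if (job.status === "error") {
      throw new Error(job.error ?? "Transcription failed without an error message");
    }
    // Skip the final sleep: no further check follows the last attempt.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Transcription timed out after ${maxAttempts} status checks`);
}
```

In Swift Pad's case the fetcher would presumably be `() => client.transcripts.get(transcriptId)`, with `intervalMs: 3000` and `maxAttempts: 200` matching the values above.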
3. Enhanced Audio Processing Features
Swift Pad leverages AssemblyAI’s advanced audio processing capabilities through careful parameter configuration:
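// whisper.ts
const params = {
  audio_url: input.audioUrl,             // URL of the uploaded audio file
  language_code: input.language || "en", // falls back to English
  speaker_labels: true,                  // speaker diarization
  auto_highlights: true,                 // key phrase extraction
};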
This enables speaker identification, key moment extraction, and multilingual transcription.
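To show what the diarization output might feed into, here is a small sketch that renders speaker-labelled utterances as a readable dialogue. It assumes the completed transcript exposes an `utterances` array with `speaker`, `text`, `start`, and `end` (milliseconds); `Utterance`, `formatTimestamp`, and `renderDialogue` are illustrative names, not taken from the project.

```typescript
// Assumed subset of a diarized transcript: one entry per speaker turn.
interface Utterance {
  speaker: string; // e.g. "A", "B"
  text: string;
  start: number;   // milliseconds from the start of the audio
  end: number;     // milliseconds from the start of the audio
}

// Format a millisecond offset as m:ss.
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// Render one line per utterance, prefixed with its start time and speaker label.
function renderDialogue(utterances: Utterance[]): string {
  return utterances
    .map((u) => `[${formatTimestamp(u.start)}] Speaker ${u.speaker}: ${u.text}`)
    .join("\n");
}
```

A call such as `renderDialogue(transcript.utterances ?? [])` would then produce text suitable for display or for passing to a later transformation step.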
4. Optimized Browser-based Audio Recording
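Swift Pad records audio directly in the browser with the MediaRecorder API and produces a WebM file that is then uploaded for transcription:
// useAudioRecording.ts
const chunks: Blob[] = []; // collects recorded segments
const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
mediaRecorder.ondataavailable = e => chunks.push(e.data);
// Tag the Blob with its MIME type so later steps can identify the format
mediaRecorder.onstop = () => setAudioBlob(new Blob(chunks, { type: "audio/webm" }));
Recording needs no plugins or extra software, and users get simple start and stop controls. Browsers that do not support WebM recording may require a different container.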
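Because container support varies between browsers, one option is to probe for a supported type before constructing the recorder. This is a hedged sketch rather than Swift Pad's actual logic: `pickRecordingMimeType` and the candidate ordering are assumptions, and the support check is passed in so the function can run outside a browser.

```typescript
// Candidate containers in order of preference (an assumed ordering).
const PREFERRED_MIME_TYPES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
];

// Return the first supported type, or undefined to let the browser pick its default.
function pickRecordingMimeType(
  isSupported: (mimeType: string) => boolean,
  candidates: readonly string[] = PREFERRED_MIME_TYPES,
): string | undefined {
  return candidates.find((type) => isSupported(type));
}

// Browser usage (not executed here):
// const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Passing the check as a function keeps the selection logic testable and lets the same helper be reused if the candidate list changes.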
5. Multi-tier Storage & Fallback Strategy
Swift Pad employs a robust storage upload mechanism with multiple fallback methods ensuring reliability:
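// useSupabaseUpload.ts
try {
  return await uploadFileToS3(file);       // 1. first choice
} catch {
  try {
    return await uploadViaServerAPI(file); // 2. fall back to the server API
  } catch {
    // 3. last resort: upload directly to Supabase Storage
    return await supabase.storage.from(bucket).upload(path, file);
  }
}
Each path is tried only when the previous one throws, so a single failing upload method does not block the user.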
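The nested try/catch can also be flattened into a loop over an ordered list of strategies. The sketch below is an illustration under assumptions: `tryInOrder` and `FallbackStep` are hypothetical names, and each step is expected to throw on failure. Note that supabase-js resolves `upload()` with a `{ data, error }` object rather than rejecting, so that step would likely need a thin wrapper that throws when `error` is set.

```typescript
// One named strategy in the fallback chain.
interface FallbackStep<TInput, TResult> {
  label: string;
  run: (input: TInput) => Promise<TResult>;
}

// Try each step in order and return the first successful result.
// If every step throws, raise one error that lists all failures.
async function tryInOrder<TInput, TResult>(
  input: TInput,
  steps: ReadonlyArray<FallbackStep<TInput, TResult>>,
): Promise<TResult> {
  const failures: string[] = [];
  for (const { label, run } of steps) {
    try {
      return await run(input);
    } catch (err) {
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${label}: ${reason}`);
    }
  }
  throw new Error(`All upload methods failed (${failures.join("; ")})`);
}
```

With this shape, adding or reordering upload paths means editing the list of steps, and the final error message records why each path failed.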
6. End-to-End Workflow Integration
Swift Pad seamlessly integrates recording, storage, transcription, and display into a unified workflow:
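// RecordingModal.tsx
// 1. Store the recording and get its URL
const url = await uploadToSupabase(audioBlob);
// 2. Start transcription for that URL
const result = await transcribeFromS3.mutateAsync({ audioUrl: url });
// 3. Navigate to the page for the new transcript
router.push(`/whispers/${result.id}`);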
Users experience effortless audio-to-text transformations from start to finish.
7. Integration Architecture Overview
Complete AssemblyAI integration workflow:
Recording/Upload → Supabase Storage → AssemblyAI Transcription
                          ↓                       ↓
User Interface ← Database (Prisma) ← Google Gemini (Title & Transformation)
This architecture ensures clarity, scalability, and optimal user experience.