DEV Community

Imisioluwa Elijah

SwiftPad: AI Powered Transcription with AssemblyAI

AssemblyAI Voice Agents Challenge: Business Automation

This is a submission for the AssemblyAI Voice Agents Challenge

What I Built

Inspired by the need for effortless audio transcription and content creation, I built Swift Pad, an audio transcription and transformation tool powered by AssemblyAI's Universal-Streaming technology.

Swift Pad lets users record or upload audio, transcribe it accurately, and instantly transform the transcript into summaries, emails, blog posts, quick notes, and more. The goal was simple: take the hassle out of turning audio conversations into actionable content.

This submission addresses the Business Automation Voice Agent prompt with:

  • Automated Audio Transcription: Real-time, high-accuracy speech-to-text transcription for business meetings, calls, and interviews.
  • AI-driven Content Transformation: Converts transcriptions into summaries, emails, blog posts, lists, and notes instantly.
  • Speaker Diarization and Highlight Extraction: Automatically identifies speakers and key moments, enhancing readability and usability.
  • Seamless File Management: Securely manages audio files with automatic storage and retrieval through Supabase integration.
  • Scalable and Secure User Authentication: Provides robust user authentication and data security through Clerk and Prisma ORM integration.
  • Usage Tracking & Rate Limiting: Ensures fair and controlled usage through Upstash Redis integration.
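Swift Pad uses Upstash Redis for usage tracking and rate limiting. The sketch below shows the basic idea with an in-memory fixed-window limiter. It is not the app's code. The class name, the limit, and the window size are all hypothetical. A Redis-backed version would keep the counters in Redis so that every serverless instance sees the same counts.

```typescript
// Hypothetical in-memory fixed-window rate limiter (illustration only).
// A production setup would keep these counters in Redis (e.g. Upstash)
// so that all serverless instances share them.
type RateLimitResult = { allowed: boolean; remaining: number; resetAt: number };

class FixedWindowRateLimiter {
  // userId -> start of that user's current window and requests counted in it
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  check(userId: string, now: number = Date.now()): RateLimitResult {
    let w = this.windows.get(userId);
    // Start a fresh window if the user has none or the previous one expired
    if (!w || now - w.start >= this.windowMs) {
      w = { start: now, count: 0 };
      this.windows.set(userId, w);
    }
    const resetAt = w.start + this.windowMs;
    if (w.count >= this.limit) {
      return { allowed: false, remaining: 0, resetAt };
    }
    w.count++;
    return { allowed: true, remaining: this.limit - w.count, resetAt };
  }
}

// Example policy (hypothetical): 3 transcriptions per user per hour
const limiter = new FixedWindowRateLimiter(3, 60 * 60 * 1000);
```

A fixed window is the simplest policy. A sliding window smooths out bursts at window boundaries, at the cost of a little more bookkeeping.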

Core Problem Addressed

Professionals, creators, and businesses often spend hours manually transcribing audio recordings and turning conversations into actionable formats. Swift Pad automates this workflow and cuts that effort down to minutes, so users can quickly capture, manage, and act on insights from their audio.

Demo

Live Demo: https://swift-pad.vercel.app/

Screenshots:

  • All Notes Page
  • Recording & Uploading Interface
  • Transcription Display & Transformation Options
  • Generated Content from Transcription

GitHub Repository

Swift Pad

Swift Pad is a web application for audio transcription and transformation. Record or upload audio files, get accurate transcriptions, and transform them into various formats like summaries, blog posts, emails, and more.

Features

  • 🎙️ Audio Recording: Record audio directly in your browser
  • 📤 File Upload: Upload existing audio files (MP3, WAV, M4A, etc.)
  • 🔤 Accurate Transcription: Powered by AssemblyAI's state-of-the-art speech recognition
  • 🌐 Multi-language Support: Transcribe audio in multiple languages
  • 🔄 Text Transformations: Convert transcriptions into summaries, blog posts, emails, and more
  • 👥 Speaker Diarization: Identify different speakers in conversations
  • 📊 Auto Highlights: Automatically extract key moments from transcriptions
  • 🔒 Secure Storage: Files stored securely with Supabase Storage
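The text transformations listed above (summaries, blog posts, emails, and more) are produced by Google Gemini from the transcript. A simple way to organize this is a map from each transformation type to a prompt template. The sketch below is hypothetical. The type names and prompt wording are illustrative, not the app's actual prompts, and the Gemini call itself is left out.

```typescript
// Hypothetical prompt templates for transcript transformations.
// The returned string would be sent to an LLM (Swift Pad uses Google Gemini).
type TransformationType = "summary" | "email" | "blog" | "notes" | "list";

const INSTRUCTIONS: Record<TransformationType, string> = {
  summary: "Summarize the transcript in 3-5 concise sentences.",
  email: "Rewrite the transcript as a clear, professional email.",
  blog: "Turn the transcript into a blog post with a title and short sections.",
  notes: "Extract quick notes as short bullet points.",
  list: "Extract all action items as a numbered list.",
};

function buildTransformationPrompt(type: TransformationType, transcript: string): string {
  const text = transcript.trim();
  if (text.length === 0) {
    throw new Error("Cannot transform an empty transcript");
  }
  // Keep the instruction and the source text clearly separated
  return `${INSTRUCTIONS[type]}\n\nTranscript:\n"""\n${text}\n"""`;
}
```

Keeping the templates in one typed map means a new output format is one new entry. TypeScript will then flag any place that has to handle it.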

Technology Stack

  • Frontend: Next.js 15, React 19, TailwindCSS
  • Backend: Next.js API routes, tRPC
  • Database: PostgreSQL via Prisma ORM
  • Authentication: Clerk Auth
  • Storage: Supabase Storage
  • AI Services: AssemblyAI (transcription), Google Gemini (titles and transformations)

Technical Implementation & AssemblyAI Integration


1. Flexible Dual-SDK Architecture

Swift Pad accesses AssemblyAI through two clients: the AI SDK provider (@ai-sdk/assemblyai) for AI SDK compatibility, and the official assemblyai SDK for direct access:

```typescript
// assemblyaiClient.ts
import { createAssemblyAI } from '@ai-sdk/assemblyai';
import { AssemblyAI } from 'assemblyai';

// Server-side default key (the environment variable name is an assumption)
const defaultApiKey = process.env.ASSEMBLYAI_API_KEY ?? '';

// AI SDK integration
export const assemblyaiClient = createAssemblyAI({ apiKey: defaultApiKey });

// Direct AssemblyAI SDK; an optional user-supplied key enables bring-your-own-key (BYOK)
export function getAssemblyAIClient(apiKey?: string) {
  return new AssemblyAI({ apiKey: apiKey || defaultApiKey });
}
```

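The direct SDK exposes AssemblyAI-specific options such as speaker labels and auto highlights. The optional apiKey argument lets users supply their own keys (BYOK). The AI SDK provider makes it easier to use AssemblyAI alongside other AI SDK providers.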
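The optional apiKey parameter of getAssemblyAIClient is what makes BYOK possible. Below is a hypothetical helper for choosing between a user-supplied key and the server default. The function name, and the choice to treat blank keys as missing, are assumptions and not part of the original code.

```typescript
// Hypothetical BYOK key resolution: prefer the user's own key, else the server default.
type KeySource = "user" | "server";

function resolveAssemblyAIKey(
  userKey: string | undefined,
  serverKey: string | undefined,
): { apiKey: string; source: KeySource } {
  const user = userKey?.trim();
  if (user) return { apiKey: user, source: "user" };
  const server = serverKey?.trim();
  if (server) return { apiKey: server, source: "server" };
  // Fail fast instead of sending an unauthenticated request to AssemblyAI
  throw new Error("No AssemblyAI API key configured");
}
```

The resolved key could then be passed to getAssemblyAIClient. The source field lets the app treat the two cases differently. One possible policy, not necessarily Swift Pad's, is to rate-limit only the requests billed to the server key.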

2. Robust Asynchronous Transcription with Intelligent Polling

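AssemblyAI transcription is asynchronous, so Swift Pad polls the transcript status at a fixed interval, with explicit error handling and a timeout:

```typescript
// whisper.ts
import { AssemblyAI } from 'assemblyai';

// Environment variable name is an assumption
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY ?? '' });

// Check every 3 s, up to 200 times (about 10 minutes), until the job settles
async function pollForCompletion(transcriptId: string) {
  let attempts = 0;
  while (attempts < 200) {
    const transcript = await client.transcripts.get(transcriptId);
    if (transcript.status === 'completed') return transcript;
    if (transcript.status === 'error') {
      throw new Error(transcript.error ?? 'Transcription failed');
    }
    // Still 'queued' or 'processing': wait before checking again
    await new Promise((res) => setTimeout(res, 3000));
    attempts++;
  }
  throw new Error('Transcription timeout');
}
```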

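The loop returns as soon as the transcript completes and surfaces AssemblyAI's own error message when a job fails. It stops after roughly 10 minutes (200 attempts × 3 s) instead of polling indefinitely.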
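The loop above polls at a fixed 3-second interval. A common refinement is capped exponential backoff. It checks quickly at first, which suits short clips, and then backs off for long recordings. The sketch below is a dependency-free variant, not the app's code. The status fetcher (getJob) and sleep function are hypothetical injected parameters, so the logic can be tested without calling AssemblyAI. The delay values are illustrative.

```typescript
// Generic polling with capped exponential backoff (a variation on the fixed 3 s loop).
type JobStatus = "queued" | "processing" | "completed" | "error";
interface Job { status: JobStatus; error?: string | null }

// Delay before the next check: 1 s, 2 s, 4 s, ... capped at maxMs
function backoffDelay(attempt: number, baseMs = 1000, maxMs = 15000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

async function pollWithBackoff<T extends Job>(
  getJob: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((res) => setTimeout(res, ms)),
  timeoutMs = 10 * 60 * 1000,
): Promise<T> {
  let waited = 0;
  for (let attempt = 0; ; attempt++) {
    const job = await getJob();
    if (job.status === "completed") return job;
    if (job.status === "error") throw new Error(job.error ?? "Transcription failed");
    const delay = backoffDelay(attempt);
    // Give up before sleeping past the overall time budget
    if (waited + delay > timeoutMs) throw new Error("Transcription timeout");
    await sleep(delay);
    waited += delay;
  }
}
```

With the AssemblyAI SDK, getJob could be () => client.transcripts.get(transcriptId). The SDK's transcript objects carry the same status values.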

3. Enhanced Audio Processing Features

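Swift Pad turns on AssemblyAI's speaker diarization and key-phrase extraction through the transcription parameters:

```typescript
// whisper.ts
// `input` is the validated request payload: the uploaded file's URL and an optional language
const params = {
  audio_url: input.audioUrl,
  language_code: input.language || "en", // default to English when no language is given
  speaker_labels: true,  // speaker diarization: who said what
  auto_highlights: true, // extract key phrases from the transcript
};
```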

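This enables speaker identification, key-phrase extraction, and transcription in languages other than English. AssemblyAI's feature support varies by language, so not every option is available for every language_code.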
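With speaker_labels enabled, AssemblyAI returns an utterances array. Each entry has a speaker label such as "A", the spoken text, and start/end times in milliseconds. With auto_highlights enabled, auto_highlights_result.results holds ranked key phrases. The sketch below turns both into display-ready output. The local types cover only the fields used here and simplify the SDK's own transcript type. The function names are hypothetical.

```typescript
// Minimal local types for the fields used below (a simplification of the SDK's transcript type)
interface Utterance { speaker: string; text: string; start: number; end: number }
interface Highlight { text: string; rank: number; count: number }
interface TranscriptLike {
  text?: string | null;
  utterances?: Utterance[] | null;
  auto_highlights_result?: { results: Highlight[] } | null;
}

// Convert milliseconds to m:ss for display
function formatTimestamp(ms: number): string {
  const totalSeconds = Math.floor(ms / 1000);
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return `${minutes}:${seconds.toString().padStart(2, "0")}`;
}

// One line per utterance, e.g. "[0:05] Speaker A: Hello"; falls back to the plain text
function formatSpeakerTranscript(t: TranscriptLike): string {
  if (!t.utterances || t.utterances.length === 0) return t.text ?? "";
  return t.utterances
    .map((u) => `[${formatTimestamp(u.start)}] Speaker ${u.speaker}: ${u.text}`)
    .join("\n");
}

// Highest-ranked key phrases first (a greater rank means more relevant)
function topHighlights(t: TranscriptLike, n = 5): string[] {
  const results = t.auto_highlights_result?.results ?? [];
  return [...results].sort((a, b) => b.rank - a.rank).slice(0, n).map((h) => h.text);
}
```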

4. Optimized Browser-based Audio Recording

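Swift Pad records audio directly in the browser with the MediaRecorder API. The result is a WebM blob that is uploaded, and its storage URL is then passed to AssemblyAI:

```typescript
// useAudioRecording.ts (excerpt; setAudioBlob is the hook's React state setter)
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const chunks: Blob[] = [];
const mediaRecorder = new MediaRecorder(stream, { mimeType: "audio/webm" });
mediaRecorder.ondataavailable = (e) => { if (e.data.size > 0) chunks.push(e.data); };
// Label the final Blob with its MIME type so later steps know it is WebM audio
mediaRecorder.onstop = () => setAudioBlob(new Blob(chunks, { type: "audio/webm" }));
```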

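Chromium- and Firefox-based browsers support recording to audio/webm, but not every browser does (older Safari versions in particular). Checking MediaRecorder.isTypeSupported before recording is therefore a sensible safeguard.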
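Support for audio/webm varies across browsers, so a recorder can probe candidate formats before it constructs MediaRecorder. The helper below is a hypothetical sketch. The candidate list and the extension mapping are assumptions. The support check is injected so the logic can be tested outside a browser; in the app it would be MediaRecorder.isTypeSupported.

```typescript
// Hypothetical helper: pick the first recording format the browser supports.
const CANDIDATE_MIME_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/mp4",
  "audio/ogg;codecs=opus",
];

// Returns undefined when nothing matches, meaning "let the browser use its default"
function pickRecorderMimeType(isTypeSupported: (type: string) => boolean): string | undefined {
  return CANDIDATE_MIME_TYPES.find((type) => isTypeSupported(type));
}

// File extension to use when uploading the recorded Blob
function extensionForMimeType(mimeType: string): string {
  if (mimeType.startsWith("audio/webm")) return "webm";
  if (mimeType.startsWith("audio/mp4")) return "m4a";
  if (mimeType.startsWith("audio/ogg")) return "ogg";
  return "bin";
}
```

In the browser, the call would look like pickRecorderMimeType((t) => MediaRecorder.isTypeSupported(t)). Pass the result to new MediaRecorder(stream, { mimeType }) only when it is defined, and reuse the same value as the Blob's type.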

5. Multi-tier Storage & Fallback Strategy

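Swift Pad uploads files through three methods and falls back to the next one whenever the previous one fails:

```typescript
// useSupabaseUpload.ts (excerpt; supabase, bucket, and path are defined elsewhere in the hook)
try {
  return await uploadFileToS3(file);       // 1. S3-compatible upload
} catch {
  try {
    return await uploadViaServerAPI(file); // 2. upload through the app's server API
  } catch {
    // 3. Supabase client; it reports failures in `error` instead of throwing
    const { data, error } = await supabase.storage.from(bucket).upload(path, file);
    if (error) throw error;
    return data;
  }
}
```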

If one method fails, the next one is tried, so an upload fails only when all three methods fail.
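The nested try/catch above can be generalized into a small helper. It tries each strategy in order, and if all of them fail it reports every error instead of silently discarding the earlier ones. This is a hypothetical refactoring sketch. The helper and error class names are illustrative.

```typescript
// Try each async strategy in order and return the first success.
// If every strategy fails, throw one error that records all the failures.
type Strategy<T> = { name: string; run: () => Promise<T> };
type Failure = { name: string; error: unknown };

class AllStrategiesFailedError extends Error {
  readonly failures: Failure[];
  constructor(failures: Failure[]) {
    super(`All upload strategies failed: ${failures.map((f) => f.name).join(", ")}`);
    this.name = "AllStrategiesFailedError";
    this.failures = failures;
  }
}

async function tryInOrder<T>(strategies: Strategy<T>[]): Promise<{ name: string; value: T }> {
  const failures: Failure[] = [];
  for (const s of strategies) {
    try {
      return { name: s.name, value: await s.run() };
    } catch (error) {
      failures.push({ name: s.name, error }); // keep the reason this tier failed
    }
  }
  throw new AllStrategiesFailedError(failures);
}
```

In the upload hook, the tiers would become entries such as { name: "s3", run: () => uploadFileToS3(file) }. Logging the returned name would also show which tier handles uploads in practice.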

6. End-to-End Workflow Integration

Swift Pad connects recording, storage, transcription, and display in one flow:

```typescript
// RecordingModal.tsx (excerpt)
// 1. Upload the recorded audio and get back its storage URL
const url = await uploadToSupabase(audioBlob);
// 2. Call the tRPC mutation that transcribes the audio at that URL
const result = await transcribeFromS3.mutateAsync({ audioUrl: url });
// 3. Open the page for the new transcript
router.push(`/whispers/${result.id}`);
```

For the user, a single action takes a finished recording all the way to its transcript page.
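Uploading and transcribing can each take a noticeable amount of time on long recordings, so the modal benefits from explicit progress states. The sketch below models them as a discriminated union and drives the same three steps. The state names and the runRecordingFlow function are hypothetical. The injected deps stand in for uploadToSupabase, transcribeFromS3.mutateAsync, and router.push.

```typescript
// Hypothetical progress states for the upload → transcribe → open flow
type FlowState =
  | { step: "uploading" }
  | { step: "transcribing"; audioUrl: string }
  | { step: "done"; transcriptId: string }
  | { step: "failed"; stage: "upload" | "transcribe"; message: string };

// Stand-ins for uploadToSupabase, transcribeFromS3.mutateAsync, and router.push
interface FlowDeps<TAudio> {
  upload: (audio: TAudio) => Promise<string>;
  transcribe: (audioUrl: string) => Promise<{ id: string }>;
  open: (path: string) => void;
}

function errorMessage(e: unknown): string {
  return e instanceof Error ? e.message : String(e);
}

async function runRecordingFlow<TAudio>(
  audio: TAudio,
  deps: FlowDeps<TAudio>,
  onState: (state: FlowState) => void,
): Promise<FlowState> {
  // Report each state to the UI and hand it back to the caller
  const emit = (state: FlowState): FlowState => {
    onState(state);
    return state;
  };

  emit({ step: "uploading" });
  let audioUrl: string;
  try {
    audioUrl = await deps.upload(audio);
  } catch (e) {
    return emit({ step: "failed", stage: "upload", message: errorMessage(e) });
  }

  emit({ step: "transcribing", audioUrl });
  let transcriptId: string;
  try {
    transcriptId = (await deps.transcribe(audioUrl)).id;
  } catch (e) {
    return emit({ step: "failed", stage: "transcribe", message: errorMessage(e) });
  }

  deps.open(`/whispers/${transcriptId}`);
  return emit({ step: "done", transcriptId });
}
```

The failed state records which stage broke, so the modal can offer a targeted retry. For example, it can re-run transcription without uploading the file again.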

7. Integration Architecture Overview

Complete AssemblyAI integration workflow:

```
Recording/Upload → Supabase Storage → AssemblyAI Transcription
      ↓                                       ↓
 User Interface ← Database (Prisma) ← Google Gemini (Title & Transformation)
```

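Each stage has one job: Supabase stores the audio, AssemblyAI produces the transcript, Google Gemini generates titles and transformations, and Prisma persists the results that the UI displays.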
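The server-side half of the diagram can be read as a chain: stored audio URL → AssemblyAI transcript → Gemini title → database record. The sketch below writes that chain out with injected, hypothetical stage functions, so the ordering is explicit and each external service can be replaced with a fake in tests. None of these names come from the Swift Pad codebase. Falling back to a title built from the transcript when title generation fails is an assumed design choice.

```typescript
// Hypothetical server-side pipeline mirroring the diagram.
interface NoteRecord {
  userId: string;
  audioUrl: string;
  transcriptId: string;
  title: string;
  text: string;
}

interface PipelineDeps {
  transcribe: (audioUrl: string) => Promise<{ transcriptId: string; text: string }>; // AssemblyAI
  generateTitle: (text: string) => Promise<string>;                                   // Gemini
  save: (note: NoteRecord) => Promise<NoteRecord>;                                     // Prisma
}

// Use the transcript's first few words when no generated title is available
function fallbackTitle(text: string, maxWords = 6): string {
  const words = text.trim().split(/\s+/).filter(Boolean);
  if (words.length === 0) return "Untitled note";
  const head = words.slice(0, maxWords).join(" ");
  return words.length > maxWords ? `${head}…` : head;
}

async function processAudio(userId: string, audioUrl: string, deps: PipelineDeps): Promise<NoteRecord> {
  const { transcriptId, text } = await deps.transcribe(audioUrl);
  // A failed title call should not lose the transcript
  let title = "";
  try {
    title = (await deps.generateTitle(text)).trim();
  } catch {
    title = "";
  }
  if (!title) title = fallbackTitle(text);
  return deps.save({ userId, audioUrl, transcriptId, title, text });
}
```

In the app, these stages would wrap the AssemblyAI client, the Gemini call, and Prisma. Passing them in as parameters is one way to test the pipeline without network access.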
