Matteo Barbero

How to build a Voice-to-Text App with Laravel AI SDK

Stop Writing Boilerplate: Build a Voice-to-Text App with Laravel AI SDK

If you are a PHP developer, you have likely looked at AI integration with a mix of excitement and fatigue: excitement for the possibilities, fatigue from writing the same boilerplate code to handle multipart requests, API keys, and error handling for OpenAI or Anthropic.

The new Laravel AI SDK changes this. It turns complex AI interactions into fluid, Laravel-style syntax.

In this guide, we won't just talk about it—we will build a functional Voice-to-Text application in under 10 minutes. I call mine "Dettami", but you can call yours whatever you want.

Let's write some code.

Prerequisites

  • Laravel 10+ (or the latest version)
  • PHP 8.2+
  • An OpenAI API Key

Step 1: Install the SDK

First, forget about hand-written guzzlehttp/guzzle calls. Pull in the first-party package instead.

composer require laravel/ai

Once installed, publish the configuration file. This is where you will define your "drivers" (OpenAI, Gemini, Mistral, etc.).

php artisan install:ai

Add your key to your .env file:

OPENAI_API_KEY=...

Step 2: Define the Route

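We need a single endpoint to accept the audio file. Keep in mind that routes in routes/web.php are CSRF-protected, so the frontend has to send the token with the upload, while routes in routes/api.php are served under the /api prefix by default. In your routes/web.php or routes/api.php: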

use App\Http\Controllers\TranscribeAudioController;

Route::post('/transcribe', TranscribeAudioController::class)->name('transcribe');

Step 3: The Logic (Where the Magic Happens)

This is the part that usually hurts. Handling file uploads + external API calls is often messy.

Create your controller:

php artisan make:controller TranscribeAudioController --invokable

Now, look at how clean this solution is using the Transcription facade. We don't need to manually construct a POST request to https://api.openai.com/v1/audio/transcriptions. The SDK does it for us.

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Storage;
use Laravel\AI\Facades\Transcription;

class TranscribeAudioController extends Controller
{
    public function __invoke(Request $request)
    {
        $request->validate([
            'audio' => 'required|file|mimes:webm,mp3,wav'
        ]);

        $file = $request->file('audio');

        // 1. Save the file temporarily
        // We need a physical path to pass to the SDK
        $path = $file->store('temp_recordings');
        $fullPath = Storage::path($path);

        try {
            // 2. Transcribe with one fluent method chain
            // The SDK automatically uses your configured default driver
            $text = Transcription::fromPath($fullPath, $file->getMimeType())
                ->generate();

            return response()->json([
                'success' => true,
                'text' => (string) $text,
            ]);

        } catch (\Throwable $e) {
            // Report the exception through Laravel's handler so it gets logged,
            // then return a user-friendly message
            report($e);

            return response()->json(['error' => 'Transcription failed'], 500);
        } finally {
            // 3. Cleanup: runs on success and on failure,
            // so temporary recordings never pile up on disk
            Storage::delete($path);
        }
    }
}

Why this matters

Notice line 27: Transcription::fromPath(...).
That is the abstraction we have been waiting for. It removes the cognitive load of remembering provider-specific API parameters. Today you use OpenAI, tomorrow you switch to Gemini: the controller code stays the same, and only the configured driver changes.

Step 4: The Frontend (The "Recorder")

For the frontend, you don't need a heavy React app. A simple Blade view with vanilla JS covers it perfectly.

The core requirement is the browser's MediaRecorder API. Here is what your JavaScript needs to do:

  1. Request microphone access (navigator.mediaDevices.getUserMedia).
  2. Record chunks of audio into a Blob.
  3. Send that Blob to our Laravel endpoint via FormData.
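
The three steps above can be sketched roughly like this. Treat it as a minimal sketch, not the code from the Dettami repository: the function names (buildUploadForm, startRecording, uploadRecording) and the recording.webm filename are made up for illustration, while the /transcribe URL and the audio field name come from the route and validation rule earlier in this guide. The CSRF header only matters if the route lives in routes/web.php, and it assumes your Blade layout exposes the token in a csrf-token meta tag.

```javascript
// Hypothetical helper names; only the browser APIs, the '/transcribe' route
// and the 'audio' field name come from the article.

// Step 3, first half: wrap the recorded chunks in a Blob and a FormData payload.
function buildUploadForm(chunks, mimeType = 'audio/webm', filename = 'recording.webm') {
  const blob = new Blob(chunks, { type: mimeType });
  const form = new FormData();
  // The field name must match the 'audio' rule in the controller's validation.
  form.append('audio', blob, filename);
  return form;
}

// Steps 1 and 2: ask for the microphone, then collect chunks until stop() is called.
async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks = [];

  recorder.addEventListener('dataavailable', (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  });
  recorder.start();

  // Resolves once the recorder has flushed its last chunk.
  const stop = () =>
    new Promise((resolve) => {
      recorder.addEventListener(
        'stop',
        () => {
          stream.getTracks().forEach((track) => track.stop()); // release the mic
          resolve({ chunks, mimeType: recorder.mimeType });
        },
        { once: true }
      );
      recorder.stop();
    });

  return { stop };
}

// Step 3, second half: POST the form to the Laravel endpoint.
async function uploadRecording(form, csrfToken) {
  // Asking for JSON makes Laravel answer validation errors as JSON too.
  const headers = { Accept: 'application/json' };
  if (csrfToken) headers['X-CSRF-TOKEN'] = csrfToken; // needed for routes/web.php

  const response = await fetch('/transcribe', { method: 'POST', headers, body: form });
  return { status: response.status, payload: await response.json() };
}

// Example wiring (browser only):
//   const recording = await startRecording();
//   ...later, when the user clicks "stop":
//   const { chunks, mimeType } = await recording.stop();
//   const token = document.querySelector('meta[name="csrf-token"]')?.content;
//   const { payload } = await uploadRecording(buildUploadForm(chunks, mimeType), token);
```

One thing to watch: the controller only accepts webm, mp3 and wav, so a browser whose MediaRecorder produces a different container will be rejected by the validation rule before the SDK is ever called.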

Here is a glimpse of the UI I built for this:

[Screenshot: the Dettami interface]
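
Whatever the UI looks like, it has to turn the endpoint's answer into something readable. Here is one possible way to do that, assuming the response shapes from the controller above (success and text on a 200, error on a 500) plus Laravel's standard 422 validation response with a message field. The function name describeTranscriptionResult and the fallback strings are hypothetical.

```javascript
// Hypothetical helper: map the endpoint's HTTP status and JSON body
// to the message a UI would display.
function describeTranscriptionResult(status, payload) {
  if (status === 200 && payload.success) {
    // Success shape returned by the controller: { success: true, text: '...' }
    return { ok: true, message: payload.text };
  }

  if (status === 422) {
    // Laravel's validator answers JSON requests with { message, errors }.
    return { ok: false, message: payload.message ?? 'The audio file was rejected.' };
  }

  // The controller's catch block answers with { error: 'Transcription failed' }.
  return { ok: false, message: payload.error ?? 'Something went wrong.' };
}
```

Keeping this mapping in a small pure function means the recorder code never needs to know about status codes, and the function is easy to check without a browser or a running Laravel app.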

Summary

We just built a fully functional Voice-to-Text backend in about 20 lines of code.

The Laravel AI SDK respects your time. It handles the "plumbing" so you can focus on building the actual product.

If you want to inspect the full source code for the demo project "Dettami", check the repository here:
GitHub: maiobarbero/dettami

Keep moving forward.
