Creating a Text-to-Speech AI Agent in JavaScript using OpenAI API

#webdev #programming #beginners #javascript

Introduction

Have you ever wanted to convert text into speech using AI? OpenAI’s Text-to-Speech (TTS) API lets developers generate high-quality speech from text. In this post, we will build a simple AI-powered TTS agent in JavaScript using OpenAI's API. By the end, you'll have a working program that converts a piece of text into an MP3 file and plays it back.

Prerequisites

Before we begin, ensure you have the following:

Node.js installed (available from nodejs.org)
An OpenAI API key (created from your account on platform.openai.com)
Basic knowledge of JavaScript

Step 1: Install Dependencies

We will use axios to interact with OpenAI’s API and play-sound to play the generated audio.

npm install axios play-sound

Step 2: Writing the TTS Function

We will create a function that:

Sends a request to OpenAI’s TTS API
Saves the generated audio
Plays the audio file

const axios = require('axios');
const player = require('play-sound')();
const fs = require('fs');

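// Read the key from the environment when available; the placeholder is a fallback for quick local tests only.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY || 'your-api-key';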

async function textToSpeech(text) {
    try {
        const response = await axios.post(
            'https://api.openai.com/v1/audio/speech',
            {
                model: 'tts-1',
                input: text,
                voice: 'alloy',
            },
            {
                headers: {
                    'Authorization': `Bearer ${OPENAI_API_KEY}`,
                    'Content-Type': 'application/json'
                },
                responseType: 'arraybuffer'
            }
        );

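        // Save the returned MP3 bytes, then hand the file to the system audio player.
        const filePath = 'output.mp3';
        fs.writeFileSync(filePath, response.data);
        console.log('Playing audio...');
        player.play(filePath, (err) => {
            if (err) console.error('Playback failed:', err);
        });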
    } catch (error) {
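        // With responseType 'arraybuffer', the API's JSON error body arrives as raw bytes, so decode it before logging.
        const details = error.response ? Buffer.from(error.response.data).toString('utf8') : error.message;
        console.error('Error:', details);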
    }
}

textToSpeech("Hello, this is an AI-generated voice!");

Step 3: Running the Script

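Save the file as tts.js, make sure it has your API key, and run it using the command below. If the request succeeds but no sound plays, check that a command-line audio player such as afplay (macOS), mpg123, or mplayer is installed, since play-sound delegates playback to one of them.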

node tts.js

Customization

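Change the Voice: OpenAI provides multiple voices such as alloy, echo, and fable. Change the voice field in the request body to try a different one.
Integrate into a Web App: Call the API from a backend route and have your React/Next.js frontend request audio from that route, so the API key never reaches the browser.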
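
Both customizations can be sketched in a few lines. The example below is a minimal illustration, not a definitive implementation: buildSpeechRequest and createTtsHandler are hypothetical helper names, the voice list reflects the voices documented for tts-1 at the time of writing and may have changed, and the handler assumes a fetch-compatible function is passed in (Node 18+ ships a global fetch).

```javascript
// Assumption: voices documented for tts-1 at the time of writing; check the API docs for the current list.
const VOICES = ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'];

// Build the JSON body for POST /v1/audio/speech, rejecting bad input early.
function buildSpeechRequest(text, voice = 'alloy') {
    if (typeof text !== 'string' || text.trim() === '') {
        throw new TypeError('text must be a non-empty string');
    }
    if (!VOICES.includes(voice)) {
        throw new RangeError(`Unknown voice "${voice}". Expected one of: ${VOICES.join(', ')}`);
    }
    return { model: 'tts-1', input: text, voice };
}

// Hypothetical backend handler: the browser POSTs { text, voice } to your server,
// and the server (which holds the API key) forwards the request to OpenAI.
// fetchImpl is injected so the handler can be exercised without network access.
function createTtsHandler(fetchImpl, apiKey) {
    return async function handleTts(req, res) {
        try {
            // Collect the request body, then parse it as JSON.
            const chunks = [];
            for await (const chunk of req) chunks.push(chunk);
            const { text, voice } = JSON.parse(Buffer.concat(chunks).toString('utf8'));

            const upstream = await fetchImpl('https://api.openai.com/v1/audio/speech', {
                method: 'POST',
                headers: {
                    'Authorization': `Bearer ${apiKey}`,
                    'Content-Type': 'application/json'
                },
                body: JSON.stringify(buildSpeechRequest(text, voice))
            });

            // Relay the audio on success, or the upstream error body with a 502.
            const payload = Buffer.from(await upstream.arrayBuffer());
            res.writeHead(upstream.ok ? 200 : 502, {
                'Content-Type': upstream.ok ? 'audio/mpeg' : 'application/json'
            });
            res.end(payload);
        } catch (err) {
            // To keep the sketch short, every failure in this block is reported as a 400.
            res.writeHead(400, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ error: err.message }));
        }
    };
}

// Example wiring (not executed here):
// require('http')
//     .createServer(createTtsHandler(fetch, process.env.OPENAI_API_KEY))
//     .listen(3000);
```

A frontend would then POST JSON such as { "text": "Hello", "voice": "nova" } to this server and play the returned MP3, for example through an audio element. A real deployment would also want routing, request size limits, and authentication, which are left out here.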

Conclusion

With just a few lines of JavaScript, we have built a working AI-powered text-to-speech agent that sends text to OpenAI, saves the returned audio, and plays it back. Whether for accessibility, automation, or just for fun, AI voice synthesis is easy to add to a project. Try it out and enhance your projects with realistic AI voices!