Joel Gee Roy

Human-like AI Conversations: Giving a Voice to ChatGPT with Murf

In November of 2022, ChatGPT took the world by storm. Yet it was missing one thing - a voice. In this post, I'll talk about how you can add a voice to your GPT responses using Murf's text-to-speech API.

Murf AI is a text-to-speech platform that lets you generate lifelike AI voiceovers from text in minutes. Apart from its online studio, Murf also offers an API that you can use to integrate Murf's versatile voice generation capabilities into your business logic. Adding a voice to your ChatGPT responses opens up a whole world of possibilities and lets you offer more features to your users.

As seen in the video, we will build a web application that accepts a prompt from the user, generates a ChatGPT response, and then narrates that answer. The context of the conversation is maintained, and ChatGPT will answer accordingly. Behind the scenes, we send the conversation to ChatGPT to generate a response and then send that response to Murf to get the voiceover.

Link to final code: Github Repo

Getting access to the APIs

Before we can start building the app, we need access to both the OpenAI and Murf APIs. OpenAI offers free credits that are valid for 3 months. Create a free account on their website, and you can generate an API key to call the OpenAI APIs.

To access the Murf API, register your interest on the Murf text-to-speech API landing page. Murf's team will contact you, and you will receive your API key. Check out the Murf API docs to learn more.

Setting up the web application

We'll be using React, Vite, and TypeScript for the web application, with the openai npm package and axios as dependencies.

npm create vite@latest chatgpt-murf -- --template react-ts
cd chatgpt-murf
npm install openai axios

We can now create a .env file in the root of our project to store the API keys.

VITE_OPENAI_API_KEY=YOUR_OPENAI_KEY
VITE_MURF_API_KEY=YOUR_MURF_KEY

Integrating ChatGPT API

The openai npm package gives us a straightforward way to access the ChatGPT API. We will create a generateChatGPTResponse function that we can invoke to generate a ChatGPT response easily.

import { Configuration, OpenAIApi } from "openai";

export type ChatGPTMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};

type GenerateChatGPTResponseBody = {
  messages: ChatGPTMessage[];
};

const configuration = new Configuration({
  apiKey: import.meta.env.VITE_OPENAI_API_KEY as string,
});

const openai = new OpenAIApi(configuration);

export async function generateChatGPTResponse({
  messages,
}: GenerateChatGPTResponseBody) {
  const completion = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    messages: messages,
    max_tokens: 1000,
  });

  return completion;
}

The API key stored in our .env file is used to set up the OpenAI API configuration. The createChatCompletion method of the openai package sends the request. It requires the following parameters:

  • model: The model that should be used to generate the responses. We will choose the new gpt-3.5-turbo model, which powers ChatGPT and is OpenAI's most advanced language model (as of March 2023).

  • messages: OpenAI's ChatGPT dashboard gives us a good idea of how this works - the user types prompts into the input field, and ChatGPT answers them in the context of the entire conversation. To keep that context intact, the API requires the whole conversation to be sent with every new request. So messages is an array of objects that stores this conversation. It looks something like this:

messages = [
   {"role": "user", "content": "Which is the largest planet in the solar system?"},
   {"role": "assistant", "content": "Jupiter is the largest planet in the Solar system"},
   {"role": "user", "content": "How big is it compared to earth?"}
]
  • max_tokens: This optional parameter specifies the maximum number of tokens allowed for the generated answer.

The ChatGPT API will give a response relevant to the messages array that we sent. The response will have the following format (example from OpenAI Docs):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
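To make this concrete, here's a minimal sketch of how we could call generateChatGPTResponse and pull the assistant's reply out of that response. The askOnce helper, the import path, and the system message are purely illustrative and aren't part of the app we're building:

import { generateChatGPTResponse, ChatGPTMessage } from "./api/OpenAiAPI";

// Hypothetical helper, just to show the request/response shape
async function askOnce(question: string) {
  const messages: ChatGPTMessage[] = [
    // An optional system message can be used to steer the assistant's behavior
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: question },
  ];

  const completion = await generateChatGPTResponse({ messages });

  // The assistant's answer lives in choices[0].message.content
  return completion.data.choices[0].message?.content ?? "";
}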

More information can be found in OpenAI docs and API Reference.

Integrating Murf API

Murf provides multiple REST API endpoints that you can use to perform actions relating to voiceover generation. For the scope of this project, we just need the /speech/generate-with-key endpoint of Murf API. We will use axios to access the REST endpoint.

import axios from "axios";

export type GenerateSpeechInput = {
  voiceId: string;
  text: string;
  format: string;
};

export const api = axios.create({
  baseURL: "https://api.murf.ai/v1",
});

export async function generateSpeechWithKey(data: GenerateSpeechInput) {
  return api.post("/speech/generate-with-key", data, {
    headers: {
      "api-key": import.meta.env.VITE_MURF_API_KEY
      "Content-Type": "application/json",
    },
  });
}


The Murf API key is passed as a header to the request. The /speech/generate-with-key endpoint requires the following parameters:

  • voiceId: The voice you want to use to generate the voiceover. Use the /speech/voices endpoint of the Murf API to access the complete list of voices you can use (see the sketch after this list). Some examples are "en-US-marcus" and "en-UK-gabriel".

  • text: The text that you want to convert to speech.

  • format: The format of the output audio file. This can be MP3, WAV, etc.
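If you want to pick a voice programmatically, here's a quick sketch of how the /speech/voices endpoint mentioned above could be queried from the same MurfAPI module, reusing its axios instance. I'm assuming it accepts a plain GET with the same api-key header - check the Murf API docs for the exact request and response shape:

// Sketch: fetch the list of available voices so we can choose a voiceId.
// Assumption: /speech/voices accepts a GET with the same api-key header.
export async function listVoices() {
  return api.get("/speech/voices", {
    headers: {
      "api-key": import.meta.env.VITE_MURF_API_KEY,
    },
  });
}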

Once a request is sent, Murf will generate the voiceover for the given text using the specified voice and return a response in the following format:

{
 "audioFile": "string",
 "encodedAudio": "string",
 "audioLengthInSeconds": 0,
 "wordDurations": [
    {
        "word": "string",
        "startMs": 0,
        "endMs": 0
    }
 ],
 "consumedCharacterCount": 0,
 "remainingCharacterCount": 0
}

The audioFile key will contain the URL to the generated audio file.
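As a quick usage sketch, converting a piece of text to speech and playing it could look like this. The playAnswer helper, the import path, and the use of the browser's Audio constructor are just for illustration; our actual app will use an <audio> element instead:

import { generateSpeechWithKey } from "./api/MurfAPI";

// Hypothetical helper: generate a voiceover for some text and play it
async function playAnswer(text: string) {
  const res = await generateSpeechWithKey({
    voiceId: "en-US-marcus",
    format: "MP3",
    text,
  });

  // audioFile is a URL pointing to the generated audio
  const audio = new Audio(res.data.audioFile);
  await audio.play();
}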

Building the UI

Now that we have functions to access both OpenAI and Murf, let's build the UI that will invoke these APIs. Let's take a look at what we're making:

(Screenshot of the app: a "ChatGPT + Murf" chat view where each user question has a Play/Pause button for the narrated answer, with a prompt input at the bottom.)

To keep things simple, we will create just two React components - GPTQueryItem and GPTForm.

type GPTQueryItemProps = {
  question: string;
  gptResponseId: number;
  handleAudioPlay: (gptResponseId: number) => void;
  currentPlayingAudioId: number | undefined;
};

function GPTQueryItem({
  question,
  gptResponseId,
  handleAudioPlay,
  currentPlayingAudioId,
}: GPTQueryItemProps) {
  return (
    <div className="gpt-query-item">
      <div className="gpt-query-item__user">
        <span className="material-symbols-outlined gpt-query-item__user__icon">
          face
        </span>
        <div className="gpt-query-item__user__question">{question}</div>
      </div>
      <div className="gpt-query-item__gpt">
        <span className="material-symbols-outlined gpt-query-item__gpt__icon">
          smart_toy
        </span>
        <button
          className="gpt-query-item__gpt__audio-button"
          onClick={() => handleAudioPlay(gptResponseId)}
        >
          {currentPlayingAudioId === gptResponseId ? (
            <span className="material-symbols-outlined gpt-query-item__gpt__audio-button__icon">
              pause
            </span>
          ) : (
            <span className="material-symbols-outlined gpt-query-item__gpt__audio-button__icon">
              play_arrow
            </span>
          )}
          <div className="gpt-query-item__gpt__audio-button__text">
            {currentPlayingAudioId === gptResponseId ? "Pause" : "Play"}
          </div>
        </button>
      </div>
    </div>
  );
}

export default GPTQueryItem;

GPTQueryItem will contain the individual GPT Query cards we saw in the UI. It will include a question from the user and an audio play button that will play ChatGPT's answer to it.

import { useState, useRef } from "react";
import { generateChatGPTResponse, ChatGPTMessage } from "../../api/OpenAiAPI";
import { generateSpeechWithKey, GenerateSpeechInput } from "../../api/MurfAPI";
import GPTQueryItem from "../GPTQueryItem";
import Spinner from "../Spinner";

enum STATUS_TYPES {
  IDLE,
  LOADING,
}

type GPTQuery = {
  question: string;
  gptResponse: string;
  gptResponseAudioUrl: string;
};

function GPTForm() {
  const [prompt, setPrompt] = useState("");
  const [status, setStatus] = useState(STATUS_TYPES.IDLE);
  const [gptQueries, setGptQueries] = useState([] as GPTQuery[]);
  const [currentAudioId, setCurrentAudioId] = useState(
    undefined as number | undefined
  );

  const audioElementRef = useRef<HTMLAudioElement>(null);

  function handlePromptChange(e: React.ChangeEvent<HTMLInputElement>) {
    setPrompt(e.target.value);
  }

  async function handlePromptSubmit() {
    if (!prompt || prompt.trim() === "") return;

    setStatus(STATUS_TYPES.LOADING);

    const messagesList = [] as ChatGPTMessage[];
    gptQueries.forEach((query) => {
      messagesList.push(
        {
          role: "user",
          content: query.question,
        },
        {
          role: "assistant",
          content: query.gptResponse,
        }
      );
    });
    messagesList.push({
      role: "user",
      content: prompt,
    });

    const gptRes = await generateChatGPTResponse({ messages: messagesList });

    const murfPayload: GenerateSpeechInput = {
      voiceId: "en-US-marcus",
      format: "MP3",
      text: gptRes.data.choices[0].message?.content as string,
    };

    const murfAudio = await generateSpeechWithKey(murfPayload);

    setGptQueries([
      ...gptQueries,
      {
        question: prompt,
        gptResponse: gptRes.data.choices[0].message?.content as string,
        gptResponseAudioUrl: murfAudio.data.audioFile,
      },
    ]);

    setPrompt("");

    setStatus(STATUS_TYPES.IDLE);
  }

  function handleAudioPlay(audioIdToPlay: number) {
    if (currentAudioId === audioIdToPlay) {
      audioElementRef.current?.pause();
      setCurrentAudioId(undefined);
    } else {
      setCurrentAudioId(audioIdToPlay);
      audioElementRef.current?.play();
    }
  }

  function handleOnAudioPlayEnd() {
    setCurrentAudioId(undefined);
  }

  return (
    <div className="gpt-form">
      <div className="gpt-form__heading">ChatGPT + Murf</div>
      <div className="gpt-form__queries">
        {gptQueries.length > 0 ? (
          gptQueries?.map((gptQuery, index) => {
            return (
              <GPTQueryItem
                key={`${gptQuery.question} + ${index}`}
                question={gptQuery.question}
                gptResponseId={index}
                handleAudioPlay={handleAudioPlay}
                currentPlayingAudioId={currentAudioId ?? undefined}
              />
            );
          })
        ) : (
          <div className="gpt-form__queries__placeholder">
            Start by asking a question
          </div>
        )}
      </div>
      <div className="gpt-form__user-input">
        <input
          onChange={handlePromptChange}
          value={prompt}
          disabled={status === STATUS_TYPES.LOADING}
          className="gpt-form__user-input__actual"
        />
        <button
          onClick={handlePromptSubmit}
          disabled={status === STATUS_TYPES.LOADING}
          className="gpt-form__user-input__button"
        >
          {status === STATUS_TYPES.LOADING ? (
            <Spinner />
          ) : (
            <span className="material-symbols-outlined gpt-form__user-input__button__icon">
              send
            </span>
          )}
        </button>
      </div>
      <audio
        autoPlay
        ref={audioElementRef}
        onEnded={handleOnAudioPlayEnd}
        src={
          currentAudioId !== undefined
            ? gptQueries[currentAudioId].gptResponseAudioUrl
            : undefined
        }
      />
    </div>
  );
}

export default GPTForm;

The GPTForm component will contain a list of GPTQueryItems and an input field for the user to ask new queries.

The latest prompt the user typed in the input field is stored in the prompt state variable. gptQueries is an array of objects that holds the conversation between the user and ChatGPT. As indicated by the GPTQuery type, an object in the gptQueries array will have the question asked by the user, ChatGPT's response to it, and the URL to the audio generated by Murf for that response. In the UI, we iterate over the gptQueries array and for each item, we display a corresponding GPTQueryItem.

We will integrate the generateChatGPTResponse and generateSpeechWithKey functions in the handlePromptSubmit function.

When the user submits a new prompt, the handlePromptSubmit function is invoked, and we iterate through the gptQueries state variable to populate the messagesList variable. After that iteration, we push the user's latest prompt into messagesList. Now messagesList contains the entire conversation plus the user's new prompt.

The generateChatGPTResponse function is then invoked with messagesList as the value for the messages parameter. The response from generateChatGPTResponse is used to build the payload for generateSpeechWithKey. The voiceId is hardcoded to "en-US-marcus"; you can use a different Murf voice if needed. Once the Murf API returns a response, the new query item - which includes the user question, the ChatGPT response, and the Murf audio URL - is pushed to the gptQueries array.

Now add a wee bit of styling, and you've got yourself a ChatGPT bot that speaks.

Conclusion

In this post, we saw how to build a basic chatbot using ChatGPT and the Murf API. But the possibilities don't end there. If we use OpenAI's Whisper API, a speech-to-text service, we could turn this app into a full-fledged, Alexa-like voice assistant (maybe a topic for another blog post).

ChatGPT is not the only integration you can combine with the Murf API; you can connect it to any business logic that can invoke REST APIs. You can also try different Murf voices and see which works best for your use case. Register your interest in the Murf API and start building today!

Additional Resources

  1. OpenAI Docs
  2. Murf API Docs
  3. Project Github Repo
  4. My Twitter profile 🥹
