Video transcription is the process of converting the spoken words in a video into written text. This can be done automatically using speech recognition technology or manually by transcribers. The transcribed text is usually synchronised with the video, allowing viewers to read along as they watch.
There are a variety of use cases for video transcription, including:
- Accessibility: Transcription can make videos more accessible to people with hearing impairments or those who speak different languages.
- Searchability: Transcribed videos can be searched and indexed, making it easier for viewers to find specific information within the video.
- Subtitling: Transcription can be used to create subtitles for videos, making them more accessible to a broader audience.
- Language Learning: Transcription can be used as a tool for language learning, allowing students to read along with native speakers and improve their listening and comprehension skills.
These use cases make transcription an important part of making video content accessible. As a result, several SaaS companies, such as Cloudinary, offer ways to transcribe videos with little effort.
Cloudinary, a cloud-based platform for managing and optimising media assets such as images and videos, also offers a video transcription add-on. The add-on automatically converts the audio in our videos to text, using Google’s Cloud Speech API for speech recognition. The generated text can then be overlaid on our video as subtitles using Cloudinary video transformations.
In this article, we will build a Nuxt application that uploads a video to Cloudinary and applies the Cloudinary transcription add-on on upload. Then we will use the Cloudinary transformation API to overlay the AI-generated text on the video. Finally, we will make the transformed video downloadable.
Application Repository and Demo
Repo: https://github.com/yhuakim/nuxi-transcription
Demo: https://nuxi-transcription.netlify.app/
Prerequisites
- A Cloudinary free tier account and valid API keys.
- Good knowledge of Nuxt version 3.
Registering the Google AI Video Transcription Addon
After creating a free tier account, enable the Google AI video transcription addon to use the Cloudinary transcription endpoint. We can do this by following the steps below:
Navigate to the addon tab, then click the Google AI video transcription card as shown in the image below:
The preceding step will take us to the screen below, where we can select the Free plan link. This will enable our free plan.
Now, we have successfully registered the Google AI video transcription addon.
Setting Up a Development Environment
To set up a development environment, we will need to run the following commands in the terminal:
npx nuxi init video-transcription && cd video-transcription
This command creates a Nuxt project in a directory called video-transcription and navigates into the directory.
Next, we need to install the Nuxt3 dependencies by running the following command in the terminal:
yarn install
Installing Dependencies
After initializing a Nuxt project, we will add other dependencies necessary for the current project. These dependencies include:
Cloudinary: A Node.js SDK for media management. We can add this package by entering the following command:
yarn add cloudinary
File saver: A package to download the transcribed video to the storage device. This can be added using the following command:
yarn add file-saver
TailwindCSS: A CSS framework to assist with the page styles. To use this, we can run the following command:
yarn add -D tailwindcss postcss autoprefixer
npx tailwindcss init -p
Then, we can follow the instructions here for a complete guide to set up TailwindCSS in a Nuxt3 application.
Building Out The User Interface
To create the UI of this project, we will split it into two major sections:
- The Form handling section: This will handle the user’s input, the video file and the upload to our Cloudinary endpoint.
- The Transcribed video preview: Here, the returned transcribed video is displayed to the user with a download button.
Before we start writing the code, let's adjust our file structure as follows:
- Navigate to the project directory and create a pages folder.
- Then, open the app.vue file and add the <NuxtPage /> component as shown below, and then run yarn dev -o to start the development server.
<template>
  <div>
    <NuxtPage />
  </div>
</template>
Form Handling
Now let’s create a file called index.vue in the pages folder, then add the following snippet:
<!-- pages/index.vue -->
<template>
  <div class="max-w-[60rem] grid justify-center items-center mx-auto py-4">
    <header class="text-4xl font-bold text-gray-700 text-center">
      <h1>Video Transcription App</h1>
    </header>
    <main class="py-5">
      <section class="py-5 flex">
        <form @submit="handleSubmit">
          <div class="upload">
            <label
              for="video-file"
              class="block mb-2 text-sm font-medium text-gray-500"
              >Upload a Video file</label
            >
            <input
              type="file"
              @change="handleChange"
              name="video-file"
              id="video-file"
              class="bg-gray-50 border border-gray-300 text-gray-900 text-sm
              rounded-lg focus:ring-blue-500 focus:border-blue-500 block w-full p-2.5"
            />
          </div>
          <button
            v-if="selected !== null"
            type="submit"
            class="text-white bg-gradient-to-r from-blue-500 via-blue-600
            to-blue-700 hover:bg-gradient-to-br focus:ring-4 focus:outline-none
            focus:ring-blue-300 dark:focus:ring-blue-800 font-medium rounded-lg
            text-sm px-5 py-2.5 text-center mr-2 mb-5 mt-2"
          >
            Upload
          </button>
        </form>
      </section>
    </main>
  </div>
</template>
The snippet above is the markup for our form, which consists of an input element with a type of file and a button with a type of submit. The submit button is configured to be visible only once the user has selected a file.
Next, let’s create the handleSubmit and handleChange methods that we referenced in the markup above. Let’s add the following snippet just above our template markup:
<!-- pages/index.vue -->
<script setup>
// Shared reactive state provided by Nuxt's useState composable
const selected = useState("selected", () => null); // { name, path } of the chosen file
const videoUrl = useState("videoUrl", () => "");
const downloaded = useState("downloaded", () => false);

const handleChange = (e) => {
  if (e.target.files && e.target.files[0]) {
    const file = e.target.files[0];
    const reader = new FileReader();
    reader.onload = () => {
      // reader.result holds the file encoded as a base64 data URL
      selected.value = { name: file.name, path: reader.result };
    };
    reader.readAsDataURL(file);
  }
};

const handleSubmit = async (e) => {
  e.preventDefault();
  try {
    // The server route destructures `name` and `path` from this body
    const body = JSON.stringify(selected.value);
    const { data } = await useFetch(`/transcribe`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body,
    });
    // The server returns the video URL as a JSON string
    videoUrl.value = JSON.parse(data.value.data);
  } catch (error) {
    console.error(error);
  }
};
</script>
The snippet above shows how we manage our state, how we process the user’s input in the handleChange method, and what we do with the file once the user submits it. To manage our state we use the useState composable provided by Nuxt. To learn more about the useState composable, see the Nuxt documentation.
In the handleChange method, we convert the user-selected video file into a data URL using JavaScript’s built-in FileReader() constructor, and store the result in the selected state.
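To make the data URL format more concrete, here is a minimal sketch of a helper that splits such a string into its MIME type and base64 payload and estimates the size of the original file. The name describeDataUrl is hypothetical and not part of the project; the sketch also assumes the simple data:<type>;base64,<payload> form, without extra MIME parameters.

```javascript
// Hypothetical helper (not part of the project): inspect a base64 data URL
// such as the one produced by FileReader.readAsDataURL().
function describeDataUrl(dataUrl) {
  // Expected shape: data:<mime type>;base64,<payload>
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error("Not a base64 data URL");
  }
  const [, mimeType, base64] = match;
  // Base64 stores 3 bytes in every 4 characters; "=" marks padding bytes.
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  const byteLength = Math.floor((base64.length * 3) / 4) - padding;
  return { mimeType, byteLength };
}

// A 4-byte payload (0x00 0x01 0x02 0x03) labelled as an MP4 video.
console.log(describeDataUrl("data:video/mp4;base64,AAECAw=="));
// → { mimeType: 'video/mp4', byteLength: 4 }
```

Because every 3 bytes become 4 characters, the encoded string is roughly a third larger than the file itself, which is worth keeping in mind when a large video is sent as a request body.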
Then, in the handleSubmit method, we will:
- prevent the default action of the submit event by calling the preventDefault() function
- send a POST request to our server route (/transcribe), with the value of the selected state as the body of the request
- assign the response to our videoUrl state
Creating the Server Route
In the snippet above, we make a POST request to the /transcribe route, but we have not created this route yet. To create it, let’s navigate to the project directory and create a folder named server. Inside this folder, let’s create another folder called routes, and in it a file called transcribe.js.
In the transcribe.js file, we will handle two main pieces of logic:
- Upload and Transcription: Here we will use the Cloudinary upload API to upload the user-selected video file. Also, we will ask Cloudinary to transcribe this video using the transcription addon.
- Video Transformation: Here we will use the Cloudinary video transformation API to overlay the generated text on the video.
First, let’s add the following snippet to server/routes/transcribe.js:
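// server/routes/transcribe.js
import { v2 as Cloudinary } from 'cloudinary';
import { parse } from 'path';

export default defineEventHandler(async (event) => {
  // `name` is the original file name; `path` is the file reference
  // (a local path, a URL or a data URL) that is passed to upload()
  const { name, path } = await readBody(event);
  // Drop the file extension so it does not end up in the public ID
  const filename = parse(name).name;
  try {
    Cloudinary.config({
      cloud_name: process.env.CLOUD_NAME,
      api_key: process.env.API_KEY,
      api_secret: process.env.API_SECRET,
      secure: true
    });
    // upload() returns a promise, so no callback is needed alongside await
    await Cloudinary.uploader.upload(path, {
      resource_type: "video",
      public_id: `demo/${filename}`,
      // Ask the transcription add-on to also generate .srt and .vtt files
      raw_convert: "google_speech:srt:vtt"
    });
  } catch (error) {
    console.log(error);
  }
})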
In the snippet above, we:
- imported the cloudinary and path packages
- created a handler function for our server logic, which takes an event parameter
- destructured name and path from the body of our request using Nitro’s built-in readBody utility, passing in the event parameter
- trimmed off the file extension from the user-selected file name using the parse method we imported, by passing in the file name and keeping only the name attribute
- set up a Cloudinary instance by passing in our cloud_name, api_key and api_secret, inside a try-catch statement
- used the upload method provided by the Cloudinary instance to upload the video file referenced by our path variable. Additionally, we set the raw_convert option to google_speech:srt:vtt. This triggers an automatic transcription of the video as it uploads.
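The handler reads CLOUD_NAME, API_KEY and API_SECRET from process.env, so these values need to be defined before the upload runs (during development, Nuxt loads them from a .env file in the project root). As a minimal sketch, assuming we want to fail early with a readable message rather than with an authentication error from Cloudinary, a small helper could collect them. The name readCloudinaryEnv is hypothetical and not part of the Cloudinary SDK.

```javascript
// Hypothetical helper (not part of the Cloudinary SDK): gather the
// credentials the handler expects and report any that are missing.
const REQUIRED_KEYS = ["CLOUD_NAME", "API_KEY", "API_SECRET"];

function readCloudinaryEnv(env = process.env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Same option names that the handler passes to Cloudinary.config()
  return {
    cloud_name: env.CLOUD_NAME,
    api_key: env.API_KEY,
    api_secret: env.API_SECRET,
    secure: true,
  };
}
```

Inside the handler, the returned object could be passed straight to the configuration call, for example Cloudinary.config(readCloudinaryEnv()).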
Once this process is completed, we have one more task to perform, and that’s to overlay the generated text file on our video. To do this, we will add the following snippet just below our upload method.
// server/routes/transcribe.js
const transcribedVideo = Cloudinary.url(`demo/${filename}`, {
  resource_type: "video",
  loop: false,
  controls: true,
  autoplay: true,
  fallback_content: "Your browser does not support HTML5 video tags",
  transformation: [
    {
      overlay: {
        resource_type: "subtitles",
        public_id: `demo/${filename}.en-US.srt`
      }
    },
    { flags: "layer_apply" }
  ]
})
return {
  statusCode: 200,
  data: JSON.stringify(transcribedVideo)
};
In the above snippet, we:
- used the Cloudinary url method to build the delivery URL of the uploaded video from its public ID, and applied an overlay transformation to it. The overlay has a resource_type of subtitles, and its public_id option points to the generated subtitle file.
- returned the url generated from our Cloudinary transformation as a JSON string to the client side, with a statusCode of 200.
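To see what this transformation amounts to, the following sketch builds a comparable delivery URL by hand. It is only an approximation for illustration: subtitledVideoUrl is a hypothetical function, the cloud name is a placeholder, and the URL produced by the SDK may differ in details such as an added version segment.

```javascript
// Hand-built approximation of a Cloudinary delivery URL with a subtitle
// overlay. Hypothetical helper; in the project the SDK's url() does this.
function subtitledVideoUrl(cloudName, publicId, subtitleId) {
  // Inside an overlay, folder separators in a public ID are written as ":".
  const overlayId = subtitleId.replace(/\//g, ":");
  return [
    `https://res.cloudinary.com/${cloudName}/video/upload`,
    `l_subtitles:${overlayId}`, // the overlay layer (our generated .srt file)
    "fl_layer_apply",           // applies the layer to the base video
    publicId,
  ].join("/");
}

console.log(subtitledVideoUrl("my-cloud", "demo/clip", "demo/clip.en-US.srt"));
// → https://res.cloudinary.com/my-cloud/video/upload/l_subtitles:demo:clip.en-US.srt/fl_layer_apply/demo/clip
```

The URL carries no file extension, which is why the client can later append .webm, .mp4 or .ogv to request the format it needs.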
Now that we have created our server logic and can post and retrieve data from our client side, let’s display our transcribed video to the user.
Displaying The Transcribed Video and Adding a Download Button
Here we will display the transcribed video returned by our server route. We will also provide a download button, so the user can save the video to their local storage.
Now let’s navigate to our pages/index.vue file and add the following snippet just below the Form section.
<section class="transcribed-preview">
  <!-- preview Transcribed video here -->
  <div v-if="videoUrl === ''" class="">
    Your transcribed video will appear here
  </div>
  <div v-else class="max-w-[35rem] shadow-2xl">
    <video autoplay controls class="mb-5 w-full">
      <source
        :src="videoUrl ? `${videoUrl}.webm` : ''"
        type="video/webm"
      />
      <source :src="videoUrl ? `${videoUrl}.mp4` : ''" type="video/mp4" />
      <source :src="videoUrl ? `${videoUrl}.ogv` : ''" type="video/ogg" />
    </video>
  </div>
  <button
    v-if="videoUrl !== ''"
    @click="handleDownload"
    class="text-white bg-gradient-to-r from-blue-500 via-blue-600 to-blue-700
    hover:bg-gradient-to-br focus:ring-4 focus:outline-none focus:ring-blue-300
    dark:focus:ring-blue-800 font-medium rounded-lg text-sm px-5 py-2.5
    text-center mr-2 mb-2"
    :disabled="downloaded"
  >
    Download Video
  </button>
</section>
In the above snippet, the video and the download button only appear once we have a response from our server; until then, a placeholder message is shown. Now let’s handle our download button logic. To do this we will add the following snippet inside our script element.
import { saveAs } from "file-saver";

const handleDownload = () => {
  if (videoUrl.value !== "") {
    saveAs(videoUrl.value, "transcribed video");
    downloaded.value = true;
  }
};
The method above uses the saveAs method we imported to initiate a download when the button is clicked. The saveAs method takes in two parameters: the first is the url of the file and the second is the name of the file that will be saved.
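Since the URL returned by our server carries no file extension, one possible refinement is to request a specific format and give the saved file a matching extension, the same way the source elements append .mp4 or .webm. The sketch below is an assumption for illustration rather than part of the original project; downloadTarget and its default values are hypothetical.

```javascript
// Hypothetical helper: derive the two saveAs() arguments from the
// extension-less video URL returned by the server route.
function downloadTarget(baseUrl, format = "mp4", label = "transcribed-video") {
  return {
    url: `${baseUrl}.${format}`,      // ask Cloudinary for this format
    filename: `${label}.${format}`,   // name the saved file accordingly
  };
}

console.log(downloadTarget("https://example.com/video/upload/demo/clip"));
// → { url: 'https://example.com/video/upload/demo/clip.mp4', filename: 'transcribed-video.mp4' }
```

In handleDownload, this could be used as const { url, filename } = downloadTarget(videoUrl.value); followed by saveAs(url, filename);.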
Finally, we will have the following screen once we have successfully executed all these procedures.
Conclusion
This article showed one way to implement video transcription with Cloudinary in a Nuxt 3 application. It used a server route to keep our API keys off the client, and Nuxt 3’s useState composable to manage our application state.