Edouard Bonlieu for Koyeb

Posted on • Originally published at koyeb.com

Automate video transcription with Koyeb Serverless Engine

Introduction

This guide shows how to deploy a video transcription service. We will use the Google Cloud Video Intelligence API to convert the speech in videos to text (speech-to-text) and the Koyeb Serverless Engine to handle your video files and orchestrate the processing.

Once you have completed this tutorial, you will be able to upload your videos via the Koyeb S3-compatible API. Each upload triggers a function that generates a speech transcription file for the new video.

You can then use the speech transcription file to:

  • Index the results in a database and make your videos searchable
  • Automatically generate subtitles for your videos
  • Moderate videos based on their content

And many other use-cases.

In this guide, we use a Koyeb Managed Store to store our videos and the generated video speech transcription file. You can also connect your own Cloud Storage Provider to integrate with your existing infrastructure and data with minimal effort.
Learn how to connect your Cloud Storage Provider here.

Requirements

To successfully follow and implement this tutorial, you need:

Steps

Building a video transcription service with Koyeb and the GCP Video Intelligence API takes four steps:

  1. Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file
  2. Create a Koyeb Secret to store your Google Cloud Service Account configuration
  3. Create a Stack and deploy the video transcription function
  4. Upload a video and retrieve the video transcription file

Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file

The first step is to create a Koyeb Store to hold our videos and the generated transcription files. Koyeb Stores provide an S3-compatible API, so you can manage your data programmatically using any S3-compatible SDK or tool.

To create a new Koyeb Managed Store with the CLI, run the following in your terminal:

koyeb create store -f new-store.yaml

Where the content of new-store.yaml is:

name: my-store-01
type: koyeb

You now have a Koyeb Store up and running. You can interact with your Store using any S3-compatible SDK or tool.

Configure S3cmd

The next step is to configure S3cmd to interact with the Store, so we can upload videos and retrieve the speech transcription files from it.

In the Koyeb Control Panel, click API in the left side menu and click New in the S3 credentials section. A modal appears to create a new S3 credential. Enter the name and a description (optional) to identify and remember what this credential is used for.

Click the Submit button. Save the generated access_key and secret_key in a secure place. Once the modal is closed, you will not be able to see them again.

Create an S3cmd configuration file in your home directory (the check below reads it from ~/.s3cfg-gcp) with the following content, replacing each REPLACE_ME value with the credentials you just generated.

[default]
access_key = REPLACE_ME
secret_key = REPLACE_ME
bucket_location = US
check_ssl_certificate = True
check_ssl_hostname = True
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
follow_symlinks = False
force = False
get_continue = False
guess_mime_type = True
host_base = s3.eu-west-1.prod.koyeb.com
host_bucket = %(bucket)s.s3.eu-west-1.prod.koyeb.com
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
limit = -1
limitrate = 0
list_md5 = False
long_listing = False
max_delete = -1
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
preserve_attrs = True
progress_meter = True
put_continue = False
recursive = False
recv_chunk = 65536
reduced_redundancy = False
requester_pays = False
restore_days = 1
restore_priority = Standard
send_chunk = 65536
server_side_encryption = False
signature_v2 = False
signurl_use_https = False
skip_existing = False
socket_timeout = 300
stats = False
stop_on_error = False
throttle_max = 100
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING

To check that the configuration works, run the following in your terminal:

s3cmd -c ~/.s3cfg-gcp ls
2020-10-28 09:15  s3://my-store-01

You should see the Store you previously created.
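
If you later want to reach the Store from an S3 SDK instead of S3cmd, the same file already holds everything needed. The following is a minimal sketch, using only the Python standard library, that reads an S3cmd configuration and derives an endpoint URL and credentials from it. The function names and the returned dictionary keys are illustrative choices, not part of Koyeb or S3cmd.

```python
import configparser
from pathlib import Path


def parse_s3_settings(text, section="default"):
    """Extract endpoint and credentials from the text of an S3cmd config file."""
    # interpolation=None keeps values such as "%(bucket)s.s3..." as literal text.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    cfg = parser[section]
    # Assumption: fall back to HTTPS when use_https is not set in the file.
    scheme = "https" if cfg.getboolean("use_https", fallback=True) else "http"
    return {
        "endpoint_url": f"{scheme}://{cfg['host_base']}",
        "access_key": cfg["access_key"],
        "secret_key": cfg["secret_key"],
    }


def load_s3_settings(path, section="default"):
    """Same as parse_s3_settings, but reads the config from a file path."""
    text = Path(path).expanduser().read_text(encoding="utf-8")
    return parse_s3_settings(text, section)
```

Calling load_s3_settings("~/.s3cfg-gcp") on the file above would give an endpoint_url of https://s3.eu-west-1.prod.koyeb.com. The returned values correspond to the endpoint and credential options that S3 SDKs generally expose.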

Create a Koyeb Secret to store your Google Cloud Service Account configuration

Create a Koyeb Secret to securely store your GCP Service Account configuration. Koyeb Secrets allow you to access API credentials, tokens, etc. securely in your configuration and functions without having to expose them.

Create a secret.yaml file and replace the value with your GCP Service Account configuration.

name: gcp-sa-vi
value: |
  {...}

Then create the Secret by running:

koyeb create secrets -f secret.yaml
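
Pasting a multi-line JSON key under value: | by hand makes it easy to get the indentation wrong. As a rough sketch, the secret.yaml content could instead be generated from the service account key with the Python standard library. The function name build_secret_yaml is hypothetical, and the sketch assumes the Secret value is the raw JSON key, as the {...} placeholder above suggests.

```python
import json
import textwrap


def build_secret_yaml(name, service_account):
    """Render secret.yaml content with the key embedded as a YAML literal block."""
    # GCP service account key files carry "type": "service_account".
    if service_account.get("type") != "service_account":
        raise ValueError("expected a GCP service account key (type: service_account)")
    # Pretty-print the JSON, then indent every line by two spaces so that it
    # sits inside the "value: |" block.
    body = textwrap.indent(json.dumps(service_account, indent=2), "  ")
    return f"name: {name}\nvalue: |\n{body}\n"
```

To use it, load your downloaded key file with json.load, pass the resulting dictionary to build_secret_yaml("gcp-sa-vi", ...), and write the returned string to secret.yaml.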

Create a Stack and deploy the video transcription function

Our Store is configured and ready to use. The next step is to deploy the processing function that performs the video speech transcription.
We will use the gcp-video-intelligence app from the Koyeb Catalog, as it lets you perform this operation without writing a single line of code.

In the terminal, start by creating a new Stack. Stacks are processing environments containing code and containers.

koyeb create stack -n video-transcription

With our Stack created, we can configure and deploy the video speech transcription app. Create a file named video-transcription.yaml containing our function configuration:

functions:
  - name: gcp-video-intelligence
    use: gcp-video-intelligence@1.0.1
    with:
      STORE: my-store-01 # The Store to watch to trigger the function and to save the GCP Video Intelligence result. This parameter is required.
      GCP_KEY: gcp-sa-vi # The name of the Secret in which the GCP Service Account is stored. This parameter is required.
      VIDEO_INTELLIGENCE_FEATURE: SPEECH_TRANSCRIPTION

Deploy the function by running:

koyeb create revision video-transcription -f video-transcription.yaml

This deploys the function into our Stack. Now, each time a video is uploaded to the Store my-store-01, the function will be triggered and a video speech transcription file will be generated.

Upload a video and retrieve the video transcription file

With our processing Stack ready, we can now check that everything works and that a speech transcription file is generated for each uploaded video.

To upload a video using S3cmd, run the following in your terminal:

s3cmd -c ~/.s3cfg-gcp put /path/to/video.mp4 s3://my-store-01

Now, if you run koyeb logs stack-events video-transcription, you will see the event that triggers your function. The function uses this event to retrieve the video file and perform the speech transcription. You can follow the function execution by running: koyeb logs functions video-transcription gcp-video-intelligence.

Once the execution is done, you can retrieve the speech transcription file by running:

s3cmd -c ~/.s3cfg-gcp get s3://my-store-01/gcp-video-intelligence-SPEECH_TRANSCRIPTION-[...].json

This file contains the result of the processing function with the detected text in the video:

"results": [
    {
        "alternatives": [
            {
                "transcript": "Hey, I'm John...",
                "confidence": 0.7477226853370667,
                "words": [
                    {
                        "startTime": {
                            "nanos": 500000000
                        },
                        "endTime": {
                            "nanos": 700000000
                        },
                        "word": "Hey,"
                    },
                    {
                        "startTime": {
                            "nanos": 700000000
                        },
                        "endTime": {
                            "nanos": 900000000
                        },
                        "word": "I'm"
                    },
                    ...
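
To illustrate the subtitle use-case mentioned in the introduction, here is a small sketch that turns the word timings into SRT cues using only the Python standard library. It assumes the downloaded file parses to a JSON object with a top-level "results" key shaped like the excerpt above, and that timestamps are objects with optional "seconds" and "nanos" keys (the excerpt only shows "nanos"). The function names and the words_per_cue parameter are illustrative.

```python
def to_millis(timestamp):
    """Convert a {"seconds": ..., "nanos": ...} timestamp to whole milliseconds.

    Either key may be missing; a missing key counts as zero.
    """
    seconds = int(timestamp.get("seconds", 0))
    nanos = int(timestamp.get("nanos", 0))
    return seconds * 1000 + nanos // 1_000_000


def srt_time(millis):
    """Format milliseconds as an SRT timestamp: HH:MM:SS,mmm."""
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_from(document):
    """Yield (start_ms, end_ms, word) from the first alternative of each result."""
    for result in document.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for item in alternatives[0].get("words", []):
            yield to_millis(item["startTime"]), to_millis(item["endTime"]), item["word"]


def to_srt(document, words_per_cue=7):
    """Group words into fixed-size cues and render them as SRT text."""
    words = list(words_from(document))
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start, end = chunk[0][0], chunk[-1][1]
        text = " ".join(word for _, _, word in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)
```

For the two words shown in the excerpt, to_srt produces a single cue running from 00:00:00,500 to 00:00:00,900 with the text "Hey, I'm". Grouping by a fixed word count keeps the sketch short; splitting cues on pauses or punctuation would likely read better in practice.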

Conclusion

In this guide, we saw how to deploy a video transcription service using the Google Cloud Video Intelligence API and the Koyeb Serverless Engine.
We used S3cmd to upload videos and retrieve the transcription files, but you can also use any S3-compatible SDK or tool.

The catalog integrations code used in this guide is available on GitHub.

If you have any questions about this tutorial, feel free to reach out to us on the Koyeb Slack Community.
