Evan Lin

Posted on Jan 15 • Originally published at evanlin.com on Jan 15

Using GCP STT and LINE Video Flex Messages to Build a Chatbot for Seniors that Translates Videos (with Automatic Text Output)

#ai #api #cloud #tutorial

Preface:

Hello everyone, I am Evan Lin, a senior technical promotion engineer at LINE Taiwan. Flex Message Update 3 was announced recently. Besides bringing more customization options, its most eye-catching change is that a Flex Message can now contain a video component. This article takes a situation that often comes up when the elderly use LINE to communicate with their families, and demonstrates how to use STT (Speech-To-Text) technology so that a Flex Message can display more, and more useful, information.

The problem to be solved

I don't know whether everyone has run into this, but some elders or friends find typing inconvenient and often communicate through voice messages. In many situations, though, such as on the MRT or during a meeting, it is inconvenient to turn up the phone volume to listen to a message or watch a video. I have often wondered whether there could be a convenient service to which you upload a video (or audio file), which uses STT (Speech-To-Text) technology to transcribe the speech in the video into text and then presents it in the new Flex Message Update 3 format, so that users can read the content of the message quickly without opening the video.

How to use it and how to set it up

Open source code:

Without further ado, let's take a look at the code.

Open source code: https://github.com/kkdai/linebot-video-gcp

How to quickly set up on your own platform:

Follow the process below to set up a LINE Bot combined with GCP STT and GCS (Google Cloud Storage) directly from the open source code.

Preparation

LINE Developers Account
- Go to the LINE Developers Console and log in with your LINE account.
- Create a LINE Developers account.
- Create a Messaging API channel for your official account and get the channel secret and channel access token. Please refer to this tutorial.
A free Heroku account.
A paid Google Cloud Platform account and the GCP JSON key file. For the detailed process, please refer to Cloud Storage client libraries.

Start deployment

Go to the repository: https://github.com/kkdai/linebot-video-gcp
Press "Deploy" to deploy to your Heroku account.
In addition to the App Name, the following parameters must be filled in for the app to run correctly.
Go to the LINE official account platform, open "Settings" in the upper right corner, and select "Account Settings".
1. Set up the basic information of your official account and enable the option to join groups.
2. Go to the response settings and change the following settings:
   - Change the response mode to "Chatbot"
   - Disable "Automatic response message"
   - Enable "Webhook"
3. Go to the Messaging API option and fill in the Webhook URL: https://{YOUR_HEROKU_SERVER_ID}.herokuapp.com/callback
4. For the quick deployment process, you can refer to the videos in another article:
   - How to deploy LINE BotTemplate
   - How to modify your LINE BotTemplate code

How to use:

Open the chatbot
Send a video message
After a second or two of processing time, you will receive a Flex Message with the result

Results

Development process record:

This section walks through parts of the development process to show how to build this kind of feature into a LINE chatbot:

How to develop Golang App on Heroku

In the past, if you wanted to call Google Cloud APIs (for example, Google Cloud Storage), you could only authenticate through a JSON key file. But deploying to Heroku means putting the project on GitHub, where it is easy to commit the JSON file by accident and have it harvested by bots. With environment variables and the Golang Buildpack, you can deploy Heroku projects securely and still open-source them on GitHub. Please refer to this article.

Here is also a simple piece of test code to confirm that the connection to GCP works.
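Before making any network call, it can also help to validate the credential itself. The following is a minimal sketch that uses only the standard library; it is not the test code from the repository. It checks that the service account JSON stored in an environment variable parses and contains the fields a service account key normally carries. The environment variable name `GCP_CREDENTIALS_JSON` and both function names are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
)

// serviceAccountKey lists the fields of a GCP service account JSON key
// that this check looks at.
type serviceAccountKey struct {
	Type        string `json:"type"`
	ProjectID   string `json:"project_id"`
	PrivateKey  string `json:"private_key"`
	ClientEmail string `json:"client_email"`
}

// checkServiceAccountJSON reports whether raw looks like a usable key.
// It does not contact GCP; it only inspects the JSON.
func checkServiceAccountJSON(raw []byte) error {
	var key serviceAccountKey
	if err := json.Unmarshal(raw, &key); err != nil {
		return fmt.Errorf("credentials are not valid JSON: %w", err)
	}
	if key.Type != "service_account" {
		return fmt.Errorf("unexpected credential type %q", key.Type)
	}
	if key.ProjectID == "" || key.PrivateKey == "" || key.ClientEmail == "" {
		return errors.New("credentials are missing project_id, private_key, or client_email")
	}
	return nil
}

// checkCredentialsFromEnv reads the key from the hypothetical
// GCP_CREDENTIALS_JSON environment variable and validates it.
func checkCredentialsFromEnv() error {
	raw := os.Getenv("GCP_CREDENTIALS_JSON")
	if raw == "" {
		return errors.New("GCP_CREDENTIALS_JSON is not set")
	}
	return checkServiceAccountJSON([]byte(raw))
}
```

Calling `checkCredentialsFromEnv()` once at startup and logging its error makes a misconfigured Heroku config var visible immediately, instead of surfacing later as a failed upload.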

How to capture video or audio files from LINE chat conversations through the official account (Official Account)

First of all, when the official account receives an image message (image), a video message (video), or an audio message (audio), it usually arrives through the following message webhook.

Taking the Video Message Webhook as an example, you will see the following example (from LINE Dev Doc)

If the video provider of the Video Message is an external service, you may be able to get the file link directly.
But if the video was uploaded to the official account, the official account cannot access the file content directly. You need to fetch it through the Get content API.
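The distinction between the two cases can be sketched in plain Go. The struct below mirrors the `message` object of a video webhook event as described in the LINE documentation (`contentProvider.type` is either `line` or `external`); the type and function names are illustrative assumptions, and the open source project itself relies on the LINE Bot SDK rather than hand-written parsing.

```go
package main

import "encoding/json"

// contentProvider mirrors the "contentProvider" object of a video message.
type contentProvider struct {
	Type               string `json:"type"` // "line" or "external"
	OriginalContentURL string `json:"originalContentUrl"`
	PreviewImageURL    string `json:"previewImageUrl"`
}

// videoMessage mirrors the "message" object of a video webhook event.
type videoMessage struct {
	ID              string          `json:"id"`
	Type            string          `json:"type"`
	Duration        int             `json:"duration"` // milliseconds
	ContentProvider contentProvider `json:"contentProvider"`
}

// parseVideoMessage decodes the "message" object of a webhook event.
func parseVideoMessage(raw []byte) (videoMessage, error) {
	var m videoMessage
	err := json.Unmarshal(raw, &m)
	return m, err
}

// directURL returns the file link when the video is hosted externally.
// ok is false when the content has to be fetched with the Get content API.
func directURL(m videoMessage) (url string, ok bool) {
	if m.ContentProvider.Type == "external" && m.ContentProvider.OriginalContentURL != "" {
		return m.ContentProvider.OriginalContentURL, true
	}
	return "", false
}
```

When `directURL` returns `ok == false`, the message ID is what you pass to the Get content API.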

The following Golang code demonstrates how to get the content of a video file uploaded to the LINE platform:

```go
for _, event := range events {
	if event.Type == linebot.EventTypeMessage {
		switch message := event.Message.(type) {
		// ... other message types ...
		case *linebot.VideoMessage:
			// Get the video binary from the LINE server based on the message ID.
			content, err := bot.GetMessageContent(message.ID).Do()
			if err != nil {
				log.Println("Got GetMessageContent err:", err)
				// Skip this event: content is nil when the call fails.
				continue
			}
			defer content.Content.Close()
			// content.Content now streams the video bytes.
		}
	}
}
```

In this way, you get the file content as an io.Reader, which you can use to download the file or to upload it directly to Google Cloud Storage.

How to upload files to Google Cloud Storage

A few things to pay attention to in the code:

You need a cancelable context so that an overly long upload does not cause an API error.
Following the previous example, you can use the content io.Reader returned by bot.GetMessageContent(message.ID).Do() directly. There is no need to download the file locally and then upload it to GCS.
Avoid reusing file names for uploads to GCS. You can use the following method to generate a unique name.

```go
// buildFileName returns a timestamp-based name such as "20220413061944".
func buildFileName() string {
	return time.Now().Format("20060102150405")
}
```
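A timestamp with one-second resolution can still collide when two videos arrive within the same second. One possible refinement, sketched below, appends the LINE message ID, which is unique per message. The function names and the `.mp4` extension are assumptions made for illustration, and unlike the original the timestamp here is rendered in UTC.

```go
package main

import (
	"fmt"
	"time"
)

// objectNameAt builds "<timestamp>-<messageID>.mp4" for a given Unix time
// in seconds. The timestamp is formatted in UTC.
func objectNameAt(messageID string, unixSec int64) string {
	ts := time.Unix(unixSec, 0).UTC().Format("20060102150405")
	return fmt.Sprintf("%s-%s.mp4", ts, messageID)
}

// buildObjectName names an upload after the current time and the message ID.
func buildObjectName(messageID string) string {
	return objectNameAt(messageID, time.Now().Unix())
}
```

Keeping the timestamp as a prefix preserves chronological ordering when listing the bucket, while the message ID removes the collision risk.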

After uploading to GCS, you can move on to the next stage and send the file to the STT service, Google Speech-To-Text, for transcription.

How to have the STT service transcribe the speech in the file

Test your files online through the STT console

Taking Google Speech-To-Text as an example, first enable the service in the console, and then go to the Speech Console, where you can upload and test a short video directly. A few parameters are worth explaining:

When choosing a file, you can use an MP4 as the STT input, as long as the Audio Encoding is selected correctly.
Encoding: Remember to choose MP3; otherwise you will usually run into problems. iPhone videos are MP4 containers with MP3 audio encoding.
Sample Rate: Remember to choose 48000.

The next page also needs a few adjustments:

Spoken language: Remember to choose zh-TW, unless you are recording in English.
Transcription model: Default is enough.

The steps above are a quick way to test your audio files through the Console. A few things to keep in mind:

Google STT can take video files directly; you do not need to extract the audio yourself.

Also make sure you choose the correct Sample Rate. Only then can the audio be read correctly.

Transcribe with the Go GCP STT SDK

To process the content, you need the Speech-To-Text client API and a file location. To keep network transfer fast, we use the file URI inside Google Cloud Storage. You can build it as follows:

```go
// The path to the remote audio file to transcribe.
fileURI := fmt.Sprintf("gs://%s/%s", c.bucketName, c.objectName)
```

Next, here are some important points about the code:

1. You must use the beta SDK to support Chinese and MP3

```go
import (
	// Must use the "apiv1p1beta1" version to enable support for Chinese and MP3.
	speech "cloud.google.com/go/speech/apiv1p1beta1"
	speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1p1beta1"
)
```

Currently, the Speech SDK for Golang does not support Chinese, so you must use the beta1 version to get Chinese support!

I finally found the solution in this article: "iThome Ironman - Day 23 Google Cloud Speech-to-Text - Sub-series Final Chapter"

2. About the settings of STT

The STT console was used to test typical files, which led to the following settings:

```go
Config: &speechpb.RecognitionConfig{
	Encoding:        speechpb.RecognitionConfig_MP3,
	SampleRateHertz: 48000,
	LanguageCode:    "zh-TW",
},
```

Encoding: The audio compression format. Please note that only v1p1beta1 supports MP3.
SampleRateHertz: The sampling rate. Mobile phones and cameras usually use 48000.
LanguageCode: The language model used by STT. Please note that only v1p1beta1 supports Chinese (zh-TW).

3. About the use of results

```go
// Prints the results and collects the transcripts into one string.
var resultStr string
for _, result := range resp.Results {
	for _, alt := range result.Alternatives {
		log.Printf("\"%v\" (confidence=%3f)\n", alt.Transcript, alt.Confidence)
		resultStr = resultStr + alt.Transcript + " "
	}
}
```

The response is a series of results, each with a confidence level attached, so you have to go through them one by one. Two things deserve special handling here:

resp.Results may be empty (because no suitable transcription was found), so handle that case explicitly.
Before using a transcript, remember to check whether its confidence level is high enough.
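Both precautions can be sketched as a small helper. The `alternative` and `result` types below are simplified stand-ins for the SDK's response types so that the example runs on its own; the function name and the idea of a single confidence threshold are assumptions for illustration, not code from the repository.

```go
package main

import "strings"

// alternative is a simplified stand-in for one recognition alternative.
type alternative struct {
	Transcript string
	Confidence float32
}

// result is a simplified stand-in for one entry of resp.Results.
type result struct {
	Alternatives []alternative
}

// joinConfident concatenates the first alternative of each result whose
// confidence is at least minConfidence. ok is false when results is empty
// or nothing passes the threshold.
func joinConfident(results []result, minConfidence float32) (text string, ok bool) {
	var parts []string
	for _, r := range results {
		if len(r.Alternatives) == 0 {
			continue
		}
		top := r.Alternatives[0]
		transcript := strings.TrimSpace(top.Transcript)
		if top.Confidence < minConfidence || transcript == "" {
			continue
		}
		parts = append(parts, transcript)
	}
	if len(parts) == 0 {
		return "", false
	}
	return strings.Join(parts, " "), true
}
```

A caller can branch on `ok` to decide between showing the transcript and showing a fallback notice. The right threshold depends on your recordings; a value such as 0.6 is only a starting point to tune.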

How to get the public location of GCS

Open sharing permissions

It is recommended to open the permissions through the web interface:

Go to the Console and open the project that owns the Google Cloud Storage bucket.
Open Edit Access through the More button.

Precautions:

If you change the permissions through code, do not open them one file at a time; it fails. (I have run into problems with this approach recently.)
Open the permissions on the folder once, and you will not need to adjust them again.

Get the public file location

Once the folder permissions are changed to Public, how do you get the external link for a Google Cloud Storage file?

The rule is as follows:

```
https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME
```

So you can write a simple function to generate it.
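A minimal sketch of such a function follows. It assumes the bucket is publicly readable and uses the `https://storage.googleapis.com/BUCKET_NAME/OBJECT_NAME` form; the function name `publicURL` is illustrative. Building the link through `net/url` keeps object names with spaces or other special characters correctly escaped.

```go
package main

import "net/url"

// publicURL returns the public link of an object stored in a publicly
// readable GCS bucket. Special characters in the path are percent-encoded.
func publicURL(bucket, object string) string {
	u := url.URL{
		Scheme: "https",
		Host:   "storage.googleapis.com",
		Path:   "/" + bucket + "/" + object,
	}
	return u.String()
}
```

For example, `publicURL("my-bucket", "20220413061944.mp4")` yields `https://storage.googleapis.com/my-bucket/20220413061944.mp4`.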

How to send a Video Flex Component Message

The format of the Video Component Flex Message used here is as follows:

The hero block holds a Video Component with a "More Information" link on it, which lets developers point to a website with more details, such as the official website or related instructions.
The body below contains two paragraphs of text, including the transcribed content.
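To make that layout concrete, here is a hedged sketch that assembles such a bubble as plain JSON with the standard library. It is a simplified layout, not the exact one used in the repository; the property names follow the Flex Message video component as documented by LINE, while the function names, the `mega` bubble size, and the `20:13` aspect ratio are assumptions chosen for illustration.

```go
package main

import "encoding/json"

// obj is a shorthand for a generic JSON object.
type obj = map[string]interface{}

// videoBubble assembles a bubble with a video hero and two body texts.
func videoBubble(videoURL, previewURL, infoURL, title, transcript string) obj {
	return obj{
		"type": "bubble",
		"size": "mega", // assumed size for this sketch
		"hero": obj{
			"type":        "video",
			"url":         videoURL,
			"previewUrl":  previewURL,
			"aspectRatio": "20:13",
			// altContent is shown where video playback is not supported.
			"altContent": obj{
				"type":        "image",
				"size":        "full",
				"aspectRatio": "20:13",
				"aspectMode":  "cover",
				"url":         previewURL,
			},
			// The action renders as the link overlaid on the video.
			"action": obj{"type": "uri", "label": "More Information", "uri": infoURL},
		},
		"body": obj{
			"type":   "box",
			"layout": "vertical",
			"contents": []obj{
				{"type": "text", "text": title, "weight": "bold"},
				{"type": "text", "text": transcript, "wrap": true},
			},
		},
	}
}

// bubbleJSON renders the bubble as indented JSON.
func bubbleJSON(b obj) (string, error) {
	raw, err := json.MarshalIndent(b, "", "  ")
	return string(raw), err
}
```

The JSON produced by `bubbleJSON` can be pasted into the Flex Message Simulator to preview the layout before wiring it into the bot.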

Here are some lessons learned:

If the video has no PreviewURL, you can supply a placeholder image first. After the user plays the video once, the correct preview will appear.
If there is a TextComponent, you need to handle the empty-text case explicitly; otherwise you will get an error message similar to the following:

2022/04/13 06:19:44 linebot: APIError 400 A message (messages[0]) in the request body is invalid
[/body/contents/1/contents/1] invalid text content
[/body/contents/1/contents/1] text or contents must be specified

Since Google STT takes more than a minute to fetch and transcribe the file, it is recommended to send the result with PushMsg (a push message) rather than a reply, because the reply token is likely to have expired by then.
As for the target of the message, groups and chat rooms require special handling:

```go
// Determine the push msg target.
target := event.Source.UserID
if event.Source.GroupID != "" {
	target = event.Source.GroupID
} else if event.Source.RoomID != "" {
	target = event.Source.RoomID
}
```
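For the empty TextComponent case mentioned earlier, a small guard is usually enough. The sketch below substitutes a placeholder whenever the transcript is empty; the helper name and the placeholder wording are assumptions, and treating whitespace-only text as empty is a cautious choice rather than documented API behavior.

```go
package main

import "strings"

// fallbackText is a hypothetical placeholder shown when STT returns nothing.
const fallbackText = "(no speech detected)"

// safeText returns s unless it is empty or whitespace-only, in which case
// it returns fallbackText so the Flex text component always has content.
func safeText(s string) string {
	if strings.TrimSpace(s) == "" {
		return fallbackText
	}
	return s
}
```

Passing every transcript through `safeText` before building the Flex Message avoids the `invalid text content` error shown above.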

That completes the work. Feel free to deploy it yourself from the open source code: https://github.com/kkdai/linebot-video-gcp

Future related work

This article combines a widely known capability, STT (Speech-To-Text), with the new Video Component introduced in Flex Message Update 3, in the hope of sparking more ideas and more tools that help users.

There are more ideas to explore together, such as:

Automatic subtitling through an OA.
Making advertising copy more vivid through the Video Component. Because it carries a customizable menu, letting users interact with the video, or even turning it into a game, is another direction worth thinking about.

If you have any suggestions or questions, please contact us through the LINE Developers official discussion forum or the official account of the LINE developer community.

LINE Developers related technical documents and blogs:
