DEV Community: Nokia Bell Labs

How to Port an AWS Serverless App to KNIX MicroFunctions

Bell Labs — Fri, 21 Aug 2020 20:03:29 +0000

In my last article I described the implementation of a serverless speech-to-speech language translation app using AWS Lambda and Step Functions. In this article, I will describe how to port that same app to KNIX MicroFunctions.

KNIX MicroFunctions

KNIX MicroFunctions is a new serverless platform originally developed by Nokia Bell Labs. It has since been open-sourced on GitHub. (Full disclosure: I am part of the team that developed KNIX.)

KNIX minimizes startup delays for function executions, provides support for persistent functions, and optimizes resource utilization. It is AWS-compatible but outperforms AWS Step Functions and Express workflows by a factor of 6 (as of June 2020).

KNIX runs on Kubernetes/Knative as well as on bare metal or VMs. At present, the KNIX platform supports the Python 3.6 and Java 8 runtimes. It also comes with developer tools such as a web IDE (shown below), SDK, and CLI.

KNIX Workflows

KNIX supports serverless workflows which are comparable to Amazon's Step Functions. You can use them to create a serverless app that consists of multiple functions. In my AWS Lambda implementation of the translation app, I used the Amazon States Language (ASL) to define a state machine for my app. KNIX also supports ASL along with some KNIX-specific extensions (e.g., for long-running session functions). As such, I could simply copy my ASL definition from the AWS console and paste it into the KNIX workflow editor. The only change I had to make was to replace the ARN identifiers in the ‘Resource’ fields with the corresponding KNIX function names.

The KNIX workflow editor has a built-in visualization tool which produced the following visualization for my workflow:

As you can see in the adapted ASL definition below (only the first branch is shown), I only had to remove the ARN-specific parts from the ‘Resource’ fields, since I kept the function names the same in KNIX.

{
    "Comment": "Parallelized Language Translation",
    "StartAt": "ParallelTranslator",
    "States": {
        "ParallelTranslator": {
            "Type": "Parallel",
            "End": true,
            "Branches": [
                {
                    "StartAt": "Italian",
                    "States": {
                        "Italian": {
                            "Type": "Pass",
                            "Result": {
                                "TargetLanguageCode": "it",
                                "VoiceId": "Carla"
                            },
                            "ResultPath": "$.TargetLanguage",
                            "Next": "SpeechToTextItalian"
                        },
                        "SpeechToTextItalian": {
                            "Type": "Task",
                            "Resource": "speech2text",
                            "Next": "TranslateToItalian"
                        },
                        "TranslateToItalian": {
                            "Type": "Task",
                            "Resource": "translate",
                            "Next": "TextToSpeechItalian"
                        },
                        "TextToSpeechItalian": {
                            "Type": "Task",
                            "Resource": "tts",
                            "End": true
                        }
                    }
                },
                [...]
            ]
        }
    }
}
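The ARN-to-name substitution can also be scripted instead of done by hand. The following is a minimal sketch, assuming that every 'Resource' value is a Lambda ARN of the form arn:aws:lambda:<region>:<account>:function:<name> and that the KNIX function keeps the Lambda function's name. The helper name and the sample ARN are made up for illustration and are not part of KNIX or of my app.

```python
import json


def strip_lambda_arns(node):
    """Recursively replace Lambda ARNs in 'Resource' fields with bare function names."""
    if isinstance(node, dict):
        for key, value in node.items():
            if (key == "Resource" and isinstance(value, str)
                    and value.startswith("arn:aws:lambda:")):
                # arn:aws:lambda:<region>:<account>:function:<name>[:<alias>]
                node[key] = value.split(":")[6]
            else:
                strip_lambda_arns(value)
    elif isinstance(node, list):
        for item in node:
            strip_lambda_arns(item)
    return node


# Hypothetical AWS-side state; the account id and region are placeholders.
aws_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:speech2text",
    "Next": "TranslateToItalian",
}
knix_state = strip_lambda_arns(aws_state)
print(json.dumps(knix_state, indent=4))
```

Because the helper walks nested dictionaries and lists, it can be applied to a whole state machine definition, including the branches of a 'Parallel' state.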

Instead of copying and pasting the ASL definition into the KNIX workflow editor, I also could have uploaded a file containing the workflow JSON or used the KNIX workflow import tool in the GUI. The workflow import tool lets KNIX users upload a zip file containing the workflow JSON, as well as all function code and any function code dependencies. The zip file must adhere to the following directory structure:

myWorkflow.json
myFunction1/
├── myFunction1.py|java
├── [requirements.txt]
├── [myDependency.py|java]
└── [otherDependencies/]
    └── [myOtherDependency.png]
[myFunction2/]
├── [myFunction2.py|java]
├── [requirements.txt]
├── [myDependency.py|java]
└── [otherDependencies/]
    └── [myOtherDependency.png]
...
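Such an archive can be assembled with Python's standard zipfile module. This is only a sketch: it assumes that the workflow JSON sits at the top level of the archive next to one directory per function, and the helper name, the file contents, and the function sources below are placeholders rather than the real code of my app.

```python
import io
import zipfile


def build_workflow_zip(workflow_json, functions):
    """Return the bytes of a zip archive holding the workflow JSON and one
    directory per function.

    'functions' maps a function name to a dict of {file name: file content}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("myWorkflow.json", workflow_json)
        for function_name, files in functions.items():
            for file_name, content in files.items():
                archive.writestr(f"{function_name}/{file_name}", content)
    return buffer.getvalue()


# Placeholder sources; the real functions are shown later in this article.
archive_bytes = build_workflow_zip(
    '{"Comment": "Parallelized Language Translation"}',
    {
        "speech2text": {
            "speech2text.py": "def handle(event, context):\n    return event\n",
            "requirements.txt": "requests\n",
        },
        "translate": {
            "translate.py": "def handle(event, context):\n    return event\n",
            "requirements.txt": "boto3\n",
        },
    },
)

with open("myWorkflow.zip", "wb") as zip_file:
    zip_file.write(archive_bytes)
```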

In the following, I will summarize any changes I had to make to the AWS Lambda function code of my app to make it run on KNIX.

speech2text Function

The 'speech2text' function reads the audio clip from the object store and sends it to IBM's Watson SpeechToText cloud service. I had to rename the handler, i.e., the method that the serverless platform calls when the function is invoked, from ‘lambda_handler’ to ‘handle’. Unlike in AWS Lambda, the handler method name cannot be customized in KNIX and must always be ‘handle’.

I also had to change my function code to use the KNIX object store instead of Amazon's S3 for storing the voice sample to be translated. The KNIX object store APIs are exposed through the ‘context’ API object in the ‘handle’ method. The KNIX object store operates on strings, which means that binary data such as audio clips has to be base64-encoded. So I had to import the base64 library and add a line to base64-decode the audio recording after retrieving it from the KNIX object store.
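The base64 round trip can be tried out locally without a KNIX deployment. In the sketch below, FakeContext is a hypothetical in-memory stand-in that only mimics the string-based get() and put() calls used in this article; it is not the real KNIX context object, and the two helper functions are my own naming.

```python
import base64


class FakeContext:
    """Hypothetical in-memory stand-in for the KNIX 'context' object (strings only)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        if not isinstance(value, str):
            raise TypeError("the object store holds strings")
        self._store[key] = value

    def get(self, key):
        return self._store[key]


def put_binary(context, key, data):
    """Base64-encode bytes and store the result as a string."""
    context.put(key, base64.b64encode(data).decode("utf-8"))


def get_binary(context, key):
    """Fetch a base64 string and decode it back to the original bytes."""
    return base64.b64decode(context.get(key))


context = FakeContext()
audio = b"\xff\xfb\x90\x00 not real mp3 data"  # placeholder bytes
put_binary(context, "recording.mp3", audio)
restored = get_binary(context, "recording.mp3")
print(restored == audio)
```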

Unlike Step Functions, KNIX workflows cannot yet be triggered by object store operations, although this feature is currently under development. As such, I had to manually execute my KNIX workflow using the GUI after uploading the voice sample to the KNIX object store.

import json
import base64
import requests
from requests.auth import HTTPBasicAuth

def handle(event, context):

    # get the recording
    recording = context.get("recording.mp3")

    # base64-decode recording
    recording = base64.b64decode(recording)

    url = "https://stream-fra.watsonplatform.net/speech-to-text/api/v1/recognize"

    # send audio to IBM Watson Speech-to-Text service
    # ([api_key] is a placeholder for your IBM Watson API key)
    response = requests.post(url=url, data=recording, auth=HTTPBasicAuth('apikey', [api_key]))

    transcript = response.json()['results'][0]['alternatives'][0]['transcript']

    return_json = {}
    return_json['SourceText'] = transcript
    return_json['TargetLanguage'] = event['TargetLanguage']
    return_json['AWSCredentials'] = event['AWSCredentials']

    return return_json

translate Function

The 'translate' function translates the transcripts to one of three target languages. Aside from changing the handler method name, I had to make a slight change to the import statement for the 'requests' library, namely, delete the AWS-specific 'botocore.vendored' part. I also had to add the 'boto3' library as a function requirement. KNIX requires that all non-standard libraries be listed in the 'Requirements' tab of the function editor (in the format of a pip ‘requirements.txt’ file) as shown below.

Since we are now calling the Amazon Translate service from outside the AWS ecosystem, I also had to pass the access key and secret associated with my AWS account to the boto3 session object.

import json
import boto3

def handle(event, context):
    source_text = event['SourceText']
    target_language_code = event['TargetLanguage']['TargetLanguageCode']
    target_language_voice_id = event['TargetLanguage']['VoiceId']
    access_key = event['AWSCredentials']['AccessKey']
    secret_key = event['AWSCredentials']['SecretKey']

    # create boto3 session
    translate_client = boto3.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name='us-east-1').client('translate')

    # call AWS Translate
    result = translate_client.translate_text(Text=source_text,
             SourceLanguageCode="en", TargetLanguageCode=target_language_code)

    # create return value JSON object
    return_json = {}
    return_json['SourceText'] = event['SourceText']
    return_json['TranslatedText'] = result.get('TranslatedText')
    return_json['TargetLanguageCode'] = target_language_code + "-" + target_language_code.upper()
    return_json['AWSCredentials'] = event['AWSCredentials']
    return_json['VoiceId'] = target_language_voice_id

    return return_json

tts Function

The 'tts' function sends the translation result to Amazon's text-to-speech service, 'Polly', and saves the returned mp3 audio data to the object store. I had to make the following changes in this function:

- change the entry method name to 'handle'
- add 'boto3' to the list of requirements in the KNIX function editor
- initialize the boto3 session with the access key/secret associated with my AWS account
- save the resulting voice sample (after base64-encoding it) to the KNIX object store instead of S3 using the context.put() API

import json
import base64
import boto3

def handle(event, context):

    source_text = event['SourceText']
    translated_text = event['TranslatedText']
    target_language_code = event['TargetLanguageCode']
    target_language_voice_id = event['VoiceId']
    access_key = event['AWSCredentials']['AccessKey']
    secret_key = event['AWSCredentials']['SecretKey']

    polly_client = boto3.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name='us-east-1').client('polly')

    response = polly_client.synthesize_speech(VoiceId=target_language_voice_id,
                    OutputFormat='mp3', LanguageCode=target_language_code,
                    Text=translated_text)

    context.put(target_language_code + '.mp3', base64.b64encode(response['AudioStream'].read()).decode("utf-8"))

    return_json = { 'Translation' : translated_text }
    return return_json

Workflow Execution

In order to test my speech-to-speech translation app under KNIX, I first uploaded my voice recording to the KNIX object store using the KNIX GUI. I then used the GUI to deploy and execute my newly created workflow. The KNIX workflow execution dialog lets me specify the JSON-based workflow input (AWS credentials for my app) and visualizes the function execution on a timeline (see screenshot below). I can also view any log statements generated by my functions as well as the workflow output (in this case the translations in text form).
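The shape of the workflow input follows from how the functions read event['AWSCredentials']. Below is a small sketch that builds this input; the helper name is my own and the credential values are placeholders that you would replace with your own keys.

```python
import json


def make_workflow_input(access_key, secret_key):
    """Build the JSON input that the 'translate' and 'tts' functions expect."""
    return {
        "AWSCredentials": {
            "AccessKey": access_key,
            "SecretKey": secret_key,
        }
    }


# Placeholder values; never commit real credentials.
workflow_input = make_workflow_input("<your AWS access key id>",
                                     "<your AWS secret access key>")
print(json.dumps(workflow_input, indent=4))
```

The 'Pass' state at the start of each branch then adds the 'TargetLanguage' entry to this input before the first function runs.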

After my workflow execution finished successfully, I navigated to the object store interface from where I could download the speech translations generated by my app (see screenshot below).

Closing

I hope this article has helped you get a better understanding of the KNIX MicroFunctions platform and how to move existing serverless apps from the AWS ecosystem to KNIX MicroFunctions.

You can find the complete source code for both the AWS Lambda and the KNIX versions of my speech-to-speech translation app in my GitHub repo.

Implementing a Serverless Speech-To-Speech Language Translation App

Bell Labs — Wed, 22 Jul 2020 18:48:09 +0000

By Andre Beck
Bio: https://www.bell-labs.com/usr/andre.beck

This is the first in a series of articles that aim to illustrate the implementation of simple cloud apps on serverless platforms such as AWS Lambda/Step Functions or IBM Cloud Functions.

In this article, I will describe the implementation of a simple serverless speech-to-speech language translation app. The basic idea for the app is to take an English voice recording as input and simultaneously translate it to three different target languages: Italian, French, and German. For each target language, the app first transcribes the recording, then translates the transcript, and finally synthesizes the translation back to speech. I implemented the app using AWS Lambda and Step Functions as well as the following AWS/IBM cloud services: IBM Watson Speech-To-Text, Amazon Translate, and Amazon Polly (Text-To-Speech).

Step Functions Design

Since the app consists of more than one function, I decided to use AWS Step Functions to create a serverless workflow for the app. Step Functions is an AWS orchestration service that models workflows as state machines, which are defined using the JSON-based Amazon States Language (ASL). An ASL workflow definition consists of a map of all possible workflow states and the transitions between them. For example, a Step Functions state machine could specify a sequence of Lambda function executions and how to handle function execution errors (retry handling). You can find more ASL examples in the Step Functions Docs.
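To make the retry handling more concrete, here is a sketch of a single ASL 'Task' state with a 'Retry' block, written as a Python dictionary and printed as JSON. It is an illustration only and not taken from my app: the ARN is a placeholder, and the retry values are arbitrary.

```python
import json

task_state = {
    "Type": "Task",
    # Placeholder ARN; substitute the ARN of your own Lambda function.
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:speech2text",
    "Retry": [
        {
            "ErrorEquals": ["States.TaskFailed"],  # which errors to retry on
            "IntervalSeconds": 2,                  # wait before the first retry
            "MaxAttempts": 3,                      # give up after three retries
            "BackoffRate": 2.0,                    # double the wait each time
        }
    ],
    "Next": "TranslateToItalian",
}

print(json.dumps({"SpeechToTextItalian": task_state}, indent=4))
```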

A visualization of the ASL definition (state machine) I came up with for my translation app is shown below. It makes use of the 'Parallel' ASL state in order to process the operations for each target language in parallel. The 'Parallel' state defines a fixed number of branches that receive the same input but are executed in parallel. For scenarios where the number of parallel execution branches is not known in advance, Amazon recently added the 'Map' state, which I could have used to support a dynamic set of target languages.
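For comparison, a 'Map'-based variant might look roughly like the sketch below. I did not build or run this version: the state names, the 'TargetLanguages' input field, and the placeholder 'Resource' values are assumptions, and the functions would have to be adapted because each iteration receives one array item as its input. 'Iterator' is the field name the 'Map' state was introduced with, so check the current ASL documentation before relying on it.

```python
import json


def translation_branch():
    """The three-step sequence that runs once per target language."""
    return {
        "StartAt": "SpeechToText",
        "States": {
            "SpeechToText": {"Type": "Task", "Resource": "<speech2text ARN>",
                             "Next": "Translate"},
            "Translate": {"Type": "Task", "Resource": "<translate ARN>",
                          "Next": "TextToSpeech"},
            "TextToSpeech": {"Type": "Task", "Resource": "<tts ARN>",
                             "End": True},
        },
    }


state_machine = {
    "Comment": "Language translation with a dynamic set of target languages",
    "StartAt": "TranslateAll",
    "States": {
        "TranslateAll": {
            "Type": "Map",
            # Hypothetical input field holding one entry per target language,
            # e.g. [{"TargetLanguageCode": "it", "VoiceId": "Carla"}, ...]
            "ItemsPath": "$.TargetLanguages",
            "Iterator": translation_branch(),
            "End": True,
        }
    },
}

print(json.dumps(state_machine, indent=4))
```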

The ASL definition shown below only includes the first branch of my state machine. The branches for the other two target languages are very similar. In fact, each branch executes the exact same three AWS Lambda functions in the following sequence: 'speech2text', 'translate', and 'tts'. The only differences between the branches are the language code and voice ID parameters passed to the first function. I use the ASL 'Pass' state and 'ResultPath' construct to add these parameters to the function input.
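The effect of 'ResultPath' on a 'Pass' state can be imitated in a few lines of Python. The helper below is a simplified model that only handles plain '$.a.b' paths and is not how Step Functions is implemented; the input event is abridged and its values are placeholders.

```python
import copy


def apply_result_path(state_input, result, result_path):
    """Merge 'result' into a copy of 'state_input' at a simple '$.a.b' style path."""
    if result_path == "$":
        return copy.deepcopy(result)
    output = copy.deepcopy(state_input)
    keys = result_path[2:].split(".")  # drop the leading "$."
    node = output
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = copy.deepcopy(result)
    return output


# Abridged, made-up state machine input.
state_input = {"detail": {"requestParameters": {"key": "recording.mp3"}}}
result = {"TargetLanguageCode": "it", "VoiceId": "Carla"}

output = apply_result_path(state_input, result, "$.TargetLanguage")
print(output["TargetLanguage"])
```

The original input is kept intact and the language parameters are added next to it, which is why the first function of each branch can read both the object key and the 'TargetLanguage' entry.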

Step Functions Trigger

Step Functions, like individual Lambda functions, can be triggered by S3 operations. For my app, I decided that it makes sense for it to be triggered whenever a new voice recording is uploaded to a specific S3 bucket. I configured the Step Functions trigger by following the steps in this AWS developer guide.

I first created a new S3 bucket, then a trail in AWS CloudTrail (to receive the S3 events), and lastly a CloudWatch Events rule in which I specified that any 'PutObject' operation in my newly created S3 bucket should trigger the execution of my Step Functions state machine. The event that gets passed as input to my state machine contains, among other data, the key of the object that triggered the execution.
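The rule boils down to an event pattern that matches CloudTrail-recorded 'PutObject' calls on one bucket. The sketch below shows such a pattern as a Python dictionary, together with a helper that pulls the object key out of the resulting event. The bucket name, the helper name, and the abridged sample event are placeholders of my own.

```python
import json

# Event pattern for the CloudWatch Events rule (bucket name is a placeholder).
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["my-recordings-bucket"]},
    },
}


def object_key_from_event(event):
    """Return the key of the S3 object whose upload triggered the execution."""
    return event["detail"]["requestParameters"]["key"]


# Heavily abridged example of the event passed to the state machine.
sample_event = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "requestParameters": {
            "bucketName": "my-recordings-bucket",
            "key": "recording.mp3",
        },
    },
}

print(json.dumps(event_pattern, indent=4))
print(object_key_from_event(sample_event))
```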

speech2text Function

The 'speech2text' Lambda function reads the audio clip from the S3 object that triggered the Step Functions execution and subsequently sends it to IBM's Watson SpeechToText cloud service. I also experimented with Amazon's Transcribe service, but was disappointed by the long processing times, especially for short audio clips. Amazon Transcribe appears to be optimized for audio recordings that are 2 minutes or longer.

The IBM speech-to-text service (which has a free tier of up to 500 minutes per month) provides a REST API that I call from my Lambda function. The service uses the basic HTTP authentication scheme, in which the API key is sent in the HTTP 'Authorization' request header.
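HTTP libraries normally build this header for you, but it helps to see what goes over the wire. The following sketch constructs a Basic authentication header by hand, using the literal user name 'apikey' together with a made-up key; the helper name is my own.

```python
import base64


def basic_auth_header(username, password):
    """Build the 'Authorization' header used by HTTP Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# 'my-secret-key' is a placeholder, not a real API key.
headers = basic_auth_header("apikey", "my-secret-key")
print(headers)
```

Note that base64 is an encoding and not encryption, so the key is only protected in transit because the request is sent over HTTPS.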

translate Function

The 'translate' function shown below translates the transcripts to one of three target languages. In order to call Amazon's Translate service from my Lambda function I first had to add the correct permissions to the IAM role associated with my function (see below).
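As an illustration of what such a permission can look like, the sketch below builds a minimal IAM policy document that allows the 'translate:TranslateText' action. It is not necessarily the exact policy attached to my role, and you may prefer to scope it differently.

```python
import json

# Minimal IAM policy allowing calls to Amazon Translate's TranslateText API.
translate_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["translate:TranslateText"],
            "Resource": "*",
        }
    ],
}

print(json.dumps(translate_policy, indent=4))
```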

Aside from the source text, the translate API expects both a source language code ("en" in my case) and target language code.

tts Function

The 'tts' function shown below is invoked after the 'translate' function has finished. It sends the translation result to Amazon's text-to-speech service, 'Polly', and saves the returned mp3 audio data to an S3 bucket that I created for this purpose. Make sure not to write the translations to the same bucket as the source audio clips, or else your functions will be executed in a loop. Polly's API expects a voice ID, an output format (I used mp3), and a language code. You can find a list of voices and corresponding voice IDs here.

Closing

Serverless computing is an exciting new paradigm that promises to free software developers from the burden of having to manage servers and infrastructure. Yet at the same time it also completely changes the way software is built, deployed, and run. I hope this article has given you some insight into this new world of serverless computing. You can find the complete source code for my speech-to-speech translation app in my GitHub repo.