Nader Dabit

Posted on Sep 2, 2019

How to Build an AI Enabled Natural Language Synthesization Chrome Extension

#javascript #chromeextension #awsamplify

Transilator is a Chrome extension that translates and then synthesizes text on your screen to natural sounding speech. In this tutorial, I'll show you how I built it.

To view the code, click here

Here's a demo of the extension:

This is part 1 of a 3-part series on adding machine learning and AI capabilities to your app using AWS Amplify Predictions.

Part 1 - Building Transilator: Language detection of text, text translation, and natural speech synthesization.

Part 2 - Image entity recognition - Building a field guide for exploring nature.

Part 3 - Text recognition from images - Turning conference badges into contacts.

About this extension

Transilator lets you highlight text on your screen and have it read back in the language of your choice.

Features

Lifelike speech that is pleasant to listen to
Output languages supported: Arabic, English, Chinese, Dutch, Spanish, Portuguese, Danish, Hindi, Italian, Japanese, Korean, Norwegian, Polish, Russian, Swedish, Turkish
Input languages supported: Dutch, Portuguese, English, Italian, French, Spanish

Use cases

Learning a new language / how words should be pronounced
Listening to a news article, documentation, or blog post
Users with vision problems / accessibility related use cases
Listening to emails
Listening to translated content from other languages to your language
Reviewing a blog post / tweet before publishing it
General multitasking (working on some things while listening to others)

Getting started

There are two main parts to this tutorial:

Creating the Amplify project and creating the ML and AI services
Building the Chrome extension and connecting to the ML and AI services created in step 1

Part 1 - Creating the ML & AI services with Amplify

AWS Amplify is a framework for building cloud-enabled applications that includes a CLI (for creating and managing services), a client library (for connecting to the APIs created by the CLI), a UI library (for making things like authentication simpler), and a hosting platform with CI & CD.

In this tutorial we will use the CLI to create the services and the Amplify client library to interact with those APIs.

Creating the project.

We want to use modular and modern JavaScript to build our extension, so we need a bundler such as Webpack. A suitable starter project already exists: a Chrome extension boilerplate that uses Webpack (click here to view it).

Clone this boilerplate and then change into the new directory:

git clone git@github.com:samuelsimoes/chrome-extension-webpack-boilerplate.git

cd chrome-extension-webpack-boilerplate

If you do not already have the Amplify CLI installed and configured, click here to see a video walkthrough of how to do this.

Next, initialize a new Amplify project:

$ amplify init

Next, we'll add the services that we'll need using the predictions category.

Text interpretation

We'll start by adding text interpretation:

$ amplify add predictions

? Please select from of the below mentioned categories:
❯ Interpret

? What would you like to interpret?
❯ Interpret Text

? Provide a friendly name for your resource: (interpretText<XXXX>)

? What kind of interpretation would you like?
❯ All

? Who should have access?
❯ Auth and Guest users

Text translation

Next, we'll add text translation:

$ amplify add predictions

? Please select from of the below mentioned categories:
❯ Convert

? What would you like to convert?
❯ Translate text into a different language

? Provide a friendly name for your resource: (translateText<XXXX>)

? What is the source language?
❯ Choose any language, we will change this dynamically later in our app

? What is the target language?
❯ Choose any language, we will change this dynamically later in our app

? Who should have access?
❯ Auth and Guest users

Speech synthesization

Next, we want to add a way to take the text translation and synthesize speech.

$ amplify add predictions

? Please select from of the below mentioned categories:
❯ Convert

? What would you like to convert?
❯ Generate speech audio from text

? Provide a friendly name for your resource (speechGenerator<XXXX>)

? What is the source language?
❯ Choose any language, we will change this dynamically later in our app

? Select a speaker
❯ Choose any speaker, we will change this dynamically later in our app

? Who should have access?
❯ Auth and Guest users

Now, we have all of the API configurations created and we can create the services by running the Amplify push command:

amplify push

Now the services have been deployed and we can continue creating the Chrome extension!

Part 2 - Building the extension.

Chrome extension overview

Chrome extensions are made up of a few main files, described below. You can read more about the basics of Chrome extensions here; the file definitions below are copied from this post.

manifest.json - This file bootstraps your extension and provides meta data like versioning. Without this, you have no extension.

background scripts (background.js) - The heart and soul of your extension. This is where you create a listener to actually trigger the popup when users click your icon. All “hard” business logic and native browser interaction should go in here as much as possible.

content scripts (content.js) - Content scripts can be injected into the tabs in the browser and access the DOM in the context of a browser session. This is where you can add new DOM elements, add extra listeners, etc. Content scripts are optional.

popup UI (popup.js & popup.html) - The little app you see when clicking/activating an extension. Can be built with any framework like React or Vue or just vanilla JS. We are using vanilla JS.

In this extension, I am using the popup UI and content scripts to control most of the behavior.

In popup.js, there is logic that allows the user to choose the language they would like to translate their text to. In content.js, there is a listener that listens to events that happen in popup.js so we can send messages back and forth between the two. When the user chooses a language, we call the following method in popup.js:

// popup.js
chrome.tabs.query({active: true, currentWindow: true}, function(tabs) {
  chrome.tabs.sendMessage(tabs[0].id, {language}, function(response) {
    console.log('response: ', response)
  });
});

Then, in content.js, we can receive that message and update the local state by attaching a listener for the current page:

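// content.js
chrome.runtime.onMessage.addListener(
  function(request, sender, sendResponse) {
    if (!sender) return
    state.setLanguage(request.language)
    // Reply right away so the callback in popup.js receives a response
    sendResponse({ language: request.language })
})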

These two functions control the data flow between the Chrome extension UI and the code running on the page in the user's browser.
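The popup call above assumes that an active tab exists and that a content script is listening in it. Here is a minimal sketch of a more defensive version. The name `sendLanguageToActiveTab`, the `onDone` callback, and the `chromeApi` parameter are hypothetical and not part of the extension's code; the `chrome` object is passed in only so the function can be exercised outside the browser.

```javascript
// Sends the chosen language to the content script in the active tab.
// `chromeApi` stands in for the global `chrome` object (an assumption made
// for testability); in popup.js you would pass `chrome` itself.
function sendLanguageToActiveTab(chromeApi, language, onDone) {
  chromeApi.tabs.query({ active: true, currentWindow: true }, function (tabs) {
    // No active tab was returned, so there is nothing to message.
    if (!tabs || tabs.length === 0) {
      onDone(new Error('No active tab found'))
      return
    }
    chromeApi.tabs.sendMessage(tabs[0].id, { language }, function (response) {
      // chrome.runtime.lastError is set when the message could not be
      // delivered, for example when no content script is listening in the tab.
      const lastError = chromeApi.runtime.lastError
      if (lastError) {
        onDone(new Error(lastError.message))
        return
      }
      onDone(null, response)
    })
  })
}
```

In popup.js this could be called as `sendLanguageToActiveTab(chrome, language, (err, response) => { ... })`, logging the error instead of failing silently on pages where the content script is not running.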

Building it out

The next thing we need to do to continue is install the Amplify library:

npm install aws-amplify

Next, we need to add the content script. This boilerplate by default doesn't have this, so we will add it manually.

touch src/js/content.js

Now, update the manifest.json and add the following to enable the new content script and allow the content script to work on the currently active tab:

"permissions": ["activeTab"],
"content_scripts": [{
    "matches": ["*://*/*"],
    "js": ["content.bundle.js"],
    "run_at": "document_end"
  }],

Next, we need to update the webpack config to also process the content.js script:

entry: {
  popup: path.join(__dirname, "src", "js", "popup.js"),
  options: path.join(__dirname, "src", "js", "options.js"),
  background: path.join(__dirname, "src", "js", "background.js"),
  content: path.join(__dirname, "src", "js", "content.js")
},
chromeExtensionBoilerplate: {
  notHotReload: ["content"]
},

Here we exclude the content script from hot reloading and add the new entrypoint to the entry configuration.

popup.js

In popup.js we set up an event listener for clicks in the popup. When the user clicks on a language, we then send a message to the content script with an object that contains the chosen language. We also have a function that adds a new class to the button to darken the background and let the user know it is selected.

import "../css/popup.css";

window.addEventListener('DOMContentLoaded', () => {
  var buttons = document.getElementsByClassName("lang-button");
  Array.from(buttons).forEach(function(button) {
    button.addEventListener('click', function(item) {
      Array.from(buttons).forEach(item => item.classList.remove("button-selected"))
      item.target.classList.add("button-selected")
      const language = item.target.dataset.id
      chrome.tabs.query({active: true, currentWindow: true}, function(tabs) {
        chrome.tabs.sendMessage(tabs[0].id, {language}, function(response) {
          console.log('response: ', response)
        });
      });
    });
  });
});

content.js

Content.js is where most of our code lives. Here, there is an event listener to listen for a mouseup event and three main functions that run if any text is selected:

interpretFromPredictions - This function interprets the language of the selected text:

function interpretFromPredictions(textToInterpret) {
  Predictions.interpret({
    text: {
      source: {
        text: textToInterpret,
      },
      type: "ALL"
    }
  }).then(result => {
    const language = result.textInterpretation.language
    const translationLanguage = state.getLanguage()
    translate(textToInterpret, language, translationLanguage)
  })
  .catch(err => {
    console.log('error: ', err)
  })
}

translate - This function translates the highlighted text into the language chosen by the user.

function translate(textToTranslate, language, targetLanguage) {
  Predictions.convert({
    translateText: {
      source: {
        text: textToTranslate,
        language
      },
      targetLanguage
    }
  }).then(result => {
    generateTextToSpeech(targetLanguage, result.text)
  })
    .catch(err => {
      console.log('error translating: ', err)
    })
}

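generateTextToSpeech - Once the translation is complete, the last step is to synthesize it into natural speech. The source variable used here is declared outside the function (let source) so that a clip that is still playing can be disconnected before the next one starts.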

function generateTextToSpeech(language, textToGenerateSpeech) {
  const voice = voices[language]
  Predictions.convert({
    textToSpeech: {
      source: {
        text: textToGenerateSpeech,
      },
      voiceId: voice
    }
  }).then(result => {
    // Fall back to the prefixed constructor where the standard one is unavailable
    const AudioContext = window.AudioContext || window.webkitAudioContext;
    const audioCtx = new AudioContext();
    // Disconnect any clip that is still playing so two clips do not overlap
    if (source) {
      source.disconnect()
    }
    source = audioCtx.createBufferSource();
    audioCtx.decodeAudioData(result.audioStream, (buffer) => {
      source.buffer = buffer;
      source.playbackRate.value = 1
      source.connect(audioCtx.destination);
      source.start(0);
    }, (err) => console.log({err}));
  })
    .catch(err => {
      console.log('error synthesizing speech: ', err)
    })
}
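The three functions above chain into each other through promise callbacks. As an alternative way to read the same flow, here is a sketch that writes it as a single async function. It is not the code the extension uses: `translateAndSynthesize` is a hypothetical name, and the `predictions` and `voices` parameters are assumptions that stand in for the Predictions client and the voice table, so the flow can be tried with a stub. It returns the audio data rather than playing it.

```javascript
// Detects the language of `text`, translates it, and requests speech audio.
// `predictions` is any object exposing interpret() and convert() with the
// same call shapes used in the functions above; `voices` maps language codes
// to voice IDs. Both parameters are assumptions made for this sketch.
async function translateAndSynthesize(predictions, voices, text, targetLanguage) {
  // Step 1: detect the language of the highlighted text
  const interpreted = await predictions.interpret({
    text: { source: { text }, type: 'ALL' }
  })
  const sourceLanguage = interpreted.textInterpretation.language

  // Step 2: translate from the detected language into the chosen one
  const translated = await predictions.convert({
    translateText: {
      source: { text, language: sourceLanguage },
      targetLanguage
    }
  })

  // Step 3: synthesize the translated text with the voice for that language
  const speech = await predictions.convert({
    textToSpeech: {
      source: { text: translated.text },
      voiceId: voices[targetLanguage]
    }
  })

  return {
    sourceLanguage,
    translatedText: translated.text,
    audioStream: speech.audioStream
  }
}
```

With this shape, a single try/catch around the call can handle a failure in any of the three steps, instead of one catch handler per function.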

The service used for the voice synthesization is Amazon Polly. Amazon Polly has different voices for the languages that are translated (see the list here).

In the generateTextToSpeech function we use the language to determine the voice:

// Voice data
const voices = {
  ar: "Zeina",
  zh: "Zhiyu",
  da: "Naja",
  nl: "Lotte",
  en: "Salli",
  ...
}

// Get proper voice in the function:
const voice = voices[language]
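A plain lookup like this returns undefined when the language code has no entry in the table. Here is a small sketch of one way to guard against that. The `pickVoice` helper and the `voiceTable` name are hypothetical, and the table below contains only the five entries shown above, not the full list.

```javascript
// Only the entries shown in the post; the real table has more languages.
const voiceTable = {
  ar: 'Zeina',
  zh: 'Zhiyu',
  da: 'Naja',
  nl: 'Lotte',
  en: 'Salli'
}

// Returns the voice ID for `language`, or null when the table has no entry,
// so the caller can skip synthesis instead of sending an undefined voiceId.
function pickVoice(table, language) {
  return Object.prototype.hasOwnProperty.call(table, language)
    ? table[language]
    : null
}
```

A caller could then check `if (!voice) return` before calling Predictions.convert.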

To set and update the language chosen by the user we have a simple state object:

const state = {
  language: 'en',
  getLanguage: function() {
    return this.language
  },
  setLanguage: function(language) {
    this.language = language
  }
}
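The mouseup listener mentioned at the start of this section is not shown in the snippets above. Here is a minimal sketch of how it could look, assuming the selection is read with window.getSelection(). The helper names `getSelectedText` and `listenForSelection` are hypothetical, and the document and window objects are passed in as parameters only so the logic can be run outside a browser.

```javascript
// Returns the trimmed text currently highlighted in the page, or '' if none.
function getSelectedText(win) {
  const selection = win.getSelection ? win.getSelection() : null
  return selection ? selection.toString().trim() : ''
}

// Calls `onText` with the highlighted text after each mouseup event,
// skipping clicks that leave nothing (or only whitespace) selected.
function listenForSelection(doc, win, onText) {
  doc.addEventListener('mouseup', () => {
    const text = getSelectedText(win)
    if (text) onText(text)
  })
}

// In content.js this could be wired up roughly as:
// listenForSelection(document, window, interpretFromPredictions)
```

Skipping empty selections avoids calling the prediction services on every ordinary click.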

To view all of the code for content.js, click here.

Finally in popup.html we render the buttons to choose the different languages.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title></title>
</head>
<body>
  <p class="heading">Choose Language</p>
  <div class="list">
    <h4 class='lang-button' data-id="en">English</h4>
    <h4 class='lang-button' data-id="es">Spanish</h4>
    <h4 class='lang-button' data-id="pt">Portuguese</h4>
    <h4 class='lang-button' data-id="zh">Chinese</h4>
    <h4 class='lang-button' data-id="ar">Arabic</h4>
    <h4 class='lang-button' data-id="da">Danish</h4>
    <h4 class='lang-button' data-id="nl">Dutch</h4>
    <h4 class='lang-button' data-id="hi">Hindi</h4>
    <h4 class='lang-button' data-id="it">Italian</h4>
    <h4 class='lang-button' data-id="ja">Japanese</h4>
    <h4 class='lang-button' data-id="ko">Korean</h4>
    <h4 class='lang-button' data-id="no">Norwegian</h4>
    <h4 class='lang-button' data-id="pl">Polish</h4>
    <h4 class='lang-button' data-id="ru">Russian</h4>
    <h4 class='lang-button' data-id="sv">Swedish</h4>
    <h4 class='lang-button' data-id="tr">Turkish</h4>
  </div>
</body>
</html>

Next, use the CSS in popup.css, or write your own styling for the popup menu there.

Building and deploying the extension

Now the extension is completed and we can try it out.

To run webpack and build the extension, run the following command:

npm run build

Now you will see that the build folder is populated with the extension code that has been bundled by webpack.

To upload and use the extension:

Visit chrome://extensions (menu -> settings -> Extensions).
Enable Developer mode by ticking the checkbox in the upper-right corner.
Click on the "Load unpacked extension..." button.
Select the directory containing your unpacked extension.

To view the completed codebase, click here

My name is Nader Dabit. I am a Developer Advocate at Amazon Web Services working with projects like AWS AppSync and AWS Amplify. I specialize in cross-platform & cloud-enabled application development.
