Ottomatias Peura
New tool for building voice user interfaces

We’ve been hard at work at Speechly for the past few months building a developer tool for voice user interfaces, and as a result of that work, we’ve opened our new Speechly Dashboard in private beta. Best of all, you can join the beta, too!

Speechly Dashboard is our web tool for building Spoken Language Understanding (SLU) models, which in turn power voice user interfaces for any app or service. A model is configured by providing it with sample utterances annotated in our own syntax.

After the model is configured, it can be tested in the Speechly Playground and integrated into applications with our client libraries. You can even share the Playground with your friends or colleagues for feedback.

Speechly Playground is a web application that listens through the microphone: once the user grants the browser permission and starts speaking, it returns the user’s intent and the extracted entities, as defined by the sample utterances the model was configured with.

If this sounds complicated, it’s not. Let’s work through an example: we want to build a simple home automation app that turns lights on and off in different rooms. These actions are the user intents our model is interested in. The user has two kinds of intents, turning the lights on and turning them off. Let’s call them turn_on and turn_off.

The rooms where the lights can be controlled are modifiers for these intents. We call these modifiers entities; in other, similar tools they are sometimes called slots.

Now we need to think of the different ways a user might control lights with our app. It’s easy to come up with at least a few. Here’s a short list of example utterances:

  • Turn off the lights in kitchen
  • Switch the lights on in bedroom
  • Turn the bedroom lights on
  • Make the living room dark
  • Turn the kitchen lights on
  • Switch on the bedroom lamp

The more sample utterances we give our model, the better it works, so you can (and should!) come up with more. In a real-life application, you’d ideally collect these from your users, too.

Because Speechly uses deep neural networks for end-to-end speech-to-intent processing, it quickly learns to generalize and detect intents and entities correctly even for cases it has not been explicitly trained on. This means the developer does not need to build an exhaustive list of user “commands”, but rather examples that train a model that adapts to natural human speech. For users, this means they can communicate using their own words and expressions rather than having to learn and repeat preset commands.

Now we have to annotate the sample utterances. We have two intents, turn_on and turn_off, so we mark each utterance with its intent. In our syntax, it’s done like so:

  • *turn_off Turn off the lights in kitchen
  • *turn_on Switch the lights on in bedroom
  • *turn_on Turn the bedroom lights on
  • *turn_off Make the living room dark
  • *turn_on Turn the kitchen lights on
  • *turn_on Switch on the bedroom lamp

But now our model would return only an intent for each of these utterances, with no way to distinguish between different rooms.

Let’s use entities for this and annotate the utterances again.

  • *turn_off Turn off the lights in [kitchen](location)
  • *turn_on Switch the lights on in [bedroom](location)
  • *turn_on Turn the [bedroom](location) lights on
  • *turn_off Make the [living room](location) dark
  • *turn_on Turn the [kitchen](location) lights on
  • *turn_on Switch on the [bedroom](location) lamp

Now our model would know, for example, that for the first utterance the user’s intent is to turn off the lights, and that the room where they want to turn them off is the kitchen. We could make this even smarter with our advanced SLU rules, but that’s a more advanced topic you can learn more about in our documentation.
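To make the annotation syntax concrete, here is a minimal Python sketch (not Speechly’s actual implementation, just an illustration) that splits an annotated sample utterance into its intent, its entities, and the plain transcript:

```python
import re

def parse_annotation(utterance: str):
    """Split an annotated sample utterance into (intent, entities, text)."""
    # The leading *intent token names the intent.
    intent, _, rest = utterance.partition(" ")
    intent = intent.lstrip("*")
    # [value](entity_name) marks an entity, markdown-link style.
    entities = {name: value
                for value, name in re.findall(r"\[([^\]]+)\]\(([^)]+)\)", rest)}
    # Strip the markup to recover the plain transcript.
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", rest)
    return intent, entities, text

intent, entities, text = parse_annotation(
    "*turn_off Turn off the lights in [kitchen](location)"
)
# intent   -> "turn_off"
# entities -> {"location": "kitchen"}
# text     -> "Turn off the lights in kitchen"
```

Reading the annotations this way also shows what the trained model is asked to produce for raw speech: the intent plus the entity values, alongside the transcript.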

Now that we have configured our model, it can be tested in the Speechly Playground. The Playground returns the user intents and entities in real time along with the text transcript of what was said.

When the model works as expected, it can be integrated into a website, iPhone or Android app or any other service by using our client libraries.
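What your application does with the returned intent and entities is up to you. As a sketch (the function and result shape below are hypothetical, not part of our client libraries), a handler for the home automation example might look like this:

```python
def handle_result(intent: str, entities: dict, lights: dict) -> dict:
    """Apply a recognized intent to a simple per-room light state.

    `intent` and `entities` stand in for whatever result type your
    client library delivers; `lights` maps room name -> on/off.
    """
    room = entities.get("location")
    if room is not None and intent in ("turn_on", "turn_off"):
        lights[room] = (intent == "turn_on")
    return lights

lights = {"kitchen": False, "bedroom": False}
handle_result("turn_on", {"location": "bedroom"}, lights)
# lights -> {"kitchen": False, "bedroom": True}
```

The key point is that the app only ever deals with structured intents and entities, never with raw audio or free-form text.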

If you are interested in getting access to our private beta, sign up for our waiting list with the form on our front page. You can also email us to tell us more about what you are trying to achieve, and we’ll help you move forward.

Top comments (1)

Eduard Rastoropov

Why doesn't Apple enable a more powerful integration of Siri for the apps?