In the previous part we created a website where users can generate GIF animations using Emoji, a domain-specific language (DSL), and a Canvas. In this post we'll upgrade our animations to talkies!
I thought it'd be fun to create animations where Emoji can talk. I already had Emoji moving around and displaying phrases as text. Obviously, sound was missing. In this article I'll show you how I added it!
tl;dr: try this animation
⚠️ warning: contains sound!
By accident I stumbled upon the "Text To Speech In 3 Lines Of JavaScript" article (thanks, @asaoluelijah!) and those "3 lines" quickly migrated to my project.
const msg = new SpeechSynthesisUtterance();
msg.text = 'Hello World';
speechSynthesis.speak(msg);
// ☝️ You can run this in the console, BTW
Of course, the "3 lines" turned out to be 80. But I'll get to that later.
Text-to-Speech is the part of the browser's Web Speech API that reads text out loud (the API also covers speech recognition).
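Not every environment exposes speech synthesis, so a slightly more defensive take on those lines could look like the sketch below. The `sayIfSupported` name and the `scope` parameter are illustrative assumptions (passing the global object in just makes the function easy to try outside a browser); `lang`, `rate` and `pitch` are standard utterance properties.

```javascript
// Hypothetical helper: speaks `text` only if speech synthesis is available.
// `scope` defaults to the global object; in a browser that is `window`.
function sayIfSupported(text, scope = globalThis) {
  const supported =
    typeof scope.speechSynthesis === 'object' &&
    scope.speechSynthesis !== null &&
    typeof scope.SpeechSynthesisUtterance === 'function';
  if (!supported) return false; // nothing to speak with

  const utterance = new scope.SpeechSynthesisUtterance(text);
  utterance.lang = 'en-US'; // BCP 47 language tag
  utterance.rate = 1;       // speaking rate, 1 is the default
  utterance.pitch = 1;      // voice pitch, 1 is the default
  scope.speechSynthesis.speak(utterance); // add the utterance to the queue
  return true;
}
```

In a browser console, `sayIfSupported('Hello World')` should speak the phrase and return `true`; where the API is missing it returns `false` instead of throwing.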
But before we add Text-to-Speech to the animation, I need to show you how I rendered the animation in the first place.
Animation and RxJS
After parsing the DSL and rendering it to a canvas (see part I), I had an array of frames:
[ { image: 'http://.../0.png'
  , phrases: [ 'Hello!' ]
  , duration: 1000
  }
, { image: 'http://.../1.png'
  , phrases: [ 'Hi!' ]
  , duration: 1000
  }
]
Each frame had a rendered image, the phrases within it, and a frame duration.
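To make the shape of that data concrete, here is a small sketch that computes when each frame would start if the frames were simply played back to back. The `frameStartTimes` helper is hypothetical and not part of the project, and the image names are placeholders.

```javascript
// Hypothetical helper: returns the start offset (in ms) of every frame,
// assuming frames are shown one after another for `duration` ms each.
function frameStartTimes(frames) {
  let elapsed = 0;
  return frames.map((frame) => {
    const start = elapsed;
    elapsed += frame.duration;
    return start;
  });
}

const sampleFrames = [
  { image: '0.png', phrases: ['Hello!'], duration: 1000 },
  { image: '1.png', phrases: ['Hi!'], duration: 1000 },
];

// the first frame starts at 0 ms, the second at 1000 ms
console.log(frameStartTimes(sampleFrames));
```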
To show the animation I used a React component with an RxJS stream inside:
import React, { useState, useEffect } from 'react';
import { from, timer } from 'rxjs';
import { delayWhen, map } from 'rxjs/operators';

function Animation({ frames }) {
  // state for current frame
  const [frame, setFrame] = useState(null);

  useEffect(() => {
    // turn the array into a stream of frames
    const sub = from(frames).pipe(
      // with each frame delayed by frame.duration
      delayWhen(frame => timer(frame.duration)),
      // mapped to an Image
      map(frame => <img src={frame.image} />)
    )
    .subscribe(setFrame);

    return () => sub.unsubscribe(); // teardown logic
  }, [frames]);

  return frame;
}
Here I use a useEffect hook to create an RxJS Observable and a subscription to it. The from function iterates over the rendered frames array, delayWhen delays each frame by frame.duration, and map turns each frame into a new <img /> element. And I can easily loop the animation by simply adding a repeat() operator.
Note that the subscription has to be cancelled at some point (especially with the endless repeat()): the component might be destroyed or the frames might change. So the function passed to the useEffect hook needs to return a teardown callback. In this case I unsubscribe from the animation observable, effectively terminating the flow.
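The same play-then-tear-down idea can be sketched without RxJS or React, which may help to see what the subscription and its teardown are doing. This is an illustration only, not the component's code: `playFrames` and its parameters are hypothetical. Note that `delayWhen` counts each delay from the moment its value is emitted, so this sketch chains the timers instead to keep the frames strictly sequential: each frame is shown right away and held for `frame.duration`.

```javascript
// Hypothetical sketch: shows frames one after another and returns a
// teardown function, similar in spirit to unsubscribing from the stream.
// `schedule` and `unschedule` default to the standard timer functions.
function playFrames(frames, onFrame, schedule = setTimeout, unschedule = clearTimeout) {
  let index = 0;
  let timerId = null;
  let stopped = false;

  function showNext() {
    if (stopped || index >= frames.length) return;
    const frame = frames[index];
    index += 1;
    onFrame(frame); // display the frame right away
    timerId = schedule(showNext, frame.duration); // then hold it for its duration
  }

  showNext();

  // teardown: cancel the pending timer so no further frames are shown
  return function stop() {
    stopped = true;
    if (timerId !== null) unschedule(timerId);
  };
}

const stopDemo = playFrames(
  [{ image: '0.png', duration: 1000 }, { image: '1.png', duration: 1000 }],
  (frame) => console.log('showing', frame.image)
);
// later, e.g. when the component unmounts or the frames change:
stopDemo();
```

Returning the `stop` function plays the same role as the callback returned from useEffect: whoever started the animation is also handed the means to end it.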
With that covered, we can now discuss the Text-to-Speech!
Text-to-Speech and RxJS
Now I needed to pronounce the text using the Speech API, but the frame.duration delay I used wouldn't work: I had to wait until the phrase was spoken and only then switch to the next frame. Also, if the user edits the scenario or navigates away, I need to stop the current synthesis. Happily, RxJS is ideal for such things!
First I needed to create an Observable wrapper around the Speech Synthesis API:
import { Observable } from 'rxjs';

export function speak(text) {
  return new Observable((observer) => {
    // create and config utterance
    const utterance = new SpeechSynthesisUtterance();
    utterance.text = text;

    // subscribe our observer to utterance events
    utterance.onend = () => observer.complete();
    utterance.onerror = (err) => observer.error(err);

    // start the synthesis
    speechSynthesis.speak(utterance);

    // teardown: stop the synthesis on unsubscribe
    return () => {
      speechSynthesis.cancel();
    };
  });
}
When the utterance ends, the Observable completes, which lets us chain syntheses. Also, if we unsubscribe from the Observable, the synthesis is stopped.
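For comparison, the same wiring of the end, error and cancel signals can be sketched with a Promise instead of an Observable. This is only an illustration of the pattern: `speakOnce` and its `scope` parameter are hypothetical names, not part of the wrapper or the npm package.

```javascript
// Hypothetical Promise-based counterpart of the Observable wrapper.
// `scope` defaults to the global object (`window` in a browser).
function speakOnce(text, scope = globalThis) {
  const utterance = new scope.SpeechSynthesisUtterance(text);

  const done = new Promise((resolve, reject) => {
    utterance.onend = () => resolve();            // finished speaking
    utterance.onerror = (event) => reject(event); // synthesis failed
  });

  scope.speechSynthesis.speak(utterance); // add the utterance to the queue

  return {
    done,
    // note: cancel() clears the whole synthesis queue, not just this utterance
    cancel: () => scope.speechSynthesis.cancel(),
  };
}
```

Depending on the browser, a cancelled utterance may report an error event, so callers would likely want to handle a rejected `done` as well. The Observable version has a practical edge here: its teardown is built in, so cancellation happens automatically on unsubscribe.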
I've actually decided to publish this Observable wrapper as an npm package. There's a link in the footer 👇!
Now we can safely compose our phrases and be notified when they end:
concat(
  speak('Hello!'),
  speak('Hi!')
).subscribe({
  complete(){ console.log('done'); }
});
Try this code online at
And to integrate the Text-to-Speech back into our Animation component:
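// needs concat, merge, of, timer, EMPTY from 'rxjs'
// and concatMap, ignoreElements from 'rxjs/operators'
concatMap(frame => {
  // concat all phrases into a chain
  const phrases$ = concat(
    EMPTY,
    ...frame.phrases.map(text => speak(text))
  );

  // we'll wait for the phrases to end
  // even if the duration is shorter
  const duration$ = merge(
    phrases$,
    timer(frame.duration)
  );

  // to acknowledge the duration we need to merge it
  // while ignoring its values
  return merge(
    of(<img src={frame.image} />),
    duration$.pipe(ignoreElements())
  );
})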
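Because merge completes only after all of its inputs complete, a frame stays on screen for whichever is longer: its duration or the time needed to speak all of its phrases. A tiny sketch of that timing rule follows; the `frameHoldTime` helper is hypothetical and the speech times are made up, since real ones depend on the voice and the browser.

```javascript
// Hypothetical helper: how long a frame is held, given its duration and
// the time (in ms) each of its phrases takes to speak, one after another.
function frameHoldTime(durationMs, phraseTimesMs) {
  const speechMs = phraseTimesMs.reduce((sum, ms) => sum + ms, 0);
  return Math.max(durationMs, speechMs);
}

console.log(frameHoldTime(1000, [400]));      // 1000: the duration is longer
console.log(frameHoldTime(1000, [700, 600])); // 1300: speech outlasts the duration
```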
That's it! Now our Emoji can walk and talk!
Turn the volume up and try this "Dancing" animation
And be sure to try creating your own 🙂
It was pretty simple, huh?
But there was a hidden catch: previously the web app was hosted on GitHub Pages and users shared their animations as downloaded GIFs. But a GIF cannot contain sound, you know... so I needed another way for users to share animations.
In the next article I'll share details on how I migrated the create-react-app project to the Next.js/Vercel platform and added MongoDB to it.
Have a question or an idea? Please share your thoughts in the comments!
Thanks for reading this and see you next time!
Web Speech API
RxJS Text-to-Speech wrapper npm package
npm i rxjs-tts
My twitter (in case you want to follow 🙂)