<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: VBAK</title>
    <description>The latest articles on DEV Community by VBAK (@vbaknation).</description>
    <link>https://dev.to/vbaknation</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F195039%2Ff83dee11-cdcb-4404-85be-06b3d3afb900.png</url>
      <title>DEV Community: VBAK</title>
      <link>https://dev.to/vbaknation</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vbaknation"/>
    <language>en</language>
    <item>
      <title>Converting a Prototype to React Components</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Fri, 04 Oct 2019 21:25:31 +0000</pubDate>
      <link>https://dev.to/vbaknation/converting-a-prototype-to-react-components-15f7</link>
      <guid>https://dev.to/vbaknation/converting-a-prototype-to-react-components-15f7</guid>
      <description>&lt;h2&gt;
  
  
  Newbie Nightmare: Open-Ended Tasks
&lt;/h2&gt;

&lt;p&gt;It's part of what makes web development exciting and what drew me to it, but one of the most challenging tasks I have encountered in my first year-ish of learning web development is translating visual designs into code. It's a very open-ended task with many opportunities to make choices that lead to ugly, if not outright wrong, results later on (the accumulation of which I believe is called &lt;em&gt;technical debt&lt;/em&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  Newbie Dream: Curated Best Practices
&lt;/h2&gt;

&lt;p&gt;I have deferred to and relied on others' best practices when approaching such open-ended tasks. However, those best practices &lt;em&gt;probably&lt;/em&gt;, or rather &lt;em&gt;hopefully&lt;/em&gt;, came from many iterations of experience, and following them without sharing that intuition requires a large amount of faith. Applying someone else's best practices to whatever app I'm building also requires a good amount of luck (hopefully my app doesn't have that one feature or requirement that ends up being the kryptonite of a best practice I follow). Lastly, vetting someone's best practice as a newbie is nearly impossible.&lt;/p&gt;

&lt;h2&gt;
  
  
  Newbie Reality: Reinvent the Wheel
&lt;/h2&gt;

&lt;p&gt;While I want to be efficient and resourceful, I also need to build a deeper intuition for tasks related to converting a prototype to UI logic. My favorite way to do that is to approach an open-ended task with one outcome in mind: empirical learning. I'll be writing this post in (approximately) real time while I work on and learn about the project &lt;em&gt;at the same time&lt;/em&gt; &lt;a href="https://www.youtube.com/results?search_query=if+your+mother+only+knew+rahzel+at+the+same+time" rel="noopener noreferrer"&gt;a la Rahzel&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Plug
&lt;/h2&gt;

&lt;p&gt;I manage the repo for the OSS project that I'll be talking about in this post. As you will see throughout this post, we need a lot of help building this app, so if you are interested in contributing, please take a look at our repo at the following link:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vishalbakshi/CallForCode" rel="noopener noreferrer"&gt;Wildfire Survivor Management System (Link to GitHub Repository)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We are building two apps for the staff at the United Way of Northern California to help them manage California wildfire survivor data. This is a project that started out &lt;a href="https://www.42.us.org/42-silicon-valley-hosts-2nd-annual-ibm-call-for-code-global-challenge-hackathon/" rel="noopener noreferrer"&gt;initially as an IBM Call for Code one-day hackathon event (link)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Our hackathon team's UX designer created the Sketch files located &lt;a href="https://sketch.cloud/s/8Az7w/a/5eYv2P" rel="noopener noreferrer"&gt;at this link&lt;/a&gt; and the first non-hackathon-team-member contributor converted them to an interactive prototype for each app:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wildfire Survivor Application &lt;a href="https://invis.io/5VTM181XKRW" rel="noopener noreferrer"&gt;(Link to Invision Prototype)&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;Users (wildfire survivors) fill out this HTML form-element-based application in order to submit the necessary information for United Way staff members to evaluate and distribute financial assistance. We have a development version which successfully runs locally with minimal functionality (users can enter and preview data in the form elements), but a lot of essential functionality is still missing before users can safely and conveniently use this app (authentication, file storage, CRUD methods, data encryption, and things we haven't even thought of...)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Wildfire Survivor Dashboard: &lt;a href="https://invis.io/CRTM199K49Y" rel="noopener noreferrer"&gt;(Link to Invision Prototype)&lt;/a&gt;

&lt;ul&gt;
&lt;li&gt;This is what I'll be tackling in this post!&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Data, Display, Events
&lt;/h2&gt;

&lt;p&gt;Of the many ways to break down this problem, the following three questions are fundamental:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;What data will be displayed to the user?&lt;/li&gt;
&lt;li&gt;How will it be displayed?&lt;/li&gt;
&lt;li&gt;What events will take place?&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Data
&lt;/h3&gt;

&lt;p&gt;Since we are making both the app where users submit their data and the app where admins manage it, we have some flexibility in choosing how the data is structured. For now, I'll keep it simple and continue to use the very linear structure I gave to the data in the Wildfire Survivor Application:&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;code&gt;schema.js&lt;/code&gt; &lt;a href="https://github.com/vishalbakshi/CallForCode/blob/master/src/constants/schema.js" rel="noopener noreferrer"&gt;(Link to GitHub repo)&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;This file exports an object (named &lt;code&gt;SCHEMA&lt;/code&gt;) which contains data about each field that will receive some input from the user, inspired by &lt;a href="https://docs.mongodb.com/manual/reference/operator/query/jsonSchema/" rel="noopener noreferrer"&gt;MongoDB's &lt;code&gt;$jsonSchema&lt;/code&gt; object (link to their awesome docs)&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;survivor_first_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;initial_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;placeholder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;First Name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;test_value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;test_values&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;first_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;validation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h4&gt;
  
  
  &lt;code&gt;component_fields.js&lt;/code&gt; &lt;a href="https://github.com/vishalbakshi/CallForCode/blob/master/src/constants/component_fields.js" rel="noopener noreferrer"&gt;(Link to GitHub repo)&lt;/a&gt;
&lt;/h4&gt;

&lt;p&gt;This file exports an object (named &lt;code&gt;FIELDS&lt;/code&gt;) which lists the field names for each fieldset. (These fieldsets were determined from conversations with the end-users---the staff members who will be managing this information.) Right now I'm assuming they are going to turn into separate React components, so I've kept the name as "component_fields". However, this is a fundamental structure I'm giving the data primarily to keep it simple (for now), so it may change over time as the project improves.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;FIELDS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;general_information&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_first_name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_middle_name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_last_name&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_phone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_email&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_address1&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_address2&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_city&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_state&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;survivor_zip&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;

  &lt;span class="p"&gt;...,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The code samples shown represent the following section of the Sketch file which corresponds to the &lt;code&gt;general_information&lt;/code&gt; fieldset:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fovbenaj5czf6oa9xc207.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fovbenaj5czf6oa9xc207.png" alt="Sketch file screenshot showing General Information fieldset UI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The goal is to allow us to add and remove fields from different fieldsets over time as we gather more feedback from our end-users. &lt;/p&gt;
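&lt;p&gt;As a rough sketch of that idea (using trimmed-down stand-ins for the &lt;code&gt;SCHEMA&lt;/code&gt; and &lt;code&gt;FIELDS&lt;/code&gt; shapes shown above, not the repo's actual code), a fieldset's inputs can be generated by looking up each of its field names in &lt;code&gt;SCHEMA&lt;/code&gt;, so adding or removing a field is a one-line change to &lt;code&gt;FIELDS&lt;/code&gt;:&lt;/p&gt;

```javascript
// Sketch: derive a fieldset's input descriptors from FIELDS + SCHEMA.
// (Trimmed-down stand-ins for the real objects in the repo.)
const SCHEMA = {
  survivor_first_name: { input: "text", type: "string", placeholder: "First Name" },
  survivor_last_name: { input: "text", type: "string", placeholder: "Last Name" }
};

const FIELDS = {
  general_information: ["survivor_first_name", "survivor_last_name"]
};

// Each fieldset component can map over this result to render its inputs.
function buildFieldset(fieldsetName) {
  return FIELDS[fieldsetName].map((name) => ({ name, ...SCHEMA[name] }));
}
```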

&lt;h3&gt;
  
  
  Display
&lt;/h3&gt;

&lt;p&gt;The Dashboard consists of four main views. Here are my initial thoughts on the views' relationship with the various fields:&lt;/p&gt;




&lt;h4&gt;
  
  
  Dashboard
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Applications are grouped first by &lt;code&gt;fire_name&lt;/code&gt; in a scrolling navigation element and then by status (which is currently not included in either &lt;code&gt;schema.js&lt;/code&gt; or &lt;code&gt;component_fields.js&lt;/code&gt;) in two separate containers beneath it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0vy9w5lnx5qxfrjp4ytm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F0vy9w5lnx5qxfrjp4ytm.png" alt="Dashboard Page"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Analytics
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The financial assistance dollar amount visualizations will be displayed by wildfire and over time&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F6qedglzrp2dxhxyvc02c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F6qedglzrp2dxhxyvc02c.png" alt="Analytics Page"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Applications
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Each wildfire has its own screen displaying a list of all applications that were submitted to receive financial assistance, grouped by status in different tabs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fanbpgrhthyyz7cwj20q3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fanbpgrhthyyz7cwj20q3.png" alt="Applications for Camp Fire"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;A single application is displayed as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The main container displays the application data in the same fieldsets that are used in the Survivor Application (i.e. as grouped in &lt;code&gt;component_fields.js&lt;/code&gt;) across different tabs&lt;/li&gt;
&lt;li&gt;A side panel includes options for the application's status&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F53dbu64o3wyruu8f39yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F53dbu64o3wyruu8f39yb.png" alt="Single Application Page"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h4&gt;
  
  
  Map
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;The Map view displays an embed of CALFIRE's Camp Fire Structure Status &lt;a href="https://arcg.is/1KePmT" rel="noopener noreferrer"&gt;(link)&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fe5164a1o4y15l39ch5qi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fe5164a1o4y15l39ch5qi.png" alt="California Wildfire Map"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h3&gt;
  
  
  Events
&lt;/h3&gt;

&lt;p&gt;There are two broad types of events that the Survivor Dashboard components will need to handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Changes to data from a Survivor Application&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changes to Admin-only fields (application status, financial assistance, status update notifications, messaging, etc)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
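&lt;p&gt;As a sketch of the second type (the status values and function name here are hypothetical placeholders, not the app's real API), an admin-only event like a status change can boil down to producing an updated copy of the application record:&lt;/p&gt;

```javascript
// Hypothetical sketch of an admin-only event handler: validate the new
// status, then return a new application object instead of mutating the
// original (which keeps React state updates predictable).
const ALLOWED_STATUSES = ["submitted", "in_review", "approved", "denied"];

function setApplicationStatus(application, newStatus) {
  if (!ALLOWED_STATUSES.includes(newStatus)) {
    throw new Error("Unknown status: " + newStatus);
  }
  return { ...application, status: newStatus };
}
```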




&lt;h2&gt;
  
  
  Version 0
&lt;/h2&gt;

&lt;p&gt;Okay, talking through that has helped me mentally organize the different screens and start seeing some patterns across them. Time to jump into a codepen!&lt;/p&gt;

&lt;h3&gt;
  
  
  Survivor Application Data
&lt;/h3&gt;

&lt;p&gt;I've created some data for an application to use for this initial version. The file lives in the repo &lt;a href="https://github.com/vishalbakshi/CallForCode/blob/survivor-dashboard/data/fake_data6.json" rel="noopener noreferrer"&gt;at this link&lt;/a&gt; and I used &lt;a href="https://www.jsdelivr.com/" rel="noopener noreferrer"&gt;jsdelivr&lt;/a&gt; to deliver it to my pen. I am avoiding any Firebase functions until I've wrapped my head around the UI.&lt;/p&gt;

&lt;p&gt;I'll start with how the Survivor Dashboard displays a single survivor's application. This screen shows different fields based on different tabs that are selected.&lt;/p&gt;
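&lt;p&gt;The tab idea maps neatly onto the &lt;code&gt;FIELDS&lt;/code&gt; object: each tab can correspond to a fieldset key, and the selected tab decides which field names get rendered. A minimal, framework-agnostic sketch (the fieldset names here are placeholders):&lt;/p&gt;

```javascript
// Sketch of the tab state: track the selected fieldset key and expose
// the field names the active tab should show.
const FIELDS = {
  general_information: ["survivor_first_name", "survivor_last_name"],
  household_information: ["household_size", "household_income"]
};

function createTabState(initialTab) {
  let selected = initialTab;
  return {
    select(tab) {
      // Ignore unknown tab names instead of blanking the view.
      if (FIELDS[tab]) selected = tab;
    },
    get selected() { return selected; },
    get visibleFields() { return FIELDS[selected] || []; }
  };
}
```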

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F53dbu64o3wyruu8f39yb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2F53dbu64o3wyruu8f39yb.png" alt="Single Application Page"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Here's the pen! Please click through and let me know if you have any feedback! (I am particularly proud of the way I wrote the logic around the UI for the "Notes" section.)&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/JjPWxRR?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;




&lt;h4&gt;
  
  
  Bonus Learnings
&lt;/h4&gt;

&lt;p&gt;A few bonus learnings (i.e. things I thought I knew until I spent a couple of hours debugging my misunderstanding for each one):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The C in CDN stands for Content but it could also stand for Cached.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I have been using the super cool jsdelivr for getting a &lt;code&gt;.json&lt;/code&gt; file with fake data from this project's GitHub repo. However, I made some commits with changes to that file (adding a few key-value pairs), yet my &lt;code&gt;fetch&lt;/code&gt; was not fetching the latest file! I came across &lt;a href="https://github.com/jsdelivr/jsdelivr/issues/18063" rel="noopener noreferrer"&gt;this issue on the jsdelivr GitHub repo&lt;/a&gt;, where one of the comments explains that CDN files are cached and may take up to a day to refresh. My workaround was changing the file name in my repo, which changes the URL and thus counts as a new file.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Another cache-related problem I encountered involved the "Cache-Control" request header.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At some point, I had waited long enough for the CDN to update its cache (which I realized after the fact), but my browser cache was still being referenced. &lt;/li&gt;
&lt;li&gt;On a side note, I look forward to referencing these sorts of concepts in Julia Evans' &lt;a href="https://jvns.ca/blog/2019/09/12/new-zine-on-http/" rel="noopener noreferrer"&gt;HTTP zine&lt;/a&gt;; she teaches in a style that is very effective for me: the visual comic.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;


&lt;blockquote&gt;
&lt;p&gt;getting closer to a final table of contents for the new HTTP zine &lt;a href="https://t.co/fYsGiKQ3Pi" rel="noopener noreferrer"&gt;pic.twitter.com/fYsGiKQ3Pi&lt;/a&gt;&lt;/p&gt;— 🔎Julia Evans🔍 (@b0rk) &lt;a href="https://twitter.com/b0rk/status/1166425814566625280?ref_src=twsrc%5Etfw" rel="noopener noreferrer"&gt;August 27, 2019&lt;/a&gt;
&lt;/blockquote&gt; 

&lt;ul&gt;
&lt;li&gt;I will purchase that zine eventually! For now, I've referenced MDN and added the following &lt;code&gt;init&lt;/code&gt; object to my &lt;code&gt;fetch&lt;/code&gt; call to ignore the browser cache:
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;
&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://cdn.jsdelivr.net/...&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;cache&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;no-cache&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* ... */&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;CORS

&lt;ul&gt;
&lt;li&gt;CORS is a frustrating concept for a newbie to learn and use: big on safety, and big on headaches because of it. I ran into a CORS error when requesting a resource from Firebase Storage, and the resolution involved configuring CORS for the Google Cloud Platform project. Let's begin with the shell provided in the Google Cloud Platform console:&lt;/li&gt;
&lt;li&gt;Open the shell (leftmost icon at top right corner of the screen)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fl8rafv31v0mbm0vykmrv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fl8rafv31v0mbm0vykmrv.png" alt="Screenshot of the Google Platform Console project dashboard displaying the shell icon which is the leftmost icon located at the top right corner of the page"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If it doesn't exist already, create a file called &lt;code&gt;cors.json&lt;/code&gt; using a text editor. I chose to use &lt;code&gt;pico&lt;/code&gt; for no other reason than it was part of one of the answers &lt;a href="https://stackoverflow.com/questions/37760695/firebase-storage-and-access-control-allow-origin" rel="noopener noreferrer"&gt;to this StackOverflow question&lt;/a&gt;:

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;pico cors.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Add something like the following (replace the array mapped to the &lt;code&gt;"origin"&lt;/code&gt; property with an array of strings listing the domains you wish to allow for a given &lt;code&gt;method&lt;/code&gt; on this project's Storage):&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;origin&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://s.codepen.io&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;method&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;GET&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;maxAgeSeconds&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Save that file! I exclaim it because I didn't.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;one way of doing that is to type &lt;code&gt;^X&lt;/code&gt; and then &lt;code&gt;Y&lt;/code&gt; when it asks "save modified buffer?"&lt;/li&gt;
&lt;li&gt;another way is to type &lt;code&gt;^O&lt;/code&gt; to "Write Out" the file and hit enter when it prompts &lt;code&gt;File name to write:&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Run the following command (replace &lt;code&gt;exampleproject&lt;/code&gt; in the URL with your actual project ID) to set your saved JSON file as the CORS configuration:&lt;/p&gt;&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gsutil cors set cors.json gs://exampleproject.appspot.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Now you can use Firebase Storage URLs in your codepen!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>react</category>
      <category>javascript</category>
    </item>
    <item>
      <title>Using RunwayML To Create a Lip Sync Animation</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Mon, 05 Aug 2019 17:08:01 +0000</pubDate>
      <link>https://dev.to/vbaknation/using-runwayml-to-create-a-lip-sync-animation-4dbf</link>
      <guid>https://dev.to/vbaknation/using-runwayml-to-create-a-lip-sync-animation-4dbf</guid>
      <description>&lt;h2&gt;
  
  
  Related posts:
&lt;/h2&gt;

&lt;p&gt;The goal is to create an open-source app or library which allows musicians to expedite the process of creating visuals for their music:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/vbaknation/creating-visuals-for-my-music-using-speech-recognition-javascript-and-ffmpeg-4a6d"&gt;Version 0 of &lt;code&gt;animatemusic&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/vbaknation/creating-visuals-for-music-using-speech-recognition-javascript-and-ffmpeg-version-1-4mll"&gt;Version 1 of &lt;code&gt;animatemusic&lt;/code&gt;&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Lip Sync
&lt;/h2&gt;

&lt;p&gt;In parallel with my study of shader functions, I have been exploring ways to incorporate an animation of my face (or any character I wish to create) that will lip-sync to my song in an HTML/Canvas animation.&lt;/p&gt;

&lt;p&gt;This was originally inspired by the output from the forced aligner I used (&lt;a href="https://github.com/lowerquality/gentle" rel="noopener noreferrer"&gt;gentle&lt;/a&gt;), which includes the time at which each word was spoken, as well as the duration of each phoneme in the word (phonemes are the fundamental units of a word's sound). &lt;/p&gt;

&lt;p&gt;For example, gentle's result for the word "let" (the duration of each phoneme is shown in seconds):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

{
      "alignedWord": "let",
      "phones": [
        {
          "duration": 0.09,
          "phone": "l_B"
        },
        {
          "duration": 0.09,
          "phone": "eh_I"
        },
        {
          "duration": 0.04,
          "phone": "t_E"
        }
      ]
}


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;My first plan was to map mouth shape coordinates to each phoneme when rendering the canvas at each frame of the animation. As a first attempt, I have used the following image I found on the web, which shows the mouth shapes corresponding to different letters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fiubpy8p58bkfg7vwvxoh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fiubpy8p58bkfg7vwvxoh.jpg"&gt;&lt;/a&gt;&lt;br&gt;
Source: &lt;a href="https://fmspracticumspring2017.blogs.bucknell.edu/2017/04/18/odds-ends-lip-syncing/" rel="noopener noreferrer"&gt;&lt;/a&gt;&lt;a href="https://fmspracticumspring2017.blogs.bucknell.edu/2017/04/18/odds-ends-lip-syncing/" rel="noopener noreferrer"&gt;https://fmspracticumspring2017.blogs.bucknell.edu/2017/04/18/odds-ends-lip-syncing/&lt;/a&gt;&lt;/p&gt;
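&lt;p&gt;The timing logic behind that plan can be sketched as follows (assuming gentle's output shape shown above; gentle also reports when each word begins, which I'm representing here as a &lt;code&gt;start&lt;/code&gt; field):&lt;/p&gt;

```javascript
// Sketch: given a word's start time and per-phoneme durations (from
// gentle's output), find the phoneme active at animation time t so the
// matching mouth shape can be drawn for that frame.
const word = {
  start: 1.0, // seconds into the audio; gentle reports this per word
  phones: [
    { duration: 0.09, phone: "l_B" },
    { duration: 0.09, phone: "eh_I" },
    { duration: 0.04, phone: "t_E" }
  ]
};

function phoneAt(word, t) {
  if (word.start > t) return null; // word hasn't started yet
  let boundary = word.start;
  for (const p of word.phones) {
    boundary += p.duration;
    if (boundary > t) return p.phone; // t falls inside this phoneme
  }
  return null; // word has already ended
}
```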

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/Qeqvqj?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I've tried to articulate my intention with comments throughout the code, but essentially, the master image (with all of the mouth shapes) is translated so that the mouth shape for the desired phoneme shows as each word is displayed.&lt;/p&gt;

&lt;p&gt;I feel confident that this case study can be extended to a full song, with custom mouth shape coordinates (which will probably start out as drawings made with &lt;a href="https://vectr.com" rel="noopener noreferrer"&gt;vectr&lt;/a&gt;). This will likely be the next step I take to produce a full song's animation.&lt;/p&gt;

&lt;p&gt;But before I proceed with that route, I wanted to try out something I came across a few days ago: &lt;a href="https://runwayml.com/" rel="noopener noreferrer"&gt;RunwayML&lt;/a&gt;, software that provides a GUI for running different open-source ML models. RunwayML is explicitly marketed as software for creators. There's a &lt;a href="https://runwayml.com/download" rel="noopener noreferrer"&gt;free download&lt;/a&gt; and it's unbelievably easy to use, so if you are interested in using machine learning for creative endeavors, I highly recommend it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using RunwayML
&lt;/h2&gt;

&lt;p&gt;Instead of using the image of mouth shapes, or drawing my own, I was happy to utilize the power of facial recognition to do that work for me. &lt;/p&gt;

&lt;p&gt;I started by recording a short video of myself with my phone:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/uSg3I2y_fx8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I then created a new workspace in RunwayML and added to it the &lt;em&gt;Face Landmarks&lt;/em&gt; model, which is described by its author as follows:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A ResNet-32 network was trained from scratch on a dataset of about 3 million faces. This dataset is derived from a number of datasets. The face scrub dataset2, the VGG dataset1, and then a large number of images I personally scraped from the internet. I tried as best I could to clean up the combined dataset by removing labeling errors, which meant filtering out a lot of stuff from VGG. I did this by repeatedly training a face recognition model and then using graph clustering methods and a lot of manual review to clean up the dataset. In the end, about half the images are from VGG and face scrub. Also, the total number of individual identities in the dataset is 7485. I made sure to avoid overlap with identities in LFW so the LFW evaluation would be valid.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model takes a video file as input and outputs the coordinates (in x,y pixels) for different recognized face features. The output format I chose was &lt;code&gt;.JSON&lt;/code&gt; and the resulting data structure is:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

[
    {
        "time": 0.01,
        "landmarks": [
           {
               "bottom_lip": [[x0,y0], [x1,y1], ...],
               "chin": [[x0,y0], [x1,y1], ...],
               "left_eye": [[x0,y0], [x1,y1], ...],
               ...
           }
        ]
    }
]


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each &lt;code&gt;time&lt;/code&gt; value (based on the frame rate of the export, which in this case is 10 fps) has a corresponding set of landmarks (facial features). Each facial feature is assigned an array of [x, y] pixel-coordinate pairs.&lt;/p&gt;
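
&lt;p&gt;As a sketch of how this output can be consumed (the structure is from above, but &lt;code&gt;landmarksAt&lt;/code&gt; and the sample values are mine, not part of RunwayML):&lt;/p&gt;

```javascript
// Given the exported Face Landmarks JSON (an array of { time, landmarks }
// entries), find the entry closest to a given playback time in seconds.
function landmarksAt(frames, t) {
  return frames.reduce((closest, frame) =>
    Math.abs(frame.time - t) < Math.abs(closest.time - t) ? frame : closest
  );
}

// Example: two exported frames at 10 fps, with made-up coordinates.
const frames = [
  { time: 0.0, landmarks: [{ bottom_lip: [[10, 20], [12, 22]] }] },
  { time: 0.1, landmarks: [{ bottom_lip: [[11, 21], [13, 23]] }] },
];
```

&lt;p&gt;An animation loop can then ask for &lt;code&gt;landmarksAt(frames, video.currentTime)&lt;/code&gt; and draw the returned coordinates.&lt;/p&gt;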

&lt;p&gt;Here's the RunwayML interface during the export. The top panel shows the uploaded video, the bottom panel shows the export/preview of the model's output, and the side panel has model parameters:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fn2gs60pmub7vqd1np7s8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fn2gs60pmub7vqd1np7s8.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I copied over the &lt;code&gt;.JSON&lt;/code&gt; output to a pen, and built out a 10 fps animation using the face landmark coordinates:&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/oKoZOW?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Woo!! I think that's pretty awesome, given how smoothly the whole process went. Note that I did not adjust or study any of the model parameters, so I will explore that next.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A small note if you are new to RunwayML: make sure you download, install and open &lt;a href="https://www.docker.com/products/docker-desktop" rel="noopener noreferrer"&gt;Docker Desktop&lt;/a&gt; if you are running the model locally. RunwayML does give you credits to use a remote GPU to run the model, and I'll be using that this week to run a full video with a higher export frame-rate.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Follow Me
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/channel/UCqvA0CVB3QR3TLpGVzkD35g" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.instagram.com/vbaknation" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/vbaknation" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Fence Problem from The Book of Shaders</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Tue, 30 Jul 2019 02:41:39 +0000</pubDate>
      <link>https://dev.to/vbaknation/the-fence-problem-from-the-book-of-shaders-22jm</link>
      <guid>https://dev.to/vbaknation/the-fence-problem-from-the-book-of-shaders-22jm</guid>
      <description>&lt;p&gt;I am working through &lt;a href="https://thebookofshaders.com/"&gt;Patricio Gonzalez Vivo's &lt;em&gt;The Book of Shaders&lt;/em&gt;&lt;/a&gt; which is turning out to be one of the best (engaging, challenging, fun) technical books I've come across. The &lt;a href="https://greenteapress.com/wp/"&gt;Think series by Allen Downey&lt;/a&gt; is to Python and Statistical Learning as this book is to computer graphics. At least that's how it feels.&lt;/p&gt;

&lt;p&gt;I'm on Chapter 5:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;This chapter could be named "Mr. Miyagi's fence lesson."...&lt;/p&gt;

&lt;p&gt;...The following code structure is going to be our fence:&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 u_resolution;
uniform vec2 u_mouse;
uniform float u_time;

// Plot a line on Y using a value between 0.0-1.0
float plot(vec2 st, float pct){
  return  smoothstep( pct-0.02, pct, st.y) -
          smoothstep( pct, pct+0.02, st.y);
}

void main() {
    vec2 st = gl_FragCoord.xy/u_resolution;

    float y = st.x;

    vec3 color = vec3(y);

    // Plot a line
    float pct = plot(st,y);
    color = (1.0-pct)*color+pct*vec3(0.0,1.0,0.0);

    gl_FragColor = vec4(color,1.0);
}
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I spent the first day staring at and playing around with that code. I didn't get much out of that (immediately).&lt;/p&gt;
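
&lt;p&gt;For intuition, here is the &lt;code&gt;plot&lt;/code&gt; helper ported to plain JavaScript (my sketch for poking at values; in the shader it runs per pixel on the GPU):&lt;/p&gt;

```javascript
// GLSL's smoothstep: 0 below edge0, 1 above edge1, and the Hermite
// interpolation t * t * (3 - 2t) in between.
function smoothstep(edge0, edge1, x) {
  const t = Math.min(Math.max((x - edge0) / (edge1 - edge0), 0), 1);
  return t * t * (3 - 2 * t);
}

// The book's plot(): a thin band around y = pct, fading over +/- 0.02.
function plot(stY, pct) {
  return smoothstep(pct - 0.02, pct, stY) - smoothstep(pct, pct + 0.02, stY);
}
```

&lt;p&gt;On the line itself (&lt;code&gt;plot(0.5, 0.5)&lt;/code&gt;) the band value is 1; far away from the line it is 0, which is what carves the green line out of the gradient.&lt;/p&gt;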

&lt;h2&gt;
  
  
  distraction or my mind digesting information
&lt;/h2&gt;

&lt;p&gt;I took a detour and worked through the wonderfully short and packed tutorials from &lt;a href="http://tobyschachman.com/Shadershop/"&gt;Toby Schachman's Shadershop&lt;/a&gt; which were uh-mazing. If you haven't tried it and are interested in learning about the construction of noise for complex computer graphics you should definitely go through his videos. His pedagogy is built around spatial learning which is a profound concept for learning computer graphics and definitely has influenced my thinking about the field.&lt;/p&gt;

&lt;h2&gt;
  
  
  back to the fence
&lt;/h2&gt;

&lt;p&gt;I was ready to take on the Fence Code after some Shadershop wins. I realized that I was trying to figure out the Fence Code only in my head, which was why I was hitting a roadblock, or rather, fence. The reason Shadershop was so effective was that it allowed visual learners to use a spatial medium to control the code and view the effects on the resulting graphic. I chose to take that as advice.&lt;/p&gt;

&lt;p&gt;I added a few more uniforms to the Fence Code, and used &lt;a href="https://webglfundamentals.org/webgl/lessons/webgl-boilerplate.html"&gt;WebGL Fundamentals' boilerplate&lt;/a&gt; to set up the WebGL context and sliders to control the shader function uniform variables.&lt;/p&gt;

&lt;p&gt;Below is a codepen I created to show the result. Note the change in the (very small) preformatted text when you move the sliders to help you visualize the process better:&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/GVWqmG?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;I'm itching to jump to some conclusions about the relationships between uniform variable values and the graphical result, but the more I play around with different value combinations, the broader those relationships get. So I'm not quite ready to move on from this chapter.&lt;/p&gt;

&lt;p&gt;The biggest takeaway from this experience: the color value and position of the pixels are handled separately. I know, I know, the shader functions themselves are separate so they are handled separately, but I wasn't comfortable with the concept.&lt;/p&gt;

&lt;p&gt;So I added a &lt;code&gt;u_count&lt;/code&gt; uniform variable in the fragment shader's script, which determines how many of the triangles are drawn on the canvas (there are a total of 24 triangles). Changing that count does not affect the color of the canvas. So the fragment shader generates a color map of sorts that is applied over &lt;em&gt;all&lt;/em&gt; pixels of the canvas, not just the ones specified in the vertex shader. This cleared up two concepts I couldn't quite visualize before: first, that animations can be made almost entirely with fragment shaders, since varying the color of the pixels over time instead of "moving" (more like changing the visibility of) the pixels over time (as you would with hand-drawn animations) can give similar or identical results; and second, that shader scripts are run by the GPU, which processes in parallel (i.e. all the pixels at the same time).&lt;/p&gt;
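
&lt;p&gt;That "color map over all pixels" idea can be emulated in plain JavaScript: a fragment-shader-like pure function is called once per pixel with only its position and the time as inputs (a conceptual sketch of mine; the GPU evaluates this for all pixels in parallel):&lt;/p&gt;

```javascript
// A "fragment shader" in JS: a pure function of pixel position and time.
function frag(x, y, width, height, time) {
  // Normalized coordinate, like st = gl_FragCoord.xy / u_resolution.
  const stX = x / width;
  // Animate by varying color with time instead of moving geometry.
  return Math.abs(Math.sin(stX * Math.PI + time)); // grayscale, 0..1
}

// "Render" a 4x1 canvas at time 0 by calling frag once per pixel.
const row = Array.from({ length: 4 }, (_, x) => frag(x, 0, 4, 1, 0));
```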

&lt;p&gt;Thanks for reading!&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow Me
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/channel/UCqvA0CVB3QR3TLpGVzkD35g"&gt;YouTube&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.instagram.com/vbaknation"&gt;Instagram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/vbaknation"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>html</category>
      <category>css</category>
    </item>
    <item>
      <title>My Learnings from Building Forms in React: Part 1</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Thu, 18 Jul 2019 07:26:47 +0000</pubDate>
      <link>https://dev.to/vbaknation/my-learnings-from-building-forms-in-react-part-1-3n1k</link>
      <guid>https://dev.to/vbaknation/my-learnings-from-building-forms-in-react-part-1-3n1k</guid>
      <description>&lt;p&gt;Over the last month or so, I've spent a significant amount of time building, pondering, troubleshooting and Googling how React handles forms. This post, and probably more to follow, is a place for me to think out loud about what I've learned! &lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;As a rule, none of my posts are meant to be interpreted as best practices or tutorials. They are meant to be interpreted as musings and ramblings. Should something come across as inefficient or incorrect, do me a favor and call it out so I can correct my understanding.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I've been working on a volunteer project with the following requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Users should be able to create an account and submit a form with different field types&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Admin should be able to view registered users' form data and update their account status on a dashboard&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Registration Functionality
&lt;/h1&gt;

&lt;p&gt;The users follow a simple (for them) process to create an account and submit the form:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Login&lt;/li&gt;
&lt;li&gt;Edit a form&lt;/li&gt;
&lt;li&gt;Preview the form&lt;/li&gt;
&lt;li&gt;Submit the form&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Edit or Preview
&lt;/h1&gt;

&lt;p&gt;The data received from form input fields on the Edit page may differ from what is rendered on the Preview page.&lt;/p&gt;

&lt;p&gt;For example, the user inputs their First, Middle and Last Name in three separate fields on the Edit page, but a single Name string is rendered on the Preview page.&lt;/p&gt;

&lt;p&gt;A codepen to illustrate (vanilla JS):&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/gVYOwE?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Users input data into separate fields as single strings (&lt;code&gt;first_name&lt;/code&gt;, &lt;code&gt;middle_name&lt;/code&gt;, &lt;code&gt;last_name&lt;/code&gt;) and the preview element displays that data differently (&lt;code&gt;first_name&lt;/code&gt; + ' ' + &lt;code&gt;middle_name&lt;/code&gt; + ' ' + &lt;code&gt;last_name&lt;/code&gt;).&lt;/p&gt;
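
&lt;p&gt;That Edit-to-Preview mapping boils down to a small pure function (a sketch; &lt;code&gt;formatName&lt;/code&gt; is a hypothetical helper that also drops the extra space when the middle name is empty):&lt;/p&gt;

```javascript
// Combine the separate Edit-page fields into the single Preview string.
function formatName(first_name, middle_name, last_name) {
  return [first_name, middle_name, last_name]
    .filter(part => part && part.trim() !== "")
    .join(" ");
}
```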

&lt;h1&gt;
  
  
  Refresh
&lt;/h1&gt;

&lt;p&gt;Should the user mistakenly refresh the Edit page (which I notoriously do, by accident, when using mobile apps), they would hate to see their form data deleted.&lt;/p&gt;

&lt;p&gt;For development purposes, and in order to quickly observe and test the relationship between &lt;code&gt;state&lt;/code&gt; and the Edit/Preview page form elements without introducing a database into the mix, I am using the &lt;code&gt;localStorage&lt;/code&gt; API to keep the app's &lt;code&gt;state&lt;/code&gt; updated and saved.&lt;/p&gt;
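
&lt;p&gt;A minimal sketch of that pattern (the key name and helper names are mine), written against any object with &lt;code&gt;getItem&lt;/code&gt;/&lt;code&gt;setItem&lt;/code&gt; so it also runs outside the browser:&lt;/p&gt;

```javascript
const STATE_KEY = "form-state"; // hypothetical storage key

// Persist the app state so a refresh of the Edit page can restore it.
function saveState(storage, state) {
  storage.setItem(STATE_KEY, JSON.stringify(state));
}

function loadState(storage) {
  const raw = storage.getItem(STATE_KEY);
  return raw === null ? {} : JSON.parse(raw);
}

// Outside the browser, a plain object can stand in for window.localStorage.
const fakeStorage = {
  data: {},
  setItem(k, v) { this.data[k] = v; },
  getItem(k) { return k in this.data ? this.data[k] : null; },
};
```

&lt;p&gt;In the app itself, &lt;code&gt;storage&lt;/code&gt; would simply be &lt;code&gt;window.localStorage&lt;/code&gt;, with &lt;code&gt;saveState&lt;/code&gt; called whenever &lt;code&gt;state&lt;/code&gt; changes and &lt;code&gt;loadState&lt;/code&gt; called once on load.&lt;/p&gt;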

&lt;p&gt;Here's an example codepen to illustrate (using React Hooks):&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/KOKKmw?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Here's a video that shows how Local Storage updates upon user input:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/3fMh8Ex6FrU"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Note that the first time I provide input, the associated key is added to the Local Storage object, and subsequent inputs update its value. The following screenshot shows the value of &lt;code&gt;state&lt;/code&gt; after clicking the Left-Handed checkbox but before &lt;code&gt;state&lt;/code&gt; is updated. Note the absence of the &lt;code&gt;left-handed&lt;/code&gt; property after the first click. I used &lt;code&gt;debugger&lt;/code&gt; to pause Chrome at that line.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fn6utrjlb5l7zqj9kstnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fthepracticaldev.s3.amazonaws.com%2Fi%2Fn6utrjlb5l7zqj9kstnl.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Dynamic Form Elements
&lt;/h1&gt;

&lt;p&gt;It was a seemingly harmless &lt;em&gt;Add a Contact&lt;/em&gt; button, but it took a few hours to implement even my first untested attempt.&lt;/p&gt;

&lt;p&gt;Here's an example codepen to illustrate (using React Hooks):&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/ZgEQVj?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Hot diggity! New components are rendered upon clicking the button, and the &lt;code&gt;contactInfo&lt;/code&gt; object updates accordingly. &lt;/p&gt;
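
&lt;p&gt;The state change behind that button can be sketched as a pair of pure updates (helper names are mine; with React state the result would be handed to the setter, e.g. &lt;code&gt;setContacts(addContact(contacts))&lt;/code&gt;):&lt;/p&gt;

```javascript
// Append a new, empty contact; return a new array so React sees a change.
function addContact(contacts) {
  return [...contacts, { name: "", phone: "" }];
}

// Update one field of one contact immutably.
function updateContact(contacts, index, field, value) {
  return contacts.map((c, i) => (i === index ? { ...c, [field]: value } : c));
}
```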

&lt;h1&gt;
  
  
  Looking Ahead
&lt;/h1&gt;

&lt;p&gt;I'll be adding Firebase functionality hopefully tomorrow and will look to write another post about my learnings from that soon enough.&lt;/p&gt;

&lt;p&gt;Thanks for reading!&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow Me
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/channel/UCqvA0CVB3QR3TLpGVzkD35g" rel="noopener noreferrer"&gt;YouTube&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.instagram.com/vbaknation" rel="noopener noreferrer"&gt;Instagram&lt;/a&gt;&lt;br&gt;
&lt;a href="https://twitter.com/vbaknation" rel="noopener noreferrer"&gt;Twitter&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>react</category>
      <category>html</category>
      <category>css</category>
    </item>
    <item>
      <title>Creating Visuals for Music Using Speech Recognition, Javascript and ffmpeg: Version 1</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Tue, 16 Jul 2019 06:01:39 +0000</pubDate>
      <link>https://dev.to/vbaknation/creating-visuals-for-music-using-speech-recognition-javascript-and-ffmpeg-version-1-4mll</link>
      <guid>https://dev.to/vbaknation/creating-visuals-for-music-using-speech-recognition-javascript-and-ffmpeg-version-1-4mll</guid>
      <description>&lt;p&gt;This is my discussion of version 1 of a project I'm working on called &lt;em&gt;animatemusic&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vishalbakshi/animatemusic"&gt;Click here to go to its GitHub repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;My blog post on version 0 of this project &lt;a href="https://dev.to/vbaknation/creating-visuals-for-my-music-using-speech-recognition-javascript-and-ffmpeg-4a6d"&gt;can be found here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  milliseconds, not frames
&lt;/h2&gt;

&lt;p&gt;In version 0 of this project, in my pursuit to render text on a canvas element (which would eventually become a video frame), I chose to design around the essential question:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;which video frame is being rendered?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;which was a reasonable question to ask since I was rendering a finite and known quantity of frames (based on the framerate and duration of the video).&lt;/p&gt;

&lt;p&gt;However, it ended up having a not-so-reasonable solution because it involved a conversion from float (start time in seconds) to integer (frame number) for each set of words that were to be rendered. This caused repetitive rounding, which resulted in the text lagging behind the vocal audio. Here's a codepen to articulate the plight:&lt;/p&gt;

&lt;p&gt;&lt;iframe height="600" src="https://codepen.io/vishalbakshi/embed/orrRaE?height=600&amp;amp;default-tab=result&amp;amp;embed-version=2"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;For version 1, I've chosen to steer clear of this issue by designing around a new essential question:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;at what time is the word rendered?&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples using &lt;code&gt;requestAnimationFrame&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;I found two resources that reassured me that my new essential question was worth pursuing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://codetheory.in/controlling-the-frame-rate-with-requestanimationframe/"&gt;Controlling frame rate with requestAnimationFrame&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://stackoverflow.com/a/19772220/10244270"&gt;How to throttle requestAnimationFrame to a specific frame rate&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I would eventually like to use a library such as anime.js or three.js, and their documentation and API also catered to a time-based animation approach.&lt;/p&gt;
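
&lt;p&gt;The throttling trick from those resources reduces to a small time gate (a sketch; the real loop would call this from inside the &lt;code&gt;requestAnimationFrame&lt;/code&gt; callback with its timestamp):&lt;/p&gt;

```javascript
// Returns a function answering: has enough time passed to draw the next
// frame at the target fps?
function makeFrameGate(fps) {
  const interval = 1000 / fps; // ms between frames
  let then = -interval;        // so the very first call draws
  return function shouldDraw(now) {
    if (now - then >= interval) {
      // Carry over the remainder so the frame rate doesn't drift.
      then = now - ((now - then) % interval);
      return true;
    }
    return false;
  };
}
```

&lt;p&gt;Inside the animation loop: &lt;code&gt;if (shouldDraw(timestamp)) render();&lt;/code&gt;&lt;/p&gt;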

&lt;h2&gt;
  
  
  Refactor and &lt;code&gt;generateFrameImages&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;I took this opportunity to refactor my original script, in addition to adding functions to render the text on the canvas when the current time (elapsed) of the video is within the &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;end&lt;/code&gt; times of the word. Here's a codesandbox and &lt;a href="https://github.com/vishalbakshi/animatemusic/blob/master/20190623C_transcribed_test.json"&gt;sample transcript json for upload&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;iframe src="https://codesandbox.io/embed/canvas-export-fnvws"&gt;
&lt;/iframe&gt;
&lt;/p&gt;
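
&lt;p&gt;The core check in that refactor can be sketched as (assuming each word object carries the &lt;code&gt;start&lt;/code&gt;/&lt;code&gt;end&lt;/code&gt; seconds from gentle; the function names are mine):&lt;/p&gt;

```javascript
// A word is drawn while the video's elapsed time is inside its window.
function isWordVisible(word, elapsedSeconds) {
  return elapsedSeconds >= word.start && elapsedSeconds < word.end;
}

// All words to render on the canvas at a given moment.
function visibleWords(words, elapsedSeconds) {
  return words.filter(w => isWordVisible(w, elapsedSeconds));
}
```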

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;I am trembling with excitement as I present to you the new and improved resulting video!! The lyrics sync with the audio so much better than in version 0! I don't see many issues with the text (although there are some blanks and one &lt;code&gt;&amp;lt;unk&amp;gt;&lt;/code&gt;, which is gentle's version of &lt;code&gt;undefined&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/iujaWnphaY4"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Thanks for reading! Please comment below if you have something you can share to help me improve this project. Going to try and sleep now amongst the adrenaline rush...&lt;/p&gt;

&lt;h2&gt;
  
  
  Follow Me
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.youtube.com/channel/UCqvA0CVB3QR3TLpGVzkD35g"&gt;YouTube&lt;/a&gt;&lt;br&gt;
&lt;a href="https://www.instagram.com/vbaknation"&gt;Instagram&lt;/a&gt;&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>html</category>
    </item>
    <item>
      <title>Creating Visuals for Music Using Speech Recognition, Javascript and ffmpeg: Version 0</title>
      <dc:creator>VBAK</dc:creator>
      <pubDate>Mon, 15 Jul 2019 05:01:49 +0000</pubDate>
      <link>https://dev.to/vbaknation/creating-visuals-for-my-music-using-speech-recognition-javascript-and-ffmpeg-4a6d</link>
      <guid>https://dev.to/vbaknation/creating-visuals-for-my-music-using-speech-recognition-javascript-and-ffmpeg-4a6d</guid>
      <description>&lt;p&gt;Hello! This is my first blog post on dev.to&lt;/p&gt;

&lt;p&gt;I make music and I code. &lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Putting out music and garnering attention for it requires me to wear multiple hats for a variety of tasks: branding, social media marketing, beat production, songwriting, mastering audio, shooting and editing videos, designing graphics, the list goes on...&lt;/p&gt;

&lt;p&gt;In order to create social media audiovisual content for my music, I generally follow this process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1) Make a beat in Garageband&lt;/li&gt;
&lt;li&gt;2) Write lyrics&lt;/li&gt;
&lt;li&gt;3) Practice the song&lt;/li&gt;
&lt;li&gt;4) Set up my DSLR camera&lt;/li&gt;
&lt;li&gt;5) Set up my microphone&lt;/li&gt;
&lt;li&gt;6) Video myself recording the song&lt;/li&gt;
&lt;li&gt;7) Import the video into Adobe Premiere&lt;/li&gt;
&lt;li&gt;8) Import the song audio into Adobe Premiere&lt;/li&gt;
&lt;li&gt;9) Align the audio with the video&lt;/li&gt;
&lt;li&gt;10) Add and align lyrics (text graphics) with the audio &lt;/li&gt;
&lt;li&gt;11) Add some effects to the video (&lt;a href="https://adobemasters.net/how-to-create-an-80s-vintage-filter-in-adobe-premiere-pro-cc-2019/" rel="noopener noreferrer"&gt;I like this 80s look&lt;/a&gt;)
&lt;/li&gt;
&lt;li&gt;12) Render the video (45 minutes to an hour)&lt;/li&gt;
&lt;li&gt;13) Export to &lt;code&gt;.mp4&lt;/code&gt; (another 30-40 minutes)&lt;/li&gt;
&lt;li&gt;14) Upload to YouTube (another 30-40 minutes)&lt;/li&gt;
&lt;li&gt;15) Upload to IGTV (another 30-40 minutes)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I want to increase the time I spend on steps 1 through 3 and decrease the time I spend on steps 4 through 15.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inspiration
&lt;/h2&gt;

&lt;p&gt;Last Sunday (07/07/2019) I was refactoring some of my code on a project from jQuery to Web APIs. One thing led to the next, as they do the longer I am on &lt;a href="https://developer.mozilla.org/en-US/" rel="noopener noreferrer"&gt;MDN&lt;/a&gt;, and I came across the WebRTC (Web Real-Time Communication) standard and the YouTube LiveStream API documentation. This led me to Googling info about audio and video codecs. This finally led me to &lt;code&gt;ffmpeg&lt;/code&gt;, open-source software used for audio and video processing. Sweet--I could start something from there.&lt;/p&gt;

&lt;p&gt;I had used this software sparingly in the past, so I spent a few days experimenting with a few different image-to-video conversions in order to learn the basics. Here I've used &lt;code&gt;ffmpeg&lt;/code&gt; to convert a sort-of timelapse of the BART (Bay Area Rapid Transit) train that passes nearby using 338 images taken throughout the day:&lt;/p&gt;


&lt;div class="instagram-position"&gt;
  &lt;iframe id="instagram-liquid-tag" src="https://www.instagram.com/p/BzzM0xPBpoN/embed/captioned/"&gt;
  &lt;/iframe&gt;
  
&lt;/div&gt;


&lt;p&gt;This inspired and led me to the project I'm working on now.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Project
&lt;/h2&gt;

&lt;p&gt;I've called this project &lt;code&gt;animatemusic&lt;/code&gt; at this &lt;a href="https://github.com/vishalbakshi/animatemusic/blob/master/README.md" rel="noopener noreferrer"&gt;GitHub repository&lt;/a&gt;. My goal is to create a toolchain in order to expedite the creation of visuals for my songs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Tech
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Node.js&lt;/li&gt;
&lt;li&gt;DOM Web API&lt;/li&gt;
&lt;li&gt;JSZip&lt;/li&gt;
&lt;li&gt;FileSaver&lt;/li&gt;
&lt;li&gt;ffmpeg&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How it Works Thus Far
&lt;/h3&gt;

&lt;p&gt;The process is a bit choppy right now since I'm running the various responsibilities in series in a semi-manual fashion:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1) Export my vocals from Garageband to a single &lt;code&gt;.wav&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;2) Type the song lyrics into a &lt;code&gt;.txt&lt;/code&gt; file&lt;/li&gt;
&lt;li&gt;3) Feed the song vocals and lyrics to a locally run CLI of &lt;a href="https://github.com/lowerquality/gentle" rel="noopener noreferrer"&gt;gentle&lt;/a&gt; and receive a &lt;code&gt;JSON&lt;/code&gt; file with the forced-alignment results&lt;/li&gt;
&lt;li&gt;4) Install and run my &lt;code&gt;animatemusic&lt;/code&gt; repo locally&lt;/li&gt;
&lt;li&gt;5) Upload the &lt;code&gt;JSON&lt;/code&gt; file (along with some other parameters) and receive a &lt;code&gt;.zip&lt;/code&gt; folder with individual video frame &lt;code&gt;.png&lt;/code&gt; files&lt;/li&gt;
&lt;li&gt;6) Use &lt;code&gt;ffmpeg&lt;/code&gt; to stitch the images into a (lyric) video file&lt;/li&gt;
&lt;li&gt;7) Use &lt;code&gt;ffmpeg&lt;/code&gt; to combine the song audio and the lyric video&lt;/li&gt;
&lt;/ul&gt;
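
&lt;p&gt;Steps 6 and 7 can be done with commands along these lines (the filenames, frame-number pattern and codec choices here are mine, not from the repo):&lt;/p&gt;

```shell
# Step 6: stitch numbered frames (frame_0001.png, ...) into a 30 fps video.
ffmpeg -framerate 30 -i frame_%04d.png -c:v libx264 -pix_fmt yuv420p lyrics.mp4

# Step 7: mux the song audio onto the lyric video, stopping at the shorter stream.
ffmpeg -i lyrics.mp4 -i song.wav -c:v copy -c:a aac -shortest final.mp4
```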

&lt;h4&gt;
  
  
  Setting Up gentle
&lt;/h4&gt;

&lt;p&gt;gentle is a forced-alignment tool that relies on &lt;a href="https://github.com/kaldi-asr/kaldi" rel="noopener noreferrer"&gt;kaldi&lt;/a&gt; which is a speech recognition toolkit. Forced-alignment involves matching a text transcript with the corresponding speech audio file.&lt;/p&gt;

&lt;p&gt;The installation process for gentle was rocky, so the following tips and resources may be useful to you, should you choose to install it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/lowerquality/gentle/issues/194" rel="noopener noreferrer"&gt;"Error finding kaldi files"&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;I added &lt;code&gt;branch: "master"&lt;/code&gt; to &lt;a href="https://github.com/lowerquality/gentle/blob/master/.gitmodules" rel="noopener noreferrer"&gt;the gentle &lt;code&gt;.gitmodules&lt;/code&gt; file&lt;/a&gt; in order to capture some of the latest updates in kaldi which resolved some installation issues&lt;/li&gt;
&lt;li&gt;Install gentle in a Python &lt;a href="https://docs.python-guide.org/dev/virtualenvs/#basic-usage" rel="noopener noreferrer"&gt;virtual environment&lt;/a&gt; since it expects you to use &lt;code&gt;python@2.7.x&lt;/code&gt; and the corresponding &lt;code&gt;pip&lt;/code&gt; version&lt;/li&gt;
&lt;li&gt;In gentle's &lt;a href="https://github.com/lowerquality/gentle/blob/master/install_deps.sh" rel="noopener noreferrer"&gt;&lt;code&gt;install_deps.sh&lt;/code&gt;&lt;/a&gt; bash script, comment out any of the &lt;code&gt;brew install&lt;/code&gt; software names that you already have installed since any &lt;code&gt;brew&lt;/code&gt; warnings will prevent the bash script from proceeding to the next step, which is the critical &lt;code&gt;setup.py&lt;/code&gt; process&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Generating the Forced-Alignment Results
&lt;/h4&gt;

&lt;p&gt;Once you have gentle running, give yourself a pat on the back and then run the following in your terminal, now outside of the virtual environment which used &lt;code&gt;python@2.7.x&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;python3 align.py path/to/audio path/to/transcript -o path/to/output&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The resulting file is in &lt;code&gt;JSON&lt;/code&gt; format with the following structure:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;

&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;transcript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;words&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;alignedWord&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;case&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;end&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;endOffset&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;phones&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
               &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;duration&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
               &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;phone&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;startOffset&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;word&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;transcript&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;holds the full text of your transcript in a single string&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;words&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;holds word Objects in an array&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;alignedWord&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;is the word string that gentle recognized from the audio&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;case&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;is a status string with either "success" or "not-in-audio" values&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;end&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;is the time in seconds of when the word ends in the audio&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;endOffset&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;I'm not sure...TBD (comment if you know)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;start&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;is the time in seconds of when the word starts in the audio&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;startOffset&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;I'm not sure...TBD (comment if you know)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;word&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;is the word from the transcript that was force-aligned to this portion of the audio file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
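
&lt;p&gt;To make that structure concrete, here is a sketch of pulling out just the successfully aligned words and their timing (field names as above; the helper name and sample values are mine):&lt;/p&gt;

```javascript
// Keep only the words gentle actually found in the audio, with timing.
function alignedTimings(gentleOutput) {
  return gentleOutput.words
    .filter(w => w.case === "success")
    .map(w => ({ word: w.word, start: w.start, end: w.end }));
}

// A tiny made-up result in the shape described above.
const sample = {
  transcript: "hello world",
  words: [
    { word: "hello", alignedWord: "hello", case: "success",
      start: 0.2, end: 0.6, startOffset: 0, endOffset: 5, phones: [] },
    { word: "world", alignedWord: "", case: "not-in-audio",
      startOffset: 6, endOffset: 11 },
  ],
};
```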

&lt;h4&gt;
  
  
  Converting Forced-Alignment Results to Video Frames
&lt;/h4&gt;

&lt;p&gt;If I can create an image for each video frame, I can render all of those image frames into a video using &lt;code&gt;ffmpeg&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Right now, I have a single script block in my &lt;code&gt;index.html&lt;/code&gt; which performs all of the logic around this process. Here's the minimal interface I've created thus far:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx79bq2szzwedpj2ywfsp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx79bq2szzwedpj2ywfsp.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here are the inputs to my script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"video frame rate" and "full song length"

&lt;ul&gt;
&lt;li&gt;determine the total number of frames in the (eventual) video. Default values: 30 fps (frames per second) and 60 seconds, resulting in 1800 frames.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;"words per frame" determine how many words will be displayed together on the &lt;code&gt;canvas&lt;/code&gt; at any given time

&lt;ul&gt;
&lt;li&gt;right now my script is not optimal--if your cadence is fast, the time between words is short and this causes rounding errors and the script fails. This motivated the addition of this input.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;"video width" and "video height" 

&lt;ul&gt;
&lt;li&gt;set the size for the &lt;code&gt;canvas&lt;/code&gt; element&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;"lyrics"

&lt;ul&gt;
&lt;li&gt;is the &lt;code&gt;JSON&lt;/code&gt; output from gentle&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
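&lt;p&gt;The frame arithmetic behind those inputs can be sketched as follows (the helper names are my own, not the ones in the repo):&lt;/p&gt;

```javascript
// Total frames in the eventual video: frame rate times song length.
// With the defaults of 30 fps and 60 seconds this gives 1800 frames.
function totalFrames(frameRate, songLengthSeconds) {
  return Math.round(frameRate * songLengthSeconds);
}

// Frames needed for one set of words: its duration times the frame
// rate, rounded to a whole number. Rounding like this per set is
// where small errors can accumulate and push the lyrics out of sync.
function framesForDuration(durationSeconds, frameRate) {
  return Math.round(durationSeconds * frameRate);
}

console.log(totalFrames(30, 60));         // 1800
console.log(framesForDuration(0.47, 30)); // 14
```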

&lt;p&gt;The following scripts must be loaded first:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;jszip.min.js&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;The wonderful &lt;a href="https://stuk.github.io/jszip/" rel="noopener noreferrer"&gt;JSZip&lt;/a&gt; client-side library which generates a zip file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;
&lt;code&gt;FileSaver.js&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;The wonderful &lt;a href="https://github.com/eligrey/FileSaver.js/blob/master/dist/FileSaver.js" rel="noopener noreferrer"&gt;FileSaver&lt;/a&gt; client-side library which, among other functionality, exposes the &lt;code&gt;saveAs&lt;/code&gt; variable to trigger a browser download of a file&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
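&lt;p&gt;To show how those two libraries fit together, here is a sketch of the zip-and-download step (my own function name, not the repo's code), assuming &lt;code&gt;jszip.min.js&lt;/code&gt; and &lt;code&gt;FileSaver.js&lt;/code&gt; are already loaded and expose the &lt;code&gt;JSZip&lt;/code&gt; and &lt;code&gt;saveAs&lt;/code&gt; globals:&lt;/p&gt;

```javascript
// Hypothetical helper: zip an array of canvas dataURLs and trigger a
// browser download. Browser-only: JSZip and saveAs come from the
// script tags loaded above and do not exist outside the page.
function downloadFrames(dataURLs) {
  const zip = new JSZip();
  const folder = zip.folder("frames");
  dataURLs.forEach(function (dataURL, i) {
    // Strip the "data:image/png;base64," prefix; JSZip accepts
    // base64-encoded file contents when told so.
    const base64 = dataURL.split(",")[1];
    folder.file(i + ".png", base64, { base64: true });
  });
  zip.generateAsync({ type: "blob" }).then(function (blob) {
    saveAs(blob, "frames.zip");
  });
}
```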

&lt;p&gt;The script I've written so far can be seen in the repo's &lt;a href="https://github.com/vishalbakshi/animatemusic/blob/master/index.html" rel="noopener noreferrer"&gt;index.html&lt;/a&gt;. It's still a work in progress, so please provide feedback. Here's how it works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Upon uploading the transcript, the event handler &lt;code&gt;handleFiles&lt;/code&gt; is called. &lt;code&gt;handleFiles&lt;/code&gt;:

&lt;ul&gt;
&lt;li&gt;Parses the file into a regular JS object&lt;/li&gt;
&lt;li&gt;Renders either a blank image (no lyrics being sung for that frame) or an image with the lyrics text (for frames where lyrics are being sung) onto the &lt;code&gt;canvas&lt;/code&gt; element&lt;/li&gt;
&lt;li&gt;Saves the &lt;code&gt;canvas&lt;/code&gt; element first as a &lt;code&gt;dataURL&lt;/code&gt; and then as a &lt;code&gt;.png&lt;/code&gt; file object to the folder object which will eventually be zipped&lt;/li&gt;
&lt;li&gt;Initiates the download of the zipped folder upon completion of all image renders&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
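&lt;p&gt;To make the grouping step concrete, here is a minimal sketch of collecting gentle's word Objects into sets (the names and details are mine, simplified from what the repo actually does):&lt;/p&gt;

```javascript
// Group the words Array into sets of wordsPerFrame words, keeping the
// first word's start and the last word's end as each set's time span.
function groupWords(words, wordsPerFrame) {
  const sets = [];
  const copy = words.slice();
  while (copy.length) {
    // splice removes the next wordsPerFrame words from the front
    const set = copy.splice(0, wordsPerFrame);
    sets.push({
      text: set.map(function (w) { return w.word; }).join(" "),
      start: set[0].start,
      end: set[set.length - 1].end,
    });
  }
  return sets;
}

const words = [
  { word: "twinkle", start: 0.5, end: 0.9 },
  { word: "twinkle", start: 0.9, end: 1.3 },
  { word: "little",  start: 1.4, end: 1.7 },
  { word: "star",    start: 1.8, end: 2.4 },
];

console.log(groupWords(words, 3));
// two sets: "twinkle twinkle little" (0.5 to 1.7) and "star" (1.8 to 2.4)
```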

&lt;p&gt;A few helper functions to break up the responsibilities:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;prepareWordData&lt;/code&gt;

&lt;ul&gt;
&lt;li&gt;takes the &lt;code&gt;words&lt;/code&gt; &lt;code&gt;Array&lt;/code&gt; from the transcript&lt;/li&gt;
&lt;li&gt;extracts &lt;code&gt;wordsPerFrame&lt;/code&gt; words at a time (default of 3 words)&lt;/li&gt;
&lt;li&gt;creates an &lt;code&gt;Array&lt;/code&gt; of new, reduced versions of the original word Objects, using the first word's &lt;code&gt;start&lt;/code&gt; and the last word's &lt;code&gt;end&lt;/code&gt; values for every set of words:&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;


&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;alignedWord&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;case&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;success&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="c1"&gt;// the last word's `end` property&lt;/span&gt;
  &lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;number&lt;/span&gt; &lt;span class="c1"&gt;// the first word`s `start` property&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="s2"&gt;```

- `&lt;/span&gt;&lt;span class="nx"&gt;getWordDuration&lt;/span&gt;&lt;span class="s2"&gt;`
   - takes a word object and returns the difference (in seconds) between the `&lt;/span&gt;&lt;span class="nx"&gt;start&lt;/span&gt;&lt;span class="s2"&gt;` and `&lt;/span&gt;&lt;span class="nx"&gt;end&lt;/span&gt;&lt;span class="s2"&gt;` values. 
  - this "duration" is used to determine how many frames need to be rendered for each set of words

- `&lt;/span&gt;&lt;span class="nx"&gt;renderWordFrames&lt;/span&gt;&lt;span class="s2"&gt;`
    - takes the word (empty string if no lyrics are spoken during those frames) and duration of the word
    - creates a new 2D `&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="s2"&gt;` object
    - fills it with the words' text
    - gets the `&lt;/span&gt;&lt;span class="nx"&gt;dataURL&lt;/span&gt;&lt;span class="s2"&gt;` using the `&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toDataURL&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="s2"&gt;` property on the `&lt;/span&gt;&lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="s2"&gt;` element
   - saves it to the folder-object-to-be-zipped with filenames starting with `&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;png&lt;/span&gt;&lt;span class="s2"&gt;`
    - This filename convention was chosen since it's the default filename sequence that `&lt;/span&gt;&lt;span class="nx"&gt;ffmpeg&lt;/span&gt;&lt;span class="s2"&gt;` expects

#### Generating the Video From Rendered Frames

Now that I have an image file for each frame of the video, I can use `&lt;/span&gt;&lt;span class="nx"&gt;ffmpeg&lt;/span&gt;&lt;span class="s2"&gt;` to stich them together. I have found the following parameters to be successful:

`&lt;/span&gt;&lt;span class="nx"&gt;ffmpeg&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;framerate&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;%d.png&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="mi"&gt;640&lt;/span&gt;&lt;span class="nx"&gt;x480&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="nx"&gt;libx264&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt; &lt;span class="nx"&gt;high&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;crf&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;pix_fmt&lt;/span&gt; &lt;span class="nx"&gt;yuv420p&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mp4&lt;/span&gt;&lt;span class="s2"&gt;`

- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;framerate&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="s2"&gt;` sets the video frame rate to 30 frames per second
- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;%d.png&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;` matches the sequential filenames
- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="s2"&gt;` sets the size of the video frame (corresponding to the `&lt;/span&gt;&lt;span class="nx"&gt;canvas&lt;/span&gt;&lt;span class="s2"&gt;` element size, in this exampel, 640x480)
- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="s2"&gt;` specifies the video codec (I've used `&lt;/span&gt;&lt;span class="nx"&gt;libx264&lt;/span&gt;&lt;span class="s2"&gt;` which is recommended by YouTube and Instagram)
- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;v&lt;/span&gt;&lt;span class="s2"&gt;` sets the quality of the video to `&lt;/span&gt;&lt;span class="nx"&gt;high&lt;/span&gt;&lt;span class="s2"&gt;` (haven't fully understood how it works yet) 
- `&lt;/span&gt;&lt;span class="nx"&gt;crf&lt;/span&gt;&lt;span class="s2"&gt;` is the "Constant Rate Factor" which I haven't fully understood, but it ranges from 0 (lossless) to 51 (lowest quality)
- `&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;pix_fmt&lt;/span&gt;&lt;span class="s2"&gt;` sets the pixel format used, in this case, `&lt;/span&gt;&lt;span class="nx"&gt;yuv420&lt;/span&gt;&lt;span class="s2"&gt;` which sets the ratio of pixels for luminance Y (or brightness), chrominance blue U and chrominance red V. I'm pretty rough on these concepts so please correct or enlighten if you are more experienced.

This command generates a video at the output path, stiching the images together at a given framerate.

#### Adding the Song Audio

Now that I have the video for the lyrics, I can add the song audio (full song not just the vocals) using:

`&lt;/span&gt;&lt;span class="nx"&gt;ffmpeg&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;video&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;audio&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;vcodec&lt;/span&gt; &lt;span class="nx"&gt;libx264&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;acodec&lt;/span&gt; &lt;span class="nx"&gt;libmp3lame&lt;/span&gt; &lt;span class="nx"&gt;path&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mp4&lt;/span&gt;&lt;span class="s2"&gt;`

The first two input flags identify the video and audio files which will be streamed together using the video codec and audio codec specified.

#### The Result

Here's what I end up with!

&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/4Wpp0ldTscY"&gt;
&lt;/iframe&gt;


It's pretty rough but the adrenaline rush was real when I saw it the first time.

## Next Steps

I consider this a successful Proof-Of-Concept. Here are my next steps:

- Over time, the lyrics fall out of sync with the audio, and this is most likely due to the fact that I rely on rounding the number of frames at 3 different places in the script

- The manner in which the three words align with the vocals is suboptimal. I may consider increasing the number of words shown per set of frames

- It's dull! The project is called `&lt;/span&gt;&lt;span class="nx"&gt;animatemusic&lt;/span&gt;&lt;span class="s2"&gt;` and this video is lacking interesting animations. If you recall, the word objects contain an array of phonemes used to pronounce the word. Mixing this with [anime.js, particularly their morphing animation](https://animejs.com/documentation/#morphing) will lead to some interesting lip sync animation attempts down the road

- The process is fragmented. Generating the forced-alignment output, generating the video frame images and generating the final output video currently takes places in three separate manual steps. I would like to eventually integrate these different services

- Integrations. The eventual goal is to connect this process with my YouTube and Instagram accounts so that I can upload to them upon completion using their APIs

- Refactoring. There's a lot of improvements needed in my script and I now feel confident enough to dive in and build this project out properly with tests

## Feedback

If you can help me improve my code, blog post, or my understanding of the context and concepts around anything you read above, please leave a comment below.

## Follow Me

[YouTube](https://www.youtube.com/channel/UCqvA0CVB3QR3TLpGVzkD35g)
[Instagram](https://www.instagram.com/vbaknation)

Thanks for reading!





&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>javascript</category>
      <category>ffmpeg</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
