Michel Sabchuk

Posted on Apr 14

Learning Spell: using Cloudflare's AI to improve speaking skills

#cloudflarechallenge #devchallenge #ai #remix

This is a submission for the Cloudflare AI Challenge.

What I Built

Learning Spell is a web application for practicing language skills. The app challenges users with a sentence generated by Cloudflare's Text Generation models, which they read aloud while being recorded. The recording is transcribed using Cloudflare's Automatic Speech Recognition, and the transcription is then compared with the original sentence.

Demo

You can test the application at the link below:

https://learning-spell.turbosys.workers.dev/

Watch the video below for a quick demonstration:

My Code

The code is available on GitHub:

michelts / learning-spell

Web application for practicing language skills

Learning Spell

Technical overview

The project uses Remix running on Cloudflare Workers.

Development

Install dependencies using pnpm install
Create the database by running the command npx wrangler d1 migrations apply --local DB
Start the development server using pnpm dev

Open up http://127.0.0.1:8787 and you should be ready to go!

The project uses Cloudflare Workers AI, so you might need to set your account ID even to run the project locally. Do so by creating an .env file with the variable CLOUDFLARE_ACCOUNT_ID set to your account ID.

Deployment

If you don't already have a Cloudflare account, create one here. After verifying your email address with Cloudflare, go to your dashboard and set up your free custom Cloudflare Workers subdomain.

Once that's done, you should be able to deploy your app:

npm run deploy

You might need to change the database name and…

Journey

When I learned about the hackathon, I was afraid it was too late. It was the last Monday before the due date, with all the usual work and life stuff going on.

But I had been wanting to test Cloudflare Workers for a while (plus a couple of other tools like Remix, Drizzle, and Tailwind). It was a good excuse!

In my current position, we have been working on speech-to-text and text-to-speech features (we use AWS there), and I wanted to see what I could do with Cloudflare's tooling.

I had also just finished reading all the Harry Potter books to my two daughters (we have read books together before bedtime since they were babies).

With that, the theme I picked came to me naturally: an application to allow kids (and adults, of course) to practice speaking by reading Harry Potter movie quotes, all set in a magical ambiance.

Proof of concept

The POC consisted of:

Recording audio using react-audio-voice-recorder
Submitting it to a worker using Remix's useSubmit and multipart/form-data
Transcribing it using automatic speech recognition
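The last two steps can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the form field name "audio", the AiBinding interface, and the model id "@cf/openai/whisper" are assumptions (the post only says "automatic speech recognition"). In a Remix action, the FormData would come from request.formData() and the binding from the Cloudflare context.

```typescript
// Minimal shape of the Workers AI binding as used here (an assumption, not the full type).
interface AiBinding {
  run(model: string, input: { audio: number[] }): Promise<{ text?: string }>;
}

// Assumed model id; the post does not name the speech recognition model.
const ASR_MODEL = "@cf/openai/whisper";

// The speech recognition input is the audio as a plain array of byte values.
function toWhisperInput(audio: ArrayBuffer): { audio: number[] } {
  return { audio: Array.from(new Uint8Array(audio)) };
}

// Reads the recording from a multipart/form-data body and transcribes it.
// "audio" is a hypothetical field name chosen for this sketch.
async function transcribeUpload(ai: AiBinding, form: FormData): Promise<string> {
  const file = form.get("audio");
  if (!(file instanceof Blob)) {
    throw new Error("Expected an audio file in the 'audio' field");
  }
  const result = await ai.run(ASR_MODEL, toWhisperInput(await file.arrayBuffer()));
  return (result.text ?? "").trim();
}
```

Keeping the binding behind a small interface makes the handler easy to exercise with a fake in tests, without calling the real model.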

Remix is compatible with Cloudflare Workers. I use React and Django in my current position and we use react-router-dom, so the mental model and the project structuring feel familiar to me. The experience with Remix was very positive!

I began with a couple of hardcoded sentences, but my goal was to use text generation to get them.

Generating quotes

The lack of time didn't allow me to thoroughly test which text generation model best fits my use case. I hand-picked a couple of them and ended up using mistral-7b-instruct-v0.1-awq to return sentences in JSON format.
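As an illustration of asking a model for JSON and reading the result defensively, here is a minimal sketch. The prompt wording, the TextAi interface, the helper names, and the full model id "@hf/thebloke/mistral-7b-instruct-v0.1-awq" are assumptions; the post does not show the real prompt.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Minimal shape of the Workers AI binding for text generation (an assumption).
interface TextAi {
  run(model: string, input: { messages: ChatMessage[] }): Promise<{ response?: string }>;
}

// Assumed full id for the mistral-7b-instruct-v0.1-awq model named in the post.
const TEXT_MODEL = "@hf/thebloke/mistral-7b-instruct-v0.1-awq";

// Hypothetical prompt; the original wording is not shown in the post.
function buildPrompt(theme: string, count: number): ChatMessage[] {
  return [
    { role: "system", content: "Reply only with a JSON array of strings, with no extra text." },
    { role: "user", content: `Give me ${count} short, memorable quotes from ${theme}.` },
  ];
}

// Models often wrap JSON in prose, so take the outermost [...] span before parsing.
function parseQuotes(raw: string): string[] {
  const start = raw.indexOf("[");
  const end = raw.lastIndexOf("]");
  if (start === -1 || end <= start) return [];
  try {
    const parsed: unknown = JSON.parse(raw.slice(start, end + 1));
    if (!Array.isArray(parsed)) return [];
    return parsed
      .filter((q): q is string => typeof q === "string" && q.trim() !== "")
      .map((q) => q.trim());
  } catch {
    return []; // Malformed JSON: treat as "no quotes" rather than failing the request.
  }
}

async function generateQuotes(ai: TextAi, theme: string): Promise<string[]> {
  const result = await ai.run(TEXT_MODEL, { messages: buildPrompt(theme, 3) });
  return parseQuotes(result.response ?? "");
}
```

Returning an empty list on bad output lets the caller fall back to hardcoded sentences instead of surfacing a server error.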

I could probably automate the model comparison by generating a couple of sentences with each model through the same app workflow and comparing the results. That's for the next steps!

The model generates the same sentence for the same instructions, so to be able to generate additional quotes, I'm keeping the generated messages in a database table. Even with that, I can only generate a couple of distinct sentences; after some point, the model returns repeated content.
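One way to use stored messages, sketched here as an assumption rather than a description of the project's code, is to replay earlier quotes as previous conversation turns and reject candidates that were already seen. All function names below are hypothetical, and the database access is left out: `stored` stands for the rows read from the table.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Case- and punctuation-insensitive key used to spot repeated quotes.
function quoteKey(quote: string): string {
  return quote.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Replays stored quotes as earlier turns so the next request has a different context.
function buildHistory(request: string, stored: string[]): ChatMessage[] {
  const messages: ChatMessage[] = [];
  for (const quote of stored) {
    messages.push({ role: "user", content: request });
    messages.push({ role: "assistant", content: quote });
  }
  messages.push({ role: "user", content: request });
  return messages;
}

// Returns the candidate only if an equivalent quote is not already stored.
function acceptIfNew(candidate: string, stored: string[]): string | null {
  const seen = new Set(stored.map(quoteKey));
  return seen.has(quoteKey(candidate)) ? null : candidate;
}
```

When acceptIfNew returns null, the caller could retry a bounded number of times or serve a stored quote instead.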

I didn't study how to improve that for one reason: generating the same quotes for a single theme using AI seems overkill! This approach would shine when generating sentences for different themes, and at that point, caching the sentences in the database could be appropriate.

Imagine students picking their own theme: Harry Potter, Dragon Ball, The Godfather, or whatever they like! This was outside the scope of my POC though: with what I already did, I know it is possible and that's enough for now!

Comparing sentences

I'm using jsdiff to compare sentences and render the correct, incorrect, and missing terms:
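To make the three categories concrete, here is a dependency-free sketch of a word-level comparison. It is a stand-in built on a longest-common-subsequence table, not jsdiff and not the project's code; the Term type, the function names, and the English-only tokenizer are assumptions for illustration.

```typescript
type Status = "correct" | "missing" | "incorrect";

interface Term {
  word: string;
  status: Status;
}

// Lowercases and strips punctuation so "Found," and "found" compare equal (English-only sketch).
function tokenize(sentence: string): string[] {
  return sentence.toLowerCase().replace(/[^a-z0-9'\s]/g, "").split(/\s+/).filter(Boolean);
}

// Classifies words as correct (in both), missing (expected but not spoken),
// or incorrect (spoken but not expected).
function compareSentences(expected: string, spoken: string): Term[] {
  const a = tokenize(expected);
  const b = tokenize(spoken);

  // lcs[i][j] holds the LCS length of a[i..] and b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both word lists, following the table to keep as many matches as possible.
  const terms: Term[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      terms.push({ word: a[i], status: "correct" });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      terms.push({ word: a[i], status: "missing" });
      i++;
    } else {
      terms.push({ word: b[j], status: "incorrect" });
      j++;
    }
  }
  while (i < a.length) terms.push({ word: a[i++], status: "missing" });
  while (j < b.length) terms.push({ word: b[j++], status: "incorrect" });
  return terms;
}
```

For the expected sentence "Happiness can be found" and the transcription "happiness can found here", this marks "be" as missing and "here" as incorrect, with the remaining words correct.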

Layout

Time to make the app beautiful! I have been playing with Tailwind on side projects and it can make you really productive! I love the utility-first approach!

It is powerful! Even the wand from the image below was built using Tailwind:

One interesting thought: I have more than a decade of experience using CSS, so translating Tailwind to CSS and vice versa is easy for me. I wonder how it feels for someone unfamiliar with CSS 👀.

Translation

I'm Brazilian and so are my daughters. I like to give them a chance to understand what they are reading, so I also included a translation of the sentence, but only after submitting, to avoid drawing attention away from the main task:

For now, I'm hard-coding the translation to Portuguese. In a production app though, I would read the user's language to translate conditionally (e.g. English speakers don't need a translation).
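A conditional translation could look roughly like the sketch below, which reads the browser's Accept-Language header. Everything here is an assumption rather than the project's code: the helper names, the TranslateAi interface, the model id "@cf/meta/m2m100-1.2b", and the idea that the model accepts short language codes such as "pt" (worth checking against the Workers AI documentation).

```typescript
// Minimal shape of the Workers AI binding for translation (an assumption).
interface TranslateAi {
  run(
    model: string,
    input: { text: string; source_lang: string; target_lang: string }
  ): Promise<{ translated_text?: string }>;
}

// Assumed model id; the post does not name the translation model.
const TRANSLATION_MODEL = "@cf/meta/m2m100-1.2b";

// Returns the primary subtag of the highest-weighted language, e.g. "pt" for "pt-BR".
function preferredLanguage(acceptLanguage: string | null): string | null {
  if (!acceptLanguage) return null;
  const ranked = acceptLanguage
    .split(",")
    .map((part) => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      return {
        lang: tag.trim().toLowerCase().split("-")[0],
        weight: q ? parseFloat(q.slice(2)) : 1,
      };
    })
    .filter((entry) => entry.lang !== "" && entry.lang !== "*" && !Number.isNaN(entry.weight))
    .sort((x, y) => y.weight - x.weight);
  return ranked.length > 0 ? ranked[0].lang : null;
}

// Translates only when the reader's language differs from the sentence's language.
async function translateIfNeeded(
  ai: TranslateAi,
  sentence: string,
  acceptLanguage: string | null,
  sentenceLang: string = "en"
): Promise<string | null> {
  const target = preferredLanguage(acceptLanguage);
  if (target === null || target === sentenceLang) return null;
  const result = await ai.run(TRANSLATION_MODEL, {
    text: sentence,
    source_lang: sentenceLang,
    target_lang: target,
  });
  return result.translated_text ?? null;
}
```

With this shape, an English-speaking reader never triggers a model call, which also saves a request per submission.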

Takeaways

Cloudflare Workers uses its own runtime. I faced only one compatibility issue: generating ULIDs. The ulid package is not compatible with Workers.

I overcame it by using ulid-workers, which has its own limitations but works for my use case.

Also, Remix sourcemaps didn't work with Cloudflare Workers, although it seems they should be working by now. I should double-check!

Other than that, the experience was great!

Next steps

The project is already fun, but there's a lot that can be done further:

Multiple themes for better variance
Better fit text translation in the app
Add monitoring tools (e.g. Sentry)
Better handle server errors
Parse transcription with streaming
Authentication

Multiple Models and/or Triple Task Types

The project is using 3 different task types:

Text generation
Automatic Speech Recognition
Translation