Karan Rathod for ToolJet

Posted on Aug 29, 2024 • Originally published at blog.tooljet.com

Build An Audio Transcriber and Analyzer using ToolJet and OpenAI🎙️

#javascript #webdev #ai #programming

In this hands-on tutorial, we'll build an audio transcriber and analyzer using ToolJet and OpenAI. We'll design the UI with ToolJet's pre-built components, then use the platform's query builder to call OpenAI for audio transcription and analysis.

By the end of this tutorial, we'll have a foundation for building more sophisticated transcription and audio analysis applications.

Prerequisites:

ToolJet (https://github.com/ToolJet/ToolJet): An open-source, low-code platform designed for quickly building and deploying internal tools. Sign up for a free ToolJet cloud account here.
OpenAI account: Register for an OpenAI account to use AI-powered features in your ToolJet applications. Sign up here.

Here's a quick preview of what we are going to build:

Before you begin, go to the OpenAI console and copy your secret key. Next, log in to your ToolJet account, open the Data Sources section in the left sidebar, and configure OpenAI as a data source using the secret key.

Once the data source is configured, create a new app called "Speech Insight" from the dashboard. And with that, we are ready to start building our application.

Step 1: Building the UI for the Audio Transcriber

Let's use ToolJet's visual app builder to design our UI.

For the app header, drag and drop an Icon component on the canvas. Navigate to its properties panel on the right, and select the IconBrandDingtalk icon.
Drop a Text component next to it, and enter "Speech Insight" under its Data property.
Change the color of both components to blue (#3E63DD). This will be the primary color of our app, so apply the same color scheme to the remaining components.

Place a Container component below the header. We will organize the upcoming components inside the Container component.
Rename it to mainContainer.

On the top left of the Container component, place a Text component with the label "Output".
Below it, add another Container and place two Text components inside it. We will use these components to display the transcribed text and feedback. Name them transcribedText and feedback respectively.
Add a File Picker component below it and rename it to uploader. Change its Accept file types property to "audio/*".
Place a Button component below it, rename it to analyzeButton, and set its label to "Analyze Audio".

Note: We are renaming key components to make them easier to reference in other parts of our application.

Finally, place four Statistics components on the right for Fluency, Pronunciation, Intonation, and Vocabulary scores.
Drop a Button component below them, label it "Copy Output", and rename it to copyButton.

The UI is now ready! Time to configure the interactions with OpenAI.

Step 2: Interact With OpenAI

In the steps below, we'll configure the interaction with OpenAI using both a REST API query and ToolJet's native integration.

Expand the query panel at the bottom and click on the Add button to create a new REST API query. Rename the query to transcribe.
Enter the OpenAI endpoint under the URL property: https://api.openai.com/v1/audio/transcriptions.
Add a new row under Header, enter Content-Type as the key and multipart/form-data as the value.
Add another key for Authorization. Enter Bearer <OPEN_AI_KEY> in the value.

Under Body, add file as the key and {{ components.uploader.file[0] }} as the value. This ensures that the audio file selected in the uploader File Picker component is sent with the request.
Add model as the key and enter whisper-1 as the value.

Now if we select an audio file in the uploader/File Picker component and click on the Run button, we will see the transcribed audio as the output.
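For readers who want to see what this query amounts to outside ToolJet, here is a minimal sketch of the same request in plain JavaScript. It assumes a runtime with global fetch, FormData, and Blob (Node.js 18+ or a modern browser). The names buildTranscriptionRequest and transcribe are illustrative and are not part of ToolJet or the OpenAI SDK. Unlike the ToolJet query, the sketch does not set Content-Type by hand, because fetch generates the multipart header (including its boundary) when the body is a FormData object.

```javascript
// Hypothetical helper: assembles the same request the transcribe query sends.
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

function buildTranscriptionRequest(audioBlob, fileName, apiKey) {
  const body = new FormData();
  body.append("file", audioBlob, fileName); // the uploaded audio file
  body.append("model", "whisper-1");        // same model as in the query

  return {
    url: TRANSCRIPTION_URL,
    options: {
      method: "POST",
      // No Content-Type here: fetch adds multipart/form-data plus the boundary.
      headers: { Authorization: `Bearer ${apiKey}` },
      body,
    },
  };
}

// Sends the request and returns the transcript from the response's text field.
async function transcribe(audioBlob, fileName, apiKey) {
  const { url, options } = buildTranscriptionRequest(audioBlob, fileName, apiKey);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Transcription failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.text;
}
```

Splitting the request construction from the network call keeps the first part easy to inspect without spending API credits.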

Once the audio is transcribed, we need to analyze it to provide a score. Let's use the native OpenAI integration for this query.

Click on the Add button and add a new query. Select OpenAI as the data source for this query. This is the same data source that we set up at the beginning. Rename the query to analyze.
Select Chat as the operation and Message as the input, and enter the prompt below:

Based on the transcribed audio below, provide a JSON object in response with the following details:

   - Fluency (out of 10)
   - Pronunciation (out of 10)
   - Vocabulary (out of 10)
   - Intonation (out of 10)
   - A paragraph that gives general feedback on the transcribed text's quality and overall improvement suggestions.

   Return the object in the following format:

   {fluency: "...", pronunciation: "...", vocabulary: "...", intonation: "...", feedback: "..."}

   Transcribed text:
   {{queries.transcribe.data.text}}

In this prompt, we ask OpenAI to analyze the audio transcription in detail. The prompt references the text returned by the transcribe query and lists the scoring criteria.

Running this query will result in the following output:

Both queries are ready. As a final step, let's trigger the analyze query automatically every time the transcribe query succeeds.

Go back to the transcribe query, navigate to Events and add a new event handler.
Select Query Success as the Event, Run Query as the Action, and analyze as the Query.

With this event handler in place, the analyze query runs automatically once the transcribe query has succeeded and its output is available for analysis.
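The Query Success event is the low-code equivalent of chaining two asynchronous calls, where the second runs only if the first succeeds. A minimal sketch of that idea in plain JavaScript follows. Here runTranscribe and runAnalyze are hypothetical stand-ins for the two queries, passed in as functions; they are not ToolJet APIs.

```javascript
// Runs the analysis step only after transcription succeeds.
// Both arguments are hypothetical async functions standing in for the queries.
async function runPipeline(runTranscribe, runAnalyze) {
  // Step 1: audio -> text. A rejection here stops the pipeline.
  const transcription = await runTranscribe();
  if (!transcription || typeof transcription.text !== "string") {
    throw new Error("Transcription returned no text; skipping analysis");
  }

  // Step 2: text -> analysis, using the output of step 1.
  const analysis = await runAnalyze(transcription.text);
  return { transcript: transcription.text, analysis };
}
```

If the first step throws, the second never runs, which mirrors how the analyze query is only triggered on the success event.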

Step 3: Binding the Transcripts and Analysis to Components

On to the final step. We have built our UI and the queries that interact with OpenAI. Now we can connect everything and see the app in action.

Select the Analyze Audio button, navigate to its properties panel on the right and add a new event.
Select On click as the Event, Run Query as the Action, and transcribe as the Query.

Now every time the Analyze Audio button is clicked, it will trigger the transcribe query.

Select the Copy Output button and add a new event to it.
Select On click as the Event, Copy to clipboard as the Action, and {{queries.analyze.data}} as the Text.

This configuration will ensure that the analyzed output gets copied when you click on the Copy Output button.
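Since the analyze query returns a JSON string, the copied text will be raw JSON. If a more readable format is preferred, the string could be reshaped first. The sketch below shows one way to do that in plain JavaScript. The function name formatAnalysis is hypothetical, and the field names are the ones requested in the prompt, assuming the model returns them as valid JSON.

```javascript
// Hypothetical formatter: turns the analyze query's JSON string into plain text.
function formatAnalysis(rawJson) {
  const result = JSON.parse(rawJson); // throws if the string is not valid JSON
  const scoreKeys = ["fluency", "pronunciation", "vocabulary", "intonation"];

  // One "Label: score/10" line per scoring criterion.
  const lines = scoreKeys.map((key) => {
    const label = key[0].toUpperCase() + key.slice(1);
    return `${label}: ${result[key]}/10`;
  });

  // Blank line, then the feedback paragraph.
  lines.push("", `Feedback: ${result.feedback}`);
  return lines.join("\n");
}
```

One option, assuming your ToolJet version supports JavaScript queries, would be to run logic like this in such a query and pass its result to the Copy to clipboard action instead of the raw query data.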

Select the Text component that we had placed to display the transcript. Enter the following value under its Data property:
Transcript: {{queries.transcribe.data.text}}
Select the Text component to display the feedback. Enter the following value under its Data property:
Feedback: {{JSON.parse(queries.analyze.data).feedback}}

Note: The analyze query returns a JSON string, so we need to parse it into a JavaScript object before we can read its individual fields.
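JSON.parse only accepts strict JSON, where keys are double-quoted. If the model copies the {fluency: "..."} shape from the prompt literally, or wraps its reply in a Markdown code fence, the parse will throw and the bound components will show an error. The sketch below is one defensive approach, not something the tutorial's app requires. The names parseAnalysis, EXPECTED_KEYS, and FENCE are hypothetical.

```javascript
// Hypothetical helper: parses the model's reply and checks the expected fields.
const EXPECTED_KEYS = ["fluency", "pronunciation", "vocabulary", "intonation", "feedback"];
const FENCE = "`".repeat(3); // Markdown code-fence marker

function parseAnalysis(raw) {
  let cleaned = String(raw).trim();

  // Models sometimes wrap JSON in a fenced block; strip the fence if present.
  if (cleaned.startsWith(FENCE) && cleaned.endsWith(FENCE)) {
    const firstNewline = cleaned.indexOf("\n");
    const start = firstNewline === -1 ? FENCE.length : firstNewline + 1;
    cleaned = cleaned.slice(start, -FENCE.length).trim();
  }

  let parsed;
  try {
    parsed = JSON.parse(cleaned);
  } catch (error) {
    return { ok: false, error: `Reply is not valid JSON: ${error.message}` };
  }

  if (parsed === null || typeof parsed !== "object") {
    return { ok: false, error: "Reply is not a JSON object" };
  }

  // Report any fields the prompt asked for that the reply left out.
  const missing = EXPECTED_KEYS.filter((key) => !(key in parsed));
  if (missing.length > 0) {
    return { ok: false, error: `Missing fields: ${missing.join(", ")}` };
  }
  return { ok: true, value: parsed };
}
```

Returning an { ok, value, error } object instead of throwing lets the caller decide what to display when the reply is malformed.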

Select the Statistics component for "Fluency" and enter the below value under its Primary value property:
{{JSON.parse(queries.analyze.data).fluency}}
Update the rest of the Statistics components using the same logic.

Our audio transcriber and analyzer is now complete. Upload an audio file and click on the Analyze Audio button to see the transcription, feedback, and scores populate the UI.

Conclusion

In this tutorial, we learned how to create a complete audio transcription and analysis tool using ToolJet and OpenAI. We walked through designing an intuitive user interface, setting up API queries to interact with OpenAI, and binding the results to display transcriptions, feedback, and speech analysis scores.

To further customize the application, experiment with different UI components to enhance the user experience or integrate additional APIs to analyze other aspects of the audio, such as emotion detection or language translation.

To learn more, check out ToolJet's official documentation or connect on Slack if you have questions.

Top comments (2)

Pratik Agrawal • Aug 30 '24

Great Read 🚀


William • Aug 29 '24

Ah neat.

Another post about Open AI.

With 86 likes and 16 bookmarks at the time of my comment.

0 interaction. No comments. Nothing.

@mods can we sort this out? It's clearly engagement farming or botting. There are dozens of posts like this every week. They all follow the same format, and add no value to dev.to.
