This is a submission for the Cloudflare AI Challenge.
What I Built
Voice Journal is a journal app that transcribes your natural speech into what I call a voice note 🔉. It also stores your voice recordings for playback ⏯️.
It has built-in AI chat integration, so you can ask anything about your notes, like "list the activities I did today." You can also generate a concise summary of a voice note.
Demo
Here, try the live version of Voice Journal, deployed on Cloudflare Pages and Workers.
My Code
Here’s the GitHub repository
The project is structured into two directories: web for the frontend and worker for the AI Worker.
Journey
It was my first time using Cloudflare Workers AI, and I had an amazing developer experience with it. I started building the app on Workers AI and used Hono as the routing library.
Apart from this, I was getting an error while calling the whisper model via the REST API. I then tried the same call through Cloudflare's AI bindings for Workers, and it worked. Because of that issue, I had to deploy the Worker separately.
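A minimal sketch of what the binding call looks like inside a Worker handler — the route shape and response here are my illustration, not the app's actual code:

```javascript
// Sketch of a Worker handler that calls Whisper through the AI binding
// instead of the REST API. `env.AI` is provided by the Workers runtime
// when an AI binding is configured in wrangler.toml.
async function handleTranscribe(request, env) {
  const buffer = await request.arrayBuffer();
  const result = await env.AI.run("@cf/openai/whisper", {
    audio: [...new Uint8Array(buffer)],
  });
  return new Response(JSON.stringify(result), {
    headers: { "content-type": "application/json" },
  });
}
```

With the binding, there is no token or endpoint to manage in the request itself, which sidesteps the errors I hit on the REST path.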
I built the frontend with Next.js, Tailwind CSS, and shadcn components, integrated the AI Worker, and finally deployed it on Cloudflare Pages.
I have integrated three AI models; here are the details for each:
The whisper model is used for generating transcripts from audio; it takes the audio as an ArrayBuffer converted to an array of 8-bit unsigned integers.
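The conversion itself is small; this sketch shows the shape of the transformation (the sample bytes are made up):

```javascript
// Convert raw audio (an ArrayBuffer) into a plain array of unsigned
// 8-bit integers, the input format the Whisper model expects.
function toWhisperAudio(buffer) {
  return [...new Uint8Array(buffer)];
}

// Example with three fake "audio" bytes:
const fakeAudio = new Uint8Array([0, 128, 255]).buffer;
console.log(toWhisperAudio(fakeAudio)); // [ 0, 128, 255 ]
```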
For text generation, I integrated the mistral-7b-instruct-v0.1 model, which powers the chat over your notes.
To provide a summary of voice notes, the bart-large-cnn model is used, which summarizes the recorded voice notes. I think it needs more fine-tuning for better results.
The most challenging part of the whole build was handling the audio data format and figuring out how to process it, as it was my first time dealing with audio.
The future plan is to connect Cloudflare R2 object storage for the audio files and to fine-tune the summarization model.
Multiple Models and/or Triple Task Types
Voice Journal uses three models, one per task type:
- whisper for speech-to-text conversion
- mistral-7b-instruct-v0.1 for text generation
- bart-large-cnn for summarizing the notes
Top comments (4)
The AI isn't really necessary honestly. It's like AI for the sake of AI. I asked it how many notes I made, and it gave me a rambly answer and an estimate of 30 to 40 notes. There were 5. That was kinda funny.
Currently, it acts as a generic AI chatbot, because I haven't provided the model with the context of the user's notes. Still working on that...
Edit: It's fixed now.
This sounds interesting, waiting for fine tune
Thanks! I also think the summarization model is still in beta; that's why the results are only alright.