This is a submission for the Cloudflare AI Challenge.
What I Built
Voice Journal is a journal app that transcribes your natural speech into what I call a voice note 🔉. It also stores your voice recordings for playback ⏯️.
It has built-in AI chat integration, so you can ask anything about your notes, like "list the activities I did today." You can also generate a concise summary of a voice note.
Demo
Here, try the live version of Voice Journal, deployed on Cloudflare Pages and Workers.
My Code
Here’s the GitHub repository
The project is structured into two directories: web for the frontend and worker for the AI Worker.
Journey
It was my first time using Cloudflare Workers AI, and I had an amazing developer experience with it. I started building the app on Workers AI and used Hono as the routing library.
Apart from this, I was getting an error while calling the whisper model via the REST API. I then tried the same call through Cloudflare's AI bindings for Workers, and it worked. Because of that issue, I had to deploy the Worker separately.
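A minimal sketch of what the binding call looks like inside a Worker handler — the route shape and response here are my illustration, not the app's actual code:

```javascript
// Sketch of a Worker handler that calls Whisper through the AI binding
// instead of the REST API. `env.AI` is provided by the Workers runtime
// when an AI binding is configured in wrangler.toml.
async function handleTranscribe(request, env) {
  const buffer = await request.arrayBuffer();
  const result = await env.AI.run("@cf/openai/whisper", {
    audio: [...new Uint8Array(buffer)],
  });
  return new Response(JSON.stringify(result), {
    headers: { "content-type": "application/json" },
  });
}
```

With the binding, there is no token or endpoint to manage in the request itself, which sidesteps the errors I hit on the REST path.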
I built the frontend with Next.js, Tailwind CSS, and shadcn components, integrated the AI Worker, and finally deployed it on Cloudflare Pages.
I have integrated three AI models; here are the details for each:
The whisper model is used for generating transcripts from audio; it takes the audio as an ArrayBuffer converted to an array of 8-bit unsigned integers.
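The conversion itself is small; this sketch shows the shape of the transformation (the sample bytes are made up):

```javascript
// Convert raw audio (an ArrayBuffer) into a plain array of unsigned
// 8-bit integers, the input format the Whisper model expects.
function toWhisperAudio(buffer) {
  return [...new Uint8Array(buffer)];
}

// Example with three fake "audio" bytes:
const fakeAudio = new Uint8Array([0, 128, 255]).buffer;
console.log(toWhisperAudio(fakeAudio)); // [ 0, 128, 255 ]
```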
For text generation, I integrated the mistral-7b-instruct-v0.1 model, which powers the chat over your notes.
To provide a summary of voice notes, the bart-large-cnn model is used, which summarizes the recorded voice notes. I think it needs more fine-tuning for better results.
The most challenging part of the whole build was handling the audio data format and figuring out how to process it, as it was my first time dealing with audio.
The future plan is to connect Cloudflare R2 object storage for the audio files and to fine-tune the summarization model.
Multiple Models and/or Triple Task Types
Voice Journal uses three models, one per task type:
- whisper for speech-to-text conversion
- mistral-7b-instruct-v0.1 for text generation
- bart-large-cnn for summarizing the notes
Top comments (4)
The AI isn't really necessary honestly. It's like AI for the sake of AI. I asked it how many notes I made, and it gave me a rambly answer and an estimate of 30 to 40 notes. There were 5. That was kinda funny.
Currently, it acts as a generic AI chatbot, because I haven't provided the model with the context of the user's notes. Still working on that...
Edit: It's fixed now.
This sounds interesting, waiting for fine tune
Thanks! I also think the summarization model is still in beta; that's why the results are only alright.