TL;DR
A look at dot-transcriber, a side project of a side project that lets me make use of my commute for work: it processes voice messages, rewrites them as structured notes, and gives me an AI second brain.
Github: https://github.com/ffex/dot-transcriber
My needs
I was running late preparing my talk on Rustboy (if you like the Game Boy, check it out!) for FOSDEM, so I wanted to dedicate all my available time to finishing it. Could I also use my commute to work on it? Was that feasible?
I definitely didn't invent anything new. I built an agent that turns an audio message (complete with the pauses and digressions typical of a creative thought process!) into structured notes (Obsidian-style, which I love).
The goal is also to run it locally using Whisper and Ollama. Here’s the current status:
- Whisper for transcription
- Ollama (I need to find a model I’m satisfied with) for correction
- Corrected text is fed back into Ollama, which checks for relevance to previous notes so they can be linked.
- Communication via Telegram
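The flow above can be sketched as a small pipeline. This is my own illustration, not the project's actual code: the function names, the dependency-injection style, and the stub implementations are all assumptions.

```python
# Hypothetical sketch of dot-transcriber's flow: transcribe -> correct -> link.
# The real project is vibecoded and its internals may look nothing like this.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    transcribe: Callable[[str], str]  # e.g. Whisper on an audio file path
    correct: Callable[[str], str]     # e.g. an Ollama model cleaning up the text
    link: Callable[[str], str]        # e.g. an Ollama pass adding [[links]] to old notes

    def run(self, audio_path: str) -> str:
        raw = self.transcribe(audio_path)  # 1. local speech-to-text
        cleaned = self.correct(raw)        # 2. fix pauses and digressions
        return self.link(cleaned)          # 3. connect to previous notes


# Stub usage; real implementations would call Whisper and Ollama here.
pipeline = Pipeline(
    transcribe=lambda path: "raw transcript",
    correct=lambda text: text.capitalize(),
    link=lambda text: text + " [[Rustboy]]",
)
```

With real implementations plugged in, `run` would take a downloaded Telegram voice file and return the final linked note.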
Clearly, this is all vibecoded because if I didn’t even have time for Rustboy, I certainly didn’t have time for this other project!
What I got
A Telegram bot: I send it audio and receive linked notes as output, and existing notes get updated when needed. It works well, although the transcriptions are imprecise and the summaries are sometimes not very accurate.
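To make the "check for relevance to previous notes" step concrete, here is a toy heuristic of my own (not the bot's actual logic, which delegates this to an LLM): score keyword overlap between the new note and each existing one, and emit Obsidian-style `[[wikilinks]]` for the best matches.

```python
# Toy relevance check: link notes that share enough significant words.
# A stand-in for the LLM-based linking the bot actually uses.
import re

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "it"}


def keywords(text: str) -> set[str]:
    """Lowercased significant words from a note body."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def related_notes(new_note: str, notes: dict[str, str], min_overlap: int = 2) -> list[str]:
    """Titles of existing notes sharing at least min_overlap keywords."""
    new_kw = keywords(new_note)
    return [title for title, body in notes.items()
            if len(new_kw & keywords(body)) >= min_overlap]


def with_links(new_note: str, notes: dict[str, str]) -> str:
    """Append Obsidian-style [[wikilinks]] for the related notes."""
    links = " ".join(f"[[{t}]]" for t in related_notes(new_note, notes))
    return f"{new_note}\n\nRelated: {links}" if links else new_note
```

An LLM can judge relevance far more subtly than word overlap, but a cheap heuristic like this is useful as a sanity check on what the model links.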
What I learned
- I learned that with precise, bulleted prompts, vibe coding is incredible: the output far exceeds the code I could have written myself.
- Sometimes the code, on the other hand, is underperforming or wrong!
- Reading all this generated code is mentally draining.
- I learned that this is the future, even if part of our work is being ripped away from us.
What scared me
- What am I needed for? Only to orchestrate?
- How do I intervene in the code if I haven’t read it?
- With so much generated code to read, how can I truly own it without having read it?
- When it gets down to the details, how can I stay relevant instead of just doing prompt trial and error?
What needs to be improved
- Inaccurate local transcription
- Note updates should keep better track of changes
- Logical confusion about how notes are connected
- Some notes occasionally come out with overly elaborate syntax
- Better integrate my view of Claude Code’s work. How to do that? How to simplify reading the generated code?
The future?
The idea is just to make it a useful tool for marking notes via audio message for all those times when you can’t do anything but use your voice (driving, walking, or while playing sports?).
It should be part of a much larger personal assistant, but for now, let’s stick to its task:
- transcribing audio and applying various types of processing:
- writing notes / wikis
- writing tasks or structured projects
- writing articles
- other…
- The initial idea actually originated also as “D&D Master Solo,” but that is another story.
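Those different processing modes could boil down to a simple dispatch table. A sketch under the assumption that each mode is just a different prompt template for the LLM (the names and templates here are mine, not the project's):

```python
# Hypothetical dispatch from a requested mode to a prompt template.
PROMPTS = {
    "note": "Rewrite this transcript as a wiki-style note:\n{text}",
    "task": "Extract actionable tasks from this transcript:\n{text}",
    "article": "Draft a short article from this transcript:\n{text}",
}


def build_prompt(mode: str, transcript: str) -> str:
    """Pick the template for the requested processing mode."""
    template = PROMPTS.get(mode, PROMPTS["note"])  # default to note-taking
    return template.format(text=transcript)
```

Adding a new mode (say, a D&D session log) would then be a one-line change to the table rather than new pipeline code.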
Other considerations
It is a project born before OpenClaw 🙃.
Conclusion
I'm writing this article to start a dialogue about coding agents: how do you use them? How do you get to know the code they write? idk!