Lately, my writing workflow has settled into a style that starts with voice input.
Alongside Aqua Voice, which I have used for some time, I have tried various tools, including Whisper Transcription, which runs locally. Through that trial and error I found the setup that works best for me, so I would like to introduce it here.
The combination of voice input and Gemini Gems achieves both consistency and speed in writing
My current writing process comes down to three steps: dumping out rough thoughts by voice, structuring them with a Gemini Gem, and finally putting my soul into the text with my own hands. The "putting my soul into it" part sounds very AI-generated. I like it.
Building this workflow has dramatically lowered the psychological hurdle of writing while keeping my notes at a consistent level of quality. Above all, using different tools for different purposes has made my drafts far more accurate than before.
The difference between transcription and text generation affects the quality of output
Why is transcription alone not enough? Because tools differ greatly in how well they polish Japanese text.
Aqua Voice: Excellent. It follows instructions accurately, and there are almost no Japanese conversion errors or kanji mistakes.
Whisper Transcription: Running locally is reassuring, but because it specializes in pure transcription, I still find it weak at correcting slips of the tongue and kanji misconversions.
Neither tool includes fillers in its output. Then again, I hardly say "uh" or "um" anymore, so I cannot tell whether the tools are filtering them out or whether I have simply been trained.
Raw transcribed text, however accurate, does not become my writing as it is. This is where structuring it with generative AI becomes important.
The Shossan-style workflow that fully utilizes Gemini Gems and TextExpander
Let me introduce specifically how I prepare my output.
1. Automating structuring with Gemini Gem
I used to enter prompts via snippets, but now I use Gemini Gems.
I created a Gem dedicated to notes, and I simply paste in my voice-input text as is. That alone produces a draft organized into a readable structure that also reflects my writing style. Opening a dedicated Gem is far faster than typing instructions every time.
## Role
You are a renowned blogger and technical writer with 1 million PV per month.
You take the rough text based on voice input provided by the user (Shossan), grasp its true intent and supplement it, and elevate it into an attractive technical blog that captivates readers.
## Thought Process
Before generating a response, execute the following steps internally:
1. Supplement typos and misconversions: Infer and correct conversion errors unique to voice input from context.
2. Structural analysis: Check whether there are logical leaps or deficiencies in the content and supplement as necessary.
3. Restructuring with PREP method: Organize the structure in the order of Conclusion (Point) → Reason → Example → Conclusion (Point).
## Goals
1. Create a blog draft:
   - Attach an attractive title.
   - At the end of the body, list 10 or more space-separated tags, none containing a hyphen (-).
## Context and Constraints (Style and Tone)
- First person: Normally "watashi" (I); "Shossan" for emphasis.
- Tone: Friendly and humorous, in a conversational desu-masu style.
- Technical density: Keep specific proper nouns, processor names, version numbers, and so on, and make good use of them.
- Readability:
  - Write in clear paragraphs and keep each one short.
  - Create headings that follow the PREP structure.
## Output Format
1. [Title]
2. [Body (Markdown format)]
3. [Tags (space-separated)]
The Gem that Shossan is using right now
I created this Gem together with Gemini. I have also set up the same instructions as a project in Claude.
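The tag rule in the Gem (10 or more space-separated tags, none containing a hyphen) is easy to check mechanically before publishing. The following is only a minimal Python sketch under stated assumptions: the function name `check_tags` and its messages are hypothetical and are not part of the Gem or of Gemini.

```python
def check_tags(tag_line: str, minimum: int = 10) -> list[str]:
    """Return a list of problems found in a space-separated tag line.

    Hypothetical helper: it mirrors the Gem's tag rule
    (at least `minimum` tags, no hyphens) and nothing more.
    """
    tags = tag_line.split()  # split on any run of whitespace
    problems = []
    if len(tags) < minimum:
        problems.append(f"only {len(tags)} tags, expected at least {minimum}")
    for tag in tags:
        if "-" in tag:
            problems.append(f"tag contains a hyphen: {tag}")
    return problems


if __name__ == "__main__":
    # Example tag line as it might appear at the end of a draft.
    line = "voiceinput gemini gems textexpander whisper writing"
    for problem in check_tags(line):
        print(problem)
```

An empty list means the tag line satisfies the rule; anything else is a hint to ask the Gem to regenerate the tags or to fix them by hand.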
2. TextExpander for that extra touch of convenience
The Gemini Gem alone does not cover everything. For fine adjustments, I use TextExpander, which I have relied on for many years.
- Generation instructions for header images that match the content of the article
- Listing appropriate tags
- Inserting specific standard formats
I call these up as snippets and assemble the pieces while working with Gemini.
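For readers who have not used a text expander, the idea can be sketched in a few lines of Python. This is an illustration only, not how TextExpander is implemented: the abbreviations `;himg` and `;tags`, the snippet wording, and the function `expand` are all hypothetical.

```python
# Hypothetical abbreviations mapped to the instruction text they expand to.
SNIPPETS = {
    ";himg": "Suggest a header image prompt that matches the content of this article.",
    ";tags": "List appropriate tags for this article, separated by spaces.",
}


def expand(text: str, snippets: dict[str, str] = SNIPPETS) -> str:
    """Replace every known abbreviation in `text` with its full snippet."""
    # Replace longer abbreviations first so that one abbreviation
    # cannot clobber another that happens to contain it.
    for abbreviation in sorted(snippets, key=len, reverse=True):
        text = text.replace(abbreviation, snippets[abbreviation])
    return text


if __name__ == "__main__":
    print(expand("Draft done. ;tags"))
```

The point of the design is that a short, memorable abbreviation stands in for a longer instruction, so the same wording reaches Gemini every time.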
3. Finally, revise it in your own way as an editor
AI has improved to the point where it produces seemingly perfect sentences. Even so, they are sometimes slightly off from my intention or contain unnatural phrasing.
I always read the draft back with my own eyes at the end and revise it into my own words. Only through this step is AI-generated text elevated into a Shossan note. Sometimes I do not know whether I am a writer or an editor, but I believe this final touch is my sincerity toward readers.
Someday, a world will come where text typed on a keyboard feels warmer. Now, which paragraph in this note did I actually type on a keyboard? Do you feel the warmth?
Use tools appropriately and ensure consistent quality
The general flow has not changed since I started writing by voice input. However, preparing a dedicated mold such as a Gemini Gem has let me keep the quality of my output consistently high.
If you have tried voice input but gave up because corrections were troublesome, please try creating a dedicated AI Gem. Your thoughts should crystallize into text surprisingly smoothly.
Audio-Technica AT2020 CWH Condenser Microphone
MOTU UltraLite mk5 18in 22out USB-C Audio Interface
Quality voice input comes from high-quality devices.
If you found this post helpful, you can support my work here:
👉 Buy Me a Coffee