Khôi Nguyễn Vũ

Posted on Mar 3

Code, AI, and Family: How I Used Google Gemini to Build for the People I Love

#devchallenge #geminireflections #gemini #iot

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

My journey with Google Gemini has centered on two projects: one that fixed a huge, exhausting headache for someone I love, and another that I'm working on right now to solve a real-world, everyday problem for my family! Both projects showed me that AI is becoming far more than a fun chat box to play with: it's a serious, powerful tool for building things that actually change how people work.

Project 1: The Meeting Transcribe App
Project 2: IoT Voice Calculator
What I Learned
Feedback & Conclusion

Project 1: The Meeting Transcribe App (Completed)

I built a long-audio transcription app to help out my girlfriend. She works in market research and spends hours upon hours interviewing people. Sometimes a single deep-dive session can last for 2 to 3 hours straight! To make her research reports rock-solid and trustworthy, she really needs accurate transcripts. If you have ever tried to manually type out a 3-hour conversation, you know it can easily take 9 or 10 hours of pausing, rewinding, and typing.

The catch? Real people don't talk like news anchors. They naturally speak with heavy local accents, use everyday slang, drop subjects from their sentences, and definitely don't speak in perfect grammar. We tried popular tools like YouTube auto-subs, Fireflies AI, and Whisper. They're totally great for English, but they really struggled to make sense of natural, fast-paced conversational Vietnamese. Paying for premium transcription services was super expensive, and honestly, it only gave us about 80-90% accuracy. That meant she still had to spend hours staring at a screen, fixing silly mistakes.

So, using Google AI Studio, I built a custom React app specifically tailored for her. I initially hosted it on Google Cloud Run, but paying for a production API key for a custom personal project started to add up way too fast. My workaround? I just shared the AI Studio project directly with her and taught her how to run it "locally" on her own account using the generous free tier! It was incredibly empowering to show a non-developer how to interact directly with the AI environment. It still processes everything beautifully. The result? An amazing ~95% accuracy for Vietnamese! It turns a messy, rough draft into a nearly finished transcript, saving her so much time and mental energy. Her research project is done now, but she will definitely use this setup for future projects whenever the need pops up. Moving forward, if her teammates ever need it, I’ll redeploy the app to Google Cloud Run to turn it into a proper production project.

Project 2: IoT Voice Calculator for Vietnamese Grocery Stores (Brainstorming/Next Steps)

For my next project, I'm using Gemini 3.0 to mix software with hardware, and it's designed for my mom! She runs a small, traditional grocery store (called a tạp hóa) in our hometown, in an ordinary province of Vietnam (not a massive, tech-heavy city like Hanoi or Ho Chi Minh City).

For low-tech folks like my mom, the idea of setting up complex supermarket barcode scanners, inventory databases, and point-of-sale cash registers is way too complicated. It just adds too much workload to her day. A typical tạp hóa is super chaotic: motorbikes honking right outside, fans spinning loudly in the heat, and a bunch of customers pointing and yelling out their orders all at once. In that kind of environment, most of the time she just does mental math. If a customer buys a ton of stuff, she scrambles to write it all down on a scrap of paper to calculate it by hand while bagging the items. That's exactly how most small sellers operate in our hometown, and it is super stressful!

I'm designing a tiny, battery-powered device using an ESP32 microcontroller to fix this. The idea is simple: she doesn't have to touch anything. She just speaks naturally while working, like, "Hai chai nước tương, một ký đường" (Two bottles of soy sauce, one kilo of sugar). The device acts as the ear, sending the audio straight to my server, and Gemini acts as the brain. It understands the spoken Vietnamese, matches the colloquial item names to her store's database, and calculates the exact total in a flash! I'm also building a backend dashboard, but it's purely for me to manage things from afar. I'll use it to onboard new devices, instantly lock a device if it ever gets stolen, check how often she is using it, and manually add new local slang so the AI gets smarter over time. If it works well for her and makes her days easier, I'm going to let other local sellers in the neighborhood try it out too!

Demo

Check out the Transcribe App Prompt

What I Learned

Building the Transcribe app and mapping out the IoT device taught me some really valuable, hard-earned lessons:

Designing for Real People: I learned that high-tech solutions (like those fancy barcode scanners) completely fail if they don't match the user's actual tech level. Building an app for my mom means the UI has to be completely invisible. Buttons are intimidating; voice is natural. She shouldn't have to learn a new operating system or change how she works; she just needs to talk out loud and get the correct answer back.
The Challenge of Local Dialects: Vietnamese isn't just one standard sound. We have so many incredibly distinct regional accents from the North, Central, and South! Because Vietnamese isn't as globally dominant as English in AI training data, I realized that getting the AI to reliably recognize speech and calculate money across different provinces will take time, patience, and constant tweaking with local slang. It is a long-term learning process for both me and the model.
Handling Large Files: I learned the right way to deal with huge files using the Gemini File API flow. At first, I thought about passing the audio as Base64 strings directly in the JSON payload. I quickly realized that a 3-hour audio file can easily be hundreds of megabytes, that Base64 encoding inflates it by about another third, and that sending so much data in one request is a fast way to crash a browser or time out a server! Staging the file through the File API first made the app far more stable and scalable.
Keeping it Simple: I realized that getting perfect, down-to-the-millisecond timestamps is super hard when people talk passionately over each other. But I learned that my girlfriend's market research cares way more about what was said than about the exact millisecond it happened. Accepting "approximate time" (like grouping text into 30-second blocks) saved me a ton of engineering headaches while still being exactly what she needed for her reports!
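To put rough numbers on the Base64 worry from the large-files lesson, here is a back-of-the-envelope sketch. The 128 kbps bitrate is just an assumption for illustration, and the real recordings may well be encoded differently.

```python
import math


def audio_bytes(hours: float, kbps: int) -> int:
    """Approximate size in bytes of a constant-bitrate recording."""
    return int(hours * 3600 * kbps * 1000 / 8)


def base64_len(n_bytes: int) -> int:
    """Length of padded Base64 text: every 3 input bytes become 4 characters."""
    return 4 * math.ceil(n_bytes / 3)


raw = audio_bytes(hours=3, kbps=128)  # assumed bitrate, not from the real app
encoded = base64_len(raw)
print(f"raw audio: {raw / 1e6:.1f} MB")      # raw audio: 172.8 MB
print(f"as Base64: {encoded / 1e6:.1f} MB")  # as Base64: 230.4 MB
```

So under that assumption a single 3-hour interview would mean a JSON body of roughly 230 MB held in memory all at once, which is the kind of load that uploading the file separately avoids.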
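The "approximate time" idea from the last lesson can be sketched in a few lines. This is only an illustration, not the app's code. It assumes the transcript arrives as (start time in seconds, text) pairs, and the function name and input shape are hypothetical.

```python
from collections import defaultdict


def group_into_blocks(segments, block_seconds=30):
    """Group (start_seconds, text) pairs into fixed-length time blocks.

    Returns a list of (label, text) pairs, where label is the block's
    start time as HH:MM:SS and text joins everything said in that block.
    """
    buckets = defaultdict(list)
    for start, text in sorted(segments):  # sort so text stays in spoken order
        buckets[int(start // block_seconds)].append(text)

    blocks = []
    for index in sorted(buckets):
        begin = index * block_seconds
        hours, rest = divmod(begin, 3600)
        minutes, seconds = divmod(rest, 60)
        label = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
        blocks.append((label, " ".join(buckets[index])))
    return blocks
```

For example, segments starting at 2.5 s and 12 s both land in the `00:00:00` block, while one starting at 31 s lands in `00:00:30`. The reader still knows roughly where to scrub in the recording, and nobody has to get word-level timing right.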

Google Gemini Feedback

What went great:

I used Gemini 3.0 Pro to architect and build the code, and Gemini 3.0 Flash to rapidly process the heavy audio. The massive context window and the generous free tier in Google AI Studio made this entire journey possible for a solo developer like me. When my Cloud Run API costs got too high, being able to simply transition my girlfriend to use the AI Studio project locally on her own free-tier account was an absolute lifesaver! As the Google team keeps improving the Gemini API over time, I know this tool is only going to get faster and smarter for her future research.

The Vietnamese language support is honestly out of this world! It handles local phrases, dropped pronouns, and thick accents way better than other dedicated tools I've tried. Plus, having AI Studio auto-generate the starter React code let me focus my weekends purely on the fun API logic and architecture. The UI that Gemini generated right out of the box was so awesome and modern that I didn't even have to waste time tweaking CSS or fighting with buttons!

Where things got bumpy (The bad and the ugly):

It wasn't all smooth sailing, though! Pushing the limits of audio processing definitely led to some speed bumps:

Cloud Run Proxy Headaches: Using a template backend on Cloud Run was great at first, but serverless architectures really hate long-running tasks. I ran into major timeout limits when handling those massive 3-hour audio uploads. I had to heavily tweak the backend to safely stream the files without dropping the connection before Gemini could even start its job.
Audio Hallucinations & Parallel Processing: Even with Gemini's huge context window, feeding a full 3-hour audio file straight into the model sometimes caused it to skip important parts or, hilariously, completely make up new conversations near the end of the file! My fix was to write a script that chops the massive audio file into smaller, bite-sized 10-20 minute pieces. I then process those smaller files in parallel to drastically cut the response time, and finally stitch the text back together in the original order at the end.
Talking Over Each Other: When multiple people get excited during an interview and talk at the exact same time, it is super hard to tell who is who. The model still gets a bit confused when voices completely blend together, which is a tough diarization problem I'm still looking to improve.
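The chop, process in parallel, and stitch idea could look roughly like this. It is a sketch under assumptions, not the actual script: the audio cutting and the Gemini call are left out, and `transcribe_chunk` is a hypothetical placeholder for whatever function sends one piece to the model and returns its text.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(total_seconds, chunk_seconds=15 * 60):
    """Return (start, end) second offsets that cover the whole recording."""
    return [
        (start, min(start + chunk_seconds, total_seconds))
        for start in range(0, total_seconds, chunk_seconds)
    ]


def transcribe_all(chunks, transcribe_chunk, max_workers=4):
    """Run transcribe_chunk(start, end) on every chunk concurrently.

    pool.map yields results in input order, so the stitched text keeps
    the original timeline even when chunks finish out of order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(lambda chunk: transcribe_chunk(*chunk), chunks))
    return "\n".join(texts)


if __name__ == "__main__":
    # Stand-in for the real model call, only here to show the flow.
    def fake_transcribe(start, end):
        return f"[{start}s-{end}s] ..."

    chunks = plan_chunks(3 * 3600)  # a 3-hour recording -> 12 chunks of 15 min
    print(transcribe_all(chunks, fake_transcribe))
```

Threads fit here because each call mostly waits on the network. One thing this sketch does not handle is a sentence cut in half at a chunk boundary. Letting neighbouring chunks overlap by a few seconds is one possible way to soften that, at the cost of some duplicated text to clean up.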

Despite the hiccups, the intense challenges I overcame building the Transcribe app gave me the exact confidence I need to start building my ESP32 grocery assistant. The spark has definitely been lit, and I can't wait to bring Gemini out of the browser and into the physical world to help out my mom!

A Huge Thank You!

Before I sign off, I just want to send a massive thank you to the Google AI team for building AI Studio and Gemini. As a solo developer working on personal projects, I feel these tools have given me the superpower to actually build and ship real-world solutions that directly help the people I love.

Surprise! This entire blog post was actually written and refined using the new Gemini chat with Canvas tool! As a non-native English speaker, I'm amazed at how these tools helped me articulate my ideas. What do you think of the result? AI is such a powerful ally for developers worldwide, ensuring that language isn't a barrier to bringing our unique projects to the global stage.

Let me know your thoughts in the comments!
