Hello.
Today, I am writing a review of my participation in The Gemini Live Agent Challenge hackathon, as of March 25, 2026. Before I forget, I wanted to document the project I submitted and, in particular, the AI tools I utilized during the development process.
Table of Contents
- AI Tools Used
- Introduction to the Hackathon
- Project Introduction: SparkWake
- Review and Future Plans
1. AI Tools Used
One of the goals of this project was to maximize productivity by leveraging as many available AI tools as possible. I hope to write separate posts about my experiences using the AI tools listed below.
UI & Design: Google Stitch
https://stitch.withgoogle.com/
- Reason for choosing: As a tool in the Google AI ecosystem, the biggest advantage was being able to design the UI for free using the Gemini 3 model.
- Key features: It includes Figma export, MCP support, and code copy functions, making it easy to modify detailed components.
General Queries & Text Refinement: Gemini
- Reason for choosing: I already use it regularly both at home and at work. I used it for writing scripts, drafting blog posts, and general Q&A; it served as a reliable assistant.
Image Generation: Nano Banana
- Reason for choosing: I had used it in other projects before, and it is the easiest way to generate images.
AI IDE: Kiro
https://kiro.dev/
- Reason for choosing: After recently using it with AI-DLC, I felt a dramatic improvement in code quality. I was already familiar with it because it was covered in an AWS-related session, and the cute character was a plus. It has gotten much smarter, has lots of good features, and offers free credits. I also use it at work.
- Usage: I used the Steering, Agent, and MCP features. In particular, I connected Google Cloud documentation to MCP so it would always design the infrastructure based on the latest information (March 2026). I used it heavily, consuming about 1,500 credits over 20 days.
- Note: Next time, I am thinking of trying Antigravity! I want to experience Google's AI ecosystem fully: Stitch -> Antigravity -> Gemini, etc.
- Note: This time I tried utilizing steering, agent, and mcp. However, the agent didn't always invoke exactly when I wanted it to, and just because I wrote something in steering didn't mean it always followed it (though it mostly did).
- Note: The best thing I did was explicitly commanding it to base all information on the latest data (March 2026) and to use the MCP Google Cloud docs when designing infrastructure. Even with this specified, it sometimes defaulted to Gemini 2.0 or said, "Sorry, I guessed! Let me check the documentation!" which was a bit flustering. Still, when I gave it a task, it would search the docs on its own and say, "The latest documentation says this. I will modify it," which made me think, "Ah, this is a real agent. So smart." I was highly satisfied.
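Connecting the Google Cloud docs through MCP was mostly a matter of configuration. As a rough illustration, a Kiro workspace MCP config might look like the following (the server package name and tool names here are placeholders, not a real published server; check Kiro's docs for the exact file location and schema):

```json
{
  "mcpServers": {
    "gcp-docs": {
      "command": "uvx",
      "args": ["<google-cloud-docs-mcp-server>"],
      "disabled": false,
      "autoApprove": ["search_documentation"]
    }
  }
}
```

With a docs server registered like this, the agent can look up current documentation on its own instead of answering from stale training data.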
AI Model: Claude Opus 4.5
- Reason for choosing: Both the AI community and my coworker who uses Kiro recommended Opus. I liked that the answers were compact and strictly to the point. If GPT flatters you, Sonnet talks too much, and Gemini feels like a brilliant but socially awkward IT junior, Opus feels like a real human... It was my first time seeing an AI literally answer ">Yes!<" to a question.
API:
gemini-2.5-flash-native-audio-preview
- Reason for usage: Used to implement essential hackathon requirements such as real-time conversation, barge-in (interruptions), and camera video analysis.
gemini-3-flash-preview
- Reason for usage: Used for text-based analysis, such as weekly reports and daily summaries. I chose the Flash model over Pro because I needed fast, near real-time response speeds. It was fast, cheap, and perfect for the purpose.
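Conceptually, the barge-in requirement boils down to one rule: if the agent is mid-utterance when user audio arrives, cancel playback before handling the user's turn. Here is a toy sketch of that state machine — no real Live API calls, everything below is my own illustration:

```python
from dataclasses import dataclass, field

@dataclass
class LiveSession:
    """Toy model of an audio session that supports barge-in."""
    speaking: bool = False
    transcript: list[str] = field(default_factory=list)

    def agent_say(self, text: str) -> None:
        # The agent starts an utterance; playback is "in progress".
        self.speaking = True
        self.transcript.append(f"agent: {text}")

    def user_audio(self, text: str) -> None:
        # Barge-in: if the agent is mid-utterance, stop it first,
        # then process the user's turn.
        if self.speaking:
            self.speaking = False
            self.transcript.append("agent: [interrupted]")
        self.transcript.append(f"user: {text}")

session = LiveSession()
session.agent_say("Time for your first routine: stretching...")
session.user_audio("Skip this one, please.")
print(session.transcript[-2])  # → agent: [interrupted]
```

In the real Live API the cancellation happens server-side when new input audio is detected, but the client still has to drop its queued audio buffers, which is the part this sketch models.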
Methodology: AI-DLC
https://github.com/awslabs/aidlc-workflows
- Reason for choosing: I learned it before and liked it! It felt like putting up a safety fence for Kiro. But whenever it slips through, you have to patch the fence up.
- Note: AI-DLC is incredibly powerful during the design phase, but it gets a bit ambiguous during the additional modification phase. It's not realistic to leave a decision document for every tiny minor fix. I need to figure out a lighter, more efficient management method. What other good methods are out there?
- Note: The steering file grew inefficiently long. Even though the steering file itself doesn't consume much context, it affects every single conversation and task, so I think it adds up. I heard that people who are good at this generate steering dynamically for each project... How should I do that?
- Note: After speaking with an expert, I learned that rather than repeating instructions, it's better to use Markdown heading structure (#, ##, ###) and explicitly say "Refer to #3.1 ##security of the ~ document." This is much better for both saving tokens and instruction clarity. If I do a major overhaul of the steering file next time, I plan to apply this method. I wonder whether other AI tools have something similar to steering.
- Note regarding variables: This time, Kiro had a particularly hard time finding Firebase-related settings. I'm not sure if the GitHub settings got tangled or if it's simply a problem with the search scope, but if it's the latter, I'll need to explicitly tell it exactly which document those values are in next time.
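The section-reference trick from the steering note above might look roughly like this (the file name, section numbers, and rules are made up for illustration):

```markdown
<!-- steering.md: keep rules in one numbered document -->
# 3. Backend Rules
## 3.1 Security
- Never log tokens or API keys.
- All endpoints require Firebase Auth.

<!-- In a task prompt, point at the section instead of restating it: -->
Implement the report endpoint. Refer to #3 ##3.1 Security of steering.md.
```

The idea is that a short pointer costs far fewer tokens than pasting the rules into every prompt, while still being unambiguous.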
elevenlabs: AI English Voice
- Reason for usage: Because real-time conversation and barge-in were mandatory, I had to interact with the project in English. As a non-native speaker, I needed a voice to replace mine. I needed a natural English voice, so I selected and used the Siren (for narration) and Anika (for conversation replacement) voices.
- Usage method: I'm not sure if it was initial tokens or monthly tokens, but I was given 10,000 credits, which deducted based on character count. In voice exploration, you can choose language, accent, category, gender, etc. I filtered by English / Social Media / Female and picked my two favorites.
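Since credits seemed to be deducted per character, budgeting narration was simple arithmetic. A tiny sketch, assuming a flat 1 credit per character (which matches what I observed, but may not be the exact billing rule):

```python
# Assumption: 1 credit per character of input text (illustrative, not official billing).
def credits_needed(script: str) -> int:
    return len(script)

narration = [
    "Good morning! Ready to spark your day?",
    "Let's start with two minutes of stretching.",
]
balance = 10_000
total = sum(credits_needed(line) for line in narration)
print(f"{total} credits used, {balance - total} remaining")
```

Keeping narration lines short (and reusing recorded clips between takes) stretched the 10,000 credits further.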
CodeRabbit
https://www.coderabbit.ai/
- Reason for usage: First of all... the name is cute! I introduced it for security and code reviews. Why CodeRabbit specifically? I had briefly heard about it in a presentation before. After using it, well, it takes some time and development effort, but I liked it because it felt like it caught the things I missed. It's free for 14 days + public repos.
1-1. Non-AI Tools Used
(Listed here if the tool supports AI but I didn't explicitly use those features)
Figma (not AI)
https://www.figma.com
- Reason for usage: I saw that the Figma MCP could be linked with Kiro, so I wanted to try the MCP, and since Figma is the most famous UI tool, I wanted to use it alongside Stitch. Figma turned out to be harder than development... I also hit some sort of limit and couldn't use it effectively in practice. Next time, Stitch alone should be enough, or I'd like to try Figma Make.
CapCut
https://www.capcut.com/ko-kr/
- Reason for usage: Used for video editing. When I searched for Mac video editing tools, this was the most highly recommended one. It is free, but for the first n days you are granted a Premium trial (which allows AI usage).
- Note: Video editing was much harder than I thought and took a lot of time: filming the video, checking it, cutting it, adding voice, dubbing, and music, and checking the flow. It wasn't easy.
- Reason for usage: Used for video/audio recording. I heard this tool is widely used to record both screen and audio on a Mac. If not this, I was going to use Zoom to record.
Google Cloud (Hackathon Prerequisite)
https://console.cloud.google.com/firestore/databases
- Firestore: NoSQL database. Opus recommended it as the perfect fit for me. Looking at the data structure, it felt like JSON. I want to study it more later! This opportunity helped me slightly improve my understanding of NoSQL. From what I knew, the Fire~ products run on a separate console from Google Cloud (I once accidentally detached my Google Cloud billing account while trying to create a Firebase project...), so it was fascinating to learn there is a way to link and create Firestore directly within Google Cloud.
https://console.firebase.google.com/?hl=ko
- Firebase: Used for user authentication and frontend hosting. It was my first time using the Firebase ecosystem, and I think I can use this as a starting point for drill-down learning. The features provided are so vast that I'm looking forward to using it more in the future.
https://console.cloud.google.com/
- Cloud Run / Cloud Functions: Backend API server. I spun up a Google Cloud project just for this. It seems the free trial lasts 90 days, so I should use it diligently. Others: Secret Manager, Artifact Registry, Cloud Scheduler, Cloud Storage, and an IaC approach (I used Terraform).
YouTube
https://www.youtube.com/watch?v=QnaA6KU3_28&feature=youtu.be
- Reason for usage: I had to upload a demo video to YouTube to participate in the hackathon.
GitHub
- Because I actively used the AI IDE Kiro for this project, management through GitHub became even more crucial. Since the AI was generating the code, I made sure it strictly adhered to convention rules, and specifically set it up so that it always went through a Security check before committing or pushing.
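The "security check before commit" can be wired in as a Git pre-commit hook. Here's a sketch of the idea in Python (the regex patterns and wiring are illustrative, not my exact setup; a real project would more likely use a maintained scanner like gitleaks):

```python
import re
import subprocess

# Illustrative patterns only; a real hook should use a maintained scanner.
SECRET_PATTERNS = [
    re.compile(r"AIza[0-9A-Za-z_\-]{35}"),                    # Google API key shape
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # private key header
]

def find_secrets(text: str) -> list[str]:
    """Return the patterns that match anywhere in the given text."""
    return [p.pattern for p in SECRET_PATTERNS if p.search(text)]

def staged_files() -> list[str]:
    """List files staged for the next commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

# Saved as .git/hooks/pre-commit (and made executable), the hook would loop
# over staged_files(), call find_secrets() on each file's contents, and
# exit non-zero on any hit so the commit is blocked.
```

Since the AI generates the code, a mechanical gate like this catches slips (a pasted key, a committed credential file) regardless of which session produced them.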
Devpost
https://geminiliveagentchallenge.devpost.com/
- It seems to be a platform where global hackathons are actively held. Taking this as an opportunity, I want to challenge myself with more hackathons in the future. Actually, I was also eyeing the 'Amazon Nova Challenge', but sadly couldn't participate because the deadlines overlapped.
2. Introduction to the Hackathon
- The Gemini Live Agent Challenge is a hackathon that utilizes the Gemini Live API to go beyond simple text input/output to implement an AI that recognizes and reacts to video and audio. The topic I had been wanting to work on happened to fit perfectly with a live agent, so I participated in the Live Agents category. I spent about 20 days on it (March 2 -> March 16).
The judging criteria were as follows:
- Interruption Handling (barge-in) - Can the user interrupt the AI while it's speaking?
- Distinct Persona/Voice - Does the AI have a unique character/voice?
- Real-time Audio - Real-time voice conversation.
- Real-time Vision - Real-time video recognition.
3. Project Introduction: SparkWake
I built SparkWake, the AI miracle morning routine coach app I’ve always wanted to make.
When thinking about "What kind of app do I want to build?", I prefer "highly focused apps" that target a specific purpose. For example, not just a routine app, but a morning routine app; not general plant care, but plant watering. I’ve always thought that integrating Gemini and the Google ecosystem would create great synergy, so I absolutely wanted to use Gemini for this project. The hackathon timing was perfect, so I built it! Plus, I had never participated in a GCP hackathon before and really wanted to try.
The goals of this project were:
- Use absolutely every AI tool available to execute the project.
- Use all the technologies I've been wanting to try for feature implementation.
- Just do whatever I want.
- And the result! It looks like this ^.^
The core features are as follows:
- Real-time Voice Coaching: The AI guides you through your routine with its voice and interacts with you.
- Barge-in Support: Responds naturally even if the user interrupts while the AI is talking.
- Video Authentication: Automatically verifies routine completion by recognizing specific user actions (e.g., waving) through the camera.
- Tool Calling: If you give a command like "Play a yoga video," the AI directly launches YouTube.
- Personalized Report: Provides an AI-generated, customized coaching message along with an analysis of your completion rate.
One-line summary: I used the Gemini Live API to create an AI coach app that does your hard-to-maintain morning routine with you in real-time via audio and video.
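The tool-calling feature above is, under the hood, function calling: you declare tools to the model, and when it responds with a tool call you execute it locally. A minimal sketch of the local dispatch side (the tool name, call shape, and URL building are my illustration, not the exact SparkWake code):

```python
from urllib.parse import quote_plus

def play_youtube(query: str) -> str:
    # Build the search URL; the real app would then open it
    # (e.g. with webbrowser.open(url), or a deep link on mobile).
    return f"https://www.youtube.com/results?search_query={quote_plus(query)}"

# Tool registry: maps tool names (as declared to the model) to local handlers.
TOOLS = {"play_youtube": play_youtube}

def dispatch(tool_call: dict) -> str:
    """Run a model-issued tool call shaped like {'name': ..., 'args': {...}}."""
    return TOOLS[tool_call["name"]](**tool_call["args"])

print(dispatch({"name": "play_youtube", "args": {"query": "morning yoga"}}))
# → https://www.youtube.com/results?search_query=morning+yoga
```

The model only ever returns a structured request; the app decides what actually happens, which keeps the agent from doing anything you haven't registered a handler for.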
Challenge Page: https://geminiliveagentchallenge.devpost.com/
My Challenge Page: https://devpost.com/software/sparkwake
Challenge Project Gallery: https://geminiliveagentchallenge.devpost.com/project-gallery
4. Review & Future Plans
Doing the planning, UI, development, filming, video editing, dubbing, testing, and posting all by myself was incredibly tough. I didn't realize it before because I always worked with teammates... Nevertheless, I think the sheer fact that I was able to do all of this alone proves that you can achieve sufficiently high productivity solo by actively leveraging AI tools.
It was a project, topic, and technology that I really desperately wanted to try, so I had a lot of fun doing it. There was a weekend where I worked for 13 hours straight in one day, and I worked on it every single day after getting off work. I truly experienced the joy and power of flow. It was a blast.
While building it, I reaffirmed that I have a strong inclination as a "Builder" who creates things from scratch, and that I am a person who finds the most joy when building.
Here is what I want to improve in the future:
- Token Reduction: I heard English uses almost half the tokens of Korean. When I do token optimization later, I'll have to decide: change all comments and guides to English; let it think in English but answer only in Korean; switch completely to English and trade translation effort for token savings; or just stick with Korean and pay the difference. Thinking about how many fewer tokens (and probably better quality) I would have gotten with English does make me a little regretful.
- Smoother Session Transitions: How can I make work flow more seamlessly between sessions? When Kiro session 2 picks up what session 1 was doing, how can I give it better context? In this hackathon, it kept trying to use Web TTS instead of the ADK; firmly establishing the purpose and rules by saying "This is a hackathon" fixed it.
- Efficient Use of AI Coding Tools: Kiro has many features, and I'm sure there's a better way to utilize Opus as well.
- Real-world App Usage: I want to keep using it personally, so I will officially launch the app.
- Operations: I want to experience everything required for operations, including daily active users / logging / monitoring / devops agent / Slack integration (?)! Usually, hackathons or side projects end once they are built, but I really want to make this my product and gain operational experience.
That is all.
Thank you for reading this long post.
Best, mindy.

