For the Gemini Live Agent Challenge, I wanted to solve a real-world problem: making homework engaging and accessible for students in Singapore. The result is SgStudyPal, an AI-powered tutoring platform that combines real-time voice with multimodal image recognition to act as a personalized tutor.
Here is a breakdown of how I built the infrastructure using Google's ecosystem.
The Tech Stack
- Frontend: Next.js 14 (App Router), Tailwind CSS
- Backend: Node.js, Vercel AI SDK
- AI Models: Google Gemini 2.5 Flash (Multimodal) & Gemini Live (WebSockets)
- Auth & DB: Firebase Authentication & Firestore
- Infrastructure: Google Cloud Run (Docker)
Architecture & Google Cloud Integration
To get a scalable, stateful environment, I skipped standard serverless edge functions and instead deployed a multi-stage Docker container on Google Cloud Run.
- Multimodal Homework Help: Students can upload a photo of a complex math worksheet. The Next.js backend parses the image into a binary buffer and streams it securely to the gemini-2.5-flash model. Strict prompt engineering makes the AI skip the standard chat pleasantries and immediately break the visual math problem down step by step.
- Real-Time Video Tutor (Gwen): Using WebSockets, the app connects directly to the Gemini Live API, giving students fluid, interruptible, low-latency audio/video conversations with their AI tutor.
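The homework-help hand-off above can be sketched in TypeScript. This is a minimal sketch, not SgStudyPal's actual code: the helper name `bufferToInlinePart` is mine, but the `{ inlineData }` base64 shape is the format the Gemini API accepts for inline images.

```typescript
// Sketch of the image hand-off step, assuming the uploaded worksheet
// arrives server-side as a Node Buffer. Helper name is illustrative.
interface InlinePart {
  inlineData: { data: string; mimeType: string };
}

// Convert the raw upload into the base64 inline-data part Gemini accepts.
function bufferToInlinePart(buffer: Buffer, mimeType: string): InlinePart {
  return {
    inlineData: {
      data: buffer.toString("base64"),
      mimeType,
    },
  };
}

// The part can then be sent alongside a tutoring prompt, e.g. with the
// @google/generative-ai SDK:
//   const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
//   await model.generateContent([tutorPrompt, bufferToInlinePart(buf, "image/png")]);
```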
Overcoming Deployment Challenges
Deploying a Turborepo Next.js app to Cloud Run requires careful environment variable management. To ensure Firebase client variables (NEXT_PUBLIC_) were securely baked into the production bundle while keeping Gemini API keys strictly isolated as runtime secrets, I implemented a 3-stage Docker build process (deps, builder, runner).
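The three stages described above might look something like this. The stage names (deps, builder, runner) come from the post; the specific variable names are illustrative assumptions, and the layout is simplified to a single app (a real Turborepo build would typically add a prune step first).

```dockerfile
# Stage 1: deps — install dependencies in isolation for better layer caching
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

# Stage 2: builder — NEXT_PUBLIC_ vars arrive as build args so Next.js
# can bake them into the client bundle at `next build` time
FROM node:20-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ARG NEXT_PUBLIC_FIREBASE_API_KEY
ENV NEXT_PUBLIC_FIREBASE_API_KEY=$NEXT_PUBLIC_FIREBASE_API_KEY
RUN npm run build

# Stage 3: runner — minimal production image. The Gemini API key is NOT
# copied in here; Cloud Run injects it at runtime (e.g. via Secret Manager)
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=builder /app/.next ./.next
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./package.json
# Cloud Run provides the port via the PORT env var
EXPOSE 8080
CMD ["npm", "start"]
```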
By combining the low latency of Google Cloud Run with the incredible multimodal capabilities of Gemini 2.5 Flash, SgStudyPal transforms static worksheets into interactive, real-time learning experiences.