Rashika Karki

Posted on Nov 8, 2025

Building LearnForge: Multi-Agent AI Learning Platform on Cloud Run with Google ADK

#cloudrunhackathon #serverless #agents

Note: I wrote this article for the Google Cloud Run Hackathon 2025.

I built LearnForge, an AI-powered learning platform that does something I haven't seen another platform do. It has a conversation with you, figures out what you actually want to learn (not what you think you want), researches the topic in real time, and then creates a personalized learning journey that adapts to how you learn.

LearnForge is built on Google Cloud Run and the Agent Development Kit (ADK), with 12 specialized AI agents working together. It handles everything from vague "I want to learn AI" statements to structured, adaptive learning missions that remember where you left off, even if you come back weeks later.

The result? A learning experience that feels like having a personal tutor who never forgets, never gets tired, and actually knows what they're talking about.

The Problem: Why Online Learning Is Hard (And How I Fixed It)

Let me paint you a picture. You're sitting at your computer, motivated, ready to learn something new. You type "learn Python" into a course platform. You get 47 courses. You pick one. Three hours later, you realize:

You're learning about data types when you wanted to build web APIs
The instructor assumes you know nothing, but you've coded in JavaScript for years
The examples are boring and don't relate to what you actually want to build
You're on lesson 12 of 200, and you have no idea if you're learning the right things

Sound familiar? This is the fundamental problem with online learning: one-size-fits-all doesn't fit anyone.

The Real Problem

Here's what's broken:

1. The Goal Mismatch

When someone says "I want to learn AI," they could mean:

Building chatbots for customer service
Fine-tuning language models for text summarization
Understanding neural network architectures
Creating computer vision systems for medical imaging

But traditional platforms give everyone the same "Introduction to AI" course. It's like asking "Where do you want to go?" and getting the same map regardless of whether you said "the grocery store" or "Mount Everest."

2. The Experience Gap

A beginner needs hand-holding. An intermediate learner needs to skip the basics. An expert needs advanced concepts. But courses are static; they can't adapt.

3. The Context Loss

You start a course on Monday. Life happens. You come back on Friday. The platform has no idea:

What you were learning
How well you understood it
Where you left off
What you were confused about

4. The Content Problem

Most courses are pre-recorded, static content. But learning is dynamic. You might need:

A different explanation when you're confused
Real-world examples relevant to your goals
Video tutorials when you're a visual learner
Step-by-step guides when you prefer structured learning

The Solution: AI That Actually Understands

What if, instead of picking from a catalog, you could just... talk to an AI?

You: "I want to learn AI"

AI: "Great! I checked what's trending in AI learning. Most people start with either model training, data pipelines, or building AI-powered applications. Which direction do you want to focus on?"

You: "Building applications, I think"

AI: "Got it. Are you thinking of chatbots, image generators, or recommendation systems?"

You: "Chatbots"

AI: "Perfect! So you want to learn AI chatbot development, at an intermediate level, focusing on building real applications. Does that sound right?"

And then here's the magic. The AI:

Researches the topic using Google Search
Creates a personalized learning mission with checkpoints
Guides you through it conversationally
Adapts content based on your responses
Remembers everything, even if you come back weeks later

This isn't a chatbot. This is a multi-agent AI system that coordinates 12 specialized agents to create a complete learning experience.

How It Works: The Magic Behind the Scenes

LearnForge uses Google's Agent Development Kit (ADK) to orchestrate 12 specialized AI agents. Think of it like a well-coordinated team where each agent has a specific job, but to the user, it feels like talking to one intelligent tutor.

Phase 1: Mission Creation (Meet Polaris)

When you first connect, you meet Polaris, the Pathfinder. Polaris doesn't just ask questions; it researches your topic in real time so it can ask intelligent, informed ones.

You: "I want to learn machine learning"

Polaris: [Searches Google for "machine learning learning paths 2025"]
         [Finds that most people focus on: model training, data prep, or deployment]

         "When people explore 'machine learning,' they often focus on 
         model training, data prep, or deployment. Which of these do 
         you want to master first?"

Behind the scenes, Polaris uses:

Pathfinder Agent: Conversational goal clarification with research
Search Agent: Google Search API for real-time topic research
Mission Curator Agent: Converts your goals into structured learning missions

The result? Instead of a generic "Machine Learning 101" course, you get a mission tailored to your specific goal: "Building Production ML Systems with Python and TensorFlow" or "Data Preparation for Machine Learning Models."
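To make "structured learning mission" more concrete, here is a minimal sketch of what a curated mission might look like and how it could seed the per-learner session state used later in this post. The Mission and Checkpoint classes and the checkpoints state key are hypothetical. Only current_checkpoint_index, current_checkpoint_goal and completed_checkpoints appear in LearnForge's own snippets.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    # Hypothetical shape: one bite-sized learning goal within a mission
    name: str
    objective: str


@dataclass
class Mission:
    # Hypothetical shape of what a mission curator could emit
    title: str
    level: str  # "Beginner" | "Intermediate" | "Advanced"
    checkpoints: list[Checkpoint] = field(default_factory=list)


def initial_session_state(mission: Mission) -> dict:
    """Seed per-learner session state from a curated mission (sketch)."""
    names = [cp.name for cp in mission.checkpoints]
    return {
        "mission_title": mission.title,
        "level": mission.level,
        "checkpoints": names,  # assumed key, not confirmed by the post
        "current_checkpoint_index": 0,
        "current_checkpoint_goal": names[0] if names else None,
        "completed_checkpoints": [],
    }


mission = Mission(
    title="Data Preparation for Machine Learning Models",
    level="Intermediate",
    checkpoints=[
        Checkpoint("Cleaning messy data", "Handle missing values and outliers"),
        Checkpoint("Feature engineering", "Encode and scale features"),
    ],
)
state = initial_session_state(mission)
print(state["current_checkpoint_goal"])  # Cleaning messy data
```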

Phase 2: Learning Execution (Meet Lumina)

Once your mission is created, Lumina takes over. Lumina is your personal learning companion: patient, adaptive, and genuinely helpful.

Lumina guides you through checkpoints (bite-sized learning goals) using a sophisticated multi-agent system:

Lumina Orchestrator (invisible coordinator)
    ├── Greeter: "Welcome! Let's start your journey..."
    ├── Flow Briefer: "Next up: Understanding Neural Networks. Ready?"
    ├── Sensei (your teacher)
    │   ├── Content Composer
    │   │   ├── Content Searcher: Finds educational articles via Google Search
    │   │   ├── Video Selector: Curates YouTube videos (4-20 min, educational)
    │   │   └── Content Formatter: Personalizes content for your learning style
    │   └── Evaluates your understanding
    ├── Help Desk: Answers off-topic questions
    └── Wrapper: Celebrates your completion

Here's what makes this special:

1. Content is Generated in Real-Time

When Sensei teaches you about "neural networks," it doesn't pull from a static database. Instead:

Content Searcher finds the latest, most relevant articles
Video Selector curates educational YouTube videos (filtered by duration, category, relevance)
Content Formatter adapts everything to your learning style (visual? examples? step-by-step?)

2. It Adapts to Your Understanding

Sensei: "Can you explain how backpropagation works?"

You: "It's like... adjusting weights based on errors?"

Sensei: "You're on track with the error part! Let me clarify the 
        weight adjustment mechanism..."
        [Delegates to Content Composer for a clearer explanation]
        [Presents it naturally, as if Sensei knew it all along]

3. It Remembers Everything

This is where it gets interesting. Most chatbots lose context when you close the browser. LearnForge uses Cloud SQL with DatabaseSessionService to persist everything:

Which checkpoint you're on
What content was presented
How you responded to questions
What you were confused about
Your learning preferences

Close your browser, come back next week, or switch devices, and it all just works.

The Architecture: How 12 Agents Work Together Seamlessly

The technical magic is in the orchestration. LearnForge uses Google ADK's hierarchical agent system to coordinate specialized agents without the user ever knowing.

Silent Orchestration: The Invisible Hand

The orchestrators are completely invisible. Users never see messages like "Let me hand you over to the Sensei..." Instead, transitions are seamless:

# What the user sees:
Sensei: "Let's explore neural networks! Here's how they work..."

# What's actually happening:
Orchestrator → delegates to Sensei → Sensei delegates to Content Composer 
→ Content Composer chains: Searcher → Video Selector → Formatter
→ Content flows back → Sensei presents it naturally

The orchestrator's instruction is explicit:

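from google.adk.agents import LlmAgent

root_agent = LlmAgent(
    name="lumina_orchestrator",  # illustrative name
    model="gemini-2.5-flash",
    instruction="""
    YOU MUST NEVER TALK TO THE USER DIRECTLY.
    YOU MUST NEVER ACKNOWLEDGE DELEGATIONS.
    The user should ONLY see responses from sub-agents.
    """,
    # sub_agents=[...]: Greeter, Flow Briefer, Sensei, Help Desk, Wrapper (defined elsewhere)
)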
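The instruction is the main safeguard. As an extra, hypothetical layer that the post does not describe LearnForge using, the server relaying agent output over the WebSocket could drop anything the orchestrator itself says. ADK events record which agent authored them. The AgentEvent class below is a simplified stand-in rather than ADK's actual Event type, and the agent names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AgentEvent:
    # Simplified stand-in for an agent event: who produced it and what it said
    author: str
    text: str


def visible_messages(events: list[AgentEvent], hidden_authors: set[str]) -> list[str]:
    """Return only the text the learner should see, skipping hidden agents and blank output."""
    return [e.text for e in events if e.author not in hidden_authors and e.text.strip()]


events = [
    AgentEvent("lumina_orchestrator", "Delegating to sensei..."),
    AgentEvent("sensei", "Let's explore neural networks!"),
    AgentEvent("sensei", "   "),
]
print(visible_messages(events, hidden_authors={"lumina_orchestrator"}))
# ["Let's explore neural networks!"]
```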

Content Authority Separation: Why Teaching Agents Don't Generate Content

Here's an insight that dramatically improved content quality: teaching agents shouldn't generate content. Instead, they should delegate it to specialized agents.

from google.adk.agents import LlmAgent

sensei_agent = LlmAgent(
    name="lumina_sensei",  # illustrative name; the model is inherited from the parent agent
    instruction="""
    YOU ARE FORBIDDEN FROM CREATING ANY TEACHING CONTENT.
    You must delegate ALL content creation to content_composer_agent.

    YOU CAN:
    - Ask questions
    - Evaluate answers
    - Provide feedback

    YOU CANNOT:
    - Explain concepts yourself
    - Provide examples yourself
    """,
)

Why? Because:

Content Searcher has access to Google Search (real-time, research-backed)
Video Selector has access to YouTube API (curated, filtered)
Content Formatter knows your learning preferences

Sensei focuses on pedagogy. Content creation agents focus on quality. Separation of concerns, even for AI.

The DatabaseSessionService Breakthrough: Why This Changes Everything

Here's where LearnForge diverges from every other AI learning platform I've seen.

The Problem Nobody Talks About

Most AI chatbots use in-memory sessions. This works fine for:

5-minute conversations
Simple Q&A
Demos

But learning is different. Learning sessions can span:

Hours (deep dive sessions)
Days (coming back to continue)
Weeks (long-form courses)
Months (mastery journeys)

In-memory sessions fail catastrophically:

Server restart? Session lost.
Connection drop? Session lost.
Switch devices? Session lost.
Come back tomorrow? Session lost.

The Solution: Persistent State with Cloud SQL

LearnForge uses DatabaseSessionService with Cloud SQL (PostgreSQL) to persist everything:

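from google.adk.sessions import DatabaseSessionService
from google.cloud.sql.connector import Connector

# Lazy refresh fetches connection info on demand, which suits Cloud Run,
# where CPU can be throttled outside of request handling
connector = Connector(refresh_strategy="LAZY")

# instance_connection_name ("project:region:instance"), db_user, db_password
# and db_name come from the app's configuration
session_service = DatabaseSessionService(
    db_url="postgresql+pg8000://",  # driver only; the connector supplies the connection
    creator=lambda: connector.connect(
        instance_connection_name,
        "pg8000",
        user=db_user,
        password=db_password,
        db=db_name,
    ),
    # The remaining keyword arguments configure the SQLAlchemy connection pool
    pool_size=10,       # connections kept open per instance
    max_overflow=5,     # extra connections allowed under burst load
    pool_timeout=60,    # seconds to wait for a free connection
    pool_recycle=1800,  # recycle connections after 30 minutes
)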

What gets persisted:

Current checkpoint index
Completed checkpoints
Content search results (so we don't re-search)
Video selections (so we remember what was shown)
User responses and comprehension checks
Learning preferences
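The key property is that session state lives outside the server process. Here is a minimal standard-library sketch of that idea. A local SQLite file stands in for Cloud SQL. One store writes state and is then closed to simulate a restart, and a fresh store reads the state back. SessionStore and its one-table layout are hypothetical and much simpler than DatabaseSessionService's own schema, which also records events and app-level and user-level state.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


class SessionStore:
    """Toy persistent session store keyed by (app, user, session)."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions ("
            " app TEXT, user_id TEXT, session_id TEXT, state TEXT,"
            " PRIMARY KEY (app, user_id, session_id))"
        )

    def save(self, app: str, user_id: str, session_id: str, state: dict) -> None:
        # Upsert the whole state as JSON; a real service would also track events
        self._conn.execute(
            "INSERT OR REPLACE INTO sessions VALUES (?, ?, ?, ?)",
            (app, user_id, session_id, json.dumps(state)),
        )
        self._conn.commit()

    def load(self, app: str, user_id: str, session_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT state FROM sessions WHERE app = ? AND user_id = ? AND session_id = ?",
            (app, user_id, session_id),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self) -> None:
        self._conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "sessions.db")

store = SessionStore(db_path)
store.save("mission-ally", "u1", "s1", {"current_checkpoint_index": 2, "completed_checkpoints": [0, 1]})
store.close()  # simulate a server restart: the in-process object is gone

restarted = SessionStore(db_path)
print(restarted.load("mission-ally", "u1", "s1"))
# {'current_checkpoint_index': 2, 'completed_checkpoints': [0, 1]}
```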

The impact:

You can:

Close your browser mid-checkpoint and resume exactly where you left off
Switch from laptop to phone seamlessly
Come back weeks later and continue your mission
Share sessions with team members (collaborative learning)

This isn't just a feature. It's what makes LearnForge production-ready for real learning, not just demos.

Cloud SQL Connector: Security Without the Headache

Traditional database access requires:

IP whitelisting (nightmare in serverless)
VPNs (complex setup)
Exposed connection strings (security risk)

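Cloud SQL Connector authorizes connections with IAM and encrypts them with TLS. There's no network configuration and no connection string to leak. The snippet below still passes a database password from configuration. The connector can also use IAM database authentication to remove the password entirely.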

def _create_cloud_sql_connection(self):
    return self._connector.connect(
        settings.INSTANCE_CONNECTION_NAME,
        "pg8000",
        user=settings.DB_USER,
        password=settings.DB_PASSWORD,
        db=settings.DB_NAME,
    )

The result is production-grade security with minimal setup. The Cloud Run service account needs the Cloud SQL Client role, and that's about it.

Real-Time WebSocket: The Conversation That Never Lags

Both Polaris and Lumina use WebSocket connections for real-time, bidirectional communication. But here's what makes it special: session resume.

The Flow

from fastapi import APIRouter, WebSocket

router = APIRouter()

@router.websocket("/ws")
async def mission_ally_websocket(websocket: WebSocket, mission_id: str):
    await websocket.accept()
    user_id = await authenticate(websocket)  # app-specific auth helper (not shown)

    # Check for an existing persisted session (get_session returns None if absent).
    # session_id is resolved for this user and mission elsewhere (not shown).
    existing_session = await session_service.get_session(
        app_name="mission-ally",
        user_id=user_id,
        session_id=session_id,
    )

    if existing_session:
        # Resume from the last checkpoint recorded in persisted state
        current_checkpoint = existing_session.state.get("current_checkpoint_index", 0)
        completed = existing_session.state.get("completed_checkpoints", [])
        # Send historical messages, then continue from where the learner left off
    else:
        # Start a new mission
        session = await session_service.create_session(...)

If you disconnect and reconnect, the system:

Loads your session from Cloud SQL
Sends you historical messages (so you see the conversation)
Continues from your last checkpoint
Feels completely seamless

No "start over" button. No lost progress. Just... continue.
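You can write the reconnect decision as a pure function of persisted state. That keeps it testable separately from the WebSocket plumbing. This is a sketch: the payload fields (type, history, checkpoint_index, checkpoint_goal) and the checkpoints state key are hypothetical. Resume also matters in practice on Cloud Run. Cloud Run treats WebSocket connections as long-running requests and closes them once the service's request timeout (at most 60 minutes) is reached.

```python
from typing import Optional


def build_resume_payload(state: Optional[dict], history: list[dict]) -> dict:
    """Decide what to send a (re)connecting learner, based only on persisted state."""
    if state is None:
        # No persisted session: start the mission from the first checkpoint
        return {"type": "new_mission", "history": [], "checkpoint_index": 0}

    index = state.get("current_checkpoint_index", 0)
    checkpoints = state.get("checkpoints", [])
    if index >= len(checkpoints):
        # Every checkpoint is done: replay the conversation and show the wrap-up
        return {"type": "mission_complete", "history": history, "checkpoint_index": index}

    return {
        "type": "resume",
        "history": history,  # replayed so the learner sees the prior conversation
        "checkpoint_index": index,
        "checkpoint_goal": checkpoints[index],
    }


saved = {"current_checkpoint_index": 1, "checkpoints": ["Perceptrons", "Backpropagation"]}
payload = build_resume_payload(saved, history=[{"role": "sensei", "text": "Welcome back!"}])
print(payload["type"], payload["checkpoint_goal"])  # resume Backpropagation
```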

Content Composition: How AI Curates Your Learning Materials

When Sensei needs to teach you about "neural networks," it doesn't pull from a static database. Instead, it orchestrates a three-stage pipeline:

Stage 1: Content Searcher

Uses Google Search API to find the latest, most relevant educational content:

content_searcher = LlmAgent(
    name="lumina_content_searcher",
    tools=[google_search_tool],
    instruction="Search for educational content about the concept..."
)

Real-time search means you get current information, not outdated course materials.

Stage 2: Video Selector

Uses YouTube Data API v3 to curate educational videos:

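from googleapiclient.discovery import build  # google-api-python-client

def search_youtube_videos(
    query: str,
    max_results: int = 3,
    duration_filter: str = "medium",  # YouTube's "medium" = 4-20 minutes
    video_category_id: str = "27",  # Education category
) -> list[dict]:
    # Sketch; YOUTUBE_API_KEY is assumed to be loaded elsewhere (e.g. Secret Manager)
    youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)
    items = youtube.search().list(
        q=query, part="snippet", type="video", maxResults=max_results,
        videoDuration=duration_filter, videoCategoryId=video_category_id,  # both need type="video"
    ).execute()["items"]
    # snippet has title, channelTitle, description, thumbnails; durations need videos().list
    return [{"video_id": item["id"]["videoId"], **item["snippet"]} for item in items]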

Why this matters: Not all YouTube videos are educational. Not all educational videos are the right length. The selector finds videos that are:

Actually educational (category 27)
The right duration (4-20 min is the sweet spot)
Relevant to the concept
From reputable channels
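One wrinkle is that search.list applies the videoDuration filter server-side but doesn't return durations. Durations come from videos.list with part=contentDetails, as ISO 8601 strings such as PT12M34S. If you want to display durations or re-check the 4-20 minute window, a small parser is enough. This sketch handles the day/hour/minute/second forms and rejects anything else.

```python
import re

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT12M34S"
_DURATION = re.compile(r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$")


def duration_seconds(iso: str) -> int:
    """Convert an ISO 8601 duration like 'PT1H2M3S' into seconds."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"Unrecognized duration: {iso!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def within_learning_window(iso: str, min_minutes: int = 4, max_minutes: int = 20) -> bool:
    """True if the video length falls in the 4-20 minute range (inclusive)."""
    return min_minutes * 60 <= duration_seconds(iso) <= max_minutes * 60


videos = [("a", "PT3M59S"), ("b", "PT12M34S"), ("c", "PT20M"), ("d", "PT1H2M")]
print([vid for vid, d in videos if within_learning_window(d)])  # ['b', 'c']
```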

Stage 3: Content Formatter

Personalizes everything based on your learning profile:

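content_formatter = LlmAgent(
    name="lumina_content_formatter",  # illustrative name
    # learning_style is a list such as ["examples", "step-by-step"];
    # level is "Beginner" | "Intermediate" | "Advanced".
    # Comments inside the string would be sent to the model, so they live out here.
    instruction=f"""
    Format content for:
    - Learning style: {user_profile['learning_style']}
    - Level: {user_profile['level']}
    - Preferences: {user_preferences}
    """,
)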

If you're a visual learner who prefers examples, you get examples. If you prefer step-by-step guides, you get structured explanations. The content adapts to you.
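There's a design consideration in the snippet above. An f-string is evaluated once, when the agent is constructed, so the profile is frozen into that agent instance. If a single agent serves many learners, the prompt has to be rendered per request instead. Recent ADK versions also accept a callable (an instruction provider) in place of a string. The sketch below shows only that rendering step in plain Python. The fallback defaults are assumptions.

```python
def render_formatter_instruction(profile: dict, preferences: dict) -> str:
    """Build the content formatter's prompt for one learner at request time."""
    styles = profile.get("learning_style") or ["examples"]  # assumed default
    level = profile.get("level", "Beginner")  # assumed default
    prefs = ", ".join(f"{k}={v}" for k, v in sorted(preferences.items())) or "none"
    return (
        "Format content for:\n"
        f"- Learning style: {', '.join(styles)}\n"
        f"- Level: {level}\n"
        f"- Preferences: {prefs}\n"
    )


print(render_formatter_instruction(
    {"learning_style": ["examples", "step-by-step"], "level": "Intermediate"},
    {"video": "short"},
))
```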

SequentialAgent: Chaining It All Together

ADK's SequentialAgent makes this elegant:

from google.adk.agents import SequentialAgent

content_composer = SequentialAgent(
    name="lumina_content_composer_agent",
    sub_agents=[
        content_searcher,   # Stage 1: Search
        video_selector,     # Stage 2: Curate videos
        content_formatter,  # Stage 3: Personalize
    ],
)

The stages run in order and share the same session, so each one can build on what the previous stage produced. Clean, simple, powerful.
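In ADK, a common way to pass results between a SequentialAgent's stages is to give each LlmAgent an output_key. The output_key saves the agent's final response into session state, and the next agent's instruction references that key. The plain-Python analogy below mimics that pattern. The stage functions and state keys are made up for illustration, and no ADK API is involved.

```python
from typing import Callable

Stage = Callable[[dict], str]


def run_sequential(stages: list[tuple[str, Stage]], state: dict) -> dict:
    """Run stages in order; each result is written to state under its output key."""
    for output_key, stage in stages:
        state[output_key] = stage(state)  # later stages can read earlier outputs
    return state


def search(state: dict) -> str:
    return f"articles about {state['concept']}"


def select_videos(state: dict) -> str:
    return f"videos matching {state['concept']}"


def format_content(state: dict) -> str:
    return f"[{state['level']}] {state['search_results']} + {state['video_picks']}"


state = run_sequential(
    [("search_results", search), ("video_picks", select_videos), ("lesson", format_content)],
    {"concept": "neural networks", "level": "Intermediate"},
)
print(state["lesson"])
# [Intermediate] articles about neural networks + videos matching neural networks
```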

Technical Deep Dives: The Decisions That Matter

1. Why Cloud SQL Connector Over Traditional Connections

The problem: Cloud Run (serverless) instances don't have fixed outbound IP addresses by default, so whitelisting IPs isn't practical. Traditional database connections require network configuration.

The solution: Cloud SQL Connector uses IAM credentials. Zero network configuration. Secure by default.

@property
def use_cloud_sql_connector(self) -> bool:
    return self.is_cloud_run and all([
        self.INSTANCE_CONNECTION_NAME,
        self.DB_USER,
        self.DB_PASSWORD,
        self.DB_NAME
    ])

Impact: Production-ready security without the infrastructure headache.
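The property above depends on an is_cloud_run flag that the post doesn't define. One common way to derive it is the K_SERVICE environment variable, which Cloud Run sets for every service. This is an assumption here, not necessarily what LearnForge does. The Settings class below is a hypothetical, trimmed-down stand-in for the app's config, and the usage values are placeholders.

```python
import os
from dataclasses import dataclass, field


def _env(name: str) -> str:
    """Read an environment variable, treating "unset" as an empty string."""
    return os.environ.get(name, "")


@dataclass
class Settings:
    # Hypothetical config: values are read from the environment at construction time
    INSTANCE_CONNECTION_NAME: str = field(default_factory=lambda: _env("INSTANCE_CONNECTION_NAME"))
    DB_USER: str = field(default_factory=lambda: _env("DB_USER"))
    DB_PASSWORD: str = field(default_factory=lambda: _env("DB_PASSWORD"))
    DB_NAME: str = field(default_factory=lambda: _env("DB_NAME"))

    @property
    def is_cloud_run(self) -> bool:
        # Cloud Run sets K_SERVICE (the service name) in every service instance
        return bool(_env("K_SERVICE"))

    @property
    def use_cloud_sql_connector(self) -> bool:
        # Only use the connector on Cloud Run, and only with a complete config
        return self.is_cloud_run and all([
            self.INSTANCE_CONNECTION_NAME, self.DB_USER, self.DB_PASSWORD, self.DB_NAME,
        ])


# Usage with placeholder values (all hypothetical)
os.environ.update({
    "K_SERVICE": "learnforge-api",
    "INSTANCE_CONNECTION_NAME": "my-project:us-central1:learnforge-db",
    "DB_USER": "app",
    "DB_PASSWORD": "example-password",
    "DB_NAME": "sessions",
})
print(Settings().use_cloud_sql_connector)  # True
```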

2. Secret Manager: Zero Hardcoded Credentials

All sensitive configuration comes from Secret Manager:

import os

def _read_secret(env_var: str, default: str = "") -> str:
    value = os.getenv(env_var, default)
    # Cloud Run can mount secrets as files; here the env var holds the file path
    if value and os.path.exists(value):
        with open(value) as f:
            return f.read().strip()
    return value

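Impact: Credential rotation means adding a new secret version. No code changes are needed. If the secret is referenced as latest, new instances pick up the new value when they start. Instances that already read the value keep the old one until they're replaced.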

3. Multi-Stage Docker Build: Faster Cold Starts

Optimized container images reduce cold start time:

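# Builder stage: Install dependencies
FROM python:3.11-slim AS builder
WORKDIR /app
RUN pip install poetry
# Install into the system site-packages; by default Poetry would create its own
# virtualenv and the COPY below would miss the dependencies
RUN poetry config virtualenvs.create false
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --only main

# Runtime stage: Copy only what's needed
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
# Console scripts installed by dependencies (e.g. a server entrypoint) live here
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .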

Impact: 60% smaller images, faster cold starts, lower costs.

4. State-Driven Checkpoint Progression

Checkpoints advance through state, not hardcoded logic:

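from google.adk.tools import FunctionTool, ToolContext

def increment_checkpoint(tool_context: ToolContext) -> dict:
    """Advance the learner to the next checkpoint using only session state."""
    state = tool_context.state
    checkpoints = state["checkpoints"]  # assumed key holding the mission's checkpoint names
    state["current_checkpoint_index"] += 1
    index = state["current_checkpoint_index"]
    state["current_checkpoint_goal"] = checkpoints[index] if index < len(checkpoints) else None
    return {"current_checkpoint_index": index, "mission_complete": index >= len(checkpoints)}

increment_checkpoint_tool = FunctionTool(func=increment_checkpoint)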

Impact: Mission structures are flexible. Add checkpoints? Remove checkpoints? Change order? It just works.

Performance & Scalability: Built for Real Usage

Cloud Run Auto-Scaling

Min instances: 0 (scale to zero = cost savings)
Max instances: 100 (handles traffic spikes)
CPU: 2 vCPU per instance
Memory: 4Gi per instance
Concurrency: 80 requests per instance

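What this means: With zero users, there are zero running instances and no Cloud Run compute cost. Under load, Cloud Run adds instances automatically, up to 100 × 80 = 8,000 concurrent requests. Each open WebSocket holds one of those request slots, so serving 10,000 simultaneous learners would require raising the max-instances limit. Otherwise, no manual intervention is needed.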

Database Connection Pooling

Base pool: 10 connections
Overflow: 5 connections
Timeout: 60 seconds
Recycle: 30 minutes

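What this means: A connection is checked out from the pool only for the duration of a database operation. That lets a pool of 10 connections (plus 5 overflow) per instance serve many concurrent sessions. Requests wait up to 60 seconds for a free connection before timing out. Connections are recycled every 30 minutes so long-lived ones don't go stale.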
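The Cloud Run and pool settings interact, so it's worth doing the arithmetic up front. Each instance keeps its own pool, so peak database connections grow with instance count. Each open WebSocket also occupies one of an instance's concurrency slots for as long as it stays connected. Below is a back-of-the-envelope check using the numbers above. The Cloud SQL connection limit depends on the instance tier and its max_connections flag, so it's an input here. The reserved headroom is an assumed value.

```python
def peak_capacity(max_instances: int, concurrency: int, pool_size: int, max_overflow: int) -> dict:
    """Upper bounds implied by the Cloud Run and SQLAlchemy pool settings."""
    return {
        # Each WebSocket holds a request slot for its whole lifetime
        "max_concurrent_connections": max_instances * concurrency,
        # Every instance can open up to pool_size + max_overflow DB connections
        "max_db_connections": max_instances * (pool_size + max_overflow),
    }


def fits_cloud_sql(limits: dict, cloud_sql_max_connections: int, reserved: int = 10) -> bool:
    """Check the worst case against the database limit; 'reserved' is assumed admin headroom."""
    return limits["max_db_connections"] <= cloud_sql_max_connections - reserved


limits = peak_capacity(max_instances=100, concurrency=80, pool_size=10, max_overflow=5)
print(limits)  # {'max_concurrent_connections': 8000, 'max_db_connections': 1500}
```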

Session State Efficiency

Average session state: ~50KB. That's:

Checkpoint progress
Content search results (cached)
Video selections
User responses

Cloud SQL handles this efficiently. Thousands of concurrent sessions? No problem.

Lessons Learned: What I Wish I Knew Earlier

1. DatabaseSessionService Is Non-Negotiable for Production

In-memory sessions work for demos. Production learning platforms need persistence. The moment I switched to DatabaseSessionService, everything changed:

Users could resume sessions
Progress tracked across weeks
Concurrent learners with isolated state

Learning: Choose session storage based on use case duration, not convenience.

2. Silent Orchestration Creates Better UX

Users don't care about agent architecture. They want a seamless conversation. The orchestrator should be invisible:

# WRONG: User sees the machinery
Orchestrator: "Let me hand you over to the Sensei..."
Sensei: "Hello, let's learn..."

# RIGHT: User only sees the teacher
Sensei: "Hello, let's learn..."

Learning: Hide complexity, show simplicity.

3. Content Authority Separation Improves Quality

Teaching agents shouldn't generate content. They should delegate to specialized agents with:

Access to search APIs (real-time content)
Video curation (filtered, relevant)
Personalization (user preferences)

Learning: Separation of concerns applies to AI agents too.

4. Cloud SQL Connector Simplifies Security

No IP whitelisting. No VPNs. No exposed connection strings. Just IAM credentials and secure connections.

Learning: Use managed services for security, not manual configuration.

5. SequentialAgent Reduces Boilerplate

Content composition requires: search → video selection → formatting. SequentialAgent chains these automatically. No manual coordination needed.

Learning: ADK's built-in patterns are powerful. Use them.

6. State-Driven Flow Enables Flexibility

Hardcoded checkpoint logic breaks when mission structures change. State-driven progression adapts automatically.

Learning: Data-driven > code-driven for dynamic systems.

The Impact: What This Actually Solves

For Learners

Personalized learning paths: No more generic courses
Real-time adaptation: Content adjusts to your understanding
Session persistence: Learn at your own pace, resume anytime
Research-backed content: Latest information, not outdated materials
Multi-modal learning: Text + videos + interactive teaching

For Educators

Scalable tutoring: One AI system can teach thousands simultaneously
Adaptive content: Each learner gets personalized materials
Progress tracking: See exactly where learners are stuck
Content curation: AI finds and filters the best resources

For the Industry

Proof that multi-agent AI works: 12 agents coordinating seamlessly
Production-ready patterns: DatabaseSessionService, Cloud SQL Connector, WebSocket resume
Scalable architecture: Cloud Run auto-scaling, connection pooling
Security best practices: Secret Manager, IAM-based connections

What's Next: The Future of AI-Powered Learning

Short-term:

Multi-modal content (images, diagrams, interactive exercises)
Collaborative learning (team missions, peer review)
Advanced analytics (learning velocity, concept mastery tracking)

Long-term:

Fine-tuned models for domain-specific teaching
Adaptive difficulty based on real-time comprehension
Integration with external platforms (Coursera, edX, Khan Academy)

The vision: Every learner gets a personal tutor that:

Understands their goals
Adapts to their learning style
Remembers everything
Never gets tired
Scales to millions

Conclusion: Why This Matters

Building LearnForge taught me something important: the future of education isn't about better content, it's about better personalization.

Traditional platforms give everyone the same course. LearnForge gives everyone a personalized learning journey that:

Starts with a conversation (not a catalog)
Adapts in real-time (not static content)
Remembers everything (not ephemeral sessions)
Scales seamlessly (not manual infrastructure)

By combining:

Google ADK's hierarchical agent orchestration for complex workflows
Cloud SQL with DatabaseSessionService for persistent state
Cloud Run's auto-scaling for seamless scalability
WebSocket real-time communication for responsive UX

I created a platform that transforms how people learn, from static courses to dynamic, personalized, adaptive journeys.

The technology is here. The infrastructure is ready. The future of education is AI-powered, serverless, and personalized.

LearnForge is just the beginning.

Google Cloud Services Utilized

Service	Purpose	Why It Matters
Cloud Run	Serverless container hosting	Auto-scales from 0 to 100 instances, zero infrastructure management
Artifact Registry	Container image storage	Versioned images, CI/CD integration
Cloud SQL (PostgreSQL)	Persistent session state	Sessions survive restarts, connection drops, device switches
Cloud SQL Connector	Secure database connections	IAM-based security, no IP whitelisting needed
Firebase Authentication	User authentication	Google OAuth 2.0, secure session management
Firestore	Mission data storage	User profiles, mission definitions, enrollments
Cloud Logging	Application logs	Centralized logging, debugging, monitoring
Cloud Trace	Distributed tracing	Performance analysis, bottleneck identification
Secret Manager	Credential storage	Zero hardcoded secrets, rotation-friendly
Agent Development Kit (ADK)	Multi-agent orchestration	Hierarchical agents, sequential pipelines, tool integration
Gemini 2.5 Flash	LLM for agents	Fast, cost-effective, powerful reasoning
YouTube Data API v3	Video curation	Educational video search, filtering, metadata
Google Search API	Content discovery	Real-time research, up-to-date information

Built with Google Cloud Run and Agent Development Kit (ADK)

Transforming how people learn, one conversation at a time.