DEV Community

Rashika Karki


Building LearnForge: Multi-Agent AI Learning Platform on Cloud Run with Google ADK

Note: I wrote this article for the Google Cloud Run Hackathon 2025.

I built LearnForge, an AI-powered learning platform that does something no other platform does: it has a conversation with you, figures out what you actually want to learn (not what you think you want), researches the topic in real-time, and then creates a personalized learning journey that adapts to how you learn.

Built on Google Cloud Run and Agent Development Kit (ADK) with 12 specialized AI agents working together, it handles everything from vague "I want to learn AI" statements to structured, adaptive learning missions that remember where you left off—even if you come back weeks later.

The result? A learning experience that feels like having a personal tutor who never forgets, never gets tired, and actually knows what they're talking about.


The Problem: Why Online Learning is Hard (And How I Fixed It)

Let me paint you a picture. You're sitting at your computer, motivated, ready to learn something new. You type "learn Python" into a course platform. You get 47 courses. You pick one. Three hours later, you realize:

  • You're learning about data types when you wanted to build web APIs
  • The instructor assumes you know nothing, but you've coded in JavaScript for years
  • The examples are boring and don't relate to what you actually want to build
  • You're on lesson 12 of 200, and you have no idea if you're learning the right things

Sound familiar? This is the fundamental problem with online learning: one-size-fits-all doesn't fit anyone.

The Real Problem

Here's what's broken:

1. The Goal Mismatch

When someone says "I want to learn AI," they could mean:

  • Building chatbots for customer service
  • Fine-tuning language models for text summarization
  • Understanding neural network architectures
  • Creating computer vision systems for medical imaging

But traditional platforms give everyone the same "Introduction to AI" course. It's like asking "Where do you want to go?" and getting the same map regardless of whether you said "the grocery store" or "Mount Everest."

2. The Experience Gap

A beginner needs hand-holding. An intermediate learner needs to skip basics. An expert needs advanced concepts. But courses are static; they can't adapt.

3. The Context Loss

You start a course on Monday. Life happens. You come back on Friday. The platform has no idea:

  • What you were learning
  • How well you understood it
  • Where you left off
  • What you were confused about

4. The Content Problem

Most courses are pre-recorded, static content. But learning is dynamic. You might need:

  • A different explanation when you're confused
  • Real-world examples relevant to your goals
  • Video tutorials when you're a visual learner
  • Step-by-step guides when you prefer structured learning

The Solution: AI That Actually Understands

What if, instead of picking from a catalog, you could just... talk to an AI?

You: "I want to learn AI"

AI: "Great! I checked what's trending in AI learning. Most people start with either model training, data pipelines, or building AI-powered applications. Which direction do you want to focus on?"

You: "Building applications, I think"

AI: "Got it. Are you thinking of chatbots, image generators, or recommendation systems?"

You: "Chatbots"

AI: "Perfect! So you want to learn AI chatbot development, at an intermediate level, focusing on building real applications. Does that sound right?"

And then here's the magic. The AI:

  1. Researches the topic using Google Search
  2. Creates a personalized learning mission with checkpoints
  3. Guides you through it conversationally
  4. Adapts content based on your responses
  5. Remembers everything, even if you come back weeks later

This isn't a chatbot. This is a multi-agent AI system that coordinates 12 specialized agents to create a complete learning experience.


How It Works: The Magic Behind the Scenes

LearnForge uses Google's Agent Development Kit (ADK) to orchestrate 12 specialized AI agents. Think of it like a well-coordinated team where each agent has a specific job, but to the user, it feels like talking to one intelligent tutor.

Phase 1: Mission Creation (Meet Polaris)

When you first connect, you meet Polaris, the Pathfinder. Polaris doesn't just ask questions; it researches your topic in real time so it can ask informed, intelligent questions.

You: "I want to learn machine learning"

Polaris: [Searches Google for "machine learning learning paths 2025"]
         [Finds that most people focus on: model training, data prep, or deployment]

         "When people explore 'machine learning,' they often focus on 
         model training, data prep, or deployment. Which of these do 
         you want to master first?"

Behind the scenes, Polaris uses:

  • Pathfinder Agent: Conversational goal clarification with research
  • Search Agent: Google Search API for real-time topic research
  • Mission Curator Agent: Converts your goals into structured learning missions

The result? Instead of a generic "Machine Learning 101" course, you get a mission tailored to your specific goal: "Building Production ML Systems with Python and TensorFlow" or "Data Preparation for Machine Learning Models."
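To make "structured learning mission" a little more concrete, here is a minimal sketch of the kind of object a Mission Curator could hand off. The `Mission` and `Checkpoint` classes, their fields, and the example checkpoint titles are hypothetical, not LearnForge's actual schema; they only illustrate a clarified goal turned into ordered checkpoints plus the starting session state.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """One bite-sized learning goal inside a mission (hypothetical schema)."""
    title: str
    objective: str


@dataclass
class Mission:
    """A learner's clarified goal, turned into an ordered list of checkpoints."""
    title: str
    level: str  # e.g. "Beginner" | "Intermediate" | "Advanced"
    checkpoints: list[Checkpoint] = field(default_factory=list)

    def initial_state(self) -> dict:
        """Session state a brand-new mission could start from."""
        return {
            "current_checkpoint_index": 0,
            "completed_checkpoints": [],
            "checkpoints": [c.title for c in self.checkpoints],
        }


mission = Mission(
    title="Data Preparation for Machine Learning Models",
    level="Intermediate",
    checkpoints=[
        Checkpoint("Cleaning raw data", "Handle missing values and outliers"),
        Checkpoint("Feature engineering", "Derive useful model inputs"),
        Checkpoint("Train/test splits", "Avoid leakage when evaluating"),
    ],
)
print(mission.initial_state()["checkpoints"])
```

Keeping the mission as plain data means the teaching agents never need to know how many checkpoints a mission has; they just read the current index from state.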

Phase 2: Learning Execution (Meet Lumina)

Once your mission is created, Lumina takes over. Lumina is your personal learning companion: patient, adaptive, and genuinely helpful.

Lumina guides you through checkpoints (bite-sized learning goals) using a sophisticated multi-agent system:

Lumina Orchestrator (invisible coordinator)
    ├── Greeter: "Welcome! Let's start your journey..."
    ├── Flow Briefer: "Next up: Understanding Neural Networks. Ready?"
    ├── Sensei (your teacher)
    │   ├── Content Composer
    │   │   ├── Content Searcher: Finds educational articles via Google Search
    │   │   ├── Video Selector: Curates YouTube videos (4-20 min, educational)
    │   │   └── Content Formatter: Personalizes content for your learning style
    │   └── Evaluates your understanding
    ├── Help Desk: Answers off-topic questions
    └── Wrapper: Celebrates your completion

Here's what makes this special:

1. Content is Generated in Real-Time

When Sensei teaches you about "neural networks," it doesn't pull from a static database. Instead:

  • Content Searcher finds the latest, most relevant articles
  • Video Selector curates educational YouTube videos (filtered by duration, category, relevance)
  • Content Formatter adapts everything to your learning style (visual? examples? step-by-step?)

2. It Adapts to Your Understanding

Sensei: "Can you explain how backpropagation works?"

You: "It's like... adjusting weights based on errors?"

Sensei: "You're on track with the error part! Let me clarify the 
        weight adjustment mechanism..."
        [Delegates to Content Composer for a clearer explanation]
        [Presents it naturally, as if Sensei knew it all along]

3. It Remembers Everything

This is where it gets interesting. Most chatbots lose context when you close the browser. LearnForge uses Cloud SQL with DatabaseSessionService to persist everything:

  • Which checkpoint you're on
  • What content was presented
  • How you responded to questions
  • What you were confused about
  • Your learning preferences

Close your browser, come back next week, switch devices: it all just works.


The Architecture: How 12 Agents Work Together Seamlessly

The technical magic is in the orchestration. LearnForge uses Google ADK's hierarchical agent system to coordinate specialized agents without the user ever knowing.

Silent Orchestration: The Invisible Hand

The orchestrators are completely invisible. Users never see messages like "Let me hand you over to the Sensei..." Instead, transitions are seamless:

# What the user sees:
Sensei: "Let's explore neural networks! Here's how they work..."

# What's actually happening:
Orchestrator → delegates to Sensei → Sensei delegates to Content Composer
→ Content Composer chains: Searcher → Video Selector → Formatter
→ Content flows back → Sensei presents it naturally

The orchestrator's instruction is explicit:

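from google.adk.agents import LlmAgent

root_agent = LlmAgent(
    name="lumina_orchestrator",  # illustrative name; ADK requires one
    model="gemini-2.5-flash",
    instruction="""
    YOU MUST NEVER TALK TO THE USER DIRECTLY.
    YOU MUST NEVER ACKNOWLEDGE DELEGATIONS.
    The user should ONLY see responses from sub-agents.
    """,
    # sub_agents=[...]  greeter, flow briefer, sensei, help desk, wrapper
)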
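The instruction is one layer of enforcement. A complementary safeguard, sketched here under assumptions, is to filter events on the server before they reach the WebSocket, forwarding only text written by user-facing agents. ADK events do carry an `author` field naming the agent that produced them, but the `Event` dataclass below is a simplified stand-in, not ADK's class, and the agent name in `SILENT_AGENTS` is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Simplified stand-in for an agent event (not ADK's Event class)."""
    author: str  # name of the agent that produced the event
    text: str


# Hypothetical: agents that coordinate behind the scenes and must never speak
SILENT_AGENTS = {"lumina_orchestrator"}


def user_visible(events: list[Event]) -> list[str]:
    """Drop orchestrator chatter and empty messages; keep sub-agent replies."""
    return [
        e.text for e in events
        if e.author not in SILENT_AGENTS and e.text.strip()
    ]


events = [
    Event("lumina_orchestrator", "Let me hand you over to the Sensei..."),
    Event("sensei", "Let's explore neural networks! Here's how they work..."),
]
print(user_visible(events))
```

Prompt rules can occasionally be ignored by a model; a filter like this makes "the orchestrator never speaks" a property of the server rather than of the prompt.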

Content Authority Separation: Why Teaching Agents Don't Generate Content

Here's an insight that improved content quality dramatically: teaching agents shouldn't generate content; they should delegate it to specialized agents.

sensei_agent = LlmAgent(
    name="lumina_sensei",  # illustrative name; ADK requires one
    instruction="""
    YOU ARE FORBIDDEN FROM CREATING ANY TEACHING CONTENT.
    You must delegate ALL content creation to content_composer_agent.

    YOU CAN:
    - Ask questions
    - Evaluate answers
    - Provide feedback

    YOU CANNOT:
    - Explain concepts yourself
    - Provide examples yourself
    """,
)

Why? Because:

  • Content Searcher has access to Google Search (real-time, research-backed)
  • Video Selector has access to YouTube API (curated, filtered)
  • Content Formatter knows your learning preferences

Sensei focuses on pedagogy. Content creation agents focus on quality. Separation of concerns, even for AI.


The DatabaseSessionService Breakthrough: Why This Changes Everything

Here's where LearnForge diverges from every other AI learning platform I've seen.

The Problem Nobody Talks About

Most AI chatbots use in-memory sessions. This works fine for:

  • 5-minute conversations
  • Simple Q&A
  • Demos

But learning is different. Learning sessions can span:

  • Hours (deep dive sessions)
  • Days (coming back to continue)
  • Weeks (long-form courses)
  • Months (mastery journeys)

In-memory sessions fail catastrophically:

  • Server restart? Session lost.
  • Connection drop? Session lost.
  • Switch devices? Session lost.
  • Come back tomorrow? Session lost.

The Solution: Persistent State with Cloud SQL

LearnForge uses DatabaseSessionService with Cloud SQL (PostgreSQL) to persist everything:

from google.adk.sessions import DatabaseSessionService
from google.cloud.sql.connector import Connector

connector = Connector(refresh_strategy="LAZY")

session_service = DatabaseSessionService(
    db_url="postgresql+pg8000://",
    creator=lambda: connector.connect(
        instance_connection_name,
        "pg8000",
        user=db_user,
        password=db_password,
        db=db_name,
    ),
    pool_size=10,
    max_overflow=5,
    pool_timeout=60,
    pool_recycle=1800,
)

What gets persisted:

  • Current checkpoint index
  • Completed checkpoints
  • Content search results (so we don't re-search)
  • Video selections (so we remember what was shown)
  • User responses and comprehension checks
  • Learning preferences
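DatabaseSessionService manages its own tables, so what follows is not its schema. It is a stdlib-only sketch, with SQLite standing in for Cloud SQL, of the underlying idea: state keyed by (app, user, session) and serialized to a database survives a process restart, which an in-memory dict cannot. The table and function names are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from typing import Optional


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) a table of session state keyed by app/user/session."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session_state ("
        " app_name TEXT, user_id TEXT, session_id TEXT, state TEXT,"
        " PRIMARY KEY (app_name, user_id, session_id))"
    )
    return conn


def save_state(conn: sqlite3.Connection, app_name: str, user_id: str,
               session_id: str, state: dict) -> None:
    # Upsert the whole state as JSON; "with conn" commits the transaction
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO session_state VALUES (?, ?, ?, ?)",
            (app_name, user_id, session_id, json.dumps(state)),
        )


def load_state(conn: sqlite3.Connection, app_name: str, user_id: str,
               session_id: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT state FROM session_state"
        " WHERE app_name = ? AND user_id = ? AND session_id = ?",
        (app_name, user_id, session_id),
    ).fetchone()
    return json.loads(row[0]) if row else None


db_path = os.path.join(tempfile.mkdtemp(), "sessions.db")
conn = open_store(db_path)
save_state(conn, "mission-ally", "user-1", "mission-1",
           {"current_checkpoint_index": 2, "completed_checkpoints": [0, 1]})
conn.close()  # simulate a server restart: nothing survives in memory

conn = open_store(db_path)
restored = load_state(conn, "mission-ally", "user-1", "mission-1")
print(restored)
```

Swapping SQLite for PostgreSQL on Cloud SQL changes the connection, not the idea: the source of truth lives outside the process, so any instance can pick the session back up.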

The impact:

You can:

  • Close your browser mid-checkpoint and resume exactly where you left off
  • Switch from laptop to phone seamlessly
  • Come back weeks later and continue your mission
  • Share sessions with team members (collaborative learning)

This isn't just a feature; it's what makes LearnForge production-ready for real learning, not just demos.

Cloud SQL Connector: Security Without the Headache

Traditional database access requires:

  • IP whitelisting (nightmare in serverless)
  • VPNs (complex setup)
  • Exposed connection strings (security risk)

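Cloud SQL Connector authorizes each connection with the service's IAM identity (it needs the Cloud SQL Client role) and encrypts traffic with TLS, so there are no authorized networks or IP allowlists to manage. The database password still exists, but it comes from Secret Manager instead of a hardcoded connection string; the connector can also use IAM database authentication to drop the password entirely.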

def _create_cloud_sql_connection(self):
    return self._connector.connect(
        settings.INSTANCE_CONNECTION_NAME,
        "pg8000",
        user=settings.DB_USER,
        password=settings.DB_PASSWORD,
        db=settings.DB_NAME,
    )

Production-grade security, zero configuration.


Real-Time WebSocket: The Conversation That Never Lags

Both Polaris and Lumina use WebSocket connections for real-time, bidirectional communication. But here's what makes it special: session resume.

The Flow

from fastapi import APIRouter, WebSocket

router = APIRouter()


@router.websocket("/ws")
async def mission_ally_websocket(websocket: WebSocket, mission_id: str):
    await websocket.accept()
    user_id = await authenticate(websocket)  # app-specific auth helper
    session_id = mission_id  # assumption: one session per user per mission

    # Check for an existing session in Cloud SQL
    existing_session = await session_service.get_session(
        app_name="mission-ally",
        user_id=user_id,
        session_id=session_id,
    )

    if existing_session:
        # Resume from the last checkpoint
        current_checkpoint = existing_session.state["current_checkpoint_index"]
        completed = existing_session.state["completed_checkpoints"]
        # Send historical messages, continue from where they left off
    else:
        # Start a new mission
        session = await session_service.create_session(
            app_name="mission-ally",
            user_id=user_id,
            session_id=session_id,
        )

If you disconnect and reconnect, the system:

  1. Loads your session from Cloud SQL
  2. Sends you historical messages (so you see the conversation)
  3. Continues from your last checkpoint
  4. Feels completely seamless

No "start over" button. No lost progress. Just... continue.
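Once state is persisted, the resume decision itself is small. Here is a sketch assuming the two state keys read in the WebSocket handler, plus a hypothetical list of checkpoint names; it also assumes `completed_checkpoints` stores indices, which the article doesn't specify.

```python
def plan_resume(state: dict, checkpoints: list[str]) -> dict:
    """Decide where a reconnecting learner should continue."""
    index = state.get("current_checkpoint_index", 0)
    # Assumption: completed_checkpoints holds indices; skip any already done
    completed = set(state.get("completed_checkpoints", []))
    while index < len(checkpoints) and index in completed:
        index += 1
    if index >= len(checkpoints):
        return {"status": "mission_complete"}
    return {"status": "resume", "index": index, "checkpoint": checkpoints[index]}


names = ["Perceptrons", "Backpropagation", "Optimizers"]
print(plan_resume({"current_checkpoint_index": 1, "completed_checkpoints": [0]}, names))
print(plan_resume({"current_checkpoint_index": 2, "completed_checkpoints": [0, 1, 2]}, names))
```

Skipping already-completed indices guards against a disconnect that lands between "checkpoint finished" and "index advanced", so a learner is never asked to redo work.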


Content Composition: How AI Curates Your Learning Materials

When Sensei needs to teach you about "neural networks," it doesn't pull from a static database. Instead, it orchestrates a three-stage pipeline:

Stage 1: Content Searcher

Uses Google Search API to find the latest, most relevant educational content:

from google.adk.agents import LlmAgent
from google.adk.tools import google_search  # ADK's built-in Google Search tool

content_searcher = LlmAgent(
    name="lumina_content_searcher",
    tools=[google_search],
    instruction="Search for educational content about the concept...",
)

Real-time search means you get current information, not outdated course materials.

Stage 2: Video Selector

Uses YouTube Data API v3 to curate educational videos:

def search_youtube_videos(
    query: str,
    max_results: int = 3,
    duration_filter: str = "medium",  # 4-20 minutes (optimal for learning)
    video_category_id: str = "27",  # Education category
) -> list[dict]:
    # Filters by: duration, category, relevance
    # Returns: title, channel, description, duration, thumbnail
    ...

Why this matters: Not all YouTube videos are educational. Not all educational videos are the right length. The selector finds videos that are:

  • Actually educational (category 27)
  • The right duration (4-20 min is the sweet spot)
  • Relevant to the concept
  • From reputable channels
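The function above only shows its signature. As a hedged sketch of how those filters map onto YouTube Data API v3: `search.list` accepts `videoDuration="medium"` (4 to 20 minutes) and `videoCategoryId="27"` (Education), and both require `type="video"`. Since `search.list` does not return durations, a follow-up `videos.list` call with `part="contentDetails"` returns ISO 8601 durations such as `PT12M30S`. The helper names below are hypothetical, and the request is only built here, not sent.

```python
import re

# ISO 8601 durations as returned in contentDetails.duration, e.g. "PT12M30S"
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def search_params(query: str, max_results: int = 3) -> dict:
    """Keyword arguments for youtube.search().list(...) (built, not sent)."""
    return {
        "part": "snippet",
        "q": query,
        "type": "video",            # required by the two filters below
        "videoDuration": "medium",  # 4 to 20 minutes
        "videoCategoryId": "27",    # Education
        "maxResults": max_results,
    }


def duration_seconds(iso: str) -> int:
    """Convert an hours/minutes/seconds ISO 8601 duration to seconds."""
    match = _DURATION.match(iso)
    if not match:
        raise ValueError(f"unsupported duration: {iso}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def in_learning_window(iso: str, low_min: int = 4, high_min: int = 20) -> bool:
    """Re-check the 4-20 minute window against detailed video metadata."""
    return low_min * 60 <= duration_seconds(iso) <= high_min * 60


params = search_params("backpropagation explained")
print(params["videoCategoryId"], duration_seconds("PT12M30S"))
print(in_learning_window("PT12M30S"), in_learning_window("PT1H2M"))
```

Re-checking duration on the detailed metadata also gives the selector exact numbers to show the learner ("12 min video") instead of a coarse bucket.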

Stage 3: Content Formatter

Personalizes everything based on your learning profile:

# learning_style: e.g. ["examples", "step-by-step"]
# level: "Beginner" | "Intermediate" | "Advanced"
# (kept outside the string: comments inside an f-string become part of the prompt)
content_formatter = LlmAgent(
    name="lumina_content_formatter",  # illustrative name; ADK requires one
    instruction=f"""
    Format content for:
    - Learning style: {user_profile['learning_style']}
    - Level: {user_profile['level']}
    - Preferences: {user_preferences}
    """,
)

If you're a visual learner who prefers examples, you get examples. If you prefer step-by-step guides, you get structured explanations. The content adapts to you.
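One subtlety: an f-string is evaluated once, when the agent is defined, so the snippet above bakes a single profile into the agent. One way to keep the prompt per learner, sketched here as an assumption rather than LearnForge's code, is to build it from each session's profile at request time (ADK also supports `{key}` placeholders in instructions that are filled from session state). The function name and the `"examples"` default are hypothetical.

```python
from typing import Optional

LEVELS = ("Beginner", "Intermediate", "Advanced")


def formatter_instruction(profile: dict, preferences: Optional[dict] = None) -> str:
    """Build the Content Formatter prompt for one learner (hypothetical helper)."""
    styles = profile.get("learning_style") or ["examples"]  # assumed default
    level = profile.get("level", "Beginner")
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    prefs = ", ".join(f"{k}={v}" for k, v in sorted((preferences or {}).items()))
    return (
        "Format content for:\n"
        f"- Learning style: {', '.join(styles)}\n"
        f"- Level: {level}\n"
        f"- Preferences: {prefs or 'none'}\n"
    )


text = formatter_instruction(
    {"learning_style": ["examples", "step-by-step"], "level": "Intermediate"},
    {"videos": True},
)
print(text)
```

Validating the level up front turns a malformed profile into a clear error instead of a vague prompt.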

SequentialAgent: Chaining It All Together

ADK's SequentialAgent makes this elegant:

from google.adk.agents import SequentialAgent

content_composer = SequentialAgent(
    name="lumina_content_composer_agent",
    sub_agents=[
        content_searcher,   # Stage 1: Search
        video_selector,     # Stage 2: Curate videos
        content_formatter,  # Stage 3: Personalize
    ],
)

Each stage passes its output to the next. Clean, simple, powerful.
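In ADK the hand-off happens through shared session state: an `LlmAgent` can write its final reply into state via `output_key`, and later agents in the sequence read it. The stdlib sketch below mimics that data flow with plain functions standing in for the three LLM agents, so it shows only how results accumulate in state, not ADK's execution model. The stage functions and state keys are hypothetical.

```python
from typing import Callable

Stage = Callable[[dict], object]


def run_sequential(stages: list[tuple[str, Stage]], state: dict) -> dict:
    """Run stages in order; store each result in state under its output key."""
    for output_key, stage in stages:
        state[output_key] = stage(state)
    return state


def search(state: dict) -> list[str]:  # stand-in for content_searcher
    return [f"article about {state['concept']}"]


def pick_videos(state: dict) -> list[str]:  # stand-in for video_selector
    return [f"video on {state['concept']}"] if state["search_results"] else []


def format_content(state: dict) -> str:  # stand-in for content_formatter
    return (f"{len(state['search_results'])} article(s), "
            f"{len(state['videos'])} video(s) for {state['concept']}")


final = run_sequential(
    [("search_results", search), ("videos", pick_videos), ("lesson", format_content)],
    {"concept": "neural networks"},
)
print(final["lesson"])
```

Because each stage only reads keys written earlier, reordering or inserting a stage is a change to the list, not to the stages themselves.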


Technical Deep Dives: The Decisions That Matter

1. Why Cloud SQL Connector Over Traditional Connections

The problem: In Cloud Run (serverless), you can't whitelist IPs. Traditional database connections require network configuration.

The solution: Cloud SQL Connector uses IAM credentials. Zero network configuration. Secure by default.

@property
def use_cloud_sql_connector(self) -> bool:
    return self.is_cloud_run and all([
        self.INSTANCE_CONNECTION_NAME,
        self.DB_USER,
        self.DB_PASSWORD,
        self.DB_NAME
    ])

Impact: Production-ready security without the infrastructure headache.
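The property depends on an `is_cloud_run` flag that isn't shown. One common way to detect Cloud Run, assumed here, is the `K_SERVICE` environment variable that Cloud Run sets on service instances. The `Settings` class below is a hypothetical, stdlib-only stand-in for the app's settings object; the service and instance names in the usage are placeholders.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass
class Settings:
    """Hypothetical settings object populated from environment variables."""
    INSTANCE_CONNECTION_NAME: str = ""
    DB_USER: str = ""
    DB_PASSWORD: str = ""
    DB_NAME: str = ""
    is_cloud_run: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Settings":
        return cls(
            INSTANCE_CONNECTION_NAME=env.get("INSTANCE_CONNECTION_NAME", ""),
            DB_USER=env.get("DB_USER", ""),
            DB_PASSWORD=env.get("DB_PASSWORD", ""),
            DB_NAME=env.get("DB_NAME", ""),
            # Cloud Run sets K_SERVICE to the service name on each instance
            is_cloud_run=bool(env.get("K_SERVICE")),
        )

    @property
    def use_cloud_sql_connector(self) -> bool:
        return self.is_cloud_run and all([
            self.INSTANCE_CONNECTION_NAME, self.DB_USER,
            self.DB_PASSWORD, self.DB_NAME,
        ])


local = Settings.from_env({"DB_USER": "app"})
cloud = Settings.from_env({
    "K_SERVICE": "learnforge-api",
    "INSTANCE_CONNECTION_NAME": "project:region:instance",
    "DB_USER": "app", "DB_PASSWORD": "secret", "DB_NAME": "sessions",
})
print(local.use_cloud_sql_connector, cloud.use_cloud_sql_connector)
```

Passing the environment as a mapping keeps the detection logic testable on a laptop without faking Cloud Run.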

2. Secret Manager: Zero Hardcoded Credentials

All sensitive configuration comes from Secret Manager:

import os


def _read_secret(env_var: str, default: str = "") -> str:
    value = os.getenv(env_var, default)
    # Cloud Run can mount a secret as a file; in that case the env var
    # holds the file's path, so read the secret from the file instead
    if value and os.path.exists(value):
        with open(value) as f:
            return f.read().strip()
    return value

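Impact: Credential rotation? Update the secret in Secret Manager; no code changes. Depending on how the secret is exposed (environment variable or mounted file) and when the app reads it, new instances or a new revision may be needed before the new value takes effect.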
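One caveat with the path-sniffing approach: a literal secret that happens to match an existing path would be read as a file, and a mistyped mount path silently comes back as the "secret" itself. An alternative convention, used by some official Docker images (for example `POSTGRES_PASSWORD_FILE`), is an explicit `_FILE` suffix. Here is a hedged sketch of that variant; the `read_secret` name and the sample values are hypothetical.

```python
import os
import tempfile
from typing import Mapping


def read_secret(name: str, default: str = "",
                env: Mapping[str, str] = os.environ) -> str:
    """Return NAME_FILE's contents if set, else NAME, else the default."""
    path = env.get(f"{name}_FILE")
    if path:
        # A missing or unreadable file raises instead of silently
        # handing the path string back as the secret
        with open(path) as f:
            return f.read().strip()
    return env.get(name, default)


secret_path = os.path.join(tempfile.mkdtemp(), "db_password")
with open(secret_path, "w") as f:
    f.write("s3cr3t\n")

env = {"DB_PASSWORD_FILE": secret_path, "DB_USER": "app_user"}
print(read_secret("DB_PASSWORD", env=env), read_secret("DB_USER", env=env),
      repr(read_secret("DB_NAME", "learnforge", env=env)))
```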

3. Multi-Stage Docker Build: Faster Cold Starts

Optimized container images reduce cold start time:

# Builder stage: install dependencies into the system site-packages
FROM python:3.11-slim AS builder
WORKDIR /app
RUN pip install poetry
# Install into the system interpreter, not a Poetry virtualenv, so the
# runtime stage can copy site-packages directly
RUN poetry config virtualenvs.create false
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --only main

# Runtime stage: copy only what's needed
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .

Impact: 60% smaller images, faster cold starts, lower costs.

4. State-Driven Checkpoint Progression

Checkpoints advance through state, not hardcoded logic:

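from google.adk.tools import FunctionTool, ToolContext

def increment_checkpoint(tool_context: ToolContext) -> dict:
    """Advance the learner to the next checkpoint in the mission."""
    state = tool_context.state
    next_index = state["current_checkpoint_index"] + 1
    state["current_checkpoint_index"] = next_index
    # Assumption: checkpoint names live in state under a "checkpoints" list
    state["current_checkpoint_goal"] = state["checkpoints"][next_index]
    return {"current_checkpoint_index": next_index}

increment_checkpoint_tool = FunctionTool(func=increment_checkpoint)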

Impact: Mission structures are flexible. Add checkpoints? Remove checkpoints? Change order? It just works.
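The core of state-driven progression can be exercised without ADK. This sketch applies the same kind of update to a plain dict, assuming a hypothetical `checkpoints` list in state, records completions, and adds an end-of-mission guard so the index never runs past the last checkpoint.

```python
def advance_checkpoint(state: dict) -> dict:
    """Mark the current checkpoint complete and move to the next one."""
    checkpoints = state["checkpoints"]  # assumption: ordered checkpoint names
    index = state["current_checkpoint_index"]
    completed = state.setdefault("completed_checkpoints", [])
    if index not in completed:
        completed.append(index)
    if index + 1 >= len(checkpoints):
        # Last checkpoint done: flag completion instead of overrunning the list
        state["mission_complete"] = True
        return {"mission_complete": True}
    state["current_checkpoint_index"] = index + 1
    state["current_checkpoint_goal"] = checkpoints[index + 1]
    return {"current_checkpoint_index": index + 1,
            "current_checkpoint_goal": checkpoints[index + 1]}


state = {"checkpoints": ["Perceptrons", "Backpropagation"], "current_checkpoint_index": 0}
first = advance_checkpoint(state)
second = advance_checkpoint(state)
print(first)
print(second)
```

Because the mission's length and order come from the data, adding or reordering checkpoints changes only the `checkpoints` list.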


Performance & Scalability: Built for Real Usage

Cloud Run Auto-Scaling

  • Min instances: 0 (scale to zero = cost savings)
  • Max instances: 100 (handles traffic spikes)
  • CPU: 2 vCPU per instance
  • Memory: 4Gi per instance
  • Concurrency: 80 requests per instance

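What this means: Zero users? Zero Cloud Run compute cost (Cloud SQL still bills while the instance runs). A traffic spike? Instances are added automatically, with no manual intervention, up to 100 × 80 = 8,000 concurrent requests. Each open WebSocket occupies one of those request slots for as long as it stays connected, so this configuration supports roughly 8,000 simultaneously connected learners; more would require raising max instances.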

Database Connection Pooling

  • Base pool: 10 connections
  • Overflow: 5 connections
  • Timeout: 60 seconds
  • Recycle: 30 minutes

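What this means: Connections are borrowed from the pool for each query and then returned, so many sessions share a handful of connections per instance. The limit to plan for is on the database side: each instance can open up to 15 connections (10 + 5 overflow), and the total across all instances must fit within Cloud SQL's max_connections.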
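The scaling and pool settings in this section multiply out to hard ceilings worth checking before a launch. A small worked calculation with the numbers above (the `capacity` function is illustrative):

```python
def capacity(max_instances: int, concurrency: int,
             pool_size: int, max_overflow: int) -> dict:
    """Upper bounds implied by Cloud Run scaling and SQLAlchemy pool settings."""
    return {
        # Each open WebSocket occupies one request slot for its whole lifetime
        "max_concurrent_connections": max_instances * concurrency,
        # Each instance may open pool_size + max_overflow database connections
        "max_db_connections": max_instances * (pool_size + max_overflow),
    }


print(capacity(max_instances=100, concurrency=80, pool_size=10, max_overflow=5))
```

If the Cloud SQL tier's `max_connections` is below the second number (plus headroom for admin tools and migrations), lower max instances or the per-instance pool rather than discovering the limit under load.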

Session State Efficiency

Average session state: ~50KB. That's:

  • Checkpoint progress
  • Content search results (cached)
  • Video selections
  • User responses

Cloud SQL handles this efficiently. Thousands of concurrent sessions? No problem.


Lessons Learned: What I Wish I Knew Earlier

1. DatabaseSessionService Is Non-Negotiable for Production

In-memory sessions work for demos. Production learning platforms need persistence. The moment I switched to DatabaseSessionService, everything changed:

  • Users could resume sessions
  • Progress tracked across weeks
  • Concurrent learners with isolated state

Learning: Choose session storage based on use case duration, not convenience.

2. Silent Orchestration Creates Better UX

Users don't care about agent architecture. They want a seamless conversation. The orchestrator should be invisible:

# WRONG: User sees the machinery
Orchestrator: "Let me hand you over to the Sensei..."
Sensei: "Hello, let's learn..."

# RIGHT: User only sees the teacher
Sensei: "Hello, let's learn..."

Learning: Hide complexity, show simplicity.

3. Content Authority Separation Improves Quality

Teaching agents shouldn't generate content. They should delegate to specialized agents with:

  • Access to search APIs (real-time content)
  • Video curation (filtered, relevant)
  • Personalization (user preferences)

Learning: Separation of concerns applies to AI agents too.

4. Cloud SQL Connector Simplifies Security

No IP whitelisting. No VPNs. No exposed connection strings. Just IAM credentials and secure connections.

Learning: Use managed services for security, not manual configuration.

5. SequentialAgent Reduces Boilerplate

Content composition requires: search → video selection → formatting. SequentialAgent chains these automatically. No manual coordination needed.

Learning: ADK's built-in patterns are powerful. Use them.

6. State-Driven Flow Enables Flexibility

Hardcoded checkpoint logic breaks when mission structures change. State-driven progression adapts automatically.

Learning: Data-driven > code-driven for dynamic systems.


The Impact: What This Actually Solves

For Learners

  • Personalized learning paths: No more generic courses
  • Real-time adaptation: Content adjusts to your understanding
  • Session persistence: Learn at your own pace, resume anytime
  • Research-backed content: Latest information, not outdated materials
  • Multi-modal learning: Text + videos + interactive teaching

For Educators

  • Scalable tutoring: One AI system can teach thousands simultaneously
  • Adaptive content: Each learner gets personalized materials
  • Progress tracking: See exactly where learners are stuck
  • Content curation: AI finds and filters the best resources

For the Industry

  • Proof that multi-agent AI works: 12 agents coordinating seamlessly
  • Production-ready patterns: DatabaseSessionService, Cloud SQL Connector, WebSocket resume
  • Scalable architecture: Cloud Run auto-scaling, connection pooling
  • Security best practices: Secret Manager, IAM-based connections

What's Next: The Future of AI-Powered Learning

Short-term:

  • Multi-modal content (images, diagrams, interactive exercises)
  • Collaborative learning (team missions, peer review)
  • Advanced analytics (learning velocity, concept mastery tracking)

Long-term:

  • Fine-tuned models for domain-specific teaching
  • Adaptive difficulty based on real-time comprehension
  • Integration with external platforms (Coursera, edX, Khan Academy)

The vision: Every learner gets a personal tutor that:

  • Understands their goals
  • Adapts to their learning style
  • Remembers everything
  • Never gets tired
  • Scales to millions

Conclusion: Why This Matters

Building LearnForge taught me something important: the future of education isn't about better content, it's about better personalization.

Traditional platforms give everyone the same course. LearnForge gives everyone a personalized learning journey that:

  • Starts with a conversation (not a catalog)
  • Adapts in real-time (not static content)
  • Remembers everything (not ephemeral sessions)
  • Scales seamlessly (not manual infrastructure)

By combining:

  • Google ADK's hierarchical agent orchestration for complex workflows
  • Cloud SQL with DatabaseSessionService for persistent state
  • Cloud Run's auto-scaling for seamless scalability
  • WebSocket real-time communication for responsive UX

I created a platform that transforms how people learn, from static courses to dynamic, personalized, adaptive journeys.

The technology is here. The infrastructure is ready. The future of education is AI-powered, serverless, and personalized.

LearnForge is just the beginning.


Google Cloud Services Utilized

| Service | Purpose | Why It Matters |
|---|---|---|
| Cloud Run | Serverless container hosting | Auto-scales from 0 to 100 instances, zero infrastructure management |
| Artifact Registry | Container image storage | Versioned images, CI/CD integration |
| Cloud SQL (PostgreSQL) | Persistent session state | Sessions survive restarts, connection drops, device switches |
| Cloud SQL Connector | Secure database connections | IAM-based security, no IP whitelisting needed |
| Firebase Authentication | User authentication | Google OAuth 2.0, secure session management |
| Firestore | Mission data storage | User profiles, mission definitions, enrollments |
| Cloud Logging | Application logs | Centralized logging, debugging, monitoring |
| Cloud Trace | Distributed tracing | Performance analysis, bottleneck identification |
| Secret Manager | Credential storage | Zero hardcoded secrets, rotation-friendly |
| Agent Development Kit (ADK) | Multi-agent orchestration | Hierarchical agents, sequential pipelines, tool integration |
| Gemini 2.5 Flash | LLM for agents | Fast, cost-effective, powerful reasoning |
| YouTube Data API v3 | Video curation | Educational video search, filtering, metadata |
| Google Search API | Content discovery | Real-time research, up-to-date information |

Built with Google Cloud Run and Agent Development Kit (ADK)

Transforming how people learn, one conversation at a time.
