Ashiha Mahesh Kumar

Posted on May 25 • Edited on May 26

DueIt : Your To-Do List Doesn't Know You So I Gave Mine Three Brains

#devchallenge #gemmachallenge #gemma

Gemma 4 Challenge: Build With Gemma 4 Submission

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

What I Built
DueIt is an adaptive task planning system powered by three Gemma 4 models. You create one task — and the system plans your entire path to the deadline, generates a Notion workspace, scrapes real resources, builds a visual workflow diagram, and replans automatically when you miss a day.
It solves two problems no to-do app has ever fixed:

Forgetting an assignment exists until the night before
Not having enough time because you started too late

Three Gemma 4 models power the system — each chosen for a specific reason:

31B Dense for deep reasoning — task planning, time estimation, Notion doc generation
2B for speed — morning replanning in under 2 seconds
26B MoE for multi-domain synthesis — Excalidraw workflow diagrams and analysis

Let me walk you through exactly what happens when you use DueIt.

You open the app and create a task.

Say you need to build a RAG chatbot. You type the title, pick "Work" as the category, set the deadline, and hit create.
Behind the scenes, Gemma 4 31B Dense starts reasoning. It reads the task, estimates that this will take around 55 hours across 5 phases, and generates a structured day-by-day schedule with specific microtasks for each day. It calculates pressure and risk scores based on your deadline proximity and workload.
When the plan lands, your home screen shows the task with every microtask you can expand and check off as you go.

Your calendar fills itself.
You didn't manually block time. You didn't drag anything. Gemma 31B distributed your microtasks across every day from now until the deadline as real time-blocked sessions — morning workout at 7, landing page mockup at 9, RAG chatbot work at 11:30.
Multiple tasks from different categories sit side by side on the same day. You open the calendar and know exactly what to work on, when, and for how long.

The dashboard tells you the truth.
Not motivational quotes. Not "you got this." The truth.
Your pressure is at 71% — deadlines are close and tasks are piling up. Your risk of missing a deadline is 51% — you're on the edge. Your streak is 7 days — you've been showing up.
The work rhythm chart shows how your productivity flows across the week. Tuesday and Wednesday are your peak days. Sunday drops off.

Scroll down and the AI productivity heatmap shows your entire month. Dark purple means you crushed it. Light means you barely touched anything. One glance and you see the pattern you didn't know you had.

Meanwhile, Notion builds your workspace.
While you were looking at the dashboard, DueIt auto-generated a complete Notion page for your task. No templates. No manual setup.
The page has a task overview with your deadline and time estimate. An expected output section. And a Progress Tracker — an inline database with every phase mapped out: Research, Design, Implementation, Testing, Review & Deploy. All set to "To Do."

Scroll down and every phase has detailed notes with specific microtasks. Phase 1 tells you to identify data sources, research RAG frameworks like LangChain and LlamaIndex, evaluate vector databases. Phase 2 tells you to collect sample documents, design the system architecture, define the chunking approach.
The workflow diagram link sits right there — click it and you're in Excalidraw.
And at the bottom, curated resources that TinyFish scraped from the web. Not generic Google results — real links: Pinecone docs on building RAG chatbots, Microsoft Learn guides, LangChain documentation, DEV Community tutorials.

Excalidraw shows you the visual map.
Gemma 4 26B MoE took the phases from your plan and generated a Mermaid flowchart. TinyFish rendered it in Excalidraw and started a live collaboration room.
Collect Source Documents → Chunk & Embed Data → Index Vector Store → Configure Retrieval Logic → Integrate LLM Prompting → Build Chat Interface.
Every task gets a unique workflow specific to its actual phases. Not a template — a diagram that matches what Gemma planned for this specific task.

Cursor reads the Notion page and starts building.
This is where planning becomes execution.
Cursor connects to your Notion workspace through MCP. It reads the Progress Tracker, understands the phases and their order. It reads the Notes section, understands the specific microtasks within each phase.
Then it starts working. It executes Phase 1 — generates actual project structure, files, and code corresponding to the research phase.
When it finishes, it updates the Notion Progress Tracker. Research phase: Done. Design, Implementation, Testing, Review & Deploy: still To Do. The status change syncs back to your Flutter app automatically.

The Excalidraw diagram updates too — the first box turns green. Planning and execution stay perfectly synchronized.

You skip a day. DueIt doesn't care.
Life happened. You didn't touch anything yesterday. Most apps would just show you a red overdue badge and guilt you into quitting.
DueIt does something different. The next time you open the app, Gemma 4 2B kicks in — in under 2 seconds it checks what you completed, identifies what you missed, and redistributes those microtasks across your remaining days. Your calendar updates. Your pressure meter rises. Your risk adjusts.
But your path to the deadline is still clear. The plan adapted to the real you, not the ideal you who never skips a day.

Figma Prototype

Click here to explore the prototype

Demo

Code

ashihams / due-it

AI-powered task planning app that automatically breaks tasks into actionable subtasks, adapts daily schedules, and recalculates workload based on missed tasks and approaching deadlines.

DueIt

An AI-powered adaptive task planning system that breaks tasks into microtasks, distributes them across your calendar, and replans automatically when life gets in the way.

DueIt uses three Gemma 4 models with intentional role separation, Notion integration for workspace generation, and TinyFish for web resource scraping and Excalidraw workflow diagrams.

Repository Layout


├─ due_it/                    # Flutter frontend
│  ├─ lib/
│  │   ├─ models/             # Task, AiSchedule, DueTask data models
│  │   ├─ providers/          # State management (Provider)
│  │   ├─ screens/            # Home, Calendar, Dashboard, Tasks, Auth, Settings, Profile
│  │   ├─ services/           # API calls, Firestore, Notion
│  │   ├─ theme/              # Colors, spacing, typography
│  │   └─ widgets/            # Reusable UI components
│  ├─ test/                   # Unit tests
│  └─ pubspec.yaml
├─ due_it_backend/            # FastAPI backend
│  ├─ main.py                 # API gateway, planning logic, scheduling algorithms
│  ├─ notion/                 # Notion integration package
│  │

…

View on GitHub

Architecture

Flutter app authenticates through Firebase and sends every request to FastAPI on Cloud Run
FastAPI is a thin router — validates tokens, then hands off to the right Gemma model based on task type
31B Dense handles planning and Notion doc generation. 2B handles morning replans. 26B MoE handles diagram synthesis
Every AI response passes through _validate_ai_plan() before touching Firestore
Notion workspace creation, TinyFish resource scraping, and Excalidraw diagram generation all fire simultaneously as BackgroundTasks
Firestore syncs everything back to Flutter in real time
Cursor connects to Notion through MCP, executes phases as code, and updates the progress tracker — closing the loop

Tech Stack

Frontend
├── Flutter (Dart)
├── Provider — state management
└── Firebase SDK — auth + real-time Firestore sync

Backend
├── FastAPI (Python) — API gateway
├── HTTPX — async HTTP client for OpenRouter
├── BackgroundTasks — parallel Notion/TinyFish/Excalidraw
└── Cloud Run — serverless deployment

AI Models (via OpenRouter)
├── Gemma 4 31B Dense  — planning, estimation, Notion docs
├── Gemma 4 2B         — morning replan, schedule adjustment
└── Gemma 4 26B MoE    — Excalidraw diagrams, Mermaid generation

Data
├── Cloud Firestore — users, tasks, schedules, metrics
└── Firebase Auth — email/password authentication

Integrations
├── Notion API — workspace generation (notion/ package)
├── TinyFish API — web resource scraping + Excalidraw automation
├── Excalidraw — visual workflow diagrams + collaboration rooms
└── Cursor AI — code execution via Notion MCP

Deployment
├── Google Cloud Run — backend
└── Vercel — web frontend

How I Used Gemma 4

DueIt uses all three Gemma 4 model architectures — not because more models sounds impressive, but because each layer of the system has fundamentally different computational demands. A planning engine that reasons through task complexity for 10 seconds is fine. A morning replanner that takes 10 seconds is unusable. A diagram generator that can't hold task structure and visual syntax in its head simultaneously produces garbage. One model can't optimize for all three.

Gemma 4 31B Dense — the reasoning layer

Used for: Task planning, time estimation, phase generation, Notion document content

When you create a task like "Build a RAG chatbot — due in 2 weeks," the model doesn't just split it into equal chunks. It reasons:

This is a multi-phase engineering task — it needs research, design, implementation, testing, and deployment
Research and design are front-loaded — you can't implement what you haven't designed
Implementation is the heaviest phase — it needs 40% of total time, not 20%
The last two days should be lighter — buffer for overrun, not new work
Estimated total: 55 hours across 5 phases, 12 microtasks distributed across 10 working days

Every one of those decisions depends on every other decision. Change the estimate and the distribution changes. Change the phase count and the per-day load changes. This is constraint-dependent multi-step reasoning — not pattern matching, not template filling.

Dense architecture activates all 31B parameters on every token. That matters here because the output is structured JSON where a single malformed field — a negative estimatedMinutes, a missing microtasks array, a schedule longer than the deadline — breaks the entire downstream pipeline. Every token in the output needs the full weight of the model behind it.

GEMMA_31B_DENSE = "google/gemma-4-27b-it"

# The planning prompt includes task title, category, deadline,
# and category-specific instructions (Study vs Work vs Personal)
response_text = _call_gemma(GEMMA_31B_DENSE, prompt, max_tokens=4096)
plan = json.loads(response_text)

# Validate before anything touches Firestore
validated = _validate_ai_plan(plan)
# estimatedMinutes: capped 1-10000
# confidence: clamped 0.0-1.0
# schedule: max 60 days
# microtasks per day: max 10
# missing fields: safe defaults applied

The same model generates Notion document content — task overview, expected output, phase-by-phase notes with microtask breakdowns. This also requires structured reasoning: the Notion page isn't a summary of the plan, it's a restructured representation of the plan optimized for human readability and Cursor agent consumption. Different output format, same reasoning depth.

Why not MoE? The MoE architecture routes tokens to specialist sub-networks. That's powerful when different parts of the input need different expertise. But planning is holistic — the time estimate affects the schedule, which affects the daily load, which affects the pressure score. Routing parts of this reasoning to different experts fragments the very thing that needs to stay unified.

Why not 2B? I tested it. 2B generates plausible-looking JSON that falls apart under scrutiny — phases with no microtasks, schedules that extend past the deadline, time estimates that don't add up to the total. The structured output reliability drops dramatically below 31B for this task.

Gemma 4 2B — the speed layer

Used for: Morning replanning, missed task redistribution, schedule adjustments

This is the model most users interact with without knowing it. Every morning when you open DueIt, the app checks: what did you complete yesterday? What did you skip? How many days are left until each deadline?

If you missed microtasks, 2B redistributes them across your remaining days — proportionally, not equally. A day with 3 existing microtasks gets fewer redistributed tasks than an empty day. The pressure and risk meters recalculate. The calendar updates.

This needs to happen in under 2 seconds. A student checking their phone at 8:47 AM before a 9:00 class is not going to wait for a 31B model to reason through redistribution for 15 seconds. They'll close the app and forget about it — which is the exact problem DueIt exists to solve.

2B handles this because the task is operationally simple. It's not generating a new plan from scratch. It's taking an existing schedule, identifying missed items, and proportionally redistributing them. The reasoning depth required is low. The latency requirement is strict.

GEMMA_2B = "google/gemma-4-4b-it"

# Called on app open — reads yesterday's completions,
# identifies missed microtasks, redistributes across remaining days
# Input is never mutated — copy.deepcopy protects the original
schedule = copy.deepcopy(existing_schedule)
adjusted = _redistribute_schedule(schedule, now, deadline)

Why not 31B? It would produce marginally better redistribution — maybe slightly smarter about which days get more load. But the latency cost is 5-10x higher, and the quality difference is negligible for a redistribution task. The user doesn't notice a 2% better redistribution. They absolutely notice a 10-second wait.

Why not MoE? MoE's routing overhead adds latency even when the task only needs one expert. For a single-domain operation (schedule math), the routing is pure overhead. 2B dense is the fastest path to a correct answer.

Gemma 4 26B MoE — the synthesis layer

Used for: Excalidraw workflow diagram generation, Mermaid flowchart synthesis, multi-domain analysis

When DueIt creates an Excalidraw workflow diagram, the model needs to do three things simultaneously:

Understand task structure — parse the phases (Research → Design → Implementation → Testing → Deploy) and their dependencies
Generate valid Mermaid syntax — produce a flowchart LR block with correct node IDs, arrow syntax, and label formatting
Ensure visual logic — the flowchart should read left-to-right in a sequence that makes sense to a human, not just a valid parse tree

These three domains — task management, code syntax, and visual layout — are genuinely different areas of expertise. A model that's great at understanding task phases might generate broken Mermaid. A model that writes perfect Mermaid might produce a diagram that's technically valid but visually confusing.

This is exactly what Mixture-of-Experts was architectured for. The 26B MoE model has 26 billion total parameters organized into specialist sub-networks, but only activates 4 billion on any given token. Different tokens in the output route to different experts — the task-understanding tokens go to one expert, the syntax tokens to another, the layout decisions to a third.

The result: edge-speed inference (only 4B active parameters per forward pass) with full-depth knowledge (26B total parameters available for routing). You get the speed of a small model with the knowledge of a large one.

GEMMA_26B_MOE = "google/gemma-4-26b-a4b-it"

# Generate Mermaid flowchart from task phases
# The prompt includes phase names, ordering, and dependencies
raw_steps = _call_gemma(GEMMA_26B_MOE, steps_prompt, max_tokens=2048)

# TinyFish takes the Mermaid code, opens Excalidraw,
# imports via the Mermaid importer, starts a collaboration room,
# and returns the shareable URL

Why not 31B Dense? 31B would produce equally good diagrams, but slower. Since diagram generation runs as a BackgroundTask (the user isn't waiting on it), latency matters less here than for replanning. But 31B is already busy with the planning call — running both in parallel on the same model class creates queuing. MoE handles diagram generation on a separate architecture while 31B focuses on planning.

Why not 2B? I tested it. 2B generates Mermaid that looks right but breaks on edge cases — missing arrow connectors, node IDs that collide, labels that overflow. The cross-domain synthesis required (task logic + code syntax + visual layout) exceeds what 2B can hold in context simultaneously.

Model selection summary

Model	Architecture	What it does	Why this specific architecture
Gemma 4 31B Dense	All 31B params active per token	Task planning, time estimation, Notion docs	Holistic multi-step reasoning for reliable structured JSON
Gemma 4 2B	Small dense, fast inference	Morning replan, schedule redistribution	Under 2 seconds — speed is the UX requirement
Gemma 4 26B MoE	4B active / 26B total	Excalidraw diagrams, Mermaid generation	Expert routing across task logic, code syntax, and visual layout

What Gemma 4 unlocked

Before Gemma 4, DueIt used a single gemini-2.5-flash call for everything — planning, diagrams, formatting, replanning. Same model, same latency, same capability ceiling. There was no model selection strategy because there was no model selection available.

The migration wasn't just swapping model strings. It was rethinking which cognitive task each API call actually performs and matching it to the architecture that handles that task best.

Planning got deeper — 31B Dense reasons through constraints that Flash skimmed over. Replanning got faster — 2B responds in the time it takes to glance at your phone. Diagrams got better — MoE's expert routing produces cleaner cross-domain output than a single generalist model juggling three domains at once.

That's what intentional model selection looks like. Not "I used the biggest model for everything." But: I used three models, each one earning its place.

Built solo with too much coffee, three Gemma brains, and the belief that productivity apps should plan around the real you — not the ideal you.

One task. Fully planned. Automatically adapted.