ProfitPulse ERP: An AI-Powered Business Intelligence App Built with Gemma 4 & Flutter

#gemmachallenge #gemma #flutter #devchallenge

This is a submission for the Gemma 4 Challenge: Build with Gemma 4.

What I Built

ProfitPulse is a cross-platform ERP (Enterprise Resource Planning) application built with Flutter, designed for small-to-medium businesses that need a fully offline-capable, privacy-first tool for managing their operations.

Out of the box it covers:

📦 Inventory Management: Track stock levels, batch numbers, and expiry dates
🛒 Sales & Purchases: POS-style transaction screens for fast, accurate data entry
🏭 Production / Recipe Tracking: Bill-of-materials-style production runs that auto-deduct ingredients from stock
💰 Financials: Loan ledger, outstanding receivables, and settlement workflows
👥 Contacts: Dual-role customer/supplier directory

The standout feature is the Strategic AI Analyst, a built-in business intelligence layer that reads your live ERP data and generates structured insights: a business health score, a phase classification (e.g., Foundation, Growth, Scale), a plain-English summary, key observations, and an actionable strategy. A secondary "Ask Business AI" chat panel lets you query your own data in natural language (e.g., "Which product has the lowest turnover this week?").

By default, the entire AI layer runs on-device, and no data leaves the phone unless the user explicitly opts into Cloud Engine mode.

Demo

Demo video (Google Drive): profit_pulse – dashboard_screen.dart [profit_pulse] 2026-05-09 00-02-35.mp4

How I Used Gemma 4

The Model: Gemma 4 E2B (2B Edge)
I chose Gemma 4 E2B (gemma-4-E2B-it.litertlm) for one core reason: of the options I considered, it was the only sub-4 GB model with a native reasoning (thinking) mode. For a business-analyst use case, where the model needs to weigh multiple data points before committing to a health score and strategy, that chain-of-thought capability produces more coherent and trustworthy output than a non-reasoning model of the same size.

At roughly 2.4 GB, the model fits on a mid-range Android device with 6 GB of RAM, which makes offline, on-device business intelligence practical without a server.

Integration Architecture
ProfitPulse uses a dual-engine architecture; a Switch toggle on the dashboard selects the active engine:
┌─────────────────────────────────┐
│         AI BLoC (Dart)          │
│  GenerateBusinessReport event   │
│  AskAI event                    │
└────────────┬───────────┬────────┘
             │ useCloud  │ !useCloud
             ▼           ▼
     GeminiService  GemmaService
      (REST API)  (flutter_gemma)
                         │
                    Gemma 4 E2B
                    (on-device,
                   thinking mode)
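
The routing in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: AiEngine, selectEngine, and the two stub classes are hypothetical names standing in for GeminiService and GemmaService, and the stubs return canned strings instead of calling a model.

```dart
// Hypothetical interface: both engines expose the same streaming call,
// so the BLoC does not need to know which one is active.
abstract class AiEngine {
  String get label;
  Stream<String> generate(String prompt);
}

// Stand-in for the cloud path (GeminiService in the diagram).
class CloudEngineStub implements AiEngine {
  @override
  String get label => 'cloud';

  @override
  Stream<String> generate(String prompt) =>
      Stream.value('cloud reply to: $prompt');
}

// Stand-in for the on-device path (GemmaService in the diagram).
class OnDeviceEngineStub implements AiEngine {
  @override
  String get label => 'on-device';

  @override
  Stream<String> generate(String prompt) =>
      Stream.value('local reply to: $prompt');
}

// Mirrors the useCloud / !useCloud branch driven by the dashboard Switch.
AiEngine selectEngine({
  required bool useCloud,
  required AiEngine cloud,
  required AiEngine onDevice,
}) =>
    useCloud ? cloud : onDevice;
```

Keeping both engines behind one interface means the event handlers only ever call generate(), and the Switch changes which object they hold.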

GemmaService wraps the flutter_gemma package and handles:

Model registration: On first launch, the model is downloaded directly from Hugging Face (litert-community/gemma-4-E2B-it-litert-lm) via FlutterGemma.installModel().fromNetwork(). Subsequent launches detect the cached file and re-register it without downloading it again.
Thinking mode: The model is loaded with isThinking: true, which enables Gemma 4's internal reasoning pass before it emits answer tokens. The service distinguishes between TextResponse (the answer shown to the user) and ThinkingResponse (the model's chain-of-thought, which is also yielded on the stream but skipped by the BLoC's JSON parser).
Session management: For structured JSON reports, the chat history is flushed before each call (resetChat()) so that earlier conversational context cannot leak into the structured output.

// gemma_service.dart — core inference loop
final responseStream = _chat!.generateChatResponseAsync();
await for (final ModelResponse response in responseStream) {
  if (response is TextResponse) {
    yield response.token;          // streamed to UI
  } else if (response is ThinkingResponse) {
    yield response.content;        // reasoning pass — BLoC JSON parser ignores it
  }
}
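
One way to make the "parser ignores it" step explicit is to tag each chunk before it leaves the service, so downstream code can drop reasoning text by type rather than by content. The sketch below is an assumption about how that could look, not ProfitPulse's code: Chunk, AnswerChunk, ThinkingChunk, and answerOnly are hypothetical stand-ins, and no flutter_gemma types are used.

```dart
// Hypothetical tagged chunk types (Dart 3 sealed classes).
sealed class Chunk {
  const Chunk(this.text);
  final String text;
}

class AnswerChunk extends Chunk {
  const AnswerChunk(super.text);
}

class ThinkingChunk extends Chunk {
  const ThinkingChunk(super.text);
}

// Concatenates only the user-facing chunks, dropping the reasoning trace.
String answerOnly(Iterable<Chunk> chunks) {
  final buffer = StringBuffer();
  for (final chunk in chunks) {
    switch (chunk) {
      case AnswerChunk():
        buffer.write(chunk.text);
      case ThinkingChunk():
        break; // reasoning text is never added to the answer buffer
    }
  }
  return buffer.toString();
}
```

With tagged chunks, a reasoning pass that happens to contain braces cannot confuse a brace-based JSON extractor, because that text never reaches the buffer.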

Prompt Design for Structured Output
The AI Analyst card sends the model a live ERP snapshot in JSON (inventory count, total revenue, outstanding loan exposure) and asks for a strictly typed JSON response:

const systemPrompt = """
Analyze this ERP snapshot. Use 'Foundation' phase if revenue/items are low.
Output ONLY JSON:
{"health_score": int, "phase": "string", "summary": "string",
"observation": "string", "strategy": "string"}
""";

The BLoC accumulates streamed tokens and attempts to parse the JSON as soon as both { and } are present in the buffer, so the analyst card updates the moment a complete object arrives instead of waiting for the stream to close.
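
The buffer-and-parse step could look something like the sketch below. It is a hedged illustration rather than the app's real parser: BusinessReport and tryParseReport are hypothetical names, and the field checks simply mirror the keys listed in the prompt above.

```dart
import 'dart:convert';

// Hypothetical typed view of the report the prompt asks for.
class BusinessReport {
  const BusinessReport({
    required this.healthScore,
    required this.phase,
    required this.summary,
    required this.observation,
    required this.strategy,
  });

  final int healthScore;
  final String phase;
  final String summary;
  final String observation;
  final String strategy;
}

// Returns a report once the buffer holds a decodable object with the
// expected fields, or null if the stream is not there yet.
BusinessReport? tryParseReport(String buffer) {
  final start = buffer.indexOf('{');
  final end = buffer.lastIndexOf('}');
  if (start < 0 || end <= start) return null;
  try {
    final decoded = jsonDecode(buffer.substring(start, end + 1));
    if (decoded is! Map<String, dynamic>) return null;
    final score = decoded['health_score'];
    final phase = decoded['phase'];
    final summary = decoded['summary'];
    final observation = decoded['observation'];
    final strategy = decoded['strategy'];
    if (score is! int ||
        phase is! String ||
        summary is! String ||
        observation is! String ||
        strategy is! String) {
      return null;
    }
    return BusinessReport(
      healthScore: score,
      phase: phase,
      summary: summary,
      observation: observation,
      strategy: strategy,
    );
  } on FormatException {
    return null; // braces present, but the JSON is incomplete or malformed
  }
}
```

Calling tryParseReport on the growing buffer after every token returns null until the object is complete. Note the heuristic's limit: any text containing braces before the JSON object makes decoding fail, so the buffer should hold answer tokens only.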

Why not a larger model?
The 31B Dense variant would produce richer prose, but it requires hardware that isn't realistic for a mobile ERP tool. E4B was considered, but its 4 GB footprint starts to strain lower-end devices, and for structured JSON tasks it did not offer a meaningful quality improvement over E2B with thinking mode. E2B plus thinking mode hits the sweet spot: small enough to ship on-device, smart enough to reason about multi-variable business data, and fast enough to respond in seconds.

Privacy by Design
In the default on-device mode, every inference call stays on the user's device. The ERP data (revenue figures, supplier names, financial positions) never touches an external server unless the user explicitly switches to Cloud Engine mode, which uses their own Google AI Studio API key. This is a hard requirement for any serious business tool.

credits: yiawakil_37_7772a17ade5a4