Angel Anton

Smarter Notifications with Edge AI: A Kotlin + Koog + MediaPipe Journey

Introduction

For the last few weeks I have been testing the Yazio app, a calorie counter. While using it, I noticed that the notifications — though helpful — were sometimes similar and easy to ignore. This observation sparked a question:

Could these reminders be generated dynamically, on the device, based on my context, and sound more natural or timely?

That question led me into an experiment combining JetBrains Koog as the agentic framework with Google's MediaPipe LLM Inference API and the Gemma LLM for edge AI on Android.

The result? A prototype for smarter, more context-aware messages — powered by a local model, and a fantastic starting point for learning and experimenting with generative AI.

The problem: Flat notifications

Notification messages are a crucial tool for many apps and, in many cases, their main entry point. In the case of Yazio, they do a good job reminding users to log meals or water intake.

But after a while, they can start to feel repetitive or disconnected from what’s actually happening in our day. Maybe they arrive when we’re in the gym, or suggest drinking water instead of a cup of tea on a cold day. It’s a natural limitation of static, predefined messages. They don’t adjust to:

  • What you’ve already logged
  • What time it is
  • Your habits or preferences
  • Your current activity or mood

In other words, they lack context.

Why Context Matters

Behavioural psychology suggests that timing, tone, and context deeply affect how people respond to messages. Our moms know exactly when we're hungry, and our favourite dishes for each situation. That's because they have years and years of context about us (and we trust them! but that's another topic ☺).

A notification like:

Time for a snack?

vs.

Nice pace today. Since lunch is logged, a quick summer bite — gazpacho or yogurt with peach — will keep you moving 💪

The second could feel more personal and relevant — not by guessing, but by responding to context. Not quite like your mom, but better than a static reminder.

Hardcoding these messages isn't scalable, and generating natural, context-dependent language with nondeterministic outputs is exactly where language models shine.

Using Koog to Emulate an Agent

JetBrains Koog is an agentic framework designed to build reasoning agents around language models. It's written in Kotlin, and instead of sending a raw prompt request, Koog helps implement the following chain:

Agentic Framework flow diagram

Koog provides connectors for accessing data through the Model Context Protocol (MCP) and common large language model APIs, orchestrating tools and decisions around a model. In this experiment, Koog triggers an agent on notification receipt, assembles user and device context, generates a response, and applies safety post-processing.

At the time of writing, Koog doesn’t natively target small, on‑device models — but it can still orchestrate local inference as part of a hybrid setup.
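As a rough illustration, here is how that chain could look in Kotlin. The interfaces and names (ContextAssembler, MessageGenerator, SafetyPostProcessor) are hypothetical placeholders for this prototype, not Koog APIs: just a minimal sketch of the trigger, context assembly, generation, and post-processing flow the agent implements.

```kotlin
// Hypothetical sketch of the notification agent's chain; names are illustrative, not Koog APIs.
data class NotificationContext(val mealType: String, val timeNow: String, val weather: String?)
data class NotificationDraft(val title: String, val body: String)

interface ContextAssembler { suspend fun assemble(trigger: String): NotificationContext }
interface MessageGenerator { suspend fun generate(context: NotificationContext): NotificationDraft }
interface SafetyPostProcessor { fun check(draft: NotificationDraft): NotificationDraft? }

class NotificationAgent(
    private val assembler: ContextAssembler,   // user + device + MCP data
    private val generator: MessageGenerator,   // backed by a local or remote model
    private val safety: SafetyPostProcessor,   // guardrails on the generated text
) {
    // Returns null when the draft fails the safety checks, so the caller can fall back to a template.
    suspend fun onNotificationTrigger(trigger: String): NotificationDraft? {
        val context = assembler.assemble(trigger)
        val draft = generator.generate(context)
        return safety.check(draft)
    }
}
```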

The Edge AI choice

When designing this feature, I believed a hybrid architecture was the best solution: small language models (SLMs) for the default local workflow, with a large model in the cloud as a fallback. It is both a strategic and a bold decision.

  • Strategic, because the future of agentic AI is moving toward SLM-first systems: faster, cheaper, and easier to align for repetitive subtasks, as NVIDIA’s research points out.
  • Bold, because it breaks with the industry inertia of always relying on a monolithic cloud LLM — the path most apps still follow.

SLM / LLM hybrid architecture

By betting on SLMs now, we will not only reduce costs and latency, but also align with what is likely to become the standard paradigm for reliable, sustainable agentic systems.
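A quick sketch of what that local-first routing could look like. LocalSlmClient and RemoteLlmClient are hypothetical interfaces for this prototype, and the policy (on-device SLM by default, cloud LLM only as an opt-in fallback) mirrors the architecture described above:

```kotlin
// Hypothetical interfaces for the SLM-first, cloud-fallback setup.
interface LocalSlmClient { suspend fun generateOrNull(prompt: String): String? }
interface RemoteLlmClient { suspend fun generate(prompt: String): String }

class HybridGenerator(
    private val local: LocalSlmClient,
    private val remote: RemoteLlmClient,
    private val remoteOptIn: Boolean,   // explicit user consent for cloud calls
) {
    // Default path is on-device; null means neither path produced text and a template should be used.
    suspend fun generate(prompt: String): String? {
        local.generateOrNull(prompt)?.let { return it }
        return if (remoteOptIn) remote.generate(prompt) else null
    }
}
```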

For implementing this solution, Google's MediaPipe LLM Inference API enables fast, offline calls to a language model directly on Android. In addition, there is a significant list of small models available and an active community behind them.

Are Koog and MediaPipe equivalent / do they overlap?

  • Not equivalent: they target different problems. The Koog framework is about building agentic logic, workflows, and reasoning; the MediaPipe LLM Inference API is the layer that loads a model on the phone, manages memory, and executes text generation. It's the basic building block that agentic systems like Koog can sit on top of.
  • Complementary rather than redundant: in many complex AI systems you can combine the two: MediaPipe (or parts of it) processes sensor or vision input, and the results feed into a Koog agent for decisions, reasoning, and responses.

Prototype Design

For an initial prototype I assumed:

  • Inference target: local SLM with a possible switch to remote in future versions
  • Core triggers handled by Koog; Koog orchestrates MCP calls (weather, seasonal info, local dishes)
  • Primary platform: Android (Jetpack Compose), with modules and KMP for core logic in mind for future versions
  • Push notifications are the output surface

A sequence diagram for the agent orchestration could look like this:

Koog orchestration diagram for initial prototype

Both the input context and the output answer can be described as DTOs:

Input DTO (from Koog to the language model)

userLocale: string (e.g., es-ES)
country: string (e.g., ES)
mealType: enum {BREAKFAST, LUNCH, DINNER, SNACK, WATER}
alreadyLogged: { breakfast: bool, lunch: bool... }
timeNow: ISO datetime
quietHours: { startLocal: string, endLocal: string }
weather: { condition: enum, tempC: number, feelsLikeC: number }
season: enum {WINTER, SPRING, SUMMER, AUTUMN}
localDishes: array of { name: string, mealTypes: enum[] }
motivationLevel: enum {LOW, MEDIUM, HIGH}
dietaryTags: array of strings (optional: vegan, halal...)
recentStreakDays: int

For notification output DTO:

title: string
body: string
category: string (e.g., meal_reminder, hydration)
language: string
confidence: number (0–1)
abVariant: string (for experiments)
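These DTOs map naturally onto Kotlin data classes. Here is a sketch using kotlinx.serialization; the field names follow the pseudo-schema above, and the exact shapes (for example, condition as a plain string) are assumptions of this prototype:

```kotlin
import kotlinx.serialization.Serializable

@Serializable
enum class MealType { BREAKFAST, LUNCH, DINNER, SNACK, WATER }

@Serializable
enum class Season { WINTER, SPRING, SUMMER, AUTUMN }

@Serializable
data class Weather(val condition: String, val tempC: Double, val feelsLikeC: Double)

@Serializable
data class LocalDish(val name: String, val mealTypes: List<MealType>)

@Serializable
data class QuietHours(val startLocal: String, val endLocal: String)

@Serializable
data class NotificationInput(
    val userLocale: String,                        // e.g. "es-ES"
    val country: String,                           // e.g. "ES"
    val mealType: MealType,
    val alreadyLogged: Map<String, Boolean>,       // breakfast, lunch...
    val timeNow: String,                           // ISO datetime
    val quietHours: QuietHours,
    val weather: Weather? = null,
    val season: Season,
    val localDishes: List<LocalDish> = emptyList(),
    val motivationLevel: String,                   // LOW / MEDIUM / HIGH
    val dietaryTags: List<String> = emptyList(),   // vegan, halal...
    val recentStreakDays: Int = 0,
)

@Serializable
data class NotificationOutput(
    val title: String,
    val body: String,
    val category: String,            // e.g. "meal_reminder", "hydration"
    val language: String,
    val confidence: Double,          // 0–1
    val abVariant: String? = null,   // for experiments
)
```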

Prototype Implementation

It may sound incredible, but building a simple prototype covering a significant part of the requirements described above is quite straightforward. There is solid documentation from Google and JetBrains with multiple examples. None of them includes local inference, but it's only a matter of time before those two worlds converge.

The current implementation contains a simple screen for downloading the model from a static link, tweaking some parameters, and prompting the downloaded model. It demonstrates that the notification engine can 'think locally' before speaking. The output is the generated text plus a notification message.
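For reference, the core of the on-device call is small. The sketch below targets the MediaPipe LLM Inference API (option and method names as documented at the time of writing); the model path, channel id, title, and token limit are illustrative, and posting the notification assumes the POST_NOTIFICATIONS permission has already been granted on Android 13+.

```kotlin
import android.content.Context
import androidx.core.app.NotificationCompat
import androidx.core.app.NotificationManagerCompat
import com.google.mediapipe.tasks.genai.llminference.LlmInference

// Wraps the on-device engine: load once (model load is slow) and reuse for each notification.
class LocalNotifier(private val appContext: Context, modelPath: String) {

    private val llm: LlmInference = LlmInference.createFromOptions(
        appContext,
        LlmInference.LlmInferenceOptions.builder()
            .setModelPath(modelPath)   // e.g. a Gemma bundle downloaded from the static link
            .setMaxTokens(256)
            .build()
    )

    // Blocking call: run off the main thread (e.g. from a coroutine on Dispatchers.IO).
    fun generateAndNotify(channelId: String, prompt: String) {
        val body = llm.generateResponse(prompt)
        val notification = NotificationCompat.Builder(appContext, channelId)
            .setSmallIcon(android.R.drawable.ic_dialog_info)
            .setContentTitle("Meal reminder")
            .setContentText(body)
            .setStyle(NotificationCompat.BigTextStyle().bigText(body))
            .build()
        NotificationManagerCompat.from(appContext).notify(1001, notification)
    }
}
```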

Prototype snapshots

Reflections & Trade-offs

The Koog framework is overkill for a simple operation like querying a model: before reaching for agents, it's better to find the simplest solution possible and only increase complexity when needed. But this project is a base for learning and experimentation; future versions will earn the additional structure.

The integration of MediaPipe inference and Koog isn't smooth: MediaPipe LLM inference is session-based, and the initial model load is slow, so it typically happens in the background. On the other hand, Koog samples assume always-alive remote APIs. A practical fix is to bind MediaPipe's session lifecycle to the app's main activity (or a scoped service) and expose a readiness state to Koog, as sketched below.
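One way to express that readiness handshake is a process-wide holder that loads the engine off the main thread and exposes a StateFlow for the agent layer to observe before dispatching a prompt. This is a sketch; the holder and its wiring are assumptions of this prototype:

```kotlin
import android.content.Context
import com.google.mediapipe.tasks.genai.llminference.LlmInference
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch

object LocalModelHolder {
    private val _ready = MutableStateFlow(false)
    val ready: StateFlow<Boolean> = _ready       // observed by the Koog-side orchestration

    @Volatile
    private var engine: LlmInference? = null

    // Call from the main activity (or a scoped service) once the model file is on disk.
    fun warmUp(scope: CoroutineScope, context: Context, modelPath: String) {
        if (engine != null) return
        scope.launch(Dispatchers.IO) {
            engine = LlmInference.createFromOptions(
                context.applicationContext,
                LlmInference.LlmInferenceOptions.builder().setModelPath(modelPath).build()
            )
            _ready.value = true
        }
    }

    fun generateOrNull(prompt: String): String? = engine?.generateResponse(prompt)

    // Tie this to the lifecycle owner so native resources are released when the session ends.
    fun release() {
        engine?.close()
        engine = null
        _ready.value = false
    }
}
```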

The language model, even downloaded in the background, is overkill just for pushing better messages: absolutely, the model should serve multiple on-device tasks to justify its footprint. Even then, output and performance testing is needed on different Android devices.

A local dev setup speeds up testing: the Ollama + Mistral combination takes 30 seconds to install and provides a local LLM for testing Koog agents and Kotlin pipelines in a pure JVM project.
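To make that loop concrete, here is a minimal JVM-only smoke test against Ollama's HTTP API (default port 11434, non-streaming). The model name and prompt are just examples; a Koog agent configured for Ollama would talk to the same local endpoint.

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Assumes `ollama pull mistral` has been run and the Ollama server is running locally.
fun main() {
    val payload = """{"model": "mistral", "prompt": "Suggest a short, friendly lunch reminder.", "stream": false}"""
    val request = HttpRequest.newBuilder()
        .uri(URI.create("http://localhost:11434/api/generate"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(payload))
        .build()
    val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())   // JSON with a "response" field containing the generated text
}
```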

Privacy and performance are another interesting topic: the system runs inference on-device by default and, at this stage, doesn't send personally identifiable information externally. It could cap the prompt size using compact JSON and cache MCP outputs (like weather or season) to reduce latency and battery usage. Timeouts and fallbacks ensure reliability, and an optional remote model is available only with explicit user opt-in.
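Here is a small sketch of that caching and timeout behaviour; the TTL, the budget, and the fallback template are illustrative values, not measurements:

```kotlin
import kotlinx.coroutines.withTimeoutOrNull

// Tiny TTL cache so MCP lookups (weather, season) aren't repeated for every notification.
class TtlCache<T>(private val ttlMillis: Long) {
    private var value: T? = null
    private var storedAt: Long = 0L

    suspend fun getOrFetch(fetch: suspend () -> T): T {
        val now = System.currentTimeMillis()
        val cached = value
        if (cached != null && now - storedAt < ttlMillis) return cached
        return fetch().also { value = it; storedAt = now }
    }
}

val weatherCache = TtlCache<String>(ttlMillis = 30 * 60 * 1000L)   // 30-minute TTL

// Budgeted generation: if the model doesn't answer in time, fall back to a deterministic template.
suspend fun generateWithBudget(
    budgetMillis: Long,
    fallbackTemplate: String,
    generate: suspend () -> String,
): String = withTimeoutOrNull(budgetMillis) { generate() } ?: fallbackTemplate
```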

Is Multiplatform good added value? Not in the first stage of the prototype. Although the Koog-related classes are in a separate module, I think it's better to invest the effort in experimenting with the agentic structure and add local inference on iOS once the Android side is ready and tested.

Testing and Evaluation

As a double wink to product managers, the whole feature can be tested using the following approach:

  • A small but representative “golden set” of 50 context scenarios (combining meal type, time, weather, region, and dietary tags) to verify the system responds appropriately across typical scenarios and edge cases.
  • Linguistic checks — length, emoji count, locale, forbidden words, and sensitive claims — to ensure messages are safe, readable, and culturally consistent (see the validator sketch after this list).
  • For impact, A/B test template-based notifications versus LLM-generated ones, measuring tap-through (CTR) and time-to-log to confirm real user benefit.
  • Finally, we could enforce an end-to-end latency budget of roughly 250–600 ms on mid-range devices, assuming a quantized ~1B SLM, short prompts, and constrained decoding; if the pipeline exceeds that threshold, it falls back to deterministic templates to preserve UX reliability.
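The linguistic checks in particular are easy to automate as a post-generation validator. The thresholds and forbidden terms below are placeholder values for this prototype:

```kotlin
// Rejects drafts that are too long, too emoji-heavy, in the wrong language, or contain forbidden terms.
data class Draft(val title: String, val body: String, val language: String)

object NotificationValidator {
    private const val MAX_BODY_LENGTH = 160
    private const val MAX_EMOJI = 2
    private val forbidden = listOf("guaranteed", "cure", "lose weight fast")   // placeholder list

    fun isValid(draft: Draft, expectedLanguage: String): Boolean {
        if (draft.body.length > MAX_BODY_LENGTH) return false
        if (countEmoji(draft.body) > MAX_EMOJI) return false
        if (forbidden.any { draft.body.contains(it, ignoreCase = true) }) return false
        return draft.language.equals(expectedLanguage, ignoreCase = true)
    }

    // Rough emoji count: surrogate pairs are a good-enough proxy for this prototype.
    private fun countEmoji(text: String): Int = text.count { it.isHighSurrogate() }
}
```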

What is Next?

  • Injecting real context (weather MCP, season, local meals)
  • Prompt fine-tuning
  • Second model call for translation
  • Post-processor and safety checks
  • Remote LLM fallback

Links
