Eight. Eight hours of average screentime every day.
That is a third of my whole day gone with the wind every single day.
I'm not proud of it, of course... though I've gotten much better at managing my time lately. (Android says I used my phone 11 hours less than last week. Woo-hoo!) I really don't want to go back to being that person, so with the tools I have available, I want to build something on my phone that nudges me to stop this insidious habit whenever I fall back down that pit.
That’s when this personal project was born: an assistant, or better yet, a digital coach, to help steer my actions whenever I overuse my phone. It should observe what I’m doing on my phone, keep track of my usage, and intervene to remind me of the patterns I’m falling back into. Also, I’m a huge fan of F.R.I.D.A.Y., Tony’s new assistant in Avengers: Age of Ultron, so if my coach sounds like her when it tells me to stop my Netflix binge, I’ll probably listen, haha.
Here are some teasers from my app:
So that's my goal: build an AI assistant that watches my phone and tells me to touch grass.
A quick note on privacy: Yes, an app that screenshots your phone raises red flags. For this proof-of-concept, all screenshots are processed locally, encrypted in transit, and never stored as images—only text summaries.
This project will be divided into 3 dev logs, each posted as its own blog post:
- Devlog #1: Building a proof-of-concept (you are here)
- Devlog #2: Teaching the AI to read and respond to rich phone context
- Devlog #3: Building intervention tools and polishing the "coach" personality
Devlog #1 – Building a Proof-of-Concept
Section 1: Planning (The Easy Part)
What does a "Screen Time AI Coach" need to do?
- Understand phone context – via screenshot capture + AI vision
- Communicate with me – through overlay dialogues (RPG-style pop-ups)
- Feel like FRIDAY – witty, observant, and actually helpful
So the requirements are pretty concise! Let's map out my tech stack:
- GPT-4o-mini for screenshot-to-text summarization
- GPT-4.1 for FRIDAY's personality and responses
- Android Foreground Service for persistent background operation
- Media Projection API for screen capture
Here's the basic flow:
Simple, right? Take a screenshot every 18 seconds, summarize it, send it to FRIDAY, get a response.
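To make that flow concrete, here's a rough Kotlin sketch of the loop. The lambda parameters (capture, summarize, coach, show) are placeholders for the real components, not the app's actual code; only the 18-second interval comes straight from the design above.

```kotlin
import kotlinx.coroutines.delay

// Rough shape of the capture -> summarize -> coach -> show loop.
// The four steps are passed in as lambdas because the real implementations
// (MediaProjection capture, GPT-4o-mini summary, GPT-4.1 chat, overlay UI)
// live in their own components.
suspend fun coachingLoop(
    capture: suspend () -> ByteArray,          // screenshot bytes
    summarize: suspend (ByteArray) -> String,  // GPT-4o-mini: image -> short text summary
    coach: suspend (String) -> String,         // GPT-4.1: FRIDAY decides whether/what to say
    show: (String) -> Unit,                    // RPG-style overlay dialogue
) {
    while (true) {
        val screenshot = capture()
        val summary = summarize(screenshot)
        val reply = coach(summary)
        if (reply.isNotBlank()) show(reply)    // stay quiet if FRIDAY has nothing to say
        delay(18_000)                          // one pass every 18 seconds
    }
}
```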
I thought I'd finish in a week. Oh, how wrong I was...
Me back then:
Section 2: The Android Development Gauntlet
Confession time: I'm a full-stack web developer. I had never touched Android Studio before this project.
I thought, "How different can it be? Both are apps, right? How hard can this be?"
The answer? Surprisingly difficult!
I'm not going into too much detail because I want this blog to revolve more around the LLM system, but for the tech-savvy, here are some notable surprises I ran into:
The Biggest Surprises
1. Screen capture isn't simple
- You can't just call captureScreen() – you have to register a system-level media projection service
- Plus, it requires explicit user permission every time the app starts
- And you can only have ONE media projection active at a time (this bit me later when trying to record demos)

2. The "Uh oh, Android just kills my app" problem
- Android aggressively kills background processes, and I need this process to persist if I want to see what the user is doing
- Solution: a Foreground Service with a persistent notification (sketched in the code right after this list)

3. So many permissions to enable
- The overlay dialogue system also requires an additional permission from the user
- (duh. You think Android is gonna let you display anything over the user's screen willy-nilly?)
- Notifications need a permission, writing to external storage needs one, internet access too...
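For the curious, here's a minimal Kotlin sketch of how those pieces can fit together. This is not the app's actual code: the class name ScreenWatcherService, the request code, the notification text, and the placeholder icon are all assumptions, and the onActivityResult handling for the MediaProjection consent is omitted.

```kotlin
import android.app.Activity
import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.Service
import android.content.Intent
import android.content.pm.ServiceInfo
import android.media.projection.MediaProjectionManager
import android.net.Uri
import android.os.IBinder
import android.provider.Settings

// 1) Ask for the scarier permissions up front (called from an Activity).
fun requestCorePermissions(activity: Activity) {
    // Overlay permission for the RPG-style dialogues
    if (!Settings.canDrawOverlays(activity)) {
        activity.startActivity(
            Intent(
                Settings.ACTION_MANAGE_OVERLAY_PERMISSION,
                Uri.parse("package:${activity.packageName}")
            )
        )
    }
    // MediaProjection consent dialog -- Android shows this every time the app starts
    val projectionManager = activity.getSystemService(MediaProjectionManager::class.java)
    activity.startActivityForResult(projectionManager.createScreenCaptureIntent(), 42)
}

// 2) Keep the capture loop alive with a foreground service + persistent notification.
class ScreenWatcherService : Service() {
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val channel = NotificationChannel("friday", "FRIDAY", NotificationManager.IMPORTANCE_LOW)
        getSystemService(NotificationManager::class.java).createNotificationChannel(channel)
        val notification = Notification.Builder(this, "friday")
            .setContentTitle("FRIDAY is keeping an eye on your screen time")
            .setSmallIcon(android.R.drawable.ic_menu_view)  // placeholder icon
            .build()
        // Newer Android versions require declaring the media-projection service type here
        startForeground(1, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_MEDIA_PROJECTION)
        return START_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null
}
```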
But through all those headaches, here's what the architecture looks like now:
(Some components are blurred; don't worry, they're revealed in Section 3.)
After a week of Stack Overflow and arguing with Android's permission system, I finally had screenshots flowing to OpenAI. Time to teach the AI what to do with them.
Section 3: Teaching FRIDAY When to Talk (and How to Remember)
Before we dive into implementation details, let's talk about how the OpenAI API actually works.
Quick API Primer: Unlike ChatGPT's website where conversations persist, the Chat Completion API is stateless. Every request must include the entire conversation history. It's like talking to someone with amnesia—you have to remind them of your whole conversation every single time.
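Here's a bare-bones Kotlin sketch of what a single request looks like (using OkHttp and org.json). The system prompt text and the environment variable for the API key are placeholders, but the key point is real: the entire history list is shipped with every single call, and every reply gets appended to it.

```kotlin
import okhttp3.MediaType.Companion.toMediaType
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.RequestBody.Companion.toRequestBody
import org.json.JSONArray
import org.json.JSONObject

// The running conversation; the API itself remembers nothing between calls.
val history = mutableListOf(
    JSONObject(mapOf("role" to "system", "content" to "You are FRIDAY, a screen-time coach."))
)

fun askFriday(userText: String): String {
    history += JSONObject(mapOf("role" to "user", "content" to userText))
    // The WHOLE history goes out with every request -- this is what blows up the token count.
    val payload = JSONObject(mapOf("model" to "gpt-4.1", "messages" to JSONArray(history)))
    val request = Request.Builder()
        .url("https://api.openai.com/v1/chat/completions")
        .header("Authorization", "Bearer ${System.getenv("OPENAI_API_KEY")}")
        .post(payload.toString().toRequestBody("application/json".toMediaType()))
        .build()
    OkHttpClient().newCall(request).execute().use { response ->
        val reply = JSONObject(response.body!!.string())
            .getJSONArray("choices").getJSONObject(0)
            .getJSONObject("message").getString("content")
        history += JSONObject(mapOf("role" to "assistant", "content" to reply))
        return reply
    }
}
```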
Now that we understand that, let's look at the problems this creates.
Problem #1: The $473/Day Token Crisis
After a few days of coding, we have a working prototype! Here it is in action:
Cool right? Message me if you're curious about what the system prompt looks like 😄.
But there is a problem…
In the report, my token count is increasing rapidly. (A token is basically the unit of text an LLM consumes – the longer the chat history in a request, the more tokens that request costs.)
My initial count is ~3,000 tokens, and it grows by ~300 tokens per request at a rate of one request every 18 seconds (200 per hour).
Let's do the math for a single hour (200 requests, each ~300 tokens bigger than the last):
3,000 + 3,300 + 3,600 + ... + 62,700
= 6,570,000 tokens/hour
× 24 hours = 157,680,000 tokens/day
× $3 per 1M tokens (GPT-4.1 pricing)
= ~$473/day
$473. Per Day. That's my MONTHLY rent. WHAT THE- Sigh... These things are never easy...
Solution: Internal Memory System
Remember how the OpenAI API has no memory? Every request needs the full conversation history, so as that history grows, every single request gets bigger and bigger, and so does the bill.
I needed to cap the context window without losing important information.
There are many ways, to be honest:
- Truncating unnecessary data in the chat history
- Summarizing past chat history (long-term self-condensing context)
- A better screenshot strategy (don't screenshot when the user is AFK, for example)
All of these are valid ways to shrink the request size, but I just wanted to keep prototyping, so the low-hanging fruit I could implement fast was an internal memory.
In short, every time the LLM responds:
- Have it review the chat history, pick out any notable information about the user, and save that to an internal memory
- And for every request, only include the 20 most recent messages from the chat history (a rough sketch of this follows below)
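Here's a minimal sketch of that idea in Kotlin. To be clear, this is not the real ChatManager: the class name, the 20-message window constant, and the detail of folding memories back into the system prompt are my assumptions about one reasonable way to wire it up.

```kotlin
import org.json.JSONObject

// Sliding window + internal memory, so requests stop growing without bound.
class ChatMemory(private val maxWindow: Int = 20) {
    private val history = mutableListOf<JSONObject>()   // full running log (kept locally)
    private val memories = mutableListOf<String>()      // distilled facts about the user

    fun add(role: String, content: String) {
        history += JSONObject(mapOf("role" to role, "content" to content))
    }

    // Called with whatever the LLM flags as worth remembering after each response.
    fun remember(fact: String) {
        memories += fact
    }

    // What actually gets sent: memories folded into the system prompt,
    // plus only the last `maxWindow` messages instead of the entire log.
    fun buildRequestMessages(systemPrompt: String): List<JSONObject> {
        val memoryBlock =
            if (memories.isEmpty()) ""
            else "\n[MEMORY]\n" + memories.joinToString("\n") { "- $it" }
        val system = JSONObject(mapOf("role" to "system", "content" to systemPrompt + memoryBlock))
        return listOf(system) + history.takeLast(maxWindow)
    }
}
```

With a cap like that, each request carries a roughly constant number of messages, which is why the token count flattens out instead of climbing forever.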
Result: Token count stabilized at 8,500-9,500 per request

Each screenshot loop's token count isn't getting over 9,000. Great!
This dropped projected daily costs from $473 to ~$25-30 (still expensive for a personal project, but manageable for testing since I am not using it extensively, so it is more like $1-2 a day).
Here's the updated architecture with memory management:

ChatManager is the chat component that creates memories and only keeps the 20 most recent chat messages per request.
Problem #2: The Spam Problem
Token economy: ✅ Fixed
Next problem: FRIDAY won't shut up.
Picture this timeline:
- 10:00 AM – User opens YouTube
- 10:01 AM – User watches a Short about training doves
- 10:02 AM – User watches the Iron Man death scene (oops, spoilers!)
- 10:03 AM – User watches another Short…
- 10:04 AM – Another Short…
- …
- 6:00 PM – User finally closes YouTube
Should FRIDAY comment on every single Short? That's 480+ interruptions over 8 hours. Imagine getting this from FRIDAY every couple of seconds.
Annoying, right? Ideally, it should comment on something funny in the first Short or two, but afterward the AI should recognize my YouTube Shorts addiction pattern and, after an hour or so, start a conversation to intervene.
So, how do you make the AI recognize repeating patterns from memories, the timeline, and the chat history over a long period of time?
Well, that's for Devlog #2, since this devlog is already long enough. But I wanted at least to stop the AI from re-commenting on repetitive or unimportant stuff (like your home screen or a blank screen).
Solution: Teach FRIDAY to Ignore Boring Stuff
The fix is actually simple, but it took a bit of trial and error to tweak right. Just add these rules to the system prompt:
[RULES – RESPONSE BEHAVIOR]
* Avoid redundancy.
* Do NOT comment on trivial or static events (e.g., home screen, idle phone, blank screen).
* If the current observation is essentially identical to a recent one, ignore it completely.
* You only respond when the situation warrants insight or when something new occurs.
Result: FRIDAY now stays quiet during repetitive activities and only speaks up when there's something worth saying.

FRIDAY doesn't respond while I'm AFK on its own chat screen

The last message was at 9:31 and it's now 9:34, so FRIDAY has gone three minutes without rambling about a mundane chat-history screen
Section 4: Proof-of-Concept Demo
Ah, finally! Through all that, let’s check out our end product, shall we?
Demo 1: FRIDAY watching me watching CinemaWins watching Iron Man 2:
What works: The personality feels right—witty but not annoying.
What needs work: Sometimes misses context from the video content itself.
Demo 2: Calling Out My YouTube Shorts Addiction
FRIDAY intervening after detecting a pattern:
What works: Actually makes me pause and think (which is the goal).
What needs work: The intervention timing is still arbitrary—needs smarter pattern recognition.
Summary:
Here we have a proof of concept that can:
- A: Understand our phone context via screenshot capture, although the summaries definitely need to be more robust in the future if we want to capture all the information and nuance
- B: Communicate with the user through an overlay dialogue system
- C: Talk like FRIDAY, commenting on what's on my screen and warning me off short-form content
What's Next
In Devlog #2, I'll tackle the hard problem: teaching FRIDAY to actually understand my habits over hours and days, not just seconds. This means:
- Better context extraction from screenshots
- Long-term memory and pattern matching
- Smarter intervention timing (not just "you've been on YouTube for a while")
The goal is to transform FRIDAY from a snarky commentator into an actual behavior-change coach.
Was building an AI to solve my YouTube addiction overkill? Probably, yes?
Did I learn a ton about Android dev, LLM context management, and prompt engineering? Yes, a lot.
Would I do it again? …Ask me after I see next month's OpenAI bill, haha...
Thanks for reading!
— Jackal
Top comments (1)
An actual nice way to use AI, rather than the usual AI slop we see now. 😭😭🙏🙏🙏
But, not gonna lie, I would love to see the progress and see how Jarvis here evolves and becomes super personalized. "Jack, for the past month you have been watching Dexter content..." or something like that, and see whether it actually helps.
I want to say that you should give it some access to some phone abilities, like blocking your apps at certain times, or turning off your phone. But if the AI went rogue (like malfunctioned), it might just kill your phone. 😭
But other than that, nice one. Hope Jarvis becomes the best nagging mom-like AI that always tells you the problem is the phone, from the phone.