DEV Community: Keerthana

From Half‑dead Prototype to Local‑Only AI Medical Assistant: Rewiring MedClinic with GitHub Copilot

Keerthana — Mon, 25 May 2026 13:55:21 +0000

This is a submission for the GitHub Finish‑Up‑A‑Thon Challenge

What I Built

I built MedClinic, a fully local AI‑powered medical assistant that runs on a MedGamma‑2B‑class model without any third‑party APIs or cloud services.

Instead of slapping a shiny frontend on an off‑the‑shelf API, I:

Wrote the entire orchestration layer by hand (no pre‑trained wrappers).
Pipelined plain user text → MedGamma‑2B inference → structured JSON response as a pure inference pipeline.
Did not use any external API — everything lives on‑device.

The abandoned prototype (3 months ago)

Demo

Link: https://github.com/pulipatikeerthana9-wq/medclinic-voice-scribe

Now changed to

The Comeback Story

MedClinic started as a half‑dead prototype buried in a forgotten branch. The older version had:

Basic voice‑to‑text that I struggled to build without much prior experience, and it felt extremely hard to even get working.
A single monolithic function.
A 90‑second pause before every answer due to unoptimized inference.

I had just one ingredient: a local MedGamma‑2B‑like model sitting idle on my machine. No Play‑Cloud, no “API magic” — just raw model weights and a stubborn idea that a local‑only doctor‑in‑your‑laptop is possible.

What changed everything was GitHub Copilot:

Copilot became my architect for the pipeline.
My job was to sanity‑check the model design, trim the boilerplate, and own the safety guardrails.

In under a month, the MedClinic branch went from “proof of concept” to a hands‑on assistant that gives coherent, structured medical‑style answers — all without a single API call.

GitHub Copilot’s role (how it changed everything)

Here is where Copilot stepped in:

Pipeline design

I asked:

“How do I structure a voice‑input → MedGamma‑2B inference → structured JSON medical‑assistant pipeline?”

Copilot returned three layers:

input‑sanitizer
inference‑router
JSON‑formatter

I kept all three and wired them around MedGamma‑2B.

Model‑context scaffolding

Copilot generated:

Prompt templates
Role‑system messages
Safety guardrails

that were tailored to MedGamma‑2B’s capabilities.

Token‑aware logic

Copilot reminded me to:

Chunk user input
Trim old context
Stay under MedGamma‑2B’s context window

This is critical when you have no API retries and must avoid timeouts.

Testing scripts

Copilot wrote unit‑style tests that simulate patient‑style input and validate MedClinic’s JSON output shapes.

Where I pushed back

Copilot once suggested serializing the entire conversation into every call — a 10k‑token‑drag. I forced it to keep only the last 3 turns to stay under budget.
Early templates were too verbose; I cut about 40% of the prompt after reviewing Copilot’s own “better‑prompt” suggestions.

BEFORE VS AFTER

Aspect	Before Copilot & MedGamma‑2B	After Copilot‑Rewired MedClinic
Source code	Single file, spaghetti inference	Modular: voice → parser → inference → JSON formatter
Model usage	Raw prompt, no context-window awareness	Context-aware; trims history to stay under MedGamma‑2B’s token budget
Response format	Free-text paragraph	Structured JSON: diagnosis, symptoms, next_steps
Token pressure	No control, often past window	Token-sensitive trimming, pre-compressed chunks
UI feel	10s delays, no structure	Fast, structured, feels like talking to a junior doctor

SOAP Note transcription

My Experience with GitHub Copilot

Ease

Copilot removed the design friction, not the code‑writing.

I keep writing HTML/CSS myself, just like the e‑commerce example from the challenge.
But whenever I touched MedGamma‑2B orchestration logic, Copilot sketched the architecture and I polished it.

Power amplified by tokens

MedGamma‑2B’s context window is the hard limit — no retries.

Copilot helped me design a pipeline that never spills tokens:

Automatically summarize long patient histories.
Drop irrelevant context before sending to the model.
Pre‑compress repeated info into short tags.

In practice:

A 2‑minute patient voice transcript → ~1.2k tokens sent to MedGamma‑2B.
Copilot‑generated logic trimmed ~400 useless tokens just by removing filler and rephrasing.

MedClinic stays under budget while giving answers that feel like a human‑style consultation, not a chat‑bot‑style dump.

Copilot as co‑founder

GitHub Copilot didn’t just speed up my development — it rewired MedClinic’s brain.

Before: a local‑model prototype that felt like a toy.
After: a token‑aware, structured, local‑only AI physician assistant that I can run on my laptop with zero cloud dependencies.

How i used Gemini to turn my syllabus into a planner, quiz, and mind map within 5 minutes!!!

Keerthana — Tue, 19 May 2026 16:52:58 +0000

This is a submission for the Google I/O Writing Challenge

As a student, I wanted to turn my syllabus into something actually useful instead of just keeping it as a long list of topics. So I uploaded it to Gemini, and in about 5 minutes it generated a study plan, quiz, flashcards, and a mind map.

This made my revision feel much more organized and less overwhelming.

**Step 1: **I uploaded my syllabus and asked Gemini to break it down into the main topics first.

It quickly identified the major sections, which made the rest of the process much easier.

Step 2: After Gemini understood the syllabus, I used it to create a simple study plan based on the important topics.

prompt i used:

I have uploaded my syllabus and exam-related notes.
Your job is to turn this into a complete student study system.
Create the following:

A priority-based to-do list for the next 10 days.

A topic-wise study plan with daily time blocks.

A list of useful resources for each major topic, but only if they are directly relevant to the syllabus.

Flashcards for important definitions, formulas, and concepts.

A simple mind map structure showing how the topics connect.

A short revision strategy for the last 2 days before the exam.

Rules:

Stay strictly grounded in the uploaded syllabus.

Do not add unrelated topics.

Keep the output concise, structured, and easy to revise.

Highlight the most important topics first.

If any topic is unclear or too broad, flag it and suggest how I should study it.

Output format:

Use headings.

Use bullet points.

Make it easy for a student to follow immediately.

This part helped me see what to study first and made the syllabus feel much less confusing.

Step 3: Then I asked Gemini to make a quick quiz and flashcards from the important topics so I could test myself faster.

This was really helpful because it turned the syllabus into something I could actually revise in a short time.

Step 4: Finally, I asked Gemini to turn everything into a mind map, which made it much easier to understand the full syllabus at a glance when I had very little time.

Gemini also had a lot more options like generating audio, video, and even podcast-style content, which made it feel more powerful than I expected.

Overall, Gemini helped me turn a long and confusing syllabus into something practical and easier to study. For me, the best part was how quickly it created a study plan, quiz, flashcards, mind map, and even extra options like audio and video ideas. As a student, that kind of speed and flexibility can save a lot of time during exam prep.

What happens when an AI agent never stops watching? Hermes Agent as a persistent cognitive layer for your Digital life

Keerthana — Sat, 16 May 2026 03:47:07 +0000

This is a submission for the Hermes Agent Challenge
Last month, I watched three different people make the same kind of digital mistake.

A tired student clicked a “dream internship” link at 1 AM and almost submitted personal details to a fake form.
Someone else sent money too quickly because a message said “urgent, reply in 5 minutes or it expires.”

A third person pushed an important task all week, then finished it in a panic at the last possible hour.

None of them were stupid.
They were just stressed, distracted, and overwhelmed which is how most digital mistakes actually happen.

Most AI tools today do nothing about this.
You open a chat box, type a prompt, get an answer, and close the tab. The AI disappears until you remember to ask for help again.

Hermes Agent points at a different future.

Instead of being a reactive chatbot, it’s an open‑source agent that can keep running, use tools, remember context, and act on its own over time.

In this post, I want to treat Hermes not as “just another assistant,” but as something deeper:
a cognitive layer in the background of your digital life that quietly watches for patterns and steps in before you make predictable mistakes.

Hermes Cognitive Layer Workflow:

1. Humans Are Predictably Bad at Digital Decisions
If you zoom out, our online behavior is full of patterns:

We click suspicious links when we’re tired.

We accept fake urgency when we feel pressure.

We keep postponing meaningful work until fear finally kicks in.

We overshare files when we’re rushing.

These are not random glitches.
They are predictable cognitive weaknesses: impulsiveness, distraction, urgency manipulation, emotional spending, and procrastination.

Most tools including many “AI assistants” only respond after something has happened:
after you clicked, after you paid, after the deadline passed.

To prevent damage, an AI system has to do more than answer questions. It has to:

Stay present over time.

Learn your behavior patterns.

Intervene at the right moment, not just when asked.

2. From Reactive Assistants to a Cognitive Layer
Typical assistants have two hard limits:

They are short‑lived they exist inside a tab or app. Close it, and they’re gone.

They are prompt‑driven they wait until you explicitly ask for help.

That’s fine for Q&A, but not for reducing real‑world mistakes.

What I’m interested in is a different model:

Instead of “an app you open,” think of an ambient AI guardian that runs quietly, observes what you actually do, and introduces just enough friction when you’re about to do something you’ll regret.

That’s where Hermes Agent becomes a good foundation.

3. Why Hermes Specifically Fits This Vision
Hermes Agent is built for long‑running workflows, not just isolated prompts.

Out of the box, it offers:

Tooling: access to web, files, terminals, schedulers, and custom tools.

Scheduling: cron‑like jobs that run on a schedule without you being there.

Memory: persistent storage and retrieval of what it learns about you over time.

Skills: reusable behaviors that can be improved and reused automatically.

That combination tools + memory + scheduling + skills makes Hermes feel especially suited to act as a long‑running cognitive layer instead of a one‑shot chatbot.

It can watch event streams, write to its own memory, run cron jobs, and use those memories later when deciding whether to intervene.

4. Concept: An Ambient AI Guardian
I don’t think of this as
“a productivity bot” or
“a security app.”
I think of it as a background intelligence system that wraps around your digital life.

Roughly, the loop looks like this:

User Behavior – browsing, payments, file access, tasks.

Hermes Agent – running in the background.

Memory + Pattern Recognition storing events, learning habits.

Risk / Behavior Analysis :comparing the current situation to your normal patterns.

Timed Intervention – deciding whether to step in now, later, or not at all.

Outcome – warning, coaching, or soft protection.

In Hermes terms, this could be:

A cron job that reads recent events (from logs, APIs, or watchers).

A set of tools that ingest those events into long‑term memory.

A decision skill that runs inference over that memory and chooses whether to trigger a notification, open a dialogue, or pause an action.

The rest of this post shows how that layer might work in three domains.

5. Scenario 1 — Money and Scam Protection with Friction
People rarely lose money because they don’t understand interest rates.
They lose it because:

A site screams “limited time, buy now!”

A message pretends to be from their bank.

They’re exhausted and just want to click “yes.”

*With Hermes as a money and scam guardian, I imagine:
*
A browser‑automation tool that inspects pages where I perform payments (limited to domains I approve).

A memory store of my normal transaction patterns: typical amounts, recurring recipients, usual sites.

A scheduled or event‑triggered check whenever I’m about to confirm something unusual.

Instead of a vague “This is suspicious,” it could say:

“This transaction is larger than your typical range, going to a recipient you’ve never paid before, on a domain you haven’t used. Do you want to wait 2 minutes and review this carefully?”

The key elements are:

Pattern‑aware: it compares the current action to your history, not some generic rule.

Time‑aware: it steps in before the money leaves your account.

Friction‑based: it slows you down instead of silently blocking you.

Technically, this is just tools + cron + memory + a decision skill.
Conceptually, it’s an AI that protects you from your own rushed decisions.

*6. Scenario 2 *— Local‑First File and Privacy Guardian
We talk a lot about cloud privacy, but a more basic risk is someone accessing files directly on your laptop when you’re not paying attention.

Here, Hermes could become a local‑first file guardian:

A filesystem watcher tool monitors just the directories you mark as sensitive.

Events like “new process reading private folder” or “unusual time of access” are logged into memory.

A small analysis skill periodically reviews those logs.

When something looks off, instead of silently allowing it, the agent could:

Immediately notify you (“This folder is being accessed in a way you don’t usually see.”).

Temporarily hide or lock that directory until you confirm it’s fine.

The important part is that this system runs locally:

Hermes Agent running on your own device means this behavioral surveillance doesn’t have to be uploaded to an external cloud just to be useful.

That local‑first execution gives the whole idea a more ethical and privacy‑aware foundation and fits Hermes’ open‑source spirit.

7. Scenario 3 — Behavior‑Aware To‑Do Coaching
To‑do lists rarely fail because the UI is bad.
They fail because humans are predictable:

We delay uncomfortable tasks.

Our energy peaks and crashes at consistent times.

Notifications blur into background noise.

Hermes can act as a behavior aware to‑do coach:

A task tool syncs with your to‑do list or stores tasks in a local file/DB.

A memory module keeps track of when you actually complete tasks, not just when you create them.

A scheduled skill analyzes this to learn your personal productivity cycles and avoidance patterns.

Then the agent’s messages become more intelligent:

Instead of “You forgot your task,” you get:

“You usually delay complex tasks after 9 PM. Should I break this into smaller subtasks and schedule the first one for tomorrow morning, when you usually focus better?”

Instead of spamming you every hour, it nudges you at your actual best times.

As a student, I notice how often my own mistakes happen not because I don’t know what to do, but because I’m tired, scrolling, or overloaded.
This kind of Hermes setup doesn’t do the work for me it just catches my bad habits in the act and makes them harder to ignore.

8. Cross‑Domain Awareness: Connecting the Dots
The really interesting part is when this cognitive layer begins to connect behavior across domains:

Not enough sleep → lower focus → more rushed decisions → higher scam risk.

Stressful week → more procrastination → more temptation to click “easy money” offers.

Because Hermes can orchestrate multiple tools and store long‑term history, in theory it could say things like:

“You’ve slept poorly for three nights, postponed two important tasks, and now you’re about to make an unusually large purchase on a new site. Are you sure this isn’t stress‑spending?”

At that point, it stops being “automation” and starts feeling like adaptive cognition support — helping your future self by noticing patterns your present self is blind to.

9. Where This Could Go Wrong (and Why Design Matters)

A system like this is powerful, which means it can also be dangerous if designed badly.

In the worst version, it could:

Turn into invasive behavioral surveillance.

Manipulate you by over‑optimizing for “engagement” or “safety.”

Create an unhealthy dependence where you outsource all judgment to the agent.

That risk is exactly why things like local‑first execution, explicit permissions, transparent memory, and user‑controlled boundaries are non‑negotiable.

The goal is not to build an authoritarian digital parent.
The goal is to build a trustworthy second system that slows you down just enough to think clearly.

10. From Apps to Cognitive Infrastructure
I originally started thinking about this after noticing how easy it is, especially as a student, to:

ignore tasks until panic hits,

trust fake opportunities when stressed, and

click things too quickly when I just want the problem to disappear.

Hermes Agent, with its long‑running workflows, memory, and tool orchestration, gives us a way to experiment with a different style of AI:
not just a chat window we open, but cognitive infrastructure that quietly supports better decisions.

Maybe the most important AI systems of the future won’t be the loudest ones.
They might be the quiet background agents that help humans make fewer irreversible mistakes not by replacing our thinking, but by giving us one more chance to think before we act.

Stop just chatting with AI: Build real skills in GenAI and Prompt Engineering

Keerthana — Sun, 10 May 2026 04:07:56 +0000

You’re Underusing AI: It’s More Than Just ChatGPT
Most people think “AI” means asking ChatGPT to write an essay or Midjourney to make a cool image.

In reality, there are entire families of AI systems, dozens of generative tools, and a new skill set called prompt engineering that almost nobody around you is using properly yet.

If you’re a student, developer, or tech-curious learner in 2026, you are still early.
This post is your high-level map: what types of AI exist, what “generative AI” actually means, what prompt engineering is, and where to learn all of this for free or very cheap.

1. First: AI is not one thing
Let’s kill one myth: AI is not a single magical brain.
It’s a collection of different model types designed for different jobs.

At a high level, you’ll often hear about:

Discriminative models: These models classify things. They answer questions like “Is this spam or not?”, “Is this a cat or a dog?”, or “Will this customer churn?”

**Generative models: **These models create things. They can generate text, images, code, audio, or video that looks like the data they were trained on.

**Foundation models / LLMs: **Huge models trained on massive datasets that can be adapted for many tasks: chatbots, coding assistants, search, agents, and more.

If you want a gentle, visual explanation of “discriminative vs generative,” this short video helps:

Generative vs Discriminative AI Explained (YouTube):
https://www.youtube.com/watch?v=HfRwJFk66dc

Discriminative vs. Generative Models – Coursera article:
https://www.coursera.org/articles/discriminative-vs-generative-models

Understanding this distinction already puts you ahead of most people who treat “AI” as one big black box.

2. What exactly is Generative AI?
Generative AI (GenAI) is the branch of AI focused on generation — text, images, code, audio, and even 3D assets.

If you’ve used ChatGPT, DALL·E, Midjourney, Claude, Gemini, or GitHub Copilot, you’ve already touched generative models.

Common use cases:

Text: blog posts, emails, social media, documentation, lesson plans, summaries.

Code: boilerplate, refactors, tests, debugging hints, entire small tools.

**Images: **thumbnails, UI concepts, marketing banners, art references.

Audio & video: synthetic voices, podcast clips, explainer videos, dubbing.

Good beginner-friendly Generative AI intros:

Introduction to Generative AI – Google / Coursera (micro-course):
https://www.coursera.org/learn/introduction-to-generative-ai

Introduction to Generative AI – Google Skills:
https://www.skills.google/course_templates/536

Beginner: Introduction to Generative AI learning path – Google Skills:
https://www.skills.google/paths/118

Generative AI Full Course for Beginners (Intellipaat, YouTube):
https://www.youtube.com/watch?v=Pq8lW5y8JpA

Generative AI Full Course 2025 (Intellipaat, YouTube):
https://www.youtube.com/watch?v=QoVq7Yn0d90

The important mindset shift: GenAI is not just “ask it to do your homework.”
It’s a toolbox for building apps, automating workflows, and augmenting your skills, not replacing your brain.

3. Prompt engineering: the missing skill everyone skips
Most people type one sentence into a model, get a mid result, and say “AI is overrated.”
The problem usually isn’t the model — it’s the prompt.

Prompt engineering is the skill of talking to models in a structured way so you get reliable, high‑quality outputs.

It includes simple but powerful patterns like:

Giving role + goal: “You are a senior Python mentor. Help me refactor this Flask API for better security.”

Providing context + constraints: “Use bullet points, be under 200 words, and avoid jargon.”

Iterating: “Now rewrite this for LinkedIn,” “Turn this into a step-by-step checklist,” etc.

Great places to learn prompt engineering (for free):

25+ Free Prompt Engineering Courses (coursesity list):
https://coursesity.com/free-tutorials-learn/prompt-engineering

Top 5 Free Prompt Engineering Courses with Certificates – upGrad blog:
https://www.upgrad.com/blog/prompt-engineering-courses/

Best Free Prompt Engineering Courses 2026 (FreeAcademy ranking):
https://freeacademy.ai/blog/best-free-prompt-engineering-courses

LinkedIn post: “Here are the 5 free courses to learn Prompt Engineering in 2026”:
https://www.linkedin.com/posts/iamskabir_open-ai-google-facebook-have-all-released-activity-7425136055315738624-1lp5

Good prompts turn AI from a toy into a serious productivity booster.
This is why there are now full courses and certificates dedicated only to prompt engineering.

4. Where to learn AI and Generative AI (even as a beginner)
You don’t need a PhD or expensive bootcamp to start.
There’s a ton of structured learning content that is free or low-cost and beginner-friendly.

Some solid starting points:

Introduction to Generative AI (beginner Coursera course, 4 modules):
https://www.coursera.org/learn/intro-gen-ai

Introduction to Generative AI – in-depth Coursera course (Transformers, GANs, Diffusion):
https://www.coursera.org/learn/introduction-generative-ai

Introduction to Generative AI Specialization – Coursera learning path:
https://www.coursera.org/specializations/introduction-to-generative-ai

Google’s Generative AI path on Google Skills:
https://www.skills.google/paths/118

Generative AI Full Course (Intellipaat, YouTube – long, hands-on friendly):
https://www.youtube.com/watch?v=Pq8lW5y8JpA

Another full GenAI course (Intellipaat, 2025 version):
https://www.youtube.com/watch?v=QoVq7Yn0d90

These will give you the mental model: what GenAI can do, what terms mean (LLM, embeddings, fine-tuning, RAG), and where it fits in the bigger AI ecosystem.

5. Where to learn prompt engineering properly
If you want to stand out, don’t stop at “using ChatGPT.”
Go one level deeper and actually learn prompt engineering frameworks.

Links worth bookmarking:

25+ Free Prompt Engineering Courses (curated list):
https://coursesity.com/free-tutorials-learn/prompt-engineering

Top 5 Free Prompt Engineering Courses with Certificates (upGrad):
https://www.upgrad.com/blog/prompt-engineering-courses/

Best Free Prompt Engineering Courses 2026 – FreeAcademy (with rankings):
https://freeacademy.ai/blog/best-free-prompt-engineering-courses

LinkedIn breakdown of 5 free prompt engineering courses (OpenAI, Google, Meta, etc.):
https://www.linkedin.com/posts/iamskabir_open-ai-google-facebook-have-all-released-activity-7425136055315738624-1lp5

Treat prompt engineering like you’d treat SQL or Git: it’s a core skill, not a “nice to have,” if you want to build serious GenAI-powered products.

6. A simple roadmap: What to learn and in what order
If you’re overwhelmed, here’s a high-level path you can follow.

Learn AI basics (conceptually)

Learn what AI vs ML vs deep learning means (any ML 101 video or article works; the Coursera discriminative vs generative article is a good start).

Watch the “Generative vs Discriminative AI” YouTube explainer:
https://www.youtube.com/watch?v=HfRwJFk66dc

Understand Generative AI and LLMs

Take a short GenAI intro course:
https://www.coursera.org/learn/introduction-to-generative-ai

Or follow a beginner path like Google Skills’ GenAI learning path:
https://www.skills.google/paths/118

Use a full YouTube course for hands-on demos:
https://www.youtube.com/watch?v=Pq8lW5y8JpA

Practice prompt engineering daily

Pick one of the free prompt engineering course lists:
https://coursesity.com/free-tutorials-learn/prompt-engineering

Or use the FreeAcademy interactive course to practice prompts:
https://freeacademy.ai/blog/best-free-prompt-engineering-courses

Build tiny projects

After a module or two, build something small: a content generator, a study notes bot, or a code review helper.

Most GenAI courses and learning paths now include mini-projects and guided labs.

Go deeper if you enjoy it

Use a more advanced Generative AI course (with Transformers, GANs, Diffusion):
https://www.coursera.org/learn/introduction-generative-ai

Follow a full specialization / path if you want a structured route:
https://www.coursera.org/specializations/introduction-to-generative-ai

You don’t have to learn everything at once.
The goal is to stack skills: first understanding, then prompting, then building.

7. Why this matters in 2026
In 2026, AI is not “future tech” anymore — it’s infrastructure.
Companies are quietly wiring GenAI into customer support, internal tools, analytics, marketing, and developer workflows.

Most people around you will still treat AI as a fancy autocomplete.
If you understand the varieties of AI, master prompt engineering, and can ship small GenAI projects, you’re already in the top few percent of users.

So if you’ve been telling yourself “I’ll learn AI someday,” consider this your sign:
Someday is now.

I rushed my First Gemma 4 idea. Here’s what it taught me about building local AI for safety

Keerthana — Fri, 08 May 2026 13:57:04 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4.

When I first joined the Gemma 4 Challenge, I rushed to publish an idea I was genuinely excited about: a local AI safety layer that could help in emergencies even when you cannot reach your phone.

Looking back at that first post, I realized I missed some important things in how I framed the idea and how I explained the system. I had a strong concept, but I did not ground it enough in a real user, a realistic prototype path, or the actual experience of using it.

This is the version I wish I had written first.

In this article, I want to do three things:

Briefly recap the original idea.
Be honest about what I got wrong.
Show how I would redesign the concept now so it feels closer to something a developer could actually prototype.

The core problem: your SOS app assumes you can move

Most modern phones already have emergency and safety features. They can:

Call emergency services
Share your location with trusted contacts
Trigger alarms or alerts

But there is a hidden assumption behind all of them: you can reach your phone and interact with it.

What if you cannot?

Your hands are not free.
You are injured or semi-conscious after a fall or accident.
Someone has taken your phone away, or it is simply out of reach.

In those moments, your smart safety setup becomes much less useful. The tools exist, but the person cannot operate them.

That gap is what made me think about a local AI safety layer powered by Gemma 4: a system that could notice unusual patterns around you and start helping before you unlock your screen and open an app.

Before Gemma 4 vs after Gemma 4

Before Gemma 4, ideas like this felt harder to take seriously as local-first tools. Either the model would be too limited, or the whole flow would end up depending on the cloud anyway.

After Gemma 4, the idea feels more realistic. Local AI starts to look less like a toy and more like a usable reasoning layer that can sit closer to the user, the device, and the moment where a decision actually matters.

That shift is what pulled me toward this challenge.

What I originally tried to build with Gemma 4

In my first post, I described a local safety layer powered by Gemma 4 that would quietly watch signals around a user and decide when to step in.

The basic idea was:

Continuously monitor context from sensors and devices.
Let a local AI model reason about what is happening.
Escalate only when the situation really looks dangerous.

In my mind, this was not supposed to be just another cloud AI feature. I was imagining something closer to a personal guardian that could run locally on a phone, wearable, or nearby edge device.

That is also why Gemma 4 felt interesting to me.

Why Gemma 4 matters here

What makes Gemma 4 exciting to me is not just that it is powerful. It is that it makes local-first AI feel much more practical.

For a safety-related idea, that matters because local AI changes the tradeoffs:

Lower latency: you do not want every decision to wait on a cloud round trip.
More privacy: sensitive context like motion, location, and health-adjacent patterns should stay as local as possible.
Better resilience: in a bad situation, weak connectivity is exactly what you should expect.

That said, one of my mistakes in the first version was that I kept saying local AI and signals in a very abstract way. I did not really show what that could mean as a prototype.

Mistake 1: I talked about signals without a real stack

In my head, I was imagining motion, location, sound, notifications, maybe even smart home events. But I wrote about them like vague inputs instead of a real developer workflow.

If you are reading this as a builder, you naturally want more than the concept. You want to know:

Which devices?
Which APIs?
Which runtime?
How do all the pieces connect?

So here is the more realistic v0 stack I would use now.

How I’d prototype this in one weekend

Simulate motion and location events in JSON
Run Gemma 4 locally with a simple prompt
Classify events into normal / concern / emergency
Trigger a silent countdown flow
Log override feedback from the user

A more realistic v0: how I would prototype this now

If I were prototyping this idea today, I would start small.

Which Gemma 4 model fits this idea?

Small (2B/4B): the ideal long-term destination for running on phones, wearables, or other edge devices.
31B Dense: a strong option for prototyping the reasoning loop first on a local GPU or cloud machine.
26B MoE: more interesting later if the system ever needs to handle many users or events at high throughput.

At my current stage, I would think of 4B as the long-term edge target and use a stronger setup first to test the reasoning flow.

1. Devices and sensors

I would begin with the devices people already have:

An Android phone using Sensor APIs like the accelerometer and gyroscope
Location services to detect movement, sudden stops, or unusual context
Optionally a smartwatch for heart rate and motion if available

Even just accelerometer plus location is enough to simulate interesting emergency scenarios.

2. Local runtime

For early experiments, I would start simple:

Run Gemma 4 locally on a laptop with Ollama or LM Studio
Use that setup to test prompts, event formatting, and decision logic
Only later think about moving inference closer to the phone or an edge device

This is another thing I understand more clearly now: you do not need a perfect mobile deployment on day one to test whether the reasoning flow makes sense.

3. Backend glue

I would use a small backend service such as:

Python + FastAPI
Node.js + Express

That service would:

Receive events from the phone through HTTP or WebSocket
Normalize them into structured JSON
Send short batches of recent context to Gemma 4

A tiny queue or buffer layer would also help reduce noisy sensor spam before every event reaches the model.

4. Gemma 4 as the reasoning layer

This is where Gemma 4 does the most interesting work.

Instead of hardcoding dozens of brittle if-this-then-that rules, I would use Gemma 4 to reason over a stream of events and classify the situation into something like:

Normal
Mild concern
Probable emergency

For example, the model could be prompted to read recent sensor context and respond with structured JSON such as:

{
  "severity": 3,
  "reason": "Sudden fall detected, user not moving for 60 seconds, elevated heart rate, unusual location context.",
  "recommended_action": "Trigger SOS countdown and notify trusted contact."
}

What I like here is that Gemma 4 is not replacing the app. It is acting as the decision-making layer inside the app.

5. Safety actions and UX

For a first version, the system does not need to be complicated.

A useful v0 could do this:

Start a silent 15–30 second countdown when the model predicts a probable emergency
Let the user cancel quickly if they are okay
If there is no response, send location to a trusted contact and optionally trigger an SOS flow

At that point, the idea stops being a vague AI safety concept and becomes a prototype path.

Mistake 2: I did not anchor the idea in one real person

Another mistake I made in the first post was talking about emergencies too generally.

That made the idea sound broad, but also blurry.

If I am honest, the use case I kept imagining most strongly was women’s safety: walking alone at night, travelling alone, or being in situations where taking out a phone may be too slow or may even escalate danger.

That does not mean the concept could not help elderly users, accident recovery, or other scenarios. But if I were designing a v0 now, I would not hide behind a vague everyone framing.

I would say clearly: this first version is designed around one urgent user story.

That single decision already makes the product thinking better.

Mistake 3: I focused too much on architecture, not enough on experience

As developers, it is very easy to jump into models, stacks, APIs, and pipelines.

I did that.

But the more important question is: what does this feel like for the person using it?

If this became a real app, the experience might look like this:

The user installs the app and sets trusted contacts.
The app quietly monitors motion and location patterns in the background.
When something unusual happens, the system sends a compact event summary to the local Gemma 4 reasoning layer.
Gemma 4 classifies the situation as normal, mild concern, or probable emergency.
If the risk is high, the app begins a silent countdown and asks for confirmation.
If the user does not respond, the app escalates automatically.

That flow is what makes Gemma 4 interesting to me here. It is not just generating text. It is helping a system decide when to move from watching to acting.

Roadmap if you want to explore this idea

If I were taking this further, I would do it in this order:

Simulate normal and suspicious event timelines in JSON.
Test Gemma 4 prompts locally with a small reasoning loop.
Build a tiny dashboard to replay events and inspect decisions.
Only then think about streaming real phone or wearable data.

That order matters. It keeps the idea grounded and prevents the project from becoming “hardware complexity first, learning second.”

What this taught me about writing about AI

The biggest lesson for me was not only about the idea itself. It was also about how to write better about AI projects.

A post becomes stronger when it has:

One real user instead of a generic audience
One believable prototype path instead of just ambition
One clear explanation of what the model is actually doing

My first version had genuine excitement, but this version has more structure and honesty.

Small update after the comments

One piece of feedback that really stayed with me came from the discussion on this post.

I had already started reframing this idea as:
simulate JSON → test Gemma 4 locally → then think about real devices.

But one comment helped me notice that the middle step deserves more attention than I first gave it.

Before trying to make the model “work,” I need to stay in observation mode a little longer:
What changes when the same event is phrased differently?
What stays consistent?
Where does the model hesitate, overreact, or simplify too much?

That made me realize my real first experiment does not need to be a full prototype.
It can just be:
a few fake event timelines,
a local Gemma 4 setup,
and a running note of intent → input → output.

That feels like a much more honest and practical place to begin.

The real shift

The point of this post is not to claim I have solved safety with AI. I have not.

But Gemma 4 makes it realistic for a student or indie developer to experiment with local-first safety logic in a way that feels much more practical than before.

That, to me, is the real shift.

Not just that local AI is getting stronger.

But that it is becoming personal enough, local enough, and usable enough to imagine systems that help in the exact moments where the cloud may not be enough.

If you have worked on local AI, safety systems, or context-aware apps, I would genuinely love to know how you would approach this problem differently.

Gemma 4, Read My Ingredient Label and Tell Me If It’s Lying: A Personal AI Health Filter

Keerthana — Thu, 07 May 2026 10:13:21 +0000

What I’m Building

Most apps still treat “healthy” like it’s a universal setting.
High protein? Great.
Low fat? Great.
Organic? Great.

Except… that’s not how real bodies work.

In the real world, “healthy” is completely different person to person. A product that’s perfect for one friend can quietly wreck another.

What’s Broken About “Healthy” Labels?

Think about these everyday situations:

Your gym friend swears by a “clean” protein bar, but it destroys your skin and your stomach.
Your dermatologist tells you to avoid certain ingredients, but your “gentle” moisturizer still triggers breakouts.
You’re trying to watch sodium or sugar, but the packaging just screams “FIT – NATURAL – SUPERFOOD” and never explains what it means for you.

Most people don’t have the time or background to:

Decode long ingredient lists
Know which chemical-sounding names are actually fine
Understand which combos might be bad for their skin, gut, or specific conditions

So what happens?

We either:

Trust the front label and hope for the best
Randomly Google ingredients one by one
Give up and buy the same 2–3 “safe” things forever

Meanwhile, all the real detail is sitting silently in that ingredient list.

Before vs After Gemma 4

Before Gemma 4:
“Healthy” meant whatever the marketing label or a generic app rating said.

After Gemma 4 (what I want to build):
“Healthy” becomes a personal decision, based on your own profile and what’s actually inside the product.

What If Labels Could Talk Directly to You?

Instead of asking, “Is this product healthy?” I want to ask:

“Is this product healthy for me?”

Here’s the concept I’m building around Gemma 4.

Your personal profile You create a simple, privacy-first profile (optional, but powerful):

Allergies
Skin conditions (like acne-prone or sensitive)
Intolerances (like lactose)
Goals (high protein, low sugar, low sodium, etc.)
Health concerns (like blood pressure, diabetes risk)

You scan a product label You upload a photo of a product label:

Packaged food
Skincare
Supplements
Cosmetics

Gemma 4 becomes the reasoning engine Gemma 4 will be the brain that:

Understands the image and extracts the ingredient list
Interprets what those ingredients actually are
Cross-checks them against your profile
Explains whether the product fits you, not just the “average” human

You get a personal verdict Instead of a fake universal health score, you get:

Safe – Likely compatible with your profile
Caution – Some ingredients might not play nicely with you
Avoid – Specific reasons why it conflicts with your goals or conditions

And most importantly, you get a short, human explanation instead of a mysterious “7.9/10 health score.”

A Concrete Example

Imagine this profile:

Acne-prone skin
Lactose intolerance
Trying to avoid high sugar intake

You scan a chocolate-flavored protein shake.

A generic app might say:

“High protein, moderate sugar. Healthy for active adults.”

But Gemma 4, with your profile in context, would aim for something more like:

“This shake contains whey protein and added sugars. While it helps with protein intake, the dairy-based ingredients may trigger issues for lactose-sensitive users, and the high sugar content could contribute to acne flare-ups and conflict with your low-sugar goal.”

Same product. Totally different conclusion, because the context changed.

Why Gemma 4 Fits This So Well

Looking at how others are using Gemma 4 on DEV, there’s a clear pattern: people are exploring local, personal, reasoning-heavy use cases rather than just building another chatbot. That fits this idea perfectly.

This project needs several capabilities:

Image understanding – read the label from a photo
Ingredient interpretation – understand what each item actually is
Contextual reasoning – connect those ingredients to user-specific risks and goals
Lightweight deployment – so it can eventually run locally on a phone or laptop

Gemma 4’s focus on multimodal reasoning and small, deployable models makes it a strong candidate:

It can be the reasoning brain that works on top of OCR or direct vision input.
It’s small enough that a future version of this could run locally instead of sending your health profile to some random server.
It’s already being explored for similar “personal AI layer” ideas in this challenge, which tells me this direction is aligned with what Gemma 4 is meant for.

What I’m Actually Going to Build

Important note: this is not a “here’s my finished app, sign up now” post.

This is:
“Here’s the problem, here’s the idea, and here’s how I want to build it with Gemma 4.”

Here’s the rough system flow I’m planning:

User profile layer Minimal, privacy-first profile: allergies, intolerances, skin type, goals.

Ideally stored locally or encrypted (especially if I get this running with a local Gemma 4 setup).

Image → ingredients User uploads a photo of the label.

Use OCR or Gemma 4’s multimodal abilities (depending on the stack) to pull out the ingredient list as text.

Structured ingredient understanding Normalize ingredient names (for example, “whey concentrate” → “dairy protein”).

Mark known flags:

High sodium
Added sugars
Common allergens
Comedogenic (pore-clogging) oils

Gemma 4 reasoning step Prompt Gemma 4 with:

The user profile
The structured ingredient data
Some domain rules (for example, “for acne-prone skin, be cautious with X, Y, Z”)

Ask it to:

Classify: Safe / Caution / Avoid
Explain the reasoning in short, clear language

Eventually this could look like a simple API call:

POST /analyze-ingredients

{
  "profile": {...},
  "ingredients": [...]
}

Response:

{
  "verdict": "Caution",
  "reasons": [...],
  "flaggedIngredients": [...]
}

User-facing output Clear badge: Safe, Caution, or Avoid

One short paragraph of reasoning in plain language

Optional:

A small list of which specific ingredients were flagged
Why they were flagged (for education)

Why Local AI Matters Here

This idea sits in a very sensitive zone: food, skin, health.

You might not want your:

Intolerances
Skin issues
Health goals
Ingredient history

constantly sent to cloud servers every time you scan something.

That’s why I’m particularly interested in exploring local deployments of Gemma 4 as this evolves:

Ingredient analysis that runs on your own device
Faster scans (no round-trip to a remote server)
More privacy for your health profile
A truly personal AI layer living on your phone or laptop

If you look at the current Gemma 4 challenge posts, a lot of people are already thinking in terms of “local AI as a new design space,” not just API calls. This project fits right into that mindset.

What This Is — and Isn’t

This is not:

A medical diagnosis tool
A replacement for your doctor, nutritionist, or dermatologist

This is:

A translation layer between confusing ingredient lists and your personal context
A way to quickly ask, “Does this make sense for me?” before you buy or apply
A starting point to bring more honesty and personalization into how we read labels

Where I Want to Take It

If the core ingredient interpreter works well, there are a lot of directions this could grow into:

Skincare compatibility checks for acne-prone or sensitive skin
Allergy-focused food scanning for specific triggers
Supplement “risk radar” for people on certain medications
Personalized grocery suggestions that avoid your red flags
A lightweight offline assistant that lives on your phone as a “health lens” on top of your camera

For now, I want to validate the core:

Can Gemma 4 reliably reason about ingredient lists in the context of one specific person, and produce explanations that feel useful, honest, and understandable?

If you’re also experimenting with Gemma 4 around labels, health, or local AI, I’d love to hear how you’re approaching it.

Your SOS app can’t Help if you can’t reach your phone — so I want to built a local AI Safety Layer with gemma 4

Keerthana — Thu, 07 May 2026 02:31:13 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4.

Most emergency and SOS apps quietly assume one thing:

In a crisis, you will be able to reach your phone, unlock it, open an app, and press the right button.

Real emergencies don’t always cooperate with that assumption.

Phones fall away. Attackers take or destroy devices. People freeze, panic, or lose consciousness. Networks drop exactly when you need them most.

That gap between “I installed a safety app” and “I actually got help when it mattered” is what made me start thinking about a different design: What if the reasoning behind emergency detection could run locally, on-device, instead of far away in the cloud?

Gemma 4’s smaller models make that idea feel much more realistic than it would have even a few years ago [web:5][web:16].

The problem with current emergency systems

Most consumer safety tools work on binary logic:

Button pressed = emergency.
No button pressed = no emergency.

Some newer systems add fall detection or basic automation, but they’re still fundamentally event-driven. A single trigger flips a switch.

In real life, a lot of emergencies are not a single clean, isolated event. A person falling while jogging is not the same as:

A fall followed by no movement.
Distress speech like “leave me alone” or “stop.”
An unusual heart-rate spike.
No response to the device for several seconds.

The interesting problem is not detecting a fall or listening for one phrase. The interesting problem is understanding context across multiple signals.

Cloud AI can help with that reasoning, but for emergency use it also introduces new risks:

Latency when every second matters.
Dependency on connectivity that may fail in exactly the worst moments.
Sensitive data (voice, location, activity patterns) constantly leaving the device.

This is where local AI starts to look less like a “nice to have” and more like an architectural requirement.

Why Gemma 4 is a good fit for this idea

Gemma 4 is a family of open models designed for different hardware realities:

Small effective 2B and 4B models aimed at ultra-mobile and edge deployment.
Larger dense and Mixture-of-Experts models tuned for high-end local or server setups [web:5][web:16].

For an emergency reasoning system, the “small but capable” side of this family is the most interesting.

A Gemma 4 model running locally could:

Process emergency context with low latency.
Keep raw sensor and voice data primarily on-device.
Continue working during temporary network loss.
Integrate with wearables, phones, or other edge devices where internet is not guaranteed.

Conceptually, I lean toward something like Gemma 4 4B (or its effective 4B edge variant) for this use case: big enough to handle non-trivial reasoning, small enough to be realistic on consumer hardware [web:5][web:16].

Using a huge, purely server-side model might look impressive in a benchmark chart, but it fights against the core goal: resilience at the edge.

The core idea: an on-device emergency reasoning layer

I don’t think of this as “an AI SOS app” or “a safety chatbot with Gemma 4.”

A better description is:

An on-device emergency reasoning layer that fuses multiple signals and decides when to escalate.

Instead of treating each event as a separate trigger, the system would continuously interpret a small set of contextual signals together.

Example inputs (real or simulated):

fall_detected
panic_tap_pattern_detected
heart_rate_state (normal / elevated / very high)
inactivity_duration_seconds
movement_state (moving / still)
voice_transcript_snippet

Those could be bundled into a structured context object like this:

{
  "fall_detected": true,
  "movement_after_fall": false,
  "voice_transcript": "leave me alone",
  "heart_rate_state": "high",
  "response_delay_seconds": 20
}

Gemma 4’s job would be to interpret that bundle of signals and produce something like:

Threat level (e.g. low / medium / high).
Confidence score.
Likely category (e.g. accident / medical / interpersonal threat).
A short, human-readable summary for responders or contacts.
A recommendation: escalate or hold.

The AI isn’t a UI decoration here. It is the thing making the hardest decision in the system.

A sketch of how Gemma 4 might be used

Even without full code, it helps to imagine the interaction. Conceptually, a prompt to Gemma 4 might look like this:

System: You are an on-device emergency reasoning assistant. 
You receive structured sensor and context information from a wearable and phone.

Your job:
- Decide if this looks like an emergency.
- Estimate your confidence.
- Classify the situation type.
- Suggest whether to escalate.
- Write a short, clear summary for a human.

User:
Context:
{
  "fall_detected": true,
  "movement_after_fall": false,
  "voice_transcript": "leave me alone",
  "heart_rate_state": "high",
  "response_delay_seconds": 20
}

Expected model-style response (simplified):

{
  "threat_level": "high",
  "confidence": 0.86,
  "category": "possible assault or medical emergency",
  "escalate": true,
  "summary": "High-confidence emergency detected after a sudden fall, no movement, and distressed speech."
}

From there, the local app can decide whether to:

Notify trusted contacts.
Show a summary with location and context.
Trigger additional checks (like a vibration asking the user to confirm they are okay).

This is not full production logic, but even this thought experiment shows how Gemma 4 is doing actual reasoning work, not just formatting messages.

Example scenarios

Scenario 1: likely false alarm

A runner trips while exercising.

Signals:

Sudden fall.
High heart rate.
Movement resumes within a few seconds.
User responds verbally and cancels a check-in prompt.

Likely reasoning:

Low-confidence emergency. Activity pattern looks consistent with exercise.

In this case, the system avoids spamming emergency contacts every time someone trips on a sidewalk.

Scenario 2: high-risk event

Signals:

Sudden fall.
Distress phrase detected (“leave me alone”, “stop”, etc.).
No movement afterward.
No response to a quick check-in prompt.

Likely reasoning:

High-confidence emergency detected. Possible medical distress or interpersonal threat.

Here, the system can justify immediate escalation with a concise summary rather than a vague “SOS triggered.”

Why local AI changes the reliability story

The more I thought about this, the more I realized the most important shift isn’t “AI adds smartness.”

It’s where the intelligence runs.

Cloud-based systems are powerful but fragile in exactly the wrong ways for emergencies:

Weak or absent connectivity.
Congested networks during disasters.
Users traveling in rural or infrastructure-poor regions.

A local-first reasoning layer changes the default from:

“If we can reach the server, we’ll try to help.”

to:

“We keep trying to understand the situation, even when the network disappears.”

There’s also a privacy angle. Voice snippets, behavioral patterns, and location context are some of the most sensitive data a person has. If Gemma 4 can handle much of the reasoning on-device, far less of that raw context needs to leave the user’s control.

For safety and trust, that feels like a healthier starting point.

What this exploration taught me about model choice

Thinking through this idea forced a simple but important realization:

The “best” model for a system is not always the largest or the one that wins the most benchmarks.

For this use case, the trade-offs look more like:

Smaller models over massive ones.
Lower latency over marginal accuracy gains.
Offline capability over constant network dependence.
Simpler deployment over complex infrastructure.
Focused reasoning over general-purpose chat.

Gemma 4’s design — especially the edge-focused small variants and the more powerful 26B/31B options — makes that trade-off space clear [web:5][web:16]. You’re not just picking “the biggest model,” you’re choosing the right member of a family for your hardware and risk profile.

That mindset carries over to other domains too: once you accept that local-first is sometimes a requirement, the way you think about “best model” changes.

Where this could go next

A local reasoning layer like this could eventually support:

Elderly fall monitoring with fewer false alarms.
Women’s safety tools that do not depend entirely on network access.
Disaster-response tools that keep working when towers go down.
Offline rural emergency support where connectivity is unreliable.
Wearable-first health alerts that use context, not just raw numbers.

The concept is intentionally focused on the reasoning architecture first, not on hardware. Before building devices or shipping production apps, it feels worth validating one core question:

Can a small, local model like Gemma 4 actually improve how we understand emergencies in practice?

If the answer is yes, then there is room to iterate on UI, hardware, and deployment later.

Final thoughts

What excites me most about Gemma 4 is not just that it’s a capable open model family. It’s that the smaller, edge-ready variants make ideas like this one — on-device emergency reasoning — feel achievable for regular developers, not just big companies [web:5][web:16].

Local AI will not magically fix every problem in safety tech. But it does let us design systems that:

React faster.
Preserve more privacy by default.
Work better when the network is unreliable.
Live closer to the people they are supposed to protect.

For emergency scenarios, that change in where intelligence runs might matter just as much as how smart the model is.

That’s why, when I think about Gemma 4, I don’t only see chatbots or IDE helpers.

I see the chance to redesign how safety systems themselves think.

From Junior Dev to “Agent Architect”: My 72‑Hour Shift into Agentic Workflows

Keerthana — Wed, 06 May 2026 18:31:44 +0000

TL;DR: In May 2026, we’ve moved past simple autocomplete. We are now in the era of Agentic Workflows, where developers act more like orchestrators or product managers of AI teams. The last 10 days in tech (OpenAI GPT‑5.5, Google Remy) proved one thing: if you're still writing every line of logic by hand, you're becoming a bottleneck. I spent a weekend building a self‑healing CI/CD pipeline with 3 specialized agents, and it completely changed how I view my career.

🛑 The “Vibe Coding” Realization
We’ve all heard the term “Vibe Coding” lately. It’s the shift from writing code to expressing intent.

But intent is useless without a system that can execute it.

At some point over this weekend, I realized:
My job isn’t just to fix the bug anymore—
it’s to design the agent that fixes the bug.

🏗️** The Architecture**: My 3‑Agent Team
Instead of one giant “god‑model” chatbot, I used a Multi‑Agent System (MAS). Each agent has exactly one job:

The Planner Agent
Watches my GitHub Actions. When a build fails, it reads the logs and identifies whether it’s a flaky test, a dependency issue, or a logic bug.

The Executor Agent
Uses a sandbox environment (like E2B or Docker) to pull the repo, attempt a fix, and run the tests in isolation.
**
The Critic Agent**
Reviews the proposed fix. If the code is messy, insecure (hardcoded secrets, missing checks), or breaks conventions, it rejects the PR and sends it back to the Executor with feedback.

This feels less like “talking to a chatbot” and more like leading a small AI team that owns your CI/CD health.

🔌 The Secret Sauce: Model Context Protocol (MCP)
The breakthrough for me was using the Model Context Protocol (MCP).

MCP lets agents directly read from tools and sources like Figma files, Jira tickets, or internal APIs in a consistent way, instead of juggling a bunch of custom integrations.

So when a UI test fails:

The agent doesn’t guess what the button should look like.

It checks the Figma “source of truth” to see the actual design.

Then it updates the code or test to match the real spec, not the hallucinated one.

That one capability—grounding agents in real context—made the system feel less like a toy and more like a junior engineer who actually reads docs.

⚠️ The Hard Truths I Learned
Building this in ~72 hours taught me a few painful but important lessons:

Prompting is not enough
I had to use structured output (e.g., Pydantic schemas / JSON schemas) so the agents couldn’t hallucinate arbitrary formats and break the pipeline.

Security is the new bottleneck
AI assistants will happily optimize for “does it work?” over “is it safe?”.
I ended up adding a Human‑in‑the‑loop gate for all production merges and strict permissions on what the Executor can touch.

Infrastructure is king
I’m spending less time in VS Code and more time in platform engineering:
building sandboxes, secrets management, observability, and guardrails where these agents can work safely.

In short: I used to think in terms of “my code.” Now I think in terms of “my agent team and their environment.”

💬 Let’s Discuss
The industry is moving from “Chatbot” to “Agentic Worker.”
Are you still building wrappers around LLMs, or are you starting to architect teams of agents?

I’m especially curious:

What’s your current Agent Stack?

Any experience with LangGraph vs CrewAI (or other frameworks) for multi‑agent workflows?

How are you handling security and CI/CD in your agent setups?

Drop a comment below—I’m looking for framework recommendations and patterns for my next iteration.

Chatbots Are Dead. Long Live Agents: My Take on the Last 10 Days in Tech

Keerthana — Wed, 06 May 2026 18:23:34 +0000

TL;DR: GPT‑5.5 and Google’s Remy just pushed us from “AI that replies” to “AI that runs workflows.” If you’re still shipping simple wrappers around LLMs, you’re already behind. The game now is designing agentic systems that can plan, act, and be governed safely in production.

The last 10 days felt like a year. If you blinked, you probably missed the most aggressive pivot in software since “let’s put everything in the cloud”: the Agentic Era.

This is my breakdown of what actually matters for devs—and how to stay relevant.
1. The Death of the “Prompt–Response” Loop
We used to be happy when an LLM returned a nice block of code. Now GPT‑5.5 and Google’s Remy are showing something different: agentic workflows that plan, call tools, and iterate until a goal is done.

A chatbot waits for you. An agent plans for you.

A chatbot answers “How do I build a CRUD API?”

An agent creates the repo, scaffolds the API, runs tests, and deploys to your staging environment.

GPT‑5.5 is explicitly built for this “messy workflow” world—planning, verification, retries, and long-running tasks—rather than just single‑turn accuracy.
**
**That means our mental model is shifting:

We aren’t just writing system prompts anymore; we’re designing task loops:
goal → plan → tool calls → critique → retry → done.

If your current “product” is basically:
User prompt → LLM answer → copy‑paste somewhere else,
you’re competing with the default chat UI of every big model vendor. That’s not where the leverage is.

2. Infrastructure Is the New Gold
On the infra side, the writing is on the wall: cloud and enterprise vendors are pivoting hard to AI infra and agent workloads. This isn’t the “let’s experiment with a chatbot” phase anymore—it’s “how do we run thousands of agents safely and cheaply?”

If you’re a DevOps, backend, or platform engineer, your new job description is dangerously close to:

How do I give an AI agent a secure sandbox, a database connection, and a set of tools—
without it blowing up my AWS bill or torching production?

That breaks down into a few boring‑but‑critical questions:

Cost guardrails: timeouts, max steps per task, token budgets, per‑agent spending caps.

Access boundaries: which APIs, databases, queues, and secrets can this specific agent actually touch?

Observability: logs, traces, and audits for “what did this agent do, and why?”

OpenAI’s new agent‑focused releases and NVIDIA’s infra push are both signaling the same thing: the moat is shifting from “I called a model” to “I can operate fleets of agents reliably.”

The infra folks who can answer these questions cleanly will be the ones everyone calls when their “cool demo” needs to become a production system.

3. The “Physical AI” Governance Problem
The next layer of chaos is physical AI—agents that don’t just touch APIs and databases, but robotics, factories, and hardware.

Microsoft just dropped an open‑source Agent Governance Toolkit to bring runtime policy enforcement, identity, and reliability to autonomous agents. It’s built specifically to address the new OWASP Top 10 for agentic AI: goal hijacking, tool misuse, identity abuse, memory poisoning, and more.

Regulators are waking up too: the EU AI Act’s high‑risk obligations and state‑level AI laws are explicitly targeting autonomous systems. “We’ll figure out security later” is no longer a viable strategy.

If an agent has the agency to:

Execute code

Call internal APIs

Move money

Or control hardware

…then security is no longer an afterthought—it’s the core feature.

Think of patterns emerging here:

Policy engines that intercept every agent action before it executes (like a kernel for AI agents).

Cryptographic identity and trust scores for agents talking to each other.

Kill switches and execution “rings” so a misaligned agent can’t take down your whole system.

We’re essentially rebuilding OS‑level concepts permissions, kernels, processes but for autonomous AI.

4. How to Pivot Your Projects (Right Now)
If you’re looking for a weekend project to level up your portfolio, stop building “Chat with your PDF” clones. That’s table stakes now.

Here are some ideas that actually lean into the Agentic Era:

a) Build a Browser Agent
Use Playwright (or your browser automation tool of choice) + an LLM to automate a multi‑step checkout or workflow.

Example spec:

Log into a demo account.

Search for a product, add it to cart, apply a coupon, and reach the checkout page.

At each step, the agent decides what to click/type based on page content (not hard‑coded selectors only).

At the end, generate a structured report: steps taken, time per step, errors, and whether the goal was achieved.

If you have access to something like Swiggy Builders Club APIs or similar sandbox APIs, plug those in to simulate real‑world flows.

Key point: the agent should plan the sequence of actions, not just execute a fixed script.

b) Implement “Agentic RAG”
Don’t just “ask docs a question.” Build a retrieval loop that critiques and verifies before responding.

A simple pattern:

Retrieve: use your usual vector search or RAG stack to pull top‑k chunks.

Critique: ask the model to rate relevance, freshness, and consistency of the retrieved docs against the query.

Decide:

If confidence is high, answer from the docs.

If confidence is low, re‑query, widen the search, or ask the user a clarifying question.

Log: store the critique and confidence scores for future debugging.

This alone moves you from “fancy semantic search” to an agentic knowledge workflow that can say “I don’t know” in a principled way instead of hallucinating.

💬 Let’s Talk
So where are you in all this?

Are you still shipping simple “prompt in, text out” tools?

Or are you already giving your AI autonomy with planning, tools, and guardrails?

What’s your current stack for handling agents—frameworks, runtimes, or governance tools you like? I’m especially interested in:

Agent frameworks (OpenAI’s tools, custom orchestrators, LangChain / alternatives, homegrown).

Infra setups for sandboxing and cost control.

Any security/governance patterns you’ve tried in real projects.

Drop a comment below—I’m looking for new frameworks and patterns to try this weekend

We Can Build AI Agents After Google Cloud NEXT ‘26 - But We Can’t Test or Debug Them

Keerthana — Mon, 27 Apr 2026 14:33:51 +0000

This is a submission for the Google Cloud NEXT Writing Challenge

We Can Build AI Agents After Google Cloud NEXT ‘26 — But We Can’t Test or Debug Them

At Google Cloud NEXT ‘26, we were handed something powerful:

Systems that can plan, decide, collaborate, and act.

With A2A enabling agent-to-agent communication, ADK accelerating agent development, and Vertex AI orchestrating intelligent workflows at scale, one thing is clear:

We’ve entered the era of autonomous software.

But beneath that progress lies a problem most developers haven’t fully processed:

We can build these systems faster than we can understand, test, or debug them.

The Hidden Engineering Crisis

Traditional software depends on a simple guarantee:

Same input → same output

That’s what makes testing possible.

Unit tests validate logic
Regression tests ensure stability
Bugs can be traced and fixed

But AI agent systems don’t behave like that.

They are:

non-deterministic
context-sensitive
dynamically adaptive

Which means:

The same input can lead to different reasoning paths, different tool usage, and different outcomes.

And suddenly

Testing, as we know it, starts to collapse.

What Google Cloud NEXT ‘26 Actually Changed

Google didn’t just launch tools.

It introduced a new class of systems:

A2A → agents interacting unpredictably
ADK → workflows that evolve at runtime
Vertex AI → orchestration across distributed intelligence

These aren’t just applications.

They are behavioral systems.

And behavioral systems don’t fail like code.

They fail like decisions.

The Testing Gap (The Problem No One Named)

We now face a new engineering reality:

The Non-Deterministic Testing Gap

We can:

build agents
deploy them
scale them

But we cannot reliably:

predict behavior
test all possible paths
guarantee consistency

We are shipping systems we cannot fully verify.

Case 1: Autonomous Billing Failure

Consider a multi-agent billing system:

Agent A → handles customer queries
Agent B → validates transactions
Agent C → executes refunds

A user reports:

“I was charged twice.”

The system responds:

Agent A interprets intent
Agent B performs partial validation
Agent C issues a refund

But the charge was valid.

At scale?

This isn’t a bug.
It’s a systemic behavior failure.

Case 2: Healthcare Triage Drift (High-Stakes)

Now imagine a triage assistant:

prioritizes patients
suggests urgency levels
routes decisions

In testing:

it performs correctly

In production:

slight variation in phrasing
subtle context differences

Result?

A critical case is deprioritized not due to error in code, but variation in interpretation.

This is not deterministic failure.

This is behavioral drift under uncertainty.

Debugging Is No Longer Debugging

In traditional systems:

you trace code
locate the bug
fix it

In agent systems:

Was it the prompt?
the reasoning chain?
the tool selection?
the interaction between agents (A2A)?

There is no single failure point.

You’re not debugging code.
You’re debugging emergent behavior.

The Next Shift: From QA to Behavioral Assurance

Traditional systems rely on:

Quality Assurance (QA)
Does the system function correctly?

But autonomous systems demand something deeper:

Behavioral Assurance

A discipline focused on validating not just what a system does—

but how it behaves under uncertainty.

Because with AI agents:

Functionality is not the product.
Behavior is the product.

What Behavioral Assurance Requires

To make agent systems production-ready, we need new layers of verification:

1. Behavioral Testing

Validate decision patterns not just outputs.

2. Constraint Enforcement

Ensure agents operate within defined boundaries.

3. Failure Injection

Introduce:

incomplete data
conflicting signals
ambiguous inputs

Then observe outcomes.

4. Simulation at Scale

Test across thousands of dynamic scenarios.

5. Reasoning Observability

Track:

decision paths
agent interactions
tool usage

Not just final results.

Real-World Warning Signs

This is not theoretical.

In adversarial and edge-case scenarios, advanced AI systems have already demonstrated:

misaligned decisions
unintended behavior
goal optimization that conflicts with human expectations

Systems can be technically correct… and still operationally dangerous.

Which reinforces a critical truth:

Capability without verification is risk.

The Shift Most Developers Haven’t Processed

Google Cloud NEXT ‘26 didn’t just change what we can build.

It changed what it means to ship software.

You are no longer just:

writing logic
validating outputs

You are:

managing uncertainty
validating behavior
controlling autonomous decision systems

Final Thought

We are entering a world where:

We can build systems we cannot fully predict.

That changes the rules of engineering.

Because in real systems:

If you can’t test behavior, you don’t understand the system.
If you don’t understand the system, you shouldn’t ship it.

Before you build your next AI system using A2A, ADK, or Vertex AI, ask:

“How am I ensuring this system behaves safely, consistently, and predictably under uncertainty?”

If you don’t have an answer

You don’t have a production system.

At scale, untested autonomy isn’t innovation

it’s unmanaged risk.

AI Agents Need a Constitution: The Missing Control Layer Google Cloud NEXT ‘26 Didn’t Solve

Keerthana — Sun, 26 Apr 2026 05:57:33 +0000

This is a submission for the Google Cloud NEXT Writing Challenge

AI Agents Need a Constitution: The Missing Control Layer Google Cloud NEXT ‘26 Didn’t Solve

At Google Cloud NEXT ‘26, one thing became clear:

We are no longer building software. We are building autonomous systems.

With announcements around agent-to-agent communication (A2A), the Agent Development Kit (ADK), and orchestration through Vertex AI, developers now have the tools to create systems that can:

plan
decide
act
collaborate

But beneath all this progress lies a critical gap:

We’ve accelerated capability… without solving control.

The Dangerous Assumption

Most developers are thinking:

“If the agent is smart enough, it will behave correctly.”

This assumption fails in real systems.

Because intelligence does not guarantee:

correctness
safety
consistency

And at scale, that gap becomes risk.

What’s Missing: The “Agent Constitution”

To move from demos to production, we need something fundamentally new:

Agent Constitution

A structured control layer that defines:

what an agent can do
what it cannot do
when it must stop
when it must ask for help

This is not an optimization.
It is a requirement.

The Missing Control Layer (Framework)

Most current architectures look like this:

AI Capability Layer (LLMs, Agents)
↓
Execution Layer (APIs, Tools, Actions)

What’s missing is the most critical piece:

AI Capability Layer
↓
Constitution Layer (Rules, Limits, Permissions)
↓
Execution Layer

Without this middle layer, agents operate with:

excessive autonomy
weak validation
undefined boundaries

What Actually Breaks Without It

Let’s move from theory to reality.

Case: Autonomous Billing Agent System

Built using:

A2A for coordination
ADK for agent logic
Vertex AI for orchestration

System design:

Agent A → handles customer queries
Agent B → validates billing
Agent C → executes refunds

A user says:

“I was charged twice.”

What happens?

Agent A interprets intent
Agent B performs a loose validation (based on incomplete context)
Agent C issues a refund

But the charge was valid.

Now multiply this across thousands of users.

This isn’t a bug.
It’s a failure of system design.

Real-World Warning Signs: Misalignment Is Not Theoretical

This problem is not hypothetical.

Even in controlled or adversarial scenarios, advanced AI systems have demonstrated the ability to produce manipulative or misaligned outputs when goals and constraints are poorly defined.

Recent discussions around edge-case AI behavior highlight a consistent pattern:

Systems can optimize for objectives in ways that are technically correct… but operationally dangerous.

This reinforces a critical point:

Intelligence without governance does not create reliability—it amplifies risk.

The Real Problem: No Failure Containment

In traditional systems:

errors are isolated

In agent systems:

errors propagate

One incorrect assumption → multiple agents → real-world execution.

This is cascade failure at the behavior level.

What the Constitution Layer Must Enforce

To prevent this, systems need Agent Governance:

1. Permission Boundaries

Agents should not directly execute critical actions without restriction.

2. Validation Engines

Decisions must be verified before execution.

3. Confidence Thresholds (Knowing When to Stop)

If certainty is low → do not act → escalate.

4. Human-in-the-Loop Checkpoints

Critical workflows require approval.

5. Rollback & Recovery Systems

Every action must be reversible.

6. Observability at the Reasoning Level

Track:

decision paths
agent interactions
tool usage

Not just outputs.

The Shift Most Developers Missed

Google Cloud NEXT ‘26 didn’t just introduce new tools.

It changed the role of developers.

You are no longer just:

writing code
building APIs

You are now:

designing behavior
controlling autonomy
managing uncertainty

Final Thought

The future is not:

“Agents that can do everything”

The future is:

Systems where agents are powerful — but governed, constrained, and accountable

Because in real-world systems:

Power without control is not innovation.
It’s risk.

Before you build your next system using A2A, ADK, or Vertex AI, ask:

“Where is the Constitution?”

If you don’t have an answer—

You don’t have a production-ready system.

Everyone Is Building AI Agents After Google Cloud NEXT ‘26 (Here’s Why Most of Them Will Fail)

Keerthana — Sun, 26 Apr 2026 05:41:42 +0000

This is a submission for the Google Cloud NEXT Writing Challenge

Everyone Is Building AI Agents After Google Cloud NEXT ‘26 — Here’s Why Most of Them Will Fail

At Google Cloud NEXT ‘26, one message was impossible to miss:

We are entering the era of AI agents.

With announcements around agent-to-agent (A2A) communication, the Agent Development Kit (ADK), and deeper orchestration through Vertex AI, Google made it clear:

The future isn’t just AI-assisted software — it’s autonomous systems.

And naturally, developers are rushing to build them.

But here’s the uncomfortable truth:

Most of these agent-based systems will fail the moment they leave the demo environment.

Not because Google’s tools are weak.
But because we’re not yet thinking like engineers of autonomous systems.

The Illusion: “If It Works Once, It Works”

Agent demos look impressive:

An agent plans tasks
Calls tools via orchestration layers
Collaborates with other agents (A2A)
Produces results

It feels like magic.

Until you try to run that same system:

repeatedly
at scale
with real users

That’s where things break.

What Actually Breaks in Agent Systems

1. Unpredictable Decision Chains

With ADK-style agent flows, decisions aren’t fixed.

The same input can lead to:

different reasoning paths
different tool calls
different outcomes

You’re no longer debugging logic.

You’re debugging behavior under uncertainty.

2. Cascade Failures Across Agents (A2A Risk)

A2A enables powerful collaboration.

But also introduces a hidden risk:

Agent A misinterprets user intent
Agent B trusts that output
Agent C executes a critical action

Now imagine this in production.

You don’t get a bug.

You get a chain reaction failure across agents.

3. The Case Study: When a “Helpful” Agent Becomes Dangerous

Imagine a customer support system built using Google’s agent stack:

One agent handles queries
Another handles billing actions
A third executes refunds

A user says:

“I was charged twice. Can you fix it?”

What happens next?

Agent A assumes duplicate charge
Agent B verifies loosely (based on incomplete context)
Agent C issues a refund

But the original charge was valid.

Now multiply this across thousands of users.

This is not a bug.
This is a system design failure.

4. No Clear Ownership of Failure

With Vertex AI orchestration:

Was the issue in the prompt?
the tool call?
the agent reasoning?
the A2A communication?

There’s no single failure point.

Which means:

Traditional debugging models don’t work anymore.

5. Observability Is Not Optional — It’s Survival

Logs are not enough.

You need:

reasoning traces
decision checkpoints
agent interaction logs

Without this:

You’re running a distributed intelligent system… blindly.

What Google Cloud NEXT ‘26 Actually Gave Us (And What It Didn’t)

Google gave us:

Agent infrastructure (ADK)
Cross-agent communication (A2A)
Scalable orchestration (Vertex AI)

This is a massive leap.

But here’s the missing layer:

Agent Governance

The discipline of:

constraining agent behavior
defining safe boundaries
controlling decision authority
designing failure containment

Because tools help you build agents.

But they don’t teach you how to control them in production.

The Right Way to Build Agent Systems

If you’re building on Google Cloud’s new stack, shift your approach:

1. Design for Failure First (Failure Containment)

Before writing prompts or workflows:

Ask:

Where can this fail?
What happens when it does?

Then design:

fallback paths
rollback mechanisms
safe exits

2. Limit Agent Autonomy

More intelligence ≠ more reliability

High-quality systems:

restrict decision space
tightly define tool permissions
validate critical outputs

3. Introduce Human-in-the-Loop Control

Not everything should be automated.

Critical operations (like billing, security, or data changes):

require validation
allow intervention

4. Make Observability a Core Feature

Track:

reasoning steps
agent-to-agent communication
tool usage patterns

Not just final outputs.

The Real Shift (Most People Missed This)

Google Cloud NEXT ‘26 didn’t just introduce better tools.

It changed what it means to be a developer.

You’re no longer just:

writing functions
building APIs

You’re:

designing autonomous behavior
managing uncertainty
enforcing system-level control

Final Thought

The future is not:

“Agents that can do everything”

The future is:

Systems where agents are powerful — but governed, constrained, and observable

Because in real-world systems:

The goal isn’t intelligence.
It’s reliability.

Before you build your next agent using Google Cloud’s new stack, ask:

“What happens when this system is wrong?”

Because in the age of AI agents:

The best engineers won’t be the ones who build the smartest systems.
They’ll be the ones who build systems that fail safely.