DEV Community: PRASAD TILLOO

L.E.N.S. — A private photography coach for blind and low-vision artisans

PRASAD TILLOO — Fri, 22 May 2026 22:19:03 +0000

L.E.N.S. (Local Edge Native Studio) is a voice-guided photography coach that runs Gemma 4 E4B locally through Ollama — so a maker can verify and improve product photos before listing, without sending images to the cloud and without asking someone sighted to “just check this one.”

Gemma 4’s native multimodal vision is the engine: each coaching turn sends a real product photo (base64 in the Ollama chat) and gets back structured JSON the app validates before speaking.

🔗 Try it (no install): lens-app-gemma4.vercel.app

📹 Demo video: YouTube walkthrough

💻 Source: github.com/prasadt1/photography-coach-gemma4 (Apache 2.0)

What I Built

I built L.E.N.S. for someone like Mohan — a low-vision artisan who hand-knits sweaters to sell online. He can judge the knit by touch: tension, pattern, finish. What he cannot reliably judge is the photograph of the piece. Is it in focus? Is the light flat? Is the sweater cropped awkwardly or lost against the background? On a marketplace like Etsy, the photo is the product; a weak photo quietly costs the sale. Until now, that step has meant borrowing someone else’s eyes.

L.E.N.S. closes that gap.

The maker points their camera and takes a photo.
Gemma 4 E4B — on their own machine, via Ollama — assesses framing, lighting, focus, and composition from the image itself (multimodal input, not a text-only description).
L.E.N.S. speaks back one specific, actionable fix: not “this photo is bad,” but “move back about six inches” or “the light is behind the sweater — turn toward the window.”
They take a second photo; L.E.N.S. compares the two images out loud and says which is stronger and why.
It drafts copy-ready listing text — title, description, and alt-text — ready to paste into their store.

It is voice-first by design, not a visual UI with audio bolted on. I built and tested the flow with a screen reader on and the screen off, because that is how it will actually be used. Structured JSON is an accessibility choice too: the client validates a strict schema and surfaces discrete, ordered points, so coaching stays one fix at a time instead of a wall of feedback the user cannot skim.

I designed for the hardest case — a blind maker, fully offline — and by the curb-cut effect, the same coaching helps any maker without a photographer or a reliable connection.

Alt: infographic of five steps — artisan capture, on-device analysis, voice feedback loop, compare and iterate, then listing copy for Etsy or Shopify.

Demo

Full walkthrough: first photo, spoken coaching, stronger retake, comparison, generated listing.

Try it live

Link	What you get
lens-app-gemma4.vercel.app	Judge / no-install demo. Sample photos play back real E4B runs recorded locally; uploads use Gemma 4 31B on Ollama Cloud so reviewers can try a photo without pulling a model.
photography-coach-gemma4.vercel.app	Real product path for the submission video — E4B on your Mac via Ollama (same Wi‑Fi PWA or tunnel). Photos do not go to Ollama Cloud on this deploy.

No account. No tracking. Copy-ready output only — L.E.N.S. does not auto-publish to Etsy or Shopify.

Code

Source, README, architecture notes, and spike write-ups:

prasadt1 / photography-coach-gemma4

📷 L.E.N.S. — Local Edge Native Studio

The one step between a finished piece and a sale shouldn't depend on someone else's eyes.

A private, voice-guided photography coach for blind and low-vision artisans.

🔗 Live demos: Judge try-it (Ollama Cloud 31B) · Real product / video (local E4B) · Demo video · Built for the Gemma 4 Good Hackathon

Tracks: Digital Equity & Inclusivity · Ollama

What L.E.N.S. is

Mohan has low vision. He hand-knits sweaters and can finish a flawless cable pattern by touch. He can shape, price, and list a piece on his own — until the one step he cannot finish alone: photographing it well enough to sell online.

L.E.N.S. closes that gap. It is a voice-guided photography coach that helps blind and low-vision artisans verify and improve their product photos before listing their work. It runs Gemma 4 through Ollama, describes the photo in plain…

View on GitHub

Stack: React 19 + TypeScript PWA, optional Electron desktop build, Ollama for local multimodal inference, Web Speech API for coach voice output.

Repo highlights:

Strict JSON contract — one schema drives description, colour check, single fix, alt-text, and listing copy.
Three honestly labelled inference modes (see below).
Spike docs: Spike 1 — E4B via Ollama, quantization study, LiteRT iOS spike.

This is original work I built for accessibility-first product photography coaching; the repo is not a repackaged template.

How I Used Gemma 4

Gemma 4 is the core of L.E.N.S.: multimodal photo assessment and coaching generation. Every model and runtime choice followed from local-first privacy and voice-loop latency.

Why Gemma 4 E4B (and what I ruled out)

The Gemma 4 family spans small edge models, 31B Dense, and 26B MoE. For this project:

Variant	Role in my decision
E2B (~2B)	Too small for consistent visual judgment on real product photos.
E4B (~4B)	Shipped. Small enough for consumer hardware + Ollama offline; capable enough for trustworthy multimodal coaching.
31B Dense	Ruled out for the product — too heavy for typical laptops; breaks the “photo never leaves the machine” promise. Used only for judge demo uploads on Ollama Cloud.
26B MoE	Strong for throughput/reasoning, but overkill for a single-photo voice loop on modest hardware; E4B matched the edge + multimodal product path better.

E4B is the deliberate middle: the trade-off is the project.

What E4B unlocked for this project

Multimodal vision on-device — real product photos in, structured coaching out (framing, light, focus, colour), not text-only guesses.
Offline independence — the product path never requires sending photos to a remote API.
Usable voice-loop latency — ~4B + Q4_K_M + streaming TTS ≈ ~20s warm (down from ~40s early on).
Strict JSON coaching — one spatial fix, two-photo compare, listing copy — all from schemas Ollama enforces at generation time.
Honest dual deploy — E4B for the real maker story; 31B only where judges need a zero-install upload path.

Multimodal + structured output (how it’s wired)

Each analyze call sends the image in Ollama’s messages[].images[] array and asks Gemma 4 E4B for JSON via Ollama’s format field (JSON Schema). The client validates before TTS speaks:

// services/ollamaService.ts — simplified
const messages = [
  { role: 'system', content: buildSystemPrompt(/* artisan coaching */) },
  { role: 'user', content: userPrompt, images: [base64ProductPhoto] },
];

await fetch(`${OLLAMA_BASE}/api/chat`, {
  method: 'POST',
  body: JSON.stringify({
    model: 'gemma4:e4b',
    messages,
    format: ARTISAN_V3_OUTPUT_SCHEMA,  // Ollama enforces JSON shape
    stream: true,                       // TTS starts before generation ends
    options: { num_predict: cappedTokens },
    keep_alive: '30m',
  }),
});

The artisan schema drives fields like scene description, one priorityFix, alt-text, and listing title/description — so VoiceOver/TalkBack and the coach voice never drown the maker in a paragraph of fixes.

Runtime: Ollama

I spiked Cactus and llama.cpp as well. Ollama won for the cleanest local multimodal serving and the simplest path to multiple inference modes without rebuilding the pipeline each time.

Quantization: Q4_K_M

On modest hardware, Q4_K_M keeps E4B runnable without meaningfully hurting visual assessment. Lighter quants started to cost coaching quality; heavier ones were not worth the memory for this use case.

Latency and voice

Early warm inference was ~40s — too long for a spoken coaching loop. Prompt tuning, a token cap, a warm-up call on startup, and streaming brought warm runs to roughly 20s.

Three honest inference modes

Mode	Model	Network
Local (product)	Gemma 4 E4B via Ollama on the maker’s machine	Fully offline
Judge demo uploads	Gemma 4 31B on Ollama Cloud	Requires connection
Demo mode	Playback of real recorded E4B responses	None

I also spiked LiteRT for true on-device iOS inference (~25 tok/s in Google’s reference app). That is Phase 2 — documented as roadmap, not claimed as shipped. Today, iOS is covered by the installable PWA talking to Ollama on the Mac (same Wi‑Fi or tunnel).

Why local Gemma matters

Privacy here is not a bullet point — it is the mechanism of independence. A cloud coach swaps one dependency for another: instead of a sighted helper, you need connectivity, an account, and a server that receives your product photos. A capable Gemma 4 model on the maker’s own hardware is what makes “I can list this myself” real.

Accessibility (why the UX matches the model story)

Voice-first with an equivalent labelled control for every voice action.
Screen reader: landmarks, live regions, managed focus; coach TTS works alongside VoiceOver/TalkBack, not instead of it.
One fix at a time — same discipline in prompt design and UI.
Anti-hallucination — states uncertainty when the image does not support a claim.
Multilingual coaching paths in the prompt layer.

What’s next

Native on-device iOS via LiteRT (spike done; integration is post-hackathon).
More languages and tighter cold-start latency.
Deeper maker workflows (batch listing prep) — still local-first.

I built a closed-loop WhatsApp advisor for 100M+ Indian farmers — fully serverless on AWS

PRASAD TILLOO — Sun, 19 Apr 2026 12:49:39 +0000

The problem

Indian smallholder farmers don't lose crops because advice doesn't exist. They lose them because advice arrives after the spray window closes.

What I built

AgriNexus AI — a WhatsApp advisor that follows up until the farmer confirms "हो गया" (done).

The architecture

Three decisions worth sharing:

1. EventBridge Scheduler > Step Functions Wait States
Keeping a state machine open for 48h is expensive at scale.
Step Functions completes in seconds; EventBridge Scheduler
creates one-shot targets at T+24h / T+48h. DynamoDB Streams
→ ResponseDetector Lambda cancels schedules on "done."
Linear cost scaling instead of exponential state transitions.

2. S3 Vectors > OpenSearch Serverless
OpenSearch's always-on OCU costs dominated early bills
regardless of query volume. S3 Vectors eliminated that.
Modeled cost: ~$0.54/farmer/year at 10K scale.

3. Bedrock RAG with visible citations
Retrieve-and-Generate API + Claude. Knowledge base: ICAR,
FAO, NFSM PDFs. Every response has a source link visible to the farmer. Trust needs traceability.

The stack

API Gateway + WAF → Lambda → SQS FIFO → Bedrock (Claude +
S3 Vectors KB) → Transcribe → Polly → EventBridge Scheduler
→ DynamoDB Streams

Full article with architecture diagrams and ADRs: [article link]
Repo: https://github.com/prasadt1/agrinexus-ai

One ask

This is an AWS Builder AIdeas 2025 finalist. Community voting runs April 18–23 PT. If the architecture or the mission resonates:
click the 👍 like button at the top of this AWS Builder article here 👉 https://builder.aws.com/content/3C8hBRTcsRuQrHzE3Pq243yhXTF/aideas-finalist-agrinexus-ai
(One-time ~30-sec sign-up with Amazon Builder.)

Happy to answer technical questions in the comments.

How I Built a Behavioral Nudge Engine for Farmers on Serverless AWS

PRASAD TILLOO — Sat, 14 Mar 2026 09:45:46 +0000

Read the full article on AWS Builder Center

Every agricultural AI chatbot I found had the same blind spot: they stop at information delivery.

They tell the farmer what to do. None of them know whether the farmer actually did it.

That gap — between knowing and doing — is where crop loss happens. A cotton farmer in Maharashtra (India) who misses a 3-day insecticide / pesticide spray window can lose 30-50% of his crop. Not because the advice doesn't exist in FAO and ICAR research manuals — but because it never reaches him in time, in his language, in a format he can act on.

So I built AgriNexus AI — a WhatsApp-based agricultural advisor that goes beyond Q&A. It proactively sends weather-timed nudges and follows up until the farmer confirms: "ho gaya" (done).

Built for the AWS 10,000 AIdeas competition. Full article with demo videos on AWS Builder Center.

The Architecture: Four Flows, One WhatsApp Number

AgriNexus supports four interaction modes — all through the WhatsApp number the farmer already has:

1. Text Chat — Farmer types in Hindi, Marathi, Telugu, or English. Amazon Bedrock RAG searches FAO and ICAR manuals, Claude 3 Sonnet generates a response in the farmer's language with source citations. 3-5 seconds end-to-end.

2. Voice Pipeline — Farmer sends a voice note describing crop symptoms. The pipeline: OGG audio from WhatsApp → S3 → Amazon Transcribe (dialect-matched: hi-IN, mr-IN, te-IN, en-IN) → transcript re-queued to the same RAG pipeline → Claude 3 Sonnet response → Amazon Polly neural TTS → audio + text reply back. The farmer hears the answer while walking the field.

3. Crop Vision — Farmer photographs a diseased leaf. Claude 3 Sonnet Vision identifies the pest/disease, returns a structured diagnosis: pest name, confidence score, recommended pesticide, dosage, application timing, and safety warnings. All in the farmer's language.

4. The Nudge Loop — This is what makes AgriNexus different from a chatbot. More on this below.

The Behavioral Nudge Engine: From Knowledge to Action

This is the core innovation. Every other agricultural AI tool I studied — Farmer.Chat, iSDA, AgriChat.AI — stops at information delivery. AgriNexus closes the loop.

The nudge engine doesn't wait for the farmer to ask. When weather conditions are right for spraying, it initiates contact:

EventBridge Scheduler triggers WeatherPoller Lambda daily at 7 AM
WeatherPoller checks conditions: wind < 15 km/h, no rain forecast
If conditions pass → Step Functions nudge workflow starts
Workflow queries DynamoDB for eligible farmers by region + crop + consent (with duplicate prevention — no farmer gets two nudges for the same activity on the same day)
NudgeSender Lambda fires a crop-specific WhatsApp message with interactive buttons

The farmer sees (in Hindi):

"Today is good weather for fungicide spray on wheat. Wind: 8 km/h."
[Done] [Not now]

If they tap "ho gaya" (done) — DynamoDB Streams triggers ResponseDetector Lambda, which detects the DONE keyword across all supported languages (Hindi/Marathi/Telugu/English), cancels all pending reminders, and marks the nudge as COMPLETED.

If they don't respond — reminders fire at T+24h and T+48h.

The Architecture Decision That Saved Us: EventBridge Scheduler vs Step Functions Wait States

This was the most interesting trade-off in the build.

The naive approach: keep a Step Functions execution alive for 72 hours to handle T+24h and T+48h reminders. At scale, that's expensive — state transition costs accumulate, and you're paying for executions that idle for days.

The AgriNexus approach: Step Functions workflow completes in under 5 seconds. It sends the nudge and registers EventBridge Scheduler targets for each reminder. When the farmer responds, ResponseDetector deletes those schedules instantly. Short execution + scheduled targets = clean, cheap, scalable.

This pattern is reusable for any system that needs delayed follow-ups with early cancellation — appointment reminders, SLA escalations, onboarding sequences.

The Nudge Loop Architecture

Here's the full flow diagram — 12 steps from weather check to loop closure:

Key components:

EventBridge Scheduler — 7 AM daily trigger + T+24h/T+48h reminder targets
Step Functions — orchestrates the nudge workflow (completes in seconds, not days)
DynamoDB Streams — real-time DONE keyword detection without polling
ResponseDetector Lambda — multi-language keyword matching + schedule cleanup
Single-Table DynamoDB — user profiles, message idempotency, and nudge tracking in one table with composite keys

Five Architecture Decisions in 30 Seconds

Single-Table DynamoDB — composite keys (PK=USER#<phone>, SK=PROFILE|MSG#<ts>|NUDGE#<id>) with GSI for region-based targeting. Access patterns defined before schema.
EventBridge Scheduler over Step Functions Wait States — short-lived executions + scheduled targets. Event-driven cleanup on DONE detection.
DynamoDB Streams for real-time response detection — no polling. Stream triggers Lambda on every write, checks for DONE keywords across 4 languages.
Claude 3 Sonnet for code-switching — farmers mix Hindi and English ("Mere cotton mein pests hain"). Don't over-engineer language detection. Let the model handle what it's trained for.
Batch transcription over streaming (MVP) — 20-34 second latency is acceptable for async WhatsApp. Ship first, optimize with real user feedback. Transcribe Streaming (<2s) is the post-MVP path.

The Stack

Layer	Technology
Messaging	WhatsApp Business API + API Gateway
Intelligence	Amazon Bedrock (Claude 3 Sonnet) + RAG
Knowledge Base	Bedrock Knowledge Bases + OpenSearch Serverless
Voice In	Amazon Transcribe (hi-IN, mr-IN, te-IN, en-IN)
Voice Out	Amazon Polly (Aditi/Kajal neural voices)
Vision	Claude 3 Sonnet multimodal
Nudge Engine	EventBridge Scheduler + Step Functions + DynamoDB Streams
Storage	DynamoDB single-table + S3
Orchestration	AWS SAM, Lambda (Python 3.11)

Fully serverless. Zero instances. Scales from 10 to 10,000 farmers without re-architecture. Under $0.70/farmer/year at 10,000 users.

What I Learned

Voice is the interface, not a feature. Typing in Devanagari on a basic Android is slow and error-prone. Voice notes are what farmers already use to communicate with family. AgriNexus works the same way.

Behavioral nudges need closed loops. Sending a reminder is easy. Knowing whether it worked is hard — and it matters. The T+24h/T+48h chain with DONE/NOT YET buttons came from thinking about what actually changes behavior, not just what delivers information.

Prompt engineering is iterative. Initial RAG prompts gave 60% accuracy. Structured prompts with explicit format instructions and language-specific system messages reached 95%. Treat the AI as a collaborator that needs clear instructions.

Event-driven beats polling. DynamoDB Streams for DONE detection, EventBridge Scheduler for delayed reminders, SQS for async processing — events scale better than polling loops at every level.

Try It / Read More

I built AgriNexus for the AWS 10,000 AIdeas competition (Agriculture & Food Security category) using Kiro for spec-driven development and 100+ EARS requirements with full traceability to code.

The full technical deep-dive with demo videos, all four architecture flow diagrams, cost analysis, and the complete build journey is here:

Read the full article on AWS Builder Center

GitHub: AgriNexus AI

If this resonated — a like on the article puts an AI agronomist in one more farmer's pocket 🙏 Voting closes March 20.

Built with Amazon Bedrock, Kiro, AWS Lambda, DynamoDB, EventBridge Scheduler, Step Functions, SQS FIFO, and WhatsApp Business API.

Schema-First Prompt Engineering: The Gemini Lesson That Will Save Your Production App

PRASAD TILLOO — Fri, 13 Mar 2026 22:45:46 +0000

How enforcing JSON schema at the API level — not just in your prompt text — makes Gemini outputs reliable enough for production.

I built an AI Photography Coach using Google Gemini 3 Pro — it analyzes photos across five dimensions, exposes the AI's reasoning chain, and lets you chat with a mentor that remembers your analysis context. The full project is open-source and the writeup is on Medium.
But the single most transferable lesson from building it wasn't about photography or multimodal AI. It was about how to ask Gemini for structured data without it breaking on you in production.
Here's what I learned.

Quick Project Highlights

📸 Live app: Search "Photography Coach AI" on Google AI Studio
🐙 GitHub: https://github.com/prasadt1/photography-coach-ai-gemini3
📖 Full writeup: https://medium.com/@prasad.sgsits/i-built-an-ai-photography-coach-with-google-gemini-3-pro-heres-everything-i-learned-45411abef25c

The Problem
Early in development I asked Gemini for structured data the way most developers do the first time — in plain English inside the prompt:
"Analyze this photo and return a JSON object with five dimension scores..."
It worked perfectly in testing. It broke constantly in production — markdown fences, preamble text, explanation paragraphs, inconsistent field names. Every variation that JSON.parse() couldn't handle.
The fix is simple once you know it, but it's not obvious from the docs.

The Wrong Way
typescript// Asking nicely in prompt text — unreliable in production
const prompt = Analyze this photo and return JSON with composition_score, lighting_score, technique_score, creative_score, subject_score, and a reasoning object...

const response = await geminiClient.generateContent({
contents: [{ parts: [{ text: prompt }] }]
});

const text = response.candidates[0].content.parts[0].text;
const parsed = JSON.parse(text);
// 💥 Fails when Gemini adds markdown fences, preamble, or explanation

The Right Way
typescriptconst response = await geminiClient.generateContent({
contents: [{
parts: [
{ inlineData: { mimeType: "image/jpeg", data: base64Image } },
{ text: ANALYSIS_PROMPT }
]
}],
generationConfig: {
responseMimeType: "application/json", // enforce at API level
responseSchema: { // define exact contract
type: "object",
properties: {
composition_score: { type: "number" },
lighting_score: { type: "number" },
technique_score: { type: "number" },
creative_score: { type: "number" },
subject_score: { type: "number" },
reasoning: {
type: "object",
properties: {
observations: { type: "array", items: { type: "string" } },
reasoning_steps:{ type: "array", items: { type: "string" } },
priority_fixes: { type: "array", items: { type: "string" } }
},
required: ["observations", "reasoning_steps", "priority_fixes"]
}
},
required: [
"composition_score", "lighting_score", "technique_score",
"creative_score", "subject_score", "reasoning"
]
}
}
});

// Now deterministic — safe to parse without defensive gymnastics
const result = JSON.parse(response.candidates[0].content.parts[0].text);

Why This Works
responseMimeType: "application/json" tells Gemini at the API level to return pure JSON — no markdown fences, no preamble, no trailing explanation. This alone eliminates most production failures.
responseSchema defines the exact contract. Gemini will not return fields outside it or omit required ones. Your frontend parsing becomes deterministic.
Together they shift the reliability burden from your parsing code to the API itself — which is exactly where it belongs.

The Deeper Lesson
Schema enforcement changes how you design prompts. When you define the schema first, you're forced to think clearly about what you actually need from the model. That clarity produces better prompts, better outputs, and fewer surprises at 2am.
Define your schema before you write your first prompt. Not after.
In Photography Coach AI, this schema-first approach is what made it possible to drive five separate UI tabs — Overview, Detailed Analysis, Mentor Chat, AI Enhancement, Economics — all from a single structured Gemini response. No ambiguity, no defensive parsing, no fallback logic for malformed outputs.

Troubleshooting Tips

Empty responses after adding responseSchema: Check that your schema property names exactly match what you're asking for in the prompt. Mismatches between prompt language and schema field names are the most common cause of silent failures
Nested objects failing: Define required arrays at every level of nesting, not just the top level
Numbers returning as strings: Explicitly set type: "number" for all numeric fields — Gemini will default to string if the type is ambiguous
Schema too complex: If your schema has more than ~15 fields, consider splitting into two sequential API calls rather than one monolithic schema
Testing tip: Validate your schema against Gemini's output in AI Studio's playground before wiring it into your frontend — iterate the schema there, not in code

CTAs

⭐ Star the repo: https://github.com/prasadt1/photography-coach-ai-gemini3
📖 Full project writeup on Medium: https://medium.com/@prasad.sgsits/i-built-an-ai-photography-coach-with-google-gemini-3-pro-heres-everything-i-learned-45411abef25c
🚀 Try the live app: Search "Photography Coach AI" on Google AI Studio
🐛 Open an issue with questions or schema edge cases you've hit

Header image prompt:

"Dark background technical diagram showing two code blocks side by side labeled 'Wrong Way' in red and 'Right Way' in green. Center arrow between them. Bottom label: 'Schema-First Prompt Engineering with Gemini'. Monospace font, minimal design, teal and red accent colors. Dev.to header format, wide aspect ratio. Filename: devto-schema-first-header.png"

gemini #googleai #promptengineering #typescript #webdev #llm #javascript #opensource

Schema-First Prompt Engineering: The Gemini Lesson That Will Save Your Production App

PRASAD TILLOO — Wed, 18 Feb 2026 09:18:41 +0000

Quick Project Highlights

📸 Live app: https://ai.studio/apps/drive/1v2uJziWHPOHRES4EmmWXavydKZAe8ary?fullscreenApplet=true
🐙 GitHub: https://github.com/prasadt1/photography-coach-ai-gemini3
📖 Full writeup: https://medium.com/@prasad.sgsits/i-built-an-ai-photography-coach-with-google-gemini-3-pro-heres-everything-i-learned-45411abef25c

The Problem
Early in development I asked Gemini for structured data the way most developers do the first time — in plain English inside the prompt:
"Analyze this photo and return a JSON object with five dimension scores..."
It worked perfectly in testing. It broke constantly in production — markdown fences, preamble text, explanation paragraphs, inconsistent field names. Every variation that JSON.parse() couldn't handle.
The fix is simple once you know it, but it's not obvious from the docs.

Wrong Way

// Wrong Way: Unstructured Prompt
const ai = await getGenAIClient();

const prompt = `Analyze this image and tell me if it's good or bad.
  Give me some feedback.`;

const result = await ai.models.generateContent(prompt);
console.log(result.response.text());

// Output: "It's okay. The lighting is bit dark..."
// 💥 Unpredictable format, breaks JSON.parse() in production

The Right Way

// Right Way: Schema-First
const ai = await getGenAIClient();

const schema = {
  type: Type.OBJECT,
  properties: {
    score:        { type: Type.NUMBER },
    feedback:     { type: Type.STRING },
    improvements: { type: Type.ARRAY, items: { type: Type.STRING } }
  },
  required: ['score', 'feedback', 'improvements']
};

const result = await ai.models.generateContent({
  model: 'gemini-3-pro-preview',
  contents: { role: 'user', parts: [{ text: prompt }] },
  config: {
    responseMimeType: 'application/json',
    responseSchema: schema
  }
});

console.log(JSON.parse(result.text));
// Output: { score: 7, feedback: "Good composition...",
//           improvements: ["Increase exposure"] }

Bonus Pattern: Dynamic Client for AI Studio Compatibility
If you're publishing to Google AI Studio, use a dynamic client helper that checks for the shared API key first before falling back to your environment variable:

const getGenAIClient = async (): Promise<GoogleGenAI> => {
  // 1. Check for shared API key (AI Studio Published App Mode)
  if (typeof window !== 'undefined' && (window as any).aistudio?.getSharedApiKey) {
    try {
      const apiKey = await (window as any).aistudio.getSharedApiKey();
      if (apiKey) return new GoogleGenAI({ apiKey });
    } catch (e) {
      console.debug("Shared key failed, falling back to env var:", e);
    }
  }
  // 2. Fallback to local env var (Standard Dev Mode)
  return new GoogleGenAI({ apiKey: process.env.API_KEY || '' });
};

This lets the same codebase work in both local development and AI Studio's published app mode without any changes — no environment-specific builds, no conditional deploys.

Why This Works
responseMimeType: "application/json" tells Gemini at the API level to return pure JSON — no markdown fences, no preamble, no trailing explanation. This alone eliminates most production failures.
responseSchema defines the exact contract. Gemini will not return fields outside it or omit required ones. Your frontend parsing becomes deterministic.
Together they shift the reliability burden from your parsing code to the API itself — which is exactly where it belongs.

The Deeper Lesson
Schema enforcement changes how you design prompts. When you define the schema first, you're forced to think clearly about what you actually need from the model. That clarity produces better prompts, better outputs, and fewer surprises at 2am.
Define your schema before you write your first prompt. Not after.
In Photography Coach AI, this schema-first approach is what made it possible to drive five separate UI tabs — Overview, Detailed Analysis, Mentor Chat, AI Enhancement, Economics — all from a single structured Gemini response. No ambiguity, no defensive parsing, no fallback logic for malformed outputs.

Troubleshooting Tips

Empty responses after adding responseSchema: Check that your schema property names exactly match what you're asking for in the prompt. Mismatches between prompt language and schema field names are the most common cause of silent failures
Nested objects failing: Define required arrays at every level of nesting, not just the top level
Numbers returning as strings: Explicitly set type: "number" for all numeric fields — Gemini will default to string if the type is ambiguous
Schema too complex: If your schema has more than ~15 fields, consider splitting into two sequential API calls rather than one monolithic schema
Testing tip: Validate your schema against Gemini's output in AI Studio's playground before wiring it into your frontend — iterate the schema there, not in code

CTAs

⭐ Star the repo: https://github.com/prasadt1/photography-coach-ai-gemini3
📖 Full project writeup on Medium: https://medium.com/@prasad.sgsits/i-built-an-ai-photography-coach-with-google-gemini-3-pro-heres-everything-i-learned-45411abef25c
🚀 Try the live app: https://ai.studio/apps/drive/1v2uJziWHPOHRES4EmmWXavydKZAe8ary?fullscreenApplet=true
🐛 Open an issue with questions or schema edge cases you've hit

gemini #googleai #promptengineering #typescript #webdev #llm #javascript #opensource

Schema-First Prompt Engineering: The Gemini Lesson That Will Save Your Production App

PRASAD TILLOO — Wed, 18 Feb 2026 09:18:41 +0000

Quick Project Highlights

Wrong Way

// Wrong Way: Unstructured Prompt
const ai = await getGenAIClient();

const prompt = `Analyze this image and tell me if it's good or bad.
  Give me some feedback.`;

const result = await ai.models.generateContent(prompt);
console.log(result.response.text());

// Output: "It's okay. The lighting is bit dark..."
// 💥 Unpredictable format, breaks JSON.parse() in production

The Right Way

// Right Way: Schema-First
const ai = await getGenAIClient();

const schema = {
  type: Type.OBJECT,
  properties: {
    score:        { type: Type.NUMBER },
    feedback:     { type: Type.STRING },
    improvements: { type: Type.ARRAY, items: { type: Type.STRING } }
  },
  required: ['score', 'feedback', 'improvements']
};

const result = await ai.models.generateContent({
  model: 'gemini-3-pro-preview',
  contents: { role: 'user', parts: [{ text: prompt }] },
  config: {
    responseMimeType: 'application/json',
    responseSchema: schema
  }
});

console.log(JSON.parse(result.text));
// Output: { score: 7, feedback: "Good composition...",
//           improvements: ["Increase exposure"] }

Troubleshooting Tips

CTAs

⭐ Star the repo: https://github.com/prasadt1/photography-coach-ai-gemini3
📖 Full project writeup on Medium: https://medium.com/@prasad.sgsits/i-built-an-ai-photography-coach-with-google-gemini-3-pro-heres-everything-i-learned-45411abef25c
🚀 Try the live app: https://ai.studio/apps/drive/1v2uJziWHPOHRES4EmmWXavydKZAe8ary?fullscreenApplet=true
🐛 Open an issue with questions or schema edge cases you've hit

gemini #googleai #promptengineering #typescript #webdev #llm #javascript #opensource

I Built a Portfolio That Thinks Like an Architect (Using Google Gemini + Cloud Run)

PRASAD TILLOO — Mon, 02 Feb 2026 07:54:12 +0000

This is a submission for the New Year, New You Portfolio Challenge Presented by Google AI

An experience-driven AI portfolio that matches client challenges against 15+ years of real delivery patterns — not generic ChatGPT responses.

👋 About Me

I’m Prasad Tilloo — an independent Enterprise Architect and Transformation Consultant based in Germany who's spent 15+ years helping enterprises navigate cloud migrations, AI adoption, and compliance-heavy transformations. I've worked with everyone from healthcare giants to climate tech startups.

But here's the thing: every client asks the same question - "Have you done something like this before?"

That question inspired me to build something different. Not another generic AI chatbot, but a portfolio that actually thinks like an architect.

Portfolio

🌐 Cloud Run App link:

🎥 Quick Demo Video (2 minutes):

👉 Try it yourself at https://prasadtilloo.com/tools/project-similarity

Key Screens

Homepage

Similarity Matcher

How I Built It

The Core Idea: Experience-Driven AI

Most AI portfolios just slap ChatGPT onto a website. I wanted something smarter.

My system analyzes your project description against structured signals from real projects:

Industry patterns
Technical constraints
Anti-patterns I've observed
Decision frameworks that worked
Retrospective lessons

Google Gemini 1.5 Pro handles the reasoning, but it's constrained by real project metadata - no hallucinated architectures.

Tech Stack & Google AI Integration

Frontend: React + TypeScript + Tailwind

Backend: Node.js on Google Cloud Run

AI: Google Gemini 1.5 Pro

Data: Google Sheets (lightweight CRM)

Email: SendGrid for lead capture

AI-Assisted Development Workflow

I used Google Gemini + Antigravity in a "vibe coding" approach:

Gemini for architectural reasoning and refactoring suggestions
Antigravity for rapid UI iteration
Human decisions for UX structure, domain modeling, and production hardening

Example prompt I used:

"Build experience-driven matching using project metadata, not embeddings. Score industry, constraints, anti-patterns, and decision frameworks. Return top 3 with confidence."

This accelerated development while keeping architectural decisions manual.

Production Architecture

🏗️ System Overview:

Container-level architecture showing Cloud Run hosting both frontend and API, Gemini-powered similarity engine, Google Sheets CRM, SendGrid email delivery, and Namecheap DNS with managed SSL.

Key Components:

Frontend: React + TypeScript + Vite SPA
Backend: Node.js API on Google Cloud Run
AI Engine: Google Gemini 1.5 Pro for similarity matching
Data Storage: Google Sheets for CRM + Static JSON for projects
Email: SendGrid for transactional delivery
Infrastructure: Custom domain + SSL via Cloud Run

Data Flow:

User describes project challenge via React interface
Cloud Run API processes request and queries project database
Gemini AI analyzes similarity patterns against 15+ years of experience
System generates personalized insights and recommendations
SendGrid delivers results via email after lead capture
Google Sheets stores lead information for follow-up

Browser (React + Vite)
↓
Cloud Run (Node API)  
↓
Gemini 1.5 Pro
↓
Project Similarity Engine
↓
Google Sheets (Leads + Tool Requests)
↓
SendGrid (Email Delivery)

Why Cloud Run? Zero infrastructure ops, automatic HTTPS, simple CI/CD. Perfect for a consulting business.

Custom Domain Setup: Namecheap DNS → Google Cloud Run with automatic SSL certificates.

What I'm Most Proud Of

1. It's Actually Useful

This isn't a demo. It's my real business system. Clients use it to understand if their project matches my experience before booking calls.

2. Experience-Driven AI Differentiation

Instead of generic responses, visitors get:

"Here's the closest project I've done like yours — what worked, what failed, and what I'd do differently today."

3. Production-Grade Implementation

GDPR-safe lead capture
Rate limiting on AI endpoints
Feature flags for staged rollout
Email gating before AI results
Proper error handling and fallbacks

4. Strategic Focus

I intentionally enabled only one AI feature for this submission. Why? To showcase architectural thinking over feature dumping. The Project Similarity Matcher demonstrates real business value, not AI novelty.

5. Real Business Impact

Live deployment serving actual clients at prasadtilloo.com
Lead qualification through AI matching
Case studies with NDA-protected artifacts
Evidence-based trust building

The Result? A portfolio that doesn't just show my work - it thinks like I do.

Instead of telling clients "I'm experienced," it shows them exactly how my experience applies to their specific challenge.

That's the difference between a portfolio and a business system.

🔗 Links:

Live Site: prasadtilloo.com
Demo Video: 2-minute Loom walkthrough
Source Code: GitHub Repository
Competition Page: Technical Deep Dive

Built for the Google AI "New Year, New You" Portfolio Challenge 🏆

From Student to AI Architect: How Multi-Agent Systems Rewired My Understanding of Intelligent Applications

PRASAD TILLOO — Sat, 13 Dec 2025 11:29:01 +0000

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections & Capstone Showcase

My Learning Journey

Five days ago, I submitted an AI Photography Coach—a multi-agent system capstone project for the 5-Day AI Agents Intensive Course with Google that fundamentally changed how I think about building intelligent applications. This wasn't just another capstone project. It forced me to confront a question I'd been wrestling with for months: What separates a system that appears intelligent from one that genuinely solves problems?

Coming into the Google AI Agents Intensive, I understood agents conceptually. I'd read the papers. I'd tinkered with LLMs. But there was a critical gap between understanding and architecting—and this course obliterated that gap.

The breakthrough came on Day 2 when we deconstructed the difference between monolithic LLM calls and specialized agent systems. Most people use LLMs like Swiss Army knives—one model trying to do everything. The course showed me something radical: the power isn't in having one smart model; it's in having many focused ones working together.

Key Concepts & Technical Deep Dive

The ADK Native Orchestrator Pattern: Architecture That Actually Works

The game-changer for my photography coach was understanding the ADK-native orchestrator pattern. Instead of building a custom routing system, I leveraged Google's Agent Development Kit's built-in orchestration capabilities.

Here's the architecture that makes this work:

Core Agents (Shared):

Vision Agent (Sub-Agent 1): Uses Gemini 2.5 Flash Vision for image analysis—EXIF extraction, composition analysis, defect detection with severity scoring, and strength identification
Orchestrator Agent (Parent): The intelligent coordinator that manages session state, routes requests to specialized sub-agents, implements context compaction, and persists memory using SQLite + ADK Cloud Memory adapters
Knowledge Agent (Sub-Agent 2): Powered by Gemini 2.5 Flash with hybrid CASCADE RAG for query understanding, knowledge retrieval, response generation, citation grounding, and skill-level adaptation

Key Pattern: Orchestrator Mediates All Communication

This is critical: the orchestrator mediates all agent communication. The Vision Agent doesn't talk directly to the Knowledge Agent. Instead:

Vision Agent outputs structured analysis (exif dict, composition_summary, detected_issues, strengths)
Orchestrator aggregates this with session context
Knowledge Agent receives unified input context and generates the coaching response
Orchestrator updates conversation history and persists session state

This eliminates cascading errors and makes the entire system debuggable in ways direct agent-to-agent communication could never achieve. It's a pattern, not just a feature—and it's built into ADK natively.

Three Infrastructure Approaches: One System, Multiple Deployments

What fascinated me was how the same core agents can run through three different interfaces. This distinction between agent architecture and deployment architecture was the second major revelation:

1. ADK Runner (Cloud)

Components: LlmAgent, Runner, Sessions
Interface: Vertex AI / Cloud Run
When to use: Production-grade photo coaching with cloud scalability and managed infrastructure

2. MCP Server (Desktop)

Components: JSON-RPC 2.0 over stdio transport
Capabilities: 3 tools exposed per agent
Deploy: Claude Desktop, local machine
When to use: Local development, integration with Claude, running alongside other MCP-compatible tools
This was the breakthrough for me—MCP protocol meant I could integrate my agents with any compatible application without rewriting core logic

3. Python API (Custom)

Components: Direct imports, function calls
Deploy: Notebooks, custom apps, Streamlit dashboards
When to use: Research, experimentation, embedded systems, educational contexts

The realization: agent architecture is orthogonal to deployment architecture. Design the agent system once (orchestrator + specialized agents), then expose it through whichever interface makes sense for your use case. This separation of concerns is elegant and powerful.

The Critical Insight: Negative Space Design

During debugging, I discovered something counterintuitive: the best agent isn't the one with the smartest prompts; it's the one with the clearest responsibilities.

I spent as much time defining what each agent should not do as defining what it should do:

Vision Agent: Analyzes only what's in the image. Never generates teaching advice or pedagogical content.
Knowledge Agent: Teaches based on provided analysis. Never re-analyzes images or duplicates vision work.
Orchestrator: Routes and aggregates. Never generates original analysis or coaching—only synthesis.

This negative space design—drawing boundaries tighter than seemed necessary—eliminated entire categories of bugs. It forced each agent's responsibility to be so crystalline that context compaction became natural, error handling became obvious, and delegation logic became transparent.

Context Engineering and Memory as Foundation

The course's emphasis on context compaction changed how I architect systems. In a multi-agent ecosystem, context is a resource, not a convenience.

The photography coach uses a two-tier memory system:

Session memory: Short-term context about current analysis and conversation
User model: Long-term history of preferences, skill progression, learning patterns

The orchestrator implements context compaction before passing context between agents:

Summarizing vision analysis into structured fields (rather than raw model output)
Truncating conversation history intelligently
Maintaining only relevant user profile context

This isn't optimization; it's architectural necessity. With three agents and multiple turns, uncompressed context balloons quickly. Compaction forces rigor in what information actually matters for decision-making.

Tools: The Backbone of Agent Capability

The course reframed my entire thinking: agents aren't intelligent because of their prompts; they're intelligent because of their tools.

For the photography coach:

Vision APIs: Constrain analysis to structured outputs
Vector Database (CASCADE hybrid RAG): Guarantee knowledge comes from grounded sources
Custom Tools: Photography-specific calculations (depth of field relationships, shutter speed ratios, focal length conversions)
Memory Tools: SQLite adapters for persistence

Each tool is a constraint that prevents hallucination. When a Vision Agent can only output structured EXIF data and composition summaries, it can't invent. When the Knowledge Agent can only pull from photography principles via RAG, its advice has traceable citations. Tools aren't features you add; they're guardrails you build into the system's fabric.

Reflections & Takeaways

What the Course Got Right

The hands-on codelabs genuinely built intuition. I didn't just read about multi-agent systems; I implemented them, broke them, debugged them, rebuilt them. The guest speakers—engineers shipping agentic AI at scale—grounded theory in production reality. Learning about the ADK's orchestrator pattern in isolation, then building it into a real system, created understanding that no lecture could achieve.

The emphasis on architecture as design constraint was transformative. Before this course, I thought about features and interfaces. Now I think about specialization, coordination, failure modes, and the boundaries between components.

Honest Critique

The course could dive deeper into failure modes in multi-agent systems. They fail in new ways: cascading errors compounding across agents, subtle bugs in delegation logic, context compaction artifacts that only emerge in production. A dedicated deep-dive would be invaluable.

More explicit guidance on choosing deployment interfaces would help practitioners. The fact that one agent system can work through ADK Runner, MCP Server, or custom Python API is powerful—but knowing when to use each requires hands-on experience or mentorship.

How This Changes What I Build Next

I'm now architecting systems fundamentally differently:

Define agent specialization and boundaries first, before any code
Treat the orchestrator pattern as primitive, not optional
Make context compaction a first-class design concern
Use tools to constrain behavior, not enhance capability
Choose deployment interface after agent architecture is finalized, not before

The photography coach is just the beginning. The real power is understanding that intelligent systems are built through specialization and clear boundaries, not through smarter prompts or larger models. Architecture beats parameters every time.

The Bigger Picture

If you're considering the AI Agents Intensive: do it. But go in expecting it to change your architecture mindset, not just teach you new libraries.

The future of AI isn't smarter models—it's smarter systems. Systems that know their limitations, delegate to specialists, maintain clear boundaries, and communicate through structured protocols. Systems where architecture is a design tool, not an afterthought. That's what this course teaches. That's what matters now.

Technical Stack & Architecture

Core Agents (ADK Native):

Vision Agent: Gemini 2.5 Flash Vision (image analysis, EXIF extraction, composition scoring, defect detection)
Orchestrator Agent: Session management, context compaction, routing, memory persistence
Knowledge Agent: Gemini 2.5 Flash + Hybrid CASCADE RAG (knowledge retrieval, citations, skill adaptation)

Memory & Persistence:

SQLite for session state
ADK Cloud Memory adapters
Conversation history management
User model tracking

Deployment Options:

ADK Runner: Cloud/Vertex AI production deployment
MCP Server: Desktop deployment with JSON-RPC 2.0 (Claude Desktop, local tools)
Python API: Notebooks, Streamlit, custom applications

Integration Patterns:

Orchestrator-mediated agent communication (no direct agent-to-agent)
Structured context passing between agents
RAG-grounded knowledge retrieval with citations
Context compaction before inter-agent communication

Project Links: