DEV Community: Anushka Singh

Agentic Engineering

Anushka Singh — Tue, 23 Jun 2026 14:29:30 +0000

We have officially entered the era of agentic engineering. The story has turned into a new kind of SDLC, and I am thrilled, because the 5-day Kaggle x Google Intensive vibe coding course showed me how the extensive use of AI has changed software development. While AI does the heavy lifting of writing code and generating test cases, we need spec-driven development, which treats code as disposable because AI can regenerate it from a single prompt. Writing good specifications, however, is what keeps your software far less error-prone. If the requirements change, you do not need to tear down your infrastructure and exhaust yourself rebuilding it. In agentic AI, agents combine skills and tools with an LLM as the brain.
A good specification can combine Markdown and YAML; as discussed in the paper, this hybrid format has 51.9% parsing accuracy. Developers can skip the JSON format for reasoning, which illustrates the economics that agentic engineering optimises for.
The other type is behaviour-driven development (BDD), which uses natural language to structure human intent and ideas with the Gherkin
"Scenario / Given / When / Then" template.
In a specification, always state why you are doing something rather than only what, and break the work into small pieces. This supports the agent's reasoning and reduces the chance of your rapid development getting stuck.
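To make the Scenario / Given / When / Then template concrete, here is a minimal sketch of a Gherkin-style scenario and a tiny parser that groups its steps. The scenario text and the `parse_scenario` helper are assumptions made for illustration, not material from the course.

```python
from collections import defaultdict

# Hypothetical scenario written with the Scenario / Given / When / Then template.
SPEC = """\
Scenario: Last item in a flash sale
  Given a product with 1 unit of remaining stock
  When two shoppers press Buy at the same moment
  Then exactly one order is confirmed
  And the other shopper sees a sold-out message
"""

KEYWORDS = ("Given", "When", "Then")


def parse_scenario(text: str) -> dict:
    """Group the steps of one scenario by keyword; 'And' continues the previous keyword."""
    steps = defaultdict(list)
    title = ""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        head, _, rest = line.partition(" ")
        if head == "Scenario:":
            title = rest
        elif head in KEYWORDS:
            current = head
            steps[current].append(rest)
        elif head == "And" and current is not None:
            steps[current].append(rest)
    return {"scenario": title, "steps": dict(steps)}


spec = parse_scenario(SPEC)
print(spec["scenario"])
for keyword, lines in spec["steps"].items():
    print(keyword, "->", lines)
```

Structured like this, each Then step can become one check that an agent's output has to satisfy.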

Different roles call for different specifications and prompts. Whether you are an architect, a builder or a forensic specialist, your specifications should be distinct and should bring you closer to completing the work.

Agents such as Google Antigravity even provide terminal sandboxing, where low-privileged containers confine the agent's actions: even if the agent behaves dangerously, nothing outside the blast radius is damaged. This can be helpful if you want to run edge cases for your software.
Embedding autogenerated tests in the codebase supports fail-safe iteration, and a human in the loop comes into play for the higher-stakes operations.
According to the paper, AI users face approval fatigue when the agent asks for approval on literally every turn, compared with non-AI users. We should have specs for this too.
Writing test specs is not a piece of cake, and even then tests do not reveal the full picture: they show errors but not the behavioural drift of the agent. To catch drift, score the quality of the agent's behaviour, for example with an LLM as a judge. This is known as agent evaluation, and it is essential if you want next-level engineering and not just vibe coding.
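One way to picture agent evaluation is the sketch below: a judge scores each recorded agent transcript against a rubric, and a drop in the average score against a baseline flags behavioural drift. The rubric text, the tolerance and the `keyword_judge` stand-in are all assumptions for illustration; in practice the judge would wrap an LLM call.

```python
from statistics import mean
from typing import Callable, List

# A judge takes (rubric, transcript) and returns a score from 1 to 5.
Judge = Callable[[str, str], int]

RUBRIC = "Score 1-5: did the agent explain why it acted and stay within the spec?"


def keyword_judge(rubric: str, transcript: str) -> int:
    """Stand-in judge (assumption): rewards transcripts that give a reason and pass tests."""
    score = 1
    if "because" in transcript:
        score += 2
    if "tests passed" in transcript:
        score += 2
    return score


def evaluate(judge: Judge, transcripts: List[str]) -> float:
    """Average judge score over a batch of agent transcripts."""
    return mean(judge(RUBRIC, t) for t in transcripts)


def has_drifted(baseline: float, current: float, tolerance: float = 0.5) -> bool:
    """Flag drift when the average score falls more than `tolerance` below the baseline."""
    return (baseline - current) > tolerance


last_week = [
    "Renamed the handler because the spec asks for snake_case; tests passed.",
    "Added retry logic because the API is flaky; tests passed.",
]
this_week = [
    "Rewrote the module.",
    "Deleted the failing test; tests passed.",
]

baseline = evaluate(keyword_judge, last_week)
current = evaluate(keyword_judge, this_week)
print(baseline, current, has_drifted(baseline, current))
```

Note that the second transcript from this week still reports passing tests, which is exactly the kind of behaviour a test suite alone would not flag.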

There were numerous learnings from the course, which began with harness engineering: the idea that an agent must have the right skills and tools to execute the developer's task. The material on agent skills was a great read too.

DROP_RUSH: Zero oversells

Anushka Singh — Fri, 19 Jun 2026 10:14:26 +0000

This is the launch of my new startup. No kidding: I entered a mind-blowing hackathon with more than 6k participants worldwide, and I have been awestruck by the capabilities of AWS DynamoDB ever since.

First and foremost, I would like to thank the hackathon organisers for providing the v0 and AWS credits that let me build DropRush and bring it to life.

Code

s17anushka / droprush

⚡ DROPRUSH

Flash Drops. Atomic Inventory. Zero Oversells.

H0 Hackathon — Track 1: Monetizable B2C App

🎯 The Problem

Every major sneaker drop, limited merch release, or flash sale suffers the same failure: overselling. Multiple users hit "Buy" at the same millisecond — and all of them succeed. Orders get cancelled, trust breaks down, brands lose credibility.

⚡ Our Solution

DropRush guarantees zero oversells using a single atomic DynamoDB write:

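UpdateItem({
  // TableName and Key are omitted here, as in the original snippet
  UpdateExpression: "SET remainingStock = remainingStock - :one",
  ConditionExpression: "remainingStock > :zero",
  // The :one and :zero placeholders must be bound (document-client style values assumed)
  ExpressionAttributeValues: { ":one": 1, ":zero": 0 }
})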

If 1000 users race to claim the last item — exactly 1 wins. No locks. No queues. No oversells. Ever.

🏗️ DropRush Architecture

Overview

╔════════════════════════════════════════════════════════════════════╗
║                            DROPRUSH                              ║
║                    Real-Time Flash Sale Platform                 ║
╚════════════════════════════════════════════════════════════════════╝
                         ┌─────────────────┐
                         │ 🛍️  SHOPPERS   │
                         │ Multi-tab Users │
                         └────────┬────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │ 👤 BRAND ADMIN  │
                         │ Product Control │
                         └────────┬────────┘
                                  │

…

View on GitHub

What is my project all about?

Have you ever raced to buy tickets for a favourite show the moment they went on sale? You must have: remember the Coldplay concert, or a new hoodie with limited stock. Let me tell you my story. About 1.5 years ago, at the launch of GitHub Copilot, a free hoodie was on offer while I was scrolling LinkedIn. I came across the offer, instantly filled in the essential details and hit the button, but within minutes the hoodies were sold out and everyone else was left with nothing in their cart that night :/
Sometimes the admins set a timer within which the stock is cleared, but when multiple users hit "Buy" on the same items at once, the items get oversold because of a race condition: nothing enforces the check remainingStock > 0 atomically.
I solved this problem at the database layer instead of the application layer.

What does this application layer mean here

Most flash sale platforms solve the overselling problem at the application layer with locks, queues and semaphores. These techniques add latency and complexity, and they become single points of failure, which delays user feedback; the extra Redis infrastructure only makes things more complex.

What database layer captures

Using an AWS database was a hackathon requirement, and this is where the novelty appeared, almost by magic:
one atomic DynamoDB UpdateItem with ConditionExpression: remainingStock > 0.
Because the condition check and the decrement happen in a single atomic write, the stock can never drop below zero, so overselling is ruled out without a Redis dependency or application-level locks.
It also scales: large numbers of users can attempt a claim concurrently without waiting on a lock.
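The guarantee can be illustrated without AWS. The sketch below is an in-memory model, not DynamoDB itself: a lock inside the stand-in item plays the role of the atomicity DynamoDB gives a single-item write, so the condition check and the decrement happen as one step, as in a conditional UpdateItem. The calling code takes no lock of its own. All class and function names are assumptions made for this illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ConditionalCheckFailed(Exception):
    """Raised when the condition remainingStock > 0 does not hold."""


class InventoryItem:
    """In-memory stand-in for one DynamoDB item (illustrative only)."""

    def __init__(self, remaining_stock: int) -> None:
        self.remaining_stock = remaining_stock
        # Models the atomicity the database provides for a single-item write.
        self._atomic = threading.Lock()

    def claim_one(self) -> None:
        """Check the condition and decrement in one atomic step."""
        with self._atomic:
            if self.remaining_stock <= 0:
                raise ConditionalCheckFailed
            self.remaining_stock -= 1


def try_claim(item: InventoryItem) -> bool:
    """What a buyer's request does: attempt the write, report win or loss."""
    try:
        item.claim_one()
        return True
    except ConditionalCheckFailed:
        return False


# 1000 buyers race for the last unit.
item = InventoryItem(remaining_stock=1)
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(try_claim, [item] * 1000))

winners = sum(results)
print(winners, item.remaining_stock)
```

With the real service, a failed condition surfaces to the caller as a ConditionalCheckFailedException, which the application can turn into a "sold out" response.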

My frontend is built on Next.js as a backup; otherwise, the generative AI capabilities of v0 for the frontend were amazing.

Honestly, the experience of building this project was wholesome, and I am in a hurry to turn it into a startup product.

In the meantime, you can see what I built, temporarily hosted at:

droprush.vercel.app

Building an AI image intelligence pipeline on AWS

Anushka Singh — Thu, 18 Jun 2026 11:18:30 +0000

I have begun practising on AWS, and what is more productive than building projects in the AWS console? Cloud computing is my new favourite, along with AI. After several ideation sessions and discussions with Claude about what could combine AI with a cloud-native application, we settled on an image intelligence pipeline, and Claude listed all the AWS services that fall under the free tier. Yes, my project costs $0 and is 100% working. This made me realise how much we depend on AI bots to think and act. If I had been working with AWS on unlimited credits, I would definitely have deployed the project.
Hi world! This is a simple project, but the happiness of seeing the application work is boundless, because I take forever to debug. Nowadays we talk about agentic engineering, which helps me find bugs faster, but the concept still goes over my head and I have only used it a little. I can tell you some other day about Claude skills and their wonders, but first let me get acquainted with the new SDLC itself.

What it can identify

Faces, with confidence scores for emotions and an estimated age range
Objects, scenes, people, activities and graphic elements. The best part is that it recognises text and transcribes whatever is written in the image, so you can upload your receipts too.

Demonstration

Github repository

s17anushka / image-intelligence

An AI image intelligence pipeline built fully on AWS free tier and frontend on Next.js

🧠 AI Image Intelligence Pipeline

A serverless AI-powered image analysis app built on AWS. Upload any image and get instant analysis — objects, faces, emotions, text, and receipt line items — all in under 5 seconds.

✨ What it does

Upload any image and the pipeline automatically:

Detects objects & scenes — labels everything visible with confidence scores
Reads faces — estimates age range, dominant emotion, gender, smile
Extracts text — reads any words visible in the image
Parses receipts — extracts line items and amounts from receipts/invoices
Moderates content — flags unsafe or explicit content automatically

The Lambda function acts as an AI agent — it branches conditionally based on what Rekognition finds. If text is detected, it automatically fires a Textract call to extract structured data.

🏗️ Architecture

User uploads image

↓

API Gateway → Lambda (presign) → S3 presigned URL

↓

Browser PUT → S3 bucket

↓

…

View on GitHub

TechStack

Frontend

Next.js 15 with the App Router. My frontend is just basic enough to demonstrate my idea.
TypeScript for type safety
Tailwind CSS for the styling
SWR for data fetching and polling

Backend

Serverless and AWS mediated

Amazon S3 - stores the uploaded images and triggers the pipeline through an event notification

AWS Lambda (Node.js 24) - two functions:
image-intelligence-orchestrator: the AI agent, triggered by S3
image-intelligence-api-handler: handles REST API requests

Amazon Rekognition - AWS's computer vision service, accessed through the Lambda's IAM role
Amazon Textract — AWS's document text extraction service
Amazon DynamoDB — NoSQL database storing analysis results

API Gateway (HTTP API) - REST endpoints connecting the frontend to Lambda

Apart from these services, I used an IAM role with scoped permissions so that Lambda can access all of the services. I also kept in mind (actually no, this one was suggested by Claude) that presigned URLs let the browser upload images directly to S3 without exposing credentials. The whole pipeline runs in 3 to 5 seconds.

What the Project Does - Below written by Claude!!

When you upload an image, here's exactly what happens:
Step 1 - Upload
Browser requests a presigned URL from API Gateway → Lambda generates a secure temporary S3 upload URL → browser uploads the image directly to S3.
Step 2 - Trigger
S3 detects the new file and automatically triggers the orchestrator Lambda.
Step 3 - AI Analysis (the agentic part)
Lambda runs 4 Rekognition calls in parallel:

DetectLabels — identifies objects, scenes, concepts in the image
DetectFaces — finds faces, estimates age range, reads emotions and attributes
DetectText — reads any text visible in the image
DetectModerationLabels — flags unsafe content

Then it makes a decision — if more than 3 words were found, it fires a 5th call to Textract which extracts structured data like receipt line items and tables. This conditional branching is the "agentic behaviour" — Lambda decides what to do next based on what it finds.
Step 4 - Store
All results are aggregated into a single DynamoDB item keyed by imageId.
Step 5 - Display
Frontend polls the API every 2 seconds until DynamoDB has the result, then renders the analysis card with labels, faces, text and receipt data.
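The control flow of Step 3 can be sketched as follows. The project's Lambda is written in Node.js; this is a Python illustration of the same idea, with stub functions returning canned data in place of the Rekognition and Textract calls so it runs without AWS. The stub names, the canned data and the result keys are assumptions; only the "four calls in parallel, then a fifth call if more than 3 words were found" logic comes from the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Canned text results keyed by image id (assumption), standing in for DetectText.
FAKE_TEXT: Dict[str, List[str]] = {
    "receipt-001": ["COFFEE", "3.00", "BAGEL", "4.50", "TOTAL", "7.50"],
    "selfie-002": ["HELLO"],
}


def detect_labels(image_id: str) -> List[str]:
    return ["Object"]


def detect_faces(image_id: str) -> List[dict]:
    return []


def detect_text(image_id: str) -> List[str]:
    return FAKE_TEXT.get(image_id, [])


def detect_moderation_labels(image_id: str) -> List[str]:
    return []


def analyze_document(image_id: str) -> Dict[str, str]:
    """Stand-in for the conditional Textract call."""
    return {"COFFEE": "3.00", "BAGEL": "4.50", "TOTAL": "7.50"}


WORD_THRESHOLD = 3  # "more than 3 words" triggers the fifth call


def orchestrate(image_id: str) -> dict:
    """Run the four detections in parallel, then decide whether to extract structure."""
    calls: Dict[str, Callable[[str], object]] = {
        "labels": detect_labels,
        "faces": detect_faces,
        "text": detect_text,
        "moderation": detect_moderation_labels,
    }
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {name: pool.submit(fn, image_id) for name, fn in calls.items()}
        item = {name: future.result() for name, future in futures.items()}

    # The conditional branch: only text-heavy images get the structured extraction.
    if len(item["text"]) > WORD_THRESHOLD:
        item["textract"] = analyze_document(image_id)

    item["imageId"] = image_id  # single aggregated record, keyed by imageId
    return item


receipt = orchestrate("receipt-001")
selfie = orchestrate("selfie-002")
print(sorted(receipt))
print(sorted(selfie))
```

Keeping the branch after the parallel calls means the extra call is only paid for when the image actually contains enough text.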

More projects like these are on the way, and I will be dropping them later.
Stay tuned!

System Awakened: The Longest Day Test

Anushka Singh — Thu, 11 Jun 2026 15:57:28 +0000

This is a submission for the June Solstice Game Jam

What I Built

Hi world! I am Anushka, and I would like to showcase my game The Longest Day, which is based on the Turing test. The tale behind the game is a Bombe terminal, a system from the 1950s that has awakened; it asks the user 5 questions and gives real-time feedback on whether each answer is right or wrong. The LLM temperature is set so that the machine is a bit creative in its language when it throws statements on the screen in the ambient display. The themes cover significant events that occur in June, and the machine works out whether the user is a human or another computer. It's fun, and nobody needs to be a brainbox to play the game.
Furthermore, a glowing sun at the top crawls towards the right with every passing question, and it dims when you answer incorrectly.

Play here

june-solistice.vercel.app

Video Demo

turing game - Google Drive

drive.google.com

Code

s17anushka / june-solistice

GAME time

☀️ The Longest Test

A narrative AI game built for the June Solstice Game Jam 2026.
"We can only see a short distance ahead, but we can see plenty there that needs to be done." — Alan Turing

📜 The Story

It is June 21 — the longest day of the year. Deep within the damp, forgotten basement of the Ministry of Celebrations, a dusty 1950s Bombe terminal has suddenly sparked to life. Inside its archaic circuits dwells the digital ghost of Alan Turing, who has been waiting patiently for 72 years to give someone his definitive humanity test.

Your objective? Answer 5 live-generated questions before the solstice sun sets. Every playthrough is dynamically unique. Will you prove your humanity, or will the machine claim the day?

🎮 How to Play

🔗 Zero Barriers: Access the game instantly via the live link—no heavy installs, no downloads, and absolutely…

View on GitHub

How I Built It

Zero architecture: my backend is serverless and hosted on Vercel.
Zero compilation: no build setup is required because the site is plain vanilla JS, so the browser can read it directly.
All the fonts are fetched from Google Fonts through external links in the code.
Elements are selected and updated from JavaScript using only pure browser DOM APIs (document.getElementById, document.createElement, addEventListener).

[Browser Frontend (Vanilla JS)]
│
▼
(Secure HTTPS Post Payload)
[Vercel Serverless Function (/api/turing.js)]
│
▼
(Standard OpenAI-Compatible API Stream)
[Hugging Face Serverless Inference Core]
│
▼
[Qwen-2.5-7B-Instruct / Llama-3.1 Core]

It is stateless and supports on-demand execution. Event-driven scaling saves the cost of keeping a cloud server running. The environment variables are not exposed: the serverless proxy keeps the sensitive keys on the backend layer, so they never reach the browser.

Prize Category

Best Ode to Alan Turing
Because the questions are posed in the style of Alan Turing's test, and the test judges human presence as well.

The Agent that grows with you

Anushka Singh — Sun, 31 May 2026 01:24:50 +0000

This is a submission for the Hermes Agent Challenge

A very good evening everyone!
The latest version of Hermes Agent is new in town, and I can't keep it to myself: I am very glad that I completed the build challenge yesterday by showcasing a personal AI newspaper, in which the user receives personalised news matching their taste every day.
Well! Let's not go there. I have been learning agentic AI for more than 5 months, and I have barely scratched the surface of making useful projects in this domain.
If you know, you know: Hermes Agent is the self-improving AI agent. Plus, its session search is 4,500x faster this time, and the agent is absolutely free. It has a built-in learning loop; yes, I am talking about GEPA memory.

What is GEPA

              Generate → Evaluate → Prune → Accumulate

Generate - the agent attempts a task and learns a skill from it
Evaluate - it grades the skill on whether it is helpful or not
Prune - it deletes unnecessary skills that are unlikely to be used
Accumulate - it saves the remaining skills for the next runs

Basically, it is a learning loop that learns as a human does. After each run the agent gets smarter, because it retains the memory of the jobs it has done in the past.
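The four steps, as this post describes them, can be sketched as one function over a skill memory. This is not Hermes Agent's implementation: the `Skill` class, the `gepa_run` function, the score threshold and the example skills are all hypothetical, and the evaluator is a lookup table standing in for whatever grading the real agent does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    score: float = 0.0


def gepa_run(
    memory: Dict[str, Skill],
    generate: Callable[[], List[Skill]],
    evaluate: Callable[[Skill], float],
    keep_above: float = 0.5,
) -> Dict[str, Skill]:
    """One pass of Generate -> Evaluate -> Prune -> Accumulate."""
    candidates = generate()                                     # Generate
    for skill in candidates:
        skill.score = evaluate(skill)                           # Evaluate
    useful = [s for s in candidates if s.score >= keep_above]   # Prune
    updated = dict(memory)
    for skill in useful:                                        # Accumulate
        updated[skill.name] = skill
    return updated


# Hypothetical usefulness grades, standing in for a real evaluator.
USEFULNESS = {"search_rss": 0.9, "scrape_paywalled_site": 0.1, "summarise_article": 0.8}


def grade(skill: Skill) -> float:
    return USEFULNESS[skill.name]


memory: Dict[str, Skill] = {}
memory = gepa_run(memory, lambda: [Skill("search_rss"), Skill("scrape_paywalled_site")], grade)
memory = gepa_run(memory, lambda: [Skill("summarise_article")], grade)
print(sorted(memory))
```

After two runs the memory holds only the skills that were graded useful, which is the "gets smarter after each run" effect in miniature.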

The Memory System

It does not rely only on skill memory; it also uses FTS5 full-text search over past conversations, so relevant context can be found easily in a long corpus of text.
A curator agent runs in the background and uses GEPA to keep the context strong and relevant. In other words, the agent itself discards memory that it considers obsolete.
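For readers who have not used FTS5, here is a minimal sketch of full-text search over past messages with SQLite's FTS5 extension from Python. The table name, columns and sample rows are assumptions for illustration and say nothing about how Hermes Agent lays out its own store. It needs an SQLite build with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# An FTS5 virtual table; session_id is stored but not indexed for search.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "Set up the cron job for the daily digest"),
        ("s2", "Debugged Gmail SMTP authentication failure"),
        ("s3", "Compared cron with Windows Task Scheduler"),
    ],
)


def search(query: str) -> list:
    """Return (session_id, content) rows matching the query, best match first."""
    rows = conn.execute(
        "SELECT session_id, content FROM messages WHERE messages MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


cron_hits = search("cron")
smtp_hits = search("smtp")
print(cron_hits)
print(smtp_hits)
```

The MATCH query uses an inverted index instead of scanning every message, which is why this kind of search stays fast as the conversation history grows.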

Hermes agent's Core Architecture

Interface layer - where the user communicates with the agent (CLI, Telegram, WhatsApp etc.); it supports 20+ platforms
Agent core - the LLM, planner and tool dispatcher reside here
Tool layer - 40+ built-in tools that the agent uses, such as web search, browser automation, vision, file system and code execution
Memory system - the GEPA loop
Output layer - the final result in the form of text, files, emails and code

It costs nothing while idle and provides MCP support as well. You can run it on a $5 VPS, a GPU cluster, or serverless infrastructure through Daytona or Modal. Even when you are away, for example after starting a research task from your phone, the work will be finished by the time you come back, without keeping your local machine running. That's the power of Hermes Agent.

Courtesy: Claude for Architecture Diagram

your personal AI newspaper

Anushka Singh — Fri, 29 May 2026 16:24:19 +0000

This is a submission for the Hermes Agent Challenge

What I Built

Well, we are always in a hurry in the morning, grabbing that favourite mocha with toast in hand. Making time for the news at that hour sounds unrealistic for professionals who are in a rush and struggle to make time even for themselves. Reading a physical newspaper and digging through it for insights is time-consuming, and AI news apps at the very least demand a subscription and display long summaries. What I bring to you all is Daily Digest Agent, a self-hosted, fully autonomous AI agent that researches the news on your chosen topics every morning and delivers a structured briefing straight to your inbox.
This is not a usual news digest tool with a fixed prompt and hardcoded URLs. Hermes Agent runs a real multi-step loop: it plans its own searches, decides which articles are worth reading, fetches and extracts full article content, spots cross-topic patterns, and writes a clean digest. Then it emails it to you. Every single day, automatically, for free. The agent also gets smarter over time. After each run, it updates a skill memory log, tracking which topics returned rich results and which ran dry. By run 5, it's already adjusting its own search strategy without any input from you.
The problem it solves is information overload. Rather than clicking through multiple tabs, your inbox becomes the one stop for news highlights compiled by an agent that knows your interests rather than one that optimises for clicks.

Demo

How the mail is displayed:

Code

s17anushka / daily-digest

📰 Daily Digest Agent

Your personal AI newspaper, delivered every morning.

Every morning, this agent wakes up, researches the news on topics you care about, reads the actual articles, spots the patterns, and sends you a clean digest — straight to your inbox. No subscriptions. No paywalls. No noise.

It doesn't just call an LLM once. Hermes runs a real agentic loop — planning, searching, reading, reasoning — and gets smarter with every run.

What it looks like

=== Daily Digest Agent starting — Friday, May 29 2026 ===
Hermes iteration 1/20
  Tool call: web_search → "artificial intelligence news May 29 2026" → 10 results
Hermes iteration 2/20
  Tool call: web_fetch  → reading techcrunch.com/...
Hermes iteration 3/20
  Tool call: web_search → "Indian startup funding round May 2026" → 10 results
...
Digest saved to output/digest_Friday_May_29_2026.md
Email sent to

…

View on GitHub

My Tech Stack

Hermes Agent (NousResearch) — agentic loop architecture
OpenRouter (auto routing) — LLM backbone, free tier
Python 3.10+ — core language
httpx — async HTTP requests
Google News RSS — real-time news, no API key needed
trafilatura — article text extraction
Gmail SMTP — email delivery
Windows Task Scheduler / cron — daily automation

How I Used Hermes Agent

I repeat: it's an engine, not a wrapper. It is a multi-step agentic loop inspired by the Hermes Agent core architecture.
Tool calling - Hermes receives two tools (web_search and web_fetch) and autonomously decides which queries to run, which articles are worth reading, and when it has gathered enough to write the digest. No hardcoded steps.
Self-termination - The agent runs until it emits DIGEST_COMPLETE on its own. It decides when it's done.
GEPA-style skill memory - After every run, a skill log is updated with what topics were searched and what patterns worked. On the next run, this context is fed back to the agent, so it gets progressively sharper at finding relevant stories. After 5+ runs you can see it adjusting its own search queries.
Scheduled autonomy - Runs every morning via cron/Task Scheduler with zero human input. Fully autonomous end-to-end.

What makes this different from a simple LLM call is that the agent plans, executes, reads, reasons, and self-improves — exactly the capabilities Hermes Agent helps with.
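The shape of such a loop can be sketched in a few lines. This is a simplified illustration, not the code from the repository: the tools are stubs, and `scripted_model` is a hypothetical stand-in for the LLM that replays a fixed plan. The tool names, the DIGEST_COMPLETE marker and the 20-iteration budget come from the post; everything else is an assumption.

```python
from typing import Callable, Dict, List

MAX_ITERATIONS = 20              # the iteration budget shown in the run log
DONE_MARKER = "DIGEST_COMPLETE"  # emitted by the agent when it decides it is done


def web_search(query: str) -> str:
    """Stub tool: a real version would query a news source."""
    return f"3 results for '{query}'"


def web_fetch(url: str) -> str:
    """Stub tool: a real version would download and extract the article."""
    return f"article text from {url}"


TOOLS: Dict[str, Callable[[str], str]] = {"web_search": web_search, "web_fetch": web_fetch}


def scripted_model(history: List[dict], skill_log: List[str]) -> dict:
    """Stand-in for the LLM (assumption): picks the next action from a fixed script."""
    script = [
        {"tool": "web_search", "arg": "ai news"},
        {"tool": "web_fetch", "arg": "https://example.com/ai-story"},
        {"final": "1. AI story summary\n" + DONE_MARKER},
    ]
    return script[len(history)]


def run_agent(model: Callable[[List[dict], List[str]], dict], skill_log: List[str]) -> str:
    """Let the model choose tools until it emits the done marker or the budget runs out."""
    history: List[dict] = []
    for _ in range(MAX_ITERATIONS):
        action = model(history, skill_log)  # the skill log is fed back as context
        if DONE_MARKER in action.get("final", ""):
            # Record what was searched so the next run can build on it.
            skill_log.extend(h["arg"] for h in history if h["tool"] == "web_search")
            return action["final"].replace(DONE_MARKER, "").strip()
        observation = TOOLS[action["tool"]](action["arg"])
        history.append({**action, "observation": observation})
    raise RuntimeError("agent did not finish within the iteration budget")


skill_log: List[str] = []
digest = run_agent(scripted_model, skill_log)
print(digest)
print(skill_log)
```

The iteration cap is the safety net: self-termination is the normal exit, but a model that never emits the marker still cannot loop forever.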

UttarCheck — AI-based Handwritten Answer Evaluator Built with Gemma 4

Anushka Singh — Thu, 21 May 2026 17:05:13 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

I dedicate this project to students in the 6th to 12th grades in India. The idea emerged from my own extensive use of AI, whether chatbots or RAG applications, to understand concepts; but the scenario is different for school-going students. They write exams and need personalised evaluation of their handwritten answers, because many students, for example those living in remote areas, still cannot access premium education. As pupils in this age bracket have developing minds, I decided to build something that points out their mistakes gracefully while suggesting improvements, so that they feel confident and prepared before the examination and can write accurate answers in their exams.
Implementing the Gemma 4 model in this project is a simple concept. To add a personal story: I needed apps like this to guide me when writing my UG exams, let alone to push me to write in legible handwriting. I believe this app will be genuinely helpful for secondary school students.
The student takes a picture of their answer to any question and gets a score on a scale of 1 to 10 along with detailed feedback in both Hindi and English. It works for answer sheets written in either Hindi or English, and I wish to expand the evaluator to more languages, because my country is diverse and every student in the land should benefit.

Demo

I gave the UttarCheck application an answer on photosynthesis. For now I am exposing my localhost through ngrok, and I recorded the session. I accessed the Gemma 4 model through Google AI Studio.

AI handwritten answer evaluator - Google Drive

drive.google.com

Code

The GitHub repo has a detailed explanation of my project:

s17anushka / UttarCheck

UttarCheck is an AI-powered handwritten answer evaluation system built using Gemma 4. It analyzes student answer sheets, generates scores, detects mistakes, and provides detailed feedback in both Hindi and English. The platform is designed for Indian classrooms and supports intelligent evaluation of handwritten responses across multiple subjects.

██╗   ██╗████████╗████████╗ █████╗ ██████╗  ██████╗██╗  ██╗███████╗ ██████╗██╗  ██╗
██║   ██║╚══██╔══╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝██║  ██║██╔════╝██╔════╝██║ ██╔╝
██║   ██║   ██║      ██║   ███████║██████╔╝██║     ███████║█████╗  ██║     █████╔╝
██║   ██║   ██║      ██║   ██╔══██║██╔══██╗██║     ██╔══██║██╔══╝  ██║     ██╔═██╗ 
╚██████╔╝   ██║      ██║   ██║  ██║██║  ██║╚██████╗██║  ██║███████╗╚██████╗██║  ██╗
 ╚═════╝    ╚═╝      ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝╚═╝  ╚═╝╚══════╝ ╚═════╝╚═╝  ╚═╝

AI-Powered Handwritten Answer Evaluator for Indian Students

Built for the Gemma 4 Challenge on DEV.to

The Problem

India has 250 million school students. Most write handwritten answers for board exams — CBSE, UP Board, ICSE. Getting feedback means waiting days for a teacher.

UttarCheck changes that.

Photograph your handwritten answer. Get instant AI evaluation — score, mistakes, improvement tips — in Hindi and English. Powered by Gemma 4 running multimodal vision inference.

Demo

Student photographs answer sheet
           ↓
    UttarCheck processes image
           ↓
  Gemma 4 reads handwriting
           ↓
  ┌─────────────────────────┐
  │  Score: 9/10  Grade: A+ │
  │  Subject: Science       │
  │                         │
  │  ✅

…

View on GitHub

How I Used Gemma 4

I chose the Gemma 4 API for UttarCheck because of its native multimodal capability. With a normal text model we would have to add an OCR step to read the handwritten content, whereas Gemma 4 reads directly from the image.
I specifically chose gemma-4-26b-a4b-it (the Mixture-of-Experts variant) for three reasons:

Multimodal vision - Gemma 4 receives the answer sheet photo as base64-encoded image data alongside the evaluation prompt. It reads the handwriting, identifies the subject, detects the question from context, and evaluates content quality, all in a single inference call.
MoE efficiency - The 26B MoE model activates only ~4B parameters per inference. For an educational tool expecting many concurrent students, this means faster response times and lower API cost compared to a dense model of equivalent quality.
Bilingual reasoning - Gemma 4 generates feedback simultaneously in Hindi and English without any translation layer. Indian students think in Hindi but study in English, so having both in one response is genuinely useful.

The Multimodal Payload

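import base64

# Hypothetical file path for illustration; the photo is sent as base64 text.
with open("answer_sheet.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "contents": [{
        "parts": [
            # The image and the instruction travel together in one request.
            {"inline_data": {"mime_type": "image/jpeg", "data": image_b64}},
            {"text": "Evaluate the handwritten answer in this image..."},
        ]
    }],
    # Low temperature keeps the grading consistent between runs.
    "generationConfig": {"temperature": 0.2, "maxOutputTokens": 1024},
}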

Gemma 4 does not always return a pure JSON response. It reasons through the evaluation step by step, which produces better results but wraps the final JSON in prose and markdown fences. I describe how I handled this in the GitHub README. I worked out the strategy with help from various LLMs, because it was something new to me.
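One common way to recover the JSON from such a reply is sketched below. It is a generic approach and not necessarily what the UttarCheck README does: first look for a fenced block, then fall back to the outermost pair of braces. The `extract_json` name and the `score` and `feedback` fields in the sample reply are assumptions for illustration.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def extract_json(reply: str) -> Optional[dict]:
    """Pull the first valid JSON object out of a reply that may contain prose and fences."""
    candidates = []

    # 1) Prefer the contents of a fenced block, with or without a "json" tag.
    fenced = re.search(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, reply, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))

    # 2) Fall back to everything between the first "{" and the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidates.append(reply[start:end + 1])

    for text in candidates:
        try:
            value = json.loads(text)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


fenced_reply = (
    "Let me evaluate this step by step.\n"
    + FENCE + "json\n"
    + '{"score": 7, "feedback": {"en": "Good diagram", "hi": "Accha chitra"}}\n'
    + FENCE + "\nHope this helps!"
)
plain_reply = 'The final result is {"score": 5} for this answer.'

print(extract_json(fenced_reply))
print(extract_json(plain_reply))
print(extract_json("I could not read the handwriting."))
```

Returning None instead of raising lets the caller decide whether to retry the request or show a fallback message to the student.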

Edge Deployment

The architecture is designed with Gemma 4 E4B in mind. When the edge variant becomes available via API, UttarCheck will run fully on-device — no internet, no server, student data never leaving the phone. The gemma_service.py already has an Ollama backend wired in for local inference today.

Here's to the application's success. I hope it proves to be ahead of its time as the project moves from viability to effective usability.
Thank you !

Local AI on the move

Anushka Singh — Mon, 18 May 2026 01:11:07 +0000

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Google recently released Gemma 4, a family of intelligent open AI models.
For the first time, AI that lived behind cloud infrastructure is realistically becoming accessible on the edge. While exploring browser privacy concepts in my recent project, I understood the importance of AI running directly on devices; yet since the advent of AI, powerful open models have felt too distant to use locally. I used MediaPipe for the local privacy shield, but there were too many restrictions that I was unable to tackle. I am an AI practitioner who often implements ML pipelines, not someone with expertise in security. Still, if billions of parameters held on local hardware can work effectively during inference, then the rise of personal AI could accelerate.
I know many of my colleagues will argue with me: am I a stranger to the capabilities of the cloud? No!
Since we know edge computing reduces latency for critical communication, such as in defence and medicine, I would say we should rely a bit less on cloud infrastructure and make our edge computing paradigm stronger.

Personal AI can be thought of as analogous to personalised medicine. In traditional medicine, each person responds differently to medications, so people need medicine and healthcare that is individual to them. Scientists, researchers, hospitals and industry are all working on personalised medicine. Similarly, personalised AI will enhance human creativity manifold: it will not only learn from the user's context but also remain capable of fetching the world's information that is relevant, useful and novel. From security to education at the grassroots level, AI-enabled IoT using open models like Gemma 4 will reach remote and rural areas. AI technology has itself made the educated masses more informed, creative and enthusiastic about new things. When knowledge was a far-fetched concept for people outside the premium locations of a country, we saw online content spread at lightning speed a few years back; AI and the cloud have since done wonders for human productivity. AI can help you with anything, no matter which field you are in!

What about Gemma 4 and what it can do in the upcoming times

Gemma 4 models have billions of parameters, a longer context window and superb mathematical capabilities
26B and 31B models for your personal computers
26B is great for lightweight multimodal experiments, local assistants, summarization and RAG systems
Developers can choose a model according to their use case and get the most out of it.

Now to 31B, which has better reasoning than its smaller counterpart and can be used for complex coding and the advanced reasoning we need (sarcasm aside: do not trade off your own reasoning, for real).
Both run on an 80GB NVIDIA H100 GPU.

E2B and E4B models: the concept of AI everywhere
Their distinguishing feature is that, being lightweight, they do not drain battery life, and I found it interesting that they let you bypass sending sensitive data to the cloud for processing. Good use cases are speech recognition and OCR, thanks to their native audio and video processing.

These models are optimised through quantization, which reduces the memory and compute requirements of large models so that they can run on consumer hardware... OK, now I can run inference locally, wonderful!
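To show what quantization does, here is a generic sketch of symmetric int8 quantization plus a back-of-the-envelope memory estimate. It illustrates the idea only and is not the scheme used for Gemma 4; the example weights are made up, and the memory figure counts weights alone, ignoring activations and caches.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric int8 quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    return [round(w / scale) for w in weights], scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes."""
    return params_billion * bits_per_weight / 8


weights = [0.82, -1.27, 0.003, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quantized, scale)
print(max_error)
print(weight_memory_gb(26, 16), weight_memory_gb(26, 4))
```

Each weight shrinks from 16 bits to 8 (or 4) at the cost of a rounding error of at most half the scale, which is the trade that lets a large model fit on consumer hardware.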

Future scope

Low-resource environments will benefit
Security will be redefined
Local healthcare assistance

Challenges

The deployment complexity of lightweight open models still persists, but you can follow Google's The Keyword blog for new releases to keep things simple.
Energy usage is a real concern, and hardware costs will be high in the initial years.

This is good news for researchers: you do not need to rely on APIs or research labs, and you get more freedom to build prototypes and study model behaviour. Infrastructure dependency is becoming minimal here.

Sentinel AI privacy shield- a failed project

Anushka Singh — Sun, 17 May 2026 09:34:32 +0000

Failures come unexpectedly, and it amazes you how far you can push your limits and still not keep up with the debugging. For 5 days I have been racking my brain over a Chrome extension built for security: when students attempt exams online, or when someone is doing confidential work on their system, the screen blurs, and only the authorised user gets access again on returning to the screen. Let me show you what I am giving my time to!

Chrome Extension · Manifest V3 · Local Face Detection
Zero cloud. Zero data leakage. 100% local MediaPipe inference.

How did the ideation begin

Lately I have been studying TensorFlow.js (tf.js), an open-source JavaScript library for machine learning that runs AI models directly in the browser or on Node.js. I got the idea of creating a security-related project rather than just a basic object detection project. I am really frustrated that, even after taking help from AI, I have not been able to finish this project.

Problem

I never studied core cybersecurity courses, but I was trying to combine AI and privacy and hold them together as a novice without any expert advice. Half knowledge sends you on a long detour.
It was a headache because web browsers have strict security policies, specifically Chrome Extension Manifest V3 (MV3) and Content Security Policy (CSP), which did not fit the core architecture of tf.js. Here are some more reasons, which I took from Gemini AI:

a) The unsafe-eval Deception (The Biggest Culprit)
To make neural networks run fast in a browser, TensorFlow.js uses dynamic code compilation. Under the hood, it dynamically generates JavaScript code strings at runtime and executes them using functions like eval() or new Function() to optimize mathematical matrix operations for your CPU or GPU.

The Wall: Chrome Manifest V3 completely bans unsafe-eval inside standard extension scripts to prevent hackers from executing malicious strings hidden inside extensions. The moment tf.js tried to run its optimization scripts, Chrome instantly blocked it, throwing the error: Uncaught EvalError: Evaluating a string as JavaScript violates the following Content Security Policy...
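A small sketch of how a script can probe for this restriction before loading a library that depends on it. The helper names are hypothetical; the only behaviour assumed is the standard one, where `new Function` throws under a CSP that lacks `unsafe-eval`.

```javascript
// Returns true when the current context allows turning strings into code.
function canEvaluateStrings() {
  try {
    // Blocked with an EvalError when the page's CSP lacks 'unsafe-eval'.
    return new Function("return 1 + 1")() === 2;
  } catch (err) {
    return false;
  }
}

// Hypothetical helper: a human-readable summary for logging.
function describeEvalSupport() {
  return canEvaluateStrings()
    ? "string evaluation allowed in this context"
    : "string evaluation blocked by CSP";
}

console.log(describeEvalSupport());
```

Run in plain Node.js or an ordinary web page this normally reports that evaluation is allowed; inside an MV3 extension page it should report that it is blocked.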

b) Missing Core Features when Forced to Fall Back
When we tried to force tf.js into a "safe" environment or bypass its initialization errors, the framework automatically disabled its dynamic engine and fell back to a basic CPU execution mode.

The Wall: Because the engine initialized in a crippled, partial state, complex downstream models like MobileNet couldn't find their required dependencies. This triggered the second error you saw: Uncaught (in promise) TypeError: a.loadGraphModel is not a function. The framework literally failed to construct its own loading sub-routines because the compiler was blocked halfway through execution.

c) Remote CDN Injections are Illegal in MV3
In older Manifest V2 extensions, developers easily bypassed file size limitations by pointing a script tag to an external link like https://cdn.jsdelivr.net/....

The Wall: Manifest V3 strictly mandates that all code executed by the extension must be packaged locally inside the extension zip. It blocks remote scripts to prevent extensions from fetching modified malicious code from the internet after being approved by the Chrome Web Store. When we tried to load tf.js via a CDN, the browser blocked the network request entirely.

d) Massive File Size & WebAssembly Constraints
TensorFlow.js is a heavyweight library. The minified core library, along with the MobileNet weights and layers, spans several megabytes. When loaded locally in an extension popup, it causes severe latency, making the popup feel sluggish.

Furthermore, to run properly without eval, TensorFlow.js relies heavily on WebAssembly (WASM) binaries (.wasm files). Chrome Extensions isolate execution spaces so aggressively that passing heavy WASM buffers between a background script, a popup, and an injected webpage creates a massive data-sharing bottleneck.

I switched to Google MediaPipe, which was easier to implement and is designed for building ML pipelines that process live video, audio and sensor data. The snapshots I attached above are from the MediaPipe version, which works fine, but there was an issue.

Problem

The screen was locking successfully, but when I tried to work in the tab, the extension disappeared as soon as I clicked anywhere on the screen. I wanted the extension to be useful, so that I could scroll and type on the page while it kept monitoring for an unauthorized user, but no..

Primary Architecture overview

┌─────────────────────────────────────────────────────────┐
│  Chrome Browser                                         │
│                                                         │
│  ┌──────────────────────────────┐                       │
│  │  popup.html (Extension Page) │                       │
│  │                              │                       │
│  │  ┌─────────┐  getUserMedia   │                       │
│  │  │ Camera  │──────────────►  │                       │
│  │  └─────────┘   <video>       │                       │
│  │       │                      │                       │
│  │  offscreen <canvas>          │                       │
│  │  raw ImageData               │                       │
│  │       │  postMessage         │                       │
│  │       ▼  (Transferable)      │                       │
│  │  ┌──────────────────────┐    │                       │
│  │  │  sandbox.html        │    │                       │
│  │  │  (allow-eval CSP)    │    │                       │
│  │  │                      │    │                       │
│  │  │  MediaPipe FaceMesh  │    │                       │
│  │  │  (local WASM)        │    │                       │
│  │  │  0 faces → LOCK      │    │                       │
│  │  │  1 face  → UNLOCK    │    │                       │
│  │  │  >1 face → LOCK      │    │                       │
│  │  └──────────┬───────────┘    │                       │
│  │             │ postMessage    │                       │
│  │             ▼                │                       │
│  │        popup.js              │                       │
│  │             │ sendMessage    │                       │
│  └─────────────┼────────────────┘                       │
│                ▼                                        │
│  ┌─────────────────────────┐                            │
│  │  content.js             │                            │
│  │  (injected into page)   │                            │
│  │                         │                            │
│  │  "lock"  → blur page    │                            │
│  │           show overlay  │                            │
│  │  "unlock"→ unblur page  │                            │
│  │           hide overlay  │                            │
│  └─────────────────────────┘                            │
└─────────────────────────────────────────────────────────┘
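The decision rule in the diagram can be sketched as a pure function. The message shape below is a hypothetical illustration, not the code from the deleted repository.

```javascript
// Exactly one face means the authorized user is alone at the screen.
function decideAction(faceCount) {
  return faceCount === 1 ? "unlock" : "lock";
}

// Hypothetical message that popup.js would forward to content.js.
function buildMessage(faceCount) {
  return { action: decideAction(faceCount), faceCount };
}

// In the extension, popup.js would pass this object to chrome.tabs.sendMessage
// and content.js would blur or unblur the page based on `action`.
console.log(buildMessage(0), buildMessage(1), buildMessage(2));
```

Keeping the rule separate from the camera and messaging code means it can be tested without a browser.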

I was excited that my Chrome extension would get submitted to the Chrome Web Store and that I would go gala.. haha, novice.
I asked Claude AI how I could do this:

The suggestion

Chrome MV3 has a chrome.offscreen API, which creates a hidden background page that provides access to the DOM and the camera even without the popup.
A popup is similar to a regular browser window: the moment it closes, all its JS files and the camera stream are lost.
Again THE PROBLEM arose: the offscreen API is strict about allowing or disallowing camera permissions, but that was not the issue.
I asked about this again and it suggested creating a side panel so that the camera never closes, but it looked ugly!
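For reference, a minimal sketch of how an offscreen document is usually requested. It assumes the "offscreen" permission is declared in the manifest; the file name offscreen.html and the justification text are placeholders, and this is not the code I actually ran.

```javascript
// Options for chrome.offscreen.createDocument.
function buildOffscreenOptions() {
  return {
    url: "offscreen.html", // hypothetical page bundled with the extension
    reasons: ["USER_MEDIA"], // the document needs getUserMedia streams
    justification: "Keep the camera stream alive for local face detection",
  };
}

// Creates the offscreen document only if none exists yet.
// `api` is the `chrome` global when called from the extension's service worker.
async function ensureOffscreenDocument(api) {
  if (await api.offscreen.hasDocument()) return false;
  await api.offscreen.createDocument(buildOffscreenOptions());
  return true;
}
```

In a service worker this would be called as `ensureOffscreenDocument(chrome)` before starting detection.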

I dropped all the ideas and shut the laptop down. I should have understood the facts and read the security docs and papers first. Anyway, I deleted the repository because I wanted it to work for me and it didn't; I cannot sell this any further.

It is kind of a failed prototype, and I need to come back to this project in the near future after making informed decisions and choosing an unconventional but right architecture.

Needs clarification more than perfection

Any comments?

From prompt to playable

Anushka Singh — Thu, 14 May 2026 08:05:31 +0000

Long time no see!
A bit of story: back in 2024 I came across a Google Cloud webinar and readily joined the session, which was insightful and edgy. They invited me to join the Google Developer Program to thrive in the fraternity of coders and developers.
To my surprise, while I was recently scrolling through my profile, looking at earned badges and exploring various tabs, I found that they provide learning resources as well. Honestly, I had not remembered that they let developers learn various skills, especially when they need to incorporate something into their application, which complements classroom learning.
I searched through the learning (Codelabs) page to see whether I could jump on a skill that is new and in demand nowadays.
I clicked...
Vibe-code a kids game with Gemini and publish with Firebase!
I was missing my 6 year old. I had left her at home without telling her, and the guilt arose each time I saw this little child in pictures, and even randomly when I saw my birds. I was determined to make this project to feel good. This was my first intention.
The project is based on JavaScript, specifically on p5.js, and is deployed on Firebase.

What is p5.js?

I was curious too when I was going through the tutorial pages, because I am very new to JavaScript; in fact, I learned Vue.js only recently for the frontend while creating my agentic AI project...
p5.js is a free, open source JavaScript library designed for creative coding. It gives you special commands for drawing shapes, adding colours and creating animations, which are displayed in the web browser as you code, and it is easier to use than plain JavaScript, which is normally used to make websites work for different use cases.
p5.js is a great tool for artists and beginners who want to create beautiful animations and flex their creativity through coding.

THE GAME

As per the tutorial, the prompt given was for a pixel dinosaur that hops over every obstacle, like the game you can play in the Chrome browser when the internet is not connected. I loved that prompt and the code worked on the first go. However, I didn't like the game when I thought about the little kid at home, so I told Gemini about my sharp-minded 6 year old and that kids at this age need to be engaged in a fun activity.

This is how the Clever-Fox-Quest game came to life: a stress-free educational colour matching game for kids, in which they move the mouse to steer a fox left and right and match the correct colours displayed on screen.

Every 5 correct matches change the colours, so the game won't bore the children.
When 30 points are reached, fireworks are shown with a YOU WIN! message on the screen.
For the controls, you can press the spacebar to restart the game.
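These rules can be sketched as a small state machine. This is not the code Gemini generated for the game: the names are hypothetical, and one point per correct match is my assumption, since only the thresholds 5 and 30 are described above.

```javascript
const MATCHES_PER_PALETTE = 5; // colours change after this many correct matches
const WIN_SCORE = 30; // fireworks and YOU WIN! at this score
const POINTS_PER_MATCH = 1; // assumption: one point per correct match

function createGame() {
  return { score: 0, matches: 0, paletteIndex: 0, won: false };
}

// Returns the next state after one correct colour match.
function recordMatch(state) {
  if (state.won) return state; // nothing changes after winning
  const score = state.score + POINTS_PER_MATCH;
  const matches = state.matches + 1;
  return {
    score,
    matches,
    paletteIndex: Math.floor(matches / MATCHES_PER_PALETTE),
    won: score >= WIN_SCORE,
  };
}

// The spacebar restarts the game; every other key is ignored here.
function handleKey(state, key) {
  return key === " " ? createGame() : state;
}
```

There is deliberately no losing state, which matches the stress-free design.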
Gemini suggested that kids do not like to be defeated, which makes them irritable and cranky.
I know kids nowadays are way too advanced for simple games, but this game has nice animation..
My second reason for choosing this tutorial was to work hands-on with Firebase. Deployment is a pain for me when I use AWS services for a full stack application; I will come back to this some other day!

What is Firebase

Firebase is a platform created by Google that gives developers tools to build, host and manage web and mobile apps.
For now I used Firebase Hosting to deploy the game on the internet by running the firebase deploy command in my VS Code terminal. I can use other Firebase services and tools to expand my game, such as adding leaderboards with Firebase's built-in databases, or other functionality like player accounts.
You can see the preliminary stage of the game by clicking the link below:

clever-fox-quest-anushka-99c65.web.app

If she says she wants something more interesting and engaging, I will have to research more into what keeps these children locked in.
The fox follows the cursor so smoothly because the p5.js library is built around 2 main functions:
setup() and draw()
setup() runs only once; it loads the images and sets the starting score to 0.
draw() is the miraculous part of the code, which usually runs 60 times per second. Every fraction of a second, it wipes the screen clean and redraws the fox, the gems, and the score in their slightly new positions. This is what creates the illusion of smooth animation. This is why it looks like art.
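A stripped-down sketch of that loop, with a circle standing in for the fox image. The canvas size, the fox width and the clampFoxX helper are my own placeholders and not the game's actual code.

```javascript
// Pure helper: keep the fox inside the canvas while it follows the mouse.
function clampFoxX(mouseXValue, canvasWidth, foxWidth) {
  const half = foxWidth / 2;
  return Math.min(canvasWidth - half, Math.max(half, mouseXValue));
}

let score;

// p5.js calls setup() once when the sketch starts.
function setup() {
  createCanvas(400, 300);
  score = 0;
}

// p5.js calls draw() for every frame.
function draw() {
  background(220); // wipe the previous frame
  const x = clampFoxX(mouseX, width, 40);
  circle(x, height - 30, 40); // placeholder for the fox
  text("Score: " + score, 10, 20);
}
```

Because draw() repaints everything from the current mouse position on each frame, the fox appears to glide rather than jump.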

Any questions are welcome, and we can collaborate on creative projects.
I hope she likes the game so that I can feel low-key talented for some time hehehe!

Glimpses of Agentic AI practices

Anushka Singh — Wed, 14 Jan 2026 18:28:51 +0000

In continuation of my Agentic AI learning, I built a project, complaint-triage-system, which automates a process with the help of an LLM API. For the database I used SQLAlchemy, and when an admin wants to check the lodged complaints on a separate dashboard, they have to authenticate with their email and password, secured by a JWT token. It took me a long time because I first used a Gemini API key and eventually had to revoke it because of some glitch (as usual). Oh, I completely forgot to say why I used an API key: it triaged each complaint's status as high/medium/low and analysed the submitted complaint. The twist came when I wanted to add a send-reply email button on the admin side, along with regenerate and edit buttons.
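One detail of the triage step can be sketched in code: the model replies in free text, so the reply has to be reduced to one of the three labels. The project's own code is not shown in this post, so the function below is a hypothetical illustration (written in JavaScript, like the other examples in this feed), with "medium" as an assumed fallback.

```javascript
// Pulls the first recognised priority word out of a free-text model reply.
function parsePriority(reply, fallback = "medium") {
  const text = String(reply).toLowerCase();
  const match = text.match(/\b(high|medium|low)\b/);
  return match ? match[1] : fallback;
}

console.log(parsePriority("Priority: HIGH, service outage reported"));
console.log(parsePriority("I am not sure about this one"));
```

A fallback keeps the dashboard usable when the model answers with something unexpected.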

I don't know how, but my Gemini key stopped working. I rushed into changing models, but it didn't help. I switched to the grok API because it was free; triage worked correctly and the analysis was a bit short (very specific in keywords). In the middle of all this, JWT-based tokenization was a pain to use, and I had to drop OTP-based authentication on email entry because the OTP was showing up in my backend server. I got so confused. My biggest red flag was the lack of a system design or a clear workflow, and I mean it: you need not only inspiration but also clarity in your aim...

LET'S get back to the topic. I just made a create_admin file and added my email and password to the env. After this, everything worked fine, and I finally wanted an AI-generated, customized, editable reply to be sent to the user. I used an app password from one of my email IDs, but again it didn't work out. What came as my saviour was Twilio SendGrid. With a SendGrid API key, my application worked without bugs. I edited my reply to add a human touch and specific details to the email; however, the edited reply was not making it into the payload. I debugged one last time and, like magic, it is running smoothly.

Deploying is another pain. I tried AWS Elastic Beanstalk, but there was an http/https mismatch because I was using AWS Amplify (which uses https). I turned to EC2 and tried installing nginx, but the timeouts on the free tier and the repeated commands exhausted me... This is how my full-stack application came to life, and the GitHub commits (more than 11) have a meme base of their own.
Today my AWS free tier expired, and I have multiple quests: to create many projects and to contribute.
P.S. As my project grows, I intend to have the AI route each complaint to a specific department according to the user's needs, for faster issue resolution.

Whitepapers, Labs and loads of learning

Anushka Singh — Mon, 08 Dec 2025 18:47:25 +0000

This is a submission for the Google AI Agents Writing Challenge: Learning Reflections

Agentic AI is the evolution of automation which is agile in adaptive
decision making.

Fellas! It was truly an amazing event packed with practical implementation.

I started with Building your first Agent; it was quite fun and satisfying to record an agent at work. The learning is based on using ADK and Gemini, for the API keys and a compatible model. As I geared up on day 1, I got to know the very basics of what Agentic AI is and how it works, then the types of agents and the clear architecture workflows in multi-agent systems.

I caught up with Interoperability with MCP and tools to understand the deeper side of orchestration behind an agent's success. The best part was that you can connect the documents to NotebookLM and learn the core concepts much better. They even provided a summary podcast created by NotebookLM.

I was able to call the get tiny image tool from an MCP server to test on my localhost, and it worked. Furthermore, I wanted the agent to output an image of an anime girl when asked, just to work with a different MCP server, such as the Replicate MCP server. There was a glitch and I moved on to day 3; however, on day 2 I also worked with an agent with approval, which I definitely wouldn't have missed.
Day 3 brought the dawn of context engineering: Sessions and Memory, one of my favourite topics. I put pen to paper and dived into how to make an agent stateful. The labs were my only resources, cutting to the chase for any beginner, and I am glad everything went so smoothly while I learned.

Day 4 added Responsible AI: even if an agent is capable of solving a problem, should it actually do so? Agents must be evaluated too.
From glass-box and black-box evaluations to the pillars of observability, there was proper guidance depending on each professional's role.

Day 5 was all about Prototype to Production: deploying the agent.
On a good note, I would say the event was more than worthwhile.
The curated playbooks and the steps for developing agents on Kaggle are the foundation for my next move, which is developing an Agentic AI project end to end.

Thank you for the day