<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prakash Thapa</title>
    <description>The latest articles on DEV Community by Prakash Thapa (@pt67).</description>
    <link>https://dev.to/pt67</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F818612%2F826b6d75-e233-4de5-b1df-8dd07b88cb75.jpeg</url>
      <title>DEV Community: Prakash Thapa</title>
      <link>https://dev.to/pt67</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pt67"/>
    <language>en</language>
    <item>
      <title>Auth0 for AI Agents Challenge</title>
      <dc:creator>Prakash Thapa</dc:creator>
      <pubDate>Thu, 16 Oct 2025 16:02:01 +0000</pubDate>
      <link>https://dev.to/pt67/auth0-for-ai-agents-challenge-fhi</link>
      <guid>https://dev.to/pt67/auth0-for-ai-agents-challenge-fhi</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/auth0-2025-10-08"&gt;Auth0 for AI Agents Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;This application is a sophisticated AI Agent Assistant that demonstrates how to build secure, trustworthy, and controllable AI agents. At its core, it's a chat interface where a user can interact with an AI powered by the Gemini API. Its unique value, however, lies in the simulation of a robust permissions layer, conceptually powered by Auth0 for AI Agents. This layer gives end users granular control over the AI's capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem It Solves&lt;/strong&gt;&lt;br&gt;
As AI agents become more integrated into our digital lives, they require access to personal data and the ability to perform actions on our behalf (e.g., read emails, book meetings, query databases). This creates a significant security and privacy challenge: how do you ensure an AI agent only does what it's explicitly allowed to do and only accesses the information it's permitted to see?&lt;/p&gt;

&lt;p&gt;This application directly addresses this problem by providing a clear model for:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Secure Scoping:&lt;/strong&gt; The AI's abilities are not limitless. They are strictly defined by the "Tools" (actions it can take) and "Knowledge Sources" (information it can access) that the user grants it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User-Centric Control:&lt;/strong&gt; After authenticating via Auth0, the user acts as an administrator for their own AI agent, dynamically connecting tools or granting access to knowledge through a simple, intuitive UI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Enforced Boundaries:&lt;/strong&gt; The application uses Gemini's systemInstruction feature to create a secure operational boundary for the AI. The agent is explicitly instructed to adhere to its granted permissions, refuse requests that fall outside them, and be transparent about which tools or knowledge sources it is using.&lt;/p&gt;

&lt;p&gt;In essence, this application provides a blueprint for building agentic AI systems that are not only powerful but also secure, transparent, and trustworthy. It moves beyond simple chatbots to showcase a future where users can confidently delegate tasks to AI assistants, knowing they retain full control over their digital footprint.&lt;/p&gt;
&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/pt67/agent_auth" rel="noopener noreferrer"&gt;https://github.com/pt67/agent_auth&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc0c3q70chpy6worihrn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcc0c3q70chpy6worihrn.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem13els7b1ggey2ji551.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fem13els7b1ggey2ji551.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/NoA8_eonxFM"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Auth0 for AI Agents
&lt;/h2&gt;

&lt;p&gt;This application simulates a secure, user-controlled AI assistant by integrating Auth0 for AI Agents in the following ways:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User Authentication &amp;amp; Agent Ownership:&lt;/strong&gt; Auth0 handles user login and identity verification, ensuring that each AI agent session is tied to a specific authenticated user. This establishes clear ownership and accountability for agent actions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Granular Permission Scoping:&lt;/strong&gt; After authentication, users can selectively grant access to tools (e.g., calendar, email, database) and knowledge sources. These permissions are scoped and stored per session, mimicking OAuth-style delegation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Capability Management:&lt;/strong&gt; The frontend UI allows users to toggle tools and data access dynamically. Auth0 ensures that only authenticated users can modify these scopes, and the agent reflects these changes in real time.&lt;/p&gt;
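&lt;p&gt;As a rough illustration of the per-session scoping described above, the grant store behind such a toggle UI might look like the sketch below. This is hypothetical code, not the app's actual source; the function and field names are invented for the example.&lt;/p&gt;

```javascript
// Hypothetical per-session grant store mirroring the toggle UI described
// in the post. Names (toggleTool, knowledgeSources) are illustrative.
function createGrantStore() {
  const grants = { tools: new Set(), knowledgeSources: new Set() };
  return {
    // Flip a tool grant on or off for this session.
    toggleTool: function (name) {
      if (grants.tools.has(name)) { grants.tools.delete(name); }
      else { grants.tools.add(name); }
    },
    // Plain-object snapshot, e.g. to send alongside the next model call.
    snapshot: function () {
      return {
        tools: Array.from(grants.tools),
        knowledgeSources: Array.from(grants.knowledgeSources),
      };
    },
  };
}
```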

&lt;p&gt;&lt;strong&gt;System-Level Boundaries via Gemini:&lt;/strong&gt; The Gemini API is configured with systemInstruction prompts that enforce the agent’s operational boundaries. These instructions explicitly tell the AI to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Refuse actions outside its granted permissions&lt;/li&gt;
&lt;li&gt;Disclose what tools or data it's using&lt;/li&gt;
&lt;li&gt;Respect user-defined limits at all times&lt;/li&gt;
&lt;/ul&gt;
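&lt;p&gt;Those rules can be compiled into a systemInstruction string from the session's grants. The helper below is an illustrative sketch; the field names and wording are assumptions, not the app's real code.&lt;/p&gt;

```javascript
// Hypothetical helper that turns session grants into a systemInstruction.
// Field names (tools, knowledgeSources) are illustrative, not the app's code.
function buildSystemInstruction(grants) {
  const tools = grants.tools.join(", ") || "none";
  const knowledge = grants.knowledgeSources.join(", ") || "none";
  return [
    "You are a user-scoped AI assistant.",
    "Allowed tools: " + tools + ".",
    "Allowed knowledge sources: " + knowledge + ".",
    "Refuse any request that falls outside these grants.",
    "Always disclose which tool or knowledge source you are using.",
  ].join("\n");
}
```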

&lt;p&gt;&lt;strong&gt;Security-First Design:&lt;/strong&gt; The architecture models a future-proof approach to agentic systems: one where users remain in control, agents are transparent, and access is always intentional.&lt;/p&gt;

&lt;p&gt;This setup demonstrates how Auth0 can be used not just for login, but as a foundation for secure, trustworthy AI delegation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned and Takeaways
&lt;/h2&gt;

&lt;p&gt;Building this project was both challenging and rewarding. Here are some key reflections:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge: Simulating Real Agent Boundaries.&lt;/strong&gt; Designing a realistic permissions model for an AI agent required careful thought. It wasn't just about toggling features; it was about enforcing boundaries the agent would respect.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Challenge: Aligning UX with Security.&lt;/strong&gt; Creating a UI that felt intuitive while still communicating the gravity of access control was tricky. I learned how important it is to make permission management feel empowering, not burdensome.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: System Instructions Matter.&lt;/strong&gt; Gemini's systemInstruction feature was pivotal. It taught me how much influence prompt engineering has over agent behavior, especially when simulating secure delegation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson: Auth0 Is More Than Login.&lt;/strong&gt; Auth0's flexibility allowed me to model agent ownership, permission scoping, and dynamic session control. It reinforced that authentication is the foundation of trustworthy AI.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advice for Developers.&lt;/strong&gt; If you're building AI agents, treat them like collaborators with boundaries. Use authentication not just to verify users, but to define what agents can do. And always make those boundaries visible to users.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>auth0challenge</category>
      <category>ai</category>
      <category>authentication</category>
    </item>
    <item>
      <title>Google AI Studio Challenge Submission Template</title>
      <dc:creator>Prakash Thapa</dc:creator>
      <pubDate>Fri, 05 Sep 2025 12:48:00 +0000</pubDate>
      <link>https://dev.to/pt67/google-ai-studio-challenge-submission-template-250e</link>
      <guid>https://dev.to/pt67/google-ai-studio-challenge-submission-template-250e</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-ai-studio-2025-09-03"&gt;Google AI Studio Multimodal Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The Problem It Solves&lt;/strong&gt;&lt;br&gt;
Many people face the daily challenge of looking at a collection of ingredients in their fridge or pantry and feeling uninspired or unsure of what to make. This often leads to food waste or repetitive meals. Traditional recipe searches require users to manually type in ingredients, which can be tedious and may not capture everything available.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Experience It Creates&lt;/strong&gt;&lt;br&gt;
The Visual Recipe Assistant creates a seamless and intuitive experience to combat this problem:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Effortless Inspiration:&lt;/strong&gt; Instead of typing, you simply snap a photo of your ingredients. The app takes this visual input and instantly provides complete, ready-to-make recipes. This removes the mental friction of meal planning and makes cooking more spontaneous and fun.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI-Powered Culinary Creativity:&lt;/strong&gt; The applet showcases the power of the Gemini API's multimodal understanding. It intelligently identifies various food items in an image and generates creative, relevant recipes complete with instructions, serving sizes, and even estimated nutritional information.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Reduces Food Waste:&lt;/strong&gt; By suggesting recipes based on what you actually have, the app encourages you to use up ingredients before they spoil, promoting a more sustainable kitchen.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A Personalized Digital Cookbook:&lt;/strong&gt; With the ability to save your favorite generated recipes, the app becomes a personal, ever-growing cookbook. The "Saved Recipes" feature lets you easily revisit meals you enjoyed, building a curated collection tailored to your tastes and pantry staples.&lt;/p&gt;

&lt;p&gt;In essence, the Visual Recipe Assistant transforms your phone's camera into a smart culinary partner, making meal discovery effortless, reducing food waste, and empowering you to be more creative in the kitchen.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8ua2wsr8gt3j3qn9g7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn8ua2wsr8gt3j3qn9g7u.png" alt=" " width="800" height="377"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://youtu.be/nXbO-m_aLus" rel="noopener noreferrer"&gt;https://youtu.be/nXbO-m_aLus&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Google AI Studio
&lt;/h2&gt;

&lt;p&gt;This application is a prime example of leveraging the powerful multimodal capabilities of the Google Gemini API, the same technology that powers Google AI Studio. Here’s a breakdown of how it was implemented:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Core Multimodal Capability: Fusing Image and Text Input&lt;/strong&gt;&lt;br&gt;
The central feature of this app is its ability to understand and reason from two different types of input simultaneously: an image and a text prompt. This is a core strength of the Gemini models.&lt;br&gt;
&lt;em&gt;Image Input (ImagePart):&lt;/em&gt; The user provides a photograph of their ingredients. This is the visual context. The gemini-2.5-flash model doesn't just see pixels; it performs sophisticated object recognition to identify the items as "tomatoes," "onions," "pasta," "herbs," etc. This is the "what do I have?" part of the equation.&lt;br&gt;
&lt;em&gt;Text Input (TextPart):&lt;/em&gt; The image alone isn't enough. I pair the visual data with a carefully crafted text prompt: "Based on the ingredients in this image, suggest up to 3 simple recipes. For each recipe, provide the recipe name, a list of ingredients with quantities, step-by-step instructions, the serving size, and estimated nutritional information (calories, protein, carbohydrates, and fats)."&lt;br&gt;
This prompt gives the model its instructions, the "what should I do with this information?" part. It directs the model to act as a creative chef and to structure its response in a very specific way.&lt;br&gt;
The synergy of these two modalities allows the model to perform a complex task: it looks at the image, identifies the ingredients, and then uses that list as the basis for a creative text-generation task defined by the prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leveraging an Advanced AI Studio Feature: Structured Output (JSON Schema)&lt;/strong&gt;&lt;br&gt;
A major challenge when working with large language models is getting consistently formatted output that can be easily used in an application. Getting back a plain block of text would require fragile, error-prone string parsing.&lt;br&gt;
To solve this, I leveraged one of the most powerful features available through the Gemini API, which you can also configure in AI Studio: structured output.&lt;br&gt;
&lt;em&gt;responseMimeType: 'application/json':&lt;/em&gt; This tells the model that I expect the final output to be a valid JSON string.&lt;br&gt;
&lt;em&gt;responseSchema:&lt;/em&gt; This is the most critical part. I provide the model with a detailed JSON schema that defines the exact structure of the data I want: an ARRAY of OBJECTs, where each object must contain recipeName (a STRING), ingredients (an ARRAY of STRINGs), instructions (an ARRAY of STRINGs), servingSize (a STRING), and nutritionalInfo (an OBJECT with specific string properties for calories, protein, etc.).&lt;br&gt;
By defining this schema, I force the model to organize its creative output into a predictable, machine-readable format. This eliminates manual parsing and makes the integration between the AI response and the user interface seamless and robust: the application can take the JSON response, parse it, and render the recipe cards directly.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In summary, this applet uses multimodal input (image + text) to understand a user's real-world context and leverages structured output (JSON schema) to transform the AI's creative response into reliable data that powers a dynamic and user-friendly experience.&lt;/p&gt;
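&lt;p&gt;Put together, the request might be sketched roughly as follows. This is a minimal illustration rather than the app's actual source: it assumes the official @google/genai JavaScript SDK, and the prompt text is abbreviated.&lt;/p&gt;

```javascript
// Rough sketch of the multimodal request described above (assumes the
// official @google/genai SDK; not the app's actual source code).
const recipeSchema = {
  type: "ARRAY",
  items: {
    type: "OBJECT",
    properties: {
      recipeName: { type: "STRING" },
      ingredients: { type: "ARRAY", items: { type: "STRING" } },
      instructions: { type: "ARRAY", items: { type: "STRING" } },
      servingSize: { type: "STRING" },
      nutritionalInfo: {
        type: "OBJECT",
        properties: {
          calories: { type: "STRING" },
          protein: { type: "STRING" },
          carbohydrates: { type: "STRING" },
          fats: { type: "STRING" },
        },
      },
    },
  },
};

async function suggestRecipes(base64Image) {
  // Dynamic import keeps the sketch self-contained until it is called.
  const { GoogleGenAI } = await import("@google/genai");
  const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: base64Image } },
      { text: "Based on the ingredients in this image, suggest up to 3 simple recipes." },
    ],
    config: {
      responseMimeType: "application/json",
      responseSchema: recipeSchema,
    },
  });
  // The schema constrains response.text to parse into an array of recipes.
  return JSON.parse(response.text);
}
```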

&lt;h2&gt;
  
  
  Multimodal Features
&lt;/h2&gt;

&lt;p&gt;The specific multimodal functionality I built is the core of this application: it fuses visual input (an image of ingredients) with a detailed text prompt to generate structured JSON data (recipes). This combination significantly enhances the user experience in several ways.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Multimodal Functionality Breakdown&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Visual Understanding (Image Input):&lt;/strong&gt; The user provides a photo of their available ingredients. The gemini-2.5-flash model leverages its sophisticated computer vision capabilities to identify the individual food items in the image. It doesn't just see a picture; it understands "these are tomatoes, that's an onion, I see a box of pasta." This acts as the factual, real-world context for the request.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Instructional Context (Text Input):&lt;/strong&gt; The image alone is just data. The user's intent is provided through a carefully crafted text prompt sent simultaneously with the image. The prompt instructs the model to act as a recipe generator, specifying the desired output: "suggest up to 3 simple recipes... provide the recipe name, a list of ingredients with quantities, step-by-step instructions, serving size, and estimated nutritional information."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured Output (JSON):&lt;/strong&gt; A key part of the implementation is forcing the model's response into a specific modality: structured application/json. By providing a responseSchema, the AI's creative text and numerical data are organized into a clean, predictable format that the application can immediately parse and render into UI components.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why It Enhances the User Experience&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intuitive and Effortless Interaction:&lt;/strong&gt; The primary benefit is a massive reduction in friction. Instead of tediously typing out a list of ingredients, the user performs a simple, natural action: taking a photo. It mimics asking a friend, "What can I make with this?" It's faster, more engaging, and feels almost magical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solves a Practical, Real-World Problem:&lt;/strong&gt; This functionality directly addresses the common "what's for dinner?" dilemma. Because the generated recipes start from the user's actual inventory, they are immediately actionable and relevant. This helps reduce food waste and encourages creativity with ingredients that might otherwise be overlooked.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Creates a Reliable and Polished UI:&lt;/strong&gt; By combining multimodal input with a strict JSON output schema, the application avoids the pitfalls of parsing messy, unstructured text. Generated recipes are always displayed in a clean, consistent, easy-to-read format. The UI is robust and professional because the AI's output is tailored to its specific needs, a far better experience than displaying a raw block of text.&lt;/p&gt;

&lt;p&gt;In essence, this multimodal approach transforms the user's phone camera from a simple image-capture device into a powerful culinary assistant, turning a snapshot of their kitchen counter into a personalized, actionable meal plan.&lt;/p&gt;
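&lt;p&gt;Because the schema fixes the field names, rendering becomes a simple walk over the parsed objects. The formatter below is a hypothetical sketch that emits plain text lines; the real app renders recipe cards in the UI, and only the field names are taken from the schema described in the post.&lt;/p&gt;

```javascript
// Hypothetical formatter for one parsed recipe object. Field names match
// the responseSchema described in the post; everything else is illustrative.
function formatRecipe(recipe) {
  const lines = [];
  lines.push(recipe.recipeName + " (serves " + recipe.servingSize + ")");
  // Bullet the ingredient list.
  recipe.ingredients.forEach(function (item) { lines.push("- " + item); });
  // Number the steps.
  recipe.instructions.forEach(function (step, i) {
    lines.push((i + 1) + ". " + step);
  });
  lines.push("Calories: " + recipe.nutritionalInfo.calories);
  return lines.join("\n");
}
```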

</description>
      <category>devchallenge</category>
      <category>googleaichallenge</category>
      <category>ai</category>
      <category>gemini</category>
    </item>
    <item>
      <title>I made a simple click winner game with javascript.</title>
      <dc:creator>Prakash Thapa</dc:creator>
      <pubDate>Sat, 25 Jun 2022 17:07:28 +0000</pubDate>
      <link>https://dev.to/pt67/i-made-a-simple-click-winner-game-b5c</link>
      <guid>https://dev.to/pt67/i-made-a-simple-click-winner-game-b5c</guid>
      <description>&lt;p&gt;This game is very simple and easy to understand. I have build just now and posted here. If you are beginner on HTML, CSS and Javascript this game will teach you a lot. &lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Let's create the HTML file
&lt;/h2&gt;

&lt;p&gt;click_winner/index.html&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
&amp;lt;title&amp;gt;Shooter Ball&amp;lt;/title&amp;gt;

&amp;lt;link rel="stylesheet" href="css/style.css"/&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;

&amp;lt;div class="floor"&amp;gt;
&amp;lt;div id="time"&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;div id="score-box"&amp;gt;Score: &amp;lt;strong id="score"&amp;gt;0&amp;lt;/strong&amp;gt;&amp;lt;/div&amp;gt;
&amp;lt;div id="plane" disabled&amp;gt;Click Me&amp;lt;/div&amp;gt;
&amp;lt;/div&amp;gt;

&amp;lt;script src="js/script.js"&amp;gt;&amp;lt;/script&amp;gt;

&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 2:
&lt;/h2&gt;

&lt;p&gt;Now create the CSS file in the css folder:&lt;br&gt;
click_winner/css/style.css&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  width:500px;
  margin:auto;
  background:green;
  height:100vh;
}

#time, #score-box{
color:#fff;
padding:5px 10px;
} 

#plane{
border-radius: 50%;
padding: 10px;
height:50px;
width: 50px;
background:red;
text-align:center;
color:#fff;
box-shadow: 10px 10px 60px 0px rgba(0,0,0,0.75);
-webkit-box-shadow: 10px 10px 60px 0px rgba(0,0,0,0.75);
-moz-box-shadow: 10px 10px 60px 0px rgba(0,0,0,0.75);

 -webkit-user-select: none;  
  -moz-user-select: none;    
  -ms-user-select: none;      
  user-select: none;

cursor:pointer;  

}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 3:
&lt;/h2&gt;

&lt;p&gt;Finally, create the JavaScript file:&lt;br&gt;
click_winner/js/script.js&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;var pl = document.getElementById('plane');


//move the plane and respawn it at a random spot after it falls

var x =0;
var y=0;
var score = 0;


var timer_start = setInterval(makeMove, 10);

function makeMove(){
 x++;
 y++;


pl.style.transform = "translate("+ x +"px"+","+ y+"px)";


if(y &amp;gt;= 200){
  x = Math.floor(Math.random()*300);
  y = Math.floor(Math.random()*300);

 }
}



pl.onclick = ()=&amp;gt;{
score++

document.getElementById('score').innerHTML = score;
}

//count down the remaining time and stop both timers at zero
var time = 60;
var counter = setInterval(()=&amp;gt;{
time--;
document.getElementById('time').innerHTML = time + " Seconds left";

if(time == 0){
clearInterval(timer_start);
clearInterval(counter);
alert("Time Off. Restart the game to play again.")
}

}, 1000);

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I hope you enjoyed my simple game and learned something new today. You can check out the source code &lt;a href="https://github.com/pt67/click_winner"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thank you!&lt;/p&gt;

</description>
      <category>javascript</category>
      <category>webdev</category>
      <category>programming</category>
      <category>beginners</category>
    </item>
  </channel>
</rss>
