<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: David Angaya</title>
    <description>The latest articles on DEV Community by David Angaya (@david_angaya_e7156aa5266d).</description>
    <link>https://dev.to/david_angaya_e7156aa5266d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3042385%2F942410f4-cd90-4240-a977-3cf9f03e01b2.png</url>
      <title>DEV Community: David Angaya</title>
      <link>https://dev.to/david_angaya_e7156aa5266d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/david_angaya_e7156aa5266d"/>
    <language>en</language>
    <item>
      <title>We Built an AI That "Daydreams": Our Google Cloud Hackathon Story</title>
      <dc:creator>David Angaya</dc:creator>
      <pubDate>Sun, 22 Jun 2025 21:26:17 +0000</pubDate>
      <link>https://dev.to/david_angaya_e7156aa5266d/we-built-an-ai-that-daydreams-our-google-cloud-hackathon-story-543j</link>
      <guid>https://dev.to/david_angaya_e7156aa5266d/we-built-an-ai-that-daydreams-our-google-cloud-hackathon-story-543j</guid>
      <description>&lt;p&gt;When our team first started the &lt;strong&gt;Agent Development Kit Hackathon with Google Cloud&lt;/strong&gt;, we were driven by a single, nagging question: Why do today's powerful AIs feel so... passive? They are brilliant calculators, but they often feel like a "polite guest who won't speak until spoken to."&lt;/p&gt;

&lt;p&gt;We wanted to build a true partner—an AI that could feel &lt;strong&gt;alive&lt;/strong&gt;. One that could get curious with you, understand the flow of your thinking, and have an "inner life" of its own.&lt;/p&gt;

&lt;p&gt;That idea became &lt;strong&gt;Wise&lt;/strong&gt;. In just under two weeks, we built and deployed a working prototype of a system that we believe represents a new paradigm for human-AI interaction. This is the story of how we built it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;(Disclaimer: We created this project and article for the purposes of entering the Agent Development Kit Hackathon with Google Cloud. #adkhackathon)&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Vision: An AI That Thinks With You
&lt;/h3&gt;

&lt;p&gt;Our goal was to build more than a chatbot; we wanted to build a &lt;strong&gt;co-creator&lt;/strong&gt;. Wise is designed to be a calm, focused space for thinking, built on two core concepts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;A Proactive "Inner Life":&lt;/strong&gt; Wise has a &lt;strong&gt;&lt;code&gt;DaydreamAgent&lt;/code&gt;&lt;/strong&gt; that 'thinks' in the background. As a proof-of-concept, it analyzes a dataset to find interesting anomalies and proactively greets you with a data-driven "spark." It solves the "passive AI" problem by being an active participant in the conversation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Cognitive Tuning:&lt;/strong&gt; Wise doesn't have a single, static personality. By detecting the &lt;em&gt;intent&lt;/em&gt; of your conversation, it switches between different &lt;strong&gt;Cognitive Lenses&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The &lt;strong&gt;&lt;code&gt;Analytical Lens&lt;/code&gt;&lt;/strong&gt; for data-driven research and tool use.&lt;/li&gt;
&lt;li&gt;  The &lt;strong&gt;&lt;code&gt;Imaginative Lens&lt;/code&gt;&lt;/strong&gt; for creative brainstorming and poetry.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
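&lt;p&gt;Lens switching can be sketched as choosing a system prompt from detected intent. The snippet below is a minimal illustration: the lens names match the article, but the prompts and the &lt;code&gt;detect_lens&lt;/code&gt; helper are hypothetical, and its naive keyword heuristic stands in for the LLM-based intent detection Wise actually uses.&lt;/p&gt;

```python
# Minimal sketch of "Cognitive Tuning": pick a system prompt ("Lens")
# based on the detected intent of the user's message.
LENSES = {
    "Analytical": "You are a rigorous analyst. Ground every claim in data.",
    "Imaginative": "You are a playful co-creator. Brainstorm and riff freely.",
}

def detect_lens(user_message: str) -> str:
    # Naive keyword heuristic standing in for LLM intent detection.
    analytical_cues = ("data", "plot", "analyze", "csv", "trend")
    text = user_message.lower()
    return "Analytical" if any(cue in text for cue in analytical_cues) else "Imaginative"

lens = detect_lens("Can you plot this CSV of stock volumes?")
system_prompt = LENSES[lens]  # passed to the model as the system instruction
print(lens)  # Analytical
```

&lt;p&gt;The point of the design is that the conversation loop stays identical; only the system prompt handed to the model changes.&lt;/p&gt;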

&lt;h3&gt;
  
  
  How We Built It: The "SageMind Architecture"
&lt;/h3&gt;

&lt;p&gt;Top AI researchers like Andrej Karpathy note that the future of AI applications lies in "multiple LLM orchestration." Our &lt;strong&gt;SageMind Architecture&lt;/strong&gt; is our answer to this challenge.&lt;/p&gt;

&lt;p&gt;For this hackathon, we made a crucial decision: instead of using a pre-built toolkit, we chose to build our own &lt;strong&gt;custom, lightweight agentic framework&lt;/strong&gt; from first principles. This approach allowed us to demonstrate our deep understanding of agent orchestration by building the core logic from the ground up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyqezrt221ayry7hhpe3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmyqezrt221ayry7hhpe3.png" alt="Image description" width="800" height="583"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Our system is a unified Streamlit application where our custom-built Python agents are orchestrated to deliver a seamless experience.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;DaydreamAgent&lt;/code&gt; (The Proactive Dreamer):&lt;/strong&gt;&lt;br&gt;
This agent is our Curiosity Engine. We used Python and Pandas to analyze a sample stock history CSV. The agent identifies the day with the highest trading volume and then uses &lt;strong&gt;Gemini 1.5 Flash&lt;/strong&gt; to synthesize a natural language "spark" from that data point, which is then presented to the user.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;The &lt;code&gt;ConversationalAgent&lt;/code&gt; (The Master Conductor):&lt;/strong&gt;&lt;br&gt;
This is the main agent the user talks to. Our Streamlit front-end acts as the orchestrator, passing the active "Lens" (as a system prompt) and the conversation history to this agent. Based on user intent, this agent intelligently chooses which internal &lt;strong&gt;Tool&lt;/strong&gt; to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  In &lt;code&gt;Analytical&lt;/code&gt; mode, it can use a "Data Analysis Tool" that leverages &lt;strong&gt;Gemini 1.5 Pro&lt;/strong&gt; to write Python code on the fly, which is then executed to generate and display a Plotly graph.&lt;/li&gt;
&lt;li&gt;  In &lt;code&gt;Imaginative&lt;/code&gt; mode, it uses its creative prompting to generate prose or poetry.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
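&lt;p&gt;The &lt;code&gt;DaydreamAgent&lt;/code&gt;'s anomaly-hunting step can be sketched in a few lines of Pandas. This is a toy reconstruction, not the project's code: the &lt;code&gt;find_spark&lt;/code&gt; helper and the sample data are hypothetical, and the final Gemini 1.5 Flash call is stubbed out as the prompt string it would receive.&lt;/p&gt;

```python
import pandas as pd

def find_spark(df: pd.DataFrame) -> str:
    # Locate the day with the highest trading volume...
    peak = df.loc[df["volume"].idxmax()]
    # ...and build the prompt that would be handed to Gemini 1.5 Flash
    # to phrase as a conversational "spark".
    return (f"Trading volume peaked at {peak['volume']:,} on {peak['date']}. "
            "Phrase this as a curious conversation opener.")

# Toy stand-in for the stock-history CSV described in the article.
history = pd.DataFrame([
    {"date": "2024-01-02", "volume": 1_200_000},
    {"date": "2024-01-03", "volume": 4_800_000},
    {"date": "2024-01-04", "volume": 900_000},
])
print(find_spark(history))
```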

&lt;p&gt;This custom architecture allowed us to build a sophisticated, decomposed workflow, a core principle of modern AI systems.&lt;/p&gt;
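&lt;p&gt;The "Data Analysis Tool" pattern (an LLM writes code that the app then executes) can be sketched as below. The generated snippet, the allow-listed builtins, and the &lt;code&gt;run_tool_code&lt;/code&gt; helper are hypothetical simplifications; in Wise, Gemini 1.5 Pro emits Plotly code and Streamlit renders the resulting figure.&lt;/p&gt;

```python
# Sketch of executing LLM-generated analysis code in a restricted namespace.
llm_generated = "result = sum(volumes) / len(volumes)"  # pretend the model wrote this

def run_tool_code(src: str, inputs: dict):
    namespace = dict(inputs)  # expose only the data we choose
    # Restrict builtins so the generated code sees only an allow-list.
    exec(src, {"__builtins__": {"sum": sum, "len": len}}, namespace)
    return namespace.get("result")

print(run_tool_code(llm_generated, {"volumes": [100, 200, 300]}))  # 200.0
```

&lt;p&gt;Note that an allow-list of builtins is illustrative, not a real sandbox; a production system should isolate generated code properly (separate process, container, or time/resource limits).&lt;/p&gt;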

&lt;h3&gt;
  
  
  The Full Stack
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Application &amp;amp; Orchestration:&lt;/strong&gt; Streamlit&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Agentic Core:&lt;/strong&gt; Python, Pandas, Plotly&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AI Models:&lt;/strong&gt; Google Gemini 1.5 Pro &amp;amp; Gemini 1.5 Flash&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cloud Deployment:&lt;/strong&gt; Docker, Google Cloud Run&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What's Next? From Partner to Platform
&lt;/h3&gt;

&lt;p&gt;This hackathon was just the beginning. The SageMind architecture is a powerful foundation. Our next steps are to build out the interactive &lt;strong&gt;Mind Map&lt;/strong&gt;—the visual "second brain"—and expand our library of &lt;strong&gt;Cognitive Lenses&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Our long-term vision is to evolve Wise into a true &lt;strong&gt;Multimodal Conductor&lt;/strong&gt;. This is a system that doesn't just use internal tools, but intelligently orchestrates a 'dream team' of best-in-class AI models and APIs from across the industry—connecting to &lt;strong&gt;Google BigQuery&lt;/strong&gt; for enterprise data, using a &lt;strong&gt;web search tool&lt;/strong&gt; for real-time information, and leveraging &lt;strong&gt;multimodal models&lt;/strong&gt; to reason from images and documents.&lt;/p&gt;

&lt;p&gt;Building Wise taught us that the future isn't just about bigger models, but about smarter, more empathetic architectures. We're excited to continue building that future.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Thanks for reading! You can check out our project on Devpost and see our final demo video [Here - Link to come].&lt;/em&gt;&lt;/p&gt;

</description>
      <category>adkhackathon</category>
      <category>ai</category>
      <category>agenticai</category>
      <category>googlecloud</category>
    </item>
    <item>
      <title>Beyond Basic Practice: Creating the JobSage AI Interview Simulator with Gemini &amp; Embeddings</title>
      <dc:creator>David Angaya</dc:creator>
      <pubDate>Fri, 11 Apr 2025 17:56:34 +0000</pubDate>
      <link>https://dev.to/david_angaya_e7156aa5266d/beyond-basic-practice-creating-the-jobsage-ai-interview-simulator-with-gemini-embeddings-2c93</link>
      <guid>https://dev.to/david_angaya_e7156aa5266d/beyond-basic-practice-creating-the-jobsage-ai-interview-simulator-with-gemini-embeddings-2c93</guid>
      <description>&lt;h2&gt;
  
  
  Introduction: The Interview Gauntlet &amp;amp; The Feedback Gap
&lt;/h2&gt;

&lt;p&gt;Technical interviews, especially at top-tier companies like FAANG, are notoriously challenging. Preparing effectively often feels like navigating a maze blindfolded. While resources like LeetCode hone algorithmic skills, they often miss crucial aspects: How clear was my explanation? Did I cover the key concepts? How do I stack up against other candidates? Standard practice tools provide questions, but rarely offer the deep, personalized, actionable feedback needed to truly improve.&lt;/p&gt;

&lt;p&gt;This gap in feedback and personalized guidance was the inspiration for JobSage, my Google Gen AI Intensive Capstone project. My goal was to build an AI-powered mock interview coach that goes beyond simple Q&amp;amp;A, providing a more insightful, engaging, and actionable practice experience designed specifically for roles like Data Science and Machine Learning Engineering.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;JobSage: What Makes It Different?&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Instead of just presenting questions and checking answers, JobSage integrates several features powered by Generative AI to simulate a more comprehensive coaching experience:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Granular, Multi-Dimensional Feedback&lt;/strong&gt;: Moves beyond a simple "correct/incorrect" or basic similarity score. JobSage evaluates answers on Content Accuracy/Relevance, Clarity, and Technical Depth using AI assessment.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Adaptive Difficulty&lt;/strong&gt;: The interview adjusts (in simulation) based on performance, presenting easier or harder questions to keep the user appropriately challenged.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Skill Tracking &amp;amp; Benchmarking&lt;/strong&gt;: Identifies relevant skills for each question, tracks performance per skill, and provides a benchmark against simulated FAANG applicant norms (e.g., "Estimated Top 25%").&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Gamification&lt;/strong&gt;: Incorporates points, badges, and leaderboard context to make practice more engaging.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Actionable Recommendations&lt;/strong&gt;: Provides targeted study tips for weak areas, potential resume tweak suggestions based on performance, and relevant job recommendations upon "passing" the mock session.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Negotiation Simulation&lt;/strong&gt;: Includes a module to practice salary negotiation scenarios with AI-generated dialogue and feedback.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Under the Hood: Leveraging Gemini and Embeddings&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;Building JobSage involved integrating several key GenAI techniques, primarily using the Google Gemini API (&lt;code&gt;gemini-1.5-flash&lt;/code&gt;) and Sentence Transformers embeddings (&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Embeddings: The Backbone of Understanding&lt;/strong&gt;&lt;br&gt;
Sentence Transformers are fantastic for capturing the semantic meaning of text in dense numerical vectors. In JobSage, embeddings are used in multiple ways via the &lt;code&gt;get_text_embedding&lt;/code&gt; function:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &lt;strong&gt;Content Similarity&lt;/strong&gt;: We calculate the cosine similarity between the embedding of a user's answer and the embedding of pre-defined "ideal answer points". This gives a quantitative score (&lt;code&gt;similarity_score&lt;/code&gt;) reflecting how closely the user's answer matches the core concepts, forming one component of the evaluation.

&lt;p&gt;&lt;strong&gt;RAG for Job Recommendations&lt;/strong&gt;: This is a core RAG application. The user's entire CV text is embedded. This CV embedding is then used as a query vector to find the most similar job descriptions from a pre-embedded list (&lt;code&gt;jobs_df['embeddings']&lt;/code&gt;) using cosine similarity. The &lt;code&gt;recommend_jobs&lt;/code&gt; function implements this semantic retrieval.&lt;br&gt;
&lt;/p&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
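&lt;p&gt;The retrieval step behind the job recommendations can be sketched with plain NumPy. The 3-dimensional vectors below are toy stand-ins for real &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; embeddings (which are 384-dimensional), and this &lt;code&gt;recommend_jobs&lt;/code&gt; is a simplified reconstruction rather than the project's actual function.&lt;/p&gt;

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def recommend_jobs(cv_vec, job_vecs, titles, top_k=2):
    # Rank pre-embedded job descriptions by cosine similarity to the CV embedding.
    scores = [cosine(cv_vec, v) for v in job_vecs]
    order = np.argsort(scores)[::-1][:top_k]
    return [titles[i] for i in order]

# Toy 3-d "embeddings"; real ones come from a sentence-transformer model.
cv = np.array([0.9, 0.1, 0.0])
jobs = [np.array([0.8, 0.2, 0.1]),
        np.array([0.0, 0.1, 0.9]),
        np.array([0.7, 0.3, 0.0])]
print(recommend_jobs(cv, jobs, ["ML Engineer", "Accountant", "Data Scientist"]))
# ['ML Engineer', 'Data Scientist']
```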
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Simplified embedding usage for similarity
from sentence_transformers import util

# Assuming user_embedding and ideal_embedding are numpy arrays
similarity_score = util.pytorch_cos_sim(user_embedding, ideal_embedding).item() * 10
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. Gemini API: The Multi-Talented "Brain"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Gemini API powers most of the generative and evaluative tasks:&lt;br&gt;
     &lt;strong&gt;AI-Powered Evaluation (&lt;code&gt;evaluate_answer&lt;/code&gt;):&lt;/strong&gt; This is perhaps the most interesting use. Instead of just checking keywords or similarity, Gemini acts as an expert interviewer. We use a structured prompt asking it to score the user's answer on Content, Clarity, and Depth (each out of 10) and provide specific textual feedback and suggestions for each dimension.&lt;/p&gt;

&lt;p&gt;This leverages Gemini's understanding to provide nuanced scoring and feedback far beyond simple metrics. We parse the scores and feedback from the response text.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Task: Evaluate the user's interview answer based on Content, Clarity, and Depth.
Instructions:
1. Content Score: Evaluate accuracy/correctness... Score 0-10. Provide rationale then 'Content Score: X/10'.
2. Clarity Score: Evaluate clear phrasing/structure... Score 0-10. Provide rationale then 'Clarity Score: X/10'.
3. Depth Score: Evaluate thoroughness/nuance... Score 0-10. Provide rationale then 'Depth Score: X/10'.
4. Overall Feedback: Provide 2-3 sentences... Start with 'Overall Feedback:'.
Question: "{question_text}"
Expected Answer Points/Keywords: "{ideal_answer_points}"
Candidate's Answer: "{user_answer}"
Response Format (Use EXACT keywords...):
...
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Dynamic Follow-up Questions (&lt;code&gt;generate_follow_up&lt;/code&gt;)&lt;/strong&gt;: Based on the user's answer, Gemini is prompted to ask a relevant, probing follow-up question, making the interaction feel more conversational.&lt;br&gt;
&lt;strong&gt;CV Skill Extraction (&lt;code&gt;extract_cv_skills&lt;/code&gt;)&lt;/strong&gt;: Gemini analyzes the raw CV text and extracts a list of relevant skills in a structured JSON format. This is more robust than basic keyword matching.&lt;br&gt;
&lt;strong&gt;Negotiation Simulation (&lt;code&gt;simulate_negotiation&lt;/code&gt;)&lt;/strong&gt;: Gemini generates an entire negotiation dialogue based on role/salary inputs and provides tailored feedback on the candidate's simulated strategy within that dialogue.&lt;br&gt;
&lt;strong&gt;Few-Shot Prompting Attempt (&lt;code&gt;generate_question&lt;/code&gt;)&lt;/strong&gt;: To demonstrate this capability, the &lt;code&gt;generate_question&lt;/code&gt; function includes code that formats examples from our question bank and prompts Gemini to generate a new, similar question. While we use predefined questions for evaluation consistency in the demo, this shows the technique.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. RAG: Connecting Performance to Opportunities&lt;/strong&gt;&lt;br&gt;
Retrieval Augmented Generation concepts are applied in two key ways:&lt;br&gt;
&lt;strong&gt;Job Recommendations&lt;/strong&gt;: As mentioned, comparing the CV embedding against job embeddings is a direct application of retrieval based on semantic similarity.&lt;br&gt;
&lt;strong&gt;Study Recommendations&lt;/strong&gt;: This uses context-based retrieval. The &lt;code&gt;recommend_study_topics&lt;/code&gt; function identifies the user's weakest skills (context derived from interview performance) and retrieves relevant study advice from our structured &lt;code&gt;study_rec_df&lt;/code&gt;.&lt;/p&gt;
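&lt;p&gt;Parsing the scores out of Gemini's free-text evaluation can be done with a few regular expressions keyed to the exact labels the prompt enforces. This is a simplified reconstruction of that parsing step; the sample response text and the &lt;code&gt;parse_evaluation&lt;/code&gt; helper name are illustrative, not the project's code.&lt;/p&gt;

```python
import re

# Hypothetical model response following the prompt's required format.
SAMPLE_RESPONSE = """The answer covers the main ideas. Content Score: 8/10
Mostly well structured. Clarity Score: 7/10
Could go deeper on trade-offs. Depth Score: 6/10
Overall Feedback: Solid answer; add a concrete example and discuss trade-offs."""

def parse_evaluation(text: str) -> dict:
    # Pull each 'X/10' score using the exact keywords the prompt demands.
    result = {}
    for dim in ("Content", "Clarity", "Depth"):
        match = re.search(rf"{dim} Score:\s*(\d+)\s*/\s*10", text)
        result[dim.lower()] = int(match.group(1)) if match else None
    feedback = re.search(r"Overall Feedback:\s*(.+)", text, re.DOTALL)
    result["feedback"] = feedback.group(1).strip() if feedback else ""
    return result

print(parse_evaluation(SAMPLE_RESPONSE))
```

&lt;p&gt;Forcing exact keywords in the prompt is what makes this brittle parsing workable; returning &lt;code&gt;None&lt;/code&gt; for a missing score lets the caller detect and retry a malformed response.&lt;/p&gt;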

&lt;h2&gt;
  
  
  &lt;strong&gt;The Implementation: Kaggle Notebook &amp;amp; Gradio Demo&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;The entire project was developed within a Kaggle Notebook, making it self-contained and reproducible. Key libraries included pandas for data management, sentence-transformers for embeddings, google-generativeai for the Gemini API, and gradio for the interactive demo.&lt;br&gt;
The notebook simulates a full interview session, printing out the questions, evaluations, analysis, and recommendations (see Cell 7 output in the notebook).&lt;br&gt;
To provide interactivity, a Gradio interface was built (Cell 9), allowing users to engage with the core interview cycle and the negotiation simulator directly within the notebook's output environment.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges and Limitations
&lt;/h2&gt;

&lt;p&gt;Building JobSage within the Capstone timeframe came with challenges:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Prompt Engineering&lt;/strong&gt;: Getting Gemini to consistently return structured output (especially the JSON for evaluation scores/feedback) required careful prompt design and iteration. Handling potential API errors or blocked prompts was also necessary.&lt;br&gt;
&lt;strong&gt;Static Data &amp;amp; Simulation&lt;/strong&gt;: Relying on placeholder data for questions, jobs, and benchmarks limits realism. The true power would come from larger, curated databases and real user data (anonymized) for benchmarking.&lt;br&gt;
&lt;strong&gt;Notebook Environment&lt;/strong&gt;: While great for prototyping, implementing features like real-time state updates across Gradio tabs or voice I/O is difficult within a standard notebook.&lt;br&gt;
&lt;strong&gt;Evaluation Complexity&lt;/strong&gt;: Evaluating open-ended technical answers is inherently complex. While the combination of embeddings and LLM assessment provides depth, it's still an approximation of expert human judgment.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;strong&gt;Future Vision&lt;/strong&gt;
&lt;/h2&gt;

&lt;p&gt;This Capstone project is just the beginning for JobSage. Future steps include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Developing a full-stack web application (Flask/FastAPI backend, database, React/Vue frontend).&lt;/li&gt;
&lt;li&gt;Integrating live job APIs.&lt;/li&gt;
&lt;li&gt;Building robust user profiles and persistent storage.&lt;/li&gt;
&lt;li&gt;Refining the AI evaluation and feedback mechanisms.&lt;/li&gt;
&lt;li&gt;Adding voice capabilities.&lt;/li&gt;
&lt;li&gt;Potentially exploring privacy-preserving techniques using local models for certain features.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;JobSage demonstrates how multiple GenAI tools can be orchestrated to create a significantly improved interview preparation experience. By providing detailed, AI-driven feedback, personalized recommendations, and engaging features, it aims to empower job seekers to approach technical interviews with more confidence and insight. This project was a fantastic learning experience during the Google GenAI Intensive course, highlighting the practical power of embeddings, LLMs, and RAG in solving real-world problems.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>rag</category>
      <category>vectordatabase</category>
    </item>
  </channel>
</rss>
