G ness

Posted on Dec 13, 2025

Kaggle Capstone: Resumay_I – Engineering a Multi-LLM Agent for Job Application Mastery

#devchallenge #googlekagglechallenge #ai #aiagents

This is a submission for the Google AI Agents Writing Challenge: Capstone Showcase

Resumay_I: Behind the Scenes of Building a Robust Multi-LLM Agent for Job Applications

🚀 Introduction: The Job Market & The AI Memory Challenge

The modern job market is a relentless landscape. Crafting a tailored application for each role, analyzing intricate job descriptions, and anticipating recruiter expectations is a monumental task. Beyond the human challenge lies a critical problem in AI agent development: memory and predictive reliability. Many AI solutions, especially those built on short-context LLM calls, struggle to maintain a consistent understanding of their mission and accumulated knowledge. This "invisible ceiling" of forgotten instructions leads to agents that behave inconsistently, eroding trust and limiting their real-world utility.

Enter Resumay_I (pronounced "Resume I"), an AI agent engineered to address these challenges head-on. Resumay_I acts as your expert career architect, transforming the job application process by crafting, refining, and critiquing submission materials. Crucially, Resumay_I is built with mechanisms that keep its operational directives in front of the models on every run, so its results stay predictable and trustworthy.

🧠 The Blueprint: "Mega Prompt" & "Wisdom Pattern"

Resumay_I's unwavering reliability stems from a unique, two-pronged approach to agent memory and design:

The "Mega Prompt" (Foundational Memory Blueprint)

Our journey began by feeding a deeply detailed, multi-section prompt to a powerful LLM (Gemini 3.0). This "Mega Prompt" wasn't just a simple instruction; it was a comprehensive blueprint defining Resumay_I's persona (expert recruiter/hiring manager), role, context, specific directives, and even UI controls.

The key to its success, and to avoiding the common pitfalls of overly long prompts, was our development process. We leveraged XMind, a mind-mapping tool, not just for organizing thoughts, but as a "visual programming environment."

The hierarchical structure of XMind allowed us to easily outline the system, context, and role text. This structured outline was then fed directly as a single, cohesive "Mega Prompt." This approach ensured consistency of thought across different parts of the system, establishing the agent's initial long-term memory and instruction set, and embedding its core mission from the outset.
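To make the outline-to-prompt step concrete, here is a minimal sketch of flattening a hierarchical outline into one prompt string. The post does not show the actual XMind export or the real section names, so `OUTLINE`, its contents, and `flatten_outline` are hypothetical stand-ins.

```python
# Hypothetical outline: the real XMind map and its wording are not shown in the post.
OUTLINE = {
    "System": {
        "Persona": "Expert recruiter and hiring manager",
        "Mission": "Craft, refine, and critique job application materials",
    },
    "Context": {
        "Inputs": "Job description URL and the candidate's base resume",
    },
    "Directives": {
        "Tone": "Professional and specific",
        "Output": "Resume, cover letter, critique",
    },
}


def flatten_outline(node, depth=1):
    """Render a nested dict as heading-style prompt lines, depth-first."""
    lines = []
    for title, child in node.items():
        if isinstance(child, dict):
            # A branch becomes a heading, followed by its children one level deeper.
            lines.append(f"{'#' * depth} {title}")
            lines.extend(flatten_outline(child, depth + 1))
        else:
            # A leaf becomes a "heading: text" line.
            lines.append(f"{'#' * depth} {title}: {child}")
    return lines


mega_prompt = "\n".join(flatten_outline(OUTLINE))
print(mega_prompt)
```

Because the prompt is generated from one tree, reordering or rewording a branch in the outline changes the prompt in exactly one place.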

The "Wisdom Pattern" (Iterative Architectural Memory)

As development progressed, we encountered various engineering challenges inherent to building complex multi-LLM agents on platforms like Kaggle. Each solution, architectural decision, and hard-won lesson was explicitly captured as a "Wisdom" directive (e.g., Wisdom 19: Atomic Dependency Resolution). This "Wisdom Pattern" became our mechanism for persistent, human-curated architectural memory. It ensured that critical design principles, robust engineering solutions, and "ground truths" were retained, documented, and consistently applied across iterations. This continuous learning and self-correction, much like a human engineering team documenting best practices, prevented the agent's architecture itself from "forgetting" the principles that guarantee its stability, reproducibility, and operational reliability.

This iterative "refactoring during iterations" approach, guided by real-time prompting and embedded "Wisdoms," was crucial. We even found that "submitting working code in the iteration process through the AI Context is kind of like automatic refactoring!" This dynamic feedback loop allowed us to build an extraordinarily stable and robust application, moving far beyond a "just get it to work" mentality. As the process matured, it naturally morphed into a few-shot prompt strategy, where established "Wisdoms" reduced the need for extensive re-prompting.
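One way to picture the "Wisdom Pattern" is as a small registry of numbered directives that gets prepended to each development prompt. The sketch below is an assumption about how that could look, not the project's actual mechanism: the `Wisdom` class and `with_wisdoms` helper are hypothetical, and the one-line rule texts paraphrase the directives named in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wisdom:
    number: int
    title: str
    rule: str


# Titles come from the post; the rule texts are paraphrased placeholders.
WISDOMS = [
    Wisdom(23, "Circuit Breaker Protocol", "Stop calling Gemini after repeated failures."),
    Wisdom(13, "Direct REST", "Call the Gemini API over HTTP instead of the SDK."),
    Wisdom(19, "Atomic Dependency Resolution", "Pin versions in a single pip install."),
]


def with_wisdoms(task, wisdoms):
    """Prefix a task prompt with the curated directives, ordered by number."""
    ordered = sorted(wisdoms, key=lambda w: w.number)
    header = [f"Wisdom {w.number} ({w.title}): {w.rule}" for w in ordered]
    return "\n".join(["Follow these established directives:", *header, "", f"Task: {task}"])


prompt = with_wisdoms("Refactor the setup cell.", WISDOMS)
print(prompt)
```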

🤝 The Odd Couple: Gemini (Architect) & Qwen (Critic)

Resumay_I leverages a sophisticated multi-agent architecture orchestrated by LangGraph, intelligently combining two distinct Large Language Models (LLMs) for optimal performance and reliability:

The Architect (Google Gemini 2.0 Flash API): This cloud-based, high-context LLM serves as Resumay_I's primary workhorse. Gemini takes on the roles of retrieval, robust execution, and nuanced evaluation. It performs deep research, drafts initial resumes and cover letters, and orchestrates the overall application flow. Gemini's extensive knowledge and reasoning capabilities are vital for synthesizing complex information and generating tailored content.
The Critic (Qwen 2.5-1.5B-Instruct): This locally hosted, open-source model is specifically tasked with the critical role of an adversarial "Critic." Running efficiently on Kaggle's T4 GPU, Qwen rigorously evaluates Gemini's drafted materials, providing objective, often ruthless, critiques and suggestions for improvement. This competitive feedback loop between Gemini and Qwen significantly enhances the quality and effectiveness of the final application materials.
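The Architect/Critic feedback loop can be sketched without either model. In this illustration, `refine`, `fake_architect`, and `fake_critic` are hypothetical stand-ins, and the "APPROVED" reply is an assumed stopping convention, not something the post specifies.

```python
def refine(draft_fn, critique_fn, job_description, max_rounds=3):
    """Alternate drafting and critiquing until the critic approves or rounds run out."""
    feedback = ""
    history = []
    for _ in range(max_rounds):
        draft = draft_fn(job_description, feedback)
        feedback = critique_fn(draft)
        history.append((draft, feedback))
        if feedback.strip().upper().startswith("APPROVED"):
            break
    return history


# Stand-ins for the two models so the sketch runs without an API key or GPU.
def fake_architect(job_description, feedback):
    suffix = " Includes quantified results." if "metrics" in feedback else ""
    return f"Resume tailored to: {job_description}.{suffix}"


def fake_critic(draft):
    return "APPROVED" if "quantified" in draft else "Add metrics to each bullet."


rounds = refine(fake_architect, fake_critic, "Data Engineer")
for draft, feedback in rounds:
    print(f"{draft!r} -> {feedback!r}")
```

Capping the loop with `max_rounds` matters in practice: an adversarial critic may never approve, and every extra round costs API quota.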

🛠️ Battling the Bots: Engineering Resumay_I's Stability (Wisdoms in Action)

To ensure Resumay_I's extraordinary stability and repeatability, especially within Kaggle's resource-constrained and headless environment, we implemented several robust engineering solutions, each a testament to our "Wisdom Pattern."

Dependency Hell? Not on Our Watch! (Wisdom 19: Atomic Dependency Resolution)

One of the most insidious challenges in Python development is "dependency hell," where conflicting package versions lead to unpredictable runtime errors. For a competition submission, where every execution consumes valuable quota, this is unacceptable.

Our solution: Atomic Dependency Resolution. We uninstall any potentially conflicting packages, then run a single, comprehensive pip install command that pins the LangChain-family packages to exact versions and constrains protobuf to a known-good range. The supporting libraries are installed in the same command so the resolver sees every requirement at once. This gives a clean, reproducible starting point, so THE_FLOW runs from the very first invocation.

# --- Section 5.1: Setup & Dependencies (Step 1) ---
# @title 1.1 Atomic Dependency Resolution
# The `!pip` lines below are notebook shell magics, so this cell must run in a Jupyter/Kaggle notebook.
import warnings
warnings.filterwarnings('ignore')

print("📦 Initializing Resumay_I Environment: Atomic Dependency Resolution Mode")
print("-----------------------------------------------------------------------")

print("   🧹 Cleaning conflicting packages to prevent 'dependency hell'...")
# This aggressive uninstall ensures a pristine environment before installing specific versions.
# It's crucial for competition submissions to avoid unexpected runtime errors.
!pip uninstall -y langchain langchain-core langchain-community langgraph protobuf google-generativeai onnxruntime opentelemetry-api opentelemetry-sdk > /dev/null 2>&1

print("   🛡️ Installing all dependencies with strict version constraints (Wisdom 19)...")
# The single pip install command with pinned versions forces the resolver to find a compatible
# solution for all libraries at once, preventing cascading dependency conflicts.
# This is vital for reproducibility, stability, and conserving freemium/trial quotas.
!pip install -U -q --no-cache-dir \
    "protobuf>=3.20.3,<5.0.0dev" \
    "langchain-core==0.2.39" \
    "langchain==0.2.14" \
    "langchain-community==0.2.12" \
    "langgraph==0.1.19" \
    "langchain-text-splitters==0.2.4" \
    "chromadb" \
    "duckduckgo-search" \
    "google-search-results" \
    "requests" \
    "sentence-transformers" \
    "seaborn" \
    "matplotlib" \
    "wordcloud" \
    "ipywidgets" \
    "transformers" \
    "accelerate" \
    "bitsandbytes" \
    "networkx" \
    "scipy" > /dev/null 2>&1

print("\n   🔍 Verifying the installed environment...")
# 'pip check' confirms that all installed packages meet their dependency requirements,
# providing a final assurance of a stable environment.
!pip check
print("\n✅ Setup & Dependencies Complete: Environment is stable for Resumay_I's operations.")
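`pip check` confirms that dependency requirements are satisfied, but it does not confirm that the pinned versions are the ones actually installed. As a complementary sketch, the standard-library `importlib.metadata` module can compare installed versions against the pins. The `check_pins` helper is hypothetical; the pins are copied from the install command above.

```python
from importlib import metadata


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # Package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


PINS = {
    "langchain-core": "0.2.39",
    "langchain": "0.2.14",
    "langchain-community": "0.2.12",
    "langgraph": "0.1.19",
    "langchain-text-splitters": "0.2.4",
}

problems = check_pins(PINS)
if problems:
    for name, (expected, installed) in problems.items():
        print(f"   ⚠️ {name}: expected {expected}, found {installed}")
else:
    print("   ✅ All pinned packages match.")
```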

API Chaos? Enter the Circuit Breaker! (Wisdom 13: Direct REST & Wisdom 23: Circuit Breaker Protocol)

Relying on external APIs introduces inherent instability, from rate limits (429 errors) to unexpected outages. To safeguard Resumay_I's operations, we developed a custom GeminiREST client. This implementation bypasses the official SDK, directly interacting with the Google API via HTTP (Wisdom 13: The "Direct REST" Protocol), resolving persistent protobuf/grpc dependency conflicts.

Crucially, GeminiREST incorporates a "Circuit Breaker" mechanism (Wisdom 23: The Circuit Breaker Protocol). After three consecutive failed invocations, each of which has already been retried up to three times, the circuit "trips": GeminiREST returns an error immediately instead of calling the API, and Resumay_I falls back to our local Qwen model. This prevents agent crashes and conserves precious freemium quota.

# --- CUSTOM GEMINI REST CLASS WITH CIRCUIT BREAKER ---
# This custom class directly interacts with the Gemini API via HTTP, bypassing the official SDK.
# This approach (Wisdom 13: The "Direct REST" Protocol) resolves persistent protobuf/grpc dependency
# conflicts often encountered in complex Python environments, enhancing stability.
import random  # Jitter for retry delays
import time    # Sleeping between retries

class GeminiREST:
    def __init__(self, api_key, model="gemini-2.0-flash"):
        self.api_key = api_key
        self.model = model
        self.url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={api_key}"
        self.headers = {"Content-Type": "application/json"}
        self.cache = {} # Simple in-memory cache to reduce redundant API calls
        self.circuit_breaker = False
        self.consecutive_failures = 0
        self.max_failures = 3 # Number of consecutive failures before tripping the circuit breaker

    def invoke(self, prompt):
        # Wisdom 23: The Circuit Breaker Protocol
        # If the circuit breaker is open, immediately return a fallback error.        
        if self.circuit_breaker:
            return type('obj', (object,), {'content': "Error: Circuit Breaker Open."})
        # Check cache before making an API call
        if prompt in self.cache:
            return type('obj', (object,), {'content': self.cache[prompt]})

        payload = {"contents": [{"parts": [{"text": prompt}]}]}
        max_retries = 3
        base_delay = 5.0 # Base delay for exponential backoff on 429 errors

        # Wisdom 30: Import requests locally to prevent NameError in exception block
        import requests

        for attempt in range(max_retries):
            try:
                response = requests.post(self.url, headers=self.headers, json=payload, timeout=30)
                if response.status_code == 429:
                    print(f"   ⚠️ Gemini API Busy (429). Retrying... ({attempt+1}/{max_retries})")
                    time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 2)) # Exponential backoff (5s, 10s, 20s) with jitter
                    continue
                response.raise_for_status() # Raise an exception for bad status codes (4xx or 5xx)

                data = response.json()
                # Extract content from the nested JSON response structure
                text = data.get("candidates", [{}])[0].get("content", {}).get("parts", [{}])[0].get("text", "")

                if not text: 
                    raise ValueError("Empty response")

                self.cache[prompt] = text # Store successful response in cache
                self.consecutive_failures = 0 # Reset failure count on success
                return type('obj', (object,), {'content': text})

            except Exception as e: # Generic exception to be safe
                print(f"   ⚠️ Gemini API Error (Attempt {attempt+1}/{max_retries}): {e}")
                time.sleep(2) # Short delay before next retry for general errors

        # If all retries fail, increment failure count and potentially trip the circuit breaker
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.max_failures:
            self.circuit_breaker = True
            print(f"   🚫 CIRCUIT BREAKER TRIPPED: {self.max_failures} consecutive Gemini failures. "
                  "Switching to Local Mode (Qwen) for remainder of run to conserve quota.")

        return type('obj', (object,), {'content': "Error: Max retries exceeded or API unavailable."})
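The class above only reports failure; the switch to Qwen happens in the calling code, which the post does not show. A minimal sketch of that routing, assuming a hypothetical `invoke_with_fallback` helper and relying on the fact that both error strings returned by `GeminiREST` begin with "Error:":

```python
from types import SimpleNamespace


def invoke_with_fallback(primary, fallback, prompt):
    """Use the primary client unless it is tripped or returns an error string."""
    if not getattr(primary, "circuit_breaker", False):
        result = primary.invoke(prompt)
        if not result.content.startswith("Error:"):
            return result.content
    if fallback is None:
        return "Error: no model available."
    reply = fallback.invoke(prompt)
    # LangChain LLM wrappers return a plain string; chat-style clients expose .content
    return getattr(reply, "content", reply)


# Stand-ins so the sketch runs without network access or a GPU.
class TrippedClient:
    circuit_breaker = True

    def invoke(self, prompt):
        return SimpleNamespace(content="Error: Circuit Breaker Open.")


class EchoLocal:
    def invoke(self, prompt):
        return f"[local] {prompt}"


answer = invoke_with_fallback(TrippedClient(), EchoLocal(), "Critique this resume.")
print(answer)
```

Checking the breaker flag before calling `invoke` skips the primary client entirely once it has tripped, which is the point of the pattern.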

The VRAM Trap: Taming Qwen on Kaggle's T4 (Wisdom 3: GPU Troubleshooting & Wisdom 4: Dependency Hell Prevention)

Deploying local LLMs on platforms like Kaggle's T4 GPUs presents a "VRAM Trap" – limited memory for larger models. Our "Critic," Qwen 2.5-1.5B-Instruct, was specifically chosen and optimized to overcome this.

We leveraged BitsAndBytesConfig for 4-bit quantization, a critical technique for efficiently loading models into GPU memory (Wisdom 3: GPU Troubleshooting). Furthermore, we explicitly used langchain_community.llms.HuggingFacePipeline (Wisdom 4: Dependency Hell Prevention) to ensure seamless compatibility with our strictly pinned LangChain versions, preventing further dependency conflicts.

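import torch
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

print("\n⏳ Initializing Qwen 2.5-1.5B-Instruct (Critic - Local GPU Mode)...")
llm_qwen = None # Stays None if no GPU is available or the model fails to load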
# This section addresses the "VRAM Trap" of Kaggle T4 GPUs (Observations) and
# GPU initialization challenges (Wisdom 3: GPU Troubleshooting).
if torch.cuda.is_available():
    print(f" - GPU Detected: {torch.cuda.get_device_name(0)}")
    try:
        model_id = "Qwen/Qwen2.5-1.5B-Instruct"
        # BitsAndBytesConfig enables 4-bit quantization, crucial for fitting
        # larger models into Kaggle's T4 GPU memory efficiently (Wisdom 3).

        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_compute_dtype=torch.bfloat16
        )
        tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(
            model_id, quantization_config=bnb_config, device_map="auto", trust_remote_code=True
        )
        # HuggingFacePipeline is used to wrap the local Qwen model (Wisdom 4),
        # ensuring compatibility with our pinned LangChain versions.        
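        # do_sample=True is required for temperature to take effect; without it generation is greedy.
        pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=1024, do_sample=True, temperature=0.5)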
        llm_qwen = HuggingFacePipeline(pipeline=pipe)
        print(" - Qwen 2.5-1.5B-Instruct Loaded on GPU for 'Critic' role.")
    except Exception as e:
        print(f"   ⚠️ GPU Init Error for Qwen: {e}. The local Critic may be offline.")
        print("   Please ensure 'Accelerator' is set to 'GPU T4 x2' in Kaggle Notebook settings.")
else:
    print("⚠️ WARNING: No GPU detected! The Qwen 'Critic' will be offline, and all tasks will default to Gemini.")
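A rough back-of-the-envelope calculation shows why 4-bit quantization helps. This sketch estimates memory for the weights only, using the nominal 1.5B parameter count from the model name. It ignores activations, the KV cache, quantization metadata, and any layers kept in higher precision, so real usage will be higher.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


PARAMS = 1.5e9  # Nominal size from the model name; the exact count differs slightly

fp16 = weight_memory_gib(PARAMS, 16)
nf4 = weight_memory_gib(PARAMS, 4)
print(f"fp16 weights: ~{fp16:.2f} GiB")
print(f"4-bit weights: ~{nf4:.2f} GiB")
print(f"Reduction: {fp16 / nf4:.0f}x")
```

The freed memory is what leaves headroom for the embedding model, long prompts, and 1,024 new tokens of generation on a 16 GB T4.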

Interactive Dashboards in a Headless World (Wisdom 26: Enterprise Dashboard Protocol & Wisdom 32: Headless Submission Protocol)

Providing rich, interactive feedback is crucial, but rendering complex UIs in headless Kaggle environments can be tricky. Our solution was the Base64 Dashboard Protocol (Wisdom 26). Dynamically generated plots (like salary charts and jargon word clouds) are created entirely in-memory, converted to Base64 strings, and then directly embedded into the Python code for the ipywidgets dashboard. This ensures reliable rendering of visuals in both interactive development and final headless submission modes.

Combined with a safe_input function (Wisdom 32: Headless Submission Protocol) that gracefully handles user input in non-interactive environments, Resumay_I guarantees a consistent, rich user experience even under stringent competition conditions.
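The post does not show `safe_input` itself, so the following is a hypothetical sketch of the idea: try to read input, and fall back to a default when no interactive stdin is available. The exception list is an assumption. `EOFError` covers closed stdin, and `RuntimeError` is included because the "input not supported" error raised by headless notebook kernels is believed to derive from `NotImplementedError`, which is a `RuntimeError` subclass.

```python
def safe_input(prompt, default="", reader=input):
    """Return stripped user input, or `default` when input is unavailable or blank."""
    try:
        value = reader(prompt)
    except (EOFError, OSError, RuntimeError):
        # Headless run: nothing can answer the prompt, so use the default.
        return default
    value = value.strip()
    return value or default


# Simulated readers so the sketch runs anywhere; real code would use the default `input`.
def no_stdin(prompt):
    raise EOFError("stdin is closed")


def typed_url(prompt):
    return "  https://example.com/job/123  "


print(safe_input("Job URL: ", default="https://example.com/default", reader=no_stdin))
print(safe_input("Job URL: ", default="https://example.com/default", reader=typed_url))
```

Passing the reader as a parameter keeps the function testable without patching the built-in `input`.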

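# DASHBOARD NODE: Resumay_I's final reporting step.
# Builds the in-memory visuals and generates the ipywidgets dashboard code.
# AgentState is assumed to be the state type defined elsewhere in the notebook.
import base64
import io
import json

import matplotlib.pyplot as plt
import seaborn as sns

def dashboard_node(state: AgentState):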
    print("\n📊 DASHBOARD: Generated.")

    # Extract data from state for dashboard display.
    sal = state.get('salary_data', {}).get('data', {})
    # Provide default values if salary data is missing to prevent errors.    
    j, m, s = sal.get('Junior',0), sal.get('Mid',0), sal.get('Senior',0)
    mission = state.get('company_mission', 'N/A')
    jargon_b64 = state.get('jargon_b64', "") # Base64 image of jargon word cloud


    # Retrieve latest documents and critique.    
    res_vers = state.get('resume_versions', [])
    latest_resume = res_vers[-1] if res_vers else "No Resume Generated"
    cov_vers = state.get('cover_versions', [])
    latest_cover = cov_vers[-1] if cov_vers else "No Cover Letter Generated"
    crit_hist = state.get('critique_history', [])
    latest_critique = crit_hist[-1] if crit_hist else "No Critique Available"

    # Generate Salary Chart in-memory as a Base64 string.
    # Wisdom 26: The Enterprise Dashboard Protocol - Visuals on RAM.

    sal_b64 = ""
    try:
        plt.figure(figsize=(6,3))
        sns.barplot(x=['Jun', 'Mid', 'Sen'], y=[j, m, s])
        plt.title('Salary Range')
        img_io = io.BytesIO()
        plt.savefig(img_io, format='png', bbox_inches='tight')
        plt.close()
        img_io.seek(0)
        sal_b64 = base64.b64encode(img_io.read()).decode('utf-8')
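    except Exception as e:
        # The chart is optional: report the problem and let the dashboard show "No Salary Data Available."
        print(f"   ⚠️ Salary chart skipped: {e}")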

    # Generate Safe Python Code for the ipywidgets Dashboard.
    # This dynamically constructed code will be executed in the next step (Wisdom 24).
    # Wisdom 6: Chat Output Stability - Use json.dumps() for all string literals to prevent corruption.
    # Wisdom 7: Frontend Parser Stability - Generate markdown backticks programmatically (chr(96)*3).
    marker = chr(96)*3

    code = f"""
import ipywidgets as widgets
from IPython.display import display, HTML
import base64, json

style = '''<style>.header-box {{ background: linear-gradient(90deg, #1e293b, #0f172a); padding: 20px; border-radius: 10px; color: white; }}</style>'''
header_html = widgets.HTML(value=style + f'''<div class="header-box"><h2>🚀 Resumay_I Architect Dashboard</h2></div>''')

img_layout = widgets.Layout(width='100%', max_width='600px')

# Salary Box
if "{sal_b64}":
    sal_img = widgets.Image(value=base64.b64decode("{sal_b64}"), format='png', layout=img_layout)
    sal_box = widgets.VBox([widgets.HTML("<h3>💰 Salary Analysis</h3>"), sal_img], layout=widgets.Layout(align_items='center'))
else: sal_box = widgets.HTML("<h3>💰 Salary Analysis</h3><p>No Salary Data Available.</p>")

# Jargon Box
if "{jargon_b64}":
    jar_img = widgets.Image(value=base64.b64decode("{jargon_b64}"), format='png', layout=img_layout)
    jar_box = widgets.VBox([widgets.HTML("<h3>☁️ Jargon Cloud</h3>"), jar_img], layout=widgets.Layout(align_items='center'))
else: jar_box = widgets.HTML("<h3>☁️ Jargon Cloud</h3><p>No Jargon Data Available.</p>")

# 1. Visuals Tab
visuals_tab = widgets.HBox([sal_box, jar_box], layout=widgets.Layout(justify_content='space-around'))

# 2. Documents Tab
# Embed each document as an escaped JSON literal at generation time and decode it at run time
res_val = json.loads({json.dumps(json.dumps(latest_resume))})
cov_val = json.loads({json.dumps(json.dumps(latest_cover))})

res_wid = widgets.Textarea(value=res_val, description='Resume', layout=widgets.Layout(width='98%', height='400px'), disabled=True)
cov_wid = widgets.Textarea(value=cov_val, description='Cover Letter', layout=widgets.Layout(width='98%', height='400px'), disabled=True)
docs_tab = widgets.Tab(children=[res_wid, cov_wid])
docs_tab.set_title(0, '📄 Resume')
docs_tab.set_title(1, '✉️ Cover Letter')

# 3. Critique Tab
crit_val = json.loads({json.dumps(json.dumps(latest_critique))})
crit_wid = widgets.Textarea(value=crit_val, description='Critique', layout=widgets.Layout(width='98%', height='400px'), disabled=True)
crit_tab = widgets.VBox([widgets.HTML("<h3>⚖️ AI Critique</h3>"), crit_wid])

# Main Tabs
main_tabs = widgets.Tab(children=[visuals_tab, docs_tab, crit_tab])
main_tabs.set_title(0, '📊 Analytics')
main_tabs.set_title(1, '📄 Documents')
main_tabs.set_title(2, '⚖️ Critique')

display(widgets.VBox([header_html, main_tabs]))
"""
    return {"dashboard_code": code}  # Return the generated dashboard code for state tracking

🗺️ THE_FLOW: Orchestrating Intelligence with LangGraph

Resumay_I's operations are orchestrated via a LangGraph-powered state machine, THE_FLOW, which mimics the logical steps of an expert job application process:

Research Node: Gathers the job description from a URL, extracts key information (company mission, sentiment, salary data, jargon), and indexes it into a RAG (Retrieval-Augmented Generation) system (HuggingFace Embeddings + ChromaDB), which acts as Resumay_I's "external memory." It also generates a jargon word cloud. This node is where Resumay_I integrates diverse analyses (such as the multiple contest analyses, C5-C8) to form its own "ground truth" for strategic deliverables, going beyond a single LLM's initial grounding.
Draft Resume Node: Generates a tailored resume draft using Gemini (with Qwen as a fallback), informed by the RAG system and job analysis.
Draft Cover Letter Node: Creates a personalized cover letter.
Critique Node: Qwen (The Critic) rigorously evaluates the latest resume and cover letter drafts, providing detailed feedback. This is the core of the adversarial process.
Decision Node: Evaluates the critique and determines if further iterations (e.g., re-drafting based on feedback) are required or if the process should conclude. This enables iterative refinement and self-correction.
Dashboard Node: Generates the final, interactive ipywidgets dashboard, synthesizing all gathered data, generated documents, and critiques into a comprehensive, actionable overview.
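The control flow of the six nodes above can be sketched in plain Python. This is a dependency-free illustration, not the LangGraph API and not the project's actual graph. `run_flow` and the stub node functions are hypothetical, while the state keys (`resume_versions`, `cover_versions`, `critique_history`, `dashboard_code`) follow the ones used in the dashboard code earlier in this post. As in LangGraph, each node returns a partial update that is merged into the shared state.

```python
def run_flow(nodes, decide, state, max_iterations=3):
    """Research once, loop draft -> critique until `decide` says finish, then build the dashboard."""
    state = {**state, **nodes["research"](state)}
    for iteration in range(1, max_iterations + 1):
        for name in ("draft_resume", "draft_cover", "critique"):
            state = {**state, **nodes[name](state)}
        state["iterations"] = iteration
        if decide(state) == "finish":
            break
    return {**state, **nodes["dashboard"](state)}


# Stub nodes standing in for the LLM-backed ones.
def research(state):
    return {"job_summary": f"Summary of {state['job_url']}"}


def draft_resume(state):
    versions = state.get("resume_versions", [])
    return {"resume_versions": versions + [f"resume v{len(versions) + 1}"]}


def draft_cover(state):
    versions = state.get("cover_versions", [])
    return {"cover_versions": versions + [f"cover v{len(versions) + 1}"]}


def critique(state):
    history = state.get("critique_history", [])
    verdict = "OK" if len(state["resume_versions"]) >= 2 else "Needs more detail"
    return {"critique_history": history + [verdict]}


def decide(state):
    return "finish" if state["critique_history"][-1] == "OK" else "revise"


def dashboard(state):
    return {"dashboard_code": f"# {len(state['resume_versions'])} resume version(s)"}


NODES = {
    "research": research,
    "draft_resume": draft_resume,
    "draft_cover": draft_cover,
    "critique": critique,
    "dashboard": dashboard,
}

final = run_flow(NODES, decide, {"job_url": "https://example.com/job/123"})
print(final["iterations"], final["critique_history"], final["dashboard_code"])
```

The `max_iterations` cap plays the same role as a recursion limit in a graph framework: it guarantees the dashboard is produced even if the critic never signs off.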

📊 The Grand Finale: Your Interactive Career Strategist Dashboard

Resumay_I's final deliverable is an interactive ipywidgets dashboard, a comprehensive visual summary of its analysis and generated materials. This dashboard provides a clear, actionable, and visually engaging output, empowering job seekers with a strategic advantage.

The dashboard features:

Analytics: Visualizations of salary ranges and a jargon word cloud, providing quick insights into the job market and role.
Documents: The latest versions of your custom-generated resume and cover letter, ready for review.
Critique: The adversarial feedback from Qwen, highlighting specific areas for improvement, enabling targeted revisions.

✅ Conclusion: Impact & Future Outlook

Resumay_I stands as an innovative, stable, and intelligent AI Agent designed to revolutionize job application strategy. By deeply integrating solutions for agent memory through the "Mega Prompt" and "Wisdom Pattern," employing a robust multi-LLM architecture, and implementing meticulous engineering practices, Resumay_I demonstrates a high level of professionalism, repeatability, and reliability. This project is a strong candidate for the 2025 Google Kaggle Agents Intensive Capstone Project, showcasing practical AI agent development ready for real-world impact.
