<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: soohan abbasi</title>
    <description>The latest articles on DEV Community by soohan abbasi (@soohan_abbas).</description>
    <link>https://dev.to/soohan_abbas</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3928507%2Fa7b99d81-7717-4464-b5bd-1cfb8111d839.jpg</url>
      <title>DEV Community: soohan abbasi</title>
      <link>https://dev.to/soohan_abbas</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/soohan_abbas"/>
    <language>en</language>
    <item>
      <title>I Built an Offline AI Career Advisor Using Gemma 4 — Here's Exactly How It Works</title>
      <dc:creator>soohan abbasi</dc:creator>
      <pubDate>Wed, 13 May 2026 06:04:46 +0000</pubDate>
      <link>https://dev.to/soohan_abbas/i-built-an-offline-ai-career-advisor-using-gemma-4-heres-exactly-how-it-works-3hgc</link>
      <guid>https://dev.to/soohan_abbas/i-built-an-offline-ai-career-advisor-using-gemma-4-heres-exactly-how-it-works-3hgc</guid>
      <description>&lt;h1&gt;
  
  
  I Built an Offline AI Career Advisor Using Gemma 4 — Here's Exactly How It Works
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;A technical walkthrough of GuidanceOS: from model loading to multi-agent orchestration, running entirely on a Kaggle T4 GPU with no internet at inference time.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I teach Computer Science. Over the years, one thing I kept seeing was students who had decent skills but no idea what to do with them. They didn't know what jobs matched their profile, what courses to take next, or how to position themselves for a career. Career guidance platforms exist, sure — but they're mostly behind paywalls, require accounts, and need a stable internet connection.&lt;/p&gt;

&lt;p&gt;So I built GuidanceOS for the Gemma 4 Good Hackathon. The goal was simple: a fully offline AI system that takes your resume, figures out your skills, and gives you a complete career analysis — job matches, course recommendations, a 3-month learning plan, and an ATS score — all running locally on a GPU, no API calls at inference time.&lt;/p&gt;

&lt;p&gt;Here's exactly how I built it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Model Choice: Why Gemma 4 e4b-it
&lt;/h2&gt;

&lt;p&gt;The hackathon required using Gemma 4. Google released four variants: 2B, 4B (edge), 26B MoE, and 31B Dense. I went with &lt;strong&gt;gemma-4-e4b-it&lt;/strong&gt; for a specific reason.&lt;/p&gt;

&lt;p&gt;The "e" stands for edge-optimized. The "it" stands for instruction-tuned. On Kaggle's free T4 GPU (15GB VRAM), a naive load of even a 4B model can fail if quantization isn't handled right. With 4-bit NF4 quantization via BitsAndBytes, gemma-4-e4b-it loads in about 8.7GB — leaving headroom for inference.&lt;/p&gt;
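&lt;p&gt;For context, the &lt;code&gt;bnb_config&lt;/code&gt; used later in the loading code is a standard BitsAndBytes 4-bit NF4 setup. This is a sketch of what such a config looks like; the exact values in my notebook may differ slightly:&lt;/p&gt;

```python
# Sketch of a 4-bit NF4 quantization config for BitsAndBytes.
# Values are illustrative, not necessarily the exact notebook cell.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches the dtype passed to from_pretrained
    bnb_4bit_use_double_quant=True,         # second-level quantization of the constants
)
```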

&lt;p&gt;One problem I ran into immediately: the stable release of Hugging Face Transformers (5.0.0 at the time) didn't recognize the &lt;code&gt;gemma4&lt;/code&gt; architecture. Loading the model threw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: The checkpoint you are trying to load has model type `gemma4`
but Transformers does not recognize this architecture.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The fix was straightforward — install Transformers from the GitHub dev branch:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/huggingface/transformers.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This bumped the version to &lt;code&gt;5.8.0.dev0&lt;/code&gt;, which includes the Gemma 4 model class.&lt;/p&gt;

&lt;p&gt;The second issue was GPU memory management. With &lt;code&gt;device_map="auto"&lt;/code&gt;, Accelerate dispatched some of the model's modules to the CPU, which BitsAndBytes doesn't allow for a 4-bit quantized model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ValueError: Some modules are dispatched on the CPU or the disk.
Make sure you have enough GPU RAM to fit the quantized model.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Solution: pin everything to a single GPU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;AutoModelForImageTextToText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;MODEL_PATH&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;quantization_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;bnb_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;device_map&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;dtype&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bfloat16&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, the model loaded cleanly in about 3 minutes and sat at 8.7GB on GPU 0.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Knowledge Base: TF-IDF Over 130K Records
&lt;/h2&gt;

&lt;p&gt;I used two datasets:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LinkedIn Job Postings&lt;/strong&gt; — 123,849 jobs with title, description, skills, location, experience level, and salary&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Coursera Courses 2024&lt;/strong&gt; — 6,645 courses with title, skills, description, level, rating, and URL&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For job and course matching, I built a TF-IDF index over combined text fields. For jobs, I concatenated the job title, skills description, and the first 300 characters of the full description. For courses, I combined the title, skills tags, and description.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jobs_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;combined_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;jobs_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;jobs_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;skills_desc&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
    &lt;span class="n"&gt;jobs_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;[:&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I fit a TfidfVectorizer with bigrams and 10,000 features:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jobs_vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TfidfVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;max_features&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;stop_words&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;english&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ngram_range&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;jobs_tfidf_matrix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jobs_vectorizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit_transform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;jobs_clean&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;combined_text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At query time, the user's skill string gets transformed by the same vectorizer and compared against the full matrix using cosine similarity. The top-k results come back in milliseconds — no GPU needed, no network call.&lt;/p&gt;
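&lt;p&gt;In isolation, the query step looks roughly like this (a self-contained sketch, with a toy corpus standing in for the real postings):&lt;/p&gt;

```python
# Self-contained sketch of the query-time matching step.
# The toy corpus stands in for the 123K LinkedIn combined-text rows.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "Machine Learning Engineer python pytorch nlp transformers",
    "Frontend Developer react typescript css",
    "Data Scientist python pandas statistics sql",
]

vectorizer = TfidfVectorizer(max_features=10000, stop_words="english", ngram_range=(1, 2))
tfidf_matrix = vectorizer.fit_transform(corpus)

def top_k_jobs(skills: str, k: int = 2):
    # Transform the query with the SAME fitted vectorizer, then rank by cosine similarity.
    query_vec = vectorizer.transform([skills])
    scores = cosine_similarity(query_vec, tfidf_matrix).ravel()
    ranked = scores.argsort()[::-1][:k]
    return [(corpus[i], float(scores[i])) for i in ranked]

matches = top_k_jobs("python nlp transformers")
```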

&lt;p&gt;I chose TF-IDF over dense vector search (FAISS + sentence embeddings) deliberately. Dense search needs an embedding model at query time, which adds latency and memory. TF-IDF is deterministic, fast, and reproducible — important when the whole point is offline-first operation.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Inference Helper
&lt;/h2&gt;

&lt;p&gt;Before building agents, I needed a clean wrapper around Gemma 4's generation. The model uses a specific chat format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;ask_gemma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;formatted&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;bos&amp;gt;&amp;lt;start_of_turn&amp;gt;user&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;end_of_turn&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;start_of_turn&amp;gt;model&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;formatted&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;add_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;
    &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;to&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cuda:0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;no_grad&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;do_sample&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;top_p&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;repetition_penalty&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;1.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;pad_token_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;eos_token_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;input_len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;shape&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="n"&gt;input_len&lt;/span&gt;&lt;span class="p"&gt;:],&lt;/span&gt; &lt;span class="n"&gt;skip_special_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;end_of_turn&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;end_of_turn&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noting here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;add_special_tokens=False&lt;/code&gt;&lt;/strong&gt; — because I'm manually prepending &lt;code&gt;&amp;lt;bos&amp;gt;&lt;/code&gt; in the prompt string. If you let the tokenizer add it automatically as well, you get a duplicate BOS token, which confuses the model.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;repetition_penalty=1.3&lt;/code&gt;&lt;/strong&gt; — without this, the model loops. I found this out the hard way when my first test response was 200 repetitions of "matched matched matched".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decoding only new tokens&lt;/strong&gt; — &lt;code&gt;outputs[0][input_len:]&lt;/code&gt; strips the input tokens from the output before decoding. Otherwise you get the full prompt echoed back before the response.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Agents
&lt;/h2&gt;

&lt;p&gt;Each agent is a focused prompt sent to &lt;code&gt;ask_gemma&lt;/code&gt;. The agents run sequentially, not in parallel — this keeps memory usage flat and avoids context window issues.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent 1 — Skills Analyzer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Takes the raw resume text and returns a structured output in a fixed format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;TECHNICAL SKILLS: Python, NLP, LangChain, ...
SOFT SKILLS: Communication, Teaching, ...
EXPERIENCE: 5 years
LEVEL: mid
DOMAINS: Artificial Intelligence, NLP, Education
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I enforce the format in the prompt rather than post-processing with regex. Gemma 4 follows structured output instructions reliably when you give it an exact template to fill.&lt;/p&gt;
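&lt;p&gt;The exact prompt isn't reproduced here, but the template-driven style looks roughly like this (the wording is illustrative, not the notebook's verbatim prompt):&lt;/p&gt;

```python
# Illustrative Agent 1 prompt: hand the model an exact template to fill,
# so no regex post-processing is needed. Wording is a reconstruction.
SKILLS_PROMPT = """You are a career analyst. Read the resume below and fill in
the template EXACTLY, with no extra commentary.

RESUME:
{resume}

FILL IN THIS TEMPLATE:
TECHNICAL SKILLS: [comma-separated list]
SOFT SKILLS: [comma-separated list]
EXPERIENCE: [number of years]
LEVEL: [junior / mid / senior]
DOMAINS: [comma-separated list]"""

def build_skills_prompt(resume: str) -> str:
    return SKILLS_PROMPT.format(resume=resume)
```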

&lt;p&gt;&lt;strong&gt;Agent 2 — Career Path Advisor&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Takes the extracted skills string and returns three career paths with job titles, required additional skills, USD salary ranges, and a growth potential score out of 10.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent 3 — Learning Plan Designer&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Takes the skills and target role and returns a 3-month plan broken down by month — foundation topics in month 1, intermediate topics in month 2, advanced topics and portfolio projects in month 3.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent 4 — Resume and ATS Analyst&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Takes the resume text and target role and returns an ATS score out of 100, three strengths, three improvement areas, missing keywords, and a suggested rewrite for the professional summary.&lt;/p&gt;

&lt;p&gt;The skills string extracted by Agent 1 is passed directly into Agents 2 and 3, creating a lightweight chain without needing LangChain or CrewAI overhead.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gradio Interface
&lt;/h2&gt;

&lt;p&gt;I used Gradio instead of Streamlit for one reason: on Kaggle, &lt;code&gt;app.launch(share=True)&lt;/code&gt; generates a public share URL (a temporary gradio.live link) in a single line. No tunnel setup, no separate process.&lt;/p&gt;

&lt;p&gt;The interface has two inputs — resume text and target role — and six output tabs, one per agent plus job matches and course recommendations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GuidanceOS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Row&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;resume_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Textbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Resume Text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;role_input&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Textbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Target Role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;submit_btn&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Button&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze My Profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;variant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;primary&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Column&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;scale&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tab&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Skills Analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
                &lt;span class="n"&gt;skills_out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;gr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Textbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# ... five more tabs
&lt;/span&gt;
&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;launch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;share&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added &lt;code&gt;gr.Progress()&lt;/code&gt; to the main function so the UI shows which agent is running instead of just freezing. Each agent call takes 30-90 seconds on T4 — the progress bar makes it feel responsive.&lt;/p&gt;




&lt;h2&gt;
  
  
  End-to-End Flow
&lt;/h2&gt;

&lt;p&gt;When a user clicks Analyze:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Resume text → Agent 1 → structured skills profile&lt;/li&gt;
&lt;li&gt;Skills string → TF-IDF search → top 5 jobs from 123K LinkedIn postings&lt;/li&gt;
&lt;li&gt;Skills string → TF-IDF search → top 5 courses from 6.6K Coursera courses&lt;/li&gt;
&lt;li&gt;Skills string → Agent 2 → three career paths with salaries&lt;/li&gt;
&lt;li&gt;Skills string + target role → Agent 3 → 3-month learning roadmap&lt;/li&gt;
&lt;li&gt;Resume text + target role → Agent 4 → ATS score and improvements&lt;/li&gt;
&lt;li&gt;All outputs → six Gradio tabs&lt;/li&gt;
&lt;/ol&gt;
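&lt;p&gt;With the model and the searches stubbed out, the flow above reduces to a plain function chain. In this sketch, &lt;code&gt;llm&lt;/code&gt;, &lt;code&gt;search_jobs&lt;/code&gt;, and &lt;code&gt;search_courses&lt;/code&gt; are hypothetical stand-ins for &lt;code&gt;ask_gemma&lt;/code&gt; and the two TF-IDF lookups:&lt;/p&gt;

```python
# Sketch of the end-to-end orchestration with the LLM and search stubbed out.
# `llm` stands in for ask_gemma; the search callables for the TF-IDF lookups.
def run_pipeline(resume_text, target_role, llm, search_jobs, search_courses):
    skills = llm(f"Extract skills from:\n{resume_text}")                  # Agent 1
    jobs = search_jobs(skills)                                            # TF-IDF over jobs
    courses = search_courses(skills)                                      # TF-IDF over courses
    paths = llm(f"Suggest career paths for: {skills}")                    # Agent 2
    plan = llm(f"3-month plan for {target_role} given: {skills}")         # Agent 3
    ats = llm(f"ATS review of resume for {target_role}:\n{resume_text}")  # Agent 4
    return {"skills": skills, "jobs": jobs, "courses": courses,
            "paths": paths, "plan": plan, "ats": ats}
```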

&lt;p&gt;Total time: 3-5 minutes on a T4 GPU. All computation on-device. Zero external API calls.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Would Do Differently
&lt;/h2&gt;

&lt;p&gt;A few things I'd change with more time:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Structured JSON output from agents.&lt;/strong&gt; Right now the agents return free-form text. Enforcing JSON output would make the results easier to display in a proper UI — cards instead of plain text boxes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;FAISS for course search.&lt;/strong&gt; TF-IDF misses semantic similarity — "data analysis" and "analytics" are treated as different terms. Sentence embeddings with FAISS would improve course matching quality significantly.&lt;/p&gt;
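&lt;p&gt;That blind spot is easy to demonstrate: with a word-level index, "data analysis" and "analytics" share no tokens, so their cosine similarity is exactly zero:&lt;/p&gt;

```python
# Demonstrates the semantic blind spot of lexical TF-IDF matching:
# related phrases with no shared tokens score a similarity of zero.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["data analysis", "analytics"]
vec = TfidfVectorizer()
matrix = vec.fit_transform(docs)
sim = cosine_similarity(matrix[0], matrix[1])[0, 0]  # no token overlap
```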

&lt;p&gt;&lt;strong&gt;Session persistence with SQLite.&lt;/strong&gt; The current setup doesn't remember previous conversations. Adding a lightweight SQLite store would let users build on previous sessions.&lt;/p&gt;
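&lt;p&gt;A minimal version of that store needs nothing beyond the standard library; here is a sketch with a hypothetical schema and function names:&lt;/p&gt;

```python
# Minimal SQLite session store (hypothetical schema; stdlib only).
import sqlite3

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS sessions (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        resume TEXT, target_role TEXT, skills TEXT,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return conn

def save_session(conn, resume, target_role, skills):
    cur = conn.execute(
        "INSERT INTO sessions (resume, target_role, skills) VALUES (?, ?, ?)",
        (resume, target_role, skills))
    conn.commit()
    return cur.lastrowid

def last_session(conn):
    # Returns the most recent (resume, target_role, skills) tuple, or None.
    return conn.execute(
        "SELECT resume, target_role, skills FROM sessions ORDER BY id DESC LIMIT 1"
    ).fetchone()
```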

&lt;p&gt;&lt;strong&gt;SHAP explainability.&lt;/strong&gt; I had planned to add a SHAP chart showing which skills drove each job recommendation using a Random Forest trained on the jobs dataset. It didn't make the deadline but the data pipeline supports it cleanly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running It Yourself
&lt;/h2&gt;

&lt;p&gt;The full notebook is on Kaggle:&lt;br&gt;
&lt;a href="https://www.kaggle.com/code/abbasi110/guidanceos-gemma4-offline-career-advisor" rel="noopener noreferrer"&gt;kaggle.com/code/abbasi110/guidanceos-gemma4-offline-career-advisor&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Source code on GitHub:&lt;br&gt;
&lt;a href="https://github.com/soohanAbbasi/GuidanceOS" rel="noopener noreferrer"&gt;github.com/soohanAbbasi/GuidanceOS&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You need a Kaggle account to run it. Add the gemma-4-e4b-it model and both datasets, set the accelerator to GPU T4 x2, and run all cells in order. The Gradio URL prints in the last cell.&lt;/p&gt;




&lt;p&gt;That's the full build. If you have questions about any part of it — the quantization setup, the prompt templates, or the TF-IDF indexing — leave a comment and I'll answer.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>career</category>
      <category>llm</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
