Sreejit Pradhan

Posted on May 28

My AI Kept Hallucinating Career Paths. I Abandoned the Project. GitHub Copilot Helped Me Fix What Was Actually Broken.

#githubchallenge #devchallenge #githubcopilot #nextjs

This is a submission for the GitHub Finish-Up-A-Thon Challenge

Every developer has a graveyard repo. You know the one. It lives in a pinned tab you stopped opening. The commit history stops mid-sentence. The README has a section called "Roadmap" that you wrote with too much ambition and too little sleep.

Mine was PathForge AI.

The idea was real: a career intelligence engine for students in India and Southeast Asia who don't have a guidance counselor, can't afford consultants, and are one bad decision away from a degree that doesn't match their goals, grades, or budget. You enter your marks, your dream career, your financial reality — PathForge gives you three ranked career paths, real institutions, scholarship intelligence, and what it called a "brutal honesty" score.

Good idea. And then the engine started hallucinating.

The Problem Nobody Likes Talking About with AI Career Tools

The institution matching engine was confidently wrong. Not occasionally wrong — structurally wrong. It would return a medical college in a tier a student couldn't afford, recommend a stream with a 12% subject overlap to their actual grades, suggest scholarships that had been discontinued. The probability scores looked precise — "78.4% fit" — but the math underneath was guessing.

This is the specific kind of brokenness that makes you close the laptop.

It wasn't a bug I could debug with a stack trace. It was an architecture problem. The AI reasoning layer had no anchoring system — no structured parameters to constrain what "good match" actually meant. The model was doing freeform pattern matching on career data and calling it intelligence. It wasn't. It was vibes with decimal points.

I had three commits in four months. The last one was: "fix: remove hallucinated university from results (again)".

That's when I stopped.

What Was Actually in the Graveyard

Here's what PathForge looked like before I came back to it:

Stack: Next.js 16, TypeScript, NVIDIA NIM (Llama-3.1-70b-Instruct), Prisma + Supabase, Clerk auth, Zustand for state.

What worked:

The 6-step onboarding wizard
Basic auth flow via Clerk
UI and design system (ember/dark forge aesthetic — still proud of this)
NVIDIA NIM integration was live

What was broken:

The institution matching engine — hallucinating ~40% of recommendations
No real scoring logic, just prompts asking the model to "rank these paths"
No parameter constraints on what constituted a valid match
No penalty system for budget mismatches or stream misalignment
The "Reality Check Engine" was aspirational text in a README, not code

The gap between what the README promised and what the code delivered was significant. I knew it. That's partly why I stopped — finishing it felt dishonest without fixing the core thing first.

The Comeback: What I Actually Fixed

The Finish-Up-A-Thon was the forcing function I needed. Here's what changed:

1. The Multi-Factor Probability Calculator — Now With Actual Math

The old version asked the LLM to produce a probability score. That's the problem. LLMs don't do probability. They do plausible-sounding probability.

The new scoring engine is deterministic:

// Multi-Factor Probability Calculator
// Weights: marks fit (40%) + stream fit (30%) + budget fit (20%) + base score (10%),
// followed by trend bonus and penalty adjustments.
// Each calculate*Fit helper is assumed to return a value in the 0..1 range.

function calculateCareerScore(
  studentProfile: StudentProfile,
  careerPath: CareerPath
): number {
  const marksFit = calculateMarksFit(studentProfile.grades, careerPath.requiredMarks) * 0.40;
  const streamFit = calculateStreamFit(studentProfile.stream, careerPath.preferredStreams) * 0.30;
  const budgetFit = calculateBudgetFit(studentProfile.budget, careerPath.estimatedCost) * 0.20;
  const baseScore = 0.10;

  const trendBonus = getTrendBonus(careerPath.marketDemand);
  const penalties = applyPenalties(studentProfile, careerPath);

  const rawScore = (marksFit + streamFit + budgetFit + baseScore + trendBonus - penalties) * 100;

  // Clamp on both sides: stacked penalties could otherwise push the score below 0.
  return Math.max(0, Math.min(100, rawScore));
}

The LLM now receives this score and uses it as a hard anchor. It can't recommend a path with a 34% fit as a primary option. The math comes first. The language comes second.
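To make the weighting concrete, here is a minimal worked sketch with made-up inputs. It is not the PathForge code: `FitInputs`, `combineScore`, and the bonus and penalty magnitudes are assumptions for illustration, and each fit value is assumed to be normalized to the 0..1 range already.

```typescript
// Hypothetical sub-scores, each already normalized to 0..1.
interface FitInputs {
  marksFit: number;
  streamFit: number;
  budgetFit: number;
  trendBonus: number; // additive adjustment (assumed scale, e.g. 0.05)
  penalties: number;  // subtractive adjustment (assumed scale, e.g. 0.30)
}

// Weights taken from the formula described in the post.
const WEIGHTS = { marks: 0.4, stream: 0.3, budget: 0.2, base: 0.1 } as const;

function combineScore(f: FitInputs): number {
  const weighted =
    f.marksFit * WEIGHTS.marks +
    f.streamFit * WEIGHTS.stream +
    f.budgetFit * WEIGHTS.budget +
    WEIGHTS.base;
  const raw = (weighted + f.trendBonus - f.penalties) * 100;
  // Clamp so that bonuses and penalties cannot leave the 0..100 range.
  return Math.max(0, Math.min(100, raw));
}

// Strong profile: 0.36 + 0.30 + 0.16 + 0.10 + 0.05 bonus, so roughly 97.
const strong = combineScore({
  marksFit: 0.9, streamFit: 1, budgetFit: 0.8, trendBonus: 0.05, penalties: 0,
});

// Weak profile: the weighted sum (about 0.20) is smaller than the penalty (0.30),
// so the raw value is negative and the clamp returns 0.
const weak = combineScore({
  marksFit: 0.2, streamFit: 0, budgetFit: 0.1, trendBonus: 0, penalties: 0.3,
});
```

The weak case is the one worth noticing: without a lower clamp, a heavily penalized path would report a negative "probability".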

2. The Matchmaking System — Parameter Constraints That Actually Constrain

The hallucination problem wasn't the model. The model was doing what models do: generating plausible text. The problem was that I was asking it to do constraint satisfaction without giving it constraints.

The new matchmaking system defines explicit parameters before the AI ever sees a student's profile:

Budget ceiling enforcement: if an institution's fee exceeds the student's stated budget by more than 15%, it gets filtered before the prompt is built
Stream compatibility matrix: a lookup table mapping board streams to career family compatibility scores — not inferred, hardcoded
Scholarship pre-filtering: institutions are matched against the live scholarship database before being passed to the model, not after
Minimum threshold gates: a career path below 45% combined fit score never reaches the output, regardless of what the model wants to surface

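// Parameter constraint layer: runs BEFORE the AI prompt
function buildConstrainedCandidateSet(
  profile: StudentProfile,
  allPaths: CareerPath[]
): CareerPath[] {
  return allPaths
    // Budget ceiling: up to 15% over budget, boundary included.
    // Integer math avoids floating-point rounding at the exact limit.
    .filter(path => path.estimatedCost * 100 <= profile.budget * 115)
    // A missing matrix entry counts as incompatible instead of throwing.
    .filter(path => (streamCompatibilityMatrix[profile.stream]?.[path.family] ?? 0) > 0.4)
    // Score each path once and reuse the value for both the gate and the sort.
    .map(path => ({ path, score: calculateCareerScore(profile, path) }))
    .filter(({ score }) => score >= 45)
    .sort((a, b) => b.score - a.score)
    .slice(0, 6) // Top 6 candidates passed to AI for narrative generation
    .map(({ path }) => path);
}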

The AI now narrates. It no longer decides. That's the fix.
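One way to enforce "narrates, not decides" is to hand the model a fixed, pre-ranked list and then reject any response that mentions something outside it. The sketch below is an assumption about how that could look, not the actual PathForge prompt: `ScoredPath`, `buildNarrationPrompt`, `referencesOnlyKnownIds`, and the bracketed-id convention are all hypothetical.

```typescript
interface ScoredPath {
  id: string;    // hypothetical stable identifier, e.g. "eng-cs"
  name: string;
  score: number; // deterministic fit score, 0..100
}

// Build a prompt that gives the model fixed facts and asks only for prose.
function buildNarrationPrompt(candidates: ScoredPath[]): string {
  const facts = candidates
    .map((c, i) => `${i + 1}. [${c.id}] ${c.name} (fit score: ${c.score.toFixed(1)})`)
    .join("\n");
  return [
    "You are writing explanations for pre-ranked career paths.",
    "Do not add, remove, reorder, or re-score any path.",
    "Refer to each path by its bracketed id.",
    "",
    facts,
  ].join("\n");
}

// Reject a model response that mentions a bracketed id outside the candidate set.
function referencesOnlyKnownIds(response: string, candidates: ScoredPath[]): boolean {
  const mentioned: string[] = response.match(/\[([a-z0-9-]+)\]/gi) ?? [];
  return mentioned.every((m) => {
    const id = m.slice(1, -1); // strip the surrounding brackets
    return candidates.some((c) => c.id === id);
  });
}

// Example candidate set with made-up ids and scores.
const candidates: ScoredPath[] = [
  { id: "eng-cs", name: "Computer Science Engineering", score: 82.4 },
  { id: "act-sci", name: "Actuarial Science", score: 71.0 },
];
const prompt = buildNarrationPrompt(candidates);
```

The post-check matters as much as the prompt: instructions reduce invented entries, but only a validation step outside the model can guarantee that none reach the output.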

3. The Reality Check Engine — Actually Built This Time

The README mentioned this feature. The code did not have it. Now it does.

The Reality Check Engine generates specific flags, not generic warnings:

Budget gap flag: "Your stated budget (₹4L/year) is ₹2.8L below the average cost of top Engineering colleges in your shortlist. Here are 3 institutions within range."
Salary arbitrage flag: "Your target career (Data Science) pays a median ₹8.2L in Year 3. Your backup path (Actuarial Science) pays ₹11.4L in Year 3 with a 23% lower admission bar."
Survival odds flag: "4,200 students applied to your top-choice stream last year. 340 were admitted. Your profile puts you in the 61st percentile of that applicant pool."

These are not motivational statements. They are information.
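As an illustration of how a flag like the budget gap one can be computed rather than generated, here is a minimal sketch. The `Institution` shape, the `budgetGapFlag` function, and the sample fees are hypothetical; the real engine presumably draws on the institution database.

```typescript
interface Institution {
  name: string;
  annualFee: number; // rupees per year
}

interface BudgetGapFlag {
  gap: number;                // average shortlist fee minus budget, in rupees
  withinRange: Institution[]; // institutions the budget already covers
  message: string;
}

// Format rupees as lakhs, e.g. 400000 becomes "₹4.0L".
function toLakh(rupees: number): string {
  return `₹${(rupees / 100000).toFixed(1)}L`;
}

// Returns a flag only when the budget falls short of the shortlist average.
function budgetGapFlag(budget: number, shortlist: Institution[]): BudgetGapFlag | null {
  if (shortlist.length === 0) return null;
  const averageFee =
    shortlist.reduce((sum, inst) => sum + inst.annualFee, 0) / shortlist.length;
  const gap = averageFee - budget;
  if (gap <= 0) return null; // budget covers the average: nothing to flag
  const withinRange = shortlist.filter((inst) => inst.annualFee <= budget);
  const message =
    `Your stated budget (${toLakh(budget)}/year) is ${toLakh(gap)} below the ` +
    `average cost of your shortlist. ${withinRange.length} institution(s) are within range.`;
  return { gap, withinRange, message };
}

// Made-up shortlist for illustration.
const shortlist: Institution[] = [
  { name: "College A", annualFee: 900000 },
  { name: "College B", annualFee: 700000 },
  { name: "College C", annualFee: 300000 },
];
const flag = budgetGapFlag(400000, shortlist);
```

Every number in the message comes from arithmetic on stored data, so the model never has a chance to invent a figure.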

4. Persistent Career Memory

Previously, your data lived only in localStorage and was wiped whenever you cleared your browser data.

Now, Clerk and Supabase give you persistent career profiles across devices. Your history is yours, and it's actually stored.

How GitHub Copilot Fit Into This

I want to be honest about how I used it, because the honest version is more useful than "Copilot wrote my app."

Where Copilot actually changed the work:

Clarifying what I was actually building. When I came back to the codebase after months away, I used Copilot Chat to explain my own code back to me. I'd highlight a function and ask: "What does this actually do and what are its failure modes?" That sounds embarrassing. It's also just accurate. It's faster than rereading 400 lines of cold TypeScript with no context.

The stream compatibility matrix. I had the concept but not the structure. I asked Copilot: "I need a lookup table that maps Indian board exam streams to career family compatibility scores between 0 and 1. What schema would you use for this?" It gave me a direction. I rewrote it substantially — the values are mine, the institution data is mine — but the schema idea saved me an hour of second-guessing.
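For readers who want a picture of that schema, a nested `Record` is one plausible shape. This is a sketch under assumptions: the stream and career family names are simplified, and every value below is a placeholder, not PathForge's real data.

```typescript
type Stream = "science" | "commerce" | "arts";
type CareerFamily = "engineering" | "medicine" | "finance" | "design";

const STREAMS: Stream[] = ["science", "commerce", "arts"];
const FAMILIES: CareerFamily[] = ["engineering", "medicine", "finance", "design"];

// Placeholder compatibility scores between 0 and 1, for illustration only.
// Typing the table as a nested Record makes a missing cell a compile error.
const streamCompatibilityMatrix: Record<Stream, Record<CareerFamily, number>> = {
  science:  { engineering: 1.0, medicine: 1.0, finance: 0.7, design: 0.6 },
  commerce: { engineering: 0.2, medicine: 0.0, finance: 1.0, design: 0.6 },
  arts:     { engineering: 0.1, medicine: 0.0, finance: 0.5, design: 1.0 },
};

// Plain lookup: nothing is inferred at runtime.
function compatibility(stream: Stream, family: CareerFamily): number {
  return streamCompatibilityMatrix[stream][family];
}
```

Because both keys are union types, adding a new stream or career family forces every row of the table to be filled in before the code compiles.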

Bug detection on the constraint layer. The budget ceiling calculation had a boundary bug: students who exactly matched the budget threshold were being filtered out instead of included. I'd been looking at it for 20 minutes. I pasted the function into Copilot and asked it to review for edge cases. It caught it in about 30 seconds.
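The shape of that kind of bug is easy to reproduce. The sketch below is hypothetical (the original function isn't shown here) and assumes the ceiling is the stated budget plus 15%. It uses integer percentage math because a multiplier like 1.15 is not exactly representable in binary floating point, so `budget * 1.15` can land a hair under the intended ceiling.

```typescript
const TOLERANCE_PERCENT = 115; // budget plus 15%, expressed as a whole percentage

// Buggy variant: the strict comparison drops a cost that sits exactly on the ceiling.
function withinBudgetStrict(cost: number, budget: number): boolean {
  return cost * 100 < budget * TOLERANCE_PERCENT;
}

// Fixed variant: the boundary itself counts as affordable.
function withinBudget(cost: number, budget: number): boolean {
  return cost * 100 <= budget * TOLERANCE_PERCENT;
}

// With a budget of 400000, the ceiling is exactly 460000.
const exactlyAtCeilingBuggy = withinBudgetStrict(460000, 400000); // false
const exactlyAtCeilingFixed = withinBudget(460000, 400000);       // true
const justOverCeiling = withinBudget(460001, 400000);             // false
```

A test at the exact boundary value is the cheapest way to catch this class of mistake before it reaches users.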

Code review on the scoring function. Before I was confident the math was right, I asked Copilot to check whether my weighted scoring formula would behave unexpectedly at edge values (marks = 0, budget = maximum, stream = no match). It flagged a division-by-zero risk I'd missed in the marks fit calculation.
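A guarded version of a marks-fit calculation might look like the following. It is a simplified sketch: `marksFitRatio` is a hypothetical stand-in that takes two plain numbers, and treating "no requirement" as a perfect fit is an assumption, not necessarily the rule PathForge uses.

```typescript
// Hypothetical marks fit: the student's marks as a fraction of the required marks,
// capped to the 0..1 range.
function marksFitRatio(studentMarks: number, requiredMarks: number): number {
  // Reject NaN and Infinity up front so they cannot leak into the final score.
  if (!isFinite(studentMarks) || !isFinite(requiredMarks)) return 0;
  // Assumption: a requirement of zero (or less) means there is no marks bar,
  // so treat it as a full fit instead of dividing by zero.
  if (requiredMarks <= 0) return 1;
  return Math.max(0, Math.min(1, studentMarks / requiredMarks));
}

const zeroMarks = marksFitRatio(0, 75);      // 0: nothing achieved against a real bar
const noRequirement = marksFitRatio(80, 0);  // 1: guarded, no division by zero
const aboveBar = marksFitRatio(90, 75);      // 1: capped, extra marks add no extra fit
const belowBar = marksFitRatio(60, 80);      // 0.75
```

Without the guards, a zero requirement yields `Infinity` (or `NaN` for 0/0), which then propagates through the weighted sum.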

Where Copilot couldn't help:

The actual domain logic — what score a "Science stream" student should get when applying to an Arts career, what the right budget threshold multiplier should be, how to weight market demand trends — none of that came from Copilot. It doesn't know that the JEE Advanced has 150,000 serious applicants for 16,000 seats, or that a budget of ₹3L/year in India eliminates roughly 70% of private engineering colleges. That knowledge is local and specific. I had to supply it.

Copilot is a very good tool for the craft of code. It's not a substitute for knowing what you're building.

Where I'd Push Back

The constraint-based approach I built has a ceiling. It's good at filtering out bad matches. It's less good at surfacing surprising good matches — the career path a student wouldn't have thought to consider but actually fits them well.

The old hallucinating engine was wrong 40% of the time, but the remaining 60% occasionally included genuinely creative suggestions the deterministic system wouldn't generate. There's a version of PathForge that uses the constraint layer as a floor and the AI as an exploration layer on top. I haven't built that yet.

Also: the institution database is India-first. Southeast Asia support is planned but thin. The scholarship data needs regular updates to stay accurate. These aren't excuses — they're the honest version of the product right now.

Demo

Repo: github.com/ogMaverick12/pathforge-ai

The 6-step wizard takes about 90 seconds. Enter your stream, marks, dream career, and budget. What comes out is three ranked career paths with real institutions, probability scores built on actual math, and a Reality Check section that doesn't soften the numbers.

The Part That Surprised Me

Coming back to this project was harder than starting it.

Starting a project is pure possibility. Coming back to one you abandoned means confronting the gap between what you said you'd build and what you actually did. There's a specific discomfort in reading your own old comments — "// TODO: fix hallucination issue" — and knowing you left that there for months.

The Finish-Up-A-Thon forced the question: is this worth finishing, or is this a project I'm attached to for the wrong reasons?

PathForge is worth finishing because the problem is real. Students in India are making career decisions they can't easily undo, on incomplete information and with no structured support. That's not an abstract use case. The hallucinating engine was embarrassing. The deterministic scoring system isn't embarrassing.

That's the difference between the version I abandoned and the version I'm shipping.

What's the project in your graveyard that's worth coming back to? Drop the repo below — especially curious about anyone else who's hit the "AI is confidently wrong" wall and had to build structure around it to fix it.

Built by Vi-Bit Technologies. ⚡ Solving problems smarter, faster, and better.

