“Did it seriously just swallow that failure?”
I watched our CareerMind rejection endpoint return a full autopsy even when Hindsight calls failed, because we made memory logging fail-open instead of breaking user flows.
That moment changed how I think about “agent memory” in production. I used to treat memory quality as the main problem. In this codebase, the first real problem was reliability: if memory writes fail, do users still get a useful answer right now?
CareerMind is a Next.js App Router app that helps users upload resumes, run job-match checks, log rejections, and generate weekly reports. The core flow lives in route handlers under app/api, with domain logic in career-intelligence.service.ts, AI prompting in ai.ts, and memory integration in hindsight.ts. Persistent state lives in Prisma/Postgres, and Hindsight is the long-term memory layer.
If you want background on the memory system itself, these are the three references I relied on while building and debugging this:
• Hindsight GitHub repository for production agent memory examples
• Hindsight documentation for retain and recall API behavior
• Vectorize agent memory architecture page for design context
CareerMind dashboard: resume signals, rejections, and strategy outputs in one place.
What I Thought Would Work
My first assumption was straightforward: “If I feed job descriptions plus a user profile into the model, I’ll get decent match recommendations.” That gave me outputs that sounded polished but drifted into generic advice. It didn’t incorporate what the user had actually tried, what had failed, or which previous strategy worked.
So I shifted the job-match prompt in ai.ts to use three memory layers:
• Episodic: timeline events like resume_uploaded, advice_given, rejection_logged
• Semantic: stable profile signals like skills, projects, strengths, weaknesses
• Reflective: aggregated patterns and outcome history
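To make the split concrete, here is a hedged sketch of how all three layers can share a single Hindsight bank through tag conventions. The helper name and the semantic/reflective tags are my illustration, not code from the repo; the episodic convention mirrors the tags the retain call uses.

```typescript
// Hypothetical tag scheme: one memory bank, three retrieval layers.
type MemoryLayer = "episodic" | "semantic" | "reflective";

export function layerTags(layer: MemoryLayer, subtype?: string): string[] {
  // Episodic events carry a per-type tag (e.g. "event:rejection_logged")
  // so recall can filter the timeline without scanning everything.
  if (layer === "episodic") {
    return subtype ? ["episodic", "event", `event:${subtype}`] : ["episodic", "event"];
  }
  // Semantic and reflective entries are fewer, so a coarse tag is enough.
  return subtype ? [layer, subtype] : [layer];
}
```

Keeping the layers as tags on one store, rather than three stores, means a single recall query can still cut across layers when a prompt needs it.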
That made recommendations more grounded, but it also surfaced the second problem: memory calls can fail, timeout, or return awkward payloads. If I coupled endpoint success to memory success, users got 500s for what should be non-critical enrichment.
The Through-Line: Fail-Open Memory, Better Prompts
The core design decision in this repo is not “use memory.” It is “memory should improve quality, not decide availability.”
I implemented that in two places:
• hindsight.ts: all retain/recall wrappers catch failures and return safe defaults.
• Route handlers such as the job-match and report routes: AI path first, deterministic fallback second, then best-effort memory writes.
Snippet 1: Memory Writes That Refuse to Take Down Requests
// lib/hindsight.ts (simplified)
export async function logEvent(userId: string, type: EpisodicEventType, data: Record<string, unknown>) {
  let client: HindsightClient;
  let bankId: string;
  try {
    const setup = await ensureBank(userId);
    client = setup.client;
    bankId = setup.bankId;
  } catch {
    return; // fail-open: no bank means no logging, but the request survives
  }
  const event = {
    userId,
    type,
    timestamp: new Date().toISOString(),
    metadata: data,
  };
  try {
    await withRetry(() =>
      client.retain(bankId, JSON.stringify(event), {
        timestamp: event.timestamp,
        tags: ["episodic", "event", `event:${type}`],
      }),
    );
  } catch {
    return; // fail-open: a dropped event beats a failed request
  }
}
Why this shape:
• ensureBank + withRetry handles transient failures.
• Logging errors are swallowed intentionally.
• Request path stays healthy even when memory service is degraded.
Tradeoff:
• You can lose some events silently if retention fails.
• I accepted that because user-facing response latency and availability mattered more than perfect memory completeness.
I do think this deserves better observability (metrics/counters), but the fail-open behavior is the right default for this product.
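The withRetry helper itself isn't shown above, so here is a minimal sketch of the shape I mean, assuming simple exponential backoff. The name matches the repo's helper, but the attempt count, delays, and rethrow behavior are my illustration.

```typescript
// Minimal retry helper: exponential backoff between attempts,
// rethrows the last error once attempts are exhausted.
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Waits 200ms, 400ms, 800ms, ... before retrying.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Because logEvent wraps the whole call in try/catch, a withRetry that ultimately rethrows still cannot take the request down; the retry layer only narrows how often the fail-open branch fires.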
Snippet 2: Prompting Job Match With Real Memory Context
// lib/ai.ts (simplified)
export async function analyzeJobMatch(userId: string, jobDescription: string) {
  const [semantic, reflective, timeline] = await Promise.all([
    getSemanticProfile(userId),
    getReflectiveInsights(userId),
    getUserTimeline(userId),
  ]);
  const result = await runStructuredPrompt(jobMatchSchema, systemPrompt, [
    "Memory context:",
    JSON.stringify({ semantic, reflective, recentTimeline: timeline.slice(-8) }),
    "Job description:",
    jobDescription,
  ].join("\n\n"));
  await logEvent(userId, "advice_given", {
    source: "job_match_ai",
    recommendation: result.recommendation,
    missingSkills: result.missingSkills,
  });
  return result;
}
Why this improved output:
• The model sees actual recent behavior, not just static skills.
• Reflective insights introduce patterns like repeated failures or successful strategies.
• Timeline truncation (slice(-8)) avoids ballooning prompts.
The interesting part is that this keeps the prompt small but high signal. I’m not streaming raw history. I’m sending a compact memory distillation.
Where It Surprised Me
The surprising bug wasn’t hallucination. It was format drift.
Hindsight recall can return text that is mostly JSON but not always clean JSON. In hindsight.ts, I had to add a JSON extraction fallback that regexes object boundaries when direct parse fails. I don’t love that. But without it, the memory pipeline collapses whenever payload formatting is slightly off.
That one change had a practical effect: fewer empty semantic/reflective responses, which made job-match prompts less likely to degrade into generic output.
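For reference, the rescue logic is roughly this shape — a simplified sketch, not the exact parser in hindsight.ts:

```typescript
// Try strict JSON first; if that fails, grab the outermost {...} span
// from the text and parse just that. Returns null when nothing recovers.
export function extractJson(raw: string): unknown | null {
  try {
    return JSON.parse(raw);
  } catch {
    // fall through to the regex rescue
  }
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch {
    return null;
  }
}
```

The greedy match assumes one object per payload, which held for these recall responses; multiple objects in one string would need a more careful boundary scan.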
A second surprise was timeout behavior. In ai.ts, I wrapped memory calls in resume extraction with withTimeout, because slow memory retrieval should not stall resume upload. The upload route also races AI extraction against a 5-second timeout and falls back to heuristic parsing.
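The timeout wrapper is only a few lines of Promise.race. This sketch resolves to a fallback value rather than rejecting, which matches the degrade-don't-fail posture; the actual withTimeout in ai.ts may differ in shape.

```typescript
// Race a promise against a timer; on timeout, resolve with the fallback
// instead of rejecting, so callers degrade rather than fail.
export function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

One design note: resolving with a fallback (instead of rejecting) means the caller never needs a second try/catch just for slowness — slow and unavailable collapse into the same "use the default" path.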
Snippet 3: AI First, Deterministic Fallback, Then Memory
// app/api/job/match/route.ts (simplified)
try {
  result = await analyzeJobMatch(session.user.id, parsed.data.jdText);
} catch (aiError) {
  const fallback = await matchJobForUser(session.user.id, parsed.data.jdText);
  result = {
    matchScore: fallback.score,
    missingSkills: fallback.missingSkills,
    strengths: fallback.matchedSkills,
    recommendation: fallback.recommendation === "Apply" ? "APPLY" : "IMPROVE",
  };
}
await prisma.adviceLog.create(...);
await prisma.careerEvent.create(...);
await logEvent(session.user.id, "advice_given", {...});
Why this route shape works:
• User always gets a result, even when AI fails.
• DB receives structured artifacts (adviceLog, careerEvent) regardless of AI path.
• Hindsight enrichment is additive, not mandatory.
Tradeoff:
• Fallback path is less nuanced than AI+memory path.
• But it preserves continuity and keeps telemetry flowing for future learning.
What “Better” Looks Like in Practice
Before this design, a bad external dependency day could cascade into failed user operations. Afterward, the endpoints became resilient in a way users can feel:
• Rejection logging still returns autopsy-style output even when AI or memory components fail.
• Job match still returns a score/recommendation path when model calls fail.
• Weekly report still renders with partial data when sources are degraded.
The report endpoint is a good example. It computes an explicit degraded-data mode rather than pretending all sources are healthy.
Snippet 4: Degraded Data as a First-Class State
// app/api/report/route.ts (simplified)
const [userResult, resumesResult, applicationsResult, rejectionsResult, adviceLogsResult] =
  await Promise.allSettled([...]);

const degradedData =
  userResult.status === "rejected" ||
  resumesResult.status === "rejected" ||
  applicationsResult.status === "rejected" ||
  rejectionsResult.status === "rejected" ||
  adviceLogsResult.status === "rejected";

return NextResponse.json({
  ...report,
  ...(degradedData
    ? { warning: "Some data sources were temporarily unavailable." }
    : {}),
});
This is boring in the best way. It tells the truth about system state and avoids hiding partial failure.
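A tiny helper makes the Promise.allSettled pattern reusable across routes. This is my sketch, not code from the repo:

```typescript
// Unwrap a settled result, substituting a default when the source failed.
export function settledOr<T>(result: PromiseSettledResult<T>, fallback: T): T {
  return result.status === "fulfilled" ? result.value : fallback;
}
```

With something like `settledOr(rejectionsResult, [])`, the report renders an empty section for a failed source instead of returning a 500, and the degraded-data flag tells the client why.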
Interview Prep Wasn’t Just a Nice-to-Have
One feature I underestimated was interview prep. I initially thought it would be a thin wrapper around an LLM prompt, but in practice it became one of the few places where users got immediate, concrete value without waiting for long-term memory effects. In CareerMind, I generate role- and company-specific question sets (technical, behavioral, coding/SQL, plus focus areas), persist them, and let users run mock answers against the same context.
The generation path is straightforward but strict. The generation endpoint validates payloads with Zod, calls generateInterviewQuestions, and then stores the output in InterviewPrep with separate fields for technical, behavioral, coding, and focusAreas. That structure matters: instead of returning one blob of text, I keep categories explicit so the UI can present targeted drills and I can attach sessions to a TargetCompany when it exists.
// app/api/interview/generate/route.ts (simplified)
const saved = await prisma.interviewPrep.create({
  data: {
    userId: session.user.id,
    targetCompanyId: targetCompany?.id,
    company: payload.company,
    role: payload.role,
    userSkills: payload.userSkills,
    technical: questions.technical,
    behavioral: questions.behavioral,
    coding: questions.coding,
    focusAreas: questions.focusAreas,
  },
});
The mock interview endpoint takes a real question-answer pair and returns a structured critique: score (0–10), strengths, improvements, and an improved answer draft. I force JSON responses with schema validation in interview.ts, because free-form responses looked impressive but were hard to use reliably in product flows.
// lib/interview.ts (simplified)
const mockFeedbackSchema = z.object({
  score: z.number().min(0).max(10),
  strengths: z.array(z.string()).max(8),
  improvements: z.array(z.string()).max(8),
  improvedAnswer: z.string().min(1),
});
Where this ties back to Hindsight is the learning loop: interview prep gives users actionable drills now, while the rest of the platform logs outcomes and patterns that can inform later recommendations. In other words, I’m not asking memory to do everything. Hindsight carries longitudinal context; interview prep handles short feedback cycles. That split kept the system useful even when memory retrieval or AI generation had rough edges elsewhere.
Interview prep output: categorized question sets and focus areas saved per company/role.
A Concrete Before/After Scenario
Here is a scenario I saw while iterating:
Before:
• User uploads resume.
• Runs job match.
• Gets recommendation that sounds reasonable but repeats broad advice.
• Logs rejection.
• Next job match barely changes because the prompt lacks structured memory from outcomes.
After:
• Resume upload updates semantic profile and logs resume_uploaded.
• Job match reads semantic + reflective + recent timeline before generating recommendation.
• Rejection logging stores patterns and critical gap in reflective memory.
• Next job match includes those patterns, so recommendation shifts from “apply now” to “close X first,” with concrete missing skills aligned to prior failures.
This isn’t “the model got smarter.” It’s that the prompt context got more honest about past behavior.
What I’d Change Next
I’m happy with the failure boundaries, but there are rough edges I’d tackle next:
• Add explicit observability around dropped memory writes. Fail-open is correct, silent is not.
• Strengthen schema contracts between recall payloads and parsing. Regex JSON rescue is pragmatic but brittle.
• Add idempotency keys for event logging to reduce duplicate episodic entries when retries and route retries overlap.
• Move from simple timeline slices to recency + relevance selection so prompts stay compact as history grows.
I’d also probably make the memory layer explicitly asynchronous in more routes. Some calls already use Promise.allSettled without awaiting completion, which helps latency, but the pattern is uneven across handlers.
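For the idempotency point, one low-cost approach is a deterministic key derived from the event's identity, so overlapping retries can be deduplicated downstream. The key scheme here is hypothetical, not something the repo does today:

```typescript
import { createHash } from "node:crypto";

// Same user + type + timestamp always yields the same key, so overlapping
// retries of one logical event collapse into a single episodic entry.
export function eventIdempotencyKey(
  userId: string,
  type: string,
  timestamp: string,
): string {
  return createHash("sha256")
    .update(`${userId}:${type}:${timestamp}`)
    .digest("hex")
    .slice(0, 16);
}
```

Attaching this as a tag on the retain call (or checking it before writing) would let the fail-open path stay fire-and-forget while still bounding duplicate timeline entries.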
Lessons I’d Reuse on the Next Project
• Treat memory as a quality layer, not a hard dependency. Your app should function without it.
• Split memory by purpose. Episodic, semantic, and reflective layers force cleaner retrieval and better prompts.
• Build deterministic fallback paths early. You will need them sooner than you think.
• Use strict JSON schemas for model outputs, but expect bad formatting and plan a recovery path.
• Expose degraded mode honestly to clients. Engineers trust systems that admit partial failure.
The main thing I learned is simple: the best job-match prompt in this codebase is not the cleverest wording. It is the one backed by memory that survives real failure modes.
If I had to summarize the design in one line: keep user flows reliable first, then let Hindsight make them progressively smarter over time.