Shae Bryant Smith

Posted on Nov 10

AI-Powered Career Portfolios on Agentic Postgres where candidates have the power

#devchallenge #agenticpostgreschallenge #ai #postgres

Agentic Postgres Challenge Submission

SmartFolio: Multi-Source Career Portfolios with Conversational AI

What I Built

SmartFolio is an AI-powered portfolio platform that aggregates career data from multiple sources (resumes, GitHub, LinkedIn) and makes it conversationally searchable using TimescaleDB Cloud (Tiger Data) with pgvector and pg_trgm extensions.

Key Features:

Resume parsing with GPT-4o → structured experiences/education/skills
GitHub project import - auto-parses READMEs for project descriptions and tech stacks
LinkedIn profile import - paste raw text, AI extracts normalized data
Conversational chat - ask natural language questions about candidates
Hybrid search - combines semantic (pgvector) + full-text (pg_trgm) retrieval
Grounded answers - GPT-4o synthesizes responses using only retrieved context
Provenance tracking - every data point tagged with its source

Demo: https://smartfolio.vercel.app/profile/shae

GitHub: https://github.com/sbsmith86/smartfolio

Why This Matters

The modern career isn't linear—it's a portfolio of experiences across companies, open source contributions, side projects, and continuous learning. Yet traditional resumes flatten this richness into a chronological list.

SmartFolio embraces the "portfolio career" paradigm described in HBR (https://hbr.org/2021/10/why-you-should-build-a-career-portfolio-not-a-career-path), where professionals curate diverse experiences rather than climbing a single ladder. By aggregating data from multiple sources and making it conversationally queryable, SmartFolio lets hiring managers ask nuanced questions like:

"What PHP experience does this candidate have?" (retrieves jobs + GitHub projects)
"Tell me about their leadership experience" (synthesizes across multiple roles)
"What projects demonstrate full-stack skills?" (combines work history + open source)

Agentic Postgres Features Used

1. Semantic Search with pgvector

Every experience, education record, and skill is converted to a 1536-dimensional embedding using OpenAI's text-embedding-3-small model. When you ask a question, it's also embedded, and we use pgvector's cosine distance operator (<=>) to find semantically similar content:

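-- $1: embedding of the question, $2: id of the profile owner
SELECT id, "contentType", "textContent",
       1 - (embedding <=> $1::vector) AS similarity  -- cosine distance -> similarity
FROM knowledge_embeddings
WHERE "userId" = $2
ORDER BY embedding <=> $1::vector  -- smallest distance (closest match) first
LIMIT 10;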

This captures meaning and context, not just keyword matches. For example, "What Python experience" matches jobs mentioning Django, Flask, data science libraries—even if the job title doesn't include "Python Developer."
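To make the `1 - (embedding <=> $1::vector)` expression concrete, here is a minimal TypeScript sketch of the same arithmetic on plain arrays. The function names are illustrative and not taken from the SmartFolio codebase; in the app this computation happens inside Postgres, not in application code.

```typescript
// Illustrative only: pgvector performs this calculation inside Postgres.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The <=> operator returns cosine *distance*, which is 1 - similarity,
// so the SQL above subtracts it from 1 to get a similarity score back.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

Only the direction of the vectors matters here, not their length, which is why two texts of very different sizes can still score as similar.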

2. Full-Text Search with pg_trgm

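PostgreSQL's pg_trgm extension provides trigram similarity matching, which serves as the keyword ("full-text") layer here. Note that this is fuzzy string matching on three-character sequences, not PostgreSQL's built-in tsvector full-text search: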

-- $1: query text, $2: id of the profile owner
SELECT id, "contentType", "textContent",
       similarity("textContent", $1) AS similarity
FROM knowledge_embeddings
WHERE "userId" = $2
  AND similarity("textContent", $1) > 0.1  -- drop rows with almost no trigram overlap
ORDER BY similarity DESC
LIMIT 10;

This complements semantic search: specific technologies (PHP, AWS, React), company names, and industry terms are matched on their literal spelling, and trigram overlap still catches them when there are typos or minor variations.
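For intuition about what `similarity()` measures, here is a rough TypeScript approximation of pg_trgm's scoring for plain ASCII text. It is a sketch for illustration, not the extension's actual code: it lowercases the input, splits it into alphanumeric words, pads each word with two leading spaces and one trailing space, and compares the resulting sets of three-character sequences.

```typescript
// Approximation of pg_trgm trigram extraction for ASCII text (illustrative).
function trigrams(text: string): string[] {
  const seen: Record<string, boolean> = {};
  const words = text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((w) => w.length > 0);
  for (const word of words) {
    const padded = "  " + word + " "; // two leading spaces, one trailing
    for (let i = 0; i + 3 <= padded.length; i++) {
      seen[padded.slice(i, i + 3)] = true;
    }
  }
  return Object.keys(seen); // distinct trigrams only
}

// similarity = shared trigrams / distinct trigrams across both strings
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  const inB: Record<string, boolean> = {};
  for (const t of tb) {
    inB[t] = true;
  }
  let shared = 0;
  for (const t of ta) {
    if (inB[t] === true) {
      shared++;
    }
  }
  const union = ta.length + tb.length - shared;
  return union === 0 ? 0 : shared / union;
}

const exact = trigramSimilarity("PHP", "php");
const inLongText = trigramSimilarity(
  "PHP",
  "Built a PHP and Laravel billing service with MySQL"
);
```

Because the score is divided by all distinct trigrams in both strings, a short keyword compared against a long description scores low even when the keyword appears verbatim. In this sketch `exact` is 1, while `inLongText` falls below 0.1 (4 shared trigrams out of 49), which is worth keeping in mind when choosing a threshold.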

3. Hybrid Scoring

We combine both approaches with a weighted blend:

const hybridScore = (semanticScore * 0.7) + (fulltextScore * 0.3);

70% semantic similarity - captures conceptual understanding
30% full-text similarity - ensures keyword precision

The system retrieves the top-scoring items from each layer, merges them by item, and re-ranks the merged list by hybrid score to deliver the most relevant results.
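The merge and re-rank step could look roughly like the sketch below. The type and function names are hypothetical, and it assumes that an item returned by only one layer gets a score of 0 for the other layer; the post does not say how SmartFolio handles that case.

```typescript
interface LayerHit {
  id: string;
  similarity: number; // 0..1 score from one search layer
}

interface RankedItem {
  id: string;
  semanticScore: number;
  fulltextScore: number;
  hybridScore: number;
}

const SEMANTIC_WEIGHT = 0.7;
const FULLTEXT_WEIGHT = 0.3;

function mergeAndRank(
  semanticHits: LayerHit[],
  fulltextHits: LayerHit[],
  limit: number = 15
): RankedItem[] {
  const merged: Record<string, { semanticScore: number; fulltextScore: number }> = {};
  for (const hit of semanticHits) {
    merged[hit.id] = { semanticScore: hit.similarity, fulltextScore: 0 };
  }
  for (const hit of fulltextHits) {
    const existing = merged[hit.id];
    if (existing) {
      existing.fulltextScore = hit.similarity;
    } else {
      // Assumption: missing from the semantic layer means semantic score 0.
      merged[hit.id] = { semanticScore: 0, fulltextScore: hit.similarity };
    }
  }
  return Object.keys(merged)
    .map((id) => {
      const s = merged[id];
      return {
        id,
        semanticScore: s.semanticScore,
        fulltextScore: s.fulltextScore,
        hybridScore:
          s.semanticScore * SEMANTIC_WEIGHT + s.fulltextScore * FULLTEXT_WEIGHT,
      };
    })
    .sort((a, b) => b.hybridScore - a.hybridScore)
    .slice(0, limit);
}

// Made-up scores: a project with a strong keyword match overtakes a job
// that only scored well semantically.
const ranked = mergeAndRank(
  [{ id: "job-1", similarity: 0.6 }, { id: "proj-1", similarity: 0.4 }],
  [{ id: "proj-1", similarity: 0.9 }, { id: "skill-1", similarity: 0.5 }]
);
```

With these made-up numbers, `proj-1` scores 0.4 × 0.7 + 0.9 × 0.3 = 0.55 and moves ahead of `job-1` at 0.6 × 0.7 = 0.42.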

4. Fluid Storage

When you upload a resume, import GitHub projects, or paste LinkedIn data, GPT-4o parses the content and an embedding is generated for each new record right away with text-embedding-3-small. There's no schema migration or manual indexing step, so new data becomes searchable as soon as its embedding row is inserted:

// After creating an Experience record
const embeddingText = `${position} at ${company}. ${description}`;
const embedding = await generateEmbedding(embeddingText);

await prisma.$executeRaw`
  INSERT INTO knowledge_embeddings
    ("userId", "contentType", "contentId", "textContent", "embedding")
  VALUES
    (${userId}, 'experience', ${experienceId}, ${embeddingText}, ${embedding}::vector)
`;

The knowledge_embeddings table grows dynamically as users add more information.
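The insert above interpolates `${embedding}` and casts it to `vector`, but the post does not show what `generateEmbedding` returns. Assuming it yields a plain `number[]`, one option is to validate it and serialize it to pgvector's text format (`[x1,x2,...]`) before the cast. The helper below is a hypothetical sketch, not code from the SmartFolio repo.

```typescript
// 1536 is the embedding size stated earlier in this post.
const EMBEDDING_DIMENSIONS = 1536;

// Hypothetical helper: turns a number[] into pgvector's text form, e.g. "[0.5,-1,2]".
function toVectorLiteral(
  embedding: number[],
  expectedDims: number = EMBEDDING_DIMENSIONS
): string {
  if (embedding.length !== expectedDims) {
    throw new Error(
      `Expected ${expectedDims} dimensions, got ${embedding.length}`
    );
  }
  for (const value of embedding) {
    if (!isFinite(value)) {
      throw new Error("Embedding contains a non-finite value");
    }
  }
  return "[" + embedding.join(",") + "]";
}
```

Catching a wrong-sized or NaN-containing embedding in application code gives a clearer error than letting the database reject the row.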

5. Agent Pattern (MCP)

SmartFolio follows an agent pattern inspired by the Model Context Protocol (MCP), with two specialized agents:

Ingestion Agent (GPT-4o) - normalizes and structures incoming data
- Parses resumes → extracts experiences, education, skills
- Parses GitHub READMEs → extracts project descriptions, tech stacks
- Parses LinkedIn text → normalizes job titles, dates, companies
Query Agent (GPT-4o) - retrieves relevant context and synthesizes grounded answers
- Embeds the question
- Runs hybrid search to retrieve top 15 relevant items
- Builds context from retrieved experiences/education/skills
- Synthesizes answer using only provided context (temp 0.3)

This separation ensures data quality on write and intelligent responses on read.
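The "synthesize using only provided context" step of the Query Agent can be sketched as a small prompt builder. Everything here is an assumption made for illustration: the `RetrievedItem` shape, the prompt wording, and the numbered-citation format are not taken from the SmartFolio source. Only the 0.3 temperature comes from the description above.

```typescript
interface RetrievedItem {
  contentType: string; // e.g. "experience", "education", "skill"
  contentId: string;
  textContent: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Temperature stated in the post; passed alongside the messages to the model.
const SYNTHESIS_TEMPERATURE = 0.3;

// Hypothetical prompt builder: numbers each retrieved item so the model
// can cite it, and instructs the model to stay within that context.
function buildGroundedMessages(
  question: string,
  items: RetrievedItem[]
): ChatMessage[] {
  const context = items
    .map((item, i) => `[${i + 1}] (${item.contentType}) ${item.textContent}`)
    .join("\n");
  const system =
    "Answer using only the numbered context below. " +
    "Cite sources by number. " +
    "If the context does not contain the answer, say so.\n\n" +
    "Context:\n" +
    (context.length > 0 ? context : "(no context retrieved)");
  return [
    { role: "system", content: system },
    { role: "user", content: question },
  ];
}
```

Numbering the items gives the model something stable to cite, which the UI can then map back to the underlying experience, education, or skill record.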

How to Test

Visit the demo profile: https://smartfolio.vercel.app/profile/shae
No login required - the profile is publicly accessible
Chat is open by default - try asking:
- "What PHP experience does this candidate have?"
- "Tell me about their leadership experience"
- "What projects have they built?"
Click citations - each answer includes clickable source citations that scroll to the relevant section
Expand tech panel - see how hybrid search, pgvector, and pg_trgm work together

Technical Architecture

Stack:

Next.js 16.0.0 with App Router and Turbopack
TimescaleDB Cloud (Tiger Data) - PostgreSQL with pgvector and pg_trgm extensions
Prisma 6.18.0 - ORM with raw SQL for vector operations
OpenAI - GPT-4o for parsing/synthesis, text-embedding-3-small for embeddings
Vercel - deployment and hosting

Key Implementation Details:

Hybrid Search Optimization
- Increased result limit from 8 to 15 after diagnostic analysis
- Created debug-chat-search.js tool to analyze search quality
- Discovered fulltext threshold (0.1) was filtering out relevant results
- Now retrieves 10+ items and generates 3+ citations for tech-specific queries
Citation System
- Compact inline pills instead of large boxes
- Shows company name + position: "Technical Lead at Lantern"
- Tooltips reveal full excerpts on hover
- Clicking scrolls to specific experience/education/skill (not just section)
UX Refinements
- Auto-scroll only scrolls chat container (not entire page)
- Citations scroll to specific items with 100px offset
- Technology explanation panel educates judges on Agentic Postgres features
- Status badges clarify Active vs Demo features

Challenges & Solutions

Challenge 1: Semantic vs Keyword Mismatch

Problem: GitHub projects with explicit PHP/Laravel tech stacks weren't ranking high semantically for "What PHP experience does the candidate have?"

Root Cause: The question uses "experience" (career language) while the project descriptions use "service" and "platform" (product language), so semantic search ranks jobs above projects.

Solution: Hybrid search with 30% fulltext weight ensures exact keyword matches get boosted. Also increased result limit to 15 to capture more diverse sources.

Challenge 2: Shallow Results

Problem: Chat only returned 1-2 relevant experiences for technology-specific questions.

Root Cause: Result limit (8) was too restrictive. Fulltext similarity threshold (0.1) filtered out many PHP mentions.

Solution:

Increased hybrid search limit from 8 to 15 results
Created diagnostic tool (debug-chat-search.js) to analyze search quality
Validated improvement: 3 citations vs 1 originally
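The author's debug-chat-search.js is not shown in this post. As a rough idea of what such a diagnostic might report, the hypothetical sketch below takes the scores from one search layer and separates the items that are kept from those cut by the similarity threshold and those cut by the result limit.

```typescript
interface LayerScore {
  id: string;
  similarity: number;
}

interface LayerDiagnostics {
  kept: string[];
  cutByLimit: string[];
  cutByThreshold: string[];
}

// Hypothetical diagnostic: shows which filter removed each item.
function diagnoseLayer(
  scores: LayerScore[],
  threshold: number,
  limit: number
): LayerDiagnostics {
  // Copy before sorting so the caller's array is left untouched.
  const sorted = scores.slice().sort((a, b) => b.similarity - a.similarity);
  const passing = sorted.filter((s) => s.similarity > threshold);
  return {
    kept: passing.slice(0, limit).map((s) => s.id),
    cutByLimit: passing.slice(limit).map((s) => s.id),
    cutByThreshold: sorted
      .filter((s) => s.similarity <= threshold)
      .map((s) => s.id),
  };
}

// Made-up scores for illustration.
const report = diagnoseLayer(
  [
    { id: "job-1", similarity: 0.5 },
    { id: "proj-1", similarity: 0.08 },
    { id: "job-2", similarity: 0.2 },
    { id: "skill-1", similarity: 0.12 },
  ],
  0.1,
  2
);
```

Seeing the two kinds of cuts side by side makes it easier to tell whether a missing result calls for a lower threshold or a higher limit.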

What's Next

For Full Production:

🔗 Knowledge graph visualization - show relationships between skills/companies/projects
🤝 Team collaboration - share candidate profiles with hiring teams
🔒 Privacy controls - candidate decides what's public vs private

Lessons Learned:

Hybrid search quality depends on: embedding coverage, similarity thresholds, scoring weights, result limits, and LLM synthesis behavior
Diagnostic tooling is essential for optimizing search relevance
Secondary information such as chat citations should be presented compactly, so the UI stays information-dense without crowding the answer
Semantic search captures meaning but can miss keyword-specific relevance when context differs

Conclusion

SmartFolio demonstrates how TimescaleDB Cloud (Tiger Data) with pgvector and pg_trgm enables sophisticated AI applications with:

Semantic understanding - finds conceptually similar content
Keyword precision - ensures exact terms match
Hybrid scoring - balances both approaches
Fluid storage - instant searchability without schema changes
Agent patterns - specialized ingestion and query agents

The result is a conversational interface that makes career portfolios queryable, verifiable, and grounded in actual data—not hallucinated summaries.

Try it yourself: https://smartfolio.vercel.app/profile/shae

Source code: https://github.com/sbsmith86/smartfolio

Built for the Agentic Postgres Challenge using TimescaleDB Cloud.