Mani

Posted on Jun 3

How I Built an AI System That Turns Gmail Into a Job Tracker

#webdev #buildinpublic #nextjs #ai

I missed an interview because Gmail buried it under a Netflix receipt.

That mistake led me to build HireCanvas.

I was applying to 100+ jobs and using one inbox for everything: bank statements, Wi-Fi recharges, LinkedIn alerts, and somewhere in that chaos, a Stripe interview invite I never saw in time. The opportunity was gone before I even knew it existed.

The data was right there. Every recruiter reply, every status update, every "we've decided to move on" — sitting in Gmail. Just not being read systematically.

So I built a system that reads it for me.

[SCREENSHOT: Landing page hero — hirecanvas.in]

This is the full technical breakdown. Architecture. AI extraction pipeline. Queue design. Security. CI/CD. What I'd do differently.

What It Does in One Diagram

YOUR INBOX (before HireCanvas)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📧 Bank statement
📧 Netflix invoice
📧 [MISSED] "Interview invitation — Stripe" ← gone
📧 Wi-Fi recharge
📧 "Your application to Google was received"
📧 LinkedIn: 10 new jobs for you
📧 "We're moving forward with other candidates" (Meta)
📧 "Interview scheduled — Vercel"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

AFTER HIRECANVAS SYNC
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ Stripe    → Interview    (pipeline updated, reminder set)
✅ Google    → Applied      (new entry created)
✅ Meta      → Rejected     (status updated)
✅ Vercel    → Interview    (pipeline updated, reminder set)
🚫 Bank / Netflix / Wi-Fi  → filtered, never hits AI
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

By the Numbers

Before the architecture — because metrics establish credibility fast.

Metric	Value
Database migrations	43
GitHub Actions CI runs	89
PRs merged to main	28
BullMQ workers	3
Noise filter stages before AI	5
AI pipeline stages per email	3
Provider fallback levels	5
Emails that actually reach AI	~15-25% of inbox
Cost per email extraction	~$0.0008 (Gemini 2.5 Flash)
Budget exceeded retry delay	12 hours
Manual production deploys	0

The Stack

Layer	Tech
Frontend	Next.js 15 App Router, React 19, TypeScript strict
Styling	Tailwind CSS 4 — mint `#f0fdfb` / teal `#14b8a6`
State	Zustand 5 (client) + TanStack Query 5 (server)
Database	Supabase — PostgreSQL + RLS + Realtime + Storage
Queue	BullMQ 5 + Redis (Valkey in production)
AI	Gemini 2.5 Flash / Claude Haiku 4.5 / GPT-4o / Ollama
Infra	Docker multi-stage, EC2, Nginx
CI/CD	GitHub Actions — lint → typecheck → audit → unit → e2e → deploy
Payments	Stripe + webhooks
Email	AWS SES
Security	AES-256-GCM token encryption, RLS everywhere, PII sanitization

Two decisions that shaped everything:

All AI work runs through a queue, never in an API route. Syncing 500 emails takes minutes, and a web request would time out long before the sync finished.
Table view, not Kanban. Most job trackers use drag-and-drop boards. A filterable table is faster to scan at 80+ applications.

[SCREENSHOT: Applications table — Netflix/Vercel/Meta/Google/Stripe at different stages]

System Architecture

┌─────────────────────────────────────────────┐
│           BROWSER (Next.js Client)           │
│   Zustand + TanStack Query (status polling)  │
└────────────────────┬────────────────────────┘
                     │ HTTPS / Server Actions
                     ▼
┌─────────────────────────────────────────────┐
│           NEXT.JS SERVER                    │
│   API Routes  |  Server Actions             │
└────────┬────────────────────┬───────────────┘
         │ reads/writes       │ enqueue job
         ▼                    ▼
┌─────────────────┐  ┌────────────────────────┐
│   SUPABASE      │  │   REDIS (BullMQ)       │
│   PostgreSQL    │  │   3 queues:            │
│   RLS + Auth    │◄─│   sync/extract/remind  │
│   Realtime      │  └──────────┬─────────────┘
└─────────────────┘             │
                    ┌───────────┼────────────┐
                    ▼           ▼            ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │  SYNC    │ │EXTRACTION│ │ REMINDER │
             │ WORKER   │ │ WORKER   │ │ WORKER   │
             │          │ │          │ │          │
             │Gmail API │ │Gemini    │ │AWS SES   │
             │OAuth     │ │Claude    │ │Schedule  │
             │5-stage   │ │GPT-4o    │ │follow-ups│
             │filter    │ │Verifier  │ │          │
             └──────────┘ └──────────┘ └──────────┘

The key flow:

1. User hits Sync (or the daily cron fires at 10 PM).
2. The API route creates a BullMQ job and returns immediately, so there is no timeout risk.
3. The client polls job status via TanStack Query.
4. Supabase Realtime pushes live updates to sync_status.
5. The user sees a live progress indicator, with no custom WebSocket server to run (Supabase Realtime manages the socket).

[SCREENSHOT: Dashboard — KPI cards + Daily Sync Report panel]

Database Design

43 migrations. Here are the three tables with non-obvious design decisions.

`processed_emails` — Dedup Without Storing Email Content

CREATE TABLE processed_emails (
  id uuid PRIMARY KEY,
  user_id uuid REFERENCES app_users,
  gmail_message_id text NOT NULL,
  content_hash text NOT NULL,   -- SHA-256(sender + subject + snippet)
  review_status text DEFAULT 'needs_review',
  created_at timestamptz DEFAULT now(),
  UNIQUE(user_id, content_hash)
);

We never store email bodies. A SHA-256 hash is enough to detect duplicates. If hash exists: skip. No AI call. No DB write.
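
A minimal sketch of how that content hash could be computed with Node's built-in crypto module. The `contentHash` name, the unit-separator delimiter, and the trim/lowercase normalization are assumptions for illustration; the post only states that the hash covers sender, subject, and snippet.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape: only the three fields the hash is said to cover.
interface EmailFingerprintInput {
  sender: string;
  subject: string;
  snippet: string;
}

// Assumption: join fields with a separator that does not occur in normal text,
// so ("ab", "c") and ("a", "bc") cannot produce the same canonical string.
const FIELD_SEPARATOR = "\u001f"; // ASCII unit separator

export function contentHash(email: EmailFingerprintInput): string {
  // Assumption: trim and lowercase so cosmetic differences do not defeat dedup.
  const canonical = [email.sender, email.subject, email.snippet]
    .map((field) => field.trim().toLowerCase())
    .join(FIELD_SEPARATOR);
  return createHash("sha256").update(canonical, "utf8").digest("hex");
}
```

The hex digest is what would land in content_hash, and the UNIQUE(user_id, content_hash) constraint then rejects repeats at the database layer.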

`job_status_timeline` — State Machine as Append-Only Log

CREATE TABLE job_status_timeline (
  id uuid PRIMARY KEY,
  job_id uuid REFERENCES jobs,
  from_status text,
  to_status text NOT NULL,
  trigger_source text,    -- 'ai_extraction' | 'manual' | 'csv_import'
  confidence float,
  evidence_quote text,    -- verbatim quote from the email
  triggered_at timestamptz DEFAULT now()
);

Every status change is a new row, not an UPDATE. This gives full history for the timeline view — and enables insights like "this application sat at Interview for 12 days then went silent."
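
To make the "sat at Interview for 12 days" insight concrete, here is a small sketch that reads the latest row of an append-only timeline and measures how long the application has been in that status. The `daysInCurrentStatus` helper and the trimmed-down `TimelineRow` shape are hypothetical; only to_status and triggered_at come from the table above.

```typescript
// Subset of job_status_timeline columns needed for this calculation.
interface TimelineRow {
  to_status: string;
  triggered_at: string; // ISO 8601 timestamp
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the current status and the whole days elapsed since it was entered,
// or null when the job has no timeline rows yet.
export function daysInCurrentStatus(
  rows: TimelineRow[],
  now: Date,
): { status: string; days: number } | null {
  if (rows.length === 0) return null;
  // The newest row wins; rows are never updated, so no other state is needed.
  const latest = rows.reduce((a, b) =>
    Date.parse(a.triggered_at) >= Date.parse(b.triggered_at) ? a : b,
  );
  const elapsedMs = now.getTime() - Date.parse(latest.triggered_at);
  return { status: latest.to_status, days: Math.floor(elapsedMs / MS_PER_DAY) };
}
```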

`ai_usage` — Per-User Cost Ledger

CREATE TABLE ai_usage (
  id uuid PRIMARY KEY,
  user_id uuid REFERENCES app_users,
  model text NOT NULL,
  stage text NOT NULL,          -- 'classifier' | 'extractor' | 'verifier'
  input_tokens int NOT NULL,
  output_tokens int NOT NULL,
  cost_usd numeric(10,6) NOT NULL,
  created_at timestamptz DEFAULT now()
);

Every single AI call is logged with exact cost. This feeds the daily budget cap system.
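
A rough sketch of how ledger rows could be rolled up into "spent today" for the budget check. The `spentOnUtcDay` name and the choice of a UTC day boundary are assumptions; the post does not say how the day is defined.

```typescript
// Subset of ai_usage columns needed to total a day's spend.
interface AiUsageRow {
  cost_usd: number;
  created_at: string; // ISO 8601 timestamp
}

// Sums cost_usd for rows created on the same UTC calendar day as `day`.
export function spentOnUtcDay(rows: AiUsageRow[], day: Date): number {
  const dayKey = day.toISOString().slice(0, 10); // "YYYY-MM-DD"
  const total = rows
    .filter((row) => new Date(row.created_at).toISOString().slice(0, 10) === dayKey)
    .reduce((sum, row) => sum + row.cost_usd, 0);
  // Round to 6 decimals to match the numeric(10,6) column precision.
  return Number(total.toFixed(6));
}
```

In production this would more likely be a SUM() query filtered by user_id and date; the in-memory version just shows the arithmetic.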

Full schema available in the GitHub repo.

Gmail Sync Engine

Incremental Sync — Gmail History IDs

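// Try incremental sync first (only what changed since last run)
let messages;
if (lastHistoryId) {
  try {
    messages = await listFromHistory(gmail, lastHistoryId);
  } catch (err) {
    // Gmail returns 404 when the stored historyId is too old to replay.
    // History is typically kept for about a week, so don't rely on a fixed window.
    // Any other error is a real failure and must not be swallowed.
    if ((err as { code?: number }).code !== 404) throw err;
  }
}
// First sync, or history expired: fall back to a date-range query
if (!messages) {
  messages = await listFromQuery(gmail, dateRange);
}
// Wide date ranges are sliced into 30-day chunks
// to prevent silent truncation from Gmail API limits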
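
The 30-day slicing mentioned in the comment can be sketched as a pure function. `sliceIntoChunks` and the half-open `DateRange` shape are hypothetical names for illustration, not the repo's actual helpers.

```typescript
// Half-open interval: start is included, end is excluded.
interface DateRange {
  start: Date;
  end: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Splits a wide range into consecutive chunks of at most `chunkDays` days.
export function sliceIntoChunks(range: DateRange, chunkDays = 30): DateRange[] {
  if (chunkDays <= 0) throw new Error("chunkDays must be positive");
  const chunkMs = chunkDays * MS_PER_DAY;
  const endMs = range.end.getTime();
  const chunks: DateRange[] = [];
  let cursor = range.start.getTime();
  while (cursor < endMs) {
    // The final chunk is clamped so it never runs past the requested end.
    const next = Math.min(cursor + chunkMs, endMs);
    chunks.push({ start: new Date(cursor), end: new Date(next) });
    cursor = next;
  }
  return chunks;
}
```

Each chunk would then become its own list query, so no single request has to page through months of mail.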

The 5-Stage Noise Filter

Only ~15-25% of emails survive this. That's the entire cost model.

Email arrives
     │
     ▼
[1] OUTBOUND CHECK     → sent mail? discard
     │
     ▼
[2] GMAIL LABELS       → PROMOTIONS/SOCIAL/FORUMS?
                          discard UNLESS known ATS domain
     │
     ▼
[3] SIZE GUARD         → HTML body > 50KB + not ATS domain?
                          discard (bulk newsletter)
     │
     ▼
[4] SHA-256 DEDUP      → hash(sender+subject+snippet) in DB?
                          skip (already processed)
     │
     ▼
[5] KEYWORD FAST-SKIP  → regex on subject + sender
                          newsletter pattern? discard
                          job pattern? proceed
     │
     ▼
  Enqueue for AI extraction

Full implementation: src/lib/gmail/noiseFilter.ts
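
For readers who want the shape without opening the repo, here is a condensed sketch of the five stages as one function. Everything named here (`noiseFilter`, `CandidateEmail`, the two regexes, the tie-break where a job keyword beats a newsletter keyword) is an assumption for illustration. The label IDs are Gmail's standard system labels.

```typescript
type Stage = "outbound" | "labels" | "size" | "dedup" | "keyword";
type Verdict = { keep: true } | { keep: false; stage: Stage };

interface CandidateEmail {
  sender: string;
  senderDomain: string;
  subject: string;
  labelIds: string[];
  htmlBytes: number;
  contentHash: string;
}

interface FilterContext {
  atsDomains: ReadonlySet<string>; // known applicant-tracking-system domains
  seenHashes: ReadonlySet<string>; // hashes already in processed_emails
}

const NOISY_LABELS = ["CATEGORY_PROMOTIONS", "CATEGORY_SOCIAL", "CATEGORY_FORUMS"];
const MAX_HTML_BYTES = 50 * 1024;
// Illustrative patterns only; real lists would be longer.
const JOB_PATTERN = /\b(application|interview|offer|candidate|position)\b/i;
const NEWSLETTER_PATTERN = /\b(newsletter|digest|unsubscribe|new jobs for you)\b/i;

export function noiseFilter(email: CandidateEmail, ctx: FilterContext): Verdict {
  const isAts = ctx.atsDomains.has(email.senderDomain);
  // [1] Outbound check
  if (email.labelIds.includes("SENT")) return { keep: false, stage: "outbound" };
  // [2] Gmail category labels, unless the sender is a known ATS
  if (!isAts && email.labelIds.some((label) => NOISY_LABELS.includes(label))) {
    return { keep: false, stage: "labels" };
  }
  // [3] Size guard for bulk newsletters
  if (!isAts && email.htmlBytes > MAX_HTML_BYTES) return { keep: false, stage: "size" };
  // [4] Dedup against already-processed hashes
  if (ctx.seenHashes.has(email.contentHash)) return { keep: false, stage: "dedup" };
  // [5] Keyword fast-skip on sender + subject
  const haystack = `${email.sender} ${email.subject}`;
  if (JOB_PATTERN.test(haystack)) return { keep: true };
  if (NEWSLETTER_PATTERN.test(haystack)) return { keep: false, stage: "keyword" };
  // Assumption: ambiguous mail proceeds and the Stage 1 classifier decides.
  return { keep: true };
}
```

The cheap checks run first, so most mail is discarded before a regex or a database lookup is ever needed.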

The 3-Stage AI Extraction Pipeline

This is the most important engineering in the project.

The problem with one LLM call: hallucinations corrupt your data silently. A model invents a company name, misreads a rejection as an interview invite, returns confident garbage.

The solution: 3 stages. Each stage has one job. Stage 3 runs on a different provider than Stage 2.

Raw Email
    │
    ▼
[SANITIZER]     Strip SSN, credit cards, API keys, passwords
                Log PII patterns fired → extraction_audit_log
    │
    ▼
[STAGE 1]       Relevance Classifier
CLASSIFIER      Model: Gemini 2.5 Flash
                Input: sender + subject + first 800 chars

                Output: {
                  is_job_lifecycle: boolean,
                  email_type: 'interview_invite' | 'rejection' | ...,
                  confidence: 0.0-1.0
                }
    │
    │ is_job_lifecycle = true (or ATS domain override)
    ▼
[STAGE 2]       Structured Extractor
EXTRACTOR       Model: Gemini 2.5 Flash
                Input: first 2500 chars of sanitized body

                Output: {
                  company, role, status, recruiter_name,
                  interview_date, salary_range, ats_vendor,
                  low_confidence_fields: string[]
                }
    │
    ▼
[STAGE 3]       Cross-Model Verifier
VERIFIER        Model: Claude Haiku 4.5 (if Stage 2 = Gemini)
                    OR GPT-4o (if Stage 2 = Claude)
                NEVER same model as Stage 2

                Checks:
                  - Does company appear in email body?
                  - Does status match context?
                  - Can it find a verbatim evidence quote?

                Output: {
                  approved: boolean,
                  status_evidence: string,   ← QUOTE PROOF CHECK
                  corrections: {}
                }
    │
    ├── confidence >= threshold + evidence found
    │       → AUTO-ACCEPT: DB upsert, timeline row, reminder
    │
    └── confidence < threshold OR evidence missing
            → HUMAN REVIEW QUEUE ("Review Pending" in UI)
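
The final branch of the diagram can be sketched as a single routing function. The 0.85 threshold is a placeholder (the post does not state the real value), and `routeExtraction` is a hypothetical name. Requiring `approved` to be true is also an assumption.

```typescript
interface VerifierResult {
  approved: boolean;
  status_evidence: string;
}

type Route = "auto_accept" | "human_review";

// Placeholder value; the actual threshold is not given in the article.
const AUTO_ACCEPT_THRESHOLD = 0.85;

export function routeExtraction(
  confidence: number,
  verifier: VerifierResult,
  emailBody: string,
  threshold: number = AUTO_ACCEPT_THRESHOLD,
): Route {
  const evidence = verifier.status_evidence.trim().toLowerCase();
  // An empty quote is treated as missing evidence, never as a match.
  const evidenceFound =
    evidence.length > 0 && emailBody.toLowerCase().includes(evidence);
  const accept = verifier.approved && evidenceFound && confidence >= threshold;
  return accept ? "auto_accept" : "human_review";
}
```

Anything that fails any one of the three conditions lands in the review queue, which keeps the default path the safe one.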

Why Different Models for Stage 2 and Stage 3?

Two models trained by different organizations on different data are very unlikely to hallucinate in the same way about the same input.

If Gemini invents a company name, Claude won't confirm it. Claude has no idea what Gemini was thinking. The disagreement surfaces the error.
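
One way to enforce the "never the same model" rule in code is a lookup table that simply has no entry mapping a provider to itself. This is a sketch: `pickVerifier` is a hypothetical helper, and the row for an OpenAI extractor is an assumption, since the post only specifies the Gemini and Claude cases.

```typescript
type Provider = "gemini" | "claude" | "openai";

// Preferred verifier order per extractor. No provider ever lists itself.
const VERIFIER_PREFERENCE: Record<Provider, Provider[]> = {
  gemini: ["claude", "openai"], // from the pipeline diagram
  claude: ["openai", "gemini"], // from the pipeline diagram (GPT-4o first)
  openai: ["claude", "gemini"], // assumption: not specified in the article
};

// Returns the first available verifier that differs from the extractor,
// or null when no independent provider is up (route to human review).
export function pickVerifier(
  extractor: Provider,
  available: ReadonlySet<Provider>,
): Provider | null {
  for (const candidate of VERIFIER_PREFERENCE[extractor]) {
    if (available.has(candidate)) return candidate;
  }
  return null;
}
```

Returning null instead of falling back to the extractor's own provider keeps the guarantee intact even during an outage.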

The Quote Proof Check

This is the single most important safeguard in the pipeline:

// The verifier must return a verbatim quote from the email
// that justifies the status it assigned.
// We then do a literal string search.

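// Normalize once; an empty quote would match any body, so reject it
const evidence = (result.status_evidence ?? '').trim().toLowerCase();
const quoteExists =
  evidence.length > 0 && emailBody.toLowerCase().includes(evidence);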

if (!quoteExists) {
  // Model fabricated a quote that doesn't exist → hard fail
  return { approved: false, reason: 'evidence_not_found_in_body' };
}

If the quote doesn't exist word-for-word in the email, the model hallucinated its evidence. Route to human review — never silently write to DB.

Full implementation: src/lib/queue/workers/processExtractionJob.ts

A Real Example

EMAIL:
From: recruiting@stripe.com
Subject: Interview Invitation — Engineering at Stripe

"Hi Alex, we'd like to invite you for a technical interview
with our engineering team. Scheduled for June 10th, 2PM PST via Zoom."

STAGE 1:  { is_job_lifecycle: true, email_type: "interview_invite", confidence: 0.98 }

STAGE 2:  { company: "Stripe", status: "interview",
            interview_date: "2026-06-10", role: null,
            low_confidence_fields: ["role"] }

STAGE 3:  { approved: true,
            status_evidence: "invite you for a technical interview" }

QUOTE CHECK: "invite you for a technical interview"
  → exists in body? ✅ YES

RESULT:
  → status updated to Interview
  → timeline row written (Applied → Interview)
  → reminder scheduled for June 9th, 9:00 AM

LLM Router and Circuit Breakers

AI providers go down. Rate limits hit. Credits run out. The product shouldn't go dark.

// On any provider failure:
await redis.set(`llm:cooldown:${provider}`, '1', 'PX', 25_000);
// Quarantined for 25 seconds, then auto-recovers

// Fallback chain:
async function getAvailableProvider(preferred: string): Promise<string> {
  for (const provider of [preferred, 'gemini', 'openai', 'claude', 'ollama']) {
    const cooling = await redis.exists(`llm:cooldown:${provider}`);
    if (!cooling) return provider;
  }
  return 'regex_fallback';  // last resort — no hallucinations possible
}

Fallback chain:

Gemini 2.5 Flash → GPT-4o → Claude Haiku → Ollama (local, free) → Regex Parser

Ollama handles expensive tasks like resume tailoring locally — protecting paid API keys for critical extraction work.
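
To see the cooldown behaviour without a Redis instance, here is an in-memory sketch of the same idea with an injectable clock. `CooldownRegistry` and `pickProvider` are hypothetical names; the production version described above keeps this state in Redis keys with a 25-second expiry.

```typescript
const COOLDOWN_MS = 25_000;
const FALLBACK_CHAIN = ["gemini", "openai", "claude", "ollama"] as const;

export class CooldownRegistry {
  private readonly until = new Map<string, number>();
  private readonly now: () => number;

  // The clock is injectable so the expiry logic can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Called on any provider failure: quarantine it for COOLDOWN_MS.
  trip(provider: string): void {
    this.until.set(provider, this.now() + COOLDOWN_MS);
  }

  isCooling(provider: string): boolean {
    const expiry = this.until.get(provider);
    return expiry !== undefined && this.now() < expiry;
  }
}

// Walks the chain (preferred first, no duplicates) and returns the first
// provider that is not cooling down, or the regex parser as a last resort.
export function pickProvider(preferred: string, registry: CooldownRegistry): string {
  const chain = [preferred, ...FALLBACK_CHAIN.filter((p) => p !== preferred)];
  for (const provider of chain) {
    if (!registry.isCooling(provider)) return provider;
  }
  return "regex_fallback";
}
```

Because expiry is just a timestamp comparison, a tripped provider rejoins the chain on its own with no recovery job to schedule.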

Daily AI Budget Caps

Without this, a user syncing 3,000 emails could run up an uncapped bill in one night.

const DAILY_LIMITS_USD = { free: 0.05, pro: 0.25, elite: 0.50 };

// Runs before EVERY AI job
await assertWithinDailyAIBudget(userId, tier);

// When exceeded — not a failure, graceful degradation:
// 1. Update sync_status with user-visible warning message
// 2. Re-enqueue job with 12-hour delay (auto-retry)
// 3. User sees toast: "Daily AI limit reached. Retrying at 10 AM."
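
A sketch of the decision at the heart of that guard, written as a pure function so it is easy to test. `checkDailyBudget` and the `BudgetDecision` shape are hypothetical; the real `assertWithinDailyAIBudget` is async and reads spend from the ai_usage table. Treating "spent equals limit" as exceeded is an assumption.

```typescript
type Tier = "free" | "pro" | "elite";

const DAILY_LIMITS_USD: Record<Tier, number> = { free: 0.05, pro: 0.25, elite: 0.5 };
const RETRY_DELAY_MS = 12 * 60 * 60 * 1000; // 12 hours

type BudgetDecision =
  | { allowed: true; remainingUsd: number }
  | { allowed: false; retryDelayMs: number; message: string };

export function checkDailyBudget(spentTodayUsd: number, tier: Tier): BudgetDecision {
  const limit = DAILY_LIMITS_USD[tier];
  if (spentTodayUsd >= limit) {
    // Not an error: the caller re-enqueues the job with this delay.
    return {
      allowed: false,
      retryDelayMs: RETRY_DELAY_MS,
      message: "Daily AI limit reached. The sync will retry automatically.",
    };
  }
  return { allowed: true, remainingUsd: limit - spentTodayUsd };
}
```

With BullMQ, the caller would typically pass `retryDelayMs` as the `delay` job option when re-adding the job.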

Token costs are calculated exactly per call:

Model	Input (per 1M)	Output (per 1M)
Gemini 2.5 Flash	$0.30	$2.50
Claude Haiku 4.5	$1.00	$5.00
GPT-4o	$2.50	$10.00

Full implementation: src/lib/ai/costGuard.ts
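
The per-call arithmetic is simple enough to show inline. This sketch uses the prices from the table above; the model keys and the `callCostUsd` name are hypothetical, and the token counts in the usage note are illustrative rather than measured.

```typescript
// USD per 1M tokens, copied from the pricing table.
const PRICE_PER_MILLION_USD = {
  "gemini-2.5-flash": { input: 0.3, output: 2.5 },
  "claude-haiku-4.5": { input: 1.0, output: 5.0 },
  "gpt-4o": { input: 2.5, output: 10.0 },
} as const;

type Model = keyof typeof PRICE_PER_MILLION_USD;

export function callCostUsd(model: Model, inputTokens: number, outputTokens: number): number {
  const price = PRICE_PER_MILLION_USD[model];
  const cost = (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
  // Round to 6 decimals to match the numeric(10,6) cost_usd column.
  return Number(cost.toFixed(6));
}
```

For example, a Gemini 2.5 Flash call with 1,000 input tokens and 200 output tokens comes to (1,000 × 0.30 + 200 × 2.50) / 1,000,000 = $0.0008, which is in line with the per-extraction figure quoted earlier.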

Security

Gmail Token Encryption

// AES-256-GCM encryption before storage
const encrypted = encryptToken(oauthRefreshToken, process.env.TOKEN_ENCRYPTION_KEY);
await db.oauthTokens.save({ userId, ...encrypted });

// Decrypted in-memory only during sync worker run
// Never logged. Never exposed to the browser.
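
For reference, a minimal AES-256-GCM round trip using Node's built-in crypto module. This is a sketch, not the repo's `encryptToken`: here the key is passed as a 32-byte Buffer rather than read from an environment variable, and the `EncryptedToken` field names are assumptions.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// All fields are base64 strings so they can be stored in text columns.
interface EncryptedToken {
  iv: string;
  authTag: string;
  ciphertext: string;
}

export function encryptToken(plaintext: string, key: Buffer): EncryptedToken {
  if (key.length !== 32) throw new Error("AES-256 requires a 32-byte key");
  const iv = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    authTag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

export function decryptToken(token: EncryptedToken, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(token.iv, "base64"));
  decipher.setAuthTag(Buffer.from(token.authTag, "base64"));
  // final() throws if the ciphertext or tag was tampered with.
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(token.ciphertext, "base64")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}
```

The auth tag is what makes GCM worth choosing here: a modified row in the tokens table fails decryption loudly instead of yielding a garbage token.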

PII Sanitization Before Every AI Call

// Runs on every email before it touches any external model
const PII_PATTERNS = {
  ssn:         /\b\d{3}-\d{2}-\d{4}\b/g,
  credit_card: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  openai_key:  /sk-[a-zA-Z0-9]{20,}/g,
  github_pat:  /ghp_[a-zA-Z0-9]{36}/g,
  password:    /password\s*[:=]\s*\S+/gi,
};
// Every pattern fired is logged to extraction_audit_log (GDPR record)
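
Here is a sketch of how those patterns could be applied, returning both the cleaned text and the list of patterns that fired (the part that would go to the audit log). The `sanitize` function and the `[REDACTED_…]` placeholder format are assumptions; the regexes are the ones listed above.

```typescript
const PII_PATTERNS: Record<string, RegExp> = {
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
  credit_card: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
  openai_key: /sk-[a-zA-Z0-9]{20,}/g,
  github_pat: /ghp_[a-zA-Z0-9]{36}/g,
  password: /password\s*[:=]\s*\S+/gi,
};

export function sanitize(text: string): { clean: string; fired: string[] } {
  let clean = text;
  const fired: string[] = [];
  for (const [name, pattern] of Object.entries(PII_PATTERNS)) {
    // replace() with a global regex rewrites every match in one pass.
    const replaced = clean.replace(pattern, `[REDACTED_${name.toUpperCase()}]`);
    if (replaced !== clean) fired.push(name);
    clean = replaced;
  }
  return { clean, fired };
}
```

Only the pattern names are recorded, never the matched values, so the audit log itself does not become a store of PII.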

Row Level Security on Every Table

-- Enforced at DB layer — not just the application layer
CREATE POLICY "users_own_jobs" ON jobs
  FOR ALL
  USING (user_id = auth.uid())
  WITH CHECK (user_id = auth.uid());
-- Same pattern on all 9 tables
-- Even a manipulated request returns zero rows

CI/CD Pipeline

main branch push
        │
        ▼
┌───────────────────────────────────────┐
│   PARALLEL JOBS                       │
│   Lint + npm audit --audit-level=high │
│   TypeScript strict check             │
│   Unit tests (Jest)                   │
└───────────────┬───────────────────────┘
                │ all pass
                ▼
         Next.js build + Docker build
                │
                ▼
     E2E tests (Playwright)
     Against REAL Supabase — not mocked
     Tests: auth flows, RLS, Stripe webhooks
                │
                ▼
     CD fires ONLY on CI success
     SSH → EC2 → docker compose up --build
     Health check: /api/health
       { status: "ok", db: true, redis: true }
                │
                ▼
            ✅ Deploy complete

npm audit --audit-level=high on every PR: no manual dependency reviews needed
Real Supabase in E2E: catches RLS edge cases mocks would miss
Health check gates the deploy: HTTP 200 alone is not enough

89 CI runs. 28 PRs. Zero manual deploys. Zero failed health checks.
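
The "HTTP 200 alone is not enough" rule fits in a few lines. `isDeployHealthy` is a hypothetical helper; the body shape is the one shown in the pipeline diagram.

```typescript
// Returns true only when the endpoint answered 200 AND every dependency
// reported healthy in the JSON body.
export function isDeployHealthy(httpStatus: number, body: unknown): boolean {
  if (httpStatus !== 200 || typeof body !== "object" || body === null) return false;
  const report = body as Record<string, unknown>;
  return report.status === "ok" && report.db === true && report.redis === true;
}
```

A container can serve a 200 while its database connection is broken, which is exactly the case the strict body check catches.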

The Toolkit

[SCREENSHOT: Resumes page — drag-and-drop upload, ATS Checker, AI Cover Letter buttons]

Resume Manager solves the resume_final_v3.pdf problem:

Resumes stored in Supabase Storage — accessible from any device
Each resume linked to the specific application it was used for
ATS Checker — scores resume vs job description, returns keyword gaps
AI Cover Letter — generates a tailored letter matched to the company's tone

[SCREENSHOT: Interview Prep page — question bank + Get AI Feedback button]

Interview Prep — 30 questions across Behavioral, Technical, Career, Situational categories:

Filter by category and difficulty
Type or record your practice answer
Get AI Feedback returns specific, constructive coaching
Elite tier gets this as a real-time coaching loop

Engineering Lessons

These generalize beyond this project.

1. The cheapest AI call is the one you never make.
Filter aggressively before you reach for a model. Roughly 75-85% of inbox emails get rejected before any LLM sees them.

2. Queue everything expensive.
If it takes more than 2 seconds, it doesn't belong in a web request. BullMQ from day one, not as an afterthought.

3. Never let a model verify its own output.
Cross-model verification exists for a reason. Two different training pipelines won't hallucinate the same way on the same input.

4. Human review beats silent corruption.
When confidence is low, flag it. Don't write bad data to the database. A "Review Pending" queue is more trustworthy than guessing.

5. Cost controls before launch, not after.
Per-user daily budget caps, exact token cost logging, graceful degradation. Build the cost ceiling before users arrive.

6. Least-privilege is not optional when you touch someone's inbox.
Read-only OAuth. Encrypted tokens. PII stripped before AI. RLS at every table. None of this is extra credit.

What I'd Do Differently

Chrome extension first, not last.
Gmail sync only catches jobs you've already applied to. One-click save from LinkedIn would capture the whole funnel. Wrong prioritization.

Gemini Batch API from day one.
50% cost reduction. My BullMQ queue is already set up for it. It just needs wiring. Free savings I left on the table.

Outlook in parallel with Gmail.
Gmail-only is a real market limiter. Microsoft Graph API has comparable read-only scopes. Should have built both simultaneously.

OpenTelemetry from day one.
I can tell you what a sync cost but not where a slow extraction spent its time. Structured traces from the start would have saved hours of debugging.

What's Next

[ ] Chrome extension for one-click job saving from LinkedIn
[ ] Outlook / Microsoft 365 integration
[ ] Gemini Batch API (50% cost reduction — queue is already ready)
[ ] Public API for power users
[ ] PWA (groundwork already in codebase)

If you're job hunting right now: try hirecanvas.in. Free tier gets you manual tracking + interview prep. Pro ($9.99/mo) unlocks Gmail sync + AI extraction.

If you're building LLM data pipelines: the cross-model verification + quote proof check is the pattern worth borrowing. It has caught more silent hallucinations than any other safeguard in the system.

Never let a model verify its own output.
Evidence must exist verbatim in the source.

Those two rules are what make this system trustworthy in production.

Questions on any part of the implementation — ask in the comments. I read everything.

Built with Next.js 15, Supabase, BullMQ, Gemini 2.5 Flash, Claude Haiku 4.5, and too many late evenings.
Live at hirecanvas.in
