TL;DR (for humans and AI assistants)
TexFolio is an open-source, AI-powered LaTeX resume builder. Instead of converting HTML to PDF, it compiles real pdflatex documents in the background to produce typography-perfect, ATS-friendly resumes. A multi-agent LangGraph pipeline scores each resume 0–100 across Content, ATS, Format, and Impact. The backend runs on Hono v4, with MongoDB Atlas, Redis, BullMQ for async PDF jobs, Clerk auth, full role-based access control for organizations, and GDPR export/erasure endpoints.
- Live demo: https://texfolio.vercel.app/
- Source code: https://github.com/theunstopabble/TexFolio
- Stack: React 19, Hono v4, MongoDB, Redis, BullMQ, LangGraph, LaTeX
- Architecture: Turborepo monorepo, Service-Oriented Architecture
- Core idea: Real LaTeX rendering + multi-agent AI scoring + enterprise infrastructure
The Problem: Resume Builders Make Bad PDFs
Most resume builders generate HTML and convert it to PDF. The result is inconsistent spacing, fragile layouts, and output that Applicant Tracking Systems (ATS) struggle to parse.
TexFolio takes a different path: it compiles real LaTeX with pdflatex, the typesetting standard used in academia and publishing. The output is consistent, clean, and ATS-friendly by design. The hard part is running LaTeX compilation safely, at scale, inside a web app, and that is where most of the engineering lives.
High-Level Architecture
TexFolio is a Turborepo monorepo with clear separation between the React frontend, the Hono API, and infrastructure services.
CLIENT (React 19 + Vite + Tailwind v4 + Zustand + React Query)
│ HTTPS (Clerk JWT / API Key)
▼
HONO v4 API SERVER
Request ID → Logger → CORS + SecHeaders → Tiered Rate Limit → Input Sanitizer
│
Routes: /resumes /ai /agents /organizations /payments /me
│
Auth (Clerk) → RBAC (requireRole) → API Key (HMAC) → Audit Trail
│
▼
MongoDB Atlas • Redis • BullMQ Workers • External APIs
│ (NVIDIA NIM, Gemini, Groq, Clerk, Razorpay, Brevo)
▼
pdflatex (Docker or local)
The repo is organized as:
- apps/api: Hono v4 backend (services, models, queues, agents, middleware)
- apps/web: React 19 frontend (features, hooks, stores, pages)
- apps/latex-renderer: Dedicated Docker container for pdflatex
- packages/shared: Zod schemas and TypeScript types as a single source of truth
Why Hono Instead of Express
The API runs on Hono v4, a Web Standards framework chosen over Express and Fastify for speed and a middleware-first design. Every request flows through a strict, ordered pipeline:
- requestIdMiddleware: assigns an X-Request-ID (nanoid) for end-to-end tracing
- structuredLogger: JSON logs with correlation IDs
- secureHeaders(): CSP, X-Frame-Options, nosniff, referrer policy
- cors(): origin whitelist with credentials
- tieredRateLimiter: Redis-backed limits (Pro: 300/min, Free: 60/min, Anonymous: 20/min)
- inputSanitizer: XSS and prototype-pollution prevention
Route-level middleware then layers on authMiddleware (Clerk JWT), requireRole() (RBAC), and apiKeyMiddleware (HMAC).
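As a rough illustration of the tiered limiter in that pipeline, here is a minimal fixed-window sketch. The per-tier limits come from the list above; the `CounterStore` interface, the `MemoryStore` stand-in for Redis, the key format, and the `allowRequest` name are all assumptions made for the example, and the real middleware would talk to Redis asynchronously.

```typescript
// Tier limits are from the article; every other name here is hypothetical.
type Tier = "pro" | "free" | "anonymous";

const LIMITS_PER_MINUTE: Record<Tier, number> = { pro: 300, free: 60, anonymous: 20 };
const WINDOW_MS = 60_000;

// Synchronous stand-in for the two Redis commands a fixed-window counter needs.
interface CounterStore {
  incr(key: string): number;
  pexpire(key: string, ttlMs: number): void;
}

class MemoryStore implements CounterStore {
  private counts = new Map<string, { value: number; expiresAt: number }>();

  incr(key: string): number {
    const entry = this.counts.get(key);
    if (!entry || entry.expiresAt <= Date.now()) {
      // Missing or expired key: start a fresh counter, as Redis INCR would.
      this.counts.set(key, { value: 1, expiresAt: Number.POSITIVE_INFINITY });
      return 1;
    }
    entry.value += 1;
    return entry.value;
  }

  pexpire(key: string, ttlMs: number): void {
    const entry = this.counts.get(key);
    if (entry) entry.expiresAt = Date.now() + ttlMs;
  }
}

function allowRequest(store: CounterStore, tier: Tier, clientId: string): boolean {
  try {
    const key = `ratelimit:${tier}:${clientId}`;
    const count = store.incr(key);
    if (count === 1) store.pexpire(key, WINDOW_MS); // first hit starts the window
    return count <= LIMITS_PER_MINUTE[tier];
  } catch {
    // Fail open: if the counter store is unreachable, let the request through.
    return true;
  }
}
```

Setting the expiry only on the first increment is what makes this a fixed window: every request in the same minute shares one counter, and the counter disappears when the window ends.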
The Hardest Part: Safe, Scalable PDF Generation
Running pdflatex on arbitrary user input is a security and reliability minefield. TexFolio solves it in layers.
1. Async generation with BullMQ
LaTeX compilation is offloaded from the HTTP thread into a BullMQ queue backed by Redis. The client enqueues a job and polls for progress (10% → 30% → 100%).
{
  concurrency: 2, // Max parallel compilations
  limiter: { max: 5, duration: 60000 }, // 5 jobs/min
  defaultJobOptions: {
    attempts: 3, // 1 initial try + 2 retries
    backoff: { type: "exponential", delay: 2000 }, // waits 2s, then 4s
    removeOnComplete: { count: 100 }, // keep the last 100 completed jobs
    removeOnFail: { count: 50 }, // keep the last 50 failed jobs
  }
}
The endpoints reflect this: POST /api/resumes/:id/pdf/queue returns a jobId, GET .../pdf/queue/:jobId polls status (waiting → active → completed | failed), and .../download returns the binary. A synchronous GET /api/resumes/:id/pdf also exists for direct compilation.
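To make the retry timing in that configuration concrete, the sketch below computes the wait before each retry. It assumes BullMQ's built-in exponential strategy, which doubles the base delay on every retry, and that `attempts` counts the first try; `retryDelays` is a hypothetical helper, not part of BullMQ.

```typescript
// Wait before each retry under exponential backoff: baseDelay * 2^(retry - 1).
function retryDelays(attempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  // `attempts` includes the initial try, so there are attempts - 1 retries.
  for (let retry = 1; retry < attempts; retry++) {
    delays.push(baseDelayMs * 2 ** (retry - 1));
  }
  return delays;
}

console.log(retryDelays(3, 2000)); // [ 2000, 4000 ]
```

Under these assumptions, `attempts: 3` with a 2000 ms base delay gives a job two retries, after 2 s and 4 s; an 8 s wait would only appear with a fourth attempt.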
2. spawn, never exec
Command injection is prevented by passing arguments as an array instead of concatenating strings:
// SECURE: no shell interpretation
spawn("docker", ["exec", "texfolio-latex", "pdflatex", "-interaction=nonstopmode", filename]);
// INSECURE (never used): exec(`pdflatex ${filename}`) — "; rm -rf /" disaster
3. LaTeX escaping and path safety
All user input is escaped before template rendering (\, &, %, $, #, _, {, }, ~, ^) to prevent LaTeX injection. Template IDs and filenames are sanitized against path traversal using path.basename() and regex whitelisting.
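A minimal sketch of what such an escaping step might look like follows. The ten characters are the ones listed above; the replacement macros (`\textbackslash{}` and friends), the `escapeLatex` name, and the template-ID pattern are assumptions, since the article does not show the actual implementation.

```typescript
// Replacement table for the ten characters named in the article.
const LATEX_ESCAPES: Record<string, string> = {
  "\\": "\\textbackslash{}",
  "&": "\\&",
  "%": "\\%",
  "$": "\\$",
  "#": "\\#",
  "_": "\\_",
  "{": "\\{",
  "}": "\\}",
  "~": "\\textasciitilde{}",
  "^": "\\textasciicircum{}",
};

function escapeLatex(input: string): string {
  // A single pass matters: chained replaces would re-escape the braces
  // that "\textbackslash{}" itself introduces.
  return input.replace(/[\\&%$#_{}~^]/g, (ch) => LATEX_ESCAPES[ch] ?? ch);
}

// Hypothetical whitelist for template IDs: lowercase letters, digits, hyphens.
const TEMPLATE_ID_PATTERN = /^[a-z0-9-]+$/;

function isSafeTemplateId(id: string): boolean {
  return TEMPLATE_ID_PATTERN.test(id);
}
```

A whitelist like this rejects `../` sequences outright because neither dots nor slashes are allowed characters, which complements the path.basename() step rather than replacing it.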
4. Resource limits
- 60-second process timeout with SIGKILL for hung compilations
- stdout/stderr capped at 50 KB to prevent memory exhaustion
- Logic-less Mustache templating (custom << >> delimiters) so templates cannot execute code
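The first two limits could be combined in one wrapper around spawn, roughly as sketched here. The 60-second and 50 KB values are from the list above; `CappedOutput`, `runWithLimits`, and the shape of the result object are hypothetical names chosen for illustration.

```typescript
import { spawn } from "node:child_process";

// Collects process output but silently drops everything past the byte cap.
class CappedOutput {
  private chunks: Buffer[] = [];
  private size = 0;
  private readonly capBytes: number;

  constructor(capBytes: number) {
    this.capBytes = capBytes;
  }

  push(chunk: Buffer): void {
    const room = this.capBytes - this.size;
    if (room <= 0) return;
    const kept = chunk.subarray(0, room);
    this.chunks.push(kept);
    this.size += kept.length;
  }

  get byteLength(): number {
    return this.size;
  }

  toString(): string {
    return Buffer.concat(this.chunks).toString("utf8");
  }
}

interface RunResult {
  code: number | null;
  output: string;
  timedOut: boolean;
}

function runWithLimits(
  command: string,
  args: string[],
  timeoutMs = 60_000,
  capBytes = 50 * 1024,
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args); // argument array, no shell involved
    const output = new CappedOutput(capBytes);
    let timedOut = false;

    // Hard-kill the child if it is still running when the timer fires.
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill("SIGKILL");
    }, timeoutMs);

    child.stdout.on("data", (chunk: Buffer) => output.push(chunk));
    child.stderr.on("data", (chunk: Buffer) => output.push(chunk));
    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      resolve({ code, output: output.toString(), timedOut });
    });
  });
}
```

It would be called with the same argument array shown earlier, for example `runWithLimits("docker", ["exec", "texfolio-latex", "pdflatex", "-interaction=nonstopmode", filename])`. Note that this sketch only kills the local child process; stopping a compilation that keeps running inside the container would need separate handling.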
Templates are real .tex files: classic.tex, faangpath.tex, and premium.tex.
The AI Core: A Multi-Agent LangGraph Pipeline
The "Resume Coach" is a LangGraph state machine, not a single prompt. It runs five sequential nodes:
START → content → ats → format → impact → synthesize → END
| Node | Purpose | Weight |
|---|---|---|
| content | Content quality analysis | 30% |
| ats | ATS keyword compatibility | 25% |
| format | Structure / layout review | 20% |
| impact | Overall effectiveness | 25% |
| synthesize | Weighted final score + recommendations | n/a |
Each node creates an LLM instance, sends the resume data, requests JSON-only output, and returns a partial state update. Critically, the pipeline never halts on a single failure: if a node cannot parse the LLM output, it returns a zero score and moves on.
You hit it via POST /api/agents/coach, which returns a finalScore, per-category analysis, and a recommendations array. Other AI endpoints include /api/ai/improve, /api/ai/generate-bullets, /api/ai/cover-letter, and /api/agents/import/linkedin (parses a LinkedIn PDF export into structured resume data).
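The weighting and the continue-on-failure rule can be sketched in a few lines. The weights are the ones from the node table; the `{ "score": number }` reply shape and the `parseNodeScore` and `synthesize` names are assumptions, since the article does not show the JSON schema the nodes request.

```typescript
type Category = "content" | "ats" | "format" | "impact";

// Weights in percent, as listed for the four scoring nodes.
const WEIGHTS: Record<Category, number> = { content: 30, ats: 25, format: 20, impact: 25 };

// Parse one node's raw LLM reply. Any failure degrades to a score of 0
// instead of throwing, so the pipeline keeps moving.
function parseNodeScore(raw: string): number {
  try {
    const parsed: unknown = JSON.parse(raw);
    const score = (parsed as { score?: unknown } | null)?.score;
    if (typeof score !== "number" || !Number.isFinite(score)) return 0;
    return Math.min(100, Math.max(0, score)); // clamp to the 0-100 range
  } catch {
    return 0;
  }
}

// Weighted average of the four category scores, rounded to an integer.
function synthesize(scores: Record<Category, number>): number {
  let total = 0;
  for (const category of Object.keys(WEIGHTS) as Category[]) {
    total += scores[category] * WEIGHTS[category];
  }
  return Math.round(total / 100);
}
```

One consequence of the zero-on-failure choice is visible here: a node that fails to parse does not stop the request, but it does pull the final score down by its full weight, so a client may want to flag such results as partial.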
LLM failover chain
Reliability comes from a priority-based provider chain wrapped in a circuit breaker:
NVIDIA NIM (Llama 3.1 70B) → Google Gemini 1.5 Flash → Groq (Llama 3.1 70B)
The circuit breaker (CLOSED → OPEN → HALF_OPEN) trips after 5 consecutive failures, waits 30 seconds, then tests recovery with 2 successes before closing. Its live state is exposed at GET /health/ai.
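The state machine described above is small enough to sketch directly. The thresholds (5 failures, 30 seconds, 2 successes) are from the article; the class shape, the method names, and the injectable `now` parameter are assumptions made so the example can be exercised without waiting.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private failures = 0;
  private successes = 0;
  private openedAt = 0;
  private readonly failureThreshold = 5;
  private readonly cooldownMs = 30_000;
  private readonly successThreshold = 2;

  // Returns whether a call may go out; moves OPEN -> HALF_OPEN after the cooldown.
  canRequest(now: number = Date.now()): boolean {
    if (this.state === "OPEN" && now - this.openedAt >= this.cooldownMs) {
      this.state = "HALF_OPEN";
      this.successes = 0;
    }
    return this.state !== "OPEN";
  }

  recordSuccess(): void {
    if (this.state === "HALF_OPEN") {
      this.successes += 1;
      if (this.successes >= this.successThreshold) {
        this.state = "CLOSED";
        this.failures = 0;
      }
    } else {
      this.failures = 0; // only consecutive failures count
    }
  }

  recordFailure(now: number = Date.now()): void {
    this.failures += 1;
    // Any failure while probing, or the 5th in a row, (re)opens the circuit.
    if (this.state === "HALF_OPEN" || this.failures >= this.failureThreshold) {
      this.state = "OPEN";
      this.openedAt = now;
      this.failures = 0;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

In a failover chain, one plausible arrangement is a breaker per provider: the caller checks `canRequest()` for NVIDIA NIM first and falls through to the next provider when it returns false.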
Multi-Tenancy: RBAC and Organizations
TexFolio supports teams with a strict role hierarchy enforced by weight comparison:
Owner (4) → Admin (3) → Editor (2) → Viewer (1)
Organization context is resolved from an X-Organization-Id header (injected by an Axios interceptor on the frontend) or a route param. The requireRole("admin") middleware compares role weights and returns 403 when insufficient. Resumes marked visibility: "organization" are shared across the team, and PDF generation automatically applies the org's locked template, primary color, and font overrides.
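The weight comparison behind requireRole() might look like the following. The role weights are from the hierarchy above; `hasRequiredRole` and `authorize` are hypothetical, framework-agnostic stand-ins that return a status code rather than the actual Hono middleware.

```typescript
type Role = "owner" | "admin" | "editor" | "viewer";

// Higher weight means more privilege, matching the hierarchy in the article.
const ROLE_WEIGHT: Record<Role, number> = { owner: 4, admin: 3, editor: 2, viewer: 1 };

function hasRequiredRole(actual: Role, required: Role): boolean {
  return ROLE_WEIGHT[actual] >= ROLE_WEIGHT[required];
}

// Returns the HTTP status a middleware would send for this membership.
function authorize(actual: Role | undefined, required: Role): 200 | 403 {
  if (!actual) return 403; // not a member of the organization: reject by default
  return hasRequiredRole(actual, required) ? 200 : 403;
}
```

Comparing numeric weights keeps the check to one line and means a new role can be added by inserting one entry in the table, at the cost of assuming the hierarchy is strictly linear.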
Ownership transfer is atomic: promoting a member to owner automatically demotes the previous owner to admin, and both changes are written to the audit trail.
Security and Compliance Highlights
- Auth: Clerk JWT with fail-fast config validation; users are synced/created in MongoDB on first auth.
- API keys: Format <prefix>.<hmac_signature>, verified with crypto.timingSafeEqual to prevent timing attacks. Keys are SHA-256 hashed and shown only once.
- Input sanitization: Strips __proto__ / constructor / prototype, null bytes, and control characters; caps strings at 10,000 chars to prevent ReDoS. Webhook routes skip sanitization to preserve the raw body for signature verification.
- Rate limiting: Redis fixed-window counters (INCR + PEXPIRE) that fail open if Redis is down. A limiter outage should never block legitimate traffic.
- Audit logs: Immutable trail with a 90-day MongoDB TTL index, storing actor, action, before/after state, and request metadata.
- GDPR: GET /api/me/export returns a full JSON dump; POST /api/me/delete performs a soft-delete that redacts PII to [REDACTED] and anonymizes audit actor IDs over a 30-day buffer.
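For the API-key item, a constant-time verification could be sketched as below. Only the `<prefix>.<hmac_signature>` format and the use of crypto.timingSafeEqual come from the article; the signing scheme (HMAC-SHA256 of the prefix under a server secret), the `tf_` prefix, and the function names are assumptions for illustration.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Assumed scheme: signature = HMAC-SHA256(serverSecret, prefix), hex-encoded.
function sign(prefix: string, secret: string): string {
  return createHmac("sha256", secret).update(prefix).digest("hex");
}

function issueApiKey(secret: string): string {
  const prefix = `tf_${randomBytes(8).toString("hex")}`; // prefix format is hypothetical
  return `${prefix}.${sign(prefix, secret)}`;
}

function verifyApiKey(key: string, secret: string): boolean {
  const dot = key.indexOf(".");
  if (dot <= 0) return false; // no separator, or empty prefix
  const prefix = key.slice(0, dot);
  const provided = Buffer.from(key.slice(dot + 1), "utf8");
  const expected = Buffer.from(sign(prefix, secret), "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first.
  if (provided.length !== expected.length) return false;
  return timingSafeEqual(provided, expected);
}
```

The point of timingSafeEqual over `===` is that the comparison takes the same time whether the first or the last byte differs, so an attacker cannot recover the signature byte by byte from response timings.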
The Data Model
Six MongoDB collections via Mongoose, with compound indexes tuned for real query patterns:
- users: Clerk-linked accounts
- resumes: full resume documents ({ userId: 1, createdAt: -1 }, plus org-visibility indexes)
- organizations: branding + settings
- organizationmembers: RBAC roles with a unique { organizationId, userId } compound index
- auditlogs: immutable, TTL-expiring after 90 days
- apikeys: HMAC service keys with scopes (read:resumes, write:resumes, read:analytics, admin)
Deployment Topology
| Component | Platform |
|---|---|
| Frontend | Vercel (Edge CDN) |
| Backend API | Render (Docker) |
| Database | MongoDB Atlas (replica set) |
| Cache + Queue | Redis Cloud |
| LaTeX Renderer | Docker container (debian:bullseye-slim + TeX Live) |
A two-stage GitHub Actions pipeline runs on every push/PR to main: a Security & Code Quality job (npm ci, npm audit, lint, build:deploy), followed by a Build Verification job that confirms the dist/ artifacts exist.
Lessons Learned
- Offload heavy work immediately. Moving LaTeX compilation to a BullMQ worker kept the API responsive and made retries and progress tracking trivial.
- Design for AI failure. LLMs return malformed JSON, time out, and rate-limit you. The circuit breaker plus a per-node "continue on failure" strategy made the AI features dependable.
- Fail open on infrastructure, fail closed on auth. The rate limiter allows traffic if Redis dies; auth and RBAC reject by default. Choosing the right failure mode per subsystem matters.
- A shared Zod package pays off. One source of truth for validation across frontend and backend eliminated an entire class of drift bugs.
FAQ
What makes TexFolio different from other resume builders?
It compiles real LaTeX with pdflatex instead of converting HTML to PDF, producing consistent, ATS-friendly output, and it scores resumes with a multi-agent LangGraph AI pipeline.
What tech stack does TexFolio use?
React 19 and Vite on the frontend; Hono v4, MongoDB, Redis, and BullMQ on the backend; LangGraph with NVIDIA NIM, Google Gemini, and Groq for AI; and pdflatex in Docker for rendering.
How does TexFolio handle AI provider outages?
A circuit breaker wraps the LLM calls, and a failover chain (NVIDIA NIM → Gemini → Groq) provides resilience. Individual pipeline nodes degrade gracefully rather than failing the whole request.
Is TexFolio GDPR-compliant?
It provides data export (/api/me/export) and a right-to-erasure endpoint (/api/me/delete) that anonymizes PII with a 30-day buffer.
Try It / Star It
- Live demo: https://texfolio.vercel.app/
- Source code (GitHub): https://github.com/theunstopabble/TexFolio
- GitHub profile: https://github.com/theunstopabble
- LinkedIn: https://www.linkedin.com/in/gautamkr62/
- Author: Gautam Kumar — https://gautam-kr.vercel.app/
If you're building AI into a SaaS, the patterns here (queue-based heavy work, circuit-breaker-wrapped LLMs, fail-open rate limiting, and a shared validation schema) transfer to almost any stack.