If you've ever had to do a systematic literature review — the kind where you manually search databases, download 80 PDFs, read each one, and paste findings into a spreadsheet — you know it's one of the most brutal parts of academic research. It takes weeks, sometimes months.
I built Research Room AI (https://researchroomai.com) to eliminate that pain. You type in a research topic, and the platform finds relevant papers, downloads the full-text PDFs (open-access only), reads them cover to cover with AI, and spits out a structured, exportable table of methodology, findings, and limitations.
What It Actually Does
The core user flow is four steps:
- Define your topic — Enter your research subject + constraints
- Secure full texts — The system identifies and downloads legal open-access PDFs
- AI synthesis — An LLM reads each paper and extracts structured data
- Export & analyze — Results land in a clean dashboard; download as CSV
The hard part isn't any single step — it's making all four work together reliably at scale.
The Tech Stack
- Frontend: Next.js 15 (App Router) + Tailwind CSS
- Auth: Supabase Auth
- Database: PostgreSQL via Prisma ORM
- Queue: BullMQ on Redis
- Worker: separate Node.js service (Docker)
- AI: Groq (fast LLM inference)
- Storage: Cloudflare R2
- Payments: Paddle
- APIs: OpenAlex, Semantic Scholar, Google Scholar
The Hardest Problem: Finding and Downloading PDFs Reliably
This was the most frustrating engineering challenge. Academic papers live across hundreds of different publishers, repositories, and paywalls. My approach:
- Search OpenAlex / Semantic Scholar for papers matching the topic — these APIs return rich metadata including DOIs and, crucially, open-access PDF URLs.
- Multi-source resolution — if the primary URL fails, fall back to Unpaywall, arXiv, PubMed Central, and institutional repositories.
- Compliance guardrails — only download PDFs explicitly flagged as open-access. No paywalled content, ever.
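The fallback chain can be sketched roughly like this — a minimal sketch, where the `PdfSource` type and `resolvePdfUrl` helper are illustrative, not the actual resolver code:

```typescript
// Illustrative sketch of multi-source PDF resolution with fallbacks.
// Each source adapter returns an open-access PDF URL, or null if it
// has no copy; a throwing source just means "try the next one".
type PdfSource = (doi: string) => Promise<string | null>;

async function resolvePdfUrl(doi: string, sources: PdfSource[]): Promise<string | null> {
  for (const source of sources) {
    try {
      const url = await source(doi);
      if (url) return url; // first source with an open-access URL wins
    } catch {
      // Timeouts, 404s, and schema changes fall through to the next source
    }
  }
  return null; // no open-access copy found anywhere
}
```

In the real pipeline the sources would be the OpenAlex, Unpaywall, arXiv, and PubMed Central adapters, each normalizing its own response schema before handing back a URL.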
The PDF resolver service (worker/src/services/pdf-resolver.ts) handles retry logic, redirect chains, and content-type validation. A surprising number of "PDF links" serve HTML error pages — you have to check MIME types after download, not before.
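The post-download check can be as simple as inspecting the first bytes of the buffer — a sketch, where `looksLikePdf` is an illustrative helper rather than the actual resolver code:

```typescript
// Sketch: verify a downloaded buffer is really a PDF, not an HTML error
// page served from a "PDF link". Checks the %PDF- magic bytes rather
// than trusting the Content-Type header alone.
function looksLikePdf(buf: Uint8Array, contentType?: string): boolean {
  // Some repositories serve valid PDFs as application/octet-stream,
  // so the header is a negative signal at best.
  if (contentType?.includes("text/html")) return false;
  const magic = new TextDecoder("ascii").decode(buf.slice(0, 5));
  return magic === "%PDF-"; // every valid PDF starts with these bytes
}
```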
The Worker Architecture
The main Next.js app and the AI processing worker are fully separate services. This was the right call:
- The Next.js app stays fast and responsive — it just enqueues jobs
- The worker can be scaled independently and redeployed without touching the frontend
- Long-running AI tasks (reading a 40-page paper) don't block HTTP request cycles
Jobs flow through BullMQ queues backed by Redis. The worker picks up a job, downloads the PDF, sends the text to Groq for extraction, and writes structured results back to Postgres.
Simplified processor flow:

```typescript
async function processJob(job) {
  // Resolve and download the open-access PDF for this DOI
  const paper = await resolvePDF(job.data.doi);
  // Pull plain text out of the PDF buffer
  const text = await extractText(paper.pdfBuffer);
  // Ask the LLM to extract structured fields for this review topic
  const analysis = await groqAnalyzer.extract(text, job.data.topic);
  // Persist the structured result for the dashboard
  await prisma.paperResult.create({ data: analysis });
}
```
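Retry handling for flaky publisher hosts lives naturally in the job options rather than in the processor. A sketch using BullMQ's standard retry options — the exact values, queue name, and payload shape here are assumptions, not the app's real configuration:

```typescript
// Sketch: enqueue-side retry options so a transient failure (flaky
// publisher host, LLM timeout) retries with exponential backoff instead
// of failing the whole review. Values are illustrative.
const paperJobOpts = {
  attempts: 3,                                   // total tries per paper
  backoff: { type: "exponential", delay: 5000 }, // 5s, then 10s, then 20s
  removeOnComplete: true,                        // keep Redis tidy
};

// What that exponential schedule works out to per attempt:
function backoffDelayMs(attempt: number, baseMs = 5000): number {
  return baseMs * 2 ** (attempt - 1); // attempt 1 → 5000ms, attempt 3 → 20000ms
}

// In the real app this would be a BullMQ enqueue, roughly:
//   await paperQueue.add("process-paper", { doi, topic }, paperJobOpts);
```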
Groq's LPU inference is key here — it's fast enough that users see results streaming in within a reasonable time, rather than waiting 20 minutes.
The Database Schema Challenge
Every literature review has a different set of columns. One researcher wants sample_size, study_design, country. Another wants model_accuracy, dataset, limitations.
My solution: store extracted fields as a flexible JSON blob alongside a set of review-level column definitions the user can configure. This gives relational integrity for project-level data while keeping per-paper results flexible.
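Concretely, the split looks something like this — a sketch where the type names and fields are illustrative, not the actual Prisma schema:

```typescript
// Sketch of the flexible-columns model: the review defines its columns,
// each paper stores extracted values as a JSON blob, and export flattens
// the two into a CSV row. Names are illustrative, not the real schema.
interface ColumnDef {
  key: string;   // e.g. "sample_size"
  label: string; // e.g. "Sample size"
}

interface PaperResult {
  title: string;
  fields: Record<string, string | number | null>; // the per-paper JSON blob
}

function toCsvRow(columns: ColumnDef[], paper: PaperResult): string[] {
  // Fields the LLM couldn't extract export as empty cells
  // rather than breaking the table shape.
  return [paper.title, ...columns.map((c) => String(paper.fields[c.key] ?? ""))];
}
```

Because the column definitions live at the review level, two projects can have completely different tables while sharing the same relational schema underneath.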
The Subscription Model
Free tier gets 3 review projects with a 30-day trial window. Premium unlocks unlimited reviews:
| Plan | Price |
|---|---|
| Free | $0 — 3 reviews |
| Premium Monthly | $19/month |
| Premium Yearly | $149/year ($12.42/mo) |
I used Paddle for billing because it handles global VAT/tax compliance out of the box, which would otherwise be a compliance nightmare for a solo founder selling to universities worldwide.
Lessons Learned
Decouple your AI work from your web server immediately.
I initially processed PDFs in a Next.js API route. The first time a user uploaded a 200-page paper, the request timed out. Move to an async queue from day one.

Academic APIs are inconsistent — build defensive parsers.
OpenAlex returns null for fields you'd expect to always have values. Semantic Scholar uses a different schema entirely. Write adapters for each source and never trust a field to always exist.

Rate limiting is not optional.
Without per-user rate limits on the processing queue, a single determined user could burn through thousands of API credits in minutes. BullMQ's job throttling saved me here.

Full-text > abstract for quality extraction.
Early versions only sent abstracts to the LLM. The quality of extracted methodology was poor. Sending the full paper text (chunked for context window limits) dramatically improved accuracy.

Stripe vs. Paddle — for research/academic niches, Paddle wins.
Universities and research institutions are often in the EU, UK, or APAC. Paddle being the Merchant of Record means they handle VAT calculation and invoice compliance, which academics often need for expense reimbursement.
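The chunking mentioned above — splitting full paper text to fit the model's context window — can be sketched as a simple overlap-aware splitter. The chunk size and overlap values here are illustrative, not the production settings:

```typescript
// Sketch: split extracted paper text into overlapping chunks that fit
// inside the LLM's context window. Sizes are illustrative defaults.
function chunkText(text: string, maxChars = 12000, overlap = 500): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break;
    // The overlap keeps a methodology paragraph from being cut
    // mid-sentence at a chunk boundary.
    start += maxChars - overlap;
  }
  return chunks;
}
```

Each chunk then goes to the extraction prompt separately, and the per-chunk results are merged into one structured record per paper.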
Try It
If you do any kind of research — academic, market, scientific — give it a shot: the free tier gets you 3 full literature reviews with no credit card required.
Happy to answer questions about any part of the architecture in the comments. Building AI tooling for academia is an underexplored niche with real pain to solve — the manual review process genuinely hasn't changed since the 1990s.

Top comments (1)
Really solid architecture. The multi-source PDF resolution pipeline is where most academic tools fall apart, so the fact that you built fallbacks through OpenAlex, Unpaywall, arXiv, and PubMed Central shows you actually understand the problem space. Using Groq for inference is a smart call too — the latency difference matters a lot when you are processing dozens of papers in a batch. One thing I would be curious about: how do you handle papers where the methodology section is buried in supplementary materials or appendices? That is a common issue in biomedical literature. Great work shipping this as a full product with Paddle integration and everything.