If you've ever had to do a systematic literature review — the kind where you manually search databases, download 80 PDFs, read each one, and paste findings into a spreadsheet — you know it's one of the most brutal parts of academic research. It takes weeks, sometimes months.
I built Research Room AI (https://researchroomai.com) to eliminate that pain. You type in a research topic, and the platform finds relevant papers, downloads the full-text PDFs (open-access only), reads them cover to cover with AI, and spits out a structured, exportable table of methodology, findings, and limitations.
What It Actually Does
The core user flow is four steps:
- Define your topic — Enter your research subject + constraints
- Secure full texts — The system identifies and downloads legal open-access PDFs
- AI synthesis — An LLM reads each paper and extracts structured data
- Export & analyze — Results land in a clean dashboard; download as CSV
The hard part isn't any single step — it's making all four work together reliably at scale.
The Tech Stack
- Frontend: Next.js 15 (App Router) + Tailwind CSS
- Auth: Supabase Auth
- Database: PostgreSQL via Prisma ORM
- Queue: BullMQ on Redis
- Worker: separate Node.js service (Docker)
- AI: Groq (fast LLM inference)
- Storage: Cloudflare R2
- Payments: Paddle
- APIs: OpenAlex, Semantic Scholar, Google Scholar
The Hardest Problem: Finding and Downloading PDFs Reliably
This was the most frustrating engineering challenge. Academic papers live across hundreds of different publishers, repositories, and paywalls. My approach:
- Search OpenAlex / Semantic Scholar for papers matching the topic — these APIs return rich metadata including DOIs and, crucially, open-access PDF URLs.
- Multi-source resolution — if the primary URL fails, fall back to Unpaywall, arXiv, PubMed Central, and institutional repositories.
- Compliance guardrails — only download PDFs explicitly flagged as open-access. No paywalled content, ever.
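The fallback chain can be sketched roughly like this — a minimal sketch, where the `PdfSource` type and `resolvePdfUrl` helper are illustrative, not the actual resolver code:

```typescript
// Illustrative sketch of multi-source PDF resolution with fallbacks.
// Each source adapter returns an open-access PDF URL, or null if it
// has no copy; a throwing source just means "try the next one".
type PdfSource = (doi: string) => Promise<string | null>;

async function resolvePdfUrl(doi: string, sources: PdfSource[]): Promise<string | null> {
  for (const source of sources) {
    try {
      const url = await source(doi);
      if (url) return url; // first source with an open-access URL wins
    } catch {
      // Timeouts, 404s, and schema changes fall through to the next source
    }
  }
  return null; // no open-access copy found anywhere
}
```

In the real pipeline the sources would be the OpenAlex, Unpaywall, arXiv, and PubMed Central adapters, each normalizing its own response schema before handing back a URL.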
The PDF resolver service (worker/src/services/pdf-resolver.ts) handles retry logic, redirect chains, and content-type validation. A surprising number of "PDF links" serve HTML error pages — you have to check MIME types after download, not before.
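The post-download check can be as simple as inspecting the first bytes of the buffer — a sketch, where `looksLikePdf` is an illustrative helper rather than the actual resolver code:

```typescript
// Sketch: verify a downloaded buffer is really a PDF, not an HTML error
// page served from a "PDF link". Checks the %PDF- magic bytes rather
// than trusting the Content-Type header alone.
function looksLikePdf(buf: Uint8Array, contentType?: string): boolean {
  // Some repositories serve valid PDFs as application/octet-stream,
  // so the header is a negative signal at best.
  if (contentType?.includes("text/html")) return false;
  const magic = new TextDecoder("ascii").decode(buf.slice(0, 5));
  return magic === "%PDF-"; // every valid PDF starts with these bytes
}
```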
The Worker Architecture
The main Next.js app and the AI processing worker are fully separate services. This was the right call:
- The Next.js app stays fast and responsive — it just enqueues jobs
- The worker can be scaled independently and redeployed without touching the frontend
- Long-running AI tasks (reading a 40-page paper) don't block HTTP request cycles
Jobs flow through BullMQ queues backed by Redis. The worker picks up a job, downloads the PDF, sends the text to Groq for extraction, and writes structured results back to Postgres.
Simplified processor flow:

```typescript
async function processJob(job) {
  // Resolve and download the open-access PDF for this DOI
  const paper = await resolvePDF(job.data.doi);
  // Pull plain text out of the PDF buffer
  const text = await extractText(paper.pdfBuffer);
  // Ask the LLM to extract structured fields for this review topic
  const analysis = await groqAnalyzer.extract(text, job.data.topic);
  // Persist the structured result for the dashboard
  await prisma.paperResult.create({ data: analysis });
}
```
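Retry handling for flaky publisher hosts lives naturally in the job options rather than in the processor. A sketch using BullMQ's standard retry options — the exact values, queue name, and payload shape here are assumptions, not the app's real configuration:

```typescript
// Sketch: enqueue-side retry options so a transient failure (flaky
// publisher host, LLM timeout) retries with exponential backoff instead
// of failing the whole review. Values are illustrative.
const paperJobOpts = {
  attempts: 3,                                   // total tries per paper
  backoff: { type: "exponential", delay: 5000 }, // 5s, then 10s, then 20s
  removeOnComplete: true,                        // keep Redis tidy
};

// What that exponential schedule works out to per attempt:
function backoffDelayMs(attempt: number, baseMs = 5000): number {
  return baseMs * 2 ** (attempt - 1); // attempt 1 → 5000ms, attempt 3 → 20000ms
}

// In the real app this would be a BullMQ enqueue, roughly:
//   await paperQueue.add("process-paper", { doi, topic }, paperJobOpts);
```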
Groq's LPU inference is key here — it's fast enough that users see results streaming in within a reasonable time, rather than waiting 20 minutes.
The Database Schema Challenge
Every literature review has a different set of columns. One researcher wants sample_size, study_design, country. Another wants model_accuracy, dataset, limitations.
My solution: store extracted fields as a flexible JSON blob alongside a set of review-level column definitions the user can configure. This gives relational integrity for project-level data while keeping per-paper results flexible.
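Concretely, the split looks something like this — a sketch where the type names and fields are illustrative, not the actual Prisma schema:

```typescript
// Sketch of the flexible-columns model: the review defines its columns,
// each paper stores extracted values as a JSON blob, and export flattens
// the two into a CSV row. Names are illustrative, not the real schema.
interface ColumnDef {
  key: string;   // e.g. "sample_size"
  label: string; // e.g. "Sample size"
}

interface PaperResult {
  title: string;
  fields: Record<string, string | number | null>; // the per-paper JSON blob
}

function toCsvRow(columns: ColumnDef[], paper: PaperResult): string[] {
  // Fields the LLM couldn't extract export as empty cells
  // rather than breaking the table shape.
  return [paper.title, ...columns.map((c) => String(paper.fields[c.key] ?? ""))];
}
```

Because the column definitions live at the review level, two projects can have completely different tables while sharing the same relational schema underneath.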
The Subscription Model
Free tier gets 3 review projects with a 30-day trial window. Premium unlocks unlimited reviews:
| Plan | Price |
|---|---|
| Free | $0 — 3 reviews |
| Premium Monthly | $19/month |
| Premium Yearly | $149/year ($12.42/mo) |
I used Paddle for billing because it handles global VAT/tax compliance out of the box, which would otherwise be a compliance nightmare for a solo founder selling to universities worldwide.
Lessons Learned
Decouple your AI work from your web server immediately.
I initially processed PDFs in a Next.js API route. The first time a user uploaded a 200-page paper, the request timed out. Move to an async queue from day one.

Academic APIs are inconsistent — build defensive parsers.
OpenAlex returns null for fields you'd expect to always have values. Semantic Scholar uses a different schema entirely. Write adapters for each source and never trust a field to always exist.

Rate limiting is not optional.
Without per-user rate limits on the processing queue, a single determined user could burn through thousands of API credits in minutes. BullMQ's job throttling saved me here.

Full-text > abstract for quality extraction.
Early versions only sent abstracts to the LLM. The quality of extracted methodology was poor. Sending the full paper text (chunked for context window limits) dramatically improved accuracy.

Stripe vs. Paddle — for research/academic niches, Paddle wins.
Universities and research institutions are often in the EU, UK, or APAC. Paddle being the Merchant of Record means they handle VAT calculation and invoice compliance, which academics often need for expense reimbursement.
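The chunking mentioned above — splitting full paper text to fit the model's context window — can be sketched as a simple overlap-aware splitter. The chunk size and overlap values here are illustrative, not the production settings:

```typescript
// Sketch: split extracted paper text into overlapping chunks that fit
// inside the LLM's context window. Sizes are illustrative defaults.
function chunkText(text: string, maxChars = 12000, overlap = 500): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + maxChars));
    if (start + maxChars >= text.length) break;
    // The overlap keeps a methodology paragraph from being cut
    // mid-sentence at a chunk boundary.
    start += maxChars - overlap;
  }
  return chunks;
}
```

Each chunk then goes to the extraction prompt separately, and the per-chunk results are merged into one structured record per paper.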
Try It
If you do any kind of research — academic, market, scientific — give it a shot: the free tier gets you 3 full literature reviews with no credit card required.
Happy to answer questions about any part of the architecture in the comments. Building AI tooling for academia is an underexplored niche with real pain to solve — the manual review process genuinely hasn't changed since the 1990s.

Top comments (1)
Really solid architecture. The multi-source PDF resolution pipeline is where most academic tools fall apart, so the fact that you built fallbacks through OpenAlex, Unpaywall, arXiv, and PubMed Central shows you actually understand the problem space. Using Groq for inference is a smart call too — the latency difference matters a lot when you are processing dozens of papers in a batch. One thing I would be curious about: how do you handle papers where the methodology section is buried in supplementary materials or appendices? That is a common issue in biomedical literature. Great work shipping this as a full product with Paddle integration and everything.