A single GitHub repo summarizing every major AI conference of 2024-2026 — in 5 minutes per paper. · zhaoyang97/Paper-Notes
Why This Hits Different Right Now
We are drowning. NeurIPS 2025 alone accepted 2,301 papers. ICLR 2026 dropped another 1,567. CVPR 2026 added 1,330 more. At one paper per hour, eight hours a day, those 5,198 papers alone would take roughly 650 working days — and that is only a fraction of what one researcher, zhaoyang97, has already summarized in a single GitHub repo. That's the context in which Paper-Notes deserves serious attention.
This isn't a curated "top 10 papers of the year" listicle. This is a systematic, structured attempt to compress the entire frontier of AI research into digestible 5-minute notes, organized by conference and by research domain. With 32 stars at time of writing, almost nobody has found it yet. That's the gap.
What It Actually Does
The repo lives at zhaoyang97/Paper-Notes and publishes as a GitHub Pages site at zhaoyang97.github.io/Paper-Notes/. The structure is deceptively simple but operationally impressive.
The docs/ directory is organized along two axes simultaneously: by conference (ICLR2026/, CVPR2026/, ACL2025/, NeurIPS2025/, etc.) and by research domain within each conference. So if you want to find everything about LLM reasoning from NeurIPS 2025, you'd navigate to docs/NeurIPS2025/llm_reasoning/. That's a sane information architecture — most similar projects force you to pick one or the other.
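To make that two-axis layout concrete, here is a minimal sketch of navigating it programmatically. The `docs/<conference>/<domain>/` structure comes from the description above; the filename conventions (one `.md` file per paper, an `index.md` per folder) are my assumptions, not verified against the repo.

```python
from pathlib import Path

def list_notes(docs_root: str, conference: str, domain: str) -> list[str]:
    """List note files under docs/<conference>/<domain>/.

    Assumes one Markdown file per paper and an index.md per folder,
    which may not match the repo's actual conventions.
    """
    folder = Path(docs_root) / conference / domain
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.md") if p.name != "index.md")
```

With a local clone, `list_notes("docs", "NeurIPS2025", "llm_reasoning")` would give you the full file list for that slice of the corpus.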
The 44 research folders cover the full spectrum: from llm_reasoning/ (240 notes) and multimodal_vlm/ (825 notes) to niche domains like earth_science/ (7 notes) and signal_comm/ (37 notes). The breadth is genuinely unusual. Most curated reading lists collapse everything into five or six buckets. This one has aigc_detection/, causal_inference/, knowledge_editing/, and self_supervised/ as distinct categories.
Each note follows a consistent format: title, conference/arXiv link, domain tags, a one-sentence summary, background and motivation, core method breakdown, experimental results, and limitations analysis. That last item — explicit limitations analysis on every paper — is where this distinguishes itself from lazy summarization.
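That consistent per-note format can be expressed as a simple template. The skeleton below is a hypothetical reconstruction of the structure described above, not the repo's actual template; the section headings and field names are illustrative.

```python
# Hypothetical note skeleton mirroring the described format:
# title, link, tags, one-sentence summary, then four sections.
NOTE_TEMPLATE = """\
# {title}

**Venue/link:** {link} · **Tags:** {tags}

**One-sentence summary:** {summary}

## Background & Motivation
{motivation}

## Core Method
{method}

## Experimental Results
{results}

## Limitations
{limitations}
"""

def render_note(**fields: str) -> str:
    """Fill the skeleton; str.format raises KeyError if a section is missing."""
    return NOTE_TEMPLATE.format(**fields)
```

The point of a rigid template is exactly the limitations discipline: a missing section fails loudly instead of silently shipping a note without one.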
The Technical Architecture Worth Examining
The index.md files at each conference level serve as navigable indices — docs/CVPR2026/index.md presumably aggregates links across all 1,330 CVPR 2026 notes, organized by subdomain. The repo's listed language is Python, so the codebase likely handles generation or templating of these notes at scale — though the generator scripts aren't exposed in the README, a notable omission we'll get to.
The publication layer is MkDocs or a similar static site generator pointed at the docs/ tree, with docs/index.md serving as the root landing page that aggregates across all conferences. This is the right call — GitHub Pages rendering means zero infrastructure cost and instant global CDN.
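A conference-level index of that kind could be regenerated with a few lines of Python. This is a guess at how such an aggregator might work, under the same assumed one-file-per-paper layout; the repo's real tooling isn't public.

```python
from pathlib import Path

def build_conference_index(conf_dir: str) -> str:
    """Render an index.md body linking every note, grouped by domain folder."""
    conf = Path(conf_dir)
    lines = [f"# {conf.name}", ""]
    for domain in sorted(p for p in conf.iterdir() if p.is_dir()):
        notes = sorted(n for n in domain.glob("*.md") if n.name != "index.md")
        lines.append(f"## {domain.name} ({len(notes)} notes)")
        lines += [f"- [{n.stem}]({domain.name}/{n.name})" for n in notes]
        lines.append("")
    return "\n".join(lines)
```

Regenerating indices from the file tree, rather than maintaining them by hand, is the only sane way to keep 44 domain folders consistent at this volume.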
Coverage numbers that stand out: image_generation/ at 1,018 notes, medical_imaging/ at 597, model_compression/ at 503, and reinforcement_learning/ at 454. These are the domains with the most published research right now, and the note counts roughly track publication volume — which suggests the coverage isn't cherry-picked.
The others/ category at 717 notes is the honest admission that taxonomy is hard. I'd rather see 717 papers in an overflow bucket than have them shoehorned into ill-fitting categories.
The Honest Critical Take
Let me be direct about where this breaks down, because credibility matters more than cheerleading.
First: the generation question is unresolved. Ten thousand structured paper notes with consistent formatting, covering conferences that concluded months ago, is a suspicious volume for manual work. The README doesn't address methodology — are these human-written summaries, LLM-generated from abstracts, or some hybrid? For a resource you'd use to make research decisions, that provenance matters enormously. A note that says "core method: we propose a novel attention mechanism" is useless if it's just rephrased abstract text.
Second: 32 stars on a 10,000-note corpus is a red flag worth examining. Either this just launched (the last push was April 14, 2026, which is very recent), or the community has seen it and quietly moved on. The gap between the stated scope and the current engagement warrants skepticism.
Third: the license is CC BY-NC-SA 4.0, which means you can't use this content commercially. If you're building a product, a research tool, or anything monetized on top of these notes, you're in murky territory immediately.
Fourth: it's in Chinese. The description, README headers, and presumably the notes themselves are written in Chinese. For non-Chinese-reading developers, the GitHub Pages site may require machine translation — which adds friction and ironically makes it less immediately useful for a global audience despite covering globally significant research.
Who should NOT use this: Anyone who needs to deeply understand a paper before citing it in their own work. Summaries at this scale are entry points, not replacements. Also anyone building commercial tooling on top of the content.
The Verdict
Despite the caveats, Paper-Notes fills a real and painful gap. The AI research surface area has outpaced any individual researcher's ability to monitor it. A structured, domain-indexed, conference-organized corpus of 10,000+ notes — whatever their exact provenance — gives you a map of the territory before you dive into any specific paper.
The right use case is triage and discovery: you're working on RAG, you want to know what NeurIPS 2025 contributed to the space, you check docs/NeurIPS2025/information_retrieval/, get a lay of the land in 20 minutes, then go read the three papers that seem most relevant in full. That workflow is genuinely valuable.
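That triage loop is easy to script against a local clone. The sketch below pulls the first top-level heading out of each note as a stand-in for the paper title; the heading convention is an assumption about the note files, not something documented by the repo.

```python
from pathlib import Path

def triage_titles(domain_dir: str, limit: int = 20) -> list[str]:
    """Collect up to `limit` first-level headings, one per note file."""
    titles: list[str] = []
    for path in sorted(Path(domain_dir).glob("*.md")):
        if path.name == "index.md":  # skip the folder's own index
            continue
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("# "):
                titles.append(line[2:].strip())
                break
        if len(titles) >= limit:
            break
    return titles
```

Something like `triage_titles("docs/NeurIPS2025/information_retrieval")` gives you a scannable shortlist before you commit 20 minutes to any single note.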
Try this if you are: A developer building in a domain adjacent to ML research who needs to stay current without dedicating 20 hours a week to paper reading. An AI engineer who wants to sanity-check whether a technique they're implementing has been superseded. A researcher new to a subfield who needs orientation before going deep.
Watch this repo. If the note quality holds up on inspection — go read five notes in your domain and evaluate the depth of the limitations analysis — this becomes a standard reference. If the notes read like abstract rephrasing, treat it as an index and nothing more.
Either way, someone built the map. That's more than most of us did.