DEV Community

reopio

Paper2PPT: Let AI Handle Your Academic Presentation Slides, Say Goodbye to All-Nighters!

Hello everyone! Today I want to introduce an open-source tool I recently developed: Paper2PPT (https://github.com/gejifeng/Paper2PPT).

If you are an "academic laborer" who often has to read papers and give presentations, you have probably felt this pain:

  • You just finished reading an obscure paper, haven't had time to digest it, and already have to prepare a group meeting PPT.
  • You stare at a blank PPT template, not knowing how to condense a paper of dozens of pages into a 10-minute talk.
  • You finally write the content and spend half a day on typesetting, only to find that the font is too small and the figures overflow when projected...

Wouldn't it be cool if there were an AI assistant that could read your paper, plan the outline of your talk, and even directly generate beautifully typeset PDF slides?

That is the motivation behind Paper2PPT.

What is Paper2PPT?

Simply put, Paper2PPT is a fully automated paper-to-PPT intelligent Agent.

It is not just a simple "summary tool", but an "AI producer" with a complete workflow. Hand it a paper (PDF or LaTeX source), tell it how long you want to talk (for example, 10 minutes), and leave the rest to it: reading, outlining, writing, typesetting, and polishing.

In the end, you will get a professional PDF presentation and complete LaTeX source code. This means you can manually fine-tune it at any time to ensure the final effect perfectly meets your needs.

🖼️ Showcase

Seeing is believing. Here are screenshots of real presentations generated by Paper2PPT:

  • [Screenshot] Attention Is All You Need
  • [Screenshot] IR3 (Mixed Precision GMRES)

All generated results can be found in the output/ directory.

What makes it special?

There are some PPT generation tools on the market, but Paper2PPT has made many special optimizations for academic scenarios:

  1. Flexible Input:

    • PDF: Drop it in directly, and it will automatically call tools like MinerU to extract the text and structure.
    • LaTeX Source: This is its specialty! It can directly read the paper source code and understand the deeper structure.
  2. Thinking Like a Human:
    It doesn't mechanically copy and paste the abstract into the PPT, but takes three steps:

    • Planner: Read the full text first and plan the script like a director. This page talks about the background, that page talks about the method, a comparison table is needed here, and an architecture diagram is needed there.
    • Generator: According to the script, start writing the LaTeX code for each page. It knows when to use a list, when to split columns, and when to highlight key points.
    • Refiner: This is the coolest part! It tries to compile the PPT. If it finds text "overflowing the box" or ugly typesetting, it automatically adjusts the font size and spacing, and even rewrites the content like a designer, until the slides compile without overflow.
  3. WYSIWYG LaTeX:
    The output is not a flat image but standard Beamer LaTeX code. Formulas are typeset by LaTeX itself, the layout stays professional, and you can edit every line of the source.
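
To make the hand-off from Planner to Generator a bit more concrete, here is a minimal Python sketch. It is not Paper2PPT's actual code: the SlidePlan fields and the render_frame function are hypothetical names made up for illustration, and the real Generator uses an LLM plus preset templates rather than a fixed function. The sketch only shows what "standard Beamer LaTeX code" for one planned slide could look like, with a bullet list and an optional two-column layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SlidePlan:
    """One planned slide (hypothetical structure, not Paper2PPT's real schema)."""
    title: str
    bullets: List[str] = field(default_factory=list)
    figure: Optional[str] = None  # image path; when set, use a two-column layout


def render_itemize(bullets: List[str]) -> str:
    """Render bullets as a Beamer itemize list (text is assumed LaTeX-safe)."""
    items = "\n".join(f"    \\item {b}" for b in bullets)
    return f"  \\begin{{itemize}}\n{items}\n  \\end{{itemize}}"


def render_frame(plan: SlidePlan) -> str:
    """Render one SlidePlan as a Beamer frame."""
    body = render_itemize(plan.bullets)
    if plan.figure is not None:
        # Text on the left, figure on the right.
        body = (
            "  \\begin{columns}\n"
            "  \\begin{column}{0.55\\textwidth}\n"
            f"{body}\n"
            "  \\end{column}\n"
            "  \\begin{column}{0.4\\textwidth}\n"
            f"  \\includegraphics[width=\\linewidth]{{{plan.figure}}}\n"
            "  \\end{column}\n"
            "  \\end{columns}"
        )
    return f"\\begin{{frame}}{{{plan.title}}}\n{body}\n\\end{{frame}}"


if __name__ == "__main__":
    print(render_frame(SlidePlan("Method", ["Self-attention", "Multi-head attention"])))
```

Because the result is plain Beamer source, anything a sketch like this produces can still be edited by hand afterwards.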

Agent's Thinking Framework: How does it work?

To help everyone understand the internal operation mechanism of Paper2PPT more intuitively, I drew a workflow diagram of its "brain".

You can see that this is not just a linear process, but an intelligent system containing a feedback loop.

Paper2PPT Workflow

Core Process Analysis:

  1. Ingestion: Whether the input is PDF or LaTeX, it is first converted into plain text and structured data that the Agent can understand.
  2. Planning: The Planner brain goes online. Based on the speech duration (e.g., 10 minutes = 7-8 PPT slides), it designs a Slide Plan. It decides the title, core goal, and best layout (left-right columns or text-image mix?) for each page.
  3. Generation: The Generator takes over the outline, combines it with the original paper, and starts "fleshing it out". It calls preset LaTeX templates to transform boring text into exquisite code.
  4. Self-Reflection & Refinement Loop:
    • This is the most critical step. The system automatically compiles the generated code.
    • The Log Parser checks the compilation log for warnings like "Overfull \vbox" (meaning content overflow).
    • If there is an overflow, the Refiner intervenes. It analyzes the overflowing page, adopts strategies (such as simplifying text, adjusting spacing, using adjustbox scaling), and then recompiles.
    • This process loops until no page reports an overflow.
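
The feedback loop in step 4 can be sketched roughly as follows. This is an illustration under assumptions, not the project's real implementation: find_overflows and refine_until_clean are hypothetical names, the compile and refine steps are passed in as placeholder callables (in the real tool they would be a LaTeX run and the LLM-backed Refiner), and the cap on the number of rounds is my own addition so the loop always terminates. The regular expression matches the standard TeX log warnings of the form "Overfull \vbox (12.5pt too high)" and "Overfull \hbox (3.2pt too wide)".

```python
import re
from typing import Callable, List, Tuple

# TeX reports overflow as e.g. "Overfull \vbox (12.5pt too high) has occurred ..."
OVERFULL = re.compile(r"Overfull \\([hv])box \(([\d.]+)pt too (?:wide|high)\)")

Overflow = Tuple[str, float]  # (box kind, overflow amount in pt)


def find_overflows(log_text: str, min_pt: float = 1.0) -> List[Overflow]:
    """Return every Overfull warning in the log that is at least min_pt large."""
    hits = []
    for kind, amount in OVERFULL.findall(log_text):
        if float(amount) >= min_pt:
            hits.append((kind + "box", float(amount)))
    return hits


def refine_until_clean(
    source: str,
    compile_fn: Callable[[str], str],
    refine_fn: Callable[[str, List[Overflow]], str],
    max_rounds: int = 5,
) -> Tuple[str, bool]:
    """Compile, inspect the log, refine, and repeat.

    compile_fn(source) returns the log text; refine_fn(source, overflows)
    returns a revised source. Both are placeholders supplied by the caller.
    Returns the final source and whether it compiled without overflow.
    """
    for _ in range(max_rounds):
        overflows = find_overflows(compile_fn(source))
        if not overflows:
            return source, True
        source = refine_fn(source, overflows)
    # Out of rounds: compile once more to report the final state honestly.
    return source, not find_overflows(compile_fn(source))
```

Ignoring overflows below a small threshold (min_pt) is a design choice in this sketch: sub-point warnings are usually invisible on a projected slide and not worth another round.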

Quick Start

Paper2PPT is completely open source and developed based on Python.

1. Installation

It is recommended to use Conda:

conda create -n paper2ppt python=3.12
conda activate paper2ppt
pip install -r requirements.txt

Note: A GPU is recommended for optimal performance (tested on a V100 with 16 GB).

2. Configure API

Create a .env file in the project root directory to configure LLM-related environment variables. You can refer to the .env.example file.

This project was tested using the DeepSeek API. The recommended configuration is as follows:

API_PROVIDER=deepseek
MODEL_NAME=deepseek-chat
LLM_API_KEY=your_api_key_here
LLM_API_BASE=https://api.deepseek.com
MAX_OUTPUT_TOKENS=6000
PDF_PARSE_METHOD=auto

Don't forget to install a LaTeX environment (such as TeX Live) because we need it to compile the final PDF.
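
Before the first run, a small preflight check can catch the two most common setup mistakes: an unfinished .env file and a missing LaTeX installation. The script below is a sketch that is not part of Paper2PPT. The function names are hypothetical, treating these four keys as required is my assumption based on the example configuration above, and since I don't state here which LaTeX engine the project calls, it simply looks for any of the common ones on PATH.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# Assumption: these keys from the example configuration are the essential ones.
REQUIRED_KEYS = ("API_PROVIDER", "MODEL_NAME", "LLM_API_KEY", "LLM_API_BASE")


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight(env_path: str = ".env") -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    path = Path(env_path)
    if not path.is_file():
        problems.append(f"{env_path} not found (copy .env.example and fill it in)")
    else:
        env = parse_env(path.read_text(encoding="utf-8"))
        for key in REQUIRED_KEYS:
            if not env.get(key) or env[key] == "your_api_key_here":
                problems.append(f"{key} is missing or still a placeholder")
    # Which engine the project invokes is an assumption; these are common ones.
    if not any(shutil.which(e) for e in ("pdflatex", "xelatex", "lualatex")):
        problems.append("no LaTeX engine found on PATH (install TeX Live or similar)")
    return problems
```

Calling preflight() from the project root and printing each returned string gives a quick checklist of what is still missing.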

3. Run

Put your paper in the paper/pdf or paper/tex directory, then run:

python main.py

4. Interaction

The terminal will list all found papers. You just need to choose which one to convert, enter the speech duration, and then watch it "think" and "type" quickly on the screen.

Future Plans (Todo List)

Although it is usable now, I still have many ideas to implement:

  • [ ] Automatic Chart Generation: Current PPTs are still text-heavy. In the future, I want it to understand data and automatically draw academic-style charts.
  • [ ] AI Drawing: Integrate Stable Diffusion or Midjourney to automatically add high-end illustrations to PPTs.
  • [ ] More Themes: Currently there is only one template. More styles will be added in the future, and even custom templates will be supported.
  • [ ] GUI: Build a nice graphical interface so you don't have to type commands anymore.
  • [ ] Windows Support: It currently runs mainly on Linux/Mac; Windows adaptation is on the way.

Final Words

Paper2PPT is currently in the Beta stage. It can run through most of the process, but there is definitely plenty left to optimize, for example more template styles and stronger image reading capabilities.

If you are also a fan of AI + academic tools, you are welcome to submit Issues or PRs on GitHub. Let's make it better together!


Project Address: https://github.com/gejifeng/Paper2PPT
