🛠️ pdf_splitter: Split PDF into markdown chapters

#pdf #markdown #converter

PDF Splitter is a command-line utility that splits a long PDF into markdown files. It reads the PDF with PyPDF2, extracts the text, and writes each chapter to its own markdown file, named after that chapter. It takes two arguments, the input PDF file and the output directory, plus an optional --chapter-names list. Note that the script below treats every page as one chapter. If no chapter names are given, it falls back to numbered names (Chapter 1, Chapter 2, and so on) rather than reading the PDF outline.

The tool is useful for breaking large documents such as eBooks, technical documents, and academic papers into smaller, more manageable markdown files.
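Because the output files are named after the chapter names, a name containing a path separator or another awkward character (for example "Chapter 1: Intro/Overview") could produce an invalid path. Here is a minimal sketch of one way to guard against that. The helper `safe_filename` and its `fallback` parameter are hypothetical and not part of the original tool, and the set of characters it replaces is an assumption you may want to adjust for your platform.

```python
import re


def safe_filename(chapter_name, fallback="chapter"):
    """Turn a chapter title into a file-system-friendly stem (hypothetical helper)."""
    # Replace path separators, control characters, and other characters
    # that are commonly rejected in file names with a space.
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]+', " ", chapter_name)
    # Collapse runs of whitespace, then trim surrounding spaces and dots.
    cleaned = re.sub(r"\s+", " ", cleaned).strip(" .")
    # Fall back to a generic stem if nothing usable is left.
    return cleaned or fallback


print(safe_filename("Chapter 1: Intro/Overview"))
```

The result could be used when building the output path, while the original chapter name is still written as the markdown heading.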

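import argparse
import os

from PyPDF2 import PdfReader


def split_pdf(pdf_file, output_dir, chapter_names=None):
    """Write one markdown file per PDF page into output_dir."""
    pdf = PdfReader(pdf_file)
    if chapter_names is None:
        # Default to numbered names, one per page.
        chapter_names = [f'Chapter {i+1}' for i in range(len(pdf.pages))]
    elif len(chapter_names) != len(pdf.pages):
        raise ValueError(
            f'Expected {len(pdf.pages)} chapter names, got {len(chapter_names)}'
        )
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)
    for name, page in zip(chapter_names, pdf.pages):
        path = os.path.join(output_dir, f'{name}.md')
        with open(path, 'w', encoding='utf-8') as f:
            # Chapter name as a top-level heading, followed by a blank line.
            f.write(f'# {name}\n\n')
            # Guard against pages where no text could be extracted.
            f.write(page.extract_text() or '')


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Split a PDF into markdown files')
    parser.add_argument('pdf_file', help='Input PDF file')
    parser.add_argument('output_dir', help='Output directory')
    parser.add_argument('--chapter-names', nargs='+', help='Chapter names, one per page')
    args = parser.parse_args()
    split_pdf(args.pdf_file, args.output_dir, args.chapter_names)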
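
The script above falls back to numbered names (Chapter 1, Chapter 2, and so on) when no chapter names are given. To take names from the PDF outline instead, one possible approach is sketched below. It assumes a recent PyPDF2 release in which `PdfReader(pdf_file).outline` returns a list of entries exposing a `title`, with nested lists holding sub-entries. The helper `outline_titles` and its `top_level_only` flag are hypothetical, and the demo uses stand-in objects rather than a real PDF.

```python
from types import SimpleNamespace


def outline_titles(outline, top_level_only=True):
    """Collect entry titles from a nested outline, in document order."""
    titles = []
    for item in outline:
        if isinstance(item, list):
            # Nested lists are assumed to hold the children of the previous entry.
            if not top_level_only:
                titles.extend(outline_titles(item, top_level_only=False))
        else:
            title = getattr(item, "title", None)
            if title:
                titles.append(str(title))
    return titles


# Stand-in data shaped like the assumed outline structure. With PyPDF2 this
# would be `PdfReader(pdf_file).outline` instead.
demo_outline = [
    SimpleNamespace(title="Introduction"),
    [SimpleNamespace(title="Background")],
    SimpleNamespace(title="Methods"),
]
print(outline_titles(demo_outline))
```

This only recovers names. Mapping each outline entry to a page range, so that a chapter spans more than one page, would need additional work that is not shown here.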

DEV Community
