DEV Community

Saravanan Muniraj

A Simple Guide to Loading an Entire PDF into a List of Documents Using LangChain

Before running the code, install the required packages by executing the following commands in your terminal:

```shell
pip install langchain_community
pip install pypdf
```
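To confirm that the installation worked before running the loader, a quick check along these lines may help. This is a minimal sketch: the helper name `missing_modules` and the `REQUIRED_MODULES` list are made up for illustration and are not part of LangChain or pypdf.

```python
from importlib.util import find_spec

# Import names of the two packages installed above.
REQUIRED_MODULES = ["langchain_community", "pypdf"]


def missing_modules(names):
    """Return the top-level module names that Python cannot find."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching `pip install` command in the same Python environment that will run the script.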
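The following script loads the PDF with PyPDFLoader, splits its text into chunks of up to 500 characters with a 50-character overlap, and prints each chunk:

```python
from langchain_community.document_loaders import PyPDFLoader
# Older LangChain releases also expose this class as langchain.text_splitter.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Path to the PDF file to load.
FILE_PATH = "c:/work/Test01.pdf"

loader = PyPDFLoader(file_path=FILE_PATH)

# Split text into chunks of up to 500 characters, overlapping by 50 characters.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)

# Load the entire PDF and split it into a list of Document chunks.
documents = loader.load_and_split(text_splitter)

# Print the text of each chunk, followed by a blank line.
for document in documents:
    print(document.page_content + "\n")
```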
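To build some intuition for what `chunk_size` and `chunk_overlap` mean, here is a simplified, pure-Python illustration. It is not the algorithm used by `RecursiveCharacterTextSplitter`, which prefers to break on separators such as paragraph breaks, newlines, and spaces and treats `chunk_size` as an upper bound. The function `split_with_overlap` is a hypothetical helper that cuts fixed-size windows, so that the overlap is easy to see.

```python
def split_with_overlap(text, chunk_size=500, chunk_overlap=50):
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Small values make the overlap visible in the output.
sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

With these small values, each printed chunk starts with the last three characters of the previous one. The same idea applies to the 500/50 setting above: the overlap keeps text near a chunk boundary present in both neighbouring chunks.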


