Pooja Patel

How to get the length (page count) of a PDF file on AWS S3 using Python?

Ever wondered how to find the length of a PDF file stored on S3, that is, how many pages it has? You are in the right place to learn.

๐Ÿ“ Prerequisite:

  • Access to AWS account

📖 Here are the simple steps to get the length (number of pages) of a PDF file on AWS S3:

  1. Log in to the AWS console (create an account if you do not have one)
  2. Look for the S3 service
  3. Create a bucket and upload a PDF file as an object
  4. Open VS Code or your favorite IDE
  5. Install boto3 and pypdf with the command below:
pip install boto3 pypdf

  6. Make sure your AWS credentials are configured in ~/.aws/credentials, via environment variables, or via IAM roles
  7. Copy the Python code listed under main.py into a file named main.py and run it with the command:
python main.py
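
Optionally, and separately from main.py, the upload in step 3 and the credentials check in step 6 can also be done from Python instead of the console. The snippet below is only a rough sketch: the helper names `check_credentials` and `upload_pdf` are made up for illustration, and it assumes the bucket already exists. It relies on boto3's `get_caller_identity` (STS client) and `upload_file` (S3 client).

```python
def check_credentials(sts_client) -> str:
    """Return the ARN of the identity your configured credentials resolve to.

    Hypothetical helper: expects a boto3 STS client, i.e. boto3.client("sts").
    """
    return sts_client.get_caller_identity()["Arn"]


def upload_pdf(s3_client, local_path: str, bucket: str, key: str) -> None:
    """Upload a local file to an existing bucket under the given key.

    Hypothetical helper: expects a boto3 S3 client, i.e. boto3.client("s3").
    """
    s3_client.upload_file(local_path, bucket, key)


# Typical use (needs boto3 installed and valid AWS credentials):
#   import boto3
#   print(check_credentials(boto3.client("sts")))
#   upload_pdf(boto3.client("s3"), "sample_file.pdf",
#              "demo-bucket-14576876757", "sample_file.pdf")
```

If `check_credentials` raises an error, the credentials from step 6 are most likely not being picked up.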

main.py:

from io import BytesIO

import boto3
from pypdf import PdfReader

# S3 configuration
bucket_name = 'demo-bucket-14576876757'
pdf_key = 'sample_file.pdf'

# Create S3 client
s3 = boto3.client('s3')

# Download the PDF into memory
response = s3.get_object(Bucket=bucket_name, Key=pdf_key)
pdf_content = response['Body'].read()

# Load PDF from memory
reader = PdfReader(BytesIO(pdf_content))

# Get number of pages
num_pages = len(reader.pages)
print(f"The PDF has {num_pages} pages.")

๐Ÿ–ฅ๏ธ Output:
You should see the output as below:

✅ Yes, it is that simple!
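
As an optional next step, if you need the page count in more than one place, the logic of main.py could be wrapped in a reusable function. This is just one possible sketch, not the only way to do it: the names `count_pdf_pages`, `get_s3_pdf_page_count`, and the `page_counter` parameter are invented for illustration. Passing the S3 client in as an argument makes the function easy to try out with a stand-in client.

```python
from io import BytesIO


def count_pdf_pages(pdf_bytes: bytes) -> int:
    """Count the pages of a PDF held in memory (requires pypdf)."""
    # Imported here so this module still loads when pypdf is not installed.
    from pypdf import PdfReader

    return len(PdfReader(BytesIO(pdf_bytes)).pages)


def get_s3_pdf_page_count(s3_client, bucket: str, key: str,
                          page_counter=count_pdf_pages) -> int:
    """Download one S3 object into memory and return its page count.

    Hypothetical helper: s3_client is expected to be boto3.client("s3").
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return page_counter(response["Body"].read())


# Typical use (needs boto3, pypdf, and valid AWS credentials):
#   import boto3
#   pages = get_s3_pdf_page_count(
#       boto3.client("s3"), "demo-bucket-14576876757", "sample_file.pdf")
#   print(f"The PDF has {pages} pages.")
```

Like main.py, this reads the whole object into memory, so it is best suited to PDFs of modest size.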
