DEV Community

Cover image for Convert PDFs to JPEGs in C#: A Simple Helper for LLM Integration
David Au Yeung
David Au Yeung

Posted on

Convert PDFs to JPEGs in C#: A Simple Helper for LLM Integration

Introduction

In this guide, we'll learn how to convert PDF pages into JPEG images using C# and the Spire.PDF library. This is particularly useful when working with LLM APIs that require images in Base64 format, as you can easily convert the JPEG files into Base64 strings after extraction.

With just a few lines of code, you can seamlessly transform any PDF into high-quality JPEGs. This approach is compatible with .NET environments and is ideal for scenarios like document automation, API integrations, and more.

Step 1: Install the Required NuGet Package

First, you need to install the Spire.PDF library. Run the following command in your project directory:

dotnet add package Spire.PDF
Enter fullscreen mode Exit fullscreen mode

This library provides the tools to load PDF files and convert individual pages into images.

Step 2: Create the PdfHelper Class

Now, let's encapsulate the PDF-to-JPEG conversion logic into a reusable class. Here's the PdfHelper class:

using Spire.Pdf;
using System.Drawing;
using System.Drawing.Imaging;

namespace MyPlaygroundApp.Utils
{
    public class PdfHelper
    {
        public static void ToJPEG(string pdfPath, string outputDirectory)
        {
            // Load the PDF document
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile(pdfPath);

            // Ensure the output directory exists
            if (!Directory.Exists(outputDirectory))
            {
                Directory.CreateDirectory(outputDirectory);
            }

            // Loop through each page and save as JPEG
            for (int i = 0; i < pdf.Pages.Count; i++)
            {
                // Save the page as a Stream
                using (Stream imageStream = pdf.SaveAsImage(i))
                {
                    // Convert the Stream to an Image
                    using (Image image = Image.FromStream(imageStream))
                    {
                        // Define the output path
                        string outputPath = Path.Combine(outputDirectory, $"Page-{i + 1}.jpg");

                        // Save the image as JPEG
                        image.Save(outputPath, ImageFormat.Jpeg);
                    }
                }
            }

            // Close the PDF document
            pdf.Close();
            Console.WriteLine("PDF pages have been converted to JPEG images.");
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Step 3: Call the Helper in Your Application

Here's how you can use the PdfHelper class in your main application:

using MyPlaygroundApp.Utils;

class Program
{
    static void Main(string[] args)
    {
        string pdfPath = @"C:\Users\User\Downloads\demo.pdf"; // Path to your PDF file
        string outputDirectory = @"C:\Users\User\Downloads"; // Directory to save JPEG files

        PdfHelper.ToJPEG(pdfPath, outputDirectory);
    }
}
Enter fullscreen mode Exit fullscreen mode

Once you run the program, each page of the specified PDF will be converted into a separate JPEG file and saved in the outputDirectory.

Step 4: Run and Verify the Results

  1. Build and run your project:
dotnet run
Enter fullscreen mode Exit fullscreen mode
  1. Check the output directory. You'll see JPEG files named Page-1.jpg, Page-2.jpg, and so on.

  2. Open any of the JPEG files with an image viewer to confirm the conversion's accuracy.

Why This Helper is Useful

This helper is a lifesaver when working with APIs that only accept images in Base64 format. After converting the PDF to JPEGs, you can easily encode the images using the following snippet:

string base64Image = Convert.ToBase64String(File.ReadAllBytes("Page-1.jpg"));
Enter fullscreen mode Exit fullscreen mode

This workflow simplifies the integration of PDF content into LLM APIs, making it versatile for various use cases.

Summary

  • We used Spire.PDF, a robust library, to convert PDFs into JPEGs in C#.
  • Created a reusable PdfHelper class to handle the conversion process.
  • Demonstrated how to integrate the helper into a real-world application.

With this setup, you can easily automate the PDF-to-JPEG conversion process and prepare image-based data for LLM APIs.

Love C#!

Top comments (0)