Introduction
In this guide, we'll learn how to convert PDF pages into JPEG images using C# and the Spire.PDF library. This is particularly useful when working with LLM APIs that require images in Base64 format, as you can easily convert the JPEG files into Base64 strings after extraction.
With just a few lines of code, you can convert every page of a PDF into a JPEG file. This approach works in .NET projects and is well suited to scenarios like document automation and API integrations. Note that the code relies on System.Drawing, which on .NET 6 and later is supported only on Windows.
Step 1: Install the Required NuGet Package
First, you need to install the Spire.PDF library. Run the following command in your project directory:
dotnet add package Spire.PDF
This library provides the tools to load PDF files and convert individual pages into images.
Step 2: Create the PdfHelper Class
Now, let's encapsulate the PDF-to-JPEG conversion logic into a reusable class. Here's the PdfHelper class:
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Spire.Pdf;

namespace MyPlaygroundApp.Utils
{
    public class PdfHelper
    {
        public static void ToJPEG(string pdfPath, string outputDirectory)
        {
            // Load the PDF document
            PdfDocument pdf = new PdfDocument();
            pdf.LoadFromFile(pdfPath);

            try
            {
                // Ensure the output directory exists (CreateDirectory does nothing if it already does)
                Directory.CreateDirectory(outputDirectory);

                // Loop through each page and save it as a JPEG
                for (int i = 0; i < pdf.Pages.Count; i++)
                {
                    // Render the page to an image stream
                    using (Stream imageStream = pdf.SaveAsImage(i))
                    {
                        // Decode the stream into an Image
                        using (Image image = Image.FromStream(imageStream))
                        {
                            // Build the output path; file names are 1-based (Page-1.jpg, Page-2.jpg, ...)
                            string outputPath = Path.Combine(outputDirectory, $"Page-{i + 1}.jpg");

                            // Save the image as JPEG
                            image.Save(outputPath, ImageFormat.Jpeg);
                        }
                    }
                }
            }
            finally
            {
                // Close the PDF document even if a page fails to convert
                pdf.Close();
            }

            Console.WriteLine("PDF pages have been converted to JPEG images.");
        }
    }
}
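The helper saves each page with image.Save(outputPath, ImageFormat.Jpeg), which leaves the compression level to the encoder's default. If you need to trade file size against sharpness, System.Drawing lets you pass an explicit quality value. The following is a minimal sketch, not part of Spire.PDF: JpegWriter and SaveJpeg are hypothetical names introduced here, and the code assumes System.Drawing is available in your project (on .NET 6 and later, Windows only).

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

// Hypothetical helper (not part of Spire.PDF): saves an Image as JPEG
// with an explicit quality level instead of the encoder default.
public static class JpegWriter
{
    public static void SaveJpeg(Image image, string outputPath, long quality = 90L)
    {
        if (quality < 0 || quality > 100)
        {
            throw new ArgumentOutOfRangeException(nameof(quality), "Quality must be between 0 and 100.");
        }

        // Find the JPEG encoder among the installed image codecs
        ImageCodecInfo jpegCodec = ImageCodecInfo.GetImageEncoders()
            .First(codec => codec.FormatID == ImageFormat.Jpeg.Guid);

        using (var encoderParams = new EncoderParameters(1))
        {
            // Encoder is fully qualified to avoid a clash with System.Text.Encoder
            encoderParams.Param[0] = new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, quality);
            image.Save(outputPath, jpegCodec, encoderParams);
        }
    }
}
```

Inside the loop, the call to image.Save(outputPath, ImageFormat.Jpeg) could then be swapped for JpegWriter.SaveJpeg(image, outputPath, 90L). Lower values produce smaller files, which can matter when the Base64 payload counts against an API's request size limit.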
Step 3: Call the Helper in Your Application
Here's how you can use the PdfHelper class in your main application:
using MyPlaygroundApp.Utils;

class Program
{
    static void Main(string[] args)
    {
        string pdfPath = @"C:\Users\User\Downloads\demo.pdf"; // Path to your PDF file
        string outputDirectory = @"C:\Users\User\Downloads"; // Directory to save JPEG files

        PdfHelper.ToJPEG(pdfPath, outputDirectory);
    }
}
Once you run the program, each page of the specified PDF will be converted into a separate JPEG file and saved in the outputDirectory.
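Because the files are always named Page-1.jpg, Page-2.jpg, and so on, converting a second PDF into the same folder reuses the same file names. One way around this is to give each PDF its own subfolder named after the file. The sketch below is an illustration only; OutputPaths and ForPdf are hypothetical names, not part of the helper above or of Spire.PDF.

```csharp
using System;
using System.IO;

// Hypothetical helper: derives a per-PDF output directory so that
// pages from different PDFs do not share file names.
public static class OutputPaths
{
    public static string ForPdf(string pdfPath, string baseDirectory)
    {
        if (string.IsNullOrWhiteSpace(pdfPath))
        {
            throw new ArgumentException("A PDF path is required.", nameof(pdfPath));
        }

        // "demo.pdf" -> "demo"
        string pdfName = Path.GetFileNameWithoutExtension(pdfPath);

        // <baseDirectory>/<pdfName>
        return Path.Combine(baseDirectory, pdfName);
    }
}
```

With this in place, the call in Main could become PdfHelper.ToJPEG(pdfPath, OutputPaths.ForPdf(pdfPath, outputDirectory)), which on Windows would write the pages of demo.pdf to a demo subfolder of the output directory.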
Step 4: Run and Verify the Results
- Build and run your project:
dotnet run
- Check the output directory. You'll see JPEG files named Page-1.jpg, Page-2.jpg, and so on.
- Open any of the JPEG files with an image viewer to confirm the conversion's accuracy.
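For longer documents, opening every file by hand gets tedious. A quick programmatic sanity check is to confirm that each output file starts with the JPEG signature bytes (FF D8 FF). This is a rough sketch under that assumption; JpegCheck and its methods are hypothetical names, and the check only inspects the file header, not the rendered content.

```csharp
using System;
using System.IO;

// Hypothetical helper: checks that the generated files look like JPEGs.
public static class JpegCheck
{
    // True when the buffer starts with the JPEG start-of-image marker
    // (FF D8) followed by the first byte of the next marker (FF).
    public static bool LooksLikeJpeg(byte[] header)
    {
        return header != null
            && header.Length >= 3
            && header[0] == 0xFF
            && header[1] == 0xD8
            && header[2] == 0xFF;
    }

    // Counts the Page-*.jpg files in the directory whose first bytes
    // match the JPEG signature.
    public static int CountValidPages(string outputDirectory)
    {
        int valid = 0;
        foreach (string path in Directory.GetFiles(outputDirectory, "Page-*.jpg"))
        {
            byte[] header = new byte[3];
            using (FileStream stream = File.OpenRead(path))
            {
                int read = stream.Read(header, 0, header.Length);
                if (read == header.Length && LooksLikeJpeg(header))
                {
                    valid++;
                }
            }
        }
        return valid;
    }
}
```

Comparing the result of JpegCheck.CountValidPages(outputDirectory) with the number of pages in the PDF gives a fast signal that nothing was skipped.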
Why This Helper is Useful
This helper is a lifesaver when working with APIs that only accept images in Base64 format. After converting the PDF to JPEGs, you can easily encode the images using the following snippet:
string base64Image = Convert.ToBase64String(File.ReadAllBytes("Page-1.jpg"));
This workflow makes it straightforward to pass PDF content to LLM APIs that accept only image input.
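For a multi-page PDF you will usually want every page encoded, in page order. Directory.GetFiles does not guarantee any ordering, and a plain alphabetical sort would place Page-10.jpg before Page-2.jpg, so the sketch below sorts by the number in the file name. It also wraps each string in a data URI, a form that some APIs expect while others take the raw Base64 string, so check your API's documentation. Base64Pages and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: encodes every Page-N.jpg in a directory, in page order.
public static class Base64Pages
{
    // Extracts N from "Page-N.jpg"; unmatched names sort last.
    public static int PageNumber(string path)
    {
        const string prefix = "Page-";
        string name = Path.GetFileNameWithoutExtension(path); // e.g. "Page-12"
        if (name.StartsWith(prefix, StringComparison.Ordinal)
            && int.TryParse(name.Substring(prefix.Length), out int number))
        {
            return number;
        }
        return int.MaxValue;
    }

    // Wraps JPEG bytes in a data URI with the image/jpeg MIME type.
    public static string ToDataUri(byte[] jpegBytes)
    {
        return "data:image/jpeg;base64," + Convert.ToBase64String(jpegBytes);
    }

    // Reads all Page-*.jpg files and returns one data URI per page,
    // ordered numerically by page number.
    public static List<string> EncodeDirectory(string outputDirectory)
    {
        return Directory.GetFiles(outputDirectory, "Page-*.jpg")
            .OrderBy(path => PageNumber(path))
            .Select(path => ToDataUri(File.ReadAllBytes(path)))
            .ToList();
    }
}
```

If your API wants the raw string instead, replace the ToDataUri call with Convert.ToBase64String(File.ReadAllBytes(path)).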
Summary
- We used Spire.PDF, a robust library, to convert PDFs into JPEGs in C#.
- Created a reusable PdfHelper class to handle the conversion process.
- Demonstrated how to integrate the helper into a real-world application.
With this setup, you can easily automate the PDF-to-JPEG conversion process and prepare image-based data for LLM APIs.
Love C#!