DEV Community

Usman Aziz

Originally published at blog.groupdocs.com


Extract Images from PDF Documents using C# .NET

Portable Document Format (PDF) is a widely used document format developed by Adobe. A PDF document can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common use case, and there are multiple ways of extracting the text. Extracting images, however, is a more complex task. This article demonstrates how to extract images from a PDF document programmatically in C# using the GroupDocs.Parser for .NET API. So let's begin.

Steps to extract images from a PDF document

1. Create a new project.

2. Download GroupDocs.Parser for .NET or install it using NuGet.

3. Add the following namespaces.

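using System;                      // Console
using System.Collections.Generic;  // IEnumerable<T>
using System.Drawing;              // Image
using System.Drawing.Imaging;      // ImageFormat
using GroupDocs.Parser;
using GroupDocs.Parser.Data;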

4. Load the PDF document.

// Create an instance of Parser class
using (Parser parser = new Parser("sample.pdf"))
{
  // Your code goes here.
}

5. Extract images from the document.

// Extract images
IEnumerable<PageImageArea> images = parser.GetImages();
// Check if image extraction is supported
if (images == null)
{
  Console.WriteLine("Image extraction isn't supported");
  return;
}

6. Access each image from the collection and save it.

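// Number the output files starting from 1
int counter = 1;
// Iterate over images
foreach (PageImageArea image in images)
{
  // Decode each image and save it as a JPEG file; dispose the stream and image afterwards
  using (var stream = image.GetImageStream())
  using (Image extracted = Image.FromStream(stream))
  {
    extracted.Save(string.Format("{0}.Jpeg", counter++), ImageFormat.Jpeg);
  }
}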
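Step 6 re-encodes every extracted image as JPEG, which is lossy and drops transparency. As a hedged alternative, the sketch below keeps PNG and GIF images in their own format and falls back to JPEG for everything else. It uses only System.Drawing calls (`Image.FromStream`, `Image.RawFormat`, `Image.Save`); the class `ImageSaver` and its two methods are hypothetical names introduced for illustration and are not part of GroupDocs.Parser.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

internal static class ImageSaver
{
    // Hypothetical helper: maps a decoded image's raw format to a file extension.
    // Anything other than PNG or GIF falls back to JPEG, as in the article.
    internal static string ExtensionFor(ImageFormat rawFormat)
    {
        if (ImageFormat.Png.Equals(rawFormat)) return ".png";
        if (ImageFormat.Gif.Equals(rawFormat)) return ".gif";
        return ".jpeg";
    }

    // Hypothetical helper: decodes the stream, saves it as "<baseName><extension>",
    // and returns the path that was written.
    internal static string SaveKeepingFormat(Stream imageStream, string baseName)
    {
        using (Image decoded = Image.FromStream(imageStream))
        {
            string extension = ExtensionFor(decoded.RawFormat);
            ImageFormat target = extension == ".jpeg" ? ImageFormat.Jpeg : decoded.RawFormat;
            string path = baseName + extension;
            decoded.Save(path, target);
            return path;
        }
    }
}

// Assumed usage inside the foreach loop from step 6:
//   using (Stream stream = image.GetImageStream())
//   {
//       ImageSaver.SaveKeepingFormat(stream, (counter++).ToString());
//   }
```

Whether this is worth the extra code depends on the documents being processed: for photographic content JPEG output is usually fine, while diagrams and screenshots tend to look better as PNG.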

Complete code

// Create an instance of the Parser class
using (Parser parser = new Parser("C:\\candy.pdf"))
{
    // Extract images
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if image extraction is supported
    if (images == null)
    {
        Console.WriteLine("Image extraction isn't supported");
        return;
    }

    // Number the output files starting from 1
    int counter = 1;
    // Iterate over images
    foreach (PageImageArea image in images)
    {
        // Decode each image and save it as a JPEG file; dispose the stream and image afterwards
        using (var stream = image.GetImageStream())
        using (Image extracted = Image.FromStream(stream))
        {
            extracted.Save(string.Format("{0}.Jpeg", counter++), ImageFormat.Jpeg);
        }
    }
}
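The complete code is a method body: its `return` statement and `Console` call need an enclosing method and the `using` directives from step 3. The following is a minimal sketch of a console program that wraps the same GroupDocs.Parser calls shown in this article (`Parser`, `GetImages`, `GetImageStream`). The command-line arguments, the `extracted-images` output folder, the lowercase `.jpeg` extension, and the `BuildImagePath` helper are assumptions added for illustration. Note that on .NET 6 and later, System.Drawing.Common is supported on Windows only.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

internal static class Program
{
    // Hypothetical helper: builds "<outputDir>/<index>.jpeg".
    internal static string BuildImagePath(string outputDir, int index)
    {
        return Path.Combine(outputDir, index + ".jpeg");
    }

    private static int Main(string[] args)
    {
        // Assumed defaults; pass the PDF path and output folder as arguments to override.
        string pdfPath = args.Length > 0 ? args[0] : "sample.pdf";
        string outputDir = args.Length > 1 ? args[1] : "extracted-images";
        Directory.CreateDirectory(outputDir);

        using (Parser parser = new Parser(pdfPath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                Console.WriteLine("Image extraction isn't supported");
                return 1;
            }

            int counter = 1;
            foreach (PageImageArea image in images)
            {
                // Dispose the decoded image before the stream it was read from.
                using (Stream stream = image.GetImageStream())
                using (Image decoded = Image.FromStream(stream))
                {
                    decoded.Save(BuildImagePath(outputDir, counter++), ImageFormat.Jpeg);
                }
            }

            Console.WriteLine("Saved {0} image(s) to {1}", counter - 1, outputDir);
        }

        return 0;
    }
}
```

Returning a non-zero exit code when extraction is unsupported lets a calling script detect the failure; that convention is a design choice of this sketch, not something the library requires.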

Results

PDF Document
[Image: the source PDF document]
Extracted Images
[Image: the images extracted from the document]

Cheers!
