IronSoftware

Posted on Jan 7

Extract Images from PDFs in C# (.NET 10)

#dotnet #csharp

Our legal team received contracts with embedded diagrams. We needed those diagrams as separate image files for documentation. Manual extraction via screenshot was tedious and low-quality.

Programmatic image extraction solved this. Here's how to pull embedded images from PDFs.

How Do I Extract Images from a PDF?

Use ExtractAllImages():

using IronPdf;
// Install via NuGet: Install-Package IronPdf

var pdf = PdfDocument.FromFile("document.pdf");

var images = pdf.ExtractAllImages();

for (int i = 0; i < images.Count; i++)
{
    images[i].SaveAs($"extracted-image-{i}.png");
}

This extracts every embedded image in the document and saves each one as a PNG file.

Why Extract Images from PDFs?

Documentation: Reuse diagrams, charts, logos
Analysis: Process images with computer vision
Archival: Store images separately from PDFs
Content repurposing: Use images in presentations, websites
Quality recovery: Get original high-res images instead of screenshots

I extract charts from financial reports for PowerPoint presentations.

What Image Formats Can I Extract?

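PDFs don't store images as standalone PNG or TIFF files. Each image is an object inside the PDF, compressed with one of several filters:

JPEG (DCTDecode filter; photos, lossy)
JPEG 2000 (JPXDecode filter)
Deflate-compressed pixels (FlateDecode filter; lossless, typical for graphics that started as PNG or BMP)
CCITT fax or JBIG2 (black-and-white scans, often from TIFF sources)

However the image was stored, ExtractAllImages() returns IronSoftware.Drawing.AnyBitmap objects, which you can save in any format AnyBitmap supports: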

// Requires: using IronSoftware.Drawing; (for AnyBitmap)
var images = pdf.ExtractAllImages();

foreach (var image in images)
{
    image.SaveAs($"image-{Guid.NewGuid()}.jpg", AnyBitmap.ImageFormat.Jpeg);
}

Can I Extract from Specific Pages Only?

Yes, specify page indices:

// Extract images from page 3 only (page indices are zero-based, so page 3 is index 2)
var page3Images = pdf.ExtractImagesFromPage(2);

foreach (var img in page3Images)
{
    img.SaveAs($"page3-image-{Guid.NewGuid()}.png");
}

Or loop through selected pages:

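// Process the first five pages, or fewer if the PDF is shorter
int pagesToScan = Math.Min(5, pdf.PageCount);

for (int pageNum = 0; pageNum < pagesToScan; pageNum++)
{
    var pageImages = pdf.ExtractImagesFromPage(pageNum);
    // Process images from page index pageNum
}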
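Because the API takes zero-based indices while people usually refer to pages by their printed number, an off-by-one is easy to introduce. Here is a minimal sketch of a guard. ToPageIndex is a hypothetical helper (not part of IronPDF), and in real code the page count would come from pdf.PageCount:

```csharp
using System;

// Hypothetical helper (not an IronPDF API): converts a 1-based page number,
// as a reader would say it, into the zero-based index the API expects.
static int ToPageIndex(int pageNumber, int pageCount)
{
    if (pageNumber < 1 || pageNumber > pageCount)
    {
        throw new ArgumentOutOfRangeException(
            nameof(pageNumber),
            $"Page {pageNumber} is outside 1..{pageCount}.");
    }
    return pageNumber - 1;
}

// Page 3 of a 10-page document maps to index 2.
int index = ToPageIndex(3, pageCount: 10);
Console.WriteLine(index); // prints 2
```

The result can be passed straight to ExtractImagesFromPage, and an invalid page number fails early with a readable message.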

How Do I Handle Large PDFs?

Process page-by-page to manage memory:

for (int i = 0; i < pdf.PageCount; i++)
{
    var pageImages = pdf.ExtractImagesFromPage(i);

    foreach (var img in pageImages)
    {
        img.SaveAs($"page{i}-img{Guid.NewGuid()}.png");
        img.Dispose(); // Free memory immediately
    }
}

This keeps only one page's extracted images in memory at a time, which helps avoid out-of-memory errors on documents with 1000+ pages.
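The loop above names files with GUIDs, which avoids collisions but loses the document order. If ordering matters, a small naming helper is one option. This is only a sketch: BuildImageName is a hypothetical helper, not an IronPDF API, and the padding widths are arbitrary choices:

```csharp
using System;

// Hypothetical naming helper: zero-padded, 1-based page and image numbers
// so that files sort in document order.
static string BuildImageName(int pageIndex, int imageIndex) =>
    $"page-{pageIndex + 1:D4}-img-{imageIndex + 1:D2}.png";

Console.WriteLine(BuildImageName(0, 0));   // prints page-0001-img-01.png
Console.WriteLine(BuildImageName(41, 6));  // prints page-0042-img-07.png
```

Inside the page loop you would track an image counter per page and pass it as imageIndex.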

Can I Get Image Metadata?

Yes, inspect dimensions and properties:

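var images = pdf.ExtractAllImages();

for (int i = 0; i < images.Count; i++)
{
    var img = images[i];
    Console.WriteLine($"Width: {img.Width}");
    Console.WriteLine($"Height: {img.Height}");
    Console.WriteLine($"Bits per pixel: {img.BitsPerPixel}");

    // Include the index so images with identical dimensions don't overwrite each other
    img.SaveAs($"image-{i}-{img.Width}x{img.Height}.png");
}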

Use metadata to filter or categorize extracted images.

How Do I Filter Images by Size?

var images = pdf.ExtractAllImages();

var largeImages = images.Where(img => img.Width > 800 && img.Height > 600);

foreach (var img in largeImages)
{
    img.SaveAs($"large-{Guid.NewGuid()}.png");
}

This skips small icons, logos, and decorative elements.

Can I Extract to Memory Instead of Files?

Yes, keep the AnyBitmap objects in memory and export them to byte arrays when needed:

var images = pdf.ExtractAllImages();

foreach (var img in images)
{
    // Encode as PNG straight into a byte array
    byte[] imageBytes = img.ExportBytes(AnyBitmap.ImageFormat.Png);

    // Send to API, database, or process further
}

No intermediate files are needed.

What If a PDF Has No Embedded Images?

ExtractAllImages() returns an empty collection:

var images = pdf.ExtractAllImages();

if (images.Count == 0)
{
    Console.WriteLine("No images found in PDF");
}
else
{
    Console.WriteLine($"Found {images.Count} images");
}

Text-only PDFs, and PDFs whose graphics are all vector drawings rather than raster images, return an empty collection.

How Do I Extract Vector Graphics?

Vector graphics (lines, curves, and shapes drawn with path commands) aren't stored as image objects, so image extraction won't find them. To capture them:

Rasterize the pages to images, which renders each whole page including its vector content:

// Render every page to a PNG file at 300 DPI
pdf.RasterizeToImageFiles("page-*.png", 300);

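Or use a specialized PDF library that parses the vector drawing commands.

Many PDFs embed diagrams as raster images, but diagrams exported directly from drawing or charting tools are often vector, so check a sample document before relying on image extraction alone.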

Can I Batch Extract from Multiple PDFs?

Yes, loop through files:

var pdfFiles = Directory.GetFiles("documents", "*.pdf");

foreach (var file in pdfFiles)
{
    using var pdf = PdfDocument.FromFile(file); // dispose each document when done
    var images = pdf.ExtractAllImages();

    var folder = Path.Combine("extracted", Path.GetFileNameWithoutExtension(file));
    Directory.CreateDirectory(folder);

    for (int i = 0; i < images.Count; i++)
    {
        images[i].SaveAs(Path.Combine(folder, $"image-{i}.png"));
    }
}

This processes an entire directory and organizes the output by source PDF.
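In the loop above, one corrupt or unreadable PDF throws and stops the whole batch. One way to isolate failures per file is sketched below. RunBatch is a hypothetical helper, and the lambda is a stand-in for the real extraction code (it fails on purpose for one file to show the behaviour):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: runs `extract` for each file and records failures
// instead of letting one bad PDF stop the whole batch.
static List<(string File, string Error)> RunBatch(
    IEnumerable<string> files, Action<string> extract)
{
    var failed = new List<(string File, string Error)>();
    foreach (var file in files)
    {
        try
        {
            extract(file);
        }
        catch (Exception ex)
        {
            failed.Add((file, ex.Message));
        }
    }
    return failed;
}

// Stand-in for the real extraction: throws for one file on purpose.
var failures = RunBatch(
    new[] { "a.pdf", "broken.pdf", "c.pdf" },
    path =>
    {
        if (path == "broken.pdf") throw new InvalidOperationException("cannot open");
        // Real code would open the PDF and save its images here.
    });

foreach (var (name, error) in failures)
{
    Console.WriteLine($"{name}: {error}"); // prints broken.pdf: cannot open
}
```

Collecting failures this way lets the batch finish and gives you a list of files to retry or inspect afterwards.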

How Do I Maintain Original Quality?

Extracted images keep their original pixel dimensions. Saving as PNG is lossless, so no further quality is lost (detail already discarded by JPEG compression inside the PDF is not restored):

img.SaveAs("high-quality.png"); // Lossless

For smaller files, use JPEG with quality control:

img.SaveAs("compressed.jpg", AnyBitmap.ImageFormat.Jpeg, 85); // 85% quality

Can I Extract Images from Protected PDFs?

If you have the password, yes:

var pdf = PdfDocument.FromFile("protected.pdf", "password123");

var images = pdf.ExtractAllImages();

foreach (var img in images)
{
    img.SaveAs($"extracted-{Guid.NewGuid()}.png");
}

What About Images in Forms?

Images in form fields (such as image controls) belong to the form layer and may not be returned as embedded images. Try ExtractAllImages() first, since it captures most embedded content.

If images are missing from the result, render the pages as images instead.

How Do I Process Extracted Images?

Use a library such as ImageSharp for post-processing (System.Drawing also works, but System.Drawing.Common is only supported on Windows since .NET 6):

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;

var images = pdf.ExtractAllImages();

foreach (var img in images)
{
    // Export to PNG bytes and load them into ImageSharp
    byte[] pngBytes = img.ExportBytes(AnyBitmap.ImageFormat.Png);
    using var image = Image.Load(pngBytes);

    // Apply processing (note: Resize(800, 600) does not preserve aspect ratio)
    image.Mutate(x => x.Grayscale().Resize(800, 600));

    image.Save($"processed-{Guid.NewGuid()}.png");
}

What's the Performance?

As rough guidance (timings vary with hardware, image count, and image size):

Small PDFs: ~50-200ms per document
Large PDFs: ~1-5 seconds for 100+ images
Page-by-page: Faster when you only need images from selected pages

Parallel processing can speed up batch operations:

Parallel.ForEach(pdfFiles, file =>
{
    using var pdf = PdfDocument.FromFile(file); // dispose each document when done
    var images = pdf.ExtractAllImages();
    // Save images...
});
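Each parallel worker holds a whole PDF and its extracted bitmaps in memory, so unbounded parallelism can trade speed for memory pressure. A sketch of capping concurrency with the standard ParallelOptions class follows. The file names are placeholders, the limit of 2 is an arbitrary example value, and the loop body is a stand-in for the real extraction:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var pdfPaths = new[] { "a.pdf", "b.pdf", "c.pdf", "d.pdf" }; // placeholder paths
var processed = new ConcurrentBag<string>(); // thread-safe result collection

// Cap concurrency so only two documents are in flight at once.
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

Parallel.ForEach(pdfPaths, options, path =>
{
    // Real code would open the PDF and save its images here.
    processed.Add(path);
});

Console.WriteLine($"Processed {processed.Count} of {pdfPaths.Length} files");
// prints Processed 4 of 4 files
```

A reasonable starting point is to tune the limit against the size of your largest documents rather than the number of CPU cores.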

Can I Extract Thumbnails Only?

Extract all, then filter by size:

var images = pdf.ExtractAllImages();

// Treat an image as a thumbnail only when both sides are under 300 px
var thumbnails = images.Where(img => img.Width < 300 && img.Height < 300);

foreach (var thumb in thumbnails)
{
    thumb.SaveAs($"thumb-{Guid.NewGuid()}.png");
}

How Do I Handle Duplicate Images?

PDFs sometimes embed the same image multiple times. Detect duplicates by hashing each image's encoded bytes and skipping hashes you have already seen:

var hashes = new HashSet<string>();
var images = pdf.ExtractAllImages();

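foreach (var img in images)
{
    // Hash the PNG-encoded bytes; identical pixel data yields identical hashes
    byte[] pngBytes = img.ExportBytes(AnyBitmap.ImageFormat.Png);

    var hash = Convert.ToBase64String(
        System.Security.Cryptography.MD5.HashData(pngBytes)
    );

    // HashSet.Add returns false when the hash is already present
    if (hashes.Add(hash))
    {
        img.SaveAs($"unique-{Guid.NewGuid()}.png");
    }
}

This saves only unique images. It catches exact duplicates only; a resized or recompressed copy produces a different hash.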

What About Scanned PDFs?

Scanned PDFs are essentially images of pages: each page is typically one large image. Extract them with:

var images = pdf.ExtractAllImages();

// Typically: 1 image per page for scanned documents
Console.WriteLine($"Extracted {images.Count} scanned pages");

Or rasterize for more control:

// Render every page to a PNG file at 300 DPI
pdf.RasterizeToImageFiles("scanned-page-*.png", 300);

Written by Jacob Mellor, CTO at Iron Software. Jacob created IronPDF and leads a team of 50+ engineers building .NET document processing libraries.
