Our legal team received contracts with embedded diagrams. We needed those diagrams as separate image files for documentation. Manual extraction via screenshot was tedious and low-quality.
Programmatic image extraction solved this. Here's how to pull embedded images from PDFs.
How Do I Extract Images from a PDF?
Use ExtractAllImages():
using IronPdf;
// Install via NuGet: Install-Package IronPdf
var pdf = PdfDocument.FromFile("document.pdf");
var images = pdf.ExtractAllImages();
for (int i = 0; i < images.Count; i++)
{
images[i].SaveAs($"extracted-image-{i}.png");
}
Extracts all embedded images, saves as PNG files.
Why Extract Images from PDFs?
Documentation: Reuse diagrams, charts, logos
Analysis: Process images with computer vision
Archival: Store images separately from PDFs
Content repurposing: Use images in presentations, websites
Quality recovery: Get original high-res images instead of screenshots
I extract charts from financial reports for PowerPoint presentations.
What Image Formats Can I Extract?
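PDFs don't store image files directly. They store pixel data compressed with a filter, and the common filters map roughly to familiar formats:
- JPEG (DCTDecode): photos
- JPEG 2000 (JPXDecode): photos and scans, less common
- Flate-compressed pixels (lossless, like PNG): graphics and screenshots; transparency is stored as a separate soft mask
- CCITT fax / JBIG2: black-and-white scanned documents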
ExtractAllImages() returns IronSoftware.Drawing.AnyBitmap objects, so you can save them in whichever format you need, regardless of how they were stored in the PDF:
using IronSoftware.Drawing; // AnyBitmap.ImageFormat lives here
var images = pdf.ExtractAllImages();
foreach (var image in images)
{
    image.SaveAs($"image-{Guid.NewGuid()}.jpg", AnyBitmap.ImageFormat.Jpeg);
}
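If you pass extracted images around as byte arrays rather than files, there is no extension to tell you what they contain. A minimal sketch that checks the standard signature ("magic") bytes of the four common containers; `SniffImageFormat` is a hypothetical helper, not an IronPDF API:

```csharp
using System;

// Hypothetical helper: identify an encoded image from its leading signature bytes.
static string SniffImageFormat(ReadOnlySpan<byte> data)
{
    // PNG files always begin with these eight bytes.
    ReadOnlySpan<byte> pngSignature = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    if (data.StartsWith(pngSignature)) return "png";

    // JPEG: FF D8 FF (start-of-image marker followed by another marker).
    if (data.Length >= 3 && data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF) return "jpeg";

    // TIFF: "II*\0" (little-endian) or "MM\0*" (big-endian).
    if (data.Length >= 4 &&
        ((data[0] == 0x49 && data[1] == 0x49 && data[2] == 0x2A && data[3] == 0x00) ||
         (data[0] == 0x4D && data[1] == 0x4D && data[2] == 0x00 && data[3] == 0x2A)))
        return "tiff";

    // BMP: "BM".
    if (data.Length >= 2 && data[0] == 0x42 && data[1] == 0x4D) return "bmp";

    return "unknown";
}

Console.WriteLine(SniffImageFormat(new byte[] { 0xFF, 0xD8, 0xFF, 0xE0 })); // prints "jpeg"
```

Only the first few bytes are inspected, so the check is cheap, but it identifies the container, not whether the rest of the data is intact.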
Can I Extract from Specific Pages Only?
Yes, pass a zero-based page index:
// Extract images from page 3 only (zero-based, so page 3 is index 2)
var page3Images = pdf.ExtractImagesFromPage(2);
foreach (var img in page3Images)
{
    img.SaveAs($"page3-image-{Guid.NewGuid()}.png");
}
Or loop through a range of pages, stopping early if the document is shorter:
for (int pageNum = 0; pageNum < Math.Min(5, pdf.PageCount); pageNum++)
{
    var pageImages = pdf.ExtractImagesFromPage(pageNum);
    // Process images from the first five pages (indices 0-4)
}
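When page numbers come from people (1-based, often as ranges like "1-3,7"), they need converting to the zero-based indices that ExtractImagesFromPage() takes, and clamping to the document length. A minimal sketch; `ParsePageSpec` is a hypothetical helper and uses `StringSplitOptions.TrimEntries` (.NET 5 or later):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: convert a 1-based page spec such as "1-3,7" into the
// sorted, de-duplicated, zero-based indices that ExtractImagesFromPage expects.
// Pages beyond the end of the document are silently dropped.
static List<int> ParsePageSpec(string spec, int pageCount)
{
    var indices = new SortedSet<int>();
    foreach (var part in spec.Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries))
    {
        var bounds = part.Split('-');
        int first = int.Parse(bounds[0]);
        int last = bounds.Length > 1 ? int.Parse(bounds[1]) : first;

        // Clamp to the real page range before looping.
        for (int page = Math.Max(first, 1); page <= Math.Min(last, pageCount); page++)
            indices.Add(page - 1); // 1-based page number -> 0-based index
    }
    return new List<int>(indices);
}

Console.WriteLine(string.Join(", ", ParsePageSpec("1-3, 7, 99", pageCount: 10))); // prints "0, 1, 2, 6"
```

Each returned index can then go straight into pdf.ExtractImagesFromPage(i). Validation is deliberately minimal: malformed input such as "a-b" makes int.Parse throw a FormatException.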
How Do I Handle Large PDFs?
Process page-by-page to manage memory:
for (int i = 0; i < pdf.PageCount; i++)
{
var pageImages = pdf.ExtractImagesFromPage(i);
foreach (var img in pageImages)
{
img.SaveAs($"page{i}-img{Guid.NewGuid()}.png");
img.Dispose(); // Free memory immediately
}
}
Only one page's images are held in memory at a time, which helps avoid out-of-memory errors on 1,000+ page documents.
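The loop above names files with GUIDs, which guarantees uniqueness but loses ordering and leaves duplicates behind on every rerun. A deterministic scheme is one alternative; `ImagePath` below is a hypothetical helper sketched with standard .NET only:

```csharp
using System;
using System.IO;

// Hypothetical naming helper: zero-padded, 1-based page and image numbers sort
// correctly in a file listing, and a rerun overwrites the same files instead of
// adding new GUID-named copies.
static string ImagePath(string folder, int pageIndex, int imageIndex, string extension = "png")
    => Path.Combine(folder, $"page{pageIndex + 1:D4}-img{imageIndex + 1:D2}.{extension}");

Console.WriteLine(ImagePath("extracted", pageIndex: 11, imageIndex: 2)); // extracted/page0012-img03.png on Linux and macOS
```

To use it in the page loop, switch the inner foreach to a for loop so each image's position j is available, then call img.SaveAs(ImagePath("extracted", i, j)).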
Can I Get Image Metadata?
Yes, inspect dimensions and properties:
var images = pdf.ExtractAllImages();
for (int i = 0; i < images.Count; i++)
{
    var img = images[i];
    Console.WriteLine($"Width: {img.Width}");
    Console.WriteLine($"Height: {img.Height}");
    Console.WriteLine($"Pixel Format: {img.Format}");
    // Include the index so two images with the same dimensions don't overwrite each other
    img.SaveAs($"image-{i}-{img.Width}x{img.Height}.png");
}
Use metadata to filter or categorize extracted images.
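For example, a minimal sketch that buckets images by their longest side. The thresholds and the `Categorize` helper are hypothetical, and the tuples stand in for the Width and Height values read above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical thresholds (in pixels) for bucketing images by their longest side;
// tune them for your documents.
static string Categorize(int width, int height)
{
    int longestSide = Math.Max(width, height);
    if (longestSide < 64) return "icon";
    if (longestSide < 300) return "thumbnail";
    return "full-size";
}

// Stand-in data: in practice these would be (img.Width, img.Height) from ExtractAllImages().
var sizes = new List<(int Width, int Height)> { (32, 32), (48, 20), (250, 180), (1600, 1200) };

foreach (var group in sizes.GroupBy(s => Categorize(s.Width, s.Height)))
    Console.WriteLine($"{group.Key}: {group.Count()}");
```

Bucketing on the longest side rather than area keeps a small square logo and a small wide icon in the same category.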
How Do I Filter Images by Size?
var images = pdf.ExtractAllImages();
var largeImages = images.Where(img => img.Width > 800 && img.Height > 600);
foreach (var img in largeImages)
{
img.SaveAs($"large-{Guid.NewGuid()}.png");
}
Ignore small icons, logos, or decorative elements.
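Requiring both Width > 800 and Height > 600 also drops wide-but-short content such as a 1200 x 400 timeline chart. One alternative is a minimum pixel area combined with an aspect-ratio cap that skips thin dividers and borders. A sketch with hypothetical defaults (`IsContentImage` is not an IronPDF API):

```csharp
using System;

// Hypothetical filter: keep images with enough pixels to be real content, and
// skip very elongated ones (divider lines, borders) regardless of their area.
static bool IsContentImage(int width, int height, long minArea = 300_000, double maxAspectRatio = 8.0)
{
    if (width <= 0 || height <= 0) return false;
    long area = (long)width * height;                 // long avoids int overflow on huge scans
    double aspect = (double)Math.Max(width, height) / Math.Min(width, height);
    return area >= minArea && aspect <= maxAspectRatio;
}

Console.WriteLine(IsContentImage(1200, 400)); // True: a wide chart the 800x600 rule would reject
Console.WriteLine(IsContentImage(4000, 100)); // False: a thin rule, despite its 400,000 px area
```

Plugged into the example above, that becomes images.Where(img => IsContentImage(img.Width, img.Height)).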
Can I Extract to Memory Instead of Files?
Yes, keep them as AnyBitmap objects and encode only when you need bytes:
var images = pdf.ExtractAllImages();
foreach (var img in images)
{
    // Encode to a PNG byte array
    byte[] imageBytes = img.ExportBytes(AnyBitmap.ImageFormat.Png);
    // Send to API, database, or process further
}
No intermediate files needed.
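If the destination is a web page or a JSON payload, the encoded bytes can be wrapped in a data URI. A minimal sketch; `ToDataUri` is a hypothetical helper built on Convert.ToBase64String:

```csharp
using System;

// Hypothetical helper: wrap encoded image bytes in a data URI for an HTML <img> src or a JSON field.
// The MIME type must match the encoding you chose (image/png for PNG bytes, image/jpeg for JPEG).
static string ToDataUri(byte[] imageBytes, string mimeType = "image/png")
    => $"data:{mimeType};base64,{Convert.ToBase64String(imageBytes)}";

Console.WriteLine(ToDataUri(new byte[] { 1, 2, 3 })); // prints "data:image/png;base64,AQID"
```

Base64 makes the payload about a third larger, so this suits small inline images; databases and binary APIs are better served by the raw byte array.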
What If a PDF Has No Embedded Images?
ExtractAllImages() returns an empty collection:
var images = pdf.ExtractAllImages();
if (images.Count == 0)
{
Console.WriteLine("No images found in PDF");
}
else
{
Console.WriteLine($"Found {images.Count} images");
}
Text-only PDFs or PDFs with vector graphics (not raster images) return no images.
How Do I Extract Vector Graphics?
Vector graphics (lines, curves, and shapes drawn with PDF path operators) aren't embedded images, so ExtractAllImages() won't return them. To capture them:
- Rasterize pages to images (this renders the entire page, vectors included):
// Render every page to PNG at 300 DPI; '*' is replaced with the page number
pdf.RasterizeToImageFiles("page-*.png", DPI: 300);
- Or use specialized PDF libraries that parse vector commands.
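Diagrams pasted into the source document as pictures are embedded raster images and extract normally; diagrams drawn natively in the authoring tool are usually vectors and need rasterizing.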
Can I Batch Extract from Multiple PDFs?
Yes, loop through files:
var pdfFiles = Directory.GetFiles("documents", "*.pdf");
foreach (var file in pdfFiles)
{
    using var pdf = PdfDocument.FromFile(file);
    var images = pdf.ExtractAllImages();
    // One output folder per source PDF, e.g. extracted/contract-a/
    var folder = Path.Combine("extracted", Path.GetFileNameWithoutExtension(file));
    Directory.CreateDirectory(folder);
    for (int i = 0; i < images.Count; i++)
    {
        images[i].SaveAs(Path.Combine(folder, $"image-{i}.png"));
        images[i].Dispose(); // release each bitmap once it's written
    }
}
Processes entire directories, organizes by source PDF.
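The loop above only reads the top-level folder. If you switch to Directory.GetFiles("documents", "*.pdf", SearchOption.AllDirectories), two files named report.pdf in different subfolders would share one output folder and overwrite each other's images. A minimal sketch that mirrors the input tree instead; `OutputFolderFor` is a hypothetical helper:

```csharp
using System;
using System.IO;

// Hypothetical helper: mirror each PDF's location under the input root, so
// documents/a/report.pdf and documents/b/report.pdf get separate output folders.
static string OutputFolderFor(string inputRoot, string outputRoot, string pdfPath)
{
    string relative = Path.GetRelativePath(inputRoot, pdfPath);     // a/report.pdf
    string withoutExtension = Path.ChangeExtension(relative, null); // a/report
    return Path.Combine(outputRoot, withoutExtension);              // extracted/a/report
}

foreach (var file in new[] { Path.Combine("documents", "a", "report.pdf"),
                             Path.Combine("documents", "b", "report.pdf") })
    Console.WriteLine(OutputFolderFor("documents", "extracted", file));
```

The result drops straight into the existing loop in place of the Path.Combine("extracted", ...) line.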
How Do I Maintain Original Quality?
Extracted images keep the resolution they were embedded at. Saving as PNG re-encodes them losslessly, so no further quality is lost:
img.SaveAs("high-quality.png"); // Lossless
For smaller files, use JPEG with quality control:
img.SaveAs("compressed.jpg", AnyBitmap.ImageFormat.Jpeg, 85); // 85% quality
Can I Extract Images from Protected PDFs?
If you have the password, yes:
var pdf = PdfDocument.FromFile("protected.pdf", "password123");
var images = pdf.ExtractAllImages();
foreach (var img in images)
{
img.SaveAs($"extracted-{Guid.NewGuid()}.png");
}
What About Images in Forms?
Images attached to form fields (an image button, for example) usually live in the field's appearance stream rather than in the page content, so ExtractAllImages() may not return them. Ordinary images placed on the page are captured as usual.
For complex forms, you might need to render pages as images instead.
How Do I Process Extracted Images?
Use ImageSharp (cross-platform) or System.Drawing (Windows-only on .NET 6 and later) for post-processing:
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Processing;
var images = pdf.ExtractAllImages();
foreach (var img in images)
{
// Hand the pixels to ImageSharp as an in-memory PNG
using var ms = new MemoryStream(img.ExportBytes(AnyBitmap.ImageFormat.Png));
using var image = Image.Load(ms);
// Grayscale, then resize to 800 px wide; a height of 0 keeps the aspect ratio
image.Mutate(x => x.Grayscale().Resize(800, 0));
image.Save($"processed-{Guid.NewGuid()}.png");
}
What's the Performance?
Rough figures (they vary with hardware and with the number and size of images):
Small PDFs: ~50-200 ms per document
Large PDFs: ~1-5 seconds for 100+ images
Page-by-page: faster when you only need a few pages
Parallel processing speeds up batch operations:
Parallel.ForEach(pdfFiles, file =>
{
    using var pdf = PdfDocument.FromFile(file);
    var images = pdf.ExtractAllImages();
    // Save images...
});
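Parallel.ForEach with default options places no explicit cap on how many PDFs are open at once, and each worker holds one document's extracted images in memory. Setting MaxDegreeOfParallelism bounds that, and timing the batch with a Stopwatch gives you figures for your own hardware. A sketch using standard .NET APIs only; `ProcessFile` is a hypothetical placeholder for the open/extract/save code:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real per-file work (open the PDF, extract, save).
// It sleeps briefly and pretends each file yielded two images.
static int ProcessFile(string path)
{
    Thread.Sleep(10);
    return 2;
}

// Runs ProcessFile over all files with at most maxParallel files in flight,
// and measures the wall-clock time of the whole batch.
static (int Images, TimeSpan Elapsed) RunBatch(string[] files, int maxParallel)
{
    int totalImages = 0;
    var stopwatch = Stopwatch.StartNew();
    var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

    Parallel.ForEach(files, options, file =>
    {
        int count = ProcessFile(file);
        Interlocked.Add(ref totalImages, count); // workers update the shared total atomically
    });

    stopwatch.Stop();
    return (totalImages, stopwatch.Elapsed);
}

var (images, elapsed) = RunBatch(new[] { "a.pdf", "b.pdf", "c.pdf", "d.pdf" }, maxParallel: 2);
Console.WriteLine($"{images} images in {elapsed.TotalMilliseconds:F0} ms");
```

Environment.ProcessorCount is a reasonable starting value for maxParallel; lower it if memory, not CPU, is the constraint.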
Can I Extract Thumbnails Only?
Extract all, then keep only the small ones:
var images = pdf.ExtractAllImages();
// Require both sides under 300 px; || would also match long, thin banners
var thumbnails = images.Where(img => img.Width < 300 && img.Height < 300);
foreach (var thumb in thumbnails)
{
thumb.SaveAs($"thumb-{Guid.NewGuid()}.png");
}
How Do I Handle Duplicate Images?
PDFs sometimes embed the same image multiple times. Detect duplicates by hashing each image's encoded bytes:
var hashes = new HashSet<string>();
var images = pdf.ExtractAllImages();
foreach (var img in images)
{
    byte[] bytes = img.ExportBytes(AnyBitmap.ImageFormat.Png);
    var hash = Convert.ToBase64String(
        System.Security.Cryptography.MD5.HashData(bytes)
    );
    // HashSet.Add returns false if this hash was already seen
    if (hashes.Add(hash))
    {
        // Write the bytes already encoded instead of encoding a second time
        File.WriteAllBytes($"unique-{Guid.NewGuid()}.png", bytes);
    }
}
Saves only unique images.
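The same check can be factored into a reusable helper that works on any encoded byte arrays. This sketch swaps MD5 for SHA-256 (either works for spotting duplicates; SHA-256 avoids MD5's known collision weakness if the PDFs come from untrusted sources) and needs .NET 5 or later. Hashing only catches byte-identical images: the same logo embedded at two resolutions hashes differently, and catching that would take a perceptual hash instead. `DistinctImages` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Keeps the first occurrence of each byte-identical image
// (SHA256.HashData and Convert.ToHexString require .NET 5+).
// Input: one encoded byte array per image.
static List<byte[]> DistinctImages(IEnumerable<byte[]> encodedImages)
{
    var seen = new HashSet<string>();
    var unique = new List<byte[]>();
    foreach (var bytes in encodedImages)
    {
        string hash = Convert.ToHexString(SHA256.HashData(bytes));
        if (seen.Add(hash)) // Add returns false when this hash was already seen
            unique.Add(bytes);
    }
    return unique;
}

var logo = new byte[] { 1, 2, 3 };
var sameLogoAgain = new byte[] { 1, 2, 3 };
var chart = new byte[] { 4, 5, 6 };
Console.WriteLine(DistinctImages(new[] { logo, sameLogoAgain, chart }).Count); // prints 2
```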
What About Scanned PDFs?
Scanned PDFs are essentially images of pages, usually one large image per page. Extract with:
var images = pdf.ExtractAllImages();
// Typically: 1 image per page for scanned documents
Console.WriteLine($"Extracted {images.Count} scanned pages");
Or rasterize for more control:
// Render every page to PNG at 300 DPI; '*' is replaced with the page number
pdf.RasterizeToImageFiles("scanned-page-*.png", DPI: 300);
Written by Jacob Mellor, CTO at Iron Software. Jacob created IronPDF and leads a team of 50+ engineers building .NET document processing libraries.