Portable Document Format (PDF) is a widely used document format developed by Adobe. PDF documents can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common use case, and there are multiple ways to extract their text. Extracting images from a PDF document, however, is a more complex task. This article demonstrates how easily you can extract images from PDF documents in C# using the GroupDocs.Parser for .NET API. So let's begin.
Steps to extract images from a PDF document
1. Create a new project.
2. Download GroupDocs.Parser for .NET or install it using NuGet.
3. Add the following namespaces.
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using System.Drawing;
using System.Drawing.Imaging;
4. Load the PDF document.
// Create an instance of Parser class
using (Parser parser = new Parser("sample.pdf"))
{
    // your code goes here
}
5. Extract images from the document.
// Extract images
IEnumerable<PageImageArea> images = parser.GetImages();
// Check if image extraction is supported (GetImages returns null if it isn't)
if (images == null)
{
    Console.WriteLine("Image extraction isn't supported");
    return;
}
6. Access each image from the collection and save it.
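// Iterate over images, numbering the output files from 1
int counter = 1;
foreach (PageImageArea image in images)
{
    // Decode each image and save it as a JPEG file;
    // the stream must stay open while the Image object is in use
    using (var stream = image.GetImageStream())
    using (Image img = Image.FromStream(stream))
    {
        img.Save(string.Format("{0}.jpeg", counter++), ImageFormat.Jpeg);
    }
}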
Complete code
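using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;

class Program
{
    static void Main()
    {
        // Create an instance of Parser class
        using (Parser parser = new Parser("C:\\candy.pdf"))
        {
            // Extract images
            IEnumerable<PageImageArea> images = parser.GetImages();
            // Check if image extraction is supported
            if (images == null)
            {
                Console.WriteLine("Image extraction isn't supported");
                return;
            }
            int counter = 1;
            // Iterate over images
            foreach (PageImageArea image in images)
            {
                // Decode each image and save it as a numbered JPEG file
                using (Stream stream = image.GetImageStream())
                using (Image img = Image.FromStream(stream))
                {
                    img.Save(string.Format("{0}.jpeg", counter++), ImageFormat.Jpeg);
                }
            }
        }
    }
}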
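The complete code above re-encodes every image as JPEG through System.Drawing, which drops transparency, adds compression loss, and (starting with .NET 6) is only supported on Windows. If you would rather keep each image exactly as it is stored, one option is to write the stream's bytes straight to disk. The sketch below assumes that GetImageStream() returns the image's encoded bytes; the class RawImageSaver and its methods are hypothetical helpers (not part of GroupDocs.Parser), and the signature check covers only a few common formats, falling back to ".bin" for anything else.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of GroupDocs.Parser: saves an image stream
// byte-for-byte and picks the file extension from the stream's magic bytes.
public static class RawImageSaver
{
    // Returns an extension for a few well-known signatures, or ".bin" if unrecognized.
    public static string GuessExtension(byte[] header)
    {
        if (StartsWith(header, new byte[] { 0xFF, 0xD8, 0xFF }))
            return ".jpg";
        if (StartsWith(header, new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A }))
            return ".png";
        if (StartsWith(header, new byte[] { 0x47, 0x49, 0x46, 0x38 })) // "GIF8"
            return ".gif";
        if (StartsWith(header, new byte[] { 0x42, 0x4D })) // "BM"
            return ".bmp";
        return ".bin";
    }

    // Copies the whole stream to "<baseName><ext>" and returns the path written.
    public static string Save(Stream source, string baseName)
    {
        // Buffer everything first so the header can be inspected
        // even when the source stream does not support seeking.
        using (var buffer = new MemoryStream())
        {
            source.CopyTo(buffer);
            byte[] bytes = buffer.ToArray();
            string path = baseName + GuessExtension(bytes);
            File.WriteAllBytes(path, bytes);
            return path;
        }
    }

    // True if data begins with every byte of prefix.
    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Inside the foreach loop, you could then replace the System.Drawing call with something like using (Stream stream = image.GetImageStream()) RawImageSaver.Save(stream, (counter++).ToString()); so that each file keeps its original encoding and gets a matching extension.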
Cheers!