Extracting images from a document using Aspose.Words Cloud API (C# / .NET)

asposewords

The sample code below shows a few ways to extract existing images from a DOCX, or other document file, using the Aspose.Words Cloud API.

You can connect directly and use the Aspose.Words REST API, but to keep the code simple this post uses the Aspose.Words Cloud SDK for .NET. There are also SDKs for other languages.

Before we can extract image data from a document file using the Cloud API, the file needs to be available in Cloud Storage. The sample code below assumes the document has already been uploaded. See Uploading a document to Cloud Storage (C# / .NET).

The sample code relies on having both the Storage and Words SDKs downloaded (either as code or DLLs) and referenced from the C# project.

My solution in Visual Studio looks like this:


The code below shows a sample ExtractImages method.

1) Sets up a connection to the WordsApi by passing in your AppSid and AppKey obtained from the Cloud Dashboard.

2) Creates a GetDocumentRequest and calls the GetDocument method, just to verify that the file exists in Storage and can be opened by the API.

The Console.WriteLine call that follows just shows how to access properties of the document, SourceFormat in this case.

The simple example below just uses a for loop to grab each image based on an index. If you need other information related to the image, a better approach might be to call the GetDocumentDrawingObjects method to retrieve a list of the drawing objects in the document.

3) Creates a GetDocumentDrawingObjectImageDataRequest object with the details of the image we want (based on index).

4) Calls GetDocumentDrawingObjectImageData to open a Stream to the object.

5) Saves the Stream to a file (with a unique name based on the index).

It then loops back to 3) to grab the next image.

The code will save image data as PNG files.

The sample DOCX file that I used contains a SmartArt object. The GetDocumentDrawingObjectImageData call can see it, but it is not an image, so the call generates an error and this sample code skips it.
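
The loop in steps 3) to 5), together with the skip-on-error behaviour, can be sketched roughly as follows. This is only a minimal sketch and not the original sample: ImageExtractionSketch, SaveImages, the fetchImage delegate, the count parameter and the image_N.png naming are all names made up for illustration. The delegate stands in for the GetDocumentDrawingObjectImageData call so that the snippet compiles without the SDK, and zero-based indexes are an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ImageExtractionSketch
{
    // Hypothetical helper: 'fetchImage' stands in for the SDK call that opens
    // a Stream for the drawing object at a given index (step 4).
    // 'count' is assumed to be known by the caller.
    public static List<string> SaveImages(
        Func<int, Stream> fetchImage, int count, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        var saved = new List<string>();

        for (int index = 0; index < count; index++)
        {
            try
            {
                using (Stream image = fetchImage(index))
                {
                    if (image == null)
                    {
                        continue; // nothing returned for this index
                    }

                    // Step 5: unique file name based on the index.
                    string path = Path.Combine(outputDir, "image_" + index + ".png");
                    using (FileStream file = File.Create(path))
                    {
                        image.CopyTo(file);
                    }
                    saved.Add(path);
                }
            }
            catch (Exception ex)
            {
                // Objects that are not images (the SmartArt case) raise an
                // error; report it and move on to the next index.
                Console.WriteLine("Skipping drawing object " + index + ": " + ex.Message);
            }
        }

        return saved;
    }
}
```

With the SDK referenced, the delegate would presumably wrap the call from step 4), passing in a request built as in step 3). The exact constructor arguments depend on the SDK version, so check them against the version you have installed. Catching every Exception keeps the sketch short; in real code you would probably catch only the SDK's own exception type so that unrelated failures (a full disk, for example) are not silently skipped.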

As an alternative to the above call, you can use GetDocumentDrawingObjectByIndex to specify the format of the returned image.

