DEV Community

E-iceblue Product Family

How to Extract Text and Images from a PowerPoint Document in Java

Text and images are the main elements of presentation slides. This article shows how to extract the text and the images from a PowerPoint document using Free Spire.Presentation for Java.

Installation

If you use Maven, add the Free Spire.Presentation for Java repository and dependency to your project's pom.xml file.

<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>http://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.presentation.free</artifactId>
        <version>3.9.0</version>
    </dependency>
</dependencies>

For non-Maven projects, download Free Spire.Presentation for Java, unzip the package, and add Spire.Presentation.jar from the lib folder to your project as a dependency.

import com.spire.presentation.*;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileWriter;

public class Test {
    public static void main(String[] args) throws Exception {
        // Create a presentation instance.
        Presentation ppt = new Presentation();

        // Load the document from disk.
        ppt.loadFromFile("Sample.pptx");

        StringBuilder buffer = new StringBuilder();
        // Traverse the slides and collect the text of every auto shape.
        for (Object slide : ppt.getSlides()) {
            for (Object shape : ((ISlide) slide).getShapes()) {
                if (shape instanceof IAutoShape) {
                    for (Object tp : ((IAutoShape) shape).getTextFrame().getParagraphs()) {
                        // End each paragraph with a line break so the text does not run together.
                        buffer.append(((ParagraphEx) tp).getText()).append(System.lineSeparator());
                    }
                }
            }
        }
        // Save the collected text to a .txt file; try-with-resources closes the writer.
        try (FileWriter writer = new FileWriter("ExtractText.txt")) {
            writer.write(buffer.toString());
        }

        // Extract all the images from the presentation and save them as PNG files.
        for (int i = 0; i < ppt.getImages().getCount(); i++) {
            BufferedImage image = ppt.getImages().get(i).getImage();
            ImageIO.write(image, "PNG", new File(String.format("extractImage-%1$s.png", i)));
        }
    }
}
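
The example above saves the text with FileWriter, which uses the platform default charset on JDKs older than 18, so non-ASCII slide text may be written in an unexpected encoding. Below is a minimal sketch of the saving step alone, assuming the paragraph strings have already been collected into a list. The class `TextExport` and the method `writeParagraphs` are hypothetical names used for illustration and are not part of Spire.Presentation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper for illustration; not part of Spire.Presentation.
public class TextExport {

    // Writes each paragraph on its own line, always encoded as UTF-8.
    public static void writeParagraphs(List<String> paragraphs, Path target) throws IOException {
        Files.write(target, paragraphs, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Sample strings stand in for the values returned by getText().
        List<String> paragraphs = List.of("Title slide", "Caf\u00e9 menu");
        Path target = Path.of("ExtractText.txt");
        writeParagraphs(paragraphs, target);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

To combine it with the extraction loop, one option is to add each paragraph's text to a `List<String>` instead of a `StringBuilder` and pass that list to the helper.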

Screenshot of the extracted images:
[Image: Extracted images]

Screenshot of the extracted text:
[Image: Extracted Text]
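
One way to sanity-check the extracted images is to look inside the .pptx file itself. A .pptx is a ZIP package, and PowerPoint conventionally stores embedded media under the `ppt/media/` folder. The sketch below uses only the JDK's `java.util.zip` classes, not Spire.Presentation, to list those entries. The class `MediaLister` is a hypothetical name, and the result is only a rough cross-check: the folder can also hold audio or video files, so the count may differ from what `ppt.getImages()` reports.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; independent of Spire.Presentation.
public class MediaLister {

    // Returns the names of all file entries stored under ppt/media/ in a .pptx package.
    public static List<String> listMedia(String pptxPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pptxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("ppt/media/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the sample file name used in this article.
        String path = args.length > 0 ? args[0] : "Sample.pptx";
        for (String name : listMedia(path)) {
            System.out.println(name);
        }
    }
}
```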
