DEV Community

GroupDocs

How to extract text and metadata from text and presentation templates

Extract text and metadata from a range of text and presentation template formats on the Java platform using the GroupDocs.Parser for Java API. The following template formats are supported:

  • dotx (Word template)
  • dotm (Word macro-enabled template)
  • ott (OpenDocument Text Template)
  • potx (PowerPoint template)
  • potm (PowerPoint macro-enabled template)
  • ppsm (PowerPoint macro-enabled slideshow)
  • pptm (PowerPoint macro-enabled presentation)

The code samples below demonstrate how to extract text and metadata from templates.

// Extracting text
void extractText(String fileName) {
    // Extract the text from the file
    String text = Extractor.DEFAULT.extractText(fileName);
    // Print the extracted text
    System.out.println(text);
}

// Extracting metadata
void extractMetadata(String fileName) {
    // Extract the metadata from the file
    MetadataCollection metadata = Extractor.DEFAULT.extractMetadata(fileName);
    // Print each metadata entry as "key: value"
    for (String key : metadata.getKeys()) {
        // Print the metadata key
        System.out.print(key);
        System.out.print(": ");
        // Print the metadata value
        System.out.println(metadata.get_Item(key));
    }
}
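
Before calling either method, it may help to check that a file actually has one of the template extensions listed earlier. The following is a minimal sketch of such a check in plain Java. The class TemplateFormats and its methods are hypothetical helpers written for illustration; they are not part of GroupDocs.Parser, and the check looks only at the file name, not at the file contents.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, NOT part of GroupDocs.Parser: checks a file name
// against the template extensions listed in this article.
class TemplateFormats {

    // Extensions copied from the supported-formats list in the article.
    private static final Set<String> SUPPORTED = Set.of(
            "dotx", "dotm", "ott", "potx", "potm", "ppsm", "pptm");

    // Returns the lower-cased extension, or an empty string if there is none.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True when the file name ends in one of the listed template extensions.
    static boolean isSupportedTemplate(String fileName) {
        return fileName != null && SUPPORTED.contains(extensionOf(fileName));
    }

    public static void main(String[] args) {
        // Report, for each file name passed on the command line,
        // whether its extension is in the supported list.
        for (String name : args) {
            System.out.println(name + " -> " + isSupportedTemplate(name));
        }
    }
}
```

A caller could guard extractText(fileName) or extractMetadata(fileName) with TemplateFormats.isSupportedTemplate(fileName) and skip or log files that do not match. The comparison is case-insensitive, so a name such as SLIDES.POTM is accepted.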

In addition, the parsing API supports retrieving tables from PDF documents and identifying the media type of secured Office Open XML documents: http://bit.ly/2CCy7bX
