DEV Community


How to extract text and metadata from text and presentation templates

Extract text and metadata from a range of text and presentation templates on the Java platform using the GroupDocs.Parser for Java API. The following template formats are supported:

  • dotx (Word template)
  • dotm (Word macro-enabled template)
  • ott (OpenDocument Text template)
  • potx (PowerPoint template)
  • potm (PowerPoint macro-enabled template)
  • ppsm (PowerPoint macro-enabled slideshow)
  • pptm (PowerPoint macro-enabled presentation)
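
Before handing a file to the parser, it can be handy to check whether its extension is one of the formats listed above. The following is a minimal sketch of such a check using only the Java standard library; `TemplateFormats`, `extensionOf`, and `isSupported` are hypothetical names introduced for illustration and are not part of GroupDocs.Parser.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of GroupDocs.Parser): checks a file name
// against the template extensions listed in this post.
final class TemplateFormats {
    // Extensions copied from the list of supported template formats.
    private static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(
                    "dotx", "dotm", "ott", "potx", "potm", "ppsm", "pptm")));

    private TemplateFormats() {
    }

    // Returns the lower-cased extension without the dot, or "" if the
    // file name has no extension.
    static String extensionOf(String fileName) {
        int separator = Math.max(fileName.lastIndexOf('/'), fileName.lastIndexOf('\\'));
        int dot = fileName.lastIndexOf('.');
        // No extension if the last dot sits in a directory name, starts the
        // file name (a "dot file"), or is the final character.
        if (dot <= separator + 1 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // True if the file name ends with one of the listed template extensions.
    static boolean isSupported(String fileName) {
        return SUPPORTED.contains(extensionOf(fileName));
    }
}
```

With this sketch, `TemplateFormats.isSupported("report.DOTX")` is true because the comparison ignores case, while `TemplateFormats.isSupported("slides.pptx")` is false because pptx is not in the list above.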

The code samples below demonstrate how to extract text and metadata from templates.

// Extracting Text
void extractText(String fileName) {
    // Extract the text from the file
    String text = Extractor.DEFAULT.extractText(fileName);
    // Print the extracted text
    System.out.println(text);
}

// Extracting Metadata
void extractMetadata(String fileName) {
    // Extract metadata from the file
    MetadataCollection metadata = Extractor.DEFAULT.extractMetadata(fileName);
    // Print the extracted metadata as "key: value" pairs
    for (String key : metadata.getKeys()) {
        // Print a metadata key
        System.out.print(key);
        System.out.print(": ");
        // Print a metadata value (assumes MetadataCollection exposes get(key))
        System.out.println(metadata.get(key));
    }
}

In addition, the parsing API supports retrieving tables from PDF documents and identifying the media type of secure Office Open XML documents.
