DEV Community

GroupDocs

How to Extract Metadata from Presentation Templates using .NET Parsing API

When it comes to manipulating documents within your applications, the development options are endless. Organizations around the globe, regardless of niche, regularly build new functionality into their business workflows to improve productivity. Extracting different types of information from multi-format documents is one such requirement. However, one primary concern is the accuracy of the extracted data, since not all software gives developers highly accurate data extraction.

Therefore, if you need precise extraction of raw and formatted text as well as metadata from many well-known file formats on the .NET platform, GroupDocs.Parser for .NET is worth considering. Beyond its basic data extraction features, the latest version of this document text extraction API lets app developers extract text and metadata from various text and presentation templates. Another important feature is the ability to programmatically fetch tables from PDF documents within your .NET apps. When working with this functionality, you can define table bounds manually or let the API identify the layout automatically.

In addition, you can detect the media type of password-protected Office Open XML documents and process documents in batches: http://bit.ly/2QuFPsr

The following code samples show how to extract text and metadata from templates:

    // Extracting text
    void ExtractText(string fileName)
    {
        // Extract the text from the file
        var text = Extractor.Default.ExtractText(fileName);
        // Print the extracted text
        Console.WriteLine(text);
    }

    // Extracting metadata
    void ExtractMetadata(string fileName)
    {
        // Extract the metadata from the file
        var metadata = Extractor.Default.ExtractMetadata(fileName);
        // Print each metadata entry as "key: value"
        foreach (var m in metadata)
        {
            Console.Write(m.Key);
            Console.Write(": ");
            Console.WriteLine(m.Value);
        }
    }
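If you want to log or display the extracted metadata rather than print it entry by entry, a small formatting helper can be handy. The following is a minimal sketch, not part of GroupDocs.Parser: `MetadataFormatter` and `ToLines` are hypothetical names, and the sketch assumes the metadata can be enumerated as string key/value pairs, which is what the loop above suggests but is not confirmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of GroupDocs.Parser).
// Assumption: metadata is available as string key/value pairs.
public static class MetadataFormatter
{
    // Turns key/value pairs into "key: value" lines, sorted by key
    // (case-insensitive) and skipping entries with a null or blank value.
    public static IReadOnlyList<string> ToLines(
        IEnumerable<KeyValuePair<string, string>> metadata)
    {
        if (metadata == null) throw new ArgumentNullException(nameof(metadata));

        return metadata
            .Where(m => !string.IsNullOrWhiteSpace(m.Value))
            .OrderBy(m => m.Key, StringComparer.OrdinalIgnoreCase)
            .Select(m => $"{m.Key}: {m.Value}")
            .ToList();
    }
}
```

Sorting by key keeps the output stable between runs, which makes it easier to compare the metadata of two templates side by side.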

The code sample below shows how to detect the media type of password-protected Office Open XML documents:

    // Note: fileName and the Common.GetFilePath helper are defined elsewhere (not shown here)
    // Create load options
    LoadOptions loadOptions = new LoadOptions();
    // Set the password used to open the protected document
    loadOptions.Password = "password";
    // Get the default composite media type detector
    var detector = CompositeMediaTypeDetector.Default;
    // Open a stream so the media type is detected by content (not file extension)
    using (var stream = File.OpenRead(Common.GetFilePath(fileName)))
    {
        // Detect the media type
        var mediaType = detector.Detect(stream, loadOptions);
        // Print the detected media type
        Console.WriteLine(mediaType);
    }
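To see why the password matters for detection, it helps to look at the first bytes of the file. An unprotected Office Open XML document is a ZIP package, while a password-protected one is wrapped in a Compound File Binary container, so from the outside it looks like a generic compound file until it is decrypted. The sketch below illustrates that distinction using only `System.IO`. It is not how GroupDocs.Parser detects media types; `ContainerSniffer` and `Sniff` are hypothetical names introduced for illustration.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of GroupDocs.Parser): it only inspects the
// leading bytes of a stream to guess the outer container format.
public static class ContainerSniffer
{
    // ZIP local file header signature "PK\x03\x04" (unencrypted OOXML packages).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Compound File Binary signature (legacy Office files and encrypted OOXML).
    private static readonly byte[] CfbSignature =
        { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static string Sniff(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));

        var header = new byte[8];
        int read = 0;
        // Stream.Read may return fewer bytes than requested, so loop until
        // the buffer is full or the stream ends.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break;
            read += n;
        }

        if (StartsWith(header, read, CfbSignature)) return "compound-file";
        if (StartsWith(header, read, ZipSignature)) return "zip";
        return "unknown";
    }

    // True when the first 'length' bytes of 'buffer' begin with 'signature'.
    private static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A result of "compound-file" only identifies the outer container. Telling a protected presentation from a protected spreadsheet requires decrypting the content, which is why the detector above is given load options with a password.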
