Discussion on: Extract the text data from PDF file using Aspose.PDF for .NET


Hi Team,
I need to extract content from a PDF by specifying a paragraph heading or a phrase.
ParagraphAbsorber gets all the text, but I need only a particular paragraph, or a particular portion of a paragraph, not the complete page.
How can I achieve this?
Regards,
Mamtha.A.C.D.

Andriy Andruhovski • Apr 14 '19

Thanks for your interest!
Currently, you can use TextFragmentAbsorber: pass a regular expression as the search phrase and enable regex mode with TextSearchOptions(true).

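    // Create a TextFragmentAbsorber that uses a regular expression to find
    // whole words starting with 'h' and ending with 'o' (e.g. "hello").
    Document document = new Document("input.pdf"); // placeholder path
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"\bh\w*?o\b",
         new TextSearchOptions(true));   // true = treat the phrase as a regex
    document.Pages.Accept(absorber);     // search every page
    foreach (TextFragment fragment in absorber.TextFragments)
        Console.WriteLine(fragment.Text);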

Unfortunately, ParagraphAbsorber doesn't support searching by regular expression, so you need to inspect the paragraphs it extracts yourself and pick out the ones you need.
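A minimal sketch of that manual approach, assuming the Aspose.PDF for .NET package is referenced. It walks `PageMarkups` → `Sections` → `Paragraphs` → `Lines` to rebuild each paragraph's text, then filters the list with `System.Text.RegularExpressions`. The `ParagraphSearch` class and its method names are hypothetical helpers, not part of Aspose.PDF. Treating "the paragraph right after a heading" as the content you want is also a simplifying assumption, because the paragraph boundaries depend on how Aspose segments each page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper class; not part of the Aspose.PDF API.
static class ParagraphSearch
{
    // Collect the text of every paragraph ParagraphAbsorber detects, in page
    // order. Fragments in a line are concatenated; lines are joined by a space.
    public static List<string> ExtractParagraphs(Document document)
    {
        var absorber = new ParagraphAbsorber();
        absorber.Visit(document);

        var paragraphs = new List<string>();
        foreach (PageMarkup markup in absorber.PageMarkups)
            foreach (MarkupSection section in markup.Sections)
                foreach (MarkupParagraph paragraph in section.Paragraphs)
                {
                    IEnumerable<string> lines = paragraph.Lines.Select(
                        line => string.Concat(line.Select(f => f.Text)));
                    paragraphs.Add(string.Join(" ", lines).Trim());
                }
        return paragraphs;
    }

    // Paragraphs whose text matches the given regular expression (a phrase search).
    public static List<string> FindParagraphs(IEnumerable<string> paragraphs, Regex pattern)
    {
        return paragraphs.Where(p => pattern.IsMatch(p)).ToList();
    }

    // The paragraph immediately after the first one matching the heading pattern,
    // or null if the heading is not found or is the last paragraph.
    public static string ParagraphAfterHeading(IReadOnlyList<string> paragraphs, Regex heading)
    {
        for (int i = 0; i < paragraphs.Count - 1; i++)
            if (heading.IsMatch(paragraphs[i]))
                return paragraphs[i + 1];
        return null;
    }
}
```

You would call `ExtractParagraphs` once on the opened `Document`, then run `FindParagraphs` or `ParagraphAfterHeading` on the resulting list as often as needed. To get only part of a paragraph, apply a capturing regex (e.g. `Match.Groups`) to the paragraph text that these helpers return.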