DEV Community

Extract the text data from PDF file using Aspose.PDF for .NET

Andriy Andruhovski on March 12, 2018

When working with Portable Document Format (PDF) files, you sometimes need to extract text from them. Aspose.PDF provides several classes to extract ...

smithharber • Jul 7 '23 • Edited

Are you looking for a way to convert PDF files to text effortlessly? You can use the CubexSoft PDF to Text Tool Converter for this purpose; it offers a solution for converting PDF emails to text. You can export PDF files to text format without conversion issues. The software is a desktop application that supports all versions of Windows, i.e. 11, 10, 8.1, 8, 7, Vista, etc. If you want to learn more about how the software works, download the demo version of the PDF to TXT Tool. The demo version converts the first 5 PDF emails to text free of charge.

mamtacd • Apr 10 '19

Hi Team,
I need to extract content from a PDF by giving a paragraph heading or some phrase.
How can I achieve this? ParagraphAbsorber gets all the text. However, I need only a particular paragraph or a particular portion of a paragraph, not the complete page.
Regards,
Mamtha.A.C.D.

Andriy Andruhovski Aspose.PDF • Apr 14 '19

Thanks for your interest!
Currently, you can use TextFragmentAbsorber with a regular expression as an input parameter.

    // Create a TextFragmentAbsorber object that searches for all words
    // starting with 'h' and ending with 'o' using a regular expression.
    // Passing true to TextSearchOptions enables regular expression search.
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"h\w*?o",
        new TextSearchOptions(true));

Unfortunately, ParagraphAbsorber doesn't support searching by regular expression, so you need to analyze the paragraphs it extracts manually.
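One way to do that manual analysis is sketched below. It is only an illustration, not part of Aspose.PDF: it assumes you have already collected the text of each extracted paragraph into plain strings, and the `ParagraphFilter` class and `FindMatching` method are hypothetical names. Only the standard .NET `Regex` class is used.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not an Aspose.PDF type.
public static class ParagraphFilter
{
    // Returns the paragraphs whose text matches the given regular expression.
    // The match is case-insensitive; each paragraph is assumed to be plain
    // text already pulled out of the PDF.
    public static List<string> FindMatching(IEnumerable<string> paragraphs, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        var result = new List<string>();
        foreach (string paragraph in paragraphs)
        {
            if (regex.IsMatch(paragraph))
            {
                result.Add(paragraph);
            }
        }
        return result;
    }
}
```

For example, `ParagraphFilter.FindMatching(paragraphs, @"^Introduction\b")` would keep only the paragraphs that begin with the heading word "Introduction". To get a portion of a paragraph rather than the whole paragraph, you could apply `Regex.Match` to each result in the same way.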