When working with PDF (Portable Document Format) files, you sometimes need to extract their text.
Aspose.PDF provides several classes for extracting this data: TextAbsorber, TextFragmentAbsorber, and ParagraphAbsorber.
The easiest way to extract text from a PDF is to use TextAbsorber with the default options.
TextAbsorber performs text extraction and provides access to the result via its Text property. When it is applied to the whole document, all of the text is returned in a single object.
To extract text from one page only, call the Accept method on that page of the Document object. The index is the number of the page from which the text is extracted (page numbers start at 1).
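The two cases above (whole document and single page) might look like the following minimal sketch. It assumes the Python via .NET edition of Aspose.PDF (the `aspose-pdf` package), where the members named above appear in snake_case (`accept`, `text`). The function name `extract_text` and its parameters are hypothetical.

```python
def extract_text(pdf_path, page_number=None):
    """Return the text of a whole PDF, or of a single 1-based page."""
    # Assumption: Aspose.PDF for Python via .NET (pip install aspose-pdf).
    # Imported inside the function so that defining it does not need the package.
    import aspose.pdf as ap

    document = ap.Document(pdf_path)
    absorber = ap.text.TextAbsorber()  # default options

    if page_number is None:
        # Visit every page; the text accumulates in one string.
        document.pages.accept(absorber)
    else:
        # Visit one page only; Aspose.PDF page numbers start at 1.
        document.pages[page_number].accept(absorber)

    return absorber.text
```

Calling `extract_text("input.pdf")` would return the text of the whole document, and `extract_text("input.pdf", 2)` only the text of the second page. The file name is a placeholder.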
Sometimes we need to extract text from a particular area of the page (e.g., the upper-left corner). TextAbsorber can do this as well, through its TextSearchOptions property, which exposes a LimitToPageBounds property and a Rectangle property. The latter takes a Rectangle object as its value and specifies the region of the page from which to extract text. For example, setting LimitToPageBounds restricts the search to text within the page bounds, and setting Rectangle to the upper half of the page restricts extraction to that region.
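A region-limited extraction could be sketched as follows, again assuming the Python via .NET edition of Aspose.PDF and its snake_case member names (`text_search_options`, `limit_to_page_bounds`, `rectangle`). The helper names `upper_half_bounds` and `extract_upper_half` are hypothetical. The sketch also assumes the page's lower-left corner sits at the coordinate origin, which is common but not guaranteed. PDF coordinates start at the bottom-left, so the upper half of a page is the half with the larger y values.

```python
def upper_half_bounds(page_width, page_height):
    """Return (llx, lly, urx, ury) covering the upper half of a page.

    PDF coordinates start at the bottom-left corner, so the upper half
    runs from half the page height up to the full page height.
    """
    return (0.0, page_height / 2.0, page_width, page_height)


def extract_upper_half(pdf_path, page_number=1):
    """Return the text found in the upper half of one 1-based page."""
    # Assumption: Aspose.PDF for Python via .NET (pip install aspose-pdf).
    import aspose.pdf as ap

    document = ap.Document(pdf_path)
    page = document.pages[page_number]

    # Assumption: the page rectangle starts at (0, 0).
    llx, lly, urx, ury = upper_half_bounds(page.rect.width, page.rect.height)

    absorber = ap.text.TextAbsorber()
    absorber.text_search_options.limit_to_page_bounds = True
    # The final argument asks the library to normalize the coordinates.
    absorber.text_search_options.rectangle = ap.Rectangle(llx, lly, urx, ury, True)

    page.accept(absorber)
    return absorber.text
```

Keeping the rectangle arithmetic in its own function makes the region easy to check without opening a document.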
The TextFragmentAbsorber object is used mainly in text search scenarios. When the search is complete, the occurrences are represented as a collection of text fragments. Each TextFragment object provides access to the text of the occurrence and its text properties, and allows you to edit the text and change its text state (font, font size, color, etc.).
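A search with TextFragmentAbsorber might be sketched like this, assuming the Python via .NET edition of Aspose.PDF and its snake_case member names (`text_fragments`, `text_state`, `position`). The function name `find_fragments` and the keys of the returned dictionaries are hypothetical.

```python
def find_fragments(pdf_path, phrase):
    """Search all pages for `phrase` and describe each occurrence."""
    # Assumption: Aspose.PDF for Python via .NET (pip install aspose-pdf).
    import aspose.pdf as ap

    document = ap.Document(pdf_path)
    absorber = ap.text.TextFragmentAbsorber(phrase)
    document.pages.accept(absorber)

    occurrences = []
    for fragment in absorber.text_fragments:
        occurrences.append({
            "text": fragment.text,
            "x": fragment.position.x_indent,
            "y": fragment.position.y_indent,
            "font": fragment.text_state.font.font_name,
            "font_size": fragment.text_state.font_size,
        })
    return occurrences
```

The sketch only reads the fragments. Editing the text or its text state, as described above, would additionally require saving the document afterwards.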
The ParagraphAbsorber class searches for sections and paragraphs of text and provides access to the rectangles and polygons that describe them in the text coordinate space.
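Walking the sections and paragraphs found by ParagraphAbsorber could look roughly like the sketch below. It assumes the Python via .NET edition of Aspose.PDF and that the .NET members `Visit`, `PageMarkups`, `Sections`, `Paragraphs`, and `Lines` are exposed as `visit`, `page_markups`, `sections`, `paragraphs`, and `lines`. The function name `extract_paragraphs` is hypothetical, and the sketch collects only the paragraph text, not the rectangles or polygons.

```python
def extract_paragraphs(pdf_path):
    """Return the text of every paragraph found in the document."""
    # Assumption: Aspose.PDF for Python via .NET (pip install aspose-pdf).
    import aspose.pdf as ap

    document = ap.Document(pdf_path)
    absorber = ap.text.ParagraphAbsorber()
    absorber.visit(document)

    paragraphs = []
    for markup in absorber.page_markups:            # one markup per page
        for section in markup.sections:
            for paragraph in section.paragraphs:
                # Assumption: each line is a sequence of text fragments.
                lines = [
                    "".join(fragment.text for fragment in line)
                    for line in paragraph.lines
                ]
                paragraphs.append("\n".join(lines))
    return paragraphs
```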