DEV Community

Discussion on: Read/Extract Text from Pdf in Java

E-iceblue Product Family • Edited

Hi, the following snippet uses Spire.PDF for Java to find text that matches a regular-expression pattern and highlight each match in yellow.

import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.general.find.PdfTextFind;
import java.awt.Color;

public class FindByRegularExpression {

    public static void main(String[] args) throws Exception {

        //Load a PDF document
        PdfDocument pdf = new PdfDocument();
        pdf.loadFromFile("C:\\Users\\Administrator\\Desktop\\test.pdf");

        //Declare a PdfTextFind array to hold the matches found on each page
        PdfTextFind[] results;

        //Loop through the pages
        for (Object page : (Iterable<?>) pdf.getPages()) {
            PdfPageBase pageBase = (PdfPageBase) page;

            //Define a pattern that matches "#" followed by one or more word characters (e.g. #java)
            String pattern = "\\#\\w+\\b";

            //Find all results that match the pattern
            results = pageBase.findText(pattern).getFinds();

            //Highlight results with yellow
            for (PdfTextFind find : results) {
                find.applyHighLight(Color.yellow);
            }
        }

        //Save to file
        pdf.saveToFile("output.pdf");
    }
}
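
If you want to check what the pattern picks up before running it against a real PDF, here is a small sketch that uses only java.util.regex. It assumes Spire.PDF interprets the pattern the same way the standard Java regex engine does, which I have not verified here. The class name PatternPreview, the method findMatches, and the sample sentence are made up for illustration and are not part of Spire.PDF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spire.PDF): previews regex matches on a plain string.
final class PatternPreview {

    // Returns every substring of text that matches the given regular expression, in order.
    static List<String> findMatches(String text, String regex) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Same pattern as in the snippet above: "#" followed by word characters
        String pattern = "\\#\\w+\\b";

        // Made-up sample text standing in for the text of a PDF page
        String sample = "Tags: #java and #pdf_tools, not # alone";

        for (String match : findMatches(sample, pattern)) {
            System.out.println(match);
        }
    }
}
```

With the sample sentence, this prints #java and #pdf_tools; the lone "#" is skipped because it is not followed by a word character.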