PDF documents are widely used for reports, contracts, textbooks, and exams. We often need to find specific text and highlight it, for example to mark key terms in a report, flag important clauses in a contract, or locate content for later analysis. Doing this by hand is slow and error-prone, so it is worth automating text search and highlighting in PDFs.
In this guide, we’ll demonstrate four practical ways to find and highlight text in PDF files using Java:
Highlight text on a single page
Highlight text within a specific rectangular area
Highlight text throughout the entire document
Highlight text using regular expressions
Installing the Required Library
We use Spire.PDF for Java, a powerful PDF processing library that allows you to load PDFs, search text, edit content, and highlight text. You can download the JAR package or include it via Maven:
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>12.4.4</version>
    </dependency>
</dependencies>
Once the library is on the classpath, you can process PDFs entirely in Java, without a desktop application such as Adobe Acrobat.
Method 1: Highlight Text on a Single Page
If you only need to highlight text on a single page—for instance, the term “Database” on the first page—this method is ideal for quick annotations.
import com.spire.ms.System.Collections.Generic.List;
import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;
import com.spire.pdf.PdfPageBase;
import com.spire.pdf.texts.PdfTextFinder;
import com.spire.pdf.texts.PdfTextFragment;
import com.spire.pdf.texts.TextFindParameter;
import java.awt.*;
import java.util.EnumSet;
public class HighlightTextInPage {
    public static void main(String[] args) {
        // Load the PDF document
        PdfDocument doc = new PdfDocument();
        doc.loadFromFile("C:\\Users\\Administrator\\Desktop\\SampleDocument.pdf");
        // Get the first page
        PdfPageBase page = doc.getPages().get(0);
        // Create a text finder for this page
        PdfTextFinder finder = new PdfTextFinder(page);
        // Set search options in a single call: whole-word match and ignore case.
        // Two separate calls would likely leave only the last flag in effect.
        finder.getOptions().setTextFindParameter(
                EnumSet.of(TextFindParameter.WholeWord, TextFindParameter.IgnoreCase));
        // Find all occurrences of the word "Database"
        List<PdfTextFragment> results = finder.find("Database");
        // Highlight each found text fragment
        for (PdfTextFragment textFragment : results) {
            textFragment.highLight(Color.LIGHT_GRAY);
        }
        // Save the highlighted PDF to a new file
        doc.saveToFile("output/HighlightTextInPage.pdf", FileFormat.PDF);
        doc.dispose();
    }
}
Use cases: Page-level annotation, report review, quick content highlighting.
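The search options in Method 1 are passed as an EnumSet. If a setter simply stores the set it receives, calling it twice with one flag each keeps only the last flag, while one call with both constants keeps both. The sketch below illustrates that difference with plain Java. FindFlag and FindOptions are hypothetical stand-ins, not Spire.PDF classes, and the replace-on-set behaviour is an assumption about how such a setter typically works.

```java
import java.util.EnumSet;

// Hypothetical stand-in for a search-flag enum; not a Spire.PDF type.
enum FindFlag { WHOLE_WORD, IGNORE_CASE, REGEX }

// Hypothetical options holder. Assumption: the setter replaces the stored
// set instead of merging it with the previous value.
class FindOptions {
    private EnumSet<FindFlag> flags = EnumSet.noneOf(FindFlag.class);

    void setFlags(EnumSet<FindFlag> newFlags) {
        this.flags = EnumSet.copyOf(newFlags); // replaces, does not merge
    }

    EnumSet<FindFlag> getFlags() {
        return EnumSet.copyOf(flags);
    }
}

class FindFlagsDemo {
    // Two separate calls: the second set overwrites the first.
    static EnumSet<FindFlag> separateCalls() {
        FindOptions options = new FindOptions();
        options.setFlags(EnumSet.of(FindFlag.WHOLE_WORD));
        options.setFlags(EnumSet.of(FindFlag.IGNORE_CASE));
        return options.getFlags();
    }

    // One call carrying both constants keeps both flags.
    static EnumSet<FindFlag> combinedCall() {
        FindOptions options = new FindOptions();
        options.setFlags(EnumSet.of(FindFlag.WHOLE_WORD, FindFlag.IGNORE_CASE));
        return options.getFlags();
    }

    public static void main(String[] args) {
        System.out.println("separate: " + separateCalls()); // separate: [IGNORE_CASE]
        System.out.println("combined: " + combinedCall());  // combined: [WHOLE_WORD, IGNORE_CASE]
    }
}
```

Building the full set once and passing it in a single call gives the same result whether the real setter merges or replaces, which makes it the safer form.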
Method 2: Highlight Text Within a Specific Rectangular Area
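Sometimes you only want to search a specific area, such as the header, footer, or a table, to avoid highlighting irrelevant text. Use setFindArea() to define the search rectangle; the Rectangle2D.Float arguments are x, y, width, and height. The snippet below reuses the doc, page, and finder objects created as in Method 1.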
import java.awt.geom.Rectangle2D;
// Define a rectangular area (e.g., top header region)
finder.getOptions().setFindArea(new Rectangle2D.Float(0, 0, 841, 180));
// Search for the word "Report" within this area
List<PdfTextFragment> results = finder.find("Report");
// Highlight each found fragment
for (PdfTextFragment textFragment : results) {
    textFragment.highLight(Color.LIGHT_GRAY);
}
Use cases: Highlight header text in reports, mark section titles, or annotate tables.
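To get a feel for which positions a rectangle such as (0, 0, 841, 180) covers, here is a small sketch using only java.awt.geom. It assumes the origin is the top-left corner of the page with y growing downward, which matches the "top header region" comment above but is not confirmed by the source. SearchAreaSketch and its methods are hypothetical helpers, not part of Spire.PDF.

```java
import java.awt.geom.Rectangle2D;

class SearchAreaSketch {
    // Builds a rectangle covering the top `headerHeight` units of a page.
    // Assumption: (0, 0) is the top-left corner and y grows downward.
    static Rectangle2D.Float headerArea(float pageWidth, float headerHeight) {
        return new Rectangle2D.Float(0f, 0f, pageWidth, headerHeight);
    }

    // Returns true when the point (x, y) lies inside the area.
    // Rectangle2D treats the right and bottom edges as outside.
    static boolean inArea(Rectangle2D area, double x, double y) {
        return area.contains(x, y);
    }

    public static void main(String[] args) {
        Rectangle2D.Float header = headerArea(841f, 180f);
        System.out.println(inArea(header, 100, 50));  // true: inside the header band
        System.out.println(inArea(header, 100, 200)); // false: below the band
        System.out.println(inArea(header, 841, 50));  // false: on the right edge
    }
}
```

Checking a few known coordinates this way can help when tuning the rectangle so it covers a header or table without touching the body text.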
Method 3: Highlight Text Throughout the Entire Document
To process an entire document—for example, to highlight “Contract Clause” throughout a PDF—you can iterate through each page.
for (Object pageObj : doc.getPages()) {
    PdfPageBase page = (PdfPageBase) pageObj;
    PdfTextFinder finder = new PdfTextFinder(page);
    // Set search options in a single call: whole-word match and ignore case.
    // Two separate calls would likely leave only the last flag in effect.
    finder.getOptions().setTextFindParameter(
            EnumSet.of(TextFindParameter.WholeWord, TextFindParameter.IgnoreCase));
    // Find all occurrences of "Contract Clause"
    List<PdfTextFragment> results = finder.find("Contract Clause");
    // Highlight each found fragment
    for (PdfTextFragment textFragment : results) {
        textFragment.highLight(Color.LIGHT_GRAY);
    }
}
Use cases: Full-document contract review, batch report annotation, textbook content marking.
Method 4: Highlight Text Using Regular Expressions
Regular expressions are useful for pattern-based searches, such as section headings or numbered entries. For example, to highlight all chapter headings that consist of “Chapter” followed by a number, enable regex mode on a finder created as in Method 1:
// Enable regex search
finder.getOptions().setTextFindParameter(EnumSet.of(TextFindParameter.Regex));
// Define a regex pattern: the word "Chapter", one whitespace character, then one or more digits
String pattern = "\\bChapter\\s\\d+\\b";
// Find all matches
List<PdfTextFragment> results = finder.find(pattern);
// Highlight each match
for (PdfTextFragment textFragment : results) {
    textFragment.highLight(Color.LIGHT_GRAY);
}
Use cases: Highlight chapter titles, extract numbered entries, detect structured content.
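Before running a pattern against a PDF, it can help to check what it matches on plain strings. The sketch below tests the same pattern with java.util.regex. It assumes the library's regex mode follows similar syntax, which the source does not confirm; ChapterPatternSketch and the sample text are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChapterPatternSketch {
    // Same pattern as in Method 4: "Chapter", one whitespace character,
    // one or more digits, with word boundaries on both sides.
    static final Pattern CHAPTER = Pattern.compile("\\bChapter\\s\\d+\\b");

    // Collects every substring of `text` that matches the pattern.
    static List<String> findHeadings(String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = CHAPTER.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String sample = "Chapter 1 Overview. chapter 2 is lowercase. "
                + "Chapter 12: Indexes. Chapter 5a is not a number.";
        System.out.println(findHeadings(sample)); // [Chapter 1, Chapter 12]
    }
}
```

Note that the pattern is case-sensitive and expects exactly one whitespace character, so "chapter 2" and "Chapter 5a" are skipped in this sample.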
Conclusion
These four methods demonstrate practical ways to find and highlight text in PDF files using Java:
Single-page highlighting – Quickly annotate specific pages.
Rectangular area highlighting – Precisely control search scope.
Full-document highlighting – Batch process large PDFs.
Regex-based highlighting – Flexibly match complex patterns.
With Spire.PDF for Java, you can automate PDF text processing efficiently, making it ideal for contract review, report annotation, textbook marking, and other professional applications.