DEV Community

Theerachai Songsee


Answer: Using PDFBox to determine the coordinates of words in a document

I'm working on extracting data from PDF files. This answer helped me find the coordinates of a word by searching for it.

Take a look at this; I think it's what you need.

Here is the code:

import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class PrintTextLocations extends PDFTextStripper {

    public static StringBuilder tWord
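
The class above extends PDFTextStripper. In PDFBox, a subclass like this typically receives each glyph as a TextPosition, for example by overriding processTextPosition, and then has to stitch the glyphs back into words itself. The following is a minimal, stdlib-only sketch of that word-grouping and search step. It is not the original answer's code. Glyph, Word, groupWords, find, the maxGap parameter and the 0.5-unit line tolerance are hypothetical names and values chosen for illustration. Glyph stands in for the x, y and width you would read from a TextPosition (for example getXDirAdj(), getYDirAdj() and getWidthDirAdj()). The sketch assumes glyphs arrive in reading order.

```java
import java.util.ArrayList;
import java.util.List;

public class WordLocator {

    // Hypothetical stand-in for PDFBox's TextPosition: one glyph with its
    // x/y (origin top-left, like getXDirAdj()/getYDirAdj()) and its width.
    record Glyph(String ch, float x, float y, float width) {}

    // A word together with the coordinates of its first glyph.
    record Word(String text, float x, float y) {}

    // Groups glyphs (assumed to be in reading order) into words. A word ends at a
    // whitespace glyph, when y changes (new line), or when the horizontal gap to
    // the previous glyph exceeds maxGap (many PDFs emit no space glyphs at all).
    static List<Word> groupWords(List<Glyph> glyphs, float maxGap) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float startX = 0, startY = 0, lastY = 0, lastEnd = 0;
        for (Glyph g : glyphs) {
            boolean blank = g.ch().isBlank();
            boolean newLine = current.length() > 0 && Math.abs(g.y() - lastY) > 0.5f;
            boolean gap = current.length() > 0 && g.x() - lastEnd > maxGap;
            if (current.length() > 0 && (blank || newLine || gap)) {
                words.add(new Word(current.toString(), startX, startY));
                current.setLength(0);
            }
            if (!blank) {
                if (current.length() == 0) { // first glyph of a new word
                    startX = g.x();
                    startY = g.y();
                }
                current.append(g.ch());
            }
            lastY = g.y();
            lastEnd = g.x() + g.width();
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), startX, startY));
        }
        return words;
    }

    // Returns every word whose text equals the search term.
    static List<Word> find(List<Word> words, String target) {
        List<Word> hits = new ArrayList<>();
        for (Word w : words) {
            if (w.text().equals(target)) {
                hits.add(w);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Sum 42" on one line, "Sum" on the next line.
        List<Glyph> glyphs = List.of(
                new Glyph("S", 10, 100, 6), new Glyph("u", 16, 100, 5),
                new Glyph("m", 21, 100, 7), new Glyph(" ", 28, 100, 3),
                new Glyph("4", 31, 100, 5), new Glyph("2", 36, 100, 5),
                new Glyph("S", 10, 120, 6), new Glyph("u", 16, 120, 5),
                new Glyph("m", 21, 120, 7));
        List<Word> words = groupWords(glyphs, 2.0f);
        for (Word w : find(words, "Sum")) {
            System.out.println(w.text() + " at x=" + w.x() + ", y=" + w.y());
        }
    }
}
```

With the sample glyphs, main prints the position of both occurrences of "Sum" (y=100.0 and y=120.0). In a real PDFTextStripper subclass you would fill the Glyph list from each TextPosition instead. Calling setSortByPosition(true) on the stripper helps ensure the glyphs arrive in reading order. The gap check is there because many PDFs place words side by side without emitting an actual space character.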
