DEV Community

Theerachai Songsee
Theerachai Songsee

Posted on

4 2

Answer: Using PDFbox to determine the coordinates of words in a document

I'm working on extract data from PDF files. This post helps me to determine for the coordinate position by word searching.

take a look on this, I think it's what you need.

https://jackson-brain.com/using-pdfbox-to-locate-text-coordinates-within-a-pdf-in-java/

Here is the code:

import java.io.File;
import java.io.IOException;
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

public class PrintTextLocations extends PDFTextStripper {

public static StringBuilder tWord

Top comments (0)

Heroku

This site is powered by Heroku

Heroku was created by developers, for developers. Get started today and find out why Heroku has been the platform of choice for brands like DEV for over a decade.

Sign Up

👋 Kindness is contagious

Please leave a ❤️ or a friendly comment on this post if you found it helpful!

Okay