In office work, you are often handed a stack of PDF forms containing names and numbers, and the next task is to gather all of that data into an Excel spreadsheet. You could copy and paste the data by hand, but that is tedious and can take hours. This article shows how Spire.PDF for Java can extract data from PDF files into Excel worksheets in a few lines of code.
Spire.PDF for Java is a PDF API that enables Java applications to read, write, save and print PDF documents without using Adobe Acrobat. With this component, developers can create PDF files from scratch or process existing PDF files. This article shows how to extract data from PDF files and store it in Excel worksheets in two ways:
- Convert PDF to Excel
- Export table data from PDF to Excel
Install Spire.PDF for Java
First, add the Spire.Pdf.jar file as a dependency in your Java project. The JAR file can be downloaded from this link. If you use Maven, you can instead pull in the library by adding the following configuration to your project's pom.xml file.
<repositories>
    <repository>
        <id>com.e-iceblue</id>
        <name>e-iceblue</name>
        <url>https://repo.e-iceblue.com/nexus/content/groups/public/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>e-iceblue</groupId>
        <artifactId>spire.pdf</artifactId>
        <version>9.7.0</version>
    </dependency>
</dependencies>
Convert PDF to Excel
The following are the steps to convert a PDF document to Excel:
- Initialize an instance of the PdfDocument class.
- Load the PDF document using the PdfDocument.loadFromFile(String) method.
- Save the document to Excel using the PdfDocument.saveToFile(String, FileFormat) method.
import com.spire.pdf.FileFormat;
import com.spire.pdf.PdfDocument;

public class PdftoExcel {
    public static void main(String[] args) throws Exception {
        //Initialize an instance of PdfDocument class
        PdfDocument pdf = new PdfDocument();
        //Load the PDF document
        pdf.loadFromFile("Sample.pdf");
        //Save the PDF document to XLSX
        pdf.saveToFile("PdfToExcel.xlsx", FileFormat.XLSX);
    }
}
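The introduction talks about a stack of PDF files rather than a single one. Below is a minimal sketch of how an output name could be derived for each input file, using only the Java standard library. The class name OutputNames and the method toXlsxName are hypothetical helpers for illustration and are not part of Spire.PDF; the conversion itself would still be done with the loadFromFile and saveToFile calls shown above, once per file.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not part of Spire.PDF): derives the .xlsx output
// path for a given .pdf input path so that a conversion can run in a loop.
final class OutputNames {

    // Replaces a trailing ".pdf" (in any letter case) with ".xlsx".
    // File names without a ".pdf" suffix simply get ".xlsx" appended.
    static Path toXlsxName(Path pdfPath) {
        String name = pdfPath.getFileName().toString();
        boolean isPdf = name.toLowerCase(Locale.ROOT).endsWith(".pdf");
        String base = isPdf ? name.substring(0, name.length() - 4) : name;
        // Keep the output next to the input file.
        return pdfPath.resolveSibling(base + ".xlsx");
    }

    public static void main(String[] args) {
        String[] inputs = {"Sample.pdf", "Invoice.PDF"};
        for (String input : inputs) {
            Path out = toXlsxName(Paths.get(input));
            // Here one would load `input` and save it to `out`
            // with the PdfDocument calls from the example above.
            System.out.println(input + " -> " + out);
        }
    }
}
```

Keeping the name derivation in its own method makes it easy to check without touching any PDF files.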
Export table data from PDF to Excel
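When you convert the whole PDF file to Excel, you may find that the borders have disappeared and that the output contains other data you don't want. To get a cleaner result, you can extract only the data in the tables on a PDF page and export each table to its own Excel worksheet. Note that the following example also uses classes from the com.spire.xls package (Workbook, Worksheet), which belong to Spire.XLS for Java rather than Spire.PDF, so that library needs to be on the classpath as well.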
import com.spire.pdf.PdfDocument;
import com.spire.pdf.utilities.PdfTable;
import com.spire.pdf.utilities.PdfTableExtractor;
import com.spire.xls.ExcelVersion;
import com.spire.xls.Workbook;
import com.spire.xls.Worksheet;

public class ExtractTableDataAndSaveInExcel {
    public static void main(String[] args) throws Exception {
        //Load a sample PDF document
        PdfDocument pdf = new PdfDocument("Sample1.pdf");
        //Create a PdfTableExtractor instance
        PdfTableExtractor extractor = new PdfTableExtractor(pdf);
        //Extract tables from the first page (page index 0)
        PdfTable[] pdfTables = extractor.extractTable(0);
        //Create a Workbook object
        Workbook wb = new Workbook();
        //Remove default worksheets
        wb.getWorksheets().clear();
        //If any tables are found
        if (pdfTables != null && pdfTables.length > 0) {
            //Loop through the tables
            for (int tableNum = 0; tableNum < pdfTables.length; tableNum++) {
                //Add a worksheet to workbook
                String sheetName = String.format("Table - %d", tableNum + 1);
                Worksheet sheet = wb.getWorksheets().add(sheetName);
                //Loop through the rows in the current table
                for (int rowNum = 0; rowNum < pdfTables[tableNum].getRowCount(); rowNum++) {
                    //Loop through the columns in the current table
                    for (int colNum = 0; colNum < pdfTables[tableNum].getColumnCount(); colNum++) {
                        //Extract data from the current table cell
                        String text = pdfTables[tableNum].getText(rowNum, colNum);
                        //Insert data into a specific cell (worksheet indexes are 1-based)
                        sheet.get(rowNum + 1, colNum + 1).setText(text);
                    }
                }
                //Auto fit column width
                for (int sheetColNum = 0; sheetColNum < sheet.getColumns().length; sheetColNum++) {
                    sheet.autoFitColumn(sheetColNum + 1);
                }
            }
        }
        //Save the workbook to an Excel file
        wb.saveToFile("ExportTableToExcel1.xlsx", ExcelVersion.Version2016);
    }
}
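The loop above writes every cell with setText, so numeric values are likely to end up in Excel as text rather than as numbers. Below is a minimal sketch of one way to detect numeric cell text before writing it, using only the Java standard library. The names CellValues and parseNumber are hypothetical and not part of Spire.PDF or Spire.XLS, and the sketch assumes "." as the decimal separator and "," as an optional thousands separator. How a parsed value is then written to the worksheet depends on the Spire.XLS cell API and is left out here.

```java
import java.util.OptionalDouble;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Spire.PDF or Spire.XLS): decides whether
// the text extracted from a table cell looks like a plain decimal number.
final class CellValues {

    // Assumption: "." is the decimal separator and "," groups thousands.
    // Accepts e.g. "42", "-3.5", "1,234.50"; rejects "1d", "12,34", "John".
    private static final Pattern NUMBER =
            Pattern.compile("[+-]?(\\d{1,3}(,\\d{3})+|\\d+)(\\.\\d+)?");

    // Returns the numeric value of the cell text, or empty if it is not a number.
    static OptionalDouble parseNumber(String text) {
        if (text == null) {
            return OptionalDouble.empty();
        }
        String trimmed = text.trim();
        if (!NUMBER.matcher(trimmed).matches()) {
            return OptionalDouble.empty();
        }
        // Drop thousands separators before parsing.
        return OptionalDouble.of(Double.parseDouble(trimmed.replace(",", "")));
    }

    public static void main(String[] args) {
        String[] samples = {"1,234.50", "John", " 7 "};
        for (String sample : samples) {
            OptionalDouble value = parseNumber(sample);
            System.out.println("'" + sample + "' numeric? " + value.isPresent());
        }
    }
}
```

Identifiers with leading zeros, such as postal codes, also match this pattern, so columns like that are better left as text.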
Conclusion
In this article, we have demonstrated how to export the data in PDF tables and store it in Excel using Java. Spire.PDF for Java can also extract all the text and images from a PDF file for other scenarios. You can check the PDF forum for more features for working with PDF files.