DEV Community

scndry
scndry

Posted on • Edited on

Stop Treating Spreadsheets Like Spreadsheets — Treat Them Like JSON

Every Java developer has written this code:

Workbook wb = new XSSFWorkbook(file);
Sheet sheet = wb.getSheetAt(0);
for (Row row : sheet) {
    String name = row.getCell(0).getStringCellValue();
    int qty = (int) row.getCell(1).getNumericCellValue();
    // ... 20 more fields
}
Enter fullscreen mode Exit fullscreen mode

Cell by cell. Column index by column index. Cast by cast. For every single spreadsheet your application touches.

What if you could do this instead?

SpreadsheetMapper mapper = new SpreadsheetMapper();
List<Employee> employees = mapper.readValues(file, Employee.class);
Enter fullscreen mode Exit fullscreen mode

That's it. Same API as Jackson's ObjectMapper. Because a spreadsheet row IS a JSON object.

Introducing jackson-dataformat-spreadsheet

A Jackson extension module that treats XLSX/XLS as just another data format — like JSON, CSV, or XML.

GitHub: jackson-dataformat-spreadsheet

Reading

@DataGrid
public class Employee {
    private String name;
    private String department;
    private int salary;
    // getters, setters
}

SpreadsheetMapper mapper = new SpreadsheetMapper();

// Single row
Employee first = mapper.readValue(file, Employee.class);

// All rows
List<Employee> all = mapper.readValues(file, Employee.class);

// Specific sheet
SheetInput<File> input = SheetInput.source(file, "Payroll");
List<Employee> payroll = mapper.readValues(input, Employee.class);
Enter fullscreen mode Exit fullscreen mode

Writing

List<Employee> employees = ...;
mapper.writeValue(file, employees, Employee.class);
Enter fullscreen mode Exit fullscreen mode

That produces a proper XLSX file with headers and typed cells.

Nested Objects — The Killer Feature

Spreadsheets are flat. POJOs are not. Most Excel libraries force you to flatten everything manually. This library does it automatically:

┌─────┬──────┬─────────┬────────────────┬─────────────┬────────┐
│ ID  │ NAME │ ZIPCODE │ ADDRESS LINE 1 │ DESIGNATION │ SALARY │
├─────┼──────┼─────────┼────────────────┼─────────────┼────────┤
│ 1   │ John │ 12345   │ 123 Main St.   │ CEO         │ 300000 │
└─────┴──────┴─────────┴────────────────┴─────────────┴────────┘
Enter fullscreen mode Exit fullscreen mode
@DataGrid
class Employee {
    int id;
    String name;
    Address address;
    Employment employment;
}

class Address {
    String zipcode;
    String addressLine1;
}

class Employment {
    String designation;
    long salary;
}
Enter fullscreen mode Exit fullscreen mode

No configuration needed. The nested POJO structure defines the column layout. Read and write — both directions work.

How It's Built

This isn't a POI wrapper. It extends Jackson's streaming layer directly:

  • SheetParser extends ParserMinimalBase — pulls tokens from StAX
  • SheetGenerator extends GeneratorBase — pushes cells via POI
  • SpreadsheetFactory extends JsonFactory — creates parsers/generators

The read path parses OOXML XML directly via StAX. No XMLBeans, no SAX callbacks, no intermediate DOM. A pull parser feeding a pull parser — naturally composable.

Jackson (pull)       SheetParser (pull)      StAX (pull)
    │                      │                      │
    ├─ nextToken() ───────►├─ next() ────────────►├─ next()
    │◄─ VALUE_STRING ──────┤◄─ CELL_VALUE ────────┤◄─ START_ELEMENT
Enter fullscreen mode Exit fullscreen mode

Performance

Benchmarked against popular alternatives on realistic data (100K rows, 6 columns, mixed types, shared string table):

┌──────────────────────┬───────────┬─────────┐
│       Library        │ Read Time │ Memory  │
├──────────────────────┼───────────┼─────────┤
│ jackson-spreadsheet  │ 215 ms    │ 612 MB  │
├──────────────────────┼───────────┼─────────┤
│ FastExcel            │ 213 ms    │ 418 MB  │
├──────────────────────┼───────────┼─────────┤
│ EasyExcel            │ 261 ms    │ 397 MB  │
├──────────────────────┼───────────┼─────────┤
│ Apache POI UserModel │ 1215 ms   │ 2333 MB │
└──────────────────────┴───────────┴─────────┘
Enter fullscreen mode Exit fullscreen mode

Competitive with FastExcel on throughput, 5x faster than POI UserModel. The memory overhead is Jackson's data-binding cost — the price of getting type-safe POJOs instead of raw strings.

Annotations

Control the schema with @DataGrid and @DataColumn:

@DataGrid
class Product {
    @DataColumn("Product Name")
    String name;

    @DataColumn(value = "Price", style = "currency")
    double price;

    @DataColumn(merge = OptBoolean.TRUE)
    String category;
}
Enter fullscreen mode Exit fullscreen mode

What's Next

The long-term goal is to contribute this upstream as an official Jackson dataformat module (like jackson-dataformat-csv or jackson-dataformat-xml). For now, it's available on GitHub Packages:

The long-term goal is to contribute this upstream as an official Jackson dataformat module (like jackson-dataformat-csv or jackson-dataformat-xml). Available on Maven Central:

<dependency>
    <groupId>io.github.scndry</groupId>
    <artifactId>jackson-dataformat-spreadsheet</artifactId>
    <version>1.0.2</version>
</dependency>
Enter fullscreen mode Exit fullscreen mode

Requirements

  • Java 8+
  • Jackson 2.14.0+
  • Apache POI 4.1.1+ (Strict OOXML requires 5.1.0+)

Links


Feedback, issues, and stars welcome. If you've ever cursed at row.getCell(17).getStringCellValue(), this is for you.

Top comments (0)