How try-with-resources Quietly Leaked Disk in Production

#java #apachepoi #sre #performance

We got paged because a pod was sitting at 84% of its 2 GiB memory limit (RSS). The JVM heap was a calm ~260 MiB, so something outside the heap was eating memory, and that's where it got interesting.

What we found

We dug through native memory first, then into the filesystem:

$ du -sh /tmp/poifiles
1.9G    /tmp/poifiles

$ ls /tmp/poifiles | wc -l
10046

About 10k temporary XLSX fragments (the per-sheet XML that SXSSF spills to disk) had piled up in /tmp/poifiles. The pod hadn't restarted in 6 days, so 10,046 files works out to roughly 1,700 new files per day per pod. Another pod was already at 10,103 files. Multiply that across the cluster and you can see how a node's disk fills up without anyone noticing.
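The `du` and `ls` readings above also give a rough size per leaked file. Treating du's "1.9G" as GiB (GNU `du -h` uses powers of 1024) and ignoring its rounding:

$$
\frac{1.9\ \text{GiB}}{10\,046\ \text{files}} = \frac{1.9 \times 1024^2\ \text{KiB}}{10\,046} \approx 198\ \text{KiB per file}
$$

So each leftover fragment is small on its own; it is the count, growing with every export, that fills the disk.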

The culprit

Apache POI's SXSSFWorkbook. We were doing the obvious thing:

try (var wb = new SXSSFWorkbook(...)) {
    writeWorkbook(wb, ...);
}

Looks correct. SXSSFWorkbook is AutoCloseable, so try-with-resources should clean up after us. But it doesn't.

close() only releases streams. The temp files SXSSF spills to disk are deleted by a separate dispose() call, which try-with-resources never makes. Any path that doesn't explicitly call dispose(), including one where an I/O exception escapes the write, leaves those files behind. The result is a slow, silent leak that nobody notices until /tmp on the node is full.
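To make the close()/dispose() split concrete without pulling in POI, here is a stdlib-only toy model. `SpillingWorkbook` is a hypothetical class, not the POI API: it simply spills to a temp file, releases its stream in close(), and deletes the file only in dispose(), mirroring the behavior described above.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

public class LeakDemo {
    // Hypothetical stand-in for SXSSFWorkbook. NOT the POI API: it only mimics
    // the close()/dispose() split, using a plain temp file.
    static final class SpillingWorkbook implements AutoCloseable {
        private final Path spillFile;
        private final Writer out;

        SpillingWorkbook(Path tempDir) throws IOException {
            // Like SXSSF, buffered rows spill to a temp file on disk
            spillFile = Files.createTempFile(tempDir, "toy-sheet", ".xml");
            out = Files.newBufferedWriter(spillFile);
        }

        void writeRow(String row) throws IOException {
            out.write(row);
            out.write('\n');
        }

        Path spillFile() {
            return spillFile;
        }

        @Override
        public void close() throws IOException {
            out.close(); // releases the stream only; the temp file stays on disk
        }

        boolean dispose() throws IOException {
            out.close(); // BufferedWriter.close() is a no-op the second time
            return Files.deleteIfExists(spillFile); // this is what actually frees the disk
        }
    }

    // The "obvious" pattern: try-with-resources calls close(), never dispose()
    public static Path exportWithTryWithResources(Path tempDir) throws IOException {
        try (var wb = new SpillingWorkbook(tempDir)) {
            wb.writeRow("a,b,c");
            return wb.spillFile();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("leak-demo");
        Path spilled = exportWithTryWithResources(dir);
        // prints "temp file still on disk: true"
        System.out.println("temp file still on disk: " + Files.exists(spilled));
    }
}
```

Every call to `exportWithTryWithResources` leaves one more file in `tempDir`, which is the same slow accumulation the `ls | wc -l` output showed.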

The fix

One line, once you know it exists:

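SXSSFWorkbook wb = new SXSSFWorkbook(template, ROW_ACCESS_WINDOW_SIZE);
try (wb) {                            // close() still runs, but it does not delete temp files
    wb.setCompressTempFiles(true);    // gzip temp XML for sheets created after this call
    writeWorkbook(wb, records, schemaClass, xlsxProfile, os);
} finally {
    disposeQuietly(wb);               // dispose() deletes the temp files, even if the write threw
}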
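Here is the same fix applied to a stdlib-only toy model, so it can be run without POI. `SpillingWorkbook` and `disposeQuietly` are hypothetical stand-ins (the post doesn't show the real `disposeQuietly`); the point is the shape: close() via try-with-resources, dispose() in `finally`, and a cleanup helper that never masks the original exception.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class DisposeDemo {
    // Hypothetical stand-in for SXSSFWorkbook (not POI): close() releases the
    // stream, dispose() deletes the temp file.
    static final class SpillingWorkbook implements AutoCloseable {
        final Path spillFile;
        private final Writer out;

        SpillingWorkbook(Path tempDir) throws IOException {
            spillFile = Files.createTempFile(tempDir, "toy-sheet", ".xml");
            out = Files.newBufferedWriter(spillFile);
        }

        void writeRow(String row) throws IOException {
            out.write(row);
            out.write('\n');
        }

        @Override
        public void close() throws IOException {
            out.close();
        }

        boolean dispose() throws IOException {
            out.close(); // idempotent for BufferedWriter
            return Files.deleteIfExists(spillFile);
        }
    }

    // Hypothetical helper in the spirit of disposeQuietly(): log cleanup problems,
    // never throw from finally and hide the real failure.
    static void disposeQuietly(SpillingWorkbook wb) {
        try {
            if (!wb.dispose()) {
                System.err.println("temp file was already gone: " + wb.spillFile);
            }
        } catch (IOException | RuntimeException e) {
            System.err.println("failed to dispose workbook: " + e);
        }
    }

    // The fixed pattern: try (wb) closes it, then finally disposes it,
    // on the success path and the failure path alike.
    public static void exportThatFails(Path tempDir) throws IOException {
        SpillingWorkbook wb = new SpillingWorkbook(tempDir);
        try (wb) {
            wb.writeRow("a,b,c");
            throw new IOException("boom"); // simulate the output stream failing mid-export
        } finally {
            disposeQuietly(wb);
        }
    }

    public static long filesLeft(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("dispose-demo");
        try {
            exportThatFails(dir);
        } catch (IOException expected) {
            System.out.println("export failed: " + expected.getMessage()); // prints "export failed: boom"
        }
        System.out.println("files left: " + filesLeft(dir)); // prints "files left: 0"
    }
}
```

The IOException still reaches the caller, which is what you want; the only difference from the leaky version is that the temp file is gone by the time it does.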

Why disk leaks are scarier than memory leaks

A memory leak hits the pod and self-heals on restart. A disk leak hits the node and takes down every container sharing it. Same code, very different blast radius.

Two more things on top of the fix

A one-line fix isn't enough on its own, because the next person on the team will write the same try-with-resources pattern. So we added two guardrails:

A unit test that forces the failure path

It hands the export an OutputStream that throws on the first write, then asserts that no POI sheet files are left in the temp dir after the export blows up:

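@Test
void writeShouldDisposePoiTempFilesEvenIfOutputStreamFails(@TempDir Path poiTempDir) throws IOException {
    // Redirect POI's temp files (normally java.io.tmpdir/poifiles) into a JUnit-managed dir
    TempFile.setTempFileCreationStrategy(new DefaultTempFileCreationStrategy(poiTempDir.toFile()));
    try {
        var records = List.of(/* ... */);

        // An OutputStream that fails on the very first byte, forcing the error path
        try (OutputStream failing = new OutputStream() {
            @Override public void write(int b) throws IOException { throw new IOException("boom"); }
        }) {
            assertThatThrownBy(() -> xlsxStreamingFileService.write(records, failing, MySchemaDto.class))
                .isInstanceOf(ServiceException.class);
        }

        assertNoPoiSheetFilesLeft(poiTempDir);
    } finally {
        // The strategy is global static state: restore the default so other tests are unaffected
        TempFile.setTempFileCreationStrategy(new DefaultTempFileCreationStrategy());
    }
}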
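The test relies on an `assertNoPoiSheetFilesLeft` helper that the post doesn't show. A minimal stdlib-only sketch might look like this, assuming SXSSF names its per-sheet temp files with a `poi-sxssf-sheet` prefix and writes them directly into the configured temp dir; check both assumptions against the POI version you run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public final class PoiTempFileAssertions {
    // Assumption: SXSSF per-sheet temp files start with this prefix
    // (plain .xml, or .gz when compression is enabled).
    static final String SXSSF_SHEET_PREFIX = "poi-sxssf-sheet";

    private PoiTempFileAssertions() {}

    // Hypothetical helper matching the test's assertNoPoiSheetFilesLeft(...) call.
    // Looks only at files directly inside dir and ignores anything unrelated to SXSSF.
    public static void assertNoPoiSheetFilesLeft(Path dir) throws IOException {
        List<String> leftovers;
        try (Stream<Path> files = Files.list(dir)) {
            leftovers = files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.startsWith(SXSSF_SHEET_PREFIX))
                .sorted()
                .collect(Collectors.toList());
        }
        if (!leftovers.isEmpty()) {
            throw new AssertionError("POI temp files were not disposed: " + leftovers);
        }
    }
}
```

Matching on the prefix rather than asserting the directory is completely empty keeps the test from failing on unrelated files another library might drop into the same dir.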

An ArchUnit rule that pins SXSSFWorkbook to one package

So the next person can't accidentally use it raw somewhere else in the codebase:

@ArchTest
static final ArchRule SXSSF_WORKBOOK_USAGE_LIMITED_TO_FILE_PACKAGE =
    noClasses()
        .that().resideOutsideOfPackage("..file..")
        .should().dependOnClassesThat()
        .haveFullyQualifiedName("org.apache.poi.xssf.streaming.SXSSFWorkbook")
        .because("SXSSFWorkbook leaks tmp files without an explicit dispose(); use only through the file package");

The combo is what I actually trust: the test proves the current code cleans up even when the export fails, and the ArchUnit rule keeps it that way as the codebase grows.

"Just upgrade POI, the newer version handles this"

Maybe. But you don't always know you have the bug in the first place, and bumping a heavy library in a live system is rarely a one-line PR. Defensive code, a test, and an architectural rule cost an hour and protect you regardless of which POI version ships next year.