🛠️ JVM Crash During TestNG Suite Execution: Root Cause & Fix
Running large-scale UI automation suites can be tricky, especially when TestNG and Maven Surefire are involved. Recently, we hit a JVM crash during execution that took down an entire test suite. After a deep investigation with heap dumps, GC logs, and TestNG internals, here's the full Root Cause Analysis (RCA) and how we solved it.
⚡ The Error
[ERROR] org.apache.maven.plugins:maven-surefire-plugin:2.19.1:test failed:
The forked VM terminated without properly saying goodbye.
VM crash or System.exit called?
At first glance, this looks like a random JVM crash. But the heap dump revealed ~6GB of memory retained by org.testng.SuiteRunner, holding thousands of TestRunner instances, each keeping entire test classes, WebDrivers, and PageObjects alive.
What is a Forked JVM?
Maven Surefire runs tests in a forked JVM: a separate Java process.
Why?
- Isolates tests from the main build
- Allows custom JVM args (-Xmx, GC options, heap dump on OOM, etc.)
- Enables parallel test execution
Flow:
- Maven spawns a new JVM
- Tests run inside this forked process
- JVM args are applied via <argLine> in pom.xml
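To confirm that the <argLine> settings actually reached the forked process, one option is to print the JVM's own view of its arguments from inside a test run. The sketch below uses only the standard java.lang.management API; the class name ForkedJvmInfo is a hypothetical helper, not part of Surefire or TestNG.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Hypothetical helper: reports the arguments and heap ceiling of the JVM it runs in. */
final class ForkedJvmInfo {

    /** Arguments passed to this JVM at startup (for a Surefire fork, this should include the argLine values). */
    static List<String> jvmArgs() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx when it is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JVM args: " + jvmArgs());
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Calling the two static methods from a setup hook and logging the result is usually enough to spot a fork that is running with a default heap size instead of the intended one.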
TestNG's SuiteRunner Explained
org.testng.SuiteRunner is the heart of TestNG suite execution.
Responsibilities:
- Parse testng.xml
- Manage <test> blocks via TestRunner
- Track all test classes & methods executed
- Aggregate results (pass/fail/skip)
- Feed data to reporters/listeners
Structure:
SuiteRunner
└── List<TestRunner>
    └── Test Class Instance
        ├── WebDriver
        ├── Page Objects
        ├── Test Data
        └── Utilities
💾 Why Memory Leaks Happen
- SuiteRunner → keeps strong refs to all TestRunners
- TestRunner → holds ITestContext, ITestResult, and the test class instance
- Test Class → holds WebDriver, Page Objects, Data Models
Until the suite ends, nothing in this chain is eligible for garbage collection.
Result:
- 17,553 TestRunner objects alive
- Selenium WebDriver objects + DOM snapshots consume huge memory
- GC can't reclaim → JVM crashes
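The retention chain described above can be reproduced in plain Java. The following is a simplified model, not TestNG's real classes: FakeSuiteRunner, FakeTestRunner, and FakeTestInstance are hypothetical stand-ins, and the 64 KiB payload is an arbitrary placeholder for driver and page-object state. It shows that as long as the suite object is reachable, every test instance it has ever run stays reachable too.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of the retention chain; not TestNG's real classes. */
final class RetentionChainDemo {

    /** Stands in for a test class instance with heavy fields (driver, page objects, data). */
    static final class FakeTestInstance {
        final byte[] heavyState = new byte[64 * 1024];
    }

    /** Stands in for TestRunner: keeps a strong reference to its test instance. */
    static final class FakeTestRunner {
        final FakeTestInstance instance = new FakeTestInstance();
    }

    /** Stands in for SuiteRunner: keeps every runner for as long as the suite itself is reachable. */
    static final class FakeSuiteRunner {
        final List<FakeTestRunner> runners = new ArrayList<>();

        FakeTestRunner runTest() {
            FakeTestRunner runner = new FakeTestRunner();
            runners.add(runner);
            return runner;
        }

        long retainedBytes() {
            long total = 0;
            for (FakeTestRunner runner : runners) {
                total += runner.instance.heavyState.length;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        FakeSuiteRunner suite = new FakeSuiteRunner();
        WeakReference<FakeTestInstance> firstInstance =
                new WeakReference<>(suite.runTest().instance);
        for (int i = 1; i < 1_000; i++) {
            suite.runTest();
        }

        System.gc(); // a hint only; either way, the chain keeps the first instance strongly reachable
        System.out.println("first instance still reachable: " + (firstInstance.get() != null));
        System.out.println("bytes pinned by the suite: " + suite.retainedBytes());
    }
}
```

The weak reference never clears while the suite is in use, which mirrors what the heap dump showed: the collector is working correctly, there is simply nothing it is allowed to reclaim.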
RCA Summary
Factor | Detail |
---|---|
Error | Forked VM crash (Surefire goodbye error) |
Cause | JVM ran out of memory due to retained references in SuiteRunner |
Trigger | Large number of tests in a single suite |
Leak Source | Strong references: SuiteRunner → TestRunner → Test Class |
GC Impact | Objects never eligible for GC until JVM exits |
Result | Heap bloat, OutOfMemoryError, JVM crash |
🛠️ Fixes & Mitigation
1. Move Heavy Fields to Method Scope
Instead of keeping page objects at class level:
// ❌ Problematic: the page object lives as long as the test class instance
ExternalJobPage externalJobPage;

@BeforeMethod
public void setup() {
    externalJobPage = new ExternalJobPage(getDriver());
}
Use method-level objects:
// ✅ GC-friendly: the page object becomes unreachable when the method returns
@Test
public void testSomething() {
    ExternalJobPage page = new ExternalJobPage(getDriver());
    page.verifyJobDetails();
}
2. Nullify References in Cleanup Hooks
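@AfterMethod
public void clean() {
    // Drop the references so the objects are no longer reachable from the test instance
    driver = null;
    pageObject = null;
    System.gc(); // Only a hint; the JVM may ignore it
}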
3. Aggressive Field Nullification (Final Solution)
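// Requires: import java.lang.reflect.Field; import java.lang.reflect.Modifier;
public void tearDown() {
    try {
        // Only fields declared on the concrete test class are visited, not inherited ones
        Field[] fields = this.getClass().getDeclaredFields();
        for (Field field : fields) {
            // Skip AspectJ-generated fields and primitives (which cannot hold null)
            if (field.getName().startsWith("ajc$") || field.getType().isPrimitive()) {
                continue;
            }
            field.setAccessible(true);
            // Leave static fields alone; they are shared across instances
            if (!Modifier.isStatic(field.getModifiers())) {
                field.set(this, null);
            }
        }
        log.info("Cleaned up instance for class {}", this.getClass().getName());
    } catch (Exception e) {
        log.error("Failed to tear down: {}", e.getMessage());
    }
    System.gc(); // Only a hint; the JVM may ignore it
}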
And ensure cleanup of test data:
@AfterTest
public void clearTestData() {
    try {
        if (TestDataContext.globalTestDataMapSize() > 1) {
            TestDataContext.clearData(testCasePath);
        }
        tearDown();
    } catch (Exception e) {
        log.error("Exception while clearing test data: {}", e.getMessage());
    }
}
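The reflective tearDown() above only visits fields declared on the concrete class, so anything held by a base test class survives. A standalone variation that also walks the superclass chain might look like the sketch below. It is an illustration under assumptions, not the code we shipped: FieldNullifier, BaseTest, and JobPageTest are hypothetical names, final fields are skipped by choice, and the AspectJ name check is replaced by a generic isSynthetic() test (which may not cover every woven field).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

/** Sketch: null out non-static, non-primitive fields on an object and its superclasses. */
final class FieldNullifier {

    /** Returns the number of fields that held a value and were set to null. */
    static int nullifyInstanceFields(Object target) {
        int cleared = 0;
        for (Class<?> type = target.getClass();
                type != null && type != Object.class;
                type = type.getSuperclass()) {
            for (Field field : type.getDeclaredFields()) {
                int mods = field.getModifiers();
                // Skip statics, finals, primitives (cannot hold null), and compiler-generated fields.
                if (Modifier.isStatic(mods) || Modifier.isFinal(mods)
                        || field.getType().isPrimitive() || field.isSynthetic()) {
                    continue;
                }
                try {
                    field.setAccessible(true);
                    if (field.get(target) != null) {
                        field.set(target, null);
                        cleared++;
                    }
                } catch (ReflectiveOperationException | RuntimeException e) {
                    // One inaccessible field should not stop the rest from being cleared.
                    System.err.println("Could not clear " + field.getName() + ": " + e);
                }
            }
        }
        return cleared;
    }

    /** Hypothetical base test class, for illustration only. */
    static class BaseTest {
        protected Object driver = new Object();
        static Object sharedConfig = new Object();
    }

    /** Hypothetical concrete test class, for illustration only. */
    static class JobPageTest extends BaseTest {
        private String pageObject = "page";
        private int retries = 3;
    }

    public static void main(String[] args) {
        JobPageTest test = new JobPageTest();
        int cleared = nullifyInstanceFields(test);
        System.out.println("cleared=" + cleared + ", driver=" + test.driver
                + ", pageObject=" + test.pageObject + ", retries=" + test.retries);
    }
}
```

Handling each field in its own try block means a single failure no longer aborts the rest of the cleanup, which is the main behavioral difference from catching one exception around the whole loop.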
4. Split Large Suites
- Don't run thousands of tests in one suite
- Break into smaller testng.xml files
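Splitting by hand gets tedious with thousands of classes, so the smaller suite files can be generated. The sketch below partitions a list of test class names into fixed-size chunks and renders a minimal testng.xml for each. SuiteSplitter, the suite names, and the com.example class names are all hypothetical, and the output is not XML-escaped because Java class names do not need it.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: partition test classes into chunks and render one testng.xml per chunk. */
final class SuiteSplitter {

    /** Splits the list into consecutive chunks of at most chunkSize elements. */
    static List<List<String>> partition(List<String> classNames, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int start = 0; start < classNames.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, classNames.size());
            chunks.add(new ArrayList<>(classNames.subList(start, end)));
        }
        return chunks;
    }

    /** Renders a minimal suite file with a single <test> block listing the given classes. */
    static String renderSuiteXml(String suiteName, List<String> classNames) {
        StringBuilder xml = new StringBuilder();
        xml.append("<!DOCTYPE suite SYSTEM \"https://testng.org/testng-1.0.dtd\">\n");
        xml.append("<suite name=\"").append(suiteName).append("\">\n");
        xml.append("  <test name=\"").append(suiteName).append("-tests\">\n");
        xml.append("    <classes>\n");
        for (String name : classNames) {
            xml.append("      <class name=\"").append(name).append("\"/>\n");
        }
        xml.append("    </classes>\n  </test>\n</suite>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Hypothetical class names; a real framework would discover these.
        List<String> all = List.of(
                "com.example.LoginTest", "com.example.JobTest", "com.example.SearchTest");
        List<List<String>> chunks = partition(all, 2);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println(renderSuiteXml("suite-part-" + (i + 1), chunks.get(i)));
        }
    }
}
```

Generating the files is only half the fix: the memory benefit comes from running each file in its own JVM, so whether that happens depends on how the build invokes them.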
5. Upgrade Tooling
- Use Maven Surefire 3.1.2+ (better fork handling)
- Use TestNG 7.x+ (memory fixes included)
6. Explore Dependency Injection (POC Needed)
Using DI (like Guice or Spring) ensures controlled lifecycles for test objects.
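Before committing to a container, the underlying idea (objects that live exactly as long as one test) can be tried in plain Java. The sketch below is neither Guice nor Spring; TestScope is a hypothetical stand-in that hands out one instance per type and drops every reference when the scope closes. StringBuilder is used as a placeholder for a page object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-test scope: objects are created on demand and dropped when the scope closes. */
final class TestScope implements AutoCloseable {

    private final Map<Class<?>, Object> instances = new LinkedHashMap<>();

    /** Returns the scoped instance of the given type, creating it on first request. */
    <T> T get(Class<T> type, Supplier<T> factory) {
        return type.cast(instances.computeIfAbsent(type, key -> factory.get()));
    }

    /** Number of objects currently held by this scope. */
    int size() {
        return instances.size();
    }

    /** Closes any AutoCloseable instances, then drops every reference held by the scope. */
    @Override
    public void close() {
        for (Object instance : instances.values()) {
            if (instance instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) instance).close();
                } catch (Exception e) {
                    System.err.println("Failed to close " + instance.getClass().getName() + ": " + e);
                }
            }
        }
        instances.clear();
    }

    public static void main(String[] args) {
        try (TestScope scope = new TestScope()) {
            // StringBuilder is a placeholder for a page object.
            StringBuilder page = scope.get(StringBuilder.class, StringBuilder::new);
            page.append("verified");
            System.out.println("in scope: " + scope.size() + " object(s)");
        } // the scope releases its references here
    }
}
```

A DI framework would add wiring and configuration on top of this, but the lifecycle guarantee it needs to provide for our use case is the same: nothing created for a test outlives that test.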
Key Takeaways
- SuiteRunner holds everything until the suite ends, so design your framework to release memory early.
- Avoid class-level heavy fields; use method scope.
- Nullify aggressively in @AfterMethod / @AfterTest.
- Split test suites; don't overload a single JVM.
- Upgrade Surefire + TestNG; newer versions manage memory better.
With these changes, our suite stopped crashing and memory usage dropped drastically.
💡 If you're running large-scale TestNG suites with Selenium, check your heap dump once in a while. You might be surprised how much SuiteRunner is holding on to.