Apache SeaTunnel

Posted on Apr 3

Rethinking ClassLoader Governance in Apache SeaTunnel


Recently, while digging into the Apache SeaTunnel Zeta Engine codebase, I followed the ClassLoader thread and did a fairly systematic review.

Overall, the current design already has a clear foundational structure. In particular, centralizing loader management in ClassLoaderService is an approach that is quite rare among similar systems 👍.

Here, I take a different perspective, starting from “ClassLoader governance in long-running runtimes”, to summarize some observations and outline a possible evolution path. Some of these points may not be entirely accurate; they are intended to spark discussion.

From “Usable” to “Governable”

Apache SeaTunnel already handles multi-connector coexistence and dynamic loading and execution well. From a “functional availability” perspective, the mechanism works. If we go one step further and ask whether ClassLoaders have a controllable lifecycle and verifiable reclamation, the evaluation criteria begin to change.

Observations (Runtime-Oriented)

1. The Semantic Gap Between “Release” and “Close”

Currently, releaseClassLoader() removes the cache entry and performs some thread-level cleanup once the reference count drops to zero. It does not explicitly call URLClassLoader.close(). I observed no close() call in DefaultClassLoaderService.releaseClassLoader(), and DefaultClassLoaderService.close() mainly clears internal cache structures. As a result, the release of JAR file handles depends on GC timing. In long-running scenarios, or on certain platforms such as Windows (where a JAR that is still open cannot be deleted or replaced), files may not be released promptly. 👉 This is closer to “logical release” than to “end of resource lifecycle”.
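
Here is a minimal sketch of what closing the lifecycle could look like. It assumes a reference-counted cache keyed by a plugin identifier. RefCountedLoaders, acquire(), release(), and liveEntries() are hypothetical names used for illustration, not SeaTunnel's DefaultClassLoaderService API. The point is only that the last release() ends in URLClassLoader.close(), not in a cache eviction alone.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not SeaTunnel's DefaultClassLoaderService:
// the last release() both evicts the cache entry and closes the loader.
final class RefCountedLoaders {
    private static final class Entry {
        final URLClassLoader loader;
        int refs;

        Entry(URLClassLoader loader) {
            this.loader = loader;
        }
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final ClassLoader parent;

    RefCountedLoaders(ClassLoader parent) {
        this.parent = parent;
    }

    /** Returns the shared loader for this key, creating it on first use. */
    synchronized URLClassLoader acquire(String key, URL[] jars) {
        Entry e = cache.computeIfAbsent(key, k -> new Entry(new URLClassLoader(jars, parent)));
        e.refs++;
        return e.loader;
    }

    /** Drops one reference; the last holder closes the loader. */
    synchronized void release(String key) {
        Entry e = cache.get(key);
        if (e == null || --e.refs > 0) {
            return; // unknown key, or other holders still use the loader
        }
        cache.remove(key);
        try {
            // Beyond "logical release": close() also releases the open JAR file handles.
            e.loader.close();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Number of loaders currently cached (and therefore still open). */
    synchronized int liveEntries() {
        return cache.size();
    }
}
```

A closed loader is evicted at the same time, so a later acquire() for the same key builds a fresh loader and never hands back a closed one. Closing does not unload classes by itself. That still requires the loader to become unreachable, which is what the later phases address.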

2. Class Loading Boundaries Can Still Change at Runtime

Some paths still inject dependencies into an existing ClassLoader via addURL. Examples are the reflective addURL calls in AbstractPluginDiscovery and the injection of plugin dependencies into the current loader in the Flink execution paths. This leads to an interesting phenomenon: class loading boundaries are defined not only by the loader structure but also by runtime behavior. That is not a problem for a single job. When jobs run repeatedly in the same process, or plugin combinations change, the boundaries may accumulate “historical residue”, because URLs added for an earlier job remain visible to later ones.
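
To illustrate the accumulation effect, here is a small sketch that assumes a long-lived loader widened once per job. GrowableLoader is a hypothetical stand-in. It exposes the protected URLClassLoader.addURL through a subclass instead of a reflective call. The JAR paths used with it are placeholders and do not need to exist.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

// Illustration only: a long-lived loader whose search path grows at runtime,
// similar in effect to calling URLClassLoader.addURL reflectively.
final class GrowableLoader extends URLClassLoader {
    GrowableLoader(ClassLoader parent) {
        super(new URL[0], parent);
    }

    // URLClassLoader.addURL is protected; a subclass can expose it without reflection.
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }

    // Converts a file path to a URL; the file does not have to exist yet.
    static URL jar(String path) {
        try {
            return Paths.get(path).toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(path, e);
        }
    }
}
```

Suppose you add one JAR for job A and one for job B. getURLs() then lists both, because URLClassLoader offers no way to remove an entry. Job B can still resolve job A's classes. If both JARs contain the same class name, the JAR that was added first wins.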

3. Some Residual Surfaces Are Not Fully Closed

The codebase uses the thread context ClassLoader (TCCL) in several patterns: synchronous, asynchronous, and cross-thread. Some paths do not restore the TCCL in a finally block, or restore it against an inconsistent baseline when crossing threads. Examples include TCCL usage in the cooperative workers within TaskExecutionService and asymmetric restoration in some operations (such as source / restore). In addition, some typical ClassLoader retention points are not yet uniformly governed. These include JDBC Driver registration (e.g., in the TDengine-related implementations) and connectors that set the TCCL directly without restoring it.
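
For the cross-thread case, one way to keep the baseline consistent is to bind a task to an explicit loader. The thread that actually runs the task then records and restores its own previous TCCL. This is a minimal sketch under that assumption. TcclTasks.bind() is a hypothetical helper, not an existing SeaTunnel utility.

```java
import java.util.Objects;

// Hypothetical helper: binds a task to an explicit ClassLoader and restores
// the executing thread's previous TCCL, whichever thread ends up running it.
final class TcclTasks {
    private TcclTasks() {}

    static Runnable bind(ClassLoader loader, Runnable task) {
        Objects.requireNonNull(loader, "loader");
        Objects.requireNonNull(task, "task");
        return () -> {
            Thread current = Thread.currentThread();
            // The baseline is read on the running thread, not on the submitting one.
            ClassLoader previous = current.getContextClassLoader();
            current.setContextClassLoader(loader);
            try {
                task.run();
            } finally {
                // Restored even if the task throws.
                current.setContextClassLoader(previous);
            }
        };
    }
}
```

The wrapper reads the baseline itself instead of capturing it on the submitting thread. As a result, a pooled worker always returns to the loader it had before the task, even when the task fails.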

A Possible Evolution Path (For Reference)

Based on these observations, I’ve outlined a progressive governance path that avoids large-scale refactoring and can be implemented in phases.

Phase 1: Close the ClassLoader Lifecycle

Key ideas: explicitly call close() on the URLClassLoaders that SeaTunnel creates once they are no longer needed (for example, when their reference count drops to zero), and define clear ownership: “who creates, who closes”. This shifts from “GC-dependent release” to “controlled release”.
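
Here is a minimal sketch of the “who creates, who closes” rule. It assumes one owner object per scope, for example per job or per service instance. LoaderOwner and its methods are hypothetical. The design choice it illustrates is that close() closes every loader the owner created, not just its cache structures. It never touches a parent loader it was merely given.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the owner closes exactly the loaders it created,
// never a loader it was only handed (such as the application loader).
final class LoaderOwner implements AutoCloseable {
    private final Deque<URLClassLoader> created = new ArrayDeque<>();
    private boolean closed;

    synchronized URLClassLoader create(URL[] jars, ClassLoader parent) {
        if (closed) {
            throw new IllegalStateException("owner already closed");
        }
        URLClassLoader loader = new URLClassLoader(jars, parent);
        created.push(loader); // most recent first, so close() runs in reverse order
        return loader;
    }

    synchronized int ownedCount() {
        return created.size();
    }

    @Override
    public synchronized void close() throws IOException {
        closed = true;
        IOException failure = null;
        // Keep closing the rest even if one close() fails.
        while (!created.isEmpty()) {
            try {
                created.pop().close();
            } catch (IOException e) {
                if (failure == null) {
                    failure = e;
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

Closing in reverse creation order mirrors try-with-resources semantics. Collecting failures as suppressed exceptions means one bad close() does not leave the remaining loaders open.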

Phase 2: Stabilize Loading Boundaries

Goals: avoid runtime addURL where possible, and determine the full classpath before the loader is created. A given loader then resolves the same set of classes for its whole lifetime.
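
Here is a sketch of resolving the classpath up front. It assumes the plugin JARs and their dependency JARs are known as files before any class is loaded. PluginClasspath is a hypothetical helper. It fixes the search order once (plugin JARs first), drops duplicates, and creates the loader in a single step, so nothing needs addURL later.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: compute the complete, ordered classpath first,
// then create the loader once. Nothing calls addURL afterwards.
final class PluginClasspath {
    private PluginClasspath() {}

    static URL[] resolve(List<File> pluginJars, List<File> dependencyJars) {
        // Plugin JARs first, so they take precedence over dependencies.
        List<File> ordered = new ArrayList<>(pluginJars);
        ordered.addAll(dependencyJars);
        Set<URI> seen = new HashSet<>();
        List<URL> urls = new ArrayList<>();
        for (File jar : ordered) {
            URI uri = jar.getAbsoluteFile().toURI();
            if (!seen.add(uri)) {
                continue; // duplicate entry: its first position wins
            }
            try {
                urls.add(uri.toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad classpath entry: " + jar, e);
            }
        }
        return urls.toArray(new URL[0]);
    }

    static URLClassLoader create(List<File> pluginJars, List<File> dependencyJars, ClassLoader parent) {
        return new URLClassLoader(resolve(pluginJars, dependencyJars), parent);
    }
}
```

Duplicates are detected by comparing URI values rather than URL values, because URL.equals() and URL.hashCode() may perform host name resolution.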

Phase 3: Consolidate Common Residual Points

Standardize patterns such as wrapping TCCL switches in try-with-resources, pairing every JDBC Driver registration with a deregistration, and clearly assigning ClassLoader ownership to threads and ThreadLocal values. This turns implicit references into manageable resources.
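
Here are two of these patterns as a minimal sketch, assuming the helpers are loaded by the same ClassLoader as the code that uses them. TcclScope and DriverRegistration are hypothetical names. One caveat: DriverManager only lets a caller deregister a driver whose class is visible from the caller's own class loader. A deregistration helper living in the engine's loader may therefore be unable to remove a driver that a plugin loader registered. Pairing registration and deregistration on the plugin side avoids that.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: switches the TCCL for one scope and always restores it.
// Intended to be opened and closed on the same thread.
final class TcclScope implements AutoCloseable {
    private final Thread thread;
    private final ClassLoader previous;

    private TcclScope(ClassLoader next) {
        this.thread = Thread.currentThread();
        this.previous = thread.getContextClassLoader();
        thread.setContextClassLoader(next);
    }

    static TcclScope enter(ClassLoader next) {
        return new TcclScope(next);
    }

    // Declares no checked exception, so try-with-resources needs no extra catch.
    @Override
    public void close() {
        thread.setContextClassLoader(previous);
    }
}

// Hypothetical helper: whoever registers a driver keeps the instance and deregisters it.
final class DriverRegistration implements AutoCloseable {
    private final Driver driver;

    private DriverRegistration(Driver driver) {
        this.driver = driver;
    }

    static DriverRegistration register(Driver driver) {
        try {
            DriverManager.registerDriver(driver);
        } catch (SQLException e) {
            throw new IllegalStateException("Driver registration failed", e);
        }
        return new DriverRegistration(driver);
    }

    @Override
    public void close() {
        try {
            // DriverManager checks that the caller's loader can see the driver class.
            DriverManager.deregisterDriver(driver);
        } catch (SQLException e) {
            throw new IllegalStateException("Driver deregistration failed", e);
        }
    }
}
```

In practice, the TCCL switch becomes a try-with-resources block around the plugin call. The connector that registered a driver closes its DriverRegistration when it shuts down, so the driver no longer pins the plugin loader.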

Phase 4: Introduce Verifiable Reclamation

As an enhancement, track loaders with WeakReference + ReferenceQueue, or expose simple runtime metrics (e.g., the number of live loaders). The goal is not absolute precision but the ability to reasonably judge whether resources have actually been released.
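
Here is a minimal sketch of weak tracking. It assumes each loader is registered with a human-readable label when it is created. LoaderTracker, track(), liveLabels(), and liveCount() are hypothetical names. The ReferenceQueue purges entries the GC has already cleared. liveCount() checks each reference directly, so it can serve as a “live loaders” gauge.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not an existing SeaTunnel class: tracks loaders weakly
// so that "was this loader actually reclaimed?" becomes observable.
final class LoaderTracker {
    private static final class Tracked extends WeakReference<ClassLoader> {
        final String label;

        Tracked(ClassLoader loader, String label, ReferenceQueue<ClassLoader> queue) {
            super(loader, queue);
            this.label = label;
        }
    }

    private final ReferenceQueue<ClassLoader> queue = new ReferenceQueue<>();
    // The Reference objects themselves must stay strongly reachable,
    // otherwise they may be collected before the GC enqueues them.
    private final Set<Tracked> tracked = ConcurrentHashMap.newKeySet();

    void track(ClassLoader loader, String label) {
        tracked.add(new Tracked(loader, label, queue));
    }

    /** Labels of tracked loaders that have not been reclaimed yet. */
    List<String> liveLabels() {
        drainCollected();
        List<String> labels = new ArrayList<>();
        for (Tracked t : tracked) {
            if (t.get() != null) {
                labels.add(t.label);
            }
        }
        return labels;
    }

    /** Simple metric, e.g. to export as a "live loaders" gauge. */
    int liveCount() {
        return liveLabels().size();
    }

    private void drainCollected() {
        Reference<? extends ClassLoader> ref;
        while ((ref = queue.poll()) != null) {
            tracked.remove(ref); // Reference uses identity equality
        }
    }
}
```

GC timing is not deterministic, and System.gc() is only a hint. A loader still listed right after release is therefore not proof of a leak. One that stays listed across several collections after its owner has closed it is a good candidate for a heap-dump investigation.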

Why This Matters

These issues rarely surface in short-lived tasks. But in scenarios such as long-running engine nodes, repeated task scheduling, or frequent plugin switching, these boundary issues accumulate over time. The results may include Metaspace growth, inability to replace JARs, and occasional class conflicts.

One-Sentence Summary

From “class isolation” to “governable ClassLoaders with verifiable reclamation.”

The above reflects my current understanding and organization of the topic. Some points may not be entirely accurate, so feedback and real-world scenarios are very welcome 🙌. If the community is interested, this could evolve into a more general and reusable infrastructure capability.

Appendix: Code References

Some code locations noted during analysis (not exhaustive): DefaultClassLoaderService (release/close), AbstractPluginDiscovery (addURL), Flink starter execution paths (plugin injection), TaskExecutionService (TCCL usage), various operations (source/restore), and connectors (Iceberg / Paimon / TDengine, etc.).

DEV Community