Introduction: It’s Not Reflection—It’s Bytecode
I’ve been developing in Java for several years—from small utilities to high-load services in distributed systems. In such systems, even a few seconds of downtime are unacceptable: we’re talking about SLAs of 99.99% and higher. To guarantee this level of reliability, I’ve had to do more than just write code—I’ve needed to deeply understand how it actually works inside the JVM.
This led me to load testing, memory dump analysis, and profiling using agents like Async-Profiler and OpenTelemetry—and ultimately to the question: “How does Spring inject transactions without my involvement? Why can Mockito mock a class with no constructor? And how does Lombok ‘add’ methods that don’t exist in the source code?”
I used to think it was magic too. But after diving into the architecture of the Java Virtual Machine (JVM), I realized everything is far more logical than it appears. In this article, we’ll walk step by step through how the JVM loads classes, verifies their correctness, and how you can modify their behavior at runtime.
We’ll explore the full lifecycle of a class—from loading into the JVM to bytecode manipulation. We’ll understand how Class Loading works, why verification is necessary, and how tools like ASM, Byte Buddy, and Javassist enable us to move from passive code consumption to active transformation. We’ll also examine how Lombok, Mockito, and Spring AOP leverage these mechanisms—without any magic at all.
1. Class Loading: Delegation, Isolation, Customization
The process of loading a class into the JVM begins long before the first bytecode instruction is executed. This process, known as Class Loading, is one of the earliest stages in a class’s lifecycle. The key component of this system is java.lang.ClassLoader, an abstract class responsible for locating and loading the binary representation of a class (in .class format) and converting it into an instance of java.lang.Class.
The central principle of the ClassLoader architecture is the delegation model. When a ClassLoader instance receives a request to load a class by name, it first delegates the request to its parent ClassLoader. Only if the parent cannot locate the class does the current ClassLoader attempt to load it itself. This hierarchy ensures security and prevents substitution of core system classes.
The traditional hierarchy consists of three levels:
- Bootstrap ClassLoader (Primordial ClassLoader): sits at the top of the hierarchy and is part of the JVM itself. Implemented in native code (C/C++), it loads the most fundamental Java platform classes, those found in rt.jar (or, starting with Java 9, in platform modules), such as java.lang.Object, java.lang.String, and others. It cannot be directly accessed from user code.
- Platform ClassLoader (formerly Extension ClassLoader): introduced in Java 9, it loads classes from platform modules. Prior to Java 9, this role belonged to the Extension ClassLoader, which loaded JAR files from the extension directory (jre/lib/ext).
- System ClassLoader (Application ClassLoader): this is the default ClassLoader for applications. It loads classes from paths specified in the CLASSPATH environment variable, the -classpath command-line argument, or the default directory. This is the ClassLoader used to load the application’s main class specified in the java command.
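The hierarchy above can be inspected from plain Java. The following is a minimal sketch; the class name LoaderChain and its describe method are illustrative names, not part of any API. It walks the parent links starting at the system ClassLoader. The concrete loader class names it prints are JDK-internal and differ between Java versions, so no output is shown.

```java
import java.util.ArrayList;
import java.util.List;

final class LoaderChain {
    /** Walks parent links from the given loader up to the bootstrap loader. */
    static List<String> describe(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        // The bootstrap loader has no Java object; getParent() reports it as null.
        chain.add("bootstrap (null)");
        return chain;
    }

    public static void main(String[] args) {
        describe(ClassLoader.getSystemClassLoader()).forEach(System.out::println);
        // Core classes are defined by the bootstrap loader, so this prints "null".
        System.out.println(String.class.getClassLoader());
    }
}
```

The null returned for String.class.getClassLoader() is the visible trace of the bootstrap loader living inside the JVM rather than in Java code.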
Developers can extend this model by creating custom ClassLoader implementations. This is typically done by subclassing ClassLoader and overriding one of its methods. The most common approach is to override the findClass(String name) method. This preserves the standard delegation model (implemented in loadClass(String name)) while allowing customization of how the class’s byte representation is located and retrieved.
The loadClass(String name) method serves as the entry point for class loading. It implements the delegation algorithm: it first checks whether the class has already been loaded (caching), then delegates to the parent, and calls findClass(name) only if the class is still not found. The defineClass(byte[] b, int off, int len) method is a protected utility that actually converts a byte array into a Class instance. It must be invoked from within findClass or loadClass after the bytes have been obtained. Importantly, defineClass parses the class file and rejects malformed input with a ClassFormatError, but it does not run the bytecode verifier; that responsibility falls to the subsequent linking phase.
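To make the findClass/defineClass split concrete, here is a minimal sketch of a loader that serves classes from an in-memory map. Everything named here (InMemoryClassLoader, emptyClass, the class name sketch.Generated) is hypothetical and chosen for illustration. To stay self-contained, the sketch hand-assembles the smallest class file it can (a public class with no members) instead of reading one from disk.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;

/** Serves classes from an in-memory map; delegation stays in the inherited loadClass. */
final class InMemoryClassLoader extends ClassLoader {
    private final Map<String, byte[]> classBytes;

    InMemoryClassLoader(Map<String, byte[]> classBytes, ClassLoader parent) {
        super(parent);
        this.classBytes = classBytes;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Reached only after the parent chain failed to find the class.
        byte[] bytes = classBytes.get(name);
        if (bytes == null) {
            throw new ClassNotFoundException(name);
        }
        return defineClass(name, bytes, 0, bytes.length);
    }

    /** Hand-assembles a minimal class file: a public class with no fields or methods. */
    static byte[] emptyClass(String internalName) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);                            // magic
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(5);                                   // constant_pool_count = entries + 1
            out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);        // #2 Utf8: this class
            out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8: superclass
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class
            out.writeShort(3);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(0);                                   // methods_count
            out.writeShort(0);                                   // attributes_count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Map<String, byte[]> source =
                Collections.singletonMap("sketch.Generated", emptyClass("sketch/Generated"));
        ClassLoader parent = ClassLoader.getSystemClassLoader();
        Class<?> first = new InMemoryClassLoader(source, parent).loadClass("sketch.Generated");
        Class<?> second = new InMemoryClassLoader(source, parent).loadClass("sketch.Generated");
        System.out.println(first.getName());       // sketch.Generated
        System.out.println(first == second);       // false: same bytes, two loaders, two classes
    }
}
```

Because only findClass is overridden, a request for java.lang.String through this loader still resolves via the parent chain. The second printed line shows the isolation property: the same bytes defined by two loaders yield two distinct Class objects.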
Creating custom ClassLoaders has many practical applications but also carries risks. Poor implementations can lead to memory leaks. Additionally, each new ClassLoader creates an isolated namespace for classes, which may result in ClassNotFoundException or NoClassDefFoundError if dependencies aren’t properly resolved. Thus, working with ClassLoaders demands not only technical knowledge but also a deep understanding of application architecture.
The idea of compiling and loading classes on the fly, once considered a niche technique, is now part of the platform itself. Since Java 11 (JEP 330), the java launcher can run a .java source file directly: it compiles the file in memory and loads the resulting classes through a dedicated class loader. Compact source files (JEP 512, finalized in Java 25 after several previews that began as "unnamed classes" in Java 21) go one step further by letting the compiler declare the enclosing class implicitly. This transforms what was once an advanced, specialized approach into a mainstream execution path for specific use cases.
2. Linking Phase: Verification, Preparation, Resolution
After the ClassLoader successfully loads the binary data of a class and creates an instance of java.lang.Class, the class does not immediately become executable. Before its methods can be invoked or its fields accessed, the class must undergo the linking process. This multi-stage procedure, defined in the JVM Specification (JVMS §5.4), consists of three phases: Verification, Preparation, and Resolution. These steps are critical for ensuring the stability, security, and integrity of the Java runtime environment.
Verification: The Guardian of JVM Integrity
Verification is the most complex and crucial phase of linking. Its purpose is to ensure that the class is safe and does not violate JVM invariants. It checks the structure of the .class file, type safety, operand stack balance, correctness of references, and access control.
The checks fall into several levels of analysis. In HotSpot, for example, the structural checks happen while the class file is parsed (ClassFileParser), and the bytecode checks are performed by a separate verifier during linking:
- Structural Verification: validates the format of the .class file, version numbers, constant pool size, etc.
- Bytecode Verification: analyzes the bytecode stream of each method.
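Bytecode verification was significantly optimized through stack map frames: metadata that the compiler stores in the StackMapTable attribute inside each method’s Code attribute, explicitly describing the types of local variables and the operand stack at branch targets. The attribute was introduced with Java 6 (class file version 50) and became mandatory with Java 7 (version 51). It allows the verifier to check a method in a single linear pass instead of inferring types by data-flow analysis, greatly accelerating application startup.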
Here is an example of invalid bytecode that would fail verification:
aload_0 // Push 'this' (an object reference) onto the stack
iconst_0 // Push int 0 onto the stack
invokevirtual java/io/PrintStream.println:(Ljava/lang/String;)V
This code attempts to call println(String) with an int argument, violating the method’s signature. The verifier will detect the type mismatch on the operand stack (an int where a String reference is expected) and throw a java.lang.VerifyError.
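A verification failure can be reproduced without any bytecode library. The sketch below is an illustration under stated assumptions: the names VerifyDemo, classWithRun, and defineAndRun are hypothetical, and it uses a simpler violation than the println example above (an ireturn executed on an empty operand stack). It hand-assembles a class with a single static method run() and then forces linking.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class VerifyDemo {
    /** Builds a class with one method, public static int run(), using the given bytecode. */
    static byte[] classWithRun(String internalName, byte[] code) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(0xCAFEBABE);
            out.writeShort(0);                                   // minor_version
            out.writeShort(52);                                  // major_version (Java 8)
            out.writeShort(8);                                   // constant_pool_count = 7 entries + 1
            out.writeByte(7); out.writeShort(2);                 // #1 Class -> #2
            out.writeByte(1); out.writeUTF(internalName);        // #2 this class name
            out.writeByte(7); out.writeShort(4);                 // #3 Class -> #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 superclass name
            out.writeByte(1); out.writeUTF("run");               // #5 method name
            out.writeByte(1); out.writeUTF("()I");               // #6 method descriptor
            out.writeByte(1); out.writeUTF("Code");              // #7 attribute name
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class
            out.writeShort(3);                                   // super_class
            out.writeShort(0);                                   // interfaces_count
            out.writeShort(0);                                   // fields_count
            out.writeShort(1);                                   // methods_count
            out.writeShort(0x0009);                              // ACC_PUBLIC | ACC_STATIC
            out.writeShort(5);                                   // name_index
            out.writeShort(6);                                   // descriptor_index
            out.writeShort(1);                                   // method attributes_count
            out.writeShort(7);                                   // attribute_name_index -> "Code"
            out.writeInt(12 + code.length);                      // attribute_length
            out.writeShort(1);                                   // max_stack
            out.writeShort(0);                                   // max_locals
            out.writeInt(code.length);                           // code_length
            out.write(code);
            out.writeShort(0);                                   // exception_table_length
            out.writeShort(0);                                   // Code attributes_count
            out.writeShort(0);                                   // class attributes_count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Defines the class, forces linking, and reports whether verification passed. */
    static String defineAndRun(String name, byte[] code) {
        byte[] bytes = classWithRun(name.replace('.', '/'), code);
        ClassLoader loader = new ClassLoader(VerifyDemo.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String n) throws ClassNotFoundException {
                if (!n.equals(name)) throw new ClassNotFoundException(n);
                return defineClass(n, bytes, 0, bytes.length);  // no bytecode verification yet
            }
        };
        try {
            Class<?> cls = Class.forName(name, true, loader);    // initialization forces linking
            return "returned " + cls.getMethod("run").invoke(null);
        } catch (VerifyError e) {
            return "VerifyError";
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // iconst_0 (0x03), ireturn (0xAC): pushes 0 and returns it.
        System.out.println(defineAndRun("sketch.Good", new byte[] {0x03, (byte) 0xAC}));
        // ireturn alone: nothing on the stack to return, so the verifier rejects the class.
        System.out.println(defineAndRun("sketch.Bad", new byte[] {(byte) 0xAC}));
    }
}
```

On a JVM with default settings, the first call yields "returned 0" and the second "VerifyError". Note that defineClass accepts both byte arrays; the rejection only happens once the class is linked.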
Preparation: Memory Allocation for Static Members
During the Preparation phase, the JVM allocates memory for all static fields of the class. At this stage, fields are initialized only to their default values (0 for numeric types, false for boolean, null for reference types). Initializers like static int x = 5; are executed later, in the <clinit> method (the static initializer).
This phase also includes setting up internal JVM data structures used to represent the class, such as virtual method tables (vtables) and interface method tables (itables).
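The gap between preparation and initialization is observable from ordinary Java code. In this small sketch (the class and member names are made up for the example), a static initializer reads another static field before that field’s own initializer has run, and therefore sees the default value assigned during preparation.

```java
final class PreparationDemo {
    // Static initializers run in textual order inside <clinit>.
    // peek() executes before the initializer of 'counter', so it reads the default value.
    static final int SEEN_EARLY = peek();
    static int counter = 5;

    static int peek() {
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(SEEN_EARLY); // 0
        System.out.println(counter);    // 5
    }
}
```

SEEN_EARLY holds 0 rather than 5 because the memory for counter already existed (preparation) when peek() ran, but the assignment counter = 5 had not executed yet (initialization).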
Resolution: Resolving Symbolic References
In this phase, the JVM converts symbolic references from the constant pool (class names, method names, field names) into direct references to actual runtime entities in memory. Resolution is typically performed lazily—upon the first use of a reference.
The main types of resolvable references include:
- Classes and interfaces – loaded by name;
- Fields – located and access-checked;
- Methods – resolved by name and descriptor, with checks for overriding validity and accessibility.
If any error occurs during resolution (for example, a class is missing, a method doesn’t exist, or access rules are violated), the JVM throws an appropriate error: NoClassDefFoundError, NoSuchFieldError, NoSuchMethodError, IllegalAccessError, etc.
Once linking completes successfully, the class is technically ready for use, but its static initialization (<clinit>) is deferred until the first active use of the class.
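Deferred initialization can be observed directly. The following sketch (LazyInitDemo and Holder are illustrative names) records events in a list: obtaining the Class object through a class literal does not count as an active use, whereas reading a non-constant static field does.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyInitDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static final class Holder {
        static int value = 42;                        // not a compile-time constant
        static { EVENTS.add("Holder.<clinit>"); }     // runs on first active use
    }

    static List<String> run() {
        Class<?> ref = Holder.class;                  // class literal: no initialization
        EVENTS.add("got " + ref.getSimpleName());
        int v = Holder.value;                         // first active use triggers <clinit>
        EVENTS.add("read " + v);
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [got Holder, Holder.<clinit>, read 42]
    }
}
```

The same rule is what makes the initialization-on-demand holder idiom for lazy singletons work.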
3. Bytecode and Class File Format: The Structure of .class
Understanding class loading and linking is impossible without a deep dive into the very entity they operate on: the .class file format. This binary format, strictly defined in the JVM Specification, serves as the universal carrier of compiled Java code. It is independent of the source language (Kotlin, Scala, Groovy also compile to this format) and the target platform’s architecture, enabling the famous "write once, run anywhere" paradigm. Examining its structure reveals exactly how the JVM interprets and verifies code.
ClassFile Structure
A .class file begins with the magic number 0xCAFEBABE, followed by version information, the constant pool, and class metadata:
ClassFile {
u4 magic;
u2 minor_version, major_version;
cp_info constant_pool[...];
u2 access_flags, this_class, super_class;
u2 interfaces[...];
field_info fields[...];
method_info methods[...];
attribute_info attributes[...];
}
Key components:
- minor_version, major_version: define the file format version. For example, major_version=52 corresponds to Java 8. A JVM refuses to load classes whose version is higher than the one it supports and throws UnsupportedClassVersionError.
- constant_pool: the central repository for class names, method names, strings, and type descriptors. All other structures reference it by index.
- fields, methods: described by name, descriptor, and attributes.
- attributes: optional data sections. The most important include:
  - Code: contains the method’s bytecode, max stack size, local variable count, and exception handling table.
  - LineNumberTable / LocalVariableTable: used for debugging.
  - RuntimeVisibleAnnotations: stores annotations accessible via reflection.
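The first fields of the structure can be read with nothing more than a DataInputStream, since the format is big-endian. This is a minimal sketch with hypothetical names (ClassFileHeader, read, of); it stops after constant_pool_count and does not parse the pool itself.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    final int magic;
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassFileHeader(int magic, int minor, int major, int cpCount) {
        this.magic = magic;
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = cpCount;
    }

    /** Reads magic, minor_version, major_version, and constant_pool_count. */
    static ClassFileHeader read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();
            if (magic != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            int cpCount = data.readUnsignedShort();
            return new ClassFileHeader(magic, minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Locates the .class resource of an already-loaded class and reads its header. */
    static ClassFileHeader of(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream in = type.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("class bytes not available: " + resource);
            }
            return read(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** major_version 52 is Java 8, and each feature release since has added one. */
    int javaFeatureRelease() {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        ClassFileHeader header = of(Object.class);
        System.out.printf("magic=%08X major=%d (Java %d)%n",
                header.magic, header.majorVersion, header.javaFeatureRelease());
    }
}
```

The output of main depends on the JDK in use, because java.lang.Object is compiled at the class file version of the running platform.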
Bytecode: The Language of a Stack Machine
Bytecode is a set of instructions executed by the Java Virtual Machine. The JVM is a stack-based machine, meaning most operations pass data through an operand stack rather than CPU registers.
Each instruction consists of a single byte (the opcode) and, optionally, operands. Consider this method:
public int add(int a, int b) {
return a + b;
}
Its bytecode (as shown by javap -c):
Code:
0: iload_1
1: iload_2
2: iadd
3: ireturn
- iload_1: loads the value from local variable index 1 (a) onto the operand stack. Index 0 holds this for an instance method.
- iload_2: loads the value from local variable index 2 (b) onto the stack.
- iadd: pops two int values from the stack, adds them, and pushes the result back.
- ireturn: pops the int result from the stack and returns it.
This example shows how the high-level expression a + b is translated into a sequence of stack operations. During the linking phase, the verifier must ensure the operand stack is always in a consistent state, for instance that exactly two int values are present before iadd executes.
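The stack discipline described above can be mimicked with a toy interpreter. This sketch is not how the JVM is implemented; TinyStackMachine is a made-up name, opcodes are represented as strings for readability, and only the four instructions from the add example are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    /** Interprets the opcodes used by add(int, int); locals[0] stands in for 'this'. */
    static int execute(String[] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String op : program) {
            switch (op) {
                case "iload_1":
                    stack.push(locals[1]);
                    break;
                case "iload_2":
                    stack.push(locals[2]);
                    break;
                case "iadd": {
                    int right = stack.pop();   // pop() throws if the stack is empty,
                    int left = stack.pop();    // the runtime analogue of a stack underflow
                    stack.push(left + right);
                    break;
                }
                case "ireturn":
                    return stack.pop();
                default:
                    throw new IllegalArgumentException("unsupported opcode: " + op);
            }
        }
        throw new IllegalStateException("method ended without a return instruction");
    }

    public static void main(String[] args) {
        String[] add = {"iload_1", "iload_2", "iadd", "ireturn"};
        System.out.println(execute(add, new int[] {0, 2, 3})); // 5
    }
}
```

The real verifier proves ahead of time that the underflow case can never occur, which is why the JVM does not need to check stack depth while executing each instruction.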
Understanding the .class file structure and bytecode semantics is an absolute prerequisite for any code manipulation. Tools like ASM work directly with these low-level structures, reading and modifying byte streams in strict compliance with the format specification. Any intervention at this level requires full awareness of how changes will affect verification and subsequent execution.
Language evolution continuously shapes the .class format. Nest-based access control (Java 11) introduced the NestHost and NestMembers attributes, records (Java 16) introduced the Record attribute, and sealed classes (Java 17) introduced PermittedSubclasses. Other features leave the format untouched: unnamed variables (JEP 456, finalized in Java 22), written as var _ = ..., are handled entirely by the compiler. A LocalVariableTable entry consists only of a bytecode range, a name, a descriptor, and a slot index, so there is no flag that could mark a variable as "unnamed". Tools that read or rewrite class files have to keep up with the attributes that do change, because the JVM must be able to process and verify them correctly.
4. Bytecode Manipulation: ASM, Byte Buddy, Javassist
In previous sections, we examined how the JVM consumes bytecode. Now it’s time to move to the next level—active creation and modification of this code. Bytecode manipulation is a powerful technique that allows programmatic alteration of class behavior before or during execution. This isn’t just “reflection on steroids”—it’s direct intervention into the essence of executable code. Specialized frameworks exist for this purpose, each offering its own level of abstraction and approach.
ASM: Low-Level Control
ASM is one of the most widely used and performant bytecode manipulation frameworks. It operates at the lowest possible level, providing direct access to bytecode instructions and .class file structures. ASM uses the Visitor pattern, where ClassVisitor, MethodVisitor, and other components traverse class elements, allowing developers to insert, remove, or replace instructions.
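import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;
import org.objectweb.asm.commons.AdviceAdapter;

// Typical wiring (AdviceAdapter needs expanded frames):
//   ClassReader cr = new ClassReader(originalBytes);
//   ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
//   cr.accept(new MyClassVisitor(cw), ClassReader.EXPAND_FRAMES);
//   byte[] modified = cw.toByteArray();
class MyClassVisitor extends ClassVisitor {
    MyClassVisitor(ClassVisitor next) {
        super(Opcodes.ASM9, next);
    }

    // Called once per method; wraps only 'toString'
    @Override
    public MethodVisitor visitMethod(int access, String name, String descriptor,
                                     String signature, String[] exceptions) {
        MethodVisitor mv = super.visitMethod(access, name, descriptor, signature, exceptions);
        if (mv != null && "toString".equals(name)) {
            return new AddLoggingAdviceAdapter(mv, access, name, descriptor); // Insert logging
        }
        return mv;
    }
}

// Adapter that adds code at method entry
class AddLoggingAdviceAdapter extends AdviceAdapter {
    AddLoggingAdviceAdapter(MethodVisitor mv, int access, String name, String desc) {
        super(Opcodes.ASM9, mv, access, name, desc);
    }

    @Override
    protected void onMethodEnter() {
        mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
        mv.visitLdcInsn("Entering toString()");
        mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
    }
}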
Pros of ASM: Maximum flexibility and performance, minimal overhead.
Cons: Complex API, requires deep understanding of bytecode and stack map frames. A stack management error will cause a VerifyError.
Byte Buddy: A Modern DSL over ASM
Byte Buddy positions itself as a modern alternative to CGLIB and a more convenient wrapper over ASM. It provides a high-level, type-safe DSL that lets you describe transformations almost as if writing Java, hiding the complexities of bytecode manipulation.
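// Requires the byte-buddy-agent artifact; attaches an Instrumentation instance to the current JVM
ByteBuddyAgent.install();
new ByteBuddy()
    .redefine(User.class)
    // Advice code is inlined into the existing method, so the original toString() body is kept
    .visit(Advice.to(LoggingInterceptor.class).on(named("toString")))
    .make()
    // Replaces the bytecode of the already-loaded class in place
    .load(User.class.getClassLoader(), ClassReloadingStrategy.fromInstalledAgent());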
Here, the method annotated with @Advice.OnMethodEnter in LoggingInterceptor serves as a template: Byte Buddy copies its bytecode into the beginning of the target method.
Pros of Byte Buddy: Clean, readable API; excellent support for new Java features (modules, records, sealed classes); powerful dynamic proxy capabilities.
Cons: Depends on ASM; some overhead due to abstractions.
Javassist: Source-Like Code Manipulation
Javassist (Java Programming Assistant) offers the highest level of abstraction. It allows developers to write modifications as strings of Java source code, which are then compiled into bytecode on the fly.
CtClass ctClass = ClassPool.getDefault().get("com.example.User");
CtMethod ctMethod = ctClass.getDeclaredMethod("toString");
ctMethod.insertBefore("{ System.out.println(\"Entering toString\"); }");
byte[] modifiedBytes = ctClass.toBytecode();
Pros of Javassist: Extremely easy to use, especially for quick PoCs.
Cons: Least performant, limited flexibility, may produce suboptimal bytecode. Not always compatible with the latest JVM versions.
Choosing the Right Tool
The choice depends on the task:
- ASM: When maximum performance and full control are required (e.g., APM agents).
- Byte Buddy: For most modern tasks, especially when support for new standards and code readability matter.
- Javassist: For simple tasks or when rapid implementation is needed.
All these tools generate bytecode that must pass JVM verification. Their effectiveness lies in “knowing the rules” and correctly generating stack map frames to avoid VerifyError.
5. Runtime Instrumentation: java.lang.instrument and JVMTI
So far, we’ve treated bytecode manipulation as a process that precedes or coincides with class loading. However, the most powerful and flexible capabilities emerge when changes are applied at runtime, without restarting the application. This is achieved through two primary mechanisms: the high-level java.lang.instrument API and the low-level native interface JVMTI (JVM Tool Interface).
java.lang.instrument: Dynamic Transformation
The java.lang.instrument package, introduced in Java 5, provides a standard way for agents to attach to the JVM and modify bytecode on the fly. An agent is a special JAR file attached to the JVM either at startup or dynamically during execution.
Premain Agent (-javaagent):
An agent attached at JVM startup using the -javaagent:my-agent.jar flag. Its JAR manifest must name the agent class in a Premain-Class attribute, and that class must contain a public static void premain(String agentArgs, Instrumentation inst) method. At this stage, you can register a ClassFileTransformer, which is invoked for each class as it is being loaded, before the class is defined.
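import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class MyAgent {
    // Entry point named by the Premain-Class attribute in the agent JAR's manifest
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new MyClassFileTransformer());
    }
}

class MyClassFileTransformer implements ClassFileTransformer {
    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // className uses the internal form (slashes, not dots) and may be null
        if ("com/example/Service".equals(className)) {
            return modifyWithByteBuddy(classfileBuffer);
        }
        return null; // null means "leave the class unchanged"
    }

    // Hypothetical helper: plug in ASM or Byte Buddy here
    private static byte[] modifyWithByteBuddy(byte[] original) {
        return original;
    }
}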
Agent-main (agentmain) and Dynamic Loading:
Starting with Java 6, agents can be loaded dynamically into a running JVM using com.sun.tools.attach.VirtualMachine.
VirtualMachine vm = VirtualMachine.attach("1234"); // PID of the process
vm.loadAgent("/path/to/my-agent.jar", "optional args");
vm.detach();
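For this, the agent must have a public static void agentmain(String agentArgs, Instrumentation inst) method, named by the Agent-Class attribute in its manifest. The attach mechanism is used by many profilers and monitoring systems (e.g., YourKit; Async-Profiler uses the same channel to load its native agent).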
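Key limitation of java.lang.instrument: while a class is being loaded for the first time, a transformer may change it freely, as long as the result passes verification. For an already-loaded class (redefineClasses() or retransformClasses()), only method bodies, the constant pool, and attributes may change; you cannot add, remove, or rename fields or methods, change method signatures, or change the inheritance hierarchy. Retransformation additionally requires the transformer to be registered with canRetransform set to true and the agent manifest to declare Can-Retransform-Classes: true.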
JVMTI: Native Control over the JVM
JVMTI (JVM Tool Interface) is a native (C/C++) interface that provides the deepest level of control over a running JVM. Introduced in Java 5, it replaced the older JVMPI and JVMDI interfaces and forms the lowest layer of JPDA (Java Platform Debugger Architecture); in HotSpot, java.lang.instrument itself is implemented on top of it. Through JVMTI, a native agent can redefine and retransform already-loaded classes and obtain detailed information about memory, threads, garbage collection, etc.
- RedefineClasses: Replaces the implementation of one or more already-loaded classes with new bytecode versions. Restrictions: you cannot change method signatures, inheritance hierarchy, or fields.
- RetransformClasses: Forces the JVM to re-apply ClassFileTransformer logic even to already-loaded classes, enabling new transformation rules to be applied globally.
// Simplified C example (in C++, the call is jvmti->RedefineClasses(1, &classDefinition))
jvmtiError error = (*jvmti)->RedefineClasses(jvmti, 1, &classDefinition);
Where is JVMTI used?
- Interactive debuggers (e.g., IntelliJ IDEA Debugger).
- Production profilers (YourKit, JProfiler) that collect performance data with minimal overhead.
- HotSwap tools (though standard IDE HotSwap is limited).
- Security and monitoring systems requiring deep introspective access.
JVMTI is significantly more complex and dangerous than java.lang.instrument. An error in native code can crash the entire JVM. However, it provides unprecedented control, essential for professional diagnostic tools. These mechanisms demonstrate that the JVM is not just a runtime environment but an open platform for metaprogramming, where program behavior can be altered externally based on its internal state.
6. How Lombok, Spring, and Mockito Work
The theory of bytecode manipulation becomes truly tangible when we examine its use in widely adopted frameworks and libraries. These tools don’t rely on “magic”—they leverage deep JVM mechanisms we’ve already discussed. Understanding their internals reveals the practical value of knowledge about Class Loading and bytecode manipulation.
Lombok: Code Generation Behind the Scenes
Project Lombok is known for adding methods (toString, equals, getters, setters) via annotations. Its implementation combines two approaches:
- Annotation Processing (AP): At compile time, when javac processes source code, Lombok registers its own Processor. The standard annotation-processing API only allows generating new files, so Lombok reaches into javac’s internal AST classes (package com.sun.tools.javac.*) and inserts nodes for the missing methods directly into the syntax tree before bytecode is generated. This occurs within the standard compilation process.
- Compiler and IDE integration: Features such as @SneakyThrows rely on the same internal APIs. For IDEs such as Eclipse, Lombok is installed as a -javaagent that patches the IDE’s own compiler. The application itself needs no agent at runtime: the core work happens at compile time.
Key point: Lombok doesn’t generate separate .class files; it modifies the compilation of the current file. The resulting bytecode is identical to what would be produced if the methods were written manually, so it passes verification effortlessly.
Spring AOP and CGLIB: Proxying Without Interfaces
The Spring Framework heavily uses Aspect-Oriented Programming (AOP) to inject cross-cutting concerns (transaction management, security, logging). When a bean doesn’t implement an interface but a proxy is needed, Spring uses CGLIB.
- CGLIB (Code Generation Library): This library (built on ASM) dynamically creates a subclass of the target class at runtime.
- Mechanism: CGLIB analyzes the original class’s bytecode, generates a new class that extends it, and overrides every method that can be overridden (non-final, non-private, non-static). In the overridden methods, it inserts calls to a Callback (e.g., MethodInterceptor) that manages the original method invocation and aspect execution.
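// Pseudocode of what CGLIB generates (the real output is bytecode; exception handling omitted)
public class UserService$$EnhancerByCGLIB extends UserService {
    private MethodInterceptor interceptor;   // the registered Callback
    private static Method saveMethod;        // reflective handle for UserService.save
    private static MethodProxy saveProxy;    // lets the interceptor invoke the original save()

    @Override
    public void save(User user) {
        interceptor.intercept(this, saveMethod, new Object[]{user}, saveProxy);
    }
}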
This subclass is generated in memory, its bytecode is passed to ClassLoader#defineClass, and an instance of this class is used instead of the original. This approach allows Spring to apply AOP to any class, but requires that the class and its methods are not final.
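For contrast with the subclass approach, a bean that does implement an interface can be proxied with the JDK’s own java.lang.reflect.Proxy, with no bytecode library involved. The sketch below illustrates the interception idea only; it is not Spring’s actual implementation. The names TxProxyDemo, UserService, and transactional are hypothetical, and "begin"/"commit"/"rollback" are log entries standing in for real transaction management.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TxProxyDemo {
    interface UserService {
        void save(String user);
    }

    static final List<String> LOG = new ArrayList<>();

    /** Wraps the target so every call is surrounded by begin/commit (or rollback). */
    static UserService transactional(UserService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            LOG.add("begin");
            try {
                Object result = method.invoke(target, args);
                LOG.add("commit");
                return result;
            } catch (InvocationTargetException e) {
                LOG.add("rollback");
                throw e.getCause();   // rethrow what the target method threw
            }
        };
        return (UserService) Proxy.newProxyInstance(
                UserService.class.getClassLoader(),
                new Class<?>[] {UserService.class},
                handler);
    }

    static List<String> demo() {
        LOG.clear();
        UserService service = transactional(user -> LOG.add("save " + user));
        service.save("alice");
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [begin, save alice, commit]
    }
}
```

The proxy class here is generated by the JDK at runtime and implements only the interface, which is why this technique cannot stand in for a concrete class; that gap is what CGLIB’s generated subclass fills.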
Mockito: Creating Mocks Out of Thin Air
Mockito, one of the most popular testing libraries, also relies on bytecode manipulation. When you write mock(MyUserService.class), Mockito must create an object that:
- Is an instance of MyUserService (or its subclass),
- Doesn’t execute real logic,
- Can record calls and return stubs.
To achieve this, Mockito uses Objenesis and Byte Buddy:
- Instance creation: Objenesis uses low-level JVM capabilities (often via sun.misc.Unsafe.allocateInstance()) to instantiate a class without invoking its constructor.
- Proxy generation: Byte Buddy either generates a subclass of the mocked class (the subclass mock maker) or instruments the class itself through the Instrumentation API (the inline mock maker, the default since Mockito 5), which also allows mocking final classes and methods. All method calls on the mock are intercepted, recorded, and handled according to configured behavior (e.g., when(...).thenReturn(...)).
This entire process occurs at runtime during test execution and builds on the ClassLoader and Instrumentation capabilities described above.
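The instance-creation half can be tried in isolation. The following sketch calls sun.misc.Unsafe.allocateInstance reflectively, so it compiles without a direct dependency on the internal class. It assumes a JVM that still ships sun.misc.Unsafe (the jdk.unsupported module on Java 9 and later). NoConstructorDemo, Guarded, and allocate are names invented for the example; this is not Objenesis code.

```java
import java.lang.reflect.Field;

final class NoConstructorDemo {
    static final class Guarded {
        int value = 42;   // field initializers are compiled into the constructor

        Guarded() {
            throw new IllegalStateException("constructor must not run");
        }
    }

    /** Allocates an instance without running any constructor or field initializer. */
    static <T> T allocate(Class<T> type) {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            Field singleton = unsafeClass.getDeclaredField("theUnsafe");
            singleton.setAccessible(true);
            Object unsafe = singleton.get(null);
            Object instance = unsafeClass
                    .getMethod("allocateInstance", Class.class)
                    .invoke(unsafe, type);
            return type.cast(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not usable on this JVM", e);
        }
    }

    public static void main(String[] args) {
        Guarded g = allocate(Guarded.class);
        System.out.println(g.value); // 0: the initializer "value = 42" never ran
    }
}
```

The object exists even though its only constructor always throws, and its field still holds the default value. That is exactly the state a mock needs: a correctly typed instance with none of the real class’s setup logic executed.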
Observability Agents (OpenTelemetry, Micrometer)
Modern APM systems attach as agents (-javaagent) and register a ClassFileTransformer. When classes matching certain patterns are loaded (e.g., classes with methods annotated with @RequestMapping or @Timed, or HTTP client entry points), the agent modifies their bytecode to insert calls to tracing or metrics SDKs. This enables telemetry collection without modifying the application’s source code.
Conclusion: Power and Responsibility
We’ve journeyed from class loading to runtime modification. We’ve seen how verification protects the JVM, how the .class format stores code, and how ASM, Byte Buddy, and JVMTI grant us control over it.
These mechanisms form the foundation of the modern Java ecosystem. Without them, tools like Spring (with its proxying and dependency injection), Mockito (with its flexible mocks), Lombok (with its concise annotations), or powerful APM systems (which provide deep performance insights without code changes) wouldn’t exist. They transform Java from a static language into a platform rich with metaprogramming capabilities.
However, with great power comes great responsibility:
- Performance: Every runtime intervention takes time. A poorly implemented ClassFileTransformer runs for every loaded class and can slow application startup dramatically.
- Complexity and Debugging: Errors in generated bytecode lead to VerifyError or IllegalAccessError, which are extremely hard to diagnose. Stack traces lose their connection to source code.
- Memory Leaks: Custom ClassLoaders, if not properly released, retain all their loaded classes and static data in memory, causing metaspace leaks.
- Compatibility: Tools written for one JVM version may break in another.
- Security: The ability to modify any class opens the door to serious vulnerabilities if the agent or instrumentation isn’t trusted.
Nevertheless, understanding these mechanisms is essential for any developer aiming to go beyond boilerplate programming. It enables you to:
- Deeply understand the frameworks you use,
- Diagnose complex class-loading and performance issues,
- Build your own tools for monitoring, testing, and extending functionality.
P.S. Have you encountered memory leaks due to custom ClassLoaders? Or perhaps written your own monitoring agent? Share your experience in the comments—it will be valuable for everyone!