From The Well-Grounded Java Developer, Second Edition by Benjamin Evans, Jason Clark, and Martijn Verburg
This article discusses classloaders and reflection in Java.
Take 37% off The Well-Grounded Java Developer, Second Edition by entering fccevans2 into the discount code box at checkout at manning.com.
Java is a fundamentally object-oriented system with a dynamic runtime. One of the aspects of this is that Java’s types are alive at runtime, and the type system used with a running Java platform can be modified – in particular by the addition of new types.
This means that the types that make up a Java program are open to extension by unknown types at runtime (unless they are final or one of the new sealed classes).
The classloading capability is exposed to the user – Java classes and the loaders which can modify the type system are themselves Java types. In modern Java environments all classloaders are modular – loading classes is always done within the context of a module.
The platform ships with a number of typical classloaders, which are used to do different jobs during the startup and normal operation of the platform.
- BootstrapClassLoader (aka primordial classloader) – This is instantiated early in the process of starting up the VM, and it’s usually best to think of it as being a part of the VM. It’s typically used to get the absolute basic system loaded: java.base.
- PlatformClassLoader – After the bare minimum system has been bootstrapped, the platform classloader loads the rest of the platform modules that the application depends upon. This classloader is the primary interface to access any platform class, regardless of whether it was loaded by this loader or the bootstrap.
- AppClassLoader (the application classloader) – This is the most widely used classloader. It loads the application classes and does the majority of the work in most modern Java environments. In modular environments the application classloader is no longer an instance of URLClassLoader (as it was in Java 8 and before); instead, it’s an instance of an internal class.
Let’s see these new classloaders in action, by adding a class called DisplayClassloaders to the wgjd.discovery module:
    package wgjd.discovery;

    import com.sun.tools.attach.VirtualMachineDescriptor;

    public class DisplayClassloaders {
        public static void main(String[] args) {
            var clThis = DisplayClassloaders.class.getClassLoader();
            System.out.println(clThis);
            var clObj = Object.class.getClassLoader();
            System.out.println(clObj);
            var clAttach = VirtualMachineDescriptor.class.getClassLoader();
            System.out.println(clAttach);
        }
    }
This produces the output:
$ java --add-exports=jdk.internal.jvmstat/sun.jvmstat.monitor=wgjd.discovery --module-path=out -m wgjd.discovery/wgjd.discovery.DisplayClassloaders
jdk.internal.loader.ClassLoaders$AppClassLoader@5fd0d5ae
null
jdk.internal.loader.ClassLoaders$AppClassLoader@5fd0d5ae
Notice that the classloader for Object (which is in java.base) reports as null. This is a security feature – the bootstrap classloader doesn’t verify and provides full security access to every class it loads. For that reason it doesn’t make sense to have the classloader represented and available within the Java runtime – too much potential for bugs and/or abuse.
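To make the relationship between the built-in loaders more concrete, here is a minimal sketch that walks the parent chain starting from the application classloader. The class name ClassloaderChain and the helper chainFrom() are our own illustrative names, and we assume Java 9 or later (for getPlatformClassLoader() and getName()) with the default loader setup.

```java
import java.util.ArrayList;
import java.util.List;

final class ClassloaderChain {

    // Walks from the given loader through its parents. The walk stops when
    // getParent() returns null, which is how the API represents the bootstrap loader.
    static List<ClassLoader> chainFrom(ClassLoader start) {
        var chain = new ArrayList<ClassLoader>();
        for (var cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (var cl : chainFrom(ClassLoader.getSystemClassLoader())) {
            // getName() may be null for loaders that were created without a name
            System.out.println(cl.getName() + " -> " + cl);
        }
        // Classes in java.base report null rather than a loader object
        System.out.println("Object loaded by: " + Object.class.getClassLoader());
    }
}
```

On a default setup the chain should end at the platform classloader, because the bootstrap loader never appears as an object.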
In addition to their core role, classloaders are also often used to load resources (files that aren’t classes, such as images or config files) from JAR files or other locations on the classpath.
This is often seen in a pattern that combines with try-with-resources to produce code like this:
    try (var is = TestMain.class.getResourceAsStream("/resource.csv");
         var br = new BufferedReader(new InputStreamReader(is))) {
        // ...
    }
    // Exception handling elided
The classloaders provide this mechanism in a couple of different forms – returning either a URL (from getResource()) or an InputStream (from getResourceAsStream()), but unfortunately not a Path.
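One detail worth keeping in mind is that getResourceAsStream() returns null when the resource can't be found, so real code usually needs a null check. The following is a small sketch of that idea; ResourceBytes and read() are hypothetical names chosen for illustration, and readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Optional;

final class ResourceBytes {

    // Reads a resource located relative to the anchor class.
    // Returns an empty Optional when the resource does not exist.
    static Optional<byte[]> read(Class<?> anchor, String name) {
        // try-with-resources tolerates a null resource: close() is simply skipped
        try (InputStream is = anchor.getResourceAsStream(name)) {
            if (is == null) {
                return Optional.empty();
            }
            return Optional.of(is.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Class files are resources too, so we can read the bytes of Object.class
        var bytes = read(Object.class, "Object.class");
        System.out.println(bytes.map(b -> b.length + " bytes").orElse("not found"));
    }
}
```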
Custom Classloading
In more complex environments, there are often a number of additional custom classloaders – classes that subclass java.lang.ClassLoader (directly or indirectly). This is possible because the classloader class isn’t final, and developers are, in fact, encouraged to write their own classloaders that are specific to their individual needs.
Custom classloaders are represented as Java types, and they need to be loaded by a classloader – which is usually referred to as their parent classloader. This shouldn’t be confused with class inheritance and parent classes – instead, classloaders are related by a form of delegation.
In figure 1, you can see the delegation hierarchy of classloaders, and how the different loaders relate to each other. In some special cases, a custom classloader may have a different classloader as its parent, but the usual case is that it’s the loading classloader.
Figure 1. Classloader hierarchy
The key to the custom mechanism is the pair of methods loadClass() and findClass(), which are defined on ClassLoader. The main entry point is loadClass() and a simplified form of the relevant code in ClassLoader is:
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // First, check if the class has already been loaded
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                // ...
                try {
                    if (parent != null) {
                        c = parent.loadClass(name, false);
                    } else {
                        c = findBootstrapClassOrNull(name);
                    }
                } catch (ClassNotFoundException e) {
                    // ClassNotFoundException thrown if class not found
                    // from the non-null parent class loader
                }

                if (c == null) {
                    // If still not found, then invoke findClass in order
                    // to find the class.
                    // ...
                    c = findClass(name);
                    // ...
                }
            }
            // ...
            return c;
        }
    }
This means that the loadClass() mechanism looks to see if the class is already loaded, then asks its parent classloader. If that classloading fails (note the try-catch surrounding the call to parent.loadClass(name, false)) then the loading process delegates to findClass(). The definition of findClass() in java.lang.ClassLoader is simple – it throws a ClassNotFoundException.
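The delegation order described here can be observed directly. The sketch below uses a hypothetical RecordingLoader that only records which names reach findClass(). If the description above holds, a class the parent can load never reaches findClass(), while an unknown name does.

```java
import java.util.ArrayList;
import java.util.List;

final class RecordingLoader extends ClassLoader {

    // Names for which the parent failed and findClass() was consulted
    final List<String> findClassCalls = new ArrayList<>();

    RecordingLoader() {
        super(RecordingLoader.class.getClassLoader());
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        findClassCalls.add(name);
        throw new ClassNotFoundException(name);
    }

    // Convenience helper: true if the loader (or one of its parents) can load the name
    static boolean canLoad(ClassLoader loader, String name) {
        try {
            loader.loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        var loader = new RecordingLoader();
        System.out.println(canLoad(loader, "java.lang.String")); // handled by a parent
        System.out.println(canLoad(loader, "no.such.Type"));     // falls through to findClass()
        System.out.println(loader.findClassCalls);
    }
}
```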
At this point, let’s return to a question that we posed at the start of the article, and explore some of the exception and error types that can be encountered during classloading.
Classloading Exceptions
Firstly, the meaning of ClassNotFoundException is relatively simple – that the classloader attempted to load the specified class, but was unable to do it. The class was unknown to the JVM at the point where loading was requested – and the JVM was unable to find it.
Next up is NoClassDefFoundError – note that this is an error rather than an exception. This error indicates that the JVM knew of the existence of the requested class but didn’t find a definition for it in its internal metadata.
Let’s take a quick look at an example:
    public class ExampleNoClassDef {
        public static class BadInit {
            private static int thisIsFine = 1 / 0;
        }

        public static void main(String[] args) {
            try {
                var init = new BadInit();
            } catch (Throwable t) {
                System.out.println(t);
            }

            var init2 = new BadInit();
            System.out.println(init2.thisIsFine);
        }
    }
When this runs, we get some output like this:
    $ java ExampleNoClassDef
    java.lang.ExceptionInInitializerError
    Exception in thread "main" java.lang.NoClassDefFoundError: Could not initialize class ExampleNoClassDef$BadInit
        at ExampleNoClassDef.main(ExampleNoClassDef.java:13)
This shows that the JVM tried to load the BadInit class but failed to do it. Nevertheless, the program caught the exception and tried to carry on. When the class was encountered for the second time, the JVM’s internal metadata table showed that the class had been seen – but that a valid class wasn’t loaded.
The JVM effectively implements negative caching on a failed classloading attempt: the loading isn’t retried, and instead an error (NoClassDefFoundError) is thrown.
Another common error is UnsupportedClassVersionError – which is triggered when a classloading operation tries to load a class file that was compiled for a newer class file version than the runtime supports.
For example, consider a class compiled with Java 11 that we try to run on Java 8:
    $ java ScratchImpl
    Error: A JNI error has occurred, please check your installation and try again
    Exception in thread "main" java.lang.UnsupportedClassVersionError: ScratchImpl has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:495)
The Java 11 format bytecode may have features in it which aren’t supported by the runtime, and it’s not safe to continue to try to load it. Note that because this is a Java 8 runtime, it doesn’t have modular entries in the stack trace.
Finally, we should also mention LinkageError – which is the base class of a hierarchy containing NoClassDefFoundError, VerifyError and UnsatisfiedLinkError as well as several other possibilities.
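The relationships between these types can be checked reflectively. The short sketch below (LinkageHierarchy and isLinkageError() are illustrative names of our own) asks whether a given throwable type sits under LinkageError.

```java
import java.util.List;

final class LinkageHierarchy {

    // True if the given throwable type is LinkageError or one of its subclasses
    static boolean isLinkageError(Class<? extends Throwable> type) {
        return LinkageError.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> types = List.of(
                NoClassDefFoundError.class,
                VerifyError.class,
                UnsatisfiedLinkError.class,
                UnsupportedClassVersionError.class,
                ExceptionInInitializerError.class,
                ClassNotFoundException.class);
        for (var type : types) {
            System.out.println(type.getSimpleName() + " -> " + isLinkageError(type));
        }
    }
}
```

ClassNotFoundException is the odd one out in this list: it's a checked exception, not part of the LinkageError hierarchy.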
A First Custom Classloader
The simplest form of custom classloading is to subclass ClassLoader
and override findClass()
. This allows us to reuse the loadClass()
logic that we discussed earlier on and to reduce the complexity in our classloader.
Our first example is the SadClassLoader
– it doesn’t do anything but make sure that you know it was technically involved in the process and it wishes you well.
public class LoadSomeClasses { public static class SadClassloader extends ClassLoader { public SadClassloader() { super(SadClassloader.class.getClassLoader()); } public Class<?> findClass(String name) throws ClassNotFoundException { System.out.println("I am very concerned that I couldn't find the class"); throw new ClassNotFoundException(name); } } public static void main(String[] args) { if (args.length > 0) { var loader = new SadClassloader(); for (var name : args) { System.out.println(name +" ::"); try { var clazz = loader.loadClass(name); System.out.println(clazz); } catch (ClassNotFoundException x) { x.printStackTrace(); } } } } }
In our example we set up a simple classloader and some code that uses it to try to load classes which may already be loaded.
One common convention for custom classloaders is to provide a no-argument constructor that calls the superclass constructor and provides the loading classloader as an argument (to become the parent).
Many custom classloaders aren’t that much more complex than our example – they override findClass() to provide the specific needed capability. This could include, for example, looking for the class over the network. In one memorable case, a custom classloader loaded classes by connecting to a database via JDBC and accessing an encrypted binary column to get the bytes that would be used. This was to satisfy an encryption-at-rest requirement for sensitive code in a highly regulated environment.
It’s possible to do more than override findClass(). For example, loadClass() isn’t final and it can be overridden, and in fact some custom classloaders override it precisely to change the general logic we met earlier.
Finally, there’s also the method defineClass() which is defined on ClassLoader.
This method is key to classloading – as it’s the user-accessible method that performs the “Loading and Linking” process that we described earlier in the article. It takes an array of bytes and turns them into a class object. This is the primary mechanism used to load new classes at runtime which aren’t present on the classpath.
The call to defineClass() only works if it’s passed a buffer of bytes which are in the correct JVM class file format – if not then it fails to load – as either the loading or verification step fails.
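As a quick illustration of that failure mode, the sketch below passes bytes that are clearly not a class file to defineClass(). RawBytesLoader, define() and isLoadable() are hypothetical names used only for this demonstration.

```java
import java.nio.charset.StandardCharsets;

final class RawBytesLoader extends ClassLoader {

    // Exposes the protected defineClass() so we can experiment with it
    Class<?> define(byte[] bytes) {
        return defineClass(null, bytes, 0, bytes.length);
    }

    // True if the bytes were accepted as a class, false if the format was rejected
    static boolean isLoadable(byte[] bytes) {
        try {
            new RawBytesLoader().define(bytes);
            return true;
        } catch (ClassFormatError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] junk = "this is not a class file".getBytes(StandardCharsets.UTF_8);
        System.out.println("junk loadable? " + isLoadable(junk));
    }
}
```

Note that ClassFormatError is an error, not an exception, so code that probes untrusted bytes this way has to catch it explicitly.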
This method can be used for advanced techniques such as loading classes which are generated at runtime and have no source code representation. This technique is how the lambda expressions mechanism works in Java.
The defineClass() method is both protected and final, and it’s defined on java.lang.ClassLoader to only be accessed by subclasses of ClassLoader. Custom classloaders always have access to the basic functionality of defineClass() but can’t tamper with the verification or other low-level classloading logic. This last point is important – not being able to change the verification algorithm is a useful safety feature as it means a poorly-written custom classloader can’t compromise the basic platform security the JVM provides.
In the case of the HotSpot virtual machine (which is by far the most common Java VM implementation), defineClass() delegates to the native method defineClass1(), which does some basic checks and then calls a C function called JVM_DefineClassWithSource().
This function is an entry point into the JVM – and it provides access into the C++ code of HotSpot. HotSpot uses the C++ SystemDictionary to load the new class via the C++ method ClassFileParser::parseClassFile(). This is the code that runs much of the linking process – and in particular the verification algorithm.
Once classloading has completed, the bytecode of the methods is placed into HotSpot’s metadata objects that represent the methods (they’re called methodOops). They’re then available for the bytecode interpreter to use. This can be thought of as a method cache conceptually, although the bytecode is held inline in the methodOops for performance reasons.
We’ve already met the SadClassloader, but let’s look at another couple of examples of custom classloaders, starting with a look at how classloading can be used to implement dependency injection (DI).
Example: A Dependency Injection Framework
Two primary concepts are core to the idea of DI:
- Units of functionality within a system have dependencies and configuration information upon which they rely for proper functioning.
- Many object systems have dependencies which are difficult or clumsy to express.
The picture that should be in your head is of classes that contain behavior, and configuration and dependencies which are external to the objects. This latter part is what is usually referred to as the runtime wiring of the objects.
In this example, we’ll discuss how a hypothetical DI framework could make use of classloaders to implement runtime wiring. The approach we’ll take is like a simplified version of the original implementation of the Spring framework.
Modern DI frameworks frequently use another approach which has more compile-time safety – but which has significantly higher complexity and cognitive load to understand. Our example is for demonstration purposes only.
Let’s start by looking at how we’d start an application under our imaginary DI framework:
java -cp <CLASSPATH> org.wgjd.DIMain /path/to/config.xml
The CLASSPATH must contain the JAR files for the DI framework, and for any classes which are referred to in the config.xml file (along with any of their dependencies).
For this to be managed under DI, you’ll need a config file too, like this:
    <beans>
      <bean id="dao" class="wgjd.ch03.PaymentsDAO">
        <constructor-arg index="0" value="jdbc:postgresql://db.wgjd.org/payments"/>
        <constructor-arg index="1" value="org.postgresql.Driver"/>
      </bean>

      <bean id="service" class="wgjd.ch03.PaymentService">
        <constructor-arg index="0" ref="dao"/>
      </bean>
    </beans>
In this technique, the DI framework uses the config file to determine which objects to construct. This example needs to make the dao and service beans, and the framework calls the constructors for each bean.
This means that classloading occurs in two separate phases. The first phase (which is handled by the application classloader) loads the class DIMain and any classes to which it refers. Then DIMain starts to run and receives the location of the config file as a parameter to main().
At this point, the framework is up and running in the JVM, but the user classes specified in config.xml haven’t yet been touched. In fact, until DIMain examines the config file, the framework has no way of knowing which classes are to be loaded.
This example is hypothetical and illustrative, but it’s entirely possible to build a simple DI framework that works in exactly the manner described. Real DI systems are typically more complicated in practice.
To bring up the application configuration specified in config.xml, a second phase of classloading is required. This uses a custom classloader.
First, the config.xml file is checked for consistency and to make sure it’s error-free. Then, if all is well, the custom classloader tries to load the types from the CLASSPATH. If any of these fail, the whole process is aborted, causing a runtime error.
If this succeeds, the DI framework can proceed to instantiate the required objects in the correct order and call any setter methods on the created instances. Finally, if all of this completes OK, the application context is up and can begin to run.
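The instantiation step can be sketched in a few lines of reflective code. This is only an illustration under simplifying assumptions: BeanWiring and instantiate() are hypothetical names, the XML parsing is left out, arguments are matched to constructors by a simple isInstance() check, and primitive constructor parameters aren't handled.

```java
import java.lang.reflect.Constructor;

final class BeanWiring {

    // Loads className through the given loader and calls the first public
    // constructor whose parameter types accept the supplied arguments.
    static Object instantiate(ClassLoader loader, String className, Object... args) {
        try {
            Class<?> type = Class.forName(className, true, loader);
            for (Constructor<?> ctor : type.getConstructors()) {
                if (accepts(ctor.getParameterTypes(), args)) {
                    return ctor.newInstance(args);
                }
            }
            throw new IllegalArgumentException("No matching constructor on " + className);
        } catch (ReflectiveOperationException e) {
            // Covers a missing class as well as a constructor that failed
            throw new IllegalStateException("Could not wire bean " + className, e);
        }
    }

    // Simplification: primitive parameters never match, as isInstance() is false for them
    private static boolean accepts(Class<?>[] paramTypes, Object[] args) {
        if (paramTypes.length != args.length) {
            return false;
        }
        for (int i = 0; i < args.length; i++) {
            if (!paramTypes[i].isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        var loader = ClassLoader.getSystemClassLoader();
        // JDK classes stand in for the dao and service beans of the config file
        Object first = instantiate(loader, "java.lang.StringBuilder", "abc");
        Object second = instantiate(loader, "java.lang.String", first); // a "ref" to the first bean
        System.out.println(second);
    }
}
```

Passing an already-built object as an argument to the next call plays the role of the ref attribute in the config file, which is why beans must be created in dependency order.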
Example: An instrumenting classloader
Consider a classloader that alters the bytecode of classes as they’re loaded to add extra instrumentation information. When test cases are run against the transformed code, the instrumentation code records which methods and code branches are tested by the test cases. From this, the developer can see how thorough the unit tests for a class are.
This approach was the basis of the EMMA testing coverage tool, which is still available from http://emma.sourceforge.net/ although it’s now rather outdated and hasn’t been kept up to date for modern Java versions.
Despite this, it’s quite common to encounter frameworks and other code that make use of specialized classloaders that transform the bytecode as it’s being loaded.
The technique of modifying bytecode as it’s loaded is also seen in the java agent approach, which is used for performance monitoring, observability and other goals – by tools such as New Relic.
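In the agent approach, the hook is the ClassFileTransformer interface from java.lang.instrument, which an agent registers via Instrumentation.addTransformer(). The sketch below shows the shape of such a transformer without doing any real bytecode rewriting. CountingTransformer and its package prefix are hypothetical, and a real tool would typically use a bytecode library at the marked point.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

final class CountingTransformer implements ClassFileTransformer {

    // Class names arrive in internal form, e.g. "java/util/List"
    private final String internalPrefix;

    // Number of classes that matched the prefix
    final AtomicInteger seen = new AtomicInteger();

    CountingTransformer(String internalPrefix) {
        this.internalPrefix = internalPrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null && className.startsWith(internalPrefix)) {
            seen.incrementAndGet();
            // A real instrumenting tool would rewrite classfileBuffer here
            // and return the modified bytes.
        }
        return null; // null means "no change": the original bytes are kept
    }
}
```

A transformer like this only counts matching classes as they're loaded, but it shows where coverage or monitoring code would be woven in.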
We’ve briefly touched on a couple of use cases for custom classloading. Many other areas of the Java technology space are big users of classloaders and related techniques. These are some of the best-known examples:
- Plugin architectures
- Frameworks (whether vendor or homegrown)
- Class file retrieval from unusual locations (not file systems or URLs)
- Java EE
- Any circumstance where new, unknown code may need to be added after the JVM process has already started running
Let’s move on to discuss how the modules system affects classloading and modifies the classic picture that we’ve explained.
Modules and classloading
The modules system is designed to operate at a different level to classloading, which is a relatively low-level mechanism within the platform. Modules are about large-scale dependencies between program units and classloading is about the small scale, but it’s important to understand how the two mechanisms intersect and the changes to program startup that are caused by the arrival of modules.
Recall that to execute a program on a modular JVM, the runtime computes a module graph and tries to satisfy it as a first step. This is referred to as module resolution and it derives the transitive closure of the root module and its dependencies.
During this process, additional checks are performed (e.g. no modules with duplicate names, no split packages, etc). The existence of the module graph means that fewer runtime classloading problems are expected – because missing jars on the module path can now be detected before the process even starts fully.
Beyond this, the modules system doesn’t alter classloading much in most cases. Some advanced possibilities exist, such as dynamically loading modular implementations of service provider interfaces by using reflection, but those are unlikely to be encountered by most developers.
Reflection
One of the key techniques that a well-grounded Java developer should have at their command is reflection. This is an extremely powerful capability but many developers struggle with it at first – because it seems alien to the way that most Java developers think about code.
Reflection is the ability to query or introspect objects and discover (and use) their capabilities at runtime. It can be thought of as several different things, depending on context:
- A programming language API
- A programming style or technique
- A runtime mechanism that enables the technique
- A property of the language type system
Reflection in an object-oriented system is the idea that the programming environment can represent the types and methods of the program as objects. This is only possible in languages that have a runtime that supports this – and it’s a fundamentally dynamic aspect of a language.
When using the reflective style of programming, it’s possible to manipulate objects without using their static types at all. This seems like a step backwards, but if we can work with objects without needing to know their static types, then it means that we can build libraries, frameworks and tools that can work with any type – including types that didn’t exist when our handling code was written.
When Java was a young language, reflection was one of the key technological innovations that it brought to the mainstream. Although other languages (notably Smalltalk) introduced it much earlier, it wasn’t a common part of many languages when Java was released.
Introducing Reflection
The abstract description of reflection can often seem confusing or hard to grasp. Let’s look at some simple examples in JShell to try to get a more concrete view of what reflection is:
    jshell> Object o = new Object();
    o ==> java.lang.Object@a67c67e

    jshell> Class<?> clz = o.getClass();
    clz ==> class java.lang.Object
This is our first glimpse of reflection – a class object for the type Object. In fact, the actual type of clz is Class<Object> but when we obtain a class object from classloading or getClass() we have to handle it using the unknown type, ?, in the generics:
    jshell> Class<Object> clz = Object.class;
    clz ==> class java.lang.Object

    jshell> Class<Object> clz = o.getClass();
    |  Error:
    |  incompatible types: java.lang.Class<capture#1 of ? extends java.lang.Object> cannot be converted to java.lang.Class<java.lang.Object>
    |  Class<Object> clz = o.getClass();
    |                      ^----------^
This is because reflection is a dynamic, runtime mechanism, and the true type Class<Object> is unknown to the source code compiler. This introduces irreducible extra complexity to working with reflection – as we can’t rely on the Java type system to help us.
On the other hand, this dynamic nature is the key point of reflection: if we don’t know what type something is at compile time, we must treat it in a general way, which creates the flexibility to build an open, extensible system. Reflection produces a fundamentally open system and this can come into conflict with the more encapsulated systems that Java modules try to bring to the platform.
Many familiar frameworks and developer tools rely heavily on reflection to achieve their capabilities, such as debuggers and code browsers. Plugin architectures and interactive environments and REPLs also use reflection extensively. In fact, JShell couldn’t be built in a language without a reflection subsystem.
    jshell> class Pet {
       ...>     public void feed() {
       ...>         System.out.println("Feed the pet");
       ...>     }
       ...> }
    |  created class Pet

    jshell> var clz = Pet.class;
    clz ==> class Pet
Now that we have an object that represents the class type of Pet, we can use it to do other actions, such as creating a new instance:
    jshell> Object o = clz.newInstance();
    o ==> Pet@66480dd7
The problem is that, in the general case where all we hold is a Class<?>, newInstance() returns Object – which isn’t a useful type. We could cast o back to Pet, but this requires us to know ahead of time what types we’re working with – which rather defeats the point of the dynamic nature of reflection. Note also that Class.newInstance() has been deprecated since Java 9; getDeclaredConstructor().newInstance() is the recommended replacement. Let’s try something else:
    jshell> import java.lang.reflect.Method;

    jshell> Method m = clz.getMethod("feed", new Class[0]);
    m ==> public void Pet.feed()
Now we have an object that represents the method feed() – but it represents it as abstract metadata – it isn’t attached to any specific instance.
The natural thing to do with an object that represents a method is to call it. The class java.lang.reflect.Method defines a method invoke() which has the effect of calling the method that the Method object represents.
When working in JShell we’re avoiding a lot of exception handling code. When writing regular Java code that uses reflection you need to deal with the possible exception types in one way or another.
For this call to succeed, we must provide the right number and types of arguments. This argument list must include the receiver object on which the method is called reflectively (assuming the method is an instance method). In our simple example, this looks like this:
    jshell> Object ret = m.invoke(o);
    Feed the pet
    ret ==> null
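Outside JShell, the same call needs explicit exception handling. The following sketch shows one way to structure it. ReflectiveFeed and callNoArg() are illustrative names, and as a small variation on the JShell session, feed() returns its message instead of printing it so that the result is visible to the caller.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectiveFeed {

    public static class Pet {
        public String feed() {
            return "Feed the pet";
        }
    }

    // Creates an instance via the no-arg constructor and calls the named
    // public no-arg method on it, returning whatever the method returns.
    static Object callNoArg(Class<?> type, String methodName) {
        try {
            Object target = type.getDeclaredConstructor().newInstance();
            Method m = type.getMethod(methodName);
            return m.invoke(target); // the receiver is the first argument
        } catch (InvocationTargetException e) {
            // The called constructor or method threw; the real exception is the cause
            throw new IllegalStateException(methodName + " threw an exception", e.getCause());
        } catch (ReflectiveOperationException e) {
            // No such method, no accessible constructor, and similar failures
            throw new IllegalStateException("Reflective call failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callNoArg(Pet.class, "feed"));
    }
}
```

Catching ReflectiveOperationException covers the common checked exceptions in one clause, at the cost of treating them all the same way.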
As well as the Method objects, reflection also provides for objects that represent other fundamental concepts within the Java type system and language – such as fields, annotations and constructors. These classes are found in the java.lang.reflect package – and some of them (such as Constructor) are generic types.
The reflection subsystem had to be upgraded to deal with modules. Just as classes and methods can be treated reflectively, there also needs to be a reflective API for working with modules. The key class is, perhaps unsurprisingly, java.lang.Module and it can be accessed directly from a Class object:
    var module = String.class.getModule();
    var descriptor = module.getDescriptor();
The descriptor of a module is of type ModuleDescriptor and provides a read-only view of the metadata about a module – the equivalent to the contents of module-info.class.
Dynamic capabilities, such as discovery of modules, are also possible in the new reflective API. This is achieved via interfaces such as ModuleFinder, but a detailed description of how to work reflectively with the modules system is outside the scope of this article.
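A few of the simpler queries on Module are sketched below. ModuleInspector and moduleNameOf() are names of our own, and the example assumes a standard JDK where java.base exports java.lang to everyone but keeps jdk.internal.misc for selected modules only.

```java
final class ModuleInspector {

    // Returns the module name for a class, or a marker for the unnamed module
    static String moduleNameOf(Class<?> type) {
        Module m = type.getModule();
        return m.isNamed() ? m.getName() : "<unnamed>";
    }

    public static void main(String[] args) {
        var module = String.class.getModule();
        var descriptor = module.getDescriptor();

        System.out.println(moduleNameOf(String.class));
        System.out.println("exports declared: " + descriptor.exports().size());

        // isExported(pkg) only reports unconditional exports (or opens)
        System.out.println(module.isExported("java.lang"));
        System.out.println(module.isExported("jdk.internal.misc"));

        // The boot layer holds the modules resolved at startup
        System.out.println(ModuleLayer.boot().findModule("java.base").isPresent());
    }
}
```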
Combining Classloading and Reflection
Let’s look at an example that combines classloading and reflection. We won’t need a full classloader that obeys the usual findClass() and loadClass() protocol. Instead, we’ll subclass ClassLoader to gain access to the protected defineClass() method.
The main method takes a list of filenames, and if they’re a Java class, then uses reflection to access each method in turn and detect if it’s a native method:
    import java.io.IOException;
    import java.lang.reflect.Modifier;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class NativeMethodChecker {

        public static class EasyLoader extends ClassLoader {
            public EasyLoader() {
                super(EasyLoader.class.getClassLoader());
            }

            public Class<?> loadFromDisk(String fName) throws IOException {
                var b = Files.readAllBytes(Path.of(fName));
                return defineClass(null, b, 0, b.length);
            }
        }

        public static void main(String[] args) {
            if (args.length > 0) {
                var loader = new EasyLoader();
                for (var file : args) {
                    System.out.println(file + " ::");
                    try {
                        var clazz = loader.loadFromDisk(file);
                        for (var m : clazz.getMethods()) {
                            if (Modifier.isNative(m.getModifiers())) {
                                System.out.println(m.getName());
                            }
                        }
                    } catch (IOException | ClassFormatError x) {
                        System.out.println("Not a class file");
                    }
                }
            }
        }
    }
These types of examples can be fun to explore the dynamic nature of the Java platform and to learn how the Reflection API works, but it’s important that a well-grounded Java developer is conscious of the limitations and occasional frustrations that can occur when working reflectively.
Problems with Reflection
The Reflection API has been part of the Java platform from 1.1 (1997) and in those twenty-five years a number of issues and weaknesses have come to light. Some of these inconveniences include:
- It’s an old API – with array types everywhere (it predates the Java Collections)
- Figuring out which method overload to call is painful
- API has two different methods, getMethod() and getDeclaredMethod(), to access methods reflectively
- API provides the setAccessible() method which can be used to ignore access control
- Exception handling is complex for reflective calls – exceptions thrown by the called method arrive wrapped in an InvocationTargetException
- Boxing and unboxing is necessary to make reflective calls that pass or return primitives
- Primitive types require placeholder class objects, e.g. int.class – which is of type Class<Integer>
- void methods require the introduction of the java.lang.Void type
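Several of these rough edges are easy to see in a few lines of code. The sketch below (ReflectionQuirks and causeOfFailedParse() are hypothetical names) calls Integer.parseInt() reflectively and shows how a failure inside the target method surfaces, along with the placeholder class objects for primitives and void.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ReflectionQuirks {

    // Calls Integer.parseInt(text) reflectively. Returns the exception thrown
    // by parseInt itself, or null if the call succeeded.
    static Throwable causeOfFailedParse(String text) {
        try {
            Method parseInt = Integer.class.getMethod("parseInt", String.class);
            parseInt.invoke(null, text); // static method, so there is no receiver
            return null;
        } catch (InvocationTargetException e) {
            return e.getCause(); // the original exception is wrapped
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        // Primitive types and their wrappers have distinct class objects
        System.out.println(int.class == Integer.class);
        System.out.println(int.class == Integer.TYPE);

        // A void method reports void.class, also reachable as Void.TYPE
        Method run = Runnable.class.getMethod("run");
        System.out.println(run.getReturnType() == Void.TYPE);

        // Primitive results come back boxed
        Object boxed = Integer.class.getMethod("parseInt", String.class).invoke(null, "42");
        System.out.println(boxed.getClass());

        System.out.println(causeOfFailedParse("not a number"));
    }
}
```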
As well as the various awkward corners in the API, Java Reflection has always suffered from poor performance – for several reasons, including unfriendliness to the JVM’s JIT compiler.
Solving the problem of reflective call performance was one of the major reasons for the addition of the Method Handles API.
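For comparison, here is a minimal sketch of a call made through the Method Handles API instead of java.lang.reflect. HandleDemo and lengthViaHandle() are illustrative names; the handle simply targets String.length().

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class HandleDemo {

    // Looks up String.length() as a method handle and invokes it on s
    static int lengthViaHandle(String s) {
        try {
            MethodHandle mh = MethodHandles.lookup()
                    .findVirtual(String.class, "length", MethodType.methodType(int.class));
            // invokeExact requires the call site types to match the handle's
            // type exactly: here (String) -> int, with no boxing involved
            return (int) mh.invokeExact(s);
        } catch (Throwable t) {
            // invokeExact is declared to throw Throwable
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaHandle("hello"));
    }
}
```

Unlike Method.invoke(), the handle is typed, so the primitive return value doesn't need to be boxed and access checks happen when the handle is looked up, not on each call.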
One final problem with reflection remains, which is perhaps more of a philosophical problem (or anti-pattern): Developers frequently encounter reflection as one of the first truly advanced techniques that they meet when levelling up in Java.
As a result it can become overused or a Golden Hammer technique – used to implement systems which are excessively flexible or which display an internal mini-framework which isn’t needed (sometimes called the Inner Framework anti-pattern). Such systems are often configurable but at the expense of encoding the domain model into configuration rather than directly in the domain types.
Reflection is a great technique and one that the well-grounded Java developer should have in their toolbox, but it isn’t suitable for every situation and most developers probably only need to use it sparingly.
That’s all for now. If you want to learn more about the book, you can check it out on Manning’s liveBook platform here.