The Artima Developer Community

Chapter 8 of Inside the Java Virtual Machine
The Linking Model
by Bill Venners

From the programmer's perspective, one of the most important aspects of Java's architecture to understand is the linking model. As mentioned in earlier chapters, Java's linking model allows you to design user-defined class loaders that extend your application in custom ways at run-time. Through user-defined class loaders, your application can load and dynamically link to classes and interfaces that were unknown or did not even exist when your application was compiled.

The engine that drives Java's linking model is the process of resolution. The previous chapter described all the various stages in the lifetime of a class, but didn't dive into the details of loading and resolution. This chapter looks at loading and resolution in depth, and shows how the process of resolution fits in with dynamic extension. It gives an overview of the linking model, explains constant pool resolution, describes method tables, shows how to write and use class loaders, and gives several examples.

Dynamic Linking and Resolution

When you compile a Java program, you get a separate class file for each class or interface in your program. Although the individual class files may appear to be independent, they actually harbor symbolic connections to one another and to the class files of the Java API. When you run your program, the Java virtual machine loads your program's classes and interfaces and hooks them together in a process of dynamic linking. As your program runs, the Java virtual machine builds an internal web of interconnected classes and interfaces.

A class file keeps all its symbolic references in one place, the constant pool. Each class file has a constant pool, and each class or interface loaded by the Java virtual machine has an internal version of its constant pool called the runtime constant pool. The runtime constant pool is an implementation-specific data structure that maps to the constant pool in the class file. Thus, after a type is initially loaded, all the symbolic references from the type reside in the type's runtime constant pool.

At some point during the running of a program, if a particular symbolic reference is to be used, it must be resolved. Resolution is the process of finding the entity identified by the symbolic reference and replacing the symbolic reference with a direct reference. Because all symbolic references reside in the constant pool, this process is often called constant pool resolution.

As described in Chapter 6, "The Java Class File," the constant pool is organized as a sequence of items. Each item has a unique index, much like an array element. A symbolic reference is one kind of item that may appear in the constant pool. Java virtual machine instructions that use a symbolic reference specify the index in the constant pool where the symbolic reference resides. For example, the getstatic opcode, which pushes the value of a static field onto the stack, is followed in the bytecode stream by an index into the constant pool. The constant pool entry at the specified index, a CONSTANT_Fieldref_info entry, reveals the fully qualified name of the class in which the field resides, and the name and type of the field.

Keep in mind that the Java virtual machine contains a separate runtime constant pool for each class and interface it loads. When an instruction refers to the fifth item in the constant pool, it is referring to the fifth item in the constant pool for the current class, the class that defined the method the Java virtual machine is currently executing.

Several instructions, from the same or different methods, may refer to the same constant pool entry, but each constant pool entry is resolved only once. After a symbolic reference has been resolved for one instruction, subsequent attempts to resolve it by other instructions take advantage of the hard work already done, and use the same direct reference resulting from the original resolution.
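
The resolve-once behavior can be pictured as a cached lookup. The following is only a conceptual sketch: the `Entry` class and its `resolve()` method are hypothetical names invented for illustration, and a real virtual machine's runtime constant pool is an implementation-specific structure that need not look like this.

```java
import java.util.function.Function;

public class ResolveOnceSketch {
    /** Hypothetical model of one runtime constant pool entry; not a real JVM structure. */
    static final class Entry {
        private final String symbolicName; // e.g. a fully qualified class name
        private Object directRef;          // stays null until the entry is resolved

        Entry(String symbolicName) {
            this.symbolicName = symbolicName;
        }

        /** Resolves on first use; later callers reuse the cached direct reference. */
        synchronized Object resolve(Function<String, Object> resolver) {
            if (directRef == null) {
                directRef = resolver.apply(symbolicName);
            }
            return directRef;
        }
    }

    /** Resolves the same entry twice and returns how often real work was done. */
    static int resolutionsPerformed() {
        int[] count = {0};
        Entry entry = new Entry("java.lang.String");
        Function<String, Object> resolver = name -> {
            count[0]++;
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        };
        Object first = entry.resolve(resolver);  // think: first instruction using the entry
        Object second = entry.resolve(resolver); // think: another instruction, same entry
        if (first != second) {
            throw new AssertionError("expected the same direct reference");
        }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("resolutions performed: " + resolutionsPerformed());
    }
}
```

Both calls receive the same direct reference, and the resolver runs only for the first one.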

Linking involves not only the replacement of symbolic references with direct ones but also checking for correctness and permission. As mentioned in Chapter 7, "The Lifetime of a Class," the checking of symbolic references for existence and access permission (one aspect of the full verification phase) is performed during resolution. For example, when a Java virtual machine resolves a getstatic instruction to a field of another class, it checks that the other class and the named field exist and that the referencing class has permission to access them.

If any of these checks fail, an error is thrown and resolution fails. Otherwise, the symbolic reference is replaced by the direct reference and resolution succeeds.

As described in Chapter 7, "The Lifetime of a Class," different implementations of the Java virtual machine are permitted to perform resolution at different times during the execution of a program. An implementation may choose to link everything up front by following all symbolic references from the initial class, then all symbolic references from subsequent classes, until every symbolic reference has been resolved. In this case, the application would be completely linked before its main() method was ever invoked. This approach is called early resolution. Alternatively, an implementation may choose to wait until the very last minute to resolve each symbolic reference. In this case, the Java virtual machine would resolve a symbolic reference only when it is first used by the running program. This approach is called late resolution. Implementations may also use a resolution strategy in between these two extremes.

Although a Java virtual machine implementation has some freedom in choosing when to resolve symbolic references, every Java virtual machine must give the outward impression that it uses late resolution. No matter when a particular Java virtual machine performs its resolution, it will always throw any error that results from attempting to resolve a symbolic reference at the point in the execution of the program where the symbolic reference was actually used for the first time. In this way, it will always appear to the user as if the resolution were late. If a Java virtual machine does early resolution, and during early resolution discovers that a class file is missing, it won't report the class file missing by throwing the appropriate error until later in the program when something in that class file is actually used. If the class is never used by the program, the error will never be thrown.

Resolution and Dynamic Extension

In addition to simply linking types at run-time, Java applications can decide at run-time which types to link. Java's architecture allows Java programs to be dynamically extended: the program decides at run-time which other types to use, loads them, and uses them. You can dynamically extend a Java application by passing the name of a type to load to either the forName() method of class java.lang.Class or the loadClass() method of an instance of a user-defined class loader, which can be created from any subclass of java.lang.ClassLoader. Either of these approaches enables your running application to load types whose names are not mentioned in the source code of your application, but rather are determined by your application as it runs. An example of dynamic extension is a Java-capable web browser, which loads class files for applets from across a network. When the browser starts, it doesn't know what class files it will be loading across the network. The browser learns the names of the classes and interfaces required by each applet as it encounters the web pages that contain those applets.

The most straightforward way to dynamically extend a Java application is with the forName() method of class java.lang.Class, which has two overloaded forms:

// Methods declared in class java.lang.Class:
public static Class forName(String className)
    throws ClassNotFoundException;
public static Class forName(String className, boolean initialize,
    ClassLoader loader)
    throws ClassNotFoundException;
The three-parameter form of forName(), which was added in version 1.2, takes the fully qualified name of the type to load in the String className parameter. If the boolean initialize parameter is true, the type will be linked and initialized as well as loaded before the forName() method returns. Otherwise, if the boolean initialize parameter is false, the type will be loaded and possibly linked but not explicitly initialized by the forName() method. Nevertheless, if the type had already been initialized prior to the forName() invocation, the type returned will have been initialized even though you pass false as the second parameter to forName(). In the third parameter, ClassLoader loader, you pass a reference to the user-defined class loader from which you want forName() to request the type. You can also indicate that you want forName() to request the type from the bootstrap class loader by passing null in the ClassLoader loader parameter. The version of forName() that takes one parameter, the fully qualified name of the type to load, always requests the type from the current class loader (the loader that loaded the class making the forName() request) and always initializes the type. Both versions of forName() return a reference to the Class instance that represents the loaded type or, if the type can't be loaded, throw ClassNotFoundException.
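
To make the effect of the initialize parameter concrete, here is a minimal sketch. The `ForNameDemo` class, its nested `Plugin` type, and the `LOG` list are hypothetical names invented for this illustration; only `Class.forName()` and `getClassLoader()` come from the Java API.

```java
import java.util.ArrayList;
import java.util.List;

public class ForNameDemo {
    // Records static initializers as they run.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical type to load by name; its static initializer leaves a trace.
    static class Plugin {
        static {
            LOG.add("Plugin initialized");
        }
    }

    /** Loads Plugin by name through the three-parameter forName(). */
    static Class<?> load(boolean initialize) {
        // Binary name of the nested class, built at run-time rather than hard-coded.
        String name = ForNameDemo.class.getName() + "$Plugin";
        try {
            return Class.forName(name, initialize, ForNameDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> loaded = load(false);
        System.out.println("after initialize=false: " + LOG);
        Class<?> initialized = load(true);
        System.out.println("after initialize=true:  " + LOG);
        System.out.println("same Class instance: " + (loaded == initialized));
    }
}
```

Run on a fresh virtual machine, the first line should show an empty log, because passing false loads the type without running its static initializer. The second call returns the same Class instance, now initialized, and the initializer runs only once no matter how often forName() is invoked.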

The other way to dynamically extend a Java application is to load classes via the loadClass() method of a user-defined class loader. To request a type from a user-defined class loader, you invoke loadClass() on that class loader. Class ClassLoader contains two overloaded methods named loadClass(), which look like this:

// Methods declared in class java.lang.ClassLoader:
public Class loadClass(String name)
    throws ClassNotFoundException;
protected Class loadClass(String name, boolean resolve)
    throws ClassNotFoundException;

Both loadClass() methods accept the fully qualified name to load in their String name parameter. The semantics of loadClass() are similar to those of forName(). If the loadClass() method has already loaded a type with the fully qualified name passed in the String name parameter, it should return the Class instance representing that already loaded type. Otherwise, it should attempt to load the requested type in some custom way decided upon by the author of the user-defined class loader. If the class loader is successful loading the type in its custom way, loadClass() should return the Class instance representing the newly loaded type. Otherwise, it should throw ClassNotFoundException. The details on writing your own user-defined class loader are given later in this chapter.

The boolean resolve parameter of the two-parameter version of loadClass() indicates whether or not the type should be linked as well as loaded. As mentioned in previous chapters, the process of linking involves three steps: verification of the loaded type, preparation, which involves allocating memory for the type, and optionally, resolution of symbolic references contained in the type. If resolve is true, the loadClass() method should ensure that the type has been linked as well as loaded before it returns the Class instance for that type. If resolve is false, the loadClass() method will merely attempt to load the requested type and not concern itself with whether or not the type is linked. Because the Java virtual machine specification gives implementations some flexibility in the timing of linking, when you pass false in the resolve parameter, the type you get back from loadClass() may or may not have already been linked. The two-parameter version of loadClass() is a legacy method whose resolve parameter has, since Java version 1.1, really served no useful purpose. In general, you should invoke the one-parameter version of loadClass(), which is equivalent to invoking the two-parameter version with resolve set to false. When you invoke the one-parameter version of loadClass(), it will attempt to load and return the type, but will leave the timing of linking and initializing the type to the virtual machine.

Whether you should use forName() or invoke loadClass() on a user-defined class loader instance depends on your needs. If you have no special needs that require a class loader, you should probably use forName(), because forName() is the most straightforward approach to dynamic extension. In addition, if you need the requested type to be initialized as well as loaded (and linked), you'll have to use forName(). When the loadClass() method returns a type, that type may or may not be linked. When you invoke the single parameter version of forName(), or invoke the three-parameter version and pass true in the initialize parameter, the returned type will definitely have been already linked and initialized.

Initialization is the reason, for example, that JDBC drivers are usually loaded with a call to forName(). Because the static initializers of each JDBC driver class register the driver with a DriverManager, thereby making the driver available to the application, the driver class must be initialized, not just loaded. Were a driver class loaded but not initialized, the static initializers of the class would not be executed, the driver would not become registered with the DriverManager, and the driver would therefore not be available to the application. Loading a driver with forName() ensures that the class will be initialized, which ensures the driver will be available for use by the application after forName() returns.
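
The difference can be sketched without a real JDBC driver. In the example below, `REGISTRY` is a hypothetical stand-in for the bookkeeping done by DriverManager and `FakeDriver` is an invented class that registers itself from a static initializer; neither is part of JDBC.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverRegistrationDemo {
    // Hypothetical stand-in for the registry kept by java.sql.DriverManager.
    static final List<String> REGISTRY = new ArrayList<>();

    // Hypothetical driver that registers itself from a static initializer.
    static class FakeDriver {
        static {
            REGISTRY.add("FakeDriver");
        }
    }

    static final String DRIVER_NAME =
            DriverRegistrationDemo.class.getName() + "$FakeDriver";

    /** Loads the driver with loadClass(), which leaves initialization to the VM. */
    static boolean registeredAfterLoadClass() {
        try {
            DriverRegistrationDemo.class.getClassLoader().loadClass(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return REGISTRY.contains("FakeDriver");
    }

    /** Loads the driver with one-parameter forName(), which always initializes. */
    static boolean registeredAfterForName() {
        try {
            Class.forName(DRIVER_NAME);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return REGISTRY.contains("FakeDriver");
    }

    public static void main(String[] args) {
        System.out.println("registered after loadClass(): " + registeredAfterLoadClass());
        System.out.println("registered after forName():   " + registeredAfterForName());
    }
}
```

On a fresh virtual machine the loadClass() call is expected to leave the registry empty, because nothing has yet made an active use of FakeDriver. After forName() returns, the driver is registered exactly once.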

Class loaders, on the other hand, can help you meet needs that forName() can't. If you have some custom way of loading types, such as by downloading them across a network, retrieving them from a database, extracting them from encrypted files, or even generating them on the fly, you'll need a class loader. One of the primary reasons to create a user-defined class loader is to customize the way in which a fully qualified type name is transformed into an array of bytes in the Java class file format that define the named type. Other reasons you may want to use a class loader rather than forName() involve security. As mentioned in Chapter 3, "Security," the separate namespaces awarded to each class loader enable you to in effect place a shield between the types loaded into different namespaces. You can write a Java application such that types cannot see any types that aren't loaded into the same namespace. Also, as mentioned in Chapter 3, class loaders are responsible for placing loaded code into protection domains. Thus, if your security needs include a custom way to place loaded types into protection domains, you'll need to use class loaders rather than forName().

Both the general process of dynamic extension and the separate namespaces awarded to individual class loaders are supported by one aspect of resolution: the way a virtual machine chooses a class loader when it resolves a symbolic reference to a type. When the resolution of a constant pool entry requires loading a type, the virtual machine uses the same class loader that loaded the referencing type to load the referenced type. For example, imagine a Cat class refers via a symbolic reference in its constant pool to a type named Mouse. Assume Cat was loaded by a user-defined class loader. When the virtual machine resolves the reference to Mouse, it checks to see if Mouse has been loaded into the namespace to which Cat belongs. (It checks to see if the class loader that loaded Cat has previously loaded a type named Mouse.) If not, the virtual machine requests Mouse from the same class loader that loaded Cat. This is true even if a class named Mouse had previously been loaded into a different namespace. When a symbolic reference from a type loaded by the bootstrap class loader is resolved, the Java virtual machine uses the bootstrap class loader to load the referenced type. When a symbolic reference from a type loaded by a user-defined class loader is resolved, the Java virtual machine uses the same user-defined class loader to load the referenced type.

Class Loaders and the Parent-Delegation Model

As mentioned in Chapter 3, "Security," version 1.2 introduced a formal parent-delegation model for class loaders. Although legacy class loaders written prior to 1.2 that don't take advantage of the parent-delegation model will still work in 1.2, the recommended way to create class loaders from 1.2 on is to use the parent-delegation model. Each user-defined class loader created in 1.2 is assigned a "parent" class loader when it is created. If the parent class loader is not passed explicitly to the constructor of the user-defined class loader, the system class loader is assigned to be the parent by default. Alternatively, a parent loader can be explicitly passed to the constructor of a new user-defined class loader. If a reference to an existing user-defined class loader is passed to the constructor, that user-defined class loader is assigned to be the parent. If null is passed to the constructor, the bootstrap class loader is assigned to be the parent.

To better visualize the parent-delegation model, imagine a Java application creates a user-defined class loader named "Grandma." Because the application passes null to Grandma's constructor, Grandma's parent is set to the bootstrap class loader. Time passes. Sometime later, the application creates another class loader named "Mom." Because the application passes to Mom's constructor a reference to Grandma, Mom's parent is set to the user-defined class loader referred to affectionately as Grandma. More time passes. At some later time, the application creates a class loader named "Cindy." Because the application passes to Cindy's constructor a reference to Mom, Cindy's parent is set to the user-defined class loader referred to as Mom.

Now imagine the application asks Cindy to load a type named java.io.FileReader. When a class loader that follows the parent-delegation model loads a type, it first delegates to its parent -- it asks its parent to try to load the type. Its parent, in turn, asks its parent, which first asks its parent, and so on. The delegation continues all the way up to the end-point of the parent-delegation chain, which is usually the bootstrap class loader. Thus, the first thing Cindy does is ask Mom to load the type. The first thing Mom does is ask Grandma to load the type. And the first thing Grandma does is ask the bootstrap class loader to load the type. In this case, the bootstrap class loader is able to load (or already has loaded) the type, and returns the Class instance representing java.io.FileReader to Grandma. Grandma passes this Class reference back to Mom, who passes it back to Cindy, who returns it to the application.

Note that given delegation between class loaders, the class loader that initiates loading is not necessarily the class loader that actually defines the type. In the previous example, the application initially asked Cindy to load the type, but ultimately, the bootstrap class loader defined the type. In Java terminology, a class loader that is asked to load a type, but returns a type loaded by some other class loader, is called an initiating class loader of that type. The class loader that actually defines the type is called the defining class loader for the type. In the previous example, therefore, the defining class loader for java.io.FileReader is the bootstrap class loader. Cindy is an initiating class loader, but so are Mom, Grandma, and even the bootstrap class loader. Any class loader that is asked to load a type and is able to return a reference to the Class instance representing the type is an initiating loader of that type.

For another example, imagine the application asks Cindy to load a type named com.artima.knitting.QuiltPattern. Cindy delegates to Mom, who delegates to Grandma, who delegates to the bootstrap class loader. In this case, however, the bootstrap class loader is unable to load the type. So control returns back to Grandma, who attempts to load the type in her custom way. Because Grandma is responsible for loading standard extensions, and the com.artima.knitting package is wisely installed in a JAR file in the standard extensions directory, Grandma is able to load the type. Grandma defines the type and returns the Class instance representing com.artima.knitting.QuiltPattern to Mom. Mom passes this Class reference back to Cindy, who returns it to the application. In this example, Grandma is the defining loader of the com.artima.knitting.QuiltPattern type. Cindy, Mom, and Grandma -- but not the bootstrap class loader -- are initiating class loaders for the type.
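
The order of delegation in these two examples can be traced with a small sketch. The `NamedLoader` class, the `TRACE` list, and the helper methods are hypothetical names invented here. Unlike the Grandma of the text, the loaders below cannot load anything themselves, so the request for QuiltPattern ends in ClassNotFoundException after every loader in the chain has had its turn.

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationTrace {
    static final List<String> TRACE = new ArrayList<>();

    /** A loader that records each request and cannot load anything by itself. */
    static class NamedLoader extends ClassLoader {
        private final String label;

        NamedLoader(String label, ClassLoader parent) {
            super(parent); // a null parent means the bootstrap class loader
            this.label = label;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            TRACE.add(label + " asked");
            return super.loadClass(name, resolve); // delegates to the parent first
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            TRACE.add(label + " tries itself");
            throw new ClassNotFoundException(name);
        }
    }

    /** Builds the chain bootstrap <- Grandma <- Mom <- Cindy and returns Cindy. */
    static NamedLoader newCindy() {
        NamedLoader grandma = new NamedLoader("Grandma", null);
        NamedLoader mom = new NamedLoader("Mom", grandma);
        return new NamedLoader("Cindy", mom);
    }

    /** Asks a fresh Cindy for the named type and returns the recorded steps. */
    static List<String> trace(String typeName) {
        TRACE.clear();
        try {
            Class<?> c = newCindy().loadClass(typeName);
            ClassLoader definer = c.getClassLoader(); // null denotes the bootstrap loader
            TRACE.add("defined by " + (definer == null ? "bootstrap" : definer.toString()));
        } catch (ClassNotFoundException e) {
            TRACE.add("not found");
        }
        return new ArrayList<>(TRACE);
    }

    public static void main(String[] args) {
        System.out.println(trace("java.io.FileReader"));
        System.out.println(trace("com.artima.knitting.QuiltPattern"));
    }
}
```

For java.io.FileReader the trace shows Cindy, Mom, and Grandma being asked in that order, with the bootstrap class loader defining the type. For the QuiltPattern request the same three are asked, and then control returns down the chain, with Grandma, Mom, and Cindy each trying and failing in turn.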

Constant Pool Resolution

This section describes the details of resolving each type of constant pool entry, including the errors that may be thrown during resolution. If an error is thrown during resolution, the error is seen as being thrown by the instruction that refers to the constant pool entry being resolved. Besides the errors described here, individual instructions that trigger the resolution of a constant pool entry may cause other errors to be thrown. For example, getstatic causes a CONSTANT_Fieldref_info entry to be resolved. If the entry is resolved successfully, the virtual machine performs one additional check: it makes sure the field is actually static (a class variable and not an instance variable). If the field is not static, the virtual machine throws an error. Any extra errors that may be thrown during resolution besides those described in this section are described for each individual instruction in Appendix A.

In the following sections, the term current class loader refers to the defining class loader, whether it be a user-defined class loader or the bootstrap class loader, for the type whose constant pool contains the symbolic reference being resolved. The term current namespace refers to the namespace of the current class loader, the set of all type names for which the current class loader has been marked as an initiating loader.

Resolution of CONSTANT_Class_info Entries

Of all the types of constant pool entries, the most complicated to resolve is CONSTANT_Class_info. This type of entry is used to represent symbolic references to classes (including array classes) and interfaces. Several instructions, such as new and anewarray, refer directly to CONSTANT_Class_info entries. Other instructions, such as putfield or invokevirtual, refer indirectly to CONSTANT_Class_info entries through other types of entry. For example, the putfield instruction refers to a CONSTANT_Fieldref_info entry. The class_index item of a CONSTANT_Fieldref_info gives the constant pool index of a CONSTANT_Class_info entry.

The details of resolving a CONSTANT_Class_info entry vary depending on whether or not the type is an array and whether the referencing type (the one that contains in its constant pool the CONSTANT_Class_info entry being resolved) was loaded via the bootstrap class loader or a user-defined class loader.

Array Classes

A CONSTANT_Class_info entry refers to an array class if its name_index refers to a CONSTANT_Utf8_info string that begins with a left bracket, as in "[I". As described in Chapter 6, "The Java Class File," internal array names contain one left bracket for each dimension, followed by a component type. If the component type begins with an "L", as in "Ljava/lang/Integer;", the array is an array of references. Otherwise, the component type is a primitive type, such as "I" for int or "D" for double, and the array is an array of primitive types.

The end product of the resolution of a symbolic reference to an array class is a Class instance that represents the array class. If the current class loader has already been recorded as an initiating loader for the array class being resolved, that same class is used. Otherwise, the virtual machine performs the following steps: If the component type of the array is a reference type (the array is an array of references), the virtual machine resolves the component type using the current class loader. For example, if resolving an array class with the name "[[Ljava/lang/Integer;", the virtual machine would make certain class java.lang.Integer is loaded into the namespace of the current class loader. After resolving the component type if the array is an array of references, or immediately, if the array is an array of primitive types, the virtual machine creates a new array class of the indicated component type and number of dimensions and instantiates a Class instance to represent the type. For an array of references, the array class is marked as having been defined by the defining class loader of the component type. For an array of primitive types, the array class is marked as having been defined by the bootstrap class loader.
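
Some of these rules can be observed from Java code through reflection. The sketch below uses only `Class.forName()`, `Class.getName()`, and `Class.getClassLoader()`; the `ArrayClassDemo` class and its `byName()` helper are hypothetical. Note that `Class.getName()` and `Class.forName()` spell array names with dots, where the class file uses slashes.

```java
public class ArrayClassDemo {
    /** Looks up a class by name, wrapping the checked exception for brevity. */
    static Class<?> byName(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // One left bracket per dimension, followed by the component type.
        System.out.println(int[].class.getName());
        System.out.println(Integer[][].class.getName());

        // Arrays of primitive types belong to the bootstrap loader (reported as null).
        System.out.println(int[].class.getClassLoader());

        // An array of references takes the defining loader of its component type.
        System.out.println(
                ArrayClassDemo[].class.getClassLoader() == ArrayClassDemo.class.getClassLoader());
    }
}
```

The same Class instance comes back whether the array type is named in source code or looked up by its bracketed name, which is what the test of `byName("[I") == int[].class` would confirm.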

Non-Array Classes and Interfaces

A CONSTANT_Class_info entry whose name_index refers to a CONSTANT_Utf8_info string that doesn't begin with a left bracket is a symbolic reference to a non-array class or an interface. Resolution of this kind of symbolic reference is a multiple-step process.

The Java virtual machine performs the same basic steps, described below as Steps 1a and 1b, to resolve any symbolic reference (any CONSTANT_Class_info entry) to a non-array class or interface. In Step 1a, the type is loaded. In Step 1b, access permission to the type is checked. The precise way in which the virtual machine performs Step 1a depends on whether the referencing type was loaded via the bootstrap class loader or a user-defined class loader.

Also described in this section are Steps 2a through 2d, which describe the linking and initialization of the newly resolved type. These steps are not part of the resolution of the symbolic reference to the type that becomes linked and initialized. Resolution of a symbolic reference to a non-array class or interface involves only Steps 1a and 1b, the (potential) loading of the type and the checking of its access permission. However, whenever the resolution process of a symbolic reference to a type is being triggered by the first active use of the type, linking and initialization of the type will immediately follow the resolution of the symbolic reference to that type. Because Java virtual machine implementations are allowed to perform early resolution, however, resolution of references to types may occur much earlier than the linking and initialization of those types. As mentioned in Chapter 7, "The Lifetime of a Type," initialization (here, Step 2d) occurs on the first active use of the type. Before a type can be initialized, it must be linked (Steps 2a through 2c), and before it can be linked, it must be loaded (Step 1a).

Step 1a. Load the Type and any Supertypes

The fundamental activity required by the resolution of a non-array class or interface is making sure the type is loaded into the current namespace. As a first step, the virtual machine must determine whether or not the referenced type has already been loaded into the current namespace. To make that determination, the virtual machine must find out whether the current class loader has been marked as an initiating loader for a type with the desired fully qualified name (the type name given in the symbolic reference being resolved). For each class loader, the Java virtual machine maintains a list of the names of all the types for which the class loader has served as an initiating class loader. Each of these lists forms a namespace inside the Java virtual machine. The virtual machine uses these lists during resolution to determine whether a class has already been loaded by a particular class loader. If the virtual machine discovers the desired fully qualified name is already mentioned in the current namespace, it will just use the already-loaded type, which is defined by a chunk of type data in the method area and represented by an associated Class instance on the heap. By first checking whether the current namespace already includes the desired fully qualified name, the virtual machine helps ensure that only one type with a given name is loaded by any single class loader.

If a type with the desired fully qualified name hasn't yet been loaded into the current namespace, the virtual machine passes the fully qualified name to the current class loader. The Java virtual machine always asks the current class loader, the defining loader of the referencing type whose runtime constant pool contains the CONSTANT_Class_info entry being resolved, to attempt to load the referenced type. If the referencing type was defined by the bootstrap class loader, the virtual machine asks the bootstrap class loader to load the referenced type. Otherwise, the referencing type was defined by a user-defined class loader, and the virtual machine asks the same user-defined class loader to load the referenced type.

If the current class loader is the bootstrap class loader, the virtual machine asks it in an implementation-dependent way to load the type. If the current class loader is a user-defined class loader, the Java virtual machine makes the load request by invoking the user-defined class loader's loadClass() method, passing in parameter name the fully qualified name of the desired type.

When either the bootstrap class loader or a user-defined class loader is asked to load a type, the class loader has two choices: It can attempt to load the type by itself, or it can delegate the job to some other class loader. A user-defined class loader can ask either another user-defined class loader or the bootstrap class loader to attempt to load the type. The bootstrap class loader can ask a user-defined class loader to attempt to load the type.

To delegate to a user-defined class loader, a class loader (whether bootstrap or user-defined) invokes loadClass() on that class loader, passing in the fully qualified name of the desired type. To delegate to the bootstrap class loader, a user-defined class loader invokes findSystemClass(), a protected method of java.lang.ClassLoader, passing in the fully qualified name of the desired type. A class loader that has been delegated to can also decide whether or not to attempt to load the type itself, or to delegate the job to yet another class loader. Eventually, some class loader will decide that the buck stops with it, and rather than delegate, attempt to actually load the type itself. If this class loader is successful at loading the type, it will be marked as the defining class loader for the type. All of the class loaders involved in the process -- the defining class loader and all the class loaders that delegated -- will be marked as initiating loaders of the type.

Given the existence of the parent-delegation model described earlier in this chapter, if a user-defined class loader delegates, the class loader to which it delegates will often be its parent in the parent-delegation model. The parent will, in turn, delegate to its parent, which will delegate to its parent, and so on. The delegation process continues all the way up to the end-point of the delegation process, which is the class loader that, rather than delegating, decides to try to load the type itself. Most often, this end-point class loader will be the bootstrap class loader. When a parent class loader attempts to load the type but fails, control returns to the child class loader. In the parent-delegation model, the child class loader, upon learning that its parent (and grandparent, great grandparent, and so on) was unable to load the type, attempts to load the type itself. If a class loader in the middle of the delegation chain is the class loader that first has success loading the type, that class loader will be marked as the defining class loader. The defining class loader and all the class loaders before it in the parent-delegation chain will be marked as initiating class loaders. However, its parent, grandparent, great grandparent, and so on, none of whom were successful in their attempts to load the type, will not be marked as initiating class loaders of the type.

If the loadClass() method of a user-defined class loader is able to locate or produce an array of bytes that purportedly defines the type in the Java class file format, loadClass() must invoke defineClass(), passing the fully qualified name of the desired type and a reference to the byte array. Invoking defineClass() will cause the virtual machine to attempt to parse the binary data into internal data structures in the method area. At this point the virtual machine will perform pass one of verification, as described in Chapter 3, "Security," which ensures the passed array of bytes adheres to the basic structure of the Java class file format. The Java virtual machine uses the passed fully qualified name to verify that the desired type name is actually declared as the name of the type in the passed array of bytes.
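The structural check performed by defineClass() can be observed with a small experiment. This sketch, which assumes a hypothetical ByteLoader that merely exposes the protected defineClass() method, hands the virtual machine four bytes that are not a class file and reports whether they were refused.

```java
class RejectingDemo {

    // Hypothetical loader that exposes the protected defineClass() method.
    static class ByteLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    // Returns true if the virtual machine refuses the bytes as malformed.
    static boolean isRejected(String name, byte[] bytes) {
        try {
            new ByteLoader().define(name, bytes);
            return false;
        } catch (ClassFormatError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] notAClassFile = {1, 2, 3, 4};
        // Prints "true": the bytes do not follow the class file format.
        System.out.println(isRejected("Bogus", notAClassFile));
    }
}
```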

Once the referenced type is loaded in, the virtual machine peers into its binary data. If the type is a class and not java.lang.Object, the virtual machine determines from the class's data the fully qualified name of the class's direct superclass. The virtual machine then checks to see if the superclass has been loaded into the current namespace. If not, it loads the superclass. Once that class comes in, the virtual machine can again peer into its binary data to find its superclass. This process repeats all the way up to Object.

When the virtual machine loads a superclass, it is really just resolving yet another symbolic reference. To determine what the fully qualified name of a class's superclass is, the virtual machine looks at the super_class field of the class file. This field gives a constant pool index of a CONSTANT_Class_info entry that serves as a symbolic reference to the class's superclass. When the virtual machine loads the superclass, it does so as Step 1a of the process of resolving the symbolic reference to the superclass. Thus, as part of Step 1a of the resolution process for CONSTANT_Class_info entries, the virtual machine recursively applies the resolution process for CONSTANT_Class_info entries on each superclass all the way up to Object.

On the way back down from Object, the virtual machine will again peer into the type data for each type it loaded to see if the type directly implements any interfaces. If so, it will make sure those interfaces are also loaded. For each interface the virtual machine loads, the virtual machine peers into its type data to see if it directly extends any other interfaces. If so, the virtual machine makes sure those superinterfaces are loaded.

When the virtual machine loads superinterfaces, it is once again resolving more CONSTANT_Class_info entries. The indexes of all the constant pool entries that serve as symbolic references to the interfaces directly implemented or extended by the type being loaded are stored in the interfaces component of the class file. When the virtual machine loads superinterfaces, it is resolving the CONSTANT_Class_info entries specified in the interfaces component, applying the resolution process for CONSTANT_Class_info entries recursively.

When the virtual machine applies the recursive resolution process to superclasses and superinterfaces, it uses the defining class loader of the referencing subtype. The virtual machine makes its request in the usual way, by invoking loadClass() on the referencing subtype's defining class loader, passing in the fully qualified name of the desired direct superclass or direct superinterface.

Once a type has been loaded into the current namespace, and by recursion, all the type's superclasses and superinterfaces have also been successfully loaded, the virtual machine instantiates the new Class instance to represent the type. If the bytes defining the type were located or produced by a user-defined class loader and passed to defineClass(), defineClass() will at that point return the new Class instance. Alternatively, if a user-defined class loader delegated to the bootstrap class loader with a findSystemClass() invocation, findSystemClass() will at that point return the Class instance. Upon receiving the Class instance from either defineClass() or findSystemClass(), the loadClass() method returns the Class instance to its caller. If a user-defined class loader delegates to another user-defined class loader, therefore, it receives the Class instance from the delegated-to user-defined class loader when its loadClass() method returns. Upon receiving the Class instance from the delegated-to class loader, the delegated-from class loader returns it from its own loadClass() method.

Through Step 1a, the Java virtual machine makes sure a type is loaded, and if the type is a class, that all its superclasses are loaded, and whether the type is a class or an interface, that all of its superinterfaces are loaded. During this step, these types are not linked and initialized--just loaded.
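The difference between loading a type and initializing it can be seen from ordinary Java code. In this sketch, Class.forName() is called with its initialize argument set to false, so the class and its superclass are made available but no class initialization method runs. The names LoadOnlyDemo, Base, and Child are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LoadOnlyDemo {

    // Records which class initialization methods have run.
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static { LOG.add("Base"); }
    }

    static class Child extends Base {
        static { LOG.add("Child"); }
    }

    // Asks for Child to be loaded but not initialized, then reports how
    // many class initialization methods ran as a result.
    static int initializersRunByLoading() {
        try {
            Class<?> c = Class.forName(Child.class.getName(), false,
                                       LoadOnlyDemo.class.getClassLoader());
            // Child's superclass had to be loaded as well.
            if (c.getSuperclass() != Base.class) {
                throw new IllegalStateException("unexpected superclass");
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return LOG.size();
    }

    public static void main(String[] args) {
        // Prints "0": neither Base nor Child has been initialized.
        System.out.println(initializersRunByLoading());
    }
}
```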

During Step 1a, the virtual machine may throw the following errors:

Step 1b. Check Access Permission

After loading is complete, the virtual machine checks for access permission. If the referencing type does not have permission to access the referenced type, the virtual machine throws an IllegalAccessError. Step 1b is another activity that is logically part of verification, but that is performed at some other time than the official verification phase. The check for access permission will always take place after Step 1a, ensuring a type referenced from a symbolic reference is loaded into the current namespace, as part of resolving that symbolic reference. Once this check is complete, Step 1b--and the entire process of resolving the CONSTANT_Class_info entry--is complete.

If an error occurred in Steps 1a or 1b, the resolution of the symbolic reference to the type fails. But if all went well up until the access permission check of Step 1b, the type is still usable in general, just not usable by the referencing type. If an error occurred before the access permission check, however, the type is unusable and must be marked as such or discarded.

Step 2. Link and Initialize the Type and any Superclasses

At this point, the type being referred to by the CONSTANT_Class_info entry being resolved has been loaded, but not necessarily linked or initialized. In addition, all the type's superclasses and superinterfaces have been loaded, but not necessarily linked or initialized. Some of the supertypes may be initialized at this point, because they may have been initialized during earlier resolutions.

As described in Chapter 7, "The Lifetime of a Class," superclasses must be initialized before subclasses. If the virtual machine is resolving a reference to a class (not an interface) because of an active use of that class, it must make sure that the superclasses have been initialized, starting with Object and proceeding down the inheritance hierarchy to the referenced class. (Note that this is the opposite of the order in which they were loaded in Step 1a.) If a type hasn't yet been linked, it must be linked before it is initialized. Note that only superclasses must be initialized, not superinterfaces.
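A short experiment can make this ordering visible. In the sketch below, all names are hypothetical. Each type appends its name to a log when it is initialized, and an active use of Cat then shows which types were initialized and in what order. The Marker interface deliberately declares no default methods, since later versions of the language treat interfaces with default methods differently.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {

    static final List<String> LOG = new ArrayList<>();

    // Appends a name to the log and returns it, so that it can be used
    // as a field initializer.
    static Object note(String name) {
        LOG.add(name);
        return name;
    }

    interface Marker {
        // Evaluated only if Marker itself is initialized.
        Object TOKEN = note("Marker");
    }

    static class Animal {
        static { note("Animal"); }
    }

    static class Cat extends Animal implements Marker {
        static { note("Cat"); }
    }

    // Creating an instance is an active use of Cat.
    static List<String> initializeCat() {
        new Cat();
        return LOG;
    }

    public static void main(String[] args) {
        // Prints "[Animal, Cat]": superclass first, superinterface skipped.
        System.out.println(initializeCat());
    }
}
```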

Step 2a. Verify the Type

Step 2 begins with the official verification phase of linking, described in Chapter 7, "The Lifetime of a Class." As mentioned in Chapter 7, the process of verification may require that the virtual machine load new types to ensure the bytecodes are adhering to the semantics of the Java language. For example, if a reference to an instance of a particular class is assigned to a variable with a declared type of a different class, the virtual machine would have to load both types to make sure one is a subclass of the other. These classes would at this point be loaded and possibly linked, but definitely not initialized.

If during the verification process the Java virtual machine uncovers trouble, it throws VerifyError.

Step 2b. Prepare the Type

After the official verification phase is complete, the type must be prepared. As described in Chapter 7, "The Lifetime of a Class," during preparation the virtual machine allocates memory for class variables and implementation-dependent data structures such as method tables.

Optional Step 2c. Resolve the Type

At this point, the type has been loaded, verified and prepared. As described in Chapter 7, "The Lifetime of a Class," a Java virtual machine implementation may optionally resolve the type at this point. Keep in mind that at this stage in the resolution process, Steps 1a, 2a, and 2b have been performed on a referenced type to resolve a CONSTANT_Class_info entry in the constant pool of a referencing type. Step 2c is the resolution of symbolic references contained in the referenced type, not the referencing type. (And by the way, Step 1b is not mentioned in the previous discussion because Step 1b has nothing to do with the referenced type's loading, linking, and initialization process. Step 1b is actually part of pass four of the verification step of the linking phase of the referencing type, the type that contains the symbolic reference to the referenced type.)

For example, if the virtual machine is resolving a symbolic reference from class Cat to class Mouse, the virtual machine performs Steps 1a, 2a, and 2b on class Mouse. At this stage of resolving the symbolic reference to Mouse contained in the constant pool of Cat, the virtual machine could optionally (as Step 2c) resolve all the symbolic references contained in the constant pool for Mouse. If Mouse's constant pool contains a symbolic reference to class Cheese, for example, the virtual machine could load and optionally link (but not initialize) Cheese at this time. The virtual machine mustn't attempt to initialize Cheese here because Cheese is not being actively used. (Of course, Cheese may in fact have already been actively used elsewhere, so it could already have been loaded into this namespace, linked, and initialized.)

As mentioned earlier in this chapter, if an implementation does perform Step 2c at this point in the resolution process (early resolution), it must not report any errors until the symbolic references are actually used by the running program. For example, if during the resolution of Mouse's constant pool, the virtual machine can't find class Cheese, it won't throw a NoClassDefFoundError until (and unless) Cheese is actually used by the program.

Step 2d. Initialize the Type

At this point, the type has been loaded, verified, prepared and optionally resolved. At long last, the type is ready for initialization. As defined in Chapter 7, "The Lifetime of a Class," initialization consists of two steps: the initialization of the type's superclasses in top-down order, if the type has any superclasses, and the execution of the type's class initialization method, if it has one. Step 2d just consists of executing the class initialization method, if one exists. Because Step 2d is performed for all the referenced type's superclasses, from the top down, Step 2d will occur for superclasses before it occurs for subclasses.

If the class initialization method completes abruptly by throwing some exception that isn't a subclass of Error, the virtual machine throws ExceptionInInitializerError with the thrown exception as a parameter to the constructor. Otherwise, if the thrown exception is already a subclass of Error, that error is thrown. If the virtual machine can't create a new ExceptionInInitializerError because there isn't enough memory, it throws an OutOfMemoryError.
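The wrapping of a non-Error exception can be observed directly. In this sketch, the hypothetical class Broken has a class variable initializer that throws an IllegalStateException. The first active use of Broken therefore fails with an ExceptionInInitializerError that carries the original exception as its cause.

```java
class InitFailureDemo {

    static class Broken {
        // Not a compile-time constant, so reading it is an active use.
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("initializer failed");
        }
    }

    // Triggers initialization of Broken and returns the exception wrapped
    // by the ExceptionInInitializerError, or null if none was thrown.
    // Intended to be called once: a later use of Broken fails differently.
    static Throwable causeOfFailedInit() {
        try {
            int unused = Broken.VALUE;
            return null;
        } catch (ExceptionInInitializerError e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        // Prints "java.lang.IllegalStateException: initializer failed"
        System.out.println(causeOfFailedInit());
    }
}
```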

Resolution of CONSTANT_Fieldref_info Entries

To resolve a constant pool entry of type CONSTANT_Fieldref_info, the virtual machine must first resolve the CONSTANT_Class_info entry specified in the class_index item. Therefore, any error that can be thrown because of the resolution of a CONSTANT_Class_info can be thrown during the resolution of a CONSTANT_Fieldref_info. If resolution of the CONSTANT_Class_info entry succeeds, the virtual machine searches for the indicated field in the type and its supertypes. If it finds the indicated field, the virtual machine checks to make sure the current class has permission to access the field.

If resolution of the CONSTANT_Class_info completes successfully, the virtual machine performs the field lookup process using these steps:

  1. The virtual machine checks the referenced type for a field of the specified name and type. If the virtual machine discovers such a field, that field is the result of the successful field lookup.
  2. Otherwise, the virtual machine checks any interfaces directly implemented or extended by the type, and recursively, any superinterfaces of interfaces directly implemented or extended by the type, for a field of the specified name and type. If the virtual machine discovers such a field, that field is the result of the successful field lookup.
  3. Otherwise, if the type has a direct superclass, the virtual machine checks the type's direct superclass, and recursively all the superclasses of the type, for a field of the specified name and type. If the virtual machine discovers such a field, that field is the result of the successful field lookup.
  4. Otherwise, field lookup fails.

If the virtual machine discovers there is no field with the proper name and type in the referenced class or any of its supertypes (if field lookup failed), the virtual machine throws NoSuchFieldError. Otherwise, if the field lookup succeeds, but the current class doesn't have permission to access the field, the virtual machine throws IllegalAccessError.

Otherwise, the virtual machine marks the entry as resolved and places a direct reference to the field in the data for the constant pool entry.
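The field lookup order can be sketched with the reflection API. The following is an illustration under stated assumptions, not how a virtual machine implements resolution: the lookup() helper and the sample types are hypothetical, the sketch returns null where the virtual machine would throw NoSuchFieldError, and it applies the whole lookup recursively to the superclass. It omits the access permission check.

```java
import java.lang.reflect.Field;

class FieldLookupDemo {

    interface HasLimit { int LIMIT = 10; }

    static class Base { public static int LIMIT = 20; }

    static class Derived extends Base implements HasLimit { }

    static Field lookup(Class<?> type, String name, Class<?> fieldType) {
        // Step 1: fields declared by the type itself.
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().equals(name) && f.getType() == fieldType) {
                return f;
            }
        }
        // Step 2: direct superinterfaces and, recursively, theirs.
        for (Class<?> iface : type.getInterfaces()) {
            Field f = lookup(iface, name, fieldType);
            if (f != null) {
                return f;
            }
        }
        // Step 3: the direct superclass, recursively.
        Class<?> superclass = type.getSuperclass();
        // Step 4: lookup fails when no supertype is left to search.
        return superclass == null ? null : lookup(superclass, name, fieldType);
    }

    public static void main(String[] args) {
        Field f = lookup(Derived.class, "LIMIT", int.class);
        // Prints "HasLimit": the interface is searched before the superclass.
        System.out.println(f.getDeclaringClass().getSimpleName());
    }
}
```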

Resolution of CONSTANT_Methodref_info Entries

To resolve a constant pool entry of type CONSTANT_Methodref_info, the virtual machine must first resolve the CONSTANT_Class_info entry specified in the class_index item. Therefore, any error that can be thrown because of the resolution of a CONSTANT_Class_info can be thrown during the resolution of a CONSTANT_Methodref_info. If the resolution of the CONSTANT_Class_info entry succeeds, the virtual machine searches for the indicated method in the type and its supertypes. If it finds the indicated method, the virtual machine checks to make sure the current class has permission to access the method.

If resolution of the CONSTANT_Class_info completes successfully, the virtual machine performs method resolution using these steps:

  1. If the resolved type is an interface, not a class, the virtual machine throws IncompatibleClassChangeError.
  2. Otherwise, the resolved type is a class. The virtual machine checks the referenced class for a method of the specified name and descriptor. If the virtual machine discovers such a method, that method is the result of the successful method lookup.
  3. Otherwise, if the class has a direct superclass, the virtual machine checks the class's direct superclass, and recursively all the superclasses of the class, for a method of the specified name and descriptor. If the virtual machine discovers such a method, that method is the result of the successful method lookup.
  4. Otherwise, the virtual machine checks any interfaces directly implemented by the class, and recursively, any superinterfaces of interfaces directly implemented by the type, for a method of the specified name and descriptor. If the virtual machine discovers such a method, that method is the result of the successful method lookup.
  5. Otherwise, method lookup fails.

If the virtual machine discovers there is no method with the proper name, return type, and number and types of parameters in the referenced class or any of its supertypes (if method lookup fails), the virtual machine throws NoSuchMethodError. Otherwise, if the method exists, but the method is abstract, the virtual machine throws AbstractMethodError. Otherwise, if the method exists, but the current class doesn't have permission to access the method, the virtual machine throws IllegalAccessError.

Otherwise, the virtual machine marks the entry as resolved and places a direct reference to the method in the data for the constant pool entry.
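The method lookup steps differ from field lookup in that superclasses are searched before superinterfaces. The reflection-based sketch below follows the numbered steps as written. The helper names and sample types are hypothetical, a null result stands in for NoSuchMethodError, and the checks for abstract methods and access permission are omitted.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

class MethodLookupDemo {

    interface Greeter { String greet(); String farewell(); }

    static class Parent { public String greet() { return "hi"; } }

    static abstract class Impl extends Parent implements Greeter { }

    // Finds a method declared directly by the given type.
    static Method declared(Class<?> type, String name, Class<?> ret,
                           Class<?>[] params) {
        for (Method m : type.getDeclaredMethods()) {
            if (!m.isSynthetic() && m.getName().equals(name)
                    && m.getReturnType() == ret
                    && Arrays.equals(m.getParameterTypes(), params)) {
                return m;
            }
        }
        return null;
    }

    // Searches direct superinterfaces and, recursively, their superinterfaces.
    static Method inInterfaces(Class<?> type, String name, Class<?> ret,
                               Class<?>[] params) {
        for (Class<?> iface : type.getInterfaces()) {
            Method m = declared(iface, name, ret, params);
            if (m == null) {
                m = inInterfaces(iface, name, ret, params);
            }
            if (m != null) {
                return m;
            }
        }
        return null;
    }

    static Method lookup(Class<?> type, String name, Class<?> ret,
                         Class<?>... params) {
        // Step 1: a method reference must name a class, not an interface.
        if (type.isInterface()) {
            throw new IncompatibleClassChangeError(type.getName());
        }
        // Steps 2 and 3: the class itself, then its superclasses.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            Method m = declared(c, name, ret, params);
            if (m != null) {
                return m;
            }
        }
        // Step 4: interfaces directly implemented by the class.
        return inInterfaces(type, name, ret, params);
    }

    public static void main(String[] args) {
        System.out.println(lookup(Impl.class, "greet", String.class));
        System.out.println(lookup(Impl.class, "farewell", String.class));
    }
}
```

In this sketch greet is found in Parent, even though Greeter also declares it, while farewell is found only when the search reaches the interface.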

Resolution of CONSTANT_InterfaceMethodref_info Entries

To resolve a constant pool entry of type CONSTANT_InterfaceMethodref_info, the virtual machine must first resolve the CONSTANT_Class_info entry specified in the class_index item. Therefore, any error that can be thrown because of the resolution of a CONSTANT_Class_info can be thrown during the resolution of a CONSTANT_InterfaceMethodref_info. If the resolution of the CONSTANT_Class_info entry succeeds, the virtual machine searches for the indicated method in the interface and its supertypes. (The virtual machine need not check to make sure the current class has permission to access the method, because all methods declared in interfaces are implicitly public.)

If resolution of the CONSTANT_Class_info completes successfully, the virtual machine performs interface method resolution using these steps:

  1. If the resolved type is a class, not an interface, the virtual machine throws IncompatibleClassChangeError.
  2. Otherwise, the resolved type is an interface. The virtual machine checks the referenced interface for a method of the specified name and descriptor. If the virtual machine discovers such a method, that method is the result of the successful interface method lookup.
  3. Otherwise, the virtual machine checks the interface's direct superinterfaces, recursively all the superinterfaces of the interface, and class java.lang.Object for a method of the specified name and descriptor. If the virtual machine discovers such a method, that method is the result of the successful interface method lookup.

If the virtual machine discovers there is no method with the proper name, return type, and number and types of parameters in the referenced interface or any of its supertypes, the virtual machine throws NoSuchMethodError.

Otherwise, the virtual machine marks the entry as resolved and places a direct reference to the method in the data for the constant pool entry.

Resolution of CONSTANT_String_info Entries

To resolve an entry of type CONSTANT_String_info, the virtual machine must place a reference to an interned String object in the data for the constant pool entry being resolved. The String object (an instance of class java.lang.String) must have the character sequence specified by the CONSTANT_Utf8_info entry identified by the string_index item of the CONSTANT_String_info.

Each Java virtual machine must maintain an internal list of references to String objects that have been "interned" during the course of running the application. Basically, a String object is said to be interned simply if it appears in the virtual machine's internal list of interned String objects. The point of maintaining this list is that any particular sequence of characters is guaranteed to appear in the list no more than once.

To intern a sequence of characters represented by a CONSTANT_String_info entry, the virtual machine checks to see if the sequence of characters is already in the list of interned strings. If so, the virtual machine uses the reference to the existing, previously-interned String object. Otherwise, the virtual machine creates a new String object with the proper character sequence and adds a reference to that String object to the list. To complete the resolution process for a CONSTANT_String_info entry, the virtual machine places the reference to the interned String object in the data of the constant pool entry being resolved.
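The check-then-add logic of the interned string list can be modeled in a few lines. This is only a sketch of the idea, using a hypothetical InternTableSketch class backed by a HashMap. It makes no claim about how any particular virtual machine stores its interned strings.

```java
import java.util.HashMap;
import java.util.Map;

class InternTableSketch {

    // Maps each character sequence to the one String object that
    // represents it in the table.
    private final Map<String, String> interned = new HashMap<>();

    // Returns the previously interned object with the same characters,
    // or interns and returns the given object if there is none.
    String intern(String s) {
        String existing = interned.get(s);
        if (existing != null) {
            return existing;
        }
        interned.put(s, s);
        return s;
    }

    public static void main(String[] args) {
        InternTableSketch table = new InternTableSketch();
        String first = new String("Hi!");
        String second = new String("Hi!");
        System.out.println(table.intern(first) == first);   // prints "true"
        System.out.println(table.intern(second) == first);  // prints "true"
    }
}
```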

In your Java programs, you can intern a string by invoking the intern() method of class String. All literal strings are interned via the process of resolving CONSTANT_String_info entries. If a string with the same sequence of Unicode characters has been previously interned, the intern() method returns a reference to the matching already-interned String object. If the intern() method is invoked on a String object that contains a sequence of characters that has not yet been interned, that object itself will be interned. In that case, the intern() method will return a reference to the same String object upon which it was invoked.

Here's an example:

// On CD-ROM in file linking/ex1/Example1.java
class Example1 {

    // Assume this application is invoked with one command-line
    // argument, the string "Hi!".
    public static void main(String[] args) {

        // argZero, because it is assigned a String from the command
        // line, does not reference a string literal. This string
        // is not interned.
        String argZero = args[0];

        // literalString, however, does reference a string literal.
        // It will be assigned a reference to a String with the value
        // "Hi!" by an instruction that references a
        // CONSTANT_String_info entry in the constant pool. The
        // "Hi!" string will be interned by this process.
        String literalString = "Hi!";

        // At this point, there are two String objects on the heap
        // that have the value "Hi!". The one from args[0], which
        // isn't interned, and the one from the literal, which
        // is interned.
        System.out.print("Before interning argZero: ");
        if (argZero == literalString) {
            System.out.println("they're the same string object!");
        }
        else {
            System.out.println("they're different string objects.");
        }

        // argZero.intern() returns the reference to the literal
        // string "Hi!" that is already interned. Now both argZero
        // and literalString refer to the same String object. The
        // non-interned version of "Hi!" is now available for
        // garbage collection.
        argZero = argZero.intern();
        System.out.print("After interning argZero: ");
        if (argZero == literalString) {
            System.out.println("they're the same string object!");
        }
        else {
            System.out.println("they're different string objects.");
        }
    }
}

When executed with the string "Hi!" as the first command-line argument, the Example1 application prints the following:

Before interning argZero: they're different string objects.
After interning argZero: they're the same string object!

Resolution of Other Types of Entries

The CONSTANT_Integer_info, CONSTANT_Long_info, CONSTANT_Float_info, and CONSTANT_Double_info entries contain the constant values they represent within the entry itself. These are straightforward to resolve. To resolve this kind of entry, many virtual machine implementations may not have to do anything but use the value as is. Other implementations, however, may choose to do some processing on it. For example, a virtual machine on a little-endian machine could choose to swap the byte order of the value at resolve time.
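To make the byte-order remark concrete, this sketch decodes the four bytes of a CONSTANT_Integer_info value, which the class file stores in big-endian order, and shows the swapped form that an implementation might keep on a little-endian machine. The class name ConstantBytesDemo and the sample bytes are assumptions chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ConstantBytesDemo {

    // Decodes the four-byte value of a CONSTANT_Integer_info entry as
    // stored in the class file (big-endian).
    static int decode(byte[] fourBytes) {
        return ByteBuffer.wrap(fourBytes).order(ByteOrder.BIG_ENDIAN).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x00, 0x01, 0x00};
        int value = decode(bytes);
        System.out.println(value);                        // prints "256"
        // The same four bytes in the opposite order.
        System.out.println(Integer.reverseBytes(value));  // prints "65536"
    }
}
```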

Entries of type CONSTANT_Utf8_info and CONSTANT_NameAndType_info are never referred to directly by instructions. They are only referred to via other types of entries, and resolved when those referring entries are resolved.

Loading Constraints

A Java type can refer symbolically to another type in the constant pool in ways that require special attention when performing resolution to ensure type safety in the presence of multiple class loaders. When one type contains a symbolic reference to a field in another type, the symbolic reference includes a descriptor that specifies the type of the field. When one type contains a symbolic reference to a method in another type, the symbolic reference includes a descriptor that specifies the types of the return value and parameters, if any. If the referenced and referencing types do not have the same initiating loader, the virtual machine must make sure the types mentioned in the field and method descriptors are consistent across the namespaces. For example, imagine class Cat contains symbolic references to fields and methods declared in class Mouse, and that two different class loaders initiated the loading of Cat and Mouse. To preserve type safety in the presence of multiple class loaders, it is essential that the fully qualified type names mentioned in field and method descriptors contained in Cat refer to the same type data (in the method area) as those same names in class Mouse.

To ensure that Java virtual machine implementations enforce this type consistency across namespaces, the second edition of the Java virtual machine specification defined several loading constraints. Each Java virtual machine must maintain an internal list of these constraints, each of which basically states that a name in one namespace must refer to the same type data in the method area as the same name in another namespace. As a Java virtual machine encounters symbolic references to fields and methods of referenced types whose loading wasn't initiated by the same class loader that initiated loading of the referencing type, the virtual machine may add constraints to the list. The virtual machine must check that all current loading constraints are met when it resolves symbolic references.

To describe the loading constraints, the notation <C, Ld>^Li will be used to represent types, where the caret marks a superscript. C denotes the fully qualified name of the type. Ld denotes the defining class loader of the type. Li denotes the class loader that initiated loading of the type. When the defining class loader is irrelevant, the simplified notation C^Li will be used to denote the type and its initiating class loader. When the initiating loader is irrelevant, the simplified notation <C, Ld> will be used to denote the type and its defining class loader. An equals sign between two types denotes that both types are actually the exact same type, represented by the same type data in the method area.

Given this notation, the rules for generating loading constraints are:

If the virtual machine's internal list of constraints contains the two constraints T^L1 = T^L2 and T^L2 = T^L3, this implies that T^L1 = T^L3. Even if type T is never loaded by L2 during the execution of the virtual machine instance, the types named T loaded by L1 and L3 must still be the same exact type.
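The transitive behavior of the constraint list can be modeled with a small table. The sketch below is a hypothetical data structure, not the one any virtual machine uses: loaders are identified by plain strings, type data by arbitrary objects, and a violated constraint is reported by returning false. It shows that the name T as seen by loaders L1 and L3 must denote the same type data even though L2 never loads T.

```java
import java.util.HashMap;
import java.util.Map;

class ConstraintTableSketch {

    // Maps "typeName@loader" to a group number. Keys in the same group
    // are constrained to denote the same type data.
    private final Map<String, Integer> groupOf = new HashMap<>();
    // Type data bound to a group once any member has actually been loaded.
    private final Map<Integer, Object> typeDataOf = new HashMap<>();
    private int nextGroup = 0;

    private int groupFor(String typeName, String loader) {
        String key = typeName + "@" + loader;
        Integer group = groupOf.get(key);
        if (group == null) {
            group = nextGroup++;
            groupOf.put(key, group);
        }
        return group;
    }

    // Records that typeName must mean the same type for both loaders.
    // Returns false if the two sides are already bound to different data.
    boolean addConstraint(String typeName, String loaderA, String loaderB) {
        int a = groupFor(typeName, loaderA);
        int b = groupFor(typeName, loaderB);
        if (a == b) {
            return true;
        }
        Object dataA = typeDataOf.get(a);
        Object dataB = typeDataOf.get(b);
        if (dataA != null && dataB != null && dataA != dataB) {
            return false;
        }
        // Merge group b into group a.
        for (Map.Entry<String, Integer> entry : groupOf.entrySet()) {
            if (entry.getValue() == b) {
                entry.setValue(a);
            }
        }
        if (dataA == null && dataB != null) {
            typeDataOf.put(a, dataB);
        }
        typeDataOf.remove(b);
        return true;
    }

    // Records that loader has loaded typeName as typeData. Returns false
    // if that would violate a recorded constraint.
    boolean recordLoad(String typeName, String loader, Object typeData) {
        int group = groupFor(typeName, loader);
        Object existing = typeDataOf.get(group);
        if (existing != null && existing != typeData) {
            return false;
        }
        typeDataOf.put(group, typeData);
        return true;
    }

    public static void main(String[] args) {
        ConstraintTableSketch table = new ConstraintTableSketch();
        Object typeA = new Object();
        Object typeB = new Object();
        table.addConstraint("T", "L1", "L2");
        table.addConstraint("T", "L2", "L3");
        System.out.println(table.recordLoad("T", "L1", typeA)); // prints "true"
        System.out.println(table.recordLoad("T", "L3", typeB)); // prints "false"
    }
}
```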

For a less mathematical look at loading constraints, refer to the last example in this chapter. This example, which is presented in the section titled "Example: Type Safety and Loading Constraints," shows how the lack of loading constraints can enable an industrious cracker to thwart the Java virtual machine's guarantee of type safety.

Compile-Time Resolution of Constants

As mentioned in Chapter 7, "The Lifetime of a Class," references to static final variables initialized to a compile-time constant are resolved at compile-time to a local copy of the constant value. This is true for constants of all the primitive types and of type java.lang.String.
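One observable consequence of the local copy is that reading such a constant is not an active use of the class that declares it. In this sketch, whose class and field names are hypothetical, reading Limits.MAX does not initialize Limits, while reading a static final field that is not a compile-time constant does.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantInliningDemo {

    static final List<String> LOG = new ArrayList<>();

    static class Limits {
        // A compile-time constant: users get a local copy of the value.
        static final int MAX = 100;
        // Not a compile-time constant: resolved at run-time.
        static final Integer BOXED_MAX = Integer.valueOf(100);

        static { LOG.add("Limits initialized"); }
    }

    static int readConstant() {
        return Limits.MAX;
    }

    static int readNonConstant() {
        return Limits.BOXED_MAX;
    }

    static int initializationCount() {
        return LOG.size();
    }

    public static void main(String[] args) {
        readConstant();
        System.out.println(initializationCount());  // prints "0"
        readNonConstant();
        System.out.println(initializationCount());  // prints "1"
    }
}
```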

This special treatment of constants facilitates two features of the Java language. First, local copies of constant values enable static final variables to be used as case expressions in switch statements. The two virtual machine instructions that implement switch statements in bytecodes, tableswitch and lookupswitch, require the case values in-line in the bytecode stream. These instructions do not support run-time resolution of case values. See Chapter 16, "Control Flow," for more information about these instructions.
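As a brief illustration of the first feature, the case labels below refer to static final variables of another class. The names Signal and SwitchConstantsDemo are hypothetical. The compiler accepts these labels because the values are known at compile-time and can be placed in-line.

```java
class SwitchConstantsDemo {

    static class Signal {
        static final int RED = 0;
        static final int GREEN = 1;
    }

    static String describe(int signal) {
        switch (signal) {
            // Each label is a compile-time constant from class Signal.
            case Signal.RED:
                return "stop";
            case Signal.GREEN:
                return "go";
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Signal.GREEN));  // prints "go"
    }
}
```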

The other motivation behind the special treatment of constants is conditional compilation. Java supports conditional compilation via if statements whose expressions resolve to a compile-time constant. Here's an example:

// On CD-ROM in file linking/ex2/AntHill.java
class AntHill {

    static final boolean debug = true;
}

// On CD-ROM in file linking/ex2/Example2.java
class Example2 {

    public static void main(String[] args) {
        if (AntHill.debug) {
            System.out.println("Debug is true!");
        }
    }
}

Because of the special treatment of primitive constants, the Java compiler can decide whether or not to include the body of the if statement in Example2.main() depending upon the value of AntHill.debug. Because AntHill.debug is true in this case, javac generates bytecodes for Example2's main() method that include the body of the if statement, but not a check of AntHill.debug's value. The constant pool of Example2 has no symbolic reference to class AntHill. Here are the bytecodes for main():

              // Push objref from System.out
0 getstatic #8 
              // Push objref to literal string "Debug is true!"
3 ldc #1 
              // Pop objref (to a String), pop objref(to
              // System.out), invoke println() on System.out
              // passing the string as the only parameter:
              // System.out.println("Debug is true!");
5 invokevirtual #9 
8 return      // return void

If the reference to AntHill.debug were resolved at run-time, the compiler would always need to include a check of AntHill.debug's value and the body of the if statement just in case the value of AntHill.debug ever changed. The value of AntHill.debug can't change after it is compiled, of course, because it is declared as final. Still, you could change the source code of AntHill and recompile AntHill, but not recompile Example2.

Because the reference to AntHill.debug is resolved at compile-time, the compiler can conditionally compile out the body of the if statement if AntHill.debug is discovered to be false. Note that this means you can't change the behavior of the Example2 application just by setting AntHill.debug to false and recompiling only AntHill. You have to recompile Example2 as well.

Example3, shown below, is Example2 with its name changed to Example3 and compiled with an AntHill that has debug set to false:

// On CD-ROM in file linking/ex3/AntHill.java
class AntHill {

    static final boolean debug = false;
}

// On CD-ROM in file linking/ex3/Example3.java
class Example3 {

    public static void main(String[] args) {
        if (AntHill.debug) {
            System.out.println("Debug is true!");
        }
    }
}

Here are the bytecodes generated by javac for Example3's main() method:

0 return     // return void

As you can see, the Java compiler has brazenly eliminated the entire if statement found in Example3.main(). There is not even a hint of the println() invocation in this very short bytecode sequence.

Direct References

The ultimate goal of constant pool resolution is to replace a symbolic reference with a direct reference. The form of symbolic references is well-defined in Chapter 6, "The Java Class File," but what form do direct references take? As you might expect, the form of direct references is yet another decision of the designers of individual Java virtual machine implementations. Nevertheless, there are some characteristics likely to be common among most implementations.

Direct references to types, class variables, and class methods are likely native pointers into the method area. A direct reference to a type can simply point to the implementation-specific data structure in the method area that holds the type data. A direct reference to a class variable can point to the class variable's value stored in the method area. A direct reference to a class method can point to a data structure in the method area that contains the data needed to invoke the method. For example, the data structure for a class method could include information such as whether or not the method is native. If the method is native, the data structure could include a function pointer to the dynamically linked native method implementation. If the method is not native, the data structure could include the method's bytecodes, max_stack, max_locals, and so on. If there is a just-in-time-compiled version of the method, the data structure could include a pointer to that just-in-time-compiled native code.

Direct references to instance variables and instance methods are likely offsets. A direct reference to an instance variable is likely the offset from the start of the object's image to the location of the instance variable. A direct reference to an instance method is likely an offset into a method table.

Using offsets to represent direct references to instance variables and instance methods depends on a predictable ordering of the fields in a class's object image and the methods in a class's method table. Although implementation designers may choose any way of placing instance variables into an object image or methods into a method table, they will almost certainly use the same way for all types. Therefore, in any one implementation, the ordering of fields in an object and methods in a method table is defined and predictable.

As an example, consider this hierarchy of three classes and one interface:

// On CD-ROM in file linking/ex4/Friendly.java
interface Friendly {

    void sayHello();
    void sayGoodbye();
}

// On CD-ROM in file linking/ex4/Dog.java
class Dog {

    // How many times this dog wags its tail when
    // saying hello.
    private int wagCount = ((int) (Math.random() * 5.0)) + 1;

    void sayHello() {

        System.out.print("Wag");
        for (int i = 0; i < wagCount; ++i) {
            System.out.print(", wag");
        }
        System.out.println(".");
    }

    public String toString() {

        return "Woof!";
    }
}

// On CD-ROM in file linking/ex4/CockerSpaniel.java
class CockerSpaniel extends Dog implements Friendly {

    // How many times this Cocker Spaniel woofs when saying hello.
    private int woofCount = ((int) (Math.random() * 4.0)) + 1;

    // How many times this Cocker Spaniel wimpers when saying
    // goodbye.
    private int wimperCount = ((int) (Math.random() * 3.0)) + 1;

    public void sayHello() {

        // Wag that tail a few times.
        super.sayHello();

        System.out.print("Woof");
        for (int i = 0; i < woofCount; ++i) {
            System.out.print(", woof");
        }
        System.out.println("!");
    }

    public void sayGoodbye() {

        System.out.print("Wimper");
        for (int i = 0; i < wimperCount; ++i) {
            System.out.print(", wimper");
        }
        System.out.println(".");
    }
}

// On CD-ROM in file linking/ex4/Cat.java
class Cat implements Friendly {


    public void eat() {

        System.out.println("Chomp, chomp, chomp.");
    }

    public void sayHello() {

        System.out.println("Rub, rub, rub.");
    }

    public void sayGoodbye() {

        System.out.println("Scamper.");
    }

    protected void finalize() {

        System.out.println("Meow!");
    }
}

Assume these types are loaded into a Java virtual machine that organizes objects by placing the instance variables declared in superclasses into the object image before those declared in subclasses, and by placing the instance variables for each individual class in their order of appearance in the class file. Assuming there are no instance variables in class Object, the object images for Dog, CockerSpaniel, and Cat would appear as shown in Figure 8-1.

Figure 8-1. Some object images.

In this figure, the object image for CockerSpaniel best illustrates this particular virtual machine's approach to laying out objects. The instance variable for Dog, the superclass, appears before the instance variables for CockerSpaniel, the subclass. The instance variables of CockerSpaniel appear in order of declaration: woofCount first, then wimperCount.

Note that the wagCount instance variable appears at offset one in both Dog and CockerSpaniel. In this implementation of the Java virtual machine, a symbolic reference to the wagCount field of class Dog would be resolved to a direct reference that is an offset of one. Regardless of whether the actual object being referred to was a Dog, a CockerSpaniel, or any other subclass of Dog, the wagCount instance variable would always appear at offset one in the object image.
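
The layout rule assumed above can be sketched in a few lines of Java. This is only an illustration, not how any particular virtual machine is written: the ObjectLayout class and its methods are hypothetical, and reserving slot zero for a per-object header is an assumption made so that wagCount lands at offset one, as in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one virtual machine's object layout rule:
// superclass fields first, then a class's own fields in declaration order.
class ObjectLayout {
    private final List<String> slots = new ArrayList<>();

    private ObjectLayout() {
    }

    // Layout for a class whose only superclass is Object (assumed to have
    // no instance variables). Slot 0 is an assumed per-object header.
    static ObjectLayout root(String... fields) {
        ObjectLayout layout = new ObjectLayout();
        layout.slots.add("<header>");
        for (String field : fields) {
            layout.slots.add(field);
        }
        return layout;
    }

    // Layout for a subclass: copy the superclass slots, then append the
    // subclass's own fields.
    ObjectLayout extend(String... fields) {
        ObjectLayout layout = new ObjectLayout();
        layout.slots.addAll(slots);
        for (String field : fields) {
            layout.slots.add(field);
        }
        return layout;
    }

    // The "direct reference" to an instance variable is its slot offset.
    int offsetOf(String field) {
        return slots.indexOf(field);
    }

    public static void main(String[] args) {
        ObjectLayout dog = ObjectLayout.root("wagCount");
        ObjectLayout spaniel = dog.extend("woofCount", "wimperCount");
        System.out.println(dog.offsetOf("wagCount"));        // prints 1
        System.out.println(spaniel.offsetOf("wagCount"));    // prints 1
        System.out.println(spaniel.offsetOf("wimperCount")); // prints 3
    }
}
```

Because a subclass layout always begins with a copy of its superclass layout, an offset resolved against Dog remains valid for every subclass of Dog.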

A similar pattern emerges in method tables. A method table entry is associated in some way with data structures in the method area that contain sufficient data to enable the virtual machine to invoke the method. Assume that in the Java virtual machine implementation being described here, method tables are arrays of native pointers into the method area. The data structures that the method table entries point to are similar to the data structures described above for class methods. Assume that the particular Java virtual machine implementation that loads these types organizes its method tables by placing methods for superclasses into the method table before those for subclasses, and by placing pointers for each class in the order the methods appear in the class file. The exception to the ordering is that methods overridden by a subclass appear in the slot where the overridden method first appears in a superclass.

The way this virtual machine would organize the method table for class Dog is shown in Figure 8-2. In this figure, the method table entries that point to methods defined in class Object are shown in dark gray. Entries that point to methods defined in Dog are shown in light gray.

Figure 8-2. The method table for class Dog.

Note that only non-private instance methods appear in this method table. Class methods, which are invoked via the invokestatic instruction, need not appear here, because they are statically bound and don't need the extra indirection of a method table. Private methods and instance initialization methods need not appear here because they are invoked via the invokespecial instruction and are therefore statically bound. Only methods that are invoked with invokevirtual or invokeinterface appear in this method table. See Chapter 19, "Method Invocation and Return," for a discussion of the different invocation instructions.

By looking at the source code, you can see that Dog overrides the toString() method defined in class Object. In Dog's method table, the toString() method appears only once, in the same slot (offset seven) in which it appears in the method table for Object. The pointer residing at offset seven in Dog's method table points to the data for Dog's implementation of toString(). In this implementation of the Java virtual machine, the pointer to the method data for toString() will appear at offset seven for every method table of every class. (Actually, you could write your own version of java.lang.Object and load it in through a user-defined class loader. In this manner you could create a namespace in which the pointer to toString() occupies a method table offset other than seven in the same Java virtual machine implementation.)

Below the methods declared in Object, which appear first in this method table, come the methods declared in Dog that don't override any method in Object. There is only one such method, sayHello(), which has the method table offset 11. All of Dog's subclasses will either inherit or override this implementation of sayHello(), and some version of sayHello() will always appear at offset 11 of any subclass of Dog.

Figure 8-3 shows the method table for CockerSpaniel. Note that because CockerSpaniel declares sayHello() and sayGoodbye(), the pointers for those methods point to the data for CockerSpaniel's implementation of those methods. Because CockerSpaniel inherits Dog's implementation of toString(), the pointer for that method (which is still at offset seven) points to the data for Dog's implementation of that method. CockerSpaniel inherits all other methods from Object, so the pointers for those methods point directly into Object's type data. Note also that sayHello() is sitting at offset eleven, the same offset it has in Dog's method table.

Figure 8-3. The method table for class CockerSpaniel.

When the virtual machine resolves a symbolic reference (a CONSTANT_Methodref_info entry) to the toString() method of any class, the direct reference is method table offset seven. When the virtual machine resolves a symbolic reference to the sayHello() method of Dog or any of its subclasses, the direct reference is method table offset eleven. When the virtual machine resolves a symbolic reference to the sayGoodbye() method of CockerSpaniel or any of its subclasses, the direct reference is the method table offset twelve.
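
The method table rule assumed in this section can be modeled roughly as follows. The MethodTable and MethodTableDemo classes are hypothetical, strings stand in for native pointers into the method area, and the slot order chosen for Object's methods is an assumption picked so that toString() occupies offset seven as in the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: superclass methods first, new methods appended,
// overriding methods reuse the slot of the method they override.
class MethodTable {
    private final List<String> names = new ArrayList<>();   // method in each slot
    private final List<String> targets = new ArrayList<>(); // implementation each slot points to

    // superTable is null for the root of the hierarchy.
    MethodTable(MethodTable superTable, String className, String... declared) {
        if (superTable != null) {
            names.addAll(superTable.names);
            targets.addAll(superTable.targets);
        }
        for (String method : declared) {
            int slot = names.indexOf(method);
            if (slot >= 0) {
                targets.set(slot, className + "." + method); // override in place
            } else {
                names.add(method);                            // append a new slot
                targets.add(className + "." + method);
            }
        }
    }

    int offsetOf(String method) {
        return names.indexOf(method);
    }

    String targetAt(int offset) {
        return targets.get(offset);
    }
}

class MethodTableDemo {
    // Assumed slot order for Object's non-private instance methods.
    static final MethodTable OBJECT = new MethodTable(null, "Object",
            "clone()", "equals(Object)", "finalize()", "getClass()",
            "hashCode()", "notify()", "notifyAll()", "toString()",
            "wait()", "wait(long)", "wait(long,int)");
    static final MethodTable DOG =
            new MethodTable(OBJECT, "Dog", "sayHello()", "toString()");
    static final MethodTable SPANIEL =
            new MethodTable(DOG, "CockerSpaniel", "sayHello()", "sayGoodbye()");

    public static void main(String[] args) {
        System.out.println(DOG.offsetOf("toString()"));       // prints 7
        System.out.println(DOG.offsetOf("sayHello()"));       // prints 11
        System.out.println(SPANIEL.offsetOf("sayGoodbye()")); // prints 12
        System.out.println(SPANIEL.targetAt(7));              // prints Dog.toString()
        System.out.println(SPANIEL.targetAt(11));             // prints CockerSpaniel.sayHello()
    }
}
```

In this model an offset resolved once for a class stays valid for all of its subclasses, because a subclass table only overwrites or appends slots and never reorders them.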

Once a symbolic reference to an instance method is resolved to a method table offset, the virtual machine must still actually invoke the method. To invoke an instance method, the virtual machine goes through the object to get at the method table for the object's class. As mentioned in Chapter 5, "The Java Virtual Machine," given a reference to an object, every virtual machine implementation must have some way to get at the type data for that object's class. In addition, given a reference to an object, the method table (a part of the type data for the object's class) is usually very quickly accessible. (One potential scheme is shown in Figure 5-7.) Once the virtual machine has the method table for the object's class, it uses the offset to find the actual method to invoke. Voila!

The virtual machine can always depend on method table offsets when it has a reference of a class type (a CONSTANT_Methodref_info entry). If the sayHello() method appears at offset 11 in class Dog, it will appear at offset 11 in any subclass of Dog. The same is not true, however, if the reference is of an interface type (a CONSTANT_InterfaceMethodref_info entry). With direct references to instance methods accessed through an interface reference, there is no guaranteed method table offset. Consider the method table for class Cat, shown in Figure 8-4.

Figure 8-4. The method table for class Cat.

Note that both Cat and CockerSpaniel implement the Friendly interface. A variable of type Friendly could hold a reference to a Cat object or a CockerSpaniel object. With that reference, your program could invoke sayHello() or sayGoodbye() on a Cat, a CockerSpaniel, or any other object whose class implements the Friendly interface. The Example4 application demonstrates this:

// On CD-ROM in file linking/ex4/Example4.java
class Example4 {

    public static void main(String[] args) {

        Dog dog = new CockerSpaniel();

        dog.sayHello();

        Friendly fr = (Friendly) dog;

        // Invoke sayGoodbye() on a CockerSpaniel object through a
        // reference of type Friendly.
        fr.sayGoodbye();

        fr = new Cat();

        // Invoke sayGoodbye() on a Cat object through a reference
        // of type Friendly.
        fr.sayGoodbye();
    }
}

In Example4, local variable fr invokes sayGoodbye() on both a CockerSpaniel object and a Cat object. The same constant pool entry, a CONSTANT_InterfaceMethodref_info entry, is used to invoke this method on both objects. But when the virtual machine resolves the symbolic reference to sayGoodbye(), it can't just save a method table offset and expect that offset to always work in future uses of the constant pool entry.

The trouble is that classes that implement the Friendly interface aren't guaranteed to have a common superclass that also implements Friendly. As a result, the methods declared in Friendly aren't guaranteed to be in the same place in all method tables. If you compare the method table for CockerSpaniel against the method table for Cat, for example, you'll see that in CockerSpaniel, sayHello()'s pointer occupies offset 11. But in Cat, sayHello() occupies offset 12. Likewise, CockerSpaniel's sayGoodbye() method pointer resides in offset 12, but Cat's sayGoodbye() method pointer resides at offset 13.

Thus, whenever the Java virtual machine invokes a method from an interface reference, it must search the method table of the object's class until it finds the appropriate method. This is why invoking instance methods on interface references can be significantly slower than invoking instance methods on class references. Virtual machine implementations can attempt to be smart, of course, about how they search through a method table. For example, an implementation could save the last index at which it found the method and try there first the next time. Or an implementation could build data structures during preparation that help it search through method tables given an interface reference. Nevertheless, invoking a method given an interface reference will likely be to some extent slower than invoking a method given a class reference.
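
The "try the last index first" idea might look something like the sketch below. Everything here is hypothetical: the InterfaceCallSite class, the use of method names as table entries, and the placeholder entries standing in for Object's eleven method table slots.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical call site for an interface method that remembers the
// method table index at which it last found its target.
class InterfaceCallSite {
    private final String method;
    private int lastIndex = -1; // index found by the previous lookup, if any
    int searches = 0;           // number of full linear searches performed

    InterfaceCallSite(String method) {
        this.method = method;
    }

    // Returns the offset of the method in the given method table.
    int lookup(List<String> methodTable) {
        if (lastIndex >= 0 && lastIndex < methodTable.size()
                && methodTable.get(lastIndex).equals(method)) {
            return lastIndex; // fast path: the remembered index still works
        }
        searches++;
        int found = methodTable.indexOf(method); // slow path: linear search
        if (found < 0) {
            throw new IllegalArgumentException(method + " not in method table");
        }
        lastIndex = found;
        return found;
    }

    // Builds a table of 11 placeholder Object slots followed by the given methods.
    static List<String> table(String... own) {
        List<String> t = new ArrayList<>(Collections.nCopies(11, "<Object method>"));
        t.addAll(Arrays.asList(own));
        return t;
    }

    public static void main(String[] args) {
        List<String> spaniel = table("sayHello()", "sayGoodbye()");
        List<String> cat = table("eat()", "sayHello()", "sayGoodbye()");
        InterfaceCallSite site = new InterfaceCallSite("sayGoodbye()");
        System.out.println(site.lookup(spaniel)); // prints 12 after a search
        System.out.println(site.lookup(spaniel)); // prints 12 from the remembered index
        System.out.println(site.lookup(cat));     // prints 13 after a second search
    }
}
```

The remembered index pays off only while successive calls see objects of the same class. A call site that alternates between Cat and CockerSpaniel objects would fall back to searching every time.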

_quick Instructions

The first edition of the Java virtual machine specification described a technique used by one of Sun's early implementations of the Java virtual machine to speed up the interpretation of bytecodes. In this scheme, opcodes that refer to constant pool entries are replaced by a "_quick" opcode when the constant pool entry is resolved. When the virtual machine encounters a _quick instruction, it knows the constant pool entry is already resolved and can therefore execute the instruction faster.

The core instruction set of the Java virtual machine consists of 200 single-byte opcodes, all of which are described in Appendix A, "Instruction Set by Opcode Mnemonic." These 200 opcodes are the only opcodes you will ever see in class files. Virtual machine implementations that use the "_quick" technique use another 25 single-byte opcodes internally, the "_quick" opcodes.

For example, when a virtual machine that uses the _quick technique resolves a constant pool entry referred to by an ldc instruction (opcode value 0x12), it replaces the ldc opcode byte in the bytecode stream with an ldc_quick instruction (opcode value 0xcb). This technique is part of the process of replacing a symbolic reference with a direct reference in Sun's early virtual machine.

For some instructions, in addition to overwriting the normal opcode with a _quick opcode, a virtual machine that uses the _quick technique overwrites the operands of the instruction with data that represents the direct reference. For example, in addition to replacing an invokevirtual opcode with an invokevirtual_quick, the virtual machine also puts the method table offset and the number of arguments into the two operand bytes that follow every invokevirtual instruction. Placing the method table offset in the bytecode stream following the invokevirtual_quick opcode saves the virtual machine the time it would take to look up the offset in the resolved constant pool entry.
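
To make the opcode rewriting concrete, here is a minimal sketch of an interpreter loop fragment that handles only ldc. The QuickInterpreter class is hypothetical and the "resolution" it performs is a stand-in; only the two opcode values (0x12 and 0xcb) come from the text above.

```java
// Hypothetical miniature interpreter that rewrites ldc to ldc_quick.
class QuickInterpreter {
    static final int LDC = 0x12;
    static final int LDC_QUICK = 0xcb;

    final int[] code;         // alternating opcode and constant pool index
    final Object[] pool;      // constant pool entries
    final boolean[] resolved; // whether each entry already holds a direct reference
    int resolutions = 0;      // number of entries resolved so far

    QuickInterpreter(int[] code, Object[] pool) {
        this.code = code;
        this.pool = pool;
        this.resolved = new boolean[pool.length];
    }

    // Executes the two-byte ldc or ldc_quick at pc and returns the value
    // that would be pushed onto the operand stack.
    Object execute(int pc) {
        int index = code[pc + 1];
        if (code[pc] == LDC) {
            if (!resolved[index]) {
                // Stand-in for the real work of resolving the entry.
                pool[index] = "direct:" + pool[index];
                resolved[index] = true;
                resolutions++;
            }
            code[pc] = LDC_QUICK; // later executions skip the resolution check
        } else if (code[pc] != LDC_QUICK) {
            throw new IllegalStateException("unsupported opcode " + code[pc]);
        }
        return pool[index];
    }

    public static void main(String[] args) {
        int[] code = {LDC, 1, LDC, 1};
        QuickInterpreter vm = new QuickInterpreter(code, new Object[] {null, "Hello"});
        System.out.println(vm.execute(0));                   // prints direct:Hello
        System.out.println(Integer.toHexString(code[0]));    // prints cb
        System.out.println(vm.execute(2));                   // prints direct:Hello
        System.out.println(vm.resolutions);                  // prints 1
    }
}
```

Note that the second ldc refers to an entry that is already resolved, so the sketch only swaps the opcode and does no resolution work.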

Example: The Linking of the Salutation Application

As an example of Java's linking model, consider the Salutation application shown below:

// On CD-ROM in file linking/ex5/Salutation.java
class Salutation {

    private static final String hello = "Hello, world!";
    private static final String greeting = "Greetings, planet!";
    private static final String salutation = "Salutations, orb!";

    private static int choice = (int) (Math.random() * 2.99);

    public static void main(String[] args) {

        String s = hello;
        if (choice == 1) {
            s = greeting;
        }
        else if (choice == 2) {
            s = salutation;
        }

        System.out.println(s);
    }
}

Assume that you have asked a Java virtual machine to run Salutation. When the virtual machine starts, it attempts to invoke the main() method of Salutation. It quickly realizes, however, that it can't invoke main(). The invocation of a method declared in a class is an active use of that class, which is not allowed until the class is initialized. Thus, before the virtual machine can invoke main(), it must initialize Salutation. And before it can initialize Salutation, it must load and link Salutation. So, the virtual machine hands the fully qualified name of Salutation to the bootstrap class loader, which retrieves the binary form of the class, parses the binary data into internal data structures, and creates an instance of java.lang.Class. The constant pool for Salutation is shown in Table 8-1.

Index Type Value
1 CONSTANT_String_info 30
2 CONSTANT_String_info 31
3 CONSTANT_String_info 39
4 CONSTANT_Class_info 37
5 CONSTANT_Class_info 44
6 CONSTANT_Class_info 45
7 CONSTANT_Class_info 46
8 CONSTANT_Class_info 47
9 CONSTANT_Methodref_info 7, 16
10 CONSTANT_Fieldref_info 4, 17
11 CONSTANT_Fieldref_info 8, 18
12 CONSTANT_Methodref_info 5, 19
13 CONSTANT_Methodref_info 6, 20
14 CONSTANT_Double_info 2.99
16 CONSTANT_NameAndType_info 26, 22
17 CONSTANT_NameAndType_info 41, 32
18 CONSTANT_NameAndType_info 49, 34
19 CONSTANT_NameAndType_info 50, 23
20 CONSTANT_NameAndType_info 51, 21
21 CONSTANT_Utf8_info "()D"
22 CONSTANT_Utf8_info "()V"
23 CONSTANT_Utf8_info "(Ljava/lang/String;)V"
24 CONSTANT_Utf8_info "([Ljava/lang/String;)V"
25 CONSTANT_Utf8_info "<clinit>"
26 CONSTANT_Utf8_info "<init>"
27 CONSTANT_Utf8_info "Code"
28 CONSTANT_Utf8_info "ConstantValue"
29 CONSTANT_Utf8_info "Exceptions"
30 CONSTANT_Utf8_info "Greetings, planet!"
31 CONSTANT_Utf8_info "Hello, world!"
32 CONSTANT_Utf8_info "I"
33 CONSTANT_Utf8_info "LineNumberTable"
34 CONSTANT_Utf8_info "Ljava/io/PrintStream;"
35 CONSTANT_Utf8_info "Ljava/lang/String;"
36 CONSTANT_Utf8_info "LocalVariables"
37 CONSTANT_Utf8_info "Salutation"
38 CONSTANT_Utf8_info "Salutation.java"
39 CONSTANT_Utf8_info "Salutations, orb!"
40 CONSTANT_Utf8_info "SourceFile"
41 CONSTANT_Utf8_info "choice"
42 CONSTANT_Utf8_info "greeting"
43 CONSTANT_Utf8_info "hello"
44 CONSTANT_Utf8_info "java/io/PrintStream"
45 CONSTANT_Utf8_info "java/lang/Math"
46 CONSTANT_Utf8_info "java/lang/Object"
47 CONSTANT_Utf8_info "java/lang/System"
48 CONSTANT_Utf8_info "main"
49 CONSTANT_Utf8_info "out"
50 CONSTANT_Utf8_info "println"
51 CONSTANT_Utf8_info "random"
52 CONSTANT_Utf8_info "salutation"

Table 8-1. Class Salutation's constant pool

As part of the loading process for Salutation, the Java virtual machine must make sure all of Salutation's superclasses have been loaded. To start this process, the virtual machine looks into Salutation's type data at the super_class item, which is a seven. The virtual machine looks up entry seven in the constant pool, and finds a CONSTANT_Class_info entry that serves as a symbolic reference to class java.lang.Object. See Figure 8-5 for a graphical depiction of this symbolic reference. The virtual machine resolves this symbolic reference, which causes it to load class Object. Because Object is the top of Salutation's inheritance hierarchy, the virtual machine also links and initializes Object.

Figure 8-5. The symbolic reference from Salutation to Object.

Now that the Java virtual machine has loaded the Salutation class and loaded, linked and initialized all its superclasses, the virtual machine is ready to link Salutation. As the first step in the linking process, the virtual machine verifies the integrity of the binary representation of class Salutation. Assume this implementation of the Java virtual machine performs all verification up front, except for the verification of symbolic references. So by the time this official verification phase of linking is completed, the virtual machine will have verified:

  1. that Salutation's binary data is structurally correct
  2. that Salutation correctly implements the semantics of the Java language
  3. that Salutation's bytecodes won't crash the virtual machine

After the Java virtual machine has verified Salutation, it must prepare for Salutation's use by allocating any memory needed by the class. At this stage, the virtual machine allocates memory for Salutation's class variable, choice, and gives it a default initial value. Because the choice class variable is an int, it receives the default initial value of zero.

The three literal Strings--hello, greeting, and salutation--are constants, not class variables. They do not occupy memory space as class variables in the method area. They don't receive default initial values. Because they are declared static and final, they appear as CONSTANT_String_info entries in Salutation's constant pool. The constant pool for Salutation that was generated by javac is shown in Table 8-1. The entries that represent Salutation's constant strings are: for greeting, entry one; for hello, entry two; and for salutation, entry three.

After the processes of verification and preparation have successfully completed, the class is ready for resolution. As mentioned above, different implementations of the Java virtual machine may perform the resolution phase of linking at different times. Resolution of Salutation is optional at this point in its lifetime. Java virtual machines are not required to perform resolution until each symbolic reference is actually used by the program. If a symbolic reference is never actually used by a program, the virtual machine is not required to resolve it.

A Java virtual machine implementation could perform the recursive resolution process, described above for Salutation, at this point in the lifetime of a program. If so, the program would be completely linked before main() is ever invoked. A different Java virtual machine implementation could perform none of the resolution process at this point. Instead, it could resolve each symbolic reference the first time it is actually used by the running program. Other implementations could choose a resolution strategy between these two extremes. Although different implementations may perform resolution at different times, all implementations will ensure that a type is loaded, verified, prepared, and initialized before it is used.

Assume this implementation of the Java virtual machine uses late resolution. As each symbolic reference is used for the first time by the program, it will be checked for accuracy and converted into a direct reference. Assume also that this implementation uses the technique of replacing opcodes that refer to the constant pool with their _quick equivalents.

Once this Java virtual machine implementation has loaded, verified, and prepared Salutation, it is ready to initialize it. As mentioned above, the Java virtual machine must initialize all superclasses of a class before it can initialize the class. In this case, the virtual machine has already initialized Object, the superclass of Salutation.
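
The superclass-first rule can be observed from ordinary Java code. The following sketch uses hypothetical classes (InitOrder, Parent, Child) whose static initializers record the order in which they run.

```java
// Hypothetical classes that log the order of class initialization.
class InitOrder {
    static final StringBuilder log = new StringBuilder();

    static class Parent {
        static {
            log.append("Parent;");
        }
    }

    static class Child extends Parent {
        // Not a compile-time constant, so reading it is an active use of Child.
        static int value = 1;

        static {
            log.append("Child;");
        }
    }

    // Triggers initialization of Child (and therefore Parent) on first call.
    static String run() {
        int ignored = Child.value;
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints Parent;Child;
        System.out.println(run()); // prints Parent;Child; again, nothing is re-initialized
    }
}
```

The first access to Child.value forces Parent to be initialized before Child. Later accesses find both classes already initialized, so the log does not grow.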

After the virtual machine has made sure all of Salutation's superclasses have been initialized (in this case, just class Object), it is ready to invoke Salutation's <clinit>() method. Because Salutation contains a class variable, choice, that has an initializer that doesn't resolve at compile-time to a constant, the compiler does place a <clinit>() method into Salutation's class file.

Here's the <clinit>() method for Salutation:

             // Invoke class method Math.random(), passing no
             // parameters. Push double result.
 0 invokestatic #13 <Method double random()>
            // Push double constant 2.99 from constant pool.
 3 ldc2_w #14 <Double 2.99>
 6 dmul     // Pop two doubles, multiply, push double result.
 7 d2i      // Pop double, convert to int, push int result.
            // Pop int, store int Salutation.choice
 8 putstatic #10 <Field int choice>
11 return   // Return void from <clinit>()

The Java virtual machine executes Salutation's <clinit>() method to set the choice field to its proper initial value. Before executing <clinit>(), choice has its default initial value of zero. After executing <clinit>(), it has one of three values chosen pseudo-randomly: zero, one, or two.
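
To see why only zero, one, and two are possible, recall that Math.random() returns a double greater than or equal to 0.0 and less than 1.0, and that casting a positive double to int truncates toward zero. The small sketch below applies Salutation's expression to the two extremes of that range; the ChoiceRange class and choiceFor() method are hypothetical names used only for this illustration.

```java
// Hypothetical helper that applies Salutation's initializer expression
// to a supplied value instead of a fresh call to Math.random().
class ChoiceRange {
    static int choiceFor(double random) {
        return (int) (random * 2.99);
    }

    public static void main(String[] args) {
        System.out.println(choiceFor(0.0));                // prints 0
        System.out.println(choiceFor(0.5));                // prints 1
        // Math.nextDown(1.0) is the largest double below 1.0.
        System.out.println(choiceFor(Math.nextDown(1.0))); // prints 2
    }
}
```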

The first instruction of the <clinit>() method, invokestatic #13, refers to constant pool entry 13, a CONSTANT_Methodref_info that represents a symbolic reference to the random() method of class java.lang.Math. You can see a graphical depiction of this symbolic reference in Figure 8-6. The Java virtual machine resolves this symbolic reference, which causes it to load, link, and initialize class java.lang.Math. It places a direct reference to the random() method into constant pool entry 13, marks the entry as resolved, and replaces the invokestatic opcode with invokestatic_quick.

Figure 8-6. The symbolic reference from Salutation to Math.random().

Having completed the resolution process for constant pool entry 13, the Java virtual machine is ready to invoke the method. When the virtual machine actually invokes the random() method, it will load, link, and initialize any types referenced symbolically from Math's constant pool and random()'s code. When this method returns, the virtual machine will push the returned double value onto the <clinit>() method's operand stack.

To execute the next instruction, ldc2_w #14, the virtual machine looks into constant pool entry 14 and finds an unresolved CONSTANT_Double_info entry. The virtual machine resolves this entry to the double value 2.99, marks the entry as resolved, and replaces the ldc2_w opcode with ldc2_w_quick. Once the virtual machine has resolved constant pool entry 14, it pushes the constant double value, 2.99, onto the operand stack.

Note that this entry, a CONSTANT_Double_info, does not refer to any other constant pool entry or item outside this class. The eight bytes of the double value 2.99 are specified within the entry itself.

Note also that in this constant pool, there is no entry with an index of 15. As mentioned in Chapter 6, "The Java Class File," entries of type CONSTANT_Double_info and CONSTANT_Long_info occupy two slots in the constant pool. Thus, the CONSTANT_Double_info at index 14 is considered to occupy both indices 14 and 15.
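
The two-slot rule is easy to express in code. The sketch below assigns constant pool indices to a run of entries given their tags; the PoolIndexer class is hypothetical, while the tag values used (5 for CONSTANT_Long, 6 for CONSTANT_Double, 10 for CONSTANT_Methodref, 12 for CONSTANT_NameAndType) are those defined by the class file format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that numbers constant pool entries the way a
// class file parser must: long and double entries consume two indices.
class PoolIndexer {
    static final int CONSTANT_LONG = 5;
    static final int CONSTANT_DOUBLE = 6;

    // Given the index of the first entry and the tags of consecutive
    // entries in file order, returns the index assigned to each entry.
    static List<Integer> indices(int firstIndex, int... tags) {
        List<Integer> result = new ArrayList<>();
        int index = firstIndex;
        for (int tag : tags) {
            result.add(index);
            boolean wide = tag == CONSTANT_LONG || tag == CONSTANT_DOUBLE;
            index += wide ? 2 : 1;
        }
        return result;
    }

    public static void main(String[] args) {
        // Entries 13 (Methodref), 14 (Double), and the one after it
        // (NameAndType) from Salutation's constant pool.
        System.out.println(indices(13, 10, 6, 12)); // prints [13, 14, 16]
    }
}
```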

To execute the next instruction, dmul, the virtual machine pops two doubles, multiplies them, and pushes the double result. For the next instruction, the virtual machine pops the double, converts it to int, and pushes the int result. Assume that for this particular execution of Salutation, the result of this operation is the int value two.

The next instruction, putstatic #10, uses another symbolic reference from the constant pool, this one to the choice variable of Salutation itself. This instruction illustrates that a class's bytecodes use symbolic references to refer not only to fields and methods of other types, but also to its own fields and methods. When the virtual machine executes this instruction, it looks up constant pool entry 10 and finds an as yet unresolved CONSTANT_Fieldref_info item. See Figure 8-7 for a graphical depiction of this symbolic reference. The virtual machine resolves the reference by locating the choice class variable in Salutation's type data in the method area, and placing a pointer to the actual variable data in constant pool entry 10. It marks the entry as resolved and replaces the putstatic opcode with putstatic_quick.

Figure 8-7. The symbolic reference from Salutation to its own choice field.

Once it has resolved the CONSTANT_Fieldref_info entry for choice, the virtual machine pops an int (in this case a two) from the operand stack and places it into the choice variable. The execution of the putstatic instruction is now complete.

Lastly, the virtual machine executes the return instruction, which signals to the virtual machine that the <clinit>() method, and hence the initialization of class Salutation, is complete. Now that class Salutation has been initialized, it is finally ready for use. The Java virtual machine invokes main(), and the program begins. Here's the bytecode sequence for Salutation's main() method:

                // Push objref to literal string from constant pool
                // entry 2
 0 ldc #2 <String "Hello, world!">
 2 astore_1     // Pop objref into loc var 1: String s = hello;
                // Push int from static field Salutation.choice. Note
                // that by this time, choice has definitely been
                // given its proper initial value.
 3 getstatic #10 <Field int choice>
 6 iconst_1     // Push int constant 1
                // Pop two ints, compare, if not equal branch to 16:
 7 if_icmpne 16 // if (choice == 1) {
                // Here, choice does equal 1. Push objref to string
                // literal from constant pool:
10 ldc #1 <String "Greetings, planet!">
12 astore_1     // Pop objref into loc var 1: s = greeting;
13 goto 26      // Branch unconditionally to offset 26
                // Push int from static field Salutation.choice
16 getstatic #10 <Field int choice>
19 iconst_2     // Push int constant 2
                // Pop two ints, compare, if not equal branch to 26:
20 if_icmpne 26 // if (choice == 2) {
                // Here, choice does equal 2. Push objref to string
                // literal from constant pool:
23 ldc #3 <String "Salutations, orb!">
25 astore_1     // Pop objref into loc var 1: s = salutation;
                // Push objref from System.out
26 getstatic #11 <Field java.io.PrintStream out>
29 aload_1      // Push objref (to a String) from loc var 1
                // Pop objref (to a String), pop objref(to
                // System.out), invoke println() on System.out
                // passing the string as the only parameter:
                // System.out.println(s);
30 invokevirtual #12 <Method void println(java.lang.String)>
33 return       // Return void from main()

The first instruction in main(), ldc #2, uses a symbolic reference to the string literal "Hello, world!". When the virtual machine executes this instruction, it looks up constant pool entry two and finds a CONSTANT_String_info item that hasn't yet been resolved. See Figure 8-8 for a graphical depiction of the symbolic reference to this string literal.

Figure 8-8. A symbolic reference from Salutation to "Hello, world!"

As part of executing the ldc instruction, the virtual machine resolves the constant pool entry. It creates and interns a new String object with the value "Hello, world!", places a reference to the string object in the constant pool entry, marks the entry as resolved, and replaces the ldc opcode with an ldc_quick.
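
The effect of interning is visible at the Java language level. In the sketch below (the InternDemo class is a hypothetical name), two occurrences of the same literal yield the same String object, while a copy made with new String() is a different object until it is interned.

```java
// Hypothetical demonstration that string literals are interned.
class InternDemo {
    // Two occurrences of the same literal refer to one String object.
    static boolean sameLiteral() {
        String a = "Hello, world!";
        String b = "Hello, world!";
        return a == b;
    }

    // A copy is a distinct object, but interning it yields the literal's object.
    static boolean internedCopy() {
        String copy = new String("Hello, world!");
        return copy != "Hello, world!" && copy.intern() == "Hello, world!";
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());  // prints true
        System.out.println(internedCopy()); // prints true
    }
}
```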

Now that the virtual machine has resolved the "Hello, world!" string literal, it pushes the reference to that String object onto the stack. The next instruction, astore_1, pops the reference and stores it into local variable position one, the s variable.

To execute the next instruction, getstatic #10, the virtual machine looks up constant pool entry 10 and discovers a CONSTANT_Fieldref_info entry that has already been resolved. This entry, a symbolic reference to Salutation's own choice field, was resolved by the putstatic #10 instruction in the <clinit>() method. The virtual machine simply replaces the getstatic opcode with getstatic_quick, and pushes the int value of choice onto the stack.

To execute main()'s next instruction, iconst_1, the virtual machine simply pushes int one onto the operand stack. For the next instruction, if_icmpne 16, the virtual machine pops the top two ints and subtracts one from the other. In this case, since the value of choice was set by the <clinit>() method to be two, the result of the subtraction is not zero. As a consequence, the virtual machine takes the branch. It updates the pc register so that the next instruction it executes is the getstatic instruction at offset 16.

The getstatic instruction at offset 16 refers to the same constant pool entry referred to by the getstatic instruction at offset three: constant pool entry 10. When the virtual machine executes the getstatic at offset 16, it looks up constant pool entry 10 and finds a CONSTANT_Fieldref_info entry that is already resolved. It replaces the getstatic opcode with getstatic_quick, and pushes the int value of Salutation's choice class variable (a two) onto the operand stack.

To execute the next instruction, iconst_2, the virtual machine pushes an int two onto the stack. For the next instruction, if_icmpne 26, the virtual machine again pops two ints and subtracts one from the other. This time, however, both ints equal two, so the result of the subtraction is zero. As a consequence, the virtual machine does not take the branch and continues on to execute the next instruction in the bytecode array, another ldc.

The ldc instruction at offset 23 refers to constant pool entry three, a CONSTANT_String_info entry that serves as a symbolic reference to the string literal "Salutations, orb!". The virtual machine looks up this entry in the constant pool and discovers it is as yet unresolved. To resolve the entry, the virtual machine creates and interns a new String object with the value "Salutations, orb!", places a reference to the new object in the data for constant pool entry three, and replaces the ldc opcode with ldc_quick. Having resolved the string literal, the virtual machine pushes the reference to the String object onto the stack.
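The effect of interning can be observed from Java source code. The following is a minimal sketch, not part of the Salutation example; the class name InternDemo and its method names are made up for illustration. It shows that two occurrences of the same literal yield the same String object, and that intern() maps an equal run-time String back to that object:

```java
public class InternDemo {

    // Two occurrences of the same literal refer to one interned
    // String object, so the references compare equal with ==.
    public static boolean literalsShareOneObject() {
        String first = "Salutations, orb!";
        String second = "Salutations, orb!";
        return first == second;
    }

    // A String built at run time is a distinct object, but intern()
    // returns the already interned object with the same contents.
    public static boolean internReturnsLiteral() {
        String literal = "Salutations, orb!";
        String copy = new String(literal.toCharArray());
        return copy != literal && copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject()); // prints "true"
        System.out.println(internReturnsLiteral());   // prints "true"
    }
}
```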

To execute the next instruction, astore_1, the virtual machine pops the object reference to the "Salutations, orb!" string literal off the stack and stores it into local variable slot one, overwriting the reference to "Hello, world!" written there by the astore_1 instruction at offset two.

The next instruction, getstatic #11, uses a symbolic reference to a public static class variable of java.lang.System with the name out and the type java.io.PrintStream. This symbolic reference occupies the CONSTANT_Fieldref_info entry at index 11 in the constant pool. See Figure 8-9 for a graphical depiction of this symbolic reference.

Figure 8-9. The symbolic reference from Salutation to System.out.

To resolve the reference to System.out, the Java virtual machine must load, link, and initialize java.lang.System to make sure it has a public static field, named out, of type java.io.PrintStream. Then, the virtual machine will replace the symbolic reference with a direct reference, such as a native pointer, so that any future uses of System.out by Salutation won't require resolution and will be faster. Lastly, the virtual machine will replace the getstatic opcode with getstatic_quick.

Once the virtual machine has successfully resolved the symbolic reference, it will push the reference to System.out onto the stack. To execute the next instruction, aload_1, the virtual machine simply pushes onto the stack the object reference from local variable one, which is the reference to the "Salutations, orb!" string literal.

To execute the next instruction, invokevirtual #12, the Java virtual machine looks up constant pool entry 12 and finds an unresolved CONSTANT_Methodref_info entry, a symbolic reference to the println() method of java.io.PrintStream. See Figure 8-10 for a graphical depiction of this symbolic reference. The virtual machine loads, links, and initializes java.io.PrintStream, and makes sure it has a println() method that is public, returns void, and takes a String argument. It marks the entry as resolved and puts a direct reference (an index into PrintStream's method table) into the data for the resolved constant pool entry. Lastly, the virtual machine replaces the invokevirtual opcode with invokevirtual_quick, and places the method table index and the number of arguments accepted by the method as operands to the invokevirtual_quick opcode.

Figure 8-10. The symbolic reference from Salutation to PrintStream.println().

When the virtual machine actually invokes the println() method, it will load, link, and initialize any types referenced symbolically from PrintStream's constant pool and println()'s code.

The next instruction is the last instruction of the main() method: return. Because main() was being executed by the only non-daemon thread running in the Salutation application, executing the return instruction will cause the virtual machine to exit. Note that constant pool entry one, which contained a symbolic reference to the "Greetings, planet!" string literal, was never resolved during this execution of the Salutation application. Because choice happened to be initialized with a value of two, the instruction that referred to constant pool entry one, the ldc #1 instruction at offset 10, was never executed. As a result, the virtual machine never created a String object with the value "Greetings, planet!".

Example: The Dynamic Extension of the Greet Application

As an example of an application that performs dynamic extension through user-defined class loaders, consider the following class:

// On CD-ROM in file linking/ex6/Greet.java
import com.artima.greeter.*;

public class Greet {

    // Arguments to this application:
    //     args[0] - path name of directory in which class files
    //               for greeters are stored
    //     args[1], args[2], ... - class names of greeters to load
    //               and invoke the greet() method on.
    //
    // All greeters must implement the com.artima.greeter.Greeter
    // interface.
    //
    static public void main(String[] args) {

        if (args.length <= 1) {
            System.out.println(
                "Enter base path and greeter class names as args.");
            return;
        }

        GreeterClassLoader gcl = new GreeterClassLoader(args[0]);

        for (int i = 1; i < args.length; ++i) {
            try {

                // Load the greeter specified on the command line
                Class c = gcl.loadClass(args[i]);

                // Instantiate it into a greeter object
                Object o = c.newInstance();

                // Cast the Object ref to the Greeter interface type
                // so greet() can be invoked on it
                Greeter greeter = (Greeter) o;

                // Greet the world in this greeter's special way
                greeter.greet();
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

The Greet application is a fancy incarnation of the typical "Hello, world!" program. Greet uses a user-defined class loader to dynamically extend itself with classes--called "greeters"--that do the actual work of telling the world hello.

A greeter is any class that implements the com.artima.greeter.Greeter interface:

// On CD-ROM in file linking/ex6/com/artima/greeter/Greeter.java
package com.artima.greeter;

public interface Greeter {

    void greet();
}

As you can see from the code above, the Greeter interface declares only one method: greet(). When a greeter object's greet() method is invoked, the object should say hello to the world in its own unique way. Here are a few examples of greeters:

// On CD-ROM in file linking/ex6/greeters/Hello.java
import com.artima.greeter.Greeter;

public class Hello implements Greeter {

    public void greet() {
        System.out.println("Hello, world!");
    }
}

// On CD-ROM in file linking/ex6/greeters/Greetings.java
import com.artima.greeter.Greeter;

public class Greetings implements Greeter {

    public void greet() {
        System.out.println("Greetings, planet!");
    }
}

// On CD-ROM in file linking/ex6/greeters/Salutations.java
import com.artima.greeter.Greeter;

public class Salutations implements Greeter {

    public void greet() {
        System.out.println("Salutations, orb!");
    }
}

// On CD-ROM in file linking/ex6/greeters/HowDoYouDo.java
import com.artima.greeter.Greeter;

public class HowDoYouDo implements Greeter {

    public void greet() {
        System.out.println("How do you do, globe!");
    }
}

Greeters can be more complex than the above four examples. Here's an example of a greeter that chooses a greeting based on the time of day:

// On CD-ROM in file linking/ex6/greeters/HiTime.java
import com.artima.greeter.Greeter;
import java.util.Date;

public class HiTime implements Greeter {

    public void greet() {

        // Date's no-arg constructor initializes itself to the
        // current date and time
        Date date = new Date();
        int hours = date.getHours();

        // Some hours: midnight, 0; noon, 12; 11PM, 23;
        if (hours >= 4 && hours <= 11) {
            System.out.println("Good morning, world!");
        }
        else if (hours >= 12 && hours <= 16) {
            System.out.println("Good afternoon, world!");
        }
        else if (hours >= 17 && hours <= 21) {
            System.out.println("Good evening, world!");
        }
        else {
            System.out.println("Good night, world!");
        }
    }
}

The Greet application doesn't know at compile-time what greeter classes it will load and where those classes will be stored. At run-time it takes a directory path as its first command-line argument and greeter class names as subsequent arguments. It attempts to load the greeters using the path name as a base directory.

For example, imagine you invoke the Greet application with the following command line:

java Greet greeters Hello

In this command line, java is the name of the Java virtual machine executable. Greet is the class name of the Greet application. greeters is the name of a directory relative to the current directory in which the Greet application should look for greeters. Hello is the name of the greeter.

When the Greet application is invoked with the above command line, it attempts to load greeters/Hello.class and invoke Hello's greet() method. If the Hello.class file is indeed sitting in a directory named greeters, the application will print:

Hello, world!

The Greet application can handle more than one greeter. If you invoke it with the following command line:

java Greet greeters Hello Greetings Salutations HowDoYouDo

the Greet application will load each of the four greeters listed and invoke their greet() methods, yielding the following output:

Hello, world!
Greetings, planet!
Salutations, orb!
How do you do, globe!

The Greet application works by first checking to make sure there are at least two command-line arguments: a directory path and at least one greeter class name. It then instantiates a new GreeterClassLoader object, which will be responsible for loading the greeters. (The inner workings of class GreeterClassLoader, a subclass of java.lang.ClassLoader, will be described later in this section.) The constructor for GreeterClassLoader accepts a String that it uses as a directory path in which to look for greeters.

After it has created the GreeterClassLoader object, the Greet application invokes its loadClass() method for each greeter name that appears on the command line. When it invokes loadClass(), it passes the greeter class name, args[i], as the sole parameter:

// Load the greeter specified on the command line
Class c = gcl.loadClass(args[i]);

If the loadClass() method is unsuccessful, it throws an exception or error. If the loadClass() method is successful, it returns the Class instance for the newly loaded type.

Note that in addition to being loaded, the type requested of loadClass() may already be linked and initialized by the time loadClass() returns. If the type had been actively used prior to the loadClass() invocation that requested it, that active use would have triggered its loading, linking, and initialization. Regardless, the type will definitely be initialized by the time the next statement, which calls newInstance() on the Class reference, creates the object. If the type has not yet been initialized, calling newInstance() triggers the initialization of the type (which must be a class), because a class must be initialized before an object of that class is instantiated.
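The difference between loading and initialization can be observed with a static initializer. The following is a minimal sketch under stated assumptions: LoadVsInit and its nested Probe class are made-up names standing in for Greet and a greeter, the class loader is simply the one that loaded the demo itself, and the sketch uses getDeclaredConstructor().newInstance(), the form recommended in later Java releases in place of Class.newInstance(). Probe's static initializer sets a flag, so the flag reveals the point at which the class is initialized:

```java
public class LoadVsInit {

    // Set by Probe's static initializer, so it records whether Probe
    // has been initialized yet.
    static boolean probeInitialized = false;

    // Stand-in for a greeter class.
    public static class Probe {
        static {
            LoadVsInit.probeInitialized = true;
        }

        public Probe() {
        }
    }

    // Returns "<initialized after loadClass()>,<initialized after newInstance()>".
    public static String run() {
        try {
            ClassLoader loader = LoadVsInit.class.getClassLoader();
            String probeName = LoadVsInit.class.getName() + "$Probe";

            // loadClass() loads the type but does not initialize it.
            Class<?> c = loader.loadClass(probeName);
            boolean afterLoad = probeInitialized;

            // Creating an instance is an active use, which forces
            // initialization before the constructor runs.
            c.getDeclaredConstructor().newInstance();
            boolean afterNew = probeInitialized;

            return afterLoad + "," + afterNew;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call in a fresh virtual machine, the flag is still false after loadClass() returns and true after the instance is created. Because a class is initialized only once, later calls in the same virtual machine find the flag already set.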

Once loadClass() has returned a Class instance, the Greet application's main() method instantiates a new instance of the greeter by calling newInstance() on the Class instance:

// Instantiate it into a greeter object
Object o = c.newInstance();

When the newInstance() method is invoked on a Class object, the virtual machine creates and initializes a new instance of the class represented by the Class object. To initialize the new instance, the virtual machine invokes its no-arg constructor. (Note that for this statement to work without throwing an exception, the newly loaded type must be a class, not an interface, must be accessible, must not be abstract, and must contain a no-arg constructor that is accessible.)
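The requirements in the parenthetical note can be explored with a small experiment. The following is a sketch only: the class NewInstanceRules and its nested types are made up for illustration, and it uses getDeclaredConstructor().newInstance() rather than Class.newInstance(), so the exception types shown are the ones reported by that newer form:

```java
public class NewInstanceRules {

    // Concrete class with an accessible no-arg constructor.
    public static class ConcreteGreeter {
        public ConcreteGreeter() {
        }
    }

    // Abstract classes cannot be instantiated.
    public abstract static class AbstractGreeter {
        public AbstractGreeter() {
        }
    }

    // No no-arg constructor is available here.
    public static class ArgGreeter {
        public ArgGreeter(String greeting) {
        }
    }

    // Returns "ok" on success, otherwise the simple name of the
    // reflective exception that prevented instantiation.
    public static String tryInstantiate(Class<?> c) {
        try {
            c.getDeclaredConstructor().newInstance();
            return "ok";
        } catch (ReflectiveOperationException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(ConcreteGreeter.class)); // ok
        System.out.println(tryInstantiate(AbstractGreeter.class)); // InstantiationException
        System.out.println(tryInstantiate(ArgGreeter.class));      // NoSuchMethodException
        System.out.println(tryInstantiate(Runnable.class));        // NoSuchMethodException
    }
}
```

In the Greet application, any of these failures would be caught by the catch (Exception e) clause and reported with printStackTrace().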

The Greet application then casts the Object reference that points to the greeter object to type Greeter:

// Cast the Object ref to the Greeter interface type
// so greet() can be invoked on it
Greeter greeter = (Greeter) o;

Finally, armed with a Greeter reference, the main() method invokes the greet() method on the greeter object:

// Greet the world in this greeter's special way
greeter.greet();

The Greet application demonstrates the flexibility inherent in Java's linking model. The Greet application does not know at compile time what greeters it will be loading and dynamically linking to at run-time. In the examples above, class Greet invokes the greet() method in classes Hello, Greetings, Salutations, and HowDoYouDo. But if you look at Greet's constant pool, there is no symbolic reference to any of these classes. There is only a symbolic reference to their shared superinterface, com.artima.greeter.Greeter. Greeters themselves, so long as they implement the com.artima.greeter.Greeter interface, can be anything and can be written and compiled anytime, even after the Greet application itself is compiled.

Using a 1.1 User-Defined Class Loader

Prior to 1.2, the loadClass() method of java.lang.ClassLoader was abstract. To create your own user-defined class loader, you subclassed ClassLoader and implemented loadClass(). In 1.2, a concrete implementation of loadClass() was included in ClassLoader. This concrete loadClass() supports the parent-delegation model introduced in 1.2, and in general makes it easier and less error prone to create a user-defined class loader. To create a user-defined class loader in 1.2, you can subclass ClassLoader and, rather than override loadClass(), you can override findClass() -- a method with a much simpler contract than loadClass(). This approach to creating a user-defined class loader will be described later in this chapter.

To give you some historical perspective of how class loaders changed between 1.1 and 1.2, consider this implementation of GreeterClassLoader, written for 1.1 and included in the first edition of this book:

// On CD-ROM in file
// linking/ex6/COM/artima/greeter/GreeterClassLoader.java
package COM.artima.greeter;

import java.io.*;
import java.util.Hashtable;

public class GreeterClassLoader extends ClassLoader {

    // basePath gives the path to which this class
    // loader appends "/<typename>.class" to get the
    // full path name of the class file to load
    private String basePath;

    public GreeterClassLoader(String basePath) {

        this.basePath = basePath;
    }

    public synchronized Class loadClass(String className,
        boolean resolveIt) throws ClassNotFoundException {

        Class result;
        byte classData[];

        // Check the loaded class cache
        result = findLoadedClass(className);
        if (result != null) {
            // Return a cached class
            return result;
        }

        // Check with the primordial class loader
        try {
            result = super.findSystemClass(className);
            // Return a system class
            return result;
        }
        catch (ClassNotFoundException e) {
        }

        // Don't attempt to load a system file except through
        // the primordial class loader
        if (className.startsWith("java.")) {
            throw new ClassNotFoundException();
        }

        // Try to load it from the basePath directory.
        classData = getTypeFromBasePath(className);
        if (classData == null) {
            System.out.println("GCL - Can't load class: "
                + className);
            throw new ClassNotFoundException();
        }

        // Parse it
        result = defineClass(className, classData, 0,
            classData.length);
        if (result == null) {
            System.out.println("GCL - Class format error: "
                + className);
            throw new ClassFormatError();
        }

        if (resolveIt) {
            resolveClass(result);
        }

        // Return class from basePath directory
        return result;
    }

    private byte[] getTypeFromBasePath(String typeName) {

        FileInputStream fis;
        String fileName = basePath + File.separatorChar
            + typeName.replace('.', File.separatorChar)
            + ".class";

        try {
            fis = new FileInputStream(fileName);
        }
        catch (FileNotFoundException e) {
            return null;
        }

        BufferedInputStream bis = new BufferedInputStream(fis);

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        try {
            int c = bis.read();
            while (c != -1) {
                out.write(c);
                c = bis.read();
            }
        }
        catch (IOException e) {
            return null;
        }

        return out.toByteArray();
    }
}

The 1.1 GreeterClassLoader declares one instance variable, basePath. This variable, a String, is used to store the directory path (passed to GreeterClassLoader's constructor) in which the loadClass() method should look for the class file of the type it has been requested to load.

The loadClass() method begins by checking to see if the requested type has already been loaded by this class loader. It does this by invoking findLoadedClass(), an instance method in ClassLoader, passing in the fully qualified name of the requested type as a parameter. If this class loader has already been marked as an initiating class loader of a type with the requested fully qualified name, findLoadedClass() will return the Class instance representing the type:

// Check the loaded class cache
result = findLoadedClass(className);
if (result != null) {
    // Return a cached class
    return result;
}

As mentioned earlier in this chapter, the virtual machine maintains a list of type names that have already been requested of each class loader. These lists, which include all the types for which each class loader has been marked as an initiating loader, represent the sets of unique names that currently populate each class loader's namespace. When loading classes in Step 1a of the process of resolving CONSTANT_Class_info entries (described earlier in this chapter), the virtual machine always checks its internal list before automatically invoking loadClass(). As a result, the virtual machine will never automatically invoke loadClass() on a user-defined class loader with the name of a type already loaded by that user-defined class loader. Nevertheless, the GreeterClassLoader invokes findLoadedClass() to check the requested class against the list of the names of the types it has already loaded. Why? Because even though the virtual machine will never ask a user-defined class loader to load the same type twice, the application just might.

As an example, imagine the Greet application were invoked with this command line:

java Greet greeters Hello Hello Hello Hello Hello

Given this command line, the Greet application would invoke loadClass() with the name Hello five times on the same GreeterClassLoader object. The first time, the GreeterClassLoader would load the class. The next four times, however, the GreeterClassLoader would simply get the Class instance for Hello by calling findLoadedClass() and return that. It would only load class Hello once.
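The "load once, return the cached Class four times" behavior can be checked with a loader that counts how often it defines a type. The following is a minimal sketch, not the GreeterClassLoader itself. The names LoadOnceDemo, Sample, and CountingLoader are made up, the loader relies on the loadClass() inherited from java.lang.ClassLoader (which calls findLoadedClass() first), and, as an assumption of this sketch, it obtains the class bytes by reading the compiled Sample class file as a resource from the application's class path. The parent is set to the bootstrap loader so that the request actually reaches findClass():

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoadOnceDemo {

    // Stand-in for a greeter class such as Hello.
    public static class Sample {
        public Sample() {
        }
    }

    // Hypothetical loader that counts how many times it defines a type.
    static class CountingLoader extends ClassLoader {
        int defineCount = 0;

        CountingLoader() {
            // Bootstrap loader as parent: it cannot find Sample, so the
            // inherited loadClass() falls through to findClass().
            super((ClassLoader) null);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = LoadOnceDemo.class.getClassLoader()
                    .getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                byte[] data = out.toByteArray();
                defineCount++;
                return defineClass(name, data, 0, data.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Requests the same type five times and reports how many times
    // the loader actually defined it.
    public static int definitionsAfterFiveRequests() {
        CountingLoader loader = new CountingLoader();
        String name = Sample.class.getName();
        try {
            Class<?> first = loader.loadClass(name);
            for (int i = 0; i < 4; i++) {
                if (loader.loadClass(name) != first) {
                    throw new IllegalStateException("expected the cached Class instance");
                }
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return loader.defineCount;
    }

    public static void main(String[] args) {
        System.out.println(definitionsAfterFiveRequests()); // prints 1
    }
}
```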

If the loadClass() method determines that the requested type has not been loaded into its name space, it next passes the name of the requested type to findSystemClass():

// Check with the primordial class loader
try {
    result = super.findSystemClass(className);
    // Return a system class
    return result;
}
catch (ClassNotFoundException e) {
}

When the findSystemClass() method is invoked in a 1.1 virtual machine, the primordial class loader attempts to load the type. In 1.2, the system class loader attempts to load the type. If the load is successful, findSystemClass() returns the Class instance representing the type, and loadClass() returns that same Class instance.

If the primordial (in 1.1) or system (in 1.2) class loader is unable to load the type, findSystemClass() throws ClassNotFoundException. In this case, the loadClass() method next checks to make sure the requested class is not part of the java package:

// Don't attempt to load a system file except through
// the primordial class loader
if (className.startsWith("java.")) {
    throw new ClassNotFoundException();
}

This check prevents members of the standard java packages (java.lang, java.io, etc.) from being loaded by anything but the bootstrap class loader. As mentioned in Chapter 3, "Security," two types that declare themselves to be part of the same named package are only granted access to each other's package-visible members if they belong to the same runtime package (if they were loaded by the same class loader). But the notion of a "runtime package" and its effect on accessibility was first introduced in the second edition of the Java virtual machine specification. Thus, early versions of class loaders had to explicitly prevent user-defined class loaders from attempting to load types that declare themselves to be part of the Java API (or any other "restricted" packages) but that couldn't be loaded by the bootstrap class loader.
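For comparison, later JDK releases enforce a similar restriction inside ClassLoader.defineClass() itself, which rejects any type name that begins with "java." when called on a user-defined class loader. The following is a minimal sketch, assuming such a JDK; the names ProhibitedPackageDemo, NaiveLoader, defineUnder(), and java.lang.Impostor are made up for illustration, and the byte array is deliberately empty because the name is rejected before the bytes are examined:

```java
public class ProhibitedPackageDemo {

    // Hypothetical loader with no "java." check of its own.
    static class NaiveLoader extends ClassLoader {
        Class<?> defineUnder(String name, byte[] data) {
            return defineClass(name, data, 0, data.length);
        }
    }

    // Returns the outcome of trying to define a type in java.lang.
    public static String attempt() {
        try {
            new NaiveLoader().defineUnder("java.lang.Impostor", new byte[0]);
            return "defined";
        } catch (SecurityException e) {
            // Thrown by defineClass() for a prohibited package name.
            return "SecurityException";
        } catch (LinkageError e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt()); // prints "SecurityException"
    }
}
```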

If the type name doesn't begin with "java.", the loadClass() method next invokes getTypeFromBasePath(), which attempts to import the binary data in the user-defined class loader's custom way:

// Try to load it from the basePath directory.
classData = getTypeFromBasePath(className);
if (classData == null) {
    throw new ClassNotFoundException();
}

The getTypeFromBasePath() method looks for a file with the type name plus a ".class" extension in the base directory passed to the GreeterClassLoader's constructor. If the getTypeFromBasePath() method is unable to find the file, it returns a null result and the loadClass() method throws ClassNotFoundException. Otherwise, loadClass() invokes defineClass(), passing the byte array returned by getTypeFromBasePath():

// Parse it
result = defineClass(className, classData, 0,
    classData.length);
if (result == null) {
    System.out.println("GCL - Class format error: "
        + className);
    throw new ClassFormatError();
}

The defineClass() method completes the loading process: it parses the binary data into internal data structures and creates a Class instance. The defineClass() method does not link and initialize the type. (As mentioned earlier in this chapter, the defineClass() method also makes sure all the type's supertypes are loaded. It does this by invoking loadClass() on this user-defined class loader for each direct superclass and superinterface, and recursively applies the resolution process on all supertypes in the hierarchy.)

If defineClass() is successful, the loadClass() method checks to see if resolveIt was set to true. If so, it invokes resolveClass(), passing the Class instance returned by defineClass(). The resolveClass() method links the class. Finally, loadClass() returns the newly created Class instance:

if (resolveIt) {
    resolveClass(result);
}

// Return class from basePath directory
return result;

Using a 1.2 User-Defined Class Loader

The class loader described in the previous section, which was originally designed for a 1.1 virtual machine, will still work in 1.2. Although 1.2 added a concrete default implementation of loadClass() to java.lang.ClassLoader, this concrete method can still be overridden in subclasses. Because the contract of loadClass() did not change from 1.1 to 1.2, legacy user-defined class loaders that override loadClass() should still work as expected in 1.2.

The basic contract of loadClass() is this: Given the fully qualified name of the type to find, the loadClass() method should in some way attempt to locate or produce an array of bytes, purportedly in the Java class file format, that define the type. If loadClass() is unable to locate or produce the bytes, it should throw ClassNotFoundException. Otherwise, loadClass() should pass the array of bytes to one of the defineClass() methods declared in class ClassLoader. By passing the byte array to defineClass(), loadClass() asks the virtual machine to import the type represented by the passed byte array into the namespace of this user-defined class loader. When loadClass() calls defineClass() in 1.2, it can also specify a protection domain with which the type data should be associated. When the loadClass() method of a class loader successfully loads a type, it returns a java.lang.Class object to represent the newly loaded type.

The concrete implementation of loadClass() from class java.lang.ClassLoader fulfills the loadClass() method's contract using these four basic steps:

  1. See if the requested type has already been loaded into this class loader's namespace (via findLoadedClass()). If so, return the Class instance for that already-loaded type.
  2. Otherwise, delegate to this class loader's parent loader. If the parent returns a Class instance, return that same Class instance.
  3. Otherwise, invoke findClass(), which should attempt to locate or produce an array of bytes, purportedly in the Java class file format, that define the desired type. If successful, findClass() should pass those bytes to defineClass(), which will attempt to import the type and return a Class instance. If findClass() returns a Class instance, loadClass() returns that same Class instance.
  4. Otherwise, findClass() completes abruptly with some exception, and loadClass() completes abruptly with the same exception.
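The order of these steps can be observed without defining any classes. The following is a minimal sketch; DelegationOrderDemo, RecordingLoader, and the type name no.such.Greeter are made up for illustration. The loader's findClass() only records the names it is asked for and then fails, so the record shows which requests got past the parent:

```java
import java.util.ArrayList;
import java.util.List;

public class DelegationOrderDemo {

    // Hypothetical loader whose findClass() records each request and
    // then reports failure.
    static class RecordingLoader extends ClassLoader {
        final List<String> findClassCalls = new ArrayList<>();

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    // Returns the names that reached findClass().
    public static List<String> run() {
        RecordingLoader loader = new RecordingLoader();
        try {
            // Step 2: the parent chain finds String, so findClass()
            // is never consulted and the parent's Class is returned.
            if (loader.loadClass("java.lang.String") != String.class) {
                throw new IllegalStateException("expected the parent's Class instance");
            }
            // Steps 3 and 4: no parent can find this name, so
            // findClass() is invoked and its exception propagates.
            loader.loadClass("no.such.Greeter");
        } catch (ClassNotFoundException e) {
            // Expected for no.such.Greeter.
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [no.such.Greeter]
    }
}
```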

Although in 1.2 you can still subclass ClassLoader and override the loadClass() method, the recommended approach to creating your own user-defined class loader in 1.2 is to subclass ClassLoader and implement the findClass() method. The findClass() method looks like this:

// A method declared in class java.lang.ClassLoader:
protected Class findClass(String name)
    throws ClassNotFoundException;

The basic contract of the findClass() method is this: findClass() accepts the fully qualified name of a desired type as its only parameter. findClass() first attempts to locate or produce an array of bytes, purportedly in the Java class file format, that define the type of the requested name. If findClass() is unable to locate or produce the array of bytes, it completes abruptly with ClassNotFoundException. Otherwise, findClass() invokes defineClass(), passing in the requested name, the array of bytes and, optionally, a ProtectionDomain object with which the type should be associated. If defineClass() returns a Class instance for the type, findClass() simply returns that same Class instance to its caller. Otherwise, defineClass() completes abruptly with some exception, and findClass() completes abruptly with the same exception.

Here's a version of GreeterClassLoader that, rather than overriding loadClass(), merely overrides findClass():

// On CD-ROM in file
// linking/ex7/com/artima/greeter/GreeterClassLoader.java
package com.artima.greeter;

import java.io.*;

public class GreeterClassLoader extends ClassLoader {

    // basePath gives the path to which this class
    // loader appends "/<typename>.class" to get the
    // full path name of the class file to load
    private String basePath;

    public GreeterClassLoader(String basePath) {

        this.basePath = basePath;
    }

    public GreeterClassLoader(ClassLoader parent, String basePath) {

        super(parent);
        this.basePath = basePath;
    }

    protected Class findClass(String className)
        throws ClassNotFoundException {

        byte classData[];

        // Try to load it from the basePath directory.
        classData = getTypeFromBasePath(className);
        if (classData == null) {
            throw new ClassNotFoundException();
        }

        // Parse it
        return defineClass(className, classData, 0,
            classData.length);
    }

    private byte[] getTypeFromBasePath(String typeName) {

        FileInputStream fis;
        String fileName = basePath + File.separatorChar
            + typeName.replace('.', File.separatorChar)
            + ".class";

        try {
            fis = new FileInputStream(fileName);
        }
        catch (FileNotFoundException e) {
            return null;
        }

        BufferedInputStream bis = new BufferedInputStream(fis);

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        try {
            int c = bis.read();
            while (c != -1) {
                out.write(c);
                c = bis.read();
            }
        }
        catch (IOException e) {
            return null;
        }

        return out.toByteArray();
    }
}

This version of GreeterClassLoader appears in the linking/ex7 directory of the CD-ROM. All of the source files in linking/ex6, which were described in detail in previous sections, appear unchanged in linking/ex7, except for GreeterClassLoader.java. The GreeterClassLoader described in the previous section, which overrides loadClass(), appears in linking/ex6; the GreeterClassLoader described in this section, which overrides findClass(), appears in linking/ex7.

This second version of GreeterClassLoader declares one instance variable, basePath, which is a String that is used to store the directory path in which findClass() should look for the class file of the type it has been requested to load. The basePath String is the only parameter passed to GreeterClassLoader's 1-arg constructor. Because the 1-arg constructor accepts no reference to a caller-specified parent class loader, this class loader can't invoke the superclass constructor that takes a reference to a user-defined class loader. Thus, it simply invokes the superclass's no-arg constructor by default, which sets this class loader's parent to be the system class loader. The other constructor (the 2-arg constructor), however, accepts a reference to a user-defined class loader instance as well as the basePath String. This constructor explicitly invokes the superclass's 1-arg constructor, passing along the reference. The superclass sets this class loader's parent to be the passed user-defined class loader instance.

By comparing the implementation of findClass() in this version of GreeterClassLoader with the implementation of loadClass() in the previous version of GreeterClassLoader, you can easily see how much simpler it is to write findClass() than loadClass(). You have much less to worry about when you write findClass(), and fewer opportunities to make mistakes. findClass() merely invokes getTypeFromBasePath() to attempt in this user-defined class loader's custom way to load the requested type. If getTypeFromBasePath() is unable to locate the requested type in the basePath directory, it returns null and findClass() throws a ClassNotFoundException. Otherwise, getTypeFromBasePath() returns the array of bytes, which findClass() simply passes on to defineClass(). If defineClass() returns a reference to a Class instance to represent the successfully loaded type, findClass() returns that same reference. Otherwise, defineClass() completes abruptly with an exception, which causes findClass() to complete abruptly with the same exception.

The findClass() method's contract is a subset of the loadClass() method's contract. findClass() isolates the only two parts of loadClass() that should in general be customized by subclasses of java.lang.ClassLoader:

  1. the custom manner in which an array of bytes is located or produced given a fully qualified type name
  2. optionally, the custom manner in which a type's protection domain is determined

When an implementation of findClass() performs these two tasks, the result is an array of bytes and a reference to a ProtectionDomain object. findClass() passes both the byte array and the ProtectionDomain reference to defineClass().
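Both tasks can be seen together in a findClass() that passes a ProtectionDomain to defineClass(). The following is a sketch under stated assumptions, not the GreeterClassLoader from the CD-ROM. The names DirectoryLoaderDemo, DirectoryLoader, and the nested Hello class are made up; Hello implements java.util.function.Supplier (a bootstrap-loaded interface standing in for Greeter) so that the cast works across class loaders; the protection domain is simply one CodeSource for the base directory with no certificates or static permissions; and the demo fills a temporary directory by copying Hello's own compiled class file from the class path. The loader's parent is the bootstrap loader so that the request for Hello reaches findClass():

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.CodeSource;
import java.security.ProtectionDomain;
import java.security.cert.Certificate;
import java.util.function.Supplier;

public class DirectoryLoaderDemo {

    // Stand-in greeter that returns its greeting instead of printing it.
    public static class Hello implements Supplier<String> {
        public Hello() {
        }

        public String get() {
            return "Hello, world!";
        }
    }

    // Hypothetical loader that overrides only findClass().
    static class DirectoryLoader extends ClassLoader {
        private final Path basePath;

        DirectoryLoader(Path basePath) {
            super((ClassLoader) null); // bootstrap loader as parent
            this.basePath = basePath;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            Path file = basePath.resolve(name.replace('.', '/') + ".class");
            if (!Files.isRegularFile(file)) {
                throw new ClassNotFoundException(name);
            }
            try {
                // Task 1: locate the bytes for the requested name.
                byte[] data = Files.readAllBytes(file);
                // Task 2: choose the protection domain for the type.
                CodeSource source = new CodeSource(
                    basePath.toUri().toURL(), (Certificate[]) null);
                ProtectionDomain domain = new ProtectionDomain(source, null);
                return defineClass(name, data, 0, data.length, domain);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    // Copies Hello's class file into a temporary directory, loads it
    // through DirectoryLoader, and returns the greeting it produces.
    public static String run() {
        try {
            String name = Hello.class.getName();
            String resource = name.replace('.', '/') + ".class";
            Path dir = Files.createTempDirectory("greeters");
            Path target = dir.resolve(resource);
            Files.createDirectories(target.getParent());
            try (InputStream in = DirectoryLoaderDemo.class.getClassLoader()
                    .getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IllegalStateException("class file not found: " + resource);
                }
                Files.copy(in, target);
            }

            ClassLoader loader = new DirectoryLoader(dir);
            Class<?> c = loader.loadClass(name);
            if (c.getClassLoader() != loader) {
                throw new IllegalStateException("Hello was not defined by DirectoryLoader");
            }
            Supplier<?> greeter = (Supplier<?>) c.getDeclaredConstructor().newInstance();
            return String.valueOf(greeter.get());
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "Hello, world!"
    }
}
```

The sketch leaves the temporary directory in place; a longer-lived program would delete it when finished.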

Example: Dynamic Extension with forName()

As an example of a Java application that performs dynamic extension with forName(), consider the EasyGreet class:

// On CD-ROM in file linking/ex7/EasyGreet.java
import com.artima.greeter.*;

public class EasyGreet {

    // Arguments to this application:
    //     args[0], args[1], ... - class names of greeters to load
    //               and invoke the greet() method on.
    //
    // All greeters must implement the com.artima.greeter.Greeter
    // interface.
    //
    static public void main(String[] args) {

        if (args.length == 0) {
            System.out.println(
                "Enter greeter class names as args.");
            return;
        }

        for (int i = 0; i < args.length; ++i) {
            try {

                // Load the greeter specified on the command line
                Class c = Class.forName(args[i]);

                // Instantiate it into a greeter object
                Object o = c.newInstance();

                // Cast the Object ref to the Greeter interface type
                // so greet() can be invoked on it
                Greeter greeter = (Greeter) o;

                // Greet the world in this greeter's special way
                greeter.greet();
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

The EasyGreet application is very similar to the Greet application from the previous example. Like Greet, EasyGreet will attempt to dynamically load and execute greeters mentioned in command line arguments. But unlike Greet, EasyGreet doesn't take as its first command line argument a path name of the directory in which the class files for the greeters are stored. All of EasyGreet's command line arguments are greeter class names. Another difference is that EasyGreet, because it is going to use forName() to load greeters dynamically, doesn't instantiate a GreeterClassLoader. Then, where Greet invoked loadClass() on its GreeterClassLoader instance, EasyGreet invokes forName(), a static method of class Class.

EasyGreet's forName() invocation looks very similar to Greet's loadClass() invocation. Like loadClass(), forName() accepts the fully qualified name of the requested type in a String parameter. If successful in loading the type (or if the type had been loaded previously), forName(), like loadClass(), returns the Class instance that represents the type. If unsuccessful, forName(), like loadClass(), throws ClassNotFoundException. The big difference between the two approaches is that whereas loadClass() attempts to ensure the requested type is loaded into the user-defined class loader's namespace, forName() attempts to ensure the requested type is loaded into the current namespace -- the namespace of the defining class loader for the type whose method includes the forName() invocation.
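The notion of the "current namespace" can be made concrete with the three-argument form of forName(), which takes the class loader explicitly. The sketch below shows the one-argument call next to an equivalent three-argument call that names the caller's own defining class loader. The class ForNameSketch and its method names are hypothetical.

```java
// Hypothetical helper class for illustration.
class ForNameSketch {

    // Loads into the current namespace: the namespace of the
    // defining class loader of ForNameSketch.
    static Class<?> implicit(String name)
        throws ClassNotFoundException {

        return Class.forName(name);
    }

    // The same request with the class loader spelled out.
    // Passing a user-defined class loader here instead would
    // direct the request to that loader's namespace.
    static Class<?> explicit(String name)
        throws ClassNotFoundException {

        ClassLoader current =
            ForNameSketch.class.getClassLoader();
        return Class.forName(name, true, current);
    }
}
```

Both methods return the same Class instance for any given name, because both requests are initiated by the same class loader.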

Because forName() is invoked from the main() method of class EasyGreet, the class loader that forName() asks to load the requested type is EasyGreet's defining class loader. When run from Sun's Java 2 SDK version 1.2, the class loader that loads EasyGreet is the system class loader, which looks for classes on the class path. To use the class path environment variable, you can execute the EasyGreet application in the linking/ex7 directory of the CD-ROM with a command like this:

java EasyGreet Hello

If you don't specify a class path either explicitly on the command line or in an environment variable, the system loader will look in the current directory for requested types. Because the current directory (the linking/ex7 directory from the CD-ROM) doesn't contain Hello.class, the system class loader is unable to locate Hello.class. The forName() method, and in turn EasyGreet's main() method, completes abruptly with a ClassNotFoundException:

java.lang.ClassNotFoundException: Hello
	at java.net.URLClassLoader$1.run(URLClassLoader.java: 202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java: 191)
	at java.lang.ClassLoader.loadClass(ClassLoader.java: 290)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java: 275)
	at java.lang.ClassLoader.loadClass(ClassLoader.java: 247)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java: 124)
	at EasyGreet.main(EasyGreet.java, Compiled Code)

To enable EasyGreet to find Hello.class merely requires that the greeters directory be included in a class path specified on the command line with the "-cp" option, as in:

java -cp .;greeters; EasyGreet Hello

When started with this command, the EasyGreet program prints:

Hello, world!
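When a ClassNotFoundException like the one above appears, it can help to see exactly which entries the class path contains. The following sketch splits a class path string into its entries. The class name ClassPathEntries and its methods are hypothetical; java.class.path and File.pathSeparator are the standard system property and separator.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper for illustration.
class ClassPathEntries {

    // Splits a class path on the given separator. Note that
    // String.split() drops trailing empty entries.
    static List<String> split(String classPath,
        String separator) {

        return Arrays.asList(
            classPath.split(Pattern.quote(separator)));
    }

    // The entries of the class path this virtual machine
    // was started with.
    static List<String> current() {
        return split(System.getProperty("java.class.path"),
            File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String entry : current()) {
            System.out.println(entry);
        }
    }
}
```

For the class path used in the command above, split(".;greeters;", ";") yields the two entries "." and "greeters".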

Like the Greet application, EasyGreet will accept multiple greeter names on the command line:

java -cp .;greeters; EasyGreet Hello Greetings Salutations HowDoYouDo

When invoked with this command, the EasyGreet application will load each of the four greeters listed and invoke their greet() methods, yielding this output:

Hello, world!
Greetings, planet!
Salutations, orb!
How do you do, globe!

The important difference that arises from Greet's use of loadClass() on GreeterClassLoader and EasyGreet's use of forName() is the namespaces into which the greeter classes get loaded. In Greet, the greeter classes get loaded into the GreeterClassLoader's namespace. In EasyGreet, the greeter classes get loaded into the system class loader's namespace.

Example: Unloading Unreachable Greeters

As an example of dynamically loaded types becoming unreachable and getting unloaded by the virtual machine, consider the following application:

// On CD-ROM in file linking/ex7/GreetAndForget.java
import com.artima.greeter.*;

public class GreetAndForget {

    // Arguments to this application:
    //     args[0] - path name of directory in which class files
    //               for greeters are stored
    //     args[1], args[2], ... - class names of greeters to load
    //               and invoke the greet() method on.
    //
    // All greeters must implement the com.artima.greeter.Greeter
    // interface.
    //
    static public void main(String[] args) {

        if (args.length <= 1) {
            System.out.println(
                "Enter base path and greeter class names as args.");
            return;
        }

        for (int i = 1; i < args.length; ++i) {
            try {

                GreeterClassLoader gcl =
                    new GreeterClassLoader(args[0]);

                // Load the greeter specified on the command line
                Class c = gcl.loadClass(args[i]);

                // Instantiate it into a greeter object
                Object o = c.newInstance();

                // Cast the Object ref to the Greeter interface type
                // so greet() can be invoked on it
                Greeter greeter = (Greeter) o;

                // Greet the world in this greeter's special way
                greeter.greet();

                // Forget the class loader object, Class
                // instance, and greeter object
                gcl = null;
                c = null;
                o = null;
                greeter = null;

                // At this point, the types loaded through the
                // GreeterClassLoader object created at the top of
                // this for loop are unreferenced and can be unloaded
                // by the virtual machine.
            }
            catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}

The GreetAndForget application accepts the same command line arguments as the Greet application of the previous example. The first argument is a base directory path name where the GreetAndForget application will look for greeters. Subsequent arguments are greeter names. To understand this example you should be familiar with the Greet application presented earlier in this chapter.

Imagine you invoke the GreetAndForget application with the following command line:

java GreetAndForget greeters Surprise HiTime Surprise

The code for the HiTime greeter, which selects a different greeting based on the time of day, is shown above in the previous section of this chapter. The code for the Surprise greeter, which pseudo-randomly selects one of four helper greeters--Hello, Greetings, Salutations, or HowDoYouDo--and invokes its greet() method, is shown here:

// On CD-ROM in file linking/ex7/greeters/Surprise.java
import com.artima.greeter.Greeter;

public class Surprise implements Greeter {

    public void greet() {

        // Choose one of four greeters pseudo-randomly and
        // invoke its greet() method.
        int choice = (int) (Math.random() * 3.99);

        Greeter g;

        switch(choice) {

        case 0:
            g = new Hello();
            g.greet();
            break;

        case 1:
            g = new Greetings();
            g.greet();
            break;

        case 2:
            g = new Salutations();
            g.greet();
            break;

        case 3:
            g = new HowDoYouDo();
            g.greet();
            break;
        }
    }
}

Given the command line shown above, the GreetAndForget application invokes the greet() method of the Surprise greeter first, then the HiTime greeter, then the Surprise greeter again. GreetAndForget's actual output would vary depending on the time of day and Surprise's pseudo-random mood. For the purposes of this example, assume that you typed in the above command, hit return, and got the following output:

How do you do, globe!
Good afternoon, world!
Greetings, planet!

This output indicates Surprise chose to execute HowDoYouDo's greet() method the first time around and Greetings's greet() method the second time around.

The first pass through GreetAndForget's for loop, the virtual machine loads the Surprise class and invokes its greet() method. The constant pool for Surprise includes a symbolic reference to each of the four helper greeters that it may choose: Hello, Greetings, Salutations, and HowDoYouDo. Assuming the Java virtual machine that you used to run the GreetAndForget application uses late resolution, only one of these four symbolic references will be resolved during the first pass of GreetAndForget's for loop: the symbolic reference to HowDoYouDo. The virtual machine resolves this symbolic reference when it executes the bytecodes that correspond to the following statement in Surprise's greet() method:

g = new HowDoYouDo();

To resolve the symbolic reference from Surprise's constant pool to HowDoYouDo, the virtual machine invokes the GreeterClassLoader object's loadClass() method, passing the string "HowDoYouDo" in the name parameter. The virtual machine uses the GreeterClassLoader object to load HowDoYouDo because Surprise was loaded through the GreeterClassLoader object. As mentioned earlier in this chapter, when the Java virtual machine resolves a symbolic reference, it uses the same class loader that defined the referencing type (in this case, Surprise) to initiate loading the referenced type (in this case, HowDoYouDo).
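This rule can be observed from ordinary Java code. The sketch below is a hypothetical demo, not code from the CD-ROM: it uses java.net.URLClassLoader in place of GreeterClassLoader, compiles two throwaway classes (Referrer and Referenced, names invented here) with the javax.tools compiler API, and records every name the loader is asked to load. It assumes it is run on a JDK, because it needs a compiler at run-time.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo; URLClassLoader stands in for
// GreeterClassLoader.
class ResolutionSketch {

    // Records every type name this loader is asked to load.
    static class RecordingLoader extends URLClassLoader {

        final List<String> requested = new ArrayList<String>();

        RecordingLoader(URL[] urls) {
            super(urls);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {

            synchronized (requested) {
                requested.add(name);
            }
            return super.loadClass(name, resolve);
        }
    }

    // Writes and compiles two tiny classes into a temp directory.
    // Referrer mentions Referenced only in "new Referenced()".
    static Path compile() throws Exception {
        Path dir = Files.createTempDirectory("resolution");
        Path referrer = dir.resolve("Referrer.java");
        Path referenced = dir.resolve("Referenced.java");
        Files.write(referrer, ("public class Referrer {"
            + " public static Object make() {"
            + " return new Referenced(); } }")
            .getBytes(StandardCharsets.UTF_8));
        Files.write(referenced, "public class Referenced {}"
            .getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK is required");
        }
        int status = javac.run(null, null, null, "-d",
            dir.toString(), referrer.toString(),
            referenced.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    // Loads Referrer, runs make(), and returns the recorded names.
    static List<String> requestedNames() {
        try {
            URL[] urls = { compile().toUri().toURL() };
            RecordingLoader loader = new RecordingLoader(urls);
            Class<?> referrer = loader.loadClass("Referrer");
            referrer.getMethod("make").invoke(null);
            return loader.requested;
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The application code asks the loader only for Referrer, yet the recorded list also contains Referenced. That second request comes from the virtual machine, which turns to Referrer's defining class loader when it resolves the symbolic reference behind new Referenced().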

Once Surprise's greet() method has created a new HowDoYouDo instance, it invokes its greet() method:

g.greet();

As the virtual machine executes HowDoYouDo's greet() method, it must resolve two symbolic references from HowDoYouDo's constant pool--one to class java.lang.System and another to class java.io.PrintStream. To resolve these symbolic references, the virtual machine invokes the GreeterClassLoader object's loadClass() method, once with the name java.lang.System and once with the name java.io.PrintStream. As before, the virtual machine uses the GreeterClassLoader object to load these classes because the referencing class--in this case, HowDoYouDo--was loaded through the GreeterClassLoader object. But these two classes, both members of the Java API, will end up being loaded by the bootstrap class loader anyway, because loadClass() will first delegate to its parent.

Remember that before the loadClass() method of GreeterClassLoader attempts to look for a requested type in the base directory (in this case, directory greeters), it invokes its parent, the system class loader. The system class loader will first delegate to its parent, which will first delegate to its parent, and so on. Eventually findSystemClass() will be invoked to delegate to the bootstrap class loader, the end-point of the parent-delegation chain. Because the bootstrap class loader (via findSystemClass()) is able to load both java.lang.System and java.io.PrintStream, the loadClass() method will simply return the Class instance returned by findSystemClass(). These classes will be marked not as having been loaded by the GreeterClassLoader object, but as having been loaded by the bootstrap class loader. To resolve any references from java.lang.System or java.io.PrintStream, the virtual machine will not invoke the loadClass() method of the GreeterClassLoader object, or even the system class loader. It will just use the bootstrap class loader directly.
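The difference between the loader that initiates a load and the loader that ends up defining the type can be checked with getClassLoader(). In the sketch below, the class DelegationSketch and its method are hypothetical, and an empty anonymous subclass of ClassLoader stands in for GreeterClassLoader.

```java
// Hypothetical helper for illustration.
class DelegationSketch {

    // Asks the given loader to initiate loading of the named
    // type and reports which loader ended up defining it. A
    // null result stands for the bootstrap class loader.
    static ClassLoader definingLoader(ClassLoader initiating,
        String name) {

        try {
            return initiating.loadClass(name).getClassLoader();
        }
        catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(name, e);
        }
    }

    public static void main(String[] args) {

        // An empty user-defined loader whose parent is the
        // system class loader; it defines nothing itself.
        ClassLoader userDefined = new ClassLoader() { };

        System.out.println(
            definingLoader(userDefined, "java.io.PrintStream"));
    }
}
```

On virtual machines that represent the bootstrap class loader as null, HotSpot among them, this prints null: the user-defined loader initiated the load, but the request travelled up the parent-delegation chain and the bootstrap class loader defined the class.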

As a result, after Surprise's greet() method has returned, there will be two types marked as having been loaded by the GreeterClassLoader object: class Surprise and class HowDoYouDo. These two types will be in the virtual machine's internal list of the types loaded by the GreeterClassLoader object.

Just after Surprise's greet() method returns, the Class instances for Surprise and HowDoYouDo are reachable by the application. The garbage collector will not reclaim the space occupied by these Class instances, because there are ways for the application's code to access and use them. See Figure 8-11 for a graphical depiction of the reachability of these two Class instances.

Figure 8-11. The reachability of the Class instances for Surprise and HowDoYouDo.

The Class instance for Surprise can be reached in three ways. First, it can be reached directly from local variable c of GreetAndForget's main() method. Second, it can be reached from local variables o and greeter, which both point to the same Surprise object. From the Surprise object, the virtual machine can get at Surprise's type data, which includes a reference to Surprise's Class object. Third, it can be reached through the gcl local variable of GreetAndForget's main() method. This local variable points to the GreeterClassLoader object, which includes a reference to a Hashtable object in which a reference to Surprise's Class instance is stored.

The Class instance for HowDoYouDo can be reached in two ways. One way is identical to one of the paths to the Class instance for Surprise: the gcl local variable of GreetAndForget's main() method points to the GreeterClassLoader object, which includes a reference to a Hashtable object. The Hashtable contains a reference to HowDoYouDo's Class instance. The other way to reach HowDoYouDo's Class instance is through Surprise's constant pool.

When the virtual machine resolved the symbolic reference from Surprise's constant pool to HowDoYouDo, it replaced the symbolic reference with a direct reference. The direct reference points to HowDoYouDo's type data, which includes a reference to HowDoYouDo's Class instance.

Thus, starting from Surprise's constant pool, the Class instance for HowDoYouDo is reachable. But why would the garbage collector look at direct references emanating from Surprise's constant pool in the first place? Because Surprise's Class instance is reachable. When the garbage collector finds that it can reach Surprise's Class instance, it makes sure it marks the Class instances for any types that are directly referenced from Surprise's constant pool as reachable. If Surprise is still live, the virtual machine can't unload any types Surprise may need to use.

Note that of the three ways, described above, that Surprise's Class instance can be reached, none of them involve a constant pool of another type. Surprise does not appear as a symbolic reference in the constant pool for GreetAndForget. Class GreetAndForget did not know about Surprise at compile-time. Instead, the GreetAndForget application decided at run-time to load and link to class Surprise. Thus, the Class instance for class Surprise is only reachable by starting from the local variables of GreetAndForget's main() method. Unfortunately for Surprise (and ultimately for HowDoYouDo), this does not constitute a very firm grasp on life.

The next four statements in GreetAndForget's main() method will change the reachability situation completely:

// Forget the user-defined class loader, Class
// instance, and greeter object
gcl = null;
c = null;
o = null;
greeter = null;

These statements null out all four starting places from which Surprise's Class instance is reachable. As a result, after these statements have been executed, the Class instance for Surprise is no longer reachable. These statements also render unreachable the Class instance for HowDoYouDo, the Surprise instance that was formerly pointed to by the o and greeter variables, the GreeterClassLoader instance that was formerly pointed to by the gcl variable, and the Hashtable instance that was pointed to by the classes variable of the GreeterClassLoader object. All five of these objects are now available for garbage collection.

When (and if) the garbage collector gets around to freeing the unreferenced Class instances for Surprise and HowDoYouDo, it can also free up all the associated type data in the method area for Surprise and HowDoYouDo. Because these classes' Class instances are unreachable, the types themselves are unreachable and can be unloaded by the virtual machine.
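One way to watch for this from Java code is to keep only a weak reference to the Class instance. The sketch below is a hypothetical demo, assuming a JDK is available at run-time: it compiles a throwaway Hello class with the javax.tools compiler API and loads it through a fresh java.net.URLClassLoader, which stands in for GreeterClassLoader. Note that System.gc() is only a request, and a virtual machine is never obliged to unload a type, so reclaimed() may legitimately return false.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo; URLClassLoader stands in for
// GreeterClassLoader.
class UnloadSketch {

    // Writes and compiles a trivial Hello class into a temp
    // directory and returns that directory.
    static Path compileHello() throws Exception {
        Path dir = Files.createTempDirectory("unload");
        Path source = dir.resolve("Hello.java");
        Files.write(source, "public class Hello {}"
            .getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK is required");
        }
        int status = javac.run(null, null, null, "-d",
            dir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    // Loads Hello through a fresh loader and returns only a
    // weak reference. The loader and Class local variables go
    // out of scope on return, which has the same effect as
    // setting them to null.
    static WeakReference<Class<?>> loadAndForget() {
        try {
            URL[] urls = { compileHello().toUri().toURL() };
            ClassLoader loader = new URLClassLoader(urls, null);
            Class<?> c = loader.loadClass("Hello");
            return new WeakReference<Class<?>>(c);
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Requests garbage collection a few times. A cleared
    // reference means the Class instance was reclaimed.
    static boolean reclaimed(WeakReference<Class<?>> ref) {
        for (int i = 0; i < 10 && ref.get() != null; ++i) {
            System.gc();
        }
        return ref.get() == null;
    }
}
```

A weak reference does not keep its referent reachable, so it plays no part in the reachability analysis described above. It merely reports the outcome.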

Note that two iterations of the for loop later (given the command line shown above), the GreetAndForget application will again load class Surprise. Keep in mind that the virtual machine will not reuse the type data for Surprise that was loaded during the first pass of the for loop. Granted, that type data became available for unloading at the end of the first pass. But even if the Class instance for Surprise hadn't become unreferenced at the end of the first pass, the type data from the first pass wouldn't be reused during the third pass.

With each pass of the for loop, the main() method of GreetAndForget creates a new GreeterClassLoader object. Thus, every greeter that GreetAndForget loads is loaded through a different user-defined class loader. For example, if you invoke the GreetAndForget application with the Hello greeter listed five times on the command line, the application will create five instances of class GreeterClassLoader. The Hello greeter will be loaded five times by five different user-defined class loaders. The method area will contain five different copies of the type data for Hello. The heap will contain five Class instances that represent the Hello class--one for each namespace into which Hello is loaded. When one of the Class instances for Hello becomes unreferenced, only the Hello type data associated with that particular Class instance would be available for unloading.

Example: Type Safety and Loading Constraints

In early implementations of the Java virtual machine, it was possible to confuse Java's type system. A Java application could trick the Java virtual machine into using an object of one type as if it were an object of a different type. This capability makes crackers happy, because they can potentially spoof trusted classes to gain access to non-public data or change the behavior of methods by replacing them with new versions. For example, if a cracker could write a class and successfully fool the Java virtual machine into thinking it was class SecurityManager, that cracker could potentially break out of the sandbox. The example presented in this section is designed to help you understand the type safety problems that can arise with delegating class loaders, and the loading constraints that appeared in the second edition of the Java virtual machine specification to address the problem.

The type safety problem arises because the multiple namespaces inside a Java virtual machine can share types. If one class loader delegates to another class loader, and the delegated-to class loader defines the type, both class loaders are marked as initiating loaders for that type. The type defined by the delegated-to class loader is shared among all the namespaces of the initiating loaders of the type.

At compile time, a type is uniquely identifiable by its fully qualified name. For example, only one class named Spoofed can exist at compile time. At runtime, however, a fully qualified name is not enough to uniquely identify a type that has been loaded into a Java virtual machine. Because a Java application can have multiple class loaders, and each class loader maintains its own namespace, multiple types with the same fully qualified name can be loaded into the same Java virtual machine. Thus, to uniquely identify a type loaded into a Java virtual machine requires the fully qualified name and the defining class loader.
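The following sketch shows two runtime types that share one fully qualified name. It is a hypothetical demo, assuming a JDK is available at run-time: it compiles a trivial class named Spoofed with the javax.tools compiler API and loads the same class file through two unrelated java.net.URLClassLoader instances. Unlike the example later in this section, both loaders here read the same definition; the point is only that the two resulting types are distinct.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo for illustration.
class NamespaceSketch {

    // Writes and compiles a trivial Spoofed class into a temp
    // directory and returns that directory.
    static Path compileSpoofed() throws Exception {
        Path dir = Files.createTempDirectory("namespaces");
        Path source = dir.resolve("Spoofed.java");
        Files.write(source, "public class Spoofed {}"
            .getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("a JDK is required");
        }
        int status = javac.run(null, null, null, "-d",
            dir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("compilation failed");
        }
        return dir;
    }

    // Loads Spoofed through two loaders whose only common
    // ancestor is the bootstrap class loader.
    static Class<?>[] loadTwice() {
        try {
            URL[] urls = { compileSpoofed().toUri().toURL() };
            Class<?> first =
                new URLClassLoader(urls, null).loadClass("Spoofed");
            Class<?> second =
                new URLClassLoader(urls, null).loadClass("Spoofed");
            return new Class<?>[] { first, second };
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true if a new instance of 'from' is also an
    // instance of 'to'.
    static boolean isInstanceOf(Class<?> from, Class<?> to) {
        try {
            Object instance =
                from.getDeclaredConstructor().newInstance();
            return to.isInstance(instance);
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The two Class instances returned by loadTwice() both report the name Spoofed, yet they are different objects, and an instance of one is not an instance of the other. A virtual machine that compared types by name alone would miss that distinction.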

The type safety problems made possible by this class loader architecture arose from the Java virtual machine's initial reliance on the compile time notion of a type being uniquely identifiable by only its fully qualified name. You can always load two types both named Spoofed into the same Java virtual machine. Each Spoofed class would be defined by a different class loader. But with a little finesse, you could fool an early implementation of the Java virtual machine into treating an instance of one Spoofed as if it were an instance of the other Spoofed.

To address this problem, the second edition of the Java virtual machine specification introduced the notion of loading constraints. Loading constraints basically enable the Java virtual machine to enforce type safety based not just on fully qualified name, but also on the defining class loader, without forcing eager class loading. When the virtual machine detects a potential for type confusion during constant pool resolution, it adds a constraint to an internal list of constraints. All future resolutions must satisfy this new constraint, as well as all other constraints in the list.

For an example of the type confusion problem and its loading constraints solution, consider this implementation of a greeter, written by a devious cracker:

// On CD-ROM in file linking/ex8/greeters/Cracker.java
import com.artima.greeter.Greeter;

public class Cracker implements Greeter {

    public void greet() {

        Spoofed spoofed = new Spoofed();

        System.out.println("secret val = "
            + spoofed.giveMeFive());

        spoofed = Delegated.getSpoofed();

        System.out.println("secret val = "
            + spoofed.giveMeFive());
    }
}

Class Cracker is a greeter, like Hello or Salutations of the previous examples, because it implements the com.artima.greeter.Greeter interface. Class Cracker is sitting in the linking/ex8/greeters directory of the CD-ROM, along with other, more well-meaning, greeters.

All the classes from the linking/ex7 directory appear unchanged in linking/ex8, except for GreeterClassLoader, which has been slightly modified. (More on this modification later.) You can invoke Cracker with the Greet application just like any other greeter. From the linking/ex8 directory, you can simply type:

java Greet greeters Cracker

The main() method of Greet will, as it did in the previous examples, create a GreeterClassLoader and invoke its loadClass() method, passing in the name Cracker. GreeterClassLoader's loadClass() method will look in the greeters directory and load Cracker.class. Greet's main() method will then instantiate a new Cracker object and invoke greet() on it. Cracker's greet() method starts by instantiating a new Spoofed. This is where the plot thickens.

It turns out that there are two implementations of a class named Spoofed. The class file for the "trusted" implementation is sitting in the linking/ex8 directory, where it will be discovered by the system class loader:

// On CD-ROM in file linking/ex8/Spoofed.java
// Trusted version - when asked to give five, gives 5

public class Spoofed {

    private int secretValue = 42;

    public int giveMeFive() {
        return 5;
    }

    static {
        System.out.println(
            "linking/ex8/Spoofed initialized.");
    }
}

The trusted Spoofed declares a private variable, named secretValue, that is initialized to 42. This private variable represents anything that needs to be kept secret: a credit card number, a private key, an amount of e-cash, a reference to the current Policy object, and so on. Because the designers of this class didn't want the rest of the world to have access to the secret value, they made the secretValue variable private. Only the methods of class Spoofed can access secretValue. If you inspect the code to the trusted Spoofed class, you'll see that the designers of Spoofed didn't provide any method that reveals information about secretValue. The only method in Spoofed, giveMeFive(), returns the value 5.

But what if a maladjusted cracker was able to trick the virtual machine into believing that an instance of the trusted Spoofed was really an instance of this class, also named Spoofed, which was written by the cracker:

// On CD-ROM in file linking/ex8/greeters/Spoofed.java
// Malicious version - when asked to give five, this
//     version of Spoofed reveals secretValue

public class Spoofed {

    private int secretValue = 100;

    public int giveMeFive() {
        return secretValue;
    }

    static {
        System.out.println(
            "linking/ex8/greeters/Spoofed initialized.");
    }
}

When this Spoofed class's giveMeFive() method is invoked, it returns secretValue, effectively rendering the value of the private variable public knowledge.

So which version of Spoofed gets used by the Cracker greeter? Cracker deviously attempts to use both. First, Cracker's greet() method loads the malicious Spoofed and executes its giveMeFive() method, just to get the feel of it:

Spoofed spoofed = new Spoofed();

System.out.println("secret val = "
    + spoofed.giveMeFive());

The Java compiler translates the new Spoofed() expression into a new bytecode instruction that gives the index of a CONSTANT_Class_info constant pool entry, which represents a symbolic reference to Spoofed. When the virtual machine resolves this reference, it will ask the defining loader of Cracker to load Spoofed. The defining loader of Cracker is this version of GreeterClassLoader, which the cracker has had the opportunity to modify:

// On CD-ROM in file
// linking/ex8/COM/artima/greeter/GreeterClassLoader.java
package com.artima.greeter;

import java.io.*;
import java.util.Hashtable;

public class GreeterClassLoader extends ClassLoader {

    // basePath gives the path to which this class
    // loader appends "/<typename>.class" to get the
    // full path name of the class file to load
    private String basePath;

    public GreeterClassLoader(String basePath) {

        this.basePath = basePath;
    }

    public synchronized Class loadClass(String className,
        boolean resolveIt) throws ClassNotFoundException {

        Class result;
        byte classData[];

        // Check the loaded class cache
        result = findLoadedClass(className);
        if (result != null) {
            // Return a cached class
            return result;
        }

        // If Spoofed, don't delegate
        if (className.compareTo("Spoofed") != 0) {

            // Check with the system class loader
            try {
                result = super.findSystemClass(className);
                // Return a system class
                return result;
            }
            catch (ClassNotFoundException e) {
            }
        }

        // Don't attempt to load a system file except through
        // the primordial class loader
        if (className.startsWith("java.")) {
            throw new ClassNotFoundException();
        }

        // Try to load it from the basePath directory.
        classData = getTypeFromBasePath(className);
        if (classData == null) {
            System.out.println("GCL - Can't load class: "
                + className);
            throw new ClassNotFoundException();
        }

        // Parse it
        result = defineClass(className, classData, 0,
            classData.length);
        if (result == null) {
            System.out.println("GCL - Class format error: "
                + className);
            throw new ClassFormatError();
        }

        if (resolveIt) {
            resolveClass(result);
        }

        // Return class from basePath directory
        return result;
    }

    private byte[] getTypeFromBasePath(String typeName) {

        FileInputStream fis;
        String fileName = basePath + File.separatorChar
            + typeName.replace('.', File.separatorChar)
            + ".class";

        try {
            fis = new FileInputStream(fileName);
        }
        catch (FileNotFoundException e) {
            return null;
        }

        BufferedInputStream bis = new BufferedInputStream(fis);

        ByteArrayOutputStream out = new ByteArrayOutputStream();

        try {
            int c = bis.read();
            while (c != -1) {
                out.write(c);
                c = bis.read();
            }
        }
        catch (IOException e) {
            return null;
        }

        return out.toByteArray();
    }
}

To create this user-defined class loader, the cracker took the GreeterClassLoader from the linking/ex6 directory of the CD-ROM (the one that overrides loadClass()), and added one if statement:

// If Spoofed, don't delegate
if (className.compareTo("Spoofed") != 0) {

    // Check with the system class loader
    try {
        result = super.findSystemClass(className);
        // Return a system class
        return result;
    }
    catch (ClassNotFoundException e) {
    }
}

If the type name passed to loadClass() is "Spoofed", the loadClass() method doesn't first delegate to the system class loader before attempting to load the class in its custom way, by looking in the basePath directory. As a result, when the virtual machine asks this class loader (Cracker's defining class loader) to load Spoofed, its loadClass() doesn't delegate. It just looks in the basePath directory for Spoofed.class, where it finds and loads the definition of the malicious Spoofed. The application prints:

linking/ex8/greeters/Spoofed initialized.

The next statement in Cracker's greet() method invokes giveMeFive() on the new Spoofed instance and prints its return value:

secret val = 100

Having exercised the giveMeFive() method and feeling smug, Cracker's greet() method invokes a static method in a class named Delegated, which returns a reference of type Spoofed:

spoofed = Delegated.getSpoofed();

The Java compiler transforms the Delegated.getSpoofed() expression in the source code to an invokestatic bytecode instruction that gives the index of a CONSTANT_Methodref_info entry in the constant pool. To execute this instruction, the virtual machine must resolve the constant pool entry. As the first step in resolving this symbolic reference to getSpoofed(), the virtual machine resolves the CONSTANT_Class_info reference whose index is given in the class_index of the CONSTANT_Methodref_info entry. The CONSTANT_Class_info entry is a symbolic reference to class Delegated.

To resolve Cracker's symbolic reference to Delegated, the virtual machine asks the defining class loader of Cracker to load Delegated. Once again the virtual machine invokes GreeterClassLoader's loadClass() method, this time passing in the name Delegated. However, because the requested name isn't "Spoofed", the loadClass() method goes ahead and delegates the load request to the system class loader. Because Delegated.class is sitting in the linking/ex8 directory, the system class loader is able to load the class. The system class loader is marked as the defining class loader for Delegated, and both the system class loader and the GreeterClassLoader are marked as initiating class loaders.

Once Delegated has been loaded, the virtual machine completes the resolution of the CONSTANT_Methodref_info and invokes the getSpoofed() method. Here's what Delegated's getSpoofed() method looks like:

// On CD-ROM in file linking/ex8/Delegated.java

public class Delegated {

    public static Spoofed getSpoofed() {

        return new Spoofed();
    }
}

In Java source code, this looks quite innocuous. The getSpoofed() method merely instantiates yet another Spoofed object and returns a reference to it. Inside the Java virtual machine, however, a serious challenge to Java's guarantee of type safety is looming.

When the Java compiler encounters the new Spoofed() expression in class Delegated, it generates a new instruction whose operand is the index of a CONSTANT_Class_info entry that forms a symbolic reference to Spoofed. This is exactly what happened when the Java compiler encountered the new Spoofed() expression in class Cracker. When the Java virtual machine executes this new instruction, just as when it executed the new instruction in Cracker's greet() method, it starts by resolving the symbolic reference to Spoofed. The virtual machine asks Delegated's defining loader, which is the system class loader, to load Spoofed.

Although this is the same process that the virtual machine used to resolve Cracker's symbolic reference to Spoofed, the class loader to which the virtual machine makes its load request is different. Because Cracker's defining loader was GreeterClassLoader, the virtual machine asked GreeterClassLoader to load Spoofed. But because Delegated's defining loader was the system class loader, the virtual machine now asks the system class loader to load Spoofed.
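The rule that a symbolic reference is resolved through the defining loader of the referencing class has a rough reflective analog. The sketch below is only an approximation of what the virtual machine does internally during resolution, and the class and method names (ReferrerResolution, resolveFrom) are hypothetical.

```java
// Hypothetical helper: mimics "ask the referencing class's defining loader".
final class ReferrerResolution {

    private ReferrerResolution() {}

    // Loads the named type by initiating the request at the loader that
    // defined 'referrer'. The 'false' argument asks for loading only,
    // without initializing the class.
    static Class<?> resolveFrom(Class<?> referrer, String name)
            throws ClassNotFoundException {
        // getClassLoader() returns the defining loader; it is null for classes
        // defined by the bootstrap loader, which Class.forName() accepts.
        return Class.forName(name, false, referrer.getClassLoader());
    }
}
```

Under this model, resolveFrom(Cracker.class, "Spoofed") and resolveFrom(Delegated.class, "Spoofed") start at different loaders, which is why the same name can lead to two different types.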

Because the trusted version of Spoofed is sitting in the linking/ex8 directory of the CD-ROM, the system class loader is able to read in the bytes of Spoofed.class and pass them to defineClass(). What happens next depends on whether or not the application is running in a Java virtual machine that adheres to the loading constraints specified in the second edition of the Java virtual machine specification.

Assume for a moment that the application is running in an early Java virtual machine implementation that doesn't apply the loading constraints. In that case, defineClass() is able to define the type from the bytes read in from linking/ex8/Spoofed.class. The virtual machine creates a new instance of this trusted Spoofed type. Shortly thereafter, Delegated's getSpoofed() method returns a reference to the trusted Spoofed object to its caller, Cracker's greet() method. Cracker stores this reference in local variable spoofed, and proceeds to print out the value returned by invoking giveMeFive() on spoofed.

When Cracker.java was compiled, the Java compiler transformed this second giveMeFive() invocation into yet another invokevirtual instruction that references a CONSTANT_Methodref_info entry in the constant pool, the symbolic reference to giveMeFive() in Spoofed. When the virtual machine goes to resolve this symbolic reference, however, it discovers it has already been resolved. The CONSTANT_Methodref_info entry specified by the second giveMeFive() invocation is the same as that specified by the first one, which was resolved to the malicious Spoofed's implementation of giveMeFive(). The virtual machine invokes the malicious Spoofed method on the trusted Spoofed object, and the application prints:

secret val = 42
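The effect of reusing an already resolved constant pool entry can be modeled in a few lines. This is a toy model, not how any particular virtual machine stores its runtime constant pool: the class MethodRefEntry is hypothetical, an object is reduced to the int value of its secret field, and the example assumes (as the printed output suggests) that the malicious giveMeFive() returns that field.

```java
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

// Hypothetical model of a method reference that is resolved at most once.
final class MethodRefEntry {

    private final Supplier<IntUnaryOperator> resolver; // stands in for resolution
    private IntUnaryOperator target;                    // null until first use
    private int resolutions;                            // times resolution ran

    MethodRefEntry(Supplier<IntUnaryOperator> resolver) {
        this.resolver = resolver;
    }

    // Invokes the referenced method on a receiver, modeled as its secret value.
    int invoke(int receiverSecret) {
        if (target == null) {
            target = resolver.get(); // first use: resolve and remember the target
            resolutions++;
        }
        return target.applyAsInt(receiverSecret); // later uses skip resolution
    }

    int resolutionCount() {
        return resolutions;
    }

    public static void main(String[] args) {
        // Resolution binds the entry to a method body that returns the secret.
        MethodRefEntry giveMeFive = new MethodRefEntry(() -> secret -> secret);
        System.out.println("secret val = " + giveMeFive.invoke(100));
        System.out.println("secret val = " + giveMeFive.invoke(42));
    }
}
```

The second invoke() call never consults the resolver, so whichever method body the first resolution selected is applied to every later receiver, including one whose real type is different.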

Although this kind of type confusion attack was possible in many implementations of the Java virtual machine prior to version 1.2, it usually couldn't be exploited in practice, because it requires the cooperation of a class loader. In this example, the cracker added an if statement to GreeterClassLoader's loadClass() method that causes it to treat Spoofed specially. Were the cracker to attempt this kind of type confusion attack via an untrusted applet, he or she would run into trouble, because untrusted applets are not allowed to create class loaders. Thus, provided the designers of the class loaders in the browser application that loads applets did their jobs correctly, the cracker would have no way to exploit this (former) weakness in Java's type safety guarantee.

In Java virtual machine implementations that check the loading constraints that are now part of the Java virtual machine specification, this type confusion is not possible at all. All virtual machines must now keep an internal list of loading constraints that must be met as types are loaded. For example, when such a virtual machine resolves the CONSTANT_Methodref_info entry in Cracker's constant pool that forms a symbolic reference to the getSpoofed() method of class Delegated, the virtual machine records a loading constraint. Because Delegated was defined by a different class loader than Cracker, and Delegated's getSpoofed() method returns a reference to a Spoofed, the virtual machine records the constraint that the type named Spoofed loaded by GreeterClassLoader must be the same type as the type named Spoofed loaded by the system class loader.

This constraint is checked later, when the virtual machine attempts to resolve the CONSTANT_Class_info entry in Delegated's constant pool that forms a symbolic reference to class Spoofed. At that time, the virtual machine discovers that the constraint is violated. The type named Spoofed that is being loaded by the system class loader is not the same type named Spoofed that was loaded by GreeterClassLoader. As a result, the Java virtual machine throws a LinkageError:

Exception in thread "main" java.lang.LinkageError: Class Spoofed violates loader constraints
	at java.lang.ClassLoader.defineClass0(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:422)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:10)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:248)
	at java.net.URLClassLoader.access$1(URLClassLoader.java:216)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:191)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:290)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:275)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at Delegated.getSpoofed(Delegated.java, Compiled Code)
	at Cracker.greet(Cracker.java:13)
	at Greet.main(Greet.java, Compiled Code)
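The bookkeeping behind this error can be approximated with a small table. The sketch below is a simplified model under stated assumptions: loaders and types are represented by plain strings, the class name LoadingConstraintTable and the type identifiers are hypothetical, and constraints are checked only pairwise, whereas a real virtual machine must also honor constraints that follow transitively.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a loading constraint list; not a real VM structure.
final class LoadingConstraintTable {

    // "loader/name" -> identity of the type that the loader bound to the name
    private final Map<String, String> loaded = new HashMap<>();
    // each entry {name, loaderA, loaderB}: name must be the same type in both
    private final List<String[]> constraints = new ArrayList<>();

    private static String key(String loader, String name) {
        return loader + "/" + name;
    }

    // Records a constraint, failing at once if both loaders already disagree.
    void recordConstraint(String name, String loaderA, String loaderB) {
        String a = loaded.get(key(loaderA, name));
        String b = loaded.get(key(loaderB, name));
        if (a != null && b != null && !a.equals(b)) {
            throw new LinkageError("Class " + name + " violates loader constraints");
        }
        constraints.add(new String[] {name, loaderA, loaderB});
    }

    // Called when 'loader' is about to bind 'name' to the type 'typeId'.
    void recordLoad(String loader, String name, String typeId) {
        for (String[] c : constraints) {
            if (!c[0].equals(name)) {
                continue;
            }
            String other = c[1].equals(loader) ? c[2]
                         : c[2].equals(loader) ? c[1] : null;
            if (other == null) {
                continue; // constraint does not involve this loader
            }
            String existing = loaded.get(key(other, name));
            if (existing != null && !existing.equals(typeId)) {
                throw new LinkageError("Class " + name + " violates loader constraints");
            }
        }
        loaded.put(key(loader, name), typeId);
    }
}
```

Replaying the example against this model: GreeterClassLoader first binds Spoofed to the malicious type, the constraint between GreeterClassLoader and the system class loader is then recorded without complaint because the system class loader has not yet loaded Spoofed, and the later attempt by the system class loader to bind Spoofed to the trusted type fails with a LinkageError.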

Java's guarantee of type safety is a cornerstone of its security model. Type safety means that programs are allowed to manipulate the memory occupied by an object's instance variables on the heap only in ways that are defined by that object's class. Likewise, type safety means that programs are allowed to manipulate the memory occupied by a class's static variables in the method area only in ways that are defined by that class. If the virtual machine can become confused about types, as demonstrated in this example, malicious code can potentially look at or change non-public variables. In addition, if malicious code could use a method defined in one version of a type to set an int instance variable, then use a method in another version of that type to interpret and return the value of the int as an array, the malicious code would in effect transform an int to an array reference. With this forged pointer, the malicious code could wreak all kinds of havoc. Thus, it is important that Java's type safety guarantee be iron-clad. The loading constraints ensure that, even in the presence of multiple namespaces, Java's type safety will be enforced at runtime.

On the CD-ROM

The CD-ROM contains the source code examples from this chapter in the linking directory.

The Resources Page

For more information about the material presented in this chapter, visit the resources page: http://www.artima.com/insidejvm/resources/


Copyright © 1996-2019 Artima, Inc. All Rights Reserved.