Java Type Loading, Linking, and Initialization

Chapter 7 of Inside the Java Virtual Machine
The Lifetime of a Type
by Bill Venners

<< Page 3 of 6 >>

Verification

After a type is loaded, it is ready to be linked. The first step of the linking process is verification: ensuring that the type obeys the semantics of the Java language and that it won't violate the integrity of the virtual machine.

Verification is another area in which implementations of the Java virtual machine have some flexibility. Implementation designers can decide how and when to verify types. The Java virtual machine specification lists all the exceptions that a virtual machine can throw and the circumstances under which it must throw them. No matter what kind of trouble a Java virtual machine might encounter, there is an exception or error it is supposed to throw. In some cases, the specification says exactly when the exception or error should be thrown, but it usually doesn't dictate precisely how or when the error condition should be detected.

Nevertheless, certain kinds of checks are very likely to take place at certain times in most Java virtual machine implementations. For example, during the loading process, the virtual machine must parse the stream of binary data that represents the type and build internal data structures. At this point, certain checks will have to be done just to ensure the initial act of parsing the binary data won't crash the virtual machine. During this parsing, implementations will likely check the binary data to make sure it has the expected overall format. Parsers of the Java class file format might check the magic number, make sure each component is in the right place and of the proper length, verify that the file isn't too short or too long, and so on. Although these checks take place during loading, before the official verification phase of linking, they are still logically part of the verification phase. The entire process of detecting any kind of problem with loaded types is placed under the category of verification.
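The kind of format check described above can be sketched in a few lines. This is only an illustration, not how any particular virtual machine parses class files: the class name ClassFileSanity, the method check, and the returned strings are all hypothetical. It assumes only the start of the class file layout (a four-byte magic number, two two-byte version fields, and a two-byte constant pool count).

```java
import java.nio.ByteBuffer;

// Hypothetical helper: checks only the first few fields of a class file.
class ClassFileSanity {
    static final int MAGIC = 0xCAFEBABE;
    // magic (4 bytes) + minor_version (2) + major_version (2) + constant_pool_count (2)
    static final int MIN_HEADER_BYTES = 10;

    static String check(byte[] data) {
        if (data == null || data.length < MIN_HEADER_BYTES) {
            return "too short";
        }
        ByteBuffer in = ByteBuffer.wrap(data); // ByteBuffer reads big-endian by default
        if (in.getInt() != MAGIC) {
            return "bad magic number";
        }
        in.getShort(); // minor_version, not validated in this sketch
        in.getShort(); // major_version, not validated in this sketch
        int constantPoolCount = in.getShort() & 0xFFFF;
        if (constantPoolCount == 0) {
            // The count is one greater than the number of entries, so zero is never valid.
            return "bad constant pool count";
        }
        return "ok";
    }
}
```

A real parser would go on to check every remaining component in the same defensive way, refusing to read past the end of the data.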

Another check that likely occurs during loading is making sure that every class except Object has a superclass. This may be done during loading because when the virtual machine loads a class, it must also make sure all of the class's superclasses are loaded. The only way a virtual machine can know the name of a given class's superclass is by peering into the binary data for the class. Since the virtual machine is looking at every class's superclass data during loading anyway, it may as well make this check during the loading phase.
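The superclass rule can be observed from a running program through reflection. The following sketch walks a class's superclass chain until it reaches Object, whose superclass is reported as null. The class name SuperclassChain and the method chainOf are made up for this illustration; it shows the result of loading, not the check itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists a class and all of its superclasses, ending at Object.
class SuperclassChain {
    static List<String> chainOf(Class<?> type) {
        List<String> names = new ArrayList<>();
        // getSuperclass() returns null only for Object (and for interfaces and primitives).
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            names.add(c.getName());
        }
        return names;
    }
}
```

For example, SuperclassChain.chainOf(Integer.class) yields java.lang.Integer, java.lang.Number, and java.lang.Object, in that order.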

Another check, one that likely occurs after the official verification phase in most implementations, is the verification of symbolic references. As described in earlier chapters, the process of dynamic linking involves locating classes, interfaces, fields, and methods referred to by symbolic references stored in the constant pool, and replacing the symbolic references with direct references. When the virtual machine searches for a symbolically referenced entity (type, field, or method), it must first make sure the entity exists. If the virtual machine finds that the entity exists, it must further check that the referencing type has permission to access the entity, given the entity's access permissions. These checks for existence and access permission are logically a part of verification, the first phase of linking, but most likely happen during resolution, the third phase of linking. Resolution itself can be delayed until each symbolic reference is first used by the program, so these checks may even take place after initialization.
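The two checks, existence and then access permission, have a rough analog in the java.lang.invoke API, which looks up a method by name and type on behalf of a particular calling class. The sketch below is only an analogy under that assumption; it does not show the virtual machine's internal resolution code. The classes ResolutionTarget and ResolutionProbe and the method probe are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical target type with one accessible and one inaccessible method.
class ResolutionTarget {
    static int visible() { return 1; }
    private static int hidden() { return 2; }
}

// Hypothetical referencing type: tries to look up a method in ResolutionTarget by name.
class ResolutionProbe {
    static String probe(String methodName) {
        try {
            MethodHandles.lookup().findStatic(
                    ResolutionTarget.class, methodName, MethodType.methodType(int.class));
            return "resolved";
        } catch (NoSuchMethodException e) {
            return "missing";        // the entity does not exist
        } catch (IllegalAccessException e) {
            return "inaccessible";   // the entity exists, but this caller may not access it
        }
    }
}
```

Compiled as shown, with the two classes as separate top-level types, probe("visible") resolves, probe("absent") reports a missing method, and probe("hidden") reports an inaccessible one.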

So what gets checked during the official verification phase? Anything that hasn't already been checked before the official verification phase and that won't get checked after it. Here are two lists of some of the things that are good candidates for checking during the official verification phase. This first list is composed of checks that ensure classes are binary compatible with each other:

checking that final classes are not subclassed
checking that final methods are not overridden
making sure no incompatible method declarations (such as two methods that have the same name, the same number, order, and types of parameters, but different return types) appear between the type and its supertypes

Note that while these checks require looking at other types, they only require looking at supertypes. Superclasses need to be initialized before subclasses, so these classes are likely already loaded. Superinterfaces do not need to be initialized when a class that implements them is initialized. However, superinterfaces must be loaded when the class that implements them (or the interface that extends them) is loaded. (They won't be initialized, just loaded and possibly linked at the option of the virtual machine implementation.) All a class's supertypes will be loaded when the class is loaded. At verification time, the class and all its supertypes will be checked to make sure they are all still binary compatible with one another.
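The difference between how superclasses and superinterfaces are treated at initialization can be observed with a small experiment. In this sketch every name (InitLog, Tagged, Base, Derived, SupertypeInitDemo) is invented for the illustration, and the interface is assumed to declare no default methods, since an interface that does is initialized along with its implementing class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical log that records the order in which types are initialized.
class InitLog {
    static final List<String> events = new ArrayList<>();
    static int note(String event) { events.add(event); return events.size(); }
}

interface Tagged {
    // Not a compile-time constant, so reading it requires initializing Tagged.
    int TAG = InitLog.note("Tagged initialized");
}

class Base {
    static { InitLog.note("Base initialized"); }
}

class Derived extends Base implements Tagged {
    static { InitLog.note("Derived initialized"); }
    static void touch() { }
}

class SupertypeInitDemo {
    static List<String> run() {
        Derived.touch();       // first active use of Derived: initializes Base, then Derived
        int tag = Tagged.TAG;  // first active use of Tagged: only now is the interface initialized
        return new ArrayList<>(InitLog.events);
    }
}
```

SupertypeInitDemo.run() returns the events in the order "Base initialized", "Derived initialized", "Tagged initialized": the superclass is initialized before the subclass, while the superinterface waits for its own first active use.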

The second list is composed of checks on the type itself:

checking that all constant pool entries are consistent with each other (for example, the string_index item of a CONSTANT_String_info entry must be the index of a CONSTANT_Utf8_info entry)
checking that all special strings contained in the constant pool (class names, field and method names, field and method descriptors) are well-formed
verifying the integrity of the bytecodes

The most complicated task in the above list is the last one: bytecode verification. All Java virtual machines must in some way verify the integrity of the bytecodes for every method they execute. For example, implementations are not allowed to crash because a jump instruction sends the virtual machine beyond the end of a method. They must detect that the jump instruction is invalid through some process of bytecode verification, and throw an error.
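As a narrow illustration of the jump example, the sketch below checks that a single goto instruction lands inside its method's code array. It assumes the goto encoding of opcode 0xA7 followed by a signed 16-bit branch offset relative to the instruction's own address. The class name BranchCheck and the method gotoStaysInMethod are hypothetical, and a real verifier does far more, including making sure the target is the start of an instruction.

```java
// Hypothetical helper: range-checks the target of one goto instruction.
class BranchCheck {
    static final int GOTO = 0xA7;

    /** Returns true if the goto at index pc branches to an index inside code. */
    static boolean gotoStaysInMethod(byte[] code, int pc) {
        if (pc < 0 || pc + 2 >= code.length) {
            return false; // the instruction's two operand bytes would run off the end
        }
        if ((code[pc] & 0xFF) != GOTO) {
            throw new IllegalArgumentException("no goto at index " + pc);
        }
        // Combine the two operand bytes into a signed 16-bit offset.
        int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
        int target = pc + offset;
        return target >= 0 && target < code.length;
    }
}
```

With a four-byte method body, a goto at index 0 with offset 3 passes the check, while the same instruction with offset 127 would send execution beyond the end of the method and fails it.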

Java virtual machine implementations are not required to verify bytecodes during the official verification phase of linking. Implementations are free, for example, to verify individual instructions as each instruction is executed. One of the design goals of the Java virtual machine instruction set, however, was that it yield bytecode streams that can be verified all at once by a data flow analyzer. The ability to verify bytecode streams all at once during linking, rather than on the fly as the program runs, gives a big boost to the potential execution speed of Java programs.

When verifying bytecodes via a data flow analyzer, the virtual machine may have to load other classes to ensure that the semantics of the Java language are being followed. For example, imagine a class contained a method that assigned a reference to an instance of java.lang.Float to a field of type java.lang.Number. In this case, the virtual machine would have to load class Float during bytecode verification to make sure it was a subclass of class Number. It would have to load Number to make sure it wasn't declared final. The virtual machine must not initialize class Float at this time, just load it. Float will be initialized only upon its first active use.
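Loading a class without initializing it can be imitated from Java code with the three-argument form of Class.forName, passing false for its initialize parameter. The sketch below is only an analogy to what a verifier might do internally. The names LazyInitDemo, Shape, and Circle are hypothetical stand-ins for the Number and Float of the example above.

```java
// Hypothetical demo: loads Circle to check its supertype without running its static initializer.
class LazyInitDemo {
    static boolean circleInitialized = false;

    static class Shape { }

    static class Circle extends Shape {
        static { circleInitialized = true; } // runs only when Circle is initialized
    }

    /** Loads Circle without initializing it and reports whether it is a subclass of Shape. */
    static boolean circleIsAShape() {
        try {
            String name = LazyInitDemo.class.getName() + "$Circle";
            Class<?> circle = Class.forName(name, false, LazyInitDemo.class.getClassLoader());
            return Shape.class.isAssignableFrom(circle);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After circleIsAShape() returns true, circleInitialized is still false. It becomes true only at the first active use of Circle, such as creating an instance with new LazyInitDemo.Circle().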

For more information on the class verification process, see Chapter 3, "Security."

Preparation

After a Java virtual machine has loaded a class and performed whatever verification it chooses to do up front, the class is ready for preparation. During the preparation phase, the Java virtual machine allocates memory for the class variables and sets that memory to a default initial value determined by the type of each variable. The class variables are not initialized to their proper initial values until the initialization phase. (No Java code is executed during the preparation step.) The default values for the various types are shown in Table 7-1.

Type	Initial Value
`int`	`0`
`long`	`0L`
`short`	`(short) 0`
`char`	`'\u0000'`
`byte`	`(byte) 0`
`boolean`	`false`
`reference`	`null`
`float`	`0.0f`
`double`	`0.0d`

Table 7-1. Default initial values for the primitive and reference types

Although the boolean type appears in Table 7-1, the Java virtual machine itself has very little support for booleans. Internally, boolean is usually implemented as an int, which gets set to zero (boolean false) by default. Therefore, boolean class variables, even if they are implemented internally as ints, are initialized to false.
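The defaults in Table 7-1 can be seen from Java code by declaring class variables without initializers. The sketch below also shows that a default is in place before initialization assigns the proper value: the variable early is initialized first and reads late while late still holds its preparation default. The class PreparationDefaults and all of its members are hypothetical names chosen for this illustration.

```java
// Hypothetical demo of the default values assigned during preparation.
class PreparationDefaults {
    // No initializers: each variable keeps the default value from Table 7-1.
    static int i;
    static long l;
    static short s;
    static char c;
    static byte b;
    static boolean z;
    static Object ref;
    static float f;
    static double d;

    // Initializers run in textual order, so peekLate() runs before late is set to 42.
    static int early = peekLate();
    static int late = 42;

    static int peekLate() {
        return late; // still the default value 0 when called from early's initializer
    }
}
```

Once the class has been initialized, early holds 0 and late holds 42, even though early was assigned from late.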

During the preparation phase, Java virtual machine implementations may also allocate memory for data structures that are intended to improve the performance of the running program. An example of such a data structure is a method table, which contains a pointer to the data for every method in a class, including those inherited from its superclasses. A method table enables an inherited method to be invoked on an object without a search of superclasses at the point of invocation. Method tables are described in more detail in Chapter 8, "The Linking Model."
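The idea behind a method table can be sketched with reflection: start with the methods of Object, then walk down the superclass chain so that each subclass's declarations replace the entries they override. This is a simplified illustration, not the layout any real virtual machine uses. It keys entries by name and parameter types, ignores static and private methods, and does not model the finer rules for overriding package-private methods across packages. All names here (MethodTableSketch, SketchAnimal, SketchDog) are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a per-class method table built with reflection.
class MethodTableSketch {
    static String key(String name, Class<?>... params) {
        return name + Arrays.toString(params);
    }

    static Map<String, Method> build(Class<?> type) {
        // Push the chain so that Object ends up on top and is processed first.
        Deque<Class<?>> chain = new ArrayDeque<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            chain.push(c);
        }
        Map<String, Method> table = new LinkedHashMap<>();
        for (Class<?> c : chain) {
            for (Method m : c.getDeclaredMethods()) {
                int mod = m.getModifiers();
                if (Modifier.isStatic(mod) || Modifier.isPrivate(mod)) {
                    continue; // not dispatched through the table in this sketch
                }
                // A subclass entry with the same key replaces the inherited one.
                table.put(key(m.getName(), m.getParameterTypes()), m);
            }
        }
        return table;
    }
}

class SketchAnimal {
    String speak() { return "..."; }
    String name() { return "animal"; }
}

class SketchDog extends SketchAnimal {
    @Override
    String speak() { return "Woof"; }
}
```

In the table built for SketchDog, the entry for speak points at the method declared in SketchDog, while the entries for name and toString point at the inherited methods in SketchAnimal and Object. Finding any of them takes one lookup in the table rather than a search of the superclasses.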

Resolution

After a type has been through the first two phases of linking, verification and preparation, it is ready for the third and final phase of linking: resolution. Resolution is the process of locating classes, interfaces, fields, and methods referenced symbolically from a type's constant pool, and replacing those symbolic references with direct references. As mentioned above, this phase of linking may be deferred: an implementation need not resolve a symbolic reference until (and unless) that reference is first used by the program. Resolution is described in detail in Chapter 8, "The Linking Model."
