The Artima Developer Community
Sponsored Link

Chapter 3 of Inside the Java Virtual Machine
Security
by Bill Venners

<<  Page 4 of 17  >>

Advertisement

Pass Three: Bytecode Verification

Once the class file verifier has successfully completed the pass two checks, it turns its attention to the bytecodes. During this pass, which is commonly called the "bytecode verifier," the Java virtual machine performs a data-flow analysis on the streams of bytecodes that represent the methods of the class. To understand the bytecode verifier, you need to understand a bit about bytecodes and frames.

The bytecode streams that represent Java methods are a series of one-byte instructions, called opcodes, each of which may be followed by one or more operands. The operands supply extra data needed by the Java virtual machine to execute the opcode instruction. The activity of executing bytecodes, one opcode after another, constitutes a thread of execution inside the Java virtual machine. Each thread is awarded its own Java Stack, which is made up of discrete frames. Each method invocation gets its own frame, a section of memory where it stores, among other things, local variables and intermediate results of computation. The part of the frame in which a method stores intermediate results is called the method's operand stack. An opcode and its (optional) operands may refer to the data stored on the operand stack or in the local variables of the method's frame. Thus, the virtual machine may use data on the operand stack, in the local variables, or both, in addition to any data stored as operands following an opcode when it executes the opcode.

The bytecode verifier does a great deal of checking. It checks to make sure that no matter what path of execution is taken to get to a certain opcode in the bytecode stream, the operand stack always contains the same number and types of items. It checks to make sure no local variable is accessed before it is known to contain a proper value. It checks that fields of the class are always assigned values of the proper type, and that methods of the class are always invoked with the correct number and types of arguments. The bytecode verifier also checks to make sure that each opcode is valid, that each opcode has valid operands, and that for each opcode, values of the proper type are in the local variables and on the operand stack. These are just a few of the many checks performed by the bytecode verifier, which is able, through all its checking, to verify that a stream of bytecodes is safe for the Java virtual machine to execute.

The bytecode verifier doesn't attempt to detect all safe programs. If it tried to do that, it would run up against the Halting Problem. The Halting Problem, a well-known theorem in computer science, states that you can't write a program that can determine whether any program fed to it as input will halt when it is executed. Whether or not a program will halt is called an "undecidable" property of the program, because you can't write a program that can tell you 100% of the time whether or not any given program has the property. The undecideability of the Halting Problem extends to many properties of computer programs, including whether or not a set of Java bytecodes would be safe for a Java virtual machine to execute.

The way the bytecode verifier gets around the Halting Problem is by not attempting to pass all safe programs. Although you can't write a program that can determine whether or not any given program will halt, you can write a program that recognizes some programs that will halt. For example, if the first instruction of a program is halt, that program will halt. If a program has no loops in it, it will halt, and so on. Similarly, although you can't write a verifier that will pass all bytecode streams that are safe for the virtual machine to execute, you can write a verifier that will pass some of them. And that's just what Java's bytecode verifier does. The verifier checks to make sure a certain set of rules are followed by each set of bytecodes its fed. If a set of bytecodes obeys all the rules, the verifier knows the bytecodes are safe for the virtual machine to execute. If not, the bytecodes may or may not be safe for the virtual machine to execute. Thus, the verifier gets around the Halting Problem by recognizing some, but not all, safe bytecode streams. Given the nature of the constraints checked by the bytecode verifier, any program that can be written in the Java programming language can be compiled to bytecodes that will pass the verifier. Some programs that could not possibly be expressed in the Java programming language will pass the verifier. And some programs (also not expressible in Java source code) that would otherwise be safe for the virtual machine to execute, will not pass the verifier.

Passes one, two, and three of the class file verifier make sure the imported class file is properly formed, internally consistent, adheres to the constraints of the Java programming language, and contains bytecodes that will be safe for the Java virtual machine to execute. If the class file verifier finds that any of these are not true, it throws an error, and the class file is never used by the program.

Pass Four: Verification of Symbolic References

Pass four of the class file verifier takes place when the symbolic references contained in a class file are resolved in the process of dynamic linking. During pass four, the Java virtual machine follows the references from the class file being verified to the referenced class files, to make sure the references are correct. Because pass four has to look at other classes external to the class file being checked, pass four may require that new classes be loaded. Most Java virtual machine implementations will likely delay loading classes until they are actually used by the program. If an implementation does load classes earlier, perhaps in an attempt to speed up the loading process, then it must still give the impression that it is loading classes as late as possible. If, for example, a Java virtual machine discovers during early loading that it can't find a certain referenced class, it doesn't throw a NoClassDefFoundError error until (and unless) the referenced class is used for the first time by the running program. Thus, if a Java virtual machine performs early linking, pass four could happen shortly after pass three. But in Java virtual machines that resolve each symbolic reference the first time it is used, pass four will happen much later than pass three, as bytecodes are executed.

Pass four of class file verification is really just part of the process of dynamic linking. When a class file is loaded, it contains symbolic references to other classes and their fields and methods. A symbolic reference is a character string that gives the name and possibly other information about the referenced item- -enough information to uniquely identify a class, field, or method. Thus, symbolic references to other classes give the full name of the class; symbolic references to the fields of other classes give the class name, field name, and field descriptor; symbolic references to the methods of other classes give the class name, method name, and method descriptor.

Dynamic linking is the process of resolving symbolic references into direct references. As the Java virtual machine executes bytecodes and encounters an opcode that, for the first time, uses a symbolic reference to another class, the virtual machine must resolve the symbolic reference. The virtual machine performs two basic tasks during resolution:

  1. finds the class being referenced (loading it if necessary)
  2. replaces the symbolic reference with a direct reference, such as a pointer or offset, to the class, field, or method
The virtual machine remembers the direct reference so that if it encounters the same reference again later, it can immediately use the direct reference without needing to spend time resolving the symbolic reference again.

When the Java virtual machine resolves a symbolic reference, pass four of the class file verifier makes sure the reference is valid. If the reference is not valid--for instance, if the class cannot be loaded or if the class exists but doesn't contain the referenced field or method--the class file verifier throws an error.

As an example, consider again the Volcano class. If a method of class Volcano invokes a method in a class named Lava, the name and descriptor of the method in Lava are included as part of the binary data in the class file for Volcano. When Volcano's method first invokes Lava's method during the course of execution, the Java virtual machine makes sure a method exists in class Lava that has a name and descriptor that matches those expected by class Volcano. If the symbolic reference (class name, method name and descriptor) is correct, the virtual machine replaces it with a direct reference, such as a pointer, which it will use from then on. But if the symbolic reference from class Volcano doesn't match any method in class Lava, pass four verification fails, and the Java virtual machine throws a NoSuchMethodError.

Binary Compatibility

The reason pass four of the class file verifier must look at classes that refer to one another to make sure they are compatible is because Java programs are dynamically linked. Java compilers will often recompile classes that depend on a class you have changed, and in so doing, detect any incompatibility at compile- time. But there may be times when your compiler doesn't recompile a dependent class. For example, if you are developing a large system, you will likely partition the various parts of the system into packages. If you compile each package separately, then a change to one class in a package would likely cause a recompilation of affected classes within that same package, but not necessarily in any other package. Moreover, if you are using someone else's packages, especially if your program downloads class files from someone else's package across a network as it runs, it may be impossible for you to check for compatibility at compile-time. That's why pass four of the class file verifier must check for compatibility at run-time.

As an example of incompatible changes, imagine you compiled class Volcano (from the previous example) with a Java compiler. Because a method in Volcano invokes a method in another class named Lava, the Java compiler would look for a class file or a source file for class Lava to make sure there was a method in Lava with the appropriate name, return type, and number and types of arguments. If the compiler couldn't find any Lava class, or if it encountered a Lava class that didn't contain the desired method, the compiler would generate an error and would not create a class file for Volcano. Otherwise, the Java compiler would produce a class file for Volcano that is compatible with the class file for Lava. In this case, the Java compiler refused to generate a class file for Volcano that wasn't already compatible with class Lava.

The converse, however, is not necessarily true. The Java compiler could conceivably generate a class file for Lava that isn't compatible with Volcano. If the Lava class doesn't refer to Volcano, you could potentially change the name of the method Volcano invokes from the Lava class, and then recompile only the Lava class. If you tried to run your program using the new version of Lava, but still using the old version of Volcano that wasn't recompiled since you made your change to Lava, the Java virtual machine would, as a result of pass four class file verification, throw a NoSuchMethodError when Volcano attempted to invoke the now non-existent method in Lava.

In this case, the change to class Lava broke binary compatibility with the pre-existing class file for Volcano. In practice, this situation may arise when you update a library you have been using, and your existing code isn't compatible with the new version of the library. To make it easier to alter the code for libraries, the Java programming language was designed to allow you to make many kinds of changes to a class that don't require recompilation of classes that depend upon it. The changes you are allowed to make, which are listed in the Java Language Specification, are called the rules of binary compatibility. These rules clearly define what can be changed, added, or deleted in a class without breaking binary compatibility with pre-existing class files that depend on the changed class. For example, it is always a binary compatible change to add a new method to a class, but never to delete a method that other classes are using. So in the case of Lava, you violated the rules of binary compatibility when you changed the name of the method used by Volcano, because you in effect deleted the old method and added a new. If you had, instead, added the new method and then rewritten the old method so it calls the new, that change would have been binary compatible with any pre-existing class file that already used Lava, including Volcano.

<<  Page 4 of 17  >>


Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use