Summary
This article looks at the class verifier of the Java virtual machine (JVM). The class verifier enables untrusted code to be verified up front, rather than on the fly as the code is executed. This ability provides uninterrupted execution (the program can't "crash" uncontrollably) at a minimal cost in execution speed.
This month's article continues the discussion of Java's security model begun in August's "Under the Hood." In that article, I gave a general overview of the security mechanisms built into the Java virtual machine (JVM). I also looked closely at one aspect of those security mechanisms: the JVM's built-in safety features. In September's "Under the Hood," I examined the class loader architecture, another aspect of the JVM's built-in security mechanisms. This month I'll focus on the third prong of the JVM's security strategy: the class verifier.
The class-file verifier
Every Java virtual machine has a class-file verifier, which ensures
that loaded class files have a proper internal structure. If the
class-file verifier discovers a problem with a class file, it throws an
error. Because a class file is just a sequence of binary data, a
virtual machine can't know whether a particular class file was
generated by a well-meaning Java compiler or by shady crackers bent on
compromising the integrity of the virtual machine. As a consequence,
all JVM implementations have a class-file verifier that can be invoked
on untrusted classes, to make sure the classes are safe to use.
One of the security goals that the class-file verifier helps achieve is program robustness. If a buggy compiler or savvy cracker generated a class file that contained a method whose bytecodes included an instruction to jump beyond the end of the method, that method could, if it were invoked, cause the virtual machine to crash. Thus, for the sake of robustness, it is important that the virtual machine verify the integrity of the bytecodes it imports.
Although Java virtual machine designers are allowed to decide when
their virtual machines will perform these checks, many implementations
will do most checking just after a class is loaded. Such a virtual
machine analyzes bytecodes (and verifies their integrity) once, before
they are ever executed. As part of its verification of bytecodes, the
Java virtual machine makes sure all jump instructions -- for example,
goto (jump always), ifeq (jump if top of
stack zero), etc. -- cause a jump to another valid instruction in the
bytecode stream of the method. As a consequence, the virtual machine
need not check for a valid target every time it encounters a jump
instruction as it executes bytecodes. In most cases, checking all
bytecodes once before they are executed is a more efficient way to
guarantee robustness than checking each bytecode instruction every time
it is executed.
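The one-time jump check described above can be sketched roughly as follows. This is a minimal illustration, not how any particular JVM implements it: it assumes a toy subset of five real opcodes (nop, iconst_0, ifeq, goto, return), and the class and method names are invented for the example. It first records where each instruction starts, then confirms that every branch lands on one of those starts.

```java
import java.util.HashSet;
import java.util.Set;

final class JumpTargetCheck {
    // A small subset of real JVM opcodes; a real verifier handles all of them.
    static final int NOP = 0x00, ICONST_0 = 0x03, IFEQ = 0x99, GOTO = 0xA7, RETURN = 0xB1;

    // Length in bytes of the instruction (opcode plus operands), or -1 if unknown.
    static int length(int opcode) {
        switch (opcode) {
            case NOP: case ICONST_0: case RETURN: return 1;
            case IFEQ: case GOTO: return 3; // opcode + signed 16-bit branch offset
            default: return -1;
        }
    }

    // Returns true if every branch in the method lands on the start of an instruction.
    static boolean jumpsAreValid(byte[] code) {
        // Pass 1: record where each instruction starts.
        Set<Integer> starts = new HashSet<>();
        for (int pc = 0; pc < code.length; ) {
            int len = length(code[pc] & 0xFF);
            if (len < 0 || pc + len > code.length) return false; // unknown or truncated
            starts.add(pc);
            pc += len;
        }
        // Pass 2: branch offsets are relative to the address of the branch opcode.
        for (int pc : starts) {
            int opcode = code[pc] & 0xFF;
            if (opcode == IFEQ || opcode == GOTO) {
                int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                if (!starts.contains(pc + offset)) return false;
            }
        }
        return true;
    }
}
```

Because the check runs once over the whole method, an interpreter that trusts its result can execute goto and ifeq without testing the target each time.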
A class-file verifier that performs its checking as early as possible most likely operates in two distinct phases. During phase one, which takes place just after a class is loaded, the class-file verifier checks the internal structure of the class file, including verifying the integrity of the bytecodes it contains. During phase two, which takes place as bytecodes are executed, the class-file verifier confirms the existence of symbolically referenced classes, fields, and methods.
Phase one: Internal checks
During phase one, the class-file verifier checks everything that's
possible to check in a class file by looking at only the class file
itself (without examining any other classes or interfaces). Phase one
of the class-file verifier makes sure the imported class file is
properly formed and internally consistent, adheres to the constraints of
the Java programming language, and contains bytecodes that will be safe
for the Java virtual machine to execute. If the class-file verifier
finds that any of these conditions does not hold, it throws an error,
and the class file is never used by the program.
Checking format and internal consistency
Besides verifying the integrity of the bytecodes, the verifier performs
many checks for proper class file format and internal consistency
during phase one. For example, every class file must start with the
same four bytes, the magic number: 0xCAFEBABE. The purpose
of magic numbers is to make it easy for file parsers to recognize a
certain type of file. Thus, the first thing a class-file verifier
likely checks is that the imported file does indeed begin with
0xCAFEBABE.
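As a rough illustration, the magic-number test might look like the sketch below. The class and method names are invented for the example; only the value 0xCAFEBABE comes from the class file format itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class MagicNumberCheck {
    static final int MAGIC = 0xCAFEBABE;

    // Returns true if the data begins with the four magic bytes CA FE BA BE.
    static boolean hasClassFileMagic(byte[] data) {
        if (data.length < 4) return false; // too short to hold the magic number
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt() == MAGIC;  // readInt() reads four bytes, big-endian
        } catch (IOException e) {
            return false;
        }
    }
}
```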
The class-file verifier also checks to make sure the class file is neither truncated nor enhanced with extra trailing bytes. Although different class files can be different lengths, each individual component contained inside a class file indicates its length as well as its type. The verifier can use the component types and lengths to determine the correct total length for each individual class file. In this way, it can verify that the imported file has a length consistent with its internal contents.
The verifier also looks at individual components to make sure they are well-formed instances of their type of component. For example, a method descriptor (the method's return type and the number and types of its parameters) is stored in the class file as a string that must adhere to a certain context-free grammar. One of the checks the verifier performs on individual components is to make sure each method descriptor is a well-formed string of the appropriate grammar.
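To make the descriptor check concrete, here is a simplified sketch of a method-descriptor validator. It follows the general shape of the grammar (a parenthesized list of parameter types followed by a return type, where V means void), but it is an approximation: it does not validate the characters inside class names or limit array dimensions, and the class and method names are invented for the example.

```java
final class DescriptorCheck {
    // Returns true if s is a well-formed method descriptor, e.g. "(ILjava/lang/String;)V".
    static boolean isMethodDescriptor(String s) {
        if (s.isEmpty() || s.charAt(0) != '(') return false;
        int pos = 1;
        // Parameter types: zero or more field types up to the closing parenthesis.
        while (pos < s.length() && s.charAt(pos) != ')') {
            pos = skipFieldType(s, pos);
            if (pos < 0) return false;
        }
        if (pos >= s.length()) return false; // no closing parenthesis
        pos++;                               // skip ')'
        // Return type: either V (void) or a single field type, with nothing after it.
        if (pos < s.length() && s.charAt(pos) == 'V') return pos + 1 == s.length();
        return skipFieldType(s, pos) == s.length();
    }

    // Returns the index just past the field type starting at pos, or -1 if malformed.
    private static int skipFieldType(String s, int pos) {
        while (pos < s.length() && s.charAt(pos) == '[') pos++; // array dimensions
        if (pos >= s.length()) return -1;
        char c = s.charAt(pos);
        if ("BCDFIJSZ".indexOf(c) >= 0) return pos + 1;         // primitive type
        if (c != 'L') return -1;
        int end = s.indexOf(';', pos);
        return end > pos + 1 ? end + 1 : -1;                    // L<non-empty name>;
    }
}
```

For instance, the descriptor for a method that takes an int and a String and returns nothing is "(ILjava/lang/String;)V", which this sketch accepts, while "(X)V" is rejected because X is not a valid type code.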
In addition, the class-file verifier checks that the class itself
adheres to certain constraints placed on it by the specification of the
Java programming language. For example, the verifier enforces the rule
that all classes, except class Object, must have a
superclass. Thus, the class-file verifier checks at runtime some of the
Java language rules that should have been enforced at compile-time.
Because the verifier has no way of knowing if the class file was
generated by a benevolent, bug-free compiler, it checks each class file
to make sure the rules are followed.
The bytecode verifier
Once the class-file verifier has successfully completed the checks for
proper format and internal consistency, it turns its attention to the
bytecodes. During this part of phase one, which is commonly called the
"bytecode verifier," the Java virtual machine performs a data-flow
analysis on the streams of bytecodes that represent the methods of the
class. To understand the bytecode verifier, you need to understand a
bit about bytecodes and frames.
The bytecode streams that represent Java methods are a series of one-byte instructions, called opcodes, each of which may be followed by one or more operands. The operands supply extra data needed by the Java virtual machine to execute the opcode instruction. The activity of executing bytecodes, one opcode after another, constitutes a thread of execution inside the Java virtual machine. Each thread is awarded its own Java stack, which is made up of discrete frames. Each method invocation gets its own frame, a section of memory where it stores, among other things, local variables and intermediate results of computation. The part of the frame in which a method stores intermediate results is called the method's operand stack. An opcode and its (optional) operands may refer to the data stored on the operand stack or in the local variables of the method's frame. Thus, the virtual machine may use data on the operand stack, in the local variables, or both, in addition to any data stored as operands following an opcode when it executes the opcode.
The bytecode verifier does a great deal of checking. It checks to make sure that no matter what path of execution is taken to get to a certain opcode in the bytecode stream, the operand stack always contains the same number and types of items. It checks to make sure no local variable is accessed before it is known to contain a proper value. It checks that fields of the class are always assigned values of the proper type, and that methods of the class are always invoked with the correct number and types of arguments. The bytecode verifier also checks to make sure that each opcode is valid, that each opcode has valid operands, and that for each opcode, values of the proper type are in the local variables and on the operand stack. These are just a few of the many checks performed by the bytecode verifier, which is able, through all its checking, to verify that a stream of bytecodes is safe for the JVM to execute.
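The first of these checks, that the operand stack looks the same no matter which path reaches an instruction, can be illustrated with a much-reduced data-flow sketch. It tracks only stack depth, not types, over a toy subset of six real opcodes, and it assumes branch targets have already been confirmed to land on instruction starts. The class and method names are invented for the example; a real bytecode verifier does considerably more.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class StackDepthCheck {
    static final int ICONST_0 = 0x03, POP = 0x57, IADD = 0x60,
                     IFEQ = 0x99, GOTO = 0xA7, RETURN = 0xB1;

    // Returns true if each reachable instruction is always entered with the same
    // stack depth and no instruction pops more values than the stack holds.
    static boolean depthsAreConsistent(byte[] code) {
        if (code.length == 0) return false;
        int[] depthAt = new int[code.length];
        Arrays.fill(depthAt, -1);                 // -1 means "not visited yet"
        Deque<Integer> worklist = new ArrayDeque<>();
        depthAt[0] = 0;                           // a method starts with an empty stack
        worklist.push(0);
        while (!worklist.isEmpty()) {
            int pc = worklist.pop();
            int opcode = code[pc] & 0xFF;
            int pops, pushes, len;
            switch (opcode) {
                case ICONST_0: pops = 0; pushes = 1; len = 1; break;
                case POP:      pops = 1; pushes = 0; len = 1; break;
                case IADD:     pops = 2; pushes = 1; len = 1; break;
                case IFEQ:     pops = 1; pushes = 0; len = 3; break;
                case GOTO:     pops = 0; pushes = 0; len = 3; break;
                case RETURN:   pops = 0; pushes = 0; len = 1; break;
                default: return false;            // opcode outside the toy subset
            }
            if (pc + len > code.length) return false; // truncated instruction
            if (depthAt[pc] < pops) return false;     // stack underflow
            int after = depthAt[pc] - pops + pushes;
            if (opcode == IFEQ || opcode == GOTO) {   // propagate along the branch
                int offset = (short) (((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF));
                if (!flowTo(pc + offset, after, depthAt, worklist)) return false;
            }
            if (opcode != GOTO && opcode != RETURN) { // propagate along fall-through
                if (!flowTo(pc + len, after, depthAt, worklist)) return false;
            }
        }
        return true;
    }

    // Records the depth on first arrival at target; later arrivals must agree.
    private static boolean flowTo(int target, int depth, int[] depthAt, Deque<Integer> worklist) {
        if (target < 0 || target >= depthAt.length) return false; // leaves the method
        if (depthAt[target] == -1) {
            depthAt[target] = depth;
            worklist.push(target);
            return true;
        }
        return depthAt[target] == depth;
    }
}
```

A real verifier merges full type information for the stack and local variables at each join point in the same spirit; the sketch keeps only the depth so the propagate-and-compare structure stays visible.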
Phase two: Verification of symbolic references
Although phase one of verification likely takes place soon after the
JVM loads a class file, phase two is delayed until the bytecodes
contained in the class file actually are executed. Phase two verifies
symbolic references. A symbolic reference is a character
string that gives the name and possibly other information about the
referenced item -- enough information to uniquely identify a class,
field, or method. Thus, symbolic references to other classes give the
full name of the class. Symbolic references to the fields of other
classes give the class name, field name, and field descriptor. Symbolic
references to the methods of other classes give the class name, method
name, and method descriptor.
During phase two, the JVM follows the references from the class file being verified to the referenced class files, to make sure the references are correct. Because phase two has to look at other classes external to the class file being checked, phase two may require that new classes be loaded. Most JVM implementations will likely delay loading classes until they actually are used by the program.
If an implementation does load classes earlier -- perhaps in an attempt to speed up the loading process -- then it must still give the impression that it is loading classes as late as possible. If, for example, a Java virtual machine discovers during early loading that it can't find a certain referenced class, it doesn't throw a "class definition not found" error until (and unless) the referenced class is used for the first time by the running program.
Phase two and dynamic linking
Phase two of class-file verification is really just part of the process
of dynamic linking. When a class file is loaded, it contains symbolic
references to other classes and their fields and methods. Dynamic
linking is the process of resolving symbolic references into
direct references. As the JVM executes bytecodes and encounters an
opcode that, for the first time, uses a symbolic reference to another
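class, the virtual machine must resolve the symbolic reference. The
virtual machine performs two basic tasks during resolution: it finds the
class being referenced (loading it if necessary), and it replaces the
symbolic reference with a direct reference, such as a pointer, to the
class, field, or method.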
The virtual machine remembers the direct reference so that if it encounters the same reference again later, it can immediately use the direct reference without needing to spend time resolving the symbolic reference again.
When the Java virtual machine resolves a symbolic reference, phase two of the class-file verifier makes sure the reference is valid. If the reference is not valid -- for instance, if the class cannot be loaded or if the class exists but doesn't contain the referenced field or method -- the class-file verifier throws an error.
As an example, consider a class named Volcano. If a method
of class Volcano invokes a method in a class named
Lava, the name and descriptor of the method in
Lava are included as part of the binary data in the class
file for Volcano. So, during the course of execution when
Volcano's method first invokes Lava's method,
the JVM makes sure a method exists in class Lava that has
a name and descriptor that match those expected by class
Volcano. If the symbolic reference (class name, method
name, and descriptor) is correct, the virtual machine replaces it with
a direct reference, such as a pointer, which it will use from then on.
But if the symbolic reference from class Volcano doesn't
match any method in class Lava, phase two verification
fails, and the JVM throws a "no such method" error.
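The resolution step can be imitated in ordinary Java code with the java.lang.invoke API, as in the sketch below. This is only an analogy for what the virtual machine does internally: the class ResolutionSketch, its method, and its cache are invented for the example, and the class name is given in dotted form (as Class.forName expects) rather than the slash-separated form stored in class files. The sketch looks up a method by class name, method name, and descriptor, caches the resulting MethodHandle as its "direct reference," and throws the same kinds of errors described above when the lookup fails.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.HashMap;
import java.util.Map;

final class ResolutionSketch {
    // Cache of already-resolved references, keyed by the symbolic reference.
    private static final Map<String, MethodHandle> resolved = new HashMap<>();

    // Resolves (class name, method name, descriptor) to a MethodHandle.
    static MethodHandle resolveVirtual(String className, String methodName, String descriptor) {
        String key = className + '.' + methodName + descriptor;
        MethodHandle direct = resolved.get(key);
        if (direct != null) return direct;        // resolved earlier: reuse it
        ClassLoader loader = ResolutionSketch.class.getClassLoader();
        try {
            // Task 1: find the referenced class, loading it if necessary.
            Class<?> owner = Class.forName(className, false, loader);
            // Task 2: confirm a matching method exists and obtain a direct reference.
            MethodType type = MethodType.fromMethodDescriptorString(descriptor, loader);
            direct = MethodHandles.publicLookup().findVirtual(owner, methodName, type);
        } catch (ClassNotFoundException e) {
            throw new NoClassDefFoundError(className);
        } catch (NoSuchMethodException e) {
            throw new NoSuchMethodError(key);
        } catch (IllegalAccessException e) {
            throw new IllegalAccessError(key);
        }
        resolved.put(key, direct);
        return direct;
    }
}
```

Calling resolveVirtual("java.lang.String", "length", "()I") twice returns the same cached handle, while asking for a method that String does not declare ends in a NoSuchMethodError, much as Volcano's call would if Lava lacked the expected method.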
Binary compatibility
Phase two of the class-file verifier must look at classes
that refer to one another to make sure they are compatible because
Java programs are dynamically linked. Java compilers often will
recompile classes that depend on a class you have changed, and in so
doing, detect any incompatible changes at compile-time. But there may
be times when your compiler doesn't recompile a dependent class. For
example, if you are developing a large system, you will likely
partition the various parts of the system into packages. If you compile
each package separately, then a change to one class in a package would
likely cause a recompilation of affected classes within that same
package but not necessarily in any other package. Moreover, if you are
using someone else's packages, especially if your program downloads
class files from someone else's package across a network as it runs, it
may be impossible for you to check for compatibility at compile-time.
That's why phase two of the class-file verifier must check for
compatibility at runtime.
Incompatible changes: An example
As an example of incompatible changes, imagine you compiled class
Volcano (from the above example) with a Java compiler.
Because a method in Volcano invokes a method in another
class named Lava, the Java compiler would look for a class
file or a source file for class Lava to make sure there
was a method in Lava with the appropriate name, return
type, and number and types of arguments. If the compiler couldn't find
any Lava class, or if it encountered a Lava
class that didn't contain the desired method, the compiler would
generate an error and would not create a class file for
Volcano. Otherwise, the Java compiler would produce a
class file for Volcano that is compatible with the class
file for Lava. In other words, the Java compiler refuses to
generate a class file for Volcano that isn't
compatible with class Lava.
The converse, however, is not necessarily true. The Java compiler
conceivably could generate a class file for Lava that
isn't compatible with Volcano. If the Lava
class doesn't refer to Volcano, you could potentially
change the name of the method in Lava that
Volcano invokes, and then recompile only
Lava. If you tried to run your program using the new
version of Lava, but still using the old version of
Volcano, the JVM would, as a result of phase two class-file
verification, throw a "no such method" error when
Volcano attempted to invoke the now non-existent method in
Lava.
In this case, the change to class Lava broke binary
compatibility with the pre-existing class file for
Volcano. In practice, this situation may arise when you
update a library you have been using, and your existing code isn't
compatible with the new version of the library. To make it easier to
alter the code for libraries, the Java programming language was
designed to allow you to change a class in many ways that don't require
recompilation of classes that depend on the changed class. The changes
you are allowed to make are governed by the "rules of binary
compatibility," which are listed in the Java Language Specification
(see the Resources section for a link to this
spec). These rules clearly define what can be changed, added, or
deleted in a class without breaking binary compatibility with
pre-existing class files that depend on the changed class.
For example, it is always a binary compatible change to add a new
method to a class but never to delete a method that other classes may
be using. So in the case of Lava, you violated the rules
of binary compatibility when you changed the name of the method used by
Volcano, because, in effect, you deleted the old method and
added a new one. If you had, instead, added the new method and then
rewritten the old method so it calls the new one, that change would have
been binary compatible with any pre-existing class file that already
used Lava, including Volcano.
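A binary compatible version of that change might look like the following sketch. The method names flow(), flowFaster(), and erupt(), along with their bodies, are hypothetical; the article does not name the methods involved.

```java
// Hypothetical new version of Lava. The method names and bodies are
// invented for illustration only.
final class Lava {
    // New method added in this version. Adding a method is binary compatible.
    static int flowFaster(int meters) {
        return meters * 2;
    }

    // Old method kept so that pre-existing class files still link.
    // It now simply delegates to the new method.
    static int flow(int meters) {
        return flowFaster(meters);
    }
}

// Stand-in for the pre-existing Volcano, compiled against the old Lava.
// Its symbolic reference to Lava.flow(int) still resolves.
final class Volcano {
    static int erupt() {
        return Lava.flow(10);
    }
}
```

Had flow() been renamed rather than kept, the reference from Volcano would no longer match any method in Lava, and the first call to erupt() against the new Lava class file would fail during phase two.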
Conclusion
The class verifier contributes to the JVM's security model by ensuring
class files loaded from untrusted sources are safe for the JVM to use.
Rather than crashing upon encountering an improperly formed class file,
the JVM's class verifier rejects the malformed class file and throws an
exception. The class verifier catches problems caused by buggy
compilers, malicious crackers, or innocent binary incompatibility.
One of the more important aspects of Java's architecture is the bytecode verifier -- the mechanism that can verify the integrity of a sequence of bytecodes by performing a data-flow analysis on them. As mentioned above, all JVM implementations must verify the integrity of bytecodes in some way, but implementations are not required to use the data-flow analysis approach of the bytecode verifier. Nonetheless, enabling the verification of bytecodes up front by a data-flow analyzer was one of the primary design considerations of the JVM's instruction set. The bytecode verification approach is an attempt to achieve robustness (and security) while keeping to a minimum the trade-off in execution speed.
Next month
In next month's article, I'll complete the discussion of the JVM's
security model by describing the security manager.
About the author
Bill Venners has been writing software professionally for 12 years.
Based in Silicon Valley, he provides software consulting and training
services under the name Artima
Software Company. Over the years he has developed software for the
consumer electronics, education, semiconductor, and life insurance
industries. He has programmed in many languages on many platforms:
assembly language on various microprocessors, C on Unix, C++ on
Windows, Java on the Web. He is author of the book: Inside the Java
Virtual Machine, published by McGraw-Hill.
Reach Bill at bv@artima.com.
This article was first published under the name Security and the Class Verifier in JavaWorld, a division of Web Publishing, Inc., September 1997.