Java Virtual Machine's Internal Architecture

Chapter 5 of Inside the Java Virtual Machine
The Java Virtual Machine
by Bill Venners

<< Page 11 of 13 >>

Execution Techniques

Various execution techniques that may be used by an implementation--interpreting, just-in-time compiling, adaptive optimization, native execution in silicon--were described in Chapter 1, "Introduction to Java's Architecture." The main point to remember about execution techniques is that an implementation can use any technique to execute bytecodes so long as it adheres to the semantics of the Java virtual machine instruction set.

One of the most interesting -- and speedy -- execution techniques is adaptive optimization. The adaptive optimization technique, which is used by several existing Java virtual machine implementations, including Sun's Hotspot virtual machine, borrows from techniques used by earlier virtual machine implementations. The original JVMs interpreted bytecodes one at a time. Second-generation JVMs added a JIT compiler, which compiles each method to native code upon first execution, then executes the native code. Thereafter, whenever the method is called, the native code is executed. Adaptive optimizers, taking advantage of information available only at run-time, attempt to combine bytecode interpretation and compilation to native in the way that will yield optimum performance.

An adaptive optimizing virtual machine begins by interpreting all code, but it monitors the execution of that code. Most programs spend 80 to 90 percent of their time executing 10 to 20 percent of the code. By monitoring the program execution, the virtual machine can figure out which methods represent the program's "hot spot" -- the 10 to 20 percent of the code that is executed 80 to 90 percent of the time.

When the adaptive optimizing virtual machine decides that a particular method is in the hot spot, it fires off a background thread that compiles those bytecodes to native and heavily optimizes the native code. Meanwhile, the program can still execute that method by interpreting its bytecodes. Because the program isn't held up and because the virtual machine is only compiling and optimizing the "hot spot" (perhaps 10 to 20 percent of the code), the virtual machine has more time than a traditional JIT to perform optimizations.

The adaptive optimization approach yields a program in which the code that is executed 80 to 90 percent of the time is native code as heavily optimized as statically compiled C++, with a memory footprint not much bigger than a fully interpreted Java program. In other words, fast. An adaptive optimizing virtual machine can keep the old bytecodes around in case a method moves out of the hot spot. (The hot spot may move somewhat as the program executes.) If a method moves out of the hot spot, the virtual machine can discard the compiled code and revert back to interpreting that method's bytecodes.

As you may have noticed, an adaptive optimizer's approach to making Java programs run fast is similar to the approach programmers should take to improve a program's performance. An adaptive optimizing virtual machine, unlike a regular JIT compiling virtual machine, doesn't do "premature optimization." The adaptive optimizing virtual machine begins by interpreting bytecodes. As the program runs, the virtual machine "profiles" the program to find the program's "hot spot," that 10 to 20 percent of the code that gets executed 80 to 90 percent of the time. And like a good programmer, the adaptive optimizing virtual machine just focuses its optimization efforts on that time-critical code.

But there is a bit more to the adaptive optimization story. Adaptive optimizers can be tuned for the run- time characteristics of Java programs -- in particular, of "well- designed" Java programs. According to David Griswold, Hotspot manager at JavaSoft, "Java is a lot more object-oriented than C++. You can measure that; you can look at the rates of method invocations, dynamic dispatches, and such things. And the rates [for Java] are much higher than they are in C++." Now this high rate of method invocations and dynamic dispatches is especially true in a well-designed Java program, because one aspect of a well-designed Java program is highly factored, fine-grained design -- in other words, lots of compact, cohesive methods and compact, cohesive objects.

This run-time characteristic of Java programs, the high frequency of method invocations and dynamic dispatches, affects performance in two ways. First, there is an overhead associated with each dynamic dispatch. Second, and more significantly, method invocations reduce the effectiveness of compiler optimization.

Method invocations reduce the effectiveness of optimizers because optimizers don't perform well across method invocation boundaries. As a result, optimizers end up focusing on the code between method invocations. And the greater the method invocation frequency, the less code the optimizer has to work with between method invocations, and the less effective the optimization becomes.

The standard solution to this problem is inlining -- the copying of an invoked method's body directly into the body of the invoking method. Inlining eliminates method calls and gives the optimizer more code to work with. It makes possible more effective optimization at the cost of increasing the run- time memory footprint of the program.

The trouble is that inlining is harder with object-oriented languages, such as Java and C++, than with non-object-oriented languages, such as C, because object-oriented languages use dynamic dispatching. And the problem is worse in Java than in C++, because Java has a greater call frequency and a greater percentage of dynamic dispatches than C++.

A regular optimizing static compiler for a C program can inline straightforwardly because there is one function implementation for each function call. The trouble with doing inlining with object- oriented languages is that dynamic method dispatch means there may be multiple function (or method) implementation for any given function call. In other words, the JVM may have many different implementations of a method to choose from at run time, based on the class of the object on which the method is being invoked.

One solution to the problem of inlining a dynamically dispatched method call is to just inline all of the method implementations that may get selected at run-time. The trouble with this solution is that in cases where there are a lot of method implementations, the size of the optimized code can grow very large.

One advantage adaptive optimization has over static compilation is that, because it is happening at runtime, it can use information not available to a static compiler. For example, even though there may be 30 possible implementations that may get called for a particular method invocation, at run-time perhaps only two of them are ever called. The adaptive optimization approach enables only those two to be inlined, thereby minimizing the size of the optimized code.

Threads

The Java virtual machine specification defines a threading model that aims to facilitate implementation on a wide variety of architectures. One goal of the Java threading model is to enable implementation designers, where possible and appropriate, to use native threads. Alternatively, designers can implement a thread mechanism as part of their virtual machine implementation. One advantage to using native threads on a multi-processor host is that different threads of a Java application could run simultaneously on different processors.

One tradeoff of Java's threading model is that the specification of priorities is lowest-common- denominator. A Java thread can run at any one of ten priorities. Priority one is the lowest, and priority ten is the highest. If designers use native threads, they can map the ten Java priorities onto the native priorities however seems most appropriate. The Java virtual machine specification defines the behavior of threads at different priorities only by saying that all threads at the highest priority will get some CPU time. Threads at lower priorities are guaranteed to get CPU time only when all higher priority threads are blocked. Lower priority threads may get some CPU time when higher priority threads aren't blocked, but there are no guarantees.

The specification doesn't assume time-slicing between threads of different priorities, because not all architectures time-slice. (As used here, time-slicing means that all threads at all priorities will be guaranteed some CPU time, even when no threads are blocked.) Even among those architectures that do time-slice, the algorithms used to allot time slots to threads at various priorities can differ greatly.

As mentioned in Chapter 2, "Platform Independence," you must not rely on time-slicing for program correctness. You should use thread priorities only to give the Java virtual machine hints at what it should spend more time on. To coordinate the activities of multiple threads, you should use synchronization.

The thread implementation of any Java virtual machine must support two aspects of synchronization: object locking and thread wait and notify. Object locking helps keep threads from interfering with one another while working independently on shared data. Thread wait and notify helps threads to cooperate with one another while working together toward some common goal. Running applications access the Java virtual machine's locking capabilities via the instruction set, and its wait and notify capabilities via the wait(), notify(), and notifyAll() methods of class Object. For more details, see Chapter 20, "Thread Synchronization."

In the Java virtual machine Specification, the behavior of Java threads is defined in terms of variables, a main memory, and working memories. Each Java virtual machine instance has a main memory, which contains all the program's variables: instance variables of objects, components of arrays, and class variables. Each thread has a working memory, in which the thread stores "working copies" of variables it uses or assigns. Local variables and parameters, because they are private to individual threads, can be logically seen as part of either the working memory or main memory.

The Java virtual machine specification defines many rules that govern the low-level interactions of threads with main memory. For example, one rule states that all operations on primitive types, except in some cases longs and doubles, are atomic. For example, if two threads compete to write two different values to an int variable, even in the absence of synchronization, the variable will end up with one value or the other. The variable will not contain a corrupted value. In other words, one thread will win the competition and write its value to the variable first. The losing thread need not sulk, however, because it will write its value the variable second, overwriting the "winning" thread's value.

The exception to this rule is any long or double variable that is not declared volatile. Rather than being treated as a single atomic 64-bit value, such variables may be treated by some implementations as two atomic 32-bit values. Storing a non-volatile long to memory, for example, could involve two 32-bit write operations. This non- atomic treatment of longs and doubles means that two threads competing to write two different values to a long or double variable can legally yield a corrupted result.

Although implementation designers are not required to treat operations involving non-volatile longs and doubles atomically, the Java virtual machine specification encourages them to do so anyway. This non-atomic treatment of longs and doubles is an exception to the general rule that operations on primitive types are atomic. This exception is intended to facilitate efficient implementation of the threading model on processors that don't provide efficient ways to transfer 64-bit values to and from memory. In the future, this exception may be eliminated. For the time being, however, Java programmers must be sure to synchronize access to shared longs and doubles.

Fundamentally, the rules governing low-level thread behavior specify when a thread may and when it must:

copy values of variables from the main memory to its working memory, and
write values from its working memory back into the main memory.

For certain conditions, the rules specify a precise and predictable order of memory reads and writes. For other conditions, however, the rules do not specify any order. The rules are designed to enable Java programmers to build multi-threaded programs that exhibit predictable behavior, while giving implementation designers some flexibility. This flexibility enables designers of Java virtual machine implementations to take advantage of standard hardware and software techniques that can improve the performance of multi-threaded applications.

The fundamental high-level implication of all the low-level rules that govern the behavior of threads is this: If access to certain variables isn't synchronized, threads are allowed update those variables in main memory in any order. Without synchronization, your multi-threaded applications may exhibit surprising behavior on some Java virtual machine implementations. With proper use of synchronization, however, you can create multi-threaded Java applications that behave in a predictable way on any implementation of the Java virtual machine.

<< Page 11 of 13 >>


	Web Artima.com