Summary
All Java programs are compiled into class files that contain bytecodes, the machine language of the Java virtual machine. Here's a first look at Java's bytecodes.
Welcome to another installment of "Under The Hood." This column gives Java developers a glimpse of what is going on beneath their running Java programs. This month's article takes an initial look at the bytecode instruction set of the Java virtual machine (JVM). The article covers primitive types operated upon by bytecodes, bytecodes that convert between types, and bytecodes that operate on the stack. Subsequent articles will discuss other members of the bytecode family.
The bytecode format
Bytecodes are the machine language of the Java virtual machine. When a JVM loads a class file, it gets one stream of bytecodes for each method in the class. The bytecode streams are stored in the method area of the JVM. The bytecodes for a method are executed when that method is invoked during the course of running the program. They can be executed by interpretation, just-in-time compiling, or any other technique that was chosen by the designer of a particular JVM.
A method's bytecode stream is a sequence of instructions for the Java virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the action to take. If more information is required before the JVM can take the action, that information is encoded into one or more operands that immediately follow the opcode.
Each type of opcode has a mnemonic. In the typical assembly language style, streams of Java bytecodes can be represented by their mnemonics followed by any operand values. For example, the following stream of bytecodes can be disassembled into mnemonics:
// Bytecode stream: 03 3b 84 00 01 1a 05 68 3b a7 ff f9
// Disassembly:
iconst_0      // 03
istore_0      // 3b
iinc 0, 1     // 84 00 01
iload_0       // 1a
iconst_2      // 05
imul          // 68
istore_0      // 3b
goto -7       // a7 ff f9
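To make the opcode-plus-operands structure concrete, here is a minimal sketch of a disassembler that decodes exactly the stream above. It is deliberately incomplete: the class `MiniDisassembler` is hypothetical and knows only the seven opcodes used in this example (a real tool would need a table for the whole instruction set), and bytes are passed as unsigned `int` values for convenience. Note how `iinc` carries two 1-byte operands and `goto` carries one signed 16-bit operand stored high byte first.

```java
import java.util.ArrayList;
import java.util.List;

class MiniDisassembler {
    // Decodes only the opcodes that appear in the example stream.
    static List<String> disassemble(int[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc];
            switch (opcode) {
                case 0x03: out.add("iconst_0"); pc += 1; break;
                case 0x05: out.add("iconst_2"); pc += 1; break;
                case 0x1a: out.add("iload_0");  pc += 1; break;
                case 0x3b: out.add("istore_0"); pc += 1; break;
                case 0x68: out.add("imul");     pc += 1; break;
                case 0x84: // iinc: unsigned local variable index, then signed byte increment
                    out.add("iinc " + code[pc + 1] + ", " + (byte) code[pc + 2]);
                    pc += 3;
                    break;
                case 0xa7: // goto: signed 16-bit branch offset, high byte first
                    out.add("goto " + (short) ((code[pc + 1] << 8) | code[pc + 2]));
                    pc += 3;
                    break;
                default:
                    throw new IllegalArgumentException("unhandled opcode 0x" + Integer.toHexString(opcode));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] stream = {0x03, 0x3b, 0x84, 0x00, 0x01, 0x1a, 0x05, 0x68, 0x3b, 0xa7, 0xff, 0xf9};
        for (String line : disassemble(stream)) {
            System.out.println(line);
        }
    }
}
```

The branch offset of `goto` is relative to the `goto` itself: it sits at byte offset 9, and 9 - 7 = 2 lands on the `iinc` instruction, which is what makes the sequence loop.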
The bytecode instruction set was designed to be compact. All instructions, except two that deal with table jumping, are aligned on byte boundaries. The total number of opcodes is small enough that each opcode fits in a single byte. This helps minimize the size of class files that may travel across networks before being loaded by a JVM. It also helps keep the size of the JVM implementation small.
All computation in the JVM centers on the stack. Because the JVM has no registers for storing arbitrary values, everything must be pushed onto the stack before it can be used in a calculation. Bytecode instructions therefore operate primarily on the stack. For example, in the above bytecode sequence a local variable is multiplied by two by first pushing the local variable onto the stack with the iload_0 instruction, then pushing two onto the stack with iconst_2. After both integers have been pushed onto the stack, the imul instruction effectively pops the two integers off the stack, multiplies them, and pushes the result back onto the stack. The result is popped off the top of the stack and stored back to the local variable by the istore_0 instruction. The JVM was designed as a stack-based machine rather than a register-based machine to facilitate efficient implementation on register-poor architectures such as the Intel 486.
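The following sketch models the multiply-by-two step with an explicit operand stack and local variable array. The class and method names are hypothetical, chosen to mirror the opcode names; it is a toy model of the behavior described above, not how any particular JVM implements its stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackMultiply {
    // Operand stack of the current frame; push and pop work on the top.
    private final Deque<Integer> stack = new ArrayDeque<>();
    final int[] locals;

    StackMultiply(int[] locals) { this.locals = locals; }

    void iload(int index)  { stack.push(locals[index]); }  // local variable -> stack
    void iconst(int value) { stack.push(value); }          // constant -> stack
    void imul() {
        int value2 = stack.pop();  // the second operand is on top
        int value1 = stack.pop();
        stack.push(value1 * value2);
    }
    void istore(int index) { locals[index] = stack.pop(); } // stack -> local variable
    int depth() { return stack.size(); }

    public static void main(String[] args) {
        StackMultiply frame = new StackMultiply(new int[] {21});
        frame.iload(0);   // iload_0
        frame.iconst(2);  // iconst_2
        frame.imul();     // imul
        frame.istore(0);  // istore_0
        System.out.println(frame.locals[0] + ", stack depth " + frame.depth()); // 42, stack depth 0
    }
}
```

Every value passes through the stack on its way from one local variable slot back to the same slot, and the stack is empty again once istore_0 completes.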
Primitive types
The JVM supports seven primitive data types. Java programmers can declare and use variables of these data types, and Java bytecodes operate upon these data types. (The Java language's boolean type has no bytecodes of its own; compilers represent boolean values as ints.) The seven primitive types are listed in the following table:
| Type | Definition |
|---|---|
| byte | 1-byte signed two's complement integer |
| short | 2-byte signed two's complement integer |
| int | 4-byte signed two's complement integer |
| long | 8-byte signed two's complement integer |
| float | 4-byte IEEE 754 single-precision float |
| double | 8-byte IEEE 754 double-precision float |
| char | 2-byte unsigned Unicode character |
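The sizes in the table can be checked from Java itself. This sketch reads the `BYTES` constants of the wrapper classes (available since Java 8); the class and method names are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PrimitiveSizes {
    // Width in bytes of each primitive type in the table.
    static Map<String, Integer> widths() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("byte", Byte.BYTES);
        m.put("short", Short.BYTES);
        m.put("int", Integer.BYTES);
        m.put("long", Long.BYTES);
        m.put("float", Float.BYTES);
        m.put("double", Double.BYTES);
        m.put("char", Character.BYTES);
        return m;
    }

    public static void main(String[] args) {
        widths().forEach((type, n) -> System.out.println(type + ": " + n + " byte(s)"));
        // char is the only unsigned type in the table, so its range starts at zero.
        System.out.println("char range: " + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);
        // The signed integer types use two's complement: one more negative value than positive.
        System.out.println("byte range: " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
    }
}
```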
The primitive types appear as operands in bytecode streams. All primitive
types that occupy more than 1 byte are stored in big-endian order in
the bytecode stream, which means higher-order bytes precede lower-order
bytes. For example, to push the constant value 256 (hex 0100) onto the
stack, you would use the sipush opcode followed by a short operand. The
short appears in the bytecode stream, shown below, as "01 00"
because the JVM is big-endian. If the JVM were little-endian, the short
would appear as "00 01".
// Bytecode stream: 11 01 00
// Disassembly:
sipush 256    // 11 01 00
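The byte-order point can be demonstrated with `java.nio.ByteBuffer`, which writes multi-byte values big-endian by default. The sketch below encodes `sipush 256` under both byte orders; the sipush opcode value 0x11 (decimal 17) is the standard one, while the class and helper names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class SipushEncoding {
    static final byte SIPUSH = 0x11; // opcode value of sipush (decimal 17)

    // Encodes "sipush value" with its 2-byte operand in the requested byte order.
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(3).order(order)
                .put(SIPUSH)
                .putShort(value)
                .array();
    }

    // Formats bytes as space-separated two-digit hex, like the listings in this article.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encode((short) 256, ByteOrder.BIG_ENDIAN)));    // 11 01 00
        System.out.println(hex(encode((short) 256, ByteOrder.LITTLE_ENDIAN))); // 11 00 01
    }
}
```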
Java opcodes generally indicate the type of their operands. This allows the operands to be plain values, with no extra information identifying their type to the JVM. For example, instead of having one opcode that pushes a local variable onto the stack, the JVM has several. Opcodes iload, lload, fload, and dload push local variables of type int, long, float, and double, respectively, onto the stack.
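Because the type lives in the opcode, a compiler must pick a different load instruction for each kind of local variable. The hypothetical helper below sketches that choice. It assumes, as described later in this article, that byte and short locals are held as ints, and it treats char and boolean the same way; object references use aload.

```java
class LoadOpcodes {
    // Hypothetical helper: the load mnemonic a compiler would need for a local of this type.
    static String loadMnemonicFor(Class<?> type) {
        if (!type.isPrimitive()) return "aload";     // any object reference
        if (type == long.class) return "lload";
        if (type == float.class) return "fload";
        if (type == double.class) return "dload";
        if (type == void.class) {
            throw new IllegalArgumentException("void has no local variables");
        }
        return "iload"; // int, and the narrower byte, short, char, boolean held as int
    }

    public static void main(String[] args) {
        Class<?>[] types = {int.class, long.class, float.class, double.class, byte.class, String.class};
        for (Class<?> t : types) {
            System.out.println(t.getSimpleName() + " -> " + loadMnemonicFor(t));
        }
    }
}
```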
Pushing constants onto the stack
Many opcodes push constants onto the stack. Opcodes indicate
the constant value to push in three different ways. The constant value
is either implicit in the opcode itself, follows the opcode in the bytecode
stream as an operand, or is taken from the constant pool.
Some opcodes by themselves indicate a type and constant value to push.
For example, the iconst_1 opcode tells the JVM to push integer
value one. Such bytecodes are defined for some commonly pushed numbers
of various types. These instructions occupy only 1 byte in the bytecode
stream. They increase the efficiency of bytecode execution and reduce the
size of bytecode streams. The opcodes that push ints and floats are shown
in the following table:
| Opcode | Operand(s) | Description |
|---|---|---|
| iconst_m1 | (none) | pushes int -1 onto the stack |
| iconst_0 | (none) | pushes int 0 onto the stack |
| iconst_1 | (none) | pushes int 1 onto the stack |
| iconst_2 | (none) | pushes int 2 onto the stack |
| iconst_3 | (none) | pushes int 3 onto the stack |
| iconst_4 | (none) | pushes int 4 onto the stack |
| iconst_5 | (none) | pushes int 5 onto the stack |
| fconst_0 | (none) | pushes float 0 onto the stack |
| fconst_1 | (none) | pushes float 1 onto the stack |
| fconst_2 | (none) | pushes float 2 onto the stack |
The opcodes shown in the previous table push ints and floats, which are 32-bit values. Each slot on the Java stack is 32 bits wide. Therefore each time an int or float is pushed onto the stack, it occupies one slot.
Long and double values occupy 64 bits, so each time a long or double is pushed onto the stack, its value occupies two slots. The opcodes that push a specific long or double value are shown in the following table:
| Opcode | Operand(s) | Description |
|---|---|---|
| lconst_0 | (none) | pushes long 0 onto the stack |
| lconst_1 | (none) | pushes long 1 onto the stack |
| dconst_0 | (none) | pushes double 0 onto the stack |
| dconst_1 | (none) | pushes double 1 onto the stack |
One other opcode pushes an implicit constant value onto the stack. The
aconst_null opcode, shown in the following table, pushes a null object
reference onto the stack. The format of an object reference depends upon
the JVM implementation. An object reference will somehow
refer to a Java object on the garbage-collected heap. A null object reference
indicates an object reference variable does not currently refer to any
valid object. The aconst_null opcode is used in the process of assigning
null to an object reference variable.
| Opcode | Operand(s) | Description |
|---|---|---|
| aconst_null | (none) | pushes a null object reference onto the stack |
Two opcodes indicate the constant to push with an operand that immediately follows the opcode. These opcodes, shown in the following table, are used to push integer constants that are within the valid range for byte or short types. The byte or short that follows the opcode is expanded to an int before it is pushed onto the stack, because every slot on the Java stack is 32 bits wide. Operations on bytes and shorts that have been pushed onto the stack are actually done on their int equivalents.
| Opcode | Operand(s) | Description |
|---|---|---|
| bipush | byte1 | expands byte1 (a byte type) to an int and pushes it onto the stack |
| sipush | byte1, byte2 | expands byte1, byte2 (a short type) to an int and pushes it onto the stack |
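"Expanding" a byte or short operand to an int means sign-extending it, so negative operands stay negative. This sketch shows the arithmetic a JVM might perform on the operand bytes; the helper names are hypothetical, and operand bytes are assumed to arrive as unsigned values 0 through 255.

```java
class PushOperands {
    // bipush: one signed operand byte, sign-extended to an int.
    static int bipushValue(int byte1) {
        return (byte) byte1; // narrowing to byte, then widening back to int, sign-extends
    }

    // sipush: two operand bytes, high byte first, forming a signed short.
    static int sipushValue(int byte1, int byte2) {
        return (short) ((byte1 << 8) | byte2);
    }

    public static void main(String[] args) {
        System.out.println(bipushValue(0x7d));       // 125
        System.out.println(bipushValue(0xff));       // -1
        System.out.println(sipushValue(0x01, 0x00)); // 256
        System.out.println(sipushValue(0x80, 0x00)); // -32768
    }
}
```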
Three opcodes push constants from the constant pool. All constants associated with a class, such as the values of final variables, are stored in the class's constant pool. Opcodes that push constants from the constant pool have operands that indicate which constant to push by specifying a constant pool index. The Java virtual machine will look up the constant given the index, determine the constant's type, and push it onto the stack.
The constant pool index is an unsigned value that immediately follows the opcode in the bytecode stream. Opcodes ldc1 and ldc2 push a 32-bit item onto the stack, such as an int or float. The difference between ldc1 and ldc2 is that ldc1 can only refer to constant pool locations one through 255 because its index is just 1 byte. (Constant pool location zero is unused.) ldc2 has a 2-byte index, so it can refer to any constant pool location. ldc2w also has a 2-byte index, and it is used to refer to any constant pool location containing a long or double, which occupy 64 bits. (The final JVM specification names these three opcodes ldc, ldc_w, and ldc2_w.) The opcodes that push constants from the constant pool are shown in the following table:
| Opcode | Operand(s) | Description |
|---|---|---|
| ldc1 | indexbyte1 | pushes 32-bit constant_pool entry specified by indexbyte1 onto the stack |
| ldc2 | indexbyte1, indexbyte2 | pushes 32-bit constant_pool entry specified by indexbyte1, indexbyte2 onto the stack |
| ldc2w | indexbyte1, indexbyte2 | pushes 64-bit constant_pool entry specified by indexbyte1, indexbyte2 onto the stack |
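Putting the three strategies together, a compiler pushing an int constant can pick the shortest form that fits. The hypothetical helper below sketches that choice, which matches what javac typically emits (the Diversion listing later in this article uses `bipush 125`, for example). It returns `ldc1/ldc2` for the constant pool case because the actual opcode depends on the pool index, which this sketch does not model.

```java
class ConstantPush {
    // Hypothetical helper: the shortest instruction that pushes the given int constant.
    static String pushInstruction(int value) {
        if (value == -1) return "iconst_m1";                          // 1 byte, value implicit
        if (value >= 0 && value <= 5) return "iconst_" + value;       // 1 byte, value implicit
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;                                 // 2 bytes: opcode + byte
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;                                 // 3 bytes: opcode + short
        }
        return "ldc1/ldc2"; // value stored in the constant pool; opcode depends on its index
    }

    public static void main(String[] args) {
        int[] samples = {-1, 3, 125, -128, 256, 100000};
        for (int v : samples) {
            System.out.println(v + " -> " + pushInstruction(v));
        }
    }
}
```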
Pushing local variables onto the stack
Local variables are stored in a special section of the stack frame. The stack frame is the portion of the stack being used by the currently executing method. Each stack frame consists of three sections: the local variables, the execution environment, and the operand stack. Pushing a local variable onto the stack actually involves moving a value from the local variables section of the stack frame to the operand section. The operand section of the currently executing method is always the top of the stack, so pushing a value onto the operand section of the current stack frame is the same as pushing a value onto the top of the stack.
The Java stack is a last-in, first-out stack of 32-bit slots. Because each slot in the stack occupies 32 bits, all local variables occupy at least 32 bits. Local variables of type long and double, which are 64-bit quantities, occupy two slots on the stack. Local variables of type byte or short are stored as local variables of type int, but with a value that is valid for the smaller type. For example, an int local variable which represents a byte type will always contain a value valid for a byte (-128 <= value <= 127).
Each local variable of a method has a unique index. The local variable section of a method's stack frame can be thought of as an array of 32-bit slots, each one addressable by the array index. Local variables of type long or double, which occupy two slots, are referred to by the lower of the two slot indexes. For example, a double that occupies slots two and three would be referred to by an index of two.
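The slot-numbering rule can be sketched in a few lines. The hypothetical `assignSlots` helper below gives each variable, in declaration order, the next free index, advancing by two for long and double and by one for everything else. Real compilers may also reuse slots for variables whose scopes do not overlap, and an instance method reserves slot zero for `this`; this sketch ignores both.

```java
import java.util.Arrays;

class SlotLayout {
    // Hypothetical helper: local variable index for each type, in declaration order.
    // long and double take two 32-bit slots and are addressed by the lower index.
    static int[] assignSlots(Class<?>... types) {
        int[] indexes = new int[types.length];
        int next = 0;
        for (int i = 0; i < types.length; i++) {
            indexes[i] = next;
            next += (types[i] == long.class || types[i] == double.class) ? 2 : 1;
        }
        return indexes;
    }

    public static void main(String[] args) {
        // int, double, long, String -> [0, 1, 3, 5]
        System.out.println(Arrays.toString(assignSlots(int.class, double.class, long.class, String.class)));
        // Two ints followed by a double: the double occupies slots two and three, index 2.
        System.out.println(Arrays.toString(assignSlots(int.class, int.class, double.class)));
    }
}
```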
Several opcodes exist that push int and float local variables onto the
operand stack. Some opcodes are defined that implicitly refer to a commonly
used local variable position. For example, iload_0 loads the int
local variable at position zero. Other local variables are pushed onto
the stack by an opcode that takes the local variable index from the first
byte following the opcode. The iload instruction is an example
of this type of opcode. The first byte following iload is interpreted
as an unsigned 8-bit index that refers to a local variable.
Unsigned 8-bit local variable indexes, such as the one that follows the iload instruction, limit the number of local variables in a method to 256. A separate instruction, called wide, can extend an 8-bit index by another 8 bits. This raises the local variable limit to 65,536 (64K). The wide opcode is followed by an 8-bit operand. The wide opcode and its operand can precede an instruction, such as iload, that takes an 8-bit unsigned local variable index. The JVM combines the 8-bit operand of the wide instruction with the 8-bit operand of the iload instruction to yield a 16-bit unsigned local variable index.
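Combining the two 8-bit operands is a shift and an OR. This sketch assumes that the wide instruction's operand supplies the high-order byte, consistent with the JVM's big-endian convention; the class and method names are hypothetical.

```java
class WideIndex {
    // Assumption: the wide operand is the high byte, the following instruction's operand the low byte.
    static int wideIndex(int highByte, int lowByte) {
        return ((highByte & 0xff) << 8) | (lowByte & 0xff); // unsigned 16-bit result
    }

    public static void main(String[] args) {
        System.out.println(wideIndex(0x00, 0xff)); // 255: reachable without wide
        System.out.println(wideIndex(0x01, 0x00)); // 256: first index that needs wide
        System.out.println(wideIndex(0xff, 0xff)); // 65535: largest possible index
    }
}
```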
The opcodes that push int and float local variables onto the stack are shown in the following table:
| Opcode | Operand(s) | Description |
|---|---|---|
| iload | vindex | pushes int from local variable position vindex |
| iload_0 | (none) | pushes int from local variable position zero |
| iload_1 | (none) | pushes int from local variable position one |
| iload_2 | (none) | pushes int from local variable position two |
| iload_3 | (none) | pushes int from local variable position three |
| fload | vindex | pushes float from local variable position vindex |
| fload_0 | (none) | pushes float from local variable position zero |
| fload_1 | (none) | pushes float from local variable position one |
| fload_2 | (none) | pushes float from local variable position two |
| fload_3 | (none) | pushes float from local variable position three |
The next table shows the instructions that push local variables of type long and double onto the stack. These instructions move 64 bits from the local variable section of the stack frame to the operand section.
| Opcode | Operand(s) | Description |
|---|---|---|
| lload | vindex | pushes long from local variable positions vindex and (vindex + 1) |
| lload_0 | (none) | pushes long from local variable positions zero and one |
| lload_1 | (none) | pushes long from local variable positions one and two |
| lload_2 | (none) | pushes long from local variable positions two and three |
| lload_3 | (none) | pushes long from local variable positions three and four |
| dload | vindex | pushes double from local variable positions vindex and (vindex + 1) |
| dload_0 | (none) | pushes double from local variable positions zero and one |
| dload_1 | (none) | pushes double from local variable positions one and two |
| dload_2 | (none) | pushes double from local variable positions two and three |
| dload_3 | (none) | pushes double from local variable positions three and four |
The final group of opcodes that push local variables move 32-bit object references from the local variables section of the stack frame to the operand section. These opcodes are shown in the following table:
| Opcode | Operand(s) | Description |
|---|---|---|
| aload | vindex | pushes object reference from local variable position vindex |
| aload_0 | (none) | pushes object reference from local variable position zero |
| aload_1 | (none) | pushes object reference from local variable position one |
| aload_2 | (none) | pushes object reference from local variable position two |
| aload_3 | (none) | pushes object reference from local variable position three |
Popping to local variables
For each opcode that pushes a local variable onto the stack
there exists a corresponding opcode that pops the top of the stack back
into the local variable. The names of these opcodes can be formed by replacing
"load" in the names of the push opcodes with "store".
The opcodes that pop ints and floats from the top of the operand stack
to a local variable are listed in the following table. Each of these opcodes
moves one 32-bit value from the top of the stack to a local variable.
| Opcode | Operand(s) | Description |
|---|---|---|
| istore | vindex | pops int to local variable position vindex |
| istore_0 | (none) | pops int to local variable position zero |
| istore_1 | (none) | pops int to local variable position one |
| istore_2 | (none) | pops int to local variable position two |
| istore_3 | (none) | pops int to local variable position three |
| fstore | vindex | pops float to local variable position vindex |
| fstore_0 | (none) | pops float to local variable position zero |
| fstore_1 | (none) | pops float to local variable position one |
| fstore_2 | (none) | pops float to local variable position two |
| fstore_3 | (none) | pops float to local variable position three |
The next table shows the instructions that pop values of type long and double into a local variable. These instructions move a 64-bit value from the top of the operand stack to a local variable.
| Opcode | Operand(s) | Description |
|---|---|---|
| lstore | vindex | pops long to local variable positions vindex and (vindex + 1) |
| lstore_0 | (none) | pops long to local variable positions zero and one |
| lstore_1 | (none) | pops long to local variable positions one and two |
| lstore_2 | (none) | pops long to local variable positions two and three |
| lstore_3 | (none) | pops long to local variable positions three and four |
| dstore | vindex | pops double to local variable positions vindex and (vindex + 1) |
| dstore_0 | (none) | pops double to local variable positions zero and one |
| dstore_1 | (none) | pops double to local variable positions one and two |
| dstore_2 | (none) | pops double to local variable positions two and three |
| dstore_3 | (none) | pops double to local variable positions three and four |
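To see how a 64-bit value can live in two 32-bit slots, here is a sketch of lstore/lload and dstore/dload over an `int[]` of slots. The class is hypothetical, and placing the high half in the lower-numbered slot is this sketch's own choice: the JVM specification leaves the layout of the two halves to the implementation. Doubles reuse the long path through their raw IEEE 754 bit pattern.

```java
class TwoSlotLocals {
    // Local variable section modeled as an array of 32-bit slots.
    final int[] slots;

    TwoSlotLocals(int size) { slots = new int[size]; }

    // lstore-like: split the long across slots index and index + 1 (high half first, by assumption).
    void storeLong(int index, long value) {
        slots[index] = (int) (value >>> 32);
        slots[index + 1] = (int) value;
    }

    // lload-like: reassemble; mask the low half so it is not sign-extended.
    long loadLong(int index) {
        return ((long) slots[index] << 32) | (slots[index + 1] & 0xffffffffL);
    }

    // dstore/dload: same two slots, via the double's 64-bit IEEE 754 representation.
    void storeDouble(int index, double value) { storeLong(index, Double.doubleToRawLongBits(value)); }
    double loadDouble(int index) { return Double.longBitsToDouble(loadLong(index)); }

    public static void main(String[] args) {
        TwoSlotLocals locals = new TwoSlotLocals(4);
        locals.storeLong(0, -2L);      // occupies slots zero and one
        locals.storeDouble(2, 3.5);    // occupies slots two and three
        System.out.println(locals.loadLong(0) + " " + locals.loadDouble(2)); // -2 3.5
    }
}
```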
The final group of opcodes that pop to local variables is shown in the following table. These opcodes pop a 32-bit object reference from the top of the operand stack to a local variable.
| Opcode | Operand(s) | Description |
|---|---|---|
| astore | vindex | pops object reference to local variable position vindex |
| astore_0 | (none) | pops object reference to local variable position zero |
| astore_1 | (none) | pops object reference to local variable position one |
| astore_2 | (none) | pops object reference to local variable position two |
| astore_3 | (none) | pops object reference to local variable position three |
Type conversions
The Java virtual machine has many opcodes that convert from one primitive type to another. No operands follow the conversion opcodes in the bytecode stream; the value to convert is taken from the top of the stack. The JVM pops the value at the top of the stack, converts it, and pushes the result back onto the stack. Opcodes that convert between int, long, float, and double are shown in the following table. There is an opcode for each possible from-to combination of these four types:
| Opcode | Operand(s) | Description |
|---|---|---|
| i2l | (none) | converts int to long |
| i2f | (none) | converts int to float |
| i2d | (none) | converts int to double |
| l2i | (none) | converts long to int |
| l2f | (none) | converts long to float |
| l2d | (none) | converts long to double |
| f2i | (none) | converts float to int |
| f2l | (none) | converts float to long |
| f2d | (none) | converts float to double |
| d2i | (none) | converts double to int |
| d2l | (none) | converts double to long |
| d2f | (none) | converts double to float |
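In Java source, these opcodes come from casts and implicit widening: a cast such as `(int) d` on a double variable compiles to d2i, and `(int) (float) i` compiles to i2f followed by f2i. The hypothetical helpers below exercise a few conversions whose results can be surprising. The behaviors shown (rounding in i2f, truncation toward zero, NaN becoming 0, saturation at the int range, l2i keeping the low 32 bits) are the standard Java semantics.

```java
class Conversions {
    // i2f then f2i: float has a 24-bit significand, so large ints can lose precision.
    static int roundTripThroughFloat(int i) { return (int) (float) i; }

    // d2i: truncates toward zero, maps NaN to 0, and clamps to the int range.
    static int doubleToInt(double d) { return (int) d; }

    // l2i: keeps only the low-order 32 bits.
    static int longToInt(long l) { return (int) l; }

    public static void main(String[] args) {
        System.out.println(roundTripThroughFloat(16_777_217)); // 2^24 + 1 comes back as 16777216
        System.out.println(doubleToInt(-3.9));                 // -3
        System.out.println(doubleToInt(Double.NaN));           // 0
        System.out.println(doubleToInt(1e20));                 // 2147483647
        System.out.println(longToInt(4_294_967_297L));         // 2^32 + 1 becomes 1
    }
}
```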
Opcodes that convert from an int to a type smaller than int are shown
in the following table. No opcodes exist that convert directly from a long,
float, or double to the types smaller than int. Therefore converting from
a float to a byte, for example, would require two steps. First the float
must be converted to an int with f2i, then the resulting int can
be converted to a byte with int2byte.
| Opcode | Operand(s) | Description |
|---|---|---|
| int2byte | (none) | converts int to byte |
| int2char | (none) | converts int to char |
| int2short | (none) | converts int to short |
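The two-step float-to-byte conversion can be written out explicitly in Java. The Java language defines a direct `(byte)` cast on a float to mean the same two steps, so both forms agree. The class name is hypothetical; note that the final JVM specification names the int-to-byte opcode i2b (and int2char, int2short become i2c, i2s), which is what a modern disassembler will show.

```java
class FloatToByte {
    // The two steps the JVM must take, since no float-to-byte opcode exists.
    static byte twoStep(float f) {
        int asInt = (int) f;  // f2i: 300.7f -> 300 (truncates toward zero)
        return (byte) asInt;  // int2byte: keep the low 8 bits (0x2c) and sign-extend -> 44
    }

    public static void main(String[] args) {
        System.out.println(twoStep(300.7f));  // 44
        System.out.println((byte) 300.7f);    // 44: the direct cast means the same two steps
        System.out.println(twoStep(200f));    // -56: 200 does not fit in a byte
    }
}
```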
Although opcodes exist that convert an int to primitive types smaller than int (byte, short, and char), no opcodes exist that convert in the opposite direction. This is because any bytes, shorts, or chars are effectively converted to int before being pushed onto the stack. Arithmetic operations upon bytes, shorts, and chars are done by first converting the values to int, performing the arithmetic operations on the ints, and being happy with an int result. This means that if you add 2 bytes you get an int, and if you want a byte result you must explicitly convert the int result back to a byte. For example, the following code won't compile:
class BadArithmetic {
    byte addOneAndOne() {
        byte a = 1;
        byte b = 1;
        byte c = a + b;
        return c;
    }
}
When presented with the above code, javac objects with the following remark:
BadArithmetic.java(7): Incompatible type for declaration. Explicit cast needed to convert int to byte.
byte c = a + b;
^
To remedy the situation, the Java programmer must explicitly convert the int result of the addition of a + b back to a byte, as in the following code:
class GoodArithmetic {
    byte addOneAndOne() {
        byte a = 1;
        byte b = 1;
        byte c = (byte) (a + b);
        return c;
    }
}
This makes javac so happy it drops a GoodArithmetic.class file, which contains the following bytecode sequence for the addOneAndOne() method:
iconst_1 // Push int constant 1.
istore_1 // Pop into local variable 1, which is a: byte a = 1;
iconst_1 // Push int constant 1 again.
istore_2 // Pop into local variable 2, which is b: byte b = 1;
iload_1 // Push a (a is already stored as an int in local variable 1).
iload_2 // Push b (b is already stored as an int in local variable 2).
iadd // Perform addition. Top of stack is now (a + b), an int.
int2byte // Convert int result to byte (result still occupies 32 bits).
istore_3 // Pop into local variable 3, which is byte c: byte c = (byte) (a + b);
iload_3 // Push the value of c so it can be returned.
ireturn // Proudly return the result of the addition: return c;
Conversion diversion: a JVM simulation
The Conversion Diversion applet demonstrates a JVM executing a sequence of bytecodes. The bytecode sequence in the simulation was generated by javac for the Convert() method of the class shown below:
class Diversion {
    static void Convert() {
        byte imByte = 0;
        int imInt = 125;
        while (true) {
            ++imInt;
            imByte = (byte) imInt;
            imInt *= -1;
            imByte = (byte) imInt;
            imInt *= -1;
        }
    }
}
The actual bytecodes generated by javac for Convert() are shown
below:
iconst_0 // Push int constant 0.
istore_0 // Pop to local variable 0, which is imByte: byte imByte = 0;
bipush 125 // Expand byte constant 125 to int and push.
istore_1 // Pop to local variable 1, which is imInt: int imInt = 125;
iinc 1 1 // Increment local variable 1 (imInt) by 1: ++imInt;
iload_1 // Push local variable 1 (imInt).
int2byte // Truncate and sign extend top of stack so it has valid byte value.
istore_0 // Pop to local variable 0 (imByte): imByte = (byte) imInt;
iload_1 // Push local variable 1 (imInt) again.
iconst_m1 // Push integer -1.
imul // Pop top two ints, multiply, push result.
istore_1 // Pop result of multiply to local variable 1 (imInt): imInt *= -1;
iload_1 // Push local variable 1 (imInt).
int2byte // Truncate and sign extend top of stack so it has valid byte value.
istore_0 // Pop to local variable 0 (imByte): imByte = (byte) imInt;
iload_1 // Push local variable 1 (imInt) again.
iconst_m1 // Push integer -1.
imul // Pop top two ints, multiply, push result.
istore_1 // Pop result of multiply to local variable 1 (imInt): imInt *= -1;
goto 5 // Jump back to the iinc instruction: while (true) {}
The Convert() method demonstrates the manner in which the JVM converts
from int to byte. imInt starts out as 125. Each pass through the
while loop, it is incremented and converted to a byte. Then it is multiplied
by -1 and again converted to a byte. The simulation quickly shows what
happens at the edges of the valid range for the byte type.
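Without the applet, the same behavior can be traced by running the body of Convert() a fixed number of times instead of forever. This sketch is a hypothetical harness, not part of the original class; it records every value assigned to imByte.

```java
import java.util.ArrayList;
import java.util.List;

class DiversionTrace {
    // Runs the loop body of Convert() the given number of times and records each imByte value.
    static List<Byte> trace(int iterations) {
        List<Byte> bytes = new ArrayList<>();
        int imInt = 125;
        for (int i = 0; i < iterations; i++) {
            ++imInt;
            bytes.add((byte) imInt);  // imByte = (byte) imInt;
            imInt *= -1;
            bytes.add((byte) imInt);  // imByte = (byte) imInt;
            imInt *= -1;
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(trace(5)); // [126, -126, 127, -127, -128, -128, -127, 127, -126, 126]
    }
}
```

The third pass is where imInt first leaves the byte range: 128 becomes -128, and on the following pass 129 and -129 become -127 and 127.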
The maximum value for a byte is 127. The minimum value is -128. Values of type int that are within this range convert directly to byte. However, as soon as the int gets beyond the valid range for byte, things get interesting.
The JVM converts an int to a byte by truncating and sign extending. The highest-order bit, the "sign bit," of longs, ints, shorts, and bytes indicates whether the integer value is positive or negative. If the sign bit is zero, the value is positive. If the sign bit is one, the value is negative. Bit 7 of a byte value is its sign bit. To convert an int to a byte, bit 7 of the int is copied to bits 8 through 31. This produces an int that has the same numerical value that the int's lowest-order byte would have if it were interpreted as a byte type. After the truncation and sign extension, the int will contain a valid byte value.
The simulation applet shows what happens when an int that is just beyond the valid range for byte types gets converted to a byte. For example, when the imInt variable has a value of 128 (0x00000080) and is converted to byte, the resulting byte value is -128 (0xffffff80). Later, when the imInt variable has a value of -129 (0xffffff7f) and is converted to byte, the resulting byte value is 127 (0x0000007f).
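The truncate-and-sign-extend step can be written with two shifts: shifting left by 24 moves bit 7 into the sign position, and the arithmetic right shift (`>>`) copies it back down through bits 8 to 31. This is a sketch of the bit manipulation described above (the method name is hypothetical); its result always equals Java's `(byte)` cast.

```java
class Int2Byte {
    // Keep the low 8 bits and copy bit 7 into bits 8 through 31.
    static int int2byte(int value) {
        return (value << 24) >> 24;
    }

    public static void main(String[] args) {
        System.out.printf("%d -> %d (0x%08x)%n", 128, int2byte(128), int2byte(128));    // 128 -> -128 (0xffffff80)
        System.out.printf("%d -> %d (0x%08x)%n", -129, int2byte(-129), int2byte(-129)); // -129 -> 127 (0x0000007f)
    }
}
```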
To drive the simulation, just press the "Step" button. Each press of the "Step" button will cause the JVM to execute one bytecode instruction. To start the simulation over, press the "Reset" button. There is a text area at the bottom of the applet that describes the next instruction to be executed. Happy clicking.
To view the Conversion Diversion applet, visit the interactive illustrations of Inside the Java Virtual Machine at:
http://www.artima.com/insidejvm/applets/ConversionDiversion.html
About the author
Bill Venners provides custom software development and consulting services in Silicon Valley under the name Artima Software Company. He has been object oriented for five years, primarily working in C++ on MS Windows. Before that he did a lot of C on Unix and assembly language on various microprocessors. He is currently focused on Java.
This article was first published under the name Under the Hood: The Lean, Mean Virtual Machine in JavaWorld, a division of Web Publishing, Inc., September 1996.