Objects and Arrays

Java's Bytecodes that Deal with Objects and Arrays

by Bill Venners
December 15, 1996

First published in JavaWorld, December 1996
Summary
All Java programs are compiled into class files that contain bytecodes, the machine language of the Java virtual machine. This article takes a look at the bytecodes that manipulate objects and arrays.

Welcome to another edition of Under The Hood. This column focuses on Java's underlying technologies. It aims to give developers a glimpse of the mechanisms that make their Java programs run. This month's article takes a look at the bytecodes that deal with objects and arrays.

Object-oriented machine
The Java virtual machine (JVM) works with data in three forms: objects, object references, and primitive types. Objects reside on the garbage-collected heap. Object references and primitive types reside either on the Java stack as local variables, on the heap as instance variables of objects, or in the method area as class variables.

In the Java virtual machine, memory is allocated on the garbage-collected heap only as objects. There is no way to allocate memory for a primitive type on the heap, except as part of an object. If you want to use a primitive type where an Object reference is needed, you can allocate a wrapper object for the type from the java.lang package. For example, there is an Integer class that wraps an int type with an object. Only object references and primitive types can reside on the Java stack as local variables. Objects can never reside on the Java stack.
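
As a small illustration of the wrapper idea, the sketch below stores an int in a place that requires an Object reference and then takes it back out. The class name WrapperDemo and its method names are made up for this example.

```java
class WrapperDemo {

    // Wraps a primitive int in an Integer object so that it can be
    // referred to through a reference of type Object.
    static Object wrap(int value) {
        return Integer.valueOf(value);
    }

    // Recovers the primitive int from the wrapper object.
    static int unwrap(Object wrapped) {
        return ((Integer) wrapped).intValue();
    }

    public static void main(String[] args) {
        Object[] slots = new Object[1];
        slots[0] = wrap(42);           // the int now lives on the heap, inside an Integer
        int local = unwrap(slots[0]);  // the int is copied back into a local variable
        System.out.println(local);
    }
}
```

The Integer object is on the heap; only the reference to it and the primitive int named local are local variables.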

The architectural separation of objects and primitive types in the JVM is reflected in the Java programming language, in which objects cannot be declared as local variables. Only object references can be declared as such. Upon declaration, an object reference refers to nothing. Only after the reference has been explicitly initialized -- either with a reference to an existing object or with a call to new -- does the reference refer to an actual object.

In the JVM instruction set, all objects are instantiated and accessed with the same set of opcodes, except for arrays. In Java, arrays are full-fledged objects, and, like any other object in a Java program, are created dynamically. Array references can be used anywhere a reference to type Object is called for, and any method of Object can be invoked on an array. Yet, in the Java virtual machine, arrays are handled with special bytecodes.
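
The point that arrays are ordinary objects at the language level can be checked with a few lines of Java. This is only a sketch; the class name ArrayAsObjectDemo is hypothetical.

```java
class ArrayAsObjectDemo {

    // Returns the run-time class name of an int array, obtained
    // through a plain Object reference.
    static String classNameOfIntArray() {
        Object o = new int[3];           // an array reference used as an Object
        return o.getClass().getName();   // a method of Object invoked on an array
    }

    public static void main(String[] args) {
        int[] numbers = new int[3];
        Object o = numbers;
        System.out.println(classNameOfIntArray()); // prints "[I"
        System.out.println(o.equals(numbers));     // prints "true" (same object)
    }
}
```

The name "[I" is the JVM's own notation for a one-dimensional array of ints, the same notation that shows up later in the constant pool entry [[[I.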

As with any other object, arrays cannot be declared as local variables; only array references can. Array objects themselves always contain either an array of primitive types or an array of object references. If you declare an array of objects, you get an array of object references. The objects themselves must be explicitly created with new and assigned to the elements of the array.
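
The difference between an array of references and the objects themselves is easy to see in code. The following sketch, with the hypothetical class name ReferenceArrayDemo, first allocates the array and only then creates the objects.

```java
class ReferenceArrayDemo {

    // Reports whether every element of a freshly allocated array of
    // object references is null.
    static boolean freshArrayHoldsOnlyNulls(int length) {
        StringBuffer[] buffers = new StringBuffer[length]; // references only, no objects yet
        for (int i = 0; i < buffers.length; ++i) {
            if (buffers[i] != null) {
                return false;
            }
        }
        return true;
    }

    // Allocates the array and then explicitly creates each object.
    static StringBuffer[] makeBuffers(int length) {
        StringBuffer[] buffers = new StringBuffer[length];
        for (int i = 0; i < buffers.length; ++i) {
            buffers[i] = new StringBuffer();
        }
        return buffers;
    }

    public static void main(String[] args) {
        System.out.println(freshArrayHoldsOnlyNulls(3));
        System.out.println(makeBuffers(3)[0] != null);
    }
}
```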

Opcodes for objects
Instantiation of new objects is accomplished via the new opcode. Two one-byte operands follow the new opcode. These two bytes are combined to form a 16-bit index into the constant pool. The constant pool entry at that index gives information about the class of the new object. The JVM creates a new instance of that class on the heap and pushes the reference to the new object onto the stack, as shown below.

Object creation
Opcode | Operand(s) | Description
new | indexbyte1, indexbyte2 | creates a new object on the heap, pushes reference
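
Combining the two operand bytes into a constant pool index is a matter of shifting and masking. The sketch below shows the arithmetic; the class and method names are hypothetical, and the same calculation applies to every opcode in this article that takes indexbyte1 and indexbyte2.

```java
class ConstantPoolIndex {

    // Combines the two one-byte operands into an unsigned 16-bit index:
    // (indexbyte1 << 8) | indexbyte2. Java bytes are signed, so each
    // byte is masked with 0xFF before it is used.
    static int index(byte indexbyte1, byte indexbyte2) {
        return ((indexbyte1 & 0xFF) << 8) | (indexbyte2 & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(index((byte) 0x00, (byte) 0x02)); // prints 2
        System.out.println(index((byte) 0x01, (byte) 0x00)); // prints 256
    }
}
```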

The next table shows the opcodes that put and get object fields. These opcodes, putfield and getfield, operate only on fields that are instance variables. Static variables are accessed by putstatic and getstatic, which are described later. The putfield and getfield instructions each take two one-byte operands. The operands are combined to form a 16-bit index into the constant pool. The constant pool item at that index contains information about the type, size, and offset of the field. The object reference is taken from the stack in both the putfield and getfield instructions. The putfield instruction takes the instance variable value from the stack, and the getfield instruction pushes the retrieved instance variable value onto the stack.

Accessing instance variables
Opcode | Operand(s) | Description
putfield | indexbyte1, indexbyte2 | set field, indicated by index, of object to value (both taken from stack)
getfield | indexbyte1, indexbyte2 | pushes field, indicated by index, of object (taken from stack)
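
To connect these opcodes to source code, here is a minimal sketch of a class with one instance variable. The class name FieldDemo is hypothetical, and the opcodes named in the comments are what a typical javac is expected to emit, not a guarantee; running javap -c on the compiled class shows the actual instructions.

```java
class FieldDemo {

    int x; // an instance variable

    void setX(int value) {
        // Expected: push the object reference (this), push value, then putfield.
        this.x = value;
    }

    int getX() {
        // Expected: push the object reference (this), then getfield.
        return this.x;
    }

    public static void main(String[] args) {
        FieldDemo p = new FieldDemo(); // expected to begin with the new opcode
        p.setX(7);
        System.out.println(p.getX());
    }
}
```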

Class variables are accessed via the getstatic and putstatic opcodes, as shown in the table below. Both getstatic and putstatic take two one-byte operands, which are combined by the JVM to form a 16-bit unsigned index into the constant pool. The constant pool item at that index gives information about one static field of a class. Because there is no particular object associated with a static field, there is no object reference used by either getstatic or putstatic. The putstatic instruction takes the value to assign from the stack. The getstatic instruction pushes the retrieved value onto the stack.

Accessing class variables
Opcode | Operand(s) | Description
putstatic | indexbyte1, indexbyte2 | set static field, indicated by index, to value (taken from stack)
getstatic | indexbyte1, indexbyte2 | pushes value of static field, indicated by index
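
A class variable can be exercised the same way. In the sketch below (the class name CounterDemo is hypothetical), no object reference is involved in either access; the opcode names in the comments describe what javac is expected to generate.

```java
class CounterDemo {

    static int count; // a class variable

    static void increment() {
        // Expected: getstatic pushes count, 1 is added, putstatic stores the result.
        count = count + 1;
    }

    public static void main(String[] args) {
        increment();
        increment();
        System.out.println(count);
    }
}
```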

The following opcodes check to see whether the object reference on the top of the stack refers to an instance of the class or interface indexed by the operands following the opcode. The checkcast instruction throws ClassCastException if the object is not an instance of the specified class or interface. Otherwise, checkcast does nothing. The object reference remains on the stack and execution is continued at the next instruction. This instruction ensures that casts are safe at run time and forms part of the JVM's security blanket.

The instanceof instruction pops the object reference from the top of the stack and pushes true or false, represented as the int 1 or 0. If the object is indeed an instance of the specified class or interface, then true is pushed onto the stack; otherwise, false is pushed. The instanceof instruction is used to implement the instanceof keyword of Java, which allows programmers to test whether an object is an instance of a particular class or interface.

Type checking
Opcode | Operand(s) | Description
checkcast | indexbyte1, indexbyte2 | Throws ClassCastException if objectref on stack cannot be cast to class at index
instanceof | indexbyte1, indexbyte2 | Pushes true if objectref on stack is an instanceof class at index, else pushes false
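
Both instructions have direct counterparts in Java source. The sketch below uses the hypothetical class name TypeCheckDemo: the instanceof keyword is expected to compile to the instanceof opcode, and the cast to the checkcast opcode.

```java
class TypeCheckDemo {

    // Uses the instanceof keyword; yields false for null.
    static boolean isString(Object o) {
        return o instanceof String;
    }

    // Attempts a cast to String and reports whether it failed with
    // ClassCastException. A null reference passes the cast.
    static boolean castToStringFails(Object o) {
        try {
            String s = (String) o;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isString("hello"));                     // prints "true"
        System.out.println(castToStringFails(Integer.valueOf(1))); // prints "true"
    }
}
```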

Opcodes for arrays

Instantiation of new arrays is accomplished via the newarray, anewarray, and multianewarray opcodes. The newarray opcode is used to create arrays of primitive types, as opposed to arrays of object references. The particular primitive type is specified by a single one-byte operand following the newarray opcode. The newarray instruction can create arrays of byte, short, char, int, long, float, double, or boolean.

The anewarray instruction creates an array of object references. Two one-byte operands follow the anewarray opcode and are combined to form a 16-bit index into the constant pool. A description of the class of object for which the array is to be created is found in the constant pool at the specified index. This instruction allocates space for the array of object references and initializes the references to null.

The multianewarray instruction is used to allocate multidimensional arrays. Multidimensional arrays are simply arrays of arrays, so they could also be allocated with repeated use of the anewarray and newarray instructions. The multianewarray instruction simply compresses the bytecodes needed to create multidimensional arrays into one instruction. Two one-byte operands follow the multianewarray opcode and are combined to form a 16-bit index into the constant pool. A description of the class of object for which the array is to be created is found in the constant pool at the specified index. Immediately following the two one-byte operands that form the constant pool index is a one-byte operand that specifies the number of dimensions in this multidimensional array. The sizes for each dimension are popped off the stack. This instruction allocates space for all arrays that are needed to implement the multidimensional array.

Creating new arrays
Opcode | Operand(s) | Description
newarray | atype | pops length, allocates new array of primitive types of type indicated by atype, pushes objectref of new array
anewarray | indexbyte1, indexbyte2 | pops length, allocates a new array of objects of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
multianewarray | indexbyte1, indexbyte2, dimensions | pops dimensions number of array lengths, allocates a new multidimensional array of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
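
The three kinds of array creation look like this in Java source. This is a sketch with the hypothetical class name ArrayCreationDemo; the opcode named in each comment is the one javac is expected to choose. The last method builds a two-dimensional array by hand, which is the repeated use of anewarray and newarray that multianewarray compresses into one instruction.

```java
class ArrayCreationDemo {

    static int[] primitives(int n) {
        return new int[n];           // expected: newarray with atype int
    }

    static String[] references(int n) {
        return new String[n];        // expected: anewarray, elements start as null
    }

    static int[][] grid(int rows, int cols) {
        return new int[rows][cols];  // expected: multianewarray with 2 dimensions
    }

    // The same shape as grid(), allocated one array at a time.
    static int[][] gridByHand(int rows, int cols) {
        int[][] g = new int[rows][];     // an array of references to int arrays
        for (int i = 0; i < rows; ++i) {
            g[i] = new int[cols];        // one int array per row
        }
        return g;
    }

    public static void main(String[] args) {
        System.out.println(grid(2, 3)[1].length);
        System.out.println(gridByHand(2, 3)[1].length);
    }
}
```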

The next table shows the instruction that pops an array reference off the top of the stack and pushes the length of that array.

Getting the array length
Opcode | Operand(s) | Description
arraylength | (none) | pops objectref of an array, pushes length of that array

The following opcodes retrieve an element from an array. The array index and array reference are popped from the stack, and the value at the specified index of the specified array is pushed back onto the stack.

Retrieving an array element
Opcode | Operand(s) | Description
baload | (none) | pops index and arrayref of an array of bytes, pushes arrayref[index]
caload | (none) | pops index and arrayref of an array of chars, pushes arrayref[index]
saload | (none) | pops index and arrayref of an array of shorts, pushes arrayref[index]
iaload | (none) | pops index and arrayref of an array of ints, pushes arrayref[index]
laload | (none) | pops index and arrayref of an array of longs, pushes arrayref[index]
faload | (none) | pops index and arrayref of an array of floats, pushes arrayref[index]
daload | (none) | pops index and arrayref of an array of doubles, pushes arrayref[index]
aaload | (none) | pops index and arrayref of an array of objectrefs, pushes arrayref[index]

The next table shows the opcodes that store a value into an array element. The value, index, and array reference are popped from the top of the stack.

Storing to an array element
Opcode | Operand(s) | Description
bastore | (none) | pops value, index, and arrayref of an array of bytes, assigns arrayref[index] = value
castore | (none) | pops value, index, and arrayref of an array of chars, assigns arrayref[index] = value
sastore | (none) | pops value, index, and arrayref of an array of shorts, assigns arrayref[index] = value
iastore | (none) | pops value, index, and arrayref of an array of ints, assigns arrayref[index] = value
lastore | (none) | pops value, index, and arrayref of an array of longs, assigns arrayref[index] = value
fastore | (none) | pops value, index, and arrayref of an array of floats, assigns arrayref[index] = value
dastore | (none) | pops value, index, and arrayref of an array of doubles, assigns arrayref[index] = value
aastore | (none) | pops value, index, and arrayref of an array of objectrefs, assigns arrayref[index] = value
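
The stack discipline of the load and store opcodes can be mimicked with a toy operand stack. The sketch below is not how a real JVM is implemented; it models only iastore and iaload, represents stack entries as Java objects for simplicity, and uses the hypothetical class name ArrayOpcodeSketch. Note the order of the pops: the value is on top, then the index, then the array reference.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ArrayOpcodeSketch {

    // A toy operand stack. A real JVM stack holds typed words, not Objects.
    private final Deque<Object> stack = new ArrayDeque<>();

    void push(Object value) {
        stack.push(value);
    }

    Object pop() {
        return stack.pop();
    }

    // iastore: pops value, index, and arrayref; assigns arrayref[index] = value.
    void iastore() {
        int value = (Integer) stack.pop();
        int index = (Integer) stack.pop();
        int[] arrayref = (int[]) stack.pop();
        arrayref[index] = value;
    }

    // iaload: pops index and arrayref; pushes arrayref[index].
    void iaload() {
        int index = (Integer) stack.pop();
        int[] arrayref = (int[]) stack.pop();
        stack.push(arrayref[index]);
    }

    public static void main(String[] args) {
        ArrayOpcodeSketch vm = new ArrayOpcodeSketch();
        int[] numbers = new int[3];

        vm.push(numbers); // arrayref
        vm.push(1);       // index
        vm.push(99);      // value
        vm.iastore();     // numbers[1] = 99

        vm.push(numbers); // arrayref
        vm.push(1);       // index
        vm.iaload();      // pushes numbers[1]
        System.out.println(vm.pop());
    }
}
```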

Three-dimensional array: a Java virtual machine simulation

The applet below demonstrates a Java virtual machine executing a sequence of bytecodes. The bytecode sequence in the simulation was generated by javac for the initAnArray() method of the class shown below:

class ArrayDemo {

    static void initAnArray() {

        int[][][] threeD = new int[5][4][3];

        for (int i = 0; i < 5; ++i) {
            for (int j = 0; j < 4; ++j) {
                for (int k = 0; k < 3; ++k) {
                    threeD[i][j][k] = i + j + k;
                }
            }
        }
    }
}

The bytecodes generated by javac for initAnArray() are shown below:

   0 iconst_5 // Push constant int 5.
   1 iconst_4 // Push constant int 4.
   2 iconst_3 // Push constant int 3.
                          // Create a new multi-dimensional array using constant pool
                          // entry #2 as the class (which is [[[I, a 3D array of ints)
                          // with 3 dimensions.
   3 multianewarray #2 dim #3 <Class [[[I>
   7 astore_0 // Pop object ref into local variable 0: int threeD[][][] = new int[5][4][3];
   8 iconst_0 // Push constant int 0.
   9 istore_1 // Pop int into local variable 1: int i = 0;
  10 goto 54 // Go to section of code that tests outer loop.
  13 iconst_0 // Push constant int 0.
  14 istore_2 // Pop int into local variable 2: int j = 0;
  15 goto 46 // Go to section of code that tests middle loop.
  18 iconst_0 // Push constant int 0.
  19 istore_3 // Pop int into local variable 3: int k = 0;
  20 goto 38 // Go to section of code that tests inner loop.
  23 aload_0 // Push object ref from local variable 0.
  24 iload_1 // Push int from local variable 1 (i).
  25 aaload // Pop index and arrayref, push object ref at arrayref[index] (gets threeD[i]).
  26 iload_2 // Push int from local variable 2 (j).
  27 aaload // Pop index and arrayref, push object ref at arrayref[index] (gets threeD[i][j]).
  28 iload_3 // Push int from local variable 3 (k).
                          // Now calculate the int that will be assigned to threeD[i][j][k]
  29 iload_1 // Push int from local variable 1 (i).
  30 iload_2 // Push int from local variable 2 (j).
  31 iadd // Pop two ints, add them, push int result (i + j).
  32 iload_3 // Push int from local variable 3 (k).
  33 iadd // Pop two ints, add them, push int result (i + j + k).
  34 iastore // Pop value, index, and arrayref; assign arrayref[index] = value: threeD[i][j][k] = i + j + k;
  35 iinc 3 1 // Increment by 1 the int in local variable 3: ++k;
  38 iload_3 // Push int from local variable 3 (k).
  39 iconst_3 // Push constant int 3.
  40 if_icmplt 23 // Pop right and left ints, jump if left < right: for (...; k < 3;...)
  43 iinc 2 1 // Increment by 1 the int in local variable 2: ++j;
  46 iload_2 // Push int from local variable 2 (j).
  47 iconst_4 // Push constant int 4.
  48 if_icmplt 18 // Pop right and left ints, jump if left < right: for (...; j < 4;...)
  51 iinc 1 1 // Increment by 1 the int in local variable 1: ++i;
  54 iload_1 // Push int from local variable 1 (i).
  55 iconst_5 // Push constant int 5.
  56 if_icmplt 13 // Pop right and left ints, jump if left < right: for (...; i < 5;...)
  59 return

The initAnArray() method merely allocates and initializes a three-dimensional array. This simulation demonstrates how the Java virtual machine handles multidimensional arrays. In response to the multianewarray instruction, which in this example requests the allocation of a three-dimensional array, the JVM creates a tree of one-dimensional arrays. The reference returned by the multianewarray instruction refers to the base one-dimensional array in the tree. In the initAnArray() method, the base array has five components -- threeD[0] through threeD[4]. Each component of the base array is itself a reference to a one-dimensional array of four components, accessed by threeD[0][0] through threeD[4][3]. The components of these five arrays are also references to arrays, each of which has three components. These components are ints, the elements of this multidimensional array, and they are accessed by threeD[0][0][0] through threeD[4][3][2].

In response to the multianewarray instruction in the initAnArray() method, the Java virtual machine creates one one-dimensional array of five array references, five one-dimensional arrays of four array references each, and twenty one-dimensional arrays of three ints each. The JVM allocates these 26 arrays on the heap, initializes their components such that they form a tree, and returns the reference to the base array.
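
The count of 26 can be confirmed by walking the tree. The following sketch (class and method names are hypothetical) counts the base array, each branch array, and each leaf array of a three-dimensional int array.

```java
class ArrayTreeDemo {

    // Counts the one-dimensional arrays that make up a
    // three-dimensional array of ints.
    static int countArrays(int[][][] base) {
        int count = 1;                  // the base array
        for (int[][] branch : base) {
            count += 1;                 // one branch array
            count += branch.length;     // its leaf arrays of ints
        }
        return count;
    }

    public static void main(String[] args) {
        // 1 base + 5 branches + 20 leaves
        System.out.println(countArrays(new int[5][4][3])); // prints 26
    }
}
```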

To assign an int value to an element of the three-dimensional array, the JVM uses aaload to get a component of the base array. Then the JVM uses aaload again on this component -- which is itself an array of arrays -- to get a component of the branch array. This component is a reference to a leaf array of ints. Finally the JVM uses iastore to assign an int value to the element of the leaf array. The JVM uses multiple one-dimensional array accesses to accomplish operations on multidimensional arrays.
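
Written out one step at a time in Java, the assignment threeD[i][j][k] = value looks like the sketch below. The class name StepwiseStoreDemo and the variable names branch and leaf are chosen here for illustration; the comments name the opcode from the listing above that corresponds to each step.

```java
class StepwiseStoreDemo {

    // Performs threeD[i][j][k] = value as three one-dimensional array accesses.
    static void store(int[][][] threeD, int i, int j, int k, int value) {
        int[][] branch = threeD[i]; // aaload: a component of the base array
        int[] leaf = branch[j];     // aaload: a component of the branch array
        leaf[k] = value;            // iastore: an element of the leaf array
    }

    public static void main(String[] args) {
        int[][][] threeD = new int[5][4][3];
        store(threeD, 4, 3, 2, 4 + 3 + 2);
        System.out.println(threeD[4][3][2]); // prints 9
    }
}
```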

To drive the simulation, just press the Step button. Each press of this button will cause the Java virtual machine to execute one bytecode instruction. To start the simulation over, press the Reset button. To cause the JVM to repeatedly execute bytecodes with no further coaxing on your part, press the Run button. The JVM will then execute the bytecodes until the Stop button is pressed. The return instruction in the bytecode sequence generated by javac has been replaced by a breakpoint instruction in the simulation's bytecode sequence. In this case, the breakpoint instruction just causes the simulator to stop. The text area at the bottom of the applet describes the next instruction to be executed. Happy clicking.

To view the Three Dimensional Array applet, visit the interactive illustrations of Inside the Java Virtual Machine at:

http://www.artima.com/insidejvm/applets/ThreeDArray.html

Resources

This article was first published under the name Objects and arrays in JavaWorld, a division of Web Publishing, Inc., December 1996.

About the author

Bill Venners has been writing software professionally for 12 years. Based in Silicon Valley, he provides software consulting and training services under the name Artima Software Company. Over the years he has developed software for the consumer electronics, education, semiconductor, and life insurance industries. He has programmed in many languages on many platforms: assembly language on various microprocessors, C on Unix, C++ on Windows, Java on the Web. He is the author of the book Inside the Java Virtual Machine, published by McGraw-Hill.