The Class File Lifestyle

The Structure and Lifestyle of the Java Class File

by Bill Venners
June 15, 1996

First published in JavaWorld, June 1996
Summary
A key component of Java is the Java class file -- a precisely defined file format to which Java programs are compiled. The class file can be loaded by any Java Virtual Machine implementation and is the vehicle for the transmission of compiled Java across networks. Here's a hands-on introduction to the Java class file.

Welcome to another installment of "Under the Hood." In last month's article I discussed the Java Virtual Machine, or JVM, the abstract computer for which all Java programs are compiled. If you are unfamiliar with the JVM, you may want to read last month's article before this one. In this article I provide a glimpse into the basic structure and lifestyle of the Java class file.

Born to travel
The Java class file is a precisely defined format for compiled Java. Java source code is compiled into class files that can be loaded and executed by any JVM. The class files may travel across a network before being loaded by the JVM.

In fact, if you are reading this article via a Java-capable browser, class files for the simulation applet at the end of the article are flying across the Internet to your computer right now. If you'd like to listen in on them (and your computer has audio capability), push the following button:

Sounds like they're having fun, huh? That's in their nature. Java class files were designed to travel well. They are platform-independent, so they will be welcome in more places. They contain bytecodes, the compact instruction set for the JVM, so they can travel light. Java class files are constantly zipping through networks at breakneck speed to arrive at JVMs all over the world.

What's in a class file?
The Java class file contains everything a JVM needs to know about one Java class or interface. In their order of appearance in the class file, the major components are: magic, version, constant pool, access flags, this class, super class, interfaces, fields, methods, and attributes.

Information stored in the class file often varies in length; that is, the actual length of the information cannot be predicted before the class file is loaded. For instance, the number of methods listed in the methods component can differ among class files, because it depends on the number of methods defined in the source code. Such information is stored by prefacing it with its size or length. This way, when the JVM loads the class, it reads the size of the variable-length information first. Once the JVM knows the size, it can correctly read in the actual information.
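To make the size-first idea concrete, here is a minimal Java sketch that reads one length-prefixed item: an unsigned two-byte length followed by that many bytes. The class name LengthPrefixed and the method readPrefixed are hypothetical. The sketch assumes the bytes are already in memory in a java.nio.ByteBuffer, which by default reads multi-byte values big-endian, the byte order class files use.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// A minimal sketch of length-prefixed reading: a two-byte size first,
// then exactly that many bytes of data.
class LengthPrefixed {
    // Reads an unsigned two-byte length, then that many bytes.
    // ByteBuffer reads big-endian by default, matching the class file.
    static byte[] readPrefixed(ByteBuffer in) {
        int length = in.getShort() & 0xFFFF; // the size comes first...
        byte[] data = new byte[length];
        in.get(data);                         // ...then the data itself
        return data;
    }

    public static void main(String[] args) {
        // Two hypothetical length-prefixed items packed back to back.
        byte[] bytes = {0, 3, 'A', 'c', 't', 0, 2, 'h', 'i'};
        ByteBuffer in = ByteBuffer.wrap(bytes);
        System.out.println(new String(readPrefixed(in), StandardCharsets.US_ASCII)); // Act
        System.out.println(new String(readPrefixed(in), StandardCharsets.US_ASCII)); // hi
    }
}
```

Because the reader always learns the size before the data, it never has to guess where one item ends and the next begins.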

Information is generally written to the class file with no space or padding between consecutive pieces of information; everything is aligned on byte boundaries. This helps keep class files petite so they will be aerodynamic as they fly across networks.
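Because nothing is padded, a reader can decode any value from its byte offset alone. The sketch below, with hypothetical helper names u2 and u4, decodes unsigned two- and four-byte values, assuming the big-endian byte order (most significant byte first) that class files use. The sample header bytes are made up for illustration.

```java
// Sketch: decode unsigned big-endian values directly from packed bytes.
class BigEndian {
    // Unsigned two-byte value starting at offset.
    static int u2(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    // Unsigned four-byte value starting at offset; a long keeps it non-negative.
    static long u4(byte[] b, int offset) {
        return ((long) u2(b, offset) << 16) | u2(b, offset + 2);
    }

    public static void main(String[] args) {
        // Hypothetical first eight bytes of a class file: magic, minor, major.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
        // Prints: magic=CAFEBABE minor=3 major=45
        System.out.printf("magic=%X minor=%d major=%d%n",
                u4(header, 0), u2(header, 4), u2(header, 6));
    }
}
```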

The order of class file components is strictly defined so JVMs can know what to expect, and where to expect it, when loading a class file. For example, every JVM knows that the first eight bytes of a class file contain the magic and version numbers, that the constant pool starts on the ninth byte, and that the access flags follow the constant pool. But because the constant pool is variable-length, a JVM doesn't know the exact whereabouts of the access flags until it has finished reading in the constant pool. Once it has done so, it knows the next two bytes will be the access flags.

Magic and version numbers
The first four bytes of every class file are always 0xCAFEBABE. This magic number makes Java class files easier to identify, because the odds are slim that non-class files would start with the same initial four bytes. The number is called magic because it can be pulled out of a hat by the file format designers. The only requirement is that it is not already being used by another file format that may be encountered in the real world. According to Patrick Naughton, a key member of the original Java team, the magic number was chosen "long before the name Java was ever uttered in reference to this language. We were looking for something fun, unique, and easy to remember. It is only a coincidence that 0xCAFEBABE, an oblique reference to the cute baristas at Peet's Coffee, was foreshadowing for the name Java."
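Checking the magic number takes only a few lines. This sketch, with the hypothetical method name hasClassFileMagic, reads the first four bytes as a big-endian int and compares them with 0xCAFEBABE. The sample byte arrays are made up.

```java
import java.nio.ByteBuffer;

// Sketch: test whether a byte array begins with the class file magic number.
class MagicCheck {
    // 0xCAFEBABE is negative as a Java int, but only the bit pattern matters here.
    static final int MAGIC = 0xCAFEBABE;

    static boolean hasClassFileMagic(byte[] bytes) {
        if (bytes.length < 4) {
            return false; // too short to hold the magic number
        }
        // getInt() reads the first four bytes big-endian.
        return ByteBuffer.wrap(bytes).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] classFile = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
        byte[] textFile = "Hello, world".getBytes();
        System.out.println(hasClassFileMagic(classFile)); // true
        System.out.println(hasClassFileMagic(textFile));  // false
    }
}
```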

The next four bytes of the class file contain the minor and major version numbers, two bytes each, with the minor version stored first. These numbers identify the version of the class file format to which a particular class file adheres and allow JVMs to verify that the class file is loadable. Every JVM has a maximum version it can load, and JVMs will reject class files with later versions.
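Here is a sketch of reading the version numbers, assuming the buffer starts at the magic number. The maximum version passed to the hypothetical isLoadable method is an arbitrary stand-in chosen for illustration; each real JVM has its own limit. Versions compare by major number first and minor number second.

```java
import java.nio.ByteBuffer;

// Sketch: read the version numbers and decide whether a JVM with a
// given maximum version would accept the file.
class VersionCheck {
    // Returns {major, minor}. Assumes the buffer is positioned at the magic number.
    static int[] readVersion(ByteBuffer in) {
        if (in.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF; // the minor version is stored first
        int major = in.getShort() & 0xFFFF;
        return new int[] {major, minor};
    }

    // Compare major numbers first; minor numbers only break ties.
    static boolean isLoadable(int major, int minor, int maxMajor, int maxMinor) {
        return major < maxMajor || (major == maxMajor && minor <= maxMinor);
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
        int[] v = readVersion(ByteBuffer.wrap(header));
        // 45.3 is a hypothetical maximum, not any particular JVM's limit.
        System.out.println(v[0] + "." + v[1] + " loadable: " + isLoadable(v[0], v[1], 45, 3));
    }
}
```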

Constant pool
The class file stores constants associated with its class or interface in the constant pool. Some constants that may be seen frolicking in the pool are literal strings, final variable values, class names, interface names, variable names and types, and method names and signatures. A method signature consists of its return type and the ordered list of its argument types.

The constant pool is organized as an array of variable-length elements. Each constant occupies one element in the array. Throughout the class file, constants are referred to by the integer index that indicates their position in the array. The initial constant has an index of one, the second constant has an index of two, and so on. The constant pool array is preceded by a two-byte count, so JVMs will know how many constants to expect when loading the class file. Because index zero is never used, this count is one greater than the highest valid index. One further wrinkle: each long or double constant occupies two consecutive indices, the second of which is unused.

Each element of the constant pool starts with a one-byte tag specifying the type of constant at that position in the array. Once a JVM grabs and interprets this tag, it knows what follows the tag. For example, if a tag indicates the constant is a UTF-8 string (CONSTANT_Utf8), the JVM expects the next two bytes to be the string's length in bytes. Following this two-byte length, the JVM expects to find that many bytes, which encode the characters of the string.
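The tag-driven reading described above can be sketched as a loop over the constant pool. This is a simplified illustration, not a complete parser: it keeps only CONSTANT_Utf8 strings and the name indexes of CONSTANT_Class entries, skips the other kinds by their fixed sizes, and handles only tags 1 and 3 through 12 of the original format (later versions of the format added more tags). It stores each entry at its constant pool index, so slot zero stays empty. It also decodes strings as standard UTF-8, which agrees with the class file's modified UTF-8 for ordinary ASCII names. The class name ConstantPoolReader is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: read the constant pool, dispatching on each entry's one-byte tag.
class ConstantPoolReader {
    static final int UTF8 = 1, INTEGER = 3, FLOAT = 4, LONG = 5, DOUBLE = 6,
            CLASS = 7, STRING = 8, FIELDREF = 9, METHODREF = 10,
            INTERFACE_METHODREF = 11, NAME_AND_TYPE = 12;

    // Returns an array indexed like constant_pool[n]. UTF-8 entries become
    // Strings, CONSTANT_Class entries become Integer name indexes, and all
    // other kinds are skipped and left null.
    static Object[] read(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF; // one more than the highest index
        Object[] pool = new Object[count];
        for (int i = 1; i < count; i++) {
            int tag = in.get() & 0xFF;       // the tag says what follows
            switch (tag) {
                case UTF8: {
                    byte[] bytes = new byte[in.getShort() & 0xFFFF];
                    in.get(bytes);
                    pool[i] = new String(bytes, StandardCharsets.UTF_8);
                    break;
                }
                case CLASS:
                    pool[i] = in.getShort() & 0xFFFF; // name_index
                    break;
                case STRING:
                    in.getShort(); // index of a CONSTANT_Utf8 entry; skipped here
                    break;
                case INTEGER: case FLOAT: case FIELDREF: case METHODREF:
                case INTERFACE_METHODREF: case NAME_AND_TYPE:
                    in.position(in.position() + 4); // four bytes of payload
                    break;
                case LONG: case DOUBLE:
                    in.position(in.position() + 8); // eight bytes of payload
                    i++; // eight-byte constants take two slots
                    break;
                default:
                    throw new IllegalArgumentException("unknown tag " + tag);
            }
        }
        return pool;
    }

    public static void main(String[] args) {
        // Hypothetical pool: #1 "Act", #2 Class -> #1, #3 and #4 a long, #5 "x".
        byte[] bytes = {0, 6, 1, 0, 3, 'A', 'c', 't', 7, 0, 1,
                5, 0, 0, 0, 0, 0, 0, 0, 42, 1, 0, 1, 'x'};
        Object[] pool = read(ByteBuffer.wrap(bytes));
        int nameIndex = (Integer) pool[2];
        System.out.println(pool[nameIndex]); // Act
    }
}
```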

In the remainder of the article I'll sometimes refer to the nth element of the constant pool array as constant_pool[n]. This makes sense to the extent the constant pool is organized like an array, but bear in mind that these elements have different sizes and types and that the first element has an index of one.

Access flags
The first two bytes after the constant pool, the access flags, indicate whether or not this file defines a class or an interface, whether the class or interface is public or abstract, and (if it's a class and not an interface) whether the class is final.
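The flags are individual bits within that two-byte value. The sketch below decodes the ones mentioned above using the mask values defined by the JVM specification (ACC_PUBLIC 0x0001, ACC_FINAL 0x0010, ACC_INTERFACE 0x0200, ACC_ABSTRACT 0x0400); the describe method is a hypothetical helper. Because interfaces are implicitly abstract, it omits the word abstract for them, and it ignores other bits, such as ACC_SUPER (0x0020), that compilers commonly set.

```java
// Sketch: turn class-level access flags into a readable description.
class AccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    static String describe(int flags) {
        boolean isInterface = (flags & ACC_INTERFACE) != 0;
        StringBuilder sb = new StringBuilder();
        if ((flags & ACC_PUBLIC) != 0) sb.append("public ");
        if ((flags & ACC_ABSTRACT) != 0 && !isInterface) sb.append("abstract ");
        if ((flags & ACC_FINAL) != 0) sb.append("final ");
        sb.append(isInterface ? "interface" : "class");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x0021)); // public class (ACC_SUPER ignored)
        System.out.println(describe(0x0601)); // public interface
    }
}
```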

This class
The next two bytes, the this class component, are an index into the constant pool array. The constant referred to by this class, constant_pool[this_class], has two parts, a one-byte tag and a two-byte name index. The tag will equal CONSTANT_Class, a value that indicates this element contains information about a class or interface. Constant_pool[name_index] is a string constant containing the name of the class or interface.

The this class component provides a glimpse of how the constant pool is used. This class itself is just an index into the constant pool. When a JVM looks up constant_pool[this_class], it finds an element that identifies itself as a CONSTANT_Class with its tag. The JVM knows CONSTANT_Class elements always have a two-byte index into the constant pool, called name index, following their one-byte tag. So it looks up constant_pool[name_index] to get the string containing the name of the class or interface.
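The two-step lookup can be sketched as follows. To keep it self-contained, the constant pool is modeled as an Object[] indexed like constant_pool[n], with a hypothetical ClassEntry object standing in for each CONSTANT_Class element and a String for each name.

```java
// Sketch: resolve this_class to a name via the two-step constant pool lookup.
class ThisClassLookup {
    // Hypothetical stand-in for a CONSTANT_Class element: just its name index.
    static final class ClassEntry {
        final int nameIndex;

        ClassEntry(int nameIndex) {
            this.nameIndex = nameIndex;
        }
    }

    // constantPool[n] models constant_pool[n]; slot 0 is unused.
    static String className(Object[] constantPool, int thisClass) {
        Object entry = constantPool[thisClass];
        if (!(entry instanceof ClassEntry)) {
            throw new IllegalArgumentException(
                    "constant_pool[" + thisClass + "] is not a CONSTANT_Class");
        }
        int nameIndex = ((ClassEntry) entry).nameIndex; // first hop
        return (String) constantPool[nameIndex];         // second hop: the name
    }

    public static void main(String[] args) {
        Object[] pool = {null, new ClassEntry(2), "Act"};
        System.out.println(className(pool, 1)); // Act
    }
}
```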

Super class
Following the this class component is the super class component, another two-byte index into the constant pool. Constant_pool[super_class] is a CONSTANT_Class element that points to the name of the super class from which this class descends.
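The superclass lookup works the same way as the this class lookup, with one special case worth knowing: java.lang.Object has no superclass, so its class file stores zero in the super class component. Names also appear in the JVM's internal form, with slashes instead of dots (java/lang/Object). The sketch below, which models the constant pool as an Object[] with a hypothetical ClassEntry type, handles both.

```java
// Sketch: resolve super_class, treating zero as "no superclass".
class SuperClassLookup {
    // Hypothetical stand-in for a CONSTANT_Class element.
    static final class ClassEntry {
        final int nameIndex;

        ClassEntry(int nameIndex) {
            this.nameIndex = nameIndex;
        }
    }

    // Returns the superclass name in internal form, or null when
    // superClass is zero (only java.lang.Object has no superclass).
    static String superClassName(Object[] constantPool, int superClass) {
        if (superClass == 0) {
            return null;
        }
        ClassEntry entry = (ClassEntry) constantPool[superClass];
        return (String) constantPool[entry.nameIndex];
    }

    // Converts an internal name such as java/lang/Object to java.lang.Object.
    static String toSourceName(String internalName) {
        return internalName.replace('/', '.');
    }

    public static void main(String[] args) {
        Object[] pool = {null, new ClassEntry(2), "java/lang/Object"};
        System.out.println(toSourceName(superClassName(pool, 1))); // java.lang.Object
        System.out.println(superClassName(pool, 0));               // null
    }
}
```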

Interfaces
The interfaces component starts with a two-byte count of the number of interfaces implemented by the class (or interface) defined in the file. Immediately following is an array that contains one index into the constant pool for each interface implemented by the class. Each interface is represented by a CONSTANT_Class element in the constant pool that points to the name of the interface.
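Reading the interfaces component amounts to a count followed by that many two-byte indices. In this sketch, whose names are hypothetical, each returned index would then be looked up as a CONSTANT_Class element, the same way the super class is.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch: read the interfaces component.
class InterfacesReader {
    // Reads a two-byte count, then one two-byte constant pool index per
    // interface. Each index should refer to a CONSTANT_Class element.
    static int[] read(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;
        int[] indexes = new int[count];
        for (int i = 0; i < count; i++) {
            indexes[i] = in.getShort() & 0xFFFF;
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Hypothetical bytes: two interfaces, at constant pool indexes 7 and 9.
        byte[] bytes = {0, 2, 0, 7, 0, 9};
        System.out.println(Arrays.toString(read(ByteBuffer.wrap(bytes)))); // [7, 9]
    }
}
```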

Fields
The fields component starts with a two-byte count of the number of fields in this class or interface. A field is an instance or class variable of the class or interface. Following the count is an array of variable-length structures, one for each field. Each structure reveals information about one field such as the field's name, type, and, if it is a final variable, its constant value. Some information is contained in the structure itself, and some is contained in constant pool locations pointed to by the structure.

The only fields that appear in the list are those that were declared by the class or interface defined in the file; no fields inherited from super classes or superinterfaces appear in the list.
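The field structures can be read with a loop like the one below. In the JVM specification, each field structure consists of two-byte access flags, a name index and a descriptor index (both pointing into the constant pool), and a count of attributes followed by the attributes themselves; a final variable's constant value is carried by one of these attributes, named ConstantValue. This sketch records the fixed part and skips each attribute using its four-byte length. The FieldsReader and FieldInfo names are hypothetical, and the sample bytes are made up.

```java
import java.nio.ByteBuffer;

// Sketch: read the fields component.
class FieldsReader {
    // The fixed part of one field structure; attributes are skipped.
    static final class FieldInfo {
        final int accessFlags;
        final int nameIndex;       // constant pool index of the field's name
        final int descriptorIndex; // constant pool index of its type, e.g. "I" for int
        final int attributeCount;

        FieldInfo(int accessFlags, int nameIndex, int descriptorIndex, int attributeCount) {
            this.accessFlags = accessFlags;
            this.nameIndex = nameIndex;
            this.descriptorIndex = descriptorIndex;
            this.attributeCount = attributeCount;
        }
    }

    static FieldInfo[] read(ByteBuffer in) {
        int count = in.getShort() & 0xFFFF;
        FieldInfo[] fields = new FieldInfo[count];
        for (int i = 0; i < count; i++) {
            int accessFlags = in.getShort() & 0xFFFF;
            int nameIndex = in.getShort() & 0xFFFF;
            int descriptorIndex = in.getShort() & 0xFFFF;
            int attributeCount = in.getShort() & 0xFFFF;
            for (int a = 0; a < attributeCount; a++) {
                in.getShort();                       // attribute name index
                int length = in.getInt();            // four-byte attribute length
                in.position(in.position() + length); // skip, e.g., ConstantValue
            }
            fields[i] = new FieldInfo(accessFlags, nameIndex, descriptorIndex, attributeCount);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical bytes: one static final field with one 2-byte attribute.
        byte[] bytes = {0, 1, 0, 0x18, 0, 5, 0, 6, 0, 1, 0, 7, 0, 0, 0, 2, 0, 3};
        FieldInfo f = read(ByteBuffer.wrap(bytes))[0];
        System.out.printf("flags=0x%04X name=#%d type=#%d%n",
                f.accessFlags, f.nameIndex, f.descriptorIndex);
    }
}
```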

Methods
The methods component starts with a two-byte count of the number of methods in the class or interface. This count includes only the methods declared by this class or interface, along with any the compiler generates for it, such as a default constructor; it does not include methods inherited from superclasses. Following the method count are the methods themselves.

The structure for each method contains several pieces of information about the method, including its name and method descriptor (its return type and argument list), the number of words required for the method's local variables, the maximum number of words required for the method's operand stack, a table of exceptions caught by the method, the bytecode sequence, and a line number table.
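Most of the per-method details listed above live in an attribute named Code that is attached to each method. The sketch below parses the body of a Code attribute (the bytes after its name index and length), assuming the layout given in current editions of the JVM specification: operand stack size, local variable count, the bytecodes, the exception table (eight bytes per entry), and nested attributes such as the line number table, which it skips. The names are hypothetical, and the sample bytes are made up.

```java
import java.nio.ByteBuffer;

// Sketch: parse the body of a method's Code attribute.
class CodeAttributeReader {
    static final class Code {
        final int maxStack;  // words needed for the operand stack
        final int maxLocals; // words needed for local variables
        final byte[] bytecode;
        final int exceptionTableLength;

        Code(int maxStack, int maxLocals, byte[] bytecode, int exceptionTableLength) {
            this.maxStack = maxStack;
            this.maxLocals = maxLocals;
            this.bytecode = bytecode;
            this.exceptionTableLength = exceptionTableLength;
        }
    }

    // Expects the buffer positioned just after the attribute's name index and length.
    static Code read(ByteBuffer in) {
        int maxStack = in.getShort() & 0xFFFF;
        int maxLocals = in.getShort() & 0xFFFF;
        byte[] bytecode = new byte[in.getInt()]; // code length, then the bytecodes
        in.get(bytecode);
        int handlers = in.getShort() & 0xFFFF;
        // Each exception table entry is four two-byte values:
        // start pc, end pc, handler pc, and catch type.
        in.position(in.position() + handlers * 8);
        int attributeCount = in.getShort() & 0xFFFF; // e.g. a line number table
        for (int a = 0; a < attributeCount; a++) {
            in.getShort();                       // attribute name index
            int length = in.getInt();
            in.position(in.position() + length); // skipped in this sketch
        }
        return new Code(maxStack, maxLocals, bytecode, handlers);
    }

    public static void main(String[] args) {
        // Hypothetical body: iconst_0, istore_0, return; no handlers;
        // one nested 2-byte attribute that is skipped.
        byte[] body = {0, 1, 0, 1, 0, 0, 0, 3, 0x03, 0x3b, (byte) 0xb1,
                0, 0, 0, 1, 0, 9, 0, 0, 0, 2, 0, 0};
        Code code = read(ByteBuffer.wrap(body));
        System.out.println(code.maxStack + " " + code.maxLocals + " " + code.bytecode.length); // 1 1 3
    }
}
```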

Attributes
Bringing up the rear are the attributes, which give general information about the particular class or interface defined by the file. The attributes section has a two-byte count of the number of attributes, followed by the attributes themselves. For example, one attribute is the SourceFile attribute, which reveals the name of the source file from which this class file was compiled. JVMs will silently ignore any attributes they don't recognize.
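Ignoring unknown attributes is possible because every attribute begins with a two-byte name index and a four-byte length, so a reader can always jump past one it doesn't understand. The sketch below, with hypothetical names, scans the class-level attributes, returns the constant pool index stored in a SourceFile attribute, and skips everything else. It assumes a String[] holding the constant pool's UTF-8 entries at their indices.

```java
import java.nio.ByteBuffer;

// Sketch: scan class-level attributes, picking out SourceFile.
class ClassAttributesReader {
    // Returns the constant pool index of the source file name from a
    // SourceFile attribute, or 0 if there is none. utf8Pool[n] holds
    // constant_pool[n] when that entry is a UTF-8 string.
    static int findSourceFileIndex(ByteBuffer in, String[] utf8Pool) {
        int count = in.getShort() & 0xFFFF;
        int sourceFileIndex = 0;
        for (int i = 0; i < count; i++) {
            int nameIndex = in.getShort() & 0xFFFF;
            int length = in.getInt();
            int end = in.position() + length; // where the next attribute begins
            if ("SourceFile".equals(utf8Pool[nameIndex])) {
                sourceFileIndex = in.getShort() & 0xFFFF;
            }
            in.position(end); // unrecognized attributes are silently skipped
        }
        return sourceFileIndex;
    }

    public static void main(String[] args) {
        String[] pool = {null, "SourceFile", "Act.java", "SomeUnknownAttribute"};
        byte[] bytes = {0, 2,
                0, 3, 0, 0, 0, 3, 1, 2, 3, // unknown attribute, skipped
                0, 1, 0, 0, 0, 2, 0, 2};   // SourceFile -> constant pool index 2
        int index = findSourceFileIndex(ByteBuffer.wrap(bytes), pool);
        System.out.println(pool[index]); // Act.java
    }
}
```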

Getting loaded: a simulation of a class file reaching its JVM destination
The applet below simulates a JVM loading a class file. The class file being loaded in the simulation was generated by the javac compiler given the following Java source code:

class Act {
    public static void doMathForever() {
        int i = 0;
        while (true) {
            i += 1;
            i *= 2;
        }
    }
}

The above snippet of code comes from last month's article about the JVM. It is the same doMathForever() method executed by the EternalMath applet from last month's article. I chose this code to provide a real example that wasn't too complex. Although the code may not be very useful in the real world, it does compile to a real class file, which is loaded by the simulation below.

The GettingLoaded applet allows you to drive the class load simulation one step at a time. For each step along the way you can read about the next chunk of bytes that is about to be consumed and interpreted by the JVM. Just press the "Step" button to cause the JVM to consume the next chunk. Pressing "Back" will undo the previous step, and pressing "Reset" will return the simulation to its original state, allowing you to start over from the beginning.

The JVM is shown at the bottom left consuming the stream of bytes that makes up the class file Act.class. The bytes are shown in hex streaming out of a server on the bottom right. The bytes travel right to left, between the server and the JVM, one chunk at a time. The chunk of bytes to be consumed by the JVM on the next "Step" button press is shown in red. These highlighted bytes are described in the large text area above the JVM. Any remaining bytes beyond the next chunk are shown in black.

I've tried to fully explain each chunk of bytes in the text area. As a result, the text area contains a lot of detail, so you may wish to skim through all the steps first to get the general idea, then look back for more details.

Happy clicking.

Click here for the source code of GettingLoaded. To run this applet on your own, you'll also need the two files that this applet retrieves from the server, the ASCII file that contains the text for each step and the Act.class file itself. Click here for the source code of the Flying Class Files audio applet.

This article was first published under the name The Class File Lifestyle in JavaWorld, a division of Web Publishing, Inc., June 1996.

About the author

Bill Venners has been writing software professionally for 12 years. Based in Silicon Valley, he provides software consulting and training services under the name Artima Software Company. Over the years he has developed software for the consumer electronics, education, semiconductor, and life insurance industries. He has programmed in many languages on many platforms: assembly language on various microprocessors, C on Unix, C++ on Windows, Java on the Web. He is the author of the book Inside the Java Virtual Machine, published by McGraw-Hill.