This article is sponsored by the Java Community Process.
The Java Compiler API

A Conversation with Peter von der Ahé

by Frank Sommers
April 3, 2007

Most developers think of the Java compiler, javac, as an unobtrusive command-line tool to invoke when you want to turn Java source code into class files. The Java Compiler API, JSR 199, released in final form last December, opens up the Java compiler to programmatic interaction as well. Artima spoke with JSR 199 spec lead and Sun engineer Peter von der Ahé about what programmatic compiler access means for developers.

Frank Sommers: Can you start by giving us a bird's eye view of the JSR 199 API?

Peter von der Ahé: The JSR 199 Compiler API consists of three things: The first one basically allows you to invoke a compiler via the API. Second, the API allows you to customize how the compiler finds and writes out files. I mean files in the abstract sense, since the files the compiler deals with aren't necessarily on the file system. JSR 199's file abstraction allows you to have files in a database, and to generate output directly to memory, for example. Finally, the JSR 199 API lets you collect diagnostics from the compiler in a structured way so that you can easily transform error messages, for instance, into lines in an IDE's editor.
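As a rough illustration of those three pieces, the sketch below invokes the system compiler through `javax.tools`, feeds it a source "file" that exists only as a string, and collects the resulting diagnostics as structured objects. The class names `DiagnosticsDemo` and `StringSource` and the `string:///` URI scheme are assumptions made for this example; they do not come from the interview.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class DiagnosticsDemo {

    /** Hypothetical helper: a source "file" whose content lives in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Compiles one in-memory class and returns each error as "line N: message". */
    static List<String> compileAndCollectErrors(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        String outputDir;
        try {
            // Class files for error-free input go to a scratch directory.
            outputDir = Files.createTempDirectory("jsr199-demo").toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null,                            // default Writer for additional output
                null,                            // default (standard) file manager
                diagnostics,                     // structured diagnostics are collected here
                Arrays.asList("-d", outputDir),  // compiler options
                null,                            // no class names for annotation processing
                Collections.singletonList(new StringSource(className, code)));
        task.call();

        List<String> errors = new ArrayList<>();
        for (Diagnostic<? extends JavaFileObject> d : diagnostics.getDiagnostics()) {
            if (d.getKind() == Diagnostic.Kind.ERROR) {
                errors.add("line " + d.getLineNumber() + ": " + d.getMessage(null));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        String broken = "class Broken {\n"
                      + "    int f() { return \"oops\"; }\n"
                      + "}\n";
        compileAndCollectErrors("Broken", broken).forEach(System.out::println);
    }
}
```

Because each diagnostic carries its kind, source, and line number, a tool can map it to an editor position instead of parsing the compiler's console output.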

Frank Sommers: How do you expect the Java Compiler API to impact developers' work?

Peter von der Ahé: The main benefits to developers are indirect in that the JSR 199 API allows better tools, better deployment time, and better infrastructure to exist.

For example, one of the benefits of having a compiler API is that you can make compilation part of an application-level service. Consider the case when you upload JSP code to an app server: The server has to analyze the JSP files, generate Java source code files from the JSPs, write those files out to disk, invoke an external compiler that then reads the generated Java source code files from disk, writes the class files back to disk, and then the app server needs to read those class files into memory. With the Compiler API, you can keep the compiler running in that app server, and keep all of that in memory. That can reduce deployment time, and also eliminates the startup overhead of the compiler.
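The in-memory round trip described above might look roughly like the following sketch. It is not how any particular app server does it: the names `InMemoryCompiler`, `SourceInMemory`, and `ClassInMemory` and the `mem:///` URI scheme are invented for illustration. A forwarding file manager captures the class files in byte arrays, and a small class loader defines the classes straight from those bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.tools.FileObject;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

class InMemoryCompiler {

    /** Hypothetical helper: source code held in a String. */
    static final class SourceInMemory extends SimpleJavaFileObject {
        private final String code;

        SourceInMemory(String className, String code) {
            super(URI.create("mem:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Hypothetical helper: a class file captured in a byte array. */
    static final class ClassInMemory extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        ClassInMemory(String binaryName) {
            super(URI.create("mem:///" + binaryName.replace('.', '/') + Kind.CLASS.extension),
                  Kind.CLASS);
        }

        @Override
        public OutputStream openOutputStream() {
            return bytes;
        }
    }

    /** Compiles the given source and loads the named class without touching the disk. */
    static Class<?> compileAndLoad(String className, String code)
            throws IOException, ClassNotFoundException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        Map<String, ClassInMemory> outputs = new HashMap<>();
        try (StandardJavaFileManager standard = compiler.getStandardFileManager(null, null, null)) {
            // Redirect every class file the compiler writes into the map above.
            JavaFileManager capturing =
                    new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
                @Override
                public JavaFileObject getJavaFileForOutput(JavaFileManager.Location location,
                        String binaryName, JavaFileObject.Kind kind, FileObject sibling) {
                    ClassInMemory out = new ClassInMemory(binaryName);
                    outputs.put(binaryName, out);
                    return out;
                }
            };
            boolean ok = compiler.getTask(null, capturing, null, null, null,
                    Collections.singletonList(new SourceInMemory(className, code))).call();
            if (!ok) {
                throw new IllegalArgumentException("Compilation failed; see System.err");
            }
        }
        // Define classes directly from the captured bytes.
        ClassLoader loader = new ClassLoader(InMemoryCompiler.class.getClassLoader()) {
            @Override
            protected Class<?> findClass(String name) throws ClassNotFoundException {
                ClassInMemory out = outputs.get(name);
                if (out == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] b = out.bytes.toByteArray();
                return defineClass(name, b, 0, b.length);
            }
        };
        return loader.loadClass(className);
    }

    public static void main(String[] args) throws Exception {
        Class<?> cls = compileAndLoad("Answer",
                "public class Answer { public static int value() { return 6 * 7; } }");
        System.out.println(cls.getMethod("value").invoke(null)); // prints 42
    }
}
```

A long-running server could keep the `JavaCompiler` instance around between requests, which is where the saved startup overhead mentioned above would come from.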

To mention another example, say, you have an app server that stores most of its data in a database, and is highly optimized for database access. It is then natural to store not just the data, but also the program in a database. Before the Compiler API, you had to take the program data out of the database, put it on the file system, run the compiler as an external process, which would then have to start up, incurring a time overhead. And once you've generated some results, you'd have to copy those back into the database. The compiler API allows you to shortcut these steps, since it can consume files directly from the database, thus allowing better integration with the database.

Another benefit for developers is that IDEs and other developer tools can more tightly integrate with compilers. By using the JSR 199 API, you can invoke a compiler directly from within an IDE's editor, or from build tools, such as Ant. Those tools then have a tighter control over the compiler. In an application area where compilers are used a lot, reducing the compile time significantly with the Compiler API can have a big impact.

As a result, I think the expectation from developers will be higher for tools that integrate with the compiler API. While I don't think the Compiler API will fundamentally change how developers interact with their IDEs, it's the combination of various subtle things the API allows that will make a lot of difference.

Frank Sommers: To what extent will that tighter compiler integration be available in the upcoming NetBeans 6.0 release?

Peter von der Ahé: Again, it's a combination of small things that will make NetBeans 6 look very distinct from NetBeans 5 in the editor area. For NetBeans 6.0, we completely rewired the guts of the Java source code editor, using the compiler to implement the editor.

That means a couple of things. For example, we expect quick NetBeans integration of whatever new language features will be put into JDK 7. Just what those features will be is an open issue. But once those are implemented in the compiler, we expect that most of the new language features will work in NetBeans without much extra work.

Another thing is that the editor simply has more information available about your program. For example, does a method override another method? Or how many times does a method get overridden in subclasses? If you're editing a superclass, you can see how many classes extend a specific class or override a specific method. Code completion pops up faster, and there is less overhead.

Note that the compiler integration is just a means to an end. NetBeans 6 simply has a better Java source code editor. The details of how we achieve that just mean that we're working more closely together and provide more direct access to the underlying compiler.

I keep saying NetBeans, but there are other IDEs out there. All the changes we made to the compiler are in Java SE 6, and there are more changes to come, of course, as we move along. Those are out for everybody to use. But at the moment, NetBeans 6 is the only IDE I know that uses Java 6's compiler features directly.

Frank Sommers: Based on your description, JSR 199 is really an API around the Java compiler. How deep inside the actual compiler can you get with this API?

Peter von der Ahé: Quite deep. JSR 199 works together with the annotation processing API, JSR 269, for instance, that presents a compile-time model similar to core reflection. Just as when running a Java program you can reflect on objects and examine the structure of objects and class files, the annotation framework allows you to examine the classes the compiler is compiling. That lets you get fairly deep into the compiler data structures.
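To make the comparison with core reflection concrete, here is a minimal sketch of a JSR 269 processor that lists the methods of every class handed to the compiler, attached to a JSR 199 compilation task. The names `ProcessorDemo`, `MethodLister`, and `StringSource` are hypothetical, and the `-proc:only` option is used so that the run stops after annotation processing and writes no class files.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ProcessorDemo {

    /** Hypothetical helper: a source "file" whose content lives in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Records "Type.method returns ReturnType" for every method in the compiled classes. */
    @SupportedAnnotationTypes("*") // run even when the source has no annotations
    static final class MethodLister extends AbstractProcessor {
        final List<String> found = new ArrayList<>();

        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            // Root elements are the top-level types the compiler is working on in this round.
            for (TypeElement type : ElementFilter.typesIn(roundEnv.getRootElements())) {
                for (ExecutableElement m : ElementFilter.methodsIn(type.getEnclosedElements())) {
                    found.add(type.getQualifiedName() + "." + m.getSimpleName()
                            + " returns " + m.getReturnType());
                }
            }
            return false; // do not claim any annotations
        }
    }

    /** Runs the processor over one in-memory class and returns what it saw. */
    static List<String> listMethods(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        MethodLister lister = new MethodLister();
        JavaCompiler.CompilationTask task = compiler.getTask(null, null, null,
                Collections.singletonList("-proc:only"), null,
                Collections.singletonList(new StringSource(className, code)));
        task.setProcessors(Collections.singletonList(lister));
        task.call();
        return lister.found;
    }

    public static void main(String[] args) {
        String code = "class Shape {\n"
                    + "    double area() { return 0; }\n"
                    + "    String name() { return \"shape\"; }\n"
                    + "}\n";
        listMethods("Shape", code).forEach(System.out::println);
    }
}
```

The `TypeElement` and `ExecutableElement` objects play the role that `Class` and `Method` play in runtime reflection, except that they describe source being compiled rather than loaded classes.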

In addition, Sun is providing access to the compiler syntax tree, a feature relied on very heavily by the upcoming version of NetBeans. While this is not directly in JSR 199, we're providing that capability in subclasses of the 199 API. At the moment, this only works on Sun's compiler, but providing standardized access to a tree API is something we might want to consider in the future.

Before we could think of a standard way to access the compiler tree, though, we needed to have access to the tree API first. In the past, we didn't even provide a stable API to those internal compiler structures. Instead of trying to standardize on a tree API at this stage, we made a more stable version of the API. You can say that the tree API we're providing now is a first try in that direction. At a later stage, we'll have to reconcile differences between various compilers, and then we will perhaps look at proposing a standard API for syntax trees.

Frank Sommers: How does parsing work in the current compiler?

Peter von der Ahé: In JSR 199, there is a construct called a compilation task. If you access the Sun subclass of that class, then you get additional utilities from that subclass. One of them is to be able to parse files. You can specify files using the file abstraction from JSR 199. If you give a parse() method files in that form, that method returns a list of abstract syntax trees. You can then run through the syntax trees and analyze them in your program.
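In JDKs that ship javac, the compiler-specific subclass of the compilation task is `com.sun.source.util.JavacTask`. The sketch below casts the task to that type, calls `parse()`, and walks the resulting trees with a `TreeScanner` to collect method names. `ParseDemo` and `StringSource` are names invented for this example, and because the cast relies on a javac-specific class, the code should be read as tied to that compiler rather than to JSR 199 in general.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

class ParseDemo {

    /** Hypothetical helper: a source "file" whose content lives in a String. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + Kind.SOURCE.extension),
                  Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (no class files are generated) and returns the declared method names. */
    static List<String> methodNames(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler found; run on a JDK");
        }
        // The cast works only when the system compiler is javac.
        JavacTask task = (JavacTask) compiler.getTask(null, null, null, null, null,
                Collections.singletonList(new StringSource(className, code)));
        List<String> names = new ArrayList<>();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                // TreeScanner visits every node; only method declarations are recorded.
                new TreeScanner<Void, Void>() {
                    @Override
                    public Void visitMethod(MethodTree node, Void unused) {
                        names.add(node.getName().toString());
                        return super.visitMethod(node, unused);
                    }
                }.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        String code = "class Shape {\n"
                    + "    double area() { return 0; }\n"
                    + "    String name() { return \"shape\"; }\n"
                    + "}\n";
        System.out.println(methodNames("Shape", code));
    }
}
```

The scanner is an instance of the visitor pattern: overriding one `visit` method is enough to pick out a single kind of tree node while the base class handles the traversal.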

Note that the tree API we're exposing is not mutable—you cannot change the syntax tree. Providing a mutable API for that would involve a lot of challenges, as we learned from experience with some NetBeans-related tools, notably the Jackpot project.

The Jackpot project learned that if you don't allow modification of trees, but instead copy trees to a new version when you want to make modifications, then you can compare the old version of a tree to a new version. If a user later decides that a modification didn't work, you can just throw away that modification, or the copy representing that modification, and go back to the old version. That turns out to be a great way to perform undo during refactoring. Actually, what Jackpot provides goes beyond refactoring—it's more apt to call that code re-engineering. While the compiler doesn't directly support this, Jackpot extends the compiler's capabilities to recover trees in that manner.
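The copy-instead-of-mutate idea can be sketched in a few lines. This is a generic illustration, not Jackpot's or javac's actual API: `Node`, `History`, and their methods are hypothetical names. Each "modification" returns a new tree that shares unchanged subtrees with the old one, so undo amounts to dropping the newest version.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class ImmutableTreeDemo {

    /** An immutable tree node; "changing" it produces a new node instead. */
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = Collections.unmodifiableList(new ArrayList<>(children));
        }

        /** Returns a copy of this node with a different label; children are shared. */
        Node withLabel(String newLabel) {
            return new Node(newLabel, children);
        }

        /** Returns a copy of this node with one child replaced; the other children are shared. */
        Node withChild(int index, Node replacement) {
            List<Node> copy = new ArrayList<>(children);
            copy.set(index, replacement);
            return new Node(label, copy);
        }

        @Override
        public String toString() {
            return children.isEmpty() ? label : label + children;
        }
    }

    /** Keeps every version of the tree so that a modification can be thrown away. */
    static final class History {
        private final Deque<Node> versions = new ArrayDeque<>();

        History(Node initial) {
            versions.push(initial);
        }

        Node current() {
            return versions.peek();
        }

        void apply(Node modified) {
            versions.push(modified);
        }

        void undo() {
            if (versions.size() > 1) {
                versions.pop(); // the previous version is still intact
            }
        }
    }

    public static void main(String[] args) {
        Node root = new Node("class Foo", Arrays.asList(
                new Node("method bar", Collections.<Node>emptyList()),
                new Node("method baz", Collections.<Node>emptyList())));
        History history = new History(root);

        // "Rename" the first method by building a new version of the tree.
        history.apply(root.withChild(0, root.children.get(0).withLabel("method qux")));
        System.out.println(history.current()); // class Foo[method qux, method baz]

        history.undo();
        System.out.println(history.current()); // class Foo[method bar, method baz]
    }
}
```

Because the old and new versions coexist, a tool can also compare them to work out exactly what a change touched.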

Frank Sommers: You spoke before about the Kitchen Sink project that provides an experimental playground for new Java language features. If I have an idea for a great new Java programming language feature, what steps would I have to take to implement that feature in the open-source compiler?

Peter von der Ahé: When it comes to language features, the first thing you want to do is not touch the compiler. The really important part comes from first sitting down and thinking really hard about the specification of that change. Think about how you want certain things to work. Consider the various corner cases, and how you want to solve them.

That will give you a good starting point for language features, because you want to have a fairly clear idea of how the syntax should look before you start modifying the compiler, especially the parser. You want to think about whether there should be new types of syntax trees in the compiler, so that when the parser analyzes the new language feature, it can create those trees.

Once you have a good plan for what you want to do, it should be fairly easy to modify the parser. Note that we have a hand-written parser in javac, and that the compiler relies extensively on the visitor pattern. One of the things you need to decide is whether you'll need to extend the visitors to support your new language feature.

Whether you are reusing an existing syntax tree or adding a new one, you will know where to look for changes once you figure out which syntax tree you're looking at. You'll want to go through all the phases of the compiler, and see what's going on in each area. It's not trivial, but it's not as daunting a task as one might think.

It's also possible to plug in a different parser. But then we're not talking about a standard API any more, but about going in and making changes to the internals of the compiler—you're off the beaten path, and things at that level may change as we change the implementation of the compiler.

Frank Sommers: The JSR 199 specs state that the compiler should generate a valid Java class. Can you modify the compiler so that you output something other than Java classes, even from Java source code? Likewise, can you modify the compiler to read in some other language and generate Java class files from that language?

Peter von der Ahé: You could probably use the compiler to read some other language, such as JavaScript, for instance, but then we get into details of our conformance rules, and what you can call Java and what you can't. That's really a legal issue that I would prefer to steer clear of.

We have discussed internally at Sun whether it would be a good idea to reuse compiler technology and implementations between the Java programming language, C++, and JavaScript, for example. One use of that would be in the NetBeans IDE. When you look at the Java programming language, C, and C++, they share to some extent certain traits. If you create a compiler that generates code in some intermediate format, as in GCC, then you can probably share some technology there, especially in the area of optimization in the back-end. However, when it comes to applications, such as IDEs, then the differences become more of a stumbling block.

In fact, such reuse was already tried, and we decided to stop doing that. You will see the benefits of not doing that once NetBeans 6 comes out, because the newly-implemented Java source code editor makes the Java editor very specific to the Java programming language. Separately, we can have a specific JavaScript editor. So it's not as much reusing compiler technology, as it is re-using experiences.

I could imagine that if someone was writing a compiler for some scripting language to get it running on the JVM, they could reuse a few classes that make up the back-end of javac. But even then, they would have to do a lot of work, because javac is targeted only at generating class files from the Java programming language.

Frank Sommers: Sun open-sourced javac last November along with the JDK, and the JCP followed that with the final release of JSR 199. Going forward, what upcoming compiler-related features are you most excited about?

Peter von der Ahé: Right now, I'm most excited about the community involvement. For example, with the Kitchen Sink project we talked about, the potential is really exciting: we now have a more open way to evaluate various language features, and see what works best.

I'm also excited about some of the possible new language features for JDK 7 that the compiler will support. While no decisions have been made on the exact set of new Java programming language features to ship in JDK 7, let me say that I'm very optimistic about closures, based on the proposal led by Neal Gafter. As with other new language features, it's now possible to see how they'll look in practice, since the open-source compiler makes it easier to build an implementation.


About the author

Frank Sommers is president of Autospaces, Inc.