James Gosling talks with Bill Venners about visualizing software designs and understanding large-scale distributed systems.
For the past several years, Java's creator James Gosling has been working at Sun Labs, researching ways to analyze and manipulate programs represented as annotated parse trees, a project called Jackpot. Compilers have long built parse trees when they translate source code into binary. But traditionally, programmers have worked with source code primarily by manipulating text with editors. The goal of the Jackpot project is to investigate the value of treating the parse tree as the program at development time, not just at compile time.
In this interview, which is being published in multiple installments, James Gosling talks about many aspects of programming.
In this installment, Gosling describes the ways in which Jackpot can help programmers analyze, visualize, and refactor their programs.
Bill Venners: One thing I think could help people design software is being better able to visualize what they're designing. If you design a chair, you can see it. You can sit in it and get the feel of it. But when you design an object, you can only see what's on the screen, which is often just code.
You have said about your Jackpot research project that if the notion of truth in a program is the abstract syntax tree, not text, you can display the program in a lot of interesting ways. I would imagine that if the abstract syntax tree is stored persistently, then the code is just one view. In fact, you could have many different code views. If someone doesn't like curly braces, for example, they wouldn't have to look at curly braces. I could also imagine that programs could be viewed at more abstract levels that would help people see problems with their design. Is that kind of visualization what you're thinking about in your research?
James Gosling: Yes. Historically, people have used all kinds of ways to try and visualize their programs. Some are simplistic and kind of obvious, others are not. The mathematics world, for example, has a notation for expressing mathematical formulas that is much richer and more evocative than plus, minus, star, slash, parentheses, and variable names. People are pretty good at looking at a page of mathematics that has square root signs and exponents. So if you've got a piece of code full of gnarly math, it's probably a lot more comprehensible to the people who understand gnarly math if you actually display it in something that looks like conventional mathematics. That's one of the things our system will do. There have also been a number of attempts to come up with notations for visualizing control flow that are different from just code. For example, Nassi-Shneiderman diagrams are pretty good for representing decision trees.
In our research, we've been working to build a framework into which you can plug in various kinds of visualization techniques. The extent to which these visualization techniques are helpful is often context dependent. If you are writing a ray tracer, for example, the ability to do mathematical layout of code probably helps a lot. If you are writing a banking application, on the other hand, mathematical layout probably doesn't help at all. But in a banking application, specialized visualizations for state diagrams, database modeling, and database access might help. If you crack open database textbooks, you'll see diagrams of what databases look like. Maybe your program ought to look like that. Maybe your program would be a lot more comprehensible if it looked like that.
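Editor's sketch: the idea that a single tree can back several views can be illustrated with a toy expression tree. The Java below is a minimal sketch under stated assumptions, not Jackpot's code. The names ExpressionViewsSketch, Expr, Var, Binary, Sqrt, and sample are hypothetical, and LaTeX is assumed here as a stand-in for "conventional mathematics."

```java
public class ExpressionViewsSketch {

    // A tiny expression tree; every node can render itself in two notations.
    interface Expr {
        String toCode();   // programming-language style text
        String toLatex();  // conventional mathematical notation, written as LaTeX
    }

    static final class Var implements Expr {
        private final String name;
        Var(String name) { this.name = name; }
        public String toCode() { return name; }
        public String toLatex() { return name; }
    }

    static final class Binary implements Expr {
        private final char op; // one of + - * /
        private final Expr left;
        private final Expr right;
        Binary(char op, Expr left, Expr right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        public String toCode() {
            return "(" + left.toCode() + " " + op + " " + right.toCode() + ")";
        }
        public String toLatex() {
            switch (op) {
                case '/': // division becomes a stacked fraction
                    return "\\frac{" + left.toLatex() + "}{" + right.toLatex() + "}";
                case '*': // multiplication becomes a centered dot
                    return left.toLatex() + " \\cdot " + right.toLatex();
                default:  // sums and differences keep their parentheses
                    return "(" + left.toLatex() + " " + op + " " + right.toLatex() + ")";
            }
        }
    }

    static final class Sqrt implements Expr {
        private final Expr arg;
        Sqrt(Expr arg) { this.arg = arg; }
        public String toCode() { return "Math.sqrt(" + arg.toCode() + ")"; }
        public String toLatex() { return "\\sqrt{" + arg.toLatex() + "}"; }
    }

    // Builds the tree for sqrt(x*x + y*y) / d.
    static Expr sample() {
        Expr x = new Var("x");
        Expr y = new Var("y");
        Expr sumOfSquares = new Binary('+', new Binary('*', x, x), new Binary('*', y, y));
        return new Binary('/', new Sqrt(sumOfSquares), new Var("d"));
    }

    public static void main(String[] args) {
        Expr expr = sample();
        System.out.println(expr.toCode());  // the "code" view
        System.out.println(expr.toLatex()); // the "mathematics" view
    }
}
```

Both strings are derived from the same tree, so neither is the source of truth. A pluggable framework of the kind described above could presumably add further renderers without touching the tree itself.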
Bill Venners: What kind of things would change if the abstract syntax tree is the truth, not text? Would my source be binary? Wouldn't comments need to become first class parts of the language? Right now, comments are just part of the text and they're thrown away.
James Gosling: Oddly enough, one of the most painful things inside our system right now is dealing with comments. Javadoc comments, which people use in very clearly stylized ways, are basically in the grammar. Because of that, Javadoc comments are completely straightforward to deal with. But general comments that people put in random places are an unbelievable pain.
Bill Venners: Because you don't know what to attach them to?
James Gosling: Yes. We do a pretty good job with comments, but some people do very bizarre things with comments. Currently, we don't even try to guarantee perfect fidelity for arbitrarily bizarre comment usage.
Also, we actually don't represent the programs as binary in their persistent form. We represent them as a Java source file. We actually use the Java source file as the way to represent the parse tree.
Bill Venners: If I also edited the Java source by hand, would it break your tool?
James Gosling: No, you can do arbitrary editing and we figure it out. It became clear that any parse tree representation that we would come up with would be almost certainly slower to parse and take more disk space than Java source code. We can derive almost everything we have as annotations from the source code directly. Certainly all the type information is generated by inferencing in the type system. So that's pretty straightforward. We can discover many other things by doing various kinds of pattern matching. And we have also been attaching attributes to methods.
Bill Venners: How do you add attributes?
James Gosling: We're about to start using the 1.5 metadata specification explicitly as our mechanism for attaching persistent information to source code. The traditional way to do that, used for example by Borland JBuilder, is to put the metadata in comments. Often people use Javadoc comments as the way to store their metadata, and that actually worked remarkably well. But now there's a real metadata facility in 1.5.
Bill Venners: What kinds of metadata would you be adding? If I'm visualizing a class in UML, and I click and drag it to a new position, it seems like the positions would need to be attached to the classes themselves.
James Gosling: Yes. Somebody doing a UML editor could certainly attach metadata about the location of the boxes on the screen. Also, in a UML diagram you often distinguish between the important concepts and the fluffy concepts. If you want to make the diagram small, you leave out the fluffy concepts and just represent the important ones. So you could, for example, certainly have a piece of metadata attached to each field that says whether it's fluffy or important.
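Editor's sketch: the metadata facility mentioned above shipped in Java 5 as annotations. The following is a minimal sketch of how a diagram tool might use it, assuming two hypothetical annotations, DiagramPosition and Important, that are not part of Jackpot or of any real UML tool. The Account class and its fields are likewise invented for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DiagramMetadataSketch {

    // Hypothetical annotation: where a diagram editor last drew the class's box.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DiagramPosition {
        int x();
        int y();
    }

    // Hypothetical annotation: marks a field as an important concept.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Important {
    }

    @DiagramPosition(x = 120, y = 40)
    static class Account {
        @Important long balance;
        @Important String owner;
        String lastLoginHost; // unannotated, so treated as "fluffy"
    }

    // Returns the names of the fields marked @Important, sorted for stable output.
    static List<String> importantFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Important.class)) {
                names.add(field.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    // Returns "x,y" for a positioned class, or "unplaced" if no position is stored.
    static String describePosition(Class<?> type) {
        DiagramPosition position = type.getAnnotation(DiagramPosition.class);
        return position == null ? "unplaced" : position.x() + "," + position.y();
    }

    public static void main(String[] args) {
        System.out.println(describePosition(Account.class)); // 120,40
        System.out.println(importantFields(Account.class));  // [balance, owner]
    }
}
```

This sketch reads the annotations reflectively at run time only to keep the example self-contained. A tool that works on source, as described in the interview, would more plausibly read them from the parse tree.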
Bill Venners: How can you visualize and understand the complexity of distributed systems? For example, in enterprise systems today it is often hard to turn something off, because you don't know what's been connected to it over the years. Understanding the complexity of one big application seems like a hard problem, but more manageable than understanding a...
James Gosling: ...sea of things. Boy, there are a whole bunch of PhD theses ready to be had about that topic. It is really hard. For example, in the Web services model, you may publish a service descriptor that says, "Hi, I'm a service. This is what I take. Talk to me." Eventually, something needs to change in that service, or maybe the service has to go away for some reason. If you need to track down the dependencies in a large-scale system where dependencies get established on a completely dynamic and ad hoc basis, there's nothing that's as good as just maintaining a log of who has ever talked to you.
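Editor's sketch: the "log of who has ever talked to you" can be as simple as a per-caller counter kept by the service. The Java below is a minimal in-memory sketch. The class CallerLogSketch, its methods, and the caller names are hypothetical, and it assumes the service can observe some identifier for each caller. A real service would also need to persist the log.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class CallerLogSketch {

    // Caller identifier -> number of requests seen from that caller.
    private final Map<String, AtomicLong> requestCounts = new ConcurrentHashMap<>();

    // Call once per incoming request; callerId is whatever identity the service can observe.
    public void record(String callerId) {
        requestCounts.computeIfAbsent(callerId, id -> new AtomicLong()).incrementAndGet();
    }

    // Everyone who has ever called, sorted for readable reports.
    public Set<String> knownCallers() {
        return new TreeSet<>(requestCounts.keySet());
    }

    // Number of requests recorded for one caller; 0 if it has never called.
    public long requestsFrom(String callerId) {
        AtomicLong count = requestCounts.get(callerId);
        return count == null ? 0 : count.get();
    }

    public static void main(String[] args) {
        CallerLogSketch log = new CallerLogSketch();
        log.record("billing-batch");
        log.record("reporting-ui");
        log.record("billing-batch");
        System.out.println(log.knownCallers());                // [billing-batch, reporting-ui]
        System.out.println(log.requestsFrom("billing-batch")); // 2
    }
}
```

Before changing or retiring the service, the set of known callers gives a starting list of dependents to contact. It cannot reveal callers that never made a request while the log was being kept.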
In some sense this is kind of a hopeless problem, and maybe that's OK. And I say, "maybe that's OK," because it really is a deeply difficult problem. For example, look at URLs, which make the Web work. Hypertext wasn't invented with the Web. Hypertext had been around as a concept for 20 or 30 years. The earliest popular description of this was this book called Computer Lib, written a long time ago by Ted Nelson. That book really was about what you could do with hypertext, and he had this project called project Xanadu that was trying to do that. But they went off and did the usual computer science thing, which is to try to solve all the hard problems and make it perfect.
One of the hard problems is exactly what you were just asking about concerning distributed systems. You've got a reference to a remote resource. What happens if that remote resource moves? Should you keep the backtracking information? How do you keep the backtracking information? Solving that problem is really, really, really hard. Lots of people went running at that brick wall over, and over, and over again, trying to find a way to make these large scale distributed references really work. In the computer science academic world, it was generally considered that an internet link just wasn't of any value unless it could handle resource moving and renaming and issues like that.
In some sense, the brilliant thing that Tim Berners-Lee did was simply to say, "I don't care." For 20 years people had been failing to solve these problems in any large-scale way. Berners-Lee decided to just do the simple obvious thing that solves the problem he needed, namely, getting ahold of a resource. And that's actually an easy problem. Coming up with those names, URLs, is a relatively straightforward thing. He did that, and that enabled a lot of what the Web is today. But the Web has all these problems. What happens if a Web page moves or gets deleted? That is exactly the problem of maintaining or managing the configuration of any large scale distributed system. On the one hand, the URL design has made the Web somewhat fragile. Broken links are all over the place. On the other hand, if they had tried to really solve that problem, the Web never would have happened, because the problem is just too hard.
So philosophically, I really don't know. Dealing with dynamic systems with pieces that come and go is a really hard problem. There are all kinds of specialized solutions for specialized situations, but I've never seen anything like a set of general solutions. In some sense, this particular problem feels like one where unreliability may be a good thing, just because it makes the whole enterprise possible. Maybe people should just get over it.
Come back Monday, November 17 for Part II of a conversation with Ruby's creator Yukihiro (Matz) Matsumoto. I am now staggering the publication of several interviews at once, to give the reader variety. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.
James Gosling's Home Page:
Anders Hejlsberg's comments on checked exceptions are in this interview, "The Trouble with Checked Exceptions":
Bruce Eckel wrote an essay questioning the value of checked exceptions, "Does Java Need Checked Exceptions?"
Bill Venners has interviewed James Gosling each year for the past five years. The four previous interviews are:
James Gosling discusses semantic models, mobile behavior, abstraction versus vagueness, testing, and repetitive stress injury. (February 2002):
James Gosling speaks on inheritance and composition, JSPs and Servlets, community design processes, and more. (May 2001):
James Gosling discusses developer tools, the realtime JVM, mobile objects, strict interfaces, and more. (May 2000):
James Gosling speaks on interfaces and protocols, servers and services, Jini, Java in the enterprise, and more. (May 1999):