Java's Creator James Gosling talks with Bill Venners about the current state of the Java language.
On Tuesday, February 19, 2002, I interviewed Sun Microsystems Vice President and Fellow James Gosling at his Sun Labs office in Mountain View, California. JavaWorld has published Part I of this interview. Since the interview was so full of interesting bits of James's wisdom, I requested and JavaWorld was kind enough to let me publish the rest here.
When James Gosling is not out preaching the virtues of Java technology, he spends his days in a quiet corner office at Sun Labs, dreaming up new ways to help programmers manage complexity. Bill Venners recently visited Gosling in his office for his annual JavaWorld interview. With his Borg mask (donned at a prior JavaOne keynote) staring down from a shelf above him, Gosling discussed semantic models, mobile behavior, abstraction versus vagueness, the importance of testing, and repetitive stress injury.
Bill Venners: Someone once asked you what innovation felt like, and you said, "Well, it's not like a light bulb going off. It's more like there's something irritating you and you fix it until it goes away." What is irritating you these days? What things are you trying to make go away?
James Gosling: Lately, I've actually been spending about half of my time on, sadly, being corporate spokesperson. In the time left over, during which I get to do my actual work, I've been creating software development tools based on semantic software models rather than textual representations.
What can you do with a piece of software when it's represented as a tree? There's a long history of trying to do this. People often try to build structure editors that way, but tend to fail for a variety of reasons. But some things work nicely in a structural representation, which is how you build a semantic model. So I've been fussing with what you can do with structural representations of programs as opposed to textual representations.
Bill Venners: What is a semantic model, and why do you like to manipulate them?
James Gosling: The usual program representation that people manipulate is just text. What the program has to play with is the letters, such as '(', and so on. That's what the data structure looks like: just a series of letters left to right, top to bottom. If you want to get more information, you must extract it from that series, which can be very difficult.
When people talk about a syntax tree for a program, it looks a lot like a standard binary tree that people would learn about in school. But the nodes all have labels on them. There are things like, "This is a plus node. In the left and right shoulder are the operands of the addition." Then you start labeling the tree with more information: "This is a symbol. It came from that declaration. This addition node has this type because the operands have these types." Once you generate this tree representation, you then annotate it with all the information about types, variable declarations, variable lifetimes, method bindings, and the rest. That's generally called a semantic model, where you essentially have complete resolved information about the program's structure.
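As a rough illustration of the annotated tree described above, here is a minimal Java sketch. It is not Gosling's tool: the class names (`Node`, `Literal`, `Symbol`, `Plus`, `Resolver`) and the two-type rule for addition are assumptions made up for the example. It shows a plus node whose type is derived from its operands, and a symbol annotated with the declaration it came from.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node types for a tiny expression language (not a real compiler API).
abstract class Node {
    String type; // annotation filled in by the resolver: "int" or "double"
}

class Literal extends Node {
    final Object value;
    Literal(Object value) { this.value = value; }
}

class Symbol extends Node {
    final String name;
    String declaredAt; // annotation: the declaration this symbol came from
    Symbol(String name) { this.name = name; }
}

class Plus extends Node {
    final Node left;
    final Node right;
    Plus(Node left, Node right) { this.left = left; this.right = right; }
}

class Resolver {
    // Maps a variable name to {declared type, declaration site}.
    private final Map<String, String[]> declarations = new HashMap<>();

    void declare(String name, String type, String site) {
        declarations.put(name, new String[] { type, site });
    }

    // Walks the tree bottom-up and annotates every node with its type.
    String resolve(Node n) {
        if (n instanceof Literal) {
            // Simplifying assumption: any non-Integer literal counts as double.
            n.type = (((Literal) n).value instanceof Integer) ? "int" : "double";
        } else if (n instanceof Symbol) {
            Symbol s = (Symbol) n;
            String[] decl = declarations.get(s.name);
            if (decl == null) {
                throw new IllegalStateException("undeclared: " + s.name);
            }
            s.type = decl[0];
            s.declaredAt = decl[1];
        } else if (n instanceof Plus) {
            Plus p = (Plus) n;
            String l = resolve(p.left);
            String r = resolve(p.right);
            // The addition node has its type because the operands have theirs.
            p.type = (l.equals("double") || r.equals("double")) ? "double" : "int";
        }
        return n.type;
    }
}

class SemanticModelDemo {
    public static void main(String[] args) {
        Resolver resolver = new Resolver();
        resolver.declare("x", "double", "line 3");
        Symbol x = new Symbol("x");
        Plus sum = new Plus(x, new Literal(1)); // x + 1
        resolver.resolve(sum);
        // prints "double, x declared at line 3"
        System.out.println(sum.type + ", x declared at " + x.declaredAt);
    }
}
```

Once the tree carries these annotations, questions such as "what type is this expression?" or "where was this name declared?" become field lookups rather than text searches.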
Bill Venners: Semantics means meaning. Are you perhaps trying to capture more of the programmer's intent than current compilers?
James Gosling: There's been a lot of debate about what it means to represent a programmer's intent. Generally, representations of that intent turn into some kind of mathematics, something that often gets sent into a verifier. Nobody has discovered a way to express high-level intent that is much better than modern programming languages. The mathematical preconditions and post-conditions tend to be relatively similar; their big advantage is often that you do things twice. If you're going to make a mistake, you often make mistakes in two different ways, so you can compare.
People have come up with techniques for doing certain kinds of semantic modeling, like flow charts and finite state machines. Some of these techniques have software representations; mostly they have significant limitations. Things like finite state machines are pretty good for representing things like business processes. Many systems that do business process modeling use finite state machines. They all have the problem of hiding a lot of complexity because they're not full-blown Turing equivalent languages. There are always classes of problems that you cannot solve with them. So, it tends to always devolve down to you have to use a general purpose programming language. How do you make a general purpose programming language more comprehensible, especially when the systems you're trying to manipulate are very, very large?
A lot of the stuff in Java is designed around building large systems, and people have built multimillion line systems with it. But still, given the state of the art in tools, these systems become very difficult to manipulate.
Bill Venners: I see. Is the point of using a semantic model then to make it easier to make changes to those large systems?
James Gosling: To analyze them, to understand them, to make changes to them. A wide variety of things are a lot easier when you have that kind of representation. There's one school of thought called refactoring. The refactoring camp has developed a lot of transformations. They're often simple things, like renaming a class. But if you have a system with a million lines of code and you want to rename one class, it becomes extraordinarily difficult. You have to find all the places that use that class, and only those that just use that class, not something whose name happens to look a lot like that class. It's similar if you try to move methods around.
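The renaming problem can be sketched in a few lines of Java. This is only an illustration under assumed names (`ClassDecl`, `TypeRef`, and the `Order`/`OrderList` classes are hypothetical): a textual search-and-replace also hits a look-alike name, while a model in which each reference points at its resolved declaration renames exactly the right uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: every reference holds the declaration it resolved to.
class ClassDecl {
    String name;
    ClassDecl(String name) { this.name = name; }
}

class TypeRef {
    final ClassDecl target; // a resolved binding, not just a spelling
    TypeRef(ClassDecl target) { this.target = target; }
    String render() { return target.name; }
}

class RenameDemo {
    public static void main(String[] args) {
        ClassDecl order = new ClassDecl("Order");
        ClassDecl orderList = new ClassDecl("OrderList");
        List<TypeRef> refs = new ArrayList<>();
        refs.add(new TypeRef(order));
        refs.add(new TypeRef(orderList));
        refs.add(new TypeRef(order));

        // Textual rename: the look-alike name is mangled too.
        String text = "Order a; OrderList b; Order c;";
        // prints "Purchase a; PurchaseList b; Purchase c;"
        System.out.println(text.replace("Order", "Purchase"));

        // Structural rename: change the declaration once; references follow.
        order.name = "Purchase";
        StringBuilder sb = new StringBuilder();
        for (TypeRef r : refs) {
            sb.append(r.render()).append(' ');
        }
        // prints "Purchase OrderList Purchase"
        System.out.println(sb.toString().trim());
    }
}
```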
The refactoring school is a lot about what happens when you're trying to reorganize. This can end up being surprisingly important given the way systems evolve. Systems are almost never really architected in their entirety. What happens is you start with some system design that you architect, and then it grows. Most nice, clean architectures turn into some horrible nightmare given just a few years of growth. Refactoring is all about how you rearrange things and keep them tidy.
But you can do lots of other things with structural representations, like finding places where certain idioms are used and rewriting them, finding places where certain kinds of errors exist. You can do static flow analysis. Say you have an API that has a lock and an unlock method. You can say, "Find any path where I lock this data structure, but then I don't unlock it. Find all the if statements that can be turned into conditional expressions. Factor out this variable from that loop." You can do all kinds of interesting transformations that are very difficult in text, but very easy in a structural model.
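The lock/unlock query can be sketched over a toy statement tree. Everything here is an assumption for illustration: the statement classes, the `LockChecker` name, and the simplification that a program consists only of lock, unlock, return, and if statements. The checker tracks the set of possible "lock held?" states along every path and reports a leak if any path returns, or reaches the end, while the lock may still be held.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical statement tree for a toy language; not any real compiler's API.
abstract class Stmt {}
class LockStmt extends Stmt {}
class UnlockStmt extends Stmt {}
class ReturnStmt extends Stmt {}
class IfStmt extends Stmt {
    final List<Stmt> thenBranch;
    final List<Stmt> elseBranch;
    IfStmt(List<Stmt> thenBranch, List<Stmt> elseBranch) {
        this.thenBranch = thenBranch;
        this.elseBranch = elseBranch;
    }
}

class LockChecker {
    private boolean leakAtReturn;

    // True if some path returns, or falls off the end, with the lock held.
    boolean hasLeak(List<Stmt> body) {
        leakAtReturn = false;
        Set<Boolean> atEnd = run(body, only(false));
        return leakAtReturn || atEnd.contains(true);
    }

    private static Set<Boolean> only(boolean held) {
        Set<Boolean> s = new HashSet<>();
        s.add(held);
        return s;
    }

    // Returns the possible "lock held?" states after the block, given those before it.
    private Set<Boolean> run(List<Stmt> block, Set<Boolean> before) {
        Set<Boolean> states = new HashSet<>(before);
        for (Stmt s : block) {
            if (states.isEmpty()) {
                break; // every path has already returned
            }
            if (s instanceof LockStmt) {
                states = only(true);
            } else if (s instanceof UnlockStmt) {
                states = only(false);
            } else if (s instanceof ReturnStmt) {
                if (states.contains(true)) {
                    leakAtReturn = true;
                }
                states = new HashSet<>();
            } else if (s instanceof IfStmt) {
                IfStmt branch = (IfStmt) s;
                Set<Boolean> merged = run(branch.thenBranch, states);
                merged.addAll(run(branch.elseBranch, states));
                states = merged;
            }
        }
        return states;
    }
}

class LockCheckDemo {
    // lock(); if (...) { return; } unlock();   -> the early return leaks the lock
    static List<Stmt> leaky() {
        return Arrays.<Stmt>asList(
            new LockStmt(),
            new IfStmt(Arrays.<Stmt>asList(new ReturnStmt()), Collections.<Stmt>emptyList()),
            new UnlockStmt());
    }

    // lock(); if (...) { unlock(); return; } unlock();   -> every path unlocks
    static List<Stmt> safe() {
        return Arrays.<Stmt>asList(
            new LockStmt(),
            new IfStmt(Arrays.<Stmt>asList(new UnlockStmt(), new ReturnStmt()),
                       Collections.<Stmt>emptyList()),
            new UnlockStmt());
    }

    public static void main(String[] args) {
        LockChecker checker = new LockChecker();
        System.out.println(checker.hasLeak(leaky())); // prints "true"
        System.out.println(checker.hasLeak(safe()));  // prints "false"
    }
}
```

A real analysis would also have to handle loops, exceptions, and multiple locks. The point of the sketch is only that the query is a short tree walk once the program is a structure, whereas expressing it against raw text would be far harder.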
Bill Venners: If I were to use your tool someday, would I build and use a semantic model of my program as I'm typing code, rather than the usual edit-compile-edit-compile cycle?
James Gosling: Yes, though in my current experimental test bed, you don't type in text.
Bill Venners: You just think?
James Gosling: It sort of slurps in your program and then you say, "Apply this transformation, apply that transformation." Right now I'm not trying to replace regular editors. I may actually do a little plug-in that lets you edit text in it. But right now, this is an experimental test bench. It's not something anybody would actually want to use.