Java's Creator James Gosling talks with Bill Venners about the current state of the Java language.
On Tuesday, February 19, 2002, I interviewed Sun Microsystems Vice President and Fellow James Gosling at his Sun Labs office in Mountain View, California. JavaWorld has published Part I of this interview. Since the interview was so full of interesting bits of James's wisdom, I requested and JavaWorld was kind enough to let me publish the rest here.
When James Gosling is not out preaching the virtues of Java technology, he spends his days in a quiet corner office at Sun Labs, dreaming up new ways to help programmers manage complexity. Bill Venners recently visited Gosling in his office for his annual JavaWorld interview. With his Borg mask (donned at a prior JavaOne keynote) staring down from a shelf above him, Gosling discussed semantic models, mobile behavior, abstraction versus vagueness, the importance of testing, and repetitive stress injury.
Bill Venners: Someone once asked you what innovation felt like, and you said, "Well, it's not like a light bulb going off. It's more like there's something irritating you and you fix it until it goes away." What is irritating you these days? What things are you trying to make go away?
James Gosling: Lately, I've actually been spending about half of my time on, sadly, being corporate spokesperson. In the time left over, during which I get to do my actual work, I've been creating software development tools based on semantic software models rather than textual representations.
What can you do with a piece of software when it's represented as a tree? There's a long history of trying to do this. People often try to build structure editors that way, but tend to fail for a variety of reasons. But some things work nicely in a structural representation, which is how you build a semantic model. So I've been fussing with what you can do with structural representations of programs as opposed to textual representations.
Bill Venners: What is a semantic model, and why do you like to manipulate them?
James Gosling: The usual program representation that people manipulate is just text. What the program has to play with is the letters, such as '(', and so on. That's what the data structure looks like: just a series of letters, left to right, top to bottom. If you want to get more information, you must extract it from that series, which can be very difficult.
When people talk about a syntax tree for a program, it looks a lot like a standard binary tree that people would learn about in school. But the nodes all have labels on them. There are things like, "This is a plus node. In the left and right shoulder are the operands of the addition." Then you start labeling the tree with more information: "This is a symbol. It came from that declaration. This addition node has this type because the operands have these types." Once you generate this tree representation, you then annotate it with all the information about types, variable declarations, variable lifetimes, method bindings, and the rest. That's generally called a semantic model, where you essentially have complete resolved information about the program's structure.
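To make the idea of an annotated tree concrete, here is a minimal Java sketch under heavily simplified assumptions. The node classes (Node, IntLiteral, SymbolNode, PlusNode), the Declaration class, and the Annotator pass are hypothetical names invented for illustration; they are not taken from Gosling's tool or from any real compiler API, and the typing rule for + is deliberately simplified. The point is only that, after annotation, each node carries resolved facts (a type and, for symbols, a link back to the declaration) instead of just spelling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical node types for a labeled syntax tree.
abstract class Node {
    String type; // filled in by the annotation pass; null until then
}

// A variable declaration: the thing a symbol node points back to.
final class Declaration {
    final String name;
    final String declaredType;

    Declaration(String name, String declaredType) {
        this.name = name;
        this.declaredType = declaredType;
    }
}

final class IntLiteral extends Node {
    final int value;
    IntLiteral(int value) { this.value = value; }
}

final class SymbolNode extends Node {
    final String name;
    Declaration binding; // "This is a symbol. It came from that declaration."
    SymbolNode(String name) { this.name = name; }
}

final class PlusNode extends Node {
    final Node left, right; // the operands of the addition
    PlusNode(Node left, Node right) {
        this.left = left;
        this.right = right;
    }
}

// Annotation pass: binds symbols to declarations and computes types bottom-up.
final class Annotator {
    private final Map<String, Declaration> scope = new HashMap<>();

    void declare(Declaration d) { scope.put(d.name, d); }

    String annotate(Node n) {
        if (n instanceof IntLiteral) {
            n.type = "int";
        } else if (n instanceof SymbolNode) {
            SymbolNode s = (SymbolNode) n;
            s.binding = scope.get(s.name);
            if (s.binding == null) {
                throw new IllegalStateException("undeclared symbol: " + s.name);
            }
            s.type = s.binding.declaredType;
        } else if (n instanceof PlusNode) {
            PlusNode p = (PlusNode) n;
            String lt = annotate(p.left);
            String rt = annotate(p.right);
            // Simplified rule: String if either side is, else double if either side is, else int.
            if (lt.equals("String") || rt.equals("String")) p.type = "String";
            else if (lt.equals("double") || rt.equals("double")) p.type = "double";
            else p.type = "int";
        }
        return n.type;
    }
}
```

Once the tree carries this information, questions like "what type is x + 1?" or "which declaration does this x refer to?" become field lookups rather than a fresh analysis of the text.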
Bill Venners: Semantics means meaning. Are you perhaps trying to capture more of the programmer's intent than current compilers?
James Gosling: There's been a lot of debate about what it means to represent a programmer's intent. Generally, representations of that intent turn into some kind of mathematics, something that often gets sent into a verifier. Nobody has discovered a way to express high-level intent that is much better than modern programming languages. The mathematical preconditions and post-conditions tend to be relatively similar; their big advantage is often that you do things twice. If you're going to make a mistake, you often make mistakes in two different ways, so you can compare.
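As a small, hypothetical illustration of "doing things twice," the sketch below computes an integer square root one way (a binary search) and then checks the result against an independent restatement of the intent, a postcondition. CheckedSqrt and isqrt are illustrative names, and this is a hand-written runtime check, not the kind of mathematical specification a verifier would consume.

```java
// Hypothetical example: compute floor(sqrt(n)), then check the answer against
// a separate statement of what "integer square root" means.
final class CheckedSqrt {
    static int isqrt(int n) {
        // Precondition: only defined for non-negative inputs.
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative: " + n);
        }
        // The "how": binary search for the largest r with r*r <= n.
        // Invariant: lo*lo <= n < hi*hi (46341^2 exceeds Integer.MAX_VALUE).
        long lo = 0, hi = 46341;
        while (hi - lo > 1) {
            long mid = (lo + hi) >>> 1;
            if (mid * mid <= n) lo = mid; else hi = mid;
        }
        long r = lo;
        // The "what", stated independently as a postcondition: r*r <= n < (r+1)*(r+1).
        if (!(r * r <= n && n < (r + 1) * (r + 1))) {
            throw new AssertionError("postcondition failed for n=" + n + ", r=" + r);
        }
        return (int) r;
    }
}
```

If someone later botches the search bounds, the two descriptions disagree and the postcondition reports it, which is the kind of cross-check Gosling describes.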
People have come up with techniques for doing certain kinds of semantic modeling, like flow charts and finite state machines. Some of these techniques have software representations; mostly they have significant limitations. Finite state machines, for instance, are pretty good for representing business processes, and many systems that do business process modeling use them. They all have the problem of hiding a lot of complexity because they're not full-blown, Turing-equivalent languages. There are always classes of problems that you cannot solve with them. So it almost always comes down to having to use a general-purpose programming language. How do you make a general-purpose programming language more comprehensible, especially when the systems you're trying to manipulate are very, very large?
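A minimal sketch of the kind of finite state machine Gosling mentions for business processes, assuming a made-up order-fulfillment workflow (the OrderState values and the Order class are illustrative, not from any real system). Each state lists its legal successors, and any other transition is rejected.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical order-fulfillment process modeled as a finite state machine.
enum OrderState {
    PLACED, PAID, SHIPPED, DELIVERED, CANCELLED;

    // Legal next states; anything not listed is rejected.
    Set<OrderState> next() {
        switch (this) {
            case PLACED:  return EnumSet.of(PAID, CANCELLED);
            case PAID:    return EnumSet.of(SHIPPED, CANCELLED);
            case SHIPPED: return EnumSet.of(DELIVERED);
            default:      return EnumSet.noneOf(OrderState.class); // terminal states
        }
    }
}

final class Order {
    private OrderState state = OrderState.PLACED;

    OrderState state() { return state; }

    void moveTo(OrderState target) {
        if (!state.next().contains(target)) {
            throw new IllegalStateException(state + " -> " + target + " is not allowed");
        }
        state = target;
    }
}
```

The limitation shows up as soon as a rule depends on accumulated data, for example "allow partial refunds until the refunded amount reaches the amount paid": the running total is not one of the states, so the logic ends up back in general-purpose code.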
A lot of the stuff in Java is designed around building large systems, and people have built multimillion line systems with it. But still, given the state of the art in tools, these systems become very difficult to manipulate.
Bill Venners: I see. Is the point of using a semantic model then to make it easier to make changes to those large systems?
James Gosling: To analyze them, to understand them, to make changes to them. A wide variety of things are a lot easier when you have that kind of representation. There's one school of thought called refactoring. The refactoring camp has developed a lot of transformations. They're often simple things, like renaming a class. But if you have a system with a million lines of code and you want to rename one class, it becomes extraordinarily difficult. You have to find all the places that use that class, and only the places that use that class, not something whose name happens to look a lot like that class. The same is true if you try to move methods around.
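The difference between textual and semantic renaming can be sketched in a few lines of Java. In this hypothetical model (Token, ClassDecl, and RenameDemo are illustrative names, not a real refactoring API), each identifier token has already been resolved to the declaration it refers to, so a rename follows bindings by object identity instead of matching strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class declaration; two declarations are "the same" only if they are the same object.
final class ClassDecl {
    final String name;
    ClassDecl(String name) { this.name = name; }
}

// One token of source text; identifier tokens carry the declaration they resolve to.
final class Token {
    final String text;
    final ClassDecl binding; // null for keywords, punctuation, whitespace, etc.

    Token(String text, ClassDecl binding) {
        this.text = text;
        this.binding = binding;
    }
}

final class RenameDemo {
    static String render(List<Token> program) {
        StringBuilder sb = new StringBuilder();
        for (Token t : program) sb.append(t.text);
        return sb.toString();
    }

    // Text-based rename: rewrites every matching substring, related or not.
    static String textualRename(List<Token> program, String from, String to) {
        return render(program).replace(from, to);
    }

    // Semantic rename: rewrites exactly the tokens bound to the target declaration.
    static List<Token> semanticRename(List<Token> program, ClassDecl target, String newName) {
        List<Token> out = new ArrayList<>();
        for (Token t : program) {
            out.add(t.binding == target ? new Token(newName, target) : t);
        }
        return out;
    }
}
```

Renaming Order to Invoice with a plain text replace also turns OrderItem into InvoiceItem; the binding-based rename leaves it alone because that token is bound to a different declaration.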
The refactoring school is a lot about what happens when you're trying to reorganize. This can end up being surprisingly important given the way systems evolve. Systems are almost never really architected in their entirety. What happens is you start with some system design that you architect, and then it grows. Most nice, clean architectures turn into some horrible nightmare given just a few years of growth. Refactoring is all about how you rearrange things and keep them tidy.
But you can do lots of other things with structural representations, like finding places where certain idioms are used and rewriting them, or finding places where certain kinds of errors exist. You can do static flow analysis. Say you have an API that has a lock and an unlock method. You can say, "Find any path where I lock this data structure, but then I don't unlock it. Find all the if statements that can be turned into conditional expressions. Factor out this variable from that loop." You can do all kinds of interesting transformations that are very difficult in text, but very easy in a structural model.
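The "find any path where I lock but don't unlock" query can be sketched as a depth-first search over a control-flow graph. This is a minimal illustration under stated assumptions: the Block, Op, and LockChecker names are hypothetical, there is a single lock, each block performs at most one lock-related operation, and block labels are unique within a graph. Tracking (block, lock-held) pairs keeps the search finite even when the graph has loops.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// What a basic block does to the (single, hypothetical) lock.
enum Op { LOCK, UNLOCK, OTHER, EXIT }

final class Block {
    final String label;
    final Op op;
    final List<Block> successors = new ArrayList<>();

    Block(String label, Op op) {
        this.label = label;
        this.op = op;
    }

    // Adds control-flow edges; returns this for chaining.
    Block to(Block... next) {
        successors.addAll(Arrays.asList(next));
        return this;
    }
}

final class LockChecker {
    // Returns the labels along one entry-to-EXIT path that ends with the lock
    // still held, or an empty list if every path releases it.
    static List<String> findLeak(Block entry) {
        Deque<String> path = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        if (dfs(entry, false, path, visited)) {
            return new ArrayList<>(path);
        }
        return Collections.emptyList();
    }

    private static boolean dfs(Block b, boolean heldOnEntry, Deque<String> path, Set<String> visited) {
        // Explore each (block, lock state) pair once, so loops cannot recurse forever.
        if (!visited.add(b.label + "/" + heldOnEntry)) {
            return false;
        }
        boolean held = b.op == Op.LOCK || (heldOnEntry && b.op != Op.UNLOCK);
        path.addLast(b.label);
        if (b.op == Op.EXIT && held) {
            return true; // reached an exit while still holding the lock
        }
        for (Block next : b.successors) {
            if (dfs(next, held, path, visited)) {
                return true;
            }
        }
        path.removeLast();
        return false;
    }
}
```

A typical finding is an early-return error branch: the lock is taken, a check fails, and the method exits without reaching the unlock.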
Bill Venners: If I were to use your tool someday, would I build and use a semantic model of my program as I'm typing code, rather than the usual edit-compile-edit-compile cycle?
James Gosling: Yes, though in my current experimental test bed, you don't type in text.
Bill Venners: You just think?
James Gosling: It sort of slurps in your program and then you say, "Apply this transformation, apply that transformation." Right now I'm not trying to replace regular editors. I may actually do a little plug-in that lets you edit text in it. But right now, this is an experimental test bench. It's not something anybody would actually want to use.
Bill Venners: When I first started learning about Java years back, the thing I found coolest about it was that you could send behavior across networks. And yet, so far, that aspect of Java doesn't seem to really be used that much. Do you agree?
James Gosling: The answer is yes and no. People often use it more than they think they're using it, or they're using it in different ways. That certainly happens all the time with cell phones. Software is dynamically loaded into cell phones all the time.
Bill Venners: Do you mean midlets?
James Gosling: Yes, they're the little software bundles called midlets. They're dynamically loaded into cell phones all the time. You can explicitly load them, but often they just sort of come. They get wrapped in their little secure sandbox where they can play. But the facilities actually show up in many different places. The average large-scale Java application these days has a very modular architecture, where a spine is the central architecture, and then all these modules plug in. They tend to plug in dynamically at runtime, so often there is some kind of configuration file that will say, "Okay, plug in all of these pieces."
If you look at the way app servers are built, that's all they do. There's the basic core of the app server. But the stuff you think of as your application is dynamically loaded after the app server starts up.
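A minimal sketch of the "configuration file says plug in all of these pieces" pattern, assuming a hypothetical Plugin interface and a Properties-format configuration whose plugins key lists fully qualified class names. The spine only knows the interface; the concrete classes are looked up by name and instantiated with reflection at runtime. Real app servers use far more elaborate machinery (separate class loaders, lifecycle management), so treat this as an outline of the idea, not a description of how any particular server works.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// The only type the application "spine" is compiled against (hypothetical).
interface Plugin {
    String name();
}

// Two example modules; in a real system these would be deployed separately.
final class GreetingPlugin implements Plugin {
    public String name() { return "greeting"; }
}

final class AuditPlugin implements Plugin {
    public String name() { return "audit"; }
}

final class PluginLoader {
    // Reads the comma-separated class names under the "plugins" key and
    // instantiates each one reflectively, in the order listed.
    static List<Plugin> load(String config) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(config));
        } catch (IOException e) {
            throw new IllegalStateException("unreadable plug-in configuration", e);
        }
        List<Plugin> plugins = new ArrayList<>();
        for (String entry : props.getProperty("plugins", "").split(",")) {
            String className = entry.trim();
            if (className.isEmpty()) continue;
            try {
                Class<? extends Plugin> cls = Class.forName(className).asSubclass(Plugin.class);
                plugins.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot load plug-in " + className, e);
            }
        }
        return plugins;
    }
}
```

The JDK also ships java.util.ServiceLoader, which discovers implementations listed under META-INF/services and is the more standard way to do this kind of discovery.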
Bill Venners: But they're not usually loaded across a network.
James Gosling: How the bits are delivered is another question, but the basic mechanism for this on-the-fly, dynamic construction of things has been used heavily. The over-the-network part got messed up with the bizarre legal maneuverings that happened around applets. Within corporations, applets seem to be doing surprisingly well, even on the greater Internet. But we see them mostly as games. They've definitely taken a whack over the legal nightmare that has been everybody's relationship with Microsoft, yet they're still doing pretty well.
Bill Venners: In a previous interview, I asked you how abstract we should make contracts when we're designing the APIs. You said, "...as abstract as possible, because every commitment you make is a piece of flexibility that you've lost." Recently I encountered a problem involving a method called getBoolean in java.sql.ResultSet, which just retrieves a boolean value from a column. This method was being used in an API that I had purchased, and there was a bug. The code was calling getBoolean on an integer field in a PostgreSQL database, in which 0 was stored to mean false and 1 to mean true. The problem was that getBoolean was always returning false even if the value in the database was 1.
If you look at the getBoolean contract, it just says it gets the column's value as a Boolean. It doesn't say what to do if the database field is not actually a boolean, or if it is an integer field. The contract is abstract, but it's also vague. So I'm not sure where the bug is. It's probably in the driver. On the other hand, it may be a valid implementation to say, "Well, if it's not a boolean field because I don't have that in my database, I'll return false." The getBoolean contract doesn't disallow that interpretation.
James Gosling: I would guess that one was a bug in the driver, because that's a situation where it should have been tossing an exception.
Bill Venners: The getBoolean contract does say it should throw an SQLException if there's trouble accessing the database. In this case, though, there was no trouble accessing the database. The method could throw a runtime exception, but should it, and which one? The getBoolean method in the PostgreSQL driver returns a boolean value. It just happens to always be false when the database field type is integer.
Is there a difference between being abstract and being vague? Should you be abstract, but not vague? In this case, I don't think it makes sense to interpret the contract as saying it's OK to return false all the time. But the contract doesn't forbid it.
James Gosling: Yes. This is one place where it is the art of computer programming. You need to specify as much as necessary for people to be able to use it correctly, but you don't want to over-specify things. It's very hard to do complete specifications. Almost nobody actually does specifications that are close to accurate. The only one in the Java world that I think even comes close to rigorous completeness is the Java language spec, and that's probably because most of the words for the current edition came from Guy Steele and Gilad Bracha, who are well-known totally anal freaks.
It takes a very special mindset to write specifications, and even so, Guy still gets upset. He's always finding hidden vagueness. So, in some sense, vagueness is inescapable. We're human, and you always have to interpret anybody's documentation through a reasonable-person filter.
Bill Venners: You mean, I have to be a reasonable person when I interpret the contract?
James Gosling: You have to say, "What would the reasonable, correct interpretation for something be?" Invoking getBoolean on a field that happens to be an integer has to be wrong. Exactly how the system should respond to that is another question. People who actually implement it might have gone one way or another. But returning false...
Bill Venners: ...is probably a bug.
James Gosling: I think anybody would agree it's the wrong thing to do.
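One defensive option for Bill's situation, sketched here under assumptions: if a schema stores flags in an integer column as 0 or 1, the caller can read the column with getInt and convert explicitly, failing loudly on anything else instead of silently producing false. The StrictColumns class, its method names, and the choice of IllegalArgumentException for unexpected values are illustrative decisions, not anything the JDBC contract or the PostgreSQL driver prescribes; ResultSet.getInt and ResultSet.wasNull are standard JDBC methods.

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class StrictColumns {
    // Reads an integer column that encodes a flag as 0 or 1 (a hypothetical schema convention).
    static boolean readFlag(ResultSet rs, String column) throws SQLException {
        int raw = rs.getInt(column);
        if (rs.wasNull()) {
            throw new SQLException("column " + column + " is NULL, expected 0 or 1");
        }
        return toFlag(raw, column);
    }

    // The conversion itself: anything other than 0 or 1 is an error, never a silent false.
    static boolean toFlag(int raw, String column) {
        switch (raw) {
            case 0: return false;
            case 1: return true;
            default:
                throw new IllegalArgumentException(
                        "column " + column + " holds " + raw + ", expected 0 or 1");
        }
    }
}
```

Keeping the conversion in its own method makes the rule testable without a database connection.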
On the other hand, you can also over-specify in subtle ways. For example, if the contract said, "getBoolean returns a new Boolean object that tells you whether the result is true or false," that has a piece of over-specification in it that can be damaging. It says it returns a Boolean object. There are really only two values for Booleans in the world, true and false. So you can just have two Booleans in the entire universe, and you return a reference to one or the other. But all too often, because you have the word "new" there, you actually have to construct a new one. So, you have thousands or millions of instances of things that are all true.
Bill Venners: Or in my case, all false.
James Gosling: And that is an issue because the objects, besides having a value, also have an identity. So when you say, "This returns a new Boolean," you're also promising that the Boolean object has an identity distinct from any other identity. And that forces you into an implementation that consumes much more memory than necessary.
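The memory cost Gosling describes can be made visible by counting object identities. In this sketch, FreshFlag is a hypothetical stand-in for a value returned under a "returns a new object" contract, while Boolean.valueOf is documented to return Boolean.TRUE or Boolean.FALSE. An IdentityHashMap-backed set counts distinct identities rather than distinct values.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.IntFunction;

final class IdentityCost {
    // Hypothetical stand-in for a value returned under a "returns a new object" contract.
    static final class FreshFlag {
        final boolean value;
        FreshFlag(boolean value) { this.value = value; }
    }

    static FreshFlag newFlag(boolean v) { return new FreshFlag(v); }   // must allocate every time
    static Boolean sharedFlag(boolean v) { return Boolean.valueOf(v); } // Boolean.TRUE or Boolean.FALSE

    // Makes n values (alternating true/false) and counts distinct object identities;
    // the set keeps every object reachable, so no identity can be reused.
    static int distinctIdentities(int n, IntFunction<Object> make) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < n; i++) seen.add(make.apply(i));
        return seen.size();
    }

    static int distinctFresh(int n) { return distinctIdentities(n, i -> newFlag(i % 2 == 0)); }

    static int distinctShared(int n) { return distinctIdentities(n, i -> sharedFlag(i % 2 == 0)); }
}
```

With 1,000 calls, the "new object" version produces 1,000 distinct identities, while the shared version never produces more than two.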
Bill Venners: That makes sense. You have to find the right place where you're being as abstract as possible, but no more abstract than is appropriate.
James Gosling: It's like this old Einstein quote, "Everything should be as simple as possible, but no simpler."
Bill Venners: What do you see as the role of unit tests, conformance tests, and any other test in software construction?
James Gosling: Tests are something that people have to take seriously. I would love for the state of the art in theorem-proving testing to be better than it is. But the techniques that actually work sort of combine unit testing and clean interfaces, where you actually describe how things interact properly. That increases the probability that if all the components pass their unit testing, you can put them together and the whole thing actually works. But you still have to test the whole thing, no matter how much unit testing and careful design you've done. I don't think anybody tests enough of anything. But that's sort of a truism.
In the J2SE (Java 2 Platform, Standard Edition) world, we put an immense amount of effort into testing. We have these huge test suites, tens of thousands of test programs that we run, large applications and small test programs. It consumes an immense amount of energy. Given the way people depend on Java these days, we absolutely have to do it.
One thing that makes testing difficult is that it's kind of boring. Most people think of it that way, but at the same time, it can be intellectually very difficult.
Bill Venners: How's that?
James Gosling: Trying to deal with subtle interactions between pieces that are miles apart. One of the things that goes on in Java is trying to minimize the places where that kind of interaction happens, so that it's easier to test. But lots of things are intrinsically difficult to test, like floating point arithmetic. You could still get a Ph.D. thesis for finding a good way to test, say, the sine function or the cosine function. It's amazing how subtle some of these things are, even though they seem simple.
Bill Venners: I think it's often hard to think of tests. In fact, sometimes it's impossible to test things, because they have to be true everywhere. It's like you can prove a theory wrong, but you can't prove it right.
James Gosling: Right. As for sine and cosine, a lot of these functions have a general property that's usually called monotonicity: over some interval, the function always increases (or always decreases). Sine always goes up and then always down, and then always up and then always down. Given the vagaries of floating point arithmetic and rounding, if you aren't careful, the function will sometimes bump up and down, because of rounding. You can get interesting shimmies at a really microscopic level in the curve.
Nobody actually has a great way to test for monotonicity in these functions, other than essentially enumerating all possible values. But given 64-bit floating point, there isn't enough time in the universe to do that enumeration. So, people rely on constructing their algorithm so that they have, usually not a proof, but validation that the curve will be as smooth as possible.
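Since exhaustive enumeration is out of reach, one practical compromise is to spot-check small windows of consecutive doubles. The sketch below (MonotonicSpotCheck is an illustrative name) walks forward from a starting point with Math.nextUp and counts places where the function's output goes down. For java.lang.Math.sin, the Javadoc requires results within 1 ulp of the exact value and semi-monotonic, so on an interval where sine increases, such as [0, pi/2), the count should be zero. A clean window is evidence, not proof.

```java
import java.util.function.DoubleUnaryOperator;

final class MonotonicSpotCheck {
    // Steps through `steps` consecutive doubles after `start` and counts the places
    // where f decreases. Only meaningful on an interval where f should be non-decreasing.
    static int countDecreases(DoubleUnaryOperator f, double start, int steps) {
        int violations = 0;
        double x = start;
        double prev = f.applyAsDouble(x);
        for (int i = 0; i < steps; i++) {
            x = Math.nextUp(x); // the very next representable double
            double cur = f.applyAsDouble(x);
            if (cur < prev) violations++;
            prev = cur;
        }
        return violations;
    }
}
```

For example, MonotonicSpotCheck.countDecreases(Math::sin, 1.0, 100_000) checks 100,000 adjacent inputs just above 1.0, a vanishingly small slice of the roughly 2^64 possible doubles.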
Bill Venners: Now and then, my hands hurt. I try to pay attention to it and take it seriously. I have an ergonomic keyboard; I try to sit with good posture; and if things get painful, I stop typing for a while. I know you had a bad case of Repetitive Stress Injury (RSI) that was ultimately healed by surgery. Could you describe what happened to you and perhaps give advice to programmers who notice their hands or wrists hurting?
James Gosling: People really need to pay attention to their bodies, their hands, their wrists, their elbows -- the whole system -- because they can easily get out of whack. One big issue is not any particular problem, but that often you have multiple problems that interact and masquerade as each other. It was so bad for me because I basically had three independent problems -- or four or five, depending on how you count them -- that shared certain symptoms associated with pain in the arm and wrist. I had issues with carpal tunnel syndrome; most of its symptoms are about a pinched nerve. But there are other places where nerves can get pinched. So, I had issues in my carpal tunnel, in my elbow, my spine, and issues around my neck.
As for the neck issue, I had been in a car accident and had a whiplash injury, which caused my right levator scapulae muscle to become chronically inflamed. That muscle apparently is positioned such that if it swells, it not only gets sore, but it pinches a nerve that goes down to your arm, making your arm hurt. And you'll think, "What did I do to my arm here?" But it has nothing to do with your arm; it's up here [in your neck].
You can get joint erosion problems by holding the joint at odd angles. Many hackers have injuries that are like tennis elbow, which in itself causes one joint to slightly dislocate, which can pinch a nerve there. You also have two bones that form a sort of face at the wrist, where all the little carpal bones are. If the one bone dislocates, the face isn't flat anymore, and that can cause tremendous pain in your wrist.
There are simple things you can do, like the way they teach you to hold your hand when you're typing spaces repetitively. You pull your thumbs back so that you have your fingers on the home row and then your thumb on the spacebar. That puts an unnatural strain on your thumb's lower joint where it extends into your palm. I had serious problems with this joint dislocating all the time and wouldn't know why. It would hurt like hell in my hand, and sometimes I'd notice a bump there. The bump was there because the bone was not in its socket.
Bill Venners: You dislocated your thumb by typing?
James Gosling: Oh yeah. It's remarkably easy.
Bill Venners: What would you advise to programmers? There's a lot of typing going on. What should we do to avoid problems?
James Gosling: Well, you certainly should watch out for posture issues and keyboard issues. Many people focus on the keyboard and get their keyboard set up pretty well, but then completely ignore their mouse. The mouse, if anything, is even more important than the keyboard. Often, people will have the mouse up on the side and they'll crank their wrist to use it. When you have your wrist really folded and you're trying to do detailed motions, it can really aggravate the whole carpal tunnel.
In some sense, the dumbest thing I did was avoid getting treatment. I actually had carpal tunnel surgery done on my wrists, and I really should have done it a lot earlier. Fortunately, it wasn't too late. The carpal tunnel is a place where the nerve gets tightly squeezed; apparently, if it's squeezed heavily for a long period of time, the nerve will just die. And once the nerve's dead, it ain't coming back. So, it's then permanent.
Many issues that people have with the carpal tunnel will go through the standard succession of problems. One day you'll feel some numbness in your hand, and you go to your doctor, who says to you, "Wear this brace." And that'll make the problem go away. What's happening is you've had some swelling in your carpal tunnel, and putting the brace on stops you from abusing it. The swelling goes down, but then the sheath around the tendons will grow a bit. You'll find that over time, the quick fixes like wearing a splint are less effective. Pretty soon, your hand is just a piece of meat. You can't do much with it that requires the small motor muscles.
I got to the point where I really couldn't sign my name. I had to learn to write. I could kind of grasp a pen, but I couldn't do side-to-side motions in my fingers, because that mostly requires your small motor muscles, which you lose control of.
Bill Venners: You lose control of them or it's too painful?
James Gosling: You lose control of them. You can do anything that uses just the tendons. I could make a fist, but I couldn't spread my fingers. I learned to grab a pen and then use my elbow and shoulder. I could kind of write, but could just barely sign for lunch.
Bill Venners: I've heard of people who can't turn keys anymore.
James Gosling: It's not so much turning the key as whether you can actually manipulate your hand into a position to hold it, at least with a carpal tunnel issue.
For me, the thing that made a difference was finding doctors who could deal with the system rather than a particular problem. All these problems look the same from the symptomatic point of view, namely your hand, your wrist, or your forearm hurts. And it's often a fairly generic hurt. When it comes from a nerve pinch, the nerve pinch can be anywhere. So, doctors can say, "Try this or try that." A hand specialist will look at it and say, "Well, the nerve propagation delay across your carpal tunnel isn't too bad, that can't be causing your problem," and, "Well, the nerve propagation delay across your elbow isn't too bad, so that can't be causing your problem," and "The nerve propagation delay up there isn't too bad, so it's not causing a problem." But when you add them all up, it's a problem.
Bill Venners: That reminds me of what we talked about last time: to understand large, complex software systems, you have to look at and understand the individual pieces, but also understand how those pieces interact to form the whole system.
James Gosling: Yeah, and one of the sad things about modern medicine is that the doctors tend to be pretty specialized. The specialization is completely unavoidable. There's just so much knowledge. You could spend your whole life just understanding hands or elbows.