Sponsored Link •
Bill Venners: Could you give an overview of what's in that spec and what it's all about?
James Gosling: The realtime spec for Java is all about how you write programs in Java on realtime systems. And there are a number of components to it -- simple things like dealing with accurate dates, scheduling events. Most of the hard things have to do with deterministic timing.
In the realtime world, what people care about is performance, to a certain extent. But more important than performance is, generally, determinism.
When you say, "I want this to be executed at that time," in general, by God, you mean it. And then it's not allowed to slip. And that has effects in two areas. One is scheduling, and there's all kinds of exotic scheduling algorithms that can be in there, and the realtime spec has a general framework for putting in all kinds of schedulers.
The other side of it is that there's one big source of non-determinism in timing in Java right now, and that's garbage collection. And garbage collection is really the number one hard problem.
On the one hand, people look at garbage collection and what it gives you, and they by and large really like it. Particularly folks in the realtime world like garbage collection a lot because it's tremendously useful for increasing the reliability of systems and speeding up development time. It just helps to hold the whole robustness story tremendously.
Bill Venners: You said particularly the realtime people like that.
James Gosling: Yeah. Well, because they tend to be even more worried about systems that crash. With a lot of these realtime systems, if the software crashes, an actual person dies. It's a control system for an airplane or a robot arm -- you're just getting ready to stop the arm and you garbage collect and the arm just keeps going because nobody's telling it to stop. And it takes out a wall or innocent passers-by or something. And so you can't tolerate that kind of thing.
And so there was a general consensus that we wanted to have garbage collection in there but we didn't want to live with the hiccups that you get when your average garbage collector kicks in. All garbage collectors have the effect that they tend to shut down the system for some interval of time. And some of them you can actually tune the interval.
So, like in HotSpot, there's a limited amount of control that's possible over how long each GC step takes. But if you're writing an application whose timeliness needs to be tighter than what something like the HotSpot VM can do, then you're in trouble.
One of the things that we decided fairly early on was that an easy out on this problem would be to, say, only use "realtime garbage collectors." That's garbage collectors that have essentially zero pause time because they either back out or are doing things truly in parallel.
A couple of people in the group did a literature survey and they concluded that real realtime garbage collectors essentially didn't exist. There were a lot of things which were claiming to be realtime garbage collectors that were close but not good enough. And the usual extent of closeness was that the people were saying, "Well, we're realtime because we never pause more than a millisecond." Or, "We're realtime because only one chance in a billion will we ever actually back off and do a full mark-and-sweep."
Unfortunately, both of those invalidate the GC algorithm in some applications. In the realtime world, people are worried about the worst case. They don't care at all about the average case or the best case, they care about the worst case. And if the worst case is that this algorithm will actually do a full mark-and-sweep -- shut the thing down for seconds -- then it doesn't work, even if it happens once in 10 or 15 days of running. If the thing that's running is the thing that's controlling the flutter of the wingtip of an F-16, after 10 days the airplane turns into dust. And that doesn't work.
For GC algorithms that have fairly small minimum times, often people will claim that they have a realtime algorithm, and then they'll say it's got a maximum pause time of a millisecond. And that works for some applications and doesn't for others. One way to characterize realtime algorithms is based on the maximum latency that they can tolerate. And some of them can live just fine with a millisecond; some need something much finer than that.
So what we ended up doing was defining a way to allow a special kind of thread to run even while the garbage collector is running. And if you know much about garbage collectors then you think about what is it that has to be true about a thread if it's going to run while the garbage collector is running. You end up with a fairly onerous set of restrictions. Namely that whatever the thread does, it does not allow it to touch an object in the garbage collector heap, and it's not allowed to move a pointer to one of these objects around.
And so we've defined two subclasses of thread, one called the realtime thread, which is a thread that has all kinds of extra scheduling parameters besides just the priority. And then there's another subclass of that called the no-heap realtime thread, which is one that also has the ability to run while the garbage collector is running. But those threads are not allowed to access heap memory. And the way that they are able to work is that there's a way to allocate what are called "immortal objects." An immortal object is one that is allocated -- it stays there forever and it never, ever goes away.
The immortal objects correspond pretty much to what people actually do as standard practice in realtime coding today. Generally when people are doing serious realtime work they pre-allocate everything. And then they sort of go into their realtime reading windows with the things which realtime never, ever allocates. Because it turns out that almost all the allocators in the world actually have non-deterministic timing.
See, almost everybody's malloc() or free routine has non-deterministic timing. And so in general, people don't use them for the intense realtime application. So the whole no-heap thing/immortal objects, if you describe that to an average Java programmer, they'd say, "Oh my God, this is really awful." But if you describe it to somebody who's been doing realtime programming, their reaction is sort of, "Gee, that's what I do today. It's not a big problem."
But it gives them the advantage of this universe where people doing these really time-critical things can actually interact with the other world that is done in Java, and get all the advantages of writing programs in Java, and also get all the portability stuff. That's really the most important piece.
Bill Venners: How do the new scheduling capabilities work?
James Gosling: With scheduling, there are some new objects introduced and one is called the scheduler. And a scheduler determines when threads should get started up. And one of the parameters of the scheduler is if it's a no-heap thread, it can then start while the garbage collector is running.
But normally, the scheduler is similar to what happened in typical thread scheduler. It's looking at the queues of threads that are available to be one. They picked one, and in most everyday computer systems these fairly simple priority-based schemes work pretty well.
In realtime systems, generally you've got a notion of a deadline when this particular computation has to be done. And so there are ways to associate properties with threads like what its deadline is, what the expected amount of work is. And that gives a scheduler more information to sort of trade off which things to run now.
It's an open-ended framework, so that people can invent new schedulers and plug them in. There's also hooks in there for doing what people in the realtime world call feasibility analysis. Namely, you've got a set of threads, and you ask the question, "So does this make any sense?" If you've got three or four threads, each of which wants 100 percent of the CPU all the time, it makes no sense at all because you'd need three or four CPUs. Or three or four times as much CPU as you've got.
So you can do things like back out or not start systems, or go into some sort of fail-safe mode.
Bill Venners: What do you mean by "back out"?
James Gosling: Like not start the system. It's application dependent. When the feasibility analysis fails, what do you do?
Bill Venners: So the VM does the feasibility analysis before it starts? Or does the application do it?
James Gosling: The application effectively is in charge of invoking it. Because what the application does is it sets up a set of threads. And then it asks the scheduler "Is this feasible?" And if it is, then the system carries on. If it's not, then the application decides what to do. And it may do things like decide to cut back on its workload somehow. Or it may just even refuse to start.
It's sort of like having the "on" button on your airplane, if it turns it on and says, "Sorry, install a new CPU."
Bill Venners: Is this going to be incorporated into all future VMs? Or will there be realtime VMs and a not-necessarily-realtime VMs?
James Gosling: Yeah, it really ends up being two. There's the Java VM and then there's sort of a specialized class of Java VMs which would be the realtime Java VMs.
Bill Venners: So there are certain kinds of applications then that would run on realtime VMs that just wouldn't work on regular VMs?
James Gosling: Correct. The standard VM is not optimizing for the realtime performance. In general, what non-realtime systems are optimizing for is not timeliness but throughput. So you're trying to see how much raw performance you can get out of the whole system. And whether you're hitting your deadlines on a millisecond by millisecond basis isn't really relevant.