Sway with JavaSpaces

A Conversation with Ken Arnold, Part IV

by Bill Venners
September 30, 2002

Ken Arnold, the original lead architect of JavaSpaces, talks with Bill Venners about loose coupling in JavaSpace-based systems, why fields in entries are public, RPCs to nowhere, and building systems that sway with failure.

Ken Arnold has done a lot of design in his day. While at Sun Microsystems, Arnold was one of the original architects of Jini technology and was the original lead architect of JavaSpaces. Prior to joining Sun, Arnold participated in the original Hewlett-Packard architectural team that designed CORBA. While at UC Berkeley, he created the Curses library for terminal-independent screen-oriented programs. In Part I of this interview, which is being published in six weekly installments, Arnold explains why there's no such thing as a perfect design, suggests questions you should ask yourself when you design, and proposes the radical notion that programmers are people. In Part II, Arnold discusses the role of taste and arrogance in design, the value of other people's problems, and the virtue of simplicity. In Part III, Arnold discusses the concerns of distributed systems design, including the need to expect failure, avoid state, and plan for recovery. In this fourth installment, Arnold describes the basic idea of a JavaSpace, explains why fields in entries are public, why entries are passive, and how decoupling leads to reliability.

Bill Venners: What is a JavaSpace?

Ken Arnold: The basic idea of a JavaSpace is to introduce loose coupling between actors in a protocol. For example, you have a question. If you find a server that can answer your question and you send it a message, you are tied directly to that server and have to deal with its failure modes. With a JavaSpace, you instead write an object, called an entry, into the space. Somebody who can answer the request that entry represents then extracts it and writes back the response.

The kind of loose coupling JavaSpaces enables has several advantages, as loose coupling always does. The simple, direct advantage is scalability. Say you have one server and many clients. The clients write requests, and the server extracts them and writes the responses back. Now let's say the server gets overburdened; in that case, you need to accelerate it. With a directly coupled system, you have to make "the place people talk to" faster. Or you have to change people's logic so that something in the system knows how to distribute the load. Distributing the load is an interesting problem: Who is busy? Who is not? It's sometimes hard to tell.

In the JavaSpaces model, you just start up a second server. Now you have two things retrieving requests from the space and writing results back, so throughput roughly doubles. In fact, the scaling is distressingly close to linear when you do this, for a long period. Because you decouple requesters from request handlers and from how requests are handled, you can just have multiple request handlers. You can break the request down into two parts; then something that can complete one part efficiently will retrieve it and write back an intermediate result that somebody else knows how to finish. You can partition the result in different ways. The requester only cares that the results come back.
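The scale-out idea can be sketched in plain Java. This is only an illustration under stated assumptions: two `BlockingQueue` instances stand in for the space, everything runs in one JVM, and the names `ScaleOutSketch`, `Request`, `Response`, and `run` are hypothetical rather than part of JavaSpaces. The point is that the requester code does not change when the number of servers does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

class ScaleOutSketch {
    // Hypothetical request and response "entries".
    static final class Request {
        final int id;
        final int n;
        Request(int id, int n) { this.id = id; this.n = n; }
    }

    static final class Response {
        final int id;
        final long square;
        final String handledBy;
        Response(int id, long square, String handledBy) {
            this.id = id; this.square = square; this.handledBy = handledBy;
        }
    }

    // Pushes requestCount requests through serverCount interchangeable servers.
    // The requester logic is identical no matter how many servers are running.
    static Map<Integer, Response> run(int serverCount, int requestCount) {
        BlockingQueue<Request> requests = new LinkedBlockingQueue<>();   // stand-in for the space
        BlockingQueue<Response> responses = new LinkedBlockingQueue<>(); // stand-in for the space
        ExecutorService servers = Executors.newFixedThreadPool(serverCount);
        for (int s = 0; s < serverCount; s++) {
            final String name = "server-" + s;
            servers.execute(() -> {
                try {
                    while (true) {
                        Request r = requests.take();                      // "take" a request
                        responses.put(new Response(r.id, (long) r.n * r.n, name)); // "write" the answer
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();                   // shutdownNow() ends the loop
                }
            });
        }
        try {
            for (int i = 0; i < requestCount; i++) {
                requests.put(new Request(i, i));                          // requester writes requests
            }
            Map<Integer, Response> byId = new HashMap<>();
            for (int i = 0; i < requestCount; i++) {
                Response r = responses.take();                            // requester collects results
                byId.put(r.id, r);
            }
            return byId;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            servers.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(1, 100).size()); // one server
        System.out.println(run(2, 100).size()); // two servers, same requester code
    }
}
```

Calling `run(1, ...)` and `run(2, ...)` differs only in the server count; which server handled a given request is visible in `handledBy` but never matters to the requester.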

JavaSpaces has essentially three primary operations: write, which puts an entry into the space; read, which returns a matching entry from the space; and take, which is like read except that it also removes the entry. The entry is a simple kind of object. It has a set of public fields, all of which have object types; none are primitive types. Writing an entry into the space is as simple as creating one of these entry objects and writing it. To do a read or a take, you create an entry of the type you want, called a template, fill in the fields whose values you care about, and pass it to the read or take method. A filled-in field has to match exactly. A field that is not filled in (a field left null) is ignored. If I ask for one type, I can get a subtype. JavaSpaces provides a simple way for me to ask you to do something, for you to do things you know how to do, and then for me to look for the results. You can design protocols on top of this basic set of operations.
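The matching rules described above can be illustrated with a toy, in-memory stand-in. This is a sketch, not the JavaSpaces API: `ToySpace`, `Request`, and `UrgentRequest` are hypothetical names, the real operations also take a transaction and a lease or timeout argument, and a real space hands back copies of entries rather than the same object. The sketch also assumes that comparing fields with `equals()` is a good enough approximation of exact matching.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy stand-in for a space: one JVM, no leases, no transactions, no blocking.
class ToySpace {
    private final List<Object> entries = new ArrayList<>();

    // write: put an entry into the space.
    synchronized void write(Object entry) {
        entries.add(entry);
    }

    // read: return a matching entry and leave it in the space; null if nothing matches.
    synchronized Object read(Object template) {
        for (Object e : entries) {
            if (matches(template, e)) return e;
        }
        return null;
    }

    // take: like read, but the matching entry is also removed.
    synchronized Object take(Object template) {
        for (Iterator<Object> it = entries.iterator(); it.hasNext(); ) {
            Object e = it.next();
            if (matches(template, e)) {
                it.remove();
                return e;
            }
        }
        return null;
    }

    // An entry matches if it is of the template's type (or a subtype) and every
    // non-null public field of the template equals the entry's field.
    static boolean matches(Object template, Object entry) {
        if (!template.getClass().isInstance(entry)) return false;
        try {
            for (Field f : template.getClass().getFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                Object wanted = f.get(template);
                if (wanted != null && !wanted.equals(f.get(entry))) return false;
            }
            return true;
        } catch (IllegalAccessException ex) {
            throw new IllegalStateException(ex);
        }
    }
}

// Hypothetical entry types: public fields, object types only.
class Request {
    public String question;
    public Integer priority;
    Request() { }
    Request(String question, Integer priority) {
        this.question = question;
        this.priority = priority;
    }
}

class UrgentRequest extends Request {
    UrgentRequest() { }
    UrgentRequest(String question, Integer priority) { super(question, priority); }
}

class MatchingDemo {
    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.write(new Request("What is 6 x 7?", 1));
        space.write(new UrgentRequest("Is the server up?", 9));

        Request template = new Request(); // all fields null: would match any Request
        template.priority = 9;            // now only entries whose priority is exactly 9
        Request found = (Request) space.take(template);
        System.out.println(found.question + " (" + found.getClass().getSimpleName() + ")");
    }
}
```

The template asks for a `Request` and gets back an `UrgentRequest`, which is the "ask for one type, get a subtype" behavior; the unset `question` field plays no part in the match.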

JavaSpaces is sometimes called a kissing cousin to Linda, the work on which it is based. Whereas Linda was structural, JavaSpaces has added objects to the system. JavaSpaces is a distributed, object-based system. JavaSpaces has transactions like some Linda systems, but it has distributed transactions. So JavaSpaces clearly is an inspired work. We took the insights David Gelernter and his crew used to create Linda, and applied them to a new domain, with differences that make sense to that domain.

Bill Venners: Why are fields in entries public?

Ken Arnold: We could have used typical accessors, such as get and set methods. In any pair of get and set methods, such as in a JavaBean, there's a contract. The documentation of the JavaBean says that setting this value will result in the following behavior. One contract, for example, could define get as "get next," where the returned value monotonically increases. set could then mean "set the starting point": it resets the value from which the results of get increase. That is a legitimate get and set contract. In random number generators, for example, you set the seed and get the next random number.

The contract for get and set methods of entries would essentially be: if you call set with a given value, and then return later and call get with no intervening set, the value would be the same—it would be unmodified. Furthermore, remember the matching is exact. When you call set on your template to set a particular value to 17, you are asking for an entry where the value is 17. When you receive an entry and call its get method, you better get 17, not 18. Incrementing is not OK.

So if you examine the contract description for an entry's get and set methods, you would see it describes a field. get and set would have to act exactly like a field. Therefore, we asked ourselves, why should we have get and set methods whose behavior is exactly like this other language construct called a field? Why not just make it a field? If we make it a field, it will have the correct behavior. Nobody can accidentally screw up their get and set methods. Making it a field eliminates a source of error.
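The difference between the two contracts can be made concrete with a small sketch. The class names `Sequence` and `ValueEntry` are hypothetical and exist only for illustration: the first follows the "get next" contract mentioned earlier, the second is an entry-style class with a public field.

```java
class AccessorContractSketch {
    // A legitimate get/set contract that does not behave like a field:
    // set picks the starting point, get returns the next value.
    static final class Sequence {
        private int next;
        void set(int start) { next = start; }
        int get() { return next++; }
    }

    // Entry-style class: a public field of an object type.
    static final class ValueEntry {
        public Integer value;
    }

    public static void main(String[] args) {
        Sequence seq = new Sequence();
        seq.set(17);
        System.out.println(seq.get()); // prints 17
        System.out.println(seq.get()); // prints 18: the value moved on its own

        ValueEntry entry = new ValueEntry();
        entry.value = 17;
        System.out.println(entry.value); // prints 17
        System.out.println(entry.value); // prints 17: a field cannot do anything else
    }
}
```

With exact matching, a template built on something like `Sequence` could ask for 17 and hand back 18; a plain field rules that out by construction.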

Now this sometimes makes people uncomfortable because they've been told not to have public fields; that public fields are bad. And often, people interpret those things religiously. But we're not a very religious bunch. Rules have reasons. And the reason for the private data rule doesn't apply in this particular case. It is a rare exception to the rule. I also tell people not to put public fields in their objects, but exceptions exist. This is an exception to the rule, because it is simpler and safer to just say it is a field. We sat back and asked: Why is the rule thus? Does it apply? In this case it doesn't.

Bill Venners: I'd like to ask you about some quotes from JavaSpaces Principles, Patterns, and Practice, which you coauthored with Susanne Hupfer and Eric Freeman.

Ken Arnold: Eric and Susanne did most of the writing and I reviewed it. They gave me the privilege of putting my name on the book, for which I am grateful to them.

Bill Venners: Here's a quote: "An entry in a space is a passive data object that can't be changed or altered unless it is first retrieved from the space. This distinction has a powerful effect when developing distributed applications." What is important about the fact that in the space the object is passive?

Ken Arnold: In the sense it is meant there, an entry is passive because it doesn't change state on its own. That means, in effect, that once you have written an entry, you can turn your back on it. Six years from now, if it is still in the space, it will have the same value. This reduces the number of actors in the system, because the space is not an actor. It remains idle. The more actors a distributed system has, the more complicated the interactions are. Also, for something to change, somebody must take responsibility. Instead of having something that changes on its own and that other people also change, someone has to step in and say, I will change it. Therefore, all the changes have a responsible actor.

In some sense, JavaSpaces is like an RPC (remote procedure call) to nowhere. You write an entry into a space, which effectively will invoke a method. You just don't know on what, how it will happen, or when you will get a result. It is an asynchronous method invocation. What would the world be like if you made a method invocation, and while the method invocation traveled to the destination, somebody came in and altered it? How would you live in that universe? You would live in a very different way. And so this kind of static existence means you can view it as an RPC to nowhere, because it isn't touched.

Bill Venners: Here's another quote from your book: "Uncoupling senders and receivers lends to protocols that are simple, flexible, and reliable." How does decoupling senders and receivers help you build simple, flexible, and reliable systems?

Ken Arnold: Decoupling leads to that; it doesn't guarantee it. Nothing prevents people from doing bad things, except straitjackets and rubber balls (no sharp edges). If something is useful, it can be abused. But JavaSpaces leads in that direction. Programmers of the various actors in a system need not understand, nor rely upon, the way other actors in the system are structured. Is it one actor or three that performs this particular operation? Are there one or multiple actors of the same kind in the system? Might writing one entry actually result in a cascade of 37 other entries written in by other actors? Those things are out of the requester's sight, as long as the result comes back.

I mentioned idempotency in regard to distributed system design (see Part III of this interview). It is useful here as well. You want to make your algorithms for using JavaSpaces idempotent as well. You want to be able to write the same entry and, as a general rule, have it be harmless. Because if you don't receive an answer in some humanly defined reasonable amount of time, you might decide: Hey, somebody dropped the ball. Better write another one in.
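One way to make a repeated request harmless is to have the requester choose an identifier and have the handler record at most one result per identifier. The sketch below assumes that approach; it is not taken from the interview, and `Request`, `Handler`, and their members are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentRequestSketch {
    // Hypothetical request entry. The requester chooses the id,
    // so a retry of the same request carries the same id.
    static final class Request {
        public String id;
        public Integer amount;
        Request(String id, Integer amount) {
            this.id = id;
            this.amount = amount;
        }
    }

    // Hypothetical handler that answers each request id at most once.
    static final class Handler {
        private final Map<String, Integer> results = new HashMap<>();
        private int workDone = 0;

        void handle(Request r) {
            if (results.containsKey(r.id)) {
                return;                        // duplicate: already answered, nothing to do
            }
            results.put(r.id, r.amount * 2);   // stand-in for the real computation
            workDone++;
        }

        Integer resultFor(String id) { return results.get(id); }
        int workDone() { return workDone; }
    }

    public static void main(String[] args) {
        Handler handler = new Handler();
        handler.handle(new Request("req-1", 21));
        handler.handle(new Request("req-1", 21)); // requester got impatient and wrote it again
        System.out.println(handler.resultFor("req-1")); // prints 42
        System.out.println(handler.workDone());         // prints 1
    }
}
```

Writing the request a second time changes neither the answer nor the amount of work performed, so a requester that suspects a dropped ball can safely retry.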

You can think about it this way: If you design a traditional system using RPCs, or whatever you want to use, in the end you design an architecture of actors and determine how they communicate with each other. You decide what messages they can send to each other, what the error states are, and what the responses are. When you design a system using a JavaSpace, you essentially do the same amount of work, except you replace messages with entries.

The JavaSpace provides you with this robustness mechanism. You write entries into it. You can replicate the space. There is at least one commercial implementation of a replicated JavaSpace. The space can be fault tolerant. It can survive crashes. You can have transactions so things don't get dropped on the floor. If I remove something to compute your results and then I crash, the transaction will time out and abort, and the entries will reappear for somebody else to take.
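The take-under-a-transaction behavior can be sketched with a toy space. Everything here is an assumption made for illustration: `TxnSpaceSketch` and `Txn` are hypothetical, entries are plain strings, and the sketch calls `abort` explicitly where a real system would abort the transaction after it times out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TxnSpaceSketch {
    // Hypothetical transaction handle; identity is all that matters here.
    static final class Txn { }

    private final Deque<String> visible = new ArrayDeque<>();
    private final Map<Txn, List<String>> takenUnder = new HashMap<>();

    void write(String entry) {
        visible.addLast(entry);
    }

    // Take an entry under a transaction: it disappears from view,
    // but the space remembers it until the transaction ends.
    String take(Txn txn) {
        String entry = visible.pollFirst();
        if (entry != null) {
            takenUnder.computeIfAbsent(txn, t -> new ArrayList<>()).add(entry);
        }
        return entry;
    }

    // Commit: the take becomes permanent.
    void commit(Txn txn) {
        takenUnder.remove(txn);
    }

    // Abort: everything taken under the transaction becomes visible again.
    void abort(Txn txn) {
        List<String> back = takenUnder.remove(txn);
        if (back != null) {
            visible.addAll(back);
        }
    }

    int visibleCount() {
        return visible.size();
    }

    public static void main(String[] args) {
        TxnSpaceSketch space = new TxnSpaceSketch();
        space.write("job-1");

        Txn first = new Txn();
        System.out.println(space.take(first));    // prints job-1
        System.out.println(space.visibleCount()); // prints 0
        space.abort(first);                       // simulates the first server crashing

        Txn second = new Txn();
        System.out.println(space.take(second));   // prints job-1 again
        space.commit(second);                     // this server finished the work
        System.out.println(space.visibleCount()); // prints 0
    }
}
```

The request survives the first server's failure without the requester doing anything, which is the "things don't get dropped on the floor" property.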

There is a great story about a group building a project called Viper. The project basically uses a JavaSpace as a compute model. You write a task into the space that represents a large complicated simulation. Then there are compute servers that will take out the tasks, whatever they are, and invoke their run method. The servers don't know what the run method does, they just invoke it. Whatever the run method returns, the servers write back in the space. Then they retrieve something else. Essentially they just donate cycles to run these jobs. The servers download the code associated with the job and they execute it.
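A generic compute server of the kind described can be sketched in a few lines. This is not Viper's code: the `Task` interface, the `TaskEntry` and `ResultEntry` classes, and `serveOnce` are hypothetical, two in-memory queues stand in for the space, and no code is downloaded because everything runs in one JVM.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ComputeServerSketch {
    // Hypothetical task contract: the server only knows that a task can be run.
    interface Task {
        Object run();
    }

    static final class TaskEntry {
        public String jobId;
        public Task task;
        TaskEntry(String jobId, Task task) {
            this.jobId = jobId;
            this.task = task;
        }
    }

    static final class ResultEntry {
        public String jobId;
        public Object value;
        ResultEntry(String jobId, Object value) {
            this.jobId = jobId;
            this.value = value;
        }
    }

    // One pass of a generic compute server: take a task, run it, write back
    // whatever it returns. Returns false when there was nothing to take.
    static boolean serveOnce(Queue<TaskEntry> tasks, Queue<ResultEntry> results) {
        TaskEntry t = tasks.poll();
        if (t == null) {
            return false;
        }
        results.add(new ResultEntry(t.jobId, t.task.run()));
        return true;
    }

    public static void main(String[] args) {
        Queue<TaskEntry> tasks = new ArrayDeque<>();
        Queue<ResultEntry> results = new ArrayDeque<>();
        tasks.add(new TaskEntry("sum", () -> 1 + 2 + 3));
        tasks.add(new TaskEntry("shout", () -> "hello".toUpperCase()));

        while (serveOnce(tasks, results)) {
            // keep donating cycles until the queue is empty
        }
        for (ResultEntry r : results) {
            System.out.println(r.jobId + " = " + r.value);
        }
    }
}
```

`serveOnce` never inspects what a task does; adding a new kind of job means writing a new `Task`, not changing the server.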

When the system is in beta, a guy puts in a large fluid dynamics calculation, and he realizes he put in the wrong one. Rather than wait two hours to get the wrong answer back, he finds the compute server executing his entry, and he kills it. He kills off the virtual machine. So the transaction times out, and the request returns to the space. It is now visible for someone else to take, because the transaction has aborted. So someone else takes it and starts executing it. And he goes and he kills that one. He follows this thing around the network, and he cannot kill the job. This is not the typical problem people have in a distributed system, right?

You asked how JavaSpaces helps you make reliable systems. It can do it like that. If those servers were failing because the hardware was flaky, it would also work. You don't have to have some guy going around shooting things. The Viper project people learned they have to have a Cancel button. You can cancel jobs in several ways. I don't know how they did it, but they solved it. I can imagine five designs to solve it. But the main point is, using exactly the technology they have now, they could have built a system with 100 compute servers where each one is down 50 percent of the time. And they would still be able to get work done. The client wouldn't know that only half of the compute servers were working. It might be too slow. They might decide that 50 percent is too much downtime. If they start replacing the systems, the performance would get better. But it would be robust.
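A rough back-of-the-envelope calculation shows why such a system keeps working. It assumes the 100 servers fail independently of one another, which the interview does not state; correlated failures such as a shared network outage would make the numbers worse.

```latex
P(\text{all 100 servers down at once}) = 0.5^{100} = 2^{-100} \approx 7.9 \times 10^{-31}
\qquad
E[\text{servers up at a given moment}] = 100 \times 0.5 = 50
```

Under that assumption there is essentially always some server available to take the next entry, and on average about half the fleet is working, which matches the "it might be too slow, but it would be robust" description.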

People talk about five nines, six nines reliability. (Five nines is 99.999 percent reliability.) They usually try to reach the desired number of nines by making each component more reliable. But if you design a system like Viper, you can make it reliable with unreliable components that are much cheaper, more plentiful, and easier to come by. At some point, every component is unreliable. I would much rather build a system on that principle than try to build a system that never goes down. It is the difference between trying to survive an earthquake by building a sturdy structure that is hard to break and building a structure that sways with the movement. You can survive much bigger earthquakes by swaying with the movement, even though your instinct is to build a sturdy structure. People are now following the instinct to build a sturdier structure. When building with JavaSpaces and Jini, you sway with the earthquake, and you can do much better.


Perfection and Simplicity, A Conversation with Ken Arnold, Part I:

Taste and Aesthetics, A Conversation with Ken Arnold, Part II:

Designing Distributed Systems, A Conversation with Ken Arnold, Part III:

You can obtain information about Linda from here:

Ken Arnold first mentioned idempotency in Part III of this interview:

JavaSpaces: Principles, Patterns, and Practice by Eric Freeman, Susanne Hupfer, and Ken Arnold, the book from which Bill Venners reads quotes in this article, is at Amazon.com at:

The Jini Community, the central site for signers of the Jini Sun Community Source License to interact:

Download JavaSpaces from:

Design objects for people, not for computers:

Make Room for JavaSpaces, Part I - An introduction to JavaSpaces, a simple and powerful distributed programming tool:

Make Room for JavaSpaces, Part II - Build a compute server with JavaSpaces, Jini's coordination service:

Make Room for JavaSpaces, Part III - Coordinate your Jini applications with JavaSpaces:

Make Room for JavaSpaces, Part IV - Explore Jini transactions with JavaSpaces:

Make Room for JavaSpaces, Part V - Make your compute server robust and scalable with Jini and JavaSpaces:

Make Room for JavaSpaces, Part VI - Build and use distributed data structures in your JavaSpaces programs:


About the author

Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces to Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.