The Artima Developer Community
Sponsored Link

Designing Distributed Systems
A Conversation with Ken Arnold, Part III
by Bill Venners
October 23, 2002

<<  Page 4 of 4


State is Hell

Bill Venners: What about state?

Ken Arnold: State is hell. You need to design systems under the assumption that state is hell. Everything that can be stateless should be stateless.

Bill Venners: Define what you mean by that.

Ken Arnold: In this sense, state is essentially something held in one place on behalf of somebody who is in another place, something that is not reconstructible by the other entity that wants it. If I can reconstruct it, it's called a cache. And caches are often OK. Caching strategies are their own branch of computer science, and you can screw them up. But as a general rule, I send you a bunch of information and you send me the shorthand for it. I start to interact with you using this shorthand. I pass the integer back to you to refer to the cached data: "I'm talking about interaction number 17." And then you go down. I can send that same state to some equivalent service to you, and build myself back up. That kind of state is essentially a cache.

Caching can get complex. It's the question that Jini solves with leasing. If one of us goes down, when can the person holding the cache throw this stuff away? Leasing is a good answer to that. There are other answers, but leasing is a pretty elegant one.

On the other hand, if you store information that I can't reconstruct, then a whole host of questions suddenly surface. One question is, "Are you now a single point of failure?" I have to talk to you now. I can't talk to anyone else. So what happens if you go down?

To deal with that, you could be replicated. But now you have to worry about replication strategies. What if I talk to one replicant and modify some data, then I talk to another? Is that modification guaranteed to have already arrived there? What is the replication strategy? What kind of consistency do you need—tight or loose? What happens if the network gets partitioned and the replicants can't talk to each other? Can anybody proceed?

There are answers to these questions. A whole branch of computer science is devoted to replication. But it is a nontrivial issue. You can almost always get into some state where you can't proceed. You no longer have a single point of failure. You have reduced the probability of not being able to proceed, but you haven't eliminated it.

If my interaction with you is stateless in the sense I've described—nothing more than a cache—then your failure can only slow me down if there's an equivalent service to you. And that equivalent service can come up after you go down. It doesn't necessarily have to be there in advance for failover.

So, generally, state introduces a whole host of problems and complications. People have solved them, but they are hell. So I follow the rule: make everything you can stateless. If there is one piece of the system you can't make stateless—it has to have state—to the extent possible make it hold all the state. Have as few stateful components as you can.

If you end up with a system of five components and two of them have state, ask yourself if only one can have state. Because, assuming all components are critical, if either of the two stateful components goes down you are screwed. So you might as well have just one stateful component. Then at least four, instead of three, components have this wonderful feature of robustability. There are limits to how hard you should try to do that. You probably don't want to put two completely unrelated things together just for that reason. You may want to implement them in the same place. But in terms of defining the interfaces and designing the systems, you should avoid state and then make as many components as possible stateless. The best answer is to make all components stateless. It is not always achievable, but it's always my goal.

Bill Venners: All these databases lying around are state. On the Web, every Website has a database behind it.

Ken Arnold: Sure, the file system underneath a Website, even if it's just HTML, is a database. And that is one case where state is necessary. How am I going to place an order with you if I don't trust you to hold onto it? So in that case, you have to live with state hell. But a lot of work goes into that dealing with that state. If you look at the reliable, high-performance sites, that is a very nontrivial problem to solve. It is probably the distributed state problem that people are most familiar with. Anyone who has dealt with any large scale, high availability, or high performance piece of the problem knows that state is hell because they've lived with that hell. So the question is, "Why have more hell than you need to have?" You have to try and avoid it. Avoid. Avoid. Avoid.


Have a question for Ken or Bill, a different favorite recovery strategy, or a hellish experience with state to relate? Discuss this article in the Cool Stuff Forum topic, Designing Distributed Systems.


The Jini Community, the central site for signers of the Jini Sun Community Source License to interact:

Download JavaSpaces from:

Design objects for people, not for computers:

Make Room for JavaSpaces, Part I - An introduction to JavaSpaces, a simple and powerful distributed programming tool:

Make Room for JavaSpaces, Part II - Build a compute server with JavaSpaces, Jini's coordination service:

Make Room for JavaSpaces, Part III - Coordinate your Jini applications with JavaSpaces:

Make Room for JavaSpaces, Part IV - Explore Jini transactions with JavaSpaces:

Make Room for JavaSpaces, Part V - Make your compute server robust and scalable with Jini and JavaSpaces:

Make Room for JavaSpaces, Part VI - Build and use distributed data structures in your JavaSpaces programs:

<<  Page 4 of 4

Sponsored Links

Copyright © 1996-2018 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use