Guerrilla Development
Messaging May Not Be the Way to Build a Distributed System
by B. Scott Andersen
July 31, 2005

Summary
Eric Armstrong's recent article on Artima caught my eye and made me think a bit about the evolution of distributed programming techniques. In the beginning, there was the message. But what now?

In the beginning...

... there was the message. Eric Armstrong's recent Artima article, "Messaging is the Right Way to Build a Distributed System," caught my eye the other day. I've not had a chance to write anything for Artima in quite a while, and perhaps responding to this piece is as good an excuse as any to get back in the habit.

Eric's article made me begin thinking about the evolution of the distributed programming techniques I've seen and done in the last two decades. While a message passing paradigm started the ball rolling, I believe we've progressed far from that mark.

CSPs

I first started thinking about distributed computing using messaging in about 1985 while working on a project that was trying to make a loose collection of devices communicate over a twisted pair network. It was pretty primitive compared to what we do today, but the problems were all the same: remoteness implies longer latencies, the possibilities of partial failure, managing parallelism, and dealing with the pure tedium of message formats and the protocols that carry those messages.

The model we used then was reflected in Tony Hoare's book Communicating Sequential Processes. CSPs are indeed an excellent way to organize distributed components with simplicity. Each service need only have a single thread (it could have more, but more aren't required for this model). It takes a message, performs an action, and sends a reply message to report the results to the initiator.
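Here is a minimal sketch of that take-a-message, act, reply loop in Python. It is only an illustration of the idea, not the code we ran in 1985: the names stage_service and call, the dictionary message format, and the "move" operation are all assumptions made up for this example, and an in-process queue stands in for the network.

```python
import queue
import threading

def stage_service(inbox):
    """Single-threaded service: take a message, act on it, send a reply."""
    position = {"x": 0, "y": 0}
    while True:
        request = inbox.get()            # block until the next message arrives
        if request["op"] == "stop":      # hypothetical shutdown message, no reply
            break
        if request["op"] == "move":
            position["x"] = request["x"]
            position["y"] = request["y"]
            reply = {"status": 0, "position": dict(position)}
        else:
            reply = {"status": 1, "error": "unknown op"}
        request["reply_to"].put(reply)   # report the result to the initiator

def call(inbox, message, timeout=1.0):
    """Send one message and wait for its reply (raises queue.Empty on timeout)."""
    reply_box = queue.Queue()
    inbox.put({**message, "reply_to": reply_box})
    return reply_box.get(timeout=timeout)

inbox = queue.Queue()
worker = threading.Thread(target=stage_service, args=(inbox,), daemon=True)
worker.start()

move_reply = call(inbox, {"op": "move", "x": 5, "y": 10})
bad_reply = call(inbox, {"op": "spin"})
inbox.put({"op": "stop"})
worker.join(timeout=1.0)
```

Because the service handles one message at a time, its state needs no locking; that is most of the appeal of the model.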

Message paradigm wrapped in an RPC

A few years later, I was running a project building a piece of semiconductor fabrication equipment and I employed the same approach. The machine had a large number of moving parts and several racks of computers providing the computing power to handle the motion control and supervisory functions. The CSP approach made managing parallelism about as easy as it was going to be. The model was so intuitive that I constructed a programming language called MIL (the Machine Interface Language) which I had hoped technicians might be able to use to calibrate and diagnose problems with the machine.

MIL could format messages and send them on the machine's message bus. One of the mantras I'd chant was, "everything had to be on the bus!" While this probably drove the engineers nuts, the approach ensured that the power of the network and this message passing paradigm extended into all of the nooks and crannies of the machine. If it works for the big stuff, it works for the small stuff, too.

The message handling primitives of send and receive were quickly wrapped into procedure forms: one form for a regular RPC (send, wait for the reply), and call-no-wait. (The majority of the code written for the machine was in C and the C programmers also wrapped their message stuff in RPC wrappers.) The RPC form looked something like this in MIL:


status := ssl_move(x:=5, y:=10, timeout:=40000)
if (status != 0) then
	...
end if

To do things in parallel, you would use the no wait form of the subroutine call which would just send the message and not wait for the reply. Waiting for the reply was done in a separate step. MIL's syntax for using the no wait form looked something like this:


// initialize the substrate stage
x := ssl_initialize_no_wait(param := 5)
// initialize the reticle stage
y := ret_initialize_no_wait(foo := 6)

// now do the rendezvous
wait(x, 30000)
wait(y, 30000)
if (waiting(x) or waiting(y)) then
...
end if

In the above code, we sent messages to both the left substrate stage (ssl) and the reticle stage to have them initialize. The _no_wait forms of these routines just sent the messages to the respective subsystems and returned a handle that could be used later to recognize the arrival of the return message. So, in this example, we started the initialization process for two of the machine's subsystems and then, once all this activity was under way (in parallel), we could wait for completion with the wait command. The wait command takes the handle returned by the send call as its first parameter and a timeout value (in milliseconds) as its second. In the normal case, both return messages arrive and both wait calls succeed before their timeouts kick in. The if statement after the two wait calls asks whether either wait failed by checking if we are still waiting. All of this is pretty simple, and it provided a very powerful, intuitive model that almost everybody (engineers, technicians, test designers, and so on) could use.
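For readers who never saw MIL, the same no-wait and rendezvous pattern can be sketched with Python's standard concurrent.futures module. This is an analogy under stated assumptions, not a translation: the initialize function and its sleep times are invented stand-ins for the subsystems, threads stand in for the message bus, and a single 30-second timeout covers both handles where the MIL code waited on each one in turn.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def initialize(subsystem, seconds):
    """Hypothetical stand-in for a subsystem that takes a while to initialize."""
    time.sleep(seconds)
    return 0  # status 0 means success, as in the MIL example

pool = ThreadPoolExecutor(max_workers=2)

# "no wait" form: submit returns a handle (a Future) immediately
x = pool.submit(initialize, "substrate stage", 0.05)
y = pool.submit(initialize, "reticle stage", 0.10)

# rendezvous: wait up to 30 seconds for both replies
done, still_waiting = wait([x, y], timeout=30)

if still_waiting:
    print("timed out waiting for", len(still_waiting), "subsystem(s)")
else:
    statuses = [x.result(), y.result()]
    print("both initialized, statuses:", statuses)

pool.shutdown()
```

The Future plays the role of the handle returned by the _no_wait call, and a non-empty still_waiting set corresponds to the waiting(x) or waiting(y) test.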

Notice, though, that the underlying mechanism, message passing, eventually gathered another layer of abstraction on top of it. This wasn't to obscure the remoteness of the various subsystems inside the machine; it was to provide some syntactic sugar that made the concepts clearer and more palatable to the developers.

Moving more than data in our messages

After this project it was time to waste some career cycles doing stuff for the web. I'd done some SQL stuff in previous jobs, but it was interesting to me how much programming was being done in the database layer. It wasn't just all the SQL query strings that were being passed around, though there were plenty of those; it was also the number of stored procedures that were being used. It seemed like some combination of SQL and the stored procedures was a programming platform in its own right. The line between when code was installed and when it was used was starting to get blurred in my mind. Were SQL statements data or code? How about stored procedures? If you install stored procedures and then exercise them along with a bunch of other SQL code, is that mobile code? Or is it just code that was subject to a lightweight or temporary install? Was there a difference?

Is a connection to a database, and the passing of SQL statements just business as usual in Eric's messaging model? Or, is it more along the line of mobile code? Clearly, the purpose of moving the SQL statements to the database is to have the processing done closer to the data with only the results passed back to the initiator. That sounds like mobile code to me. In fact, the whole point of the database connector is so I don't need to know how the magic of the transport and messaging between my program and the database works.
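A small sketch makes the distinction concrete. It uses Python's built-in sqlite3 module with an invented orders table; both the table and its rows are assumptions for illustration. An in-memory SQLite database runs inside the client process, so nothing really crosses a network here, but the shape of the two approaches is the same as with a remote database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 25.0), ("ann", 5.5), ("cy", 40.0)],
)

# Moving data: fetch every row, then select and sum on the client.
rows = conn.execute("SELECT customer, amount FROM orders").fetchall()
client_total = sum(amount for customer, amount in rows if customer == "ann")

# Moving code: ship the selection and the sum to the database,
# so only the single result comes back.
(server_total,) = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE customer = ?", ("ann",)
).fetchone()

conn.close()
```

Both totals agree, but the first approach hauled four rows to the client to get there and the second hauled one value. With a real table and a real network, that difference is the whole argument.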

Why should I care about the protocol?

When I'm trying to talk to a service like a database I really don't want to care about the underlying transport or how they organize the messaging beneath. I've been given a different, higher-level view than all that and I'm happy for it. My program has to be able to handle database errors (including a lost connection to the database), but I don't need to understand any of that stuff the database connector provided.

In fact, the particular transport and protocol for the interactions with the database are a private problem between the database vendor and the provider of the connector technology. If they release a new version of the database and connector, they could even change this protocol completely and I, after a quick recompile and relink, shouldn't care a bit. Wouldn't it be nice if more stuff worked like that?

More stuff does work like that

The idea of moving code isn't new. Back in 1993 and 1994 General Magic carved out a nifty idea of having a scripting language regular users could use to help organize their personal communication. Here's an example of what this could have done.

Let's say you set up some rules for how you wish to be contacted:

If email is received during 8-6 M-F route to office mail.
If email is marked urgent, route to office mail and copy to my Blackberry.
If I haven't picked up a message during normal business hours and a message is 4 hours old (or older), send a message to my Blackberry with the subject only.
If I receive a message from "boss" during normal business hours that is not picked up in 30 minutes or less, call me on my cell phone and do a text-to-speech of the message (so I can pick it up at the ballpark... Hey, I don't want to be fired...)

Telescript was going to do stuff like this and much more. Here's the blurb from a Byte magazine article from that era:

"Telescript, a communications-oriented programming language comparable to C or Pascal, will let developers create network-independent intelligent agents and distributed applications. Some of these applications, in turn, may be tools that let ordinary users create intelligent agents without programming. What PostScript did for cross-platform, device-independent documents, Telescript aims to do for cross-platform, network-independent messaging. General Magic hopes Telescript will become a lingua franca for communications."

We didn't get a world filled with Telescript. But we do have a world filled with Java, and we've been moving Java code since the beginning of the Java movement. Applets, those sometimes cutesy and even occasionally useful pieces of Java code that get loaded into web browsers, are an example of mobile code. Sometimes those applets talk back to the computer that served them up to our web browser. Again, the browser provides a place for the applet to run and the applet provides a service within the page, but whatever protocols or messages it exchanges with the remote computer are invisible, and uninteresting, to us. As with the database connection utilities, I don't really care how it does its job.

More stuff should work like this

Moving code around, whether it be SQL commands, Telescript, or Java code, has much to recommend it. There are a couple of things that moving code accomplishes:

Computation happens in the right place, and
details like protocol specifics and message formats can be hidden.

Jini, a distributed computing model for Java, does move code around, and for both of these reasons. But there is some reluctance even in this forward-looking group to fully embrace moving code to solve some problems. I had a long-running conversation on the JavaSpaces list where I lamented that some of the changes to the JavaSpaces specification proposed this spring were trying to move data when moving code made more sense to me. You can read my prattling here. Who knows, maybe I was off my rocker, but it seemed like if moving SQL commands to the database for data selection made sense, moving code to a JavaSpace for data selection made at least as much sense.

The relationship between the JavaSpaces client and the JavaSpaces service is managed by the mobile code of the service's proxy. Just as the database connector technology provided a local proxy for the facilities supplied by the database, the JavaSpaces proxy supplies a facade for the JavaSpaces service. The only difference between these two cases is when the linkage-editor functionality happens. If you believe "later is better", you're gonna love Jini and JavaSpaces.

The evolution from the messaging paradigm

We've come a long way since the simple exchange of messages. We started with just simple messages, protocols that had to be agreed upon up-front, and Hoare's CSP model. But as time progressed, there was pressure on that model.

Pressure to move the computation closer to the data
The SQL commands you send in every database query are an example of moving the computation closer to the data. You can buy more computing power; it is hard to buy more network bandwidth.
Pressure to keep a familiar program model like subroutine calls
Messages are fine, but the RPC programming model is often what you're trying to do anyway: make a call, get an answer, move on. Why not use something that looks like a procedure or function to do that?
Pressure to hide unnecessary details
I don't care about the protocol between the database and whatever connector technology I'm using. In fact, the less I know the better off everybody is. If they want to change it in their next release, that's fine with me.
Pressure to control both ends of the connection
If you have to declare message formats and protocols up-front, that means a lock-in that could be uncomfortable later. If instead you simply offer an API for the service (as the database connection utility or a Jini service would do), you can control both ends of the protocol and the message formats, because both of these things are implementation details below the visibility of your end-users.
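The last two pressures can be sketched together in a few lines of Python. Everything here is hypothetical: the add service, the two "releases," and both wire formats are invented for the example, and a direct method call stands in for the network. The point is only that the provider ships the proxy along with the server, so the provider can change the protocol without touching the client's code.

```python
import json

class ServerV1:
    """Hypothetical service, release 1: speaks JSON on the wire."""
    def handle(self, raw: bytes) -> bytes:
        request = json.loads(raw.decode())
        return json.dumps({"sum": request["a"] + request["b"]}).encode()

class ProxyV1:
    """Proxy supplied by the provider alongside ServerV1."""
    def __init__(self, server):
        self._server = server
    def add(self, a: int, b: int) -> int:
        raw = json.dumps({"a": a, "b": b}).encode()
        return json.loads(self._server.handle(raw).decode())["sum"]

class ServerV2:
    """Release 2: the provider switched to a terse text format."""
    def handle(self, raw: bytes) -> bytes:
        a, b = raw.decode().split(",")
        return str(int(a) + int(b)).encode()

class ProxyV2:
    """Proxy supplied by the provider alongside ServerV2."""
    def __init__(self, server):
        self._server = server
    def add(self, a: int, b: int) -> int:
        return int(self._server.handle(f"{a},{b}".encode()).decode())

def client_code(service) -> int:
    """The end user's program: it only ever sees add()."""
    return service.add(2, 3)

result_v1 = client_code(ProxyV1(ServerV1()))
result_v2 = client_code(ProxyV2(ServerV2()))
```

client_code is identical in both cases. With a database connector the new proxy arrives at link time; with Jini it arrives at run time. Either way the message format never leaks into the caller.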

Distributed computing's growth over the last half decade has been astonishing to me. And, while I'm still a fan of Hoare's simple model, I recognize that pressures have encouraged an evolution of tools, techniques, and attitudes that have made distributed system development easier.

It may have all started with the message... but we've come a long way since then.

Resources

Communicating Sequential Processes (Prentice-Hall International Series in Computer Science) (Paperback), C. A. R. Hoare, 1985, found on Amazon here. This was one of the first texts I had seen on the topic. It was a bit heavy on the symbology and the math, but once I got it, I was hooked. I still have the hard-bound edition on my shelf.
General Magic invented Telescript which was an idea out of the blue and far before its time.
You can learn more about Jini here.

About the Blogger

B. Scott Andersen has 20+ years of experience in software development splitting his time between individual contributor and management. He is now a Principal Software Engineer with Verocel, Inc., a company specializing in helping safety-critical system developers attain certification for their products. The opinions expressed here are his own and he takes full responsibility for them... unless, of course, they are worth money, at which point they belong to his employer.

