Frank Thoughts
Recoverable Remote Exceptions with Jini
by Frank Sommers
May 29, 2003

Summary
Jini must be one of the most misunderstood technologies out there today. In this weblog, I'll try to post a couple of brief essays drawn from my experience using Jini in an enterprise application. This first essay discusses recovering from network-induced exceptions.

Abstract: Remote exceptions indicate failures induced by the presence of a network in a distributed system. That same network, however, might also help recover from such failure conditions. This article presents a simple design technique to push remote exception handling inside a Jini service's proxy, hiding exception handling and recovery from that service's clients. With that method, service providers register exception handlers in Jini lookup services, and their service proxies dynamically discover and download those handlers when encountering remote exceptions.

To differentiate between local and remote method invocations, Java's remote object semantics require that a remote object implement a remote interface, one that extends java.rmi.Remote, and that each method of that interface declare java.rmi.RemoteException. If you are writing a remotely accessible service, that requirement is easy to meet. But what is a client to do when it encounters a remote exception?
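As a minimal illustration of those rules, the sketch below declares a remote interface for a hypothetical quote service. The names QuoteService, LocalQuoteService, QuoteClient, and getQuote are made up for this example; only java.rmi.Remote and java.rmi.RemoteException come from the Java platform. The implementation is a local stand-in, not an exported RMI object.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical remote interface: it extends java.rmi.Remote, and each of its
// methods declares java.rmi.RemoteException, as Java RMI requires.
interface QuoteService extends Remote {
    double getQuote(String symbol) throws RemoteException;
}

// Local stand-in implementation (not exported via RMI), used only to show
// how a RemoteException surfaces to callers.
class LocalQuoteService implements QuoteService {
    @Override
    public double getQuote(String symbol) throws RemoteException {
        if (symbol.isEmpty()) {
            // Simulate a network-induced failure.
            throw new RemoteException("quote server unreachable");
        }
        return 42.0;
    }
}

class QuoteClient {
    // The compiler forces the caller to catch or declare RemoteException.
    // This client merely substitutes a fallback value, so it gains little
    // from knowing that the call was remote.
    static double quoteOrDefault(QuoteService service, String symbol, double fallback) {
        try {
            return service.getQuote(symbol);
        } catch (RemoteException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        QuoteService service = new LocalQuoteService();
        System.out.println(quoteOrDefault(service, "ACME", -1.0)); // prints 42.0
        System.out.println(quoteOrDefault(service, "", -1.0));     // prints -1.0
    }
}
```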

I recently pondered that question while reviewing some of my code invoking remote objects. I noticed myself obediently following the remote object semantics, declaring RemoteExceptions in my service objects and catching all of them inside the client code. The problem was that once I caught those RemoteExceptions, I didn't do much with them: my client code simply propagated them up the call stack until they either surfaced as an error message warning the user that a service was unavailable or, worse, vanished into catch clauses I had left blank. Because my clients made such poor use of remote exceptions, they gained no real benefit from knowing that those method calls were remote rather than local.

The obvious benefit of catching and handling an exception is that it gives a program a chance to recover from a failure condition. Properly handled, exceptions contribute directly to system reliability. Indeed, operating systems routinely "trap" exceptional conditions and then attempt to resolve the failures by dynamically loading exception handlers, chosen by the exception's type, into their microkernels (or into an exception handler layer above the microkernel, as with Windows NT's Executive layer). Dynamic exception handler loading allows an OS to start up with minimal overhead: it only has to know where to look for handlers when it encounters an exception.

A similar process might benefit a Jini service's reliability: When a service's client catches a remote exception, it could dynamically load, from the network, an exception handler corresponding to the exception's type. Since the remote exception originates in the service's proxy, the service provider likely has the knowledge needed to guide the client in recovering from it.

For instance, suppose that a Jini service proxy must connect to a remote object implementation to perform a Java RMI call. For some reason, the remote object is not available, and the Jini proxy raises a RemoteException. At that point the client really doesn't know how to proceed. A conscientious client might follow a voodoo-like process in its desperate attempts at recovery: "Try this invocation three times, if it still doesn't succeed, then try to contact this other server. If that doesn't work either, then discover a new service proxy. If that still doesn't work, just give up."
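To make the cost of that recipe concrete, here is a minimal sketch of the client-side heuristic, assuming a hypothetical Service interface standing in for the Jini proxy and a Supplier that performs rediscovery. ClientSideRecovery, MAX_ATTEMPTS, and invokeWithHeuristics are illustrative names, not part of any Jini API.

```java
import java.rmi.RemoteException;
import java.util.function.Supplier;

// Hypothetical interface standing in for a Jini service proxy.
interface Service {
    String invoke() throws RemoteException;
}

class ClientSideRecovery {
    static final int MAX_ATTEMPTS = 3; // "try this invocation three times"

    // Tries one service up to MAX_ATTEMPTS times; returns null if every attempt failed.
    // Assumes invoke() never returns null on success.
    static String tryRepeatedly(Service service) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            try {
                return service.invoke();
            } catch (RemoteException e) {
                // Swallow and retry: the client has no better information.
            }
        }
        return null;
    }

    // The client's "voodoo" recipe: retry the proxy, then another server,
    // then a freshly discovered proxy, and finally give up.
    static String invokeWithHeuristics(Service proxy, Service otherServer,
                                       Supplier<Service> rediscover) throws RemoteException {
        String result = tryRepeatedly(proxy);
        if (result == null) result = tryRepeatedly(otherServer);
        if (result == null) result = tryRepeatedly(rediscover.get());
        if (result == null) throw new RemoteException("giving up after all recovery attempts");
        return result;
    }
}

class RecoveryDemo {
    public static void main(String[] args) throws RemoteException {
        Service down = () -> { throw new RemoteException("server down"); };
        Service otherServer = () -> "answered by other server";
        System.out.println(ClientSideRecovery.invokeWithHeuristics(down, otherServer, () -> down));
    }
}
```

Every client that wants this behavior has to repeat some variant of that code, which is the burden the next paragraphs try to lift.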

If a client's developer decided to implement such an elaborate process, coding up those heuristics would involve a lot of extra work. Even with all that effort, the client's chances of connecting to the remote service would basically depend on luck.

A friendlier service provider would build that sort of process into a smart proxy, hiding the cumbersome recovery attempts from a client. For instance, the service provider might decide to operate a backup server for the remote object. The service's smart proxy would perform failover to that secondary server when invoking the first server failed.
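A minimal sketch of such a smart proxy follows. AcmeService, AcmeBackend, and FailoverProxy are hypothetical names; in a real deployment, primary and backup would be RMI stubs, which are serializable, so the proxy itself could be serialized into a lookup service. Note that the backup reference is fixed when the proxy is constructed, which is exactly the weakness discussed next.

```java
import java.io.Serializable;
import java.rmi.RemoteException;

// Hypothetical interface that clients see.
interface AcmeService {
    String process(String request) throws RemoteException;
}

// Hypothetical server-side interface the proxy talks to (e.g., an RMI stub).
interface AcmeBackend {
    String process(String request) throws RemoteException;
}

// Smart proxy: failover happens inside the proxy, so the client only sees a
// RemoteException when both servers fail. Serializing it assumes the backend
// references are themselves serializable, as RMI stubs are.
class FailoverProxy implements AcmeService, Serializable {
    private static final long serialVersionUID = 1L;
    private final AcmeBackend primary;
    private final AcmeBackend backup; // embedded before the proxy is registered

    FailoverProxy(AcmeBackend primary, AcmeBackend backup) {
        this.primary = primary;
        this.backup = backup;
    }

    @Override
    public String process(String request) throws RemoteException {
        try {
            return primary.process(request);
        } catch (RemoteException primaryFailure) {
            try {
                return backup.process(request);
            } catch (RemoteException backupFailure) {
                // Keep the primary failure visible for diagnostics.
                backupFailure.addSuppressed(primaryFailure);
                throw backupFailure;
            }
        }
    }
}

class FailoverDemo {
    public static void main(String[] args) throws RemoteException {
        AcmeBackend primary = r -> { throw new RemoteException("primary crashed"); };
        AcmeBackend backup = r -> "backup handled " + r;
        AcmeService proxy = new FailoverProxy(primary, backup);
        System.out.println(proxy.process("order-17")); // prints "backup handled order-17"
    }
}
```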

While the second approach lightens the burden on the client's developer, it still poses problems in long-running, dynamic Jini systems. Consider the following sequence of events: The service provider embeds a reference to the secondary server's address inside the smart proxy prior to registering that proxy in lookup services. Some time later, a client obtains a copy of that proxy, and starts using the service. After a further passage of time, the service provider decides to relocate the secondary server to a new network address. Finally, the primary server crashes, and the client is presented with a remote exception as a result.

At that point, the client's proxy still refers to the backup server's old address, since the backup server was relocated after the client retrieved the proxy from a lookup service. One way of mitigating that problem is for the service provider to cancel, or stop renewing, the leases on the old proxy's lookup service registrations, and to register a fresh service proxy pointing to the correct backup server address. Once the client realizes that neither the primary nor the backup server works, it must then discover a new instance of the service's proxy object.
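The following sketch shows the work this leaves on the client. SimpleLookup is a hypothetical in-memory stand-in for a lookup service, not Jini's ServiceRegistrar API; replacing its registration models the provider letting the old lease lapse and registering a fresh proxy. RediscoveringClient is likewise an illustrative name.

```java
import java.rmi.RemoteException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical interface implemented by the downloaded proxy.
interface AcmeService {
    String process(String request) throws RemoteException;
}

// In-memory stand-in for a Jini lookup service (NOT the ServiceRegistrar API).
class SimpleLookup {
    private final AtomicReference<AcmeService> registered = new AtomicReference<>();

    void register(AcmeService proxy) { registered.set(proxy); }

    AcmeService lookup() { return registered.get(); }
}

// The burden left on the client: discard a failing proxy and discover a new one.
class RediscoveringClient {
    private final SimpleLookup lookup;
    private AcmeService proxy;

    RediscoveringClient(SimpleLookup lookup) {
        this.lookup = lookup;
        this.proxy = lookup.lookup();
    }

    String process(String request) throws RemoteException {
        try {
            return proxy.process(request);
        } catch (RemoteException staleProxyFailure) {
            proxy = lookup.lookup();        // discard the old proxy, fetch the current one
            return proxy.process(request);  // a second failure propagates to the caller
        }
    }
}

class RediscoveryDemo {
    public static void main(String[] args) throws RemoteException {
        SimpleLookup lookup = new SimpleLookup();
        // A proxy whose embedded backup address has gone stale.
        lookup.register(r -> { throw new RemoteException("backup moved"); });
        RediscoveringClient client = new RediscoveringClient(lookup);
        // Later, the provider registers a proxy with the new backup address.
        lookup.register(r -> "fresh proxy handled " + r);
        System.out.println(client.process("order-17")); // prints "fresh proxy handled order-17"
    }
}
```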

While that solution certainly works, it still burdens the client's developer with the responsibility to discard the old proxy and discover a new one. A more client-friendly service provider would try harder to ensure that his service's proxy seldom failed: He would push the dynamic failure recovery inside the proxy.

When the call to the initial remote object failed, the proxy would intercept ("trap") the remote exception. Based on that exception's type, the proxy would discover an exception handler from Jini lookup services. That handler, in turn, would have the necessary information for the proxy to contact a secondary, backup server.

With that mechanism, the exception handler's code is downloaded dynamically into the service proxy. Therefore, the service provider does not have to supply every possible exception handling routine at the time of deploying a service: Instead, he can add exception handlers to the network from time to time, making his service increasingly reliable.
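A minimal sketch of the trap-and-discover step might look like the following. HandlerDirectory is a hypothetical in-memory stand-in for handler registrations in Jini lookup services, and ExceptionHandler, TrappingProxy, and AcmeBackend are likewise illustrative names. The demo registers a handler only after the proxy already exists, mirroring a provider adding handlers after deployment.

```java
import java.rmi.ConnectException;
import java.rmi.RemoteException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server-side interface the proxy talks to.
interface AcmeBackend {
    String process(String request) throws RemoteException;
}

// Hypothetical handler: given the failure, supplies an alternate backend.
interface ExceptionHandler {
    AcmeBackend recover(RemoteException failure);
}

// Stand-in for handlers registered in lookup services; new handlers can be
// registered at any time after the service is deployed.
class HandlerDirectory {
    private final Map<Class<? extends RemoteException>, ExceptionHandler> handlers =
            new ConcurrentHashMap<>();

    void register(Class<? extends RemoteException> type, ExceptionHandler handler) {
        handlers.put(type, handler);
    }

    // Matches the exact exception class only; a fuller version might walk
    // up the superclass chain.
    ExceptionHandler find(Class<? extends RemoteException> type) {
        return handlers.get(type);
    }
}

// Proxy that traps RemoteExceptions and looks up a handler by exception type.
class TrappingProxy {
    private final AcmeBackend primary;
    private final HandlerDirectory directory;

    TrappingProxy(AcmeBackend primary, HandlerDirectory directory) {
        this.primary = primary;
        this.directory = directory;
    }

    String process(String request) throws RemoteException {
        try {
            return primary.process(request);
        } catch (RemoteException e) {
            ExceptionHandler handler = directory.find(e.getClass());
            if (handler == null) throw e; // no handler on the network (yet)
            return handler.recover(e).process(request);
        }
    }
}

class TrapDemo {
    public static void main(String[] args) throws RemoteException {
        HandlerDirectory directory = new HandlerDirectory();
        AcmeBackend primary = r -> { throw new ConnectException("primary down"); };
        TrappingProxy proxy = new TrappingProxy(primary, directory);

        // Added after deployment: a handler pointing the proxy at a backup server.
        AcmeBackend backup = r -> "backup handled " + r;
        directory.register(ConnectException.class, failure -> backup);

        System.out.println(proxy.process("order-17")); // prints "backup handled order-17"
    }
}
```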

Remote exceptions for which handlers might exist on the network would extend a "recoverable" remote exception type, RecoverableRemoteException. That exception class would define a single method, getHandlerType(), returning the interface type suitable for recovering from the specific RecoverableRemoteException subtype. Thus, you might have a recoverable remote exception, AcmeServerConnectionException, whose getHandlerType() returns the interface AcmeServerConnectionExceptionHandler. The Acme service's proxy would then discover objects implementing that interface from Jini lookup services.
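Those types could be declared roughly as follows. This is a sketch under the assumption that the handler interfaces share a common base, RecoverableExceptionHandler, whose handle(Object ctx) method is described in the next paragraph; the constructors and serialVersionUID fields are illustrative details, not part of the original design.

```java
import java.rmi.RemoteException;

// Common base for handler interfaces discovered on the network.
interface RecoverableExceptionHandler {
    Object handle(Object ctx);
}

// Remote exceptions for which handlers might exist on the network.
abstract class RecoverableRemoteException extends RemoteException {
    private static final long serialVersionUID = 1L;

    RecoverableRemoteException(String message) { super(message); }

    // The interface type the proxy should search for in lookup services.
    public abstract Class<? extends RecoverableExceptionHandler> getHandlerType();
}

// Acme-specific handler interface; the provider registers implementations of it.
interface AcmeServerConnectionExceptionHandler extends RecoverableExceptionHandler { }

class AcmeServerConnectionException extends RecoverableRemoteException {
    private static final long serialVersionUID = 1L;

    AcmeServerConnectionException(String message) { super(message); }

    @Override
    public Class<? extends RecoverableExceptionHandler> getHandlerType() {
        return AcmeServerConnectionExceptionHandler.class;
    }
}

class HandlerTypeDemo {
    public static void main(String[] args) {
        RecoverableRemoteException e = new AcmeServerConnectionException("primary down");
        System.out.println(e.getHandlerType().getSimpleName()); // prints AcmeServerConnectionExceptionHandler
    }
}
```

In an actual Jini deployment, the proxy could pass the returned type to a lookup, for instance by building `new ServiceTemplate(null, new Class[] { e.getHandlerType() }, null)` and calling `lookup` on a discovered ServiceRegistrar. Keying discovery on an interface type lets any number of providers publish compatible handlers.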

Each of those objects would implement the RecoverableExceptionHandler interface, and its single method, handle(Object ctx). handle() consumes an object representing some context specific to the exception, and produces an object that presents a "solution" to the problem embedded in that context. For instance, if the exception's context represented the failed attempts at connecting to the Acme server, the handler would produce an object corresponding to an acceptable alternate network connection. Note that a handler might be implemented locally or remotely.
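Here is a sketch of a locally implemented handler. ConnectionFailureContext and FailoverEndpointHandler are hypothetical, and representing endpoints as host-name strings is an assumption made to keep the example small. The context records the endpoints the proxy has already failed to reach, and the "solution" is the first known endpoint not yet tried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

interface RecoverableExceptionHandler {
    Object handle(Object ctx);
}

interface AcmeServerConnectionExceptionHandler extends RecoverableExceptionHandler { }

// Hypothetical context: endpoints the proxy already failed to reach.
final class ConnectionFailureContext {
    final List<String> failedEndpoints;

    ConnectionFailureContext(List<String> failedEndpoints) {
        this.failedEndpoints = Collections.unmodifiableList(new ArrayList<>(failedEndpoints));
    }
}

// A locally executing handler: proposes the first known endpoint the proxy
// has not tried yet, or returns null if it has no solution to offer.
class FailoverEndpointHandler implements AcmeServerConnectionExceptionHandler {
    private final List<String> knownEndpoints;

    FailoverEndpointHandler(List<String> knownEndpoints) {
        this.knownEndpoints = new ArrayList<>(knownEndpoints);
    }

    @Override
    public Object handle(Object ctx) {
        if (!(ctx instanceof ConnectionFailureContext)) return null; // unknown context
        ConnectionFailureContext failure = (ConnectionFailureContext) ctx;
        for (String endpoint : knownEndpoints) {
            if (!failure.failedEndpoints.contains(endpoint)) return endpoint;
        }
        return null;
    }
}

class HandlerDemo {
    public static void main(String[] args) {
        RecoverableExceptionHandler handler = new FailoverEndpointHandler(
                Arrays.asList("primary.example.com", "backup.example.com"));
        Object solution = handler.handle(
                new ConnectionFailureContext(Arrays.asList("primary.example.com")));
        System.out.println(solution); // prints backup.example.com
    }
}
```

The untyped `handle(Object ctx)` signature keeps a single handler interface usable for any failure, at the cost of a runtime type check; a generic variant would catch mismatches at compile time but would make heterogeneous handler lookup more awkward.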

The primary benefit of the dynamically downloaded exception handler approach is that it allows a Jini service's proxy itself to take advantage of Jini technology's dynamic service discovery capability. Therefore, it adds a recursive layer to a Jini system, isolating a service's clients from certain types of failures.

Of course, some remote exceptions are impossible to recover from: for instance, when all of a client's network connections are unavailable. As those situations illustrate, luck and voodoo remain important elements of distributed systems.

This essay first appeared in the May, 2003, issue of the Jini Newsletter.


About the Blogger

Frank Sommers is a Senior Editor with Artima Developer. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld. Frank also serves as chief editor of the Web zine ClusterComputing.org, the IEEE Technical Committee on Scalable Computing's newsletter. Prior to that, he edited the Newsletter of the IEEE Task Force on Cluster Computing. Frank is also founder and president of Autospaces, a company dedicated to bringing service-oriented computing to the automotive software market.
Prior to Autospaces, Frank was vice president of technology and chief software architect at a Los Angeles system integration firm. In that capacity, he designed and developed that company's two main products: A financial underwriting system, and an insurance claims management expert system. Before assuming that position, he was a research fellow at the Center for Multiethnic and Transnational Studies at the University of Southern California, where he participated in a geographic information systems (GIS) project mapping the ethnic populations of the world and the diverse demography of southern California. Frank's interests include parallel and distributed computing, data management, programming languages, cluster and grid computing, and the theoretic foundations of computation. He is a member of the ACM and IEEE, and the American Musicological Society.

This weblog entry is Copyright © 2003 Frank Sommers. All rights reserved.

