Although Macromedia architect Sean Neville currently works on Flash-related rich application products for both .NET and J2EE platforms, he originally joined Macromedia to work on the JRun application server. As enterprise architect of JRun 4, Neville was involved with implementing many aspects of the J2EE (Java 2 Platform, Enterprise Edition) specifications, including EJB (Enterprise JavaBeans) containers, JNDI (Java Naming and Directory Interface) providers, JDBC (Java Database Connectivity) managers, and JTA (Java Transaction API) monitors. Neville also represents Macromedia on the JCP (Java Community Process) Executive Committee. In this two-part interview, Neville describes how he used Jini to enable clustering in the JRun 4 application server. In this first installment, Neville describes the object clustering architecture he wanted for the app server, and how Jini facilitated it.
Bill Venners: How did you add clustering functionality to the JRun app server?
Sean Neville: I came to Allaire before it was acquired by Macromedia. At the time, JRun was known primarily for servlets and JSPs (JavaServer Pages) -- the J2EE Web tier. Created at Live Software, JRun was the world's first commercial servlet engine. Allaire had acquired a small company called Valto that built an embedded EJB server and JTA/JTS transaction processor. Allaire's developers had linked those two together in the JRun 3.0 product, which went well. But for several reasons, tacking the two technologies together wasn't the perfect solution, and when I came aboard we decided that the web and enterprise engines really ought to be built together upon a single common base architecture. So in many ways, building JRun 4 involved a comprehensive re-architecture of JRun.
While there were plenty of existing JRun 3 features we wanted to extend in JRun 4 -- such as taking our simplicity and classic RAD features for web developers and extending them into the enterprise -- there were also several things missing that we knew customers needed. One thing missing in JRun 3, which sat fairly prominently among a long list of features we wanted to add to the next version, was object clustering. We wanted to use object clustering as the building block for EJB clustering, JMX MBean (managed bean) clustering, JNDI clustering, service clustering, and so on.
Bill Venners: Define what you mean by object clustering.
Sean Neville: In our context, clustering means providing high availability, failover, and load balancing for RPC (remote procedure calls), and making that mechanism as simple to use and as close to zero-admin as possible. We're talking about clustering remote objects. We wanted high availability when clients perform remote calls, so we needed to have multiple skeletons available to process the call. We also wanted failover, so that a user working with a particular stub could automatically failover to another one. Finally, we wanted basic load balancing. Although load balancing is typically handled at the hardware level, developers also expect to find it available at the software level. As part of the load balancing feature, we needed the ability for a stub to choose its skeleton in an algorithm-specific way. We also wanted the algorithm to be pluggable so users could define their own algorithms. And we wanted to make the whole thing simple to use.
Like several application servers today, JRun is built on a service-based microkernel architecture. At the base level is a microkernel that manages services, and all J2EE features are implemented in terms of those services. Services can play many roles, one of them being a manageable role implemented through MBeans.
Bill Venners: MBeans?
Sean Neville: Managed Beans, defined in the JMX specification, which take the JavaBean component model and infuse it with management standards. This is how the service kernel works: if you build a service and deploy it into JRun, in addition to being many other things, it is also an MBean. So standard systems management products, such as Tivoli or OpenView, can manage it. We represent almost everything -- from our EJB containers to deployers to our servlet engine -- as MBeans. But MBeans are also remoteable; they are another example of a remote object in our server. This isn't unique to JRun. Other application servers take a similar approach. Among the application servers that use this JMX service framework in one way or another -- and they range from the extremely expensive and fat, to the slim mid-range (like ours), to the free and open source -- I believe JBoss exploits it to the greatest degree.
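To make the MBean idea concrete, here is a minimal sketch of a JMX standard MBean using the stock `javax.management` API. The `CacheService` name and its attributes are purely illustrative, not JRun code; the point is the naming convention (the management interface must be called `<ImplClass>MBean`) and the fact that, once registered, any JMX-aware console can read attributes and invoke operations on it.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical management interface; JMX requires the name <ImplClass>MBean.
interface CacheServiceMBean {
    int getSize();
    void clear();
}

public class CacheService implements CacheServiceMBean {
    private int size = 42;

    public int getSize() { return size; }
    public void clear()  { size = 0; }

    public static void main(String[] args) throws Exception {
        // The MBeanServer plays the role JRun's service kernel plays:
        // once registered, standard management tools can manage the service.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example:type=CacheService");
        server.registerMBean(new CacheService(), name);

        System.out.println(server.getAttribute(name, "Size"));
        server.invoke(name, "clear", null, null);
        System.out.println(server.getAttribute(name, "Size"));
    }
}
```

The `getAttribute` call resolves `"Size"` to the `getSize()` getter by the JavaBean convention the interview alludes to.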
In any event, our customers were asking for EJB clustering, specifically stateless session-bean clustering. But instead of just developing a specific solution for stateless session-bean clustering and tacking it on front of the container, which is what many vendors ended up doing, we wanted to build clustering into the base remote mechanism itself. So any remoting technology that we built on top of this mechanism -- JMX, EJB, whatever -- would automatically inherit this clustering functionality, and we wouldn't have to do anything new for specific cases beyond configuring the metadata.
Bill Venners: So customers were mostly interested in stateless session-bean clustering, but you wanted to have generic clustering functionality that could also give people that stateless session-bean clustering.
Sean Neville: Right. We also wanted to apply the clustering mechanism to EJB types other than stateless session beans, and make it easy for customers to extend the mechanism. Two of JRun's strongest targets are the OEM and ISV (original equipment manufacturer and independent software vendor) markets. A number of major software businesses embed JRun in their products. As a user, you might not actually know that you're using JRun, because it's often rebranded. We like that case. So we try to design each feature to be as small and as extensible as possible. Larger J2EE servers seem less concerned with the extensibility that OEM/ISV scenarios require, or with keeping features small and simple, perhaps because we're talking about a pure software objective rather than a hardware or services play.
We also wanted the clustering architecture for our own internal convenience. We didn't want to have to think about developing and maintaining cluster code every time we invented a new subsystem. We wanted to have a single mechanism to handle it. So clustering needed to be designed for our own reuse, as an aspect of our service infrastructure that could be applied to any service.
Sean Neville: When you deploy a remote object, it needs a way to locate all similar remote objects on the cluster. The cluster would include whatever hosts the admin decided to link together. By default, if the admin did nothing, we wanted the cluster to include all the nearby hosts reachable via multicast. So whatever that network neighborhood cluster was, we needed a particular remote object to find its buddy on another system within the cluster based on class lookup. The lookup could possibly be based on some other information in the template, but it would mainly be based on the class. We wanted those objects to automatically find each other and exchange stubs.
Bill Venners: So if I'm a remote object, I want to discover all my peer remote objects of the same class. So the cluster self-organizes?
Sean Neville: It would. We wanted a small automatic lookup or discovery mechanism to allow peer remote objects to find each other.
Bill Venners: In other words, when I first come up, I will try to find my peers and get their stubs.
Sean Neville: That's right. As a remote object you'll have your own stub, but as a clustered remote object you'll also have my stub and everyone else's stub who belongs to the same cluster.
Bill Venners: And they'll have my stub somehow.
Sean Neville: That's right. When a client gets a remote reference to me after I've performed discovery, then I can give him all the stubs I know about, which is a collection that might include your stub in addition to mine. Then he has all the stubs locally and can choose you without actually coming to me. The way he would choose stubs depends on a plugged-in algorithm. The algorithm is a serializable object that is passed along with the stub collection.
Bill Venners: It's like a strategy.
Sean Neville: Exactly. So the algorithm is established ahead of time during the remote object development. The algorithm is pluggable and wide-open to being changed, but in cases like EJB, it turns out that really only a handful of specific algorithms tend to be desirable. For stateful session beans, for example, we provide a very specific algorithm.
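The pluggable, serializable algorithm Neville describes is a textbook strategy pattern. The sketch below uses hypothetical names (`ClusterAlgorithm`, `RoundRobin`) and plain strings standing in for stubs; the real mechanism would ship the serialized algorithm alongside the stub collection to the client.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical selection strategy: it travels with the stub collection
// to the client, so it must be Serializable.
interface ClusterAlgorithm extends Serializable {
    <T> T choose(List<T> stubs);
}

// A simple round-robin policy; users could plug in their own, e.g.
// load-weighted or affinity-based strategies.
class RoundRobin implements ClusterAlgorithm {
    private int next = 0;
    public <T> T choose(List<T> stubs) {
        return stubs.get(next++ % stubs.size());
    }
}

public class StrategyDemo {
    public static void main(String[] args) {
        ClusterAlgorithm algo = new RoundRobin();
        List<String> stubs = List.of("host-a", "host-b", "host-c");
        for (int i = 0; i < 4; i++) {
            // The client picks a skeleton locally, without a round trip.
            System.out.println(algo.choose(stubs));
        }
    }
}
```

Because the algorithm object lives on the client with the stub collection, stub selection costs no network hop at all.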
Remote object clustering is an aspect of the EJB invocation mechanism. When you deploy an EJB into a container and the client looks up that EJB via JNDI, if the client is remote then the client actually gets a remote reference to that container -- in our case, this remote container reference is actually an invocation broker, a remote object sitting in the container. Locally, it appears that the client has an object that implements all the methods implemented in the EJB, but actually the client has a stub to the container. Every time the client invokes a method, the invocation is passed to the container and a series of aspects, or intercepting filters, manipulate the call. Eventually the invocation arrives at the correct instance, the bean implementation instance, and the invocation result then flows back through the container on the way back to the remote client.
That invocation broker, which you could think of as the front controller for the EJB container, is a remote object. So if we could apply our generic remote object clustering aspect to that particular object, then we would automatically have clusterable containers.
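The broker-plus-interceptors structure described above can be sketched as a chain of intercepting filters, where the final link dispatches to the bean instance and results flow back out through the same chain. All names here are illustrative, not JRun's actual classes.

```java
import java.util.List;

// Hedged sketch of an invocation broker built from intercepting filters.
interface Interceptor {
    String invoke(String call, InvocationChain chain);
}

class InvocationChain {
    private final List<Interceptor> interceptors;
    private int index = 0;

    InvocationChain(List<Interceptor> interceptors) {
        this.interceptors = interceptors;
    }

    // Hand the (possibly manipulated) call to the next filter in line.
    String proceed(String call) {
        return interceptors.get(index++).invoke(call, this);
    }
}

public class BrokerDemo {
    public static void main(String[] args) {
        Interceptor security = (call, chain) -> chain.proceed("[auth]" + call);
        Interceptor tx       = (call, chain) -> chain.proceed("[tx]" + call);
        // The last interceptor dispatches to the bean implementation itself.
        Interceptor bean     = (call, chain) -> call + " -> result";

        InvocationChain chain =
            new InvocationChain(List.of(security, tx, bean));
        System.out.println(chain.proceed("getBalance"));
    }
}
```

Each filter sees the call on the way in and the result on the way out, which is what lets a single interceptor add a cross-cutting concern like clustering to every container built this way.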
Bill Venners: I understand how you planned to get load balancing with the pluggable algorithm, but how did you plan to get failover?
Sean Neville: We grouped together clusterable exceptions. The algorithm, in addition to choosing the stub, would handle the invocation. The algorithm would make the call and handle certain returned exceptions. For example, a remote exception would be interpreted in a particular way. The algorithm could then failover to a different stub.
Bill Venners: So the algorithm itself could pick a different stub.
Sean Neville: That is one way it could happen. And it is the principal way it does happen. There is another way failover can occur: Our web, EJB, and web service containers, as I mentioned, are composed from a set of intercepting filters, or handlers. One of these interceptors in the EJB container is aware of clusterable invocations. So failover could be handled at that level instead, and the advantage there is that some state could be saved and a rollback possibly avoided; if you were already in the container, you could go to another if necessary in the same transaction. But it's true that failover is mostly handled in the algorithm.
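A minimal sketch of algorithm-level failover, under the assumption that a `RemoteException` is treated as clusterable: the algorithm makes the call itself and, when a peer fails, retries against the next stub in its collection. The `Stub` interface and host names are hypothetical.

```java
import java.rmi.RemoteException;
import java.util.List;

// Hedged sketch of exception-driven failover inside the selection algorithm.
public class FailoverDemo {

    interface Stub {
        String call() throws RemoteException;
    }

    static String invokeWithFailover(List<Stub> stubs) throws RemoteException {
        RemoteException last = null;
        for (Stub stub : stubs) {
            try {
                return stub.call();   // success: return immediately
            } catch (RemoteException e) {
                last = e;             // clusterable exception: try next peer
            }
        }
        throw last;                   // every peer failed
    }

    public static void main(String[] args) throws Exception {
        Stub dead  = () -> { throw new RemoteException("host-a down"); };
        Stub alive = () -> "pong from host-b";
        System.out.println(invokeWithFailover(List.of(dead, alive)));
    }
}
```

Failing over inside the container's interceptor, as Neville notes, can preserve more state than this client-side retry, but the client-side algorithm is the common path.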
Sean Neville: We started experimenting with how we would create the discovery mechanism. We had done some prototype work on discovery based on messaging, and we had built the EJB and JNDI subsystems, two sets of services we wanted to cluster.
When we were at that stage, I attended a three-day J2EE summit in August 2001, where you, Richard Oberg, and others were talking about Jini. To be honest, I had looked at Jini a long time prior, but had never really done anything with it. It was kind of out of sight, out of mind until the summit last year. Once it was in front of me again, at that summit, it was obvious that Jini had already solved the problem of automatic discovery using multicast. It also solved some potential problems we didn't know we would face.
Bill Venners: Like what?
Sean Neville: Well, packet storm was one. We naively didn't even consider that an issue.
So then I started playing with Jini again. I ripped out our discovery mechanism, and in its place I established a Jini lookup service that we used as a delegate inside a service I dubbed the "cluster manager."
When you fire up the JRun server, you're firing up the service microkernel that starts a list of services. These services can be nested. If a particular service starts, then all its child services start before you move on to the next service in the hierarchy. Services can be represented in an XML structure, and that is a handy way to conceptualize it: starting services is like going through a DOM (document object model) where each element represents a service. Each service has a lifecycle; so you go through the tree once, and you initialize all the services and their children. Then you go through the tree again, and you actually start all the services. That lifecycle lets us handle some interdependencies between services.
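The two-pass lifecycle over the service tree can be sketched as follows. The service names and tree shape are invented for illustration; the point is the pattern: one full traversal to initialize every service, then a second to start them, so a starting service can rely on every other service already being initialized.

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch of a nested service tree with a two-pass lifecycle.
class Service {
    final String name;
    final List<Service> children = new ArrayList<>();
    Service(String name) { this.name = name; }
    Service add(Service child) { children.add(child); return this; }

    // Each service handles itself, then its children, before the
    // traversal moves on to the next sibling.
    void init(List<String> log)  { log.add("init:" + name);  children.forEach(c -> c.init(log)); }
    void start(List<String> log) { log.add("start:" + name); children.forEach(c -> c.start(log)); }
}

public class KernelDemo {
    public static void main(String[] args) {
        Service root = new Service("kernel")
            .add(new Service("cluster-manager").add(new Service("jini-lookup")))
            .add(new Service("ejb-engine"));

        List<String> log = new ArrayList<>();
        root.init(log);   // first pass: initialize the whole tree
        root.start(log);  // second pass: start the whole tree
        log.forEach(System.out::println);
    }
}
```

Splitting init from start is what resolves interdependencies: by the time anything starts, everything has been initialized.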
So one of those core services is this cluster manager that, when it fires itself up, internally fires up a Jini lookup service. The cluster manager can be considered a wrapper around the Jini lookup service.
Bill Venners: Is the Jini lookup service running inside the container?
Sean Neville: It is running inside the server, in the same process. It is not really running in a container, but is managed by the service microkernel and can be managed through JMX. The Jini lookup service is its own uber-service. It is a top-level service.
Once the lookup service is running, any service that happens to be a remote object can find it, because the lookup service has already started. The service can discover the lookup service and tell the lookup service it is there.
Bill Venners: Does it use the Jini multicast discovery protocol?
Sean Neville: It uses the multicast discovery protocol, and the unicast option. So you can statically specify which hosts should be in the cluster if you so desire.
Bill Venners: So if I actually have five of these servers running in the same local area network, I could find all five lookup services in all five of those hosts?
Sean Neville: That's right.
Bill Venners: Or if I just wanted to contact one, I could by mentioning the host in some XML configuration file?
Sean Neville: Exactly. Frequently, services will be nearby on the same subnet, but for whatever reason -- often operational and not technical -- multicast won't be permitted. In that case, you'd use unicast to get the nearby ones.
There's also something called the cluster group, which allows you to set up a named cluster, and join only those peers that belong to the same group. When you get to the lookup service, you query it to see if there are any remote objects similar to yourself that are interested in the same named group, and you immediately grab their stubs. You also register for events. If a remote object finds this lookup service later and it matches, you are notified and then you can grab its stub. If a previously-discovered peer changes groups, or drops out entirely, you also receive an event. We use the Jini event mechanism to implement this.
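The join-query-subscribe cycle just described can be modeled in a few lines of plain Java, with a toy registry standing in for the Jini lookup service and a callback interface standing in for the Jini remote event mechanism. Everything here (`Lookup`, `PeerListener`, the group name) is illustrative.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged model of the lookup-and-notify cycle for named cluster groups.
public class ClusterGroupDemo {

    interface PeerListener { void peerJoined(String stub); }

    // Toy lookup service: stubs and listeners keyed by cluster group.
    static class Lookup {
        private final Map<String, List<String>> stubs = new HashMap<>();
        private final Map<String, List<PeerListener>> listeners = new HashMap<>();

        // Register our stub, subscribe to future joins, and return the
        // stubs of peers already in the same group.
        List<String> join(String group, String stub, PeerListener l) {
            List<String> peers =
                new ArrayList<>(stubs.getOrDefault(group, List.of()));
            stubs.computeIfAbsent(group, g -> new ArrayList<>()).add(stub);
            listeners.computeIfAbsent(group, g -> new ArrayList<>()).add(l);
            // Notify earlier members that a new peer has appeared.
            for (PeerListener existing : listeners.get(group)) {
                if (existing != l) existing.peerJoined(stub);
            }
            return peers;
        }
    }

    public static void main(String[] args) {
        Lookup lookup = new Lookup();
        List<String> aPeers = new ArrayList<>();

        // First peer joins; its group is empty, so it grabs no stubs yet.
        lookup.join("payments", "stub-a", aPeers::add);
        // Second peer joins; it grabs stub-a, and peer A gets an event.
        List<String> bPeers = lookup.join("payments", "stub-b", s -> {});

        System.out.println(aPeers);  // learned of stub-b via the event
        System.out.println(bPeers);  // learned of stub-a via the query
    }
}
```

A real Jini deployment would also deliver departure events (lease expiry, group changes), which is how a peer's stub collection stays live as members come and go.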
Bill Venners: When I come up, the reason I look for a Jini lookup service is because that is how I discover my peers. I look them up by type in the lookup service. I also register that I want to be notified if a new peer stub appears, and if one goes away I suppose, so I can keep up to date on what peers exist.
Sean Neville: That's exactly right. The list of stubs is constantly updated, so it stays live.
I have my peer collection, which may be updated as the cycle spins. That's what we use Jini for. From that point on, from the client invocation angle, Jini isn't involved. It is really straight RMI-IIOP from the client view. Jini is our own internal implementation mechanism for creating smart, self-organizing stubs.
The Jini Community, the central site for signers of the Jini Sun Community Source License to interact:
Information on Macromedia JRun is at:
Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces to Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.