Spring Clustering with Terracotta

A Conversation with Rod Johnson and Ari Zilka

by Frank Sommers

December 4, 2006

Summary

Terracotta's decision to open-source its clustering technology was in part driven by a new-found love between it and major open-source enterprise application tools and frameworks, such as Tomcat and Spring. Artima spoke with Spring project founder Rod Johnson and Terracotta co-founder Ari Zilka about using Terracotta to cluster Spring applications.

One of Terracotta's promises is an easier programming model for clustered applications. So easy, in fact, that in some cases clustering with Terracotta is transparent, requiring no code changes to an application. That non-intrusive clustering philosophy is similar to Spring's goal of making the development of enterprise applications less tedious.

In this interview with Artima, Spring project founder Rod Johnson and Terracotta co-founder Ari Zilka discuss how custom-scoped beans, introduced in Spring 2.0, support session clustering with Terracotta, how taking away an API can lead to more flexible code, and why home-grown frameworks and scalability don't often mix.

Frank Sommers: Developers have a wealth of options when it comes to clustering an enterprise application. What's unique about Terracotta's clustering?

Ari Zilka: Scalability and availability is why you cluster. You can't build an infinitely large machine, so you need to scale by chaining a bunch of machines together. And you don't want one machine to be a single point of failure—you would want to run any enterprise application on two machines at a minimum. All clustering gives you these operational benefits.

But Terracotta also gives a benefit to the developer: a stateful programming model—you stay closer to pure POJOs, while simultaneously running in a stateless world. You can kill any JVM, start it up again, pick up where it left off. Developers get the ease-of-use of stateful programming, and operations get the ease-of-use of stateless runtime.

Frank Sommers: That's quite a handful. Can you explain how a stateful programming model can still provide the benefits of a stateless application?

Ari Zilka: Terracotta performs heap-level clustering. We're replicating critical bits of the heap where the developer tells us to. With this approach, we can deliver consistent object identities across the cluster. As a result, object referencing still works: if you want to put an object in a map and then into a list, that object is not copied into the two data structures. You end up with very powerful domain modeling semantics that are akin to non-clustered programming.
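The identity point can be illustrated on a single JVM with plain Java. The sketch below is only an assumption-laden illustration: the `IdentityDemo` class and its `Account` type are hypothetical, and nothing here calls Terracotta. It shows the ordinary reference semantics that, according to the interview, heap-level clustering aims to preserve across nodes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IdentityDemo {
    /** Hypothetical domain object, used only for this illustration. */
    static final class Account {
        int balance;
        Account(int balance) { this.balance = balance; }
    }

    /** Stores one Account in a map and a list, updates it via the map, reads it via the list. */
    static int updateThroughMapReadThroughList() {
        Account account = new Account(100);
        Map<String, Account> byId = new HashMap<>();
        List<Account> all = new ArrayList<>();
        byId.put("acct-1", account); // stores a reference, not a copy
        all.add(account);            // the same reference again

        byId.get("acct-1").balance += 50;
        return all.get(0).balance;   // the list sees the update made via the map
    }

    public static void main(String[] args) {
        System.out.println(updateThroughMapReadThroughList()); // prints 150
    }
}
```

With copy-based replication, the map and the list would typically each end up with their own copy after a round trip, and the update would be visible through only one of them.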

Most of our users are concerned with the quality of their code base, the simplicity of their design, and adherence to their initial intentions. You can develop with Terracotta in a manner that's closer to how you envision the problem and the solution in the first place. In some cases it's completely drop-in and requires zero code, in some cases you have to re-design parts of your application, but in all cases you don't have to go through a wholesale redesign. Terracotta's clustering does not change the way you build apps.

Frank Sommers: What does Terracotta bring to Spring developers?

Ari Zilka: Spring closes some gaps between what an enterprise developer needs and what pure Java provides. Spring gives you access to messaging, O/R mappers, transactionality, and so on, in a clean, POJO-like way. Terracotta adds to that mix the ability to run an application without a big, heavy application server: a giant, monolithic container that somehow does some fancy bean-managed persistence, container-managed persistence, or session clustering.

With Spring and Terracotta, you can write business logic in one contiguous place, then wire up to a messaging infrastructure and databases in one logical manner, and then you can get the critical state shared across copies of an application running on different VMs.

Rod Johnson: We see Spring being used in a lot of heavy-weight enterprise scenarios. It solves a lot of problems where clustering comes up as a requirement.

The clustering you get with application servers is pretty limited. Instead, a lot of our users are interested in using Spring plus a dedicated clustering solution. They enjoy that kind of modular approach. They're assembling the components that deliver the functionality they need rather than saying, "Hey, there is this monolithic thing that does some clustering, although not very well, and does some persistence, but that's not very good either, but I'm just going to swallow this because it comes as one unit."

Towards a Stateful Model

Frank Sommers: You mentioned the ability to develop with stateful beans. Stateful beans acquired a bad reputation among many developers, because they are harder to manage and scale. Why do you advocate a stateful model now?

Rod Johnson: Traditionally, in Spring the emphasis has been on the stateless programming model. In many cases, people have built with Spring the same style of applications they have built using, say, stateless session beans. That was the approach that has over time been found to scale best in application servers.

We are now seeing interest in moving towards a more stateful model. It's at a fairly early stage in the market, but we see considerable interest. That is where some of the new scoping features in Spring 2.0 come into play. You can have Spring transparently manage your bean with a custom life-cycle. If you're running on a single node, that can be done with a very simple backing store, say, a Map. If you wanted to scale out, you could use a clustering solution, such as Terracotta, behind it. That's purely a matter of configuration, and is consistent with Spring's approach: you get a consistent, simple, and productive programming model, and you can configure that in a way that makes the most of your deployment environment.

Another interesting long-term trend we're seeing is people moving some stuff out of relational databases. A few years ago it seemed that enterprises used a database as a backing store, but what we're actually seeing now, especially in really high-end enterprise environments, is an interest in trying to take some of that functionality out of the database.

Of course, if you do that, you have to solve some of the problems databases do solve. Those problems re-surface in other ways, such as the clustering problems. But you get some very interesting benefits, such as something more object-oriented—you don't have to solve the object-relational impedance mismatch—and you get something that can be many times faster, if done appropriately.

We're seeing situations where people are not necessarily assuming that every piece of data needs to be reliably stored. They don't need to store that kind of data in the database. It may well be that, although some data needs to go into the database, not all working data you would automatically in the past have put in the database needs to be stored there.

There are a range of things that need to have a really permanent record—data for warehousing or auditing, for example. But there is quite a range of things that don't. If you can remove that from automatically going into the database, then you also get a great benefit of simplicity because now you're dealing in pure Java objects.

Ari Zilka: You have what is tantamount to transient data that people want to keep within the application. The only reason to flush it to a database is that if they lost a JVM, they would lose that VM's heap, the memory where that transient data is stored. So they flush things to a database for reliability or durability. You can avoid that with Terracotta, or other clustering solutions, and we see developers opening up to the possibility of using the JVM's own heap as durable memory.

Rod Johnson: People have not been very convinced about the stateful session bean model mainly because they have not been convinced that application servers provided an adequate level of reliability there. People do feel better about some of the high-end clustered caching solutions that are emerging in the market. And that is encouraging, especially for users who can afford those kinds of products. Over time we'll see those products being used by a wider audience.

Custom-Scoped Spring Beans

Frank Sommers: Spring 2.0 moved forward the art of stateful applications with custom-scoped beans. Can you explain what they are and what role they play in a clustered Spring application?

Rod Johnson: Spring has historically provided two scopes for beans. There's a singleton scope—an object whose life-time is tied to that of the owning Spring container. When the container comes across a bean definition, it holds a strong reference [to] it for the duration of the runtime.

There are a whole range of things you want modeled like that. For example, much of your critical infrastructure, your transaction manager, for instance, should have exactly that duration. And so should stateless services: if you have services that can naturally process requests with shared state, that works very well.

The other scope is the prototype scope. Every time you ask the container for an instance of a component with the name of a bean with prototype scope, you get a distinct, but identically configured, instance. If you inject a singleton bean into multiple other beans, you're really injecting the same instance. Whereas with the prototype—or non-singleton—scope, every time you inject a component with a given name, you're injecting an independent, but identically configured instance.
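The difference between the two scopes can be sketched with a toy container. This is a minimal illustration in plain Java, not Spring's implementation: `TinyContainer`, `register`, and the bean names are hypothetical, and unlike a Spring container (which normally creates singletons at startup) this sketch creates the singleton lazily on first request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class TinyContainer {
    /** Hypothetical bean definition: a factory plus a singleton/prototype flag. */
    private static final class Definition {
        final boolean singleton;
        final Supplier<?> factory;
        Definition(boolean singleton, Supplier<?> factory) {
            this.singleton = singleton;
            this.factory = factory;
        }
    }

    private final Map<String, Definition> definitions = new HashMap<>();
    private final Map<String, Object> singletons = new HashMap<>();

    void register(String name, boolean singleton, Supplier<?> factory) {
        definitions.put(name, new Definition(singleton, factory));
    }

    Object getBean(String name) {
        Definition definition = definitions.get(name);
        if (definition == null) {
            throw new IllegalArgumentException("No bean named " + name);
        }
        if (definition.singleton) {
            // Singleton: the container keeps a strong reference and always returns it.
            return singletons.computeIfAbsent(name, n -> definition.factory.get());
        }
        // Prototype: every request yields a distinct, identically configured instance.
        return definition.factory.get();
    }

    public static void main(String[] args) {
        TinyContainer container = new TinyContainer();
        container.register("transactionManager", true, Object::new);
        container.register("shoppingCart", false, Object::new);
        System.out.println(container.getBean("transactionManager") == container.getBean("transactionManager")); // true
        System.out.println(container.getBean("shoppingCart") == container.getBean("shoppingCart"));             // false
    }
}
```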

You can, of course, ask the container to create a prototype for you on the fly. That's quite useful if, for example, you have a short-lived object that's difficult to configure, and you want to have it constructed in a more sophisticated way than using a constructor.

In Spring 2.0, we add the notion of custom scope. Unlike the singleton and prototype scopes, you can have a scope with an arbitrary name, and back that [scope] using an out-of-the-box implementation that Spring provides, or using a custom implementation. We have out-of-the-box implementations for HTTP session and HTTP servlet request scopes.
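A custom scope backed by a simple store, as mentioned for the single-node case, might look roughly like the sketch below. It is an illustration under stated assumptions: `SimpleScope` is a hypothetical, simplified stand-in for a scope contract and is not Spring's actual `Scope` interface, and `MapBackedScope` is likewise made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical, simplified scope contract (not Spring's real interface). */
interface SimpleScope {
    /** Returns the object stored under beanName, creating it with the factory if absent. */
    Object get(String beanName, Supplier<?> factory);

    /** Removes and returns the object stored under beanName, or null if there is none. */
    Object remove(String beanName);
}

/** Scope whose lifetime is simply "as long as the entry stays in the backing map". */
class MapBackedScope implements SimpleScope {
    private final Map<String, Object> store;

    MapBackedScope(Map<String, Object> store) {
        this.store = store;
    }

    @Override
    public Object get(String beanName, Supplier<?> factory) {
        return store.computeIfAbsent(beanName, name -> factory.get());
    }

    @Override
    public Object remove(String beanName) {
        return store.remove(beanName);
    }

    public static void main(String[] args) {
        SimpleScope scope = new MapBackedScope(new ConcurrentHashMap<>());
        Object first = scope.get("foo", StringBuilder::new);
        Object second = scope.get("foo", StringBuilder::new);
        System.out.println(first == second); // true: same scoped instance
    }
}
```

The design point is that the calling code depends only on the scope contract. Which map backs it, a local one or one shared by a clustering product, is a deployment decision that this sketch leaves open.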

If you specify session scope, Spring will do the work of resolving the correct current session for the user sending the request, and give you the object for that user. So it will figure out that an object with the name foo is relevant to the user processing a particular request.

That can work either through explicit getBean() calls or through injection, because we can use what we call a scope proxy. You can actually inject one of these scoped beans into another object, so it's completely transparent. When you refer to the object that was injected, Spring automatically resolves the correct target object.

Custom scopes are an extensible, general mechanism, and are not tied to the traditional Web application middle tier. The mechanism can be opened up to allow a wider range of out-of-the-box possibilities, and also a greater range of custom ones, too.

A primary case for this is integration with a caching product. You can transparently have Spring resolve an object from a particular caching product, even providing some method of resolving that [object] to a particular user. How that's done with a caching product will, of course, depend on the implementation.

Frank Sommers: Let's talk a bit about implementation. What layers of a Spring application does Terracotta cluster, and how does that clustering work?

Ari Zilka: Terracotta currently clusters singleton beans and custom-scoped beans under Spring 2.0.

How session clustering is implemented is not specific to Spring, however. Spring and Terracotta interface at the sessions level. Because of Spring 2.0's support for session scope, you can easily wire your bean to use that scope, and then you can seamlessly cluster those beans via sessions.

At the end of the day, a container, such as Tomcat, manages session life-time. The session is created in the cluster by the first application server context that fields the HTTP request that requires the session. Say you have five servers, and access a Web application on server 1. Server 1 says, "OK, your browser has cookie ID 123. Do I have a session object in my session map for session 123? No, I don't, so I set a new session up." When that happens, the session is available to any node in the cluster. Servers 1 through 5 may see workload for your session, and the load balancer can say, "Server 1 is slow, let's send the next request to server 3." And server 3 says, "Do I have session 123? Yes, I do."
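The lookup flow described above can be sketched in plain Java. This is a single-JVM illustration only: one `ConcurrentHashMap` stands in for the session map that would be shared across the cluster, and `SessionLookupSketch`, `ServerNode`, and `handleRequest` are hypothetical names, not Tomcat or Terracotta APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionLookupSketch {
    /** Hypothetical server node; every node is handed the same session map. */
    static final class ServerNode {
        private final String name;
        private final Map<String, Map<String, Object>> sessions;

        ServerNode(String name, Map<String, Map<String, Object>> sessions) {
            this.name = name;
            this.sessions = sessions;
        }

        /** Looks up the session for the cookie id and creates it if no node has done so yet. */
        String handleRequest(String sessionId) {
            Map<String, Object> fresh = new ConcurrentHashMap<>();
            Map<String, Object> existing = sessions.putIfAbsent(sessionId, fresh);
            return name + (existing == null ? " created" : " found") + " session " + sessionId;
        }
    }

    public static void main(String[] args) {
        // Stand-in for the clustered session map; here it is just one in-process map.
        Map<String, Map<String, Object>> shared = new ConcurrentHashMap<>();
        ServerNode server1 = new ServerNode("server1", shared);
        ServerNode server3 = new ServerNode("server3", shared);

        System.out.println(server1.handleRequest("123")); // server1 created session 123
        System.out.println(server3.handleRequest("123")); // server3 found session 123
    }
}
```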

When the reaper thread inside Tomcat wakes up, gets an iterator to the session map, and starts blowing sessions away that haven't been accessed in some time, if your session gets removed from the session map, it's removed from the cluster as well.
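The reaper behavior can be sketched as an iterator pass over the session map. The code below is a simplified assumption about how such a pass might look, not Tomcat's actual implementation; `SessionReaperSketch`, `reap`, and the use of a last-access timestamp per session id are all hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class SessionReaperSketch {
    /**
     * Removes every session that has been idle longer than maxIdleMillis.
     * Returns the number of sessions removed.
     */
    static int reap(Map<String, Long> lastAccessBySessionId, long nowMillis, long maxIdleMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccessBySessionId.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > maxIdleMillis) {
                it.remove(); // removing through the iterator avoids ConcurrentModificationException
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Map<String, Long> lastAccess = new HashMap<>();
        lastAccess.put("123", 0L);   // idle for 1000 ms at the time of the pass
        lastAccess.put("456", 900L); // idle for 100 ms at the time of the pass
        System.out.println(reap(lastAccess, 1000L, 500L)); // prints 1
        System.out.println(lastAccess.keySet());           // prints [456]
    }
}
```

If the map being iterated is the shared one, removing an entry from it is what takes the session out of the cluster, which matches the behavior described above.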

That gives you completely transparent semantics in the session clustering case. Spring makes it easier to use the session scoping, and Terracotta makes sessions available across multiple machines. That's where we get into the notion of enterprise-class applications without a big, monolithic, heavy, and somewhat dissatisfying, app container that tries to be all things to all people.

Taking Away the API

Frank Sommers: Most developers want to design applications that scale. In addition to support for clustering via session-scoped beans, what's unique about Spring in supporting highly scalable applications?

Rod Johnson: One of the key things to Spring is taking away the API. For example, when you're using Spring's custom scope concept, such as the session scope, you've abstracted away the API.

For example, if I were developing a Web application and wanted to resolve a particular object from the HTTP session, I could be doing a lookup in the session. That's certainly not a terrible thing, and has been done successfully for years. However, in Spring 2.0, I have the ability to inject into a Spring MVC controller an object with a session scope, and specify that that [object] should be proxied.

Imagine that I've got an account controller (a controller that acts on different account objects) and I want to act on the account that belongs to the current user. You can inject an account object into that controller, and Spring can do the work of giving you a proxy that you can hold a shared reference to. When you invoke the proxy, it correctly figures out from the HTTP session which account should handle the call. If an account doesn't exist, it will create one and put it in the session so that you can work with it.
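The account controller example can be approximated with a JDK dynamic proxy. The sketch below is not how Spring implements scoped proxies; it only illustrates the idea under stated assumptions. `ScopedProxySketch`, `Account`, `AccountController`, the `CURRENT_SESSION_ID` thread-local, and the map standing in for session storage are all hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class ScopedProxySketch {
    /** Hypothetical domain interface. */
    interface Account {
        void deposit(int amount);
        int balance();
    }

    static final class SimpleAccount implements Account {
        private int balance;
        public void deposit(int amount) { balance += amount; }
        public int balance() { return balance; }
    }

    /** Stand-in for "the session of the request being handled on this thread". */
    static final ThreadLocal<String> CURRENT_SESSION_ID = new ThreadLocal<>();

    /** Builds a proxy that resolves the current session's Account on every call. */
    static Account scopedAccountProxy(Map<String, Account> accountsBySession, Supplier<Account> factory) {
        InvocationHandler handler = (proxy, method, args) -> {
            String sessionId = CURRENT_SESSION_ID.get();
            if (sessionId == null) {
                throw new IllegalStateException("No current session");
            }
            // Create the account on first use and keep it with the session.
            Account target = accountsBySession.computeIfAbsent(sessionId, id -> factory.get());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the target method itself threw
            }
        };
        return (Account) Proxy.newProxyInstance(
                Account.class.getClassLoader(), new Class<?>[] {Account.class}, handler);
    }

    /** Hypothetical controller that holds one shared reference and never touches a session API. */
    static final class AccountController {
        private final Account account;
        AccountController(Account account) { this.account = account; }
        int deposit(int amount) {
            account.deposit(amount);
            return account.balance();
        }
    }

    public static void main(String[] args) {
        Map<String, Account> sessions = new ConcurrentHashMap<>();
        AccountController controller =
                new AccountController(scopedAccountProxy(sessions, SimpleAccount::new));

        CURRENT_SESSION_ID.set("alice");
        System.out.println(controller.deposit(100)); // 100
        CURRENT_SESSION_ID.set("bob");
        System.out.println(controller.deposit(5));   // 5
        CURRENT_SESSION_ID.set("alice");
        System.out.println(controller.deposit(1));   // 101
    }
}
```

The controller contains no lookup code at all. Replacing the map with a different backing store would not require touching it.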

This means that we've taken away the API. If you take away an API, you would think you were sacrificing power, that there were things you couldn't do any more. It turns out that if you do that in a sophisticated way, however, you are increasing the range of scenarios you can run that code in, you're increasing the potential of that code.

In the Web tier, that is merely a matter of convenience, but imagine that you wanted to work further down the stack in the middle tier, and wanted to work with objects that also were scoped to the current user's session, or some other scope boundary. Removing the API in that layer is particularly important because the middle tier shouldn't know anything about the Web tier, and certainly shouldn't depend on a Web-tier API to resolve the correct instance of the component.

With the clustered session approach, as it is abstracted away by Spring's API-less custom scope feature, you can propagate a scope as deeply as you want to, down through your application's wiring in Spring. That means the custom scope is still benefiting code that doesn't know anything about the Web tier. You can potentially deploy that application with a different scenario, with a completely different backing store for your clustered scope, and you would be able to get total re-use of your business logic. That's a good example of how removing the explicit lookup code to resolve objects from the session—taking away that API—gives you quite a lot of benefits. It increases the possibilities.

With an API, what you'd end up doing is either the naive, horrible way where your Web tier completely leaks down to the middle tier, or what a lot of people do: re-invent your own scope context and pass that thing down. That tends to be messy. It's much nicer to remove the API.

Frank Sommers: This also speaks to using a well-designed, ready-made framework versus inventing a home-brew framework. In your experience, how do home-grown frameworks fare when it comes to scalability and clustering?

Rod Johnson: If people are building their own home-grown, hand-wrought infrastructure, which we've seen in the past, it is going to differ from company to company, and there typically will be numerous different ways of trying to achieve the same goals in similar projects.

If you sit down and try to make those solutions interface efficiently with a product like Terracotta, it's not going to be transparent, it's going to require some code changes. And it's going to be a lot of work, and there is a chance that you miss something. If you're using Spring, it can be transparent, because the work of doing the integration is handled by the Spring developers and by the Terracotta developers.

In Spring 2.0, there are a number of integration hooks that enable quality-of-service features to be dropped in without actually changing the programming model. There are a whole range of things beneath the hood not directly experienced by users, but that really help a lot.

There is, for example, the ability to add arbitrary metadata to a bean definition, which didn't exist in the past. It's possible to add metadata to Spring's own component metadata that a product like Terracotta can use to store its information in a way that it can pull that information out anywhere else without requiring modifications to Spring. There are more lifecycle call-backs that advanced pieces of infrastructure can use to take advantage of the container's management of objects. That means that a whole bunch of things can be done transparently.
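The idea of attaching arbitrary metadata to a bean definition can be sketched as a definition object that carries a free-form attribute map. This is a stand-alone illustration, not Spring's API: `SimpleBeanDefinition`, the attribute key, the example class name, and `isClustered` are hypothetical, and how Terracotta actually uses such metadata is not described in the interview.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BeanMetadataSketch {
    /** Hypothetical, minimal bean definition that can carry arbitrary attributes. */
    static final class SimpleBeanDefinition {
        private final String beanClassName;
        private final Map<String, Object> attributes = new LinkedHashMap<>();

        SimpleBeanDefinition(String beanClassName) {
            this.beanClassName = beanClassName;
        }

        String getBeanClassName() { return beanClassName; }

        void setAttribute(String name, Object value) { attributes.put(name, value); }

        Object getAttribute(String name) { return attributes.get(name); }
    }

    /** Hypothetical key that an infrastructure integration might choose for its own data. */
    static final String CLUSTERED_KEY = "example.clustering.clustered";

    /** Reads the integration's marker back out without the container knowing its meaning. */
    static boolean isClustered(SimpleBeanDefinition definition) {
        return Boolean.TRUE.equals(definition.getAttribute(CLUSTERED_KEY));
    }

    public static void main(String[] args) {
        SimpleBeanDefinition cart = new SimpleBeanDefinition("com.example.ShoppingCart");
        cart.setAttribute(CLUSTERED_KEY, Boolean.TRUE);
        System.out.println(isClustered(cart)); // true
    }
}
```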

Resources

[1] Terracotta
http://www.terracottatech.com

[2] The Spring Framework
http://www.springframework.org

About the author

Frank Sommers is a Senior Editor with Artima Developer. He also serves as chief editor of the IEEE Technical Committee on Scalable Computing's newsletter, and is an elected member of the Jini Community's Technical Advisory Committee. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld.