The Artima Developer Community

Java Community News
Brian Oliver on Why Distributed Caching is Not What Spring Developers Need

1 reply on 1 page. Most recent reply: Dec 17, 2006 1:16 AM by Nati Shalom

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Brian Oliver on Why Distributed Caching is Not What Spring Developers Need Posted: Dec 14, 2006 2:34 PM
Summary
At Spring Experience 2006, Tangosol announced tighter integration between its Coherence distributed cache and the Spring framework. Tangosol's Brian Oliver spoke with Artima about this integration, explaining how a data grid is a natural outgrowth of distributed caching, and how Tangosol's grid-aware Spring bean helps scale enterprise applications.

Frank Sommers: At Spring Experience earlier this month, you said that simply integrating Tangosol's Coherence distributed cache with Spring was not what Spring users really wanted. Can you explain what you meant?

Brian Oliver: We talked with the Spring developers, including Rod Johnson, about what Coherence integration with Spring would look like: Should we provide Spring users with Coherence as just a clustering technology, or offer something else? That came up because Coherence is more than just a clustered cache. That's where it started, but Coherence does a lot more than clustered caching now.

If you think about caching in general, what does caching do? It keeps a local copy [for] low-latency access to some data. The whole concept of caching is about getting to that data.

Even if you think about distributed caching, the paradigm is that an application—a client—gets data from the cache, does some processing in the client, and then puts the results back in the cache. If you have a large distributed cluster with lots of resources, the resources are effectively doing nothing—they're just storing data. So you have a lot of servers that are just shipping data around, and are not actually doing that much work.

The standard way to think about caching in Spring is that a Spring application developer uses clustered caching to store some state for an application, and scale out by [connecting] more Spring applications to that cache. The actual processing is done in those [Spring] applications, not in the cluster, even though the cluster might have considerable resources available.

Say, you have an order object for a trade, and you want to update that order. To do that safely in a typical cluster environment, you'd have to lock the order, and then get that order back into your Spring application to do the update. Suppose you want to add a new field to that order. That [operation] is done locally, and takes almost no time whatsoever. Then you put the order back in the cache, and finally unlock it.

That's exactly what you see in many of the clustered [cache] technologies, including JavaSpaces. That's the model: Take it from the space or the cluster, do some work, then put the object back in the space. In the process, you move a lot of data across the network. We figure that it takes about fourteen network hops to do this single update in most clustered cache environments. All the processing done at the cluster nodes [facilitates] that data transfer, and not the actual update.
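The lock, get, update, put, unlock sequence described above can be sketched in a few lines of Java. This is only an illustration: `ToyCache` is a hypothetical in-memory stand-in, not the Coherence API, and its `remoteCalls` counter counts client-to-cache calls rather than network hops. The figure of about fourteen hops is Oliver's and is not modeled here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical in-memory stand-in for a clustered cache. This is NOT the Coherence API.
class LockingUpdateSketch {

    static class ToyCache {
        private final Map<String, Map<String, String>> data = new ConcurrentHashMap<>();
        private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
        int remoteCalls = 0; // client-to-cache calls; each could cost several hops in a real cluster

        void lock(String key) {
            remoteCalls++;
            locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
        }

        void unlock(String key) {
            remoteCalls++;
            locks.get(key).unlock();
        }

        // Returns a copy, standing in for the whole object being shipped to the client.
        Map<String, String> get(String key) {
            remoteCalls++;
            return new HashMap<>(data.get(key));
        }

        // Stores a copy, standing in for the whole object being shipped back.
        void put(String key, Map<String, String> value) {
            remoteCalls++;
            data.put(key, new HashMap<>(value));
        }
    }

    // Lock, fetch, update locally, write back, unlock.
    static void addField(ToyCache cache, String orderId, String field, String value) {
        cache.lock(orderId);
        try {
            Map<String, String> order = cache.get(orderId); // whole order travels to the client
            order.put(field, value);                        // the actual work, nearly free
            cache.put(orderId, order);                      // whole order travels back
        } finally {
            cache.unlock(orderId);
        }
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache();
        cache.put("order-1", Map.of("symbol", "XYZ", "quantity", "100"));
        int before = cache.remoteCalls;
        addField(cache, "order-1", "status", "AMENDED");
        int calls = cache.remoteCalls - before;
        System.out.println("status = " + cache.get("order-1").get("status"));
        System.out.println("cache calls for one field update: " + calls);
    }
}
```

Even in this toy version, one tiny field change costs four separate cache calls and moves the full order twice, which is the waste the interview is pointing at.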

Frank Sommers: You're saying that resources are wasted because the clustered cache acts as a large passive memory. But you also said that caching provides an application with low-latency access to data, which sounds like a good idea. How can you have the best of both worlds—distributed caching with better resource utilization?

Brian Oliver: We have a feature called entry processors. An entry processor is a command object, an implementation of the command pattern, that is shipped to the data in the cluster, and is executed directly on the data inside the cluster. It's almost like an inversion of work: Instead of bringing data to the client to do the work, you send the command to the data and do the work within the cluster. It's often described as a "datagrid."

With an entry processor in Coherence, you can take a command with some state in it, ship that command with its state into the cluster, for a particular order object, and execute that command there.
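A minimal Java sketch of that idea follows, under stated assumptions: `Command`, `SetField`, and `ToyGridNode` are hypothetical names invented for illustration, and the real Coherence entry-processor interfaces differ. The command is serialized to bytes to stand in for shipping it across the wire, then executed on the node that holds the order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EntryProcessorSketch {

    // Hypothetical command interface; the real Coherence API differs.
    interface Command extends Serializable {
        Map<String, String> apply(Map<String, String> current);
    }

    // A command that carries its own state: which field to set, and to what.
    static final class SetField implements Command {
        private final String field;
        private final String value;

        SetField(String field, String value) {
            this.field = field;
            this.value = value;
        }

        @Override
        public Map<String, String> apply(Map<String, String> order) {
            Map<String, String> updated = new HashMap<>(order);
            updated.put(field, value);
            return updated;
        }
    }

    // Stands in for putting the command on the wire.
    static byte[] serialize(Command command) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(command);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Command deserialize(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return (Command) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for the one grid node that owns the data.
    static class ToyGridNode {
        private final ConcurrentHashMap<String, Map<String, String>> data = new ConcurrentHashMap<>();

        void put(String key, Map<String, String> value) {
            data.put(key, new HashMap<>(value));
        }

        Map<String, String> get(String key) {
            return data.get(key);
        }

        // The command comes to the data; the order itself never leaves the node.
        void invoke(String key, byte[] wireCommand) {
            final Command command = deserialize(wireCommand);
            data.computeIfPresent(key, (k, current) -> command.apply(current));
        }
    }

    public static void main(String[] args) {
        ToyGridNode node = new ToyGridNode();
        node.put("order-1", Map.of("symbol", "XYZ", "quantity", "100"));
        byte[] wire = serialize(new SetField("status", "AMENDED"));
        node.invoke("order-1", wire);
        System.out.println("command size on the wire: " + wire.length + " bytes");
        System.out.println("status = " + node.get("order-1").get("status"));
    }
}
```

`ConcurrentHashMap.computeIfPresent` applies the function atomically for that key, so the sketch needs no explicit lock around the update.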

With this pattern, we're no longer locking the data. Every object in the grid has its own queue for entry processors to work on. You go from a synchronous lock model across a network to a completely lock-free model for performing updates on a grid.

With locking, suppose you have a thread that wants to update data. You really have three transactions. They're effectively in a queue, because they're waiting to acquire locks, they're waiting for each other. Instead of locking, you can just put their work in a queue.
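The queue-instead-of-lock idea can be sketched with one single-threaded executor per entry. This is an illustrative assumption about how such a queue could be built in plain Java, not a description of Coherence internals; `QueuedGrid` and `runDemo` are hypothetical names. Three client threads each send 100 increment commands for the same order, and no client ever takes a lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

class PerEntryQueueSketch {

    // Hypothetical grid stand-in: every key gets its own work queue.
    static class QueuedGrid {
        private final Map<String, Integer> data = new ConcurrentHashMap<>();
        private final Map<String, ExecutorService> queues = new ConcurrentHashMap<>();

        void put(String key, int value) {
            data.put(key, value);
        }

        int get(String key) {
            return data.get(key);
        }

        // Commands for the same key run one at a time, in submission order.
        Future<?> submit(String key, UnaryOperator<Integer> command) {
            ExecutorService queue =
                queues.computeIfAbsent(key, k -> Executors.newSingleThreadExecutor());
            return queue.submit(() -> {
                data.put(key, command.apply(data.get(key)));
            });
        }

        void shutdown() {
            queues.values().forEach(ExecutorService::shutdown);
        }
    }

    static void await(Future<?> future) {
        try {
            future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Three concurrent clients, 100 increments each, all against one entry.
    static int runDemo() {
        QueuedGrid grid = new QueuedGrid();
        grid.put("order-1", 0);
        ExecutorService clients = Executors.newFixedThreadPool(3);
        List<Future<?>> clientTasks = new ArrayList<>();
        for (int c = 0; c < 3; c++) {
            clientTasks.add(clients.submit(() -> {
                for (int i = 0; i < 100; i++) {
                    await(grid.submit("order-1", quantity -> quantity + 1));
                }
            }));
        }
        clientTasks.forEach(PerEntryQueueSketch::await);
        clients.shutdown();
        grid.shutdown();
        return grid.get("order-1");
    }

    public static void main(String[] args) {
        System.out.println("final quantity = " + runDemo());
    }
}
```

Because only the entry's own queue thread ever performs the read-modify-write, no increment is lost, and the clients wait on a queue instead of contending for a lock across the network.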

With the standard [locking] model, to scale out processing, you have to scale out the clients, not the grid. With this [lock-free model], you scale out the processing by scaling out the grid, which is where all the computing power is anyway.

It's a standard practice in distributed computing to use lock-free models. But we built that into Coherence: If the server that's processing an entry processor dies, Coherence guarantees that that work will still be performed—you never lose any processing.

Frank Sommers: What's inside a command object?

Brian Oliver: It's Java code that is deployed to the grid already. We're just taking an object that is an instance of a command, and shipping that across the wire. You typically find that the size of the object you want to perform an update on is much larger than the [command] code.

In addition to eliminating the need to ship around large objects, the lock-free model reduces the network calls required to perform an operation. Whereas the locking model uses about fourteen network hops per operation, the lock-free model uses only four. So not only do you ship a smaller object, you're also bringing that object across the network a lot fewer times.

Frank Sommers: The lock-free model sounds useful, but how does it relate to Spring?

Brian Oliver: We've come up with a concept called a datagrid bean. We introduce its interface as a standard Spring bean. This interface is in effect a proxy to, say, a Person object or a Trade object.

When you invoke this [datagrid] interface, you can specify which methods to invoke locally, and which ones to execute in the grid. For the Spring developer, a datagrid bean looks and feels like a standard bean, but in terms of performance, it can be completely distributed. The Spring developer doesn't have to worry about maintaining locks.
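One way to picture such a proxy is with a JDK dynamic proxy that dispatches per method. The sketch below is an assumption made for illustration only: the `@InGrid` annotation, the `Trade` interface, and the `datagridBean` factory are hypothetical and do not reflect how the Tangosol integration is actually configured. The "grid" branch only records where the call would run; a real one would ship a command to the node that owns the data.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class DatagridBeanSketch {

    // Hypothetical marker: methods carrying it are meant to execute in the grid.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface InGrid {}

    interface Trade {
        String describe();          // invoked locally

        @InGrid
        void amend(int quantity);   // routed to where the data lives
    }

    static class TradeImpl implements Trade {
        private final String symbol;
        private int quantity;

        TradeImpl(String symbol, int quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }

        @Override
        public String describe() {
            return symbol + " x " + quantity;
        }

        @Override
        public void amend(int quantity) {
            this.quantity = quantity;
        }
    }

    // Records where each call was dispatched, so the routing is observable.
    static final List<String> log = new ArrayList<>();

    static Trade datagridBean(Trade target) {
        InvocationHandler handler = (proxy, method, args) -> {
            boolean inGrid = method.isAnnotationPresent(InGrid.class);
            log.add((inGrid ? "grid:" : "local:") + method.getName());
            try {
                // A real grid branch would send a command to the owning node here.
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause();
            }
        };
        return (Trade) Proxy.newProxyInstance(
            Trade.class.getClassLoader(), new Class<?>[] {Trade.class}, handler);
    }

    public static void main(String[] args) {
        Trade trade = datagridBean(new TradeImpl("XYZ", 100));
        trade.amend(250);
        System.out.println(trade.describe());
        System.out.println(log);
    }
}
```

The caller only ever sees the `Trade` interface, which matches the point that the bean looks and feels like a standard bean while the dispatch decision stays behind the proxy.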

Frank Sommers: How does a developer decide between local and distributed invocation of a datagrid bean method?

Brian Oliver: You first have to consider the size of the object. If you have a large object that would take several [network] packets to send, or even a megabyte-size object, and you just want to update a few fields in that object, an entry processor is the most efficient way to do that.

The other consideration is that you get to use the lock-free model. When you have millions of transactions that need to be executed, you can't just start locking objects on the network. That just doesn't work. You have to use a lock-free model, and it's pointless to have a cluster act as basically just a large shared memory and not do any work.

If you look at all the caching solutions out there, that's pretty much what they do. They keep data in an intermediate state, in local memory. They sit directly between your application's cache and your database. You're keeping stuff in memory, but don't actually do any work on that [data].

It was interesting to see how Spring developers reacted to the concept of having a data grid, and that Spring itself could be doing processing either within the data grid, or outside as one of the [grid] client applications.

When we thought hard about the motivations for what Spring developers really needed from Coherence, we realized that merely providing [easier access to] Coherence's caching was not one [of those needs]. Coherence has always worked with Spring as a distributed cache, although developers needed to do some work. Now we made that even easier. However, we also realized that what [Spring] developers needed the most was the ability to scale out by having Spring participate in a data grid, and that is what we provided with this integration.


Nati Shalom

Posts: 3
Nickname: natis
Registered: Oct, 2002

Re: Brian Oliver on Why Distributed Caching is Not What Spring Developers N Posted: Dec 17, 2006 1:16 AM
"...in many of the clustered [cache] technologies, including JavaSpaces. That's the model: Take it from the space or the cluster, do some work, then put the object back in the space. In the process, you move a lot of data across the network..."

Just to put things in the correct context: collocation of data and code is an implementation detail, not a limitation of the space as a model. With some implementations, such as ours (GigaSpaces), you can choose to trigger events where the data is, just by updating the relevant data through the notify API. The notify call can be collocated with the data. With Spring, this would be done through the DAO interface, which also supports declarative transactions, or by using the Declarative-Cache. In addition, you can use Spring Remoting to invoke methods on specific beans based on affinity with a specific data item; i.e., if you invoke a method bean.getAccount(), the getAccount() call will be routed to the location where the account data instance resides and will be invoked on the bean that happens to be collocated with the data. This pattern has existed for a long time with spaces and is a core part of the Space Based Architecture (SBA). It's been part of Spring Modules for a while now and is already being used in production.
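As a rough illustration of affinity-based routing, here is a small Java sketch. All names in it (`Node`, `Router`, `getBalance`) are hypothetical, and the hash-modulo placement is an assumption chosen for brevity, not a description of how GigaSpaces or Spring Remoting actually assign data to locations.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AffinityRoutingSketch {

    // Stand-in for one location that holds both the data and the bean logic.
    static class Node {
        final String name;
        final Map<String, Integer> accounts = new HashMap<>();

        Node(String name) {
            this.name = name;
        }

        // Runs next to the data it reads.
        int getBalance(String accountId) {
            return accounts.get(accountId);
        }
    }

    // Routes each call to the node that owns the account (assumed hash-modulo placement).
    static class Router {
        private final List<Node> nodes;

        Router(List<Node> nodes) {
            this.nodes = nodes;
        }

        Node ownerOf(String accountId) {
            return nodes.get(Math.floorMod(accountId.hashCode(), nodes.size()));
        }

        void store(String accountId, int balance) {
            ownerOf(accountId).accounts.put(accountId, balance);
        }

        int getBalance(String accountId) {
            return ownerOf(accountId).getBalance(accountId);
        }
    }

    public static void main(String[] args) {
        Router router = new Router(List.of(new Node("node-a"), new Node("node-b")));
        router.store("acct-42", 1000);
        System.out.println("owner = " + router.ownerOf("acct-42").name);
        System.out.println("balance = " + router.getBalance("acct-42"));
    }
}
```

The same key always maps to the same node, so the method call and the account data meet at one location and the account never has to travel to the caller.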

I posted a short description of our Spring integration in the following blog entry:
http://www.gigaspacesblog.com/2006/12/09/sbas-general-principles/

The general concept behind SBA model can be found below under the following blog:
http://www.gigaspacesblog.com/2006/12/08/give-spring-some-space/


Nati S.
GigaSpaces
Write Once Scale Anywhere

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use