Artima Developer Spotlight Forum - Oracle's Cameron Purdy on Coherence 3.4 and the Near Real-Time Enterprise


Frank Sommers



Posted: Nov 3, 2008 12:11 PM

While enterprises have traditionally relied on batch-style processing to analyze large amounts of information, such analysis can now be performed in near real time, explains Oracle's Cameron Purdy in a recent interview with Artima. Near real-time data processing allows data changes to propagate very quickly through an enterprise's applications.

A key enabler of near real-time data processing is a shared data source, such as Oracle's Coherence datagrid. As manager of the Coherence team, Purdy highlights in the interview the most recent Coherence features, such as the new C++ client API, datagrid triggers, data streaming, and a pluggable object format:

We've been working on a C++ version of Coherence, and it's now part of our latest, 3.4, release. Just like we support C# and Visual Basic on .NET, and just as we support Java clients, we now support C++ clients as well.

We provide the same API we have in C# and Java, translated, of course, to C++. We offer the same capabilities across all those platforms: C and C++ applications can connect into the datagrid, access data in the datagrid, send parallel execution [requests] into the datagrid, number-crunch across the entire grid, do parallel queries, listen to events, use the continuous query capability, use near-caches, and so on.

It's a pure C++ implementation from the ground up, on top of our Extend protocol. In general, C++ is more difficult to develop in than Java, so this was a big project for us. It's about 160,000 new lines of code. It's also thread-safe and highly concurrent.

Thread-safety is especially important when you do event processing: If you have events coming back from the system into your C++ application, you have to have a thread [on the client] that can process those events as they come in. With our thread-safe C++ API, you not only can access the data grid from the client, but can also have data coming into the C++ application asynchronously, in real-time, as events occur in the datagrid. Event streaming and continuous queries, or data invalidation, can be very efficiently piped into the client in this manner.
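The client-side event thread described above can be illustrated with a minimal Java sketch. This is not the Coherence C++ API: `ClientEventDispatcher`, `onEventFromGrid`, and the use of plain strings as events are all hypothetical, chosen only to show one dedicated thread delivering events to listeners while the application's other threads carry on.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical client-side dispatcher; not a Coherence class.
final class ClientEventDispatcher {
    // A single dedicated thread delivers events in arrival order.
    private final ExecutorService eventThread = Executors.newSingleThreadExecutor();
    // Thread-safe list, so listeners can be added while events are flowing.
    private final List<Consumer<String>> listeners = new CopyOnWriteArrayList<>();

    void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    // Called by the (imagined) network layer when the grid pushes an event.
    void onEventFromGrid(String event) {
        eventThread.execute(() -> listeners.forEach(l -> l.accept(event)));
    }

    // Stops accepting events and waits for queued ones to be delivered.
    boolean shutdown(long timeoutMillis) {
        eventThread.shutdown();
        try {
            return eventThread.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ClientEventDispatcher dispatcher = new ClientEventDispatcher();
        dispatcher.addListener(e -> System.out.println("received " + e));
        dispatcher.onEventFromGrid("price-changed");
        dispatcher.shutdown(1000);
    }
}
```

Because a single thread does the delivery, listeners see events in the order they arrived, at the cost of one slow listener delaying the rest.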

Although we don't see as many new C++ applications as we saw, say, five years ago, there are still new C++ applications being built that need to use this type of capability. In addition, a lot of organizations have investments in C++ libraries, such as calculation libraries.

In high-performance computing environments, users often have large-scale processing requirements, such as banks doing end-of-day processing where the information is all rolled up, analyzed, calculated, and stored off. Traditionally, many organizations used end-of-day, batch-type, processing for those kinds of tasks. Then, a few years ago, we had customers that performed what they called intraday updates: They had updates flowing into the system every hour, or mid-day, or at some other time interval.

Their goal was always to move to a real-time environment: as information changes, those changes would trigger a sequence of other activities, workflows, or processes, to take advantage of that new information. In that sense, by "real-time" I mean a latency of maybe 5 or 10 seconds, not real-time in the sense of hard real-time.

As enterprises now try to take systems that were geared for large-scale end-of-day processing and make them perform in a more real-time manner, they need to re-use existing libraries for real-time processing. When real-time information is being managed by the datagrid, you need to be able to combine existing libraries with live information from the grid.

For example, you need events coming out of the datagrid telling the clients that the information you're basing a calculation on has changed, and that you need to pull the necessary information and re-do the calculation, and re-submit the result to other applications. A common example is a pricing application and a risk application: the output from the risk application may be the input to the pricing application. You have to base your prices on the potential cost of financial products to a bank, and those costs are determined, in part, by risk. So changes in the risk data managed by the grid need to be pushed into the pricing application in a near real-time manner.
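The risk-to-pricing flow can be sketched in a few lines of Java. Everything here is an assumption made for illustration: `ObservableRiskCache` stands in for the grid, `PricingClient` for the pricing application, and the pricing rule (base price plus a proportional risk premium) is invented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical pricing client: re-prices a product whenever its risk changes.
final class PricingClient {
    private final Map<String, Double> basePrices;
    private final Map<String, Double> prices = new HashMap<>();

    PricingClient(Map<String, Double> basePrices, ObservableRiskCache riskCache) {
        this.basePrices = basePrices;
        riskCache.addListener(this::reprice);
    }

    // Made-up pricing rule: base price plus a proportional risk premium.
    private void reprice(String product, double riskFactor) {
        Double base = basePrices.get(product);
        if (base != null) {
            prices.put(product, base * (1.0 + riskFactor));
        }
    }

    // Returns null until the first risk update for the product arrives.
    Double priceOf(String product) {
        return prices.get(product);
    }

    public static void main(String[] args) {
        Map<String, Double> base = new HashMap<>();
        base.put("bond", 100.0);
        ObservableRiskCache riskCache = new ObservableRiskCache();
        PricingClient pricing = new PricingClient(base, riskCache);
        riskCache.put("bond", 0.5);
        System.out.println("bond price: " + pricing.priceOf("bond"));
    }
}

// Hypothetical stand-in for a grid cache that notifies listeners on every update.
final class ObservableRiskCache {
    private final Map<String, Double> risk = new HashMap<>();
    private final List<BiConsumer<String, Double>> listeners = new ArrayList<>();

    void addListener(BiConsumer<String, Double> listener) {
        listeners.add(listener);
    }

    void put(String product, double riskFactor) {
        risk.put(product, riskFactor);
        for (BiConsumer<String, Double> l : listeners) {
            l.accept(product, riskFactor);
        }
    }
}
```

The pricing side never polls: it recalculates only when the risk data it depends on actually changes.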

Near real-time is important for many kinds of applications. But several systems we've been working with require even lower latency real-time behavior. For those applications, we have been able to combine the JRockit real-time VM with Coherence. The JRockit real-time VM, which came to us through the BEA acquisition, limits the impact of garbage collection on the execution of a system. It provides a configuration knob that you can set to tell the system that you don't want your garbage collection to take longer than a certain number of milliseconds.

That capability opens up an entirely new category of applications that you could never do in Java before, because Java couldn't guarantee that type of real-time behavior. The JRockit real-time VM eliminates latency problems, while Coherence provides the ability to have highly-available systems that automatically fail over in case a server goes down, without losing any in-flight transactions or any other information. With this combination, you now have the ability to achieve very low latency, scale out, and achieve high availability, at the same time.

JRockit and Coherence show up in two new offerings from Oracle. The first one, WebLogic Suite, combines the WebLogic application server with Coherence and the JRockit real-time VM. The other offering, WebLogic Application Grid, is geared more towards standalone VM environments. This product combines the datagrid edition of Coherence with the JRockit real-time VM, and with some other infrastructure management and monitoring tools. With this product, we can now offer real-time datagrid functionality. And now you can integrate that real-time capability with C++ libraries and applications as well.

Some environments need to access the same datagrid from applications written in different languages. The [Coherence] Extend protocol is what we use to actually tie Java, .NET, and C++ clients into the datagrid. The Extend protocol is a TCP/IP-based protocol that allows clients to connect into the datagrid as if the entire datagrid were a large server.

It's a client-server protocol, and we recently added load-balancing support to this protocol: you can automatically adjust how many clients are connecting to each server so you don't overwhelm particular servers. When clients connect, they're load-balanced across various servers in the datagrid. It's a very scalable model for supporting client connections.
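The interview does not say which balancing policy Extend uses. As one plausible illustration, the Java sketch below assigns each new client to the server that currently has the fewest connections. `LeastConnectionsBalancer` and its methods are hypothetical and do not reflect Coherence's actual implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical balancer: assigns each new client to the least-loaded server.
final class LeastConnectionsBalancer {
    // Insertion-ordered, so ties are broken in favor of the earliest-listed server.
    private final Map<String, Integer> connections = new LinkedHashMap<>();

    LeastConnectionsBalancer(String... servers) {
        for (String s : servers) {
            connections.put(s, 0);
        }
    }

    // Picks the server with the fewest connections and records the new client.
    String connect() {
        String best = null;
        for (Map.Entry<String, Integer> e : connections.entrySet()) {
            if (best == null || e.getValue() < connections.get(best)) {
                best = e.getKey();
            }
        }
        if (best == null) {
            throw new IllegalStateException("no servers configured");
        }
        connections.merge(best, 1, Integer::sum);
        return best;
    }

    // Releases one connection on the given server, never going below zero.
    void disconnect(String server) {
        connections.computeIfPresent(server, (k, v) -> Math.max(0, v - 1));
    }

    int load(String server) {
        return connections.getOrDefault(server, 0);
    }

    public static void main(String[] args) {
        LeastConnectionsBalancer balancer = new LeastConnectionsBalancer("s1", "s2", "s3");
        for (int i = 0; i < 5; i++) {
            System.out.println("client " + i + " -> " + balancer.connect());
        }
    }
}
```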

The protocol isn't directly used by the developer. The developer uses the Coherence APIs, the caching, querying, and processing APIs. Behind those APIs, the execution that needs to be done is communicated over the Extend protocol via a series of messages. It's the same protocol and [message] format for the different languages that connect in, and this is why we can support Java, .NET, C++, all off the same, shared server-side.

The Extend protocol is message-based, and the messages use our portable object format. That is the format we use across all our supported languages. For instance, a .NET object can be shared in the datagrid, and if a Java client looks at that object, that client sees a Java object; if a C++ application looks at it, that application sees a C++ object.

In our latest release, Coherence now supports pluggable formats. As a result, the capability to store information in a particular format is no longer limited to this one format. If you have a particular format you use for your data—for example, legacy formats—you can continue to use that format with Coherence.
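The idea of a pluggable format can be sketched in Java as a store that only handles bytes and delegates their layout to a configurable component. The names `ValueFormat`, `Utf8Format`, `FixedWidthFormat`, and `FormattedStore` are hypothetical and are not Coherence types. The fixed-width format is an invented stand-in for a legacy record layout.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// The store only sees bytes; the configured format decides their layout.
final class FormattedStore {
    private final Map<String, byte[]> bytes = new HashMap<>();
    private final ValueFormat format;

    FormattedStore(ValueFormat format) {
        this.format = format;
    }

    void put(String key, String value) {
        bytes.put(key, format.encode(value));
    }

    String get(String key) {
        byte[] stored = bytes.get(key);
        return stored == null ? null : format.decode(stored);
    }

    // Number of bytes held for the key, or 0 if the key is absent.
    int storedSize(String key) {
        byte[] stored = bytes.get(key);
        return stored == null ? 0 : stored.length;
    }

    public static void main(String[] args) {
        FormattedStore legacy = new FormattedStore(new FixedWidthFormat(8));
        legacy.put("ticker", "IBM");
        System.out.println(legacy.get("ticker") + " stored in " + legacy.storedSize("ticker") + " bytes");
    }
}

// Hypothetical plug-in point: converts values to and from stored bytes.
interface ValueFormat {
    byte[] encode(String value);
    String decode(byte[] stored);
}

// A simple default format: UTF-8 text.
final class Utf8Format implements ValueFormat {
    public byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
    public String decode(byte[] stored) {
        return new String(stored, StandardCharsets.UTF_8);
    }
}

// An invented "legacy" format: fixed-width, space-padded ASCII records.
// Leading and trailing whitespace in values is not preserved on decode.
final class FixedWidthFormat implements ValueFormat {
    private final int width;

    FixedWidthFormat(int width) {
        this.width = width;
    }

    public byte[] encode(String value) {
        if (value.length() > width) {
            throw new IllegalArgumentException("value longer than " + width + " characters");
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public String decode(byte[] stored) {
        return new String(stored, StandardCharsets.US_ASCII).trim();
    }
}
```

Swapping `FixedWidthFormat` for `Utf8Format` changes the stored bytes without touching any code that calls `put` or `get`.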

On the back-end, we added triggers. A trigger gives the developer's application the ability to see information as it was before a particular process occurred, and what that information will be after the process occurs, once that information has been committed into the datagrid. Thus, triggers allow you to look at the pre-committed state of information, and to alter the changes occurring to the information, if needed.

Triggers are in-place with the information, running inside the datagrid, where that information is being managed. As the information changes, if there is a trigger on that information, the trigger is notified of the change before the change is committed, giving an opportunity to the trigger to modify or undo that change.

Triggers are useful for auditing and validation, for example: If you want to find out about changes occurring to a piece of information—when the data changed, how, and by whom—you can place auditing on that data using a trigger. Or, if you have to keep information within certain bounds, you can validate information with triggers to make sure that data doesn't change outside of those boundaries.
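A trigger of the kind described above can be illustrated with a small Java sketch. This is not Coherence's trigger API: `PreCommitTrigger`, `TriggeredStore`, and `BoundsTrigger` are hypothetical names, and the clamping rule and audit-line format are invented. The sketch shows the two uses mentioned, validation (keeping a value within bounds) and auditing (recording what changed), both running before the change is committed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store that consults a trigger before committing each change.
final class TriggeredStore {
    private final Map<String, Double> data = new HashMap<>();
    private final PreCommitTrigger trigger;

    TriggeredStore(PreCommitTrigger trigger) {
        this.trigger = trigger;
    }

    void put(String key, double proposed) {
        // The trigger sees the old and proposed values and decides what is committed.
        Double toCommit = trigger.beforeCommit(key, data.get(key), proposed);
        if (toCommit == null) {
            data.remove(key);
        } else {
            data.put(key, toCommit);
        }
    }

    Double get(String key) {
        return data.get(key);
    }

    public static void main(String[] args) {
        BoundsTrigger trigger = new BoundsTrigger(0.0, 100.0);
        TriggeredStore store = new TriggeredStore(trigger);
        store.put("limit", 250.0);
        System.out.println(store.get("limit") + " " + trigger.auditLog);
    }
}

// Hypothetical trigger contract: returns the value to commit. It may return the
// proposed value, an altered value, or the previous value to undo the change.
interface PreCommitTrigger {
    Double beforeCommit(String key, Double previous, Double proposed);
}

// Validation plus auditing: clamps values to [min, max] and records each change.
final class BoundsTrigger implements PreCommitTrigger {
    final List<String> auditLog = new ArrayList<>();
    private final double min;
    private final double max;

    BoundsTrigger(double min, double max) {
        this.min = min;
        this.max = max;
    }

    public Double beforeCommit(String key, Double previous, Double proposed) {
        double committed = Math.max(min, Math.min(max, proposed));
        auditLog.add(key + ": " + previous + " -> " + committed);
        return committed;
    }
}
```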

Finally, we've added support for streaming: If you have large chunks of information, such as large datasets, that you need to pass back and forth [between clients and the grid], we can now pass that data very efficiently without actually using huge amounts of resources on the server. Pulling, say, a gigabyte of information from various servers, we may only use a fraction of that memory to actually do that, because we break that information up into smaller chunks and stream the data to the client.
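The chunking idea can be shown with a minimal Java sketch. `ChunkedStream` is a hypothetical class, not part of Coherence. It wraps any iterator and hands its elements out in fixed-size slices, so a consumer holds only one chunk at a time rather than the whole result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical chunked reader: yields a large result one slice at a time.
final class ChunkedStream<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final int chunkSize;

    ChunkedStream(Iterator<T> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.source = source;
        this.chunkSize = chunkSize;
    }

    @Override
    public boolean hasNext() {
        return source.hasNext();
    }

    // Returns up to chunkSize elements; the final chunk may be shorter.
    @Override
    public List<T> next() {
        if (!source.hasNext()) {
            throw new NoSuchElementException();
        }
        List<T> chunk = new ArrayList<>(chunkSize);
        while (chunk.size() < chunkSize && source.hasNext()) {
            chunk.add(source.next());
        }
        return chunk;
    }

    public static void main(String[] args) {
        Iterator<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7).iterator();
        ChunkedStream<Integer> stream = new ChunkedStream<>(rows, 3);
        while (stream.hasNext()) {
            System.out.println("chunk of " + stream.next().size());
        }
    }
}
```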

Streaming also works well with our continuous query cache capability. Continuous querying allows the client to provide a query, and the results of that query are then kept up-to-date inside the client application. Now that client can be a C++ application as well.

We first execute the query in parallel across the datagrid, using the parallel query capability of Coherence, and then combine that with an event stream on the same query. Our event stream can be tied to a particular event or, in this case, it can be tied to a query. Any time information changes that would affect the results of the query, those events are streamed down to the C++ application. The events themselves are available to the C++ application, as is the entire set of data matching the query. So that keeps an up-to-date, real-time copy of the information inside the client application and, in addition, the client can see what is changing.
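The two-step pattern described here, an initial query followed by an event stream on the same query, can be sketched in Java as follows. `SourceCache` and `ContinuousView` are hypothetical stand-ins, not Coherence's continuous query cache. The sketch is single-threaded, so it ignores the race between taking the snapshot and subscribing that a real implementation would have to handle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical client-side view: an initial query result kept current by events.
final class ContinuousView {
    private final Map<String, Integer> view = new TreeMap<>();
    private final Predicate<Integer> query;

    ContinuousView(SourceCache source, Predicate<Integer> query) {
        this.query = query;
        // Step 1: run the query once to seed the local copy.
        source.snapshot().forEach((k, v) -> {
            if (query.test(v)) {
                view.put(k, v);
            }
        });
        // Step 2: subscribe so later changes keep the copy up to date.
        source.subscribe(this::apply);
    }

    // An entry joins the view when it matches and leaves when it stops matching
    // or is removed (signalled here by a null value).
    private void apply(String key, Integer newValue) {
        if (newValue != null && query.test(newValue)) {
            view.put(key, newValue);
        } else {
            view.remove(key);
        }
    }

    Map<String, Integer> results() {
        return Collections.unmodifiableMap(view);
    }

    public static void main(String[] args) {
        SourceCache cache = new SourceCache();
        cache.put("a", 5);
        ContinuousView bigValues = new ContinuousView(cache, v -> v > 10);
        cache.put("a", 20);
        System.out.println(bigValues.results());
    }
}

// Hypothetical source cache that publishes every change to subscribers.
final class SourceCache {
    interface ChangeListener {
        void changed(String key, Integer newValue); // null means the key was removed
    }

    private final Map<String, Integer> data = new HashMap<>();
    private final List<ChangeListener> listeners = new ArrayList<>();

    void put(String key, int value) {
        data.put(key, value);
        fire(key, value);
    }

    void remove(String key) {
        if (data.remove(key) != null) {
            fire(key, null);
        }
    }

    Map<String, Integer> snapshot() {
        return new HashMap<>(data);
    }

    void subscribe(ChangeListener listener) {
        listeners.add(listener);
    }

    private void fire(String key, Integer newValue) {
        for (ChangeListener l : listeners) {
            l.changed(key, newValue);
        }
    }
}
```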

What do you think of the latest Coherence features?
