Visualizing Cluster-Based Applications

An Interview with Terracotta's Ari Zilka

by Frank Sommers
May 14, 2008

Detecting concurrency-related bugs and performance bottlenecks is hard, especially on clusters consisting of a large number of nodes. In an interview with Artima, Terracotta co-founder and CTO Ari Zilka explains the importance of visualization in cluster-based applications, and introduces Terracotta's open-source cluster visualizer tool.

Terracotta is a free, open-source clustering engine that allows multiple JVMs to communicate with each other as if they were the same JVM by transparently extending the memory and locking semantics of Java.

Developers do not have to worry about making their clustered POJOs implement Serializable. Instead, heap sharing and thread coordination across JVMs are "plugged into" the Java memory model: the Terracotta server knows what data changed in the application, and replicates only the object fields that have changed, only to the nodes that need them, and only when those nodes need them. As a result, Terracotta eases the burden of clustering an application, and allows enterprise applications to scale horizontally to a large number of nodes.
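To make the programming model concrete, here is a minimal sketch of the kind of plain Java class this description has in mind. The names SharedCounter and SharedCounterDemo are invented for this example, and the code calls no Terracotta API: as written, it coordinates threads inside a single JVM only. How such a class would be declared as shared across JVMs is a matter of Terracotta configuration and is not shown here.

```java
// A plain Java object: no Serializable, no clustering API calls.
// Thread coordination relies only on ordinary synchronized methods.
final class SharedCounter {
    private long count;

    synchronized void increment() {
        count++;
    }

    synchronized long get() {
        return count;
    }
}

// Hypothetical driver: several threads update one counter in a single JVM.
final class SharedCounterDemo {
    static long runWorkers(int workers, int incrementsPerWorker) {
        SharedCounter counter = new SharedCounter();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerWorker; j++) {
                    counter.increment();
                }
            });
            threads[i].start();
        }
        try {
            // Wait for every worker before reading the final value.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.get();
    }
}
```

The point of the sketch is what is absent: the class has no serialization code and no messaging code, only the locking semantics Java already provides.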

When applications are executed on clusters consisting of a large number of nodes, however, debugging concurrency-related issues becomes hard. In this interview from JavaOne 2008, Terracotta co-founder and CTO Ari Zilka explains how visualization tools help detect and correct concurrency-related bugs and performance bottlenecks. He also introduces an open-source Terracotta cluster visualizer:

Visualization is an important concept in software development. We often progress from being a junior developer to becoming a more experienced one when a senior member of a team shows us a debugger or a profiler for the first time, leading us to exclaim, "Oh, is that what my code is doing?" Engineers are generally visual in their thought processes. An application is visualized as a physical machine, gears turning together, inputs and outputs plugged together, and so on. Visualization is very natural for most engineers.

Visualizing concurrency and an application's multi-threaded execution, as well as synchronization and the use of resources such as CPU, memory, and disk, has become extremely important because our most common deployment footprint is now clustered or scaled out. In such a situation, an application is spread across a bunch of physical machines. If we can't see what those physical machines are doing together, no matter how good a programmer we are, life starts to get really difficult. When you have to tail dozens, or even thousands, of server logs, looking for what's going on, and trying to correlate a thread on one server with a thread on a different server, things get very hard and unpleasant.

At Terracotta, we watched our teams tune customer applications, with the customers looking over their shoulders. Trying to figure out what was going on, and where the bottlenecks were, proved impossible. We were recompiling our application as well as the customers' applications, basically adding printlns to the system, trying to figure things out that way.

We realized what we needed was what I'd call an EKG, sort of like the medical term, where someone tapes all those electrodes onto your body, and then you get all those graphs as a result. Or you can think of a lie detector, with all the needles moving simultaneously down the page. In either case, we had a vision of a stacked graph of the system output: A person could look at that graph from top to bottom, and see the entirety of the application, CPU, network, and disk for every node in the cluster.

In the Terracotta environment, we also need to show you the queue depths in the server, as well as your application components and their characteristics, such as how many threads exist in each JVM, what messages they are running at the moment, what kind of locks are being acquired, which JVMs are blocking for which other JVMs, and which objects are migrating around the cluster because multiple threads in different JVMs need those objects at the same time.
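For a sense of the raw per-JVM data involved, the following sketch samples the thread count and the currently blocked threads of one JVM using the standard java.lang.management API. It is not the Terracotta visualizer's implementation; the class name JvmThreadSnapshot is invented for this example, and it sees only the local JVM. Correlating such snapshots across nodes is the part a cluster-wide tool has to add.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a point-in-time view of threads and lock contention in this JVM.
final class JvmThreadSnapshot {
    final int liveThreads;
    final List<String> blockedDescriptions;

    private JvmThreadSnapshot(int liveThreads, List<String> blockedDescriptions) {
        this.liveThreads = liveThreads;
        this.blockedDescriptions = blockedDescriptions;
    }

    static JvmThreadSnapshot capture() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        List<String> blocked = new ArrayList<>();
        // false, false: skip the more expensive monitor and synchronizer details.
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                blocked.add(info.getThreadName()
                        + " is blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return new JvmThreadSnapshot(bean.getThreadCount(), blocked);
    }
}
```

Calling JvmThreadSnapshot.capture() at a fixed interval on each node would give one time series per JVM, which is the kind of input a stacked graph needs.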

Eventually, we want to get to a stage where you can see servlet requests, session objects, all kinds of stuff, in just one view. We want to be able to pinpoint that "This HTTP request causes these 18 objects to fault into this particular JVM in the grid, and those 18 objects hold these 7 locks, blocking these other HTTP requests on these other nodes, and meanwhile the GC fires off on that machine, and does a full pause for 18 seconds, and that explains why the transaction throughput of the whole cluster slows down."
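The garbage collection part of that picture can be approximated on a single JVM with the standard GarbageCollectorMXBean API. The sketch below is an illustration only, with an invented class name (GcTimeSampler). Note that getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors is not the same as application pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: measures how much collector time accumulates while some work runs.
final class GcTimeSampler {

    // Sum of accumulated collection time, in milliseconds, over all collectors in this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Collector time that accumulated while the given work was running.
    static long gcMillisDuring(Runnable work) {
        long before = totalGcMillis();
        work.run();
        return totalGcMillis() - before;
    }
}
```

Wrapping a request handler in gcMillisDuring(...) is one way to see whether a slow request overlapped with collector activity on that node.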

We're not done yet, but what we have today is designed for production use already. If you just click a button in the existing tool, the tool starts recording everything on its own, in real time, into a relational store. You can visualize that relational data immediately, or you can export it and send it to a developer from your production data center or, if you have a support agreement with us, you can send it to a Terracotta engineer to help you debug a situation. The person receiving the file does not need the application, and does not need to reproduce the problem. They can just visualize what's going on in the system based on that data.

This visualization tool is open-source, just as the entire Terracotta product is open-source. It's a Mozilla-based license, which is not viral in any way: you can modify the source code without having to give back. It's open all the way into production.

We make money selling a subscription, and with that subscription comes the Terracotta Operations Center, a set of enhanced tools for production users. Being able to tell a cluster that you need at least 8 JVMs at all times, and that if the CPU utilization on any one of your nodes goes above 50%, the cluster should start another node: that's the sort of thing this operator console allows. The core clustering engine itself and the visualization tools are all free, are going to remain free, and are available now in the Terracotta download bundle.
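The rule in that example (keep at least 8 JVMs, and add a node when any node's CPU utilization exceeds 50%) can be written down as a small function. This is a sketch of the rule only, not the Operations Center's API: the class ScaleOutRule, its method, and the choice to add exactly one node per evaluation are all assumptions made for illustration.

```java
// Hypothetical scale-out rule; names and behavior are illustrative assumptions.
final class ScaleOutRule {

    // cpuUtilization holds one value per running node, as a fraction (0.5 means 50%).
    static int desiredNodeCount(double[] cpuUtilization, int minNodes, double cpuThreshold) {
        int current = cpuUtilization.length;
        // Never run fewer nodes than the configured minimum.
        int desired = Math.max(current, minNodes);
        for (double cpu : cpuUtilization) {
            if (cpu > cpuThreshold) {
                // At least one node is over the threshold: ask for one more node.
                return Math.max(desired, current + 1);
            }
        }
        return desired;
    }
}
```

With eight nodes running and one of them at 70% CPU, desiredNodeCount(loads, 8, 0.5) asks for a ninth node; with two quiet nodes it asks for eight, because the minimum applies first.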



About the author

Frank Sommers is Editor-in-Chief of Artima Developer. He also serves as chief editor of the IEEE Technical Committee on Scalable Computing's newsletter, and is an elected member of the Jini Community's Technical Advisory Committee. Prior to joining Artima, Frank wrote the Jiniology and Web services columns for JavaWorld.