Ari Zilka: Load-testing an application is not as easy as it used to be: You want to ensure in your testing that your application scales to the loads it will need to handle in production. But many applications are deployed on large clusters. How do you replicate that environment in your testing? And to what extent do you have to replicate the actual deployment environment during load testing?
Virtualization alone only partially addresses the scalability-testing issue. If your application is completely stateless—if you always fetch data afresh from a database or some other data source, but your application itself doesn't hold onto data—then you can start up a number of virtual instances of your application server and do your testing that way. [The scalability of] some tools, such as databases, are so well-understood that developers seldom load-test such applications under conditions that mirror real-world use.
Load-Testing Stateful Applications
On other hand, the situation is completely different when you have a stateful application: Your main issue is going to be testing the scaling characteristics of your application in the presence of that data. In other words, you really need to make sure your application and your infrastructure are up to the task of managing that data as the system load increases.
At the same time, you can't develop in a stateful manner and then ask your development manager to buy a copy of each production server to load-test the application on.
Virtualization comes in handy for stateful applications, too: instead of dealing with
scp'ing tarballs around, you basically stamp out an image of your application environment. Then you can purchase just one or two servers, and carve those servers into smaller instances for load-testing.
It's a simple concept, but it works well: You don't have to buy so many boxes for your development testing environment. If you need to performance-test your next release, and unit test your current release, and continuous test your trunk, you can do all that on the same box.
To facilitate those testing situations, we have a deep partnership with VMWare. In fact, Terracotta and VMWare jointly exhibited at JavaOne, showcasing some of our mutual success cases.
With Terracotta, you're off-loading your memory [management] to the Terracotta server. As a result, we together [with VMWare] give you the ability to do on-demand, on-the-fly scaling: You just push a button to get another instance [of your application environment], and push some other button to kill an instance that's no longer needed. Adding and removing instances is completely seamless: your applications don't know that [more or less servers are available], they just scale to accommodate the demand based on load.
The Right Size Load Testing Environment
That makes load-testing easier, but the question still remains: To what degree do you have to mimic your deployment environment to have sufficient confidence in your load-testing results? If you're going to deploy an application on 100 nodes, do you really have to set up virtual instances of those 100 nodes during testing?
Uniquely with Terracotta, the answer is no: you don't have to mirror your entire production environment. That's not the case with many other [clustering] frameworks. As a system gets bigger, the frameworks have to do more and more work. You have to test the multicast side of the products, the discovery side, the load balancing, and more and more.
As a result, you could be discovering new problems if you didn't test your application in development on a similar scale to what you're going to run in production. If you only have 2 nodes during development and testing, the second [node] is likely already going to have a copy of your data, so everything will work just fine. But that won't necessarily be the case when you have 100 nodes.
I was just talking to a user of one of these proprietary [clustering] frameworks. Their problem is [that] if you use more than 50% of the Java heap [on a node], the system can become unpredictable. When you kill one node, all of that node's data was married to a backup. The backup starts electing new backups, and moving some data to new peers, and then those peers are all under heap pressure. They start gc'ing with long pauses. That can make them all go down, which then causes a cascade: Their mirrors now start looking for new backups, and so on. So a 100-node cluster tends to be fragile. But the 2-node cluster you used in development is extremely robust.
With Terracotta, the partitioning is very stable, much like memcache. You don't have to change your code to go from a 2-partition Terracotta array to a 100-node array. There is no performance problem, except that you've conquered and divided your I/O to those many ways. Each operation will still experience the same latency, but the total throughput is exactly 100x. You don't have to actually run a 100-node array for testing this.
We've proven this point in our own performance testing. In [Terracotta] 3.0, we did performance testing under multiple use-cases, from HTTP sessions to queuing. For our key benchmark, we simulated the workload of one of the largest travel Web sites. This site has in production every day something like 20,000 [transactions per second]. The transaction payload is about 5KB of data in memory being changed. So you have about a gigabit or more [of data moving] over the network.
We tested that load on an 8-server array. The real customer is using manual partitioning. Terracotta 3.0 is auto-partitioning. The auto-partitioning was linearly scalable. It took 8 Terracotta servers to get to 20,000 transactions per second being written to disk. [The price of each box] was on the order of about $5,000 to do that, so that's a very economical solution.
Our value proposition has always been that we're persistent and durable to disk. You can loose power to the data center, turn things back on, and everything in Terracotta memory is back. Eventually, certain types of payloads will become disk-bound. And once you thrash on the disk, you've lost the battle.
We use file buffers, and that allows us to need to fit [relatively little data] into Java heap as objects. We trust the file I/O to do its work.
We typically tell people with large I/O jobs to use a 2GB heap on the Terracotta server. Having a 2GB heap means you have very short, millisecond [garbage-collection] pauses. Have 10GB or more of RAM on the box, and buffer half the filesystem into memory. That really helps make the system perform better.
But if every I/O turns out to be to a page not in memory, then you need Terracotta FX, our commercial product.
FX allows you to take, say, a 100GB data set, run 10 copies of Terracotta, each with 12GB of RAM. So you'd have 10 nodes, each with 2GB of heap, plus another 10GB for files cached in the remaining RAM. That's the sort of striping a storage admin would do, but it's done on pure commodity [hardware], pure software, on simple commodity data drives. With FX, it's completely automatic to fit a 100GB dataset in memory.
What happens is that when objects get created, the objects stay as a unit: all the fields inside the object that are primitive stay together on one Terracotta stripe, but the objects round-robin across the array. You can decide to round-robin all objects: Object 1 is on server 1, and object 2 is on server 2, and so on. Or you can bundle objects together based on the transactions they were created in. For instance, if three objects are created together, those 3 could go together as a bundle to one Terracotta server.
FX is functionally identical to open-source Terracotta. You can integrate the open-source one with your app, and when it's time to load the real-world data, then you get the trial of FX and see how it helps solve your heavily I/O-bound jobs.
What do you think of Terracotta's approach to load-testing clustered applications?Post your opinion in the discussion forum.
Frank Sommers is Editor-in-Chief of Artima.