The big fat boxes are not exotic anymore. You can get 8 cores, 32 GB RAM machine under $5,000, which is pretty much free. A 64-bit JVM allows setting the heap size to as big as you like. Yet, it is not a good idea to put everything into a large single JVM. A better approach is to split the RAM between as set of JVMs.
Garbage collector in Java is getting better and better, and the concurrent GC [1] helps quite a bit. Yet, sooner or later a major GC kicks in and the world stops for the JVM. The bigger the heap, the longer the major GC, and there is no way around it on a stock hardware. It is impossible to predict how long it takes, but major GCs running for minutes are not uncommon. I have seen it consistently hitting 3 minutes at one customer with a 24Gb JVM.
Significant delays caused by long major GCs might be OK if the latency is not a big deal. The trouble begins if the JVM is a part of the cluster with strict latency requirements.
During the major GC the JVM is not processing requests, so the clients are blocked in waiting.
From the cluster's point of view, the JVM that stopped responding for 2 minutes may as well be dead. So, the cluster excludes the JVM from the configuration. When the JVM resumes after the major GC, it will find itself alone and cold. Then, it will have to join the cluster, recover from the disconnection and do lots of other things, further worsening the response time.
So, as much as it is tempting to use a single, large JVM per box (low maintenance is cited often), it is not going to work in the clustered environment with the requirement for low latency.
A better, time-proven approach is to split the RAM between a set of JVMs, with size 500Mb to 1000Mb, 1 JVM per CPU core:
This way the effect of major GCs is significantly reduced because the heap size of the individual JVM is much smaller. GCs are spread across the cluster, so the effect on the individual consumer is reduced proportionally. The added benefit is that this approach enables better concurrency and load balancing because a JVM is responsible for a much smaller chunk of data, which further reduces the latency
By the way, there is no need to fill *all* the physical RAM just because you have a lot. For a box with 16 cores and 32Gb of RAM the maximum number of 1Gb JVMs would be 10-16, which leaves at least 16Gb unused. An OS will be happy to use that for internal buffers and system caching.