At last week's The Server Side Java Symposium in Prague, scaling looked like the biggest issue on everyone's mind. I've collected some pointers from the presentations on the topic and will certainly return to it when we start doing this sort of stuff ourselves.
Prague is a beautiful city and makes an excellent conference location. This year, The Server Side Java Symposium Europe is being held here (I am writing from the conference hotel), and it seems that two topics are keeping the minds of Java developers busy: web frameworks and scaling. As I have a natural inclination towards the latter, here are some quick notes on the topic.
Scaling is something every Java developer is going to have to deal with, as hardware is moving to multiple cores. Two is the norm today, but we were told to expect a doubling of core counts every 18 months or so (yup, Moore's law still seems to hold), meaning that a measly notebook will likely have 8 cores in three years, and servers will typically ship with 32. To use that horsepower, software needs to embrace parallelism, which means every Java developer has to read up on multi-threading, synchronization (and why you should avoid both), java.util.concurrent (and why you want it), etcetera.
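To make that concrete, here is a minimal sketch (my own illustration, not from any of the talks) of the java.util.concurrent style: size a thread pool to the available cores and use an atomic counter instead of hand-rolled synchronized blocks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical example: one worker per core, sharing a counter
// via a lock-free AtomicLong rather than synchronized blocks.
public class CoreCounter {
    public static long countInParallel(long perTask) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicLong total = new AtomicLong();
        for (int i = 0; i < cores; i++) {
            pool.submit(() -> {
                for (long j = 0; j < perTask; j++) {
                    total.incrementAndGet(); // atomic, no lock contention dance
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES); // wait for all workers
        return total.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println(countInParallel(100_000) == cores * 100_000L);
    }
}
```

The point is that the library hands you the building blocks (pools, atomics, queues) so you rarely need to touch `synchronized` yourself.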
However, it doesn't stop at a single machine. Prominently present in the vendor area were GigaSpaces and Terracotta, and some other products regularly appeared on speakers' slides as well: GemFire from GemStone and Oracle's Coherence, for example.
All of these products promise linear scaling of application performance to tens or hundreds of machines by distributing data and trying to co-locate data throughout the computer cluster with processing. GigaSpaces approaches it mostly from a Jini/JavaSpaces perspective, Terracotta essentially gives you a distributed JVM, GemFire is at its core an OODB, and Coherence is a distributed JCache implementation - different pedigrees, same goal: remove disk I/O from the performance equation.
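The partitioning idea underneath all of these products can be sketched in a few lines. This is a hypothetical toy, not any vendor's API: every node routes a key to the same partition deterministically, so the data for that key and the processing on it can live on one machine. The real grids add replication, failover and rebalancing on top of this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of key-based partitioning: each key is owned
// by exactly one partition (think: one machine in the cluster).
public class PartitionedStore {
    private final Map<String, String>[] partitions;

    @SuppressWarnings("unchecked")
    public PartitionedStore(int partitionCount) {
        partitions = new Map[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitions[i] = new ConcurrentHashMap<>();
        }
    }

    // Deterministic routing: every node computes the same owner for a key.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.length);
    }

    public void put(String key, String value) {
        partitions[partitionFor(key)].put(key, value);
    }

    public String get(String key) {
        return partitions[partitionFor(key)].get(key);
    }
}
```

Because the owner of a key is known in advance, a grid can ship the computation to the partition instead of dragging the data across the network - that is the co-location trick in a nutshell.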
John Davies gave a great presentation on that subject. His customers, mostly banks, simply can't scale their databases to the load some of their systems are getting, so they essentially do without them. I liked his comparison: today we think of disk for "on-line" access and tape for backup, and we are moving to memory for "on-line" access and disk for backup. It makes sense: if your data is distributed across hundreds of machines in various locations, the chance of losing in-memory data is negligible, and for a lot of applications it makes sense to sync stuff to disk asynchronously "just in case" rather than doing it inside the transaction. Any of the products I mentioned will facilitate this model, and I hope to get back to this topic as we evaluate most of them for our own use (we outgrew the size where the regular LAMP approach makes sense a while ago, so this looks to be our bright future).
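The "memory for on-line access, disk just in case" model is essentially the write-behind pattern. A minimal sketch of the idea, assuming a plain map as a stand-in for real persistence (the products do this with actual stores, batching and retry logic):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical write-behind cache: puts complete against memory,
// while a background thread drains changed keys to backing store.
public class WriteBehindCache {
    private final ConcurrentMap<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> pendingKeys = new LinkedBlockingQueue<>();
    private final ConcurrentMap<String, String> disk; // stand-in for real persistence
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();

    public WriteBehindCache(ConcurrentMap<String, String> backingStore) {
        this.disk = backingStore;
        flusher.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String key = pendingKeys.take();   // block until work arrives
                    disk.put(key, memory.get(key));    // "disk" write, off the hot path
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // The caller's "transaction" touches only memory; disk I/O happens later.
    public void put(String key, String value) {
        memory.put(key, value);
        pendingKeys.add(key);
    }

    public String get(String key) {
        return memory.get(key);
    }

    public void shutdown() {
        flusher.shutdownNow();
    }
}
```

The trade-off is exactly the one Davies described: you accept a small window where data exists only in (replicated) memory, in exchange for taking disk I/O out of the transaction entirely.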
As I said, just some pointers for the moment. The conference is still in progress, and did I mention that Prague has a lot of extremely nice bars? Yup, I'm a bit hung-over, so this is all I'm able to write down today :-)
By the way, thanks to Bill Venners for graciously allowing me to blabber about this stuff on his site. I hope you will enjoy my postings - feel free to correct me if I'm wrong; I have a thick enough skin to handle your criticism.
Nice writeup Cees. Just a small comment about GigaSpaces - though we did initially start as a JavaSpaces implementation, the product today has grown beyond that. It's very much aligned with the Spring way of doing things, i.e. non-intrusiveness, annotation/XML based configuration, smart defaults, dependency injection, etc. Jini is there under the hood as a discovery mechanism and is not exposed to the user. The following link elaborates on the GigaSpaces programming model and Spring integration (we call it OpenSpaces): http://www.gigaspaces.com/wiki/display/OLH/OpenSpaces+Overview
> Coherence is a distributed JCache implementation
Coherence was built as a peer-to-peer, clustered, in-memory data management system. The JCache API work was largely based on the Coherence API because of the popularity of using Coherence to cache data in clustered (scale-out) J2EE applications.
> All of these products promise linear scaling of application
> performance to tens or hundreds of machines by distributing
> data and trying to co-locate data throughout the computer
> cluster with processing.
Linear scale is not possible for every use case, but for the ones that are possible, Coherence does deliver linear scalability. Scalability (via partitioning) isn't actually the hard part -- it's achieving scalability in a dynamic system that maintains availability and information reliability as new servers come online or existing servers fail.