Re: The Problem with Programming
Posted: Dec 1, 2006 2:32 PM
> You know better than anyone else that the general issues of caching are not primarily a language thing. Neither Java nor C# makes this a no-op, or you would be out of business. There are garbage collectors for C++. There are caching frameworks for C++, so those tens of thousands of lines of code have already been written for reuse. You know very well that garbage collection makes the most frequent and simplest cases easier, but it doesn't solve all the problems connected to the management of cached resources.
In this case, I was referring specifically to the massive amount of memory management that I had to do to avoid leaks for long-lived objects whose "scope of use" was defined completely externally. That is the type of problem that a garbage-collecting environment (let's not focus on the language for a second) can make very simple to deal with.
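Below is a minimal C++ sketch of one common technique for objects whose lifetime is decided by outside callers. It assumes a hypothetical `ResultSet` type cached by query string; the names are illustrative and not from the original code. The cache holds only `std::weak_ptr` references, so an entry is destroyed as soon as the last external `std::shared_ptr` to it is released.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical cached resource standing in for any long-lived object whose
// lifetime is decided by external callers rather than by the cache itself.
struct ResultSet {
    std::string query;
    explicit ResultSet(std::string q) : query(std::move(q)) {}
};

// The cache does not own its entries: it stores weak_ptrs, so each entry is
// destroyed as soon as the last external shared_ptr to it is released.
class ResultCache {
public:
    std::shared_ptr<ResultSet> get(const std::string& query) {
        assert(!query.empty() && "cache keys must be non-empty");
        auto it = entries_.find(query);
        if (it != entries_.end()) {
            if (std::shared_ptr<ResultSet> live = it->second.lock()) {
                return live;      // still referenced somewhere: reuse it
            }
            entries_.erase(it);   // expired: drop the stale slot
        }
        auto fresh = std::make_shared<ResultSet>(query);
        entries_[query] = fresh;  // stored as a non-owning weak_ptr
        return fresh;
    }

    // Counts entries that some external owner is still holding on to.
    std::size_t live_count() const {
        std::size_t n = 0;
        for (const auto& kv : entries_) {
            if (!kv.second.expired()) ++n;
        }
        return n;
    }

private:
    std::map<std::string, std::weak_ptr<ResultSet>> entries_;
};
```

Expired slots are only swept here when the same key is requested again. A real cache would also need periodic cleanup, eviction and thread safety, and that is part of the bookkeeping described above.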
> Java takes that power away from us and that's one reason why the memory consumption of many Java apps is so outrageous. That's why the garbage collector needs to use processor cycles and a largish memory cache to manage a huge number of heap-allocated objects that should really have been allocated on the stack.
First, I do agree that Java (the language and the environment) takes power away, but what power does it take away? It takes away the power to explicitly use the stack, to explicitly free allocations (with the corresponding destructor call made synchronously for objects), and to explicitly pack memory the way one can efficiently do in C++. In the case I was working on, I could "retrieve" half a million rows of data (from a hard-coded test driver) and realize a virtual "set" of data in memory to manage those retrieved results, and I could do that in only a few MB of memory (i.e. for the structures themselves). In Java, the same would take no less than 8MB, since each "row" would have to be a separately allocated object, and if each row internally held any object state, it would be at least 16MB or 24MB (with one or two internal objects per row, respectively). So when you are dealing with large "arrays of structs", or things that benefit from explicit stack allocation, Java is much less efficient (in both space and time) than C++, and yes, that directly contributes to GC cycles being burnt (500,000 actual objects to manage instead of a handful of pointers to block-allocated regions holding 500,000 row structures).
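To make the space argument concrete, here is a minimal C++ sketch contrasting one contiguous block of row structs with one heap allocation per row. The `Row` layout is hypothetical (the post doesn't give one). The 16-bytes-per-object figure is simply what the 8MB estimate for 500,000 objects implies, not a measured JVM number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical fixed-size row; the real row layout is not given in the post.
struct Row {
    std::int64_t id;
    std::int32_t status;
    double amount;
};

// All rows live in one contiguous block: a single heap allocation for the
// whole result set instead of one allocation per row.
class RowBlock {
public:
    explicit RowBlock(std::size_t capacity) { rows_.reserve(capacity); }

    void append(const Row& r) { rows_.push_back(r); }

    const Row& at(std::size_t i) const {
        assert(i < rows_.size());
        return rows_[i];
    }

    std::size_t size() const { return rows_.size(); }

    // Bytes used by the row payload itself (excluding the vector header).
    std::size_t payload_bytes() const { return rows_.size() * sizeof(Row); }

private:
    std::vector<Row> rows_;
};

// The per-object alternative: every row is its own heap allocation, which is
// roughly what a "one object per row" design in Java forces.
std::vector<std::unique_ptr<Row>> allocate_individually(std::size_t n) {
    std::vector<std::unique_ptr<Row>> rows;
    rows.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        rows.push_back(std::make_unique<Row>(
            Row{static_cast<std::int64_t>(i), 0, 0.0}));
    }
    return rows;
}

// Back-of-envelope check of the 8MB figure: with an assumed minimum of
// 16 bytes of per-object overhead, 500,000 row objects cost ~8MB before any
// row data is stored at all.
constexpr std::size_t per_object_overhead(std::size_t objects,
                                          std::size_t bytes_per_object) {
    return objects * bytes_per_object;
}
static_assert(per_object_overhead(500000, 16) == 8000000, "~8MB");
```

The block version hands the allocator one region to manage. `allocate_individually` creates n separate allocations, which is the situation a collector ends up tracing when every row is an object.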
A few points to bring us back to reality, though:
1) The approach in C++ can be much more efficient -- that is true. However, many applications written in C++ will use "new" roughly the same number of times that the application would have used it in Java, because (IMHO, from my own experience) it's hard work to do large-scale optimizations like the one I described.
2) VMs are getting better at using stack allocation. With a little foresight in the VM/language design, even pointers and arrays of structs (two areas where Java, for example, is a non-player) would be possible, not so much as language features but as natural VM optimizations.
3) 99% of objects could probably be stack-allocated. I had no problem doing that in C++, and VMs for languages like Java will likely have no problem with it in the future. The problem is the other 1%, which accounts for much of the complexity in large systems built with languages like C++.
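The split described in points 2 and 3 can be sketched in C++ as follows. `Stats` and `Registry` are hypothetical names used only for illustration. The first function shows the common case, an object that never escapes its scope. The class shows the hard case, an object that escapes into a structure whose lifetime is decided elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical value type used for illustration.
struct Stats {
    long count = 0;
    long total = 0;
};

// Common case: the object never outlives the call, so it lives on the stack
// and is reclaimed automatically on return (no new/delete, no GC work).
// A VM with escape analysis can, in principle, make the same decision for
// an object that provably does not escape its method.
double average(const std::vector<long>& values) {
    Stats s;  // stack-allocated
    for (long v : values) {
        ++s.count;
        s.total += v;
    }
    return s.count == 0 ? 0.0 : static_cast<double>(s.total) / s.count;
}

// Hard case: the object escapes into a registry whose lifetime is decided
// elsewhere, so it must be heap-allocated and ownership must be explicit.
class Registry {
public:
    std::shared_ptr<Stats> create(const std::string& name) {
        assert(!name.empty() && "registry names must be non-empty");
        auto stats = std::make_shared<Stats>();
        by_name_[name] = stats;  // outlives this call
        return stats;
    }

    void remove(const std::string& name) { by_name_.erase(name); }

    std::size_t size() const { return by_name_.size(); }

private:
    std::map<std::string, std::shared_ptr<Stats>> by_name_;
};
```

Nothing about `average` requires thought about ownership; every line of `Registry` does, and that is where the "other 1%" of complexity tends to live.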
> Using Java for software that processes a lot of data in memory is simply not productive. It's much easier to optimise memory consumption with C/C++ and there is a whole range of software where optimisation of memory use is crucial.
We build applications with tens or hundreds of gigabytes of data in memory, and we do it today in Java. It is not _easy_ in Java (we use distributed computing to accomplish it, by partitioning the data over a grid-based finite state machine).
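The general idea of spreading in-memory data across many machines can be illustrated with simple hash partitioning. This is a minimal C++ sketch under stated assumptions (a fixed partition count, round-robin ownership of partitions by members, and `std::hash` as the key hash); `PartitionMap` is a hypothetical name, and this is not a description of how the grid mentioned above actually works.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical partitioned store: the key space is split into a fixed number
// of partitions, and each partition is owned by one member of the grid.
// Capacity grows by adding members, not by growing any single heap.
class PartitionMap {
public:
    PartitionMap(std::size_t partitions, std::size_t members)
        : owner_(partitions), members_(members) {
        assert(partitions > 0 && members > 0);
        for (std::size_t p = 0; p < partitions; ++p) {
            owner_[p] = p % members;  // round-robin partition ownership
        }
    }

    // Partition a key belongs to (consistent within one process only).
    std::size_t partition_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % owner_.size();
    }

    // Member that should hold (and process) the entry for this key.
    std::size_t member_of(const std::string& key) const {
        return owner_[partition_of(key)];
    }

    // Number of partitions owned by each member.
    std::vector<std::size_t> partitions_per_member() const {
        std::vector<std::size_t> counts(members_, 0);
        for (std::size_t m : owner_) ++counts[m];
        return counts;
    }

private:
    std::vector<std::size_t> owner_;  // partition index -> member index
    std::size_t members_;
};
```

Because `std::hash` results are only required to be consistent within a single program execution, a real grid would need a hash function that every member computes identically, plus logic to move partitions when members join or leave.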
> There aren't many widely used DBMS or BI engines written in Java, and that's for a reason.
Yes, because there were great DBMSs long before Java existed. They are not written in C++, but in C, which was the primary systems language when they were written.
As far as BI goes, quite a bit is done in Java, but in many cases it is done in C++, which was the primary systems language when those products were written.
> Java is unfortunately somewhere in the middle. It's a bad systems programming language because it takes so much control over resources away from you. And it's a mediocre applications language because it doesn't have many features for high-level abstraction that languages like Python do have, AND most surprisingly, it has absolutely awful string processing capabilities.
I generally agree. Java isn't quite a systems language (too high level) and isn't quite an apps language (too low level). Likewise, C++ was never an apps language (way too low level) but made a decent systems language.
I do enjoy Java and think it (not just the language, but the VM approach and the wide range of libs) is a good fit for that "middle 98%" of programming. OTOH I'm not emotionally tied to Java, so I have no problem if something better replaces it. (And sorry Todd, that probably won't be Smalltalk. ;-)
Peace,
Cameron Purdy http://www.tangosol.com/