Registered: Feb, 2008
Re: How To Go Slow
Posted: Feb 12, 2008 3:05 PM
> > This contradicts having networks
> > of small objects implementing the data structures.
> As I already mentioned, this represents a misunderstanding
> of OO.
No, you don't understand. There are several layers of data structures in any large program. Let me explain on example what I have in mind:
Imagine your data are like two tables of records, which are related. So you have a class A and B for respective records in those tables. The class A contains (among other things) string class S (points to its instance, to be precise). Then you have classes AA and BB for these tables perhaps, which again point to all records inside those tables. And finally, you have class C which points to AA and BB instances and contains some relation between the records in these two tables.
Now, my point is, even if all the algorithms in classes A, B, C, AA, BB would be optimal, there still could be some unoptimal remnants in the whole scheme. For example, it may happen that class S always stores length of the string, but I just don't need it anywhere in any of the other classes. It may happen that class A is implemented as an array of pointers to the instances, but it would be much more useful from cache standpoint, if it would just have the data from it's instances together. It may happen that class C always requires only a specific set of records from the classes AA and BB, so it would be useful, if they would contain also pointers to a subset of these records stored somewhere.
If I had just an empty paper and see the grand scheme of things from the start (all the classes), I would certainly define the structure of all data in memory very differently, so it could be fast for the problem at hand, class C.
The problem here is thus that the top layer (class C) only sees layer with classes AA and BB, this layer sees only layer with classes A and B, and so on.
There are so much possible inefficiencies from these 5 classes that it's amazing, and consider that large OO programs have hundreds of classes, and tens of such layers.
I believe that is actual reason why today's programs are so slow and memory-hungry. There is just no single simple piece of code that can be easily optimized. It's a big collection of such little things described above, accumulated over the layers.
That's why I am pointing to databases for inspiration - database schemas have only few layers. There are basic datatypes, maybe arrays of them, then records (table rows), tables and their relations. There are always 5 layers, and their structure and purpose is clear, and the optimization is (relatively to OOP) easy because of that.
I explain below why I think Java or C# cannot deal with this problem. I hope you see now why I feel there is need of centralized management of data structures, and compilers that are aware of the data structures in their all complexity.
> > 2. The data structures are static for the run of program
> > if I want to optimize them, I need to know the type of
> > usage ahead. This contradicts having dynamic language.
> Java and C# are static languages.
Yes, they're statically typed, but they must preserve the class structure, to be able to handle overloaded classes and for reflection. It means that they cannot optimize classes away like C++ can do (if virtual methods are not used, that is). In this sense they are more dynamic, because they cannot base their optimizations on implementation of other classes. You see, it's a dilemma - either you optimize but you have to know in advance, or you take advantage of not knowing in advance, and then it's hard to optimize.
The cases that need dynamic features (like handling of unknown inherited classes and reflection) are like 1% of all cases. It's wrong to slow down everything else (by making it impossible to optimize) just because these features are sometimes useful.