Registered: Feb, 2008
Re: How To Go Slow
Posted: Feb 12, 2008 1:32 AM
> > Each such layer will also add innecessary conversions
> > of data and data structures.
> Not necessarily. That's where generic programming shines:
> providing abstraction without penalty.
Yeah, you're right, generic programming in C++ (and also usage of non-virtual class system) allows this layer to be removed during compilation, so these abstractions give no penalty. But that's a special case in today's OO systems.
Probably it's just the whole static vs. dynamic thing. The more things you have static, you lose flexibility, but you can optimize much better.
> > There are ways out in my opinion. One is to take a look
> > at VHDL, for instance. It is a very modular language (for
> > hardware design), yet the final design is optimized as
> > a whole - there are no redundant data conversions after
> > the synthesis.
> It's comparatively easy to write such entirely declarative
> languages to address particular domains (DSLs), but I
> don't think anyone has really succeeded in making a
> general-purpose programming language that works that way.
The generic programming you mention is actually an example of paradigm that comes close to that in this respect.
> > Second approach is to decouple the data structures used
> > in the program from the actual program logic.
> Bingo. Tease out the abstract algorithm so you can reason
> about its efficiency.
Actually, I have an idea in mind based on that, below.
> > Any program
> > that uses a database essentially does that - the common
> > point is the database schema, and how exactly (with
> > what algorithms) it is implemented in the database is not
> > relevant. The actual database implementation can then
> > be optimized independently of the program, by choice of
> > indexes for example.
> Are you suggesting that programs that use databases don't
> have efficiency problems?
Well, they may have efficiency problems, but part of these problems can be easily solved by hinting the database (by normalizing the schema and creating the correct indexes). So you don't need to touch the program logic at all.
And that's my point. The program logic and the data structure implementation are somehow "decoupled" in programs that use databases (actually, it's still muddy if you use stored procedures, but that's another story).
Let me tell you more about my pet research project (which unfortunately, for the time being, competes with my other pet research projects).
I imagine a language where every data structure (especially the global ones, which cannot be described in terms of standard data structure) would be designed like a database scheme is usually designed. The actual implementation of the schemes would be thus completely decoupled from the program logic.
I even imagine that scheme implementation would be created by compiler (two manual passes - in first, compiler creates a test implementation, then you run tests of your program, you gather statistics about how the schemes are used, and in second pass, the compiler will use the gathered statistics as well as other user's input to create efficient implementation of the data structures).
I think this approach would have many advantages, most importantly, possibility to change the data structures used at will, depending on the change of requirements of program logic.
There are many interesting ideas in this direction. For example, compiler could implement the scheme in completely different way, and use the original scheme as a view of it, so the original scheme would be translated during compilation to the new scheme. You could even have older parts of program to reference the data structures in old way, while having newer parts to reference them in new way - like versioning of data structures.
Other idea is to have "virtual columns" in the scheme, which would be computed from values in other columns. The compiler would decide if it's worth to cache this value in the table, or if it should compute it at each use of this value (depending on how many inserts/reads you do and how difficult is to obtain the value).
So you see, there are bazillion possibilities, and also lot of interesting problems to overcome. However, it contradicts to current OOP paradigm (read: languages like Java, C#, Ruby, Groovy) in two fundamental ways:
1. Centralization of data structures - obviously, since I want to emphasize them. This contradicts having networks of small objects implementing the data structures.
2. The data structures are static for the run of program - if I want to optimize them, I need to know the type of usage ahead. This contradicts having dynamic language.
I generally find the idea of compiler that selects data structures for you depending on your actual needs and goals (speed, memory footprint) very intriguing, similar to the way that current compilers select instructions and registers for you instead of using assembler and doing it yourself.