> But the relational model does demand that I have the relational logical view on top of a variety of physical implementations, and that the relational view must not change when the physical layer does. Developers have been telling us what physical layer they want by the way they de-normalize. Good support for a relational model would allow that physical model and would hide it properly from the queries, which should use the BCNF model.
You're solving the problem backwards. The approach you are describing happened with CODASYL in the '70s and '80s. CODASYL added a logical model on top of the physical model. Relational won. Why? Because it was a better model for a SELF-* system: self-healing, self-tuning, self-describing, etc.
In order to write queries against a BCNF table and join the results to another BCNF table via a common table expression (CTE), you'd need to build a logical model anyway. By putting the physical model first in your design considerations, you are putting optimization ahead of correctness. Moreover, you're creating a "strange loop" where the first part of your incremental, iterative analysis/design/implementation/sign-off spiral is optimization.
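To make that concrete, here is a minimal sketch of querying BCNF tables through a CTE. The schema, table names, and data are invented for illustration, and SQLite stands in for a real DBMS; the point is only that the query speaks the logical model and leaves the physical layout to the engine.

```python
import sqlite3

# Hypothetical BCNF schema: "orders" and "order_items" are illustrative
# names, not taken from the original discussion.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(order_id),
        sku TEXT,
        qty INTEGER,
        PRIMARY KEY (order_id, sku)
    );
    INSERT INTO orders VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO order_items VALUES (1, 'A-1', 2), (1, 'B-9', 1), (2, 'A-1', 5);
""")

# The CTE stays entirely on the logical (BCNF) model; how the engine
# executes it against the physical layout is the engine's business.
rows = conn.execute("""
    WITH big_orders AS (
        SELECT order_id, SUM(qty) AS total_qty
        FROM order_items
        GROUP BY order_id
        HAVING SUM(qty) > 2
    )
    SELECT o.customer, b.total_qty
    FROM big_orders b
    JOIN orders o ON o.order_id = b.order_id
""").fetchall()
print(sorted(rows))  # [('acme', 3), ('globex', 5)]
```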
You can't win big going this way, that's why CODASYL lost.
Furthermore, developers can't be trusted to get most aspects of the physical layer right, such as compact storage and retrieval of elements in sparse arrays. What time has proven is that good developers can get PEEPHOLE OPTIMIZATIONS correct. So that is what a good physical schema description language allows developers to do.
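As an illustration of the kind of storage detail being talked about, here is a toy dictionary-of-keys sparse vector. This is purely an invented example of compact sparse storage, the sort of physical-layer bookkeeping an engine should own rather than an application developer:

```python
# Illustrative only: a dictionary-of-keys sparse vector. Zeros occupy
# no storage at all; only the non-zero entries are materialized.
class SparseVector:
    def __init__(self, length):
        self.length = length
        self._data = {}  # index -> value; absent keys read as zero

    def __setitem__(self, i, value):
        if not 0 <= i < self.length:
            raise IndexError(i)
        if value == 0:
            self._data.pop(i, None)  # writing zero reclaims the slot
        else:
            self._data[i] = value

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self._data.get(i, 0)

    def nnz(self):
        """Number of non-zero entries actually stored."""
        return len(self._data)

v = SparseVector(1_000_000)
v[3] = 7
v[999_999] = 2
v[3] = 0          # back to zero: storage shrinks again
print(v[999_999], v.nnz())  # 2 1
```

Getting even this simple structure right (reclaiming slots on zero-writes, bounds checks) is exactly the sort of thing that belongs below the line the developer sees.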
Where I've observed a big shortcoming in SQL DBMSes is the maintainability of SQL itself, and DBMS vendors believing people will pay an extra $10,000 for "Visual Studio for Database Professionals" (and then another $10,000 for "Visual Studio Team System" for application-layer stuff). It used to be: if you are prepared to pay Microsoft $10,000, they will give you a product that *generally* works and makes deployment and upgrades easier. And then when you complain about an API design flaw in their SDK, you get no response. So you start to wonder why you don't just re-frame the problem so you're not depending on Microsoft to fix your problems. This is where "Migrations" APIs have come in huge. Now you're not depending on a big, black-box, disgusting IDE that's slow and stupid and has all sorts of UI bugs on top of the bugs in the core engine.
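The core of the "migrations" idea fits in a few lines. This is a toy sketch, not any particular library's API: numbered SQL steps applied in order, with the current version recorded in the database itself so re-running is safe. Table names and steps are invented.

```python
import sqlite3

# Toy migrations runner: ordered (version, sql) steps, with the applied
# version tracked in a schema_version table inside the database.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]

def migrate(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, sql in MIGRATIONS:
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
    conn.commit()

conn = sqlite3.connect(":memory:")
migrate(conn)
migrate(conn)  # idempotent: already at the latest version, so a no-op
cols = [c[1] for c in conn.execute("PRAGMA table_info(customers)")]
print(cols)  # ['id', 'name', 'email']
```

The appeal is exactly what's described above: deployment and upgrades become plain, versionable code instead of something a proprietary IDE does for you.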
So what you really want is a HideSQL movement, where all this goop is abstracted away from you as much as possible, but you can drop down into raw SQL when HideSQL can't hide something from your purview. That's what Andromeda is all about. sprox doesn't really do this; sprox has a really dumb model when it comes to actually modeling the enterprise. The downside to Andromeda is that it's written in PHP, has only one full-time developer, and its code-base is rather ancient.
The new approach to HideSQL, ironically enough, comes from Microsoft: Oslo. Reading Oslo team member Tony Bloetsch's blog is always insightful, because it's as if he knows all of our SQL Server performance tricks for efficiently implementing a model on a SQL DBMS.
> Developers have been telling us what physical layer they want
Tell them to shut up. Do you tell the surgeon how to remove a tumour from your brain? Expertise matters. It's not their job.
> How about I create a materialized view which contains the joins of orders and items. Then I create an optimizer smart enough so that when I write the join and use the BCNF model it uses the materialized view. Is that having it both ways? Does not sound very difficult at all.
In the context being considered, SSD on multi-machines, none of this matters. That's the point. The logical definition and the physical definition are the same. The SSD moves data around for its own purposes, anyway.
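For what it's worth, the materialized-view idea quoted above can be sketched by hand. SQLite has no materialized views, so a trigger-maintained table stands in for one here; in a real engine the optimizer would route the BCNF-style join to it transparently, whereas this sketch queries it directly. All names and data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE items  (order_id INTEGER, sku TEXT, qty INTEGER);

    -- The "materialized view": a precomputed orders-joined-items table,
    -- i.e. the denormalised physical layout, kept current by a trigger.
    CREATE TABLE mv_order_items (order_id INTEGER, customer TEXT, sku TEXT, qty INTEGER);

    CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
        INSERT INTO mv_order_items
        SELECT o.order_id, o.customer, NEW.sku, NEW.qty
        FROM orders o WHERE o.order_id = NEW.order_id;
    END;

    INSERT INTO orders VALUES (1, 'acme');
    INSERT INTO items  VALUES (1, 'A-1', 2), (1, 'B-9', 1);
""")

# Logically this answers "SELECT ... FROM orders JOIN items"; physically
# it is a single scan of the precomputed table.
rows = conn.execute(
    "SELECT customer, sku, qty FROM mv_order_items ORDER BY sku"
).fetchall()
print(rows)  # [('acme', 'A-1', 2), ('acme', 'B-9', 1)]
```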
> Building a truly relational system has been obvious for a long time. Since it does not solve any significant problems, it is a route to failure.
You really need to read Date. The relational model is all about solving problems with more rudimentary data stores.
> The durability of SSD's has some advantages, but in terms of access time, they don't beat memory.
Yes, they do: nearly as fast, and persistent. In the 1990s it was recognized that databases ran faster if you disabled Unix file caching; one cache, run by the engine, was always faster. Full-DRAM database machines can be, and have been, built. Just way too many $$$.
> In the context being considered, SSD on multi-machines, none of this matters. That's the point. The logical definition and the physical definition are the same. The SSD moves data around for its own purposes, anyway.
However, the developers doing denormalised schemas may not have been offered such hardware. With a low-end database running on regular hardware, denormalising may be the only way they can meet the performance target.
> The new approach to HideSQL, ironically enough, comes from Microsoft: Oslo. Reading Oslo team member Tony Bloetsch's blog is always insightful, because it's as if he knows all of our SQL Server performance tricks for efficiently implementing a model on a SQL DBMS.
I knew there was another one, not java. I haven't looked at the Oslo site in nearly a year. I suppose I should go back and look, but I've returned to the *nix world, so I doubt I'd ever be able to use it.
Even Java has "something". I would call it "another one", because it involves gluing together a lot of pieces, such as xtext, to duplicate what Oslo has.
Oslo is by far the most general solution. Even still, I dislike it. For one, important parts (to me) are closed source and proprietary, or simply not extensible. Right now certain parts are clearly implemented in puzzlingly stupid ways. Bear in mind this was simply a ~10 test run I gave it, but I know what I'm looking at when I look at it.
Probably the most significant contribution from Oslo will be Quadrant (the "tool" part), if they get it right. To replicate this in Java you need an Eclipse plug-in (xtext), and it's a bit of a pain in the ass to extend Eclipse plug-ins themselves, so right there development is already slower, and we're talking about hiring armies here to solve the problem.
> However, the developers doing denormalised schemas may not have been offered such hardware. With a low-end database running on regular hardware, denormalising may be the only way they can meet the performance target.
The thing is, I never hear people actually discuss how they are profiling and measuring their systems. I never hear them explain how their systems behave under high load, or what design decisions they made to cope with it. The only design decision seems to be in the back-end. If you're not telling me what kind of latency and propagation delay you've got on the network, etc., then you are not telling me enough details to make me think "I trust you."
Nope. Instead, I think "here is a 20-something who wants to play around with some novel code he wrote." Not going to hang multi-million dollar companies on your shoulders.
> > In the context being considered, SSD on multi-machines, none of this matters. That's the point. The logical definition and the physical definition are the same. The SSD moves data around for its own purposes, anyway.
>
> However, the developers doing denormalised schemas may not have been offered such hardware. With a low-end database running on regular hardware, denormalising may be the only way they can meet the performance target.
Well, it depends on "normal". Heavy RAID10 arrays will go as fast as the SSD machines I speak of; they've been doing that for years, just at too high a cost. The new SSD machines bring that performance into new realms.
But, yes, a single 300gig drive in a basic PC will behave badly.