Registered: Sep, 2003
Re: Database Denormalization and the NoSQL Movement
Posted: Sep 23, 2009 7:00 AM
> But there are two tables, so I say two.
Again, when stored on SSD, no matter what the join type (nested loop, sort/merge, etc.) retrieving the parts of the Full Order happens all at once for each Order; it's as if the Order is stored "together" since physical location of bytes on the dies is irrelevant. In fact, the location of bytes is always changing as the controller moves data for wear leveling.
> > Since
> > this is SSD, running an a multi-core/processor machine,
> > the engine retrieves the Order and Order Items in
> > parallel, merging on output.
> This sounds like two physical retrieves, one for the order
> and one for the items. This is if you are lucky and the
> items for the order clustered together.
Clustering is not relevant on the SSD; all accesses are equally timed, random or sequential doesn't matter. This is one of the advantages to the SSD. The controller routinely moves data about on the dies for wear leveling purposes; you never know from one microsecond to the next where the physical location of bytes are, and it doesn't matter.
> Plus an operation
> to connect the two. Of course since this is an SQL
> database this is a BTree and you are jumping around index
> pages too.
> Plus there is the merge operation. And SQL databases are
> good at all this, but there is a reason they are the
> slowest things in most applications.
They are slow due to access when you Trust only in the Rust(tm). Data retrieval in SSD changes the equation.
> I hope you now agree that storing Orders and Order Items
> independently requires two retrieves, rather than one, if
> they were stored together.
I don't agree when run on an SSD multi-machine with an industrial strength engine (MySql is not). For such a machine and such an engine, a query plan for a join picks up the Items with each Order simultaneously. Again, with SSD, there is no guarantee that data will be stored in a clustered fashion in the silicon. What the controller presents to the OS and application is not what's on the silicon. The redirection of bad blocks in HDD is a simplified version of what goes on with SSD. With SSD, the controller moves data as needed.
> Why can't I have the single retrieve physical model share
> the same logical model as other physical implementations,
> where for example the order items are stored separately
> from orders.
If you implement "single retrieve physical model", by which I assume you mean a de/un normalized "row" with Order/Order Item columns for each Item, you get all the problems of flat files. So no, you can't have it both ways.
> People do de-normalize the physical model for read
> performance. The promise of the relation model is that
> you can do optimization of the physical model and clients
> don't change because the logical model does not change.
> The SQL products don't support this kind of thing well.
> . You could do is create a materialized view with the
> de-normalized model and if SQL optimizer would use it that
> would solve the problem. I don't think the SQL optimizers
> do that.
All of this is based on the assumption of the Trust Only on Rust(tm) stored paradigm. The SSD paradigm is different. There is no join penalty. So, nothing to be gained for Read Only Access of Order/Order Item, whether as base table or materialized view.
> I am saying:
> 1. SQL database are not relational in significant ways.
> 2. SQL database are useful
> 3. Therefore some non relational databases are useful
Your therefore does not follow from your axioms. What follows from your axioms: build a truly relational language to replace SQL (there are).
What coders find useful about xml/flat structures is the same as COBOL coders did with IMS in 1966: they can embed ONE specific access path useful to the code in the data structure. That is not a reason to abandon an 85% relational solution (SQL). With SSD (and CTE supportive databases), there is no longer any reason to bow to xml/flat data storage. And this is apples and oranges, anyway.
The failure of SQL to be faithful to Codd is not what drives coders to xml, or what motivates the join penalty (so called) with Trust Only the Rust(tm) storage. Not writing joins (SQL language or otherwise) is a reaction to a physical implementation failure, which no longer applies with SSD multi- machines. Coders flocked to xml because it meant they could wrest control of the data into each application's own little silo, just as COBOL coders had it in the 1960's with IMS. There was no intellectual decision, only a hegemonic one. That, and few if any coders have ever read Codd or Date or Gray, much less understood them. Try Weikum and Vossen for a true mind bender.