Re: Software Development Has Stalled
Posted: Feb 24, 2010 6:52 AM
> > For example, let's say we have 3 files that we want to
> > update. First, we read the modification timestamp of
> > each file; then we open each file and compare its
> > modification timestamp with the original one.
> > If it is the same as the original, we proceed with
> > changing the file. If we find that a file's timestamp has
> > changed, we abort the operation and restore the
> > files to their previous state. This is a transaction that
> > does not involve a relational schema.
> Who controls locking of those files? Your application?
> Do you ask the OS for a lock? Do you set a flag your
> application (or language) understands? Do you lock the
> entire file(s)? What if you need to update one small part
> of one file? Do you lock all the other data in that file
> and all the data in the others while you fiddle the bits?
> How do you know which parts of the 3 files are related?
> Do you update the files in place, or do you write out a
> new copy of each file, before and after update? Where do
> you log updates (including adds and deletes)? It gets
> really messy really, really fast.
It was just an example to show you that transactions and schemas are unrelated things. In my example, the O/S provides the mechanism for locking the files and the applications use this mechanism. I am not saying that this is how information stores should work.
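The timestamp scheme in the quoted example is a form of optimistic concurrency control. Here is a minimal Python sketch of it, assuming `os.stat(...).st_mtime_ns` is a reliable enough change detector on the filesystem in use (coarse timestamps can miss rapid edits); `update_files` is a hypothetical name. It deliberately leaves out OS locks, so another writer could still slip in between the check and the write; the O/S locking mechanism mentioned above is what would close that window.

```python
import os
from typing import Callable, Dict, List


def update_files(paths: List[str], transform: Callable[[str], str]) -> bool:
    """Hypothetical helper: apply `transform` to every file, all or nothing.

    Returns True on commit, False if some file changed underneath us (abort).
    """
    # Step 1: remember each file's modification timestamp and contents.
    stamps: Dict[str, int] = {p: os.stat(p).st_mtime_ns for p in paths}
    originals: Dict[str, str] = {}
    for p in paths:
        with open(p, "r", encoding="utf-8") as f:
            originals[p] = f.read()

    # Step 2: compute all new contents in memory before touching the disk.
    new_contents = {p: transform(originals[p]) for p in paths}

    written: List[str] = []
    try:
        for p in paths:
            # Step 3: if someone else modified the file meanwhile, abort.
            if os.stat(p).st_mtime_ns != stamps[p]:
                raise RuntimeError(f"{p} changed concurrently")
            with open(p, "w", encoding="utf-8") as f:
                f.write(new_contents[p])
            written.append(p)
    except RuntimeError:
        # Step 4: restore the files we already rewrote to their previous state.
        for p in written:
            with open(p, "w", encoding="utf-8") as f:
                f.write(originals[p])
        return False
    return True
```

Note that rolling back restores the contents but not the old timestamps; a real implementation would have to decide whether that matters to other readers.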
> If you want the updates to not trash your data, then your
> application code *must* know the meta-data of the data it
> is updating. You can embed that meta-data and logic in
> each and every application program; COBOL and Java coders
> like to do that. Or you can rely on the database engine
> to ensure that any application, even a data editor,
> correctly updates the data. You really need to review G&R
> and W&V, they describe why such a path is doomed.
> Remember, it's been 50 years (yes, that long) since CICS
> attempted to do that. IBM is still fixing it; it ain't
> an easy task.
I am not saying that meta-data are not useful. I am saying that meta-data are not related to transactions in any way. Please don't look at the term 'transaction' only in the context of an RDBMS.
> > The problem I referred to in my previous post regarding
> > databases is that information is organized in tables
> > and that there are predefined relations between the tables.
> > This is a rigid structure that prohibits any use
> > outside of the context that it was created for.
> You say that very glibly, but that doesn't change the fact
> that the hierarchic structure of IMS and xml is far more
> rigid. This is the main reason that Dr. Codd devised the
> relational model of data: the hierarchic structure (it
> wasn't and isn't a model) is fully rigid. That's just the
> way it happened.
Yes, it's far more rigid, and so far more problematic. But I am not sure why you are mentioning it. The point of this discussion is what to do to solve the problem of software development having stalled, and part of that process is identifying the problem.
> > This is a
> > real problem for software development! It's one of the
> > reasons software development has stalled, in my opinion.
> > A much better approach would be to forget tables,
> > rows and relations. Information should be stored in
> > key-value pairs; it should be the computer's task to
> > identify redundant information inside the key-value
> > store; it should also be the computer's task to discover
> > relations inside the information store. This is way more
> > flexible than any RDBMS and much more future proof, if you
> > ask me.
> > From a technical point of view, the key-value database
> > system should be free to create indexes, tables and
> > relationships,
> Creating relationships on the fly (by application code,
> presumably) alters the semantics of the data for all
> applications that use the data. *If* your application is
> the only one using the data, then it doesn't matter,
> except for the fact that all of your programs must be
> refactored to the altered semantics each time you do so.
No, relationships should not be created by the application code. The computer itself would discover the relationships, and they would then be available to all applications.
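One known way a system could discover relationships without anyone declaring them is inclusion-dependency detection: if every value of some field in one collection also appears as a key of another collection, that field is a candidate reference. A brute-force Python sketch, assuming collections are plain dicts mapping keys to records and that field values are hashable; `discover_references` is a hypothetical name, and this is one technique among several, not a description of any particular product. A match found this way is a hint, not declared meaning: it can be a coincidence.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Store = Dict[str, Dict[object, Record]]  # collection name -> key -> record


def discover_references(store: Store) -> List[Tuple[str, str, str]]:
    """Return (collection, field, target_collection) triples where every
    value of `field` in `collection` is also a key of `target_collection`."""
    found = []
    for src_name, src in store.items():
        # Collect every field name used by any record in this collection.
        fields = {f for rec in src.values() for f in rec}
        for field in sorted(fields):
            values = {rec[field] for rec in src.values() if field in rec}
            if not values:
                continue
            for dst_name, dst in store.items():
                if dst_name == src_name:
                    continue
                # Brute-force containment check: are all values keys of dst?
                if values.issubset(dst.keys()):
                    found.append((src_name, field, dst_name))
    return found
```

For example, with `customers` keyed by id and `orders` records carrying a `customer` field, the function reports `("orders", "customer", "customers")` as a candidate relationship.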
> > Most of the options in RDBMS are for
> > optimization anyway;
> That's just wrong. You need to review. The bits and
> pieces are there to ensure data integrity.
Suppose you had a computer with infinite speed and a bunch of non-normalized data. You could find the normalized form of the data each time you requested some of it by brute force, comparing every piece of data with every other piece.
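The brute-force idea can be made concrete with functional dependencies: column A determines column B if no two rows agree on A but disagree on B. Finding such pairs by comparing every row with every other row is exactly the quadratic brute force described above, and each dependency found is a candidate for moving B into a separate table keyed by A. A Python sketch under stated assumptions: every row has the same columns, only single-column dependencies are checked, and `find_dependencies` and `split_out` are hypothetical names.

```python
from itertools import combinations, permutations
from typing import Dict, List, Tuple

Row = Dict[str, object]


def find_dependencies(rows: List[Row]) -> List[Tuple[str, str]]:
    """Brute force: return (a, b) pairs where column a determines column b."""
    columns = list(rows[0].keys()) if rows else []
    deps = []
    for a, b in permutations(columns, 2):
        holds = True
        # Compare every row with every other row (O(n^2) per column pair).
        for r1, r2 in combinations(rows, 2):
            if r1[a] == r2[a] and r1[b] != r2[b]:
                holds = False
                break
        if holds:
            deps.append((a, b))
    return deps


def split_out(rows: List[Row], a: str, b: str) -> Tuple[List[Row], Dict[object, object]]:
    """Move column b into a lookup table keyed by a (assumes a -> b holds)."""
    lookup = {row[a]: row[b] for row in rows}
    remaining = [{k: v for k, v in row.items() if k != b} for row in rows]
    return remaining, lookup
```

With rows like `{"order": 1, "city": "Athens", "country": "Greece"}`, the search finds that `city` determines `country`, and `split_out` turns the repeated country names into a single city-to-country table.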
Please understand that I am not saying that the strong typing of data should be abolished; what I am saying is that the recognition of data relationships should be left to the system.
> > for example, primary keys, indexes
> > and relationships exist in order to have optimized
> > searching and avoid data redundancy.
> No, absolutely not; search optimization is supported by
> secondary indices, sometimes. They're there to enforce
> data integrity. Avoiding data redundancy, yes; which is
> the whole point of the relational model. Some of us think
> that's a goal worth pursuing. It's not just avoiding
> bloat, but mostly about avoiding logic errors.
Primary keys are there to serve as record identifiers. If you don't have records (and rows and columns), then you don't need primary keys. The only reason primary keys exist is to identify records; i.e. instead of searching the whole table each time you want to locate a record, you use the identifier to refer to the record explicitly.
Again, if you had an infinitely fast computer, you wouldn't use primary keys; you would simply brute force your way through the database and compare records to records.
So, primary keys are an optimization.
In a key-value system, you don't need primary keys, because the key is the identifier.
Indexes are not related to data integrity; their only role is to increase the speed of sorting and searching. Again, if you had an infinitely fast computer, you wouldn't need indexes.
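The claim that a search index changes speed but not answers is easy to check: a lookup through a prebuilt index must return the same records as a full scan. A small Python sketch, assuming records are dicts and an index is a hash map from a field value to record positions; `scan` and `build_index` are hypothetical names. This models a plain (non-unique) index only.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, object]


def scan(records: List[Record], field: str, value: object) -> List[int]:
    """Full scan: O(n) per query, needs no extra structure."""
    return [i for i, r in enumerate(records) if field in r and r[field] == value]


def build_index(records: List[Record], field: str) -> Dict[object, List[int]]:
    """Hash index: O(n) to build once, then roughly O(1) per lookup."""
    index: Dict[object, List[int]] = defaultdict(list)
    for i, r in enumerate(records):
        if field in r:
            index[r[field]].append(i)
    return index
```

Both functions return the same positions for any value; the index only trades memory and build time for faster lookups.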
Since indexing requirements depend on what queries will be run against the system, it is better to build the indexes after seeing the queries, which means it is the system that should take care of indexing.
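A minimal sketch of the system taking care of indexing on its own: the store counts how often each field is queried and builds a hash index for a field once it has been queried a threshold number of times, so the indexing decision follows the actual workload. The class name `AutoIndexingStore` and the `threshold` parameter are hypothetical; a real system would also have to weigh the memory and write cost of every index it keeps.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class AutoIndexingStore:
    """Hypothetical store that decides by itself which fields to index."""

    def __init__(self, threshold: int = 3) -> None:
        self.records: List[Dict[str, object]] = []
        self.query_counts: Counter = Counter()
        self.indexes: Dict[str, Dict[object, List[int]]] = {}
        self.threshold = threshold

    def insert(self, record: Dict[str, object]) -> None:
        pos = len(self.records)
        self.records.append(record)
        # Keep any existing indexes up to date on every write.
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].append(pos)

    def find(self, field: str, value: object) -> List[Dict[str, object]]:
        self.query_counts[field] += 1
        if field not in self.indexes and self.query_counts[field] >= self.threshold:
            # The workload shows this field is queried often: index it.
            index: Dict[object, List[int]] = defaultdict(list)
            for i, r in enumerate(self.records):
                if field in r:
                    index[r[field]].append(i)
            self.indexes[field] = index
        if field in self.indexes:
            return [self.records[i] for i in self.indexes[field].get(value, [])]
        # Not indexed (yet): fall back to a full scan.
        return [r for r in self.records if field in r and r[field] == value]
```

The caller never declares an index; queries return the same results before and after the store decides to build one.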
Relationships between tables exist in order to avoid data redundancy. Again, on an infinitely fast computer there would be no redundancy, because each time new data entered the system, it would be compared to all the existing data and would not be stored again if it was found to already exist. So relationships are, again, an optimization.
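The "compare every new datum with all existing data" idea is simple to write down; its cost is a linear scan per insert, which is the price the argument proposes to pay with faster hardware. A Python sketch, assuming items are compared by plain equality; `DedupStore` is a hypothetical name. Returning the position of the existing copy, rather than storing a second one, plays the role that a reference between tables plays in a normalized schema.

```python
from typing import List


class DedupStore:
    """Hypothetical store that never keeps the same value twice."""

    def __init__(self) -> None:
        self.items: List[object] = []

    def put(self, value: object) -> int:
        """Return the position of `value`, storing it only if it is new."""
        # Brute force: compare the new value with every stored value.
        for pos, existing in enumerate(self.items):
            if existing == value:
                return pos
        self.items.append(value)
        return len(self.items) - 1
```

Putting `{"name": "Greece"}` twice returns the same position both times and keeps a single copy.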
A key-value system can easily take care of most data integrity needs through the choice of an appropriate key. All the other integrity requirements that cannot be expressed via keys fall into the not-easily-provable category, so they need code; RDBMSs provide stored procedures for exactly this reason.
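To illustrate the split between key-enforced and code-enforced integrity: in a key-value store, using an email address as the key makes a second account with the same email impossible, while a rule such as "balance must not be negative" cannot be expressed by any key and has to be checked in code, which is the job stored procedures do in an RDBMS. A Python sketch; the `accounts` store, the `put_account` function and the balance rule are illustrative assumptions.

```python
from typing import Dict


class IntegrityError(Exception):
    """Raised when a write would violate an integrity rule."""


def put_account(accounts: Dict[str, Dict[str, int]], email: str,
                balance: int, *, create: bool = True) -> None:
    """Hypothetical write path for an `accounts` key-value store.

    The key (email) enforces uniqueness by construction; the balance rule
    is not expressible as a key, so it is checked procedurally.
    """
    if create and email in accounts:
        # The key already exists: a duplicate account is rejected.
        raise IntegrityError(f"duplicate key: {email}")
    if balance < 0:
        # A rule the key cannot capture; enforced in code.
        raise IntegrityError("balance must not be negative")
    accounts[email] = {"balance": balance}
```

Creating the same email twice fails on the key alone; a negative balance fails only because of the explicit check.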
> > I think computers
> > today are powerful enough to do this work themselves
> > and not put the burden on the developer.
> And how can the computers do this without the meta-data?
> As I said last time: if, by way of example, xml were
> truly self-describing (including dtd and xsd) then one
> could write an automaton to generate the application
> language source to manage arbitrary xml. Hasn't, and
> won't, happen.
I did not say that meta-data are not needed at all. Obviously, some form of meta-data is required. What I am saying is that the RDBMS is one of the reasons software development has stalled.
> In a nutshell, no multi-user datastore can exist without a
> transaction manager. Whether you choose to write one
> yourself, or leverage 40 years of research, development,
> and implementation is up to you. Just be aware that if
> you think you have a method of doing so which hasn't been
> tried (and found wanting), you're wrong. These other
> methods have all been tried, and they failed. The reason:
> transactions are math, and that has been explored
> continually since CICS.
I do not disagree with you at all on this. In fact, I support it in more ways than you can imagine. What I am saying, though, is that the computer can do that math itself, without assistance from the programmer, if we choose another representation of data.