Java Community News - Column-Oriented Databases

27 replies on 2 pages. Most recent reply: Dec 7, 2007 4:33 PM by Raoul Duke

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Column-Oriented Databases

Posted: Dec 4, 2007 10:08 AM

> what makes
> you certain that storing column/row is somehow going to
> relax the rules of SQL and relational data management
> compared to storing it row/column?
>
> Peace,
>
> Cameron Purdy | Oracle

Didn't think I said that. OODBMSs do, to the extent that they rely exclusively on surrogate keys. Some SQL database folk make the same mistake, with the same results.

How a column database would implement RI (referential integrity), I haven't thought about. Looking at the two articles, I don't see that the authors did either. Both make reference to warehouse databases, in which case there's little to no demand for it.

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Column-Oriented Databases

Posted: Dec 5, 2007 1:16 AM

@Robert Young
@"The notion of interative processing (i.e. COBOL and OO and xml) goes by the wayside."

Did you mean iterative? I assume that's what you meant, but "interactive" is also only one letter away.

Now saying "iterative processing goes by the wayside" is a provocative statement I am interested in hearing more about, especially as it relates to your examples. Please don't leave it at a one-liner.

As I am not a COBOL programmer and have never had to maintain COBOL source code, please keep it to OO & XML.

How does OO necessitate iterative processing? (What definition of OO are you using?)

How does XML necessitate iterative processing? (What can you say about the abstraction of SAX events and XML technologies like XSLT?)

You seem to be well-informed, so hopefully you are willing to share your knowledge with me. :)

I understood perfectly your point about ACID, by the way. One way to put it is that a database is a closed system, and that the rules for storing and retrieving the data are defined in the database itself. By putting rules outside that closed system, the "I" in ACID has been subverted, and the database cannot look after the safety of the data. One of the essential ingredients, integrity, has been compromised. Moreover, DDLs are designed to make it easy to maintain these rules.

Put another way, programmers recognize the value of a specification. The program must do what the spec defines. Conformance encourages re-usability later on and discourages scope creep up front. It is well-known that code should be documented, and it is worth the effort, because the documentation can be written in parallel with the specification, so the two share a single slot on the schedule.

In addition, when development is iterative and incremental, the specification is going to change. Customers may not formally understand the problem domain. More likely, no single customer understands the whole problem domain. Not every business requires the same capabilities, even within a particular market. To punch home this point, ask a programmer at a large financial firm what the programmer across the floor works on and if he ever knew those requirements were part of the bigger picture.

Come to that, a DDL simplifies maintenance by managing the rules within that closed system. By going outside of it, you lose that critical advantage. The catch is that how much damage is done can only be judged by what happens in practice. Failure has to be visible for infection to matter.
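
A minimal sketch of rules living inside the closed system, assuming SQLite through Python's standard `sqlite3` module. The `customer` and `invoice` tables and the `try_insert` helper are made up for illustration; the point is only that the DDL, not the application, rejects bad data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount      NUMERIC NOT NULL CHECK (amount > 0)
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 250)")


def try_insert(sql):
    """Return True if the database accepts the statement, False if a rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical bad writes: neither needs any checking code in the application.
orphan_ok = try_insert("INSERT INTO invoice VALUES (11, 99, 100)")  # no customer 99
negative_ok = try_insert("INSERT INTO invoice VALUES (12, 1, -5)")  # violates CHECK
```

Both attempts are refused by the database itself, so every program that writes to these tables gets the same rules without repeating them.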

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Column-Oriented Databases

Posted: Dec 5, 2007 5:31 AM

> @Robert Young
> @"The notion of interative processing (i.e. COBOL and OO
> and xml) goes by the wayside."
>
> Did you mean iterative? I assume that's what you mean,
> but interactive is also a one letter word difference.
>
> Now saying "iterative processing goes by the wayside" is a
> provocative statement I am interested in hearing more
> about, especially as it relates to your examples. Please
> don't leave it at a one-liner.
>
iterative, of course. one-liner is really all that's needed: in OO/XML one "walks the tree", in application code written by the application programmer parsing the live data structure. this is generally done in application coded loops (some languages use tail recursion syntax; which degenerates into loops).

in COBOL, one reads a "record" from a "file", then does some processing. repeat.

one does not do that with the RM, or even SQL; to quote Date: "what not how".

moreover, to the extent that XML attempts to implement "relations" with the ID/IDREF notation, it is left to the application programmer to keep track. and this is global to the document, not specific to the attributes. Bachman's Turing Award speech was titled (IIRC) "The Programmer as Navigator".

to put it concretely: if one had the BOM for a 777 in XML there would be 13,457 nodes for a 1"x1/4"x20 stainless steel screw. change the qualification of the steel, and you visit all 13,457 nodes. no thank you.
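
A small sketch of the contrast being drawn here, assuming Python's standard `xml.etree.ElementTree` and `sqlite3`. The three-node BOM, the part numbers and the `part` / `part_usage` tables are invented for illustration and stand in for the 13,457-node case.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical BOM: the screw's material is repeated at every place it is used.
bom = ET.fromstring("""
<aircraft>
  <wing><part pn="SCREW-1" material="304 stainless"/></wing>
  <tail><part pn="SCREW-1" material="304 stainless"/></tail>
  <door><part pn="SCREW-1" material="304 stainless"/>
        <part pn="HINGE-7" material="titanium"/></door>
</aircraft>
""")

# Navigational style: application code walks the tree and touches each copy.
visited = 0
for part in bom.iter("part"):
    if part.get("pn") == "SCREW-1":
        part.set("material", "316 stainless")
        visited += 1

# Relational style: the fact is stored once; usages only reference it.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE part (pn TEXT PRIMARY KEY, material TEXT NOT NULL);
    CREATE TABLE part_usage (assembly TEXT, pn TEXT REFERENCES part(pn));
    INSERT INTO part VALUES ('SCREW-1', '304 stainless'), ('HINGE-7', 'titanium');
    INSERT INTO part_usage VALUES ('wing', 'SCREW-1'), ('tail', 'SCREW-1'),
                                  ('door', 'SCREW-1'), ('door', 'HINGE-7');
""")
cur = db.execute("UPDATE part SET material = '316 stainless' WHERE pn = 'SCREW-1'")
rows_changed = cur.rowcount
```

In this toy the loop edits one node per usage, while the `UPDATE` changes a single row no matter how many assemblies use the screw.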

another favorite quote:
[Dr.] Codd had a bunch of ...fairly complicated queries, and since I'd been studying CODASYL (the language used to query navigational databases), I could imagine how those queries would have been represented in CODASYL by programs that were five pages long that would navigate through this labyrinth of pointers and stuff [XML anyone?]. Codd would sort of write them down as one-liners. ...[T]hey weren't complicated at all. I said, 'Wow.' This was kind of a conversion experience for me. I understood what the relational thing was about after that.
-- Don Chamberlin/1995

naturally, Chamberlin has gone apostate with X-stuff.

put simply, the world is naturally relational, not hierarchical. the world is naturally aggregate, not inherited. there are certainly aspects (human families) which are hierarchical; but that doesn't mean that this structure is "natural" to everything else.

there is a long literature going back a decade constituting a war over the notion that hierarchy (whether manifest as OO or XML) is natural/superior to relational as data store. to my knowledge, none of the OO/XML advocates have offered a *mathematical* model of the quality and usefulness of the relational model. just because a lot of folks can be bamboozled into believing X, doesn't mean that X is true. contemporary US history is proof enough of that.

I could go on for days, but I'll leave it at that.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Column-Oriented Databases

Posted: Dec 5, 2007 7:09 AM

> in OO/XML one "walks the tree",

this is no more a required way of programming to such data than it is with relational stores.

People can choose to do navigational stuff with any data backing.

> in application
> code written by the application programmer parsing the
> live data structure. this is generally done in
> application coded loops ...
...
> one does not do that with the RM, or even SQL; to quote
> Date: "what not how".

(cough)GUI(cough)

seems to me I've seen an awful lot of application code that does exactly what you're talking about over the years, when people start needing to do something to expose the data to end users.

I agree that manipulating data can be done declaratively in SQL but could you please enlarge on how relational databases make it possible for an entire system to be developed without someone, somewhere writing loops and navigational code?

> moreover, to the extent that that XML attempts to
> implement "relations" with the ID/IDREF notation, it is
> left to the application programmer to keep track.

huh?

Now you've really lost me. ID consistency is mandated by the standard. The degree of support depends on your choice of engine, but an XML document shouldn't even validate if ID rules are broken.

> to put it concretely: if one had the BOM for a 777 in XML
> there would be 13,457 nodes for a 1"x1/4"x20 stainless
> steel screw. change the qualification of the steel, and you
> visit all 13,457 nodes. no thank you.

I'm not denying that someone can do something that stupid with an XML document but it is certainly not an inherent characteristic of them.

If you have a solid example as to how such a mass visitation is really a required activity on an XML store I'd love to be educated, seriously. Your statement, however, sounds like someone confusing a poor textual serialisation of an XML model with the actual infoset. I can write out relational databases with redundancies too ;-)
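
A rough sketch of the non-redundant layout this reply has in mind, assuming Python's standard `xml.etree.ElementTree`. The element names, the `id` / `ref` attributes and the `material_used_in` helper are hypothetical; a real document would declare ID/IDREF attributes in a DTD or schema.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<bom>
  <parts>
    <part id="p1" name="screw" material="304 stainless"/>
  </parts>
  <assemblies>
    <assembly name="wing"><use ref="p1" qty="4000"/></assembly>
    <assembly name="tail"><use ref="p1" qty="900"/></assembly>
  </assemblies>
</bom>
""")

# One definition node; usages point at it by reference.
parts_by_id = {p.get("id"): p for p in doc.iter("part")}
parts_by_id["p1"].set("material", "316 stainless")  # a single node is edited


def material_used_in(assembly_name):
    """Resolve each <use ref="..."> in the named assembly to its part's material."""
    assembly = doc.find(f"./assemblies/assembly[@name='{assembly_name}']")
    return [parts_by_id[u.get("ref")].get("material") for u in assembly.iter("use")]
```

Here the steel is changed in one place and every assembly sees it, which is the XML counterpart of storing the fact once. Whether real-world documents are laid out this way is a separate question.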

> put simply, the world is naturally relational, not
> hierarchical. the world is naturally aggregate, not
> inherited.

the world is full of messy graphs, half-truths and inaccuracies.

Relational is a model which works very well until you start taking a very close look at the garbage values people put into the content to manipulate the system into letting them get their job done.

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Column-Oriented Databases

Posted: Dec 5, 2007 7:39 AM

> > in OO/XML one "walks the tree",
>
> this is no more a required way of programming to
> such data than it is with relational stores.
>
> People can choose to do navigational stuff with any data
> backing.
>
> > in application
> > code written by the application programmer parsing the
> > live data structure. this is generally done in
> > application coded loops ...
> ...
> > one does not do that with the RM, or even SQL; to quote
> > Date: "what not how".
>
> (cough)GUI(cough)

no one, that I know, is advocating building a GUI from DB2. although one could. UML/MDA, etc. advocate building user-facing stuff from a declarative store. there's no intellectual reason why RDBMS can't serve as well as XML for that. see Andromeda (Secure Data), or Little Steps. these use the catalog to store constraints, which get translated into UI rules. if you can stand VB, Chisholm wrote a book about business rules engines. off the mark a bit, but closer than most.
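
A toy sketch of turning catalog constraints into UI rules, assuming SQLite's `PRAGMA table_info` through Python's `sqlite3`. This is not how Andromeda or any other named tool works; the `employee` table, the `form_rules` helper and the rule names are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT    NOT NULL,
        email  TEXT,
        salary NUMERIC NOT NULL
    )
""")


def form_rules(table):
    """Derive simple per-field UI rules from SQLite's column catalog."""
    rules = {}
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so only pass trusted names.
    for _cid, name, col_type, notnull, _default, pk in db.execute(
        f"PRAGMA table_info({table})"
    ):
        rules[name] = {
            "widget": "number" if col_type.upper() in ("INTEGER", "NUMERIC") else "text",
            "required": bool(notnull) or bool(pk),
            "read_only": bool(pk),
        }
    return rules


rules = form_rules("employee")
```

The form layer reads `rules` instead of hard-coding which fields are mandatory, so a change to the DDL flows through to the screen.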

>
> seems to me I've seen an awful lot of application code
> that does exactly what you're talking about over the
> years, when people start needing to do something to expose
> the data to end users.

again, not apples to apples. see Chamberlin below.

>
> I agree that manipulating data can be done declaratively
> in SQL but could you please enlarge on how relational
> databases make it possible for an entire system to
> be developed without someone, somewhere writing loops and
> navigational code?

in terms of the read/write of the datastore, see Andromeda, above.

>
>
> > moreover, to the extent that that XML attempts to
> > implement "relations" with the ID/IDREF notation, it is
> > left to the application programmer to keep track.
>
> huh?
>
> Now you're really lost me. ID consistency is mandated by
> the standard. The degree of support comes from your choice
> of given engine but an XML document shouldn't even
> validate if ID rules are broken.

unless there's been a recent change, the match is GLOBAL to the document. it is *not* a relational construct. it is a surrogate key, sort of.

>
> > to put it concretely: if one had the BOM for a 777 in XML
> > there would be 13,457 nodes for a 1"x1/4"x20 stainless
> > steel screw. change the qualification of the steel, and you
> > visit all 13,457 nodes. no thank you.
>
> I'm not denying that someone can do something that stupid
> with an XML document but it is certainly not an inherent
> characteristic of them.

it is of the implementations I've had to work with. YMMV.

>
> If you have a solid example as to how such a mass
> visitation is really a required activity on an XML
> store I'd love to be educated, seriously. Your statement,
> however, sounds like someone confusing a poor textual
> serialisation of an XML model with the actual infoset. I
> can write out relational databases with redundancies too
> ;-)

unless there's been a recent change, the match is GLOBAL to the document. it is *not* a relational construct.

for a discussion of the infoset, I'll defer to Pascal. it's readily available on the net. and if you willingly "write out relational databases with redundancies", well...

>
> > put simply, the world is naturally relational, not
> > hierarchical. the world is naturally aggregate, not
> > inherited.
>
> the world is full of messy graphs, half-truths and
> inaccuracies.
>
> Relational is a model which works very well until you
> start taking a very close look at the garbage values
> people put into the content to manipulate the system into
> letting them get their job done.

so, we should empower knuckleheads further?

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Column-Oriented Databases

Posted: Dec 5, 2007 8:03 AM

> > > @RY
> > > moreover, to the extent that that XML attempts to
> > > implement "relations" with the ID/IDREF notation, it
> > > is left to the application programmer to keep track.

> > @me
> > huh?
> >
> > Now you're really lost me. ID consistency is mandated
> > by the standard.... an XML document shouldn't even
> > validate if ID rules are broken.

> @RY
> unless there's been a recent change, the match is GLOBAL
> to the document. it is *not* a relational construct. it
> is surrogate key, sort of.

But how do you get from that statement, with which I mostly agree, to saying "it is left to the application programmer to keep track"?

Yes, you can choose to program with XML in languages and styles where you might be keeping track of IDs yourself but that tracking is not an inherent requirement any more than an application programmer keeping track of unique values used to provide relational integrity in an SQL-driven relational database.

> > @me
> > Relational is a model which works very well until you
> > start taking a very close look at the garbage values
> > people put into the content to manipulate the system into
> > letting them get their job done.
> @RY
> so, we should empower knuckleheads further?

I'm talking about what users of systems end up entering as content because of the constraints of the system. If you want to label them as knuckleheads, that's your choice. Victim is a term that comes easier to my mind in many cases, although I also have a fine agricultural vocabulary that is employed at times of cleaning up after some people. :-)

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Column-Oriented Databases

Posted: Dec 5, 2007 8:05 AM

> @Raoul Duke
> is there a way one could take this to
> some extreme by not having either rows or columns at all?
> every datum would be (tableName,row,col,value) and would
> be in random access storage with no further explicit
> structure.

that's RDF

Now we can segue off into a discussion of how RDF isn't much more than a heavily recursive relational model with just one table...

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Column-Oriented Databases

Posted: Dec 5, 2007 8:35 AM

> > @RY
> > unless there's been a recent change, the match is GLOBAL
> > to the document. it is *not* a relational construct. it
> > is surrogate key, sort of.
>
> But how do you get from that statement, with which I
> mostly agree, to saying is left to the application
> programmer to keep track.?

since there is no connection, at the element level, the programmer needs to track who refers to whom, possibly with the aid of schema (but since XML is self-describing, why do we need them :) ); otherwise an IDREF could be ID-ed in <foo> but is meant to be in <bar>. an IDREF is not restricted to match an ID in a particular element.
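
A hand-rolled sketch of the document-wide scope described here, assuming Python's standard `xml.etree.ElementTree`, which does not validate DTDs. The attribute names `id` and `customer-ref` are hypothetical stand-ins for attributes a DTD would declare as ID and IDREF; the check only mimics the "any ID anywhere" matching rule.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<order>
  <customer id="a1" name="Acme"/>
  <product  id="a2" name="Widget"/>
  <line customer-ref="a2"/>
</order>
""")

# Document-wide ID table: every ID lives in one namespace, whatever its element.
ids = {el.get("id"): el.tag for el in doc.iter() if el.get("id") is not None}

line = doc.find("line")
ref = line.get("customer-ref")

resolves = ref in ids              # the only question a global ID/IDREF match asks
points_at = ids[ref]               # which element type the reference landed on
type_ok = points_at == "customer"  # extra rule the application has to add itself
```

The reference resolves, so a global match is satisfied, yet it lands on a `<product>` rather than a `<customer>`. Scoped constraints such as XML Schema's key/keyref are one way a schema can narrow this, which is the "aid of schema" mentioned above.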

>
> Yes, you can choose to program with XML in languages and
> styles where you might be keeping track of IDs yourself
> but that tracking is not an inherent requirement any more
> than an application programmer keeping track of unique
> values used to provide relational integrity in an
> SQL-driven relational database.
>
> > > @me
> > > Relational is a model which works very well until you
> > > start taking a very close look at the garbage values
> > > people put into the content to manipulate the system into
> > > letting them get their job done.
> > @RY
> > so, we should empower knuckleheads further?
>
> I'm talking about what users of systems end up entering as
> content because of the constraints of the system. If you
> want to label them as knuckleheads, that's your choice.
> Victim is a term that comes easier to my mind in
> many cases, although I also have a fine agricultural
> vocabulary that is employed at times of cleaning up after
> some people. :-)

my point was: if users (and there is a growing body of literature arguing for a reversal) are empowered to not only use, but define, the data; then the level of sophistication (not a synonym for elitist) in the data structure will reflect their experience as data modelers. which is typically: my boss's boss did it this way in 1965, so let's keep doing it that way. that is not a made-up example, trust me. admittedly, Enterprise systems more often are victims of such users. and in 1965, systems were built where screens <-> files. today, the rush is to [transport image] <-> datastore.

both are wrong, in the sense that, taking MVC as a paradigm, one would not define the model from the view. or, to put it another way: the most robust datastore is characterized by 'one fact, one place, one time' (the RM mantra). from such a datastore, one can construct any view. and a view can also be a [transport image] aka XML stream. the problem is, most CS grads these days are enamoured of fancy coding tricks (that's what they'll get paid for), rather than robust datastores, which will reduce by a considerable amount the need for code, fancy or otherwise. it's not an accident that Dr. Codd was a mathematician.
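
A minimal sketch of "construct any view, including an XML stream" from a store that holds each fact once, assuming Python's `sqlite3` and `xml.etree.ElementTree`. The `part` / `part_usage` tables and the `as_xml` helper are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE part (pn TEXT PRIMARY KEY, material TEXT NOT NULL);
    CREATE TABLE part_usage (assembly TEXT NOT NULL,
                             pn TEXT NOT NULL REFERENCES part(pn));
    INSERT INTO part VALUES ('SCREW-1', '316 stainless');
    INSERT INTO part_usage VALUES ('wing', 'SCREW-1'), ('tail', 'SCREW-1');
""")


def as_xml():
    """Render the joined data as an XML transport image; the store stays normalized."""
    root = ET.Element("bom")
    query = """
        SELECT u.assembly, p.pn, p.material
        FROM part_usage AS u JOIN part AS p ON p.pn = u.pn
        ORDER BY u.assembly
    """
    for assembly, pn, material in db.execute(query):
        ET.SubElement(root, "part", assembly=assembly, pn=pn, material=material)
    return ET.tostring(root, encoding="unicode")


xml_view = as_xml()
```

The XML repeats the material on every usage, but only as derived output; the model is not defined from that view.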

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Column-Oriented Databases

Posted: Dec 5, 2007 11:10 AM

@Andy Dent
> > @Raoul Duke
> > is there a way one could take this to
> > some extreme by not having either rows or columns at all?
> > every datum would be (tableName,row,col,value) and would
> > be in random access storage with no further explicit
> > structure.
>
> that's RDF

More specifically, it is EAV (Entity-Attribute-Value). RDF is just an implementation of EAV.

If you are unfamiliar with EAV, then Wikipedia has a halfway decent explanation, although it is far from perfect (what do you expect from the Quantum Encyclopedia?).
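
A small sketch of the (tableName, row, col, value) idea in plain Python. The sample facts and the `rows_of` / `column_of` helpers are hypothetical; this shows the shape of EAV-style storage, not RDF itself.

```python
from collections import defaultdict

# Each datum stands alone as (table, row, column, value).
facts = [
    ("part", 1, "name", "screw"),
    ("part", 1, "material", "stainless"),
    ("part", 2, "name", "hinge"),
    ("part", 2, "material", "titanium"),
]


def rows_of(table):
    """Pivot the loose facts for one table back into row dictionaries."""
    rows = defaultdict(dict)
    for tbl, row_id, col, value in facts:
        if tbl == table:
            rows[row_id][col] = value
    return dict(rows)


def column_of(table, col):
    """Pull out one column as {row_id: value}, the column-wise access pattern."""
    return {r: v for tbl, r, c, v in facts if tbl == table and c == col}
```

Neither rows nor columns exist in storage; both are reassembled on demand, which is where the cost of this layout shows up.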

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Column-Oriented Databases

Posted: Dec 5, 2007 11:49 AM

@Robert Young
@"put simply, the world is naturally relational, not hierarchical. the world is naturally aggregate, not inherited. there are certainly aspects (human families) which are hierarchical; but that doesn't mean that this structure is 'natural' to everything else."

I'd say the world is naturally conceptual and based on incremental, iterative perception.

I was very deliberate when I said: "In addition, when development is iterative and incremental, the specification is going to change. Customers may not formally understand the problem domain. More likely, no single customer understands the whole problem domain. Not every business requires the same capabilities, even within a particular market. To punch home this point, ask a programmer at a large financial firm what the programmer across the floor works on and if he ever knew those requirements were part of the bigger picture."

As Alistair Cockburn wrote in The Unknowable and The Incommunicable chapter of Agile Software Development, the world is not kind enough to give us all the pieces. What looks like an apple core may be the beginning steps to creating interlocking yin-yangs.

What you seem to be arguing is that you have experiences that have taught you best practices for managing information. I am well-aware of Andromeda and on the mailing list; you mentioned Andromeda and its ability to generate user interfaces based upon a description of the data in the business layer. The reason this is possible is because all decisions flow from the schema.

Making the schema a marshal is not the least bit shocking. All the interesting decisions in Operating Systems are determined by the fact that user-land processes exist in their own address space, and the only way for a process to communicate outside its container is to make system calls. These system calls define the enclosing system, thus defining what the user-land process can ask the kernel to do on its behalf, forcing all important questions about Operating Systems design to happen here.

Come to that,

@Robert Young
@"iterative, of course. one-liner is really all that's needed: in OO/XML one "walks the tree", in application code written by the application programmer parsing the live data structure. this is generally done in application coded loops (some languages use tail recursion syntax; which degenerates into loops)."

I'm of average to below-average intelligence. One-liners are not enough for me. For example:

@Robert Young
@"to quote Date: 'what not how'."

Why is it impossible for a hierarchical data structure to specify what and not how?

John Zabroski

Posts: 272
Nickname: zbo
Registered: Jan, 2007

Re: Column-Oriented Databases

Posted: Dec 5, 2007 1:52 PM

@Robert Young
@"both are wrong, in the sense that, taking MVC as a paradigm, one would not define the model from the view. or, to put it another way: the most robust datastore is characterized by 'one fact, one place, one time' (the RM mantra). from such a datastore, one can construct any view. and a view can also be a [transport image] aka XML stream."

Could you elaborate more on what you mean by transport image? A definition would be instructive. I understood the rest of the above except for that. I can only guess what you mean, but I'd rather hear you say it to test my assumptions and eliminate any group think in this conversation.

@Robert Young
@"the problem is, most CS grads these days are enamored of fancy coding tricks (that's what they'll get paid for), rather than robust datastores, which will reduce by a considerable amount the need for code, fancy or otherwise. it's not an accident that Dr. Codd was a mathematician."

How is being enamored of a fancy coding trick any different from being enamored of a provocative one-line comment without supporting detail? Seems like the correct thing to advocate is critical thinking, and provocative one-line comments are not a substitute for critical thinking. People tend to be far removed from the consequences of their actions, and don't consider putting themselves in the direct pathway of their actions, even though it is the only way to be accountable and do self-evaluation. Instead of self-evaluation of actions, people are concerned about their future and their careers. They stop thinking about writing software that changes the world, and they start thinking about learning software processes that can lead to promotions and positions as lead developers, architects, and managers.

Raoul Duke

Posts: 127
Nickname: raoulduke
Registered: Apr, 2006

Re: Column-Oriented Databases

Posted: Dec 7, 2007 4:29 PM

> @Robert Young
> @"to quote Date: 'what not how'."
> Why is it impossible for a hierarchical data structure to
> specify what and not how?

ja. slogans are sometimes koan-ically helpful, but on the other hand they hide a lot of the devil in the details. when you use SQL you are still saying how to some degree - you have to give names of tables and columns, and you have to say what operations to perform on them. is that not in some ways similar to xpath? (i know they aren't the same thing, and i'm not asking because i love xpath and hate sql or anything - i'm really still trying to build a concrete sense in my head of the similarities and differences.)
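
One way to put the two side by side, assuming Python's `sqlite3` and the limited XPath subset in `xml.etree.ElementTree`. The `book` data and the library/shelf nesting are invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE book (title TEXT, year INTEGER);
    INSERT INTO book VALUES ('A', 1999), ('B', 2007);
""")
# SQL: name the table and column, state the condition; no loop, no access path.
sql_titles = [t for (t,) in db.execute("SELECT title FROM book WHERE year = 2007")]

doc = ET.fromstring("""
<library>
  <shelf><book title="A" year="1999"/></shelf>
  <shelf><book title="B" year="2007"/></shelf>
</library>
""")
# XPath (ElementTree's subset): also a one-line expression, but the path
# .//book says where in the hierarchy to look as well as what to match.
xpath_titles = [b.get("title") for b in doc.findall(".//book[@year='2007']")]
```

Both are declarative one-liners that name things and state a condition. The visible difference in this toy is that the XPath expression is written against a particular nesting, while the SQL is written against names only.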

Raoul Duke

Posts: 127
Nickname: raoulduke
Registered: Apr, 2006

Re: Column-Oriented Databases

Posted: Dec 7, 2007 4:33 PM

> > is there a way one could take this to
> > some extreme by not having either rows or columns at all?
> that's RDF

Sorry, I wasn't clear about what I was getting at - I meant to keep the Relational semantics, to support SQL (which I know isn't 'really' relational), but to change the underlying data store, because there is the tension of row vs. column storage. What could we do to get rid of that tension?

a) we could store tuples. this would mean neither reads nor writes win, but then again maybe neither really sucks a lot, either.

b) we could have a row store db that constantly wrote out to a column store db. this obviously has all sorts of issues of delay and synchronization and just raw storage space, but would i guess give readers good perf, and writers good perf. (as long as they weren't trying to compare results, what with the pipeline delay.)
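
A toy model of option (b) in plain Python, to make the pipeline delay concrete. The `DualStore` class and its method names are hypothetical, and a real system would also have to handle updates, deletes and concurrency.

```python
class DualStore:
    """Toy model: writes go to a row store, column reads go to a column store."""

    def __init__(self, columns):
        self.columns = columns
        self.rows = []                              # row store: one tuple per record
        self.cols = {name: [] for name in columns}  # column store: one list per column
        self.synced = 0                             # rows already copied across

    def insert(self, *values):
        self.rows.append(values)                    # cheap append, write-friendly

    def sync(self):
        """Copy rows written since the last sync into the column store."""
        for row in self.rows[self.synced:]:
            for name, value in zip(self.columns, row):
                self.cols[name].append(value)
        self.synced = len(self.rows)

    def column_sum(self, name):
        return sum(self.cols[name])                 # scans one column only


store = DualStore(["id", "qty"])
store.insert(1, 10)
store.insert(2, 5)
stale = store.column_sum("qty")  # column store has not caught up yet
store.sync()
fresh = store.column_sum("qty")
```

Between `insert` and `sync` a reader of the column store sees old totals, which is the synchronization issue mentioned in (b); the data is also held twice.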


Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use