The Artima Developer Community
Artima Developer Spotlight Forum
Database Denormalization and the NoSQL Movement

53 replies on 4 pages. Most recent reply: Sep 28, 2009 8:56 AM by John Zabroski

Bill Venners

Posts: 2284
Nickname: bv
Registered: Jan, 2002

Database Denormalization and the NoSQL Movement Posted: Sep 10, 2009 9:57 PM

In Building Scalable Databases: Denormalization, the NoSQL Movement, and Digg, Dare Obasanjo discusses the trend towards denormalized data, driven by the expense of querying data that includes large social graphs. Normalization is a technique of reducing or eliminating redundant data, resulting in a database that is better optimized for ad-hoc queries and less likely to become inconsistent when modified. Denormalization takes a database in the other direction, adding redundancy to optimize the database for reads.

Obasanjo claims the driving force behind this trend is the social features of today's web sites:

Today, lots of Web applications have "social" features. A consequence of this is that whenever I look at content or a user in that service, there is always additional content from other users that also needs to be pulled in to the page. When you visit the typical profile on a social network like Facebook or MySpace, data for all the people that are friends with that user needs to be pulled in. Or when you visit a shared bookmark on del.icio.us you need data for all the users who have tagged and bookmarked that URL as well. Performing a query across the entire user base for "all the users who are friends with Robert Scoble" or "all the users who have bookmarked this blog link" is expensive even with caching. It is orders of magnitude faster to return the data if it is precalculated and all written to the same place.

Obasanjo describes the consequences of denormalization as

  • Storage costs increase
  • Fixing data inconsistency becomes the job of the application
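
To make the trade-off concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and function names (users, friendships, profile_cache, rebuild_cache) are invented for illustration and do not come from Obasanjo's article. The normalized read needs a join at query time; the denormalized read is a single-key lookup, but the application has to refresh the cached copy itself whenever the source data changes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendships (
        user_id   INTEGER NOT NULL REFERENCES users(id),
        friend_id INTEGER NOT NULL REFERENCES users(id),
        PRIMARY KEY (user_id, friend_id)
    );
    -- Denormalized copy: one row per user holding the friend names as JSON.
    CREATE TABLE profile_cache (
        user_id      INTEGER PRIMARY KEY,
        friends_json TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "robert"), (2, "alice"), (3, "bob")])
conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(1, 2), (1, 3)])


def friends_normalized(user_id):
    """Read path on the normalized schema: a join at query time."""
    rows = conn.execute(
        "SELECT u.name FROM friendships f JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name", (user_id,))
    return [name for (name,) in rows]


def rebuild_cache(user_id):
    """Write path for the denormalized copy: the application must call this."""
    conn.execute("INSERT OR REPLACE INTO profile_cache VALUES (?, ?)",
                 (user_id, json.dumps(friends_normalized(user_id))))


def friends_denormalized(user_id):
    """Read path on the denormalized copy: a single primary-key lookup."""
    row = conn.execute(
        "SELECT friends_json FROM profile_cache WHERE user_id = ?",
        (user_id,)).fetchone()
    return json.loads(row[0]) if row else []


rebuild_cache(1)
# Renaming a user touches the normalized data only; the cached copy goes stale.
conn.execute("UPDATE users SET name = 'alicia' WHERE id = 2")
stale = friends_denormalized(1)   # still holds the old name
rebuild_cache(1)
fresh = friends_denormalized(1)   # correct again after the application rebuilds
```

The stale read between the update and the rebuild is the "fixing data inconsistency becomes the job of the application" consequence in miniature, and the extra table is the storage cost.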

He calls the trend among web developers towards using non-relational databases the "NoSQL movement":

If you’re a web developer interested in building large scale applications, it doesn’t take long in reading the various best practices on getting Web applications to scale such as practicing database sharding or eschewing transactions before it begins to sound like all the advice you are getting is about ignoring or abusing the key features that define a modern relational database system. Taken to its logical extreme all you really need is a key/value or tuple store that supports some level of query functionality and has decent persistence semantics. Thus the NoSQL movement was born.
The No-SQL movement is a term used to describe the increasing usage of non-relational databases among Web developers. This approach was initially pioneered by large scale Web companies like Facebook (Cassandra), Amazon (Dynamo) & Google (BigTable) but now is finding its way down to smaller sites like Digg. Unlike relational databases, there is yet to be a solid technical definition of what it means for a product to be a "NoSQL" database aside from the fact that it isn't a relational database. Commonalities include lack of fixed schemas and limited support for rich querying.

What do you think of this idea of a NoSQL movement? To what extent do you think this trend is overhyped because the large web sites that need to denormalize are well known, successful sites? Do you plan to use a key/value store instead of a relational database in the near future? If so, why did you decide to go that route?


Andre Bogus

Posts: 4
Nickname: andrebogus
Registered: Nov, 2008

Re: Database Denormalization and the NoSQL Movement Posted: Sep 11, 2009 12:41 AM
Many a programmer has seen the integrity requirements of relational databases not as the safety net they are, but as a bureaucratic nagging machine. Obviously, there has to be a better way - but the database can not know the requirements of the application, so the application must do the job of ensuring the level of integrity it needs.

On one hand, this smells of Not Invented Here, as the application tends to rewrite many things that a SQL database would already have covered, but on the other hand, current SQL databases do not allow for specifying the needed level of integrity.

I'd guess that we will see some back-and-forth in the database space until a middle ground emerges.

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Database Denormalization and the NoSQL Movement Posted: Sep 11, 2009 8:40 PM
> but the database can not know the
> requirements of the application, so the application must
> do the job of ensuring the level of integrity it needs.

Of course it can; that it does is what Dr. Codd proved, that the integrity of the data depends on the data. The assertion derives from the failure of coders to understand data. In the goode olde days of COBOL, FORTRAN, and files only one "application" dealt with one set of "data files". And thus came the Babel of redundant data systems. Dr. Codd figured out how to solve the problem, but today's KiddieKoders have decided that what he figured out doesn't apply to them. They want to relive the hell of the 1960's. Oh well. I guess they want to get paid by the line.

To answer the question, if the data is BCNF, with proper constraints, then any application can, optionally, provide client editing. This can be done fully dynamically, reading the catalog and shipping constraints; more commonly the client side code (html/javascript/ajax/whatever) is generated from the catalog. Andromeda and sprox are two examples.
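
As a rough illustration of both halves of that argument, here is a small sketch using Python's built-in sqlite3 module. It does not show how Andromeda or sprox work; the schema and the helper names (try_insert, required_columns) are assumptions made up for this example. Declared constraints reject bad rows no matter which application sends them, and the catalog can be read back to drive client-side checks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE customers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        quantity    INTEGER NOT NULL CHECK (quantity > 0)
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'a@example.com')")


def try_insert(sql, params):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


INSERT_ORDER = "INSERT INTO orders (customer_id, quantity) VALUES (?, ?)"
ok = try_insert(INSERT_ORDER, (1, 5))             # valid row
bad_quantity = try_insert(INSERT_ORDER, (1, 0))   # violates the CHECK constraint
bad_customer = try_insert(INSERT_ORDER, (99, 1))  # violates the foreign key


def required_columns(table):
    """Read the catalog to find NOT NULL columns a client-side form should require."""
    # PRAGMA statements cannot take bound parameters, so only pass trusted names.
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for (_cid, name, _type, notnull, _dflt, _pk) in rows if notnull]


required = required_columns("orders")
```

The rules live in one place, the schema, and any client that wants friendlier validation can derive it from the catalog instead of restating it by hand.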

With SSD and multi-core/processor machines, there is no intelligent reason to not build from BCNF data.

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Re: Database Denormalization and the NoSQL Movement Posted: Sep 11, 2009 10:44 PM
> > but the database can not know the
> > requirements of the application, so the application
> > must do the job of ensuring the level of integrity it needs.
>
> Of course it can; that it does is what Dr. Codd proved,
> that the integrity of the data depends on the data. The
> assertion derives from the failure of coders to understand
> data. In the goode olde days of COBOL, FORTRAN, and files
> only one "application" dealt with one set of "data files".
> And thus came the Babel of redundant data systems. Dr.
> Codd figured out how to solve the problem, but today's
> KiddieKoders have decided that what he figured out doesn't
> apply to them. They want to relive the hell of the
> 1960's. Oh well. I guess they want to get paid by the
> line.
>
> To answer the question, if the data is BCNF, with proper
> constraints, then any application can, optionally, provide
> client editing. This can be done fully dynamically,
> reading the catalog and shipping constraints; more
> commonly the client side code
> (html/javascript/ajax/whatever) is generated from the
> catalog. Andromeda and sprox are two examples.
>
> With SSD and multi-core/processor machines, there is no
> intelligent reason to not build from BCNF data.

Amen.

Wolfgang Lipp

Posts: 17
Nickname: wolf2005
Registered: Sep, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 13, 2009 6:42 AM
i do not now use a terrible lot of very denormalized data, nor do i employ a lot of nosql dbs such as couchdb; i believe the latter do heavily suffer from their values not being properly searchable. a proper relational database, it is true, is sort of a pita as for all the bureaucracy that cometh with it; however, you also get to have a highly searchable base of data when not eschewing this hurdle.

i have, personally, even come to abandon any and all sql/oop object/relational mapping schemas, such as sqlalchemy and the likes. it sounds like a good idea, but in practice i find these orms to be in my way even with the simplest of tasks. the methods may differ, but i get the yikes when i see how complicated and convoluted it is to establish a connection to my data and handle it using that sqlalchemy funnel. brr.

what is wrong in this whole picture is, in my opinion, very simply that, generally speaking, in the oop world, people make the mistake of putting their business data inside objects. and i think you're not supposed to do that. everybody does it these days, and i believe it is fundamentally wrong, at least when done in the naive object-oriented-programming-101-textbook way.

the way i do it now is i have (1) cat objects with nothing but the general, json-ifiable data on the kittens, (2) a generalized database that can, ideally, handle data not only on cats, but also on dogs, and so on, in a single table, and (3) a CAT library (alongside a DOG library etc) with all the methods and ways to handle felines (canines, what have you).
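
A minimal sketch of that three-part layout, using Python's built-in json and sqlite3 modules. This is one possible reading of the description above, not the poster's actual code; the table name records and the functions save, load, cat_describe and cat_birthday are all invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# (2) One generic table for every kind of record; the payload is opaque JSON.
conn.execute(
    "CREATE TABLE records (kind TEXT NOT NULL, id TEXT NOT NULL, "
    "data TEXT NOT NULL, PRIMARY KEY (kind, id))")


def save(kind, record):
    """Generic storage: works for cats, dogs, or anything JSON-serializable."""
    conn.execute("INSERT OR REPLACE INTO records VALUES (?, ?, ?)",
                 (kind, record["id"], json.dumps(record)))


def load(kind, record_id):
    """Fetch one record as a plain dict, or None if it does not exist."""
    row = conn.execute("SELECT data FROM records WHERE kind = ? AND id = ?",
                       (kind, record_id)).fetchone()
    return json.loads(row[0]) if row else None


# (3) The "CAT library": plain functions over plain dicts, no methods on the data.
def cat_describe(cat):
    return f"{cat['name']} is a {cat['age']}-year-old cat"


def cat_birthday(cat):
    # Return a new dict instead of mutating the input.
    return {**cat, "age": cat["age"] + 1}


# (1) The cat itself is nothing but json-ifiable data.
tom = {"id": "tom", "name": "Tom", "age": 3}
save("cat", tom)
save("cat", cat_birthday(load("cat", "tom")))
description = cat_describe(load("cat", "tom"))
```

Note that the JSON payload is invisible to SQL in this sketch, which is the searchability trade-off mentioned at the top of the post.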

many will shout treason at this point but i am ready to bear that.

somehow at some point i broke the link that is supposed to be so close between the data structure and the ways to handle it, and ever since then, classical oop has been looking so broken to me. i am still looking for ways to bring the data and their methods closer together, but for the time being, i am quite happy with the state of affairs.

Kay Schluehr

Posts: 302
Nickname: schluehk
Registered: Jan, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 13, 2009 7:59 AM
> i have, personally, even come to abandon any and all
> sql/oop object/relational mapping schemas, such as
> sqlalchemy and the likes. it sounds like a good idea, but
> in practice i find these orms to be in my way even with
> the simplest of tasks. the methods may differ, but i get
> the yikes when i see how complicated and convoluted it is
> to establish a connection to my data and handle it using
> that sqlalchemy funnel. brr.

Can you be a bit more precise? I do not want to suppress unconditional whining - we are all just human beings - but I'm currently interested in Hibernate as a domain specific programming language and I wonder what makes people actually desperate?

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 13, 2009 11:25 AM
> Can you be a bit more precise? I do not want to suppress
> unconditional whining - we are all just human beings - but
> I'm currently interested in Hibernate as a domain specific
> programming language and I wonder what makes people
> actually desperate?

I'm not exactly sure what you are looking for but I'm working with Hibernate right now and I find it to be extremely confining. It's just a bad idea to try to build your object model based on years of experience and try to use hibernate at the same time.

As a concrete example of this, I've learned from experience that when associating objects, it's best to keep things uni-directional and top down. This eliminates many opportunities for space leak issues. Hibernate, however, is designed such that it expects that you will at least have links from children to parents and maybe also from parents to children. It's possible to make things work with only references to children from parents, but then you cannot use many features of the framework.

Another thing is that I want to think that hibernate is a tool that transforms my objects into persisted data but it's not. It does that, but that's only a tiny part of its design. It really treats objects as if they are the persisted data. If you don't think about the objects that way, you will quickly run into issues. For example, if you modify the state of an object outside a transaction scope, you might think that change would be ignored by hibernate. But that's not the case. You can try to retrieve the persisted state for that object again and it will first persist the previous change (during a read, no less.)
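
The behavior described here can be modeled with a toy unit-of-work in plain Python. To be clear, this is not Hibernate's API or implementation; ToySession and everything in it are hypothetical names, and a dict stands in for the database. It only sketches the general idea of a session that tracks live objects and flushes pending changes before it runs a query.

```python
class ToySession:
    """A deliberately tiny unit-of-work. Not Hibernate; all names are invented."""

    def __init__(self, store):
        self.store = store      # stands in for the database: {id: {field: value}}
        self.managed = {}       # identity map: id -> live object (a dict here)
        self.flush_log = []     # records when writes actually reached the store

    def get(self, obj_id):
        """Load an object once and hand out the same live instance afterwards."""
        if obj_id not in self.managed:
            self.managed[obj_id] = dict(self.store[obj_id])
        return self.managed[obj_id]

    def flush(self, reason):
        """Write every managed object that differs from its stored row."""
        for obj_id, obj in self.managed.items():
            if obj != self.store[obj_id]:
                self.store[obj_id] = dict(obj)
                self.flush_log.append((reason, obj_id))

    def query(self, predicate):
        # Auto-flush: pending in-memory changes are written before the read runs,
        # so the query result reflects them.
        self.flush("before query")
        return [obj_id for obj_id, row in self.store.items() if predicate(row)]


store = {1: {"name": "Tom", "status": "draft"}}
session = ToySession(store)

cat = session.get(1)
cat["status"] = "published"   # changed with no explicit save and no transaction

# A read triggers the write: the change is persisted as a side effect of querying.
published = session.query(lambda row: row["status"] == "published")
```

In this model the object is the pending persisted state, so a caller who treats it as a scratch copy is surprised by the write that a later read triggers.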

Josh Long

Posts: 9
Nickname: joshlong
Registered: Dec, 2008

Re: Database Denormalization and the NoSQL Movement Posted: Sep 13, 2009 1:20 PM
> > > but the database can not know the
> > > requirements of the application, so the application
> > > must do the job of ensuring the level of integrity it
> > > needs.
> >
> > Of course it can; that it does is what Dr. Codd proved,
> > that the integrity of the data depends on the data. The
> > assertion derives from the failure of coders to
> > understand data. In the goode olde days of COBOL, FORTRAN,
> > and files only one "application" dealt with one set of
> > "data files". And thus came the Babel of redundant data
> > systems. Dr. Codd figured out how to solve the problem, but
> > today's KiddieKoders have decided that what he figured out
> > doesn't apply to them. They want to relive the hell of the
> > 1960's. Oh well. I guess they want to get paid by the
> > line.
> >
> > To answer the question, if the data is BCNF, with
> > proper constraints, then any application can, optionally,
> > provide client editing. This can be done fully dynamically,
> > reading the catalog and shipping constraints; more
> > commonly the client side code
> > (html/javascript/ajax/whatever) is generated from the
> > catalog. Andromeda and sprox are two examples.
> >
> > With SSD and multi-core/processor machines, there is no
> > intelligent reason to not build from BCNF data.
>
> Amen.

+1. Throwing out constraints and normalization for speed gains that only a few sites will ever grow large enough to need, and for which there are, at this point, reasonable alternatives, is terrifying. Brings to mind babies and bathwater, premature optimization, and all that.

Kay Schluehr

Posts: 302
Nickname: schluehk
Registered: Jan, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 13, 2009 11:33 PM
> I'm not exactly sure what you are looking for but I'm
> working with Hibernate right now and I find it to be
> extremely confining.

I'm looking for the big picture, which easily slips out of sight.

I do think there are three major aspects. The first one is that of an ORM as a plain persistence engine. At that stage the mapping is still invisible.

We start with a couple of entities which are related to each other basically through membership ( "domain model" ). Then we persist, retrieve and delete them. In order to perform a certain set of operations and guarantee invariants one has to declare annotations on entities and their fields ( alternatively on getters, something I do ignore ). This is mostly easy going because of limited expressivity on the side of objects.

So far everything is situated on a high level of abstraction and the database target remains completely opaque. We don't even have to know that there is an RDBMS in the background and what the schemas look like. Ideally we could also use a key-value data-store with or rather without Edgar Codd's consent or that of his spiritualist medium Robert Young.

Next we take the actual DB schema into account and create the mapping. We start to use @Table, @Column, @JoinTable and other annotations. Ideally this is expressed on a highly formalized level as well: relations as a set of named + typed tuples of equal length, linked through specified fields. Objects on the other side with their tree shaped structures including containers.

Finally Hibernate defines HQL and annotations containing SQL statements and so on. This confuses me and I have no idea why all of this is necessary and how it intersects with everything else the Hn developers defined. Maybe Wolfgang Lipp is right and real men just use plain JDBC when the Hibernate developers are going to reinvent it anyway?

Achilleas Margaritis

Posts: 674
Nickname: achilleas
Registered: Feb, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 5:07 AM
Three times we tried to use Hibernate in our company, and every time we ended up scratching our heads over it. We reverted to plain old SQL (using prepared statements), and the product worked like a charm. Why do we need an ORM? It never sits well with any of our experienced database developers, and it seems quite a burden to me as well.

Regarding the database denormalization issue, I have always followed the rules, i.e. every database I've worked on is normalized, but I see the reason why denormalization is required sometimes. But is it good to have your primary tables be denormalized? Isn't it better to have secondary tables with denormalized data copied from the primary tables? You don't lose any advantages from normalization then, and denormalized tables can always be rebuilt from normalized ones.
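
A small sketch of that arrangement with Python's built-in sqlite3 module. The schema (authors, posts, post_listing) and the function rebuild_post_listing are assumptions invented for the example. The normalized tables stay the source of truth, and the denormalized read table is disposable because it can be regenerated from them at any time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Primary, normalized tables: the source of truth.
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title     TEXT NOT NULL
    );
    -- Secondary, denormalized table: author name copied onto every post row.
    CREATE TABLE post_listing (
        post_id     INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        author_name TEXT NOT NULL
    );
    INSERT INTO authors VALUES (1, 'kay'), (2, 'james');
    INSERT INTO posts VALUES (1, 1, 'ORMs'), (2, 1, 'HQL'), (3, 2, 'Rollback');
""")


def rebuild_post_listing():
    """Throw away the denormalized rows and regenerate them from the primary tables."""
    with conn:  # the DELETE and the INSERT commit together, or not at all
        conn.execute("DELETE FROM post_listing")
        conn.execute("""
            INSERT INTO post_listing (post_id, title, author_name)
            SELECT p.id, p.title, a.name
            FROM posts p JOIN authors a ON a.id = p.author_id
        """)


rebuild_post_listing()
# A change is made once, in the normalized table only...
conn.execute("UPDATE authors SET name = 'kay s.' WHERE id = 1")
# ...and the read-optimized copy is brought up to date by rebuilding it.
rebuild_post_listing()
listing = conn.execute(
    "SELECT title, author_name FROM post_listing ORDER BY post_id").fetchall()
```

A full rebuild is the simplest refresh strategy; how often to run it, and whether stale reads in between are acceptable, depends on the application.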

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 6:33 AM
> Ideally
> we could also use a key-value data-store with or rather
> without Edgar Codd's consent or that of his spiritualist
> medium Robert Young.

1) Chris Date is the medium, not I
2) Those who ignore history are condemned to repeat it

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 6:55 AM
> So far everything is situated on a high level of
> abstraction and the database target remains completely
> opaque. We don't even have to know that there is an RDBMS
> in the background and how the schemas look like. Ideally
> we could also use a key-value data-store with or rather
> without Edgar Codd's consent or that of his spiritualist
> medium Robert Young.

I'm not so sure about that. You could start out that way and pretend that the database doesn't matter but I think you'll generally be disabused of that notion during step 2.

> Finally Hibernate defines HQL and annotations containing
> SQL statements and so on. This confuses me and I have no
> idea why all of this is necessary and how it intersects
> with everything else the Hn developers defined. Maybe
> Wolfgang Lipp is right and real men just use plain JDBC
> when the Hibernate developers are going to reinvent it
> anyway?

People tend to ignore the Criteria query API, which is actually one of my favorite parts of hibernate. In general, though, these query features are needed because there needs to be a way to find the objects you need. It's easy to get the idea that Hibernate is about abstracting away the database. But it really isn't. In a nutshell, it's basically a different way to communicate with a database + object caching.

In some ways it is convenient. In others, it is not. For example, the notion of a rollback is much weaker in Hibernate. If you roll back a transaction, your objects may still contain uncommitted changes. It's up to the developer to get those changes out of the cache or they will be committed later in some other transaction.
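
The underlying point, that a rollback restores the database but not the objects in memory, can be shown without Hibernate at all. The following is a sketch with Python's built-in sqlite3 module, where a plain dict stands in for a mapped entity; the accounts table and the variable names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

# An in-memory object loaded from the row (a plain dict standing in for an entity).
account = {"id": 1, "balance": 100}

# Change the object and write the change inside a transaction...
account["balance"] -= 30
conn.execute("UPDATE accounts SET balance = ? WHERE id = ?",
             (account["balance"], account["id"]))
conn.rollback()  # ...then roll back: the database forgets the update

db_balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
object_balance = account["balance"]  # the object still carries the change

# One way to resynchronize: reload the object's state from the database.
if object_balance != db_balance:
    account["balance"] = db_balance
```

If the stale object were written out again later, the rolled-back change would reach the database after all, which is the hazard described above.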

Kay Schluehr

Posts: 302
Nickname: schluehk
Registered: Jan, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 10:51 AM
> 1) Chris Date is the medium, not I

Not so modest, Robert. There is enough space for many Assemblies of Codd.

> 2) Those who ignore history are condemned to repeat it

This shouldn't be your problem.

nes

Posts: 137
Nickname: nn
Registered: Jul, 2004

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 12:24 PM
I love the querying power of SQL databases; nothing IMO comes close to SQL queries on a modern DB. That said, SQL DBs are not speed demons: table scans on single columns, multiple joins and aggregates can be slow. But unless speed is an issue I would go SQL.

I remember creating a denormalized table with the current total balance per client, because calculating the current balance based on all transactions for a client was just too slow.
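
A sketch of what such a balance table might look like, using Python's built-in sqlite3 module. The original system is not described in any detail, so the schema, the trigger, and the choice to maintain the total with a trigger are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (
        id        INTEGER PRIMARY KEY,
        client_id INTEGER NOT NULL,
        amount    INTEGER NOT NULL   -- in cents, to avoid floating-point sums
    );
    -- Denormalized running total, one row per client.
    CREATE TABLE client_balance (
        client_id INTEGER PRIMARY KEY,
        balance   INTEGER NOT NULL
    );
    -- Keep the total in step with every inserted transaction.
    CREATE TRIGGER transactions_after_insert AFTER INSERT ON transactions
    BEGIN
        INSERT OR IGNORE INTO client_balance (client_id, balance)
        VALUES (NEW.client_id, 0);
        UPDATE client_balance
        SET balance = balance + NEW.amount
        WHERE client_id = NEW.client_id;
    END;
""")
conn.executemany(
    "INSERT INTO transactions (client_id, amount) VALUES (?, ?)",
    [(7, 5000), (7, -1250), (8, 300)])
conn.commit()

# Normalized read: aggregate over every transaction for the client.
summed = conn.execute(
    "SELECT SUM(amount) FROM transactions WHERE client_id = 7").fetchone()[0]
# Denormalized read: one primary-key lookup.
cached = conn.execute(
    "SELECT balance FROM client_balance WHERE client_id = 7").fetchone()[0]
```

This sketch only covers inserts; updates and deletes on transactions would need their own triggers, or the total would drift from the data it summarizes.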

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Database Denormalization and the NoSQL Movement Posted: Sep 14, 2009 12:46 PM
> I love the querying power of SQL databases; nothing IMO
> comes close to SQL queries on a modern DB. That said, SQL
> DBs are not speed demons: table scans on single columns,
> multiple joins and aggregates can be slow. But unless
> speed is an issue I would go SQL.

Doesn't using proper indexes and foreign keys address this? My experience is that straight queries using JDBC are an order of magnitude faster than Hibernate.
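
For the index half of that question, here is a quick way to check what the database actually does, sketched with Python's built-in sqlite3 module. The table, the index name idx_transactions_client, and the helper plan are invented for the example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER PRIMARY KEY, "
    "client_id INTEGER NOT NULL, amount INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO transactions (client_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(10_000)])

QUERY = "SELECT SUM(amount) FROM transactions WHERE client_id = ?"


def plan(sql, params):
    """Return SQLite's query-plan description as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (42,))   # no usable index: the whole table is scanned
conn.execute(
    "CREATE INDEX idx_transactions_client ON transactions (client_id)")
plan_after = plan(QUERY, (42,))    # the plan now names the index
total = conn.execute(QUERY, (42,)).fetchone()[0]
```

An index narrows the rows that have to be read, but the aggregate still visits every matching row, so a precomputed total can still win for clients with very long histories.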

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use