Weblogs Forum - Database Constraints: Use Them or Lose Them?

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Weblogs Forum
Database Constraints: Use Them or Lose Them?

56 replies on 4 pages. Most recent reply: Oct 8, 2009 9:07 AM by John Zabroski

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 56 replies on 4 pages [ 1 2 3 4 | » ]

Frank Sommers

Posts: 2642
Nickname: fsommers
Registered: Jan, 2002

Database Constraints: Use Them or Lose Them? (View in Weblogs)

Posted: Sep 23, 2009 8:00 PM

Summary
Relational databases offer a rich toolset to help ensure the integrity of data. Apart from normalized schema, however, few enterprise applications take advantage of those tools and, instead, tend to implement integrity checking inside application layers. If SQL database constraints are useful, why aren't more applications taking advantage of them?

An important trend in data management, especially on the Web, is to use relaxed SQL schemas—or to eschew the principles of relational data design altogether. A recent Artima discussion focused on the "NoSQL" movement, as it were—and the general conclusion seemed to be that relational constraints are very useful, indeed, and that developers should bypass them at their peril.

That's hardly a surprise, given that normalized relational data models came about as a way to help ensure data integrity. Users, in general, like to think of databases as authoritative sources of data: an underlying notion for popular data access methods, such as SQL queries, is that the return values represent a well-defined notion of "truth." Preserving that notion is much harder when relational constraints are relaxed, which puts most of the integrity maintenance burdens on application-level code.

While much of the discussion on NoSQL centers around database de-normalization—relaxing SQL normal forms that results in possible data redundancy—modern database management systems, as well as the SQL standard, offer many additional mechanisms to ensure data integrity. In perusing the latest edition of Joe Celko's SQL for Smarties—a highly recommended read—it became clear to me that I'm using but a small portion of SQL's power to ensure data integrity. For instance, I almost never take advantage of the SQL check mechanism when declaring database tables, and I forgo other now standard SQL facilities, too, to, in effect, keep the database clean.

Although I believe in the value of pushing as much data management as possible into a database, I tend to implement a lot of data integrity-related code in the domain and controller layers. Of course, all that code needs to be tested, too. I recently surveyed a few dozen test suites I wrote using ScalaTest, encompassing several hundred test cases, and noticed that no less than 30% of our test cases had to do with enforcing data integrity constraints at the application layer. An example of that is making sure that API-level preconditions are enforced—checking for the proper handling of null values, for instance. And if a complex object is passed as a parameter to a method, I tend to write tests to make sure the method can gracefully accommodate method parameter objects in unacceptable states.

We do all that, even though we also use Hibernate Validator in the entity layer, carefully annotating entity classes with required validation constraints. Finally, we also write a significant amount of integrity-checking code in the presentation layer—Flex, in our case—to ensure that users submit valid data to the server-side API. And that presentation-layer code has tests suites, too. In total, therefore, we encode data integrity the database, the entity layer, and in the presentation layer, with significant portions of our test cases devoted to covering integrity-checking code in the controller layer and in the presentation as well.

That's a lot of code that clearly violates the DRY principle. The closest to non-redundant integrity constraints I've seen was in ActiveRecord, Rails' O/R mapping and persistence layer. ActiveRecord annotates entity-layer classes with integrity constraints, and Rails' controller and presentation layers use that data to, for instance, display nice error messages to the user upon incorrect data entry. However, as with Hibernate Validator, ActiveRecord barely takes advantage of the underlying relational database's expressive power: Constraint-related SQL generated by both frameworks is limited mostly to foreign-key relations and non-null field declarations. The rest of the integrity checks are performed by a sort of validation layer (as part of the entity classes).

I think that in a more ideal situation the database should be the guardian of the data, including constraint enforcement. That can be achieved today by writing lots of hand-coded SQL, which may not be an effective solution because most SQL databases rely on SQL error codes to indicate constraint violations. For instance, setting a constraint so that the salary field can't accept a value less than 5000 and greater than 1000000 will cause a database to return an error code when attempting insert a record with a salary of 100—but that error code would then have to be translated into some usable value in other layers of the application. The effort involved in that translation pays few dividends, especially since it's not portable across relational database products.

Where do you think is the proper place to define data constraints? Do you agree that SQL should provide more standardized error reporting for integrity violations?

John Cowan

Posts: 36
Nickname: johnwcowan
Registered: Jul, 2006