Bill Venners: What are the problems with data binding?
Elliotte Rusty Harold: There are some general characteristics that I have seen very commonly across data-binding APIs. Not all APIs have all of these problems, but here are some things to watch out for that often cause trouble. The number one problem is the assumption that the document has a schema, of whatever form. Very many documents do not have any formal schema whatsoever—they are merely well-formed documents.
A second problem is the assumption that a document that has a schema has a schema written in the W3C XML schema language. By far the most popular XML schema language is not the W3C schema language, it's Document Type Definitions, or DTDs for short. Increasingly new projects are looking at the W3C XML schema language, deciding it's far too complex and far too baroque to actually use, and instead moving to something called RELAX NG, which was originally invented by Murata Makoto and James Clark and is now an official OASIS standard.
The third problem is the assumption that a document that has a schema is valid according to the schema, which is also often not true.
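The three assumptions above can be checked separately in code. The following is a minimal sketch using the JAXP DOM parser that ships with the JDK; the class name, the sample documents, and the tiny DTD are all hypothetical and chosen only for illustration. It separates "is this well-formed?" from "is this valid against its DTD?", which are the two questions data-binding tools tend to merge.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class SchemaAssumptionsDemo {
    // Hypothetical DTD: an item must contain exactly one title.
    static final String DTD =
        "<!DOCTYPE item [<!ELEMENT item (title)><!ELEMENT title (#PCDATA)>]>";
    // Well-formed, but carries no schema of any kind.
    static final String NO_SCHEMA = "<item><link>http://example.com/</link></item>";
    // Has a DTD, but does not conform to it.
    static final String INVALID = DTD + "<item><link>http://example.com/</link></item>";
    // Has a DTD and conforms to it.
    static final String VALID = DTD + "<item><title>News</title></item>";

    // Parses the document and returns any validity errors that were reported.
    // A well-formedness error surfaces as a thrown SAXException instead.
    private static List<String> parse(String xml, boolean validate) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validate); // DTD validation only
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    static boolean isWellFormed(String xml) {
        try {
            parse(xml, false);
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static List<String> validityErrors(String xml) {
        try {
            return parse(xml, true);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no schema, well-formed: " + isWellFormed(NO_SCHEMA));
        System.out.println("invalid, well-formed:   " + isWellFormed(INVALID));
        System.out.println("invalid, errors:        " + validityErrors(INVALID));
        System.out.println("valid, errors:          " + validityErrors(VALID));
    }
}
```

In this sketch the schema-less and the invalid documents both parse without trouble when validation is off, which is the situation a tool that requires a valid, schema-backed document cannot handle.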
Then beyond the mere issues of schema-ness, you get into the question of what structures the schemas can represent. Most designers of data-binding APIs have come at it from the database perspective. They assume that XML documents look pretty much like tables. They're fairly flat. They're definitely not recursive. Mixed content doesn't exist. Order doesn't really matter. None of these things are true about XML documents in the general case.
In some XML documents those conditions are true. For example, in RSS they would be true. On the other hand, in DocBook, in XHTML, in Scalable Vector Graphics, in MathML, none of those conditions would hold. Order is important. Mixed content does exist. These applications aren't very flat. They can be recursive. XML is a lot more general than a relational table. When you start trying to assume that things look like tables, or things look like objects, you're going to get yourself into trouble.
Bill Venners: Could you define mixed content?
Elliotte Rusty Harold: This sentence is very important! Now, we put the sentence I just spoke in a sentence tag; it's a sentence element. Then the word very goes into a strong element. So you begin with a start tag, "This sentence is", another start tag, the word "very", an end tag, the word "important!", and then an end tag. It's incredibly common in HTML. We've all written pages like that when you just want to make a word or a phrase in the middle of a paragraph bold or emphasized. Or you want to make a link, but you don't want to make the entire paragraph a link. So the parent element contains both plain text and child elements. That's mixed content.
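The spoken example can be written out as markup and inspected with the JDK's DOM parser. This is only a sketch: the element names sentence and strong follow the description above, while the class and method names are hypothetical. It lists the children of the sentence element in document order, showing text and an element side by side under one parent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MixedContentDemo {
    // The sentence from the interview, marked up as described.
    static final String XML =
        "<sentence>This sentence is <strong>very</strong> important!</sentence>";

    // Describes each child of the root element, in document order.
    static List<String> describeChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> out = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.TEXT_NODE) {
                    out.add("text:" + n.getNodeValue());
                } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                    out.add("element:" + n.getNodeName() + "=" + n.getTextContent());
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        describeChildren(XML).forEach(System.out::println);
    }
}
```

The parent has three children: a text node, a strong element, and another text node. A binding that maps sentence to a class with one field per child element has no obvious place to put the two runs of text.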
Bill Venners: What about order doesn't matter?
Elliotte Rusty Harold: It is one of the rules in SQL that the records have no fundamental order, that the fields in a record have no real order. It's just a question of how the query wants you to order the records. However, in XML in many cases, order is very important. Let's say you're just marking up normal human language like we're talking now. If we just start words switching randomly forth back and, to follow very hard becomes it. And that's true for many other cases, such as mathematical equations marked up in MathML.
Now it's not true in all cases. You may have very database-like, very record-like XML. An RSS document, for example, contains many different items. Each item has a title, a URL, a summary. You don't really care whether the title comes before the URL or the summary. It will all be put into the right place when the document is processed. But that's not all XML.
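A small sketch of why order matters for something like MathML, again using the JDK's DOM parser. The two mrow fragments are simplified, namespace-free stand-ins for real MathML, and the class and method names are hypothetical. The point is that both documents contain the same set of children, so any representation that ignores order cannot tell "a - b" from "b - a".

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DocumentOrderDemo {
    // Simplified MathML-like rows (no namespace), for illustration only.
    static final String A_MINUS_B = "<mrow><mi>a</mi><mo>-</mo><mi>b</mi></mrow>";
    static final String B_MINUS_A = "<mrow><mi>b</mi><mo>-</mo><mi>a</mi></mrow>";

    // Returns the text of each child element of the root, in document order.
    static List<String> tokensInOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> tokens = new ArrayList<>();
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    tokens.add(n.getTextContent());
                }
            }
            return tokens;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> first = tokensInOrder(A_MINUS_B);
        List<String> second = tokensInOrder(B_MINUS_A);
        // Compared as ordered lists the two expressions differ...
        System.out.println("same sequence: " + first.equals(second));
        // ...but compared as unordered sets they look identical.
        System.out.println("same set:      "
            + new HashSet<>(first).equals(new HashSet<>(second)));
    }
}
```

For a record-like RSS item the unordered comparison is usually the one you want; for the equation it silently discards the meaning.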
Bill Venners: In your talk you mentioned, "Seeing the world through object colored glasses." What did you mean by that?
Elliotte Rusty Harold: What I've just described is essentially seeing the world through database colored glasses—everything's a table. And yes you could probably figure a way to stuff most anything that can be represented in a computer into a table, but some things fit better than other things. A different version of the same problem is saying well everything's an object, and we can model everything as objects. And that's equally flawed, for different reasons.
For example, the number of child elements is a problem for the object world. Typically, you think when you write an employee class that the employee has a single name field, perhaps a string, a single salary field, perhaps a double. However, if you have an XML document, there are no such rules. Some employee elements will have a single name. A few employee elements may have multiple names. Somebody changed his or her name, or somebody got married, and you want to store both names. So how do you define your class so it allows both the case of one name, two names, three names, perhaps no names in some cases where the name is unknown? All this is legal in XML. But if you do a data-binding approach, where you assume there's one class that can represent an employee, you rapidly run into problems. A class can't have three separate name fields.
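One way to see the mismatch is to skip the fixed-shape class and ask the document itself how many names an employee has. The sketch below uses the JDK's DOM parser; the employee and name element names come from the example above, while the class name, method name, and sample data are hypothetical. It is an illustration of the problem, not a recommended binding design.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmployeeNamesDemo {
    // Returns every name found under the root employee element:
    // possibly none, possibly one, possibly several.
    static List<String> names(String employeeXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(employeeXml)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("name");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: unknown name, one name, a changed name.
        System.out.println(names("<employee><salary>50000</salary></employee>"));
        System.out.println(names("<employee><name>Pat Smith</name></employee>"));
        System.out.println(names(
            "<employee><name>Pat Smith</name><name>Pat Jones</name></employee>"));
    }
}
```

All three documents are legal XML, yet a generated class with a single String name field could represent only the middle one faithfully.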
The next installment of this interview with Elliotte Rusty Harold will appear on Monday, June 16.
Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX.