What's Wrong with XML APIs

A Conversation with Elliotte Rusty Harold, Part I

by Bill Venners
May 26, 2003

Summary
Elliotte Rusty Harold talks with Bill Venners about the five styles of XML APIs, and the problems with data-binding APIs.

Elliotte Rusty Harold is a prolific author of numerous books about Java and XML, and creator of the popular Java website Cafe au Lait and XML website Cafe con Leche. He contributed to the development of JDOM, a popular XML processing API for Java. His most recent book, Processing XML with Java, shows how to parse, manipulate, and generate XML from Java applications using several XML APIs, including SAX, DOM, and JDOM.

In September, 2002, Harold unveiled at a meeting of the New York XML SIG an XML processing API of his own design: the XOM (XML Object Model) API. On Cafe au Lait and Cafe con Leche, Harold described XOM like this:

Like DOM, JDOM, dom4j, and ElectricXML, XOM is a read/write API that represents XML documents as trees of nodes. Where XOM diverges from these models is that it strives for absolute correctness and maximum simplicity. XOM is based on more than two years' experience with JDOM development, as well as the last year's effort writing Processing XML with Java. While documenting the various APIs I found lots of things to like and not like about all the APIs, and XOM is my effort to synthesize the best features of the existing APIs while eliminating the worst.

In this interview, which is being published in weekly installments, Elliotte Rusty Harold discusses the strengths and weaknesses of the various XML processing APIs for Java, the design problems with existing APIs, and the design philosophy behind XOM. In this first installment, Harold discusses the five styles of XML APIs and the problems with data-binding XML APIs.

What's Wrong with XML APIs

Bill Venners: What is wrong with XML APIs?

Elliotte Rusty Harold: XML APIs are too complicated, too simple, or both.

Bill Venners: How can they be both too complicated and too simple?

Elliotte Rusty Harold: It depends on which XML API you're talking about. Some APIs, such as DOM, are simply wildly broken and complex in ways that they don't need to be. Other APIs are too simple in that they don't completely and correctly model XML. These APIs try to pretend that XML is simpler than it actually is.

Any reasonable XML API will have some rough spots, because XML has rough spots. Some of those rough spots are design flaws in XML, but an API shouldn't be trying to fix that. A few APIs are both too simple and too complicated at the same time. The designers tried to throw in so many features that the API became excessively complex and hard to understand simply by its sheer size, while at the same time, they didn't actually get all aspects of XML correct.

Far and away the most common problem I've seen has been with namespaces. Namespaces are a real pain. They are difficult to understand. They are poorly designed. And a lot of the APIs that are out there either deliberately or accidentally try to pretend that namespaces are something other than what they actually are.

The Five Styles of XML APIs

Bill Venners: What are the five styles of XML APIs?

Elliotte Rusty Harold: There are five styles of XML APIs. The first style, the very first one to be invented historically, is a push model. The classic example of this is SAX. Another example would be the Xerces Native Interface (XNI). A push API is a streaming API. The parser takes control. The parser reads the document, and the parser tells the client application what it sees when it sees it: a start tag, an end tag, a string of text, a comment, a processing instruction, etc.

Bill Venners: It is push because the parser is pushing data at the client program.

Elliotte Rusty Harold: Right. The parser is in control of the program. You interface with the parser through a callback interface; in SAX, that is a ContentHandler. That was the first API to be used for XML because that was the easiest one for parser vendors to implement. Even before SAX was designed, the first three parsers all had these sorts of push APIs.

Push APIs have a number of advantages. They are very fast. You don't need to read to the end of the document before you start working with the beginning of the document. They use very little memory, because the entire document isn't in memory at once. Instead, you just see sort of a peephole into the document, just the current thing you're looking at. Typically in a push API the work goes into building up some data structure and gradually filling it from the input document until there is enough information there to act on. If your document is, for example, a collection of articles, a list of records, something for which there are clear chunks in the data and you can process each chunk individually, a push API works very well.
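As an illustration that is not part of the interview, here is a minimal sketch of the push style using the SAX classes that ship with the JDK. The class name SaxPushSketch, the titles method, and the sample document are hypothetical. The parser drives the loop and calls back into the handler, which builds up a small data structure (a list of titles) as the events arrive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; only the SAX types come from the JDK.
class SaxPushSketch {

    // Collects the text of every <title> element as the parser pushes events at the handler.
    static List<String> titles(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <title>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may report one run of text in several calls, so accumulate.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    result.add(current.toString());
                    current = null;
                }
            }
        };
        // The parser is in control from here until the document ends.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<articles><article><title>First</title></article>"
                   + "<article><title>Second</title></article></articles>";
        System.out.println(titles(xml));
    }
}
```

Only the current title is held while parsing; nothing else from the document is retained, which is where the low memory use comes from.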

On the other hand, the whole callback interface observer design pattern can be less than ideal for some developers. This brings us to the second major style of XML API, and the newest style: a pull API. A pull API is still streaming, still very fast, still very memory efficient. But instead of the parser being in control, telling the client application when it has some new information, the client application is in control, and it asks the parser to give it the next piece of information when it wants it. The basic advantages of a pull API are the same as with a push API, except maybe the pull API is a little simpler. The implementations of the various pull APIs are not very mature yet. When you actually look at the ones out there, such as NekoPull and XMLPULL, they have a lot of idiosyncrasies both with respect to Java and XML. That's mostly just a function of maturity. There's nothing fundamentally wrong with the idea of a pull API. They're just not fully baked yet. With a little time, in a year or two, I expect pull APIs will be a very popular style of XML parsing.
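To make the contrast concrete, here is a minimal sketch of the pull style. It uses StAX (javax.xml.stream), a pull API that was standardized in the JDK after this interview and is not one of the APIs Harold names, so treat the choice as an assumption made for the sake of a runnable example. The class name PullSketch and the titles method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; StAX stands in for the pull APIs named in the interview.
class PullSketch {

    // Collects the text of every <title> element by asking the parser for one event at a time.
    static List<String> titles(String xml) throws Exception {
        List<String> result = new ArrayList<>();
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The client owns the loop: it decides when to request the next event.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the text content and leaves the reader on the matching end tag.
                    result.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<articles><article><title>First</title></article>"
                   + "<article><title>Second</title></article></articles>";
        System.out.println(titles(xml));
    }
}
```

The document is still streamed, but there is no callback object: the state that a handler would keep in fields lives in ordinary local variables of the loop.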

The third style of XML parsing, and perhaps the most obvious style to most programmers, is a tree-based API. In a tree-based API, an XML document is read by a parser, and the parser constructs an object model, typically around a tree with nodes for elements, attributes, comments, processing instructions, text, and so forth. The entire document is stored in memory. You use the methods of the object to query the document, to navigate the document, to change and modify the document, and so forth. There are more tree-based APIs than any other kind of API: DOM, JDOM, DOM4J, Sparta, ElectricXML, and my own XOM are all tree-based APIs.
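A minimal sketch of the tree style, using the W3C DOM implementation bundled with the JDK (chosen only because it needs no extra libraries; the interview is critical of DOM's design). The class name TreeSketch and its methods are hypothetical. The whole document is parsed into memory first, then queried and modified through node objects.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; only the DOM types come from the JDK.
class TreeSketch {

    // Parses the entire document into an in-memory tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Query: text of the first <title> element, or null if there is none.
    static String firstTitle(Document doc) {
        Node title = doc.getElementsByTagName("title").item(0);
        return title == null ? null : title.getTextContent();
    }

    // Modify: replace the text of the first <title> element, if present.
    static void setFirstTitle(Document doc, String newTitle) {
        Node title = doc.getElementsByTagName("title").item(0);
        if (title != null) {
            title.setTextContent(newTitle);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><title>Draft</title><author>A. Writer</author></book>");
        System.out.println(firstTitle(doc));
        setFirstTitle(doc, "Final");
        System.out.println(firstTitle(doc));
    }
}
```

Unlike the streaming styles, any part of the document can be revisited or changed at any time, at the cost of holding all of it in memory.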

The fourth style of API, which is also a fairly recent style, is a data-binding API. It is similar to tree APIs in that the entire document is parsed and an object model is built, but in a data-binding API, rather than having classes that represent XML concepts, like elements and processing instructions, you have classes that represent the concepts the XML represents. So a book element might become a Book object. An employee element might become an Employee object. Typically, some form of schema is compiled to produce these classes automatically: either a W3C XML Schema Language schema, a DTD, or a binding schema written just for that purpose in some special-purpose schema language.
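The idea of a book element becoming a Book object can be sketched by hand. Real data-binding tools generate such classes from a schema; the Book class, the bind method, and the element names below are all hypothetical and written manually, with the JDK's DOM parser used only to read the input.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: a hand-written stand-in for what a binding tool would generate.
class BindingSketch {

    // The class models the concept (a book), not XML constructs such as elements or text nodes.
    static final class Book {
        final String title;
        final String author;

        Book(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    // Maps a <book> document onto a Book object.
    static Book bind(String xml) throws Exception {
        Element book = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        return new Book(childText(book, "title"), childText(book, "author"));
    }

    // Text of the first direct child element with the given name, or null if absent.
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Book b = bind("<book><title>Processing XML with Java</title>"
                    + "<author>Elliotte Rusty Harold</author></book>");
        System.out.println(b.title + " by " + b.author);
    }
}
```

Client code then works with b.title and b.author and never sees an element or a text node, which is both the appeal of the style and the source of the assumptions it makes about the document.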

And then finally the fifth kind of API is what I would refer to as a query API. These would typically be things like TrAX for transforming with XSLT, or various APIs like Jaxen for searching with XPath. There are no real standards here, but there is some interesting work being done. Generally, the real focus there, the real code, goes into the XPath or XSLT query, which we merely call from Java or some other language. It's like using SQL from inside a Java program using JDBC.
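A minimal sketch of the query style. It uses javax.xml.xpath, the XPath API that later became part of the JDK, rather than Jaxen or TrAX as named in the interview; that substitution is an assumption made so the example runs without extra libraries. The class name QuerySketch and the sample catalog are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical example class; javax.xml.xpath stands in for the query APIs named in the interview.
class QuerySketch {

    // Evaluates an XPath expression against a document and returns the result as a string.
    static String evaluate(String expression, String xml) throws Exception {
        return XPathFactory.newInstance().newXPath()
                .evaluate(expression, new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog>"
                   + "<book><title>Cheap</title><price>10</price></book>"
                   + "<book><title>Expensive</title><price>45</price></book>"
                   + "</catalog>";
        // The logic lives in the query string; Java only hands it over, much like SQL via JDBC.
        System.out.println(evaluate("/catalog/book[price > 20]/title", xml));
    }
}
```

Changing what is selected means editing the expression, not the Java code around it.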

The Problems with Data-Binding XML APIs

Bill Venners: What are the problems with data binding?

Elliotte Rusty Harold: There are some general characteristics that I have seen very commonly across data-binding APIs. Not all APIs have all of these problems, but here are some things to watch out for that often cause trouble. The number one problem is the assumption that the document has a schema, of whatever form. Very many documents do not have any formal schema whatsoever—they are merely well-formed documents.

A second problem is the assumption that a document that has a schema has a schema written in the W3C XML schema language. By far the most popular XML schema language is not the W3C schema language, it's Document Type Definitions, or DTDs for short. Increasingly new projects are looking at the W3C XML schema language, deciding it's far too complex and far too baroque to actually use, and instead moving to something called RELAX NG, which was originally invented by Murata Makoto and James Clark and is now an official OASIS standard.

The third problem is the assumption that a document that has a schema is valid according to the schema, which is also often not true.

Then beyond the mere issues of schema-ness, you get into the question of what structures the schemas can represent. Most designers of data-binding APIs have come at it from the database perspective. They assume that XML documents look pretty much like tables. They're fairly flat. They're definitely not recursive. Mixed content doesn't exist. Order doesn't really matter. None of these things are true about XML documents in the general case.

In some XML documents those conditions are true. For example, in RSS they would be true. On the other hand, in DocBook, in XHTML, in Scalable Vector Graphics, in MathML, none of those conditions would hold. Order is important. Mixed content does exist. These applications aren't very flat. They can be recursive. XML is a lot more general than a relational table. When you start trying to assume that things look like tables, or things look like objects, you're going to get yourself into trouble.

Bill Venners: Could you define mixed content?

Elliotte Rusty Harold: This sentence is very important! Now, we put the sentence I just spoke in a sentence tag; it's a sentence element. Then the word very goes into a strong element. So you begin with a start tag, "This sentence is", another start tag, the word "very", an end tag, the word "important!", and then an end tag. It's incredibly common in HTML. We've all written pages like that when you just want to make a word or a phrase in the middle of a paragraph bold or emphasized. Or you want to make a link, but you don't want to make the entire paragraph a link. So the parent element contains both plain text and child elements. That's mixed content.
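Written out as markup, the spoken example is a sentence element whose children are text, a strong element, and more text. The following sketch, which is not part of the interview, parses that markup with the JDK's DOM parser and lists the children of the parent element. The class name MixedContentSketch and the describeChildren method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class showing what mixed content looks like in a tree.
class MixedContentSketch {

    // Lists the direct children of the root element, labeling each as text or element.
    static List<String> describeChildren(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        List<String> out = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.TEXT_NODE) {
                out.add("text:" + n.getNodeValue());
            } else if (n.getNodeType() == Node.ELEMENT_NODE) {
                out.add("element:" + n.getNodeName());
            }
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<sentence>This sentence is <strong>very</strong> important!</sentence>";
        for (String child : describeChildren(xml)) {
            System.out.println(child);
        }
    }
}
```

The text before and after the strong element belongs to the sentence element itself, interleaved with a child element, and a class with one field per child element has no natural place to put it.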

Bill Venners: What about the assumption that order doesn't matter?

Elliotte Rusty Harold: It is one of the rules in SQL that the records have no fundamental order, that the fields in a record have no real order. It's just a question of how the query wants you to order the records. However, in XML in many cases, order is very important. Let's say you're just marking up normal human language like we're talking now. If we just start words switching randomly forth back and, to follow very hard becomes it. And that's true for many other cases, such as mathematical equations marked up in MathML.

Now it's not true in all cases. You may have very database-like, very record-like XML. An RSS document, for example, contains many different items. Each item has a title, a URL, a summary. You don't really care whether the title comes before the URL or the summary. It will all be put into the right place when the document is processed. But that's not all XML.

Bill Venners: In your talk you mentioned, "Seeing the world through object colored glasses." What did you mean by that?

Elliotte Rusty Harold: What I've just described is essentially seeing the world through database colored glasses—everything's a table. And yes you could probably figure a way to stuff most anything that can be represented in a computer into a table, but some things fit better than other things. A different version of the same problem is saying well everything's an object, and we can model everything as objects. And that's equally flawed, for different reasons.

For example, the number of child elements is a problem for the object world. Typically, you think when you write an employee class that the employee has a single name field, perhaps a string, a single salary field, perhaps a double. However, if you have an XML document, there are no such rules. Some employee elements will have a single name. A few employee elements may have multiple names. Somebody changed his or her name, or somebody got married, and you want to store both names. So how do you define your class so it allows both the case of one name, two names, three names, perhaps no names in some cases where the name is unknown? All this is legal in XML. But if you do a data-binding approach, where you assume there's one class that can represent an employee, you rapidly run into problems. A class can't have three separate name fields.
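The mismatch can be sketched in a few lines. The Employee class and the bindNaively method below are hypothetical and hand-written, standing in for a binding that assumes exactly one name per employee; they do not reproduce any particular data-binding tool. The allNames method shows what the document actually contains.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example: a one-name-per-employee class meets documents with zero or many names.
class ChildCountSketch {

    // The shape a programmer typically expects: exactly one name.
    static final class Employee {
        final String name; // null when the document has no <name> at all

        Employee(String name) {
            this.name = name;
        }
    }

    static Element parseRoot(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Every <name> that is a direct child of the employee element, in document order.
    static List<String> allNames(Element employee) {
        List<String> names = new ArrayList<>();
        for (Node n = employee.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && "name".equals(n.getNodeName())) {
                names.add(n.getTextContent());
            }
        }
        return names;
    }

    // A naive binding keeps the first name and silently drops the rest.
    static Employee bindNaively(Element employee) {
        List<String> names = allNames(employee);
        return new Employee(names.isEmpty() ? null : names.get(0));
    }

    public static void main(String[] args) throws Exception {
        Element e = parseRoot("<employee><name>Jane Doe</name><name>Jane Smith</name>"
                            + "<salary>50000</salary></employee>");
        System.out.println("in the document: " + allNames(e));
        System.out.println("after binding:   " + bindNaively(e).name);
    }
}
```

Both inputs are legal XML, yet the single-field class either loses a name or has to represent a missing one with null, which is the kind of trouble the passage above describes.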

Next Week

Come back Monday, June 2 for Part I of a conversation with Bruce Eckel about why he loves Python. I am planning to start staggering the publication of several interviews at once, to give the reader variety. The next installment of this interview with Elliotte Rusty Harold will appear on Monday, June 16. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.

Resources

Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX, which is available on Amazon.com at:
http://www.amazon.com/exec/obidos/ASIN/020161622X/

XOM, Elliotte Rusty Harold's XML Object Model API:
http://www.cafeconleche.org/XOM/

Cafe au Lait: Elliotte Rusty Harold's site of Java News and Resources:
http://www.cafeaulait.org/

Cafe con Leche: Elliotte Rusty Harold's site of XML News and Resources:
http://www.cafeconleche.org/

JDOM:
http://www.jdom.org/

DOM4J:
http://www.dom4j.org/

SAX, the Simple API for XML Processing:
http://www.saxproject.org/

DOM, the W3C's Document Object Model API:
http://www.w3.org/DOM/

ElectricXML:
http://www.themindelectric.com/exml/

Sparta:
http://sparta-xml.sourceforge.net/

Common API for XML Pull Parsing:
http://www.xmlpull.org/

NekoPull:
http://www.apache.org/~andyc/neko/doc/pull/

Xerces Native Interface (XNI):
http://xml.apache.org/xerces2-j/xni.html

TrAX (Transformation API for XML):
http://xml.apache.org/xalan-j/trax.html

RELAX NG:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=relax-ng

About the author

Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces to Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.