Elliotte Rusty Harold talks with Bill Venners about the problems with the DOM API, and the design lessons he learned from DOM.
Elliotte Rusty Harold is a prolific author of numerous books about Java and XML, and creator of the popular Java website Cafe au Lait and XML website Cafe con Leche. He contributed to the development of JDOM, a popular XML processing API for Java. His most recent book, Processing XML with Java, shows how to parse, manipulate, and generate XML from Java applications using several XML APIs, including SAX, DOM, and JDOM.
In September 2002, at a meeting of the New York XML SIG, Harold unveiled an XML processing API of his own design: the XOM (XML Object Model) API. On Cafe au Lait and Cafe con Leche, Harold described XOM like this:
Like DOM, JDOM, dom4j, and ElectricXML, XOM is a read/write API that represents XML documents as trees of nodes. Where XOM diverges from these models is that it strives for absolute correctness and maximum simplicity. XOM is based on more than two years' experience with JDOM development, as well as the last year's effort writing Processing XML with Java. While documenting the various APIs I found lots of things to like and not like about all the APIs, and XOM is my effort to synthesize the best features of the existing APIs while eliminating the worst.
In this interview, which is being published in multiple installments, Elliotte Rusty Harold discusses the strengths and weaknesses of the various XML processing APIs for Java, the design problems with existing APIs, and the design philosophy behind XOM.
Bill Venners: What's wrong with DOM?
Elliotte Rusty Harold: There's a phrase, "A camel is a horse designed by committee." That's a slur on a camel. A camel is actually very well adapted to its environment. DOM, on the other hand, is the sort of thing that that phrase was meant to describe.
DOM is incredibly complex. It is full of gotchas.
Bill Venners: What are some of DOM's gotchas?
Elliotte Rusty Harold: Take namespaces, for example. There are two basic models for handling namespaces in an XML API. In one model, you assign each element and attribute a certain namespace, and you figure out where the namespace declarations need to go when you serialize the document. In the other model, you don't provide any special support for namespaces—you just treat the namespaces as attributes. That also works, although it's harder on the end user. DOM is the only API I know of that does both, simultaneously. DOM requires client programmers to understand and use both models. Otherwise they'll produce namespace-malformed documents, which is truly evil. DOM has all the complexity of both approaches and the simplicity of neither.
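A minimal sketch of the two models coexisting in one DOM tree, assuming the JDK's built-in DOM obtained through JAXP. The class name, the namespace URIs, and the p prefix are hypothetical. The element records its namespace as a property, while the xmlns:p declaration is just another attribute, and nothing stops the two from disagreeing.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NamespaceModels {
    static final String NS = "http://example.com/ns";       // hypothetical namespace URI
    static final String OTHER = "http://example.com/other"; // hypothetical, deliberately different
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Element buildInconsistentElement() {
        Document doc = newDocument();
        // Model 1: the element carries its namespace URI as a property.
        // createElementNS does not add any xmlns:p attribute by itself.
        Element item = doc.createElementNS(NS, "p:item");
        // Model 2: a namespace declaration is an ordinary attribute.
        // DOM accepts one that contradicts the element's own namespace.
        item.setAttributeNS(XMLNS, "xmlns:p", OTHER);
        doc.appendChild(item);
        return item;
    }

    public static void main(String[] args) {
        Element item = buildInconsistentElement();
        System.out.println("element namespace:  " + item.getNamespaceURI());
        System.out.println("declared namespace: " + item.getAttribute("xmlns:p"));
    }
}
```

Both answers live in the same tree, so code that reads or writes namespaces has to keep them consistent by hand.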
There are a lot of other issues with DOM that stem from its cross-language nature. For example, DOM defines exactly one exception, DOMException, which has short type codes to indicate which kind of exception it is. To a Java programmer, this is just plain weird. Java programmers use many different exception classes, and never use shorts for anything. When was the last time you used a short in code? Have you ever? I don't think I've ever used short, except when I was trying to demonstrate all the data types. But using a short makes sense from a C or C++ programming perspective, where shorts are more common, and having many exception types is not.
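In Java this means that telling one DOM failure from another comes down to inspecting the public code field of DOMException rather than catching different exception types. A small sketch, assuming the JDK's built-in DOM obtained through JAXP; the class and helper names are made up for illustration. The two failures it triggers, an element name starting with a digit and a second root element, are ones the DOM specification assigns to INVALID_CHARACTER_ERR and HIERARCHY_REQUEST_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

class DomErrorCodes {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs a DOM operation and returns the short code of the DOMException it throws,
    // or -1 if it throws nothing.
    static short codeOf(Runnable operation) {
        try {
            operation.run();
            return (short) -1;
        } catch (DOMException e) {
            return e.code; // a public short field, not a subclass, identifies the error
        }
    }

    static short badNameCode() {
        Document doc = newDocument();
        return codeOf(() -> doc.createElement("1item")); // XML names cannot start with a digit
    }

    static short secondRootCode() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("root"));
        return codeOf(() -> doc.appendChild(doc.createElement("another"))); // only one root element
    }

    static String describe(short code) {
        switch (code) {
            case DOMException.INVALID_CHARACTER_ERR:
                return "illegal character in a name";
            case DOMException.HIERARCHY_REQUEST_ERR:
                return "node not allowed at this position";
            default:
                return "other DOM error, code " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(badNameCode()));
        System.out.println(describe(secondRootCode()));
    }
}
```

An API designed only for Java would more likely throw distinct exception subclasses, so each case could get its own catch clause instead of a switch on a short.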
Some of the languages DOM was designed for did not support method overloading at the time DOM was invented. Therefore, DOM could not have two methods named createElement, one that takes an element name and a namespace, and another that takes only a local name. Instead, DOM has createElement, which takes just the name, and createElementNS, which takes both a name and a namespace. There are many non-overloaded methods in the DOM API that, to any Java or C++ programmer, should clearly be overloaded.
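The sketch below puts the two DOM methods side by side and then wraps them in a pair of Java overloads, roughly what a Java-only API could have offered. The ElementFactories class and its element helpers are hypothetical; createElement and createElementNS are the real DOM methods. It also shows one more wrinkle: an element made with the Level 1 createElement has no local name at all.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ElementFactories {
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical overload without namespace information; delegates to DOM Level 1 createElement.
    static Element element(Document doc, String name) {
        return doc.createElement(name);
    }

    // Hypothetical overload for a namespaced element; delegates to DOM Level 2 createElementNS.
    static Element element(Document doc, String namespaceURI, String qualifiedName) {
        return doc.createElementNS(namespaceURI, qualifiedName);
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element plain = element(doc, "item");
        Element qualified = element(doc, "http://example.com/ns", "p:item"); // hypothetical URI
        System.out.println(plain.getLocalName());      // null: Level 1 nodes have no local name
        System.out.println(qualified.getLocalName());  // item
        System.out.println(qualified.getPrefix());     // p
    }
}
```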
There are several other DOM design decisions that confuse people. For example, DOM trees do not allow nodes to be detached from their parent document. Only the document that created the node is allowed to hold the node. Also, DOM's object is read-only. Why? I can't explain these design decisions. I just know that they are painful when you're actually trying to get work done with DOM.
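Here is what the ownership rule looks like in practice, sketched with the JDK's DOM via JAXP (the class and element names are hypothetical). Appending a node created by one document into another fails with WRONG_DOCUMENT_ERR; the usual workaround is Document.importNode, which makes a copy owned by the target document. DOM Level 3 later added Document.adoptNode for actually moving a node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class CrossDocumentMove {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Creates <chapter n="1"/>, owned by document a.
    static Element chapterIn(Document a) {
        Element chapter = a.createElement("chapter");
        chapter.setAttribute("n", "1");
        return chapter;
    }

    // Appends a node owned by another document; returns the DOMException code, or -1.
    static short appendForeignNodeCode() {
        Document a = newDocument();
        Document b = newDocument();
        b.appendChild(b.createElement("book"));
        try {
            b.getDocumentElement().appendChild(chapterIn(a));
            return (short) -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    // Workaround: importNode copies the node into b; the original stays with a.
    static Element importedChapter() {
        Document a = newDocument();
        Document b = newDocument();
        b.appendChild(b.createElement("book"));
        Node copy = b.importNode(chapterIn(a), true); // deep copy owned by b
        b.getDocumentElement().appendChild(copy);
        return (Element) copy;
    }

    public static void main(String[] args) {
        System.out.println(appendForeignNodeCode() == DOMException.WRONG_DOCUMENT_ERR);
        Element copy = importedChapter();
        System.out.println(copy.getParentNode().getNodeName() + " > " + copy.getNodeName());
    }
}
```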
Bill Venners: What did you learn from DOM? What things did they do that you thought made sense?
Elliotte Rusty Harold: The single biggest lesson from DOM that comes to mind is that polymorphism is good. It's very useful to have a Node interface or class, which all parts of the XML tree extend in some fashion. Often you just want to walk the tree and work with all the Nodes. You don't care whether a particular Node is, say, a ProcessingInstruction; you don't need that more specific type. DOM works very well in those cases.
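A minimal sketch of that kind of type-agnostic traversal, using only the Node interface's getFirstChild and getNextSibling methods. The NodeWalk class and the sample document are hypothetical; the walk never asks whether it is looking at an element, a comment, a processing instruction, or text. Attributes are not children in DOM, so this walk does not visit them.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeWalk {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts a node and all of its descendants without looking at their specific types.
    static int countNodes(Node node) {
        int count = 1;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countNodes(child);
        }
        return count;
    }

    // Sample tree: <root><!-- note --><?render fast?><p>hello</p></root>
    static Document sample() {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createComment(" note "));
        root.appendChild(doc.createProcessingInstruction("render", "fast"));
        Element p = doc.createElement("p");
        p.appendChild(doc.createTextNode("hello"));
        root.appendChild(p);
        return doc;
    }

    public static void main(String[] args) {
        // document, root, comment, processing instruction, p, text
        System.out.println(countNodes(sample())); // 6
    }
}
```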
Bill Venners: What kind of processing can you do with a Node irrespective of its more specific type?
Elliotte Rusty Harold: You can merge two documents, for example. You want to select this portion of document A, and copy it into this element of document B. You just want to walk down the tree of document A, and copy each node in the tree into document B.
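That merge can be sketched as a recursive walk that copies each node from document A under a target element in document B without ever switching on node type. This is a hedged sketch: the MergeDocuments class and the sample content are hypothetical, and each individual copy is made with DOM's importNode method in shallow mode (for an element, a shallow import still brings its attributes along). It assumes the target is an element, since getOwnerDocument returns null for a Document node.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class MergeDocuments {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Copies source and its descendants under target, one node at a time.
    // Assumes target is a node inside a document (e.g. an element), not a Document itself.
    static void copyInto(Node source, Node target) {
        Document targetDoc = target.getOwnerDocument();
        Node copy = targetDoc.importNode(source, false); // this node only; children handled below
        target.appendChild(copy);
        for (Node child = source.getFirstChild(); child != null; child = child.getNextSibling()) {
            copyInto(child, copy);
        }
    }

    // Document A: <a><section id="s1"><p>one</p><!-- draft --></section></a>
    // Document B: <book><chapter/></book>; the section is copied into the chapter.
    static Element mergedChapter() {
        Document a = newDocument();
        Element rootA = a.createElement("a");
        a.appendChild(rootA);
        Element section = a.createElement("section");
        section.setAttribute("id", "s1");
        Element p = a.createElement("p");
        p.appendChild(a.createTextNode("one"));
        section.appendChild(p);
        section.appendChild(a.createComment(" draft "));
        rootA.appendChild(section);

        Document b = newDocument();
        Element book = b.createElement("book");
        b.appendChild(book);
        Element chapter = b.createElement("chapter");
        book.appendChild(chapter);

        copyInto(section, chapter);
        return chapter;
    }

    public static void main(String[] args) {
        Element chapter = mergedChapter();
        Element copied = (Element) chapter.getFirstChild();
        System.out.println(copied.getNodeName() + " id=" + copied.getAttribute("id"));
        System.out.println(chapter.getTextContent()); // comments are excluded from textContent
    }
}
```

Calling importNode(source, true) once would produce the same result; the explicit walk is there to show that the copy loop itself only needs the generic Node interface.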
Bill Venners: One thing you said you learned from DOM is that interfaces are a bad idea. Why?
Elliotte Rusty Harold: I learned that partially from DOM. DOM is designed around interfaces, rather than concrete classes, because it is written in IDL and needs to be compiled to many different programming languages. It relies on the abstract factory design pattern to actually create documents and DOM implementations.
A large part of the trouble of getting started with DOM is learning to work with the interfaces, rather than directly with the classes. If you look at the code many XML novices write with DOM, you'll see it is littered with the implementation classes of the specific DOM implementation, such as Xerces or Crimson.
To some extent mentioning implementation classes is unavoidable, because DOM is incomplete. The Document class serves as an abstract factory for creating Text objects, and so forth. The DOMImplementation class is an abstract factory, which is used to create DocType objects. However, they left out the part of the abstract factory design pattern where there's a static method that lets you load the factory itself. You can't load the factory in DOM without using implementation-specific classes. Overall, I just saw that the interfaces were making life more difficult than it needed to be for a lot of programmers.
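In Java the missing bootstrap step is usually supplied from outside DOM: JAXP's DocumentBuilderFactory.newInstance() is a static method that locates an implementation, and from there code can stay on the org.w3c.dom interfaces. The sketch below, with a hypothetical Bootstrap class and sample content, names no Xerces or Crimson class. DOM Level 3 later added its own registry, org.w3c.dom.bootstrap.DOMImplementationRegistry, for the same job.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.w3c.dom.Element;

class Bootstrap {
    // JAXP, not DOM, provides the static method that finds a concrete implementation.
    static DOMImplementation loadImplementation() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static Document createBook() {
        DOMImplementation impl = loadImplementation();
        // DOMImplementation is the abstract factory for DocumentType and Document objects.
        DocumentType doctype = impl.createDocumentType("book", null, "book.dtd"); // hypothetical system ID
        Document doc = impl.createDocument(null, "book", doctype);
        // Document is the abstract factory for the nodes inside the tree.
        Element title = doc.createElement("title");
        title.appendChild(doc.createTextNode("Hello, DOM"));
        doc.getDocumentElement().appendChild(title);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = createBook();
        System.out.println(doc.getDoctype().getName());
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```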
Bill Venners: You also said that you learned from DOM that successful APIs must be simple.
Elliotte Rusty Harold: Yes, although I suppose DOM is the reverse of that. DOM proves that a complex API is not likely to be successful, at least if it's substantially more complex than what it's trying to model.
Come back Monday, June 23 for Part II of a conversation with Bruce Eckel about why he loves Python. I am now staggering the publication of several interviews at once, to give the reader variety. The next installment of this interview with Elliotte Rusty Harold will appear in the near future. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.
Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX, which is available on Amazon.com.
XOM, Elliotte Rusty Harold's XML Object Model API
Cafe au Lait: Elliotte Rusty Harold's site of Java News and Resources
Cafe con Leche: Elliotte Rusty Harold's site of XML News and Resources
SAX, the Simple API for XML Processing
DOM, the W3C's Document Object Model API
Common API for XML Pull Parsing
Xerces Native Interface (XNI)
TrAX (Transformation API for XML)