Elliotte Rusty Harold is a prolific author of numerous books about Java and XML, and creator of the popular Java website Cafe au Lait and XML website Cafe con Leche. He contributed to the development of JDOM, a popular XML processing API for Java. His most recent book, Processing XML with Java, shows how to parse, manipulate, and generate XML from Java applications using several XML APIs, including SAX, DOM, and JDOM.
At a meeting of the New York XML SIG in September, 2002, Harold unveiled an XML processing API of his own design: the XOM (XML Object Model) API. On Cafe au Lait and Cafe con Leche, Harold described XOM like this:
Like DOM, JDOM, dom4j, and ElectricXML, XOM is a read/write API that represents XML documents as trees of nodes. Where XOM diverges from these models is that it strives for absolute correctness and maximum simplicity. XOM is based on more than two years' experience with JDOM development, as well as the last year's effort writing Processing XML with Java. While documenting the various APIs I found lots of things to like and not like about all the APIs, and XOM is my effort to synthesize the best features of the existing APIs while eliminating the worst.
In this interview, which is being published in multiple installments, Elliotte Rusty Harold discusses the strengths and weaknesses of the various XML processing APIs for Java, the design problems with existing APIs, and the design philosophy behind XOM.
Bill Venners: What did you learn from JDOM?
Elliotte Rusty Harold: I learned a hell of a lot from JDOM. Number one, I learned it is possible to fight the W3C and win. Just today I heard from one of the attendees at this conference that they couldn't really use XML until JDOM came along, because XML and the corresponding APIs were too complex. Once they got JDOM, then they could use XML. A lot of people are using JDOM.
Bill Venners: What do you mean by, "fight the W3C and win?"
Elliotte Rusty Harold: The W3C had published their semi-official tree-based API for XML: DOM. Nonetheless, JDOM is still very useful. Many people are using it, liking it, and enjoying it. Just because the W3C has staked out territory in this space doesn't mean that if you come along with something better for some people, that you can't be successful too.
Bill Venners: What else did you learn from JDOM?
Elliotte Rusty Harold: I learned a lot of things technically. For example, I learned that it is not necessary to write your own XML parser, which is a relatively hard proposition. Instead you could use any SAX parser to build your own object model in memory. Similarly I learned you didn't need to write your own XPath engine or XSLT processor to have XPath and XSLT support. For XPath you could use Jaxen. For XSLT you could use TrAX. I learned you could build on top of these other existing technologies.
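As a rough illustration of building an object model on top of an existing SAX parser, here is a minimal sketch. The SimpleElement and TreeBuilder classes are hypothetical names invented for this example and are not part of JDOM or XOM; only the javax.xml.parsers and org.xml.sax calls come from the standard Java library.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical minimal tree node; not a JDOM or XOM class.
class SimpleElement {
    final String name;
    final List<SimpleElement> children = new ArrayList<>();
    final StringBuilder text = new StringBuilder();

    SimpleElement(String name) {
        this.name = name;
    }
}

// Turns SAX callbacks into an in-memory tree of SimpleElement objects.
class TreeBuilder extends DefaultHandler {
    // Elements that have been opened but not yet closed, innermost on top.
    private final Deque<SimpleElement> open = new ArrayDeque<>();
    private SimpleElement root;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        SimpleElement element = new SimpleElement(qName);
        if (open.isEmpty()) {
            root = element;                      // first element seen is the root
        } else {
            open.peek().children.add(element);   // attach to the current parent
        }
        open.push(element);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) {
            open.peek().text.append(ch, start, length);
        }
    }

    // Parses the XML string with whatever SAX parser the platform provides.
    static SimpleElement build(String xml) throws Exception {
        TreeBuilder builder = new TreeBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), builder);
        return builder.root;
    }
}

class SaxTreeDemo {
    public static void main(String[] args) throws Exception {
        SimpleElement root = TreeBuilder.build("<order><item>tea</item><item>milk</item></order>");
        // prints "order has 2 children"
        System.out.println(root.name + " has " + root.children.size() + " children");
    }
}
```

The parser does the hard work of checking well-formedness; the handler only has to track which element is currently open.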
Bill Venners: In your talk you said JDOM taught you that "Thread safety is not necessary." Why not?
Elliotte Rusty Harold: Thread safety may be necessary in some applications that use JDOM, but in those applications, the synchronization probably belongs in the broader application that's calling JDOM. Many applications are not multi-threaded. And even those that are multi-threaded do not necessarily share JDOM objects between multiple threads. There's no reason to put the overhead of designing for multi-threaded applications into JDOM itself. If you need to share JDOM documents between multiple threads, then you can synchronize it at a higher level.
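One way to picture synchronizing "at a higher level" is sketched below. PlainDocument is a hypothetical stand-in for any non-thread-safe document object, and SharedCatalog is an assumed application class; neither is a JDOM type. The point is only that the lock lives in the application code, not in the document API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a non-thread-safe document object; not a JDOM class.
class PlainDocument {
    private final List<String> items = new ArrayList<>();

    void addItem(String item) {
        items.add(item);
    }

    int size() {
        return items.size();
    }
}

// The application, not the document API, decides how access is synchronized.
class SharedCatalog {
    private final PlainDocument document = new PlainDocument();

    synchronized void add(String item) {
        document.addItem(item);
    }

    synchronized int count() {
        return document.size();
    }
}

class SharedCatalogDemo {
    // Starts several threads that all add items through the synchronized wrapper.
    static int run(int threads, int perThread) throws InterruptedException {
        SharedCatalog catalog = new SharedCatalog();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    catalog.add("item");
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return catalog.count();
    }

    public static void main(String[] args) throws InterruptedException {
        // prints 4000, because every add holds the catalog's lock
        System.out.println(run(4, 1000));
    }
}
```

Single-threaded callers of PlainDocument pay nothing for locking they do not need.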
Bill Venners: By that overhead do you mean the performance hit at runtime, or the extra work programmers have to do to make JDOM multithreaded?
Elliotte Rusty Harold: Mostly the extra work that programmers have to do. I don't worry that much about performance. That's there too, but I think designing truly thread-safe code is incredibly difficult. It really requires an expert, and I'm not an expert in designing thread-safe code.
Bill Venners: You also said that "Live lists are trouble." What are live lists, and why are they trouble?
Elliotte Rusty Harold: Live lists have troubled both DOM and JDOM. They are probably a little more trouble in JDOM because the issues haven't been thought out in as much depth as in DOM. Let's say you have an Element object, a Document element, and you ask for the children of that element (all the children, not just one). You get back a list, and you iterate through the list. In JDOM you get back a java.util.List. In DOM you get back a NodeList. In both cases, imagine you delete an element from the list as you're iterating through the list. Or you add an element to the list, even within a single thread, modifying the list as you go. In JDOM and DOM, that changes the children of that element.
You have a real reference to the real children of the element—that's what a live list is. And if you've got a multithreaded application, if another thread changes the list, you see that reflected immediately too. Live lists are useful for some purposes, although not for the most common case of reading through the list and iterating all the children. It is useful perhaps when you're modifying the list. However, both JDOM and DOM have a huge amount of extra complexity internally to support the liveness of their lists. It makes the code much harder to fix. Both JDOM and some DOM implementations have had serious bugs related to liveness of lists, in cases where the liveness failed even though it wasn't supposed to. And it makes it hard to evolve the code.
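The effect of liveness can be seen with the DOM implementation bundled with the JDK. The sketch below, in which LiveListDemo and its method name are invented for illustration, tries to remove every child by index while walking a NodeList. Because the list shrinks underneath the loop, half of the children survive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveListDemo {
    // Tries to remove every child by index while walking the live NodeList,
    // then reports how many children are still attached.
    static int childrenLeftAfterNaiveRemoval(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Each removal shifts the remaining nodes down by one position,
            // so the node that slides into index i is never visited.
            root.removeChild(children.item(i));
        }
        return children.getLength();
    }

    public static void main(String[] args) throws Exception {
        // prints 2: <b/> and <d/> are skipped and remain in the document
        System.out.println(childrenLeftAfterNaiveRemoval("<root><a/><b/><c/><d/></root>"));
    }
}
```

A common workaround is to iterate from the last index down to zero, or to copy the nodes into a separate collection before modifying the parent.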
Bill Venners: So is there another kind of list, a dead list perhaps?
Elliotte Rusty Harold: Yes. Instead of returning a reference to the actual children, the API can return a copy of the list. The changes made to that list are not reflected in the original document from which the list came.
XOM lists, interestingly, are neither live nor dead. They are in-between, what I refer to as comatose.
In XOM, a change to the list does not change the document. However, a change to a member of the list does change that member back in the document. For example, in the list, you remove an element. It is not removed from the document. However, if in the list you change the name of an element by calling element.setName(), the name of the element changes in the document too. The list refers to the actual elements in the document, but the list itself is not the same list used in the document.
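The "comatose" behavior described here can be approximated in a few lines of plain Java. The Item and Parent classes below are hypothetical and do not reproduce XOM's actual classes or method signatures; they only model a getter that returns a fresh list holding the same member objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical element-like class; not XOM's Element.
class Item {
    private String name;

    Item(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    void setName(String name) {
        this.name = name;
    }
}

// Hypothetical parent that hands out "comatose" child lists.
class Parent {
    private final List<Item> children = new ArrayList<>();

    void append(Item child) {
        children.add(child);
    }

    int getChildCount() {
        return children.size();
    }

    Item getChild(int index) {
        return children.get(index);
    }

    // Returns a new list holding the same Item objects as the internal list.
    List<Item> getChildren() {
        return new ArrayList<>(children);
    }
}

class ComatoseListDemo {
    public static void main(String[] args) {
        Parent parent = new Parent();
        parent.append(new Item("a"));
        parent.append(new Item("b"));

        List<Item> list = parent.getChildren();
        list.remove(0);                   // changes only the returned list
        list.get(0).setName("renamed");   // changes the shared Item object

        System.out.println(parent.getChildCount());        // prints 2
        System.out.println(parent.getChild(1).getName());  // prints renamed
    }
}
```

Structural edits stay local to the copy, while edits to a member are visible through the parent because both lists point at the same objects.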
Bill Venners: You also said you learned from JDOM to "keep everything in the same package."
Elliotte Rusty Harold: In JDOM, we had some major troubles with serialization and parsing, because the parsing went into the org.jdom.input package, the serialization went into the org.jdom.output package, and the core classes were just in org.jdom. We really needed friend functions. The parser or builder needed to do things that we could not expose in the core package without exposing them to other non-parser related classes.
For example, we wanted the parser to be able to create an element without verifying it. Generally speaking, when you construct a JDOM element, the name and properties of the element are verified to make sure the element is well-formed. JDOM wants to make sure that client programmers can't bypass those checks. However, if the parser is building the document, the parser has already made those checks. So in that case, it is perfectly reasonable to bypass those checks. Because the parsers are in a different package than the classes themselves, however, the only access level available to us is public. So without friend functions, we really need to keep classes that are this closely related together in the same package, so we can use package protection for those special cases.
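A minimal sketch of the package-protection idea follows. CheckedElement, TrustedBuilder, and buildUnchecked are hypothetical names, and the name check is deliberately simplified rather than a faithful implementation of XML name rules. In a real library the element class would be public; it is left package-private here only so the sketch fits in a single file.

```java
// Sketch of a core class; imagine the builder below living in the same package.
class CheckedElement {
    private final String name;

    // Public path for client code: the name is always verified.
    public CheckedElement(String name) {
        this(name, true);
    }

    private CheckedElement(String name, boolean verify) {
        if (verify && !isLegalName(name)) {
            throw new IllegalArgumentException("Illegal element name: " + name);
        }
        this.name = name;
    }

    // Package-private: only classes in the same package may skip verification.
    static CheckedElement buildUnchecked(String name) {
        return new CheckedElement(name, false);
    }

    public String getName() {
        return name;
    }

    // Deliberately simplified check; real XML name rules are far broader.
    private static boolean isLegalName(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        if (!Character.isLetter(first) && first != '_') {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            char c = name.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '_' && c != '-' && c != '.') {
                return false;
            }
        }
        return true;
    }
}

// Stands in for a parser-driven builder that shares the package.
class TrustedBuilder {
    // The parser has already checked the name, so the check is not repeated.
    static CheckedElement elementFromParser(String alreadyVerifiedName) {
        return CheckedElement.buildUnchecked(alreadyVerifiedName);
    }
}

class PackageAccessDemo {
    public static void main(String[] args) {
        System.out.println(TrustedBuilder.elementFromParser("title").getName()); // prints title
        try {
            new CheckedElement("1bad");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected"); // prints rejected
        }
    }
}
```

If the builder lived in a different package, buildUnchecked would have to become public, and client code could then bypass verification as well.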
Bill Venners: You also mentioned in your talk that JDOM taught you not to "release too early."
Elliotte Rusty Harold: For JDOM, I think the first release was beta2, and this was some three years ago now. Brett actually wrote about JDOM in his book Java and XML. By the time his book had come out, JDOM had moved on past that initial release. JDOM has changed a lot in the last three years, even though it still hasn't gotten to 1.0. I think JDOM would have benefited from earlier work with a smaller, more select group. Don't publish to the world quite so soon. Let a few more things gel into place. Then when the select group is fairly happy with it, push it out the door for broader comment. I think a lot of developers got turned off of JDOM relatively early, because they jumped onto the bandwagon too early, and they had to keep revising their code from beta2 to beta3 to beta4 to beta5. They thought JDOM would be more finished than it really was.
Bill Venners: So the trouble with releasing too early is you can alienate some of your audience.
Elliotte Rusty Harold: Yes, you can cause problems for your audience by virtue of not having a finished API.
Bill Venners: I remember seeing a lot of deprecation in JDOM. Is that because JDOM was successful enough that there was sufficient client code out there they didn't want to break, even though JDOM was still beta?
Elliotte Rusty Harold: Yes. Now that deprecated code will all be removed before 1.0, but JDOM does have a policy that when a method is renamed or changed, the old method or class is retained for at least one release as a deprecated method. Unlike Java itself, JDOM will eventually remove all deprecated methods.
Bill Venners: You suggested in your talk, "Don't optimize until the API is right."
Elliotte Rusty Harold: I have noticed too many times in the work on JDOM that the arguments against something that is obviously correct are that the performance hit will be too big. But we don't have any decent measurements on exactly where the performance hits are in JDOM. I think correctness should come first. First you get a correct API, and then you go look for opportunities to optimize the code. As Donald Knuth says, "premature optimization is the root of all evil in programming."
Come back Monday, July 21 for the final installment of a conversation with Bruce Eckel about why he loves Python. I am now staggering the publication of several interviews at once, to give the reader variety. The next installment of this interview with Elliotte Rusty Harold will appear in the near future. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.
Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX, which is available on Amazon.com at:
XOM, Elliotte Rusty Harold's XML Object Model API:
Cafe au Lait: Elliotte Rusty Harold's site of Java News and Resources:
Cafe con Leche: Elliotte Rusty Harold's site of XML News and Resources:
SAX, the Simple API for XML Processing:
DOM, the W3C's Document Object Model API:
Common API for XML Pull Parsing:
Xerces Native Interface (XNI):
TrAX (Transformation API for XML):
Jaxen (a Java XPath engine):
Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces to Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.