Elliotte Rusty Harold is a prolific author of numerous books about Java and XML, and creator of the popular Java website Cafe au Lait and XML website Cafe con Leche. He contributed to the development of JDOM, a popular XML processing API for Java. His most recent book, Processing XML with Java, shows how to parse, manipulate, and generate XML from Java applications using several XML APIs, including SAX, DOM, and JDOM.
At a meeting of the New York XML SIG in September, 2002, Harold unveiled an XML processing API of his own design: the XOM (XML Object Model) API. On Cafe au Lait and Cafe con Leche, Harold described XOM like this:
Like DOM, JDOM, dom4j, and ElectricXML, XOM is a read/write API that represents XML documents as trees of nodes. Where XOM diverges from these models is that it strives for absolute correctness and maximum simplicity. XOM is based on more than two years' experience with JDOM development, as well as the last year's effort writing Processing XML with Java. While documenting the various APIs I found lots of things to like and not like about all the APIs, and XOM is my effort to synthesize the best features of the existing APIs while eliminating the worst.
In this interview, which is being published in multiple installments, Elliotte Rusty Harold discusses the strengths and weaknesses of the various XML processing APIs for Java, the design problems with existing APIs, and the design philosophy behind XOM.
Bill Venners: What development style did you use when you created XOM? In your talk you said, "This is a Cathedral, not a Bazaar."
Elliotte Rusty Harold: So far, XOM has been a one-person project. I have written almost all the code. Other users have spotted bugs and suggested features. Occasionally, a line or two of code has been contributed. But basically XOM has been done entirely by me. That may change in the future. There's currently a submission from Bradley Huffman on XPath, which he wrote. That might go in, but it isn't in the codebase yet.
XOM is my API. I designed it. It's not developed by majority vote. I have certainly occasionally been convinced that I've made mistakes. Sometimes it is really obvious. Sometimes it requires a little more argument. But ultimately, I am the one who decides what goes in and what does not. That process contrasts with the design process for APIs like JDOM, where occasionally things are done by vote. It certainly contrasts with the design process for APIs like DOM, where decisions are made by a very formal procedure of voting and consensus. I think that even if this process makes XOM occasionally a little quirky in places—because like all people I have my little quirks: for example, I don't like method invocation chaining—overall benevolent dictatorships create cleaner APIs. Having one clear vision for an API is better than compromising between many competing visions.
Bill Venners: What techniques did you use to help you discover where your API was good and where it wasn't?
Elliotte Rusty Harold: I am a fan of Extreme Programming, but since I am essentially working by myself at home, pair programming isn't an option. I do use unit tests heavily on almost all the classes. The only areas where I don't have serious unit tests, where I have some tests but not full coverage, are in serialization and in parsing. Because writing unit tests for serialization and parsing is just bloody hard, so far I haven't done it. I really need to, though, because guess where all the bugs show up? They show up in serialization and parsing, where I don't have good unit test coverage.
One of the things I did that helped a lot with the design of the API, as opposed to debugging and testing it, is I went through my book Processing XML with Java, which over the course of 1000 plus pages has many sample programs in SAX, DOM, JDOM, TrAX, and other APIs. I rewrote all those examples using XOM. Sometimes I would realize I was missing some obvious method that would make this easier. Also, after I got to the end of the book and I looked at my code samples, I realized I never did use certain methods. So I took them out. Or I only used a method once, but I could have used another method instead. That's why I got rid of the getPreviousSibling methods I originally had, for example. It just became obvious after translating all the examples to XOM that those methods weren't necessary.
I think implementing all those examples helped clarify a lot of the thoughts I had about what was worth putting in the API and what wasn't. In some cases it proved I needed new things. In other cases it proved I could afford to take certain things out. Based on what I've heard from early adopters of XOM, it resulted in a pretty clean API. I don't hear a lot of calls for convenience methods that aren't there. It is very rare compared to JDOM for example, where a lot of people ask frequently for extra methods.
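The point about getPreviousSibling can be illustrated with a minimal sketch. It assumes only that a parent node offers an index lookup and indexed child access; with those two operations a caller can reach a neighboring node in a couple of lines, so a dedicated sibling accessor adds little. The SketchNode class and the previousSibling helper are hypothetical stand-ins written against the standard library, not XOM's actual classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tree node; this is NOT XOM's real class.
class SketchNode {
    private final String name;
    private SketchNode parent;
    private final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name) { this.name = name; }

    String getName() { return name; }
    SketchNode getParent() { return parent; }
    int getChildCount() { return children.size(); }
    SketchNode getChild(int index) { return children.get(index); }

    // Position of the given child among this node's children, or -1 if absent.
    int indexOf(SketchNode child) { return children.indexOf(child); }

    void appendChild(SketchNode child) {
        child.parent = this;
        children.add(child);
    }
}

class SiblingDemo {
    // Client-side replacement for a getPreviousSibling convenience method:
    // returns the sibling just before the node, or null if there is none.
    static SketchNode previousSibling(SketchNode node) {
        SketchNode parent = node.getParent();
        if (parent == null) {
            return null; // a root node has no siblings
        }
        int i = parent.indexOf(node);
        return i > 0 ? parent.getChild(i - 1) : null;
    }

    public static void main(String[] args) {
        SketchNode root = new SketchNode("root");
        SketchNode a = new SketchNode("a");
        SketchNode b = new SketchNode("b");
        root.appendChild(a);
        root.appendChild(b);
        System.out.println(previousSibling(b).getName()); // prints "a"
        System.out.println(previousSibling(a));           // prints "null"
    }
}
```

The trade-off in this sketch is that the caller writes two extra lines in the rare case where a sibling is needed, in exchange for a smaller API surface.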
Bill Venners: They do? But JDOM already has a lot of convenience methods, and XOM has so few.
Elliotte Rusty Harold: But maybe I've got the right methods.
Bill Venners: Or perhaps it's cultural.
Elliotte Rusty Harold: Yes, maybe it's cultural. In XOM, they can see that there's not a lot of chance of getting additional methods added. For example, JDOM's Attribute class has eight different methods to read an attribute value depending on whether you want to get it back as an String, or a
Bill Venners: What you're saying is that one of the techniques that helped you design XOM is that you used your API over and over in lots of different cases.
Elliotte Rusty Harold: Yes.
Bill Venners: You looked at your book and rewrote all your XML processing examples in XOM, and you learned what was and was not needed in the XOM API.
Elliotte Rusty Harold: Right, exactly.
Bill Venners: Unit testing is a way to write clients that can give you some insight into your API, but it seems rewriting XML processing examples from other APIs to XOM was a more direct way to write clients.
Elliotte Rusty Harold: Unit tests are a way to make sure that methods do what you expect them to do. They don't really help you decide what methods you need in the first place.
Bill Venners: That's not what the test-driven development folks would say. They believe writing tests first in tiny iterations helps them discover a clean design. I know you like unit tests. Do you write tests first?
Elliotte Rusty Harold: My problem with writing tests at the very first stage is that the test suite doesn't show a compile error as a red bar. If your tests don't compile, you can't run them through JUnit. I think if the JUnit GUI were rewritten so it could load the test from source code rather than a compiled .class file, and then report a compilation failure as a problem, then writing tests first would become more plausible. I generally do write at least the signature and a simple method stub before I write the test. Sometimes, even when I've got the method doing nothing more than returning null, then I'll go write the test. But not before then.
So I still write the tests very early, but I don't like writing tests where the IDE immediately puts squiggly lines underneath the code to indicate that it doesn't compile. How annoying that is varies from one IDE to the next. It's not at all annoying in emacs. In Eclipse it's a little annoying. In IntelliJ IDEA, it's practically impossible to get anything done until your code at least compiles.
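The stub-then-test order described above might look like the following sketch. It uses plain Java instead of JUnit so that it stays self-contained, and the AttributeReader class, its readValue method, and the sample input are all hypothetical names invented for illustration. The stub does nothing more than return null, which is enough for the test to compile; the test then fails at run time until the method is implemented.

```java
// Hypothetical class under development; the names are illustrative only.
class AttributeReader {
    // Step 1: write the signature and a stub that only returns null,
    // so that any test written against it compiles.
    static String readValue(String rawAttribute) {
        return null;
    }
}

class StubFirstDemo {
    // Step 2: write the test against the stub. It compiles cleanly and
    // fails at run time (the "red bar") until readValue is implemented.
    static boolean testReadValue() {
        String value = AttributeReader.readValue("id=\"p1\"");
        return "p1".equals(value);
    }

    public static void main(String[] args) {
        // With the stub in place this prints "red: test failed".
        System.out.println(testReadValue() ? "green: test passed" : "red: test failed");
    }
}
```

The failure here is a run-time result that a test runner can report, unlike a compile error, which stops the suite from running at all.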
The final installment of this conversation with Elliotte Rusty Harold will appear September 8.
Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX, which is available on Amazon.com at:
XOM, Elliotte Rusty Harold's XML Object Model API:
Cafe au Lait: Elliotte Rusty Harold's site of Java News and Resources:
Cafe con Leche: Elliotte Rusty Harold's site of XML News and Resources:
SAX, the Simple API for XML Processing:
DOM, the W3C's Document Object Model API:
Common API for XML Pull Parsing:
Xerces Native Interface (XNI):
TrAX (Transformation API for XML):
Jaxen (a Java XPath engine):
Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces with Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.