The Artima Developer Community

Air Bags and Other Design Principles
A Conversation with Elliotte Rusty Harold, Part VI
by Bill Venners
August 11, 2003


Ensuring Well-Formed XML

Bill Venners: What XML principles did you adhere to in your XOM API design? You listed a few in your talk, for example, "All objects can be written as well-formed XML text." All objects?

Elliotte Rusty Harold: All Node objects (all objects that represent part of an XML document, so the Element object, the Attribute object, the Text object, etc.) can be written as well-formed XML.

Bill Venners: Is that not true of JDOM?

Elliotte Rusty Harold: No, it's not true of JDOM. It's not true of DOM. It's not true of most other XML processing APIs.

Bill Venners: How is it not true of JDOM?

Elliotte Rusty Harold: As I said earlier, you can use Strings in JDOM that contain control characters that cannot be serialized. JDOM doesn't make the checks it needs to guarantee well-formedness in several areas.
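
As an illustration of the kind of check being described, here is a minimal sketch in Java. It is not the source code of XOM or JDOM; the class names XmlChars and SafeText are hypothetical. It assumes the Char production of XML 1.0 as the rule for legal characters and rejects anything else at construction time, so whatever is stored can later be written out as well-formed text.

```java
// Hypothetical sketch; these classes are not part of XOM or JDOM.
final class XmlChars {
    private XmlChars() {}

    /** Returns true if the code point matches the Char production of XML 1.0. */
    static boolean isLegal(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
            || (codePoint >= 0x20 && codePoint <= 0xD7FF)
            || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
            || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    /** Throws if any character in data could not appear in an XML 1.0 document. */
    static void check(String data) {
        if (data == null) {
            throw new IllegalArgumentException("null text");
        }
        int i = 0;
        while (i < data.length()) {
            // Unpaired surrogates come back as their own value and are rejected.
            int codePoint = data.codePointAt(i);
            if (!isLegal(codePoint)) {
                throw new IllegalArgumentException("Illegal XML character 0x"
                    + Integer.toHexString(codePoint) + " at index " + i);
            }
            i += Character.charCount(codePoint);
        }
    }
}

/** A text node that verifies its content on the way in. */
final class SafeText {
    private final String value;

    SafeText(String value) {
        XmlChars.check(value);
        this.value = value;
    }

    /** Escapes markup characters; legality was already checked in the constructor. */
    String toXML() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this sketch, new SafeText("a < b & c") succeeds, while a string containing a vertical tab (0x0B) throws IllegalArgumentException before any object is created.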

Bill Venners: I see. So because the API doesn't ensure well-formedness when data is passed to the Node objects, well-formed XML is not guaranteed to come out when those objects are serialized. In your talk you also said, "Validity can be enforced by subclasses." What did you mean by that?

Elliotte Rusty Harold: Going back to the principle, "Design for subclassing or prohibit it," in most of the XOM classes, I designed for subclassing. Let's say you're developing an XHTML package, a set of subclasses of the standard XOM classes. You have classes for each of the specific element types in XHTML, such as PElement, TableElement, and BodyElement, all of which extend Element. Each of these classes could add additional constraints to those usually enforced by XOM. So the BodyElement subclass could require that the element name be "body" and the namespace URI be the XHTML namespace.

Or, in an RSS package with a LinkElement class that extends XOM's Element class, you could verify that the text of every LinkElement is actually a URI. None of these additional checks are required by XML, but specific XML applications might have such further requirements. You can enforce those in subclasses. On the other hand, you can't remove checks. A subclass is not allowed to decide, for example, that it is going to allow white space in Element names.

Designing the XOM classes for subclassing took a lot of effort. It would have been much easier in a language like Eiffel, which has real assertions that are inherited by subclasses.
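
The idea that subclasses can add checks but never remove them can be sketched as follows. This is an assumption-laden illustration, not XOM's actual design: BaseElement is a hypothetical stand-in for an element class, and the BodyElement and LinkElement shown here are simplified inventions that only borrow their names from the examples above. The base checks live in a constructor and a final method, and subclasses get one protected hook in which they may reject more input.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for an element class; not XOM's Element.
class BaseElement {
    private final String name;
    private final String namespaceUri;
    private String text = "";

    BaseElement(String name, String namespaceUri) {
        // Base check that no subclass can switch off: no white space in names.
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Empty element name");
        }
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                throw new IllegalArgumentException("White space in element name");
            }
        }
        this.name = name;
        this.namespaceUri = namespaceUri;
    }

    final String getName() { return name; }
    final String getNamespaceUri() { return namespaceUri; }
    final String getText() { return text; }

    // final: the base check always runs, then the subclass hook.
    final void setText(String newText) {
        if (newText == null) {
            throw new IllegalArgumentException("null text");
        }
        checkText(newText);
        this.text = newText;
    }

    /** Hook for subclasses to add constraints; the default accepts everything. */
    protected void checkText(String newText) {}
}

/** Fixes the name and namespace, so every instance is an XHTML body element. */
final class BodyElement extends BaseElement {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    BodyElement() {
        super("body", XHTML_NS);
    }
}

/** Adds a constraint: the text must parse as a URI. */
final class LinkElement extends BaseElement {
    LinkElement() {
        super("link", "");
    }

    @Override
    protected void checkText(String newText) {
        try {
            new URI(newText);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Link text is not a URI: " + newText, e);
        }
    }
}
```

Because setText is final and the hook is only called after the base check, a subclass can tighten what is accepted but has no way to skip the checks made by the base class. The hook is deliberately never called from the constructor, since calling an overridable method during construction is a common hazard in classes designed for subclassing.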

Serialization with XML

Bill Venners: You said in your talk, "Classes do not implement Serializable, use XML." That would probably be the main question I would have in a XOM design review. By simply marking the classes Serializable, you give clients a choice to serialize via XML or Java object serialization without adding much clutter to the public API. Why did you choose not to?

Elliotte Rusty Harold: XML is a good serialization format. It's often smaller, more compact, and faster than Java's binary object serialization. If you have a Document or an Element object, and you want to blast it across the network to somebody else, XML is much more portable. It's much more efficient to send it as XML, as text on the wire, than it is to serialize this object into Java binary serialization format. The only case where you might perhaps want to use object serialization is if you're doing remote method invocation. But my response to that is, well this is XML. We probably ought to be doing SOAP or XML-RPC, or some REST-ful thing instead.

Using Custom Lists

Bill Venners: Another comment you made in your talk was, "Lack of generics really hurts the Collections API, hence don't use it." Explain your reasoning.

Elliotte Rusty Harold: Essentially, there's no way currently in Java to say, "This is not just a generic list. This is a list of Nodes, Elements, or Attributes." Anytime you put an object into a java.util.List, you lose some type information. That results in a lot of casting, a lot of instanceof checks, and it's just plain ugly. There's probably a little performance cost, but I don't care about that. I do care that it's ugly.

It's not that hard to implement your own lists, something we all learned about in Data Structures 201. It was still too hard for me to do, though. So internally in XOM, I used a java.util.List. I used the facade design pattern to provide type safe list operations in the public API. All the casting and instanceof checks and everything else that's necessary with java.util.Lists are done in the private parts of the classes.

Interestingly, this is the exact reverse of how JDOM does it. In its private parts, JDOM uses its own FilterList class that was written by the JDOM developers. FilterList is a very sophisticated list with a lot of power. It knows a lot of details about the specific JDOM objects. In the API JDOM exposes to the world, however, it might as well be any other java.util.List that contains objects. None of that power, knowledge, or sophistication is seen. Behind the scenes in XOM, I'm just using the standard java.util.List, but out front it looks a lot nicer.

Bill Venners: I think the tradeoff there is that one of the advantages of using the Java Collections API in your public interface is that everyone already knows it.

Elliotte Rusty Harold: Right, that's certainly an advantage, but I don't think the XOM lists are so challenging that anybody is going to have excessive trouble learning them. The NodeList interface, for example, has two methods, one to return the size and another to get the item at a particular index. It's a read-only list, not a read-write list. If you want to write Elements into the List, you use the Element's own insertChild or appendChild methods. You can't change the lists that are exposed to you.
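
The facade approach described in this section might look roughly like the sketch below. All three class names (SketchNode, SketchNodeList, ParentSketch) are hypothetical, and the code is not taken from XOM. A List<Object> stands in for the untyped java.util.List of pre-generics Java, so that the cast stays visible. Whether the exposed list is a snapshot or a live view is a design choice; this sketch assumes a snapshot.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type, used only to give the lists something to hold.
final class SketchNode {
    private final String value;

    SketchNode(String value) { this.value = value; }

    String getValue() { return value; }
}

/** Read-only, type-safe facade with just two operations: size and get. */
final class SketchNodeList {
    private final List<Object> items;

    SketchNodeList(List<Object> source) {
        // Copy, so callers hold a snapshot they cannot use to change the parent.
        this.items = new ArrayList<>(source);
    }

    int size() {
        return items.size();
    }

    SketchNode get(int index) {
        // The one cast lives here, hidden from client code.
        return (SketchNode) items.get(index);
    }
}

/** Owns the untyped list; all writes go through its own typed methods. */
final class ParentSketch {
    private final List<Object> children = new ArrayList<>();

    void appendChild(SketchNode child) {
        if (child == null) {
            throw new IllegalArgumentException("null child");
        }
        children.add(child);
    }

    SketchNodeList getChildren() {
        return new SketchNodeList(children);
    }
}
```

Client code calls parent.getChildren().get(0) and receives a SketchNode with no cast and no instanceof check. The only way to add a child is through appendChild, which accepts nothing but a SketchNode, so the cast inside get cannot fail.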
