Bill Venners: What XML principles did you adhere to in your XOM API design? You listed a few in your talk, for example, "All objects can be written as well-formed XML text." All objects?
Elliotte Rusty Harold: All Node objects—
all objects that represent part of an XML document, so the
Element object, the Attribute object, the
Text object, etc.—can be written as well-formed XML.
Bill Venners: Is that not true of JDOM?
Elliotte Rusty Harold: No, it's not true of JDOM. It's not true of DOM. It's not true of most other XML processing APIs.
Bill Venners: How is it not true of JDOM?
Elliotte Rusty Harold: As I said earlier, you can use
Strings in JDOM that contain control characters that cannot be
serialized. JDOM doesn't make the checks it
needs to guarantee well-formedness in several areas.
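As an illustration of the kind of check being described, here is a minimal Java sketch of a text holder that rejects characters that cannot appear in well-formed XML 1.0. The class names XmlChars and CheckedText are hypothetical and are not part of XOM or JDOM; only the character ranges come from the XML 1.0 Char production.

```java
// Hypothetical helper, not XOM's or JDOM's actual code.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Throws if any code point in the string is not a legal XML 1.0 character. */
    static void checkText(String data) {
        for (int i = 0; i < data.length(); ) {
            int cp = data.codePointAt(i);
            // An unpaired surrogate comes back as 0xD800-0xDFFF and is rejected here.
            if (!isXmlChar(cp)) {
                throw new IllegalArgumentException(
                    "Illegal XML character 0x" + Integer.toHexString(cp) + " at index " + i);
            }
            i += Character.charCount(cp);
        }
    }
}

// Hypothetical text node: the check runs before the data is stored,
// so any instance that exists holds only serializable characters.
final class CheckedText {
    private final String data;

    CheckedText(String data) {
        XmlChars.checkText(data);
        this.data = data;
    }

    String getValue() {
        return data;
    }
}
```

Because the check runs in the constructor, a string containing a control character such as U+0001 is refused when the object is built rather than surfacing later as malformed output.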
Bill Venners: I see. So because the API doesn't ensure well-formedness when
data is passed to the Node objects, well-formed XML is not guaranteed to come out
when those objects are serialized. In your talk you also said, "Validity can be
enforced by subclasses." What did you mean by that?
Elliotte Rusty Harold: Going back to the principle, "Design for
subclassing or prohibit it," in most of the XOM classes, I designed for
subclassing. Let's say you're developing an XHTML package, a set of
subclasses of the standard XOM classes. You have classes for each of the
specific element types in XHTML, such as PElement,
TableElement, and BodyElement, all of which
extend Element. Each of these classes could add additional
constraints to those usually enforced by XOM. So the
BodyElement subclass could require that the element name be
"body" and the namespace URI be the XHTML namespace.
Or, in an RSS package with a LinkElement
class that extends XOM's Element class, you could verify that
the text of every LinkElement is actually a URI. None of these
additional checks are required by XML, but specific XML applications might
have such further requirements. You can enforce those in subclasses. On the
other hand, you can't remove checks. A subclass is not allowed to decide, for
example, that it is going to allow white space in Element names.
Designing the XOM classes for subclassing took a lot of effort. It would have been much easier in a language like Eiffel, which has real assertions that are inherited by subclasses.
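The idea that a subclass can add checks but not remove them can be sketched in Java roughly as follows. This is not XOM's actual code: SketchElement, its checkName hook, and the simplified name check are assumptions made for illustration. Only the element name "body" and the XHTML namespace URI come from the discussion above.

```java
// Hypothetical base class standing in for a generic element.
class SketchElement {
    private String localName;
    private String namespaceUri;

    SketchElement(String localName, String namespaceUri) {
        setName(localName, namespaceUri);
    }

    // final: a subclass cannot override this method to skip the base check.
    final void setName(String localName, String namespaceUri) {
        checkBaseName(localName);           // always enforced
        checkName(localName, namespaceUri); // extra constraints from subclasses
        this.localName = localName;
        this.namespaceUri = namespaceUri;
    }

    // Simplified stand-in for the XML name rules: non-empty, no white space.
    private static void checkBaseName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("Element name must not be empty");
        }
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                throw new IllegalArgumentException("White space in element name: " + name);
            }
        }
    }

    // Hook that subclasses override to add constraints. The default adds none.
    // It is called from the constructor, so overrides must not rely on
    // instance fields of the subclass.
    protected void checkName(String localName, String namespaceUri) {
    }

    final String getLocalName() {
        return localName;
    }

    final String getNamespaceUri() {
        return namespaceUri;
    }
}

// Hypothetical XHTML-specific subclass that narrows what is allowed.
class BodyElement extends SketchElement {
    static final String XHTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

    BodyElement() {
        super("body", XHTML_NAMESPACE);
    }

    @Override
    protected void checkName(String localName, String namespaceUri) {
        if (!"body".equals(localName) || !XHTML_NAMESPACE.equals(namespaceUri)) {
            throw new IllegalArgumentException(
                "BodyElement must be named body in the XHTML namespace");
        }
    }
}
```

In this sketch the base check lives in a private method called from a final setter, so the only thing a subclass can do through the hook is reject more names, never fewer.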
Bill Venners: You said in your talk, "Classes do not implement Serializable, use XML."
That would probably be the main question I would have in
a XOM design review. By simply marking the classes Serializable, you give
clients a choice to serialize via XML or Java object serialization without adding
much clutter to the public API. Why did you choose not to?
Elliotte Rusty Harold: XML is a good serialization format. It's often
smaller, more compact, and faster than Java's binary object serialization. If you
have a Document or an Element object, and you
want to blast it across the network to somebody else, XML is much more
portable. It's much more efficient to send it as XML, as text on the wire, than it is
to serialize this object into Java binary serialization format. The only case where
you might perhaps want to use object serialization is if you're doing remote
method invocation. But my response to that is, well, this is XML. We probably
ought to be doing SOAP or XML-RPC, or some REST-ful thing instead.
Bill Venners: Another comment you made in your talk was, "Lack of generics really hurts the Collections API, hence don't use it." Explain your reasoning.
Elliotte Rusty Harold: Essentially, there's no way currently in Java to
say, "This is not just a generic list. This is a list of Nodes,
Elements, or Attributes." Anytime you put an
object into a java.util.List, you lose some type
information. That results in a lot of casting, a lot of instanceof
checks, and it's just plain ugly. There's probably a little performance cost, but I
don't care about that. I do care that it's ugly.
It's not that hard to implement your own lists, something we all learned about in
Data Structures 201. It was still too hard for me to do, though. So internally in
XOM, I used a java.util.List. I used the facade design pattern to
provide type safe list operations in the public API. All the casting and
instanceof checks and everything else that's necessary with
java.util.Lists are done in the private parts of the classes.
Interestingly, this is the exact reverse of how JDOM does it. In its private parts,
JDOM uses its own FilterList class that was written by the
JDOM developers. FilterList is a very sophisticated list with a lot
of power. It knows a lot of details about the specific JDOM objects. In the API
JDOM exposes to the world, however, it might as well be any other
java.util.List that contains objects. None of that power,
knowledge, or sophistication is seen. Behind the scenes in XOM, I'm just using
the standard java.util.List, but out front it looks a lot nicer.
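The facade approach described here might look something like the following Java sketch. The names SketchNode, SketchNodeList, and SketchParent are hypothetical and do not reproduce XOM's classes. The backing list is declared as a list of Object to mimic the untyped java.util.List of the time, and returning a snapshot copy from getChildren is an assumption of this sketch, not a statement about how XOM behaves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical node type standing in for an XML node.
class SketchNode {
    private final String value;

    SketchNode(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }
}

// Read-only, type-safe view: callers see only size() and get(int).
final class SketchNodeList {
    private final List<Object> items;

    SketchNodeList(List<Object> source) {
        // Snapshot copy, so callers cannot observe or cause later changes.
        this.items = new ArrayList<Object>(source);
    }

    int size() {
        return items.size();
    }

    SketchNode get(int index) {
        // The cast lives here, hidden from client code.
        return (SketchNode) items.get(index);
    }
}

// Hypothetical parent: all writes go through its own typed methods.
class SketchParent {
    private final List<Object> children = new ArrayList<Object>();

    void appendChild(SketchNode child) {
        if (child == null) {
            throw new NullPointerException("child");
        }
        children.add(child);
    }

    void insertChild(SketchNode child, int position) {
        if (child == null) {
            throw new NullPointerException("child");
        }
        children.add(position, child);
    }

    SketchNodeList getChildren() {
        return new SketchNodeList(children);
    }
}
```

Client code never casts and never sees the backing list: it reads through SketchNodeList and writes through the parent's appendChild and insertChild methods, which accept only SketchNode arguments.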
Bill Venners: I think the tradeoff there is that one of the advantages of using the Java Collections API in your public interface is that everyone already knows how it works.
Elliotte Rusty Harold: Right, that's certainly an advantage, but I don't
think the XOM lists are so challenging that anybody is going to have excessive
trouble learning them. The NodeList interface, for example, has two
methods, one to return the size and another to get the item at a
particular index. It's a read-only list, not a read-write list. If you want to
write Elements into the List, you
use the Element's own insertChild or
appendChild methods. You can't change the lists that are
exposed to you.