Bill Venners: You also complained that JDOM XML documents are not always well-formed. Could you differentiate between well-formed and valid documents, and explain your concerns about JDOM?
Elliotte Rusty Harold: XML documents must be well-formed. There are,
depending on how you count, anywhere from a hundred to several thousand
different rules. These "well-formedness" rules are the minimum requirements
for an XML document. The rules cover things like what characters are allowed
in element names: The letter
'a' is OK. The letter omega is OK.
The asterisk character is not OK. White space is not OK. The rules say that
every start-tag has to have a matching end-tag. Elements can nest, but they
cannot overlap. Processing instructions have the form
<?, a target, white space, the data, and
?>. Comments cannot contain a double hyphen. There are
many such rules governing well-formedness of XML documents.
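A few of the rules mentioned above can be tried against a standard parser. The following is a minimal sketch, assuming the JAXP parser that ships with the JDK; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration. A conforming parser must report any well-formedness violation as a fatal error, which surfaces here as a SAXException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of JDOM or DOM.
class WellFormedCheck {

    /** Returns true if the parser accepts the string as a well-formed document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A well-formedness violation was reported as a fatal error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a><b/></a>",             // properly nested
            "<a><b></a></b>",          // overlapping elements
            "<a*/>",                   // asterisk in an element name
            "<a>",                     // start-tag without an end-tag
            "<a><!-- x -- y --></a>"   // double hyphen inside a comment
        };
        for (String sample : samples) {
            System.out.println(sample + " : " + isWellFormed(sample));
        }
    }
}
```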
Validity talks about which elements and attributes are allowed where. Well-formedness only talks about the structure of any XML document, irrespective of what the names are. Validity says, we're only going to allow these elements with these names in these positions. Validity is not required. Well-formedness is.
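The difference between the two levels can be sketched with a DTD-validating parse. This example assumes the JAXP parser bundled with the JDK; the class ValidityCheck, the method isValid, and the greeting and shout element names are hypothetical. The DTD is written inline in the DOCTYPE declaration so the example is self-contained. Validity problems arrive through the error callback, while well-formedness problems are fatal.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
class ValidityCheck {

    /** Returns true only if the document is well-formed and valid against its DTD. */
    static boolean isValid(String xml) {
        final boolean[] valid = {true};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    // Validity errors are recoverable, so the parse continues.
                    valid[0] = false;
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return valid[0];
        } catch (SAXException e) {
            return false; // not even well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String dtd = "<!DOCTYPE greeting [<!ELEMENT greeting (#PCDATA)>]>";
        // Allowed by the DTD.
        System.out.println(isValid(dtd + "<greeting>Hello</greeting>"));
        // Well-formed, but the DTD does not allow a shout element here.
        System.out.println(isValid(dtd + "<greeting><shout>Hello</shout></greeting>"));
    }
}
```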
JDOM, and for that matter DOM, allows you to create malformed documents.
They do not check everything they can possibly check. For instance, they do not
currently check that the text content of a text node does not contain the null
character, which is completely illegal in an XML document. So are
vertical tabs, form feeds, and other control characters. So one way you can
create a malformed document using either JDOM or DOM is to pass the
Text constructor a string that contains some of these control characters. In
my opinion, an XML API shouldn't allow that. It shouldn't rely on the programmer
who is using the API to know which characters are and are not legal. If a
programmer tries to do something illegal that would result in a malformed
document, it should stop them by throwing an exception.
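The kind of check being asked for might look like the following. This is a minimal sketch, not the code of JDOM, DOM, or any other library; the class XmlTextCheck and its methods are hypothetical. The ranges are those of the Char production in the XML 1.0 specification.

```java
// Hypothetical helper showing how an API could reject illegal text up front.
class XmlTextCheck {

    /** True if the code point matches the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text unchanged, or throws if it contains an illegal character. */
    static String checkText(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isLegalXmlChar(cp)) {
                throw new IllegalArgumentException(
                    String.format("Illegal XML character U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(checkText("tabs\tand\nnewlines are fine"));
        try {
            checkText("form feed\fis not");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A text node constructor that routed its argument through a check like this would fail immediately at the point of the mistake, instead of producing a document that a parser later rejects.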
Bill Venners: You also mentioned the internal DTD subset in this portion of your talk.
Elliotte Rusty Harold: An XML document's DocType declaration points to its Document Type Definition (DTD). If the DTD is actually contained inside the instance document, between square brackets, then that part of the DTD is called the internal DTD subset. In some cases the internal DTD can also point to an external part, which is why we distinguish internal from external. We merge the two DTD subsets to get the complete DTD. Sometimes the whole DTD is there in the internal DTD subset. Sometimes it's in the external part.
In JDOM, the internal DTD subset is not checked. You could put absolutely any string in there whatsoever, including strings that are totally illegal in an internal DTD subset. For example, you could just put the text of the Declaration of Independence as your internal DTD subset in JDOM, even though that would not be well-formed. It's just another thing that JDOM decided they would not check for well-formedness, because checking the internal DTD subset would be too onerous.
DOM solves that problem in a different way, incidentally. DOM makes the DocType declaration read-only, so it can't be changed at all. Therefore, it can't be changed to something that is malformed.
Bill Venners: How about, setter methods don't return void?
Elliotte Rusty Harold: I learned in JavaBeans that one of the ways you
recognize a setter method is that it returns
void, as in
public void setColor(). You know that method sets the color
property, because it follows a naming convention. The name begins with
set. The first letter in
Color is capitalized,
and so forth. JDOM follows a different pattern, called method invocation
chaining, where for example the
setName method on the
Element class returns that
Element object. To
me, that just makes no sense. There's no reason for setter methods to return anything.
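The two conventions being contrasted can be sketched side by side. The classes below are hypothetical stand-ins written for illustration; they are not JDOM's Element class, and the setText method is an assumption added only to show a chain of two calls.

```java
// Hypothetical classes contrasting the two setter conventions.
class SetterStyles {

    /** JavaBeans convention: the setter returns void. */
    static class BeanElement {
        private String name;

        public void setName(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }
    }

    /** Method invocation chaining: the setter returns the object it was called on. */
    static class ChainingElement {
        private String name;
        private String text;

        public ChainingElement setName(String name) {
            this.name = name;
            return this;
        }

        public ChainingElement setText(String text) {
            this.text = text;
            return this;
        }

        public String getName() {
            return name;
        }

        public String getText() {
            return text;
        }
    }

    public static void main(String[] args) {
        // JavaBeans style: one statement per property.
        BeanElement bean = new BeanElement();
        bean.setName("title");

        // Chaining style: each call hands back the same object.
        ChainingElement chained = new ChainingElement().setName("title").setText("Hello");

        System.out.println(bean.getName());
        System.out.println(chained.getName() + " " + chained.getText());
    }
}
```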
Bill Venners: The set methods return this?
Elliotte Rusty Harold: You might have an element object
e in class
X, and you call
e.setName(), which returns
e. From inside the
method, yes, it's returning
this. From outside the method, it's
returning whatever object you invoked it on. That pattern is used, for example, in
the new IO library in Java, where I also don't like it. But the designers of JDOM
do like it. To me, it does not seem semantically correct. It does not seem to
indicate what the method is doing, as opposed to how the method is being