The Artima Developer Community
Sponsored Link

A Design Review of JDOM
A Conversation with Elliotte Rusty Harold, Part III
by Bill Venners
June 30, 2003

<<  Page 2 of 3  >>

Advertisement

JDOM Allows Malformed Documents

Bill Venners: You also complained that JDOM XML documents are not always well-formed. Could you differentiate between well-formed and valid documents, and explain your concerns about JDOM?

Elliotte Rusty Harold: XM L documents must be well-formed. There are, depending on how you count, anywhere from a hundred to several thousand different rules. These "well-formedness" rules are the minimum requirements for an XML document. The rules cover things like what characters are allowed in element names: The letter 'a' is OK. The letter omega is OK. The asterisk character is not OK. White space is not OK. The rules say that every start-tag has to have a matching end-tag. Elements can nest, but they cannot overlap. Processing instructions have the form <, ?, a target, white space, the data, ?, and a >. Comments cannot contain a double hyphen. There are many such rules governing well-formedness of XML documents.

Validity talks about which elements and attributes are allowed where. Well-formedness only talks about the structure of any XML document, irrespective of what the names are. Validity says, we're only going to allow these elements with these names in these positions. Validity is not required. Well-formedness is.

JDOM, and for that matter DOM, allows you to create malformed documents. They do not check everything they can possibly check. For instance, they do not currently check that the text content of a text node does not contain the null character, which is completely illegal in an XML document. Similarly so are vertical tabs, form feeds, and other control characters. So one way you can create a malformed document using either JDOM or DOM, is to pass in a string to the Text constructor that contains some of these control characters. In my opinion, an XML API shouldn't allow that. It shouldn't rely on the programmer who is using the API to know which characters are and are not legal. If a programmer tries to do something illegal that would result in a malformed document, it should stop them by throwing an exception.

Bill Venners: You also mentioned the internal DTD subset in this portion of your talk.

Elliotte Rusty Harold: An XML document's DocType declaration points to its Document Type Definition (DTD). If the DTD is actually contained inside the instance document, between square brackets, then that part of the DTD is called the internal DTD subset. In some cases the internal DTD can also point to an external part, which is why we distinguish internal from external. We merge the two DTD subsets to get the complete DTD. Sometimes the whole DTD is there in the internal DTD subset. Sometimes it's in the external part.

In JDOM, the internal DTD subset is not checked. You could put absolutely any string in there whatsoever, including strings that are totally illegal in an internal DTD subset. For example, you could just put the text of the Declaration of Independence as your internal DTD subset in JDOM, even though that would not be well-formed. It's just another thing that JDOM decided they would not check for well-formedness, because checking the internal DTD subset would be too onerous.

DOM solves that problem in a different way, incidentally. DOM makes the DocType declaration read-only, so it can't be changed at all. Therefore, it can't be changed to something that is malformed.

JDOM Ignores Setter Method Conventions

Bill Venners: How about, setter methods don't return void.

Elliotte Rusty Harold: I learned in JavaBeans that one of the ways you recognize a setter method is that it returns void, as in public void setColor(). You know that method sets the color property, because it follows a naming convention. The name begins with the word set. The first letter in Color is capitalized, and so forth. JDOM follows a different pattern, called method invocation chaining, where for example the setName method on the Element class returns that Element object. To me, that just makes no sense. There's no reason for setter methods to return anything.

Bill Venners: The set methods return this?

Elliotte Rusty Harold:You might have an element object e in class X , and you call e.setName(), which returns e. From inside the method, yes, it's returning this. From outside the method, it's returning whatever object you invoked it on. That pattern is used, for example, in the new IO library in Java, where I also don't like it. But the designers of JDOM do like it. To me, it does not seem semantically correct. It does not seem to indicate what the method is doing, as opposed to how the method is being used.

<<  Page 2 of 3  >>


Sponsored Links



Google
  Web Artima.com   
Copyright © 1996-2014 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use - Advertise with Us