The Artima Developer Community

Organic Schemas and Outlier Data
A Conversation with Elliotte Rusty Harold, Part IX
by Bill Venners
October 6, 2003



Organic Schema Design

Bill Venners: Do you have any general guidelines for designing an XML schema, designing the data structure?

Elliotte Rusty Harold: The main thing I would say is: grow your documents organically. Try to model the actual content for which you're writing a schema, and see what sort of XML structures come out. Don't start by writing schemas. Start by writing example instance documents, and see what you get.

For example, if you're modeling invoices, pull out a few invoices. Ask yourself, "If I wrote this invoice in XML, what would it look like? That invoice, what would it look like?" If you have a large and representative enough collection of previous documents, in whatever format, paper or electronic, you can get a good start. Then you will gradually discover other documents coming into your system that don't really fit your designs. They have a couple of extra fields. One document has two shipping addresses instead of one, so you figure out how to handle that in your schema. Another document has an address that's in the U.K. instead of in the United States, and that has a very different format. So you adjust the schema.
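The instance-first approach lends itself to a small experiment. The sketch below is an illustration, not Harold's own method: it assumes a handful of hypothetical invoice documents (the element names `invoice`, `shippingAddress`, `zip`, `postcode` and so on are invented for this example) and uses Python's standard `xml.etree.ElementTree` to tally how often each child element occurs under each parent. The resulting (min, max) ranges are a crude form of schema inference: the second shipping address and the U.K. address fields emerge from the data rather than being decided in advance.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample invoices; every element name here is invented for illustration.
SAMPLES = [
    """<invoice>
  <customer>Acme Corp</customer>
  <shippingAddress><street>1 Main St</street><city>Springfield</city><zip>12345</zip></shippingAddress>
  <item sku="A1" qty="2"/>
</invoice>""",
    # An outlier: two shipping addresses and two items.
    """<invoice>
  <customer>Globex</customer>
  <shippingAddress><street>9 Elm St</street><city>Shelbyville</city><zip>54321</zip></shippingAddress>
  <shippingAddress><street>3 Oak Ave</street><city>Capital City</city><zip>67890</zip></shippingAddress>
  <item sku="B2" qty="1"/>
  <item sku="C3" qty="5"/>
</invoice>""",
    # Another outlier: a U.K. address with town and postcode instead of city and zip.
    """<invoice>
  <customer>Initech UK</customer>
  <shippingAddress><street>10 High Street</street><town>Reading</town><postcode>RG1 1AA</postcode></shippingAddress>
  <item sku="A1" qty="1"/>
</invoice>""",
]


def infer_occurrences(documents):
    """Return {parent_tag: {child_tag: (min, max)}} as observed across all instances.

    Each occurrence of a parent element contributes one set of child counts;
    a child missing from some occurrence gets a minimum of 0 (optional).
    """
    per_parent = defaultdict(list)  # parent tag -> list of {child tag: count}
    for text in documents:
        for parent in ET.fromstring(text).iter():  # root and all descendants
            counts = defaultdict(int)
            for child in parent:
                counts[child.tag] += 1
            per_parent[parent.tag].append(counts)

    model = {}
    for parent_tag, occurrences in per_parent.items():
        child_tags = set().union(*occurrences)  # every child tag ever seen here
        model[parent_tag] = {
            tag: (min(c.get(tag, 0) for c in occurrences),
                  max(c.get(tag, 0) for c in occurrences))
            for tag in sorted(child_tags)
        }
    return model


if __name__ == "__main__":
    for parent, children in infer_occurrences(SAMPLES).items():
        if children:  # skip leaf elements such as <street>
            print(parent, children)
```

A range of (0, 1) corresponds roughly to an optional element, and a maximum above 1 to a repeatable one, which is the kind of information you would eventually express with `minOccurs`/`maxOccurs` in W3C XML Schema or `optional`/`oneOrMore` in RELAX NG. The tally ignores attributes, element order, and text content, so it is a starting point for a hand-written schema, not a replacement for one.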

If you grow your schemas organically, you gradually figure out how the documents are likely to be structured. You don't write down in stone up front that the documents must be structured like this, that all these elements must be present, that these attributes must not be present if something else is present, and so on. You let the actual information drive the design, rather than letting the design constrain what documents you're willing to accept.
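Growing the design as new documents arrive can be sketched the same way. In this hypothetical `review_and_grow` function, an incoming document is never rejected: it is compared with the occurrence ranges seen so far, anything unfamiliar is reported for a person to review, and the ranges are widened so the document fits. The starting `MODEL` and the U.K. invoice are invented for illustration, and the model tracks only which child elements appear and how often.

```python
import xml.etree.ElementTree as ET

# Hypothetical model after a few U.S. invoices: {parent_tag: {child_tag: (min, max)}}.
MODEL = {
    "invoice": {"customer": (1, 1), "shippingAddress": (1, 1), "item": (1, 2)},
    "shippingAddress": {"street": (1, 1), "city": (1, 1), "zip": (1, 1)},
}

# Hypothetical incoming document: a U.K. customer with two shipping addresses.
UK_INVOICE = """<invoice>
  <customer>Initech UK</customer>
  <shippingAddress><street>10 High Street</street><town>Reading</town><postcode>RG1 1AA</postcode></shippingAddress>
  <shippingAddress><street>2 Mill Lane</street><town>Leeds</town><postcode>LS1 4AP</postcode></shippingAddress>
  <item sku="A1" qty="1"/>
</invoice>"""


def child_counts(element):
    """Count how many times each child tag occurs directly under element."""
    counts = {}
    for child in element:
        counts[child.tag] = counts.get(child.tag, 0) + 1
    return counts


def review_and_grow(model, text):
    """Report structures the model has not seen, then widen the model in place.

    Nothing is rejected; the returned notes are meant for a person to review.
    """
    notes = []
    for parent in ET.fromstring(text).iter():
        counts = child_counts(parent)
        if parent.tag not in model:
            if counts:  # a container never seen before; leaves are not tracked
                notes.append(f"new container <{parent.tag}>")
                model[parent.tag] = {tag: (n, n) for tag, n in counts.items()}
            continue
        expected = model[parent.tag]
        for tag in sorted(set(expected) | set(counts)):
            seen = counts.get(tag, 0)
            if tag not in expected:
                # Earlier occurrences of this parent had none of this child, so it is optional.
                notes.append(f"new child <{tag}> in <{parent.tag}>")
                expected[tag] = (0, seen)
                continue
            lo, hi = expected[tag]
            if seen < lo:
                notes.append(f"<{parent.tag}> has {seen} <{tag}>, previously at least {lo}")
            elif seen > hi:
                notes.append(f"<{parent.tag}> has {seen} <{tag}>, previously at most {hi}")
            expected[tag] = (min(lo, seen), max(hi, seen))
    return notes


if __name__ == "__main__":
    for note in review_and_grow(MODEL, UK_INVOICE):
        print(note)
    print(MODEL["shippingAddress"])
```

Reporting rather than rejecting keeps the decision with a human: a second `<shippingAddress>` may be a legitimate new requirement, while an unexpected element might be a typo worth fixing at the source. Running the same document through a second time produces no notes, because the model has already grown to accommodate it.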

Next Week

Come back Monday, October 13, for the first installment of a conversation with C++ creator Bjarne Stroustrup. I know I promised this last week, but one must always keep up some element of surprise. Nevertheless, look for Bjarne next Monday. He will be here, really. If you'd like to receive a brief weekly email announcing new articles at Artima.com, please subscribe to the Artima Newsletter.

Talk Back!

Have an opinion about the design principles presented in this article? Discuss this article in the News & Ideas Forum topic, Organic Schemas and Outlier Data.

Resources

Elliotte Rusty Harold is author of Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX, which is available on Amazon.com at:
http://www.amazon.com/exec/obidos/ASIN/020161622X/

XOM, Elliotte Rusty Harold's XML Object Model API:
http://www.cafeconleche.org/XOM/

Cafe au Lait: Elliotte Rusty Harold's site of Java News and Resources:
http://www.cafeaulait.org/

Cafe con Leche: Elliotte Rusty Harold's site of XML News and Resources:
http://www.cafeconleche.org/

JDOM:
http://www.jdom.org/

DOM4J:
http://www.dom4j.org/

SAX, the Simple API for XML Processing:
http://www.saxproject.org/

DOM, the W3C's Document Object Model API:
http://www.w3.org/DOM/

ElectricXML:
http://www.themindelectric.com/exml/

Sparta:
http://sparta-xml.sourceforge.net/

Common API for XML Pull Parsing:
http://www.xmlpull.org/

NekoPull:
http://www.apache.org/~andyc/neko/doc/pull/

Xerces Native Interface (XNI):
http://xml.apache.org/xerces2-j/xni.html

TrAX (Transformation API for XML):
http://xml.apache.org/xalan-j/trax.html

Jaxen (a Java XPath engine):
http://jaxen.org/

RELAX NG:
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=relax-ng


Copyright © 1996-2014 Artima, Inc. All Rights Reserved.