Plain Text and XML
Summary: Pragmatic Programmers Andy Hunt and Dave Thomas talk with Bill Venners about the value of storing persistent data in plain text and the ways they feel XML is being misused.
18 posts.
Most recent reply: July 18, 2003 9:52 AM by Matt
    Bill
     
    Posts: 409 / Nickname: bv / Registered: January 17, 2002 4:28 PM
    Plain Text and XML
    May 5, 2003 1:42 AM      
    Dave Thomas says, "XML is useful in appropriate contexts, but it is being grossly abused in most of the ways it is being used today."

    Read this Artima.com interview with Pragmatic Programmers Andy Hunt and Dave Thomas:

    http://www.artima.com/intv/plain.html

    Here's an excerpt:

    Andy Hunt: Virtually any program that's going to operate on text of some sort can operate on plain text as the lowest common denominator. Very often you get into a state where you want to work with some program, but its properties file has gotten corrupted such that the program won't even come up to let you change the property. If that file is in some binary format that needs the program itself to fix it, you're hosed. You've catch-22ed yourself right out of existence. If it's in a plain text format, you can go in with any generic tool -- a text editor, whatever you like to use to deal with plain text -- and fix the problem. So in terms of emergency recovery, or changes in the field, plain text is helpful. It provides another level of insurance.

    What do you think of Dave and Andy's comments?
    • Erik
       
      Posts: 7 / Nickname: erikprice / Registered: March 27, 2003 2:05 AM
      Re: Plain Text and XML
      May 5, 2003 7:43 AM      
      Dave Thomas says: People think, "Once I've got my data in XML that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags that are in there somehow give people all the information they need to be able to deal with the data.

      One of the most useful parts of any XML file is the comments that are used to help explain things. There are facilities for making XML more human-readable; people simply have to make use of them.

      When I hear someone say "XML is self-describing", I don't take that to mean that they think an XML document all by itself can tell you everything you need to know about it. My interpretation of this is that one doesn't necessarily have to draw upon an outside source to decode the meanings of the various tags. Certainly, some systems will use XML in ways that are only slightly more readable than binary formats, in which case, yes you do need to refer to an outside source to understand the file. And there are ways to handle more complex scenarios where you need a DTD or XSD or some kind of data dictionary to understand what is going on in the XML document. But in many cases, the XML document itself can do all of the work.

      A big advantage of using XML is the fact that it's pretty widely-known. Using org.apache.commons.digester, I was able to give the users of a simple command-line tool I wrote a means of configuring the tool from a simple XML file (when I say simple, I'm talking about 5 or 6 elements). I provided an extensively commented sample XML file that could be used as a template for later configurations. If I need to extend the features of this file, I can do so easily without fear that my users will have to learn new, unfamiliar syntax for the configuration file (they will only need to learn any new elements or attributes that are added), as long as I remain within the XML standard. And chances are good that I'll be able to find a parser, perhaps one as easy to use as Digester, that can accommodate the changes, so I won't have to roll my own.

      In short, I think a large degree of the responsibility for ensuring that XML documents are human-readable is with the author, or the author of the tool that will generate the documents. It's not the XML format itself that has an inherent problem with being human-readable.
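
The kind of small, commented XML configuration file described in the post above can be sketched as follows. This is only an illustration in Python rather than Digester; the tool, the element names (tool-config, input-dir, output-dir, logging) and the verbose attribute are all invented for the example and do not come from the original post.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file for an imaginary command-line tool.
# Every element and attribute name here is an assumption made for the example.
SAMPLE_CONFIG = """<?xml version="1.0"?>
<!-- Sample configuration: copy this file and edit the values. -->
<tool-config>
  <!-- Directory the tool reads its input from. -->
  <input-dir>/var/data/in</input-dir>
  <!-- Directory the tool writes results to. -->
  <output-dir>/var/data/out</output-dir>
  <!-- Set verbose to "true" for detailed logging. -->
  <logging verbose="false"/>
</tool-config>
"""


def load_config(xml_text):
    """Return the settings from the XML text as a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "input_dir": root.findtext("input-dir"),
        "output_dir": root.findtext("output-dir"),
        "verbose": root.find("logging").get("verbose") == "true",
    }


config = load_config(SAMPLE_CONFIG)
print(config)
```

The comments are ignored by the parser but stay in the file for the person editing it, which is the division of labour the post argues for.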
    • Carfield
       
      Posts: 12 / Nickname: carfield / Registered: September 16, 2002 3:19 PM
      Re: Plain Text and XML
      May 5, 2003 9:41 AM      
      Yes... sometimes I think some XML is too complicated to handle, like the OpenOffice file format.

      Some content package formats, like SCORM (http://www.adlnet.org), store only the table of contents and pointers to the real content in the XML; the package reader then fetches each individual page by following its pointer. I wonder whether that is a better approach to storing large XML documents?
    • Matt
       
      Posts: 1 / Nickname: mmcshane / Registered: April 29, 2003 2:23 AM
      Re: Plain Text and XML
      May 5, 2003 10:45 AM      
      It's not always immediately obvious what a new technology really adds. I think TimBL (maybe Fielding?) once said of the web (paraphrasing) HTML isn't so great, HTTP isn't so great, but a universal address space (i.e. URI) turns out to be very useful.

      In the case of XML, angle brackets and attributes may not be so great, and sure, people can do some pretty stupid stuff, but the advancement of Unicode as a standard text encoding is, IMHO, a Good Thing.
    • Martin
       
      Posts: 1 / Nickname: blais / Registered: March 13, 2003 1:33 PM
      Re: Plain Text and XML
      May 5, 2003 11:25 PM      
      Just a quick pointer for the programmers out there: if you're interested in using plain text files as a way to generate Docbook XML files (which has numerous advantages, considering the growing variety of tools for it, but IMHO is a pain in the neck to edit "by hand"), check out the docutils plaintext converter (written in Python):

      http://docutils.sourceforge.net

      With this tool, I can write using plaintext files with almost no markup (or very very little, in any case), then easily generate HTML pages or LaTeX documents, and eventually Docbook XML (there is an unfinished-but-working implementation of conversion to Docbook).
      • Erik
         
        Posts: 7 / Nickname: erikprice / Registered: March 27, 2003 2:05 AM
        Re: Plain Text and XML
        May 6, 2003 3:13 AM      
        That's interesting... while I admit it's a mild pain to add the tags, and I'm sure using docutils makes it easier (I haven't used docutils, though), I was able to add most of the tags to a Docbook document I wrote, using careful Perl scripts and a feature of the BBEdit text editor called "Glossary". The real pain in the neck of using Docbook was getting Jade to work.
    • douglas
       
      Posts: 1 / Nickname: gxc / Registered: May 6, 2003 11:16 AM
      Re: Plain Text and XML
      May 6, 2003 3:18 PM      
      FWIW,
      Word 1 was pretty much just plain text.
      • Bill
         
        Posts: 409 / Nickname: bv / Registered: January 17, 2002 4:28 PM
        Re: Plain Text and XML
        May 6, 2003 7:42 PM      
        > FWIW,
        > Word 1 was pretty much just plain text.

        That's hilarious. Thanks. Perhaps we should change that to Word 2.
        • James
           
          Posts: 1 / Nickname: jmep / Registered: March 25, 2003 1:58 AM
          Re: Plain Text and XML
          May 8, 2003 5:37 AM      
          Word 1...ahhh. It wouldn't even run on the Windows 2 (not 2000) my dad had, but it was smoking on my old 8088 with two 5 1/4 floppy drives, no hard disk, and an Amber screen. ...I think my watch is now more powerful!
    • Joost de
       
      Posts: 15 / Nickname: yoozd / Registered: May 15, 2003 4:13 AM
      Re: Plain Text and XML
      May 15, 2003 8:40 AM      
      Loud cheers for the statement that XML is an ill fit for user input.
      That's exactly what bothered me about the recent efforts at rebuilding AspectJ in an open source XML format.
      (See Cedric Beust's weblog or Rickard Öberg's.)
      Somehow they seem to think it's the other way round.
      Now my own opinion is validated by the authorities. My superego sighs with satisfaction. :-)

      But what I'd like to ask Dave & Andy: is writing your own grammar using javacc, lex/yacc or whatever the only alternative they think of when they say 'xml sucks'?
      Because a lot of the time I think XML with its metadata validation (XSDs) is better than the alternative, which at most customer sites I visit would be no validation, or validation interspersed with parsing code.

      Of course XML is now mainstream, so there's more news value in criticising it, but unless Dave & Andy can come up with some alternative other than lex/yacc, I think XML is most of the time the least bad alternative.

      And Sun's machine-generated XML files seem to have been a first try at declaring information (deployment information, transaction information, ...) that is not in the same plane as the code. Looking back, I think we can conclude that it has the drawbacks they mentioned and that metadata attributes as in .NET and JSR 175 & 182 offer a much better design.

      groetjes,
      Joost
    • Adam
       
      Posts: 1 / Nickname: ziggy / Registered: May 6, 2003 6:26 AM
      Re: Plain Text and XML
      May 6, 2003 10:41 AM      
      > Dave Thomas says, "XML is useful in appropriate contexts,
      > but it is being grossly abused in most of the ways it is
      > being used today."

      The problem with XML is that it's the round peg that's being shoved into holes of every shape.

      We're better off today because we have XML; at the small end of the spectrum, I'd rather parse someone's tags rather than deal with writing regexes or yacc grammars each and every time someone has worthwhile data/metadata I want to process. And let's not forget that not every programmer can switch hats to write bug-free regular expressions or yacc grammars. If they can express their data with a small XML vocabulary, I'm not going to be held hostage by their buggy grammar or my buggy regex; we can agree to use an XML parser as a foundation and reliably interchange data with each other.

      Of course there are instances where XML is misused, abused or simply a bad choice. Ant build files are one example. Apple's plist format is another. At the same time, there are text-based grammars that are worse -- Makefile syntax (in all its many splendored dialects), *roff or *TeX for example -- so shunning XML in favor of a simpler text-based format is not a panacea.

      The crux of the problem we are facing is that programmers do not create easily parsed, simple data formats easily (complex formats and buggy parsers are much easier to create). The issues with abusing XML are the same ones we find with underspecifying a text-based grammar, or creating an overly complex text-based grammar. Swinging back from XML to text/yacc isn't a magic bullet, and won't solve any problems in and of itself.
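
The point about agreeing on an XML parser as a foundation can be illustrated with a small sketch. The vocabulary below (contacts, contact, name, email) is made up for the example. It shows a stock parser handling an escaped ampersand and a CDATA section, two details a hand-written regex would have to get right on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical interchange document; the element names are invented
# for this example and are not taken from the discussion.
DOCUMENT = """<contacts>
  <contact id="1">
    <name>Smith &amp; Sons</name>
    <email>info@example.com</email>
  </contact>
  <contact id="2">
    <name><![CDATA[A <B> Trading]]></name>
    <email>sales@example.org</email>
  </contact>
</contacts>"""


def read_contacts(xml_text):
    """Return a list of (id, name, email) tuples from the document."""
    root = ET.fromstring(xml_text)
    return [
        (c.get("id"), c.findtext("name"), c.findtext("email"))
        for c in root.findall("contact")
    ]


for record in read_contacts(DOCUMENT):
    print(record)
```

Neither side had to write or debug a grammar; both rely on the parser to decode entities and CDATA consistently.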
      • Martin
         
        Posts: 9 / Nickname: mfowler / Registered: November 27, 2002 3:51 AM
        Re: Plain Text and XML
        May 17, 2003 6:59 AM      
        > > Dave Thomas says, "XML is useful in appropriate
        > contexts,
        > > but it is being grossly abused in most of the ways it
        > is
        > > being used today."

        > Of course there are instances where XML is misused, abused
        > or simply a bad choice. Ant build files are one example.

        My colleague Matt Foemmel is getting sick of reading (and writing) ant files in XML. So he's come up with an alternative: Pynt <http://pynt.sourceforge.net/>. It uses a Python-like scripting language and can easily call ant tasks.
        • Simon
           
          Posts: 1 / Nickname: slangford / Registered: May 18, 2003 9:58 AM
          Re: Plain Text and XML
          May 18, 2003 2:04 PM      
          Personally I don't have a problem writing all my ant scripts in XML, at least it stops my colleagues who try to do:
          <tag1/>
            <tag2/>
          </tag1>
          

          No matter how often I explain, the build falling over is a good indicator for them.

          If using a script suits someone that's fine, but in a larger project, who's to say that there's going to be a common language between all the developers, at least XML is simple (well, barring my previous example :-)).
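
Why the build falls over on that fragment can be shown with any XML parser. The sketch below uses Python's standard library rather than Ant itself. The corrected version is an assumption about what the colleagues meant to write.

```python
import xml.etree.ElementTree as ET

# The fragment from the post above: <tag1/> is self-closing, so the
# later </tag1> has nothing left to close.
BROKEN = "<tag1/>\n  <tag2/>\n</tag1>"

# What was presumably intended: tag2 nested inside tag1.
FIXED = "<tag1>\n  <tag2/>\n</tag1>"


def is_well_formed(xml_text):
    """Return True if the text parses as one well-formed XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BROKEN))  # False
print(is_well_formed(FIXED))   # True
```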
        • Vincent
           
          Posts: 40 / Nickname: vincent / Registered: November 13, 2002 7:25 AM
          Re: Plain Text and XML
          May 20, 2003 1:12 AM      
          > My colleague Matt Foemmel is getting sick of reading (and
          > writing) ant files in XML. So he's come up with an
          > alternative: Pynt <http://pynt.sourceforge.net/>.

          I can't get this link to work.

          V.
          • Joost de
             
            Posts: 15 / Nickname: yoozd / Registered: May 15, 2003 4:13 AM
            Re: Plain Text and XML
            May 20, 2003 0:48 PM      
            After you've clicked on the link you should remove the final character '>' in the resulting address in your browser. It should read http://pynt.sourceforge.net

            Very interesting, this pynt. Especially if it leverages existing Ant tags and scripts.
            Hope it will be able to generate some following.

            groetjes,
            Joost
      • Todd
         
        Posts: 27 / Nickname: tblanchard / Registered: May 11, 2003 10:11 AM
        Re: Plain Text and XML
        May 28, 2003 3:14 PM      
        > Of course there are instances where XML is misused, abused
        > or simply a bad choice. Ant build files are one example.

        This isn't that bad of a use I think.

        > Apple's plist format is another.

        Yes - the text only version of these (NextStep classic style plist) - is what I routinely use to pass around structured data as text. The XML-ization of this format achieves buzzword compliance at the expense of readability and size.

        > The crux of the problem we are facing is that programmers
        > do not create easily parsed, simple data formats easily

        But they only bother to create them in the absence of something already working. I have seen no homegrown syntaxes on NextStep/Cocoa systems because it's trivial to serialize/deserialize arbitrary collections of strings.

        XML filled a void - but to borrow your analogy, the void was round and XML is roughly triangular. So it's usually a bad fit.

        > Swinging back from XML to text/yacc isn't a magic bullet,
        > and won't solve any problems in and of itself.

        No, but a much simpler serialization format would go a looong way towards eliminating the hated homegrown syntaxes. I've got classic plist parser/serializers for Objective-C, Java, and Smalltalk. They take about a page of code and have amazing power. I don't see anything in 90% of XML that this mechanism doesn't do better in every way.

        I think XML in general is just overhead and I avoid it whenever possible.
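
To give a feel for the "about a page of code" claim, here is a rough Python sketch of a serializer and parser for a simplified subset of the classic plist syntax: dictionaries as { key = value; }, arrays as ( a, b ), and strings either bare or double-quoted. It is not the poster's code. It assumes all leaf values are strings, and it leaves out parts of the real format such as binary data and comments.

```python
import re

# One token: punctuation, a double-quoted string, or a bare word.
TOKEN = re.compile(r'\s*([{}()=;,]|"(?:[^"\\]|\\.)*"|[A-Za-z0-9_./-]+)')
BARE = re.compile(r'[A-Za-z0-9_./-]+')


def dumps(value):
    """Serialize nested dicts, lists, and strings to plist-style text."""
    if isinstance(value, dict):
        return "{" + "".join(f"{dumps(k)} = {dumps(v)}; " for k, v in value.items()) + "}"
    if isinstance(value, list):
        return "(" + ", ".join(dumps(v) for v in value) + ")"
    if BARE.fullmatch(value):
        return value  # safe to write without quotes
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def _parse(tokens, pos):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "{":
        result, pos = {}, pos + 1
        while tokens[pos] != "}":
            key, pos = _parse(tokens, pos)
            if tokens[pos] != "=":
                raise ValueError("expected '=' after dictionary key")
            val, pos = _parse(tokens, pos + 1)
            if tokens[pos] != ";":
                raise ValueError("expected ';' after dictionary value")
            result[key] = val
            pos += 1
        return result, pos + 1
    if tok == "(":
        result, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = _parse(tokens, pos)
            result.append(item)
            if tokens[pos] == ",":
                pos += 1
        return result, pos + 1
    if tok in ("}", ")", "=", ";", ","):
        raise ValueError(f"unexpected {tok!r}")
    if tok.startswith('"'):
        return re.sub(r"\\(.)", r"\1", tok[1:-1]), pos + 1  # undo escapes
    return tok, pos + 1


def loads(text):
    """Parse plist-style text back into dicts, lists, and strings."""
    tokens = _tokenize(text)
    value, pos = _parse(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return value


data = {"name": "My App", "files": ["main.m", "Info.plist"], "window": {"width": "640"}}
text = dumps(data)
print(text)
print(loads(text) == data)
```

Truncated input surfaces as an IndexError in this sketch; a fuller version would report it as a syntax error.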
    • Steven
       
      Posts: 2 / Nickname: situboo / Registered: July 17, 2003 9:14 AM
      Re: Plain Text and XML
      July 17, 2003 2:00 PM      
      I've been following this criticism of Ant for some time now. I find it ironic that the point of illegibility made against it contradicts one of the reasons I have suggested folks use it: in effect it becomes a document of the build process, so I don't have to go to someone in an SCM group to understand how the application is built. Where is the problem? Being dependent on the java program that consumes it?
    • Steven
       
      Posts: 2 / Nickname: situboo / Registered: July 17, 2003 9:14 AM
      Re: Plain Text and XML
      July 17, 2003 2:04 PM      
      I agree. I've heard this criticism of Ant for some time now. I find it ironic that the point of illegibility made against it contradicts one of the reasons I have suggested folks use it: in effect it becomes a document of the build process, so I don't have to go to someone in an SCM group to understand how the application is built. Where is the problem? Being dependent on the java program that consumes it?
      • Matt
         
        Posts: 62 / Nickname: matt / Registered: February 6, 2002 7:27 AM
        Re: Plain Text and XML
        July 18, 2003 9:52 AM      
        Hmm, I don't see the irony (maybe Alanis Morissette would, though).

        Is the argument here that XML files are "documents" whereas other (possibly more legible) files are not?

        Wouldn't it be better to have a crisp and clear build-file syntax which was much more tailored to a build process that could also generate an XML file output?
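
One way to picture that suggestion is a terse build description that is translated into XML on demand. The line-based syntax below is entirely hypothetical, made up for this sketch. The generated element names only loosely mirror Ant's project/target layout, and the output is not claimed to be a valid Ant build file.

```python
import xml.etree.ElementTree as ET

# Hypothetical terse syntax: "target: dependencies" lines, each followed
# by indented "task key=value ..." lines.
BUILD_TEXT = """\
compile:
    javac srcdir=src destdir=classes
jar: compile
    jar destfile=app.jar basedir=classes
"""


def to_xml(build_text, project_name="demo"):
    """Translate the terse build description into an XML string."""
    project = ET.Element("project", name=project_name)
    target = None
    for line in build_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # Unindented line: start a new target.
            name, _, deps = line.partition(":")
            target = ET.SubElement(project, "target", name=name.strip())
            if deps.strip():
                target.set("depends", ",".join(deps.split()))
        else:
            # Indented line: a task belonging to the current target.
            if target is None:
                raise ValueError("task line appears before any target")
            task, *pairs = line.split()
            ET.SubElement(target, task, dict(p.split("=", 1) for p in pairs))
    return ET.tostring(project, encoding="unicode")


print(to_xml(BUILD_TEXT))
```

People read and edit the short form, while tools that expect XML consume the generated output.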