The Artima Developer Community

Plain Text and XML
A Conversation with Andy Hunt and Dave Thomas, Part X
by Bill Venners
May 5, 2003



Misuses of XML

Dave Thomas: Now, can I just have a little rant?

Bill Venners: Sure.

Dave Thomas: XML sucks.

Bill Venners: Why?

Dave Thomas: XML sucks because it's being used wrongly. It is being used by people who view it as being an encapsulation of semantics and data, and it's not. XML is purely a way of structuring files, and as such, really doesn't add much to the overall picture. XML came from a document preparation tradition. First there was GML, a document preparation system, then SGML, a document preparation system, then HTML, a document preparation system, and now XML. All were designed as ways humans could structure documents. Now we've gotten to the point where XML has become so obscure and so complex to write that it can no longer be written by people. If you talk to people at Sun about their libraries that generate XML, they say humans cannot read this. It's not designed for human consumption. Yet we're carrying around all the baggage that's in there, because it's designed for humans to read. So XML is a remarkably inefficient encoding system. It's a remarkably difficult-to-use encoding system, considering what it does. And yet it's become the lingua franca for talking between applications, and that strikes me as crazy.

Andy Hunt: It's sort of become the worst of both worlds.

Bill Venners: Actually, that was one of the last questions I was going to ask: Do you consider XML plain text? Could you elaborate on what you said about how people view XML?

Dave Thomas: People think, "Once I've got my data in XML that's all I've got to do. I've now got self-describing data," but the reality is they don't. They're just assuming that the tags that are in there somehow give people all the information they need to be able to deal with the data. Now, for some things there are standards. For example, there are some standards like RSS and RDF, which give you very simple ways of describing web page content. But a random XML file, especially a machine-generated one, can be as obscure as binary data.
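
To make the point about "self-describing" data concrete, here is a minimal sketch in Python. The XML fragment, including the tag and attribute names rec, f, n, and u, is invented for illustration and does not come from any real tool. The parser recovers the structure without trouble, but nothing in the result says what the values mean.

```python
import xml.etree.ElementTree as ET

# Hypothetical machine-generated fragment; all names here are invented.
DOC = '<rec><f n="t1" u="3">17.5</f><f n="t2" u="3">18.1</f></rec>'

root = ET.fromstring(DOC)

# The parser hands back tags, attributes, and text: pure structure.
fields = [(f.tag, dict(f.attrib), f.text) for f in root]

for tag, attrs, text in fields:
    print(tag, attrs, text)

# Whether 17.5 is a temperature, a price, or a version number, and what
# u="3" stands for, still has to come from documentation or a schema
# agreed on outside the file.
```

The file is well-formed and easy for a program to walk, yet a reader without outside knowledge is no better off than with an undocumented binary record.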

Bill Venners: Yeah, I find Ant build files, which are XML, very hard to read.

Andy Hunt: Ant is actually a really good example, because in that case you're using XML as a user-specified input language, which is really inappropriate in that context. I'd much rather have something...

Bill Venners: A context-free grammar, something that's more readable.

Andy Hunt: Yeah, a genuine grammar. I want to be able to type something simple and easy for me. I don't care if it's easy for the tool to parse, that's the tool's problem. I want it to be easy for me to write. And in cases like that, it's really the case of the programmer saying, "Oh look, here's an XML parser. I can just take XML files. That's easier." So one programmer in one context puts a burden on the other 100,000 programmers trying to use it.
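
As an illustration of putting the parsing burden on the tool rather than on the person typing, here is a minimal sketch in Python. The build-file format, the sample text, and the function name parse_build_file are all hypothetical; this is not Ant's syntax, and it is a small hand-written, line-oriented parser rather than a generated context-free grammar.

```python
# Hypothetical build description: a target line with optional dependencies,
# followed by indented commands. Everything after '#' is a comment.
BUILD_TEXT = """\
# hypothetical build description
target init:
    mkdir build

target compile: init
    javac -d build src/Main.java
"""

def parse_build_file(text):
    """Parse the hypothetical format into {target: {"depends", "commands"}}."""
    targets = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].rstrip()  # drop comments, trailing space
        if not line.strip():
            continue  # blank or comment-only line
        if line.startswith("target "):
            header, _, deps = line[len("target "):].partition(":")
            name = header.strip()
            if not name:
                raise SyntaxError(f"line {lineno}: target needs a name")
            current = {"depends": deps.split(), "commands": []}
            targets[name] = current
        elif line[0].isspace() and current is not None:
            current["commands"].append(line.strip())
        else:
            raise SyntaxError(
                f"line {lineno}: expected 'target' or an indented command"
            )
    return targets

targets = parse_build_file(BUILD_TEXT)
print(targets)
```

The parser is about twenty lines that the tool author writes once; in exchange, every user types a few plain words instead of nested tags and attributes. A real tool would need more (quoting, variables, better error messages), so treat this only as a sketch of the trade-off.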

Bill Venners: Well, I think some people may be more comfortable reading XML than others. What I've found is that I can usually read XML files just fine if they are small and simple. For example, a couple of years ago I pulled certain metadata about each web page into external files that I keep separate from the raw HTML files. The metadata file has information like title, subtitle, publication date, author, and so on. As part of my build system, I wrote a "page pumper" program that takes one metadata file and one raw HTML file as input and generates the pretty HTML file you see on the web as output. When I want to make global changes to the look and feel of the site, I just change page pumper and do a build all.

In the old days, what I would have done to create that metadata file was whip up a quick context-free grammar with tools like Lex and YACC, and use that for the grammar of the metadata file. But given that XML was all the rage back then, I wanted as a consultant to get some experience with XML, so I used XML for the metadata file. And XML has worked just fine in that situation. I can easily edit the web page metadata files by hand and easily read them, even though they are XML, because they are small and simple. But I've often been frustrated staring at even moderate-sized Ant build files trying to decipher them, and staring at the Ant documentation trying to figure out how to do something that I think should be simple and obvious.
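
The "page pumper" idea described above can be sketched roughly as follows in Python. This is not the actual program, which the interview does not show: the element names (page, title, subtitle, author, pubdate), the template, and the function names read_metadata and pump_page are all assumptions made for illustration.

```python
import html
import xml.etree.ElementTree as ET
from string import Template

# Hypothetical metadata file: small, flat, and easy to edit by hand.
METADATA_XML = """\
<page>
  <title>Plain Text and XML</title>
  <subtitle>A Conversation with Andy Hunt and Dave Thomas, Part X</subtitle>
  <author>Bill Venners</author>
  <pubdate>May 5, 2003</pubdate>
</page>
"""

# Hypothetical raw HTML body, kept separate from the metadata.
RAW_HTML = "<p>Dave Thomas: Now, can I just have a little rant?</p>"

# The site-wide look and feel lives in one place: change it, rebuild all.
PAGE_TEMPLATE = Template("""\
<html>
<head><title>$title</title></head>
<body>
<h1>$title</h1>
<h2>$subtitle</h2>
<p class="byline">by $author, $pubdate</p>
$body
</body>
</html>
""")

def read_metadata(xml_text):
    """Return {tag: escaped text} for each child of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: html.escape((child.text or "").strip()) for child in root}

def pump_page(xml_text, raw_html):
    """Combine one metadata file and one raw HTML file into a finished page."""
    fields = read_metadata(xml_text)
    return PAGE_TEMPLATE.substitute(fields, body=raw_html)

page = pump_page(METADATA_XML, RAW_HTML)
print(page)
```

With only four flat elements, the metadata file stays readable whatever syntax it uses, which matches the observation that small, simple XML files are easy to edit by hand.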

Dave Thomas: If you're talking about using XML in certain domains, it's fine. XSLT, for example, lets you do some really fun things with XML. When we had our book online, for example, we went from LaTeX, to XML, and then to the output format, simply because XSLT gave us some really powerful ways for manipulating the document's content. XML is useful in appropriate contexts, but it is being grossly abused in most of the ways it is being used today.

Next Week

Come back Monday, May 12 for Part I of a conversation with Elliotte Rusty Harold. If you'd like to receive a brief weekly email announcing new articles, please subscribe to the Artima Newsletter.

Talk Back!

Have an opinion about assertions, crashing early, or the appropriate level of confidence to have in the code you write? Discuss this article in the News & Ideas Forum topic, Plain Text and XML.


Dave Thomas talks about putting abstractions into code, details into metadata in Part IV of this interview, Abstraction and Detail.

Andy Hunt and Dave Thomas are authors of The Pragmatic Programmer.


Copyright © 1996-2014 Artima, Inc. All Rights Reserved.