The Artima Developer Community

Objects, Networks, and Making Things Work
What is XML?
by Jim Waldo
October 18, 2005
In which, against my better judgement, I try to figure out what everyone means when they talk about XML...


I was doing so well. I had resolved to blog more often, and had done a pretty good job of not waiting until I had polished my thoughts before pushing them out to the web. And a pretty interesting discussion had ensued, around notions of typing, mobile code, and how to use Java for distributed computing.

But then I got a piece of email from one of my compatriots, and I hit a wall. It wasn't complex, or hard to understand, or even all that provocative (at least, I don't think it was meant to be). It made a simple statement: "Of course, next you will talk about the role of XML in all this." And because of that little statement, I hit a wall, and have been silent for months.

But the time has come to face the demons and answer the question. Let me tell you what I think the role of XML is going to be in all this...or at least start what will no doubt be a series of posts trying to make it clear what I think of XML, what I think the role of XML should be, and other assorted subjects.

Those of you who know me, or have heard me speak, or have read some of what I have written in the past, may have the impression that I don't think much of XML. There is a sense in which that is true, but like so much in this world, the whole story is much more complex than that.

To begin with, I'm never quite sure what people mean when they talk about XML. XML itself is a specification of a syntax for documents, and as a simplified subset of the Standard Generalized Markup Language (SGML) it is pretty unobjectionable and pretty unremarkable. But getting worked up about the syntax is, as Rob Gingell is wont to say, about as sensible as getting worked up about ASCII.

But I don't think most people, when talking about XML, are just talking about the syntax. They talk about XML being self-documenting or human-readable. They talk about XML allowing communication between distributed objects. And none of these properties are syntactic; they all require some kind of semantics. So when people start ascribing semantic properties to a syntax, I start wondering what they are talking about. Clearly, the term XML has become shorthand for something more, something richer, something more, well, meaningful.

So perhaps what people mean when they talk about XML is actually XML and some DTD, or schema, or other interpretation that will give some semantics to go along with syntax. This combination would give some of the properties of XML that people talk about. It would allow inter-operation of programs that exchanged information using XML (and the common interpretation). But then people would need to say what schema, DTD, or interpretation they were pairing with XML, because lord knows there are a hell of a lot of different standards, pseudo-standards, and proposals that use the XML syntax but don't inter-operate.
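A minimal sketch of the point, in Python. The sample document, the element names, and both reader functions are hypothetical, invented purely for illustration. Two programs parse the same well-formed XML without complaint, yet disagree about what it says, because each pairs the syntax with a different interpretation:

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Hypothetical document: well-formed XML, but nothing in the syntax
# says how the text inside <date> is supposed to be read.
DOC = "<order><date>03/04/2005</date></order>"


def read_date_us(xml_text: str) -> date:
    """Reader A assumes the convention month/day/year."""
    text = ET.fromstring(xml_text).findtext("date")
    return datetime.strptime(text, "%m/%d/%Y").date()


def read_date_eu(xml_text: str) -> date:
    """Reader B assumes the convention day/month/year."""
    text = ET.fromstring(xml_text).findtext("date")
    return datetime.strptime(text, "%d/%m/%Y").date()


if __name__ == "__main__":
    print(read_date_us(DOC))  # 2005-03-04
    print(read_date_eu(DOC))  # 2005-04-03
```

Both readers accept the document; only an agreement outside the syntax (a schema, a DTD, or a shared convention) could tell them which reading is intended.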

The best interpretation I can put on the popularity of XML is that it is easy to write a parser for whatever interpretation you want to use, and thus there is a simple way to craft a mechanism for passing information from one program to another. XML on this view is no different from the use of ASCII (another syntax) in the early days of Unix. In those days, there was a thriving trade in tools that would take ASCII streams in and shoot ASCII streams out. The idea behind all of these tools was that the user of the system could string together a sequence of the tools, with each tool producing an ASCII stream that the next tool would use as input.

The result was a revolutionary system of simple tools that could be strung together to do all sorts of interesting things. Shell languages evolved to help program these early services, and a skilled Unix user had a set of tools that allowed him (or her) to do all sorts of amazing things. I still remember being told that I needed to produce a listing of terms to be indexed for a book I was editing. After a bit of thought, I ran the text of the book through tr to put each word on its own line, piped the results through sort to alphabetize and eliminate duplicates, and then edited the resulting word list to get the ones that were significant enough to need to be in the index.
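As a rough illustration of that kind of pipeline (not the original commands, which are not recorded here), the same two stages can be sketched in Python: break the text into individual words, then alphabetize and drop duplicates. The function name and sample text are made up for the example, and lowercasing the words is an added assumption.

```python
import re


def index_candidates(text: str) -> list[str]:
    """Return the distinct words in text, lowercased and alphabetized."""
    # Stage 1 (the role played by tr): break the text into individual words.
    words = re.findall(r"[A-Za-z]+", text.lower())
    # Stage 2 (the role played by sort): alphabetize and eliminate duplicates.
    return sorted(set(words))


if __name__ == "__main__":
    sample = "The tool reads a stream and the next tool reads its output."
    for word in index_candidates(sample):
        print(word)
```

The only contract between the two stages is a stream of words, which is the whole appeal: either stage could be swapped for another tool that honors the same simple form.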

All of these tools worked because the programmers of the tools agreed on a syntax for the input and output. You (the user of the tool) needed to make sure that the output of one tool could be understood when fed into the next tool, but there was no question about the syntactic form for output and input. And because the input and output forms were so simple, it was easy to construct the tools that would use them. Simplicity in this case meant that each tool had to do a fair amount of work in parsing and validating the input, but at least there was a common convention for passing information from one of the tools to another.

I think that the attraction of XML is much the same as the attraction of ASCII: it is a simple way to specify a syntax for exchanging information between cooperating programs. Of course, the cooperating programs have to be picked in such a way that they can interpret the syntax in the same way, so they really have to be developed with a particular interpretation in mind. But it is a simple way to enable interaction, and that isn't a bad thing.

But while this understanding makes sense, it also doesn't match all of the claims made about the use of XML in distributed computing (or even the use of XML in Web Services, yet another phrase that is both over-used and seemingly devoid of common meaning). This understanding does not ensure program interoperability. It doesn't explain the number of standards efforts around XML. It doesn't answer many of the problems that people say XML answers.

So I answer the question that started all of this with another question. "What is the role of XML in all of this?" leads to the question "Just what do you mean by XML?"

About the Blogger

Jim Waldo is a Distinguished Engineer with Sun Microsystems, where he is the lead architect for Jini, a distributed programming system based on Java. Prior to Jini, Jim worked in JavaSoft and Sun Microsystems Laboratories, where he did research in the areas of object-oriented programming and systems, distributed computing, and user environments. Before joining Sun, Jim spent eight years at Apollo Computer and Hewlett Packard working in the areas of distributed object systems, user interfaces, class libraries, text and internationalization. While at HP, he led the design and development of the first Object Request Broker, and was instrumental in getting that technology incorporated into the first OMG CORBA specification.

This weblog entry is Copyright © 2005 Jim Waldo. All rights reserved.