In which, against my better judgement, I try to figure out what everyone means when they talk about XML...
I was doing so well. I had resolved to blog more often, and had
done a pretty good job of not waiting until I had polished my thoughts before
pushing them out to the web. And a pretty interesting discussion had ensued,
around notions of typing, mobile code, and how to use Java for distributed
But then I got a piece of email from one of my compatriots,
and I hit a wall. It wasn't complex, or hard to understand, or even all
that provocative (at least, I don't think it was meant to be). It made a
simple statement: "Of course, next you will talk about the role of XML
in all this." And because of that little statement, I hit a wall, and
have been silent for months.
But the time has come to face the demons and answer the
question. Let me tell you what I think the role of XML is going to be in
all this...or at least start what will no doubt be a series of posts
trying to make it clear what I think of XML, what I think the role of XML
should be, and other assorted subjects.
Those of you who know me, or have heard me speak, or have
read some of what I have written in the past, may have the impression that
I don't think much of XML. There is a sense in which that is true, but
like so much in this world, the whole story is much more complex than
To begin with, I'm never quite sure what people mean when they talk about
XML. XML itself is a specification of a syntax for
documents, and as an extension of the Standard Generalized Markup Language
(SGML) it is pretty unobjectionable and pretty unremarkable. But getting
worked up about the syntax is, as Rob
Gingell is wont to say, about as sensible as getting worked up
But I don't think most people, when talking about XML, are just talking
about the syntax. They talk about XML being self-documenting or
human-readable. They talk about XML allowing communication between
distributed objects. And none of these properties are syntactic; they all
require some kind of semantics. So when people start ascribing semantic
properties to a syntax, I start wondering what they are talking
about. Clearly, the term XML has become shorthand for something
more, something richer, something more, well, meaningful.
So perhaps what people mean when they talk about XML is actually XML and
some DTD, or schema, or other interpretation that will give some semantics
to go along with syntax. This combination would give some of the
properties of XML that people talk about. It would allow inter-operation of
programs that exchanged information using XML (and the common
interpretation). But then people would need to say what schema, DTD, or
interpretation they were pairing with XML, because lord knows there are a
hell of a lot of different standards, pseudo-standards, and proposals that
use the XML syntax but don't inter-operate.
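To make the point concrete, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. Both vocabularies below are hypothetical, invented only for illustration, as is the name read_order_a. Both documents are well-formed XML, yet a program written for one interpretation cannot use the other.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies for the same fact; neither is a real standard.
ORDER_A = "<order><item>widget</item><quantity>3</quantity></order>"
ORDER_B = "<purchase><sku>widget</sku><count>3</count></purchase>"


def read_order_a(document):
    """Interpret a document according to vocabulary A only."""
    # Parsing succeeds for any well-formed XML, whatever it means.
    root = ET.fromstring(document)
    if root.tag != "order":
        raise ValueError(f"unexpected root element: {root.tag}")
    item = root.findtext("item")
    quantity = root.findtext("quantity")
    if item is None or quantity is None:
        raise ValueError("missing item or quantity")
    return item, int(quantity)


if __name__ == "__main__":
    print(read_order_a(ORDER_A))
    try:
        read_order_a(ORDER_B)
    except ValueError as error:
        print("same syntax, different interpretation:", error)
```

The parser is happy with both documents; it is only the agreed interpretation, coded by hand here, that lets the two programs exchange anything useful.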
The best interpretation I can put on the popularity of XML is that it is
easy to write a parser for whatever interpretation you want to use, and
thus there is a simple way to craft a mechanism for passing information
from one program to another. XML on this view is no different from the use
of ASCII (another syntax) in the early days of Unix. In those days, there
was a thriving trade in tools that would take ASCII streams in and shoot
ASCII streams out. The idea behind all of these tools was that the user of
the system could string together a sequence of the tools, with each tool
producing an ASCII stream that the next tool would use as input.
The result was a revolutionary system of simple tools that could be strung
together to do all sorts of interesting things. Shell languages evolved to
help program these early services, and a skilled Unix user had a set of
tools that allowed him (or her) to do all sorts of amazing things. I still
remember being told that I needed to produce a listing of terms to be
indexed for a book
I was editing. After a bit of thought, I ran the text of the book through
tr to get each of the words on a line of its own, piped the results
through sort to alphabetize and eliminate duplicates, and then
edited the resulting word list to get the ones that were significant
enough to need to be in the index.
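The same pipeline can be sketched in Python rather than with the original shell tools. The exact tr and sort options are not recorded above, so the word pattern and the lowercasing are assumptions, and index_candidates is a hypothetical name.

```python
import re


def index_candidates(text):
    """Return the distinct words of `text`, lowercased and alphabetized."""
    # Rough analogue of the tr step: split the text into one word per item.
    words = re.findall(r"[A-Za-z']+", text)
    # Rough analogue of the sort step: alphabetize and drop duplicates.
    # Lowercasing is an assumption; the original pipeline may not have done it.
    return sorted({word.lower() for word in words})


if __name__ == "__main__":
    sample = "The tools take streams in and shoot streams out. The tools compose."
    print("\n".join(index_candidates(sample)))
```

As in the original, the output is only a candidate list; picking the terms that deserve an index entry is still a job for the editor.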
All of these tools worked because the programmers of the tools agreed on a
syntax for the input and output. You (the user of the tool) needed to make
sure that the output of one tool could be understood when fed into
the next tool, but there was no question about the syntactic form for
output and input. And because the input and output forms were so simple,
it was easy to construct the tools that would use them. Simplicity in this
case meant that each tool had to do a fair amount of work in parsing and
validating the input, but at least there was a common convention for
passing information from one of the tools to another.
I think that the attraction of XML is much the same as the attraction of
ASCII... it is a simple way to specify a syntax for exchanging information
between cooperating programs. Of course, the cooperating programs have to
be picked in such a way that they can interpret the syntax in the same
way, so they really have to be developed with a particular interpretation
in mind. But it is a simple way to enable interaction, and that isn't a
But while this understanding makes sense, it also doesn't match all of
the claims made about the use of XML in distributed computing (or even
the use of XML in Web Services, yet another phrase that is both
over-used and seemingly devoid of common meaning). This understanding
does not ensure program interoperability. It doesn't explain the number
of standards efforts around XML. It doesn't answer many of the problems
that people say XML answers.
So I answer the question that started all of this with another
question. "What is the role of XML in all of this?" leads to the
question "Just what do you mean by XML?"
Computers don't "understand" anything. People do. The nice thing about XML is that for well-designed XML-based formats, a programmer can figure out the format just by looking at a few representative examples. And people are very good at learning from examples.
I totally agree with Brian, XML is just a simple and easy way to exchange information... for humans. ASCII had to fit some machine limitations, XML has less of these limitations and whatever technology comes after XML will surely be more human-friendly. Distributed computing is still very closely tied to programming languages, and these are still very much for machines, not for humans. That's why I see no future for Web Services in distributed programming: WSDL is as horrible for humans as it is for machines.
> Computers don't "understand" anything. People do. The nice thing about XML is that for well-designed XML-based formats, a programmer can figure out the format just by looking at a few representative examples. And people are very good at learning from examples.
>
> A DTD is also helpful, but that's extra.
Would you like to receive your newspaper in XML? Why would people need to read XML?
> I totally agree with Brian, XML is just a simple and easy way to exchange information... for humans. ASCII had to fit some machine limitations, XML has less of these limitations and whatever technology comes after XML will surely be more human-friendly.
When machines exchange information, why does it need to be human-friendly?
XML gets used for a lot of different things. It's not particularly good at any of them.
Most commonly, it gets used as a structured data serialization format. But the choice of whether to represent values as attributes or as elements means you can have very different XML formats representing the same graph of data. For serialization, attributes are a mistake. A lot of the additional scaffolding added to serialization is pointlessly complex. I speak of XML Schema and its overblown "type" system. A better solution for serialization that is also human readable is the old NextStep PList format.
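A minimal sketch of that divergence, in Python with the standard xml.etree.ElementTree module. The point documents and function names are hypothetical, made up for illustration: the same two values are written once as attributes and once as child elements, and each form needs its own reading code.

```python
import xml.etree.ElementTree as ET

# The same data in two hypothetical serializations.
AS_ATTRIBUTES = '<point x="1" y="2"/>'
AS_ELEMENTS = "<point><x>1</x><y>2</y></point>"


def point_from_attributes(document):
    """Read the coordinates from attributes on the root element."""
    root = ET.fromstring(document)
    return int(root.attrib["x"]), int(root.attrib["y"])


def point_from_elements(document):
    """Read the coordinates from child elements of the root element."""
    root = ET.fromstring(document)
    return int(root.findtext("x")), int(root.findtext("y"))


if __name__ == "__main__":
    print(point_from_attributes(AS_ATTRIBUTES))
    print(point_from_elements(AS_ELEMENTS))
```

Feeding either document to the other reader fails, even though both describe the same point.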
XML is sometimes used as a wire protocol. It sucks for this as it is much too verbose.
XML is called a markup language, but the requirement to be "well formed" (no overlapping chunks) makes it a lousy markup language.
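A small sketch of what the well-formedness rule rejects, again in Python with the standard xml.etree.ElementTree module. The helper name is_well_formed and the sample strings are hypothetical, chosen only to illustrate overlapping versus nested markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the parser accepts the document as well-formed XML."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Bold and italic ranges that overlap, as a human annotator might mark them.
OVERLAPPING = "<p><b>bold <i>both</b> italic</i></p>"
# The same text forced into strict nesting.
NESTED = "<p><b>bold <i>both</i></b><i> italic</i></p>"

if __name__ == "__main__":
    print(is_well_formed(OVERLAPPING))
    print(is_well_formed(NESTED))
```

The overlapping version has to be split into nested pieces before an XML parser will accept it.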
XML is said to be "human" readable. It's not. It's bloody hard to read by eye without the aid of tools because the signal to markup ratio is so low and it's loaded with pointless boilerplate. Namespaces multiply the problem.
XML is meant to be machine readable/easily parseable but XML has so many constructs and rules that XML parsers are elaborate and heavyweight things. One does not write a parser casually and one is likely to have trouble fitting an XML parser into a resource limited device like a cell phone. Compare this to a PList parser that takes about a page of Java to implement fully.
XML is meant to be "transformable" to other formats but the truth is that XSL is an incredibly cryptic and ugly language.
I avoid the use of XML wherever possible. Occasionally a useful XML format comes out (like RSS) and in that case it's worth using. I should note that RSS makes no use of namespaces or attributes. It is very simple. That is what makes it successful.
I receive and process mapping data (road maps) in a variety of formats. I far prefer the XML based format to all the others. As it is delivered GZ compressed it is no more bulky than the others (and one in particular is far worse). It also has no difficulty representing any character (all the others assume only 1 byte characters and are frequently delivered without specifying the character set). So for transferring data between organisations which use different software and especially where localisation issues arise, XML is the best mechanism available.
I completely agree with Jim's comments concerning XML, and with the others on here as well: XML is anything but readable. Web Services are conceptually the same as RPCs, CORBA, etc... Same stuff with a different implementation (yawn!) The comment that Jim made about the UNIX tools was to me the most important thought. Those tools were simple to use and you could accomplish a lot of different things very easily. Oh yeah they were *readable* too! ;-) These thoughts made me think about how similar the concept of Jini/Javaspaces is, in that we can assemble tools, aka Jini services, and present the results in the same nice *piped*, aka Javaspaces, fashion. The tool makers can decide what the best I/O format would be for those tools based on their requirements. Plus it's all wonderfully distributed! The power of this one idea is vast. I know that I am way off the subject of XML, but I just could not resist the other thoughts that popped in when I read this.
Six years ago I went to a talk on XML. The speaker was smart, articulate and well organized. The talk was by far the most tedious technology talk I had ever heard.
What is XML? Let me count the meanings:
it's a religious icon
it's a silver bullet
it's an insurance policy
it's peace of mind
it's fat
it's fashionable
it's unreadable
it's usually inappropriate
My frustration is how quickly smart people lose their critical faculties when facing something that is buzzword compatible. What's up with XML for configuration files? How are they an improvement over a property file? Or using XML as a "transport" between two components within an enterprise?
yes, it's craaaaaaaap. but it is also symptomatic of a general dumbing down of the US of A; perhaps the entire western world. that's the sad part. ok, maybe a tad hyperbolic. but consider how much other ghastly non-thinking has gone on in the general population the last 5 years. aren't we supposed to be smarter than your average bear?
Dr. Codd figured out a way to replace IMS with something so much smarter (and managed to really annoy his employer in the process), yet the java/web/KoolAidDrinkers think that they've *invented* something. gad. tagged text data files were au courant at DEC/IBM/etc. in the 1960s.
not that i carry bill's water, but winFS is a radical nose thumbing: relationalize (and embed in the OS, a la AS/400) everything. if it weren't a M$ initiative, i might want to see it succeed.