The Artima Developer Community

Java Community News
Brett McLaughlin: What is XML Really Good For?

81 replies on 6 pages. Most recent reply: Mar 8, 2007 3:37 AM by Antti Tuomi

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 8:39 AM
Let's put it this way. If you were a young person growing up in Brazil and you wanted to learn a language that would most help you in your quest to become a computer programmer, what language should you learn? The answer is English. It's not because English is the best language for computer discussion (whether it is or not is not relevant); it's because pretty much all the information you will want is in English and most people that you will want to communicate with will speak English.

I have done a lot of work with XML and I agree that it is not really a great solution for communicating with external entities. For one thing, the verbosity creates message bloat. People may say this is trivial but when you are routinely sending 10+MB files and having to store them, the bloat becomes an issue. Compression is an option but that's a work-around, not an argument.

Given all of this, we still use XML. The reason why is the same reason a computer science student in Brazil learns English. It's what the entities I want to communicate with understand and communicate in. Saying that isn't a valid reason to use it makes no sense to me. It's not the only reason to choose a language but it's definitely an important one.
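
To put a rough number on the bloat point, here is a minimal Python sketch. The order records, field names, and record count are invented for illustration and are not taken from any real system. It serializes the same records as XML and as a pipe-delimited text format, then gzips both so the sizes can be compared.

```python
import gzip
import xml.etree.ElementTree as ET

# Hypothetical records, invented purely for illustration.
records = [
    {"sku": f"A{i:05d}", "qty": str(i % 7 + 1), "price": f"{(i % 90) + 9}.99"}
    for i in range(1000)
]

def to_xml(rows):
    """Serialize rows as <orders><order><sku>..</sku>...</order></orders>."""
    root = ET.Element("orders")
    for row in rows:
        order = ET.SubElement(root, "order")
        for key, value in row.items():
            ET.SubElement(order, key).text = value
    return ET.tostring(root, encoding="utf-8")

def to_delimited(rows):
    """Serialize rows as one pipe-delimited line per record."""
    lines = ["|".join(row.values()) for row in rows]
    return "\n".join(lines).encode("utf-8")

xml_bytes = to_xml(records)
flat_bytes = to_delimited(records)

sizes = {
    "xml": len(xml_bytes),
    "delimited": len(flat_bytes),
    "xml_gzip": len(gzip.compress(xml_bytes)),
    "delimited_gzip": len(gzip.compress(flat_bytes)),
}
for name, size in sizes.items():
    print(f"{name:>15}: {size} bytes")
```

The exact ratios depend entirely on the data. Compression shrinks the XML form, but the verbose form still has to be produced and parsed on both ends.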

ERK

Posts: 10
Nickname: erk
Registered: Dec, 2004

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 9:02 AM
Thanks for clarifying.

> ...
> It's not because English is the best language
> for computer discussion (whether it is or not is not
> relevant); it's because pretty much all the information
> you will want is in English and most people that you will
> want to communicate with will speak English.

Agreed.

> I have done a lot of work with XML and I agree that it is
> not really a great solution for communicating with
> external entities. For one thing, the verbosity creates
> message bloat. People may say this is trivial but when
> you are routinely sending 10+MB files and having to store
> them, the bloat becomes an issue. Compression is an
> option but that's a work-around, not an argument.

Agreed again.

> Given all of this, we still use XML. The reason why is
> the same reason a computer science student in Brazil
> learns English. It's what the entities I want to
> communicate with understand and communicate in.

If you need to communicate with an existing component that uses XML, then sure, that makes a great deal of sense at that component boundary. I just don't think that reasons beyond that one justify the use of XML.

Specifically, I think XML falls short as a format for configuration files, datastores/databases, and messages between components over which you have control.

And XML is a miserable choice for describing languages; to compare two XML-related attempts, I find Relax NG much more pleasant, and expressive, than XML Schema. XML Schema is fairly rich, but ugly and verbose.

> Saying
> that isn't a valid reason to use it makes no sense to me.
> It's not the only reason to choose a language but it's
> definitely an important one.

I used the word "valid" incorrectly in my previous post - I meant XML on its own is insufficient, because it's not a language we use directly. It's a language syntax, one component of a meta-language (and I'm probably being sloppy here, but...). One still needs a schema or ontology, and some description of semantics. Unless there's an existing XML schema, and unless your partner is wedded to parsing XML, you have a choice whether or not to use XML. If you choose it, your recipient will still need to implement the proper hooks for converting your XML into their internal structures. Doing so with other syntaxes is fairly easy, though far less frequently done.

The tangential points I raised circle around the origin of languages, and are important with regard to a future beyond an immediate need to integrate with an existing XML service. To rehash: the history of software demonstrates that new, backward-incompatible languages can and do arise despite history, which is much less weighty in software than in human communications. So, by making choices and developing new formats and languages, we can influence the future of interoperability far more quickly and effectively than we can influence the future of English.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 9:12 AM
> > This still has nothing to do with the question I asked.
> > Why are we conversing in English and not some other
> > language? Do you not know the answer?
>
> I answered it below. However, since you seem to just love
> this dance, why don't you just rephrase and answer your
> own question: why are we conversing in English? I
> responded in English because I know English and because
> the earlier posts were all in English.

Exactly.


> Since you seem to enjoy repeatedly stating that I'm
> drifting off-topic, I'll drop this, as indeed I'm about to
> do with the entire conversation. If your purpose was to
> teach, you're failing; if your purpose was to exhaust me,
> you're succeeding.

I'm not trying to teach you. I'm making an argument.

> > As you mention Esperanto was designed. You could design a
> > spoken language and start communicating in it, could you
> > not?
>
> No - communicating with whom? Human languages are only
> useful to the extent that the participants understand
> them. Computer languages share that aspect with human
> languages, but the participants are not only easier to
> teach, but distributable and capable of being provided,
> cheaply, by third parties.

But this implies existing languages, does it not? Before you were saying something about not being confined to existing languages.

> > > Commonality, based primarily on popularity, based
> > > primarily on economics.
> >
> > But you have suggested that popularity is not a valid
> > choice for choosing which language to communicate with.
>
> I didn't suggest that for the general case of "languages."
> I suggested it was far less important for computer
> languages than for human ones.

I agree it is less but I think you overestimate how much less important it is.

> > Should we reconsider which language to communicate in on
> > these fora? Maybe we should create a new language.
> > Natural languages tend to be imprecise.
>
> The point is irrelevant, since the conversations on these
> fora aren't between components of systems.

It's irrelevant what entities are. Two or more entities are communicating via a language. This is exactly the same as two companies communicating. Two companies could decide to make their computers communicate with English. It would be a disaster but they could try.

> > > > Software is a human creation.
> > >
> > > One far more explicit and conscious than a human
> > > language, which is more organic, evolved over eons. With
> > > the exception of Esperanto, of course.
> >
> > You don't think programming languages have evolved?
> > That's demonstrably false. Computer languages have an
> > ancestry that includes 18th century looms.
>
> I didn't say that. Read again. I said human language
> evolves in a more organic (read: so complex as to be
> unplannable) fashion, and that computer languages are more
> explicitly designed, and consciously changed. Ancestry is
> irrelevant; computer languages are designed explicitly.
> They don't have DNA, except to the extent that the
> designer(s) insert it.

Why does this matter? What difference does it make that it was a conscious change or not? What part of your argument does this point support?

> > Are you suggesting that if I want to create a B2B service
> > for placing an order at a wholesaler, that I should create
> > a new language to do so?
>
> Possibly. It depends on the service and the wholesaler,
> and on the availability of existing interfaces and
> languages for same.

So let's say you work for a wholesaler and decide XML and all other existing choices are no good so you create a new language called 'order' which will be used to accept orders from your customers. A competing company creates an XML-based service. Amazon knows and uses XML. Apple knows and uses XML. Basically every company you want to do business with uses XML or EDI. No one knows this language you just created. You require that all of these companies support your new language in order to do business with you.

If you tried to do this, I'll tell you what would happen. If you are not canned, you will be using XML before the end of the quarter because these other companies don't give a rat's ass about how ugly you think XML is.

> Are you suggesting that XML is a valid
> language on its own? It's not.

Nope. I never suggested anything like that. In fact there are hundreds of standards on top of XML for all manner of business. If you want to use those standards you must use XML.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 9:35 AM
> If you need to communicate with an existing component that
> uses XML, then sure, that makes a great deal of sense at
> that component boundary. I just don't think that reasons
> beyond that one justify the use of XML.

I don't disagree but once people eat the XML elephant, they tend to not want to have to worry about something in addition to that. I'm not saying that is correct but it's what people do.

> Specifically, I think XML falls short as a format for
> configuration files, datastores/databases, and messages
> between components over which you have control.

I think the main reason this has become standard is because the tools are there. Things like JavaCC are changing the equation but I still would like to see a SAX style parser for custom syntax. If you know of one, I would love to know about it. Maybe JavaCC does this but I find the documentation for JavaCC inscrutable, probably because I am not a top-down learner.
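
As a rough illustration of what a SAX-style (push/event) parser for a custom syntax might look like, here is a minimal Python sketch. It assumes the line-oriented `name: key="value"` syntax with significant indentation that is discussed in this thread. The handler method names (`start_element`, `end_element`), the `CollectingHandler` class, and the `parse_events` function are invented for this example; they are not part of SAX or JavaCC.

```python
import re

ATTR = re.compile(r'(\w+)="([^"]*)"')

class CollectingHandler:
    """Records events; a real handler would build domain objects instead."""
    def __init__(self):
        self.events = []

    def start_element(self, name, attrs):
        self.events.append(("start", name, attrs))

    def end_element(self, name):
        self.events.append(("end", name))

def parse_events(text, handler):
    """Push start/end events to the handler; nesting comes from indentation."""
    open_elements = []  # stack of (indent, name) for elements not yet closed
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, rest = line.strip().partition(":")
        name = name.strip()
        # Close every open element that is not an ancestor of this line.
        while open_elements and open_elements[-1][0] >= indent:
            handler.end_element(open_elements.pop()[1])
        handler.start_element(name, dict(ATTR.findall(rest)))
        open_elements.append((indent, name))
    # End of input closes whatever is still open.
    while open_elements:
        handler.end_element(open_elements.pop()[1])

doc = '''parent: name="bill" type="father"
  child: name="junior" type="son"
'''
handler = CollectingHandler()
parse_events(doc, handler)
for event in handler.events:
    print(event)
```

As with SAX, the parser keeps no tree in memory; the only state is the stack of currently open elements.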

> And XML is a miserable choice for describing languages; to
> compare two XML-related attempts, I find Relax NG much
> more pleasant, and expressive, than XML Schema. XML Schema
> is fairly rich, but ugly and verbose.

No argument there.

> > Saying
> > that isn't a valid reason to use it makes no sense to me.
> > It's not the only reason to choose a language but it's
> > definitely an important one.
>
> I used the word "valid" incorrectly in my previous post -
> I meant XML on its own is insufficient, because it's not a
> language we use directly. It's a language syntax, one
> component of a meta-language (and I'm probably being
> sloppy here, but...). One still needs a schema or
> ontology, and some description of semantics. Unless
> there's an existing XML schema, and unless your partner is
> wedded to parsing XML, you have a choice whether or not to
> use XML. If you choose it, your recipient will still need
> to implement the proper hooks for converting your XML into
> their internal structures. Doing so with other syntaxes is
> fairly easy, though far less frequently done.

I've actually seen a case where an existing syntax was parsed and with much effort converted to XML so that people could parse the XML to get the data in the first document. There's a mindlessness that has come about from the popularity of XML. I get where you are coming from.

The common perception is that XML is easier to integrate into code. I'm not sure this is the case. If you mean to fight the XML tide, you'd probably win more converts by showing that this is not the case. Without knowing the alternatives, people will not abandon XML. Some of this is just 'nobody gets fired for buying IBM' type logic. 'Nobody' gets fired for using XML.

A few posts back I posted a hypothetical syntax. It's not completely hypothetical. I've been thinking that simple tools should allow that syntax to be converted to XML in all cases. That would allow all the tools that exist around XML to be used and allow for communication with those that support XML without using XML. Do you feel this still does not address the issues you have with XML?
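
A minimal sketch of such a conversion tool, in Python, under stated assumptions: the syntax is taken to be one element per line in the form `name: key="value"`, with indentation expressing nesting and no mixed content. The function name `indented_to_xml` is hypothetical, and the sketch ignores escaping rules and error reporting that a real tool would need.

```python
import re
import xml.etree.ElementTree as ET

ATTR = re.compile(r'(\w+)="([^"]*)"')

def indented_to_xml(text):
    """Convert 'name: key="value"' lines (indentation = nesting) to an XML string."""
    stack = []   # (indent, element) pairs for the currently open elements
    root = None
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, rest = line.strip().partition(":")
        # Drop elements that are not ancestors of the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        attrs = dict(ATTR.findall(rest))
        if stack:
            element = ET.SubElement(stack[-1][1], name.strip(), attrs)
        elif root is None:
            element = root = ET.Element(name.strip(), attrs)
        else:
            raise ValueError("XML allows only one root element")
        stack.append((indent, element))
    if root is None:
        raise ValueError("empty document")
    return ET.tostring(root, encoding="unicode")

source = '''parent: name="bill" type="father"
  child: name="junior" type="son"
'''
print(indented_to_xml(source))
```

Because the output is ordinary XML, it can be handed to any existing XML tool. The reverse direction would only work for XML documents that avoid mixed content.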

> The tangential points I raised circle around the origin of
> languages, and are important with regard to a future
> beyond an immediate need to integrate with an existing XML
> service. To rehash: the history of software demonstrates
> that new, backward-incompatible languages can and do arise
> despite history, which is much less weighty in software
> than in human communications. So, by making choices and
> developing new formats and languages, we can influence the
> future of interoperability far more quickly and
> effectively than we can influence the future of English.

Haven't you been impacted by leveraged mindshare?

I heard someone use the term 'plus-up' on the radio recently and couldn't stop thinking about "1984" for the rest of the week.

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 9:37 AM
Since we've come to a common ground you can pretty much ignore the post before the previous one. It's redundant.

ERK

Posts: 10
Nickname: erk
Registered: Dec, 2004

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 1:18 PM
Sorry for prolonging; I'll try to make this brief.

> But this implies existing languages, does it not? Before
> you were saying something about not being confined to
> existing languages.

I just meant one of us can create a computer language, and a library to support it (parser/client stub/language bindings) and distribute it, all fairly easily. That process is countless orders of magnitude easier than launching (and finding acceptance for) a new human language, due to the far more restricted domain of a computer language.

> > I suggested it was far less important for computer
> > languages than for human ones.
>
> I agree it is less but I think you overestimate how much
> less important it is.

I think we do disagree on this factor. It's certainly possible I'm blue-skying it.

> It's irrelevant what entities are. Two or more entities
> are communicating via a language. This is exactly the
> same as two companies communicating.

Computers have far more limited domains of discourse than either companies or individuals, and demand precise protocols. I think there's a world of difference.

> Why does this matter? What difference does it make that
> it was a conscious change or not? What part of your
> argument does this point support?

I don't think analogies to human languages mean much. Designed human languages face a steep uphill battle, as opposed to nonconsciously-evolved human languages. Computer languages face a hill much less steep. There are countless other differences; analogies to human languages unnecessarily anthropomorphize software.

> So let's say you work for a wholesaler and decide XML and
> all other existing choices are no good so you create a new
> language called 'order' which will be used to accept
> orders from your customers. A competing company creates
> an XML-based service. Amazon knows and uses XML. Apple
> knows and uses XML. Basically every company you want to
> do business with uses XML or EDI. No one knows this
> language you just created. You require that all of these
> companies support your new language in order to do
> business with you.
>
> If you tried to do this, I'll tell you what would happen.
> If you are not canned, you will be using XML before the
> end of the quarter because these other companies don't
> give a rat's ass about how ugly you think XML is.

They might if I can supply a language that's more manageable than XML, and libraries that support it. I'm not in the business above; we tend to be the consumer of interfaces, rather than a supplier.

An alternative to XML would presumably also be a syntax for defining domain-specific languages; it wouldn't need to be wholesale-specific. It would have opportunities to gain developer mindshare in other business domains, perhaps even internal interfaces between components, in which developers have more influence.

ERK

Posts: 10
Nickname: erk
Registered: Dec, 2004

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 1:25 PM
> Having now worked with a
> variety of application domains, I have to say that lots of
> domains deal with truly hierarchical data, and people tend
> to shoehorn that sort of data into relational schemas,
> perhaps because whether to use a relational database for
> persistence is not a question many ask. O/R mapping tools
> have alleviated the need to think about that shoehorning
> too much, since they can pretty painlessly project
> hierarchical data in a relational form, and vice versa.

Having now worked with a variety of application domains (auto paint formulas, invoicing systems, order and warehouse management systems, print job management systems, and consumer loan systems), I have to say that few if any domains deal with truly hierarchical data, and people tend to shoehorn that sort of data into hierarchical XML schemas. I've come to opposite conclusions from you; relations better represent every domain I'm familiar with, as they represent logical predicates (thus covering more "business rules") which are ubiquitous.

I understand bills of materials are interesting hierarchies - they appear to be about the only ones. Org charts and file systems, both formerly paragons of hierarchy, are decreasingly so.

ERK

Posts: 10
Nickname: erk
Registered: Dec, 2004

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 1:27 PM
> Actually, data are neither hierarchical, nor relational.
> Data are functional, i.e. relations between data are
> defined by a mathematical function.

Perhaps, but my understanding was that relations encode one or more functional dependencies and eliminate the declaration of some constraints. You can always use binary relations with additional constraints, if you care to, but it's verbose and relations-as-predicates encode meaning in a more human-friendly way.

However, I may be misunderstanding you.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 2:01 PM
ERK writes
> XML has lots of parsers - so what? Parsing is
> well-understood, and tools like JavaCC make generating
> parsers fairly trivial.

what if you want to parse your format in multiple languages?

> You still need to write the logic
> to build whatever internal structures you require, or use
> a mapping framework

simple DOM code can be fairly portable and rewriting a bit of mapping logic is a lot less work than rewriting and debugging multiple parsers.

> > it is simple to read (verbose, but simple),
> The word "simple" requires some context

If you need to exchange data or save some configurations in an EXTENSIBLE manner, so someone else can later add embedded values or comments without breaking your format or affecting other consumers of the data, you will end up with something like XML.
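
The extensibility argument can be made concrete with a small Python sketch. The configuration documents, their element names, and the `load_settings` function are all invented for illustration. The point is only that a reader which looks up the elements it knows about keeps working when someone later adds comments, attributes, or new elements.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<config>
  <host>example.org</host>
  <port>8080</port>
</config>"""

# A later version: a comment, a new attribute, and a new element were added.
EXTENDED = """<config>
  <!-- added by another team -->
  <host region="eu">example.org</host>
  <port>8080</port>
  <retry limit="3"/>
</config>"""

def load_settings(xml_text):
    """Read only the elements this consumer knows about; ignore the rest."""
    root = ET.fromstring(xml_text)
    return {
        "host": root.findtext("host"),
        "port": int(root.findtext("port")),
    }

print(load_settings(ORIGINAL))
print(load_settings(EXTENDED))
```

The tolerance here comes from looking elements up by name rather than by position, so the older consumer sees the same settings in both documents.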

The cost of debugging binary formats for interchange and configuration is only justifiable when there is a significant saving on performance. EfficientXML and other technologies have reduced the XML transmission issue to a codec - a transport-level problem.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 2, 2007 2:10 PM
ERK
> Having now worked with a variety of application domains
> (auto paint formulas, invoicing systems, order and
> warehouse management systems, print job management
> systems, and consumer loan systems), I have to say that
> few if any domains deal with truly hierarchical data, and
> people tend to shoehorn that sort of data into
> hierarchical XML schemas.

In how many of those cases is the data in your relations adequately described when it comes to accurate specification of its values - accuracy, units of measure, precision etc.?

If I didn't have your source code, could I determine those issues?

Useful real-world data for interchange is usually related and heavily attributed. XML gives you the attribution and the ability to build relational structures.

Most systems ignore the issue of describing data types and work on assumptions that all values in a relation are specified the same way. That only works sometimes and a massive amount of work goes into coping when it doesn't. A good example is the exchange of Assay information when exploration boreholes are being examined by laboratories to determine mineral composition.

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 3, 2007 6:46 AM
> > > Most data in the world is not, upon analysis,
> > > hierarchical.
> that may be true - but it is also semi-structured, i.e.
> incomplete - a fact that is naturally modelled in XML
> (missing child elements or attributes). In contrast, this
> is hard or impossible to model in SQL (it can be somewhat
> modelled using many joins)

there is no model in XML, only syntax. the rest is the debate over NULL. any downstream software will have to deal with this anomaly. it doesn't go away.

robert young

Posts: 361
Nickname: funbunny
Registered: Sep, 2003

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 3, 2007 6:48 AM
> > Most data in the world is not, upon analysis,
> > hierarchical.
>
> XML is not inherently hierarchical either.
>
>
> id=123
>
> ref=123

in actual use, yes it is.

the id/ref implementation is global to the document (last I looked) and not the equivalent of PK/FK (if that's what you're getting at), which tie relations together.

Gregor Zeitlinger

Posts: 108
Nickname: gregor
Registered: Aug, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 3, 2007 8:55 AM
> Nodes cannot contain text and elements (no mixed content)
> The syntax is changed in the following way:
>
> <parent name="bill" type="father">
>   <child name="junior" type="son"/>
> </parent>
>
> becomes:
>
> parent: name="bill" type="father"
>   child: name="junior" type="son"
>
> Note that the newlines and indentation are syntactically
> significant ala Python.
>
> Would you agree that this would be easier for human
> consumption?
No - it's mostly a matter of taste. I'd prefer the first for two reasons. I like the closing </parent> and because significant whitespace is one of the most stupid ideas I've ever heard of.

> I imagine your DSL would look something like this in the
> new syntax:
My DSL doesn't look like either of the examples (the XML version is not well formed, because of the <).

An example would be

<cell name="minor" defaultValue="false">
  <if age="18" value="true"/>
  <condition value="true">
    <and>
      <not>
        <if eyeColor="blue"/>
      </not>
      <if voice="bass"/>
    </and>
  </condition>
</cell>

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 3, 2007 4:56 PM
Robert Young:
>> Andy Dent
> > XML is not inherently hierarchical either.
> >
> >
> > id=123
> >
> > ref=123
>
> in actual use, yes it is.
>
> the id/ref implementation is global to the document (last
> I looked) and not the equivalent of PK/FK (if that's what
> you're getting at), which tie relations together.

I have no idea where you are coming from with this. It seems like you are arguing from a position of ignorance in how a lot of real-world XML data models work.

Within an XML document, you can have a relational system with both the original simplistic id/idref and the more sophisticated relationships that are part of W3C XML Schema.

In the complex geological and mineral data we deal with this is often the case, with both internal references (starting with #) and external standard codelists or other URIs. A typical document in the assay processing domain might define a small number of procedures and have thousands of other entries referring back to them. eg:


<adx:measurement>
  <adx:Assay gml:id="abc123_a_sb1">
    <om:procedure xlink:href="#ICPMS1"/>
    <!-- this URN is the proposed OGC identifier for the concept whose description is visible as a GML Definition here https://www.seegrid.csiro.au/subversion/xmml/trunk/sweCommon/1.0.30/examples/phenomena.xml#Concentration -->
    <om:observedProperty xlink:href="urn:x-ogc:def:phenomenon:OGC:Concentration[Sb]"/>
    <om:featureOfInterest xlink:href="#abc123_a"/>
    <om:result uom="ppm">75.</om:result><!-- this is an interesting result! Better check it out with a repeat measurement -->
    <adx:analyte>Sb</adx:analyte>
  </adx:Assay>
</adx:measurement>
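
A rough Python sketch of how a consumer might resolve the internal `#` references in a document like this one, treating `gml:id` and `xlink:href` somewhat like a key and a reference to it. The sample document is a cut-down, invented variant of the fragment above: the `adx` and `om` namespace URIs, the `adx:report` wrapper, and the `adx:Procedure` element are placeholders (assumptions), and `resolve_local_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
XLINK_NS = "http://www.w3.org/1999/xlink"
GML_ID = f"{{{GML_NS}}}id"
XLINK_HREF = f"{{{XLINK_NS}}}href"

# The adx and om namespace URIs below are placeholders, not the real ones.
DOC = """<adx:report xmlns:adx="urn:example:adx" xmlns:om="urn:example:om"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <adx:Procedure gml:id="ICPMS1"/>
  <adx:Assay gml:id="abc123_a_sb1">
    <om:procedure xlink:href="#ICPMS1"/>
    <om:observedProperty xlink:href="urn:x-ogc:def:phenomenon:OGC:Concentration[Sb]"/>
    <om:result uom="ppm">75.</om:result>
  </adx:Assay>
</adx:report>"""

def resolve_local_refs(root):
    """Map each element with an internal '#id' href to the element it points at."""
    by_id = {el.get(GML_ID): el for el in root.iter() if el.get(GML_ID)}
    resolved = {}
    for el in root.iter():
        href = el.get(XLINK_HREF, "")
        if href.startswith("#"):
            resolved[el] = by_id.get(href[1:])  # None if the target is missing
    return resolved

root = ET.fromstring(DOC)
for source, target in resolve_local_refs(root).items():
    print(source.tag, "->", target.tag if target is not None else "unresolved")
```

External references such as the URN are left alone here; resolving those would need a codelist or registry lookup, which this sketch does not attempt.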

James Watson

Posts: 2024
Nickname: watson
Registered: Sep, 2005

Re: Brett McLaughlin: What is XML Really Good For? Posted: Mar 5, 2007 6:23 AM
> Sorry for prolonging; I'll try to make this brief.

Hopefully we can conclude this branch of the thread here. You make some points here that I think warrant addressing.

> > I agree it is less but I think you overestimate how much
> > less important it is.
>
> I think we do disagree on this factor. It's certainly
> possible I'm blue-skying it.

> > It's irrelevant what entities are. Two or more entities
> > are communicating via a language. This is exactly the
> > same as two companies communicating.
>
> Computers have far more limited domains of discourse than
> either companies or individuals, and demand precise
> protocols. I think there's a world of difference.

I think this is where we are getting a disconnect. You are focusing on the languages themselves. But I am not talking about the languages themselves. I do not argue that the above is not true, it's just orthogonal to my point. Two entities need a language to communicate. They must choose a language with which to converse. My point is only about how that decision is made, not about the character of the language itself.

> > Why does this matter? What difference does it make that
> > it was a conscious change or not? What part of your
> > argument does this point support?
>
> I don't think analogies to human languages mean much.
> Designed human languages face a steep uphill battle, as
> opposed to nonconsciously-evolved human languages.
> Computer languages face a hill much less steep. There are
> countless other differences; analogies to human languages
> unnecessarily anthropomorphize software.

Not to beat a dead horse but I wasn't making an analogy. Analogies fall apart at some point. I'm saying that it's exactly the same problem.

> They might if I can supply a language that's more
> manageable than XML, and libraries that support it. I'm
> not in the business above; we tend to be the consumer of
> interfaces, rather than a supplier.

I think you are missing that as the wholesaler you are just one node in a huge interconnected web. When I worked in B2B at such a company we had hundreds of vendors and customers. Each of those companies had other vendors and/or customers. So let's say I was able to come up with something that was easier to manage than XML. Amazon is still going to need XML because their other business partners use it. It just creates more things to worry about. This is the exact reason why we still had a whole group of people doing EDI.

> An alternative to XML would presumably also be a syntax
> for defining domain-specific languages; it wouldn't need
> to be wholesale-specific. It would have opportunities to
> gain developer mindshare in other business domains,
> perhaps even internal interfaces between components, in
> which developers have more influence.

I don't doubt that this could be wonderful but the conundrum is how do you get adoption to reach critical mass. It takes more than a great solution, unfortunately. Consider email. It was not designed for the purpose that it is used for now. But how do you replace it? You can create a much better solution but unless the other side is using it too, it's useless.

