The Artima Developer Community
Weblogs Forum
Presenting a Simpler Alternative to XML: PicoML

25 replies on 2 pages. Most recent reply: Nov 20, 2005 10:42 AM by Christopher Diggins

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Presenting a Simpler Alternative to XML: PicoML
Posted: Nov 2, 2005 4:10 PM
Summary
I have devised a simple mark-up scheme for my own purposes which is much simpler than XML.
If you have ever looked at the official XML specification then you know that as far as mark-up languages go, it is far more complicated than it needs to be. Writing an XML parser is supposed to be easy, but in practice is very hard to do correctly. This is a great example of how bad things get when technology is designed by committee.

Anyway, enough bitching. I currently need a simple and efficient markup language to represent textual data with a hierarchical structure, and I came up with the following format I'm calling PicoML:

  document ::=
    "[#picoml]" tagged_content

  tagged_content ::=
    open_tag content close_tag

  content ::=
    text? (tagged_content | text)*

  open_tag ::=
    "[" text "]"

  close_tag ::=
    "[/]"

  text ::=
    ( ^("]" | "[" | "/" | "\") | escaped_char )+

  escaped_char ::=
    "\[" | "\/" | "\]" | "\\"
And that is literally it! No attributes, no entities, no DTDs, no B.S. Just a compact and easily parsed markup language.
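
To give a feel for how little code the grammar above needs, here is a minimal recursive-descent sketch in Python. It is not from the original post: the names `parse_picoml`, `_text`, and `_tagged` are made up for illustration, and it assumes that `text` means a run of one or more characters, that whitespace between the `[#picoml]` header and the root tag can be ignored, and that a bare `/` in text is an error because the grammar reserves it.

```python
from typing import List, Tuple, Union

Node = Tuple[str, List[Union[str, "Node"]]]  # (tag name, children)
HEADER = "[#picoml]"
SPECIAL = "[]/\\"  # the four characters the grammar reserves


def parse_picoml(source: str) -> Node:
    """Parse a PicoML document into nested (tag, children) tuples."""
    if not source.startswith(HEADER):
        raise ValueError("document must start with " + HEADER)
    pos = len(HEADER)
    # Assumption: whitespace between the header and the root tag is ignored.
    while pos < len(source) and source[pos].isspace():
        pos += 1
    node, pos = _tagged(source, pos)
    if source[pos:].strip():
        raise ValueError(f"unexpected content at offset {pos}")
    return node


def _text(s: str, pos: int) -> Tuple[str, int]:
    """Read a run of plain or escaped characters; stop at '[' or ']'."""
    out = []
    while pos < len(s) and s[pos] not in "[]":
        ch = s[pos]
        if ch == "\\":
            if pos + 1 >= len(s) or s[pos + 1] not in SPECIAL:
                raise ValueError(f"bad escape at offset {pos}")
            out.append(s[pos + 1])
            pos += 2
        elif ch == "/":
            raise ValueError(f"unescaped '/' at offset {pos}")
        else:
            out.append(ch)
            pos += 1
    return "".join(out), pos


def _tagged(s: str, pos: int) -> Tuple[Node, int]:
    """Parse open_tag content close_tag starting at s[pos]."""
    if not s.startswith("[", pos):
        raise ValueError(f"expected '[' at offset {pos}")
    name, pos = _text(s, pos + 1)
    if not name or not s.startswith("]", pos):
        raise ValueError(f"malformed open tag at offset {pos}")
    pos += 1
    children: List[Union[str, Node]] = []
    while True:
        if s.startswith("[/]", pos):
            return (name, children), pos + 3
        if pos >= len(s):
            raise ValueError("missing close tag [/]")
        if s[pos] == "[":
            child, pos = _tagged(s, pos)
            children.append(child)
        elif s[pos] == "]":
            raise ValueError(f"unexpected ']' at offset {pos}")
        else:
            text, pos = _text(s, pos)
            children.append(text)
```

Calling `parse_picoml("[#picoml][cd][title]Maggie May[/][/]")` yields a `("cd", [...])` tuple whose only child is the `title` node. Whitespace inside content is kept as text, since the grammar does not say otherwise.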

It is trivial to map an XML document to PicoML. Take for instance the following XML:

  <cd>
  <title>Maggie May</title>
  <artist>Rod Stewart</artist>
  <country>UK</country>
  <company>Pickwick</company>
  <price>8.50</price>
  <year>1990</year>
  </cd>
In PicoML this becomes:
  [cd]
  [title]Maggie May[/]
  [artist]Rod Stewart[/]
  [country]UK[/]
  [company]Pickwick[/]
  [price]8.50[/]
  [year]1990[/]
  [/]
If someone goes around stuffing data in attributes (and violating the spirit of XML), the mapping is still trivial:
  <stupid fu="bar"/>
becomes the following:
  [stupid][fu]bar[/][/]
In fact I think there is a relatively trivial XSLT to do the mapping automatically. So who else is tired of looking at an emperor with no clothing?
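
The post suggests XSLT for the mapping; as a rough alternative, the same idea can be sketched in Python with the standard `xml.etree.ElementTree` module. The function names `escape`, `element_to_picoml`, and `xml_to_picoml` are hypothetical, and the sketch assumes that attributes are emitted as child elements before the element's own text, and that namespaces, comments, and processing instructions can be ignored.

```python
import xml.etree.ElementTree as ET


def escape(text: str) -> str:
    """Backslash-escape the four characters PicoML reserves."""
    return "".join("\\" + ch if ch in "[]/\\" else ch for ch in text)


def element_to_picoml(elem: ET.Element) -> str:
    """Render one XML element, attributes first, as PicoML."""
    parts = ["[" + escape(elem.tag) + "]"]
    # Attributes become child elements, as in the <stupid fu="bar"/> example.
    for name, value in elem.attrib.items():
        parts.append("[" + escape(name) + "]" + escape(value) + "[/]")
    if elem.text:
        parts.append(escape(elem.text))
    for child in elem:
        parts.append(element_to_picoml(child))
        # Text that follows a child element still belongs to the parent.
        if child.tail:
            parts.append(escape(child.tail))
    parts.append("[/]")
    return "".join(parts)


def xml_to_picoml(xml_source: str) -> str:
    """Convert an XML string to a PicoML document, header included."""
    return "[#picoml]" + element_to_picoml(ET.fromstring(xml_source))
```

Whitespace between XML elements is carried over unchanged, so pretty-printed input produces PicoML with the same line breaks.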


Sean Conner

Posts: 19
Nickname: spc476
Registered: Aug, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 6:37 PM

(cd
(title Maggie May)
(artist Rod Stewart)
(country UK)
(company Pickwick)
(price 8.50)
(year 1990)
(stupid (foo bar))
)


But on a more serious note, what if I want to include a literal "[" as text? What then? And to be more pedantic, what unit is the price listed in? British pounds? Attributes can be useful in XML, so the price tag can be defined as:

<price scheme="ISO-4217" currency="GBP">8.50</price>

But in your simpler method, is there a preferred order for the “attributes”?

Another question: who writes their own XML parsers? There exist XML parsers, for free, that can be used.

Derek Parnell

Posts: 22
Nickname: derekp
Registered: Feb, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 7:00 PM
> If you have ever looked at the official XML
> specification (http://www.w3.org/TR/REC-xml/) then you know
> that as far as mark-up languages go, it is far more
> complicated than it needs to be.

You are not alone. I think everybody who has had a good look at XML has said (at least once) "there's got to be a better way". Some, like yourself, have come up with alternatives that do mostly the same job but with a much easier-to-use syntax.


One of my favourites goes a bit like this (from memory)


document ::=
[meta_content] tagged_content

meta_content ::=
"(" "*" ":" content ")"

tagged_content ::=
open_tag content close_tag

content ::=
tagged_content* | text

open_tag ::=
"(" id_text ":"

close_tag ::=
")"

id_text ::=
alpha id_char*

id_char ::=
alpha | digit | "_" | "$" | "@" | "#"

text ::=
^( "\" | "(" | ")" ) | escaped_char

escaped_char ::=
"\(" | "\)" | "\\"

So using your example we could get something like...

(*: (encoding: utf-8) (keywords: recording,music,pop) )
(cd:
(title: Maggie May)
(artist: Rod Stewart)
(country: UK)
(company: Pickwick)
(price: (AUD: 8.50) (USD: 5.60) )
(year: 1990)
)

> If someone goes around stuffing data in attributes (and
> violating the spirit of XML), the mapping is still
> trivial:
>
> <stupid fu="bar"/>

becomes the following:

(stupid: (fu: bar) )


> So who else is tired of looking
> at an emperor with no clothing?

Lots of people. Most have reached a conclusion, and a result, similar to yours.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 7:16 PM
> what if I want to include a
> literal "[" as text? What then?

Take a closer look at the BNF; it is already covered as an escaped_char:

\[ , \] , \/ , \\

> And to be more pedantic,
> what unit is the price listed in? British pounds?

If you want more information, add more tags.

[price]8.50[currency]pounds[/][/]

> Attributes can be useful in XML.

On this I 100% disagree. They simply add confusion, ambiguity and complexity. Everything can be covered using elements. Attributes are redundant.

> so the price tag can be
> defined as:
>
> <price scheme="ISO-4217"
> currency="GBP">8.50</price>


With the XML version it is easier to make mistakes (mismatched closing tags), it is more verbose, and it is harder and slower to parse. I see no advantages over:

[price][scheme]ISO-4217[/][currency]GBP[/]8.50[/]

> But in your simpler method, is there a preferred order for
> the “attributes”?

I don't understand the question.

> Another question: who writes their own XML parsers?
> There exist XML parsers, for free, that can be used.

And the overwhelming majority are incorrect! And the ones that aren't are really really really slow.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 7:21 PM
> One of my favourites goes a bit like this (from memory)
[snip]

That markup language is quite cool, do you remember the name?

Derek Parnell

Posts: 22
Nickname: derekp
Registered: Feb, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 7:45 PM
> > One of my favourites goes a bit like this (from memory)
> [snip]
>
> That markup language is quite cool, do you remember the
> name?

LOL! I never got to name it. It was one of a few that I dreamed up over the years. But it is the one I like the most. Let's call it YAM (yet another markup) ;-)

The semantics are simple too. The only tricky bit was that it allowed matching parentheses inside a 'content'. You could have ...

(title: The (Many) Loves of Dobie Gillis)

so you only had to 'escape' unmatched parentheses. Also, text enclosed in matching quotes was not examined for parentheses and would not be trimmed. The quotes could be a single quote ', double quote ", or back-quote `.


(extention: " and ')'" )


where in this case the content of the tagged entry 'extention' would be the eight characters between the matching double-quotes.
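
As a rough illustration of those two rules (matched parentheses are allowed, and quoted runs are skipped), here is a small Python sketch. It is not the original implementation: `find_close` and `leaf_value` are hypothetical names, the sketch handles only leaf entries with no nested tags, it treats any quote character as the start of a quoted run, and it skips over backslash escapes without decoding them.

```python
QUOTES = "'\"`"


def find_close(s: str, start: int) -> int:
    """Return the index of the ')' that closes the '(' at s[start]."""
    depth = 0
    pos = start
    while pos < len(s):
        ch = s[pos]
        if ch == "\\":
            pos += 2          # skip the escaped character
            continue
        if ch in QUOTES:
            end = s.find(ch, pos + 1)
            if end == -1:
                raise ValueError(f"unterminated quote at offset {pos}")
            pos = end + 1     # nothing inside quotes is examined
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return pos
        pos += 1
    raise ValueError("unbalanced parentheses")


def leaf_value(entry: str) -> tuple:
    """Split a leaf entry such as '(title: Some (Nested) Text)' into (name, value)."""
    close = find_close(entry, 0)
    name, sep, raw = entry[1:close].partition(":")
    if not sep:
        raise ValueError("missing ':' after tag name")
    value = raw.strip()
    # A fully quoted value keeps its inner whitespace exactly.
    if len(value) >= 2 and value[0] in QUOTES and value[-1] == value[0]:
        value = value[1:-1]
    return name.strip(), value
```

Under these assumptions an apostrophe in ordinary text would open a quoted run, so a real implementation would need a clearer rule for when a quote character counts as a delimiter.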

I used this mark up in a GUI Form definition language for Windows programming that I wrote.

Morel Xavier

Posts: 73
Nickname: masklinn
Registered: Sep, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 7:58 PM
Duh, the way tags are nested looks like good ol' SGML with a generic end tag instead of no tag at all.
> > > One of my favourites goes a bit like this (from memory)
> > [snip]
> >
> > That markup language is quite cool, do you remember the
> > name?
>
> LOL! I never got to name it. It was one of a few that I
> dreamed up over the years. But it is the one I like the
> most. Let's call it YAM (yet another markup) ;-)
>
We could also call it Lisp, though :p

Derek Parnell

Posts: 22
Nickname: derekp
Registered: Feb, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 8:04 PM
> Duh, the way tags are nested looks like good ol' SGML with
> a generic end tag instead of no tag at all.

Originality ain't one of my strong points !

[snip]

> We could also call it Lisp, though :p

Nah, that name's been taken ... how about "Lithp" ?

m l

Posts: 1
Nickname: mrmathieu
Registered: Nov, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 8:19 PM
YAML and JSON come to mind, and they already have parsers in a lot of languages. Maybe you don't need to invent yet another data representation format.

http://www.yaml.org
http://www.json.org

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 11:30 PM
Have you ever heard of YAML?
(http://www.yaml.org/)

Last week you were suggesting storing program source code in XML, and several people pointed out the complexity of that. This week XML is too complicated now that you are apparently writing your own parser (why?). Will PicoML be a better fit for storing source code?

> So who else is tired of looking at an emperor with no clothing?

Indeed.

Greg Jorgensen

Posts: 65
Nickname: gregjor
Registered: Feb, 2004

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 2, 2005 11:39 PM
> > Another question: who writes their own XML parsers?
> > There exist XML parsers, for free, that can be used.
>
> And the overwhelming majority are incorrect! And the ones
> that aren't are really really really slow.

Proof? Examples? Incorrect according to what test cases? Slow according to what metric? I don't want to defend XML here, but you throw these rocks over the wall all the time, claiming that some language or technique is poorly designed, poorly implemented, slow, or unsuitable for your purposes. Could you for once back up these statements with an actual example or proof? Until you have a 100% correct and fast XML parser (or programming language, or macro processor) that all others can be judged against, your claims have no credibility.

Kristian Dupont

Posts: 22
Nickname: chryler
Registered: Dec, 2003

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 3, 2005 4:32 AM
I don't know whether you are actually trying to come up with an alternative to XML or not; it seems rather theoretical to me. If not, then I don't think that the question is very interesting. A grammar for a small utility language used only by you and perhaps a few others would have value if it presented something novel, which is not the case here as far as I can tell.
In case your purpose is to actually compete with XML, just being "simpler" sounds to me very much like the original goal of Java compared to C++, and now look at the complexity of Java 5. Inertia picks up as your language matures. People will begin asking for schemas, namespaces, include mechanisms, and so on, not to mention tools and libraries in all kinds of languages.

Harrison Ainsworth

Posts: 57
Nickname: hxa7241
Registered: Apr, 2005

in defense of xml, and alternatives Posted: Nov 3, 2005 7:43 AM
Facile criticisms of XML are popular, and worth little. The designers had engineering tradeoffs to make and not a free hand to create the perfect solution.

Anyway, check out:
http://www.pault.com/pault/pxml/xmlalternatives.html

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 3, 2005 10:43 AM
> One of my favourites goes a bit like this (from memory)

The more I look at your markup the more I like it! It is essentially an s-expression with labels, which significantly enhances plain old s-expressions. If I were to modify the spec for my own purposes (giving you credit of course) I would make the following changes:

First I would strip the meta_content. I don't see any sufficiently compelling reason to have separate meta tags versus ordinary tags. Next I would remove the special requirements for id characters. This would increase the speed of the parser, and have an interesting side-effect: labels can be used as raw data. Next I would remove the string parsing rules (which incidentally you left out of the BNF). Finally I would leave out the rule that only unmatched parentheses need escaping. It may be convenient for a human, but it adds more complexity to the parser.

My goal is to have a specification that has as few rules as possible and can be parsed with blinding speed.

The end result would be:


document ::=
tagged_content

tagged_content ::=
open_tag content close_tag

content ::=
text? (tagged_content text?)*

open_tag ::=
"(" text ":"

close_tag ::=
")"

text ::=
( ^( "\" | "(" | ")" | ":" ) | escaped_char )+

escaped_char ::=
"\(" | "\)" | "\\"
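
A minimal Python sketch of a parser for this reduced grammar might look like the following. The names `parse_labeled`, `_text`, and `_tagged` are illustrative only. The sketch assumes that `text` is a run of one or more characters and that surrounding whitespace is trimmed, with whitespace-only text dropped, which the grammar itself does not state. Because the grammar lists no escape for ":", a bare colon inside text is rejected here.

```python
from typing import List, Tuple, Union

Node = Tuple[str, List[Union[str, "Node"]]]  # (label, children)


def parse_labeled(source: str) -> Node:
    """Parse one labeled s-expression into nested (label, children) tuples."""
    node, pos = _tagged(source, 0)
    if source[pos:].strip():
        raise ValueError(f"unexpected content at offset {pos}")
    return node


def _text(s: str, pos: int) -> Tuple[str, int]:
    """Read plain or escaped characters; stop at '(', ')' or ':'."""
    out = []
    while pos < len(s) and s[pos] not in "():":
        if s[pos] == "\\":
            if pos + 1 >= len(s) or s[pos + 1] not in "()\\":
                raise ValueError(f"bad escape at offset {pos}")
            out.append(s[pos + 1])
            pos += 2
        else:
            out.append(s[pos])
            pos += 1
    return "".join(out), pos


def _tagged(s: str, pos: int) -> Tuple[Node, int]:
    """Parse '(' label ':' content ')' starting at s[pos]."""
    if not s.startswith("(", pos):
        raise ValueError(f"expected '(' at offset {pos}")
    label, pos = _text(s, pos + 1)
    label = label.strip()
    if not label or not s.startswith(":", pos):
        raise ValueError(f"malformed open tag at offset {pos}")
    pos += 1
    children: List[Union[str, Node]] = []
    while pos < len(s):
        ch = s[pos]
        if ch == ")":
            return (label, children), pos + 1
        if ch == "(":
            child, pos = _tagged(s, pos)
            children.append(child)
        elif ch == ":":
            raise ValueError(f"unescaped ':' at offset {pos}")
        else:
            text, pos = _text(s, pos)
            # Assumption: trim text and drop whitespace-only runs.
            if text.strip():
                children.append(text.strip())
    raise ValueError("missing ')'")
```

Since labels are read with the same routine as text, any non-empty run of characters works as a label, which matches the "labels can be used as raw data" side-effect.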


What do you think?

Reno C.

Posts: 10
Nickname: saxml
Registered: Oct, 2005

Re: Presenting a Simpler Alternative to XML: PicoML Posted: Nov 3, 2005 10:59 AM
> I have devised a simple mark-up scheme for my own purposes

Is it for your human-readable abstract trees? (Your previous ASTXML post)

> What do you think?

Lack of namespace support, maybe ;)
