The Artima Developer Community

Weblogs Forum
Proposal: XML-- (read: XML minus minus)

7 replies on 1 page. Most recent reply: Nov 11, 2005 9:02 AM by Christopher Diggins

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Proposal: XML-- (read: XML minus minus)
Posted: Nov 7, 2005 1:47 PM
Summary
Wouldn't it be nice if there existed a standard for XML when you don't need the whole thing? I propose a strict subset of XML called XML minus minus (XML--).
Often in my work I find that XML would be very handy (e.g., for configuration files or data serialization), but I only need a fraction of the specification. In these cases it doesn't make sense to embed a gargantuan, industrial-strength XML parser in my code, so I usually use a homegrown partial XML parser.

Inventing a brand new markup language is one possibility I have explored (such as Labelled S-Expressions), but to be honest I don't think the idea will take off. Many markups exist, and people prefer things that are already well-known (and well marketed) such as XML.

I want to propose a specification for a strict subset of XML called XML--. The specification is the same as that for XML 1.0 (third edition) but with the following restrictions:

  • no attributes
  • no CDATA sections
  • no processing instructions
  • no document type definitions
  • the standalone document declaration MUST have the value "yes"
  • the encoding must be UTF-8
  • support only for the entities: &amp;amp; &amp;gt; &amp;lt; &amp;quot; and &amp;apos; (that is, the references for & > < " and ')
This work is inspired by TinyXML. The main difference though is that XML-- does not support attributes. I wonder what more could be done, to make this idea into a viable specification with actual users?
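
To give a feel for how small a parser for this subset could be, here is a minimal sketch in Python. It is not a reference implementation of the proposal. The names `parse`, `unescape`, `TOKEN` and `NAME` are made up for illustration, the element-name rule is simplified, whitespace-only text is dropped, an XML declaration at the very start is skipped without being validated, and comments are rejected along with processing instructions, CDATA and DOCTYPE (all assumptions of this sketch, not part of the list above).

```python
import re

# The five predefined entities that XML-- keeps.
ENTITIES = {"&amp;": "&", "&lt;": "<", "&gt;": ">", "&quot;": '"', "&apos;": "'"}
TOKEN = re.compile(r"<[^<>]*>|[^<]+")      # a tag, or a run of character data
NAME = re.compile(r"[A-Za-z_][\w.-]*\Z")   # simplified element-name rule (assumption)


def unescape(text):
    """Expand the five allowed entity references; reject any other '&'."""
    def expand(match):
        if match.group(0) not in ENTITIES:
            raise ValueError(f"unsupported entity: {match.group(0)!r}")
        return ENTITIES[match.group(0)]
    return re.sub(r"&[^;\s]*;?", expand, text)


def parse(source):
    """Return the root element as a (name, children) tuple.

    Each child is either another (name, children) tuple or a str of text.
    """
    document = ("#document", [])
    stack = [document]
    pos = 0
    for match in TOKEN.finditer(source):
        if match.start() != pos:
            raise ValueError(f"stray '<' at offset {pos}")
        pos = match.end()
        token = match.group(0)
        if not token.startswith("<"):
            if token.strip():                  # whitespace-only text is dropped
                stack[-1][1].append(unescape(token))
        elif token.startswith("<?xml ") and match.start() == 0:
            continue                           # XML declaration: skipped, not validated
        elif token[1] in "?!":
            raise ValueError(f"not allowed in this sketch of XML--: {token}")
        elif token[1] == "/":
            name = token[2:-1].strip()
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"mismatched end tag: {token}")
            stack.pop()
        else:
            empty = token.endswith("/>")
            name = token[1:-2 if empty else -1].strip()
            if not NAME.match(name):
                raise ValueError(f"bad start tag (attributes are not allowed): {token}")
            node = (name, [])
            stack[-1][1].append(node)
            if not empty:
                stack.append(node)
    if pos != len(source) or len(stack) != 1:
        raise ValueError("unexpected end of input")
    roots = document[1]
    if len(roots) != 1 or isinstance(roots[0], str):
        raise ValueError("expected exactly one root element")
    return roots[0]
```

With no attributes to handle, a start tag is just a name, which is why the tag branch stays so short.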

Postscript: Justification for Dropping Attributes

I should give a quick justification. Attributes are dropped for several reasons:
  • speeds up parsing significantly
  • reduces complexity of the parser
  • speeds up tree-building phases
  • internal representations of the document are much simpler
No loss of information needs to occur because an XML element such as:
<mytag attribute_name="attribute_value"/>
can be rewritten trivially as:
<mytag>
  <attribute_name>attribute_value</attribute_name>
</mytag>
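
The rewrite above can be done mechanically. The following is a small sketch using Python's standard `xml.etree.ElementTree`; the function name `attributes_to_elements` is hypothetical, and the choice to place the new child elements before any existing content is an assumption of this sketch. Namespaced attribute names are not handled.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(element):
    """Recursively turn every attribute into a leading child element, in place."""
    for child in list(element):
        attributes_to_elements(child)

    inserted = []
    for index, (name, value) in enumerate(element.attrib.items()):
        child = ET.Element(name)
        child.text = value
        element.insert(index, child)
        inserted.append(child)

    if inserted and element.text:
        # Keep the original leading text after the new children.
        inserted[-1].tail = element.text
        element.text = None

    element.attrib.clear()
    return element


root = ET.fromstring('<mytag attribute_name="attribute_value"/>')
print(ET.tostring(attributes_to_elements(root), encoding="unicode"))
```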


Noam Tamim

Posts: 26
Nickname: noamtm
Registered: Jun, 2005

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 7, 2005 4:59 PM
What about namespaces? They are responsible for much of XML's bloat*. Plus, TinyXML does not support them.

The problem with eliminating attributes is that the result tends to be very big. You're likely to have no empty elements, so every tag will be repeated ( <tag>...</tag>, instead of <tag/> ). So file size can be ~twice as large as XML with many attributes.

I still like this idea, though. But only if as part of the specification there's a compression recommendation (XML-- will ZIP pretty well), and it also eliminates namespaces.

Noam.

* I realise that namespaces are essential for large applications and vendor interoperability. They are still bloated, however.
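
One way to check the size concern for a given kind of document is simply to measure it. The sketch below builds the same synthetic records in attribute form and in element-only form and compares raw and zlib-compressed sizes. The record layout and the helper names are invented for illustration, and synthetic data like this is far more repetitive than real files, so treat any numbers it prints as a rough indication only.

```python
import zlib


def user_with_attributes(i):
    # Attribute form of a made-up record.
    return f'<user login="u{i}" host="h{i}"/>'


def user_with_elements(i):
    # The same record with each attribute rewritten as a child element.
    return f"<user><login>u{i}</login><host>h{i}</host></user>"


def sizes(make_record, count=1000):
    """Return (raw_bytes, compressed_bytes) for a document of `count` records."""
    body = "".join(make_record(i) for i in range(count))
    raw = f"<users>{body}</users>".encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


for label, make_record in [("attributes", user_with_attributes),
                           ("elements", user_with_elements)]:
    raw, packed = sizes(make_record)
    print(f"{label:10s} raw={raw} compressed={packed}")
```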

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 7, 2005 5:04 PM
> What about namespaces? They are responsible for much of
> XML's bloat*. Plus, TinyXML does not support them.

Oh yeah, those! Drop 'em.

> The problem with eliminating attributes is that the result
> tends to be very big.

Agreed.

> You're likely to have no empty
> elements, so every tag will be repeated ( <tag>...</tag>,
> instead of <tag/> ). So file size can be ~twice as large
> as XML with many attributes.

Agreed.

> I still like this idea, though. But only if as part of the
> specification there's a compression recommendation (XML--
> will ZIP pretty well), and it also eliminates namespaces.

Good idea. I have an idea for a compression scheme involving hash-tables, which is easy to implement. I hope to be able to finish it in a couple of weeks, as long as I don't get too distracted.

Max Lybbert

Posts: 314
Nickname: mlybbert
Registered: Apr, 2005

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 7, 2005 5:35 PM
/* The problem with eliminating attributes is that the result tends to be very big. You're likely to have no empty elements, so every tag will be repeated ( <tag>...</tag>, instead of <tag/> ). So file size can be ~twice as large as XML with many attributes.
*/

I think part of the bloat CDiggins was referring to is in the parser (needing to parse two separate syntaxes).

However, you are very right that start and end tags do lead to unnecessarily bloated files that need parsing.

art src

Posts: 33
Nickname: articulate
Registered: Sep, 2005

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 9, 2005 7:22 PM
I love the idea of removing features. Many of these are things most people don't know about or use. Removing them makes things simpler, more predictable, and generally better.

Attributes are a convenient syntax but a useless model because, as pointed out, you can trivially rewrite them as elements.

So how complicated is the rewrite?

We already agree that <p></p> is the same as <p/>, why not agree that:

<a href="blah">...</a>

Is the same as:

<a><href>blah</href>...</a>

Don't attributes parse as something like:

([:whitespace:]+[:word:]+=("|')[^"]*(\2))*>

Removing the distinction between attributes and elements removes a decision people would otherwise have to make designing schemas. Keeping the syntax maintains the benefits others have described (size and convenience).
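
For comparison, here is roughly what attribute parsing along the lines of the pattern above might look like as working Python. This is a sketch under assumptions: `parse_start_tag` and `ATTRIBUTE` are made-up names, the name rule is simplified, and entity references inside values are left untouched. Unlike the pattern above, the value is allowed to contain the other kind of quote, with the back-reference `\2` matching whichever quote opened it.

```python
import re

# whitespace, name, '=', opening quote, value, the same quote again
ATTRIBUTE = re.compile(r"""\s+([\w:.-]+)\s*=\s*(["'])([^<]*?)\2""")


def parse_start_tag(tag):
    """Split a start tag into (name, [(attr, value), ...], is_empty_element)."""
    match = re.fullmatch(r"<([\w:.-]+)(.*?)\s*(/?)>", tag, re.DOTALL)
    if match is None:
        raise ValueError(f"not a start tag: {tag!r}")
    name, rest, slash = match.groups()

    attributes = []
    pos = 0
    while pos < len(rest):
        attr = ATTRIBUTE.match(rest, pos)
        if attr is None:
            raise ValueError(f"malformed attribute near {rest[pos:]!r}")
        attributes.append((attr.group(1), attr.group(3)))
        pos = attr.end()
    return name, attributes, bool(slash)


print(parse_start_tag('<a href="blah">'))
```

The resulting pairs could then be handed to the same code path that handles child elements, which is the equivalence being suggested.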

Mike Looijmans

Posts: 5
Nickname: milosoft
Registered: Nov, 2005

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 11, 2005 8:46 AM
Funny you should mention configuration files.

When using XML for config, I've been doing something close to what you propose. In languages with reflection (C#, Java) this works quite well. When you encounter something like:

<something>somevalue</something>

this means that in the current context, we go look for a member named "something". Once found, we see if it's a property, field or function. In this case, we'd either call the function with 'somevalue' as parameter or set the field or property to 'somevalue'. How to pass somevalue (as string or something else) is decided by requesting the type information, and asking the type information for a converter from string to the appropriate type (e.g. Color, Integer, ...).
This severely limits the allowed XML, approximately to what you propose. When needed, you can always extend the parser with more complex syntax (for example, I did something for accessing arrays and other indexed properties).

By nesting, you can "change context", e.g.:
<myobject><something>somevalue</something><someother>12</someother></myobject>
would set two properties in the object myobject. This behaviour is triggered by the content being more XML.

This worked very well in C#, but when I tried something like that in Python, I bumped into the fact that Python is equipped with excellent reflection, but objects don't have static types so you cannot decide whether you should set the someother property to a string or an integer.
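
A rough Python sketch of the reflective approach described above follows. It is not the C# implementation mentioned in this post. As a stand-in for static type information it converts each value with the type of the member's current (default) value, which is purely an assumption of this sketch and only behaves sensibly for types such as str, int and float. The classes, the `apply_config` name and the outer <config> wrapper are hypothetical.

```python
import xml.etree.ElementTree as ET


class MyObject:
    def __init__(self):
        self.something = ""    # a str default means values stay strings
        self.someother = 0     # an int default means values are parsed as int


class Settings:
    def __init__(self):
        self.myobject = MyObject()


def apply_config(target, element):
    """Apply each child element of `element` to the member of the same name."""
    for child in element:
        current = getattr(target, child.tag)   # AttributeError for unknown names
        text = child.text or ""
        if len(child):
            # The content is more XML: change context to the nested object.
            apply_config(current, child)
        elif callable(current):
            current(text)                       # a function: call it with the value
        else:
            # Naive converter: reuse the type of the current value.
            setattr(target, child.tag, type(current)(text))


settings = Settings()
apply_config(settings, ET.fromstring(
    "<config><myobject><something>somevalue</something>"
    "<someother>12</someother></myobject></config>"))
print(settings.myobject.something, settings.myobject.someother)
```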

In Python I took the Pythonic solution for configuration: write config files in Python. Anyone will understand and be able to modify a config file that says:

color="red"
users = {"pete": "Peter Cambell", "claus": "Santa Claus"}

In fact, I find this syntax much more readable (and writable) than XML. XML may be ASCII printable, but it certainly is not human readable.
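
Reading such a file back takes only a few lines. This is a minimal sketch using the standard library's `runpy.run_path`; the function name `load_config` and the file name in the usage comment are assumptions. Because the file is executed as ordinary Python code, this is only appropriate for config files you trust.

```python
import runpy


def load_config(path):
    """Execute a Python config file and return its public top-level names."""
    namespace = runpy.run_path(path)
    # Drop __name__, __builtins__ and other names starting with an underscore.
    return {name: value for name, value in namespace.items()
            if not name.startswith("_")}


# Usage, assuming a file named settings.py next to the program:
#     config = load_config("settings.py")
#     print(config["color"], config["users"]["pete"])
```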

Mike Looijmans

Posts: 5
Nickname: milosoft
Registered: Nov, 2005

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 11, 2005 8:54 AM
Attributes are in fact very useful in XML, when using it to merge data with existing data.

Suppose I send this XML to some business logic component that manages my data. I never told it anything about my table structure, but it can still decide what to do:

<user login="mike" host="mymachine">
<fullname>Mike Looijmans</fullname>
<occupation>Developer</occupation>
</user>

This instructs my data storage to:
- Lookup a user "mike" on host "mymachine"
- If it does not exist, create a new entry for it
- Update the fullname and occupation for that entry.

Without the attributes, I would have a hard time explaining which of the fields should be considered as "primary key", and which only supply additional data.

One may argue that the receiver "should know these things", but it would make the system way less simple and flexible.

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: Proposal: XML-- (read: XML minus minus) Posted: Nov 11, 2005 9:02 AM
> Attributes are in fact very useful in XML, when using it
> to merge data with existing data.
>
> Suppose I send this XML to some business logic component
> that manages my data. I never told it anything about
> my table structure, but it can still decide what to do:
>
> <user login="mike" host="mymachine">
> <fullname>Mike Looijmans</fullname>
> <occupation>Developer</occupation>
> </user>
>
> This instructs my data storage to:
> - Lookup a user "mike" on host "mymachine"
> - If it does not exist, create a new entry for it
> - Update the fullname and occupation for that entry.

I need to point out that the XML record only instructs the data storage because the data storage has predetermined what the login and host attributes mean. It could just as easily have read that data from XML elements. It all depends on how you want to interpret the data.

> Without the attributes, I would have a hard time
> explaining which of the fields should be considered as
> "primary key", and which only supply additional data.

It would be no harder, and no easier, than using elements, e.g.:

<user>
<login>mike</login>
<host>mymachine</host>
<fullname>Mike Looijmans</fullname>
<occupation>Developer</occupation>
</user>

> One may argue that the receiver "should know these
> things", but it would make the system way less simple and
> flexible.

But the receiver already has to have a predetermined interpretation of the attributes, so why not make it a predetermined interpretation of key elements?
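
To make that concrete, here is a small sketch of a receiver that treats certain elements as the key. The `KEY_FIELDS` tuple, the in-memory dict used as storage, and the function name `upsert_user` are all hypothetical; the point is only that the knowledge of which fields form the key lives in the receiver either way.

```python
import xml.etree.ElementTree as ET

# The receiver's predetermined interpretation: these elements form the key.
KEY_FIELDS = ("login", "host")


def upsert_user(store, xml_text):
    """Look up the record by its key elements, create it if missing, update the rest."""
    record = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "") for child in record}
    key = tuple(fields.pop(name) for name in KEY_FIELDS)   # KeyError if a key is missing
    store.setdefault(key, {}).update(fields)
    return key


store = {}
upsert_user(store, """
<user>
<login>mike</login>
<host>mymachine</host>
<fullname>Mike Looijmans</fullname>
<occupation>Developer</occupation>
</user>
""")
print(store)
```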
