The Artima Developer Community

Weblogs Forum
Simplifying XML Manipulation

29 replies on 2 pages. Most recent reply: Jun 23, 2006 6:30 PM by Andy Dent

Steve R. Hastings

Posts: 5
Nickname: steveha
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 9, 2006 10:49 AM
My example for xe was a straightforward port of the test code, but really in xe you would probably not be building your XML structures by hand very often. Usually you would make a class to do it for you. With a class, you can set up sensible default values, check values to make sure they are legal, and so on.

I coded up a couple of classes to demonstrate this. See below. Like my first example, this is tested code; if you download xe you can run this.

One other thing about xe: I'm proud of the way it handles reading in XML data. You create an XML data structure, and you call the .import_xml() method. This then reads in the XML data, and tries to match things up. Where there is a match, it puts the value in your data structure; if there is no match, it will add a member to your data structure and put the data in there. Basically, you just describe the XML data you expect, and if your description is good, it will magically Just Work. This is especially cool because you can describe things like an Atom feed that have a list of 0 or more identical elements, and that will work too!

The .import_xml() method is based on openAnything() by Mark Pilgrim. It accepts a file-like object, a filename, a URL, or a string.
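
A rough sketch of the kind of dispatch that makes this possible, written for current Python 3 rather than the Python of this thread. The name `open_anything` and its exact checks are assumptions for illustration; this is neither xe's code nor Mark Pilgrim's original, only the same general idea.

```python
import io
import urllib.request


def open_anything(source):
    """Return a file-like object for a stream, URL, filename, or literal string."""
    # 1. Already file-like: hand it back untouched.
    if hasattr(source, "read"):
        return source
    # 2. Looks like a URL: let urllib fetch it.
    if source.startswith(("http://", "https://")):
        return urllib.request.urlopen(source)
    # 3. Try it as a filename on the local disk.
    try:
        return open(source)
    except (OSError, ValueError):
        pass
    # 4. Fall back to treating the text itself as the data.
    return io.StringIO(source)


xml_text = "<title>hello</title>"
print(open_anything(xml_text).read())                # literal string
print(open_anything(io.StringIO(xml_text)).read())   # file-like object
```

The order of the checks matters: a string is only treated as data after it has failed to open as a file, so a string that happens to name an existing file will be read from disk.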

However, namespaces throw a monkey wrench right now. If you are reading in, say, an Atom feed, and you have bound "a" to the Atom namespace, then "a:title" should match with the "title" member in the data structure; right now that doesn't work at all.
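
The matching wanted here comes down to comparing the namespace URI plus the local name, never the prefix. A small sketch using the standard library's ElementTree (not xe; the feed text and the `local_name` helper are made up for illustration):

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

feed_text = """<a:feed xmlns:a="http://www.w3.org/2005/Atom">
  <a:title>Example feed</a:title>
</a:feed>"""

root = ET.fromstring(feed_text)

# The parser replaces the "a:" prefix with the namespace URI it is bound to.
print(root.tag)


def local_name(tag):
    """Strip the {uri} part, leaving the name a data-structure member would use."""
    return tag.rsplit("}", 1)[-1]


# Look the element up by URI-qualified name; the prefix chosen by the
# document's author plays no part in the lookup.
title = root.find("{%s}title" % ATOM)
print(local_name(title.tag), "=", title.text)
```

A document that bound the same URI to a different prefix, or used it as the default namespace, would produce exactly the same tags.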

Here's the new sample code.

import xe

lst_valid_carriers = ["FDXE", "UPS", "USPS"]

class CarrierCode(xe.TextElement):
    def __init__(self, carr_code):
        if carr_code is None:
            carr_code = lst_valid_carriers[0]
        elif carr_code not in lst_valid_carriers:
            s = ", ".join(lst_valid_carriers)
            raise ValueError("carrier code must be one of: " + s)
        xe.TextElement.__init__(self, "CarrierCode", carr_code)

class RequestHeader(xe.NestElement):
    def __init__(self, acct_num, meter_num, carr_code=None, svc=None, pkg=None):
        xe.NestElement.__init__(self, "RequestHeader")

        # AccountNumber must be in range 0..99999
        self.acct_num = xe.IntElement("AccountNumber", acct_num, 0, 99999)

        # MeterNumber must be in range 0..9999
        self.meter_num = xe.IntElement("MeterNumber", meter_num, 0, 9999)

        self.carr_code = CarrierCode(carr_code)

        if svc is None:
            svc = "STANDARDOVERNIGHT"
        self.service = xe.TextElement("Service", svc)

        if pkg is None:
            pkg = "FEDEXENVELOPE"
        self.packaging = xe.TextElement("Packaging", pkg)

root = RequestHeader(12345, 6789)

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Simplifying XML Manipulation Posted: Jun 9, 2006 8:29 PM
>With a class, you can set up sensible default values,
>check values to make sure they are legal

It seems like a number of the contributors to this thread are at a level of "schema awareness" similar to mine before joining CSIRO a couple of years ago.

With schema-aware processing tools there is no reason for user code to be setting up defaults or checking values. OK, there might be a good reason to have code generating XML that writes defaults rather than relying on the schema's declared defaults, but certainly value type and range-checking should be in the schema.
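
As a rough illustration of what "in the schema" means, here is a made-up W3C XML Schema fragment that declares a range, a default, and an enumeration for the FedEx-style fields used earlier in the thread. The Python around it is only a toy reader for those declarations, using the standard library; it is not a validator, and a real schema-aware tool would do this work for you.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; element names follow the xe example above.
schema_text = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AccountNumber">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="99999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:element name="CarrierCode" default="FDXE">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="FDXE"/>
        <xs:enumeration value="UPS"/>
        <xs:enumeration value="USPS"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(schema_text)
rules = {}
for elem in schema.findall(XS + "element"):
    lo = elem.find(".//" + XS + "minInclusive")
    hi = elem.find(".//" + XS + "maxInclusive")
    has_range = lo is not None and hi is not None
    rules[elem.get("name")] = {
        # None when the schema declares no default for this element.
        "default": elem.get("default"),
        "range": (int(lo.get("value")), int(hi.get("value"))) if has_range else None,
        "allowed": [e.get("value") for e in elem.iter(XS + "enumeration")],
    }

print(rules["AccountNumber"]["range"])
print(rules["CarrierCode"]["default"], rules["CarrierCode"]["allowed"])
```

The point is only that the limits and defaults live in one declarative place that every consumer of the format can share, rather than being restated in each program.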

Steve R. Hastings

Posts: 5
Nickname: steveha
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 9, 2006 10:37 PM
> With schema-aware processing tools there is no reason for
> user code to be setting up defaults or checking values.

Is there a book or web page you recommend for learning more about this?


> OK, there might be a good reason to have code generating
> XML that writes defaults rather than relying on the
> schema's declared defaults

Example: a user's FedEx class that defaults to the user's account number and other user-specific shipping details.


> but certainly value type and
> range-checking should be in the schema.

It might also be faster to have the code "know" the values to check rather than having to parse the schema each time you run your program... especially for trivial programs. That's just a guess, though, and I could be wrong.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Simplifying XML Manipulation Posted: Jun 9, 2006 11:49 PM
There's a lot of material about design patterns for XML Schemas on the XMML wiki, which is an international collaboration. (XMML was the working name; the project has since been renamed GeoSciML, partly due to confusion and a squatter on xmml.com.)

https://www.seegrid.csiro.au/twiki/bin/view/Xmml/WebHome

The O'Reilly "XML Schema" book is pretty good; we have a much-creased paper copy at work, and it is online at Safari:
http://safari.oreilly.com/0596002521

I am not saying W3C XML Schema is particularly good, but it is sufficiently powerful and usable for rich data descriptions. One of the biggest headaches is that it allows more realistically flexible data descriptions than programming languages are easily able to deal with (that's one of the things I'm hoping to fix with CEDSimply).

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Simplifying XML Manipulation Posted: Jun 10, 2006 5:36 PM
> It seems like a number of the contributors to this thread
> are at a level of "schema awareness" similar to mine
> before joining CSIRO a couple of years ago.

Definitely true for me, regarding schemas and namespaces. I'm basically creating XML only when I have to, and so far I haven't run into namespace or schema issues. Thanks for pointing it out.

Manuzhai of Kashan

Posts: 1
Nickname: manuzhai
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 13, 2006 2:01 AM
Or you could just have used ElementTree, cElementTree or lxml. ElementTree is a best-of-breed solution that will be in the Python standard library (along with cElementTree) from 2.5 onwards; lxml is a separate package offering a compatible API.

(Although admittedly it is more focused on data-oriented XML than on document-oriented XML.)
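
For comparison, a sketch of the RequestHeader example from earlier in the thread built with ElementTree as it ships in the standard library (`xml.etree.ElementTree`), written for current Python 3. The helper `request_header` and its defaults are illustrative and simply mirror the xe version; note that ElementTree itself does no range or value checking.

```python
import xml.etree.ElementTree as ET


def request_header(acct_num, meter_num, carr_code="FDXE",
                   svc="STANDARDOVERNIGHT", pkg="FEDEXENVELOPE"):
    """Build a <RequestHeader> element; defaults mirror the xe example."""
    root = ET.Element("RequestHeader")
    for tag, value in [("AccountNumber", acct_num),
                       ("MeterNumber", meter_num),
                       ("CarrierCode", carr_code),
                       ("Service", svc),
                       ("Packaging", pkg)]:
        # Element text must be a string, so convert the integers explicitly.
        ET.SubElement(root, tag).text = str(value)
    return root


root = request_header(12345, 6789)
print(ET.tostring(root, encoding="unicode"))
```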

Sean Jamieson

Posts: 3
Nickname: sjamieson
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 14, 2006 10:41 AM
py.xml and the tool you created are both interesting solutions to the problem of creating XML in a structured and visually pleasing way.

I have created my own tool which is, IMHO, very elegant and visually pleasing. I've called it xmlmodel, as I see it as a way of defining the structure of your XML document in terms of an object model.

An example:

#!/usr/bin/env python
from xmlmodel import *
from datetime import datetime

class rss( XMLModel ):
    class XMLAttrs:
        version = '2.0'

    class channel( XMLNode ):
        title = XMLValue('test')
        description = XMLValue('something')
        link = XMLValue('http://here')
        lastBuildDate = XMLDateTime( format = "%a, %d %b %Y %H:%M:%S EST" )
        generator = XMLValue()
        docs = XMLValue()

        class item( XMLNodeList ):
            title = XMLValue()
            link = XMLValue()
            description = XMLValue()
            category = XMLList()
            pubDate = XMLDateTime( format = "%a, %d %b %Y %H:%M:%S EST" )

feed = rss()

feed.channel.title = 'Latest Headlines'
feed.channel.description = 'Most Recent Headlines'
feed.channel.generator = 'XMLModel 0.1a'
feed.channel.lastBuildDate = datetime( 2006, 5, 10, 8, 24, 30 )

# the following would theoretically be looped in a function or method somewhere

item = feed.channel.item.new()
item.title = 'foo'
item.link = 'http://foo'
item.description = 'foo bar'
item.category.append( 'foo' )
item.category.append( 'bar' )
item.pubDate = datetime( 2005, 1, 2, 3, 4, 5 )

item = feed.channel.item.new()
item.title = 'bar'
item.link = 'http://bar'
item.description = 'bar baz'
item.category.append( 'bar' )
item.category.append( 'baz' )
item.pubDate = datetime( 2006, 2, 3, 4, 5, 6 )

print(feed)


The package can be found at the Python Cheese Shop:
http://www.python.org/pypi/xmlmodel

I welcome any comments or suggestions.

(But please, no more "have you heard of XXX project, it's really wonderful". I'm sure it is, but unless it's something that is doing it the same way I am, it is not really relevant. If someone out there already has this idea and a more mature codebase, I'd be pleased to drop this and contribute to that.)

Bruce Eckel

Posts: 875
Nickname: beckel
Registered: Jun, 2003

Re: Simplifying XML Manipulation Posted: Jun 14, 2006 11:32 AM
Interesting; a quirky use of classes (only composed of static fields), but I can see the syntax you're aiming at.

The only problem I can see is if there is more than one element with the same tag at the same level, which I think is legal XML (a list of identical items).

Sean Jamieson

Posts: 3
Nickname: sjamieson
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 14, 2006 12:04 PM
Look a little closer; that is happening in my example :-)
class item( XMLNodeList ): defines a repeating node. XMLNodeList is a subclass of both XMLNode and the builtin list.

item = feed.channel.item.new() creates a new item in the RSS channel, and returns a reference for you to manipulate.

You can also add nodes to an XMLNodeList using the list's append, insert, and __setitem__ methods. These do not need to be instances of item, simply instances of XMLNode.
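
A stripped-down sketch of that arrangement, for readers who want to see the shape of it. The names `Node`, `NodeList` and `child_type` are stand-ins invented for this illustration; this is not the xmlmodel source.

```python
class Node(object):
    """Stand-in for an XML node: just a bag of attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class NodeList(Node, list):
    """A repeating node: behaves as a list and can mint new children."""

    child_type = Node  # what new() creates by default

    def __init__(self):
        Node.__init__(self)
        list.__init__(self)

    def new(self):
        # Create a child, store it in the list, and hand back a reference.
        child = self.child_type()
        self.append(child)
        return child


items = NodeList()
first = items.new()
first.title = "foo"
items.append(Node(title="bar"))   # any Node will do, not only ones from new()
print([n.title for n in items])   # ['foo', 'bar']
```

Because the container really is a list, slicing, iteration and len() come for free.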

Sean Jamieson

Posts: 3
Nickname: sjamieson
Registered: Jun, 2006

Re: Simplifying XML Manipulation Posted: Jun 14, 2006 12:12 PM
Oh, and by the way, they aren't static fields.

Was that a "*blink*" I just heard?

In fact, what I'm doing is collecting the classes defined within the XMLModel subclass, and instantiating them when an instance of the XMLModel is created, so all those sub classes become composite objects.

So you can create more than one instance, and use them, without conflict.
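
The general technique can be sketched in a few lines. This is an illustration under assumed names (`Node`, `Model`), not the actual xmlmodel code, and it only looks at classes declared directly in the body of the class being instantiated.

```python
class Node(object):
    """Base for nested node classes (hypothetical stand-in for XMLNode)."""

    def __init__(self):
        # Find classes declared in this class's body that are Nodes themselves,
        # and give this instance its own freshly created object for each one.
        for name, value in vars(type(self)).items():
            if isinstance(value, type) and issubclass(value, Node):
                # Stored on the instance, so it shadows the class attribute.
                setattr(self, name, value())


class Model(Node):
    class channel(Node):
        title = "untitled"   # a plain class-level default


a = Model()
b = Model()
a.channel.title = "Latest Headlines"
print(a.channel.title, "/", b.channel.title)   # Latest Headlines / untitled
```

Each Model instance gets its own channel object, so assigning through one instance leaves the other, and the class-level default, untouched.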

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Simplifying XML Manipulation Posted: Jun 14, 2006 6:19 PM
> It might also be faster to have the code "know" the values
> to check rather than having to parse the schema each time
> you run your program... especially for trivial programs.

That sounds a bit like "it might be faster to have the code 'know' the values to check rather than relying on the database schema rules to enforce them." :-)

Schemas are referred to by instance documents, not incorporated. There's no reason why a processing system can't have a schema cached. You could have an architecture where validation against a schema was performed separately before calling user code.
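
A toy sketch of that architecture, written for current Python 3 with only the standard library. Everything named here (`SCHEMAS`, `load_rules`, `handle`, the `urn:example:` location) is hypothetical, the "validation" covers a single integer range, and the schema location is passed in directly rather than read from the instance document; a real system would use a proper schema validator.

```python
import functools
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema store standing in for files or URLs.
SCHEMAS = {
    "urn:example:meter.xsd": """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="MeterNumber">
    <xs:simpleType>
      <xs:restriction base="xs:int">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="9999"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>""",
}

parse_count = 0  # counts how often a schema is actually parsed


@functools.lru_cache(maxsize=None)
def load_rules(location):
    """Parse a schema once per location; later calls hit the cache."""
    global parse_count
    parse_count += 1
    root = ET.fromstring(SCHEMAS[location])
    rules = {}
    for elem in root.findall(XS + "element"):
        lo = elem.find(".//" + XS + "minInclusive")
        hi = elem.find(".//" + XS + "maxInclusive")
        rules[elem.get("name")] = (int(lo.get("value")), int(hi.get("value")))
    return rules


def handle(doc_text, location, user_code):
    """Validate first; user code only ever sees documents that passed."""
    doc = ET.fromstring(doc_text)
    lo, hi = load_rules(location)[doc.tag]
    if not lo <= int(doc.text) <= hi:
        raise ValueError("%s out of range %d..%d" % (doc.tag, lo, hi))
    return user_code(doc)


for text in ["<MeterNumber>6789</MeterNumber>", "<MeterNumber>42</MeterNumber>"]:
    print(handle(text, "urn:example:meter.xsd", lambda d: int(d.text)))
print("schema parsed", parse_count, "time(s)")
```

The user-supplied function contains no checking at all, and the schema is parsed once no matter how many documents pass through.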

Krzysztof Sobolewski

Posts: 7
Nickname: jezuch
Registered: Dec, 2003

Re: Simplifying XML Manipulation Posted: Jun 15, 2006 8:00 AM
I read somewhere that XML is just reinvented and *terribly* overengineered Lisp.
In Lisp code is data and data is code, so mixing them is a normal thing. Languages in XML? (Jelly, anyone?) Why do you think there are so many dialects of Lisp? :)

Noam Tamim

Posts: 26
Nickname: noamtm
Registered: Jun, 2005

Re: Simplifying XML Manipulation Posted: Jun 23, 2006 2:43 AM
> ...
> Another example is Ant. The creator of this tool has
> since apologized for using XML
> ...

Do you have a reference to that? I couldn't find such a public apology on the web.

Vincent O'Sullivan

Posts: 724
Nickname: vincent
Registered: Nov, 2002

Re: Simplifying XML Manipulation Posted: Jun 23, 2006 9:06 AM
I'm guessing, but I think the reference is to the book "Pragmatic Project Automation" by Mike Clark (ISBN 0-9745140-3-9). It contains (on page 29) a page-long explanation by James Davidson (the author of Ant) about why he used XML in Ant. The article title is "The Creator of Ant Exorcizes One of His Demons". The full article ends with the following paragraph...

"If I knew then what I know now, I would have tried using a real scripting language, such as JavaScript via the Rhino component or Python via JPython, with bindings to Java objects that implemented the functionality expressed in today's tasks. Then, there would be a first-class way to express logic, and we wouldn't be stuck with XML as a format that is too bulky for the way that people really want to use the tool."

> ...
> Another example is Ant. The creator of this tool has
> since apologized for using XML
> ...
>
> Do you have a reference to that? I couldn't find such a
> public apology on the web.

Andy Dent

Posts: 165
Nickname: andydent
Registered: Nov, 2005

Re: Simplifying XML Manipulation Posted: Jun 23, 2006 6:30 PM
I'm pretty sure the first reference to this apology that I read was one of Bruce Eckel's postings, http://www.mindview.net/WebLog/log-0046:

"The creator of Ant writes here about his regrets in (A) using XML and (B) not making Ant more powerful by incorporating enough language constructs. I agree wholeheartedly on both counts, and yet I'm not ready to undertake the project of creating a new build system, much as I would like to have a better one for my own use."

The http://x180.net/Articles/Java/AntAndXML.html link in the above seems to have been dropped (wish I'd saved it).

Googling AntAndXML.html will turn up lots of references.

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use