The Artima Developer Community

Designing with the Python Community
A Conversation with Guido van Rossum, Part VI
by Bill Venners with Frank Sommers
February 17, 2003


The Unpythonic DOM API

Bill Venners: What didn't you like about the DOM API?

Guido van Rossum: There are too many things that look different than what they are.

Bill Venners: What do you mean by "look different than what they are?"

Guido van Rossum: In Python, an object can have attributes and methods. You call methods with parentheses and possibly an argument list. You refer to attributes with a dot, without parentheses. Python has internal mechanisms with which you can easily implement something that looks much like an attribute, but is actually implemented by a pair of functions for setting and getting a value. In the latest Python version it is a little more formalized (we call them properties), but you can do the same thing in older Python versions.
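As a minimal sketch of the mechanism described here, the following uses the built-in property() to put a getter and setter pair behind one attribute name. The Temperature class and its names are hypothetical, chosen only for illustration; they do not come from the interview.

```python
class Temperature:
    """Hypothetical class: 'celsius' looks like a plain attribute to callers."""

    def __init__(self, celsius):
        self._celsius = celsius

    def _get_celsius(self):
        # Runs whenever the caller reads t.celsius
        return self._celsius

    def _set_celsius(self, value):
        # Runs whenever the caller assigns t.celsius = ...
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = value

    # property() pairs the getter and setter behind one attribute name.
    celsius = property(_get_celsius, _set_celsius)


t = Temperature(20.0)
t.celsius = 25.0      # looks like attribute assignment, calls _set_celsius
reading = t.celsius   # looks like attribute access, calls _get_celsius
```

The calling code never uses parentheses, so nothing at the call site reveals that functions are running. Before property() existed, a similar effect was usually obtained with the __getattr__ and __setattr__ hooks.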

When reading the DOM implementation, I discovered that certain properties are actually implemented by functions that do extensive work, and return a new, expensively created object every time you access the property. That feels like the wrong use of a property. If I write a program that reads an attribute such as foo in two places near each other, then I expect that in both places I am accessing the same object, and that the access is probably cheap. I don't expect that extracting foo the first time involves heavy computation, and that extracting foo the second time involves the same computation all over again. Moreover, I don't expect the second access to give me a different copy of the data. That's an example of something that didn't work for me in the DOM API implementation.
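The surprise described above can be reproduced in a few lines. This is an illustrative sketch only: the Node class and both property names are hypothetical, and it is not taken from any DOM implementation. One property rebuilds its result on every access, while the other computes it once and returns the same object afterwards.

```python
class Node:
    """Hypothetical node with two ways of exposing derived data."""

    def __init__(self, names):
        self._names = list(names)
        self._cached = None

    @property
    def children_fresh(self):
        # Repeats the work and returns a brand-new list on every access.
        return [name.upper() for name in self._names]

    @property
    def children_cached(self):
        # Does the work once, then hands back the same list object.
        if self._cached is None:
            self._cached = [name.upper() for name in self._names]
        return self._cached


node = Node(["a", "b"])

# Two nearby accesses yield equal but distinct objects.
fresh_is_same = node.children_fresh is node.children_fresh

# Two nearby accesses yield the very same object.
cached_is_same = node.children_cached is node.children_cached
```

Here fresh_is_same is False and cached_is_same is True. When the work really is expensive and cannot be cached, an explicit method call makes that cost visible to the reader in a way a property does not.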

Bill Venners: I understand that DOM was actually defined as a language-independent API that could be ported to any language. That language-independent goal is what made it clumsy. In Java, DOM is also rather hard to use, because it is "unjavanic." Several more javanic DOM APIs have been created, the most popular of which is JDOM.

Guido van Rossum: People have done the same to replace DOM in Python. From a Python point of view, it would be more natural to represent an XML tree as a nested construction of the primitive Python data types, where you use just dictionaries and lists and maybe tuples, as well as strings and numbers.
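One possible shape for such a representation is sketched below. The "tag", "attrs", and "children" keys are an assumption made for this example, not a standard layout, and the sample document is invented.

```python
# Represents: <book id="42"><title>Example Title</title><author>A. Writer</author></book>
book = {
    "tag": "book",
    "attrs": {"id": "42"},
    "children": [
        {"tag": "title", "attrs": {}, "children": ["Example Title"]},
        {"tag": "author", "attrs": {}, "children": ["A. Writer"]},
    ],
}


def text_of(node):
    """Concatenate all text found below a node (strings are text leaves)."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node["children"])


def find_all(node, tag):
    """Return every element dictionary with the given tag, depth first."""
    if isinstance(node, str):
        return []
    found = [node] if node["tag"] == tag else []
    for child in node["children"]:
        found.extend(find_all(child, tag))
    return found


title_text = text_of(find_all(book, "title")[0])
book_id = book["attrs"]["id"]
```

Every access here is an ordinary dictionary lookup or list index, so the cost of each line is as easy to predict as the cost of any other operation on built-in types.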

All the abstraction DOM throws on top of the primitive Python data types makes DOM less efficient. The extra abstraction also makes it harder to intuit your program's performance characteristics. A hobby of mine is teaching people to see that if they write code this way, it will be faster than if they write it that way. I think it's good to have a feel for which operations are slow and which are fast. That's difficult in Python, because one line of code can take a lot of time to execute if it actually invokes a rich library or does something on a million data structure elements. But it's still good to have an idea that certain things are faster than other things. I don't know exactly how they acquired their knowledge, but the stronger Python programmers have a better feel for which constructs of the language are fast. They understand that local variables are faster than global variables, and that object attributes are faster than method calls.
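A rough way to build this kind of intuition is to measure two equivalent versions with the standard timeit module. The sketch below compares looking up math.sqrt through a global name on every iteration against binding it to a local variable first. The function names and workload are made up for illustration, and the size of any difference depends on the Python version and machine, so no timings are claimed here.

```python
import math
import timeit


def with_global_lookup(values):
    result = []
    for v in values:
        # Each iteration looks up the global 'math', then its 'sqrt' attribute.
        result.append(math.sqrt(v))
    return result


def with_local_alias(values):
    sqrt = math.sqrt  # one lookup, stored in a local variable
    result = []
    for v in values:
        result.append(sqrt(v))
    return result


data = list(range(1000))

# Both versions must produce the same answer before timing means anything.
same_result = with_global_lookup(data) == with_local_alias(data)

t_global = timeit.timeit(lambda: with_global_lookup(data), number=200)
t_local = timeit.timeit(lambda: with_local_alias(data), number=200)

print(f"global lookup: {t_global:.4f}s, local alias: {t_local:.4f}s")
```

Both functions are the same length and equally readable, which is the situation where such a measurement can reasonably guide the choice.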

Bill Venners: Do you subscribe to the theory that you should delay optimization until you really have a performance problem?

Guido van Rossum: Absolutely, but nevertheless, you can also write code that will just be guaranteed to take longer without having any other benefits. If you can write it in five lines or ten, I'd prefer five lines even knowing that it's more expensive. But if you can write it in five lines in two different ways, then you should have some intuition or guidelines for what are the more recommended operations.

Copyright © 1996-2017 Artima, Inc. All Rights Reserved.