Bill Venners: What didn't you like about the DOM API?
Guido van Rossum: There are too many things that look different than what they are.
Bill Venners: What do you mean by "look different than what they are?"
Guido van Rossum: In Python, an object can have attributes and methods. You call methods with parentheses and possibly an argument list. You refer to attributes with a dot, as in foo.bar. Python has internal mechanisms with which you can easily implement something that looks much like an attribute, but is actually implemented by a pair of functions for setting and getting a value. In the latest Python version it is a little more formalized (we call them properties), but you can do the same thing in older Python versions.
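As an illustration of the mechanism described above (not taken from the interview), here is a minimal sketch using Python's built-in property. The Temperature class and its celsius attribute are hypothetical names chosen for the example.

```python
class Temperature:
    """Hypothetical class: 'celsius' reads like a plain attribute
    but is backed by a getter/setter pair."""

    def __init__(self, celsius=0.0):
        self._celsius = float(celsius)

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        # The setter can validate, which a plain attribute cannot do.
        if value < -273.15:
            raise ValueError("temperature below absolute zero")
        self._celsius = float(value)

    # property() ties the two functions to the attribute name.
    celsius = property(_get_celsius, _set_celsius)


t = Temperature()
t.celsius = 21.5       # invokes _set_celsius behind the scenes
reading = t.celsius    # invokes _get_celsius behind the scenes
```

The caller writes t.celsius with no parentheses, so nothing at the call site reveals that functions are running.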
When reading the DOM implementation, I discovered that certain properties are actually implemented by functions that do extensive work, and return a new, expensively created object every time you access the property. That feels like the wrong use of a property. If I write a program that uses x.foo in two places nearby each other, then I expect that in both places I am accessing the same object, and it's probably cheap. I don't expect that extracting foo the first time involves heavy computation, and that extracting foo the second time involves the same computation all over again. And moreover, I don't expect the second x.foo to give me a different copy of the data. That's an example of something that didn't work for me in the DOM API implementation.
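The surprise described here can be reproduced in a few lines. The sketch below is not the DOM implementation; RebuildingNode and CachingNode are hypothetical classes, and the upper-casing stands in for whatever expensive work a real property might do.

```python
class RebuildingNode:
    """Hypothetical node whose property recomputes on every access."""

    def __init__(self, names):
        self._names = list(names)
        self.builds = 0  # counts how often the work is done

    @property
    def children(self):
        self.builds += 1
        # Stand-in for expensive work; a new list is created each time.
        return [name.upper() for name in self._names]


class CachingNode:
    """Hypothetical node that does the work once and reuses the result."""

    def __init__(self, names):
        self._names = list(names)
        self.builds = 0
        self._children = None

    @property
    def children(self):
        if self._children is None:
            self.builds += 1
            self._children = [name.upper() for name in self._names]
        return self._children


x = RebuildingNode(["a", "b"])
rebuilt_is_same = x.children is x.children   # two accesses, two objects

y = CachingNode(["a", "b"])
cached_is_same = y.children is y.children    # two accesses, one object
```

With the first class, two nearby uses of x.children do the work twice and hand back different objects. The second class matches the expectation that an attribute access is cheap and stable, at the cost of having to invalidate the cache if the underlying data changes.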
Bill Venners: I understand that DOM was actually defined as a language-independent API that could be ported to any language. That language-independent goal is what made it clumsy. In Java, DOM is also rather hard to use, because it is "unjavanic." Several more javanic DOM APIs have been created, the most popular of which is JDOM.
Guido van Rossum: People have done the same to replace DOM in Python. From a Python point of view, it would be more natural to represent an XML tree as a nested construction of the primitive Python data types, where you use just dictionaries and lists and maybe tuples, as well as strings and numbers.
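One possible reading of "a nested construction of the primitive Python data types" is sketched below. The interview does not specify a layout, so the "tag", "attrs", and "children" keys, the sample document, and the text_of helper are all assumptions made for illustration.

```python
# Hypothetical XML being modelled:
#   <book id="42"><title>Python</title><author>Guido</author></book>
book = {
    "tag": "book",
    "attrs": {"id": "42"},
    "children": [
        {"tag": "title", "attrs": {}, "children": ["Python"]},
        {"tag": "author", "attrs": {}, "children": ["Guido"]},
    ],
}


def text_of(node):
    """Concatenate all text found beneath a node (strings are text nodes)."""
    if isinstance(node, str):
        return node
    return "".join(text_of(child) for child in node["children"])


# Ordinary list and dict operations are enough to navigate the tree.
titles = [
    text_of(child)
    for child in book["children"]
    if isinstance(child, dict) and child["tag"] == "title"
]
```

Because every node is a plain dict, list, or string, the cost of each access is the familiar cost of indexing a dictionary or list, with no hidden work behind it.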
All the abstraction DOM throws on top of the primitive Python data types makes DOM less efficient. The extra abstraction also makes it harder to intuit your program's performance characteristics. A hobby of mine is teaching people to see that if they write code this way, it will be faster than if they write it that way. I think it's good to have a feel for which operations are slow and which are fast. That's difficult in Python, because one line of code can take a lot of time to execute if it actually invokes a rich library or does something on a million data structure elements. But it's still good to have an idea that certain things are faster than other things. I don't know how exactly they acquired their knowledge, but the stronger Python programmers have a better feel for which constructs of the language are fast. They understand that local variables are faster than global variables, and that object attributes are faster than method calls.
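A rough way to build that kind of intuition is to time two equivalent versions of a loop with the standard timeit module. The sketch below compares a global lookup with a local one; the function names and the loop size are made up for the example, and the actual timings depend on the interpreter version and the machine, so no numbers are claimed here.

```python
import timeit

counter_limit = 100_000  # module-level (global) name


def sum_with_global():
    total = 0
    for _ in range(counter_limit):
        total += counter_limit  # global name looked up on each iteration
    return total


def sum_with_local():
    limit = counter_limit  # copy the global into a local once
    total = 0
    for _ in range(limit):
        total += limit  # local name looked up on each iteration
    return total


if __name__ == "__main__":
    # Both functions compute the same value; only the lookups differ.
    for fn in (sum_with_global, sum_with_local):
        seconds = timeit.timeit(fn, number=20)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

Running the comparison yourself, rather than trusting a rule of thumb, also shows how small or large the difference is for your own workload.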
Bill Venners: Do you subscribe to the theory that you should delay optimization until you really have a performance problem?
Guido van Rossum: Absolutely, but nevertheless, you can also write code that will just be guaranteed to take longer without having any other benefits. If you can write it in five lines or ten, I'd prefer five lines even knowing that it's more expensive. But if you can write it in five lines in two different ways, then you should have some intuition or guidelines for what are the more recommended operations.