Designing with the Python Community

A Conversation with Guido van Rossum, Part VI

by Bill Venners with Frank Sommers

February 17, 2003

Summary

Python creator Guido van Rossum talks with Bill Venners about the importance of "pythonic" API design, the usefulness of intuiting performance, the value of experience and community feedback in design decisions, and the process of deciding how to evolve Python's standard library.

Guido van Rossum is the author of Python, an interpreted, interactive object-oriented programming language. In the late 1980s, Van Rossum began work on Python at the National Research Institute for Mathematics and Computer Science in the Netherlands, or Centrum voor Wiskunde en Informatica (CWI) as it is known in Dutch. Since then, Python has become very popular among developers, who are attracted to its clean syntax and reputation for productivity.

In this interview, which is being published in six weekly installments, Van Rossum gives insights into Python's design goals, the source of Python programmer productivity, the implications of weak typing, and more:

In Part I: The Making of Python, Van Rossum describes Python's history, major influences, and design goals.
In Part II: Python's Design Goals, Van Rossum talks about Python's original design goals: how he originally intended Python to "bridge the gap between the shell and C," and how it eventually became used on large-scale applications.
In Part III: Programming at Python Speed, Van Rossum discusses the source of Python's famed programmer productivity and the joys of exploring new territory with code.
In Part IV: Contracts in Python, Van Rossum discusses the nature of contracts in a runtime typed programming language such as Python.
In Part V: Strong versus Weak Typing, Van Rossum discusses the robustness of systems built with strongly and weakly typed languages, the value of testing, and whether he'd fly on an all-Python plane.

In this final installment, Van Rossum discusses the importance of pythonic API design, the usefulness of intuiting performance, the value of experience and community feedback in design decisions, and the process of deciding how to evolve Python's standard library.

Bill Venners: Few people design programming languages, but many people design programs. Large programs are often composed of parts that look like libraries or APIs. Many people design APIs like that. What do you think is important in design? What makes a design good? What things do you value in an API or program design?

Guido van Rossum: That's a really tough question. One example of an API design I found unsatisfactory is the DOM API for dealing with XML. That originally started in the Java world. I'm not sure if the problems with it are the same in the Java version as they are in the Python version. I have a feeling that the Python translation of the DOM API was actually done by sticking too closely to the Java version, and thereby being unpythonic, which is a completely undefined term.

Bill Venners: But you know it when you see it.

Guido van Rossum: That's exactly the problem. I can't teach anyone else what makes a pythonic interface.

The Unpythonic DOM API

Bill Venners: What didn't you like about the DOM API?

Guido van Rossum: There are too many things that look different than what they are.

Bill Venners: What do you mean by "look different than what they are?"

Guido van Rossum: In Python, an object can have attributes and methods. You call methods with parentheses and possibly an argument list. You refer to attributes with a dot, as in foo.bar. Python has internal mechanisms with which you can easily implement something that looks much like an attribute, but is actually implemented by a pair of functions for setting and getting a value. In the latest Python version it is a little more formalized—we call them properties—but you can do the same thing in older Python versions.
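As a rough illustration of what "a pair of functions for setting and getting a value" looks like with the built-in `property()`, here is a minimal sketch. The `Temperature` class and its attribute names are made up for this example. Before properties existed, a similar effect was typically achieved by overriding `__getattr__` and `__setattr__`.

```python
class Temperature:
    """Hypothetical example: 'celsius' looks like a plain attribute,
    but reads and writes go through a getter/setter pair."""

    def __init__(self, celsius=0.0):
        self._celsius = float(celsius)

    def _get_celsius(self):
        return self._celsius

    def _set_celsius(self, value):
        # The setter can validate or convert before storing.
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = float(value)

    # Pass the getter and setter functions to property() explicitly.
    celsius = property(_get_celsius, _set_celsius)

    # The same built-in used as a decorator: a read-only computed attribute.
    @property
    def fahrenheit(self):
        return self._celsius * 9 / 5 + 32


t = Temperature(25)
t.celsius = 100        # calls _set_celsius
print(t.celsius)       # calls _get_celsius -> 100.0
print(t.fahrenheit)    # 212.0
```

From the caller's side, `t.celsius` is indistinguishable from a stored attribute, which is exactly why the cost hidden behind it matters.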

When reading the DOM implementation, I discovered that certain properties are actually implemented by functions that do extensive work, and return a new, expensively created object every time you access the property. That feels like the wrong use of a property. If I write a program that uses x.foo in two places near each other, then I expect that in both places I am accessing the same object, and it's probably cheap. I don't expect that extracting foo the first time involves heavy computation, and that extracting foo the second time involves the same computation all over again. And moreover, I don't expect the second x.foo to give me a different copy of the data. That's an example of something that didn't work for me in the DOM API implementation.
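The surprise described above can be reproduced in a few lines. The sketch below uses hypothetical `Node` and `CachedNode` classes, not the actual DOM implementation: the first exposes an expensive computation as a property, so two nearby reads of `child_list` return different objects; the method makes the cost visible at the call site; and `functools.cached_property` (a later standard-library addition) computes once and then hands back the same object, assuming the underlying data does not change.

```python
from functools import cached_property


class Node:
    """Hypothetical DOM-like node, for illustration only."""

    def __init__(self, children):
        self._children = list(children)

    @property
    def child_list(self):
        # Anti-pattern: every access silently builds a brand-new list.
        return list(self._children)

    def get_child_list(self):
        # Same work, but the call syntax signals that it may be costly
        # and may return a fresh object each time.
        return list(self._children)


class CachedNode:
    """Hypothetical variant that computes the list once per instance."""

    def __init__(self, children):
        self._children = list(children)

    @cached_property
    def child_list(self):
        # Stored on the instance after the first access, so later
        # reads of x.child_list are cheap and return the same object.
        return list(self._children)


n = Node(["a", "b", "c"])
print(n.child_list == n.child_list)   # True: same contents
print(n.child_list is n.child_list)   # False: two different list objects

c = CachedNode(["a", "b", "c"])
print(c.child_list is c.child_list)   # True: one cached object
```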

Bill Venners: I understand that DOM was actually defined as a language-independent API that could be ported to any language. That language-independent goal is what made it clumsy. In Java, DOM is also rather hard to use, because it is "unjavanic." Several more javanic DOM APIs have been created, the most popular of which is JDOM.

Guido van Rossum: People have done the same to replace DOM in Python. From a Python point of view, it would be more natural to represent an XML tree as a nested construction of the primitive Python data types, where you use just dictionaries and lists and maybe tuples, as well as strings and numbers.
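One possible version of the "nested primitive data types" idea is sketched below. It parses with `xml.etree.ElementTree`, a more pythonic XML module that joined the standard library after this interview, and converts each element into a `(tag, attributes, children)` tuple built only from tuples, dicts, lists, and strings. That layout is an assumption chosen for illustration, not a standard format.

```python
import xml.etree.ElementTree as ET


def to_primitive(elem):
    """Convert an Element into a (tag, attributes, children) tuple.

    Non-blank text content becomes a plain string child; the tuple
    layout is an illustrative assumption, not a standard format."""
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(to_primitive(child))
        # Text that follows a child element is stored as its "tail".
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), children)


doc = ET.fromstring('<book id="1"><title>Python</title><year>2003</year></book>')
tree = to_primitive(doc)
print(tree)
# ('book', {'id': '1'}, [('title', {}, ['Python']), ('year', {}, ['2003'])])
```

Once the tree is plain data, ordinary indexing and list comprehensions do the work, and their costs are easy to reason about.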

All the abstraction DOM throws on top of the primitive Python data types makes DOM less efficient. The extra abstraction also makes it harder to intuit your program's performance characteristics. A hobby of mine is teaching people to see that if they write code this way, it will be faster than if they write it that way. I think it's good to have a feel for which operations are slow and which are fast. That's difficult in Python, because one line of code can take a lot of time to execute if it actually invokes a rich library or does something to a million data structure elements. But it's still good to have an idea that certain things are faster than other things. I don't know exactly how they acquired their knowledge, but the stronger Python programmers have a better feel for which constructs of the language are fast. They understand that local variables are faster than global variables, and that accessing an object attribute is faster than calling a method.
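One way to check intuitions like these is to measure them with the standard `timeit` module. The sketch below compares a global lookup with a local one, and a plain attribute read with an equivalent getter method. The functions and the `Point` class are hypothetical, and absolute numbers vary by machine and Python version, so no output is shown; in CPython the local and attribute variants typically come out ahead.

```python
import timeit

COUNTER = 0  # module-level (global) name


def use_global(n=1000):
    total = 0
    for _ in range(n):
        total += COUNTER  # global lookup on every iteration
    return total


def use_local(n=1000):
    counter = COUNTER  # copy the global into a local once
    total = 0
    for _ in range(n):
        total += counter  # local lookup inside the loop
    return total


class Point:
    def __init__(self, x):
        self.x = x

    def get_x(self):
        return self.x


def compare(number=2000):
    """Return elapsed seconds for each variant (machine-dependent)."""
    p = Point(1)
    return {
        "global": timeit.timeit(use_global, number=number),
        "local": timeit.timeit(use_local, number=number),
        # Both lambdas add the same call overhead; the difference is
        # the extra method call in the second one.
        "attribute": timeit.timeit(lambda: p.x, number=number * 100),
        "method": timeit.timeit(lambda: p.get_x(), number=number * 100),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>9}: {seconds:.4f} s")
```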

Bill Venners: Do you subscribe to the theory that you should delay optimization until you really have a performance problem?

Guido van Rossum: Absolutely, but nevertheless, you can also write code that will just be guaranteed to take longer without having any other benefits. If you can write it in five lines or ten, I'd prefer five lines even knowing that it's more expensive. But if you can write it in five lines in two different ways, then you should have some intuition or guidelines for what are the more recommended operations.
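A common example of "the same thing written two ways" is a membership test. Both functions below are one line and give the same answer, but `in` on a list scans element by element, while `in` on a set uses a hash lookup that takes roughly constant time on average. The names and data are made up for illustration, and the printed timings will differ from machine to machine.

```python
import timeit

names_list = [f"user{i}" for i in range(10_000)]
names_set = set(names_list)


def found_in_list(name):
    # Linear scan: compares against each element in turn.
    return name in names_list


def found_in_set(name):
    # Hash lookup: average cost does not grow with the collection size.
    return name in names_set


if __name__ == "__main__":
    print("list:", timeit.timeit(lambda: found_in_list("user9999"), number=1000))
    print("set: ", timeit.timeit(lambda: found_in_set("user9999"), number=1000))
```

Neither version is longer or harder to read, so this is the kind of case where knowing the cheaper operation costs nothing.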

Human Factors

Bill Venners: Many Python enthusiasts have told me that when they need to do something in Python, they often find an easy-to-use library and develop that something in three lines of code. The Python language itself seems very human-friendly to me. When you designed the "human interface" of Python, to what extent were you guided merely by taste or your own design sense? To what extent did you do user testing or some kind of research?

Guido van Rossum: The designers of the ABC language, Python's primary influence, tweaked the language based on the feedback from user testing. I've done minimal user testing, but I've been very open to feedback from the user community.

I'm an email junkie. I've received many emails from both experienced and beginning Python users. Their suggestions register in my brain, and at some point, manifest into a better design decision.

It's hard to formalize and say these are my design guidelines for the language or for APIs. I have a lot of experience as a programmer. I've been programming since I was 18 years old, in many different environments. I started in a batch shop on a large mainframe, worked my way through Unix time-sharing machines, through PCs and desktops. And I've worked on very different kinds of projects, from research to application development.

Bill Venners: You have a lot of experience that guides you in design decisions.

Guido van Rossum: Nothing beats experience.

Bill Venners: But it does sound like you're open to community feedback, which also helps you design better.

Guido van Rossum: The Python community does user testing by letting a vast group work with either a prototype implementation, a previous implementation, or a third-party implementation being prepared to go into the standard library. Then we tweak it as we go. We are not afraid to do a whole system redesign. One benefit of the ease with which you can change code in Python is that you're not so afraid to rethink your decisions.

Breaking Code Release to Release

Bill Venners: Sometimes after you've made a public release you might want to make an improvement to something in an API, but programmers have already written code to that API. To what extent do you break code from release to release in Python?

Guido van Rossum: We actually try very hard not to break code unless absolutely necessary. Only under rare circumstances do we resort to fixing a design bug in an incompatible way. More recently, as Python's user community has grown, we've become even more conservative about breaking code.

In the early days I changed the syntax drastically every few weeks or months. That's no longer the case. We now keep the old way of doing things, and add a new, better option that we attempt to persuade people to use. We use the carrot instead of the stick. Maybe eventually we'll start warning people when they run code the old way; for example, when people still use the old regular expression library five years after we said that's no longer how we're going to do it, we're going to give them warnings.
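The "keep the old way, add a better one, eventually warn" approach can be expressed with the standard `warnings` module. In this sketch, `old_search` and `new_search` are hypothetical names standing in for a legacy API and its replacement; they are not the historical regular expression modules. The old function keeps working but emits a `DeprecationWarning`. Because that category is hidden by default in many contexts, the usage example enables it explicitly.

```python
import re
import warnings


def new_search(pattern, text):
    """Hypothetical recommended API: index of the first match, or -1."""
    m = re.search(pattern, text)
    return m.start() if m else -1


def old_search(pattern, text):
    """Hypothetical legacy API, kept so existing code does not break."""
    warnings.warn(
        "old_search() is deprecated; use new_search() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return new_search(pattern, text)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make DeprecationWarning visible
    print(old_search("b+", "aabbb"))  # 2, same result as new_search
    print(caught[0].category.__name__, caught[0].message)
```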

The Python Decision Process

Frank Sommers: How do you decide what goes into the standard library? If I have a suggestion and decide to develop my own library, what's the decision process?

Guido van Rossum: Python is a language, but it's also a community. The community works in a certain way that seems to affect how the language evolves, like the decision on what does and does not go into the standard library. I used to accept contributions from almost anybody, as long as I thought they had a certain cool factor.

Bill Venners: The contribution was cool or the person making the contribution was cool?

Guido van Rossum: The contribution. I usually didn't know the people. Until 1994, I had hardly met any Python person outside the Netherlands face to face. The majority of users were in the US by then. No, I've always tried to judge contributions by merit and not by personalities.

In the early days I was fairly quick to adopt new ideas, and then I realized the community was growing and that meant more and more contributions. I had to be more selective. My first step was always saying no. Then, if people didn't take no for an answer, I would ask for arguments. Why do you think this is useful not just for you but for a large number of Python users?

If you are writing one particular approach for a popular application area, but there are lots of different ways of doing it, I won't put your particular way in the standard library if I can help it. But if there's one obvious way, clearly one best approach, I'm much more likely to put it into the standard library.

Application APIs are usually less likely to end up in the standard library than APIs that support many different application areas. At some point I think I made the mistake of including too much multimedia stuff. Some of the multimedia APIs are now withering away, because I lost interest in multimedia and because they're no longer relevant to what people are doing with multimedia now. There's no MP3 support, for example, but there's some support for old-style audio files that are probably not particularly useful anymore.

On the other hand, the standard library's Internet protocols, for example, are useful for many people. We also have a small collection of useful mathematical algorithms that aren't implemented by our basic data types. For example, if you have a sorted array, you can do a binary search or a binary insert into that array.
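The binary search and binary insert mentioned here are presumably what the standard `bisect` module provides. A minimal sketch with a made-up list of scores:

```python
import bisect

scores = [10, 20, 30, 40]  # must already be sorted

# Binary search: the index where 25 would go to keep the list sorted.
pos = bisect.bisect_left(scores, 25)
print(pos)  # 2


def contains(sorted_list, value):
    # Membership test in O(log n) comparisons via binary search.
    i = bisect.bisect_left(sorted_list, value)
    return i < len(sorted_list) and sorted_list[i] == value


print(contains(scores, 30))  # True
print(contains(scores, 35))  # False

# Binary insert: locate the position by binary search, then insert.
bisect.insort(scores, 25)
print(scores)  # [10, 20, 25, 30, 40]
```

Note that `insort` finds the position in logarithmic time, but inserting into a Python list still shifts the later elements, so the insert itself is linear in the list length.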

On occasion, a large group of users in a particular area will suddenly vocalize what they feel we must support, because it's going to be the next buzzword. Or maybe it's an application area where Python is going to be really useful because the area involves a lot of tinkering and experimenting and prototyping before you get it right. A big example is XML.

Python has a lot of XML support, because at some point a lobby of people said XML is going to be big, and Python is such a good language for dealing with XML. They asked for standard XML support in Python's standard library, because they didn't want to depend on a third party. So I suggested they create a third-party XML support library for Python. It matured through several revisions of user testing, feedback, and improvement, and then became part of the standard library. And that is still growing. There are still some parts of XML that Python doesn't support, but there's third-party support that will eventually make its way into the standard library.

Next Week

Come back Monday, February 24 for the first installment of a conversation with Pragmatic Programmers Dave Thomas and Andy Hunt.

Resources

Python.org, the Python Language Website:
http://www.python.org/

Introductory Material on Python:
http://www.python.org/doc/Intros.html

Python Tutorial:
http://www.python.org/doc/current/tut/tut.html

Python FAQ Wizard:
http://www.python.org/cgi-bin/faqw.py

Guido van Rossum's home page:
http://www.python.org/~guido/

Other Guido van Rossum Interviews:
http://www.python.org/~guido/interviews.html


About the authors

Frank Sommers is an editor with Artima Developer. He is also founder and president of Autospaces, Inc., a company providing collaboration and workflow tools in the financial services industry.

Bill Venners is president of Artima Software, Inc. and editor-in-chief of Artima.com. He is author of the book, Inside the Java Virtual Machine, a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project that produced the ServiceUI API. The ServiceUI became the de facto standard way to associate user interfaces with Jini services, and was the first Jini community standard approved via the Jini Decision Process. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community. He currently devotes most of his energy to building Artima.com into an ever more useful resource for developers.