The Artima Developer Community
Weblogs Forum
Python 3000 Status Update (Long!)

47 replies on 4 pages. Most recent reply: Jun 13, 2008 4:43 PM by Brad Schick

Guido van Rossum

Posts: 359
Nickname: guido
Registered: Apr, 2003

Python 3000 Status Update (Long!) (View in Weblogs)
Posted: Jun 18, 2007 11:29 PM
Summary
Here's a long-awaited update on where the Python 3000 project stands. We're looking at a modest two months of schedule slip, and many exciting new features. I'll be presenting this in person several times over the next two months.

Project Overview

Early History

The first time I came up with the idea of Python 3000 was probably at a Python conference in the year 2000. The name was a take on Windows 2000. For a long time there wasn't much more than a list of regrets and flaws that were impossible to fix without breaking backwards compatibility. The idea was that Python 3000 would be the first Python release to give up backwards compatibility in favor of making it the best language going forward.

Recent History

Maybe a year and a half ago (not coincidentally around the time I started working for Google, which gave me more time for work on Python than I had had in a long time) I decided it was time to start designing and planning Python 3000 for real. Together with the Python developer and user community I came up with a Plan. We created a new series of PEPs (Python Enhancement Proposals) whose numbers started with 3000. There was a PEP 3000 already, maintained by others in the community, which was mostly a laundry list of ideas that had been brought up as suitable for implementation in Python 3000. This was renamed to PEP 3100; PEP 3000 became the document describing the philosophy and schedule of the project.

Since then, we have, well, perhaps not moved mountains, but certainly a lot of water has flowed under the bridge of the python-dev mailing list, and later the separate python-3000 mailing list.

Tentative Schedule

A schedule was first published around a year ago; we were aiming for a first 3.0 alpha release by the end of the first half of 2007, with a final 3.0 release a year later. (Python 3.0 will be the version when it is released; "Python 3000" or "Py3k" is the project's code name.)

This schedule has slipped a bit; we're now looking at a first alpha by the end of August, and the final release is pushed back by the same amount. (The schedule slip is largely due to the amount of work resulting from the transition to all-Unicode text strings and mutable raw bytes arrays. Perhaps I also haven't delegated enough of the work to other developers, a mistake I am frantically trying to correct.)

Python 2.6

There will be a "companion" release of Python 2.6, scheduled to be released a few months before 3.0, with an alpha release about 4 months before then (i.e., well after the first 3.0 alpha). The next two sections explain its role. If you're not interested in living on the bleeding edge, 2.6 is going to be the next version of Python you'll be using, and it will not be very different from 2.5.

Compatibility and Transition

Compatibility

Python 3.0 will break backwards compatibility. Totally. We're not even aiming for a specific common subset. (Of course there will be a common subset, probably quite large, but we're not aiming to make it convenient or even possible to write significant programs in this subset. It is merely the set of features that happen to be unchanged from 2.6 to 3.0.)

Python 2.6, on the other hand, will maintain full backwards compatibility with Python 2.5 (and previous versions to the extent possible), but it will also support forward compatibility, in the following ways:

  • Python 2.6 will support a "Py3k warnings mode" which will warn dynamically (i.e. at runtime) about features that will stop working in Python 3.0, e.g. assuming that range() returns a list.
  • Python 2.6 will contain backported versions of many Py3k features, either enabled through __future__ statements or simply by allowing old and new syntax to be used side-by-side (if the new syntax would be a syntax error in 2.5).
  • Complementary to the forward compatibility features in 2.6, there will be a separate source code conversion tool. This tool can do a context-free source-to-source translation. As a (very simple) example, it can translate apply(f, args) into f(*args). However, the tool cannot do data flow analysis or type inference, so it simply assumes that apply in this example refers to the old built-in function.
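
To make the apply() example concrete, here is a minimal sketch of the before and after forms. The function area and the tuple args are made-up names for illustration only. The old spelling appears in a comment because apply() does not exist in released Python 3.

```python
def area(width, height):
    """Hypothetical function used only for this illustration."""
    return width * height


args = (3, 4)

# Old spelling, which the conversion tool would rewrite:
#     result = apply(area, args)

# Converted spelling, which is valid in 2.x and 3.0 alike:
result = area(*args)
print(result)
```

Note that the rewrite is purely textual: if a program had defined its own function called apply, the tool would have no way to know and would convert the call anyway.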

Transitional Development

The recommended development model for a project that needs to support Python 2.6 and 3.0 simultaneously is as follows:

  1. Start with excellent unit tests, ideally close to full coverage.
  2. Port the project to Python 2.6.
  3. Turn on the Py3k warnings mode.
  4. Test and edit until no warnings remain.
  5. Use the 2to3 tool to convert this source code to 3.0 syntax. Do not manually edit the output!
  6. Test the converted source code under 3.0.
  7. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.
  8. When it's time to release, release separate 2.6 and 3.0 tarballs (or whatever archive form you use for releases).

The conversion tool produces high-quality source code that in many cases is indistinguishable from manually converted code. Still, it is strongly recommended not to start editing the 3.0 source code until you are ready to reduce 2.6 support to pure maintenance (i.e. the moment when you would normally move the 2.6 code to a maintenance branch anyway).

Step (2) is expected to take the usual amount of effort of porting any project to a new Python version. We're trying to make the transition from 2.5 to 2.6 as smooth as possible.

If the conversion tool and the forward compatibility features in Python 2.6 work out as expected, steps (3) through (7) should not take much more effort than the typical transition from Python 2.x to 2.(x+1).

Status of Individual Features

There are too many changes to list them all here; instead, I will refer to the PEPs. However, I'd like to highlight a number of features that I find significant or that I expect to be of particular interest or controversial.

Unicode, Codecs and I/O

We're switching to a model known from Java: (immutable) text strings are Unicode, and binary data is represented by a separate mutable "bytes" data type. In addition, the parser will be more Unicode-friendly: the default source encoding will be UTF-8, and non-ASCII letters can be used in identifiers. There is some debate still about normalization, specific alphabets, and whether we can reasonably support right-to-left scripts. However, the standard library will continue to use ASCII only for identifiers, and limit the use of non-ASCII in comments and string literals to unit tests for some of the Unicode features, and author names.

We will use "..." or '...' interchangeably for Unicode literals, and b"..." or b'...' for bytes literals. For example, b'abc' is equivalent to creating a bytes object using the expression bytes([97, 98, 99]).
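
A small sketch of the two literal forms follows. It was written against a released Python 3 interpreter, so one detail differs from the plan described here: in the released language bytes ended up immutable, and the mutable variant is spelled bytearray. The variable names are illustrative.

```python
text = "abc"                       # text string (Unicode)
data = b"abc"                      # bytes literal

same = data == bytes([97, 98, 99])
code_points = list(data)           # iterating over bytes yields integers
kinds = (type(text).__name__, type(data).__name__)

mutable = bytearray(data)          # the mutable variant in released Python 3
mutable[0] = 65                    # replace the first byte with ord("A")

print(same, code_points, kinds, bytes(mutable))
```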

We are adopting a slightly different approach to codecs: while in Python 2, codecs can accept either Unicode or 8-bit strings as input and produce either as output, in Py3k, encoding is always a translation from a Unicode (text) string to an array of bytes, and decoding always goes in the opposite direction. This means that we had to drop a few codecs that don't fit in this model, for example rot13, base64 and bz2 (those conversions are still supported, just not through the encode/decode API).
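
The direction rule can be sketched as follows. This is an illustration run against released Python 3, where the standard base64 module provides the bytes-to-bytes conversion; the sample string is arbitrary.

```python
import base64

text = "caf\u00e9"                    # text string with one non-ASCII letter
encoded = text.encode("utf-8")        # encoding: text -> bytes
decoded = encoded.decode("utf-8")     # decoding: bytes -> text

# base64 maps bytes to bytes, so it does not fit the encode/decode model
# and is reached through its own module instead.
b64 = base64.b64encode(encoded)
roundtrip = base64.b64decode(b64)

print(encoded, decoded == text, roundtrip == encoded)
```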

New I/O Library

The I/O library is also changing in response to these changes. I wanted to rewrite it anyway, to remove the dependency on the C stdio library. The new distinction between bytes and text strings required a (subtle) change in API, and the two projects were undertaken hand in hand. In the new library, there is a clear distinction between binary streams (opened with a mode like "rb" or "wb") and text streams (opened with a mode not containing "b"). Text streams have a new attribute, the encoding, which can be set explicitly when the stream is opened; if no encoding is specified, a system-specific default is used (which might use guessing when an existing file is being opened).

Read operations on binary streams return bytes arrays, while read operations on text streams return (Unicode) text strings; the same holds for write operations. Writing a text string to a binary stream or a bytes array to a text stream will raise an exception.

Otherwise, the API is kept pretty compatible. While there is still a built-in open() function, the full definition of the new I/O library is available from the new io module. This module also contains abstract base classes (see below) for the various stream types, a new implementation of StringIO, and a new, similar class BytesIO, which is like StringIO but implements a binary stream, hence reading and writing bytes arrays.
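
Here is a minimal sketch of the binary/text split, using the io module as it exists in released Python 3. The file name demo.txt and the sample strings are placeholders; the temporary directory is only there to keep the example self-contained.

```python
import io
import os
import tempfile

binary = io.BytesIO()
binary.write(b"raw bytes")            # binary streams accept bytes
binary_content = binary.getvalue()

textual = io.StringIO()
textual.write("text string")          # text streams accept text
text_content = textual.getvalue()

# Mixing the two kinds raises an exception.
try:
    binary.write("text string")
    mixing_error = None
except TypeError as exc:
    mixing_error = exc

# The same file read back through a binary and a text stream.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("h\u00e9llo")
    with open(path, "rb") as f:
        raw = f.read()                # bytes
    with open(path, "r", encoding="utf-8") as f:
        txt = f.read()                # text string

print(type(raw).__name__, type(txt).__name__, type(mixing_error).__name__)
```

Passing the encoding explicitly, as done here, avoids depending on the system-specific default.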

Printing and Formatting

Two more I/O-related features: the venerable print statement now becomes a print() function, and the quirky % string formatting operator will be replaced with a new format() method on string objects.

Turning print into a function usually makes some eyes roll. However, there are several advantages: it's a lot easier to refactor code using print() functions to use e.g. the logging package instead; and the print syntax was always a bit controversial, with its >>file and unique semantics for a trailing comma. Keyword arguments take over these roles, and all is well.
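
As a rough illustration of how keyword arguments take over the old syntax, the sketch below writes to an in-memory buffer. The keyword names file, end and sep are those of the print() function in released Python 3; the buffer is just a stand-in for any file-like object.

```python
import io

buffer = io.StringIO()

# Old: print >>buffer, "a", "b"
print("a", "b", file=buffer)

# Old: a trailing comma to suppress the newline
print("no newline", end="", file=buffer)

# The separator between arguments is configurable too.
print(1, 2, 3, sep="-", file=buffer)

output = buffer.getvalue()
print(repr(output))
```

Because print is now an ordinary function, swapping it for a logging call is a matter of replacing a name rather than rewriting a statement.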

Similarly, the new format() method avoids some of the pitfalls of the old % operator, especially the surprising behavior of "%s" % x when x is a tuple, and the oft-lamented common mistake of accidentally leaving off the final 's' in %(name)s. The new format strings use {0}, {1}, {2}, ... to reference positional arguments to the format() method, and {a}, {b}, ... to reference keyword arguments. Other features include {a.b.c} for attribute references and even {a[b]} for mapping or sequence access. Field lengths can be specified like this: {a:8}; this notation also supports passing on other formatting options.
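
The field syntax described above can be tried out as follows. This is a sketch against released Python 3; Point, stock and the sample values are invented for the illustration.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])   # hypothetical type for the demo
p = Point(3, 4)

positional = "{0} and {1}".format("spam", "eggs")
keyword = "{a} then {b}".format(a="first", b="second")
attribute = "x={p.x}, y={p.y}".format(p=p)
item = "{stock[spam]}".format(stock={"spam": 7})   # note: no quotes around the key
padded = "[{name:8}]".format(name="py3k")          # field width of 8

# A tuple is just another argument, unlike with "%s" % x.
as_tuple = "{0}".format((1, 2))

print(positional, keyword, attribute, item, padded, as_tuple, sep="\n")
```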

The format() method is extensible in a variety of dimensions: by defining a __format__() special method, data types can override how they are formatted, and how the formatting parameters are interpreted; you can also create custom formatting classes, which can be used e.g. to automatically provide local variables as parameters to the formatting operations.
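
A minimal sketch of the __format__() hook, assuming the behavior of released Python 3. The Celsius class and its "F" format code are hypothetical; they only show that the type itself decides what the text after the colon means.

```python
class Celsius:
    """Hypothetical type that interprets its own format spec."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __format__(self, spec):
        # "F" asks for Fahrenheit; any other spec falls back to Celsius.
        if spec == "F":
            return "{0:.1f}F".format(self.degrees * 9 / 5 + 32)
        return "{0:.1f}C".format(self.degrees)


temp = Celsius(20)
default = "{0}".format(temp)
fahrenheit = "{0:F}".format(temp)
print(default, fahrenheit)
```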

Changes to the Class and Type System

You might have guessed that "classic classes" finally bite the dust. The built-in class object is the default base class for new classes. This makes room for a variety of new features.

  • Class decorators. These work just like function decorators:

    @art_deco
    class C:
        ...
    
  • Function and method signatures may now be "annotated". The core language assigns no meaning to these annotations (other than making them available for introspection), but some standard library modules may do so; for example, generic functions (see below) can use these. The syntax is easy to read:

    def foobar(a: Integer, b: Sequence) -> String:
        ...
    
  • New metaclass syntax. Instead of setting a variable __metaclass__ in the body of a class, you must now specify the metaclass using a keyword parameter in the class heading, e.g.:

    class C(bases, metaclass=MyMeta):
        ...
    
  • Custom class dictionaries. If the metaclass defines a __prepare__() method, it will be called before entering the class body, and whatever it returns will be used instead of a standard dictionary as the namespace in which the class body is executed. This can be used, among other things, to implement a "struct" type where the order in which elements are defined is significant.

  • You can specify the bases dynamically, e.g.:

    bases = (B1, B2)
    
    class C(*bases):
        ...
    
  • Other keyword parameters are also allowed in the class heading; these are passed to the metaclass' __new__ method.

  • You can override the isinstance() and issubclass() tests, by defining class methods named __instancecheck__() or __subclasscheck__(), respectively. When these are defined, isinstance(x, C) is equivalent to C.__instancecheck__(x), and issubclass(D, C) to C.__subclasscheck__(D).

  • Voluntary Abstract Base Classes (ABCs). If you want to define a class whose instances behave like a mapping (for example), you can voluntarily inherit from the class abc.Mapping. On the one hand, this class provides useful mix-in behavior, replacing most of the functionality of the old UserDict and DictMixin classes. On the other hand, systematic use of such ABCs can help large frameworks do the right thing with less guesswork: in Python 2, it's not always easy to tell whether an object is supposed to be a sequence or a mapping when it defines a __getitem__() method. The following standard ABCs are provided: Hashable, Iterable, Iterator, Sized, Container, Callable; Set, MutableSet; Mapping, MutableMapping; Sequence, MutableSequence; Number, Complex, Real, Rational, Integer. The io module also defines a number of ABCs, so for the first time in Python's history we will have a specification for the previously nebulous concept of "file-like". The power of the ABC framework lies in the ability (borrowed from Zope interfaces) to "register" a concrete class X as "virtually inheriting from" an ABC Y, where X and Y are written by different authors and appear in different packages. (To clarify, when virtual inheritance is used, the mix-in behavior of class Y is not made available to class X; the only effect is that issubclass(X, Y) will return True.)

  • To support the definition of ABCs which require that concrete classes actually implement the full interface, the decorator @abc.abstractmethod can be used to declare abstract methods (only in classes whose metaclass is or derives from abc.ABCMeta).

  • Generic Functions. The inclusion of this feature, described in PEP 3124, is somewhat uncertain, as work on the PEP seems to have slowed down to a standstill. Hopefully the pace will pick up again. It supports function dispatch based on the type of all the arguments, rather than the more conventional dispatch based on the class of the target object (self) only.
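
Several of the class-system features in this list can be combined in one short sketch. It was written against released Python 3, so details may differ from the 2007 development branch. Every class and function name below (art_deco aside, which echoes the example above) is hypothetical, as is the packed keyword.

```python
import abc
import collections


def art_deco(cls):
    """Hypothetical class decorator: tags the class and returns it."""
    cls.decorated = True
    return cls


class OrderedMeta(type):
    """Hypothetical metaclass that remembers definition order."""

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # The returned mapping is used as the class-body namespace.
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        cls.field_order = [k for k in namespace if not k.startswith("__")]
        cls.options = kwargs  # extra keywords from the class heading
        return cls

    def __init__(cls, name, bases, namespace, **kwargs):
        # Swallow the extra keywords so type.__init__ never sees them.
        super().__init__(name, bases, namespace)


@art_deco
class Record(metaclass=OrderedMeta, packed=True):
    name = ""
    size = 0
    checksum = 0


class Shape(metaclass=abc.ABCMeta):
    @abc.abstractmethod
    def area(self):
        """Concrete subclasses must implement this."""


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


class ThirdPartyCircle:
    """Stands in for a class written by someone else."""


# "Virtual inheritance": only the issubclass()/isinstance() answers change.
Shape.register(ThirdPartyCircle)

try:
    Shape()  # abstract methods block instantiation
    abstract_error = None
except TypeError as exc:
    abstract_error = exc

print(Record.field_order, Record.options, issubclass(ThirdPartyCircle, Shape))
```

Registering ThirdPartyCircle does not give it an area() method; as the text says, the mix-in behavior is not inherited, only the subclass test changes.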

Other Significant Changes

Just the highlights.

Exception Reform

  • String exceptions are gone (of course).
  • All exceptions must derive from BaseException and preferably from Exception.
  • We're dropping StandardError.
  • Exceptions no longer act as sequences. Instead, they have an attribute args which is the sequence of arguments passed to the constructor.
  • The except E, e: syntax changes to except E as e; this avoids the occasional confusion caused by except E1, E2:.
  • The variable named after as in the except clause is forcefully deleted upon exit from the except clause.
  • sys.exc_info() becomes redundant (or may disappear): instead, e.__class__ is the exception type, and e.__traceback__ is the traceback.
  • There are additional optional attributes: __context__ is set to the "previous" exception when an exception occurs in an except or finally clause; __cause__ can be set explicitly when re-raising an exception, using raise E1 from E2.
  • The old raise syntax variants raise E, e and raise E, e, tb are gone.
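
The new except/raise forms and the exception attributes can be seen together in this sketch, run against released Python 3. ConfigError and load() are hypothetical names invented for the example.

```python
class ConfigError(Exception):
    """Hypothetical application-level exception."""


def load(settings):
    try:
        return settings["port"]
    except KeyError as e:
        # __cause__ is set explicitly with "raise ... from ...".
        raise ConfigError("missing port") from e


try:
    load({})
except ConfigError as e:
    caught = e  # keep a reference; the name e is deleted on exit

cause_type = type(caught.__cause__)
has_traceback = caught.__traceback__ is not None
arguments = caught.args

try:
    try:
        1 / 0
    except ZeroDivisionError:
        raise ValueError("while handling")  # __context__ is set implicitly
except ValueError as e:
    context_type = type(e.__context__)

# The variable named after "as" no longer exists here.
try:
    e
    name_deleted = False
except NameError:
    name_deleted = True

print(cause_type.__name__, context_type.__name__, arguments, name_deleted)
```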

Integer Reform

  • There will be only one built-in integer type, named 'int', whose behavior is that of 'long' in Python 2. The 'L' literal suffix disappears.
  • 1/2 will return 0.5, not 0. (Use 1//2 for that.)
  • Octal literal syntax changes to 0o777, to avoid confusing younger developers.
  • Binary literals: 0b101 == 5, bin(5) == '0b101'.
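
The integer changes are easy to check interactively. The few lines below are a sketch against released Python 3; the variable names are arbitrary.

```python
true_division = 1 / 2        # a float result
floor_division = 1 // 2      # the old integer behavior
octal = 0o777                # new octal spelling
binary = 0b101               # binary literal
binary_text = bin(5)
big = 2 ** 100               # no 'L' suffix and no separate long type

print(true_division, floor_division, octal, binary, binary_text, type(big).__name__)
```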

Iterators or Iterables instead of Lists

  • dict.keys() and dict.items() return sets (views, really); dict.values() returns an iterable container view. The iter*() variants disappear.
  • range() returns the kind of object that xrange() used to return; xrange() disappears.
  • zip(), map(), filter() return iterables (like their counterparts in itertools already do).
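
The practical effect of these changes can be sketched as follows, using released Python 3, where zip(), map() and filter() return one-shot iterators. The inventory dictionary and the other sample data are invented.

```python
inventory = {"spam": 3, "eggs": 12}

keys = inventory.keys()
inventory["ham"] = 1                  # views reflect later changes
sees_new_key = "ham" in keys
shared = keys & {"spam", "ham", "toast"}   # keys views behave like sets

numbers = range(10 ** 9)              # no billion-element list is built
size = len(numbers)
tenth = numbers[9]

pairs = zip("ab", [1, 2])
first_pass = list(pairs)
second_pass = list(pairs)             # already exhausted

squares = list(map(lambda n: n * n, range(4)))
odds = list(filter(lambda n: n % 2, range(6)))

print(sees_new_key, sorted(shared), size, tenth, first_pass, second_pass)
```

Code that really needs a list can wrap the call in list(), as the last two lines do.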

Miscellaneous

  • Ordering comparisons (<, <=, >, >=) will raise TypeError by default instead of returning arbitrary results. The default equality comparisons (==, !=, for classes that don't override __eq__) compare for object identity (is, is not). (The latter is unchanged from 2.x; comparisons between compatible types in general don't change, only the default ordering based on memory address is removed, as it caused irreproducible results.)
  • The nonlocal statement lets you assign to variables in outer (non-global) scopes.
  • New super() call: Calling super() without arguments is equivalent to super(<this_class>, <first_arg>). It roots around in the stack frame to get the class from a special cell named __class__ (which you can also use directly), and to get the first argument. __class__ is based on static, textual inclusion of the method; it is filled in after the metaclass has created the class object (but before class decorators run). super() works in regular methods as well as in class methods.
  • Set literals: {1, 2, 3} and even set comprehensions: {x for x in y if P(x)}. Note that the empty set is set(), since {} is an empty dict!
  • reduce() is gone (moved to functools, really). This doesn't mean I don't like higher-order functions; it simply reflects that almost all code that uses reduce() becomes more readable when rewritten using a plain old for-loop. (Example.)
  • lambda, however, lives.
  • The backtick syntax, often hard to read, is gone (use repr()), and so is the <> operator (use !=; it was too flagrant a violation of TOOWTDI).
  • At the C level, there will be a new, much improved buffer API, which will provide better integration with numpy. (PEP 3118)
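
A few of the items in this list, shown side by side in a sketch against released Python 3. All names (make_counter, Base, Child and so on) are hypothetical.

```python
import functools
import operator


def make_counter():
    count = 0

    def increment():
        nonlocal count        # assign to the variable in the enclosing scope
        count += 1
        return count

    return increment


class Base:
    def greet(self):
        return "base"


class Child(Base):
    def greet(self):
        return "child+" + super().greet()   # no arguments needed


counter = make_counter()
counts = [counter(), counter(), counter()]

evens = {x for x in range(6) if x % 2 == 0}  # set comprehension
empty_set = set()
empty_dict = {}                              # {} is still a dict

# reduce() now lives in functools; the loop computes the same sum.
total_reduce = functools.reduce(operator.add, [1, 2, 3, 4], 0)
total_loop = 0
for n in [1, 2, 3, 4]:
    total_loop += n

# Ordering unrelated types raises TypeError instead of guessing.
try:
    1 < "one"
    ordering_error = None
except TypeError as exc:
    ordering_error = exc

print(counts, Child().greet(), evens, total_reduce, total_loop)
```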

Library Reform

I don't want to say too much about the changes to the standard library, as this is a project that will only get under way for real after 3.0a1 is released, and I will not personally be overseeing it (the core language is all I can handle). It is clear already that we're removing a lot of unsupported or simply outdated cruft (e.g. many modules only applicable under SGI IRIX), and we're trying to rename modules with CapWords names like StringIO or UserDict, to conform with the PEP 8 naming standard for module names (which requires a short all-lowercase word).

And Finally

Did I mention that lambda lives? I still get the occasional request to preserve it, so I figured I'd mention it twice. Don't worry, that request has been granted for over a year now.

Speaking Engagements

I'll be presenting (or have presented) this material in person at several events:

The slides from OSCON are up: http://conferences.oreillynet.com/presentations/os2007/os_vanrossum.ppt (earlier versions were nearly identical).


Dave Benjamin

Posts: 10
Nickname: ramenboy
Registered: Nov, 2003

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 12:25 AM
I live in Phoenix, and the 21st just happens to be my birthday! I'd love to hear you speak, but, pray tell, where *is* the Phoenix office? My Google-fu is weak...

Thanks,
Dave Benjamin

Morel Xavier

Posts: 73
Nickname: masklinn
Registered: Sep, 2005

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 1:32 AM
I'm still a bit annoyed that reduce() is gone, but oh well, I'll just system() call Haskell code when I need folds.

On the other hand, I'd like to know if the old printf-format style is completely gone and replaced by format(), or if format() is merely an additional way to do it?

Morel Xavier

Posts: 73
Nickname: masklinn
Registered: Sep, 2005

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 1:35 AM
(replying to myself because I don't think I can edit the previous post): as a side note, why isn't reduce() merely moving to the functools module?

functools is a module of "Tools for working with functions and callable objects" after all, moving reduce there would remove it from the global namespace and still allow fold users to have it if they want/need it (plus it makes sense, folds + partial application make for great stuff)

Jack Diederich

Posts: 6
Nickname: jackdied
Registered: Mar, 2005

Re: reduce() Posted: Jun 19, 2007 2:37 AM
reduce() is hard to use legibly and like sum() it is open to bad big-O solutions. It doesn't even save finger typing. I code golf occasionally (on codegolf.com) and I've never found a way to make code shorter with reduce() versus a loop.

Morel Xavier

Posts: 73
Nickname: masklinn
Registered: Sep, 2005

Re: reduce() Posted: Jun 19, 2007 3:44 AM
> reduce() is hard to use legibly

i don't think so

> and like sum() it is open
> to bad big-O solutions.

That's a completely different issue.

> It doesn't even save finger
> typing.

Most fold users find it does, but more importantly it maps differently to the brain. When you're used to thinking recursively folding just makes sense in many situations. You're not and you prefer iteration, that's fine. Others don't, and like the security of most fold's immutable operations (when in regular iterations you have to mutate stuff at some point)

Oren Tirosh

Posts: 3
Nickname: orentirosh
Registered: Feb, 2005

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 4:09 AM
Ok, so class body __metaclass__ goes away. What about the module-level __metaclass__ variable? Do you view it as purely a feature to support the classic/new-style transition?

I do have some other uses for it...

Nick Coghlan

Posts: 13
Nickname: ncoghlan
Registered: Dec, 2004

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 4:47 AM
> Octal literal syntax changes to 0o777, to avoid confusing younger developers.

I personally found the avoidance of data corruption bugs due to the use of int(x, 0) when processing data files with leading zeroes a more convincing rationale for this change...

I'll also second the question about reduce() - are we ditching it entirely, or just moving it to functools?

Anyway, nice summary!

George Sakkis

Posts: 14
Nickname: gsakkis
Registered: Jun, 2007

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 6:19 AM
Is __format__ orthogonal to __repr__/__str__/__unicode__ ? It's rather unclear, to me at least, how (or if) all these will coexist in Py3k.

Looking forward to the alpha!

Jeffrey Jacobs

Posts: 3
Nickname: timehorse
Registered: Jun, 2007

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 7:54 AM
Great job Guido!!!

I so look forward to Py3k! So many great decisions, though a few I'm still getting used to...

1) Making print and format functions is a very good idea. It should teach people that they should just use sys.stdout and sys.stderr directly instead of mucking with those values to get print to go out stderr by setting "sys.stdout = sys.__stderr__". Much improved. So I assume you intend "format('\{{1}, {a}\}', 'first', {'a':'second'})" to produce something like '{first, second}'? Is that the idea? Perhaps just the PEP number for this one so I can read it myself.

2) Generic Functions: Does this mean writing overloaded functions based on type, or are you going to allow C++ style template parameters using function decorations, e.g. @template<class typeA, class typeB> and then use partial template specialization to bind the types to specific function implementations. Again, I suppose I should read the corresponding PEP...

3) map, zip, filter returning iterators: makes me nervous but seems sound in principle. And if you need to map a list, a, to a list, for instance, you could always do 'list(map(f, a))', could you not? So sounds like a plan!

4) Reduce, I shall miss thee, but thine demise was foretold. Not 100% convinced that writing it out in code is cleaner. After all, C++ has a mutable form of reduce in its std algorithms and C++ is famous for going light on the library overkill (with the exception of std::basic_string).

5) How about '{,}' for an empty Set()? I guess it would be a pain to remember '{,}' for Sets and '{}' for Dicts but then again, as new developers come on board, they may not know the history of Python or why (from their point of view) '{}' is chosen arbitrarily to mean Dict. In fact, I would probably assume it meant Set since that is the simpler type and Sets, like Tuples, Strings, Bytes and Lists are all 1 item per element types, so why not '{}' assume the same, that is an empty Set? After all, Py3k is a redesign, so why not either eliminate '{}' all together or just say '{}' mean Set, where maybe an empty Set can be up-converted automatically to a Dict but if a Dict is explicitly desired, you must write 'Dict()'. Just a thought.

6) OT: Look, can Java just add a Matcher.expand method like Python's re.match.expand??? That's all I'm asking!
( http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6562056 )

NOTE: Bruce reminds me that this means there is technically "No Rule 6."

7) So it has NOTHING to do with Mystery Science Theater? Riiiiight! ;)

8) Hey! We miss you on the East Coast. Gulf Coast Pycon was a blast, and Great Lakes Pycon is be windy swingin' (thanks AMK and Mr. Goodger) but it's years since Pycon DC (Mr. Holden: when is Pycon London?) and even if the Baltimore Sprint comes to be, we miss you here in NoVa Guido!

9) Yay! Lambda!

10) Call it like it is: ABCs are Java Interfaces (or C++ Pure Virtual Classes).

There is a reason they call you Benevolent Dictator For Life, sir! In the end, you are always right!! :)

Now take care and keep up the excellent work!!!

Chuck Wegrzyn

Posts: 3
Nickname: wegrzyn
Registered: May, 2003

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 8:36 AM
Too bad in the function signatures you didn't include a way to indicate what exceptions can be thrown. A great many times you won't find that information in the documentation and you need to read the code.

John Gabriele

Posts: 1
Nickname: biped
Registered: Jun, 2007

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 9:11 AM
Not sure if these are quite on-topic or not, but:

1. Will Py3k come with setuptools instead of distutils?

2. Will docutils be part of Py3k?

Andrew Kuchling

Posts: 4
Nickname: amk
Registered: Jan, 2006

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 10:49 AM
> 1. Will Py3k come with setuptools instead of distutils?
> 2. Will docutils be part of Py3k?

Channeling van Rossum: those are questions about the standard library, which he isn't planning to rule on.

(setuptools has been verbally accepted for inclusion in 2.6, but we'll see if anyone finds the time to actually incorporate it.)

Tom -

Posts: 1
Nickname: tmber
Registered: Jun, 2007

go all generic functions? Posted: Jun 19, 2007 12:18 PM
"Generic Functions. The inclusion of this feature, described in PEP 3124, is somewhat uncertain, as work on the PEP seems to have slowed down to a standstill. Hopefully the pace will pick up again. It supports function dispatch based on the type of all the arguments, rather than the more conventional dispatch based on the class of the target object (self) only."

I think having both generic functions and member functions in a language is rather confusing since member functions behave just like generic functions that dispatch only on the first argument. I'd either leave out generic functions, or drop the member function syntax and use generic function syntax for everything.

Whatever you do, I'd drop the @overload keyword--generic functions are not the same as overloading.

Bram Cohen

Posts: 2
Nickname: bram
Registered: Mar, 2005

Re: Python 3000 Status Update (Long!) Posted: Jun 19, 2007 1:45 PM
Will there be a specially used main(argv) method of modules in Python 3000, which is only run if the given module is the main one? That would avoid the bizarre, repetitive, and pointless if __name__ == '__main__': main(sys.argv)

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use