Summary
Last week I started a Python 3000 FAQ, and solicited additional questions. Here are some additional answers.
This is a sequel to last week's post.
It's tempting to replace most Q/A pairs below with
Q. Will Python 3000 have feature X (which has not been
proposed yet)?
A. No. The deadline for feature proposals (PEPs) was April 30,
2007.
but I figured it would be better to try to explain why various ideas
(most of which aren't new) weren't proposed, or why they were rejected.
Q. Will implicit string concatenation be removed in Python 3000?
(I.e., instead of ("a""b") you'd have to write ("a"+"b").)
A. No. This was proposed in PEP 3126, but rejected.
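For anyone who hasn't run into the feature, here's a minimal example of implicit concatenation as it stands (and will remain):

# Adjacent string literals are concatenated at compile time;
# handy for breaking a long string across source lines.
message = ("This is a long message that "
           "spans two source lines but is "
           "a single string object at run time.")
assert message == ("This is a long message that " +
                   "spans two source lines but is " +
                   "a single string object at run time.")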
Q. Will the binary API for strings be standardized in Python 3000?
(Depending on a compile-time switch, Unicode strings use either a
2-byte wide or 4-byte wide representation.)
A. No, there are still compelling reasons to support 2 bytes in
some cases and 4 bytes in others. Usually this is dealt with by
compiling from source with the headers corresponding to the installed
Python binary. If that doesn't work for you, and you really care
about this, I recommend that you bring it up on the python-3000
mailing list, explaining your use case.
Q. Why isn't the GIL (Global Interpreter Lock) recursive?
A. Several reasons. Recursive locks are more expensive, and the
GIL is acquired and released a lot. Python's thread package doesn't
implement recursive locks in C (they are an add-on written in Python,
see RLock in threading.py). Given the different thread APIs on
different platforms it's important that the C code involved in threads
is minimal. But perhaps the most important reason is that the GIL
often gets released around I/O operations. Releasing only a single
level of a recursive lock would not be correct here; one would have to
release the underlying non-recursive lock and restore the recursion
level after re-acquiring. This is all rather involved. A
non-recursive lock is much easier to deal with.
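For the curious, here is a condensed sketch of the usual way a recursive lock is layered on top of a non-recursive one (an illustration in the spirit of threading.RLock, not the actual threading.py code):

import threading

class SketchRLock:
    # An illustrative recursive lock built on a plain lock.
    def __init__(self):
        self._lock = threading.Lock()  # the underlying non-recursive lock
        self._owner = None
        self._count = 0

    def acquire(self):
        me = threading.current_thread()
        if self._owner is me:       # already ours: just bump the count
            self._count += 1
            return
        self._lock.acquire()        # otherwise block on the real lock
        self._owner = me
        self._count = 1

    def release(self):
        assert self._owner is threading.current_thread()
        self._count -= 1
        if self._count == 0:        # only the outermost release unlocks
            self._owner = None
            self._lock.release()

Releasing such a lock around an I/O call would mean saving self._count, releasing the underlying lock, and restoring the count after re-acquiring -- exactly the complication described above.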
Q. Will we be able to use statements in lambda in Python 3000?
A. No. The syntax (turning indentation back on inside an
expression) would be both awkward to implement and hard for humans
to read. My recommendation is just to define a local (i.e., nested)
function -- this has the same semantics as lambda without the
syntactic restrictions. After all, this:
foo = lambda: whatever
is completely equivalent to this:
def foo(): return whatever
(except that the lambda doesn't remember its name).
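And as soon as the body needs statements, the nested function handles it naturally (the names below are made up for illustration):

def make_handler(logfile):
    # A lambda body is limited to a single expression; a nested
    # function can use statements freely, yet is still a value
    # you can pass around just like a lambda.
    def handler(event):
        logfile.write("got %r\n" % (event,))
        if event is None:
            return "ignored"
        return str(event).upper()
    return handler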
Q. Will Python 3000 require tail call optimization?
A. No. The argument that this would be a "transparent"
optimization is incorrect -- it leads to a coding style that
essentially depends on tail call optimization, at which point the
transparency is lost. (Otherwise, why bother asking to require it?
:-) Also, tracebacks would become harder to read. Face reality --
Python is not a functional language. It works largely by side
effects on mutable objects, and there is no opportunity for program
transformation based on equivalent semantics.
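To make the style point concrete, compare a sum written to depend on tail call elimination with the idiomatic loop (a toy example):

def count_tail(n, acc=0):
    # "Tail-recursive" style: only viable if the recursive call
    # is eliminated; in CPython it hits the recursion limit for
    # large n instead.
    if n == 0:
        return acc
    return count_tail(n - 1, acc + n)

def count_loop(n):
    # The idiomatic Python version: an explicit loop with an
    # accumulator variable, no special optimization required.
    acc = 0
    while n:
        acc += n
        n -= 1
    return acc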
Q. Will Python 3000 provide "real" private, protected and public?
A. No. Python will remain an "open kimono" language.
Q. Will Python 3000 support static typing?
A. Not as such. The language would turn into Java-without-braces.
However, you can use "argument annotations" (PEP 3107) and write
a decorator or metaclass to enforce argument types at run-time. I
suppose it would also be possible to write an extension to
pychecker or pylint that used annotations to check call
signatures.
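Here's a rough sketch of what such a decorator could look like (the decorator is hypothetical, not part of any library, and it only checks positional arguments to keep the example short):

import functools

def enforce(func):
    # Map positional parameter names to any annotations present.
    names = func.__code__.co_varnames[:func.__code__.co_argcount]
    anns = func.__annotations__

    @functools.wraps(func)
    def wrapper(*args, **kw):
        for name, value in zip(names, args):
            expected = anns.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError("%s must be %s, got %r"
                                % (name, expected.__name__, value))
        return func(*args, **kw)
    return wrapper

@enforce
def greet(name: str, times: int) -> str:
    return ("Hello, %s! " % name) * times

greet("world", 2)      # fine
# greet("world", "2")  # would raise TypeError at call time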
Q. Why doesn't str(c for c in X) return the string of
concatenated c values?
A. Then, to be consistent, str(['a','b','c']) would have to
return 'abc'. I don't think you want that. Also, what would
str([1,2,3]) do? It's a grave problem if str() ever raises
an exception because that means the argument cannot be printed:
print calls str() on each of its arguments.
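The supported spelling for the concatenation, of course, is str.join(), which accepts a generator expression directly:

X = ["a", "b", "c"]
assert "".join(c for c in X) == "abc"
# For non-strings, be explicit about the conversion:
assert "".join(str(n) for n in [1, 2, 3]) == "123"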
About the Unicode question, a little Googling reveals that it has been discussed on python-dev before, but without a conclusion:
"UCS2 vs. UCS4 matters because the two versions use and expose different C APIs and thus an extension written for UCS2 doesn't run with a Python built for UCS4 and vice-versa."
The question was about the interface, though. We don't care how it is stored internally, and converting between the two is rather trivial, I think. But it's very difficult to make a C binary that can link against different binary interfaces at runtime.
Our use case is simple: we want to combine Python and C code, because sometimes you need the extra speed of C, or its ability to integrate with other C libraries, or the ability to continue using existing C code while migrating code gradually to Python.
Since most of our users are not programmers, they don't typically have a C compiler or the Python headers, so only providing them with source code isn't very helpful.
> The question was about the interface, though. We don't
> care how it is stored internally, and converting between
> the two is rather trivial, I think. But it's very
> difficult to make a C binary that can link against
> different binary interfaces at runtime.
This was done intentionally, to prevent an extension module compiled for 2-byte Unicode from accessing the 4-byte Unicode internals (or vice versa).
In Python 2.x, if you don't use any PyUnicode_* APIs, your C code should link with either version. Is that not your experience? In 3.0 this isn't much of an option because all strings are unicode. Perhaps we can do something different, but please do bring it up on the python-3000 list.
> Our use case is simple: we want to combine Python and C
> code, because sometimes you need the extra speed of C, or
> its ability to integrate with other C libraries, or the
> ability to continue using existing C code while migrating
> code gradually to Python.
>
> Since most of our users are not programmers, they don't
> typically have a C compiler or the Python headers, so only
> providing them with source code isn't very helpful.
Why don't you distribute a Python interpreter binary built with the right options? Depending on users having installed the correct Python version (especially if your users are not programmers) is asking for trouble.
Please do consider continuing this on python-3000@python.org.
This is regarding function annotations. Given that decorators so often rely on constructing a new function that just passes *args, **kw to the original one, will this break the annotations of the original function? I hope I am making myself clear.
So often we do something like this in a constructor:
class Person(Model):
    def __init__(self, *args, **kw):
        super(Person, self).__init__(*args, **kw)
        # do some person specific thing
What will happen to the annotations of Model.__init__? Does it mean that in order to preserve the annotations of Model.__init__ I will have to duplicate them on my Person.__init__?
I think Ruby's eating our lunch with respect to domain specific languages by offering blocks (and the ability to drop parentheses). Couldn't we do something like this:
def mycommand(*args, **kw) cb:
    # this function requires a callback function called cb
    # do something with args and kw
    return cb()  # call the callback
and to use it:
mycommand("hello", name="test") do:
    # the body of this block will actually become a
    # method that is passed as an argument to mycommand
If you wanted to support arguments to callbacks you could add those after do, e.g.,
mycommand("hello", name="test") do(arg):
    ...
It would be pretty similar to the way Ruby does it, no?
I really love list comprehensions, but the (later introduced) generator expressions are much more useful and should be encouraged.
Unfortunately, list comprehensions got the best syntax using [], while generator expressions are using () which are already very overloaded as they are used in function calls etc.
Since you can easily "fake" list comprehensions by adding a list() around generator expressions, wouldn't it be better (since we can be incompatible) to give this (better) syntax to generator expressions and drop list comprehensions completely?
list([e for e in l])
if you absolutely need list comprehensions.
My guess is that most people actually want generator expressions but they end up using list comprehensions since that's the most obvious syntax.
Bjørn, I agree that generator expressions are often more useful than list comprehensions. The problem is there is already a defined list literal syntax (e.g. L = [1,2,3]) in the language, and list comprehensions have been used for a while in Python (since 2.0). Changing the behavior might break a lot of code. I also like the ability to create a generator in a function call. For example:
s = set(word for line in page for word in line.split())
I believe that list comprehensions will actually be syntactic sugar for list(genexp) in Python 3000.
For Stephane, who had trouble creating an Artima account... (translated from his French post on the pythonfr mailing list, with my own English ;-):
Am I the only one who deplores the removal of the reduce() function? I think it's nice, it goes well with map() (cf. http://labs.google.com/papers/mapreduce.html) and it's a programming style to promote.
> Q. If you're killing reduce(), why are you keeping map() and filter()?
>
> A. I'm not killing reduce() because I hate functional programming; I'm
> killing it because almost all code using reduce() is less readable
> than the same thing written out using a for loop and an accumulator
> variable. On the other hand, map() and filter() are often useful and
> when used with a pre-existing function (e.g. a built-in) they are
> clearer than a list comprehension or generator expression. (Don't use
> these with a lambda though; then a list comprehension is clearer and
> faster.)
> This is regarding function annotations. Given that decorators
> so often rely on constructing a new function that just
> passes *args, **kw to the original one, will this break the
> annotations of the original function? I hope I am making
> myself clear.
You can copy the annotations just like the docstring:
_func.__annotations__ = f.__annotations__.copy()
If _func's signature is a transformation of f's signature you can of course modify the copy appropriately.
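Spelled out as a sketch (pass_through is a made-up name; any wrapping decorator would do the same):

def pass_through(f):
    def _func(*args, **kw):
        return f(*args, **kw)
    # Carry the metadata over to the wrapper by hand:
    _func.__name__ = f.__name__
    _func.__doc__ = f.__doc__
    _func.__annotations__ = f.__annotations__.copy()
    return _func

@pass_through
def double(x: int) -> int:
    "Return twice x."
    return 2 * x

assert double.__annotations__ == {'x': int, 'return': int}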
> So often we do something like this in a constructor:
>
> class Person(Model):
>     def __init__(self, *args, **kw):
>         super(Person, self).__init__(*args, **kw)
>         # do some person specific thing
>
> What will happen to the annotations of Model.__init__? Does
> it mean that in order to preserve the annotations of
> Model.__init__ I will have to duplicate them on my
> Person.__init__?
No. In general the signature of a subclass __init__ doesn't have to match the signature of the base class __init__ so it would be a mistake to do something like this automatically; and deducing that the signatures are actually the same would require way too much understanding in the compiler.
For other methods than __init__ you could argue that the signature *should* be the same, but there are all sorts of situations where overriding methods actually have a different signature -- e.g. they could change the argument names, or add a new (optional) argument, or change the set of types they accept. So again, this is not done automatically.
Don't think of this as a weak form of static typing. It's more appropriate to think of it as per-argument docstrings.
> I think Ruby's eating our lunch with respect to domain
> specific languages by offering blocks (and the ability to
> drop parentheses).
I don't think Ruby-envy is the way to design a language. Remember, Ruby doesn't have first-class functions! The languages are just different. Besides (as I mentioned at the top of the article) the deadline for new features (especially new syntax) is long past. Try again for Python 4000. ;-)
> Bjørn, I agree that generator expressions are often more
> useful than list comprehensions. The problem is there is
> already a defined list literal syntax (e.g. L = [1,2,3]) in
> the language, and list comprehensions have been used for
> a while in Python (since 2.0). Changing the behavior might
> break a lot of code. I also like the ability to create a
> generator in a function call. For example:
>
> s = set(word for line in page for word in line.split())
>
> I believe that list comprehensions will actually be
> syntactic sugar for list(genexp) in Python 3000.
Correct.
Also note that you don't need to parenthesize a generator expression when it's the sole argument to a function, so Bjørn's example list([ e for e in l ]) can actually be written list(e for e in l) already.
> For Stephane, who had trouble creating an Artima
> account... (translated from his French post on the pythonfr
> mailing list, with my own English ;-):
>
> Am I the only one who deplores the removal of the reduce()
> function? I think it's nice, it goes well with map()
> (cf. http://labs.google.com/papers/mapreduce.html) and it's
> a programming style to promote.
Not in Python. In functional languages you don't have a way to update a variable on each pass through a loop, e.g.
total = 0
for x in lst:
    total += x
So this must be written as the equivalent of
total = reduce(lambda x, y: x+y, lst)
However, I have done an extensive survey of the use of reduce() in actual Python code, and found that most cases could easily be rewritten using the built-in sum() function or "".join(...), and that those that couldn't were mostly completely unreadable. For a particularly bad example, see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061 (a "one-liner word-wrap function"): it doesn't fit on a line, and the algorithm used is quadratic. Apart from that, it's unreadable -- and the fact that it was quadratic escaped most readers, including the Django developers, who incorporated it into Django until I pointed it out.
So I have to disagree that reduce() is a style to promote, at least in Python, where a much more straightforward style is available.
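Concretely, the two common rewrites look like this (a made-up before/after, not code from the survey):

from functools import reduce  # built-in in 2.x; in functools in 3.0

numbers = [1, 2, 3, 4]
words = ["spam", "and", "eggs"]

# The reduce() versions:
total = reduce(lambda x, y: x + y, numbers)
sentence = reduce(lambda x, y: x + " " + y, words)

# The straightforward replacements:
assert total == sum(numbers)
assert sentence == " ".join(words)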
The point was that the common case should be for people to create genexps (in most cases it won't make a difference to the outcome of the algorithm, but will use less memory).
Not including [] or () around generator expressions or list comprehensions seems a bit hackish syntactically; at the very least it's another special case people have to learn.
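To make the memory point above concrete (a toy example):

# Both compute the same sum, but the list comprehension first
# builds a million-element list in memory, while the generator
# expression feeds sum() one value at a time.
assert sum([x * x for x in range(1000000)]) == \
       sum(x * x for x in range(1000000))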