This post originated from an RSS feed registered with Python Buzz
by Thomas Guest.
Original Post: Octal Literals
Feed Title: Word Aligned: Category Python
Feed URL: http://feeds.feedburner.com/WordAlignedCategoryPython
Feed Description: Dynamic languages in general. Python in particular. The adventures of a space sensitive programmer.
I recently discovered that you could write binary literals directly
using Ruby, which I thought a good idea. Programmers often have to
think in binary, so it’s important that a language should support
binary literals naturally. I also spotted that Ruby extends the usual C
convention for octal literals. In this case, I think Ruby makes the
mistake of building on a broken design.
Perhaps the Latin Small Letter O was more obvious?
The Problem with Octal Numbers
The optional O (that’s the letter O) to explicitly indicate the
base does make some sort of sense – it’s consistent with the X for
hexadecimal and indeed the B which Ruby adds for binary. But really it’s
just adding confusion to an already confusing design.
Here, a novice programmer has padded the numbers in the countdown
with leading zeros to make them line up nicely. Fortunately the
compiler catches the problem in this case:
invalid digit "9" in octal constant
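The same check can be sketched in Python. This is only an illustration: the countdown list and the parse_c_style helper are hypothetical, and the helper merely imitates the C rule that a leading zero selects base 8.

```python
# Hypothetical countdown, zero-padded so the values line up.
countdown = ["10", "09", "08", "07", "06", "05", "04", "03", "02", "01"]


def parse_c_style(text):
    """Read an integer the way C reads a literal: a leading 0 selects base 8."""
    base = 8 if len(text) > 1 and text.startswith("0") else 10
    return int(text, base)  # raises ValueError for digits outside the base


rejected = []
for text in countdown:
    try:
        parse_c_style(text)
    except ValueError:
        rejected.append(text)

print(rejected)  # ['09', '08']
```

Only the entries containing an 8 or a 9 are rejected; the rest are accepted as octal without complaint.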
We might not have been so lucky, though. Here’s some dangerously
broken Python:
This runs through the interpreter without raising a
SyntaxError. We do have a semantic error, though. "L" and "X" map to
octal literals with decimal values 40 and 8 respectively. 0ops!
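A rough reconstruction of that failure, assuming a Python 3 interpreter (which refuses to compile literals such as 050, so the padded values are held as text here). The table contents are a guess at the kind of code involved, not the original listing.

```python
# Hypothetical padded table; values are text because Python 3 rejects 050.
padded = {"C": "100", "L": "050", "X": "010", "V": "005", "I": "001"}

roman = {}
for numeral, text in padded.items():
    # Under the C convention a leading zero marks the literal as octal.
    base = 8 if text.startswith("0") else 10
    roman[numeral] = int(text, base)

print(roman)  # {'C': 100, 'L': 40, 'X': 8, 'V': 5, 'I': 1}
```

"V" and "I" survive only because single octal digits below 8 have the same value in decimal.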
When are Octal Literals Needed?
I’ve never really needed them. 8 is a power of two, but so is 16: if
it’s a binary number we need, a binary literal would be better; and, in
the absence of language support for binary literals, a hexadecimal number
is more useful than an octal one, since two hex digits make up a byte.
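A small sketch of why hex fits bytes better than octal. The byte value is arbitrary and chosen only for illustration.

```python
byte = 0b10100101  # an arbitrary 8-bit pattern

# Each hex digit covers exactly four bits, so two digits cover the byte.
high, low = byte >> 4, byte & 0xF
hex_text = format(high, "x") + format(low, "x")

# Octal digits cover three bits each, so a byte needs three digits and the
# leading digit only ever uses two of its three bits.
oct_text = format(byte, "03o")

print(hex_text)  # a5
print(oct_text)  # 245
```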
Occasionally an octal escape is a useful way to insert a non-printable
character into a string literal. Here’s an example:
Octal value in a String Literal
std::string s = "ABC\177DEF";
Here, the escaped octal value 177 is embedded into the string. Octal 177
equals hexadecimal 7F, but we run into trouble if we try:
Very large Hex value in a String Literal
std::string s = "ABC\x7FDEF";
Here, the "DEF" characters are valid hexadecimal and therefore
become part of the number we’re embedding; so we’ve tried to put the
hex number 7FDEF into a byte. If we’re lucky our compiler will warn us:
warning: hex escape sequence out of range
If we’re unlucky or if we don’t act on the warning, the result is
implementation defined. In any case, it’s certainly not what we
wanted. Octal escape sequences suffer from a similar problem: an
escape of fewer than three digits absorbs any of the digits "0" - "7"
that follow it, up to the three-digit limit.
The workaround is simple:
Hex 7F in a String Literal
std::string s = "ABC\x7F" "DEF";
or even:
Hex 7F in a String Literal
std::string s = "ABC" "\x7F" "DEF";
In other words, even this use of octal values is of limited practical use.
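For comparison, here is how the same escapes behave in Python. This is a sketch, not a translation of the C++ above: Python's \x escape reads exactly two hex digits, so the hex form is safe there, while a short octal escape still absorbs the digits that follow it.

```python
# An octal escape takes up to three digits, so digits after a short escape
# are swallowed into it.
absorbed = "\123"      # one character: octal 123 is "S"
separate = "\1" "23"   # adjacent literals: chr(1) followed by the text "23"

print(len(absorbed), len(separate))  # 1 3

# In Python, \x consumes exactly two hex digits, so "DEF" remains text.
s = "ABC\x7FDEF"
print(len(s))  # 7
```

Splitting the literal, as in the C++ workaround, keeps the escape and the following text apart in both languages.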
Optional Radices for Integral Literals
Octal literals – as implemented in the C family of languages – are
problematic and not especially useful. However, it is occasionally
useful to be able to write numbers using a different radix (and probably
more useful than we realise, since we’ve never been able to try it). I’ve
already said why I think binary numbers are desirable. Hexadecimal
numbers, which pack so neatly into bytes, are also of special interest.
But why restrict ourselves to radices 10, 16, 8 and 2? A bit of Googling
found this suggestion from Andrew Koenig in a Python mailing list archive:
I am personally partial to allowing an optional radix (in decimal)
followed by the letter r at the beginning of a literal, so 19, 8r23,
and 16r13 would all represent the same value.
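The suggested notation is easy to prototype as a string parser. This is a minimal sketch: parse_radix_literal is a hypothetical helper, not part of Python or of the proposal, and it leans on the built-in int(), which limits the radix to the range 2 to 36.

```python
def parse_radix_literal(text):
    """Parse '<radix>r<digits>' (radix written in decimal) or plain decimal."""
    radix, sep, digits = text.partition("r")
    if not sep:
        # No 'r' present: treat the whole text as an ordinary decimal number.
        return int(text, 10)
    # The radix is decimal, so the first 'r' is always the separator, even
    # for radices above 27 where 'r' is itself a valid digit.
    return int(digits, int(radix, 10))


for literal in ("19", "8r23", "16r13", "2r10011"):
    print(literal, parse_radix_literal(literal))  # each literal evaluates to 19
```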