This post originated from an RSS feed registered with Python Buzz
by Ian Bicking.
Original Post: 64 bit immediates in Python
Feed Title: Ian Bicking
Feed URL: http://www.ianbicking.org/feeds/atom.xml
Feed Description: Thoughts on Python and Programming.
Patrick Logan reminded me of immediates, and how interesting they
can get when you have 64 bits.
As a quick overview of an immediate -- typically an object in a
dynamically-typed, reference-based language (like Python or Smalltalk)
will be made up of a memory location. In memory you have information
about the object type, instance variables, etc. So on every operation
you get this pointer, find the location in memory, and do stuff.
With an immediate you put the entire object into the "pointer". To
do this, you might reserve the high bit to signal that this is an
immediate. If it's set, you have 31 bits left to encode the rest of
the object -- more than enough for a significant set of integers, and
potentially other objects (like True, False, and None). It gets
exciting with 64 bits, because you now have 63 bits of information
left, which gives a lot of room for storing different objects.
For instance, let's say you have a four-bit header for all Python
references (which is enough to distinguish a string), and 30 bits to
give the string's address (I have no idea how long addresses really
have to be, but anyway...); that gives you 30 bits to do whatever you
want with. You could store a hash value in there, which means that in
many cases you could compare and use strings without going into the
heap (memory) at all. You could store the string length (maybe only
using some of the bits, and limited to strings under a certain
length).
Immediates can generally only be used for immutable types, because an
immediate isn't passed around by reference -- since the "reference"
stores the object directly, it gets copied everytime it is
re-references (e.g., by assigning it to a variable). This is fine for
immutables, because multiple instances are indistinguishable from a
single instance, but for mutable values it would cause problems.
Some of this can already be done with 32 bits -- this sort of
optimization is (I believe) common in Smalltalk and Scheme
interpreters, typically for encoding small integers. And I don't
believe this is being done at all in the Python interpreter (though I
don't know for sure -- someone can correct me in the comments if I'm
wrong). Maybe something else for the thesis idea list