I thought I'd ask our lead VM engineer - Eliot Miranda - for
some details on method lookup in Smalltalk (VisualWorks in
particular):
Rodney, you should read the following books & papers (in
order); they'll help you understand Smalltalk's performance.
[Goldberg83] Adele Goldberg, David Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, 1983, ISBN 0-201-11371-6.
Now out of print, but available by combining:
- Adele Goldberg, David Robson, Smalltalk-80: The Language, Addison-Wesley, 1989, ISBN 0-201-13688-0
- The Blue Book (Implementation)
[Deutsch84] L. Peter Deutsch, Allan M. Schiffman, "Efficient Implementation of the Smalltalk-80 System", 11th Annual Symposium on Principles of Programming Languages, pp. 297-302, ACM, January 1984.
Context Management in VisualWorks 5i, Eliot Miranda. Available on the web (PDF).
But briefly, here's why things are faster than you might expect.

Message selectors are maintained by the system as a pool of unique strings (Symbols), so that equality comparison of message selectors requires only comparing the addresses of the symbol objects.

In the '70s and early '80s, message lookup was optimized by having the run-time system maintain a small (1024- or 2048-entry) method lookup cache that remembers recent method lookups. The table is hashed by the identity hash of the receiver's class and the message selector. (On early systems the identity hash is simply the object's address.) Method lookup then becomes:
hash := (receiver class identityHash + selector identityHash) bitAnd: CacheSize - 1.
((cache at: hash) class == receiver class
  and: [(cache at: hash) selector == selector])
    ifTrue:
        [targetMethod := (cache at: hash) method]
    ifFalse:
        [targetMethod := self lookup: selector in: receiver class.
         (cache at: hash)
            class: receiver class;
            selector: selector;
            method: targetMethod]
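The probe above can be modelled in a few lines of Python. This is a toy sketch, not VisualWorks code: Python's `type()` and `__mro__` stand in for the class pointer and the hierarchy walk, `id()` stands in for the identity hash, and the class and selector names are made up.

```python
# Toy model of a first-level method lookup cache (not VisualWorks code).
# Each entry remembers (class, selector, method); the probe index is a
# hash of the receiver's class and the selector, masked to the table size.

CACHE_SIZE = 1024  # power of two, so CACHE_SIZE - 1 works as a bit mask

class Entry:
    __slots__ = ("klass", "selector", "method")
    def __init__(self):
        self.klass = self.selector = self.method = None

cache = [Entry() for _ in range(CACHE_SIZE)]

def full_lookup(klass, selector):
    """Slow path: walk the class hierarchy (Python's MRO stands in here)."""
    for c in klass.__mro__:
        if selector in c.__dict__:
            return c.__dict__[selector]
    raise AttributeError(selector)

def cached_lookup(receiver, selector):
    klass = type(receiver)
    index = (id(klass) + hash(selector)) & (CACHE_SIZE - 1)
    entry = cache[index]
    # Smalltalk Symbols are unique, so the selector test there is a
    # pointer comparison; Python string equality stands in for it.
    if entry.klass is klass and entry.selector == selector:
        return entry.method                      # hit: no hierarchy walk
    method = full_lookup(klass, selector)        # miss: do the slow lookup...
    entry.klass, entry.selector, entry.method = klass, selector, method
    return method                                # ...and remember it

class Point:
    def printString(self):
        return "a Point"

p = Point()
m = cached_lookup(p, "printString")   # miss: fills the cache entry
m2 = cached_lookup(p, "printString")  # hit: same entry, no walk
```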
Since then, Dynamic Translation (a.k.a. JIT compilation) has
increased performance by nearly an order of magnitude. The run-time
system does not interpret bytecode; instead it maintains a cache of
the most recently used methods, compiled on demand to machine code. We
call these nmethods. Every message send is first translated into
the following machine code sequence:
classRegister := selector. "i.e. load a register with the address of a symbol"
call unlinkedSend1Args. "i.e. call a run-time routine to find the method, encoding the arg count in the call for arg counts 0 to small n"
When unlinkedSend is invoked it locates the receiver from e.g.
argument registers, obtains the selector from classRegister and
uses a modified version of the first-level method lookup cache
algorithm above to locate an nmethod for the lookup. If an nmethod
isn't found it searches the class hierarchy, translates the method
to native code and stores it in the first-level lookup cache. And
now the clever bit: the send site is rewritten from
classRegister := selector.
call unlinkedSend1Args.
to
classRegister := class. "i.e. whatever the class of the receiver was when unlinkedSend was called"
call nmethod.entryPoint
The nmethod's code at its entry point then checks that the class of
the current receiver agrees with that stored in classRegister,
e.g.
entry:
tempRegister := receiver class.
tempRegister != classRegister ifTrue:
[self handleSendMiss].
...
So if the receiver's class is the same as it was when the send
site was rewritten, the target method is the same and we're done:
the send costs only a class fetch, a register assignment and a
comparison. 90% of send sites are monomorphic, so this speeds
things up enormously.
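The link-and-check dance at a monomorphic send site can be sketched in Python, again as a toy model rather than VisualWorks machine code; the `SendSite` class and its field names are illustrative inventions.

```python
# Toy monomorphic inline cache (illustrative, not VisualWorks machine code).
# A send site caches the receiver's class and the looked-up method; on each
# send it only has to compare the current receiver's class with the cached one.

class SendSite:
    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None   # "classRegister" contents after linking
        self.cached_method = None  # the linked nmethod's entry point
        self.misses = 0

    def send(self, receiver, *args):
        klass = type(receiver)
        if klass is self.cached_class:            # the fast entry-point check
            return self.cached_method(receiver, *args)
        self.misses += 1                          # handleSendMiss: relink
        self.cached_class = klass
        self.cached_method = getattr(klass, self.selector)
        return self.cached_method(receiver, *args)

class Circle:
    def area(self):
        return 3

class Square:
    def area(self):
        return 4

site = SendSite("area")
site.send(Circle())   # miss: links the site to Circle's method
site.send(Circle())   # hit: one class comparison, then a direct call
site.send(Square())   # different class: miss again, relinks to Square's method
```

Note that each new receiver class relinks the whole site; this is exactly why purely monomorphic sites win and polymorphic ones need the PICs described next.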
Polymorphic send sites are sped up by using "polymorphic inline
caches" or PICs, which look like a jump table, performing a series of
class comparisons.
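A PIC's dispatch logic can be modelled the same hedged way: a small ordered table of (class, method) pairs that the send site checks in turn. Real PICs are generated machine code with a bounded number of cases; the limit and names below are assumptions for illustration.

```python
# Toy polymorphic inline cache (PIC): a small jump table of class checks.
# Real PICs are generated machine code; this models only the dispatch logic.

PIC_LIMIT = 6  # assumed: PICs hold only a handful of cases

class PICSite:
    def __init__(self, selector):
        self.selector = selector
        self.cases = []  # ordered (class, method) pairs, one per class seen

    def send(self, receiver, *args):
        klass = type(receiver)
        for case_class, method in self.cases:     # the series of class comparisons
            if klass is case_class:
                return method(receiver, *args)
        method = getattr(klass, self.selector)    # miss: fall back to full lookup
        if len(self.cases) < PIC_LIMIT:
            self.cases.append((klass, method))    # extend the PIC with this case
        return method(receiver, *args)

class Dog:
    def speak(self):
        return "woof"

class Cat:
    def speak(self):
        return "meow"

pic = PICSite("speak")
pic.send(Dog())  # extends the PIC with (Dog, speak)
pic.send(Cat())  # extends it again with (Cat, speak)
```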
There is also substantial mechanism to allow native stack frames
to be used, creating context objects for method activations only
when required.