Ruby Buzz Forum - A Chat with Matz, Class Variable reversion, and A Mystery Explained


Rick DeNatale


Rick DeNatale is a consultant with over three decades of experience in OO technology.

A Chat with Matz, Class Variable reversion, and A Mystery Explained

Posted: Nov 3, 2007 1:02 AM

This post originated from an RSS feed registered with Ruby Buzz by Rick DeNatale.
Original Post: A Chat with Matz, Class Variable reversion, and A Mystery Explained
Feed Title: Talk Like A Duck
Feed URL: http://talklikeaduck.denhaven2.com/articles.atom
Feed Description: Musings on Ruby, Rails, and other topics by an experienced object technologist.

I finally got to meet Matz in person today, and we had a nice little chat.

I was pleased to learn that he follows this blog. My readership might be small, but it's quality readership. One thing I told him was that as much as I loved, and still love, Smalltalk, I really do feel that I love Ruby just a little bit more.

We talked a bit about the upcoming stabilization of 1.9, which will be frozen as far as language and standard library changes go for the Christmas 2007 1.9.1 release, at which time it will transition from development to stable status. He has been backing out some of the differences between 1.8.x and 1.9.0. He told me that he had recently reverted, or was soon to revert, the changes to the scoping of class variables. It appears that class variables will continue to be shared between classes and subclasses.
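To make "shared between classes and subclasses" concrete, here is a minimal sketch of that behaviour. The class names Ledger and BranchLedger are made up for illustration and are not from the conversation.

```ruby
# Hypothetical classes, used only to illustrate shared class variables.
class Ledger
  @@entries = 0 # one variable, shared with every subclass

  def self.entries
    @@entries
  end

  def self.record
    @@entries += 1
  end
end

class BranchLedger < Ledger
  def self.record_batch(size)
    @@entries += size # updates the same @@entries that Ledger sees
  end
end

Ledger.record
BranchLedger.record_batch(10)
total_seen_by_parent = Ledger.entries
total_seen_by_child  = BranchLedger.entries
```

Both readers see the same total, because the subclass does not get a private copy of the variable.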

He also cleared up a technical Ruby mystery that's been puzzling me for some time.

About a year ago I wrote about a change in Ruby 1.9 which cleaned up the semantics when modules included by a class are re-included in a subclass. The mystery is that sometime after I wrote the article, in a subsequent revision of 1.9, this got dropped. I never could find out why this happened, so today I took the opportunity to ask Matz in person.

He agreed that the way it worked temporarily was the right way for it to work, but said it got dropped because the change made it difficult for YARV to implement the super keyword.

We didn't go into further details, but here's my informed speculation on the situation.

When you send a message to a Ruby object, what happens, at least notionally, is that a linked list of (pseudo)objects is searched to find the first one which contains a method with that name in its method hash. Let's call this the method lookup chain. The head of the chain is pointed to by the receiving object's klass field, and is either the singleton class of the object, if it has one, or the Class of the object. The remaining links in the chain represent modules and classes in the ancestry of the object. If the end of the chain is reached, a method_missing message is generated and searched for, beginning again from the start of the receiving object's method chain.
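The chain described above can be observed from Ruby itself. The following is only a sketch: first_link_defining is a hypothetical helper, Greeter and Visitor are made-up names, and it assumes a Ruby (2.1 or later) in which singleton_class.ancestors lists the singleton class first.

```ruby
module Greeter
  def greet
    "hello from Greeter"
  end
end

class Visitor
  include Greeter

  # Invoked when the whole chain has been searched without finding the method.
  def method_missing(name, *args)
    name == :wave ? "no wave method, improvising" : super
  end

  def respond_to_missing?(name, include_private = false)
    name == :wave || super
  end
end

# Hypothetical helper: walk the chain from its head and return the first
# link whose own method table has a public or protected entry for `name`.
def first_link_defining(object, name)
  object.singleton_class.ancestors.find do |link|
    link.instance_methods(false).include?(name)
  end
end

guest = Visitor.new
found_before = first_link_defining(guest, :greet) # the Greeter module

# Giving the object a singleton method puts a new match at the head of the chain.
def guest.greet
  "hello from the singleton class"
end
found_after = first_link_defining(guest, :greet)  # guest's singleton class

fallback = guest.wave # no link defines wave, so method_missing answers
```

Ruby reports the module itself in ancestors; the IClass that stands in for it is an implementation detail that is not visible at this level.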

Each link is either a class or a pseudo-object marked as an "IClass" (for Included Class) whose method hash pointer points to the method hash of the Module it represents. Because module method hashes are shared in this way, changes to a module's methods are immediately visible to every class or module which includes the module.
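The visible effect of that sharing can be shown with a short sketch. Tools and Workbench are made-up names.

```ruby
module Tools
  def hammer
    "bang"
  end
end

class Workbench
  include Tools
end

bench = Workbench.new
could_saw_before = bench.respond_to?(:saw)

# Reopen the module after it has been included and after the object exists.
module Tools
  def saw
    "zzz"
  end
end

could_saw_after = bench.respond_to?(:saw)
sound = bench.saw
```

Nothing about Workbench or bench was touched between the two checks; only the module's method table changed.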

When the 'receiver' is super, the same kind of search occurs as for a send to self, but instead of starting with the receiver's method chain, we want to search starting with the link in the chain after the one in which the currently executing method was found.
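In later Rubies (2.2 and up) the "next link after the current method" can be inspected with Method#super_method. This sketch uses that API purely as a window onto the chain; the class names are invented, and it says nothing about how the interpreter resolves super internally.

```ruby
module Tagging
  def describe
    "tagged #{super}"
  end
end

class Item
  def describe
    "item"
  end
end

class TaggedItem < Item
  include Tagging

  def describe
    "[#{super}]"
  end
end

# Follow the super links one step at a time, recording where each method lives.
owners = []
step = TaggedItem.new.method(:describe)
while step
  owners << step.owner
  step = step.super_method # nil once there is nothing further up the chain
end

description = TaggedItem.new.describe
```

Each super picks up the search just past the link that supplied the method currently running: the class, then the module, then the superclass.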

Smalltalk implementations typically reify methods as CompiledMethod objects, each of which has an instance variable referring to the class in which the method was defined. So a send to super simply starts the search with the superclass of the currently executing CompiledMethod's defining class.

This approach doesn't work for Ruby because Modules can be included in multiple classes or modules, which means that methods defined in Modules can appear in multiple places in multiple method lookup chains.
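A small sketch shows why a single "defining class" pointer is not enough. The names Loud, Dog, and Cat are made up. The one method body in Loud has to continue its super search in a different place depending on whose chain it was found in.

```ruby
module Loud
  def speak
    "#{super.upcase}!" # which speak this reaches depends on the including class
  end
end

class Dog
  def speak
    "woof"
  end
end

class Cat
  def speak
    "meow"
  end
end

class LoudDog < Dog
  include Loud
end

class LoudCat < Cat
  include Loud
end

dog_says = LoudDog.new.speak
cat_says = LoudCat.new.speak
```

The same Loud#speak sits between LoudDog and Dog in one chain and between LoudCat and Cat in the other, so "the superclass of the defining class" has no single answer.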

One way to find the right place to start searching for a method with super as the receiver is to start from the beginning of self's method lookup chain, search until we find the currently executing method, then start the super search with the next link in the chain. I believe that this is how Ruby works, except for that brief time in the evolution of 1.9.

In Smalltalk you must name the message when you do a send to super. In Ruby you can't. Smalltalk, unlike Ruby, allows you to send a different message than the current one to super. In Smalltalk, the keyword super effectively means "self, but resolve any messages starting with the superclass of the class which defined the currently executing method," while in Ruby it means something like, "the current message, but look for me after where you found the currently executing method." This difference simplifies finding the current method in the chain. We know we're looking for a method with the same name in both the first and second searches. Since each link in the chain has a hash of methods, it's generally faster to do a hash lookup on the name, and in the first stage of the search compare the value returned with the current method.
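On the Ruby side of that comparison, super carries no message name of its own; it always re-sends the message that is currently being handled. A minimal sketch, with invented class names:

```ruby
class Account
  def deposit(amount, note = "none")
    "deposit #{amount} (#{note})"
  end
end

class AuditedAccount < Account
  def deposit(amount, note = "none")
    "audited: #{super}" # bare super: same message, same arguments
  end
end

class TerseAccount < Account
  def deposit(amount, note = "none")
    super(amount) # still the deposit message; only the argument list changes
  end
end

audited = AuditedAccount.new.deposit(5, "cash")
terse   = TerseAccount.new.deposit(5, "cash")
```

The arguments can be varied, but the name being looked up cannot, which is what lets the search for the current method and the search for the next one use the same hash key.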

This two-phase search for super runs into a small snag when a module's method hash appears more than once in a given method lookup chain. The first stage of the super search can find the wrong place in the chain.

Here's some code similar to that in the earlier article.

module M
  def foo #foo_m
    puts "foo in M"
    super
  end
end

class C1
  include M
end

class C2 < C1
  def foo #foo_c
    puts "foo in C2"
    super
  end
end 

class C3 < C2
  include M
end

Admittedly this code is contrived. It's specifically crafted to expose the issue in the simplest way; C1, C2, and C3 would probably have other methods which don't need to be shown here.

If every module inclusion in the above code created an IClass, then the method chain for instances of C3 would look like this, running from C3 through an IClass for M, then C2, C1, and a second IClass for M:

The boxes with names inside represent the class objects, empty orange boxes represent IClass pseudo-objects, the blue box labeled M is the module object, and the blue labels represent the methods.

Okay, now let's say we evaluate the expression:

C3.new.foo

We first search the method chain for a foo method, and the first one we find is foo_m in the first IClass, so we invoke it.
We reach the super in foo_m, so we start at the beginning of the method chain to find foo_m, which we do in the first IClass. Since this is a send to super, we then look for the next method named foo, and we find foo_c in class C2, so we invoke it.
We reach the super in foo_c, so we start at the beginning of the method chain to find foo_c, which we do in class C2. Since this is a send to super, we then look for the next method named foo, and we find foo_m in the second IClass, so we invoke it.
We reach the super in foo_m, so we start at the beginning of the method chain to find foo_m, which we again do in the first IClass rather than the second one we are actually executing from. Since this is a send to super, we then look for the next method named foo, and we find foo_c in class C2, so we invoke that once more.
Houston, we have a problem!

So, including IClasses for a module more than once can lead to an infinite loop when there's a super, if the two-phase search is used.
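The loop can be reproduced with a toy model of the chain. Everything here is an assumption made for illustration: Link, next_super_link, and trace_send are invented names, methods are represented by symbols, every modeled method is assumed to call super, and the search follows the two-phase description given earlier rather than the actual interpreter source.

```ruby
# A modeled link: a label plus a method table (message name => method tag).
Link = Struct.new(:label, :table)

# Phase 1: find the first link whose table maps `name` to the running method.
# Phase 2: return the next link after it that has any entry for `name`.
def next_super_link(chain, name, running)
  here = chain.index { |link| link.table[name] == running }
  return nil if here.nil?
  chain[(here + 1)..-1].find { |link| link.table.key?(name) }
end

# Send `name`, then keep following super, recording at most `limit` calls.
def trace_send(chain, name, limit)
  calls = []
  link = chain.find { |l| l.table.key?(name) }
  while link && calls.size < limit
    running = link.table[name]
    calls << "#{running} via #{link.label}"
    link = next_super_link(chain, name, running)
  end
  calls
end

m_table = { foo: :foo_m } # one hash shared by both IClasses for M

reincluded = [
  Link.new("C3", {}),
  Link.new("IClass#1(M)", m_table),
  Link.new("C2", { foo: :foo_c }),
  Link.new("C1", {}),
  Link.new("IClass#2(M)", m_table),
  Link.new("Object", {})
]
single = reincluded.reject { |link| link.label == "IClass#1(M)" }

looping  = trace_send(reincluded, :foo, 6) # only stops because of the limit
finishes = trace_send(single, :foo, 6)     # runs off the end of the chain
```

With both IClasses present, the trace alternates between foo_c and foo_m until the limit cuts it off. With M appearing once, it ends after two calls.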

And this is most probably the reason why the code to include a module checks to see if the module is already included, and doesn't actually re-include it in this admittedly rare case.
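That check is easy to observe. The sketch below reuses the classes from the example above but records calls in an array instead of printing, so the order can be inspected. CALLS and the variable names at the bottom are additions for illustration.

```ruby
CALLS = [] # records which foo bodies run, in order

module M
  def foo # foo_m
    CALLS << :foo_m
    super
  end
end

class C1
  include M
end

class C2 < C1
  def foo # foo_c
    CALLS << :foo_c
    super
  end
end

class C3 < C2
  include M # M is already an ancestor via C1, so no second link is added
end

chain_head = C3.ancestors.first(4) # C3, C2, C1, M
m_links    = C3.ancestors.count(M) # M appears only once

outcome =
  begin
    C3.new.foo
    :returned
  rescue NoMethodError
    :no_superclass_method # foo_m's super runs off the end of the chain
  end
```

Because the second include is ignored, foo_c runs first, then foo_m, and the final super finds nothing further up the chain.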

So late last year, some other mechanism must have been in use to find the start of the super method chain, to allow real module re-inclusion. Perhaps, whenever a method was invoked, a pointer to the Class or IClass where it was found was placed on the execution stack so that it could be found if needed to resolve super. Perhaps some other mechanism was used.

Whatever mechanism was used, it was apparently incompatible with YARV, or hard to implement efficiently on it, so when YARV became a standard part of 1.9, the old module inclusion logic came back.

Oh well. Being able to re-include really makes the semantics of Module inclusion clearer, so it seems a shame, albeit perhaps a rather small one.
