Recently, Yehuda Katz wrote an article in reaction to a Pythonista's criticism of Ruby and in particular to the complaint that you can't call a proc with parentheses:
my_method = Proc.new {1 + 2}
my_method()
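For reference, here is a minimal sketch of the call syntaxes Ruby does provide for a Proc, and of what happens with bare parentheses. The `.()` form assumes Ruby 1.9 or later.

```ruby
my_method = Proc.new { 1 + 2 }

# Three equivalent ways to invoke a Proc object.
my_method.call   # => 3
my_method[]      # => 3
my_method.()     # => 3 (sugar for .call, Ruby 1.9+)

# Bare parentheses are parsed as a call to a *method* named my_method,
# so the local variable holding the Proc is never consulted.
begin
  my_method()
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```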
Yehuda did a great job of defending why this is consistent with the rest of Ruby, and talked about how blocks in Ruby are really used. As part of the article, he conflated blocks and Procs. The difference is rather subtle, and in most cases can be ignored. But Giles pointed out that the difference is there, and is documented both in the Pickaxe and in David Flanagan and Matz's "The Ruby Programming Language". After Yehuda pushed back, maintaining his view that there was no difference, I posted a comment in support of Giles.
Yehuda followed up with another article, which prompted this article.
Let me start by saying that I'm not interested in starting a war here. First, I've nothing but respect for Yehuda, and I'm very impressed by his body of work, particularly the refactoring of Rails from Rails 2 to Rails 3. Second, I agree that most of this stuff really doesn't matter all that much, except when discussing things at a rather deep level. It can be like a debate about how many angels can dance on the head of a pin! But such discussions can be fun and in rare instances even enlightening.
Proc Identity
Here's Yehuda's example from the second article:
def foo
  yield
end
def bar(&block)
  puts block.object_id
  baz(&block)
end
def baz(&block)
  puts block.object_id
  yield
end
foo { puts "HELLO" } #=> "HELLO"
bar { puts "HELLO" } #=> "2148083200\n2148083200\nHELLO"
He points out that the identity of the proc object bound to the two arguments named block in the bar and baz methods doesn't change. He then gives a slightly different example:
def foo(&block)
  puts block.object_id
  yield
end
b = Proc.new { puts "OMG" }
puts b.object_id
foo(&b) #=> 2148084040\n2148084040\nOMG
and then proposes two "mental models" of what's going on:
- The &b unwraps the Proc object, and the &block recasts it into a Proc. However, it somehow also wraps it back into the same wrapper that it came from in the first place. or...
- The &b puts the b Proc into the block slot in foo's argument list, and the &block gives the implicit Proc a name. There is no need to explain why the Proc has the same object_id; it is the same Object!
He then says that the first actually represents the MRI implementation. Based both on a reading of Flanagan and Matz and the MRI code, I respectfully disagree: the truth lies in the middle.
The first thing to realize is that the use of & before the last argument of a method definition is not the same thing as the use of & before the last parameter value in a method invocation.
An analogy with Splat
Ruby has a similar prefix *, a.k.a. splat, or what David Black likes to call the "unary unarray" operator. Actually, depending on where it occurs, * is either an "unarray" or a "make array" operator.
Consider this code:
array = [1, 2, 3]
def three_args(a, b, c)
  "a is #{a}, b is #{b}, c is #{c}"
end
a, b, c = *array
a # => 1
b # => 2
c # => 3
three_args(*array) # => "a is 1, b is 2, c is 3"
def glob_args(*args)
  args
end
x, *array2 = 4, 5, 6
x # => 4
array2 # => [5, 6]
glob_args(:now, :is, "the", Time) # => [:now, :is, "the", Time]
As I said above, * is an unarray operator when it appears either on the right-hand side of an assignment or in the argument list of a method invocation. It acts as an operator which coalesces multiple values into an array if it appears on the left-hand side of an assignment or in front of a formal parameter in a method definition.
One important consideration in the use of both splat and proc arguments is that the method definition doesn't "know" whether the method will be invoked like this:
foo # with no block
# or this:
foo { 1 }
# or this:
foo(&b)
or even this:
proc = Proc.new {1}
foo(proc)
All of these are valid, and need to work.
Method definition with a & Formal Parameter
In Yehuda's last quoted example this is the def foo case.
Method Invocation with a & arg
Two things happen when a method is called with a & argument. First, the cases of calling the method with an implicit block, with an explicit value for the argument, or with no block at all must all be handled. Second, both explicitly calling the argument, e.g. via block.call, and yielding to the block have to work.
So the &block argument in the definition of foo means that the method prelude ensures that the block argument will either be nil or will refer to an object which is a Proc, or at least appears to be a proc. It does this at the entry to the invocation of the method.
It also must ensure that yield semantics will work whether the method was called with an implicit block or an explicit value for the parameter. Note that in MRI, block yield works by reference to a field called iter in the current stack frame. In Ruby 1.8 this field is used to find the node corresponding to the block in the abstract syntax tree of the method which defined it, and yield is implemented by evaluating the subtree of the AST rooted at that node. Of course YARV in Ruby 1.9 represents executable code differently, but the effect is the same. The yield keyword deals with the internal "VM" representation of the executable Ruby code directly, without surfacing it as a Ruby object. This is the real difference between a block and an instance of Proc.
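A small sketch of the two requirements described so far: a method that only yields and a method that names its block both have to accept an implicit block and an explicit & argument. The method names with_yield and with_call are hypothetical, chosen only for illustration.

```ruby
# Never names the block; reaches it only through yield.
def with_yield
  yield 21
end

# Names the block, so it is surfaced as a Proc object and called explicitly.
def with_call(&block)
  block.call(21)
end

doubler = Proc.new { |n| n * 2 }

with_yield { |n| n * 2 }   # => 42 (implicit block, yield)
with_yield(&doubler)       # => 42 (explicit Proc, yield)
with_call { |n| n * 2 }    # => 42 (implicit block, call)
with_call(&doubler)        # => 42 (explicit Proc, call)
```

All four combinations give the same result, which is why the distinction between a block and a Proc is so easy to overlook.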
The draft Ruby standard abstracts this iter field a bit using the notation [block] to refer to a logical stack of blocks in the execution context. The draft distinguishes between blocks and procs. Since I started writing this article, Avdi Grimm wrote another reaction to Yehuda's second article, which looks at the same issues I'm talking about here from the perspective of the draft standard.
With all that said, here's how that formal block argument is handled when the foo method is invoked:
1. If block is nil, it checks whether a block was given in the method call, and if so, creates a Proc object which will cause the block to be executed when the proc is called.
2. If block is not nil:
   2.1. If the value of block is not already an instance of Proc, send :to_proc to the value (with a guard to see if it responds, but that's a minor implementation decision to avoid the overhead of catching a NoMethodError exception). This is why defining Symbol#to_proc allows writing things like (1..10).map(&:succ).
   2.2. Set up the VM so that yield will work should that happen. In Ruby 1.8 this involves correctly pushing an iterator onto the stack frame. I haven't read through the YARV implementation, but I'm sure that it has the same effect.
   2.3. If the object referenced by block isn't a proc and can't be converted using #to_proc to a 'proc-like' duck which can quack to the tune of #call, then a TypeError is raised.
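The observable side of these steps can be sketched as follows. The capture helper and the Greeter class are hypothetical names used only for illustration. The sketch shows what a Ruby program can see, not how MRI arranges it internally.

```ruby
# Hypothetical helper: returns whatever ends up bound to &block.
def capture(&block)
  block
end

# Hypothetical class whose instances convert themselves via #to_proc.
class Greeter
  def to_proc
    Proc.new { |name| "hello #{name}" }
  end
end

capture                             # => nil (no block given)
capture { 1 }.class                 # => Proc (implicit block surfaced as a Proc)
capture(&:succ).call(1)             # => 2, via Symbol#to_proc
capture(&Greeter.new).call("Ruby")  # => "hello Ruby", via Greeter#to_proc

begin
  capture(&42)                      # Integer has no #to_proc
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```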
Note that this isn't wrapping the argument with a proc; it's ensuring that we have an object which acts as a proc. If it needs to be 'cast' into a proc it will be, but if it's already a real Proc, or it responds to to_proc by returning self, then it will be the same object.
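A quick way to check the "same object" claim from Ruby itself. The helper name block_of is hypothetical, and Proc#lambda? assumes Ruby 1.9 or later.

```ruby
# Hypothetical helper: hands back the object bound to &block.
def block_of(&block)
  block
end

b = Proc.new { :original }

b.to_proc.equal?(b)      # => true (Proc#to_proc returns self)
block_of(&b).equal?(b)   # => true (no new wrapper object is created)

# A lambda passed with & keeps its lambda-ness, another sign that
# the very same object arrives on the other side.
l = lambda { :original }
block_of(&l).lambda?     # => true
```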
Method invocation with a & prefix on the last argument value
This part is a bit simpler. The argument itself is just passed through. The trick is that in the process of invoking the method, the sending code must do the same thing as step 2.2 above in case the method does a yield.
Does it Matter?
Barring any embarrassing mistakes on my part, this is what happens in MRI Ruby, as described in section 6.4.5 Block Arguments of Flanagan and Matsumoto, as well as my reading of the MRI code.
Yehuda makes the point that this is all pretty invisible to the Ruby programmer, and he's right. It would seem that a Ruby implementation could ALWAYS turn blocks into procs and not have a separate hidden iter structure in the VM. One reason for not doing so is performance. Since Procs are closures and capture the bindings of any variables in their scope, there is some overhead to their creation and destruction. If a block is only accessed via yield, then it's guaranteed not to have a lifetime past the return of the called method, so the Proc object never needs to be created. This is an optimization, and such optimizations are known in other dynamic language implementations. Smalltalk gives the illusion of uniformly using closures to represent blocks, but most implementations cheat and recognize cases where the overhead of creating a closure can be avoided. In some cases this is invisible to the Smalltalk programmer, but not always.
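The lifetime argument can be illustrated with a short sketch (run_now and keep_for_later are hypothetical names). A block reached only through yield is finished with when the method returns, while a block surfaced as a Proc can escape and keep its captured bindings alive.

```ruby
# The block is only yielded to; nothing can hold on to it afterwards.
def run_now
  yield
end

# The block is surfaced as a Proc and returned, so it outlives the call.
def keep_for_later(&block)
  block
end

counter = 0

run_now { counter += 1 }
saved = keep_for_later { counter += 10 }  # not executed yet
counter     # => 1

saved.call  # the closure still sees (and updates) counter
saved.call
counter     # => 21
```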
So although we might not know exactly how many angels are dancing on the head of the pin, or what steps they are doing, the Ruby language books and the draft standard let it slip that they are there.