Summary
Scala pattern matchers with side effects can have unexpected results in for expressions.
Advertisement
Scala has taken the traditional for-construct and put it on steroids.
One of the interesting features is the ability to use pattern matching
as part of a for expression. For example, the following:
case class Person(firstName:String, lastName: String);
val people = List(
Person("Jane", "Smith"),
Person("John", "Doe"),
Person("Jane", "Eyre"));
for (Person("Jane", last) <- people) yield "Ms. " + last;
will (effectively) iterate over all people with a first name of Jane,
bind last to each person's last name in turn, and then
generate a list of the results of the yield block,
thus returning
List("Ms. Smith", "Ms. Eyre")
In this example, we're using a case class to provide the pattern
matching for us, but we can easily role our own. For example,
consider the following matcher which will return the first element of
an iterator, if there is one:
object First { def unapply[A](iter:Iterator[A]): Option[(A)] = { if (iter.hasNext) Some(iter.next) else None;
}
}
We can try to get the first element of some iterators:
val iters = List(Iterator.from(1), Iterator.from(3)); for(First(x) <- iters) yield x;
However, this does not yield List(1,3) as one might
expect. Instead, it yields List(2, 4)!
So what is going on here? According to Programming in Scala,
a pattern match in a for loop gets converted into a filter followed by
a map. Specifically,
In other words, first the input expression is filtered for those
elements which match the pattern, and then in a separate pass, the
pattern is applied for the purpose of
evaluating expr2. Unfortunately, there are a couple
of issues with this. First, if the pattern match is computationally
expensive, that work will be doubled for every match. Second, and
more concerning, if the pattern match is not side effect free, then
unexpected results like the one above can happen. In the example
above, the pattern match, by calling next on an iterator,
changes the state of the iterator, so that a different result comes
out in the second pass.
Interestingly, Programming Scala states that "it's
guarannteed that a pattern-matching generator will never throw
a MatchError". In fact, this is not true. For example,
executing
for (First(x) <- List(Iterator.single(1))) yield x;
will try to access a single-element iterator twice, resulting in:
scala.MatchError: empty iterator
at $anonfun$2.apply(<console>:6)
at $anonfun$2.apply(<console>:6)
at scala.List.map(List.scala:812)
One way to view this situation is that Scala's for
construct is inherently functional, and should not be mixed with
non-functional constructs such as iterators. However, it turns out
that there is a completely functional fix to this problem. Simply
change the compile-time pattern-match translation to something like:
expr1 map { casepat => Some(expr2); case _ => None;
} filter { case Some(_) => true; case None => false;
} map { case Some(x) => x;
}
Of course, there is a cost to this approach: by introducing a third
pass, extra an extra list of Option objects is created.
Another fix would be to add new methods filterMap
and filterForEach which cut down the work to a single
pass. While less elegant, it would clearly provide a performance
boost, and it would be largely hidden from view. In the mean time,
one should use pattern matching in for expressions with care.
Pattern matching expressions (including extractors and guards) should always be side-effect free. The compiler can't make guarantees about when or how many times such expressions will get evaluated.
Nice post; an important thing to learn. I think you left something out, though. In the iterators example illustrating the problem, you never defined iters. That would make it much clearer.
> Nice post; an important thing to learn. I think you left something out, though. In the iterators example illustrating the problem, you never defined iters. That would make it much clearer.
Thanks for catching that! I've added the definition of iters to the example.