The Artima Developer Community
Sponsored Link

Code by Any Other Name
Pattern matching in Scala for expressions
by Ian Robertson
January 31, 2010
Summary
Scala pattern matchers with side effects can have unexpected results in for expressions.

Advertisement

Scala has taken the traditional for-construct and put it on steroids. One of the interesting features is the ability to use pattern matching as part of a for expression. For example, the following:

case class Person(firstName:String, lastName: String);

val people = List(
  Person("Jane", "Smith"),
  Person("John", "Doe"),
  Person("Jane", "Eyre"));

for (Person("Jane", last) <- people) yield "Ms. " + last;
will (effectively) iterate over all people with a first name of Jane, bind last to each person's last name in turn, and then generate a list of the results of the yield block, thus returning
List("Ms. Smith", "Ms. Eyre")
In this example, we're using a case class to provide the pattern matching for us, but we can easily role our own. For example, consider the following matcher which will return the first element of an iterator, if there is one:
object First {
  def unapply[A](iter:Iterator[A]): Option[(A)] = {
    if (iter.hasNext) Some(iter.next) else None;
  }
}
We can try to get the first element of some iterators:
val iters = List(Iterator.from(1), Iterator.from(3));
for(First(x) <- iters) yield x;
However, this does not yield List(1,3) as one might expect. Instead, it yields List(2, 4)!

So what is going on here? According to Programming in Scala, a pattern match in a for loop gets converted into a filter followed by a map. Specifically,

for (pat <- expr1) yield expr2
is translated by the compiler to
expr1 filter {
  case pat => true;
  case _ => false;
} map {
  case pat => expr2;
}
In other words, first the input expression is filtered for those elements which match the pattern, and then in a separate pass, the pattern is applied for the purpose of evaluating expr2. Unfortunately, there are a couple of issues with this. First, if the pattern match is computationally expensive, that work will be doubled for every match. Second, and more concerning, if the pattern match is not side effect free, then unexpected results like the one above can happen. In the example above, the pattern match, by calling next on an iterator, changes the state of the iterator, so that a different result comes out in the second pass.

Interestingly, Programming Scala states that "it's guarannteed that a pattern-matching generator will never throw a MatchError". In fact, this is not true. For example, executing

for (First(x) <- List(Iterator.single(1))) yield x;
will try to access a single-element iterator twice, resulting in:
scala.MatchError: empty iterator
    at $anonfun$2.apply(<console>:6)
    at $anonfun$2.apply(<console>:6)
    at scala.List.map(List.scala:812)

One way to view this situation is that Scala's for construct is inherently functional, and should not be mixed with non-functional constructs such as iterators. However, it turns out that there is a completely functional fix to this problem. Simply change the compile-time pattern-match translation to something like:

expr1 map {
  case pat => Some(expr2);
  case _ => None;
} filter {
  case Some(_) => true;
  case None => false;
} map {
  case Some(x) => x;
}
or, more succinctly:
expr1 map {
  pat.unapply(_);
} filter {
  _.isDefined;
} map {
  _.get;
}
Of course, there is a cost to this approach: by introducing a third pass, extra an extra list of Option objects is created. Another fix would be to add new methods filterMap and filterForEach which cut down the work to a single pass. While less elegant, it would clearly provide a performance boost, and it would be largely hidden from view. In the mean time, one should use pattern matching in for expressions with care.

Talk Back!

Have an opinion? Readers have already posted 6 comments about this weblog entry. Why not add yours?

RSS Feed

If you'd like to be notified whenever Ian Robertson adds a new entry to his weblog, subscribe to his RSS feed.

About the Blogger

Ian Robertson is an application architect at Verisk Health. He is interested in finding concise means of expression in Java without sacrificing type safety. He contributes to various open source projects, including jamon and pojomatic.

This weblog entry is Copyright © 2010 Ian Robertson. All rights reserved.

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use