The Artima Developer Community
Scalazine
What's New in Scala 2.8: The Architecture of Scala Collections
by Martin Odersky and Lex Spoon
December 17, 2010

Integrating new sets and maps

As a second example you'll learn how to integrate a new kind of map into the collection framework. The idea is to implement a mutable map with String as the type of keys by a "Patricia trie". The term Patricia is in fact an abbreviation for "Practical Algorithm to Retrieve Information Coded in Alphanumeric." Such a trie stores a set or a map as a tree in which each subsequent character in a search key uniquely determines a descendant tree. For instance, a Patricia trie storing the five strings "abc", "abd", "al", "all", and "xy" would look like this:

  [image not available]

Figure 1: A sample Patricia trie

To find the node corresponding to the string "abc" in this trie, simply follow the subtree labeled "a", proceed from there to the subtree labeled "b", and finally reach its subtree labeled "c". If the Patricia trie is used as a map, the value associated with a key is stored in the node reached by following that key. If it is a set, you simply store a marker saying that the node is present in the set.

import collection._
  
class PrefixMap[T]
extends mutable.Map[String, T] 
   with mutable.MapLike[String, T, PrefixMap[T]] {
  
  var suffixes: immutable.Map[Char, PrefixMap[T]] = Map.empty
  var value: Option[T] = None
  
  def get(s: String): Option[T] =
    if (s.isEmpty) value
    else suffixes get (s(0)) flatMap (_.get(s substring 1))
  
  def withPrefix(s: String): PrefixMap[T] = 
    if (s.isEmpty) this
    else {
      val leading = s(0)
      suffixes get leading match {
        case None =>
          suffixes = suffixes + (leading -> empty)
        case _ =>
      }
      suffixes(leading) withPrefix (s substring 1)
    }
  
  override def update(s: String, elem: T) =
    withPrefix(s).value = Some(elem)
  
  override def remove(s: String): Option[T] =
    if (s.isEmpty) { val prev = value; value = None; prev }
    else suffixes get (s(0)) flatMap (_.remove(s substring 1))
  
  def iterator: Iterator[(String, T)] =
    (for (v <- value.iterator) yield ("", v)) ++
    (for ((chr, m) <- suffixes.iterator; 
          (s, v) <- m.iterator) yield (chr +: s, v))
  
  def += (kv: (String, T)): this.type = { update(kv._1, kv._2); this }
  
  def -= (s: String): this.type  = { remove(s); this }
  
  override def empty = new PrefixMap[T]
}

Listing 10: An implementation of prefix maps with Patricia tries.

Patricia tries support very efficient lookups and updates. Another nice feature is that they support selecting a subcollection by giving a prefix. For instance, in Figure 1 you can obtain the sub-collection of all keys that start with an "a" simply by following the "a" link from the root of the tree.

Based on these ideas we will now walk you through the implementation of a map that's implemented as a Patricia trie. We call the map a PrefixMap, which means that it provides a method withPrefix that selects a submap of all keys starting with a given prefix. We'll first define a prefix map with the keys shown in the running example:

scala> val m = PrefixMap("abc" -> 0, "abd" -> 1, "al" -> 2,
  "all" -> 3, "xy" -> 4)
m: PrefixMap[Int] = Map((abc,0), (abd,1), (al,2), (all,3),
  (xy,4))

Then calling withPrefix on m will yield another prefix map:

scala> m withPrefix "a"
res14: PrefixMap[Int] = Map((bc,0), (bd,1), (l,2), (ll,3))

Listing 10 shows the definition of PrefixMap. This class is parameterized with the type of associated values T, and extends mutable.Map[String, T] and mutable.MapLike[String, T, PrefixMap[T]]. You have seen this pattern already for sequences in the RNA strand example; then as now, inheriting an implementation class such as MapLike serves to get the right result type for transformations such as filter.

A prefix map node has two mutable fields: suffixes and value. The value field contains an optional value that's associated with the node. It is initialized to None. The suffixes field contains a map from characters to PrefixMap values. It is initialized to the empty map.

You might ask why we picked an immutable map as the implementation type for suffixes. Would a mutable map not have been more standard, since PrefixMap as a whole is also mutable? The answer is that immutable maps that contain only a few elements are very efficient in both space and execution time. For instance, maps that contain fewer than 5 elements are represented as a single object. By contrast, the standard mutable map is a HashMap, which typically occupies around 80 bytes, even if it is empty. So if small collections are common, it's better to pick immutable over mutable. In the case of Patricia tries, we'd expect that most nodes except the ones at the very top of the tree would contain only a few successors. So storing these successors in an immutable map is likely to be more efficient.
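The following is a minimal sketch of what "a mutable field holding an immutable map" looks like in practice. The object name SmallMapDemo and its contents are made up for illustration. The class names it inspects are an implementation detail of the standard library, so treat them as an observation rather than a guarantee.

```scala
object SmallMapDemo {
  // A var holding an immutable map: "updating" it means rebinding the
  // var to a new map; the old map itself is never modified.
  var suffixes: Map[Char, Int] = Map.empty
  suffixes = suffixes + ('a' -> 1)
  suffixes = suffixes + ('x' -> 2)

  // Runtime class of the two-element map. In the standard library this is
  // a small specialized class that stores its entries directly in fields.
  val smallClass: String = suffixes.getClass.getName

  // With five entries the library switches to a general-purpose representation.
  val bigClass: String =
    (suffixes ++ Map('b' -> 3, 'c' -> 4, 'd' -> 5)).getClass.getName
}
```

Printing SmallMapDemo.smallClass and SmallMapDemo.bigClass in the interpreter shows the switch in representation between the small map and the larger one.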

Now have a look at the first method that needs to be implemented for a map: get. The algorithm is as follows: To get the value associated with the empty string in a prefix map, simply select the optional value stored in the root of the tree. Otherwise, if the key string is not empty, try to select the submap corresponding to the first character of the string. If that yields a map, follow up by looking up the remainder of the key string after its first character in that map. If the selection fails, the key is not stored in the map, so return None. The combined selection over an option value is elegantly expressed using flatMap. When applied to an optional value, ov, and a closure, f, which in turn returns an optional value, ov flatMap f succeeds only if ov is defined and f, applied to its contents, also returns a defined value. Otherwise ov flatMap f returns None.
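To see the flatMap step in isolation, here is a small sketch that chains two optional lookups. The object FlatMapLookup and its nested table are hypothetical stand-ins for a root node and its submaps; they are not part of PrefixMap.

```scala
object FlatMapLookup {
  // Hypothetical two-level table: first character, then the rest of the key.
  val outer: Map[Char, Map[String, Int]] =
    Map('a' -> Map("bc" -> 0, "l" -> 2))

  // The result is defined only if both the outer and the inner lookup succeed.
  def lookup(key: String): Option[Int] =
    if (key.isEmpty) None
    else outer.get(key(0)).flatMap(inner => inner.get(key.substring(1)))
}
```

Here FlatMapLookup.lookup("abc") finds both steps and yields Some(0), while "xyz" fails at the first step and "ab" fails at the second; both give None.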

The next two methods to implement for a mutable map are += and -=. In the implementation of PrefixMap, these are defined in terms of two other methods: update and remove.

The remove method is very similar to get, except that before returning any associated value, the field containing that value is set to None. The update method first calls withPrefix to navigate to the tree node that needs to be updated, then sets the value field of that node to the given value. The withPrefix method navigates through the tree, creating sub-maps as necessary if some prefix of characters is not yet contained as a path in the tree.
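If you want to experiment with this navigation logic without the collection framework around it, a stripped-down sketch might look as follows. SimpleTrie and SimpleTrieDemo are hypothetical names; the class copies the fields and the four methods of Listing 10 but deliberately inherits from nothing.

```scala
// Hypothetical, simplified trie node: same fields and navigation as
// PrefixMap, but not integrated into the collection framework.
class SimpleTrie[T] {
  var suffixes: Map[Char, SimpleTrie[T]] = Map.empty
  var value: Option[T] = None

  // Walk down one character at a time, creating missing nodes on the way.
  def withPrefix(s: String): SimpleTrie[T] =
    if (s.isEmpty) this
    else {
      val leading = s(0)
      if (!suffixes.contains(leading))
        suffixes = suffixes + (leading -> new SimpleTrie[T])
      suffixes(leading).withPrefix(s.substring(1))
    }

  def get(s: String): Option[T] =
    if (s.isEmpty) value
    else suffixes.get(s(0)).flatMap(_.get(s.substring(1)))

  // Navigate to the node for s, then overwrite its value field.
  def update(s: String, elem: T): Unit =
    withPrefix(s).value = Some(elem)

  // Like get, but clears the value field before returning the old value.
  def remove(s: String): Option[T] =
    if (s.isEmpty) { val prev = value; value = None; prev }
    else suffixes.get(s(0)).flatMap(_.remove(s.substring(1)))
}

object SimpleTrieDemo {
  val t = new SimpleTrie[Int]
  t("al") = 2    // syntactic sugar for t.update("al", 2)
  t("all") = 3

  val removed = t.remove("al")   // the old value of "al"
  val after = t.get("al")        // "al" is gone ...
  val stillThere = t.get("all")  // ... but "all" is untouched

  // remove only clears the value; the nodes on the path stay in the tree.
  t("xy") = 4
  t.remove("xy")
  val emptyNodeKept = t.suffixes.contains('x')
}
```

One thing the sketch makes visible is that remove, as written, only resets the value field. Nodes that no longer lead to any value are not pruned from the tree.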

The last abstract method to implement for a mutable map is iterator. This method needs to produce an iterator that yields all key/value pairs stored in the map. For any given prefix map this iterator is composed of the following parts: First, if the map contains a defined value, Some(x), in the value field at its root, then ("", x) is the first element returned from the iterator. Furthermore, the iterator needs to traverse the iterators of all submaps stored in the suffixes field, but it needs to add a character in front of every key string returned by those iterators. More precisely, if m is the submap reached from the root through a character chr, and (s, v) is an element returned from m.iterator, then the root's iterator will return (chr +: s, v) instead. This logic is implemented quite concisely as a concatenation of two for expressions in the implementation of the iterator method in PrefixMap. The first for expression iterates over value.iterator. This makes use of the fact that Option values define an iterator method that returns either no element, if the option value is None, or exactly one element x, if the option value is Some(x).
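The two building blocks of this iterator can be tried out separately. The sketch below uses a hypothetical object IteratorParts, and a plain list of pairs stands in for the iterator of a submap. It also uses chr.toString + s instead of chr +: s to build the key, which is a substitution made here only to keep the example independent of the library version.

```scala
object IteratorParts {
  val defined: Option[Int] = Some(7)
  val missing: Option[Int] = None

  // An Option's iterator yields exactly one element for Some, none for None.
  val fromSome = (for (v <- defined.iterator) yield ("", v)).toList
  val fromNone = (for (v <- missing.iterator) yield ("", v)).toList

  // Hypothetical stand-in for the suffixes field: one child under 'a'
  // whose "iterator" returns the pairs ("bc", 0) and ("l", 2).
  val children: Map[Char, List[(String, Int)]] =
    Map('a' -> List(("bc", 0), ("l", 2)))

  // Root value first, then every child pair with the child's
  // character added in front of the key.
  val all =
    ((for (v <- defined.iterator) yield ("", v)) ++
     (for ((chr, kvs) <- children.iterator;
           (s, v) <- kvs.iterator) yield (chr.toString + s, v))).toList
}
```

With these definitions, fromSome contains the single pair ("", 7), fromNone is empty, and all is List(("", 7), ("abc", 0), ("al", 2)).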

import scala.collection.mutable.{Builder, MapBuilder}
import scala.collection.generic.CanBuildFrom
  
object PrefixMap {
  def empty[T] = new PrefixMap[T]
  
  def apply[T](kvs: (String, T)*): PrefixMap[T] = {
    val m: PrefixMap[T] = empty
    for (kv <- kvs) m += kv
    m
  }
  
  def newBuilder[T]: Builder[(String, T), PrefixMap[T]] = 
    new MapBuilder[String, T, PrefixMap[T]](empty)
  
  implicit def canBuildFrom[T]
    : CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] = 
      new CanBuildFrom[PrefixMap[_], (String, T), PrefixMap[T]] {
        def apply(from: PrefixMap[_]) = newBuilder[T]
        def apply() = newBuilder[T]
      }
}

Listing 11: The companion object for prefix maps.

Note that there is no newBuilder method defined in class PrefixMap. There is no need for one, because maps and sets come with default builders, which are instances of class MapBuilder. For a mutable map the default builder starts with an empty map and then adds successive elements using the map's += method. Mutable sets behave the same. The default builders for immutable maps and sets use the non-destructive element addition method +, instead of method +=.
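As a rough illustration of what such a builder does, the sketch below drives the builder of the standard mutable.Map by hand. The object name BuilderDemo is made up; the collection operations normally perform these steps for you.

```scala
import scala.collection.mutable

object BuilderDemo {
  // Ask the standard mutable map for a builder, which starts out empty.
  val b = mutable.Map.newBuilder[String, Int]

  // Add elements one at a time, as a collection operation would.
  b += ("hello" -> 5)
  b += ("hi" -> 2)

  // result() hands back the finished collection.
  val built: mutable.Map[String, Int] = b.result()
}
```

The point to take away is the protocol: start from an empty collection, add elements one by one, then ask for the result. That is why a correct empty method is all a new map class has to supply.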

However, in all these cases, to build the right kind of set or map, you need to start with an empty set or map of this kind. This is provided by the empty method, which is the last method defined in PrefixMap. This method simply returns a fresh PrefixMap.

We'll now turn to the companion object PrefixMap. In fact it is not strictly necessary to define this companion object, as class PrefixMap can stand well on its own. The main purpose of object PrefixMap is to define some convenience factory methods. It also defines a CanBuildFrom implicit to make typing work out better.

The two convenience methods are empty and apply. The same methods are present for all other collections in Scala's collection framework so it makes sense to define them here, too. With the two methods, you can write PrefixMap literals like you do for any other collection:

scala> PrefixMap("hello" -> 5, "hi" -> 2)
res0: PrefixMap[Int] = Map((hello,5), (hi,2))
  
scala> PrefixMap.empty[String]
res2: PrefixMap[String] = Map()

The other member in object PrefixMap is an implicit CanBuildFrom instance. It has the same purpose as the CanBuildFrom definition in the last section: to make methods like map return the best possible type. For instance, consider mapping a function over the key/value pairs of a PrefixMap. As long as that function produces pairs of strings and some second type, the result collection will again be a PrefixMap. Here's an example:

scala> res0 map { case (k, v) => (k + "!", "x" * v) }
res8: PrefixMap[String] = Map((hello!,xxxxx), (hi!,xx))

The given function argument takes the key/value bindings of the prefix map res0 and produces pairs of strings. The result of the map is a PrefixMap, this time with value type String instead of Int. Without the canBuildFrom implicit in PrefixMap the result would just have been a general mutable map, not a prefix map.
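The same principle can be observed on the standard immutable Map, which needs no extra definitions. In the sketch below (the object name MapResultTypes is hypothetical), the type annotations record which static result type the compiler accepts in each case.

```scala
object MapResultTypes {
  val lengths: Map[String, Int] = Map("hello" -> 5, "hi" -> 2)

  // The function returns pairs, so the result can again be a Map.
  val shouted: Map[String, String] =
    lengths map { case (k, v) => (k + "!", "x" * v) }

  // The function returns plain strings, so no Map can be built;
  // the result is only a general Iterable.
  val keysOnly: Iterable[String] =
    lengths map { case (k, _) => k }
}
```

PrefixMap follows the same pattern, with the additional restriction that the keys of the resulting pairs must be strings.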

Summary

To summarize, if you want to fully integrate a new collection class into the framework you need to pay attention to the following points:
  1. Decide whether the collection should be mutable or immutable.
  2. Pick the right base traits for the collection.
  3. Inherit from the right implementation trait to implement most collection operations.
  4. If you want map and similar operations to return instances of your collection type, provide an implicit CanBuildFrom in your class's companion object.
 

You have now seen how Scala's collections are built and how you can build new kinds of collections. Because of Scala's rich support for abstraction, each new collection type can have a large number of methods without having to reimplement them all over again.

Resources

Martin Odersky and Lex Spoon are coauthors of Programming in Scala, now available in a Second Edition:
http://www.artima.com/shop/programming_in_scala_2ed

The Scala programming language website is at:
http://www.scala-lang.org

The Scala 2.8 release notes are at:
http://www.scala-lang.org/node/7009

How to use Scala collections is described at:
http://lampwww.epfl.ch/~odersky/whatsnew/collections-api/collections.html

About the authors

Martin Odersky is the creator of the Scala language. He is a professor at EPFL in Lausanne, Switzerland, and a founder of Scala Solutions Inc. He works on programming languages and systems, more specifically on the topic of how to combine object-oriented and functional programming. Since 2001 he has concentrated on designing, implementing, and refining Scala. Previously, he has influenced the development of Java as a co-designer of Java generics and as the original author of the current javac reference compiler. He is a fellow of the ACM.

Lex Spoon is a software engineer at LogicBlox, Inc. He worked on Scala for two years as a post-doc at EPFL. He has a Ph.D. in computer science from Georgia Tech, where he worked on static analysis of dynamic languages. In addition to Scala, he has worked on a wide variety of programming languages, ranging from the dynamic language Smalltalk to the scientific language X10 to the logic language that powers LogicBlox. He and his wife currently live in Atlanta with two cats, a chihuahua, and a turtle.
