This post originated from an RSS feed registered with Scala Buzz
by Zemian Deng.
Original Post: Implementing String#scan in scala
Feed Title: thebugslayer
Feed URL: http://www.jroller.com/thebugslayer/feed/entries/atom?cat=%2FScala+Programming
Feed Description: Notes on Scala and Sweet web framework
I kind of like Ruby's String#scan method and wondered how hard it would be to make that possible in Scala. So I gave it a try.
In the end, I want something like this to work:
So the trick is to add #scan and #scanGroup to the existing java.lang.String class; they will do the text extraction and return a List of matched strings.
Here is the implementation (it needs to be placed above the code that uses it if both are in the same file):
class StringHelper(val orig : String){
  // Create a matcher object for the pattern, then pass it to the user function.
  private def matcher(pattern : String, input : String)(f : java.util.regex.Matcher => Unit) : Unit =
    f(java.util.regex.Pattern.compile(pattern).matcher(input))

  // Return every full match of the pattern, in order of appearance.
  def scan(pattern : String) : List[String] = {
    var ls : List[String] = Nil
    matcher(pattern, orig){ m => while(m.find()) ls = m.group(0) :: ls }
    ls.reverse
  }

  // Return the capture groups of every match; empty if the pattern has no groups.
  def scanGroup(pattern : String) : List[List[String]] = {
    var ls : List[List[String]] = Nil
    matcher(pattern, orig){ m =>
      if(m.groupCount > 0)
        while(m.find()){
          var groupls : List[String] = Nil
          for(i <- 1 to m.groupCount)
            groupls = m.group(i) :: groupls
          ls = groupls.reverse :: ls
        }
    }
    ls.reverse
  }
}
// Implicit view from String to StringHelper.
implicit def stringHelper(s : String) : StringHelper = new StringHelper(s)
Scala will use stringHelper to convert a java.lang.String into a new instance of StringHelper whenever #scan or #scanGroup is called on it. This is done "implicitly", hence the keyword used, so that our main entry program can remain simple.
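For comparison, here is a minimal sketch of the same idea, assuming Scala 2.10 or later, where an implicit class can replace the helper class plus implicit def, and using scala.util.matching.Regex from the standard library instead of java.util.regex directly. This is not the code from this post. The names ScanSketch, ScanOps, scanAll and scanGroups are hypothetical. The method names differ from #scan on purpose: newer Scala collections already expose a scan method on strings through the standard implicit conversions, which may make a custom #scan ambiguous.

```scala
object ScanSketch {
  // Hypothetical wrapper; implicit classes need Scala 2.10+ and must live
  // inside an object, class, or trait.
  implicit class ScanOps(val orig: String) {
    // Every full match of the pattern, in order of appearance.
    def scanAll(pattern: String): List[String] =
      pattern.r.findAllIn(orig).toList

    // The capture groups of every match. Unlike scanGroup above, a pattern
    // without groups yields one empty list per match rather than Nil.
    def scanGroups(pattern: String): List[List[String]] =
      pattern.r.findAllMatchIn(orig).map(_.subgroups).toList
  }
}

object ScanSketchDemo {
  import ScanSketch._

  def main(args: Array[String]): Unit = {
    val input = "foo=1, bar=22"

    println(input.scanAll("\\w+"))
    // prints List(foo, 1, bar, 22)

    println(input.scanGroups("(\\w+)=(\\d+)"))
    // prints List(List(foo, 1), List(bar, 22))

    // The nested list type lets a caller destructure each match's groups.
    val pairs = input.scanGroups("(\\w+)=(\\d+)").collect {
      case List(key, value) => key -> value.toInt
    }
    println(pairs)
  }
}
```

The compiler rewrites input.scanAll(...) to new ScanOps(input).scanAll(...), which is the same mechanism stringHelper provides for StringHelper.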
I noticed that the Ruby version can return either an array of strings or, when grouping is used, an array of arrays. This is a perfect example of the flexibility of a dynamic language that can be dangerous if not used correctly: one can break a provided code block just by giving a different pattern. In Scala, these two return types are distinct, so I have two separate methods for handling them. To me that is much clearer and less prone to bugs at runtime.
The result looks like this: