If It's Not Nailed Down, Steal It

Pattern Matching, S-Expressions, and Domain Specific Languages in Ruby

by Topher Cyll

May 23, 2006

Summary

There's a whole world of language features that we sometimes miss out on as Rubyists, such as pattern matching, S-expressions, and external domain-specific languages. But the good news is that we can have them, too, as long as we're not afraid to steal a few things first.

Theft #1: Pattern Matching

Pattern matching is a relatively rare language feature found in Standard ML, OCaml⁰, Haskell, Common Lisp (CLOS), and a handful of others. It’s a form of multiple dispatch, so pattern matching functions run different code when called with different arguments. Pattern Matching lets you do this dispatch based on type, value, and even internal structure of the function’s arguments.

If there was ever a time to steal Pattern Matching, it’s now. Long rumored but now visible in the distance, Perl6 will be making heavy use of Multiple Dispatch.

Casing the Joint

In Haskell (a pattern matching language), a function to return the nth Fibonacci number looks like this:

  fib 0 = 1
  fib 1 = 1
  fib n = (fib n-1) + (fib n-2)

In this recursive function, the base cases (when the function argument is either 0 or 1) are explicitly given the value of 1. Meanwhile, the recursive case is written generically, so if the argument is not 0 or 1, it is named ‘n’, and ‘(fib n-1) + (fib n-2)’ is recursively evaluted.

The Heist

The good news for the heist is that we only need to run the goods the last mile. They’re waiting safely bundled in a Ruby gem on Rubyforge. Let’s make off with them now while no one’s looking.

  $ gem install -r multi

Phew. Okay, let’s see what we got.

Pattern Matching in Ruby

The Fibonacci function above looks like this in normal Ruby:

  def fib(n)
    return 1 if n == 0 || n == 1
    return fib(n-1) + fib(n-2)
  end

Which really isn’t so bad. But let’s rewrite the above function using multi.rb anyways:

  multi (:fib, 0)       { 1 }
  multi (:fib, 1)       { 1 }
  multi (:fib, Integer) { |n| fib(n-1) + fib(n-2) }

For better or worse there it is: Pattern matching in Ruby. We’ll inspect our acquisition in a moment, but first let’s look at the example above.

Dissection

There are three calls to the multi function above. The first parameter to each of them is the symbol :fib. We already know this is a Fibonacci function, so we appear to be calling the multi function with the first argument being the name of the function as a symbol.

Using the Haskell fib function as a template, the rest of the Ruby one is not hard to decipher. The parameters to multi after the method name are the pattern to match. The fib method does two kinds of matching. We see in the top two declarations that it is matching instance values like 0 or 1. But in the last declaration, we see a class instead of an instance.

Actually, it’s a little more complicated than that under the hood. In Ruby, the class Integer is actually an instance of class Class, but multi does some hand waving here. It assumes you probably weren’t really hoping somebody called your fib method with the parameter Integer, and were instead hoping to matching a parameters of type Integer.

The other interesting features of the declarations are the blocks following each multi. These blocks form the bodies of the multi-methods. When a multi-method call finds an implementation match, it passes in the matching parameters to the block. Of course, Ruby lets us ignore block parameters when we feel like it, so in the first two method bodies we don’t even give them names. But in the third implementation we match any Integer parameter and we probably want to hang onto that value. We name it n, like the Haskell example, and then we can use it directly in our method body.

Taking It Around the Block

Don’t forget to require the gem and it’s components before trying multi.

  #!/usr/bin/ruby -w
  require 'rubygems'
  require 'multi'
  require 'amulti'
  require 'smulti'

Now, let’s define our Fibonacci method and try calling it.

  fib(0) ===> 1
  fib(1) ===> 1
  fib(4) ===> 5
  fib(5) ===> 8

It works! So far we’ve been using multi to write what are essentially free functions. However, we’ll probably want to do methods that are part of objects as well.

Using Multi in Classes

A multi method is not entirely the same as a method defined using def. multi is a actually a method call, while def is a keyword. If multi and def were the same, we’d be able to write something like this:

  class Example
    def method1(x)
      return 0 if x == 0
      return @a + x
    end

    multi(:method2, 0) { 0 }
    multi(:method2, Integer) {|x| @a + x }
  end

Unfortunately, because of the difference, this doesn’t work. But why?

Here’s a hint: Ruby blocks are closures. Normally this is great, but here it means that @a captures a reference to the @a variable in the class Example. Not the instance, but the class itself. So calls to method2() on instances of the class will all use the same @a variable (located in the Example class), while calls to method1() will use the unique instance variable in each instance.

How can we get around this? Well, we need to perform the multi definition someplace where the context for a block is the instance, not the class. Someplace that gets run for each instance:

  class Example
    def initialize()
      multi(:method2, 0)       { 0 }
      multi(:method2, Integer) {|x| @a + x }
    end

    def method1(x)
      return 0 if x == 0
      return @a + x
    end
  end

By putting the code in the initialize method we get the proper block scope and make sure every instance gets the method. This slows us down a little, but dispatch with multi isn’t blazingly fast anyways, so this isn’t such a big deal. But this approach does have some drawbacks. One problem in particular is that if a subclass forgets to call super then the subclass’ instances will lack the multi-methods of the superclass. This also makes it difficult to add multi-methods at runtime to all instances of a class without using ObjectSpace.

Destructuring

Destructuring is a fairly common operation. For example, Ruby performs implicit Array destructuring on mulitple assignment:

  one, two, three = [1, 2, 3]

Ruby can also do explicit Array destructuring on method invocation.

  [].push(*[1, 2, 3])

produces [1, 2, 3] instead of [[1, 2, 3]].

You can start to see what’s going on here. Destructuring means taking apart a complex data structure, piece by piece. The Multiple Dispatch library contains two destructuring implementations, ‘amulti’ and ‘smulti’. amulti destructures arrays, smulti destructures strings.

Let’s have a look at amulti.

  amulti(:foo, Integer) {|i, rest| puts i; foo(rest) }
  amulti(:foo)          { puts "DONE" }
  foo([1, 2, 3])

Produces:

  1
  2
  3
  DONE

Methods defined using amulti must take only a single list as a argument. The first elements of the list will be matched against the various patterns of the available dispatches, until one is found that fits. Then those elements are removed from the list and passed in as the first parameters while the rest of the list is passed in as the last parameter.

In the example above, the first rule matches if the first item in the list is an Integer. It then passes that Integer into the block with a list containing the rest of the list. When there are no more Integers to be removed, the final body is called. smulti works just the same except that it uses Regexp and String literals to tear chunks off the front of strings. It can be used to do simple parsing and other exercises.

Behind the Scenes

So how does multi work? Multi is built with two kinds of objects, Dispatches which represent a pattern along a method body, and a singleton Dispatcher that handles all calls to all multiply dispatched functions. The definition of the multi function itself is quite trivial.

  def multi(method_name, *patterns, &body)
    Multi::DISPATCHER.add(Multi::Dispatch, self, method_name, patterns, body)
  end

This instructs the singleton dispatcher to create a new Dispatch of type Multi::Dispatch which is registered inside the dispatcher under that method name for that object. Multi defines methods on a per object basis to get around the scope problem described earlier. The registration process must therefore store the dispatch in a manner that it is associated with the object. Multi does something pretty sleazy here, and uses, along with the method name, the object’s object_id to store the dispatch in a hash table. Since there can be multiple dispatches per method, the dispatch is actually stored in a list in that particular location of the hash table, as seen below.

  key = [obj.object_id(), method_name]
  @map[key] ||= []
  @map[key].push(type.new(patterns, body))

If this is the first time a multimethod pattern has been added, a stub method must also be installed that will ask the dispatcher to run the right dispatch if the method is ever called.

  if ! obj.methods.include?(method_name)
    obj.instance_eval <<-"DONE" 
      def #{method_name}(*params, &block)
        Multi::DISPATCHER.call(self, \"#{method_name}\", params, block)
      end
    DONE

  end

Then if the method is ever called, it calls the dispatcher, which runs through the list of possible dispatches searching for the first match.

  dispatches = @map[[obj.object_id, method_name]]

  dispatch = dispatches.find{|dispatch| dispatch.match?(params) }
  dispatch.call(params, block)

Some languages that use multiple dispatch search for the “best” dispatch (for example, classes closest in the inheritence hierarchy to the actually method arguments). Multi is strictly in order. One nice you can do with Multi, though, is define new Dispatch classes. smulti is implemented using a class named StringDispatch and a wrapper ‘smulti’ that looks almost identical to the definition of regular ‘multi’.

  def smulti(method_name, *patterns, &body)
    Multi::DISPATCHER.add(Multi::StringDispatch, self, method_name, patterns, body)
  end

We’ll look at smulti some more in a bit. But for now, having successfully nabbed pattern matching, let’s move on to the next heist.

Theft #2: S-Expressions

Symbolic expressions, or, more commonly, S-expressions, are a rarity found almost exclusively in the land of Lisp. Here’s are some examples:

  (+ 8 (- 4 2))  
  (if (not nil) t (and nil t))

Both of these s-expressions are Lisp code. The first is just math expressed in prefix notation common to Lisp code. The second uses an if statement and some logical operators. The if statement is not so different from a ruby statement, though the use of ‘t’ for truth and ‘nil’ for false may seem a bit strange a first.

And here’s a more complicated example of S-expressions used as data, not code:

  (content
    (title "S-Expression Demo")
    (rated 5.0))

In it’s simplest form, an S-expression is a list of symbols. Lists look like the contents of parenthesis separated by spaces. Symbols are barewords. So:

  (this is an sexpression)

is a list containing the symbols ‘this’, ‘is’, ‘an’, ‘sexpression’. Most Lisp implementations also accept string and number literals, so we can write things like in the example s-expressions above.

Ruby gives us everything we need to represent S-expressions. Ruby list literals are written in enclosing brackets ([...]) and use commas as separators. Symbols are written as barewords proceeded by a colon (:).

Given that information, what does our simple example look like in Ruby?

  [:this, :is, :an, :sexpression]

Not nearly as pretty, especially when things get more complicated.

  [:content,
   [:title, "S-Expression Demo"],
   [:rated, 5]]

The commas are what really kill it for me, but the colons aren’t so hot either. Luckily, there’s a Ruby gem to let us write Ruby S-expressions in Lisp syntax.

Another Gem

First install the gem:

  $ gem install -r sexp

Then, remember to require the gem:

  #!/usr/bin/ruby -w
  require 'rubygems'
  require 'sexp'

Now let’s take it for a test run:

  "(this is an sexpression)".parse_sexp ===> [:this, :is, :an, :sexpression]
  '(content
    (title "S-Expression Demo")
    (rated 5))'.parse_sexp  

    ===> [:content, [:title, "S-Expression Demo"], [:rated, 5.0]]

Can it also do the reverse?

  [:this, :is, :an, :sexpression].to_sexp ===> "(this is an sexpression)"

It sure can.

S-Expressions are Built with smulti

The S-Expression parser is built almost entirely using the smulti mechanism we talked about earlier. It uses regular expressions to tear chunks off the front of strings and builds objects out of them. As described above, it parses lists, symbols, strings, and numbers. Let’s look at the parse() method for SExpressionParser::Main.

  smulti(:parse, /\s+/)    {|c, rest| parse(rest)                }
  smulti(:parse, /\(/)     {|c, rest| @res = List.new(rest)      }
  smulti(:parse, /\"/)     {|c, rest| @res = String.new(rest)    }
  smulti(:parse, NumberRE) {|c, rest| @res = Number.new(rest, c) }
  smulti(:parse, SymbolRE) {|c, rest| @res = Symbol.new(rest, c) }

The top dispatch says if the first character is any whitespace character, to ignore it and parse the rest of the string minus that character. However, the second and third dispatch are much more interesting. They match the opening character for lists and strings accordingly. If they are triggered, then the text after the opening character will be passed into a specialized parser. The same happens for Numbers and Symbols, although their regexps are seperated out for readability.

While the leading characters to lists and strings are typically thrown away, we identify symbols and numbers when we see a character or a digit accordingly. Well, actually a digit or a leading period for decimal numbers. These first characters are part of the symbol or number, so we can’t just throw them out. That’s why there are passed in to their specialized parsers, unlike the others.

  smulti(:parse, NumberRE) {|c, rest| @res = Number.new(rest, c) }
  smulti(:parse, SymbolRE) {|c, rest| @res = Symbol.new(rest, c) }

We don’t need to look at all the sub parsers, but let’s peak into the List parser class. It’s parse method is very simple.

  smulti(:parse, /\)/  ) {|s, rest| leave(rest) }
  smulti(:parse, /\s+/ ) {|s, rest| parse(rest) }
  smulti(:parse, //    ) {|s, rest|
    item = Main.new(rest)
    add(item.value)
    parse(item.unwanted)

  }

As in the main parser, white space is thrown away and parsing continues.

  smulti(:parse, /\s+/ ) {|s, rest| parse(rest) }

If at any point we find the closing parenthesis, we call the inherited leave() method. leave() is our way of telling the List parser that we’re done. It stores the remaining text in an instance variable where it can be retrieved by someone else (in this case the Main parser that called us) and returns.

  smulti(:parse, /\)/  ) {|s, rest| leave(rest) }

However, in all other situations, the text is actually passed of to a new Main parser that can handle any of the basic s-expression types that we could encounter anywhere, even inside a list. This parser’s task is to make sense of the numbers, strings, symbols, or nested lists that this list might contain. And when the Main parser completes, we add the newly created item to our list of contents and continue parsing whatever comes next.

As parsing goes, s-expressions are not a difficult exercise. But it’s nice to see we can use multiple dispatch to quickly throw together a parser. And the fact of the matter is, the real fun of s-expressions is not parsing them, but using them.

So shall we put our previous heists to work for us and steal just one more thing?

Theft #3: External Domain Specific Languages

DLSs, or domain specific languages, are hot. Sometimes a general-purpose programming language just isn’t the clearest or shortest way to solve some problem. Creating a sub-language that can solve the problem more elegantly is one of the best ways to get around this.

Martin Fowler, an advocate of object-oriented programming, patterns, and agile software development, divides DSLs into two types. In his own words [1], “External DSLs are written in a different language than the main (host) language of the application and are transformed into it using some form of compiler or interpreter… Internal DSLs morph the host language into a DSL itself.”

Internal DSLs

Most Ruby DSLs are Internal DSLs. Here’s an example from the Dwemthy’s Array DSL in the infamous and terrifying Chapter 6 of Why’s Poignant Guide to Ruby²:

  class TeethDeer < Creature
     life 655
     strength 192
     charisma 19
     weapon 109
  end

This code describes the fearsome TeethDeer creature, known for it’s deadly bite. There’s no magic here, just some class methods. It’s nice that this code is only Ruby. Ruby on Rails also uses some small Internal DSLs in Ruby. David Heinemeier Hansson extends some of the core Ruby classes to let you write 3.days.ago and have it do the right thing. Unfortunately, coding these kinds of DSLs can get complex, and the Ruby syntax may ultimately feel limiting.

External DSLs

External DSLs aren’t as common in Ruby, but they do show up. For example, the Ruby DBI library uses embedded SQL:

  sth = dbh.prepare("SELECT * FROM users");

So, as a point of comparison, 3.days.ago is still all Ruby code, but “SELECT * FROM users” is not.

Logo

Okay, enough talk, let’s implement a simple version of the Logo programming language in Ruby. Remember Logo³? That language with the silly turtle that draws? Logo code is very simple and traditionally looks like this:

  repeat 4 [ forward 100 right 90 ]

This produces:

Logo output 1

Our logo will differ only in that instead of square brackets for blocks, we’ll use parenthesis:

  repeat 4 ( forward 100 right 90 )

The “good old days” of just flipping bits in video memory are long gone, so let’s draw SVG (scalable vector graphics) images instead. SVG is an XML drawing format, and since it’s just text, it should be relatively easy for us to generate. We’ll start by creating a Logo class, and giving it a render() method

  def render
    return <<-END
  <?xml version="1.0" standalone="no"?>
  <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20010904//EN" 
  "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
  <svg width="100%" height="100%" xmlns="http://www.w3.org/2000/svg">
   <g stroke="black" stroke-width="1">
    #{@buffer}</g>
   </svg>
  END
  end

This is an alarming snippet, but it just creates the headers and root element destined for our SVG file, and dumps the instance variable @buffer as part of the output. The real trick is going to be turning the Logo commands into SVG lines and storing them in that buffer.

The Turtle

Just to refresh your memory, the little Logo turtle drags a pen with him, so every time he moves, he draws a line. The most basic commands he listens to are ‘left’, ‘right’, and ‘forward’. Left and right ask him to turn some number of degrees left or right, and forward tells him to walk some number of pixels forward.

Therefore, we’ll have three other instance variables: @x, @y, and @angle. In order to keep @angle between 0 and 360 (not strictly necessary, but it’s cleaner this way), let’s make a setter for angle.

  def turn(num)
    @angle = (@angle + num) % 360
  end

Now, when the turtle moves forward, we’ve got to actually draw a line. Using the standard polar to cartesian conversion, the move() method computes the turtle’s new position and draws a line between there and his old position (so long as the @pen instance variable is set). Of course, he doesn’t actually draw the line, he just puts the text representing it into our @buffer.

  def move(distance)
    oldx, oldy = @x, @y
    radians = @angle * Math::PI / 180
    @x += distance * Math.cos(radians)
    @y += distance * Math.sin(radians)
    return unless @pen
    @buffer += <<-END
  <line x1="#{oldx}" y1="#{oldy}" x2="#{@x}" y2="#{@y}"/>
    END
  end

We’d have enough now to actually render something if we had an initialize() method to setup @x, @y, @angle, and @pen. Then we could write:

  logo = Logo.new
  logo.move(100)
  logo.turn(90)
  logo.move(100)

That’s just Ruby though, so let’s add an eval method. Note that we wrap the string in parenthesis, so that the ‘sexp’ library will return us a list of all the commands.

  def eval(string)
    run("(#{string})".parse_sexp)
  end

Go, Turtle. Go!

Okay, let’s put multi to use again and build a method to run Logo S-expressions. We can put this in the initialize method along with the code to initialize our instance variables.

  def initialize
    @x, @y  = 100, 100
    @angle  = 0
    @buffer = "" 
    @pen    = true

    amulti(:run, :right, Numeric)   {|sym, r, rest| turn(r) ; run(rest)  }
    amulti(:run, :left, Numeric)    {|sym, l, rest| turn(-l) ; run(rest) }
    amulti(:run, :forward, Numeric) {|sym, f, rest| move(f) ; run(rest)  }
    amulti(:run, :penup)            {|sym, rest| @pen = true ; run(rest) }
    amulti(:run, :pendown)          {|sym, rest| @pen = false ; run(rest)}
    amulti(:run, :repeat, Numeric, Array) do |sym, i, code, rest|
      i.to_i.times{ run(code) }
      run(rest)
    end
    amulti(:run) {}
  end

So what’s going on here?

The ‘run’ function takes a list of commands and arguments and consumes them as appropriate. The first two definitions pull the :right or :left command off the list along with a number of degrees to rotate. The turn() function does the dirty work. Note that all of the ‘run’ bodies recurse on the remaining arguments.

The definition that matches :forward moves the turtle, :penup and :pendown sets the value of @pen, and, finally, :repeat takes a number of times to repeat and an array of code to run. So let’s try this out.

(You can download the complete source here)

Testing Time

Running this code …

  logo = Logo.new
  logo.eval %q{
    repeat 4 (forward 100 right 90)
    forward 50 right 90 forward 100
  }
  puts logo.render

... gives us:

Logo output 2

Looks right!

Let’s Quit While We’re Ahead

Pattern-matching multiple dispatch, S-expressions, and Logo, that’s a pretty good haul. So next time you’re missing Haskell, or Lisp, or something really weird like Logo, why not just stick with Ruby? Of course, you might have to steal a few things first …

Resources

[0] DSLs in OCaml
http://www.venge.net/graydon/talks/mkc/html/index.html

[1] Internal versus External DSLs
http://www.martinfowler.com/articles/languageWorkbench.html

[2] Why’s (Poignant) Guide to Ruby
http://poignantguide.net/ruby/

[3] Introduction to Logo
http://mckoss.com/logo/

Talk back!

Have an opinion? Readers have already posted 3 comments about this article. Why not add yours?

About the author

Jim Freeze has been a Ruby enthusiast since he learned of the language in early 2001.

An Electrical Engineer by trade working in the Semiconductor Industry, Jim has focused on extending Ruby into the EDA space and building libraries to make the language more palatable for the corporate community. Lately Jim has been working on integrating Ruby and Rails with Asterisk.

Jim is the author of the CommandLine and Stax gems.