The Artima Developer Community
Ruby Code & Style
If It's Not Nailed Down, Steal It
Pattern Matching, S-Expressions, and Domain Specific Languages in Ruby
by Topher Cyll
May 23, 2006

<<  Page 3 of 5  >>

Theft #2: S-Expressions

Symbolic expressions, or, more commonly, S-expressions, are a rarity found almost exclusively in the land of Lisp. Here are some examples:

  (+ 8 (- 4 2))  
  (if (not nil) t (and nil t))

Both of these s-expressions are Lisp code. The first is just math expressed in the prefix notation common to Lisp code. The second uses an if statement and some logical operators. The if statement is not so different from a Ruby statement, though the use of ‘t’ for truth and ‘nil’ for false may seem a bit strange at first.

And here’s a more complicated example of S-expressions used as data, not code:

  (content
    (title "S-Expression Demo")
    (rated 5.0))

In its simplest form, an S-expression is a list of symbols. A list is written as a pair of parentheses whose contents are separated by spaces. Symbols are barewords. So:

  (this is an sexpression)

is a list containing the symbols ‘this’, ‘is’, ‘an’, and ‘sexpression’. Most Lisp implementations also accept string and number literals, which is why we can write things like the example s-expressions above.

Ruby gives us everything we need to represent S-expressions. Ruby list (array) literals are written in enclosing brackets ([...]) and use commas as separators. Symbols are written as barewords preceded by a colon (:).

Given that information, what does our simple example look like in Ruby?

  [:this, :is, :an, :sexpression]

Not nearly as pretty, especially when things get more complicated.

  [:content,
   [:title, "S-Expression Demo"],
   [:rated, 5.0]]

The commas are what really kill it for me, but the colons aren’t so hot either. Luckily, there’s a Ruby gem to let us write Ruby S-expressions in Lisp syntax.
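
Noisy or not, the bracket form is plain Ruby data, so the usual Array methods work on it. Here is a small sketch of picking the example apart; the variable names are made up for illustration and nothing here depends on the gem:

```ruby
doc = [:content,
       [:title, "S-Expression Demo"],
       [:rated, 5.0]]

# The first element is the head symbol; the rest are nested lists.
tag, *children = doc

# Array#assoc returns the first sub-array whose first element matches.
title = children.assoc(:title)
rated = children.assoc(:rated)

puts tag         # content
puts title.last  # S-Expression Demo
puts rated.last  # 5.0
```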

Another Gem

First install the gem:

  $ gem install -r sexp

Then, remember to require the gem:

  #!/usr/bin/ruby -w
  require 'rubygems'
  require 'sexp'

Now let’s take it for a test run:

  "(this is an sexpression)".parse_sexp ===> [:this, :is, :an, :sexpression]
  '(content
    (title "S-Expression Demo")
    (rated 5))'.parse_sexp  

    ===> [:content, [:title, "S-Expression Demo"], [:rated, 5.0]]

Can it also do the reverse?

  [:this, :is, :an, :sexpression].to_sexp ===> "(this is an sexpression)" 

It sure can.
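
If you are curious how little the reverse direction needs, here is a minimal sketch in plain Ruby. The name render_sexp is hypothetical and this is not the gem's implementation of to_sexp; it only assumes that lists are Arrays, that strings keep their double quotes, and that everything else prints as a bareword.

```ruby
# Hypothetical helper, NOT part of the sexp gem: renders nested arrays
# in Lisp syntax.
def render_sexp(obj)
  case obj
  when Array  then "(" + obj.map { |item| render_sexp(item) }.join(" ") + ")"
  when String then obj.inspect  # keeps the surrounding double quotes
  else             obj.to_s     # symbols and numbers print as barewords
  end
end

puts render_sexp([:this, :is, :an, :sexpression])
# (this is an sexpression)
puts render_sexp([:content, [:title, "S-Expression Demo"], [:rated, 5.0]])
# (content (title "S-Expression Demo") (rated 5.0))
```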

S-Expressions are Built with smulti

The S-Expression parser is built almost entirely using the smulti mechanism we talked about earlier. It uses regular expressions to tear chunks off the front of strings and builds objects out of them. As described above, it parses lists, symbols, strings, and numbers. Let’s look at the parse() method for SExpressionParser::Main.

  smulti(:parse, /\s+/)    {|c, rest| parse(rest)                }
  smulti(:parse, /\(/)     {|c, rest| @res = List.new(rest)      }
  smulti(:parse, /\"/)     {|c, rest| @res = String.new(rest)    }
  smulti(:parse, NumberRE) {|c, rest| @res = Number.new(rest, c) }
  smulti(:parse, SymbolRE) {|c, rest| @res = Symbol.new(rest, c) }

The top dispatch says that if the string starts with whitespace, we ignore it and parse the rest of the string. The second and third dispatches are much more interesting. They match the opening character for lists and strings, respectively. If they are triggered, the text after the opening character is passed to a specialized parser. The same happens for numbers and symbols, although their regexps are separated out for readability.

While the leading characters of lists and strings are typically thrown away, we identify symbols and numbers when we see a character or a digit, respectively. Well, actually a digit or a leading period for decimal numbers. These first characters are part of the symbol or number, so we can’t just throw them out. That’s why they are passed in to their specialized parsers, unlike the others.

  smulti(:parse, NumberRE) {|c, rest| @res = Number.new(rest, c) }
  smulti(:parse, SymbolRE) {|c, rest| @res = Symbol.new(rest, c) }
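
To make the "tear a chunk off the front" idea concrete without smulti, here is a rough stand-in in plain Ruby. Everything in it is an assumption for illustration: RULES, dispatch, and the number and symbol patterns are made up (the article does not show NumberRE or SymbolRE), and the handlers simply report which rule fired instead of building parser objects.

```ruby
# Hypothetical stand-in for the dispatch idea; this is not smulti itself.
# Each rule pairs a pattern with a handler that receives the matched
# chunk (c) and the remainder of the string (rest).
RULES = [
  [/\s+/,      ->(c, rest) { [:skip,   c, rest] }],
  [/\(/,       ->(c, rest) { [:list,   c, rest] }],
  [/"/,        ->(c, rest) { [:string, c, rest] }],
  [/[\d.]/,    ->(c, rest) { [:number, c, rest] }],  # assumed pattern
  [/[^\s()"]/, ->(c, rest) { [:symbol, c, rest] }]   # assumed pattern
]

def dispatch(text)
  RULES.each do |pattern, handler|
    # \A anchors the pattern to the very start of the string.
    match = /\A#{pattern}/.match(text)
    return handler.call(match[0], match.post_match) if match
  end
  raise ArgumentError, "no rule matches #{text.inspect}"
end

p dispatch("  foo")    # [:skip, "  ", "foo"]
p dispatch("(a b)")    # [:list, "(", "a b)"]
p dispatch("42)")      # [:number, "4", "2)"]
p dispatch("foo bar")  # [:symbol, "f", "oo bar"]
```

Rule order matters here just as it does in the dispatch table above: the number rule has to come before the symbol rule, or a leading digit would be claimed as the start of a symbol.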

We don’t need to look at all the sub-parsers, but let’s peek into the List parser class. Its parse method is very simple.

  smulti(:parse, /\)/  ) {|s, rest| leave(rest) }
  smulti(:parse, /\s+/ ) {|s, rest| parse(rest) }
  smulti(:parse, //    ) {|s, rest|
    item = Main.new(rest)
    add(item.value)
    parse(item.unwanted)
  }

As in the main parser, white space is thrown away and parsing continues.

  smulti(:parse, /\s+/ ) {|s, rest| parse(rest) }

If at any point we find the closing parenthesis, we call the inherited leave() method. leave() is our way of telling the List parser that we’re done. It stores the remaining text in an instance variable where it can be retrieved by someone else (in this case the Main parser that called us) and returns.

  smulti(:parse, /\)/  ) {|s, rest| leave(rest) }

However, in all other situations, the text is actually passed off to a new Main parser that can handle any of the basic s-expression types that we could encounter anywhere, even inside a list. This parser’s task is to make sense of the numbers, strings, symbols, or nested lists that this list might contain. And when the Main parser completes, we add the newly created item to our list of contents and continue parsing whatever comes next.
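
The same back-and-forth between an item parser and a list parser can be sketched in plain Ruby with the standard library's StringScanner. This is only an illustration of the structure described above, not the gem's code: parse_item, parse_list, and parse_sexp_sketch are hypothetical names, the number and symbol patterns are guesses, and turning every number into a Float is an assumption based on the test run shown earlier.

```ruby
require 'strscan'

# Hypothetical plain-Ruby parser; it does not use smulti or the sexp gem.

# Plays the role of the Main parser: handles one item of any type.
def parse_item(scanner)
  scanner.skip(/\s+/)
  if scanner.scan(/\(/)
    parse_list(scanner)
  elsif scanner.scan(/"/)
    text = scanner.scan(/[^"]*/)
    scanner.scan(/"/) or raise ArgumentError, "unterminated string"
    text
  elsif (number = scanner.scan(/-?(?:\d+\.?\d*|\.\d+)(?=[\s()]|\z)/))
    number.to_f                 # assumption: all numbers become Floats
  elsif (symbol = scanner.scan(/[^\s()"]+/))
    symbol.to_sym
  else
    raise ArgumentError, "unexpected input at position #{scanner.pos}"
  end
end

# Plays the role of the List parser: collects items until ")".
def parse_list(scanner)
  items = []
  loop do
    scanner.skip(/\s+/)                       # whitespace is thrown away
    return items if scanner.scan(/\)/)        # closing paren: we are done
    raise ArgumentError, "missing )" if scanner.eos?
    items << parse_item(scanner)              # anything else: hand it off
  end
end

def parse_sexp_sketch(text)
  parse_item(StringScanner.new(text))
end

p parse_sexp_sketch("(this is an sexpression)")
# [:this, :is, :an, :sexpression]
p parse_sexp_sketch("(+ 8 (- 4 2))")
# [:+, 8.0, [:-, 4.0, 2.0]]
```

The scanner keeps track of the unread text for us, so there is no need for the leave() and unwanted bookkeeping that the smulti version uses to pass leftovers back to its caller.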

As parsing goes, s-expressions are not a difficult exercise. But it’s nice to see we can use multiple dispatch to quickly throw together a parser. And the fact of the matter is, the real fun of s-expressions is not parsing them, but using them.

So shall we put our previous heists to work for us and steal just one more thing?
