Scala Buzz - My first ICFP contest, Part 2: parser combinators are so cool

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Scala Buzz
My first ICFP contest, Part 2: parser combinators are so cool

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Eric Torreborre

Posts: 36
Nickname: etorreborr
Registered: Oct, 2007

Eric Torreborre is Tokyo programmer in love with Scala (and the author of the specs framework)

My first ICFP contest, Part 2: parser combinators are so cool

Posted: Jul 27, 2008 1:07 AM

This post originated from an RSS feed registered with Scala Buzz by Eric Torreborre.
Original Post: My first ICFP contest, Part 2: parser combinators are so cool Feed Title: A++ [Eric Torreborre's Blog] Feed URL: http://etorreborre.blogspot.com/feeds/posts/default/-/scala Feed Description: This blog is "Software on my mind" and I try to write one post a month. As Scala is a lot on my mind recently, you can expect more Scala posts quite soon!	Latest Scala Buzz Posts Latest Scala Buzz Posts by Eric Torreborre Latest Posts From A++ [Eric Torreborre's Blog]

This is the sequel to my last post on participating to the ICFP contest 2008.

The Setup

As Curt wrote it, we entered the contest doing things that should have been done well before:

setting up the wifi access so that I can connect with my laptop (it never worked so Bryan had to get out and buy new network cables)
exchanging keys to access the Subversion repository
learning how to use the build and test system

I guess that we need to make some mistakes (at least) once oneself before getting it!

However, the biggest lesson for me in this contest, was the use of Parser Combinators.

Do you speak Martian?

The second programming task in the contest, after setting up the network connection to be able to receive messages and send back commands, was to parse the input messages giving the position of the Martian rover, and its immediate environment.

As the task specifies, there are 5 kinds of messages:

initialization messages -> what's the map like? what are you technical constants?
telemetry stream -> where are you, what's your speed, what's around?
adverse event -> crashed in a crater!
success -> yes, home!
end of run -> time and status

The format of those messages is not particularly complex but the telemetry message can contain an unlimited number of obstacles descriptions for example.

For many developers, the most obvious way to transforms meaningless strings to meaningful objects like Telemetry or Martian, would be to start traversing the string, check for characters like 'T' for Telemetry, or 'I' for Initialization then pass the rest of the string to another function and continue slicing the string until all data is analyzed.

Parser combinators

But Haskell developers know better,... They can just describe the grammar of the messages they're parsing by combining parsers, then applying the resulting parser to the string input.

Something like that:


telemetry :: Parser Message
telemetry = do ts <- int
               state <- accelStateParser
               turning <- turningStateParser
               spaces
               x <- double
               y <- double
               dir <- double
               speed <- double
               obstacles <- obstaclesParser
               messageEnd
               return $ Telemetry ts state turning x y dir speed obstacles

accelStateParser :: Parser AccelState
accelStateParser = (accelerating <|> braking <|> rolling)

accelerating = stateParser 'a' Accelerating
braking = stateParser 'b' Braking
rolling = stateParser '-' Rolling

stateParser :: Char -> a -> Parser a
stateParser c cons = do _ <- char c
                        return cons

turningStateParser :: Parser TurningState
turningStateParser = hardleft <|> softleft
                         <|> hardright <|> softright
                         <|> straight

hardleft  = stateParser 'L' HardLeft
softleft  = stateParser 'l' SoftLeft
hardright = stateParser 'R' HardRight
softright = stateParser 'r' SoftRight
straight  = stateParser '-' Straight
obstaclesParser :: Parser [Object]
obstaclesParser = do l <- many obstacleParser
                     return l

obstacleParser :: Parser Object
obstacleParser = do code <- oneOf "bchm"
                    spaces
                    case code of
                         'b' -> object Boulder
                         'c' -> object Crater
                         'h' -> object Home
                         'm' -> martian
                         _   -> error "Can't get here in parser!"

object :: (Circle -> a) -> Parser a
object cons = do x <- double
                 y <- double
                 r <- double
                 spaces
                 return $ cons (Circle (x:.y) r)

martian :: Parser Object
martian = do x <- double
             y <- double
             dir <- double
             speed <- double
             spaces
             return $ Martian (Circle (x:.y) 0.4) dir speed

In the above code, you can read that the parser for telemetry messages is a sequence of different parsers: a timestamp parser, a acceleration state parser, a turning state parser, ..., a parser for lists of obstacles. And a parser for list of obstacles is just "many" parsers for one obstacle. This is indeed as easy as being able to describe a grammar for that mini-language.

"k" should better be 1

But I didn't say that describing a grammar was easy! For instance, at first, we made a mistake in the way we managed spaces. We had almost each parser trying to consume spaces before doing its job. This generated unnecessary ambiguity in the parsing.

For example, let's say that you want to be able to parse this kind of message:


" O T O O T"

(honestly, I don't know why you would like to do something like that but you're the boss here)

You'd better combine those 3 parsers:

spaces -> will consume any number of spaces

oneParser = char 'O' >>= spaces -> will consume 'O' followed by any number of spaces

twoParser = char 'T' >>= spaces -> will consume 'T' followed by any number of spaces

If you define parsers differently, having "oneParser" and "twoParser" consuming spaces before they parse their character, which parser should be used on the first space of the message?

Actually, parser combinators in Haskell are also capable of backtracking and "unparsing" results if the chosen parser was not appropriate (trying things with the the "try" parser combinator). But why add more complexity when things can be described more directly?

Of course, there is a formal theory behind the example above: LL(k) grammars, where LL(1) is a type of grammar only needing to look at the next character to know which parser to use. But there's nothing like a simple "real-life" example to be able to grasp it.

Compare and contrast

Now, honestly, most other contest solutions were not using Parser combinators and the aren't soooo ugly (in Dylan for example). I think they're just less readable and more error-prone because of index manipulation.

For fun and comparison, I also developed a similar parser in Scala:


  def telemetryMessage = for { timestamp    <- intNumber
                               accState     <- accelState
                               turnState    <- turningState
                               vehicleX     <- doubleNumber
                               vehicleY     <- doubleNumber
                               vehicleDir   <- doubleNumber
                               vehicleSpeed <- doubleNumber
                               obstacles    <- objects }
                          yield Telemetry(timestamp, accState(), turnState(),
                                          vehicleX, vehicleY, vehicleDir, 
                                          vehicleSpeed, obstacles)
                        
  def accelState = (  
      "a" ^^^ Accelerating
    | "b" ^^^ Braking
    | "-" ^^^ Rolling
  )
 
  def turningState = (  
      "l" ^^^ SoftLeft
    | "L" ^^^ HardLeft
    | "-" ^^^ Straight
    | "r" ^^^ SoftRight
    | "R" ^^^ HardRight
  )

  def objects = rep(objectParser)
  def objectParser: Parser[Object] = (
      "b" ~ spaces ~> circleParser ^^ (new Boulder(_))
    | "c" ~ spaces ~> circleParser ^^ (new Crater(_))
    | "h" ~ spaces ~> circleParser ^^ (new Crater(_))
    | "m" ~ spaces ~> martianParser
  )
     
  def circleParser = for { x <- doubleNumber
                           y <- doubleNumber
                           r <- doubleNumber }
                     yield new Circle(x, y, r)
                    
  def martianParser = for { circle <- circleParser
                            dir    <- doubleNumber
                            speed  <- doubleNumber }
                      yield new Martian(circle, dir, speed)

The interesting things in the Scala implementation, although it could seem a bit ugly at first sight, are:

the use of a sequence operator "~>" getting rid of what has already been parsed. Indeed, when we have parsed "b " we know we need to create a Boulder object. The 'b' character and the space are then not usefull anymore

the use of functions to transform the parsed string into a meaningful object, like "l" ^^^ SoftLeft, creating a new SoftLeft object (it's a case class) when parsing "l"

implicit definitions. In the example above I don't need to declare my parser for 'l' as letter("l"). Combining "l" with another parser automatically creates a parser for the character 'l'

Why XML?

One of the reasons for the ubiquitous adoption (and hate!) of XML is, I think, the general availability of parsers and binding libraries. This is why this often becomes the easiest option for configuration files (is it changing with YAML and JSON?).

It doesn't have to be that way. I hope that parser combinators can now bring another legitimate option when considering the implementation of an external mini-language for configuration or protocol. They're simple to use and readily available on your Java platform through Scala. Try them at home!

Read: My first ICFP contest, Part 2: parser combinators are so cool

Previous Topic

Next Topic


	Web Artima.com