The Artima Developer Community
Sponsored Link

The Explorer
The Adventures of a Pythonista in Schemeland/8
by Michele Simionato
October 25, 2008
Summary
In this episode I will explain the meaning of the "code is data" concept. To this aim I will discuss the quoting operation which allows to convert a code fragment into a list of symbols and primitive values - i.e. converts code into data. Then, I will discuss the issue of evaluating data as code.

Advertisement

Quoting

A distinguishing feature of the Lisp family of languages is the existence of a quoting operator denoted with a quote ' or with (quote ), the first form being syntactic sugar for the second. The quoting operator works as follows:

  1. on primitive values such as numbers, literal strings, symbols, etc, it works as an identity operator:
> '1
1
> '"hello"
"hello"
> ''a
'a
  1. expressions are converted into lists; for instance '(display "hello") is the list
> (list 'display '"hello")
(display "hello")

whereas '(let ((x 1)) (* 2 x)) is the list

> (list 'let (list (list 'x '1)) (list '* '2 'x))
(let ((x 1)) (* 2 x))

et cetera.

Hence every Scheme/Lisp program admits a natural representation as a (nested) list of symbols and primitive values: code is data. On the other hand, every nested list of symbols and primitive values corresponding to a syntactically valid Scheme/Lisp programs can be executed, both at runtime - with eval - or at compilation time - through macros. The consequences are fascinating: since every program is a list, it is possible to write programs that, by building lists, build other programs. Of course, you can do the same in other languages: for instance in Python you can generate strings corresponding to valid source code and you can evaluate such strings with various mechanisms (eval, exec, __import__, compile, etc). In C/C++ you can generate a string, save it into a file and compile it to a dynamic library, then you can import it at runtime; you also have the mechanism of pre-processor macros at your disposal for working at compile time. The point is that there is no language where code generation is as convenient as in Scheme/Lisp where it is buil-in, thanks to s-expressions or, you wish, thanks to parenthesis.

Quasi-quoting

In all scripting languages there is a form of string interpolation; for instance in Python you can write

def say_hello(user):
     return "hello, %(user)s" % locals()

In Scheme/Lisp, there is also a powerful form of list interpolation:

> (define (say-hello user)
    `("hello" ,user))

> (say-hello "Michele")
   ("hello" "Michele")

The backquote or (quasiquote ) syntax ` introduces a list to be interpolated (template); it is possible to replace some variables within the template, by prepending to them the unquoting operator (unquote ) or ,, denotated by a comma. In our case we are unquoting the user name, ,user. The function say-hello takes the user name as a string and returns a list containing the string "hello" together with the username.

There is another operator like unquote, called unquote-splicing or comma-at and written ,@, which works as follows:

> (let ((ls '(a b c))) `(func ,@ls))
(func a b c)

In practice ,@ls "unpacks" the list ls into its components: without the splice operator we would get:

> (let ((ls '(a b c))) `(func ,ls))
(func (a b c))

The power of quasiquotation stands in the code/data equivalence: since Scheme/Lisp code is nothing else than a list, it is easy to build code by interpolating a list template. For instance, suppose we want to evaluate a Scheme expression in a given context, where the contexts is given as a list of bindings, i.e. a list names/values:

(eval-with-context '((a 1)(b 2) (c 3))
  '(* a  (+ b c)))

How can we define eval-with-context? The answer is by eval-uating a template:

(define (eval-with-context ctx expr)
 (eval `(let ,ctx ,expr) (environment '(rnrs))))

Notice that eval requires a second argument that specifies the language known by the interpreter; in our case we declared that eval understands all the procedures and macros of the most recent RnRS standard (i.e. the R6RS standard). The environment specification has the same syntax of an import, since in practice it is the same concept: it is possible to specify user-defined modules as the eval environment. This is especially useful if you have security concerns, since you can run untrusted code in a stricter and safer subset of R6RS Scheme.

eval is extremely powerful and sometimes it is the only possible solution, in particular when you want to execute generic code at runtime, i.e. when you are writing an interpreter. However, often you only want to execute code known at compilation time: in this case the job of eval can be done more elegantly by a macro. When that happens, in practice you are writing a compiler.

Programs writing programs

http://www.phyast.pitt.edu/~micheles/scheme/writer.jpg

Once you realize that code is nothing else than data, it becomes easy to write programs taking as input source code and generating as output source code, i.e. it is easy to write a compiler. For instance, suppose we want to convert the following code fragment

(begin
 (define n 3)
 (display "begin program\n")
 (for i from 1 to n (display i)); for is not defined in R6RS
 (display "\nend program\n"))

which is not a valid R6RS program into the following program, which is valid according to the R6RS standard:

(begin
 (define n 3)
 (display "begin program\n")
 (let loop (( i 1)) ; for loop expanded into a named let
   (unless (>= i  n) (display i) (loop (add1 i))))
   (display "\nend program\n")))

begin is the standard Scheme syntax to group multiple expressions into a single expression without introducing a new scope (you may introduce a new scope with let) and preserving the evaluation order (in most functional languages the evaluation order is unspecified).

More in general, we want to write a script which is able to convert

(begin                                 (begin
  (expr1 ...)                            (expr1' ...)
  (expr2 ...)             -->            (expr2' ...)
   ...                                    ...
  (exprN ...))                           (exprN' ...))

where the expressions may be of kind for or any other kind not containing a subexpression of kind for. Such a script can be thought of as a preprocessor expanding source code from an high level language with a primitive for syntax into a low level language without a primitive for. Preprocessors of this kind are actually very primitive compilers, and Scheme syntax was basically invented to make the writing of compilers easy.

In this case you can write a compiler expanding for expressions into named lets as follows:

(import (rnrs) (only (ikarus) pretty-print))

;; a very low-level approach
(define (convert-for-into-loop begin-list)
  (assert (eq? 'begin (car begin-list)))
  `(begin
     ,@(map (lambda (expr)
              (if (eq? 'for (car expr)) (apply convert-for (cdr expr)) expr))
            (cdr begin-list))))

; i1 is subject to multiple evaluations, but let's ignore the issue
(define (convert-for i from i0 to i1 . actions)
  ;; from must be 'from and to must be 'to
  (assert (and (eq? 'from from) (eq? 'to to)))
  `(let loop ((i ,i0))
     (unless (>= i ,i1) ,@actions (loop (+ i 1)))))

(pretty-print
 (convert-for-into-loop
  '(begin
     (define n 3)
     (display "begin program\n")
     (for i from 1 to n (display i))
     (display "\nend program\n"))))

Running the script you will see that it replaces the for expression with a named let indeed. It is not difficult to extend the compiler to make it able to manage sub-expressions (the trick is to use recursion) and structures more general than begin: but I leave that as an useful exercise. In a future episode I will talk of code-walkers and I will discuss how to convert generic source code. In general, one can convert s-expression based source code by using an external compiler, as we did here, or by using the built-in mechanism provided by Scheme macros. Scheme macros are particularly powerful, since they feature extremely advanced pattern matching capabilities: the example I have just given, based on the primitive list operations car/cdr/map is absolutely primitive in comparison.

The next episode will be entirely devoted to macros. Don't miss it!

Appendix: solution of the exercises

In the latest episode I asked you to write an equivalent (for lists) of Python built-ins enumerate and zip. Here I give my solutions, so you may check them with yours.

Here the equivalent of Python enumerate:

(define (py-enumerate lst)
  (let loop ((i 0) (ls lst) (acc '()))
    (if (null? ls) (reverse acc)
        (loop (+ 1 i) (cdr ls) (cons `(,i ,(car ls)) acc)))))

and here is an example of usage:

> (py-enumerate '(a b c))
  ((0 a) (1 b) (2 c))

Here is the equivalent of Python zip:

(define (zip . lists)
 (apply map list lists))

Here is an example of usage:

> (zip '(0 a) '(1 b) '(2 c))
((0 1 2) (a b c))

Notice that zip works like the transposition operation in a matrix: given the rows, it returns the columns of the matrix.

Talk Back!

Have an opinion? Readers have already posted 11 comments about this weblog entry. Why not add yours?

RSS Feed

If you'd like to be notified whenever Michele Simionato adds a new entry to his weblog, subscribe to his RSS feed.

About the Blogger

Michele Simionato started his career as a Theoretical Physicist, working in Italy, France and the U.S. He turned to programming in 2003; since then he has been working professionally as a Python developer and now he lives in Milan, Italy. Michele is well known in the Python community for his posts in the newsgroup(s), his articles and his Open Source libraries and recipes. His interests include object oriented programming, functional programming, and in general programming metodologies that enable us to manage the complexity of modern software developement.

This weblog entry is Copyright © 2008 Michele Simionato. All rights reserved.

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use