Summary
In this episode I explain how to break hygiene in Scheme macros by using datum->syntax. Then I discuss how you can play
with the lexical context. Finally, I spend a few words on non-hygienic versus hygienic macro systems.
Breaking hygiene
In the previous episode I said that hygienic macros are good, since
they solve the variable capture problem. However, purely hygienic macros
introduce a problem of their own, since they
make it impossible for a macro to introduce variables visible to the code that uses it.
Consider for instance the following trivial macro:
(def-syntax (define-a x)
  #'(define a x))
(define-a x) apparently expands to (define a x), so
you may find it surprising that after evaluating (define-a 1) the
variable a is still unbound.
Why is the variable a not bound to 1? The problem is that hygienic
macros never introduce identifiers implicitly.
Auxiliary names introduced in a macro are not visible outside, and the
only names of the expansion that the caller can see are the ones the caller passed in.
A mechanism to introduce identifiers, i.e. a mechanism
to break hygiene, is needed if you want to define binding forms
that bind names the caller did not write.
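To see the effect without sweet-macros, here is a minimal sketch in plain R6RS, assuming an implementation such as Chez Scheme or Racket's R6RS mode that runs the file as a top-level program. The standard define-syntax plus syntax-case stands in for the def-syntax of the article, and which-a is a hypothetical name used only for illustration. The a introduced by the template does not reach the caller:

```scheme
#!r6rs
(import (rnrs))

;; define-a written with standard define-syntax + syntax-case
;; (the article itself uses def-syntax from sweet-macros)
(define-syntax define-a
  (lambda (stx)
    (syntax-case stx ()
      ((_ x) #'(define a x)))))

;; which-a: hypothetical name, for illustration only.
;; The a introduced by the template is renamed by the expander, so it
;; neither captures nor shadows the a visible at the call site.
(define (which-a)
  (let ((a 'outer))
    (define-a 1)   ; defines a hidden, renamed a bound to 1
    a))            ; still refers to the let-bound a

(display (which-a)) ; prints outer
(newline)
```

The renamed a does exist, but no code written by the caller can name it.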
Scheme has a builtin mechanism to break hygiene, and we already saw
it: it is the datum->syntax utility which converts literal
objects (datums) into syntax objects. I have shown
datum->syntax at work in episodes 27 and 28: it was used there
to convert lists describing source code into syntax objects. A more
typical use case for datum->syntax is to turn symbols into proper
identifiers. Such identifiers can then be introduced in macros and
made visible to expanded code.
In order to understand the mechanism, you must always remember
that identifiers in Scheme - in the technical sense of objects
recognized by the identifier? predicate - are not just raw
symbols, they are syntax objects with lexical information attached to
them. If you want to turn a raw symbol into an identifier you must add
the lexical information to it, and this is done by copying the lexical
information of the context object passed as first argument to datum->syntax.
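A small sketch of this promotion, again assuming a standard R6RS implementation: a raw symbol is not an identifier, while the result of datum->syntax is, and it refers to whatever the symbol means at the position of the context object. Here #'here is used only for its lexical context; the name here is arbitrary and hypothetical.

```scheme
#!r6rs
(import (rnrs))

(define raw 'car)                       ; a plain symbol: no lexical information
(define id (datum->syntax #'here raw))  ; same name, with the context of #'here
                                        ; (here: hypothetical name, never evaluated)

(display (identifier? raw))             ; prints #f
(newline)
(display (identifier? id))              ; prints #t
(newline)
;; id now means whatever car means where `here` appears,
;; i.e. the built-in car imported from (rnrs)
(display (free-identifier=? id #'car))  ; prints #t
(newline)
```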
For instance, here is how
you can "fix" the macro define-a:
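Since def-syntax comes from sweet-macros, here is a hedged sketch of the same fix in plain R6RS syntax-case; it is an illustration, not the article's own code. The pattern variable k is bound to the macro name as written by the caller, so datum->syntax gives the symbol 'a the caller's lexical context:

```scheme
#!r6rs
(import (rnrs))

(define-syntax define-a*
  (lambda (stx)
    (syntax-case stx ()
      ((k x)
       ;; promote the symbol 'a to an identifier carrying the lexical
       ;; context of k, i.e. of the macro name as written by the caller
       (with-syntax ((a (datum->syntax #'k 'a)))
         #'(define a x))))))

(define-a* 1)
(display a) ; prints 1
(newline)
```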
The symbol 'a here is being promoted to a bona fide identifier
by adding to it the lexical context associated with the macro name.
You can check that the identifier a is really introduced as
follows:
> (define-a* 1)
> a
1
A more realistic example is to use datum->syntax
to build new identifiers from strings.
For that purpose I have added an identifier-append
utility to my (aps lang) library, defined as follows:
;; take an identifier and return a new one with an appended suffix
(define (identifier-append id . strings)
  (define id-str (symbol->string (syntax->datum id)))
  (datum->syntax id (string->symbol (apply string-append id-str strings))))
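As a quick check of what identifier-append returns, here is a sketch that copies the definition above into a standalone R6RS program (bible is just an example name): the result is a genuine identifier whose symbolic name carries the appended suffix.

```scheme
#!r6rs
(import (rnrs))

;; identifier-append as defined above, copied so the example runs on its own
(define (identifier-append id . strings)
  (define id-str (symbol->string (syntax->datum id)))
  (datum->syntax id (string->symbol (apply string-append id-str strings))))

;; bible: example name, used only for its symbol and lexical context
(define new-id (identifier-append #'bible "-title"))

(display (identifier? new-id))   ; prints #t
(newline)
(display (syntax->datum new-id)) ; prints bible-title
(newline)
```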
Here is a simple def-book macro using identifier-append:
(def-syntax (def-book name title author)
  (with-syntax
    ((name-title (identifier-append #'name "-title"))
     (name-author (identifier-append #'name "-author")))
    #'(begin
        (define name (vector title author))
        (define name-title (vector-ref name 0))
        (define name-author (vector-ref name 1)))))
def-book is here just as an example of use of identifier-append;
it is not a recommended pattern for defining records.
There are much better ways to define records in Scheme, as we will see
in part VI of these Adventures.
Anyway, def-book works as follows.
Given a single identifier name and two values it
introduces three identifiers
in the current lexical scope: name (bound to a vector
containing the two values), name-title (bound to the
first value) and name-author (bound to the second value).
> (def-book bible "The Bible" "God")
> bible
#("The Bible" "God")
> bible-title
"The Bible"
> bible-author
"God"
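For readers without sweet-macros, here is a possible plain R6RS rendering of def-book, offered as a sketch rather than the article's code. One detail matters: identifier-append is called when the transformer runs, at expansion time, so it must be available at that phase. In this single-file sketch it is simply defined inside the transformer; in a larger program it would more naturally live in a library imported with (for ... expand).

```scheme
#!r6rs
(import (rnrs))

(define-syntax def-book
  (lambda (stx)
    ;; identifier-append must exist when the transformer runs (expand
    ;; time), so in this single-file sketch it is defined locally
    (define (identifier-append id . strings)
      (define id-str (symbol->string (syntax->datum id)))
      (datum->syntax id (string->symbol (apply string-append id-str strings))))
    (syntax-case stx ()
      ((_ name title author)
       (with-syntax ((name-title (identifier-append #'name "-title"))
                     (name-author (identifier-append #'name "-author")))
         #'(begin
             (define name (vector title author))
             (define name-title (vector-ref name 0))
             (define name-author (vector-ref name 1))))))))

(def-book bible "The Bible" "God")
(display bible-title)  ; prints The Bible
(newline)
(display bible-author) ; prints God
(newline)
```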
The lexical context is just the set of
names which are visible to an object in a given lexical position
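in the source code. Here is an example of a particularly restricted
lexical context: a small library, stored in the file dummy-ctxt.sls, which
imports only define and syntax from (rnrs) and exports a single name,
dummy-ctxt, defined as the syntax object #'here.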
The identifier #'here only sees the names define,
syntax and dummy-ctxt: this is the lexical
context of any object in its position in the source code. Had we not
restricted the import, the lexical context of #'here would have
been the entire rnrs set of identifiers. We can use dummy-ctxt
to expand a macro into a minimal context. For instance, consider a
trivial macro expand-to-car, defined like expand-to-dummy-ctxt below but
with the transformer (lambda (x) (datum->syntax dummy-ctxt 'car)).
It expands to a syntax object obtained by attaching the lexical context
dummy-ctxt to the symbol 'car. Since the built-in car is not defined in
that lexical context, the expansion fails.
A similar macro expand-to-dummy-ctxt, instead, succeeds, since
dummy-ctxt is bound in that lexical context:
> (def-syntax expand-to-dummy-ctxt
    (lambda (x) (datum->syntax dummy-ctxt 'dummy-ctxt)))
> (expand-to-dummy-ctxt)
#<syntax here [char 115 of /home/micheles/gcode/scheme/aps/dummy-ctxt.sls]>
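The dummy-ctxt library needs its own file, so here is a single-file sketch of the same idea in plain R6RS, assuming nothing beyond (rnrs). The hypothetical macro car-in-context-of attaches to 'car the lexical context of whatever identifier the caller passes, so what car means depends entirely on where that identifier appears.

```scheme
#!r6rs
(import (rnrs))

;; car-in-context-of: hypothetical macro, for illustration only.
;; (car-in-context-of ctx) expands to the symbol car carrying the
;; lexical context of the identifier ctx.
(define-syntax car-in-context-of
  (lambda (stx)
    (syntax-case stx ()
      ((_ ctx) (datum->syntax #'ctx 'car)))))

;; at top level, `here` sees the car imported from (rnrs)
(define top-car (car-in-context-of here))

;; inside this let, `here` sees the local variable car instead
(define local-car
  (let ((car 'shadowed))
    (car-in-context-of here)))

(display (top-car '(1 2 3))) ; prints 1
(newline)
(display local-car)          ; prints shadowed
(newline)
```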
In the definition of define-macro I gave
in episode 28 I used the name of the defined macro as lexical
context. The consequence of this choice is that define-macro style
macros are expanded within the lexical context of the code
where the macro is invoked. For instance in this example
> (let ((x 42))
    (define-macro (m) 'x) ; (m) should evaluate to 42
    (let ((x 43))
      (m)))
43 ; surprise!
(m) expands to x, which evaluates to 43 since x is bound to 43 in the
lexical context where the macro is invoked. However, this behavior is quite
surprising, and most likely not what is wanted. This is actually
another example of the free-symbol capture problem. It should be
clear that the capture comes from expanding the macro in the macro-call
context, not in the macro-definition context.
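The difference between the two choices of context can be reproduced in plain R6RS without define-macro. The sketch below (two-expansions, m-hygienic and m-call-site are hypothetical names) defines two local macros in the same scope as the example above: one uses a hygienic template, the other borrows the lexical context of the macro name at the call site, which is the choice made by the define-macro of episode 28.

```scheme
#!r6rs
(import (rnrs))

;; two-expansions, m-hygienic, m-call-site: hypothetical names
(define (two-expansions)
  (let ((x 42))
    ;; hygienic: the template's x refers to the x visible here,
    ;; at the macro definition site
    (define-syntax m-hygienic
      (lambda (stx)
        (syntax-case stx ()
          ((_) #'x))))
    ;; define-macro style: 'x receives the lexical context of the
    ;; macro keyword as written at the call site
    (define-syntax m-call-site
      (lambda (stx)
        (syntax-case stx ()
          ((k) (datum->syntax #'k 'x)))))
    (let ((x 43))
      (list (m-hygienic) (m-call-site)))))

(display (two-expansions)) ; prints (42 43)
(newline)
```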
Understanding non-hygienic macros is important if you intend to work
in the larger Lisp world. In the Scheme community
everybody thinks that hygiene is an essential feature, and
all major Scheme implementations provide hygienic macros; in the rest
of the Lisp world, however, things are different.
For instance, Common Lisp does not use hygienic macros; it copes with the
variable capture problem by using gensym. The free-symbol
capture problem is not solved, but it is extremely rare, because
Common Lisp has multiple namespaces and a package system.
The hygiene problem is more serious in Lisp-1 dialects like the
newborn Arc and Clojure. Arc macros behave just like
define-macro and are fully unhygienic, whereas Clojure macros are
rather special, being nearly hygienic. In particular, Clojure
macros are not affected by the free-symbol capture problem.
The reason is that syntax-quote resolves a symbol such as list,
appearing in a macro template at macro definition time, to the fully
qualified clojure.core/list, and Clojure is smart enough to treat it as
distinct from a local variable named list at the call site (say, one
bound to the number 1). Moreover, the ordinary
capture problem can be solved with gensym or with an even cooler feature,
automatic gensyms (look at the documentation of the syntax-quote
reader macro if you want to know more). Speaking as a
non-expert, Clojure macros seem to fare pretty well with respect to
the hygiene issue.
It is worth mentioning that if you use a package system (as in
Common Lisp) or a namespace system (as in Clojure), variable capture
becomes pretty rare in practice. In Scheme, instead, which uses a
module system, hygiene is essential: if you are writing a module
containing macros which can be imported and expanded in an unknown
lexical scope, in the absence of hygiene you could introduce name clashes
that are impossible to foresee and that could be solved only by the
end user, who will likely not know how your library
works.
This is why in Scheme the macro expansion is not literally
inserted in the original code, and a lot of magic takes place to avoid
name clashes. In practice, the implementation of Scheme macros takes
care of distinguishing the introduced identifiers with some specific
mechanism (it could be based on marking the names, or on explicit
renaming). As a consequence, the mechanism of macro expansion is less
simple to explain: you cannot just cut and paste the result
of the expansion in your original code.
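The marking or renaming is observable through the standard R6RS predicates bound-identifier=? and free-identifier=?. In this sketch (compare-with-introduced-x is a hypothetical name) the caller's x and the x written in the macro template mean the same thing, so free-identifier=? returns #t, yet they are distinct for binding purposes, so bound-identifier=? returns #f:

```scheme
#!r6rs
(import (rnrs))

(define x 'top)

;; compare-with-introduced-x: hypothetical macro, for illustration only.
;; It compares the caller's identifier with the x of its own template.
(define-syntax compare-with-introduced-x
  (lambda (stx)
    (syntax-case stx ()
      ((_ id)
       (let ((bound? (bound-identifier=? #'id #'x))  ; same identity for binding?
             (free?  (free-identifier=? #'id #'x)))  ; same meaning?
         ;; build the expression (list <bool> <bool>) in the caller's context
         (datum->syntax #'id (list 'list bound? free?)))))))

(display (compare-with-introduced-x x)) ; prints (#f #t)
(newline)
```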
Personally, I have made up my mind and I am in the pro-hygiene camp
now. I must admit that for a long time I was in the opposite
camp, preferring the simplicity of define-macro over the
complexity of syntax-case. It turns out I was wrong. The major
problem of syntax-case is a cosmetic one: it looks very complex
and cumbersome to use, but that can easily be solved by providing a
nicer API, which I did with sweet-macros. Actually, I have
been able to use sweet-macros for twenty episodes without
explaining the intricacies of the hygienic expansion.
Having attended a talk on the subject at the EuroLisp Symposium,
I will mention here that there are ways to portably implement hygienic
macros on top of defmacro in Common Lisp. Therefore
there is no technical reason why hygienic macros are not widespread in
the whole Lisp world; it is just a matter of different opinions on the
importance of the problem and on the tradeoffs involved. I believe that
eventually all Lisp dialects will start using hygienic macros, but
that could take decades, because of inertia and backward-compatibility
concerns.