Sponsored Link •

Weblogs Forum
The Adventures of a Pythonista in Schemeland/7

18 replies on 2 pages. Most recent reply: Oct 20, 2008 10:17 PM by Michele Simionato

 Welcome Guest   Sign In

 Back to Topic List Reply to this Topic Search Forum Threaded View
 Previous Topic Next Topic
 Flat View: This topic has 18 replies on 2 pages [ 1 2 | » ]
 Michele Simionato Posts: 222 Nickname: micheles Registered: Jun, 2008
The Adventures of a Pythonista in Schemeland/7 (View in Weblogs)
Posted: Oct 20, 2008 10:17 PM
Summary
In this installment I pave the way to the heart of Lisp, i.e. to the famous *code is data* concept. In order to do that, I will have to introduce two fundamental data types first: symbols and lists.
 Advertisement

# Symbols

Scheme and Lisp have a particular data type which is missing in most languages (with the exception of Ruby): the symbol.

From the grammar point of view, a symbol is just a quoted identifier, i.e. a sequence of characters corresponding to a valid identifier preceded by a quote. For instance, 'a, 'b1 e 'c_ are symbols. On the set of symbols there is an equality operator eq? which is able to determine if two symbols are the same or not:

```> (define sym 'a)
> (eq? sym 'b)
#f
> (eq? sym 'a)
#t
```

#f e #t are the Boolean values False and True respectively, as you may have imagined. The equality operator is extremely efficient on symbols, since the compiler associates to every symbol an integer number (this operation is called hashing) and stores it in an interal registry (this operation is called interning): when the compiler checks the identity of two symbols it actually checks the equality of two integer numbers, which is an extremely fast operation.

You may get the number associated to a symbol with the function symbol-hash:

```> (symbol-hash sym)
117416170
> (symbol-hash 'b)
134650981
> (symbol-hash 'a)
117416170
```

It is always possible to convert a string into a symbol and viceversa thanks to the functions string->symbol and symbol->string, however conceptually - and also practically - symbols in Scheme are completely different from strings.

The situation is not really different in Python. It is true that symbols do not exist as a primitive data type, however strings corresponding to names of Python objects are actually treated as symbols. You can infer this from the documentation about the builtin functions hash e intern, which says: normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys. BTW, if you want to know exactly how string comparison works in Python I suggest you to look at this post:

Scheme has much more valid identifiers than Python or C, where the valid characters are restricted to a-zA-Z-0-9_ (I am ignoring the possibility of having Unicode characters in identifiers, which is possible both in R6RS Scheme and Python 3.0). By convention, symbols ending by ? are associated to boolean values or to boolean-valued functions, whereas symbols ending by ! are associated to functions or macros with side effects.

The function eq?, is polymorphic and works on any kind of object, but it may surprise you sometimes:

```> (eq? "pippo" "pippo")
#f
```

The reason is that eq? (corrisponding to is in Python) checks if two objects are the same object at the pointer level, but it does not check the content. Actually, Python works the same. It is only by accident than "pippo" is "pippo" returns True on my machine, since the CPython implementation manages differently "short" strings from "long" strings:

```>>> "a"*10 is "a"*10 # a short string
True
>>> "a"*100 is "a"*100  # a long string
False
```

If you want to check if two objects have the same content you should use the function equal?, corresponding to == in Python:

```> (equal? "pippo" "pippo")
#t
```

It you know the type of the objects you can use more efficient equality operators; for instance for strings you can use string=? and for integer numbers =:

```> (string=? "pippo" "pippo")
#t

> (= 42 42)
#t
```

To be entirely accurate, in addition to eq and equal, Scheme also provides a third equality operator eqv?. eqv? looks to me like an useless complication of the language, so I will not discuss it. If you are interested, you can read what the R6RS document says about it.

# Lists

The original meaning of LISP was List Processing, since lists were the fundamental data type of the language. Nowadays Lisp implements all possible data types, but still lists have a somewhat privileged position, since lists can describe code. A Lisp/Scheme list is a recursive data type such that:

1. the list is empty: '()
2. the list is the composition of an element and a list via the cons operation (cons stands for constructor): (cons elem lst).

For instance, one-element lists are obtained as composition of an element with the empty list:

```> (cons 'x1 '()); one-element list
(x1)
```

Two-elements lists are obtained by composing an element with a one-element list:

```> (cons 'x1 (cons 'x2 '())); two-element list
(x1 x2)
```

That generalizes to N-element lists:

```> (cons 'x1 (cons 'x2 (cons 'x3 ..... (cons 'xN '()))) ...)
(x1 x2 x3 ... xN)
```

For simplicity, the language provides an N-ary list constructor list

```> (list x1 x2 x3 ... xN)
(x1 x2 x3 ... xN)
```

but the expression (list x1 ... xN) is nothing else than syntactic sugar for the fully explicit expression in terms of constructors.

There are also the so-called improper lists, i.e. the ones where the second argument of the cons is not a list. In that case the representation of the list displayed at the REPL contains a dot:

```> (cons 'x1 'x2) ; improper list
(x1 . x2)
```

It is important to remember that improper lists are not lists, therefore operations like map, filter and similar do not work on them.

As we anticipated in episode 4, the first element of a list (proper or improper) can be extracted with the function car; the tail of the list instead can be extracted with the function cdr. If the list is proper, its cdr is proper:

```> (cdr (cons 'x1 'x2))
x2

> (cdr (cons 'x1 (cons 'x2 '())))
(x2)
```

At low level Scheme lists are implemented as linked list, i.e. as couples (pointer-to-sublist, value) until you arrive at the null pointer.

# Some example

To give an example of how to build Scheme lists, here I show you how you could define a range function analogous to Python range. Here are the specs:

```> (range 5); one-argument syntax
(0 1 2 3 4)
> (range 1 5); two-argument syntax
(1 2 3 4)
> (range 1 10 2); three-argument syntax
(1 3 5 7 9)
> (range 10 0 -2); negative step
(10 8 6 4 2)
> (range 5 1); meaningless range
()
```

Here is the implementation:

```(define range
(case-lambda
((n); one-argument syntax
(range 0 n 1))
((n0 n); two-argument syntax
(range n0 n 1))
((n0 n s); three-argument syntax
(assert (and (for-all number? (list n0 n s)) (not (zero? s))))
(let ((cmp (if (positive? s) >= <=)))
(let loop ((i n0) (acc '()))
(if (cmp i n) (reverse acc)
(loop (+ i s) (cons i acc))))))))
```

Here case-lambda is a syntactic form that allows to define functions with different behavior according to the number of arguments. for-all is an R6RS higher order function: (for-all pred lst) applies the predicate pred to the elements of list lst, until a false value is found - in that case it returns #f - otherwise it returns #t. Here the assertion checks at runtime that all the passed arguments n0, n and s are numbers, with s non-zero.

The first let defines a comparison function cmp which is >= if the step s is positive, or <= if the step s is negative. The reason is clear: if s is positive at some point the index i will get greater than n, whereas if s is negative at some point i will get smaller than n.

The trick used in the loop is extremely common: instead of modifying a pre-existing list, at each iteration a new list is created ex-novo by adding an element (cons i acc); at the end the accumulator is reversed (reverse acc). The same happens for the counter i which is never modified.

This is an example of functional loop; imperative loops based on mutation are considered bad style in Scheme and other functional languages. Notice by contrast that in Common Lisp imperative loops are perfectly acceptable and actually there is a very advanced LOOP macro allowing you to do everything with imperative loops.

Actually, the range function defined here is more powerful than Python's range, since it also works with floating point numbers:

```> (range 1.3 2.5 .25)
(1.3 1.55 1.8 2.05 2.3)
```

As always, the only way to really get Scheme lists is to use them. I suggest you try the following exercises:

1. implement an equivalent of Python enumerate for Scheme lists;
2. implement an equivalent of Python zip for Scheme lists.

I will show the solutions in the next episode. If you are lazy and you want an already written list library, I suggest you to give a look at the SRFI-1 library, which is very rich and available practically in all Scheme implementations. Many of the SRFI-1 features are built-in in the R6RS standard, but many other are still available only in the SRFI-1 .

 James Watson Posts: 2024 Nickname: watson Registered: Sep, 2005
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 21, 2008 2:18 PM
Whenever I've read articles about LISP, there's often a focus on the idea that LISP doesn't make much distinction between code and data. This is generally touted as a really awesome feature but that always leaves me a little puzzled.

Back when I was getting my degree my favorite CS professor hammered it into our heads that as far as the machine was concerned, there was no distinction between instructions and data. Instructions are just data that is interpreted by the machine. So that's the first thing that puzzles me: why is this treated as if it's something invented by LISP? It's a fundamental feature of the computer.

So given that this is not really special, why is it that most languages don't work this way? The answer, as far as I can tell, is that people don't want them to. It's a powerful feature, no doubt, but it's also a really dangerous one. The distinction between code and data is deliberate. On a side note, it's also a core concern of computer security specialists.

I recently had a colleague mention to me that he was working on a system I had helped implement. He mentioned that he wanted to get rid of a bunch of stuff that he didn't think was useful. I agreed that it was an option but that the stuff in question had been added because it addressed a major, inherent problem that had come up in the past. I can't get past the feeling that the 'code is data' aspect of LISP is a similar situation. Maybe someone can help me understand.

 Michele Simionato Posts: 222 Nickname: micheles Registered: Jun, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 21, 2008 10:07 PM
> Back when I was getting my degree my favorite CS professor
> hammered it into our heads that as far as the machine was
> concerned, there was no distinction between instructions
> and data. Instructions are just data that is interpreted
> by the machine. So that's the first thing that puzzles
> me: why is this treated as if it's something invented by
> LISP? It's a fundamental feature of the computer.

Right, but only in LISP code is naturally represented as nested lists and only in LISP you have powerful built in facilities to manage lists. Scheme adds pattern matching as the natural way of defining macros, but you will see all of that in the next episodes.

> So given that this is not really special, why is it that
> most languages don't work this way? The answer, as far as
> I can tell, is that people don't want them to. It's a
> powerful feature, no doubt, but it's also a really
> dangerous one. The distinction between code and data is
> deliberate.

I agree. There may be various reasons for that, more or less valid (people do not stand parenthesis, you can actually do a lot even in languages without the code==data concept, debugging a program without macros is simpler etc.)

> On a side note, it's also a core concern of
> computer security specialists.

I do not follow you. Are you referring to the security risk
of evaluating untrusted code? But macros affect the user program, not code of unknown origin.

> I recently had a colleague mention to me that he was
> working on a system I had helped implement. He mentioned
> that he wanted to get rid of a bunch of stuff that he
> didn't think was useful. I agreed that it was an option
> but that the stuff in question had been added because it
> addressed a major, inherent problem that had come up in
> the past. I can't get past the feeling that the 'code is
> data' aspect of LISP is a similar situation. Maybe
> someone can help me understand.

Not clear what you mean. Anyway, if your language does not support the code==data concept, you have to reinvent it somehow. For instance, consider GUI builders, where you write an XML representation of our widgets (remember, XML is nothing else than a verbose implementation of s-expressions) and there is a framework which is able to transform your XML representation in your hosting languages classes. Here the framework is converting data (the XML) in code. In Lisp this functionality is built-in and you do not need a big framework and another language to learn.

This in theory: in practice it may be easier to use a GUI builder written by others in a mainstream language than to write your own in LISP, but this it another question, the well know old issue of libraries.

 Aaron Maenpaa Posts: 4 Nickname: zacherates Registered: Nov, 2006
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 22, 2008 6:11 AM
> > On a side note, it's also a core concern of
> > computer security specialists.
>
> I do not follow you. Are you referring to the security
> risk
> of evaluating untrusted code? But macros affect the user
> program, not code of unknown origin.

He's referring to the fact that the machine's unwillingness to distinguish between code and data is the enabling factor behind buffer overflow exploits. C and the underlying machine are quite willing to let you write past the end of an input buffer, overwrite the return address and jump into the middle of the stack and run the exploit payload.

If, on the other hand, the machine distinguished between code and data, it would complain when you overwrote the return address or at least when you jumped into the middle of what, in theory, is supposed to be a string.

 James Watson Posts: 2024 Nickname: watson Registered: Sep, 2005
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 22, 2008 6:45 AM
> Not clear what you mean. Anyway, if your language does not
> support the code==data concept, you have to reinvent it
> somehow. For instance, consider GUI builders, where you
> write an XML representation of our widgets (remember, XML
> is nothing else than a verbose implementation of
> s-expressions) and there is a framework which is able to
> transform your XML representation in your hosting
> languages classes. Here the framework is converting data
> (the XML) in code. In Lisp this functionality is built-in
> and you do not need a big framework and another language
> to learn.

I could quibble over 'reinvent' but yes, sort of. But there is a distinction. These kinds of facilities are targeted. They only do a small set of things. They are not fully functional languages. They are restricted (ideally) to operations that are known to be safe. Of course there is a trade-off that extra effort must be put into doing these things as opposed to what is required in LISP.

The key point is that it seems to me that a lot of LISP people assume that this is just a oversight or error on the language designers part. There isn't any consideration that perhaps there was a specific reason for not supporting it. Just like my colleague never considered that the 'unnecessary' stuff that was in our design was there for a specific an important reason.

 James Watson Posts: 2024 Nickname: watson Registered: Sep, 2005
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 22, 2008 7:06 AM
> > > On a side note, it's also a core concern of
> > > computer security specialists.
> >
> > I do not follow you. Are you referring to the security
> > risk
> > of evaluating untrusted code? But macros affect the
> user
> > program, not code of unknown origin.
>
> He's referring to the fact that the machine's
> unwillingness to distinguish between code and data is the
> enabling factor behind buffer overflow exploits. C and the
> underlying machine are quite willing to let you write past
> the end of an input buffer, overwrite the return address
> and jump into the middle of the stack and run the exploit
> payload.

That's one manifestation of the issue at hand but it's a more general idea. All professional developers that work with SQL in code (should) know that you should almost always use prepared statements against the database. There's a performance benefit but more importantly, it's much safer.

I was in a discussion not that long ago about using XML in logs and someone pointed out that XML is basically a crappy reinvention of LISP (which I agree with if you are taking about XML used for writing code) and that writing logs in LISP was a really effective approach because you could execute the logs.

I have to admit something like that is really sexy and cool but it also strikes me as extremely dangerous. Maybe there's a reason that prevents bad things from happening, but it seems to me that this kind of thing has the same fundamental flaw that allows for SQL injection. You are executing code that was written based on the actions of someone (or something) that you don't control. Maybe an attacker knows enough about the system to force it to write certain log statements that compromise the system or perhaps there are corner cases that cause problematic code to be written to the logs. I don't think humans can effective traverse all the possibilities and when you have an open ended system, you are playing with fire.

AS I understand it, macros can write macros and I don't see anything that prevents them from acting like viruses. I may be making a big deal out of nothing. It's just something that I think about when I read about LISP. A more mundane concern is that these features would make code unmaintainable. Maybe not in a one person project but on team projects, I don't see how much use of these features could be sustained.

 Michele Simionato Posts: 222 Nickname: micheles Registered: Jun, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 22, 2008 7:27 AM
> AS I understand it, macros can write macros and I don't
> see anything that prevents them from acting like viruses.

I see a bit of paranoia here. I do not think this is an issue.

> I may be making a big deal out of nothing. It's just
> t something that I think about when I read about LISP. A
> more mundane concern is that these features would make
> code unmaintainable. Maybe not in a one person project
> but on team projects, I don't see how much use of these
> features could be sustained.

You have a point here. Actually I am not an advocate of macros exactly for that reason. Macros are very cool for a language implementor, less so for an application programmer.
Notice that the goal of my *Adventures* is to spread cool ideas, but "cool" does not necessarily means "good".
Anyway, episode #12 will be entirely devoted to discussing these issues. You have to wait a bit, because first I have to explain macros, and that will take episodes #9,#10 and #11 (I will explain just the easiest things). Episode #8
will discuss the code==data equation.

 James Watson Posts: 2024 Nickname: watson Registered: Sep, 2005
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 22, 2008 7:55 AM
> > AS I understand it, macros can write macros and I don't
> > see anything that prevents them from acting like
> viruses.
>
> I see a bit of paranoia here. I do not think this is an
> issue.

But the more general issue of executing things like logs. Do you not see a risk? My understanding from previous discussions was that there was a known risk and that there are approaches used to mitigate them. If true, that tells me that the risk is real.

 Michele Simionato Posts: 222 Nickname: micheles Registered: Jun, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 23, 2008 4:31 AM
> But the more general issue of executing things like logs.
> Do you not see a risk? My understanding from previous
> s discussions was that there was a known risk and that
> there are approaches used to mitigate them. If true, that
> tells me that the risk is real.

Of course executing untrusted code is risky, but this is not an argument against macros. BTW executing logs is not such a common technique (I never heard of it); a typical technique is to execute configuration files, but this is fine since they should be edited by trusted users (programmers, sysadmins) and you can always restrict the environment your code is evaluated in. There is an example in the next episode.

 James Watson Posts: 2024 Nickname: watson Registered: Sep, 2005
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 23, 2008 6:31 AM
> Of course executing untrusted code is risky, but this is
> not an argument against macros. BTW executing logs is not
> such a common technique (I never heard of it); a typical
> technique is to execute configuration files, but this is
> fine since they should be edited by trusted users
> (programmers, sysadmins) and you can always restrict the
> environment your code is evaluated in. There is an example
> in the next episode.

I didn't realize this article was about macros. The word 'macro' only appears twice and in what read to me as side-bars. It seemed to me to be about muddying the separation between code and data. As I mentioned before, this is something that concerns security specialists. You don't have to take my word for it, it's a well known principle of computer science. Here's a couple links that came up in Google:

http://securology.blogspot.com/2007/09/separation-of-code-and-data.html

http://portal.acm.org/citation.cfm?id=1180405.1180448

It doesn't please me to point this out. I'd much rather someone tell me that Lisp makes it possible to blur the separation safely (and have it be true, of course.)

 Guglielmo Nigri Posts: 1 Nickname: gnit Registered: Oct, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 24, 2008 11:16 AM
Very interesting. I'd like to quote Rob Pike (see http://www.lysator.liu.se/c/pikestyle.html) -- it might be a bit out-of-context, but I think it adds to the discussion.

"Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored­program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works?"

Security (i.e. not trusting one's own programs) was not of much of a concern in Turing and von Neumann's times, of course :-)

 Bill Pyne Posts: 165 Nickname: billpyne Registered: Jan, 2007
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Oct 24, 2008 1:26 PM
> So given that this is not really special, why is it that
> most languages don't work this way? The answer, as far as
> I can tell, is that people don't want them to. It's a
> powerful feature, no doubt, but it's also a really
> dangerous one. The distinction between code and data is
> deliberate. On a side note, it's also a core concern of
> computer security specialists.

I'm certainly not a historian but I think the reason more languages don't support it is more due to an historical division. Lisp has been around quite a while but wasn't efficient for a long time. (Techniques like tail call recursion didn't make their way in until Scheme as far as I know.) I think the division is more a consequence of languages modeling pushes of "data" and "instructions" into certain registers in order to be maximally efficient.

 Grant Rettke Posts: 23 Nickname: grettke Registered: Nov, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Nov 2, 2008 7:59 AM
XML is data is code, code is data is XML and people don't seem to be worried :)

 Grant Rettke Posts: 23 Nickname: grettke Registered: Nov, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Nov 2, 2008 8:03 AM
> I was in a discussion not that long ago about using XML in
> logs and someone pointed out that XML is basically a
> crappy reinvention of LISP (which I agree with if you are
> taking about XML used for writing code) and that writing
> logs in LISP was a really effective approach because you
> could execute the logs.

Why would you execute log files?! :)

> AS I understand it, macros can write macros and I don't
> see anything that prevents them from acting like viruses.
> I may be making a big deal out of nothing.

You are :).

> It's just
> t something that I think about when I read about LISP. A
> more mundane concern is that these features would make
> code unmaintainable. Maybe not in a one person project
> but on team projects, I don't see how much use of these
> features could be sustained.

If you took the deep-dive and learned about how they work, you would find that these nightmare scenarios aren't very likely to occur.

Also keep in mind, I've seen nightmare scenarios occur in plenty of language *without* macros, and I'm sure you have, too! :)

 Grant Rettke Posts: 23 Nickname: grettke Registered: Nov, 2008
Re: The Adventures of a Pythonista in Schemeland/7 Posted: Nov 2, 2008 8:05 AM
> Macros are very cool for a
> language implementor, less so for an application
> programmer.

What is the difference between an implementer and an application programmer?

What if your language is patterns? What difference does it make if it is macros? :)

 Flat View: This topic has 18 replies on 2 pages [ 1  2 | » ]
 Previous Topic Next Topic

 Sponsored Links

 Web Artima.com

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use