This post originated from an RSS feed registered with Python Buzz
by Andrew Dalke.
Original Post: GardenSnake language
Feed Title: Andrew Dalke's writings
Feed URL: http://www.dalkescientific.com/writings/diary/diary-rss.xml
Feed Description: Writings from the software side of bioinformatics and chemical informatics, with a heaping of Python thrown in for good measure.
I experimented with PyParsing but I couldn't figure out how to use it
to parse an indentation-based language like Python. I gave up and
tried PLY, which has an API
very much like the SPARK library. After using it for a while now I
prefer PLY over SPARK. Its error messages are better and its
documentation was exactly right for me.
I looked around but could find no examples of how to use a lex/parser
pair to parse an indentation-based language. Python uses its own
specialized tokenizer and parser designed for Python and it didn't
look easy to port to another parsing system. With some work I figured
it out.
I ended up writing a filter (or rather three filters) between the PLY
tokenizer and its parser. The PLY lexer sees all newlines and whitespace but
knows to ignore them when inside of (parens) and to return only
leading whitespace. My filters watch the tokenizer output stream and
tweak a flag so the tokenizer can filter out non-leading
whitespace.
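A minimal sketch of the whitespace-filtering idea, written as a standalone generator rather than against PLY itself. The Token tuple, the token type names (WS, NEWLINE, LPAR, RPAR) and the function name are assumptions made for illustration; they are not taken from the GardenSnake source.

```python
from collections import namedtuple

# Hypothetical token shape; a real lexer would supply its own token objects.
Token = namedtuple("Token", "type value")

def keep_leading_whitespace(tokens):
    """Drop newlines/whitespace inside parens and any non-leading whitespace."""
    paren_depth = 0
    at_line_start = True
    for tok in tokens:
        if tok.type == "LPAR":
            paren_depth += 1
        elif tok.type == "RPAR":
            paren_depth -= 1
        # Inside (parens) neither newlines nor whitespace are significant.
        if tok.type in ("WS", "NEWLINE") and paren_depth > 0:
            continue
        # Outside parens only whitespace at the start of a line matters.
        if tok.type == "WS" and not at_line_start:
            continue
        yield tok
        # The next token starts a line only if this one was a NEWLINE.
        at_line_start = tok.type == "NEWLINE"
```

Fed the tokens for "x = (1,\n   2)\n  y", this passes through only the final NEWLINE and the whitespace that leads the last line; everything else that is whitespace gets dropped.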
What took the longest time was figuring out that there are three
possible indentation states in Python: INDENT not allowed, INDENT may
occur, INDENT required. I only had the first and last and without the
middle one I couldn't come up with a set of conditions to make it
work.
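The three states can be sketched as a small state machine over token types. This is only an illustration of the idea, assuming token names such as COLON, NEWLINE and WS; the function name and the exact transitions are guesses at one workable set of conditions, not a copy of the GardenSnake filters.

```python
NO_INDENT, MAY_INDENT, MUST_INDENT = range(3)

def must_indent_flags(token_types):
    """For each token type, report whether it has to start an indented block."""
    state = NO_INDENT
    flags = []
    for tok_type in token_types:
        if tok_type == "COLON":
            # A block follows: either on this line or indented on the next.
            flags.append(False)
            state = MAY_INDENT
        elif tok_type == "NEWLINE":
            # COLON followed by NEWLINE means the body must be indented.
            flags.append(False)
            if state == MAY_INDENT:
                state = MUST_INDENT
        elif tok_type == "WS":
            # Leading whitespace does not change the state.
            flags.append(False)
        else:
            # A real token: it must be indented only after COLON NEWLINE.
            flags.append(state == MUST_INDENT)
            state = NO_INDENT
    return flags
```

With this, the one-line form "if a: return 1" never demands an INDENT, while "if a:" followed by a newline marks the first real token of the next line as one that must be indented. The middle state is what lets the two forms be told apart.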
I got the tokenizer mostly working using a trivial language. Python's
grammar is a bit more complicated so I decided to implement a subset
of Python which captures most of the indentation cases. What I
eventually did was use the parser rules to create a Python AST for
this new language. Let Python be my back-end. Doing that found
several flaws in my logic, which I hope are all now fixed.
I used Python's woefully underdocumented "compiler" module for this.
I know it just well enough to use it but not enough to help improve
the documentation. There are parts of it which I just do because that's
what other code does. (E.g., do I have to do syntax.check(tree)?)
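The "compiler" module was removed in Python 3, so the sketch below uses the standard "ast" module instead to show the same idea of building a Python syntax tree by hand and letting Python compile and run it. It is a swapped-in modern equivalent, not the code from GardenSnake; the function name and the "<gardensnake>" filename are made up for the example.

```python
import ast

def compile_assignment(name, left, right):
    """Build the tree for `name = left + right` by hand and compile it."""
    tree = ast.Module(
        body=[
            ast.Assign(
                targets=[ast.Name(id=name, ctx=ast.Store())],
                value=ast.BinOp(
                    left=ast.Constant(value=left),
                    op=ast.Add(),
                    right=ast.Constant(value=right),
                ),
            )
        ],
        type_ignores=[],
    )
    # Hand-built nodes carry no line numbers; fill them in before compiling.
    ast.fix_missing_locations(tree)
    return compile(tree, "<gardensnake>", "exec")

namespace = {}
exec(compile_assignment("t", 4, 1), namespace)
print(namespace["t"])
```

A parser rule for an assignment statement would return a node like the ast.Assign above, and the top-level rule would wrap the collected statements in a module node.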
I decided to call the new language GardenSnake. It's a small snake
you can play with. Here's the
GardenSnake code with tokenizer, filters, parser, code generator
and demo all in a single 695 line file.
Here are some bullet points about GardenSnake, from the comments at the
top of the file:
- only 'def', 'return' and 'if' statements
- 'if' only has 'then' clause (no elif nor else)
- single-quoted strings only, content in raw format and encoded as "swapcase"
- numbers are decimal.Decimal instances (not integers or floats)
- no print statement; use the built-in 'print' function
- only < > == + - / * implemented (and unary + -)
- assignment and tuple assignment work
- no generators of any sort
- no ... well, no quite a lot
It wouldn't be hard to implement most of these. It's mostly a matter
of time and figuring out how compiler.ast is supposed to work. (Hint:
use compiler.parse to create a correct parse tree for comparison.)
But I'm satisfied with what I've done and don't plan to touch this code
again.
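The comparison trick from the hint carries over to the "ast" module that replaced "compiler" in Python 3: parse the equivalent Python source and dump the tree to see what a correct one looks like. This is a sketch of that workflow with hypothetical helper names; the exact text of the dump varies between Python versions, so none is shown here.

```python
import ast

def reference_dump(source):
    """Return a readable dump of the tree Python itself builds for `source`."""
    return ast.dump(ast.parse(source))

def same_tree(source_a, source_b):
    """True when both sources parse to structurally identical trees."""
    return reference_dump(source_a) == reference_dump(source_b)

# Print the reference tree to compare against a hand-built one.
print(reference_dump("a = x(1) + 2"))
```

Because the dump leaves out line and column information by default, differences in spacing or redundant parentheses do not show up, which makes it a convenient target to compare a generated tree against.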
Here's the demo program at the end of the file:
print('LET\'S TRY THIS \\OUT')
#Comment here
def x(a):
    print('called with',a)
    if a == 1:
        return 2
    if a*2 > 10: return 999 / 4
    # Another comment here
    return a+2*3
ints = (1, 2,
        3, 4,
        5)
print('mutiline-expression', ints)
t = 4+1/3*2+6*(9-5+1)
print('predence test; should be 34+2/3:', t, t==(34+2/3))
print('numbers', 1,2,3,4,5)
if 1:
    8
    a=9
    print(x(a))
print(x(1))
print(x(2))
print(x(8),'3')
print('this is decimal', 1/5)
print('BIG DECIMAL', 1.234567891234567e12345)
With the runtime support for 'print', the output is:
--> let's try this \out
--> MUTILINE-EXPRESSION (Decimal("1"), Decimal("2"), Decimal("3"), Decimal("4"), Decimal("5"))
--> PREDENCE TEST; SHOULD BE 34+2/3: 34.66666666666666666666666667 True
--> NUMBERS 1 2 3 4 5
--> CALLED WITH 9
--> 249.75
--> CALLED WITH 1
--> 2
--> CALLED WITH 2
--> 8
--> CALLED WITH 8
--> 249.75 3
--> THIS IS DECIMAL 0.2
--> big decimal 1.234567891234567E+12345
Done
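The numeric results above follow from every number being a decimal.Decimal, and the upper/lower case flips from the "swapcase" string encoding. The sketch below reproduces a few of those values in ordinary Python, assuming the default decimal context (28 significant digits); gs_print is a hypothetical stand-in for the runtime 'print' support, not the function from the GardenSnake file.

```python
from decimal import Decimal

def gs_print(*args):
    """Hypothetical runtime 'print': prefix the line and join the arguments."""
    line = " ".join(str(arg) for arg in args)
    print("-->", line)
    return line

# 4+1/3*2+6*(9-5+1) with every literal turned into a Decimal.
t = (Decimal(4) + Decimal(1) / Decimal(3) * Decimal(2)
     + Decimal(6) * (Decimal(9) - Decimal(5) + Decimal(1)))
same = t == Decimal(34) + Decimal(2) / Decimal(3)

# String literals are stored swapcased, so lower-case source prints upper-case.
gs_print("this is decimal".swapcase(), Decimal(1) / Decimal(5))
gs_print("precedence".swapcase(), t, same)
gs_print("BIG DECIMAL".swapcase(), Decimal("1.234567891234567e12345"))
```

Using Decimal is why 1/5 prints as exactly 0.2 and why a literal as large as 1.234567891234567e12345 survives, where a float would overflow.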