Agile Buzz Forum - One syntax to rule them all?

There's a trend in our community to remap other people's ideas into our own world. This can be a really good thing.. but sometimes we take it too far. Sometimes we remap their concepts to our own concepts. This can have dire consequences that have no obvious impact on the existing community.

Scenario: Say Smalltalk has a Logging library which has been around for a while and is used by several applications.. then we decide we really want to have an interface to syslog, which can do more than our existing Logging library. The itching obsession is to make a syslog interface that plugs into our existing Logging library. This seems like it might be a good idea - so long as the two models match. Conceptually, they'd both be about logging, so why wouldn't they match? Well, the people who made the Logging library 10 years ago didn't really consider what we might want to log in the early years of the 21st century, how rude of them! :) I digress, this post is not about whether developers are able to model abstractly enough to be future proof.

So let's say it's possible to make them match closely enough that, for all practical purposes, it doesn't matter. We now come to the main point of this post: we've created a problem we never knew we had... anyone who comes into our world from the outside world does not know our Logging framework, but they may know syslog. Or perhaps they're in our world and want to learn about logging.. they could learn our Logging framework, but not be able to take what they've learnt into another world (i.e. another language).

Let's imagine our Logging framework was wonderfully abstract - is our Logging framework idea so great that everyone should use it? .. or is it merely different, with perhaps slight improvements over the common APIs available elsewhere? That debate is really not the problem; the problem is that we've distinguished ourselves, for good or for bad, from the rest of the developers out there not using Smalltalk. This is not a decision to be made lightly!

The more we do this, the higher the barrier to entry we create for our language. The benefits of Smalltalk should stand on their own two feet when compared with other languages in a one-on-one battle. So in the above example, the correct thing to do is to create a straight API to syslog, one that matches the C world one-to-one.. then create the interface from our existing Logging framework to the syslog interface. This brings us the best of both worlds.. if there are a dozen or so logging frameworks we want to interface with - we can - and if we have a "Leaky" abstraction (i.e. the concepts don't entirely match 1:1) then the developer can drop down to the low level library and bypass the "Logging" framework we provide, which is the abstraction over the different logging frameworks.
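
As a rough sketch of that layering (in Python rather than Smalltalk, purely for illustration): the low level class below mirrors the C function names openlog, syslog and closelog one-to-one, and a small adapter maps a framework's own levels onto it. SyslogBinding, SyslogAppender and the level names are all made up for this example, and the binding only records calls in memory where a real one would call into libc.

```python
from dataclasses import dataclass, field

# Priority constants named after the C <syslog.h> macros, with their usual values.
LOG_ERR = 3
LOG_WARNING = 4
LOG_INFO = 6


@dataclass
class SyslogBinding:
    """Hypothetical thin binding: one method per C function, same names.

    This stand-in records calls in memory instead of calling libc.
    """
    ident: str = ""
    records: list = field(default_factory=list)

    def openlog(self, ident, option=0, facility=0):
        self.ident = ident

    def syslog(self, priority, message):
        self.records.append((priority, f"{self.ident}: {message}"))

    def closelog(self):
        self.ident = ""


class SyslogAppender:
    """Hypothetical adapter from an in-house Logging framework to the binding."""
    LEVELS = {"error": LOG_ERR, "warn": LOG_WARNING, "info": LOG_INFO}

    def __init__(self, binding):
        self.binding = binding

    def log(self, level, message):
        self.binding.syslog(self.LEVELS[level], message)


binding = SyslogBinding()
binding.openlog("myapp")
appender = SyslogAppender(binding)
appender.log("warn", "disk almost full")  # going through the abstraction
binding.syslog(LOG_ERR, "disk failed")    # dropping down to the 1:1 layer
```

Someone who already knows syslog from C can read the bottom layer without learning anything new, and the adapter is the only place where the two models have to be reconciled.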

This has never been done that well in Smalltalk. Possibly the best example of it is the VisualAge Smalltalk graphics library, which gave you classes for each platform, then on top of that provided a common widget set, then on top of that provided a high level widget set. However, it was still in the business of hiding the nasty details, so if you really wanted to call a Windows create-window type function to switch on RGBA color spaces.. well, you're back to figuring out the low level interface yourself. I digress again.

So why am I inspired to write this post about concept-compatibility.. and why did I title it "one syntax to rule them all"? The reason is recent project developments: OMeta, Mirror-Image and Newspeak.

I'd like to start with Newspeak. I really like what they're doing. They have a technique for building parsers that lets them explore new syntax ideas really easily. Internally, they have 3 versions of their language's syntax (I believe), which is really really cool. However, the language parser itself is written in the Newspeak language, not a language built for describing parser rules. Why is that? Well, Gilad eloquently argues that the Newspeak syntax is "pretty darned close" anyway, so why have another syntax?
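
To give a feel for what "writing the grammar in the host language" looks like, here is a minimal parser-combinator sketch in Python. It is not Newspeak's actual parsing library; token, seq, alt, many and parse_unary are names invented for this illustration, and the toy grammar only covers unary message sends.

```python
import re


def token(pattern):
    """Parser matching a regex at the current position, skipping leading spaces."""
    rx = re.compile(r"\s*(" + pattern + ")")

    def parse(text, pos):
        m = rx.match(text, pos)
        return (m.group(1), m.end()) if m else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers):
    """Ordered choice: the first parser that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser):
    """Zero or more repetitions; the inner parser must consume input."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


# The "grammar" is ordinary host-language code.
identifier = token(r"[A-Za-z_]\w*")
number = token(r"\d+")
receiver = alt(number, identifier)
unary_send = seq(receiver, many(identifier))


def parse_unary(source):
    """Parse e.g. '3 factorial printString' into (receiver, [selectors])."""
    result = unary_send(source, 0)
    if result is None or result[1] != len(source.rstrip()):
        raise ValueError("not a unary send: " + source)
    (recv, selectors), _ = result
    return recv, selectors
```

The rules read tolerably well, which is the "pretty darned close" argument, but they are still function calls wearing a grammar costume rather than a notation designed for grammars.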

But then again, with that argument you could say that you don't need any other syntaxes at all and we've now discovered "the one" syntax that does everything.. ya know.. sort of like Lisp. And if that were true, we really wouldn't need such a flexible parser framework anyway, right?

Well we know better, it's not true, there is no one magic syntax that does everything. Let's take some actual examples:

SQL: functional syntax for accessing a database, no variables, technically a "functional" programming language like ML, XSLT, Haskell, etc.
RDF/N3: knowledge syntax for describing information
Prolog: knowledge syntax for describing logic

There are a lot more syntaxes out there I'd like to talk about but let's stick with the basics. Each of these languages excels at describing something in a succinct yet readable manner. Many of these languages include syntax shortcuts to describe graphs, trees, relationships and other interesting concepts which you don't use in your day to day Object-Oriented programming.

Each of these syntaxes does rock in its particular domain and to squish them in to another syntax feels awkward and weird. Take this line of CSS for example:

body .List #selected {}

In just a few characters we have described a functional matching rule across three different kinds of attributes (an element name, a class and an id) - and then we've described an enclosure of information which only takes effect when the match is satisfied.
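
For contrast, here is roughly what that one selector turns into when squished into a general-purpose language. This is a hand-written Python sketch; Node, ancestors and matches are hypothetical names for this example, not any real DOM API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical minimal element: tag name, classes, id and parent link."""
    tag: str
    classes: tuple = ()
    id: Optional[str] = None
    parent: Optional["Node"] = None


def ancestors(node):
    """Yield the parent, grandparent and so on up to the root."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def matches(node):
    """Hand-written equivalent of the selector `body .List #selected`."""
    if node.id != "selected":
        return False
    # Some ancestor must carry the class List ...
    for list_node in ancestors(node):
        if "List" in list_node.classes:
            # ... and that ancestor must itself sit somewhere inside a body.
            if any(a.tag == "body" for a in ancestors(list_node)):
                return True
    return False
```

A dozen lines of loops and conditionals to say what the selector says in three words.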

Which brings me to my next example - Mirror-Image by Antony Blakey and GNU-Smalltalk 3.0's syntax. Both of these projects are trying to define a more developer-friendly syntax for writing Smalltalk code. Both are describing a scripting language that can be interpreted to represent code. In my opinion, this is an interesting idea but possibly the wrong one.

If you look at the C language, it does not define an executable definition of code - it defines a compilation unit of code. This is in stark contrast to Perl, Python, Ruby, SH(ell) which all provide syntaxes that immediately run - statements that create structures are incidental.

Sometimes.. *sometimes*.. you have information, not a program. In the case of C, we have information that is to be parsed and interpreted. N3 is the same deal, it's just information that needs something 'greater' to interpret its meaning. When we start talking about Smalltalk scripting versus Smalltalk source code.. we hit an interesting dilemma.. to write a Smalltalk method in Smalltalk requires wacky escaping rules. Antony has detailed those rules, and his efforts to circumvent them, on his blog. He could have saved himself the trouble by describing a compilation unit syntax, not an executable syntax.

This isn't to say what Antony is doing is wrong, merely that the path he took once again demonstrates the importance of syntax and how having one syntax to rule them all is a really really bad idea.

So, to come back to my original discussion of the syslog library, where we mistakenly mapped it to the Logging library and therefore made ourselves less approachable to outsiders by shunning their way of doing things.. so too do we do the same with syntax.

One of the reasons OMeta and Newspeak are so important to the world of Smalltalk is because they advocate opening up our syntax. When we want to describe message sends.. the syntax we already have is pretty darned good!.. but when we want to talk to C? not so good. When we want to describe information? not so good. When we want to describe a functional definition? not so good.

With techniques like OMeta, using PEGs.. or Newspeak.. suddenly we can get the best of both worlds. Why not write SQL inline with your Smalltalk code? .. the .NET guys do it with LINQ, for goodness' sake - and that's a strongly typed language. How crazy is that? And why not have a separate syntax for describing information? In JavaScript you can make a new randomly-shaped object just by writing {} -- (which in my mind is how we describe how good our [] syntax is for making closures!).
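
A small Python sketch of the gap being described, assuming a made-up people table: the first query is real SQL, but it lives in a string the host language knows nothing about until it runs; the second is the same question asked in the host language's own query-like syntax, which is the direction LINQ takes.

```python
import sqlite3

# Hypothetical sample data for the illustration.
people = [("Ada", 36), ("Bob", 17), ("Cy", 52)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?)", people)

# SQL embedded as a string: opaque to the host language until it is executed.
adults_sql = [row[0] for row in conn.execute(
    "SELECT name FROM people WHERE age >= ? ORDER BY name", (18,))]

# The same query written in the host language's own syntax.
adults_inline = sorted(name for name, age in people if age >= 18)
```

Neither form is quite "SQL inline with your code"; getting there is exactly the kind of thing an open, extensible parser would make possible.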

I think there's a real future to be had in using the dynamic powerful nature of our Smalltalk environment to embrace and reuse syntaxes that do a good job - and when they don't do a good job? Well, we can make another new syntax, or fall back to a different tried and true form. The real challenge will be blending all this stuff together nicely. I remember the day I discovered I could write inline C code in Smalltalk/X.. I was delighted.

This path was trodden a little once before in VisualWorks. The Advanced parcels even come with a SQL parser. But somewhere they must have taken a wrong turn, because it's not part of the Smalltalk base and it's not commonly used. What lessons did they learn that we too can learn from now that we have shiny new tools like OMeta and Newspeak?

