This post originated from an RSS feed registered with Agile Buzz
by Martin Fowler.
Original Post: Oslo
Feed Title: Martin Fowler's Bliki
Feed URL: http://martinfowler.com/feed.atom
Feed Description: A cross between a blog and wiki of my partly-formed ideas on software development
Oslo is a project at Microsoft about which various things have been
heard, but few details emerged until this week's PDC conference. What we
have known is that it has something to do with
ModelDrivenSoftwareDevelopment and DomainSpecificLanguages.
A couple of weeks ago I got an early peek behind the curtain as I,
and my language-geek colleague Rebecca Parsons, went through a preview
of the PDC coming-out talks with Don Box, Gio
Della-Libera and Vijaye Raji.
It was a very interesting presentation, enough to convince me that
Oslo is a technology to watch. It's broadly a Language
Workbench. I'm not going to attempt a comprehensive review of the
tool here, but just my scattered impressions from the walk-through. It
was certainly interesting enough that I thought I'd publish my
impressions here. With the public release at the PDC I'm sure you'll
be hearing a lot more about it in the coming weeks. As I describe my
thoughts I'll use a lot of the language I've been developing for my book, so you may find
the terminology a little dense.
Oslo has three main components:
- a modeling language (currently code-named M) for textual DSLs
- a design surface (named Quadrant) for graphical DSLs
- a repository (without a name) that stores semantic models in a relational database.
(All of these names are current code names. The marketing
department will still use the same smarts that replaced "Avalon and
Indigo" with "WPF and WCF". I'm just hoping they'll rename "Windows"
to "Windows Technology Foundation".)
The textual language environment is bootstrapped and provides three base
languages:
- MGrammar: defines grammars for Syntax Directed Translation.
- MSchema: defines schemas for a Semantic Model.
- MGraph: a textual language for representing the
population of a Semantic Model. So while MSchema represents types,
MGraph represents instances. Lispers might think of MGraph as
s-expressions with an ugly syntax.
You can represent any model in MGraph, but the syntax is often not
too good. With MGrammar you can define a grammar for your own DSL
which allows you to write scripts in your own DSL and build a parser to
translate them into something more useful.
Using the state machine example from my book introduction, you
could define a state machine semantic model with MSchema. You could
then populate it (in an ugly way) with MGraph. You can build a decent
DSL to populate it using MGrammar to define the syntax and to drive a
parser.
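To make the division of labour concrete, here is a minimal sketch in Python of the kind of state-machine Semantic Model that MSchema would describe, populated directly the way an MGraph document (or a parser built from an MGrammar-defined DSL) ultimately would populate it. All the class and method names here are my own invention for illustration; they bear no relation to Oslo's actual API.

```python
# Hypothetical sketch (not Oslo's API): a state-machine semantic model
# of the kind a schema language like MSchema would describe.

class State:
    def __init__(self, name):
        self.name = name
        self.transitions = {}  # event name -> target State

    def add_transition(self, event, target):
        self.transitions[event] = target

class StateMachine:
    def __init__(self, start):
        self.start = start

# Populating the model directly like this is the verbose, "ugly"
# MGraph-style route; a DSL defined with MGrammar would build the
# same objects via a parser instead.
idle = State("idle")
active = State("active")
idle.add_transition("doorClosed", active)
active.add_transition("doorOpened", idle)
machine = StateMachine(idle)
```

The point of the layering is that the model objects stay the same however you populate them; only the front end (raw instance syntax versus a purpose-built DSL) changes.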
There is a grammar compiler (called mg) that will take
an input file in MGrammar and compile it into what they call an image
file, or .mgx file. This is different from most parser generator
tools, which take the grammar and generate code
that has to be compiled into a parser. Instead Oslo's tools compile
the grammar into a binary form of the parse rules. There's then a
separate tool (mgx) that can take an input script and a
compiled grammar and outputs the MGraph representation of the syntax
tree of the input script.
More likely you'll take the compiled grammar and add it to your
own code as a resource. With this you can call a general parser
mechanism that Oslo provides as a .NET framework, supply the reference
to the compiled grammar file, and generate an in-memory syntax
tree. You can then walk this syntax tree and use it to do whatever you
will - the parsing strategy I refer to as Tree
Construction.
The parser gives you a syntax tree, but that's often not the same as
a semantic model. So usually you'll write code to walk the tree and
populate a semantic model defined with MSchema. Once you've done this
you can easily take that model and store it in the repository so that
it can be accessed via SQL tools. Their demo showed entering some data
via a DSL and accessing corresponding tables in the repository,
although we didn't go into complicated structures.
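The tree-walking step described above can be sketched quite simply. In this Python illustration, nested tuples stand in for an MGraph-style syntax tree, and a walk over them builds a semantic model of states and transitions. None of these names or structures come from Oslo; they are assumptions made purely to show the shape of the work.

```python
# Hypothetical sketch: walking a parser-produced syntax tree (nested
# tuples standing in for an MGraph-style node structure) to populate
# a semantic model. Names and structures are invented, not Oslo's.

def build_model(tree):
    # tree shape: ("machine", [("state", name, [(event, target), ...]), ...])
    _, state_nodes = tree
    # First pass: create every state so forward references resolve.
    states = {name: {"name": name, "transitions": {}}
              for _, name, _ in state_nodes}
    # Second pass: wire up transitions between the states.
    for _, name, transitions in state_nodes:
        for event, target in transitions:
            states[name]["transitions"][event] = states[target]
    return states

syntax_tree = ("machine", [
    ("state", "idle", [("doorClosed", "active")]),
    ("state", "active", [("doorOpened", "idle")]),
])
model = build_model(syntax_tree)
```

The two-pass walk is typical of this job: the syntax tree names targets textually, while the semantic model wants direct object references.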
You can also manipulate the semantic model instance with
Quadrant. You can define a graphical notation for a schema and then
the system can project the model instance, creating a diagram using
that notation. You can also change the diagram which updates the
model. They showed a demo of two graphical projections of a model;
updating one updated the other using Observer
Synchronization. In that way using Quadrant seems like a similar
style of work to a graphical Language Workbench such as MetaEdit.
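The Observer Synchronization at work in that demo can be sketched in a few lines: two projections subscribe to one model, an edit routes through the model, and the model notifies every observer, including the projection that wasn't touched. This is a generic illustration of the pattern, not anything to do with Quadrant's actual implementation.

```python
# Hypothetical sketch of Observer Synchronization: two projections of
# one model stay in sync because every change goes through the model,
# which notifies all registered observers.

class Model:
    def __init__(self):
        self.value = None
        self.observers = []

    def update(self, value):
        self.value = value
        for observer in self.observers:
            observer.refresh(value)

class Projection:
    """Stands in for a diagram or table view of the model."""
    def __init__(self, model):
        self.shown = None
        model.observers.append(self)

    def refresh(self, value):
        self.shown = value

model = Model()
diagram, table = Projection(model), Projection(model)
model.update("state renamed to 'waiting'")  # both projections refresh
```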
As they've been developing Oslo they have been using it on other
Microsoft projects to gain experience in its use. Main ones so far
have been with ASP, Workflow, and web services.
More on M
We spent most of the time looking at the textual environment. They have a
way of hooking up a compiled grammar to a text editing control to
provide a syntax-aware text editor with various completion and
highlighting goodness. Unlike tools such as MPS, however, it is
still a text editor. As a result you can cut and paste stretches of
text and manipulate text freely. The tool will give you squigglies if
there's a problem parsing what you've done, but it preserves the
editing text experience.
I think I like this. When I first came across it, I rather liked
the MPS notion of: "it looks like text, but really it's a structured
editor". But recently I've begun to think that we lose a lot that
way, so the Oslo way of working is appealing.
Another nice text language tool they have is an editor to help
write MGrammars. This is a window divided into three vertical
panes. The center pane contains MGrammar code, the left pane contains
some input text, and the right pane shows the MGraph representation of
parsing the input text with the MGrammar. It's very example
driven. (However it is transient, unlike tests.) The tool resembles
the capability in Antlr to process sample text right away with a
grammar. In the conversation Rebecca referred to
this style as "anecdotal testing" which is a phrase I must remember to
steal.
The parsing algorithm they use is a GLR parser. The grammar syntax
is comparable to EBNF and has notation for Tree Construction expressions. They
use their own variant of regex notation in the lexer to be more
consistent with their other tools, which will probably throw people
like me who are more used to ISO/Perl regexp notation. It's mostly similar,
but different enough to be annoying.
One of the nice features of their grammar notation is
that they have provided constructs to easily make parameterized rules -
effectively allowing you to write rule subroutines. Rules can also be
given attributes (aka annotations), in a similar way to .NET's
language attributes. So you can make a whole language case insensitive
by marking it with an attribute. (Interestingly they use "@" to mark
an attribute, as in the Java syntax.)
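The idea of a parameterized rule can be illustrated with tiny parser combinators: the rule is an ordinary function that takes another rule as a parameter, which is the effect MGrammar's construct gives you. This sketch shows the idea only; MGrammar's actual syntax is different, and every name here is mine.

```python
# Hypothetical sketch of a "rule subroutine" via parser combinators.
# A parser is a function: input string -> (value, remaining) or None.

def literal(text):
    """A rule matching an exact piece of text."""
    def parse(s):
        return (text, s[len(text):]) if s.startswith(text) else None
    return parse

def comma_separated(item):
    """Parameterized rule: a comma-separated list of whatever `item` matches."""
    def parse(s):
        result = item(s)
        if result is None:
            return None
        value, rest = result
        values = [value]
        while rest.startswith(","):
            nxt = item(rest[1:])
            if nxt is None:
                break
            value, rest = nxt
            values.append(value)
        return values, rest
    return parse

# Reuse the same list-shaped rule with any item rule:
numbers = comma_separated(literal("1"))
```

The payoff is the same as with any subroutine: common grammar shapes (lists, delimited blocks) are written once and reused with different contents.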
The default way a grammar is run is to do tree construction. As it
turns out the tree construction is the behavior of the default class
that gets called by the grammar while it's processing some input. This
class has an interface and you can write your own class that
implements this. This would allow you to do embedded translation and
embedded interpretation. It's not the same as code actions, as the
action code isn't in the grammar, but in this other class. I reckon
this could well be better since the code inside actions often swamps the
grammar.
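The pluggable-handler idea can be sketched like this: the parser drives an interface, the default implementation builds a tree, and you can substitute one that interprets the input directly (embedded interpretation), keeping the grammar free of action code. The interface and its names are invented for illustration; Oslo's actual class is not shown here.

```python
# Hypothetical sketch of swapping the class the parser calls back into.
# Default behaviour builds a tree; an alternative interprets directly.

class TreeBuilder:
    """Default handler: construct a syntax tree node."""
    def on_node(self, name, children):
        return (name, children)

class Counter:
    """Alternative handler: interpret input instead of building a tree."""
    def __init__(self):
        self.count = 0

    def on_node(self, name, children):
        self.count += 1
        return name

def parse(tokens, handler):
    # Stand-in for the real parser: invokes the handler once per node.
    return [handler.on_node(token, []) for token in tokens]

tree = parse(["stateA", "stateB"], TreeBuilder())
counter = Counter()
parse(["stateA", "stateB"], counter)
```

Because the translation code lives in the handler class rather than inside the grammar, the grammar stays readable no matter how elaborate the processing gets.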
They talked a bit about the ability to embed one language in
another and switch the parsers over to handle this gracefully -
heading into territory that's been explored by Converge. We didn't look at this deeply
but that would be interesting.
An interesting tidbit they mentioned was that originally they
intended to only have the tools for graphical languages. However they
found that graphical languages just didn't work well for many problems
- including defining schemas. So they developed the textual tools.
(Here's a thought for the marketing department. If you stick with
the name "M" you could use this excellent film for
marketing inspiration ;-))
Comparisons
Plainly this tool hovers in the same space as tools like
Intentional Software and JetBrains MPS that I dubbed as Language
Workbenches in 2005. Oslo doesn't exactly fit the definition for a
language workbench that I gave back then. In particular the textual
component isn't a projectional editor and you don't have to use a
storage representation based on the abstract representation (semantic
model), instead you can store the textual source in a more
conventional style. This lesser reliance on a persistent abstract
representation is similar to Xtext. At some point I really need to
rethink what I consider the defining elements of a Language Workbench
to be. For the moment let's just say that Xtext and Oslo feel like
Language Workbenches and until I revisit the definition I'll treat
them as such.
One particularly interesting point in this comparison is comparing
Oslo with Microsoft's
DSL tools. They are different tools with a lot of overlap, which
makes you wonder if there's a place for both of them. I've heard vague
"they fit together" phrases, but I have yet to be convinced. It could be
one of those situations (common in big companies) where multiple
semi-competing projects are developed. Eventually this could lead to
one being shelved. But it's hard to speculate about this as much
depends on corporate politics and it's thus almost impossible to get a
straight answer out of anyone (and even if you do, it's even harder to
tell if it is a straight answer).
The key element that Oslo shares with its cousins is that it
provides a toolkit to define new languages, integrate them together,
and define tooling for those languages. As a result you get the
freedom of syntax of external DomainSpecificLanguages
with decent tooling - something that deals with one of the main
disadvantages of external DSLs.
Oslo supports both textual and graphical DSLs and seems to do so
reasonably evenly (although we spent more time on the textual). In
this regard it seems to provide more variety than MPS and Intentional
(structured textual) and MetaEdit/Microsoft's DSL tools (graphical). It's also
different in its textual support in that it provides real free text
input not the highly structured text input of Intentional/MPS.
Using a compiled grammar that plugs into a text editor strikes me
as a very nice route for supporting entering DSL scripts. Other tools
either require you to have the full language workbench machinery or to
use code generation to build editors. Passing around a representation
of the grammar that I could plug into an editor strikes me as a good
way to do it. Of course if that language workbench is Open Source (as
I'm told MPS will be), then that may make this issue moot.
One of the big issues with storing stuff like this in a repository
is handling version control. The notion that we can all collaborate on
a single shared database (the moral equivalent of a team editing one
copy of its code on a shared drive) strikes me as close to
irresponsible. As a result I tend to look askance at any vendors who
suggest this approach. The Oslo team suggests, wisely, that you treat
the text files as the authoritative source which allows you to use
regular version control tools. Of course the bad news for many
Microsoft shops would be that this tool is TFS (or, god-forbid, VSS),
but the great advantage of using plain text files as your source is
that you can use any of the multitude of version control systems to
store it.
A general thing I liked was that most of the tools leant towards
run-time interpretation rather than code generation and
compilation. Traditionally parser generators and many language
workbenches assume you are going to generate code from your models
rather than interpreting them. Code generation is all very well, but
it always has this messy feel to it - and tends to lead to all sorts
of ways to trip you up. So I do prefer the run-time emphasis.
It was only a couple of hours, so I can't make any far-reaching
judgements about Oslo. I can, however, say it looks like some very
interesting technology. What I like about it is that it seems to
provide a good pathway to using language workbenches. Having Microsoft
behind it would be a big deal although we do need to
remember that all sorts of things were promised about Longhorn that
never came to pass. But all in all I think this is an interesting
addition to the Language Workbench scene and a tool that could make
DSLs much more prevalent.