The Artima Developer Community
Heron-Centric: Ruminations of a Language Designer
JavaScript Tokenizers, XML and XSLT
by Christopher Diggins
October 28, 2005
I just wrote a small open-source JavaScript tokenizer that outputs XML. Here is why you might be interested.


I wrote a JavaScript tokenizer which creates tokens for JavaScript code at

This little program may be of interest if you want to learn a bit about JavaScript or regular expressions, or if you need a tokenizer for language parsing.

I wrote it because I am developing a prototype of a multi-pass compiler based purely on XSLT.

For the uninitiated, XSLT stands for eXtensible Stylesheet Language Transformations. It is a Turing-complete programming language, and a surprisingly powerful one at that.
XSLT 1.0 (the version supported by most browsers) does not support regular expressions, though the working draft of XSLT 2.0 does. So for the purposes of the prototype, I am using JavaScript for the tokenizer.
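
To give a feel for what a regular-expression tokenizer that outputs XML can look like, here is a minimal sketch. It is not the code of the tokenizer mentioned above: the function names (`tokenize`, `tokensToXml`, `escapeXml`), the token classes, and the `<tokens>`/`<token type="...">` output format are all assumptions made for illustration, and it ignores harder cases such as regular-expression literals.

```javascript
// Hypothetical sketch: rule names and XML layout are illustrative assumptions.

// Escape characters that are not allowed verbatim in XML text or attributes.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Ordered token rules; the first rule that matches at the current position wins.
var TOKEN_RULES = [
  { type: "whitespace", re: /^\s+/ },
  { type: "comment", re: /^(?:\/\/[^\n]*|\/\*[\s\S]*?\*\/)/ },
  { type: "number", re: /^\d+(?:\.\d+)?/ },
  { type: "string", re: /^(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/ },
  { type: "identifier", re: /^[A-Za-z_$][A-Za-z0-9_$]*/ },
  { type: "punctuator", re: /^(?:===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|[{}()\[\];,.<>+\-*\/%=!&|?:])/ }
];

// Split source text into a flat list of { type, text } tokens.
function tokenize(source) {
  var tokens = [];
  var rest = source;
  while (rest.length > 0) {
    var matched = false;
    for (var i = 0; i < TOKEN_RULES.length; i++) {
      var m = TOKEN_RULES[i].re.exec(rest);
      if (m) {
        // Whitespace is consumed but not reported.
        if (TOKEN_RULES[i].type !== "whitespace") {
          tokens.push({ type: TOKEN_RULES[i].type, text: m[0] });
        }
        rest = rest.slice(m[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: report one character as "unknown" and move on.
      tokens.push({ type: "unknown", text: rest.charAt(0) });
      rest = rest.slice(1);
    }
  }
  return tokens;
}

// Serialize the token list as a small XML document.
function tokensToXml(tokens) {
  var lines = tokens.map(function (t) {
    return '  <token type="' + t.type + '">' + escapeXml(t.text) + "</token>";
  });
  return "<tokens>\n" + lines.join("\n") + "\n</tokens>";
}
```

Calling `tokensToXml(tokenize("var x = 42;"))` yields one `<token>` element per token, which is the kind of flat XML list a later XSLT pass could take as input. Keywords are reported as plain identifiers here; a later pass could reclassify them.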

Recently I blogged about the idea that a macro language is more effective as a pattern-matching language than as a procedural one. I have since discovered XSLT, and it is one heck of a powerful macro language.

Originally I was just looking at using XSLT to manipulate abstract syntax trees (ASTs) when it dawned on me that not only could I rewrite an AST using XSLT, I could also generate an AST from a parse tree, and a parse tree from a token list. The only missing step was a tokenizer.

I know that some people will see this as a purely academic exercise, or more specifically will think that I am a monkey with a hammer. The thing is that, in theory, one XSLT document could generate a language-agnostic representation (ASTXML), and another XSLT document could then generate source code from it in virtually any language.
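
The "one representation, many targets" idea can be sketched in a few lines. This is only an analogy in JavaScript: plain objects stand in for ASTXML and functions stand in for XSLT stylesheets. The node shapes (`assign`, `binary`, `number`) and the emitter names are hypothetical, chosen just for this example.

```javascript
// Hypothetical language-neutral AST for the statement "x = 1 + 2".
// In the XSLT approach this would be an XML document instead of an object.
var sampleAst = {
  kind: "assign",
  target: "x",
  value: {
    kind: "binary",
    op: "+",
    left: { kind: "number", value: 1 },
    right: { kind: "number", value: 2 }
  }
};

// Expressions happen to be written the same way in both target languages here.
function emitExpression(node) {
  if (node.kind === "number") {
    return String(node.value);
  }
  if (node.kind === "binary") {
    return emitExpression(node.left) + " " + node.op + " " + emitExpression(node.right);
  }
  throw new Error("Unknown expression node: " + node.kind);
}

// One emitter per target language, each playing the role of one stylesheet.
function emitJavaScript(node) {
  if (node.kind !== "assign") throw new Error("Unknown statement node: " + node.kind);
  return node.target + " = " + emitExpression(node.value) + ";";
}

function emitPascal(node) {
  if (node.kind !== "assign") throw new Error("Unknown statement node: " + node.kind);
  return node.target + " := " + emitExpression(node.value) + ";";
}
```

The same tree produces `x = 1 + 2;` through `emitJavaScript` and `x := 1 + 2;` through `emitPascal`. Only the emitter changes, not the representation.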

Just imagine, a babelfish for source code!

About the Blogger

Christopher Diggins is a software developer and freelance writer. Christopher loves programming, but is eternally frustrated by the shortcomings of modern programming languages. As would any reasonable person in his shoes, he decided to quit his day job to write his own ( ). Christopher is the co-author of the C++ Cookbook from O'Reilly. Christopher can be reached through his home page at

This weblog entry is Copyright © 2005 Christopher Diggins. All rights reserved.
