The Artima Developer Community

Weblogs Forum
JavaScript Tokenizers, XML and XSLT

7 replies on 1 page. Most recent reply: Mar 17, 2012 4:18 PM by Claudius Teodorescu

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

JavaScript Tokenizers, XML and XSLT
Posted: Oct 28, 2005 9:52 AM
Summary
I just wrote a small open-source JavaScript tokenizer that outputs XML. Here is why you might be interested.
I wrote a tokenizer in JavaScript that breaks JavaScript source code into tokens and outputs them as XML. You can find it at http://www.cdiggins.com/tokenizer.html.

This little program might be interesting if you want to learn a bit about JavaScript and/or regular expressions, or if you need a tokenizer for language parsing.
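For readers who want a feel for the approach, here is a minimal sketch of a regular-expression tokenizer that emits XML. It is not the code behind the link above: the rule names, the `<tokens>`/`<token>` element names, and the helper functions are all assumptions made for illustration, and it ignores hard cases such as regular-expression literals.

```javascript
// Hypothetical token rules, tried in order; the real tokenizer's rules may differ.
const TOKEN_RULES = [
  ["whitespace", /^\s+/],
  ["comment", /^(?:\/\/[^\n]*|\/\*[\s\S]*?\*\/)/],
  ["number", /^\d+(?:\.\d+)?/],
  ["string", /^(?:"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/],
  ["identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
  ["symbol", /^(?:[{}()\[\];,.]|[-+*\/%=<>!&|^~?:]+)/],
];

// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Split the source into { type, text } tokens, dropping whitespace.
function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    const rest = source.slice(pos);
    let matched = false;
    for (const [type, pattern] of TOKEN_RULES) {
      const m = pattern.exec(rest);
      if (m) {
        if (type !== "whitespace") tokens.push({ type, text: m[0] });
        pos += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: emit the character as "unknown" and move on.
      tokens.push({ type: "unknown", text: source[pos] });
      pos += 1;
    }
  }
  return tokens;
}

// Serialize the token list as a flat XML document.
function tokensToXml(tokens) {
  const items = tokens.map(
    (t) => `  <token type="${t.type}">${escapeXml(t.text)}</token>`
  );
  return ["<tokens>", ...items, "</tokens>"].join("\n");
}
```

Calling `tokensToXml(tokenize("x = 1 < 2;"))` gives a flat list of six `<token>` elements, with the `<` operator escaped as `&lt;`. A flat document like this is the kind of input a later XSLT pass would consume.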

The reason I wrote it was because I am developing a prototype of a multi-pass compiler based purely on XSLT.

For the uninitiated, XSLT stands for eXtensible Stylesheet Language Transformations. It is a Turing-complete programming language, and a surprisingly powerful one at that.
XSLT 1.0 (the version supported by most browsers) does not support regular expressions, though the working draft of XSLT 2.0 does. So for the purposes of the prototype, I am using JavaScript for the tokenizer.

Recently I blogged about the idea that a macro language is more effective as a pattern-matching language than as a procedural one. I have since discovered XSLT, and it is one heck of a powerful macro language.

Originally I was just looking at using XSLT to manipulate Abstract Syntax Trees (AST) when it dawned on me that not only could I rewrite an AST using XSLT, I could also generate an AST from a Parse Tree, and I could generate a parse tree from a token list. The only step missing was a tokenizer.

I know that some people will see this as a purely academic exercise, or more specifically that I am a monkey with a hammer. The thing is that, in theory, one XSLT document appears to be enough to generate a language-agnostic representation (ASTXML). Another XSLT document can then be used to generate source in virtually any language.
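To make the "one tree, many back ends" idea concrete, here is a small sketch in JavaScript rather than XSLT. The node shapes (`assign`, `binary`, `name`, `number`) and the two emitters are hypothetical; they stand in for an ASTXML document and for the per-language stylesheets, whose actual design is not described here.

```javascript
// Hypothetical language-neutral tree for: total = price * (1 + rate)
const exampleAst = {
  kind: "assign",
  target: "total",
  value: {
    kind: "binary",
    op: "*",
    left: { kind: "name", id: "price" },
    right: {
      kind: "binary",
      op: "+",
      left: { kind: "number", value: 1 },
      right: { kind: "name", id: "rate" },
    },
  },
};

// Expressions happen to look the same in both targets, so they share one emitter.
function emitExpr(node) {
  switch (node.kind) {
    case "number":
      return String(node.value);
    case "name":
      return node.id;
    case "binary":
      return `(${emitExpr(node.left)} ${node.op} ${emitExpr(node.right)})`;
    default:
      throw new Error(`unknown node kind: ${node.kind}`);
  }
}

// One statement emitter per target language (assumed targets, for illustration).
const STATEMENT_EMITTERS = {
  javascript: (n) => `var ${n.target} = ${emitExpr(n.value)};`,
  python: (n) => `${n.target} = ${emitExpr(n.value)}`,
};

function emit(node, language) {
  const emitter = STATEMENT_EMITTERS[language];
  if (!emitter) throw new Error(`no emitter for ${language}`);
  return emitter(node);
}
```

Here `emit(exampleAst, "javascript")` returns `var total = (price * (1 + rate));` and `emit(exampleAst, "python")` returns `total = (price * (1 + rate))`. Real languages diverge far more than this toy case suggests, which is where most of the effort would go.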

Just imagine, a babelfish for source code!


Dale Asberry

Posts: 161
Nickname: bozomind
Registered: Mar, 2004

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 28, 2005 4:47 PM
very cool

Reno C.

Posts: 10
Nickname: saxml
Registered: Oct, 2005

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 29, 2005 10:41 AM
XSLT 2. We're all waiting for it!
Something like RELAX NG (http://www.relaxng.org), which is based on RELAX (http://www.xml.gr.jp/relax) and TREX (http://www.thaiopensource.com/trex), could be useful too. Or at least the underlying idea, because it's designed to validate documents rather than manipulate them. An example for Java: RELAX NG Compiler Compiler (http://relaxngcc.sourceforge.net/en/index.htm)

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 29, 2005 11:25 AM
> very cool

Thanks, Dale.

> XSLT 2. We're all waiting for it!
> Something like RELAX NG (http://www.relaxng.org) which is
> based on RELAX (http://www.xml.gr.jp/relax) and TREX
> (http://www.thaiopensource.com/trex) could be useful too.
> Or at least the underlying idea because it's designed to
> validate documents rather than manipulate them. An example
> for java: RELAX NG Compiler Compiler
> (http://relaxngcc.sourceforge.net/en/index.htm)

Thanks for pointing these out Reno. Here is a naive question: why would the RELAX NG compiler, and similar tools, not simply use XSLT to generate Java code?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 29, 2005 12:41 PM
I have now written a first-pass XSLT transform, from a token list to a parse tree.

If you have Internet Explorer 6.0 (and I think Firefox 1.0.2?) you should be able to see the transform performed automatically at:

http://www.cdiggins.com/js-pass1-demo.xml

This document uses the style sheet at:

http://www.cdiggins.com/js-pass1.xslt

Don't forget to peek at the source. For those wondering "so what?": think about what is involved in writing a compiler/translator in the traditional way. There are typically six main phases:

1) tokenizer (currently done by my JavaScript program, but I just learned I can in fact write one in XSLT 1.0, albeit an inefficient one)

2) token list -> parse tree

3) parse tree -> abstract syntax tree

4) abstract syntax tree -> optimizable format (this phase is optional)

5) optimizable format -> optimized format (this phase can be repeated multiple times)

6) optimized format -> some executable format or another source code

Now this usually represents a massive amount of work in an imperative language like C/C++/Java. Using a declarative pattern-matching language like XSLT, however, means that an entire compiler can be built from a handful of simple documents with very little work. Anyway, that's the theory.
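As a rough illustration of what a pattern-matching pass looks like, here is a sketch of an optimization pass (phase 5) written as a table of match/rewrite rules. It is in JavaScript, not XSLT, and the node shapes and rule names are assumptions; the point is only that each rule pairs a pattern with a replacement, much as an XSLT template pairs a match expression with output.

```javascript
// Operators this sketch knows how to fold at compile time.
const FOLD = new Map([
  ["+", (a, b) => a + b],
  ["-", (a, b) => a - b],
  ["*", (a, b) => a * b],
]);

const isNumber = (n) => n.kind === "number";

// Each rule is a pattern (matches) plus a replacement (rewrite).
// Rules only ever see "binary" nodes; names are hypothetical.
const RULES = [
  {
    name: "fold-constants", // number op number -> number
    matches: (n) => isNumber(n.left) && isNumber(n.right) && FOLD.has(n.op),
    rewrite: (n) => ({
      kind: "number",
      value: FOLD.get(n.op)(n.left.value, n.right.value),
    }),
  },
  {
    name: "multiply-by-one", // x * 1 -> x
    matches: (n) => n.op === "*" && isNumber(n.right) && n.right.value === 1,
    rewrite: (n) => n.left,
  },
];

// Rewrite children first, then apply the first rule that matches this node.
function optimize(node) {
  if (node.kind !== "binary") return node;
  const current = {
    ...node,
    left: optimize(node.left),
    right: optimize(node.right),
  };
  const rule = RULES.find((r) => r.matches(current));
  return rule ? rule.rewrite(current) : current;
}
```

For example, optimizing the tree for `x * (4 - 3)` first folds `4 - 3` to `1` and then reduces `x * 1` to the node for `x`. Adding an optimization means adding a rule to the table, not editing the traversal.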

Reno C.

Posts: 10
Nickname: saxml
Registered: Oct, 2005

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 29, 2005 2:41 PM
>why would the RELAX NG compiler, and similar tools, not simply use XSLT to generate Java code?

It's only a supposition:
Because we have to process a RELAX NG document (the grammar definition, well, a 'BNF' for xml). It might be easier to do it in another language than XSLT alone!

Anyway, your approach seems interesting. Are you sure that phase 3) "parse tree -> abstract syntax tree" can be handled easily?

Christopher Diggins

Posts: 1215
Nickname: cdiggins
Registered: Feb, 2004

Re: JavaScript Tokenizers, XML and XSLT Posted: Oct 29, 2005 3:07 PM
> >why would the RELAX NG compiler, and similar tools, not
> simply use XSLT to generate Java code?
>
> It's only a supposition:
> Because we have to process a RELAX NG document (the
> grammar definition, well, a 'BNF' for xml). It might be
> easier to do it in another language than XSLT alone!

That might be it. XSLT is a bit of a bear to work with. I would much rather work in a procedural language than a declarative one. Perhaps I am simply not smart enough to design and understand declarative programs ;-)

> Anyway, your approach seems interesting. Are you sure the
> phase 3) "parse tree -> abstract syntax tree" can be
> handled easily ?

I think so. I believe the only hard part was turning a linear form like the tokens specification (which is effectively a flat list) into a hierarchical structure. The next challenge will be to group statements into their own nodes, which should be quite easy.
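The flat-to-hierarchical step can be sketched in a few lines of JavaScript using a stack. This is not the XSLT from js-pass1.xslt, and the node shapes (`root`, `group`, `token`) are assumptions; it only shows the idea of opening a nested node at each opening bracket and closing it at the matching closer.

```javascript
// Which closing bracket ends a group started by each opening bracket.
const CLOSER_FOR = new Map([
  ["(", ")"],
  ["[", "]"],
  ["{", "}"],
]);
const CLOSERS = new Set(CLOSER_FOR.values());

// Turn a flat array of token strings into a tree of nested groups.
function nest(tokens) {
  const root = { type: "root", children: [] };
  const stack = [root]; // innermost open node is at the end
  for (const text of tokens) {
    const top = stack[stack.length - 1];
    if (CLOSER_FOR.has(text)) {
      // Opening bracket: start a child group and descend into it.
      const group = { type: "group", open: text, children: [] };
      top.children.push(group);
      stack.push(group);
    } else if (CLOSERS.has(text)) {
      // Closing bracket: it must match the group we are currently inside.
      if (CLOSER_FOR.get(top.open) !== text) {
        throw new Error(`unbalanced "${text}"`);
      }
      stack.pop();
    } else {
      top.children.push({ type: "token", text });
    }
  }
  if (stack.length !== 1) throw new Error("unclosed group");
  return root;
}
```

For the token list `f ( a [ 0 ] ) ;` the root ends up with three children: the token `f`, a `(` group containing `a` and a nested `[` group, and the token `;`. XSLT 1.0 has no mutable stack, so the same result has to come from recursive templates, which is presumably why this step was the hard one.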

Claudius Teodorescu

Posts: 1
Nickname: claudius
Registered: Mar, 2012

Re: JavaScript Tokenizers, XML and XSLT Posted: Mar 17, 2012 4:18 PM
Hi,

Any good news on this?

Claudius
