The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Markdown redux, and walking the AST to generate (statically) valid XHTML

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Markdown redux, and walking the AST to generate (statically) valid XHTML Posted: Apr 13, 2009 6:08 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Markdown redux, and walking the AST to generate (statically) valid XHTML
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Eigen Class
Latest Posts From Eigenclass

Advertisement

My earlier post (about Markdown processors and how simple top-down parsers can be up to two or three orders of magnitude faster than regexp-based gsub kludges) got unexpectedly popular. Some critics accused me of deliberately bastardizing Markdown's syntax for the sake of implementation convenience and speed. In order to prove that this is not the case, I reverted the changes relative to Markdown I introduced with ease of writing in mind and measured again. The exact changes are listed below. According to git, the resulting code is in fact shorter:

$ git diff --stat -b 298c05 simple_markup.ml
 simple_markup.ml |  167 +++++++++++++++++++++++++++---------------------------
 1 files changed, 83 insertions(+), 84 deletions(-)

I think this, if anything, shows that the motivation behind my changes was not to simplify the implementation and make it shorter. As for the performance, the 330KB document I was using is processed in 50ms (vs. 63ms for discount, 2.7s for Pandoc, 2.1s for python-markdown or 30s for Bluecloth).

Transforming the AST into (statically) valid XHTML

As I said in my earlier post, I quickly recognized the advantages of going beyond mere "markup -> HTML" conversion and being able to manipulate the AST. New targets can be supported in a very compact way; for instance, (X)HTML generation takes only 40 lines of code. The code is clear and concise thanks to the use of pattern matching (it's also fast :). Moreover, I can use higher-order functions to allow for specific processing of "interesting" elements. For instance, I'm rendering pre-formatted blocks, links and images differently depending on whether the markup comes from a comment or was written by me (e.g. some extensions are allowed only in the latter case, and only the alt attribute is shown in the former).

The code below simply generates a typed HTML tree from the markup tree. I'm using Ocsigen's XHTML.M module, which employs OCaml's type system to ensure that the resulting HTML is valid. Invalid markup just doesn't compile, as you'd get a type error; e.g., if I try to express something invalid like <b><li>foo</li></b>, the type system will catch it and tell me the acceptable elements:

# b [li [pcdata "foo"]];;
     -----------------
Error: This expression has type ([> `Li ] as 'a) elt = 'a XHTML.M.elt
       but is here used with type
        [< `A | `Abbr | `Acronym | `B | `Bdo | `Big | `Br | `Button
        | `Cite | `Code | `Del | `Dfn | `Em | `I | `Img | `Input | `Ins
        | `Kbd | `Label | `Map | `Noscript | `Object | `PCDATA | `Q
        | `Samp | `Script | `Select | `Small | `Span | `Strong | `Sub
        | `Sup | `Textarea | `Tt | `Var ] elt = 'b XHTML.M.elt
       The second variant type does not allow tag(s) `Li

The compiler (or toplevel) not only tells me the HTML would not be valid (li not allowed inside b), but also lists the elements I can use there.

Here are the ~40 lines of code that generate the HTML on this site (the resulting tree is included in a larger, typed tree --- ensuring the end result is valid when the page is composed). There are no type annotations whatsoever, as OCaml is able to infer everything here (OTOH, I can get the type of any single expression with a couple keypresses in vim; you can also generate the interface with the type of all the functions with ocamlc -i):

Read more...

Read: Markdown redux, and walking the AST to generate (statically) valid XHTML

Topic: Free at last! Previous Topic   Next Topic Topic: 82,520 minutes on Phusion Passenger

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use