The Artima Developer Community

Python Buzz Forum
A Python syntax highlighter
Thomas Guest

Posts: 236
Nickname: tag
Registered: Nov, 2004

Thomas Guest likes dynamic languages.
A Python syntax highlighter Posted: Aug 4, 2006 10:14 AM

This post originated from an RSS feed registered with Python Buzz by Thomas Guest.
Original Post: A Python syntax highlighter
Feed Title: Word Aligned: Category Python
Feed URL: http://feeds.feedburner.com/WordAlignedCategoryPython
Feed Description: Dynamic languages in general. Python in particular. The adventures of a space sensitive programmer.

In a recent post I described my first ever Ruby program: a syntax highlighter for Python, written for use in a Typo web log. Since that post was rather a long one, I decided to post the code itself separately. Here it is, then.

The Test Code

As you can see, currently only comments, single- and triple-quoted strings, keywords and identifiers are recognised. That’s really all I wanted, for now. For completeness, I may well add support for numeric literals. Watch this space!

typo/vendor/syntax/test/syntax/tc_python.rb
require File.dirname(__FILE__) + "/tokenizer_testcase"

class TC_Syntax_Python < TokenizerTestCase

  syntax "python"

  def test_empty
    tokenize ""
    assert_no_next_token
  end
  def test_comment_eol
    tokenize "# a comment\nfoo"
    assert_next_token :comment, "# a comment"
    assert_next_token :normal, "\n"
    assert_next_token :ident, "foo"
  end
  def test_two_comments
    tokenize "# first comment\n# second comment"
    assert_next_token :comment, "# first comment"
    assert_next_token :normal, "\n"
    assert_next_token :comment, "# second comment"
  end
  def test_string
    tokenize "'' 'aa' r'raw' u'unicode' UR''"
    assert_next_token :string, "''"
    skip_token
    assert_next_token :string, "'aa'"
    skip_token
    assert_next_token :string, "r'raw'"
    skip_token
    assert_next_token :string, "u'unicode'"
    skip_token
    assert_next_token :string, "UR''"
    tokenize '"aa\"bb"'
    assert_next_token :string, '"aa\"bb"'
  end
  def test_triple_quoted_string
    tokenize "'''\nfoo\n'''"
    assert_next_token :triple_quoted_string, "'''\nfoo\n'''"
    tokenize '"""\nfoo\n"""'
    assert_next_token :triple_quoted_string, '"""\nfoo\n"""'
    tokenize "uR'''\nfoo\n'''"
    assert_next_token :triple_quoted_string, "uR'''\nfoo\n'''"
    tokenize '"""\'a\'"b"c"""'
    assert_next_token  :triple_quoted_string, '"""\'a\'"b"c"""'
  end
  def test_keyword
    Syntax::Python::KEYWORDS.each do |word|
      tokenize word
      assert_next_token :keyword, word
    end
    Syntax::Python::KEYWORDS.each do |word|
      tokenize "x#{word}"
      assert_next_token :ident, "x#{word}"
      tokenize "#{word}x"
      assert_next_token :ident, "#{word}x"
    end
  end
end
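The behaviour these tests pin down can be sketched with Ruby's standard-library StringScanner, with no dependency on the syntax gem. The simple_tokenize helper and the PY_KEYWORDS copy below are illustrations only, not part of the gem's API; the regexes mirror the branches of the tokenizer that follows.

```ruby
require 'strscan'

# Keyword list copied from the tokenizer below.
PY_KEYWORDS = %w{as and del for is raise assert elif from lambda return break
                 else global not try class except if or while continue exec
                 import pass yield def finally in print}

# Return an array of [type, text] pairs for a chunk of Python source.
def simple_tokenize(src)
  s = StringScanner.new(src)
  tokens = []
  until s.eos?
    if s.scan(/#.*$/)                          # comment to end of line
      tokens << [:comment, s.matched]
    elsif s.scan(/u?r?'''.*?'''|""".*?"""/im)  # triple-quoted string
      tokens << [:triple_quoted_string, s.matched]
    elsif s.scan(/u?r?'([^\\']|\\.)*'/i) ||    # single-quoted string
          s.scan(/u?r?"([^\\"]|\\.)*"/i)       # double-quoted string
      tokens << [:string, s.matched]
    elsif s.check(/[_a-zA-Z]/)                 # keyword or identifier
      word = s.scan(/\w+/)
      tokens << [PY_KEYWORDS.include?(word) ? :keyword : :ident, word]
    else
      tokens << [:normal, s.scan(/./m)]        # anything else, one char
    end
  end
  tokens
end
```

Running simple_tokenize("# a comment\nfoo") yields the same three tokens the test_comment_eol case asserts: a comment, a newline, and an identifier.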

The Python Tokenizer

typo/vendor/syntax/python.rb
require 'syntax'

module Syntax

  # A basic tokenizer for the Python language. It recognises
  # comments, keywords and strings.
  class Python < Tokenizer
    # The list of all identifiers recognized as keywords.
    # http://docs.python.org/ref/keywords.html
    # Strictly speaking, "as" isn't yet a keyword -- but for syntax
    # highlighting, we'll treat it as such.
    KEYWORDS =
      %w{as and del for is raise assert elif from lambda return break
         else global not try class except if or while continue exec
         import pass yield def finally in print}
    # Step through a single iteration of the tokenization process.
    def step
      if scan(/#.*$/)
        start_group :comment, matched
      elsif scan(/u?r?'''.*?'''|""".*?"""/im)
        start_group :triple_quoted_string, matched
      elsif scan(/u?r?'([^\\']|\\.)*'/i)
        start_group :string, matched
      elsif scan(/u?r?"([^\\"]|\\.)*"/i)
        start_group :string, matched
      elsif check(/[_a-zA-Z]/)
        word = scan(/\w+/)
        if KEYWORDS.include?(word)
          start_group :keyword, word
        else
          start_group :ident, word
        end
      else
        start_group :normal, scan(/./m)
      end
    end
  end
  SYNTAX["python"] = Python

end
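The numeric-literal support mentioned above could be one more regex branch in step, placed before the final scan(/./m). Here is a hedged, standalone sketch; the NUMBER pattern and the scan_numbers helper are my own illustrations, not part of the gem:

```ruby
require 'strscan'

# Hypothetical pattern for Python numeric literals: hex integers,
# floats with an optional exponent, and plain integers. In the
# tokenizer it would become one more elsif branch, e.g.
#   elsif scan(NUMBER) then start_group :number, matched
NUMBER = /0[xX][0-9a-fA-F]+|\d+\.?\d*([eE][+-]?\d+)?|\.\d+([eE][+-]?\d+)?/

# Illustration only: pull the numeric literals out of a source chunk.
def scan_numbers(src)
  s = StringScanner.new(src)
  found = []
  until s.eos?
    if s.scan(NUMBER)
      found << s.matched
    else
      s.scan(/\w+|./m)  # consume identifiers whole, so digits inside
    end                  # names like foo123 are not taken as numbers
  end
  found
end
```

Note the alternation order in NUMBER matters: the hex alternative must come first, or 0xFF would match as the integer 0 followed by the identifier xFF.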

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use