This post originated from an RSS feed registered with Ruby Buzz
by Eric Hodel.
Original Post: A use of Enumerable#chunk
Feed Title: Segment7
Feed URL: http://blog.segment7.net/articles.rss
Feed Description: Posts about and around Ruby, MetaRuby, ruby2c, ZenTest and work at The Robot Co-op.
In Ruby 1.9, Enumerable has a few new methods including Enumerable#chunk (which was added for 1.9.2). The #chunk method walks your Enumerable and divides it into chunks of consecutive elements based on a selecting block. Unlike Enumerable#partition, which always returns exactly two groups, the chunks are returned in order. Here's an example from the documentation:
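The snippet below is a minimal sketch of that behavior, modeled on the even/odd example in the Ruby documentation rather than copied from it. The variable names are illustrative, and Enumerable#partition is run on the same input for contrast.

```ruby
digits = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

# chunk groups *consecutive* elements that share a block result,
# so the original order is preserved.
chunks = digits.chunk { |i| i.even? }.to_a
# chunks => [[false, [3, 1]], [true, [4]], [false, [1, 5, 9]],
#            [true, [2, 6]], [false, [5, 3, 5]]]

# partition makes only two groups, so the run boundaries are lost.
evens, odds = digits.partition { |i| i.even? }
# evens => [4, 2, 6]
# odds  => [3, 1, 1, 5, 9, 5, 3, 5]
```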
When I first saw this method I thought, "this looks like a useful method… but how?"
I'm working on bringing Markdown support to RDoc and the last remaining base Markdown feature I need to support is a hard break due to two spaces at the end of a line in a paragraph.
For background, RDoc parses various formats into a common syntax tree which can then be transformed for any supported output (such as HTML, colored ANSI text, etc.). In this syntax tree a paragraph can contain one or more strings which are joined at output time into the paragraph you see.
To add hard line breaks, I decided to create a new HardBreak object and inject it into the paragraph where two trailing spaces are encountered in the source document. The formatters can then be updated to insert the appropriate line break character when emitting a paragraph.
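As a rough illustration of that idea, here is a toy sketch. The HardBreak class, the paragraph_parts helper, and the to_html formatter are all hypothetical stand-ins invented for this example, not RDoc's actual classes or API, and the trailing-space check is a simplification of the Markdown rule.

```ruby
# Hypothetical stand-in for the HardBreak node described above;
# not RDoc's actual class.
class HardBreak; end

# Turn the source lines of one paragraph into parts, injecting a
# HardBreak wherever a line ends with two or more trailing spaces.
def paragraph_parts(lines)
  lines.flat_map do |line|
    if line =~ / {2,}\z/
      [line.rstrip, HardBreak.new]
    else
      [line]
    end
  end
end

# A toy HTML formatter: strings are emitted as-is, a HardBreak
# becomes a <br> tag followed by a newline.
def to_html(parts)
  body = parts.map { |part| HardBreak === part ? "<br>\n" : part }.join
  "<p>#{body}</p>"
end

parts = paragraph_parts(["first line  ", "second line"])
html  = to_html(parts)
# html is "<p>first line<br>", a newline, then "second line</p>"
```

Keeping the break as its own object means each formatter can decide what a hard break looks like in its output.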
Enumerable#chunk comes in because the Markdown parser doesn't join strings as it's parsing (since the grammar rules get re-used); instead, joining is performed as a post-processing step. (Joining strings as a post-processing step also makes the parser cleaner by keeping the ugliness in one spot rather than spreading it across multiple grammar rules.) Before inserting HardBreak objects this was sufficient:
parts = paragraph.parts.join.rstrip
paragraph.parts.replace [parts]
But now I need to join String chunks and include HardBreaks as-is, which is a perfect use of Enumerable#chunk:
parts = paragraph.parts.chunk do |part|
  String === part
end.map do |string, chunk|
  string ? chunk.join.rstrip : chunk
end.flatten
paragraph.parts.replace parts
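To see the snippet run outside RDoc, here is a self-contained sketch. HardBreak and Paragraph are hypothetical stand-ins for the real syntax tree nodes, and the sample strings are made up.

```ruby
# Hypothetical stand-ins for RDoc's hard-break and paragraph nodes.
class HardBreak; end
Paragraph = Struct.new(:parts)

hard_break = HardBreak.new
paragraph  = Paragraph.new(["one ", "two", hard_break, "three ", "four  "])

parts = paragraph.parts.chunk do |part|
  String === part
end.map do |string, chunk|
  # Runs of Strings collapse into one String; other runs pass through.
  string ? chunk.join.rstrip : chunk
end.flatten

paragraph.parts.replace parts
# paragraph.parts => ["one two", hard_break, "three four"]
```

The flatten at the end unwraps the one-element arrays that hold the non-String runs, so the paragraph ends up with a flat list of Strings and HardBreaks.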
The 1.8-compatible implementation is much uglier since I have to track whether I'm in a String chunk or not in addition to performing the processing. I'm too embarrassed to post it, but you'll be able to find it in the rdoc source once I commit and push it.
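For a sense of what that manual tracking involves, here is one possible sketch that avoids Enumerable#chunk. It is an assumption about the general shape of such code, not the implementation in the rdoc source; join_string_runs and HardBreak are hypothetical names.

```ruby
# Hypothetical stand-in for the hard-break node.
class HardBreak; end

# Join consecutive Strings and pass everything else through,
# without relying on Enumerable#chunk.
def join_string_runs(parts)
  result = []
  buffer = nil # non-nil only while inside a run of Strings

  parts.each do |part|
    if String === part
      buffer = (buffer || '') + part
    else
      # Leaving a String run: flush what has been collected so far.
      if buffer
        result << buffer.rstrip
        buffer = nil
      end
      result << part
    end
  end

  # Flush a String run that reaches the end of the paragraph.
  result << buffer.rstrip if buffer
  result
end

hard_break = HardBreak.new
joined = join_string_runs(["a ", "b  ", hard_break, "c "])
# joined => ["a b", hard_break, "c"]
```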