Cool Tools and Other Stuff
Make, Mailing Lists, and Ruby
by Eric Armstrong
March 17, 2006

Summary
When I found Martin Fowler's article on the benefits of Rake (the Ruby version of Make), I immediately became committed to learning Ruby. I needed to write a mailing list program, too. Ruby turns out to be ideal for both.

Rake and Rant -- A Better Make and Ant

Talk about pain reduction! Rake eliminates the syntax woes of makefiles and Ant. It lets you add automatic dependency rules for file types the original implementation may not have considered, and it puts the power of the programming language completely at your disposal--so it becomes possible to code conditionally-executed tasks, task loops, and other things that are difficult to do with current build languages.

Notes:

Martin's excellent article on Rake is at http://www.martinfowler.com/articles/rake.html

Rant is another implementation of the Rake syntax. It focuses on performance and reliability. But virtually all of the how-to-use-it documentation is for Rake.

With Rake, you write something that looks very much like a makefile--only in reality it's Ruby. So you do simple makefile things, or you can do complex Ruby things. It's your choice. You have the convenience of standard makefile processing with syntax like this:

# task X depends on tasks Y and Z
task :X => [:Y, :Z] do  
  ...code for task X...
end

But since the code is in reality a Ruby script, you can use the expressive power of the language to do more creative manipulations (which Fowler talks about in his article). In this code, for example, :X defines a symbol named X, and the task method (invoked without parentheses) produces a method call that looks much like a makefile entry.

So I became highly motivated to take a deeper look at Ruby. I had taken a first look several years ago. At the time, I saw a variety of cryptic Perlisms that tended to put me off, like $_. (How self-documenting is that?) Like Perl, there were multiple ways to do things. And there was operator overloading, all of which tend to make code less readable.

I wasn't a big fan of operator overloading, mostly because of the potential readability problems. I mean, it's hard enough to read code. But if you can't even count on the operators doing what you think they're doing, how can you make any sense of what you read? But then it occurred to me that => might be overloaded to implement the syntax needed for a build file.

As it turns out, => is defined as a hash operator. So in this case, it's mapping a target symbol to its dependencies so Rake can do things in the right sequence. So => isn't really overloaded. But thinking that it might be changed my mind about operator overloading.
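To make the hash point concrete, here is a minimal sketch of how a task method could accept that syntax. It is a toy written for illustration, not Rake's implementation: TASKS, task, and run_task are hypothetical names, and real Rake does far more (file timestamps, rules, namespaces).

```ruby
# Toy stand-in for Rake's "task" method. Hypothetical: not Rake's real code.
TASKS = {}

def task(spec, &action)
  # A bare symbol has no dependencies; a hash maps the name to them.
  name, deps = spec.is_a?(Hash) ? spec.first : [spec, []]
  TASKS[name] = { deps: Array(deps), action: action, done: false }
end

# Run a task's dependencies first, then the task itself, each at most once.
def run_task(name, log)
  entry = TASKS.fetch(name)
  return if entry[:done]
  entry[:deps].each { |dep| run_task(dep, log) }
  entry[:action].call(log) if entry[:action]
  entry[:done] = true
end

# Because this is plain Ruby, tasks can be generated in a loop.
[:Y, :Z].each do |name|
  task(name) { |log| log << name }
end

# The argument here is really the one-entry hash { :X => [:Y, :Z] }.
task :X => [:Y, :Z] do |log|
  log << :X
end

order = []
run_task(:X, order)
p order   # prints [:Y, :Z, :X]
```

The do...end block attaches to the task call, and the hash braces can be dropped because the hash is the last argument. Those two rules are all it takes to make the line read like a makefile entry.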

In his article, Fowler concludes that an internal domain-specific language really makes sense for builds. In the same vein, I conclude that operator overloading makes a lot of sense when it's used to define an internal domain-specific language. To quote Martin's point more fully:

All three build languages share another characteristic - they are all examples of a Domain Specific Language (DSL). However they are different kinds of DSL. In the terminology I've used before:

make is an external DSL using a custom syntax

ant (and nant) is an external DSL using an XML based syntax

rake is an internal DSL using Ruby.

The fact that rake is an internal DSL for a general purpose language is a very important difference between it and the other two. It essentially allows me to use the full power of ruby any time I need it, at the cost of having to do a few odd looking things to ensure the rake scripts are valid ruby. Since ruby is an unobtrusive language, there's not much in the way of syntactic oddities. Furthermore since ruby is a full blown language, I don't need to drop out of the DSL to do interesting things - which has been a regular frustration using make and ant. Indeed I've come to view that a build language is really ideally suited to an internal DSL because you do need that full language power just often enough to make it worthwhile - and you don't get many non-programmers writing build scripts.

Ruby's syntactic freedom and operator-overloading combine to define a relatively clean build language. I hadn't thought of operator overloading as a language-construction tool before. But in that context, it makes a lot of sense. It's a given that things take on new meanings in a different language, whether you're talking about computer languages or natural languages.

By now, too, the cryptic constructs had become more familiar, so I didn't mind them as much. That creates an internal conflict, of course. I like to say that "Human adaptability is no excuse for a poor design." But I've adapted! And now that I'm used to it, I do like the short little syntax of $_ compared to a longer construct like #{line}. I even read the underscore as "line"! So I'm conflicted between ease of writing and ease of reading. Sigh. But I think I'll learn to live with myself--mostly because coding in Ruby is turning out to be a lot of fun.

Ruby -- Coding can be Fun

As I went deeper, I began to appreciate Ruby much more. I had long missed C's ability to send a function pointer as an argument. In Ruby, you do that with a /closure/ (a code segment in braces that can take arguments). That became clear when I saw how you execute one inside of a method that it has been passed to.

I'm still trying to fully grok the concept of closures and find the best way to explain it. For now, here's what I've got: Think "C-style function pointer (or anonymous class) passed to an iterator, with an adapter interface that lets the receiver call the 'doit' method in the function". Only instead of coding all that stuff by hand, you use a few tiny pieces of syntax to define the function: {|param| ...code...}. I mean, dang. Talk about productivity. Talk about ease of programming.

You want to process a collection? No need to code an iterator. Call "each" and pass it the code you want: myList.each {|x| ...do something with x...}. (Note the implied contract. The "each" method is defined to invoke the closure with an argument. That's not exactly obvious, at first glance. Most discussions of closures don't seem to mention that contract, which may tend to make closures more of a mystery than they need to be. On the other hand, it may be harder to understand what's really going on than to simply treat it as magic.)
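The contract is easier to see from the receiving side. In this sketch, visit_all is a hypothetical method (not part of Ruby) that walks an array by hand and uses yield to run whatever block the caller attached, passing one element each time.

```ruby
# Hypothetical iterator written out by hand to show the contract.
def visit_all(items)
  index = 0
  while index < items.length
    yield items[index]   # run the caller's block with one argument
    index += 1
  end
end

seen = []
visit_all(["a", "b", "c"]) { |x| seen << x.upcase }
p seen   # prints ["A", "B", "C"]
```

The built-in each makes the same promise: one call of the block per element, with the element as the argument.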

Implement some other methods that take closures, as well, and make them part of the standard--methods like collect and find. Then you pass a closure to the find method that identifies the kind of thing you want: myList.find {|x| ...return true when x is a sports car or motorcycle...}.
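A short sketch of those two methods on an ordinary array. The list of vehicles is made up for the example.

```ruby
vehicles = ["sedan", "motorcycle", "minivan", "sports car"]

# find returns the first element for which the block returns true.
fun_one = vehicles.find { |v| v == "sports car" || v == "motorcycle" }
puts fun_one   # prints motorcycle

# collect builds a new array from the block's return values.
shouted = vehicles.collect { |v| v.upcase }
p shouted      # prints ["SEDAN", "MOTORCYCLE", "MINIVAN", "SPORTS CAR"]
```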

But as good as closures are, don't stop there. Make the language dynamic, so you can iterate over pretty much anything. So you can iterate over the parameters defined in a Struct, for example, and pass a function to them to print them out--and make it super simple to get to the name of the property, so you can print that as well.

Speaking of Struct, that's a nice little 'class' that doesn't have any methods. Just properties that are always accessed with getters and setters--but use the same syntax whether or not the methods are present. Then, if you decide to add a setter method later to log changes, you can do it without changing the code everywhere the property is referenced. (I think that's the way it works, anyway.)
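Here is a small sketch of both ideas: iterating over a Struct's properties by name, and adding a logging setter later without touching the callers. Car and change_log are hypothetical names invented for the example.

```ruby
# Car is a hypothetical example type.
Car = Struct.new(:make, :color) do
  # A setter added after the fact to log changes. Callers still write
  # car.color = "blue", exactly as they did with the generated setter.
  def color=(new_color)
    change_log << "color: #{color} -> #{new_color}"
    self[:color] = new_color
  end

  def change_log
    @change_log ||= []
  end
end

car = Car.new("Mini", "red")
car.color = "blue"

# each_pair hands the block both the property name and its value.
car.each_pair { |name, value| puts "#{name}: #{value}" }
# prints:
#   make: Mini
#   color: blue
p car.change_log   # prints ["color: red -> blue"]
```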

Then add Module--a superclass of class that can't be instantiated, but which can easily be included into a class you're writing (aka "mix-ins"). The methods don't have to be declared static. They already are, because they're in a Module. That was a Perlism worth adopting.
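A minimal mix-in sketch. Describable and Tool are hypothetical names. In this example the module's method arrives as an ordinary instance method of the class that includes it, and the module itself has no "new" to call.

```ruby
module Describable
  # Relies on the including class to provide a "name" method.
  def describe
    "#{self.class.name}: #{name}"
  end
end

class Tool
  include Describable
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

puts Tool.new("Rake").describe    # prints Tool: Rake
p Class.superclass                # prints Module
p Describable.respond_to?(:new)   # prints false
```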

But wait, there's more. Not only can you add to a class you're writing by including a Module, you can truly "extend" an existing class by adding stuff to it. When you do that, you're not making a copy with a new name, you're adding to the existing class. So if there's a feature you want in String, you can add it. Nice. And you can make the additions to other code that has been using String, without having to redeclare every variable in town.
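A sketch of adding to the existing String class. The shout method is a made-up example. Note that the string is created before the method is added and still picks it up.

```ruby
greeting = "hello"   # an ordinary String, created before the addition

class String
  # Hypothetical method added for illustration.
  def shout
    upcase + "!"
  end
end

puts greeting.shout   # prints HELLO!
```

The flip side is that every String in the program now has the method, so additions like this are worth naming carefully.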

And by all means build in regular-expression string processing. My one quibble: The examples I saw had syntax that seemed backwards to me. They had /regularExpression/ =~ string to see if there is a match, rather than string =~ /regularExpression/. But it turns out that both coding styles are possible. That's basically the way it should be. If A = B, then B = A. Similarly, if /RE/ =~ A, then A =~ /RE/. A match is a match is a match.
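Both orderings, side by side. The sample line is invented for the example. The operator returns the index where the match starts, or nil when nothing matches.

```ruby
line = "To: eric@example.com"

p(line =~ /@/)    # prints 8, the index where the match starts
p(/@/ =~ line)    # prints 8 as well
p(line =~ /\d/)   # prints nil, since there is no digit to match

puts "has an address" if line =~ /\S+@\S+/
```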

One feature that seemed mysterious and possibly inadvisable at first glance was parallel assignment: a, b = 1, 2, which assigns 1 to a and 2 to b. (Pardon me if I get syntax wrong. I'm a concept guy. The compiler corrects me all the time.) What was the value of parallel assignment, I wondered? Then I saw this in the explanation of the yield statement, in Ruby in a Nutshell:

The expression passed to yield is assigned to the block's argument(s). Parallel assignment is performed when multiple expressions are passed.

In other words, parallel assignment is the internal glue that takes the collection of arguments passed to the yield method and binds them to the collection of parameters expected by the block. Pretty darn cool. And powerful.
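A sketch of both uses. The helper each_with_position is hypothetical. It passes two expressions to yield, and the block receives them in two parameters.

```ruby
a, b = 1, 2
a, b = b, a     # swap without a temporary variable
p([a, b])       # prints [2, 1]

# Hypothetical helper: yields two values per element.
def each_with_position(items)
  position = 0
  items.each do |item|
    position += 1
    yield position, item   # two expressions passed to yield
  end
end

each_with_position(["first", "second"]) do |position, item|
  puts "#{position}. #{item}"
end
# prints:
#   1. first
#   2. second
```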

Note: That seems a lot like closures, to me. But I'm told that blocks and closures are basically different things. Someday, I'm going to understand that better. When I do, I'm going to write more.

There's also a powerful case statement, of the kind I haven't seen since the Icon language, of Snobol lineage. Like Ruby, it had powerful string processing and built-in regular expression processing. It also had the same kind of case statement--one where you could put an expression at the top of the statement and compare it to arbitrary expressions in the when-clauses. So you can say "when testObject.color == blue" and "when testObject.length > 40", and things like that--all in the same case statement. It's not something you need a lot, but when you do need it, you can say exactly what you mean.
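A sketch of the two forms of the statement. TestObject, classify, and describe are hypothetical names. The arbitrary-condition style is written with nothing after "case". When there is an expression after "case", each when-clause is matched against it with the === operator.

```ruby
# TestObject is a hypothetical type for illustration.
TestObject = Struct.new(:color, :length)

# With no expression after "case", each when-clause is its own condition.
def classify(obj)
  case
  when obj.color == :blue then "blue"
  when obj.length > 40    then "long"
  else "plain"
  end
end

# With an expression after "case", each when-clause is matched using ===,
# so classes and regular expressions both work as patterns.
def describe(value)
  case value
  when Integer    then "a number"
  when /\A\d+\z/  then "a string of digits"
  when String     then "some other string"
  else "something else"
  end
end

puts classify(TestObject.new(:blue, 10))   # prints blue
puts classify(TestObject.new(:red, 50))    # prints long
puts describe(42)                          # prints a number
puts describe("42")                        # prints a string of digits
puts describe("forty-two")                 # prints some other string
```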

There's lots of good stuff in the library, too. There's a terrific, easy to use XML parser (REXML), web-enabling packages for HTTP, mail, and CGI. A package for generating HTML. Stuff for database access and unit testing. The list goes on.

Then there's the Rails framework. That one is exciting if only because it pretty much eliminates configuration files. Like Beans, Rails depends on coding conventions. You follow the conventions, and the framework uses reflection to do all of the wiring for you. Very nice.

Then there are features I haven't even begun to use, like generating code inside a program and then executing it--lambda functions and Lisp-like stuff like that. That's one benefit of having a symbol-definition construct: You can define variable names and method names dynamically, and then use them. That sort of thing is going to be powerful, when I finally manage to wrap my head around it. So I get to have fun today, and even greater power tomorrow. Very enticing.
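A small taste of what that looks like, as a sketch. Settings and its field names are invented for the example. define_method and send are standard Ruby, and both take the method name as a symbol or string decided at run time.

```ruby
class Settings
  def initialize
    @values = {}
  end

  # Generate a getter and a setter for each name in the list.
  [:from, :to, :subject].each do |field|
    define_method(field) { @values[field] }
    define_method("#{field}=") { |value| @values[field] = value }
  end
end

settings = Settings.new
settings.subject = "Hello"
puts settings.subject        # prints Hello

# send looks a method up by name at run time.
settings.send(:from=, "me@example.com")
puts settings.send(:from)    # prints me@example.com

# A lambda is a chunk of code stored in a variable and run later.
double = lambda { |x| x * 2 }
puts double.call(21)         # prints 42
```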

The Small Stuff

Ruby has the things you expect in an OO language, like threading, exceptions, and exception handlers. But it also has a raft of smaller features that just plain make life easier. Sometimes, it's the accumulation of small advantages that paves the way to productivity, rather than the grand concepts. Ruby has both. Here's a smattering of the goodnesses:

Adjacent strings automatically concatenate. No need to code "+" or "&" or what have you to put them together.
Inline strings that can include newlines, or "here" documents. You start one with <<, and specify an end-tag--say XYZ. That gives you <<XYZ. The string begins on the next line, which makes it easy to line up your text. The string ends when XYZ is found at the beginning of a subsequent line. But it gets even better than that. The string starts on the next line, even if other syntax elements exist on the initial line. So the code for sending a message can look like this:
smtp.sendmail(<<BODY, from, to)
Here is the text of the message.
Here is line two.
BODY
Note that the body of the message starts on the second line. How cool is that? Maybe it's odd. But I liked it. Of course, the text all has to start in the first column so, in practice, here-documents will be much more useful when defining constants--like when you're writing usage messages, for example. For uses like that, here-documents are invaluable.
Retry. Get an exception. Fix the problem in the rescue clause. Issue "retry" to start the method over again.

BEGIN/END segments for stuff you want to run at the start or end of the program. Simple. Easy.

gets. If a file was specified on the command line, reads that. Otherwise, reads standard input. Puts the value in the standard location ($_). Not even one line of code--one word of code. Easy. Simple.

Read all lines of a file into memory with array = File::readlines(path)

String functions like center, include?, split, slice, squeeze, strip. Plus regular expression processing that's built into the language: /...regular expression here.../.

-p and -n command line options. Like Perl, the interpreter iterates over the contents of a file and executes your code on each line. It then either automatically prints the line (-p), or doesn't print unless you say so (-n). Those come in handy for little jobs from time to time. It's nice to have one language for that kind of thing that you can also use for larger stuff--like the mailing list program.
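A few of the items in the list above, sketched in one place. The helpers fake_sendmail and flaky_fetch are hypothetical stand-ins so the example runs without a mail server or a failing disk.

```ruby
# Adjacent string literals are joined into one string.
usage = "usage: mailer " "LIST_FILE " "MESSAGE_FILE"
puts usage   # prints usage: mailer LIST_FILE MESSAGE_FILE

# Hypothetical stand-in for a real mail call; it just formats its arguments.
def fake_sendmail(body, from, to)
  "From: #{from}\nTo: #{to}\n\n#{body}"
end

# A here-document as the first argument; its text starts on the next line.
mail = fake_sendmail(<<BODY, "me@example.com", "you@example.com")
Here is the text of the message.
Here is line two.
BODY
puts mail

# retry: the rescue clause fixes the problem, then the method body runs again.
def flaky_fetch(failures_left)
  raise IOError, "not ready" if failures_left > 0
  "done"
rescue IOError
  failures_left -= 1   # "fix the problem"
  retry
end

puts flaky_fetch(2)   # prints done

# A few of the string methods.
p("  padded  ".strip)         # prints "padded"
p("aaabbb".squeeze)           # prints "ab"
p("a,b,c".split(","))         # prints ["a", "b", "c"]
p("hi".center(6, "*"))        # prints "**hi**"
p("hello".include?("ell"))    # prints true
```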

Building a Mailing List Processor

The rubber meets the road when you sit down to build something. That's when you begin to answer the question, "how much work is it going to require to get stuff done?" When you're looking at a long list of backlogged projects, that becomes a pretty important question.

The answer, in Ruby, keeps turning out to be "not a heck of a lot". Reading mailing addresses out of a file: A few lines of code. Sending an SMTP message: A few more lines. Processing arguments: A couple more. Reading the message from a file: A one-liner.
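To give a sense of scale, here is a rough sketch of those pieces. It is not the program described in this post: the file format (one address per line, # for comments), the method names, and the example addresses are all assumptions. The deliver method uses Net::SMTP from the standard distribution and is defined but not called, since it needs a live mail server.

```ruby
require "stringio"

# Read one address per line, skipping blanks and # comments (format assumed).
def read_addresses(io)
  io.each_line.map { |line| line.strip }
    .reject { |line| line.empty? || line.start_with?("#") }
end

# Build a bare-bones message: headers, a blank line, then the body.
def build_message(from, to, subject, body)
  "From: #{from}\r\nTo: #{to}\r\nSubject: #{subject}\r\n\r\n#{body}"
end

# Send one copy per recipient. Not called below; it needs a real SMTP host.
def deliver(host, from, recipients, subject, body)
  require "net/smtp"   # loaded lazily so the rest runs without it
  Net::SMTP.start(host, 25) do |smtp|
    recipients.each do |to|
      smtp.send_message(build_message(from, to, subject, body), from, to)
    end
  end
end

# In the real program the list would come from a file named in ARGV, and
# the message body from File.read(message_path). A StringIO stands in here.
list = StringIO.new("ann@example.com\n\n# on vacation\nbob@example.com\n")
recipients = read_addresses(list)
p recipients   # prints ["ann@example.com", "bob@example.com"]
puts build_message("me@example.com", recipients.first, "Hello", "Hi there.")
```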

The last problem to solve: Reading named parameters from a configuration file, so I can specify values like the from-address in one place. I haven't quite found the solution yet, but I'm sure it's short. It should be as simple as reading a Struct from a file. I haven't quite put the pieces together yet, but I know the answer is there.
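One possible shape for that, offered as a sketch rather than the solution: assume a plain text file of "name = value" lines and load them into a Struct. MailerConfig, its field names, and the file format are all assumptions made for the example.

```ruby
# Hypothetical settings for the mailer; the field names are assumptions.
MailerConfig = Struct.new(:from, :subject, :smtp_host)

# Parse "name = value" lines, skipping blank lines and # comments.
def parse_config(text)
  config = MailerConfig.new
  text.each_line do |line|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    name, value = line.split("=", 2).map { |part| part.strip }
    config[name] = value   # raises NameError for a name the Struct lacks
  end
  config
end

sample = <<SETTINGS
# mailer settings
from = me@example.com
subject = Monthly news
smtp_host = localhost
SETTINGS

config = parse_config(sample)
puts config.from        # prints me@example.com
puts config.subject     # prints Monthly news
```

Reading from a real file would be parse_config(File.read(path)). Because the Struct rejects unknown names, a typo in the file fails loudly instead of being silently ignored.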

If it weren't for the learning curve, I'd be done by now. Reading books, finding examples, understanding how things work--I've spent a little while doing that. But the code is essentially trivial. That's the way it should be! It gives me the happy feeling that the time invested in acquiring understanding will pay off handsomely in the future.

This post has gotten too long already, so I'll save the code snippets for another time. But I had to share my delight with Ruby and Rake. As always, the delight will last until I find even better tools--but it might be quite a while before that happens.


About the Blogger

Eric Armstrong has been programming and writing professionally since before there were personal computers. His production experience includes artificial intelligence (AI) programs, system libraries, real-time programs, and business applications in a variety of languages. He works as a writer and software consultant in the San Francisco Bay Area. He wrote The JBuilder2 Bible and authored the Java/XML programming tutorial available at http://java.sun.com. Eric is also involved in efforts to design knowledge-based collaboration systems.

