This post originated from an RSS feed registered with Java Buzz
by Elliotte Rusty Harold.
Original Post: Regular Expressions
Feed Title: The Cafes
Here’s part 12 of the ongoing serialization of Refactoring HTML, also available from Amazon and Safari.
Manually inspecting and changing each file on even a small site is tedious and often cost-prohibitive. It is much more effective to let the computer do the work by searching for mistakes and, when possible, automatically fixing them. A number of tools support this, including command-line tools such as grep, egrep, and sed; text editors such as jEdit, BBEdit, TextPad, and PSPad; and programming languages such as Java, Perl, and PHP. All these tools provide a specialized search syntax known as regular expressions. Although there are small differences from one tool to the next, the basic regular expression syntax is much the same.
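As a rough illustration of what such a search and fix can look like in one of those languages, here is a minimal Java sketch. The class name BrFixer and the choice of mistake (unclosed br tags) are assumptions made for this example. The only library it uses is the standard java.util.regex package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; the name and the targeted mistake are illustrative only.
final class BrFixer {

    // Matches <br>, <BR>, <br > and similar unclosed forms, but not <br/> or <br />.
    private static final Pattern UNCLOSED_BR =
            Pattern.compile("<br\\s*>", Pattern.CASE_INSENSITIVE);

    /** Counts the unclosed br tags in the given markup. */
    static int count(String html) {
        Matcher m = UNCLOSED_BR.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    /** Returns the markup with every unclosed br tag rewritten as <br />. */
    static String fix(String html) {
        return UNCLOSED_BR.matcher(html).replaceAll("<br />");
    }

    public static void main(String[] args) {
        String html = "<p>one<br>two<BR >three<br />four</p>";
        System.out.println(count(html) + " unclosed tags found");
        System.out.println(fix(html));
    }
}
```

In an editor's search dialog the same pattern would be typed with a single backslash, <br\s*>, because the doubled backslash is only Java string escaping.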
For purposes of illustration, I’m going to use the jEdit text editor as my search and replace tool in this section. I chose it because it provides pretty much all the features you need, it has a reasonable GUI, it’s open source, and it’s written in Java, so it runs on essentially any platform you’re likely to want. You can download a copy from http://jedit.org/.
However, the techniques I’m showing here are by no means limited to that one editor. In my own work, I normally use BBEdit instead because it has a slightly nicer interface, but it’s payware and runs only on the Mac. There are numerous other choices. If you prefer a different program, by all means use it. What you’ll need is:
Full regular expression search and replace
The ability to recursively search a directory
The ability to filter the files you search
The ability to show you what it has changed without requiring you to manually approve each change
Automatic recognition of different character encodings and line-ending conventions
Any tool that meets these criteria should be sufficient.
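To make the criteria concrete, here is a small Java sketch of what a tool meeting most of them does under the hood. It is not how jEdit or BBEdit is implemented. The class SiteFixer, the method replaceInTree, the .html suffix filter, and the br pattern are all hypothetical names and choices for this example. Note one deliberate gap: it does not detect character encodings but simply assumes UTF-8, so it is best tried on a copy of the site or on files under version control.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class, not part of any editor or library.
final class SiteFixer {

    /**
     * Applies a regex replacement to every regular file under root whose name
     * ends with the given suffix, and returns the files that actually changed.
     */
    static List<Path> replaceInTree(Path root, String suffix,
                                    Pattern pattern, String replacement)
            throws IOException {
        List<Path> candidates;
        // Files.walk descends into subdirectories; the stream must be closed.
        try (Stream<Path> paths = Files.walk(root)) {
            candidates = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }

        List<Path> changed = new ArrayList<>();
        for (Path file : candidates) {
            // Assumption: every file is UTF-8. Reading the whole file as one
            // string leaves the existing line endings untouched.
            String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String after = pattern.matcher(before).replaceAll(replacement);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                changed.add(file);
            }
        }
        return changed;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Pattern unclosedBr = Pattern.compile("<br\\s*>", Pattern.CASE_INSENSITIVE);
        // Report every modified file afterward instead of asking for approval.
        for (Path file : replaceInTree(root, ".html", unclosedBr, "<br />")) {
            System.out.println("changed: " + file);
        }
    }
}
```

Run with a directory as its only argument, it rewrites matching files in place and prints one line per file it changed. Files that contain no match are never rewritten.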