Andy Hunt and Dave Thomas are the Pragmatic Programmers, recognized internationally as experts in the development of high-quality software. Their best-selling book of software best practices, The Pragmatic Programmer: From Journeyman to Master (Addison-Wesley, 1999), is filled with practical advice on a wide range of software development issues. They also authored Programming Ruby: A Pragmatic Programmer's Guide (Addison-Wesley, 2000), and helped to write the now famous Agile Manifesto.
In this interview, which is being published in ten weekly installments, Andy Hunt and Dave Thomas discuss many aspects of software development.
Dave Thomas: All programming is maintenance programming, because you are rarely writing original code. If you look at the actual time you spend programming, you write a bit here and then you go back and make a change. Or you go back and fix a bug. Or you rip it out altogether and replace it with something else. But you are very quickly maintaining code even if it's a brand new project with a fresh source file. You spend most of your time in maintenance mode. So you may as well just bite the bullet and say, "I'm maintaining from day one." The disciplines that apply to maintenance should apply globally.
Andy Hunt: It's only the first 10 minutes that the code's original, when you type it in the first time. That's it.
Bill Venners: What's the DRY principle?
Dave Thomas: Don't Repeat Yourself (or DRY) is probably one of the most misunderstood parts of the book.
Bill Venners: How is DRY misunderstood and what is the correct way to understand it?
Dave Thomas: Most people take DRY to mean you shouldn't duplicate code. That's not its intention. The idea behind DRY is far grander than that.
DRY says that every piece of system knowledge should have one authoritative, unambiguous representation. Every piece of knowledge in the development of something should have a single representation. A system's knowledge is far broader than just its code. It refers to database schemas, test plans, the build system, even documentation.
Given all this knowledge, why should you find one way to represent each feature? The obvious answer is, if you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don't, you're guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur. DRY is important if you want flexible and maintainable software.
The problem is: how do you represent all these different pieces of knowledge only once? If it's just code, then you can obviously organize your code so you don't repeat things, with the help of methods and subroutines. But how do you handle things like database schemas? This is where you get into other techniques in the book, like using code generation tools, automatic build systems, and scripting languages. These let you have single, authoritative representations that then generate non-authoritative work products, like code or DDLs (data description languages).
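As an illustration of a single authoritative representation that generates non-authoritative work products, here is a minimal sketch in Python. The table, its columns, and the function names are all hypothetical and not taken from the book; a real generator would likely need to handle keys, constraints, and many more types.

```python
# Hypothetical single source of truth for one table; every name is invented.
# Each column is (column name, SQL type, Python type name).
MEMBER_SCHEMA = {
    "table": "member",
    "columns": [
        ("id", "INTEGER", "int"),
        ("name", "VARCHAR(80)", "str"),
        ("joined_on", "DATE", "str"),
    ],
}


def generate_ddl(schema):
    """Render a CREATE TABLE statement from the schema description."""
    cols = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type, _ in schema["columns"]
    )
    return f"CREATE TABLE {schema['table']} (\n{cols}\n);"


def generate_class(schema):
    """Render Python source for a dataclass mirroring the same columns."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "",
        "@dataclass",
        f"class {schema['table'].capitalize()}:",
    ]
    lines += [f"    {name}: {py_type}" for name, _, py_type in schema["columns"]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_ddl(MEMBER_SCHEMA))
    print(generate_class(MEMBER_SCHEMA))
```

Adding a column means editing MEMBER_SCHEMA once and regenerating; the DDL and the class are never edited by hand, so they cannot drift apart.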
Bill Venners: If you build a code generator to avoid duplication, you must invest the time to build and maintain the code generator. You have to explain the code generator to other team members. Therefore, creating a code generator has costs as well as benefits. How do you decide when the return on investment is great enough to actually justify building one?
Dave Thomas: You never build a code generator just because you feel like it. You build one because you are motivated by some underlying principle, like avoiding duplication. Or, sometimes you might want to encapsulate the knowledge of an expert and make it available to other people. For example, you can tell a Microsoft IDE that you want a new MFC (Microsoft Foundation Classes) application, and it will generate 2000 lines of code for you. Most people who use it, to their shame, don't understand that code. Some expert at Microsoft said this is the way it should be, and the whole world followed suit. Don't use code generators because you fancy doing it. Use them because there's a business benefit to doing so.
That's true for all the tools we recommend using. Don't use them just because they're there. Use them because they fit into the overall philosophy of how you want to develop.
Andy Hunt: Creating a code generator is an investment. You're banking that it will be cheaper in the long run to build the code generator, because as changes come up you can simply tweak the input to the code generator and regenerate the byproducts. Without a code generator, you will have to change all the byproducts by hand each time. If you expect a lot of volatility, a code generator can be a good investment.
Dave Thomas: In fact, I'd go further. Typically, if I build a code generator, chances are good I will generate the products manually first. And only when I come back to it will I say, "I'm now in a situation where I need to automate this." That approach has two benefits. First, I might never come back, in which case I don't have to write the code generator. Second, if I do come back, I've already validated what the code generator's output should look like. So I don't just march off into the unknown; I actually aim at a definite target.
Bill Venners: I once had a manager who discouraged me from making a code generator tool by using the argument, "But then we'll have to maintain the tool." I later decided he was right in that case, because it was better for the company that I just write the code by hand. In a different case, though, I created a code generation tool that paid for itself. We had a database whose schema changed from release to release. My code generator read in an SQL database schema, which we had to create anyway to change the database, and generated the layer, a C module, between the database and the code. Every time we changed the database, boom, we just regenerated the layer. It saved time and bugs, because once we got the bugs out of the code generator, the C code it generated never contained any bugs.
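A generator of the kind described here, one that reads SQL and emits C, might be sketched roughly as follows in Python. This is not the tool from the anecdote: the type mapping, the function names, and the restriction to a single flat CREATE TABLE statement are all assumptions made to keep the example short. In particular, it does not handle types whose parameters contain commas, such as DECIMAL(10,2).

```python
import re

# Assumed, deliberately small mapping from SQL types to C declarations.
SQL_TO_C = {
    "INTEGER": "int {name};",
    "REAL": "double {name};",
}
VARCHAR = re.compile(r"VARCHAR\((\d+)\)$")


def c_field(name, sql_type):
    """Translate one column into one C struct member."""
    match = VARCHAR.match(sql_type)
    if match:
        # Leave room for the terminating NUL byte.
        return f"char {name}[{int(match.group(1)) + 1}];"
    return SQL_TO_C[sql_type].format(name=name)


def generate_struct(ddl):
    """Turn a single simple CREATE TABLE statement into a C struct."""
    header = re.search(r"CREATE TABLE (\w+)\s*\((.*)\)\s*;", ddl, re.S)
    table, body = header.group(1), header.group(2)
    fields = []
    for column in body.split(","):
        name, sql_type = column.split()
        fields.append("    " + c_field(name, sql_type.upper()))
    return f"struct {table} {{\n" + "\n".join(fields) + "\n};"


if __name__ == "__main__":
    ddl = """CREATE TABLE member (
        id INTEGER,
        name VARCHAR(80),
        balance REAL
    );"""
    print(generate_struct(ddl))
```

The SQL file stays the authoritative description. When the schema changes, the struct is regenerated rather than edited.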
Dave Thomas: There's also a subtle effect that comes about when you create a code generator. If you remove the friction for change, you'll find yourself making changes you need to make more often. That's a good thing.
Andy Hunt: You're not resisting the process.
Dave Thomas: Right. Maybe a schema change is not a great example, because you don't make gratuitous schema changes. But if you have a code generator that makes something painless, then you're more likely to use it. That means you're more likely to keep things tidy and clean. And you'll extend the life of your software. It's a good habit to follow.
Andy Hunt: This relates to working with metadata, an idea presented later in the book. The idea is to work closer to the level of what you want to express. In the case of schema change, you'll go into an SQL file and make a schema change. You don't have to change member variables and data fields just because the schema changed; the code generator does that for you. When you need to make a schema change, you make it and hit the button. Metadata keeps the act you must perform commensurate with the change itself. You want to avoid, at all costs, a little change that requires a bunch of other tasks. That kind of magnification kills many projects. The simple code generator or equivalent metadata technique keeps the actions commensurate.
Dave Thomas: This extends to the idea of orthogonality. If you have a truly orthogonal system, unrelated elements are expressed independently. Here you have a business-level change: one thing changes at the business level and one thing changes in the system. If the boss says, "I want negative numbers red," you change one thing and suddenly all the negative numbers in the system are red.
Andy Hunt: That sounds a lot like DRY: One piece of knowledge in the domain changes, one piece of the system changes.
Dave Thomas: That almost happened to me about a month ago. I was working with a large web application that dealt with registrations in an online membership system. The numbers started out small, a couple hundred here and there. As it grew to tens of thousands, the client found the numbers difficult to read, so they requested commas in all the numbers. Luckily, I only had one change to make, and every number the system output then had commas in it. That's an important thing to think about. Code generators let you do that, because they let you have one expression of something.
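The "one change" in a story like this could look something like the following Python sketch, in which every piece of output goes through a single formatting function. The function names and the report lines are hypothetical and are not the code from the membership system described above.

```python
def format_count(n):
    """The one place that decides how whole numbers are displayed."""
    # Switching from str(n) to a comma-grouped format is the single edit.
    return f"{n:,}"


def membership_line(region, count):
    """One of many output routines; it never formats numbers itself."""
    return f"{region}: {format_count(count)} members"


def total_line(counts):
    """Another output routine that relies on the same formatter."""
    return f"Total registrations: {format_count(sum(counts))}"


if __name__ == "__main__":
    print(membership_line("North", 1200))
    print(total_line([1200, 23456]))
```

Because the display rule lives only in format_count, a later request, such as showing negative numbers differently, would again touch one function rather than every caller.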
Bill Venners: What does orthogonality mean and why is it good?
Andy Hunt: The basic idea of orthogonality is that things that are not related conceptually should not be related in the system. Parts of the architecture that really have nothing to do with each other, such as the database and the UI, should not need to be changed together. A change to one should not cause a change to the other. Unfortunately, we've seen systems throughout our careers where that's not the case.
Dave Thomas: For example, when one client changed the number of lines on the screen, they had to change the database schema too.
Andy Hunt: This is what computer scientists call coupling: one thing is tied, or coupled, to another. There are exceptions and tradeoffs, but in most cases you want to minimize coupling between things that are otherwise unrelated. It is easy in an OO system to accidentally introduce coupling. You can do it just by the way you set up libraries, if you're working in a language like C or C++, where you link things together. If you want to make a little program to test out one API or interface and you have to link in every library in the system, you have too much coupling. Your system is not orthogonal.
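The "little test program" check can be illustrated with a short Python sketch. It assumes a hypothetical roster report that depends only on a narrow interface, so it can be exercised with an in-memory stand-in and no database at all. The class and function names are invented for the example.

```python
from typing import Iterable, Protocol


class MemberSource(Protocol):
    """Hypothetical narrow interface: the only thing the report needs."""

    def member_names(self) -> Iterable[str]: ...


def render_roster(source: MemberSource) -> str:
    """Build the roster text without knowing where the names come from."""
    names = sorted(source.member_names())
    return "\n".join(f"{i}. {name}" for i, name in enumerate(names, start=1))


class InMemoryMembers:
    """Stand-in used to exercise render_roster with no database present."""

    def __init__(self, names):
        self._names = list(names)

    def member_names(self):
        return self._names


if __name__ == "__main__":
    print(render_roster(InMemoryMembers(["Dave", "Andy"])))
```

If render_roster instead opened a database connection itself, trying it out would drag in the database layer, which is the kind of unnecessary coupling described above.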
Lack of orthogonality is one of those creeping viral problems. If you introduce some new functionality and you realize you've coupled it to something unnecessarily, you might say, "These two things shouldn't really know about each other, but it's OK. It's just these two things." But the next functionality you add might also know about something it shouldn't. Soon you have four things that know about each other that shouldn't. The problem grows somewhat unexpectedly. You get a system that quickly becomes a nightmare. One way we illustrate a highly coupled system in the book is the helicopter story.
Bill Venners: I thought the helicopter story was a great illustration. Why don't you tell it?
Dave Thomas: A helicopter has four main controls: foot pedals, collective pitch lever, cyclic, and throttle. The foot pedals control the tail rotor. With the foot pedals you can counteract the torque of the main blade and, basically, point the nose where you want the helicopter to go. The collective pitch lever, which you hold in your left hand, controls the pitch on the rotor blades. This lets you control the amount of lift the blades generate. The cyclic, which you hold in your right hand, can tip one section of the blade. Move the cyclic, and the helicopter moves in the corresponding direction. The throttle sits at the end of the pitch lever.
It sounds fairly simple. You can use the pedals to point the helicopter where you want it to go. You can use the collective to move up and down. Unfortunately, though, because of the aerodynamics and gyroscopic effects of the blades, all these controls are related. So one small change, such as lowering the collective, causes the helicopter to dip and turn to one side. You have to counteract every change you make with corresponding opposing forces on the other controls. However, by doing that, you introduce more changes to the original control. So you're constantly dancing on all the controls to keep the helicopter stable.
That's kind of similar to code. We've all worked on systems where you make one small change over here, and another problem pops out over there. So you go over there and fix it, but two more problems pop out somewhere else. You constantly push them back—like that Whack-a-Mole game—and you just never finish. If the system is not orthogonal, if the pieces interact with each other more than necessary, then you'll always get that kind of distributed bug fixing.
The funny thing about the helicopter story is that I'm not a helicopter pilot. When I wrote the helicopter story, I wanted to make sure it was accurate. I knew of a USENET group on helicopters, so I posted the helicopter story saying, "This is what I'm intending to write about how helicopter controls work. Is it correct?" A helicopter pilot emailed me and said, "I read what you wrote about controlling helicopters. I didn't sleep all night."
Come back Monday, March 17 for Part III of this conversation with Pragmatic Programmers Andy Hunt and Dave Thomas.
Bill Venners is President of Artima Software, Inc. and Editor-In-Chief of Artima.com. He is the author of Inside the Java Virtual Machine (Computing McGraw-Hill), a programmer-oriented survey of the Java platform's architecture and internals. His popular columns in JavaWorld magazine covered Java internals, object-oriented design, and Jini. Bill has been active in the Jini Community since its inception. He led the Jini Community's ServiceUI project, whose ServiceUI API became the de facto standard for associating user interfaces with Jini services. Bill also serves as an elected member of the Jini Community's initial Technical Oversight Committee (TOC), and in this role helped to define the governance process for the community.