Summary
Sometimes code duplication is warranted, even good.
I often read that code duplication is bad, and that when we find it, we should refactor. In most cases, that's probably true, but not always. Sometimes code duplication is warranted, even good.
Consider a component that you are creating. You know there is another component available that has a small amount of behavior you'd like to use. You could reference that other component and avoid duplication. Is the benefit of avoiding code duplication worth the extra deployment dependency, complexity, and configuration issues that might be involved? It may not be.
Or consider the case where you'd like to use some of what a third-party component has to offer, but you don't want to depend on that component because the dependency carries quite a bit of overhead. Might you consider duplicating the code to avoid the heavyweight dependency? In my opinion, it's certainly a viable option.
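To make the scenario concrete, here is a minimal Java sketch, assuming a hypothetical component (`CustomerValidator`) that needs only a blank-string check of the kind many utility libraries provide. Rather than adding such a library as a deployment dependency, the few lines needed are copied into a private helper. All names, and the 50-character limit, are illustrative assumptions. The cost of this choice is that any fix to the original behavior has to be applied to this copy by hand.

```java
// Hypothetical component that deliberately duplicates a tiny helper
// instead of depending on a heavyweight third-party utility library.
final class CustomerValidator {

    // Illustrative limit; not taken from any real specification.
    private static final int MAX_NAME_LENGTH = 50;

    private CustomerValidator() {
        // Static utility; no instances.
    }

    // Duplicated behavior: a null-safe blank check like those found in
    // common utility libraries. Copying these lines avoids a JAR dependency.
    private static boolean isBlank(String s) {
        if (s == null) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // The component's own behavior, built on the duplicated helper.
    static boolean isValidName(String name) {
        return !isBlank(name) && name.trim().length() <= MAX_NAME_LENGTH;
    }

    public static void main(String[] args) {
        System.out.println(isValidName("Ada Lovelace")); // true
        System.out.println(isValidName("   "));          // false
        System.out.println(isValidName(null));           // false
    }
}
```

The whole "dependency" here is about a dozen lines that the component owns outright, which is exactly the trade-off being weighed: a little duplication in exchange for no extra JAR to deploy and configure.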
In Component Software, Szyperski states that "Although maximizing reuse has many oft-cited advantages, it has one substantial disadvantage - the explosion of context dependencies." And herein lies the rub with ideas such as the DRY principle: it is not universally applicable. No principle is, actually. There will always be some condition that is grounds for violating almost anything you've been told.
While principles, patterns, heuristics, and guidelines offer valuable advice about how to do things the right way, they are no substitute for thinking for ourselves. Unfortunately, I think we all too often let these principles, etc., do the thinking for us.
All the most ardent "Duplication Haters" I'm aware of - The Pragmatic "be wary of the hype" Programmers, Kevlin "use comes before reuse" Henney, Uncle "I'd rather use a socket" Bob, Kent "just get the damn thing out of the door" Beck, ..., - apply the DRY principle only to code they have written themselves. They won't even consider code "duplicate" just because some functionality is available somewhere else.
Succumbing to a desire to always reinvent the wheel is not a very productive attitude, and sometimes reuse is (effectively) free, but when it comes at a cost, you have to make trade-offs.
Or, as one Perl programmer (I forget who) once wrote: "Why reinvent the wheel, if you can reinvent the engine?"
But use/reuse/abuse should be considered as an investment decision. It has nothing to do with DRY.
First of all, I agree with your point, and will go further to say that nothing is absolutely always bad.
But I have to say your argument doesn't really support what the title claims. Instead, it says that dependency is worse than code duplication, which I believe most developers will agree with wholeheartedly.
During refactoring, duplication is one of the easiest code smells to spot and fix, so most developers are tempted to fix it right away, leaving higher-priority refactoring candidates untouched, such as highly interdependent, tightly coupled code or deployment configuration.
I think it's important to differentiate between duplication inside your own code and duplication with code you can't control. If you control the code, you can remove duplication and manage dependencies well.
I agree with the last poster, Micheal, in that duplication in your own code (or components) should be avoided. I like to practice the DRY principle as much as possible.
So, I'd like to say: Duplication Is Almost Always Bad
Glad to see this article, very pragmatic. There are no absolutes in our business - that's why it's called engineering. The bottom line is that reuse by definition necessarily implies dependency. So you always have to evaluate it case by case.
> Glad to see this article, very pragmatic. There are no absolutes in our business - that's why it's called engineering. The bottom line is that reuse by definition necessarily implies dependency. So you always have to evaluate it case by case.
Well, it's good in that it reminds us that there are some qualifiers to the general statement that "duplication is bad" but, in general, you don't have to constantly trade off removing duplication and increasing dependency in code you write. Duplication is worse from a maintenance perspective and, that said, the issue with dependency isn't how many dependencies you have in a system, but rather what kind of dependencies you have. Some are much better than others.
For instance, suppose we have a single class with no dependencies that contains a lot of duplicated code. Once we factor out the duplication, we'll have introduced some dependencies, but if we do it well, they will be dependencies on abstractions, and we'll have a much more flexible system despite having more dependencies. Again, the amount of dependency isn't as important as the kind of dependencies.
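A minimal Java sketch of that refactoring, assuming a hypothetical report class that originally contained two near-identical loops, one writing comma-separated rows and one writing tab-separated rows. After the duplicated loop is factored out, `ReportWriter` has a dependency it didn't have before, but that dependency is on the small `RowFormatter` interface rather than on any concrete format. All class names are hypothetical.

```java
import java.util.List;

// Refactored class: the row loop that used to be duplicated in separate
// CSV and TSV methods now exists once and depends only on RowFormatter.
final class ReportWriter {
    private final RowFormatter formatter;

    ReportWriter(RowFormatter formatter) {
        this.formatter = formatter;
    }

    // Formats each row with the injected formatter, one line per row.
    String write(List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        for (List<String> row : rows) {
            out.append(formatter.format(row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(List.of("id", "name"), List.of("1", "Ada"));
        System.out.print(new ReportWriter(new CsvFormatter()).write(rows));
        System.out.print(new ReportWriter(new TsvFormatter()).write(rows));
    }
}

// The abstraction the refactored code depends on.
interface RowFormatter {
    String format(List<String> fields);
}

// Concrete formats: the only code that differed between the old copies.
final class CsvFormatter implements RowFormatter {
    @Override
    public String format(List<String> fields) {
        return String.join(",", fields); // no quoting or escaping, for brevity
    }
}

final class TsvFormatter implements RowFormatter {
    @Override
    public String format(List<String> fields) {
        return String.join("\t", fields);
    }
}
```

Adding another output format means writing another `RowFormatter`; the loop itself is never copied again, and `ReportWriter` never learns about concrete formats.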
> Duplication is worse from a maintenance perspective and, that said, the issue with dependency isn't how many dependencies you have in a system, but rather what kind of dependencies you have. Some are much better than others.
Right. And dependencies that carry a lot of weight with them are not good. Sometimes you can refactor and replace the heavyweight dependency with a dependency upon an abstraction. But other times, this isn't feasible because you're using a 3rd party component, or working with code that you are not authorized to change, or the refactoring is complex due to a tangled mess of code and you simply haven't been given the time. In these cases, among others, duplication may be a viable option.
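One common way to replace a heavyweight dependency with a dependency upon an abstraction is an adapter. The sketch below rests on stated assumptions: `ThirdPartyMailClient` is a hypothetical stand-in for a third-party class you cannot change, and `Notifier`, `MailNotifier`, and `OrderService` are illustrative names. Only the adapter touches the third-party client, so the rest of the code compiles against `Notifier` alone.

```java
import java.util.ArrayList;
import java.util.List;

// Application code: compiles against the Notifier abstraction only.
final class OrderService {
    private final Notifier notifier;

    OrderService(Notifier notifier) {
        this.notifier = notifier;
    }

    void placeOrder(String customer) {
        // Order handling elided; only the notification matters here.
        notifier.send(customer, "Order received");
    }

    public static void main(String[] args) {
        ThirdPartyMailClient client = new ThirdPartyMailClient();
        new OrderService(new MailNotifier(client, "smtp.example.com"))
                .placeOrder("ada@example.com");
        System.out.println(client.sent); // [smtp.example.com|ada@example.com|Order received]
    }
}

// The abstraction that replaces the direct, heavyweight dependency.
interface Notifier {
    void send(String recipient, String message);
}

// Adapter: the only class that knows about the third-party client.
final class MailNotifier implements Notifier {
    private final ThirdPartyMailClient client;
    private final String host;

    MailNotifier(ThirdPartyMailClient client, String host) {
        this.client = client;
        this.host = host;
    }

    @Override
    public void send(String recipient, String message) {
        client.connectAndSend(host, recipient, message);
    }
}

// Hypothetical stand-in for the heavyweight third-party class; here it just
// records what it was asked to send so the example can run anywhere.
final class ThirdPartyMailClient {
    final List<String> sent = new ArrayList<>();

    void connectAndSend(String host, String to, String body) {
        sent.add(host + "|" + to + "|" + body);
    }
}
```

Tests or other callers can supply their own `Notifier` (even a lambda), which is the flexibility the abstraction buys. When even this much restructuring isn't possible, the duplication option described above remains.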
> > Duplication is worse from a maintenance perspective and, that said, the issue with dependency isn't how many dependencies you have in a system, but rather what kind of dependencies you have. Some are much better than others.
>
> Right. And dependencies that carry a lot of weight with them are not good. Sometimes you can refactor and replace the heavyweight dependency with a dependency upon an abstraction. But other times, this isn't feasible because you're using a 3rd party component, or working with code that you are not authorized to change, or the refactoring is complex due to a tangled mess of code and you simply haven't been given the time. In these cases, among others, duplication may be a viable option.
I agree that good engineering means knowing when to bend or break the rules.
However, I think that, as a case against DRY, Kirk's argument is flawed.
Kirk argues that removing code duplication creates deployment dependencies and that the deployment dependencies may be worse than the original code duplication.
Fair enough. But it's faulty to assume that applying DRY will create deployment dependencies. DRY is about managing knowledge, not blocks of code. DRY says don't write and maintain multiple expressions of the same code; it doesn't say that a single expression can't be incorporated into multiple components.
The Pragmatic Programmers are fans of code generators and sophisticated automated build systems for a reason. Deployment dependencies can be eliminated as part of the build without violating DRY. The same set of .class files can be packaged in multiple JARs. The same static library can be linked into multiple components. You're DRY as long as there is a single authoritative source expression and derivative products can be generated in a consistent, automated fashion.
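A build step in that spirit could be sketched as follows: one directory of compiled classes (the single authoritative build output) packaged into as many JARs as there are deployment targets. This is an illustration under stated assumptions, using only the JDK's `java.util.jar` API; a real project would normally express this as a task in its build tool, and the class name `PackageShared` and its argument layout are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical build step: package one directory of compiled classes
// (the single authoritative build output) into several JARs.
final class PackageShared {

    static void packageInto(Path classesDir, Path jarPath) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");

        // Collect regular files in a stable order so every JAR is built the same way.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (OutputStream os = Files.newOutputStream(jarPath);
             JarOutputStream jar = new JarOutputStream(os, manifest)) {
            for (Path file : files) {
                // JAR entry names always use '/', whatever the host OS uses.
                String name = classesDir.relativize(file).toString()
                        .replace(File.separatorChar, '/');
                jar.putNextEntry(new JarEntry(name));
                Files.copy(file, jar);
                jar.closeEntry();
            }
        }
    }

    // Hypothetical usage: <classes-dir> <jar> [<jar> ...]
    public static void main(String[] args) throws IOException {
        Path classesDir = Path.of(args[0]);
        for (int i = 1; i < args.length; i++) {
            packageInto(classesDir, Path.of(args[i]));
        }
    }
}
```

Every JAR is derived mechanically from the same compiled output, so there is still only one source expression to maintain; the duplication exists only in the build products.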
Yes, this places an increased burden on the build system. But risks and costs are generally lower in the build environment than in the deployment environment. So is the benefit of remaining DRY and avoiding deployment dependencies worth extra build dependencies, complexity, and configuration? The answer may be yes.
>>The same set of .class files can be packaged in multiple JARs.
I wouldn't recommend making a habit out of this. It will result in errors if you inadvertently deploy two JARs with the same class. So it requires you to know the classes inside a JAR file before you use it, not just the JAR's published API.
But I do agree that this is a technique that could be used, but with caution.
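Since the risk is two deployed JARs carrying the same class, a small pre-deployment check can catch it. The sketch below, a hypothetical `DuplicateClassCheck` utility built only on the JDK's `java.util.jar.JarFile`, lists every `.class` entry that appears in more than one of the given JARs. It is an illustration, not a complete tool: it does not, for instance, compare class contents or look inside nested JARs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical pre-deployment check: report classes packaged in more than one JAR.
final class DuplicateClassCheck {

    // Maps each .class entry name to the JARs containing it, then keeps
    // only the names that occur in two or more JARs.
    static Map<String, List<String>> findDuplicates(List<String> jarPaths) throws IOException {
        Map<String, List<String>> owners = new TreeMap<>();
        for (String path : jarPaths) {
            try (JarFile jar = new JarFile(path)) {
                Enumeration<JarEntry> entries = jar.entries();
                while (entries.hasMoreElements()) {
                    String name = entries.nextElement().getName();
                    if (name.endsWith(".class")) {
                        owners.computeIfAbsent(name, k -> new ArrayList<>()).add(path);
                    }
                }
            }
        }
        owners.values().removeIf(jars -> jars.size() < 2);
        return owners;
    }

    public static void main(String[] args) throws IOException {
        findDuplicates(List.of(args)).forEach((cls, jars) ->
                System.out.println("Duplicate " + cls + " in " + jars));
    }
}
```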
Heavy and tangled dependencies can crush a system. A little bit of code duplication won't.
>>The same set of .class files can be packaged in multiple JARs
I don't practice this. Over the past 10 years, I have put numerous systems into place that depend upon the same class. However, the class may have changed slightly for each new application that came along to use it. Differing versions of the class exist at the application level and are stored in separate projects in our CVS archive.
I'm not seeing the overhead in having applied things the way I have. I'm not seeing bugs that appear in multiple applications and have to be fixed in more than one spot (and I'm not sure the above method helps much here anyway: you still have a bug in two applications and will have to redeploy both, which is the most difficult part, at least at my company). I'm not seeing cases where the older applications needed the new or revised functionality applied in the newer applications. I'm not a code duplication kind of guy, and I do try to avoid it when I can, but across application boundaries I don't worry about it, and I'm glad I haven't. I've already got enough to worry about.
I do agree with the other fellow that reuse at the individual programmer level is probably one of the most powerful types of reuse. However, that does not mean that my reused class cannot be morphed as I go without having to affect legacy applications. None of this is to say that I do not or have not reused 3rd party components and such across application boundaries. The context of each case may require us to think differently and create a new or revised solution.
> > The same set of .class files can be packaged in multiple JARs.
>
> I wouldn't recommend making a habit out of this. It will result in errors if you inadvertently deploy two JARs with the same class. So it requires you to know the classes inside a JAR file before you use it, not just the JAR's published API.
Right. I was thinking in terms of packaging different applications. But it's a bad example for the general case.
> But I do agree that this is a technique that could be used, but with caution.
>
> Heavy and tangled dependencies can crush a system. A little bit of code duplication won't.
Heavy code duplication will crush a system just as surely as heavy dependencies, but it takes longer for code duplication to become a problem. This makes the problem easy to miss or ignore.
I have seen systems where the cost and effort to effect changes (either bug fixes or new features) has escalated because of code duplication. The increase in effort may seem negligible at first but the problems compound.