Summary
One of the core tenets of object-oriented programming is encapsulation. It's one of OO's most powerful ideas. More powerful than inheritance. Unfortunately it's also one of the most ignored.
Advertisement
Wow.. it's been a year ... yikes. That's partly due to my having set up blog.daveastels.com. But I'll try to post programming related highlights here. So to kick that off... here's a piece I wrote this weekend.
In "OO in One Sentence: Keep It DRY, Shy, and Tell The Other Guy" (http://www.pragmaticprogrammer.com/articles/may_04_oo1.pdf) Andy Hunt & Dave Thomas (The Pragmatic Programmers) talk about good OO code being shy:
"The best code is very shy. Like a four-year old hiding behind a mothers skirt, code shouldnt reveal too much of itself and shouldnt be too nosy into others affairs.
But you might find that your shy code grows up too fast, shedding its demure shyness in favor of wild promiscuity. When code isnt shy, youll get unwanted coupling."
Something I see all the time, on every team I've been involved with, is code like the following (classes are generalized from examples):
myThing[] things = thingManager.getThingList();
for (int i = 0; i < things.length; i++) {
myThing thing = things[i];
if (thing.getName().equals(thingName)) {
return thingManager.delete(thing);
}
}
This code is tightly coupled to the implementation of myThing in that it gets the name property, knows that it's a string, etc. This is a classic approach from the old days or procedural programming. It is NOT OO.
How about:
myThing[] things = thingManager.getThingList();
for (int i = 0; i < things.length; i++) {
myThing thing = things[i];
if (thing.isNamed(thingName)) {
return thingManager.delete(thing);
}
}
This code still knows the details of myThing. Why should it? All it should know is how to ask a thing manager to delete a thing given its name:
return thingManager.deleteThingNamed(thingName);
thingManager is closer to MyThing, in fact they're likely packaged together, so it's not as bad for it to have somewhat more intimate knowledge of MyThing.
This is just one example of the approach of treating objects like new age data structures: Ask an object for information about its state, process that, make decisions based on that, then do something to/with the object.
In the above example, the code
asks the thingManager for its list of things
iterates through those things
asks each one for its name
checks if the name is what it is looking for
if so it asks the thingManager to delete that thing
The proper approach is to ask the thingManager "Delete a thing named *this* if you have one. Let me know if you were able to do that."
As a side note, writing code this way makes it much more understandable.
Do you write code like the first snippet? If so, you're not doing OO! Stop telling people that you are. Or better, start doing OO. Better still, get your whole team doing OO.
A good point you illustrate is that programmers shouldn't stop after they have implemented individual objects. Extend OO to encapsulating collections of those objects.
One downside of agressive encapsulation is that it forces inheritance on clients if the original class doesn't expose a useful interface. Imagine the hacks required if thingManager didn't expose any way to delete a thing. If thingManager can't be extended through inheritance because all data members are inaccessible even to subclasses, thingManager is close to useless.
I tend to prefer using generic collections and iterators when I can. I get the advantages of hiding lots of implementation details without having to limit myself to using classes that came with their own collection manager. And I have less to learn and remember if collections of objects behave the same way.
In C++ I could get similar functionality as thingManager using a std::list<myThing> and things.remove(thingName). When designing classes for languages that support generics, the class designer should design classes that play well with the built-in containers and algorithms.
A general symptom of poor encapsulation that I see in Java is code with data classes that have only getters and setters and then doer classes that operate on the data classes. So code like the delete loop may appear many times throughout code.
The bad code in your example probably contains a mutator for the collection/array too. So it's impossible to enforce the state of collection in ThingManger. Who knows what's in there if it's supposed to be empty? Is it null? Is it zero-length?
I found that once I started doing test-driven development I started to think a more carefully about interfaces and object responsibilities. The result was classes with expressive methods that delegate to the proper place, as you showed with thingManager.deleteThingNamed().
I think when programmers first learn an OO language what they learn about OO is in the introductory chapter of a programming language book that mostly focuses on inheritance. As a result, many OO programmers create very elaborate and brittle inheritance structures, but that's a whole other post.
You're thinking in terms of writing a third-party library rather than writing an application. If you can change the source, you can just add any missing methods when you need them - no subclassing necessary.
Standard collections are generic and that's both a strength and weakness. If a collection is exposed, it's difficult to see how much functionality is actually used. For example, there may be a delete method, but objects in this particular collection are never actually deleted. The resulting confusion is similar to what happens when you have dead code.
Encapsulating the collection as a private instance variable and writing application-specific methods for each use case makes it easier to understand how the collection is used. As the application's functionality changes, programmers add and delete methods, providing an up-to-date summary of the collection's usage. If there's no delete method, you know that the collection only grows, never shrinks. That's good to know.
This methodology assumes you have no backward compatibility constraints. You wouldn't want to use it for API's that you intend to publish and maintain backward compatibility with. It's a different mindset.
The problem here is just how much you need to encapsulate. There can become such a large number of operations you'll need to implement in an encapsulated, OO manner, you'll be overwhelmed by all of them. How do you both account for all the things users of the component will want to do with it, while wisely utilizing the resources to develop said component?
I am not very good at figuring out OO right away, but this is how I end up reasoning it:
If I see this once, then coupling is in one spot; I might or might not spot it:
myThing[] things = thingManager.getThingList();
for (int i = 0; i < things.length; i++) {
myThing thing = things[i];
if (thing.getName().equals(thingName)) {
return thingManager.delete(thing);
}
}
If I see it twice then I factorize the code and say:
static ThingManager removeFromThingManager(ThingManager thingManager, String thingName){
myThing[] things = thingManager.getThingList();
for (int i = 0; i < things.length; i++) {
myThing thing = things[i];
if (thing.getName().equals(thingName)) {
return thingManager.delete(thing);
}
}
}
removeFromThingManager(thingManager,thingName);
Then if I see a static functions where the first parameter is a class I have control over, I just factor out the first parameter by putting the function into the class:
ThingManager removeFromThingManager(String thingName){
myThing[] things = this.getThingList();
for (int i = 0; i < things.length; i++) {
myThing thing = things[i];
if (thing.getName().equals(thingName)) {
return this.delete(thing);
}
}
}
This is probably not the correct way to reason about a system, but I can refactor in this mechanistic way even if I am completely wasted.
re: "How do you both account for all the things users of the component will want to do"
Well, again, this is a difference in mindset between developing a publicly available library and writing an application.
For an internal component, you don't need to anticipate all the things a user might need to do, because those users are yourself and other members of your team, who have access to the source and are capable of adding new operations as needed. In fact, you shouldn't try to anticipate because you'll write dead code (methods that are never called), which is both confusing and a waste of time.
If it turns out that a class has too many operations then you can refactor. But it doesn't happen all that often.
> You're thinking in terms of writing a third-party library > rather than writing an application. If you can change the > source, you can just add any missing methods when you need > them - no subclassing necessary.
Actually I was thinking in terms of writing an application and having to use classes that are too encapsulated. Too often I run into APIs that are both over-engineered and less useful than they could be, because the designer encapsulated too much and tried to anticipate every need except for the need to be flexible.
When working with a language that provides generic containers and algorithms I think it's reasonable to want new classes, whether developed as part of the application or as a library or platform API, to at least play nice in containers, to provide iterators, etc.
In other words it's more important to me that the classes are useful in contexts the original designer didn't anticipate than it is for the designer to foresee and program for everything in advance.
When source code is available this, at worse, turns into a refactoring problem, but I'd still prefer to see classes conform to functionality and idioms of the language.
> Standard collections are generic and that's both a > strength and weakness. If a collection is exposed, it's > difficult to see how much functionality is actually used. > For example, there may be a delete method, but objects in > n this particular collection are never actually deleted. > The resulting confusion is similar to what happens when > n you have dead code.
If there's actually a need for a collection that can't be deleted from or added to or whatever, I agree that it's probably best implemented in terms other than generic containers. But in a year when I need to get a list of all things updated since six months ago, and the class interface didn't specify that, I'll be wishing for generic support.
re: "But in a year when I need to get a list of all things updated since six months ago, and the class interface didn't specify that, I'll be wishing for generic support."
Encapsulation doesn't mean you stop using collections. If the feature isn't there, you still have to write the five lines of code (or whatever it is) to do something new with the collection. It's essentially the same code regardless of whether the collection is public or private. The only difference is where you put the code.
If the collection is public, you can write the 5 lines of code inline in some random class where it's used. Eventually someone else on the team might need to do the same thing and reinvent the wheel in some other place, resulting in duplication. (I've seen some pretty bad codebases that evolved this way.)
If it's private, you have to write a new method on the class containing the collection and call it. This makes the code better organized and easier to read, and is much less likely to result in duplication.
(The funny thing is that using encapsulation in this way means that you're never really finished with a class. You don't just use it as a black box; so long as there are new features to add, you keep opening it up and changing it. It's a rather counterintuitive result of encapsulation.)
> If the collection is public, you can write the 5 lines of > code inline in some random class where it's used. > Eventually someone else on the team might need to do the > e same thing and reinvent the wheel in some other place, > resulting in duplication. (I've seen some pretty bad > codebases that evolved this way.)
It sounds like that's a time to refactor. If people get used to refactoring routinely, those codebases will evolve differently.
> (The funny thing is that using encapsulation in this way > means that you're never really finished with a class. You > don't just use it as a black box; so long as there are new > features to add, you keep opening it up and changing it. > It's a rather counterintuitive result of encapsulation.)
It may be unrealistic to expect you're ever finished with a class in the sense that it becomes a permanently-useful black box.
Why not follow reasonable guidelines for encapsulation (and other best practices) based on the requirements you know today, and be prepared to make changes when (not if) requirements change tomorrow?
Dave, thanks for this. I've seen far too much code like that which you describe; probably written some too. It's helpful to be reminded of "the basics" every now and then.
In my early career, I had the notion of encapsulation explained to me a couple of ways that really stuck:
1) If you want to borrow five dollars from me, you don't grab my pocket with my wallet, extract the wallet from it, open the wallet, start leafing through the bills until you find five dollars, extract the five dollars, put the wallet back in my pocket, and say "Thanks." You just ask me, "Can I borrow five dollars?" I'll do the rest -- I'll even track the IOU.
2) If you want to know what I had for breakfast this morning, you don't cut my stomach open, scrape a sample of not-completely-digested food, gastric juices and all, and perform a spectroscopic analysis on it to determine the chemical content and probable original food form. You just ask me, "What did you have for breakfast this morning?" I promise to be honest.
Hope this is useful. I think I'll blog about it myself.
> Actually I was thinking in terms of writing an application > and having to use classes that are too encapsulated. .. or classes that just have an awful lot of methods because of encapsulation.
I think this is a fundamental trade-off in Encapsulation: + hides implementation details + reduces code duplication - makes the API thicker and harder to understand
I think a good example of where different approaches have been tried is DOM Implementations: 1) W3C DOM: some utility methods, excessive interfaces make it hard to use 2) JDOM: less utility methods, class-based 3) DOM4J: W3C dom + tons of utility classes 4) XOM: very thin API, easy to use
Yet, IIRC, none of these DOM variants have a method that says element.deleteChildWithNameAttribute ("foo"); This may not be a very accurate comparison to the code in the original post - but I would still argue that encapsulating this peace of code would be counter-productive because it makes the API harder to understand (unless it is needed many times).