The Artima Developer Community
Sponsored Link

Computing Thoughts
Collection versus Iterator
by Bruce Eckel
July 15, 2005
Summary
According to the Design Patterns book (aka GoF), the intent of the Iterator pattern is to "Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation." Or more simply, an iterator unifies access to containers.

Advertisement

As I observed in the previous weblog entry, the C++ STL does not try to treat all the containers as "all one type" by giving them a common interface. Instead, the C++ STL algorithms interface to both containers and arrays using the same mechanism: the iterator. You don't pass a container to a function, you pass iterators -- usually two of them, one to denote the beginning and one the end of a sequence (this way you can also denote subsequences).

The Java library designers, however, appeared to find commonality in List, Set, and with J2SE5, Queue; enough commonality to unify them under the Collection interface.

What does it mean to unify List, Set, and Queue? I would argue that what they appear to have in common is structural rather than functional: all three involve holding objects. However, what they do with those objects is very different. A List keeps elements in order -- this is the primary tool for storing objects for later use. A Set prevents duplicates, and is most often used to keep track of things, so you can ask, "is this object in the set?" And a Queue is primarily for communicating between processes (even if those processes happen to share the same thread): one process puts objects on the Queue, the other process consumes the objects. All Queues in the standard Java library keep the objects in the enqueueing order except for PriorityQueue. LinkedList implements Queue, but most of the queue implementations are for concurrency.

The problem with this approach is that classifying all these containers under Collection says that they are all the same type -- they are all basically the same thing. Or put another way, that they all happen to hold objects is all that's necessary to unify them under a common interface.

But almost all classes hold other objects, and we don't say that they are the same. Instead, we emphasize their behavior, their contract as expressed by the interface. And the behavior of Set, List and Queue are quite different.

On top of the alternative approach taken by C++, the Set, List, Queue and Map are all "container classes." They all hold objects. But the unification of "Collection" didn't extend to Map, because in the case of Map the lack of similarity is obvious, whereas in the case of the other three it can appear that they all have some kind of "sequence" behavior. However, "sequence" behavior is precisely what an Iterator expresses, and Maps also produce Iterators.

An argument for inheritance is the "Liskov Substitution Principle" where a derived type can be substituted for the base type. What you'll usually see (and this may be the precise definition of Liskov Substitutability; perhaps someone can clarify this in the comments) is a common interface, which is not extended but rather implemented to produce different behavior for different subtypes.

But the Collection hierarchy is strange. The base type Collection is duplicated exactly, without extending the interface, in the Set class, so a Set is perfectly substitutable for a Collection. However, List and further LinkedList add significantly important methods to the interface, and Queue adds different methods. In addition, the Queue methods could provide the Queue behavior by themselves, without the rest of the Collection interface.

I suggest that the appearance of the Collection interface is a result of "coincidental commonality," and that List, Set, and Queue are not the same basic type.

Examining the standard Java library code using the Python tool shown in my prior weblog posting (which is limited; it can only discover methods whose argument lists are on a single line), here is what I found. First, the number of examples I discovered using the following as arguments:

Map (65 instances)
Set (19 instances)
LinkedList (0 instances)
Iterator (11 instances)
List (38 instances)
Collection (20 instances)
Enumeration (17 instances)
ListIterator (0 instances)
Iterable (0 instances)

Iterable one of the newest standard interfaces (it's actually packaged in java.lang). It buys you more than passing an Iterator, because an Iterable is a class that will produce an Iterator, and therefore you could make multiple passes through a container -- this would seem to be a good solution for unifying access to containers. However, no classes in the standard Java library were detected that pass Iterable as an argument.

Much of the time, even when a specific type like Set is used as an argument, an Iterator is still taken. In this case, the Set is specified to restrict usage to that type, rather than trying to create a more general method.

In other cases the container is simply passed through to another method call, which doesn't give you information one way or the other, so these cases can be ignored (other than to note that, when they specify a subtype of Collection such as Set, they are placing a restriction on what can be passed).

Every once in awhile there's a call to a method like toArray() that is part of the Collection interface -- but again, this could be accomplished with an iterator.

In the standard Java libraries, Maps are most commonly used as "messengers" a.k.a. "data transfer objects," to move clumps of named data around. In these cases, they are really being used as untyped objects, so you see a lot of dynamic type checking going on. It's interesting to note that all Python objects are implemented as hash maps, so this behavior is built in and has better support than it does in Java. In many cases where a Map is used as a data transfer object in the libraries, a class could have been created instead, but clearly the author of that library saw some benefit to the dynamic nature of the "object made from a Map."

Iterators can also provide a safety mechanism (by using fail-fast iterators to more easily spot bugs) against concurrent modifications.


As an aside, I came across some strange code when doing this analysis. One file is src\java\security\cert\X509CertSelector.java
private static Set<GeneralNameInterface> 
parseNames(Collection<List<?>> names) throws IOException {
  // ...
  Iterator<List<?>> i = names.iterator();
  while (i.hasNext()) {
    Object o = i.next();
    if (!(o instanceof List)) {
      throw new IOException("expected List");
    }
    List<Object> nameList = (List<Object>)o;
  // ...

What's strange about this is that it appears at first to be only partially-rewritten code. Notice that the Collection that's passed in is parameterized on List<?>, and the Iterator that's taken from the Collection argument is also parameterized, but when next() is called the type is ignored and the result is upcast to Object. That result is then tested to see whether it's a List -- but presumably it must be, from the static type checking provided by generics. Finally, it's recast, but to a parameterized type. So at first you're thinking that maybe they just stopped using parameterized types in the rest of the method, but then they reappear after the hand-coded dynamic checking.

I saw this approach in a number of places. But I also saw a fair amount of library code that hadn't been generified, so it was puzzling. If you want to study the code examples yourself, run the Python program from the previous weblog entry (it's been rewritten a few times so it's improved a lot).

I'm guessing that this code was rewritten after generics was introduced. What I'd really like to know is the motivation for rewriting the code this way. Because one explanation is that the person rewriting it -- presumably someone at Sun -- was not clear about what the generic code was actually doing here, and so thought that maybe it was safer to leave the hand-coded dynamic check in addition to the static checks that came from using generics. If that's the case, it lends more credence to Ken Arnold's argument that generics are simply too complicated, if the library writers at Sun don't get them.

Talk Back!

Have an opinion? Readers have already posted 18 comments about this weblog entry. Why not add yours?

RSS Feed

If you'd like to be notified whenever Bruce Eckel adds a new entry to his weblog, subscribe to his RSS feed.

About the Blogger

Bruce Eckel (www.BruceEckel.com) provides development assistance in Python with user interfaces in Flex. He is the author of Thinking in Java (Prentice-Hall, 1998, 2nd Edition, 2000, 3rd Edition, 2003, 4th Edition, 2005), the Hands-On Java Seminar CD ROM (available on the Web site), Thinking in C++ (PH 1995; 2nd edition 2000, Volume 2 with Chuck Allison, 2003), C++ Inside & Out (Osborne/McGraw-Hill 1993), among others. He's given hundreds of presentations throughout the world, published over 150 articles in numerous magazines, was a founding member of the ANSI/ISO C++ committee and speaks regularly at conferences.

This weblog entry is Copyright © 2005 Bruce Eckel. All rights reserved.

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use