This post originated from an RSS feed registered with Python Buzz
by Thomas Guest.
Original Post: Zippy triples served with Python
Feed Title: Word Aligned: Category Python
Feed URL: http://feeds.feedburner.com/WordAlignedCategoryPython
Feed Description: Dynamic languages in general. Python in particular. The adventures of a space sensitive programmer.
Here’s a problem I encountered when writing the HTML generator for this site. Logically, Word Aligned is a time-ordered collection of articles. I wanted each article to link to its predecessor and successor. So the general problem is:
How do you iterate through a collection yielding (previous, this, next) triples?
You’ll notice we’ve specified behaviour at the boundaries: the first item in the collection has no predecessor, thus the first triple returned has its first item set to None; and similarly the final triple has its third item set to None. We might equally well have chosen to return a user supplied default, or to wrap the collection at its ends. For now, let’s go with the simple behaviour shown.
You’ll also have noticed I’m writing Python — fair enough, since this web site is generated off-line using Python. The long_weekend example drives the Python iterator protocol by hand, calling yesterday_today_tomorrow.next() until a StopIteration exception terminates the iteration. It’s quite rare to use iterators in this way: more commonly, you just loop through them using for, or plug them into container operations. The second and third test cases show more typical usage.
If this were C++, we’d prefer our collection to support bi-directional iteration: think of a doubly-linked std::list, or a plain old random access std::vector. Then we could just decrement/increment each iterator from the collection to find its neighbours.
In Python, we might decide to assume a random access container and write something like this:
def get_default(items):
"Return an item getter function."
n_items = len(items)
def inner(index):
"Return items[index] or None if index is out of range."
if index < 0 or index >= n_items:
return None
else:
return items[index]
return inner
def prev_this_next(items):
get = get_default(items)
for ix, item in enumerate(items):
yield get(ix - 1), item, get(ix + 1)
This code isn’t elegant but it does pass our tests. Incidentally, an attempt to implement get_default using EAFP would fail. Can you see why?
If our collection of items is a stream — by which I mean a lazily-evaluated collection — this code raises an exception. We don’t know how long the stream will be (indeed, it could be infinite) and we can’t just access elements from it at random. For C++ programmers, think of sequentially reading a file using an input iterator.
Thinking of this problem in terms of streams gives us a solution which is both more general and more simple. All we have to do is tee up three independent iterators into the stream, stagger them, then zip them back together. The itertools module supplies the components. We connect.
Now suppose you want to peel items from an iterable, three at a time. Let’s call this function three_at_a_time() and let’s specify its behaviour with some simple tests:
>>> t = three_at_a_time((1, 2, 3, 4, 5, 6))
>>> t.next()
(1, 2, 3)
>>> t.next()
(4, 5, 6)
>>> t.next()
Traceback (most recent call last):
...
StopIteration
>>> ttt = three_at_a_time((1, 2, 3, 4))
>>> [t for t in ttt]
[(1, 2, 3)]
>>> ttt = three_at_a_time(itertools.count())
>>> ttt = itertools.islice(ttt, 3, 6)
>>> [t for t in ttt]
>>> "".join(chain(*three_at_a_time("Word Aligned")))
'Word Aligned'
Note that any trailing single element or pair at the end of the collection is discarded. We might equally have decided to pad the collection with a user-supplied default or throw an exception.
Here’s one implementation.
def three_at_a_time(items):
it = iter(items)
return itertools.izip(it, it, it)