Yesterday's
post on The Comonad.Reader
referred to
my analysis of the list-based "quicksort" and described a
faster function based on difference lists. It gave an example that troubled me
for a while, and added that "the author [referring to me] was counting conses,
unfortunately that isn't a valid metric":
This function exhibits quadratic performance, even though
there seems to be one cons per level; this would show that just counting
conses isn't enough, and would invalidate my analysis. The catch is that
there is not only one cons per level: one must be careful not to
forget the conses implied by the list concatenation operation. In my analysis
of qsort, the number of conses satisfied
C(n) = C(k) + C(n−1−k) + n + k, where the final
term k represents the conses needed to concatenate the sorted sublists.
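For reference, the list-based qsort in question is presumably the classic definition along these lines (my reconstruction; the original post's exact code isn't quoted here):

```haskell
-- Classic list-based quicksort; (++) is what contributes the final
-- concatenation term to the cons-count recurrence.
qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (x : xs) = qsort smaller ++ (x : qsort larger)
  where
    smaller = [y | y <- xs, y < x]   -- partitioning conses
    larger  = [y | y <- xs, y >= x]  -- partitioning conses
```

The `++` re-conses every element of `qsort smaller`, which is exactly the extra work the final term accounts for.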
If the above function is analyzed the same way I did with qsort, the number of
conses satisfies C(n) = C(n−1) + n,
which clearly implies C(n) ∈ Θ(n²).
The key term is the n (of which n−1 conses come from copying the left operand
of ++), which represents the conses associated with the concatenation the same
way the k term did in qsort's expression; had it not been there, the function
would be Θ(n). So, while this example doesn't invalidate my proof,
it serves as a good reminder of the cost of list concatenation and how easily
it turns linear-time algorithms into quadratic ones.
(Incidentally, it turns out that qsort would be Θ(n log n) conses on average
even without the extra term caused by list concatenation, so the proof, albeit
incorrect, would have reached the correct conclusion.)
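A minimal pair of functions (my own sketch, not code from either post) makes the contrast concrete: one visible cons per level built with ++, versus an accumulator version, in the spirit of difference lists, that really does perform one cons per level:

```haskell
-- Quadratic: [n] is the one visible cons per level, but (++) also
-- re-copies the n-1 elements already produced, so C(n) = C(n-1) + n.
countUp :: Int -> [Int]
countUp 0 = []
countUp n = countUp (n - 1) ++ [n]

-- Linear: threading an accumulator defers all concatenation, so each
-- level performs exactly one cons.
countUp' :: Int -> [Int]
countUp' n = go n []
  where
    go 0 acc = acc
    go k acc = go (k - 1) (k : acc)
```

Both compute the same list; only the cons counts differ.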
Strict vs. non-strict, eager vs. lazy analysis
While it seems the original analysis was sound, a new question was raised
indirectly when Edward Kmett wrote that
it's not the number of times you cons that matters, but the number of thunks
that have to be forced to get to the answer
I've shown above that this is only true if you aren't careful when counting
conses, but the problem it suggests deserves to be considered by itself:
can the analysis under a strict semantics be taken as an upper bound of the
complexity in the non-strict or lazy cases?
I always assumed that the answer was a straight "yes". Intuitively, under lazy
evaluation you have thunks holding the data necessary to evaluate their
associated expressions, which are forced only when needed, so surely at most
as much work is performed as if they were always evaluated, right?
After pondering this for a long while, I've found a case where this wouldn't be
true, suggested directly by an alternative definition of
non-strict semantics:
Non-strict semantics means that a function can have a definite value
although its argument is undefined.
What if an algorithm depended on an undefined value for termination? In
practice, this would mean that it relies on an expression raising an exception
to stop the computation and return some result after recovery. Whereas in the
strict case the expression would be evaluated unconditionally and the
computation would terminate, with a non-strict semantics it would be left
unevaluated if not needed, thus failing to raise the exception and finish
in a timely manner.
The above case is contrived, and such a program would be needlessly complex.
In fact, in OCaml it would be expressed awkwardly with a ref, and in Haskell
pattern matching over an Either value (implicit in bind) would normally force
evaluation, thus behaving as the strict counterpart.
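To make the contrived scenario concrete, here is one way (entirely my own construction) to express it in Haskell, simulating the strict regime with seq:

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- The computation signals its answer by raising an exception the caller
-- recovers from. The exception only fires if the "useless" poisoned value
-- is actually forced, which strict semantics guarantees and lazy doesn't.
search :: Bool -> Int
search simulateStrict = foldl combine 0 [1 .. 10]
  where
    combine acc k =
      let stopper = if k == 5 then error "stop" else ()
       in if simulateStrict
            then stopper `seq` acc + k -- "strict": the poisoned value is forced
            else acc + k               -- lazy: stopper is never demanded
```

`search True` raises the exception (the poisoned value for k = 5 gets forced), while `search False` never demands it and computes the full sum, 55.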
Barring this exception, it seems to me that analyzing under a strict semantics
remains a good way to get an upper bound for the non-strict case, but lacking
a formal proof (and even with one) you can never be too sure. Any pointers?
In a strict functional language, analysis of time complexity is
straightforward, because of the following compositional rule: The time to
evaluate (ƒ(g x)) equals the time to evaluate (g x) plus the time to evaluate
(ƒ y), where y is the value of (g x). However, in a lazy language, this rule
only gives an upper bound, possibly a crude one.
I haven't read the paper yet nor followed its references/citations to see if
there's a proof for that. The problem I can foresee is that the counterexample
I came up with might not happen in the simplified languages/computational
models used in most papers.
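A standard illustration of how crude that upper bound can be (my own example, not taken from the paper):

```haskell
-- By the strict compositional rule, head (map sq xs) costs the full map plus
-- O(1). Lazily, head demands only the first cons, so a single sq is evaluated
-- and the rest of the list -- even an erroring tail -- is never touched.
sq :: Int -> Int
sq x = x * x

firstSquare :: [Int] -> Int
firstSquare = head . map sq
```

For instance, `firstSquare (3 : error "never demanded")` evaluates to 9 under lazy semantics; a strict language would hit the error while building the argument.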
At any rate, it seems that, for all but a few contrived programs (relying on
"useless" expressions being evaluated to raise an exception and terminate),
simple "eager" analysis remains a good way to obtain an upper bound for the
non-strict case. If it is low enough, we can stop there and enjoy.