Yesterday's
post on The Comonad.Reader
referred to
my analysis of the list-based "quicksort" and described a
faster function based on difference lists. It gave an example that troubled me
for a while, and added that "the author [referring to me] was counting conses,
unfortunately that isn't a valid metric":
This function exhibits quadratic performance, even though
there seems to be one cons per level; this would show that just counting
conses isn't enough, and would invalidate my analysis. The catch is that
there is not only one cons per level: one must be careful not to
forget the conses implied by the list concatenation operation. In my analysis
of qsort, the number of conses satisfied
C(n) = C(k) + C(n−1−k) + n + k, where the final
term k represents the conses needed to concatenate the sorted sublists.
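For reference, the list-based qsort in question is presumably the classic definition along these lines (my reconstruction; the original post's exact code isn't quoted here):

```haskell
-- Classic list-based quicksort; (++) is what contributes the final
-- concatenation term to the cons-count recurrence.
qsort :: Ord a => [a] -> [a]
qsort []       = []
qsort (x : xs) = qsort smaller ++ (x : qsort larger)
  where
    smaller = [y | y <- xs, y < x]   -- partitioning conses
    larger  = [y | y <- xs, y >= x]  -- partitioning conses
```

The `++` re-conses every element of `qsort smaller`, which is exactly the extra work the final term accounts for.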
If the above function is analyzed the same way I did with qsort, the number of
conses satisfies C(n) = C(n−1) + n,
which clearly implies C(n) ∈ Θ(n²).
The key term is the n (of which n−1 conses come from copying the left operand
of ++), which represents the conses associated with the concatenation the same
way the k term did in qsort's expression; had it not been there, the function
would be Θ(n). So, while this example doesn't invalidate my proof,
it serves as a good reminder of the cost of list concatenation and how easily
it turns linear-time algorithms into quadratic ones.
(Incidentally, it turns out that qsort would be Θ(n log n) conses on average
even without the extra term caused by list concatenation, so the proof, albeit
incorrect, would have reached the correct conclusion.)
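A minimal pair of functions (my own sketch, not code from either post) makes the contrast concrete: one visible cons per level built with ++, versus an accumulator version, in the spirit of difference lists, that really does perform one cons per level:

```haskell
-- Quadratic: [n] is the one visible cons per level, but (++) also
-- re-copies the n-1 elements already produced, so C(n) = C(n-1) + n.
countUp :: Int -> [Int]
countUp 0 = []
countUp n = countUp (n - 1) ++ [n]

-- Linear: threading an accumulator defers all concatenation, so each
-- level performs exactly one cons.
countUp' :: Int -> [Int]
countUp' n = go n []
  where
    go 0 acc = acc
    go k acc = go (k - 1) (k : acc)
```

Both compute the same list; only the cons counts differ.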
Strict vs. non-strict, eager vs. lazy analysis
While it seems the original analysis was sound, a new question was raised
indirectly when Edward Kmett wrote that
it's not the number of times you cons that matters, but the number of thunks
that have to be forced to get to the answer
I've shown above that this is only true if you aren't careful when counting
conses, but the problem it suggests deserves to be considered by itself:
can the analysis under a strict semantics be taken as an upper bound of the
complexity in the non-strict or lazy cases?
I always assumed that the answer was a straight "yes". Intuitively, under lazy
evaluation you have thunks holding the data necessary to evaluate their
associated expressions, which are forced only when needed, so surely at most
as much work is performed as if they were always evaluated, right?
After pondering this for a long while, I've found a case where this wouldn't be
true, suggested directly by an alternative definition of
non-strict semantics:
Non-strict semantics means that a function can have a definite value
although its argument is undefined.
What if an algorithm depended on an undefined value for termination? In
practice, this would mean that it relies on an expression raising an exception
to stop the computation and return some result after recovery. Whereas in the
strict case the expression would be evaluated unconditionally and the
computation would terminate, with a non-strict semantics it would be left
unevaluated if not needed, thus failing to raise the exception and finish
in a timely manner.
The above case is contrived, and such a program would be needlessly complex.
In fact, in OCaml it would be expressed awkwardly with a ref, and in Haskell
pattern matching over an Either value (implicit in bind) would normally force
evaluation, thus behaving as the strict counterpart.
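To make the contrived scenario concrete, here is one way (entirely my own construction) to express it in Haskell, simulating the strict regime with seq:

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- The computation signals its answer by raising an exception the caller
-- recovers from. The exception only fires if the "useless" poisoned value
-- is actually forced, which strict semantics guarantees and lazy doesn't.
search :: Bool -> Int
search simulateStrict = foldl combine 0 [1 .. 10]
  where
    combine acc k =
      let stopper = if k == 5 then error "stop" else ()
       in if simulateStrict
            then stopper `seq` acc + k -- "strict": the poisoned value is forced
            else acc + k               -- lazy: stopper is never demanded
```

`search True` raises the exception (the poisoned value for k = 5 gets forced), while `search False` never demands it and computes the full sum, 55.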
Barring this exception, it seems to me that analyzing under a strict semantics
remains a good way to get an upper bound for the non-strict case, but lacking
a formal proof (and even with one) you can never be too sure. Any pointers?
In a strict functional language, analysis of time complexity is
straightforward, because of the following compositional rule: The time to
evaluate (ƒ(g x)) equals the time to evaluate (g x) plus the time to evaluate
(ƒ y), where y is the value of (g x). However, in a lazy language, this rule
only gives an upper bound, possibly a crude one.
I haven't read the paper yet nor followed its references/citations to see if
there's a proof for that. The problem I can foresee is that the counterexample
I came up with might not happen in the simplified languages/computational
models used in most papers.
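A standard illustration of how crude that upper bound can be (my own example, not taken from the paper):

```haskell
-- By the strict compositional rule, head (map sq xs) costs the full map plus
-- O(1). Lazily, head demands only the first cons, so a single sq is evaluated
-- and the rest of the list -- even an erroring tail -- is never touched.
sq :: Int -> Int
sq x = x * x

firstSquare :: [Int] -> Int
firstSquare = head . map sq
```

For instance, `firstSquare (3 : error "never demanded")` evaluates to 9 under lazy semantics; a strict language would hit the error while building the argument.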
At any rate, it seems that, for all but a few contrived programs (relying on
"useless" expressions being evaluated to raise an exception and terminate),
simple "eager" analysis remains a good way to obtain an upper bound for the
non-strict case. If it is low enough, we can stop there and enjoy.