Ruby Buzz Forum - Quicksort erratum

Articles |
News |
Weblogs |
Books |
Forums

Artima Forums | Articles | Weblogs | Java Answers | News

Sponsored Link •

Ruby Buzz Forum
Quicksort erratum

0 replies on 1 page.

Welcome Guest
Sign In

Back to Topic List

Reply to this Topic

Search Forum

Threaded View


Previous Topic		Next Topic

Flat View: This topic has 0 replies on 1 page

Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.

Quicksort erratum

Posted: Jul 15, 2008 5:35 AM

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Quicksort erratum Feed Title: Eigenclass Feed URL: http://feeds.feedburner.com/eigenclass Feed Description: Ruby stuff --- trying to stay away from triviality.	Latest Ruby Buzz Posts Latest Ruby Buzz Posts by Eigen Class Latest Posts From Eigenclass

A few days ago, I called the following Haskell function, reminiscent of Quicksort and considered the epitome of beautiful code by many, unusable for taking always " $O ( n sup 2 )$ space and time":

   qsort []  = []
   qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

Moreover, I referred to this version (using a C-like DSEL in Haskell) due to Lennart Augustsson as a usable one (as opposed to a $O ( n sup 2 )$ algorithm):

   qsortM :: forall i a m r arr .
            (MonadRef m r, Num i, Ix i, MArray arr a m, Ord a Bool) =>
            arr i a -> m (arr i a)
   qsortM ma = runE $ do
       (lb, ub) <- embed $ getBounds ma
       let mlb = pure0 lb
           mub = pure0 ub
       a <- liftArray ma
   
       let qsort' l r =
               if1 (r > l) $ do
                   i <- auto l
                   j <- auto (r+1)
                   let v = a[l] :: E m a
                       iLTj = i < (j :: E m i)
                   while iLTj $ do
                       while ((i += 1) < mub && a[i] < v)
                           skip
                       while (a[j -= 1] > v)
                           skip
                       if1 iLTj $ do
                           a[i] =:= a[j]
                   a[l] =:= a[j]
                   qsort' l (j-1)
                   qsort' i r
        
       qsort' mlb mub
       return ma

I was wrong. Instead of analyzing it with some care, I believed the $O ( n sup 2 )$ claims I read long ago in some now forgotten blog. I just saw the list concatenations and reacted without thinking.

Finding an upper bound for the number of comparisons in the best case is easy. In fact, while I'am at it, I'll find a bound for the number of cons operations, which is clearly greater than that of comparisons.

Noting the number of conses C(n) for a list of size n, for n > 3 (there are a few irregularities for n = 1 to 3 because no cons is needed when appending or prepending [] to another list), $C(n) = p + C(p) + 1 + q + C(q) + p + 1$ , where the problem for a list of length n has been divided into two halves (lists of length p and q) plus a lone pivot element. p (resp. q) conses are needed to build the list passed to the recursive call to qsort, C(p) (C(q)) conses are needed for the subproblem, and finally p + 1 conses are needed to concatenate the resulting sorted sublists.

The problem is how to find p and q. For the best case analysis, $p approx q approx n over 2$ . The n list can be divided into the desired three parts as follows:

$n mark = left floor n over 2 right floor + left ceiling n over 2 right ceiling = left floor n over 2 right floor + 1 + left ceiling n over 2 right ceiling - 1 = left floor n over 2 right floor + 1 + left floor { n + 1 } over 2 right floor - 1$

$n lineup = left floor n over 2 right floor + 1 + left floor { n - 1 } over 2 right floor$

Therefore,

$C(n) = C left ( left floor n over 2 right floor right ) + C left ( left floor { n - 1 } over 2 right floor right ) + 2 left floor n over 2 right floor + left floor { n - 1 } over 2 right floor + 2$

(There are two choices for the number of conses in the final list concatenation, differing by 1. The above is a conservative one, which will give an upper bound.)

Even after referring to Graham & Knuth & Patashnik, I saw no immediate way to solve this recurrence (please drop a comment if you have a closed-form solution). Fortunately, I just need an upper bound so I can use

$C(n) <= D(n) = 2 D left ( n over 2 right ) + 3 left ( n over 2 right ) + 2$

which is easily proved by induction, since all I've done is to remove the floor operations.

Then, using the master theorem

$D(n) = THETA left ( n sup { log sub 2 2 } log n right )$

$C(n) = O left ( n log n right )$

(Note the use of $O$ for C(n) and $THETA$ for D(n).)

The above program thus uses at most $O(n log n)$ space and time. The upper bound is satisfactory enough; in fact, it's the same as for the real quicksort, and, furthermore, the lowest achievable bound for a comparison sort.

So far, I've assumed that the program evaluates everything eagerly, which might not be the case in Haskell, as it has a non-strict semantics (which is not exactly the same as being lazy). Let's see if GHC is really smart and can recognize that qsort(l) is just a permutation of l to skip the sort altogether when some operation that doesn't depend on the order is performed (this is quite close to proving that the qsort function actually sorts the list, and would be an amazing exploit indeed).

To that end, I first rewrite the function in OCaml; if GHC is able to do less work thanks to its non-strict semantics, the solution will scale better than with OCaml:

let rec qsort = function
    [] -> []
  | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)

It's very similar to the Haskell one; the main difference is in the signs used with List.filter: ((>) x) means "x is greater than _", as opposed to "_ is greater than x" in Haskell.

The main function just creates a list of random integers, sums them up, and finally adds the numbers from the sorted list --- obtaining the same number, I hope (the sums are printed to stdout to make sure that computation is not omitted).

Here are the times I get:

The GHC binary takes nearly 800MB to sort a 3 million list, which explains why it suffers as n increases.

Well, this isn't... what was it, GHC 11? Being lazy does pay off if you try to get the nth (and in particular the first) element, though; using

Read more...

Read: Quicksort erratum

Previous Topic

Next Topic


	Web Artima.com