The Artima Developer Community
Sponsored Link

Ruby Buzz Forum
Quicksort erratum

0 replies on 1 page.

Welcome Guest
  Sign In

Go back to the topic listing  Back to Topic List Click to reply to this topic  Reply to this Topic Click to search messages in this forum  Search Forum Click for a threaded view of the topic  Threaded View   
Previous Topic   Next Topic
Flat View: This topic has 0 replies on 1 page
Eigen Class

Posts: 358
Nickname: eigenclass
Registered: Oct, 2005

Eigenclass is a hardcore Ruby blog.
Quicksort erratum Posted: Jul 15, 2008 5:35 AM
Reply to this message Reply

This post originated from an RSS feed registered with Ruby Buzz by Eigen Class.
Original Post: Quicksort erratum
Feed Title: Eigenclass
Feed URL: http://feeds.feedburner.com/eigenclass
Feed Description: Ruby stuff --- trying to stay away from triviality.
Latest Ruby Buzz Posts
Latest Ruby Buzz Posts by Eigen Class
Latest Posts From Eigenclass

Advertisement

A few days ago, I called the following Haskell function, reminiscent of Quicksort and considered the epitome of beautiful code by many, unusable for taking always "O ( n sup 2 ) space and time":

   qsort []  = []
   qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

Moreover, I referred to this version (using a C-like DSEL in Haskell) due to Lennart Augustsson as a usable one (as opposed to a O ( n sup 2 ) algorithm):

   qsortM :: forall i a m r arr .
            (MonadRef m r, Num i, Ix i, MArray arr a m, Ord a Bool) =>
            arr i a -> m (arr i a)
   qsortM ma = runE $ do
       (lb, ub) <- embed $ getBounds ma
       let mlb = pure0 lb
           mub = pure0 ub
       a <- liftArray ma
   
       let qsort' l r =
               if1 (r > l) $ do
                   i <- auto l
                   j <- auto (r+1)
                   let v = a[l] :: E m a
                       iLTj = i < (j :: E m i)
                   while iLTj $ do
                       while ((i += 1) < mub && a[i] < v)
                           skip
                       while (a[j -= 1] > v)
                           skip
                       if1 iLTj $ do
                           a[i] =:= a[j]
                   a[l] =:= a[j]
                   qsort' l (j-1)
                   qsort' i r
        
       qsort' mlb mub
       return ma

I was wrong. Instead of analyzing it with some care, I believed the O ( n sup 2 ) claims I read long ago in some now forgotten blog. I just saw the list concatenations and reacted without thinking.

Finding an upper bound for the number of comparisons in the best case is easy. In fact, while I'am at it, I'll find a bound for the number of cons operations, which is clearly greater than that of comparisons.

Noting the number of conses C(n) for a list of size n, for n > 3 (there are a few irregularities for n = 1 to 3 because no cons is needed when appending or prepending [] to another list), C(n) = p + C(p) + 1 + q + C(q) + p + 1, where the problem for a list of length n has been divided into two halves (lists of length p and q) plus a lone pivot element. p (resp. q) conses are needed to build the list passed to the recursive call to qsort, C(p) (C(q)) conses are needed for the subproblem, and finally p + 1 conses are needed to concatenate the resulting sorted sublists.

The problem is how to find p and q. For the best case analysis, p approx q approx n over 2. The n list can be divided into the desired three parts as follows:


n mark = left floor n over 2 right floor + left ceiling n over 2 right ceiling 
= left floor n over 2 right floor + 1 + left ceiling n over 2 right ceiling - 1
= left floor n over 2 right floor + 1 + left floor { n + 1 } over 2 right floor - 1

n lineup = left floor n over 2 right floor + 1 + left floor { n - 1 } over 2 right floor

Therefore,

C(n) = C left ( left floor n over 2 right floor right ) + C left ( left floor { n - 1 } over 2 right floor right ) + 2 left floor n over 2 right floor + left floor { n - 1 } over 2 right floor + 2

(There are two choices for the number of conses in the final list concatenation, differing by 1. The above is a conservative one, which will give an upper bound.)

Even after referring to Graham & Knuth & Patashnik, I saw no immediate way to solve this recurrence (please drop a comment if you have a closed-form solution). Fortunately, I just need an upper bound so I can use

C(n) <= D(n) = 2 D left ( n over 2 right ) + 3 left ( n over 2 right ) + 2

which is easily proved by induction, since all I've done is to remove the floor operations.

Then, using the master theorem

D(n) = THETA left ( n sup { log sub 2 2 } log n right )

C(n) = O left ( n log n right )

(Note the use of O for C(n) and THETA for D(n).)

The above program thus uses at most O(n log n) space and time. The upper bound is satisfactory enough; in fact, it's the same as for the real quicksort, and, furthermore, the lowest achievable bound for a comparison sort.

So far, I've assumed that the program evaluates everything eagerly, which might not be the case in Haskell, as it has a non-strict semantics (which is not exactly the same as being lazy). Let's see if GHC is really smart and can recognize that qsort(l) is just a permutation of l to skip the sort altogether when some operation that doesn't depend on the order is performed (this is quite close to proving that the qsort function actually sorts the list, and would be an amazing exploit indeed).

To that end, I first rewrite the function in OCaml; if GHC is able to do less work thanks to its non-strict semantics, the solution will scale better than with OCaml:

let rec qsort = function
    [] -> []
  | x::xs -> qsort (List.filter ((>) x) xs) @ [x] @ qsort (List.filter ((<=) x) xs)

It's very similar to the Haskell one; the main difference is in the signs used with List.filter: ((>) x) means "x is greater than _", as opposed to "_ is greater than x" in Haskell.

The main function just creates a list of random integers, sums them up, and finally adds the numbers from the sorted list --- obtaining the same number, I hope (the sums are printed to stdout to make sure that computation is not omitted).

Here are the times I get:

times.png

The GHC binary takes nearly 800MB to sort a 3 million list, which explains why it suffers as n increases.

Well, this isn't... what was it, GHC 11? Being lazy does pay off if you try to get the nth (and in particular the first) element, though; using


Read more...

Read: Quicksort erratum

Topic: Engine Yard Previous Topic   Next Topic Topic: Hellboy II

Sponsored Links



Google
  Web Artima.com   

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use