decorate-sort-undecorate in Haskell
Wednesday, 24th June, 2009
Real World Haskell
Chapter 3: Defining types, streamlining functions
Section: End of chapter exercises
Exercise 6: Create a function that sorts a list of lists based on the length of each sublist. (You may want to look at the sortBy function from the Data.List module.)
The immediate exercise threw up enough problems to make it worthwhile. The functions sort
and sortBy
are in Data.List
. sortBy
takes a comparison function (and sort
is just sortBy compare
). So, here was my direct answer to the exercise:
import Data.List longer :: [a] -> [a] -> Ordering longer a b = compare (length a) (length b) sortByLength :: [[a]] -> [[a]] sortByLength a = sortBy longer a
re type signatures: I knew that longer would have to return the same type as compare:
*Main> :type compare compare :: (Ord a) => a -> a -> Ordering
That isn’t how I would write this kind of sort in Python (my main language). For a sort with any kind of custom comparison function, I would use the decorate-sort-undecorate (dsu) algorithm. Here’s a verbose implementation:
>>> def dsu(func, xs): ... dec_xs = [(func(x), x) for x in xs] ... dec_xs.sort() ... undec_xs = [x[1] for x in dec_xs] ... return undec_xs ... >>> a = [[9], [1, 1, 1, 1, 1], [2, 2, 2]] >>> dsu(len, a) [[9], [2, 2, 2], [1, 1, 1, 1, 1]] >>> dsu(sum, a) [[1, 1, 1, 1, 1], [2, 2, 2], [9]]
Passing the comparison function to sort() directly will result in it being called every time two elements from the list are compared. With dsu, the comparison function is called once for each item on the list, then sort() can just use less than.
Here’s my first naive and verbose dsu sort in Haskell:
dsu :: (Ord a, Ord b) => (b -> a) -> [b] -> [b] dsu decFunc a = undecorate (sort (decorate decFunc a)) decorate :: (t -> t1) -> [t] -> [(t1, t)] decorate decFunc [] = [] decorate decFunc (x:xs) = ( ((decFunc x), x) : decorate decFunc xs ) undecorate :: [(a,b)] -> [b] undecorate [] = [] undecorate ( (_, y) : xs) = ( y : undecorate xs )
Trying it out with length and sum:
*Main> dsu length a [[9],[2,2,2],[1,1,1,1,1]] *Main> dsu sum a [[1,1,1,1,1],[2,2,2],[9]]
A comment by gerg on the RWH website gave the following implementation of sort by length:
sortByLength :: (Ord a) => [[a]] -> [[a]] sortByLength xss = map snd (sort (zip (map length xss) xss))
This is much terser and more apparently Haskellian than my sortByLength
, and it already almost a general dsu. With one small change, and a type signature, we have:
dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1] dsu decFunc a = map snd (sort (zip (map decFunc a) a))
I must confess that I didn’t work out all those type signatures all by myself. I used Haskell’s type inference to work it out for me. Write the function without a type signature, then ask ghci what type it is:
Prelude> import Data.List Prelude Data.List> let dsu decFunc a = map snd (sort (zip (map decFunc a) a)) Prelude Data.List> :type dsu dsu :: (Ord a, Ord a1) => (a1 -> a) -> [a1] -> [a1]
Remember there’s no difference between a, b and t. They are not types of types, just more or less random letters used by different parts of ghci’s type inference mechanism.
Note that functions seem to be of type (a -> b).
I asked about decorate-sort-undecorate on Haskell-Beginners and received very helpful, thorough and friendly reponses. Some of their implementations of dsu used Haskell syntax I haven’t come across yet. I’d like to look into this new syntax further and write it up, but that can be another story for another day.
Wednesday, 24th June, 2009 at 8:22 pm
You might not know that Python now has a built-in function called sorted(), which effectively replaces the DSU idiom:
sorted_list = sorted(unsorted_list, key=lambda v: …)
Wednesday, 24th June, 2009 at 8:45 pm
Hello Graham, thanks for your comment.
AFAIK the thing about
sorted()
is not that is replaces dsu, but that it returns a new sorted list (leaving the old one as is).sort()
sorts the list in place.The thing that replaces (or, really, internalises) dsu is the
key
argument, which is available forsort()
too.To my shame,
key
has been around since Python 2.4, so thanks for bringing it to my attention!So, a dsu-style sort by length in Python (since, ahem, Python 2.4), would be sort(key=len):
See the Python wiki’s Sorting Mini-HOWTO, section Sorting by keys
Wednesday, 24th June, 2009 at 9:53 pm
dsu = fmap (map snd . sort) . (zip <=< map)
Wednesday, 24th June, 2009 at 10:01 pm
Jake: thanks for your comment.
“<=<" is another new bit of syntax. I'm going to look at ., $ and &&& (and now <=<) in my next post.
Wednesday, 24th June, 2009 at 10:12 pm
sortByLength = sortBy (comparing length)
Data.Ord comparing :: Ord a => (b -> a) -> b -> b -> Ordering
Data.List sortBy :: (a -> a -> Ordering) -> [a] -> [a]
Wednesday, 24th June, 2009 at 11:30 pm
By the power of monads!
— | Right-to-left Kleisli composition of monads. ‘(>=>)’, with the arguments flipped
(<= (b -> m c) -> (a -> m b) -> (a -> m c)
(<==>)
— | Left-to-right Kleisli composition of monads.
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> (a -> m c)
f >=> g = \x -> f x >>= g
Thursday, 25th June, 2009 at 4:47 am
Dear All
Thanks for your comments (and nice to have a co-author of RWH visit!).
I should have written about comparing. I’ll make sure to do so in my next post. I might have to draw the line before I get to monads, just so I can get back to the book, and come to monads in their proper place.
Ivan
Thursday, 25th June, 2009 at 8:31 am
Maybe it’s reasonable, but I wonder if it’s a good idea to speak of signs like “(.)”, “($)”, “(>>=)”, “(>=>)” and so on as ‘syntax’. They are just names of functions like “sin” or “(+)”.
I guess the point is obvious, and often enough made — but maybe not often enough? For example, if you don’t like any of these seeming syntax-bits you can replace them with a definion:
f `composedWith` g = f . g
or
f `after` g = f . g
You can’t get tired of the ‘do’ in do notation and declare for ‘pleaseDo’, or replace the ‘=’of definitions with something else. There’s no thing or function to rename, they really are syntax. You can get rid of Haskell’s dots with a definition, but you can’t get rid of one of Matz’s dots, because his dots don’t mean anything.
Maybe it’s irritating to repeat it, but the thing itself, composition, is defined so:
f . g = \x -> f (g x)
or in some equivalent way, as one also defines
f $ x = f x
It is the same with Kleisli composition, in whichever direction, as dons noted,it’s a defined function: f >=> g = \x -> f x >>= g. If you preferred you could write, say: f `fishrightward` g = \x -> f x >>= g. Of course whatever you call the thing, you’ll define it in terms of ‘bind’ (>>=). This is something that you yourself have to define explicitly when you declare a functor List or Maybe or MyAmazingTypeClass to be a monad. But wherever you use it, it’s just another NAME for a definite function, no different from “sin” or “(+)”. Thus in the so-called Maybe monad the function in question is named in the instance declaration:
instance Monad Maybe where
Nothing >>= Just f = Nothing
Just x >>= Nothing = Nothing
Just x >>= Just f = Just f x
or something like that. Those lines isolate a mapping between any pair of Maybe values and another Maybe value, and then put a name to it. This is not syntax.
It’s like “show” or “(==)” which have to be defined when you declare a type to be in the Show or Eq type class. “show” certainly doesn’t seem like syntax!
When you get rid of a few sugary things like “do” notation and the fun and games with infix functions, and the inevitable things like the form of import statements and data declarations and definitions (with “=”) and grouping with parentheses, then the only real syntactical sign in Haskell is the white space between terms, which signifies the application of the function named by the first term to the function or object named by the second. So when we leave aside the decorative bits of sugar the founders gave us (which are frequently brilliant!) we see that in a way Haskell doesn’t have any syntax, or that it’s invisible — it’s white space and the “Western” decision to treat the name on the left as naming the function to be applied to the thing name to its right … plus grouping by parentheses, which is force on us by the linear form of type. Basically, when you type something or other, it’s not because of some witty rule Matz cooked up, dot this dot that curly bracket this square bracket that. What you type is the name of the thing you’re talking about. This is part of the reason why Haskell is so beautiful.
People sometimes say that as an exercise, when you come to the monad business, you should avoid the ‘do’ sugar till you figure out what’s going on with (>>=), (>=>) and company. I wonder if it wouldn’t also be a good exercize to eliminate all non alphabetical signs, replacing “.” with “composedWith”, “>>=” with “boundTo” and so on. …It would be a little exhausting! Maybe I should try it.
Thursday, 25th June, 2009 at 5:52 am
dsu f = sortBy (compare `on` f)
No monads necessary 🙂
Thursday, 25th June, 2009 at 1:48 pm
The thing about the sortBy (comparing length) version is that it still recomputes the length for each comparison.
Although…
{-# RULES
“sortBy/comparing” forall f xs . sortBy (compare f) xs = map snd . sort . zip (map f xs) $ xs
#-}
Thursday, 25th June, 2009 at 2:00 pm
Shoot, Yair’s use of comparing is nicer than my use of on:
dsu f = sortBy (comparing f)
Thursday, 25th June, 2009 at 2:17 pm
@ Applicative I agree with everything you say. For “syntax” I really meant “thingy”. I’ll respond to your comment in more detail when I write on these thingies.
@ The rest of you It looks like I’ll be writing an intermediate piece on
sortBy (comparing ...
andsortBy (compare `on` ...
.Thursday, 25th June, 2009 at 2:22 pm
http://www.reddit.com/r/haskell/comments/8vc7d/decoratesortundecorate_in_haskell/
Thursday, 25th June, 2009 at 7:24 pm
>> Applicative I agree with everything you say. For “syntax” I really meant “thingy”.
Beautifully said. 🙂
>> AFAIK the thing about sorted() is not that is replaces dsu, but that it returns a new sorted list (leaving the old one as is). sort() sorts the list in place.
That’s true, but Haskell’s ‘sort’ and ‘sortBy’ are closer to Python’s sorted(), and Haskell has no analogue of Python sort() since Haskell lists are immutable.
I’m glad to have turned you on to ‘key’! I also used DSU in my Python code for years before getting into the habit of using a key-function. Idioms are hard habits to break. 🙂
Thursday, 25th June, 2009 at 9:21 pm
I’ve clearly got too complacent with my Python. With luck this Haskell study I’m doing (and the help I’m getting) will spruce up my approach to programming in general.
Tuesday, 30th June, 2009 at 5:29 am
The problem specification does not impose a requirement that the list elements be comparable, yet dsu does. Hence dsu can’t compare everything that sortByLength can. And when it can, dsu needlessly compares lists whenever they have the same length. Zip, which maps elements to (,) pairs, should be replaced by zipWith f, where f produces pairs that are compared only on first elements.
Tuesday, 30th June, 2009 at 5:41 am
Dear Doug
Thanks for your comment. I’ll look into zipWith.
The rest of your comment I’m afriad I don’t understand. Surely list elements have to be comparable for sortByLength to compare their lengths?
> Hence dsu can’t compare everything that sortByLength can.
Could you give an example?
Tuesday, 30th June, 2009 at 11:11 am
If you look at the type for ‘dsu length’ and ‘sortByLength’:
dsu length :: (Ord a) => [[a]] -> [[a]]
sortByLength :: [[a]] -> [[a]]
sortByLength is completely polymorphic on ‘a’, so you can compare any type of list of lists.
The dsu version further requires that the elements be comparable (Ord a). For example, you couldn’t use it to sort lists of functions by length.
Tuesday, 30th June, 2009 at 12:56 pm
Doug, thanks for pointing this out; lumi, thanks for explaining.
How interesting! So,
dsuSortByLength a = map snd (sort (zip (map length a) a))
has to have the Ord context:
dsuSortByLength :: (Ord a) => [[a]] -> [[a]]
But the earlier version:
sortByLength a = sortBy (comparing length) a
has no such requirement:
sortByLength :: [[a]] -> [[a]]
Very interesting. I’ll have to look into why that is.
Friday, 3rd July, 2009 at 8:28 pm
Doug and lumi:
I’ve corrected dsuSortByLength:
dsuSortByLength :: [[a]] -> [[a]]
dsuSortByLength xs = map snd (sortBy (comparing fst) (zip (map length xs) xs))
Discussion here.
Thanks again
Ivan
Sunday, 4th October, 2009 at 2:21 am
I want to sort this list
[(Posicion,[Int])]
where type Posicion = (Int,Int)
by the lenght of [Int]
How can I do this??
Thanks : )
Tuesday, 6th October, 2009 at 11:51 am
Sebastian, Thanks for your comment.
Have you tried the techniques above (or in some of my later posts on this subject)? How did it go?