Pattern matching in Haskell
Tuesday, 19th May, 2009
I’m working my way through Real World Haskell. Haskell is an exciting language. The book is … so near and yet so far: it could be an exciting introduction to an exciting language, as well as showing that you can learn ‘real world’ programming and think deeply about the language at the same time… Unfortunately, the current edition (both print and online versions) is extremely shoddy and consequently very annoying.
For example, Chapter 3 “Defining Types, Streamlining Functions” has a section “Pattern Matching” which describes, among other things, how the ordering of function definitions affects the behaviour of the code.
Unfortunately, the example given (reproduced below) works the same whichever way the definitions are ordered.
-- file: rwh/ch03/add.hs
sumList (x:xs) = x + sumList xs
sumList [] = 0
Why couldn’t the authors provide an example that exemplifies their point?
What’s wrong with good old factorial?
This order works properly:
-- file: fact_good.hs
factorial 1 = 1
factorial n = n * factorial (n - 1) -- brackets necessary
The reverse order doesn’t work at all:
-- file: fact_bad.hs
factorial n = n * factorial (n - 1) -- brackets necessary
factorial 1 = 1
Loading and running fact_bad.hs in ghci causes two interesting things:
- Warning on load
Prelude> :load fact_bad.hs [1 of 1] Compiling Main ( fact_bad.hs, interpreted ) fact_bad.hs:1:0: Warning: Pattern match(es) are overlapped In the definition of `factorial': factorial 1 = ... Ok, modules loaded: Main. *Main>
This shows that the ghci interpreter is (a) jolly clever and (b) jolly helpful.
- Error on run
*Main> factorial 4 *** Exception: stack overflow *Main>
Wednesday, 20th May, 2009 at 9:47 am
Indeed, defining functions along several lines is something that requires some additional rules beside the so-called line order independancy. However, it’s very similar to the “guard” constructions (appearing in “case … of” or in “|… =…| otherwise = …”).
First, no other function definition should intervene during the definition lines of f. For instance, this is impossible :
f [] = 1
g x = x
f (x:xs) = 2
Second, order matters (as with guards) so when the compiler tries to translate your sumList [1,2], it tries successfully to pattern match the first line, then it tries to translate sumList [2] (success with first line), then sumList [] fails with the first line but succeed with the second. In the case you swap the two lines, pattern matching sumList [1,2] will fail with the first line (which would be sumList []=0) but will succeed with the second and the result will eventually be the same.
On the other hand, pattern matching factorial 4, as well as factorial 3 and factorial 2, will be successful with the first line but the problem comes from the fact that pattern matching factorial 1 is still ok with the first line and it reads “factorial 1 = 1*factorial 0” so there is no need to go to line 2 for the compiler knows already what is factorial 1. In this case, you have to swap the two lines to avoid the stack overflow produced by the evaluation of “factorial 0 = 0* factorial (-1)” and all the (infinite) other negative arguments.
Wednesday, 20th May, 2009 at 12:54 pm
Dear Sauf
Thank you for your comment.
I think I’ll be coming across guard constructions shortly.
Interesting that the definitions must be in a single block. I tried:
And got the error: