Tuesday, October 28, 2008

My first F#, Binary Chop

Yesterday, I really felt like trying out F#. To get some inspiration, I visited PragProgs katas, and chose Kata Two -- Karate Chop. Or, actually, I just implemented a "functional" solution. I tried to make an imperative pointer-based solution (which I might post later, when I have resolved a strange bug). Anyway, I had never coded F# before, so there might be a few places where I could have e.g. chosen a library function instead of implementing it myself. If you have any suggestions or general comments, please post them!

By the way, is there any way to program in literate F#?
First, a few helper functions for triples.
let fst3 (a,b,c) = a
let snd3 (a,b,c) = b
let trd3 (a,b,c) = c
Define the middle index, for simplicity return 0 if list is empty.
let middleIndex xs = if List.is_empty xs then 0 else (List.length xs - 1)/2
Define a function that returns the middle element of a list, and two functions that returns the first/last remaining halfs. Note that there might not be a middle element, so we use the option type.
let firstHalf xs = Seq.take (middleIndex xs) xs

let middleElem xs =
match (xs) with
| [] -> None
| xs -> Some (xs.Item (middleIndex xs))

let lastHalf xs =
match xs with
| [] -> Seq.empty
| xs2 -> Seq.skip (middleIndex xs + 1) xs2
Let us group the functions in a convenient triple.
let splitInHalf xs = (firstHalf xs, middleElem xs, lastHalf xs)
Now, define a recursive function that 1) splits the list in three parts 2) reuse some definitions 3) return "None" if middle is empty 4) otherwise we can test the middle for equality, if equal then return the current index 5) if not equal, choose appropriate half and recurse.
let rec exists x xs i = 
let triple = splitInHalf xs in //1
let maybeMiddle = snd3 triple //2
let firstPart = Seq.to_list (fst3 triple)
let lastPart = Seq.to_list (trd3 triple)
if Option.is_none maybeMiddle then None //3
else let middle = Option.get maybeMiddle in //4
if x.Equals middle then Some(i) //5
elif middle > x then exists x firstPart (middleIndex firstPart)
else exists x lastPart ((i+1) + middleIndex lastPart)
This is the function a user would call.
let public ex x xs = exists x xs (middleIndex xs)
Some tests, just to check, plus a helper function for equality over options.

let eq x y = if Option.is_none x then Option.is_none y else x.Equals y

let res = [
ex 3 [];
ex 3 [1];
ex 1 [1];

ex 1 [1;3;5];
ex 3 [1;3;5];
ex 5 [1;3;5];
ex 0 [1;3;5];
ex 2 [1;3;5];
ex 4 [1;3;5];
ex 6 [1;3;5];

ex 1 [1;3;5;7];
ex 3 [1;3;5;7];
ex 5 [1;3;5;7];
ex 7 [1;3;5;7];
ex 0 [1;3;5;7];
ex 2 [1;3;5;7];
ex 4 [1;3;5;7];
ex 6 [1;3;5;7];
ex 8 [1;3;5;7];


let answers = [None; None; Some 0;
Some 0; Some 1; Some 2; None; None; None; None;
Some 0; Some 1; Some 2; Some 3; None; None; None; None; None

let asserts = Seq.for_all2 eq res answers
Now, I must say that I had a real good time developing this! Ok, some minor things didn't went as smoothly as I'd hope for, but it was actually the first time that I tried F#. I've never had such a good experience with a language the first day of use.

I will definitly be posting more F# posts in the future!



Joel said...

Really nice to see that you've got started with F#!

I just had a brief look at your solution so maybe I'm missing something, but I can't help wonder why you would ever want to use binary search for a list structure? Since looking up elements of a list is O(n) the binary search is really not any faster than a simple list traversal (right?).

On, the other hand I think your solution is easily ported to arrays( with constant lookup time).

Anyhow, thanks for an interesting post!

Gustaf Nilsson Kotte said...

Hrm, yes you're right. But see it more as an exercice than something useful.. :)

Christian Genne said...

Hm I disagree with Joel :)

A binary search is O(log(n)), which for more elements is quicker than O(n). Of course the list has to be sorted in the first place, or the binary search won't work at all :)

I also have some questions regarding the code:
1. Why use splitInHalf when you then directly use fst3, snd3 and trd3 to split the result up again? Perhaps better to skip these function, and simply write:

let maybeMiddle = middleElem xs
let firstPart = Seq.to_list (firstHalf xs)
let lastPart = Seq.to_list (lastHalf xs)

2. Do you really need Seq.to_list? In C# you can get the number of elements in a sequence using Count(), which would be Seq.count in F#? So I would rewrite middleIndex to:

let middleIndex xs = if !Seq.any xs then 0 else (Seq.count xs - 1)/2

(Assuming ! is not in F#)

3. Does F# cache data like Haskell? I.e. if you run middleIndex xs on the same list several times, is it only calculated once? Otherwise, for performance, you might want to calculate the middle index just once, and then send it to the functions needing it. (But then again, "premature optimization is the root of all evil"... ;))

Great post otherwise, I would love to learn more F# so please keep the posts coming :)

Gustaf Nilsson Kotte said...

0. Christian, I think what Joel means is that List is a single linked list, just as in Haskell (I haven't checked this up though). Then, lookup for the last element is not O(1) as you might think, but instead horrible O(n). Functions like "map" are not affected by this, since the cost of traversing a complete list still is amortized T(n). So, for a function that constantly is "deep diving" into the middle of lists, the List type might be inefficient. Though, look at the Code Kata site and you'll see: "You can assume that the array has less than 100,000 elements. For the purposes of this Kata, time and memory performance are not issues". Thanks Dave! :)

1. Yeah, that would be a nice and easy refactoring. I just wrote what I had in mind, kind of the usual bottom-up functions and then the "integration" function. Code reviews for the win!

2. I guess you're right. I should be consistent - either choose list or seq. In Haskell you don't have this problem.. :P

3. Hmm, the idea is that I run middleIndex each recursion, so it's always a new value.

Regarding caching (I think this is a part of "lazy evaluation"), it's quite dangerous in non-pure languages. Since you don't know if a value is dependent on a side-effect, there's no possibility to "cache". So, I don't think it exists in F#.

Thank you for good feedback! :)

Christian Genne said...

0. But the only time you really need to access the last element is when you need the number of elements, right? The List you are using is actually stored as an array (i.e. not linked list), so looking up the number of elements should be constant. If you later decide to use a sequence instead of list, you would need to iterate through all elements to find the number of elements, unless the sequence is derived from a list (array), which is the case here. It should then be constant as well.. (I'm assuming lists/sequences in F# and C# work the same).

3. Yes, I ment you call middleIndex three times on each recursion. I agree with your argument that the value of the function call isn't cached, so an option would be to call middleIndex only once, and sent that value to middleElem, firstPart and lastPart.

Gustaf Nilsson Kotte said...

0. Well, I need to grab the middle element of a list every recursion, so that's O(n/2) = O(n). Of course, the lists are getting smaller each recursion, so it's not that bad. (See: "recurrence equations on algorithms lecture" on google).

So, back to the question. I'm not convinced that an F# List is the same as a regular .NET List (http://blogs.msdn.com/jaredpar/archive/2008/09/10/debugging-f-list-s.aspx). As you see on the link, the size of a list is just a property of a list, not a method. Also, a list has a property "Head" and a property "Tail" in the debugger.

Joel said...

Don't know anything about the .NET lists but I'm pretty sure the list structure in F# is equivalent to the List structure in Haskell, i.e. an immutable single linked list:

type List 'a =
| ([])
| (::) of 'a of (List 'a)

So, as Gustav pointed out, looking up the middle element takes O(*n*/2) time, where *n* is the length of the list. In this way it is impossible to achieve anything better than linear running time why there is really no point of using 'binarySearch' at all.

So, either an array like-datastructure with constant lookup-time could be used or a balanced binary tree structure (like Data.Map in Haskell).

Christian Genne said...

Ok so I now see why it's slow :)

List and Seq seemed to similar to .Net List and IEnumerable, so I simply assumey they where the same. The function would be faster than a linear search if using a System.Collections.Generics.List though ;)