Software, Technology, Investing: F# Collections

F# has three different collection types to hold values of the same type

Array - A fixed-size, zero-based, mutable collection
List - An ordered, immutable linked list
Sequence - A logical series of elements

They have similarities but also many differences. Very few can probably remember all the functions that can be applied for a given type (Intellisense will help!) but a solid grasp of many of the functions are necessary to have in memory. For my reference I wrote this sheet to list the functions, first and second column are the important ones. In the available column ‘a’ is array, ‘l’ list, and ‘s’ is sequence. Some descriptions are missing but I will add that later as I experiment (for full details go here).

Skip this list and goto my experiments below.

Function	Description	Available
append	Add elements and return new collection	als
average	Calculate average	als
averageBy	Calculate average with a function applied to each element	als
blit	Copy section	a
cache	Compute and store elemnts	s
cast	Convert to type	s
choose	Apply function and return Some	als
collect	Apply function, concatenate and return	als
compareWith	Compare with a function	s
concat	Combines	als
countBy	Appy key-generating function to each element and return	s
copy	Copy collection	as
create	Create array	a
delay		s
distinct	Return collection with no duplicates	s
distinctBy	Return collection with no dupliates according to equality function	s
empty	Create empty collection	als
exists	Tests if any elements satisfies condition	als
exists2	Test is pair of elements satisfies condition	as
fill	Set a range of elements to a given value	a
filter	Filter and return new collection	als
find	Return first found element	als
findIndex	Return index of first found element	als
fold	Apply function to each element, threading an accumulator argument	als
fold2	Apply function to each element in two collection, threading an accumulator argument	al
foldBack		al
foldBack2		al
forall	Test if all elements satisfies predicate	als
forall2	Test if all elements satisfy predicate pairwise	als
get/nth	Get element	als
head	Get first element	ls
init	Create collection given dimension and function	als
initInfinite	Generate sequence	s
isEmpty	Test is empty	als
iter	Apply function to each element	als
iteri		als
iteri2	Apply function to pair of elements	al
iter2		al
length	Return length	als
map	Build a collection by applying a function	als
map2	Build a collection by applying a function to two collections	l
map3		l
mapi	Build array	als
mapi2	Build collection	als
max	Return greatest element	als
maxBy	Return greatest element compared with function	als
min	Return smallest element	als
minBy	Return smallest element compared with function	als
ofArray, ofList, ofSeq	Create a new collection of a different type
pairwise		s
partition	Split collection into two collections	al
permute	Return array with all elements permuted	al
pick	Apply function to successive elements	als
readonly		s
reduce		als
reduceBack		l
replicate	Create list of specified length with elements set to given value	l
rev	Reverse collection	al
scan		als
scanBack		al
singleton		s
set	Set element to specified value	a
skip	Skip elements and return new collection	s
skipWhile		s
sort	Sort using compare	als
sortBy		als
sortInPlace		a
sortInPlaceBy		a
sortInPlaceWith		a
sortWith	Sort using comparison function	al
sub	Create array from subrange	a
sum	Return sum of elements	als
sumBy	Return sum of element with function applied	als
tail	Return list without first element	l
take	Return elements up to specified count	s
takeWhile		s
toArray, toList, toSeq	Create a new collection of a different type
truncate	Return a sequence with no more than N elements	s
tryFind		als
tryFindIndex		als
tryPick		als
unfold		s
unzip	Split list of pairs into two lists	al
unzip3	Split list of triples into three lists	als
windowed		s
zip	Combines the two collections into a list of pairs	als
zip3	Combines the three collections into a list of triples	als

There are several ways to create collections but the most basic is:

let a = [| 0 .. 10 .. 100 |]

let l = [ 0 .. 10 .. 100 ]

let s = { 0 .. 10 .. 100 }

The collections above have small footprints so let’s create larger collections and make a discovery. I add a stopwatch to determine the time for each binding.

let N = int(10e6)

open System.Diagnostics

let stopWatch = Stopwatch.StartNew()

let aXL = Array.init N (fun i -> i)

let aXLt = stopWatch.Elapsed.TotalMilliseconds

let lXL = List.init N (fun i -> i)

let lXLt= stopWatch.Elapsed.TotalMilliseconds - aXLt

let sXL = Array.init N (fun i -> i)

let sXLt = stopWatch.Elapsed.TotalMilliseconds - lXLt

stopWatch.Stop()

printf "%f %f %f" aXLt lXLt sXLt

The output I get is: 35.113200 1038.648800 72.062900

Initializing the list takes considerable longer and can be explained that lists are in fact linked lists.

Now let’s try to update the element at in the middle (index 5). For the array you can do this:

a.SetValue(0, 5)

But for both the list and sequence you will discover that there is not straightforward way in doing the same. Only array elements are mutable as I wrote in the first paragraph.

To get a single element and bind to a new value:

let a5 = Array.get a 5

let l5 = List.nth l 5

let s5 = Seq.nth 5 s

For arrays and list it can also be written:

let a5 = a.GetValue(5)

let l5 = l.Item(5)

The same collections can also be created as

let a = Array.init 11 (fun n -> n * 10)

let l = List.init 11 (fun n -> n * 10)

let s = Seq.init 11 (fun n -> n * 10)

By now you have probably seen that ‘s’ has a different signature

val s : seq<int>

It is because the sequence is not actually evaluated when created. The sequence is represented as System.IEnumerable which means that the sequence is lazily evaluated. With lazy evaluation it is possible to create infinite collections, which otherwise would consume an infinite amount of memory

let sInfinite = Seq.initInfinite (fun n -> n * 10)

Seq.nth 10 sInfinite

Seq.nth 2147483647 sInfinite

The third row will take a while to evaluate, as it’s the largest possible value for a 32 bit int, and might not evaluate to the number you expect.

The average of a integer collection can easily be calculated

Array.averageBy (fun i -> float i) a

List.averageBy (fun i -> float i) l

Seq.averageBy (fun i -> float i) s

Let’s try this on a larger collection and measure the performance

open System.Diagnostics

let stopWatch = Stopwatch.StartNew()

let aa = Array.averageBy (fun i -> float i) a

let aat = stopWatch.Elapsed.TotalMilliseconds

let la = List.averageBy (fun i -> float i) l

let lat= stopWatch.Elapsed.TotalMilliseconds - aat

let sa = Seq.averageBy (fun i -> float i) s

let sat = stopWatch.Elapsed.TotalMilliseconds - lat

stopWatch.Stop()

printf "%f %f %f " aat lat sat

You’ll find that the sequence is the slowest.

Since these collections are similar but have differences there will be situations where one collection needs to be casted to another type. There are two functions to use for casting, e.g. for an array there are both toList and ofList. The former will cast the array to a list and the latter will cast a list to an array. This means that there are two ways of casting an array to a list , i.e. Array.ToList and List.OfArray. The obvious question is what’s the difference? To find out I decompiled the Array.Module which holds the functions.

public static FSharpList<T> ToList<T>(T[] array)

{

if ((object) array == null)

throw new ArgumentNullException("array");

else

return List.ofArray<T>(array);

}

public static T[] OfList<T>(FSharpList<T> list)

{

if ((object) array == null)

throw new ArgumentNullException("list");

else

return List.toArray<T>(list);

}

The function makes a call to a function in the List module.

public static FSharpList<T> OfArray<T>(T[] array)

{

return List.ofArray<T>(array);

}

public static T[] ToArray<T>(FSharpList<T> list)

{

return List.toArray<T>(list);

}

Examining this we find that using ToList() in the Array Module and using ofArray() in the List Module is in fact the same thing except for the if condition equals null. Note the difference of the static method with the first capital capital and calls to internal methods with first character lowercase. All conversion is handled by internal methods of List.

internal static T[] toArray<T>(FSharpList<T> l)

{

T[] res = new T[l.Length];

List.loop<T>(res, 0, l);

return res;

}

internal static FSharpList<T> ofArray<T>(T[] arr)

{

int length = arr.Length;

FSharpList<T> tail = FSharpList<T>.get_Empty();

int index = length - 1;

int num = 0;

if (num <= index)

{

tail = FSharpList<T>.Cons(arr[index], tail);

--index;

}

while (index != num - 1);

}

return tail;

}

Its worth looking at this for a minute and understand how it works. Both conversion to and from Array uses an iteration, when converting to an array its straightforward but the other conversion is a bit complex, with a downward counting while loop and list concatenation.

The same symmetry can be expected for Array.ToSeq and Seq.OfArray and I will show that it is the case here.

Decompiled Array module:

public static IEnumerable<T> ToSeq<T>(T[] array)

{

if ((object) array == null)

throw new ArgumentNullException("array");

else

return SeqModule.OfArray<T>(array);

}

Decompiled Seq Module:

public static IEnumerable<T> OfArray<T>(T[] source)

{

if ((object) source == null)

throw new ArgumentNullException("source");

else

return new SeqModule.OfArray <T>(source));

}

To finish this post I’ll sum up some important traits for each collection and when and how one would choose one over another.

Seq and Array are better than List for parallelism
Expose Seq to public API, use List and Array only internally
Use Array if data is rarely added (or added in larger groups), or if size is known at initialization
Since a (Linked) List are easy to add to/remove from use List if many add/inserts will be made
Use List in recursive processing with the head::tail pattern
Use Seq or Array for large collections since List consumes much more memory
Use List if you are holding immutable data and any of the other conditions suggest List
Use Seq for large collections but don’t expect to use all of the element, or don’t want all elements in memory at the same time
Seq is an abstract type and List and Array are automatically sequences. Seq support lazy evaluation. Therefore Seq can be used by default as the concrete type doesn't matter.

Software, Technology, Investing

Friday, October 4, 2013

F# Collections - Part 1

1 comment: