F#
has three different collection types to hold values of the same type
- Array - A fixed-size, zero-based, mutable collection
- List - An ordered, immutable linked list
- Sequence - A logical series of elements
They have similarities but also many
differences. Very few can probably remember all the functions that can be
applied for a given type (Intellisense will help!) but a solid grasp of many of the functions are necessary to have in memory. For my
reference I wrote this sheet to list the functions, first and second column are
the important ones. In the available column ‘a’ is array, ‘l’ list, and ‘s’ is sequence.
Some descriptions are missing but I will add that later as I experiment (for full
details go here).
Skip this list and goto my experiments
below.
Function
|
Description
|
Available
|
append
|
Add elements and return new collection
|
als
|
average
|
Calculate average
|
als
|
averageBy
|
Calculate average with a function applied to each
element
|
als
|
blit
|
Copy section
|
a
|
cache
|
Compute and store elemnts
|
s
|
cast
|
Convert to type
|
s
|
choose
|
Apply function and return Some
|
als
|
collect
|
Apply function, concatenate and return
|
als
|
compareWith
|
Compare with a function
|
s
|
concat
|
Combines
|
als
|
countBy
|
Appy key-generating function to each element and
return
|
s
|
copy
|
Copy collection
|
as
|
create
|
Create array
|
a
|
delay
|
s
|
|
distinct
|
Return collection with no duplicates
|
s
|
distinctBy
|
Return collection with no dupliates according to
equality function
|
s
|
empty
|
Create empty collection
|
als
|
exists
|
Tests if any elements satisfies condition
|
als
|
exists2
|
Test is pair of elements satisfies condition
|
as
|
fill
|
Set a range of elements to a given value
|
a
|
filter
|
Filter and return new collection
|
als
|
find
|
Return first found element
|
als
|
findIndex
|
Return index of first found element
|
als
|
fold
|
Apply function to each element, threading an
accumulator argument
|
als
|
fold2
|
Apply function to each element in two collection,
threading an accumulator argument
|
al
|
foldBack
|
al
|
|
foldBack2
|
al
|
|
forall
|
Test if all elements satisfies predicate
|
als
|
forall2
|
Test if all elements satisfy predicate pairwise
|
als
|
get/nth
|
Get element
|
als
|
head
|
Get first element
|
ls
|
init
|
Create collection given dimension and function
|
als
|
initInfinite
|
Generate sequence
|
s
|
isEmpty
|
Test is empty
|
als
|
iter
|
Apply function to each element
|
als
|
iteri
|
als
|
|
iteri2
|
Apply function to pair of elements
|
al
|
iter2
|
al
|
|
length
|
Return length
|
als
|
map
|
Build a collection by applying a function
|
als
|
map2
|
Build a collection by applying a function to two
collections
|
l
|
map3
|
l
|
|
mapi
|
Build array
|
als
|
mapi2
|
Build collection
|
als
|
max
|
Return greatest element
|
als
|
maxBy
|
Return greatest element compared with function
|
als
|
min
|
Return smallest element
|
als
|
minBy
|
Return smallest element compared with function
|
als
|
ofArray,
ofList,
ofSeq
|
Create a new collection of a different type
|
|
pairwise
|
s
|
|
partition
|
Split collection into two collections
|
al
|
permute
|
Return array with all elements permuted
|
al
|
pick
|
Apply function to successive elements
|
als
|
readonly
|
s
|
|
reduce
|
als
|
|
reduceBack
|
l
|
|
replicate
|
Create list of specified length with elements set to
given value
|
l
|
rev
|
Reverse collection
|
al
|
scan
|
als
|
|
scanBack
|
al
|
|
singleton
|
s
|
|
set
|
Set element to specified value
|
a
|
skip
|
Skip elements and return new collection
|
s
|
skipWhile
|
s
|
|
sort
|
Sort using compare
|
als
|
sortBy
|
als
|
|
sortInPlace
|
a
|
|
sortInPlaceBy
|
a
|
|
sortInPlaceWith
|
a
|
|
sortWith
|
Sort using comparison function
|
al
|
sub
|
Create array from subrange
|
a
|
sum
|
Return sum of elements
|
als
|
sumBy
|
Return sum of element with function applied
|
als
|
tail
|
Return list without first element
|
l
|
take
|
Return elements up to specified count
|
s
|
takeWhile
|
s
|
|
toArray,
toList,
toSeq
|
Create a new collection of a different type
|
|
truncate
|
Return a sequence with no more than N elements
|
s
|
tryFind
|
als
|
|
tryFindIndex
|
als
|
|
tryPick
|
als
|
|
unfold
|
s
|
|
unzip
|
Split list of pairs into two lists
|
al
|
unzip3
|
Split list of triples into three lists
|
als
|
windowed
|
s
|
|
zip
|
Combines the two collections into a list of pairs
|
als
|
zip3
|
Combines the three collections into a list of
triples
|
als
|
There are several ways to create
collections but the most basic is:
let a = [| 0
.. 10 .. 100 |]
let l = [ 0
.. 10 .. 100 ]
let s = { 0
.. 10 .. 100 }
The collections above have small footprints
so let’s create larger collections and make a discovery. I add a stopwatch to
determine the time for each binding.
let N =
int(10e6)
open
System.Diagnostics
let stopWatch
= Stopwatch.StartNew()
let aXL =
Array.init N (fun i -> i)
let aXLt =
stopWatch.Elapsed.TotalMilliseconds
let lXL =
List.init N (fun i -> i)
let lXLt=
stopWatch.Elapsed.TotalMilliseconds - aXLt
let sXL =
Array.init N (fun i -> i)
let sXLt =
stopWatch.Elapsed.TotalMilliseconds - lXLt
stopWatch.Stop()
printf "%f
%f %f" aXLt lXLt sXLt
The output I get is: 35.113200
1038.648800 72.062900
Initializing
the list takes considerable longer and can be explained that lists are in fact linked lists.
Now let’s try to update the element at in the middle (index
5). For the array you can do this:
a.SetValue(0, 5)
But for both the list and sequence you will discover that
there is not straightforward way in doing the same. Only array elements are
mutable as I wrote in the first paragraph.
To get a single element and bind to a new value:
let a5 =
Array.get a 5
let l5 =
List.nth l 5
let s5 =
Seq.nth 5 s
For arrays and list it can also be written:
let a5 =
a.GetValue(5)
let l5 =
l.Item(5)
The same collections can also be created as
let a =
Array.init 11 (fun n -> n *
10)
let l =
List.init 11 (fun n -> n *
10)
let s =
Seq.init 11 (fun n -> n *
10)
By now you have probably seen that ‘s’ has a different
signature
val s : seq<int>
It is because the sequence is not actually evaluated when
created. The sequence is represented as System.IEnumerable which means that
the sequence is lazily evaluated. With lazy evaluation it is possible to create
infinite collections, which otherwise would consume an infinite amount of
memory
let
sInfinite = Seq.initInfinite (fun n -> n *
10)
Seq.nth 10
sInfinite
Seq.nth 2147483647 sInfinite
The third row will take a while to evaluate, as it’s the
largest possible value for a 32 bit int, and might not evaluate to the number
you expect.
The average of a integer collection can easily be calculated
Array.averageBy (fun i -> float i)
a
List.averageBy (fun i -> float i)
l
Seq.averageBy (fun i -> float i)
s
Let’s try this on a larger collection and measure the
performance
open
System.Diagnostics
let stopWatch
= Stopwatch.StartNew()
let aa =
Array.averageBy (fun i -> float i)
a
let aat =
stopWatch.Elapsed.TotalMilliseconds
let la =
List.averageBy (fun i -> float i)
l
let lat=
stopWatch.Elapsed.TotalMilliseconds - aat
let sa =
Seq.averageBy (fun i -> float i)
s
let sat =
stopWatch.Elapsed.TotalMilliseconds - lat
stopWatch.Stop()
printf "%f
%f %f " aat lat sat
You’ll find that the sequence is the slowest.
Since these collections are similar but have differences
there will be situations where one collection needs to be casted to another type.
There are two functions to use for casting, e.g. for an array there are both toList
and ofList. The former will cast the array to a list and the latter will
cast a list to an array. This means that there are two ways of casting an
array to a list , i.e. Array.ToList and List.OfArray. The obvious question
is what’s the difference? To find out I decompiled the Array.Module which holds the functions.
public static
FSharpList<T> ToList<T>(T[] array)
{
if ((object)
array == null)
throw new
ArgumentNullException("array");
else
return
List.ofArray<T>(array);
}
public static T[]
OfList<T>(FSharpList<T> list)
{
if ((object)
array == null)
throw new
ArgumentNullException("list");
else
return
List.toArray<T>(list);
}
The function makes a call to a function in the List module.
public static
FSharpList<T> OfArray<T>(T[] array)
{
return
List.ofArray<T>(array);
}
public static T[]
ToArray<T>(FSharpList<T> list)
{
return
List.toArray<T>(list);
}
Examining this we find that using ToList() in the Array
Module and using ofArray() in the List
Module is in fact the same thing except for the if condition equals null. Note the difference of the static method with the first capital capital and calls to internal methods with first character lowercase. All conversion is handled by internal methods of List.
internal static T[] toArray<T>(FSharpList<T> l)
{
T[] res = new T[l.Length];
List.loop<T>(res, 0, l);
return res;
}
internal static
FSharpList<T> ofArray<T>(T[] arr)
{
int length = arr.Length;
FSharpList<T> tail =
FSharpList<T>.get_Empty();
int index = length - 1;
int num = 0;
if (num <= index)
{
do
{
tail =
FSharpList<T>.Cons(arr[index], tail);
--index;
}
while (index != num - 1);
}
return tail;
}
Its worth looking at this for a minute and understand how it works. Both conversion to and from Array uses an iteration, when converting to an array its straightforward but the other conversion is a bit complex, with a downward counting while loop and list concatenation.
The same symmetry can be expected for Array.ToSeq and Seq.OfArray and I will show that it is
the case here.
Decompiled Array module:
public static IEnumerable<T> ToSeq<T>(T[] array)
{
if ((object) array == null)
throw new ArgumentNullException("array");
else
return SeqModule.OfArray<T>(array);
}
Decompiled Seq Module:
public static IEnumerable<T> OfArray<T>(T[] source)
{
if ((object) source == null)
throw new ArgumentNullException("source");
else
return new SeqModule.OfArray <T>(source));
}
To finish this post I’ll sum up some important traits for
each collection and when and how one would choose one over another.
- Seq and Array are better than List for parallelism
- Expose Seq to public API, use List and Array only internally
- Use Array if data is rarely added (or added in larger groups), or if size is known at initialization
- Since a (Linked) List are easy to add to/remove from use List if many add/inserts will be made
- Use List in recursive processing with the head::tail pattern
- Use Seq or Array for large collections since List consumes much more memory
- Use List if you are holding immutable data and any of the other conditions suggest List
- Use Seq for large collections but don’t expect to use all of the element, or don’t want all elements in memory at the same time
- Seq is an abstract type and List and Array are automatically sequences. Seq support lazy evaluation. Therefore Seq can be used by default as the concrete type doesn't matter.
This Was An Amazing ! I Haven't Seen This Type of Blog Ever ! Thankyou For Sharing data science training in Hyderabad
ReplyDelete