Assignment on Programming Techniques for Big Data

Functional programming is the basis of most modern Big Data processing systems. Before going forward to the course, it is important to master data processing techniques using a functional programming style. In this assignment, your task is to train yourselves in thinking in a functional way when processing data.

Many of the the tasks below are easy, but some are not and require many iterations (and extensive testing!) to get right. For some of them, you can find ready-made solutions on the internet. Even though it may be tempting, I advise you against copying and pasting them here, as you will regret it later on in the course. Wax on, wax off!

This assignment has a total of 110 points. For each function you define, you also need to define at least one working example.

folds and friends

T(5+5pts): Implement reduceL and reduceR using iteration for lists/arrays

In [9]:
def reduceL[A, B](list: List[A], init: B, f : (B,A) => B) : B = {
    var result = init;

    for (element <- list) {
        result = f(result, element);
    }

    return result;
}

def reduceR[A, B](list: List[A], init: B, f : (A,B) => B) : B = {
    var result = init;
    var length = list.size;
    
    for (i <- 1 to length) {
        result = f(list(length - i), result);
    }

    return result;
}

def total_val(x : Float, y : Float) : Float = {
    return x * 1 / y;
}


val list: List[Float] = List(1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f, 9f);

println(reduceL[Float, Float](list, 1f, total_val));
println(reduceR[Float, Float](list, 1f, total_val));
2.7557319E-6
2.4609375

T (10pts): Implement reduceL by reusing reduceR

In [3]:
def reduceL2[A, B](list: List[A], init: B, f : (B, A) => B) : B = {    
        var g = (x : A, y : B) => f(y, x);
        return reduceR(list.reverse, init, g);
}

println(reduceL2[Float, Float](list, 1, total_val));
2.7557319E-6

T (10pts): Implement a function flatten(xs: [A]): [A] that converts a list of lists into a list formed by the elements of these lists. For example:

>>> var a = List(List(1, 2), List(2, 3), List(3, List(4)));
>>> flatten(a);
List(1, 2, 2, 3, 3, List(4))
In [4]:
def flatten[A](xs: List[List[A]]) : List[A] = {        
    var toReturn = List[A]();

    for (aList <- xs) {
        for (element <- aList) {
            toReturn = toReturn :+ element;
        }
    }

    return toReturn;
}

var a = List(List(1, 2), List(2, 3), List(3, List(4)))
println(flatten(a));
List(1, 2, 2, 3, 3, List(4))

T (10pts): You may have noticed that the original flatten definition is not recursive, as it will only flatten 1 level nested lists. Write a function deep_flatten(xs: [A]): [A] that converts a list of lists into a list formed by the elements of these lists recursively. For example:

>>> var a = List(List(1, 2), List(2, 3), List(3, List(4)));
>>> flatten(a);
List(1, 2, 2, 3, 3, 4)
In [5]:
var a = List(List(1,2),List(2,3),List(3,List(4, List(5, 6, List(4, 6)))));
println(a);

def deep_flatten[A](xs: List[A]) : List[A] = xs match {
    case Nil => Nil
    case (list: List[A]) :: tail => deep_flatten(list) ::: deep_flatten(tail);
    case element :: tail => element :: deep_flatten(tail);
}

println(deep_flatten(a));
List(List(1, 2), List(2, 3), List(3, List(4, List(5, 6, List(4, 6)))))
List(1, 2, 2, 3, 3, 4, 5, 6, 4, 6)

T (10pts): Implement group_by by reusing reduceL.

In [6]:
def group_by[A](list: List[A], theKeyFunction : (A) => String) : Map[String, List[A]] = {
    def helpme(map : Map[String, List[A]], item : A) : Map[String, List[A]] = {
        var key = theKeyFunction(item); 

        if (map.contains(key)) {

            val values: List[A] = (map.get(key)).get;
            return map + (key -> ( values :+ item));


        } else {
            return map + (key -> (item :: Nil));
        }

    }

    return reduceL[A, Map[String, List[A]]](list, Map[String, List[A]](), helpme);
}
    
    
def number_classifier(number: Int) : String = {
    if (number % 2 == 0) {
        return "even";
    } else {
        return "odd";
    }
}

val list = List(9, 1, 2, 2, 3, 4, 5, 4, 2, 6, 7, 8, 8, 7, 9);
println(group_by[Int](list, number_classifier));
Map(odd -> List(9, 1, 3, 5, 7, 7, 9), even -> List(2, 2, 4, 4, 2, 6, 8, 8))

Simple data processing

From this point forward, use the reduceL, reduceR or flatten implementations you created above.

T (5pts): Implement distinct using reduceL. Write its function signature.

In [14]:
def distinct[A](list : List[A]) : List[A] = {
    def addDistinct(distinctList: List[A], item: A) : List[A] = {
        var newList = List[A]();

        if (!distinctList.contains(item)) {
            newList = distinctList :+ item;
        } else {
            newList = distinctList;
        }

        return newList;
    }

    return reduceL(list, List[A](), addDistinct);
}
    
val list = List(9, 1, 2, 2, 3, 4, 5, 4, 2, 6, 7, 8, 8, 7, 9);
println(distinct(list));
List(9, 1, 2, 3, 4, 5, 6, 7, 8)

T (10pts): Implement a function reverse that reverses a list/array. As an example:

>>> a = [1,2,3,4]
>>> reverse(a)
[4,3,2,1]

Also, write down the function signature.

In [7]:
def reverse[A](list: List[A]) : List[A] = {
    def add_at_front(sublist: List[A], element: A) : List[A] = {
        return element :: sublist;
    }

    return reduceL[A, List[A]](list, Nil, add_at_front);

}

val list = List(9, 1, 2, 2, 3, 4, 5, 4, 2, 6, 7, 8, 8, 7, 9);
println(reverse(list));
List(9, 7, 8, 8, 7, 6, 2, 4, 5, 4, 3, 2, 2, 1, 9)

T (5pts): Implement a function mean that calculates the mean of an list of integers.

Also, write down the function signature.

In [8]:
def mean(list: List[Float]) : Float = {
    val size: Int = reduceL[Float, Int]  (list, 0, (x, y) => x+1);
    val sum:Float = reduceL[Float, Float](list, 0, (x, y) => x+y);

    return sum/size;
}

val list = List(1f, 2f, 3f, 4f);
println(mean(list));
2.5

Higher-order functions

T (10pts): Implement a function called drop_while(xs: [A], f: A -> Boolean) : [A] that drops the longest prefix of elements from xs that satisfy f.

>>> a = [1,2,3,4,5,6]
>>> dropWhile(a, lambda x: x <= 3)
[4,5,6]
In [10]:
def drop_while[A](list: List[A], evaluate: A => Boolean) : List[A] = {
    def helper(x: Any, y: A) : Any = {
        if (!x.isInstanceOf[List[A]]) {
            if (!evaluate(y)) {
                return y :: Nil;
            }
            return x;
        }
        else {
            return x.asInstanceOf[List[A]] :+ y;
        }
    }

    val newList = reduceL(list, -1, helper);
    if (newList.isInstanceOf[List[A]]) {
        return newList.asInstanceOf[List[A]];
    } else {
        return Nil;
    }

}

val list = List(1, 2, 3, 4, 5, 6, 7, 8, 9);
println(drop_while(list, (x: Int) => x <= 3));
List(4, 5, 6, 7, 8, 9)

T (10pts): Implement a function zip(xs: [A], ys: [B]): List[(A,B)] that returns a list formed from this list and another list by combining the corresponding elements in pairs. If one of the two lists is longer than the other, its remaining elements are ignored.

>>> a = [2,3,4]
>>> b = [a,b,c,d]
>>> zip(a,b)
[[2,a],[3,b],[4,c]]
In [15]:
def zip[A,B](list1: List[A], list2: List[B]) : List[(A, B)] = {
    def helper(list: List[(A,B)], element: B) : List[(A,B)] = {
        for (i <- 0 until list.size) {
            val(a, b) = list(i)
            if (b == null) {
                return list.updated(i, (a, element));
            }
        }
        return list;
    }
    
    return reduceL(list2.slice(0,list1.size),
                   list1.slice(0,list2.size).map(x => (x, null)).asInstanceOf[List[(A,B)]],
                   helper);
}

val list1 = List(1, 2, 3, 4, 5, 6, 7, 8, 9);
val list2 = List('a', 'b', 'c', 'd');
println(zip(list1, list2));
List((1,a), (2,b), (3,c), (4,d))

T (10pts): Implement a function scanL(xs: [A], init: B, f: (acc: B, x: A) -> B) -> [B] that works like reduceL but instead of producing one final result, it also returns all the intermediate results.

>>> a = [2,3,4]
>>> scanL(a, 0, lambda x, y: x + y)
[0, 2, 5, 9]
In [18]:
def scanL[A,B](xs : List[A], init: B, f: (B, A) => B) : List[B] = {
    def helper(x: List[B], y: A) : List[B] = {
        return x :+ f(x.last, y);
    }
    
    return reduceL(xs, List(init), helper);
}

def add(a: Int, b: Int) : Int = {
    return a + b;
}

val a = List(2, 3, 4);
print(scanL(a, 0, add));
List(0, 2, 5, 9)

Credits

Many of the textual descriptions of the functions where addapted from the Scala language documentation, see here.

In [ ]: