Functional programming is the basis of most modern Big Data processing systems. Before going forward to the course, it is important to master data processing techniques using a functional programming style. In this assignment, your task is to train yourselves in thinking in a functional way when processing data.
Many of the the tasks below are easy, but some are not and require many iterations (and extensive testing!) to get right. For some of them, you can find ready-made solutions on the internet. Even though it may be tempting, we advise you against copying and pasting them here, as you will regret it later on in the course. Wax on, wax off!
This assignment has a total of 115 points. Your grade is calculated with min(points/11, 10)
, i.e. you need 110 points for a 10.
A few notes:
In this part you will implement core functions that are vital to functional programming.
T (5pts): Implement map
using iteration for lists/arrays
def map[A,B](func: (A => B), xs: List[A]): List[B] = {
var res = List[B]()
for (x <- xs)
res = res :+ func(x)
return res
}
map((x:Int) => x*2, List.range(0,10))
T (5pts): Implement filter
using iteration for lists/arrays
def filter[A](func: (A => Boolean), xs: List[A]): List[A] = {
var res = List[A]()
for (x <- xs)
if (func(x))
res = res :+ x
return res
}
filter((x: Int) => (x % 2 == 0), List.range(0, 10))
T (5pts): Implement reduceR
using iteration for lists/arrays
def reduceR[A, B]( f : (A,B) => B, list: List[A], init: B) : B = {
var result = init;
var length = list.size;
for (i <- 1 to length) {
result = f(list(length - i), result);
}
return result;
}
reduceR((x: Int, y: Int) => x-y, List.range(0, 7), 0)
T (5pts): Implement a function flatten(xs: [[A]]): [A]
that converts a list of
lists into a list formed by the elements of these lists. For example:
>>> a = List(List(1,2,3),List(4,5), List(6), List(7,List(8,9)))
>>> flatten(a)
List(1, 2, 3, 4, 5, 6, 7, List(8, 9))
def flatten[A](xs: List[List[A]]): List[A] = {
var res = List[A]();
for (aList <- xs) {
for (element <- aList) {
res = res :+ element;
}
}
return res
}
flatten(List(List(1,2,3),List(4,5), List(6), List(7,List(8,9))))
In every implementation from now (also in next steps)on you should reuse at least one of your answers to an earlier question.
T (5pts): Implement reduceL
by reusing reduceR
def reduceL[A, B](func: (B, A) => B, xs: List[A], init: B): B = {
return reduceR((x: A, y: B) => func(y,x), xs.reverse, init)
}
reduceL((x: Int, y: Int) => x-y, List.range(0,7), 0)
T (10pts): Implement group_by
by reusing reduceL
.
def group_by[A,B](theKeyFunction : (A) => B, xs: List[A]) : Map[B, List[A]] = {
def helpme(map : Map[B, List[A]], item : A) : Map[B, List[A]] = {
var key = theKeyFunction(item);
if (map.contains(key)) {
val values: List[A] = (map.get(key)).get;
return map + (key -> ( values :+ item));
} else {
return map + (key -> (item :: Nil));
}
}
return reduceL(helpme, xs, Map[B, List[A]]())
}
group_by((x: Int) => if (x % 2 == 0) "even" else "odd", List.range(0,10))
T (5pts): Implement distinct
using reduceL
.
def distinct[A](xs: List[A]): List[A] = {
def helper(xs: List[A], x: A): List[A] = {
if (xs.contains(x)) xs
else xs :+ x
}
return reduceL(helper, xs, List[A]())
}
val a = List(1,2,3,1,2,3,4,5,6,5,4,3,2,1)
distinct(a)
T (5pts): Implement flatmap
.
def flatmap[A,B](func: A => List[B], xs: List[A]): List[B] = {
return flatten(map(func, xs))
}
flatmap((x: Int) => List.range(0,x), List.range(0,5))
T (5pts): Implement max(xs: [Integer]): Integer
that finds the largest value in xs
. You can assume the list is non-empty.
def max(xs: List[Int]): Int = {
def max(x: Int, y: Int): Int = {
if (x > y) x
else y
}
return reduceL((x: Int, y: Int) => max(x,y), xs, xs.head)
}
max(List(1,59,42,27,38))
T (10pts): Implement a function called drop_while(f: A -> Boolean, xs: [A]) : [A]
that drops the longest prefix of elements from xs
that satisfy f
.
>>> a = List(1,2,3,4,3,2,1)
>>> dropWhile((x: Int) => x <= 3, a)
List(4,3,2,1)
def drop_while[A](func: A => Boolean, xs: List[A]): List[A] = {
def helper(ys: List[A], y: A): List[A] = {
ys match {
case Nil if !func(y) => y :: Nil
case a :: b => ys :+ y
case _ => ys
}
}
return reduceL(helper, xs, Nil)
}
drop_while((x: Int) => x <= 3, List(1,2,1,3,5,3,1,4,1,5,6))
T (10pts): Implement a function zip(xs: [A], ys: [B]): List[(A,B)]
that returns a list formed from this list and another list by combining the corresponding elements in pairs. If one of the two lists is longer
than the other, its remaining elements are ignored.
>>> a = List(1,2,3,4)
>>> b = List(a,b,c,d,e)
>>> zip(a,b)
List((1,a),(2,b),(3,c), (4, d)]
def zip[A,B](list1: List[A], list2: List[B]) : List[(A, B)] = {
def helper(list: List[(A,B)], element: B) : List[(A,B)] = {
for (i <- 0 until list.size) {
val(a, b) = list(i)
if (b == null) {
return list.updated(i, (a, element))
}
}
return list
}
return reduceL(helper, list2.slice(0,list1.size),
list1.slice(0,list2.size).map(x => (x, null)).asInstanceOf[List[(A,B)]])
}
val list1 = List(1, 2, 3, 4, 5, 6, 7)
val list2 = List('a', 'b', 'c', 'd')
zip(list1, list2)
T (10pts): Implement a function
scanL(f: (acc: B, x: A) -> B, xs: [A], init: B) -> [B]
that works like reduceL
but instead of producing one final result, it also
returns all the intermediate results.
>>> a = List(2,3,4)
>>> scanL((x: Int, y: Int) => x + y, a, 0)
List(0, 2, 5, 9)
def scanL[A,B](f: (B, A) => B, xs : List[A], init: B) : List[B] = {
def helper(x: List[B], y: A) : List[B] = {
return x :+ f(x.last, y);
}
return reduceL(helper, xs, List(init));
}
scanL((x: Int, y: Int) => x + y, List(2,3,4), 0)
In the following questions you will solve realistic problems with the techniques you learned in this assignment. You will be working with data of San Francisco Library patrons. You can find the data file here. Below you can find what each field means.
Solve the following questions using functions you implemented earlier. The code for reading the file is already given. Hint: for testing purposes it could be beneficial to only use a small part of the dataset.
// This snippet imports the csv file to a List[Array[String]]
var patrons: List[Array[String]] = Nil
try {
patrons = scala.io.Source.fromFile("library.csv").getLines().drop(1).map(_.split(",")).toList
} catch {
case e : Throwable => println("Library data file not found")
}
T (10pts) Some patrons have indicated that they want to receive notices via email, but have not provided their email address. Implement a function that outputs a list of the IDs of these patrons.
def missing_email(xs: List[Array[String]]): List[String] = {
return map((x: Array[String]) => x(0), filter((x: Array[String]) => x(9) == "false" && x(8) == "email", xs))
}
missing_email(patrons)
T (10pts) Implement a function that outputs the total amount of checkouts from members originally registered at a given location.
>>> checkouts(patrons, "Noe Valley/Sally Brunn")
1355624
def checkouts(xs: List[Array[String]], location: String): Int = {
return reduceL((x: Int, y: String) => x + y.toInt, map((x: Array[String]) => x(2), filter((x: Array[String]) => x(5) == location, xs)), 0)
}
checkouts(patrons, "Mission")
T (15pts) Implement a function that lists the number of renewals per location in a map. Example output for part of the dataset:
>>> number_renewals(patrons)
Map(Presidio -> 431988),
(Mission -> 1218976))
def number_renewals(xs: List[Array[String]]): Map[String, Int] = {
return group_by((x: Array[String]) => x(5), xs).mapValues((x: List[Array[String]]) => reduceL((y: Int, z: Array[String]) => y + z(3).toInt, x, 0))
}
number_renewals(patrons)