In this course, we will learn

Tools and technologies

During our course we will learn how to process data with

The course will be using the following 3 programming languages

Schedule (part 1)

Week | Date  | Topic                                                       | Teacher | Assignment (Deadline)
1    | 14/11 | Course introduction, Big and Fast data, Intro to course PLs | GG      |
1    | 15/11 | The Unix programming environment                            | GG      | Unix (jupyter) (28/11)
2    | 21/11 | Programming for Big Data (1)                                | GG      | Functional programming: Scala (jupyter), Python (jupyter) (4/12)
2    | 22/11 | Programming for Big Data (2)                                | GG      |
3    | 28/11 | Distributed Systems                                         | JR      |
3    | 29/11 | Distributed Databases, Distributed filesystems              | JR      |

Schedule (part 2)

Week | Date  | Topic                                                                                                 | Teacher | Assignment (Deadline)
4    | 5/12  | Spark: RDDs and Pair RDDs                                                                             | GG      | Spark (18/12)
4    | 6/12  | Spark Internals                                                                                       | JR      |
5    | 12/12 | Spark SQL, Spark use cases: Synonyms with Word2Vec, Recommending bands, Predicting pull request merges | GG      |
5    | 13/12 | Live Data Processing                                                                                  | GG      |
6    | 19/12 | Stream processing                                                                                     | GG      | Streaming (14/1) (Note: Optional for minor students)
6    | 20/12 | Stream processing systems                                                                             | GG      |
7    | 8/1   | Recap                                                                                                 | GG      |
7    | 9/1   | No lecture                                                                                            | GG      |

Grades

There will be a resit; there is no mid-term.

You can transfer your assignment grade to the resit AS A WHOLE. No individual assignment resubmissions!

Assignments

Assignment submission

We will use CPM. The course name is TI2736-B: Big Data Processing.

Course resources

Slide symbols

Lecture notes

Bibliography