An overloaded, fuzzy term
“Data too large to be efficiently processed on a single computer”
“Massive amounts of diverse, unstructured data produced by high-performance applications”
Typical numbers associated with Big Data
The numbers above are from 2014! Today on FB:
Main Vs, by Doug Laney
More Vs
We call Big Data big because it is really big:
Data growth rate
We often need to combine various data sources of different types to come up with a result
Data is not just big; it is generated and needs to be processed fast. Think of:
Data needs to be processed with soft or hard real-time guarantees
The ETL cycle
Big data engineering is concerned with building pipelines
Big data analytics is concerned with discovering patterns
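To make the ETL cycle and the idea of a pipeline concrete, here is a minimal sketch of an extract, transform, load sequence. Everything in it is an assumption made for illustration: the inline CSV sample, the `temps` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a tiny CSV with one incomplete row.
RAW = """city,temp_f
Delft,68
Athens,95
Oslo,
"""

def extract(text):
    """Extract: parse semi-structured CSV text into dict records."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # cleanup step: skip rows with a missing reading
        celsius = round((float(row["temp_f"]) - 32) * 5 / 9, 1)
        cleaned.append((row["city"], celsius))
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into another system (here, SQLite)."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", records)
    conn.commit()

def run_pipeline(text, conn):
    """A pipeline is the composition of the three stages."""
    load(transform(extract(text)), conn)

conn = sqlite3.connect(":memory:")
run_pipeline(RAW, conn)
result = conn.execute("SELECT city, temp_c FROM temps ORDER BY city").fetchall()
print(result)
```

Engineering work lives mostly in `extract`, `transform`, and `load`; analytics would start from the query on the loaded table.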
Two basic approaches to distributing data processing operations across many machines
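The slide text does not spell out the approaches here. As an illustration of one widely used approach, data parallelism (split the data into chunks, apply the same function to every chunk, then merge the partial results), consider the sketch below. It is an assumption that this matches what the slide intended; the function names are hypothetical, and local threads stand in for separate machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_words(chunk):
    """The per-chunk operation: count words in a list of lines."""
    return Counter(word for line in chunk for word in line.split())

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def parallel_word_count(lines, n_workers=4):
    """Apply count_words to every chunk concurrently, then merge the results."""
    chunks = split_into_chunks(lines, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker plays the role of one machine in a cluster (assumption).
        for partial in pool.map(count_words, chunks):
            total += partial
    return total

lines = ["big data is big", "data is fast", "fast data needs pipelines"]
counts = parallel_word_count(lines, n_workers=2)
print(counts.most_common(3))
```

The merge step works because word counts from different chunks can be added in any order; operations without that property are harder to distribute this way.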
Not a new discipline:
What is new?
Large scale processing on distributed, commodity computers, enabled by advanced software using elastic resource allocation.
Software (not HW!) is what drives the Big Data industry
The big data landscape
D: Most advances in Big Data technologies came from industry; universities only started contributing later. Why?
Data is the new oil
This work is (c) 2017, 2018, 2019, 2020 - onwards by TU Delft and Georgios Gousios and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.