An overloaded, fuzzy term
“Data too large to be efficiently processed on a single computer”
“Massive amounts of diverse, unstructured data produced by high-performance applications”
Typical numbers associated with Big Data
Warning: numbers (\(\Uparrow\)) from 2014! Today on FB:
Main Vs, by Doug Laney
More Vs
We call Big Data big because it is really big:
We often need to combine various data sources of different types to come up with a result
Data is not just big; it is generated and needs to be processed fast. Think of:
Data needs to be processed with soft or hard real-time guarantees
The ETL cycle
Big data engineering is concerned with building pipelines
Big data analytics is concerned with discovering patterns
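To make the ETL cycle and the idea of a pipeline concrete, here is a minimal sketch of the three stages (extract, transform, load) chained together. Everything in it is an illustrative assumption: the `RAW_CSV` sample, the field names, and the dictionary that stands in for a data warehouse are hypothetical, and a real pipeline would read from and write to external systems.

```python
import csv
import io
import json

# Hypothetical raw input; a real pipeline would read files, APIs, or logs.
RAW_CSV = """user,country,amount
alice,NL,10.50
bob,GR,
carol,nl,7.25
"""


def extract(text):
    """Extract: read raw records from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete records, normalise casing and types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        cleaned.append({
            "user": row["user"],
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned


def load(rows, sink):
    """Load: store records in the target (a dict standing in for a database)."""
    for row in rows:
        sink.setdefault(row["country"], []).append(row)
    return sink


# The pipeline is the composition of the three stages.
warehouse = load(transform(extract(RAW_CSV)), {})
print(json.dumps(warehouse, indent=2))
```

Keeping each stage a separate function means any one of them can be replaced (for example, a different source or sink) without touching the others.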
Two basic approaches to distributing data processing operations across many machines
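As an illustration of what distributing a processing operation can look like, the sketch below splits a dataset into chunks, applies the same function to every chunk, and merges the partial results. This is only one commonly used scheme and is not necessarily how the slides frame the two approaches. Threads stand in for machines here, and the helper names `chunks`, `count_words`, and `merge` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(data, size):
    """Split the dataset into consecutive chunks of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(lines):
    """The operation applied independently to every chunk."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge(partials):
    """Combine the per-chunk results into one final result."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total


lines = [
    "big data is big",
    "data is fast",
    "big data is diverse",
    "data needs pipelines",
]

# Each worker (a thread here, a machine in a real cluster) handles one chunk.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(count_words, chunks(lines, 2)))

result = merge(partials)
print(result)
```

Because each chunk is processed without reference to the others, adding workers only changes how many chunks run at the same time, not the final result.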
Not a new discipline:
What is new?
Large-scale processing on distributed commodity computers, enabled by advanced software that uses elastic resource allocation.
Software (not HW!) is what drives the Big Data industry
D: Most advances in Big Data technologies came from industry; universities only started contributing later. Why?
This work is (c) 2017, 2018, 2019 - onwards by TU Delft and Georgios Gousios and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.