Big and Fast Data

What is big data?

An overloaded, fuzzy term

“Data too large to be efficiently processed on a single computer”

“Massive amounts of diverse, unstructured data produced by high-performance applications”

How big is “Big?”

Typical numbers associated with Big Data

How big is “Big?” – Instagram

Instagram

  • 1B daily users, clicking around the app
  • 95M photos daily
  • Most followed user: 181M followers
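Daily totals like these are easier to grasp as per-minute or per-second rates. The following is a minimal sketch of that conversion, using the 95M photos/day figure quoted above; the helper name `per_interval` is hypothetical, and the result is an average that ignores daily peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_interval(daily_total: int, interval_seconds: int) -> float:
    """Average number of events per interval, given a daily total."""
    return daily_total * interval_seconds / SECONDS_PER_DAY


photos_per_day = 95_000_000  # Instagram figure from the slide above

photos_per_minute = per_interval(photos_per_day, 60)
photos_per_second = per_interval(photos_per_day, 1)

print(f"{photos_per_minute:,.0f} photos per minute")
print(f"{photos_per_second:,.0f} photos per second")
```

On average this works out to roughly 66 thousand photos per minute, or about 1,100 per second.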

How big is “Big?” – Facebook

Facebook

Warning: the numbers above are from 2014! Today on FB:

  • 2 Billion users
  • 1.32 Billion active users per day
  • 350 million photos per day (about 243k/min)
  • Every min: 510k comments, 293k status updates

The many Vs of Big data

Main Vs, by Doug Laney

  • Volume: large amounts of data
  • Variety: data comes in many different forms from diverse sources
  • Velocity: the content is changing quickly

More Vs

  • Value: data alone is not enough; how can value be derived from it?
  • Veracity: can we trust the data? How accurate is it?
  • Validity: ensure that the interpreted data is sound
  • Visibility: data from diverse sources needs to be stitched together

Volume

We call Big Data big because it is really big: