What is streaming?

Big data is big

  • Typical processing scenario for big data:

    • Aggregate data in intermediate storage
    • Run batch job overnight, store results in permanent storage
    • Use Spark for interactive exploration of recent data

This approach assumes that the value of the data is hidden in it (“needle in a haystack”)

Data is NOT static

Running processes generate data continuously, and users need to monitor those processes continuously. That we mostly work with static data is a consequence of legacy constraints.
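
To make the contrast concrete, here is a minimal Python sketch (not tied to Spark or any streaming framework) that computes the same statistic two ways: once over stored data, batch style, and once incrementally as each value arrives. The function names and the sensor readings are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator, List


def batch_mean(stored: List[float]) -> float:
    """Batch style: wait until all readings are stored, then compute once."""
    return sum(stored) / len(stored)


def running_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean after every new reading."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


# Hypothetical readings produced by a running process.
readings = [10.0, 20.0, 30.0, 40.0]

batch_result = batch_mean(readings)            # one answer, available only at the end
stream_results = list(running_mean(readings))  # one answer per incoming reading
```

The streaming version keeps only a count and a running total, so it never needs the full data set in storage, and a monitoring user sees an up-to-date value after every reading instead of waiting for an overnight job.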