Parallelism is about speeding up computations by utilising clusters of machines.
Distributed data parallelism involves splitting the data over several distributed nodes, where nodes work in parallel, and combine the individual results to come up with a final one.
This means that our programming model and execution should hide (but not forget!) those.
What is good about Hadoop: Fault tolerance
Hadoop was the first framework to enable computations to run on 1000s of commodity computers.
What is wrong: Performance
Microsoft’s DryadLINQ combined the Dryad distributed execution engine with the LINQ language for defining queries.