01 - Intro
- Description:: an introduction to the course
course organization
- 1st part
- data centers
- storage
- network
- compute
- 2nd part
- common big data problems + solution
- 3rd part
- scientific paper review (2) + presentation
what big data is
- a buzzword that describes 5 V words
- value, volume, variety, velocity, veracity (truthfulness)
Google search
- inverted index: just a dictionary from each term to the pages that contain it
- it handles 6m searches every day through the inverted index
- 1 PB for the tags alone!
- an expensive operation
- data centers!
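A minimal sketch of the "just a dictionary" idea, assuming whitespace tokenization and a tiny in-memory corpus. The names `build_inverted_index`, `search`, and the sample `docs` are made up for illustration; this is not how Google implements it.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the ids of the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        # Intersect the posting lists: a document must match all query words.
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical toy corpus (document id -> text).
docs = {
    1: "big data needs data centers",
    2: "data centers scale out",
    3: "inverted index is a dictionary",
}
index = build_inverted_index(docs)
print(search(index, "data centers"))
```

A lookup only touches the posting lists of the query words, never the documents themselves, which is why the index can answer queries quickly but grows very large.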
data centers
- scale up: wait for a faster / larger disk: not an option (cost, time, performance)
- scale out: distribute the work across many servers: flexible (see GARR)
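One way to picture scale out is hash partitioning: each key is assigned to one of several servers by hashing it. The sketch below is an assumption for illustration only; the server names, the keys, and the helpers `pick_server` and `partition` are hypothetical and do not come from the course.

```python
import hashlib


def pick_server(key, servers):
    """Choose a server for a key with a stable hash (same key, same server)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap around.
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def partition(keys, servers):
    """Group keys by the server that would store them."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[pick_server(key, servers)].append(key)
    return placement


# Hypothetical cluster and keys.
servers = ["node-0", "node-1", "node-2"]
keys = [f"doc-{i}" for i in range(12)]

for server, assigned in partition(keys, servers).items():
    print(server, assigned)
```

Adding capacity means appending a server to the list, but with plain modulo placement most keys then move to a different server; consistent hashing is a common way to limit that reshuffling.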
other challenges of large systems
- disk failure
hardware/os infrastructure
software infrastructure (Hadoop [dead] / Spark)
- why does everyone want to buy a GPU for ML?