01 - Intro
Overview
- Description:: an introduction to the course
Summary
-
course organization
- 1st part
- data centers
- storage
- network
- compute
- 2nd part
- common big data problems + solutions
-
exam
- scientific paper review (2) + presentation
-
what big data is
- a buzzword that describes 5 V words
- value, volume, variety, velocity, veracity (truthfulness)
-
Google search
- inverted index: just a dictionary
- it handles 6m searches every day through an inverted index
- the tags alone take up 1 PB!
- an expensive operation
- hence data centers!
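To make the "just a dictionary" point concrete, here is a minimal sketch of an inverted index in Python. The function names (`build_inverted_index`, `search`), the whitespace tokenization, and the three toy documents are assumptions made for illustration; a real search engine is far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of the documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the posting lists of all query terms.
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Toy corpus (hypothetical): document id -> text.
docs = {
    1: "big data needs data centers",
    2: "data centers scale out",
    3: "GPUs for machine learning",
}
index = build_inverted_index(docs)
print(search(index, "data centers"))
```

A lookup is a plain dictionary access per term, so the query cost depends on the number of query terms and the length of their posting lists, not on the total number of documents.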
-
data centers
- scale up: wait for a faster / larger disk: not an option (cost, time, performance)
- scale out: distribute the load across many servers: flexible (see GARR)
-
other challenges of large systems
- disk failure
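A rough sketch of why disk failure becomes a routine event at scale. The failure probability and disk counts below are hypothetical values chosen for illustration, not figures from the course, and the model assumes that disks fail independently.

```python
def prob_any_failure(n_disks, p_fail):
    """Probability that at least one of n_disks fails, assuming independent failures."""
    return 1.0 - (1.0 - p_fail) ** n_disks


def expected_failures(n_disks, p_fail):
    """Expected number of failed disks over the same period."""
    return n_disks * p_fail


# Hypothetical per-disk failure probability over one year (assumption).
p = 0.02
for n in (1, 100, 10_000):
    print(n, prob_any_failure(n, p), expected_failures(n, p))
```

Under these assumptions a single disk rarely fails, while a system with thousands of disks should expect failures continuously, so they have to be handled in software rather than treated as exceptions.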
-
syllabus
-
hardware/os infrastructure
-
software infrastructure (Hadoop [dead] / Spark)
-
GPUs
- why does everyone want to buy a GPU for ML?