01 - Intro

Overview

  • Description: an introduction to the course

Summary

  • course organization

    • 1st part
      • data centers
      • storage
      • network
      • compute
    • 2nd part
      • common big data problems + solutions
  • exam

    • scientific paper review (2) + presentation
  • what big data is

    • a buzzword characterized by 5 "V" words
      • value, volume, variety, velocity, veracity (truthfulness)
  • Google search

    • inverted index: just a dictionary
    • it serves 6m searches every day through the inverted index
      • the tags alone take 1 PB!
      • a costly operation
      • data centers!
  • data centers

    • scale up: wait for a faster / larger disk: not an option (cost, time, performance)
    • scale out: distribute servers: flexible (see GARR)
  • other challenges of large systems

    • disk failure
  • syllabus

    • hardware/os infrastructure

    • software infrastructure (Hadoop [dead] / Spark)
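
The "inverted index: just a dictionary" note under Google search can be made concrete with a small sketch. This is only an illustration, not how Google implements it: the names build_inverted_index and search, the toy documents, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenization (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical toy corpus: document id -> text.
docs = {
    1: "big data needs data centers",
    2: "data centers scale out",
    3: "GPUs for machine learning",
}
index = build_inverted_index(docs)
```

A query is then a dictionary lookup per term followed by a set intersection, e.g. search(index, "data centers"). At the scale in the notes the dictionary no longer fits on one machine, which is where data centers and scale out come in.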

GPUs

  • why does everyone want to buy a GPU for ML?