10 - Storage

Definitions

  • memory can be directly addressed by the CPU (load/store)
  • storage needs block-level data transfers into memory before the CPU can use the data
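
  A minimal sketch of what "block-level" means in practice: to read even a
  few bytes from storage, every block that overlaps the requested byte range
  has to be transferred into memory. The 4096-byte block size and the helper
  name blocks_for_range are assumptions made for illustration, not something
  defined in these notes.

```python
BLOCK_SIZE = 4096  # assumed block size in bytes (hypothetical, for illustration)


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> range:
    """Return the block numbers that must be transferred to serve a byte range."""
    if offset < 0 or length <= 0 or block_size <= 0:
        raise ValueError("offset must be >= 0, length and block_size must be > 0")
    first = offset // block_size                 # block holding the first byte
    last = (offset + length - 1) // block_size   # block holding the last byte
    return range(first, last + 1)


# Reading 2 bytes that straddle a block boundary still costs two whole blocks.
needed = list(blocks_for_range(offset=4095, length=2))
```

  With memory, the same 2 bytes would be reached directly by load
  instructions, with no whole-block transfer managed by software.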

Hardware

  • Tape, HDD, NAND / SSD, persistent memory, DRAM, CPU cache and CPU registers
    • ordered from slower to faster
    • and from cheaper to more expensive (per byte)

Large Scale File Systems (BLOB)

  • Distributed File Systems (e.g. Google Drive): large files spread across multiple nodes
  • NoSQL
    • key-value and similar non-relational stores → MongoDB, Cassandra, Redis…
  • NewSQL
    • Scalable, fault-tolerant relational databases (Google Spanner, VoltDB)

GFS

  • Google File System
    • files are split into 64 or 128 MB chunks
    • stored as plain files on chunk servers
    • no caching of file data
    • fault tolerance: at least 3 replicas
    • load balancing: data distributed across servers
    • API: supports