10 - Storage
Definitions
- memory can be directly addressed by the CPU (load/store)
- storage needs block-level data transfers to memory
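The distinction above can be sketched in Python. This is only an illustration, not how a real device driver works: a `bytearray` stands in for byte-addressable memory, while a byte on storage is reached by first transferring the whole block that contains it. The 4096-byte block size and the helper name `read_byte_from_storage` are assumptions made for this example.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; real devices vary


def read_byte_from_storage(path, offset, block_size=BLOCK_SIZE):
    """Fetch one byte by transferring the whole block that contains it."""
    block_index = offset // block_size
    with open(path, "rb") as f:
        f.seek(block_index * block_size)
        block = f.read(block_size)       # block-level transfer into memory
    return block[offset % block_size]    # then a plain in-memory access


# Memory: individual bytes are addressed directly (load/store).
memory = bytearray(b"hello world")
memory[0] = ord("H")   # store
first = memory[0]      # load

# Storage: write some data to a file, then read one byte back via its block.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(range(256)) * 64)    # 16 KiB of sample data
    tmp_path = tmp.name

value = read_byte_from_storage(tmp_path, 5000)
os.remove(tmp_path)
```

Even though only one byte is wanted, the helper moves a full block into memory before indexing into it, which is the extra step that separates storage from memory in the definitions above.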
Hardware
- Tape, HDD, NAND / SSD, persistent memory, DRAM, CPU cache and CPU register
- ordered from slowest to fastest
- and from cheapest to most expensive (per byte)
Large Scale File Systems (BLOB)
- Distributed File Systems (e.g. Google Drive): large files stored across multiple nodes
- NoSQL
- key-value and other non-relational stores → MongoDB, Cassandra, Redis…
- NewSQL
- relational databases made scalable and fault-tolerant (Google Spanner, VoltDB)
GFS
- Google File System
- files split into 64 or 128 MB chunks
- stored as plain files on chunk servers
- no caching of file data (clients cache only metadata)
- fault tolerance: at least 3 replicas
- load balancing: data distributed across servers
- API: supports
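The chunking and replication points in the list above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not GFS's actual implementation: the function names `chunk_ranges` and `place_replicas`, the server names, and the round-robin placement policy are all hypothetical, chosen only to show a file being cut into fixed-size chunks and each chunk being stored on three distinct servers.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as in the notes
REPLICAS = 3                    # 3 replicas per chunk, as in the notes


def chunk_ranges(file_size, chunk_size=CHUNK_SIZE):
    """Return (start, end) byte ranges, one per chunk; end is exclusive."""
    return [(start, min(start + chunk_size, file_size))
            for start in range(0, file_size, chunk_size)]


def place_replicas(num_chunks, servers, replicas=REPLICAS):
    """Assign each chunk to `replicas` distinct servers, round-robin.

    Hypothetical policy used only because it spreads chunks evenly;
    it is not the placement policy GFS itself uses.
    """
    if replicas > len(servers):
        raise ValueError("need at least as many servers as replicas")
    placement = {}
    for chunk in range(num_chunks):
        placement[chunk] = [servers[(chunk + r) % len(servers)]
                            for r in range(replicas)]
    return placement


file_size = 200 * 1024 * 1024             # a 200 MB file
ranges = chunk_ranges(file_size)           # 4 chunks: 64 + 64 + 64 + 8 MB
servers = ["cs0", "cs1", "cs2", "cs3", "cs4"]
placement = place_replicas(len(ranges), servers)
```

With five servers and four chunks, every chunk lands on three different servers and no server holds more than three chunk replicas, which illustrates both the fault-tolerance and the load-balancing bullets.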