09 - GPUs & TPUs
Overview
- Description:: How GPUs and TPUs provide massive parallelism in a large-scale datacenter
GPUs
- from 2010 to 2020, GPU usage has increased significantly
- CPUs vs GPUs:
- CPUs have a few powerful cores with large caches and complex control units
- GPUs have many more cores, at the cost of simpler/smaller caches and control units
- programs with few threads → CPUs, programs with many threads → GPUs (see the sketch after this list)
- a GPU is an array of streaming multiprocessors (SMs)
- each SM contains multiple cores with shared control logic and an instruction cache
- SMs can also include other specialized modules (e.g. ray tracing cores)
- shared global memory
- they are built on the classic von Neumann architecture, which is not performance-oriented (other models perform better)
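
To make the many-threads model concrete, here is a minimal sketch in the classic explicit-copy CUDA style (all names, such as `vec_add`, are illustrative, not from the lecture): one thread is created per array element, and the hardware schedules the resulting blocks of threads across the SMs.

```cuda
#include <cstdio>

// One thread per element; the hardware schedules whole blocks of threads onto SMs.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n)                                      // the last block may overshoot n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host-side data.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Classic style: allocate on the GPU and copy data back and forth explicitly.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch ~1M threads, far more than any CPU would run concurrently.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // copy the result back
    printf("h_c[0] = %f\n", h_c[0]);                      // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

The CUDA section below contrasts this explicit copying with the newer shared-memory-area approach.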
Programming a GPU
Several options can be used:
- C++
- directive-based languages (OpenMP)
- frameworks that abstract the hardware away (Kokkos)
- native libraries (OpenCL)
- native code (CUDA)
CUDA
- the most widely used and most optimized option
- threads are divided into thread blocks; multiple thread blocks form a grid
- data used to be copied back and forth between main memory and GPU memory, but modern CUDA offers a common shared (unified) memory area (see the sketch after this list)
- one big challenge is how to connect multiple GPUs to each other
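
A sketch of the two points above, reusing the hypothetical `vec_add` kernel from earlier: the `<<<blocks, threads>>>` launch configuration expresses the grid of thread blocks, and `cudaMallocManaged` allocates in a unified address space visible to both CPU and GPU, so the explicit copies disappear.

```cuda
#include <cstdio>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified (managed) memory: one pointer valid on both host and device,
    // replacing the old explicit cudaMemcpy back-and-forth.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));

    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }  // written by the CPU

    // Grid of thread blocks: 256 threads per block, enough blocks to cover n.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();      // required before the CPU reads managed memory

    printf("c[0] = %f\n", c[0]);  // read directly, no copy back
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```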
TPUs
- similar to GPUs, but they operate on tensors (multidimensional arrays)
- key feature is the Matrix Multiplication Unit (MXU)
- the best way to connect them is a torus topology (sketched below)
- high performance at low cost compared with classic GPUs
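
A tiny host-side sketch of why a torus works well (the 4x4 size and all names are made up for illustration): wrap-around links give every chip the same number of neighbors and roughly halve the worst-case hop distance per dimension compared with a plain mesh.

```cuda
#include <cstdio>

// Hypothetical 4x4 grid of chips arranged as a 2D torus.
const int X = 4, Y = 4;

// Wrap-around neighbor: edge links connect back to the opposite side,
// so every chip has exactly four neighbors and no chip sits on a "border".
int wrap(int v, int max) { return (v + max) % max; }

int main() {
    int x = 0, y = 0;  // a corner chip in a plain mesh...
    // ...but in a torus its neighbors wrap to the opposite edges:
    printf("left  neighbor: (%d, %d)\n", wrap(x - 1, X), y);  // (3, 0)
    printf("right neighbor: (%d, %d)\n", wrap(x + 1, X), y);  // (1, 0)
    printf("up    neighbor: (%d, %d)\n", x, wrap(y - 1, Y));  // (0, 3)
    printf("down  neighbor: (%d, %d)\n", x, wrap(y + 1, Y));  // (0, 1)
    return 0;
}
```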