09 - GPUs & TPUs
Overview
- Description:: How GPUs and TPUs provide massive parallelism in a large-scale datacenter
GPUs
- from 2010 to 2020, GPU usage has increased significantly
- CPUs vs GPUs:
- CPUs have a few powerful cores with large caches and complex control units
- GPUs have many more cores, at the cost of simpler/smaller caches and control units
- programs with few threads → CPUs, programs with many threads → GPUs (see the sketch after this list)
- a GPU is an array of streaming multiprocessors (SMs)
- each SM contains multiple cores with shared control logic and an instruction cache
- SMs can also include other specialized modules (e.g. ray tracing cores)
- shared global memory
- they are built on the classic von Neumann architecture, which is not performance-oriented (other models perform better)
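
To make the many-threads model concrete, here is a minimal sketch in the classic explicit-copy CUDA style (all names, such as `vec_add`, are illustrative, not from the lecture): one thread is created per array element, and the hardware schedules the resulting blocks of threads across the SMs.

```cuda
#include <cstdio>

// One thread per element; the hardware schedules whole blocks of threads onto SMs.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique global thread index
    if (i < n)                                      // the last block may overshoot n
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Host-side data.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Classic style: allocate on the GPU and copy data back and forth explicitly.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch ~1M threads, far more than any CPU would run concurrently.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);  // copy the result back
    printf("h_c[0] = %f\n", h_c[0]);                      // expect 3.0

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}
```

The CUDA section below contrasts this explicit copying with the newer shared-memory-area approach.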
Programming a GPU
Several options can be used:
- C++
- directive-based languages (OpenMP)
- frameworks that abstract the hardware away (Kokkos)
- native libraries (OpenCL)
- native code (CUDA)
CUDA
- the most widely used and most optimized option
- threads are divided into thread blocks; multiple thread blocks form a grid
- data used to be copied back and forth between main memory and GPU memory, but modern CUDA offers a common shared (unified) memory area (see the sketch after this list)
- one big challenge is how to connect multiple GPUs to each other
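
A sketch of the two points above, reusing the hypothetical `vec_add` kernel from earlier: the `<<<blocks, threads>>>` launch configuration expresses the grid of thread blocks, and `cudaMallocManaged` allocates in a unified address space visible to both CPU and GPU, so the explicit copies disappear.

```cuda
#include <cstdio>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // Unified (managed) memory: one pointer valid on both host and device,
    // replacing the old explicit cudaMemcpy back-and-forth.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));

    for (int i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }  // written by the CPU

    // Grid of thread blocks: 256 threads per block, enough blocks to cover n.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vec_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();      // required before the CPU reads managed memory

    printf("c[0] = %f\n", c[0]);  // read directly, no copy back
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```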
TPUs
- similar to GPUs, but they operate on tensors (multidimensional arrays)
- key feature is the Matrix Multiplication Unit (MXU)
- the best way to connect them is a torus topology (sketched below)
- high performance at low cost compared with classic GPUs
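
A tiny host-side sketch of why a torus works well (the 4x4 size and all names are made up for illustration): wrap-around links give every chip the same number of neighbors and roughly halve the worst-case hop distance per dimension compared with a plain mesh.

```cuda
#include <cstdio>

// Hypothetical 4x4 grid of chips arranged as a 2D torus.
const int X = 4, Y = 4;

// Wrap-around neighbor: edge links connect back to the opposite side,
// so every chip has exactly four neighbors and no chip sits on a "border".
int wrap(int v, int max) { return (v + max) % max; }

int main() {
    int x = 0, y = 0;  // a corner chip in a plain mesh...
    // ...but in a torus its neighbors wrap to the opposite edges:
    printf("left  neighbor: (%d, %d)\n", wrap(x - 1, X), y);  // (3, 0)
    printf("right neighbor: (%d, %d)\n", wrap(x + 1, X), y);  // (1, 0)
    printf("up    neighbor: (%d, %d)\n", x, wrap(y - 1, Y));  // (0, 3)
    printf("down  neighbor: (%d, %d)\n", x, wrap(y + 1, Y));  // (0, 1)
    return 0;
}
```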