Originally Posted by DanceswithUnix
That bandwidth is important for GPUs actually isn't *strictly* true; you can't think of a GPU in CPU terms. A CPU has a few cores with two threads per core, so the cache per thread is plentiful. The problem with a GPU is that it runs thousands of threads, so even with more cache than a CPU it is overwhelmed, and you end up with something like 64 bytes of cache per thread. That doesn't remove the fundamental problem that execution of a thread stalls if a read instruction can't get its data; that's the same for a CPU and a GPU, it's just the nature of compute. But it does mean that, unlike a CPU, which can use its cache to hide memory latency, a GPU is down to memory controller tricks.
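A quick back-of-envelope calculation makes that gap concrete. The hardware figures below are illustrative assumptions rather than measurements (an 8-core/16-thread CPU with 32 MB of L3, and a GPU with 80 SMs and 2048 resident threads per SM sharing a 6 MB L2), but they land in the right ballpark:

```cuda
// Cache-per-thread arithmetic, host-only; all hardware figures are assumed, not measured.
#include <stdio.h>

int main(void) {
    // CPU: a handful of hardware threads share a large cache.
    const double cpu_cache_bytes = 32.0 * 1024 * 1024;  // 32 MB L3 (assumed)
    const int    cpu_threads     = 8 * 2;               // 8 cores x 2 SMT threads

    // GPU: tens of thousands of resident threads share a modest cache.
    const double gpu_cache_bytes = 6.0 * 1024 * 1024;   // 6 MB L2 (assumed)
    const int    gpu_threads     = 80 * 2048;           // 80 SMs x 2048 resident threads

    printf("CPU cache per thread: %.0f KB\n", cpu_cache_bytes / cpu_threads / 1024.0);
    printf("GPU cache per thread: %.0f bytes\n", gpu_cache_bytes / gpu_threads);
    return 0;
}
```

With those assumed numbers it works out to about 2 MB per CPU thread versus roughly 38 bytes per GPU thread, which is why cache alone can't hide latency the way it does on a CPU.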
What is the big trick of HBM memory? It's that the 1024-bit-wide interface is divided up into 32-bit chunks, giving 32 individual memory channels servicing lots of threads in parallel. The same is true of GDDR memory: the win is the ability to chop memory accesses up into small chunks. The bandwidth of the DDR4 used with CPUs is actually pretty good, but it doesn't get used in GPUs because the burst length is tuned to a CPU cache line (a burst of 8 on a 64-bit channel delivers exactly one 64-byte line), and that's like a flood of useless data to a GPU. There was an excellent Nvidia PowerPoint deck about that, but I doubt I'll ever find it again.
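Here is a minimal CUDA sketch of the same idea from the software side (the kernels, names, and sizes are hypothetical, just to contrast the two access patterns): when the 32 threads of a warp read adjacent 4-byte words, the hardware coalesces them into a few wide transactions that map neatly onto those narrow memory channels; scatter the warp instead, and every thread drags in a mostly useless burst.

```cuda
// Coalesced vs. strided global loads; illustrative only, hypothetical kernels and sizes.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void coalesced_copy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];  // adjacent threads read adjacent words: few wide transactions per warp
}

__global__ void strided_copy(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);  // scatter the warp across memory
        out[i] = in[j];  // each thread lands on a different line: one burst per thread, mostly wasted
    }
}

int main(void) {
    const int n = 1 << 24;  // 16M floats (assumed size)
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);
    coalesced_copy<<<grid, block>>>(in, out, n);    // friendly to narrow-channel memory
    strided_copy<<<grid, block>>>(in, out, n, 33);  // fragments every warp's access
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Profiling the two kernels with a tool like Nsight Compute would show the strided version issuing many times more memory transactions per request, which is the "flood of useless data" problem in miniature.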
Anyway, that's why I said GPUs like to be next to their memory controller.