Originally Posted by DanceswithUnix
That bandwidth is important for GPUs actually isn't *strictly* true; you can't think of a GPU in CPU terms. A CPU has a few cores with two threads per core, so the cache per thread is plentiful. The problem with a GPU is that it runs thousands of threads, so even with more cache than a CPU it is overwhelmed, and you end up with something like 64 bytes of cache per thread. That doesn't remove the fundamental problem that execution of a thread stalls if a read instruction can't get its data; that's the same for a CPU and a GPU, it's just the nature of compute. But it does mean that, unlike a CPU, which can use its cache to hide memory latency, a GPU is down to memory controller tricks.
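A quick back-of-envelope calculation makes that gap concrete. The hardware figures below are illustrative assumptions rather than measurements (an 8-core/16-thread CPU with 32 MB of L3, and a GPU with 80 SMs and 2048 resident threads per SM sharing a 6 MB L2), but they land in the right ballpark:

```cuda
// Cache-per-thread arithmetic, host-only; all hardware figures are assumed, not measured.
#include <stdio.h>

int main(void) {
    // CPU: a handful of hardware threads share a large cache.
    const double cpu_cache_bytes = 32.0 * 1024 * 1024;  // 32 MB L3 (assumed)
    const int    cpu_threads     = 8 * 2;               // 8 cores x 2 SMT threads

    // GPU: tens of thousands of resident threads share a modest cache.
    const double gpu_cache_bytes = 6.0 * 1024 * 1024;   // 6 MB L2 (assumed)
    const int    gpu_threads     = 80 * 2048;           // 80 SMs x 2048 resident threads

    printf("CPU cache per thread: %.0f KB\n", cpu_cache_bytes / cpu_threads / 1024.0);
    printf("GPU cache per thread: %.0f bytes\n", gpu_cache_bytes / gpu_threads);
    return 0;
}
```

With those assumed numbers it works out to about 2 MB per CPU thread versus roughly 38 bytes per GPU thread, which is why cache alone can't hide latency the way it does on a CPU.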
What is the big trick of HBM memory? It's that the 1024-bit-wide interface is divided up into 32-bit chunks, giving 32 individual memory channels servicing lots of threads in parallel. The same is true of GDDR memory: the win is the ability to chop memory accesses up into small chunks. The bandwidth of the DDR4 used with CPUs is actually pretty good, but it doesn't get used in GPUs because the burst length is tuned to a CPU cache line (a burst of 8 on a 64-bit channel delivers exactly one 64-byte line), and that's like a flood of useless data to a GPU. There was an excellent Nvidia PowerPoint deck about that, but I doubt I'll ever find it again.
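Here is a minimal CUDA sketch of the same idea from the software side (the kernels, names, and sizes are hypothetical, just to contrast the two access patterns): when the 32 threads of a warp read adjacent 4-byte words, the hardware coalesces them into a few wide transactions that map neatly onto those narrow memory channels; scatter the warp instead, and every thread drags in a mostly useless burst.

```cuda
// Coalesced vs. strided global loads; illustrative only, hypothetical kernels and sizes.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void coalesced_copy(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i];  // adjacent threads read adjacent words: few wide transactions per warp
}

__global__ void strided_copy(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);  // scatter the warp across memory
        out[i] = in[j];  // each thread lands on a different line: one burst per thread, mostly wasted
    }
}

int main(void) {
    const int n = 1 << 24;  // 16M floats (assumed size)
    float *in = nullptr, *out = nullptr;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    dim3 block(256), grid((n + block.x - 1) / block.x);
    coalesced_copy<<<grid, block>>>(in, out, n);    // friendly to narrow-channel memory
    strided_copy<<<grid, block>>>(in, out, n, 33);  // fragments every warp's access
    cudaDeviceSynchronize();

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

Profiling the two kernels with a tool like Nsight Compute would show the strided version issuing many times more memory transactions per request, which is the "flood of useless data" problem in miniature.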
Anyway, that's why I said GPUs like to be next to their memory controller.