AMD - Zen chitchat

**scaryjim** · 24-04-2018, 12:07 PM

Originally Posted by Blaineoliver

... It does seem like the memory goes via the infinity fabric, but then that makes sense if you think about the fact memory speed affects gaming more as per CCX traffic is sped up by faster ram.

That wasn't the issue, it was the claim that Intel's memory controllers are inside the cores! They're not, they're a separate part of the die and are accessed through a crossbar, ring bus or mesh (can't remember which for each particular Intel processor line). Infinity Fabric is an extension of the same concept that makes it easier to tie different types of block together - so rather than only connecting a set of cores to a memory controller, it can also connect blocks of cores to other blocks of cores (i.e. CCXes) or to a GPU (e.g. in APUs).

Intel's IMCs are probably "closer" to the cores, topologically, but they're certainly not integrated into the cores - they're part of the "uncore" of the chip.

**Corky34** · 24-04-2018, 12:56 PM

Originally Posted by Xlucine

Err, what? That's news to me

What Blaineoliver said. It's a bit amorphous when talking about microarchitecture design, as where one part stops and another starts is open to interpretation, but in the case of Intel each individual core within a package has its own memory controller that can issue its own memory read/write requests onto the bus, which in turn connects to the DRAM interface that talks to the RAM.

With Zen (afaict) each core talks to the cache-coherent master (CCM) and it's the CCM that places data onto the SDF that in turn links to the unified memory controller (UMC).

As a side note it's probably why AMD were all but unaffected by Meltdown and Spectre, as only the CCMs and IOMs can talk directly to the UMC, whereas with Intel there's not that separation.

Originally Posted by scaryjim

Intel's IMCs are probably "closer" to the cores, topologically, but they're certainly not integrated into the cores - they're part of the "uncore" of the chip.

Yea ^^That^^ It's not so much "in" the core but closer to it.

Diagrams may do a better job of explaining things.

kaby-lake soc block diagram...

Zen soc block diagram...

**CAT-THE-FIFTH** · 24-04-2018, 01:05 PM

Originally Posted by Corky34

True but I'm unsure how much more they could get out of memory latency, with Zen the cores in each CCX don't have direct access to RAM, the memory controllers are not integrated into the cores themselves like they are with Intel, they're attached to the infinity fabric's (IF) scalable data fabric (SDF), it adds an extra step compared to Intel.

That's not to say one is better or worse than the other, as it's horses for courses, but it seems AMD have picked most of the low hanging fruit in terms of RAM latency already.

Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??

**Corky34** · 24-04-2018, 01:25 PM

Originally Posted by CAT-THE-FIFTH

Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??

I think it's running at the same speed, as IIRC they said they didn't want to introduce a clock skew due to increased latency. Could it be that DDR effectively doubles the clockspeed?

(Link)

The DRAM is attached to the DDR4 interface which is attached to the Unified Memory Controller (UMC). There are two Unified Memory Controllers (UMC) for each of the DDR channels which are also directly connected to the SDF. It's worth noting that all SDF components run at the DRAM's MEMCLK frequency. For example, a system using DDR4-2133 would have the entire SDF plane operating at 1066 MHz. This is a fundamental design choice made by AMD in order to eliminate clock-domain latency.

**CAT-THE-FIFTH** · 24-04-2018, 01:34 PM

Originally Posted by Corky34

I think it's running at the same speed, as IIRC they said they didn't want to introduce a clock skew due to increased latency. Could it be that DDR effectively doubles the clockspeed?

(Link)

The thing is, running faster RAM causes the effective bandwidth between the CCXes to increase, which is why Ryzen seems to gain somewhat more from higher-clocked RAM than Intel.

**Corky34** · 24-04-2018, 01:35 PM

Yes, it's not really running at half clock speed; it's that DDR doubles the data transfer rate (IIRC by using both the rising and falling clock edges to transmit data).
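To put numbers on that, here's a minimal sketch, assuming (as the text quoted above says) that the fabric runs at MEMCLK and that MEMCLK is simply half the DDR transfer rate. The function name `memclk_mhz` is made up for illustration.

```python
def memclk_mhz(ddr_rate_mts: float) -> float:
    """Return the real memory clock for a given DDR transfer rate.

    DDR moves data on both clock edges, so the marketing number
    (e.g. 2133 in DDR4-2133) is twice the actual clock.
    """
    return ddr_rate_mts / 2


if __name__ == "__main__":
    # Assumption: the fabric clock equals MEMCLK, per the quoted text.
    for rate in (2133, 2666, 3200):
        print(f"DDR4-{rate}: MEMCLK / fabric clock is about {memclk_mhz(rate):.1f} MHz")
```

So the fabric isn't at "half speed", the DDR rating is just double the clock it was always running at.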

**DanceswithUnix** · 24-04-2018, 01:44 PM

Originally Posted by Corky34

... but in the case of Intel each individual core within a package has its own memory controller

Careful with those terms. Each core has a port onto its stop on the ring bus. Memory controllers drive DDR channels. As shown by the image you embedded, Intel bundle their memory controllers into the system agent, which has its own single stop on the ring bus.

I am interested to see that Intel can still get the ring bus to work for them. ATi gave up a really long time ago as it didn't scale well enough for them. Remember this? ...

Originally Posted by CAT-THE-FIFTH

Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??

Probably a distance issue. Interconnect will have to go a long way across the die (I'm taking a clue from the name), which means charging up a long wire, which takes time and so limits clock speed.

Edit: I have a sneaky feeling that Intel's pre-fetcher is much better than AMD's which would make sense as back when Intel's Core2 still used a chipset memory controller they really needed it to hide the latency of that long trip to DRAM. They hid it rather well. Put that prefetch technology into a modern core with an on-die IMC and I'm guessing it could hide the latency across their ring bus.

Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.
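A toy model of the ring argument, as a rough sketch only: it assumes one cycle per hop and a bidirectional ring where data takes the shorter way round, neither of which is confirmed anywhere in this thread. `ring_hops` and `average_hops` are hypothetical names.

```python
def ring_hops(src: int, dst: int, stops: int) -> int:
    """Hops between two stops on a bidirectional ring, shortest direction."""
    distance = abs(src - dst) % stops
    return min(distance, stops - distance)


def average_hops(stops: int) -> float:
    """Mean hop count from stop 0 to every other stop on the ring."""
    total = sum(ring_hops(0, dst, stops) for dst in range(1, stops))
    return total / (stops - 1)


if __name__ == "__main__":
    # Assumption: one cycle per hop, so hops stand in for added latency.
    for stops in (4, 6, 8, 10):
        print(f"{stops} stops: worst case {stops // 2} hops, "
              f"average {average_hops(stops):.2f} hops")
```

Under those assumptions the cost per extra stop is small, but it only ever goes up as stops are added.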

**CAT-THE-FIFTH** · 24-04-2018, 01:48 PM

Originally Posted by Corky34

Yes.

The main issue is that effective bandwidth to the L3 cache is much higher than the current bandwidth between CCX units, so they do need to find a way around this, as it clearly does seem to have an effect with workloads like games and audio stuff, and it's most likely we will see a move to 6 cores per CCX (or more) with the next iterations of Ryzen. It's why the effective IPC increase of the Ryzen 2000 series CPUs in games is between 5% and 7% instead of the 3% for non-gaming stuff, and DAW benchmarks saw massive jumps in certain scenarios. I suppose one way would be to increase the amount of L3 cache per core, but that still doesn't stop the issue of games not recognising the CCXes properly if the dev hasn't bothered patching things.

It does make me wonder how a Ryzen APU in a console running GDDR5 or HBM2 would fare though!

**Corky34** · 24-04-2018, 05:31 PM

Originally Posted by DanceswithUnix

Careful with those terms. Each core has a port onto its stop on the ring bus. Memory controllers drive DDR channels. As shown by the image you embedded, Intel bundle their memory controllers into the system agent, which has its own single stop on the ring bus.

I am interested to see that Intel can still get the ring bus to work for them. ATi gave up a really long time ago as it didn't scale well enough for them. Remember this? ...

I was trying to be careful.

AFAIK the memory subsystems have two parts (at least in terms of what we're referring to), there's the part that places data from the core intended for RAM onto the bus and then there's the part that takes it off the bus and sends it to the RAM.

In Intel's case the part that places data onto the bus is (logically speaking) sitting between each core and its connection to the bus (each core has its own connection). In AMD's case that part sits in the same (logical) place but groups 4 cores together, and it's the CCM's job to put the data on the bus.

The diagrams I posted don't include where those parts are, but for the Intel one we could probably (logically) place them on the brown links going from each core to the ring bus, and on the AMD diagram it would be the fat brown lane going to the IF crossbar.

IIRC you're right in being surprised with Intel still using the ring bus. Again, IIRC, I read something about them struggling to get latency down to acceptable levels when core counts increased; it's probably why they've been unwilling to release higher core count parts for so long, as fixing that probably requires quite a bit of redesign and using a similar system to what AMD have used.

EDIT: I won't post it here as it's a little OT but here's a link to Intel's solution to increased core counts, lots of connected ring buses, it gives me nightmares of the old ring bus networks.

And they had the cheek to poke holes in AMD's implementation, the networking world mostly moved away from the ring bus topology for a reason, Intel.

**watercooled** · 24-04-2018, 07:01 PM

Originally Posted by DanceswithUnix

Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.

Pretty much what I wanted to say - at least at the block diagram level it's not all that different to Intel's - both access the memory controller via a bus, the blue ring on the Intel diagram is equivalent to the pink Fabric block on the AMD one.

As for the L3 cache positioning, AMD use a victim cache so I'm not sure if that would add latency but it also means AMD aren't prefetching into it like Intel do. It's also not clear from diagrams how Intel have it set up, some seem to show cores connecting to the ring through the L3 similar to AMD:
http://www.legitreviews.com/intel-hd...tecture_170869
https://www.pcper.com/reviews/Proces...ablets-servers

The ring hasn't kept scaling for Intel - in some of their past generation Xeons they've used multiple rings, and now they've switched to a mesh.

I know Intel have played with the ring's clock since Sandy Bridge and IIRC it now runs at the core clock vs AMD's memory-clocked fabric. Everything considered, AMD have to balance scalability, power and performance across a range of die configurations.

Edit: Nope sorry, the ring has its own clock domain now, and data is fed through the L3:

Ring Clock - The frequency at which the ring interconnect and LLC operate at. Data from/to the individual cores are read/written into the L3 at a rate of 32B/cycle operating at Ring Clock frequency.

https://en.wikichip.org/wiki/intel/m...#Clock_domains
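Rough arithmetic for what 32B/cycle means in bandwidth terms, as a sketch. The 4.0 GHz ring clock below is a placeholder picked for illustration, not a figure from the quoted page, and `ring_bandwidth_gb_s` is a made-up name.

```python
def ring_bandwidth_gb_s(bytes_per_cycle: int, ring_clock_ghz: float) -> float:
    """Peak per-core transfer rate into the L3, in GB/s.

    bytes/cycle * cycles/second = bytes/second; with the clock in GHz
    the result comes out directly in GB/s (10^9 bytes per second).
    """
    return bytes_per_cycle * ring_clock_ghz


if __name__ == "__main__":
    # Assumption: 4.0 GHz is a hypothetical ring clock, not a measured one.
    print(f"{ring_bandwidth_gb_s(32, 4.0):.0f} GB/s peak per core at 32B/cycle")
```

The same sum works for any memory-clocked fabric if you know its width per cycle, which is why the clock domain matters so much here.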

**Corky34** · 25-04-2018, 09:26 AM

Originally Posted by DanceswithUnix

Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.

So I've been doing some more research and sorting things out in my head, and I don't think it's as simple as substituting a ring bus for the infinity fabric. If I've not got myself confused, the scalable data fabric (SDF) in IF consists of two separate links: the infinity fabric on-package (IFOP) and infinity fabric inter-socket (IFIS) links. From what I can tell, the IFOP link doesn't deal with memory controller requests, as those are sent via the IFIS.

At first, trying to work out the difference had me confused, as data being sent via the IFOP (core-2-core, core-2-L1, core-2-L2 on a single package) is sent via a link that doesn't (from what I can tell) have access to the UMC, although I'm not entirely sure on that.

It seems the cores within a CCX can "talk" to each other as and when they like; however, if they need to "talk" to something outside of their own CCX (memory controller, PCIe, USB, another CCX, another socket, etc.), then the request needs to go via the cache-coherent master that encodes the data onto the IFIS.

I've probably got that all wrong knowing me and I'm still unsure what plane the L3 cache operates on.

**scaryjim** · 25-04-2018, 11:26 AM

Originally Posted by CAT-THE-FIFTH

Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??

IIRC it runs at the actual clock speed of the RAM, i.e. if you use 2666 'MHz' DDR4 the fabric runs at 1333MHz? Either way, it runs in the DRAM clock domain, which means that if you run faster RAM you also run a faster infinity fabric. I wondered with first gen Ryzen whether the memory speed limits were more down to the fabric than the actual IMC... the improvement to memory clocks could easily be because the fabric can clock higher on 12nm...

Originally Posted by DanceswithUnix

... each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. ...

Which is why they use a mesh on the high-core-count Xeons, I believe.

I wonder if the ring bus connecting between the L2 and L3 cache has some impact on the fact that Intel's direct RAM access latency is so good? AMD appear to have at least equivalent, if not better, cache latency, but from those block diagrams it looks like direct RAM access has to traverse the L3 cache, which means adding an extra step vs. the Intel ring bus....

**CAT-THE-FIFTH** · 25-04-2018, 04:51 PM

Looks like the AT results were down to HPET issues:

https://www.anandtech.com/show/12678...-ryzen-results

**watercooled** · 25-04-2018, 05:47 PM

So having very quickly skimmed it, it seems like AMD results are basically correct, and Intel results are the ones reading on the low side? I'll be sure to read it properly when I get a chance.

Originally Posted by watercooled

...provided they're correct and not due to e.g. a timing issue...

Seems it had something to do with that after all! Although it seems the performance was measured correctly, but the method of measurement actually impacted performance? Again, I've only skimmed it.

**Corky34** · 25-04-2018, 05:59 PM

Originally Posted by CAT-THE-FIFTH

Looks like the AT results were down to HPET issues:

https://www.anandtech.com/show/12678...-ryzen-results

I guess that invalidates those cache timing test results and, by extension, what I thought about AMD having picked most of the low hanging fruit with regards to their caches.

I hope they revisit those tests as for me it's interesting to see and IDK of other sites that have dug that deep.

Originally Posted by watercooled

Seems it had something to do with that after all! Although it seems the performance was measured correctly, but the method of measurement actually impacted performance? Again, I've only skimmed it.

Having read through it, and I'm happy for someone to correct me, I got the impression it's a bit of both.

By forcing Windows to only use one timer they caused problems in the performance of programs and also the measurement of that performance.
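To show how the act of timing can itself cost time, here's a minimal sketch that measures the average cost of one timer query. It uses whatever clock the OS backs Python's `time.perf_counter_ns` with, which is an assumption on my part and not necessarily HPET; `timer_call_cost_ns` is a made-up name.

```python
import time


def timer_call_cost_ns(calls: int = 100_000) -> float:
    """Average wall time per timer query, in nanoseconds.

    The figure includes Python loop overhead, so treat it as an
    upper bound on the cost of the timer call itself.
    """
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.perf_counter_ns()  # the call whose cost we are measuring
    end = time.perf_counter_ns()
    return (end - start) / calls


if __name__ == "__main__":
    print(f"about {timer_call_cost_ns():.0f} ns per timer query")
```

If a program queries the timer very often, a slower timer source makes each of those calls dearer, so both the program and the measurement of it can suffer.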

**watercooled** · 25-04-2018, 06:17 PM

As I understand it, AMD's results are valid (they don't appear to be affected by this HPET implementation 'bug'). Also, other sites should be unaffected as this is down to the way Anandtech test.
