I wonder if anyone knows where this '22GB/s' figure came from, referring to the inter-CCX bandwidth? Unless it's some measured, workload-specific value, it seems bizarre, and it's been bugging me since it first started appearing unreferenced in forum posts and articles.
The slides are out there for everyone to see and calculate from. This is the one you need: https://i1.wp.com/thetechaltar.com/w...eCQG.jpg?ssl=1
Each CCX communicates with the fabric at 32 bytes per cycle, at memclk. So, assuming DDR4-2667 as in the slide, that means 1333MHz. This is something far too many people claim to understand yet keep confusing: the DDR memory transfer rate is *twice* the clock speed. The interconnect is not at a 1/2 ratio of the memory clock, nor does it have anything to do with memory channels - all the information you need to understand that is clearly presented in the slide.
So, to continue: 1333MHz multiplied by 32B/cycle = 42.656GB/s.
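If it helps, here's that arithmetic as a trivial Python sanity check - the only input is the slide's DDR4-2667 memclk, nothing here is measured, it's just unit conversion:

```python
# Fabric link bandwidth per CCX: 32 bytes per fabric clock, fabric clock = memclk.
# Per the slide: DDR4-2667 -> memclk = 1333 MHz.
memclk_hz = 1333e6          # fabric runs at memclk
bytes_per_cycle = 32        # CCX <-> fabric link width per the slide

link_bw = memclk_hz * bytes_per_cycle            # bytes/s
print(f"CCX<->fabric: {link_bw / 1e9:.3f} GB/s")  # ~42.656 GB/s, not 22
```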
That's still far below intra-CCX L3 bandwidth, but it's not the 22GB/s figure being bandied about. It's not exactly double 22GB/s either, so it doesn't look like someone simply halved a correct calculation by mistake because of that 0.5x memclk confusion.
Now on to another point: some people are again causing confusion by claiming the fabric bandwidth is below that available to memory. I'm not sure why anyone would think this, not least because the IMC connects to the cores *through* the fabric. (One person also says 22GB/s is similar to, quite specifically, DDR3-1600 of all things - why it couldn't have been DDR4-1600 is beyond me - but either way, a single channel of that gives 12.8GB/s, nowhere near 22GB/s.) Again looking at that slide, it seems any CCX can fully saturate both memory channels. Since everything here sits in the same clock domain, we don't even need full bandwidth calculations: it's 32B/cycle from the CCX, 32B/cycle to the IMC, and 16B/cycle to each of the two memory channels.
And there's nothing weird going on either, as this works out exactly as expected: 16B/cycle multiplied by 1333MHz gives ~21.3GB/s. 1333MHz DDR memory means 2666MT/s across a 64-bit (8-byte) wide memory channel, so 2666MT/s x 8B = ~21.3GB/s - exactly the same.
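Same sanity check for a single channel, done both ways - fabric side and DRAM side - again just the slide's figures plugged in:

```python
# One memory channel, computed two equivalent ways.
memclk_hz = 1333e6

# Fabric side: 16 bytes per cycle per channel, at memclk.
fabric_side = memclk_hz * 16              # ~21.3 GB/s

# DRAM side: DDR -> two transfers per clock, 64-bit (8-byte) channel.
dram_side = (memclk_hz * 2) * 8           # 2666 MT/s x 8 B, ~21.3 GB/s

print(fabric_side / 1e9, dram_side / 1e9) # both ~21.3 - exactly the same
```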
Another theory is that there are some hard partitions in the fabric. Again, this demonstrably doesn't seem to be the case - check that AIDA screenshot again. In that case they're using 1200MHz (DDR4-2400) memory, but it works out all the same: 1200MHz x 16B/cycle = 19.2GB/s per channel, 38.4GB/s dual channel, and AIDA shows ~37GB/s - no obvious bottlenecks there. So, if memory accesses can cross the fabric unhindered, why would inter-CCX cache snoops run at some arbitrarily lower rate across the same fabric? That makes no sense.
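And the DDR4-2400 case from that screenshot, same sketch - note the ~37GB/s is AIDA's measured figure, not computed:

```python
# DDR4-2400 -> memclk = 1200 MHz, as in the AIDA screenshot.
memclk_hz = 1200e6
per_channel = memclk_hz * 16     # 19.2 GB/s per channel
dual_channel = per_channel * 2   # 38.4 GB/s theoretical

print(dual_channel / 1e9)        # 38.4; AIDA measures ~37, i.e. no fabric bottleneck
```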
What isn't yet clear, AFAIK, is how exactly this fabric works internally, and how (or if) this bandwidth is shared. Could CCX0 read from CCX1 at 32B/cycle while CCX1 simultaneously reads from CCX0 at 32B/cycle? Do cache snoops have to compete with the IO hub and memory for bandwidth on the fabric? E.g. can CCX0 access memory unhindered while CCX1 accesses the IO hub? Nothing on that slide mentions total fabric bandwidth, and any single port could saturate 32B/cycle on its own.
One last thing: the arrows on the slide are a bit unclear, e.g. across clock boundaries like cores>L3 or L3>fabric. However, I believe they mean one 'connection' per core and per CCX, respectively. Reasoning: AIDA again. Looking at L3 bandwidth, 32B x cclk (3500MHz) = 112GB/s. Two CCXs means 224GB/s *max* if 32B/cycle is the total L3 bandwidth per CCX (which is unlikely anyway, TBF), yet AIDA shows well above this for all tests. So from that I assume that each CCX has its own 32-byte-wide 'port' onto the fabric, and further that the total fabric bandwidth is not, as some are speculating, limited to this. AMD also talk about how scalable IF is.
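That reasoning as a quick check - 3500MHz is the core clock in that AIDA run, and the per-CCX cap is the hypothesis being ruled out:

```python
# If the cores>L3 arrow meant one 32B/cycle link for the *whole* CCX:
cclk_hz = 3.5e9               # core clock in the AIDA run
per_ccx_cap = cclk_hz * 32    # 112 GB/s per CCX under that reading
two_ccx_cap = per_ccx_cap * 2 # 224 GB/s absolute max for the chip

print(per_ccx_cap / 1e9, two_ccx_cap / 1e9)
# AIDA reports L3 bandwidth well above 224 GB/s, so the 32B/cycle
# must apply per core on the core<->L3 side, not per CCX.
```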
So, in conclusion: the 22GB/s figure looks wrong, and it seems unlikely that nodes on the fabric have to compete for bandwidth, provided they're not contending for the same node, of course. Nonetheless, I'd like to see some more detail about the Infinity Fabric itself.
Edit: It's typical that I find this right after posting, but according to this, the fabric is a bi-directional crossbar, so there should be no contention across the fabric itself - though of course there may still be contention for a given port. https://www.reddit.com/r/Amd/comment...ma&sh=a5ac8d75