It's a WIP.
The quote from the article is actually rather funny:
"The copy protection system, Denuvo, which limits many games to a maximum of 5 active systems in 24 hours, has prevented that for now, as we are also working on further articles using the same games."
So it seems this silly copy protection system hinders their benchmark efforts due to the 5 system limit. Guess CB are not big enough to buy multiple licences.
Of course, they've also hinted that they are working on their Intel 9000 series reviews.
Rather suspect that neither the 2600X results nor the Intel 9000 results will land anywhere other than where their clock speeds would dictate in most cases, as 6C/12T is enough for most games. The i9-9900K (@ the hugely inflated £600 mark) will gain most of its (most likely tiny) lead over the i7-8700K purely from the clock speed. Clock-for-clock it might actually lose a bit - there is a reason why previous Intel 8+ core chips didn't use ringbus.
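To put a rough number on "purely from the clock speed", here is a minimal sketch that assumes performance scales linearly with clock and ignores everything else (core count, cache, ringbus latency). The function name is made up for illustration, and the 5.0 GHz and 4.7 GHz figures are the advertised single-core turbo clocks of the i9-9900K and i7-8700K, not measured in-game clocks.

```python
def clock_only_uplift(new_ghz: float, old_ghz: float) -> float:
    """Percentage uplift expected if performance tracked clock speed alone.

    Assumption: perfectly linear scaling with frequency, which real games
    rarely achieve, so treat the result as an upper bound.
    """
    return (new_ghz / old_ghz - 1.0) * 100.0


# Advertised single-core turbo clocks (assumed inputs, not benchmark data).
uplift = clock_only_uplift(5.0, 4.7)
print(f"Clock-only uplift: {uplift:.1f}%")
```

So on clocks alone the gap would be in the mid single digits, which fits the "most likely tiny lead" guess.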
EDIT:
Actually, while I do expect the new Intel 9000s to be 'the best' gaming CPUs, I wonder whatever happened to all the GPCGMR and Intel enthusiasts who were urging everyone to get the 6700K or 7700K (or even worse the i5 equivalents) versus the Ryzen 1600 or 1700 just a short while ago?
That was after a good few people had said how much their mins and consistency had gone up with their Ryzen 6C/12T or 8C/16T while playing multi-player BF versus their older i5/i7. Think that was around the same time that CB ran their 6/8/+ core gaming reviews, just before Ryzen showed up.
Also, min frames and consistency were of course the main benefit of Mantle and DX12, which was mostly ignored by the same crowd.
Last edited by kompukare; 15-10-2018 at 08:28 PM.
It does, but given that was also an AMD design decision I'm not sure it's a better reference point! But AMD have been doing this modular 4 core cluster through 3 architectures now (Bulldozer was effectively a modular 4 (int) core architecture, as was Jaguar). The only real difference in Ryzen is how they're linked together.
At last, someone other than me has said this. Can't believe how many people have been talking about increasing the cores per CCX, which would be completely contrary to the whole point of modular design. The only thing that would make sense for AMD is increasing the number of CCXes, and the only real question in my mind is how many...
Take this with a grain of salt:
https://twitter.com/BitsAndChipsEng/...94745647165441
So Zen 2 has a 13% IPC increase in scientific tasks?
I've definitely thought of the possibility, but three clusters does seem like an awkward amount for layout on one die, so I'd be surprised if we saw that. Also, increasing the number of cores wouldn't subtract from the modularity at all really, just you're working with bigger building blocks brought about, in part, by the smaller node.
Very impressive if true, and would put it somewhat ahead of Skylake in those workloads. Intel don't generally clock the server parts to the hilt either, so it could help make up for the gap.
AMD could likely use a decent leapfrog in FMA throughput though as that's one area Skylake still pulls ahead per-core, and the server Skylake core is wider still.
But as much as fanboys would protest otherwise, Intel don't seem to be on the same pedestal they once were, performance wise. As much as some like to do a load of hand-waving and claim the two are incomparable, Apple's ARM cores look to be doing extremely well now (and like mobile processors in general, have been improving significantly year-on-year for a while). Yes, there are differences to be considered, but take a look at the SPEC numbers:
https://www.anandtech.com/show/13392...icon-secrets/4
https://www.anandtech.com/show/12694...rver-reality/7
And don't forget that's comparing a ~5W mobile core to a full-on x86 core drawing many times that.
Last edited by watercooled; 16-10-2018 at 07:07 PM.
I don't think 4 cores per CCX is anything to do with silicon layout, I think it is to do with optimal traffic to and from the shared L3 cache. Two cores per CCX would mean more snoop traffic between modules, 6 cores per CCX would increase contention on the L3 cache. That balance may change if they drastically redesign the cache in some way, but chances are if there is a sweet spot it won't really move.
OTOH, if a new node gives you more transistors to play with you can just place a third CCX into the top level design, hook it into the fabric and tell the tools to layout the chip. That is kind of the whole point of the fabric to be able to do that, and automated layout tools aren't fussed by there being three of something.
To add to what DwU's already said, it's not that changing the module subtracts from the modularity per se, but it does somewhat defeat the object of designing a reusable module.
The only reason to change the basic CCX structure is if they can't get sufficient performance out of the fabric to link more modules together - but since they're already running up to 8 CCXes across the fabric and are happy with the overall performance, that doesn't seem to be a problem.
A lot depends on how the finances are looking, but I could easily see AMD moving to maybe 3 dies on 7nm - a 3 or 4 CCX CPU die, a 2 CCX + big IGP desktop APU, and a tiny 1 CCX and IGP mobile APU. Sticking with a 4 core CCX makes that much easier (and just imagine a 4C/8T APU going up against Intel's gutless dual-core Pentiums…)
@DanceswithUnix & scaryjim: What I mean is, I imagine laying out a die with three CCXs would be awkward (as far as I know anyway), and silicon layout tends to be fairly symmetrical for a few reasons. For three modules you have a choice between three in a line, maybe with fabric routing around the outside (but a fairly long distance between either end), or an L shape with some dead area in one corner. Maybe they could arrange the uncore into that part but, personally, I don't see it as a likely layout. I'm not professing to be right though, it just doesn't feel right to me.
WRT the sweet spot of cores per module, it's not necessarily set in stone, and could change as a result of e.g. a different node, core design or changes to the Fabric.
Yeah I see where you are coming from, I was just trying to explain the pressures involved here.
The original 8086 had 29000 transistors, each one hand placed by some engineer. There, symmetry and regularity helped to avoid mistakes (get it right once and repeat) and smooth the design. Modern chips have 2 billion transistors, and last I heard there was almost no hand placement going on.
So my expectation is that the CCX is heavily optimised in isolation because it has to run at high clock rates. That produces something the SoC layout team can treat as a hard cell and so just place down as a block, and the uncore and fabric will route around it. You are quite right that nothing is set in stone, but a 50% jump in cores within a CCX is pretty severe, and I'm remembering the interviews when the Athlon X6 came out saying how much harder it was to connect 6 cores together than it was to connect 4, because this part of the design comes before you have the fabric to help you. If AMD want to make a 6 core APU then a 6 core CCX sounds tempting, but if it makes the CCX unwieldy then it could well make more sense to just stick two simpler 4 core CCXs on an APU and go straight for 8 cores. That's assuming 7nm has good enough density to make more than 4 cores feasible, though when you consider that they are skipping a full node at 10nm compared to the existing design, if they wanted cores there is probably now the transistor budget to go straight to 8 cores. More likely the APU will stick to 4 cores and add shaders and cache.
AFAIK the communication and power planes, at least the on-die stuff, sit above and below the layers that make up the cores. It's not so much the rejigging of the core to CCX ratio that would be the problem; it's starting afresh on the optimisation of those planes that would make little sense.
As DwU alludes to, it's probably better to focus on improving a known fixed block rather than constantly having to adjust things as you add or remove cores (like, to some extent, Intel does). It's basically easier to tune a network of wires for both communication and power when you know your relative distance to size will remain a constant.
I suspect that for die-scale interconnects the layout isn't quite such a problem - you can move other bits of uncore around to fill any gaps or tweak the shape of your die as desired.
Plus, who says they'd stop at 3? I don't know what the actual comparative feature sizes are between 14nm and 12nm - I suspect you don't get the expected quartering of area - but you'd only need a halving of area to cram twice as much of everything onto the die, and we already know that they're not increasing the memory channel count or the number of PCIe lanes, because they're maintaining socket compatibility with existing platforms. So not only do you get extra space from the node shrink, you also get a somewhat higher core : uncore die area ratio....
From 14 to 12nm gets you nothing but faster transistors, the actual density isn't changed or at least not the way AMD have been using it.
The interesting stuff is at 7nm. According to https://en.wikichip.org/wiki/7_nm_li...y_process#TSMC you get almost three times the density from 16nm to 7nm, but that's for an ideal SRAM cell, so between the increased unrouteable area and the pads and drivers around the periphery that don't really scale, you are probably looking at a doubling in transistors in that leap. There will be pressure to reduce die size though if cost per transistor improves but cost per mm^2 worsens, which is what makes me wonder if 12 cores might be the sweet spot. Marketing might say sod the cost though, I think Nvidia and Intel have demonstrated that high performance is something that people are willing to buy. That would make a drop to a quad core APU seem quite steep. *shrug*, who knows...
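A back-of-envelope sketch of that trade-off, for anyone who wants to play with the numbers. It assumes the uncore keeps the same transistor count (same memory channels and PCIe lanes for socket compatibility) and that everything shrinks by the same density gain, which is optimistic given that pads and drivers don't really scale. The function name and the 50% core-area share are placeholders for illustration, not measured figures for the current die.

```python
def cores_that_fit(base_cores: int,
                   density_gain: float,
                   core_area_fraction: float,
                   die_area_scale: float = 1.0) -> int:
    """Estimate how many cores fit on the new node.

    base_cores:         cores on the existing die (8 on the current one)
    density_gain:       practical transistor density multiplier (~2x guessed above)
    core_area_fraction: ASSUMED share of the old die taken by cores and their cache
    die_area_scale:     new die area relative to the old die (1.0 = same size)
    """
    uncore_fraction = 1.0 - core_area_fraction
    # Total budget on the new die, expressed in old-node area units.
    budget = die_area_scale * density_gain
    # The uncore is not growing, so it still costs its old share of that budget.
    core_budget = budget - uncore_fraction
    if core_budget <= 0:
        return 0
    return int(base_cores * core_budget / core_area_fraction)


# Same die size, 2x density, cores assumed to be half the old die.
print(cores_that_fit(8, 2.0, 0.5))         # 24
# Shrinking the die to 62.5% of its old area under the same assumptions.
print(cores_that_fit(8, 2.0, 0.5, 0.625))  # 12
```

With those placeholder inputs a 12 core part comes out at well under two thirds of the current die area, which is the sort of shrink that would matter if cost per mm^2 goes up. Change the core-area share and the answer moves a lot, so don't read too much into the exact figures.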