Thread: AMD - Zen chitchat

  1. #1233
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Gateshead
    Posts
    15,196
    Thanks
    1,232
    Thanked
    2,290 times in 1,873 posts
    • scaryjim's system
      • Motherboard:
      • Dell Inspiron
      • CPU:
      • Core i5 8250U
      • Memory:
      • 2x 4GB DDR4 2666
      • Storage:
      • 128GB M.2 SSD + 1TB HDD
      • Graphics card(s):
      • Radeon R5 230
      • PSU:
      • Battery/Dell brick
      • Case:
      • Dell Inspiron 5570
      • Operating System:
      • Windows 10
      • Monitor(s):
      • 15" 1080p laptop panel

    Re: AMD - Zen chitchat

    Quote Originally Posted by Blaineoliver View Post
    ... It does seem like memory traffic goes via the Infinity Fabric, but that makes sense when you consider that memory speed affects gaming more, since inter-CCX traffic is sped up by faster RAM.
    That wasn't the issue; it was the claim that Intel's memory controllers are inside the cores! They're not: they're a separate part of the die and are accessed through a crossbar, ring bus or mesh (can't remember which for each particular Intel processor line). Infinity Fabric is an extension of the same concept that makes it easier to tie different types of block together, so rather than only connecting a set of cores to a memory controller, it can also connect blocks of cores to other blocks of cores (i.e. CCXes) or to a GPU (e.g. in APUs).

    Intel's IMCs are probably "closer" to the cores, topologically, but they're certainly not integrated into the cores - they're part of the "uncore" of the chip.

  2. Received thanks from:

    Corky34 (24-04-2018)

  3. #1234
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Xlucine View Post
    Err, what? That's news to me
    What Blaineoliver said. It's a bit amorphous when talking about microarchitecture design, as where one part stops and another starts is open to interpretation, but in Intel's case each individual core within a package has its own memory controller that can issue its own memory read/write requests onto the bus, which in turn connects to the DRAM interface that talks to the RAM.

    With Zen (afaict) each core talks to the cache-coherent master (CCM), and it's the CCM that places data onto the SDF, which in turn links to the unified memory controller (UMC).

    As a side note, it's probably why AMD were all but unaffected by Meltdown and Spectre, as only the CCMs and IOMs can talk directly to the UMC, whereas with Intel there's not that separation.

    Quote Originally Posted by scaryjim View Post
    Intel's IMCs are probably "closer" to the cores, topologically, but they're certainly not integrated into the cores - they're part of the "uncore" of the chip.
    Yea ^^That^^ It's not so much "in" the core but closer to it.

    Diagrams may do a better job of explaining things.
    kaby-lake soc block diagram...

    Zen soc block diagram...
    Last edited by Corky34; 24-04-2018 at 01:21 PM. Reason: Adding pictures

  4. #1235
    Moosing about! CAT-THE-FIFTH's Avatar
    Join Date
    Aug 2006
    Location
    Not here
    Posts
    32,042
    Thanks
    3,909
    Thanked
    5,213 times in 4,005 posts
    • CAT-THE-FIFTH's system
      • Motherboard:
      • Less E-PEEN
      • CPU:
      • Massive E-PEEN
      • Memory:
      • RGB E-PEEN
      • Storage:
      • Not in any order
      • Graphics card(s):
      • EVEN BIGGER E-PEEN
      • PSU:
      • OVERSIZED
      • Case:
      • UNDERSIZED
      • Operating System:
      • DOS 6.22
      • Monitor(s):
      • NOT USUALLY ON....WHEN I POST
      • Internet:
      • FUNCTIONAL

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    True, but I'm unsure how much more they could get out of memory latency. With Zen the cores in each CCX don't have direct access to RAM: the memory controllers are not integrated into the cores themselves like they are with Intel, they're attached to the Infinity Fabric's (IF) scalable data fabric (SDF), which adds an extra step compared to Intel.

    That's not to say one is better or worse than the other, as it's horses for courses, but it seems AMD have picked most of the low-hanging fruit in terms of RAM latency already.
    Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??

  5. #1236
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by CAT-THE-FIFTH View Post
    Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??
    I think it's running at the same speed, as IIRC they said they didn't want to introduce a clock skew because of the increased latency. Could it be that DDR effectively doubles the clock speed?

    (Link)
    The DRAM is attached to the DDR4 interface which is attached to the Unified Memory Controller (UMC). There are two Unified Memory Controllers (UMC) for each of the DDR channels which are also directly connected to the SDF. It's worth noting that all SDF components run at the DRAM's MEMCLK frequency. For example, a system using DDR4-2133 would have the entire SDF plane operating at 1066 MHz. This is a fundamental design choice made by AMD in order to eliminate clock-domain latency.
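The arithmetic behind the quoted example can be sketched in a few lines. This is my own illustration, not something from the linked article: the function name `memclk_mhz` is made up, and it only assumes what the quote says, namely that the SDF runs at MEMCLK and that MEMCLK is half the DDR transfer rate.

```python
def memclk_mhz(ddr_rating_mts: float) -> float:
    """Return the real memory clock (MEMCLK) in MHz for a DDR rating in MT/s.

    DDR transfers data on both the rising and falling clock edge, so the
    actual clock is half the advertised transfer rate.
    """
    return ddr_rating_mts / 2


for rating in (2133, 2666, 3200):
    # Per the quoted text, every SDF component runs at MEMCLK.
    print(f"DDR4-{rating}: MEMCLK / SDF clock is about {memclk_mhz(rating):.1f} MHz")
```

The marketing numbers are rounded, which is why DDR4-2133 is usually quoted as 1066 MHz rather than exactly half of 2133.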

  6. #1237
    Moosing about! CAT-THE-FIFTH's Avatar
    Join Date
    Aug 2006
    Location
    Not here
    Posts
    32,042
    Thanks
    3,909
    Thanked
    5,213 times in 4,005 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    I think it's running at the same speed, as IIRC they said they didn't want to introduce a clock skew because of the increased latency. Could it be that DDR effectively doubles the clock speed?

    (Link)
    The thing is, running faster RAM increases the effective bandwidth between the CCXes, which is why Ryzen seems to gain somewhat more from higher-clocked RAM than Intel does.

  7. #1238
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Yes, it's not really running at half clock speed; it's that DDR doubles the data transfer rate (IIRC by using both the rising and falling clock edges to transmit data).

  8. #1239
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    13,006
    Thanks
    780
    Thanked
    1,568 times in 1,325 posts
    • DanceswithUnix's system
      • Motherboard:
      • Asus X470-PRO
      • CPU:
      • 5900X
      • Memory:
      • 32GB 3200MHz ECC
      • Storage:
      • 2TB Linux, 2TB Games (Win 10)
      • Graphics card(s):
      • Asus Strix RX Vega 56
      • PSU:
      • 650W Corsair TX
      • Case:
      • Antec 300
      • Operating System:
      • Fedora 39 + Win 10 Pro 64 (yuk)
      • Monitor(s):
      • Benq XL2730Z 1440p + Iiyama 27" 1440p
      • Internet:
      • Zen 900Mb/900Mb (CityFibre FttP)

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    ... but in the case of Intel each individual core within a package has its own memory controller
    Careful with those terms. Each core has a port onto its stop on the ring bus. Memory controllers drive DDR channels. As shown by the image you embedded, Intel bundle their memory controllers into the system agent, which has its own single stop on the ring bus.

    I am interested to see that Intel can still get the ring bus to work for them. ATi gave up a really long time ago as it didn't scale well enough for them. Remember this? ...



    Quote Originally Posted by CAT-THE-FIFTH View Post
    Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??
    Probably a distance issue. The interconnect will have to go a long way across the die (I'm taking a clue from the name), which means charging up a long wire, which takes time and so limits clock speed.


    Edit: I have a sneaky feeling that Intel's prefetcher is much better than AMD's, which would make sense: back when Intel's Core 2 still used a chipset memory controller, they really needed it to hide the latency of that long trip to DRAM. They hid it rather well. Put that prefetch technology into a modern core with an on-die IMC and I'm guessing it could hide the latency across their ring bus.

    Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.
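To put rough numbers on the "another core, another stop" point, here is a minimal sketch of hop counts on a ring. It is a toy model, not Intel's actual design: it assumes a bidirectional ring where a request takes the shorter direction and every hop costs one cycle, and the function name `ring_hops` is made up for the illustration.

```python
from typing import Tuple


def ring_hops(n_stops: int) -> Tuple[float, int]:
    """Average and worst-case hop count between two distinct stops.

    Assumes a bidirectional ring where traffic takes the shorter way round.
    """
    if n_stops < 2:
        raise ValueError("need at least two stops")
    # Distance from stop 0 to stop i is the shorter of the two directions.
    distances = [min(i, n_stops - i) for i in range(1, n_stops)]
    return sum(distances) / len(distances), max(distances)


for stops in (4, 6, 8, 12, 24):
    avg, worst = ring_hops(stops)
    print(f"{stops:2d} stops: average {avg:.2f} hops, worst case {worst} hops")
```

Under those assumptions the worst-case distance is half the number of stops, so the trip grows roughly linearly with core count.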
    Last edited by DanceswithUnix; 24-04-2018 at 01:58 PM.

  9. #1240
    Moosing about! CAT-THE-FIFTH's Avatar
    Join Date
    Aug 2006
    Location
    Not here
    Posts
    32,042
    Thanks
    3,909
    Thanked
    5,213 times in 4,005 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    Yes.
    The main issue is that effective bandwidth to the L3 cache is much higher than the current bandwidth between CCX units, so they do need to find a way around this, as it clearly does seem to have an effect with workloads like games and audio stuff, and it's most likely we will see a move to 6 cores per CCX (or more) with the next iterations of Ryzen. It's why the effective IPC increase of the Ryzen 2000 series CPUs in games is between 5% and 7% instead of the 3% for non-gaming stuff, and DAW benchmarks saw massive jumps in certain scenarios. I suppose one way would be to increase the amount of L3 cache per core, but that still doesn't stop the issue of games not recognising the CCXes properly if the dev hasn't bothered patching things.

    It does make me wonder how a Ryzen APU in a console running GDDR5 or HBM2 would fare though!

  10. #1241
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by DanceswithUnix View Post
    Careful with those terms. Each core has a port onto its stop on the ring bus. Memory controllers drive DDR channels. As shown by the image you embedded, Intel bundle their memory controllers into the system agent, which has its own single stop on the ring bus.

    I am interested to see that Intel can still get the ring bus to work for them. ATi gave up a really long time ago as it didn't scale well enough for them. Remember this? ...
    I was trying to be careful.

    AFAIK the memory subsystems have two parts (at least in terms of what we're referring to), there's the part that places data from the core intended for RAM onto the bus and then there's the part that takes it off the bus and sends it to the RAM.

    In Intel's case the part that places data onto the bus is (logically speaking) sitting between each core and its connection to the bus (each core has its own connection). In AMD's case that part sits in the same (logical) place but groups 4 cores together, and it's the CCM's job to put the data on the bus.

    The diagrams I posted don't include where those parts are, but for the Intel one we could probably (logically) place them on the brown links going from each core to the ring bus, and on the AMD diagram it would be the fat brown lane going to the IF crossbar.

    IIRC you're right in being surprised that Intel are still using the ring bus. Again, IIRC, I read something about them struggling to get latency down to acceptable levels when core counts increased. It's probably why they've been unwilling to release higher-core-count parts for so long, as fixing that probably requires quite a bit of redesign and a similar system to what AMD have used.

    EDIT: I won't post it here as it's a little OT but here's a link to Intel's solution to increased core counts, lots of connected ring buses, it gives me nightmares of the old ring bus networks. And they had the cheek to poke holes in AMD's implementation, the networking world mostly moved away from the ring bus topology for a reason, Intel.
    Last edited by Corky34; 24-04-2018 at 06:04 PM. Reason: bad sppeling

  11. #1242
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    11,478
    Thanks
    1,541
    Thanked
    1,029 times in 872 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by DanceswithUnix View Post
    Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.
    Pretty much what I wanted to say - at least at the block diagram level it's not all that different to Intel's - both access the memory controller via a bus, the blue ring on the Intel diagram is equivalent to the pink Fabric block on the AMD one.

    As for the L3 cache positioning, AMD use a victim cache so I'm not sure if that would add latency but it also means AMD aren't prefetching into it like Intel do. It's also not clear from diagrams how Intel have it set up, some seem to show cores connecting to the ring through the L3 similar to AMD:
    http://www.legitreviews.com/intel-hd...tecture_170869
    https://www.pcper.com/reviews/Proces...ablets-servers

    The ring hasn't kept scaling for Intel - in some of their past generation Xeons they've used multiple rings, and now they've switched to a mesh.

    I know Intel have played with the ring's clock since Sandy Bridge and IIRC it now runs at the core clock vs AMD's memory-clocked fabric. Everything considered, AMD have to balance scalability, power and performance across a range of die configurations.

    Edit: Nope sorry, the ring has its own clock domain now, and data is fed through the L3:
    Ring Clock - The frequency at which the ring interconnect and LLC operate at. Data from/to the individual cores are read/written into the L3 at a rate of 32B/cycle operating at Ring Clock frequency.
    https://en.wikichip.org/wiki/intel/m...#Clock_domains
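For a feel of what 32B/cycle means in practice, here is a quick back-of-envelope sketch. The 32 bytes per cycle comes from the quoted text; the ring clock values are hypothetical placeholders rather than measured figures, and the function name `l3_bandwidth_gbs` is invented for the illustration.

```python
BYTES_PER_CYCLE = 32  # from the quoted WikiChip text


def l3_bandwidth_gbs(ring_clock_ghz: float,
                     bytes_per_cycle: int = BYTES_PER_CYCLE) -> float:
    """Peak core-to-L3 bandwidth in GB/s (1 GB = 10^9 bytes).

    GHz is 10^9 cycles per second, so GHz times bytes per cycle is GB/s.
    """
    return ring_clock_ghz * bytes_per_cycle


# Hypothetical ring clocks, chosen only to show the scaling.
for ghz in (3.0, 3.5, 4.0):
    print(f"Ring at {ghz:.1f} GHz: up to {l3_bandwidth_gbs(ghz):.0f} GB/s")
```

This is a theoretical peak per link as I read the quote; real throughput will be lower once contention on the ring is taken into account.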
    Last edited by watercooled; 24-04-2018 at 07:13 PM.

  12. #1243
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by DanceswithUnix View Post
    Another thought: If you look at the Zen diagram above and mentally substitute a ring bus for the infinity fabric, then the only difference between the Intel and AMD memory models is that AMD have the path to DRAM on the other side of the L3. OTOH each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. I can't see one as inherently "closer" to ram than the other. Just different.
    So I've been doing some more research and sorting things out in my head, and I don't think it's as simple as substituting a ring bus for the Infinity Fabric. If I've not got myself confused, the scalable data fabric (SDF) in IF consists of two separate links: the Infinity Fabric On-Package (IFOP) and Infinity Fabric InterSocket (IFIS) links. From what I can tell the IFOP link doesn't deal with memory controller requests, as those are sent via the IFIS.

    At first, trying to work out the difference had me confused, as data being sent via the IFOP (core-2-core, core-2-L1, core-2-L2 on a single package) is sent via a link that doesn't (from what I can tell) have access to the UMC, although I'm not entirely sure on that.

    It seems the cores within a CCX can "talk" to each other as and when they like; however, if they need to "talk" to something outside of their own CCX (memory controller, PCIe, USB, another CCX, another socket, etc.) then the request needs to go via the cache-coherent master, which encodes the data onto the IFIS.

    I've probably got that all wrong knowing me and I'm still unsure what plane the L3 cache operates on.

  13. #1244
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Gateshead
    Posts
    15,196
    Thanks
    1,232
    Thanked
    2,290 times in 1,873 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by CAT-THE-FIFTH View Post
    Apparently the clockspeed of the inter-CCX connect is running at half the memory clockspeed. I assume this is down to power issues??
    IIRC it runs at the actual clock speed of the RAM, i.e. if you use 2666 'MHz' DDR4 the fabric runs at 1333MHz? Either way, it runs in the DRAM clock domain, which means that if you run faster RAM you also run a faster Infinity Fabric. I wondered with first-gen Ryzen whether the memory speed limits were more down to the fabric than the actual IMC... the improvement to memory clocks could easily be because the fabric can clock higher on 12nm...

    Quote Originally Posted by DanceswithUnix View Post
    ... each time Intel introduce another core you get another stop on the ring bus so AIUI it takes 1 more cycle for data to clock around. ...
    Which is why they use a mesh on the high-core-count Xeons, I believe.

    I wonder if the ring bus connecting between the L2 and L3 cache has some impact on the fact that Intel's direct RAM access latency is so good? AMD appear to have at least equivalent, if not better, cache latency, but from those block diagrams it looks like direct RAM access has to traverse the L3 cache, which means adding an extra step vs. the Intel ring bus....

  14. #1245
    Moosing about! CAT-THE-FIFTH's Avatar
    Join Date
    Aug 2006
    Location
    Not here
    Posts
    32,042
    Thanks
    3,909
    Thanked
    5,213 times in 4,005 posts

    Re: AMD - Zen chitchat

    Looks like the AT results were down to HPET issues:

    https://www.anandtech.com/show/12678...-ryzen-results

  15. Received thanks from:

    Corky34 (25-04-2018)

  16. #1246
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    11,478
    Thanks
    1,541
    Thanked
    1,029 times in 872 posts

    Re: AMD - Zen chitchat

    So having very quickly skimmed it, it seems like AMD results are basically correct, and Intel results are the ones reading on the low side? I'll be sure to read it properly when I get a chance.

    Quote Originally Posted by watercooled View Post
    ...provided they're correct and not due to e.g. a timing issue...
    Seems it had something to do with that after all! Although it seems the performance was measured correctly, but the method of measurement actually impacted performance? Again, I've only skimmed it.

  17. #1247
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by CAT-THE-FIFTH View Post
    Looks like the AT results were down to HPET issues:

    https://www.anandtech.com/show/12678...-ryzen-results
    I guess that invalidates those cache timing test results and, by extension, what I thought about AMD having picked most of the low-hanging fruit with regards to their caches.

    I hope they revisit those tests as for me it's interesting to see and IDK of other sites that have dug that deep.

    Quote Originally Posted by watercooled View Post
    Seems it had something to do with that after all! Although it seems the performance was measured correctly, but the method of measurement actually impacted performance? Again, I've only skimmed it.
    Having read through it, and I'm happy for someone to correct me, I got the impression it's a bit of both.

    By forcing Windows to only use one timer they caused problems in the performance of programs and also the measurement of that performance.
    Last edited by Corky34; 25-04-2018 at 06:16 PM.

  18. #1248
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    11,478
    Thanks
    1,541
    Thanked
    1,029 times in 872 posts

    Re: AMD - Zen chitchat

    As I understand it, AMD's results are valid (they don't appear to be affected by this HPET implementation 'bug'). Also, other sites should be unaffected, as this is down to the way AnandTech test.
