
Thread: AMD - Zen chitchat

  1. #1601
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    10,758
    Thanks
    1,494
    Thanked
    925 times in 796 posts

    Re: AMD - Zen chitchat

    The IO die is physically much smaller - Anandtech has some measurements.

  2. #1602
    Registered+
    Join Date
    Oct 2007
    Posts
    49
    Thanks
    0
    Thanked
    0 times in 0 posts

    Re: AMD - Zen chitchat

    Yeah, I expected the IO dies to be different. With 14nm it's cheap to do as well. I'm quite surprised how much smaller the AM4 IO die is, but chucking 6 DDR4 channels off does help, along with the extra security stuff.

  3. #1603
    Registered+
    Join Date
    Oct 2007
    Posts
    49
    Thanks
    0
    Thanked
    0 times in 0 posts

    Re: AMD - Zen chitchat

    One good thing with chiplets is that doing 16c is easy, but you don't need to worry about trying to do 12c either, as you can use 2x chiplets (with 2 defective cores on each die), or you can use those two chiplets in two separate hex core chips.

    Chiplets are quite kind on defective dies.

    The more I think about chiplets, the more benefits I see to them, with very few downsides as long as they are small enough. The Bulldozer era made that impossible, but at 7nm it becomes completely sane to do so.
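
    A rough way to see the binning point above is to enumerate which core counts fall out of one or two chiplets. This is only a sketch: the salvage bins (8 good cores, or 6 good cores with 2 fused off) and the two-chiplet limit are assumptions for illustration, not anything AMD has confirmed.

```python
from itertools import combinations_with_replacement

# Assumed salvage bins for an 8-core chiplet: fully working, or 2 cores disabled.
# These values are hypothetical and only illustrate the argument in the post.
GOOD_CORES_PER_CHIPLET = (6, 8)


def possible_core_counts(max_chiplets=2):
    """Map each total core count to the chiplet combinations that produce it."""
    products = {}
    for n in range(1, max_chiplets + 1):
        for combo in combinations_with_replacement(GOOD_CORES_PER_CHIPLET, n):
            products.setdefault(sum(combo), []).append(combo)
    return products


if __name__ == "__main__":
    for cores, combos in sorted(possible_core_counts().items()):
        print(f"{cores} cores: {combos}")
```

    Under those assumptions the same pair of 6-core dies can ship as one 12c part or as two separate hex core parts, which is the flexibility described above.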

  4. #1604
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    9,800
    Thanks
    482
    Thanked
    1,009 times in 858 posts
    • DanceswithUnix's system
      • Motherboard:
      • Asus X470-PRO
      • CPU:
      • 2600X
      • Memory:
      • 16GB 3200MHz
      • Storage:
      • 1TB Linux, 1TB Games (Win 10)
      • Graphics card(s):
      • Asus Strix RX Vega 56
      • PSU:
      • 650W Corsair TX
      • Case:
      • Antec 300
      • Operating System:
      • Fedora 28 + Win 10 Pro 64 (yuk)
      • Monitor(s):
      • Benq XL2730Z 1440p + Samsung 2343BW 2048x1152
      • Internet:
      • Zen 80Mb/20Mb VDSL

    Re: AMD - Zen chitchat

    Quote Originally Posted by watercooled View Post
    The IO die is physically much smaller - Anandtech has some measurements.
    Pretty big though. 123mm^2 is about the size of a Coffee Lake quad core with integrated graphics.

    Traditionally northbridge style designs are pad limited which is why they started putting integrated graphics on motherboards, to find something to do with the silicon area you get with that many I/O bumps around the outside. So I wonder what they do with the space? Huge cache perhaps, or some graphics, or both.

  5. #1605
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    10,758
    Thanks
    1,494
    Thanked
    925 times in 796 posts

    Re: AMD - Zen chitchat

    Aye, there are still plenty of questions over Zen2 - along with what you said, how many cores for a CCX?

    But in terms of die size, that alone doesn't necessarily imply manufacturing cost when comparing two different types of IC e.g. I imagine it has far fewer layers and therefore lithography steps vs cores and will be running at a lower clock speed with less power distribution. It's fairly big vs CPU sizes but it's not unreasonable for a kinda-Northbridge IMO. And given they've seemingly hinted towards the possibility of 16C versions, the die presumably has the IF logic for another core chiplet to connect.

    They're still being very quiet about details though!

  6. #1606
    Senior Member
    Join Date
    Mar 2005
    Posts
    4,554
    Thanks
    144
    Thanked
    307 times in 245 posts
    • badass's system
      • Motherboard:
      • ASUS P8Z77-m pro
      • CPU:
      • Core i5 3570K
      • Memory:
      • 32GB
      • Storage:
      • 1TB Samsung 850 EVO, 2TB WD Green
      • Graphics card(s):
      • Radeon RX 580
      • PSU:
      • Corsair HX520W
      • Case:
      • Silverstone SG02-F
      • Operating System:
      • Windows 10 X64
      • Monitor(s):
      • Del U2311, LG226WTQ
      • Internet:
      • 80/20 FTTC

    Re: AMD - Zen chitchat

    Quote Originally Posted by watercooled View Post
    Aye, there are still plenty of questions over Zen2 - along with what you said, how many cores for a CCX?

    But in terms of die size, that alone doesn't necessarily imply manufacturing cost when comparing two different types of IC e.g. I imagine it has far fewer layers and therefore lithography steps vs cores and will be running at a lower clock speed with less power distribution. It's fairly big vs CPU sizes but it's not unreasonable for a kinda-Northbridge IMO. And given they've seemingly hinted towards the possibility of 16C versions, the die presumably has the IF logic for another core chiplet to connect.

    They're still being very quiet about details though!
    Part of me is wondering if that I/O die will be used for Threadripper 3 - i.e. it's got 4 channels with only 2 in use for Ryzen 3 but it's got the ability to connect 4 chiplets and 4 memory channels. The size suggests otherwise and I imagine that instead AMD will use the Rome I/O die for Threadripper as I can't see Threadripper volumes justifying another die's development cost
    "In a perfect world... spammers would get caught, go to jail, and share a cell with many men who have enlarged their penises, taken Viagra and are looking for a new relationship."

  7. #1607
    Senior Member
    Join Date
    Dec 2013
    Posts
    2,824
    Thanks
    380
    Thanked
    353 times in 246 posts

    Re: AMD - Zen chitchat

    Anyone want to place bets on the I/O die having some amount of memory (L4 cache) or not, even at 14nm does it seem bigger than what's needed just for I/O?

  8. #1608
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    9,800
    Thanks
    482
    Thanked
    1,009 times in 858 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    Anyone want to place bets on the I/O die having some amount of memory (L4 cache) or not, even at 14nm does it seem bigger than what's needed just for I/O?
    I think it needs enough prefetch buffering to compensate for the SerDes hops that IF uses to cross from cpu die to the I/O and back, so yes probably an L4.

    It is also interesting to note that the RX560, on the GF 14nm process, weighs in at 123mm^2 with 8 PCIe lanes, two memory controllers and a bunch of high speed serial I/O for HDMI & DisplayPort. Now the 560 is more of a square chip to maximise the amount of logic on the die, whereas the I/O chip here is rectangular to expose more edge for I/O. So this chip has fewer logic transistors but more connectivity than a 560, but I'm half expecting the I/O controller to contain a GPU if it has my guesstimate of 2.5B transistors to use up.

  9. #1609
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Manchester
    Posts
    15,007
    Thanks
    1,188
    Thanked
    2,234 times in 1,839 posts
    • scaryjim's system
      • Motherboard:
      • Dell Inspiron
      • CPU:
      • Core i5 8250U
      • Memory:
      • 1x 8GB DDR4 2400
      • Storage:
      • 128GB M.2 SSD + 1TB HDD
      • Graphics card(s):
      • Radeon R5 230
      • PSU:
      • Battery/Dell brick
      • Case:
      • Dell Inspiron 5570
      • Operating System:
      • Windows 10
      • Monitor(s):
      • 15" 1080p laptop panel

    Re: AMD - Zen chitchat

    Quote Originally Posted by DanceswithUnix View Post
    ... this chip has fewer logic transistors but more connectivity than a 560, but I'm half expecting the I/O controller to contain a GPU if it has my guesstimate of 2.5B transistors to use up.
    That's ... a curious concept. Would there still be space if we assume the IO chip has room for up to 4 chiplets/4 memory channels (and, I guess, 64 PCIe lanes) (so it can be used for Threadripper as well as plain Ryzen)?

    Let's see - a full Zeppelin die (on the same 14nm process) is 192mm2 and 4.8Bn transistors. According to https://en.wikichip.org/wiki/amd/mic...en#Scalability a CCX is 44mm2 and 1.4Bn transistors. Two of those is 88mm2 and 2.8Bn transistors, leaving Zeppelin's non-CCX budget (2 memory channels, 32 PCIe lanes, peripheral IO and various IF links) at 104mm2 and 2Bn transistors.

    So, speculatively, the Ryzen 2 IO die may have ~ 19mm2 and 0.5Bn transistors to play with over a Zeppelin uncore. I don't see any way they can cram double the uncore resources into that, and it doesn't sound like much space for a GPU either; it's barely enough for a bit of L4 cache (the L3 in Ryzen is 16mm2 per that source)...
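
    For anyone who wants to check the sums, here is the same subtraction as a short Python sketch. The figures are the ones quoted in the thread (Zeppelin and CCX from WikiChip, 123mm2 for the IO die); the 2.5Bn transistor count for the IO die is a guesstimate from an earlier post, not a published number, and the helper name is made up for this example.

```python
# Figures as quoted in the thread. The IO die transistor count is a guess.
ZEPPELIN = {"area_mm2": 192.0, "transistors_bn": 4.8}
CCX = {"area_mm2": 44.0, "transistors_bn": 1.4}
IO_DIE = {"area_mm2": 123.0, "transistors_bn": 2.5}
CCX_PER_ZEPPELIN = 2


def subtract(total, part, times=1):
    """Subtract `times` copies of `part` from `total`, field by field."""
    return {key: round(total[key] - times * part[key], 3) for key in total}


# Everything on a Zeppelin die that is not inside a CCX.
uncore = subtract(ZEPPELIN, CCX, CCX_PER_ZEPPELIN)

# What the IO die would have left over if it held one Zeppelin-sized uncore.
headroom = subtract(IO_DIE, uncore)

if __name__ == "__main__":
    print("Zeppelin uncore:", uncore)
    print("IO die headroom:", headroom)
```

    This only reproduces the arithmetic; it says nothing about how the uncore blocks would actually be laid out or resized on the new die.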

    As far as Threadripper goes, I did wonder if they might engineer the smaller IO die so you can link them together, and TR will end up being essentially the same as it is now - two Ryzens linked over IF - the only difference would be that you'd be linking two IO chips together rather than two full dies...

  10. #1610
    Senior Member watercooled's Avatar
    Join Date
    Jan 2009
    Posts
    10,758
    Thanks
    1,494
    Thanked
    925 times in 796 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by badass View Post
    Part of me is wondering if that I/O die will be used for Threadripper 3 - i.e. it's got 4 channels with only 2 in use for Ryzen 3 but it's got the ability to connect 4 chiplets and 4 memory channels. The size suggests otherwise and I imagine that instead AMD will use the Rome I/O die for Threadripper as I can't see Threadripper volumes justifying another die's development cost
    I'd gone through that very thought process myself! Likewise, I think it's too small for that though.

    Quote Originally Posted by Corky34 View Post
    Anyone want to place bets on the I/O die having some amount of memory (L4 cache) or not, even at 14nm does it seem bigger than what's needed just for I/O?
    It seems a fairly outlandish thing so normally I'd doubt it, but so is a lot of what AMD is doing at the moment so it wouldn't surprise me.

    Quote Originally Posted by scaryjim View Post
    So, speculatively, the Ryzen 2 IO die may have ~ 19mm2 and 0.5Bn transistors to play with over a Zeppelin uncore. I don't see any way they can cram double the uncore resources into that, and it doesn't sound like much space for a GPU either; it's barely enough for a bit of L4 cache (the L3 in Ryzen is 16mm2 per that source)...
    Interesting analysis - add to that you need the on-package interface to connect to the chiplets and you might have used up that 19mm2. I don't think 123mm2 is all that big for what it is - uncore takes up a lot of space on modern CPUs, and don't forget that's one reason for doing this in the first place given its relatively poor scaling.

  11. #1611
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Manchester
    Posts
    15,007
    Thanks
    1,188
    Thanked
    2,234 times in 1,839 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by watercooled View Post
    … It seems a fairly outlandish thing so normally I'd doubt it, but so is a lot of what AMD is doing at the moment so it wouldn't surprise me.
    As I've mentioned - often - in conversations about Ryzen, even going off-CCX on the same silicon has a significant latency penalty. With chiplets, you're talking about the potential for half your L3 cache to be on a different piece of silicon completely. If the only way to access that is a multi-step process via the IO chip, you're going to have absolute killer latency once a thread exhausts its chiplet's cache. Having a mirror L4 cache in the IO chip would help reduce that. And AMD have to minimize those cache latencies if Zen 2 is going to perform well in the real world...

    Quote Originally Posted by watercooled View Post
    … Interesting analysis - add to that you need the on-package interface to connect to the chiplets and you might have used up that 19mm2. ...
    It's worth remembering that there must be silicon in a Zeppelin die that connects the CCXes to each other and the IO/memory controllers etc., and that would all be included in the uncore in my analysis. So that shouldn't really need any extra silicon space, unless the IO chips have a lot more IF connectivity than a Zeppelin...

  12. #1612
    Senior Member
    Join Date
    Dec 2013
    Posts
    2,824
    Thanks
    380
    Thanked
    353 times in 246 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by scaryjim View Post
    As I've mentioned - often - in conversations about Ryzen, even going off-CCX on the same [s]silicon[/s] substrate has a significant latency penalty.
    FTFY, at least I think I have, as IIRC each individual CCX is fabricated on a single piece of silicon.

    Quote Originally Posted by scaryjim View Post
    It's worth remembering that there must be silicon in a Zeppelin die that connects the CCXes to each other and the IO/memory controllers etc., and that would all be included in the uncore in my analysis. So that shouldn't really need any extra silicon space, unless the IO chips have a lot more IF connectivity than a Zeppelin...
    That touches on something I've been thinking about. We know that currently Zen shares its L3 cache between all cores and all CCXes, so it seems unlikely that would've changed with Zen2. That got me thinking about why you'd want or need an L4 cache in the I/O die: if we assume the CCXes are connected directly to each other and sharing their L3 caches (very probable), then what's the advantage of adding L4 to the I/O die?

    This may sound nuts but I thought I'd spitball it with you guys: wouldn't it make more sense to move the separate pieces of shared L3 cache within each CCX to the I/O die? You're not really increasing latencies, as each core's L3 cache has to remain consistent with every other core, both within its own CCX and others, so there was already a latency penalty in doing that. Moving the L3 into a separate block means you reduce, or completely eliminate, the need to directly connect each CCX to each other, as you no longer need to keep the data consistent between CCXes; you only need to keep it consistent with the L3 cache on the separate (I/O) die.

    That makes an L4 cache in the I/O die seem not only pointless but more complicated than it needs to be, as you now have to keep two caches consistent, and one of those is divided up between CCXes. Thoughts?

    Also, a side thought I had: with Zen2 having a separate I/O die and 1-2 CCXes, could/would it make direct die cooling safer? I know IHSes became a thing because smaller dies increased the risk of chipping the die, but with three separate dies on a single package isn't that risk reduced, what with spreading the load? Obviously it would be a right PITA, as you'd have to separate an IHS that's been soldered on and you'd have to reduce the Z height of the heatsink, but it would be interesting to see how effective direct die cooling would be.

  13. #1613
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Manchester
    Posts
    15,007
    Thanks
    1,188
    Thanked
    2,234 times in 1,839 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    FTFY, at least i think i have ...
    No, no you haven't, and this is why I keep mentioning it.

    Each Zeppelin die has 2 CCXes. Each CCX has 8MiB of L3 cache. The CPU reports itself to the OS as an 8-core, 16-thread chip with 16MiB of L3 cache, but in reality it's two 4-core, 8-thread, 8MiB L3 cache chips with a lot of clever, fast interconnects.

    Here's the relevant graph from Anandtech's 2nd-gen Ryzen Deep Dive:



    See that big jump for Ryzen between 4MiB and 8MiB (n.b. that's a log scale graph, so the jump is actually even bigger than it appears there)? Notice how it happens at exactly the same point as the 8MiB Ryzen 2400G runs out of its 8MiB cache and hits main memory? Notice how at 8MiB strides the 2700X has roughly the same latency to cache as the i7 8700K has to main memory?

    That's the performance issue I talk about. It's nothing to do with going off-silicon, because the 2700X is one piece of silicon. It's all about the latency delay once a CCX fills its 8MiB of L3 cache and has to grab data from another CCX somewhere. That's slow on the same silicon. It's slower to another piece of silicon across a substrate (a la Threadripper), which would be the best case scenario in a multiple chiplet design*. If you had to travel across a substrate to an IO chip, from the IO chip to a second chiplet with the other CCXes on, then back to the original requesting core via the IO chip again, that's likely to be worse than going straight from the IO chip to the main memory.
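
    The path comparison above can be made concrete by counting how many times a request crosses the package substrate. This is a sketch only: the path names and hop lists are illustrative, and no latency is assigned to any hop because none of those figures are public.

```python
# Each path lists the places a request visits, ending back at the requester.
# The topologies are hypothetical readings of the scenarios in the post.
PATHS = {
    "other chiplet via IO die": ["chiplet A", "IO die", "chiplet B", "IO die", "chiplet A"],
    "main memory via IO die": ["chiplet A", "IO die", "DRAM", "IO die", "chiplet A"],
    "other die, direct link": ["die A", "die B", "die A"],
}


def substrate_crossings(path):
    """Count hops between two pieces of silicon; hops to or from DRAM are excluded."""
    return sum(1 for a, b in zip(path, path[1:]) if "DRAM" not in (a, b))


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(f"{name}: {substrate_crossings(path)} substrate crossings")
```

    On this count a remote L3 hit routed through the IO die makes four substrate crossings, against two plus one DRAM access for main memory, so it only wins if those two extra crossings cost less than the DRAM access.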

    As someone's said recently (either in this thread or elsewhere on Hexus) AMD have back-engineered from a fully integrated SoC to a packaged northbridge + base CPU. They've even decoupled the memory controller from the cores, so we're right back to - effectively - using FSB. It's basically a mash-up of the Core 2 quads and the early Core i3/i5 with IGPs. It needs amazingly good cache management and interconnects to hide the latency penalties.

    * a little note on a chiplet design: if you want to keep cache accesses to other chiplets down to a single transfer across the substrate, you'd need coherent links between all the chiplets. That means each chiplet would need bumps and traces to 8 other chips (1 to the IO chip and 7 to the other chiplets), as well as logic and transport on the silicon itself to manage the access. That strikes me as a lot of extra silicon in each chiplet vs keeping a supplemental L4 cache on the IO chip and keeping the chiplets down to a single link to the IO chip.
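
    The link counts in that footnote can be checked with a few lines of Python. This is a sketch assuming one point-to-point link per pair of dies; the function names are made up for the example and the real Infinity Fabric topology may differ.

```python
def links_star(n_chiplets):
    """Every chiplet has a single link to the IO die and nothing else."""
    return n_chiplets


def links_full_mesh(n_chiplets):
    """Every chiplet links to the IO die and to every other chiplet."""
    chiplet_to_chiplet = n_chiplets * (n_chiplets - 1) // 2
    return n_chiplets + chiplet_to_chiplet


def links_per_chiplet_full_mesh(n_chiplets):
    """Links each chiplet must terminate itself: 1 to IO plus 1 per peer."""
    return 1 + (n_chiplets - 1)


if __name__ == "__main__":
    for n in (2, 8):
        print(n, links_star(n), links_full_mesh(n), links_per_chiplet_full_mesh(n))
```

    With 8 chiplets each die terminates 8 links and the package carries 36 in total, against 8 for a plain star. With only two chiplets the gap shrinks to 3 links against 2.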

  14. #1614
    Senior Member
    Join Date
    Dec 2013
    Posts
    2,824
    Thanks
    380
    Thanked
    353 times in 246 posts

    Re: AMD - Zen chitchat

    Apologies, I keep mixing up my dies and CCXes, don't I?

    Not that it matters much, but that Anandtech graph doesn't do a great job of showing what you're talking about IMO. I think PCPer did a better job when they looked at the 1600X and compared an 1800X with an Intel 5960X.



    And to be fair, the ping times between cores within a CCX are lower than between cores on an Intel CPU (roughly 40ns vs 80ns); it's only when traversing between CCXes that there's a jump up to 140ns.
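
    To put those ping times in context, here is a sketch that averages the core-to-core latency over every pair of cores on a 2 x 4-core part. It takes the figures above as roughly 40ns within a CCX, 140ns across CCXes and 80ns on the Intel chip (that reading of the numbers is an assumption), ignores SMT siblings, and treats every pair as equally likely to talk.

```python
from itertools import combinations

# Approximate figures from the post; which number belongs to which case is assumed.
INTRA_CCX_NS = 40
CROSS_CCX_NS = 140
INTEL_NS = 80


def mean_core_to_core_ns(ccx_count=2, cores_per_ccx=4):
    """Average ping latency over all unordered pairs of physical cores."""
    cores = [(ccx, core) for ccx in range(ccx_count) for core in range(cores_per_ccx)]
    latencies = [
        INTRA_CCX_NS if a[0] == b[0] else CROSS_CCX_NS
        for a, b in combinations(cores, 2)
    ]
    return sum(latencies) / len(latencies)


if __name__ == "__main__":
    print(f"Ryzen, 2 CCXes: {mean_core_to_core_ns():.1f} ns average")
    print(f"Intel, uniform: {INTEL_NS} ns")
```

    Of the 28 core pairs, 12 sit inside a CCX and 16 span both, so the unweighted average lands above the Intel figure even though the intra-CCX case is faster. A scheduler that keeps related threads on one CCX would shift that balance.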
    Quote Originally Posted by scaryjim View Post
    * a little note on a chiplet design: if you want to keep cache accesses to other chiplets down to a single transfer across the substrate, you'd need coherent links between all the chiplets. That means each chiplet would need bumps and traces to 8 other chips (1 to the IO chip and 7 to the other chiplets), as well as logic and transport on the silicon itself to manage the access. That strikes me as a lot of extra silicon in each chiplet vs keeping a supplemental L4 cache on the IO chip and keeping the chiplets down to a single link to the IO chip.
    I think I've got you at it now. It would only need traces in the substrate for two/one die, not 8 chips, as like you say each die contains 8 cores so their connections are handled within the die.
    Last edited by Corky34; 12-01-2019 at 05:37 PM.

  15. #1615
    Senior Member kalniel's Avatar
    Join Date
    Aug 2005
    Posts
    29,023
    Thanks
    1,478
    Thanked
    2,905 times in 2,354 posts
    • kalniel's system
      • Motherboard:
      • Gigabyte X58A UD3R rev 2
      • CPU:
      • Intel Xeon X5680
      • Memory:
      • 12gb DDR3 2000
      • Graphics card(s):
      • nVidia GTX 1060 6GB
      • PSU:
      • Seasonic 600W
      • Case:
      • Cooler Master HAF 912
      • Operating System:
      • Win 10 Pro x64
      • Monitor(s):
      • Dell U2311H
      • Internet:
      • O2 8mbps

    Re: AMD - Zen chitchat

    Quote Originally Posted by scaryjim View Post
    As someone's said recently (either in this thread or elsewhere on Hexus) AMD have back-engineered from a fully integrated SoC to a packaged northbridge + base CPU. They've even decoupled the memory controller from the cores, so we're right back to - effectively - using FSB. It's basically a mash-up of the Core 2 quads and the early Core i3/i5 with IGPs. It needs amazingly good cache management and interconnects to hide the latency penalties.
    I was among them (though by no means the only one). Intel have done good latency hiding in the past, and I wouldn't be surprised if AMD had picked up a lesson or two from their own console implementations too.

    A possible benefit of northbridge is you get to really tune/push that memory controller, which might help counter a touch of latency as well.

    I hadn't seen it mentioned here yet, but did you see what Anandtech said about the power?! They estimate it's nearly twice as energy efficient compared to a 9900K during Cinebench. That's got to be one of the gains by going with mixed die processes - you can keep each in the preferred power-frequency window without having to compromise quite so much.

  16. #1616
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    9,800
    Thanks
    482
    Thanked
    1,009 times in 858 posts

    Re: AMD - Zen chitchat

    Quote Originally Posted by Corky34 View Post
    That makes an L4 cache in the I/O die seem not only pointless but more complicated than it needs to be, as you now have to keep two caches consistent, and one of those is divided up between CCXes. Thoughts?
    It is next to main memory which needs to be coherent with the L3 cache, so coherency is a wash as the logic is pretty much there anyway.

    Northbridge cache is, I believe, what made the Nvidia chipset motherboards so fast when they came out back in the Athlon era. Usually the L3 cache hides memory accesses of the memory controllers, but if they are on another piece of silicon then they should be tuned to hide the latency of the fabric to the IO controller. The controller can have its own cache tuned to hide the memory latency, also acting as a destination for any prefetchers it might have.

  17. Received thanks from:

    Corky34 (13-01-2019)
