Quote:
Available in Q3.
If the Threadripper socket is only quad-channel memory, I wonder if you only get one memory channel per die to balance things, or if they leave two dies without any RAM access so that lightly threaded work gets the full memory width.
Is the socket quad-channel only, though? If the DIMM slots are each wired directly to the socket then the CPU could potentially address all 8 as separate channels. Unless I'm misunderstanding how multiple DIMMs per channel are typically wired, that is.
EDIT! Never mind, just checked the specs. Definitely quad-channel only.
Technically speaking, what we think of as the cores (four of them per CCX) don't communicate directly with the RAM (or any external I/O). Any off-CCX data gets sent to the CCM, whose job it is to place that data onto the SDF; it then gets taken off the SDF by a UMC, and it's the UMC that communicates with the RAM. I guess you could get a bottleneck what with eight CCMs trying to feed four UMCs; I've not done the maths. :eek:
The problem is that things slow down when you go off chip, so if a core wants memory that is on another die then that request has to go off die to the memory controller on the other die. Going across a carrier in a package is way better than going socket to socket across a PCB, but there will still be a cost despite the best efforts of the RAM prefetchers.
I'm guessing AMD already did the maths, which is why Epyc has 8 RAM channels for 4 dies and normal Ryzen has 2 channels for 1 die.
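The channel-per-die arithmetic is easy to sanity-check. A trivial sketch in Python, using just the numbers from this thread (the Threadripper line is the open question, not anything AMD has confirmed):

# Channels per die: Epyc and desktop Ryzen keep the same ratio.
epyc_channels, epyc_dies = 8, 4
ryzen_channels, ryzen_dies = 2, 1

print("Epyc: ", epyc_channels / epyc_dies, "channels per die")
print("Ryzen:", ryzen_channels / ryzen_dies, "channels per die")

# A quad-channel Threadripper with 4 active dies would drop to 1 channel
# per die, which is exactly the balance question raised above.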
Now you've got me confused. If a core wants to access data located within the local memory of another core on another CCX, it would go via the CCMs; the memory controllers are not part of the dies. As in, the four cores that make up a CCX can only talk directly to one of the other three cores (and their associated local memory: L1, L2, L3), or they can talk to that CCX's CCM.
Because Zen was designed as an SoC, the actual cores are pretty dumb, as IF deals with all the communication that happens outside each group of four cores, including the memory controllers (UMCs) and any I/O requests. If a core wants memory that's on another die, it depends where that other die is: if it's within the same CCX it does it directly; if it's anywhere else, a request is made to that CCX's CCM and the CCM places the request on the SDF for either another CCM or a UMC to take it off.
Basically, the cores themselves don't have direct access to the DDR memory controllers (the UMCs) or even another CCX's CCM.
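To make that path concrete, here's a toy Python model of the hop sequence (core -> CCM -> SDF -> UMC -> DRAM). Every latency figure is a made-up placeholder for illustration, not a real Zen number:

# Toy model of the access path described above. All hop costs are
# invented placeholders, purely for illustration.
HOP_COST_NS = {
    "core->ccm": 5,   # leaving the CCX via its Cache-Coherent Master
    "ccm->sdf": 5,    # request placed onto the Scalable Data Fabric
    "sdf->umc": 5,    # picked up by a Unified Memory Controller
    "umc->dram": 60,  # the DRAM access itself
    "die->die": 50,   # extra penalty if the serving UMC sits on another die
}

def memory_access_ns(remote_die):
    """Sum the hop costs for one DRAM access, local or remote die."""
    total = sum(HOP_COST_NS[h] for h in ("core->ccm", "ccm->sdf", "sdf->umc", "umc->dram"))
    if remote_die:
        total += HOP_COST_NS["die->die"]
    return total

print("local-die access:  ~", memory_access_ns(False), "ns")
print("remote-die access: ~", memory_access_ns(True), "ns")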
If the price is right I might actually replace my "work" PC, which is still on a Xeon X5645 + 12 gigs.
Interesting times, can't wait to see the 28-core i9 go head to head with the 32-core TR2.
At 5GHz, though, the 28-core will probably be slightly faster overall, but probably also 3x the price, lol.
Watch your terminology here. The memory controllers are not part of the CCX, but they are part of the die. Communication from a CCX on a die to the memory controller on the same die happens across a tiny piece of silicon. For that CCX to communicate with a memory controller on a different die on the same package crosses a much greater distance through traces and additional connections. That adds latency, and when you're talking about timespans of less than 100ns, that additional latency is going to be significant: each nanosecond is a >1% delay.
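Putting rough numbers on that (both figures below are assumptions for illustration, not measurements):

# Rough arithmetic behind the ">1% per nanosecond" point; assumed figures only.
local_access_ns = 90   # assumed local-die DRAM latency
extra_hop_ns = 40      # assumed penalty for crossing to another die

penalty = extra_hop_ns / local_access_ns
print("remote access is about {:.0%} slower".format(penalty))
# With a ~90 ns baseline, each added nanosecond costs a bit over 1%.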
Quite, I think there is some confusion here.
This is a Ryzen die, a lump of silicon cut from a wafer and the basic building block here:
https://www.techpowerup.com/reviews/...es/dieshot.jpg
Threadripper used to have two of these in a package, Epyc had four, and in both cases they were fully connected. Now it seems that Threadripper will be an Epyc with some memory channels omitted, and in some use cases that will matter.
https://www.anandtech.com/show/12906...w-x399-refresh
Apparently 2 of the dies will have their memory channels disabled and will have to use the Infinity Fabric for memory access. So the "first" 16 cores will retain full bandwidth and the further 16 will have increased latency.
I tend to agree with the article in that, if the latency is going to be an issue for you, buy Epyc. As long as the scheduler in Windows knows how to allocate correctly, I can't imagine there will be too many workloads affected by the additional latency.
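If you'd rather not leave it to the scheduler, you can pin a latency-sensitive process to the memory-attached dies yourself. A minimal Linux sketch in Python, assuming logical CPUs 0-31 map to the two dies that keep their memory channels (that mapping is an assumption; check it with lscpu or numactl --hardware on real hardware):

import os

# Assumption: logical CPUs 0-31 sit on the two dies with local memory
# channels. Verify the real mapping before relying on this.
MEMORY_ATTACHED_CPUS = set(range(32))

# Pin the current process (and any threads it spawns) to those CPUs so
# its memory requests stay on dies with a local UMC.
os.sched_setaffinity(0, MEMORY_ATTACHED_CPUS)
print("running on CPUs:", sorted(os.sched_getaffinity(0)))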
Looking at that and the prices mooted, I'd probably stick with a 16c32t chip on the Zen+ architecture. I'd guess that it would give me a huge boost without breaking the piggy bank. It will probably boost to 4GHz as well, rather than 3.4GHz, for some lower-thread action.
Yup, I did say you got me confused; for some reason the moment I read 'die' my brain thinks of the old northbridge/southbridge way of doing things and CPUs being not much more than cores. :Oops:
Would you have a link to any pics? I've read they used custom-made motherboards with a 32-phase power system and a seriously powerful refrigerant system to chill the water down to 4°C, but I can't find any pictures.
Found some pictures of the cooling Intel used, can't believe Intel tried to make it out as if this was a real-world scenario!
https://img.purch.com/r/711x457/aHR0...A4NTc0MC5qcGc=
https://img.purch.com/r/711x457/aHR0...A4NTYzNC5qcGc=
https://img.purch.com/r/711x457/aHR0...80ODEwLkpQRw==
Look at the cooling on the VRMs, it's insane! I believe it's also a non-standard board, so that again would be another cost, whereas Threadripper works in current X399 boards without issue and can be air-cooled sufficiently.
The cooler has a cooling capacity of 1650W as well... madness.
I wasn't expecting AMD to deliver double the cores on Threadripper, but it's certainly a good item from the sounds of it. I'm impressed by how good their marketing and product releases have been now that they have solid products to sell. Looks like the future is pretty bright for AMD finally!
This video says it all...with some expletives, if people have sensitive ears...
https://www.youtube.com/watch?v=tRH0-QwhvVQ
Some details leaked about 7nm Epyc:
https://www.servethehome.com/amd-epy...ds-per-socket/
64 cores!!
I saw this linked on AT forums:
https://www.youtube.com/watch?v=Kr8ZekIZUWI
Apparently that is the cooler AMD used for its Threadripper 2 systems. It's called the Cooler Master AMD Ryzen Wraith Ripper cooler.
If the link is true, that means 8 cores per CCX, meaning Ryzen 3 CPUs are 16 cores and the APUs will be 8 cores! If the latency and clock speeds have improved, a Ryzen 3 3300X might do the job for me!!
This is the cooler AMD used for its TR2 systems:
https://i.imgur.com/qkjksei.jpg
Doing the 28-core at 5GHz is an amazing feat nonetheless, but it was horrifically misleading to make people think it was going to be a commercially available part within a reasonable scope of time. It feels like they were just trying to "one up" using a cannibalised Xeon.
32-core Threadripper, though: I wonder if it's Epyc under the hood, and I wonder what the actual clocks will be. But the fact they've got a 32-core part in the same socket is pretty damn fabulous. Really curious to see how far this part goes.
The core and frequency war just took a big step...
That Intel shizzle cracked me up, tells you a lot about their head space currently.
It might still make sense to keep a CCX at 4 cores and double the CCX count. That would allow a "low end" part with just 4 cores on a tiny die. But really it's down to whether more cores in the CCX makes it too hard to keep the cache latency good, hurting performance.
Big caches are slower than small caches, else we would just get one massive L1 cache. That's one of the reasons I wonder if AMD will just add more CCX modules at 7nm: twice the CCX modules means twice the cache as well as twice the cores, and the only possible performance hit then is on scaling the Infinity Fabric to cope with the extra ports.
I can't see them increasing the CCX core count, as it would require a pretty radical redesign. Adding more CCXs to a package shouldn't increase the latency beyond the baseline increase of going from CCX to CCX: Infinity Fabric can transfer directly node-to-node, island-hop in a bus topology, or work as a mesh topology, so beyond the delay in the electrical signal you shouldn't see a difference in latency between going to a neighbouring CCX or one in the opposite corner (in a grid of 2x3, 3x3, 4x4).
They wouldn't really need to do a grid anyway, since going 2x2 would give them the 16-core count and make it "relatively" easy to directly connect all the CCXes to each other. 4-way communication will be more complex and slightly slower than 2-way, but I suspect they'll be able to mask that reasonably well - EPYC doesn't seem to suffer too much from 4-way communication between dies in an MCM, which should be a magnitude more problematic than between complexes on the same die.
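The link-count trade-off is easy to sketch in Python; this is pure topology maths, nothing AMD-specific:

# Fully connecting n CCXs needs n*(n-1)/2 links; a 2D mesh needs fewer
# links but adds hops. Pure combinatorics, no AMD-specific figures.
def full_mesh_links(n):
    return n * (n - 1) // 2

def grid_links(rows, cols):
    return rows * (cols - 1) + cols * (rows - 1)

def grid_max_hops(rows, cols):
    # Worst case: opposite corners of the grid (Manhattan distance).
    return (rows - 1) + (cols - 1)

print("fully connected 4 CCXs:", full_mesh_links(4), "links, 1 hop worst case")
for rows, cols in [(2, 3), (3, 3), (4, 4)]:
    print("{}x{} grid: {} links, {} hops worst case".format(
        rows, cols, grid_links(rows, cols), grid_max_hops(rows, cols)))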
Interesting times ahead, whatever they choose to do...
I want to see some decent motherboards for Threadripper like the X299 Sage from Asus.
That and lack of money are the only things stopping me right now.
Looks like the website updated its article:
So it will be 48 cores next year, not 64 cores. Quote:
(Edit June 6, 2018: Mea Culpa. Looks like we got some generational information “confirmed” to us incorrectly. Expect a 48 core / 96 thread generation before a 64 core / 128 thread generation. Still quite a huge gap. DDR4 and interconnect improvement information held up to further confirmations. 64 core / 128 thread apparently is still coming, just missed one generation due to a few words not being typed in messages to us.)