16nm data centre GPU card leverages HBM2 and NVLink.
Hmm - going from the 28,672 CUDA cores in the DGX-1, that works out to just 3,584 cores per GPU - that's surprisingly low. 15.3bn transistors is almost double what a Titan X has, yet a Titan X has 3,072 cores. It's also a factor of 7, which is odd for a computer (3584 = 7 * 2^9). Perhaps this is a 4096-core GPU with an eighth disabled for yields? I've heard the 16nm node is about twice as space-efficient as 28nm, so the die is probably somewhere around a square inch (600-ish mm^2), which is rather bold for a new process node - this makes more sense if they're fusing off an eighth to boost yields, although it doesn't explain why CUDA cores take so many more transistors now.
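A quick back-of-the-envelope check of that arithmetic in Python (the ~601mm^2 Titan X die size and the 2x density assumption are my own numbers, so treat the estimate as a sketch):
[CODE]
# Core count check
dgx1_cores, gpus = 28672, 8
per_gpu = dgx1_cores // gpus
print(per_gpu, per_gpu == 7 * 2**9)  # 3584 True - the odd factor of 7

# Rough die-size estimate: Titan X (GM200) is ~601 mm^2 at 28nm with
# 8bn transistors; assume 16nm packs roughly twice the density
titan_x_mm2, titan_x_bn = 601.0, 8.0
density_gain = 2.0                   # assumed 28nm -> 16nm scaling
est_mm2 = (15.3 / titan_x_bn) * titan_x_mm2 / density_gain
print(round(est_mm2))                # ~575 mm^2, i.e. "around 600"
[/CODE]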
Ah, so that's how they manage an announcement when they can't ship product: they say you can have one in June if you have $130,000. Not many people will notice if that deadline slips.
Anyway, I am impressed. Raw performance numbers for the P100 are as follows: 5.3 TFLOPS double-precision and 10.6 TFLOPS single-precision, from 15.3bn transistors. Fury X has 8.9bn transistors and 4096 cores, and manages 8.6 TFLOPS single-precision, but loses on double-precision with just 535 GFLOPS. I think Pascal's extra transistors are there to increase double-precision speed.
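To put those numbers side by side, a quick check of the DP:SP ratios implied by the figures above (just re-deriving from what's quoted in this thread):
[CODE]
# DP:SP ratios from the figures quoted above
p100_sp, p100_dp = 10.6, 5.3   # TFLOPS
fury_sp, fury_dp = 8.6, 0.535  # TFLOPS
print(p100_dp / p100_sp)       # 0.5   -> a full 1:2 DP rate
print(fury_dp / fury_sp)       # ~0.06 -> roughly a 1:16 DP rate
[/CODE]
A 1:2 rate versus Fiji's roughly 1:16 is exactly the sort of thing that eats a transistor budget.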
300W TDP, large-scale availability not until early next year, and it appears not to be fully enabled either. Hope it is not another GTX 480!!
The card also prioritises FP16 and FP64 performance over FP32, and FP32 is what matters most for gaming.
This looks far more focussed on taking on Intel MIC.
JHH also looked unusually nervous, if you watched the livestream.
[QUOTE=Xlucine;1163601]extra transistors are for extra double-precision speeds[/QUOTE]
More details about the P100 and a uarch deep dive here:
https://devblogs.nvidia.com/parallel...inside-pascal/
The entire design prioritises FP16 and FP64 performance at a uarch level, through the way mixed precision works in Pascal, which is a major selling point.
They also seem to have a 64-wide SM, much like GCN's 64-wide CU, and even 64KB of shared memory per SM, as GCN has per CU.
It seems they are moving to a more GCN-like way of doing things.
Actually it reminds me a bit of Piledriver, to be honest - packing dual FP16 ops into a single FP32 unit.
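As a rough illustration of the packing idea (nothing Pascal-specific, just Python/numpy showing that two FP16 values occupy one FP32-sized register; on the real hardware a paired-half instruction would retire both lanes at once):
[CODE]
import numpy as np

# Two FP16 values fit in the same 32 bits as one FP32 value - the basis
# of Pascal's paired half-precision ops
pair = np.array([1.5, -2.25], dtype=np.float16)
packed = pair.view(np.uint32)[0]
print(hex(packed))             # both halves live in one 32-bit word

# An element-wise FMA over the pair; the hardware equivalent issues
# this through a single FP32-width unit
a = np.array([2.0, 2.0], dtype=np.float16)
y = np.array([0.5, 0.5], dtype=np.float16)
print(a * pair + y)            # [ 3.5 -4. ]
[/CODE]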
But yes, quite a big change from Maxwell, and an arch I don't think is going to be well suited to desktop - I wouldn't be surprised if desktop products had very different ratios of units.
Is AMD working on a product similar to NVLink?
Is it just me, or do Nvidia's five "miracle" breakthroughs include at least two (16nm FinFET and interposer-connected memory) that are already in mass production by other companies...?
Do you think they announced a Tesla because they need more time to get working graphics drivers? I mean, if they have silicon and it could so much as display a desktop, they would demo it, right?
They've certainly pushed this product in our faces early doors!
I'm not overly impressed. We all knew that a halo-class product in the next generation would be using HBM2, and NVLink has been a long time in the making as well, so that's not a shocker. The size of die they have aimed for is impressive, but that in and of itself isn't an achievement, is it?
If tomorrow AMD announced a stupidly huge Polaris die with a TDP of 300W then we would be surprised, because, as far as I remember, AMD has normally never had a huge die until very recently with Fiji, which I thought was a bit of a one-off.
Let's see how it filters into consumer products first, and then we can judge it for our own purposes.
You'd hope so, wouldn't you. Still, it'd be more worrying if this keynote had actually come from GDC 2016 - that'd be a very odd place to announce a compute card, given it's the Game Developers Conference.
However, GDC 2016 was back in mid-March. This keynote was at Nvidia's own GPU Technology Conference (i.e. GTC 2016): http://www.gputechconf.com/ - where announcing a compute card makes a lot more sense.
OTOH, the announcement just keeps raising questions. Quite apart from the DP-heavy design (which might not play out well in the gaming market), I want to know what they're doing with the HBM2 to only get 720GB/s out of it (and quite why they think that's 3x Maxwell when the 980 Ti and Titan X pull 336GB/s). AMD's 4-stack HBM1 implementation returned 512GB/s theoretical bandwidth. The more I hear about Pascal the less impressive it appears to be....
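Sanity-checking those bandwidth figures (the ~256GB/s-per-stack HBM2 ceiling is my own addition from the spec, not something quoted in this thread):
[CODE]
# Bandwidth figures quoted above, in GB/s
p100_bw, maxwell_bw, fury_bw = 720, 336, 512
print(p100_bw / maxwell_bw)    # ~2.14x - not an obvious "3x Maxwell"
print(p100_bw / fury_bw)       # ~1.4x AMD's 4-stack HBM1 setup
print(p100_bw / 4)             # 180 GB/s per stack (assuming 4 stacks),
                               # well short of HBM2's ~256 GB/s ceiling
[/CODE]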