And gamers' hardware comes last behind the business market...
Nvidia are probably saving any GeForce Pascal announcements for E3 in June.
I wouldn't think so...
ECC only really hurts if you want to change just a little bit of data, as you have to read in a whole line to recalculate the check code and then write the whole line back. On something like a GPU, which does big bursts of data and is heavily cached, I wouldn't expect ECC to have much impact at all, as you are going to be dumping full-width cache lines into RAM and so don't have to do that slow read-modify-write cycle.
My understanding of ECC is different, then, or have I misunderstood?
I thought ECC used an extra physical unit (a RAM chip, or maybe an HBM module?) to store the parity bits. From what I've since read about ECC RAM, it seems it doesn't reduce throughput by any significant amount (1-2%), but I was thinking more along the lines of whether having ECC on a graphics card would reduce the available memory lanes, i.e. with 4 HBM modules, would you lose one of those 1024-bit memory lanes because you're using one module to store the parity bits for the other three?
My understanding is that rather hefty price tag is for eight Tesla P100 boards and the supercomputer to go with it.
Last edited by Corky34; 06-04-2016 at 01:39 PM.
It sounds like your mental image of how ECC works is much like a RAID5 array! I believe that might be true for some implementations, but I'd expect a graphics processor with a limited number of memory chips available to have some form of in-chip error detection. Also, if an entire stack of HBM were used for parity, the GPU would only have access to 12GB of RAM, not the 16GB Nvidia claims.
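To make the RAID5 analogy concrete, here's a toy sketch of that mental model: one stack's worth of capacity holds the XOR of the others, so any single lost stack can be rebuilt. This is purely an illustration of the analogy being discussed, not how the P100 actually implements ECC.

```python
# Hypothetical RAID5-style parity across memory "stacks": the parity
# stack is the XOR of the data stacks, so any one lost stack can be
# rebuilt from the parity plus the survivors.
data_stacks = [0b1010, 0b0110, 0b1111]   # three toy 4-bit "data stacks"

parity = 0
for s in data_stacks:
    parity ^= s                           # the fourth, parity, stack

# Lose stack 1 and rebuild it from the parity stack and the survivors
lost = 1
rebuilt = parity
for i, s in enumerate(data_stacks):
    if i != lost:
        rebuilt ^= s
assert rebuilt == data_stacks[lost]

# The cost of this scheme: with 4 stacks of 4GB and one dedicated to
# parity, only 3 * 4 = 12GB is usable - the 12GB-vs-16GB point above.
```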
It has 60 SMs, 56 of them enabled in the card in the article. Each SM has 64 single-precision stream processors and 32 double-precision stream processors, so 3840 single-precision and 1920 double-precision in the full chip (3584 and 1792 in the article's card). For comparison, the last GPU Nvidia made with dedicated double-precision cores, the GK110 in the GTX 780 Ti, had 2880 single-precision cores and 960 double-precision cores.
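The arithmetic above checks out; here's a quick sanity check using the per-SM counts from the post (the GK110 per-SMX breakdown of 192 SP and 64 DP cores is my addition):

```python
# Core counts for GP100, using the figures quoted in the post
sm_total, sm_enabled = 60, 56
sp_per_sm, dp_per_sm = 64, 32

assert sm_total * sp_per_sm == 3840    # full chip, single precision
assert sm_total * dp_per_sm == 1920    # full chip, double precision
assert sm_enabled * sp_per_sm == 3584  # P100 as announced
assert sm_enabled * dp_per_sm == 1792

# GK110 (GTX 780 Ti): 15 SMX units, 192 SP and 64 DP cores each
assert 15 * 192 == 2880
assert 15 * 64 == 960
```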
I can't wait for Witcher 3 performance at 4K.......the GPU killer.
True, but even if you could buy one you couldn't use it, as you'd need a MoBo with an NVLink connection; apparently Nvidia says the PCI Express bus doesn't provide enough bandwidth to keep the card busy.
The first thing that sprang to mind was ECC RAM and how each stick has an extra memory chip, but I guess it may be similar to a RAID5 array. I'm just guessing, though, so without further reading I couldn't say for certain how either Nvidia or AMD configure things on cards designed for GPU-accelerated computation, although comparing the M40's 288 GB/sec memory bandwidth with something like the Titan X's 336.5 GB/sec shows a similar disparity between their HPC and gaming cards.
Last edited by Corky34; 06-04-2016 at 03:44 PM.
Where did it say Q1 2017 for retail cards?
A long time ago, people used parity on each byte of RAM. That tells you if a single bit gets flipped, but not which one, so you can't correct it. When memory DIMMs got wide enough that a 64-bit DIMM with parity was 72 bits, some bright spark worked out that using a "Hamming code" would tell you if *any* two bits were corrupted, and if just one was corrupted could calculate which one, so it can be flipped back.
For that to work, you need to write out to RAM in groups of 64 bits to calculate the extra 8 bits of ECC, as all 64 bits contribute. If you are going to write out 64 bits anyway then it doesn't slow you down in the slightest, which is why modern computers, which are good at grouping writes together into big lumps, aren't slowed down much.
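The Hamming idea described above can be sketched at toy scale: here it's 4 data bits plus 3 check bits rather than the 64+8 of a real DIMM, but the mechanism is the same. Function names are my own; this is an illustrative Hamming(7,4) single-error corrector, not any particular memory controller's implementation.

```python
# Toy Hamming(7,4) code: 3 check bits protect 4 data bits, and the
# syndrome directly gives the position of a single flipped bit.
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4                 # covers codeword positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4                 # covers positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4                 # covers positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7

def hamming74_correct(code):
    """Locate and fix a single flipped bit, then return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]    # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]    # recheck positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]    # recheck positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3  # syndrome = position of the bad bit
    if error_pos:
        c[error_pos - 1] ^= 1         # flip it back
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)
code[4] ^= 1                          # a stray bit-flip in "RAM"
assert hamming74_correct(code) == data
```

Note that encoding needs all 4 data bits at once, which is the point made above: ECC check bits are computed over the whole group, so partial writes force a read-modify-write while full-group writes cost nothing extra.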
At a guess, I would say a Tesla card would be slower than a GeForce on the memory accesses just to keep the heat down. If you save 20W per card, then with 8 cards in a chassis that's 160W (the numbers are a pure guess, but you see what I mean - it adds up quickly).
The marketing slides show the GPU has a PCIe interface - NVLink is a server interconnect, not a desktop feature. PCIe should be more than enough for a gaming workload, but even if for some reason Pascal were massively more bus-hungry than any previous card and PCIe were suddenly not enough, tough, because you're getting a PCIe connection to the Intel/AMD CPU.
But I agree with others, it looks very much like an HPC-focused card rather than a gaming GPU - the die size alone is likely to make it prohibitively expensive for quite some time. And then there are things like the NVLink interfaces, which take up an unknown amount of silicon area, but if the non-gaming-applicable stuff isn't too large I guess they could harvest dies where it's broken to sell as gaming cards.
Edit: Having read more of the thread, yeah, a lot seems to be off with the announcement. Not least the absurd 'first' claims about CoWoS, etc. Also, what happened to that 10x performance claim? We knew it was always going to be in some really specific mixed-precision workloads, but I've not seen it mentioned again.
Also, have we seen any evidence of working silicon yet? I only recall seeing renders? Though at least they're now actually showing HBM rather than what looked like HMC on the older ones! xD
Last edited by watercooled; 06-04-2016 at 10:19 PM.
OTOH, the announcement just keeps raising questions. Quite apart from the DP-heavy design (which might not play out well in the gaming market), I want to know what they're doing with the HBM2 to only get 720GB/s out of it (and quite why they think that's 3x Maxwell when the 980 Ti and Titan X pull 336GB/s). AMD's 4-stack HBM1 implementation returned 512GB/s theoretical bandwidth. The more I hear about Pascal the less impressive it appears to be...
I found this: the "3x the performance of Maxwell" claim is measured against the Tesla M40, which is based on Maxwell.
The Tesla P100 package includes four 4-Hi HBM2 stacks, for a total of 16 GB of memory and 720 GB/s peak bandwidth. That's three times as much bandwidth as Nvidia's previous flagship accelerator, the Tesla M40. The Tesla P100's bandwidth figure is below that of the JEDEC HBM2 spec that SK Hynix and Samsung both adhere to, which dictates that every 4-Hi HBM2 stack should operate at a 2GHz clock speed to deliver 256GB/s of bandwidth, for a total of 1TB/s across four stacks. The HBM2 modules on the Tesla P100 actually operate well below the spec, at only 1.4GHz.
This is part of Nvidia's effort to reduce the overall power draw of the package.
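The numbers in that quote hang together; a quick check of the arithmetic, using only the figures quoted above (the bandwidth ratios at the end are my own calculation):

```python
# Bandwidth arithmetic from the quoted announcement figures
spec_per_stack = 256                   # GB/s per 4-Hi HBM2 stack at 2GHz (JEDEC spec)
stacks = 4
spec_total = spec_per_stack * stacks
assert spec_total == 1024              # ~1TB/s if run at full spec

actual_clock, spec_clock = 1.4, 2.0
derated = spec_total * actual_clock / spec_clock
assert round(derated) == 717           # ~720 GB/s, matching the P100's quoted figure

# The "3x Maxwell" claim is versus the Tesla M40, not the gaming cards,
# and even then the ratio is closer to 2.5x:
print(720 / 288)                       # vs Tesla M40 (288 GB/s)  -> 2.5
print(720 / 336.5)                     # vs Titan X (336.5 GB/s)  -> ~2.14
```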