Nvidia gives some more insights into Ampere:
https://www.reddit.com/r/nvidia/comm...d_we_answered/
Could you elaborate a little on this doubling of CUDA cores? How does it affect the general architectures of the GPCs? How much of a challenge is it to keep all those FP32 units fed? What was done to ensure high occupancy?
[Tony Tamasi] One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
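As a rough sanity check on those per-clock figures, the datapath arithmetic can be sketched in a few lines of Python. The function and constant names here are made up for illustration; only the lane counts come from the answer above.

```python
# Lane counts per SM partition, taken from the answer above.
PARTITIONS_PER_SM = 4
FP32_ONLY_LANES = 16  # datapath that only executes FP32
SHARED_LANES = 16     # datapath that executes either FP32 or INT32


def sm_ops_per_clock(shared_runs_int32: bool) -> tuple[int, int]:
    """Return (fp32_ops, int32_ops) per clock for one whole SM."""
    fp32 = FP32_ONLY_LANES + (0 if shared_runs_int32 else SHARED_LANES)
    int32 = SHARED_LANES if shared_runs_int32 else 0
    return PARTITIONS_PER_SM * fp32, PARTITIONS_PER_SM * int32


print(sm_ops_per_clock(False))  # shared datapath doing FP32
print(sm_ops_per_clock(True))   # shared datapath doing INT32
```

The first call corresponds to the 128 FP32 case and the second to the 64 FP32 plus 64 INT32 case.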
Doubling the processing speed for FP32 improves performance for a number of common graphics and compute operations and algorithms. Modern shader workloads typically have a mixture of FP32 arithmetic instructions such as FFMA, floating point additions (FADD), or floating point multiplications (FMUL), combined with simpler instructions such as integer adds for addressing and fetching data, floating point compare, or min/max for processing results, etc. Performance gains will vary at the shader and application level depending on the mix of instructions. Ray tracing denoising shaders are good examples that might benefit greatly from doubling FP32 throughput.
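To see why the gain depends on the instruction mix, here is a deliberately simplified throughput model. It assumes a Turing SM issues 64 FP32 and 64 INT32 operations per clock concurrently (not stated in the answer above), treats every instruction as either FP32 or INT32, and ignores scheduling, memory and latency effects entirely. The function names are hypothetical.

```python
def clocks_turing(fp32: int, int32: int) -> float:
    # Assumption: 64 FP32 lanes and 64 INT32 lanes working side by side.
    return max(fp32 / 64, int32 / 64)


def clocks_ampere(fp32: int, int32: int) -> float:
    # 64 FP32-only lanes plus 64 lanes that take either FP32 or INT32.
    # INT32 work is limited to the shared lanes; all work is limited to 128 lanes.
    return max(int32 / 64, (fp32 + int32) / 128)


def speedup(int_fraction: float, total: int = 1_000_000) -> float:
    """Idealised Ampere-over-Turing speedup per SM per clock."""
    int32 = round(total * int_fraction)
    fp32 = total - int32
    return clocks_turing(fp32, int32) / clocks_ampere(fp32, int32)


for frac in (0.0, 0.25, 0.5):
    print(f"INT32 share {frac:.0%}: {speedup(frac):.2f}x")
```

Under these assumptions a pure FP32 shader sees the full doubling, while a shader with an even FP32/INT32 split sees no per-clock gain at all, which fits the caveat that gains vary with the mix.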
Doubling math throughput required doubling the data paths supporting it, which is why the Ampere SM also doubled the shared memory and L1 cache performance for the SM. (128 bytes/clock per Ampere SM versus 64 bytes/clock in Turing). Total L1 bandwidth for GeForce RTX 3080 is 219 GB/sec versus 116 GB/sec for GeForce RTX 2080 Super.
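The quoted bandwidth figures can be reproduced from bytes per clock times clock speed. The boost clocks used below (1710 MHz for the RTX 3080, 1815 MHz for the RTX 2080 Super) are an assumption and do not appear in the answer above; with those clocks, the 219 and 116 GB/sec numbers line up with the bandwidth of a single SM.

```python
def l1_bandwidth_gb_s(bytes_per_clock: int, clock_mhz: float) -> float:
    """L1/shared memory bandwidth of one SM in GB/sec."""
    return bytes_per_clock * clock_mhz * 1e6 / 1e9


# Assumed boost clocks, not taken from the post.
rtx_3080 = l1_bandwidth_gb_s(128, 1710)
rtx_2080_super = l1_bandwidth_gb_s(64, 1815)

print(f"RTX 3080:       {rtx_3080:.0f} GB/sec per SM")
print(f"RTX 2080 Super: {rtx_2080_super:.0f} GB/sec per SM")
```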
Like prior NVIDIA GPUs, Ampere is composed of Graphics Processing Clusters (GPCs), Texture Processing Clusters (TPCs), Streaming Multiprocessors (SMs), Raster Operators (ROPS), and memory controllers.
The GPC is the dominant high-level hardware block with all of the key graphics processing units residing inside the GPC. Each GPC includes a dedicated Raster Engine, and now also includes two ROP partitions (each partition containing eight ROP units), which is a new feature for NVIDIA Ampere Architecture GA10x GPUs. More details on the NVIDIA Ampere architecture can be found in NVIDIA’s Ampere Architecture White Paper, which will be published in the coming days.
I wonder how much of it was intentional??
kalniel (03-09-2020)
TSMC's 7nm process has matured very well for the die density. Yields are 90%+ - Samsung 8nm estimates are around 70% at the moment. That's a huge difference! Intel 10nm yields are supposed to be around what Samsung can manage, and that's enough for Intel to have canned 10nm for a lot of things.
TSMC availability is more down to the fact that EVERYONE wants it at the moment. Nearly every SoC is on it, and everybody was fighting for a bit of that pie. Ryzen moving forward will be on it, and Apple have tried to get more capacity. Huge die sizes and people all pushing performance rather than power saving is also a factor, but really that has always been the case.
Old puter - still good enuff till I save some pennies!
The 3070 is using GDDR6, not GDDR6X, so presumably doesn't have the same heat issues. A few articles I've read say the RAM chips are causing a lot of the heat issue. It's also speculated that this is the real reason the 3080 only has 10GB of RAM: to help balance the heat from the lesser-binned chips going into the 3080 vs those they are cherry-picking to put in the 3090 (and the 3080 Ti when it launches).
There is swearing in the video !
Jon
ik9000 (04-09-2020)
Jonj1611 (04-09-2020)
https://www.nvidia.com/en-us/geforce...te-sept-16th-/
Suroosh@NVIDIA
3h
Hey everyone - two updates for you today.
First, GeForce RTX 3080 Founders Edition reviews (and all related technologies and games) will be on September 16th at 6 a.m. Pacific Time.
Get ready for benchmarks!
Second, we’re excited to announce that the GeForce RTX 3070 will be available on October 15th at 6 a.m. Pacific Time.
Interesting. Hexus can reveal performance two days earlier:
https://hexus.net/tech/news/graphics...tion-examined/
Originally Posted by hexus
Apparently the date's been put back to enable all reviewers to get their reviews ready. Something about shipping delays due to the current climate.
kalniel (12-09-2020)