
Thread: Nvidia GeForce RTX 2080 Ti and RTX 2080

  1. #81
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: Nvidia GeForce RTX 2080 Ti and RTX 2080

    Quote Originally Posted by DanceswithUnix View Post
    As for width, I got the impression that pre-Volta the ALU was mixed fp32+fp64+int but now the integer processing is stripped out into its own execution unit so address calculations can be done in parallel with the floating point. ISTR that fp16 was removed from gpus early on, around Riva TNT, and no-one missed it until now. I think in Vega AMD put the data type back in, but a dedicated fp16 matrix multiply-accumulate would seem sensible.
    Yea it was, however I suspect that the split into individual FP and INT units is done at the factory, possibly via firmware, as it doesn't seem logical to take what used to be a mixed-precision unit and redesign it to do only INT or FP. From a failure POV it seems more logical to decide whether a unit is going to do FP or INT after fabrication.

    The FP/INT 32/64 part AFAIK appears to be something that's fixed at design and fabrication time. The width of a unit is a physical thing, and ideally you want the width of the unit to match the width of the data going through it; while a mixed-precision INT/FP64 unit can do INT/FP32 work, it's a waste of silicon and power.

    Quote Originally Posted by DanceswithUnix View Post
    The bit I noticed from that pdf was the huge matrix size required to hit high performance, when AIUI most AI just uses 4x4 multiplies.
    It does, but even a 4x4 grid (AFAIK a grid can also do +, -, and / on a per-grid basis) consists of 16 individual 2-digit numbers (32 bits), or any variation that results in either 16 or 32 bits (there's also 64-bit but that's more for the professional cards). At least that's my understanding and I'd welcome input from someone with more knowledge of Tensor programming.
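
    To make that concrete, here's roughly what I understand a single 4x4 multiply-accumulate to be (a plain scalar sketch, with the fp16 inputs written as float for readability):

    Code:
    // Scalar reference for one 4x4 multiply-accumulate: D = A*B + C.
    // Inputs would be fp16 on the hardware; plain float is used here
    // for readability. 16 output elements x 4 multiplies each = 64 FMAs.
    void mma4x4(const float A[4][4], const float B[4][4],
                const float C[4][4], float D[4][4])
    {
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) {
                float acc = C[i][j];          // accumulate into the destination
                for (int k = 0; k < 4; ++k)
                    acc += A[i][k] * B[k][j]; // 4 multiplies per output element
                D[i][j] = acc;
            }
    }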

  2. #82
    Senior Member
    Join Date
    Aug 2009
    Location
    UK
    Posts
    431
    Thanks
    20
    Thanked
    33 times in 27 posts
    • Jace007's system
      • CPU:
      • Intel i7 7700k
      • Memory:
      • 16GB
      • Storage:
      • 500GB SSD
      • Graphics card(s):
      • nVidia 1080
      • PSU:
      • EVGA 750w
      • Operating System:
      • WinLOW

    Re: Nvidia GeForce RTX 2080 Ti and RTX 2080

    Nope, I think I won't buy a GPU for at least 3 years; just got myself a PS4. Gaming on PC is expensive, trying to keep up with the latest tech only for a handful of really decent AAA games to show up after waiting years. Thanks but no thanks, I'm out.

  3. #83
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    12,986
    Thanks
    781
    Thanked
    1,588 times in 1,343 posts
    • DanceswithUnix's system
      • Motherboard:
      • Asus X470-PRO
      • CPU:
      • 5900X
      • Memory:
      • 32GB 3200MHz ECC
      • Storage:
      • 2TB Linux, 2TB Games (Win 10)
      • Graphics card(s):
      • Asus Strix RX Vega 56
      • PSU:
      • 650W Corsair TX
      • Case:
      • Antec 300
      • Operating System:
      • Fedora 39 + Win 10 Pro 64 (yuk)
      • Monitor(s):
      • Benq XL2730Z 1440p + Iiyama 27" 1440p
      • Internet:
      • Zen 900Mb/900Mb (CityFibre FttP)

    Re: Nvidia GeForce RTX 2080 Ti and RTX 2080

    Quote Originally Posted by Corky34 View Post
    Yea it was, however I suspect that the split into individual FP and INT units is done at the factory, possibly via firmware, as it doesn't seem logical to take what used to be a mixed-precision unit and redesign it to do only INT or FP. From a failure POV it seems more logical to decide whether a unit is going to do FP or INT after fabrication.
    You would never disable that at manufacture, they are designed like that. Or to put it another way, FP circuitry is big, so you wouldn't want it sitting idle with only an int part enabled.

    The int and fp logic are not the same, but by combining them you save instruction decoding logic etc. So they have decided to spend a few transistors to take advantage of int instructions being lower latency, rather than having them go down the same long pipe as the floating point ops. Sounds like a marginal improvement at best, but I'm sure simulations would have shown an improvement for them to make the change.
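
    As a rough illustration (a hypothetical kernel, nothing from the whitepaper): in any memory-bound kernel the index and address arithmetic is integer work, which on Volta/Turing can issue alongside the FP32 FMAs instead of queueing behind them:

    Code:
    // Hypothetical CUDA kernel to illustrate the split: the index
    // arithmetic below is INT-pipe work, the fused multiply-add is
    // FP32-pipe work. Pre-Volta both competed for the same mixed ALU;
    // on Volta/Turing they can issue concurrently.
    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // INT pipe
        for (; i < n; i += gridDim.x * blockDim.x) {     // INT pipe
            y[i] = a * x[i] + y[i];                      // FP32 pipe (FMA)
        }
    }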

    Quote Originally Posted by Corky34 View Post
    The FP/INT 32/64 part AFAIK appears to be something that's fixed at design and fabrication time. The width of a unit is a physical thing, and ideally you want the width of the unit to match the width of the data going through it; while a mixed-precision INT/FP64 unit can do INT/FP32 work, it's a waste of silicon and power.

    It does, but even a 4x4 grid (AFAIK a grid can also do +, -, and / on a per-grid basis) consists of 16 individual 2-digit numbers (32 bits), or any variation that results in either 16 or 32 bits (there's also 64-bit but that's more for the professional cards). At least that's my understanding and I'd welcome input from someone with more knowledge of Tensor programming.
    I believe the width of the unit is 64 bits. You do a pair of 32 bit operations, or 4 lots of 16 bit operations. Hence in the fully 64 bit enabled Volta parts they can do fp64 at half the rate of fp32.
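
    FWIW CUDA exposes the packed 16-bit case directly: a __half2 holds two fp16 values in one 32-bit register, and a single instruction operates on both lanes (minimal sketch, needs sm_53 or later):

    Code:
    #include <cuda_fp16.h>

    // Two fp16 multiplies per 32-bit instruction: each __half2 packs
    // a pair of fp16 values, and __hmul2 multiplies both lanes at once.
    __global__ void mul2(const __half2 *a, const __half2 *b,
                         __half2 *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = __hmul2(a[i], b[i]);   // lanes 0 and 1 in parallel
    }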

    The tensor calculation is a simple 4x4 multiply with accumulate, so you are performing 16 lots of 4 multiplies but instead of storing the result you add them into the destination with a choice of using fp16 or fp32 for the result.
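
    In CUDA that's exposed through the WMMA intrinsics, albeit at 16x16x16 tile granularity rather than the raw 4x4 hardware step; something like this (minimal sketch, needs sm_70 or later):

    Code:
    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp computes D = A*B + C on 16x16 fp16 tiles with fp32
    // accumulation; the tensor cores perform the underlying 4x4
    // multiply-accumulate steps.
    __global__ void wmma_16x16(const half *A, const half *B, float *D)
    {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

        wmma::fill_fragment(c_frag, 0.0f);       // start the accumulator at 0
        wmma::load_matrix_sync(a_frag, A, 16);   // leading dimension 16
        wmma::load_matrix_sync(b_frag, B, 16);
        wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // D = A*B + C
        wmma::store_matrix_sync(D, c_frag, 16, wmma::mem_row_major);
    }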

  4. #84
    Senior Member
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: Nvidia GeForce RTX 2080 Ti and RTX 2080

    Quote Originally Posted by DanceswithUnix View Post
    You would never disable that at manufacture, they are designed like that. Or to put it another way, FP circuitry is big, so you wouldn't want it sitting idle with only an int part enabled.

    The int and fp logic are not the same, but by combining them you save instruction decoding logic etc. So they have decided to spend a few transistors to take advantage of int instructions being lower latency, rather than having them go down the same long pipe as the floating point ops. Sounds like a marginal improvement at best, but I'm sure simulations would have shown an improvement for them to make the change.
    So when I read that a CUDA core executes a floating point or integer instruction per clock for a thread, that means Nvidia used to include both FP and INT ALUs in each CUDA 'core' (at least in the notional definition of a CUDA 'core')? In other words a CUDA 'core' could only perform one or the other type per clock for a thread.

    Since Volta they still have those same separate FP & INT units, but because of changes in the way the registers and cache work they can now address both units concurrently? In essence, for each clock the notional idea of a CUDA 'core' can now run two threads, one for FP and another for INT?

    Quote Originally Posted by DanceswithUnix View Post
    I believe the width of the unit is 64 bits. You do a pair of 32 bit operations, or 4 lots of 16 bit operations. Hence in the fully 64 bit enabled Volta parts they can do fp64 at half the rate of fp32.
    You mean the width of the notional idea of what a Tensor 'core' is? If so that does seem to make more sense, as that would mean all those FP64 'cores' on professional cards have been repurposed as Tensor 'cores' on consumer cards.

    Quote Originally Posted by DanceswithUnix View Post
    The tensor calculation is a simple 4x4 multiply with accumulate, so you are performing 16 lots of 4 multiplies but instead of storing the result you add them into the destination with a choice of using fp16 or fp32 for the result.
    AFAIK the type (multiply, addition, square root, subtraction) and size (2x2 8-bit, 4x4 4-bit, 8x8 2-bit) can vary as long as each individual TensorRT kernel is constructed from the same type. So you could have one TensorRT kernel doing subtractions on a 2x2 grid constructed out of 4 four-digit numbers, another TensorRT kernel doing multiplication on a 4x4 grid of 16 two-digit numbers, and any variation thereof that fits into a 16 or 32-bit data structure; each of those 16/32-bit TensorRT kernels goes to make up the NN. At least that's my current understanding.

  5. #85
    Senior Member
    Join Date
    Aug 2017
    Posts
    278
    Thanks
    10
    Thanked
    28 times in 19 posts

    Re: Nvidia GeForce RTX 2080 Ti and RTX 2080

    I don't see any Hairworks comparisons. Have Nvidia dropped it?
