The FP/INT 32/64 part, AFAIK, appears to be something that's fixed at design and fabrication time. The width of a unit is a physical thing, and ideally you want the width of the unit to match the width of the data going through it. While a MP INT/FP 64 unit can do INT/FP 32 work, it's a waste of silicon and power.
It does, but even a 4x4 grid (AFAIK a grid can also do +, -, and / on a per-grid basis) consists of 16 individual 2-digit numbers (32 bits), or any variation that results in either 16 or 32 bits (there's also 64 bits, but that's more for the professional cards). At least that's my understanding, and I'd welcome input from someone with more knowledge of Tensor programming.
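To make the grid idea a bit more concrete, here's a rough Python sketch. It assumes the commonly described tensor-core operation, D = A x B + C on a 4x4 tile, and standard IEEE element widths (FP16, FP32, FP64). The names `tile_bits` and `tile_fma` are made up for illustration; this models the arithmetic only, not how the hardware is wired or programmed.

```python
import struct

# Element widths in bytes, taken from the standard struct format codes:
# 'e' = FP16, 'f' = FP32, 'd' = FP64.
WIDTHS = {
    "FP16": struct.calcsize("e"),
    "FP32": struct.calcsize("f"),
    "FP64": struct.calcsize("d"),
}


def tile_bits(fmt_name, n=4):
    """Total bits needed to hold an n x n tile of the given element type."""
    return n * n * WIDTHS[fmt_name] * 8


def tile_fma(a, b, c):
    """Compute D = A x B + C for square tiles given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) + c[i][j] for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for name in WIDTHS:
        print(name, tile_bits(name), "bits per 4x4 tile")
```

The point of `tile_bits` is just that the same 16-element grid takes twice the storage each time the element width doubles, which is one way to see why a wider unit doing narrower work leaves silicon idle.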