According to this article (I'm not sure whether they're quoting Nvidia or it's their own take on it), http://hothardware.com/News/Nvidias-...a-Rides-Again/, they're claiming it's not VLIW?
In terms of issue width, isn't power efficiency the reason the ARM Cortex-A7, Saltwell, Silvermont, Bobcat, and Jaguar are all 2-issue? That's why I don't think the '3-issue vs 7-issue' comparison is really a fair one.
Then again, even with compiler-scheduled code on graphics workloads like those on GPUs, AMD still saw fit to drop from 5-wide to 4-wide VLIW to make better use of resources. I'm not sure that's directly comparable, mind, but I would have thought it was easier to extract ILP from graphics workloads than to do it dynamically on CPU workloads?
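To put rough numbers on the width question, here's a toy sketch in Python. It greedily packs a small dependency graph into fixed-width bundles and reports how many issue slots actually get filled. Everything in it is my own illustration: `pack_bundles`, `utilization`, and the `DOT4` op graph are made-up names, every op is assumed to take one cycle, and this is not how AMD's shader compiler or Denver's optimizer actually schedules code.

```python
def pack_bundles(deps, width):
    """Greedily pack ops into bundles of at most `width` slots.

    `deps` maps each op name to the set of ops it depends on.
    An op may only issue in a bundle after all of its dependencies
    have issued in earlier bundles (assumed single-cycle latency).
    """
    done = set()
    pending = list(deps)  # keeps the dict's insertion order
    bundles = []
    while pending:
        ready = [op for op in pending if deps[op] <= done]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles


def utilization(deps, width):
    """Fraction of issue slots that hold a real op rather than a no-op."""
    bundles = pack_bundles(deps, width)
    return len(deps) / (len(bundles) * width)


# Hypothetical workload: a 4-component dot product,
# i.e. four independent multiplies followed by a reduction tree.
DOT4 = {
    "mul_x": set(),
    "mul_y": set(),
    "mul_z": set(),
    "mul_w": set(),
    "add_xy": {"mul_x", "mul_y"},
    "add_zw": {"mul_z", "mul_w"},
    "add_all": {"add_xy", "add_zw"},
}

if __name__ == "__main__":
    for width in (2, 4, 5, 7):
        n = len(pack_bundles(DOT4, width))
        print(f"width {width}: {n} bundles, "
              f"{utilization(DOT4, width):.0%} of slots used")
```

On this toy input the 4-wide and 5-wide machines both need 3 bundles, so the fifth slot never gets filled. Real code has longer latencies, branches, and memory stalls, so treat it only as a picture of why extra width can go unused when there isn't enough ILP.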
The Denver scheduling hardware must be massive. They claim to have chosen not to use hardware OoO for efficiency and die-size reasons, but Denver looks massive, assuming it's made on the same node with the same libraries.