We naturally took advantage of GDC to question Nvidia and try to learn more about what its GPUs are capable of when it comes to DirectX 12 multi-engine management... without real success?
At the heart of DirectX 12, the multi-engine feature allows rendering to be decomposed into several command queues, which can be of the copy, graphics or compute type, with synchronization between the queues managed explicitly. This gives developers control over the order in which tasks are executed, or direct control over multi-GPU setups. In some cases, this decomposition makes it possible to exploit the GPU's ability to handle multiple tasks in parallel and boost performance.
This is what AMD calls Async Compute, even though the term is not strictly accurate. Asynchronous execution of a task does not imply that it is processed concurrently with another, yet it is precisely that concurrency which is crucial and which delivers the performance gain. AMD's GPUs benefit from multiple command processors capable of feeding the GPU's compute units from several different queues. Processing tasks simultaneously maximizes the use of all the GPU's resources: processing units, memory bandwidth, and so on.
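To make the asynchronous-versus-concurrent distinction concrete, here is a deliberately simple CPU-side analogy in Python (our illustration, not GPU code): an asynchronous API lets work be submitted without waiting for it to finish, but submission alone does not mean the tasks actually execute at the same time.

```python
import asyncio
import time

# CPU analogy only: two tasks are submitted "asynchronously", but since
# each one does blocking work and never yields, they still run one after
# the other. Asynchronous submission is not concurrent execution.

async def busy(ms):
    time.sleep(ms / 1000)   # blocking work: never yields to the event loop

async def main():
    start = time.perf_counter()
    await asyncio.gather(busy(50), busy(50))
    return (time.perf_counter() - start) * 1000

elapsed = asyncio.run(main())
# Total time is roughly 100 ms, not 50 ms: nothing ran in parallel.
```

The performance gain only appears when something can actually overlap the two workloads, which on a GPU means hardware able to feed its execution units from several queues at once.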
On Nvidia's side, things are more complicated. While GeForce GPUs can handle copy queues in parallel with compute and graphics queues, processing the latter two concurrently appears to be problematic. On paper, Maxwell 2 GPUs (GTX 900 series) have a command processor that can handle 32 queues, one of which can be a graphics queue. Yet this support is still not functional in practice, as shown for example by GeForce performance in Ashes of the Singularity.
Why? So far, we have not been able to get a real answer out of Nvidia. So of course we wanted to take advantage of GDC to try to learn more, and questioned Nvidia at a meeting arranged with Rev Lebaredian, Senior Director of GameWorks. Unfortunately for us, this engineer, who is part of the technical support group for video game developers, was very well prepared for questions touching on the specifics of multi-engine support. His answers were initially word for word those of the brief official statement Nvidia has given the technical press in recent months. Namely: "Maxwell GeForces can support concurrent execution at the SM level (groups of processing units)", "it is not yet enabled in the driver", "Ashes of the Singularity is just one (not very important) game among others."
Such unusually scripted talking points show, if it were still necessary, that this issue bothers Nvidia. Faced with this impasse, we changed tack and approached the subject from a different angle: is Async Compute actually important (for Maxwell GPUs)? That got Rev Lebaredian to relax and opened the way to a much more interesting discussion. Nvidia then developed two arguments.
First, while Async Compute is one way to increase performance, what matters in the end is overall performance. If GeForce GPUs are more efficient to begin with than Radeon GPUs, then using multi-engine to try to boost their performance further is not a top priority.
Second, if the utilization of a GeForce GPU's various blocks is already relatively high, the potential gain from Async Compute is smaller. Nvidia claims that, overall, there are far fewer holes ("bubbles" in GPU parlance) in the activity of its GPUs' units than in its competitor's. And the whole point of concurrent execution is to exploit synergies between different tasks in order to fill those holes.
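This argument can be sketched with a toy utilization model (our simplification, not Nvidia's or AMD's actual scheduler): a graphics pass leaves some fraction of the execution units idle, and a compute pass can run in those bubbles when concurrent execution is supported. The gain shrinks as baseline utilization rises.

```python
# Toy model: a GPU whose graphics pass leaves execution units idle
# ("bubbles"), and a compute pass that can absorb that idle capacity
# when graphics and compute run concurrently.

def frame_time(gfx_ms, compute_ms, gfx_utilization, concurrent):
    """Estimated frame time in ms under a crude utilization model."""
    if not concurrent:
        return gfx_ms + compute_ms          # passes run back to back
    # Idle capacity during the graphics pass that compute can fill.
    spare = gfx_ms * (1.0 - gfx_utilization)
    overlapped = min(compute_ms, spare)
    return gfx_ms + (compute_ms - overlapped)

# A GPU already at 90% utilization hides only half the compute pass...
high = frame_time(10.0, 2.0, 0.90, concurrent=True)    # ~11.0 ms
# ...while one at 60% utilization hides it entirely.
low = frame_time(10.0, 2.0, 0.60, concurrent=True)     # ~10.0 ms
serial = frame_time(10.0, 2.0, 0.60, concurrent=False)  # 12.0 ms
```

Under this model, the GPU with fewer bubbles to begin with simply has less to reclaim, which is exactly Nvidia's point.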
Behind these arguments actually lies one of the cornerstones of good GPU architecture planning. Integrating one or more advanced command processors into a chip has a cost, a cost that could, for example, be spent differently: on more compute units that boost performance directly in today's games.
When developing a GPU architecture, much of the work consists of anticipating the profile of the workloads that will matter by the time the new chips reach the market. The balance of the architecture between its different types of units (compute power versus memory bandwidth, triangle rate versus pixel throughput, and so on) is a crucial point that requires good visibility, a lot of pragmatism and a strategic vision. Nvidia has clearly done rather well on this front for several generations of GPUs.
To illustrate this, here are a few comparisons between GM200 and Fiji based on results obtained in Ashes of the Singularity without Async Compute. The comparison is rough and approximate (the GM200 tested is that of the GTX 980 Ti, which runs in a slightly cut-down configuration), but interesting nonetheless:
GM200 (GTX 980 Ti): 6.0 fps/Gtransistors, 7.8 fps/TFLOPS, 142.1 fps/(TB/s)
Fiji (R9 Fury X): 5.6 fps/Gtransistors, 5.8 fps/TFLOPS, 97.9 fps/(TB/s)
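These ratios can be roughly reconstructed from public specs. The frame rates used below (about 48 and 50 fps) are back-derived from the ratios above rather than taken from a published benchmark table, and the 6.15 TFLOPS figure for the 980 Ti assumes its real boost clock, so treat all inputs as approximations:

```python
# Rough reconstruction of the per-resource efficiency ratios.
# Assumptions: AOTS frame rates are back-derived from the article's own
# ratios; hardware figures are the public specs (transistor count, peak
# FP32 throughput, memory bandwidth).

cards = {
    # name: (fps in AOTS without Async Compute, Gtransistors, TFLOPS, TB/s)
    "GTX 980 Ti (GM200)": (48.0, 8.0, 6.15, 0.336),
    "R9 Fury X (Fiji)":   (49.8, 8.9, 8.60, 0.512),
}

ratios = {
    name: (fps / gtr, fps / tflops, fps / tbs)
    for name, (fps, gtr, tflops, tbs) in cards.items()
}

for name, (per_gtr, per_tflops, per_tbs) in ratios.items():
    print(f"{name}: {per_gtr:.1f} fps/Gtransistor, "
          f"{per_tflops:.1f} fps/TFLOPS, {per_tbs:.1f} fps/(TB/s)")
```

The computed values land close to the figures quoted above and show the same pattern: GM200 extracts more frames per transistor, per FLOPS and per unit of bandwidth.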
We could do the same with many games and the result would be similar or even more pronounced (AOTS is particularly effective on Radeons): GM200 makes better use of the resources at its disposal than Fiji. This is an architectural choice, and it does not directly mean one architecture is better than the other. Increasing the efficiency of some units can cost more than simply increasing their number by a larger amount. The architects' job is to find the right balance here.
AMD, obviously, has instead bet on the raw throughput of its GPUs, which usually implies lower efficiency and therefore more opportunity for this kind of optimization. Add to this that the way Async Compute is organized in AOTS seems to make heavy use of surplus memory bandwidth, and you will easily understand that there is less to gain on Nvidia's side. Especially since the synchronization commands tied to Async Compute have a cost of their own, which needs to be masked by a significant gain.
While our own thinking leads us to broadly agree with Nvidia on these arguments, there is another point that matters to gamers, and it is probably what makes the GPU market leader so tight-lipped on the topic: Async Compute provides a free gain for Radeon owners. Although this capability has been present in AMD GPUs for more than four years, AMD has not been able to turn it into commercial profit; those GPUs were not sold at a premium because of it. That is changing somewhat with AMD's latest lineup, which leans heavily on this point, but in terms of perception, gamers love getting such a little boost for free, even if only in a handful of games. Conversely, the higher overall performance of Nvidia's GPUs delivers an immediate benefit in today's games, and can be priced directly into GeForce cards. From the perspective of a company whose purpose is not to post losses, it is clear which of the two approaches makes more sense.
Still, it is 2016, and the use of Async Compute should gradually spread, particularly thanks to the similarity between the architecture of the console GPUs and that of the Radeons. Nvidia cannot totally ignore a capability that could reduce or eliminate its performance lead. Without going into detail, Rev Lebaredian was thus keen to reiterate that there are indeed opportunities at the driver level to implement support from which, in some cases, a performance gain could be obtained with Async Compute. Opportunities that Nvidia constantly re-evaluates, without forgetting that its future GPUs could change things at this level.