In terms of discrete time needed to load from disk, yes.
In terms of perceived time, the operations may be interleaved in a way that makes the more distant operation appear to take longer, because its commands are delayed, even though the total time is shorter. That was what I was referring to.
Edit: I just re-read my initial post, and it turns out I had already stated that. So why exactly people want to believe otherwise is beyond me. Yes, I am willing to chart this all out if need be, but it is exam season, and for now I'd rather people read up on how this works themselves. If you plot the access points over time for a random set of requests, unsorted and sorted, the result is immediately self-evident. Individual response time may increase so that total system response time can decrease. Textbook stuff, quite frankly, and not for the module I have an exam on tomorrow.
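For anyone who wants to try the unsorted-versus-sorted comparison, here is a rough Python sketch. It is a toy model, not how any real drive firmware works: time is just head travel distance plus a fixed per-request service cost, rotational position is ignored, and the track numbers, the `SERVICE` cost and the `completion_times` helper are all made up for illustration.

```python
SERVICE = 5  # assumed fixed cost per request (transfer etc.), arbitrary units


def completion_times(start, order):
    """Map each requested track to the time at which it completes.

    Time advances by the head travel distance plus SERVICE per request.
    Track numbers must be unique, since they are used as dictionary keys.
    """
    times = {}
    head, clock = start, 0
    for track in order:
        clock += abs(track - head) + SERVICE
        head = track
        times[track] = clock
    return times


# Hypothetical requests, listed in arrival order; the head starts at track 0.
requests = [90, 10, 80, 20, 70]

fifo = completion_times(0, requests)               # served as they arrived
reordered = completion_times(0, sorted(requests))  # served in one sweep

total_fifo = max(fifo.values())
total_reordered = max(reordered.values())

# Requests that finish later once the queue is reordered.
delayed = [t for t in requests if reordered[t] > fifo[t]]

print("total, arrival order:", total_fifo)
print("total, sorted order: ", total_reordered)
print("requests that individually got slower:", delayed)
```

In this made-up queue the request for track 90 arrived first and finishes later once the queue is sorted, yet the whole batch finishes in well under half the time.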
Edit 2: Yes, the above is slightly over the top. Regulars know that I normally love explaining this sort of thing; I'm just really stressed, and really sad that I cannot put the effort into explaining how NCQ works right now.
Faster loading doesn't help if the delay before it begins outweighs the gain. NCQ is beneficial where the reordering cuts the time needed to load the program by more than the delay it introduces.
A lot of this will depend on the SATA controller on the motherboard (as these can also do various 'optimisations' themselves), the operating system (which traditionally also does a small amount of command queuing), and the vendor-specific implementation of NCQ and its command buffer size.