Main PC: Asus Rampage IV Extreme / 3960X@4.5GHz / Antec H1200 Pro / 32GB DDR3-1866 Quad Channel / Sapphire Fury X / Areca 1680 / 850W EVGA SuperNOVA Gold 2 / Corsair 600T / 2x Dell 3007 / 4 x 250GB SSD + 2 x 80GB SSD / 4 x 1TB HDD (RAID 10) / Windows 10 Pro, Yosemite & Ubuntu
HTPC: AsRock Z77 Pro 4 / 3770K@4.2GHz / 24GB / GTX 1080 / SST-LC20 / Antec TP-550 / Hisense 65k5510 4K TV / HTC Vive / 2 x 240GB SSD + 12TB HDD Space / Race Seat / Logitech G29 / Win 10 Pro
HTPC2: Asus AM1I-A / 5150 / 4GB / Corsair Force 3 240GB / Silverstone SST-ML05B + ST30SF / Samsung UE60H6200 TV / Windows 10 Pro
Spare/Loaner: Gigabyte EX58-UD5 / i950 / 12GB / HD7870 / Corsair 300R / Silverpower 700W modular
NAS 1: HP N40L / 12GB ECC RAM / 2 x 3TB Arrays || NAS 2: Dell PowerEdge T110 II / 24GB ECC RAM / 2 x 3TB Hybrid arrays || Network: Buffalo WZR-1166DHP w/DD-WRT + HP ProCurve 1800-24G
Laptop: Dell Precision 5510 || Printer: HP CP1515n || Phone: Huawei P30 || Other: Samsung Galaxy Tab 4 Pro 10.1 CM14 / Playstation 4 + G29 + 2TB Hybrid drive
There are all sorts of reasons DX12 adoption should be far quicker than any previous DX version.
Firstly, there's that big ol' free upgrade offer for everyone running Win 7 / Win 8.1. Those people won't all have DX12-capable graphics, but it should mean the majority of people who *do* have DX12-capable graphics will be on Win 10. And I'm going to go out on a limb and assume that DX12 has the same kind of feature-level fallbacks as DX11 had (which is why 3DMark worked on DX10 and DX9 hardware), so if you're developing directly for DX12 you can actually use compatibility fallbacks to support everyone.
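To make that concrete, here's a minimal sketch of what I mean: probing feature levels with D3D12CreateDevice and walking down the list the way DX11-era apps coped with older hardware. This is purely my own illustration of the idea, not code from any engine.

```cpp
// Sketch: find the highest D3D feature level the default adapter supports,
// walking down the list the way DX11-era apps handled older hardware.
// Assumes the Windows 10 SDK; link against d3d12.lib.
#include <d3d12.h>
#include <cstdio>

int main()
{
    const D3D_FEATURE_LEVEL levels[] = {
        D3D_FEATURE_LEVEL_12_1, D3D_FEATURE_LEVEL_12_0,
        D3D_FEATURE_LEVEL_11_1, D3D_FEATURE_LEVEL_11_0,
    };

    for (D3D_FEATURE_LEVEL level : levels) {
        // Passing nullptr for the device pointer just tests support
        // without actually creating a device.
        if (SUCCEEDED(D3D12CreateDevice(nullptr, level,
                                        __uuidof(ID3D12Device), nullptr))) {
            std::printf("Highest supported feature level: 0x%x\n",
                        static_cast<unsigned>(level));
            return 0;
        }
    }
    std::printf("No DX12-capable adapter found - fall back to a DX11 path.\n");
    return 0;
}
```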
Secondly, there's the XBOne factor, as others have mentioned. I'm not clear on whether DX12 on the XBOne is identical to the PC version (I'd assume it isn't, as it should be able to get much closer to the metal), but it should at least be close enough that porting takes minimal effort.
Thirdly, there's the Mantle factor. One of the things AMD pushed with Mantle was that porting from Mantle to the other main APIs was lower effort than porting between them directly - so target Mantle and you'd save yourself work when doing DX10/DX11 and OGL ports. Depending on how closely Vulkan is based on Mantle, I'd assume the same applies there, so potentially we're looking at Mantle/Vulkan-to-DX12 porting being relatively straightforward. At that point porting to XBOne should also be easier, and given the PS4 is sitting on AMD hardware, it's a reasonable assumption that AMD had a hand in its API, which is presumably therefore not too far from Mantle, and therefore Vulkan. It starts looking a lot like most games will be able to use two relatively similar APIs to target all platforms.
Fourthly, there's casual gaming on relatively low powered machines. By removing the CPU bottleneck in draw calls it should be possible to produce far more immersive games on much lower spec hardware, including portable devices which are increasingly moving to many small cores. Better graphics in TDP-constrained environments could be a big draw, and wide-but-slow - which favours DX12 over DX11 - is ideal for low power environments (just look at how much AMD have been able to cram into 15W TDP Carrizo, for instance).
Huh, apparently I'm a bit of a DX12 evangelist....
Are there *any* mainstream games that have been CPU bound on a current system? And of those, any that parallelise well enough across multiple threads to take real advantage of hyperthreading on a quad core i7 versus not having it on an (otherwise equivalent) i5?
Bearing in mind the pipelining issues, stall hazards, etc. that make optimising code for maximum performance under hyperthreading pretty non-trivial, I do wonder whether any game engine developers would bother doing this.
Genuinely interested!
TBH, not much. That is why I went AMD, which for gaming is essentially a quad core.
Have a look at this: http://www.anandtech.com/bench/product/836?vs=1261
Now that is a Haswell i5 and an i7 with the same base and boost clocks, and they trade blows in the game results. But are they boosting by the same amount? The thermal interface changed, so perhaps not - there isn't much in it, though.
For people like me who don't want to overclock (because the machine also has to work for its living rather than just play games), the i7 4790K has higher clocks out of the box than the i5 4690K, which is enough to make it win benchmarks - so the threading doesn't seem to hurt, but clocks are still king. You also get a bit more cache with the i7, which doesn't seem to make a jot of difference either.
So no, I don't think anything out there really needs 8 threads yet, though there are clearly plenty of games that do fully use 4 threads and that is only going to go up.
Interesting. I'd say those slight differences in the game benchmarks are just margin of error from software and testing variance, since the i5 beats the i7 just as often as the reverse.
On what you said about the larger cache not benefiting performance, that does actually make sense from a computer science POV. I'd point out that from my limited understanding, most game engine algorithms consuming CPU time would be operating on objects in memory organised within a BSP tree or similar data structure, with pretty sparse access patterns. They are primarily designed for efficient spatial representation / culling of objects in 3D space, as opposed to an optimal layout in memory with cache-friendly locality of reference that you would aim for if you were writing e.g. a number crunching protein folding algorithm.
I believe that Crystalwell was a direct attempt to remedy the inherently cache-unfriendly nature of typical game engine algorithms and their memory access patterns by including a HUGE 128MB victim cache. It's similar in spirit to the 32MB of on-die ESRAM in the Jaguar-based APU AMD made for the Xbox One, so I think introducing a large L4/victim cache to the i7 would be ideal for desktop gamers. Not to mention that most game engines are designed primarily for consoles these days, so offering the same sort of architecture can only be a good thing performance-wise.
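Just to illustrate the locality point with a toy example (nothing to do with any real engine, and the names here are made up for the sketch): summing values from a contiguous array versus chasing pointers through individually allocated nodes is exactly the difference between cache-friendly and cache-hostile access.

```cpp
// Toy illustration of locality of reference: the same summation done over a
// contiguous array (prefetcher- and cache-friendly) and over a linked list of
// separately allocated nodes (dependent, pointer-chasing loads). In a real
// engine the nodes would be far more scattered than this simple loop produces.
#include <chrono>
#include <cstdio>
#include <memory>
#include <numeric>
#include <vector>

struct Node { int value; Node* next; };

int main()
{
    const int N = 1 << 22;

    // Contiguous layout: neighbouring elements share cache lines.
    std::vector<int> flat(N, 1);

    // Node-based layout: every element is its own heap allocation.
    std::vector<std::unique_ptr<Node>> storage(N);
    Node* head = nullptr;
    for (int i = 0; i < N; ++i) {
        storage[i] = std::make_unique<Node>(Node{1, head});
        head = storage[i].get();
    }

    auto time = [](auto&& fn) {
        auto t0 = std::chrono::steady_clock::now();
        long long result = fn();
        auto t1 = std::chrono::steady_clock::now();
        std::printf("sum=%lld in %lld us\n", result,
            (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    };

    time([&] { return std::accumulate(flat.begin(), flat.end(), 0LL); });
    time([&] {
        long long sum = 0;
        for (Node* n = head; n; n = n->next) sum += n->value;
        return sum;
    });
    return 0;
}
```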
Back to hyperthreading and leaving aside CPU stalls because of cache thrashing: the other thing is that in your average protein folding algorithm or something you're going to be spending every clock cycle doing actual computation work, whereas I wouldn't be surprised if the majority of the instructions executed by the CPU in a typical game are actually spinlocks, (graphics) driver DPCs and interrupts etc. Hyperthreading can only offer a speedup if your CPU is bottlenecking during actual computation.
Indeed. Those CPUs have the same clock speed ratings and are both based on the Haswell design; the i5 has the updated thermal interface and the i7 has more cache as well as HT, but that's as close a match as I can think of. Clock for clock, in gaming, there isn't much in it, but the i7 does clock faster out of the box, and to some of us that matters.
Don't forget that Intel have a rather good data pre-fetcher as well, to try and make sure that in common cases data is pulled into the cache before the CPU tries to access it.
Hyperthreading gets more opportunity than that, though.
On average, every 6th instruction that a CPU executes is a jump. That becomes more and more important as CPUs try to issue more instructions per clock - we're getting to a stage where we jump nearly every cycle, which makes dependent instructions all the more difficult to handle and branch prediction ever more important. Haswell is 4-issue, so you need heavily unrolled loops and a lack of interdependency between instructions to keep the core retiring 4 instructions per clock; in most code there will be enough stalls to feed another thread. IBM find they hit diminishing returns at 8 threads per core with POWER, and they've been running 4 threads successfully for years - two should be pretty clear cut.
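To put a toy example on that (purely illustrative, and the function names are made up): a single dependency chain can't get anywhere near 4 retired instructions per clock, whereas unrolling into independent accumulators gives the out-of-order core something to fill the issue slots with - and when it can't, that slack is exactly what SMT hands to the other thread.

```cpp
// Toy illustration: one long dependency chain vs. four independent
// accumulators. The chained version serialises on each add; the unrolled
// version lets a 4-issue, out-of-order core overlap the adds.
// Build with optimisations but without -ffast-math, so the compiler
// can't reassociate the floating-point adds itself.
#include <chrono>
#include <cstdio>
#include <vector>

// Every addition depends on the previous one - a single dependency chain.
double sum_chained(const std::vector<double>& v)
{
    double s = 0.0;
    for (double x : v) s += x;
    return s;
}

// Four independent chains the core can execute in parallel.
double sum_unrolled(const std::vector<double>& v)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= v.size(); i += 4) {
        s0 += v[i]; s1 += v[i + 1]; s2 += v[i + 2]; s3 += v[i + 3];
    }
    for (; i < v.size(); ++i) s0 += v[i];
    return (s0 + s1) + (s2 + s3);
}

int main()
{
    std::vector<double> v(1 << 24, 1.0);
    for (auto fn : {sum_chained, sum_unrolled}) {
        auto t0 = std::chrono::steady_clock::now();
        double s = fn(v);
        auto t1 = std::chrono::steady_clock::now();
        std::printf("sum=%.0f in %lld us\n", s,
            (long long)std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count());
    }
    return 0;
}
```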
The L4 in Crystalwell is mainly for the integrated GPU; some CPU applications benefit from it, but IIRC it doesn't generally make much difference for discrete-GPU gaming (though having just said that, I'm not certain whether Crystalwell stays active if the iGPU is disabled - I'll have to check).
Hyperthreading doesn't just rely on there being a CPU bottleneck; it depends whereabouts in the processor the bottleneck lies - if an application extracts a lot of ILP and saturates the execution ports with a single thread, hyperthreading may not help much. x264 is a good example here: it's a really well-optimised encoder and scales very well across multiple physical cores (including Bulldozer's multiple integer cores), but SMT makes very little difference to it.
More importantly, DX12 will spread the load over cores much more evenly, which would bring the current AMD 8-cores back into the game - their single-core performance isn't as good, but when all 8 cores are used it works out better than an i5.
Interesting times ahead.
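Roughly the pattern DX12 makes practical, sketched API-agnostically with std::thread - the CommandList/DrawCall types below are stand-ins I've made up, not real D3D12 objects. Each core records its own command list independently and the main thread submits them all at once, instead of every draw funnelling through one thread.

```cpp
// API-agnostic sketch of DX12-style parallel command recording: each worker
// thread fills its own command list, then the main thread submits them in one
// go. The types here are stand-ins, not real D3D12 interfaces, and remainder
// handling is omitted for brevity.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

struct DrawCall { int meshId; };
struct CommandList { std::vector<DrawCall> commands; };

void record_chunk(CommandList& list, int first, int count)
{
    // In a real engine this would be command-list recording;
    // here we just append draw calls to a per-thread list.
    for (int i = 0; i < count; ++i)
        list.commands.push_back({first + i});
}

int main()
{
    const int totalDraws = 100000;
    const unsigned workers = std::max(1u, std::thread::hardware_concurrency());

    std::vector<CommandList> lists(workers);
    std::vector<std::thread> threads;

    const int perWorker = totalDraws / static_cast<int>(workers);
    for (unsigned w = 0; w < workers; ++w)
        threads.emplace_back(record_chunk, std::ref(lists[w]),
                             static_cast<int>(w) * perWorker, perWorker);
    for (auto& t : threads) t.join();

    // Single submission point, analogous to executing all command lists
    // on the queue at once.
    size_t submitted = 0;
    for (const auto& l : lists) submitted += l.commands.size();
    std::printf("Submitted %zu draws recorded across %u threads\n",
                submitted, workers);
    return 0;
}
```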
So why aren't games utilizing hyperthreading? It's been around long enough.
Put simply, software support lags a long way behind hardware support, especially with the various levels of software dependency. A game may want to use 4 cores, but in order to do so properly DirectX also needs to handle it properly. AMD were quite smart developing Mantle - it forced Microsoft to take a stand and actually update DirectX, or face people moving to Linux+Mantle if that developed sufficiently. And with AMD CPUs traditionally having more but slower cores, it should work out quite well for them.
As I explained before, games don't 'use hyperthreading' at all. A CPU with four physical cores is presented to applications as eight logical processors, and it's mostly down to the kernel's CPU scheduler to decide where to run application threads. For the most part, applications such as games aren't even aware of hyperthreading; as far as they're concerned there are simply eight logical CPUs available.
In order to 'use hyperthreading', a number of conditions must be met as I also briefly explained earlier. To really benefit from it you need to have more threads than physical CPU cores - in general threads will run more quickly when assigned to their own physical cores so there's no competition for resources. A negative example of this is some games (and other applications besides) performing worse and/or stuttering on i7 compared to i5. In addition to this, not all workloads will really benefit from SMT depending on resources demanded by concurrent threads - SMT can improve multithreaded performance by increasing resource utilisation in each physical core but this isn't always possible.
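To show what I mean about the application's view, here's a little Windows-only sketch (nothing beyond standard Win32 and the C++ standard library): the program just sees a count of logical processors, and finding out whether two of them share a physical core is something you have to go out of your way to ask - the scheduler deals with it either way.

```cpp
// Sketch (Windows-only): what an application actually "sees". The standard
// library reports logical processors; distinguishing physical cores takes an
// explicit Win32 query that most games never bother with.
#include <windows.h>
#include <cstdio>
#include <thread>
#include <vector>

int main()
{
    // What a game typically looks at: the logical processor count
    // (8 on a hyperthreaded quad-core i7, 4 on the equivalent i5).
    unsigned logical = std::thread::hardware_concurrency();

    // Counting physical cores means asking Windows explicitly.
    DWORD bytes = 0;
    GetLogicalProcessorInformation(nullptr, &bytes);
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        bytes / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    GetLogicalProcessorInformation(info.data(), &bytes);

    unsigned physical = 0;
    for (const auto& entry : info)
        if (entry.Relationship == RelationProcessorCore)
            ++physical;

    std::printf("%u logical processors, %u physical cores\n", logical, physical);
    return 0;
}
```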
Off on a tangent, but a similar issue crops up with multi-CPU and even single-socket Xeon systems - MOAR isn't always better for many applications. For example, many games will perform considerably better on a 4790K than on a massively more expensive socket 2011 system. Multi-socket setups can take this even further: unless applications are NUMA-aware, some will perform woefully on a multi-socket system, often far worse than on a single CPU, because of (among other things) memory access times between sockets.
Thanks all for the extensive replies. I guess Microsoft is flogging a dead horse with gaming, at least.
From looking at DX12, the i7 should make gains in performance, since DX12 addresses the issue of work being piled onto a single core.
I don't think the price is justified if you're only gaming on your system, but you may benefit greatly if you're multitasking heavily, e.g. across dual monitors.