NVIDIA spills the beans on Fermi. Good enough to take down the Radeon HD 5870? We take a first look at the architecture.
They seem incredibly reluctant to even talk about gaming, let alone give realistic hints about performance. It's almost like they are counting on HPC sales to grow exponentially to make up for a poor forecast on gaming profit.
I think AMD will be satisfied.
It's very hard to see how, from a gaming perspective, nVidia will be able to match ATI on a price per performance basis. I hope I'm wrong as the price of 5870s could stay high for quite some time if this does come to pass.
They have more than doubled the SP count, 512 is more than a GTX295, so this should be much faster given the inevitable tweaks under the hood...
I reckon it'll compete with a 5870 fairly evenly, and for me the value proposition of NVidia with CUDA, PhysX, the potential for flash acceleration etc is better than ATI (guess it depends on your viewpoint though) so assuming the power/efficiency are good I'm looking forward to this, going to be hard to choose...
Happily I can afford to wait - my current card is fast enough, and DirectX 11 brings speed improvements for all cards this time round, so...
I think it will be a lot of use because there are loads of programmers out there who prefer to program in Python or Java, and don't like C. It is also a lot quicker to write useful programs in high level languages, than in C.
Suppose you have an existing program written in Java. It currently takes an hour to run, and because it gets run a great deal you have a business need for it to run faster.
You could re-write the time critical sections in C, which will make the program about 50% faster (40 minutes), but to do so you would need to learn C, and the resultant code would be more bug prone.
Alternatively you could ask your boss for £1000 for an NVIDIA CUDA card that will run the code 100 times faster (about 36 seconds), with only minor tweaks to the code in a language you are already familiar with.
Even if the program is not yet written, it is often still better to write in a high level language than a low level one as development will be faster. If that last bit of performance is still needed then the critical sections can still be re-written in C, but most of the time the 100x speedup from using CUDA will be good enough.
Exactly.
Originally Posted by RealWorldTechnologies
For anyone interested in the architecture to a greater degree, NVIDIA's released a whitepaper to the press a few days ago. It's now on the site, so read away (PDF).
http://www.nvidia.com/content/PDF/fe...Whitepaper.pdf
Not only that, but surely to make effective use of a GPU with that many stream processors your code would already have to be written to be massively multithreaded. Having done an MSc which taught Java as its principal language, and therefore knowing the coding skills of many professional Java developers, the concept of them trying to develop a massively-multithreaded software architecture to take advantage of this leaves me shivering in terror...
I don't think the language you write in is that big a deal if it comes with a decent library for this sort of stuff, or a good compiler (the world needs more compiler writers). The problem is, most workloads just aren't written to do SIMD. OK, so new CUDA can run multiple kernels, but I doubt you can run as many kernels as you have streams (I guess I should read the whitepaper!).
Originally Posted by chrestomanci
If you want to make a GPGPU run fast, you need to take a lot of data, chop it up, and apply the same operations to each chunk - which is why you can dunk it through something massively parallel.
As soon as the operations you need to perform vary between each chunk (e.g. you have branches) the whole thing breaks down. Now, assuming you've got data that lends itself to parallel processing, there are ways of dealing with conditionals that don't involve branching.
Indeed, the reason GPUs have turned into the parallelised beasts that they are, is that graphics shaders and the data they work on are perfect for such situations.
There are a lot of workloads that can have multiple things happening at once, but that's not the same as doing the same thing to lots of data elements at once, which is why we don't have 512-core CPUs (yet...).
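To make the point about conditionals a bit more concrete, here is a minimal sketch in plain Python. It only illustrates the idea of replacing a branch with a 0/1 mask so that every element goes through the same arithmetic; it is not how CUDA code is actually written, and the function names (`clamp_branching`, `clamp_branchless`) are made up for the example.

```python
def clamp_branching(x, limit):
    # Conventional version: different elements take different code paths.
    if x > limit:
        return limit
    return x


def clamp_branchless(x, limit):
    # Every element runs the same instructions; the comparison result
    # (0 or 1) is used as a mask to select between the two candidates.
    over = int(x > limit)
    return over * limit + (1 - over) * x


# Hypothetical input: the same operation applied to each chunk of data.
data = list(range(10))
limit = 6

branching = [clamp_branching(x, limit) for x in data]
branchless = [clamp_branchless(x, limit) for x in data]
```

Both lists come out the same; the second form is the kind of thing that maps well onto hardware doing one operation across lots of data elements at once.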
Yes, they also increased the bus to 300bit and added more onboard RAM. This all adds up to a huge expense, so it will probably offer better performance than the 5870 but will also cost a HUGE amount more.
edit: Also, given the amount of R&D that's gone into this project, it won't be healthy for Nvidia to compete with AMD's 5000 series (a lot less R&D cost, etc.) on price: either they sell at a loss or they sell at a huge price that people won't pay. Although it's the start of what could be an amazing platform/design, it's just going to be a profitless technology unless some serious cost cutting and development can be done.
Doesn't matter anyway; by the time Fermi is out, AMD will already have its 6000 series out, or only a few months away, I reckon.
A 384bit memory bus is actually narrower than the GTX285's, which used a 512bit bus, so there will be a small cost saving from using fewer memory chips. It does mean it'll end up shipping with one of those odd-sounding memory sizes though - 1.5GB most likely (I can't see them bothering with a 768MB version of the top end card).
Of course, ATI only use a 256bit bus, so unless Nvidia run their GDDR5 at < 3200 effective they're going to have more memory bandwidth on tap. On the other hand, it's debatable whether ATI's top end cards are bandwidth limited anyway, and therefore whether that extra bandwidth will boost performance at all...
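For anyone wondering where the 3200 figure comes from, a rough sketch of the arithmetic in Python: peak bandwidth is bus width in bytes times the effective data rate. The 4800 effective rate used for the HD 5870 is an assumption on my part (it isn't stated in this thread), and the function name is just for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_mt_s):
    # Bytes moved per transfer (bus width / 8) times millions of
    # transfers per second gives MB/s; divide by 1000 for GB/s.
    return bus_width_bits / 8 * effective_mt_s / 1000


# Assumed figure, not from the thread: HD 5870 memory at 4800 effective.
ati_256bit = peak_bandwidth_gb_s(256, 4800)

# A 384bit bus at the 3200 effective break-even point mentioned above.
nvidia_384bit = peak_bandwidth_gb_s(384, 3200)
```

Under that assumption both work out to the same peak figure, so anything above 3200 effective on a 384bit bus would mean more bandwidth than the 256bit card.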
Hicks12 (18-01-2010)
Sorry, I haven't looked much into Nvidia's top end cards; I've only focused on the GTX 260 :L.
Thanks for informing me of that. However, the rest is still right, isn't it? R&D needs to be recouped from somewhere and it's going to be in the price of the cards. AMD just tweaked their design and added more, which is great (it shows), and in the end that costs a lot less than changing the whole design!
We will only know whether the bandwidth helps once Fermi is released; I'm betting on a Q2 release now, tbh.