Optimising i7 970 + GTX 480 SLI

**Tattysnuc** · 12-02-2011, 07:28 PM

Hi,

So that the details of what I've done don't get buried deep in a chatty thread, here's my problem.

When running the F@H SMP client with the BIGADV flag set, it utilises all 12 threads (6 real, 6 hyperthreaded).

The NVIDIA GTX 480 uses CPU cycles to operate (so much for running the GPGPU without a CPU, eh NVIDIA? I wonder if that's a Windows or a CUDA requirement? And I wonder what the hell the Chinese are running their Tesla supercomputer on...)

My reading has found that the most common recommendation is to allocate half a thread to each NVIDIA client. However, that isn't possible to set up directly in practice, so I thought I'd share how I've done it.

With my NVIDIA Control Panel set up thus:

Opening Task Manager and finding the binary being executed by the GPU client is straightforward. I have 2 GPUs, so I just looked for the process that was duplicated and consuming the most resources: FAHCORE_15.exe *32 in my case (the *32 suffix means it's a 32-bit process).

My understanding is that the main systray process is called Folding@home.exe, and this launches the latest, most appropriate CUDA software. Norton quarantined my FAHCORE_15.exe when it was first downloaded, so make sure your virus checker doesn't interfere: restore the file and add it to the trusted list as an exception.

Once FAHCORE_15.exe is running, if you right-click on the process and select "Set affinity", Task Manager reports:

Hmmmm. What do we do?

I noticed that if we try this on the CPU client, we CAN change the affinity. A little light reading shows that F@H on the GPU uses the core with the most free cycles available. Time to test a theory?

Looking at the Resource Monitor (opened from the Performance tab of Task Manager), it can be seen that FAHCORE_15.exe is indeed changing cores: one copy executes on core 3, then the other copy on core 2, then core 3... It appears to be moving dynamically. Good!

On the CPU-hosted executable, I right-click, set the affinity to all but one of my cores, and then close Task Manager.
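Task Manager's "Set affinity" checkboxes map onto a bitmask in which bit i stands for logical processor i. The hypothetical helpers below (not part of F@H or any Windows tool) compute the mask for "all but the last N logical processors" of the i7 970's 12 threads, assumed to be numbered 0 to 11:

```python
def affinity_mask(logical_cpus: int, reserved: int = 1) -> int:
    """Bitmask enabling logical CPUs 0 .. logical_cpus - reserved - 1.

    Bit i set means the process may run on logical processor i, which is how
    Windows represents a process affinity mask. The highest-numbered CPUs are
    the ones left free (here, for the GPU clients).
    """
    if not 0 <= reserved < logical_cpus:
        raise ValueError("reserved must leave at least one CPU enabled")
    return (1 << (logical_cpus - reserved)) - 1


def enabled_cpus(mask: int) -> list[int]:
    """List the logical processors whose bit is set in the mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# i7 970: 6 cores / 12 threads, one logical CPU left free for the GPU clients
mask = affinity_mask(12, reserved=1)
print(hex(mask))                # 0x7ff
print(len(enabled_cpus(mask)))  # 11
```

On many Windows systems the two hyperthreads of physical core k appear as logical processors 2k and 2k+1, so reserving only CPU 11 leaves its sibling (CPU 10) busy; under that assumption, reserved=2 frees a whole physical core. Check the numbering on your own machine before relying on it.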

Now let's check the results:

Before: time per fold with 2 GPU clients and BIGADV on 12 threads: 46 mins and upwards (sometimes 1.5 hrs)
Time per fold with no GPU client: 23 mins or thereabouts
Time per fold with 2 GPU clients and BIGADV on 11 threads: 26 mins so far...

That's a considerable improvement if it is sustained....

Will confirm stats and observations over a longer period as they come in...

#UPDATE# Time per fold has now averaged out at 39 mins.

I'll calculate the relationship to the number of cores next and try to find the optimal setup for the rig.

**Queelis** · 12-02-2011, 08:03 PM

Very nice article, almost a kind of a tutorial

And it is a good idea to leave the GPU clients on a core that is not folding, since the SMP client folds only as fast as its slowest core. That's why I run my folding with a "-smp 3" flag when I'm gaming; it still gets a bit more than half the PPD it gets when the CPU is idle.
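The "only as fast as the slowest core" point can be illustrated with a toy model. This sketch is a simplification, not how the FahCore actually schedules work: it assumes each step's work is split evenly across threads, every thread must finish before the next step starts (a barrier), and all logical CPUs run at the same speed except one shared with a GPU client. The 50% share for that CPU is an arbitrary illustrative value.

```python
def smp_step_time(work: float, speeds: list[float]) -> float:
    """Time for one synchronised step: work is split evenly across threads
    and the step only ends when the slowest thread finishes (a barrier)."""
    per_thread = work / len(speeds)
    return max(per_thread / s for s in speeds)


# 12 threads, one of which shares its logical CPU 50/50 with a GPU client
shared = smp_step_time(12.0, [1.0] * 11 + [0.5])
# 11 threads on the 11 logical CPUs the GPU clients leave alone
reserved = smp_step_time(12.0, [1.0] * 11)
print(shared, round(reserved, 3))  # 2.0 1.091
```

Under these assumptions a single half-speed thread doubles the step time, while dropping to 11 threads costs only about 9%, which points the same way as the 46 min vs 26 min timings in the first post.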

**Tattysnuc** · 13-02-2011, 10:52 AM

I'm trying a simpler approach, by adding -smp 11 to the command line.

When it finishes the work unit and submits it, Windows doesn't carry the affinity I set over to the next work unit, so it goes back to 12-thread processing. Setting the affinity is only effective when you have already started processing and don't want to lose ANY processing time by closing and re-opening the client.

I'll report back when I have an update...

**Queelis** · 13-02-2011, 03:55 PM

So you mean that I can untick one of the cores in the SMP client's affinity and don't need to restart it with the "-smp 3" flag? I could certainly see a use for this! Many thanks

Edit: just tried this, works beautifully! \o/

**Tattysnuc** · 13-02-2011, 06:00 PM

Thanks Queelis. Had a hunch you'd appreciate something that mentions or skirts around efficiency.

I've noticed that with 11 cores out of 12, the time per fold has gradually crept back up to 42 mins, so I disabled a second core, effectively running -smp 10. Time per fold has since stabilised at 37 mins, and the GTX 480s have gone back to their max of 16k+ PPD per card.

It may be the work units I'm on, so I'll update again once I have a conclusion as opposed to an ongoing observation.

**scaryjim** · 13-02-2011, 06:17 PM

There are cunning ways of setting affinity in shortcuts on Win7, and there's also a .NET API for determining and setting affinity on processes (I'm still pondering whether I have the time and inclination to write such a program in my spare time; programming full time at work kind of zonks me out for thinking about that kind of thing outside of work!). But yeah, since the folding cores start and stop as work units complete, setting the affinity only works for the current unit, and each new unit will reset to the default of running on all available cores.

SMP -x will run that number of *threads*, but my experience is that if you don't set the affinity they wander from core to core and you don't necessarily get optimal use out of your CPU. Haven't had any time to do in-depth explorations of the difference in ppd between just using -smp x and locking the affinity for the folding process. My gut feeling is that without locking the affinity you'll get slightly lower PPD than locking the affinity to the same number of cores as threads. Certainly using affinity on the SMP client will benefit the GPU client as it will guarantee a completely free core (or more) for the GPU client(s).

Oh, and be aware that running -smp 11 with the SMP client affinity locked to 10 cores will be less efficient, as it's trying to run 11 threads on 10 cores; ideally you need both numbers (-smp x and the number of cores) to be the same. Alternatively, you could run -smp 10 and 2 GPU clients and let the OS handle the thread/core allocation.
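The shortcut trick can be sketched as follows. Windows' `start` command accepts `/affinity` followed by a hexadecimal mask, and affinity is normally inherited by the processes a program launches, so new FahCore processes should pick up the mask after each work unit (worth confirming in Task Manager). The helper, the `C:\FAH\fah_smp.exe` path and the default `-bigadv` flag are hypothetical placeholders; the helper keeps the -smp thread count equal to the number of CPUs in the mask, as recommended above.

```python
def shortcut_target(client_exe: str, smp_threads: int, first_cpu: int = 0,
                    extra_flags: str = "-bigadv") -> str:
    """Build a shortcut target that launches the SMP client pinned to exactly
    smp_threads logical CPUs, starting at first_cpu (hypothetical helper)."""
    if smp_threads < 1 or first_cpu < 0:
        raise ValueError("need at least one thread and a non-negative first CPU")
    # One bit per logical CPU, so the CPU count always matches -smp.
    mask = ((1 << smp_threads) - 1) << first_cpu
    # start treats its first quoted argument as the window title, hence "".
    command = (f'cmd.exe /c start "" /affinity {mask:X} '
               f'"{client_exe}" -smp {smp_threads} {extra_flags}')
    return command.rstrip()


print(shortcut_target(r"C:\FAH\fah_smp.exe", 10))
# cmd.exe /c start "" /affinity 3FF "C:\FAH\fah_smp.exe" -smp 10 -bigadv
```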

Let us know how you get on, Tatty. Perhaps if there's enough of a difference between -smp x and -smp x + affinity mask, it'll motivate me to write some software to manage that.

**Tattysnuc** · 13-02-2011, 11:46 PM

I've overclocked my CPU to 4.41GHz and disabled the GTX 480s.

HFM is currently predicting 107k PPD, which is 2k more than my machine gets when it's also folding on the dual GTX 480s.

Think I'll just leave it folding on CPU...... Save my leccy!

**Tattysnuc** · 13-02-2011, 11:46 PM

Time per fold.... 22 mins sustained over 6 hours

**Tattysnuc** · 14-02-2011, 10:19 AM

Checked this morning and I'd had a BSOD at 6am, so I'd managed to fold from 10:30 right through to 6am before I got a problem. Must look up the error, because it was an odd one that I'd not seen before... something about a core waiting for a signal from the secondary processor...

I checked the PPD rate when I rebooted and down-clocked from 4.41 to 4.2GHz, and my rate WITHOUT GPUs was 60k+ for that client alone.

I've left the client running today at 4.2GHz without any GPUs processing to see if that is in fact accurate (are the bonus points staggered so that the quicker the return, the higher the bonus?)

My estimated PPD is currently 117k, so if this is the case, I shall be overclocking the CPU client on my server more (currently 3.6GHz) and fast-tracking the move to the new case. After all, if I can get 60k out of the CPU alone, or 30k + 2x15k, then I'll go for CPU-only folding, as that doesn't involve the leccy for my GTX 295. May as well even sell it and stick in an air-cooled card for the amount of use it'll get!

**Golden Dragoon** · 14-02-2011, 11:43 AM

Yes, bonus points are closely related to the time it takes to complete the WU.
It used to be that they were limited to 10x the base points as a bonus, but they removed that restriction when they started putting out different bigadv units that some CPUs could fold a lot faster and hit the 10x limit, so as far as I know there is no limit to the bonus multiplier these days.

The formula used is apparently this:

Code:

final_points = base_points * max(1,sqrt(k*deadline_length/elapsed_time))
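A minimal Python sketch of that formula, plus a conversion from time per fold to PPD. It assumes "time per fold" is HFM's time per frame, i.e. 1% of a work unit (so a WU takes 100 frames), and that deadline and elapsed time use the same units; k is the per-project bonus constant. The base points, k and deadline in the example are made-up illustrative values, not real project data.

```python
import math

SECONDS_PER_DAY = 86_400


def wu_credit(base_points: float, k: float, deadline: float, elapsed: float) -> float:
    """Quick-return bonus: credit never drops below the base points."""
    return base_points * max(1.0, math.sqrt(k * deadline / elapsed))


def ppd_from_tpf(base_points: float, k: float, deadline_s: float,
                 tpf_s: float, frames: int = 100) -> float:
    """Points per day if every WU takes `frames` frames of `tpf_s` seconds."""
    elapsed_s = tpf_s * frames
    wus_per_day = SECONDS_PER_DAY / elapsed_s
    return wu_credit(base_points, k, deadline_s, elapsed_s) * wus_per_day


# Hypothetical project: 8,000 base points, k = 2, 6-day deadline.
fast = ppd_from_tpf(8000, 2, 6 * SECONDS_PER_DAY, tpf_s=23 * 60)
slow = ppd_from_tpf(8000, 2, 6 * SECONDS_PER_DAY, tpf_s=46 * 60)
print(round(fast / slow, 2))  # 2.83
```

While the bonus applies, PPD scales with (time per fold)^-1.5, so halving the time per fold (46 to 23 mins) nearly triples PPD rather than doubling it, which helps explain why the CPU-only results hold up so well.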

Though you may find it more convenient to use this:
http://www.linuxforge.net/bonuscalc.php

**Tattysnuc** · 14-02-2011, 12:42 PM

Thanks mate. That's a nice little nugget of information.

I've been looking at the "Clock Interrupt not received from Secondary Processor" error that I get when overclocking to 4.41GHz, and surprisingly enough I found an excellent article on eVGA's forums detailing the voltages to tweak in response to specific BSODs. It tallies with what I've experienced, so I thought I'd copy and paste it here for any other budding overclockers, and also so that, should the thread get deleted or moved, the information remains.

Here's a link to the original article

The article itself...

VCore (default: 1.28125v, Intel's max 1.375v, VCore over 1.50v on air cooling is risky)
What it does:

Sets max voltage to the CPU cores (if VDroop is disabled, it sets the min voltage instead). The i7 doesn't need much voltage at speeds under 3.8GHz (for example, I can get 3.8GHz on 1.275V VCore). Beyond that, the voltage requirements climb sharply.

When to raise VCore:

* BSOD 101 "Clock Interrupt not received from Secondary Processor"
* LinX produces errors that are very
* LinX errors happen within 1 minute of starting LinX
* LinX produces BSOD within the first minute

You know VCore is too high when:

* CPU cores approach a peak of 85c on full load
* It is unknown how higher voltages may impact the life of the CPU

CPU VTT Voltage (default: 1.1V (+0mV in BIOS); Intel's max: 1.35V (+250mV))
What it does:

VTT connects the cores with the memory. Raising VTT helps keep a system stable at a higher QPI rate. Since QPI is calculated from bclk, the higher the bclk, the more VTT voltage you will need. VTT is also called "QPI/DRAM Core" on other motherboards.

Prevent CPU damage: VTT voltage must be within 0.5V of VDimm. VDimm can fluctuate by as much as 0.05V from its setting, so you may want VTT within 0.45V of VDimm for an extra margin of safety. Example: if VDimm is 1.65V, then VTT must be at least 1.20V.
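These VTT rules can be checked with a couple of hypothetical helpers. The 0.45V default margin and the 1.1V (+0mV) baseline come from the article; the function names are illustrative only.

```python
DEFAULT_VTT = 1.10  # volts, shown as +0mV in BIOS according to the article


def min_safe_vtt(vdimm: float, margin: float = 0.45) -> float:
    """Lowest VTT that keeps VDimm within `margin` volts of VTT.

    0.5V is the hard limit; 0.45V leaves headroom for VDimm drifting
    up to 0.05V above its setting.
    """
    return round(vdimm - margin, 3)


def vtt_offset_mv(vtt: float) -> int:
    """BIOS offset in millivolts for an absolute VTT, relative to 1.1V."""
    return round((vtt - DEFAULT_VTT) * 1000)


print(min_safe_vtt(1.65))       # 1.2
print(min_safe_vtt(1.65, 0.5))  # 1.15
print(vtt_offset_mv(1.20))      # 100
```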

When to raise CPU VTT Voltage:

* BSOD 124 "general hardware failure"
* LinX errors happen only after 10 min or more
* LinX hangs but does not BSOD
* LinX reboots without BSOD

You know CPU VTT Voltage is too high when:

* Most users try to stay below 1.45V (+350mV) for 24/7 use without additional direct cooling.
* The motherboard doesn't read the temp so you may need an IR thermometer to be sure you are not pushing VTT too far.

CPU PLL VCore (default: 1.800V, spec range: 1.65V-1.89V)
What it does:

Keeps CPU clock in-sync with bclk.

When to raise CPU PLL VCore:

* May help with stability while increasing the bclk or CPU multiplier (or may make it worse).
* May help with stability past 210 bclk if you observe that during runtime the QPI Link (found in E-Leet) bounces too much.
* Not commonly raised, and may actually cause instability. Test this variable alone.

You know CPU PLL VCore is too high when:

* It's possible you could actually gain stability by lowering this.

DIMM Voltage (default: 1.5V, Intel's max 1.65V)
What it does:

Voltage to the RAM. Despite Intel's warnings, you can raise voltage beyond 1.65V as long as it always stays within 0.5V of VTT (as described above).

When to raise DIMM Voltage:

* High performance/gaming RAM usually requires at least 1.65v to run at spec. Some manage to get it slightly lower.
* Stable bclks over 180 often require VDIMM beyond 1.65V. Remember to keep VTT voltage within 0.5V of VDIMM.

You know DIMM Voltage is too high when:

* Memory is too hot. [more info on this is needed]

DIMM DQ Vref (default: +0mV)
What it does:

It is the reference voltage for a pseudo-differential transmission line. The DQ signals sent by the memory controller on the i7 should swing between logic-hi and logic-lo voltages centered around VREF. VREF is typically half way between the drain and source voltages on the RAM. Most VREF generator circuits are designed to center between the VDD and VSS voltages on the RAM. There is usually temperature compensation built into the circuitry as well.

When to raise DIMM DQ Vref:

* Vref might be adjusted if (after measurement) it was determined not to be properly centered between VDD and VSS of the DIMM. Without a good oscilloscope, it's difficult to imagine that most users could set VREF correctly. They may be able to set VREF empirically by moving it up or down and checking for POST or BSOD problems.

Further reading:

http://download.micron.com/pdf/techn...dr2/TN4723.pdf The document is for DDR2 but differential signaling is a topic that transcends memory models. It has been done for decades in high-end systems and the advantages/drawbacks are well understood.

QPI PLL VCore (default: 1.1v, <1.4v is pretty safe)
What it does:

Keeps on-chip memory controller in-sync with bclk.

When to raise QPI PLL VCore:

* Try raising this along with Vcore and VTT, but in smaller increments.
* Helps stabilize higher CPU Uncore frequencies and QPI frequencies (in CPU feature)
* Try raising this when you increase memory clock speed via multiplier.
* Try raising when LinX produces errors after a few minutes without BSOD

IOH Vcore (default: 1.1V)
What it does:

Sets voltage for the north bridge (the X58 IOH chip), which connects PCIe 2.0, the GPU and the CPU.

When to raise IOH VCore:

* Possibly needed if you overclock your north bridge (via bclk and CPU Uncore freq.)

You know IOH VCore is too high when:

* Memory errors? (just a guess)
* GPU intensive apps like 3dmark vantage crash. (another guess)

IOH/ICH I/O Voltage (default: 1.5V)
What it does:

some sort of on-chip bus voltage. unknown

ICH Vcore (default: 1.05V)
What it does:

South Bridge chip on the motherboard. Connects all motherboard features, cards (not PCIE2.0), and drives to CPU/memory on IOH

When to raise ICH Vcore:

* I don't know if raising this can help in overclocking at all. Possibly necessary in order to keep up with an overclocked northbridge?

You know ICH Vcore is too high when:

* unknown. I wouldn't overvolt it too much though.

PWM Frequency (default: 800)
What it does:

unknown

When to raise PWM Frequency:

* Overclocking beyond 4.2GHz

You know PWM Frequency is too high when:

* VREG approaches 85c

VDroop (default: enabled)
What it does:

Safety feature designed by Intel to protect the chip from excessive wear from voltage spikes. Enabling VDroop keeps actual voltage running below the VCore setting in BIOS

What does disabling VDroop do?

* Makes VCore setting the minimum value for actual voltage; CPU will run at higher voltages than what you set in BIOS.
* Disabling VDroop is the same as enabling Load Line Calibration on other X58 boards.

Why would I want to disable VDroop?

* Some overclockers use it because it allows them to get a high overclock while setting lower VCore in BIOS. This is because the running voltage is actually higher than what was set in BIOS. Disabling VDroop keeps actual voltage higher than what is set for VCore in the BIOS. Enabling Vdroop keeps actual voltage lower than VCore.
* It might help if you are pushing the bleeding edge.

Diagnosing errors. What to do when...

BSODs

* BSOD "IRQL_NOT_LESS_OR_EQUAL" (I forget which setting)
* BSOD 101 "Clock Interrupt not received from Secondary Processor" Try raising VCore
* BSOD 124 "general hardware failure" Try raising VTT
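The "101" and "124" above are hexadecimal stop codes: 0x101 is CLOCK_WATCHDOG_TIMEOUT and 0x124 is WHEA_UNCORRECTABLE_ERROR. A small lookup table, a hypothetical helper encoding only the article's rules of thumb, might look like this:

```python
# Stop codes as shown on the blue screen (hexadecimal), with the article's
# first suggestion. These are rules of thumb, not a diagnosis.
BSOD_HINTS = {
    0x0A: ("IRQL_NOT_LESS_OR_EQUAL", None),  # the article gives no advice
    0x101: ("CLOCK_WATCHDOG_TIMEOUT", "raise VCore"),
    0x124: ("WHEA_UNCORRECTABLE_ERROR", "raise VTT"),
}


def suggest(stop_code: int) -> str:
    """Return the stop code's name and the article's suggested first tweak."""
    name, hint = BSOD_HINTS.get(stop_code, (f"0x{stop_code:X}", None))
    return f"{name}: {hint or 'no rule of thumb in the article'}"


print(suggest(0x101))  # CLOCK_WATCHDOG_TIMEOUT: raise VCore
```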

LinX Errors
If you get an error you would have x same (correct) results and 1 different (an error):

* If the incorrect result differs slightly from the rest (numbers very close, same powers in Residual & Residual (norm)) it is most likely that there's not enough vcore. In this case only a small vcore bump is usually needed to stabilize the system (alternatively, Vtt & GTL tweaking can sometimes fix this too)
* If the wrong result differs much from the others (different power or even positive power in Residual or Residual (norm)) it might be 1) insufficient vcore (the error would happen at the very first runs then) or 2) some memory / NB instability (when it worked for say 10 minutes ok and then produced a different result)
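The residual comparison above can be written down as a rough classifier. This is a hypothetical sketch: it assumes you have copied the (non-zero) Residual values from each LinX run, treats the most common value as the correct result, and compares decimal exponents ("powers") as the article describes.

```python
import math
from collections import Counter


def exponent(x: float) -> int:
    """Decimal exponent as shown in scientific notation, e.g. 3.2e-10 -> -10.
    x must be non-zero."""
    return math.floor(math.log10(abs(x)))


def classify(residuals: list[float]) -> str:
    """Compare each run's residual with the most common (assumed correct) one."""
    consensus, _ = Counter(residuals).most_common(1)[0]
    odd = [r for r in residuals if r != consensus]
    if not odd:
        return "all runs agree"
    if all(exponent(r) == exponent(consensus) for r in odd):
        return "slight difference: probably a small VCore bump"
    return "large difference: low VCore (if early) or memory/NB instability (if later)"


runs = [4.163e-10] * 4 + [4.171e-10]
print(classify(runs))  # slight difference: probably a small VCore bump
```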

More serious LinX errors:

* BSOD during testing (at the very first runs) is often caused by too low vcore
* System hangs and remains hung: almost 100% not a CPU issue, but memory or possibly NB
* System reboots (with no hang prior to reboot and no BSOD) - a CPU issue, but not vcore related (insufficient PLL or Vtt I guess)
* System hangs for a short while and then BSODs - once again NB or memory problem (but might be wrong Vtt / GTL setting as well)
* System hangs and then just reboots - wrong Vtt (too low or too high) or GTL settings

**kalniel** · 14-02-2011, 01:10 PM

Nice collection. When I was playing with undervolting IOH/ICH voltages to reduce northbridge temps (was causing audio stutter if temp too high) I found that stability on resuming from standby was affected, so if that's ever an issue then you could bump these up a touch.

**Tattysnuc** · 15-02-2011, 11:20 AM

##Update##

CPU folding alone is showing today at 57200 ppd. Going to capture all the different permutations of my rig and publish the results. Will record over a 1 week period

Firstly: CPU only
Secondly: CPU + 1 GTX 480
Thirdly: CPU + 2 GTX 480
Fourthly: 2 GTX 480
Then I'll go through the core permutations, dedicating a physical core and then a hyperthreaded core, repeating steps 2 & 3.

**Tattysnuc** · 15-02-2011, 01:27 PM

If anyone else is wondering, VTT voltage appears as QPI/DRAM Core Voltage on Asus X58 boards.

**Union** · 15-02-2011, 01:34 PM

I would be interested to see how things pan out with this.

They really need to update the GPU client to push all the processing over to the GPU instead of still requiring some CPU resources.

**Queelis** · 19-02-2011, 08:44 AM

Update on setting CPU affinities: I'd previously only checked how setting the affinity to one less core affected the CPU load, and it was just as expected, down to 75%.

But trying to actually game with such settings didn't go so well. In Counter-Strike: Source, which usually has no problem running on my PC (I have the FPS limited to 120), I got heavy FPS drops which made the game almost unplayable. So I still had to close the client and run it with a "-smp 3" parameter, and only then was there no lag.

Pity, this would've been very convenient not to lose any folding time.
