Which is faster, 9x333 or 8x375 on a q6600? Results inside

**graysky** · 09-06-2007, 10:37 AM

What is a better overclock?

Good question. Most people believe that a higher FSB and lower multiplier are better since this maximizes the bandwidth on the FSB. Or is a low bus rate and higher multiplier better? Or is there no difference? I looked at three different settings on my Q6600:

9x333 = 3.0 GHz (DRAM was 667 MHz)
8x375 = 3.0 GHz (DRAM was 750 MHz)
7x428 = 3.0 GHz (DRAM was 856 MHz)

The DRAM:CPU ratio was 1:1 for each test and the voltage and timings were held constant; voltage was 2.25V and timings were 4-4-4-12-4-20-10-10-10-11.

After the same experiments, at each of these settings, I concluded that there is no difference for real world applications. If you use a synthetic benchmark, like Sandra, you will see faster memory reads/writes, etc. with the higher FSB values -- so what. These high FSB settings are great if all you do with your machine is run synthetic benchmarks. But the higher FSB values come at the cost of higher voltages for the board which equate to higher temps.

I think that FSB bandwidth is simply not the bottle neck in a modern system... at least when starting at 333. Perhaps you would see a difference if starting slower. In other words, a 333 MHz FSB quad pumped to 1333 MHz is more than sufficient for today’s applications; when I increased it to 375 MHz (1500 MHz quad pumped) I saw no real-world change; same result when I pushed it up to 428 MHz (1712 MHz quad pumped). Don’t believe me? Read this thread wherein x264.exe (a video encoder) is used at different FSB and multiplier values. Have a close look at the 3rd table in that thread and note the FPS (frames per second) numbers are nearly identical for a chip clocked at the same clockrate with different FSB speeds. This was found to be true of C2Q as well as C2D chips.

You can do a similar test for yourself with applications you commonly use on your machine. Time them with a stop watch if the application doesn’t report its own benchmarks like x264 does.

Some "Real-World" Application Based Tests

Three different 3.0 GHz settings on a Q6600 system were tested with some apps including: lameenc, super pi, x264, winrar, and the trial version of photoshop. Here are the details:

Test O/C 1: 9x333 = 3.0 GHz

Test O/C 2: 8x375 = 3.0 GHz

Test O/C 3: 7x428 = 3.0 GHz

Result: I could not measure a difference between a FSB of 333 MHz, 375 MHz, or 428 MHz using these application based, "real-world" benchmarks.

Since 428 MHz is about 28 % faster than 333 MHz, you’d think that if the FSB was indeed the bottle neck, the higher values would have given faster results. I believe that the bottleneck for most apps is the hard drive.

Description of Experiments and Raw Data

Lame version 3.97 – Encoded the same test file (about 60 MB wav) with these commandline options:

Code:

lame -V 2 --vbr-new test.wav

(which is equivalent to the old –-alt-preset fast standard) a total of 10 times and averaged play/CPU data as the benchmark.

Super Pi version 1.1 – Ran both the 1M and 2M tests and compared the reported total number of seconds to calculate as the benchmark.

x264 version 0.54.620 – Ran a 2-pass encode on the same MPEG-2 (480x480 DVD source) file twice and averaged the FPS1 and FPS2 numbers as the benchmark. In case you’re wondering, here is the commandline options for this encode, pass1:

Code:

x264 --pass 1 --bitrate 1000 --stats "C:\work\test-NEW.stats" --bframes 3 --b-pyramid --direct auto --subme 1 --analyse none --vbv-maxrate 25000 --me dia --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim --output NUL "C:\work\test-NEW.avs"

And for pass2:

Code:

x264 --pass 2 --bitrate 1000 --stats "C:\work\test-NEW.stats" --ref 3 --bframes 3 --b-pyramid --weightb --direct auto --subme 6 --trellis 1 --analyse all  --8x8dct --vbv-maxrate 25000 --me umh --merange 12 --threads auto --thread-input --progress --no-psnr --no-ssim --output "C:\work\test-NEW.264" "C:\work\test-NEW.avs"

The input avisynth script was:

Code:

global MeGUI_darx = 4
global MeGUI_dary = 3
DGDecode_mpeg2source("C:\work\test-new.d2v")
AssumeTFF()
Telecide(guide=1,post=2,vthresh=35) # IVTC
Decimate(quality=3) # remove dup. frames
crop( 2, 0, -10, -4)
Spline36Resize(640,480) # Spline36 (Neutral)

RAR version 2.63 – Had rar run my standard backup batch file which generated about 0.98 G of rars (1,896 files totally). Here is the commandline I used:

Code:

rar a -u -m0 -md2048 -v51200 -rv5 -msjpg;mp3;tif;avi;zip;rar;gpg;jpg  "e:\Backups\Backup.rar" @list.txt

where list.txt a list of all the dirs I want it to back up. I timed how long it took to complete with a stop watch. I ran the backup twice and averaged it as the benchmark.

Trial of Photoshop CS3 – I used the batch function in PSCS3 to batch bicubic resize 10.1 MP to 0.7 MP (3872x2592 --> 1024x685), then applied an unsharpen mask (60 %, 0.8 px radius, threshold 12), and finally saved as quality 8 jpg. In total, 57 jpg files were used in the batch. I timed how long it took to complete two runs, and averaged them together as the benchmark.

Here are the raw data if you care to see them:

**Zak33** · 09-06-2007, 10:46 AM

nice post chap. Good to see people trying new things.

**Clunk** · 09-06-2007, 12:16 PM

But, if you have a CPU with a 9x multi, why on earth would you want to run an 8x multi at the same speed?

I could understand if you wanted to bump the ram speed up, but then, why wouldnt you use a divider, and run at a 9x multi?

Seems like a pointless exercise to be honest.

**Zak33** · 09-06-2007, 12:38 PM

Clunk...sometimes a chip will hit a certain speed, maxed out, and the ram will go higher, but you need to drop the multi to get that stable speed.

The question is: Should you forsake the multi and take advantage of the higher bandwidth of the ram, keeping you on the limit of overclocking....or should you settle for a lower ram timing and go with the standard multiplier.

I agree it's less relevent in 2007, but go back a few years, and ram speed was life. I'd also say that Tarinder loves ram speeds....loves.

**Clunk** · 09-06-2007, 12:51 PM

Zak,

I know what he is saying, but I cant see a situation with a C2D where you would ever need to do it at the speed he is talking about, which makes it pretty pointless.

If he is refering to older CPUs, then really, he should have used older CPUs for the tests.

If you look at some of the results in the overclocking threads, you can acheive huge benefits buy using an 8x multi to avoid certain parts of the FSB range, to avoid the chipset strap change. For example, at 400fsb, the P5B's chipset strap changes to the 1600 strap, and as a result, there is a massive, and noticable drop in memory bandwidth, right up to around 463fsb.

So, if you are stuck at 3.6ghz, because of the chipset strap, and you know your cpu can go higher, then you can use the 8x multi (or 7x if your board will allow) to bypass this area, still run at the same speed, and still keep your memory bandwidth.

**Supershanks** · 09-06-2007, 12:52 PM

Ran both the 1M and 2M tests and compared the reported total number of seconds to calculate as the benchmark

As basically the difference is in memory speed/bandwidth it would surely be a better representation to run 32m test.

**Phil_P** · 09-06-2007, 02:09 PM

Originally Posted by Clunk

But, if you have a CPU with a 9x multi, why on earth would you want to run an 8x multi at the same speed?

I could understand if you wanted to bump the ram speed up, but then, why wouldnt you use a divider, and run at a 9x multi?

I thought it was beneficial to try and run memory in sync (1:1) rather than using a divider. Therefore, to hit 400fsb and run DDR2-800 memory in sync it may be required to lower the multiplier (especially on something like a Q6600 Quad) if the CPU or temps are otherwise limiting you achieving that at the default multiplier.

Many distributed computing programs are very dependent upon fsb/memory bandwidth. I've not tested on modern systems, but older Athlon XP based systems displayed an almost linear relationship to fsb speeds for a given clock speed.

**Clunk** · 09-06-2007, 02:26 PM

Originally Posted by Phil_P

I thought it was beneficial to try and run memory in sync (1:1) rather than using a divider. Therefore, to hit 400fsb and run DDR2-800 memory in sync it may be required to lower the multiplier (especially on something like a Q6600 Quad) if the CPU or temps are otherwise limiting you achieving that at the default multiplier.

Many distributed computing programs are very dependent upon fsb/memory bandwidth. I've not tested on modern systems, but older Athlon XP based systems displayed an almost linear relationship to fsb speeds for a given clock speed.

Some dividers are fine, of course it depends on the chipset, but the 965 chipset that he is testing on, you can use some of them with no problems whatsoever.

Again, he isnt talking about older systems, he is talking specifically about a q6600 and a P5B deluxe.

On the P35 chipset, using a lower multi and a higher FSB results in huge memory bandwidth.

**Agrippa** · 09-06-2007, 03:38 PM

It's a test designed purely to find out if a higher bus speed yields a tangible benefit I think Clunk, nothing more than that. Pretty useful as such, since many people appear to think that a maximised bus speed makes a huge difference, while in reality you'd be hard pressed to notice any real-life difference at all.

Good stuff graysky!

**Agent** · 09-06-2007, 03:56 PM

Kinda funny - Me and Lowe were talking about this last night during a machine build

Since the Hyper Transport system come about and the quad pumped FSB, overclocking either of them hasn't had huge benefits compared to raw core speed.

**graysky** · 09-06-2007, 04:17 PM

Originally Posted by clunk

I know what he is saying, but I cant see a situation with a C2D where you would ever need to do it at the speed he is talking about, which makes it pretty pointless.

I agree: I just wanted to have some data to back-up my thought which I do now

That said, I do totally agree with what you go on to say about different FSB straps. The reason I did these experiments was to prove to myself and hopefully others that just because you can add more FSB while reducing the multiplier doesn't mean that it will translate into anything but useless heat and power consumption.

Originally Posted by supershanks

As basically the difference is in memory speed/bandwidth it would surely be a better representation to run 32m test.

I'll bite dude; that is an interesting though and I'm running those experiments right now to see. I'll update the main thread w/ the results. In a similar light, I think Phil might have a point:

Originally Posted by Phil_P

Many distributed computing programs are very dependent upon fsb/memory bandwidth.

I'm getting too lazy to keep doing these experiments, but it would be interesting to test this out assuming DCP's like folding@home, seti@home, etc. have some sort of internal benchmark they can run. Anyone willing to give this a try?

To the rest of you who replied, thanks for the kind words; I'm always glad to contribute back to the community.

**graysky** · 09-06-2007, 05:56 PM

Okay, here are the 32M data with the average of 2 runs in (red):

Difference between these is 1.7 % while the differences while the differences in FSB is 12.6 %. I'm too lazy to run the test with more n values (i.e. average 3 or 6 runs).

**graysky** · 10-06-2007, 02:18 AM

In the interest of overkill, I just completed the same benchmark @ 7x428 (edited first post in thread). Results are the same: no benefit of an even higher FSB.

**Agrippa** · 10-06-2007, 03:14 AM

Good work graysky! Always nice when the numbers prove your common sense and intuition correct.

**Clunk** · 10-06-2007, 12:06 PM

Could you just confirm that you actually got 16 seconds for 1M superpi @ 333FSB x9 with ram running at 666mhz and timings as per your "my system"?

And the same for the 375x8 as well please.

Could you post screenshots and settings used to achieve these runs please, as I'm really stuggling to get below 16.75 seconds with same speeds on the CPU, and the ram running at 666mhz, or even with the ram running 1200mhz with 4-4-4-4 timings.

**graysky** · 10-06-2007, 12:48 PM

Originally Posted by Clunk

Could you just confirm that you actually got 16 seconds for 1M superpi @ 333FSB x9 with ram running at 666mhz and timings as per your "my system"?

And the same for the 375x8 as well please.

Could you post screenshots and settings used to achieve these runs please, as I'm really stuggling to get below 16.75 seconds with same speeds on the CPU, and the ram running at 666mhz, or even with the ram running 1200mhz with 4-4-4-4 timings.

Sure dude, I never changed my RAM timings or voltage for these 3 runs; DRAM:CPU was always 1:1 and beyond the CPU-Z screenshots above, the timing settings were and still are as follows:

The only 3 things that got changed were:

1) Voltages (vcore, NB, SB, etc.)
2) FSB
3) Multiplier

Here's a super pi from the most recent run snap:

Thread: Which is faster, 9x333 or 8x375 on a q6600? Results inside

LinkBack

Thread Tools

9x333, 8x375, or 7x428 on a Q6600 - Which is faster?

Thread Information

Users Browsing this Thread

Similar Threads

ATI Catalyst 5.8 released

Posting Permissions