Configuring an A64 System
For your everyday overclocker, one of the most cost-effective solutions is to opt for a s754 512k 3000+, a CG Newcastle. CG Clawhammers are better performers, but are difficult to find outside of mobile parts. Most boards to date have certain issues with these.
The 939 Newcastles are rather pricey. Their dual-channel memory controller compensates for the lack of cache and adds a bit of performance as well, but nothing especially significant. However, if mathematical applications, etc. are a must, this may be worth consideration.
For the best of the best, there is no substitute for the 939 Clawhammer, AMD’s flagship line. The FX53 is virtually untouchable by any other processor once you get it going.
As far as chipsets go, the nF3 250GB is the best choice for most. 939 boards based on this chipset are right around the corner as well. However, the K8T800 Pro has a slight edge in performance. Some VIA boards have PCI/AGP locks, and others don’t. The ones that do can sometimes be temperamental. For hardcore overclockers, the VIA may be the best route. Currently, good choices for socket 754 are the EPoX 8KDAJ/3+, based on the nF3 250GB chipset; the Gigabyte K8NS Pro, which, although it does not have the GB chipset, is still a very solid overclocker; and the VIA-based Abit KV8 Pro. The KV8s seem to have PCI/AGP locks, and sport a wide range of voltage options. The VIA-based Asus A8V and nVidia-based Gigabyte K8NSNXP are the front-runners. I would opt for the latter personally, because of its no-fuss vdimm mod and guaranteed PCI/AGP lock. (The Asus appears to be hit or miss.)
Memory is where things get tricky. Running at memory voltages of over 3.3v has proven to be rather dangerous, so getting low latency memory, such as BH5/6, up to the extremely quick speeds that the A64 can take is usually not possible. Still, IMHO, it is a much better choice than high latency, high speed memory, even without a voltage mod. The tight latencies tend to be more useful than slightly higher bandwidth. Another complication with the A64 is that it tends to be very, very picky with double-sided modules. When running two at a time, it can be very difficult to get a decent overclock, but even with one, don’t expect to get as far as you may on other platforms. This is another reason why it’s a good idea to use low latency memory. Hitting a relatively low speed wall is quite possible, and if you’re using high latency memory, you’re pretty much trapped with low performance in this situation. Low latency, low speed memory, on the other hand, is still very competitive. My top picks would be memory based on Micron mT (e.g. OCZ 3200/3500/3700 EB), Winbond BH5/6 (e.g. Mushkin 222 Special), or Samsung TCCD (e.g. Corsair 3200XL), in that order. All are solid choices, and can be found in dirt cheap value memory as well, if you look hard enough. If you plan to go the high speed route, Hynix D5 is just about the only choice that I can recommend, as it can reach 280+ speeds and give low latency memory a run for its money. Only use single-sided memory in this situation.
If you are planning to use a motherboard without PCI/AGP locks, it is advisable to use a PCI66 compliant SATA/ATA controller card, to allow you to use high PCI speeds without corruption. Also desirable in this case are video cards that can support high AGP speeds. (Especially nVidia-based cards)
You are essentially in control of the speeds of four different... ummm, let’s call ’em data paths: the CPU, the memory bus, the HyperTransport bus, and the HyperTransport’s effective data rate. Overclocking the CPU is done pretty much as it always has been, except that the HyperTransport bus takes the place of the front side bus.
The HyperTransport can almost always go just as high as you need it, provided that you don’t exceed the motherboard’s maximum supported data rate by too much. For systems that support a 1000MHz HyperTransport data rate, for example, one could use a 200MHz HyperTransport bus with a 5x LDT multiplier. However, using 5x250 would result in an effective 1250MHz, which would almost certainly lead to instability. The LDT could instead be dropped to 4x, allowing a stable, higher HyperTransport bus speed of 250MHz while keeping the same 1000MHz effective data rate as default. Increasing the HyperTransport bus alone doesn’t accomplish anything, unlike increasing the front side bus on other platforms. Even raising the effective data rate doesn’t add any noticeable performance, as the bus is already so wide that saturating it isn’t very likely. For this reason, the nF3 250s and 150s perform quite similarly to one another. My suggestion would be to leave the effective rate as close to stock as possible, and raise the HyperTransport bus speed only as much as necessary. Unless you’re using an 8x multiplier, there shouldn’t be much reason to go far above 300MHz in many cases.
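The arithmetic above is easy to script if you want to check a few combinations before rebooting. This is only a minimal sketch: the function names are made up for illustration, the list of available LDT multipliers (2x to 5x) is an assumption that varies by board, and the check is strict, so it ignores the small amount of headroom many boards tolerate.

```python
def ht_effective_rate(htt_mhz, ldt_multiplier):
    """Effective HyperTransport data rate in MHz: bus speed times LDT multiplier."""
    return htt_mhz * ldt_multiplier


def highest_ldt_within(htt_mhz, max_rate_mhz, available=(2, 3, 4, 5)):
    """Largest LDT multiplier that keeps the effective rate at or below the
    board's rated maximum, or None if even the lowest one overshoots.
    The default multiplier list is an assumption; check your own BIOS."""
    fitting = [m for m in available if ht_effective_rate(htt_mhz, m) <= max_rate_mhz]
    return max(fitting) if fitting else None


# The examples from the text, on a board rated for 1000MHz:
print(ht_effective_rate(200, 5))      # stock: 200 x 5
print(ht_effective_rate(250, 5))      # over the rated maximum
print(highest_ldt_within(250, 1000))  # multiplier to drop to at 250MHz
```

At a 300MHz HyperTransport bus the same check picks 3x, for an effective 900MHz.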
What complicates the choice of multiplier and HyperTransport speed are the CPU/memory dividers. No motherboard allows you to manipulate them directly. Instead, they provide “maximum memory clocks,” or supposed HTT/memory ratios. Make no mistake, no such thing exists. The memory clock is derived from the CPU speed, but it’s never made clear, and the dividers need to be manipulated indirectly. Also, CPU/mem dividers are integral only; there are no half dividers, so it’s advisable not to use half multipliers. Ok, what I just said probably doesn’t make much sense, so here are some examples of how to get certain CPU/mem dividers:
CPU/5-8 - Set memory to 200 and multiplier to desired divider
CPU/9
Memory to 200, multi to 9x (if available)
Memory to 183*, multi to 8x
CPU/10
Memory to 200, multi to 10x (if available)
Memory to 183*, multi to 9x (if available)
Memory to 166, multi to 8x
CPU/11
Memory to 200, multi to 11x (if available)
Memory to 183*, multi to 10x (if available)
Memory to 166, multi to 9x (if available)
Memory to 150, multi to 8x
CPU/12
Memory to 200, multi to 12x (if available)
Memory to 183, multi to 11x (if available)
Memory to 166, multi to 10x (if available)
Memory to 150, multi to 9x (if available)
Memory to 133, multi to 8x
Doesn’t make much sense? Don’t worry, it shouldn’t. You will probably need to experiment with different multipliers and max mem clocks to find the CPU/mem divider that you desire. Using half multipliers complicates things further, as the memory is divided integrally. Just to eliminate variables, drop the LDT multiplier down to 3x if your board supports a 1000MHz HyperTransport speed, or down to 2x if 800MHz or 600MHz is its maximum. You can also increase the HTT/LDT voltage on some motherboards, which can give you an extra 20-50MHz on your effective rate in some cases.
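The divider examples above appear to follow one simple rule: take the multiplier times 200, divide by the max memory clock setting, and round up to the next whole number. Treat that rule as my assumption rather than anything a board manual states. Here is a minimal sketch of it, also assuming that the BIOS labels 183, 166 and 133 stand for 183.33, 166.67 and 133.33MHz. The function names are made up for illustration.

```python
from fractions import Fraction
from math import ceil

# Nominal "max memory clock" BIOS settings in MHz. Assumption: the rounded
# labels 183, 166 and 133 stand for 183.33, 166.67 and 133.33 MHz.
MEM_SETTINGS = {
    200: Fraction(200),
    183: Fraction(550, 3),
    166: Fraction(500, 3),
    150: Fraction(150),
    133: Fraction(400, 3),
}


def cpu_mem_divider(multiplier, mem_setting):
    """Assumed rule: divider = ceil(multiplier * 200 / max memory clock).
    Exact fractions avoid rounding trouble when the ratio is a whole number."""
    ratio = Fraction(multiplier) * 200 / MEM_SETTINGS[mem_setting]
    return ceil(ratio)


def memory_speed(htt_mhz, multiplier, mem_setting):
    """Actual memory clock in MHz: CPU clock divided by the integer divider."""
    cpu_mhz = Fraction(htt_mhz) * Fraction(multiplier)
    return float(cpu_mhz / cpu_mem_divider(multiplier, mem_setting))


print(cpu_mem_divider(8, 183))        # one of the CPU/9 combinations
print(cpu_mem_divider(10, 166))       # one of the CPU/12 combinations
print(memory_speed(250, 10, 166))     # memory clock at 250 HTT, 10x, 166 setting
```

For example, at 250 HTT with a 10x multiplier and the 166 setting, the divider works out to 12, so a 2500MHz CPU runs the memory at roughly 208MHz. Under the same rule a 9.5x multiplier at the 200 setting rounds up to CPU/10, which is why half multipliers cost you memory speed.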
As I've always stressed, overclocking needs to be done carefully and systematically. This is especially important with the A64. Focus on one area that you wish to overclock, and overclock it alone. For example, if you wish to overclock your memory, drop your LDT and CPU multipliers as low as they can go, and see how far your memory can go with everything else clocked low enough to not hinder stability. For overclocking the CPU, drop the LDT as low as it can go, and set the max memory clock as low as it can go as well. Once you find your maximum memory speed and CPU clock, play around with the max memory and CPU multiplier to find the suitable CPU/mem ratio. Once you've already got in mind how far the CPU and memory each can go, this isn't too difficult. I cannot stress enough how important it is to isolate variables. It's all too common that people try to max everything out at once, fail, and then give up out of frustration. Take your time, be patient, and have fun. Dividing and conquering can make the task of overclocking the A64 a lot less daunting.
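Once you know the two ceilings, the last step (playing with the max memory setting and multiplier) can be brute-forced on paper. The sketch below is only an illustration under assumptions of mine: it uses the rule that the divider is the multiplier times 200 divided by the max memory setting, rounded up; it treats the BIOS labels 183, 166 and 133 as 183.33, 166.67 and 133.33MHz; and it ranks results by CPU speed first and memory speed second, which may not match your own priorities. The function name is hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

# Assumed nominal values for the BIOS "max memory clock" labels, in MHz.
MEM_SETTINGS = {200: Fraction(200), 183: Fraction(550, 3),
                166: Fraction(500, 3), 150: Fraction(150), 133: Fraction(400, 3)}


def best_settings(max_cpu_mhz, max_mem_mhz, multipliers=range(8, 13)):
    """Return (cpu_mhz, mem_mhz, multiplier, mem_setting, htt) tuples, best
    first, that keep both the CPU and the memory within their tested limits."""
    results = []
    for mult in multipliers:
        m = Fraction(mult)
        for label, nominal in MEM_SETTINGS.items():
            # Assumed rule: integer divider = ceil(multiplier * 200 / setting).
            divider = ceil(m * 200 / nominal)
            # Highest whole-MHz HTT that respects both ceilings.
            htt = floor(min(Fraction(max_cpu_mhz) / m,
                            Fraction(max_mem_mhz) * divider / m))
            cpu = float(htt * m)
            mem = round(cpu / divider, 1)
            results.append((cpu, mem, mult, label, htt))
    return sorted(results, reverse=True)


# Example: a CPU that tops out at 2500MHz and memory that tops out at 220MHz.
for row in best_settings(2500, 220)[:3]:
    print(row)
```

With those example limits, the top suggestion is a 10x multiplier, the 166 memory setting and a 250MHz HyperTransport bus, which lands the CPU at 2500MHz and the memory at about 208MHz.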
This excellent tool by Cpjk can be very helpful in removing some of the confusion in figuring out how to run things.
If you're lucky enough to have an FX, though, you don't need to bother with finding the right CPU/memory ratio. Simply find your maximum memory clock, and then increase the multiplier as necessary to max out the processor.
On a related note, the absolute maximum core voltage for A64’s rated by AMD is 1.65v, as opposed to 2.25v (I believe) for Bartons. The heat output of A64’s at the same speeds as AXP’s is roughly equivalent to what they’d put out with 0.2v less. For this reason, it usually is not too beneficial to exceed a core voltage of 1.7v or so on air cooling. On your everyday R404A setup, 1.8-1.85v usually appears to be all that’s needed for an optimal overclock.
One rather important memory-related setting is the command rate, a.k.a. CPU Interface on many other boards. The default for C0 processors is 1t, and the default for the CG’s is 2t. 1t is quicker, but makes overclocking the memory with double-sided sticks especially difficult in many cases. Running at 2t, however, costs only about 1 sec in SuperPI and PIFast, and makes a couple hundred point difference in 3DMark01. The one benchmark where it takes a significant toll is the Sandra Memory Bandwidth Benchmark, where it takes 10%, or 300-400 MB/sec, off. I don’t see having to run at 2t as the end of the world, unless you’re a Sandra fanatic. The difference between 1t and 2t is actually less than that between tRCD2 and tRCD3 in my experience. Again, there is no one size fits all solution. It may take some experimentation to see what combination of command rate, latencies and memory speeds is optimal for you. Low tRCD is highly recommended, but CAS doesn’t matter very much. tRAS at 10, and nothing else, seems to deliver the best performance, while backing it down much lower begins to hurt.
Some notes on Windows Tweaking/Overclocking
Overclocking A64s within Windows was originally done when high HTT’s caused BIOS corruption; however, this doesn’t appear to be an issue today. It still can be very convenient, and for some boards like mine that don’t allow overclocking in the BIOS with mobiles, can be a godsend.
ClockGen is a Windows-based utility for overclocking. It allows multiplier, voltage, HyperTransport speed, and PCI/AGP bus speed manipulation. Changing the voltage doesn’t work on all boards, and the CPU/mem ratio and LDT multiplier cannot be changed using the utility, so some settings must be set in the BIOS. It also allows for profiles, so you can quickly change speeds on the fly. To make a profile, put the signature of your board, as found on the website, in brackets on the first line, e.g. [CG-NVNF3] for nForce3’s, and then the values you want to change on the succeeding lines, e.g. FID=9.0, HTT=250. For nForce3 boards, if you set your AGP rate or HTT rate anywhere above spec in the BIOS, the AGP/PCI lock is enabled, so you can increase the HTT easily in Windows. Setting the HTT to 201 in the BIOS is the most common technique.
The nVidia System utility is a nice tool to have for nVidia-based boards. It allows manipulation of the tRAS, tRCD and tRP within Windows, and also the changing of the HyperTransport and AGP/PCI speeds.
A64 Tweaker is an excellent utility written by CodeRed. It allows manipulation of just about every memory-related setting on the fly in Windows. It’s made my life dozens of times easier when trying to test things out. It also has much more functionality than you’ll find in most BIOSes.
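Going back to the ClockGen profiles described above, here is a small sketch that assembles the profile text for you. It only uses the layout and the two example values mentioned earlier (the bracketed board signature, then FID and HTT lines). The helper name is made up, and anything beyond that layout, such as the file name or other keys, is not covered here and should be checked against ClockGen's own documentation.

```python
def build_profile(signature, **settings):
    """Return profile text: the bracketed board signature on the first line,
    then one KEY=value line per setting, in the order given."""
    lines = [f"[{signature}]"]
    lines += [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


# The nForce3 example from the text: 9.0x multiplier at 250MHz HTT.
print(build_profile("CG-NVNF3", FID="9.0", HTT=250), end="")
```

Passing FID as a string keeps the "9.0" formatting exactly as written in the example.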