Just found this thread. My first thought was: are you running 64-bit Linux?
Firstly, I gather the SSE implementation differs between the two platforms, so if the SSE work is done in small chunks, the AMD will be faster. That difference can, of course, be optimised out.
The other thing that strikes me: the AMD is only at 80% on one core, so you have a bottleneck somewhere other than the CPU. If you can feed the CPU faster, you will get more out of it.
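
If you want to check whether the CPU really is being starved, one rough way is to compare the CPU time a process uses against the wall-clock time it takes. This is only a sketch, not your actual program: `cpu_utilisation` and `starved_work` are names I made up, and the `time.sleep` call just stands in for whatever the code is waiting on (disk, network, another thread).

```python
import time


def cpu_utilisation(work):
    """Run work() and return process CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def starved_work():
    # Hypothetical workload: bursts of computation separated by waits.
    for _ in range(5):
        sum(i * i for i in range(200_000))  # the "real" work
        time.sleep(0.05)  # stand-in for waiting on input


if __name__ == "__main__":
    ratio = cpu_utilisation(starved_work)
    print(f"CPU busy for {ratio:.0%} of the wall-clock time")
```

A ratio well below 100% for a single-threaded job suggests the time is going on waiting rather than computing, which is the situation where feeding the CPU faster should help.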