What the benchmarks show is that the Bulldozer architecture does well in particular tasks, but we already knew that: it has massive parallel int throughput. Just take a look at the Dhrystone CPU test in the guru3d review you linked to. The FX-4100's parallel int throughput (@ 3.6GHz) is higher than the PII X4 975BE's (also @ 3.6GHz), so in ideal circumstances Bulldozer's parallel int throughput *has* gone up compared to Phenom II. We can therefore assume that in the benchmark tasks you've highlighted, Bulldozer is able to make use of its very high int throughput. Compare that to the Whetstone (FPU) chart and you'll see where AMD have come unstuck: given ideal FPU circumstances the FX-4100 falls well behind the PII X4 975BE, and is scrapping with the A8-3850, which has a 700MHz (~20%) clock speed shortfall. Any task which involves FPU usage, or fails to make optimal use of the parallel int throughput, is going to perform less well.
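To put a number on that clock gap, here's a rough sketch of the arithmetic. The 3.6GHz and 700MHz figures are the ones quoted above; the function names are just made up for illustration, and the "level scores" case is an assumption on my part, since the chart only shows the two chips close together, not identical.

```python
def clock_shortfall(reference_ghz, other_ghz):
    """Fraction by which other_ghz falls short of reference_ghz."""
    return (reference_ghz - other_ghz) / reference_ghz


def per_clock_advantage(faster_clock_ghz, slower_clock_ghz):
    """Work-per-clock ratio implied IF both chips post the same score.

    Assumption: equal scores. The chart only shows them roughly level.
    """
    return faster_clock_ghz / slower_clock_ghz


fx4100_ghz = 3.6                 # FX-4100 clock quoted above
a8_3850_ghz = fx4100_ghz - 0.7   # 700MHz shortfall quoted above

shortfall = clock_shortfall(fx4100_ghz, a8_3850_ghz)
advantage = per_clock_advantage(fx4100_ghz, a8_3850_ghz)

print(f"clock shortfall: {shortfall:.1%}")
print(f"implied per-clock FPU advantage at level scores: {advantage:.2f}x")
```

So if the two really were dead level in Whetstone, the Llano part would be doing somewhere around a quarter more FPU work per clock. Treat that as a ballpark only.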
Bulldozer is an odd beast really; there's obviously a lot of potential sitting there untapped. As I said before, I'd love Trinity to fix all of the problems and come out 20% faster than Llano, at the same clocks, in a wide range of tasks. But what I suspect is that we'll get a 20% increase in those parallel int tasks where Bulldozer is already matching or beating PII on IPC, coupled with the same performance, or possibly even slightly worse, on all the other tests. Bulldozer is already looking at up to 20% worse IPC in floating-point workloads compared to Llano, and the roadmap only claims a ~15% improvement for Piledriver...
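A quick back-of-envelope sketch of why that worries me. It assumes the worst-case 20% deficit and applies the roadmap's ~15% straight to floating-point IPC, which is my simplification (the roadmap figure isn't broken down by workload). The function names are hypothetical.

```python
def projected_relative_ipc(current_relative, uplift):
    """Relative IPC after applying a fractional uplift to the current ratio."""
    return current_relative * (1 + uplift)


def uplift_needed_for_parity(current_relative):
    """Fractional uplift required to bring the ratio back to 1.0."""
    return 1 / current_relative - 1


# Worst case quoted above: Bulldozer up to 20% behind Llano on flop IPC.
bulldozer_vs_llano = 1 - 0.20

# Assumption: the ~15% roadmap claim applies in full to flop IPC.
piledriver_vs_llano = projected_relative_ipc(bulldozer_vs_llano, 0.15)
needed = uplift_needed_for_parity(bulldozer_vs_llano)

print(f"Piledriver vs Llano (flop IPC): {piledriver_vs_llano:.2f}")
print(f"uplift needed just to match Llano: {needed:.0%}")
```

On those assumptions Piledriver lands at about 0.92 of Llano per clock in the worst-case flop workloads, and it would take roughly a 25% uplift just to draw level, never mind come out 20% ahead.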