AMD - Piledriver chitchat

**CAT-THE-FIFTH** · 08-08-2014, 10:58 AM

One of the major limitations with audio encoding is actually the drive and drive interface used when encoding, which many sites quietly bypass.

I used an external USB2 DVD rewriter, and iTunes is single-threaded.

I'd really love to get my hands on an A8-7600, for example, and test it in lots of games in IGP and Crossfire modes. I don't have any money for major computer expenditures ATM, though!!

**scaryjim** · 08-08-2014, 11:11 AM

Originally Posted by Noxvayl

... Is that going to give the results you are interested in scaryjim? (http://www.phoronix-test-suite.com/?k=home) ...

Ugh, density of stuffs! A comprehensive test suite is a nice idea, but it'd take weeks to understand what all the tests mean...

It looks like the only straight audio tests are encoding, which - while it's a part of what I do - is only a small part of my overall process. I might have to work on some tests of my own. I'd love to know where the limiting factor lies when, for instance, applying a light vocal reverb to an hour of podcast! Or running compression over a similar length of file. I know those things take several minutes on the computers I've run them on, but I've still not worked out whether it's a CPU bottleneck, RAM bottleneck, IO bottleneck or what...

I should probably do some more research really; I bet the answer is out there somewhere.
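One rough first step toward separating an IO bottleneck from a CPU bottleneck is to time the file read and the processing step separately. Below is a minimal Python sketch under stated assumptions: a 16-bit mono WAV file, and a hypothetical gain-with-clipping loop standing in for a real effect. It is not what Audacity or any other package actually does internally.

```python
import array
import math
import os
import sys
import tempfile
import time
import wave


def write_test_wav(path, seconds=1.0, rate=44100):
    """Write a 16-bit mono 220 Hz test tone so every machine gets identical input."""
    n = int(seconds * rate)
    samples = array.array(
        "h", (int(12000 * math.sin(2 * math.pi * 220 * i / rate)) for i in range(n))
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(samples.tobytes())


def read_samples(path):
    """IO stage: load every frame from disk into memory."""
    with wave.open(path, "rb") as w:
        data = w.readframes(w.getnframes())
    samples = array.array("h")
    samples.frombytes(data)
    if sys.byteorder == "big":
        samples.byteswap()
    return samples


def process(samples, gain=2.0):
    """CPU stage: apply gain with clipping (hypothetical placeholder for a real effect)."""
    out = array.array("h")
    for s in samples:
        out.append(max(-32768, min(32767, int(s * gain))))
    return out


def time_stages(path):
    """Return wall-clock seconds spent reading versus processing."""
    t0 = time.perf_counter()
    samples = read_samples(path)
    t1 = time.perf_counter()
    processed = process(samples)
    t2 = time.perf_counter()
    return {"io_s": t1 - t0, "cpu_s": t2 - t1, "frames": len(processed)}


if __name__ == "__main__":
    path = os.path.join(tempfile.gettempdir(), "bench_tone.wav")
    write_test_wav(path, seconds=5.0)
    print(time_stages(path))
```

On a second run the read will usually be served from the OS file cache, so a small `io_s` there says little about the disk. A real editor may also stream the file in chunks rather than loading it all at once, so treat this only as a first cut.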

**Noxvayl** · 08-08-2014, 03:10 PM

Originally Posted by scaryjim

... I might have to work on some tests of my own ... I've still not worked out whether it's a CPU bottleneck, RAM bottleneck, IO bottleneck or what...

Tell you what: you write the program that will help me do the testing, and then I'll give you all the data.

As I said earlier, I know nothing in this regard; I'm learning. I can run that suite and give you the data from just the one test that interests you, if that would be helpful.

I'll have access to this machine at all times from now until 20 September 2014. I'll also have access to my own machine, my parents' machine and another family machine, which are all desktops, and I know exactly what components are inside them because I built them. They vary a lot, but hopefully you can tell where the bottleneck lies if I run some tests on those machines.

I would enjoy doing it, so if it would benefit you, please do sort out that benchmarking program. I would also enjoy learning about the process; it's something that interests me but I never got round to investigating.

**DanceswithUnix** · 08-08-2014, 04:27 PM

Originally Posted by scaryjim

Ugh, density of stuffs! A comprehensive test suite is a nice idea, but it'd take weeks to understand what all the tests mean...

Yeah, I'm not a fan of the Phoronix way of testing; it seems very undirected and random.

I suppose it comes down to what package you use and how you use it. If it's something free like Audacity, then benchmarking should be workable. You'd want some realistic task for it to perform, though, and an easily available source file.

**scaryjim** · 08-08-2014, 04:38 PM

I can't help wondering if there's a way to script Audacity to perform some common filtering tasks on a set sound file. One of the big things I'm currently doing is noise removal, compression and reverb on ~1 hour of vocal recording. Each of those tasks takes several minutes to complete, so it'd be good to know where the bottlenecks are.

I'll do some research and see if I can work something out.

EDIT: Audacity has scripting support, but it's basic and experimental - I certainly won't be doing anything with it in the near future...
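Without Audacity scripting, a stand-alone script could at least time each stage of a similar chain separately. This is a sketch under loud assumptions: the three functions are hypothetical, deliberately crude substitutes (a noise gate for noise removal, a hard-knee compressor, a single feedback comb filter for reverb), not Audacity's algorithms, and the input is synthetic normalised floats rather than a real recording.

```python
import math
import random
import time


def noise_gate(x, floor=0.02):
    """Crude 'noise removal': zero out anything quieter than a fixed floor."""
    return [s if abs(s) >= floor else 0.0 for s in x]


def compress(x, threshold=0.5, ratio=4.0):
    """Hard-knee compressor on samples normalised to [-1, 1]."""
    out = []
    for s in x:
        m = abs(s)
        if m > threshold:
            m = threshold + (m - threshold) / ratio
        out.append(math.copysign(m, s))
    return out


def comb_reverb(x, delay=1323, feedback=0.3):
    """Single feedback comb filter (~30 ms delay at 44.1 kHz) as a toy reverb."""
    out = list(x)
    for i in range(delay, len(out)):
        out[i] += feedback * out[i - delay]
    return out


def time_chain(x, stages):
    """Run each (name, fn) stage in order, recording wall-clock seconds per stage."""
    timings = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        x = fn(x)
        timings[name] = time.perf_counter() - t0
    return x, timings


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the input is identical on every machine
    rate, seconds = 44100, 10
    signal = [0.6 * math.sin(2 * math.pi * 180 * i / rate) + rng.uniform(-0.01, 0.01)
              for i in range(rate * seconds)]
    stages = [("noise_gate", noise_gate), ("compress", compress), ("reverb", comb_reverb)]
    _, timings = time_chain(signal, stages)
    for name, secs in timings.items():
        print(f"{name}: {secs:.3f} s")
```

Because everything stays in memory, these timings are almost purely CPU (and memory) cost, which at least gives a baseline to compare against a real run that includes disk access.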

**HalloweenJack** · 10-08-2014, 11:38 PM

http://www.anandtech.com/show/8316/a...xtreme9-review

^^ new 9590 review (retail kit)

**kalniel** · 11-08-2014, 08:25 AM

Originally Posted by HalloweenJack

http://www.anandtech.com/show/8316/a...xtreme9-review

^^ new 9590 review (retail kit)

Though actually the review isn't of the retail one; it's of the OEM one the author picked up from a system integrator's sell-off.

**HalloweenJack** · 11-08-2014, 10:35 AM

meh - didn't see that part, thought it was the retail kit they were using

**HalloweenJack** · 11-08-2014, 05:54 PM

http://www.anandtech.com/show/8362/a...a1100-revealed

**watercooled** · 12-08-2014, 05:34 PM

Some more details about Denver are being revealed by Nvidia. One thing I keep seeing is big flashy references to 'Dynamic Code Optimisation'.
http://wccftech.com/nvidias-64bit-de...ocked-250-ghz/

Judging by how they describe it, hasn't that kinda been a standard feature set of instruction decoders for some time now? And, by 'binary translation', I wonder if they're just referring to uOps, which are pretty much ubiquitous in modern processors of this class?

Edit: Also, I wonder what they mean by '7-wide' exactly? On the block diagram I see 7 execution ports, like A15 and Krait, but what is its decode width?

Edit2: I've just looked back at some of the earlier marketing material from Nvidia - they compare A15 being '3-wide' to Denver being '7-wide'. So do they really have a 7-wide front-end to go along with the 7 execution ports, is it some architectural peculiarity to do with their 'binary translation', or have the marketing team just got it arse-about-face?

**DanceswithUnix** · 12-08-2014, 10:24 PM

Originally Posted by watercooled

And, by 'binary translation', I wonder if they're just referring to uOps, which are pretty much ubiquitous in modern processors of this class?

I thought only x86 CPUs used uOps, because they are the only ones trying to run a dog's dinner of an instruction set. The whole point of RISC is easy & hence direct decode.

**watercooled** · 12-08-2014, 10:49 PM

AFAIK ARM (or at least some of the higher performance OoO cores) use uOps too, although most instructions are decoded to one uOp. I'll see if I can find some more details and post back.

Edit: Found some reference at ARM and Anandtech:
http://infocenter.arm.com/help/topic.../BABBGJHI.html
http://www.anandtech.com/show/7126/t...e-cortex-a12/3

I'm not exactly sure how they compare to x86 uOps though.

Edit2: Although 'ubiquitous' was probably too strong a term, having read around a bit more. I think I've seen the reference to ARM uOps in the past and read too much into it.

Also, I've just spotted this (unrelated) while scanning through some posts:

Originally Posted by DanceswithUnix

Good, though I still wonder just where AMD are headed these days.

Now see those blocks marked "Decode" that convert amd64 instructions into internal uops, can they make that do ARM instructions I wonder...

http://forums.hexus.net/cpus/241925-...ml#post2747673
Interesting prediction.

**DanceswithUnix** · 13-08-2014, 07:20 AM

Originally Posted by watercooled

I'm not exactly sure how they compare to x86 uOps though.

Edit2: Although 'ubiquitous' was probably too strong a term, having read around a bit more. I think I've seen the reference to ARM uOps in the past and read too much into it.

We can only guess without a lot more information, or possibly by asking the question; there have been "ask ARM" events on the web in the past.

But my guess is that they aren't that similar. ARM has some baggage from the 32-bit days, some of which must be quite irritating for them, and the CPU has modes for 32-bit and 64-bit so it can run more than one instruction set. I can't remember any specifics in ARM, but RISC CPUs can have quite long instructions (like the Power architecture push multiple for procedure entry) that are internally expanded into lots of simpler instructions, like a macro expansion. Beyond that, though, I think the similarities with x86 end.

x86 is an architecture that no sane person would create today. It was rubbish when the 8086 was new. In the face of such madness, translating the instruction stream into one suitable for multi-issue makes sense.

ARMv8, on the other hand, looks far more sane. I would expect the uOps to be close to native v8 instructions, with some macro operations to improve code density where it makes sense. But then the devil is in the details with these things; perhaps there is a non-obvious reason why translation would be more aggressive.
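To make the "macro expansion" idea concrete, here is a toy decoder that cracks one push-multiple style instruction into a single stack-pointer update plus one store per register, while simple ops map 1:1 onto a uop. The tuple instruction format, register names and `Uop` fields are hypothetical illustrations, not ARM's or IBM's real encoding or internal uop format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uop:
    op: str       # internal operation, e.g. "sub", "store", "add"
    dst: str      # destination register (for "store": the base address register)
    src: str      # source register
    imm: int = 0  # immediate value or byte offset


def crack_push_multiple(regs, word_bytes=8, sp="sp"):
    """Expand a hypothetical 'push {regs}' into one SP decrement plus one store per register.

    Models a descending stack: SP drops by the whole block first, then each register
    is stored at an increasing offset from the new SP, in the order listed.
    """
    uops = [Uop("sub", sp, sp, word_bytes * len(regs))]
    for i, reg in enumerate(regs):
        uops.append(Uop("store", sp, reg, i * word_bytes))
    return uops


def decode(instr):
    """Simple two-operand ops map 1:1 onto a uop; 'push' gets cracked into several."""
    mnemonic, *operands = instr
    if mnemonic == "push":
        return crack_push_multiple(operands)
    dst, src = operands
    return [Uop(mnemonic, dst, src)]


if __name__ == "__main__":
    for instr in [("push", "r4", "r5", "lr"), ("add", "r0", "r1")]:
        for uop in decode(instr):
            print(instr[0], "->", uop)
```

The point of the sketch is the shape, not the details: a RISC decoder only needs this kind of expansion for a handful of multi-register instructions, whereas an x86 front end has to do heavier translation on a large share of the instruction stream.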

Originally Posted by watercooled

Also, I've just spotted this (unrelated) while scanning through some posts:

http://forums.hexus.net/cpus/241925-...ml#post2747673
Interesting prediction.

lol, well that just seems an obvious thing to fall out of AMD. They have their own problem to solve, two completely different architectures, and for AMD64 two different implementations with the cats and the big cores. The more they can share across the designs, the better use they can make of their limited engineering resources.

Now, if they could merge the designs for the cat cores so that their tablet designs could run both AMD64 and ARM code, that would be rather interesting. Not sure it would be a commercial success as a product, but it reduces the number of designs they work on.
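The "shared back end, different decoders" idea can be caricatured in a few lines: two hypothetical front-end decoders, one two-operand destructive style and one three-operand style, both emit the same internal uop tuple, and a single back end only ever sees that format. Everything here is a toy assumption; real x86 and ARM decode involves variable-length encodings, flags, memory operands and mode switching that this ignores entirely.

```python
from typing import List, Tuple

Uop = Tuple[str, str, str, str]  # (operation, dest, src1, src2) in a shared internal format


def decode_toy_x86(instr: str) -> List[Uop]:
    """Two-operand, destructive style: 'add eax, ebx' means eax = eax + ebx."""
    op, rest = instr.split(maxsplit=1)
    dst, src = [r.strip() for r in rest.split(",")]
    return [(op, dst, dst, src)]


def decode_toy_arm(instr: str) -> List[Uop]:
    """Three-operand style: 'add r0, r1, r2' means r0 = r1 + r2."""
    op, rest = instr.split(maxsplit=1)
    dst, a, b = [r.strip() for r in rest.split(",")]
    return [(op, dst, a, b)]


def execute(uops: List[Uop], regs: dict) -> dict:
    """Shared back end: it only understands the internal uop format."""
    alu = {"add": lambda x, y: x + y, "sub": lambda x, y: x - y}
    for op, dst, a, b in uops:
        regs[dst] = alu[op](regs[a], regs[b])
    return regs


if __name__ == "__main__":
    print(execute(decode_toy_x86("add eax, ebx"), {"eax": 2, "ebx": 3}))
    print(execute(decode_toy_arm("sub r0, r1, r2"), {"r0": 0, "r1": 9, "r2": 4}))
```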

Edit to add: I've read up some more of the latest info on Denver. Nvidia originally wanted to make an x86 chip, and I think it still shows. Static 7-way issue? The rule of thumb is that every 6th instruction is a jump, and I don't think anyone has cracked predicting two dependent jumps in a single cycle yet. Making use of more than 3-wide issue is hard even with out-of-order execution. So this just has to be a VLIW design. That, plus the translation cache and the way it seems to have software assist running at a level beneath the native OS: this really sounds like a new Transmeta design.
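The "every 6th instruction is a jump" rule of thumb can be turned into a quick back-of-envelope model. Assume, hypothetically, that basic-block lengths are geometrically distributed with a mean of 6 instructions, and that the front end fetches from at most one block per cycle (it stops at the first taken branch). The sketch estimates instructions fetched per cycle for a given width. It ignores data dependencies entirely, so real sustained issue would be lower still.

```python
import math
import random


def block_lengths(n_blocks, mean_len=6.0, seed=1):
    """Sample basic-block lengths (>= 1) from a geometric distribution with the given mean."""
    rng = random.Random(seed)
    p = 1.0 / mean_len  # chance that any given instruction is the block-ending branch
    lengths = []
    for _ in range(n_blocks):
        length = 1
        while rng.random() >= p:
            length += 1
        lengths.append(length)
    return lengths


def fetch_ipc(lengths, width):
    """Instructions per cycle when each cycle fetches up to `width` instructions from one block."""
    cycles = sum(math.ceil(n / width) for n in lengths)
    return sum(lengths) / cycles


if __name__ == "__main__":
    lengths = block_lengths(100_000)
    for width in (2, 3, 4, 7):
        print(f"{width}-wide: {fetch_ipc(lengths, width):.2f} instructions/cycle")
```

Under this model even a perfectly fed 7-wide machine can never average 7 per cycle, because a block shorter than the width ends the fetch group early; that is the gap a translation layer forming larger traces across branches would be trying to close.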

**watercooled** · 13-08-2014, 12:06 PM

According to this (not sure if they're referencing Nvidia or if it's their own take on it) http://hothardware.com/News/Nvidias-...a-Rides-Again/ they're claiming it's not VLIW?

In terms of issue width, isn't that the reason ARM A7, Saltwell, Silvermont, Bobcat, Jaguar are all 2-issue for power efficiency? It's why I'm thinking the '3-issue vs 7-issue' comparison isn't really a fair one.

Though, even with compiler-scheduled code on graphical workloads like on GPUs, AMD still saw fit to drop from 5-wide to 4-wide VLIW to make better use of resources? I'm not sure if that's directly comparable, mind, but I would've thought it was easier to extract ILP from graphical workloads than to do it dynamically on CPU workloads?

The Denver scheduling hardware must be massive - they claim to have chosen not to use hardware OoO for efficiency/die size reasons, but Denver looks massive, assuming it's made using the same libraries and node.
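A rough way to see why a narrower bundle can be the better trade: if the compiler can only find a limited number of independent operations per instruction group, extra slots just sit empty. The ILP histogram below is entirely hypothetical, not AMD's measured data; the sketch computes how many bundles are needed and what fraction of slots are filled for 4-wide versus 5-wide bundles.

```python
from fractions import Fraction


def bundles_needed(ilp_histogram, width):
    """Bundles needed when a group of k independent ops is split across ceil(k / width) bundles."""
    return sum(-(-k // width) * count for k, count in ilp_histogram.items())


def slot_utilization(ilp_histogram, width):
    """Fraction of issue slots that actually hold an operation."""
    total_ops = sum(k * count for k, count in ilp_histogram.items())
    return Fraction(total_ops, bundles_needed(ilp_histogram, width) * width)


if __name__ == "__main__":
    # Hypothetical: {independent ops available: number of instruction groups}
    histogram = {1: 10, 2: 15, 3: 30, 4: 30, 5: 15}
    for width in (4, 5):
        util = slot_utilization(histogram, width)
        print(f"{width}-wide: {bundles_needed(histogram, width)} bundles, "
              f"{float(util):.1%} of slots filled")
```

With these made-up numbers the 4-wide version fills more of its slots (325/460 versus 325/500) but needs 115 bundles instead of 100. That is the trade-off in miniature: fewer idle ALUs per bundle, at the cost of a few extra cycles whenever ILP happens to be high.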

**DanceswithUnix** · 13-08-2014, 04:54 PM

Originally Posted by watercooled

According to this (not sure if they're referencing Nvidia or if it's their own take on it) http://hothardware.com/News/Nvidias-...a-Rides-Again/ they're claiming it's not VLIW?

...

The Denver scheduling hardware must be massive - they claim to have chosen not to use hardware OoO for efficiency/die size reasons, but Denver looks massive, assuming it's made using the same libraries and node.

I think the point here is that the scheduling hardware is tiny. I suspect it isn't out of order because it isn't even doing dependency checks on the instruction stream. If the translation/optimization layer makes sure all is well then they shouldn't have to.

So they can call it what they want, but issuing 7 ops in parallel in a static design must at the very least borrow very heavily from VLIW. I don't see that as a bad thing; they have enough previous success with graphics architectures to know how to make that fly, if anyone can.

I don't think Intel/HP liked calling Itanium VLIW either. Intel were trying to make this work at the high end of the range which I think is way harder, trying to do it at the lower end might just work.
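To illustrate the idea of the translation layer taking over dependency checking so the issue hardware doesn't have to, here is a toy in-order bundler. It walks a straight-line list of `(dest, sources)` ops and closes the current bundle whenever the next op would read or overwrite a register already written in that bundle, or when the bundle is full. The tuple format, the 7-slot default and the hazard rules are hypothetical simplifications, not Nvidia's actual Denver optimiser; latencies, memory dependencies and branches are ignored.

```python
def bundle(instrs, width=7):
    """Greedily pack (dest, srcs) ops, in program order, into bundles of at most `width`.

    A bundle is closed before an op that reads (RAW) or rewrites (WAW) a register
    already written inside it. Reading a register that a *later* op in the same
    bundle writes (WAR) is allowed, assuming all reads in a bundle happen before
    any writes, as in a classic VLIW.
    """
    bundles, current, written = [], [], set()
    for dest, srcs in instrs:
        hazard = dest in written or any(s in written for s in srcs)
        if current and (hazard or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    program = [
        ("r1", ("r0",)),
        ("r2", ("r0",)),
        ("r3", ("r1", "r2")),  # reads r1 and r2 -> must start a new bundle
        ("r4", ("r0",)),
    ]
    for i, b in enumerate(bundle(program)):
        print(i, [dest for dest, _ in b])
```

Once bundles are formed this way, the hardware can issue each one as-is with no dependency checks at all. This greedy pass never reorders anything; a real optimiser would also move independent ops earlier to fill more slots, which is where the heavy lifting (and the software cost) would be.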

**watercooled** · 13-08-2014, 07:19 PM

I'm not denying its VLIW-likeness BTW, just pointing out that some articles are claiming otherwise.

Hmm, so if the scheduling etc is done in software, does that mean it literally runs on the cores? I was kind of interpreting the 'software' to be something running on independent microcontrollers, kind of like the dispatch processors we see on GPUs. But reading back through, it seems that's not the case?
