Making multi-core systems cool, clever, and clearer to program.
I'm not convinced that the methods they use to make the most of their XCore tiles will have any relevance to x86 multi-core optimisation.
x86 is a big problem, I agree. It doesn't directly lend itself to the same sorts of techniques used in a newly developed architecture like the XCore, but I'm sure there are lessons to be learned.
Something that really amuses me is how modern x86 CPUs decode x86 instructions into internal RISC-like codes - it's almost as though we should have ditched x86 a while ago, but it had so much momentum that we couldn't.
Why haven't PCs moved on to new architectures, though? Surely the market is ripe for a serious rethink of the way things are done. With Moore's law getting closer and closer to failing, shouldn't people stop and ask whether there's a better way?
(\___/) (\___/) (\___/) (\___/) (\___/) (\___/) (\___/)
(='.'=) (='.'=) (='.'=) (='.'=) (='.'=) (='.'=) (='.'=)
(")_(") (")_(") (")_(") (")_(") (")_(") (")_(") (")_(")
This is bunny and friends. He is fed up waiting for everyone to help him out, and decided to help himself instead!
Vista broke about 0.5% of things in order to introduce a 'better way', and people were still up in arms about it. Now imagine breaking 99.5% of things to introduce a 'better way'. Ain't going to happen.
The experience of the 64-bit generation also backs that up: x86 is designed around 16- and 32-bit registers/operations. When it came to designing the 64-bit side of things, Intel tried to create a whole new architecture (Itanium, I think). It failed miserably, especially when AMD came along and said 'sod this, let's just replicate the entire 32-bit instruction set alongside some 64-bit stuff', and thus created x86-64, or AMD64, or whatever. It took off like a flash and that's what we use today.
Also, no one seems to have told them about the law of diminishing returns.
The idea is that exchanging data between threads (or synchronising them) costs time. So if you're trying to get them all to solve the same problem, you end up wasting a greater percentage of the CPU time on synchronisation, so the more cores you add, the smaller the advantage.
Now, if you're doing something like a Monte Carlo sim, whereby you're just plugging different (random) numbers into the same formula in comparative isolation (that is to say, if the formula takes an hour to evaluate, who cares about the 10 ms for thread synchronisation?), then this could be useful.
Regrettably, all too often problems are hard to parallelise like that, and trying to do this in an imperative language like C isn't going to make things easier.
throw new ArgumentException (String, String, Exception)
Perhaps that's something we could put to them, TheAnimus?
In fact, I could find out whether XMOS would be willing to answer questions fielded by the HEXUS.community in a subsequent article.
I know there are quite a few programmers around these parts, as well as EEs, so I'm sure we could come up with some challenging questions.
What say ye?
Would be interesting!
I'm sure they know full well about the diminishing returns, and it would be intriguing to hear their answer.
Hi,
Threads in a core can complete a barrier synchronisation in 20 ns; that is, 20 ns after the last thread joins a synchronisation, all threads are running again.
Between cores there is extra latency involved: around 50 ns between two cores inside a chip, and 100 + 100 ns per hop between cores that are not on the same chip.
So a complete synchronisation between two threads will take around 400-1000 ns; you will need to hide this latency if you want to keep all nodes busy all the time. In some cases you can hide it simply by running more than four threads on a node: if you run eight threads they will all run at 50 MIPS, and when one thread is blocked for a microsecond, the other threads will speed up to 57 MIPS for that period.
Yes! XC only lets you express the parallelism explicitly - I would love somebody to port some different languages. Did I hear you volunteer?
Cheers,
Henk
Permission granted.
You're half right. [I hope I haven't mis-quoted you here.]
In the sense that the architecture hasn't migrated down to consumer level yet (and may never do), you're right.
(This assumes, of course, that Intel WANTED it to channel down.)
However, the fact that there was a follow-on (i.e. Itanium 2), and that more Itaniums are planned, suggests it has had success in some markets.
The 16-CPU board wouldn't fit inside all that many desktop cases, and certainly not laptops.
I appreciate that it was a demonstration, but it was a demo the size of most motherboards!
The price can be estimated, given that the chips cost $33 each: 16 x $33 = $528, plus $50 for the PCB, and by the time it reaches the customer the price would be about $2,000 - $2,500.
Leon