SpiNNaker's million processor cores can model a billion bio-neurons in real-time.
An interesting side note - I see the EU flag on there. If it's partially funded by the EU I wonder what the contract says? I doubt us leaving the EU was a consideration when it was written but I wonder if the University will have to buy out the EU's stake in the project?
Maybe we'll just be utter dicks and send them 3600 SATA cables to the Commission President? Each in a separate envelope, obviously. It's like paying someone in pennies.
The European Human Brain Project in a country that has no brain...
What a waste of money and time!! They used ARMs instead of Threadrippers, OMG
The chip was taped out in 2010, so GPU computing was in its infancy when this was designed. At this point I have to wonder what the system brings to the table, given GPUs now have fp16 MAC instructions and most research happens in TensorFlow, which already has optimised hardware in the phones in people's pockets as well as in Nvidia's new GPUs.
Edit: They aren't the only ones throwing CPUs at the problem though...
https://bit-tech.net/news/tech/cpus/...ccelerators/1/
Last edited by DanceswithUnix; 06-11-2018 at 09:55 AM.
I was wondering how many GPUs it would take to do this level of performance, but then I noticed it says it can perform 200 trillion operations per second.
An Nvidia V100 PCIe card claims 112 Tflop of Tensor performance. So I really have to wonder what 10 racks of ARM9 cores costing millions can do that 2 cards in a server costing about 30K can't manage. If Nvidia tensor cores can't run the model they are using here, perhaps Radeon Instinct MI25 cards could, with 24 Tflop of fp16 compute per card, so you'd need 9 cards to exceed the ARM system.
I'm guessing the ARM9 core was chosen for its DSP extension, so those 200 trillion ops are likely integer, not floating point (I haven't bothered looking up the instruction set).
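As a quick sanity check on the card counts above, here's the arithmetic on the figures quoted in this thread (these are the vendor/article claims, not measurements):

```python
import math

spinnaker_ops = 200e12        # article's claimed 200 trillion ops/s
v100_tensor_flops = 112e12    # Nvidia V100 PCIe Tensor claim
mi25_fp16_flops = 24e12       # Radeon Instinct MI25 fp16 claim, as quoted

# Cards needed to match or exceed the claimed SpiNNaker throughput
v100_cards = math.ceil(spinnaker_ops / v100_tensor_flops)
mi25_cards = math.ceil(spinnaker_ops / mi25_fp16_flops)

print(v100_cards)  # 2
print(mi25_cards)  # 9
```

Which is where the "2 cards in a server" and "9 cards" figures come from.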
Reply to DanceswithUnix: Very well said. I strongly believe the choice of CPU was based on internal EU political motives rather than technical ones. Reinventing the wheel is a common practice in European funding.
@DwU: I think it may be more to do with flexibility and the number of connections; Tensor processors and other GPGPUs are essentially ASICs, whereas this is more like a really big FPGA.
This is a custom ASIC. They taped it out years ago, so probably at 40nm back then. Connections are largely where I think this beast falls down: connecting two GPUs together isn't hard, but connecting 67 thousand ARM chips, each with 15 active cores, is either hard or slow (edit: or poorly connected with low flexibility, so you can only exchange pulses with surrounding neurons).
Sorry, I meant the whole thing, not each CPU.
And the connections don't necessarily need to be fast. Although speed helps the overall time it takes to run models, it's more, I guess, about being able to test what works and what doesn't before designing a smaller, simpler version.
OK, I think I get what you mean, though deciding to rewire 67 thousand chips because they didn't get it right last time would be harsh.
Neurons are all about connections though. They seem to have made the hardest part of the problem as difficult as possible. Each CPU needs to get a feed from perhaps 100 other cores, and those feeds need to arrive fast. On a GPU that's all memory accesses, so you're concerned with cache optimisation, but that's got to be better than physical layout.
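To illustrate the "on a GPU that's all memory accesses" point: each neuron's fan-in becomes a gather from memory followed by a weighted sum. A minimal NumPy sketch (the sizes and the ~100-input figure are illustrative, not taken from the SpiNNaker design):

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 1000
fan_in = 100  # ~100 inputs per neuron, as suggested above

# For each neuron, the indices of its ~100 source neurons and a weight per input.
sources = rng.integers(0, n_neurons, size=(n_neurons, fan_in))
weights = rng.standard_normal((n_neurons, fan_in))

# One simulation tick: which neurons fired, then gather + weighted sum.
# On a GPU this gather is where cache/memory locality dominates performance.
spikes = rng.random(n_neurons) < 0.05
inputs = (weights * spikes[sources]).sum(axis=1)

print(inputs.shape)  # (1000,)
```

The gather `spikes[sources]` is the memory-access pattern in question: irregular reads whose cost depends on locality, not on physical wiring.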
I would imagine it's designed so they don't have to rewire everything if they didn't get it right; I would guess every chip can talk to every other chip. They don't go into much detail on the interconnect, other than to say it's a custom fabric and that the packets are small (40 or 72 bits) with a bisection bandwidth of over 5 billion packets/s.
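For scale, converting those quoted figures into bit rates (just arithmetic on the numbers above, assuming bit rate = packets/s × packet size):

```python
packets_per_s = 5e9  # quoted bisection bandwidth, packets/s

# Bisection bit rate for each quoted packet size, in Gbit/s
rates_gbit = {bits: packets_per_s * bits / 1e9 for bits in (40, 72)}

print(rates_gbit)  # {40: 200.0, 72: 360.0}
```

So with the larger 72-bit packets that's a bisection bandwidth in the region of 360 Gbit/s, though with tiny packets the packet rate matters more than the raw bit rate.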