You can never find a decent diagram when you need one. Have a look at page 15 of this document: http://www.amd.com/us-en/assets/cont...docs/40555.pdf
(the document is all about accessing memory on different sockets of a multi-socket system):
Both cores <-> linked to the System Request Interface (SRI) <-> linked to the Crossbar (XBar) <-> linked to the memory controller (MCT) and HyperTransport link(s). I'm sure even this is a bit simplistic. (AMD only put enough info into the diagram to help explain the document that contains it.)
Consider this: core0 reads address A0, so it gets the data from main memory and puts it in its cache. Then core1 reads address A0. Let's say core0 does not give the data to any other core (one of those loner cores that keeps itself to itself), so A0 is read from main memory again and put in core1's cache. Core0 updates its cache entry for address A0 with '10' ... core1 updates its cache entry for address A0 with '30'. Both then try to update main memory: what should A0 be, 10 or 30? Don't answer that! It just explains why there are so many books and documents on cache coherency theory. BTW, it's not just a cache problem: consider a DMA controller updating A0 after the cores have read the address.
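To make the A0 example concrete, here is a toy sketch in Python of two cores with private write-back caches and no coherency at all. The `Core` class, the `memory` dict and the method names are all made up for illustration; this is not how any real CPU is modelled, just the scenario above written out.

```python
# Toy model (hypothetical names): private write-back caches, NO coherency protocol.
memory = {"A0": 0}

class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}  # private cache: address -> value

    def read(self, addr):
        # On a miss, fetch from main memory; nobody else is told about it.
        if addr not in self.cache:
            self.cache[addr] = memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Only the private copy changes; main memory is updated later.
        self.cache[addr] = value

    def write_back(self, addr):
        memory[addr] = self.cache[addr]

core0, core1 = Core("core0"), Core("core1")
core0.read("A0")
core1.read("A0")
core0.write("A0", 10)
core1.write("A0", 30)

# The two cores now disagree about what A0 holds.
views = (core0.read("A0"), core1.read("A0"))

core0.write_back("A0")
core1.write_back("A0")
final = memory["A0"]  # whichever core writes back last wins

print(views, final)
```

In this sketch the outcome depends purely on the order of the write-backs, and core0 never finds out that its value was overwritten, which is exactly the problem a coherency protocol has to solve.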
Caches need updating so that they all agree, at any single point in time, on what the data in A0 is, so they will communicate with each other. I agree they do not have to supply data to each other, but if you read about cache coherency methods you will see that there are opportunities for cores/CPUs to do this.

In AM2 X2s, each core has access to 128 KB L1 Data, 128 KB L1 Instructions, and 512 KB or 1 MB L2. They have no access to any other core's caches. It has no way of knowing what the data is in the L2. As far as I know, it's impossible to share an L1 cache because the communication just to reach it would cause the core to wait at least one extra cycle. As far as I know, each core doesn't share anything with other cores besides the die.
At a very simple level: core0 has read A0; when core1 reads A0, core0 is 'snooping'* the bus, sees the read of A0 and invalidates its cache entry. It's a poor implementation (and has probably never been implemented; it's a theoretical solution that creates other problems). There are more complicated, and more useful, methods; see the web. AMD uses MOESI. I cannot find a document on AMD's site, but Google found this: http://www.techreport.com/reviews/20...5/index.x?pg=2
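The simple invalidate-on-snooped-read idea can be sketched like this in Python. All names (`Bus`, `Core`, `snoop_read`) are invented for the example. I have also assumed that a snooping core writes a modified line back to memory before dropping it, which the description above does not spell out. This is the naive scheme, not MOESI.

```python
# Toy model (hypothetical names) of "invalidate when you snoop someone else's read".
class Bus:
    def __init__(self):
        self.memory = {"A0": 0}
        self.cores = []

    def read(self, requester, addr):
        # Every other core sees the read on the bus and gives up its copy.
        for core in self.cores:
            if core is not requester:
                core.snoop_read(addr)
        return self.memory[addr]

class Core:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.cache = {}  # address -> [value, dirty flag]
        bus.cores.append(self)

    def read(self, addr):
        if addr not in self.cache:  # miss: the read goes out on the bus
            self.cache[addr] = [self.bus.read(self, addr), False]
        return self.cache[addr][0]

    def write(self, addr, value):
        self.read(addr)  # get the line first, which invalidates other copies
        self.cache[addr] = [value, True]

    def snoop_read(self, addr):
        line = self.cache.pop(addr, None)  # invalidate our entry if we have one
        if line is not None and line[1]:
            # Assumption: flush modified data so the requester reads the latest value.
            self.bus.memory[addr] = line[0]

bus = Bus()
core0, core1 = Core("core0", bus), Core("core1", bus)
core0.read("A0")
core1.read("A0")        # core0 snoops this and invalidates its copy
core0.write("A0", 10)
core1.write("A0", 30)   # core0's 10 is flushed, then replaced by 30
result = core0.read("A0")
print(result)
```

Only one cache ever holds A0 in this sketch, so the cores cannot disagree, but two cores that merely read the same address keep evicting each other's copy. That is one of the "other problems" and why real protocols track more states per line.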
Interested to see what you can find out. Can AMD explain the xbitlabs article?
* sometimes referred to as sniffing