
Thread: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

  1. #17
    Not a good person scaryjim's Avatar
    Join Date
    Jan 2009
    Location
    Gateshead
    Posts
    15,196
    Thanks
    1,231
    Thanked
    2,291 times in 1,874 posts
    • scaryjim's system
      • Motherboard:
      • Dell Inspiron
      • CPU:
      • Core i5 8250U
      • Memory:
      • 2x 4GB DDR4 2666
      • Storage:
      • 128GB M.2 SSD + 1TB HDD
      • Graphics card(s):
      • Radeon R5 230
      • PSU:
      • Battery/Dell brick
      • Case:
      • Dell Inspiron 5570
      • Operating System:
      • Windows 10
      • Monitor(s):
      • 15" 1080p laptop panel

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by Corky34 View Post
    … I get where you're coming from but I'm not sure I agree, sorry. ...
    *shrug* I have no problem with you not agreeing, so there's no need to apologise for that

    OTOH, the results clearly show that there's no cache latency increase @ 8MB step (unlike the earlier Zen designs), and that the latency @ 32MB step is higher than at 64MB step. I've thrown my best attempts at explaining those results into the ring, and I'm happy for people to offer alternate theories, or even to pick holes in my logic if I've missed something. It gets a bit tiresome if people just turn up and go "well, you might be wrong" without offering any reason, though...

  2. #18
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    12,986
    Thanks
    781
    Thanked
    1,588 times in 1,343 posts
    • DanceswithUnix's system
      • Motherboard:
      • Asus X470-PRO
      • CPU:
      • 5900X
      • Memory:
      • 32GB 3200MHz ECC
      • Storage:
      • 2TB Linux, 2TB Games (Win 10)
      • Graphics card(s):
      • Asus Strix RX Vega 56
      • PSU:
      • 650W Corsair TX
      • Case:
      • Antec 300
      • Operating System:
      • Fedora 39 + Win 10 Pro 64 (yuk)
      • Monitor(s):
      • Benq XL2730Z 1440p + Iiyama 27" 1440p
      • Internet:
      • Zen 900Mb/900Mb (CityFibre FttP)

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by Corky34 View Post
    Let me see if I can put it another way: if you're measuring how long it takes to read/write a block of memory, you're not measuring latency, you're measuring access time. To measure latency you need to send a ping from one place to another; otherwise you're just measuring how long it takes to read/write X amount of data.

    It's the difference between saying I can download a 40MB file in 1min and saying I've got 1500ms latency to the server I'm downloading from.
    I wonder if you are interpreting the memory size on the X axis of that graph as some sort of download size? It is a working set size: you thrash memory within that limit, and I would hope these days it would be a pseudo-random pattern, or you are just measuring the prefetcher.
    So at the 8KB mark the program should be measuring how many random accesses it can perform within an 8KB block in a second, which can be satisfied from the L1 D$, so it measures the latency of that. Random accesses within a 128KB block will hardly ever hit the L1 cache, so you end up with a measurement of L2 performance. Yes, there is an I/O queue involved, like in your link about hard drives, but if you aren't hammering the memory read queue then OoO processing will rearrange the instruction order to hide the latency. That makes it uninteresting, as you tend to only measure bottlenecks that hurt performance, and it also makes it pretty hard to measure. Besides, it isn't a long queue compared with a hard drive, where I/O queues can lead to seconds of overall response time.

    It is a simple and useful test which, as scaryjim says, highlighted the inter-CCX latency of current Ryzen chips. If that inter-CCX bump is gone, from the view of a single core, it is an interesting point.

    Edit: Another way to think of it: caches are statistical beasts, so practically you can't send a single ping, as you would have no idea what you are pinging. It requires repeated accesses to find how well the main memory latency of those accesses is hidden from the CPU. Also, your first access has no need to be in cache, so you are probably grabbing it from main memory. The second access you are probably grabbing from cache, but you don't know for sure.
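
A minimal sketch of the kind of working-set test described above, assuming a pointer-chasing design. The thread does not say how the UserBenchmark test is actually implemented, and `build_chain` and `chase` are hypothetical names. Each access depends on the previous one, so out-of-order execution cannot overlap them, and the random single-cycle ordering is there to defeat the prefetcher. In Python the interpreter overhead swamps the cache effects, so this shows the method only; a real test would be written in C or assembly.

```python
import random
import time


def build_chain(slots, seed=0):
    """Sattolo's algorithm: a random permutation forming one cycle of length `slots`."""
    rng = random.Random(seed)
    nxt = list(range(slots))
    for i in range(slots - 1, 0, -1):
        j = rng.randrange(i)  # 0 <= j < i keeps everything in a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps):
    """Follow the chain; every load depends on the result of the one before it."""
    idx = 0
    start = time.perf_counter()
    for _ in range(steps):
        idx = nxt[idx]
    elapsed = time.perf_counter() - start
    return idx, elapsed / steps * 1e9  # nanoseconds per dependent access


if __name__ == "__main__":
    # Working-set sizes are in list slots, not bytes (an assumption of this sketch).
    for slots in (1 << 10, 1 << 14, 1 << 18, 1 << 20):
        chain = build_chain(slots)
        _, ns = chase(chain, 1_000_000)
        print(f"{slots:>8} slots: {ns:.1f} ns per access")
```

Plotting the per-access time against working-set size is what produces the stepped curve being argued about: each step marks the point where the working set no longer fits in a given cache level.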
    Last edited by DanceswithUnix; 27-01-2019 at 09:19 AM.

  4. #19
    Senior Member Corky34
    Join Date
    Dec 2013
    Posts
    3,526
    Thanks
    504
    Thanked
    468 times in 326 posts

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    It is some sort of download size though, isn't it? If we run a process that requires 8KB of memory to work out the answer to a problem, then that data needs to be downloaded (and uploaded) to that 8KB of memory. How the caches work is slightly missing the point I'm trying to make, though: the 'problem' of inter-die latency that people identified with Zen/Zen+ isn't actually a problem IMO. It was identified as a 'problem' because people were looking for what caused the degradation of performance in some scenarios.

    Basically I'm saying that if it wasn't for the crappy way Windows dealt with Zen/Zen+ we wouldn't/shouldn't have known about the 'problem' of inter-die latencies, as we should have seen similar results to those on an OS that dealt with it correctly, such as Linux. I'm also saying that, from the perspective of Zen 2, design changes have probably helped to not confuse Windows.

    My guess is that the latencies are still there and it's just that Windows is now doing what it should have been doing all along with Zen/Zen+. If someone conducted a ping test from one core on one die to another core on another die, we'd still see an increase in latency, as PCPer demonstrated. However, as the latencies should never have been 'visible' to the OS in the first place, it's all a bit academic IMO.
    Last edited by Corky34; 27-01-2019 at 06:16 PM.

  5. #20
    Not a good person scaryjim's Avatar

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by Corky34 View Post
    … Basically I'm saying that if it wasn't for the crappy way Windows dealt with Zen/Zen+ we wouldn't/shouldn't have known about the 'problem' of inter-die latencies ...
    Hmmm... perhaps "problem" is too strong a term, but the design brings with it inherent performance issues. Windows has occasional issues with the more complex NUMA patterns on Threadripper, but the simple fact is that Ryzen's split L3 cache means that it doesn't perform like a 16MB cache processor.

    This is nothing to do with core-to-core latencies, which are a different thing with a different set of performance issues; it's about how much cache each core can access, and at what speeds. Ryzen advertises itself as a 16MB cache processor, but each core can only access 8MB at full speed, and the remainder is barely any faster than main memory. If it also tells the OS it's a 16MB cache processor and all cores can access all of the cache, then tasks which like a lot of fast cache are going to drag on Ryzen.

    That's the main reason I'm interested in the lack of that cache slow-down at 8MB: it shows AMD have done something. That's going to impact IPC for Zen 2 Ryzen CPUs with up to 8 cores, as they do have full-speed access to 16MB of cache. Of course, personally I'm hoping they've done something clever, rather than just switching to big 8-core CCXes...

  6. #21
    Senior Member Corky34

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by scaryjim View Post
    ...and the remainder is barely any faster than main memory. If it also tells the OS it's a 16MB cache processor and all cores can access all of the cache, then tasks which like a lot of fast cache are going to drag on Ryzen.
    But the point I'm trying to make is that, yes, you're correct to say transferring data from an 8MB chunk that's on the opposite die is slower; however, that only affects performance because of the way that's handled by Windows (IMO). The reason I say that is because we're only talking about adding 100ns to the already really low 40ns for pinging between cores on the same die; when we compare that to Intel, we're looking at inter-thread ping times of around 80-90ns.

    It's why I suspect we could be talking more about, as you alluded to, a speed/bandwidth issue with the inter-die link, combined with the awful way Windows handles Ryzen, when looking at performance problems. Adding an extra 100ns shouldn't cause that many issues. (Before I get told off for getting the latency numbers wrong, I'd like to say that the UserBenchmark db is way off in its latency numbers IMO.)

    Quote Originally Posted by scaryjim View Post
    That's the main reason I'm interested in the lack of that cache slow-down at 8MB: it shows AMD have done something. That's going to impact IPC for Zen 2 Ryzen CPUs with up to 8 cores, as they do have full-speed access to 16MB of cache. Of course, personally I'm hoping they've done something clever, rather than just switching to big 8-core CCXes...
    Yeah, I still don't see the 8-core CCX being a thing. I suspect that it's simply because Windows is no longer getting confused over whether it should be treating the CPU (as a whole package) as a NUMA or UMA node. That, and what I suspect is the switch over to PCIe 4.0 (yes, I know that's reportedly only used for off-package communication, but I've got a sneaky feeling that's somehow linked to inter-die communications, as switching to 4.0 doesn't seem to be something there's much need for currently).
    Last edited by Corky34; 28-01-2019 at 09:21 AM.

  7. #22
    root Member DanceswithUnix's Avatar

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by Corky34 View Post
    It is some sort of download size though, isn't it?
    Not really, no. I can see how you can see it like that, but in doing so you are putting your head in the wrong place which will lead you down the wrong path. But...

    How the caches work is slightly missing the point I'm trying to make
    arguing over a cache latency measurement is probably not helping then

    Still, NUMA drivers can only work with the underlying hardware which is what that cache benchmark is measuring. If Windows could detect that the thread was hammering L3 on the other CCX the only thing it could do is migrate the thread to the other CCX, in doing so it performs a context switch and invalidates the L1 and L2 caches of both cores which is horribly expensive. That was shown in the recent article where MS got their NUMA code wrong, it wasn't that they were placing threads badly as that would only cause a mild impact on performance. AIUI they were moving threads around, that can easily cause the halving in performance reported.

    But when the benchmark above is measuring the L3 cache it is thrashing *all* of it. There is no optimum CCX to be in, the system just has to wear it.

    we're only talking about adding 100ns
    Whoah there! ... and this is what cache is all about so let's do the maths.
    100ns, at 4GHz we get 4 clocks per ns. So that's 400 clock cycles. A modern core can execute three instructions per clock (Bulldozer was slated for only averaging two) so 100ns is the time it would ideally take to execute 1200 instructions. Can't remember the Ryzen reservation size, but I think it is around the 300 mark? So 100ns is massively beyond what out of order instructions can work around and the CPU is halted on a dependent read. On the upside, you just gave the core's second thread plenty of time to use up
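
The arithmetic above, written out as a small sketch. The 3 instructions per clock and the roughly 300-entry out-of-order window are the poster's round figures from the post, not measured values, and `stall_cost` is a hypothetical helper name.

```python
def stall_cost(latency_ns, clock_ghz, ipc):
    """Return (clock cycles, ideal instruction slots) covered by one stalled read."""
    cycles = latency_ns * clock_ghz  # 1 GHz is one cycle per nanosecond
    instructions = cycles * ipc      # slots the core could ideally have filled
    return cycles, instructions


cycles, instructions = stall_cost(latency_ns=100, clock_ghz=4.0, ipc=3)
window = 300  # poster's rough recollection of the out-of-order window size

print(f"{cycles:.0f} cycles, {instructions:.0f} instructions")  # 400 cycles, 1200 instructions
print(f"{instructions / window:.1f}x the assumed window")       # 4.0x the assumed window
```

On those figures a 100ns miss covers about four times what the out-of-order window could hide, which is why the core ends up stalled on the dependent read.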

  8. #23
    Senior Member Corky34

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by DanceswithUnix View Post
    Not really, no. I can see how you can see it like that, but in doing so you are putting your head in the wrong place which will lead you down the wrong path. But...
    Bah, it's OK, my head's so used to being there I'm starting to think it's set up home.

    Quote Originally Posted by DanceswithUnix View Post
    arguing over a cache latency measurement is probably not helping then
    Wasn't me who started this.

    Quote Originally Posted by DanceswithUnix View Post
    Still, NUMA drivers can only work with the underlying hardware which is what that cache benchmark is measuring. If Windows could detect that the thread was hammering L3 on the other CCX the only thing it could do is migrate the thread to the other CCX, in doing so it performs a context switch and invalidates the L1 and L2 caches of both cores which is horribly expensive. That was shown in the recent article where MS got their NUMA code wrong, it wasn't that they were placing threads badly as that would only cause a mild impact on performance. AIUI they were moving threads around, that can easily cause the halving in performance reported.

    But when the benchmark above is measuring the L3 cache it is thrashing *all* of it. There is no optimum CCX to be in, the system just has to wear it.

    Whoah there! ... and this is what cache is all about so let's do the maths.
    100ns, at 4GHz we get 4 clocks per ns. So that's 400 clock cycles. A modern core can execute three instructions per clock (Bulldozer was slated for only averaging two) so 100ns is the time it would ideally take to execute 1200 instructions. Can't remember the Ryzen reservation size, but I think it is around the 300 mark? So 100ns is massively beyond what out of order instructions can work around and the CPU is halted on a dependent read. On the upside, you just gave the core's second thread plenty of time to use up
    And I get that, but when looking at Intel we're talking about 80ns to ping from one thread to another; with Ryzen on the same die we're talking about 40ns, and going to another die we're talking about adding 100ns to that (so 140ns total). You're correct in saying that's a long time for a CPU to be waiting and that it's more than out-of-order execution can work around, but that applies to Intel also. That's why I'm saying the latency of reading/writing to the L3 on another die, and the extra 100ns, shouldn't cause the sort of performance issues SJ said he was concerned about.

  9. #24
    Not a good person scaryjim's Avatar

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by Corky34 View Post
    ... when looking at Intel we're talking about 80ns to ping from one thread to another; with Ryzen on the same die we're talking about 40ns, and going to another die we're talking about adding 100ns to that (so 140ns total) ...
    Still don't get why you're obsessing about thread-to-thread pings when everyone else is talking about cache latency...?

  11. #25
    HEXUS.timelord. Zak33's Avatar
    Join Date
    Jul 2003
    Location
    I'm a Jessie
    Posts
    35,176
    Thanks
    3,121
    Thanked
    3,173 times in 1,922 posts
    • Zak33's system
      • Storage:
      • Kingston HyperX SSD, Hitachi 1Tb
      • Graphics card(s):
      • Nvidia 1050
      • PSU:
      • Coolermaster 800w
      • Case:
      • Silverstone Fortress FT01
      • Operating System:
      • Win10
      • Internet:
      • Zen FTC uber speedy

    Re: AMD Matisse 12C 24T CPU spotted in UserBenchmark db

    Quote Originally Posted by scaryjim View Post
    Still don't get why you're obsessing about thread-to-thread pings when everyone else is talking about cache latency...?
    I don't understand why he's going on about thread-to-thread pings either.

    Corky...you're getting some stuff here a tad confused I reckon.

    Cache latency is a different subject. Please try not to muddy the water on this heavy tech stuff, because it is very specific and a right bugger to follow. I had to re-read it all twice before I realised you'd skipped subject.
    HEXUS is trying its hardest to help people understand and grasp some pretty jolly techy stuff, and tbh we're lucky to have the brain power that we do. You're included in that... but it needs to be very specific sometimes and not side-step.

    Quote Originally Posted by Advice Trinity by Knoxville
    "The second you aren't paying attention to the tool you're using, it will take your fingers from you. It does not know sympathy." |
    "If you don't gaffer it, it will gaffer you" | "Belt and braces"
