Thread: HPE users: patch our SAS SSDs to quash permanent crash bug

  1. #1
    HEXUS.admin
    Join Date
    Apr 2005
    Posts
    31,709
    Thanks
    0
    Thanked
    2,073 times in 719 posts

    HPE users: patch our SAS SSDs to quash permanent crash bug

    Users should update the firmware to prevent crash bug occurring after 32,768 hours of use.

  2. #2
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    12,978
    Thanks
    778
    Thanked
    1,586 times in 1,341 posts
    • DanceswithUnix's system
      • Motherboard:
      • Asus X470-PRO
      • CPU:
      • 5900X
      • Memory:
      • 32GB 3200MHz ECC
      • Storage:
      • 2TB Linux, 2TB Games (Win 10)
      • Graphics card(s):
      • Asus Strix RX Vega 56
      • PSU:
      • 650W Corsair TX
      • Case:
      • Antec 300
      • Operating System:
      • Fedora 39 + Win 10 Pro 64 (yuk)
      • Monitor(s):
      • Benq XL2730Z 1440p + Iiyama 27" 1440p
      • Internet:
      • Zen 900Mb/900Mb (CityFibre FttP)

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Hopefully the patch doesn't make it fail at 65536 hours of use (which would be outside warranty)


    edit: Ooh, I speculated an overflow as soon as I saw the 32768 number, guess that makes me an expert!
    Last edited by DanceswithUnix; 27-11-2019 at 02:24 PM.

  3. Received thanks from:

    mtyson (27-11-2019)

  4. #3
    Senior Member
    Join Date
    May 2014
    Posts
    2,385
    Thanks
    181
    Thanked
    304 times in 221 posts

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Quote Originally Posted by DanceswithUnix View Post
    Hopefully the patch doesn't make it fail at 65536 hours of use (which would be outside warranty )


    edit: Ooh, I speculated an overflow as soon as I saw the 32768 number, guess that makes me an expert!
    That was immediately my first thought: that this is just a simple maximum-value overflow bug!

    Such a silly bug to have in 2019 xD

  5. #4
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    12,978
    Thanks
    778
    Thanked
    1,586 times in 1,341 posts

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Quote Originally Posted by Tabbykatze View Post
    That was immediately my first thought: that this is just a simple maximum-value overflow bug!

    Such a silly bug to have in 2019 xD
    It isn't usually the overflow that directly kills your code though, it is usually some secondary effect like using the resulting -32768 value from the overflow to search/index into a table which doesn't have any entries suitable for negative numbers. Given that power on hours isn't usually considered that important a metric I can imagine it not being that heavily tested either.

    OTOH, if it was something like using the top bit as a debug flag then someone needs to be taken out and shot
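
A minimal sketch of that secondary effect, for illustration only. Nothing here comes from HPE's firmware: the `WEAR_TABLE` lookup, its bucket size, and the function names are all hypothetical, and the wrap to -32768 assumes the usual two's-complement behaviour of a signed 16-bit value.

```python
def to_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range, two's-complement style."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# Hypothetical table with one entry per 4096 power-on hours (8 entries cover 0..32767).
WEAR_TABLE = ["new", "light", "light", "moderate", "moderate", "heavy", "heavy", "heavy"]


def wear_bucket(power_on_hours: int) -> str:
    """Look up a table entry from an hour count held in a signed 16-bit variable."""
    hours = to_int16(power_on_hours)
    index = hours // 4096
    # Python would quietly accept a negative index, so check explicitly to model
    # what an out-of-range table access means in firmware.
    if not 0 <= index < len(WEAR_TABLE):
        raise IndexError(f"hours={hours} gives table index {index}, which has no entry")
    return WEAR_TABLE[index]


if __name__ == "__main__":
    print(to_int16(32767), wear_bucket(32767))  # last valid hour
    print(to_int16(32768))                      # wraps to -32768
    try:
        wear_bucket(32768)
    except IndexError as err:
        print("lookup failed:", err)
```

The counter itself wraps without complaint; it is the lookup one step later that has nowhere to go.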

  6. Received thanks from:

    afiretruck (27-11-2019)

  7. #5
    Senior Member
    Join Date
    May 2014
    Posts
    2,385
    Thanks
    181
    Thanked
    304 times in 221 posts

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Quote Originally Posted by DanceswithUnix View Post
    It isn't usually the overflow that directly kills your code though, it is usually some secondary effect like using the resulting -32768 value from the overflow to search/index into a table which doesn't have any entries suitable for negative numbers. Given that power on hours isn't usually considered that important a metric I can imagine it not being that heavily tested either.

    OTOH, if it was something like using the top bit as a debug flag then someone needs to be taken out and shot
    Ha ha, chemical sheds and the ditches!

    It is very interesting that the drive is completely inoperable/irrecoverable when this value is hit, which definitely fits your logic of a secondary effect. Maybe the time is used in a SMART calculation, SMART crashes, and that takes the controller with it?

    Edit: to qualify my thought, the flipped sign bit would make the time negative, so any calculation using it, if unchecked, would just drop out of range. Why they're counting time using a signed 16-bit integer is a little bit odd...
    Last edited by Tabbykatze; 27-11-2019 at 04:31 PM.

  8. #6
    root Member DanceswithUnix's Avatar
    Join Date
    Jan 2006
    Location
    In the middle of a core dump
    Posts
    12,978
    Thanks
    778
    Thanked
    1,586 times in 1,341 posts

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Quote Originally Posted by Tabbykatze View Post
    Why they're counting time using a signed 16-bit integer is a little bit odd...
    Thinking about it, there is a good chance they aren't, and this isn't an overflow...

    Imagine you store that value in a word of flash, then every hour you erase the page it is in and re-write it with the new value one higher. That's 65535 writes to a page just to store one thing, where a page has an endurance in modern flash devices of about 3000 writes. Just to count.

    Now imagine you choose a 4KB page of flash, which is 32768 bits in total. On the first ever power-up you erase the page so all the bits are 1s. Every hour, you clear one bit. Flash is written by erasing an entire page to all 1 bits (each byte 0xff) and then clearing the bits you want cleared to get the value you want stored. So you can actually zero a bit in flash at any time without erasing first (flash programming fun fact!); you only need to erase to flip a zero back into a one. Now you get 3.7 years of counting hours (32768 hours) before you have to erase the page to count the next 3.7 years, so your 3000-erase endurance gets you about 11000 years of counting. Handling the 3.7 year boundary would take some careful testing though (with the number of cycles you had been through being stored elsewhere).

    That's probably how I would do it anyway, and given storage devices use a 4K filesystem page that fits nicely.

    Hmm, so now I don't think it is an overflow. Will have to hand my expert title back
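
A rough sketch of the bit-clearing counter described above, purely as an illustration of the idea and not a claim about how any real drive does it. The class and method names are made up, the page is modelled as a plain `bytearray`, and the erase-cycle count is kept in an ordinary attribute where a real device would have to store it elsewhere in flash.

```python
PAGE_BYTES = 4096
PAGE_BITS = PAGE_BYTES * 8          # 32768 bits, one per hour
HOURS_PER_YEAR = 24 * 365


class FlashPageCounter:
    """Counts hours by clearing one bit per hour in an erased (all 1s) page."""

    def __init__(self) -> None:
        self.page = bytearray(b"\xff" * PAGE_BYTES)  # freshly erased page
        self.erase_cycles = 0                        # assumed to live elsewhere

    def _cleared_bits(self) -> int:
        ones = bin(int.from_bytes(self.page, "little")).count("1")
        return PAGE_BITS - ones

    def _erase(self) -> None:
        # Erase is the only way to turn 0 bits back into 1 bits.
        self.page = bytearray(b"\xff" * PAGE_BYTES)
        self.erase_cycles += 1

    def tick(self) -> None:
        """Record one more hour."""
        used = self._cleared_bits()
        if used == PAGE_BITS:        # page full: the 32768-hour boundary
            self._erase()
            used = 0
        byte, bit = divmod(used, 8)
        self.page[byte] &= ~(1 << bit) & 0xFF  # clearing a bit needs no erase

    def hours(self) -> int:
        return self.erase_cycles * PAGE_BITS + self._cleared_bits()


if __name__ == "__main__":
    counter = FlashPageCounter()
    for _ in range(10):
        counter.tick()
    print(counter.hours())
    print(round(PAGE_BITS / HOURS_PER_YEAR, 1), "years per page")
```

The awkward part is the `used == PAGE_BITS` branch: it runs once every 32768 hours, so a mistake there would stay hidden for years of normal use.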

  9. #7
    Senior Member
    Join Date
    Apr 2016
    Posts
    772
    Thanks
    0
    Thanked
    9 times in 9 posts

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    someone found out, they had to do something about it... simple as that

  10. #8
    Long member
    Join Date
    Apr 2008
    Posts
    2,427
    Thanks
    70
    Thanked
    404 times in 291 posts
    • philehidiot's system
      • Motherboard:
      • Father's bored
      • CPU:
      • Cockroach brain V0.1
      • Memory:
      • Innebriated, unwritten
      • Storage:
      • Big Yellow Self Storage
      • Graphics card(s):
      • Semi chewed Crayola Mega Pack
      • PSU:
      • 20KW single phase direct grid supply
      • Case:
      • Closed, Open, Cold
      • Operating System:
      • Cockroach
      • Monitor(s):
      • The mental health nurses
      • Internet:
      • Please.

    Re: HPE users: patch our SAS SSDs to quash permanent crash bug

    Why don't they just build the dam higher? Or stop putting water in the drive full stop? Sounds a bit silly to me.
