SSD Firmware Resilience: Approaches & Challenges to protect against emerging threats

Abstract

About 80% of enterprises have experienced at least one firmware attack in the last two years.* What is firmware resilience, and how does it apply to SSDs to address these threats?

- It is not a new concept. Elements of resiliency have been around for years (multiple FW slots/copies, etc.).
- The Platform Firmware Resiliency Guidelines (NIST SP 800-193) were published in 2018.
- Industry momentum around resiliency has been seen from Intel**, Lattice*** and PC OEMs.
- It is still an emerging area, with proprietary implementations from different vendors.

As an SSD solution provider, we want to help define a unified solution in the area of SSD recovery.

- Possible approach (and challenges) in the client storage space, where:
  - All firmware binaries are immutable code and hence are always RSA digitally signed. They are subjected to RSA public-key signature verification before being used.
  - Multiple firmware slots, with copies within each slot, serve as redundant firmware images for SSD auto recovery.
  - When SSD auto recovery fails, a mechanism is required for the host to discover that host-assisted recovery is needed (i.e. via either PCIe or SMBus).
  - When in host-assisted recovery mode, the device supports a limited set of admin commands for recovery (i.e. Identify, FW download/commit, Get Log Page).
  - User intervention is required to initiate a host-assisted, semi-automatic recovery (typically via a BIOS menu option).
  - Using a golden image, possibly saved in the BIOS, the SSD is expected to be restored to the manufacturer's default state without preserving user data or any security parameters.
- Possible approach (and challenges) in the data center storage space, where:
  - Recovery of device FW SHALL occur over a management interface.
  - Recovery of device FW SHALL only occur after a catastrophic event (e.g. no device self-recovery mechanism succeeds, FW attestation failure, etc.).
  - The device SHALL advertise its need for recovery (e.g. per the OCP Recovery specification).
  - The FW image used for recovery SHALL be provided to the device over the management interface.
  - When a device is in need of recovery, the MANDATORY goal is device recovery. An OPTIONAL goal is recovery of user data.
- Call to action to address perceived challenges. Examples include:
  - How do we maintain a “golden” recovery image? Is there really such a thing?
  - What are the industry-defined interfaces to invoke recovery?
  - What other gaps exist in industry specifications around recovery?

* Based on Microsoft’s Security Signals report published in 2021: https://www.microsoft.com/security/blog/2021/03/30/new-security-signals-...
** Intel Platform Firmware Resiliency (Intel® PFR): https://www.intel.com/content/www/us/en/products/docs/processors/xeon/pl...
*** Lattice Universal Platform Firmware Resiliency (PFR) – Servers implementation: https://www.latticesemi.com/pfr
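The client-storage auto-recovery flow described in the abstract (verify each redundant copy before use, and fall back to a host-assisted recovery mode that accepts only a few admin commands) can be sketched as follows. This is a minimal illustration under stated assumptions, not an SSD vendor implementation: the slot/copy search order, the names `select_boot_image`, `RECOVERY_ALLOWED_COMMANDS` and `is_command_allowed`, and the command identifiers are all hypothetical. HMAC-SHA256 stands in for the RSA public-key signature check only so the example runs with the Python standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Sequence


class Mode(Enum):
    NORMAL = "normal"
    HOST_ASSISTED_RECOVERY = "host_assisted_recovery"


@dataclass
class FirmwareCopy:
    image: bytes
    signature: bytes


@dataclass
class FirmwareSlot:
    copies: List[FirmwareCopy]  # redundant copies of the same image


@dataclass
class BootResult:
    mode: Mode
    slot: Optional[int] = None
    copy: Optional[int] = None


# A verifier returns True only if `signature` is valid for `image`.
# On a real SSD this would be RSA public-key verification in immutable boot code.
Verifier = Callable[[bytes, bytes], bool]

# Hypothetical identifiers for the recovery-mode admin commands named in the
# abstract (Identify, FW download/commit, Get Log Page).
RECOVERY_ALLOWED_COMMANDS = frozenset(
    {"identify", "firmware_image_download", "firmware_commit", "get_log_page"}
)


def select_boot_image(slots: Sequence[FirmwareSlot], verify: Verifier) -> BootResult:
    """Boot the first copy that passes verification; otherwise enter recovery mode.

    Assumption: slots are ordered by boot priority.
    """
    for slot_idx, slot in enumerate(slots):
        for copy_idx, fw in enumerate(slot.copies):
            if verify(fw.image, fw.signature):
                return BootResult(Mode.NORMAL, slot_idx, copy_idx)
    # Auto recovery failed: no verified image remains, so wait for the host.
    return BootResult(Mode.HOST_ASSISTED_RECOVERY)


def is_command_allowed(mode: Mode, command: str) -> bool:
    """Gate admin commands while in host-assisted recovery mode."""
    if mode is Mode.HOST_ASSISTED_RECOVERY:
        return command in RECOVERY_ALLOWED_COMMANDS
    return True  # simplification: normal mode accepts every command


# --- Demo only: HMAC stands in for RSA so the sketch needs only the stdlib ---
DEMO_KEY = b"demo-key"


def demo_sign(image: bytes) -> bytes:
    return hmac.new(DEMO_KEY, image, hashlib.sha256).digest()


def demo_verify(image: bytes, signature: bytes) -> bool:
    return hmac.compare_digest(demo_sign(image), signature)


if __name__ == "__main__":
    good = FirmwareCopy(b"fw-v2", demo_sign(b"fw-v2"))
    corrupt = FirmwareCopy(b"fw-v2-corrupted", good.signature)
    result = select_boot_image([FirmwareSlot([corrupt, good])], demo_verify)
    print(result.mode, result.slot, result.copy)
```

Checking every copy in every slot before giving up is what lets a single corrupted copy be tolerated without host involvement; only when all copies fail verification does the device need to advertise that host-assisted recovery is required.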

Gamil Cain
Solidigm