Grokking Lossless Data Compression

Abstract

Emerging deep learning/machine learning and cloud-native applications at data center scale demand terabytes of data flowing across the storage/memory hierarchy, straining interconnect bandwidth and component capacities. The industry has responded with a wide range of solutions: process node shrinks, higher-capacity devices, new tiers, innovative form factors, new interconnect technologies and fabrics, new compute architectures, new algorithms and more, all creatively leveraging storage/memory tiering.

New paradigms like computational storage/memory accelerator offloads are under intense exploration to process data where it resides and ease the movement of exponentially growing data. At the same time, progress has hit the proverbial wall: practical hurdles limit scalability at every level of the memory hierarchy. On-die SRAM scaling appears to have stalled completely going from 5 nm to 3 nm, limiting processor IPC (instructions per cycle). Main memory bandwidth per processor core has grown dramatically more slowly than compute FLOPs. New memory tiers like CXL memory dramatically increase capacity per core, but at the expense of latency and the need for all-new infrastructure. QLC SSDs provide terabytes of capacity in a single device, but are limited by endurance and overprovisioning requirements. Staying within established power, thermal and cost budgets, at each level of the hierarchy and at the system envelope level, is critical to easing new technology introductions.

To address these challenges, data center customers, component manufacturers and researchers alike are investigating, or have already implemented, innovations such as lossless compression at various levels of the hierarchy to increase capacity, enhance effective bandwidth and stay within cost and power budgets. Compression requires more than just an algorithmic implementation: compaction, management and software compatibility are critical considerations for wide deployment at scale.
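To make the compaction point concrete, the toy sketch below (with entirely made-up sizes) shows why the algorithm alone is not enough: if each compressed page still occupies a fixed 4 KiB slot, no capacity is actually reclaimed until the variable-size outputs are packed together.

```python
# A toy illustration, with made-up numbers, of why compaction matters:
# compressed pages that still occupy fixed 4 KiB slots save no capacity
# until they are packed together.
SLOT = 4096                                   # bytes per fixed page slot
compressed_sizes = [1700, 2300, 900, 3100]    # hypothetical compressed sizes of four pages

raw = len(compressed_sizes) * SLOT            # 16384 B of original data
no_compaction = len(compressed_sizes) * SLOT  # each compressed page still pins a full slot
compacted = sum(compressed_sizes)             # 8000 B once pages are packed back to back

print(f"raw: {raw} B, without compaction: {no_compaction} B, compacted: {compacted} B")
```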

One size does not fit all: choices must be made among various industry-standard and proprietary algorithms operating at varying granularities (cache line, page or file). As CXL memory-semantic SSDs emerge, compression technology must integrate with CXL.io and CXL.mem semantics, and dynamic capacity has to be addressed. Offload accelerators are now available in several platform ingredients, but choices must be made carefully among processor-integrated accelerators, cores on SmartNICs ("DPUs", "IPUs"), IP/firmware integrated into SSD and CXL controllers/switches, AFUs (Accelerator Functional Units) on specialized FPGAs, and purely software offloads; a minimal software baseline is sketched below.
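As a baseline for weighing these offload options, the following sketch shows what a purely software offload at page granularity can look like, using Python's built-in zlib (DEFLATE) purely as an illustration; the page size, compression level and synthetic data are assumptions, and a hardware offload would replace the zlib call with an accelerator submission. Compressing each page independently preserves random access at page granularity, which is part of what makes fine-grained compression attractive for memory tiers.

```python
# A minimal sketch of a purely software offload: compressing data at 4 KiB
# page granularity with DEFLATE (zlib). The page size, compression level and
# synthetic data are illustrative assumptions, not parameters of any product
# discussed in this session.
import zlib

PAGE_SIZE = 4096  # bytes; "page" is one of the granularities mentioned above

def compress_pages(buffer: bytes) -> list[bytes]:
    """Split a buffer into fixed-size pages and compress each independently."""
    pages = [buffer[i:i + PAGE_SIZE] for i in range(0, len(buffer), PAGE_SIZE)]
    return [zlib.compress(page, 1) for page in pages]  # level 1 favors speed over ratio

if __name__ == "__main__":
    data = b"ABCD" * (8 * PAGE_SIZE // 4)  # 8 pages of highly compressible synthetic data
    out = compress_pages(data)
    total = sum(len(c) for c in out)
    print(f"{len(data)} B -> {total} B ({len(data) / total:.1f}x over {len(out)} pages)")
```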

In this panel session, we will explore the needs, opportunities, challenges and implications of emerging data compression techniques and accelerators for storage and memory technologies through the diverse viewpoints of ecosystem participants, including an SoC architect, technologists in the storage/memory device and controller space, an academic researcher in the storage and systems domain, and a hardware IP provider. We will simulate the type of discussion that typically takes place between technologists, architects and end customers to meet design and TCO requirements, as well as requirements to integrate into existing kernel and application software stacks. Attendees will have an opportunity to ask questions of the panel and share their collective industry/research insights.

Nilesh Shah
ZeroPoint Technology AB