Grokking Lossless Data Compression

Abstract

Emerging deep learning/machine learning and cloud-native applications at data center scale demand terabytes of data flowing across the storage/memory hierarchy, straining interconnect bandwidth and component capacities. The industry has responded with a wide range of solutions: process node shrinks, higher-capacity devices, new tiers, innovative form factors, new interconnect technologies and fabrics, new compute architectures, new algorithms and more, all creatively leveraging storage/memory tiering.

New paradigms like computational storage/memory accelerator offloads are under intense exploration to process data where it resides and ease the movement of exponentially growing data. At the same time, progress has hit the proverbial wall: practical hurdles limit scalability at every level of the memory hierarchy. On-die SRAM scaling appears to have stalled completely going from 5 nm to 3 nm, limiting processor IPC (instructions per cycle). Main memory bandwidth per processor core has grown dramatically more slowly than compute FLOPs. New memory tiers like CXL memory dramatically increase capacity per core, but at the expense of latency and the need for all-new infrastructure. QLC SSDs provide terabytes of capacity in a single device, but are limited by endurance and overprovisioning requirements. Staying within established power, thermal and cost budgets, at each level of the hierarchy and at the system envelope level, is critical to easing new technology introductions.

To address these challenges, data center customers, component manufacturers and researchers alike are investigating, or have already implemented, innovations such as lossless compression at various levels of the hierarchy to increase capacity, enhance effective bandwidth and stay within cost and power budgets. Compression requires more than just an algorithmic implementation: compaction, management and software compatibility are critical considerations for wide deployment at scale.
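To make the compaction point concrete, the toy sketch below (with entirely made-up sizes) shows why the algorithm alone is not enough: if each compressed page still occupies a fixed 4 KiB slot, no capacity is actually reclaimed until the variable-size outputs are packed together.

```python
# A toy illustration, with made-up numbers, of why compaction matters:
# compressed pages that still occupy fixed 4 KiB slots save no capacity
# until they are packed together.
SLOT = 4096                                   # bytes per fixed page slot
compressed_sizes = [1700, 2300, 900, 3100]    # hypothetical compressed sizes of four pages

raw = len(compressed_sizes) * SLOT            # 16384 B of original data
no_compaction = len(compressed_sizes) * SLOT  # each compressed page still pins a full slot
compacted = sum(compressed_sizes)             # 8000 B once pages are packed back to back

print(f"raw: {raw} B, without compaction: {no_compaction} B, compacted: {compacted} B")
```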

One size does not fit all: choices must be made among various industry-standard and proprietary algorithms operating at varying granularities (cache line, page or file). As CXL memory-semantic SSDs emerge, compression technology must integrate with CXL.io and CXL.mem semantics, and dynamic capacity has to be addressed. Offload accelerators are now available in several platform ingredients, but choices must be made carefully among processor-integrated accelerators, cores on SmartNICs ("DPUs", "IPUs"), IP/firmware integrated into SSD and CXL controllers/switches, AFUs (Accelerator Functional Units) on specialized FPGAs, and purely software offloads; a minimal software baseline is sketched below.
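As a baseline for weighing these offload options, the following sketch shows what a purely software offload at page granularity can look like, using Python's built-in zlib (DEFLATE) purely as an illustration; the page size, compression level and synthetic data are assumptions, and a hardware offload would replace the zlib call with an accelerator submission. Compressing each page independently preserves random access at page granularity, which is part of what makes fine-grained compression attractive for memory tiers.

```python
# A minimal sketch of a purely software offload: compressing data at 4 KiB
# page granularity with DEFLATE (zlib). The page size, compression level and
# synthetic data are illustrative assumptions, not parameters of any product
# discussed in this session.
import zlib

PAGE_SIZE = 4096  # bytes; "page" is one of the granularities mentioned above

def compress_pages(buffer: bytes) -> list[bytes]:
    """Split a buffer into fixed-size pages and compress each independently."""
    pages = [buffer[i:i + PAGE_SIZE] for i in range(0, len(buffer), PAGE_SIZE)]
    return [zlib.compress(page, 1) for page in pages]  # level 1 favors speed over ratio

if __name__ == "__main__":
    data = b"ABCD" * (8 * PAGE_SIZE // 4)  # 8 pages of highly compressible synthetic data
    out = compress_pages(data)
    total = sum(len(c) for c in out)
    print(f"{len(data)} B -> {total} B ({len(data) / total:.1f}x over {len(out)} pages)")
```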

In this panel session, we will explore the needs, opportunities, challenges and implications of emerging data compression techniques and accelerators for storage and memory technologies through the diverse viewpoints of ecosystem participants, including an SoC architect, technologists in the storage/memory device and controller space, an academic researcher in the storage and systems domain, and a hardware IP provider. We will simulate the type of discussion that typically takes place between technologists, architects and end customers to meet design and TCO requirements, as well as requirements to integrate into existing kernel and application software stacks. Attendees will have an opportunity to ask questions of the panel and share their collective industry/research insights.

Nilesh Shah
ZeroPoint Technology AB