In-SRAM Compute for Generative AI and Large Language Models

Abstract

The recent surge in generative artificial intelligence (GAI) has put more pressure on hardware vendors to reduce the carbon footprint of running power-hungry large language models (LLMs) in the datacenter. One way to lower the in-silicon power profile is to break the von Neumann bottleneck by tightly integrating traditional SRAM memory cells with interleaved programmable processors on the same die. We report on our progress in this area, in particular on leveraging recent open research in both mixed-precision mathematics and extreme low-bit quantization of deep learning model parameters and activations running on our custom "In-SRAM" processor.
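To make "extreme low-bit quantization" concrete, the sketch below shows one generic, textbook scheme: symmetric per-tensor quantization of weights and activations to 4-bit signed integers, an integer multiply-accumulate, and a single higher-precision rescale at the end. This is an illustrative assumption, not the method used on the In-SRAM processor; the function names (`quantize_symmetric`, `quantized_dot`) and the sample values are hypothetical.

```python
# Hypothetical sketch: generic symmetric per-tensor low-bit quantization.
# Assumption: a textbook scheme for illustration, not GSI's In-SRAM method.

def quantize_symmetric(values, bits):
    """Map floats to signed ints in [-(2**(bits-1) - 1), 2**(bits-1) - 1]."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    # One scale per tensor; the largest magnitude maps to +/- qmax.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the quantized integers."""
    return [x * scale for x in q]


def quantized_dot(qa, scale_a, qb, scale_b):
    # Low-bit integer multiply-accumulate (the cheap part) ...
    acc = sum(x * y for x, y in zip(qa, qb))
    # ... followed by one floating-point rescale (the higher-precision part).
    return acc * scale_a * scale_b


weights = [0.8, -0.35, 0.1, -0.9]
activations = [1.2, 0.5, -0.7, 0.3]

qw, sw = quantize_symmetric(weights, bits=4)
qa, sa = quantize_symmetric(activations, bits=4)

exact = sum(w * a for w, a in zip(weights, activations))
approx = quantized_dot(qw, sw, qa, sa)
print(f"int4 weights: {qw}, int4 activations: {qa}")
print(f"exact={exact:.4f} approx={approx:.4f}")
```

With only 4 bits per value the approximation error is noticeable on such a tiny vector; practical low-bit schemes typically add finer-grained (per-channel or per-group) scales to reduce it.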

George Williams
GSI Technology