Huawei preps AI SSD to ease GPU memory bottlenecks

Huawei is reportedly about to launch an AI SSD that will work with its Unified Cache Manager (UCM) software to offload key-value (KV) cache data from a GPU’s high bandwidth memory and speed AI processing by avoiding KV cache data recomputation.

When a large language model (LLM) runs inference, it stores the attention keys and values computed for earlier tokens in the GPU’s High Bandwidth Memory (HBM) as a KV cache. In long inferencing runs, this cache can fill, so fresh KV data evicts older data, which then has to be recomputed when it is needed again. This recomputation extends an LLM’s run time, delaying its response to user requests. If the evicted KV data is instead stored on a connected SSD, it can be retrieved when needed, shortening model response times.
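
The difference between recomputing and retrieving evicted KV data can be illustrated with a toy model. This is a minimal Python sketch, not Huawei’s UCM code: the `KVCache` class, its `_compute` placeholder, and the block IDs are all hypothetical, a small LRU dictionary stands in for HBM, and a plain dictionary stands in for the SSD.

```python
from collections import OrderedDict


class KVCache:
    """Toy LRU cache standing in for HBM, with an optional spill store
    standing in for an SSD. All names here are illustrative assumptions."""

    def __init__(self, capacity, use_spill):
        self.capacity = capacity                 # blocks that fit in "HBM"
        self.hot = OrderedDict()                 # block_id -> KV data
        self.spill = {} if use_spill else None   # evicted blocks ("SSD")
        self.recomputes = 0
        self.spill_hits = 0

    def _compute(self, block_id):
        # Placeholder for the expensive pass that rebuilds KV data.
        self.recomputes += 1
        return f"kv-{block_id}"

    def get(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)       # mark as recently used
            return self.hot[block_id]
        if self.spill is not None and block_id in self.spill:
            value = self.spill.pop(block_id)     # fetch instead of recompute
            self.spill_hits += 1
        else:
            value = self._compute(block_id)
        self.hot[block_id] = value
        if len(self.hot) > self.capacity:
            old_id, old_value = self.hot.popitem(last=False)  # evict LRU block
            if self.spill is not None:
                self.spill[old_id] = old_value   # keep it rather than drop it
        return value


def run(use_spill):
    """Replay a fixed access pattern; return (recomputes, spill_hits)."""
    cache = KVCache(capacity=2, use_spill=use_spill)
    for block_id in [0, 1, 2, 0, 1, 2]:
        cache.get(block_id)
    return cache.recomputes, cache.spill_hits
```

With room for only two blocks and three blocks accessed in rotation, `run(False)` recomputes on every access, while `run(True)` computes each block once and serves the repeats from the spill store. The sketch ignores the time an SSD read takes, which is what determines whether the trade pays off in practice.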

The coming Huawei AI SSD is said to help solve the memory wall problem affecting GPU servers, where limited HBM capacity prolongs computation times. US technology export restrictions hamper China’s efforts to use the latest GPUs and HBM, and China’s domestic memory manufacturers have not yet developed their own HBM tech. The Huawei AI SSD will have a large but unspecified capacity and fast, but again unspecified, I/O performance.

VAST Data and WEKA, with its Augmented Memory Grid, have software to offload KV cache contents to SSDs, as does Chinese storage system supplier YanRong. PEAK:AIO and Pliops also have KV cache offload offerings.

Huawei’s scheme relies on its UCM software to provide a tiered KV cache spanning GPU HBM, CPU DRAM, and directly connected SSD storage, with data moved up and down the tiers as needed. Huawei also has existing XtremeLink technology using eKitStor Xtreme 200E SSDs with a PCIe Gen 4 x 4 lane connection providing up to 6.5 GBps read speed and 7 GBps write speed.
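
The idea of moving data up and down tiers can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of how UCM works: the `TieredKVCache` class, its least-recently-used demotion policy, and the block-count capacities are all hypothetical, and the SSD tier is treated as unbounded for simplicity.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy three-tier cache: tier 0 = HBM, tier 1 = DRAM, tier 2 = SSD.
    Names and policy are assumptions made for illustration only."""

    def __init__(self, hbm_blocks, dram_blocks):
        # Maximum blocks per tier, fastest first; None means unbounded.
        self.capacities = [hbm_blocks, dram_blocks, None]
        self.tiers = [OrderedDict() for _ in self.capacities]

    def _insert(self, level, block_id, value):
        tier = self.tiers[level]
        tier[block_id] = value
        tier.move_to_end(block_id)               # newest entry is most recent
        limit = self.capacities[level]
        if limit is not None and len(tier) > limit:
            # Demote the least recently used block one tier down.
            old_id, old_value = tier.popitem(last=False)
            self._insert(level + 1, old_id, old_value)

    def put(self, block_id, value):
        for tier in self.tiers:                  # drop any stale copy first
            tier.pop(block_id, None)
        self._insert(0, block_id, value)

    def get(self, block_id):
        """Return (value, tier_found_in), promoting the block to tier 0."""
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                value = tier.pop(block_id)
                self._insert(0, block_id, value)
                return value, level
        return None, None

    def location(self, block_id):
        """Return the tier index currently holding block_id, or None."""
        for level, tier in enumerate(self.tiers):
            if block_id in tier:
                return level
        return None
```

For example, after `cache = TieredKVCache(hbm_blocks=2, dram_blocks=2)` and five `put` calls for blocks 0 to 4, the oldest block has cascaded down to the SSD tier. A subsequent `cache.get(0)` pulls it back to the top tier and pushes colder blocks down in turn.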

That’s good but not actually extreme. SK hynix’s Platinum P41 M.2 SSD offers 7 GBps read and 6.5 GBps write speeds with the same PCIe Gen 4 x 4 setup. Huawei would need to venture into PCIe Gen 5 interconnect technology to get faster speeds in the 12-14 GBps range for reads and writes. YanRong already has domestically manufactured PCIe Gen 5 NVMe SSDs.
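
The arithmetic behind those ceilings is straightforward. The short Python sketch below computes the raw link rate of a four-lane connection from the published PCIe signaling rates (16 GT/s per lane for Gen 4, 32 GT/s for Gen 5) and their 128b/130b line encoding. The function name is our own, and the result ignores packet and protocol overhead, so it is an upper bound rather than a drive spec.

```python
def pcie_link_gbytes_per_s(gt_per_s, lanes):
    """Raw one-direction PCIe link rate in GB/s, before protocol overhead.

    gt_per_s: per-lane signaling rate in gigatransfers per second.
    lanes:    number of lanes in the link.
    """
    encoding = 128 / 130          # 128b/130b encoding used by Gen 3 and later
    bits_per_byte = 8
    return gt_per_s * encoding / bits_per_byte * lanes


gen4_x4 = pcie_link_gbytes_per_s(16, 4)   # roughly 7.88 GB/s
gen5_x4 = pcie_link_gbytes_per_s(32, 4)   # roughly 15.75 GB/s
```

A Gen 4 x 4 link therefore tops out just under 8 GBps, which is why drives on it cluster around 7 GBps, while doubling the signaling rate in Gen 5 leaves room for the 12-14 GBps class.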

The report also mentions Huawei’s SpeedFlex PCB technology, related to thermal reliability and optimized data transmission on its printed circuit boards (PCBs). This hardly seems cutting-edge technology.

The report says the AI SSD with UCM, XtremeLink, and SpeedFlex “represents a key breakthrough for domestic SSDs. Huawei will collaborate with domestic training and inference machine manufacturers, which will help China build a new AI ecosystem and meet the challenges of globalization.”