“Phison’s engineers reasoned that GPU performance scales with memory size, meaning high-bandwidth memory (HBM) DRAM, and proper management of this memory along with the use of cheaper NAND flash, can support large models with a lower number of GPUs than normal. Phison’s combined hardware and software approach to this problem, dubbed “aiDAPTIV+,” allows a single GPU to support LLMs at a somewhat slower performance than a multiple-GPU system at a significantly lower cost.”
Source: techtarget.com
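aiDAPTIV+ itself is proprietary middleware, so the sketch below is only an illustration of the general idea the quote describes: keep most of a large model's weights on cheap flash storage and stage them into limited GPU memory (HBM) one block at a time, accepting slower execution in exchange for needing fewer GPUs. All names here (the `save_layers_to_flash` and `run_with_offload` helpers, the file paths, layer counts, and sizes) are hypothetical stand-ins, not part of Phison's product.

```python
# Illustrative sketch only -- not Phison's aiDAPTIV+ implementation.
# Demonstrates layer-by-layer weight offload: weights live on "flash"
# (here, checkpoint files on disk) and are streamed into GPU memory
# one block at a time, so peak HBM use stays near one layer's size.
import torch
import torch.nn as nn

NUM_LAYERS = 8    # hypothetical model depth
HIDDEN = 1024     # hypothetical hidden size
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

def save_layers_to_flash(path="flash_layer_{}.pt"):
    # Pretend each transformer block lives on flash as its own checkpoint.
    for i in range(NUM_LAYERS):
        layer = nn.Linear(HIDDEN, HIDDEN)  # stand-in for a transformer block
        torch.save(layer.state_dict(), path.format(i))

def run_with_offload(x, path="flash_layer_{}.pt"):
    """Stream one layer at a time: flash -> GPU -> compute -> free."""
    for i in range(NUM_LAYERS):
        layer = nn.Linear(HIDDEN, HIDDEN)
        layer.load_state_dict(torch.load(path.format(i)))  # read from "flash"
        layer.to(DEVICE)
        x = layer(x.to(DEVICE))
        del layer                       # release HBM before the next block
        if DEVICE == "cuda":
            torch.cuda.empty_cache()
    return x

if __name__ == "__main__":
    save_layers_to_flash()
    out = run_with_offload(torch.randn(1, HIDDEN))
    print(out.shape)  # torch.Size([1, 1024])
```

The trade-off matches the quote: each layer incurs an extra flash-to-HBM transfer, so a single GPU runs the model more slowly than a cluster that holds all weights in HBM, but the required GPU memory (and GPU count) drops sharply.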