At SC25, Phison showed off the potential of its aiDAPTIV+ hardware and software solution by fine-tune training Llama 3.1, a 405-billion-parameter model, on a single server equipped with two GPUs and 192 GB of VRAM. This task normally requires a combined VRAM pool of over 7 TB spread across multiple NVIDIA servers, a large and costly setup. Thanks to the power of aiDAPTIV+, it was all done on a single system costing around $50,000.
The secret sauce was an 8TB aiDAPTIVCache SSD, used to store the calculated weights and values for the LLM. Rather than trying to fit everything into VRAM, aiDAPTIV+ breaks the complex task of fine-tuning into smaller, more manageable chunks. The result is democratized access to more powerful AI models for a wide range of organizations.
As an example of the versatility that aiDAPTIV+ brings to LLM fine-tune training, this system completed a task that would normally require a dozen NVIDIA HGX H100 servers, each with eight H100 GPUs. Those servers require far more space and electricity, and cost significantly more than the aiDAPTIV+ server that accomplished the same task inside the Phison booth.
What is aiDAPTIV+?
aiDAPTIV+ is a hardware and software solution engineered to extend the memory space for fine-tune training of AI models and accelerate time to first token (TTFT) for inference workloads. A specialized aiDAPTIVCache SSD serves as a staging area for the unpacked LLM’s tensor weights and vectors during training. The aiDAPTIVLink middleware manages memory allocation and determines how to best utilize available resources, shifting data between the aiDAPTIVCache SSD, VRAM, and DRAM as needed.
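aiDAPTIVLink’s internals are proprietary, but the general tiering pattern it describes can be sketched in a few lines of Python. Everything below (the class name, the budgets, the file layout) is hypothetical and purely illustrative of spilling tensors from VRAM to DRAM to flash, not Phison’s actual implementation:

```python
import os
import torch

class TieredTensorStore:
    """Hypothetical three-tier store: VRAM -> pinned DRAM -> SSD files."""

    def __init__(self, vram_budget, dram_budget, ssd_dir):
        self.vram_budget = vram_budget    # bytes of GPU memory to use
        self.dram_budget = dram_budget    # bytes of host memory to use
        self.ssd_dir = ssd_dir            # directory on the cache SSD
        self.vram_used = 0
        self.dram_used = 0
        os.makedirs(ssd_dir, exist_ok=True)

    def place(self, name, tensor):
        """Keep a tensor in the fastest tier with room, spilling to SSD last."""
        size = tensor.element_size() * tensor.nelement()
        if torch.cuda.is_available() and self.vram_used + size <= self.vram_budget:
            self.vram_used += size
            return tensor.to("cuda")                      # hot tier: VRAM
        if self.dram_used + size <= self.dram_budget:
            self.dram_used += size
            return tensor.pin_memory()                    # warm tier: pinned DRAM
        torch.save(tensor, os.path.join(self.ssd_dir, f"{name}.pt"))
        return None                                       # cold tier: on SSD

    def fetch(self, name):
        """Reload a spilled tensor from the SSD when the GPU needs it again."""
        return torch.load(os.path.join(self.ssd_dir, f"{name}.pt"))
```

The real middleware also has to prefetch and overlap transfers with compute, but the tier ordering, fastest first with flash as the overflow, is the core idea.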
aiDAPTIVCache SSDs are purpose-built for this task, with an extremely high endurance rating of 100 DWPD (drive writes per day). The SSDs achieve this level of endurance, along with sustained high-speed throughput, by using SLC NAND. Under heavy use, it’s possible to write over 300TB per day to an aiDAPTIVCache SSD, which could burn out regular SSDs in less than two months. aiDAPTIVCache drives are rated to handle this demanding scenario for five years of constant use.
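Those endurance figures are easy to sanity-check with quick arithmetic, using the capacity and write rate quoted above:

```python
capacity_tb = 8          # aiDAPTIVCache drive capacity
daily_writes_tb = 300    # heavy fine-tuning workload quoted above
typical_dwpd = 1         # common enterprise SSD endurance rating
rated_dwpd = 100         # aiDAPTIVCache endurance rating

# Total data a 1 DWPD drive is warrantied to absorb over five years...
typical_lifetime_tb = capacity_tb * typical_dwpd * 365 * 5   # 14,600 TB
# ...which a 300 TB/day workload exhausts in under two months:
print(typical_lifetime_tb / daily_writes_tb)                 # ~48.7 days

# The 100 DWPD rating instead allows an 800 TB/day write budget:
print(capacity_tb * rated_dwpd)                              # 800
```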
While a fine-tune training run takes longer than one using pure GPU VRAM, most of the reduced performance comes from having fewer GPUs and less compute available. The aiDAPTIVCache SSDs are fast enough not to cause a severe bottleneck, resulting in training times around 5% longer than pure GPU-based training on a similar number of GPUs. Direct comparisons aren’t particularly meaningful, however, as aiDAPTIV+ enables fine-tuning of significantly larger models on GPUs that otherwise couldn’t handle the job at all.
aiDAPTIV+ AITPC running Pro Suite on two RTX 4000 Ada GPUs.
Llama 3.1 405B aiDAPTIV+ server training details
The server Phison used at SC25 featured the following hardware, which enabled training of the Llama 3.1 405B model.
aiDAPTIV+ supports a variety of modern platforms, including Intel and AMD CPUs, ARM processors, a variety of motherboards and chipsets, and a wide range of memory capacities. The CPU isn’t a major factor in running aiDAPTIV+, but workstation and server platforms provide significantly more connectivity options. Client solutions work but don’t support as many PCIe lanes, which directly impacts the performance of both the storage devices and the GPUs, especially when using more than one GPU and multiple SSDs.
The Phison SC25 server used Intel’s Xeon W5-3435X processor from the Sapphire Rapids family, which provides 112 total lanes of PCI Express 5.0 connectivity. Each of the RTX Pro 6000 GPUs uses a x16 PCIe Gen5 slot, while the SSDs use U.2 x4 connections. As equipped, the server represents a baseline configuration, with plenty of room for expansion options, including additional aiDAPTIVCache SSDs and GPUs to increase performance.
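As a rough illustration of the lane budget (actual slot wiring varies by motherboard, so treat this as an approximation rather than the exact topology):

```python
total_lanes = 112    # PCIe 5.0 lanes from the Xeon W5-3435X
gpu_lanes = 2 * 16   # two RTX Pro 6000 cards at x16 each
ssd_lanes = 1 * 4    # one U.2 aiDAPTIVCache SSD at x4

used = gpu_lanes + ssd_lanes
print(f"{used} lanes used, {total_lanes - used} free for more GPUs and SSDs")
# -> 36 lanes used, 76 free for more GPUs and SSDs
```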
The aiDAPTIVCache SSD used at SC25 was an 8TB U.2 model, the capacity required to handle the training of a 405-billion-parameter model. Fine-tune training of an LLM requires roughly 20 bytes of memory per parameter, so 405 billion parameters equates to around 8 TB of memory. That memory is traditionally VRAM, or system RAM in some scenarios, both of which are very expensive in such quantities. aiDAPTIV+ augments the available memory with 8 TB of NAND flash storage.
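That rule of thumb works out as follows:

```python
params = 405e9           # Llama 3.1 405B parameter count
bytes_per_param = 20     # the ~20x rule of thumb described above

print(params * bytes_per_param / 1e12)   # 8.1 -> roughly 8 TB of memory
```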
Training a 405B model the traditional way, keeping all the data in VRAM rather than using aiDAPTIV+, requires around 26 of the latest NVIDIA B300 Ultra accelerators. That means four servers with eight GPUs each, and each individual GPU would cost as much as the entire aiDAPTIV+ server. Alternatively, it takes about 40 NVIDIA B200 GPUs spread over five servers, 53 H200 GPUs running in seven servers, or 93 H100 accelerators housed in a dozen HGX H100 servers. Any of those setups would also require networking and interconnect hardware, with total power use in the 40 kW to 80 kW range. The aiDAPTIV+ server used only about 1 kW and required no additional infrastructure.
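Those counts can be cross-checked against the publicly listed HBM capacity of each accelerator (288 GB for B300, 192 GB for B200, 141 GB for H200, and 80 GB for H100):

```python
import math

# (GPU count from the text, GB of HBM per GPU from public specs)
options = {
    "B300 Ultra": (26, 288),
    "B200": (40, 192),
    "H200": (53, 141),
    "H100": (93, 80),
}

for name, (count, vram_gb) in options.items():
    pool_tb = count * vram_gb / 1000   # total HBM the option provides
    servers = math.ceil(count / 8)     # eight GPUs per HGX-style server
    print(f"{count}x {name}: {pool_tb:.1f} TB of HBM in {servers} servers")
# Every option lands at the ~7.5 TB pool the job needs, and the server
# counts match the four, five, seven, and twelve quoted above.
```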
aiDAPTIV+ training results and performance
Phison used the Dolly creative writing benchmark found on Hugging Face to fine-tune the Llama 3.1 405B LLM for the SC25 demonstration. Different datasets affect the amount of time required for training, though scaling would still be similar across different hardware solutions.
The base Llama-3.1-405B-Instruct model is 2 TB in size in the Safetensors format. The first step of the training process unpacks the LLM tensors into memory, or in this case onto the aiDAPTIVCache SSD. This step produces over 5 TB of data and takes about 30 minutes, and peak memory utilization during the training process exceeds 7 TB.
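Phison’s middleware handles this staging transparently, but the closest open-source analogue is disk offload in the Hugging Face stack, which spills layers that don’t fit in VRAM or DRAM to a folder on fast storage. The sketch below shows that mechanism with a hypothetical mount point for the cache SSD; note that stock disk offload mainly serves inference-style loading, whereas aiDAPTIV+ extends the same tiering idea to full fine-tune training:

```python
import torch
from transformers import AutoModelForCausalLM

# Layers that don't fit on the GPUs or in system RAM are spilled to disk.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-405B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",                    # fill GPUs, then CPU RAM, then disk
    offload_folder="/mnt/aidaptivcache",  # hypothetical mount for the cache SSD
)
```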
The Dolly dataset consists of 771K characters and 672 Q/A pairs, and the aiDAPTIV+ server completed the fine-tune training at a rate of 25 hours and 50 minutes per epoch. The full two-epoch run finished in two days and four hours.
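Those timings are internally consistent:

```python
from datetime import timedelta

per_epoch = timedelta(hours=25, minutes=50)
print(2 * per_epoch)   # 2 days, 3:40:00 -> roughly two days and four hours
```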
Over the four days the aiDAPTIV+ server ran at SC25, aiDAPTIV+ wrote hundreds of terabytes of data to the aiDAPTIVCache SSD. For typical enterprise SSDs rated at 1 DWPD, that level of writes would wear out the flash in a matter of weeks, or even less for lower-capacity drives. With an endurance rating of 100 DWPD, Pascari aiDAPTIVCache SSDs are tailor-made to handle this type of workload, running 24/7/365 for five years.
Pascari AI200E 8TB U.2 SSD used for fine-tune training of Llama 405B.
Start doing more with aiDAPTIV+
aiDAPTIV+ provides an important path for expanding the available memory for fine-tune training, and with a global shortage of DRAM causing a spike in memory prices, it does so at a fraction of the cost. It opens up the potential to leverage AI at all levels of business, using on-premises hardware that doesn’t require a massive financial investment, a data center, or large amounts of electricity. aiDAPTIV+ works for research and educational facilities, allowing experimentation with larger models without expensive cloud contracts or data sovereignty concerns.
Phison’s SC25 demonstration proves that a massive data center and costly infrastructure aren’t needed to work with the biggest AI models. By combining purpose-built aiDAPTIVCache SSDs with intelligent memory orchestration, aiDAPTIV+ delivers a practical, affordable path to fine-tuning models once reserved for hyperscalers. For organizations looking to unlock larger models and true on-premises AI independence, aiDAPTIV+ represents a compelling shift in what’s possible.
Visit Phison to explore how aiDAPTIV+ can transform your AI workflows and see the technology in action.