Rethinking Computational Storage: Unlock the Processing Power of SSDs

Apr 8, 2025

Several years ago, the concept of computational storage was discussed among industry insiders and touted as a potential answer to the age-old question of how to maximize CPU processing power. The idea seemed compelling at first glance. Imagine if storage devices, like SSDs, could actually do some of the processing of the information they hold, so less data had to move between storage and the CPU. Theoretically, it might help you save on power, reduce the need for data transfer, and speed up computations.  

To date, however, as with many seemingly revolutionary ideas, there just hasn’t been a way to make a business out of the concept—primarily because each use case is highly unique and it’s simply not scalable.  

When engineers and developers talk about computational storage, too often it’s a pie-in-the-sky approach: “What if we could run Linux on a drive, and we just gave it bigger processors?” While the idea may seem innovative, it lacks focus and practical application. It’s misguided thinking that is overly complicated and driven by technological idealism. Ultimately, it won’t lead to the hoped-for benefits.   

A smarter approach: Tailored acceleration

At Phison, we drew on our expertise in NAND storage technology to find a better way to offload the processing burden to an SSD: a tailored approach to acceleration that focuses on the tasks storage devices are best suited for, namely applying fixed operations to ranges of logical block addresses (LBAs). We integrate specialized accelerators into our SSDs to handle specific tasks that don’t require excessive power or complexity. 

For example, we create hardware accelerators that can perform specific operations at very high speeds, such as qualifying large datasets, object-based erasure coding, checksum verification, and filtering out irrelevant information before it even reaches the CPU. This allows for faster and more efficient data processing, particularly in high-demand environments like data centers or supercomputing clusters. By processing data at the SSD level, you can reduce the amount of data that needs to be moved across the PCIe bus or through the network, which alleviates congestion, eases bandwidth limitations, and speeds up overall performance. 
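The data-reduction effect of pre-filtering can be illustrated with a minimal sketch. This is a software model, not Phison's hardware interface; `drive_filter` and the record format are hypothetical stand-ins for fixed-function accelerators that operate on LBA ranges.

```python
# Illustrative model of on-drive pre-filtering: the "drive" applies a fixed
# predicate to raw records and hands the host only qualifying rows plus a
# checksum, so a fraction of the stored bytes crosses the bus.
import zlib

def drive_filter(records: list[bytes], predicate) -> tuple[list[bytes], int]:
    """Run a fixed filter on the storage side; return survivors + checksum."""
    kept = [r for r in records if predicate(r)]
    checksum = zlib.crc32(b"".join(kept))  # cheap integrity check for the transfer
    return kept, checksum

# Host side: verify the checksum instead of re-reading the full dataset.
raw = [b"error:disk", b"ok:boot", b"error:net", b"ok:ping"]
kept, crc = drive_filter(raw, lambda r: r.startswith(b"error:"))
assert zlib.crc32(b"".join(kept)) == crc
print(len(kept), "of", len(raw), "records crossed the bus")  # → 2 of 4
```

In the real device the predicate is baked into hardware, which is exactly why it stays cheap: the drive never has to interpret arbitrary code, only apply a fixed operation to a range of blocks.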

By focusing on highly specific tasks that are considered “monkey work,” these accelerators can provide significant benefits without adding substantial cost or power consumption. The accelerated SSDs can handle large volumes of data much faster, all while consuming less power than traditional processors. Importantly, this approach can be scaled across multiple drives, creating a more efficient, parallelized system that outperforms traditional CPU-bound processing. 

The host CPU is capable of performing all of the tasks listed above faster than an individual SSD, but there are practical limits to how much CPU DRAM bandwidth can be assigned to non-OS tasks. Additionally, moving that data from SSD to DRAM consumes roughly half the DDR bandwidth available to the CPU. Factor in that an all-flash storage chassis can hold 30, 60 or even 90 SSDs, and the appliance gains a great deal of offload capability. A chassis with 90 Gen6 SSDs can process data at 2.5 TB/s without impacting any CPU resources. In this scenario, the SSDs perform pre-filtering and pre-compute tasks while the CPU manages more important operations. 
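A quick back-of-the-envelope check shows what the chassis figures above imply per drive. The roughly 28 GB/s per-drive rate below is derived from the article's aggregate number, not a quoted device specification.

```python
# Implied per-drive throughput from the 90-drive, 2.5 TB/s chassis figure.
drives = 90
chassis_tb_s = 2.5                       # TB/s aggregate, per the article
per_drive_gb_s = chassis_tb_s * 1000 / drives
print(f"{per_drive_gb_s:.1f} GB/s per Gen6 SSD")   # ≈ 27.8 GB/s

# Aggregate in-drive processing rate scales linearly with chassis size:
for n in (30, 60, 90):
    print(n, "drives →", round(per_drive_gb_s * n / 1000, 2), "TB/s")
```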


Emerging applications in HPC and security are changing the computational storage landscape 

Recently, Phison has pivoted and begun to look at new ways to offload some of the CPU workload to SSDs that go a step beyond targeted accelerators. In some cases, the company is even adding CPU clusters to the storage array—but the big difference here is that the CPU clusters aren’t being used to do calculations, but to actually run web services or microservices. They show up as additional addressable CXL services on the PCIe bus. 

Consider an AI project, for instance, which involves a lot of pipelining. That’s where one large language model (LLM) produces data and outputs it, then another LLM takes it and transforms it and sends it to another LLM, and so on. An example of this would be video translation of a TED talk, in which an LLM extracts the English audio and turns it into text, another LLM translates the text into Chinese, another LLM trained on a celebrity’s voice produces that audio track and so on, until the final output is a brand-new video of that celebrity delivering the talk in Chinese with synchronized lip movement.  
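The hand-off pattern in that pipeline can be sketched in a few lines. Each stage function below is a toy stand-in for a real model (speech-to-text, translation, voice synthesis); the names are hypothetical, and the point is only the structure, where each stage's output feeds the next and each hop is a candidate for offloading.

```python
# Toy version of the TED-talk translation pipeline described above.
def transcribe(audio: str) -> str:          # LLM 1: audio → English text
    return f"text({audio})"

def translate(text: str) -> str:            # LLM 2: English text → Chinese text
    return f"zh({text})"

def synthesize(text: str) -> str:           # LLM 3: text → cloned-voice audio
    return f"audio({text})"

def pipeline(audio: str) -> str:
    out = audio
    for stage in (transcribe, translate, synthesize):
        out = stage(out)                    # each hop could run on a different offload engine
    return out

print(pipeline("ted_talk.wav"))  # → audio(zh(text(ted_talk.wav)))
```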

That complex operation involves a lot of little steps that are typically handled by the CPU or GPU and require a lot of model swaps. Why couldn’t you use the SSDs to do those little steps in the background while using the main CPU to delegate tasks to these accelerators and perform other higher-level tasks? In high-performance computing (HPC) organizations, the results can be impressive.  

It’s not uncommon for an HPC cluster to have 100 petabytes of data storage (which includes double and triple redundancy), which means they could have 100,000 SSDs to spread a workload across. Suddenly, operations that were taking a day or two are now being completed in mere seconds.  
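Rough arithmetic makes the "days to seconds" claim concrete. The per-drive rate below carries over the figure implied by the Gen6 chassis numbers earlier, and the host-side network ceiling is an assumption chosen for comparison, not a measured value.

```python
# Scan-time comparison: filtering 100 PB in the drives vs. pulling it to a host.
drives = 100_000
per_drive_gb_s = 28                    # assumed, from the Gen6 chassis figures
dataset_pb = 100

aggregate_pb_s = drives * per_drive_gb_s / 1_000_000
in_drive_scan_s = dataset_pb / aggregate_pb_s
print(f"in-drive scan: ~{in_drive_scan_s:.0f} s")          # tens of seconds

host_pull_gb_s = 800                   # assumed host-side network/DRAM ceiling
host_scan_days = dataset_pb * 1_000_000 / host_pull_gb_s / 86_400
print(f"pull-to-host scan: ~{host_scan_days:.1f} days")    # a day or two
```

Even with generous host bandwidth, the aggregate throughput of 100,000 drives is orders of magnitude larger than anything a single host can ingest, which is the whole argument for doing the first pass in the drives.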

In large HPC arrays, there are so many SSDs that the SSDs’ bandwidth eclipses the entire network or CPU bandwidth. This is where we realized at Phison that there is a massive untapped space where SSDs can do intelligent things.  

Where the HPC use case is all about speed and compute, we’ve also looked at security use cases, which are more about rock-solid, FIPS 140-3-compliant products that provide security services well beyond what TPM 2.0 achieves.  

An SSD can execute hundreds of cryptographic operations like signing and verification per second—and if a server has 30 to 90 SSDs, the processing power grows accordingly. Each SSD can act as an independent hardware-based agent with a root of trust that can point back to your HSM (hardware security module) server. Collectively, all of those drives surpass what one powerful CPU can do because the CPUs are not designed to be digital signing algorithm (DSA) engines. That multi-SSD power, combined with the fact that it’s hardware that’s already installed in the server, provides great benefits in bolstering security.  
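The fan-out of signing work across drives can be modeled with a short sketch. This is not Phison's implementation: a real drive holds its key in hardware behind a root of trust anchored to an HSM, whereas the `DriveAgent` class below just keeps an HMAC key in memory as a stand-in for a hardware signing engine.

```python
# Many drives acting as independent signing agents, each with its own key.
import hashlib
import hmac
import secrets

class DriveAgent:
    def __init__(self):
        self._key = secrets.token_bytes(32)   # hardware-held key in a real drive

    def sign(self, msg: bytes) -> bytes:
        return hmac.new(self._key, msg, hashlib.sha256).digest()

    def verify(self, msg: bytes, sig: bytes) -> bool:
        return hmac.compare_digest(self.sign(msg), sig)

drives = [DriveAgent() for _ in range(90)]
msg = b"firmware-image-v2"
sigs = [d.sign(msg) for d in drives]          # fan the work out across drives
assert all(d.verify(msg, s) for d, s in zip(drives, sigs))

# Aggregate throughput: hundreds of ops/s per drive, times the drive count.
print(90 * 300, "signatures/s across the chassis")  # → 27000
```

The 300 ops/s figure is a placeholder within the "hundreds per second" range the article cites; the takeaway is that throughput multiplies with drive count because each drive signs independently.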


Remove complexity with specificity

While Phison still believes that the traditional, generic concept of computational storage won’t ultimately lead anywhere, we do see use cases where another approach to computational storage can be an asset. Targeted accelerators make specific operations less complex. And looking at the massive on-board bandwidth of SSDs and the potential benefits of leveraging that power in new ways could lead to some exciting applications in the very near future.  

 

The Foundation that Accelerates Innovation™
