How Hyperscalers Can Maximize Data Storage Capabilities


Data is being generated at a staggering pace today. Over the last decade, the rate of data generation has grown exponentially. It’s not just humans creating all of this data, but also software and machines that “automatically” create data as a byproduct of artificial intelligence.

It is estimated that there are already about 50 zettabytes (ZB) of data accumulated in storage systems around the world right now – and we’re on track to generate 460+ exabytes (EB) of data every day by 2025.

Source: Raconteur

Thankfully, compute and storage systems have kept up with this explosion in data. Today, massive amounts of data are stored and managed in cloud systems across the world – and “hyperscaling” is how cloud providers build and scale the infrastructure that handles this large-scale data processing.

 

What are hyperscalers and what do they do?

Hyperscale computing is the ability of an architecture to scale up or down quickly in response to increases and decreases in user traffic and demand. “Hyperscalers” are service providers with data center resources that offer compute, storage, memory, networking, application and database capabilities in the form of cloud services to large numbers of customers. They typically run large distributed or grid computing environments from which they provision these resources to their customers’ nodes.

In alphabetical order, Alibaba, Apple, Amazon, Facebook, Google, IBM, Microsoft, and Oracle are some of the biggest Hyperscalers out there.

Essentially, Hyperscalers manage the physical infrastructure, operating systems and large-scale application software, while the end user gets virtual instances in the form of Software as a Service (SaaS), Platform as a Service (PaaS) or Infrastructure as a Service (IaaS).

Hyperscalers bring global business consulting and IT outsourcing solutions to organizations of all sizes. They enable businesses to migrate legacy IT environments to the cloud, and to build and use technology stacks for faster, more efficient execution of business workloads. These technology stacks can comprise hybrid architectures (a combination of on-premise data centers and private, public or hybrid cloud systems) that run macro and microservices as well as cloud-native applications.


Software-defined storage (SDS): The solution to hyperscale storage requirements

Hyperscalers can’t simply buy storage from enterprise storage vendors. The diversity of their needs can’t be met by traditional storage technology – they need automation, virtualization and self-service features on a scale that even the best off-the-shelf hardware either struggles to match or is too expensive to deliver.

The solution that the first of these Hyperscalers (Amazon, Facebook, Microsoft and Google) came up with was software-defined storage (SDS) – an agile, cost-effective infrastructure solution that took automation to the next level and allowed them to handle large data volumes successfully.

But what is SDS? Gartner defines it as a system that abstracts the storage software from the underlying hardware and provides a common management platform for data services across a heterogeneous or homogeneous IT infrastructure.

By decoupling software from hardware, Hyperscalers attempt to lower costs – they are able to use commodity components that conform to industry standards and assemble them in data center racks.

Since the defining feature of SDS is its unified control and management plane, it prioritizes reliability and availability over performance in some cases. This means that Hyperscalers need some very specific capabilities from the system:

      • Higher I/O operations per second (IOPS)
      • A per-I/O retry policy (try hard or fail fast)
      • Lower tail latency
      • Control over the timing of background tasks, especially when tail latency is an issue
      • Granular access to telemetry from SSD analytics – such as how quickly each block responds, program/erase (P/E) counts and write amplification factors (WAFs) – as in the short sketch after this list
      • Ability to prioritize requests even when SSD firmware does the scheduling
      • An abstraction layer that integrates all features from multiple vendors in a heterogeneous environment
      • Security features throughout the system
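To make the telemetry requirement more concrete, here is a minimal, hypothetical sketch of how an operator might pull standard wear and error counters from a drive on a Linux host using the nvme-cli tool. The device path is a placeholder, and in practice vendor-specific log pages are usually needed for finer-grained data such as a true write amplification factor.

```python
# Hypothetical sketch: reading per-drive wear/telemetry counters with nvme-cli.
# Assumes a Linux host with nvme-cli installed and a device at /dev/nvme0;
# field names come from the standard NVMe SMART/Health log.
import json
import subprocess

def smart_log(device: str = "/dev/nvme0") -> dict:
    """Return the NVMe SMART/Health log for `device` as a dict."""
    out = subprocess.run(
        ["nvme", "smart-log", device, "--output-format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(out.stdout)

if __name__ == "__main__":
    log = smart_log()
    # 'data_units_written' is reported in units of 512,000 bytes (1000 * 512 B).
    host_tb_written = log["data_units_written"] * 512_000 / 1e12
    print(f"percent of rated endurance used: {log['percent_used']}%")
    print(f"host data written: {host_tb_written:.2f} TB")
    print(f"media errors: {log['media_errors']}")
```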

Taken as a whole, these customized features offer clear business benefits to Hyperscalers:

      • Lower TCO: SDS does away with the need for proprietary (read: expensive) storage. Hardware that works with industry-standard servers suffices, lowering CAPEX, while lower upgrade and maintenance costs reduce OPEX.
      • Availability: SDS can be deployed with a distributed, scale-out approach, where the software layer enforces redundancy.
      • Performance: Performance can be scaled or improved by adding powerful individual nodes on an on-demand basis.
      • Resilience: SDS offers a distributed storage platform where data is simultaneously written to multiple locations. This makes disaster recovery a straightforward process – there is no need to physically move data or applications in the event of a failure.
      • Flexibility: The hardware platform is easily managed and scaled by in-house teams. Storage provisioning is simple. Plus, there is no vendor lock-in.
      • Visibility: SDS supports most storage protocols, including block, file and object. You can consolidate these within your IT infrastructure, leading to fewer data silos and reduced fragmentation.
      • Innovation: Since SDS uses industry-standard hardware, both the storage devices and the server can take advantage of advances in computing, chipsets, flash memory and SSD storage.

While the technical, operational and business advantages of using an SDS are clear for Hyperscalers, there is one crucial link that makes or breaks the entire data processing chain: the underlying storage hardware.


Why hyperscalers are turning to SSDs for storage

Over the past few years, SSDs have become increasingly common in the enterprise, especially in workloads that involve a lot of data processing – and Hyperscaler workloads fit that bill perfectly.

Hyperscalers employ storage acceleration methods such as parallelization (running many data processes concurrently) and shuffling (redistributing intermediate data between processing stages) to meet large-scale data processing requirements – and these are all well supported by present-day SSDs.
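As a simple illustration of why parallelization favors flash, the sketch below issues many random 4 KiB reads concurrently; an SSD can service these in parallel across its NAND channels, whereas a hard drive would serialize them behind seek latency. The file path and sizes are placeholders, not part of any real workload.

```python
# Illustrative sketch of parallel random reads against a (placeholder) dataset.
import os
import random
from concurrent.futures import ThreadPoolExecutor

PATH = "/data/sample.bin"   # placeholder dataset, assumed larger than 4 KiB
BLOCK = 4096                # 4 KiB reads, a typical random-I/O unit

def read_block(fd: int, offset: int) -> bytes:
    """Read one block at the given byte offset."""
    return os.pread(fd, BLOCK, offset)

def parallel_random_reads(path: str, n_reads: int = 10_000, workers: int = 32):
    """Issue n_reads random block reads across a pool of worker threads."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        offsets = [random.randrange(0, size - BLOCK) for _ in range(n_reads)]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(lambda off: read_block(fd, off), offsets))
    finally:
        os.close(fd)
```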

One of the biggest concerns is price. The amount of storage that Hyperscalers need to process data is enormous. While SSDs are still more expensive than HDDs in terms of basic capacity (cost-per-TB), they offer a clear benefit when you consider the price-to-performance ratio. SSDs deliver random access I/O performance that is several orders of magnitude higher than HDDs. As a result, cost-per-IOPS is significantly lower.
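A quick back-of-envelope calculation makes the cost-per-IOPS point clearer. The prices and IOPS figures below are illustrative assumptions, not quotes: a 16 TB nearline HDD at roughly $300 doing ~200 random IOPS, versus a 15.36 TB enterprise NVMe SSD at roughly $1,500 doing ~800,000 random read IOPS.

```python
# Back-of-envelope comparison of cost-per-TB vs cost-per-IOPS.
# All figures are illustrative assumptions for the sake of the arithmetic.
hdd = {"price": 300.0, "capacity_tb": 16.0, "iops": 200}
ssd = {"price": 1500.0, "capacity_tb": 15.36, "iops": 800_000}

for name, d in (("HDD", hdd), ("SSD", ssd)):
    print(f"{name}: ${d['price'] / d['capacity_tb']:.2f}/TB, "
          f"${d['price'] / d['iops']:.4f}/IOPS")

# Output with these assumptions:
#   HDD: $18.75/TB, $1.5000/IOPS
#   SSD: $97.66/TB, $0.0019/IOPS
# i.e. the SSD costs ~5x more per TB but is ~800x cheaper per IOPS.
```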

By 2026, some classes of SSDs are forecast to be cheaper than HDDs on a dollar-per-terabyte basis and to “crush hard drives in the enterprise,” according to a study by Wikibon.

Source: Blocks and Files

While TCO is top of mind for everyone, scale and performance matter just as much (if not more) for hyperscalers, in terms of higher storage capacities as well as faster response times. Cloud providers are demanding larger drives than ever – suppliers already have 60+ TB devices on their roadmaps. Apart from capacity and performance, there are several reasons flash-based SSDs are becoming the de facto storage solution for cloud vendors and other enterprise hyperscalers:

      • Legacy hard drives are not built to handle cloud-native apps and I/O-intensive databases; these perform best on flash storage.
      • Flash storage boosts VM performance and makes it easier to move workloads between on-premise environments and the cloud – with reduced (and predictable) performance penalties.
      • SSDs are more environmentally friendly and consume less electricity than HDDs, with in-built power management features.

These factors have driven leading Hyperscalers to use (and offer) SSD storage as part of their premium software, platform and infrastructure services. For instance,

      • AWS provides SSD storage through its EBS block storage service (the gp2 general-purpose and io1 provisioned-IOPS SSD volume types), as well as file storage with FSx for Windows File Server and FSx for Lustre (see the short provisioning sketch after this list).
      • Azure provides Azure Managed Disks as its block-level storage option for Azure VMs. As with AWS, there are solid-state and magnetic options.
      • Microsoft also offers Azure NetApp Files, premium file shares and premium storage accounts backed by SSDs.
      • GCP provides its premium Local SSD storage for high-performance VM instances and Persistent Disk for less demanding workloads.

While there is a mind-numbing variety of options, it is imperative for Hyperscalers to define and conform to storage performance standards – which is why Facebook and Microsoft partnered to develop and ratify the Open Compute Project (OCP) NVMe Cloud SSD specification. It serves to align the industry as a whole and address hyperscaling concerns such as throughput and latency. It also sets unified, interoperable design and performance standards for SSD vendors to follow.

The OCP NVMe Cloud SSD spec lays out the minimum and standard requirements that cloud service providers expect from vendors and manufacturers. This is a win-win situation: Hyperscalers get an always-elastic supply chain, while storage OEMs know exactly what Hyperscalers want.

A side benefit is the unceasing development and evolution of storage and memory technologies that lead to more innovation. Phison experiences this firsthand.

 

Phison’s SSDs drive scale and innovation for hyperscalers

Phison offers customizable SSD solutions that can be optimized to power hyperscale computing and workloads. With performance, power, endurance and built-in analytics, customizable SSDs can deliver precisely what cloud applications and platforms need to work at optimal levels.

Phison’s new X1 controller-based SSD platform – announced in August 2022 – delivers the industry’s most advanced enterprise SSD solution. Engineered to meet the stringent demands of data center operators, Hyperscalers and cloud service providers, the X1 offers a 30% increase in data reads over existing competitors for the same unit of power used. This energy efficiency overcomes significant bottlenecks in high-performance computing (HPC) and AI, which are both used overwhelmingly at the hyperscale level.

The X1 controller is a powerhouse of performance. It boasts sequential read and write speeds of 7.2 GB/s and 6.7 GB/s respectively, 1.75 million random 4K read IOPS and 470,000 random 4K write IOPS, power loss protection capacitors, end-to-end data path protection, crypto erase and more. Built using 128-layer eTLC NAND on the PCIe Gen4 x4, NVMe 1.4 interface in a U.3 form factor, it provides true versatility and scalability for hyperscalers. The U.3 form factor SSD is backward compatible with existing U.2 backplanes and slots.
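As a rough back-of-envelope check on how those figures relate, assuming 4 KiB (4096-byte) transfers per I/O, the quoted random-read IOPS figure works out to roughly the same bandwidth as the sequential-read number:

```python
# Back-of-envelope check: random 4K read IOPS expressed as bandwidth,
# assuming 4 KiB (4096-byte) transfers per I/O.
random_read_iops = 1_750_000
io_size_bytes = 4096

bandwidth_gb_s = random_read_iops * io_size_bytes / 1e9
print(f"~{bandwidth_gb_s:.1f} GB/s of random 4K read throughput")
# ~7.2 GB/s -- roughly the sequential-read bandwidth, sustained even under
# fully random 4K reads, which is the access profile that highly parallel
# hyperscale workloads tend to generate.
```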

Further, not all workloads in hyperscale cloud environments are the same – most require read-intensive SSDs with large data storage capacities. Phison caters to this category too, with its ESR1710 TLC NAND-based storage. This customizable SSD platform offers some of the highest rack densities and lowest power consumption in its class, even at extremely high capacities of over 15 TB.

Data storage and processing can make or break a business at the Hyperscaler level. If they want to consistently deliver large-scale solutions with dynamic provisioning, Hyperscalers need to squeeze every last bit of performance out of their SSD solution. Storage arrays built with high-speed, low-latency Phison SSD solutions can easily power workloads with some of the biggest data processing needs (such as machine learning and multiplayer gaming). It goes without saying that Hyperscalers need not look anywhere else.

The Foundation that Accelerates Innovation™
