The era of computational storage is coming, and processor IP is helping it arrive

Last updated: 2021-12-09

We are now in an era of data explosion. Growing data volumes make processing harder and more energy-intensive. According to Fortune Business Insights, service providers will increase storage spending by about 25% each year to manage this growth over the next few years, reaching $85 billion in 2022 and nearly $300 billion in 2027. At the same time, data center operators want to reduce the energy costs and carbon emissions of their operations. Service providers are therefore focusing their investment on higher-performance, lower-power computing that reduces data movement.

In this trend, computational storage is a key technology for improving data processing. It brings processing closer to the data, improving application performance and overall infrastructure efficiency. The industry expects some SSDs to evolve toward computational storage, which adds computing power to SSD storage solutions and reduces the amount of data transferred between storage and application processors. Wikibon predicts that shipments of SSD flash capacity will grow by more than 30% annually over the next five years or more.

Benefits of Computational Storage

What are the benefits of switching from traditional storage to computational storage?

For example, the US Environmental Protection Agency (US EPA) collects pollutant levels in hundreds of US cities every hour to monitor air quality. There are already millions of these measurements, and the number grows every day. To find specific data in this massive database with traditional storage, the database must be copied from the storage server's SSDs to the DRAM attached to the host processor, and the host CPU then scans every record until it has extracted all the relevant information. This is like looking for a needle in a haystack: it costs both time and resources.

With a computational storage system, the solid-state drives in the storage server are replaced with computational storage drives that have built-in processing power. To find a piece of information, the host server only needs to send a request to the storage server asking for the relevant records. The processor in each computational storage drive pre-processes the data and returns only the relevant records, rather than moving the entire database of millions of records. The data processing therefore takes up less network bandwidth, because only a small part of the database is sent over the network. It also needs far fewer host CPU cycles, because the host CPU only has to examine the relevant records, not the entire database.
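The difference between the two paths can be sketched in a few lines of Python. This is only an illustrative model, not a real drive interface: the `Reading` record, the fixed `RECORD_BYTES` size, the synthetic data set, and both scan functions are assumptions made up for this example.

```python
from dataclasses import dataclass

RECORD_BYTES = 64  # assumed fixed record size, for illustration only


@dataclass(frozen=True)
class Reading:
    """Hypothetical hourly air-quality record."""
    city: str
    hour: int
    pm25: float


def make_dataset(n_cities=100, hours=24):
    # Deterministic synthetic readings; the values carry no real-world meaning.
    return [
        Reading(f"city{c}", h, float((c * 7 + h * 3) % 50))
        for c in range(n_cities)
        for h in range(hours)
    ]


def host_side_scan(drive, predicate):
    """Traditional path: every record crosses the link, then the host filters."""
    bytes_moved = len(drive) * RECORD_BYTES
    matches = [r for r in drive if predicate(r)]
    return matches, bytes_moved


def in_drive_scan(drive, predicate):
    """Computational storage path: the drive filters, only matches cross the link."""
    matches = [r for r in drive if predicate(r)]
    bytes_moved = len(matches) * RECORD_BYTES
    return matches, bytes_moved


def is_polluted(reading):
    return reading.pm25 >= 45.0


drive = make_dataset()
host_matches, host_bytes = host_side_scan(drive, is_polluted)
csd_matches, csd_bytes = in_drive_scan(drive, is_polluted)

print(f"host-side scan moved {host_bytes} bytes")
print(f"in-drive scan moved  {csd_bytes} bytes")
```

Both paths return the same records; only the number of bytes that has to leave the drive differs.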

Figure 1: Data transfer in a traditional system compared with computational storage

So how does data flow in a computational storage transaction? Figure 2 shows the first steps. A host request enters the SSD controller of the computational storage drive through the host interface. With a traditional request, the data extracted from the SSD to DRAM and processed by the host processor can be extremely large. In this case, the host instead sends a simple high-level command to the computational storage processor to start the transaction.

Next, the computational storage processor parses the command from the host and initiates a read request to the DRAM. This request tells the storage processor to build a transfer descriptor (step 3), which is then dispatched to the appropriate flash channel to read the data from the NAND flash device (step 4).

Figure 2: Computational Storage Drive data flow from host to dispatch descriptor

Next, the data read from the NAND flash channel is brought into the computational storage processor for analysis, as shown in Figure 3. The processor looks for a match to the requested data or key. If a matching record is found, it is sent to the DDR DRAM (step 6). The data is then packaged in the host interface protocol and transferred by DMA to host memory via the host interface, where it is processed or used by the host processor (step 7). Once this is complete, the computational storage processor sends a message back to the host processor to report that the transaction is complete and the data is available, or sends an error message if no match was found (step 8).

Figure 3: Data flow from reading data to successful completion indication on a computational storage drive
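The transaction described above can be modeled as a short Python sketch. It is a simplified illustration rather than real controller firmware: the class and field names (`ComputationalStorageDrive`, `TransferDescriptor`, `Completion`), the layout of the flash channels, and the sample records are all hypothetical. Only the step numbers named in the text (3, 4, 6, 7 and 8) are labeled as such.

```python
from dataclasses import dataclass


@dataclass
class TransferDescriptor:
    """Hypothetical descriptor naming the flash channel and page to read."""
    channel: int
    page: int


@dataclass
class Completion:
    """Hypothetical status message returned to the host."""
    ok: bool
    message: str


class ComputationalStorageDrive:
    def __init__(self, channels):
        # channels[c][p] is a page: a list of (key, value) records in NAND flash.
        self.channels = channels
        self.dram = []  # drive-local DDR DRAM staging buffer

    def handle(self, key, host_memory):
        # The host's high-level command arrives and is parsed by the
        # computational storage processor.
        self.dram.clear()

        # Step 3: build the transfer descriptors.
        descriptors = [
            TransferDescriptor(c, p)
            for c, pages in enumerate(self.channels)
            for p in range(len(pages))
        ]

        for d in descriptors:
            # Step 4: dispatch to the flash channel and read from NAND.
            page = self.channels[d.channel][d.page]
            # The processor analyzes the data, looking for the requested key.
            for rec_key, value in page:
                if rec_key == key:
                    # Step 6: the matching record goes to the DDR DRAM.
                    self.dram.append((rec_key, value))

        if not self.dram:
            # Step 8 (error case): no match was found.
            return Completion(False, "no match")

        # Step 7: only the matches are transferred to host memory.
        host_memory.extend(self.dram)
        # Step 8 (success case): tell the host the data is available.
        return Completion(True, f"{len(self.dram)} record(s) available")


drive = ComputationalStorageDrive(channels=[
    [[("denver", 12.0), ("austin", 8.5)], [("denver", 14.1)]],
    [[("boston", 6.2)]],
])

host_memory = []
done = drive.handle("denver", host_memory)
print(done.ok, host_memory)

miss = drive.handle("miami", [])
print(miss.ok, miss.message)
```

In this model the host never sees the records for other cities; it receives either the matching records plus a completion message, or an error.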

In computational storage systems, as storage capacity and the number of drives increase, the number of computational processors in the storage devices increases as well. Processing power therefore scales with storage. Computational storage processors can also be optimized for specific workloads to improve performance further.
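A toy model makes the scaling argument concrete. The rates below are arbitrary, made-up units chosen only for illustration, and the two functions are hypothetical helpers: the sketch assumes that host-side scanning is capped by a single shared host path, while in-drive scanning adds one processor per drive.

```python
def host_side_rate(num_drives, drive_rate, host_link_rate):
    """Traditional path: all data funnels through one host link and CPU."""
    return min(num_drives * drive_rate, host_link_rate)


def in_drive_rate(num_drives, drive_rate):
    """Computational storage path: each drive scans its own data in parallel."""
    return num_drives * drive_rate


DRIVE_RATE = 2      # assumed scan rate per drive, arbitrary units
HOST_LINK_RATE = 8  # assumed ceiling of the host path, same units

for n in (1, 2, 4, 8, 16):
    print(n, host_side_rate(n, DRIVE_RATE, HOST_LINK_RATE), in_drive_rate(n, DRIVE_RATE))
```

Under these assumptions the host-side rate stops growing once the shared path is saturated, while the in-drive rate keeps growing with the number of drives.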

Using computational storage reduces the amount of data sent from local storage (NAND flash) to DRAM for host processing. In the US EPA example, only a very small number of records need to be placed in DRAM, which frees the host processor to focus on the most important data.

AI is empowering computational storage

Today's systems generate a large amount of data at the edge. Compared with sending all of that data back to the cloud for processing, computing on and storing data directly at the edge reduces data movement. With AI technology, data stored locally at the edge can also be processed separately offline, so that only the required data is moved to the host or data center. This can greatly reduce power consumption and cost while improving performance.

As artificial intelligence (AI) technology learns from the way the human brain and its neurons work, develops more mathematical functions, and gives rise to specialized hardware, accelerators, and neural network engines for processing data, the functionality and efficiency of computational storage will improve further.

So what applications are suitable for computational storage? Examples include processor offload, video transcoding, and searching text, images, or video, as well as image classification and object detection and classification in automotive applications. These applications can use machine learning, encryption, and/or compression to simplify or reduce the amount of data that needs to be transferred around the system to the host processor.
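As a small illustration of the compression case, the following Python sketch compresses a batch of records before it would be handed to the host, using the standard `zlib` module. The record format and the `pack_for_host` helper are assumptions invented for this example; a real drive would use its own data format and most likely a hardware compression engine.

```python
import zlib


def pack_for_host(records):
    """Hypothetical helper: serialize records and compress them before transfer."""
    raw = "\n".join(records).encode("utf-8")
    return raw, zlib.compress(raw, level=6)


# Synthetic, repetitive records (made up for illustration).
records = [
    f"city{c:03d},hour={h:02d},pm25={(c * 7 + h * 3) % 50}"
    for c in range(100)
    for h in range(24)
]

raw, packed = pack_for_host(records)

# The host can recover the original bytes exactly.
restored = zlib.decompress(packed)

print(f"uncompressed: {len(raw)} bytes, compressed: {len(packed)} bytes")
```

How much is saved depends entirely on the data; highly repetitive records such as these compress well, while already-compressed video would not.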

Having seen how computational storage helps the system, we need to consider which processor to use to manage the data, because moving more computation into the drive requires more processing power there.

Choose the right computational storage processor IP

For computational storage applications, Synopsys has launched the ARC HS4x/HS4xD processor IP.

DesignWare® ARC® Processor IP provides a very flexible and scalable architecture. Its extensive processor portfolio ranges from low-end three-stage pipeline processors to higher-end 10-stage pipeline real-time and embedded application processors. In addition, Synopsys' embedded vision processors also provide neural network accelerators to help with AI processing.

Figure 4: A computational storage drive can contain multiple ARC® processors for different functions

As computing demands on computational storage drives continue to increase, so does the data processing pressure. To meet these demands, the DesignWare® ARC® HS6x processor uses a dual-issue, 64-bit superscalar architecture (Figure 5) that delivers up to 6.1 CoreMark/MHz in a small area with low power consumption. The ARC® HS6x processor is based on the advanced ARCv3 instruction set architecture (ISA) and pipeline, which provide leading power efficiency and code density. The processor has a 52-bit physical address space and can directly address up to 4.5 PB (4.5 × 10^15 bytes) of memory.
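The 4.5 PB figure follows directly from the 52-bit physical address space, assuming byte addressing. A short Python check of the arithmetic (the variable names are ours, not Synopsys'):

```python
PHYSICAL_ADDRESS_BITS = 52  # from the text; byte addressing is assumed

addressable_bytes = 2 ** PHYSICAL_ADDRESS_BITS
print(addressable_bytes)             # 4503599627370496

# Decimal petabytes (10^15 bytes): about 4.5 PB.
decimal_pb = addressable_bytes / 10 ** 15
print(f"{decimal_pb:.2f} PB")        # 4.50 PB

# Binary pebibytes (2^50 bytes): exactly 4 PiB.
binary_pib = addressable_bytes // 2 ** 50
print(f"{binary_pib} PiB")           # 4 PiB
```

So "4.5 PB" is the decimal reading of the limit; in binary units the same address space is exactly 4 PiB.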

As we move more external compute from storage access control into the local storage processor, we will need to build in additional processing power to support the required programming workloads. The ARC® HS6x cores are ideally suited to provide this additional processing power.

Figure 5: ARC HS6x processor

For applications that require even higher performance, a multicore processor version of the HS6x supports up to 12 ARC® HS6x CPU cores and up to 16 hardware accelerators in a single coherent processor cluster.

Conclusion

The transition from traditional storage architecture to computational storage is under way. In a traditional storage system, the host processor has to handle all storage requests and all data copies from storage to DRAM. Shifting to an in-storage, or computational storage, architecture, where data is operated on locally in the drive, will reduce costs and increase efficiency. Synopsys' processor IP will provide strong support for the development of computational storage.

This is the 2883rd article shared by "Semiconductor Industry Observer".
