
It’s time to take a closer look at CXL

Latest update time: 2021-11-23

Source: Compiled from the Rambus Blog by Semiconductor Industry Observer (ID: icbank). Thank you.


Exponential data growth is driving the semiconductor industry toward a groundbreaking architectural shift that will fundamentally change the performance, efficiency, and cost of data centers.

Server architectures have remained largely unchanged for decades, but they are now taking a revolutionary step toward handling the yottabytes of data generated by AI/ML applications. Specifically, data centers are moving from a model in which each server has its own dedicated processing, memory, networking equipment, and accelerators to a disaggregated “pool” paradigm that intelligently matches resources to workloads.

This approach offers a wide range of benefits to the data center, including higher performance, greater efficiency, and lower total cost of ownership (TCO). While the concepts of disaggregation (or rack-level architecture) and common interfaces have been floating around for some time, the industry is decisively converging on Compute Express Link (CXL) as the cache-coherent interconnect for processors, memory, and accelerators. In fact, new server architectures and designs with the CXL interface will soon be entering the market.

What is Compute Express Link?


Compute Express Link (CXL) is an open-standard, industry-supported cache-coherent interconnect for linking processors, memory expansion devices, and accelerators. Essentially, CXL technology maintains memory coherency between the CPU memory space and memory on attached devices. This enables resource sharing (or pooling) for higher performance, reduces software stack complexity, and lowers overall system cost. The CXL Consortium has identified three main categories of devices that can benefit from the new interconnect:

Type 1 devices: Accelerators such as SmartNICs typically lack local memory. However, they can use the CXL.io and CXL.cache protocols to communicate with and cache the host processor's DDR memory.

Type 2 devices: GPUs, ASICs, and FPGAs are equipped with DDR or HBM memory and can use the CXL.memory protocol along with CXL.io and CXL.cache to make the host processor's memory locally available to the accelerator, and the accelerator's memory locally available to the CPU. Both memories reside in the same cache-coherent domain, which helps with heterogeneous workloads.

Type 3 devices: The CXL.io and CXL.memory protocols can be used for memory expansion and pooling. For example, buffers attached to the CXL bus can be used to expand DRAM capacity, increase memory bandwidth, or add persistent memory without giving up DRAM slots. In practice, this means that the high-speed, low-latency storage devices that would previously have displaced DRAM can instead be attached as CXL-enabled memory devices. These may include non-volatile technologies in a variety of form factors, such as add-in cards, U.2, and EDSFF.

CXL Protocols and Standards


The Compute Express Link (CXL) standard supports a variety of use cases through three protocols: CXL.io, CXL.cache, and CXL.memory.

CXL.io: This protocol is functionally equivalent to the PCIe 5.0 protocol and leverages the broad industry adoption and familiarity of PCIe. As the base communications protocol, CXL.io is versatile and addresses a wide range of use cases.

CXL.cache: This protocol is designed for more specific applications and enables accelerators to efficiently access and cache host memory to optimize performance.

CXL.memory: This protocol enables a host (e.g., a processor) to access device-attached memory using load/store commands.

Together, these three protocols facilitate the coherent sharing of memory resources between computing devices such as a CPU host and an AI accelerator. Essentially, this simplifies programming by enabling communication over shared memory.
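As a loose software analogy (illustrative only, not CXL-specific code), the sketch below shows two agents operating on a single shared buffer with plain loads and stores rather than passing copies of the data back and forth; CXL's cache coherency provides a comparable shared-memory view across a CPU host and an accelerator in hardware.

```python
# Conceptual analogy only: two views of one shared buffer stand in for a
# host and an accelerator that see the same coherent memory.
shared = bytearray(16)                 # stand-in for a coherent shared region
host = memoryview(shared)              # "host" view of the region
device = memoryview(shared)            # "accelerator" view of the same region

host[:5] = b"hello"                    # host stores into shared memory
assert bytes(device[:5]) == b"hello"   # device loads the same data; no copy was passed
```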


Compute Express Link and PCIe 5: How are the two related?


CXL 2.0 builds on the physical and electrical interfaces of PCIe 5.0, adding a protocol layer that establishes coherency, simplifies the software stack, and maintains compatibility with existing standards.

Specifically, CXL leverages PCIe 5 capabilities to allow alternate protocols to use the physical PCIe layer. When a CXL-enabled accelerator is inserted into a x16 slot, the device negotiates with the host processor's port at the default PCI Express 1.0 transfer rate (2.5 GT/s). The Compute Express Link transaction protocol is only activated if both parties support CXL. Otherwise, they operate as PCIe devices.

According to VentureBeat's Chris Angelini, the alignment of CXL and PCIe 5 means both device classes can transfer data at 32 GT/s (gigatransfers per second), or up to 64 GB/s in each direction over a 16-lane link.
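As a rough sanity check on that figure (an illustrative calculation, not from the article), a x16 link at 32 GT/s per lane with PCIe 5.0's 128b/130b encoding works out to roughly 63 GB/s of raw bandwidth per direction, before protocol overheads:

```python
# Back-of-the-envelope check of the ~64 GB/s figure for a x16 link at 32 GT/s.
# Assumes PCIe 5.0 signaling with 128b/130b line encoding; packet headers and
# flow control would reduce usable throughput further.
gt_per_s = 32e9          # 32 GT/s per lane
lanes = 16
encoding = 128 / 130     # 128b/130b encoding efficiency

bytes_per_s = gt_per_s * lanes * encoding / 8
print(f"{bytes_per_s / 1e9:.1f} GB/s per direction")   # ~63.0 GB/s
```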

Angelini also noted that the performance demands of CXL will likely be a driving factor in the adoption of PCIe 6.0.

CXL Features and Benefits


Simplifying and improving low-latency connectivity and memory consistency can significantly increase computing performance and efficiency while reducing TCO.

Additionally, CXL memory expansion capabilities enable additional capacity and bandwidth beyond the tightly packed DIMM slots in today's servers. CXL can add more memory to the CPU host processor via CXL-attached devices. When paired with persistent memory, the low-latency CXL link allows the CPU host to use this additional memory in conjunction with DRAM. The performance of high-capacity workloads, such as AI, depends on large memory capacity.

Considering these are the types of workloads that most enterprises and data center operators are investing in, the benefits of CXL are clear.

CXL 2.0 Specification: What's New?


[Figure: Compute Express Link (CXL) memory pooling diagram – Rambus]


(1) Memory Pooling


CXL 2.0 adds support for switching to enable memory pooling. Using a CXL 2.0 switch, a host can access one or more devices in the pool. Although the host must be CXL 2.0 capable to take advantage of this feature, the memory devices can be a mix of CXL 1.0, 1.1, and 2.0 hardware. A CXL 1.0/1.1 device is limited to acting as a single logical device that can be accessed by only one host at a time. A CXL 2.0 device, however, can be partitioned into multiple logical devices, allowing up to 16 hosts to access different portions of its memory simultaneously.

For example, host 1 (H1) can use half of the memory in device 1 (D1) and one quarter of the memory in device 2 (D2) to exactly match the memory requirements of its workload with the available capacity in the memory pool. The remaining capacity in devices D1 and D2 can be used by one or more other hosts, up to a maximum of 16. Devices D3 and D4 (CXL 1.0 and 1.1 enabled, respectively) can only be used by one host at a time.
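A toy model may help make this concrete. The sketch below is illustrative only: the class, device capacities, and allocation sizes are hypothetical and do not represent a real CXL API. Each pooled device tracks fractional allocations from up to 16 hosts, mirroring the H1/D1/D2 example above.

```python
# Toy model of CXL 2.0-style memory pooling (not a real CXL API).
class PooledDevice:
    MAX_HOSTS = 16                       # CXL 2.0 allows up to 16 hosts per device

    def __init__(self, name: str, capacity_gb: int):
        self.name = name
        self.capacity_gb = capacity_gb
        self.allocations = {}            # host -> GB claimed

    def allocate(self, host: str, gb: int) -> None:
        free = self.capacity_gb - sum(self.allocations.values())
        if gb > free:
            raise ValueError(f"{self.name}: only {free} GB free")
        if host not in self.allocations and len(self.allocations) >= self.MAX_HOSTS:
            raise ValueError(f"{self.name}: already serving {self.MAX_HOSTS} hosts")
        self.allocations[host] = self.allocations.get(host, 0) + gb

# Hypothetical capacities: H1 claims half of D1 and a quarter of D2; the
# remainder stays available to other hosts.
d1, d2 = PooledDevice("D1", 512), PooledDevice("D2", 1024)
d1.allocate("H1", 256)
d2.allocate("H1", 256)
```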

(2) Switching


By migrating to the CXL 2.0 direct-attached architecture, data centers can gain the performance benefits of main memory expansion, along with the efficiency and total cost of ownership (TCO) benefits of pooled memory.

Assuming all hosts and devices support CXL 2.0, the switching function is incorporated into the memory devices via a crossbar within the CXL memory pooling chip. This keeps latency low, but requires a more capable chip, since it now takes on the control-plane functions that a switch would otherwise perform.

Through low-latency direct connections, the attached memory devices can use DDR DRAM to expand host main memory. This can be done on a very flexible basis, since the host is able to access all or part of the capacity of any number of devices as needed to handle specific workloads.

(3) “On-demand” memory paradigm


Similar to carpooling, CXL 2.0 allocates memory to hosts on an “as needed” basis, yielding higher memory utilization and efficiency. The architecture makes it possible to provision server main memory for nominal rather than worst-case workloads, drawing on the pool when high-capacity workloads require it, which further improves TCO.

Ultimately, the CXL memory pool model can support a fundamental shift toward server disaggregation and composability, a paradigm in which discrete units of compute, memory, and storage can be combined on demand to efficiently meet the needs of any workload.


(4) Integrity and Data Encryption (IDE)


Disaggregation (or separation of components of a server architecture) increases the attack surface. This is why CXL includes a secure-by-design approach. Specifically, all three CXL protocols are protected by integrity and data encryption (IDE) that provides confidentiality, integrity, and replay protection. IDE is implemented in a hardware-level secure protocol engine instantiated in the CXL host and device chips to meet the high-speed data rate requirements of CXL without introducing additional latency.
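As a rough illustration of the underlying primitive (a software sketch only, not the hardware IDE protocol engine or its exact key-management and counter rules), AES-GCM-style authenticated encryption provides confidentiality and an integrity tag, while a per-message counter defeats replay:

```python
# Software illustration of AES-GCM authenticated encryption, the style of
# primitive used for link-level IDE. Real IDE runs in hardware protocol
# engines with its own key management; all values here are made up.
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

counter = 7                              # per-stream counter: replayed counters are rejected
nonce = counter.to_bytes(12, "big")      # nonce derived from the counter
payload = b"example CXL payload"
header = b"stream-id=3"                  # integrity-protected but sent in the clear

ciphertext = aead.encrypt(nonce, payload, header)        # confidentiality + integrity tag
assert aead.decrypt(nonce, ciphertext, header) == payload
# Tampering with the header or ciphertext (or reusing an old counter/nonce)
# makes decryption raise InvalidTag, so the message is dropped.
```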

It should be noted that the CXL chips and systems themselves also require protection against tampering and cyberattacks. A hardware root of trust implemented in the CXL chip can provide this foundation for security, as well as support requirements such as secure boot and secure firmware download.

To advance CXL, industry companies have formed the CXL Consortium, an open industry standards body that aims to develop technical specifications enabling breakthrough performance for emerging usage models while supporting an open ecosystem for data center accelerators and other high-speed enhancements.





