The latest trends in backbone optical communications
Latest update time:2024-05-16
In today’s article, Xiao Zaojun will talk to you about some of the latest technological trends in backbone network optical communications.
█
400G, it’s really here
As you may have heard, domestic operators have been rolling out 400G commercially on their backbone networks since last year.
2023 saw a large number of commercial trials, followed by the full launch of centralized procurement. 2024 marks the official start of large-scale commercial use.
Not long ago, in March 2024, China Mobile opened the world's first 400G all-optical interprovincial (Beijing-Inner Mongolia) trunk line, which is regarded as an important milestone.
The reason for upgrading the backbone network to 400G is obvious.
On the one hand, consumer Internet traffic driven by residents’ digital lives (high-definition video, remote conferencing, live streaming, online games, etc.) continues to grow.
On the other hand, the entire industry is promoting digital transformation, and the surge in traffic from industry digital systems has intensified the pressure on backbone networks.
There is another key reason for the sharp increase in pressure on the backbone network: the explosion of AI.
The rise of large AIGC models has triggered a wave of AI. To meet the needs of AI workloads, a large number of intelligent computing centers need to be built. Models are growing from hundreds of billions of parameters to trillions, and GPU computing clusters are moving from thousand-card clusters to 10,000-card or even 100,000-card clusters.
As Xiao Zaojun has introduced in previous articles, a GPU computing cluster is essentially a massive array of GPU cards (GPU servers) connected through a high-performance network (such as InfiniBand or RoCEv2). The cluster places extremely high demands on network performance and reliability, which directly affect training efficiency and cost.
Take the network port rate of GPU servers alone: it starts at 400G per port and may reach 800G or higher.
Network port of the GPU server
In the past, GPU computing clusters fell into the category of DCN (data center internal network). Now, as the cluster scale continues to expand, distributed intelligent computing centers have begun to be considered for model training.
In other words, several intelligent computing centers in different locations can be used together for training.
This places higher demands on DCI (data center interconnection network), and the optical backbone network must be able to meet them in terms of technical performance.
China's computing power strategy adheres to the idea of “national coordination and overall layout.”
In February 2022, China launched the "East Data, West Computing" project to create a nationally integrated computing power system.
To put it simply, on the one hand, we need to build a large number of data centers (equivalent to power plants); on the other hand, we must also build a strong backbone transmission network (equivalent to the power grid) to "circulate" this computing power and meet the needs of all walks of life.
█
400G, how is it done?
As the foundation of the entire digital society, today's optical backbone network must offer ultra-large bandwidth (400G, 800G, or even 1.6T in the future), ultra-low latency (multi-level latency circles), ultra-large-scale networking (serving distributed computing as well as the AI clusters just mentioned), ultra-high stability, reliability, and security, ultra-flexible deployment, and intelligent operation, maintenance, management, and control.
Today, we mainly talk about the most important of these: rate, that is, bandwidth.
In optical communications, there are only a few ways to increase the rate:
First, the baud rate.
The transmission rate is the bit rate, which is the number of bits transmitted per unit time, and the unit is bit/s.
Bit rate = baud rate × number of binary digits corresponding to a single modulation state.
The baud rate is the number of symbols transmitted per unit time. The higher the baud rate, the more symbols are transmitted per second, so more information is carried and the bit rate goes up.
The baud rate is determined by the capabilities of the optical device. The more advanced the device chip process, the higher the baud rate and the higher the speed (bit rate).
At present, the CMOS process has been improved from 16nm to 7nm and 5nm, and the baud rate has gradually increased from 30+Gbaud to 64+Gbaud, 90+Gbaud, and 128+Gbaud.
Today's 400G is commercially viable because the baud rate can reach 128Gbaud.
Second, the modulation method.
In the formula just now, the "number of binary digits corresponding to a single modulation state" is determined by the modulation method.
The modulation schemes of 400G technology currently include 16QAM, 16QAM-PCS (PCS is a probability shaping technology, which will be introduced next time) and QPSK, which are suitable for different application scenarios.
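To make the formula "bit rate = baud rate × bits per symbol" concrete, here is a minimal sketch. Several things in it are assumptions rather than figures from this article: the function name is made up for illustration, the signal is assumed to use two polarizations (as coherent systems commonly do), and the 64Gbaud figure for 16QAM is only an illustrative pick from the baud-rate steps listed above. FEC and framing overhead are not modelled, so the result is a raw line rate, not the 400G net rate.

```python
from math import log2

# Constellation size per modulation format; bits per symbol = log2(points).
# 16QAM-PCS is left out because probabilistic shaping gives a non-integer value.
CONSTELLATION_POINTS = {"QPSK": 4, "16QAM": 16}


def raw_line_rate_gbps(baud_gbaud, modulation, polarizations=2):
    """Raw line rate in Gbit/s, before FEC and framing overhead.

    polarizations=2 is an assumption (dual-polarization coherent signal).
    """
    bits_per_symbol = log2(CONSTELLATION_POINTS[modulation])
    return baud_gbaud * bits_per_symbol * polarizations


# Illustrative pairings: a high baud rate with QPSK, a lower one with 16QAM.
for baud, modulation in [(128, "QPSK"), (64, "16QAM")]:
    rate = raw_line_rate_gbps(baud, modulation)
    print(f"{modulation} at {baud} Gbaud: {rate:.0f} Gbit/s raw")
```

Under these assumptions both pairings land on the same raw rate, which shows the trade-off: QPSK carries half as many bits per symbol as 16QAM, so it needs roughly twice the baud rate to deliver the same 400G service.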
Optical communication is different from wireless communication, and we do not blindly pursue high-order modulation.
The lower the modulation order, the lower the line requirements and the lower the network construction cost. Therefore, in the early design stage of long-distance backbone networks, we basically focused on 16QAM and QPSK. Later, 16QAM-PCS joined the competition.
Before "East Data, West Computing" was proposed, operators believed that 400G would not require long-distance transmission. The mainstream view in the industry was therefore to use low-baud-rate devices, which were more mature and cheaper, combined with the higher-order 16QAM modulation.
Later, on the one hand, the transmission distance requirements increased from more than 1,000 km to several thousand km. On the other hand, 128Gbaud devices matured rapidly (in the DCN scenario, 800G rose rapidly, stimulating and promoting the industrial chain), creating conditions for QPSK to stand out.
First, QPSK has higher tolerance to nonlinearity than 16QAM-PCS, so the fiber input power can be raised appropriately. Second, QPSK’s back-to-back OSNR threshold is better than that of 16QAM-PCS. Third, with the QPSK channel spacing set to 150GHz, there is almost no filtering penalty during transmission.
These advantages have made QPSK gradually become the industry's unanimous first choice for backbone networks and DCI.
A rough comparison of the three options
The first two schemes (16QAM and 16QAM-PCS) are now being considered mainly for metro networks and provincial trunk lines.
The third is the extended band.
Baud rate and modulation mainly affect the rate of a single wavelength. An optical fiber can carry multiple wavelengths, as long as the spectrum range is large enough.
Single-wavelength rate × number of wavelengths per fiber = single-fiber capacity.
As noted above, the channel spacing of QPSK 400G reaches 150GHz. Neither traditional C-band nor extended C-band is sufficient to meet the spectrum bandwidth requirements.
Therefore, the C6T+L6T scheme is now gradually being adopted, with a total spectrum bandwidth of 12THz. Do the math: 80 wavelengths at 400G each give a single-fiber capacity of 32T. If you trade some reach for capacity and deploy a higher-order scheme such as 16QAM-PCS with narrower channel spacing, the capacity can be even larger, reaching 48T.
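The arithmetic behind these capacity figures can be sketched in a few lines. The function names are made up for illustration. The 12THz spectrum and 150GHz spacing come from the text. The 100GHz spacing is an assumption: it is simply the spacing that makes the 48T figure work out at 400G per wavelength, not a value stated in this article.

```python
def channel_count(spectrum_ghz, spacing_ghz):
    """Number of whole channels that fit into the available spectrum."""
    return spectrum_ghz // spacing_ghz


def fiber_capacity_tbps(spectrum_ghz, spacing_ghz, per_channel_gbps):
    """Single-fiber capacity = rate per wavelength x number of wavelengths."""
    return channel_count(spectrum_ghz, spacing_ghz) * per_channel_gbps / 1000


C6T_PLUS_L6T_GHZ = 12_000  # 6 THz of C-band plus 6 THz of L-band

# QPSK 400G at 150 GHz spacing (figures from the text).
print(channel_count(C6T_PLUS_L6T_GHZ, 150))             # 80
print(fiber_capacity_tbps(C6T_PLUS_L6T_GHZ, 150, 400))  # 32.0

# Assumed 100 GHz spacing for a higher-order scheme (not stated in the text).
print(channel_count(C6T_PLUS_L6T_GHZ, 100))             # 120
print(fiber_capacity_tbps(C6T_PLUS_L6T_GHZ, 100, 400))  # 48.0
```

The sketch ignores guard bands and any gap between the C and L bands, so treat it as a back-of-the-envelope check rather than a planning tool.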
For a detailed introduction to the bands, you can see here:
What are the bands for optical communications?
The biggest problem with extending the band is whether the devices can support it and whether the cost is controllable. The devices in question include the ITLA, CDM, ICR, EDFA, and WSS, covering optical transmission and reception, optical path switching, amplification, and so on.
Expanding the band raises another issue: integration.
The current band extension is really more like two systems (C and L) simply bundled together. The two systems operate independently: their signals are multiplexed for transmission and, at the far end, demultiplexed and processed separately.
With two systems, the footprint is larger, the power consumption is higher, and the design is more complicated. Therefore, the industry needs to study how to integrate devices so that one system can support different extended bands at the same time, that is, to achieve true integration.
In optical fiber communications, besides optical modules and optical equipment, the fiber itself also deserves attention.
The current mainstream optical fiber is G.652D. With EDFA amplification, 400G QPSK can reach 1500km over G.652D.
After years of verification, the industry has determined that G.654E optical fiber is the new successor. If G.654E with better performance is used, the transmission distance of 400G QPSK can be increased by more than 30% under the same conditions.
G.654E optical fiber already has the capability of large-scale production and will be deployed on a large scale on long-distance trunk lines. Some low-loss optical fibers of the G.654 series have also become the first choice for ultra-long-distance transmission across oceans in submarine cable systems.
In addition to traditional optical fiber, the industry also believes that multi-core optical fiber and hollow-core optical fiber have broad application prospects.
Multi-core optical fiber is a form of space division multiplexing. Placing more fiber cores into one optical fiber, combined with few-mode transmission, can greatly increase the capacity of the fiber.
Hollow-core optical fibers are even more awesome. They directly make the optical fiber hollow and replace the glass fiber core with air.
Hollow-core optical fiber has been proven to bring greater capacity, lower latency, smaller transmission loss, and ultra-low nonlinearity. It is unanimously considered by the industry to be one of the most promising technologies in optical communications.
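The latency advantage comes from light travelling faster in air than in glass. A rough sketch of the effect follows. The two group-index values are approximate assumptions, not figures from this article (about 1.468 for a solid silica core near 1550nm, and close to 1.0003 for an air core), and the 1500km span is borrowed from the G.652D reach mentioned earlier purely as an example distance.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458

# Approximate group indices (assumptions for illustration only).
SOLID_CORE_INDEX = 1.468    # conventional silica fiber near 1550 nm
HOLLOW_CORE_INDEX = 1.0003  # light guided mostly in air


def one_way_latency_ms(distance_km, group_index):
    """Propagation delay over the fiber, ignoring equipment delay."""
    return distance_km * group_index / SPEED_OF_LIGHT_KM_PER_S * 1000


distance_km = 1500  # example span
solid = one_way_latency_ms(distance_km, SOLID_CORE_INDEX)
hollow = one_way_latency_ms(distance_km, HOLLOW_CORE_INDEX)

print(f"solid core:  {solid:.2f} ms")
print(f"hollow core: {hollow:.2f} ms")
print(f"saving:      {(1 - hollow / solid) * 100:.0f}%")
```

With these assumed values the propagation delay drops by roughly a third. Real hollow-core fibers will differ somewhat depending on their design.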
█
The next step after 400G: 800G or 1.6T?
After 400G is officially commercialized on a large scale, the entire industry will focus on the technical standard system beyond 400G.
The industry is still stepping up its debate on whether to develop 800G, 1.2T or 1.6T next.
If you want to achieve a higher rate, you must keep working on "modulation method + baud rate". 130GBd, or the even higher 260GBd, is the inevitable direction. Higher baud rates mean that the related devices must keep up and form a mature industrial chain.
Beyond 400G, QPSK can no longer be relied upon. 16QAM modulation is currently the option generally recognized by the industry.
The band also needs to be further expanded. Beyond the extended C and L bands, expansion into the S-band, U-band, E-band, and others is being considered. With C+L+S, that is 12THz + 5THz, for a total bandwidth of 17THz.
With all these factors combined, a single-fiber, single-direction transmission rate above 100Tbps is just around the corner.
Within the data center, 800G (based on a baud rate of 100GBd or above, single channel 100G) is already commercially available. Single-channel 200G, 400G, and 800G are only a matter of time. In this regard, progress abroad is faster.
As capacity continues to increase, so do the technical challenges. The development of optical communications, to put it bluntly, depends on devices, chips, processes, and materials.
To meet the aforementioned power consumption, security, operation and maintenance requirements, it also relies on a series of innovations such as process, architecture, packaging, artificial intelligence, and digital twins. There is still a lot of work that needs to be done upstream and downstream of the industrial chain. There is still a long way to go.
█
Final words
Optical communication is the digital artery of the entire society. Over the years, people have questioned many technologies (including 5G), but no one questions optical communications, because it is an essential need for social development.
The trend of increasing human data traffic will not change in the next few decades. The rapid rise of artificial intelligence technology will further amplify this trend.
The current development of optical communications cannot meet the demand. This means that companies will have greater motivation to invest resources in research and development in order to obtain profits.
It is hoped that the optical communications industry will further explode and pave the way for the development of a digitally intelligent society.
References:
1. "Key technologies, application progress and future prospects of high-speed optical transmission in the AI era", Institute of Technology and Standards, Academy of Information and Communications Technology, Zhang Haiyi;
2. "Computing power network opens a new era of 400G all-optical", China Mobile Research Institute, Duan Xiaodong;
3. "400G All-Optical Computing Power Internet in the AI Era", China Unicom Research Institute, Tang Xiongyan.