Zhengzhou flood: did the disaster recovery mechanism of the communication network work?
Over the past two days, Zhengzhou and other parts of Henan have been hit by exceptionally heavy rainfall, which has caused severe flooding.
The floods caused urban waterlogging, bringing heavy losses to the lives and property of local residents. The shocking video footage from the scene has touched the hearts of people across the country.
At present, the frontline is carrying out intensive rescue and relief operations. We can only pray silently, hoping that the rain will stop soon, the water will recede soon, the losses in the disaster area will not be further expanded, and the lives of the people in the disaster area can return to normal as soon as possible.
As someone who works in telecommunications, Xiaozaojun has been following the damage to communications network facilities as closely as the situation on the ground.
Past experience shows that when a disaster strikes, the local communications infrastructure is almost always damaged. Yet a reliable communications network is essential to rescue and relief work, and it also helps keep people in the disaster area calm.
That is why, once a disaster occurs, front-line communications staff must move quickly to repair equipment and provide emergency support.
In Xiaozaojun's WeChat Moments, telecom colleagues in Henan are already working overtime on repairs, striving to restore service as soon as possible.
Image from Zhengzhou Unicom
According to fault notifications reported by colleagues on the scene, this flood has indeed caused damage far beyond that of any previous one.
Fault notification message at the Zhengzhou site
In the past, ordinary floods would submerge only base stations and access equipment rooms. In more serious cases, they would reach the aggregation rooms and equipment rooms of individual districts and counties. The waterlogging caused by this rainfall went further: it flooded some backbone core equipment rooms in the provincial capital, and rainwater flowed back into both the primary and the backup equipment rooms.
This situation is extremely rare. It has probably not happened in China in recent decades.
The backbone equipment rooms house the core network equipment, which is the heart of the entire communications network.
Core network room
At present, the operators' HLR equipment is the most affected.
The HLR (Home Location Register) is a subscriber database and one of the key elements of the core network. It stores the data of all local subscribers, including basic subscriber information, basic service information, supplementary service information, and so on.
HLR was the name used in the 2G/3G era. In the 4G era it was succeeded by the HSS (Home Subscriber Server), with upgraded functions and performance; in the 5G standalone core the corresponding function is the UDM (Unified Data Management).
As subscriber databases, the HLR and HSS are the core of the entire communications network, and most major network failures involve them: either data is deleted by mistake, or transmission is interrupted (for example by a fiber cut) and the links to the HLR (HSS) go down.
In 2017, a major network outage in Nanning, Guangxi was caused by the accidental deletion of the data of 800,000 subscribers from the operator's HLR. Service across the network was interrupted for 8 hours and 39 minutes, which had a huge impact. The responsible party was fined 500 million yuan.
The HLR in Zhengzhou was knocked out by the flood (in telecom jargon, it went "out of service"), which had a huge impact. However, judging from the situation on site, the disaster recovery mechanism did its job, so there was no large-scale communications outage.
First of all, I would like to remind everyone in the disaster-stricken areas of Henan Province to avoid powering off their mobile phones for now, because a phone that is switched back on has to contact the HLR to "register".
Under normal circumstances
If the HLR is out of service when a mobile phone is switched on, the signaling messages cannot reach the HLR, the network cannot confirm the subscriber's identity, and the phone cannot attach to the network.
When HLR is out of service
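The difference between the two cases can be sketched in a few lines of Python. This is only a toy model: the `Hlr` class, the `power_on_attach` function and the IMSI value are hypothetical names made up for illustration, and a real attach involves many more network elements and messages.

```python
from dataclasses import dataclass


@dataclass
class Hlr:
    """Toy subscriber database keyed by IMSI (hypothetical, for illustration)."""
    subscribers: set
    in_service: bool = True

    def authenticate(self, imsi: str) -> bool:
        # An out-of-service HLR cannot be reached by signaling at all.
        if not self.in_service:
            raise ConnectionError("HLR unreachable")
        return imsi in self.subscribers


def power_on_attach(imsi: str, hlr: Hlr) -> str:
    """Return the outcome of a power-on registration attempt."""
    try:
        known = hlr.authenticate(imsi)
    except ConnectionError:
        # No identity confirmation from the network, so no access.
        return "no service: HLR unreachable"
    return "attached" if known else "rejected: unknown subscriber"


hlr = Hlr(subscribers={"460001234567890"})   # made-up IMSI
print(power_on_attach("460001234567890", hlr))

hlr.in_service = False                        # the HLR is flooded
print(power_on_attach("460001234567890", hlr))
```

A phone that was already attached before the outage never enters this path, which is why leaving it switched on is the safer choice.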
Generally speaking, once a mobile phone is attached, the network also has it perform a periodic "location update". In other words, the phone is required to report its status to the network at regular intervals. When the disaster struck, the local operator may have manually changed the network-side configuration to lengthen this update period, so that phones would not attempt location updates that were bound to fail.
In addition, the operators' off-site backup arrangement played an important role after the Zhengzhou HLR went out of service.
When both the local primary and the local backup HLR were hit by the disaster, the operator activated a backup HLR located in the capital of a neighboring province to temporarily take over from the out-of-service local HLR and keep services running.
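To see why lengthening the update period helps, here is a rough back-of-the-envelope calculation. The subscriber count and the two periods are assumptions chosen purely for illustration, not figures from the Zhengzhou network, and `periodic_updates_per_hour` is a hypothetical helper.

```python
def periodic_updates_per_hour(idle_subscribers: int, period_minutes: float) -> float:
    """Average number of periodic location updates per hour.

    Assumes every idle phone sends exactly one update per period and
    that the updates are spread evenly over time.
    """
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    return idle_subscribers * 60 / period_minutes


subscribers = 1_000_000                      # assumed, for illustration only
for period in (60, 240):                     # assumed periods, in minutes
    load = periodic_updates_per_hour(subscribers, period)
    print(f"period {period:>3} min: {load:,.0f} updates per hour")
```

Under these assumptions, quadrupling the period cuts the periodic update traffic to a quarter, which means far fewer transactions that could fail while the subscriber database is unreachable.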
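The selection logic behind this kind of tiered backup can be sketched as a simple priority list. The node names, site names and the `select_active_hlr` function below are hypothetical; real operators switch over through signaling routing and data replication, which this sketch does not model.

```python
from typing import NamedTuple, Optional, Sequence


class HlrNode(NamedTuple):
    name: str
    site: str
    healthy: bool


def select_active_hlr(nodes_by_priority: Sequence[HlrNode]) -> Optional[HlrNode]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes_by_priority:
        if node.healthy:
            return node
    return None


# Hypothetical deployment: local primary, local backup, off-site backup.
nodes = [
    HlrNode("hlr-primary", "local main equipment room", healthy=False),
    HlrNode("hlr-backup", "local backup equipment room", healthy=False),
    HlrNode("hlr-remote", "neighboring province", healthy=True),
]

active = select_active_hlr(nodes)
print(active.name if active else "no HLR available")
```

The point of the third tier is geographic separation: a disaster that takes out both equipment rooms in one city is unlikely to reach a site in another province at the same time.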
This is basically the highest level of backup, specifically designed for extreme situations such as war, terrorist attacks, earthquakes, etc.
Different levels of disaster recovery
In extremely special cases, when the number of user calls surges and there are too many signaling messages in the network, exceeding the load of the network link, the network side may take measures such as canceling user authentication to reduce the network signaling load as much as possible and avoid complete network congestion.
During this flood, the RADIUS equipment for the fixed-line broadband access service went offline, so authentication was disabled.
RADIUS stands for Remote Authentication Dial In User Service. As the name suggests, it is also a system that authenticates and authorizes users. When RADIUS failed on site, the fix was simply to turn authentication off and enable a "dial-up without authentication" policy, so that all users could still get online.
Besides the HLR, according to feedback from the site, microwave relay links and IPTV services were also temporarily affected, but these problems were less troublesome.
At present, the communications engineers on site are working intensively to repair the equipment, and the core backbone network should be restored soon. As the flood recedes, repairs to the equipment rooms at the various sites will also get fully under way, and mobile and broadband services will gradually return to normal.
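This "let everyone through when the authentication server is down" policy is often called fail-open. Below is a minimal sketch of the idea; `authorize_dialup`, the `radius_check` callback and the credentials are hypothetical stand-ins, not the API of any real broadband access server or RADIUS client.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"


def authorize_dialup(
    username: str,
    password: str,
    radius_check: Callable[[str, str], bool],
    fail_open: bool,
) -> Decision:
    """Decide whether to admit a dial-up session.

    radius_check is a hypothetical callback that raises TimeoutError
    when the RADIUS server cannot be reached.
    """
    try:
        ok = radius_check(username, password)
    except TimeoutError:
        # Server unreachable: admit everyone only if the emergency policy is on.
        return Decision.ACCEPT if fail_open else Decision.REJECT
    return Decision.ACCEPT if ok else Decision.REJECT


def radius_offline(username: str, password: str) -> bool:
    """Simulates a flooded RADIUS server that never answers."""
    raise TimeoutError("RADIUS server did not respond")


print(authorize_dialup("alice", "secret", radius_offline, fail_open=False))
print(authorize_dialup("alice", "secret", radius_offline, fail_open=True))
```

The trade-off is deliberate: while the policy is on, the network gives up checking who is dialing in, in exchange for keeping every user connected during the emergency.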
Finally, I pray again that the rain in Henan will stop soon and the floods will recede soon. I hope everyone is safe. I also hope that all front-line communication engineers will pay attention to safety, fulfill their mission, and succeed in rescue operations!
The picture is from the Internet, not the disaster area in Zhengzhou
Author: Xiaozaojun
Source: Fresh Date Classroom https://mp.weixin.qq.com/s/-FhonJzN52WE0gbdnNEFsg