What’s special about space-grade chip design
Source: This article is reproduced from the public account Network Switch FPGA. Author: Dr. Di Zhixiong, graduate advisor at Southwest Jiaotong University. With thanks.
Electronic systems are essential to spacecraft such as the Starlink satellites, Mars rovers, Yutu, and Chang'e, and space-grade chips are the heart of that electronic equipment.
There is a legend circulating in the industry that a space-grade FPGA chip from Xilinx, with a unit price of about 5 million yuan, is the most expensive chip in history.
This legend is not entirely without basis. Space-grade chips must be radiation-resistant, and they often cost tens, hundreds, or even thousands of times more than the consumer-grade chips we use every day. So, compared with consumer-grade chips, what is so special about these expensive space-grade chips at the design stage?
The space environment in which space-grade chips operate
There are a large number of high-energy particles and cosmic rays in the space environment where spacecraft operate. These particles and rays can penetrate the spacecraft shielding layer and interact with the materials of components to produce radiation effects, causing device performance degradation or functional abnormalities, affecting the on-orbit safety of the spacecraft. The main space radiation sources that cause device radiation effects include the Earth's radiation belt, galactic cosmic rays, solar cosmic rays, and artificial radiation.
Of the resulting radiation effects, the one with the most serious impact on chip operation is the single-event effect (SEE).
According to statistics, 39 geosynchronous satellites launched abroad between 1971 and 1986 experienced 1,589 failures, 1,129 of which were related to space radiation; 621 of those were caused by single-event effects. These statistics show that most failures of electronic devices in aerospace applications come from space radiation, and that single-event effects account for a large share of the radiation-related failures.
Some of these faults are permanent and irreversible. One example is a single-event latch-up (SEL), which creates a local short circuit inside the chip, draws a large current, and burns out the device. This type of error can be avoided by using specific processes or device libraries. Most errors in space, however, are recoverable errors caused by a change in the logic state of a semiconductor device, such as a single-event upset that corrupts data stored in memory.
A single-event upset (SEU) is a radiation-induced change in the potential state of a circuit element: a "0" becomes a "1", or a "1" becomes a "0", generally without physical damage to the device. Because single-event upsets occur frequently, they need special attention at the chip design stage, and they are the focus of this article.
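The proportions behind that conclusion can be worked out directly from the three counts quoted above. The short sketch below only redoes the arithmetic; the variable names are illustrative and no data beyond those three numbers is assumed.

```python
# Failure counts quoted in the text (39 satellites, 1971 to 1986).
total_failures = 1589
radiation_failures = 1129
single_event_failures = 621

# Share of all failures that were related to space radiation.
radiation_share = radiation_failures / total_failures
# Share of radiation-related failures caused by single-event effects.
see_share_of_radiation = single_event_failures / radiation_failures
# Share of all failures caused by single-event effects.
see_share_of_total = single_event_failures / total_failures

print(f"radiation-related: {radiation_share:.1%} of all failures")
print(f"single-event: {see_share_of_radiation:.1%} of radiation-related failures")
print(f"single-event: {see_share_of_total:.1%} of all failures")
```

In other words, roughly seven in ten failures were radiation-related, and a little over half of those were single-event effects.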
How to protect against single-event upsets during chip design
(1) Select the appropriate process
In the aerospace field, a smaller process node is not necessarily better. Generally speaking, the smaller the feature size, the worse the radiation resistance. To ensure reliability, processes with larger line widths, such as 0.18 µm, 90 nm, or 65 nm, are therefore generally selected rather than blindly chasing the leading edge of Moore's Law.
(2) Hardening the standard cell library
The standard cell library is the cornerstone of a digital chip. If a digital chip is regarded as a building, the standard cells are the bricks it is built from. The library includes a variety of basic cells such as inverters, AND gates, registers, multiplexers, and full adders. Each standard cell comes in several circuit variants with different transistor sizes (W/L) and different drive strengths. Complex digital chips are constructed from these basic cells.
Given the large scale of digital chips, it is difficult to design them with fully custom circuit structures, so hardening the commercial process library directly is the lowest-cost option. Starting from the standard cell library supplied by the manufacturer and adding radiation-hardening measures, the resulting library, including its input and output cells, becomes radiation-resistant. The hardened library needs to be verified by the foundry.
(3) Design redundancy
Among radiation-hardening methods, triple modular redundancy (TMR) is the most representative fault-tolerance mechanism. Three modules with the same function perform the same operation at the same time. Because a single-event upset is assumed to corrupt only one of the three paths, a two-out-of-three majority voter outputs the result on which the other two paths agree, which improves the reliability of the circuit. The most significant advantages of TMR are its strong error-correction capability and its simple design. The disadvantage is equally obvious: the circuit area grows to more than three times the original, since the logic is triplicated and a voter is added. TMR is also flexible. It can be applied at any level, such as register level, circuit level, or module level, according to performance requirements, and some EDA tools can insert it automatically.
Error detection and correction (EDAC) is another simple and efficient design method for protecting against single-event upsets. When data is written, an encoding circuit generates a check code from the data and stores it alongside the data. When the data is read, the check code is evaluated. If only one bit is wrong, the system automatically corrects it and outputs the correct data, and it also writes the corrected data back to overwrite the erroneous stored copy. Although EDAC has strong error-correction capability, it requires encoding and decoding circuits, so its structure is relatively complex and it is not well suited to high-performance data paths. EDAC can also be used to correct multi-bit errors, but the correction circuit becomes more complicated.
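The voting step can be illustrated with a small behavioural model. This is a minimal sketch of register-level TMR, not a hardware description: the `TmrRegister` class, its 8-bit default width, and the `inject_upset` helper are hypothetical names introduced only for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise two-out-of-three vote over three redundant copies."""
    return (a & b) | (a & c) | (b & c)


class TmrRegister:
    """Hypothetical register-level TMR model: three copies plus a voter."""

    def __init__(self, value: int, width: int = 8) -> None:
        self.mask = (1 << width) - 1
        self.copies = [value & self.mask] * 3

    def inject_upset(self, copy_index: int, bit: int) -> None:
        """Flip one bit in one copy to model a single-event upset."""
        self.copies[copy_index] ^= 1 << bit

    def read(self) -> int:
        """Return the voted value of the three copies."""
        return majority_vote(*self.copies)


reg = TmrRegister(0b10110010)
reg.inject_upset(copy_index=1, bit=3)   # one path is corrupted
single_fault_value = reg.read()         # the other two paths outvote it

reg.inject_upset(copy_index=2, bit=3)   # same bit flipped in a second path
double_fault_value = reg.read()         # the voter now picks the wrong bit
```

After the first upset the voted value is still the original `0b10110010`. The second upset hits the same bit in another copy, so two of three paths agree on the wrong value and the voter can no longer mask the error. This is why the scheme relies on the assumption that only one path is upset at a time.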
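The write, check, correct, and write-back sequence can be sketched with a Hamming(7,4) code, used here purely as a small illustrative example. The article does not say which code a given EDAC design uses, and practical memories typically protect wider words. The `EdacMemory` class and its method names are hypothetical.

```python
from typing import Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # codeword positions (1-based) holding data bits


def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword; position p is bit p-1."""
    bits = [0] * 8  # index 0 unused so that indices match positions 1..7
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # parity over positions 1,3,5,7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # parity over positions 2,3,6,7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # parity over positions 4,5,6,7
    return sum(bits[pos] << (pos - 1) for pos in range(1, 8))


def hamming74_decode(codeword: int) -> Tuple[int, int]:
    """Correct at most one flipped bit; return (codeword, data nibble)."""
    syndrome = 0
    for pos in range(1, 8):
        if (codeword >> (pos - 1)) & 1:
            syndrome ^= pos  # XOR of set-bit positions is 0 when valid
    if syndrome:
        codeword ^= 1 << (syndrome - 1)  # syndrome names the bad position
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        nibble |= ((codeword >> (pos - 1)) & 1) << i
    return codeword, nibble


class EdacMemory:
    """Hypothetical memory model storing check bits alongside the data."""

    def __init__(self, size: int) -> None:
        self.cells = [hamming74_encode(0)] * size

    def write(self, addr: int, nibble: int) -> None:
        self.cells[addr] = hamming74_encode(nibble & 0xF)

    def read(self, addr: int) -> int:
        corrected, nibble = hamming74_decode(self.cells[addr])
        self.cells[addr] = corrected  # write back to repair the stored copy
        return nibble

    def inject_upset(self, addr: int, bit: int) -> None:
        self.cells[addr] ^= 1 << bit  # model a single-event upset


mem = EdacMemory(size=4)
mem.write(2, 0b1011)
mem.inject_upset(2, bit=4)
value = mem.read(2)  # corrected on read, and the stored word is repaired
```

A code of this kind corrects any single flipped bit per word. Two flips in the same word would be miscorrected, which is one reason multi-bit protection needs a stronger and more complex code.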
Weighing the pros and cons of TMR and EDAC, TMR is usually used in logic circuit design and EDAC is used in memory read and write circuits.
(4) Module independence
Single-event upsets occur frequently, so the design must ensure that the overall function of the chip is not affected when one occurs. In the architecture, the modules should therefore be kept as independent of one another as possible, and each should have its own reset where possible. After a single-event upset changes a signal value, the faulty circuit can then be restored quickly through its reset signal, while the other modules that are working normally remain unaffected. In addition, an anomaly detection circuit needs to be added so that the circuit is reset when an anomaly is found.
Summary
Although the methods above protect effectively against single-event upsets, they also complicate logic synthesis and place-and-route, and they need to be handled with care during physical implementation of the chip. Beyond these methods, Muller C-elements and dual interlocked storage cell (DICE) structures can be introduced to protect circuits at the transistor level, and annular (enclosed) gates can replace straight gates at the layout stage.
In short, in aerospace, chip performance is not the first consideration; reliability is the top priority. Only when the chip can withstand radiation can the normal operation of the spacecraft be ensured.
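The idea of per-module detection and reset can be sketched as follows. This is a simplified behavioural model under stated assumptions: each module is reduced to a three-state one-hot state machine, the anomaly detector is just a legal-state check, and the `Module` and `supervise` names are hypothetical. The article does not prescribe any particular detection scheme.

```python
from typing import List


class Module:
    """Hypothetical block with a one-hot state register and its own reset."""

    LEGAL_STATES = (0b001, 0b010, 0b100)

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = self.LEGAL_STATES[0]
        self.resets = 0

    def step(self) -> None:
        """Advance to the next state by rotating the 3-bit register left."""
        self.state = ((self.state << 1) | (self.state >> 2)) & 0b111

    def is_abnormal(self) -> bool:
        """Anomaly detector: any non-one-hot pattern is illegal."""
        return self.state not in self.LEGAL_STATES

    def reset(self) -> None:
        """Independent reset that touches only this module."""
        self.state = self.LEGAL_STATES[0]
        self.resets += 1


def supervise(modules: List[Module]) -> List[str]:
    """Reset only the modules whose detector fires; return their names."""
    recovered = []
    for module in modules:
        if module.is_abnormal():
            module.reset()
            recovered.append(module.name)
    return recovered


a, b = Module("a"), Module("b")
a.step()
b.step()
a.state ^= 0b001              # upset: one bit of module a's state flips
recovered = supervise([a, b])  # only module a is reset
```

With one-hot encoding, a single flipped bit always yields a pattern with zero or two bits set, so the detector catches it. Module `b` keeps its state and is never reset, which is the independence property the paragraph above asks for.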
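As a rough illustration of why a Muller C-element helps, the sketch below models its behaviour: the output follows the inputs only when they agree and otherwise holds its previous value. Feeding it from two redundant copies of a signal is an assumption made for this example; the article does not describe how the element is wired into a hardened design, and the `MullerC` class is a hypothetical name.

```python
class MullerC:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        """Follow the inputs when they agree; otherwise hold the output."""
        if a == b:
            self.out = a
        return self.out


# Two redundant copies of the same signal feed the element (assumption).
gate = MullerC(initial=0)
trace = [
    gate.update(1, 1),  # both copies agree on 1, so the output becomes 1
    gate.update(0, 1),  # one copy is disturbed, so the output holds
    gate.update(1, 1),  # disturbance gone, output unchanged
    gate.update(0, 0),  # both copies agree on 0, so the output becomes 0
]
```

A disturbance on only one input never reaches the output, because the element waits for both inputs to agree before changing state.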
*Disclaimer: This article is originally written by the author. The content of the article is the author's personal opinion. Semiconductor Industry Observer reprints it only to convey a different point of view. It does not mean that Semiconductor Industry Observer agrees or supports this point of view. If you have any objections, please contact Semiconductor Industry Observer.