SEU: How to Deal with This Invisible Hand
A Single Event Upset (SEU) occurs when one of the many high-energy charged particles in the space environment strikes an electronic component and flips a logic state from "0" to "1" or from "1" to "0". It is also called a soft error because the device hardware is not physically damaged and the state can be restored. SEUs are most likely in devices that use bistable storage elements, such as RAM. In high-reliability systems, large-capacity DRAM products are already required to detect and correct SEUs. As chip integration increases, the impact of SEU on SRAM can no longer be ignored either, and the configuration memory of most FPGA products is SRAM-based. More and more users are therefore looking at how to reduce the impact of SEU on FPGA systems, to avoid system malfunction and, in severe cases, catastrophic accidents. Let's take a look at how Lattice's mainstream products address this.
The figure shows the SED (Soft Error Detection) block of the ECP series. It consists mainly of the configuration memory access controller, the SED control state machine, and a 32-bit CRC data register. The SED controller serially reads back the contents of the entire configuration memory and computes a CRC over them. If the result equals the value stored in the 32-bit CRC register, no SEU has been detected in the configuration memory; otherwise, one or more soft errors have occurred. An important point is that SED runs in the background and does not affect normal operation of the device. The 32-bit check value is generated automatically by the Diamond development tool from the bitstream file and is written to the CRC register during programming. Note that the contents of EBR and distributed memory are not covered by the CRC check.
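The compare-against-a-stored-CRC idea can be sketched in a few lines of Python. This is only an illustration: `zlib.crc32` stands in for the device's CRC (the article does not state the polynomial Lattice uses), and `golden_crc`, `sed_check`, and the sample data are hypothetical names and values.

```python
import zlib


def golden_crc(bitstream: bytes) -> int:
    """CRC-32 computed once from the bitstream, as the design tool would do."""
    return zlib.crc32(bitstream) & 0xFFFFFFFF


def sed_check(config_memory: bytes, expected_crc: int) -> bool:
    """Return True if the read-back memory still matches the stored CRC."""
    return (zlib.crc32(config_memory) & 0xFFFFFFFF) == expected_crc


# Stand-in for configuration data (assumption: real data comes from the device).
bitstream = bytes(range(256)) * 4
expected = golden_crc(bitstream)

# Flip a single bit, as an SEU would.
upset = bytearray(bitstream)
upset[10] ^= 0x04

print(sed_check(bitstream, expected))     # True: memory unchanged
print(sed_check(bytes(upset), expected))  # False: the flipped bit is detected
```

A CRC detects every single-bit error, so the second check always fails here. For several simultaneous upsets, detection is very likely but not mathematically guaranteed.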
The figure above is the timing diagram of the overall SED module. SEDENABLE enables the SED control module and SEDSTART triggers a detection pass. SEDINPROG is high while the check of the entire configuration memory is in progress, SEDDONE indicates that one pass has completed, and SEDERR is asserted if the CRC check fails. The time needed for one pass depends on the size of the device and on the frequency of the clock driving the detection circuit. For ECP5 series devices, a full-chip check generally completes within one second with a 2.5 MHz detection clock.
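The handshake described above can be mimicked with a small behavioral model. This is a simplified sketch, not cycle-accurate to the device: the signal names come from the text, while `SedModel`, `scan_cycles`, `crc_ok`, and `run_scan` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SedModel:
    scan_cycles: int        # hypothetical length of one pass, in clock cycles
    crc_ok: bool = True     # whether the read-back CRC matches the stored one
    sedinprog: bool = False
    seddone: bool = False
    sederr: bool = False
    _remaining: int = field(default=0, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.scan_cycles < 1:
            raise ValueError("scan_cycles must be at least 1")

    def clock(self, sedenable: bool, sedstart: bool) -> None:
        """Advance the model by one detection-clock cycle."""
        if not sedenable:
            # Disabled: abandon any pass in progress.
            self.sedinprog = False
            self._remaining = 0
            return
        if self.sedinprog:
            self._remaining -= 1
            if self._remaining == 0:
                # Pass finished: report completion and the CRC result.
                self.sedinprog = False
                self.seddone = True
                self.sederr = not self.crc_ok
        elif sedstart:
            # Start a new pass and clear the previous result.
            self.sedinprog = True
            self.seddone = False
            self.sederr = False
            self._remaining = self.scan_cycles


def run_scan(model: SedModel, max_cycles: int = 1000) -> int:
    """Start one pass, wait for SEDDONE, and return the cycles consumed."""
    model.clock(sedenable=True, sedstart=True)
    cycles = 1
    while not model.seddone:
        if cycles >= max_cycles:
            raise TimeoutError("SEDDONE never asserted")
        model.clock(sedenable=True, sedstart=False)
        cycles += 1
    return cycles


clean = SedModel(scan_cycles=8)
print(run_scan(clean), clean.sederr)      # 9 False

upset = SedModel(scan_cycles=8, crc_ok=False)
print(run_scan(upset), upset.sederr)      # 9 True
```

User logic would watch SEDDONE and SEDERR in the same way: start a pass, wait for completion, then act on the error flag.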
What should we do when an SEU is detected? The most direct approach is to reload the configuration, which is what we call SEC (Soft Error Correction).
There are several ways to reconfigure the device: Master SPI, Slave SPI, I2C, or JTAG. The figure above shows loading over MSPI. A low pulse on the PROGRAMN pin, ending with its low-to-high transition, makes the device read the configuration data from the off-chip SPI flash and overwrite the existing configuration. Once configuration starts, the DONE signal goes low; it returns high when configuration has completed successfully.
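From the point of view of an external supervisor (for example a microcontroller), the reload sequence can be sketched as follows. The callables `pulse_programn` and `read_done` are hypothetical placeholders for board-specific GPIO access, and `FakeBoard` exists only so the sketch runs without hardware.

```python
import time
from typing import Callable


def reconfigure(pulse_programn: Callable[[], None],
                read_done: Callable[[], bool],
                timeout_s: float = 1.0,
                poll_s: float = 0.001) -> bool:
    """Trigger a reload and wait for DONE to go low and then high again.

    Returns True on success and False if DONE does not return high in time.
    """
    pulse_programn()
    deadline = time.monotonic() + timeout_s
    saw_low = False
    while time.monotonic() < deadline:
        if not read_done():
            saw_low = True       # configuration has started
        elif saw_low:
            return True          # DONE is high again: reload finished
        time.sleep(poll_s)
    return False


class FakeBoard:
    """Stand-in for real hardware: DONE stays low for a few polls."""

    def __init__(self, low_polls: int = 3) -> None:
        self.low_polls = low_polls
        self.remaining = 0

    def pulse_programn(self) -> None:
        self.remaining = self.low_polls

    def read_done(self) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


board = FakeBoard()
print(reconfigure(board.pulse_programn, board.read_done, poll_s=0))  # True
```

The function insists on seeing DONE low before accepting a high level, so a stale high reading taken before configuration starts is not mistaken for success. The price is that a poll interval longer than the reload itself would miss the low phase and report a timeout.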
Since SEUs are very rare, they are hard to observe during normal circuit operation. How, then, can we verify that our SED/SEC works reliably? The Diamond software provides a debugging feature for injecting soft errors, SEI (Soft Error Injection). The tool generates, in the background, bitstreams that contain one or more randomly placed soft errors and loads them into the device to simulate a real upset, which lets you verify that the detection circuit works reliably and stably.
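The principle of error injection can be illustrated in software: flip randomly chosen bits in a copy of the data and confirm that the check notices. This is a sketch of the idea only, not of how the Diamond tool is implemented. `inject_bit_flips` is a hypothetical helper, and `zlib.crc32` stands in for the device's CRC.

```python
import random
import zlib
from typing import List, Tuple


def inject_bit_flips(data: bytes, n_flips: int,
                     rng: random.Random) -> Tuple[bytes, List[int]]:
    """Return a copy of data with n_flips distinct random bits inverted,
    plus the sorted list of flipped bit positions."""
    total_bits = len(data) * 8
    if not 0 <= n_flips <= total_bits:
        raise ValueError("n_flips out of range")
    positions = rng.sample(range(total_bits), n_flips)
    corrupted = bytearray(data)
    for pos in positions:
        corrupted[pos // 8] ^= 1 << (pos % 8)
    return bytes(corrupted), sorted(positions)


golden = bytes(range(256))            # stand-in for a bitstream
rng = random.Random(0)                # fixed seed for a repeatable run

faulty, where = inject_bit_flips(golden, 1, rng)
detected = zlib.crc32(faulty) != zlib.crc32(golden)
print(len(where), detected)           # 1 True
```

A single injected bit is always caught by a CRC, which makes it a convenient first test. Injecting several bits exercises the multi-error case, where detection is highly probable rather than certain.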
In summary, Lattice's mainstream FPGA products take the impact of SEU on system reliability into account and address soft errors in both hardware and software. Depending on their application architecture, users can either let SEC reload the bitstream directly to repair the system, or report the error to a higher application layer and let it decide on the next step.
For more information about SED/SEC, please visit Lattice's official website (www.latticesemi.com) and search for relevant keywords.