Discussion on Software Anti-interference Methods for Single-Chip Microcomputer Systems

　　Alongside efforts to improve the interference immunity of hardware, software anti-interference is receiving more and more attention because it is flexible to design, saves hardware resources, and is reliable. This article takes the MCS-51 single-chip microcomputer system as an example to study software anti-interference methods for microcomputer systems.

　　1 Research on software anti-interference methods

　　In engineering practice, software anti-interference research covers two main topics: (1) eliminating noise on analog input signals (for example, by digital filtering); (2) getting the program back on track when it runs out of control. This article proposes several effective software anti-interference methods for the latter.

　　1.1 Instruction Redundancy

　　The CPU fetches an instruction by reading the opcode first and then the operand. When interference corrupts the PC, the program leaves its normal path and "flies around". If it lands on a two-byte instruction and the fetch falls on the operand, the operand is mistaken for an opcode and the program goes wrong. If it lands on a three-byte instruction, the probability of error is even greater, because two of the three bytes are operands.

　　Deliberately inserting single-byte instructions at key places, or repeating valid single-byte instructions, is called instruction redundancy. Usually, two or more NOP bytes are inserted after two-byte and three-byte instructions. In this way, even if the program flies onto an operand, the NOP instructions prevent the instructions that follow from being consumed as operands, and the program is automatically brought back on track.

　　In addition, inserting two NOPs before instructions that play an important role in the system flow, such as RET, RETI, LCALL, LJMP, JC, etc., can also put the flying program back on track and ensure the execution of these important instructions.
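　　The effect of NOP padding after a multi-byte instruction can be checked on a PC with a small simulation. The sketch below is only an illustration under stated assumptions: the length table covers a handful of MCS-51 opcodes (every other opcode is treated as one byte), jumps are not followed, and the names `insn_len`, `is_boundary`, `resync_offset`, `plain_code` and `padded_code` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte length of a handful of MCS-51 opcodes. Every opcode not listed is
 * treated as one byte long, which is a simplification made for this sketch. */
static size_t insn_len(uint8_t opcode)
{
    switch (opcode) {
    case 0x40: return 2;   /* JC rel           */
    case 0x74: return 2;   /* MOV A,#data      */
    case 0x02: return 3;   /* LJMP addr16      */
    case 0x12: return 3;   /* LCALL addr16     */
    case 0x75: return 3;   /* MOV direct,#data */
    default:   return 1;   /* NOP (00H), CLR A (E4H), and all unlisted opcodes */
    }
}

/* Returns 1 if `pos` is the first byte of an instruction when the code is
 * decoded in a straight line from offset 0, and 0 otherwise. */
static int is_boundary(const uint8_t *code, size_t size, size_t pos)
{
    size_t pc = 0;
    while (pc < pos && pc < size)
        pc += insn_len(code[pc]);
    return pc == pos;
}

/* Decodes in a straight line from `start` (jumps are not followed) and
 * returns the first offset at which the stray program counter is back on a
 * true instruction boundary, or `size` if that never happens. */
size_t resync_offset(const uint8_t *code, size_t size, size_t start)
{
    size_t pc = start;
    while (pc < size) {
        if (is_boundary(code, size, pc))
            return pc;
        pc += insn_len(code[pc]);
    }
    return size;
}

/* MOV A,#75H / MOV 90H,#74H / CLR A */
const uint8_t plain_code[6]  = {0x74, 0x75, 0x75, 0x90, 0x74, 0xE4};
/* The same program with two NOPs inserted after the two-byte MOV A,#75H. */
const uint8_t padded_code[8] = {0x74, 0x75, 0x00, 0x00, 0x75, 0x90, 0x74, 0xE4};
```

　　Entering `plain_code` at offset 1 (the operand 75H) decodes the rest of the fragment as wrong instructions, so `resync_offset` returns 6, the end of the fragment. Entering `padded_code` at the same offset swallows only the two NOPs, and `resync_offset` returns 4, the start of the intact MOV 90H,#74H.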

　　1.2 Interception Technology

　　The so-called interception means to guide the flying program to the designated location and then handle the error. Usually, software traps are used to intercept flying programs. Therefore, the traps must be designed reasonably first, and then the traps must be placed in appropriate locations.

　　(1) Design of software traps

　　Once a flying program enters a non-program area, redundant instructions no longer help. A software trap intercepts the flying program there and leads it to a specified location, where the error is handled. In this article the trap leads the captured program to the reset entry address 0000H. Usually, the following instructions are filled into the non-program area of the EPROM as a software trap:

　　NOP
　　NOP
　　LJMP 0000H

　　The machine code is 00 00 02 00 00.

　　(2) Trap Arrangement

　　Usually, EPROM space not used by the program is filled with the pattern 0000020000, and the last line is filled with 020000. When a flying program lands in this area, it is automatically brought back on track. Free cells between modules in the user program area can also be filled with trap instructions. In case an unused interrupt is enabled by interference, a software trap is placed in the corresponding interrupt service routine so that the erroneous interrupt is caught in time. For example, even if an application system does not use external interrupt 1, the interrupt service routine of external interrupt 1 can take the following form:

　　NOP
　　NOP
　　RETI

　　The return instruction can be either "RETI" or "LJMP 0000H". If the fault diagnosis program and the system self-recovery program are reliably and thoroughly designed, using "LJMP 0000H" as the return instruction leads directly into the fault diagnosis program, so the fault is handled and normal operation restored as soon as possible.

　　Considering the capacity of program memory, generally 2-3 software traps per 1K space are sufficient for effective interception.
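　　Filling the unused EPROM space is normally done on the development PC before the image is programmed. The following is a minimal sketch of such a fill step, assuming the ROM image is available as a byte array. The function name `fill_traps` and the handling of leftover bytes (they are left as NOP so that execution slides into the final LJMP 0000H) are assumptions of this example, not something specified above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills image[start..end) with software traps and returns the number of
 * LJMP 0000H instructions placed. The region ends with 02 00 00, and
 * bytes that do not fit a whole 5-byte trap are left as NOP (00H). */
size_t fill_traps(uint8_t *image, size_t start, size_t end)
{
    static const uint8_t trap[5] = {0x00, 0x00, 0x02, 0x00, 0x00};
    size_t count = 0;
    size_t i;

    if (end <= start)
        return 0;

    memset(image + start, 0x00, end - start);   /* NOP everywhere first */
    if (end - start < 3)
        return 0;                               /* no room for an LJMP */

    /* Whole NOP, NOP, LJMP 0000H traps, keeping the last three bytes free. */
    for (i = start; i + 5 <= end - 3; i += 5) {
        memcpy(image + i, trap, sizeof trap);
        count++;
    }

    /* Closing LJMP 0000H at the very end of the region. */
    image[end - 3] = 0x02;
    image[end - 2] = 0x00;
    image[end - 1] = 0x00;
    return count + 1;
}
```

　　Because both operand bytes of LJMP 0000H are 00H, which is also the NOP opcode, a program that lands on any byte of the filled region still ends up executing an LJMP 0000H.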

　　1.3 Software “Watchdog” Technology

　　If an out-of-control program enters an "infinite loop", the "watchdog" technology is usually used to get the program out of the "infinite loop". By continuously detecting the program loop running time, if it is found that the program loop time exceeds the maximum loop running time, it is considered that the system is trapped in an "infinite loop" and error handling is required.

　　"Watchdog" technology can be implemented in hardware or in software. In industrial applications, severe interference sometimes corrupts the interrupt control word and disables interrupts. The system can then no longer "feed the dog" on schedule, and the hardware watchdog circuit fails. A software watchdog can effectively solve this problem.

　　In practice, the author uses a ring-shaped interrupt monitoring scheme: timer T0 monitors timer T1, timer T1 monitors the main program, and the main program monitors timer T0. A software "watchdog" with this ring structure has good anti-interference performance and greatly improves system reliability. In measurement and control systems that frequently need timer T1 for serial communication, T1 cannot be used as an interrupt source, and the serial port interrupt can be used for monitoring instead (if an MCS-52 series microcontroller is used, T2 can also take the place of T1).

　　The monitoring principle is as follows. A running-observation variable is kept for each of the main program, the T0 interrupt service routine, and the T1 interrupt service routine; call them MWatch, T0Watch, and T1Watch. Each pass of the main program loop increments MWatch by 1. Likewise, each execution of the T0 and T1 interrupt service routines increments T0Watch and T1Watch by 1. The T0 interrupt service routine checks T1Watch for changes to determine whether T1 is running normally. The T1 interrupt service routine checks MWatch to determine whether the main program is running normally. The main program checks T0Watch to determine whether T0 is working normally. If an observed variable changes abnormally, for example if it should have been incremented but was not, the program jumps to the error handler for troubleshooting.

　　Of course, the maximum cycle time of the main program and the timing periods of T0 and T1 must be chosen carefully so that they are consistent with one another. Due to space limitations, the details are not covered here.
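　　The ring of observation variables can be sketched in C as shown below. This is a host-side illustration rather than the author's implementation: the interrupt service routines are written as ordinary functions so the logic can be exercised on a PC, and `STALE_LIMIT` (how many consecutive checks may see no change before a fault is declared), the `fault` variable, and the function names are assumptions of this example. On a real MCS-51 target, `t0_isr` and `t1_isr` would be attached to the timer interrupts using the compiler's own interrupt syntax.

```c
#include <stdint.h>

enum { FAULT_NONE = 0, FAULT_MAIN, FAULT_T0, FAULT_T1 };

/* Hypothetical tolerance: consecutive checks with no change before a
 * fault is raised. It must be matched to the real loop and timer periods. */
#define STALE_LIMIT 3u

typedef struct {
    uint8_t last_seen;   /* watched counter value at the previous check */
    uint8_t stale;       /* consecutive checks that saw no change       */
} monitor_t;

volatile uint8_t MWatch, T0Watch, T1Watch;   /* running-observation variables */
volatile uint8_t fault = FAULT_NONE;

static monitor_t t0_sees_t1, t1_sees_main, main_sees_t0;

/* Returns 1 while the watched counter keeps changing often enough. */
static int still_alive(monitor_t *m, uint8_t current)
{
    if (current != m->last_seen) {
        m->last_seen = current;
        m->stale = 0;
        return 1;
    }
    m->stale++;
    return m->stale < STALE_LIMIT;
}

/* T0 interrupt: count itself, then monitor T1. */
void t0_isr(void)
{
    T0Watch++;
    if (!still_alive(&t0_sees_t1, T1Watch))
        fault = FAULT_T1;
}

/* T1 interrupt: count itself, then monitor the main program. */
void t1_isr(void)
{
    T1Watch++;
    if (!still_alive(&t1_sees_main, MWatch))
        fault = FAULT_MAIN;
}

/* One pass of the main loop: count itself, then monitor T0. */
void main_loop_once(void)
{
    MWatch++;
    if (!still_alive(&main_sees_t0, T0Watch))
        fault = FAULT_T0;
}
```

　　If the main loop stops running while both timers keep firing, `t1_isr` sees an unchanged MWatch on three consecutive checks and sets `fault` to `FAULT_MAIN`; the error handler would be entered from there.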

　　2 System Fault Handling and Self-Recovery Program Design

　　A reset of the microcontroller system caused by interference, or one that follows a power failure, is an abnormal reset. Fault diagnosis should be performed, and the system should automatically return to the state it was in before the abnormal reset.

　　2.1 Identification of abnormal reset

　　Program execution always starts from 0000H. There are four possible reasons for starting from 0000H: (1) system power-on reset; (2) software fault reset; (3) watchdog timeout causing a hardware reset; (4) power loss while a task is executing, followed by a power-on reset. Apart from the first case, the other three are abnormal resets and need to be identified.

　　(1) Identification of hardware reset and software reset

　　Here, hardware reset refers to power-on reset and watchdog reset. A hardware reset initializes the registers: after reset, PC=0000H, SP=07H, and PSW=00H. A software reset does not affect SP or PSW. Therefore, in a microcomputer measurement and control system, the program can set SP to a value greater than 07H during normal operation, or set the user flag PSW.5 to 1. After a reset, checking the PSW.5 flag or the SP value is then enough to tell whether a hardware reset occurred.

　　Since the contents of the on-chip RAM are random after a hardware reset, while a software reset leaves the on-chip RAM as it was before the reset, one or two on-chip RAM locations can be chosen as a power-on mark. Assume that location 40H is used as the power-on mark and the mark word is 78H. If the content of 40H is not equal to 78H after a reset, the reset is taken to be a hardware reset; otherwise it is taken to be a software reset and the program turns to error processing. Using two locations as power-on marks makes this judgment more reliable.
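　　Both checks from this subsection can be sketched as small C functions. The sketch is illustrative only: the SP and PSW values and the mark locations are passed in as parameters so that the logic can be run on a PC, whereas on a real MCS-51 they would have to be read in the startup code before anything modifies them. The second mark value 87H and the names `regs_look_like_hardware_reset` and `classify_by_mark` are assumptions of this example.

```c
#include <stdint.h>

typedef enum { RESET_HARDWARE, RESET_SOFTWARE } reset_kind_t;

#define MARK_VALUE_0 0x78u   /* power-on mark word used in the text        */
#define MARK_VALUE_1 0x87u   /* second mark word: an assumed value         */
#define PSW_F0_MASK  0x20u   /* PSW.5, the user flag set during normal run */

/* Register check: a hardware reset leaves SP = 07H and PSW.5 = 0, while a
 * jump to 0000H keeps the values set up by the running program. */
int regs_look_like_hardware_reset(uint8_t sp, uint8_t psw)
{
    return sp == 0x07u && (psw & PSW_F0_MASK) == 0u;
}

/* Power-on mark check using two on-chip RAM locations. If the marks are
 * missing, the reset is treated as a hardware reset and the marks are
 * written so that a later software reset can be recognized. */
reset_kind_t classify_by_mark(uint8_t mark[2])
{
    if (mark[0] == MARK_VALUE_0 && mark[1] == MARK_VALUE_1)
        return RESET_SOFTWARE;

    mark[0] = MARK_VALUE_0;
    mark[1] = MARK_VALUE_1;
    return RESET_HARDWARE;
}
```

　　With two mark bytes, random RAM contents after power-up match both values by chance far less often than they match a single byte, which is why the two-location variant is the more reliable one.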

　　(2) Identification of power-on reset and watchdog fault reset

　　Power-on reset and watchdog fault reset are both hardware resets, so distinguishing them generally requires non-volatile RAM or EEPROM. An observation unit that survives power loss is set up for this purpose. During normal operation, the interrupt service routine that periodically feeds the watchdog keeps the observation unit at its normal value (AAH), and the main program clears the unit. Because the unit survives power loss, checking at power-up whether it still holds the normal value shows whether the reset came from the watchdog.

　　(3) Identification of normal power-on reset and abnormal power-on reset

　　Distinguishing a normal power-on reset from one caused by an unexpected event such as a power failure is particularly important for process control systems. For example, suppose a time-based measurement and control system needs 1 hour to complete a task, and a supply voltage fault causes a reset 50 minutes in. Restarting the task from the beginning after the reset would waste time unnecessarily. A monitoring unit can therefore be used to record the current operating status and system time, with the control process broken down into several steps or time periods. Each time a step or time period completes, the monitoring unit is set to a "shutdown permitted" value, with a different value for each task or task stage. While the system is carrying out a task or is partway through a time period, the monitoring unit is set to an "abnormal shutdown" value. After a reset, the previous operating state can be determined from this unit, and the program can jump to the error handler to restore it.
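　　One possible encoding of the monitoring unit is sketched below. The encoding is an assumption of this example, not taken from the text: the low seven bits hold the step number, and the top bit marks a step that is still in progress (the "abnormal shutdown" value). The function names are hypothetical, and on real hardware the unit would live in memory that survives the reset.

```c
#include <stdint.h>

#define STEP_IN_PROGRESS 0x80u   /* assumed flag bit: step started, not finished */
#define STEP_MASK        0x7Fu   /* assumed field: step number 0..127            */

/* Called when a step starts: store the "abnormal shutdown" value. */
void step_begin(uint8_t *unit, uint8_t step)
{
    *unit = (uint8_t)(STEP_IN_PROGRESS | (step & STEP_MASK));
}

/* Called when a step completes: store that step's "shutdown permitted" value. */
void step_end(uint8_t *unit, uint8_t step)
{
    *unit = (uint8_t)(step & STEP_MASK);
}

/* Called after a reset: returns the number of the step that was cut
 * short, or -1 if the last step had finished cleanly. */
int interrupted_step(uint8_t unit)
{
    if (unit & STEP_IN_PROGRESS)
        return (int)(unit & STEP_MASK);
    return -1;
}
```

　　After a reset, a non-negative result from `interrupted_step` tells the recovery code which step to resume, instead of restarting the whole task.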

　　2.2 Programming for system self-recovery after abnormal reset

　　For process control systems with strict sequencing requirements, the system is generally required to resume from the module or task that went out of control after an abnormal reset. The measurement and control system must therefore back up important data units and parameters, such as the system operating status, process values, current input and output values, current clock value, and observation unit values. These data must be backed up at regular intervals, and again immediately whenever they are modified.

　　Once an abnormal reset has been identified, basic system initialization must be carried out first, such as initializing the display module and the external expansion chips. Next, the system status and operating parameters of the measurement and control system are restored, including the display interface. Finally, the tasks, parameters, running time, and so on from before the reset are restored, and the system enters the running state.

　　It should be noted that truly restoring the operating status of the system requires extremely detailed backup of the system's important data and data reliability checks to ensure the reliability of the restored data.
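　　A common way to combine backup with a reliability check is to store each backup together with a check byte and to keep more than one copy. The sketch below illustrates this under assumptions of its own: the fields of `sys_state_t`, the two-copy layout, and the simple additive check byte are all hypothetical choices for this example, and a real system might prefer a CRC.

```c
#include <stdint.h>

/* Hypothetical set of values worth backing up. */
typedef struct {
    uint8_t  task_id;     /* task that was running          */
    uint8_t  step;        /* step within the task           */
    uint16_t elapsed_s;   /* running time of the task, in s */
    uint8_t  outputs;     /* current output port image      */
} sys_state_t;

typedef struct {
    sys_state_t data;
    uint8_t     check;    /* check byte computed over the fields of data */
} backup_slot_t;

/* Additive check over the fields, XORed with A5H so that an all-zero or
 * all-FFH slot does not pass as valid. */
static uint8_t state_check(const sys_state_t *s)
{
    uint8_t sum = (uint8_t)(s->task_id + s->step + (s->elapsed_s & 0xFFu)
                            + (s->elapsed_s >> 8) + s->outputs);
    return (uint8_t)(sum ^ 0xA5u);
}

/* Writes the state and its check byte into both backup slots. */
void backup_save(backup_slot_t slots[2], const sys_state_t *s)
{
    int i;
    for (i = 0; i < 2; i++) {
        slots[i].data  = *s;
        slots[i].check = state_check(s);
    }
}

/* Copies the first slot whose check byte matches into *out and returns 1,
 * or returns 0 if neither copy can be trusted. */
int backup_restore(const backup_slot_t slots[2], sys_state_t *out)
{
    int i;
    for (i = 0; i < 2; i++) {
        if (slots[i].check == state_check(&slots[i].data)) {
            *out = slots[i].data;
            return 1;
        }
    }
    return 0;
}
```

　　When `backup_restore` returns 0, neither copy is trustworthy, and the safer choice is to fall back to a normal cold start rather than resume from corrupted data.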

　　In addition, for multi-task, multi-process measurement and control systems, the order in which data are restored must be considered.

　　Basic system initialization refers to the initialization of chips, the display, input and output modes, and so on. Note that initializing the inputs and outputs must not cause any malfunction. Restoring the tasks from before the reset means restoring their execution status and running time.

　　3 Conclusion

　　Due to limited space, this article does not discuss some other common software anti-interference methods, such as digital filtering, RAM data protection, and error correction. In engineering practice, several anti-interference methods are usually combined so that they complement one another and achieve a better overall effect. Fundamentally, hardware anti-interference is active, while software anti-interference is passive. By carefully analyzing the interference sources, combining hardware and software anti-interference measures, and improving the system monitoring program, it is entirely feasible to design a stable and reliable single-chip microcomputer system.
