Error handling in C programming for embedded systems

Aguilera

1. Error Concept
1.1 Error Classification
By severity, program errors are either fatal or non-fatal. A fatal error allows no recovery; the most the program can do is print an error message on the user's screen or write it to a log file, and then terminate. Most non-fatal errors are temporary (such as a resource shortage), and the usual recovery action is to try again after a delay.
By audience, program errors are either user errors or internal errors. User errors are reported to users and usually indicate a mistake in operation, while internal errors are reported to programmers (and may carry data details that users cannot access) for error checking and troubleshooting.
Application developers can decide which errors to recover from and how. For example, if the disk is full, consider deleting non-essential or expired data; if the network connection fails, consider reestablishing the connection after a short delay. A reasonable error recovery strategy avoids abnormal termination of the application and so improves its robustness.
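The "try again after a delay" strategy for temporary errors can be sketched as a bounded retry loop. This is only an illustration: the status codes OP_OK/OP_RETRY/OP_FATAL, the helper RetryWithDelay(), the demo operation FlakyOp() and the platform hook DelayMs() are all hypothetical names, and DelayMs() is left as a no-op that a real system would replace with an RTOS delay or timer wait.

```c
/* Hypothetical status codes for this sketch (not from any library). */
enum { OP_OK = 0, OP_RETRY = 1, OP_FATAL = -1 };

/* Hypothetical platform hook: replace with an RTOS delay or timer wait. */
static void DelayMs(unsigned int dwMs)
{
    (void)dwMs; /* no-op placeholder so the sketch stays portable */
}

/* Calls pfnOp until it stops reporting a transient failure or the retry
 * budget is used up. Returns the last status reported by pfnOp, or
 * OP_FATAL if dwMaxTries is not positive. */
int RetryWithDelay(int (*pfnOp)(void *), void *pArg,
                   int dwMaxTries, unsigned int dwDelayMs)
{
    int dwRet = OP_FATAL;
    for (int i = 0; i < dwMaxTries; i++)
    {
        dwRet = pfnOp(pArg);
        if (dwRet != OP_RETRY)
            break;               /* success or fatal error: stop retrying */
        if (i + 1 < dwMaxTries)
            DelayMs(dwDelayMs);  /* transient error: wait, then try again */
    }
    return dwRet;
}

/* Demo operation: fails transiently until the counter reaches zero. */
int FlakyOp(void *pArg)
{
    int *pdwFailuresLeft = (int *)pArg;
    if (*pdwFailuresLeft > 0)
    {
        (*pdwFailuresLeft)--;
        return OP_RETRY;
    }
    return OP_OK;
}
```

Bounding the number of attempts keeps a persistent fault from turning into an endless loop; once the budget is exhausted the caller still sees OP_RETRY and can escalate.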
1.2 Processing Steps
Error handling is the handling of any unexpected or abnormal conditions that occur during program execution. Typical error handling includes five steps:
1) Software errors occur during program execution. The error may be caused by a hardware response event (such as division by zero) that is mapped as a software error by the underlying driver or kernel.
2) The cause of the error and related information are recorded with an error indicator (such as an integer or structure).
3) The program detects the error (by reading the error indicator, or because the error is actively reported).
4) The program decides how to handle the error (ignore it, handle it partially, or handle it completely).
5) The program resumes or terminates execution.
The above steps are expressed in C language code as follows:
int func(void)
{
    int bIsErrOccur = 0;
    //do something that might invoke errors
    if(bIsErrOccur) //Stage 1: error occurred
        return -1; //Stage 2: generate error indicator
    //...
    return 0;
}

int main(void)
{
    if(func() != 0) //Stage 3: detect error
    {
        //Stage 4: handle error
    }
    //Stage 5: recover or abort
    return 0;
}
The caller usually expects that a successful return means complete success and that, on failure, the program is restored to its state before the call (although this is hard for the called function to guarantee).
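One common way to approach the "state before the call" expectation is to work on a private copy and commit only after every check has passed. The sketch below assumes a hypothetical UART_CONFIG structure and UartConfigUpdate() function invented for illustration; it is not taken from any particular driver.

```c
#include <stddef.h>

/* Hypothetical configuration record, used only for this sketch. */
typedef struct
{
    int dwBaudRate;
    int dwTimeoutMs;
} UART_CONFIG;

/* Updates *pCfg only if every field validates. On failure the caller's
 * structure is left exactly as it was before the call.
 * Returns 0 on success, -1 on failure. */
int UartConfigUpdate(UART_CONFIG *pCfg, int dwBaudRate, int dwTimeoutMs)
{
    if (pCfg == NULL)
        return -1;

    UART_CONFIG tNew = *pCfg;      /* work on a private copy */

    if (dwBaudRate <= 0)
        return -1;                 /* nothing committed yet */
    tNew.dwBaudRate = dwBaudRate;

    if (dwTimeoutMs < 0)
        return -1;                 /* still nothing committed */
    tNew.dwTimeoutMs = dwTimeoutMs;

    *pCfg = tNew;                  /* commit only after all checks pass */
    return 0;
}
```

This only works when the state fits in a copyable object; side effects on hardware or files generally need an explicit undo step instead.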
2. Error Propagation
2.1 Return Value and Return Parameters
The C language usually uses the return value to indicate whether the function is executed successfully. The caller checks the return value through statements such as if to determine the execution status of the function. Several common calling forms are as follows:
if((p = malloc(100)) == NULL)
    //...

if((c = getchar()) == EOF)
    //...

if((ticks = clock()) == (clock_t)-1) //clock() reports failure as (clock_t)-1
    //...
The return value of Unix system call-level functions (and some old Posix functions) sometimes includes both error codes and useful results. Therefore, the above calling form can receive the return value and check the error in the same statement (return a legal data value when the execution is successful).
The benefits of the return value method are simplicity and efficiency, but there are still many problems:
1) Reduced code readability
Functions without return values are unreliable. However, if each function has a return value, in order to maintain the robustness of the program, the correctness of each function must be verified, that is, its return value must be checked when called. In this way, a large part of the code may be spent on error handling, and the debugging code and the normal process code are mixed together, which is quite confusing.
2) Quality degradation
Conditional statements have more potential errors than other types of statements. Unnecessary conditional statements increase the workload of troubleshooting and white box testing.
3) Limited information
Only one value can be returned, so the return value can only mark success or failure and cannot carry specific error information. Several values can be packed into it through bit encoding, but this is not common. String processing functions can follow the pattern of IntToAscii(), which returns the specific error cause as a string and supports chained expressions:
char *IntToAscii(int dwVal, char *pszRes, int dwRadix)
{
    if(NULL == pszRes)
        return "Arg2Null";

    if((dwRadix < 2) || (dwRadix > 36))
        return "Arg3OutOfRange";

    //...
    return pszRes;
}
4) Definition conflict
Different functions may follow different rules for reporting success and failure. For example, Unix system call-level functions return 0 for success and -1 for failure; newer POSIX functions return 0 for success and non-zero for failure; the isxxx functions in the standard C library return non-zero for true and 0 for false.
5) Unconstrained
The caller can ignore and discard the return value. When the return value is not checked and handled, the program can still run, but the result is unpredictable.
The return value of newer POSIX functions carries only status and exception information, and useful results are returned through pointers in the parameter list. Return parameters are bound to the corresponding actual arguments, so the caller cannot completely ignore them. Multiple values can be returned through return parameters (such as structure pointers), and more information can be carried.
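A minimal sketch of the "status in the return value, result through a pointer" style follows. The function name SafeDivide() and its numeric status codes 1 to 3 are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 0 on success and a non-zero status on failure. The quotient is
 * delivered through pdwQuotient, which is written only on success. */
int SafeDivide(int dwDividend, int dwDivisor, int *pdwQuotient)
{
    if (pdwQuotient == NULL)
        return 1;   /* hypothetical code: null output pointer */
    if (dwDivisor == 0)
        return 2;   /* hypothetical code: division by zero */
    if (dwDividend == INT_MIN && dwDivisor == -1)
        return 3;   /* hypothetical code: quotient would overflow int */

    *pdwQuotient = dwDividend / dwDivisor;
    return 0;
}
```

Because the result and the status travel on separate channels, every int value remains a legal quotient and no value has to be reserved as an error marker.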
Combining the advantages of return values and return parameters, the return value (including useful results) method can be used for Get-type functions, and the return value + return parameter method can be used for Set-type functions. For pure return values, the following parsing interface can be provided as needed:
typedef enum{
    S_OK,             //Success
    S_ERROR,          //Failure (unclear reason), general status
    S_NULL_POINTER,   //Parameter pointer is NULL
    S_ILLEGAL_PARAM,  //Illegal parameter value, general
    S_OUT_OF_RANGE,   //Parameter value exceeds the limit
    S_MAX_STATUS      //Cannot be used as return value status, only used as enumeration maximum value
}FUNC_STATUS;
#define RC_NAME(eRetCode) \
((eRetCode) == S_OK ? "Success" : \
((eRetCode) == S_ERROR ? "Failure" : \
((eRetCode) == S_NULL_POINTER ? "NullPointer" : \
((eRetCode) == S_ILLEGAL_PARAM ? "IllegalParas" : \
((eRetCode) == S_OUT_OF_RANGE ? "OutOfRange" : \
"Unknown")))))
When the return value error code comes from a downstream module, it may conflict with the error code of this module. At this time, it is recommended not to pass the downstream error code directly upward to avoid confusion. If it is allowed to output error information to the terminal or file, the error scene (such as function name, error description, parameter value, etc.) can be recorded in detail, and converted into the error code defined by this module before passing it upward.
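The translation step described above might look like the following sketch. It assumes, purely for illustration, that the downstream module reports errno-style values; MapDownstreamError() is a hypothetical helper, and the enumeration is repeated here only so the block compiles on its own.

```c
#include <errno.h>
#include <stdio.h>

/* Module-level status codes, mirroring the FUNC_STATUS enumeration. */
typedef enum
{
    S_OK,
    S_ERROR,
    S_NULL_POINTER,
    S_ILLEGAL_PARAM,
    S_OUT_OF_RANGE,
    S_MAX_STATUS
} FUNC_STATUS;

/* Translates a downstream error (assumed here to be an errno-style value)
 * into this module's codes, and logs the original value to stderr so the
 * error scene is not lost. */
FUNC_STATUS MapDownstreamError(const char *pszFunc, int dwDownErr)
{
    FUNC_STATUS eRet;
    switch (dwDownErr)
    {
        case 0:      eRet = S_OK;            break;
        case EDOM:   eRet = S_ILLEGAL_PARAM; break;
        case ERANGE: eRet = S_OUT_OF_RANGE;  break;
        default:     eRet = S_ERROR;         break;
    }
    if (eRet != S_OK)
        fprintf(stderr, "%s: downstream error %d mapped to %d\n",
                pszFunc, dwDownErr, (int)eRet);
    return eRet;
}
```

Callers above this module then see only FUNC_STATUS values, while the log keeps the original downstream code for troubleshooting.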
2.2 Global status flag (errno)
When a Unix system call or certain C standard library functions fail, they usually return a negative value and set the global integer variable errno to a value that identifies the error. For example, the open function returns -1 when an error occurs and sets errno to a value such as EACCES (insufficient permissions).
The C standard library header file <errno.h> defines errno and its possible non-zero constant values (each beginning with the character 'E'). Some basic errno constants are defined in ANSI C, and operating systems add more of their own (although the error descriptions are still limited). In Linux, error constants are listed in the errno(3) manual page, which can be viewed with the man 3 errno command. All error numbers specified by POSIX.1 have different values, except for EAGAIN and EWOULDBLOCK, which may have the same value.
POSIX and ISO C define errno as a modifiable lvalue of type int. It may be an integer variable holding the error number, or a macro that dereferences a pointer returned by a function. The definition used previously was:
extern int errno;
However, in a multithreaded environment, multiple threads share the process address space, so each thread needs its own (thread-local) errno to keep one thread from interfering with another. For example, Linux supports multi-threaded access to errno, and defines it as:
extern int *__errno_location(void);
#define errno (*__errno_location())
The function __errno_location has different definitions in different library versions. In the single-threaded version, it directly returns the address of the global variable errno; in the multi-threaded version, different threads call __errno_location and return different addresses.
In the C runtime library, errno is mainly used in functions declared in the math.h (mathematical operations) and stdio.h (I/O operations) header files.
The following points should be noted when using errno:
1) When a function returns successfully, it is allowed to modify errno.
For example, when calling the fopen function to create a new file, other library functions may be called internally to detect whether there is a file with the same name. The library function used to detect files may fail and set errno when the file does not exist. In this way, every time the fopen function creates a new file that does not exist before, errno may still be set even if no program error occurs (fopen itself returns successfully).
Therefore, when calling a library function, you should first check the return value as an error indication. Only when the function return value indicates an error, check the errno value:
//Call library function
if (return value indicates an error)
    //Check errno
2) When a library function returns an error, errno may not be set, depending on the specific library function.
3) errno is set to 0 at the beginning of the program, and no library function will clear errno again.
Therefore, before calling a runtime library function that may set errno, it is best to set errno to 0. Check the value of errno after the call fails.
4) Before using errno, avoid calling other library functions that may set errno. For example:
if (somecall() == -1)
{
    printf("somecall() failed\n");
    if(errno == ...) { ... }
}
The somecall() function sets errno when it returns with an error. But when checking errno, its value may have been changed by the printf() function. To correctly use errno set by somecall(), save its value before calling printf():
if (somecall() == -1)
{
    int dwErrSaved = errno;
    printf("somecall() failed\n");
    if(dwErrSaved == ...) { ... }
}
Similarly, when calling a reentrant function from within a signal handler, save errno before and restore it afterwards.
5) When using a modern version of the C library, include the <errno.h> header file; on very old Unix systems this header file may not exist, in which case you can declare errno manually (such as extern int errno).
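Points 3) and 4) can be illustrated with strtol(), a standard function whose return value alone cannot signal overflow unambiguously (LONG_MAX and LONG_MIN are also valid results), so errno has to be cleared before the call and saved right after it. ParseLong() is a hypothetical wrapper written for this sketch.

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a decimal long from pszText into *plOut.
 * Returns 0 on success and -1 on failure; *plOut is written only on
 * success. */
int ParseLong(const char *pszText, long *plOut)
{
    char *pEnd = NULL;

    if (pszText == NULL || plOut == NULL)
        return -1;

    errno = 0;                 /* clear before the call (point 3) */
    long lVal = strtol(pszText, &pEnd, 10);
    int dwErrSaved = errno;    /* save before any other library call (point 4) */

    if (pEnd == pszText || *pEnd != '\0')
        return -1;             /* no digits, or trailing characters */
    if (dwErrSaved == ERANGE)
        return -1;             /* value does not fit in a long */

    *plOut = lVal;
    return 0;
}
```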
The C standard defines two functions, strerror and perror, to help print error information.
#include <string.h>
char *strerror(int errnum);
This function maps errnum (i.e., errno value) to an error message string and returns a pointer to the string. The error string and other information can be combined and output to the user interface, or saved to a log file, such as printing the error message to the file pointed to by fp through fprintf(fp,"somecall failed(%s)", strerror(errno)).
The perror function outputs the error message string corresponding to the current errno to the standard error (i.e., stderr or 2).
#include <stdio.h>
void perror(const char *msg);

This function first outputs the string pointed to by msg (user-defined information), followed by a colon and a space, then the error type description corresponding to the current errno value, and finally a newline. When redirection is not used, the function outputs to the console; if the standard error output is redirected to /dev/null, no output will be seen.
Note that perror() and strerror() use the same set of error messages for errno, but strerror() lets the caller add more locating information and choose the output method.
The usage examples of the two functions are as follows:
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    if(argc < 2) //a file name argument is required
        return 1;

    errno = 0;
    FILE *pFile = fopen(argv[1], "r");
    if(NULL == pFile)
    {
        printf("Cannot open file '%s'(%s)!\n", argv[1], strerror(errno));
        perror("Open file failed");
    }
    else
    {
        printf("Open file '%s'(%s)!\n", argv[1], strerror(errno));
        perror("Open file");
        fclose(pFile);
    }

    return 0;
}
The execution result is:
[wangxiaoyuan_@localhost test1]$ ./GlbErr /sdb1/wangxiaoyuan/linux_test/test1/test.c
Open file '/sdb1/wangxiaoyuan/linux_test/test1/test.c'(Success)!
Open file: Success
[wangxiaoyuan_@localhost test1]$ ./GlbErr NonexistentFile.h > test
Open file failed: No such file or directory
[wangxiaoyuan_@localhost test1]$ ./GlbErr NonexistentFile.h 2> test
Cannot open file 'NonexistentFile.h'(No such file or directory)!
You can also imitate the definition and processing of errno and customize your own error code:
int *_fpErrNo(void)
{
    static int dwLocalErrNo = 0;
    return &dwLocalErrNo;
}

#define ErrNo (*_fpErrNo())
#define EOUTOFRANGE 1
//define other error macros...

int Callee(void)
{
    ErrNo = 1;
    return -1;
}

int main(void)
{
    ErrNo = 0;
    if((-1 == Callee()) && (EOUTOFRANGE == ErrNo))
        printf("Callee failed(ErrNo:%d)!\n", ErrNo);
    return 0;
}
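The static variable in the imitation above is shared by every thread. A thread-local variant, in the spirit of __errno_location(), could be sketched as follows, assuming a compiler that supports the C11 _Thread_local storage class; ErrNoLocation() and CheckedPercent() are hypothetical names.

```c
/* C11 thread storage duration: each thread gets its own copy. */
static _Thread_local int gdwLocalErrNo = 0;

/* Returns the address of the calling thread's error number. */
int *ErrNoLocation(void)
{
    return &gdwLocalErrNo;
}

#define ErrNo (*ErrNoLocation())
#define EOUTOFRANGE 1  /* illustrative error code */

/* Returns dwVal if it lies in 0..100; otherwise sets ErrNo and
 * returns -1. */
int CheckedPercent(int dwVal)
{
    if (dwVal < 0 || dwVal > 100)
    {
        ErrNo = EOUTOFRANGE;
        return -1;
    }
    return dwVal;
}
```

On toolchains without C11 thread storage, a compiler extension or an RTOS task-local slot would be needed instead.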

With the help of global status flags, the interface of the function (return value and parameter list) can be fully utilized. But like the return value, it implicitly requires the caller to check the flag after calling the function, and this constraint is equally fragile.
In addition, a global status flag can be reused and overwritten. By contrast, a function return value is an unnamed temporary that is generated by the function and can be accessed only by the caller. After the call completes, the caller can check or copy the return value, after which the original return object disappears and cannot be reused. Because it is unnamed, the return value cannot be overwritten.
2.3 Local jump (goto)
Use the goto statement to jump directly to the error handling code in the function. Take the division by zero error as an example:
#include <stdio.h>
#include <stdlib.h>

double Division(double fDividend, double fDivisor)
{
    return fDividend/fDivisor;
}
int main(void)
{
    double fDividend = 0.0, fDivisor = 0.0;
    printf("Enter the dividend: ");
    scanf("%lf", &fDividend);
    printf("Enter the divisor : ");
    scanf("%lf", &fDivisor);
    if(0 == fDivisor) //Not very rigorous floating point number comparison
        goto RaiseException;
    printf("The quotient is %.2lf\n", Division(fDividend, fDivisor));
    return 0;

RaiseException: //Error handling code, reached only through goto
    printf("The divisor cannot be 0!\n");
    exit(1);
}
The execution result is as follows:
[wangxiaoyuan_@localhost test1]$ ./test
Enter the dividend: 10
Enter the divisor : 0
The divisor cannot be 0!
[wangxiaoyuan_@localhost test1]$ ./test
Enter the dividend: 10
Enter the divisor : 2
The quotient is 5.00
Although goto statements can damage code structure, they are very useful for centralized error handling. The following is a pseudocode example:
CallerFunc()
{
    if((ret = CalleeFunc1()) < 0)
        goto ErrHandle;
    if((ret = CalleeFunc2()) < 0)
        goto ErrHandle;
    if((ret = CalleeFunc3()) < 0)
        goto ErrHandle;
    //...
    return;

ErrHandle:
    //Handle Error(eg printf)
    return;
}
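The centralized pattern is especially handy when resources must be released on every path. The sketch below is a hypothetical ProcessFrame() function whose dwFailStep parameter exists only to inject a failure for demonstration; it is one way to arrange this, not the only one.

```c
#include <stdlib.h>

/* Allocates two scratch buffers, runs a step that may fail, and releases
 * everything on every path. dwFailStep (1..3) injects a failure at that
 * step; 0 means no failure. Returns 0 on success, -1 on error. */
int ProcessFrame(int dwFailStep)
{
    int dwRet = -1;
    char *pHeader = NULL;
    char *pPayload = NULL;

    pHeader = (dwFailStep == 1) ? NULL : malloc(16);
    if (pHeader == NULL)
        goto Cleanup;

    pPayload = (dwFailStep == 2) ? NULL : malloc(64);
    if (pPayload == NULL)
        goto Cleanup;

    if (dwFailStep == 3)      /* stands in for a failing processing step */
        goto Cleanup;

    dwRet = 0;                /* all steps succeeded */

Cleanup:
    /* Single exit point: free(NULL) is a no-op, so one label is enough. */
    free(pPayload);
    free(pHeader);
    return dwRet;
}
```

Initializing every pointer to NULL is what allows a single label; resources whose release function does not accept a "not acquired" value usually need one label per acquisition, unwound in reverse order.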
2.4 Non-local jumps (setjmp/longjmp)
Local goto statements can only jump to labels within the function in which they are located. If you want to jump across functions, you need to use the non-local jump functions setjmp() and longjmp() provided by the standard C library. They play the role of non-local labels and goto respectively, and are very suitable for handling errors that occur in deeply nested function calls. "Non-local jump" is to skip several call frames on the stack and return to a function on the current function call path.
#include <setjmp.h>
int setjmp(jmp_buf env);
void longjmp(jmp_buf env,int val);
The setjmp() function saves the current stack environment of the running program in the buffer env, and returns 0 when it is called directly. The longjmp() function restores the stack environment saved by setjmp() in env, that is, it "jumps back" to the point where setjmp() was called. This time setjmp() returns the val passed to longjmp(), and the program continues with the statement after the setjmp() call (as if it had just returned from setjmp()). The parameter val should be non-zero; if it is 0, setjmp() returns 1.
It can be seen that setjmp() has two types of return values, which are used to distinguish whether it is the first direct call (return 0) or a jump from somewhere else (return a non-zero value). For a setjmp, there can be multiple longjmps, so these longjmps can be distinguished by different non-zero return values.
Take a simple example to illustrate the non-local jump of setjmp/longjmp:
#include <setjmp.h>
#include <stdio.h>

jmp_buf gJmpBuf;
void Func1(){
    printf("Enter Func1\n");
    if(0)longjmp(gJmpBuf, 1);
}
void Func2(){
    printf("Enter Func2\n");
    if(0)longjmp(gJmpBuf, 2);
}
void Func3(){
    printf("Enter Func3\n");
    if(1)longjmp(gJmpBuf, 3);
}
int main(void)
{
    int dwJmpRet = setjmp(gJmpBuf);
    printf("dwJmpRet = %d\n", dwJmpRet);
    if(0 == dwJmpRet)
    {
        Func1();
        Func2();
        Func3();
    }
    else
    {
        switch(dwJmpRet)
        {
            case 1:
                printf("Jump back from Func1\n");
                break;
            case 2:
                printf("Jump back from Func2\n");
                break;
            case 3:
                printf("Jump back from Func3\n");
                break;
            default:
                printf("Unknown Func!\n");
                break;
        }
    }
    return 0;
}
The execution result is:
dwJmpRet = 0
Enter Func1
Enter Func2
Enter Func3
dwJmpRet = 3
Jump back from Func3
When setjmp/longjmp is embedded in a single function, it can simulate the nested function definition in the PASCAL language (that is, defining a local function within a function). When setjmp/longjmp is used across functions, it can simulate the exception mechanism in object-oriented languages.
When simulating the exception mechanism, first set a jump point through the setjmp() function and save the return scene, and then use the try block to contain the code that may cause errors. You can throw an exception through the longjmp() function in the try block code or in the function it calls. After throwing an exception, it will jump back to the jump point set by the setjmp() function and execute the exception handler contained in the catch block.
Take the division by zero error as an example:
#include <setjmp.h>
#include <stdio.h>

jmp_buf gJmpBuf;
void RaiseException(void)
{
    printf("Exception is raised: ");
    longjmp(gJmpBuf, 1); //throw, jump to exception handling code
    printf("This line should never get printed!\n");
}
double Division(double fDividend, double fDivisor)
{
    return fDividend/fDivisor;
}
int main(void)
{
    double fDividend = 0.0, fDivisor = 0.0;
    printf("Enter the dividend: ");
    scanf("%lf", &fDividend); //read the dividend (not the divisor)
    printf("Enter the divisor : ");
    if(0 == setjmp(gJmpBuf)) //try block
    {
        scanf("%lf", &fDivisor);
        if(0 == fDivisor) //You can also put this judgment and RaiseException in Division
            RaiseException();
        printf("The quotient is %.2lf\n", Division(fDividend, fDivisor));
    }
    else //catch block (exception handling code)
    {
        printf("The divisor cannot be 0!\n");
    }
    return 0;
}
The execution result is:
Enter the dividend: 10
Enter the divisor : 0
Exception is raised: The divisor cannot be 0!
By using the setjmp/longjmp functions in combination, exceptions that may occur in complex programs can be centrally processed. Different exceptions can be processed according to the return value passed by the longjmp() function.
The following points should be noted when using the setjmp/longjmp functions:
1) The setjmp() function must be called first and then the longjmp() function to restore to the previously saved program execution point. If the calling order is reversed, the program execution flow will become unpredictable and it is easy to cause the program to crash.
2) The longjmp() function must be called within the scope of the setjmp() function. The execution environment saved by setjmp() is valid only while the function that called setjmp() (or a function it calls) is still running. Once that function returns to its caller (or to a higher-level function), the environment saved by setjmp() becomes invalid, because the stack memory is invalidated when the function returns. For this reason setjmp() should not be wrapped in a function; if it must be wrapped, use a macro (see Chapter 4, "Exceptions and Assertions", of "C Interfaces and Implementations" for details).
3) The jmp_buf variable is usually defined as a global variable to facilitate longjmp calls across functions.
4) In general, variables stored in memory keep the values they had at the time of longjmp, while variables held in CPU and floating-point registers are restored to the values they had when setjmp was called. Therefore, if an automatic variable or register variable is modified between the setjmp and longjmp calls, its value after setjmp returns from the longjmp call is indeterminate unless it is declared volatile. To write a portable program that uses non-local jumps, such variables must be declared volatile.
5) With the exception mechanism, the return value does not have to be checked after every call, but because an exception may be thrown anywhere in the program, you must always consider whether to catch it. In large programs, deciding where to catch exceptions becomes a considerable mental burden and affects development efficiency. In contrast, indicating an error through a return value helps the caller locate the error close to where it occurred. In addition, the execution order of a program that uses return values is clear at a glance, which is more readable for maintainers. Therefore, it is not recommended to use the setjmp/longjmp "exception handling" mechanism in applications (unless it is a library or framework).
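Point 4) can be illustrated with a small sketch in which a local variable is modified after setjmp() and read again after the jump. The names FailAfterWork() and CountStepsBeforeJump() are hypothetical; the volatile qualifier is the part that matters.

```c
#include <setjmp.h>

static jmp_buf gJmpBuf;

/* Stands in for a deeply nested callee that reports an error by jumping. */
static void FailAfterWork(void)
{
    longjmp(gJmpBuf, 1);
}

/* Returns the number of steps completed before the jump. */
int CountStepsBeforeJump(void)
{
    /* Modified between setjmp and longjmp and read after the jump:
     * without volatile its value at that point would be indeterminate. */
    volatile int dwSteps = 0;

    if (setjmp(gJmpBuf) == 0)
    {
        dwSteps = 1;
        dwSteps = 2;
        FailAfterWork();   /* jumps back to setjmp, which then returns 1 */
    }
    return dwSteps;
}
```

With volatile, the function returns 2 regardless of optimization level, because every write to dwSteps goes to memory before the jump.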