A method for verifying pointer program-EEWORLD

Collect

As the country, society and daily life rely more and more on software systems, the high reliability of safety-critical software has become a necessary condition for ensuring national security, maintaining sustainable economic development and maintaining social stability.
Formal verification is an important method to improve the reliability of software. Roughly speaking, there are two ways to formally verify software. The first is model checking, which can automatically verify finite state systems by traversing all state spaces of the system and automatically construct counterexamples that do not meet the verification properties. This method is more popular in the industry. The second is logical reasoning, which uses a certain program logic to perform calculations, strictly reason about program properties, generate verification conditions, and then use theorem provers to prove them. The method discussed in this article is based on logical reasoning.
For the reasoning of pointer programs, the key lies in the judgment and processing of aliases. An important limitation of the commonly used Hoare logic is that different names in the program represent different program objects, that is, aliases are not allowed.
One solution to the judgment of pointer aliases is to use separation logic. One problem with using separation logic is that common automatic theorem provers cannot prove verification conditions with separation conjunctions (*, Separating Conjunction), and special automatic theorem proving tools must be designed for separation logic.
This paper proposes a method to eliminate access path aliases by using shape graph information, so that pointer programs can still be verified using Hoare logic.
1 PointerC language and shape graph logic
1.1 PointerC language
PointerC is a C-like language that emphasizes pointer types and adds shape declarations. For detailed syntax information, please refer to reference [1]. In the structure declaration, the shape declaration of the pointer field determines what shape of data structure this structure is used to construct, and also limits the shape that the pointer of this structure type can point to. This is a language extension made to meet the needs of shape analysis. The allowed shapes are singly linked list, circular singly linked list, doubly linked list, and circular doubly linked list.
1.2 Shape graph and shape logic
Before program verification, the program is first analyzed based on shape graph logic. The shape analysis constructs a shape graph for each program point. These shape graphs constitute the pointer information required for program verification. Here, the shape graph [1] is introduced by example.
Figure 1 (the shape graph of the ordered linked list node inserted into the loop invariant of the function in reference [1]) is used as an example to illustrate the connection between the shape graph and program point pointer information. In Figure 1, the round node represents a declared variable of pointer type; the rectangular node with a dashed border does not represent any program element; the rectangular node represents a structure variable generated by malloc; the gray rectangular node is a condensed node, which represents a number (can be 0) of adjacent structure variables of the same type belonging to the same data structure, and there may be an integer expression representing the number of condensed nodes and an assertion constraining the expression at the bottom. If not, it means that the number of condensed nodes is a natural number, but it cannot be associated with any variable or constant. As shown in Figure 1, head == ptr1, ptr == ptr1->next, the length of the linked list pointed to by head is m, and the condensed node pointed to by ptr represents m-1 nodes, which can be represented by the superscript of head(->next)m-1.

It can be seen that the shape graph can be used as a program assertion. It is the conjunction of pointer equality, inequality and alias assertions that the graph can express, including integer data assertions about table length or the number of condensed nodes under the predicate node and the condensed node.
Shape graph logic is an extension of Hoare logic designed based on the above viewpoint. The form of program specification is {G∧Q}S{G′∧Q′}, where G is a shape graph and Q is a symbolic assertion expressing other properties of the program. The conjunction of the two parts G∧Q is used as a complete assertion of the program point. The first step of the program verifier in this paper is to use shape graph logic to perform shape analysis on the program without the programmer providing function pre- and post-conditions and loop invariants related to the shape. Since the deduction of G′ after a statement from G before the statement is not affected by Q, during shape analysis, the program specification is simplified to {G}S{G′}, so as to use the inference rules of shape graph logic to establish the shape graph G of each program point. In the process of shape analysis, the loop invariant inference algorithm is also used to obtain the loop invariant shape graph of each loop [2].
After completing the shape analysis, the program verifier verifies other properties of the program Q. In {G∧Q}S{G′∧Q′}, if S is not a pointer operation statement, then G′ is the same as G, but Q′ may be different from Q. If S is a pointer operation statement (pointer assignment, allocation of space, and release of space, etc.), in addition to the possible differences between G′ and G, Q′ and Q may also have some subtle differences. The following is the part that this article focuses on.
2 Verification method of pointer program
When there are multiple possible shapes of program point data structures, G is expressed as G1∨G2∨…∨Gn. Similarly, Q may also be a disjunctive form of Q1∨Q2∨…∨Qm. The complete assertion can be organized into the form of Disjunctive Normal Form G1∧Q1∨G2∧Q2∨…∨Gk∧Qk. According to shape graph logic, we can use a case of disjunctive normal form as an example to discuss, written as G∧Q, where G and Q are both in conjunction form.
The program verifier performs the strongest postconditional calculation based on shape graph logic [2] and generates verification conditions, which are automatically proved by the prover Z3 [3].
2.1 The connection between shape graphs and symbolic assertions
Symbolic assertions Q allow assertions about whether a pointer is equal to NULL or whether two pointers are equal. Even if there are no such assertions in function pre- and postconditions and loop invariants, they will be added to Q during the strongest postconditional calculation because they appear in the Boolean expressions of conditional statements or loop statements.
The assertion that the pointer is equal to NULL or the two pointers are equal in Q will be absorbed because of the duplication of information with G, or because of the contradiction, making G∧Q false.
The legality of the access path in Q depends on G. For example, if a non-pointer access path p -> … -> data appears in Q, then ignoring ->data, the remaining prefix should be an access path on G to a certain structure node. If it reaches a dangling node, a null node, or there is no such path, it is illegal. If it reaches a predicate node, it depends on the situation after the predicate node is expanded.
Whether there are aliases between access paths in Q, between access paths in Q and the access paths in the next statement S, and between access paths in S all depend on G, that is, G can be used to judge.
In pointer operation statements, when assigning a value to pointer u, it may affect symbolic assertions: if there is an access path with u or u as a prefix in the symbolic assertion, u should be replaced with u′ which is equal to u but not an alias. Another occasion that affects symbolic assertions is that atomic assertions involving data on the released node should be deleted after the free statement.
G also has symbolic assertions, which are attached to the condensed node to limit the number of structure nodes it represents. There will be no contradiction between the symbolic assertions of G and those of Q, but the former sometimes gives more accurate information.

2.2 Extension of program reasoning rules
When using reasoning rules to generate post-condition G′∧Q′ from the pre-condition G∧Q of statement S, it is necessary to ensure that Q is legal and that there is no duplication or contradiction between Q and G.
First, consider that S is a pointer operation statement. Simple statements that modify pointer data will cause changes in pointer values, or increase or decrease storage heap blocks, thus causing changes in the shape graph. According to the introduction in Section 2.1, the impact on Q is the replacement of access paths or the deletion of some assertions. First, assume that Q and S have no aliases. The case of aliases will be introduced when considering non-pointer operation statements. The following are various statement rules.
(1) Pointer assignment statement u=v
If u is neither a null pointer nor a dangling pointer, the post-assertion is obtained according to the following rules.
{G∧Q} u = v {G′∧Q[u′/u]}
Where G′ is the shape graph obtained by shape analysis, and Q[u′/u] represents the replacement of u in Q as an access path (including as a prefix) with its equal and non-aliased access path u′.
(2) Other statements for pointer assignment
The allocation statement u=malloc(t) and the function call statement ret=f(act) are handled in the same way as the assignment statement above.
(3) After the freeing statement free(u)
releases the node pointed to by u, the atomic assertions in Q containing u or aliases of u should no longer exist, so the rules are as follows:
{G∧Q} free(u) {G′∧Q′}
where Q′ is obtained by deleting all atomic assertions in Q containing u or aliases of u.
It is easy to understand that if Q has no aliases, the rules of these statements will not cause Q′ to have aliases, because the small modifications they make to Q will not introduce aliases. Consider
non-pointer operation statements again. As long as there are no aliases in the preceding assertion Q and the statement S, the use of Hoare's assignment axiom is reliable. If there are aliases, the information of G can be used to eliminate the aliases first (change the access paths that are aliases to use the same access path), and then use the assignment axiom. Define the eliminate_aliases function as (S′,Q′)=eliminate_aliases(G, S, Q), which eliminates the aliases in S and Q according to G and obtains S′ and Q′.
By limiting the assignment axiom of Hoare logic to be used only when there are no aliases and adding the following alias elimination inference rule, we can reason about programs with access path aliases.
For statements that modify pointer data, the predicate Q may be provided by the programmer. For example, it is not ruled out that Q has aliases in the loop invariant, so this rule is sometimes needed.
The rules for compound, conditional and loop statements and the form of inference rules are consistent with the corresponding rules of Hoare logic.

3 System Prototype
Based on shape graph logic, a program verifier for PointerC language [1] is implemented. It can verify programs using various shapes defined in Figure 1. In addition to verifying shapes, it can also verify some properties of data on nodes. This paper verifies the function of inserting nodes into an ordered linked list.
The verifier is divided into the following modules, which are executed in the order listed:
(1) The front end of a general compiler. After lexical analysis, syntax analysis and static semantic checking of the source program, an abstract syntax tree is generated.
(2) Shape analysis. Traverse the abstract syntax tree and generate shape graphs for each program point based on shape declarations and shape graph logic.
(3) Generation of verification conditions. Traverse the abstract syntax tree and generate verification conditions for each function according to the strongest post-condition calculation method based on the function pre- and post-conditions and loop invariants provided by the programmer.
(4) Proof of verification conditions. Translate each generated verification condition G∧Q?圯Q′ into the form of P∧Q?圯Q′ according to the method introduced in the previous section, and give them to the prover one by one for proof.
This paper proposes a method that uses shape graph information to eliminate access path aliases so that pointer programs can still be verified using Hoare logic. With the support of an automatic theorem prover, a prototype of a program verifier for the PointerC language is developed to demonstrate the feasibility of this method.
References
[1] ZHANG Y, LI ZP, CHEN YY, et al. Shape graph logic and A shape system (Extended Verison). URL: http://ssg.ustcsz.edu.cn/content/shape-graph-logic.2010(11).
[2] NECULA G. Proof-carrying code,In Proc.24th ACM Symp.On Principles of Prog.Lang. New York, 1 997(1):106-119.
[3] MOURA LD, BJORNER N. Z3: An Efficient SMT solver, conference on tools and algorithms for the construction and analysis of Systems(TACAS). Budapest, Hungary, volume 4963 of LNCS, 2008:337-340.