Suggestions for clock tolerance correction in the CAN standard-EEWORLD

CAN is an event-triggered communication protocol that schedules the transmission of messages by lossless bitwise arbitration on their identifiers (IDs). Arbitration relies on bit values, so accurate bit sampling is essential. For all nodes to sample consistently, their bit times must be synchronized, and during normal transmission the drift of the sampling point caused by clock differences between nodes must also be corrected to reduce read errors. Whether bit time can be synchronized depends on the clock deviation, and for this reason the CAN standard specifies how to calculate the clock tolerance. This article finds that the formulas specified in the standard are insufficient, which can affect application reliability. In particular, SAE J1939, which is widely used in the automotive industry, is based on CAN 2.0B, where the effect of clock tolerance is greater, so correcting the formulas is important. A complete understanding of the standard bears directly on applications and also suggests ways to further improve CAN performance.

1 ISO 11898-1:2003 regulations on clock tolerance
ISO 11898-1:2003, Section 12.4.1.2, specifies that bit time is measured in time quanta (Tq), a configurable unit obtained by dividing the oscillator frequency. One bit contains NTQ time quanta, where NTQ ranges from 8 to 25. Because the oscillator and divider resources of the hardware are limited, the available choices are restricted. Each bit is divided into four segments: the synchronization segment S (1 Tq), the propagation segment Pr (1 Tq to 8 Tq), phase buffer segment 1, P1 (1 Tq to 8 Tq), and phase buffer segment 2, P2 (1 Tq to 8 Tq). The lengths of Pr, P1 and P2 are configurable parameters. The bit value is sampled at the boundary between P1 and P2. CAN defines two kinds of synchronization: hard synchronization and resynchronization. When the bus is idle, the recessive-to-dominant transition (R/D edge) of the start-of-frame (SOF) bit of a new frame causes hard synchronization, which immediately restarts the local bit time at the S segment. R/D edges during frame transmission cause resynchronization. When the edge falls in the P2 segment, after the sampling point of the previous bit, P2 is shortened. When the edge falls after S, P1 of the local bit is lengthened. In either case the absolute correction of the local bit time does not exceed SJW (the resynchronization jump width), a configuration parameter between 1 Tq and 4 Tq. For more in-depth discussions of CAN bit time and synchronization, please refer to the references.
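As a rough illustration of the segment layout described above, the following Python sketch represents one bit-time configuration in units of Tq and checks it against the ranges quoted in this section. The class, field and function names are hypothetical and chosen only for this example; the sample values are the ones used in the worked example of this section.

```python
from dataclasses import dataclass
from typing import List

SYNC_SEG = 1  # synchronization segment S is always 1 Tq


@dataclass(frozen=True)
class BitTiming:
    """One CAN bit time, all lengths in Tq (illustrative names, not from the standard)."""
    prop: int  # propagation segment Pr
    p1: int    # phase buffer segment 1
    p2: int    # phase buffer segment 2
    sjw: int   # resynchronization jump width

    @property
    def nbt(self) -> int:
        """Number of Tq in one bit: S + Pr + P1 + P2."""
        return SYNC_SEG + self.prop + self.p1 + self.p2

    @property
    def sample_point(self) -> float:
        """Fraction of the bit time elapsed at the P1/P2 boundary."""
        return (self.nbt - self.p2) / self.nbt

    def check(self) -> List[str]:
        """Return a list of range violations (empty if the configuration is acceptable)."""
        problems = []
        for name, value in (("Pr", self.prop), ("P1", self.p1), ("P2", self.p2)):
            if not 1 <= value <= 8:
                problems.append(f"{name}={value} is outside 1..8 Tq")
        if not 1 <= self.sjw <= 4:
            problems.append(f"SJW={self.sjw} is outside 1..4 Tq")
        if not 8 <= self.nbt <= 25:
            problems.append(f"NBT={self.nbt} is outside 8..25 Tq")
        return problems


timing = BitTiming(prop=4, p1=1, p2=2, sjw=1)
print(timing.nbt, timing.sample_point, timing.check())  # 8 0.75 []
```

With these values the sampling point sits at 75% of the bit time, that is, 6 Tq after the start of the S segment.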
The CAN standard uses the term oscillator tolerance for the clock tolerance. Some implementations derive the clock from an oscillator followed by a phase-locked loop, in which case the CAN clock deviation has two components. To stay consistent with the text of the standard, this article does not strictly distinguish between clock tolerance and oscillator tolerance. With the relative error of the oscillator frequency denoted △f, Section 12.4.2.5 of ISO 11898-1 imposes two constraints.
① During normal transmission, the CAN bit-stuffing rule limits the distance between resynchronization edges to at most 10 bits. For correct synchronization:
(2×△f)×10×NBT≤SJW (1)
where NBT is the nominal bit time.
② When an error occurs, the node that detects it must send an error frame. To distinguish a local error from a global error, the node checks whether the 7th bit after it has sent the active error flag is still dominant. Since there may be 6 dominant bits before the error, the two synchronization segments S are 13 bits apart. The accumulated difference must not exceed the phase buffer segment length:
(2×△f)×(13×NBT-P2)≤min(P1, P2) (2)
The smaller of the two bounds on △f is the clock tolerance of the application. For example, with Tbit=1 000 ns, a bus length of 20 m and a transceiver delay of 150 ns, the total propagation delay is Tprop=500 ns. Taking Tq=125 ns gives NBT=8, Pr=4, P1=1, P2=2 and SJW=1. The values of △f obtained from (1) and (2) are 0.006 25 and 0.004 90 respectively. The smaller one, 0.004 90, is taken, which is close to 0.5%.
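The arithmetic of this example can be reproduced with a short Python sketch. It assumes that the right-hand sides of the two ISO 11898-1 constraints are SJW and min(P1, P2) respectively, which is the reading that reproduces the figures quoted above; the function name is hypothetical.

```python
def iso_tolerance(nbt, p1, p2, sjw):
    """Return the two bounds on the relative clock error, all arguments in Tq.

    Assumed forms of the constraints:
      (2*df) * 10 * NBT        <= SJW           (resynchronization, 10 bits apart)
      (2*df) * (13*NBT - P2)   <= min(P1, P2)   (error flag, 13 bits apart)
    """
    df_resync = sjw / (2 * 10 * nbt)
    df_error_flag = min(p1, p2) / (2 * (13 * nbt - p2))
    return df_resync, df_error_flag


df1, df2 = iso_tolerance(nbt=8, p1=1, p2=2, sjw=1)
print(f"{df1:.5f} {df2:.5f} {min(df1, df2):.5f}")  # 0.00625 0.00490 0.00490
```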

2 Problems in Transmitter Clock Synchronization
The CAN bus gives dominant bits priority over recessive bits: when several nodes transmit simultaneously, the bus carries a dominant bit as long as at least one node sends one. Therefore, when two nodes some distance apart send dominant bits at the same time, the propagation delay prevents each node from seeing the R/D edge of the other (as shown in Figure 1, where e is the synchronization phase difference), because the node itself has already driven the bus to the dominant level.

In this case, even if the clocks of nodes A and B differ, they cannot synchronize with each other. Assume that A is faster than B. Node B can see the R/D edge of A and begin to synchronize only after the synchronization segment S of node A has moved so far ahead of that of node B that the lead exceeds the propagation time.
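To give a feel for how long this can take, the sketch below counts the bit times needed before the lead of the fast node exceeds the propagation time. It is only an illustration under stated assumptions: the two clocks deviate in opposite directions by the relative error df (so the lead grows by 2×df×Tbit per bit), the initial lead is a free parameter, and the function name and the sample numbers are hypothetical.

```python
import math


def bits_until_edge_visible(t_prop_ns, t_bit_ns, df, initial_lead_ns=0.0):
    """Bit times that pass before the fast node's lead exceeds the propagation time.

    Assumes a worst-case relative clock difference of 2*df between the nodes.
    """
    drift_per_bit_ns = 2 * df * t_bit_ns
    remaining_ns = t_prop_ns - initial_lead_ns
    if remaining_ns < 0:
        return 0  # the lead already exceeds the propagation time
    # Smallest n with n * drift_per_bit_ns strictly greater than remaining_ns.
    return math.floor(remaining_ns / drift_per_bit_ns) + 1


# Hypothetical numbers: 250 ns propagation time, 1 000 ns bit time, df = 0.0049.
print(bits_until_edge_visible(t_prop_ns=250, t_bit_ns=1000, df=0.0049))
```

The smaller the clock difference, the longer the two transmitters run side by side without either seeing an edge from the other.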
Now consider the synchronization of two transmitters in the arbitration field. Assume that both see the bus idle and start to send at the same time, and that their IDs differ only in the last bit. One reference describes the synchronization process of sending nodes under the assumption that, when the bus is idle, one transmitter leads the other transmitters by more than Pr/2. Because events occur at random, this is only a special case. Periodic messages are triggered by the clock of the local node. The local clocks are not synchronized with each other and differ in frequency, so the phase difference between the scheduled transmission times changes periodically, and the situation in which one transmitter leads all the others is again only a special case. Under the assumption of this article, neither transmitter sees the R/D edges that the other produces in the bits before the last ID bit, so no synchronization takes place between them. Suppose that in the last ID bit the node with the slow clock sends a dominant bit and the node with the fast clock sends a recessive bit, and that the preceding ID bit is recessive. The fast node may then see the R/D edge of the slow node. However, the phase difference may be large and may exceed the resynchronization jump width SJW, so that the fast node cannot synchronize correctly and samples at a point where the level sent by the slow node is not yet stable, producing a bit read error.
In CAN 2.0B, the last bit of the 29-bit ID is the 31st bit of the arbitration field, and there may be 7 stuff bits, so that 37 bits have passed without synchronization. For correct sampling, the difference accumulated between the synchronization segments of the fast and slow sending nodes while unsynchronized must not exceed the resynchronization jump width SJW:
(2×△f)×37×NBT≤SJW (3)
Taking the data of the above example, NBT=8 and SJW=1, we get △f≤0.001 68, which is much smaller than the 0.004 90 obtained above.

Receivers close to the fast node can see its R/D edges and have therefore synchronized with it. In the end they must synchronize with the slow node, with which they have not yet synchronized, and they encounter the same problem when the phase difference is large. If the last 2 bits of the ID are both dominant, there is no R/D edge for synchronization, so the fast node does not see an edge from the slow node. The fast node in the arbitration field and the nearby receivers synchronized with it are then unable to synchronize with the slow node at all, and continue to sample with their original phase and bit time. In the worst case another 7 bits pass before the next R/D edge of the slow node appears, as shown in Figure 2.

The transmitter that did not see the edge in the arbitration field has now become a receiver. If it can synchronize correctly on the subsequent edge, then the smaller sampling-point offset within the arbitration field also allows correct sampling, that is, correct arbitration. The arbitration field may remain unsynchronized for at most 40 bits (including up to 8 stuff bits), and the next edge is 6 bits later. Therefore, for correct sampling:
(2×△f)×46×NBT≤SJW (4)
Still using the data of the example in Section 1, NBT=8 and SJW=1, we get △f≤0.001 35, which is smaller still. If a system uses NBT=25 and SJW=1, we get △f≤0.000 43, a very small value.
According to the above analysis, formula (4) is the worst case. For CAN 2.0A, the worst case can be derived in the same way:
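Formulas (3) and (4) have the same shape and differ only in the number of bits that pass without synchronization, so the figures above can be checked with one small Python function. This is a minimal sketch; the function and parameter names are hypothetical.

```python
def arbitration_tolerance(unsynced_bits, nbt, sjw):
    """Bound on df from (2*df) * unsynced_bits * NBT <= SJW (NBT and SJW in Tq)."""
    return sjw / (2 * unsynced_bits * nbt)


for bits, nbt in ((37, 8), (46, 8), (46, 25)):
    bound = arbitration_tolerance(bits, nbt, sjw=1)
    print(f"bits={bits} NBT={nbt} df<={bound:.6f}")
```

The three cases evaluate to 1/592, 1/736 and 1/2300, which correspond to the values 0.001 68, 0.001 35 and 0.000 43 quoted in the text (the text truncates rather than rounds).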
(2×△f)×21×NBT≤SJW (5)
If the propagation time is relatively short and the slow node has already synchronized with a lag of Pr/2 behind the fast node, then at the last bit of the slow node the fast node sees the synchronization edge from the slow node with a phase error e=Pr. For synchronization to succeed, at least:
Pr≤SJW (6)
If (6) is satisfied, synchronization can occur. This is the case, for example, in a system where the propagation delay occupies a small share of the bit time, and the original ISO 11898-1 formulas then apply. Even so, when the ACK bit is read, the worst-case distance between synchronization edges is 11 bits (the CRC delimiter is not covered by the bit-stuffing rule), so (1) should be modified accordingly. If (6) is not satisfied, as in a high-speed system, (4) or (5) should be considered. ISO 11898-1 should add the above content. Generally speaking, (4) and (5) are more stringent than (1) and (2), so if simplification is required they alone are sufficient. For example, for CAN 2.0A with the data of the previous example, NBT=8 and SJW=1, equation (5) gives △f≤0.002 97, which is also stricter than the original 0.004 90.
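The selection rule of this section can be summarized in a short Python sketch. It is one possible reading of the text rather than a normative procedure: when Pr≤SJW it applies the resynchronization constraint with a distance of 11 bits together with the error-flag constraint, otherwise it applies the arbitration constraint with 46 bits (CAN 2.0B) or 21 bits (CAN 2.0A), again together with the error-flag constraint. The error-flag bound is assumed to be min(P1, P2)/(2×(13×NBT-P2)), and the function name is hypothetical.

```python
def clock_tolerance(prop, p1, p2, sjw, extended_id):
    """Suggested bound on the relative clock error df; segment lengths in Tq."""
    nbt = 1 + prop + p1 + p2  # S + Pr + P1 + P2
    # Error-flag constraint, assumed form: (2*df)*(13*NBT - P2) <= min(P1, P2)
    error_flag_bound = min(p1, p2) / (2 * (13 * nbt - p2))
    if prop <= sjw:
        # Condition (6) holds: resynchronization constraint with 11 bits (ACK case)
        resync_bound = sjw / (2 * 11 * nbt)
        return min(resync_bound, error_flag_bound)
    # Condition (6) fails: arbitration constraint (4) or (5)
    unsynced_bits = 46 if extended_id else 21
    arbitration_bound = sjw / (2 * unsynced_bits * nbt)
    return min(arbitration_bound, error_flag_bound)


# Example from Section 1: Pr=4, P1=1, P2=2, SJW=1
print(f"{clock_tolerance(4, 1, 2, 1, extended_id=True):.6f}")   # CAN 2.0B
print(f"{clock_tolerance(4, 1, 2, 1, extended_id=False):.6f}")  # CAN 2.0A
```

For the example configuration Pr exceeds SJW, so the arbitration constraint governs in both cases.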

3 Clauses that should be added to ISO 16845:2004
ISO 16845 contains 9 conformance test clauses (8.7.1 to 8.7.9) for the time-synchronization functions of the transmitter, of which only 2 cover resynchronization with a phase error: 8.7.4, synchronization when e<0 and |e|≤SJW; and 8.7.5, synchronization when e<0 and |e|>SJW. The analysis in Section 2 shows that a transmitter must synchronize in the last bit before it withdraws from arbitration, while it is still in the transmitter state. Only after synchronizing can it sample correctly, decide whether to withdraw, and correctly track the winning transmitter after withdrawing. It is therefore necessary to add test clauses for e>0 and |e|≤SJW, and for e>0 and |e|>SJW. These clauses can be modeled on the corresponding receiver clauses (7.7.3 and 7.7.4).
The lower tester (LT) in ISO 16845 is a dedicated device connected to the Tx and Rx of the CAN implementation under test (IUT). The LT applies the necessary input conditions to Rx and then checks on Tx whether the response of the IUT is acceptable. The LT should not affect the IUT except when applying a test input. Take the test for e>0 and |e|>SJW as an example. The IUT is set up to send a frame whose ID field contains a dominant stuff bit. When the fifth recessive bit is being sent, the LT drives Rx dominant after a delay of e, and then presents a recessive value on Rx at the new sampling point after the delay (the original P1 plus SJW). Following the synchronization rule, the Tx output of the IUT then produces a dominant bit one bit time after the R/D edge provided by the LT, as shown in Figure 3. The test design rests on the following principle. If the IUT operates normally, it synchronizes to the R/D edge provided by the LT and samples the recessive bit provided by the LT. It then sends its next stuff bit, whose R/D edge has been shifted by the preceding synchronization of the IUT. If the IUT does not synchronize properly, or jumps by the wrong amount, it samples a dominant bit, loses arbitration, withdraws, and sends nothing further. The range of e tested is (SJW+1) to (NBT-P2-1). This design differs from clause 7.7.4 of the standard, whose method the author believes cannot achieve its purpose. A detailed discussion of this issue is beyond the scope of this article.

It should be noted that clause 8.7.2 requires that a transmitter sending a dominant bit does not synchronize to an R/D edge when e>0. This requirement differs from the e>0 synchronization problem described in this article, which concerns a transmitter that is sending a recessive bit and sees the R/D edge of another transmitter. Clause 8.7.2 only describes the behavior of the Tx and Rx of the CAN chip when they are tested separately as output and input. In an actual application the two are inseparable and this situation does not arise.

4 Summary
General technical literature on CAN notes that reliability at high speed is worse than at low speed and that a low speed should be used where the application allows. The above analysis of the bit-synchronization clock tolerance shows that the tolerance is small at high speed, so any clock problem affects synchronization and sampling. ISO 11898-1:2003 considers only some situations and does not consider the case where a transmitter fails to synchronize during arbitration, so the tolerance it yields is too loose. Designers relying on it may choose an unsuitable oscillator, leaving the electronic control unit (ECU) insufficiently reliable. For example, some trimmable RC or CMOS oscillators now offer an accuracy of about 0.3% to 2%. They are inexpensive and close to the tolerance derived from the original CAN standard, and may therefore be selected inappropriately. The standard thus needs to be supplemented. As CAN applications expand, efforts to raise the CAN operating frequency continue. On the one hand, some applications, such as robots and weapons, can shorten the transmission distance and thereby the propagation time; on the other hand, the cost-effectiveness of CAN makes it attractive for such applications. Where the propagation time approaches the critical limit, more attention should be paid to clock tolerance.
Another possible way to solve the problem is to make the synchronization of transmitters and receivers in the arbitration field, and at the first synchronization edge after arbitration, a hard synchronization. This improves sampling after synchronization but not before it: a higher-precision clock is still needed to keep the sampling point within the buffer segments P1 and P2. Moreover, too much hard synchronization increases the chance of unnecessary synchronization on interference. This solution is therefore of little value.
The analysis shows that a larger resynchronization jump width SJW allows a larger clock deviation. An unrestricted SJW is equivalent to performing hard synchronization at every edge. An SJW smaller than P1 and P2 keeps the shift of the sampling point small. When interference produces a false R/D edge on the bus, it causes an erroneous resynchronization, and a small SJW helps to reduce the probability of read errors. Therefore, the key to both reducing the error rate and relaxing the clock tolerance limit is to design a cost-effective method of filtering out interference.
