Author | Huang Gang (Member of the YiBo Technology Expressway Team)
DDR debugging has only three possible outcomes: it fails, it passes, or it passes only after a very long time. You might never guess how long a DDR module laid out by a PCB engineer can take to pass debugging once the board has been fabricated. One day? One week? One month? Or even...
Mr. Gaosuo has made great strides in DDR design and simulation in recent years, thanks to the AI (artificial intelligence) boom. AI computing power cards are the core product in this field, and major communication and chip companies have been competing to develop them. Within an AI computing power card, the DDR module is the most critical block: it is what gives the card its large memory capacity and high computing throughput.
While working with these companies, Mr. Gaosuo has come across all kinds of AI computing power cards. Their DDR modules serve similar functions, but the specific structures vary a great deal. Different capacities mean different numbers of DRAM chips (the DDR "particles"); different board sizes mean different topologies; different PCB layer counts mean different DDR layouts and densities. Power consumption and current requirements change the reference layers available to the DDR routing: some designs have to reference a power plane, and some have to route on adjacent layers. Target data rates differ too, and so do the design margins. For Mr. Gaosuo, then, the DDR design of every computing power card is different, and so is the difficulty of debugging it with the customer after the board has been designed and fabricated. Mr. Gaosuo has shared some classic cases at seminars in recent years to give everyone a fresh sense of how hard DDR design and debugging can be. Some characteristics of AI products make their DDR design harder than that of any earlier DDR product. We have plenty of test and simulation cases to draw on; here we share one DDR debugging story that went from fail to pass.
On a quiet and peaceful afternoon, Mr. Gaosuo had just started his afternoon work, still a little sleepy from a nap, when a very "refreshing" email arrived from a customer and immediately woke everyone up.
It turned out that a flagship AI computing power card, which a customer had designed and fabricated through our company, was failing in debugging. The customer is a company with strong R&D capabilities and a very rigorous approach, with rich experience in hardware design and debugging. Even so, they had been debugging the DDR module of this product for several weeks without success. Since the board had been laid out by our PCB engineers, Mr. Gaosuo was naturally called in to help with the debugging.
Mr. Gaosuo opened the PCB file and looked at the connection between the FPGA and the C1 DDR channel. This channel consists of 9 DRAM chips, which is what we call a 1-to-9 DDR topology. Because the board is so dense, the chips could only be placed and routed in a clamshell arrangement, mounted back to back on the top and bottom sides of the board, as shown in the figure below.
The FPGA vendor provides a design guide for DDR, and before the board went into production our PCB engineers and the customer repeatedly confirmed that the DDR module was routed in accordance with every detailed rule in that document. For example, the document specifies the length of each segment L0, L1, L2, and so on in the figure below.
The customer believed that the layout and routing followed the design guide above and that the design really did meet the requirements, so they persisted with debugging for nearly a month, hoping to solve the problem that way. After Mr. Gaosuo stepped in, he found that the customer had in fact tried a great many things: changing the driver output impedance, changing the ODT resistance, fine-tuning the supply voltage, changing the VTT termination resistance, adding rework wires, and so on. Even so, the interface could not reach its rated speed of 2400 Mbps. Since Mr. Gaosuo had not simulated this project before, we first suggested a debug simulation, that is, a simulation set up with the same parameters used in debugging, to check how well simulation and measurement agree and to locate the problem.
We are quite confident in the Xilinx FPGA simulation model and in the DRAM simulation model, and our many earlier comparisons of simulation against measurement have shown that simulated and measured waveforms agree well. Mr. Gaosuo also saw that this topology is quite complex, so he was confident of getting a "poor" simulation result with the configuration parameters the customer had used. You heard that right: the goal of a debug simulation is a poor result, one that matches the failure seen on the bench.
Sure enough, Mr. Gaosuo got what he hoped for. When we simulated the address and control signals, we found that the signal quality at the DRAM chip closest to the FPGA did not meet the requirements. Why look at the chip closest to the main chip? Mr. Gaosuo has explained this many times before, so I will not repeat it here.
Following the customer's debugging steps, we also tried different values of driver output impedance and VTT resistance in the simulation. Just as on the bench, none of them produced good signal quality. So far, this was a good start: at least the simulation reproduced the conclusions of the tests.
But what else could Mr. Gaosuo do in simulation? Reproducing the poor waveform does not, by itself, give much guidance for debugging. So we kept using the simulation model to look for other driver configurations worth trying. We opened the FPGA's IBIS model (the .ibs file) and saw the selectable driver configurations shown below. Until then, both we and the customer had used only the configurations on the left, which offer a driver output impedance of 40 to 60 ohms. Those had been tried in both simulation and bench debugging, with no obvious improvement.
Then we were surprised to notice that the two green columns in the model list are almost identical to the red columns; the only difference is the suffix F, M, or S. So Mr. Gaosuo took some time to sweep the F, M, and S buffers at the same 40 ohm driver output impedance. Would there be any difference?
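Spotting such variants by eye is easy to miss in a long model list. As a minimal sketch, the buffer models declared in an IBIS file can be listed by scanning for the `[Model]` keyword and grouping names that differ only in a trailing slew suffix. The sample text and the names `DRV40_F`, `DRV40_M`, `DRV40_S`, and `DDR_ADDR_OUT` are hypothetical, and the `_F`/`_M`/`_S` suffix convention is an assumption for illustration; a real vendor file uses its own naming.

```python
import re
from collections import defaultdict

# Hypothetical excerpt: real FPGA IBIS files use vendor-specific model names.
SAMPLE_IBS = """\
[IBIS Ver] 5.0
| lines starting with '|' are comments
[Model Selector] DDR_ADDR_OUT
DRV40_F   40 ohm driver, fast slew
DRV40_M   40 ohm driver, medium slew
DRV40_S   40 ohm driver, slow slew
[Model] DRV40_F
Model_type Output
[Model] DRV40_M
Model_type Output
[Model] DRV40_S
Model_type Output
[Model] DRV60_F
Model_type Output
[End]
"""

# Matches "[Model] name" only; "[Model Selector]" does not match because
# the closing bracket must follow "Model" directly.
MODEL_RE = re.compile(r"^\[model\]\s+(\S+)", re.IGNORECASE)


def list_models(ibs_text):
    """Return the names declared by [Model] keywords, in file order."""
    names = []
    for line in ibs_text.splitlines():
        match = MODEL_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def group_by_slew(names):
    """Group names by base name, assuming a trailing _F/_M/_S slew suffix."""
    groups = defaultdict(dict)
    for name in names:
        base, sep, suffix = name.rpartition("_")
        if sep and suffix.upper() in {"F", "M", "S"}:
            groups[base][suffix.upper()] = name
    return dict(groups)


models = list_models(SAMPLE_IBS)
variants = group_by_slew(models)
for base, by_slew in variants.items():
    print(base, sorted(by_slew))
```

Any base name that shows up with more than one suffix is a candidate for the kind of F/M/S sweep described above.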
The result surprised and excited Mr. Gaosuo. In FAST, MEDIUM, and SLOW modes, the waveforms at the same driver output impedance were significantly different. In MEDIUM and SLOW modes the rising edge of the signal is slower, which suppresses some of the reflections, reduces ringback, and increases the eye-height margin.
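Why a slower edge helps can be illustrated with a much simpler circuit than the real channel. The sketch below is not the customer's 1-to-9 topology or the IBIS simulation described here; it assumes a single ideal lossless line with a resistive driver and an open far end, and sums the reflections from a lattice (bounce) diagram. The 20 ohm driver, 50 ohm line, 100 ps delay, and the three rise times are hypothetical values chosen only to make the trend visible.

```python
def ramp(t, t_rise):
    """Unit ramp: 0 before t = 0, rising linearly to 1 at t = t_rise."""
    if t <= 0.0:
        return 0.0
    return min(t / t_rise, 1.0)


def load_voltage(t, t_rise, r_src, z0, t_delay, bounces=50):
    """Far-end voltage of an ideal open-ended line, summed bounce by bounce."""
    launch = z0 / (r_src + z0)             # voltage divider at the driver
    rho_src = (r_src - z0) / (r_src + z0)  # reflection coefficient at the driver
    rho_load = 1.0                         # open (high-impedance) receiver
    total = 0.0
    for n in range(bounces):
        arrival = (2 * n + 1) * t_delay    # n-th wavefront reaches the far end
        gain = launch * (1 + rho_load) * (rho_src * rho_load) ** n
        total += gain * ramp(t - arrival, t_rise)
    return total


def peak_overshoot(t_rise, r_src=20.0, z0=50.0, t_delay=100e-12):
    """Peak far-end voltage above the 1.0 V final value, on a fine time grid."""
    step = t_delay / 100
    samples = int(40 * t_delay / step)
    peak = max(load_voltage(i * step, t_rise, r_src, z0, t_delay)
               for i in range(samples))
    return peak - 1.0


# Hypothetical rise times standing in for fast, medium, and slow buffers.
for label, t_rise in [("fast", 50e-12), ("medium", 400e-12), ("slow", 800e-12)]:
    print(f"{label:6s} edge {t_rise * 1e12:4.0f} ps -> "
          f"overshoot {peak_overshoot(t_rise):.3f} V")
```

In this toy model, once the rise time is several times the round-trip delay, successive reflections overlap and partly cancel while the edge is still rising, so the overshoot shrinks. A real multi-drop channel has many more reflection points, which is why the full simulation with the vendor models is still needed.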
Based on these sweep results, we selected MEDIUM mode and simulated the whole channel to see whether it improved on the earlier FAST mode results.
The result delighted Mr. Gaosuo. With MEDIUM mode the simulation results improved significantly, and the signal quality at the same DRAM chip became acceptable.
In simulation, then, Mr. Gaosuo solved the problem by choosing a different FPGA buffer model. Selecting a driver with a slower rising edge can actually give better signal quality.
That left one last question: could the customer actually select MEDIUM mode in their debugging parameter configuration? The customer sent us a screenshot of their debugging software, and the drop-down menu did indeed offer this mode. We then asked the customer to switch manually from the default FAST mode to MEDIUM mode and see whether the results improved.
About a day later, we all breathed a sigh of relief when a good-news email arrived from the customer. After a month of debugging, the problem was finally solved by simply switching the buffer mode by hand.