This post was last edited by le062 on 2018-11-5 05:52

Preface

In my earlier tests, ST-Link V2 and J-Link V8 both reached about 160KB/S for continuous reads and writes, while most CMSIS-DAP debuggers are limited by full-speed HID and are hard to speed up. Paired with OpenOCD, their continuous read/write speed is only 23KB/S.

At the beginning of the year, I tried to use NUC505 for CMSIS-DAP. The NUC505 is probably the cheapest microcontroller on the market with an integrated USB HS PHY. A high-speed HID report can be set to 1024 bytes with a send/receive interval of 125us, so its USB part has no bottleneck at all. Its SPI module, however, is very slow: after a transfer completes, it waits several CLKs before setting the completion flag. Even with CLK raised to 14MHz, the continuous read/write speed only reaches about 150KB/S. As the saying goes, rotten wood cannot be carved.

The protagonist this time, GD32F350, came to my attention last year. It is said to cost the same as GD32F150. The USB part has been changed to DWCOTG, and its 32KB of code should not run slowly. After reading the datasheet in detail, I found that the internal 48M clock can be calibrated from the USB SOF signal, so no crystal oscillator is needed. The bulk price is claimed to be as low as 30 cents. Well, so-so. In any case, I bought 3pcs on Taobao for 21 yuan in total.

Overall, with a "seemingly" very low solution cost and an SPI design similar to that of the ST-Link V2 main chip, it might also reach the mainstream speed of 160KB/S, which makes it worth a try.

1. Hardware Design

The simpler the hardware, the better. A QFN28 GD32F350 plus a few resistors and capacitors is enough. To allow for a bootloader and an adapter enclosure, a button and a two-color LED are also needed. During development the debug interface has to be brought out, which requires the LQFP48 package, so the test board is compatible with both packages. For hardware information, see this post.
2. Software Design

Design requirements: While retaining the basic functions of a CMSIS-DAP debugger, make full use of the GD32F350 and maximize SWD/JTAG debugging speed through a dedicated OpenOCD driver and USB BULK transfers.

Functional requirements:
Support SWD+SWO, JTAG debugging interface
Support one USBCDC serial port
Compatible with CMSIS-DAP HID protocol, driver-free use on all platforms
Develop BULK transmission interface
Support OpenOCD BULK transmission interface
Bootloader emulating USB disk
Key points of hardware driver development:
The DWCOTG integrated in GD32F350 supports only 4 bidirectional endpoints (No. 0 to No. 3). No. 0 is the control endpoint, No. 1 is used for the BULK transfer interface, No. 2 for the CMSIS-DAP HID interface, and No. 3 for the CDC data interface. The CDC control interface is configured on No. 4, but endpoint No. 4 does not actually exist; the device always answers it with NAK, which does not affect the CDC serial port function.
The CDC serial port supports baud rate configuration from the host, with supported rates of 8M, 4M, and 3.2M down to 2K. For the design details of the high-speed serial port driver, please refer to this post.
For the SWD interface, the CLK of one SPI outputs SWCLK, and MISO and MOSI are tied together to implement SWDIO. Take a specific waveform as an example:
As shown in the figure, the basic unit of an SWD transfer consists of three blocks: request, response and data, with some idle bits between the blocks. The request is 8 bits, the length of the response block and the idle bits before and after it is not fixed, and the data block contains 32 bits of data plus its trailing idle bits. The transfer direction can differ from block to block, so the transfer has to be split into three segments. On GD32F350, continuous 16-bit waveforms are sent in SPI 16-bit mode (such as the data block in the figure above), continuous 8-bit waveforms are sent in SPI 8-bit mode (such as the request block in the figure above), and the remaining waveforms are generated by toggling IO (such as the response block in the figure above). All hardware in DAPLINK generates its timing by toggling GPIO. In actual optimization the fastest GPIO frequency can reach 8MHz, but this mode takes up a lot of CPU time, which does not seem appropriate.
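To make the 8-bit request block concrete, here is a minimal sketch in C of how such a request byte and the data parity bit can be computed. The bit layout (start, APnDP, RnW, A[2:3], parity, stop, park, sent LSB first) follows the ARM Debug Interface specification; the function names swd_request and swd_parity32 are hypothetical and are not taken from the vllink_lite firmware.

```c
#include <stdint.h>

/* Parity bit of a 32-bit SWD data word: 1 if the number of set bits is odd.
 * Hypothetical helper name, not from the vllink_lite sources. */
static uint8_t swd_parity32(uint32_t v)
{
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return (uint8_t)(v & 1u);
}

/* Build the 8-bit SWD request, to be shifted out LSB first.
 *   bit0 = start (1), bit1 = APnDP, bit2 = RnW,
 *   bit3 = A2, bit4 = A3, bit5 = parity of bits 1..4,
 *   bit6 = stop (0), bit7 = park (1)
 * 'addr' is the register address (0x0, 0x4, 0x8 or 0xC). */
static uint8_t swd_request(int ap, int read, uint8_t addr)
{
    uint8_t apndp = ap ? 1u : 0u;
    uint8_t rnw   = read ? 1u : 0u;
    uint8_t a2    = (uint8_t)((addr >> 2) & 1u);
    uint8_t a3    = (uint8_t)((addr >> 3) & 1u);
    uint8_t par   = (uint8_t)(apndp ^ rnw ^ a2 ^ a3);

    return (uint8_t)(0x01u            /* start */
                   | (apndp << 1)
                   | (rnw   << 2)
                   | (a2    << 3)
                   | (a3    << 4)
                   | (par   << 5)
                   | 0x80u);          /* park; stop bit (bit6) stays 0 */
}
```

For example, swd_request(0, 1, 0x0) gives the request byte for a DP read at address 0, which can then go out as one 8-bit SPI transfer, while the response bits that follow are handled separately.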
The JTAG interface is handled much like the SWD interface, using one master SPI and one slave SPI. Take a specific waveform as an example:
The idea is that every full group of 8 bits is sent and received through SPI, and the remaining bits are toggled out through GPIO. The JTAG part has not been optimized for timing, so there is no continuous 16-bit waveform, and the GPIO toggling delay is also large.
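The split described for JTAG can be sketched as follows, reading "full 8" as whole groups of 8 bits (my interpretation of the post). The type jtag_split_t and the function jtag_split are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* How a JTAG shift of 'total_bits' is divided between the two mechanisms.
 * Hypothetical type, not from the vllink_lite sources. */
typedef struct {
    size_t spi_bytes;  /* whole 8-bit groups, shifted by the SPI peripheral */
    size_t gpio_bits;  /* leftover bits (0..7), toggled out through GPIO */
} jtag_split_t;

/* Divide a shift into SPI-sized chunks and a GPIO remainder. */
static jtag_split_t jtag_split(size_t total_bits)
{
    jtag_split_t s;
    s.spi_bytes = total_bits / 8u;
    s.gpio_bits = total_bits % 8u;
    return s;
}
```

Under this reading, a 37-bit shift would use 4 SPI bytes plus 5 GPIO-toggled bits, and a shift shorter than 8 bits would go entirely through GPIO.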
Protocol development: The protocol layer code was finished during the earlier NUC505 project. Its logic is the same as CMSIS-DAP, but the protocol layer is decoupled from the driver layer: the protocol layer calls the transfer interface asynchronously and then waits for the transfer-complete event. The NUC505 driver is also fully non-blocking.

This design sounds beautiful. Isn't non-blocking a synonym for beauty? Reality, however, likes to slap you in the face. A non-blocking driver layer inevitably adds the clock overhead of the interrupt mechanism, and asynchronous events inevitably add scheduling overhead. The end result is that the CPU core is not busy, yet the SWD waveform has gaps in it, which is not good. I hit the same problem when developing on GD32F350. The final solution was to couple the whole protocol layer with the SWD/JTAG driver layer and run it in main, while everything else runs in PendSV or in higher-priority drivers.

3. Functional Test

Passed tests:
OpenOCD 0.10 CMSIS-DAP mode SWD interface test, rate range 1M-32M
OpenOCD 0.10 CMSIS-DAP mode JTAG interface test, rate range 1M-8M
OpenOCD 0.10 BULK mode SWD interface test, rate range 1M-32M
OpenOCD 0.10 BULK mode JTAG interface test, rate range 1M-8M
IAR 7.80.3 CMSIS-DAP mode SWD interface test, rate range 1M-32M/AUTO
IAR 7.80.3 CMSIS-DAP mode JTAG interface test, rate range 1M-8M/AUTO
USBCDC function, the tested maximum baud rate is 921600
Unfinished tests:
SWO function: IAR does not seem to support SWO over CMSIS-DAP, and I have not investigated this in depth
OpenOCD CMSIS-DAP mode SWD transfer speed: not worth dwelling on, 23KB/S.

OpenOCD BULK mode SWD transfer speed:

Operation      Transfer Speed
4MHz read      102KB/S
4MHz write     106KB/S
8MHz read      123KB/S
8MHz write     132KB/S
16MHz read     128KB/S
16MHz write    150KB/S
32MHz read     142KB/S
32MHz write    156KB/S
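To put these figures in perspective, here is a rough sketch in C that compares a measured speed with an idealized upper bound for the SWD wire. It rests on assumptions that are mine, not the post's: one 32-bit transfer takes at least 46 SWCLK cycles (8 request + 1 turnaround + 3 ACK + 32 data + 1 parity + 1 turnaround), idle bits and address setup are ignored, and 1KB is taken as 1000 bytes. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: minimum SWCLK cycles per 32-bit SWD transfer
 * (8 request + 1 turnaround + 3 ACK + 32 data + 1 parity + 1 turnaround).
 * Idle bits and protocol overhead are ignored, so this is only an upper bound. */
#define SWD_MIN_CLKS_PER_WORD 46u

/* Idealized payload rate in bytes per second at a given SWCLK frequency. */
static uint64_t swd_ideal_bytes_per_s(uint32_t swclk_hz)
{
    return ((uint64_t)swclk_hz * 4u) / SWD_MIN_CLKS_PER_WORD;
}

/* Measured speed as a percentage of the idealized bound, rounded down.
 * Assumption: 1KB = 1000 bytes. */
static uint32_t swd_utilization_percent(uint32_t swclk_hz,
                                        uint32_t measured_kb_per_s)
{
    uint64_t ideal = swd_ideal_bytes_per_s(swclk_hz);
    if (ideal == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)measured_kb_per_s * 1000u * 100u) / ideal);
}
```

Under these assumptions, the 4MHz read at 102KB/S uses roughly 29% of the bound and the 32MHz write at 156KB/S only about 5%, which fits the observation that raising SWCLK alone brings little gain.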
BULK mode is much better than driver-free HID, but why is its speed not as fast as ST-LINK V2/JLINK V8? Let's continue to look at the waveform:
32MHz write operation, partial waveform

The problem is clearly still in the USB transfer, and the OpenOCD driver should take the blame. The CMSIS-DAP protocol supports command queuing, but the driver developer used only a one-send, one-receive scheme (source code: enocd-vllink/blob/master/src/jtag/drivers/cmsis_dap_usb.c#L365). Although I turned it into a BULK driver, that was just routine Ctrl+C, Ctrl+V, so USB transfers are still one send followed by one receive, and the whole chain from top to bottom forms one big blocking loop. It would be strange if it were not slow. So I will have to rework the OpenOCD driver later and send commands through an asynchronous queue. That change requires some familiarity with OpenOCD as a whole, so I do not want to do it right now.

Looking ahead: with the asynchronous queue, that is, with the blanks in the four waveforms above filled with waveforms, what SWD speed could be reached? 4MHz could approach 200KB/S, 8MHz about 300KB/S, 16MHz about 400KB/S, and 32MHz might exceed 500KB/S. Haha, that would be really good. Then I will write a post titled "For 0.3 US dollars you can't go wrong and can't be fooled: punches ST, stomps J-Link, anything that isn't an FPGA can just lie down".

Finally, the JTAG speed is terrible, so I won't talk about it.

4. Build and Test

Because the host-side driver is not yet complete, I did not try to build a Windows executable. Interested Linux users, please compile it yourself.
For the hardware part, please refer to the first part.
Please build the host computer program by yourself:
5. Follow-up

First fix the asynchronous queue driver, then write a Bootloader, optimize the JTAG waveform, and add mode configuration to the CDC serial port. Some code in the SWD driver also has room for optimization.

-------------------------------------------------
Updated on November 5, 2018: USB Bulk asynchronous ping-pong transfers are supported; the CDC serial port supports mode configuration; SWD is slightly optimized; the SWD stability test is complete; SWD @16M can reach 400KB/S.
vllink_lite.r2.20181102.zip(813.95 KB, downloads: 83)
Uploaded 2018-11-5 05:48
After downloading the attachment, run git pull to get the latest version, or visit directly:
com/vllogic/vllink_lite
Can you share the USB code? I am currently debugging USB on the GD32F350. The official examples always show up as an unknown device, while the firmware you provided enumerates normally. There is not even an official application note, so I don't know where to start.
I found the problem. The library is not written rigorously: IAR 8.3 optimizes global variables even at optimization level none. Switching to 7.8 fixes it.
Mark, I'll study it when I have time. I still have some jlink-v8s, but they are pirated; they keep losing their firmware and are slow. I'll make a few of these for fun.