CMSIS-DAP compatible debugger [Vllink Lite]

This post was last edited by le062 on 2018-11-5 05:52

Foreword

According to my earlier tests, the ST-Link V2 and J-Link V8 both sustain continuous read and write speeds of around 160 KB/s, while most CMSIS-DAP debuggers are limited by full-speed HID and are hard to speed up. The continuous read and write speed of [link hidden] + OpenOCD is only 23 KB/s.

At the beginning of the year I tried the NUC505 for CMSIS-DAP. The 505 is probably the cheapest microcontroller on the market with an integrated USB HS PHY. A high-speed HID report can be set to 1024 bytes with a send/receive interval of 125 µs, so its USB side is no bottleneck at all. Its SPI module, however, is very slow: after a transfer completes, it waits several clocks before raising the completion flag. Even with the clock raised to 14 MHz, the continuous read and write speed only reaches about 150 KB/s. As the saying goes, rotten wood cannot be carved.

The protagonist this time, the GD32F350, came to my attention last year. It is said to cost the same as the GD32F150, the USB block has been changed to DWCOTG, and execution of code within 32 KB should not be slow. A close read of the datasheet shows that the internal 48 MHz clock can be calibrated from the USB SOF signal, so no crystal oscillator is needed. The price is claimed to be as low as 30 cents in bulk. Well, so-so; in any case, I bought 3 pcs on Taobao for 21 yuan in total.

In short: a "seemingly" ultra-low solution cost, an SPI design similar to that of the ST-Link V2's main chip, and a chance of reaching the mainstream figure of 160 KB/s. It is worth a try.

1. Hardware Design

The simpler the hardware, the better: a QFN28 GD32F350 plus a few resistors and capacitors is enough. To allow for a bootloader and an adapter enclosure, a button and a two-color LED are also needed. During development the debug interface has to be brought out, which requires the LQFP48 package, so the test board is compatible with both packages. For hardware details, see this post.

2. Software Design

Design goal: while retaining the basic functions of a CMSIS-DAP debugger, exploit the GD32F350 to the full and maximize SWD/JTAG debugging speed through a dedicated OpenOCD driver and USB BULK transfers.

Functional requirements:
  • SWD+SWO and JTAG debugging interfaces
  • One USB CDC serial port
  • Compatibility with the CMSIS-DAP HID protocol, driver-free on all platforms
  • A BULK transfer interface
  • OpenOCD support for the BULK transfer interface
  • A bootloader that emulates a USB disk
Key points of hardware driver development:
  • The DWCOTG integrated in the GD32F350 supports only 4 bidirectional endpoints (No. 0 to No. 3). No. 0 is the control endpoint, No. 1 serves the BULK transfer interface, No. 2 the CMSIS-DAP HID interface, and No. 3 the CDC data interface. The CDC control interface is assigned to No. 4, which does not actually exist; the device always answers it with NAK, and this does not affect the CDC serial port function.
  • The CDC serial port lets the host configure the baud rate. Supported values are 8M, 4M, and 3.2M down to 2K. For design details of the high-speed serial port driver, see this post.
  • For the SWD interface, the CLK of one SPI outputs SWCLK, and MISO and MOSI are tied together to implement SWDIO. Taking a concrete waveform as an example: as the figure shows, the basic unit of an SWD transfer consists of three blocks (request, response, and data) with some idle bits between blocks. The request is 8 bits; the length of the response block together with its leading and trailing idle bits is not fixed; the data block contains 32 data bits plus its trailing idle bits. The transfer direction may differ from block to block, so the transfer has to be split into three segments. On the GD32F350, continuous 16-bit waveforms are sent in SPI 16-bit mode (such as the data block in the figure), continuous 8-bit waveforms in SPI 8-bit mode (such as the request block), and the remaining waveforms by toggling GPIO (such as the response block). DAPLink, by contrast, does its timing on all hardware by toggling GPIO. During optimization, the fastest GPIO toggling rate I reached was 8 MHz, but this mode takes a lot of CPU time, which does not seem appropriate.
  • The JTAG interface is handled similarly, with one master SPI and one slave SPI. Taking a concrete waveform as an example, the idea is that full groups of 8 bits are sent and received through SPI and the remaining bits are toggled out by GPIO. The JTAG timing has not been optimized, so there is no continuous 16-bit waveform, and the GPIO toggling delay is also large.
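
The SWD segmentation described above can be illustrated with a small host-side model. This is only a sketch, not the Vllink Lite firmware: the request-byte layout (start, APnDP, RnW, A[3:2], parity, stop, park, sent LSB first) follows the ARM Debug Interface specification, while the function names, the segment labels ("spi8", "spi16", "gpio"), and the choice to handle the data parity bit in the GPIO segment are assumptions made for illustration.

```python
def swd_request(apndp, rnw, addr):
    """Build the 8-bit SWD request header (transmitted LSB first)."""
    a2 = (addr >> 2) & 1
    a3 = (addr >> 3) & 1
    parity = (apndp ^ rnw ^ a2 ^ a3) & 1   # even parity over the 4 fields
    return (1                 # start bit, always 1
            | (apndp << 1)    # 0 = debug port, 1 = access port
            | (rnw << 2)      # 0 = write, 1 = read
            | (a2 << 3)
            | (a3 << 4)
            | (parity << 5)
            | (0 << 6)        # stop bit, always 0
            | (1 << 7))       # park bit, always 1


def plan_write(addr, word, apndp=0):
    """Split one SWD write into hypothetical transfer segments.

    Assumes the SPI peripheral shifts LSB first, so the low half-word
    of the data goes out before the high half-word.
    """
    word &= 0xFFFFFFFF
    lo, hi = word & 0xFFFF, word >> 16
    data_parity = bin(word).count("1") & 1
    return [
        ("spi8", swd_request(apndp, 0, addr)),         # request block
        ("gpio", "turnaround + 3-bit ACK + turnaround"),  # response block
        ("spi16", lo),                                  # data block, part 1
        ("spi16", hi),                                  # data block, part 2
        ("gpio", f"parity={data_parity} + idle bits"),  # assumed GPIO tail
    ]


if __name__ == "__main__":
    for kind, payload in plan_write(0x8, 0x12345678):
        print(kind, hex(payload) if isinstance(payload, int) else payload)
```

Only the response block and the tail go through GPIO in this model, which is the point of the design: the two fixed-length blocks stay on the SPI hardware, and the CPU handles just the part whose length and direction vary.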
Protocol development: The protocol layer code was completed for the earlier NUC505 solution. Its logic is the same as CMSIS-DAP's, but the protocol layer is decoupled from the driver layer: it calls the transfer interface asynchronously and then waits for the transfer-complete event. The NUC505 driver is also entirely non-blocking. The design sounds beautiful; isn't non-blocking a synonym for beauty? Reality, however, likes to slap you in the face. A non-blocking driver layer inevitably adds the clock-cycle overhead of the interrupt mechanism, and asynchronous events inevitably add scheduling overhead. The end result is a CPU core that is not busy and an SWD waveform full of gaps, which is not good. I ran into the same problem when developing on the GD32F350. The final solution was to couple the whole protocol layer to the SWD/JTAG driver layer and run it in main, with everything else running in PendSV or in higher-priority drivers.

3. Functional Test

Passed tests:
  • OpenOCD 0.10, CMSIS-DAP mode, SWD interface, rate range 1M-32M
  • OpenOCD 0.10, CMSIS-DAP mode, JTAG interface, rate range 1M-8M
  • OpenOCD 0.10, BULK mode, SWD interface, rate range 1M-32M
  • OpenOCD 0.10, BULK mode, JTAG interface, rate range 1M-8M
  • IAR 7.80.3, CMSIS-DAP mode, SWD interface, rate range 1M-32M/AUTO
  • IAR 7.80.3, CMSIS-DAP mode, JTAG interface, rate range 1M-8M/AUTO
  • USB CDC function, maximum tested baud rate 921600
Unfinished tests:
  • SWO function: IAR does not seem to support SWO over CMSIS-DAP; this has not been investigated in depth.

OpenOCD CMSIS-DAP mode SWD transfer speed: hardly worth mentioning, 23 KB/s.

OpenOCD BULK mode SWD transfer speed:
SWD clock   Operation   Transfer speed
4 MHz       Read        102 KB/s
4 MHz       Write       106 KB/s
8 MHz       Read        123 KB/s
8 MHz       Write       132 KB/s
16 MHz      Read        128 KB/s
16 MHz      Write       150 KB/s
32 MHz      Read        142 KB/s
32 MHz      Write       156 KB/s
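
To put the write figures in the table in context, the sketch below compares them with a rough wire-level ceiling. It rests on assumptions that are not in the post: one 32-bit word is taken to cost 46 SWCLK cycles (8 request + 1 turnaround + 3 ACK + 1 turnaround + 32 data + 1 parity), idle cycles and memory-access overhead are ignored, and 1 KB is taken as 1000 bytes. The function and constant names are illustrative only.

```python
# Assumed SWD framing for one 32-bit word (not stated in the post):
# 8 request + 1 turnaround + 3 ACK + 1 turnaround + 32 data + 1 parity.
BITS_PER_WORD = 8 + 1 + 3 + 1 + 32 + 1  # 46 SWCLK cycles

# Write speeds copied from the table above, keyed by SWCLK in Hz.
MEASURED_WRITE_KB_S = {
    4_000_000: 106,
    8_000_000: 132,
    16_000_000: 150,
    32_000_000: 156,
}


def wire_ceiling_kb_s(swclk_hz):
    """Upper bound on payload throughput if SWCLK never paused."""
    words_per_second = swclk_hz / BITS_PER_WORD
    return words_per_second * 4 / 1000  # 4 payload bytes per word


def efficiency(swclk_hz, measured_kb_s):
    """Fraction of the wire-level ceiling actually achieved."""
    return measured_kb_s / wire_ceiling_kb_s(swclk_hz)


if __name__ == "__main__":
    for hz, kb_s in MEASURED_WRITE_KB_S.items():
        print(f"{hz // 1_000_000:>2} MHz: ceiling {wire_ceiling_kb_s(hz):7.1f} KB/s, "
              f"measured {kb_s} KB/s, efficiency {efficiency(hz, kb_s):.1%}")
```

Under these assumptions the achieved fraction of the ceiling drops sharply as the clock rises, which suggests that the time is being lost somewhere other than on the SWD wire itself.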
BULK mode is much better than driver-free HID, but why is it still not as fast as the ST-Link V2 / J-Link V8? Let's go back to the waveforms:

  • 4 MHz write operation, partial waveform
  • 8 MHz write operation, partial waveform
  • 16 MHz write operation, partial waveform
  • 32 MHz write operation, partial waveform

The problem is clearly still in the USB transfers, and the OpenOCD driver has to take the blame. The CMSIS-DAP protocol supports queued commands, but the driver developer only used a one-send, one-receive scheme (source code: enocd-vllink/blob/master/src/jtag/drivers/cmsis_dap_usb.c#L365). Although I changed the driver to BULK, that was routine Ctrl+C, Ctrl+V work ([link hidden]), so USB data still goes out as one send followed by one receive, and the whole stack from top to bottom forms one big blocking loop. It would be strange if it were not slow. So I will have to rework the OpenOCD driver later and send commands through an asynchronous queue. That change requires a fair degree of familiarity with OpenOCD as a whole, so I don't want to do it in the next few days.

Looking ahead, then: with an asynchronous queue, that is, with the gaps in the four waveforms above filled with waveform, what SWD speed can be reached? 4 MHz should get close to 200 KB/s, 8 MHz about 300 KB/s, 16 MHz about 400 KB/s, and 32 MHz may exceed 500 KB/s. Haha, very nice. Then I will write a post titled "0.3 US dollars, you can't lose and you can't be fooled: punches ST, kicks J-Link, and anything that isn't an FPGA can just lie down".

Finally, the JTAG rate is very poor, so I won't go into it.

4. Build and Test

Because the host-side driver is not yet complete, I did not try to build a Windows executable. Interested Linux users, please compile it yourselves.
  • For the hardware, see Part 1.
  • Please build the host-side program yourself: [link hidden]
  • Preview firmware: vllink_lite_gd32f350c8_0x08000000.zip (20.47 KB)
  • OpenOCD configuration example:
      source [find interface/vllink.cfg]
      transport select swd
      source [find target/stm32f4x.cfg]
      reset_config srst_only

5. Follow-up

First fix the asynchronous queue driver, then write the bootloader and optimize the JTAG waveform; the CDC serial port also needs to support mode configuration. Some code in the SWD driver has room for optimization too.

-------------------------------------------------

Updated on November 5, 2018: USB Bulk asynchronous ping-pong transfers are now supported; the CDC serial port supports mode configuration; SWD has been slightly optimized; the SWD stability test is complete; SWD at 16 MHz can reach 400 KB/s.

vllink_lite.r2.20181102.zip (813.95 KB)

After downloading the attachment, run git pull to get the latest version, or visit directly: com/vllogic/vllink_lite
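
Why ping-pong transfers help can be seen with a toy timing model. This is a sketch under stated assumptions, not a description of the actual firmware or OpenOCD driver: each command packet is assumed to take a fixed time `t_usb` to cross USB and a fixed time `t_exec` to execute on the SWD wire, the probe is assumed to have exactly two packet buffers, and response transfers are ignored. All names are hypothetical.

```python
def blocking_time(n, t_usb, t_exec):
    """One send, one receive: every packet pays USB time plus execution time."""
    return n * (t_usb + t_exec)


def ping_pong_time(n, t_usb, t_exec):
    """Two buffers: the next packet is transferred while the current one runs."""
    arrive = []  # time at which packet i is fully inside a probe buffer
    done = []    # time at which packet i has finished executing on the wire
    for i in range(n):
        # USB is serial: transfer i cannot start before transfer i-1 arrived.
        start_xfer = arrive[i - 1] if i >= 1 else 0
        # With two buffers, packet i reuses the buffer of packet i-2,
        # so that packet must have finished executing first.
        if i >= 2:
            start_xfer = max(start_xfer, done[i - 2])
        arrive.append(start_xfer + t_usb)
        # Execution starts once the packet is present and the wire is free.
        start_exec = max(arrive[i], done[i - 1] if i >= 1 else 0)
        done.append(start_exec + t_exec)
    return done[-1] if n else 0


if __name__ == "__main__":
    n, t_usb, t_exec = 1000, 100, 100  # illustrative values in microseconds
    print("blocking :", blocking_time(n, t_usb, t_exec), "us")
    print("ping-pong:", ping_pong_time(n, t_usb, t_exec), "us")
```

In this model the USB time is hidden behind execution whenever `t_usb` is no larger than `t_exec`, so the wire stays busy almost continuously instead of idling during every USB round trip.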


This post is from GD32 MCU

From 2
 
Supplementary materials for the judges (all posted by le062):
  • ISP programming of the GD32F350
  • Vllink Lite hardware design information
  • GD32F350 dual serial port non-blocking high-speed transceiver mechanism and its driver


From 3
 
The USB and SWD parts have been optimized. The low-cost debugger for GD32F350 has been completed.
4
 
Awesome!
5
 
Support, talent!!!
6
 
Can you share the USB code? I am debugging USB on the GD32F350 right now. The official examples always show up as "unknown device", while the firmware you provided enumerates normally. There isn't even an official application note, so I don't know where to start.
7
 
I found the problem. The library is not written rigorously: IAR 8.3 optimizes global variables even when the optimization level is set to none. Switching to 7.8 fixes it.
8
 
Trying to make a DAP downloader
9
 
Bookmarked; I'll study it when I have time. I still have some J-Link V8s, but they are pirated ones that keep losing their firmware and are slow. I'll build a few of these for fun.
10
 
Thanks for sharing! I have always wanted to make a remote upgrade module! The benefits are huge!
11
 

Thanks for sharing
