TMS320C6678 multi-core DSP power-on loading technology

Aguilera · Published on 2017-11-10 20:17

In video detection, medical imaging and fast infrared image tracking systems, increasingly complex two-, three- and even four-dimensional image processing requires a parallel processing system that can run complex algorithms. A high-end FPGA combined with a high-performance DSP is currently a common way to build such systems, but the performance of a single DSP core has been pushed close to its limit. Multi-core DSPs are therefore the new direction for complex parallel algorithms, and boot loading is one of the difficulties in using them.

The TMS320C6678 (C6678) from TI is a high-performance DSP with 8 cores, each running at 1 GHz. Its supported boot modes include SPI, I2C, EMAC, SRIO and parallel-port Emif16 NOR-FLASH. Of these, Emif16 NOR-FLASH mode is relatively simple and lets the system boot on its own, without a host computer, so most standalone DSP systems use it. The scattered information on C6472 and C6678 loading that can be found on the Internet all relies on third-party conversion tools and stays too general. The following discusses power-on loading of the C6678 from parallel-port Emif16 NOR-FLASH in detail.

1 The power-on loading process of C6678

The so-called power-on loader (bootstrap) is a small program that runs after the DSP is reset and before the user program starts, much like the BIOS of a PC. Multi-core loading differs considerably from single-core loading: the loader is responsible not only for loading the main core but also for loading and activating the other cores. On the C6678, programs can be executed directly from Emif16 NOR-FLASH (XIP), which is different from the C641x series DSPs. The power-on loading process is shown in Figure 1.
After power-on reset, the DSP first runs the program fixed in the on-chip ROM at address 0x20b00000, which is called the on-chip loader. The on-chip loader reads the state of the DSP hardware pins to determine which boot mode is selected and jumps to the secondary loader for that mode. As shown in Figure 1, in Emif16 NOR-FLASH mode the PC jumps directly to the first NOR-FLASH address, 0x70000000, once the on-chip loader has finished, and the secondary loader stored in FLASH starts to execute. The secondary loader occupies the FLASH range 0x70000000~0x70000400. The boot table data of the application (that is, the application data burned into FLASH) is stored from 0x70000400 onward. The job of the secondary loader is to move the boot table data of Core0~Core7 from FLASH to the corresponding address ranges of the DSP. When the move is complete, the secondary loader jumps to _c_int00, the program entry address of Core0, and the Core0 application starts to run. The Core0 application begins with code that activates and starts the other cores (another difference from single-core loading), which completes the multi-core load.

In fact, if the application is very small and does not need to run fast, steps 2, 3 and 4 in Figure 1 can be omitted: burn the raw code of the application to FLASH starting at 0x70000000 and it runs normally after power-on (this is not possible on the C641x). Running this way, however, gives up many of the DSP's high-performance features, and most multi-core projects are SYS/BIOS projects that occupy a relatively large amount of memory, so a normal boot must use the secondary loading process shown in Figure 1.

As Figure 1 shows, for a complete multi-core loading process the developer needs to write the secondary loader, generate the image file to be stored in FLASH, write the FLASH burner, and write the code with which the main core triggers each auxiliary core (the loaded applications themselves are outside this scope).

2 Composition and generation of multi-core image files

The image file is the complete data file that the user burns into the external FLASH. It is a composite file consisting of the code of the secondary loader (at the front of the file) followed by the boot table data of the application (at the back of the file). The secondary loader is the same for single-core and multi-core systems; the difference lies in the boot table data behind it.

The boot table is a data packet in which all code and data of the application are stored in sections according to the on-chip addresses they occupy. The first 4 bytes of the packet are the entry address _c_int00 (the C runtime entry point, which in turn calls main()), followed by a number of data sections. Each section starts with 4 bytes giving its length in bytes, Byte_count_x (x is the section number), then 4 bytes Address_x giving the on-chip load address of the section, then Byte_count_x bytes of data, Data_x. After the last section, 4 zero bytes mark the end of the boot table. The format of the boot table is shown in Table 1. The number of data bytes in a section need not be a multiple of 4; the data area is padded with zeros at the end up to a multiple of 4 B, so the total size of a boot table file is always a multiple of 4 bytes.

Generating the boot table data is simple. The hex6x.exe tool shipped with CCS converts the .out file produced for the application into a boot table file; command-line options select the output, which can be either a binary file or a text file.
This study uses the binary form. The generation command is as follows (app is the application name; app.out is the linked output file generated by CCS):

hex6x -boot -b -e _c_int00 -order L -memwidth=32 -romwidth=32 -o app.bin app.out

app.bin is the generated binary boot table file. Prepending the binary code of the secondary loader to the boot table file gives the image file of the app application.

The multi-core image file is formed by merging the secondary loader with the boot tables of the applications of all cores. Each core has its own independent project, so CCS generates one .out file per core and hex6x.exe generates one boot table file per core. The files are then merged as follows. Remove the last 4 zero bytes from the Core0 boot table file. From the boot table file of each auxiliary core, remove the leading entry address _c_int00 and the last 4 zero bytes, and append what remains to the truncated Core0 file. Then treat the _c_int00 of each core as a 4-byte data section and append it to the merged file; the on-chip load address of each such section is the Boot Magic Address of that core. For example, the Boot Magic Address of Core1 is 0x1187fffc, that of Core2 is 0x1287fffc, …, and that of Core7 is 0x1787fffc. After all sections are in place, 4 zero bytes are appended as the end mark; the merged boot table file is shown in Table 2. As before, the code of the secondary loader is prepended to this file to form the multi-core image file.

Going from the single-core boot table files generated by hex6x to the merged image file involves only file operations, which can be done with ordinary C tools or even with Matlab or similar tools. Compared with Table 1, Table 2 only adds the data sections of all auxiliary cores and the special _c_int00 section of each core. The header and the end mark are the same, so the secondary loader can move the data using one uniform boot table format.
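
The merge steps described here are plain file operations, so they can be sketched in a few lines of host-side C. The following is a minimal sketch, not the author's tool: the function names (`merge_boot_tables`, `append_sections`, `to_global`, `boot_magic_addr`) are hypothetical, the tables are assumed to be little-endian binaries already read into memory, and the entry-address sections are emitted for the auxiliary cores only, which is one reading of the description. It also applies the L2 address translation that the article requires for auxiliary-core sections (adding 0x10000000 + n*0x1000000 to addresses in 0x00800000~0x0087FFFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit little-endian access (the tables are built with -order L). */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Map a core-local L2 address of auxiliary core n to its global alias.
 * Core0 addresses and non-L2 addresses are returned unchanged. */
static uint32_t to_global(uint32_t addr, unsigned core) {
    if (core >= 1 && addr >= 0x00800000u && addr <= 0x0087FFFFu)
        return addr + 0x10000000u + core * 0x01000000u;
    return addr;
}

/* Boot Magic Address of core n, e.g. 0x1187FFFC for Core1. */
static uint32_t boot_magic_addr(unsigned core) {
    return 0x1087FFFCu + core * 0x01000000u;
}

/* Copy all sections of one table (skipping its entry address and its end
 * mark) to out[pos...]. Returns the new write position, or 0 on error. */
static size_t append_sections(uint8_t *out, size_t pos, size_t cap,
                              const uint8_t *tbl, size_t len, unsigned core) {
    size_t i = 4;                              /* skip the entry address */
    while (len - i >= 4) {
        uint32_t count = rd32(tbl + i);
        if (count == 0)
            return pos;                        /* end mark reached */
        size_t padded = ((size_t)count + 3u) & ~(size_t)3u;
        if (len - i < 8 || padded > len - i - 8)
            return 0;                          /* truncated input table */
        if (cap - pos < 8 || padded > cap - pos - 8)
            return 0;                          /* output buffer too small */
        wr32(out + pos, count);
        wr32(out + pos + 4, to_global(rd32(tbl + i + 4), core));
        memcpy(out + pos + 8, tbl + i + 8, padded);
        pos += 8 + padded;
        i += 8 + padded;
    }
    return 0;                                  /* no end mark found */
}

/* Merge per-core boot tables (index 0 = Core0). Returns merged length, 0 on error. */
size_t merge_boot_tables(const uint8_t *tbl[], const size_t len[],
                         unsigned ncores, uint8_t *out, size_t cap) {
    assert(ncores >= 1 && ncores <= 8);
    if (len[0] < 8 || cap < 4)
        return 0;
    memcpy(out, tbl[0], 4);                    /* Core0 entry address heads the file */
    size_t pos = 4;
    for (unsigned c = 0; c < ncores; c++) {
        if (len[c] < 8)
            return 0;
        pos = append_sections(out, pos, cap, tbl[c], len[c], c);
        if (pos == 0)
            return 0;
    }
    for (unsigned c = 1; c < ncores; c++) {    /* one 4-byte entry section per aux core */
        if (cap - pos < 12)
            return 0;
        wr32(out + pos, 4);
        wr32(out + pos + 4, boot_magic_addr(c));
        memcpy(out + pos + 8, tbl[c], 4);      /* that core's _c_int00, copied verbatim */
        pos += 12;
    }
    if (cap - pos < 4)
        return 0;
    wr32(out + pos, 0);                        /* end mark */
    return pos + 4;
}
```

A caller would read each core's app.bin into a buffer, call `merge_boot_tables`, and then write the secondary loader binary followed by the merged table to the image file. The function returns the merged length in bytes, or 0 if a table is malformed or the output buffer is too small.
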
Note that the boot table sections generated by hex6x.exe from each auxiliary core's .out file that map into the L2 range (0x00800000~0x0087FFFF) use the same addresses as Core0, so they would overwrite one another. When generating the merged boot table, 0x10000000 + n*0x1000000 (n is the auxiliary core number) must therefore be added to each such L2 address. For example, the Core1 address 0x00825000 is mapped to 0x11825000; likewise, the same address is mapped to 0x12825000 for Core2 and to 0x17825000 for Core7.

3 Secondary Loader Program and FLASH Burning Program

The secondary loader is a small piece of code. Its job is to move the boot table data in FLASH, such as the data stored in the Table 2 format from 0x70000400 in Figure 1, into the RAM of the DSP. The loader is relatively simple and is generally a short piece of assembly code.
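
The article's loader is written in assembly, and its listing is not reproduced in this copy. As a rough illustration only, the same table walk can be expressed in C. The sketch below is a stand-in under stated assumptions, not the author's code: `boot_table_load`, `copy_fn`, `sim_ram` and `sim_copy` are hypothetical names, the table is assumed to be little-endian in the format of Tables 1 and 2, and target memory is abstracted behind a callback so that the logic can be exercised on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian word from the boot table. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Callback that stores nbytes from src at on-chip address dst. */
typedef void (*copy_fn)(uint32_t dst, const uint8_t *src,
                        uint32_t nbytes, void *ctx);

/* Walk a boot table: entry address, then {count, address, data} sections,
 * ended by a zero count. Copies every section and returns the entry address. */
uint32_t boot_table_load(const uint8_t *table, copy_fn copy, void *ctx) {
    assert(table != NULL && copy != NULL);
    uint32_t entry = rd32(table);
    const uint8_t *p = table + 4;
    for (;;) {
        uint32_t count = rd32(p);
        if (count == 0)
            break;                              /* end mark */
        copy(rd32(p + 4), p + 8, count, ctx);
        p += 8 + ((count + 3u) & ~3u);          /* data is padded to 4 bytes */
    }
    return entry;
}

/* Host-side stand-in for target memory: a small window of simulated RAM. */
typedef struct {
    uint32_t base;
    uint8_t bytes[64];
} sim_ram;

static void sim_copy(uint32_t dst, const uint8_t *src,
                     uint32_t nbytes, void *ctx) {
    sim_ram *ram = ctx;
    if (dst >= ram->base && nbytes <= sizeof ram->bytes &&
        dst - ram->base <= sizeof ram->bytes - nbytes)
        memcpy(ram->bytes + (dst - ram->base), src, nbytes);
}
```

On the target, the callback would reduce to a copy to the absolute destination address, and the caller would finish by branching to the returned entry address.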

Note that because DDR has not been initialized at this point, the secondary loader cannot load data into DDR. DDR is used only for data storage. If something really has to be loaded into DDR, that data must be stored in a designated region of FLASH; after Core0 has started and initialized DDR, the data can be read into RAM.

The main task of the FLASH burning program is to burn the merged multi-core file into the external Emif16 NOR-FLASH memory. Because most of TI's Emif parallel-port loading is open, developers are free to generate and burn image files according to their own ideas and formats, so TI does not provide a burner. In fact, once the merged boot table file has been generated, writing the burner is easy. The burner is generally a CCS project that outputs the legacy COFF format. According to the loading process in Figure 1, the image file to be burned consists of the code of the secondary loader and the boot table file of Table 2. The secondary loader can be placed before the main() function at the beginning of the burner, or it can be placed together with the Core0 application. This study adopts the former and maps the secondary loader code to the boot_load section of the memory specified in the burning project. The programming process of the burner is shown in Figure 2.

4 Auxiliary core triggering in multi-core loading

If the process in Figure 1 only jumps to the _c_int00 address of Core0 and the other cores are not activated, multi-core loading still fails. Triggering an auxiliary core requires two things: first, the entry address _c_int00 of each core's project must be written to the Boot Magic Address of that core; second, the inter-core interrupt generation register of that core (IPCGRx, x = 1 to 7) must be written. Once an auxiliary core has been triggered, its application must write the value 0xbabeface to the core's Boot Magic Address, replacing the stored _c_int00.
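
The two trigger conditions for the auxiliary cores (writing _c_int00 to the Boot Magic Address and writing the inter-core interrupt register) can be sketched in C as follows. This is a minimal illustration, not the author's code: `wake_core`, `write32_fn`, `write_log` and `log_write` are hypothetical names, and the register address 0x02620240 for IPCGR0 (with one 4-byte register per core) and the use of bit 0 as the interrupt-generation bit are assumptions that should be checked against the C6678 data manual. Register writes go through a callback so the sequence can be checked on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_MAGIC_CORE0 0x1087FFFCu  /* 0x1n87FFFC for core n, as in the article */
#define IPCGR0_ADDR      0x02620240u  /* ASSUMPTION: IPCGR0; IPCGRn = IPCGR0 + 4*n */
#define IPCG_BIT         0x1u         /* ASSUMPTION: bit 0 raises the interrupt */

/* Callback that performs one 32-bit write to a device address. */
typedef void (*write32_fn)(uint32_t addr, uint32_t value, void *ctx);

static uint32_t boot_magic_addr(unsigned core) {
    return BOOT_MAGIC_CORE0 + core * 0x01000000u;
}

static uint32_t ipcgr_addr(unsigned core) {
    return IPCGR0_ADDR + 4u * core;
}

/* Called from Core0: publish the entry address, then interrupt the core. */
void wake_core(unsigned core, uint32_t entry, write32_fn wr, void *ctx) {
    assert(core >= 1 && core <= 7 && wr != NULL);
    wr(boot_magic_addr(core), entry, ctx);   /* condition 1: entry address */
    wr(ipcgr_addr(core), IPCG_BIT, ctx);     /* condition 2: inter-core interrupt */
}

/* Host-side stand-in that records the writes instead of touching hardware. */
typedef struct {
    uint32_t addr[4];
    uint32_t value[4];
    unsigned n;
} write_log;

static void log_write(uint32_t addr, uint32_t value, void *ctx) {
    write_log *log = ctx;
    if (log->n < 4) {
        log->addr[log->n] = addr;
        log->value[log->n] = value;
        log->n++;
    }
}
```

On the target, the callback would be a volatile 32-bit store, and Core0 would call `wake_core` once for each of Core1~Core7. Depending on the device configuration, the relevant registers may need to be unlocked before they can be written; that step is not shown here.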

5 Conclusion

Loading a multi-core DSP is a relatively complex but very important process, and it is one of the difficulties in applying multi-core technology. For a multi-core DSP developer who wants to reach a working application, power-on loading is a step that cannot be skipped. The application project of each core can be a SYS/BIOS project with ELF output or a non-SYS/BIOS project. The multi-core Emif16 NOR-FLASH loading method described above has been used successfully in a self-developed C6678 image signal processing system.