Multi-core DSP power-on loading technology based on TMS320C6678-EEWORLD

In video detection, medical imaging, and fast infrared image tracking systems, increasingly complex two-dimensional, three-dimensional, and even four-dimensional image processing requires parallel processing systems that can run complex algorithms. A high-end FPGA combined with a high-performance DSP is currently a common way to build such systems, but the performance of a single DSP core has been pushed close to its limit. Multi-core DSPs are therefore the new direction for running complex parallel algorithms, and boot loading of a multi-core DSP is one of the difficulties.

The TMS320C6678 (C6678) launched by TI is a high-performance DSP with 8 cores, each operating at 1 GHz.

The supported boot modes include SPI, I2C, EMAC, SRIO, and parallel-port Emif16 NOR-FLASH. Of these, Emif16 NOR-FLASH boot is relatively simple and self-contained, needing no host computer, so most standalone DSP systems use it.

The scattered loading information about the C6472 and C6678 that can be found on the Internet is all based on third-party conversion tools and is too general. The following discusses power-on loading of the C6678 from parallel-port Emif16 NOR-FLASH in detail.

1 Power-on loading process of C6678

Power-on loading (power-on bootstrap) refers to a small program that runs after the DSP is reset and before the normal user program, much like the BIOS of a PC. Multi-core loading differs considerably from single-core loading: it is responsible not only for loading the main core but also for loading and activating the other cores. Programs can execute directly from the Emif16 NOR-FLASH of the C6678 (execute in place, XIP), which is not the case for the C641x series DSPs. The power-on loading process is shown in Figure 1.

After power-on reset, the DSP first runs the program stored in the on-chip ROM at address 0x20b00000, called the on-chip loader. The on-chip loader reads the state of the DSP hardware pins to determine the boot mode chosen by the user and jumps to the secondary loader for that mode. As shown in Figure 1, in Emif16 NOR-FLASH mode the PC is pointed directly at the first NOR-FLASH address, 0x70000000, once the on-chip loader has run, and execution of the secondary loader in FLASH begins. The secondary loader occupies the range 0x70000000~0x70000400 at the start of FLASH. The boot table data of the application (that is, the application data burned into FLASH) is stored from 0x70000400 onward. The job of the secondary loader is to move the boot table data of Core0~Core7 from FLASH to the corresponding address ranges of the DSP. When the move is complete, the secondary loader jumps to the program entry address _c_int00 of Core0, and the Core0 application starts to run. Code that activates and starts the other cores is added at the beginning of the Core0 application (this is what distinguishes multi-core loading from single-core loading), which completes the multi-core loading process.

In fact, if the application is very small and has no demanding speed requirement, steps 2, 3 and 4 in Figure 1 can be omitted: burn the raw code of the application to FLASH starting at 0x70000000, power on, and it runs normally (this is not possible on the C641x). However, running from FLASH in this way gives up much of the performance of the DSP, and most multi-core projects are SYS/BIOS projects that occupy a large amount of memory, so a normal boot must use the secondary loading process shown in Figure 1.

As Figure 1 shows, a complete multi-core loading process requires the developer to write the secondary loader, generate the image file for FLASH, write the FLASH burner, and write the code with which the main core triggers each auxiliary core (the application being loaded is outside this scope).

2 Composition and Generation of Multi-core Image Files

The image file is the complete data file that the user burns to the external FLASH. It is a composite file made up of the code of the secondary loader (at the front of the file) and the boot table (Boot Table) data of the application (at the back). The secondary loader is the same for single-core and multi-core images; the difference lies in the boot table data that follows it.

The boot table is a data packet in which all the code and data of the application are stored in segments according to the on-chip addresses they occupy. The first 4 bytes of the packet hold the application entry address _c_int00, followed by a number of data segments. Each segment starts with 4 bytes giving the byte length of its data, Byte_count_x (x is the segment number), then 4 bytes Address_x giving the on-chip storage address of the segment, then Byte_count_x bytes of data, Data_x. After the last data segment, 4 zero bytes mark the end of the boot table. The format is shown in Table 1. The number of data bytes in a segment need not be a multiple of 4; the data area of each segment is padded with zeros at the end up to a multiple of 4 bytes, so the size of the whole boot table file is always a multiple of 4 bytes.
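The layout described above can be illustrated with a short host-side sketch. It is only an illustration under stated assumptions: the function names `build_boot_table` and `parse_boot_table` are hypothetical, and the 32-bit fields are assumed to be little-endian (adjust `order` if your hex6x output differs).

```python
import struct

def build_boot_table(entry, sections, order="<"):
    """Pack an entry address and (address, data) sections into boot table bytes.

    `order` is a struct byte-order prefix; little-endian is an assumption.
    """
    out = bytearray(struct.pack(order + "I", entry))
    for address, data in sections:
        out += struct.pack(order + "II", len(data), address)  # Byte_count_x, Address_x
        out += data                                           # Data_x
        out += b"\x00" * (-len(data) % 4)                     # zero-pad to a 4-byte boundary
    out += struct.pack(order + "I", 0)                        # 4 zero bytes end the table
    return bytes(out)

def parse_boot_table(blob, order="<"):
    """Return (entry, [(address, data), ...]) decoded from boot table bytes."""
    (entry,) = struct.unpack_from(order + "I", blob, 0)
    pos, sections = 4, []
    while True:
        (count,) = struct.unpack_from(order + "I", blob, pos)
        pos += 4
        if count == 0:                  # end mark reached
            break
        (address,) = struct.unpack_from(order + "I", blob, pos)
        pos += 4
        sections.append((address, blob[pos:pos + count]))
        pos += (count + 3) & ~3         # skip the data plus its padding
    return entry, sections

# Illustrative values only: one 5-byte segment (padded to 8) and one 8-byte segment.
table = build_boot_table(
    0x00800000,
    [(0x00800000, b"\x01\x02\x03\x04\x05"), (0x0C000000, b"\xAA" * 8)],
)
entry, sections = parse_boot_table(table)
```

Round-tripping a table through both functions is a quick way to check that a file matches the format before it is merged or burned.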

The boot table data is easy to generate. The .out file produced by the application build is converted with hex6x.exe, a tool bundled with CCS, using suitable options; the output is the boot table file. The output can be binary or text; this study uses binary. The command is as follows (app is the application name, and app.out is the linked output file generated by CCS):

hex6x -boot -b -e _c_int00 -order L -memwidth=32 -romwidth=32 -o app.bin app.out

app.bin is the generated binary boot table file. Prepending the binary code of the secondary loader to the boot table file yields the image file of the app application.
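A minimal sketch of this concatenation step follows. It assumes the layout given earlier (secondary loader at 0x70000000, boot table from 0x70000400), so the loader is padded to 0x400 bytes; the function name `build_image` and the 0xFF fill value (the erased state of NOR flash) are assumptions, not details from the original work.

```python
FLASH_BASE = 0x70000000        # first NOR-FLASH address
BOOT_TABLE_BASE = 0x70000400   # where the boot table data starts
LOADER_AREA = BOOT_TABLE_BASE - FLASH_BASE  # 0x400 bytes reserved for the loader

def build_image(loader, boot_table, fill=0xFF):
    """Return loader + padding + boot table as one flash image.

    `fill` pads the unused part of the loader area (assumed value).
    """
    if len(loader) > LOADER_AREA:
        raise ValueError("secondary loader does not fit in the loader area")
    if len(boot_table) % 4:
        raise ValueError("boot table size must be a multiple of 4 bytes")
    padding = bytes([fill]) * (LOADER_AREA - len(loader))
    return loader + padding + boot_table

# Placeholder inputs; in practice these would be read from the loader binary
# and from app.bin.
loader = b"\x00" * 100
boot_table = b"\x00\x00\x80\x00" + b"\x00" * 4   # entry address + end mark only
image = build_image(loader, boot_table)
```

With real files, the two inputs would simply be read with `open(path, "rb").read()` and the result written back out in binary mode.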

The multi-core image file is formed by merging the secondary loader with the boot tables of the applications of multiple cores. Each core corresponds to an independent project, so CCS generates one .out file per core, and hex6x.exe generates a boot table file for each. The merge then proceeds as follows. Remove the last 4 zero bytes from the boot table file of Core0. From the boot table file of each auxiliary core, remove both the entry address _c_int00 at the head and the 4 zero bytes at the end, and append what remains to the truncated Core0 file. Next, treat the _c_int00 of each core as a 4-byte data segment and append it to the composite file. The on-chip storage address of each _c_int00 is the Boot Magic Address of the corresponding core: 0x1187fffc for Core1, 0x1287fffc for Core2, ..., and 0x1787fffc for Core7. Once all boot table data segments are in place, append 4 zero bytes as the end mark; the merged boot table file is shown in Table 2. As before, prepending the code of the secondary loader produces the multi-core image file. Everything between the single-core boot table files generated by hex6x and the final composite image is plain file manipulation, which can be done with ordinary C tools or even with tools such as Matlab.

Compared with Table 1, Table 2 only adds the data segments of all auxiliary cores and the special _c_int00 data segment of each core. The header and the end mark are the same, so the secondary loader can move the data using one uniform Boot Table format. Note that when the boot table data segments of an auxiliary core's .out file are mapped by hex6x.exe into the L2 range (0x00800000~0x0087FFFF), they overlap with the addresses of Core0. When generating the composite boot table, 0x10000000 + n*0x1000000 (n is the auxiliary core number) must therefore be added to the L2 addresses of each core. For example, the Core1 address 0x00825000 is mapped to 0x11825000; the same address is mapped to 0x12825000 for Core2 and to 0x17825000 for Core7.
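The merge and the L2 address translation can be sketched together as below. This is a hedged illustration, not the tool used in the original work: all function names are hypothetical, fields are assumed little-endian, magic-address segments are emitted for the auxiliary cores only (Core0's entry is already the table header), and the entry values are stored unmodified.

```python
import struct

L2_BASE, L2_END = 0x00800000, 0x0087FFFF

def to_global(address, core):
    """Map a core-local L2 address to the global address of core n."""
    if L2_BASE <= address <= L2_END:
        return address + 0x10000000 + core * 0x01000000
    return address                      # non-L2 addresses are left unchanged

def magic_address(core):
    """Boot Magic Address of core n, e.g. 0x1187FFFC for Core1."""
    return to_global(0x0087FFFC, core)

def pack_section(address, data):
    """Byte count, address, data, then zero padding to a multiple of 4."""
    return struct.pack("<II", len(data), address) + data + b"\x00" * (-len(data) % 4)

def split_table(blob):
    """Return (entry, [(address, data), ...]) from one boot table."""
    (entry,) = struct.unpack_from("<I", blob, 0)
    pos, sections = 4, []
    while True:
        (count,) = struct.unpack_from("<I", blob, pos)
        if count == 0:
            return entry, sections
        (address,) = struct.unpack_from("<I", blob, pos + 4)
        sections.append((address, blob[pos + 8:pos + 8 + count]))
        pos += 8 + ((count + 3) & ~3)

def merge_tables(tables):
    """tables[n] holds the boot table bytes of core n; returns the merged table."""
    entry0, sections0 = split_table(tables[0])
    out = bytearray(struct.pack("<I", entry0))       # header: Core0 entry
    for address, data in sections0:                  # Core0 segments, unchanged
        out += pack_section(address, data)
    entries = []
    for core in range(1, len(tables)):               # auxiliary core segments
        entry, sections = split_table(tables[core])
        for address, data in sections:
            out += pack_section(to_global(address, core), data)
        entries.append((core, entry))
    for core, entry in entries:                      # entry points as 4-byte segments
        out += pack_section(magic_address(core), struct.pack("<I", entry))
    out += struct.pack("<I", 0)                      # single end mark
    return bytes(out)

def make_table(entry, sections):
    """Helper for the demo: build a single-core table."""
    body = b"".join(pack_section(a, d) for a, d in sections)
    return struct.pack("<I", entry) + body + struct.pack("<I", 0)

# Illustrative two-core example.
core0 = make_table(0x00800000, [(0x00800000, b"\x11" * 4)])
core1 = make_table(0x00825000, [(0x00825000, b"\x22" * 6)])
merged = merge_tables([core0, core1])
merged_entry, merged_sections = split_table(merged)
```

In the example, the Core1 segment at local address 0x00825000 ends up at 0x11825000, and its entry point is stored as a 4-byte segment at 0x1187FFFC, matching the addresses given in the text.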

3 Secondary Loader Program and FLASH Burning Program

The secondary loader is a small program whose function is to move the boot table data in FLASH, such as the data saved in the format of Table 2 starting from 0x70000400 in Figure 1, to the RAM of the DSP. The loader is relatively simple, usually a short piece of assembly code.

Note that DDR has not been initialized when the secondary loader runs, so the secondary loader cannot load data into DDR; DDR serves only as data storage. If data really must be placed in DDR, the only option is to store it in a designated region of FLASH and read it into DDR after Core0 has started and initialized DDR.
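The copy loop performed by the secondary loader can be modeled on a host machine as follows. This is a Python model of the behavior described above, not the assembly loader itself; the function name `run_secondary_loader`, the little-endian field order, and the use of 0x80000000 as the start of DDR space are assumptions made for the sketch.

```python
import struct

DDR_BASE = 0x80000000   # assumed start of the DDR address space

def run_secondary_loader(flash, table_offset=0x400):
    """Walk the boot table in `flash` and copy each segment into a simulated memory.

    Returns (entry, memory), where memory maps byte address -> byte value and
    entry is the address the real loader would jump to.
    """
    memory = {}
    (entry,) = struct.unpack_from("<I", flash, table_offset)
    pos = table_offset + 4
    while True:
        (count,) = struct.unpack_from("<I", flash, pos)
        if count == 0:                       # end mark: all segments copied
            return entry, memory
        (address,) = struct.unpack_from("<I", flash, pos + 4)
        if address >= DDR_BASE:
            # DDR is not initialized at this stage, so such a segment cannot be loaded.
            raise ValueError("segment targets DDR at 0x%08X" % address)
        for i in range(count):
            memory[address + i] = flash[pos + 8 + i]
        pos += 8 + ((count + 3) & ~3)        # next segment starts on a 4-byte boundary

# Illustrative flash contents: 0x400 bytes of loader area, then a tiny boot table
# holding one 4-byte segment aimed at the Core1 Boot Magic Address.
table = (
    struct.pack("<I", 0x00800000)            # Core0 entry
    + struct.pack("<II", 4, 0x1187FFFC)      # byte count, target address
    + struct.pack("<I", 0x00825000)          # data: Core1 entry point
    + struct.pack("<I", 0)                   # end mark
)
flash = b"\xFF" * 0x400 + table
entry, memory = run_secondary_loader(flash)
```

Running a merged image through a model like this before burning it is one way to confirm that every segment lands at the intended address.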

The main task of the FLASH burning program is to write the multi-core composite image file into the external Emif16 NOR-FLASH memory.

Because most of TI's EMIF parallel-port loading schemes are open, developers can generate and burn image files according to their own ideas and formats, so TI does not provide a burner. In fact, once the composite boot table file has been generated, writing the burner is easy. The burner is generally a CCS project that outputs the legacy COFF format. As the loading process in Figure 1 shows, the image file to be burned consists of the code of the secondary loader and the boot table file of Table 2. The secondary loader can be placed before the main() function at the start of the burner, or at the same position in the Core0 application. This study uses the former and maps the secondary loader code to the boot_load section specified in the memory map of the burner project. The programming flow of the burner is shown in Figure 2.

4 Triggering the Auxiliary Core

For multi-core loading, if the process in Figure 1 only enters the _c_int00 address of Core0 and the other cores are not activated, loading still fails. Triggering an auxiliary core requires two conditions: first, the entry address _c_int00 of each core's project is written to the Boot Magic Address of that core; second, the core-to-core interrupt trigger register IPCx (1

Once an auxiliary core has been triggered, its application must write the value 0xbabeface to that core's Boot Magic Address, replacing the _c_int00 stored there.

5 Conclusion

Multi-core DSP loading is a relatively complex but important process, and it is one of the difficulties in applying multi-core technology. Any multi-core DSP developer who wants to reach a working application must master power-on loading. The application project of each core can be a SYS/BIOS project that outputs ELF format, or a non-SYS/BIOS project. The multi-core Emif16 NOR-FLASH loading method described above has been used successfully in a self-developed C6678 image signal processing system.

Keywords: TMS320C6678
