STM32MP157A-DK1 Review (3) Cortex-A7 lights up

cruelfox


This post was last edited by cruelfox on 2020-4-19 08:46

I can debug the STM32MP157 through its SWD port with the onboard ST-Link, but I ran into unexpected problems during my experiments. In short, debugging under OpenOCD is not as smooth as it is with STM32 MCUs. I am not familiar with the debugging mechanisms behind OpenOCD, and this is my first time working with a Cortex-A7 core, so I hit a few obstacles.

For example, halting the CPU can take a noticeable time, after which OpenOCD reports an error:
> halt
stm32mp15x.cpu0 rev 5, partnum c07, arch f, variant 0, implementor 41
stm32mp15x.cpu0 cluster 0 core 0 multi core
Timeout waiting for halt

In other cases the CPU ends up in a state where the step command does not single-step at all (the PC register does not change).

Worse, the reset init command that I use to reset and initialize Cortex-M targets no longer works properly:
> reset init
ap 0 selected, csw 0x10006000
mem2array: Read @ 0x50001000, w=4, cnt=1, failed
mem2array: Read @ 0x50000208, w=4, cnt=1, failed
SRST line asserted
SRST line released
stlink_connect(connect)
SWD DPIDR 0x6ba02477
timed out while waiting for target halted
stm32mp15x.cpu0 rev 5, partnum c07, arch f, variant 0, implementor 41
target halted in Thumb state due to debug-request, current mode: Supervisor
cpsr: 0x800001f3 pc: 0x00010b6a
MMU: enabled, D-Cache: disabled, I-Cache: enabled
ap 0 selected, csw 0x10006000
pc (/32): 0x00010B6A
stm32mp15x.cpu1: ran after reset and before halt ...
Timeout waiting for halt
embedded:startup.tcl:24: Error :
in procedure 'ocd_process_reset'
in procedure 'ocd_process_reset_inner' called at file "embedded:startup.tcl", line 261
in procedure 'arp_reset_default_handler'
in procedure 'arp_reset_plan_srst_dbg_gated' called at file "embedded:startup.tcl", line 387
in procedure 'stm32mp15x.cpu1' called at file "embedded:startup.tcl", line 339
in procedure 'ocd_bouncer'
at file "embedded:startup.tcl", line 24

Deferring arp_examine of stm32mp15x.cpu2
Use arp_examine command to examine it manually!
Deferring arp_examine of stm32mp15x.ap2
Use arp_examine command to examine it manually!
timed out while waiting for target halted
TARGET: stm32mp15x.cpu1 - Not halted

If reset init could stop the CPU just before it executes its first instruction, I could locate the program entry point and trace execution from there; I often do this on Cortex-M parts. With that trick unavailable, I don't know which code is running when the debugger connects. If Linux is running, it could be some user program or the kernel. During U-Boot, it is most likely U-Boot code. If the board is powered on without a TF card, it should be the ROM bootloader. These cases can at least be inferred.

What I still cannot explain is the halt timeout.

According to the memory map in the reference manual, the ROM bootloader sits at the bottom of the 4 GB address space (address 0x0), several SRAMs start at 0x10000000, and SYSRAM, the most important RAM for the A7 core, is at 0x2FFC0000. Peripheral registers start at 0x40000000, much as on STM32 MCUs.

However, while the ROM code was running, OpenOCD's memory display commands failed with errors:

> mdb 0 16
data abort at 0x00000000, dfsr = 0x00000008

> mdb 0x10000000 16
data abort at 0x10000000, dfsr = 0x00000005

> mdb 0x2ffc0000 16
data abort at 0x2ffc0000, dfsr = 0x00000008

> mdb 0x30000000 16
0x30000000: 08 00 00 30 68 02 00 30 00 00 00 00 00 00 00 00

> mdw 0x40000000 16
data abort at 0x40000000, dfsr = 0x00000007

Of the addresses I tried, only the start of the RAM alias region at 0x30000000 could be read. This is puzzling: the program is running from ROM, so why can't the ROM be read? Checking the PC register confirms that execution is in the ROM area:

> reg pc
pc (/32): 0x0000A396
Reading memory at that address, however, works:
> mdb 0xA396 16
0x0000a396: f8 49 c0 1b 88 42 07 d9 07 f0 c9 fb 26 ea 05 05
> arm disassemble 0xA396 4 thumb
0x0000a396 0x49f8 LDR r1, [pc, #0x3e0] ; 0x0000a778
0x0000a398 0x1bc0 SUBS r0, r0, r7
0x0000a39a 0x4288 CMP r0, r1
0x0000a39c 0xd907 BLS 0x0000a3ae
So some ROM addresses appear to be masked, either because a protection mechanism is active or because of how the MMU (memory management unit) is configured. Since reset init cannot stop the CPU in its just-reset state, I cannot see how the ROM code has set things up.
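The dfsr values that OpenOCD printed for the failed reads can be decoded, which helps separate the two explanations. Below is a minimal sketch, assuming the core uses the ARMv7-A short-descriptor translation format (TTBCR.EAE = 0); the helper names are made up for illustration and are not OpenOCD or CMSIS functions. Decoded this way, 0x5 (at 0x10000000) is a first-level translation fault and 0x7 (at 0x40000000) a second-level translation fault, i.e. the MMU has no valid mapping there, while 0x8 (at 0x0 and 0x2FFC0000) is a synchronous external abort, meaning the memory system rejected the access itself. That last case would fit a protection mechanism better than the MMU.

```c
#include <stdint.h>

/* Fault status of an ARMv7-A short-descriptor DFSR (assumes TTBCR.EAE == 0).
 * The 5-bit FS field is split: FS[4] is DFSR bit 10, FS[3:0] are bits 3:0. */
static unsigned dfsr_fs(uint32_t dfsr)
{
    return ((dfsr >> 6) & 0x10u) | (dfsr & 0x0Fu);
}

/* Human-readable meaning of the common FS encodings; WnR (bit 11) is ignored. */
static const char *dfsr_describe(uint32_t dfsr)
{
    switch (dfsr_fs(dfsr)) {
    case 0x01: return "alignment fault";
    case 0x03: return "access flag fault (section)";
    case 0x05: return "translation fault (section, first level)";
    case 0x06: return "access flag fault (page)";
    case 0x07: return "translation fault (page, second level)";
    case 0x08: return "synchronous external abort (not on a table walk)";
    case 0x09: return "domain fault (section)";
    case 0x0B: return "domain fault (page)";
    case 0x0D: return "permission fault (section)";
    case 0x0F: return "permission fault (page)";
    default:   return "other (look up FS in the ARMv7-A DFSR table)";
    }
}
```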

Judging from the SP register, SYSRAM is in use, and at least part of it is readable:

> reg sp
sp (/32): 0x2FFC1BE0
> mdw 0x2ffc1000 4
0x2ffc1000: 260148df 0640bac1 077ff997 4e5e4a05
Next, a peripheral register: GPIOA, which the manual places at 0x50002000 to 0x500023FF:
> mdw 0x50002000 4
0x50002000: f7ffffbf 00002000 0c0000c0 00000000

So GPIOA is accessible, which means I can try writing a program that lights an LED and run it.
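The four words read at 0x50002000 are GPIOA's MODER, OTYPER, OSPEEDR and PUPDR registers, so the dump also shows what the ROM code left configured. MODER packs two bits per pin (00 input, 01 output, 10 alternate function, 11 analog). A small decoding sketch, with illustrative function names: for 0xF7FFFFBF it reports PA13 as an output, PA3 as alternate function and every other pin as analog, and OTYPER = 0x00002000 additionally marks PA13 as open-drain.

```c
#include <stdint.h>

/* Two MODER bits per pin: 00 input, 01 output, 10 alternate function, 11 analog. */
static unsigned moder_pin_mode(uint32_t moder, unsigned pin)
{
    return (moder >> (pin * 2u)) & 3u;
}

/* Name of a 2-bit MODER field value. */
static const char *moder_mode_name(unsigned mode)
{
    static const char *const names[4] = { "input", "output", "alternate", "analog" };
    return names[mode & 3u];
}
```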

In the official STM32Cube_FW_MP1_V1.2.0 package I found the register definition file stm32mp157axx_ca7.h, just as there are for STM32 MCUs, and its content is very similar to theirs!
So I can start experimenting through direct register access. The first step is lighting an LED, which only requires GPIO operations, because the ROM code has already done some of the initialization.

According to the schematic, the two user LEDs are on PA13 and PA14, which are also connected to buttons.

So the lighting program is written like this:

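#include "stm32mp157axx_ca7.h"

/* Ten NOPs; used ten times below to give the 100 NOPs of the inner loop body */
#define NOP10() do { __NOP(); __NOP(); __NOP(); __NOP(); __NOP(); \
                     __NOP(); __NOP(); __NOP(); __NOP(); __NOP(); } while (0)

void _start(void)
{
	int i;
	/* PA13 and PA14 as outputs (01); every other PA pin set to analog (11) */
	GPIOA->MODER = 1<<13*2|1<<14*2|~(3<<13*2|3<<14*2);
	for(;;)
	{
		/* toggle PA14 once per outer pass */
		if(GPIOA->ODR & 1<<14)
			GPIOA->BRR = 1<<14;
		else
			GPIOA->BSRR = 1<<14;
		for(i=0;i<10000;i++)
		{
			int n;
			/* toggle PA13 */
			if(GPIOA->ODR & 1<<13)
				GPIOA->BRR = 1<<13;
			else
				GPIOA->BSRR = 1<<13;
			/* 100 passes x 100 NOPs = 10,000 NOPs between PA13 toggles */
			for(n=0;n<100;n++)
			{
				NOP10(); NOP10(); NOP10(); NOP10(); NOP10();
				NOP10(); NOP10(); NOP10(); NOP10(); NOP10();
			}
		}
	}
}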

Since this is just a code fragment, it needs no startup code (there are no global variables to initialize), no standard library, and no main().
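The loop toggles a pin by reading ODR and then writing either BRR (clear) or BSRR (set). The following host-side model of those registers, following the usual STM32 GPIO semantics (BSRR bits 0-15 set pins, bits 16-31 reset them, set wins if both are given), can be compiled on a PC to check the bit arithmetic, including the MODER value that the first statement writes. The function names are hypothetical, and bsrr_toggle_value() shows a common one-write alternative rather than what the program above does.

```c
#include <stdint.h>

/* MODER value written by the LED program: PA13/PA14 = 01 (output),
 * every other pin = 11 (analog). Unsigned constants keep the shifts well defined. */
static uint32_t led_moder_value(void)
{
    const uint32_t out  = (1u << 13 * 2) | (1u << 14 * 2);
    const uint32_t mask = (3u << 13 * 2) | (3u << 14 * 2);
    return out | ~mask;
}

/* ODR after a BSRR write: low half sets pins, high half resets them; set wins. */
static uint32_t odr_after_bsrr(uint32_t odr, uint32_t bsrr)
{
    uint32_t reset = bsrr >> 16;
    uint32_t set   = bsrr & 0xFFFFu;
    return ((odr & ~reset) | set) & 0xFFFFu;
}

/* ODR after a BRR write: each 1 in the low half resets that pin. */
static uint32_t odr_after_brr(uint32_t odr, uint32_t brr)
{
    return odr & ~(brr & 0xFFFFu) & 0xFFFFu;
}

/* One-write toggle: reset the selected pins that are high, set those that are low. */
static uint32_t bsrr_toggle_value(uint32_t odr, uint32_t pins)
{
    return ((odr & pins) << 16) | (~odr & pins);
}
```

led_moder_value() evaluates to 0xD7FFFFFF, which is the ~0x28000000 that GCC loads with a single `mvn r2, #671088640` in the ARM disassembly.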

Next, compile it. Can the GCC toolchain I have been using for STM32 development handle it?

E:\stm32mp157\a7>arm-none-eabi-gcc -mcpu=cortex-a7 -Os test.c -c -std=gnu99

It compiles (-std=gnu99 is needed because the header files require it). Here is the compiled code:

E:\stm32mp157\a7>arm-none-eabi-objdump -d test.o

test.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
0: e59f31e0 ldr r3, [pc, #480] ; 1e8 <_start+0x1e8>
4: e3e0230a mvn r2, #671088640 ; 0x28000000
8: e5832000 str r2, [r3]
c: e59f31d4 ldr r3, [pc, #468] ; 1e8 <_start+0x1e8>
10: e5932014 ldr r2, [r3, #20]
14: e3120901 tst r2, #16384 ; 0x4000
18: e3a02901 mov r2, #16384 ; 0x4000
1c: 15832028 strne r2, [r3, #40] ; 0x28
20: 05832018 streq r2, [r3, #24]
24: e3023710 movw r3, #10000 ; 0x2710
28: e59f21b8 ldr r2, [pc, #440] ; 1e8 <_start+0x1e8>
2c: e5921014 ldr r1, [r2, #20]
30: e3110a02 tst r1, #8192 ; 0x2000
34: e3a01a02 mov r1, #8192 ; 0x2000
38: 15821028 strne r1, [r2, #40] ; 0x28
3c: 05821018 streq r1, [r2, #24]
40: e3a02064 mov r2, #100 ; 0x64

The listing looks reasonable, so the next step is to put it into the STM32MP157's SRAM and run it. Apart from the fixed GPIOA register address, this code uses no absolute memory addresses, so it can run from any address. First extract the raw binary:

E:\stm32mp157\a7>arm-none-eabi-objcopy -Obinary test.o test.bin
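The GPIOA base address is not encoded in the instructions: it sits in a literal pool after the code and is fetched with PC-relative LDRs, which is why the binary still works after being copied elsewhere. A sketch of that address calculation (helper names hypothetical; only positive offsets, as in these listings): in ARM state the PC reads as the instruction address plus 8, while Thumb LDR (literal) uses the instruction address plus 4, rounded down to a multiple of 4.

```c
#include <stdint.h>

/* Literal address for an ARM-state LDR (literal) with a positive offset:
 * PC reads as the instruction address + 8. */
static uint32_t ldr_literal_arm(uint32_t insn_addr, uint32_t imm)
{
    return insn_addr + 8u + imm;
}

/* Thumb-state LDR (literal): base is Align(instruction address + 4, 4). */
static uint32_t ldr_literal_thumb(uint32_t insn_addr, uint32_t imm)
{
    return ((insn_addr + 4u) & ~3u) + imm;
}
```

All three ARM loads in the listing resolve to offset 0x1e8, matching objdump's annotation, and the Thumb loads in the -mthumb listing later in this post resolve to 0x104. Loaded at 0x2FFC4000, the literal simply moves to 0x2FFC41E8 together with the code.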

Then use OpenOCD to write it to address 0x2FFC4000 (after first checking that this address is accessible):
> load_image e:/stm32mp157/a7/test.bin 0x2FFC4000
492 bytes written at address 0x2ffc4000
downloaded 492 bytes in 0.062500s (1.500 KiB/s)
With the program in SRAM, resume CPU execution starting from 0x2FFC4000:
> resume 0x2FFC4000
stm32mp15x.cpu0 rev 5, partnum c07, arch f, variant 0, implementor 41
Timeout waiting for halt
Polling target stm32mp15x.cpu0 failed, trying to reexamine
stm32mp15x.cpu0: hardware has 6 breakpoints, 4 watchpoints

Something went wrong: the PC register stayed at 0x2FFC4000.

> reg pc
pc (/32): 0x2FFC4000

Checking the CPSR, I found that the T (Thumb) bit was set to 1, so I cleared it, because my code was compiled in ARM mode:

> reg cpsr
cpsr (/32): 0x800001F3

> reg cpsr 0x1d3
cpsr (/32): 0x000001D3

> resume

Then the LED started blinking. It worked!
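The CPSR values in the session above decode as follows. This sketch uses the ARMv7-A bit positions; the macro and function names are made up for illustration. 0x800001F3 is Supervisor mode (0x13) with asynchronous aborts, IRQ and FIQ masked and T set. The written value 0x1D3 differs only in T and in the N flag (bit 31); N is a condition flag and should not matter here, since the program sets the flags with tst before its first conditional instruction. For the Thumb build later in this post, T would have to be set instead.

```c
#include <stdint.h>

/* Selected ARMv7-A CPSR fields. */
#define CPSR_N    (1u << 31)  /* negative condition flag */
#define CPSR_A    (1u << 8)   /* asynchronous abort mask */
#define CPSR_I    (1u << 7)   /* IRQ mask */
#define CPSR_F    (1u << 6)   /* FIQ mask */
#define CPSR_T    (1u << 5)   /* 1 = Thumb state, 0 = ARM state */
#define CPSR_MODE 0x1Fu       /* mode field; 0x13 = Supervisor */

static int cpsr_is_thumb(uint32_t cpsr) { return (cpsr & CPSR_T) != 0; }
static unsigned cpsr_mode(uint32_t cpsr) { return cpsr & CPSR_MODE; }

/* Select ARM (thumb == 0) or Thumb (thumb != 0) state, leaving other bits alone. */
static uint32_t cpsr_with_thumb(uint32_t cpsr, int thumb)
{
    return thumb ? (cpsr | CPSR_T) : (cpsr & ~CPSR_T);
}
```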

Here is the waveform on LED LD6 (red).

In the code, consecutive changes of this pin (PA13) are separated by 10,000 NOP instructions, so each NOP takes roughly 25.1 µs / 10,000 = 2.51 ns. If each NOP executes in one clock cycle, that corresponds to a clock of about 400 MHz. This is only speculation and may not be correct.
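The same estimate as arithmetic, with its assumptions made explicit: one NOP per clock cycle, and the toggle overhead ignored. Each PA13 interval also contains a GPIO read and write plus 100 inner-loop iterations, so the real time per NOP is somewhat less than 2.51 ns, and under the one-NOP-per-cycle assumption the clock would be a little above 398 MHz. If the core can retire more than one NOP per cycle, the estimate no longer holds. The function names are hypothetical.

```c
/* Time per NOP from the measured interval between PA13 toggles.
 * Assumes the interval is spent entirely on NOPs (overhead ignored). */
static double ns_per_nop(double toggle_interval_us, double nops_per_toggle)
{
    return toggle_interval_us * 1000.0 / nops_per_toggle;
}

/* Clock frequency implied if each instruction takes exactly one cycle. */
static double implied_clock_mhz(double ns_per_instruction)
{
    return 1000.0 / ns_per_instruction;
}
```

With the measured 25.1 µs and 100 × 100 NOPs this gives 2.51 ns per NOP and about 398 MHz.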

This content was originally created by cruelfox, a member of the EEWORLD forum. Reprinting or commercial use requires the author's consent and must credit the source.

cruelfox

32-bit ARM processors have two instruction encodings: ARM and Thumb. In the experiment above I generated ARM code; the disassembly shows that every instruction is encoded as one 32-bit word. Cortex-M MCUs support only Thumb, whose instructions are built from 16-bit halfwords (with Thumb-2, each instruction is 16 or 32 bits long). This also means that the PC is 4-byte aligned in ARM state but only 2-byte aligned in Thumb state.
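Two practical consequences of the encoding difference can be checked mechanically. In Thumb-2, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction (like `f06f 5220`, the `mvn.w` in the -mthumb listing later in this post); any other halfword is a complete 16-bit instruction. The PC alignment rule follows from the unit size. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Length in bytes of the Thumb instruction starting with this halfword:
 * top five bits 0b11101, 0b11110 or 0b11111 mean a 32-bit instruction. */
static unsigned thumb_insn_bytes(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return (top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu) ? 4u : 2u;
}

/* PC alignment required in each state: 4 bytes in ARM, 2 bytes in Thumb. */
static int pc_is_aligned(uint32_t pc, int thumb)
{
    return (pc & (thumb ? 1u : 3u)) == 0;
}
```

For example, the PC value 0x0000A396 seen while the ROM code was running is halfword-aligned but not word-aligned, which is consistent with disassembling that code as Thumb.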

Of course, the Cortex-A7 can also run Thumb code. For example, compiling the program above with GCC's -mthumb option produces Thumb code:

E:\stm32mp157\a7>arm-none-eabi-gcc -mcpu=cortex-a7 -Os test.c -c -std=gnu99 -mthumb

E:\stm32mp157\a7>arm-none-eabi-objdump -d test.o

test.o: file format elf32-littlearm

Disassembly of section .text:

00000000 <_start>:
0: 4b40 ldr r3, [pc, #256] ; (104 <_start+0x104>)
2: f06f 5220 mvn.w r2, #671088640 ; 0x28000000
6: 601a str r2, [r3, #0]
8: 4b3e ldr r3, [pc, #248] ; (104 <_start+0x104>)
a: 695a ldr r2, [r3, #20]
c: f412 4f80 tst.w r2, #16384 ; 0x4000
10: f44f 4280 mov.w r2, #16384 ; 0x4000
14: bf14 ite ne
16: 629a strne r2, [r3, #40] ; 0x28
18: 619a streq r2, [r3, #24]
1a: f242 7310 movw r3, #10000 ; 0x2710
1e: 4a39 ldr r2, [pc, #228] ; (104 <_start+0x104>)
20: 6951 ldr r1, [r2, #20]
22: f411 5f00 tst.w r1, #8192 ; 0x2000
26: f44f 5100 mov.w r1, #8192 ; 0x2000
2a: bf14 ite ne
2c: 6291 strne r1, [r2, #40] ; 0x28
2e: 6191 streq r1, [r2, #24]
30: 2264 movs r2, #100 ; 0x64
32: bf00 nop
34: bf00 nop
Thumb code is considerably smaller. It can be loaded into the STM32MP157's SRAM and run in the same way; if the CPU is in ARM state, switch it to Thumb state before running.

With Thumb mode, the gap between the Cortex-A7 and the Cortex-M series narrows further. So can it execute Cortex-M0/M3 instructions? I think there is probably no problem, since the Cortex-A7 supports a larger instruction set. For the program above, the Thumb code generated with -mcpu=cortex-a7 and with -mcpu=cortex-m3 is exactly the same.