[STM32H7R/S] Evaluation ⑥ Porting a model to the development board with CUBE AI

不爱胡萝卜的仓鼠 · Published on 2024-10-22 23:55


This post was last edited by Hamster who doesn't like carrots on 2024-11-3 20:59

Before training my own model and running it on the development board, I want to port an existing model to the STM32 board first to gain some experience.

A trained model cannot run directly on an STM32. My rough understanding is that the STM32 is just a microcontroller, and neither its hardware nor its software stack can execute an AI model in its original form. A tool is needed to convert the model into C code that the microcontroller can compile and run. CUBE AI is the tool ST provides to developers for this.

1. Install CUBE AI

CUBE AI is a software pack for CubeMX, and installing it is simple. Open CubeMX and click "Help" -> "Manage embedded software packages".

Then find "STMicroelectronics" -> "X-CUBE-AI" in the window that pops up, select the version you need, and click "Install". I installed 9.0.0 a while ago. By the time I wrote this post the latest version was 9.1.0, but I was too lazy to update.

2. Use CUBE AI to port the model

2.1 Activate CUBE AI

Open our serial port project, find the "X-CUBE-AI" option on the left, and click it.

Then the pop-up interface is as follows

Select the CUBE AI version. I installed 9.0.0, so I chose 9.0.0. Then check Core and pick an Application to suit your needs. I chose Validation. The three Application options mean the following

Once it is enabled successfully, the page looks like this. The checkboxes are ticked automatically, so there is no need to click them manually.

2.2 Adding a network

Next, add a network (that is, an AI model). Since I don't have a trained model of my own yet, I will use the one from the project I downloaded from Cloud AI last time. This model actually comes from ST's Model Zoo; you can visit https://stm32ai.st.com/model-zoo/ to get the various models ST has prepared. The Model Zoo also lets you add samples and retrain.

Select the model type. I use Keras; TFLite and ONNX are also supported, so choose whichever matches your model. Then select the network file

There is no need to worry about the serial output: the serial port was already initialized in the earlier project, and it is matched automatically here.

With the model added, this step is done

Next, analyze the model. If you skip the analysis, CubeMX will object when you generate the project later.

I hit an error here and had to modify the Windows registry (I won't go into how; a quick search on Baidu will turn it up)

After modification, analyze again and you can see the progress bar moving.

The analysis is completed as shown below



Analyzing model
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe analyze --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171808875528810010910823047711904506 --output C:/Users/Administrator/.stm32cubemx/network_output
STEdgeAI Core v9.0.0-19802
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506\network_c_info.json

Exec/report summary (analyze)
----------------------------------------------------------------------------------------------------------------------------------------
model file      : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type            : keras
c_name          : network
compression     : none
options         : allocate-inputs, allocate-outputs
optimization    : balanced
target/series   : stm32h7
workspace dir   : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506
output dir      : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt       : float
model_name      : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash      : 0x1e108c42827f4c62598744246d259703
params #        : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1       : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1      : 'dense_1', f32(1x8), 32 Bytes, activations
macc            : 8,520
weights (ro)    : 11,008 B (10.75 KiB) (1 segment)
activations (rw): 1,024 B (1024 B) (1 segment) *
ram (total)     : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Model name - CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
------------------------------------------------------------------------------------------
m_id  layer (original)              oshape             param/size    macc  connected to
------------------------------------------------------------------------------------------
0     input_1 (InputLayer)          [b:1,h:8,w:8,c:2]
------------------------------------------------------------------------------------------
1     conv2d (Conv2D)               [b:1,h:6,w:6,c:8]  152/608      5,192  input_1
------------------------------------------------------------------------------------------
2     activation (Activation)       [b:1,h:6,w:6,c:8]                 288  conv2d
------------------------------------------------------------------------------------------
3     max_pooling2d (MaxPooling2D)  [b:1,h:3,w:3,c:8]                 288  activation
------------------------------------------------------------------------------------------
5     flatten (Flatten)             [b:1,c:72]                             max_pooling2d
------------------------------------------------------------------------------------------
6     dense_dense (Dense)           [b:1,c:32]         2,336/9,344  2,336  flatten
      dense (Dense)                 [b:1,c:32]                         32  dense_dense
------------------------------------------------------------------------------------------
7     dense_1_dense (Dense)         [b:1,c:8]          264/1,056      264  dense
      dense_1 (Dense)               [b:1,c:8]                         120  dense_1_dense
------------------------------------------------------------------------------------------
model: macc=8,520 weights=11,008 activations=-- io=--
Number of operations per c-layer
----------------------------------------------------------
c_id  m_id  name (type)               #op  type
----------------------------------------------------------
0     3     conv2d (Conv2D)         5,768  smul_f32_f32
1     6     dense_dense (Dense)     2,336  smul_f32_f32
2     6     dense (Nonlinearity)       32  op_f32_f32
3     7     dense_1_dense (Dense)     264  smul_f32_f32
4     7     dense_1 (Nonlinearity)    120  op_f32_f32
----------------------------------------------------------
total                               8,520
Number of operation types
----------------------------------
operation type      #      %
----------------------------------
smul_f32_f32    8,368  98.2%
op_f32_f32        152   1.8%
Complexity report (model)
-------------------------------------------------------------------------------
m_id  name           c_macc                    c_rom                     c_id
-------------------------------------------------------------------------------
3     max_pooling2d  ||||||||||||||||  67.7%   |                  5.5%   [0]
6     dense_dense    |||||||           27.8%   ||||||||||||||||  84.9%   [1, 2]
7     dense_1_dense  |                  4.5%   ||                 9.6%   [3, 4]
-------------------------------------------------------------------------------
macc=8,520 weights=11,008 act=1,024 ram_io=0
Requested memory size per segment ("stm32h7" series)
-----------------------------------------------------------
module                         text  rodata   data    bss
-----------------------------------------------------------
NetworkRuntime900_CM7_GCC.a  10,220       0      0      0
network.o                       584      40  1,796    168
network_data.o                   52      16     88      0
lib (toolchain)*                318     328      0      0
-----------------------------------------------------------
RT total**                   11,174     384  1,884    168
-----------------------------------------------------------
weights                           0  11,008      0      0
activations                       0       0      0  1,024
io                                0       0      0      0
-----------------------------------------------------------
TOTAL                        11,174  11,392  1,884  1,192
-----------------------------------------------------------
*  toolchain objects (libm/libgcc*)
** RT-AI runtime objects (kernels+infrastructure)
Summary per type of memory device
--------------------------------------------
            FLASH      %     RAM      %
--------------------------------------------
RT total   13,442  55.0%   2,052  66.7%
--------------------------------------------
TOTAL      24,450          3,076
--------------------------------------------
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_analyze_report.txt
elapsed time (analyze): 7.829s
Model file: CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
Total Flash: 24450 B (23.88 KiB)
    Weights: 11008 B (10.75 KiB)
    Library: 13442 B (13.13 KiB)
Total Ram: 3076 B (3.00 KiB)
    Activations: 1024 B
    Library: 2052 B (2.00 KiB)
    Input: 512 B (included in Activations)
    Output: 32 B (included in Activations)
Done
Analyze complete on AI model
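
The "params #: 2,752 items (10.75 KiB)" line in the report can be checked by hand. Below is a minimal Python sketch of that arithmetic. The layer shapes are read from the report; the 3x3 convolution kernel is my own inference from the 8x8 -> 6x6 output shrink and the 152-parameter count, not something the log states directly, and the helper names are made up for illustration.

```python
# Re-derive the parameter count and weight size shown in the analyze report.
# Assumption: conv2d uses a 3x3 kernel (inferred, not printed in the log).

def conv2d_params(kernel_h, kernel_w, in_ch, out_ch):
    """Kernel weights plus one bias per output channel."""
    return kernel_h * kernel_w * in_ch * out_ch + out_ch

def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features

layers = {
    "conv2d": conv2d_params(3, 3, 2, 8),   # input has 2 channels, 8 filters
    "dense": dense_params(3 * 3 * 8, 32),  # flatten gives 72 features
    "dense_1": dense_params(32, 8),        # 8 hand-posture classes
}

total_params = sum(layers.values())
weight_bytes = total_params * 4            # float32 = 4 bytes per parameter
weight_kib = weight_bytes / 1024

for name, count in layers.items():
    print(f"{name:8s} {count:5d} params {count * 4:6d} B")
print(f"total    {total_params:5d} params {weight_bytes:6d} B ({weight_kib:.2f} KiB)")
```

The per-layer results match the param/size column of the report (152/608, 2,336/9,344 and 264/1,056), and the total matches the 11,008 B reported for the weights.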

We can also run a simulated validation on the PC here

After the PC validation completes, the log is as follows



Starting AI validation on desktop with random data...

C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171971186255410016731057912494845799 --output C:/Users/Administrator/.stm32cubemx/network_output
STEdgeAI Core v9.0.0-19802
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\network_c_info.json
Copying the AI runtime files to the user workspace: C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace

Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file      : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type            : keras
c_name          : network
compression     : none
options         : allocate-inputs, allocate-outputs
optimization    : balanced
target/series   : stm32h7
workspace dir   : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799
output dir      : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt       : float
model_name      : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash      : 0x1e108c42827f4c62598744246d259703
params #        : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1       : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1      : 'dense_1', f32(1x8), 32 Bytes, activations
macc            : 8,520
weights (ro)    : 11,008 B (10.75 KiB) (1 segment)
activations (rw): 1,024 B (1024 B) (1 segment) *
ram (total)     : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=HOST)
X86 shared lib (C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace\lib\libai_network.dll) ['network']
Summary 'network' - ['network']
------------------------------------------------------------------------------------------
inputs/ouputs    : 1/1
input_1          : f32[1,8,8,2], 512 Bytes, in activations buffer
output_1         : f32[1,1,1,8], 32 Bytes, in activations buffer
n_nodes          : 5
compile_datetime : Nov 3 2024 20:50:43
activations      : 1024
weights          : 11008
macc             : 8520
------------------------------------------------------------------------------------------
tools            : Legacy ST.AI 9.0.0
capabilities     : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device           : AMD64, AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD, Windows
------------------------------------------------------------------------------------------
NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.
ST.AI Profiling results v1.2 - "network"
------------------------------------------------------------
nb sample(s) : 10
duration     : 0.015 ms by sample (0.008/0.069/0.018)
macc         : 8520
------------------------------------------------------------
Inference time per node
-------------------------------------------------------------------------------
c_id  m_id  type                dur (ms)      %   cumul  name
-------------------------------------------------------------------------------
0     3     Conv2dPool (0x109)     0.009  58.4%   58.4%  ai_node_0
1     6     Dense (0x104)          0.003  17.4%   75.8%  ai_node_1
2     6     NL (0x107)             0.001   8.1%   83.9%  ai_node_2
3     7     Dense (0x104)          0.000   2.7%   86.6%  ai_node_3
4     7     NL (0x107)             0.002  12.8%   99.3%  ai_node_4
-------------------------------------------------------------------------------
total                              0.015
-------------------------------------------------------------------------------
Statistic per tensor
-------------------------------------------------------------------------------
tensor  #   type[shape]:size     min    max    mean   std    name
-------------------------------------------------------------------------------
I.0     10  f32[1,8,8,2]:512     0.005  1.000  0.498  0.294  input_1
O.0     10  f32[1,1,1,8]:32      0.000  1.000  0.125  0.321  output_1
-------------------------------------------------------------------------------
Saving validation data...
output directory: C:\Users\Administrator\.stm32cubemx\network_output
creating C:\Users\Administrator\.stm32cubemx\network_output\network_val_io.npz
m_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
c_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
Computing the metrics...
Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - the output of the reference model is used as ground truth/reference value
       - 10 samples (8 items per sample)
acc=100.00%, rmse=0.000000063, mae=0.000000015, l2r=0.000000183, nse=1.000, cos=1.000
8 classes (10 samples)
------------------------------------------------
C0  10   .   .   .   .   .   .   .
C1   .   0   .   .   .   .   .   .
C2   .   .   0   .   .   .   .   .
C3   .   .   .   0   .   .   .   .
C4   .   .   .   .   0   .   .   .
C5   .   .   .   .   .   0   .   .
C6   .   .   .   .   .   .   0   .
C7   .   .   .   .   .   .   .   0
Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------
Output      acc      rmse       mae        l2r        mean        std        nse        cos        tensor
--------------------------------------------------------------------------------------------------------------------------------------
X-cross #1  100.00%  0.0000001  0.0000000  0.0000002  -0.0000000  0.0000001  1.0000000  1.0000000  dense_1, (8,), m_id=[7]
--------------------------------------------------------------------------------------------------------------------------------------
acc  : Classification accuracy (all classes)
rmse : Root Mean Squared Error
mae  : Mean Absolute Error
l2r  : L2 relative error
nse  : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
cos  : COsine Similarity, bigger is better, best=1, range=(0, 1]
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_validate_report.txt
elapsed time (validate): 7.011s
Validation ended
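
To get a feel for what the acc/rmse/mae/l2r/nse/cos columns in the cross accuracy report mean, here is a small pure-Python sketch that computes them for two sets of outputs. The formulas are the textbook definitions; I have not checked that ST's tool computes them exactly this way. The function name and the sample data are hypothetical and only for illustration.

```python
import math

def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def cross_metrics(ref_samples, c_samples):
    """Compare reference-model outputs with C-model outputs."""
    ref = [v for sample in ref_samples for v in sample]   # flatten
    out = [v for sample in c_samples for v in sample]
    n = len(ref)
    sq_err = sum((r - o) ** 2 for r, o in zip(ref, out))
    mean_ref = sum(ref) / n
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_out = math.sqrt(sum(o * o for o in out))
    hits = sum(argmax(r) == argmax(o) for r, o in zip(ref_samples, c_samples))
    return {
        "acc": hits / len(ref_samples),                    # same predicted class
        "rmse": math.sqrt(sq_err / n),
        "mae": sum(abs(r - o) for r, o in zip(ref, out)) / n,
        "l2r": math.sqrt(sq_err) / norm_ref,               # relative L2 error
        "nse": 1.0 - sq_err / sum((r - mean_ref) ** 2 for r in ref),
        "cos": sum(r * o for r, o in zip(ref, out)) / (norm_ref * norm_out),
    }

# Hypothetical data: two samples of a 3-class output, not taken from the log.
reference = [[0.0, 1.0, 0.0], [0.8, 0.1, 0.1]]
c_model = [[0.0, 0.9, 0.1], [0.8, 0.1, 0.1]]

metrics = cross_metrics(reference, c_model)
for name, value in metrics.items():
    print(f"{name:4s} = {value:.6f}")
```

In the log above the errors are around 1e-7 with nse and cos at 1.000, which is what you would expect when the C model and the Keras model differ only by float32 rounding.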

Next we can choose whether to compress the model and pick a trade-off between speed and RAM usage. This is similar to what we did earlier with Cloud AI, so I won't go into details. By tuning these settings and validating on the PC, we can get the expected results before generating the project, which saves a lot of time
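
When comparing settings, the numbers I watch are the Flash and RAM totals at the end of the analyze report. As a rough sketch, they can be re-derived from the per-segment table like this. The byte counts are copied from the report earlier in this post; the variable names are my own, and the grouping of segments into Flash and RAM is my reading of the report rather than an official formula.

```python
# Per-segment sizes in bytes, copied from the "RT total" row of the report.
runtime = {"text": 11_174, "rodata": 384, "data": 1_884, "bss": 168}
weights = 11_008      # read-only, counted as Flash
activations = 1_024   # read-write, counted as RAM (I/O buffers live inside it)

# Initialised data is counted on both sides: its initial values are stored
# in Flash and copied into RAM at startup.
runtime_flash = runtime["text"] + runtime["rodata"] + runtime["data"]
runtime_ram = runtime["data"] + runtime["bss"]

total_flash = runtime_flash + weights
total_ram = runtime_ram + activations

print(f"Flash: {total_flash} B ({total_flash / 1024:.2f} KiB), "
      f"runtime share {100 * runtime_flash / total_flash:.1f}%")
print(f"RAM  : {total_ram} B ({total_ram / 1024:.2f} KiB), "
      f"runtime share {100 * runtime_ram / total_ram:.1f}%")
```

For this tiny model the runtime library is larger than the network itself, so compressing the weights would only shave a small part off the total footprint.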

2.3 Other peripherals that must be enabled

The CPU's I-Cache, D-Cache, and ART all need to be enabled (I couldn't find ART here, so I ignored it for now; enable it if your chip has it)

Then enable CRC

2.4 Generate Project

When generating the project, set the minimum heap size to 0x2000 (8 KiB), and then generate the project

There should be no warnings when generating the project. I got one only because I had not analyzed the model: it warned me that forcing generation might cause problems.

2.5 Minor changes

My project is a bit unusual, so I have to make some extra changes. Normal projects do not need them.

First, the ld file needs to be changed back (see the previous article for the specific steps)

Second, a syscall.c file gets deleted, which makes the build fail, so you need to restore it. (There are still warnings when compiling, but it is not a big problem and the code runs. I haven't been able to resolve them yet.)

3. Run the Validation test project

Because we selected the Validation application, the board prints various information after power-on and then waits for a CMD from the host

The power-on log is as follows

[20:41:39.248] RX
#
# AI Validation 7.1
#
Compiled with GCC 12.3.1
STM32 device configuration...
 Device       : DevID:0x0485 (STM32H7[R,]Sxx) RevID:0x1003
 Core Arch.   : M7 - FPU  used
 HAL version  : 0x01010000
 SYSCLK clock : 600 MHz
 HCLK clock   : 300 MHz
 FLASH conf.  : ACR=0x00000037 - latency=7
 CACHE conf.  : $I/$D=(True,True)

[20:41:39.379] RX  Timestamp    : SysTick + DWT (delay(1)=1.000 ms)

AI platform (API 1.1.0 - RUNTIME 9.0.0)
Discovering the network(s)...

Found network "network"
Creating the network "network"..
Initializing the network
Network informations...
 model name         : network
 model signature    : 0x1e108c42827f4c62598744246d259703
 model datetime     : Sun Nov  3 20:31:53 2024
 compile datetime   : Nov  3 2024 20:32:53
 tools version      : 9.0.0
 complexity         : 8520 MACC
 c-nodes            : 5
 map_activations    : 1
  [0]  @0x24000D60/1024
 map_weights        : 1
  [0]  @0x70013060/11008
 n_inputs/n_outputs : 1/1
  I[0] (1,8,8,2)128/float32 @0x24000DE0/512
  O[0] (1,1,1,8)8/float32 @0x24000D60/32

-------------------------------------------
| READY to receive a CMD from the HOST... |
-------------------------------------------

# Note: At this point, default ASCII-base terminal should be closed
# and a serial COM interface should be used
# (i.e. Python ai_runner module). Protocol version = 3.1

Seeing this means that our code has run successfully.
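
One thing worth checking in this log is the memory map. The analyze log said the input and output buffers can be placed inside the activations buffer (we passed --allocate-inputs and --allocate-outputs), and the addresses printed by the board let us confirm that. Here is a minimal Python sketch; the address/size strings are copied from the power-on log, while the helper functions are hypothetical names of my own.

```python
def parse_region(text):
    """Turn a log field such as '@0x24000D60/1024' into (start, size_in_bytes)."""
    address, size = text.lstrip("@").split("/")
    return int(address, 16), int(size)

def contains(outer, inner):
    """True if the inner region lies completely inside the outer region."""
    outer_start, outer_size = outer
    inner_start, inner_size = inner
    return (outer_start <= inner_start
            and inner_start + inner_size <= outer_start + outer_size)

activations = parse_region("@0x24000D60/1024")   # map_activations [0]
weights = parse_region("@0x70013060/11008")      # map_weights [0]
input_buf = parse_region("@0x24000DE0/512")      # I[0]
output_buf = parse_region("@0x24000D60/32")      # O[0]

print("input inside activations :", contains(activations, input_buf))
print("output inside activations:", contains(activations, output_buf))
print("weights inside activations:", contains(activations, weights))
```

Both I/O buffers fall inside the 1024-byte activations buffer, which is why the report counts only 1,024 B of RAM for the network itself. The weights sit in a completely different address range from the RAM buffers.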

At this point, close the serial terminal, then go back to CubeMX and click "Validate on target"

Select the development board's serial port and keep the default baud rate of 115200

Wait for the development board to interact with the host computer and complete the test

The test results are as follows


Starting AI validation on target with random data...
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171943553747800013843934305667415686 --output C:/Users/Administrator/.stm32cubemx/network_output --mode target --desc serial:COM49:115200
STEdgeAI Core v9.0.0-19802
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686\network_c_info.json

Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file      : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type            : keras
c_name          : network
compression     : none
options         : allocate-inputs, allocate-outputs
optimization    : balanced
target/series   : stm32h7
workspace dir   : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686
output dir      : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt       : float
model_name      : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash      : 0x1e108c42827f4c62598744246d259703
params #        : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1       : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1      : 'dense_1', f32(1x8), 32 Bytes, activations
macc            : 8,520
weights (ro)    : 11,008 B (10.75 KiB) (1 segment)
activations (rw): 1,024 B (1024 B) (1 segment) *
ram (total)     : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=TARGET)
INTERNAL ERROR: E801(HwIOError): Invalid firmware - COM49:115200
Validation ended