This post was last edited by Hamster who doesn't like carrots on 2024-11-3 20:59
Before training my own model and running it on the development board, I want to port an existing model to the STM32 development board to gain some experience.
A trained model cannot run directly on an STM32. My rough understanding is that the STM32 is just a microcontroller: its hardware resources and software stack cannot execute a model file as it is. A tool is needed to convert the model into C code that the microcontroller can compile and run. CUBE AI (X-CUBE-AI) is the tool ST provides to developers for this.
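To make "converting the model into C code" a little more concrete, here is a toy sketch that turns a list of weights into a C array declaration. It only illustrates the general idea of storing weights as constant data in flash; it is not how X-CUBE-AI works internally, and the function name `floats_to_c_array` and the weight values are made up for this example.

```python
def floats_to_c_array(name, values, per_line=4):
    """Render a list of floats as C source declaring a const float array."""
    lines = []
    for i in range(0, len(values), per_line):
        chunk = values[i:i + per_line]
        # repr() keeps the decimal point; the trailing 'f' marks a C float literal
        lines.append("    " + ", ".join(f"{v!r}f" for v in chunk) + ",")
    body = "\n".join(lines)
    return f"const float {name}[{len(values)}] = {{\n{body}\n}};\n"


# Hypothetical weights, not taken from the real model
demo_weights = [0.5, -1.25, 2.0, 0.125, 3.0]
c_source = floats_to_c_array("demo_weights", demo_weights)
print(c_source)
```

The real tool additionally generates the layer descriptions and links a runtime library of kernels, which is why the generated project contains much more than weight tables.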
1. Install CUBE AI
CUBE AI is a software package for STM32CubeMX. Installation is simple: open CubeMX and click "Help" -> "Manage embedded software packages".
Then find "STMicroelectronics" -> "X-CUBE-AI" on the pop-up page, select the version you need, and click "Install". I installed 9.0.0 some time ago. By the time I wrote this post the latest version was 9.1.0, but I did not bother to update.
2. Use CUBE AI to port the model
2.1 Activate CUBE AI
Open our serial port project, find the "X-CUBE-AI" option on the left, and click it.
The following dialog pops up
Select the version of CUBE AI. I installed 9.0.0, so I chose 9.0.0. Then check Core and choose the App type according to your needs; I chose Validation. The three App options mean the following
Once it is enabled successfully, the interface looks like this. The checkboxes are ticked automatically, so there is no need to click them manually.
2.2 Adding a network
Next, add a network (that is, an AI model). Since I don't have a model of my own yet, I will reuse the model from the project I downloaded from Cloud AI last time. This model actually comes from ST's Model Zoo: you can visit https://stm32ai.st.com/model-zoo/ to get the various models ST has prepared, and it also supports adding your own samples and retraining.
Select the model type. I use Keras; TFLite and ONNX are also supported, so choose according to your situation. Then select the network file.
There is no need to worry about the serial port output: the serial port was already initialized in the previous project, and it is matched automatically here.
With the model added here, this step is done.
Next, run Analyze. If we skip the analysis, we will not be able to generate the project later.
I got an error here and had to modify the registry (I won't go into how to do that; just search on Baidu).
After the modification, run Analyze again and you can see the progress bar moving.
The completed analysis is shown below
Analyzing model
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe analyze --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171808875528810010910823047711904506 --output C:/Users/Administrator/.stm32cubemx/network_output
ST Edge AI Core v9.0.0-19802
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506\network_c_info.json
Exec/report summary (analyze)
----------------------------------------------------------------------------------------------------------------------------------------
model file         : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type               : keras
c_name             : network
compression        : none
options            : allocate-inputs, allocate-outputs
optimization       : balanced
target/series      : stm32h7
workspace dir      : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171808875528810010910823047711904506
output dir         : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt          : float
model_name         : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash         : 0x1e108c42827f4c62598744246d259703
params #           : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1          : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1         : 'dense_1', f32(1x8), 32 Bytes, activations
macc               : 8,520
weights (ro)       : 11,008 B (10.75 KiB) (1 segment)
activations (rw)   : 1,024 B (1024 B) (1 segment) *
ram (total)        : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Model name - CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
------------------------------------------------------------------------------------------
m_id layer (original)              oshape              param/size   macc    connected to
------------------------------------------------------------------------------------------
0    input_1 (InputLayer)          [b:1,h:8,w:8,c:2]
------------------------------------------------------------------------------------------
1    conv2d (Conv2D)               [b:1,h:6,w:6,c:8]   152/608      5,192   input_1
------------------------------------------------------------------------------------------
2    activation (Activation)       [b:1,h:6,w:6,c:8]                288     conv2d
------------------------------------------------------------------------------------------
3    max_pooling2d (MaxPooling2D)  [b:1,h:3,w:3,c:8]                288     activation
------------------------------------------------------------------------------------------
5    flatten (Flatten)             [b:1,c:72]                               max_pooling2d
------------------------------------------------------------------------------------------
6    dense_dense (Dense)           [b:1,c:32]          2,336/9,344  2,336   flatten
     dense (Dense)                 [b:1,c:32]                       32      dense_dense
------------------------------------------------------------------------------------------
7    dense_1_dense (Dense)         [b:1,c:8]           264/1,056    264     dense
     dense_1 (Dense)               [b:1,c:8]                        120     dense_1_dense
------------------------------------------------------------------------------------------
model: macc=8,520 weights=11,008 activations=-- io=--
Number of operations per c-layer
----------------------------------------------------------
c_id m_id name (type)                #op     type
----------------------------------------------------------
0    3    conv2d (Conv2D)            5,768   smul_f32_f32
1    6    dense_dense (Dense)        2,336   smul_f32_f32
2    6    dense (Nonlinearity)       32      op_f32_f32
3    7    dense_1_dense (Dense)      264     smul_f32_f32
4    7    dense_1 (Nonlinearity)     120     op_f32_f32
----------------------------------------------------------
total                                8,520
Number of operation types
----------------------------------
operation type     #       %
----------------------------------
smul_f32_f32       8,368   98.2%
op_f32_f32         152     1.8%
Complexity report (model)
-------------------------------------------------------------------------------
m_id name            c_macc                    c_rom                     c_id
-------------------------------------------------------------------------------
3    max_pooling2d   ||||||||||||||||  67.7%   |                  5.5%   [0]
6    dense_dense     |||||||           27.8%   ||||||||||||||||  84.9%   [1, 2]
7    dense_1_dense   |                  4.5%   ||                 9.6%   [3, 4]
-------------------------------------------------------------------------------
macc=8,520 weights=11,008 act=1,024 ram_io=0
Requested memory size per segment ("stm32h7" series)
-----------------------------------------------------------
module                           text     rodata   data    bss
-----------------------------------------------------------
NetworkRuntime900_CM7_GCC.a      10,220   0        0       0
network.o                        584      40       1,796   168
network_data.o                   52       16       88      0
lib (toolchain)*                 318      328      0       0
-----------------------------------------------------------
RT total**                       11,174   384      1,884   168
-----------------------------------------------------------
weights                          0        11,008   0       0
activations                      0        0        0       1,024
io                               0        0        0       0
-----------------------------------------------------------
TOTAL                            11,174   11,392   1,884   1,192
-----------------------------------------------------------
*  toolchain objects (libm/libgcc*)
** RT-AI runtime objects (kernels+infrastructure)
Summary per type of memory device
--------------------------------------------
             FLASH    %       RAM     %
--------------------------------------------
RT total     13,442   55.0%   2,052   66.7%
--------------------------------------------
TOTAL        24,450           3,076
--------------------------------------------
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_analyze_report.txt
elapsed time (analyze): 7.829s
Model file: CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
Total Flash: 24450 B (23.88 KiB)
    Weights: 11008 B (10.75 KiB)
    Library: 13442 B (13.13 KiB)
Total Ram: 3076 B (3.00 KiB)
    Activations: 1024 B
    Library: 2052 B (2.00 KiB)
    Input: 512 B (included in Activations)
    Output: 32 B (included in Activations)
Done
Analyze complete on AI model
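The parameter counts in the report can be checked by hand. The sketch below recomputes them from the layer shapes in the log. The 3x3 kernel size is my assumption, inferred from the 8x8 input shrinking to 6x6 at the conv2d output; the formulas are the usual "weights plus one bias per output" counts, not something taken from ST's documentation.

```python
BYTES_PER_FLOAT32 = 4


def conv2d_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights of every filter plus one bias per output channel."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Weight matrix plus one bias per output."""
    return in_features * out_features + out_features


# Shapes from the report; the 3x3 kernel is an assumption (8x8 -> 6x6)
conv = conv2d_params(3, 3, 2, 8)   # report: 152
dense = dense_params(72, 32)       # report: 2,336 (72 = 3 * 3 * 8 after flatten)
dense_1 = dense_params(32, 8)      # report: 264

total_params = conv + dense + dense_1
weights_bytes = total_params * BYTES_PER_FLOAT32

print(conv, dense, dense_1)        # 152 2336 264
print(total_params, weights_bytes) # 2752 11008
```

The totals match the "params #: 2,752 items" and "weights (ro): 11,008 B" lines of the report, which is a quick way to confirm that the tool imported the model as expected.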
At this point we can also run a simulated validation on the PC
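As the logs in this post show, CubeMX drives both the analysis and the validation through the `stedgeai` command-line tool. The sketch below only assembles the same argument list in a script, using flags that appear in those logs. The helper name `build_stedgeai_args` and the short paths are placeholders of mine, and nothing is executed here.

```python
import shlex


def build_stedgeai_args(command, model_path, workspace, output,
                        target="stm32h7", name="network", serial_desc=None):
    """Assemble stedgeai arguments using only flags seen in the CubeMX logs."""
    args = [
        command,                      # "analyze" or "validate"
        "--target", target,
        "--name", name,
        "-m", model_path,
        "--compression", "none",
        "--verbosity", "1",
        "--allocate-inputs",
        "--allocate-outputs",
        "--workspace", workspace,
        "--output", output,
    ]
    if serial_desc is not None:
        # On-target validation adds the mode and the serial port description
        args += ["--mode", "target", "--desc", serial_desc]
    return args


# Placeholder paths, not the real ones from the logs
host_args = build_stedgeai_args("validate", "model.h5", "workspace", "network_output")
target_args = build_stedgeai_args("validate", "model.h5", "workspace", "network_output",
                                  serial_desc="serial:COM49:115200")

print("stedgeai.exe " + shlex.join(host_args))
print("stedgeai.exe " + shlex.join(target_args))
```

Keeping the argument list in one place makes it easier to see that desktop and on-target validation differ only in the last two options.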
After the computer simulation test is completed, the log is as follows
Starting AI validation on desktop with random data...
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171971186255410016731057912494845799 --output C:/Users/Administrator/.stm32cubemx/network_output
ST Edge AI Core v9.0.0-19802
Setting validation data...
 generating random data, size=10, seed=42, range=(0, 1)
 I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
 No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\network_c_info.json
Copying the AI runtime files to the user workspace: C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace
Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file         : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type               : keras
c_name             : network
compression        : none
options            : allocate-inputs, allocate-outputs
optimization       : balanced
target/series      : stm32h7
workspace dir      : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799
output dir         : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt          : float
model_name         : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash         : 0x1e108c42827f4c62598744246d259703
params #           : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1          : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1         : 'dense_1', f32(1x8), 32 Bytes, activations
macc               : 8,520
weights (ro)       : 11,008 B (10.75 KiB) (1 segment)
activations (rw)   : 1,024 B (1024 B) (1 segment) *
ram (total)        : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=HOST)
 X86 shared lib (C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171971186255410016731057912494845799\inspector_network\workspace\lib\libai_network.dll) ['network']
 Summary 'network' - ['network']
 ------------------------------------------------------------------------------------------
 inputs/ouputs      : 1/1
 input_1            : f32[1,8,8,2], 512 Bytes, in activations buffer
 output_1           : f32[1,1,1,8], 32 Bytes, in activations buffer
 n_nodes            : 5
 compile_datetime   : Nov 3 2024 20:50:43
 activations        : 1024
 weights            : 11008
 macc               : 8520
 ------------------------------------------------------------------------------------------
 tools              : Legacy ST.AI 9.0.0
 capabilities       : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
 device             : AMD64, AMD64 Family 23 Model 1 Stepping 1, AuthenticAMD, Windows
 ------------------------------------------------------------------------------------------
 NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.
 ST.AI Profiling results v1.2 - "network"
 ------------------------------------------------------------
 nb sample(s)   : 10
 duration       : 0.015 ms by sample (0.008/0.069/0.018)
 macc           : 8520
 ------------------------------------------------------------
 Inference time per node
 -------------------------------------------------------------------------------
 c_id  m_id  type                dur (ms)   %       cumul   name
 -------------------------------------------------------------------------------
 0     3     Conv2dPool (0x109)  0.009      58.4%   58.4%   ai_node_0
 1     6     Dense (0x104)       0.003      17.4%   75.8%   ai_node_1
 2     6     NL (0x107)          0.001      8.1%    83.9%   ai_node_2
 3     7     Dense (0x104)       0.000      2.7%    86.6%   ai_node_3
 4     7     NL (0x107)          0.002      12.8%   99.3%   ai_node_4
 -------------------------------------------------------------------------------
 total                           0.015
 -------------------------------------------------------------------------------
 Statistic per tensor
 -------------------------------------------------------------------------------
 tensor  #    type[shape]:size    min     max     mean    std     name
 -------------------------------------------------------------------------------
 I.0     10   f32[1,8,8,2]:512    0.005   1.000   0.498   0.294   input_1
 O.0     10   f32[1,1,1,8]:32     0.000   1.000   0.125   0.321   output_1
 -------------------------------------------------------------------------------
Saving validation data...
 output directory: C:\Users\Administrator\.stm32cubemx\network_output
 creating C:\Users\Administrator\.stm32cubemx\network_output\network_val_io.npz
 m_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
 c_outputs_1: (10, 1, 1, 8)/float32, min/max=[0.000, 1.000], mean/std=[0.125, 0.321], dense_1
Computing the metrics...
 Cross accuracy report #1 (reference vs C-model)
 ----------------------------------------------------------------------------------------------------
 notes: - the output of the reference model is used as ground truth/reference value
        - 10 samples (8 items per sample)
  acc=100.00%, rmse=0.000000063, mae=0.000000015, l2r=0.000000183, nse=1.000, cos=1.000
  8 classes (10 samples)
  ------------------------------------------------
  C0      10    .    .    .    .    .    .    .
  C1       .    0    .    .    .    .    .    .
  C2       .    .    0    .    .    .    .    .
  C3       .    .    .    0    .    .    .    .
  C4       .    .    .    .    0    .    .    .
  C5       .    .    .    .    .    0    .    .
  C6       .    .    .    .    .    .    0    .
  C7       .    .    .    .    .    .    .    0
Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------
Output       acc       rmse        mae         l2r         mean         std         nse         cos         tensor
--------------------------------------------------------------------------------------------------------------------------------------
X-cross #1   100.00%   0.0000001   0.0000000   0.0000002   -0.0000000   0.0000001   1.0000000   1.0000000   dense_1, (8,), m_id=[7]
--------------------------------------------------------------------------------------------------------------------------------------
 acc  : Classification accuracy (all classes)
 rmse : Root Mean Squared Error
 mae  : Mean Absolute Error
 l2r  : L2 relative error
 nse  : Nash-Sutcliffe efficiency criteria, bigger is better, best=1, range=(-inf, 1]
 cos  : COsine Similarity, bigger is better, best=1, range=(0, 1]
Creating txt report file C:\Users\Administrator\.stm32cubemx\network_output\network_validate_report.txt
elapsed time (validate): 7.011s
Validation ended
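The validation report compares the output of the original Keras model with the output of the generated C model using the metrics listed in its legend. For readers who want to see what those numbers mean, here is a minimal sketch using the common textbook definitions. I am assuming these match what the tool computes; the exact formulas inside `stedgeai` may differ in detail, and the two sample vectors are toy values, not data from the log.

```python
import math


def rmse(ref, pred):
    """Root mean squared error."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def mae(ref, pred):
    """Mean absolute error."""
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)


def l2r(ref, pred):
    """L2 relative error: norm of the difference over norm of the reference."""
    num = math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)))
    den = math.sqrt(sum(r * r for r in ref))
    return num / den  # undefined if the reference is all zeros


def nse(ref, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect match."""
    mean_ref = sum(ref) / len(ref)
    num = sum((r - p) ** 2 for r, p in zip(ref, pred))
    den = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - num / den  # undefined if the reference is constant


def cosine(ref, pred):
    """Cosine similarity between the two vectors."""
    dot = sum(r * p for r, p in zip(ref, pred))
    norms = math.sqrt(sum(r * r for r in ref)) * math.sqrt(sum(p * p for p in pred))
    return dot / norms


# Toy values: a reference output and a deliberately degraded copy
reference = [0.0, 1.0, 0.0, 0.0]
degraded = [0.0, 0.5, 0.0, 0.0]

for metric in (rmse, mae, l2r, nse, cosine):
    print(metric.__name__, metric(reference, reference), metric(reference, degraded))
```

In the log above all error metrics are close to zero and nse and cos are 1.000, which means the C model reproduces the Keras model almost exactly on the ten random samples.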
Next we can choose whether to compress the model and pick the optimization trade-off between speed and RAM usage. This is similar to what we did with Cloud AI earlier, so I won't go into details. By adjusting these options and validating on the PC, we can reach the expected results before generating the project, which saves a lot of time.
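When comparing settings it helps to know how the tool arrives at its Flash and RAM totals. The sketch below recomputes them from the per-section numbers in the analysis report above. The grouping (text, rodata, and data count toward Flash; data and bss count toward RAM) is my reading of that report, which the matching totals support.

```python
# Runtime library sizes in bytes, from the "RT total" row of the analysis report
runtime = {"text": 11174, "rodata": 384, "data": 1884, "bss": 168}
weights_bytes = 11008      # read-only, stored in flash
activations_bytes = 1024   # read-write, stored in RAM

# Initialized data occupies flash (initial values) and RAM (live copy)
flash_runtime = runtime["text"] + runtime["rodata"] + runtime["data"]
ram_runtime = runtime["data"] + runtime["bss"]

flash_total = flash_runtime + weights_bytes
ram_total = ram_runtime + activations_bytes

print(f"Flash: {flash_total} B, runtime share {flash_runtime / flash_total:.1%}")
print(f"RAM:   {ram_total} B, runtime share {ram_runtime / ram_total:.1%}")
```

For this small model the runtime library, not the network itself, takes more than half of both budgets, so compression would bring only a modest saving here.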
2.3 Other peripherals that must be turned on
The CPU's I-Cache, D-Cache, and ART accelerator all need to be enabled (I couldn't find ART here, so I'm ignoring it for now; if your chip has it, enable it).
Then enable CRC.
2.4 Generate Project
Before generating the project, set the minimum heap size to 0x2000 (8 KiB), then generate the project.
Project generation should normally produce no warnings. In my case I had skipped the Analyze step once, and CubeMX warned me that forcing the project to be generated might cause problems.
2.5 Minor changes
My project is a bit unusual, so I had to make a few extra changes. A normal project does not need them.
First, the ld file needs to be changed back (see the previous article for the exact steps).
Second, code generation deletes a syscall.c file, which makes the build fail, so you need to restore this file. (There are still warnings when compiling, but they are not a big problem and the code runs. I can't solve them for now.)
3. Run the Validation test project
Because we selected the Validation application earlier, the firmware prints various information after power-on and then waits for a command (CMD) from the host.
The power-on log is as follows
[20:41:39.248] RX ←◆
#
# AI Validation 7.1
#
Compiled with GCC 12.3.1
STM32 device configuration...
Device : DevID:0x0485 (STM32H7[R,]Sxx) RevID:0x1003
Core Arch. : M7 - FPU used
HAL version : 0x01010000
SYSCLK clock : 600 MHz
HCLK clock : 300 MHz
FLASH conf. : ACR=0x00000037 - latency=7
CACHE conf. : $I/$D=(True,True)
[20:41:39.379] RX ←◆ Timestamp : SysTick + DWT (delay(1)=1.000 ms)
AI platform (API 1.1.0 - RUNTIME 9.0.0)
Discovering the network(s)...
Found network "network"
Creating the network "network"..
Initializing the network
Network informations...
model name : network
model signature : 0x1e108c42827f4c62598744246d259703
model datetime : Sun Nov 3 20:31:53 2024
compile datetime : Nov 3 2024 20:32:53
tools version : 9.0.0
complexity : 8520 MACC
c-nodes : 5
map_activations : 1
[0] @0x24000D60/1024
map_weights : 1
[0] @0x70013060/11008
n_inputs/n_outputs : 1/1
I[0] (1,8,8,2)128/float32 @0x24000DE0/512
O[0] (1,1,1,8)8/float32 @0x24000D60/32
-------------------------------------------
| READY to receive a CMD from the HOST... |
-------------------------------------------
# Note: At this point, default ASCII-base terminal should be closed
# and a serial COM interface should be used
# (i.e. Python ai_runner module). Protocol version = 3.1
Seeing this means that our code has run successfully.
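The addresses in the power-on log also confirm what the `allocate-inputs` and `allocate-outputs` options do: the input and output buffers are placed inside the activations buffer instead of using extra RAM. A small sketch that checks this from the addresses and sizes printed above (the helper name `inside` is mine):

```python
BYTES_PER_FLOAT32 = 4


def inside(inner_addr, inner_size, outer_addr, outer_size):
    """True if [inner_addr, inner_addr + inner_size) lies within the outer range."""
    return (outer_addr <= inner_addr
            and inner_addr + inner_size <= outer_addr + outer_size)


# (address, size in bytes) as printed by the firmware
activations = (0x24000D60, 1024)
input_buf = (0x24000DE0, 512)
output_buf = (0x24000D60, 32)

# Buffer sizes follow from the tensor shapes: element count times 4 bytes
assert 1 * 8 * 8 * 2 * BYTES_PER_FLOAT32 == input_buf[1]
assert 1 * 1 * 1 * 8 * BYTES_PER_FLOAT32 == output_buf[1]

input_offset = input_buf[0] - activations[0]
print("input offset in activations:", input_offset)          # 128
print("input inside activations:", inside(*input_buf, *activations))
print("output inside activations:", inside(*output_buf, *activations))
```

Both buffers fall inside the 1,024-byte activations region, which matches the "(included in Activations)" notes in the analysis summary.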
At this point we need to close the serial port tool, then go back to CubeMX and click "Validate on target".
Select the development board's serial port and use the default baud rate of 115200.
Wait for the development board and the host computer to exchange data and complete the test.
The test results are as follows
Starting AI validation on target with random data...
C:/Users/Administrator/STM32Cube/Repository/Packs/STMicroelectronics/X-CUBE-AI/9.0.0/Utilities/windows/stedgeai.exe validate --target stm32h7 --name network -m C:/Users/Administrator/Downloads/CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5 --compression none --verbosity 1 --allocate-inputs --allocate-outputs --workspace C:/Users/ADMINI~1/AppData/Local/Temp/mxAI_workspace171943553747800013843934305667415686 --output C:/Users/Administrator/.stm32cubemx/network_output --mode target --desc serial:COM49:115200
ST Edge AI Core v9.0.0-19802
Setting validation data...
 generating random data, size=10, seed=42, range=(0, 1)
 I[1]: (10, 8, 8, 2)/float32, min/max=[0.005, 1.000], mean/std=[0.498, 0.294], input_1
 No output/reference samples are provided
Creating c (debug) info json file C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686\network_c_info.json
Exec/report summary (validate)
----------------------------------------------------------------------------------------------------------------------------------------
model file         : C:\Users\Administrator\Downloads\CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset.h5
type               : keras
c_name             : network
compression        : none
options            : allocate-inputs, allocate-outputs
optimization       : balanced
target/series      : stm32h7
workspace dir      : C:\Users\ADMINI~1\AppData\Local\Temp\mxAI_workspace171943553747800013843934305667415686
output dir         : C:\Users\Administrator\.stm32cubemx\network_output
model_fmt          : float
model_name         : CNN2D_ST_HandPosture_8classes_hand_posture_ST_VL53L5CX_handposture_dataset
model_hash         : 0x1e108c42827f4c62598744246d259703
params #           : 2,752 items (10.75 KiB)
----------------------------------------------------------------------------------------------------------------------------------------
input 1/1          : 'input_1', f32(1x8x8x2), 512 Bytes, activations
output 1/1         : 'dense_1', f32(1x8), 32 Bytes, activations
macc               : 8,520
weights (ro)       : 11,008 B (10.75 KiB) (1 segment)
activations (rw)   : 1,024 B (1024 B) (1 segment) *
ram (total)        : 1,024 B (1024 B) = 1,024 + 0 + 0
----------------------------------------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers can be used from the activations buffer
Running the Keras model...
Running the STM AI c-model (AI RUNNER)...(name=network, mode=TARGET)
INTERNAL ERROR: E801(HwIOError): Invalid firmware - COM49:115200
Validation ended