i.MX RT helps you cross over to AI - 3. Model deployment
Previous Issue Review
In the previous issue, we took a bird's-eye view of the entire process of applying AI on MCU-class platforms.
The key step is the conversion and deployment of the AI model, which bridges the two worlds of training a model and using it. There are two main approaches: convert the model into code that calls the target platform's low-level library, or embed an execution engine in the firmware and convert the model into instructions for that engine.
So what do these two paths look like? To be honest, I have walked down both of them, but neither is quite ready to be shown to you in full. Still, I have prepared a few gifts for today.
Figure 1. Model deployment diagram
The first two articles in this series:
i.MX RT helps you cross over to AI - 1. A good start
i.MX RT helps you cross over to AI - 2. Integrate AI modules into the system
Model to Code
Everyone is familiar with how a program is compiled, and model-to-code conversion works much the same way: we "compile" the model, except that the output of compiling a model is C source code, while the output of compiling that C source code is machine instructions.
On the Cortex-M platform, Arm provides a low-level library dedicated to neural-network operations, called CMSIS-NN. It exposes a C API and supports ordinary convolution, depthwise-separable convolution, and fully connected operations, plus auxiliary operations such as activation and pooling (downsampling) that accompany the main ones. Combining these five "building blocks" is enough to construct most deep neural network models.
CMSIS-NN plays a key role here; I will cover it in a dedicated article later.
Figuratively speaking, we can think of CMSIS-NN as a special CPU that provides the five "instructions" above, with the model as the source code. Model-to-code conversion then "compiles" the model into the "machine language" of CMSIS-NN.
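To make the analogy concrete, here is a minimal sketch of what the generated code for the first few layers of a CIFAR-10-style network might look like when expressed as CMSIS-NN calls. All dimensions, quantization shift values, buffer sizes, and array names are illustrative assumptions rather than real output of our tool; only the kernel functions themselves (arm_convolve_HWC_q7_basic, arm_relu_q7, arm_maxpool_q7_HWC, arm_fully_connected_q7) come from the CMSIS-NN q7 API.

```c
#include "arm_nnfunctions.h"

/* Quantized weights and biases: in real generated code these are
 * emitted as constant data from the trained model (placeholders here). */
static const q7_t conv1_wt[5 * 5 * 3 * 32];
static const q7_t conv1_bias[32];
static const q7_t fc1_wt[16 * 16 * 32 * 10];
static const q7_t fc1_bias[10];

static q7_t  buf1[32 * 32 * 32];      /* conv1 output, 32x32x32      */
static q7_t  buf2[16 * 16 * 32];      /* pool output, 16x16x32       */
static q15_t col_buf[2 * 5 * 5 * 3];  /* im2col scratch for the conv */
static q15_t fc_buf[16 * 16 * 32];    /* scratch for the FC layer    */

void run_model(const q7_t *img /* 32x32x3, HWC layout */, q7_t out[10])
{
    /* 5x5 convolution, 3 -> 32 channels, padding 2, stride 1 */
    arm_convolve_HWC_q7_basic(img, 32, 3, conv1_wt, 32, 5, 2, 1,
                              conv1_bias, 0 /* bias_shift */,
                              9 /* out_shift, set by quantization */,
                              buf1, 32, col_buf, NULL);
    /* activation */
    arm_relu_q7(buf1, 32 * 32 * 32);
    /* 2x2 max pooling, stride 2: 32x32x32 -> 16x16x32 */
    arm_maxpool_q7_HWC(buf1, 32, 32, 2, 0, 2, 16, NULL, buf2);
    /* fully connected layer: flattened features -> 10 class scores */
    arm_fully_connected_q7(buf2, fc1_wt, 16 * 16 * 32, 10,
                           0 /* bias_shift */, 7 /* out_shift */,
                           fc1_bias, out, fc_buf);
}
```

A real CIFAR-10 network has more layers in the middle, but the pattern is the same: the "compiler" emits one kernel call per layer and wires the activation buffers between them.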
Unfortunately, there is no ready-made universal "model compiler" for CMSIS-NN. So, in the spirit of the old teaching "do it yourself, and you will have plenty of food and clothing", I will first build a "compiler" sufficient for our own needs, and then polish it into a proper tool to offer to everyone.
At present it can compile a CIFAR-10 classifier and the more complex MobileFaceNet (face recognition) model; some debugging work remains.
In addition, NXP's MCUs often include innovations such as heterogeneous dual cores, DSPs, and coprocessors, which are very helpful for improving model performance. At the same time, obtaining the model's input requires driving peripherals (such as cameras and microphones) and preprocessing the data. If the code-generation tool can also generate these parts for a specific usage scenario and device, performance on that device and application will undoubtedly improve significantly; this is a direction for our future work.
Model to Intermediate Representation
If the model-to-code conversion above resembles compilation, then converting the model into an execution engine's intermediate representation resembles "interpretation", with the execution engine acting as the interpreter.
The interpreter can either call CMSIS-NN for better efficiency on the Cortex-M platform, or carry a built-in low-level NN computing library for better portability. Because running deep-learning models on MCU platforms is still new territory, existing execution engines have not yet been optimized for CMSIS-NN and can only use their built-in general-purpose NN libraries.
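To illustrate the idea, an execution engine boils down to a loop that walks the list of op records produced from the model and dispatches each one to a kernel. This is a conceptual sketch, not the internals of any particular engine; every type and name below is invented for illustration.

```c
#include <stdint.h>

typedef enum {
    OP_CONV, OP_DEPTHWISE_CONV, OP_FULLY_CONNECTED, OP_RELU, OP_POOL
} op_code_t;

typedef struct {
    op_code_t code;         /* which of the five "instructions" to run */
    const void *params;     /* shapes, strides, quantization info      */
    const int8_t *weights;  /* constant tensors from the model file    */
    int8_t *input;          /* activation buffers                      */
    int8_t *output;
} op_record_t;

/* One kernel per op code. Swapping CMSIS-NN-backed kernels into this
 * table would accelerate the same model file on Cortex-M without
 * re-converting the model. */
typedef void (*kernel_fn)(const op_record_t *op);
extern kernel_fn kernel_table[];

/* The whole "interpreter" is a loop over the op list. */
void run_graph(const op_record_t *ops, int num_ops)
{
    for (int i = 0; i < num_ops; ++i)
        kernel_table[ops[i].code](&ops[i]);
}
```

The kernel table is exactly where a CMSIS-NN-optimized backend could be plugged in later, which is why the same converted model can run on very different hardware.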
A representative execution engine is Google's TensorFlow Lite (TF-Lite for short). The TensorFlow name is certainly well known, but TF-Lite is not a slimmed-down TensorFlow; it is only an execution engine. Most importantly, it cannot train models.
Google provides a tool called "toco" to convert TF models (.pb files) into the intermediate representation that TF-Lite can interpret. To integrate it on an MCU, the converted file is either expanded into a C array definition or placed on an SD card; TF-Lite itself is compiled and linked into the firmware on the MCU side, and then it is ready to use.
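As a rough sketch of that workflow: the toco flag names changed between TensorFlow releases and the byte values below are placeholders, so treat all of this as approximate.

```c
/* Conversion happens on the PC side; flag names varied across
 * TensorFlow releases, so treat this invocation as approximate:
 *
 *   toco --input_file=model.pb --output_file=model.tflite \
 *        --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE \
 *        --input_arrays=input --output_arrays=output
 *
 * The resulting .tflite file can then be expanded into a C array,
 * e.g. with `xxd -i model.tflite`, giving roughly: */
const unsigned char model_tflite[] = {
    /* placeholder bytes; a real file is a TF-Lite flatbuffer */
    0x54, 0x46, 0x4c, 0x33,
};
const unsigned int model_tflite_len = sizeof(model_tflite);

/* At startup the firmware hands this array (or a file read from the
 * SD card) to the TF-Lite interpreter. */
```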
In practice so far, the editor and friends have tried out face-recognition and keyword (spoken command) recognition examples on a modified version of TF-Lite.
Long-Term Plan: An Online AI Model Market
In real applications, a single piece of hardware often needs to run different models. For example, the owner of a parts-machining factory takes orders from many customers, and the parts to be machined come in all sorts of shapes; the one constant requirement is to catch as many defective products as possible.
With OpenMV RT it is easy to build a machine-vision parts inspector, but training one model to recognize every possible part and defect is obviously impractical.
If there were a model market that let the factory install and uninstall models for detecting defects in specific parts, as easily as installing a phone app, with model developers cooperating for mutual benefit, it would be a boon for everyone!
Realizing this vision requires an optimized AI application execution engine, a cloud marketplace, and a mechanism for installing applications after deployment. As a veteran in AI-IoT, NXP already runs the "EdgeScale" cloud service on its Linux-based MPU platforms, which makes it easy to deploy personalized functions and migrate applications across devices that support EdgeScale.
Although the MCU platform is very different from MPU + Linux, after unremitting effort we have recently succeeded in updating the OpenMV RT firmware on an MCU (i.MX RT) through EdgeScale and deploying new application functions. In the future, instead of updating the whole firmware, we will install only the application model, and offer a model market as a platform to help grow the MCU ecosystem in AI-IoT!
Figure 2. EdgeScale AI model application market
In the rest of this series, the editor will work through deploying AI on MCUs from the outside in, moving from the surface to the inner workings and from one topic to the next.
[Excerpt from NXP MCU Gas Station]