Design of TinyML Image Classification Camera Based on ESP32
Source: Internet | Publisher: 无人共我 | Keywords: Camera, ESP32 | Updated: 2024/06/03
Background of the project
We are facing a growing revolution in embedded machine learning. And when we talk about machine learning (ML), the first thing that comes to mind is image classification, the "Hello World" of ML!
One of the most popular and affordable development boards with an integrated camera is the ESP32-CAM, which combines the Espressif ESP32-S MCU chip with the ArduCam OV2640 camera.
The ESP32 chip is powerful enough even to process images. It includes I2C, SPI, and UART interfaces, as well as PWM and DAC outputs.
Parameters:
Working voltage: 4.75-5.25 V
Flash: 32 Mbit by default
RAM: 520 KB internal + 8 MB external PSRAM
Wireless network: 802.11 b/g/n/e/i
Bluetooth: Bluetooth 4.2 BR/EDR and BLE
Supported interfaces (2 Mbps): UART, SPI, I2C, PWM
TF card support: up to 4 GB
IO pins: 9
Serial port rate: 115200 bps by default
Frequency range: 2400-2483.5 MHz
Antenna: onboard PCB antenna, 2 dBi gain
Image output formats: JPEG (OV2640 only), BMP, GRAYSCALE
Below is the general circuit board pinout:
Please note that this board does not have an integrated USB-to-serial (USB-TTL) module, so to upload code to the ESP32-CAM you need a special programmer adapter, as shown below:
Or a USB-TTL serial adapter, as follows:
If you want to learn about the ESP32-CAM, I highly recommend Rui Santos's books and tutorials.
Install ESP32-Cam on Arduino IDE
From the Arduino IDE, open the Preferences window (Arduino > Preferences).
Enter the following line in the Additional Boards Manager URLs field:
https://dl.espressif.com/dl/package_esp32_index.json
Next, open the Boards Manager by going to Tools > Board > Boards Manager..., search for esp32, and install the latest package.
Select the ESP32 development board:
For example, AI-Thinker ESP32-CAM
Finally, don't forget to select the port to which the ESP32-Cam is connected.
That's it! The device should be ready. Let's do some testing.
Testing the board with BLINK
The ESP32-CAM has a built-in LED connected to GPIO33. So, change the Blink sketch accordingly:
#define LED_BUILT_IN 33

void setup() {
  pinMode(LED_BUILT_IN, OUTPUT); // Set the pin as output
}

// Remember that the pin works with inverted logic:
// LOW to turn on and HIGH to turn off
void loop() {
  digitalWrite(LED_BUILT_IN, LOW);  // Turn on
  delay(1000);                      // Wait 1 sec
  digitalWrite(LED_BUILT_IN, HIGH); // Turn off
  delay(1000);                      // Wait 1 sec
}
As a special reminder, the LED is located on the underside of the circuit board.
Testing WiFi
One of the slickest features of the ESP32-S is its WiFi capability. So, let's test its radio by scanning for the WiFi networks around it. You can do this by running one of the code examples that comes with the board package.
Go to the Arduino IDE Examples and look for WiFi ==> WiFiScan
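For reference, a minimal sketch along the lines of that built-in WiFiScan example (using the standard ESP32 WiFi.h API; the version bundled with your board package may differ in detail) looks like this:

#include "WiFi.h"

void setup() {
  Serial.begin(115200);
  WiFi.mode(WIFI_STA);   // Station mode, not connected to any network
  WiFi.disconnect();
  delay(100);
}

void loop() {
  int n = WiFi.scanNetworks();   // Blocking scan for nearby access points
  Serial.printf("%d network(s) found\n", n);
  for (int i = 0; i < n; i++) {
    // Print the SSID and signal strength (RSSI) of each network
    Serial.printf("%s (%d dBm)\n", WiFi.SSID(i).c_str(), WiFi.RSSI(i));
  }
  delay(5000);                   // Scan again every 5 seconds
}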
On the serial monitor, you should see the WiFi networks in range of the device (SSID and RSSI). This is what I get at home:
Testing the Camera
For camera testing you can use the following code:
Examples ==> ESP32 ==> Camera ==> CameraWebServer
Just make sure to select the right camera model:
#define CAMERA_MODEL_AI_THINKER
And enter your network credentials:
const char* ssid = "*********";
const char* password = "*********";
On the serial monitor, you will get the address of the web server from which you can control the camera:
Here I entered: http://172.16.42.26
Running your web server
So far, we have been able to test all of the ESP32-CAM hardware (MCU and camera) as well as the WiFi connection. Now, let's run a simpler piece of code that captures a single image and presents it on a simple web page. This code is based on Rui Santos' great tutorial: ESP32-CAM Take Photo and Display in Web Server.
Download the file ESP32_CAM_HTTP_Server_STA from GitHub, change the WiFi credentials, and run the code. The result is as follows:
Take some time to review the code; it makes it easier to understand how the camera works.
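At the heart of that code is grabbing a single frame from the camera driver. A minimal sketch of just that step, using the esp_camera driver API (and assuming the camera was already initialized with esp_camera_init() and the AI-Thinker pin map, as the example does), looks roughly like this:

#include "esp_camera.h"

void capture_one_photo() {
  camera_fb_t *fb = esp_camera_fb_get();   // Grab a frame buffer from the driver
  if (!fb) {
    Serial.println("Camera capture failed");
    return;
  }
  Serial.printf("Captured %u bytes (%ux%u)\n", fb->len, fb->width, fb->height);
  // The web server code sends fb->buf (a JPEG when the sensor is the OV2640)
  // to the browser at this point.
  esp_camera_fb_return(fb);                // Return the buffer to the driver
}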
Fruits and Vegetables - Image Classification
Now that we have our embedded camera running, it’s time to try image classification.
We will start by training a model and then run inference on the ESP32-CAM. To train the model, we first need a suitable dataset.
TinyML is a set of technologies related to running machine learning inference on embedded devices. Due to the device's limitations (mainly memory, in this case), we should limit classification to three or four categories. We will distinguish apples from bananas and potatoes (you can try other categories).
So let's find a specific dataset that contains images of these categories. Kaggle is a good place to start:
https://www.kaggle.com/kritikseth/fruit-and-vegetable-image-recognition
The dataset contains images of the following food items:
Fruits - bananas, apples, pears, grapes, oranges, kiwis, watermelons, pomegranates, pineapples, mangoes.
Vegetables - cucumber, carrot, pepper, onion, potato, lemon, tomato, radish, beetroot, cabbage, lettuce, spinach, beans, cauliflower, bell pepper, capsicum, radish, corn, sweet corn, sweet potato, paprika, jalapeno, ginger, garlic, peas, eggplant.
Each class is divided into training (100 images), testing (10 images), and validation (10 images).
Download the dataset from the Kaggle website to your computer.
Training models using Edge Impulse Studio
We will be using Edge Impulse for training, the leading development platform for machine learning on edge devices.
Enter your account credentials at Edge Impulse (or create a free account). Next, create a new project:
Data Collection
Next, in the Upload Data section, upload files of the selected category from your computer:
You should end up with the three categories of data uploaded and labeled, ready for training.
You can also upload additional data for further model testing or to split your training data.
Impulse Design
An impulse takes raw data (in this case, images), extracts features (resizes the images), and then uses a learning block to classify new data.
As mentioned before, image classification is the most common use of deep learning, but it requires a lot of data to accomplish this task. We have about 90 images per class. Is this enough? Not at all! We would need thousands of images to "teach our model" the difference between an apple and a banana. However, we can solve this problem by retraining a model that was previously trained on thousands of images. We call this technique "Transfer Learning" (TL).
Using TL, we can fine-tune a pre-trained image classification model on our data and achieve good performance even on relatively small image datasets (our case).
So, starting with our original images, we will resize them to 96x96 pixels and then feed them into our transfer learning block:
Preprocessing (feature generation)
In addition to resizing the images, we will convert them to grayscale instead of keeping the full RGB color depth. Doing so, each of our data samples will have 9,216 features (96x96x1). Keeping RGB, each sample would have three times as many features (27,648). Using grayscale helps reduce the amount of memory needed for inference.
Don't forget to "Save Parameters". This will generate the features to be used in training.
Training (transfer learning and data augmentation)
In 2017, Google introduced MobileNetV1, a family of general-purpose computer vision neural networks designed for mobile devices that supports classification, detection, and more. MobileNets are small, low-latency, low-power models, parameterized to meet the resource constraints of various use cases.
Although the basic MobileNet architecture is already small and has low latency, there are many times when a specific use case or application may require a smaller and faster model. To build these smaller and less computationally expensive models, MobileNet introduces a very simple parameter α (alpha), called the width multiplier. The width multiplier α has the effect of uniformly thinning the network at each layer.
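As a reference point (the formula below comes from the MobileNetV1 paper, not from this write-up): for a depthwise separable convolution layer with kernel size D_K, M input channels, N output channels, and a D_F x D_F feature map, applying the width multiplier scales both channel counts, so the computational cost becomes

\[
D_K \cdot D_K \cdot \alpha M \cdot D_F \cdot D_F \;+\; \alpha M \cdot \alpha N \cdot D_F \cdot D_F
\]

multiply-accumulates, which reduces computation and parameter count roughly by a factor of \(\alpha^2\).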
Edge Impulse Studio provides MobileNet V1 (96x96 images) and V2 (96x96 and 160x160 images) with several different alpha values (from 0.05 to 1.0). For example, you will get the highest accuracy with V2, 160x160 images, and alpha = 1.0. Of course, there is a trade-off: the higher the accuracy, the more memory (about 1.3 MB of RAM and 2.6 MB of ROM) is required to run the model, which means more latency.
At the other extreme, MobileNet V1 with α = 0.10 (approximately 53.2 KB of RAM and 101 KB of ROM) has a much smaller footprint.
To run this project on the ESP32-CAM, we should stay at the lower end of these options, which guarantees that inference will run, though not with high accuracy.
Another important technique used with deep learning is data augmentation. Data augmentation is a method that can help improve the accuracy of machine learning models. A data augmentation system makes small, random changes to the training data during the training process (such as flipping, cropping, or rotating images).
Here you can see how Edge Impulse implements data augmentation strategies on your data:
# Implements the data augmentation policy
def augment_image(image, label):
    # Flips the image randomly
    image = tf.image.random_flip_left_right(image)

    # Increase the image size, then randomly crop it down to
    # the original dimensions
    resize_factor = random.uniform(1, 1.2)
    new_height = math.floor(resize_factor * INPUT_SHAPE[0])
    new_width = math.floor(resize_factor * INPUT_SHAPE[1])
    image = tf.image.resize_with_crop_or_pad(image, new_height, new_width)
    image = tf.image.random_crop(image, size=INPUT_SHAPE)

    # Vary the brightness of the image
    image = tf.image.random_brightness(image, max_delta=0.2)

    return image, label
Exposure to these changes during training can help prevent the model from taking shortcuts by “memorizing” surface cues from the training data, meaning it can better reflect deeper underlying patterns in the dataset.
The last layer of our model will have 16 neurons with 10% dropout to prevent overfitting. Here is the training output:
The results are not great. The model achieves about 77% accuracy, but the amount of RAM expected to be used during inference is very small (about 60 KB), which is very good.
Deploy
The trained model is deployed as an Arduino library (a .zip file), to be included in the ESP32-CAM code.
Open your Arduino IDE, and under Sketch, go to Include Library and Add .ZIP Library. Select the file you just downloaded from Edge Impulse Studio, and that's it!
Under the Examples tab in the Arduino IDE, you should find a sketch code under the project name.
Open the static_buffer example:
You can see that the first line of code includes a library that has everything you need to run inference on your device.
#include <ESP32-CAM-Fruit-vs-Veggies_inferencing.h>
Of course, this is generic code (a "template") that just takes a sample of raw data (stored in the variable features = {}) and runs the classifier, performing inference. The results are shown on the serial monitor.
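An abbreviated sketch of how that template is structured (following the Edge Impulse static_buffer example; the header name matches the project library above, and error handling is trimmed):

#include <ESP32-CAM-Fruit-vs-Veggies_inferencing.h>

// Raw features of one preprocessed sample, copied from Edge Impulse Studio
static const float features[] = {
    // ... paste the raw feature values here ...
};

// Callback the SDK uses to read slices of the feature buffer
int raw_feature_get_data(size_t offset, size_t length, float *out_ptr) {
    memcpy(out_ptr, features + offset, length * sizeof(float));
    return 0;
}

void setup() {
    Serial.begin(115200);
}

void loop() {
    ei_impulse_result_t result = { 0 };

    // Wrap the static buffer in a signal the classifier can consume
    signal_t features_signal;
    features_signal.total_length = sizeof(features) / sizeof(features[0]);
    features_signal.get_data = &raw_feature_get_data;

    // Run inference and print the probability of each class
    if (run_classifier(&features_signal, &result, false) == EI_IMPULSE_OK) {
        for (size_t i = 0; i < EI_CLASSIFIER_LABEL_COUNT; i++) {
            Serial.printf("%s: %.2f\n", result.classification[i].label,
                          result.classification[i].value);
        }
    }
    delay(2000);
}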
What we should do is take samples (images) from the camera and preprocess them (resize to 96x96 pixels, convert to grayscale, and flatten the result). This will be the input tensor for our model. The output will be a tensor with three values, showing the probability of each class.
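As an illustration only, a rough sketch of that preprocessing step could look like the hypothetical helper below (nearest-neighbor downscaling of a grayscale frame; the exact feature format the classifier expects is handled by Edge Impulse's example code mentioned next):

// Hypothetical helper: flatten a grayscale camera frame into a
// 96x96 feature buffer using nearest-neighbor downscaling.
#define DST_W 96
#define DST_H 96

void frame_to_features(const uint8_t *src, int src_w, int src_h, float *dst) {
  for (int y = 0; y < DST_H; y++) {
    for (int x = 0; x < DST_W; x++) {
      int sx = x * src_w / DST_W;   // Nearest source column
      int sy = y * src_h / DST_H;   // Nearest source row
      dst[y * DST_W + x] = (float)src[sy * src_w + sx];  // One feature per pixel
    }
  }
}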
On GitHub (https://github.com/edgeimpulse/example-esp32-cam), Edge Impulse adapted the camera test code (Examples ==> ESP32 ==> Camera ==> CameraWebServer) to include everything needed to run inference on the ESP32-CAM. Download the Basic-Image-Classification code from that repository, include your project library, and enter your camera model and WiFi network credentials:
Upload the code to your ESP32-CAM and you should be able to start classifying fruits and vegetables! You can check it on the serial monitor:
Testing the model (inference)
Take a picture with the camera, and the classification results will appear on the serial monitor:
The images captured by the camera can be verified on the web page:
Other tests:
In Conclusion
The ESP32-CAM is a very flexible, inexpensive, and easy-to-program device. This project demonstrates the potential of TinyML, but I am not sure whether the results can be applied to real applications as they stand. Only the smallest transfer learning model (MobileNet V1, α = 0.10) worked properly, and any attempt to use a larger α to improve accuracy resulted in arena allocation errors. One possible reason is the amount of memory already consumed by the common code that runs the camera. Therefore, the next step of the project is to optimize the final code to free up more memory for running the model.