Neural network algorithms based on supervised learning require a large amount of input data. The model is driven entirely by data, so data quality is a necessary condition for the algorithm to be effective. How to collect data efficiently and label or analyze it correctly is therefore extremely important: if one step goes wrong, all subsequent work is wasted.
This article introduces PECC's data collection board and the data collection tool from GPM China, along with some simple methods for analyzing data quality and some points that need attention.
▲
Figure 1. PECC hardware V1.0
2. Introduction to the host computer
▲
Figure 2. Host computer main interface
2.1. Open the serial port
Click ① to find the corresponding serial port number, select the appropriate baud rate with ②, and click ③ to open the serial port. Note:
1. If the connected device is a serial port + USB virtual serial port, you must select the correct baud rate for communication to work.
2. If the connected device is a pure USB virtual serial port (such as the PECC development board), any baud rate can be selected.
Set the label with ④, the sampling rate with ⑤ (up to 400 kHz), the sampling time with ⑥, and the channel with ⑦ (the host computer currently supports four channels), then click ⑧ to start data collection.
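As a rough aid when choosing ⑤ and ⑥, the number of samples in one capture is the sampling rate multiplied by the sampling time. The sketch below is only illustrative: the function names are hypothetical, and the storage size of 2 bytes per sample is an assumption (a 12-bit value held in a 16-bit word), not something taken from the host computer.

```python
def samples_per_capture(sample_rate_hz: int, duration_s: float) -> int:
    """Number of ADC samples produced by one capture on one channel."""
    return int(round(sample_rate_hz * duration_s))


def raw_size_bytes(sample_rate_hz: int, duration_s: float,
                   channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Approximate raw data size of one capture.

    bytes_per_sample=2 is an assumption (12-bit value in a 16-bit word).
    """
    n = samples_per_capture(sample_rate_hz, duration_s)
    return n * channels * bytes_per_sample


if __name__ == "__main__":
    # One second at the maximum 400 kHz rate, on one and on four channels.
    print(samples_per_capture(400_000, 1.0))
    print(raw_size_bytes(400_000, 1.0, channels=4))
```

Estimating the size up front helps keep the data for one condition small enough to train on.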
2.2.1. Data annotation methods and principles
1. The value set in ④ is the file label, which can also be understood as the file saving path. The Python script labels the data according to the path the file was saved under.
2. The folder name must contain one, and only one, of the strings "Arc" and "Normal" (case-insensitive). "Arc" means that all data in the folder are arc signals; "Normal" means that all data in the folder are arc-free signals. See Figure 3 for an example of data annotation.
3. Collection of "Normal" signals. Arc-free signal acquisition is relatively simple: you only need to vary the acquisition conditions so that the data covers a wider distribution. To guarantee that no arc is present, every wire connection must make full contact, so that poor contact does not cause arcs inside the joints that cannot be seen by the naked eye. It is best to connect an oscilloscope across the two ends of the arc drawing machine; the voltage must be 0 to confirm that there is no invisible arc inside the machine's joint.
4. Collection of "Arc" signals. To ensure that all collected data are arc signals, first turn on the arc drawing machine to generate an arc, then click ⑧ to start collecting, and disconnect the arc drawing machine only after the host computer has completed the collection.
5. Since arcing depends on many factors, data should be collected under a variety of conditions. Factors currently known to affect arcing include, but are not limited to: whether there is a breaker, whether there is an optimizer, current level, inverter channel, hardware acquisition circuit, and arcing distance. When labeling data, record these conditions as well to facilitate later analysis. For the current level, collect data across the whole current range in steps of 2-3 A.
6. Collection duration or data size. There is no strictly defined size, but it is recommended that the total duration at each current level be no less than 30 seconds. It is not recommended to collect too much data under the same condition, because too much data may lead to insufficient memory, an inability to train, or excessive training time.
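The path-based labeling rule can be sketched as follows. This is a minimal illustration, not the article's actual Python script: the function name `label_from_path` and the example folder names are hypothetical, and plain substring matching is an assumption about how the check is done.

```python
def label_from_path(path: str) -> int:
    """Return 1 for an arc file and 0 for an arc-free file.

    The label is read from the folder part of the path, which must contain
    exactly one of "arc" / "normal" (compared case-insensitively).
    Note: plain substring matching is an assumption; a folder such as
    "march" would also match "arc".
    """
    # Accept both Windows and POSIX separators, then drop the file name.
    folder = path.replace("\\", "/").rpartition("/")[0].lower()
    has_arc = "arc" in folder
    has_normal = "normal" in folder
    if has_arc == has_normal:  # both present, or neither
        raise ValueError(f"cannot label {path!r}: need exactly one of Arc/Normal")
    return 1 if has_arc else 0


if __name__ == "__main__":
    # Hypothetical folder names that also record the acquisition conditions.
    print(label_from_path("Arc_10A_breaker/001.csv"))
    print(label_from_path("data\\NORMAL_5A\\002.csv"))
```

Raising an error on ambiguous folders, rather than guessing, keeps mislabeled data out of the training set.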
2.2.2. Data folder
1. After the folder is set and the data is collected, the folder is created automatically in the same directory as the host computer program.
2. On the Chart page, left-clicking a folder copies its path into ④ to make further collection easier.
2.2.3. Power supply for acquisition board
The PECC board is powered over USB and can be connected to a laptop. However, USB power introduces power-frequency noise. The environment must be consistent between the collection and verification stages, so the same device must supply power in both stages; do not switch to a different computer or power supply. It is recommended to use the same computer for data collection and verification, and to keep it plugged into mains power rather than running on battery.
▲
Figure 3. Data annotation
▲
Figure 4. Data collection Log
2.2.4. Collect logs
1. Log information is saved automatically in the same directory as the host computer program; the file name is the current time with a .log extension.
2. If the log contains packet loss or other error messages, it is best to delete the data that was collected and saved. In the data collection log above (Figure 4), ① shows packet loss at the 400 kHz sampling rate; in that case, find the data in the Chart interface and click delete to remove it. ② shows the log of a normal data collection at 400 kHz.
So far, packet loss has been observed only at the 400 kHz sampling rate, and only with small probability. Increasing the USB communication rate in the future should solve this problem, although serial communication can still suffer bit errors.
2.3.1. Data display
▲
Figure 5. Chart page
1. Click on the collected data ①, and the waveform graph will be drawn on the right.
2. ②: Time-domain graph; the abscissa is the sample index and the ordinate is the ADC value. ③: Frequency-domain graph; the abscissa is the frequency expressed as a fraction N of the sampling rate (frequency = N × sampling rate), so at a 250 kHz sampling rate, 0.5 corresponds to 125 kHz; the ordinate is the amplitude.
3. Set the frame length ⑤ and drag ④ to see the time domain and frequency domain graphics under different frame length windows.
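The frequency axis and the framed view can be reproduced offline with a short script. How the host computer computes its spectrum is not stated, so the sketch below is an assumption: it uses a naive DFT on one frame, removes the mean before transforming, and the names `normalized_to_hz` and `frame_spectrum` are hypothetical.

```python
import cmath
import math


def normalized_to_hz(n: float, sample_rate_hz: float) -> float:
    """Convert an abscissa value N (fraction of the sampling rate) to Hz."""
    return n * sample_rate_hz


def frame_spectrum(samples, start: int, frame_len: int):
    """Magnitude spectrum of one frame, computed with a naive DFT.

    Returns (normalized_frequency, magnitude) pairs for bins 0..frame_len//2.
    The mean is subtracted first so the ADC offset does not dominate bin 0
    (a choice made for this sketch, not a documented behavior of the tool).
    """
    frame = samples[start:start + frame_len]
    if len(frame) < frame_len:
        raise ValueError("frame extends past the end of the data")
    mean = sum(frame) / frame_len
    centered = [x - mean for x in frame]
    spectrum = []
    for k in range(frame_len // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                  for i, x in enumerate(centered))
        spectrum.append((k / frame_len, abs(acc) / frame_len))
    return spectrum


if __name__ == "__main__":
    print(normalized_to_hz(0.5, 250_000))
    # Synthetic test tone centered on the ADC midpoint, 8 cycles per frame.
    tone = [2048 + 1000 * math.sin(2 * math.pi * 8 * i / 64) for i in range(64)]
    peak = max(frame_spectrum(tone, 0, 64), key=lambda p: p[1])
    print(peak[0])
```

The naive DFT is quadratic in the frame length, so it only suits short frames; an FFT library would be the usual choice for longer ones.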
2.3.2. Data comparison
▲
Figure 6. Time domain and frequency domain comparison function of the host computer software
1. Select ① Trace, then select other data files to compare the waveforms of different data. The figure above shows a comparison of arc and arc-free signals.
2. Check ②, the time-domain part: check whether there are abnormal points, whether any sample points exceed the maximum amplitude, and whether the center point is near 2048 (a 12-bit ADC has 4096 codes, 0 to 4095, so the center point is 2048). These checks reveal hardware design issues such as an incorrect gain or an incorrect center point. You can also compare the collected data with an oscilloscope trace to check the hardware or firmware.
3. Check ③, the frequency-domain part: check whether the filtering range of the filter is correct and whether the wave-limiting point is correct. In the figure above, arc and arc-free data are distinguished relatively well in the lower-frequency part.
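The time-domain checks in item 2 can also be automated. The sketch below is illustrative only: the function name `check_time_domain` is hypothetical, and the tolerance of 100 counts around the midpoint is an assumed threshold that should be tuned to the actual hardware.

```python
ADC_BITS = 12
ADC_MAX = (1 << ADC_BITS) - 1   # 4095, largest code of a 12-bit ADC
ADC_MID = 1 << (ADC_BITS - 1)   # 2048, expected center point


def check_time_domain(samples, mid_tolerance: float = 100.0) -> dict:
    """Summarize basic quality indicators for one capture.

    mid_tolerance is an assumed threshold, in ADC counts.
    """
    if not samples:
        raise ValueError("no samples")
    out_of_range = sum(1 for s in samples if s < 0 or s > ADC_MAX)
    # Samples sitting exactly on a rail suggest clipping (gain too high).
    clipped = sum(1 for s in samples if s in (0, ADC_MAX))
    mean = sum(samples) / len(samples)
    return {
        "out_of_range": out_of_range,
        "clipped": clipped,
        "mean": mean,
        "centered": abs(mean - ADC_MID) <= mid_tolerance,
    }


if __name__ == "__main__":
    print(check_time_domain([2000, 2096, 2050, 2046]))
```

A capture with many clipped samples or an off-center mean is worth comparing against the oscilloscope before it is used for training.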
2.4. Online identification
▲
Figure 7. Recognition results
Select ① sampling rate and ③ channel, click Start Recognition, and the development board will enter the AFCI recognition mode.
Log information similar to ④ in the figure is output during recognition. It indicates that the sampling rate has been set to 250 kHz and that the CH2 channel has been opened, and it outputs percentages for normal and arc, which represent the probabilities of no arc and arc.
2.5. File verification
The file verification function downloads the collected raw data files in csv format to the development board for verification, to determine whether the AI function of the firmware is working normally. If the firmware and model are correct, the output result should match the labeled content.
▲
Figure 8. File verification
▲
Figure 9. Log information
Figure 8: In the Chart interface, select the file that needs to be verified. Double-click it and the verification dialog box will pop up. Click Yes to verify.
Figure 9: Shows the output result and the verification data still remaining. The channel information can be ignored, because the data is downloaded from the host computer and has nothing to do with the acquisition channel.
2.6. Communication protocol
The Note interface lists the corresponding serial communication protocol and release information.
▲
Figure 10. Note interface
Data collection and annotation is the first step toward a good neural network. You need to be extra careful, because dirty data is hard to spot with the naked eye, and too much dirty data leads to poor model generalization.
One way to find it is to use the model to verify all the data, select the data that fails verification, and plot it. Distinguishing it by eye is a huge and tedious amount of work, so the validity of the data should be ensured during the data collection process itself.
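The model-based check can be sketched as a simple filter over verification results. How the predictions are obtained is not covered here, so the input layout of (file path, label, predicted) tuples, the function names, and the example file names are all assumptions made for illustration.

```python
def find_mismatches(results):
    """Return the paths whose predicted class differs from the label.

    results: iterable of (file_path, label, predicted) tuples, where label
    and predicted are 1 for arc and 0 for arc-free (assumed layout).
    """
    return [path for path, label, predicted in results if label != predicted]


def accuracy(results) -> float:
    """Fraction of files whose prediction matches the label."""
    results = list(results)
    if not results:
        raise ValueError("no results")
    correct = sum(1 for _, label, predicted in results if label == predicted)
    return correct / len(results)


if __name__ == "__main__":
    # Hypothetical verification results for three files.
    results = [
        ("Arc_10A/001.csv", 1, 1),
        ("Arc_10A/002.csv", 1, 0),
        ("Normal_10A/001.csv", 0, 0),
    ]
    print(find_mismatches(results))
    print(accuracy(results))
```

Only the files returned by `find_mismatches` then need to be plotted and inspected by eye.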